great questions, weide. i'd also like to hear more about how to horizontally scale a spark-streaming cluster.
i've gone through the samples (standalone mode) and read the documentation, but it's still not clear to me how to scale this puppy out under high load. i assume i add more receivers (kinesis, flume, etc.), but how does that physically work?

@TD: can you comment?

thanks!

-chris

On Sun, May 4, 2014 at 2:10 PM, Weide Zhang <weo...@gmail.com> wrote:

> Hi,
>
> It might be a very general question to ask here, but I'm curious to know why Spark Streaming can achieve better throughput than Storm, as claimed in the Spark Streaming paper. Does it depend on certain use cases and/or data sources? What drives the better performance in the Spark Streaming case or, put another way, what makes Storm less performant than Spark Streaming?
>
> Also, in order to guarantee exactly-once semantics when a node failure happens, Spark makes replicas of RDDs and checkpoints so that data can be recomputed on the fly, while in the Trident case they use transactional objects to persist the state and results. It's not obvious to me which approach is more costly, and why. Can anyone share some experience here?
>
> Thanks a lot,
>
> Weide