Re: spark streaming question

Chris Fregly Sun, 04 May 2014 17:12:22 -0700

great questions, weide. i'd also like to hear more about how
to horizontally scale a spark-streaming cluster.


i've gone through the samples (standalone mode) and read the documentation,
but it's still not clear to me how to scale this puppy out under high load.
i assume i add more receivers (kinesis, flume, etc.), but how does this
physically work?

@TD:  can you comment?

thanks!

-chris


On Sun, May 4, 2014 at 2:10 PM, Weide Zhang <weo...@gmail.com> wrote:

> Hi,
>
> This may be a very general question to ask here, but I'm curious why
> Spark Streaming can achieve better throughput than Storm, as claimed in
> the Spark Streaming paper. Does it depend on certain use cases and/or data
> sources? What drives the better performance in Spark Streaming, or, put
> another way, what makes Storm less performant than Spark Streaming?
>
> Also, to guarantee exactly-once semantics when a node fails, Spark
> replicates RDDs and writes checkpoints so that data can be recomputed on
> the fly, while Trident uses a transactional object to persist the state
> and results. It's not obvious to me which approach is more costly, and
> why. Can anyone share some experience here?
>
> Thanks a lot,
>
> Weide
>
