Spark will simply build up a backlog of batches; it will still manage to
process them, but if it keeps falling behind, you may run out of memory or
see unreasonable latency. For momentary spikes, Spark Streaming will cope.
Generally, if you need 100% of batches to complete within the interval,
you'll have to go with a 5-second interval. The alternative is to process
the data in two pipelines (0.5 s and 5 s) as two separate Spark Streaming
jobs and overwrite the results of one with the other (see the sketch
below).
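
As a rough sketch of sizing the interval for the worst case (assuming the
classic DStream API and a socket source; the app name, host, and port here
are illustrative, not your actual job):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object BatchIntervalSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("batch-interval-sketch")
        // Size the interval for the slow case (5 s) so processing time
        // stays at or below the batch interval and no backlog builds up.
        val ssc = new StreamingContext(conf, Seconds(5))

        // Illustrative source; substitute your real input DStream.
        val lines = ssc.socketTextStream("localhost", 9999)
        lines.count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }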

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>


On Sat, Sep 6, 2014 at 12:39 AM, qihong <qc...@pivotal.io> wrote:

> Reposting, since the original message was marked with "This post has NOT
> been accepted by the mailing list yet."
>
> I have some questions regarding DStream batch interval:
>
> 1. If it only takes 0.5 seconds to process a batch 99% of the time, but 1%
> of batches need 5 seconds to process (due to some random factor or
> failures), then what's the right batch interval? 5 seconds (the worst
> case)?
>
> 2. What will happen to DStream processing if one batch takes longer than
> the batch interval? Can Spark recover from that?
>
> Thanks,
> Qihong
>
