This has been discussed in a number of threads on this mailing list. Here
is a summary.
1. Processing of batch T+1 always starts only after all processing of batch
T has completed. Here a batch is defined as the data received, within that
batch interval, by all the receivers running in the system.
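The guarantee in point 1 can be sketched as a simple driver loop that hands batches to the processing stage strictly in order. This is a conceptual illustration of the guarantee only, not Spark's actual JobScheduler code:

```python
# Conceptual sketch: batches are processed one at a time in arrival order,
# so batch T+1 never starts before batch T has finished.
# Hypothetical stand-in for Spark's scheduler, for illustration only.
from collections import deque

def run_batches(batches):
    """Process (batch_id, records) pairs strictly in order; return an event log."""
    pending = deque(batches)            # batches queued as receivers produce them
    log = []
    while pending:
        batch_id, records = pending.popleft()
        log.append(("start", batch_id))
        _ = [r.upper() for r in records]  # stand-in for the real batch job
        log.append(("done", batch_id))
    return log

events = run_batches([(0, ["a"]), (1, ["b"]), (2, ["c"])])
# Every ("done", T) precedes ("start", T+1) in the log.
```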
Can anyone shed some light on how Spark does *ordering of batches*?
On Sat, Jul 11, 2015 at 9:19 AM, anshu shukla anshushuk...@gmail.com
wrote:
Hey,
Is there any *guarantee of fixed ordering among the batches/RDDs*?
After searching a lot, I found there is no ordering by default (from the
framework itself), not only *batch-wise* but *also within batches*. But I
wonder whether anything has changed from older Spark versions to newer ones.
Thanks Ayan,
I was curious to know *how Spark does it*. Is there any *documentation*
where I can get the details about that? Will you please point me to some
detailed link, etc.?
Maybe it does something like *transactional topologies in Storm*.
AFAIK, it is guaranteed that batch t+1 will not start processing until batch
t is done.
Ordering within a batch - what do you mean by that? In essence, the (mini)
batch will get distributed into partitions like a normal RDD, so
rdd.zipWithIndex should give a way to order the elements by the time they
arrived.
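Since running actual PySpark needs a cluster, here is a pure-Python model of how zipWithIndex assigns globally ordered indices: indices follow partition order, then element order within each partition, which Spark computes with a first pass that counts each partition and a second pass that adds a per-partition offset. This is a sketch of that documented behavior, not Spark's implementation:

```python
# Model of RDD.zipWithIndex semantics: an element's index is the number of
# elements in all earlier partitions plus its position within its own partition.
from itertools import accumulate

def zip_with_index(partitions):
    """Return partitions of (element, global_index) pairs, like zipWithIndex."""
    counts = [len(p) for p in partitions]
    # offset of partition i = total element count of partitions 0..i-1
    offsets = [0] + list(accumulate(counts))[:-1]
    return [
        [(elem, offsets[i] + j) for j, elem in enumerate(part)]
        for i, part in enumerate(partitions)
    ]

parts = [["a", "b"], ["c"], ["d", "e", "f"]]
indexed = zip_with_index(parts)
# → [[('a', 0), ('b', 1)], [('c', 2)], [('d', 3), ('e', 4), ('f', 5)]]
```

Within a single mini-batch this gives a stable order, but note it reflects partition layout, not wall-clock arrival time across receivers.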