Re: Ordering of Batches in Spark streaming

2015-07-14 Thread Tathagata Das
This has been discussed in a number of threads on this mailing list. Here is a summary. 1. Processing of batch T+1 always starts after all the processing of batch T has completed. But here a batch is defined by the data received by all the receivers running in the system within the batch
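To picture the guarantee in point 1, here is a toy model in plain Python. It is not Spark code and does not use any Spark API; the function name `run_batches` and the `(batch_time, records)` tuples are made up for illustration. It only shows what "batch T+1 starts after batch T has completed" means for the observed start/end order.

```python
from collections import deque

def run_batches(batches, process):
    """Process queued batches strictly one at a time, in batch-time order.

    `batches` is a list of (batch_time, records) tuples (a hypothetical
    representation chosen for this sketch, not Spark's internal one).
    """
    log = []
    queue = deque(sorted(batches, key=lambda b: b[0]))
    while queue:
        batch_time, records = queue.popleft()
        log.append(("start", batch_time))
        process(records)               # all work for this batch finishes here
        log.append(("end", batch_time))
    return log

# Batches submitted out of order; each holds the records for one interval.
batches = [(2, ["c"]), (1, ["a", "b"]), (3, ["d"])]
seen = []
log = run_batches(batches, seen.extend)
```

In this model every "end" entry for a batch precedes the "start" entry of the next one, which is the property the summary describes.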

Re: Ordering of Batches in Spark streaming

2015-07-12 Thread anshu shukla
Can anyone shed some light on how Spark does *ordering of batches*? On Sat, Jul 11, 2015 at 9:19 AM, anshu shukla anshushuk...@gmail.com wrote: Thanks Ayan, I was curious to know *how Spark does it*. Is there any *documentation* where I can get the details about that? Will

Ordering of Batches in Spark streaming

2015-07-10 Thread anshu shukla
Hey, is there any *guarantee of fixed ordering among the batches/RDDs*? After searching a lot, I found there is no ordering by default (from the framework itself), not only *batch-wise* but *also within batches*. But I wonder whether there is any change from old Spark versions to Spark

Re: Ordering of Batches in Spark streaming

2015-07-10 Thread anshu shukla
Thanks Ayan, I was curious to know *how Spark does it*. Is there any *documentation* where I can get the details about that? Could you please point me to a detailed link, etc.? Maybe it does something like *transactional topologies in Storm*.(

Re: Ordering of Batches in Spark streaming

2015-07-10 Thread ayan guha
AFAIK, it is guaranteed that batch t+1 will not start processing until batch t is done. Ordering within a batch - what do you mean by that? In essence, the (mini) batch will get distributed in partitions like a normal RDD, so rdd.zipWithIndex should give a way to order them by the time they
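As a rough illustration of the rdd.zipWithIndex suggestion, the sketch below models its indexing rule in plain Python rather than calling Spark: indices follow partition order first, then element order inside each partition. The function `zip_with_index` and the sample partitions are hypothetical stand-ins. Whether the resulting index matches arrival time depends on how records were placed into partitions, which this model does not address.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")

def zip_with_index(partitions: List[List[T]]) -> List[List[Tuple[T, int]]]:
    """Model of the zipWithIndex rule: number elements by partition order,
    then by position within each partition."""
    indexed = []
    next_index = 0
    for part in partitions:
        indexed_part = []
        for item in part:
            indexed_part.append((item, next_index))
            next_index += 1
        indexed.append(indexed_part)
    return indexed

# A hypothetical mini-batch split across three partitions.
batch = [["a", "b"], ["c"], ["d", "e"]]
indexed_batch = zip_with_index(batch)

# Flatten and sort by index to recover a single total order for the batch.
flat = [pair for part in indexed_batch for pair in part]
ordered = sorted(flat, key=lambda pair: pair[1])
```

Sorting by the attached index gives one deterministic order over the whole batch, regardless of how the elements are later shuffled.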