Re: How to know whether I'm in the first batch of spark streaming

2016-04-21 Praveen Devarao
Thanks, Yu, for sharing the use case. >>If our system has some problem, such as an HDFS issue, and the "first batch" and "second batch" are both queued, then when the issue is gone these two batches will start together. Will onBatchStarted be called concurrently for these two batches?<< Not

Re: How to know whether I'm in the first batch of spark streaming

2016-04-21 Yu Xie
Thank you, Praveen. In our Spark Streaming job, we write the data to an HDFS directory and use the batch time, in MMDDHHHmm00 format, as the directory name. So, when we stop the streaming job and start it again (we do not use checkpointing), in the init of the first batch, we will write

Re: How to know whether I'm in the first batch of spark streaming

2016-04-21 Praveen Devarao
Hi Yu, could you provide more details on what you are trying to initialize, and how? Is this initialization part of the code block in an action on the DStream? Say the second batch finishes before the first batch: wouldn't your results be affected, as init would have not

How to know whether I'm in the first batch of spark streaming

2016-04-19 Yu Xie
Hi Spark users, I'm running a Spark Streaming application with concurrentJobs > 1, so more than one batch may run at the same time. I would like to do some init work in the first batch based on the "time" of the first batch, so that even if the second batch runs faster than the first batch, I
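One way to get the behaviour asked about here, sketched in plain Python rather than against the Spark API: record each batch time as it is submitted, then let whichever batch body runs first perform the init exactly once, always using the earliest recorded time. `FirstBatchTracker`, `on_batch_submitted`, `ensure_init`, `is_first` and `simulate` are hypothetical names. The sketch assumes every batch time is registered before that batch's body runs (for example from a sequential callback such as a `StreamingListener`'s `onBatchSubmitted`); whether Spark guarantees that ordering with concurrentJobs > 1 is exactly what this thread is asking, so verify it for your version.

```python
import threading


class FirstBatchTracker:
    """Hypothetical helper: runs one-time init with the earliest batch time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._first_time = None
        self._initialized = False

    def on_batch_submitted(self, batch_time):
        # ASSUMPTION: called for every batch before that batch's body runs.
        with self._lock:
            if self._first_time is None or batch_time < self._first_time:
                self._first_time = batch_time

    def ensure_init(self, init_fn):
        # Whichever batch gets here first runs init_fn, always with the earliest time.
        # Holding the lock makes later batches wait until init has finished.
        with self._lock:
            if not self._initialized:
                init_fn(self._first_time)
                self._initialized = True

    def is_first(self, batch_time):
        with self._lock:
            return batch_time == self._first_time


def simulate(batch_times):
    tracker = FirstBatchTracker()
    init_calls = []
    first_flags = {}
    for t in batch_times:  # registration is sequential, in time order
        tracker.on_batch_submitted(t)

    def process_batch(t):
        tracker.ensure_init(init_calls.append)
        first_flags[t] = tracker.is_first(t)

    # Start later batches first to mimic a second batch that runs faster.
    threads = [threading.Thread(target=process_batch, args=(t,))
               for t in reversed(batch_times)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return init_calls, first_flags


init_calls, first_flags = simulate([1000, 2000, 3000])
print(init_calls)                            # [1000]
print(first_flags[1000], first_flags[2000])  # True False
```

Even though the later batches start first in `simulate`, init runs once and receives 1000, and only batch 1000 reports itself as the first. Holding the lock while init runs is a deliberate choice: a faster later batch blocks until init completes instead of racing past it.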