Thanks, Yu, for sharing the use case.
>>If our system has a problem, such as an HDFS issue, and the "first
batch" and "second batch" are both queued, then when the issue is gone these two
batches will start together. Will onBatchStarted be called
concurrently for these two batches?<<
Not
Thank you, Praveen.
In our Spark Streaming application, we write the data to an HDFS directory and
use the batch time, formatted as MMDDHHHmm00, as the directory name.
So when we stop the streaming application and start it again (we do not
use checkpointing), in the init of the first batch we will write
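As a rough illustration of the naming scheme described above, here is a minimal Python sketch that turns a batch time into a directory name. It assumes the batch time is passed in as milliseconds since the epoch, that "MMDDHHHmm00" means two-digit month, day, hour and minute followed by a literal "00" (the "HHH" reads like a typo for a two-digit hour), and that times are rendered in UTC. `batch_dir_name` is a hypothetical helper, not a Spark API.

```python
from datetime import datetime, timezone


def batch_dir_name(batch_time_ms: int) -> str:
    # Assumption: batch time is epoch milliseconds and is rendered in UTC.
    dt = datetime.fromtimestamp(batch_time_ms / 1000, tz=timezone.utc)
    # Assumption: "MMDDHHHmm00" = month, day, hour (24h), minute + literal "00".
    return dt.strftime("%m%d%H%M") + "00"


print(batch_dir_name(1458030600000))  # 2016-03-15 08:30:00 UTC → 0315083000
```

Because each batch gets its own directory, concurrently running batches do not collide on output paths, but only if the batch interval is at least one minute, since the name has minute resolution.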
Hi Yu,
Could you provide more details on what you are trying to initialize,
and how? Are you doing this initialization as part of the code
block in an action on the DStream? Say the second batch finishes before the
first batch: wouldn't your results be affected, since the init would have not
Hi Spark users,
I'm running a Spark Streaming application with concurrentJobs > 1, so
more than one batch may run at the same time.
Now I would like to do some init work in the first batch based on the
"time" of the first batch. So even if the second batch runs faster than the
first batch, I
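One way to make sure the init has happened before any batch does its own work, even with concurrentJobs > 1, is a lock-guarded run-once step that every batch calls first. The Python sketch below simulates this with two threads standing in for two concurrent batches. It assumes the init runs in a single driver process (for example inside the function passed to foreachRDD, which executes on the driver), so one in-process lock covers all batches. `OneTimeInit`, `init_from_batch_time` and `process_batch` are hypothetical names, not Spark APIs.

```python
import threading


class OneTimeInit:
    """Hypothetical helper (not a Spark API): runs init_fn exactly once,
    even when several batches call ensure() at the same time."""

    def __init__(self, init_fn):
        self._init_fn = init_fn
        self._lock = threading.Lock()
        self.init_batch_time = None  # batch time that actually ran the init

    def ensure(self, batch_time_ms):
        # Every caller takes the lock, so a batch that arrives while the init
        # is still running blocks until that init has finished.
        with self._lock:
            if self.init_batch_time is None:
                self._init_fn(batch_time_ms)
                self.init_batch_time = batch_time_ms


init_calls = []


def init_from_batch_time(batch_time_ms):
    # Placeholder for the real init work derived from the batch time.
    init_calls.append(batch_time_ms)


guard = OneTimeInit(init_from_batch_time)
processed = []


def process_batch(batch_time_ms):
    guard.ensure(batch_time_ms)  # the init is complete once this returns
    processed.append(batch_time_ms)


# Simulate two batches (one minute apart) running concurrently.
threads = [threading.Thread(target=process_batch, args=(t,)) for t in (60_000, 120_000)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

Note the limitation: the init uses the time of whichever batch reaches `ensure()` first, which is the earliest batch only if batches start in batch-time order. If that ordering cannot be relied on, computing the init value on the driver before starting the StreamingContext may be simpler.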