Hi Yu,

        Could you provide more details on what you are trying to 
initialize, and how? Is the initialization part of the code block in an 
action on the DStream? If the second batch finishes before the first 
batch, wouldn't your results be affected, since the init would not yet 
have taken place (given that you want it in the first batch itself)?

        One way to identify the first batch is to implement the 
StreamingListener trait, which has the methods onBatchStarted and 
onBatchCompleted. These methods should help you determine the first 
batch (the first batch will definitely start first, though the order of 
completion is not guaranteed with concurrentJobs set to more than 1).
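
        A rough sketch of that idea in Scala, not a tested solution. 
FirstBatchListener and FirstBatchListener.register are hypothetical names 
used only for illustration; StreamingListener, 
StreamingListenerBatchStarted, batchInfo.batchTime and 
StreamingContext.addStreamingListener are the Spark Streaming APIs. The 
sketch assumes that the first onBatchStarted callback seen on the driver 
belongs to the earliest batch.

```scala
import java.util.concurrent.atomic.AtomicReference

import org.apache.spark.streaming.{StreamingContext, Time}
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchStarted}

// Hypothetical helper (not part of Spark): remembers the batch time of the
// first batch for which onBatchStarted is observed on the driver.
class FirstBatchListener extends StreamingListener {

  // None until some batch has started; set exactly once afterwards.
  private val firstBatchTime = new AtomicReference[Option[Time]](None)

  override def onBatchStarted(batchStarted: StreamingListenerBatchStarted): Unit = {
    val batchTime = batchStarted.batchInfo.batchTime
    // compareAndSet succeeds only while the stored value is still None,
    // so batches that start later cannot overwrite the recorded time.
    firstBatchTime.compareAndSet(None, Some(batchTime))
  }

  // Batch time of the first started batch, if any batch has started yet.
  def firstTime: Option[Time] = firstBatchTime.get()

  // True only if `time` equals the recorded first batch time.
  def isFirstBatch(time: Time): Boolean = firstBatchTime.get() == Some(time)
}

object FirstBatchListener {
  // Register the listener on an existing context, before ssc.start().
  def register(ssc: StreamingContext): FirstBatchListener = {
    val listener = new FirstBatchListener
    ssc.addStreamingListener(listener)
    listener
  }
}
```

        One caveat: as far as I know, listener events are delivered 
asynchronously on the driver, so the code of a batch should not assume 
that its own onBatchStarted callback has already run by the time the 
batch executes. The recorded time is safer to use for checks made after 
the fact than for gating the init itself.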

        It would be interesting to know your use case. Could you share 
it, if possible?

Thanking You
---------------------------------------------------------------------------------
Praveen Devarao
Spark Technology Centre
IBM India Software Labs
---------------------------------------------------------------------------------
"Courage doesn't always roar. Sometimes courage is the quiet voice at the 
end of the day saying I will try again"



From:   Yu Xie <yuu...@gmail.com>
To:     user@spark.apache.org
Date:   19/04/2016 01:24 pm
Subject:        How to know whether I'm in the first batch of spark 
streaming



Hi Spark users,

I'm running a Spark Streaming application with concurrentJobs > 1, so 
more than one batch may run at the same time.

Now I would like to do some init work in the first batch, based on the 
"time" of the first batch. So even if the second batch runs faster than 
the first batch, I still need to do the init in the literal "first batch".

Is there a way for me to know which batch is the first one?
Thank you


