Hi,

I am using the Spark Streaming checkpointing mechanism and reading data
from Kafka. The window duration for my application is 2 hours with a
sliding interval of 15 minutes.

So, my batches run at the following intervals:
09:45
10:00
10:15
10:30  and so on

Suppose my running batch dies at 09:55 and I restart the application at
12:05. Then the flow is something like this:

At 12:05 it would run the 10:00 batch. Would this read the Kafka offsets
from the time it went down (i.e. 09:45) up to 12:00, or only up to 10:00?
Then the next would be the 10:15 batch; what would be the input offsets
for that batch? And so on for all the queued batches.


Basically, my requirement is that when the application is restarted at
12:05, the first queued batch should read the Kafka offsets only up to
10:00, then the next queued batch takes the offsets from 10:00 to 10:15,
and so on until all the queued batches are processed.
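To make the expected behaviour concrete, here is a small stand-alone Python sketch (the dates are made up just to build timestamps; this is not Spark code) of the semantics I want: each queued batch should consume only its own 15-minute slice of offsets, not everything up to the restart time.

```python
from datetime import datetime, timedelta

def queued_batches(last_completed, restart_time, batch_interval):
    """Return the batch times that were missed between the failure
    and the restart, assuming batches fire every batch_interval."""
    batches = []
    t = last_completed + batch_interval
    while t <= restart_time:
        batches.append(t)
        t += batch_interval
    return batches

interval = timedelta(minutes=15)
# Last successful batch was 09:45; application restarted at 12:05.
missed = queued_batches(datetime(2024, 1, 1, 9, 45),
                        datetime(2024, 1, 1, 12, 5),
                        interval)

# Desired semantics: the batch scheduled at time T reads only the
# offsets produced during (T - interval, T], e.g. the 10:00 batch
# reads 09:45-10:00, the 10:15 batch reads 10:00-10:15, and so on.
for b in missed:
    print((b - interval).strftime("%H:%M"), "->", b.strftime("%H:%M"))
```

The first replayed batch here is 10:00 and the last is 12:00, which matches the schedule in the example above.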

If this is the way offsets are handled for all the queued batches, then I
am fine.

Otherwise, please provide suggestions on how this can be done.



Thanks!!!
