Re: Kafka Offsets after application is restarted using Spark Streaming Checkpointing

2015-11-15 Thread Cody Koeninger
Not sure on that, maybe someone else can chime in On Sat, Nov 14, 2015 at 4:51 AM, kundan kumar wrote: > Hi Cody , > > Thanks for the clarification. I will try to come up with some workaround. > > I have an another doubt. When my job is restarted, and recovers from the >

Re: Kafka Offsets after application is restarted using Spark Streaming Checkpointing

2015-11-15 Thread kundan kumar
Sure Thanks !! On Sun, Nov 15, 2015 at 9:13 PM, Cody Koeninger wrote: > Not sure on that, maybe someone else can chime in > > On Sat, Nov 14, 2015 at 4:51 AM, kundan kumar > wrote: > >> Hi Cody , >> >> Thanks for the clarification. I will try to come

Re: Kafka Offsets after application is restarted using Spark Streaming Checkpointing

2015-11-14 Thread kundan kumar
Hi Cody , Thanks for the clarification. I will try to come up with some workaround. I have an another doubt. When my job is restarted, and recovers from the checkpoint it does the re-partitioning step twice for each 15 minute job until the window of 2 hours is complete. Then the re-partitioning

Re: Kafka Offsets after application is restarted using Spark Streaming Checkpointing

2015-11-13 Thread Cody Koeninger
Unless you change maxRatePerPartition, a batch is going to contain all of the offsets from the last known processed to the highest available. Offsets are not time-based, and Kafka's time-based api currently has very poor granularity (it's based on filesystem timestamp of the log segment). There's

Kafka Offsets after application is restarted using Spark Streaming Checkpointing

2015-11-13 Thread kundan kumar
Hi, I am using spark streaming check-pointing mechanism and reading the data from kafka. The window duration for my application is 2 hrs with a sliding interval of 15 minutes. So, my batches run at following intervals... 09:45 10:00 10:15 10:30 and so on Suppose, my running batch dies at 09:55