Honestly I'd concentrate more on getting your batches to finish in a timely fashion, so you won't even have the issue to begin with...
On Tue, Sep 1, 2015 at 10:16 AM, Shushant Arora <shushantaror...@gmail.com> wrote:

> What if I use custom checkpointing. So that I can take care of offsets
> being checkpointed at end of each batch.
>
> Will it be possible then to reset the offset.
>
> On Tue, Sep 1, 2015 at 8:42 PM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> No, if you start arbitrarily messing around with offset ranges after
>> compute is called, things are going to get out of whack.
>>
>> e.g. checkpoints are no longer going to correspond to what you're
>> actually processing
>>
>> On Tue, Sep 1, 2015 at 10:04 AM, Shushant Arora <shushantaror...@gmail.com> wrote:
>>
>>> can I reset the range based on some condition - before calling
>>> transformations on the stream.
>>>
>>> Say -
>>> before calling :
>>> directKafkaStream.foreachRDD(new Function<JavaRDD<byte[][]>, Void>() {
>>>
>>>   @Override
>>>   public Void call(JavaRDD<byte[][]> v1) throws Exception {
>>>     v1.foreachPartition(new VoidFunction<Iterator<byte[][]>>{
>>>       @Override
>>>       public void call(Iterator<byte[][]> t) throws Exception {
>>>       }});}});
>>>
>>> change directKafkaStream's RDD's offset range.(fromOffset).
>>>
>>> I can't do this in compute method since compute would have been called
>>> at current batch queue time - but condition is set at previous batch run
>>> time.
>>>
>>> On Tue, Sep 1, 2015 at 7:09 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>>
>>>> It's at the time compute() gets called, which should be near the time
>>>> the batch should have been queued.
>>>>
>>>> On Tue, Sep 1, 2015 at 8:02 AM, Shushant Arora <shushantaror...@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> In spark streaming 1.3 with kafka- when does driver bring latest
>>>>> offsets of this run - at start of each batch or at time when batch gets
>>>>> queued ?
>>>>>
>>>>> Say few of my batches take longer time to complete than their batch
>>>>> interval. So some of batches will go in queue. Will driver waits for
>>>>> queued batches to get started or just brings the latest offsets before
>>>>> they even actually started. And when they start running they will work on
>>>>> old offsets brought at time when they were queued.
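
To make the timing discussed in this thread concrete, here is a toy model in plain Java. It is only a sketch and uses no Spark or Kafka classes: the class name `OffsetTimingSketch`, the `Batch` type, and the assumption that the topic grows at a constant rate are all hypothetical and chosen for illustration. It models the behaviour described above, where a batch's offset range is fixed when the batch is generated (the point at which compute() is called), not when its processing actually starts.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this does not use or reproduce any Spark/Kafka API.
final class OffsetTimingSketch {

    /** One batch and the offset range [fromOffset, untilOffset) assigned to it. */
    static final class Batch {
        final long generatedAtMs; // when the batch was generated (offsets fixed here)
        final long fromOffset;
        final long untilOffset;
        final long startedAtMs;   // when processing actually began

        Batch(long generatedAtMs, long fromOffset, long untilOffset, long startedAtMs) {
            this.generatedAtMs = generatedAtMs;
            this.fromOffset = fromOffset;
            this.untilOffset = untilOffset;
            this.startedAtMs = startedAtMs;
        }
    }

    /**
     * Simulates numBatches batches processed one at a time.
     * Assumptions (hypothetical): the topic grows by msgsPerMs messages per
     * millisecond, and every batch takes exactly processingMs to process.
     */
    static List<Batch> simulate(int numBatches, long intervalMs, long processingMs, long msgsPerMs) {
        List<Batch> batches = new ArrayList<>();
        long previousUntil = 0; // each batch starts where the previous range ended
        long workerFreeAt = 0;  // time at which the previous batch finishes
        for (int i = 1; i <= numBatches; i++) {
            long generatedAt = i * intervalMs;
            // The latest offset is looked up when the batch is generated...
            long latest = generatedAt * msgsPerMs;
            // ...but processing may have to wait for earlier batches to finish.
            long startedAt = Math.max(generatedAt, workerFreeAt);
            batches.add(new Batch(generatedAt, previousUntil, latest, startedAt));
            previousUntil = latest;
            workerFreeAt = startedAt + processingMs;
        }
        return batches;
    }

    public static void main(String[] args) {
        // 1000 ms batch interval, 1500 ms processing time: batches fall behind.
        for (Batch b : simulate(3, 1000, 1500, 1)) {
            System.out.printf("generated=%d start=%d offsets=[%d, %d)%n",
                    b.generatedAtMs, b.startedAtMs, b.fromOffset, b.untilOffset);
        }
    }
}
```

In this model, with a 1000 ms interval and 1500 ms of processing per batch, the second batch is assigned offsets [1000, 2000) at t=2000 but does not start until t=2500, and the third is assigned [2000, 3000) at t=3000 but starts at t=4000. The ranges stay contiguous and are unaffected by the delay, which is why getting batches to finish within the interval removes the gap between "offsets fetched" and "processing started".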