Honestly I'd concentrate more on getting your batches to finish in a timely
fashion, so you won't even have the issue to begin with...

On Tue, Sep 1, 2015 at 10:16 AM, Shushant Arora <shushantaror...@gmail.com>
wrote:

> What if I use custom checkpointing. So that I can take care of offsets
> being checkpointed at end of each batch.
>
> Will it be possible then to reset the offset.
>
> On Tue, Sep 1, 2015 at 8:42 PM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> No, if you start arbitrarily messing around with offset ranges after
>> compute is called, things are going to get out of whack.
>>
>> e.g. checkpoints are no longer going to correspond to what you're
>> actually processing
>>
>> On Tue, Sep 1, 2015 at 10:04 AM, Shushant Arora <
>> shushantaror...@gmail.com> wrote:
>>
>>> can I reset the range based on some condition - before calling
>>> transformations on the stream.
>>>
>>> Say -
>>> before calling :
>>>  directKafkaStream.foreachRDD(new Function<JavaRDD<byte[][]>, Void>() {
>>>
>>> @Override
>>> public Void call(JavaRDD<byte[][]> v1) throws Exception {
>>> v1.foreachPartition(new  VoidFunction<Iterator<byte[][]>>{
>>> @Override
>>> public void call(Iterator<byte[][]> t) throws Exception {
>>> }});}});
>>>
>>> change directKafkaStream's RDD's offset range.(fromOffset).
>>>
>>> I can't do this in compute method since compute would have been called
>>> at current batch queue time - but condition is set at previous batch run
>>> time.
>>>
>>>
>>> On Tue, Sep 1, 2015 at 7:09 PM, Cody Koeninger <c...@koeninger.org>
>>> wrote:
>>>
>>>> It's at the time compute() gets called, which should be near the time
>>>> the batch should have been queued.
>>>>
>>>> On Tue, Sep 1, 2015 at 8:02 AM, Shushant Arora <
>>>> shushantaror...@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> In spark streaming 1.3 with kafka- when does driver bring latest
>>>>> offsets of this run - at start of each batch or at time when  batch gets
>>>>> queued ?
>>>>>
>>>>> Say few of my batches take longer time to complete than their batch
>>>>> interval. So some of batches will go in queue. Will driver waits for
>>>>>  queued batches to get started or just brings the latest offsets before
>>>>> they even actually started. And when they start running they will work on
>>>>> old offsets brought at time when they were queued.
>>>>>
>>>>>
>>>>
>>>
>>
>

Reply via email to