It’s not intermittent; it seems to happen every time. Spark fails when it starts up from the last checkpoint and complains that the offset is old. I checked the offset, and it is indeed true that the offset had expired on the Kafka side. My version of Spark is 2.4.4, using the Kafka 0.10 source.
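For context, the failure described above matches the Kafka source’s data-loss check: when records referenced by the checkpoint have already been aged out by Kafka retention, Spark aborts the query on restart unless `failOnDataLoss` is set to `false`. A minimal sketch of the relevant options for Spark 2.4 follows; the broker address, topic name, and paths are placeholders for illustration, not values from this thread:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("kafka-checkpoint-restart")
  .getOrCreate()

// failOnDataLoss=false lets the query continue past a checkpointed offset
// range that Kafka retention has already deleted, instead of failing the
// query. Records in the expired range are silently skipped, not recovered.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092") // placeholder
  .option("subscribe", "my-topic")                   // placeholder
  .option("failOnDataLoss", "false")
  .load()

stream.writeStream
  .format("parquet")
  .option("path", "/data/out")                            // placeholder
  .option("checkpointLocation", "/checkpoints/my-query")  // placeholder
  .start()
```

Note the trade-off: with `failOnDataLoss=false` the skipped records are lost for good, so if the data matters, the safer fix is usually to raise the topic’s `retention.ms` on the Kafka side so offsets outlive any expected job downtime.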
On Sun, Apr 19, 2020 at 3:38 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

> That sounds odd. Is it intermittent, or always reproducible if you start
> with the same checkpoint? What's the version of Spark?
>
> On Fri, Apr 17, 2020 at 6:17 AM Ruijing Li <liruijin...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have a question on how structured streaming does checkpointing. I’m
>> noticing that Spark is not reading from the max / latest offset it’s seen.
>> For example, in HDFS, I see it stored offset file 30, which contains
>> partition: offset {1: 2000}
>>
>> But after stopping the job and restarting it, I see it instead reads
>> from offset file 9, which contains {1: 1000}
>>
>> Can someone explain why Spark doesn’t take the max offset?
>>
>> Thanks.
>> --
>> Cheers,
>> Ruijing Li

--
Cheers,
Ruijing Li