Re: Understanding spark structured streaming checkpointing system

2020-04-19 Thread Ruijing Li
It’s not intermittent; it seems to happen every time. Spark fails when it starts up from the last checkpoint and complains that the offset is old. I checked the offset, and it has indeed expired on the Kafka side. My version of Spark is 2.4.4, using Kafka 0.10. On Sun, Apr 19, 2020 at 3:38 PM
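
A minimal sketch of the check described in this message, assuming you already have two partition-to-offset mappings in hand: one read from the checkpoint and one giving the earliest offset Kafka still retains. The function name `find_expired_partitions` and the earliest-offset value 3500 are hypothetical and used only for illustration; how the earliest offsets are fetched from Kafka is left out.

```python
def find_expired_partitions(checkpointed, earliest_available):
    """Return partitions whose checkpointed offset is older than what Kafka retains.

    Both arguments are dicts of partition -> offset. The result maps each
    affected partition to a (checkpointed_offset, earliest_offset) pair.
    """
    expired = {}
    for partition, offset in checkpointed.items():
        earliest = earliest_available.get(partition)
        # An offset below the earliest retained one has been removed by retention.
        if earliest is not None and offset < earliest:
            expired[partition] = (offset, earliest)
    return expired


# Checkpointed value taken from the thread; the Kafka-side value is made up.
checkpointed = {1: 2000}
earliest_available = {1: 3500}
expired = find_expired_partitions(checkpointed, earliest_available)
print(expired)
```

If the result is non-empty, the checkpoint points at data Kafka no longer holds. The Kafka source's `failOnDataLoss` option (default `true`) controls whether a query fails in that situation; whether that is what happens here is not confirmed by the thread.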

Re: Understanding spark structured streaming checkpointing system

2020-04-19 Thread Jungtaek Lim
That sounds odd. Is it intermittent, or always reproducible if you start with the same checkpoint? What's the version of Spark? On Fri, Apr 17, 2020 at 6:17 AM Ruijing Li wrote: > Hi all, > > I have a question on how structured streaming does checkpointing. I’m > noticing that spark is not reading

Understanding spark structured streaming checkpointing system

2020-04-16 Thread Ruijing Li
Hi all, I have a question on how structured streaming does checkpointing. I’m noticing that Spark is not reading from the max / latest offset it has seen. For example, in HDFS, I see that it stored offset file 30, which contains the partition-to-offset mapping {1: 2000}. But instead after stopping the job and
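
To inspect what a checkpoint offset file such as file 30 actually records, a small parser can help. This is only a sketch, assuming the file has been copied out of HDFS and follows the plain-text layout used by the Spark 2.x offset log: a version line such as `v1`, one JSON metadata line, then one JSON line per source (or `-` for a source with no offset). The function name `parse_offset_file`, the topic name `my-topic`, and the timestamp value are hypothetical.

```python
import json


def parse_offset_file(text):
    """Parse the text of a structured streaming offset file.

    Returns (metadata, sources), where sources holds one entry per source:
    a dict of topic -> {partition: offset}, or None if the source line is "-".
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    version, metadata_line, *source_lines = lines
    if not version.startswith("v"):
        raise ValueError("unexpected version line: %r" % version)
    metadata = json.loads(metadata_line)
    sources = []
    for ln in source_lines:
        if ln.strip() == "-":
            sources.append(None)
            continue
        raw = json.loads(ln)
        # JSON object keys are strings, so convert partition ids back to ints.
        sources.append(
            {topic: {int(p): off for p, off in parts.items()}
             for topic, parts in raw.items()}
        )
    return metadata, sources


# Hypothetical file content matching the {1: 2000} mapping from the message.
sample = """v1
{"batchWatermarkMs":0,"batchTimestampMs":1587000000000}
{"my-topic":{"1":2000}}
"""
metadata, sources = parse_offset_file(sample)
print(sources[0]["my-topic"][1])
```

Comparing the parsed offsets of the highest-numbered file in the `offsets` directory with those of the highest-numbered file in `commits` is one way to see which batch a restarted query is expected to resume from.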