Hello Spark community,

Please let me know if this is the appropriate place to ask this question; I will happily move it if not. I haven't been able to find an answer through the usual outlets.

I am currently implementing two custom readers for our projects (JMS / SQS) and have hit a problem whose root cause I can't pin down. I can't share the code right now, but I used this as boilerplate: https://github.com/hienluu/wikiedit-streaming/blob/master/streaming-receiver/src/main/scala/org/twitterstreaming/receiver/TwitterStreamingSource.scala

The problem is in my commit implementation: after the job has been running for about 30-60 minutes, the committed offsets appear to go out of order and the query fails with:

Caused by: java.lang.RuntimeException: Offsets committed out of order: 608799 followed by 2982

which is raised from line 206 of the file above (https://github.com/hienluu/wikiedit-streaming/blob/master/streaming-receiver/src/main/scala/org/twitterstreaming/receiver/TwitterStreamingSource.scala#L206).

I have a vague suspicion that this is related to Spark reloading checkpoints, but I have nothing concrete to confirm it. Has anyone else encountered this issue, or does anyone have guidance on what I may be doing wrong?

Thanks,
Taylor Cressy
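
P.S. For reference, my commit bookkeeping follows the boilerplate above, which I believe mirrors the pattern used by Spark's own socket/memory sources. Below is a simplified, hypothetical sketch of that pattern for the Spark 2.x V1 Source API (class and field names are mine, event ingestion and getBatch are stubbed out); it is not my actual code, just to show where the failing check lives:

import scala.collection.mutable.ListBuffer

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.{LongOffset, Offset, Source}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical sketch of the offset/commit bookkeeping in a custom V1 streaming Source.
class SketchSource(sqlContext: SQLContext) extends Source {

  // Offset assigned to the most recently buffered event (ingestion omitted here).
  @volatile private var currentOffset: LongOffset = LongOffset(-1)
  // Last offset the engine has told us it finished processing.
  @volatile private var lastOffsetCommitted: LongOffset = LongOffset(-1)
  // In-memory buffer of events that have not been committed yet.
  private val batches = new ListBuffer[String]

  override def schema: StructType = StructType(StructField("value", StringType) :: Nil)

  override def getOffset: Option[Offset] =
    if (currentOffset.offset == -1) None else Some(currentOffset)

  override def getBatch(start: Option[Offset], end: Offset): DataFrame = {
    // Omitted: the real source returns the buffered rows in the (start, end] range.
    sqlContext.emptyDataFrame
  }

  override def commit(end: Offset): Unit = synchronized {
    val newOffset = LongOffset.convert(end).getOrElse(
      sys.error(s"Unexpected offset type: ${end.getClass}"))

    val offsetDiff = (newOffset.offset - lastOffsetCommitted.offset).toInt
    if (offsetDiff < 0) {
      // This is the check that fires in my case: the engine asked to commit an
      // offset smaller than one that was already committed.
      sys.error(s"Offsets committed out of order: $lastOffsetCommitted followed by $end")
    }

    // Drop the committed events and remember how far the engine has gotten.
    batches.trimStart(offsetDiff)
    lastOffsetCommitted = newOffset
  }

  override def stop(): Unit = {}
}

As far as I can tell, that check can only fire if lastOffsetCommitted has gotten ahead of the offset the engine is committing, which is why I suspect something around checkpoint recovery (e.g. the source being re-created with its counters reset while the engine replays commits from the checkpoint) rather than the bookkeeping arithmetic itself, but I may well be reading it wrong.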
I am currently implementing two custom readers for our projects (JMS / SQS) and am experiencing a problem where I can’t determine the root cause. I can’t share the code out as of right now, but I followed this as boiler plate: https://github.com/hienluu/wikiedit-streaming/blob/master/streaming-receiver/src/main/scala/org/twitterstreaming/receiver/TwitterStreamingSource.scala The problem I am encountering is within my commit implementation – I seem to be getting commit ids out of order after the job runs for about 30-60 minutes. I am getting: Caused by: java.lang.RuntimeException: Offsets committed out of order: 608799 followed by 2982 From line 206 (https://github.com/hienluu/wikiedit-streaming/blob/master/streaming-receiver/src/main/scala/org/twitterstreaming/receiver/TwitterStreamingSource.scala#L206) I have a vague suspicion that it is related to Spark reloading checkpoints? But don’t have anything concrete to confirm my suspicion. Has anyone else encountered this issue? Or have any guidance on what I may be doing wrong? Thanks, Taylor Cressy