[ 
https://issues.apache.org/jira/browse/SPARK-26267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726973#comment-16726973
 ] 

ASF GitHub Bot commented on SPARK-26267:
----------------------------------------

zsxwing opened a new pull request #23365: [SPARK-26267][SS] Retry when 
detecting incorrect offsets from Kafka (2.4)
URL: https://github.com/apache/spark/pull/23365
 
 
   ## What changes were proposed in this pull request?
   
   Backport #23324 to branch-2.4.
   
   ## How was this patch tested?
   
   Jenkins



> Kafka source may reprocess data
> -------------------------------
>
>                 Key: SPARK-26267
>                 URL: https://issues.apache.org/jira/browse/SPARK-26267
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: Shixiong Zhu
>            Assignee: Shixiong Zhu
>            Priority: Blocker
>              Labels: correctness
>
> Due to KAFKA-7703, when the Kafka source tries to get the latest offset, it 
> may instead get the earliest offset. When it then gets the correct latest 
> offset in the next batch, it reprocesses messages that have already been 
> processed. This usually happens when restarting a streaming query.
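
The retry idea named in the pull request title can be illustrated with a minimal sketch. This is not the code from #23324 or #23365. The function name, the `known_offsets` parameter, the retry count, and the choice to raise an error once retries are exhausted are all assumptions made for illustration. The sketch treats a fetched "latest" offset as incorrect if it is lower than an offset that has already been processed, and fetches again in that case.

```python
from typing import Callable, Dict

Offsets = Dict[int, int]  # partition id -> offset


def fetch_latest_with_retry(
    fetch_latest: Callable[[], Offsets],  # hypothetical callback that asks Kafka for latest offsets
    known_offsets: Offsets,               # offsets already processed (assumed to be available)
    max_retries: int = 3,                 # assumed default, not taken from the patch
) -> Offsets:
    """Fetch latest offsets, retrying while any of them is behind a known offset."""
    for _ in range(max_retries + 1):
        latest = fetch_latest()
        # An offset lower than one already processed cannot be a valid "latest".
        regressed = [
            p for p, off in latest.items()
            if p in known_offsets and off < known_offsets[p]
        ]
        if not regressed:
            return latest
    # Giving up by raising is a choice of this sketch only.
    raise RuntimeError(f"latest offsets still behind known offsets: {regressed}")


# Simulated broker: the first answer is wrong (earliest), the second is correct.
responses = iter([{0: 0}, {0: 120}])
result = fetch_latest_with_retry(lambda: next(responses), known_offsets={0: 100})
```

Without the check, the first response would be accepted as the latest offset, and the jump to the correct offset in the next batch would replay records 0 to 100 that were already processed.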



