[jira] [Updated] (SPARK-23685) Spark Structured Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction)

sirisha (JIRA) Wed, 14 Mar 2018 12:08:26 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-23685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


sirisha updated SPARK-23685:
----------------------------
    Description: 
When Kafka performs log compaction, offsets often end up with gaps, meaning the 
next requested offset will frequently not be offset+1. The logic in 
KafkaSourceRDD & CachedKafkaConsumer assumes that the next offset will always 
be exactly 1 greater than the previous one. If it is not, the exception below 
is thrown:

 

"Cannot fetch records in [5589, 5693) (GroupId: XXX, TopicPartition:XXXX). Some 
data may have been lost because they are not available in Kafka any more; 
either the data was aged out by Kafka or the topic may have been deleted before 
all the data in the topic was processed. If you don't want your streaming query 
to fail on such cases, set the source option "failOnDataLoss" to "false". "
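
For reference, the option named in the message above could be set roughly as 
follows in PySpark. This is only a sketch: the broker address and topic name 
are placeholders, and build_kafka_stream is a hypothetical helper, not part of 
Spark. The report does not say whether disabling the check is a safe 
workaround for compacted topics; per the message, it only stops the query from 
failing.

```python
# Kafka source options. "failOnDataLoss" is the option named in the
# exception text; the broker address and topic name are placeholders.
kafka_options = {
    "kafka.bootstrap.servers": "localhost:9092",  # placeholder broker
    "subscribe": "compacted-topic",               # placeholder topic
    "failOnDataLoss": "false",
}


def build_kafka_stream(spark, options):
    """Hypothetical helper: build a streaming DataFrame from a Kafka source.

    `spark` is assumed to be an existing SparkSession with the Kafka
    source package on its classpath.
    """
    return spark.readStream.format("kafka").options(**options).load()
```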

 

FYI: This bug is related to https://issues.apache.org/jira/browse/SPARK-17147
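
The consecutive-offset assumption described in this report can be illustrated 
with a small toy model. This is not the actual KafkaSourceRDD or 
CachedKafkaConsumer code; read_strict and read_gap_tolerant are hypothetical 
names, and the gap-tolerant variant is just one possible approach, not 
necessarily the fix Spark adopted.

```python
from typing import Iterable, List


def read_strict(offsets: Iterable[int], start: int, end: int) -> List[int]:
    """Read [start, end), requiring every offset to be previous + 1."""
    expected = start
    out: List[int] = []
    for off in offsets:
        if off < start:
            continue
        if off >= end:
            break
        if off != expected:
            # A gap is treated as data loss, as in the reported behaviour.
            raise RuntimeError(f"Cannot fetch records in [{expected}, {end})")
        out.append(off)
        expected = off + 1
    return out


def read_gap_tolerant(offsets: Iterable[int], start: int, end: int) -> List[int]:
    """Read [start, end), accepting any offset at or past the expected one."""
    expected = start
    out: List[int] = []
    for off in offsets:
        if off < expected:
            continue
        if off >= end:
            break
        out.append(off)
        expected = off + 1
    return out


# Offsets that survive in a compacted partition (made-up values).
compacted = [5589, 5590, 5593, 5600, 5692]
```

With these made-up offsets, read_strict(compacted, 5589, 5693) raises as soon 
as it sees 5593 instead of 5591, while read_gap_tolerant returns all five 
surviving offsets.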

 

 

  was:
When Kafka performs log compaction, offsets often end up with gaps, meaning the 
next requested offset will frequently not be offset+1. The logic in 
KafkaSourceRDD & CachedKafkaConsumer assumes that the next offset will always 
be exactly 1 greater than the previous one. If it is not, the exception below 
is thrown:

 

"Cannot fetch records in [5589, 5693) (GroupId: XXX, TopicPartition:XXXX). Some 
data may have been lost because they are not available in Kafka any more; 
either the data was aged out by Kafka or the topic may have been deleted before 
all the data in the topic was processed. If you don't want your streaming query 
to fail on such cases, set the source option "failOnDataLoss" to "false". "

 

 


> Spark Structured Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive 
> Offsets (i.e. Log Compaction)
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23685
>                 URL: https://issues.apache.org/jira/browse/SPARK-23685
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.2.0
>            Reporter: sirisha
>            Priority: Major
>
> When Kafka performs log compaction, offsets often end up with gaps, meaning 
> the next requested offset will frequently not be offset+1. The logic in 
> KafkaSourceRDD & CachedKafkaConsumer assumes that the next offset will always 
> be exactly 1 greater than the previous one. If it is not, the exception below 
> is thrown:
>  
> "Cannot fetch records in [5589, 5693) (GroupId: XXX, TopicPartition:XXXX). 
> Some data may have been lost because they are not available in Kafka any 
> more; either the data was aged out by Kafka or the topic may have been 
> deleted before all the data in the topic was processed. If you don't want 
> your streaming query to fail on such cases, set the source option 
> "failOnDataLoss" to "false". "
>  
> FYI: This bug is related to https://issues.apache.org/jira/browse/SPARK-17147
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

