[ 
https://issues.apache.org/jira/browse/SPARK-52096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17951085#comment-17951085
 ] 

Aparna Garg commented on SPARK-52096:
-------------------------------------

User 'eason-yuchen-liu' has created a pull request for this issue:
https://github.com/apache/spark/pull/50866

> Reclassify kafka source offset assertion error
> ----------------------------------------------
>
>                 Key: SPARK-52096
>                 URL: https://issues.apache.org/jira/browse/SPARK-52096
>             Project: Spark
>          Issue Type: Improvement
>          Components: Connect, Structured Streaming
>    Affects Versions: 4.0.0, 4.1.0
>            Reporter: Yuchen Liu
>            Priority: Major
>
> This assertion error could be due to user error or kafka internal error. 
> There are different possible input cases that can lead to this. Here is a 
> list in the time order that we can catch them:
>  # If the user specifies startingOffset and endingOffset in a comparable way 
> (either by index or timestamp), we can compare them before start the query.
>  # If the user provides startingOffset and endingOffset but incomparable, we 
> should compare them as soon as they are resolved according to the kafka 
> brokers.
>  # If the user sets startingOffset to some value and endingOffset to latest, 
> we should compare them after the query has started and the latest offset has 
> been fetched.
> This way we have a better separation of user error versus system error.
>  
> {code:java}
> java.lang.AssertionError: assertion failed: Beginning offset 49041276637 is 
> after the ending offset 48774253518 for topic xxx partition 1. You either 
> provided an invalid fromOffset, or the Kafka topic has been damaged
>       at scala.Predef$.assert(Predef.scala:223)
>       at 
> org.apache.spark.sql.kafka010.KafkaSourceRDD.compute(KafkaSourceRDD.scala:79)
>       at 
> org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409)
>       at 
> com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406) 
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to