[ https://issues.apache.org/jira/browse/SPARK-52096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17951085#comment-17951085 ]
Aparna Garg commented on SPARK-52096: ------------------------------------- User 'eason-yuchen-liu' has created a pull request for this issue: https://github.com/apache/spark/pull/50866 > Reclassify kafka source offset assertion error > ---------------------------------------------- > > Key: SPARK-52096 > URL: https://issues.apache.org/jira/browse/SPARK-52096 > Project: Spark > Issue Type: Improvement > Components: Connect, Structured Streaming > Affects Versions: 4.0.0, 4.1.0 > Reporter: Yuchen Liu > Priority: Major > > This assertion error could be due to user error or kafka internal error. > There are different possible input cases that can lead to this. Here is a > list in the time order that we can catch them: > # If the user specifies startingOffset and endingOffset in a comparable way > (either by index or timestamp), we can compare them before start the query. > # If the user provides startingOffset and endingOffset but incomparable, we > should compare them as soon as they are resolved according to the kafka > brokers. > # If the user sets startingOffset to some value and endingOffset to latest, > we should compare them after the query has started and the latest offset has > been fetched. > This way we have a better separation of user error versus system error. > > {code:java} > java.lang.AssertionError: assertion failed: Beginning offset 49041276637 is > after the ending offset 48774253518 for topic xxx partition 1. You either > provided an invalid fromOffset, or the Kafka topic has been damaged > at scala.Predef$.assert(Predef.scala:223) > at > org.apache.spark.sql.kafka010.KafkaSourceRDD.compute(KafkaSourceRDD.scala:79) > at > org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409) > at > com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406) > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org