[ https://issues.apache.org/jira/browse/SPARK-24453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-24453:
------------------------------------

    Assignee: Apache Spark  (was: Tathagata Das)

> Fix error recovering from the failure in a no-data batch
> --------------------------------------------------------
>
>                  Key: SPARK-24453
>                  URL: https://issues.apache.org/jira/browse/SPARK-24453
>              Project: Spark
>           Issue Type: Bug
>           Components: Structured Streaming
>     Affects Versions: 2.4.0
>             Reporter: Tathagata Das
>             Assignee: Apache Spark
>             Priority: Major
>
> ```
> java.lang.AssertionError: assertion failed: Concurrent update to the log.
> Multiple streaming jobs detected for 159897
> ```
> The error occurs when recovering from a failure in a no-data batch (say X)
> that has been planned (i.e. written to the offset log) but not executed
> (i.e. not written to the commit log). Upon recovery, the following sequence
> of events happens:
> - `MicroBatchExecution.populateStartOffsets` sets `currentBatchId` to X.
>   Since there was no data in the batch, `availableOffsets` is the same as
>   `committedOffsets`, so `isNewDataAvailable` is false.
> - When `MicroBatchExecution.constructNextBatch` is called, it should
>   immediately return true because the next batch has already been
>   constructed. However, the check for whether the batch had been constructed
>   was `if (isNewDataAvailable) return true`. Since the planned batch is a
>   no-data batch, it escapes this check, and the same batch X is planned once
>   again. If new data has arrived since the failure, a new batch is planned
>   instead, its offsets are written to the `offsetLog` as batchId X, and the
>   write fails with the error above.
> The correct fix is to check the offset log to determine whether
> `currentBatchId` is already the latest planned batch.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
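The failure mode and proposed fix described above can be illustrated with a minimal, self-contained Scala sketch. This is not Spark's actual code: the `OffsetLog` class, its `add`/`getLatestBatchId` methods, and the two check functions are simplified stand-ins chosen only to mirror the behavior the issue describes (a log write that rejects a duplicate batchId, a check based on `isNewDataAvailable`, and a corrected check based on the latest logged batch).

```scala
// Minimal sketch of the SPARK-24453 failure mode; all names below are
// illustrative stand-ins, not Spark's real classes or method signatures.
object NoDataBatchRecoverySketch {
  // Toy stand-in for Spark's offset log: batchId -> planned offsets.
  // `add` refuses to overwrite an existing entry, mimicking the
  // "Concurrent update to the log" assertion in the real log.
  final class OffsetLog {
    private var entries = Map.empty[Long, String]
    def add(batchId: Long, offsets: String): Boolean =
      if (entries.contains(batchId)) false
      else { entries += batchId -> offsets; true }
    def getLatestBatchId: Option[Long] =
      if (entries.isEmpty) None else Some(entries.keys.max)
  }

  // Buggy check: relies only on whether new data is available, so an
  // already-planned no-data batch looks "not yet constructed".
  def isBatchConstructedBuggy(isNewDataAvailable: Boolean): Boolean =
    isNewDataAvailable

  // Fixed check: consult the offset log; if currentBatchId is already the
  // latest planned batch, it was constructed before the failure.
  def isBatchConstructedFixed(log: OffsetLog, currentBatchId: Long): Boolean =
    log.getLatestBatchId.contains(currentBatchId)

  def main(args: Array[String]): Unit = {
    val log = new OffsetLog
    val batchX = 159897L
    log.add(batchX, "plannedOffsets") // batch X planned before the crash

    // After recovery: no new data, and batch X is already in the offset log.
    // The buggy check says "not constructed", so batch X gets replanned.
    assert(!isBatchConstructedBuggy(isNewDataAvailable = false))
    // The fixed check sees batchX as the latest logged batch and skips
    // replanning it.
    assert(isBatchConstructedFixed(log, batchX))
    // Replanning X would attempt a second write for the same batchId,
    // which the log rejects -- the analogue of the AssertionError above.
    assert(!log.add(batchX, "newOffsets"))
    println("ok")
  }
}
```

Under these assumptions, the fixed check makes recovery idempotent: a planned-but-uncommitted no-data batch is re-executed rather than re-planned, so no duplicate write to the offset log occurs.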