[ https://issues.apache.org/jira/browse/SPARK-24453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500770#comment-16500770 ]
Apache Spark commented on SPARK-24453:
--------------------------------------

User 'tdas' has created a pull request for this issue:
https://github.com/apache/spark/pull/21491

> Fix error recovering from the failure in a no-data batch
> --------------------------------------------------------
>
>                 Key: SPARK-24453
>                 URL: https://issues.apache.org/jira/browse/SPARK-24453
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Major
>
> ```
> java.lang.AssertionError: assertion failed: Concurrent update to the log.
> Multiple streaming jobs detected for 159897
> ```
> The error occurs when we are recovering from a failure in a no-data batch
> (say X) that has been planned (i.e. written to the offset log) but not
> executed (i.e. not written to the commit log). Upon recovery, the following
> sequence of events happens.
> - `MicroBatchExecution.populateStartOffsets` sets `currentBatchId` to X.
> Since there was no data in the batch, `availableOffsets` is the same as
> `committedOffsets`, so `isNewDataAvailable` is false.
> - When `MicroBatchExecution.constructNextBatch` is called, ideally it should
> immediately return true because the next batch has already been constructed.
> However, the check of whether the batch has been constructed was `if
> (isNewDataAvailable) return true`. Since the planned batch is a no-data
> batch, it escaped this check and proceeded to plan the same batch X once
> again. If new data has arrived since the failure, it plans a new batch,
> tries to write the new offsets to the `offsetLog` as batchId X, and fails
> with the above error.
> The correct solution is to check the offset log to see whether
> `currentBatchId` is the latest batch or not.
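
For illustration only, the recovery sequence described above can be reproduced with a small toy model. This is a sketch in Python, not Spark's Scala code: `ToyLog`, `recovered_offset_log`, `construct_next_batch`, the `use_fix` flag, and the batch ids and offsets are all hypothetical names and values chosen for the example. Only the two checks being compared (`isNewDataAvailable` versus "is `currentBatchId` the latest entry in the offset log") come from the issue description.

```python
class ToyLog:
    """Append-only log keyed by batch id; a hypothetical stand-in for the offset log."""

    def __init__(self):
        self._entries = {}

    def add(self, batch_id, offsets):
        # Refuse to overwrite an existing batch, mirroring the reported assertion.
        if batch_id in self._entries:
            raise AssertionError(
                "Concurrent update to the log. "
                f"Multiple streaming jobs detected for {batch_id}"
            )
        self._entries[batch_id] = offsets

    def latest_batch_id(self):
        return max(self._entries) if self._entries else None

    def get(self, batch_id):
        return self._entries.get(batch_id)


def recovered_offset_log():
    """State after the failure: batch 5 (X) was planned but never committed."""
    log = ToyLog()
    log.add(4, 10)  # batch 4: planned and committed, source at offset 10
    log.add(5, 10)  # batch 5: no-data batch, same offset as batch 4
    return log


def construct_next_batch(offset_log, current_batch_id, available, committed,
                         source_offset, use_fix):
    """Return True if a batch is ready to run (toy version of the logic at issue)."""
    is_new_data_available = available != committed
    if use_fix:
        # Fixed check: the batch is already planned if it is the latest log entry.
        already_planned = offset_log.latest_batch_id() == current_batch_id
    else:
        # Old check: misses an already-planned batch that carries no data.
        already_planned = is_new_data_available
    if already_planned:
        return True
    if source_offset != committed:
        # New data arrived: plan it under current_batch_id.
        offset_log.add(current_batch_id, source_offset)
        return True
    return False


if __name__ == "__main__":
    # After recovery: currentBatchId = 5, available == committed == 10,
    # and the source has since advanced to offset 12.
    print(construct_next_batch(recovered_offset_log(), 5, 10, 10, 12, use_fix=True))
    try:
        construct_next_batch(recovered_offset_log(), 5, 10, 10, 12, use_fix=False)
    except AssertionError as err:
        print(err)
```

With `use_fix=False` the toy model tries to write batch 5 a second time and raises the assertion; with `use_fix=True` it recognises that batch 5 is already the latest planned batch and leaves the log untouched.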