[
https://issues.apache.org/jira/browse/SPARK-43951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728736#comment-17728736
]
Adam Binford commented on SPARK-43951:
--------------------------------------
Of course, as soon as I finished figuring all this out, I found
https://github.com/apache/spark/pull/41089
> RocksDB state store can become corrupt on task retries
> ------------------------------------------------------
>
> Key: SPARK-43951
> URL: https://issues.apache.org/jira/browse/SPARK-43951
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Adam Binford
> Priority: Major
>
> A couple of our streaming jobs have failed since upgrading to Spark 3.4 with
> an error such as:
> org.rocksdb.RocksDBException: Mismatch in unique ID on table file ###.
> Expected: {###,###} Actual{###,###} in file ..../MANIFEST-####
> This is due to the change from
> [https://github.com/facebook/rocksdb/commit/6de7081cf37169989e289a4801187097f0c50fae]
> that enabled unique ID checks by default, and I finally tracked down the
> exact sequence of steps that leads to this failure in the way RocksDB state
> store is used.
> # A task fails after uploading the checkpoint to HDFS. Let's say it
> successfully uploaded 11.zip for version 11 of the table, but the task
> failed before it could finish.
> # The same task is retried and goes back to load version 10 of the table as
> expected.
> # Cleanup/maintenance is called for this partition, which looks in HDFS for
> persisted versions and sees up through version 11 since that zip file was
> successfully uploaded on the previous task.
> # As part of resolving what SST files are part of each table version,
> versionToRocksDBFiles.put(version, newResolvedFiles) is called for version 11
> with its SST files that were uploaded in the first failed task.
> # The second attempt at the task commits and goes to sync its checkpoint to
> HDFS.
> # versionToRocksDBFiles contains the SST files to upload from step 4, and
> these files are considered "the same" as what's in the local working dir
> because the name and file size match.
> # No SST files are uploaded because they matched above, but in reality the
> unique ID inside the SST files is different (presumably this is just randomly
> generated and inserted into each SST file?); it just doesn't affect the size.
> # A new METADATA file is uploaded which has the new unique IDs listed inside.
> # When version 11 of the table is read during the next batch, the unique IDs
> in the METADATA file don't match the unique IDs in the SST files, which
> causes the exception.
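
The sequence above can be reproduced in a toy model. This is only a sketch in
plain Python, not Spark or RocksDB code: SstFile, simulate, the file name
000042.sst, and the ID strings are all made up for illustration, and the
"unique ID" is modeled as an ordinary field rather than bytes inside the file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SstFile:
    name: str
    size: int
    unique_id: str  # stand-in for the ID embedded in each SST file


def simulate():
    # Step 1: the first attempt uploads version 11, then the task fails.
    remote_sst = {"000042.sst": SstFile("000042.sst", 1024, "id-attempt-1")}

    # Steps 3-4: maintenance sees version 11 in remote storage and records
    # its SST files in the version-to-files map.
    version_to_files = {11: list(remote_sst.values())}

    # Steps 2 and 5: the retry rebuilds from version 10 and produces a file
    # with the same name and size but a different embedded ID.
    local = SstFile("000042.sst", 1024, "id-attempt-2")

    # Steps 6-7: the "same file" check compares only name and size,
    # so the upload of the new SST file is skipped.
    known = {f.name: f for f in version_to_files[11]}
    prior = known.get(local.name)
    if prior is None or prior.size != local.size:
        remote_sst[local.name] = local

    # Step 8: fresh metadata listing the retry's IDs is always uploaded.
    remote_metadata = {local.name: local.unique_id}

    # Step 9: on the next load, metadata and SST contents disagree.
    return [name for name, uid in remote_metadata.items()
            if remote_sst[name].unique_id != uid]


print(simulate())
```

The returned list names every file whose recorded ID no longer matches the
stored file, which is the condition that surfaces as the exception in step 9.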
>
> This is basically a ticking time bomb for anyone using RocksDB. Thoughts on
> possible fixes would be:
> * Disable unique ID verification. I don't currently see a binding for this
> in the RocksDB java wrapper, so that would probably have to be added first.
> * Disable checking if files are already uploaded with the same size, and
> just always upload SST files no matter what.
> * Update the "same file" check to also be able to do some kind of CRC
> comparison or something like that.
> * Update the maintenance/cleanup to not update the versionToRocksDBFiles map.
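
As a rough illustration of the CRC option in the list above, the sketch below
extends a name-and-size comparison with a checksum of the file contents. It is
written in plain Python rather than against Spark's code; FileInfo, describe,
is_same_file, and the sample byte strings are hypothetical, and zlib.crc32 is
just one possible checksum.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    name: str
    size: int
    crc32: int


def describe(name: str, content: bytes) -> FileInfo:
    # Record the checksum alongside the name and size when a file is tracked.
    return FileInfo(name, len(content), zlib.crc32(content))


def is_same_file(a: FileInfo, b: FileInfo) -> bool:
    # Name and size alone are not enough; also require matching content CRCs.
    return a.name == b.name and a.size == b.size and a.crc32 == b.crc32


# Two attempts produce files of equal length whose only difference is the
# embedded ID (made-up payloads).
first = describe("000042.sst", b"rows|unique-id=AAAA")
retry = describe("000042.sst", b"rows|unique-id=BBBB")

print(first.size == retry.size)    # sizes match
print(is_same_file(first, retry))  # checksum tells them apart
```

The trade-off is that a checksum has to be computed or stored for every
tracked file, whereas always re-uploading SST files avoids the comparison at
the cost of extra writes.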