[ https://issues.apache.org/jira/browse/SPARK-43951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728736#comment-17728736 ]
Adam Binford commented on SPARK-43951:
--------------------------------------

Of course, as soon as I finished figuring all this out, I found https://github.com/apache/spark/pull/41089

> RocksDB state store can become corrupt on task retries
> ------------------------------------------------------
>
>                 Key: SPARK-43951
>                 URL: https://issues.apache.org/jira/browse/SPARK-43951
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Adam Binford
>            Priority: Major
>
> A couple of our streaming jobs have failed since upgrading to Spark 3.4 with an error such as:
>
> org.rocksdb.RocksDBException: Mismatch in unique ID on table file ###. Expected: {###,###} Actual: {###,###} in file ..../MANIFEST-####
>
> This is due to the change from https://github.com/facebook/rocksdb/commit/6de7081cf37169989e289a4801187097f0c50fae, which enabled unique ID checks by default. I finally tracked down the exact sequence of steps that leads to this failure in the way the RocksDB state store is used:
> # A task fails after uploading its checkpoint to HDFS. Say it successfully uploaded 11.zip for version 11 of the table, but the task failed before it could finish.
> # The same task is retried and goes back to load version 10 of the table, as expected.
> # Cleanup/maintenance is called for this partition, which looks in HDFS for persisted versions and sees up through version 11, since that zip file was successfully uploaded by the previous task attempt.
> # As part of resolving which SST files belong to each table version, versionToRocksDBFiles.put(version, newResolvedFiles) is called for version 11 with the SST files that were uploaded by the first, failed attempt.
> # The second attempt at the task commits and goes to sync its checkpoint to HDFS.
> # versionToRocksDBFiles contains the SST files recorded in step 4, and these files are considered "the same" as what's in the local working directory because the name and file size match (see the first sketch below).
> # No SST files are uploaded because they matched above, but in reality the unique ID inside the SST files is different (presumably this is just randomly generated and inserted into each SST file?); it just doesn't affect the file size.
> # A new METADATA file is uploaded, which lists the new unique IDs.
> # When version 11 of the table is read during the next batch, the unique IDs in the METADATA file don't match the unique IDs in the SST files, which causes the exception.
>
> This is basically a ticking time bomb for anyone using RocksDB. Possible fixes would be:
> * Disable unique ID verification. I don't currently see a binding for this in the RocksDB Java wrapper, so that would probably have to be added first.
> * Disable checking whether files are already uploaded with the same size, and just always upload SST files no matter what.
> * Update the "same file" check to also do some kind of CRC comparison or something like that (second sketch below).
> * Update the maintenance/cleanup to not update the versionToRocksDBFiles map (third sketch below).
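>
> To illustrate step 6: a minimal sketch, in the spirit of what RocksDBFileManager does. The class and method names here are approximations for illustration, not the actual Spark code; the point is that only the file name and size are compared, never the content:
> {code:scala}
> // Approximation of a DFS file record; the real code tracks similar fields.
> case class DfsSstFile(localFileName: String, dfsSstFileName: String, sizeBytes: Long)
>
> // dfsFiles: files recorded in versionToRocksDBFiles by maintenance (step 4)
> def findAlreadyUploaded(
>     localFileName: String,
>     localSizeBytes: Long,
>     dfsFiles: Seq[DfsSstFile]): Option[DfsSstFile] = {
>   // Name + size is the whole equality test. A retried task regenerates an
>   // SST file with the same name and size but a different embedded unique ID,
>   // so it is wrongly treated as already uploaded and the upload is skipped.
>   dfsFiles.find(f => f.localFileName == localFileName && f.sizeBytes == localSizeBytes)
> }
> {code}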
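>
> For the CRC idea: a rough sketch of a checksum helper. This is hypothetical; making it work would also require persisting the checksum alongside the name and size in the version metadata:
> {code:scala}
> import java.io.FileInputStream
> import java.util.zip.CRC32
>
> // Compute a CRC32 over a local file so the "same file" check can compare
> // content, not just name and size.
> def crc32Of(path: String): Long = {
>   val crc = new CRC32()
>   val in = new FileInputStream(path)
>   try {
>     val buf = new Array[Byte](8192)
>     var n = in.read(buf)
>     while (n != -1) {
>       crc.update(buf, 0, n)
>       n = in.read(buf)
>     }
>   } finally {
>     in.close()
>   }
>   crc.getValue
> }
>
> // A file would then be skipped only if name, size, AND checksum all match.
> {code}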
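>
> And for the last option: one possible shape of a guard on the map update during maintenance, so a leftover upload from a failed attempt can't shadow the files a retry is about to re-upload. Everything here is hypothetical (including lastCommittedVersion, which stands for whatever version this store instance last successfully committed); the exact condition would need more thought:
> {code:scala}
> import java.util.concurrent.ConcurrentHashMap
>
> // Simplified stand-in for the real map, which tracks SST file metadata.
> val versionToRocksDBFiles = new ConcurrentHashMap[Long, Seq[String]]()
>
> // Only record resolved files for versions at or below the last version this
> // store instance successfully committed; skip newer versions seen in HDFS,
> // which may have been left behind by a failed task attempt (steps 1-4).
> def recordResolvedFiles(
>     version: Long,
>     newResolvedFiles: Seq[String],
>     lastCommittedVersion: Long): Unit = {
>   if (version <= lastCommittedVersion) {
>     versionToRocksDBFiles.put(version, newResolvedFiles)
>   }
> }
> {code}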