zsxwing opened a new pull request #33749:
URL: https://github.com/apache/spark/pull/33749


   ### What changes were proposed in this pull request?
   
   RocksDB provides backward compatibility but does not always provide 
forward compatibility. Storing the RocksDB format version in the checkpoint 
gives us the information we need to provide a rollback guarantee when a new 
Spark version upgrades to a RocksDB version that introduces an incompatible 
format change.
   
   A typical case: a user upgrades their query to a new Spark version whose 
RocksDB version may use a new format. The user then hits a bug and decides to 
roll back, but the old RocksDB version in the old Spark version cannot read 
the new format.
   
   To handle this case, we write the RocksDB format version to the 
checkpoint. When restarting from a checkpoint, we force RocksDB to use the 
format version stored in the checkpoint. This ensures the user can roll back 
their Spark version if needed.
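
The write-then-restore behaviour described above can be sketched as follows. This is a minimal illustration in plain Python, not the code in this PR: the metadata file name, the `formatVersion` field, the default value, and both helper functions are hypothetical and exist only to show the idea of pinning the format version to whatever the checkpoint recorded.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical file name, chosen for illustration only.
METADATA_FILE = "rocksdb-metadata.json"
# Assumed default format version of the currently bundled RocksDB.
DEFAULT_FORMAT_VERSION = 5


def write_format_version(checkpoint_dir: Path, version: int) -> None:
    """Record the format version alongside the checkpoint."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    payload = json.dumps({"formatVersion": version})
    (checkpoint_dir / METADATA_FILE).write_text(payload)


def format_version_for_restart(checkpoint_dir: Path) -> int:
    """Return the stored version if one exists, else the current default."""
    path = checkpoint_dir / METADATA_FILE
    if path.exists():
        return int(json.loads(path.read_text())["formatVersion"])
    return DEFAULT_FORMAT_VERSION


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = Path(tmp) / "checkpoint"
        # A fresh query has no stored version, so the default applies.
        print(format_version_for_restart(ckpt))
        # A checkpoint written with an older format keeps that format on restart.
        write_format_version(ckpt, 4)
        print(format_version_for_restart(ckpt))
```

Because the version is read back from the checkpoint rather than taken from the running binary, an older Spark version restarting from the same checkpoint would still find data in a format it can read.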
   
   We also provide a config 
`spark.sql.streaming.stateStore.rocksdb.formatVersion` that lets users who 
don't need to roll back their Spark version override the format version 
specified in the checkpoint.
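
One way to read the precedence described above is: an explicitly set config value wins, then the version stored in the checkpoint, then the default. The sketch below models that reading in plain Python. Only the config key comes from this description; the function name, the dictionary standing in for the session conf, and the default value are assumptions for illustration and may not match how Spark resolves the setting internally.

```python
from typing import Mapping, Optional

# Config key taken from the PR description.
CONF_KEY = "spark.sql.streaming.stateStore.rocksdb.formatVersion"
# Assumed default format version of the currently bundled RocksDB.
DEFAULT_FORMAT_VERSION = 5


def resolve_format_version(
    user_conf: Mapping[str, str],
    checkpoint_version: Optional[int],
) -> int:
    """Pick the format version to open RocksDB with (hypothetical helper)."""
    # 1. An explicit user setting overrides whatever the checkpoint recorded.
    if CONF_KEY in user_conf:
        return int(user_conf[CONF_KEY])
    # 2. Otherwise stay on the checkpoint's version so rollback remains possible.
    if checkpoint_version is not None:
        return checkpoint_version
    # 3. A brand-new query with no checkpoint uses the default.
    return DEFAULT_FORMAT_VERSION


if __name__ == "__main__":
    print(resolve_format_version({}, None))
    print(resolve_format_version({}, 4))
    print(resolve_format_version({CONF_KEY: "5"}, 4))
```

Under this reading, a user who sets the config explicitly accepts that an older Spark version may no longer be able to read the state.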
   
   ### Why are the changes needed?
   
   Provide the Spark version rollback guarantee for streaming queries when a 
new RocksDB version introduces an incompatible format change.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. The RocksDB state store is a new feature in Spark 3.2, which has not 
yet been released.
   
   ### How was this patch tested?
   
   The new unit tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


