pvary opened a new issue #1496: URL: https://github.com/apache/iceberg/issues/1496
We discussed in #1465 that it would be good to find a way to update the version-hint.txt atomically:

> The version file should not be corrupt and we should make sure of it by changing how we write the file. Since this is for HDFS, creating the file with an atomic rename to ensure the entire file is written before making it the current version hint makes sense to me. That way we don't get dirty reads.

I checked a few things, and here is what I found in the Hadoop FileSystem specification (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_rename.28Path_src.2C_Path_d.29):

```
Destination exists and is a file
Renaming a file atop an existing file is specified as failing, raising an exception.
- Local FileSystem : the rename succeeds; the destination file is replaced by the source file.
- HDFS : The rename fails, no exception is raised. Instead the method call simply returns false.
```

Since a rename cannot replace an existing file on HDFS, I think our best solution would be:

1. Create a new file
2. Delete the old version-hint.txt
3. Move the new file to version-hint.txt

This means there will be a short window (between steps 2 and 3) during which version-hint.txt does not exist. During that window, readers would either fall back to the listing introduced in #1465 to determine the version, or, alternatively, we could add retry logic to the read so that it waits until the new version-hint.txt is in place.

Your thoughts @jacques-n, @fbocse, @rdblue?

Thanks,
Peter
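The three-step update proposed above, together with the listing fallback for readers, could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses `java.nio.file` on a local directory as a stand-in for Hadoop's `FileSystem`, the class and method names (`VersionHintSketch`, `writeVersionHint`, `readVersionHint`, `latestVersionByListing`) are hypothetical, and the fallback assumes metadata files named `v<N>.metadata.json`, which may differ from what #1465 actually does.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.UUID;
import java.util.stream.Stream;

// Hypothetical sketch; java.nio.file stands in for Hadoop's FileSystem here.
final class VersionHintSketch {
  private static final String HINT = "version-hint.txt";

  // Steps 1-3 of the proposal: write a new file, delete the old hint, move the new file into place.
  static void writeVersionHint(Path metadataDir, int version) {
    Path hint = metadataDir.resolve(HINT);
    Path temp = metadataDir.resolve(UUID.randomUUID() + "-version-hint.temp");
    try {
      // 1. Write the full content to a new file, so readers never see a partial hint.
      Files.write(temp, String.valueOf(version).getBytes(StandardCharsets.UTF_8));
      // 2. Delete the old hint, because a rename onto an existing file fails on HDFS.
      Files.deleteIfExists(hint);
      // 3. Move the new file into place. Without REPLACE_EXISTING this throws
      //    if another writer recreated the hint in the meantime.
      Files.move(temp, hint);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reads the hint; if it is missing (the window between steps 2 and 3), falls back to listing.
  static int readVersionHint(Path metadataDir) {
    Path hint = metadataDir.resolve(HINT);
    try {
      String content = new String(Files.readAllBytes(hint), StandardCharsets.UTF_8);
      return Integer.parseInt(content.trim());
    } catch (NoSuchFileException e) {
      return latestVersionByListing(metadataDir);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Assumed fallback: the highest N among files named v<N>.metadata.json, or 0 if none exist.
  static int latestVersionByListing(Path metadataDir) {
    try (Stream<Path> files = Files.list(metadataDir)) {
      return files
          .map(p -> p.getFileName().toString())
          .filter(n -> n.matches("v\\d+\\.metadata\\.json"))
          .mapToInt(n -> Integer.parseInt(n.substring(1, n.indexOf('.'))))
          .max()
          .orElse(0);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With Hadoop's `FileSystem`, step 3 would need to check the boolean returned by `rename` instead of relying on an exception, since the specification quoted above says HDFS returns false when the destination exists. The retry alternative would replace the listing fallback in `readVersionHint` with a short, bounded loop that re-reads the hint file.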
