BsoBird commented on code in PR #9546:
URL: https://github.com/apache/iceberg/pull/9546#discussion_r1476118577
##########
core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java:
##########
@@ -289,23 +341,58 @@ Path versionHintFile() {
     return metadataPath(Util.VERSION_HINT_FILENAME);
   }
-  private void writeVersionHint(int versionToWrite) {
-    Path versionHintFile = versionHintFile();
-    FileSystem fs = getFileSystem(versionHintFile, conf);
-
+  @VisibleForTesting
+  boolean writeVersionHint(Integer versionToWrite, Path finalMetadataFile) {
+    boolean deleteSuccess = false;
     try {
-      Path tempVersionHintFile = metadataPath(UUID.randomUUID().toString() + "-version-hint.temp");
+      Path versionHintFile = versionHintFile();
+      FileSystem fs = getFileSystem(versionHintFile, conf);
+      Path tempVersionHintFile = metadataPath(UUID.randomUUID() + "-version-hint.temp");
       writeVersionToPath(fs, tempVersionHintFile, versionToWrite);
-      fs.delete(versionHintFile, false /* recursive delete */);
-      fs.rename(tempVersionHintFile, versionHintFile);
-    } catch (IOException e) {
-      LOG.warn("Failed to update version hint", e);
+      deleteSuccess = dropOldVersionHint(fs, versionHintFile);
+      if (!deleteSuccess) {
+        throw new RuntimeException("Can't delete version hint, something wrong with the file system?");
+      }
+      return renameVersionHint(fs, tempVersionHintFile, versionHintFile);
+    } catch (Exception e) {
+      // This is the most dangerous scenario. There are two possibilities:
+      // 1. We wrote metadata-v+1.json but did not get to update the version hint, so the
+      // hint still points at the old version. If we do nothing, every later commit to this
+      // table will fail: the version hint points to the old version, but metadata-v+1.json
+      // already exists, so the rename will fail.
+      // 2. We deleted the version hint but did not get to write a new one. Because the hint
+      // is lost, all clients must traverse the metadata-*.json files to find the latest
+      // version; in effect, as long as the version hint was deleted, the commit has
+      // actually succeeded.
+      // For case 1, we had better clean up metadata-v+1.json so that the next commit
+      // succeeds.
+      // For case 2, since the commit succeeded, we must not clean up the data files
+      // associated with it; we need to throw a CommitStateUnknownException to skip the
+      // data file cleanup.
+      if (!deleteSuccess) {
+        io().deleteFile(finalMetadataFile.toString());
Review Comment:
@RussellSpitzer
Perhaps we should change the order of execution from ```writeMeta -> deleteHint -> writeHint``` to ```deleteHint -> writeMeta -> writeHint```. In reality the version hint is just an index, and by the time the metadata write succeeds, the commit has actually succeeded. Changing the order of execution would eliminate the CommitStateUnknownException and make the HadoopCatalog implementation conform to the spec.
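
For illustration, here is a minimal sketch of the reordered flow described above. The helper names (`dropOldVersionHint`, `writeVersionToPath`, `renameVersionHint`) mirror the ones in this diff, and `commitNewMetadata` is a hypothetical stand-in for the metadata rename; this is a sketch of the idea, not the actual patch:

```java
import java.io.IOException;
import java.util.UUID;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: deleteHint -> writeMeta -> writeHint, as proposed above.
abstract class ReorderedCommitSketch {

  void commit(FileSystem fs, Path versionHintFile, int versionToWrite) throws IOException {
    // 1. Delete the hint first. Losing the hint is recoverable: readers fall back to
    //    scanning the metadata-*.json files for the latest version.
    if (!dropOldVersionHint(fs, versionHintFile)) {
      throw new IOException("Failed to delete version hint; aborting before writing metadata");
    }

    // 2. Write (rename into place) the new metadata file. Once this succeeds, the commit
    //    has succeeded, no matter what happens to the hint afterwards.
    commitNewMetadata(versionToWrite);

    // 3. Rewrite the hint. A failure here only degrades version lookup; it can no longer
    //    leave the table in an unknown commit state.
    Path temp = new Path(versionHintFile.getParent(), UUID.randomUUID() + "-version-hint.temp");
    writeVersionToPath(fs, temp, versionToWrite);
    renameVersionHint(fs, temp, versionHintFile);
  }

  // Hypothetical helpers, assumed to behave as in the diff above.
  abstract boolean dropOldVersionHint(FileSystem fs, Path hint) throws IOException;

  abstract void commitNewMetadata(int version) throws IOException;

  abstract void writeVersionToPath(FileSystem fs, Path path, int version) throws IOException;

  abstract void renameVersionHint(FileSystem fs, Path temp, Path hint) throws IOException;
}
```

With this ordering, the only state that can be left behind on failure is a missing or stale hint file, which readers can already recover from, so the commit outcome is always known.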
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]