Neer393 commented on code in PR #5987:
URL: https://github.com/apache/hive/pull/5987#discussion_r2217686006


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##########
@@ -786,14 +786,19 @@ private static FilesForCommit collectResults(int numTasks, ExecutorService execu
         .retry(3)
         .run(taskId -> {
           final String taskFileName = generateFileForCommitLocation(location, conf, jobContext.getJobID(), taskId);
-          final FilesForCommit files = readFileForCommit(taskFileName, io);
-          LOG.debug("Found Iceberg commitTask manifest file: {}\n{}", taskFileName, files);
-
-          dataFiles.addAll(files.dataFiles());
-          deleteFiles.addAll(files.deleteFiles());
-          replacedDataFiles.addAll(files.replacedDataFiles());
-          referencedDataFiles.addAll(files.referencedDataFiles());
-          mergedAndDeletedFiles.addAll(files.mergedAndDeletedFiles());
+          try {
+            final FilesForCommit files;
+            files = readFileForCommit(taskFileName, io);
+            LOG.debug("Found Iceberg commitTask manifest file: {}\n{}", taskFileName, files);
+
+            dataFiles.addAll(files.dataFiles());
+            deleteFiles.addAll(files.deleteFiles());
+            replacedDataFiles.addAll(files.replacedDataFiles());
+            referencedDataFiles.addAll(files.referencedDataFiles());
+            mergedAndDeletedFiles.addAll(files.mergedAndDeletedFiles());
+          } catch (NotFoundException e) {

Review Comment:
   Yes, I also felt that this is only a workaround: it simply ignores the 
non-existent commit files, which is not a good approach.
   The underlying issue is that we currently derive the number of tasks from 
the number of mapper/reducer jobs, assume it equals the number of commit files, 
and iterate over that range. That is why the code still tries to read commit 
files that were never created.
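
   To illustrate the alternative, here is a minimal sketch of discovering the 
commit files that actually exist instead of iterating over an assumed task 
count. It is not the PR's code: the class name, the `.forCommit` suffix, and 
the use of a local directory via `java.nio.file` are assumptions for 
illustration only; the real committer reads through Iceberg's `FileIO` and 
would need an equivalent listing call.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists the commit files that are present in a job
// directory, rather than assuming one file per mapper/reducer task.
final class CommitFileDiscovery {

  // Assumed suffix, used here only for illustration.
  static final String COMMIT_FILE_SUFFIX = ".forCommit";

  private CommitFileDiscovery() {
  }

  // Returns the existing commit files under jobDir in sorted order.
  static List<Path> listCommitFiles(Path jobDir) throws IOException {
    List<Path> found = new ArrayList<>();
    try (DirectoryStream<Path> stream =
        Files.newDirectoryStream(jobDir, "*" + COMMIT_FILE_SUFFIX)) {
      for (Path path : stream) {
        found.add(path);
      }
    }
    Collections.sort(found);
    return found;
  }

  public static void main(String[] args) throws IOException {
    Path jobDir = Files.createTempDirectory("job-commit-files");
    // Tasks 0 and 2 wrote commit files; task 1 produced nothing.
    Files.createFile(jobDir.resolve("task-0" + COMMIT_FILE_SUFFIX));
    Files.createFile(jobDir.resolve("task-2" + COMMIT_FILE_SUFFIX));

    for (Path commitFile : listCommitFiles(jobDir)) {
      System.out.println("Would read: " + commitFile.getFileName());
    }
  }
}
```

   With this shape, a task that wrote no commit file is never looked up, so 
there is no `NotFoundException` to swallow. The trade-off is that a file which 
is missing by mistake would also go unnoticed, so some other check on the 
expected outputs would still be needed.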



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

