SourabhBadhya commented on code in PR #5251:
URL: https://github.com/apache/hive/pull/5251#discussion_r1629515925


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##########
@@ -773,8 +796,22 @@ public List<FileStatus> getOutputFiles(List<JobContext> jobContexts) throws IOEx
                 FilesForCommit results = collectResults(numTasks, fileExecutor, table.location(), jobContext,
                         table.io(), false);
                 for (DataFile dataFile : results.dataFiles()) {
-                  FileStatus fileStatus = fileSystem.getFileStatus(new Path(dataFile.path().toString()));
-                  dataFiles.add(fileStatus);
+                  Path filePath = new Path(dataFile.path().toString());
+                  FileStatus fileStatus = fileSystem.getFileStatus(filePath);
+                  parentDirToDataFile.merge(

Review Comment:
   This groups the data files by their parent directory. The grouping is used 
later to decide whether the files in a directory are eligible candidates for 
merging.
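
   For readers following the thread, here is a minimal, self-contained sketch 
of the grouping idiom used in the diff above. It is not the PR's code: it uses 
`java.nio.file.Path` in place of Hadoop's `Path` and `FileStatus`, and the 
class name `GroupByParentDir` and the method `groupByParent` are hypothetical 
names chosen for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupByParentDir {

  // Groups files by parent directory using Map.merge, mirroring the
  // (oldList, newList) remapping function shown in the diff.
  // Assumption: plain java.nio paths stand in for Hadoop FileStatus objects.
  static Map<Path, List<Path>> groupByParent(List<Path> files) {
    Map<Path, List<Path>> parentDirToFiles = new HashMap<>();
    for (Path file : files) {
      List<Path> single = new ArrayList<>();
      single.add(file);
      // First file seen for a directory: the single-element list is stored.
      // Later files: the new list is appended to the existing one.
      parentDirToFiles.merge(file.getParent(), single, (oldList, newList) -> {
        oldList.addAll(newList);
        return oldList;
      });
    }
    return parentDirToFiles;
  }

  public static void main(String[] args) {
    List<Path> files = new ArrayList<>();
    files.add(Paths.get("/warehouse/tbl/p=1/a.orc"));
    files.add(Paths.get("/warehouse/tbl/p=1/b.orc"));
    files.add(Paths.get("/warehouse/tbl/p=2/c.orc"));

    // Print each directory with the number of files found under it.
    groupByParent(files).forEach((dir, group) ->
        System.out.println(dir + " -> " + group.size() + " file(s)"));
  }
}
```

   An equivalent formulation is 
`map.computeIfAbsent(dir, k -> new ArrayList<>()).add(file)`, which avoids 
allocating a one-element list per file. Which directories then count as merge 
candidates depends on logic outside this hunk.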



##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##########
@@ -773,8 +796,22 @@ public List<FileStatus> getOutputFiles(List<JobContext> jobContexts) throws IOEx
                 FilesForCommit results = collectResults(numTasks, fileExecutor, table.location(), jobContext,
                         table.io(), false);
                 for (DataFile dataFile : results.dataFiles()) {
-                  FileStatus fileStatus = fileSystem.getFileStatus(new Path(dataFile.path().toString()));
-                  dataFiles.add(fileStatus);
+                  Path filePath = new Path(dataFile.path().toString());
+                  FileStatus fileStatus = fileSystem.getFileStatus(filePath);
+                  parentDirToDataFile.merge(
+                      filePath.getParent(), Lists.newArrayList(fileStatus), (oldList, newList) -> {
+                      oldList.addAll(newList);
+                      return oldList;
+                    });
+                }
+                for (DeleteFile deleteFile : results.deleteFiles()) {
+                  Path filePath = new Path(deleteFile.path().toString());
+                  FileStatus fileStatus = fileSystem.getFileStatus(filePath);
+                  parentDirToDeleteFile.merge(

Review Comment:
   This groups the delete files by their parent directory. The grouping is 
used later to decide whether the files in a directory are eligible candidates 
for merging.



##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##########
@@ -783,12 +820,59 @@ public List<FileStatus> getOutputFiles(List<JobContext> jobContexts) throws IOEx
         tableExecutor.shutdown();
       }
     }
+    Collection<FileStatus> dataFiles = new ConcurrentLinkedQueue<>();

Review Comment:
   Changed this to use a List. Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

