SourabhBadhya commented on code in PR #5251:
URL: https://github.com/apache/hive/pull/5251#discussion_r1629515925

##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##########
@@ -773,8 +796,22 @@ public List<FileStatus> getOutputFiles(List<JobContext> jobContexts) throws IOEx
           FilesForCommit results = collectResults(numTasks, fileExecutor, table.location(), jobContext, table.io(), false);
           for (DataFile dataFile : results.dataFiles()) {
-            FileStatus fileStatus = fileSystem.getFileStatus(new Path(dataFile.path().toString()));
-            dataFiles.add(fileStatus);
+            Path filePath = new Path(dataFile.path().toString());
+            FileStatus fileStatus = fileSystem.getFileStatus(filePath);
+            parentDirToDataFile.merge(

Review Comment:
   This collects the data files that share the same parent directory. The grouping is used later to decide whether they are eligible candidates for merge.

##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##########
@@ -773,8 +796,22 @@ public List<FileStatus> getOutputFiles(List<JobContext> jobContexts) throws IOEx
           FilesForCommit results = collectResults(numTasks, fileExecutor, table.location(), jobContext, table.io(), false);
           for (DataFile dataFile : results.dataFiles()) {
-            FileStatus fileStatus = fileSystem.getFileStatus(new Path(dataFile.path().toString()));
-            dataFiles.add(fileStatus);
+            Path filePath = new Path(dataFile.path().toString());
+            FileStatus fileStatus = fileSystem.getFileStatus(filePath);
+            parentDirToDataFile.merge(
+                filePath.getParent(), Lists.newArrayList(fileStatus), (oldList, newList) -> {
+                  oldList.addAll(newList);
+                  return oldList;
+                });
+          }
+          for (DeleteFile deleteFile : results.deleteFiles()) {
+            Path filePath = new Path(deleteFile.path().toString());
+            FileStatus fileStatus = fileSystem.getFileStatus(filePath);
+            parentDirToDeleteFile.merge(

Review Comment:
   This collects the delete files that share the same parent directory. The grouping is used later to decide whether they are eligible candidates for merge.
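
   For readers following the thread, here is a minimal standalone sketch of the group-by-parent idiom used in the hunk above. It is an illustration only, not the PR's code: it assumes `java.nio.file.Path` in place of Hadoop's `Path` and `FileStatus`, and the class name `ParentDirGrouping`, the method `groupByParent`, and the sample paths are hypothetical. The rule that later decides merge eligibility is not shown in this hunk, so it is left out.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
class ParentDirGrouping {

  // Groups each file under its parent directory with Map.merge:
  // the first file for a directory stores a new single-element list,
  // later files for the same directory are appended to the stored list.
  static Map<Path, List<Path>> groupByParent(List<Path> files) {
    Map<Path, List<Path>> parentDirToFiles = new HashMap<>();
    for (Path file : files) {
      List<Path> single = new ArrayList<>();
      single.add(file);
      parentDirToFiles.merge(file.getParent(), single, (oldList, newList) -> {
        oldList.addAll(newList);
        return oldList;
      });
    }
    return parentDirToFiles;
  }

  public static void main(String[] args) {
    // Sample paths are made up; two share a parent directory, one does not.
    List<Path> files = List.of(
        Paths.get("/warehouse/tbl/data/p=1/a.parquet"),
        Paths.get("/warehouse/tbl/data/p=1/b.parquet"),
        Paths.get("/warehouse/tbl/data/p=2/c.parquet"));
    groupByParent(files).forEach(
        (dir, group) -> System.out.println(dir + " holds " + group.size() + " file(s)"));
  }
}
```

   An equivalent form is `parentDirToFiles.computeIfAbsent(file.getParent(), k -> new ArrayList<>()).add(file)`, which avoids allocating a temporary list for every file.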

##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##########
@@ -783,12 +820,59 @@ public List<FileStatus> getOutputFiles(List<JobContext> jobContexts) throws IOEx
         tableExecutor.shutdown();
       }
     }
+    Collection<FileStatus> dataFiles = new ConcurrentLinkedQueue<>();

Review Comment:
   Used List. Done.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org