prashantwason commented on PR #18047: URL: https://github.com/apache/hudi/pull/18047#issuecomment-3949834972
Rebased with latest master.

**Regarding @the-other-tim-brown's suggestion to use `getAllFilesInPartitions`:**

I looked at this, but the signatures are different:

- `getAllFilesInPartitions(Collection<String> partitionPaths)` returns `Map<String, List<StoragePathInfo>>`
- `listPartitions(List<Pair<String, StoragePath>> partitionPathList)` returns `Map<Pair<String, StoragePath>, List<StoragePathInfo>>`

To use `getAllFilesInPartitions` internally, we'd need to:

1. Extract partition paths from the pairs
2. Call `getAllFilesInPartitions`
3. Map the results back to the Pair keys

The current implementation directly uses `FSUtils.getAllDataFilesInPartition()`, which is the same filtering utility used by `getAllFilesInPartitions()`. This achieves the same filtering while avoiding the extra mapping overhead.

**Regarding @danny0405's question about the removed error handling:**

The original code had special handling for the case "in case the partition path was created by another caller". The new implementation using `FSUtils.getAllDataFilesInPartition()` handles empty/non-existent directories gracefully (returns an empty list), so the complex try-catch logic is no longer needed.

Let me know if you'd prefer me to refactor to call `getAllFilesInPartitions` internally instead.

@hudi-bot run azure
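For reference, the three-step adaptation described above could look roughly like the sketch below. It is not the code in this PR and it does not use the real Hudi classes: `ListPartitionsSketch` is a hypothetical name, `Map.Entry<String, String>` stands in for `Pair<String, StoragePath>`, plain `String` stands in for `StoragePathInfo`, and the injected `Function` stands in for `getAllFilesInPartitions`. It also assumes the lookup is keyed by the string form of each pair's path, and that a path missing from the result should map to an empty list.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ListPartitionsSketch {

    // Hypothetical adapter: the parameter and return types are stand-ins,
    // not the actual Hudi signatures.
    static Map<Map.Entry<String, String>, List<String>> listPartitions(
            List<Map.Entry<String, String>> partitionPathList,
            Function<Collection<String>, Map<String, List<String>>> getAllFilesInPartitions) {

        // Step 1: extract the partition paths from the pairs.
        List<String> partitionPaths = new ArrayList<>();
        for (Map.Entry<String, String> pair : partitionPathList) {
            partitionPaths.add(pair.getValue());
        }

        // Step 2: one bulk call to the existing listing API (stand-in).
        Map<String, List<String>> filesByPath = getAllFilesInPartitions.apply(partitionPaths);

        // Step 3: map the results back to the original pair keys.
        // Assumption: a path absent from the result means "no files".
        Map<Map.Entry<String, String>, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> pair : partitionPathList) {
            result.put(pair, filesByPath.getOrDefault(pair.getValue(), List.of()));
        }
        return result;
    }
}
```

Steps 1 and 3 are the extra mapping overhead mentioned above: one additional pass over the pairs before the call and one after it.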
