prashantwason commented on PR #18047:
URL: https://github.com/apache/hudi/pull/18047#issuecomment-3949834972

   Rebased with latest master.
   
   **Regarding @the-other-tim-brown's suggestion to use `getAllFilesInPartitions`:**
   
   I looked at this, but the signatures are different:
   - `getAllFilesInPartitions(Collection<String> partitionPaths)` returns `Map<String, List<StoragePathInfo>>`
   - `listPartitions(List<Pair<String, StoragePath>> partitionPathList)` returns `Map<Pair<String, StoragePath>, List<StoragePathInfo>>`
   
   To use `getAllFilesInPartitions` internally, we'd need to:
   1. Extract partition paths from the pairs
   2. Call `getAllFilesInPartitions`
   3. Map the results back to the Pair keys
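The three steps above could be sketched roughly as follows. This is a minimal, hypothetical sketch, not the PR's code. `PartitionKey` stands in for `Pair<String, StoragePath>`, with the path modeled as a `String`. The lookup is passed in as a function shaped like `getAllFilesInPartitions`. The sketch also assumes that method is keyed by the pair's full-path element.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class PairKeyAdapter {
  // Hypothetical stand-in for Pair<String, StoragePath>; the storage path is modeled as a String.
  public record PartitionKey(String relativePath, String fullPath) {}

  // Adapts a path-keyed lookup (shaped like getAllFilesInPartitions) to pair-keyed results.
  public static <F> Map<PartitionKey, List<F>> listViaPathLookup(
      List<PartitionKey> partitionPathList,
      Function<Collection<String>, Map<String, List<F>>> getAllFilesInPartitions) {
    // Step 1: extract the (assumed) lookup key from each pair, de-duplicating along the way.
    Set<String> paths = new LinkedHashSet<>();
    for (PartitionKey key : partitionPathList) {
      paths.add(key.fullPath());
    }
    // Step 2: one bulk call for all partitions.
    Map<String, List<F>> byPath = getAllFilesInPartitions.apply(paths);
    // Step 3: map results back onto the original pair keys; absent partitions become empty lists.
    Map<PartitionKey, List<F>> result = new LinkedHashMap<>();
    for (PartitionKey key : partitionPathList) {
      result.put(key, byPath.getOrDefault(key.fullPath(), new ArrayList<>()));
    }
    return result;
  }

  public static void main(String[] args) {
    // Fake lookup: every partition contains a single file named after it.
    Function<Collection<String>, Map<String, List<String>>> fake = ps -> {
      Map<String, List<String>> m = new LinkedHashMap<>();
      for (String p : ps) {
        m.put(p, List.of(p + "/file1.parquet"));
      }
      return m;
    };
    List<PartitionKey> keys = List.of(
        new PartitionKey("2024/01", "/tbl/2024/01"),
        new PartitionKey("2024/02", "/tbl/2024/02"));
    System.out.println(listViaPathLookup(keys, fake));
  }
}
```

Steps 1 and 3 each add an intermediate collection. These correspond to the extra mapping work the comment is weighing against a direct call.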
   
   The current implementation calls `FSUtils.getAllDataFilesInPartition()` directly. This is the same filtering utility that `getAllFilesInPartitions()` uses, so the results are filtered identically without the extra step of mapping results back to the pair keys.
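The direct approach keeps each pair as the map key throughout, so there is no re-mapping step. This is another hedged sketch. `listDataFiles` is a hypothetical function standing in for `FSUtils.getAllDataFilesInPartition()`, whose real signature is not reproduced here. `PartitionKey` is a hypothetical stand-in for `Pair<String, StoragePath>`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class DirectListing {
  // Hypothetical stand-in for Pair<String, StoragePath>; the storage path is modeled as a String.
  public record PartitionKey(String relativePath, String fullPath) {}

  // Lists each partition once and stores the result under its original pair key.
  public static <F> Map<PartitionKey, List<F>> listPartitions(
      List<PartitionKey> partitionPathList, Function<String, List<F>> listDataFiles) {
    Map<PartitionKey, List<F>> result = new LinkedHashMap<>();
    for (PartitionKey key : partitionPathList) {
      // No intermediate path-keyed map: the pair is the key from the start.
      result.put(key, listDataFiles.apply(key.fullPath()));
    }
    return result;
  }

  public static void main(String[] args) {
    List<PartitionKey> keys = List.of(new PartitionKey("2024/01", "/tbl/2024/01"));
    Function<String, List<String>> fakeLister = path -> List.of(path + "/file1.parquet");
    System.out.println(listPartitions(keys, fakeLister));
  }
}
```

This sketch lists partitions sequentially. A real implementation may distribute the per-partition calls, which this example does not attempt.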
   
   **Regarding @danny0405's question about the removed error handling:**
   
   The original code had special handling for the case its comment described as "in case the partition path was created by another caller". The new implementation uses `FSUtils.getAllDataFilesInPartition()`, which handles empty or non-existent directories gracefully by returning an empty list. The complex try-catch logic is therefore no longer needed.
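Here is a local-filesystem analogue of the behavior described. A missing partition directory produces an empty list instead of an exception that every caller must catch. The sketch uses `java.nio.file` rather than Hudi's storage abstraction. The data-file predicate is a hypothetical parameter, so this is not the actual `FSUtils` code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GracefulListing {
  // Lists regular files in partitionDir whose names satisfy isDataFile.
  // A missing directory yields an empty list, so callers need no try-catch for that case.
  public static List<Path> listDataFiles(Path partitionDir, Predicate<String> isDataFile)
      throws IOException {
    if (Files.notExists(partitionDir)) {
      return new ArrayList<>();
    }
    try (Stream<Path> entries = Files.list(partitionDir)) {
      return entries
          .filter(Files::isRegularFile)
          .filter(p -> isDataFile.test(p.getFileName().toString()))
          .collect(Collectors.toList());
    } catch (NoSuchFileException e) {
      // The directory disappeared between the existence check and the listing.
      return new ArrayList<>();
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("partition");
    Path data = Files.createFile(dir.resolve("a.parquet"));
    Path other = Files.createFile(dir.resolve("notes.txt"));
    Predicate<String> parquetOnly = name -> name.endsWith(".parquet");
    System.out.println(listDataFiles(dir, parquetOnly).size());             // prints 1
    System.out.println(listDataFiles(dir.resolve("missing"), parquetOnly)); // prints []
    Files.delete(data);
    Files.delete(other);
    Files.delete(dir);
  }
}
```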
   
   Let me know if you'd prefer that I refactor this to call `getAllFilesInPartitions` internally instead.
   
   @hudi-bot run azure
