nbalajee opened a new pull request, #18279
URL: https://github.com/apache/hudi/pull/18279
When performing rollback on the MDT, `listStatus` was called for each log file. This change pre-computes the latest log file versions with one listing call per partition, significantly reducing filesystem operations during rollback.

closes #18278

## Describe the issue this Pull Request addresses

When performing rollback on the metadata table (MDT), the log writer builder calls `FSUtils.getLatestLogVersion()` for each rollback request, and each call internally performs a `listDirectEntries()`, so there is one directory listing per log file. For tables with many log files this results in N listing calls (one per log file), which is expensive on cloud storage.

## Summary and Changelog

Pre-computes the latest log file versions with one listing call per partition (P calls, where P is much less than N), significantly reducing filesystem operations during MDT rollback.

- Added `preComputeLogVersions()` to `RollbackHelper`, which lists each unique partition directory once, parses all log files, and builds a map from (partition, fileId, deltaCommitTime) to the latest (logVersion, logWriteToken).
- Added a `logVersionLookupKey()` helper for consistent map key formation.
- Modified `RollbackHelper.maybeDeleteAndCollectStats()` to use the pre-computed map when building the log writer, bypassing per-request listing.
- Applied the same optimization to `RollbackHelperV1.maybeDeleteAndCollectStats()`.
- When `doDelete=true`, replaced the post-write `getPathInfo()` call with `writer.getCurrentSize()` to avoid an additional filesystem stat call per file.
- Added a defensive `InvalidHoodiePathException` catch in the pre-compute loop to skip non-standard log files gracefully. If pre-computation fails for a partition (e.g., with an `IOException`), the code falls back to per-request listing.
- Added unit tests: `testPreComputeLogVersionsListsOncePerPartition` and `testPreComputeLogVersionsEmptyWhenNoLogBlockRequests`.
- No code was copied from external sources.

### Impact

No public API or user-facing feature change.
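
The pre-computation described in the changelog can be sketched in plain Java. This is a minimal, hypothetical illustration, not the PR's code: `LogVersionPreCompute`, `preCompute`, `lookupKey`, and `LatestLog` are invented names, the storage listing is modeled as a `Function` from partition path to file names, and the log file name format `.<fileId>_<deltaCommitTime>.log.<version>_<writeToken>` is an assumption. Names that do not parse are skipped, loosely mirroring the `InvalidHoodiePathException` handling.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of "list each partition once, keep the latest log version per key".
 * None of these names are Hudi APIs. The log file name format is an assumption:
 *   .<fileId>_<deltaCommitTime>.log.<version>_<writeToken>
 */
final class LogVersionPreCompute {

  /** Latest log version and its write token for one (partition, fileId, deltaCommitTime). */
  record LatestLog(int version, String writeToken) {}

  // Assumed name pattern; groups: 1 = fileId, 2 = deltaCommitTime, 3 = version, 4 = writeToken.
  private static final Pattern LOG_NAME =
      Pattern.compile("^\\.(.+)_([^_]+)\\.log\\.(\\d+)_(.+)$");

  /** Hypothetical stand-in for logVersionLookupKey(): one consistent key format. */
  static String lookupKey(String partition, String fileId, String deltaCommitTime) {
    return partition + "|" + fileId + "|" + deltaCommitTime;
  }

  /**
   * Calls {@code lister} exactly once per distinct partition and, for every parsed
   * log file name, keeps the entry with the highest version.
   */
  static Map<String, LatestLog> preCompute(
      Set<String> partitions, Function<String, List<String>> lister) {
    Map<String, LatestLog> latest = new HashMap<>();
    for (String partition : partitions) {
      for (String name : lister.apply(partition)) { // the only listing for this partition
        Matcher m = LOG_NAME.matcher(name);
        if (!m.matches()) {
          continue; // non-log or non-standard name: skip it instead of failing the rollback
        }
        String key = lookupKey(partition, m.group(1), m.group(2));
        LatestLog seen = new LatestLog(Integer.parseInt(m.group(3)), m.group(4));
        latest.merge(key, seen, (a, b) -> a.version() >= b.version() ? a : b);
      }
    }
    return latest;
  }

  public static void main(String[] args) {
    // Fake storage: partition path -> file names returned by a single listing.
    Map<String, List<String>> storage = Map.of(
        "2024/01/01", List.of(".f1_100.log.1_1-0-1", ".f1_100.log.2_1-0-1", "f1_100.parquet"),
        "2024/01/02", List.of(".f2_100.log.1_1-0-1"));
    int[] listings = {0};
    Map<String, LatestLog> latest = preCompute(storage.keySet(), p -> {
      listings[0]++;
      return storage.getOrDefault(p, List.of());
    });
    System.out.println("listings = " + listings[0]);
    System.out.println(latest.get(lookupKey("2024/01/01", "f1", "100")));
  }
}
```

Building every key through a single helper keeps the lookups made while writing rollback blocks consistent with the keys produced during pre-computation, which is the role the changelog gives `logVersionLookupKey()`.
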
This is a performance optimization that reduces the number of filesystem listing calls during rollback from O(N) (one per log file) to O(P) (one per partition), where P is much less than N. It is most impactful on cloud storage (S3, GCS, ADLS), where each listing call has significant latency.

### Risk Level

Low. The optimization only affects the internal rollback path. If pre-computation fails for any partition, the code falls back gracefully to the original per-request listing behavior. Using `getCurrentSize()` for the file size is a minor trade-off (see the edge case below), but the value is accurate in all practical scenarios because `close()` does not write additional data.

### Edge case

If a writer rollover occurs during `appendBlock()` (because the file exceeds the size threshold), `getCurrentSize()` would reflect the new file rather than the file the block was written to. This is extremely unlikely, since rollback command blocks are about 100 bytes versus typical thresholds of 128 MB or more.

### Documentation Update

None. No new configs, features, or user-facing changes.

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
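
The listing-call reduction claimed under Impact, together with the fallback described under Risk Level, can be modeled with a small counting sketch. It assumes one listing per rollback request before the change, one listing attempt per distinct partition after it, and one extra per-request listing for every request whose partition failed to pre-compute. `ListingCallModel` and `Request` are hypothetical names, and real latency depends on the storage backend.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical back-of-the-envelope model of listing calls during MDT rollback. */
final class ListingCallModel {

  /** One log-block rollback request (hypothetical shape). */
  record Request(String partition, String fileId) {}

  /** Before: each request resolves its latest log version with its own listing (N calls). */
  static long perRequestListings(List<Request> requests) {
    return requests.size();
  }

  /**
   * After: one listing attempt per distinct partition (P calls), plus one fallback listing
   * for each request whose partition could not be pre-computed. Failed attempts still count.
   */
  static long perPartitionListings(List<Request> requests, Set<String> failedPartitions) {
    long partitions = requests.stream().map(Request::partition).distinct().count();
    long fallbacks = requests.stream()
        .filter(r -> failedPartitions.contains(r.partition()))
        .count();
    return partitions + fallbacks;
  }

  public static void main(String[] args) {
    List<Request> requests = List.of(
        new Request("p1", "f1"), new Request("p1", "f2"), new Request("p1", "f3"),
        new Request("p2", "f4"), new Request("p2", "f5"));
    System.out.println(perRequestListings(requests));                // 5 (N)
    System.out.println(perPartitionListings(requests, Set.of()));     // 2 (P)
    System.out.println(perPartitionListings(requests, Set.of("p2"))); // 4 (P + fallbacks)
  }
}
```

With five requests spread over two partitions, the model gives 5 listings before the change, 2 after it, and 4 if one partition's pre-computation fails and its two requests fall back to per-request listing.
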
