[PR] perf(core): optimize rollback listing calls on metadata table [hudi]

via GitHub Wed, 04 Mar 2026 13:37:04 -0800


nbalajee opened a new pull request, #18279:
URL: https://github.com/apache/hudi/pull/18279


   When performing rollback on the MDT, listStatus was called for each log 
file. This change pre-computes the latest log file versions with one listing 
call per partition, significantly reducing filesystem operations during 
rollback.
   
   closes #18278
   
   ## Describe the issue this Pull Request addresses
   When performing rollback on the metadata table (MDT), the log writer builder calls FSUtils.getLatestLogVersion() for each rollback request, which internally performs a listDirectEntries() call per log file. For tables with many log files, this results in N listing calls (one per log file), which is expensive on cloud storage.
   
   ## Summary and Changelog
   Pre-computes the latest log file versions with one listing call per 
partition (P calls, where P is much less than N), significantly reducing 
filesystem operations during MDT rollback.
   
   - Added preComputeLogVersions() to RollbackHelper that lists each unique 
partition directory once, parses all log files, and builds a map of (partition, 
fileId, deltaCommitTime) to the latest (logVersion, logWriteToken).
   - Added logVersionLookupKey() helper for consistent map key formation.
   - Modified RollbackHelper.maybeDeleteAndCollectStats() to use the 
pre-computed map when building the log writer, bypassing per-request listing.
   - Applied the same optimization to 
RollbackHelperV1.maybeDeleteAndCollectStats().
   - When doDelete=true, replaced the post-write getPathInfo() call with 
writer.getCurrentSize() to avoid an additional filesystem stat call per file.
   - Added a defensive InvalidHoodiePathException catch in the pre-compute loop to skip non-standard log files gracefully.
   - Falls back to per-request listing if pre-computation fails for a partition (e.g., on an IOException).
   - Added unit tests: testPreComputeLogVersionsListsOncePerPartition and 
testPreComputeLogVersionsEmptyWhenNoLogBlockRequests.
   - No code was copied from external sources.
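   The pre-computation described in the changelog can be sketched as follows. This is a minimal, self-contained Java 16+ sketch, not the PR's code: `FakeStorage`, `Request`, `LogVersion`, `PreComputed`, `newer()`, `listForRequest()`, and `lookup()` are hypothetical stand-ins, the bodies of `preComputeLogVersions()` and `logVersionLookupKey()` are illustrative rather than Hudi's, and the regex assumes a log file naming convention of `.<fileId>_<commitTime>.log.<version>_<writeToken>`. A regex mismatch plays the role of the `InvalidHoodiePathException` catch, and a failed partition listing triggers the per-request fallback.

```java
import java.io.IOException;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogVersionPrecomputeSketch {
  /** Hypothetical in-memory storage (partition -> file names) that counts listing calls. */
  static class FakeStorage {
    final Map<String, List<String>> dirs = new HashMap<>();
    int listCalls = 0;
    List<String> listDirectEntries(String partition) throws IOException {
      listCalls++;
      List<String> entries = dirs.get(partition);
      if (entries == null) throw new IOException("cannot list " + partition);
      return entries;
    }
  }

  /** A rollback request needing the latest log version of one file group. */
  record Request(String partition, String fileId, String commitTime) {}
  /** Latest log version and write token found for a file group. */
  record LogVersion(int version, String writeToken) {}
  /** Pre-computed versions plus the partitions whose listing succeeded. */
  record PreComputed(Map<String, LogVersion> latest, Set<String> listed) {}
  // Assumed naming convention: .<fileId>_<commitTime>.log.<version>_<writeToken>
  static final Pattern LOG_FILE = Pattern.compile("^\\.(.+)_(\\d+)\\.log\\.(\\d+)_(.+)$");

  static String logVersionLookupKey(String partition, String fileId, String commitTime) {
    return partition + "|" + fileId + "|" + commitTime;
  }

  /** Keeps the higher version; on a tie, keeps the entry seen first. */
  static LogVersion newer(LogVersion a, LogVersion b) {
    return b.version() > a.version() ? b : a;
  }

  /** Baseline (and fallback) path: one listing call per request. */
  static LogVersion listForRequest(FakeStorage storage, Request r) throws IOException {
    LogVersion best = null;
    for (String name : storage.listDirectEntries(r.partition())) {
      Matcher m = LOG_FILE.matcher(name);
      if (m.matches() && m.group(1).equals(r.fileId()) && m.group(2).equals(r.commitTime())) {
        LogVersion v = new LogVersion(Integer.parseInt(m.group(3)), m.group(4));
        best = (best == null) ? v : newer(best, v);
      }
    }
    return best; // null: the file group has no log file yet
  }

  /** One listing call per unique partition across all requests. */
  static PreComputed preComputeLogVersions(FakeStorage storage, List<Request> requests) {
    Map<String, LogVersion> latest = new HashMap<>();
    Set<String> listed = new HashSet<>();
    Set<String> partitions = new LinkedHashSet<>();
    requests.forEach(r -> partitions.add(r.partition()));
    for (String partition : partitions) {
      try {
        for (String name : storage.listDirectEntries(partition)) {
          Matcher m = LOG_FILE.matcher(name);
          if (!m.matches()) continue; // skip non-standard file names instead of failing the partition
          LogVersion v = new LogVersion(Integer.parseInt(m.group(3)), m.group(4));
          latest.merge(logVersionLookupKey(partition, m.group(1), m.group(2)), v,
              LogVersionPrecomputeSketch::newer);
        }
        listed.add(partition);
      } catch (IOException e) {
        // Not marked as listed, so requests in this partition fall back to per-request listing.
      }
    }
    return new PreComputed(latest, listed);
  }

  static LogVersion lookup(PreComputed pre, FakeStorage storage, Request r) throws IOException {
    if (pre.listed().contains(r.partition())) {
      return pre.latest().get(logVersionLookupKey(r.partition(), r.fileId(), r.commitTime()));
    }
    return listForRequest(storage, r); // pre-computation failed for this partition
  }

  public static void main(String[] args) throws IOException {
    FakeStorage storage = new FakeStorage();
    storage.dirs.put("p1", List.of(".fg1_001.log.1_0-1-1", ".fg1_001.log.2_0-1-2",
        ".fg2_001.log.1_0-1-1", "README"));
    storage.dirs.put("p2", List.of(".fg3_001.log.3_0-2-1"));
    List<Request> requests = List.of(new Request("p1", "fg1", "001"), new Request("p1", "fg2", "001"),
        new Request("p2", "fg3", "001"), new Request("p2", "fg4", "001"));
    for (Request r : requests) {
      listForRequest(storage, r);
    }
    System.out.println("per-request listing calls: " + storage.listCalls); // 4
    storage.listCalls = 0;
    PreComputed pre = preComputeLogVersions(storage, requests);
    for (Request r : requests) {
      System.out.println(r.fileId() + " -> " + lookup(pre, storage, r));
    }
    System.out.println("pre-computed listing calls: " + storage.listCalls); // 2
  }
}
```

   With four requests spread over two partitions, the per-request path issues four listing calls and the pre-computed path issues two. The sketch records which partitions were listed successfully, so "partition listed but no log file for this file group" (a `null` result, no extra call) stays distinct from "partition listing failed" (per-request fallback).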
   
   ### Impact
   No public API or user-facing feature change. This is a performance 
optimization that reduces the number of filesystem listing calls during 
rollback from O(N) (per log file) to O(P) (per partition), where P is much less 
than N. This is most impactful on cloud storage (S3, GCS, ADLS) where each 
listing call has significant latency.
   
   ### Risk Level
   Low. The optimization only affects the internal rollback path. If pre-computation fails for any partition, that partition falls back gracefully to the original per-request listing behavior. Using writer.getCurrentSize() instead of a post-write getPathInfo() call for the file size is a minor trade-off (see the edge case below), but the value is accurate in all practical scenarios since close() does not write additional data.
   
   ### Edge case
   If a writer rollover occurs during appendBlock() (i.e., the file exceeds the size threshold), getCurrentSize() would reflect the new file rather than the file that was written. This is extremely unlikely, since rollback command blocks are ~100 bytes vs. typical thresholds of 128MB+.
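   The rollover edge case can be illustrated with a toy model. `ToyLogWriter` below is hypothetical and is not Hudi's log writer; it only assumes, as the description implies, that a rollover check runs at the end of `appendBlock()` and that the writer's current size afterwards refers to the newly opened file.

```java
/** Toy model of why a size read from the writer after appendBlock() can miss a rollover. */
public class RolloverSizeSketch {

  /** Hypothetical writer: tracks only the size and version of the file it currently has open. */
  static class ToyLogWriter {
    private final long sizeThreshold;
    private long currentSize = 0; // bytes in the currently open file
    private int fileVersion = 1;  // version of the currently open file

    ToyLogWriter(long sizeThreshold) {
      this.sizeThreshold = sizeThreshold;
    }

    /** Appends a block, then rolls over to a fresh, empty file if the threshold is exceeded. */
    void appendBlock(long blockBytes) {
      currentSize += blockBytes;
      if (currentSize > sizeThreshold) {
        fileVersion++;
        currentSize = 0;
      }
    }

    long getCurrentSize() {
      return currentSize;
    }

    int getFileVersion() {
      return fileVersion;
    }
  }

  public static void main(String[] args) {
    // Typical case: a ~100-byte rollback command block against a 128MB threshold, no rollover.
    ToyLogWriter normal = new ToyLogWriter(128L * 1024 * 1024);
    normal.appendBlock(100);
    System.out.println("normal: version=" + normal.getFileVersion() + " size=" + normal.getCurrentSize());

    // Edge case: an artificially tiny threshold forces a rollover, so the reported size
    // belongs to the new empty file, not the 100 bytes just written.
    ToyLogWriter tiny = new ToyLogWriter(50);
    tiny.appendBlock(100);
    System.out.println("rolled: version=" + tiny.getFileVersion() + " size=" + tiny.getCurrentSize());
  }
}
```

   In this toy model the first writer reports version 1 and size 100, matching the written block; the second reports version 2 and size 0 because the size now describes the file opened by the rollover.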
   
   ### Documentation Update
   None. No new configs, features, or user-facing changes.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable

