chihsuan opened a new pull request, #10578:
URL: https://github.com/apache/ozone/pull/10578

   ## What changes were proposed in this pull request?
   
   **Problem.** When `FSORepairTool` populates the temporary `reachable` and 
`pendingToDelete` tables in `temp.db` during its marking phases, it writes each 
entry with an individual `Table.put`. Every put is a separate RocksDB write 
(WAL + fsync). For FSO buckets with thousands or millions of files and 
directories, this per-entry fsync overhead dominates the run.
   
   **Fix.** Accumulate those temp-table writes in a bounded RocksDB 
`BatchOperation` and commit them in batches. A small `BatchedTempWriter` helper 
buffers `putWithBatch` calls, flushes (commit + reopen) every `tempDbBatchSize` 
entries to cap memory, and commits any remainder on close. Each marking phase 
wraps its directory walk in one writer. This is safe because the two temp 
tables are only written during the marking phases and only read back later in 
the classification phase, so all writes for a bucket are committed before that 
bucket is classified. The repair-mode logic that moves entries to the OM 
deleted tables was already batched and is unchanged.
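
   The buffering pattern described above can be sketched as follows. This is a 
minimal, hypothetical illustration and not the `BatchedTempWriter` from this 
patch: the class name `BoundedBatchWriter`, the `Consumer`-based sink, and the 
`batchSize` parameter are assumptions standing in for the RocksDB 
`BatchOperation` commit and `tempDbBatchSize`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of a bounded batch writer. The sink stands in for
 * "commit one batch to the store"; it is not an Ozone or RocksDB API.
 */
final class BoundedBatchWriter<K, V> implements AutoCloseable {
  private final Consumer<List<Map.Entry<K, V>>> sink;
  private final int batchSize;
  private List<Map.Entry<K, V>> pending = new ArrayList<>();

  BoundedBatchWriter(Consumer<List<Map.Entry<K, V>>> sink, int batchSize) {
    if (batchSize < 1) {
      throw new IllegalArgumentException("batchSize must be >= 1");
    }
    this.sink = sink;
    this.batchSize = batchSize;
  }

  /** Buffers one write and flushes once the buffer reaches batchSize. */
  void put(K key, V value) {
    pending.add(Map.entry(key, value));
    if (pending.size() >= batchSize) {
      flush();
    }
  }

  /** Commits the buffered writes as one batch and starts a fresh buffer. */
  private void flush() {
    if (pending.isEmpty()) {
      return;
    }
    sink.accept(pending);
    pending = new ArrayList<>();
  }

  /** Commits any remainder so no buffered write is lost. */
  @Override
  public void close() {
    flush();
  }
}
```

   Wrapping one walk in a try-with-resources block over such a writer means the 
remainder is committed when the walk ends, so a later read phase sees every 
write. With a batch size of 3 and 7 puts, the sink in this sketch receives 
batches of 3, 3, and 1 entries.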
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-14187
   
   ## How was this patch tested?
   
   - The existing `TestFSORepairTool` suite (connected / disconnected / empty / 
unreachable trees, dry-run, volume and bucket filters, repair mode, and 
post-repair OM restart validation) passes unchanged, confirming the batched 
writes produce identical reports.
   - Added `testBatchedTempWrites`, which sets `tempDbBatchSize = 1` and runs a 
full dry-run so the batch commit/reset path is exercised for both the 
`reachable` and `pendingToDelete` tables across all tree shapes, and asserts 
the report is identical to the default-batch run.
   - `checkstyle.sh` is clean on the changed modules.
   
   Generated-by: Claude Code (claude-opus-4-8)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

