yihua commented on code in PR #18867:
URL: https://github.com/apache/hudi/pull/18867#discussion_r3315309361

##########
website/docs/concurrency_control.md:
##########
@@ -359,6 +391,32 @@ hoodie.write.lock.client.num_retries
 *Setting the right values for these depends on a case by case basis; some defaults have been provided for general cases.*
 
+## Pre-Write Cleaner Policy
+
+When running multi-writer pipelines, failed writes can accumulate on storage if a writer crashes before a clean cycle runs. Hudi 1.2.0 introduces `hoodie.prewrite.cleaner.policy` to proactively handle this at write startup:
+
+| Config Key | Default | Description |
+|---|---|---|
+| `hoodie.prewrite.cleaner.policy` | `NONE` | Policy applied before starting a new ingestion write commit. `NONE`: no pre-write action (default). `CLEAN`: force a clean table service call (also rolls back failed writes). `ROLLBACK_FAILED_WRITES`: only roll back failed writes without running a full clean. |
+
+This is useful when a writer is perpetually crashing before completing a `CLEAN`. See [Cleaning](cleaning.md) for the full list of cleaning configurations.

Review Comment:
   Good catch, fixed in e1c1aa67e136. Verified against source: `HoodiePreWriteCleanerPolicy.CLEAN` is documented in the enum as "Force a CLEAN table service call before starting the write (also performs rollback of failed writes)", and `BaseHoodieTableServiceClient#clean` invokes `rollbackFailedWrites` before the clean itself. Updated `cleaning.md` to match the wording in `concurrency_control.md`: `CLEAN` runs a clean pass that also rolls back failed writes; `ROLLBACK_FAILED_WRITES` is the rollback-only variant.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
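
For readers of this thread, the three policy values described in the quoted table could be sketched as writer options roughly as follows. This is a minimal illustration, not Hudi code: the helper names `prewrite_cleaner_options` and `rolls_back_failed_writes` are hypothetical, and only the config key and the three policy names come from the quoted doc text.

```python
# Policy names as listed in the doc table quoted in this thread.
PREWRITE_CLEANER_POLICIES = ("NONE", "CLEAN", "ROLLBACK_FAILED_WRITES")


def prewrite_cleaner_options(policy="NONE"):
    """Return a writer-options dict carrying the pre-write cleaner policy.

    Hypothetical helper for illustration only; it is not part of Hudi.
    """
    normalized = policy.strip().upper()
    if normalized not in PREWRITE_CLEANER_POLICIES:
        raise ValueError(f"unknown pre-write cleaner policy: {policy!r}")
    return {"hoodie.prewrite.cleaner.policy": normalized}


def rolls_back_failed_writes(policy):
    """Mirror the wording settled in the review comment.

    CLEAN runs a clean pass that also rolls back failed writes;
    ROLLBACK_FAILED_WRITES is the rollback-only variant; NONE does nothing.
    """
    return policy in ("CLEAN", "ROLLBACK_FAILED_WRITES")


if __name__ == "__main__":
    opts = prewrite_cleaner_options("rollback_failed_writes")
    print(opts)
```

Assuming a PySpark writer, a dict like this could be merged into the other Hudi write options and passed with `.options(**opts)`; the remaining options a multi-writer setup needs are not shown here.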
