shangxinli opened a new pull request, #18765:
URL: https://github.com/apache/hudi/pull/18765

   ### Describe the issue this Pull Request addresses
   
   Closes #18750. Migrates `HoodieStreamerWriteStatusValidator` (HSWSV) into 
the pre-commit validator framework (#18068, #18362, #18405).
   
   ### Summary and Changelog
   
   Deletes HSWSV and replaces it with explicit pre-commit orchestration in 
`StreamSync`. HSWSV's three concerns (successful-record counting, error-table 
commit, and errored-status logging) are extracted into named single-purpose 
helpers; the framework-wired equivalent is added as an opt-in validator.
   
   - **`SparkWriteErrorValidator`** (new) — `BasePreCommitValidator` for write 
errors. Opt-in. `failure.policy=FAIL` mirrors `commitOnErrors=false`; 
`WARN_LOG` mirrors `commitOnErrors=true`.
   - **`SuccessfulRecordCounter`** (new) — pure counting; supports error-table 
unification.
   - **`ErrorTableCommitter`** (new) — error-table commit; returns 
success/failure for caller-driven strategy handling.
   - **`WriteErrorReporter`** (new) — top-N errored-status logging.
   - **`StreamSync.writeToSinkAndDoMetaSync()`** — orchestrates explicitly: run 
validators → count → commit error table → apply write-error gate (preserves 
`commitOnErrors`) → `writeClient.commit()` without the `WriteStatusValidator` 
callback.
   - **HSWSV deleted** (~100 LOC).
   - **`HoodiePreCommitValidatorConfig.VALIDATOR_CLASS_NAMES`** doc references 
the new validator.
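
For reviewers, the ordering described for `StreamSync.writeToSinkAndDoMetaSync()` can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `Status`, `Validator`, `run`, and the `trace` list are hypothetical stand-ins, the error-table commit is stubbed as always succeeding, and the gate condition (any errored record blocks the commit unless `commitOnErrors` is set) is an assumption about how the preserved semantics look.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only: every type here is a hypothetical stand-in, not a Hudi class. */
public class PreCommitOrchestrationSketch {

  /** Minimal stand-in for a write status: records attempted and records that errored. */
  static final class Status {
    final long totalRecords;
    final long errorRecords;

    Status(long totalRecords, long errorRecords) {
      this.totalRecords = totalRecords;
      this.errorRecords = errorRecords;
    }
  }

  /** Hypothetical validator hook; an implementation with a FAIL policy would throw. */
  interface Validator {
    void validate(List<Status> statuses);
  }

  /**
   * Runs the steps in the order listed above and records each one in {@code trace}.
   * Returns true only when the final data commit step is reached.
   */
  static boolean run(List<Status> statuses, List<Validator> validators,
                     boolean commitOnErrors, List<String> trace) {
    // 1. Run the configured pre-commit validators (none by default).
    for (Validator validator : validators) {
      validator.validate(statuses);
    }
    trace.add("validators");

    // 2. Count successfully written records.
    long total = 0;
    long errors = 0;
    for (Status status : statuses) {
      total += status.totalRecords;
      errors += status.errorRecords;
    }
    trace.add("count=" + (total - errors));

    // 3. Commit the error table (stubbed: assumed to succeed in this sketch).
    trace.add("errorTable");

    // 4. Write-error gate: errors block the commit unless commitOnErrors is set.
    if (errors > 0 && !commitOnErrors) {
      trace.add("blocked");
      return false;
    }

    // 5. Stand-in for writeClient.commit(), reached only when the gate passes.
    trace.add("commit");
    return true;
  }

  public static void main(String[] args) {
    List<Status> withErrors = Arrays.asList(new Status(10, 0), new Status(5, 2));
    List<String> trace = new ArrayList<>();
    boolean committed = run(withErrors, new ArrayList<>(), false, trace);
    System.out.println(committed + " " + trace);
  }
}
```

The point of the sketch is only the ordering: the gate sits after validation, counting, and the error-table commit, and the data commit is the last step rather than the host of a callback.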
   
   ### Impact
   
   - **`WriteStatusValidator` interface preserved.** 
`DataSourceUtils.SparkDataSourceWriteStatusValidator` still uses it in the 
Spark datasource path, so the hook stays; only the HoodieStreamer registration 
is removed.
   - **No behavior change** for users who don't configure 
`hoodie.precommit.validators`. The inline error gate in `StreamSync` preserves 
HSWSV semantics. `commitOnErrors` continues to work.
   - **`ROLLBACK_COMMIT` / `LOG_ERROR`** strategies preserved. Because the 
orchestration now runs *before* `writeClient.commit()`, `ROLLBACK_COMMIT` no 
longer needs to roll back — the commit simply doesn't happen.
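
To make the `ROLLBACK_COMMIT` / `LOG_ERROR` point concrete, here is a small sketch of how a caller might act on the success/failure result of the error-table commit before the data commit is issued. The `Strategy` enum and `mayCommit` method are hypothetical names for illustration. Whether the real code throws or returns a flag on `ROLLBACK_COMMIT` is not stated in this description; the sketch throws.

```java
/** Illustrative sketch only: names here are hypothetical, not the PR's actual code. */
public class ErrorTableStrategySketch {

  /** Stand-in for the two strategy names mentioned in this PR. */
  enum Strategy { ROLLBACK_COMMIT, LOG_ERROR }

  /**
   * Decides whether the data commit may be issued, given the outcome of the
   * error-table commit. Runs before the data commit, so there is nothing to undo.
   */
  static boolean mayCommit(boolean errorTableCommitSucceeded, Strategy strategy) {
    if (errorTableCommitSucceeded) {
      return true;
    }
    switch (strategy) {
      case ROLLBACK_COMMIT:
        // The data commit has not happened yet, so aborting here replaces a rollback.
        throw new IllegalStateException("Error table commit failed; data commit not issued");
      case LOG_ERROR:
        // Log the failure and let the data commit go ahead.
        System.err.println("Error table commit failed; continuing with data commit");
        return true;
      default:
        throw new IllegalArgumentException("Unknown strategy: " + strategy);
    }
  }

  public static void main(String[] args) {
    System.out.println(mayCommit(false, Strategy.LOG_ERROR));
  }
}
```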
   
   ### Risk Level
   
   **medium** — touches the hot path of every HoodieStreamer commit. Semantics 
are equivalent to HSWSV by construction, but the call site moves from inside 
`writeClient.commit()` to before it.
   
   Verified:
   - `test-compile`: BUILD SUCCESS
   - `TestSparkKafkaOffsetValidator`, `TestSparkValidationContext`, 
`TestSparkStreamerValidatorUtils`, `TestSparkWriteErrorValidator`, 
`TestSuccessfulRecordCounter`: 49/49
   - `TestStreamSync`, `TestHoodieStreamerUtils`: 51/51
   - checkstyle: 0
   - RAT: 0
   
   Not run locally: full `TestHoodieDeltaStreamer` integration suite — relying 
on CI.
   
   ### Documentation Update
   
   `VALIDATOR_CLASS_NAMES` Javadoc updated to list `SparkWriteErrorValidator`. 
Each new helper has class-level Javadoc explaining its single responsibility. 
No website changes needed — `commitOnErrors` user-facing config is unchanged.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable
   

