raghusrhyme opened a new pull request, #18914:
URL: https://github.com/apache/hudi/pull/18914
## Summary
Custom JAR transformers that do column projection (e.g. `ColumnFilter` with
`mode=include`) drop `_corrupt_record` since they are unaware of the
error-table contract. `ErrorTableAwareChainedTransformer` called `validate()`
after every transformer, causing `HoodieValidationException` for any
transformer that projects the column away.
This replaces the `validate()` call with
`addNullValueErrorTableCorruptRecordColumn()` after each `transformer.apply()`,
so the column is re-injected if dropped. This one-line change covers all
transformers, built-in and custom.
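
The re-injection behavior described above can be sketched without Spark. This is a simplified model, not the Hudi implementation: a dataset is reduced to its list of column names, and the class name, `reinjectIfMissing`, `includeOnly`, and `applyChain` are hypothetical stand-ins. It assumes the real `addNullValueErrorTableCorruptRecordColumn()` adds the column only when it is absent, which is what the test plan's "no duplicate column" case implies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical model: a "dataset" is just its ordered list of column names.
public class CorruptRecordReinjectionSketch {
    public static final String CORRUPT_RECORD_COL = "_corrupt_record";

    // Stand-in for addNullValueErrorTableCorruptRecordColumn():
    // append the column only if the transformer dropped it.
    public static List<String> reinjectIfMissing(List<String> columns) {
        if (columns.contains(CORRUPT_RECORD_COL)) {
            return columns;
        }
        List<String> out = new ArrayList<>(columns);
        out.add(CORRUPT_RECORD_COL);
        return out;
    }

    // Stand-in for a projecting transformer such as ColumnFilter with mode=include.
    public static UnaryOperator<List<String>> includeOnly(String... keep) {
        List<String> keepList = Arrays.asList(keep);
        return cols -> {
            List<String> kept = new ArrayList<>(cols);
            kept.retainAll(keepList);
            return kept;
        };
    }

    // Apply each transformer, then re-inject instead of validating.
    public static List<String> applyChain(List<String> columns,
                                          List<UnaryOperator<List<String>>> transformers) {
        List<String> current = columns;
        for (UnaryOperator<List<String>> transformer : transformers) {
            current = reinjectIfMissing(transformer.apply(current));
        }
        return current;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("foo", "bar", CORRUPT_RECORD_COL);

        // Projection drops the column; the chain restores it.
        System.out.println(applyChain(input, Arrays.asList(includeOnly("foo"))));

        // A transformer that keeps every column must not cause a duplicate.
        UnaryOperator<List<String>> keepAll = cols -> cols;
        System.out.println(applyChain(input, Arrays.asList(keepAll)));
    }
}
```

In the real chain the re-injected column would hold null values, since the projecting transformer has already discarded the original contents.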
## Root Cause
A customer has a custom JAR containing a `ColumnFilter` transformer
configured with `mode=include` and an explicit column list. The transformer
calls `dataset.select(columns...)`, which drops `_corrupt_record`. The chain's
post-transformer `validate()` then fails.
## Test plan
- `testCorruptRecordReInjectedAfterTransformerDropsIt` — chain with error
handler → column drop → cast; verifies `_corrupt_record` survives
- `testCustomColumnProjectionPreservesCorruptRecord` — single transformer
doing `dataset.select("foo")`; verifies re-injection as null
- `testTransformerPreservingCorruptRecordIsNoOp` — transformer that keeps
all columns; verifies no duplicate column
- Existing `testForErrorTableConfig` unchanged — quarantine-aware
transformers still work