raghusrhyme opened a new pull request, #18914:
URL: https://github.com/apache/hudi/pull/18914

   ## Summary
   Custom JAR transformers that do column projection (e.g. `ColumnFilter` with 
`mode=include`) drop `_corrupt_record` since they are unaware of the 
error-table contract. `ErrorTableAwareChainedTransformer` called `validate()` 
after every transformer, causing `HoodieValidationException` for any 
transformer that projects the column away.
   
   This replaces the `validate()` call with 
`addNullValueErrorTableCorruptRecordColumn()` after each `transformer.apply()`, 
so the column is re-injected as null if a transformer dropped it. This is a 
one-line change that covers all transformers, both built-in and custom.
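
For illustration only, here is a minimal plain-Java model of the behavior described above. It is a sketch under stated assumptions, not the code in this PR: rows are modeled as maps instead of Spark `Dataset<Row>`, the class `ReinjectSketch` and its helpers `select`, `validate`, `addNullCorruptRecordColumn`, and `applyChain` are hypothetical names, and `IllegalStateException` stands in for `HoodieValidationException`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Simplified stand-in for the chained-transformer behavior; not Hudi's actual code.
class ReinjectSketch {
    static final String CORRUPT_RECORD = "_corrupt_record";

    // Hypothetical projection transformer: keeps only the listed columns,
    // similar in spirit to dataset.select(columns...) with mode=include.
    static UnaryOperator<List<Map<String, Object>>> select(String... columns) {
        return rows -> {
            List<Map<String, Object>> out = new ArrayList<>();
            for (Map<String, Object> row : rows) {
                Map<String, Object> projected = new LinkedHashMap<>();
                for (String c : columns) {
                    projected.put(c, row.get(c));
                }
                out.add(projected);
            }
            return out;
        };
    }

    // Models the old behavior: fail if a transformer dropped the column.
    // IllegalStateException is an assumption standing in for HoodieValidationException.
    static void validate(List<Map<String, Object>> rows) {
        for (Map<String, Object> row : rows) {
            if (!row.containsKey(CORRUPT_RECORD)) {
                throw new IllegalStateException(CORRUPT_RECORD + " missing after transformer");
            }
        }
    }

    // Models the new behavior: add the column as null only when it is absent,
    // so a transformer that kept the column sees no change and no duplicate.
    static List<Map<String, Object>> addNullCorruptRecordColumn(List<Map<String, Object>> rows) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> copy = new LinkedHashMap<>(row);
            if (!copy.containsKey(CORRUPT_RECORD)) {
                copy.put(CORRUPT_RECORD, null);
            }
            out.add(copy);
        }
        return out;
    }

    // Applies each transformer, then re-injects the column instead of validating.
    static List<Map<String, Object>> applyChain(
            List<Map<String, Object>> rows,
            List<UnaryOperator<List<Map<String, Object>>>> transformers) {
        List<Map<String, Object>> current = rows;
        for (UnaryOperator<List<Map<String, Object>>> t : transformers) {
            current = addNullCorruptRecordColumn(t.apply(current));
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("foo", 1);
        row.put("bar", "x");
        row.put(CORRUPT_RECORD, null);

        List<Map<String, Object>> result = applyChain(List.of(row), List.of(select("foo")));
        System.out.println(result.get(0).keySet()); // prints [foo, _corrupt_record]
    }
}
```

In this model, calling `validate` directly on the output of `select("foo")` throws, which mirrors the failure reported here, while `applyChain` returns rows that carry `_corrupt_record` again with a null value.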
   
   ## Root Cause
   A customer has a custom JAR with a `ColumnFilter` transformer using 
`mode=include` and an explicit column list. The transformer calls 
`dataset.select(columns...)`, which drops `_corrupt_record`. The chain's 
post-transformer `validate()` then fails with `HoodieValidationException`.
   
   ## Test plan
   - `testCorruptRecordReInjectedAfterTransformerDropsIt` — chain with error 
handler → column drop → cast; verifies `_corrupt_record` survives
   - `testCustomColumnProjectionPreservesCorruptRecord` — single transformer 
doing `dataset.select("foo")`; verifies re-injection as null
   - `testTransformerPreservingCorruptRecordIsNoOp` — transformer that keeps 
all columns; verifies no duplicate column
   - Existing `testForErrorTableConfig` unchanged — quarantine-aware 
transformers still work


