ad1happy2go opened a new pull request, #18948:
URL: https://github.com/apache/hudi/pull/18948

   ### Change Logs
   
   CDC before/after images inferred directly from base/log data files leaked 
the `_hoodie_*` meta columns, while images served from the supplemental CDC log 
already have them stripped at write time (`HoodieCDCLogger`). The clearest 
trigger is the `BASE_FILE_INSERT` case: an insert-only commit writes no CDC 
log file, so its change data is inferred from the base file. The result was an 
image schema that varied from commit to commit, with some commits' images 
carrying meta columns and others not.
   
   Fix: skip the `_hoodie_*` meta columns in `InternalRowToJsonStringConverter` 
so every inference case produces a schema-consistent, business-columns-only 
image.
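   
   For illustration only, the idea behind the fix can be sketched in plain 
Java without Spark types. The class `CdcImageSketch`, the method `toJsonImage`, 
and the prefix-based check are assumptions made for this sketch; they are not 
the code in `InternalRowToJsonStringConverter`, which may identify meta columns 
differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in for the converter; not Hudi's actual implementation.
final class CdcImageSketch {
    // Assumption: meta columns are recognised by the shared "_hoodie_" prefix.
    static final String META_PREFIX = "_hoodie_";

    // Renders a row (field name -> value) as a JSON object, skipping meta columns.
    static String toJsonImage(Map<String, Object> row) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> field : row.entrySet()) {
            if (field.getKey().startsWith(META_PREFIX)) {
                continue; // keep only business columns in the image
            }
            json.add(quote(field.getKey()) + ":" + render(field.getValue()));
        }
        return json.toString();
    }

    // Simplified value rendering: numbers and booleans bare, everything else quoted.
    static String render(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        return quote(value.toString());
    }

    // Escapes only backslashes and double quotes; enough for this sketch.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("_hoodie_commit_time", "20240101000000");
        row.put("_hoodie_record_key", "k1");
        row.put("id", 1);
        row.put("name", "a");
        System.out.println(toJsonImage(row)); // prints {"id":1,"name":"a"}
    }
}
```

   Filtering at conversion time means every inference case that goes through 
the converter yields the same image shape, regardless of whether the source 
row came from a base file or a log file.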
   
   Tests:
   - `TestInternalRowToJsonStringConverter#stripsHoodieMetaColumns` — unit 
regression test asserting meta columns are dropped from the JSON image.
   - `TestCDCDataFrameSuite#testCDCImagesExcludeHoodieMetaFields` — functional 
test (MOR + inline compaction + upsert) parameterized over all 
`HoodieCDCSupplementalLoggingMode` values, asserting no before/after image 
contains a `_hoodie_*` column.
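   
   The kind of check the functional test describes could look roughly like 
the following. This is a text-level sketch under stated assumptions: 
`CdcImageAssertSketch`, `hasMetaColumn`, and `assertNoMetaColumns` are 
hypothetical names, and a real test would more likely parse the JSON than 
match it with a regular expression.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; not part of TestCDCDataFrameSuite.
final class CdcImageAssertSketch {
    // Matches a JSON key that starts with the "_hoodie_" prefix.
    // Heuristic only: an escaped key-like fragment inside a string value would also match.
    static final Pattern META_KEY = Pattern.compile("\"_hoodie_[^\"]*\"\\s*:");

    // A null image (for example, the before image of an insert) has no columns at all.
    static boolean hasMetaColumn(String imageJson) {
        return imageJson != null && META_KEY.matcher(imageJson).find();
    }

    // Fails on the first image that still carries a meta column.
    static void assertNoMetaColumns(List<String> images) {
        for (String image : images) {
            if (hasMetaColumn(image)) {
                throw new AssertionError("meta column leaked into image: " + image);
            }
        }
    }

    public static void main(String[] args) {
        // Clean images, including a null before image, pass silently.
        assertNoMetaColumns(Arrays.asList("{\"id\":1,\"name\":\"a\"}", null));
        // An image that still has a meta column is detected.
        System.out.println(hasMetaColumn("{\"_hoodie_record_key\":\"k1\",\"id\":1}")); // prints true
    }
}
```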
   
   ### Impact
   
   CDC incremental query before/after images are now consistent across all 
commits and supplemental-logging modes, containing only business columns.
   
   ### Risk level: low
   
   Behavior change is limited to the JSON content of CDC before/after images 
(removal of meta columns that should never have been present).
   
   ### Documentation Update
   
   None required.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


