voonhous commented on code in PR #18938:
URL: https://github.com/apache/hudi/pull/18938#discussion_r3463846891


##########
hudi-hadoop-common/src/main/java/org/apache/hudi/io/storage/hadoop/HoodieVariantReconstruction.java:
##########


Review Comment:
   Good call - this caught a real latent bug. Added 
`TestHoodieVariantReconstructionRoundTrip` (in hudi-spark4-common, so the real 
provider is auto-detected): a `create` -> `reconstruct` round-trip over a 
record with a shredded variant column plus a non-variant column, asserting the 
variant is rebuilt, the non-variant column passes through, and the field 
positions line up.
   
   It failed on first run: `create` reused the requested field instances for 
non-target columns when assembling the intermediate read schema, and those Avro 
`Field`s are already bound to the requested record - so `Schema.setFields` 
throws `Field already used` as soon as a non-variant column sits alongside a 
shredded variant (i.e. every real table). The target branch only avoided it 
because `withSchema()` builds a fresh `Field`. Fixed by doing the same for 
non-target fields. Both in 50ab440.
   
   It stayed latent because nothing exercised this path: Spark compaction reads 
base files via the InternalRow reader, and the Java/Flink AVRO path fails fast 
first (no provider).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to