voonhous commented on code in PR #18938:
URL: https://github.com/apache/hudi/pull/18938#discussion_r3463846891


##########
hudi-hadoop-common/src/main/java/org/apache/hudi/io/storage/hadoop/HoodieVariantReconstruction.java:
##########


Review Comment:
   Good catch, this found a bug. 
   
   Added `TestHoodieVariantReconstructionRoundTrip` (in hudi-spark4-common, so 
the real provider is auto-detected): a `create` -> `reconstruct` round-trip 
over a record with a shredded variant column plus a non-variant column, 
asserting the variant is rebuilt, the non-variant column passes through, and 
the field positions line up.
   
   It failed on first run: `create` reused the requested field instances for 
non-target columns when assembling the intermediate read schema, and those Avro 
`Field`s are already bound to the requested record - so `Schema.setFields` 
throws `Field already used` as soon as a non-variant column sits alongside a 
shredded variant (i.e. every real table). The target branch only avoided it 
because `withSchema()` builds a fresh `Field`. Fixed by doing the same for 
non-target fields. Both in 50ab440.
   
   It stayed latent because nothing exercised this path: Spark compaction reads 
base files via the InternalRow reader, and the Java/Flink AVRO path fails fast 
first (no provider).



##########
hudi-hadoop-common/src/main/java/org/apache/hudi/io/storage/hadoop/HoodieVariantReconstruction.java:
##########


Review Comment:
   Good catch, this found a bug. 
   
   Added `TestHoodieVariantReconstructionRoundTrip` (in hudi-spark4-common, so 
the real provider is auto-detected): a `create` -> `reconstruct` round-trip 
over a record with a shredded variant column plus a non-variant column, 
asserting the variant is rebuilt, the non-variant column passes through, and 
the field positions line up.
   
   It failed on first run: `create` reused the requested field instances for 
non-target columns when assembling the intermediate read schema, and those Avro 
`Field`s are already bound to the requested record - so `Schema.setFields` 
throws `Field already used` as soon as a non-variant column sits alongside a 
shredded variant (i.e. every real table). The target branch only avoided it 
because `withSchema()` builds a fresh `Field`. Fixed by doing the same for 
non-target fields.
   
   It stayed latent because nothing exercised this path: Spark compaction reads 
base files via the InternalRow reader, and the Java/Flink AVRO path fails fast 
first (no provider).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to