voonhous commented on code in PR #18938:
URL: https://github.com/apache/hudi/pull/18938#discussion_r3458800167


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/schema/TestVariantDataType.scala:
##########
@@ -142,6 +150,8 @@ class TestVariantDataType extends HoodieSparkSqlTestBase {
            |  primaryKey = 'id',
            |  type = 'mor',
            |  preCombineField = 'ts',
+           |  hoodie.parquet.variant.write.shredding.enabled = 'true',

Review Comment:
   `withRecordType` only swaps the record merger and log-block format, and 
Spark compaction reads base files via `SparkReaderContextFactory` -> 
`SparkFileFormatInternalRowReaderContext` (InternalRow), so 
`HoodieAvroParquetReader` (where 
`HoodieVariantReconstruction`/`rebuildVariantRecord` run) is never reached on 
either record type.
   
   I have added `TestSpark4VariantShreddingProvider`, a direct shred -> rebuild 
round-trip over the provider covering scalar (numeric/string/boolean/decimal), 
object, and array shapes - this exercises `rebuildVariantRecord` and the 
`AvroVariantRow`/`AvroObjectRow`/`AvroArrayRow` accessors. The two `create()` 
fail-fast branches are covered by `TestHoodieVariantReconstruction` (added for 
your other comment). Also corrected the misleading comment on the SQL test to 
say it covers the Spark InternalRow (native) read of a shredded base file, not 
the Avro path.
   
   Note on the Java-client suggestion: `Spark4VariantShreddingProvider` is the 
only `VariantShreddingProvider`, so a Java/Flink compaction over a shredded 
base file would hit the no-provider fail-fast branch rather than reconstruct - 
which is why I went with the direct unit test for the reconstruction coverage.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to