yihua commented on code in PR #18599:
URL: https://github.com/apache/hudi/pull/18599#discussion_r3251099684
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieSparkLanceWriter.java:
##########
@@ -399,13 +460,59 @@ protected void updateRecordMetadata(InternalRow row,
row.update(FILENAME_METADATA_FIELD.ordinal(), fileName);
}
- @AllArgsConstructor(staticName = "of")
- private static class SparkArrowWriter implements ArrowWriter<InternalRow> {
+ /**
+ * Forwards rows to the lance-spark {@link LanceArrowWriter}. When the schema
+ * has no {@code VariantType} columns, rows are passed through directly. When it
+ * does, a single {@link VariantProjectedRow} instance is reused per row to
+ * delegate every accessor to the underlying input row except at variant
+ * ordinals, where it returns a pre-allocated {@code (metadata, value)} struct
+ * populated by {@link org.apache.spark.sql.hudi.SparkAdapter#createVariantValueWriter}.
Review Comment:
Does this projection introduce overhead? Does the Lance library or its writer
provide its own projection or adaptation for the Variant type?
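
To make the overhead question concrete, here is a minimal sketch of the reuse-and-delegate pattern the Javadoc describes. It is not the PR's code: `Row`, `ArrayRow`, and `ProjectedRow` are hypothetical stand-ins for Spark's `InternalRow` and the PR's `VariantProjectedRow`, and the byte encoding of the variant is a toy placeholder. In this sketch the extra per-row work is one slot check per accessor call plus refilling the variant slots; the actual cost in the PR depends on its implementation.

```java
import java.nio.charset.StandardCharsets;

/** Toy model of a delegating row; every type here is a hypothetical stand-in. */
final class ProjectedRowSketch {

  /** Minimal row abstraction, standing in for Spark's InternalRow. */
  interface Row {
    Object get(int ordinal);
  }

  /** Plain array-backed row used as the input. */
  static final class ArrayRow implements Row {
    private final Object[] values;

    ArrayRow(Object... values) {
      this.values = values;
    }

    @Override
    public Object get(int ordinal) {
      return values[ordinal];
    }
  }

  /** Reusable wrapper: delegates every ordinal except the variant ones. */
  static final class ProjectedRow implements Row {
    private static final byte[] EMPTY_METADATA = new byte[0];

    // slots[ordinal] holds {metadata, value} for variant ordinals, null otherwise.
    private final byte[][][] slots;
    private final int[] variantOrdinals;
    private Row input;

    ProjectedRow(int numFields, int[] variantOrdinals) {
      this.variantOrdinals = variantOrdinals;
      this.slots = new byte[numFields][][];
      for (int ordinal : variantOrdinals) {
        slots[ordinal] = new byte[2][]; // struct container allocated once
      }
    }

    /** Points the wrapper at a new input row and refills the variant slots. */
    ProjectedRow wrap(Row row) {
      this.input = row;
      for (int ordinal : variantOrdinals) {
        // Toy encoding (assumption): empty metadata, UTF-8 bytes of the value.
        slots[ordinal][0] = EMPTY_METADATA;
        slots[ordinal][1] =
            String.valueOf(row.get(ordinal)).getBytes(StandardCharsets.UTF_8);
      }
      return this;
    }

    @Override
    public Object get(int ordinal) {
      byte[][] slot = slots[ordinal];
      return slot != null ? slot : input.get(ordinal);
    }
  }

  public static void main(String[] args) {
    ProjectedRow projected = new ProjectedRow(2, new int[] {1});
    Row row = projected.wrap(new ArrayRow(7, "{\"a\":1}"));
    System.out.println(row.get(0));                     // delegated to the input row
    System.out.println(((byte[][]) row.get(1)).length); // number of struct children
  }
}
```

The wrapper and its slot containers are allocated once per writer, so the open question is the cost of the value conversion itself rather than the delegation.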
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieSparkLanceWriter.java:
##########
@@ -293,6 +305,54 @@ private static Field rewriteBlobDataChild(Field blobStructField) {
return new Field(blobStructField.getName(), blobStructField.getFieldType(), rebuilt);
}
+ /**
+ * Single-pass walk that returns (a) the enriched schema with top-level
+ * {@code VariantType} fields replaced by Hudi's canonical
+ * {@code Struct[metadata: binary, value: binary]} (tagged {@code hudi_type=VARIANT}
+ * so {@code HoodieSparkSchemaConverters} promotes it back on read), and (b) the
+ * variant ordinals in ascending order. {@code LanceArrowUtils} has no VariantType
+ * case, so we hand it a plain struct. Top-level only - nested variants are not
+ * yet supported.
+ */
+ private static Pair<StructType, int[]> enrichForLanceVariant(StructType sparkSchema) {
Review Comment:
Could `enrichForLanceVariant` be incorporated into
`enrichSparkSchemaForLance`?
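
As a rough illustration of what a merged pass could look like, the sketch below rewrites variant fields and collects their ordinals in the same loop that visits every top-level field. It assumes `enrichSparkSchemaForLance` is also a per-field walk, which this thread does not confirm. `Field`, `Enriched`, and the string type names are hypothetical toy stand-ins, not Spark's `StructType` or the PR's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy single-pass schema walk; all types and names here are hypothetical. */
final class VariantSchemaSketch {

  /** Stand-in for a Spark StructField: a name, a type string, an optional tag. */
  static final class Field {
    final String name;
    final String type;     // e.g. "int", "string", "variant"
    final String hudiType; // logical tag such as "VARIANT", or null when absent

    Field(String name, String type, String hudiType) {
      this.name = name;
      this.type = type;
      this.hudiType = hudiType;
    }
  }

  /** Result pair: the rewritten fields plus the variant ordinals. */
  static final class Enriched {
    final List<Field> fields;
    final int[] variantOrdinals;

    Enriched(List<Field> fields, int[] variantOrdinals) {
      this.fields = fields;
      this.variantOrdinals = variantOrdinals;
    }
  }

  /** Walks the top-level fields once, rewriting variants and recording ordinals. */
  static Enriched enrich(List<Field> schema) {
    List<Field> out = new ArrayList<>(schema.size());
    int[] ordinals = new int[schema.size()];
    int count = 0;
    for (int i = 0; i < schema.size(); i++) {
      Field f = schema.get(i);
      if ("variant".equals(f.type)) {
        // Replace with the canonical two-binary struct and tag it for read-back.
        out.add(new Field(f.name, "struct<metadata:binary,value:binary>", "VARIANT"));
        ordinals[count++] = i; // ascending, because i only increases
      } else {
        out.add(f); // other per-field enrichment would go here
      }
    }
    return new Enriched(out, Arrays.copyOf(ordinals, count));
  }

  public static void main(String[] args) {
    List<Field> schema = Arrays.asList(
        new Field("id", "int", null),
        new Field("payload", "variant", null),
        new Field("name", "string", null),
        new Field("extra", "variant", null));
    Enriched enriched = enrich(schema);
    System.out.println(Arrays.toString(enriched.variantOrdinals)); // prints [1, 3]
  }
}
```

Folding the two walks together would keep the ordinal bookkeeping next to the rewrite that produces it, at the cost of a wider return type for the shared method.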