hudi-agent commented on code in PR #18599:
URL: https://github.com/apache/hudi/pull/18599#discussion_r3245295431


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieSparkLanceWriter.java:
##########
@@ -293,6 +305,54 @@ private static Field rewriteBlobDataChild(Field blobStructField) {
     return new Field(blobStructField.getName(), blobStructField.getFieldType(), rebuilt);
   }
 
+  /**
+   * Single-pass walk that returns (a) the enriched schema with top-level
+   * {@code VariantType} fields replaced by Hudi's canonical
+   * {@code Struct[metadata: binary, value: binary]} (tagged {@code hudi_type=VARIANT}
+   * so {@code HoodieSparkSchemaConverters} promotes it back on read), and (b) the
+   * variant ordinals in ascending order. {@code LanceArrowUtils} has no VariantType
+   * case, so we hand it a plain struct. Top-level only - nested variants are not
+   * yet supported.
+   */
+  private static Pair<StructType, int[]> enrichForLanceVariant(StructType sparkSchema) {
+    StructField[] fields = sparkSchema.fields();
+    StructField[] newFields = null;
+    List<Integer> ordinals = null;
+    for (int i = 0; i < fields.length; i++) {
+      StructField field = fields[i];
+      if (!SparkAdapterSupport$.MODULE$.sparkAdapter().isVariantType(field.dataType())) {
+        if (newFields != null) {
+          newFields[i] = field;
+        }
+        continue;
+      }
+      if (newFields == null) {
+        newFields = new StructField[fields.length];
+        System.arraycopy(fields, 0, newFields, 0, i);
+        ordinals = new ArrayList<>();
+      }
+      ordinals.add(i);
+      StructField metaField = new StructField(
+          HoodieSchema.Variant.VARIANT_METADATA_FIELD, DataTypes.BinaryType, false, Metadata.empty());
+      StructField valField = new StructField(
+          HoodieSchema.Variant.VARIANT_VALUE_FIELD, DataTypes.BinaryType, false, Metadata.empty());
+      StructType variantStruct = new StructType(new StructField[] {metaField, valField});
+      Metadata enriched = new MetadataBuilder()
+          .withMetadata(field.metadata())
+          .putString(HoodieSchema.TYPE_METADATA_FIELD, HoodieSchemaType.VARIANT.name())
+          .build();
+      newFields[i] = new StructField(field.name(), variantStruct, field.nullable(), enriched);
+    }
+    if (newFields == null) {

Review Comment:
   🤖 nit: could you replace the manual unboxing loop with `ordinals.stream().mapToInt(Integer::intValue).toArray()`? It expresses the intent in one line.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag quality.</i></sub>
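For comparison, here is a minimal sketch of the suggested one-liner next to the kind of manual unboxing loop it would replace. The hunk ends before the loop itself, so the loop body below is an assumption, and the class name `OrdinalsToArrayDemo` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OrdinalsToArrayDemo {

  // Assumed shape of the existing manual unboxing loop (not visible in the hunk).
  static int[] manualToArray(List<Integer> ordinals) {
    int[] result = new int[ordinals.size()];
    for (int i = 0; i < ordinals.size(); i++) {
      result[i] = ordinals.get(i); // auto-unboxing Integer -> int
    }
    return result;
  }

  // Reviewer's suggestion: unbox every element via an IntStream in one expression.
  static int[] streamToArray(List<Integer> ordinals) {
    return ordinals.stream().mapToInt(Integer::intValue).toArray();
  }

  public static void main(String[] args) {
    List<Integer> ordinals = new ArrayList<>(Arrays.asList(1, 4, 7));
    System.out.println(Arrays.toString(manualToArray(ordinals))); // [1, 4, 7]
    System.out.println(Arrays.toString(streamToArray(ordinals))); // [1, 4, 7]
  }
}
```

In `enrichForLanceVariant`, `ordinals` stays `null` until the first variant field is seen, so the conversion would presumably sit after the `if (newFields == null)` early-exit check rather than before it.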



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieSparkLanceWriter.java:
##########
@@ -399,13 +460,59 @@ protected void updateRecordMetadata(InternalRow row,
     row.update(FILENAME_METADATA_FIELD.ordinal(), fileName);
   }
 
-  @AllArgsConstructor(staticName = "of")
-  private static class SparkArrowWriter implements ArrowWriter<InternalRow> {
+  /**
+   * Forwards rows to the lance-spark {@link LanceArrowWriter}. When the schema
+   * has no {@code VariantType} columns, rows are passed through directly. When it
+   * does, a single {@link VariantProjectedRow} instance is reused per row to
+   * delegate every accessor to the underlying input row except at variant
+   * ordinals, where it returns a pre-allocated {@code (metadata, value)} struct
+   * populated by {@link org.apache.spark.sql.hudi.SparkAdapter#createVariantValueWriter}.
+   */
+  private static final class SparkArrowWriter implements ArrowWriter<InternalRow> {
     private final LanceArrowWriter lanceArrowWriter;
+    private final VariantProjectedRow projectedRow;
+
+    private SparkArrowWriter(LanceArrowWriter lanceArrowWriter, VariantProjectedRow projectedRow) {
+      this.lanceArrowWriter = lanceArrowWriter;
+      this.projectedRow = projectedRow;
+    }
+
+    static SparkArrowWriter of(LanceArrowWriter lanceArrowWriter,
+                               StructType inputSchema,
+                               int[] variantOrdinals) {
+      if (variantOrdinals.length == 0) {
+        return new SparkArrowWriter(lanceArrowWriter, null);
+      }

Review Comment:
   🤖 nit: the two-step init (`new ArrayList<>(numFields)` + null-fill loop) could be collapsed to `new ArrayList<>(Collections.nCopies(numFields, null))`, which is a bit more declarative about what's happening.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag quality.</i></sub>
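The null-fill step exists because `new ArrayList<>(numFields)` only reserves capacity: the list's size is still 0, so `set(i, ...)` would throw. The sketch below, with hypothetical names (`NullSlotsDemo`, `twoStep`, `nCopies`) since the code under review is not shown in the hunk, checks that both forms produce the same mutable list of `numFields` nulls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NullSlotsDemo {

  // Two-step form: presize, then append nulls so that set(i, ...) is legal.
  static List<Object> twoStep(int numFields) {
    List<Object> slots = new ArrayList<>(numFields);
    for (int i = 0; i < numFields; i++) {
      slots.add(null);
    }
    return slots;
  }

  // Suggested form: copy an immutable list of numFields nulls into a mutable ArrayList.
  static List<Object> nCopies(int numFields) {
    return new ArrayList<>(Collections.nCopies(numFields, null));
  }

  public static void main(String[] args) {
    List<Object> a = twoStep(3);
    List<Object> b = nCopies(3);
    b.set(1, "variant"); // works: the ArrayList copy is mutable, unlike nCopies' own view
    System.out.println(a); // [null, null, null]
    System.out.println(b); // [null, variant, null]
  }
}
```

Wrapping `Collections.nCopies` in `new ArrayList<>(...)` matters: the list returned by `nCopies` is immutable, so calling `set` on it directly would throw `UnsupportedOperationException`.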



-- 
This is an automated message from the Apache Git Service.