hudi-agent commented on code in PR #18877:
URL: https://github.com/apache/hudi/pull/18877#discussion_r3428939730


##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/storage/row/parquet/ParquetRowDataWriter.java:
##########
@@ -64,16 +69,19 @@ public class ParquetRowDataWriter {
 
   public ParquetRowDataWriter(
       RecordConsumer recordConsumer,
-      RowType rowType,
-      GroupType schema,
-      boolean utcTimestamp) {
+      boolean utcTimestamp,
+      HoodieSchema hoodieSchema) {
     this.recordConsumer = recordConsumer;
     this.utcTimestamp = utcTimestamp;
 
+    RowType rowType = HoodieSchemaConverter.convertToRowType(hoodieSchema);
     this.filedWriters = new FieldWriter[rowType.getFieldCount()];
     this.fieldNames = rowType.getFieldNames().toArray(new String[0]);
     for (int i = 0; i < rowType.getFieldCount(); i++) {
-      this.filedWriters[i] = createWriter(rowType.getTypeAt(i));
+      String fieldName = fieldNames[i];
+      HoodieSchema fieldSchema = 
hoodieSchema.getNonNullType().getField(fieldName).map(HoodieSchemaField::schema)

Review Comment:
   🤖 nit: `hoodieSchema.getNonNullType()` is called on every loop iteration — 
could you extract it to a local variable (e.g. `HoodieSchema nonNullSchema = 
hoodieSchema.getNonNullType();`) above the `for` loop? It would also align with 
how `nonNullSchema` is introduced inside `createWriter`.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/storage/row/parquet/ParquetSchemaConverter.java:
##########
@@ -205,12 +209,35 @@ public static Pair<Type, Type> 
parquetMapKeyValueType(GroupType mapType) {
     return Pair.of(keyValue.getType(MAP_KEY_NAME), 
keyValue.getType(MAP_VALUE_NAME));
   }
 
+  /**
+   * Converts a Flink row type to a Parquet message type.
+   *
+   * <p>This overload preserves the pre-VECTOR public API, but converting from
+   * {@link RowType} to {@link HoodieSchema} loses VECTOR logical metadata 
such as dimension and
+   * element type. Prefer {@link #convertToParquetMessageType(String, 
HoodieSchema)} whenever the
+   * caller already has a metadata-aware {@link HoodieSchema}.
+   */
   public static MessageType convertToParquetMessageType(String name, RowType 
rowType) {
+    HoodieSchema hoodieSchema = HoodieSchemaConverter.convertToSchema(rowType);
+    return convertToParquetMessageType(name, hoodieSchema);
+  }

Review Comment:
   🤖 nit: the `RowType` overload's Javadoc tells callers to prefer this 
`HoodieSchema` overload, but the preferred overload itself has no Javadoc. It 
might be worth moving (or duplicating) the main description here so a developer 
who lands directly on this method understands its behaviour and the 
relationship without having to find the other overload first.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/util/RowDataToAvroConverters.java:
##########
@@ -331,6 +331,10 @@ private static RowDataToAvroConverter 
createArrayConverter(ArrayType arrayType,
 
       @Override
       public Object convert(HoodieSchema schema, Object object) {
+        if (schema.getType() == HoodieSchemaType.VECTOR) {
+          HoodieSchema.Vector vectorSchema = (HoodieSchema.Vector) schema;
+          return new GenericData.Fixed(schema.toAvroSchema(), 
VectorConversionUtils.encodeVectorArrayData((ArrayData) object, vectorSchema));

Review Comment:
   🤖 nit: `schema.toAvroSchema()` is used here right after casting `schema` to 
`vectorSchema` — could you use `vectorSchema.toAvroSchema()` instead to keep 
the two references consistent within the block?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to