Re: [PR] feat(flink): Support writing VECTOR columns for flink writer [hudi]

via GitHub Sat, 13 Jun 2026 18:23:52 -0700


hudi-agent commented on code in PR #18877:
URL: https://github.com/apache/hudi/pull/18877#discussion_r3408854284



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringOperator.java:
##########
@@ -169,6 +169,8 @@ public void open() throws Exception {
     this.table = writeClient.getHoodieTable();
 
     this.schema = HoodieSchemaConverter.convertToSchema(rowType);
+    // update write schema to include metadata fields.

Review Comment:
   🤖 nit: the comment says "to include metadata fields" but the schema being 
set here comes straight from `HoodieSchemaConverter.convertToSchema(rowType)` — 
no `addMetadataFields` call in sight. Could you clarify what this comment is 
actually describing? As written it reads like the schema contains metadata 
fields when it may not, which could mislead whoever touches this next.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/storage/row/RowDataParquetWriteSupport.java:
##########
@@ -38,21 +36,21 @@
  */
 public class RowDataParquetWriteSupport extends WriteSupport<RowData> {
 
-  private final RowType rowType;
-  private final MessageType schema;
+  protected final HoodieSchema schema;

Review Comment:
   🤖 nit: the field used to be named `hoodieSchema` as the constructor 
parameter, but now both the field and the parameter are just `schema`. Right 
next to `fileSchema` (the Parquet `MessageType`), the unqualified `schema` is a 
bit ambiguous — could you keep the name `hoodieSchema` for the field to make it 
immediately clear which schema type each field holds?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/util/HoodieSchemaConverter.java:
##########
@@ -238,6 +279,30 @@ public static HoodieSchema convertToSchema(LogicalType 
logicalType, String rowNa
     return nullable ? HoodieSchema.createNullable(schema) : schema;
   }
 
+  /**
+   * Validates that all configured VECTOR columns resolve to top-level fields 
of the row schema.
+   *
+   * <p>Invoked by {@link #convertToSchema(LogicalType, String, String)} 
before schema inference so
+   * an unknown column is rejected instead of being silently ignored during 
conversion.
+   *
+   * @param logicalType   Flink logical type
+   * @param vectorColumns parsed vector columns (normalized column name to 
dimension), may be empty
+   */
+  private static void validateVectorColumns(LogicalType logicalType, 
Map<String, Integer> vectorColumns) {
+    if (vectorColumns.isEmpty()) {
+      return;
+    }
+    List<String> normalizedFieldNames = ((RowType) 
logicalType).getFieldNames().stream()
+        .map(fieldName -> fieldName.toLowerCase(Locale.ROOT))
+        .collect(Collectors.toList());
+    vectorColumns.keySet().stream()
+        .filter(vectorColumn -> !normalizedFieldNames.contains(vectorColumn))
+        .findFirst()
+        .ifPresent(vectorColumn -> {

Review Comment:
   🤖 nit: using `ifPresent(x -> { throw ... })` to signal an error is 
non-idiomatic — `ifPresent` reads as a "do something with a value" operation, 
not an exception gate. Have you considered `findFirst().ifPresent(col -> ...)` 
→ just storing the Optional and throwing explicitly if `isPresent()`? Something 
like `Optional<String> unknown = ...; if (unknown.isPresent()) throw new 
IllegalArgumentException(...)` makes the control flow immediately obvious.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] feat(flink): Support writing VECTOR columns for flink writer [hudi]

Reply via email to