github-actions[bot] commented on code in PR #64949:
URL: https://github.com/apache/doris/pull/64949#discussion_r3490591619


##########
be/src/core/data_type_serde/data_type_string_serde.cpp:
##########
@@ -246,6 +246,29 @@ Status 
DataTypeStringSerDeBase<ColumnType>::write_column_to_arrow(
     } else if (array_builder->type()->id() == arrow::Type::STRING) {
         auto& builder = assert_cast<arrow::StringBuilder&>(*array_builder);
         return write_column_to_arrow_impl(column, null_map, builder, start, 
end);
+    } else if (array_builder->type()->id() == arrow::Type::BINARY) {
+        auto& builder = assert_cast<arrow::BinaryBuilder&>(*array_builder);
+        return write_column_to_arrow_impl(column, null_map, builder, start, 
end);
+    } else if (array_builder->type()->id() == arrow::Type::FIXED_SIZE_BINARY) {
+        auto& builder = 
assert_cast<arrow::FixedSizeBinaryBuilder&>(*array_builder);
+        const int byte_width =
+                static_cast<const 
arrow::FixedSizeBinaryType&>(*array_builder->type()).byte_width();
+        const auto& string_column = assert_cast<const ColumnType&>(column);
+        for (size_t string_i = start; string_i < end; ++string_i) {
+            if (null_map && (*null_map)[string_i]) {
+                RETURN_IF_ERROR(checkArrowStatus(builder.AppendNull(), column, 
builder));
+                continue;
+            }
+            auto string_ref = string_column.get_data_at(string_i);
+            if (string_ref.size != byte_width) {

Review Comment:
   `CatalogProperty` leaves `enable.mapping.varbinary` false by default, and in 
that mode `IcebergUtils.icebergPrimitiveTypeToDorisType()` still exposes 
Iceberg UUID columns as Doris `STRING`. The writer then passes the original 
Iceberg schema through `VIcebergPartitionWriter`/`VParquetTransformer`, so this 
PR creates a `fixed_size_binary(16)` Arrow builder while the block column is 
still handled by the string serde. A normal UUID value like 
`123e4567-e89b-12d3-a456-426614174000` is 36 bytes, so this width check returns 
`InvalidArgument` instead of writing the row. Please either convert UUID 
strings to their 16-byte representation before appending, or keep this 
schema/data-type mapping consistent for the default UUID-as-STRING path, and 
add coverage that writes a UUID column with `enable.mapping.varbinary=false`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to