github-actions[bot] commented on code in PR #64949:
URL: https://github.com/apache/doris/pull/64949#discussion_r3490591619
##########
be/src/core/data_type_serde/data_type_string_serde.cpp:
##########
@@ -246,6 +246,29 @@ Status
DataTypeStringSerDeBase<ColumnType>::write_column_to_arrow(
} else if (array_builder->type()->id() == arrow::Type::STRING) {
auto& builder = assert_cast<arrow::StringBuilder&>(*array_builder);
return write_column_to_arrow_impl(column, null_map, builder, start,
end);
+ } else if (array_builder->type()->id() == arrow::Type::BINARY) {
+ auto& builder = assert_cast<arrow::BinaryBuilder&>(*array_builder);
+ return write_column_to_arrow_impl(column, null_map, builder, start,
end);
+ } else if (array_builder->type()->id() == arrow::Type::FIXED_SIZE_BINARY) {
+ auto& builder =
assert_cast<arrow::FixedSizeBinaryBuilder&>(*array_builder);
+ const int byte_width =
+ static_cast<const
arrow::FixedSizeBinaryType&>(*array_builder->type()).byte_width();
+ const auto& string_column = assert_cast<const ColumnType&>(column);
+ for (size_t string_i = start; string_i < end; ++string_i) {
+ if (null_map && (*null_map)[string_i]) {
+ RETURN_IF_ERROR(checkArrowStatus(builder.AppendNull(), column,
builder));
+ continue;
+ }
+ auto string_ref = string_column.get_data_at(string_i);
+ if (string_ref.size != byte_width) {
Review Comment:
`CatalogProperty` leaves `enable.mapping.varbinary` false by default, and in
that mode `IcebergUtils.icebergPrimitiveTypeToDorisType()` still exposes
Iceberg UUID columns as Doris `STRING`. The writer then passes the original
Iceberg schema through `VIcebergPartitionWriter`/`VParquetTransformer`, so this
PR creates a `fixed_size_binary(16)` Arrow builder while the block column is
still handled by the string serde. A normal UUID value like
`123e4567-e89b-12d3-a456-426614174000` is 36 bytes, so this width check returns
`InvalidArgument` instead of writing the row. Please either convert UUID
strings to their 16-byte representation before appending, or keep this
schema/data-type mapping consistent for the default UUID-as-STRING path, and
add coverage that writes a UUID column with `enable.mapping.varbinary=false`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]