github-actions[bot] commented on code in PR #64949:
URL: https://github.com/apache/doris/pull/64949#discussion_r3490606269
##########
be/src/format/table/iceberg/arrow_schema_util.cpp:
##########
@@ -75,13 +75,24 @@ Status ArrowSchemaUtil::convert_to(const iceberg::NestedField& field,
break;
}
- case iceberg::TypeID::BINARY:
case iceberg::TypeID::STRING:
- case iceberg::TypeID::UUID:
- case iceberg::TypeID::FIXED:
arrow_type = arrow::utf8();
break;
+ case iceberg::TypeID::BINARY:
+ arrow_type = arrow::binary();
+ break;
+
+ case iceberg::TypeID::UUID:
+ arrow_type = arrow::fixed_size_binary(16);
Review Comment:
This makes UUID writes fail for the default Iceberg catalog mapping. FE
still exposes Iceberg `uuid` columns as Doris `STRING` unless
`enable.mapping.varbinary` is set, and `IcebergTableSink` sends the original
Iceberg schema JSON to BE. With this change BE builds an Arrow
`fixed_size_binary(16)` field, then `convert_to_arrow_batch` calls the
`DataTypeStringSerDe` for the Doris string column. A normal UUID value like
`550e8400-e29b-41d4-a716-446655440000` is 36 bytes, so the new fixed-size
branch returns
`InvalidArgument("Fixed size binary column expects 16 bytes, got 36")`
instead of writing the row. Please either convert canonical UUID strings
to the 16-byte Iceberg representation before appending, or keep this mapping
aligned with the Doris column type / varbinary catalog setting. An end-to-end
BE test that writes a UUID-valued block through the Iceberg schema would catch
this.
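
For illustration, the first option (converting a canonical UUID string to the
16-byte representation before appending) could look roughly like the sketch
below. `parse_canonical_uuid` and `hex_value` are hypothetical names, not
existing Doris or Arrow functions, and the sketch assumes the 16 bytes are laid
out in the same order as the hex digits of the canonical string (big-endian).
Where the conversion would actually be wired in (the string SerDe, the sink, or
elsewhere) is left open.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: maps one hex digit to its value, or -1 if invalid.
inline int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: parses the 36-character canonical form
// "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" into 16 bytes, in the same order
// as the hex digits appear. Returns std::nullopt for any other input.
inline std::optional<std::array<uint8_t, 16>> parse_canonical_uuid(std::string_view text) {
    if (text.size() != 36) return std::nullopt;

    std::array<uint8_t, 16> out{};
    std::size_t byte_idx = 0;
    for (std::size_t i = 0; i < text.size();) {
        // Dashes must sit exactly at offsets 8, 13, 18 and 23.
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (text[i] != '-') return std::nullopt;
            ++i;
            continue;
        }
        const int hi = hex_value(text[i]);
        const int lo = hex_value(text[i + 1]);
        if (hi < 0 || lo < 0) return std::nullopt;
        out[byte_idx++] = static_cast<uint8_t>((hi << 4) | lo);
        i += 2;
    }
    return out;
}
```

With a helper of this shape, the 36-byte string from the example above would
yield a 16-byte value that fits a `fixed_size_binary(16)` column, and malformed
input could be reported as an error instead of being appended.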
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]