limenilbuz opened a new issue, #9976:
URL: https://github.com/apache/arrow-rs/issues/9976
I used pyarrow to create a sample Parquet file containing UUID columns (the `arrow.uuid` canonical extension type).
Here is a test script that reads it back with the Rust `parquet` crate:
```rust
use std::fs::File;

use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

fn main() {
    // Open the Parquet file written by pyarrow.
    let file = File::open("uuids.parquet").unwrap();
    let reader = ParquetRecordBatchReaderBuilder::try_new(file).unwrap();

    // Inspect the first field of the Arrow schema decoded from the file.
    let schema = reader.schema();
    let field = schema.fields.get(0).unwrap();
    println!("{:?}", field);

    // Try to interpret the field's extension metadata as a canonical extension type.
    match field.try_canonical_extension_type() {
        Ok(extension_type) => println!("I am of extension type: {:?}", extension_type),
        _ => println!("I am NOT an extension type"),
    }
}
```
And here is the output of `cargo run`:
```
$ cargo run
Field { name: "uuids", data_type: FixedSizeBinary(16), metadata: {"ARROW:extension:metadata": "", "ARROW:extension:name": "arrow.uuid"} }
I am NOT an extension type
```
This happens because, in the C++ definition of
[UuidType](https://github.com/paleolimbot/arrow/blob/060062178ca85fa2d7dbd4083574bca6f91cc44c/cpp/src/arrow/extension/uuid.h#L52),
the serialization methods (`Serialize`/`Deserialize`) both write and expect the
empty string as the extension metadata. The Rust
[Uuid](https://github.com/apache/arrow-rs/blob/2108f20db1f6bc300bc6e1deacc0fca299e7feda/arrow-schema/src/extension/canonical/uuid.rs#L53)
extension type instead returns `None` from `serialize_metadata` and expects
`None` in `deserialize_metadata`, so the `Some("")` value read from the
pyarrow-written file is rejected.
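The mismatch can be modeled without any Arrow dependency. The sketch below is a minimal illustration under stated assumptions: `extension_metadata` and `deserialize_uuid_metadata_strict` are hypothetical helpers, not arrow-rs APIs, and the error text is illustrative. It shows that a metadata key present with an empty value becomes `Some("")`, which a check that only accepts `None` rejects.

```rust
use std::collections::HashMap;

/// Metadata key the Arrow format uses for serialized extension metadata.
const EXTENSION_METADATA_KEY: &str = "ARROW:extension:metadata";

/// Hypothetical helper: look up the serialized extension metadata in a field's
/// metadata map. A key that is present with an empty value yields `Some("")`,
/// which is distinct from the key being absent (`None`).
fn extension_metadata(metadata: &HashMap<String, String>) -> Option<&str> {
    metadata.get(EXTENSION_METADATA_KEY).map(String::as_str)
}

/// Hypothetical model of the current Rust behavior: only `None` is accepted.
fn deserialize_uuid_metadata_strict(metadata: Option<&str>) -> Result<(), String> {
    match metadata {
        None => Ok(()),
        Some(_) => Err("Uuid extension type expects no metadata".to_string()),
    }
}

fn main() {
    // Field metadata as written by pyarrow (mirrors the Debug output in the issue).
    let mut from_pyarrow = HashMap::new();
    from_pyarrow.insert("ARROW:extension:name".to_string(), "arrow.uuid".to_string());
    from_pyarrow.insert(EXTENSION_METADATA_KEY.to_string(), String::new());

    // Field metadata with no serialized-metadata key at all.
    let mut without_key = HashMap::new();
    without_key.insert("ARROW:extension:name".to_string(), "arrow.uuid".to_string());

    let a = deserialize_uuid_metadata_strict(extension_metadata(&from_pyarrow));
    let b = deserialize_uuid_metadata_strict(extension_metadata(&without_key));
    println!("pyarrow-style metadata: {:?}", a);
    println!("no metadata key:        {:?}", b);
}
```

Running it, the pyarrow-style map is rejected while the map without the key is accepted, which matches the "NOT an extension type" result above.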
I found this
[comment](https://github.com/apache/arrow-rs/pull/5822/changes#r1926065958) in
the pull request that originally added the extension types. The simplest fix is
to accept the empty string as valid metadata, treating it the same as no metadata.
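A minimal sketch of that fix, written as a standalone function rather than as a patch to arrow-rs: `deserialize_uuid_metadata` is a hypothetical name, it returns `Result<(), String>` instead of arrow-rs's `ArrowError`, and the error message is illustrative. It accepts both a missing key and an empty string, and still rejects any non-empty payload.

```rust
/// Hypothetical sketch of the proposed fix: treat an empty-string
/// `ARROW:extension:metadata` value the same as a missing one.
fn deserialize_uuid_metadata(metadata: Option<&str>) -> Result<(), String> {
    match metadata {
        // No key, or a key with an empty value (as written by Arrow C++/pyarrow).
        None | Some("") => Ok(()),
        // Any non-empty payload is still rejected, since `arrow.uuid` has no parameters.
        Some(other) => Err(format!("Uuid extension type expects no metadata, got {other:?}")),
    }
}

fn main() {
    for input in [None, Some(""), Some("unexpected")] {
        println!("{:?} -> {:?}", input, deserialize_uuid_metadata(input));
    }
}
```

Keeping the rejection of non-empty strings means only the empty-string case changes, so files carrying unexpected metadata are still flagged.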