scovich commented on issue #8549:
URL: https://github.com/apache/arrow-rs/issues/8549#issuecomment-3367125440
> FWIW I would expect a Decimal32Array, i.e. the arrow version, to use the embedded arrow schema to roundtrip. But I suspect you are referring to roundtripping outside arrow.

You're right, the parquet files used by the variant shredding integration tests are written by parquet-mr, not arrow:
```
File path:  parquet-testing/shredded_variant/case-025.parquet
Created by: parquet-mr version 1.16.0-SNAPSHOT (build ee34713e4d906d61f95d2b09145945638b2e2296)
Properties:
  parquet.avro.schema: {"type":"record","name":"table","fields":[{"name":"id","type":"int"},{"name":"var","type":["null",{"type":"record","name":"var","fields":[{"name":"metadata","type":"bytes"},{"name":"value","type":["null","bytes"],"default":null},{"name":"typed_value","type":["null","int"],"default":null}]}],"default":null}]}
  writer.model.name: avro
Schema:
message table {
  required int32 id = 1;
  optional group var (VARIANT(1)) = 2 {
    required binary metadata;
    optional binary value;
    optional int32 typed_value (DECIMAL(9,4));
  }
}

Row group 0:  count: 1  127.00 B records  start: 4  total(compressed): 127 B  total(uncompressed): 127 B
--------------------------------------------------------------------------------
                 type      encodings count     avg size   nulls   min / max
id               INT32     _ _       1         27.00 B    0       "1" / "1"
var.metadata     BINARY    _ _       1         36.00 B    0       "0x010000" / "0x010000"
var.value        BINARY    _ _       1         30.00 B    1
var.typed_value  INT32     _ _       1         34.00 B    0       "-12345.6789" / "-12345.6789"
```
... though even lacking arrow info, it seems like an `INT32 (Decimal(9, 4))` column really is `Decimal32`?
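To make that concrete, here is a minimal Rust sketch of a reader that picks an Arrow decimal type purely from the Parquet physical type, i.e. with no embedded Arrow schema to consult. The enums and the `natural_arrow_decimal` / `format_decimal` helpers are hypothetical stand-ins, not arrow-rs APIs. It also spells out the storage arithmetic for case-025: at scale 4, `-12345.6789` is the unscaled integer `-123456789`, which fits in an `i32` because any precision-9 value has at most 9 digits (at most 999,999,999, below `i32::MAX` = 2,147,483,647).

```rust
/// Hypothetical stand-in for arrow-rs's Decimal32/64/128/256 data types
/// (precision, scale), so this sketch has no crate dependencies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ArrowDecimal {
    Decimal32(u8, i8),
    Decimal64(u8, i8),
    Decimal128(u8, i8),
    Decimal256(u8, i8),
}

/// Hypothetical stand-in for the Parquet physical types that can carry DECIMAL.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PhysicalType {
    Int32,
    Int64,
    FixedLenByteArray(usize),
}

/// Assumed policy: use the Arrow decimal whose width matches the physical type;
/// for FIXED_LEN_BYTE_ARRAY, fall back to the precision (<= 38 fits Decimal128).
fn natural_arrow_decimal(physical: PhysicalType, precision: u8, scale: i8) -> ArrowDecimal {
    match physical {
        PhysicalType::Int32 => ArrowDecimal::Decimal32(precision, scale),
        PhysicalType::Int64 => ArrowDecimal::Decimal64(precision, scale),
        PhysicalType::FixedLenByteArray(_) if precision <= 38 => {
            ArrowDecimal::Decimal128(precision, scale)
        }
        PhysicalType::FixedLenByteArray(_) => ArrowDecimal::Decimal256(precision, scale),
    }
}

/// Renders an unscaled decimal integer at the given scale, e.g. -123456789 @ 4.
fn format_decimal(unscaled: i64, scale: u32) -> String {
    let divisor = 10u64.pow(scale);
    let sign = if unscaled < 0 { "-" } else { "" };
    let abs = unscaled.unsigned_abs();
    let (int_part, frac_part) = (abs / divisor, abs % divisor);
    if scale == 0 {
        format!("{}{}", sign, int_part)
    } else {
        // Zero-pad the fractional digits to exactly `scale` places.
        format!("{}{}.{:0width$}", sign, int_part, frac_part, width = scale as usize)
    }
}

fn main() {
    // case-025: `optional int32 typed_value (DECIMAL(9,4))`
    let ty = natural_arrow_decimal(PhysicalType::Int32, 9, 4);
    println!("{:?}", ty); // Decimal32(9, 4)

    // The stored INT32 is the unscaled value; the scale is applied on read.
    let unscaled: i32 = -123_456_789;
    println!("{}", format_decimal(unscaled as i64, 4)); // -12345.6789
}
```

The sketch only shows that the physical type and precision carry enough information to choose the narrower type; whether a reader should do so is the question being discussed.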
> IIRC there may even be some smarts on the write path to use smaller representations when writing Decimal128Array based on the precision, which would further complicate any changes here.

You mean, the parquet writer might emit an `INT32 (Decimal(9, 4))` column
when asked to write out a `Decimal128(9, 4)` column? Or it might emit an `INT32
(Decimal(30, 4))` column if it notices that the values all happen to fit?
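For reference, the Parquet format's LogicalTypes spec lists INT32 for DECIMAL precision 1 to 9, INT64 for precision 1 to 18, and FIXED_LEN_BYTE_ARRAY(n) for precision up to floor(log10(2^(8n-1) - 1)). The Rust sketch below assumes a hypothetical writer that picks the narrowest legal physical type from the precision; the `narrowest_physical_type`, `min_bytes_for_precision` and `is_valid_pairing` helpers are illustrative only and do not describe what the arrow-rs writer actually does. Under those ranges, the first scenario (`Decimal128(9, 4)` written as `INT32 (Decimal(9, 4))`) is spec-legal, while `INT32 (Decimal(30, 4))` is not a pairing the spec allows: precision 30 would need a 13-byte FIXED_LEN_BYTE_ARRAY.

```rust
/// Hypothetical stand-in for the Parquet physical types that can carry DECIMAL.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PhysicalType {
    Int32,
    Int64,
    FixedLenByteArray(usize),
}

/// Smallest n such that a signed n-byte two's-complement integer holds every
/// unscaled value of `precision` digits. Sketch limit: precision 1..=38 (u128 math).
fn min_bytes_for_precision(precision: u32) -> usize {
    let max_unscaled = 10u128.pow(precision) - 1;
    (1..=16)
        .find(|&n: &usize| (1u128 << (8 * n - 1)) - 1 >= max_unscaled)
        .expect("precision too large for this sketch")
}

/// Assumed writer policy: narrowest physical type the spec permits for `precision`.
fn narrowest_physical_type(precision: u32) -> PhysicalType {
    match precision {
        1..=9 => PhysicalType::Int32,
        10..=18 => PhysicalType::Int64,
        _ => PhysicalType::FixedLenByteArray(min_bytes_for_precision(precision)),
    }
}

/// Checks a (physical type, precision) pairing against the spec's ranges.
fn is_valid_pairing(physical: PhysicalType, precision: u32) -> bool {
    match physical {
        PhysicalType::Int32 => (1..=9).contains(&precision),
        PhysicalType::Int64 => (1..=18).contains(&precision),
        // Short-circuit keeps min_bytes_for_precision within its 1..=38 limit.
        PhysicalType::FixedLenByteArray(n) => {
            (1..=38).contains(&precision) && min_bytes_for_precision(precision) <= n
        }
    }
}

fn main() {
    // A Decimal128(9, 4) column could legally be written as INT32 (DECIMAL(9,4)).
    println!("{:?}", narrowest_physical_type(9)); // Int32
    // Precision 30 needs 13 bytes, so INT32 cannot carry DECIMAL(30,4).
    println!("{:?}", narrowest_physical_type(30)); // FixedLenByteArray(13)
    println!("{}", is_valid_pairing(PhysicalType::Int32, 30)); // false
}
```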
Either way, I thought the parquet writer's job was to emit what it was told
to emit, not get fancy with data types and automatic conversions?? That came up
_several_ times in variant shredding discussions...