sdf-jkl commented on code in PR #10015:
URL: https://github.com/apache/arrow-rs/pull/10015#discussion_r3349401253
##########
parquet-variant-compute/src/variant_array.rs:
##########
@@ -868,6 +868,20 @@ impl TryFrom<&StructArray> for ShreddingState {
}
}
+/// Build the `typed_value` [`FieldRef`] for a shredded column.
+///
+/// The Variant spec maps `FixedSizeBinary(16)` exclusively to UUID, so any
Review Comment:
| Variant Type | Parquet Physical Type             | Parquet Logical Type |
|--------------|-----------------------------------|----------------------|
| ...          |                                   |                      |
| decimal16    | BYTE_ARRAY / FIXED_LEN_BYTE_ARRAY | DECIMAL(P, S)        |
| ...          |                                   |                      |
| uuid         | FIXED_LEN_BYTE_ARRAY[len=16]      | UUID                 |
| ...          |                                   |                      |

Only these two logical Variant types can be physically stored as FLBA in
Parquet.
On the Arrow side we have `Decimal` types, so there's no need for a
`FixedSizeBinary` in-memory representation of decimals.
For UUID we use FSB(16) with the extension type that the Parquet writer picks
up. Nothing other than UUID can produce a shredded FSB(16).
Given that only these two types can be physically stored as FLBA, it makes
little sense to allow other types, like Binary, to be shredded into FSB in
memory: their physical representation is limited by the spec, and we'd
have to cast back to Binary before writing.
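
To make the rule concrete, here is a minimal sketch of the check I have in mind. `ShreddedType`, `FlbaLogicalType`, and `flba_logical_type` are hypothetical stand-ins written for this comment, not the real `arrow` `DataType` or anything in this PR; they only model the three cases discussed above.

```rust
/// Hypothetical stand-in for the few Arrow data types discussed in this
/// comment. This is NOT `arrow_schema::DataType`.
#[derive(Debug, Clone, PartialEq)]
enum ShreddedType {
    Binary,
    FixedSizeBinary(i32),
    Decimal128 { precision: u8, scale: i8 },
}

/// Hypothetical model of the Parquet logical types that the Variant spec
/// allows on top of FIXED_LEN_BYTE_ARRAY.
#[derive(Debug, Clone, PartialEq)]
enum FlbaLogicalType {
    Uuid,
    Decimal { precision: u8, scale: i8 },
}

/// Returns the logical type under which a shredded column of type `ty` may be
/// stored as FLBA, or `None` if the table above gives it no FLBA mapping.
fn flba_logical_type(ty: &ShreddedType) -> Option<FlbaLogicalType> {
    match ty {
        // FSB(16) is reserved for UUID.
        ShreddedType::FixedSizeBinary(16) => Some(FlbaLogicalType::Uuid),
        // Decimals already have a dedicated in-memory type, so they never
        // need to appear as FixedSizeBinary on the Arrow side.
        ShreddedType::Decimal128 { precision, scale } => Some(FlbaLogicalType::Decimal {
            precision: *precision,
            scale: *scale,
        }),
        // Any other FSB width, and plain Binary, has no FLBA mapping.
        ShreddedType::FixedSizeBinary(_) | ShreddedType::Binary => None,
    }
}
```

Under these assumptions, `flba_logical_type(&ShreddedType::FixedSizeBinary(16))` yields `Some(FlbaLogicalType::Uuid)`, while `ShreddedType::Binary` and any other FSB width yield `None`, which is the case I'd rather reject than cast.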