sdf-jkl commented on code in PR #10015:
URL: https://github.com/apache/arrow-rs/pull/10015#discussion_r3349401253


##########
parquet-variant-compute/src/variant_array.rs:
##########
@@ -868,6 +868,20 @@ impl TryFrom<&StructArray> for ShreddingState {
     }
 }
 
+/// Build the `typed_value` [`FieldRef`] for a shredded column.
+///
+/// The Variant spec maps `FixedSizeBinary(16)` exclusively to UUID, so any

Review Comment:
   | Variant Type | Parquet Physical Type             | Parquet Logical Type |
   |--------------|-----------------------------------|----------------------|
   ...
   | decimal16    | BYTE_ARRAY / FIXED_LEN_BYTE_ARRAY | DECIMAL(P, S)        |
   ...
   | uuid         | FIXED_LEN_BYTE_ARRAY[len=16]      | UUID                 |
   ...
   
   Only these two Variant types, decimal16 and uuid, can be physically stored
   as FIXED_LEN_BYTE_ARRAY (FLBA) in Parquet.
   
   On the Arrow side we have `Decimal` types, so there's no need for a
   `FixedSizeBinary` in-memory representation of decimals.
   
   For UUID we use FSB(16) with the extension type that the Parquet writer
   picks up. Nothing other than UUID can produce a shredded FSB(16).
   
   Given that only these two types can be physically stored as FLBA, it makes
   little sense to allow other types, like Binary, to be shredded into FSB in
   memory: their physical representation is limited by the spec, and we'd
   have to cast back to Binary before writing.
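   
   To make the argument concrete, here is a rough sketch of the mapping I have
   in mind. The enums and function names are hypothetical stand-ins, not
   arrow-rs or parquet-variant APIs, and mapping decimal16 to a 128-bit Arrow
   decimal is my assumption about the in-memory side:

```rust
// Hypothetical model of the mapping discussed above. These enums are
// illustrative only; they are NOT arrow-rs or parquet-variant types.

/// Variant primitive types relevant to this discussion (not exhaustive).
#[derive(Debug, Clone, Copy, PartialEq)]
enum VariantPrimitive {
    Decimal16,
    Uuid,
    Binary,
}

/// Simplified stand-in for the in-memory Arrow type of `typed_value`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ShreddedType {
    /// Assumed Arrow representation for decimal16.
    Decimal128,
    /// Fixed-size binary with the given byte width.
    FixedSizeBinary(i32),
    /// Variable-length binary.
    Binary,
}

/// In-memory type each Variant primitive is shredded into.
/// Only `Uuid` ever yields a fixed-size binary, and always of width 16.
fn shredded_type(v: VariantPrimitive) -> ShreddedType {
    match v {
        VariantPrimitive::Decimal16 => ShreddedType::Decimal128,
        VariantPrimitive::Uuid => ShreddedType::FixedSizeBinary(16),
        VariantPrimitive::Binary => ShreddedType::Binary,
    }
}

/// Reverse lookup: which Variant primitive can a shredded fixed-size binary
/// column of width `len` come from? Only width 16 (UUID) is accepted.
fn fsb_source(len: i32) -> Option<VariantPrimitive> {
    if len == 16 {
        Some(VariantPrimitive::Uuid)
    } else {
        None
    }
}
```

   Under this model, a shredded FSB column of any width other than 16 has no
   valid source type, and Binary stays Binary rather than being shredded into
   FSB.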


