liamzwbao commented on code in PR #8768:
URL: https://github.com/apache/arrow-rs/pull/8768#discussion_r2492349064


##########
parquet-variant-compute/src/variant_to_arrow.rs:
##########
@@ -60,6 +61,9 @@ pub(crate) enum PrimitiveVariantToArrowRowBuilder<'a> {
     TimestampNanoNtz(VariantToTimestampNtzArrowRowBuilder<'a, datatypes::TimestampNanosecondType>),
     Time(VariantToPrimitiveArrowRowBuilder<'a, datatypes::Time64MicrosecondType>),
     Date(VariantToPrimitiveArrowRowBuilder<'a, datatypes::Date32Type>),
+    Utf8(VariantToUtf8ArrowRowBuilder<'a, i32>),
+    LargeUtf8(VariantToUtf8ArrowRowBuilder<'a, i64>),
+    Binary(VariantToBinaryArrowRowBuilder<'a>),

Review Comment:
   Agreed on the first point: we may not need the metadata column when the 
value is explicitly cast to Binary.
   
   For the second point, I think "primitive" here refers to Arrow primitives, 
because we check `is_primitive` and return an error if the requested data type 
is not primitive 
[here](https://github.com/apache/arrow-rs/blob/6be6cbadf1f9bc0705b0aa97ca434ae3677ed43f/parquet-variant-compute/src/variant_to_arrow.rs#L325-L334).
 Also, based on the definition 
[here](https://github.com/apache/arrow-rs/blob/6be6cbadf1f9bc0705b0aa97ca434ae3677ed43f/parquet-variant-compute/src/variant_to_arrow.rs#L36),
 this builder converts variant values to primitive Arrow types. Given that, 
having `Null/Boolean/Uuid` in this builder muddies the intent. I’d prefer we 
rename the builder and update the docs as needed. We don't need a separate 
builder, though, since we can reuse a lot of the code here.
   
   For Variant primitives, we had a related 
[discussion](https://github.com/apache/arrow-rs/pull/8600#discussion_r2442296788)
 before. I think Parquet enforces primitive types at write time and when 
unshredding, to ensure valid data. But for `variant_get`, it’s reasonable to 
allow casting to any valid Arrow data type, even those without a direct 
Variant-primitive counterpart (e.g., `Decimal256` or `LargeUtf8`).
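
To make the naming concern concrete, here is a minimal, self-contained sketch of the dispatch being discussed. It does not use the real arrow-rs types: `SketchType`, `SketchRowBuilder`, and `make_row_builder` are hypothetical stand-ins, and the `is_primitive` check is only an assumption modeled loosely on how an Arrow-style primitive check excludes booleans, strings, and binary.

```rust
// Hypothetical stand-in for Arrow's data type enum (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum SketchType {
    Int32,
    Date32,
    Boolean,
    Utf8,
    LargeUtf8,
    Binary,
    Struct,
}

impl SketchType {
    // Assumed rule for this sketch: only fixed-width numeric and temporal
    // types count as "primitive".
    fn is_primitive(self) -> bool {
        matches!(self, SketchType::Int32 | SketchType::Date32)
    }
}

// Hypothetical row-builder kinds. Once Boolean, Utf8, and Binary are handled
// here, the enum covers more than Arrow primitives, which is the reason a
// rename is suggested.
#[derive(Debug, PartialEq)]
enum SketchRowBuilder {
    Primitive(SketchType),
    Boolean,
    Utf8 { large_offsets: bool },
    Binary,
}

// Picks a builder for the requested target type, or reports an error for
// types this sketch does not support.
fn make_row_builder(requested: SketchType) -> Result<SketchRowBuilder, String> {
    match requested {
        t if t.is_primitive() => Ok(SketchRowBuilder::Primitive(t)),
        SketchType::Boolean => Ok(SketchRowBuilder::Boolean),
        SketchType::Utf8 => Ok(SketchRowBuilder::Utf8 { large_offsets: false }),
        SketchType::LargeUtf8 => Ok(SketchRowBuilder::Utf8 { large_offsets: true }),
        SketchType::Binary => Ok(SketchRowBuilder::Binary),
        other => Err(format!("unsupported target type: {:?}", other)),
    }
}
```

In this sketch a strict `is_primitive` gate would reject `Utf8`, `LargeUtf8`, and `Binary`, so supporting them means either relaxing the gate or giving the builder a name that no longer promises "primitive".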


