sdf-jkl commented on PR #10015:
URL: https://github.com/apache/arrow-rs/pull/10015#issuecomment-4633398313

   > Meanwhile -- requesting changes because AFAIK this tweak (hack?) only 
applies to top-level fields of a shredded variant column. We have no testing 
for nested uuid columns, but I'm pretty sure they would still write out to 
parquet as bare binary arrays? 
   
   Parquet writer recursively travels through each field and runs 
`arrow_to_parquet_type` on it, so nested fields ext type is preserved. (I added 
a nested uuid unit test that passes - 
https://github.com/apache/arrow-rs/pull/10015/commits/70d2242b5615e70151d7926cd03e688e189a2c00)
   
   
https://github.com/apache/arrow-rs/blob/e5e66fa05ce986fa6a3f61c36c11ff6b476cb76c/parquet/src/arrow/schema/mod.rs#L776-L783
   
   ----
   
   
   > Probably not even validated (because VariantArray::from_parts does not 
invoke canonicalize_shredded_types)?
   
   I think the point of `from_parts` is that we assume we created valid parts 
for it to build from. `canonicalize_shredded_types` is more for validating 
already created `VariantArray`s that came from somewhere and we don't know if 
they're valid (that's why used in `VariantArray::try_new`).
   
   ----
   
   
   > Thinking more about this -- should canonicalize_and_verify_data_type add 
the UUID extension type to FixedSizeBinary(16) fields it encounters? (it 
already has a match arm that enforces the length). In theory it would be easier 
to plumb the necessary Field information through that private helper, but I'm 
not sure how it would work with top-level UUID fields?
   > 
   > Also -- why is VariantArray::from_parts the correct (and only) place we 
should be imposing the UUID extension type? Do other code paths need similar 
treatment so the behavior is consistent?
   
   I added uuid support to `canonicalize_and_verify_field`. It will add uuid 
ext type on read for `VariantArray` that come from different writers that 
potentially omit it.
   
   I'm not sure about that last change, so let's think about it :nerd_face: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to