Re: [I] [Parquet] Implement Variant type support in Parquet [arrow-rs]

via GitHub Thu, 06 Mar 2025 03:32:32 -0800


alamb commented on issue #6736:
URL: https://github.com/apache/arrow-rs/issues/6736#issuecomment-2703586014


   I spent some time listening and thinking about this on the parquet call 
yesterday: https://lists.apache.org/thread/cnn6264g56jktrwmplz89x8cgkcvr4ql
   
   Note there is a thread on the arrow mailing list about adding variant 
support in arrow-rs:
   - https://lists.apache.org/thread/lsmkmxsp1qvjzn497z582hjm0w8hmg0n
   
   (and it looks like @wjones127 made some sort of demo using an extension type 
in datafusion):
   * https://github.com/datafusion-contrib/datafusion-functions-variant
   
   
   > Unfortunately I can't see an obvious way to be able to represent this sort 
of semi-structured data within the arrow format 
   
   What I suggest is that the parquet reader reads variant columns as `Binary` 
/ `LargeBinary` with an arrow extension type annotation, which would let 
downstream projects recognize the column as variant and interpret its bytes correctly.
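
A minimal sketch of what that annotation could look like, using only the standard library. Arrow field metadata is a string-to-string map, and the Arrow format reserves the `ARROW:extension:name` key for extension type names. The extension name `example.variant` and both function names are placeholders I made up for illustration; nothing in this thread settles on a name. In arrow-rs a map like this would be attached to the `Binary` / `LargeBinary` field as its metadata.

```rust
use std::collections::HashMap;

/// Field-metadata key the Arrow columnar format reserves for extension type names.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Hypothetical extension name for variant columns (an assumption, not an agreed name).
const VARIANT_EXTENSION_NAME: &str = "example.variant";

/// Metadata a reader could attach to the Binary / LargeBinary field it produces.
pub fn variant_field_metadata() -> HashMap<String, String> {
    HashMap::from([(
        EXTENSION_NAME_KEY.to_string(),
        VARIANT_EXTENSION_NAME.to_string(),
    )])
}

/// How a downstream project could check whether a field carries the annotation.
pub fn is_variant_field(metadata: &HashMap<String, String>) -> bool {
    metadata.get(EXTENSION_NAME_KEY).map(String::as_str) == Some(VARIANT_EXTENSION_NAME)
}
```

The same check on the write side is one possible answer to how the writer learns that a binary column should be annotated as variant, though that is only one option.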
   
   I think one challenge will be how to tell the parquet writer to write / 
annotate the columns as variant.
   
   Before we can do anything useful with the variant type, we'll need a library 
to parse / interpret a variant value (i.e. the equivalent of a JSON parser plus a 
set of value objects).
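
To give a feel for the size of such a library, here is a rough sketch of decoding the first byte of a variant value. It assumes my reading of the Parquet Variant encoding document: the low 2 bits of the first byte are the basic type (0 primitive, 1 short string, 2 object, 3 array), the upper 6 bits are a type-specific header, primitive type IDs 0 to 3 are null, true, false and int8, and a short string's header is its byte length. The type and function names are hypothetical, and the layout should be checked against the spec before relying on it.

```rust
/// The four basic types stored in the low 2 bits of a variant value's first byte.
#[derive(Debug, PartialEq)]
pub enum BasicType {
    Primitive,
    ShortString,
    Object,
    Array,
}

/// A deliberately tiny subset of decoded values; everything else is `Unsupported`.
#[derive(Debug, PartialEq)]
pub enum Value<'a> {
    Null,
    Bool(bool),
    Int8(i8),
    ShortString(&'a str),
    Unsupported,
}

/// Split the first byte into (basic type, 6-bit type-specific header).
pub fn split_header(byte: u8) -> (BasicType, u8) {
    let basic = match byte & 0b11 {
        0 => BasicType::Primitive,
        1 => BasicType::ShortString,
        2 => BasicType::Object,
        _ => BasicType::Array,
    };
    (basic, byte >> 2)
}

/// Decode one value; returns `None` when the buffer is empty, truncated,
/// or holds invalid UTF-8 for a short string.
pub fn decode(value: &[u8]) -> Option<Value<'_>> {
    let (&first, rest) = value.split_first()?;
    match split_header(first) {
        (BasicType::Primitive, 0) => Some(Value::Null),
        (BasicType::Primitive, 1) => Some(Value::Bool(true)),
        (BasicType::Primitive, 2) => Some(Value::Bool(false)),
        (BasicType::Primitive, 3) => rest.first().map(|b| Value::Int8(*b as i8)),
        (BasicType::ShortString, len) => {
            // For short strings the header is the string length in bytes.
            let bytes = rest.get(..len as usize)?;
            std::str::from_utf8(bytes).ok().map(Value::ShortString)
        }
        // Objects, arrays, and the remaining primitives need the metadata
        // dictionary and more decoding logic than this sketch covers.
        _ => Some(Value::Unsupported),
    }
}
```

A real library would also need the metadata (field name dictionary) half of the encoding, object and array navigation, and the remaining primitive types.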
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
