alamb commented on issue #6736: URL: https://github.com/apache/arrow-rs/issues/6736#issuecomment-2703586014

I spent some time listening and thinking about this on the parquet call yesterday: https://lists.apache.org/thread/cnn6264g56jktrwmplz89x8cgkcvr4ql

Note there is a thread on the arrow mailing list about adding variant support in arrow-rs:
- https://lists.apache.org/thread/lsmkmxsp1qvjzn497z582hjm0w8hmg0n

(and it looks like @wjones127 made some sort of demo using an extension type in datafusion):
* https://github.com/datafusion-contrib/datafusion-functions-variant

> Unfortunately I can't see an obvious way to be able to represent this sort of semi-structured data within the arrow format

What I suggest is that the parquet reader reads variant columns as `Binary` / `LargeBinary` with an arrow extension type annotation, which would let downstream projects interpret / read the extension type correctly.

I think one challenge will be "how to tell the parquet writer to write / annotate the columns as variant".

Before we can do anything useful with the variant type, we'll need a library to parse / interpret a variant value (aka the equivalent of a JSON parser / set of objects).
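
To make the extension type suggestion above concrete, here is a minimal std-only sketch of the idea: a binary column carries an extension name in its field metadata, and a downstream consumer checks that annotation before interpreting the bytes. The `Field` and `StorageType` types, the helper functions, and the extension name `example.variant` are all hypothetical stand-ins for illustration; only the `ARROW:extension:name` metadata key comes from the Arrow columnar format. In arrow-rs the same annotation would presumably be attached with `Field::with_metadata`.

```rust
use std::collections::HashMap;

/// Metadata key the Arrow columnar format reserves for an extension type's name.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Hypothetical extension name, used here only as a placeholder.
const VARIANT_EXTENSION_NAME: &str = "example.variant";

/// Simplified stand-in for an Arrow physical (storage) type.
#[derive(Debug, Clone, PartialEq)]
enum StorageType {
    Binary,
    LargeBinary,
    Utf8,
}

/// Simplified stand-in for an Arrow schema field.
#[derive(Debug, Clone)]
struct Field {
    name: String,
    storage: StorageType,
    metadata: HashMap<String, String>,
}

/// What a reader might produce for a variant column:
/// plain binary storage plus an extension type annotation.
fn variant_field(name: &str, storage: StorageType) -> Field {
    let mut metadata = HashMap::new();
    metadata.insert(
        EXTENSION_NAME_KEY.to_string(),
        VARIANT_EXTENSION_NAME.to_string(),
    );
    Field {
        name: name.to_string(),
        storage,
        metadata,
    }
}

/// What a downstream project might check before decoding the bytes as variant.
fn is_variant(field: &Field) -> bool {
    let binary_storage = matches!(
        field.storage,
        StorageType::Binary | StorageType::LargeBinary
    );
    let annotated = field.metadata.get(EXTENSION_NAME_KEY).map(String::as_str)
        == Some(VARIANT_EXTENSION_NAME);
    binary_storage && annotated
}

fn main() {
    let annotated = variant_field("payload", StorageType::Binary);
    let plain = Field {
        name: "comment".to_string(),
        storage: StorageType::Utf8,
        metadata: HashMap::new(),
    };
    for field in [&annotated, &plain] {
        println!("{} is variant: {}", field.name, is_variant(field));
    }
}
```

The same metadata check could also run in the other direction, as one possible answer to the writer question: the writer could treat the annotation on an input field as the signal to write that column as variant.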