Re: [I] [Variant] Add low level support for shredding and unshredding [arrow-rs]

via GitHub Fri, 11 Jul 2025 06:49:14 -0700


scovich commented on issue #7715:
URL: https://github.com/apache/arrow-rs/issues/7715#issuecomment-3062418674


   > > > Do we need an `unshred_variant` kernel?
   > > 
   > > 
   > > Yes.
   > > If nothing else, we need a way for engines that don't support shredding
   > > to correctly consume shredded variant. Or for engines that do support
   > > shredding to some degree but don't want the high complexity of propagating
   > > shredded variant all through the query plan above the scan. Or for engines
   > > that fully support variant but want to write it back out with a different
   > > shredding schema (see below).
   > 
   > I also agree with this [@scovich](https://github.com/scovich); however,
   > I am not quite sure what the API would look like yet, so I am not sure
   > what ticket to file.
   
   The public API seems simple enough? A shredded variant column would
(physically) be a `StructArray` with a `typed_value` field alongside its
`metadata` and `value` fields. I would expect an `unshred_variant` kernel to take
such an input and produce an output that no longer has a `typed_value` column.
The spec requires that the `metadata` column already contain every needed
variant path name, so it's really just a matter of rewriting the `value` column
under the hood, folding each row's `typed_value` back into its variant-encoded
bytes.
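To make the row-level rewrite concrete, here is a minimal std-only sketch. It is not arrow-rs code: `ShreddedRow`, `UnshreddedRow`, `unshred_row` and `unshred_variant` are hypothetical stand-ins for the Arrow `StructArray` columns, and the sketch assumes the column was shredded with a single Int64 `typed_value`. The header byte follows the Variant encoding spec (basic type 0 = primitive in the low 2 bits, primitive type id 6 = Int64 in the upper 6 bits). Treating "both `value` and `typed_value` set" as an error is one reading of the shredding spec for non-object types.

```rust
// Hypothetical, simplified model of one row of a shredded variant column.
// Real columns are Arrow `StructArray`s; here each field is just an Option.
// Assumption: the column is shredded as a single Int64 `typed_value`.

const INT64_TYPE_ID: u8 = 6;

/// Primitive value header: basic_type 0 in the low 2 bits,
/// primitive type id in the upper 6 bits.
fn primitive_header(type_id: u8) -> u8 {
    type_id << 2
}

#[derive(Debug, Clone)]
struct ShreddedRow {
    metadata: Vec<u8>,        // passed through unchanged
    value: Option<Vec<u8>>,   // variant-encoded fallback bytes
    typed_value: Option<i64>, // shredded Int64 value
}

#[derive(Debug, Clone, PartialEq)]
struct UnshreddedRow {
    metadata: Vec<u8>,
    value: Option<Vec<u8>>, // None models a missing value
}

/// Encode an i64 as variant Int64 bytes: header byte + 8 little-endian bytes.
fn encode_int64(v: i64) -> Vec<u8> {
    let mut out = Vec::with_capacity(9);
    out.push(primitive_header(INT64_TYPE_ID));
    out.extend_from_slice(&v.to_le_bytes());
    out
}

/// Hypothetical row-level unshred: fold `typed_value` back into `value`.
fn unshred_row(row: &ShreddedRow) -> Result<UnshreddedRow, String> {
    let value = match (&row.typed_value, &row.value) {
        // Shredded value present: re-encode it as variant bytes.
        (Some(v), None) => Some(encode_int64(*v)),
        // Not shredded (or missing): keep whatever `value` holds.
        (None, v) => v.clone(),
        // Both set is not valid for a non-object shredded type.
        (Some(_), Some(_)) => {
            return Err("value and typed_value both set for a non-object type".to_string())
        }
    };
    // The metadata dictionary is reused as-is; no new keys are needed.
    Ok(UnshreddedRow { metadata: row.metadata.clone(), value })
}

/// Hypothetical column-level kernel: unshred every row, failing on the first error.
fn unshred_variant(rows: &[ShreddedRow]) -> Result<Vec<UnshreddedRow>, String> {
    rows.iter().map(unshred_row).collect()
}
```

Because `metadata` passes through untouched, only `value` is rebuilt. A real kernel would do the same per column over Arrow buffers and would also need to recurse into shredded objects and arrays.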
   
   The internal API (for low-level variant operations) would leverage a variant
builder with some tweaks:
   * It wraps a `VariantMetadata` instead of a `VariantMetadataBuilder`, and field
insertions fail if the key is not already present in the metadata dictionary.
   * It can manually inject the bytes of an existing variant value as the value
of an object field or array element, so we can bring across existing
variant-encoded fields, with optional full validation in case the input
variant bytes are untrusted.
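A rough std-only sketch of the two builder tweaks follows. Everything here is hypothetical and not the arrow-rs API: `FixedMetadata` stands in for wrapping an existing `VariantMetadata` (field ids are dictionary positions, and unknown names are rejected rather than added), `ObjectSketch::insert_encoded` stands in for injecting pre-encoded variant bytes, and `shallow_validate` is only a placeholder for full validation (it checks just Int64 and short-string lengths).

```rust
use std::collections::HashMap;

/// Hypothetical read-only view of an existing metadata dictionary
/// (stands in for wrapping a `VariantMetadata`).
struct FixedMetadata {
    ids: HashMap<String, u32>,
}

impl FixedMetadata {
    /// Field id = position of the name in the dictionary.
    fn new(field_names: &[&str]) -> Self {
        let ids = field_names
            .iter()
            .enumerate()
            .map(|(i, name)| (name.to_string(), i as u32))
            .collect();
        FixedMetadata { ids }
    }

    /// Unlike a metadata *builder*, never adds keys: unknown names fail.
    fn field_id(&self, name: &str) -> Result<u32, String> {
        self.ids
            .get(name)
            .copied()
            .ok_or_else(|| format!("field `{name}` not present in metadata"))
    }
}

/// Hypothetical object builder recording (field id, encoded value bytes).
struct ObjectSketch<'m> {
    metadata: &'m FixedMetadata,
    fields: Vec<(u32, Vec<u8>)>,
}

impl<'m> ObjectSketch<'m> {
    fn new(metadata: &'m FixedMetadata) -> Self {
        ObjectSketch { metadata, fields: Vec::new() }
    }

    /// Inject already-encoded variant bytes as a field value.
    /// `validate` is a placeholder for full validation of untrusted input.
    fn insert_encoded(&mut self, name: &str, bytes: &[u8], validate: bool) -> Result<(), String> {
        let id = self.metadata.field_id(name)?;
        if validate {
            shallow_validate(bytes)?;
        }
        self.fields.push((id, bytes.to_vec()));
        Ok(())
    }
}

/// Shallow check only (hypothetical): non-empty, Int64 has 8 data bytes,
/// short-string length matches its header. Real validation walks the value.
fn shallow_validate(bytes: &[u8]) -> Result<(), String> {
    let header = *bytes.first().ok_or("empty variant value")?;
    let basic_type = header & 0b11;
    let upper = header >> 2;
    match (basic_type, upper) {
        // Primitive Int64 (type id 6): header byte plus 8 data bytes.
        (0, 6) if bytes.len() != 9 => Err("truncated Int64 value".to_string()),
        // Short string: the upper 6 bits hold the byte length.
        (1, len) if bytes.len() != 1 + len as usize => {
            Err("short string length mismatch".to_string())
        }
        _ => Ok(()),
    }
}
```

Rejecting unknown keys up front matches the point above: when unshredding, the existing metadata must already cover every field name, so a missing key signals malformed input rather than something to add.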
   
   That's what immediately comes to mind; some pathfinding would probably be 
required to go any further.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.