alamb commented on issue #7715:
URL: https://github.com/apache/arrow-rs/issues/7715#issuecomment-3058435051

   > We should start figuring out what that low-level support looks like. A 
likely starting point would be the ability to insert and remove specific 
variant values from an existing variant object. These should be cheap 
byte-shuffling operations that don't waste time introspecting unrelated parts 
of the variant value buffer. And it needs to be efficient even when doing 
recursive inserts and removes as part of a partial (un)shredding operation.
   
   
   @friendlymatthew and I spoke a bit about this API today. Here is what I 
think I heard
   
   Add new kernels (in the `parquet-variant-compute` crate that @harshmotw-db 
is making in https://github.com/apache/arrow-rs/pull/7884), something like
   
   ## Field Access
   
   The first kernel we need is one that extracts a field from a variant.
   
   Here is a Databricks function that does this: https://docs.databricks.com/gcp/en/sql/language-manual/functions/variant_get
   
   ```rust
   /// Given a StructArray with Variant values stored as `metadata`, `value`,
   /// and (optionally) `typed_value` fields, returns the specified field.
   ///
   /// The returned array might be another Variant StructArray, or a primitive
   /// array / StringArray if the requested field was shredded.
   pub fn variant_get(variant_array: StructArray, path: VariantPath) -> Result<ArrayRef> {
       // ...
   }
   ```
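   
   For illustration only, a call might look like the sketch below; `VariantPath::from` and the `input` variable are assumptions made for this example (the path representation itself is still an open question):
   
   ```rust
   // Illustrative only: extract the `address.city` field from a variant column.
   // `input` is a StructArray with `metadata` and `value` child arrays, and
   // `VariantPath::from` is an assumed constructor, not an agreed-upon API.
   let city: ArrayRef = variant_get(input, VariantPath::from("address.city"))?;
   ```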
   
   Open questions:
   1. What should the "path" argument be? A String? A JSON path? Some structured thing (`Vec<PathSegment>`)? One structured option is sketched after the next code block.
   2. Should we also provide a "requested data type" argument, similar to the Databricks function?
   
   ```rust
   /// Given a StructArray with Variant values stored as `metadata`, `value`,
   /// and (optionally) `typed_value` fields, returns the specified field,
   /// cast to `as_type` if specified.
   ///
   /// If `as_type` is None, the returned array might be another Variant
   /// StructArray, or a primitive array / StringArray if the requested field
   /// was shredded.
   ///
   /// If `as_type` is Some(type), the field is returned as the specified type.
   /// To request a Variant result, pass a `Field` with the variant type in its
   /// metadata.
   pub fn variant_get(variant_array: StructArray, path: VariantPath, as_type: Option<&Field>) -> Result<ArrayRef> {
       // ...
   }
   ```
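   
   To make open question 1 above more concrete, one structured option might look like the sketch below; the names and shape here are illustrative only, not a proposed final API:
   
   ```rust
   /// Sketch of a structured path (open question 1); names are illustrative only.
   pub enum PathSegment {
       /// Access a named field of a variant object, e.g. `address`
       Field(String),
       /// Access an element of a variant array by position, e.g. `[0]`
       Index(usize),
   }
   
   /// For example, `address.zip_codes[0]` would become
   /// `VariantPath(vec![Field("address".into()), Field("zip_codes".into()), Index(0)])`
   pub struct VariantPath(pub Vec<PathSegment>);
   ```
   
   A structured form like this avoids re-parsing string paths inside the kernel and composes naturally with recursive field access.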
   
   
   ## Shredding Kernel
   ```rust
   /// Given a StructArray with Variant values stored in the `metadata` and
   /// `value` fields, returns a new StructArray with `metadata`, `value`, and
   /// `typed_value` fields in which the specified columns are "shredded" into
   /// strongly typed columns
   pub fn shred_variant(variant_array: StructArray, spec: ShreddingSpecification) -> StructArray {
       // ...
   }
   ```
   
   
   Open questions:
   1. What does `ShreddingSpecification` look like? We could look at the API in iceberg-java to figure this out (a strawman shape is sketched below).
   2. What should happen if the input `variant_array` already has some shredded columns?
   3. Do we need an `unshred_variant` kernel?
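   
   As a strawman for open question 1 (not based on the iceberg-java API; the names and shape are assumptions), `ShreddingSpecification` could be a small tree that mirrors the desired shredded schema:
   
   ```rust
   use arrow_schema::DataType;
   
   /// Strawman shape for a shredding specification; illustrative only.
   pub enum ShreddingSpec {
       /// Shred this value into a strongly typed `typed_value` column
       Typed(DataType),
       /// Recurse into an object, shredding only the listed fields; any field
       /// not listed stays in the binary `value` column
       Object(Vec<(String, ShreddingSpec)>),
   }
   ```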
   

