alamb opened a new issue, #7895:
URL: https://github.com/apache/arrow-rs/issues/7895

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   - part of https://github.com/apache/arrow-rs/issues/6736
   - This is a follow up to https://github.com/apache/arrow-rs/issues/7715
   
   As we begin to contemplate how to read and write shredded variants, we will 
need some way to construct Arrow arrays that contain shredded variants.
   
   Physically these will be Arrow `StructArray`s with two or three fields:
   * Non-shredded (2 fields): `STRUCT { metadata: Binary, value: Binary }`
   * Shredded (3 fields): `STRUCT { metadata: Binary, value: Binary, 
typed_value: STRUCT { ... } }`
   
   More information on how to represent Variants as Arrow arrays can be found 
in the proposal: 
   - https://github.com/apache/arrow/issues/46908
   - Google Document: 
https://docs.google.com/document/d/1pw0AWoMQY3SjD7R4LgbPvMjG_xSCtXp3rZHkVp9jpZ4/edit?usp=sharing
   
   
   **Describe the solution you'd like**
   
   I would like some way to construct such shredded arrays easily and 
efficiently in idiomatic Rust style.
   
   **Describe alternatives you've considered**
   
   One idea from @zeroshade (thank you!) is to create a 
`VariantArrayBuilder` that is responsible for building the correct 
`StructArray`s from variants, including shredding out any columns. To 
create a shredded output, you would provide the shredded schema up front.
   
   For example (based on the Go implementation), to create a shredded Arrow 
array that shreds out columns "foo" and "bar" from any variant objects, 
we would need this schema:
   ```text
   STRUCT {
     metadata: BinaryView,
     value: BinaryView,
     typed_value: STRUCT {
       foo: Int64,
       bar: Int32
     }
   }
   ```
   
   The code would look like this:
   ```rust
   // Create an arrow Field that describes the desired shredded output schema.
   // The outer field name ("variant") and the nullability flags are illustrative.
   let shredded_schema = Field::new_struct(
       "variant",
       vec![
           Field::new("metadata", DataType::BinaryView, false),
           Field::new("value", DataType::BinaryView, true),
           Field::new_struct(
               "typed_value",
               vec![
                   Field::new("foo", DataType::Int64, true),
                   Field::new("bar", DataType::Int32, true),
               ],
               true,
           ),
       ],
       false,
   );
   
   // Create a builder for an array (batch) of Variant values
   // (VariantArrayBuilder and its methods below are the proposed API)
   let mut array_builder = VariantArrayBuilder::new(shredded_schema);
   
   // Append a row to the builder
   let mut object = array_builder.new_object();
   // ... add appropriate fields ...
   // use like a normal ObjectBuilder(??)
   object.finish();
   
   // Append a second row (has no foo or bar fields)
   array_builder.append_value(43);
   // ...
   
   // Finalize the builder
   let variant_array: StructArray = array_builder.build()?;
   // variant_array is a shredded variant
   ```
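
To make the intended shredding behavior concrete, here is a minimal std-only sketch of what such a builder might do with each appended row. Everything in it is hypothetical and simplified: `ToyVariant`, `ShreddedColumns`, `ToyShreddingBuilder`, and `shred_example` are illustration-only names, the toy value type stands in for the real binary Variant encoding, plain `Vec`s stand in for Arrow arrays, and both shredded fields are stored as `i64` rather than the `Int64`/`Int32` pair in the schema above. It only shows the routing idea: matching object fields go to typed columns, and everything else stays in the residual `value` column.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a Variant value (the real type is a binary encoding).
#[derive(Debug, Clone, PartialEq)]
enum ToyVariant {
    Int(i64),
    Str(String),
    Object(BTreeMap<String, ToyVariant>),
}

/// Hypothetical columnar output of shredding.
#[derive(Debug, Default)]
struct ShreddedColumns {
    /// Residual `value` column: whatever was not shredded (None if nothing is left).
    value: Vec<Option<ToyVariant>>,
    /// One `typed_value.<field>` column per shredded field name.
    typed: BTreeMap<String, Vec<Option<i64>>>,
}

/// Hypothetical builder that shreds integer fields out of object rows.
struct ToyShreddingBuilder {
    fields: Vec<String>,
    columns: ShreddedColumns,
}

impl ToyShreddingBuilder {
    /// The shredded field names are provided up front, like the schema in the proposal.
    fn new(fields: &[&str]) -> Self {
        let mut columns = ShreddedColumns::default();
        for f in fields {
            columns.typed.insert(f.to_string(), Vec::new());
        }
        Self {
            fields: fields.iter().map(|f| f.to_string()).collect(),
            columns,
        }
    }

    /// Append one row, routing shreddable fields to their typed columns.
    fn append(&mut self, row: ToyVariant) {
        match row {
            ToyVariant::Object(mut map) => {
                for name in &self.fields {
                    // Shred the field only if it is present and is an integer.
                    let typed = match map.get(name) {
                        Some(ToyVariant::Int(i)) => Some(*i),
                        _ => None,
                    };
                    if typed.is_some() {
                        map.remove(name);
                    }
                    self.columns.typed.get_mut(name).unwrap().push(typed);
                }
                // Fields that were not shredded remain in the residual column.
                let residual = if map.is_empty() {
                    None
                } else {
                    Some(ToyVariant::Object(map))
                };
                self.columns.value.push(residual);
            }
            other => {
                // Non-object rows have no fields to shred: typed columns get nulls.
                for name in &self.fields {
                    self.columns.typed.get_mut(name).unwrap().push(None);
                }
                self.columns.value.push(Some(other));
            }
        }
    }

    fn build(self) -> ShreddedColumns {
        self.columns
    }
}

/// Two rows: an object with foo, bar and an extra field, then the plain value 43.
fn shred_example() -> ShreddedColumns {
    let mut builder = ToyShreddingBuilder::new(&["foo", "bar"]);
    let mut obj = BTreeMap::new();
    obj.insert("foo".to_string(), ToyVariant::Int(1));
    obj.insert("bar".to_string(), ToyVariant::Int(2));
    obj.insert("baz".to_string(), ToyVariant::Str("x".to_string()));
    builder.append(ToyVariant::Object(obj));
    builder.append(ToyVariant::Int(43));
    builder.build()
}
```

In this sketch the first row contributes `foo` and `bar` to the typed columns and leaves only `baz` in the residual column, while the second row (the plain value 43) lands entirely in the residual column with nulls in both typed columns. A real implementation would also have to handle type mismatches, nested objects, and the metadata column, none of which are modeled here.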
   
   I think a `VariantArrayBuilder` will be helpful for use cases other than 
Variant, and @harshmotw-db has created some version of one here:
   - https://github.com/apache/arrow-rs/issues/7883
   
   
   ### Prior Art
   Golang implementation: 
   - https://github.com/apache/arrow-go/blob/main/arrow/extensions/variant_test.go
   - https://github.com/apache/arrow-go/blob/main/arrow/extensions/variant.go
   - Here are some examples of it being used: https://github.com/apache/arrow-go/blob/b196d3b316d09f63786f021d4f1baa1fdd7620d2/arrow/extensions/variant_test.go#L363-L391
   - Spark variant code: https://github.com/apache/spark/tree/master/common/variant/src/main/java/org/apache/spark/types/variant
   
   


