alamb commented on code in PR #7919:
URL: https://github.com/apache/arrow-rs/pull/7919#discussion_r2202901396


##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -0,0 +1,265 @@
+use std::sync::Arc;
+
+use arrow::{
+    array::{
+        Array, ArrayRef, ArrowPrimitiveType, BinaryArray, PrimitiveArray, 
PrimitiveBuilder,
+        StructArray,
+    },
+    compute::CastOptions,
+    datatypes::UInt64Type,
+    error::Result,
+};
+use arrow_schema::{ArrowError, DataType, Field};
+use parquet_variant::Variant;
+
+use crate::utils::variant_from_struct_array;
+
+/// Returns an array with the specified path extracted from the variant values.
+pub fn variant_get(input: &ArrayRef, options: GetOptions) -> Result<ArrayRef> {
+    let (struct_array, metadata_array, value_array) = 
variant_from_struct_array(input)?;
+
+    // TODO: can we use OffsetBuffer and NullBuffer here instead?
+    //       I couldn't find a way to set individual indices here, so went 
with vecs.
+    let mut offsets = vec![0; struct_array.len()];
+    let mut nulls = if let Some(struct_nulls) = struct_array.nulls() {
+        struct_nulls.iter().collect()
+    } else {
+        vec![true; struct_array.len()]
+    };
+
+    for path in options
+        .path
+        .0
+        .iter()
+        .take(options.path.0.len().saturating_sub(1))
+    {
+        match path {
+            VariantPathElement::Field { name } => {
+                go_to_object_field(
+                    struct_array,
+                    metadata_array,
+                    value_array,
+                    name,
+                    &mut offsets,
+                    &mut nulls,
+                )?;
+            }
+            VariantPathElement::Index { offset } => {
+                go_to_array_index(
+                    struct_array,
+                    metadata_array,
+                    value_array,
+                    *offset,
+                    &mut offsets,
+                    &mut nulls,
+                )?;
+            }
+        }
+    }
+
+    let as_type = options.as_type.ok_or_else(|| {

Review Comment:
   another way to potentially simplify / break this PR up into smaller parts 
would be to first focus on a `variant_get` that only supports returning the 
requested field as another VariantArray.
   
   We could then work out how to support casting to primitive types after that. 



##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -0,0 +1,265 @@
+use std::sync::Arc;
+
+use arrow::{
+    array::{
+        Array, ArrayRef, ArrowPrimitiveType, BinaryArray, PrimitiveArray, 
PrimitiveBuilder,
+        StructArray,
+    },
+    compute::CastOptions,
+    datatypes::UInt64Type,
+    error::Result,
+};
+use arrow_schema::{ArrowError, DataType, Field};
+use parquet_variant::Variant;
+
+use crate::utils::variant_from_struct_array;
+
+/// Returns an array with the specified path extracted from the variant values.
+pub fn variant_get(input: &ArrayRef, options: GetOptions) -> Result<ArrayRef> {

Review Comment:
   While trying to review this PR, one thing that I thought might help could be 
to separate out the Variant traversal and the construction of the output
   
   For example, perhaps you could implement a function to find the appropriate 
sub Variant  like
   
   ```rust
   impl Variant {
     fn get_path(&self, path: &VariantPath) -> Option<Variant> {
       ...
     }
   }
   ```
   
   With that building block, you could implement the basic "extract a variant" 
kernel to a new VariantArray using the newly introduced `VariantArrayBuilder` 
   
   ```rust
   let mut output_builder = VariantArrayBuilder::new();
   // loop over each input row
   for input_variant in input_variant_array.iter() {
     // copy the input variant to the output builder
     let mut vb = VariantBuilder::new()
     vb.append(input_variant)
     let (metadata, value) = vb.build();
     output_builder.append_buffers()
   }
   ```
   
   I think once we get https://github.com/apache/arrow-rs/pull/7914 from 
@friendlymatthew in that should work
   
   One downside of this approach is that it will copy the variant over and if 
the source and destinations are both BinaryViewArray we can probably be much 
more clever
   



##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -0,0 +1,265 @@
+use std::sync::Arc;
+
+use arrow::{
+    array::{
+        Array, ArrayRef, ArrowPrimitiveType, BinaryArray, PrimitiveArray, 
PrimitiveBuilder,
+        StructArray,
+    },
+    compute::CastOptions,
+    datatypes::UInt64Type,
+    error::Result,
+};
+use arrow_schema::{ArrowError, DataType, Field};
+use parquet_variant::Variant;
+
+use crate::utils::variant_from_struct_array;
+
+/// Returns an array with the specified path extracted from the variant values.
+pub fn variant_get(input: &ArrayRef, options: GetOptions) -> Result<ArrayRef> {
+    let (struct_array, metadata_array, value_array) = 
variant_from_struct_array(input)?;
+
+    // TODO: can we use OffsetBuffer and NullBuffer here instead?
+    //       I couldn't find a way to set individual indices here, so went 
with vecs.
+    let mut offsets = vec![0; struct_array.len()];
+    let mut nulls = if let Some(struct_nulls) = struct_array.nulls() {
+        struct_nulls.iter().collect()
+    } else {
+        vec![true; struct_array.len()]
+    };
+
+    for path in options

Review Comment:
   As I mentioned above I think it might be easier to implement this function 
as a method on variant itself



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to