alamb commented on code in PR #7919: URL: https://github.com/apache/arrow-rs/pull/7919#discussion_r2202901396
########## parquet-variant-compute/src/variant_get.rs: ########## @@ -0,0 +1,265 @@ +use std::sync::Arc; + +use arrow::{ + array::{ + Array, ArrayRef, ArrowPrimitiveType, BinaryArray, PrimitiveArray, PrimitiveBuilder, + StructArray, + }, + compute::CastOptions, + datatypes::UInt64Type, + error::Result, +}; +use arrow_schema::{ArrowError, DataType, Field}; +use parquet_variant::Variant; + +use crate::utils::variant_from_struct_array; + +/// Returns an array with the specified path extracted from the variant values. +pub fn variant_get(input: &ArrayRef, options: GetOptions) -> Result<ArrayRef> { + let (struct_array, metadata_array, value_array) = variant_from_struct_array(input)?; + + // TODO: can we use OffsetBuffer and NullBuffer here instead? + // I couldn't find a way to set individual indices here, so went with vecs. + let mut offsets = vec![0; struct_array.len()]; + let mut nulls = if let Some(struct_nulls) = struct_array.nulls() { + struct_nulls.iter().collect() + } else { + vec![true; struct_array.len()] + }; + + for path in options + .path + .0 + .iter() + .take(options.path.0.len().saturating_sub(1)) + { + match path { + VariantPathElement::Field { name } => { + go_to_object_field( + struct_array, + metadata_array, + value_array, + name, + &mut offsets, + &mut nulls, + )?; + } + VariantPathElement::Index { offset } => { + go_to_array_index( + struct_array, + metadata_array, + value_array, + *offset, + &mut offsets, + &mut nulls, + )?; + } + } + } + + let as_type = options.as_type.ok_or_else(|| { Review Comment: another way to potentially simplify / break this PR up into smaller parts would be to first focus on a `variant_get` that only supports returning the requested field as another VariantArray. We could then work out how to support casting to primitive types after that. ########## parquet-variant-compute/src/variant_get.rs: ########## @@ -0,0 +1,265 @@ +use std::sync::Arc; + +use arrow::{ + array::{ + Array, ArrayRef, ArrowPrimitiveType, BinaryArray, PrimitiveArray, PrimitiveBuilder, + StructArray, + }, + compute::CastOptions, + datatypes::UInt64Type, + error::Result, +}; +use arrow_schema::{ArrowError, DataType, Field}; +use parquet_variant::Variant; + +use crate::utils::variant_from_struct_array; + +/// Returns an array with the specified path extracted from the variant values. +pub fn variant_get(input: &ArrayRef, options: GetOptions) -> Result<ArrayRef> { Review Comment: While trying to review this PR, one thing that I thought might help could be to separate out the Variant traversal and the construction of the output For example, perhaps you could implement a function to find the appropriate sub Variant like ```rust impl Variant { fn get_path(&self, path: &VariantPath) -> Option<Variant> { ... } } ``` With that building block, you could implement the basic "extract a variant" kernel to a new VariantArray using the newly introduced `VariantArrayBuilder` ```rust let mut output_builder = VariantArrayBuilder::new(); // loop over each input row for input_variant in input_variant_array.iter() { // copy the input variant to the output builder let mut vb = VariantBuilder::new() vb.append(input_variant) let (metadata, value) = vb.build(); output_builder.append_buffers() } ``` I think once we get https://github.com/apache/arrow-rs/pull/7914 from @friendlymatthew in that should work One downside of this approach is that it will copy the variant over and if the source and destinations are both BinaryViewArray we can probably be much more clever ########## parquet-variant-compute/src/variant_get.rs: ########## @@ -0,0 +1,265 @@ +use std::sync::Arc; + +use arrow::{ + array::{ + Array, ArrayRef, ArrowPrimitiveType, BinaryArray, PrimitiveArray, PrimitiveBuilder, + StructArray, + }, + compute::CastOptions, + datatypes::UInt64Type, + error::Result, +}; +use arrow_schema::{ArrowError, DataType, Field}; +use parquet_variant::Variant; + +use crate::utils::variant_from_struct_array; + +/// Returns an array with the specified path extracted from the variant values. +pub fn variant_get(input: &ArrayRef, options: GetOptions) -> Result<ArrayRef> { + let (struct_array, metadata_array, value_array) = variant_from_struct_array(input)?; + + // TODO: can we use OffsetBuffer and NullBuffer here instead? + // I couldn't find a way to set individual indices here, so went with vecs. + let mut offsets = vec![0; struct_array.len()]; + let mut nulls = if let Some(struct_nulls) = struct_array.nulls() { + struct_nulls.iter().collect() + } else { + vec![true; struct_array.len()] + }; + + for path in options Review Comment: As I mentioned above I think it might be easier to implement this function as a method on variant itself -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org