alamb commented on issue #7941: URL: https://github.com/apache/arrow-rs/issues/7941#issuecomment-3090534867
@friendlymatthew in https://github.com/apache/arrow-rs/pull/7915#discussion_r2203418536 Hi, how do we plan on storing `typed_value`s? Do we plan on encoding it as a `Variant` and storing the binary representation of it? Or since it is supposed to be strongly typed, we want to map to specific `DataType`s, which would require something like https://github.com/apache/arrow-rs/pull/7921 I've been playing around with the unshredding logic and the most naive version would look something like this. I don't mean this to be the actual implementation, but I am curious how we plan on representing `typed_value`. <details> <summary>naive reconstruct logic</summary> ```rs pub fn reconstruct_variant( metadata: &[u8], value: Option<&[u8]>, typed_value: Option<(&[u8], &[u8])>, // this is itself a variant ) -> Result<(Vec<u8>, Vec<u8>), ArrowError> { match typed_value { Some((typed_metadata, typed_value)) => { let variant = Variant::try_new(typed_metadata, typed_value)?; match variant { Variant::Null | Variant::Int8(_) | Variant::Int16(_) | Variant::Int32(_) | Variant::Int64(_) | Variant::Date(_) | Variant::TimestampMicros(_) | Variant::TimestampNtzMicros(_) | Variant::Decimal4(_) | Variant::Decimal8(_) | Variant::Decimal16(_) | Variant::Float(_) | Variant::Double(_) | Variant::BooleanTrue | Variant::BooleanFalse | Variant::Binary(_) | Variant::String(_) | Variant::ShortString(_) => Ok((typed_metadata.to_vec(), typed_value.to_vec())), Variant::Object(shredded_object) => { if let Some(value) = value { let variant = Variant::try_new(metadata, value)?; let partially_shredded_object = variant.as_object().ok_or(ArrowError::InvalidArgumentError( "partially shredded value must be an object".to_string(), ))?; let shredded_keys: HashSet<&str> = HashSet::from_iter(shredded_object.iter().map(|(k, _)| k)); let partially_shredded_keys: HashSet<&str> = HashSet::from_iter(partially_shredded_object.iter().map(|(k, _)| k)); if !shredded_keys.is_disjoint(&partially_shredded_keys) { return Err(ArrowError::InvalidArgumentError( "object keys must be disjoint".to_string(), )); } // union the two objects together let mut variant_builder = VariantBuilder::new(); let mut object_builder = variant_builder.new_object(); for (f, v) in shredded_object.iter() { object_builder.insert(f, v); } for (f, v) in partially_shredded_object.iter() { object_builder.insert(f, v); } object_builder.finish()?; return Ok(variant_builder.finish()); } Ok((typed_metadata.to_vec(), typed_value.to_vec())) } Variant::List(_) => { if value.is_some() { return Err(ArrowError::InvalidArgumentError( "shredded array must not conflict with variant value".to_string(), )); } return Ok((typed_metadata.to_vec(), typed_value.to_vec())); } } } None => match value { Some(value) => Ok((metadata.to_vec(), value.to_vec())), None => Err(ArrowError::InvalidArgumentError( "No value or typed value provided".to_string(), )), }, } } ``` </details> If we strongly type `typed_value` to whatever the variant type is, the primitive variants work nicely, since we'd be storing a `PrimitiveArray<T>` and we can easily index into it. For complex variants like objects/lists, I'm having a hard time figuring out how to strongly type these values. Maybe for these cases we should just store the binary encoded version of a Variant? (metadata, value). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org