Re: [PR] Support reading/writing `VariantArray` to parquet with correct LogicalType [arrow-rs]

via GitHub Wed, 17 Sep 2025 12:48:26 -0700


alamb commented on code in PR #8365:
URL: https://github.com/apache/arrow-rs/pull/8365#discussion_r2356610808



##########
parquet-variant-compute/src/variant_array.rs:
##########
@@ -24,12 +24,54 @@ use arrow::datatypes::{
     Float16Type, Float32Type, Float64Type, Int16Type, Int32Type, Int64Type, Int8Type, UInt16Type,
     UInt32Type, UInt64Type, UInt8Type,
 };
+use arrow_schema::extension::ExtensionType;
 use arrow_schema::{ArrowError, DataType, Field, FieldRef, Fields};
 use parquet_variant::Uuid;
 use parquet_variant::Variant;
 use std::any::Any;
 use std::sync::Arc;
 
+/// Variant Canonical Extension Type
+pub struct VariantType;
+
+impl ExtensionType for VariantType {
+    const NAME: &'static str = "parquet.variant";
+
+    // Variants have no extension metadata
+    type Metadata = ();
+
+    fn metadata(&self) -> &Self::Metadata {
+        &()
+    }
+
+    fn serialize_metadata(&self) -> Option<String> {
+        None
+    }
+
+    fn deserialize_metadata(_metadata: Option<&str>) -> Result<Self::Metadata, ArrowError> {
+        Ok(())
+    }
+
+    fn supports_data_type(&self, data_type: &DataType) -> Result<(), ArrowError> {
+        // Note don't check for metadata/value fields here because they may be
+        // absent in shredded variants
+        if matches!(data_type, DataType::Struct(_)) {
+            Ok(())

Review Comment:
   I don't fully understand the problem, and thus probably don't fully 
understand your proposal.
   
   I would expect `variant_get` to always preserve the existing shredding 
structure. If the user wants to unshred the result (e.g. ensure the output 
of `variant_get` is an unshredded variant), I would specify any conversions 
as part of `variant_get` rather than as some annotation on the `VariantArray` 
itself.
   
   Maybe we could add a `variant_unshred` kernel, potentially with a path 
argument to only partially unshred an object 🤔  I am a little unclear on the 
use case for that type of operation, though.
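
To make the `variant_unshred` idea above concrete, here is a minimal sketch of what "partially unshred an object along a path" could mean. It is a toy model only: `Value`, `Node`, `to_value`, and this `variant_unshred` signature are all hypothetical names invented for illustration, and none of them are part of the arrow-rs or parquet-variant API. The sketch assumes a shredded object is a map of fields, each of which is either still shredded or stored as a single opaque value.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for an unshredded variant value (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Object(BTreeMap<String, Value>),
}

/// Toy stand-in for one row of a possibly shredded variant (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    /// Stored as a single opaque (unshredded) value.
    Binary(Value),
    /// A leaf shredded into a typed integer column.
    TypedInt(i64),
    /// An object whose fields are shredded individually.
    Object(BTreeMap<String, Node>),
}

/// Collapse a node, shredded or not, into a plain value.
fn to_value(node: &Node) -> Value {
    match node {
        Node::Binary(v) => v.clone(),
        Node::TypedInt(i) => Value::Int(*i),
        Node::Object(fields) => Value::Object(
            fields
                .iter()
                .map(|(k, n)| (k.clone(), to_value(n)))
                .collect(),
        ),
    }
}

/// Unshred only the sub-tree reached by `path`; everything else keeps its
/// existing shredding structure. An empty path unshreds the whole node.
fn variant_unshred(node: &Node, path: &[&str]) -> Node {
    match (path.split_first(), node) {
        // End of the path: collapse this sub-tree into one opaque value.
        (None, _) => Node::Binary(to_value(node)),
        // Descend into the named field and leave sibling fields untouched.
        (Some((head, rest)), Node::Object(fields)) => {
            let rebuilt = fields
                .iter()
                .map(|(k, n)| {
                    let n = if k.as_str() == *head {
                        variant_unshred(n, rest)
                    } else {
                        n.clone()
                    };
                    (k.clone(), n)
                })
                .collect();
            Node::Object(rebuilt)
        }
        // The path does not lead into a shredded object: nothing to do.
        (Some(_), other) => other.clone(),
    }
}
```

In this model, calling `variant_unshred(&node, &["a"])` collapses only field `a` and leaves its siblings shredded, while an empty path collapses the entire value. Whether a real kernel should take a path at all is exactly the open question raised in the comment.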





