scovich commented on code in PR #8365:
URL: https://github.com/apache/arrow-rs/pull/8365#discussion_r2360909097


##########
parquet-variant-compute/src/variant_array.rs:
##########
@@ -24,12 +24,54 @@ use arrow::datatypes::{
     Float16Type, Float32Type, Float64Type, Int16Type, Int32Type, Int64Type, Int8Type, UInt16Type,
     UInt32Type, UInt64Type, UInt8Type,
 };
+use arrow_schema::extension::ExtensionType;
 use arrow_schema::{ArrowError, DataType, Field, FieldRef, Fields};
 use parquet_variant::Uuid;
 use parquet_variant::Variant;
 use std::any::Any;
 use std::sync::Arc;
 
+/// Variant Canonical Extension Type
+pub struct VariantType;
+
+impl ExtensionType for VariantType {
+    const NAME: &'static str = "parquet.variant";
+
+    // Variants have no extension metadata
+    type Metadata = ();
+
+    fn metadata(&self) -> &Self::Metadata {
+        &()
+    }
+
+    fn serialize_metadata(&self) -> Option<String> {
+        None
+    }
+
+    fn deserialize_metadata(_metadata: Option<&str>) -> Result<Self::Metadata, ArrowError> {
+        Ok(())
+    }
+
+    fn supports_data_type(&self, data_type: &DataType) -> Result<(), ArrowError> {
+        // Note don't check for metadata/value fields here because they may be
+        // absent in shredded variants
+        if matches!(data_type, DataType::Struct(_)) {
+            Ok(())

Review Comment:
   The scenario I'm thinking of is a more forgiving replacement for normal JSON 
parsing, where the requested schema has a mix of strongly typed and variant 
fields: parse the JSON to variant, then use `variant_get` to extract the 
actual schema of interest. Fields known to have inconsistent (or legally 
flexible) behavior would be kept as variant, with strict type enforcement 
applied only to the strongly typed fields (similar to today's JSON parsing 
with a schema). In that case we have a (potentially deeply) nested struct 
with multiple variant leaf fields, and fetching those fields one at a time 
would be pretty annoying.
   
   If the caller _wants_ a specific schema -- whether binary variant or a 
specific shredding -- it's easy enough to pass that. The difficulty comes if 
the caller just wants to get back whatever flavor of variant is already there 
(without shredding or unshredding it first).
   
   I don't see how `cast_to_variant` would help in that case?
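
   To make the mixed-schema scenario concrete, here is a minimal, std-only sketch. Everything in it is hypothetical: `Requested`, `variant_leaf_paths`, and `example_schema` are illustrative names, not arrow-rs or parquet-variant APIs, and the field names are made up. It only models a requested schema with typed and variant leaves and lists the variant leaf paths that would each need a separate fetch if fields were extracted one at a time.

```rust
// Hypothetical model for illustration only; none of these types exist in
// arrow-rs or parquet-variant.

/// A node of the schema the caller asks for.
#[derive(Debug, Clone, PartialEq)]
enum Requested {
    /// Strongly typed leaf: strict type enforcement applies.
    /// The string is just a label for the example, not a real DataType.
    Typed(&'static str),
    /// Variant leaf: return whatever flavor of variant is already there.
    Variant,
    /// Nested struct with named children.
    Struct(Vec<(&'static str, Requested)>),
}

/// Collect the dotted path of every variant leaf, depth first.
fn variant_leaf_paths(node: &Requested, prefix: &str, out: &mut Vec<String>) {
    match node {
        Requested::Typed(_) => {}
        Requested::Variant => out.push(prefix.to_string()),
        Requested::Struct(children) => {
            for (name, child) in children {
                let path = if prefix.is_empty() {
                    name.to_string()
                } else {
                    format!("{prefix}.{name}")
                };
                variant_leaf_paths(child, &path, out);
            }
        }
    }
}

/// A made-up requested schema mixing typed and variant leaves.
fn example_schema() -> Requested {
    Requested::Struct(vec![
        ("id", Requested::Typed("Int64")),
        (
            "payload",
            Requested::Struct(vec![
                ("timestamp", Requested::Typed("Timestamp")),
                ("attributes", Requested::Variant),
                (
                    "device",
                    Requested::Struct(vec![
                        ("model", Requested::Typed("Utf8")),
                        ("extra", Requested::Variant),
                    ]),
                ),
            ]),
        ),
    ])
}

fn main() {
    let mut paths = Vec::new();
    variant_leaf_paths(&example_schema(), "", &mut paths);
    // Each printed path is a variant leaf that would need its own
    // extraction call if fields were fetched individually.
    for path in &paths {
        println!("{path}");
    }
}
```

   Even this small schema has two variant leaves at different depths, which is the case where a single request for the whole nested struct would be more convenient than one call per leaf.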


