CTTY commented on code in PR #2188:
URL: https://github.com/apache/iceberg-rust/pull/2188#discussion_r2957005373
##########
crates/iceberg/src/avro/schema.rs:
##########
@@ -220,6 +221,39 @@ impl SchemaVisitor for SchemaToAvroSchema {
}
}
+ fn variant(&mut self, _v: &VariantType) -> Result<AvroSchemaOrField> {
+ let fields = vec![
+ AvroRecordField {
+ name: "metadata".to_string(),
+ schema: AvroSchema::Bytes,
+ order: RecordFieldOrder::Ignore,
+ position: 0,
+ doc: None,
+ aliases: None,
+ default: None,
+ custom_attributes: Default::default(),
+ },
+ AvroRecordField {
+ name: "value".to_string(),
+ schema: AvroSchema::Bytes,
+ order: RecordFieldOrder::Ignore,
+ position: 1,
+ doc: None,
+ aliases: None,
+ default: None,
+ custom_attributes: Default::default(),
+ },
+ ];
Review Comment:
Should these be static?
##########
crates/iceberg/src/arrow/reader.rs:
##########
@@ -840,8 +844,46 @@ impl ArrowReader {
arrow_schema: &ArrowSchemaRef,
type_promotion_is_valid: fn(Option<&PrimitiveType>,
Option<&PrimitiveType>) -> bool,
) -> Result<ProjectionMask> {
- let mut column_map = HashMap::new();
+ // Maps field_id → leaf column indices. Vec because variant
contributes two
+ // leaves (metadata + value) under a single field ID.
+ let mut column_map: HashMap<i32, Vec<usize>> = HashMap::new();
let fields = arrow_schema.fields();
+ // HashSet for O(1) membership checks instead of O(n) slice scans.
+ let leaf_field_id_set: HashSet<i32> =
leaf_field_ids.iter().copied().collect();
+
+ // Variant fields are an Iceberg leaf type but a Parquet GROUP. Their
+ // sub-fields (metadata, value) carry no embedded field IDs — only the
+ // parent group has the field ID. filter_leaves therefore never finds
+ // them via the standard field-ID scan below.
+ //
+ // Java's PruneColumns.variant() simply returns the group as-is with no
+ // type-compatibility check (isStruct() also short-circuits on
isVariantType()).
+ // We replicate that here: pre-scan top-level Arrow struct fields whose
+ // field ID resolves to Type::Variant and record all their sub-fields
so
+ // the second filter_leaves can include them directly.
+ let mut variant_sub_fields: HashMap<FieldRef, i32> = HashMap::new();
+ for top_field in fields.iter() {
Review Comment:
What would happen if variant is nested within another type?
##########
crates/catalog/glue/src/schema.rs:
##########
@@ -182,6 +182,13 @@ impl SchemaVisitor for GlueSchemaBuilder {
Ok(glue_type)
}
+
+ fn variant(&mut self, _v: &iceberg::spec::VariantType) -> Result<Self::T> {
+ Err(Error::new(
+ ErrorKind::FeatureUnsupported,
+ "Conversion from VariantType is not supported for Glue",
+ ))
Review Comment:
we can just return "variant".to_string(), on glue it would look like below:
```
{
"data": {
"unknown": "variant"
}
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]