yuhao-su commented on code in PR #2188:
URL: https://github.com/apache/iceberg-rust/pull/2188#discussion_r2984966011


##########
crates/iceberg/src/arrow/reader.rs:
##########
@@ -840,8 +844,46 @@ impl ArrowReader {
         arrow_schema: &ArrowSchemaRef,
         type_promotion_is_valid: fn(Option<&PrimitiveType>, Option<&PrimitiveType>) -> bool,
     ) -> Result<ProjectionMask> {
-        let mut column_map = HashMap::new();
+        // Maps field_id → leaf column indices. Vec because variant contributes two
+        // leaves (metadata + value) under a single field ID.
+        let mut column_map: HashMap<i32, Vec<usize>> = HashMap::new();
         let fields = arrow_schema.fields();
+        // HashSet for O(1) membership checks instead of O(n) slice scans.
+        let leaf_field_id_set: HashSet<i32> = leaf_field_ids.iter().copied().collect();
+
+        // Variant fields are an Iceberg leaf type but a Parquet GROUP.  Their
+        // sub-fields (metadata, value) carry no embedded field IDs — only the
+        // parent group has the field ID. filter_leaves therefore never finds
+        // them via the standard field-ID scan below.
+        //
+        // Java's PruneColumns.variant() simply returns the group as-is with no
+        // type-compatibility check (isStruct() also short-circuits on isVariantType()).
+        // We replicate that here: pre-scan top-level Arrow struct fields whose
+        // field ID resolves to Type::Variant and record all their sub-fields so
+        // the second filter_leaves can include them directly.
+        let mut variant_sub_fields: HashMap<FieldRef, i32> = HashMap::new();
+        for top_field in fields.iter() {

Review Comment:
   This will fail to find the `metadata`/`value` leaf nodes of the variant.
   The following errors can occur when the variant is nested in a Map:
   ```
   Map field must have exactly 2 fields
   partial projection of MapArray is not supported
   ```
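
To illustrate the concern, here is a minimal sketch of a recursive walk that reaches variant groups at any depth, including inside a Map value. It uses a hypothetical toy `Node` enum, not the real Arrow or Parquet types, and `collect_leaves`, `build_column_map`, and `example_schema` are made-up names for illustration. It also assumes an unshredded variant that contributes exactly two leaves (metadata and value), with leaf column indices assigned in depth-first order.

```rust
use std::collections::HashMap;

/// Hypothetical toy schema node; NOT the real Arrow or Parquet types.
#[derive(Debug)]
enum Node {
    /// A primitive column occupying one leaf.
    Primitive { field_id: i32 },
    /// A variant group: the field ID sits on the group, and its two
    /// leaves (metadata, value) carry no field IDs of their own.
    Variant { field_id: i32 },
    /// A struct whose children are visited in order.
    Struct { children: Vec<Node> },
    /// A map with a key subtree and a value subtree.
    Map { key: Box<Node>, value: Box<Node> },
}

/// Walk the schema depth-first, handing out leaf column indices in visit
/// order and recording every leaf index under its owning field ID.
fn collect_leaves(node: &Node, next_leaf: &mut usize, out: &mut HashMap<i32, Vec<usize>>) {
    match node {
        Node::Primitive { field_id } => {
            out.entry(*field_id).or_default().push(*next_leaf);
            *next_leaf += 1;
        }
        Node::Variant { field_id } => {
            // Assumption: unshredded variant, so exactly two leaves.
            let leaves = out.entry(*field_id).or_default();
            leaves.push(*next_leaf); // metadata
            leaves.push(*next_leaf + 1); // value
            *next_leaf += 2;
        }
        Node::Struct { children } => {
            for child in children {
                collect_leaves(child, next_leaf, out);
            }
        }
        Node::Map { key, value } => {
            collect_leaves(key, next_leaf, out);
            collect_leaves(value, next_leaf, out);
        }
    }
}

/// Build field_id -> leaf column indices for a list of top-level fields.
fn build_column_map(roots: &[Node]) -> HashMap<i32, Vec<usize>> {
    let mut out = HashMap::new();
    let mut next_leaf = 0;
    for root in roots {
        collect_leaves(root, &mut next_leaf, &mut out);
    }
    out
}

/// Example: a primitive, a map whose value is a variant, and a struct
/// that wraps another variant.
fn example_schema() -> Vec<Node> {
    vec![
        Node::Primitive { field_id: 1 },
        Node::Map {
            key: Box::new(Node::Primitive { field_id: 2 }),
            value: Box::new(Node::Variant { field_id: 3 }),
        },
        Node::Struct {
            children: vec![Node::Variant { field_id: 4 }],
        },
    ]
}
```

With this toy schema, `build_column_map(&example_schema())` records both leaves of the map-nested variant under field ID 3, which a scan limited to top-level fields would not reach. Whether the real reader should recurse this way or handle Map and List separately is a design question for the PR.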



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
