The GitHub Actions job "Flink CI" on iceberg.git/fix/prune-columns-nested-fields has failed.
Run started by GitHub user IgorBerman (triggered by IgorBerman).

Head commit for run:
610fd128c1a3388a14696e3706570c380b51bc16 / Igor Berman <[email protected]>
Parquet: Fix nested field pruning in PruneColumns

When pruning nested structures (lists, maps, structs), the PruneColumns
visitor was incorrectly returning the original unpruned field when the
container's field ID was in the selectedIds set, even when child fields
had been pruned.

This fix ensures that:
1. In struct(): When a field is selected and has been pruned
   (field != originalField), use the pruned version instead of the
   original.
2. In list(): Check for pruned element first before checking if elementId
   is selected, ensuring nested pruning is applied.
3. In map(): Similarly, check for a pruned value before checking selected
   keys/values.
4. Add validatePrunedField() to verify pruned fields maintain compatibility
   with original fields (same name, ID, and repetition).
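The precedence rules in points 1 to 4 can be modelled in a few lines. The Python sketch below is a simplified, language-neutral model, not the Java code in this commit: Node, Prim, Struct, ListOf, MapOf, prune and validate_pruned are hypothetical names, the required flag stands in for Parquet repetition, and keeping map keys whole is an assumption of the model. In every container the pruned child is checked first, and the container's own selected ID is only a fallback.

```python
from dataclasses import dataclass, replace
from typing import Optional, Set, Tuple


# Hypothetical, simplified schema model (not Iceberg's or Parquet's real classes).
@dataclass(frozen=True)
class Node:
    id: int
    name: str
    required: bool = False  # stands in for Parquet repetition


@dataclass(frozen=True)
class Prim(Node):
    pass


@dataclass(frozen=True)
class Struct(Node):
    fields: Tuple[Node, ...] = ()


@dataclass(frozen=True)
class ListOf(Node):
    element: Optional[Node] = None


@dataclass(frozen=True)
class MapOf(Node):
    key: Optional[Node] = None
    value: Optional[Node] = None


def validate_pruned(pruned: Node, original: Node) -> Node:
    """Model of the validatePrunedField() idea: same name, ID and repetition."""
    if (pruned.name, pruned.id, pruned.required) != (original.name, original.id, original.required):
        raise ValueError(f"pruned field {pruned.name!r} does not match {original.name!r}")
    return pruned


def prune(node: Node, selected: Set[int]) -> Optional[Node]:
    """Return node without unselected descendants, or None if nothing under it is selected."""
    if isinstance(node, Struct):
        kept = []
        for child in node.fields:
            pruned = prune(child, selected)
            if pruned is not None:  # the pruned child wins over the original child
                kept.append(validate_pruned(pruned, child))
        if kept:
            return replace(node, fields=tuple(kept))
    elif isinstance(node, ListOf):
        element = prune(node.element, selected)  # pruned element is checked first
        if element is not None:
            return replace(node, element=validate_pruned(element, node.element))
    elif isinstance(node, MapOf):
        value = prune(node.value, selected)  # pruned value is checked first
        if value is not None:
            # model assumption: the key is kept whole, a map needs its keys
            return replace(node, value=validate_pruned(value, node.value))
        if node.key.id in selected:
            return node
    # Nothing selected underneath: keep the whole node only if its own ID is selected.
    return node if node.id in selected else None


# Example: map<string, struct<x, y>> projected down to value.x
m = MapOf(1, "m", key=Prim(2, "key", required=True),
          value=Struct(3, "value", fields=(Prim(4, "x"), Prim(5, "y"))))
print([f.name for f in prune(m, {1, 3, 4}).value.fields])  # ['x']
```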

This enables proper column pruning for deeply nested schemas like:
list<struct<field1, nested_list: list<struct<a, b, c, d>>>>

When projecting only field1, nested_list[].a and nested_list[].b, the fix
ensures that fields c and d are pruned from the Parquet projection schema.
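To see the effect on the example schema, the sketch below contrasts the old precedence (a selected container ID short-circuits to the unpruned original) with the fixed one. It is a hypothetical Python model, not Iceberg's implementation; the field IDs are made up, and it assumes selectedIds contains the IDs of the enclosing containers along each selected path as well as the leaf IDs.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Set, Tuple


# Hypothetical simplified schema nodes (not Iceberg's or Parquet's real classes).
@dataclass(frozen=True)
class Node:
    id: int
    name: str


@dataclass(frozen=True)
class Prim(Node):
    pass


@dataclass(frozen=True)
class Struct(Node):
    fields: Tuple[Node, ...] = ()


@dataclass(frozen=True)
class ListOf(Node):
    element: Optional[Node] = None


def prune(node: Node, selected: Set[int], fixed: bool = True) -> Optional[Node]:
    if not fixed and node.id in selected:
        return node  # old precedence: a selected container returns the unpruned original
    if isinstance(node, Struct):
        kept = [p for p in (prune(c, selected, fixed) for c in node.fields) if p is not None]
        if kept:
            return replace(node, fields=tuple(kept))
    elif isinstance(node, ListOf):
        element = prune(node.element, selected, fixed)
        if element is not None:
            return replace(node, element=element)
    return node if node.id in selected else None


def leaf_names(node: Node) -> List[str]:
    """Names of the primitive fields that survive, in schema order."""
    if isinstance(node, Struct):
        return [n for f in node.fields for n in leaf_names(f)]
    if isinstance(node, ListOf):
        return leaf_names(node.element)
    return [node.name]


# list<struct<field1, nested_list: list<struct<a, b, c, d>>>>
schema = ListOf(1, "items", element=Struct(2, "element", fields=(
    Prim(3, "field1"),
    ListOf(4, "nested_list", element=Struct(5, "element", fields=(
        Prim(6, "a"), Prim(7, "b"), Prim(8, "c"), Prim(9, "d"),
    ))),
)))

# field1 and nested_list[].a, b, plus the IDs of their enclosing containers (assumption)
selected = {2, 3, 4, 5, 6, 7}

print(leaf_names(prune(schema, selected, fixed=False)))  # ['field1', 'a', 'b', 'c', 'd']
print(leaf_names(prune(schema, selected)))               # ['field1', 'a', 'b']
```

With the old precedence, the selected element struct (ID 2 here) is returned whole, so c and d leak into the projection; with the fixed precedence they are dropped.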

Note: When a struct is explicitly selected (SELECT struct_field,
struct_field.sub_field), the full struct is preserved because
field == originalField in that case.
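The note can be checked against the same kind of model. In this hypothetical Python sketch (not Iceberg code), selecting a whole struct is assumed to put all of its nested field IDs into the selection, so pruning removes nothing and the result equals the original; selecting only a sub-field still prunes the struct.

```python
from dataclasses import dataclass, replace
from typing import Optional, Set, Tuple


# Hypothetical simplified schema nodes (not Iceberg's or Parquet's real classes).
@dataclass(frozen=True)
class Node:
    id: int
    name: str


@dataclass(frozen=True)
class Prim(Node):
    pass


@dataclass(frozen=True)
class Struct(Node):
    fields: Tuple[Node, ...] = ()


def prune(node: Node, selected: Set[int]) -> Optional[Node]:
    """Fixed precedence: pruned children win; otherwise keep the node if its own ID is selected."""
    if isinstance(node, Struct):
        kept = tuple(p for p in (prune(c, selected) for c in node.fields) if p is not None)
        if kept:
            return replace(node, fields=kept)
    return node if node.id in selected else None


def all_ids(node: Node) -> Set[int]:
    """IDs of a node and all of its descendants."""
    ids = {node.id}
    if isinstance(node, Struct):
        for child in node.fields:
            ids |= all_ids(child)
    return ids


struct_field = Struct(1, "struct_field", fields=(Prim(2, "sub_field"), Prim(3, "other_field")))

# SELECT struct_field, struct_field.sub_field (whole struct assumed to expand to all nested IDs)
print(prune(struct_field, all_ids(struct_field) | {2}) == struct_field)  # True

# SELECT struct_field.sub_field only, with the container's own ID included
print(prune(struct_field, {1, 2}) == struct_field)  # False
```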

Report URL: https://github.com/apache/iceberg/actions/runs/19920312914

With regards,
GitHub Actions via GitBox
