sundy-li opened a new pull request, #2001:
URL: https://github.com/apache/iceberg-rust/pull/2001
## Which issue does this PR close?
- Closes #.
## What changes are included in this PR?
This PR enables projection of nested fields within struct columns when
reading parquet files. Previously, selecting a field nested inside a struct
would result in a `FeatureUnsupported` error.
### Problem
When users try to select nested fields like `person.name` from a schema such
as:
```
id: Int (field_id=1)
person: Struct (field_id=2)
  name: String (field_id=3)
  age: Int (field_id=4)
```
The scan would fail with a "Projecting nested field is not supported now"
error, blocking access to nested column data.
### Solution
**1. `crates/iceberg/src/arrow/reader.rs`**
- Add `RecordBatchProjector` integration to detect and handle nested field
projections
- After parquet projection, detect if any requested field IDs are nested
(not direct children of the schema's top-level struct)
- Create a `RecordBatchProjector` to extract nested fields from their parent
structs, flattening them into the output record batch
- Exclude metadata fields (like `_file`) from nested field detection
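
The detection rule in the bullets above can be sketched with plain sets. This is a minimal illustration, not the code in this PR: the function name `nested_field_ids` and the idea of passing metadata IDs as a caller-supplied set are assumptions made for the example.

```rust
use std::collections::HashSet;

/// Returns the requested field IDs that are neither top-level columns nor
/// metadata columns, i.e. the ones that must live inside a struct.
/// (Hypothetical helper for illustration; not part of iceberg-rust.)
fn nested_field_ids(
    requested: &[i32],
    top_level_ids: &HashSet<i32>,
    metadata_ids: &HashSet<i32>,
) -> Vec<i32> {
    requested
        .iter()
        .copied()
        // Keep only IDs that are not direct children of the schema root
        // and are not reserved metadata fields.
        .filter(|id| !top_level_ids.contains(id) && !metadata_ids.contains(id))
        .collect()
}
```

With top-level IDs `{1, 2}` and a request for `[1, 3]`, this returns `[3]`, which signals that a projector is needed for `person.name`.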
**2. `crates/iceberg/src/arrow/record_batch_transformer.rs`**
- Extend `build_field_id_to_arrow_schema_map` to recursively index nested
struct fields
- Add helper function `collect_field_ids_recursive` to traverse the field
hierarchy
- This allows the transformer to find field IDs that are nested within
structs
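
As a rough sketch of the recursive indexing idea, the snippet below walks a toy field tree and records every field ID, including those inside structs. The `Field` type, `index_field_ids`, and `example_schema` are hypothetical stand-ins written for this example; the PR's `collect_field_ids_recursive` operates on Arrow schema types and may differ in detail.

```rust
use std::collections::HashMap;

/// Toy stand-in for an Arrow field: a field ID, a name, and child fields
/// (non-empty only for structs). Hypothetical type for illustration.
struct Field {
    id: i32,
    name: &'static str,
    children: Vec<Field>,
}

/// Walks the field tree depth-first and records every field ID together
/// with its dotted path, so nested IDs become discoverable.
fn index_field_ids(fields: &[Field], prefix: &str, out: &mut HashMap<i32, String>) {
    for field in fields {
        let path = if prefix.is_empty() {
            field.name.to_string()
        } else {
            format!("{}.{}", prefix, field.name)
        };
        out.insert(field.id, path.clone());
        // Recurse into struct children; leaves have no children.
        index_field_ids(&field.children, &path, out);
    }
}

/// The example schema from the Problem section.
fn example_schema() -> Vec<Field> {
    vec![
        Field { id: 1, name: "id", children: vec![] },
        Field {
            id: 2,
            name: "person",
            children: vec![
                Field { id: 3, name: "name", children: vec![] },
                Field { id: 4, name: "age", children: vec![] },
            ],
        },
    ]
}
```

A non-recursive version would only index IDs 1 and 2 here; the recursion is what makes IDs 3 and 4 visible to the lookup.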
**3. `crates/iceberg/src/scan/mod.rs`**
- Remove the restriction that blocked nested field selection (the
`FeatureUnsupported` error)
### How it works
1. When processing a `FileScanTask`, detect if any requested field IDs are
nested by checking if `schema.as_struct().field_by_id(id)` returns `None`
2. If nested fields are detected, create a `RecordBatchProjector` with the
projected arrow schema
3. The projector builds index paths to locate nested fields (e.g., `[1, 0]`
means column 1, inner field 0)
4. After parquet reads the data, the projector extracts nested fields from
their parent structs
5. The transformer then processes the flattened batch normally
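
Steps 3 and 4 can be illustrated with a small, dependency-free model. This is a sketch under stated assumptions rather than the projector's actual implementation: `Column`, `index_path`, `extract`, and `sample_batch` are hypothetical names, and string slices stand in for Arrow arrays.

```rust
/// Toy column: either a leaf holding values or a struct holding child columns.
/// Hypothetical model for illustration only.
enum Column {
    Leaf { id: i32, values: Vec<&'static str> },
    Struct { id: i32, children: Vec<Column> },
}

impl Column {
    fn id(&self) -> i32 {
        match self {
            Column::Leaf { id, .. } | Column::Struct { id, .. } => *id,
        }
    }
}

/// Depth-first search for `target`; returns the positions to follow from
/// the top-level columns down to the field (e.g. `[1, 0]`).
fn index_path(columns: &[Column], target: i32) -> Option<Vec<usize>> {
    for (i, col) in columns.iter().enumerate() {
        if col.id() == target {
            return Some(vec![i]);
        }
        if let Column::Struct { children, .. } = col {
            if let Some(mut rest) = index_path(children, target) {
                rest.insert(0, i);
                return Some(rest);
            }
        }
    }
    None
}

/// Follows an index path and returns the column at its end, if the path is valid.
fn extract<'a>(columns: &'a [Column], path: &[usize]) -> Option<&'a Column> {
    let (first, rest) = path.split_first()?;
    let col = columns.get(*first)?;
    if rest.is_empty() {
        return Some(col);
    }
    match col {
        Column::Struct { children, .. } => extract(children, rest),
        Column::Leaf { .. } => None, // cannot descend into a leaf
    }
}

/// Two rows shaped like the example schema: `id`, `person { name, age }`.
fn sample_batch() -> Vec<Column> {
    vec![
        Column::Leaf { id: 1, values: vec!["1", "2"] },
        Column::Struct {
            id: 2,
            children: vec![
                Column::Leaf { id: 3, values: vec!["Alice", "Bob"] },
                Column::Leaf { id: 4, values: vec!["30", "25"] },
            ],
        },
    ]
}
```

For the sample batch, looking up field ID 3 yields the path `[1, 0]`, matching the example in step 3, and following that path returns the `name` column on its own so it can be placed at the top level of the output.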
## Are these changes tested?
Yes. Added a `test_read_nested_parquet_column` test that:
- Creates a parquet file with nested struct data (`id`, `person { name, age }`)
- Reads with projection `[1, 3]` (selecting `id` and nested `name`)
- Verifies both the top-level field and nested field are correctly extracted

All 1051 existing tests continue to pass.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]