The GitHub Actions job "CI" on iceberg-rust.git/main has failed.
Run started by GitHub user liurenjie1024 (triggered by liurenjie1024).

Head commit for run:
1384a4f2d71ed16b73f3b1f139d5dbd4e5035428 / Gerald Berger <[email protected]>
feat(core): Add support for `_file` column (#1824)

## Which issue does this PR close?


- Closes #1766.

## What changes are included in this PR?

Integrates virtual field handling for the `_file` metadata column into
`RecordBatchTransformer` using a pre-computed constants map, eliminating
post-processing and duplicate lookups.

## Key Changes

**New `metadata_columns.rs` module**: Centralized utilities for metadata
columns
- Constants: `RESERVED_FIELD_ID_FILE`, `RESERVED_COL_NAME_FILE`
- Helper functions: `get_metadata_column_name()`,
`get_metadata_field_id()`, `is_metadata_field()`,
`is_metadata_column_name()`
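
As a rough illustration of how such a module might be laid out, here is a minimal sketch. The constant and function names come from the list above. The signatures, the `Option` return types, and the function bodies are assumptions, not the code merged in this PR. The value `i32::MAX - 1` follows the reserved field ID that the Iceberg spec assigns to `_file`.

```rust
/// Reserved field ID for the `_file` metadata column (2147483646 in the Iceberg spec).
pub const RESERVED_FIELD_ID_FILE: i32 = i32::MAX - 1;
/// Reserved column name for the `_file` metadata column.
pub const RESERVED_COL_NAME_FILE: &str = "_file";

/// Maps a reserved field ID to its column name.
/// Assumption: returns `Option`; the real helper may return a `Result` instead.
pub fn get_metadata_column_name(field_id: i32) -> Option<&'static str> {
    match field_id {
        RESERVED_FIELD_ID_FILE => Some(RESERVED_COL_NAME_FILE),
        _ => None,
    }
}

/// Maps a reserved column name to its field ID (same `Option` assumption as above).
pub fn get_metadata_field_id(column_name: &str) -> Option<i32> {
    match column_name {
        RESERVED_COL_NAME_FILE => Some(RESERVED_FIELD_ID_FILE),
        _ => None,
    }
}

/// True if the field ID belongs to a known metadata column.
pub fn is_metadata_field(field_id: i32) -> bool {
    get_metadata_column_name(field_id).is_some()
}

/// True if the name belongs to a known metadata column.
pub fn is_metadata_column_name(column_name: &str) -> bool {
    get_metadata_field_id(column_name).is_some()
}
```

Keeping both lookups in one place means a future metadata column would only need one extra `match` arm per direction.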

**Enhanced `RecordBatchTransformer`**:
- Added `constant_fields: HashMap<i32, (DataType, PrimitiveLiteral)>` -
pre-computed during initialization
- New `with_constant()` method - computes Arrow type once during setup
- Updated to use pre-computed types and values (avoids duplicate
lookups)
- Handles `DataType::RunEndEncoded` for constant strings (memory
efficient)
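
The pre-computed constants map could look roughly like the sketch below. To keep it self-contained, `DataType` and `PrimitiveLiteral` are simplified, hypothetical stand-ins for the Arrow and Iceberg types of the same name, and `TransformerSketch` and `constant_for` are illustrative names that do not appear in the PR. Only the `constant_fields` shape and the `with_constant()` name are taken from the description above.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for arrow's `DataType`.
// The real `RunEndEncoded` variant carries run-end and value fields.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    RunEndEncoded(Box<DataType>),
}

// Hypothetical, simplified stand-in for iceberg's `PrimitiveLiteral`.
#[derive(Debug, Clone, PartialEq)]
enum PrimitiveLiteral {
    Int(i32),
    String(String),
}

#[derive(Default)]
struct TransformerSketch {
    // field ID -> (Arrow type, constant value), filled once during setup
    constant_fields: HashMap<i32, (DataType, PrimitiveLiteral)>,
}

impl TransformerSketch {
    /// Registers a constant column and derives its type exactly once.
    fn with_constant(mut self, field_id: i32, value: PrimitiveLiteral) -> Self {
        let data_type = match &value {
            // Constant strings are run-end encoded so a repeated value is stored once.
            PrimitiveLiteral::String(_) => DataType::RunEndEncoded(Box::new(DataType::Utf8)),
            PrimitiveLiteral::Int(_) => DataType::Int32,
        };
        self.constant_fields.insert(field_id, (data_type, value));
        self
    }

    /// Per-batch lookup: a single map access, no type re-derivation.
    fn constant_for(&self, field_id: i32) -> Option<&(DataType, PrimitiveLiteral)> {
        self.constant_fields.get(&field_id)
    }
}
```

In this sketch the type is derived when the constant is registered, so every later batch only pays for one hash lookup per constant column.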

**Simplified `reader.rs`**:
- Pass the full `project_field_ids` (including virtual fields) to
`RecordBatchTransformer`
- Single `with_constant()` call to register `_file` column
- Removed post-processing loop

**Updated `scan/mod.rs`**:
- Use `is_metadata_column_name()` and `get_metadata_field_id()` instead
of hardcoded checks
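
One way to picture the scan-side change is the sketch below, which resolves a select list to field IDs while preserving order. `resolve_projection` and the `HashMap`-based schema are hypothetical simplifications, not the code in `scan/mod.rs`, and the constants are redefined locally so the example stands alone.

```rust
use std::collections::HashMap;

const RESERVED_FIELD_ID_FILE: i32 = i32::MAX - 1;
const RESERVED_COL_NAME_FILE: &str = "_file";

/// Resolves selected column names to field IDs in selection order.
/// Metadata columns map to reserved IDs; all other names are looked up in the schema.
fn resolve_projection(
    selected: &[&str],
    schema: &HashMap<&str, i32>,
) -> Result<Vec<i32>, String> {
    selected
        .iter()
        .map(|name| {
            if *name == RESERVED_COL_NAME_FILE {
                // Virtual column: no entry in the table schema.
                Ok(RESERVED_FIELD_ID_FILE)
            } else {
                schema
                    .get(name)
                    .copied()
                    .ok_or_else(|| format!("column `{name}` not found in schema"))
            }
        })
        .collect()
}
```

Because the IDs are produced in selection order, `_file` can appear at the start, in the middle, or at the end of the projection without special handling.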

## Are these changes tested?

Yes, comprehensive tests have been added to verify the functionality:

### New Tests (7 tests added)

#### Table Scan API Tests (7 tests)

1. **`test_select_with_file_column`** - Verifies basic functionality of
selecting `_file` with regular columns
2. **`test_select_file_column_position`** - Verifies column ordering is
preserved
3. **`test_select_file_column_only`** - Tests selecting only the `_file`
column
4. **`test_file_column_with_multiple_files`** - Tests multiple data
files scenario
5. **`test_file_column_at_start`** - Tests `_file` at position 0
6. **`test_file_column_at_end`** - Tests `_file` at the last position
7. **`test_select_with_repeated_column_names`** - Tests repeated column
selection

Report URL: https://github.com/apache/iceberg-rust/actions/runs/20060746531

With regards,
GitHub Actions via GitBox
