suxiaogang223 opened a new pull request, #64098:
URL: https://github.com/apache/doris/pull/64098

   ## Summary
   
   Implements complex-type predicate filtering and statistics-based file-layer 
pruning for nested Parquet STRUCT columns, aligning with DuckDB's nested filter 
semantics while respecting Doris's new Parquet reader architecture.
   
   ## Changes
   
   ### Row-level Expr Localization
   - Chains of the form `struct_element(VSlotRef(parent), literal child)` are 
recognized as nested paths
   - The parent slot is rewritten to the file-local top-level block slot; the 
`struct_element` form is preserved
   - Struct children are NOT registered as independent block slots
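
The chain recognition described above can be illustrated with a minimal sketch. Everything here is hypothetical: `Expr`, `NestedPath`, `MatchNestedPath`, and the builder helpers are simplified stand-ins invented for illustration, not the PR's actual classes. It only shows the idea of walking a `struct_element` chain from the outside in until a slot reference is reached.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified expression node (not Doris's real VExpr classes).
struct Expr {
    enum class Kind { SlotRef, Literal, StructElement };
    Kind kind;
    std::string name;                             // slot name or literal child name
    std::vector<std::shared_ptr<Expr>> children;  // StructElement: {parent, child literal}
};
using ExprPtr = std::shared_ptr<Expr>;

// Small builders so that examples stay readable.
ExprPtr Slot(std::string name) {
    return std::make_shared<Expr>(Expr{Expr::Kind::SlotRef, std::move(name), {}});
}
ExprPtr Lit(std::string name) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Literal, std::move(name), {}});
}
ExprPtr StructElement(ExprPtr parent, ExprPtr child) {
    return std::make_shared<Expr>(
            Expr{Expr::Kind::StructElement, "", {std::move(parent), std::move(child)}});
}

struct NestedPath {
    std::string root_slot;                 // top-level slot the chain starts from
    std::vector<std::string> child_names;  // struct children, outermost first
};

// Returns true and fills `out` when `expr` is a pure chain of
// struct_element(..., literal) calls rooted at a slot reference.
bool MatchNestedPath(const Expr& expr, NestedPath* out) {
    std::vector<std::string> reversed;  // collected innermost-last, reversed at the end
    const Expr* cur = &expr;
    while (cur->kind == Expr::Kind::StructElement) {
        if (cur->children.size() != 2 || !cur->children[0] || !cur->children[1]) return false;
        if (cur->children[1]->kind != Expr::Kind::Literal) return false;  // dynamic name
        reversed.push_back(cur->children[1]->name);
        cur = cur->children[0].get();
    }
    if (cur->kind != Expr::Kind::SlotRef || reversed.empty()) return false;
    out->root_slot = cur->name;
    out->child_names.assign(reversed.rbegin(), reversed.rend());
    return true;
}
```

With these helpers, `StructElement(StructElement(Slot("s"), Lit("a")), Lit("b"))` matches with root `s` and children `a`, `b`, while a bare slot or a chain with a non-literal child name is rejected.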
   
   ### Filter-only Nested Projection
   - Struct children referenced by filters are merged into the 
`FieldProjection.children` of the same top-level complex column
   - Output children keep their priority order; filter-only children are 
appended to the read projection
   - Filter-only children are excluded from `ColumnMapping.child_mappings` so 
that they do not affect table output materialization
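
As a rough illustration of the merge rule above, the sketch below keeps output children first and appends children that only filters need. `Projection` and `MergeProjection` are hypothetical names chosen for this example; the real `FieldProjection` and `ColumnMapping` structures are not shown in this description and certainly carry more state.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for one top-level complex column's projection.
struct Projection {
    std::vector<std::string> read_children;    // children decoded from the file
    std::vector<std::string> output_children;  // children materialized into table output
};

// Output children come first and keep their order. Children referenced only by
// filters are appended to the read list and never added to the output list.
Projection MergeProjection(const std::vector<std::string>& output_children,
                           const std::vector<std::string>& filter_children) {
    Projection p;
    p.output_children = output_children;
    p.read_children = output_children;
    for (const std::string& child : filter_children) {
        bool already_read = std::find(p.read_children.begin(), p.read_children.end(),
                                      child) != p.read_children.end();
        if (!already_read) p.read_children.push_back(child);
    }
    return p;
}
```

For output children `a`, `b` and filter children `b`, `c`, the read list becomes `a`, `b`, `c` while the output list stays `a`, `b`.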
   
   ### Nested File-layer Pruning Target
   - `FileColumnPredicateFilter` adds `file_child_id_path`, which holds the 
file-local child field-id path
   - Under AND semantics, `struct_element(...) op literal` and `IN (...)` 
predicates produce pruning hints
   - OR/NOT/arbitrary function subtrees are NOT extracted for pruning, to keep 
pruning safe
   - Renamed nested children are supported via the table-to-file field-id 
mapping
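
The reason only AND subtrees are mined for hints is that a pruning hint must be a necessary condition for a row to pass: every conjunct of an AND has to hold, so any single conjunct can safely rule out a file region, whereas one branch of an OR cannot. A minimal sketch of that traversal, using a hypothetical `Pred` tree and `CollectPruningTargets` function that are not part of the PR:

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical predicate tree; `target` names the nested leaf a comparison touches.
struct Pred {
    enum class Kind { And, Or, Not, Compare };
    Kind kind;
    std::string target;  // only meaningful for Compare, e.g. "s.a"
    std::vector<std::shared_ptr<Pred>> children;
};
using PredPtr = std::shared_ptr<Pred>;

PredPtr Cmp(std::string target) {
    return std::make_shared<Pred>(Pred{Pred::Kind::Compare, std::move(target), {}});
}
PredPtr Node(Pred::Kind kind, std::vector<PredPtr> children) {
    return std::make_shared<Pred>(Pred{kind, "", std::move(children)});
}

// Collects comparison targets reachable through AND nodes only.
void CollectPruningTargets(const Pred& pred, std::vector<std::string>* out) {
    switch (pred.kind) {
    case Pred::Kind::And:
        for (const PredPtr& child : pred.children) {
            if (child) CollectPruningTargets(*child, out);
        }
        break;
    case Pred::Kind::Compare:
        out->push_back(pred.target);
        break;
    case Pred::Kind::Or:
    case Pred::Kind::Not:
        break;  // skipped: dropping a hint only loses pruning, never correctness
    }
}
```

For `s.a > 1 AND (s.b = 2 OR s.c = 3) AND NOT (s.d = 4)`, only `s.a` is collected; the row-level filter still evaluates the full expression.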
   
   ### Parquet Leaf Resolution & Pruning
   - `ResolvePredicateLeafSchema()` resolves a top-level or nested target to 
its primitive leaf schema
   - Row group min/max statistics pruning for nested struct primitives
   - Dictionary pruning for nested struct string-like columns
   - Bloom filter pruning via Arrow adapter for supported primitive types
   - Page index row range pruning for non-repeated primitive leaves only
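
Once a nested target is resolved to a primitive leaf, min/max pruning reduces to an interval check against that leaf's statistics. The sketch below assumes an integer leaf with exact, present statistics and ignores nulls; `LeafStats`, `Op`, and `CanSkipRowGroup` are hypothetical names for illustration, not the PR's API.

```cpp
#include <cstdint>

// Hypothetical min/max statistics of one primitive leaf within a row group.
struct LeafStats {
    std::int64_t min;
    std::int64_t max;
};

enum class Op { EQ, LT, LE, GT, GE };

// Returns true when no value in [min, max] can satisfy `leaf op literal`,
// meaning the row group can be skipped for a conjunct of that form.
bool CanSkipRowGroup(const LeafStats& stats, Op op, std::int64_t literal) {
    switch (op) {
    case Op::EQ: return literal < stats.min || literal > stats.max;
    case Op::LT: return stats.min >= literal;
    case Op::LE: return stats.min > literal;
    case Op::GT: return stats.max <= literal;
    case Op::GE: return stats.max < literal;
    }
    return false;  // unknown operator: never skip
}
```

With statistics `[10, 20]`, the predicate `leaf > 20` allows skipping the row group, while `leaf > 19` does not.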
   
   ### Test Coverage
   - Mapper unit tests: nested predicate filters (GT, IN_LIST, reverse 
comparison, deep path)
   - Renamed child projection via field-id mapping
   - Missing child and OR subtree safety (no false pruning hints)
   - Real Parquet fixture tests for statistics, dictionary, and page index 
pruning
   - Bloom filter unit tests via Arrow adapter
   
   ### Out of Scope (intentionally)
   - LIST/MAP/repeated leaf pruning
   - Dynamic field names or non-deterministic expressions
   - Real Parquet bloom filter fixture (Arrow writer lacks stable bloom 
metadata API)
   - Full complex child schema change (requires FE/table reader support)
   
   ## Related
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

