Larborator opened a new pull request, #63889:
URL: https://github.com/apache/doris/pull/63889
### What problem does this PR solve?
Issue Number: close #63887
Problem Summary:
Fix a null pointer dereference (SIGSEGV) in
`ScalarColumnReader::gen_filter_map` when reading nested-type columns
(Struct/Array/Map) from Parquet-based external tables.
When a predicate filters out all rows in a RowGroup, `FilterMap` is
initialized with `filter_all=true` and `_filter_map_data=nullptr`. The
`_read_nested_column` function checks only `has_filter()` before calling
`gen_filter_map`, which dereferences `filter_map_data()` unconditionally.
With a null data pointer, that dereference crashes the BE.
Fix: add a `filter_all()` check before calling `gen_filter_map`. When
`filter_all` is true, construct an all-zero nested filter map directly and
propagate `filter_all=true` to it. This is logically equivalent to what
`gen_filter_map` would produce from valid all-zero filter data: in both cases
all data from the RowGroup is discarded.
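The guard can be illustrated with a minimal, self-contained sketch. This is not the code in this PR: `FilterMap` here is a simplified stand-in, and `NestedFilter` and `make_nested_filter` are hypothetical names. It assumes the nested filter is built by expanding the row-level filter over repetition levels, where a repetition level of 0 starts a new row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for the reader's row-level filter map (assumption:
// this is not the real Doris class). `data` may be nullptr when
// `filter_all` is true, mirroring the state described above.
class FilterMap {
public:
    FilterMap(const uint8_t* data, size_t size, bool filter_all)
            : _data(data), _size(size), _filter_all(filter_all) {}

    bool filter_all() const { return _filter_all; }
    const uint8_t* filter_map_data() const { return _data; }
    size_t filter_map_size() const { return _size; }

private:
    const uint8_t* _data;
    size_t _size;
    bool _filter_all;
};

// Hypothetical result type: one filter byte per nested value.
struct NestedFilter {
    std::vector<uint8_t> data;
    bool filter_all;
};

// Expands a row-level filter to one entry per nested value.
// rep_levels[i] == 0 marks the first value of a new row.
NestedFilter make_nested_filter(const FilterMap& rows,
                                const std::vector<int16_t>& rep_levels) {
    // Start from all zeros, i.e. "discard everything".
    NestedFilter out{std::vector<uint8_t>(rep_levels.size(), 0), rows.filter_all()};

    if (rows.filter_all()) {
        // Guard: the row-level data pointer may be null, so never read it.
        return out;
    }

    const uint8_t* src = rows.filter_map_data();
    size_t row = 0;
    for (size_t i = 0; i < rep_levels.size(); ++i) {
        if (i != 0 && rep_levels[i] == 0) {
            ++row; // a repetition level of 0 begins the next row
        }
        if (row >= rows.filter_map_size()) {
            break; // remaining entries stay 0 (discarded)
        }
        out.data[i] = src[row];
    }
    return out;
}
```

In this model, calling `make_nested_filter` with `FilterMap(nullptr, n, true)` yields the same filter bytes as calling it with a valid all-zero buffer, which is the equivalence the fix relies on.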
### Release note
Fix BE crash (SIGSEGV) when querying Parquet-based external tables
(Paimon/Hive/Iceberg) with nested type columns under lazy read, if predicates
filter out all rows in a RowGroup.
### Check List (For Author)
- Test
    - [ ] Regression test
    - [x] Unit Test
    - [x] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason
**Manual test**: Verified with a standalone program that simulates the
`gen_filter_map` logic: it crashes without the fix and passes with it.
**Unit test**: Added `test_filter_all_nullptr_nested_filter_map` and
`test_all_zero_filter_nested_filter_map` in `parquet_common_test.cpp`.
- Behavior changed:
    - [x] No.
    - [ ] Yes.
- Does this need documentation?
    - [x] No.
    - [ ] Yes.
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label