Larborator opened a new issue, #63887: URL: https://github.com/apache/doris/issues/63887
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Version

4.x/master

### What's Wrong?

BE crashes with SIGSEGV (null pointer dereference at 0x0) when querying Parquet-based external tables (Paimon/Hive/Iceberg) with nested type columns (Struct/Array/Map), if a predicate filters out all rows in a RowGroup.

The crash occurs in `ScalarColumnReader::gen_filter_map`, which dereferences `filter_map.filter_map_data()`; that pointer is `nullptr` when `filter_all=true`.

**Root Cause**: `_read_nested_column` only checks `has_filter()` but not `filter_all()`. When all rows are filtered out, `FilterMap` is initialized via `init(nullptr, total_rows, true)`, setting `_has_filter=true` but `_filter_map_data=nullptr`. The newer `FilterMap::generate_nested_filter_map` already has the correct guard (`if (!has_filter() || filter_all()) return error`), but the inline `gen_filter_map` lacks this check.

```
*** SIGSEGV address not mapped to object (@0x0) received by PID 72584 (TID 88200 OR 0x7f319ec10700) from PID 0; stack trace: ***
 0# 0x000055EC0721DC35 in /doris/be/lib/doris_be
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/local/jdk-17.0.10/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/local/jdk-17.0.10/lib/server/libjvm.so
 3# 0x00007F783FC78630 in /lib64/libpthread.so.0
 4# doris::ScalarColumnReader<false, true>::gen_filter_map(doris::FilterMap&, unsigned long, unsigned long, unsigned long, std::vector<unsigned char, std::allocator<unsigned char> >&, std::unique_ptr<doris::FilterMap, std::default_delete<doris::FilterMap> >*) in /doris/be/lib/doris_be
 5# doris::ScalarColumnReader<false, true>::_read_nested_column(doris::COW<doris::IColumn>::immutable_ptr<doris::IColumn>&, std::shared_ptr<doris::IDataType const>&, doris::FilterMap&, unsigned long, unsigned long*, bool*, bool)::{lambda(unsigned long, unsigned long)#1}::operator()(unsigned long, unsigned long) const in /doris/be/lib/doris_be
 6# doris::ScalarColumnReader<false, true>::_read_nested_column(doris::COW<doris::IColumn>::immutable_ptr<doris::IColumn>&, std::shared_ptr<doris::IDataType const>&, doris::FilterMap&, unsigned long, unsigned long*, bool*, bool) in /doris/be/lib/doris_be
 7# doris::ScalarColumnReader<false, true>::read_column_data(doris::COW<doris::IColumn>::immutable_ptr<doris::IColumn>&, std::shared_ptr<doris::IDataType const>&, std::shared_ptr<doris::TableSchemaChangeHelper::Node> const&, doris::FilterMap&, unsigned long, unsigned long*, bool*, bool, long) in /doris/be/lib/doris_be
 8# doris::StructColumnReader::read_column_data(doris::COW<doris::IColumn>::immutable_ptr<doris::IColumn>&, std::shared_ptr<doris::IDataType const>&, std::shared_ptr<doris::TableSchemaChangeHelper::Node> const&, doris::FilterMap&, unsigned long, unsigned long*, bool*, bool, long) in /doris/be/lib/doris_be
 9# doris::RowGroupReader::_read_column_data(doris::Block*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, unsigned long, unsigned long*, bool*, doris::FilterMap&) in /doris/be/lib/doris_be
10# doris::RowGroupReader::_do_lazy_read(doris::Block*, unsigned long, unsigned long*, bool*) in /doris/be/lib/doris_be
11# doris::RowGroupReader::next_batch(doris::Block*, unsigned long, unsigned long*, bool*) in /doris/be/lib/doris_be
12# doris::ParquetReader::get_next_block(doris::Block*, unsigned long*, bool*) in /doris/be/lib/doris_be
13# doris::PaimonReader::get_next_block_inner(doris::Block*, unsigned long*, bool*) in /doris/be/lib/doris_be
14# doris::TableFormatReader::get_next_block(doris::Block*, unsigned long*, bool*) in /doris/be/lib/doris_be
15# doris::FileScanner::_get_block_wrapped(doris::RuntimeState*, doris::Block*, bool*) in /doris/be/lib/doris_be
16# doris::FileScanner::_get_block_impl(doris::RuntimeState*, doris::Block*, bool*) in /doris/be/lib/doris_be
17# doris::Scanner::get_block(doris::RuntimeState*, doris::Block*, bool*) in /doris/be/lib/doris_be
18# doris::Scanner::get_block_after_projects(doris::RuntimeState*, doris::Block*, bool*) in /doris/be/lib/doris_be
19# doris::ScannerScheduler::_scanner_scan(std::shared_ptr<doris::ScannerContext>, std::shared_ptr<doris::ScanTask>) in /doris/be/lib/doris_be
20# 0x000055EC0CCBBB55 in /doris/be/lib/doris_be
21# doris::ScannerSplitRunner::process_for(std::chrono::duration<long, std::ratio<1l, 1000000000l> >) in /doris/be/lib/doris_be
22# doris::PrioritizedSplitRunner::process() in /doris/be/lib/doris_be
23# doris::TimeSharingTaskExecutor::_dispatch_thread() in /doris/be/lib/doris_be
24# doris::Thread::supervise_thread(void*) in /doris/be/lib/doris_be
25# start_thread in /lib64/libpthread.so.0
26# __clone in /lib64/libc.so.6
```

### What You Expected?

The query should return results without crashing. When `filter_all=true`, nested columns should be correctly skipped without dereferencing `nullptr`.

### How to Reproduce?

The following standalone program reproduces the core logic of `gen_filter_map`.
Commenting out the `if (filter_all)` guard and running the `else` branch causes SIGSEGV:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    // Simulate filter_all=true: filter_map_data is nullptr
    const uint8_t* filter_map_data = nullptr;
    bool has_filter = true;
    bool filter_all = true;

    // rep_levels for a nested column: 3 rows with varying element counts
    std::vector<uint16_t> rep_levels = {0, 1, 1, 0, 1, 0};
    std::vector<uint8_t> nested_filter_map_data;

    if (has_filter) {
        if (filter_all) {
            // FIX: skip gen_filter_map, produce all-zero nested filter
            nested_filter_map_data.assign(rep_levels.size(), 0);
            printf("PASS: filter_all path correctly produces all-zero nested filter\n");
        } else {
            // BUG: dereferences nullptr → SIGSEGV
            size_t filter_loc = 0;
            nested_filter_map_data.resize(rep_levels.size());
            for (size_t i = 0; i < rep_levels.size(); i++) {
                if (i != 0 && rep_levels[i] == 0) filter_loc++;
                nested_filter_map_data[i] = filter_map_data[filter_loc]; // CRASH HERE
            }
        }
    }

    for (auto v : nested_filter_map_data) assert(v == 0);
    printf("All elements filtered: correct behavior\n");
    return 0;
}
```

### Anything Else?

_No response_

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
