Larborator opened a new issue, #63887:
URL: https://github.com/apache/doris/issues/63887

   ### Search before asking
   
   - [x] I have searched the 
[issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no 
similar issues.
   
   
   ### Version
   
   4.x/master
   
   ### What's Wrong?
   
   BE crashes with SIGSEGV (null pointer dereference at 0x0) when querying 
Parquet-based external tables (Paimon/Hive/Iceberg) with nested type columns 
(Struct/Array/Map), if a predicate filters out all rows in a RowGroup.
   
   The crash occurs in `ScalarColumnReader::gen_filter_map` which dereferences 
`filter_map.filter_map_data()` — this is `nullptr` when `filter_all=true`.
   
   **Root Cause**: `_read_nested_column` checks `has_filter()` but not 
`filter_all()`. When all rows are filtered out, `FilterMap` is initialized via 
`init(nullptr, total_rows, true)`, which sets `_has_filter=true` and leaves 
`_filter_map_data=nullptr`. The newer `FilterMap::generate_nested_filter_map` 
already has the correct guard (`if (!has_filter() || filter_all()) return 
error`), but the inline `gen_filter_map` lacks it and reads through the null 
pointer.
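
To make the state described above concrete, here is a minimal sketch of the guard. `MiniFilterMap` and `expand_to_nested` are hypothetical names invented for illustration and are not the Doris classes; the rule for setting `has_filter` and the extra bounds check are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for doris::FilterMap. Only the state
// named in this report is modelled; this is not the real class.
struct MiniFilterMap {
    const uint8_t* data = nullptr;  // one byte per row, nullptr when filter_all
    size_t num_rows = 0;
    bool has_filter = false;
    bool filter_all = false;

    void init(const uint8_t* d, size_t n, bool all) {
        data = d;
        num_rows = n;
        filter_all = all;
        // Assumption: a filter exists if everything is filtered or row bytes were given.
        has_filter = all || d != nullptr;
    }
};

// Expands the row-level filter to one byte per nested value.
// Returns false, leaving `out` untouched, when there are no row bytes to read.
bool expand_to_nested(const MiniFilterMap& fm, const std::vector<uint16_t>& rep_levels,
                      std::vector<uint8_t>& out) {
    if (!fm.has_filter || fm.filter_all || fm.data == nullptr) {
        return false;  // the guard missing from the crashing path
    }
    std::vector<uint8_t> expanded(rep_levels.size());
    size_t row = 0;
    for (size_t i = 0; i < rep_levels.size(); ++i) {
        if (i != 0 && rep_levels[i] == 0) ++row;  // repetition level 0 starts a new row
        if (row >= fm.num_rows) return false;      // extra bounds check (assumption)
        expanded[i] = fm.data[row];
    }
    out.swap(expanded);
    return true;
}
```

After `init(nullptr, 3, true)`, `expand_to_nested` returns `false` without reading `data`, so a caller could treat every nested value as filtered instead of crashing.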
   
   ```
   *** SIGSEGV address not mapped to object (@0x0) received by PID 72584 (TID 
88200 OR 0x7f319ec10700) from PID 0; stack trace: ***
   0# 0x000055EC0721DC35 in /doris/be/lib/doris_be
    1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in 
/usr/local/jdk-17.0.10/lib/server/libjvm.so
    2# JVM_handle_linux_signal in /usr/local/jdk-17.0.10/lib/server/libjvm.so
    3# 0x00007F783FC78630 in /lib64/libpthread.so.0
    4# doris::ScalarColumnReader<false, 
true>::gen_filter_map(doris::FilterMap&, unsigned long, unsigned long, unsigned 
long, std::vector<unsigned char, std::allocator<unsigned char> >&, 
std::unique_ptr<doris::FilterMap, std::default_delete<doris::FilterMap> >*) in 
/doris/be/lib/doris_be
    5# doris::ScalarColumnReader<false, 
true>::_read_nested_column(doris::COW<doris::IColumn>::immutable_ptr<doris::IColumn>&,
 std::shared_ptr<doris::IDataType const>&, doris::FilterMap&, unsigned long, 
unsigned long*, bool*, bool)::{lambda(unsigned long, unsigned 
long)#1}::operator()(unsigned long, unsigned long) const in 
/doris/be/lib/doris_be
    6# doris::ScalarColumnReader<false, 
true>::_read_nested_column(doris::COW<doris::IColumn>::immutable_ptr<doris::IColumn>&,
 std::shared_ptr<doris::IDataType const>&, doris::FilterMap&, unsigned long, 
unsigned long*, bool*, bool) in /doris/be/lib/doris_be
    7# doris::ScalarColumnReader<false, 
true>::read_column_data(doris::COW<doris::IColumn>::immutable_ptr<doris::IColumn>&,
 std::shared_ptr<doris::IDataType const>&, 
std::shared_ptr<doris::TableSchemaChangeHelper::Node> const&, 
doris::FilterMap&, unsigned long, unsigned long*, bool*, bool, long) in 
/doris/be/lib/doris_be
    8# 
doris::StructColumnReader::read_column_data(doris::COW<doris::IColumn>::immutable_ptr<doris::IColumn>&,
 std::shared_ptr<doris::IDataType const>&, 
std::shared_ptr<doris::TableSchemaChangeHelper::Node> const&, 
doris::FilterMap&, unsigned long, unsigned long*, bool*, bool, long) in 
/doris/be/lib/doris_be
    9# doris::RowGroupReader::_read_column_data(doris::Block*, 
std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > > > const&, unsigned long, 
unsigned long*, bool*, doris::FilterMap&) in /doris/be/lib/doris_be
   10# doris::RowGroupReader::_do_lazy_read(doris::Block*, unsigned long, 
unsigned long*, bool*) in /doris/be/lib/doris_be
   11# doris::RowGroupReader::next_batch(doris::Block*, unsigned long, unsigned 
long*, bool*) in /doris/be/lib/doris_be
   12# doris::ParquetReader::get_next_block(doris::Block*, unsigned long*, 
bool*) in /doris/be/lib/doris_be
   13# doris::PaimonReader::get_next_block_inner(doris::Block*, unsigned long*, 
bool*) in /doris/be/lib/doris_be
   14# doris::TableFormatReader::get_next_block(doris::Block*, unsigned long*, 
bool*) in /doris/be/lib/doris_be
   15# doris::FileScanner::_get_block_wrapped(doris::RuntimeState*, 
doris::Block*, bool*) in /doris/be/lib/doris_be
   16# doris::FileScanner::_get_block_impl(doris::RuntimeState*, doris::Block*, 
bool*) in /doris/be/lib/doris_be
   17# doris::Scanner::get_block(doris::RuntimeState*, doris::Block*, bool*) in 
/doris/be/lib/doris_be
   18# doris::Scanner::get_block_after_projects(doris::RuntimeState*, 
doris::Block*, bool*) in /doris/be/lib/doris_be
   19# 
doris::ScannerScheduler::_scanner_scan(std::shared_ptr<doris::ScannerContext>, 
std::shared_ptr<doris::ScanTask>) in /doris/be/lib/doris_be
   20# 0x000055EC0CCBBB55 in /doris/be/lib/doris_be
   21# doris::ScannerSplitRunner::process_for(std::chrono::duration<long, 
std::ratio<1l, 1000000000l> >) in /doris/be/lib/doris_be
   22# doris::PrioritizedSplitRunner::process() in /doris/be/lib/doris_be
   23# doris::TimeSharingTaskExecutor::_dispatch_thread() in 
/doris/be/lib/doris_be
   24# doris::Thread::supervise_thread(void*) in /doris/be/lib/doris_be
   25# start_thread in /lib64/libpthread.so.0
   26# __clone in /lib64/libc.so.6
   ```
   
   ### What You Expected?
   
   The query should return results without crashing. When `filter_all=true`, 
nested columns should be correctly skipped without dereferencing `nullptr`.
   
   
   ### How to Reproduce?
   
   The following standalone program reproduces the core logic of 
`gen_filter_map`. With the `if (filter_all)` guard removed, execution falls 
into the `else` branch, which dereferences the null pointer and causes SIGSEGV:
   
   ```cpp
   #include <cassert>
   #include <cstdint>
   #include <cstdio>
   #include <vector>
   
   int main() {
       // Simulate filter_all=true: filter_map_data is nullptr
       const uint8_t* filter_map_data = nullptr;
       bool has_filter = true;
       bool filter_all = true;
   
       // rep_levels for a nested column: 3 rows with varying element counts
       std::vector<uint16_t> rep_levels = {0, 1, 1, 0, 1, 0};
       std::vector<uint8_t> nested_filter_map_data;
   
       if (has_filter) {
           if (filter_all) {
               // FIX: skip gen_filter_map, produce all-zero nested filter
               nested_filter_map_data.assign(rep_levels.size(), 0);
               printf("PASS: filter_all path correctly produces all-zero nested filter\n");
           } else {
               // BUG: dereferences nullptr → SIGSEGV
               size_t filter_loc = 0;
               nested_filter_map_data.resize(rep_levels.size());
               for (size_t i = 0; i < rep_levels.size(); i++) {
                   if (i != 0 && rep_levels[i] == 0) filter_loc++;
                   // CRASH HERE: filter_map_data is nullptr
                   nested_filter_map_data[i] = filter_map_data[filter_loc];
               }
           }
       }
   
       for (auto v : nested_filter_map_data) assert(v == 0);
       printf("All elements filtered — correct behavior\n");
       return 0;
   }
   ```
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

