github-actions[bot] commented on code in PR #61888:
URL: https://github.com/apache/doris/pull/61888#discussion_r3014817523


##########
be/src/storage/segment/column_reader.cpp:
##########
@@ -1838,6 +2094,18 @@ Status 
FileColumnIterator::next_batch_of_zone_map(size_t* n, MutableColumnPtr& d
 }
 
 Status FileColumnIterator::next_batch(size_t* n, MutableColumnPtr& dst, bool* 
has_null) {
+    if (read_null_map_only()) {
+        DLOG(INFO) << "File column iterator column " << _column_name
+                   << " in NULL_MAP_ONLY mode, reading only null map.";
+        DORIS_CHECK(dst->is_nullable());
+        auto& nullable_col = assert_cast<ColumnNullable&>(*dst);
+        nullable_col.get_nested_column().insert_many_defaults(*n);
+        auto& null_map_data = nullable_col.get_null_map_data();
+        null_map_data.resize(null_map_data.size() + *n);
+        *has_null = true;

Review Comment:
   **Bug (data correctness)**: This line replaces the previous 
`RETURN_IF_ERROR(read_null_map(&read_rows, null_map_data))` but introduces two 
problems:
   
   1. **Uninitialized null map entries**: `NullMap` is `PaddedPODArray<UInt8>`, 
and `PODArray::resize()` does NOT zero-initialize new elements — it merely 
advances the `c_end` pointer. The null map will contain garbage values, leading 
to unpredictable query results (rows randomly appearing as null or non-null).
   
   2. **Actual null data never read from disk**: Unlike the `read_by_rowids` 
NULL_MAP_ONLY path (which correctly loads pages and uses 
`_page.null_decoder.GetNextRun()` to read actual null information), this path 
doesn't load any pages or decode null data at all.
   
   This path IS reachable: `StringFileColumnIterator` inherits `next_batch` 
from `FileColumnIterator`, and `StringFileColumnIterator::set_access_paths` can 
set `_read_mode = NULL_MAP_ONLY` via `_check_and_set_meta_read_mode`.
   
   **Suggested fix**: Implement proper null reading similar to the 
`read_by_rowids` NULL_MAP_ONLY path (lines 2174-2232), or factor out the 
page-iterating null-read logic into a shared helper and call it from both paths:
   
   ```cpp
   // Example approach: iterate pages and read null data
   auto& null_map_data = nullable_col.get_null_map_data();
   const size_t base_size = null_map_data.size();
   null_map_data.resize(base_size + *n);
   
   size_t remaining = *n;
   size_t total_read = 0;
   while (remaining > 0) {
       if (!_page.has_remaining()) {
           bool eos = false;
           RETURN_IF_ERROR(_load_next_page(&eos));
           if (eos) break;
       }
       size_t nrows = std::min(remaining, _page.remaining());
       if (_page.has_null) {
           size_t read_in_page = 0;
           while (read_in_page < nrows) {
               bool is_null = false;
               size_t run = _page.null_decoder.GetNextRun(&is_null, nrows - 
read_in_page);
               memset(null_map_data.data() + base_size + total_read + 
read_in_page,
                      is_null ? 1 : 0, run);
               read_in_page += run;
               _page.offset_in_page += run;
           }
       } else {
           memset(null_map_data.data() + base_size + total_read, 0, nrows);
           _page.offset_in_page += nrows;
       }
       _current_ordinal += nrows;
       total_read += nrows;
       remaining -= nrows;
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to