github-actions[bot] commented on code in PR #61888:
URL: https://github.com/apache/doris/pull/61888#discussion_r3013400746


##########
be/src/storage/segment/column_reader.cpp:
##########
@@ -1903,6 +2171,68 @@ Status FileColumnIterator::next_batch(size_t* n, MutableColumnPtr& dst, bool* ha
 
 Status FileColumnIterator::read_by_rowids(const rowid_t* rowids, const size_t count,
                                           MutableColumnPtr& dst) {
+    if (read_null_map_only()) {
+        DLOG(INFO) << "File column iterator column " << _column_name
+                   << " in NULL_MAP_ONLY mode, reading only null map by rowids.";
+
+        DORIS_CHECK(dst->is_nullable());
+        auto& nullable_col = assert_cast<ColumnNullable&>(*dst);
+        auto& null_map_data = nullable_col.get_null_map_data();
+        const size_t base_size = null_map_data.size();
+        null_map_data.resize(base_size + count);
+

Review Comment:
   **[Medium Bug — Uninitialized Memory]** Same `PODArray::resize()` issue as 
in `next_batch`: `null_map_data.resize(base_size + count)` allocates `count` 
new entries with uninitialized memory.
   
   Unlike `next_batch`, this `read_by_rowids` path does subsequently fill the null map via `memset` in the loop below (lines 2215-2228). However, if the while loop terminates early (for example, because the requested rows are exhausted before every resized entry has been written), the unwritten entries remain uninitialized.
   
   Suggestion: Use `resize_fill(base_size + count)` or `memset` the entire 
range to 0 after resize, then let the loop overwrite with actual values.
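   As a self-contained illustration of the suggested pattern (using `std::vector<uint8_t>` as a stand-in for Doris's `PODArray`, whose `resize()` leaves the new tail uninitialized):
   
   ```cpp
   #include <cassert>
   #include <cstdint>
   #include <cstring>
   #include <vector>
   
   int main() {
       // Stand-in for the null map; in Doris this is a PODArray<uint8_t>
       // whose resize() does NOT zero-initialize newly appended elements.
       std::vector<uint8_t> null_map_data(10, 1);  // pre-existing entries
   
       const size_t count = 5;
       const size_t base_size = null_map_data.size();
   
       // Suggested pattern: resize, then immediately memset the appended
       // range to 0 so no garbage can survive an early loop exit.
       // (With PODArray, resize_fill(base_size + count) achieves the same.)
       null_map_data.resize(base_size + count);
       std::memset(null_map_data.data() + base_size, 0, count);
   
       // The rowid loop may now overwrite entries with real null flags;
       // any entry it never reaches is a well-defined 0 (non-null).
       for (size_t i = base_size; i < base_size + count; ++i) {
           assert(null_map_data[i] == 0);
       }
       return 0;
   }
   ```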



##########
be/src/storage/segment/column_reader.cpp:
##########
@@ -1838,6 +2094,18 @@ Status FileColumnIterator::next_batch_of_zone_map(size_t* n, MutableColumnPtr& d
 }
 
 Status FileColumnIterator::next_batch(size_t* n, MutableColumnPtr& dst, bool* has_null) {
+    if (read_null_map_only()) {
+        DLOG(INFO) << "File column iterator column " << _column_name
+                   << " in NULL_MAP_ONLY mode, reading only null map.";
+        DORIS_CHECK(dst->is_nullable());
+        auto& nullable_col = assert_cast<ColumnNullable&>(*dst);
+        nullable_col.get_nested_column().insert_many_defaults(*n);
+        auto& null_map_data = nullable_col.get_null_map_data();
+        null_map_data.resize(null_map_data.size() + *n);
+        *has_null = true;

Review Comment:
   **[Critical Bug — Data Correctness]** `PODArray::resize()` does NOT 
zero-initialize new elements (see `pod_array.h:68`: *"It differs from 
std::vector in that it does not initialize the elements"*). The new null map 
entries will contain **uninitialized memory (garbage)**, causing random rows to 
be incorrectly classified as null or non-null.
   
   But fixing this alone is insufficient — this entire NULL_MAP_ONLY block 
never reads the actual null bitmap from the segment pages. The normal path 
(lines 2119-2162) iterates through pages, decodes the null bitmap via 
`_page.null_decoder.GetNextRun()`, and processes each run. This path skips all 
of that.
   
   For scalar `FileColumnIterator`, the null bitmap is embedded in the data 
pages (unlike complex types which have a separate `_null_iterator`). To 
correctly implement NULL_MAP_ONLY here, you need to:
   1. Iterate through pages (like the normal path does)
   2. Read `_page.null_decoder` to get actual null/non-null status
   3. Write the correct values into `null_map_data`
   4. Skip `_page.data_decoder->next_batch()` calls (since we only want nulls)
   
   Alternatively, if this mode is only used for complex type sub-iterators that 
have their own `_null_iterator`, consider whether `FileColumnIterator` actually 
needs NULL_MAP_ONLY support at all — the Map/Array/Struct iterators already 
handle it via their own null iterators.
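   To make the proposed page-iteration approach concrete, here is a hedged sketch with a mock run-length decoder; `MockNullDecoder` and its `GetNextRun` signature are illustrative stand-ins, not the real `_page.null_decoder` API:
   
   ```cpp
   #include <cassert>
   #include <cstdint>
   #include <utility>
   #include <vector>
   
   // Minimal mock of a per-page null-run decoder, shaped loosely after
   // the null bitmap decoding in column_reader.cpp. All names here are
   // hypothetical, chosen only to sketch the control flow.
   struct MockNullDecoder {
       std::vector<std::pair<bool, size_t>> runs;  // (is_null, run_length)
       size_t idx = 0;
       bool GetNextRun(bool* is_null, size_t* run_len) {
           if (idx >= runs.size()) return false;
           *is_null = runs[idx].first;
           *run_len = runs[idx].second;
           ++idx;
           return true;
       }
   };
   
   int main() {
       // One "page" with runs: 3 nulls, 4 non-nulls, 2 nulls.
       MockNullDecoder decoder{{{true, 3}, {false, 4}, {true, 2}}};
       std::vector<uint8_t> null_map;
   
       // NULL_MAP_ONLY sketch: decode the null bitmap runs and write
       // them into the null map, but never touch the data decoder.
       bool is_null = false;
       size_t run_len = 0;
       while (decoder.GetNextRun(&is_null, &run_len)) {
           null_map.insert(null_map.end(), run_len, is_null ? 1 : 0);
           // The normal path would call the data decoder's next_batch()
           // for non-null runs; NULL_MAP_ONLY intentionally skips it.
       }
   
       assert(null_map.size() == 9);
       assert(null_map[0] == 1 && null_map[3] == 0 && null_map[8] == 1);
       return 0;
   }
   ```
   
   The real implementation would additionally advance page boundaries and track the remaining row count, as the normal `next_batch` path does.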



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

