github-actions[bot] commented on code in PR #61888:
URL: https://github.com/apache/doris/pull/61888#discussion_r3025593955


##########
be/src/core/column/column_string.h:
##########
@@ -276,6 +276,29 @@ class ColumnStr final : public COWHelper<IColumn, 
ColumnStr<T>> {
         sanity_check_simple();
     }
 
+    // Insert `num` string entries with real length information but no actual
+    // character data. The `lengths` array provides the byte length of each
+    // string. Offsets are built with correct cumulative sizes so that
+    // size_at(i) returns the true string length. The chars buffer is extended
+    // with zero-filled padding to maintain the invariant chars.size() == 
offsets.back().
+    // Used by OFFSET_ONLY reading mode where actual string content is not 
needed
+    // but length information must be preserved (e.g., for length() function).
+    void insert_offsets_from_lengths(const uint32_t* lengths, size_t num) 
override {
+        if (UNLIKELY(num == 0)) {
+            return;
+        }
+        const auto old_rows = offsets.size();
+        // Build cumulative offsets from lengths
+        offsets.resize(old_rows + num);
+        auto* offsets_ptr = &offsets[old_rows];
+        size_t running_offset = offsets[old_rows - 1];

Review Comment:
   `old_rows` is `size_t`, so on the first append this expression becomes 
`offsets[SIZE_MAX]`, not the padded `offsets[-1]` sentinel that other code 
relies on. That means an empty destination `ColumnString` can enter OFFSET_ONLY 
mode and build its first batch of offsets from garbage.
   
   A concrete failure path is a fresh result column for `length(string_col)` or 
a nested string subcolumn read entirely through the new offset-only decoder 
path: `insert_offsets_from_lengths()` is the first write, `old_rows == 0`, and 
the computed running offset is undefined.
   
   Please initialize from `old_rows == 0 ? 0 : offsets[old_rows - 1]` (and 
ideally keep the same invariant checks as the other insert helpers).
   



##########
be/src/storage/segment/column_reader.cpp:
##########
@@ -910,10 +914,22 @@ Status MapFileColumnIterator::seek_to_ordinal(ordinal_t 
ord) {
         return Status::OK();
     }
 
+    if (read_null_map_only()) {
+        // In NULL_MAP_ONLY mode, only seek the null iterator; skip 
offset/key/val iterators
+        if (_map_reader->is_nullable() && _null_iterator) {
+            RETURN_IF_ERROR(_null_iterator->seek_to_ordinal(ord));
+        }
+        return Status::OK();
+    }
+

Review Comment:
   This new branch unconditionally writes `*has_null`, but 
`StructFileColumnIterator::read_by_rowids()` and 
`ArrayFileColumnIterator::read_by_rowids()` still implement their rowid path by 
coalescing runs and calling `next_batch(&num_read, dst, nullptr)`.
   
   So after `set_access_paths()` switches a struct/array column into 
`NULL_MAP_ONLY`, any `read_by_rowids()` on that iterator will hit this branch 
and dereference a null pointer. The same issue exists in the analogous array 
branch below.
   
   Please either guard the store with `if (has_null != nullptr)` or keep the 
rowid path from calling `next_batch(..., nullptr)` when these meta-only modes 
are active.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to