github-actions[bot] commented on code in PR #61888:
URL: https://github.com/apache/doris/pull/61888#discussion_r3025593955
##########
be/src/core/column/column_string.h:
##########
@@ -276,6 +276,29 @@ class ColumnStr final : public COWHelper<IColumn,
ColumnStr<T>> {
sanity_check_simple();
}
+ // Insert `num` string entries with real length information but no actual
+ // character data. The `lengths` array provides the byte length of each
+ // string. Offsets are built with correct cumulative sizes so that
+ // size_at(i) returns the true string length. The chars buffer is extended
+ // with zero-filled padding to maintain the invariant chars.size() ==
offsets.back().
+ // Used by OFFSET_ONLY reading mode where actual string content is not
needed
+ // but length information must be preserved (e.g., for length() function).
+ void insert_offsets_from_lengths(const uint32_t* lengths, size_t num)
override {
+ if (UNLIKELY(num == 0)) {
+ return;
+ }
+ const auto old_rows = offsets.size();
+ // Build cumulative offsets from lengths
+ offsets.resize(old_rows + num);
+ auto* offsets_ptr = &offsets[old_rows];
+ size_t running_offset = offsets[old_rows - 1];
Review Comment:
`old_rows` is `size_t`, so on the first append this expression becomes
`offsets[SIZE_MAX]`, not the padded `offsets[-1]` sentinel that other code
relies on. That means an empty destination `ColumnString` can enter OFFSET_ONLY
mode and build its first batch of offsets from garbage.
A concrete failure path is a fresh result column for `length(string_col)` or
a nested string subcolumn read entirely through the new offset-only decoder
path: `insert_offsets_from_lengths()` is the first write, `old_rows == 0`, and
the computed running offset is undefined.
Please initialize from `old_rows == 0 ? 0 : offsets[old_rows - 1]` (and
ideally keep the same invariant checks as the other insert helpers).
##########
be/src/storage/segment/column_reader.cpp:
##########
@@ -910,10 +914,22 @@ Status MapFileColumnIterator::seek_to_ordinal(ordinal_t
ord) {
return Status::OK();
}
+ if (read_null_map_only()) {
+ // In NULL_MAP_ONLY mode, only seek the null iterator; skip
offset/key/val iterators
+ if (_map_reader->is_nullable() && _null_iterator) {
+ RETURN_IF_ERROR(_null_iterator->seek_to_ordinal(ord));
+ }
+ return Status::OK();
+ }
+
Review Comment:
This new branch unconditionally writes `*has_null`, but
`StructFileColumnIterator::read_by_rowids()` and
`ArrayFileColumnIterator::read_by_rowids()` still implement their rowid path by
coalescing runs and calling `next_batch(&num_read, dst, nullptr)`.
So after `set_access_paths()` switches a struct/array column into
`NULL_MAP_ONLY`, any `read_by_rowids()` on that iterator will hit this branch
and dereference a null pointer. The same issue exists in the analogous array
branch below.
Please either guard the store with `if (has_null != nullptr)` or keep the
rowid path from calling `next_batch(..., nullptr)` when these meta-only modes
are active.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]