github-actions[bot] commented on code in PR #61888:
URL: https://github.com/apache/doris/pull/61888#discussion_r3026985461


##########
be/src/core/column/column_string.h:
##########
@@ -276,6 +276,29 @@ class ColumnStr final : public COWHelper<IColumn, 
ColumnStr<T>> {
         sanity_check_simple();
     }
 
+    // Insert `num` string entries with real length information but no actual
+    // character data. The `lengths` array provides the byte length of each
+    // string. Offsets are built with correct cumulative sizes so that
+    // size_at(i) returns the true string length. The chars buffer is extended
+    // with zero-filled padding to maintain the invariant chars.size() == 
offsets.back().
+    // Used by OFFSET_ONLY reading mode where actual string content is not 
needed
+    // but length information must be preserved (e.g., for length() function).
+    void insert_offsets_from_lengths(const uint32_t* lengths, size_t num) 
override {
+        if (UNLIKELY(num == 0)) {
+            return;
+        }
+        const auto old_rows = offsets.size();
+        // Build cumulative offsets from lengths
+        offsets.resize(old_rows + num);
+        auto* offsets_ptr = &offsets[old_rows];
+        size_t running_offset = offsets[old_rows - 1];
+        for (size_t i = 0; i < num; ++i) {
+            running_offset += lengths[i];
+            offsets_ptr[i] = static_cast<T>(running_offset);
+        }
+        chars.resize(offsets[old_rows + num - 1]);

Review Comment:
   `insert_offsets_from_lengths()` is missing the same `check_chars_length()` 
guard that every other string insert path uses. In `OFFSET_ONLY` scans we still 
accumulate the full logical byte length into `chars`/`offsets`; once that sum 
crosses the `UInt32` limit, the offsets wrap and the synthesized string column 
becomes corrupt. Please keep the overflow check here before resizing.



##########
be/test/storage/segment/column_reader_test.cpp:
##########
@@ -301,4 +301,42 @@ TEST_F(ColumnReaderTest, 
MapReadByRowidsSkipReadingResizesDestination) {
     ASSERT_TRUE(st.ok()) << "read_by_rowids failed: " << st.to_string();
     ASSERT_EQ(count, dst->size());
 }
-} // namespace doris::segment_v2
\ No newline at end of file
+TEST_F(ColumnReaderTest, MapAccessAllWithOffsetDoesNotPropagateOffsetToKey) {
+    // Regression test: when the access path is [map_col, *, OFFSET]
+    // (e.g. length(map_col['some_key'])), the key column must be fully read
+    // so that element_at() can match the key. Only the value column should
+    // enter OFFSET_ONLY mode.
+    auto map_reader = std::make_shared<ColumnReader>();
+    auto null_iter = 
std::make_unique<FileColumnIterator>(std::make_shared<ColumnReader>());
+    auto offsets_iter = std::make_unique<OffsetFileColumnIterator>(
+            
std::make_unique<FileColumnIterator>(std::make_shared<ColumnReader>()));
+    auto key_iter = 
std::make_unique<StringFileColumnIterator>(std::make_shared<ColumnReader>());
+    key_iter->set_column_name("key");
+    auto val_iter = 
std::make_unique<StringFileColumnIterator>(std::make_shared<ColumnReader>());
+    val_iter->set_column_name("value");
+
+    MapFileColumnIterator map_iter(map_reader, std::move(null_iter), 
std::move(offsets_iter),
+                                   std::move(key_iter), std::move(val_iter));
+    map_iter.set_column_name("map_col");
+
+    // path: [map_col, *, OFFSET]  — simulates length(map_col['c_phone'])
+    TColumnAccessPaths all_access_paths;
+    all_access_paths.emplace_back();
+    all_access_paths[0].data_access_path.path = {"map_col", "*", "OFFSET"};
+    TColumnAccessPaths predicate_access_paths;
+
+    auto st = map_iter.set_access_paths(all_access_paths, 
predicate_access_paths);
+    ASSERT_TRUE(st.ok()) << "set_access_paths failed: " << st.to_string();
+
+    // Key must be fully readable (NEED_TO_READ), NOT in OFFSET_ONLY mode.

Review Comment:
   This UT reaches into `MapFileColumnIterator::_key_iterator` / 
`_val_iterator` (private) and `ColumnIterator::_reading_flag` (protected), so 
it will not compile as written. Please assert through public accessors or add 
an explicit test-only friend instead of reading the internals directly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to