github-actions[bot] commented on code in PR #63001:
URL: https://github.com/apache/doris/pull/63001#discussion_r3257930806


##########
be/src/core/block/block.cpp:
##########
@@ -79,6 +79,51 @@ template void 
clear_blocks<Block>(moodycamel::ConcurrentQueue<Block>&,
 template void clear_blocks<BlockUPtr>(moodycamel::ConcurrentQueue<BlockUPtr>&,
                                       RuntimeProfile::Counter* 
memory_used_counter);
 
+namespace {
+
+// The no-clone fast path is only safe when the whole column tree is uniquely
+// owned. A composite column with shared children still needs COW detachment.
+bool is_recursively_exclusive(const IColumn& column) {
+    if (!column.is_exclusive()) {
+        return false;
+    }
+
+    bool exclusive = true;
+    IColumn::ColumnCallback callback = [&](IColumn::WrappedPtr& subcolumn) {
+        if (!exclusive) {
+            return;
+        }
+        const ColumnPtr& subcolumn_ptr = const_cast<const 
IColumn::WrappedPtr&>(subcolumn);
+        DCHECK(subcolumn_ptr);
+        exclusive = is_recursively_exclusive(*subcolumn_ptr);
+    };
+    // `for_each_subcolumn` only exposes a mutable callback type. This callback
+    // only reads the wrapped pointers and never calls the non-const accessors.
+    const_cast<IColumn&>(column).for_each_subcolumn(callback);

Review Comment:
   `is_recursively_exclusive()` is intended to be a read-only ownership probe, 
but this calls the non-const `for_each_subcolumn()` through `const_cast`. 
`ColumnVariant::for_each_subcolumn()` is not read-only: after invoking the 
callback it assigns `num_rows = serialized_sparse_column->size()` and runs 
`ENABLE_CHECK_CONSISTENCY(this)`. That means a plain 
`Block::mutate_columns_scoped()` acquisition can mutate a `ColumnVariant` (or 
throw from the consistency check) before any caller-requested mutation happens. 
Please add a const/read-only subcolumn traversal or avoid using 
`for_each_subcolumn()` for this exclusivity check.



##########
be/src/core/column/column_map.cpp:
##########
@@ -518,59 +546,84 @@ void ColumnMap::insert_range_from_ignore_overflow(const 
IColumn& src, size_t sta
 }
 
 ColumnPtr ColumnMap::filter(const Filter& filt, ssize_t result_size_hint) 
const {
-    auto k_arr =
-            ColumnArray::create(keys_column->assume_mutable(), 
offsets_column->assume_mutable())
-                    ->filter(filt, result_size_hint);
-    auto v_arr =
-            ColumnArray::create(values_column->assume_mutable(), 
offsets_column->assume_mutable())
-                    ->filter(filt, result_size_hint);
+    auto k_arr = ColumnArray::create(static_cast<const 
IColumn::Ptr&>(keys_column),
+                                     static_cast<const 
IColumn::Ptr&>(offsets_column))
+                         ->filter(filt, result_size_hint);
+    auto v_arr = ColumnArray::create(static_cast<const 
IColumn::Ptr&>(values_column),
+                                     static_cast<const 
IColumn::Ptr&>(offsets_column))
+                         ->filter(filt, result_size_hint);
     return ColumnMap::create(assert_cast<const 
ColumnArray&>(*k_arr).get_data_ptr(),
                              assert_cast<const 
ColumnArray&>(*v_arr).get_data_ptr(),
                              assert_cast<const 
ColumnArray&>(*k_arr).get_offsets_ptr());
 }
 
 size_t ColumnMap::filter(const Filter& filter) {
-    MutableColumnPtr copied_off = offsets_column->clone_empty();
-    copied_off->insert_range_from(*offsets_column, 0, offsets_column->size());
-    ColumnArray::create(keys_column->assume_mutable(), 
offsets_column->assume_mutable())
-            ->filter(filter);
-    ColumnArray::create(values_column->assume_mutable(), 
copied_off->assume_mutable())
-            ->filter(filter);
-    return get_offsets().size();
+    // Move subcolumns out of this ColumnMap to get exclusive ownership, then 
write back.

Review Comment:
   This in-place `filter()` steals the live map subcolumns before the operation 
is guaranteed to finish. If the second/third `IColumn::mutate()`, 
`clone_empty()`, `insert_range_from()`, `ColumnArray::create()`, or either 
`filter()` throws, the `ColumnMap` is left with one or more moved-from/null 
subcolumns because the write-back happens only at the end. This is a 
live-column mutation path, so it needs a scoped restore guard (or detach into 
temporaries without moving out of `this` until all throwing work has completed).



##########
be/src/storage/segment/column_reader.cpp:
##########
@@ -996,7 +997,8 @@ Status MapFileColumnIterator::next_batch(size_t* n, 
MutableColumnPtr& dst, bool*
 
     auto& column_map = assert_cast<ColumnMap&, TypeCheckOnRelease::DISABLE>(
             dst->is_nullable() ? 
static_cast<ColumnNullable&>(*dst).get_nested_column() : *dst);
-    auto column_offsets_ptr = column_map.get_offsets_column().assume_mutable();

Review Comment:
   The `Defer` restore is installed only after 
`IColumn::mutate(std::move(column_map.get_offsets_ptr()))` has already consumed 
the live subcolumn. If this detach needs to clone and throws/OOMs, `column_map` 
is left with a moved-from offsets pointer. The same pattern appears for map 
keys/values below and in the array/struct iterator changes in this file. Please 
install restoration before any throwing detach, or detach from a copied 
`ColumnPtr` and move it back only after success.



##########
be/src/exec/operator/set_source_operator.cpp:
##########
@@ -114,7 +114,7 @@ void SetSourceOperatorX<is_intersect>::_create_mutable_cols(
     for (int i = 0; i < local_state._left_table_data_types.size(); ++i) {
         if (mem_reuse) {
             local_state._mutable_cols[i] =
-                    
std::move(*output_block->get_by_position(i).column).mutate();
+                    
IColumn::mutate(std::move(output_block->get_by_position(i).column));

Review Comment:
   `_create_mutable_cols()` moves every reusable output column into 
`local_state._mutable_cols`, but `_add_result_columns()` can still 
allocate/throw while appending indices before the columns are replaced into 
`output_block` at lines 176-179. On that failure path the output block remains 
with moved-from slots. Please use a scoped block/column guard and release it 
only after `_add_result_columns()` succeeds.



##########
be/src/exec/operator/aggregation_source_operator.cpp:
##########
@@ -113,157 +113,164 @@ Status 
AggLocalState::_get_results_with_serialized_key(RuntimeState* state, Bloc
     MutableColumns key_columns;
     for (int i = 0; i < key_size; ++i) {
         if (mem_reuse) {
-            
key_columns.emplace_back(std::move(*block->get_by_position(i).column).mutate());
+            
key_columns.emplace_back(IColumn::mutate(std::move(block->get_by_position(i).column)));

Review Comment:
   In the `mem_reuse` path this steals key columns out of the caller-provided 
`block`, then performs hash-table iteration and aggregate serialization before 
`block->set_columns()` restores the slots. Any exception from allocation, key 
insertion, or `serialize_to_column()` leaves the reusable output block with 
moved-from columns. This is not covered by the new scoped block guard because 
this path detaches individual slots manually; please use a scoped 
restore/release pattern here.



##########
be/src/core/column/column_variant.h:
##########
@@ -325,7 +325,7 @@ class ColumnVariant final : public COWHelper<IColumn, 
ColumnVariant> {
         if (subcolumns.empty()) {
             return nullptr;
         }
-        return 
subcolumns.get_mutable_root()->data.get_finalized_column_ptr()->assume_mutable();
+        return 
std::move(*subcolumns.get_mutable_root()->data.get_finalized_column_ptr()).mutate();

Review Comment:
   `get_root()` now returns a recursively mutated root, but it does not write 
the possibly cloned root back to the variant tree. If the finalized root column 
is shared, callers that mutate the returned column (for example 
`insert_indices_from()` in `ColumnVariant::insert_indices_from()` or 
root-column preparation in the variant writers) mutate only the detached clone 
and the `ColumnVariant` still points at the old root. Please either return the 
existing mutable root only when exclusive, or assign the detached root back 
into `subcolumns.get_mutable_root()->data` before returning it.



##########
be/src/exec/operator/aggregation_source_operator.cpp:
##########
@@ -294,15 +301,16 @@ Status 
AggLocalState::_get_with_serialized_key_result(RuntimeState* state, Block
         if (!mem_reuse) {
             
key_columns.emplace_back(columns_with_schema[i].type->create_column());
         } else {
-            
key_columns.emplace_back(std::move(*block->get_by_position(i).column).mutate());

Review Comment:
   This second aggregation result path has the same manual-steal gap: when 
`mem_reuse` is true, key/value columns are moved out here, but the function can 
still throw while iterating/inserting/serializing before the 
`block->set_columns()` at the end. Please protect these moved columns with RAII 
or defer moving them out until the error-producing work has completed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to