github-actions[bot] commented on code in PR #63001:
URL: https://github.com/apache/doris/pull/63001#discussion_r3257930806
##########
be/src/core/block/block.cpp:
##########
@@ -79,6 +79,51 @@ template void
clear_blocks<Block>(moodycamel::ConcurrentQueue<Block>&,
template void clear_blocks<BlockUPtr>(moodycamel::ConcurrentQueue<BlockUPtr>&,
RuntimeProfile::Counter*
memory_used_counter);
+namespace {
+
+// The no-clone fast path is only safe when the whole column tree is uniquely
+// owned. A composite column with shared children still needs COW detachment.
+bool is_recursively_exclusive(const IColumn& column) {
+ if (!column.is_exclusive()) {
+ return false;
+ }
+
+ bool exclusive = true;
+ IColumn::ColumnCallback callback = [&](IColumn::WrappedPtr& subcolumn) {
+ if (!exclusive) {
+ return;
+ }
+ const ColumnPtr& subcolumn_ptr = const_cast<const
IColumn::WrappedPtr&>(subcolumn);
+ DCHECK(subcolumn_ptr);
+ exclusive = is_recursively_exclusive(*subcolumn_ptr);
+ };
+ // `for_each_subcolumn` only exposes a mutable callback type. This callback
+ // only reads the wrapped pointers and never calls the non-const accessors.
+ const_cast<IColumn&>(column).for_each_subcolumn(callback);
Review Comment:
`is_recursively_exclusive()` is intended to be a read-only ownership probe,
but this calls the non-const `for_each_subcolumn()` through `const_cast`.
`ColumnVariant::for_each_subcolumn()` is not read-only: after invoking the
callback it assigns `num_rows = serialized_sparse_column->size()` and runs
`ENABLE_CHECK_CONSISTENCY(this)`. That means a plain
`Block::mutate_columns_scoped()` acquisition can mutate a `ColumnVariant` (or
throw from the consistency check) before any caller-requested mutation happens.
Please add a const/read-only subcolumn traversal or avoid using
`for_each_subcolumn()` for this exclusivity check.
##########
be/src/core/column/column_map.cpp:
##########
@@ -518,59 +546,84 @@ void ColumnMap::insert_range_from_ignore_overflow(const
IColumn& src, size_t sta
}
ColumnPtr ColumnMap::filter(const Filter& filt, ssize_t result_size_hint)
const {
- auto k_arr =
- ColumnArray::create(keys_column->assume_mutable(),
offsets_column->assume_mutable())
- ->filter(filt, result_size_hint);
- auto v_arr =
- ColumnArray::create(values_column->assume_mutable(),
offsets_column->assume_mutable())
- ->filter(filt, result_size_hint);
+ auto k_arr = ColumnArray::create(static_cast<const
IColumn::Ptr&>(keys_column),
+ static_cast<const
IColumn::Ptr&>(offsets_column))
+ ->filter(filt, result_size_hint);
+ auto v_arr = ColumnArray::create(static_cast<const
IColumn::Ptr&>(values_column),
+ static_cast<const
IColumn::Ptr&>(offsets_column))
+ ->filter(filt, result_size_hint);
return ColumnMap::create(assert_cast<const
ColumnArray&>(*k_arr).get_data_ptr(),
assert_cast<const
ColumnArray&>(*v_arr).get_data_ptr(),
assert_cast<const
ColumnArray&>(*k_arr).get_offsets_ptr());
}
size_t ColumnMap::filter(const Filter& filter) {
- MutableColumnPtr copied_off = offsets_column->clone_empty();
- copied_off->insert_range_from(*offsets_column, 0, offsets_column->size());
- ColumnArray::create(keys_column->assume_mutable(),
offsets_column->assume_mutable())
- ->filter(filter);
- ColumnArray::create(values_column->assume_mutable(),
copied_off->assume_mutable())
- ->filter(filter);
- return get_offsets().size();
+ // Move subcolumns out of this ColumnMap to get exclusive ownership, then
write back.
Review Comment:
This in-place `filter()` steals the live map subcolumns before the operation
is guaranteed to finish. If the second/third `IColumn::mutate()`,
`clone_empty()`, `insert_range_from()`, `ColumnArray::create()`, or either
`filter()` throws, the `ColumnMap` is left with one or more moved-from/null
subcolumns because the write-back happens only at the end. This is a
live-column mutation path, so it needs a scoped restore guard (or detach into
temporaries without moving out of `this` until all throwing work has completed).
##########
be/src/storage/segment/column_reader.cpp:
##########
@@ -996,7 +997,8 @@ Status MapFileColumnIterator::next_batch(size_t* n,
MutableColumnPtr& dst, bool*
auto& column_map = assert_cast<ColumnMap&, TypeCheckOnRelease::DISABLE>(
dst->is_nullable() ?
static_cast<ColumnNullable&>(*dst).get_nested_column() : *dst);
- auto column_offsets_ptr = column_map.get_offsets_column().assume_mutable();
Review Comment:
The `Defer` restore is installed only after
`IColumn::mutate(std::move(column_map.get_offsets_ptr()))` has already consumed
the live subcolumn. If this detach needs to clone and throws/OOMs, `column_map`
is left with a moved-from offsets pointer. The same pattern appears for map
keys/values below and in the array/struct iterator changes in this file. Please
install restoration before any throwing detach, or detach from a copied
`ColumnPtr` and move it back only after success.
##########
be/src/exec/operator/set_source_operator.cpp:
##########
@@ -114,7 +114,7 @@ void SetSourceOperatorX<is_intersect>::_create_mutable_cols(
for (int i = 0; i < local_state._left_table_data_types.size(); ++i) {
if (mem_reuse) {
local_state._mutable_cols[i] =
-
std::move(*output_block->get_by_position(i).column).mutate();
+
IColumn::mutate(std::move(output_block->get_by_position(i).column));
Review Comment:
`_create_mutable_cols()` moves every reusable output column into
`local_state._mutable_cols`, but `_add_result_columns()` can still
allocate/throw while appending indices before the columns are replaced into
`output_block` at lines 176-179. On that failure path the output block remains
with moved-from slots. Please use a scoped block/column guard and release it
only after `_add_result_columns()` succeeds.
##########
be/src/exec/operator/aggregation_source_operator.cpp:
##########
@@ -113,157 +113,164 @@ Status
AggLocalState::_get_results_with_serialized_key(RuntimeState* state, Bloc
MutableColumns key_columns;
for (int i = 0; i < key_size; ++i) {
if (mem_reuse) {
-
key_columns.emplace_back(std::move(*block->get_by_position(i).column).mutate());
+
key_columns.emplace_back(IColumn::mutate(std::move(block->get_by_position(i).column)));
Review Comment:
In the `mem_reuse` path this steals key columns out of the caller-provided
`block`, then performs hash-table iteration and aggregate serialization before
`block->set_columns()` restores the slots. Any exception from allocation, key
insertion, or `serialize_to_column()` leaves the reusable output block with
moved-from columns. This is not covered by the new scoped block guard because
this path detaches individual slots manually; please use a scoped
restore/release pattern here.
##########
be/src/core/column/column_variant.h:
##########
@@ -325,7 +325,7 @@ class ColumnVariant final : public COWHelper<IColumn,
ColumnVariant> {
if (subcolumns.empty()) {
return nullptr;
}
- return
subcolumns.get_mutable_root()->data.get_finalized_column_ptr()->assume_mutable();
+ return
std::move(*subcolumns.get_mutable_root()->data.get_finalized_column_ptr()).mutate();
Review Comment:
`get_root()` now returns a recursively mutated root, but it does not write
the possibly cloned root back to the variant tree. If the finalized root column
is shared, callers that mutate the returned column (for example
`insert_indices_from()` in `ColumnVariant::insert_indices_from()` or
root-column preparation in the variant writers) mutate only the detached clone
and the `ColumnVariant` still points at the old root. Please either return the
existing mutable root only when exclusive, or assign the detached root back
into `subcolumns.get_mutable_root()->data` before returning it.
##########
be/src/exec/operator/aggregation_source_operator.cpp:
##########
@@ -294,15 +301,16 @@ Status
AggLocalState::_get_with_serialized_key_result(RuntimeState* state, Block
if (!mem_reuse) {
key_columns.emplace_back(columns_with_schema[i].type->create_column());
} else {
-
key_columns.emplace_back(std::move(*block->get_by_position(i).column).mutate());
Review Comment:
This second aggregation result path has the same manual-steal gap: when
`mem_reuse` is true, key/value columns are moved out here, but the function can
still throw while iterating/inserting/serializing before the
`block->set_columns()` at the end. Please protect these moved columns with RAII
or defer moving them out until the error-producing work has completed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]