Copilot commented on code in PR #63518:
URL: https://github.com/apache/doris/pull/63518#discussion_r3285775912
##########
be/src/core/column/column_string.h:
##########
@@ -128,7 +127,6 @@ class ColumnStr final : public COWHelper<IColumn,
ColumnStr<T>> {
if (offsets[-1] != 0) {
throw Exception(Status::InternalError("wrong offsets[-1]: {}",
offsets[-1]));
}
-#endif
}
Review Comment:
Removing the `#ifndef NDEBUG` guard means `sanity_check_simple()` now runs
in release builds. This method is called from many hot-path accessors (e.g.
`get_data_at()` and others), so this introduces per-element invariant checking
overhead across string processing. If the intent is boundary validation,
consider keeping these checks behind `sanity_check()` (invoked at operator
output boundaries) or gating with a runtime flag/debug point, rather than
running on every string access.
##########
be/src/exec/operator/nested_loop_join_probe_operator.cpp:
##########
@@ -51,6 +52,7 @@ ColumnPtr align_eval_column_nullable(const
ColumnWithTypeAndName& target, const
void append_many_from_source(MutableColumnPtr& dst_column, const
ColumnWithTypeAndName& src_column,
size_t row, size_t rows) {
+ src_column.column->sanity_check();
if (!src_column.column->is_nullable() && dst_column->is_nullable()) {
Review Comment:
Same issue as above: `src_column.column->sanity_check()` may throw and is
not exception-safe in this helper. This can terminate the process rather than
surfacing a `Status` error. Please wrap in an exception-to-Status conversion at
a `Status` boundary (or refactor helper to return `Status`).
##########
be/src/exec/operator/nested_loop_join_probe_operator.cpp:
##########
@@ -67,7 +69,9 @@ void append_filtered_from_source(MutableColumnPtr& dst_column,
if (selected_rows == 0) {
return;
}
+ src_column.column->sanity_check();
auto filtered_column = src_column.column->filter(filter, selected_rows);
+ filtered_column->sanity_check();
if (!src_column.column->is_nullable() && dst_column->is_nullable()) {
Review Comment:
`sanity_check()` calls here (both on the source and the filtered result) can
throw exceptions, and there’s no `RETURN_IF_CATCH_EXCEPTION`/try-catch to
convert to `Status`. Since the pipeline task loop doesn’t catch
`doris::Exception`, this risks crashing the backend on malformed columns.
Please refactor so these checks return/propgate `Status` instead of throwing
across operator execution.
##########
be/src/exec/operator/nested_loop_join_probe_operator.cpp:
##########
@@ -39,6 +39,7 @@ constexpr int8_t MARK_TRUE = 1;
constexpr int8_t MARK_NULL = -1;
ColumnPtr make_const_column_from_row(const ColumnWithTypeAndName& source,
size_t row, size_t rows) {
+ source.column->sanity_check();
Review Comment:
`sanity_check()` can throw `doris::Exception`, but here it’s called without
any `RETURN_IF_CATCH_EXCEPTION`/try-catch wrapper. Pipeline execution paths
(e.g. PipelineTask::execute) don’t generally catch exceptions, so this can
crash the BE instead of returning a `Status`. Consider moving these checks into
a `Status`-returning path (wrap at call sites with `RETURN_IF_CATCH_EXCEPTION`,
or change helper to return `Status` and propagate errors).
##########
be/src/exec/operator/operator.cpp:
##########
@@ -413,11 +413,14 @@ Status
OperatorXBase::get_block_after_projects(RuntimeState* state, Block* block
return status;
}
status = do_projections(state, &local_state->_origin_block, block);
- return status;
+ RETURN_IF_ERROR(status);
+ RETURN_IF_ERROR(block->check_type_and_column());
+ return Status::OK();
}
status = get_block(state, block, eos);
+ RETURN_IF_ERROR(status);
RETURN_IF_ERROR(block->check_type_and_column());
Review Comment:
`get_block_after_projects()` now calls `block->check_type_and_column()` on
every operator output. Note that PipelineTask already calls
`block->check_type_and_column()` immediately after
`_root->get_block_after_projects()`
(be/src/exec/pipeline/pipeline_task.cpp:649-651), so the root output block will
be validated twice per pull. Consider avoiding duplicate checks (e.g. keep the
check only at operator boundaries OR only at the pipeline task boundary, or
gate one side behind a flag) to reduce overhead.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]