Copilot commented on code in PR #63518:
URL: https://github.com/apache/doris/pull/63518#discussion_r3285775912


##########
be/src/core/column/column_string.h:
##########
@@ -128,7 +127,6 @@ class ColumnStr final : public COWHelper<IColumn, 
ColumnStr<T>> {
         if (offsets[-1] != 0) {
             throw Exception(Status::InternalError("wrong offsets[-1]: {}", 
offsets[-1]));
         }
-#endif
     }

Review Comment:
   Removing the `#ifndef NDEBUG` guard means `sanity_check_simple()` now runs 
in release builds. This method is called from many hot-path accessors (e.g. 
`get_data_at()` and others), so this introduces per-element invariant checking 
overhead across string processing. If the intent is boundary validation, 
consider keeping these checks behind `sanity_check()` (invoked at operator 
output boundaries) or gating with a runtime flag/debug point, rather than 
running on every string access.



##########
be/src/exec/operator/nested_loop_join_probe_operator.cpp:
##########
@@ -51,6 +52,7 @@ ColumnPtr align_eval_column_nullable(const 
ColumnWithTypeAndName& target, const
 
 void append_many_from_source(MutableColumnPtr& dst_column, const 
ColumnWithTypeAndName& src_column,
                              size_t row, size_t rows) {
+    src_column.column->sanity_check();
     if (!src_column.column->is_nullable() && dst_column->is_nullable()) {

Review Comment:
   Same issue as above: `src_column.column->sanity_check()` may throw and is 
not exception-safe in this helper. This can terminate the process rather than 
surfacing a `Status` error. Please wrap in an exception-to-Status conversion at 
a `Status` boundary (or refactor helper to return `Status`).



##########
be/src/exec/operator/nested_loop_join_probe_operator.cpp:
##########
@@ -67,7 +69,9 @@ void append_filtered_from_source(MutableColumnPtr& dst_column,
     if (selected_rows == 0) {
         return;
     }
+    src_column.column->sanity_check();
     auto filtered_column = src_column.column->filter(filter, selected_rows);
+    filtered_column->sanity_check();
     if (!src_column.column->is_nullable() && dst_column->is_nullable()) {

Review Comment:
   `sanity_check()` calls here (both on the source and the filtered result) can 
throw exceptions, and there’s no `RETURN_IF_CATCH_EXCEPTION`/try-catch to 
convert to `Status`. Since the pipeline task loop doesn’t catch 
`doris::Exception`, this risks crashing the backend on malformed columns. 
Please refactor so these checks return/propgate `Status` instead of throwing 
across operator execution.



##########
be/src/exec/operator/nested_loop_join_probe_operator.cpp:
##########
@@ -39,6 +39,7 @@ constexpr int8_t MARK_TRUE = 1;
 constexpr int8_t MARK_NULL = -1;
 
 ColumnPtr make_const_column_from_row(const ColumnWithTypeAndName& source, 
size_t row, size_t rows) {
+    source.column->sanity_check();

Review Comment:
   `sanity_check()` can throw `doris::Exception`, but here it’s called without 
any `RETURN_IF_CATCH_EXCEPTION`/try-catch wrapper. Pipeline execution paths 
(e.g. PipelineTask::execute) don’t generally catch exceptions, so this can 
crash the BE instead of returning a `Status`. Consider moving these checks into 
a `Status`-returning path (wrap at call sites with `RETURN_IF_CATCH_EXCEPTION`, 
or change helper to return `Status` and propagate errors).
   



##########
be/src/exec/operator/operator.cpp:
##########
@@ -413,11 +413,14 @@ Status 
OperatorXBase::get_block_after_projects(RuntimeState* state, Block* block
             return status;
         }
         status = do_projections(state, &local_state->_origin_block, block);
-        return status;
+        RETURN_IF_ERROR(status);
+        RETURN_IF_ERROR(block->check_type_and_column());
+        return Status::OK();
     }
     status = get_block(state, block, eos);
+    RETURN_IF_ERROR(status);
     RETURN_IF_ERROR(block->check_type_and_column());

Review Comment:
   `get_block_after_projects()` now calls `block->check_type_and_column()` on 
every operator output. Note that PipelineTask already calls 
`block->check_type_and_column()` immediately after 
`_root->get_block_after_projects()` 
(be/src/exec/pipeline/pipeline_task.cpp:649-651), so the root output block will 
be validated twice per pull. Consider avoiding duplicate checks (e.g. keep the 
check only at operator boundaries OR only at the pipeline task boundary, or 
gate one side behind a flag) to reduce overhead.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to