github-actions[bot] commented on code in PR #61260:
URL: https://github.com/apache/doris/pull/61260#discussion_r2964634011


##########
be/src/exec/operator/aggregation_sink_operator.cpp:
##########
@@ -25,6 +25,7 @@
 #include "core/data_type/primitive_type.h"
 #include "exec/common/hash_table/hash.h"
 #include "exec/operator/operator.h"
+#include "exprs/aggregate/aggregate_function_count.h"
 #include "exprs/aggregate/aggregate_function_simple_factory.h"
 #include "exprs/vectorized_agg_fn.h"
 #include "runtime/runtime_profile.h"

Review Comment:
   **Minor: Comment says "count(*) / count(not_null_column)" but only count(*) 
is actually supported**
   
   `AggregateFunctionCountNotNullUnary` does NOT override `is_simple_count()` 
(it stays `false`). So this optimization only applies to 
`AggregateFunctionCount` (the `count(*)` variant), not to 
`count(not_null_column)`. The comment is misleading.
   
   Also, consider adding `static_assert(sizeof(AggregateDataPtr) == 
sizeof(UInt64))` somewhere in this file to document and enforce the fundamental 
64-bit assumption of this optimization.
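   A minimal standalone sketch of that guard (here `AggregateDataPtr` is a stand-in alias mirroring the codebase's `char*` definition, so this compiles outside the file in question):
   
   ```cpp
   #include <cassert>
   #include <cstdint>
   
   // Stand-in for the codebase's AggregateDataPtr (a char* alias).
   using AggregateDataPtr = char*;
   
   // The simple-count optimization stores a raw 64-bit counter in the hash
   // table slot that normally holds a pointer to aggregate state. That pun
   // is only sound when the two types occupy the same 8 bytes.
   static_assert(sizeof(AggregateDataPtr) == sizeof(uint64_t),
                 "simple-count optimization reinterprets the mapped pointer "
                 "slot as a 64-bit counter");
   ```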



##########
be/src/exec/operator/streaming_aggregation_operator.cpp:
##########
@@ -97,22 +99,36 @@ Status StreamingAggLocalState::open(RuntimeState* state) {
 
     RETURN_IF_ERROR(_init_hash_method(_probe_expr_ctxs));
 
-    std::visit(Overload {[&](std::monostate& arg) -> void {
-                             throw doris::Exception(ErrorCode::INTERNAL_ERROR,
-                                                    "uninited hash table");
-                         },

Review Comment:
   **Bug: `limit == -1` check is always true (tautology)**
   
   At this point, `limit` is still its default value `-1` (initialized in the 
header at line 106). The actual assignment `limit = p._sort_limit` happens 
later at line 133.
   
   This means when `p._sort_limit > 0` (sort-limit optimization active), 
`_use_simple_count` will incorrectly be set to `true`, and 
`_aggregate_data_container` will be `nullptr` (skipped at line 122). This leads 
to a null pointer dereference when `build_limit_heap()` → 
`_get_keys_hash_table()` accesses `_aggregate_data_container->begin()`.
   
   **Fix:** Either move `limit = p._sort_limit` before this check, or use 
`p._sort_limit` directly:
   ```cpp
   if (_aggregate_evaluators.size() == 1 &&
       _aggregate_evaluators[0]->function()->is_simple_count() && p._sort_limit 
== -1) {
   ```
   
   Note: The analogous check in `AggSinkLocalState::open()` 
(aggregation_sink_operator.cpp:175) correctly uses `!_should_limit_output` 
which is already set by that point.
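   The ordering hazard reproduces in isolation. The stub below is hypothetical (member names mirror this PR, but the class is not the real `StreamingAggLocalState`); it shows how reading the header default before the real assignment makes the check a tautology:
   
   ```cpp
   #include <cassert>
   #include <cstdint>
   
   // Hypothetical stub mirroring the open() ordering described above.
   struct StreamingAggLocalStateStub {
       int64_t limit = -1;            // header default (line 106)
       bool use_simple_count = false;
   
       void open(int64_t sort_limit) {
           // Buggy order: this reads the default -1, so it is true
           // regardless of whether a sort limit is active.
           use_simple_count = (limit == -1);
           limit = sort_limit;        // real assignment happens later (line 133)
       }
   };
   ```
   
   Even with a sort limit of 10, `use_simple_count` ends up `true`, which is exactly the state that later null-derefs `_aggregate_data_container`.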



##########
be/src/exec/pipeline/dependency.h:
##########
@@ -320,6 +320,7 @@ struct AggSharedState : public BasicSharedState {
     bool enable_spill = false;
     bool reach_limit = false;
 
+    bool use_simple_count = false;
     int64_t limit = -1;
     bool do_sort_limit = false;
     MutableColumns limit_columns;

Review Comment:
   **Latent bug: `reset_hash_table()` in `dependency.cpp:278-311` lacks 
`use_simple_count` guard**
   
   `_close_with_serialized_key()` here correctly checks `use_simple_count` 
before calling `_destroy_agg_status()`. However, 
`AggSharedState::reset_hash_table()` (dependency.cpp:290-295) does NOT check 
`use_simple_count` before calling `_destroy_agg_status()` on mapped values.
   
   Currently this is safe because `use_simple_count` requires `!enable_spill` and `reset_hash_table()` is only reached from spill paths. But if that spill exclusion is ever relaxed, calling `_destroy_agg_status()` on a raw counter becomes a wild-pointer dereference crash.
   
   Suggestion: Add a matching guard or at minimum a `DCHECK(!use_simple_count)` 
in `reset_hash_table()` for safety.
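   A hedged sketch of the minimal variant of that suggestion (the struct is a hypothetical stand-in, not the real `AggSharedState`, and `assert` stands in for `DCHECK`):
   
   ```cpp
   #include <cassert>
   
   // Hypothetical stand-in illustrating the suggested guard.
   struct AggSharedStateStub {
       bool use_simple_count = false;
       int destroyed = 0;  // counts stand-in _destroy_agg_status() calls
   
       void reset_hash_table() {
           // In simple-count mode the mapped values are raw counters, not
           // pointers to aggregate state, so destroying them would chase
           // garbage. Today this holds because use_simple_count requires
           // !enable_spill and reset is only reached from spill paths.
           assert(!use_simple_count);
           ++destroyed;  // placeholder for per-entry _destroy_agg_status(mapped)
       }
   };
   ```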



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

