Copilot commented on code in PR #60577:
URL: https://github.com/apache/doris/pull/60577#discussion_r2773377927
##########
regression-test/suites/query_p0/join/test_left_join1.groovy:
##########
@@ -23,6 +23,25 @@ suite("test_left_join1", "query,p0,arrow_flight_sql") {
sql """insert into ${tableName} values (1, 123),(2, 124),(3, 125),(4,
126);"""
+ qt_select """ SELECT
+ *
+ FROM
+ ( SELECT f_key, f_value FROM ${tableName} ) a
+ LEFT JOIN ( SELECT f_key, f_value FROM ${tableName} ) b ON
a.f_key = b.f_key
+ LEFT JOIN (
+ SELECT
+ *
+ FROM
+ ${tableName}
+ WHERE
+ f_key IN ( SELECT f_key FROM ${tableName} WHERE f_key IN (
SELECT f_key FROM ${tableName} WHERE f_value > 123 ) )
+ ) c ON a.f_key = c.f_key
+ ORDER BY
+ a.f_key;
+ """
Review Comment:
The test should explicitly verify that the LEFT_SEMI_DIRECT_RETURN_OPT
feature is being used. Consider adding a check in the test (e.g., via EXPLAIN
or query profile) to confirm that "LeftSemiDirectReturn" is set to true,
ensuring that the optimization is actually triggered and tested.
##########
be/src/olap/column_predicate.h:
##########
@@ -329,8 +329,10 @@ class ColumnPredicate : public
std::enable_shared_from_this<ColumnPredicate> {
void attach_profile_counter(
int filter_id, std::shared_ptr<RuntimeProfile::Counter>
predicate_filtered_rows_counter,
std::shared_ptr<RuntimeProfile::Counter>
predicate_input_rows_counter,
- std::shared_ptr<RuntimeProfile::Counter>
predicate_always_true_rows_counter) {
+ std::shared_ptr<RuntimeProfile::Counter>
predicate_always_true_rows_counter,
+ const RuntimeFilterSelectivity& rf_selectivity) {
_runtime_filter_id = filter_id;
+ _rf_selectivity = rf_selectivity;
Review Comment:
Copying RuntimeFilterSelectivity by value in attach_profile_counter may lead
to unexpected behavior. The selectivity tracking state (_judge_input_rows,
_judge_filter_rows, _judge_counter, _always_true, _sampling_frequency) is
copied, but the two instances will then track independently. This means changes
to the original RuntimeFilterSelectivity won't be reflected in the
ColumnPredicate's copy. Consider whether this should be a reference (perhaps
stored as a pointer) or if the copy semantics are intentional.
##########
be/src/runtime_filter/runtime_filter_wrapper.h:
##########
@@ -139,6 +153,10 @@ class RuntimeFilterWrapper {
std::shared_ptr<BloomFilterFuncBase> _bloom_filter_func;
std::shared_ptr<BitmapFilterFuncBase> _bitmap_filter_func;
+ // disable always_true logic if detected in filter
+ // to make left_semi_direct_return_opt work correctly
+ bool _disable_always_true_logic = false;
Review Comment:
The _disable_always_true_logic field should be std::atomic<bool> instead of
plain bool. While there appears to be a happens-before relationship through the
signaling mechanism between the producer setting this flag and the consumer
reading it, using an atomic type would provide clearer thread-safety guarantees
and prevent potential issues from compiler optimizations or memory reordering.
The wrapper is explicitly documented as being shared between producer and
consumer (line 160-161), making thread-safety critical.
##########
be/src/runtime_filter/runtime_filter_consumer.h:
##########
@@ -93,8 +93,10 @@ class RuntimeFilterConsumer : public RuntimeFilter {
_registration_time(MonotonicMillis()),
_rf_state(State::NOT_READY) {
// If bitmap filter is not applied, it will cause the query result to
be incorrect
+ // local rf must wait until timeout, otherwise it may lead results
incorrectness, because LEFT_SEMI_DIRECT_RETURN_OPT
Review Comment:
The comment should clarify the specific scenario. The issue is: when
LEFT_SEMI_DIRECT_RETURN_OPT is enabled, the probe operator bypasses the hash
table lookup and directly outputs all probe rows. For this to be correct, the
local IN filter must be applied to all probe rows before they reach the probe
operator. If the consumer times out and doesn't wait for the local filter,
unfiltered rows would be incorrectly output. Consider expanding the comment to
explain this specific correctness requirement more explicitly.
```suggestion
// When LEFT_SEMI_DIRECT_RETURN_OPT is enabled, the probe operator
can bypass the hash
// table lookup and directly return all probe rows. For the query
result to be correct,
// the local IN/bitmap runtime filter must have been applied to
every probe row before
// it reaches that probe operator. If a local consumer "gives up"
too early (does not
// wait for the local filter to arrive and does not wait until
timeout), unfiltered
// probe rows would be passed through and returned, leading to
incorrect query results.
// Therefore, for certain local filters (e.g. bitmap filters or when
there is no remote
// target), we force the consumer to wait until timeout (or until
the filter is ready).
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]