yiguolei commented on code in PR #59832:
URL: https://github.com/apache/doris/pull/59832#discussion_r2766889531
##########
be/src/pipeline/exec/hashjoin_build_sink.h:
##########
@@ -148,6 +148,14 @@ class HashJoinBuildSinkOperatorX MOCK_REMOVE(final)
}
std::vector<bool>& is_null_safe_eq_join() { return _is_null_safe_eq_join; }
+ bool allow_left_semi_direct_return(RuntimeState* state) const {
+ // only single join conjunct and left semi join can direct return
+ return _join_op == TJoinOp::LEFT_SEMI_JOIN && _build_expr_ctxs.size() == 1 &&
Review Comment:
Is the size of _build_expr_ctxs here guaranteed to equal the expr size on the probe side?
##########
be/src/runtime_filter/runtime_filter_producer_helper.cpp:
##########
@@ -164,4 +164,13 @@ void RuntimeFilterProducerHelper::collect_realtime_profile(
build_timer->set(_runtime_filter_compute_timer->value());
}
+bool RuntimeFilterProducerHelper::detect_local_in_filter(RuntimeState* state) {
+ for (const auto& filter : _producers) {
+ if (filter->detect_in_filter()) {
Review Comment:
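Please add a comment: the _producers array contains only one filter for this join key.
Or, if there are multiple filters (e.g. of type IN_OR_BLOOM), then as long as any one of them is an IN filter, the build-side data is small enough for exact matching.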
##########
be/src/runtime_filter/runtime_filter_consumer.cpp:
##########
@@ -83,6 +83,12 @@ Status RuntimeFilterConsumer::_get_push_exprs(std::vector<vectorized::VRuntimeFi
auto real_filter_type = _wrapper->get_real_type();
bool null_aware = _wrapper->contain_null();
+ bool detected_in_filter = _wrapper->is_detected_in_filter();
+
+ // Set sampling frequency based on detected_in_filter status
+ int sampling_frequency = detected_in_filter ? -1 : config::runtime_filter_sampling_frequency;
Review Comment:
Write it like this, or add a comment:
static constexpr int DISABLE_SAMPLING = -1;
int _sampling_frequency = DISABLE_SAMPLING; // -1 means disabled
##########
be/src/runtime_filter/runtime_filter_selectivity.h:
##########
@@ -61,34 +61,35 @@ class RuntimeFilterSelectivity {
bool maybe_always_true_can_ignore() const {
/// TODO: maybe we can use session variable to control this behavior ?
- if (config::runtime_filter_sampling_frequency <= 0) {
+ if (_sampling_frequency <= 0) {
Review Comment:
Don't rely on an implicit check such as the sample frequency; introduce an explicit variable instead, e.g. _can_be_ignore.
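A minimal sketch of what this suggestion might look like, assuming a heavily simplified stand-in for RuntimeFilterSelectivity. The class name SelectivitySketch, the member _can_be_ignored, its setter, and the 0.1 threshold are all hypothetical names and values chosen for illustration; they are not taken from the Doris code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for RuntimeFilterSelectivity.
class SelectivitySketch {
public:
    // Explicit switch: the caller states its intent directly instead of
    // encoding "never ignore" as a non-positive sampling frequency.
    void set_can_be_ignored(bool value) { _can_be_ignored = value; }

    // Accumulate how many of the input rows the filter removed.
    void update(int64_t filtered_rows, int64_t input_rows) {
        assert(filtered_rows >= 0 && filtered_rows <= input_rows);
        _filtered_rows += filtered_rows;
        _input_rows += input_rows;
    }

    // True when the filter removes so few rows that skipping it is acceptable.
    bool maybe_always_true_can_ignore() const {
        if (!_can_be_ignored) {
            return false;  // explicitly pinned, e.g. for a detected IN filter
        }
        if (_input_rows == 0) {
            return false;  // nothing observed yet
        }
        double ratio = static_cast<double>(_filtered_rows) /
                       static_cast<double>(_input_rows);
        return ratio < kThreshold;
    }

private:
    static constexpr double kThreshold = 0.1;  // assumed value for this sketch
    bool _can_be_ignored = true;
    int64_t _filtered_rows = 0;
    int64_t _input_rows = 0;
};
```

With a flag like this, the consumer would call set_can_be_ignored(false) when an IN filter is detected, and the sampling frequency would keep its single meaning.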
##########
be/src/runtime_filter/runtime_filter_consumer.h:
##########
@@ -93,7 +93,8 @@ class RuntimeFilterConsumer : public RuntimeFilter {
_rf_state(State::NOT_READY) {
// If bitmap filter is not applied, it will cause the query result to be incorrect
bool wait_infinitely = state->runtime_filter_wait_infinitely() ||
Review Comment:
Previously, for a local RF, didn't the wait already succeed?
##########
be/src/runtime_filter/runtime_filter_wrapper.h:
##########
@@ -120,6 +122,17 @@ class RuntimeFilterWrapper {
}
}
+ bool detect_in_filter() {
+ if (get_real_type() != RuntimeFilterType::IN_FILTER) {
+ return false;
+ }
+ if (_state != State::READY) {
+ return false;
+ }
+ _detected_in_filter = true;
Review Comment:
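Don't set this here. This method actually does two things:
1. It checks whether the filter itself is an IN filter.
2. It sets the _detected_in_filter flag, which disables the subsequent always-true logic.
Let's move the second piece of logic into its own method, e.g. disable_always_true.
// 1. Pure query method: only checks whether this is an IN filter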
bool is_local_in_filter() const {
return get_real_type() == RuntimeFilterType::IN_FILTER &&
_state == State::READY;
}
// 2. Explicit state-mutating method: disables the always_true optimization
void disable_always_true_optimization() {
_disable_always_true_optimization = true;
}
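As a rough illustration of how a call site might look once the two responsibilities are split, here is a self-contained sketch. WrapperSketch, FilterTypeSketch, StateSketch, and detect_and_mark are hypothetical stand-ins invented for this example; only the two method names come from the suggestion above.

```cpp
#include <cassert>

// Hypothetical stand-ins for RuntimeFilterType and the wrapper state.
enum class FilterTypeSketch { IN_FILTER, BLOOM_FILTER };
enum class StateSketch { NOT_READY, READY };

class WrapperSketch {
public:
    WrapperSketch(FilterTypeSketch type, StateSketch state)
            : _type(type), _state(state) {}

    // Pure query: const, no side effects.
    bool is_local_in_filter() const {
        return _type == FilterTypeSketch::IN_FILTER &&
               _state == StateSketch::READY;
    }

    // Explicit mutation: the only place the flag is changed.
    void disable_always_true_optimization() {
        _disable_always_true_optimization = true;
    }

    bool always_true_optimization_disabled() const {
        return _disable_always_true_optimization;
    }

private:
    FilterTypeSketch _type;
    StateSketch _state;
    bool _disable_always_true_optimization = false;
};

// Call site: the check and the state change are now two visible steps.
bool detect_and_mark(WrapperSketch& wrapper) {
    if (!wrapper.is_local_in_filter()) {
        return false;
    }
    wrapper.disable_always_true_optimization();
    return true;
}
```

The benefit of the split is that a reader of the call site can see the state change, and the query can be reused in places where mutating the wrapper would be wrong.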
##########
be/src/runtime_filter/runtime_filter_producer_helper.cpp:
##########
@@ -164,4 +164,13 @@ void RuntimeFilterProducerHelper::collect_realtime_profile(
build_timer->set(_runtime_filter_compute_timer->value());
}
+bool RuntimeFilterProducerHelper::detect_local_in_filter(RuntimeState* state) {
+ for (const auto& filter : _producers) {
+ if (filter->detect_in_filter()) {
Review Comment:
Why does this return true as soon as a single filter is an IN filter, rather than requiring all of them to be?
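To make the two alternatives in this question concrete, they can be sketched with the standard algorithms. FilterSketch and both function names are hypothetical and only model the "is this an IN filter" bit; which semantics is correct for the PR is exactly what the question asks and is not settled here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for one producer in _producers.
struct FilterSketch {
    bool is_in_filter;
};

// Semantics in the PR as written: one IN filter is enough.
bool any_in_filter(const std::vector<FilterSketch>& producers) {
    return std::any_of(producers.begin(), producers.end(),
                       [](const FilterSketch& f) { return f.is_in_filter; });
}

// Stricter alternative: every producer must be an IN filter.
// An empty list is treated as "not detected" rather than vacuously true.
bool all_in_filters(const std::vector<FilterSketch>& producers) {
    return !producers.empty() &&
           std::all_of(producers.begin(), producers.end(),
                       [](const FilterSketch& f) { return f.is_in_filter; });
}
```

The two functions differ only on mixed inputs, so the choice matters precisely when a join key has several producers of different real types.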
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]