(doris) branch master updated (cfe1506550c -> 9bd671f6b97)

2024-09-30 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from cfe1506550c [opt](nereids) reimplement or-to-in rule (#41222)
 add 9bd671f6b97 [chore](check) open shorten-64-to-32 error (#41197)

No new revisions were added by this update.

Summary of changes:
 be/src/common/cast_set.h   | 73 ++
 .../{io/fs/hdfs.h => common/compile_check_begin.h} |  8 +--
 .../{env_config.h.in => compile_check_end.h}   |  9 ++-
 be/src/pipeline/exec/aggregation_sink_operator.cpp | 18 +++---
 .../pipeline/exec/aggregation_source_operator.cpp  | 18 +++---
 be/src/pipeline/exec/analytic_sink_operator.cpp|  3 +-
 be/src/pipeline/exec/analytic_sink_operator.h  |  2 +-
 be/src/pipeline/exec/analytic_source_operator.cpp  |  7 ++-
 be/src/pipeline/exec/analytic_source_operator.h|  2 +-
 be/src/pipeline/exec/scan_operator.cpp |  3 +-
 be/src/pipeline/pipeline_fragment_context.cpp  | 22 ---
 be/src/vec/core/block.cpp  |  4 +-
 be/src/vec/core/block.h|  6 +-
 be/src/vec/exprs/vectorized_agg_fn.cpp |  2 +-
 be/src/vec/exprs/vectorized_agg_fn.h   |  2 +-
 be/src/vec/exprs/vexpr_context.cpp |  4 +-
 be/src/vec/exprs/vexpr_context.h   |  4 +-
 17 files changed, 133 insertions(+), 54 deletions(-)
 create mode 100644 be/src/common/cast_set.h
 copy be/src/{io/fs/hdfs.h => common/compile_check_begin.h} (85%)
 copy be/src/common/{env_config.h.in => compile_check_end.h} (89%)


-
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org



(doris) branch branch-3.0 updated: [cherry-pick](branch-30) execute expr should use local states instead of operators(#40189) (#41324)

2024-09-27 Thread lihaopeng

lihaopeng pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 5734bbcb287 [cherry-pick](branch-30) execute expr should use local states instead of operators(#40189) (#41324)
5734bbcb287 is described below

commit 5734bbcb287a4d16cd0a7db545c406d966c714fb
Author: zhangstar333 <87313068+zhangstar...@users.noreply.github.com>
AuthorDate: Fri Sep 27 19:36:45 2024 +0800

[cherry-pick](branch-30) execute expr should use local states instead of operators(#40189) (#41324)

The exprs of an operator cannot be executed concurrently; each task should use its local state's exprs instead.
cherry-pick from master https://github.com/apache/doris/pull/40189
---
 be/src/pipeline/exec/aggregation_source_operator.cpp|  3 ++-
 be/src/pipeline/exec/assert_num_rows_operator.cpp   |  3 ++-
 be/src/pipeline/exec/assert_num_rows_operator.h |  3 +++
 .../exec/distinct_streaming_aggregation_operator.cpp|  4 ++--
 be/src/pipeline/exec/multi_cast_data_stream_source.h|  7 ---
 be/src/pipeline/exec/operator.h | 12 +++-
 be/src/pipeline/exec/repeat_operator.cpp|  2 +-
 be/src/pipeline/exec/streaming_aggregation_operator.cpp |  4 ++--
 regression-test/data/javaudf_p0/test_javaudf_string.out |  3 +++
 .../suites/javaudf_p0/test_javaudf_string.groovy| 13 +
 10 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/be/src/pipeline/exec/aggregation_source_operator.cpp b/be/src/pipeline/exec/aggregation_source_operator.cpp
index 3264ad56f3c..a5f40a431c5 100644
--- a/be/src/pipeline/exec/aggregation_source_operator.cpp
+++ b/be/src/pipeline/exec/aggregation_source_operator.cpp
@@ -443,7 +443,8 @@ Status AggSourceOperatorX::get_block(RuntimeState* state, vectorized::Block* blo
     RETURN_IF_ERROR(local_state._executor.get_result(state, block, eos));
     local_state.make_nullable_output_key(block);
     // dispose the having clause, should not be execute in prestreaming agg
-    RETURN_IF_ERROR(vectorized::VExprContext::filter_block(_conjuncts, block, block->columns()));
+    RETURN_IF_ERROR(vectorized::VExprContext::filter_block(local_state._conjuncts, block,
+                                                           block->columns()));
     local_state.do_agg_limit(block, eos);
     return Status::OK();
 }
diff --git a/be/src/pipeline/exec/assert_num_rows_operator.cpp b/be/src/pipeline/exec/assert_num_rows_operator.cpp
index 4a51002beff..5aa27b51c45 100644
--- a/be/src/pipeline/exec/assert_num_rows_operator.cpp
+++ b/be/src/pipeline/exec/assert_num_rows_operator.cpp
@@ -116,7 +116,8 @@ Status AssertNumRowsOperatorX::pull(doris::RuntimeState* state, vectorized::Bloc
     }
     COUNTER_SET(local_state.rows_returned_counter(), local_state.num_rows_returned());
     COUNTER_UPDATE(local_state.blocks_returned_counter(), 1);
-    RETURN_IF_ERROR(vectorized::VExprContext::filter_block(_conjuncts, block, block->columns()));
+    RETURN_IF_ERROR(vectorized::VExprContext::filter_block(local_state._conjuncts, block,
+                                                           block->columns()));
     return Status::OK();
 }
 
diff --git a/be/src/pipeline/exec/assert_num_rows_operator.h b/be/src/pipeline/exec/assert_num_rows_operator.h
index 423bd69144e..dcc64f57878 100644
--- a/be/src/pipeline/exec/assert_num_rows_operator.h
+++ b/be/src/pipeline/exec/assert_num_rows_operator.h
@@ -28,6 +28,9 @@ public:
     AssertNumRowsLocalState(RuntimeState* state, OperatorXBase* parent)
             : PipelineXLocalState(state, parent) {}
     ~AssertNumRowsLocalState() = default;
+
+private:
+    friend class AssertNumRowsOperatorX;
 };
 
 class AssertNumRowsOperatorX final : public StreamingOperatorX<AssertNumRowsLocalState> {
diff --git a/be/src/pipeline/exec/distinct_streaming_aggregation_operator.cpp b/be/src/pipeline/exec/distinct_streaming_aggregation_operator.cpp
index e8efb51973e..ab71b52ae01 100644
--- a/be/src/pipeline/exec/distinct_streaming_aggregation_operator.cpp
+++ b/be/src/pipeline/exec/distinct_streaming_aggregation_operator.cpp
@@ -462,8 +462,8 @@ Status DistinctStreamingAggOperatorX::pull(RuntimeState* state, vectorized::Bloc
     local_state._make_nullable_output_key(block);
     if (!_is_streaming_preagg) {
         // dispose the having clause, should not be execute in prestreaming agg
-        RETURN_IF_ERROR(
-                vectorized::VExprContext::filter_block(_conjuncts, block, block->columns()));
+        RETURN_IF_ERROR(vectorized::VExprContext::filter_block(local_state._conjuncts, block,
+                                                               block->columns()));
     }
     local_state.add_num_rows_returned(block->rows());
     COUNTER_UPDATE(local_state.blocks_retu

(doris) branch master updated: [fix](function) add time type in conditional-functions (#41270)

2024-09-26 Thread lihaopeng

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 0830ecb2e9e [fix](function) add time type in conditional-functions (#41270)
0830ecb2e9e is described below

commit 0830ecb2e9ea537ac6ff1ba948461fd00022bf7f
Author: Mryange <59914473+mrya...@users.noreply.github.com>
AuthorDate: Thu Sep 26 22:27:32 2024 +0800

[fix](function) add time type in conditional-functions (#41270)

before
```
mysql [(none)]>select sec_to_time(time_to_sec(cast('2024-09-24 16:00:00' as varchar)));
+----------------------------------------------------------------------------------------+
| sec_to_time(time_to_sec(cast(cast('2024-09-24 16:00:00' as VARCHAR(65533)) as TIME))) |
+----------------------------------------------------------------------------------------+
| 16:00:00                                                                               |
+----------------------------------------------------------------------------------------+

mysql [(none)]>select ifnull(sec_to_time(time_to_sec(cast('2024-09-24 16:00:00' as varchar))), cast(300 as time));
+--------------------------------------------------------------------------------------------------------------------------------------------------+
| ifnull(cast(sec_to_time(time_to_sec(cast(cast('2024-09-24 16:00:00' as VARCHAR(65533)) as TIME))) as DOUBLE), cast(cast(300 as TIME) as DOUBLE)) |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
| 576                                                                                                                                              |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
```
now
```
mysql [(none)]>select ifnull(sec_to_time(time_to_sec(cast('2024-09-24 16:00:00' as varchar))), cast(300 as time));
+------------------------------------------------------------------------------------------------------------------+
| ifnull(sec_to_time(time_to_sec(cast(cast('2024-09-24 16:00:00' as VARCHAR(65533)) as TIME))), cast(300 as TIME)) |
+------------------------------------------------------------------------------------------------------------------+
| 16:00:00                                                                                                         |
+------------------------------------------------------------------------------------------------------------------+
```
---
 .../nereids/trees/expressions/functions/scalar/Coalesce.java |  4 
 .../doris/nereids/trees/expressions/functions/scalar/If.java |  9 -
 .../nereids/trees/expressions/functions/scalar/NullIf.java   |  4 
 .../nereids/trees/expressions/functions/scalar/Nvl.java  |  8 +++-
 .../data/correctness_p0/test_case_when_decimal.out   |  6 ++
 .../suites/correctness_p0/test_case_when_decimal.groovy  | 12 
 6 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/Coalesce.java b/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/Coalesce.java
index f1d122d0179..2ed864ba9a0 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/Coalesce.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/Coalesce.java
@@ -36,6 +36,8 @@ import org.apache.doris.nereids.types.IntegerType;
 import org.apache.doris.nereids.types.LargeIntType;
 import org.apache.doris.nereids.types.SmallIntType;
 import org.apache.doris.nereids.types.StringType;
+import org.apache.doris.nereids.types.TimeType;
+import org.apache.doris.nereids.types.TimeV2Type;
 import org.apache.doris.nereids.types.TinyIntType;
 import org.apache.doris.nereids.types.VarcharType;
 import org.apache.doris.nereids.util.ExpressionUtils;
@@ -64,6 +66,8 @@ public class Coalesce extends ScalarFunction
             FunctionSignature.ret(DateTimeType.INSTANCE).varArgs(DateTimeType.INSTANCE),
             FunctionSignature.ret(DateV2Type.INSTANCE).varArgs(DateV2Type.INSTANCE),
             FunctionSignature.ret(DateType.INSTANCE).varArgs(DateType.INSTANCE),
+            FunctionSignature.ret(TimeType.INSTANCE).varArgs(TimeType.INSTANCE),
+            FunctionSignature.ret(TimeV2Type.INSTANCE).varArgs(TimeV2Type.INSTANCE),
             FunctionSignature.ret(DecimalV3Type.WILDCARD).varArgs(DecimalV3Type.WILDCARD),
             FunctionSigna

(doris) branch master updated: [Performance](func) opt the print_id func (#41302)

2024-09-25 Thread lihaopeng

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 2208cde22e7 [Performance](func) opt the print_id func (#41302)
2208cde22e7 is described below

commit 2208cde22e7d8f85c7b62ee571d6dc7fd514b91f
Author: HappenLee 
AuthorDate: Thu Sep 26 10:20:10 2024 +0800

[Performance](func) opt the print_id func (#41302)

Load Average: 52.48, 35.97, 38.43
--------------------------------------------------------
Benchmark          Time             CPU      Iterations
--------------------------------------------------------
old       3390427270 ns   3390354519 ns              1
new        335514305 ns    335513720 ns              2
---
 be/src/util/uid_util.cpp | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/be/src/util/uid_util.cpp b/be/src/util/uid_util.cpp
index 6743c05a842..0f93f437ab6 100644
--- a/be/src/util/uid_util.cpp
+++ b/be/src/util/uid_util.cpp
@@ -17,6 +17,7 @@
 
 #include "util/uid_util.h"
 
+#include 
 #include 
 #include 
 #include 
@@ -44,15 +45,13 @@ std::ostream& operator<<(std::ostream& os, const UniqueId& uid) {
 }
 
 std::string print_id(const TUniqueId& id) {
-    std::stringstream out;
-    out << std::hex << id.hi << "-" << id.lo;
-    return out.str();
+    return fmt::format(FMT_COMPILE("{:x}-{:x}"), static_cast<uint64_t>(id.hi),
+                       static_cast<uint64_t>(id.lo));
 }
 
 std::string print_id(const PUniqueId& id) {
-    std::stringstream out;
-    out << std::hex << id.hi() << "-" << id.lo();
-    return out.str();
+    return fmt::format(FMT_COMPILE("{:x}-{:x}"), static_cast<uint64_t>(id.hi()),
+                       static_cast<uint64_t>(id.lo()));
 }
 
 bool parse_id(const std::string& s, TUniqueId* id) {





(doris) branch master updated (af9a2f43862 -> c687481db84)

2024-09-25 Thread lihaopeng

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from af9a2f43862 [improvement](binlog) filter dropped indexes (#41246)
 add c687481db84 [Exec](cache) add element count in LRU cache (#41199)

No new revisions were added by this update.

Summary of changes:
 be/src/olap/lru_cache.cpp   | 8 
 be/src/olap/lru_cache.h | 4 
 be/src/pipeline/query_cache/query_cache.cpp | 2 +-
 be/src/runtime/memory/lru_cache_policy.h| 2 ++
 4 files changed, 15 insertions(+), 1 deletion(-)





(doris) branch master updated (c95eff563b6 -> 82262218c68)

2024-09-24 Thread lihaopeng

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from c95eff563b6 [fix](load) fix memtable memory limiter total mem usage (#41245)
 add 82262218c68 [feat](skew & kurt) New aggregate function skew & kurt (#40945)

No new revisions were added by this update.

Summary of changes:
 .../pipeline/exec/aggregation_source_operator.cpp  |   5 +
 .../aggregate_function_kurtosis.cpp|  80 ++
 .../aggregate_function_simple_factory.cpp  |   5 +
 .../aggregate_function_simple_factory.h|   1 -
 .../aggregate_function_skew.cpp|  80 ++
 .../aggregate_function_statistic.h | 163 +
 be/src/vec/aggregate_functions/moments.h   | 114 ++
 .../doris/catalog/BuiltinAggregateFunctions.java   |   6 +-
 .../java/org/apache/doris/catalog/FunctionSet.java |  36 +
 .../trees/expressions/functions/agg/Kurt.java  |  79 ++
 .../trees/expressions/functions/agg/Skew.java  |  80 ++
 .../visitor/AggregateFunctionVisitor.java  |  10 ++
 .../query_p0/aggregate/aggregate_function_kurt.out |  52 +++
 .../query_p0/aggregate/aggregate_function_skew.out |  52 +++
 .../aggregate/aggregate_function_kurt.groovy   |  78 ++
 .../aggregate/aggregate_function_skew.groovy   |  78 ++
 16 files changed, 917 insertions(+), 2 deletions(-)
 create mode 100644 be/src/vec/aggregate_functions/aggregate_function_kurtosis.cpp
 create mode 100644 be/src/vec/aggregate_functions/aggregate_function_skew.cpp
 create mode 100644 be/src/vec/aggregate_functions/aggregate_function_statistic.h
 create mode 100644 be/src/vec/aggregate_functions/moments.h
 create mode 100644 fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/agg/Kurt.java
 create mode 100644 fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/agg/Skew.java
 create mode 100644 regression-test/data/query_p0/aggregate/aggregate_function_kurt.out
 create mode 100644 regression-test/data/query_p0/aggregate/aggregate_function_skew.out
 create mode 100644 regression-test/suites/query_p0/aggregate/aggregate_function_kurt.groovy
 create mode 100644 regression-test/suites/query_p0/aggregate/aggregate_function_skew.groovy





(doris) branch master updated: [refactor](opt) improve BE code readability of multi_match_any function (#39354)

2024-09-22 Thread lihaopeng

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new cb2e915869d [refactor](opt) improve BE code readability of multi_match_any function (#39354)
cb2e915869d is described below

commit cb2e915869d1590640ba44b25c40207b3603ae99
Author: Chester 
AuthorDate: Mon Sep 23 09:34:30 2024 +0800

[refactor](opt) improve BE code readability of multi_match_any function (#39354)

To improve the BE code readability of the **multi_match_any** function, this PR refactored the code by:
1. replacing the deprecated C++ header 'stddef.h' with 'cstddef'
2. applying readability-qualified-auto
3. applying readability-braces-around-statements
4. extracting the common code of `vector_constant()` and `vector_vector()` into two functions: `prepare_regexps_and_scratch()` and `on_match()`
5. simplifying the null-input handling of `execute_impl()` by removing two rarely used variables, `haystack_nullable` and `needles_nullable`, and adding the function `handle_nullable_column()`
---
 .../functions/functions_multi_string_search.cpp| 200 +++--
 1 file changed, 101 insertions(+), 99 deletions(-)

diff --git a/be/src/vec/functions/functions_multi_string_search.cpp b/be/src/vec/functions/functions_multi_string_search.cpp
index f7a1b8d7a90..7736a1a039b 100644
--- a/be/src/vec/functions/functions_multi_string_search.cpp
+++ b/be/src/vec/functions/functions_multi_string_search.cpp
@@ -20,10 +20,10 @@
 
 #include 
 #include 
-#include <stddef.h>
 
 #include 
 #include 
+#include <cstddef>
 #include 
 #include 
 #include 
@@ -80,42 +80,30 @@ public:
         auto haystack_column = block.get_by_position(arguments[0]).column;
         auto needles_column = block.get_by_position(arguments[1]).column;
 
-        bool haystack_nullable = false;
-        bool needles_nullable = false;
-
-        if (haystack_column->is_nullable()) {
-            haystack_nullable = true;
-        }
-
-        if (needles_column->is_nullable()) {
-            needles_nullable = true;
-        }
-
         auto haystack_ptr = remove_nullable(haystack_column);
         auto needles_ptr = remove_nullable(needles_column);
 
-        const ColumnString* col_haystack_vector =
-                check_and_get_column<ColumnString>(&*haystack_ptr);
+        const auto* col_haystack_vector = check_and_get_column<ColumnString>(&*haystack_ptr);
         const ColumnConst* col_haystack_const =
                 check_and_get_column_const<ColumnString>(&*haystack_ptr);
 
-        const ColumnArray* col_needles_vector =
-                check_and_get_column<ColumnArray>(needles_ptr.get());
+        const auto* col_needles_vector = check_and_get_column<ColumnArray>(needles_ptr.get());
         const ColumnConst* col_needles_const =
                 check_and_get_column_const<ColumnArray>(needles_ptr.get());
 
-        if (!col_needles_const && !col_needles_vector)
+        if (!col_needles_const && !col_needles_vector) {
             return Status::InvalidArgument(
                     "function '{}' encountered unsupported needles column, found {}", name,
                     needles_column->get_name());
+        }
 
-        if (col_haystack_const && col_needles_vector)
+        if (col_haystack_const && col_needles_vector) {
             return Status::InvalidArgument(
                     "function '{}' doesn't support search with non-constant needles "
                     "in constant haystack",
                     name);
+        }
 
-        using ResultType = typename Impl::ResultType;
         auto col_res = ColumnVector<ResultType>::create();
         auto col_offsets = ColumnArray::ColumnOffsets::create();
 
@@ -140,25 +128,8 @@ public:
             return status;
         }
 
-        if (haystack_nullable) {
-            auto column_nullable = check_and_get_column<ColumnNullable>(haystack_column.get());
-            auto& null_map = column_nullable->get_null_map_data();
-            for (size_t i = 0; i != input_rows_count; ++i) {
-                if (null_map[i] == 1) {
-                    vec_res[i] = 0;
-                }
-            }
-        }
-
-        if (needles_nullable) {
-            auto column_nullable = check_and_get_column<ColumnNullable>(needles_column.get());
-            auto& null_map = column_nullable->get_null_map_data();
-            for (size_t i = 0; i != input_rows_count; ++i) {
-                if (null_map[i] == 1) {
-                    vec_res[i] = 0;
-                }
-            }
-        }
+        handle_nullable_column(haystack_column, vec_res, input_rows_count);
+        handle_nullable_column(needles_column, vec_res, input_rows_count);
 
         block.replace_by_position(result, std::move(col_res));
 
@@ -166,9 +137,25 @@ public:
 }
 
 private:
+ 

(doris) branch branch-2.0 updated: [Fix-2.0](column) Fix wrong has_null flag after filter_by_selector (#40849)

2024-09-20 Thread lihaopeng

lihaopeng pushed a commit to branch branch-2.0
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-2.0 by this push:
 new f4674f61a76 [Fix-2.0](column) Fix wrong has_null flag after filter_by_selector (#40849)
f4674f61a76 is described below

commit f4674f61a76fd2e774309849db6a5623b9c6a479
Author: zclllhhjj 
AuthorDate: Fri Sep 20 19:04:11 2024 +0800

[Fix-2.0](column) Fix wrong has_null flag after filter_by_selector (#40849)

In master and branch-2.1 this is fixed by the refactor
https://github.com/apache/doris/pull/40769. For 2.0 we pick
https://github.com/apache/doris/pull/40756 as a minimal modification
---
 be/src/vec/columns/column_nullable.cpp | 11 ++---
 .../correctness/test_column_nullable_cache.out |  5 ++
 .../correctness/test_column_nullable_cache.groovy  | 57 ++
 3 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/be/src/vec/columns/column_nullable.cpp b/be/src/vec/columns/column_nullable.cpp
index 47bd2d9e247..5dd612e6ced 100644
--- a/be/src/vec/columns/column_nullable.cpp
+++ b/be/src/vec/columns/column_nullable.cpp
@@ -365,15 +365,14 @@ size_t ColumnNullable::filter(const Filter& filter) {
 }
 
 Status ColumnNullable::filter_by_selector(const uint16_t* sel, size_t sel_size, IColumn* col_ptr) {
-    const ColumnNullable* nullable_col_ptr = reinterpret_cast<const ColumnNullable*>(col_ptr);
+    ColumnNullable* nullable_col_ptr = reinterpret_cast<ColumnNullable*>(col_ptr);
     ColumnPtr nest_col_ptr = nullable_col_ptr->nested_column;
-    ColumnPtr null_map_ptr = nullable_col_ptr->null_map;
+    // `get_null_map_data` will set `_need_update_has_null` to true
+    auto& res_nullmap = nullable_col_ptr->get_null_map_data();
+
     RETURN_IF_ERROR(get_nested_column().filter_by_selector(
             sel, sel_size, const_cast<IColumn*>(nest_col_ptr.get())));
-    // insert cur nullmap into result nullmap which is empty
-    auto& res_nullmap = reinterpret_cast<ColumnVector<UInt8>*>(
-                                const_cast<IColumn*>(null_map_ptr.get()))
-                                ->get_data();
+
     DCHECK(res_nullmap.empty());
     res_nullmap.resize(sel_size);
     auto& cur_nullmap = get_null_map_column().get_data();
diff --git a/regression-test/data/correctness/test_column_nullable_cache.out b/regression-test/data/correctness/test_column_nullable_cache.out
new file mode 100644
index 000..024f1702b54
--- /dev/null
+++ b/regression-test/data/correctness/test_column_nullable_cache.out
@@ -0,0 +1,5 @@
+-- This file is automatically generated. You should know what you did if you want to edit this
+-- !test --
+
+-- !test --
+0
diff --git a/regression-test/suites/correctness/test_column_nullable_cache.groovy b/regression-test/suites/correctness/test_column_nullable_cache.groovy
b/regression-test/suites/correctness/test_column_nullable_cache.groovy
new file mode 100644
index 000..c6ec7b73848
--- /dev/null
+++ b/regression-test/suites/correctness/test_column_nullable_cache.groovy
@@ -0,0 +1,57 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+suite("test_column_nullable_cache") {
+sql """
+drop table if exists test_column_nullable_cache;
+"""
+sql """
+CREATE TABLE `test_column_nullable_cache` (
+`col_int_undef_signed2` int NULL,
+`col_int_undef_signed` int NULL,
+`col_int_undef_signed3` int NULL,
+`col_int_undef_signed4` int NULL,
+`pk` int NULL
+) ENGINE=OLAP
+DUPLICATE KEY(`col_int_undef_signed2`)
+DISTRIBUTED by RANDOM BUCKETS 10
+PROPERTIES (
+"replication_allocation" = "tag.location.default: 1"
+);
+"""
+
+sql """
+insert into test_column_nullable_cache
+        (pk,col_int_undef_signed,col_int_undef_signed2,col_int_undef_signed3,col_int_undef_signed4)
+    values (0,3,7164641,5,8),(1,null,3916062,5,6),(2,1,5533498,0,9),(3,7,2,null,7057679),(4,1,0,7,7),
+        (5,null,4,2448564,1),(6,7531976,7324373,9,7),(7,3,1,1,3)

(doris) branch master updated (c55c72121c2 -> 066de31fdae)

2024-09-20 Thread lihaopeng

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from c55c72121c2 [opt](binlog) Support rename column binlog (#39782)
 add 066de31fdae [Performance](exec) Add EOF status back in exchange operator (#41011)

No new revisions were added by this update.

Summary of changes:
 be/src/util/ref_count_closure.h | 8 ++--
 be/src/vec/runtime/vdata_stream_mgr.cpp | 6 +++---
 be/src/vec/sink/vdata_stream_sender.h   | 3 ++-
 3 files changed, 11 insertions(+), 6 deletions(-)





(doris) branch master updated: [Refactor](scan) remove useless code in BE and FE (#40927)

2024-09-18 Thread lihaopeng

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 552cb345be8 [Refactor](scan) remove useless code in BE and FE (#40927)
552cb345be8 is described below

commit 552cb345be8fa5d359db88d2db910449d6889e60
Author: HappenLee 
AuthorDate: Thu Sep 19 11:15:19 2024 +0800

[Refactor](scan) remove useless code in BE and FE (#40927)

1. fix an erroneous comment in FE
2. delete the useless code in BE/FE
---
 be/src/pipeline/exec/scan_operator.cpp   | 12 +++-
 .../src/main/java/org/apache/doris/qe/Coordinator.java   |  3 ---
 .../src/main/java/org/apache/doris/qe/SessionVariable.java   |  4 +++-
 3 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/be/src/pipeline/exec/scan_operator.cpp b/be/src/pipeline/exec/scan_operator.cpp
index 0c0cfb18c77..507039b1f5e 100644
--- a/be/src/pipeline/exec/scan_operator.cpp
+++ b/be/src/pipeline/exec/scan_operator.cpp
@@ -1200,18 +1200,12 @@ Status ScanOperatorX::init(const TPlanNode& tnode, RuntimeState*
             }
         }
     } else {
-        DCHECK(query_options.__isset.adaptive_pipeline_task_serial_read_on_limit);
         // The set of enable_adaptive_pipeline_task_serial_read_on_limit
         // is checked in previous branch.
         if (query_options.enable_adaptive_pipeline_task_serial_read_on_limit) {
-            int32_t adaptive_pipeline_task_serial_read_on_limit =
-                    ADAPTIVE_PIPELINE_TASK_SERIAL_READ_ON_LIMIT_DEFAULT;
-            if (query_options.__isset.adaptive_pipeline_task_serial_read_on_limit) {
-                adaptive_pipeline_task_serial_read_on_limit =
-                        query_options.adaptive_pipeline_task_serial_read_on_limit;
-            }
-
-            if (tnode.limit > 0 && tnode.limit <= adaptive_pipeline_task_serial_read_on_limit) {
+            DCHECK(query_options.__isset.adaptive_pipeline_task_serial_read_on_limit);
+            if (tnode.limit > 0 &&
+                tnode.limit <= query_options.adaptive_pipeline_task_serial_read_on_limit) {
                 _should_run_serial = true;
             }
         }
diff --git a/fe/fe-core/src/main/java/org/apache/doris/qe/Coordinator.java b/fe/fe-core/src/main/java/org/apache/doris/qe/Coordinator.java
index e0ec272cf4c..b1ea3772deb 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/qe/Coordinator.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/qe/Coordinator.java
@@ -426,15 +426,12 @@ public class Coordinator implements CoordInterface {
 
     private void initQueryOptions(ConnectContext context) {
         this.queryOptions = context.getSessionVariable().toThrift();
-        this.queryOptions.setBeExecVersion(Config.be_exec_version);
         this.queryOptions.setQueryTimeout(context.getExecTimeout());
         this.queryOptions.setExecutionTimeout(context.getExecTimeout());
         if (this.queryOptions.getExecutionTimeout() < 1) {
             LOG.info("try set timeout less than 1", new RuntimeException(""));
         }
-        this.queryOptions.setEnableScanNodeRunSerial(context.getSessionVariable().isEnableScanRunSerial());
         this.queryOptions.setFeProcessUuid(ExecuteEnv.getInstance().getProcessUUID());
-        this.queryOptions.setWaitFullBlockScheduleTimes(context.getSessionVariable().getWaitFullBlockScheduleTimes());
         this.queryOptions.setMysqlRowBinaryFormat(
                 context.getCommand() == MysqlCommand.COM_STMT_EXECUTE);
     }
diff --git a/fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java b/fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java
index 438afdaea53..94cf0cb3469 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java
@@ -1045,7 +1045,7 @@ public class SessionVariable implements Serializable, Writable {
 
     @VariableMgr.VarAttr(name = PARALLEL_SCAN_MIN_ROWS_PER_SCANNER, fuzzy = true,
             varType = VariableAnnotation.EXPERIMENTAL, needForward = true)
-    private long parallelScanMinRowsPerScanner = 2097152; // 16K
+    private long parallelScanMinRowsPerScanner = 2097152; // 2M
 
     @VariableMgr.VarAttr(name = IGNORE_STORAGE_DATA_DISTRIBUTION, fuzzy = false,
             varType = VariableAnnotation.EXPERIMENTAL, needForward = true)
@@ -3656,6 +3656,7 @@ public class SessionVariable implements Serializable, Writable {
 
         tResult.setTrimTailingSpacesForExternalTableQuery(trimTailingSpacesForExternalTableQuery);
         tResult.setEnableShareHashTableForBroadcastJoin(enableShareHashTableForBroadcastJoin);
         tResult.setEnableHashJoinEarlyStartProbe(enableHashJoinEarlyStartProbe);
+        tResult.setEnableScanNodeRunSerial(enableSc

(doris) branch master updated: [fix](function) fix Substring/SubReplace error result with input utf8 string (#40929)

2024-09-18 Thread lihaopeng

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new cee07d62ce6 [fix](function) fix Substring/SubReplace error result with input utf8 string (#40929)
cee07d62ce6 is described below

commit cee07d62ce6fcd9370f5789daec4571a09af41a4
Author: Mryange <59914473+mrya...@users.noreply.github.com>
AuthorDate: Thu Sep 19 09:29:31 2024 +0800

[fix](function) fix Substring/SubReplace error result with input utf8 string (#40929)

```
mysql [(none)]>select sub_replace("你好世界","a",1);
+---------------------------------+
| sub_replace('你好世界', 'a', 1) |
+---------------------------------+
| �a�好世界                       |
+---------------------------------+

mysql [(none)]>select SUBSTRING('中文测试',5);
+--------------------------------------+
| substring('中文测试', 5, 2147483647) |
+--------------------------------------+
| 中文测试                             |
+--------------------------------------+
1 row in set (0.04 sec)

now
mysql [(none)]>select sub_replace("你好世界","a",1);
+---------------------------------+
| sub_replace('你好世界', 'a', 1) |
+---------------------------------+
| 你a世界                         |
+---------------------------------+
1 row in set (0.05 sec)

mysql [(none)]>select SUBSTRING('中文测试',5);
+--------------------------------------+
| substring('中文测试', 5, 2147483647) |
+--------------------------------------+
|                                      |
+--------------------------------------+
1 row in set (0.13 sec)
```
---
 be/src/vec/functions/function_string.h | 132 +++--
 .../string_functions/test_string_function.out  |  60 ++
 .../string_functions/test_string_function.out  | Bin 4590 -> 4838 bytes
 .../string_functions/test_string_function.groovy   |  23 
 .../string_functions/test_string_function.groovy   |  10 ++
 5 files changed, 188 insertions(+), 37 deletions(-)

diff --git a/be/src/vec/functions/function_string.h b/be/src/vec/functions/function_string.h
index 53c300f50aa..4ae8cbf5ff2 100644
--- a/be/src/vec/functions/function_string.h
+++ b/be/src/vec/functions/function_string.h
@@ -242,9 +242,11 @@ private:
         const char* str_data = (char*)chars.data() + offsets[i - 1];
         int start_value = is_const ? start[0] : start[i];
         int len_value = is_const ? len[0] : len[i];
-
+        // Unsigned numbers cannot be used here because start_value can be negative.
+        int char_len = simd::VStringFunctions::get_char_len(str_data, str_size);
         // return empty string if start > src.length
-        if (start_value > str_size || str_size == 0 || start_value == 0 || len_value <= 0) {
+        // Here, start_value is compared against the length of the character.
+        if (start_value > char_len || str_size == 0 || start_value == 0 || len_value <= 0) {
             StringOP::push_empty_string(i, res_chars, res_offsets);
             continue;
         }
@@ -3386,8 +3388,6 @@ public:
 return get_variadic_argument_types_impl().size();
 }
 
-    bool use_default_implementation_for_nulls() const override { return false; }
-
     Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
                         size_t result, size_t input_rows_count) const override {
         return Impl::execute_impl(context, block, arguments, result, input_rows_count);
@@ -3398,59 +3398,116 @@ struct SubReplaceImpl {
 static Status replace_execute(Block& block, const ColumnNumbers& arguments, size_t result,
   size_t input_rows_count) {
 auto res_column = ColumnString::create();
-auto result_column = assert_cast(res_column.get());
+auto* result_column = assert_cast(res_column.get());
 auto args_null_map = ColumnUInt8::create(input_rows_count, 0);
 ColumnPtr argument_columns[4];
+bool col_const[4];
 for (int i = 0; i < 4; ++i) {
-argument_columns[i] =
-block.get_by_position(arguments[i]).column->convert_to_full_column_if_const();
-if (auto* nullable = check_and_get_column(*argument_columns[i])) {
-// Danger: Here must dispose the null map data first! Because
-// argument_columns[i]=nullable->get_nested_column_ptr(); will release the mem
-// of column

(doris) branch branch-3.0 updated: [opt](in expr) Optimize the IN expression by skipping constant column… (#40917)

2024-09-18 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new a6a008a5491 [opt](in expr) Optimize the IN expression by skipping 
constant column… (#40917)
a6a008a5491 is described below

commit a6a008a5491d619b79c1147e850e11a865000594
Author: Mryange <59914473+mrya...@users.noreply.github.com>
AuthorDate: Wed Sep 18 19:42:05 2024 +0800

[opt](in expr) Optimize the IN expression by skipping constant column… 
(#40917)

…s. (#39912)
https://github.com/apache/doris/pull/39912

Optimize the IN expression by skipping constant columns
---
 be/src/vec/exprs/vin_predicate.cpp | 19 ---
 be/src/vec/exprs/vin_predicate.h   |  3 +++
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/be/src/vec/exprs/vin_predicate.cpp 
b/be/src/vec/exprs/vin_predicate.cpp
index 1411254a2ca..e4a4969b00e 100644
--- a/be/src/vec/exprs/vin_predicate.cpp
+++ b/be/src/vec/exprs/vin_predicate.cpp
@@ -93,10 +93,24 @@ Status VInPredicate::open(RuntimeState* state, VExprContext* context,
 if (scope == FunctionContext::FRAGMENT_LOCAL) {
 RETURN_IF_ERROR(VExpr::get_const_col(context, nullptr));
 }
+
+_is_args_all_constant = std::all_of(_children.begin() + 1, _children.end(),
+[](const VExprSPtr& expr) { return expr->is_constant(); });
 _open_finished = true;
 return Status::OK();
 }
 
+size_t VInPredicate::skip_constant_args_size() const {
+if (_is_args_all_constant && !_can_fast_execute) {
+// This is an optimization. For expressions like colA IN (1, 2, 3, 4),
+// where all values inside the IN clause are constants,
+// a hash set is created during open, and it will not be accessed again during execute.
+// Here, _children[0] is colA
+return 1;
+}
+return _children.size();
+}
+
 void VInPredicate::close(VExprContext* context, FunctionContext::FunctionStateScope scope) {
 VExpr::close_function_context(context, scope, _function);
 VExpr::close(context, scope);
@@ -115,9 +129,8 @@ Status VInPredicate::execute(VExprContext* context, Block* block, int* result_co
 return Status::OK();
 }
 DCHECK(_open_finished || _getting_const_col);
-// TODO: not execute const expr again, but use the const column in function context
-doris::vectorized::ColumnNumbers arguments(_children.size());
-for (int i = 0; i < _children.size(); ++i) {
+doris::vectorized::ColumnNumbers arguments(skip_constant_args_size());
+for (int i = 0; i < skip_constant_args_size(); ++i) {
 int column_id = -1;
 RETURN_IF_ERROR(_children[i]->execute(context, block, &column_id));
 arguments[i] = column_id;
diff --git a/be/src/vec/exprs/vin_predicate.h b/be/src/vec/exprs/vin_predicate.h
index 4d227510b91..1b640056284 100644
--- a/be/src/vec/exprs/vin_predicate.h
+++ b/be/src/vec/exprs/vin_predicate.h
@@ -51,6 +51,8 @@ public:
 
 std::string debug_string() const override;
 
+size_t skip_constant_args_size() const;
+
 const FunctionBasePtr function() { return _function; }
 
 bool is_not_in() const { return _is_not_in; };
@@ -62,5 +64,6 @@ private:
 
 const bool _is_not_in;
 static const constexpr char* function_name = "in";
+bool _is_args_all_constant = false;
 };
 } // namespace doris::vectorized
\ No newline at end of file


-
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org



(doris) branch master updated (7cca2523fae -> d5c24d348dd)

2024-09-18 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from 7cca2523fae [Chore][inverted index] remove duplicate null bitmap 
reader in function array index (#40907)
 add d5c24d348dd [fix](function) fix error result in split_by_string with 
utf8 chars (#40710)

No new revisions were added by this update.

Summary of changes:
 be/src/vec/functions/function_string.h | 83 --
 .../string_functions/test_split_by_string.out  |  4 ++
 .../string_functions/test_split_by_string.groovy   |  2 +
 3 files changed, 53 insertions(+), 36 deletions(-)





(doris) branch master updated: [opt](function) optimize from_unixtime/date_format by specially format str (#40821)

2024-09-17 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 32d4b08989a [opt](function) optimize from_unixtime/date_format by 
specially format str (#40821)
32d4b08989a is described below

commit 32d4b08989a4359342bdc796e1fabfbf422656c0
Author: Mryange <59914473+mrya...@users.noreply.github.com>
AuthorDate: Wed Sep 18 11:36:19 2024 +0800

[opt](function) optimize from_unixtime/date_format by specially format str 
(#40821)

```
mysql [test]>select count(date_format(a, 'MMdd')) from date_format_tmp;
+-------------------------------+
| count(date_format(a, 'MMdd')) |
+-------------------------------+
| 1600                          |
+-------------------------------+
1 row in set (0.53 sec)


mysql [test]>select count(date_format(a, 'MMdd')) from date_format_tmp;
+-------------------------------+
| count(date_format(a, 'MMdd')) |
+-------------------------------+
| 1600                          |
+-------------------------------+
1 row in set (0.28 sec)
```
---
 be/src/vec/functions/date_format_type.h| 156 +
 be/src/vec/functions/date_time_transforms.h| 104 +-
 .../functions/function_datetime_string_to_string.h | 140 ++
 .../datetime_functions/test_date_function.out  |  15 ++
 .../datetime_functions/test_date_function.groovy   |   9 +-
 5 files changed, 357 insertions(+), 67 deletions(-)

diff --git a/be/src/vec/functions/date_format_type.h 
b/be/src/vec/functions/date_format_type.h
new file mode 100644
index 000..071ecf44853
--- /dev/null
+++ b/be/src/vec/functions/date_format_type.h
@@ -0,0 +1,156 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include 
+
+#include "vec/common/string_ref.h"
+
+namespace doris::vectorized::time_format_type {
+// Used to optimize commonly used date formats.
+
+inline StringRef rewrite_specific_format(const char* raw_str, size_t str_size) {
+const static std::string specific_format_strs[3] = {"%Y%m%d", "%Y-%m-%d", "%Y-%m-%d %H:%i:%s"};
+const static std::string specific_format_rewrite[3] = {"MMdd", "-MM-dd", "-MM-dd HH:mm:ss"};
+for (int i = 0; i < 3; i++) {
+const StringRef specific_format {specific_format_strs[i].data(),
+ specific_format_strs[i].size()};
+if (specific_format == StringRef {raw_str, str_size}) {
+return {specific_format_rewrite[i].data(), specific_format_rewrite[i].size()};
+}
+}
+return {raw_str, str_size};
+}
+
+template 
+void put_year(T y, char* buf, int& i) {
+int t = y / 100;
+buf[i++] = t / 10 + '0';
+buf[i++] = t % 10 + '0';
+
+t = y % 100;
+buf[i++] = t / 10 + '0';
+buf[i++] = t % 10 + '0';
+}
+
+template 
+void put_other(T m, char* buf, int& i) {
+buf[i++] = m / 10 + '0';
+buf[i++] = m % 10 + '0';
+}
+
+// NoneImpl indicates that no specific optimization has been applied, and the general logic is used for processing.
+struct NoneImpl {};
+
+struct MMddImpl {
+template 
+size_t static date_to_str(const DateType& date_value, char* buf) {
+int i = 0;
+put_year(date_value.year(), buf, i);
+put_other(date_value.month(), buf, i);
+put_other(date_value.day(), buf, i);
+return i;
+}
+};
+
+struct _MM_ddImpl {
+template 
+size_t static date_to_str(const DateType& date_value, char* buf) {
+int i = 0;
+put_year(date_value.year(), buf, i);
+buf[i++] = '-';
+put_other(date_value.month(), buf, i);
+buf[i++] = '-';
+put_other(da

(doris) branch master updated (2ae4ff86439 -> 8c97aa16296)

2024-09-17 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from 2ae4ff86439 [opt](function) Optimize the concat(col, constant, 
constant, constant) function (#40670)
 add 8c97aa16296 [Bug](exchange) fix tablet sink shuffle without project 
not match the output tuple (#40299)

No new revisions were added by this update.

Summary of changes:
 be/src/pipeline/exec/exchange_sink_operator.cpp| 27 +-
 be/src/pipeline/exec/exchange_sink_operator.h  |  5 +++-
 .../glue/translator/PhysicalPlanTranslator.java|  3 +--
 .../plans/commands/insert/OlapInsertExecutor.java  |  1 +
 .../org/apache/doris/planner/DataStreamSink.java   | 10 
 gensrc/thrift/DataSinks.thrift |  1 +
 .../data/nereids_p0/insert_into_table/random.out   |  3 +++
 .../nereids_p0/insert_into_table/random.groovy | 11 +
 8 files changed, 57 insertions(+), 4 deletions(-)





(doris) branch master updated: [opt](function) Optimize the concat(col, constant, constant, constant) function (#40670)

2024-09-17 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 2ae4ff86439 [opt](function) Optimize the concat(col, constant, 
constant, constant) function (#40670)
2ae4ff86439 is described below

commit 2ae4ff86439e7532845963d470e1061ee62656ad
Author: Mryange <59914473+mrya...@users.noreply.github.com>
AuthorDate: Wed Sep 18 09:17:55 2024 +0800

[opt](function) Optimize the concat(col, constant, constant, constant) 
function (#40670)

```
mysql [test]>select count(concat(short, "123","121231","123123",'12312313')) from strings;
+-------------------------------------------------------------+
| count(concat(short, '123', '121231', '123123', '12312313')) |
+-------------------------------------------------------------+
| 1000                                                        |
+-------------------------------------------------------------+
1 row in set (0.52 sec)

mysql [test]>select count(concat(short, "123","121231","123123",'12312313' , short , short, short)) from strings;
+----------------------------------------------------------------------------------+
| count(concat(short, '123', '121231', '123123', '12312313', short, short, short)) |
+----------------------------------------------------------------------------------+
| 1000                                                                             |
+----------------------------------------------------------------------------------+
1 row in set (0.98 sec)



now

mysql [test]>select count(concat(short, "123","121231","123123",'12312313')) from strings;
+-------------------------------------------------------------+
| count(concat(short, '123', '121231', '123123', '12312313')) |
+-------------------------------------------------------------+
| 1000                                                        |
+-------------------------------------------------------------+
1 row in set (0.19 sec)

mysql [test]>select count(concat(short, "123","121231","123123",'12312313' , short , short, short)) from strings;
+----------------------------------------------------------------------------------+
| count(concat(short, '123', '121231', '123123', '12312313', short, short, short)) |
+----------------------------------------------------------------------------------+
| 1000                                                                             |
+----------------------------------------------------------------------------------+
1 row in set (0.71 sec)
```
---
 be/src/vec/functions/function_string.h  | 126 
 be/test/vec/core/column_string_test.cpp |  12 ++-
 2 files changed, 121 insertions(+), 17 deletions(-)

diff --git a/be/src/vec/functions/function_string.h 
b/be/src/vec/functions/function_string.h
index 160cc484a74..ef5122ac84d 100644
--- a/be/src/vec/functions/function_string.h
+++ b/be/src/vec/functions/function_string.h
@@ -1007,6 +1007,11 @@ public:
 
 class FunctionStringConcat : public IFunction {
 public:
+struct ConcatState {
+bool use_state = false;
+std::string tail;
+};
+
 static constexpr auto name = "concat";
 static FunctionPtr create() { return std::make_shared(); }
 String get_name() const override { return name; }
@@ -1017,6 +1022,40 @@ public:
 return std::make_shared();
 }
 
+Status open(FunctionContext* context, FunctionContext::FunctionStateScope scope) override {
+if (scope == FunctionContext::THREAD_LOCAL) {
+return Status::OK();
+}
+std::shared_ptr state = std::make_shared();
+
+context->set_function_state(scope, state);
+
+state->use_state = true;
+
+// Optimize function calls like this:
+// concat(col, "123", "abc", "456") -> tail = "123abc456"
+for (size_t i = 1; i < context->get_num_args(); i++) {
+const auto* column_string = context->get_constant_col(i);
+if (column_string == nullptr) {
+state->use_state = false;
+return IFunction::open(context, scope);
+}
+auto string_vale = column_string->column_ptr->get_data_at(0);
+if (string_vale.data == nullptr) {
+// For concat(col, null)

(doris) branch master updated: [opt](join)Support MethodOneString to optimize hash join with a single string key (#40559)

2024-09-11 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new b37148e2cb0 [opt](join)Support MethodOneString to optimize hash join 
with a single string key (#40559)
b37148e2cb0 is described below

commit b37148e2cb0d466fe70320ec2b6778e8bb067152
Author: Mryange <59914473+mrya...@users.noreply.github.com>
AuthorDate: Wed Sep 11 18:52:12 2024 +0800

[opt](join)Support MethodOneString to optimize hash join with a single 
string key (#40559)

```
mysql [test]>SELECT count() from hits_10m WHERE SearchPhrase IN (SELECT SearchPhrase from hits_10m);
+----------+
| count(*) |
+----------+
| 1000     |
+----------+
1 row in set (1.15 sec)

now
mysql [test]>SELECT count() from hits_10m WHERE SearchPhrase IN (SELECT SearchPhrase from hits_10m);
+----------+
| count(*) |
+----------+
| 1000     |
+----------+
1 row in set (0.66 sec)

```
---
 be/src/pipeline/common/join_utils.h|  4 +-
 be/src/pipeline/dependency.h   | 10 -
 be/src/pipeline/exec/hashjoin_build_sink.cpp   |  6 +++
 .../exec/join/process_hash_table_probe_impl.h  |  1 +
 be/src/vec/common/hash_table/hash_map_context.h| 48 +++---
 5 files changed, 51 insertions(+), 18 deletions(-)

diff --git a/be/src/pipeline/common/join_utils.h 
b/be/src/pipeline/common/join_utils.h
index cd3374995f7..7fcf669d42e 100644
--- a/be/src/pipeline/common/join_utils.h
+++ b/be/src/pipeline/common/join_utils.h
@@ -43,7 +43,7 @@ using I32HashTableContext = 
vectorized::PrimaryTypeHashTableContext;
 using I128HashTableContext = vectorized::PrimaryTypeHashTableContext;
 using I256HashTableContext = vectorized::PrimaryTypeHashTableContext;
-
+using MethodOneString = 
vectorized::MethodStringNoCache>;
 template 
 using I64FixedKeyHashTableContext = 
vectorized::FixedKeyHashTableContext;
 
@@ -63,6 +63,6 @@ using HashTableVariants =
  I64FixedKeyHashTableContext, 
I128FixedKeyHashTableContext,
  I128FixedKeyHashTableContext, 
I256FixedKeyHashTableContext,
  I256FixedKeyHashTableContext, 
I136FixedKeyHashTableContext,
- I136FixedKeyHashTableContext>;
+ I136FixedKeyHashTableContext, MethodOneString>;
 
 } // namespace doris::pipeline
diff --git a/be/src/pipeline/dependency.h b/be/src/pipeline/dependency.h
index e5738e48f93..863458d3bde 100644
--- a/be/src/pipeline/dependency.h
+++ b/be/src/pipeline/dependency.h
@@ -656,8 +656,8 @@ public:
 };
 
 using SetHashTableVariants =
-std::variant>,
+std::variant,
  
vectorized::SetPrimaryTypeHashTableContext,
  
vectorized::SetPrimaryTypeHashTableContext,
@@ -735,6 +735,12 @@ public:
 case TYPE_DATETIMEV2:
 
hash_table_variants->emplace>();
 break;
+case TYPE_CHAR:
+case TYPE_VARCHAR:
+case TYPE_STRING: {
+hash_table_variants->emplace();
+break;
+}
 case TYPE_LARGEINT:
 case TYPE_DECIMALV2:
 case TYPE_DECIMAL128I:
diff --git a/be/src/pipeline/exec/hashjoin_build_sink.cpp 
b/be/src/pipeline/exec/hashjoin_build_sink.cpp
index 0bee88ed537..8f7b176a979 100644
--- a/be/src/pipeline/exec/hashjoin_build_sink.cpp
+++ b/be/src/pipeline/exec/hashjoin_build_sink.cpp
@@ -377,6 +377,12 @@ void 
HashJoinBuildSinkLocalState::_hash_table_init(RuntimeState* state) {
 }
 break;
 }
+case TYPE_CHAR:
+case TYPE_VARCHAR:
+case TYPE_STRING: {
+
_shared_state->hash_table_variants->emplace();
+break;
+}
 default:
 _shared_state->hash_table_variants
 
->emplace();
diff --git a/be/src/pipeline/exec/join/process_hash_table_probe_impl.h 
b/be/src/pipeline/exec/join/process_hash_table_probe_impl.h
index 3ffdb9cb990..653cc8ab447 100644
--- a/be/src/pipeline/exec/join/process_hash_table_probe_impl.h
+++ b/be/src/pipeline/exec/join/process_hash_table_probe_impl.h
@@ -739,6 +739,7 @@ struct ExtractType {
 INSTANTIATION(JoinOpType, (I256FixedKeyHashTableContext)); \
 INSTANTIATION(JoinOpType, (I256FixedKeyHashTableContext));\
 INSTANTIATION(JoinOpType, (I136FixedKeyHashTableContext)); \
+INSTANTIATION(JoinOpType, (MethodOneString));\
 INSTANTIATION(JoinOpType, (I136FixedKeyHashTableContext));
 
 } // namespace doris::pipeline
diff --git a/be/src/vec/common/hash_

(doris) branch master updated: [feature-WIP](query cache) cache tablets aggregate result, BE part (#40171)

2024-09-10 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new e3b81e7e3fb [feature-WIP](query cache) cache tablets aggregate result, 
BE part (#40171)
e3b81e7e3fb is described below

commit e3b81e7e3fb20fb0be857650bfda45e67678b3af
Author: HappenLee 
AuthorDate: Wed Sep 11 14:28:06 2024 +0800

[feature-WIP](query cache) cache tablets aggregate result, BE part (#40171)

support cache tablets aggregate result

for example

SQL 1:
```sql
select key, sum(value)
from tbl
where dt between '2024-08-01' and '2024-08-10'
group by key
```

SQL 2:
```sql
select key, sum(value)
from tbl
where dt between '2024-08-5' and '2024-08-15'
group by key
```

SQL 1 populates the cache with per-tablet aggregate results for partitions
between '2024-08-01' and '2024-08-10'. SQL 2 then reuses the cached results
for partitions between '2024-08-05' and '2024-08-10', and only computes the
aggregate for partitions between '2024-08-11' and '2024-08-15'.

At present, only simple aggregates are supported: queries containing joins
with runtime filters are not cached.

# How to use

```sql
set enable_query_cache=true;
```
---
 be/src/common/config.cpp   |   2 +
 be/src/common/config.h |   3 +
 be/src/pipeline/dependency.h   |   7 +
 be/src/pipeline/exec/cache_sink_operator.cpp   |  73 +
 be/src/pipeline/exec/cache_sink_operator.h |  73 +
 be/src/pipeline/exec/cache_source_operator.cpp | 199 +
 be/src/pipeline/exec/cache_source_operator.h   | 104 +
 be/src/pipeline/exec/olap_scan_operator.cpp|  34 +++--
 be/src/pipeline/exec/olap_scan_operator.h  |   4 +-
 be/src/pipeline/exec/operator.cpp  |   7 +-
 be/src/pipeline/pipeline_fragment_context.cpp  |  80 --
 be/src/pipeline/query_cache/query_cache.cpp|  70 +
 be/src/pipeline/query_cache/query_cache.h  | 151 +++
 be/src/runtime/exec_env.h  |   7 +
 be/src/runtime/exec_env_init.cpp   |   9 +-
 be/src/runtime/memory/cache_policy.h   |   3 +
 be/src/runtime/memory/mem_tracker_limiter.cpp  |   2 +-
 be/src/vec/core/block.cpp  |   2 -
 18 files changed, 804 insertions(+), 26 deletions(-)

diff --git a/be/src/common/config.cpp b/be/src/common/config.cpp
index 0c00bd1a38f..df8240cb234 100644
--- a/be/src/common/config.cpp
+++ b/be/src/common/config.cpp
@@ -1327,6 +1327,8 @@ DEFINE_mInt32(lz4_compression_block_size, "262144");
 
 DEFINE_mBool(enable_pipeline_task_leakage_detect, "false");
 
+DEFINE_Int32(query_cache_size, "512");
+
 // clang-format off
 #ifdef BE_TEST
 // test s3
diff --git a/be/src/common/config.h b/be/src/common/config.h
index 720f4f72cb4..5bca9ac280a 100644
--- a/be/src/common/config.h
+++ b/be/src/common/config.h
@@ -1415,6 +1415,9 @@ DECLARE_mInt32(lz4_compression_block_size);
 
 DECLARE_mBool(enable_pipeline_task_leakage_detect);
 
+// MB
+DECLARE_Int32(query_cache_size);
+
 #ifdef BE_TEST
 // test s3
 DECLARE_String(test_s3_resource);
diff --git a/be/src/pipeline/dependency.h b/be/src/pipeline/dependency.h
index f7990c097ef..e5738e48f93 100644
--- a/be/src/pipeline/dependency.h
+++ b/be/src/pipeline/dependency.h
@@ -35,6 +35,7 @@
 #include "pipeline/exec/join/process_hash_table_probe.h"
 #include "vec/common/sort/partition_sorter.h"
 #include "vec/common/sort/sorter.h"
+#include "vec/core/block.h"
 #include "vec/core/types.h"
 #include "vec/spill/spill_stream.h"
 
@@ -541,6 +542,12 @@ public:
 const int _child_count;
 };
 
+struct CacheSharedState : public BasicSharedState {
+ENABLE_FACTORY_CREATOR(CacheSharedState)
+public:
+DataQueue data_queue;
+};
+
 class MultiCastDataStreamer;
 
 struct MultiCastSharedState : public BasicSharedState {
diff --git a/be/src/pipeline/exec/cache_sink_operator.cpp 
b/be/src/pipeline/exec/cache_sink_operator.cpp
new file mode 100644
index 000..b8b5b534659
--- /dev/null
+++ b/be/src/pipeline/exec/cache_sink_operator.cpp
@@ -0,0 +1,73 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/L

(doris) branch master updated: [case](array) fix unsort array case (#39941)

2024-08-31 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new dac10b761ca [case](array) fix unsort array case (#39941)
dac10b761ca is described below

commit dac10b761caece33203cb45893b7b0872eaf167b
Author: Mryange <59914473+mrya...@users.noreply.github.com>
AuthorDate: Sat Aug 31 16:45:30 2024 +0800

[case](array) fix unsort array case (#39941)
---
 regression-test/data/nereids_p0/aggregate/agg_nullable_2.out  | 2 +-
 regression-test/suites/nereids_p0/aggregate/agg_nullable_2.groovy | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/regression-test/data/nereids_p0/aggregate/agg_nullable_2.out 
b/regression-test/data/nereids_p0/aggregate/agg_nullable_2.out
index 37a9eba793c..a03834d 100644
--- a/regression-test/data/nereids_p0/aggregate/agg_nullable_2.out
+++ b/regression-test/data/nereids_p0/aggregate/agg_nullable_2.out
@@ -141,7 +141,7 @@
 [4, 5, 6]
 
 -- !select_group_array_intersect_n --
-[2, 1, 3]
+[1, 2, 3]
 
 -- !select_group_bit_and --
 50
diff --git a/regression-test/suites/nereids_p0/aggregate/agg_nullable_2.groovy 
b/regression-test/suites/nereids_p0/aggregate/agg_nullable_2.groovy
index 42c24815a9a..5337f59d015 100644
--- a/regression-test/suites/nereids_p0/aggregate/agg_nullable_2.groovy
+++ b/regression-test/suites/nereids_p0/aggregate/agg_nullable_2.groovy
@@ -331,19 +331,19 @@ suite("agg_nullable_2") {
 contains "colUniqueId=null, type=bigint, nullable=false"
 }
 
-qt_select_group_array_intersect """select group_array_intersect(kaint) from agg_nullable_test_2;"""
+qt_select_group_array_intersect """select array_sort(group_array_intersect(kaint)) from agg_nullable_test_2;"""
 explain {
 sql("verbose select group_array_intersect(kaint) from agg_nullable_test_2;")
 contains "colUniqueId=null, type=array, nullable=false"
 }
 
-qt_select_group_array_intersect2 """select group_array_intersect(kaint) from agg_nullable_test_2 group by id;"""
+qt_select_group_array_intersect2 """select array_sort(group_array_intersect(kaint)) from agg_nullable_test_2 group by id;"""
 explain {
 sql("verbose select group_array_intersect(kaint) from agg_nullable_test_2 group by id;")
 contains "colUniqueId=null, type=array, nullable=false"
 }
 
-qt_select_group_array_intersect_n """select group_array_intersect(knaint) from agg_nullable_test_2;"""
+qt_select_group_array_intersect_n """select array_sort(group_array_intersect(knaint)) from agg_nullable_test_2;"""
 explain {
 sql("verbose select group_array_intersect(knaint) from agg_nullable_test_2;")
 contains "colUniqueId=null, type=array, nullable=false"





(doris) branch branch-2.0 updated: [cherry-pick](branch-20) fix partition-topn calculate partition input rows have error (#39100) (#39581) (#40003)

2024-08-30 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch branch-2.0
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-2.0 by this push:
 new 5e82c2bb7b2 [cherry-pick](branch-20) fix partition-topn calculate 
partition input rows have error (#39100) (#39581) (#40003)
5e82c2bb7b2 is described below

commit 5e82c2bb7b239af6fe914b218ddecea233b020c1
Author: zhangstar333 <87313068+zhangstar...@users.noreply.github.com>
AuthorDate: Sat Aug 31 14:22:51 2024 +0800

[cherry-pick](branch-20) fix partition-topn calculate partition input rows 
have error (#39100) (#39581) (#40003)

cherry-pick from master
https://github.com/apache/doris/pull/39100
https://github.com/apache/doris/pull/39581
---
 be/src/vec/columns/column_impl.h | 6 --
 be/src/vec/exec/vpartition_sort_node.cpp | 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/be/src/vec/columns/column_impl.h b/be/src/vec/columns/column_impl.h
index 0d573b24fbd..64a73ac47fd 100644
--- a/be/src/vec/columns/column_impl.h
+++ b/be/src/vec/columns/column_impl.h
@@ -67,8 +67,10 @@ void IColumn::append_data_by_selector_impl(MutablePtr& res, const Selector& sele
 LOG(FATAL) << fmt::format("Size of selector: {}, is larger than size of column:{}",
   selector.size(), num_rows);
 }
-
-res->reserve(num_rows);
+// We want to insert selector.size() values from this column. The source
+// column may have num_rows == 4096 while only a single row needs inserting,
+// so calling res->reserve(num_rows) would waste far too much memory.
+res->reserve(res->size() + selector.size());
 
 for (size_t i = 0; i < selector.size(); ++i)
 static_cast(*res).insert_from(*this, selector[i]);
diff --git a/be/src/vec/exec/vpartition_sort_node.cpp 
b/be/src/vec/exec/vpartition_sort_node.cpp
index 09d3d1be1de..e82cb37bb6b 100644
--- a/be/src/vec/exec/vpartition_sort_node.cpp
+++ b/be/src/vec/exec/vpartition_sort_node.cpp
@@ -172,7 +172,6 @@ void VPartitionSortNode::_emplace_into_hash_table(const ColumnRawPtrs& key_colum
 Status VPartitionSortNode::sink(RuntimeState* state, vectorized::Block* input_block, bool eos) {
 auto current_rows = input_block->rows();
 if (current_rows > 0) {
-child_input_rows = child_input_rows + current_rows;
 if (UNLIKELY(_partition_exprs_num == 0)) {
 if (UNLIKELY(_value_places.empty())) {
 _value_places.push_back(_pool->add(new PartitionBlocks()));
@@ -192,6 +191,7 @@ Status VPartitionSortNode::sink(RuntimeState* state, vectorized::Block* input_bl
 RETURN_IF_ERROR(
 state->check_query_state("VPartitionSortNode, while split input block."));
 input_block->clear_column_data();
+child_input_rows = child_input_rows + current_rows;
 }
 }
 }





(doris) branch master updated: [Enhancement](DDL) check illegal partition exprs (#40158)

2024-08-30 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 3ff7ad3a576 [Enhancement](DDL) check illegal partition exprs (#40158)
3ff7ad3a576 is described below

commit 3ff7ad3a576ae90a6838f1e71fe8d78c463f013f
Author: zclllhhjj 
AuthorDate: Sat Aug 31 14:08:15 2024 +0800

[Enhancement](DDL) check illegal partition exprs (#40158)

before:
```sql
mysql> CREATE TABLE not_auto_expr (
-> `TIME_STAMP` date NOT NULL
-> )
-> partition by range (date_trunc(`TIME_STAMP`, 'day'))()
-> DISTRIBUTED BY HASH(`TIME_STAMP`) BUCKETS 10
-> PROPERTIES (
-> "replication_allocation" = "tag.location.default: 1"
-> );
Query OK, 0 rows affected (0.14 sec)
```
now:
```sql
mysql> CREATE TABLE not_auto_expr (
-> `TIME_STAMP` date NOT NULL
-> )
-> partition by range (date_trunc(`TIME_STAMP`, 'day'))()
-> DISTRIBUTED BY HASH(`TIME_STAMP`) BUCKETS 10
-> PROPERTIES (
-> "replication_allocation" = "tag.location.default: 1"
-> );
ERROR 1105 (HY000): errCode = 2, detailMessage = errCode = 2, detailMessage 
= Non-auto partition table not support partition expr!
```
---
 .../apache/doris/nereids/parser/PartitionTableInfo.java|  9 +
 .../auto_partition/test_auto_partition_behavior.groovy | 14 ++
 2 files changed, 23 insertions(+)

diff --git 
a/fe/fe-core/src/main/java/org/apache/doris/nereids/parser/PartitionTableInfo.java
 
b/fe/fe-core/src/main/java/org/apache/doris/nereids/parser/PartitionTableInfo.java
index a68ddcdf87a..e9f7fdcfee3 100644
--- 
a/fe/fe-core/src/main/java/org/apache/doris/nereids/parser/PartitionTableInfo.java
+++ 
b/fe/fe-core/src/main/java/org/apache/doris/nereids/parser/PartitionTableInfo.java
@@ -28,6 +28,7 @@ import org.apache.doris.analysis.SlotRef;
 import org.apache.doris.analysis.StringLiteral;
 import org.apache.doris.catalog.AggregateType;
 import org.apache.doris.catalog.PartitionType;
+import org.apache.doris.common.DdlException;
 import org.apache.doris.nereids.analyzer.UnboundFunction;
 import org.apache.doris.nereids.analyzer.UnboundSlot;
 import org.apache.doris.nereids.exceptions.AnalysisException;
@@ -269,6 +270,14 @@ public class PartitionTableInfo {
 
 try {
 ArrayList exprs = convertToLegacyAutoPartitionExprs(partitionList);
+
+// only auto partition support partition expr
+if (!isAutoPartition) {
+if (exprs.stream().anyMatch(expr -> expr instanceof FunctionCallExpr)) {
+throw new DdlException("Non-auto partition table not support partition expr!");
+}
+}
+}
+}
+
 // here we have already extracted identifierPartitionColumns
 if (partitionType.equals(PartitionType.RANGE.name())) {
 if (isAutoPartition) {
diff --git 
a/regression-test/suites/partition_p0/auto_partition/test_auto_partition_behavior.groovy
 
b/regression-test/suites/partition_p0/auto_partition/test_auto_partition_behavior.groovy
index fb8be0b5510..e5ce52af31e 100644
--- 
a/regression-test/suites/partition_p0/auto_partition/test_auto_partition_behavior.groovy
+++ 
b/regression-test/suites/partition_p0/auto_partition/test_auto_partition_behavior.groovy
@@ -407,4 +407,18 @@ suite("test_auto_partition_behavior") {
 sql """ insert into test_change values ("20001212"); """
 part_result = sql " show tablets from test_change "
 assertEquals(part_result.size, 52 * replicaNum)
+
+test {
+sql """
+CREATE TABLE not_auto_expr (
+`TIME_STAMP` date NOT NULL
+)
+partition by range (date_trunc(`TIME_STAMP`, 'day'))()
+DISTRIBUTED BY HASH(`TIME_STAMP`) BUCKETS 10
+PROPERTIES (
+"replication_allocation" = "tag.location.default: 1"
+);
+"""
+exception "Non-auto partition table not support partition expr!"
+}
 }
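
The new check in `PartitionTableInfo.java` can be sketched as follows — a hedged Python illustration of the Java logic, not Doris code; the class names only mirror the FE expression types:

```python
# Illustration (not Doris code): a non-auto-partition table must not use a
# function-call expression such as date_trunc(`TIME_STAMP`, 'day') in its
# partition clause; only auto partition supports partition expressions.

class Expr: ...
class SlotRef(Expr): ...            # plain column reference
class FunctionCallExpr(Expr): ...   # e.g. date_trunc(...)

def validate_partition_exprs(exprs, is_auto_partition):
    # mirrors the added check: reject any FunctionCallExpr unless auto partition
    if not is_auto_partition and any(isinstance(e, FunctionCallExpr) for e in exprs):
        raise ValueError("Non-auto partition table not support partition expr!")
```

The regression test above exercises exactly this path: a `range (date_trunc(...))` clause without `AUTO` must fail with that message.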


-
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org



(doris) branch master updated (f852385ba3a -> ce25a19c472)

2024-08-30 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from f852385ba3a [fix](auth)fix case should grant after create view (#40108)
 add ce25a19c472 [feature](profile)Enable merging of incomplete profiles. 
(#39560)

No new revisions were added by this update.

Summary of changes:
 .../doris/common/profile/ExecutionProfile.java | 102 +++--
 1 file changed, 74 insertions(+), 28 deletions(-)





(doris) branch 2.0.13-tebu updated (9f840dc1aa2 -> d01c9d27211)

2024-08-28 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch 2.0.13-tebu
in repository https://gitbox.apache.org/repos/asf/doris.git


from 9f840dc1aa2 Revert "Add log"
 add d01c9d27211 [exec](pipeline) enable pipeline dml in coordinator 
(#40085)

No new revisions were added by this update.

Summary of changes:
 fe/fe-core/src/main/java/org/apache/doris/qe/Coordinator.java | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)





(doris) branch master updated: [fix](json) fix parsing fails when json key string is empty. (#39937)

2024-08-27 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 62990d87276 [fix](json) fix parsing fails when json key string is 
empty. (#39937)
62990d87276 is described below

commit 62990d872762d2ab9f164460a8a93a5caf0d9fa7
Author: Mryange <59914473+mrya...@users.noreply.github.com>
AuthorDate: Tue Aug 27 16:58:11 2024 +0800

[fix](json) fix parsing fails when json key string is empty. (#39937)

This is a very subtle bug that can only be reproduced in a non-AVX2
environment.
```
// USE_AVX2 = false
mysql [(none)]>select cast('{"":1, " ":"v1"}' as json);
+--+
| cast('{"":1, " ":"v1"}' as JSON) |
+--+
| NULL |
+--+

//USE_AVX2 = true
mysql [(none)]>select cast('{"":1, " ":"v1"}' as json);
+--+
| cast('{"":1, " ":"v1"}' as JSON) |
+--+
| {"":1," ":"v1"}  |
+--+
```
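Per the JSON specification (RFC 8259), an object member name may be the empty string, which is why the `key_len == 0` rejection was wrong. A minimal Python illustration of the old versus fixed behavior (not Doris's C++ jsonb parser):

```python
import json

def parse_with_empty_key_check(s, reject_empty_keys):
    # reject_empty_keys=True models the old non-AVX2 code path, which
    # treated an empty key as a parse error and returned NULL.
    obj = json.loads(s)
    if reject_empty_keys and any(k == "" for k in obj):
        return None  # old behavior: cast(... as json) yields NULL
    return obj       # fixed behavior: {"":1, " ":"v1"} parses normally
```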
---
 be/src/util/jsonb_parser.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/be/src/util/jsonb_parser.h b/be/src/util/jsonb_parser.h
index 1591fa563c7..c90012a4fbe 100644
--- a/be/src/util/jsonb_parser.h
+++ b/be/src/util/jsonb_parser.h
@@ -369,8 +369,8 @@ private:
 key[key_len++] = ch;
 }
 }
-
-if (!in.good() || in.peek() != '"' || key_len == 0) {
+// The JSON key can be an empty string.
+if (!in.good() || in.peek() != '"') {
 if (key_len == JsonbKeyValue::sMaxKeyLen)
 err_ = JsonbErrType::E_INVALID_KEY_LENGTH;
 else





(doris) branch branch-2.0 updated: [exec](pipeline) enable pipeline dml in coordinator (#39365)

2024-08-26 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch branch-2.0
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-2.0 by this push:
 new ffb5950d9e8 [exec](pipeline) enable pipeline dml in coordinator 
(#39365)
ffb5950d9e8 is described below

commit ffb5950d9e8273dd38a22830e77b1a8b76576489
Author: HappenLee 
AuthorDate: Mon Aug 26 22:12:41 2024 +0800

[exec](pipeline) enable pipeline dml in coordinator (#39365)
---
 fe/fe-core/src/main/java/org/apache/doris/qe/Coordinator.java | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fe/fe-core/src/main/java/org/apache/doris/qe/Coordinator.java 
b/fe/fe-core/src/main/java/org/apache/doris/qe/Coordinator.java
index 869db1e7d89..85f4201eb1e 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/qe/Coordinator.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/qe/Coordinator.java
@@ -300,8 +300,7 @@ public class Coordinator implements CoordInterface {
 this.returnedAllResults = false;
 this.enableShareHashTableForBroadcastJoin = 
context.getSessionVariable().enableShareHashTableForBroadcastJoin;
 // Only enable pipeline query engine in query, not load
-this.enablePipelineEngine = 
context.getSessionVariable().getEnablePipelineEngine()
-&& (fragments.size() > 0 && fragments.get(0).getSink() 
instanceof ResultSink);
+this.enablePipelineEngine = 
context.getSessionVariable().getEnablePipelineEngine();
 
 this.fasterFloatConvert = 
context.getSessionVariable().fasterFloatConvert();
 





(doris) branch master updated (e2a97611911 -> 932297fdc38)

2024-08-22 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from e2a97611911 [opt](Nereids) forbid distribute under project and filter 
(#39812)
 add 932297fdc38 [Improvement](top-n) adjust the strategy for selecting the 
sort algorithm (#39780)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/doris/planner/SortNode.java | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)





(doris) branch branch-2.0 updated: [fix](function)timediff with now function causes an error signature

2024-08-22 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch branch-2.0
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-2.0 by this push:
 new d7b39daba53 [fix](function)timediff with now function causes an error signature
d7b39daba53 is described below

commit d7b39daba5350021ab3bdad0f906248d323c4567
Author: Mryange <59914473+mrya...@users.noreply.github.com>
AuthorDate: Fri Aug 23 14:28:33 2024 +0800

[fix](function)timediff with now function causes an error signature

https://github.com/apache/doris/pull/39322
The derivation of precision for the datetime constant in version 2.0 is
incorrect; it tends to be derived as the maximum precision.

```
mysql [(none)]>select round(timediff(now(),'2024-08-15')/60/60,2);
ERROR 1105 (HY000): errCode = 2, detailMessage = argument 1 requires 
datetimev2 type, however 'now()' is of datetime type
```
The reason is that the function parameter types were modified in
expectedInputTypes, which led to no matching signature being found. This
code dates from a time when the precision of datetimev2 could not be
deduced, so a separate implementation was added here; it can now be
safely deleted.
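
The scale derivation that makes the old override unnecessary can be sketched like this — a hedged Python illustration of what `DateTimeV2Type.forTypeFromString` does conceptually, not the actual FE code:

```python
# Illustration (not Doris code): the scale of a datetime string literal is
# the number of fractional-second digits, capped at datetimev2's maximum
# of 6 — so '2024-08-15' gets scale 0 rather than the maximum precision.
def derive_scale(literal: str) -> int:
    _head, dot, frac = literal.partition(".")
    return min(len(frac), 6) if dot else 0
```

With this derivation, `timediff(now(), '2024-08-15')` matches a signature again instead of failing with "argument 1 requires datetimev2 type".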
---
 .../trees/expressions/functions/scalar/TimeDiff.java   | 18 --
 .../data/correctness/test_time_diff_microseconds.out   |  3 +++
 .../data/correctness_p0/test_char_implicit_cast.out|  4 ++--
 .../data/nereids_p0/test_char_implicit_cast.out|  4 ++--
 .../data/query_p0/test_char_implicit_cast.out  |  4 ++--
 .../correctness/test_time_diff_microseconds.groovy |  8 
 6 files changed, 17 insertions(+), 24 deletions(-)

diff --git 
a/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/TimeDiff.java
 
b/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/TimeDiff.java
index 2cff7c8886b..997475ca40e 100644
--- 
a/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/TimeDiff.java
+++ 
b/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/TimeDiff.java
@@ -21,7 +21,6 @@ import org.apache.doris.catalog.FunctionSignature;
 import org.apache.doris.nereids.trees.expressions.Expression;
 import 
org.apache.doris.nereids.trees.expressions.functions.ExplicitlyCastableSignature;
 import 
org.apache.doris.nereids.trees.expressions.functions.PropagateNullableOnDateLikeV2Args;
-import org.apache.doris.nereids.trees.expressions.literal.StringLikeLiteral;
 import org.apache.doris.nereids.trees.expressions.shape.BinaryExpression;
 import org.apache.doris.nereids.trees.expressions.visitor.ExpressionVisitor;
 import org.apache.doris.nereids.types.DateTimeType;
@@ -29,7 +28,6 @@ import org.apache.doris.nereids.types.DateTimeV2Type;
 import org.apache.doris.nereids.types.DateV2Type;
 import org.apache.doris.nereids.types.TimeType;
 import org.apache.doris.nereids.types.TimeV2Type;
-import org.apache.doris.nereids.types.coercion.AbstractDataType;
 
 import com.google.common.base.Preconditions;
 import com.google.common.collect.ImmutableList;
@@ -96,20 +94,4 @@ public class TimeDiff extends ScalarFunction
 }
 return signature;
 }
-
-@Override
-public List expectedInputTypes() {
-FunctionSignature signature = getSignature();
-if (getArgument(0) instanceof StringLikeLiteral) {
-StringLikeLiteral str = (StringLikeLiteral) getArgument(0);
-DateTimeV2Type left = 
DateTimeV2Type.forTypeFromString(str.getStringValue());
-signature = signature.withArgumentType(0, left);
-}
-if (getArgument(1) instanceof StringLikeLiteral) {
-StringLikeLiteral str = (StringLikeLiteral) getArgument(1);
-DateTimeV2Type right = 
DateTimeV2Type.forTypeFromString(str.getStringValue());
-signature = signature.withArgumentType(1, right);
-}
-return signature.argumentsTypes;
-}
 }
diff --git a/regression-test/data/correctness/test_time_diff_microseconds.out 
b/regression-test/data/correctness/test_time_diff_microseconds.out
index dbeeb067f26..a04370f8139 100644
--- a/regression-test/data/correctness/test_time_diff_microseconds.out
+++ b/regression-test/data/correctness/test_time_diff_microseconds.out
@@ -27,3 +27,6 @@
 -- !select8 --
 48:00:00.11500
 
+-- !select9 --
+67:19:00.123000
+
diff --git a/regression-test/data/correctness_p0/test_char_implicit_cast.out 
b/regression-test/data/correctness_p0/test_char_implicit_cast.out
index 59f5d47377e..3dcd2252594 100644
--- a/regression-test/data/correctness_p0/test_char_implicit_cast.out
+++ b/regression-test/data/correctness_p0/test_char_implicit_cast.out
@@ -6,10 +6,10 @@
 7
 
 -- !test_timediff_varchar --
--24:00:00
+-24:00:00.00
 
 -- !test_timedi

(doris) branch master updated: [fix](arrow-flight-sql) Fix exceed user property max connection cause `Reach limit of connections` (#39127)

2024-08-22 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 7a066e68365 [fix](arrow-flight-sql) Fix exceed user property max 
connection cause `Reach limit of connections` (#39127)
7a066e68365 is described below

commit 7a066e68365839c6c0d73bd789acc04cd787322c
Author: Xinyi Zou 
AuthorDate: Fri Aug 23 10:07:57 2024 +0800

[fix](arrow-flight-sql) Fix exceed user property max connection cause 
`Reach limit of connections` (#39127)

Limit the number of arrow flight connections for a single user to less
than the user property max_user_connections / 2, default 50.
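
The per-user cap added in `FlightTokenManagerImpl` can be sketched as follows — a hedged Python illustration of the idea (the names are illustrative, not the actual Java API): each user keeps an LRU of bearer tokens, and inserting past the cap evicts the oldest token instead of failing later with `Reach limit of connections`.

```python
from collections import OrderedDict

class UserTokenLru:
    """Illustrative per-user token LRU; not Doris's FlightTokenManagerImpl."""

    def __init__(self, cap):
        self.cap = cap              # e.g. max_user_connections // 2, default 50
        self.tokens = OrderedDict() # insertion order = least to most recent

    def add(self, token):
        """Insert a token; return the evicted (least recent) token, if any."""
        evicted = None
        if len(self.tokens) >= self.cap:
            evicted, _ = self.tokens.popitem(last=False)  # pop oldest
        self.tokens[token] = True
        return evicted
```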
---
 .../service/arrowflight/DorisFlightSqlService.java |  4 ++
 .../arrowflight/tokens/FlightTokenManagerImpl.java | 53 +++---
 2 files changed, 51 insertions(+), 6 deletions(-)

diff --git 
a/fe/fe-core/src/main/java/org/apache/doris/service/arrowflight/DorisFlightSqlService.java
 
b/fe/fe-core/src/main/java/org/apache/doris/service/arrowflight/DorisFlightSqlService.java
index 85377788097..df9099c6816 100644
--- 
a/fe/fe-core/src/main/java/org/apache/doris/service/arrowflight/DorisFlightSqlService.java
+++ 
b/fe/fe-core/src/main/java/org/apache/doris/service/arrowflight/DorisFlightSqlService.java
@@ -57,6 +57,10 @@ public class DorisFlightSqlService {
 DorisFlightSqlProducer producer = new DorisFlightSqlProducer(location, 
flightSessionsManager);
 flightServer = FlightServer.builder(allocator, location, producer)
 .headerAuthenticator(new 
FlightBearerTokenAuthenticator(flightTokenManager)).build();
+LOG.info("Arrow Flight SQL service is created, port: {}, 
token_cache_size: {}"
++ ", qe_max_connection: {}, token_alive_time: {}",
+port, Config.arrow_flight_token_cache_size, 
Config.qe_max_connection,
+Config.arrow_flight_token_alive_time);
 }
 
 // start Arrow Flight SQL service, return true if success, otherwise false
diff --git 
a/fe/fe-core/src/main/java/org/apache/doris/service/arrowflight/tokens/FlightTokenManagerImpl.java
 
b/fe/fe-core/src/main/java/org/apache/doris/service/arrowflight/tokens/FlightTokenManagerImpl.java
index cd1b492de06..57101d995e0 100644
--- 
a/fe/fe-core/src/main/java/org/apache/doris/service/arrowflight/tokens/FlightTokenManagerImpl.java
+++ 
b/fe/fe-core/src/main/java/org/apache/doris/service/arrowflight/tokens/FlightTokenManagerImpl.java
@@ -19,6 +19,7 @@
 
 package org.apache.doris.service.arrowflight.tokens;
 
+import org.apache.doris.catalog.Env;
 import org.apache.doris.qe.ConnectContext;
 import org.apache.doris.service.ExecuteEnv;
 import org.apache.doris.service.arrowflight.auth2.FlightAuthResult;
@@ -31,9 +32,12 @@ import com.google.common.cache.RemovalListener;
 import com.google.common.cache.RemovalNotification;
 import org.apache.logging.log4j.LogManager;
 import org.apache.logging.log4j.Logger;
+import org.jetbrains.annotations.NotNull;
 
 import java.math.BigInteger;
 import java.security.SecureRandom;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ExecutionException;
 import java.util.concurrent.TimeUnit;
 
 /**
@@ -46,7 +50,9 @@ public class FlightTokenManagerImpl implements 
FlightTokenManager {
 private final int cacheSize;
 private final int cacheExpiration;
 
-private LoadingCache tokenCache;
+private final LoadingCache tokenCache;
+// >
+private final ConcurrentHashMap> 
usersTokenLRU = new ConcurrentHashMap<>();
 
 public FlightTokenManagerImpl(final int cacheSize, final int 
cacheExpiration) {
 this.cacheSize = cacheSize;
@@ -56,17 +62,19 @@ public class FlightTokenManagerImpl implements 
FlightTokenManager {
 .expireAfterWrite(cacheExpiration, TimeUnit.MINUTES)
 .removalListener(new RemovalListener() {
 @Override
-public void onRemoval(RemovalNotification notification) {
+public void onRemoval(@NotNull RemovalNotification notification) {
 // TODO: broadcast this message to other FE
-LOG.info("evict bearer token: " + 
notification.getKey() + ", reason: "
+String token = notification.getKey();
+FlightTokenDetails tokenDetails = 
notification.getValue();
+LOG.info("evict bearer token: " + token + ", reason: 
token number exceeded, "
 + notification.getCause());
 ConnectContext context = 
ExecuteEnv.getInstance().getScheduler()
-.getContext(notification.getKey());
+.getContext(token);
 if 

(doris) branch branch-3.0 updated: [fix](function) MicroSecondsSub without scale (#38945) (#39195)

2024-08-22 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new c3810238700 [fix](function) MicroSecondsSub without scale  (#38945) 
(#39195)
c3810238700 is described below

commit c381023870080d2823241d23cd8457555f0b0e5f
Author: Mryange <59914473+mrya...@users.noreply.github.com>
AuthorDate: Fri Aug 23 09:53:49 2024 +0800

[fix](function) MicroSecondsSub without scale  (#38945) (#39195)

https://github.com/apache/doris/pull/38945
Added the computeSignature function for millisecond/microsecond
calculation functions to generate parameters and return values with the
appropriate precision.
Modified the microSecondsAdd function, which was used for constant
folding, because constant folding uses the precision of the parameters
for calculation. However, for millisecond/microsecond calculations, it
is necessary to set the precision to the maximum to ensure correct
display.


before
```
mysql> SELECT MICROSECONDS_SUB('2010-11-30 23:50:50', 2);
+---+
| microseconds_sub(cast('2010-11-30 23:50:50' as DATETIMEV2(0)), 2) |
+---+
| 2010-11-30 23:50:49   |
+---+
```
now
```
mysql> SELECT MICROSECONDS_SUB('2010-11-30 23:50:50', 2);
+---+
| microseconds_sub(cast('2010-11-30 23:50:50' as DATETIMEV2(0)), 2) |
+---+
| 2010-11-30 23:50:49.98|
+---+
```
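
The display issue can be reproduced in miniature — a hedged Python sketch of the rendering difference, not the actual Doris cast path; the `scale` parameter is illustrative:

```python
import datetime

def microseconds_sub(dt, micros, scale):
    # Subtracting microseconds from a scale-0 datetime: unless the result
    # scale is forced to the maximum (6), the sub-second part is dropped.
    res = dt - datetime.timedelta(microseconds=micros)
    if scale == 0:
        return res.strftime("%Y-%m-%d %H:%M:%S")    # old: .999998 truncated
    return res.strftime("%Y-%m-%d %H:%M:%S.%f")     # fixed: full 6 digits
```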





(doris) branch branch-2.1 updated: [refine](pipeline) refine some VDataStreamRecvr code (#35063) (#37802)

2024-08-22 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch branch-2.1
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-2.1 by this push:
 new 04e993c1de8 [refine](pipeline) refine some VDataStreamRecvr code  
(#35063) (#37802)
04e993c1de8 is described below

commit 04e993c1de8802a3dbea44710e399c04b6aae5ff
Author: Mryange <59914473+mrya...@users.noreply.github.com>
AuthorDate: Thu Aug 22 19:55:17 2024 +0800

[refine](pipeline) refine some VDataStreamRecvr code  (#35063) (#37802)

## Proposed changes
https://github.com/apache/doris/pull/35063
https://github.com/apache/doris/pull/35428
---
 be/src/vec/runtime/vdata_stream_recvr.cpp | 60 ++-
 be/src/vec/runtime/vdata_stream_recvr.h   | 20 +--
 2 files changed, 46 insertions(+), 34 deletions(-)

diff --git a/be/src/vec/runtime/vdata_stream_recvr.cpp 
b/be/src/vec/runtime/vdata_stream_recvr.cpp
index cb483e986c8..912ecf53989 100644
--- a/be/src/vec/runtime/vdata_stream_recvr.cpp
+++ b/be/src/vec/runtime/vdata_stream_recvr.cpp
@@ -49,6 +49,7 @@ VDataStreamRecvr::SenderQueue::SenderQueue(VDataStreamRecvr* 
parent_recvr, int n
   _num_remaining_senders(num_senders),
   _received_first_batch(false) {
 _cancel_status = Status::OK();
+_queue_mem_tracker = std::make_unique("local data queue mem 
tracker");
 }
 
 VDataStreamRecvr::SenderQueue::~SenderQueue() {
@@ -98,17 +99,14 @@ Status 
VDataStreamRecvr::SenderQueue::_inner_get_batch_without_lock(Block* block
 
 DCHECK(!_block_queue.empty());
 auto [next_block, block_byte_size] = std::move(_block_queue.front());
-update_blocks_memory_usage(-block_byte_size);
 _block_queue.pop_front();
+sub_blocks_memory_usage(block_byte_size);
 _record_debug_info();
 if (_block_queue.empty() && _source_dependency) {
 if (!_is_cancelled && _num_remaining_senders > 0) {
 _source_dependency->block();
 }
 }
-if (_local_channel_dependency) {
-_local_channel_dependency->set_ready();
-}
 
 if (!_pending_closures.empty()) {
 auto closure_pair = _pending_closures.front();
@@ -136,9 +134,6 @@ void 
VDataStreamRecvr::SenderQueue::try_set_dep_ready_without_lock() {
 Status VDataStreamRecvr::SenderQueue::add_block(const PBlock& pblock, int 
be_number,
 int64_t packet_seq,
 ::google::protobuf::Closure** 
done) {
-const auto pblock_byte_size = pblock.ByteSizeLong();
-COUNTER_UPDATE(_recvr->_bytes_received_counter, pblock_byte_size);
-
 {
 std::lock_guard l(_lock);
 if (_is_cancelled) {
@@ -191,6 +186,7 @@ Status VDataStreamRecvr::SenderQueue::add_block(const 
PBlock& pblock, int be_num
 COUNTER_UPDATE(_recvr->_blocks_produced_counter, 1);
 
 _block_queue.emplace_back(std::move(block), block_byte_size);
+COUNTER_UPDATE(_recvr->_remote_bytes_received_counter, block_byte_size);
 _record_debug_info();
 try_set_dep_ready_without_lock();
 
@@ -202,7 +198,7 @@ Status VDataStreamRecvr::SenderQueue::add_block(const 
PBlock& pblock, int be_num
 _pending_closures.emplace_back(*done, monotonicStopWatch);
 *done = nullptr;
 }
-update_blocks_memory_usage(block_byte_size);
+add_blocks_memory_usage(block_byte_size);
 _data_arrival_cv.notify_one();
 return Status::OK();
 }
@@ -216,7 +212,6 @@ void VDataStreamRecvr::SenderQueue::add_block(Block* block, 
bool use_move) {
 }
 }
 
-auto block_bytes_received = block->bytes();
 // Has to use unique ptr here, because clone column may failed if allocate 
memory failed.
 BlockUPtr nblock = 
Block::create_unique(block->get_columns_with_type_and_name());
 
@@ -236,11 +231,11 @@ void VDataStreamRecvr::SenderQueue::add_block(Block* 
block, bool use_move) {
 if (_is_cancelled) {
 return;
 }
-COUNTER_UPDATE(_recvr->_local_bytes_received_counter, 
block_bytes_received);
 COUNTER_UPDATE(_recvr->_rows_produced_counter, rows);
 COUNTER_UPDATE(_recvr->_blocks_produced_counter, 1);
 
 _block_queue.emplace_back(std::move(nblock), block_mem_size);
+COUNTER_UPDATE(_recvr->_local_bytes_received_counter, block_mem_size);
 _record_debug_info();
 try_set_dep_ready_without_lock();
 _data_arrival_cv.notify_one();
@@ -249,7 +244,7 @@ void VDataStreamRecvr::SenderQueue::add_block(Block* block, 
bool use_move) {
 // should be done before the following logic, because the _lock will be 
released
 // by `iter->second->wait(l)`, after `iter->second->wait(l)` returns, 
_recvr may
 // have been closed and resouces in _recvr are released;
-update_blocks_memory_usage(block_mem_size);
+add_blocks_memory_usage(bloc

(doris-website) branch master updated: [Doc] update fragment mgr aynic work pool config (#1024)

2024-08-21 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
 new 43c3c3bf4a [Doc] update fragment mgr aynic work pool config (#1024)
43c3c3bf4a is described below

commit 43c3c3bf4a4f7db2551cbaa350a98fd60a45037f
Author: Pxl 
AuthorDate: Thu Aug 22 13:44:50 2024 +0800

[Doc] update fragment mgr aynic work pool config (#1024)
---
 docs/admin-manual/config/be-config.md| 16 
 .../current/admin-manual/config/be-config.md | 16 
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/docs/admin-manual/config/be-config.md 
b/docs/admin-manual/config/be-config.md
index 4a08abb464..e86d0a2698 100644
--- a/docs/admin-manual/config/be-config.md
+++ b/docs/admin-manual/config/be-config.md
@@ -334,20 +334,20 @@ The maximum size of a (received) message of the thrift 
server, in bytes. If the
 
 ### Query
 
- `fragment_pool_queue_size`
+ `fragment_mgr_asynic_work_pool_queue_size`
 
-* Description: The upper limit of query requests that can be processed on a 
single node
+* Description: The upper limit of asynic work that can be processed on a 
single node
 * Default value: 4096
 
- `fragment_pool_thread_num_min`
+ `fragment_mgr_asynic_work_pool_thread_num_min`
 
-* Description: Query the number of threads. By default, the minimum number of 
threads is 64.
-* Default value: 64
+* Description: Number of threads to excute asynic work. By default, the 
minimum number of threads is 16.
+* Default value: 16
 
- `fragment_pool_thread_num_max`
+ `fragment_mgr_asynic_work_pool_thread_num_max`
 
-* Description: Follow up query requests create threads dynamically, with a 
maximum of 512 threads created.
-* Default value: 2048
+* Description: Follow up asynic work create threads dynamically, with a 
maximum of 512 threads created.
+* Default value: 512
 
  `doris_max_scan_key_num`
 
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/config/be-config.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/config/be-config.md
index 7a01d7a5ff..b1417b9acb 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/config/be-config.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/config/be-config.md
@@ -345,20 +345,20 @@ Thrift 服务器接收请求消息的大小(字节数)上限。如果客户
 
 ### 查询
 
- `fragment_pool_queue_size`
+ `fragment_mgr_asynic_work_pool_queue_size`
 
-* 描述:单节点上能够处理的查询请求上限
+* 描述:单节点上异步任务的队列上限
 * 默认值:4096
 
- `fragment_pool_thread_num_min`
+ `fragment_mgr_asynic_work_pool_thread_num_min`
 
-* 描述:查询线程数,默认最小启动 64 个线程。
-* 默认值:64
+* 描述:处理异步任务的线程数,默认最小启动 16 个线程。
+* 默认值:16
 
- `fragment_pool_thread_num_max`
+ `fragment_mgr_asynic_work_pool_thread_num_max`
 
-* 描述:后续查询请求动态创建线程,最大创建 512 个线程。
-* 默认值:2048
+* 描述:根据后续任务动态创建线程,最大创建 512 个线程。
+* 默认值:512
 
  `doris_max_scan_key_num`
 





(doris) branch master updated: [Improvement](sort) add session variable force_sort_algorithm and adjust some parameter about sort (#39334)

2024-08-16 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 7372c99eca1 [Improvement](sort) add session variable 
force_sort_algorithm and adjust some parameter about sort (#39334)
7372c99eca1 is described below

commit 7372c99eca1ce79bae320c0c2caba85bb9e88355
Author: Pxl 
AuthorDate: Fri Aug 16 15:13:30 2024 +0800

[Improvement](sort) add session variable force_sort_algorithm and adjust 
some parameter about sort (#39334)

1. add force_sort_algorithm to set sort algorithm
2. do not use partial sort on string columns
```sql
select count(*) from (select lo_orderpriority from lineorder order by 
lo_orderpriority limit 10)t;
partition sort: 22s
pdq sort: 8s
```
4. enlarge topn_opt_limit_threshold to 1024
```sql
select count(*) from (select * from lineorder order by lo_linenumber limit 
10)t;
heap 1s
set topn_opt_limit_threshold=1024; heap  0.4s

select count(*) from (select * from lineorder order by lo_linenumber limit 
1000)t;
heap 13s
set topn_opt_limit_threshold=1024; heap 12s

select count(*) from (select * from lineorder order by lo_linenumber limit 
1)t;
heap 2min13s
set topn_opt_limit_threshold=10240;  heap 2 min 22.56 sec

select count(*) from (select lo_orderpriority from lineorder order by 
lo_orderpriority limit 10)t;
heap 2.4s
set topn_opt_limit_threshold=10240;  heap 1s

select count(*) from (select lo_orderpriority from lineorder order by 
lo_orderpriority limit 1000)t;
heap 21s
set topn_opt_limit_threshold=10240; heap 20s

```
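
The selection policy described above can be sketched as follows — a hedged Python simplification of the Java logic in `SortNode.toThrift`; the real code also honors the `force_sort_algorithm` session variable, two-phase read, and runtime predicates:

```python
# Illustration (not Doris code) of the sort-algorithm choice in the diff:
# heap sort for small top-n over fixed-width keys, top-n sort for small
# limits over strings/collections, full pdqsort otherwise.
def choose_sort_algorithm(limit, offset, fixed_length_keys):
    if limit > 0 and limit + offset < 1024 and fixed_length_keys:
        return "HEAP_SORT"
    if limit > 0 and not fixed_length_keys and limit + offset < 256:
        return "TOPN_SORT"
    return "FULL_SORT"
```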
---
 be/src/vec/columns/column_string.cpp   | 17 +++--
 .../java/org/apache/doris/planner/SortNode.java| 28 +++---
 .../java/org/apache/doris/qe/SessionVariable.java  |  9 ++-
 3 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/be/src/vec/columns/column_string.cpp 
b/be/src/vec/columns/column_string.cpp
index c3cf6dadf0a..d8fd42e36c7 100644
--- a/be/src/vec/columns/column_string.cpp
+++ b/be/src/vec/columns/column_string.cpp
@@ -483,21 +483,10 @@ void ColumnStr::get_permutation(bool reverse, size_t 
limit, int /*nan_directi
 res[i] = i;
 }
 
-// std::partial_sort need limit << s can get performance benefit
-if (limit > (s / 8.0)) limit = 0;
-
-if (limit) {
-if (reverse) {
-std::partial_sort(res.begin(), res.begin() + limit, res.end(), 
less(*this));
-} else {
-std::partial_sort(res.begin(), res.begin() + limit, res.end(), 
less(*this));
-}
+if (reverse) {
+pdqsort(res.begin(), res.end(), less(*this));
 } else {
-if (reverse) {
-pdqsort(res.begin(), res.end(), less(*this));
-} else {
-pdqsort(res.begin(), res.end(), less(*this));
-}
+pdqsort(res.begin(), res.end(), less(*this));
 }
 }
 
diff --git a/fe/fe-core/src/main/java/org/apache/doris/planner/SortNode.java 
b/fe/fe-core/src/main/java/org/apache/doris/planner/SortNode.java
index 5a8f9f628f8..34cbcf9f620 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/planner/SortNode.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/planner/SortNode.java
@@ -29,6 +29,7 @@ import org.apache.doris.analysis.SlotRef;
 import org.apache.doris.analysis.SortInfo;
 import org.apache.doris.common.NotImplementedException;
 import org.apache.doris.common.UserException;
+import org.apache.doris.qe.ConnectContext;
 import org.apache.doris.statistics.StatisticalType;
 import org.apache.doris.statistics.StatsRecursiveDerive;
 import org.apache.doris.thrift.TExplainLevel;
@@ -339,16 +340,27 @@ public class SortNode extends PlanNode {
 msg.sort_node.setIsAnalyticSort(isAnalyticSort);
 msg.sort_node.setIsColocate(isColocate);
 
-boolean isFixedLength = info.getOrderingExprs().stream().allMatch(e -> 
!e.getType().isStringType()
-&& !e.getType().isCollectionType());
+boolean isFixedLength = info.getOrderingExprs().stream()
+.allMatch(e -> !e.getType().isStringType() && 
!e.getType().isCollectionType());
+ConnectContext connectContext = ConnectContext.get();
 TSortAlgorithm algorithm;
-if (limit > 0 && limit + offset < 1024 && (useTwoPhaseReadOpt || 
hasRuntimePredicate
-|| isFixedLength)) {
-algorithm = TSortAlgorithm.HEAP_SORT;
-} else if (limit > 0 && !isFixedLength && limit + offset < 256) {
-algorithm = TSortAlgorithm.TOPN_SORT;
+if (connectContext != null && 
!connectContext.getSessionVariable().forceSortAlgorithm.isEmpty()) {
+Str

(doris) branch master updated (8306a21238e -> 7a0920223e5)

2024-08-15 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from 8306a21238e [Fix](Txn) Fix wrong columns sequence in txn model (#39295)
 add 7a0920223e5 [chore](conf)remove unused  doris_max_scan_key_num and 
max_send_batch_parallelism_per_job conf (#39219)

No new revisions were added by this update.

Summary of changes:
 be/src/common/config.cpp   | 13 -
 be/src/common/config.h | 11 ---
 be/src/pipeline/exec/scan_operator.cpp |  4 
 be/src/pipeline/exec/scan_operator.h   |  4 ++--
 be/src/vec/sink/writer/vtablet_writer.cpp  |  4 +---
 .../src/main/java/org/apache/doris/qe/SessionVariable.java | 14 --
 6 files changed, 7 insertions(+), 43 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org



(doris) branch master updated (8befc47c1ca -> 389c77aa2df)

2024-08-15 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from 8befc47c1ca [enhancement][regression] remove binary data in regression 
test output file (#39364)
 add 389c77aa2df [fix](java udf) fix clean_udf_cache_callback without 
enable_java_support   (#39340)

No new revisions were added by this update.

Summary of changes:
 be/src/agent/task_worker_pool.cpp | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)





(doris) branch master updated (b949929276c -> ea9682aa31c)

2024-08-13 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from b949929276c [fix](function) fix error return type in 
corr(float32,float32)  (#39251)
 add ea9682aa31c [fix](function) Results for stddev with nan and inf are 
unstable. (#38866)

No new revisions were added by this update.

Summary of changes:
 .../aggregate_function_stddev.h| 14 +++-
 .../aggregate/aggregate_stddev_over_range.out} |  3 +
 .../aggregate/aggregate_stddev_over_range.groovy   | 90 ++
 3 files changed, 105 insertions(+), 2 deletions(-)
 copy 
regression-test/data/{auto_increment_p2/test_unique_auto_inc_concurrent.out => 
nereids_p0/aggregate/aggregate_stddev_over_range.out} (93%)
 create mode 100644 
regression-test/suites/nereids_p0/aggregate/aggregate_stddev_over_range.groovy





(doris) branch master updated: [fix](function) fix error return type in corr(float32,float32) (#39251)

2024-08-13 Thread lihaopeng

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new b949929276c [fix](function) fix error return type in 
corr(float32,float32)  (#39251)
b949929276c is described below

commit b949929276c199e0667e88ed617cbe85da5791b4
Author: Mryange <59914473+mrya...@users.noreply.github.com>
AuthorDate: Wed Aug 14 11:53:34 2024 +0800

[fix](function) fix error return type in corr(float32,float32)  (#39251)

```
mysql [test11]>select corr(cast(x as float),cast(y as float)) from 
test_corr;
ERROR 1105 (HY000): errCode = 2, detailMessage = 
(127.0.0.1)[INTERNAL_ERROR]column_type not match data_types in agg node, 
column_type=Nullable(Float64), data_types=Nullable(Float32),column name=

```
---
 be/src/vec/aggregate_functions/aggregate_function_binary.h | 3 +--
 regression-test/data/nereids_function_p0/agg_function/test_corr.out| 3 +++
 .../suites/nereids_function_p0/agg_function/test_corr.groovy   | 3 ++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/be/src/vec/aggregate_functions/aggregate_function_binary.h 
b/be/src/vec/aggregate_functions/aggregate_function_binary.h
index ca06cc1bb81..a5b6e2b1e0e 100644
--- a/be/src/vec/aggregate_functions/aggregate_function_binary.h
+++ b/be/src/vec/aggregate_functions/aggregate_function_binary.h
@@ -41,8 +41,7 @@ template <typename T1, typename T2, template <typename> typename Moments>
 struct StatFunc {
 using Type1 = T1;
 using Type2 = T2;
-using ResultType = std::conditional_t<std::is_same_v<T1, Float32> && std::is_same_v<T2, Float32>,
-                                      Float32, Float64>;
+using ResultType = Float64;
 using Data = Moments;
 };
 
diff --git 
a/regression-test/data/nereids_function_p0/agg_function/test_corr.out 
b/regression-test/data/nereids_function_p0/agg_function/test_corr.out
index 4fc9a9d4baa..c694f95ebec 100644
--- a/regression-test/data/nereids_function_p0/agg_function/test_corr.out
+++ b/regression-test/data/nereids_function_p0/agg_function/test_corr.out
@@ -11,3 +11,6 @@
 -- !sql --
 0.894427190159
 
+-- !sql --
+0.894427190159
+
diff --git 
a/regression-test/suites/nereids_function_p0/agg_function/test_corr.groovy 
b/regression-test/suites/nereids_function_p0/agg_function/test_corr.groovy
index 15f27f84276..09ed98fab06 100644
--- a/regression-test/suites/nereids_function_p0/agg_function/test_corr.groovy
+++ b/regression-test/suites/nereids_function_p0/agg_function/test_corr.groovy
@@ -80,6 +80,7 @@ suite("test_corr") {
 (5, 5, 10)
 """
 qt_sql "select corr(x,y) from test_corr"
-
+
+qt_sql "select corr(cast(x as float),cast(y as float)) from test_corr"
 sql """ DROP TABLE IF EXISTS test_corr """
 }


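The type change in the patch above can be sketched standalone. `StatFuncSketch` below is an illustrative stand-in for Doris's `StatFunc` template, not the real class: after the fix, the declared result type is always `Float64`, matching the `Float64` arithmetic the moments-based implementation performs even for `Float32` inputs.

```cpp
#include <cassert>
#include <type_traits>

using Float32 = float;
using Float64 = double;

// Illustrative sketch (StatFuncSketch is not the real Doris class): the
// declared ResultType no longer depends on the input types, so the agg node's
// declared column type matches the Float64 data the function actually emits.
template <typename T1, typename T2>
struct StatFuncSketch {
    using Type1 = T1;
    using Type2 = T2;
    using ResultType = Float64;  // previously conditionally Float32
};

static_assert(std::is_same_v<StatFuncSketch<Float32, Float32>::ResultType, Float64>,
              "corr(float, float) now declares a Float64 result");
```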



(doris-website) branch master updated: [docs](function) support split_by_regexp function (#904)

2024-08-12 Thread lihaopeng

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
 new d95d45293bb [docs](function) support split_by_regexp function (#904)
d95d45293bb is described below

commit d95d45293bb0d40111f7a68ec42e65ceb4a464aa
Author: zhangstar333 <87313068+zhangstar...@users.noreply.github.com>
AuthorDate: Tue Aug 13 12:07:03 2024 +0800

[docs](function) support split_by_regexp function (#904)
---
 .../string-functions/split-by-regexp.md| 62 ++
 .../string-functions/split-by-regexp.md| 62 ++
 sidebars.json  |  1 +
 3 files changed, 125 insertions(+)

diff --git a/docs/sql-manual/sql-functions/string-functions/split-by-regexp.md 
b/docs/sql-manual/sql-functions/string-functions/split-by-regexp.md
new file mode 100644
index 000..11a7c7d7fc2
--- /dev/null
+++ b/docs/sql-manual/sql-functions/string-functions/split-by-regexp.md
@@ -0,0 +1,62 @@
+---
+{
+"title": "SPLIT_BY_REGEXP",
+"language": "en"
+}
+---
+
+
+
+## split_by_regexp
+
+### description
+
+ Syntax
+
+`ARRAY<STRING> split_by_regexp(STRING str, STRING pattern[, int max_limit])`
+
+Split the string 'str' by the input regular expression 'pattern', optionally keeping at most 'max_limit' parts. By default all parts are kept. Returns an array of the split strings.
+
+ Arguments
+`str` - The string to split. Type: `String`
+`pattern` - The regular expression to split on. Type: `String`
+`max_limit` - The maximum number of parts to keep; optional. Type: `Int`
+
+
+### example
+
+```
+mysql [test_query_qa]>select split_by_regexp('abcde',"");
++--+
+| split_by_regexp('abcde', '') |
++--+
+| ["a", "b", "c", "d", "e"]|
++--+
+1 row in set (0.02 sec)
+
+mysql [test_query_qa]>select split_by_regexp('a12bc23de345f',"\\d+");
++-+
+| split_by_regexp('a12bc23de345f', '\d+') |
++-+
+| ["a", "bc", "de", "f"]  |
++-+
+1 row in set (0.01 sec)
+```
+### keywords
+
+SPLIT_BY_REGEXP,SPLIT
\ No newline at end of file
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/string-functions/split-by-regexp.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/string-functions/split-by-regexp.md
new file mode 100644
index 000..49b81da16ca
--- /dev/null
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/string-functions/split-by-regexp.md
@@ -0,0 +1,62 @@
+---
+{
+"title": "SPLIT_BY_REGEXP",
+"language": "zh-CN"
+}
+---
+
+
+
+## split_by_regexp
+
+
+### description
+
+ Syntax
+
+`ARRAY<STRING> split_by_regexp(STRING str, STRING pattern[, int max_limit])`
+将字符串 `str` ,根据输入的正则表达式 `pattern` 进行拆分,可选择保留的个数 `max_limit` ,默认全部保留, 
最终返回一个拆分好的字符串数组。
+
+ Arguments
+
+`str` — 需要分割的字符串. 类型: `String`
+`pattern` — 正则表达式. 类型: `String`
+`max_limit` — 保留个数,可选参数. 类型: `Int`
+
+### example
+
+```
+mysql [test_query_qa]>select split_by_regexp('abcde',"");
++--+
+| split_by_regexp('abcde', '') |
++--+
+| ["a", "b", "c", "d", "e"]|
++--+
+1 row in set (0.02 sec)
+
+mysql [test_query_qa]>select split_by_regexp('a12bc23de345f',"\\d+");
++-+
+| split_by_regexp('a12bc23de345f', '\d+') |
++-+
+| ["a", "bc", "de", "f"]  |
++-+
+1 row in set (0.01 sec)
+```
+### keywords
+
+SPLIT_BY_REGEXP
diff --git a/sidebars.json b/sidebars.json
index 2a753e865d2..ebfa49e556b 100644
--- a/sidebars.json
+++ b/sidebars.json
@@ -765,6 +765,7 @@
 
"sql-manual/sql-functions/string-functions/strright",
 
"sql-manual/sql-functions/string-functions/split-part",
 
"sql-manual/sql-functions/string-functions/split-by-string",
+
"sql-manual/sql-functions/string-functions/split-by-regexp",
 
"sql-manual/sql-functions/string-functions/substring-index",
 
"sql-manual/sql-functions/string-functions/money-format",
 
"sql-manual/sql-functions/string-functions/parse-url",


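As a side note, the split semantics documented in the patch above can be sketched in standalone C++. This is an illustrative analogue using `std::regex`, not Doris's implementation; the empty-pattern (split-into-characters) case is omitted, and the `max_limit` handling shown here is an assumption based on the doc text.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Rough analogue of split_by_regexp(str, pattern[, max_limit]): split `str`
// on matches of `pattern`; if max_limit > 0, keep at most that many parts.
std::vector<std::string> split_by_regexp(const std::string& str, const std::string& pattern,
                                         int max_limit = 0) {
    std::regex re(pattern);
    // Token index -1 selects the substrings *between* matches.
    std::sregex_token_iterator it(str.begin(), str.end(), re, -1), end;
    std::vector<std::string> parts(it, end);
    if (max_limit > 0 && parts.size() > static_cast<size_t>(max_limit)) {
        parts.resize(max_limit);
    }
    return parts;
}
```

With the doc's example input, `split_by_regexp("a12bc23de345f", "\\d+")` yields the parts separated by runs of digits, mirroring the SQL output shown above.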



(doris-website) branch master updated: [fix](function) remove function sleep and fix array_split function title (#993)

2024-08-12 Thread lihaopeng

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
 new 6efe38120cb [fix](function) remove function sleep and fix array_split 
function title (#993)
6efe38120cb is described below

commit 6efe38120cbef4a72c34f74e82959df99064d5dc
Author: zclllhhjj 
AuthorDate: Tue Aug 13 11:40:19 2024 +0800

[fix](function) remove function sleep and fix array_split function title 
(#993)

1. The sleep function should only appear in dev/debug-functions; remove it
from the docs of all other versions.
2. Fix the array-split title.
---
 .../sql-functions/array-functions/array-split.md   |  2 +-
 .../sql-functions/array-functions/array-split.md   |  2 +-
 .../sql-functions/string-functions/sleep.md| 48 --
 .../sql-functions/string-functions/sleep.md| 48 --
 .../sql-functions/string-functions/sleep.md| 48 --
 .../sql-functions/array-functions/array-split.md   |  2 +-
 .../sql-functions/string-functions/sleep.md| 48 --
 .../sql-functions/string-functions/sleep.md| 48 --
 .../sql-functions/string-functions/sleep.md| 48 --
 .../sql-functions/string-functions/sleep.md| 48 --
 .../sql-functions/array-functions/array-split.md   |  2 +-
 .../sql-functions/string-functions/sleep.md| 48 --
 versioned_sidebars/version-1.2-sidebars.json   |  1 -
 versioned_sidebars/version-2.0-sidebars.json   |  1 -
 versioned_sidebars/version-2.1-sidebars.json   |  1 -
 versioned_sidebars/version-3.0-sidebars.json   |  1 -
 16 files changed, 4 insertions(+), 392 deletions(-)

diff --git a/docs/sql-manual/sql-functions/array-functions/array-split.md 
b/docs/sql-manual/sql-functions/array-functions/array-split.md
index 094a2de8e24..3ae2dbe62ae 100644
--- a/docs/sql-manual/sql-functions/array-functions/array-split.md
+++ b/docs/sql-manual/sql-functions/array-functions/array-split.md
@@ -22,7 +22,7 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-## array_sortby
+## array_split
 
 array_split
 
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/array-functions/array-split.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/array-functions/array-split.md
index aa771a709fd..71c41dfe726 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/array-functions/array-split.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/array-functions/array-split.md
@@ -22,7 +22,7 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-## array_sortby
+## array_split
 
 array_split
 
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/sql-manual/sql-functions/string-functions/sleep.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/sql-manual/sql-functions/string-functions/sleep.md
deleted file mode 100644
index 3a3e6280fac..000
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-1.2/sql-manual/sql-functions/string-functions/sleep.md
+++ /dev/null
@@ -1,48 +0,0 @@

-{
-"title": "sleep",
-"language": "zh-CN"
-}

-
-
-
-## sleep
-### description
- Syntax
-
-`BOOLEAN sleep(INT num)`
-
-使该线程休眠num秒。
-
-### example
-
-```
-mysql> select sleep(10);
-+---+
-| sleep(10) |
-+---+
-| 1 |
-+---+
-1 row in set (10.01 sec)
-
-```
-### keywords
-sleep
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-functions/string-functions/sleep.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-functions/string-functions/sleep.md
deleted file mode 100644
index 1e277869756..000
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-functions/string-functions/sleep.md
+++ /dev/null
@@ -1,48 +0,0 @@

-{
-"title": "SLEEP",
-"language": "zh-CN"
-}

-
-
-
-## sleep
-### description
- Syntax
-
-`BOOLEAN sleep(INT num)`
-
-使该线程休眠num秒。
-
-### example
-
-```
-mysql> select sleep(10);
-+---+
-| sleep(10) |
-+---+
-| 1 |
-+---+
-1 row in set (10.01 sec)
-
-```
-### keywords
-sleep
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/string-functions/sleep.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/string-functions/sleep.md
deleted file mode 100644
index 1e277869756..000
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-fun

(doris-website) branch master updated: [fix](function) Add parameter restrictions for function random (#992)

2024-08-12 Thread lihaopeng

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
 new a925249a9d6 [fix](function) Add parameter restrictions for function 
random (#992)
a925249a9d6 is described below

commit a925249a9d6dc53ac5a6b008f3bf0a0229b17d32
Author: zclllhhjj 
AuthorDate: Tue Aug 13 10:24:02 2024 +0800

[fix](function) Add parameter restrictions for function random (#992)

master pr: https://github.com/apache/doris/pull/39255
---
 docs/sql-manual/sql-functions/numeric-functions/random.md   | 2 ++
 .../current/sql-manual/sql-functions/numeric-functions/random.md| 2 ++
 .../version-2.1/sql-manual/sql-functions/numeric-functions/random.md| 2 ++
 .../version-3.0/sql-manual/sql-functions/numeric-functions/random.md| 2 ++
 .../version-2.1/sql-manual/sql-functions/numeric-functions/random.md| 2 ++
 .../version-3.0/sql-manual/sql-functions/numeric-functions/random.md| 2 ++
 6 files changed, 12 insertions(+)

diff --git a/docs/sql-manual/sql-functions/numeric-functions/random.md 
b/docs/sql-manual/sql-functions/numeric-functions/random.md
index 53e3ba4158f..722ff2e020e 100644
--- a/docs/sql-manual/sql-functions/numeric-functions/random.md
+++ b/docs/sql-manual/sql-functions/numeric-functions/random.md
@@ -38,6 +38,8 @@ Returns a random number between a and b. a must be less than 
b.
 
 Alias: `rand`.
 
+Note: All parameters must be constants.
+
 ### example
 
 ```sql
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/numeric-functions/random.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/numeric-functions/random.md
index 36442afffd2..59100fee8e5 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/numeric-functions/random.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/numeric-functions/random.md
@@ -38,6 +38,8 @@ under the License.
 
 别名:`rand`
 
+注意:所有参数必须为常量。
+
 ### example
 
 ```sql
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/numeric-functions/random.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/numeric-functions/random.md
index 36442afffd2..59100fee8e5 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/numeric-functions/random.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/numeric-functions/random.md
@@ -38,6 +38,8 @@ under the License.
 
 别名:`rand`
 
+注意:所有参数必须为常量。
+
 ### example
 
 ```sql
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/sql-functions/numeric-functions/random.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/sql-functions/numeric-functions/random.md
index 36442afffd2..59100fee8e5 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/sql-functions/numeric-functions/random.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/sql-functions/numeric-functions/random.md
@@ -38,6 +38,8 @@ under the License.
 
 别名:`rand`
 
+注意:所有参数必须为常量。
+
 ### example
 
 ```sql
diff --git 
a/versioned_docs/version-2.1/sql-manual/sql-functions/numeric-functions/random.md
 
b/versioned_docs/version-2.1/sql-manual/sql-functions/numeric-functions/random.md
index 53e3ba4158f..722ff2e020e 100644
--- 
a/versioned_docs/version-2.1/sql-manual/sql-functions/numeric-functions/random.md
+++ 
b/versioned_docs/version-2.1/sql-manual/sql-functions/numeric-functions/random.md
@@ -38,6 +38,8 @@ Returns a random number between a and b. a must be less than 
b.
 
 Alias: `rand`.
 
+Note: All parameters must be constants.
+
 ### example
 
 ```sql
diff --git 
a/versioned_docs/version-3.0/sql-manual/sql-functions/numeric-functions/random.md
 
b/versioned_docs/version-3.0/sql-manual/sql-functions/numeric-functions/random.md
index 53e3ba4158f..722ff2e020e 100644
--- 
a/versioned_docs/version-3.0/sql-manual/sql-functions/numeric-functions/random.md
+++ 
b/versioned_docs/version-3.0/sql-manual/sql-functions/numeric-functions/random.md
@@ -38,6 +38,8 @@ Returns a random number between a and b. a must be less than 
b.
 
 Alias: `rand`.
 
+Note: All parameters must be constants.
+
 ### example
 
 ```sql





(doris-website) branch master updated: [Refactor](exec) remove the useless config:doris_scanner_queue_size (#990)

2024-08-12 Thread lihaopeng

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
 new 8d2a6646d96 [Refactor](exec) remove the useless 
config:doris_scanner_queue_size (#990)
8d2a6646d96 is described below

commit 8d2a6646d966139bce2d9fb1906f78d8cb3a266f
Author: HappenLee 
AuthorDate: Mon Aug 12 22:24:03 2024 +0800

[Refactor](exec) remove the useless config:doris_scanner_queue_size (#990)

remove the useless config:doris_scanner_queue_size
---
 docs/admin-manual/config/be-config.md   | 6 --
 .../current/admin-manual/config/be-config.md| 6 --
 .../version-2.0/admin-manual/config/be-config.md| 6 --
 .../version-2.1/admin-manual/config/be-config.md| 6 --
 .../version-3.0/admin-manual/config/be-config.md| 6 --
 versioned_docs/version-2.0/admin-manual/config/be-config.md | 6 --
 versioned_docs/version-2.1/admin-manual/config/be-config.md | 6 --
 versioned_docs/version-3.0/admin-manual/config/be-config.md | 6 --
 8 files changed, 48 deletions(-)

diff --git a/docs/admin-manual/config/be-config.md 
b/docs/admin-manual/config/be-config.md
index 3c0002dac35..4a08abb4640 100644
--- a/docs/admin-manual/config/be-config.md
+++ b/docs/admin-manual/config/be-config.md
@@ -362,12 +362,6 @@ The maximum size of a (received) message of the thrift 
server, in bytes. If the
 * Description: When BE performs data scanning, it will split the same scanning 
range into multiple ScanRanges. This parameter represents the scan data range 
of each ScanRange. This parameter can limit the time that a single OlapScanner 
occupies the io thread.
 * Default value: 524288
 
- `doris_scanner_queue_size`
-
-* Type: int32
-* Description: The length of the RowBatch buffer queue between TransferThread 
and OlapScanner. When Doris performs data scanning, it is performed 
asynchronously. The Rowbatch scanned by OlapScanner will be placed in the 
scanner buffer queue, waiting for the upper TransferThread to take it away.
-* Default value: 1024
-
  `doris_scanner_row_num`
 
 * Description: The maximum number of data rows returned by each scanning 
thread in a single execution
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/config/be-config.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/config/be-config.md
index c3be1fb8a7b..7a01d7a5ffb 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/config/be-config.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/config/be-config.md
@@ -373,12 +373,6 @@ Thrift 服务器接收请求消息的大小(字节数)上限。如果客户
 * 描述:BE 在进行数据扫描时,会将同一个扫描范围拆分为多个 ScanRange。该参数代表了每个 ScanRange 
代表扫描数据范围。通过该参数可以限制单个 OlapScanner 占用 io 线程的时间。
 * 默认值:524288
 
- `doris_scanner_queue_size`
-
-* 类型:int32
-* 描述:TransferThread 与 OlapScanner 之间 RowBatch 
的缓存队列的长度。Doris、进行数据扫描时是异步进行的,OlapScanner 扫描上来的 Rowbatch 会放入缓存队列之中,等待上层 
TransferThread 取走。
-* 默认值:1024
-
  `doris_scanner_row_num`
 
 * 描述:每个扫描线程单次执行最多返回的数据行数
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/config/be-config.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/config/be-config.md
index ea420f07841..f07712bb0ae 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/config/be-config.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/admin-manual/config/be-config.md
@@ -383,12 +383,6 @@ thrift 服务器接收请求消息的大小(字节数)上限。如果客户
 * 描述:BE 在进行数据扫描时,会将同一个扫描范围拆分为多个 ScanRange。该参数代表了每个 ScanRange 
代表扫描数据范围。通过该参数可以限制单个 OlapScanner 占用 io 线程的时间。
 * 默认值:524288
 
- `doris_scanner_queue_size`
-
-* 类型:int32
-* 描述:TransferThread 与 OlapScanner 之间 RowBatch 的缓存队列的长度。Doris 
进行数据扫描时是异步进行的,OlapScanner 扫描上来的 Rowbatch 会放入缓存队列之中,等待上层 TransferThread 取走。
-* 默认值:1024
-
  `doris_scanner_row_num`
 
 * 描述:每个扫描线程单次执行最多返回的数据行数
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/config/be-config.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/config/be-config.md
index e5405bba80a..bc3ed9620ef 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/config/be-config.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/config/be-config.md
@@ -375,12 +375,6 @@ thrift 服务器接收请求消息的大小(字节数)上限。如果客户
 * 描述:BE 在进行数据扫描时,会将同一个扫描范围拆分为多个 ScanRange。该参数代表了每个 ScanRange 
代表扫描数据范围。通过该参数可以限制单个 OlapScanner 占用 io 线程的时间。
 * 默认值:524288
 
- `doris_scanner_queue_size`
-
-* 类型:int32
-* 描述:TransferThread 与 OlapScanner 之间 RowBatch 的缓存队列的长度。Doris 
进行数据扫描时是异步进行的,OlapScanner 扫描上来的 Rowbatch 会放入缓存队列之中,等待上层 TransferThread 取走。
-* 默认值:1024
-
  `doris_scanner_row_num`
 
 * 描述:每个扫描线程单次执行最多返回的数据行数
diff --git 
a/i18n/zh

(doris) branch master updated: [Refactor](config) remove useless config in doris (#39218)

2024-08-12 Thread lihaopeng

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 3040b1aca96 [Refactor](config) remove useless config in doris (#39218)
3040b1aca96 is described below

commit 3040b1aca9621ddf4b860e0fca13dd1f9883e1b0
Author: HappenLee 
AuthorDate: Mon Aug 12 21:30:05 2024 +0800

[Refactor](config) remove useless config in doris (#39218)

remove useless config in doris
`doris_scanner_queue_size`
---
 be/src/common/config.cpp | 2 --
 be/src/common/config.h   | 2 --
 2 files changed, 4 deletions(-)

diff --git a/be/src/common/config.cpp b/be/src/common/config.cpp
index bc2f6d3e025..6d23ce0eed6 100644
--- a/be/src/common/config.cpp
+++ b/be/src/common/config.cpp
@@ -268,8 +268,6 @@ DEFINE_mInt32(doris_scan_range_row_count, "524288");
 DEFINE_mInt32(doris_scan_range_max_mb, "1024");
 // max bytes number for single scan block, used in segmentv2
 DEFINE_mInt32(doris_scan_block_max_mb, "67108864");
-// size of scanner queue between scanner thread and compute thread
-DEFINE_mInt32(doris_scanner_queue_size, "1024");
 // single read execute fragment row number
 DEFINE_mInt32(doris_scanner_row_num, "16384");
 // single read execute fragment row bytes
diff --git a/be/src/common/config.h b/be/src/common/config.h
index 3c43ed66593..3dc9b2deed7 100644
--- a/be/src/common/config.h
+++ b/be/src/common/config.h
@@ -319,8 +319,6 @@ DECLARE_mInt32(doris_scan_range_row_count);
 DECLARE_mInt32(doris_scan_range_max_mb);
 // max bytes number for single scan block, used in segmentv2
 DECLARE_mInt32(doris_scan_block_max_mb);
-// size of scanner queue between scanner thread and compute thread
-DECLARE_mInt32(doris_scanner_queue_size);
 // single read execute fragment row number
 DECLARE_mInt32(doris_scanner_row_num);
 // single read execute fragment row bytes





(doris) branch branch-2.0 updated: [Performance](opt) opt the order by performance in permutation (#39092)

2024-08-10 Thread lihaopeng

lihaopeng pushed a commit to branch branch-2.0
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-2.0 by this push:
 new 0fe67a8e763 [Performance](opt) opt the order by performance in 
permutation (#39092)
0fe67a8e763 is described below

commit 0fe67a8e76397b120427f8105f19da23f387d724
Author: HappenLee 
AuthorDate: Sat Aug 10 19:37:30 2024 +0800

[Performance](opt) opt the order by performance in permutation (#39092)

Issue Number: cherry pick #38985
---
 be/src/vec/columns/column_decimal.h  | 25 +
 be/src/vec/columns/column_string.cpp |  9 -
 be/src/vec/columns/column_vector.cpp |  3 ++-
 3 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/be/src/vec/columns/column_decimal.h 
b/be/src/vec/columns/column_decimal.h
index 8d10fb806e4..26ec505e426 100644
--- a/be/src/vec/columns/column_decimal.h
+++ b/be/src/vec/columns/column_decimal.h
@@ -21,6 +21,7 @@
 #pragma once
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -294,14 +295,22 @@ protected:
 for (U i = 0; i < s; ++i) res[i] = i;
 
 auto sort_end = res.end();
-if (limit && limit < s) sort_end = res.begin() + limit;
-
-if (reverse)
-std::partial_sort(res.begin(), sort_end, res.end(),
-  [this](size_t a, size_t b) { return data[a] > 
data[b]; });
-else
-std::partial_sort(res.begin(), sort_end, res.end(),
-  [this](size_t a, size_t b) { return data[a] < 
data[b]; });
+if (limit && limit < s / 8.0) {
+sort_end = res.begin() + limit;
+if (reverse)
+std::partial_sort(res.begin(), sort_end, res.end(),
+  [this](size_t a, size_t b) { return data[a] 
> data[b]; });
+else
+std::partial_sort(res.begin(), sort_end, res.end(),
+  [this](size_t a, size_t b) { return data[a] 
< data[b]; });
+} else {
+if (reverse)
+pdqsort(res.begin(), res.end(),
+[this](size_t a, size_t b) { return data[a] > data[b]; 
});
+else
+pdqsort(res.begin(), res.end(),
+[this](size_t a, size_t b) { return data[a] < data[b]; 
});
+}
 }
 
 void ALWAYS_INLINE decimalv2_do_crc(size_t i, uint64_t& hash) const {
diff --git a/be/src/vec/columns/column_string.cpp 
b/be/src/vec/columns/column_string.cpp
index 5d2670acb78..e5f900f62a0 100644
--- a/be/src/vec/columns/column_string.cpp
+++ b/be/src/vec/columns/column_string.cpp
@@ -381,9 +381,8 @@ void ColumnString::get_permutation(bool reverse, size_t 
limit, int /*nan_directi
 res[i] = i;
 }
 
-if (limit >= s) {
-limit = 0;
-}
+// std::partial_sort only pays off when limit << s
+if (limit > (s / 8.0)) limit = 0;
 
 if (limit) {
 if (reverse) {
@@ -393,9 +392,9 @@ void ColumnString::get_permutation(bool reverse, size_t 
limit, int /*nan_directi
 }
 } else {
 if (reverse) {
-std::sort(res.begin(), res.end(), less(*this));
+pdqsort(res.begin(), res.end(), less(*this));
 } else {
-std::sort(res.begin(), res.end(), less(*this));
+pdqsort(res.begin(), res.end(), less(*this));
 }
 }
 }
diff --git a/be/src/vec/columns/column_vector.cpp 
b/be/src/vec/columns/column_vector.cpp
index 1c96f4f2e6c..c12b14dd57e 100644
--- a/be/src/vec/columns/column_vector.cpp
+++ b/be/src/vec/columns/column_vector.cpp
@@ -245,7 +245,8 @@ void ColumnVector::get_permutation(bool reverse, size_t 
limit, int nan_direct
 
 if (s == 0) return;
 
-if (limit >= s) limit = 0;
+// std::partial_sort only pays off when limit << s
+if (limit > (s / 8.0)) limit = 0;
 
 if (limit) {
 for (size_t i = 0; i < s; ++i) res[i] = i;


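The heuristic in the patch above — use `std::partial_sort` only when the requested limit is much smaller than the column size (here, less than one eighth), and otherwise do a full sort — can be sketched standalone. Names are illustrative, and `std::sort` stands in for the `pdqsort` used by Doris:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sketch of the get_permutation heuristic: partial_sort is O(n log k) and
// beats a full sort only when k << n; past the n/8 cutoff, fully sorting
// the permutation (pdqsort in Doris) is faster.
std::vector<size_t> get_permutation(const std::vector<int>& data, size_t limit, bool reverse) {
    std::vector<size_t> res(data.size());
    std::iota(res.begin(), res.end(), 0);  // identity permutation
    auto cmp = [&](size_t a, size_t b) {
        return reverse ? data[a] > data[b] : data[a] < data[b];
    };
    if (limit && limit < data.size() / 8.0) {
        std::partial_sort(res.begin(), res.begin() + limit, res.end(), cmp);
    } else {
        std::sort(res.begin(), res.end(), cmp);  // Doris uses pdqsort here
    }
    return res;
}
```

Either branch leaves the first `limit` (or all) indices ordered, which is all an ORDER BY ... LIMIT consumer reads.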



(doris) branch master updated: [Bug](partition-topn) fix partition-topn calculate partition input rows have error (#39100)

2024-08-10 Thread lihaopeng

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 0e9951f9cb4 [Bug](partition-topn) fix partition-topn calculate 
partition input rows have error (#39100)
0e9951f9cb4 is described below

commit 0e9951f9cb4a609cb88ef47b50334a724dd17cb6
Author: zhangstar333 <87313068+zhangstar...@users.noreply.github.com>
AuthorDate: Sat Aug 10 18:31:37 2024 +0800

[Bug](partition-topn) fix partition-topn calculate partition input rows 
have error (#39100)

1. Fix the _sorted_partition_input_rows calculation: it should only count
rows that were emplaced into the hash table, not rows that were passed
through.

2. Add some profile counters exposing the input/output row counts of
partition-topn.
---
 be/src/pipeline/exec/partition_sort_sink_operator.cpp   | 12 
 be/src/pipeline/exec/partition_sort_sink_operator.h |  3 ++-
 be/src/pipeline/exec/partition_sort_source_operator.cpp |  8 +---
 be/src/pipeline/exec/partition_sort_source_operator.h   |  6 +++---
 4 files changed, 18 insertions(+), 11 deletions(-)
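The accounting fix described above can be sketched with simplified, illustrative names (the real code lives in the sink's local state and uses profile counters): rows that bypass the partition hash table must not count toward the sorted-partition input total, otherwise the "average rows per partition" pass-through heuristic fires on the wrong basis.

```cpp
#include <cassert>
#include <cstddef>

// Minimal sketch of the fixed accounting; not the real Doris operator.
struct PartitionTopNCounters {
    std::size_t num_partition = 0;
    std::size_t sorted_partition_input_rows = 0;
    std::size_t passthrough_rows = 0;

    // Returns true if this chunk of rows is passed through instead of sorted.
    bool sink(std::size_t rows, std::size_t partition_threshold) {
        // Pass through once there are many partitions averaging < 1 row each.
        bool passthrough = num_partition > partition_threshold &&
                           sorted_partition_input_rows < 1 * num_partition;
        if (passthrough) {
            passthrough_rows += rows;
        } else {
            // Only rows that actually enter the hash table are counted.
            sorted_partition_input_rows += rows;
        }
        return passthrough;
    }
};
```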

diff --git a/be/src/pipeline/exec/partition_sort_sink_operator.cpp 
b/be/src/pipeline/exec/partition_sort_sink_operator.cpp
index 62dafd54849..404d9095f96 100644
--- a/be/src/pipeline/exec/partition_sort_sink_operator.cpp
+++ b/be/src/pipeline/exec/partition_sort_sink_operator.cpp
@@ -115,6 +115,8 @@ Status PartitionSortSinkLocalState::init(RuntimeState* 
state, LocalSinkStateInfo
 _selector_block_timer = ADD_TIMER(_profile, "SelectorBlockTime");
 _emplace_key_timer = ADD_TIMER(_profile, "EmplaceKeyTime");
 _passthrough_rows_counter = ADD_COUNTER(_profile, 
"PassThroughRowsCounter", TUnit::UNIT);
+_sorted_partition_input_rows_counter =
+ADD_COUNTER(_profile, "SortedPartitionInputRows", TUnit::UNIT);
_partition_sort_info = std::make_shared<PartitionSortInfo>(
 &_vsort_exec_exprs, p._limit, 0, p._pool, p._is_asc_order, 
p._nulls_first,
 p._child_x->row_desc(), state, _profile, p._has_global_limit, 
p._partition_inner_limit,
@@ -173,7 +175,6 @@ Status PartitionSortSinkOperatorX::sink(RuntimeState* 
state, vectorized::Block*
 SCOPED_TIMER(local_state.exec_time_counter());
 if (current_rows > 0) {
 COUNTER_UPDATE(local_state.rows_input_counter(), 
(int64_t)input_block->rows());
-local_state.child_input_rows = local_state.child_input_rows + 
current_rows;
 if (UNLIKELY(_partition_exprs_num == 0)) {
 if (UNLIKELY(local_state._value_places.empty())) {
 local_state._value_places.push_back(_pool->add(new 
PartitionBlocks(
@@ -185,10 +186,9 @@ Status PartitionSortSinkOperatorX::sink(RuntimeState* 
state, vectorized::Block*
 //if is TWO_PHASE_GLOBAL, must be sort all data thought partition 
num threshold have been exceeded.
 if (_topn_phase != TPartTopNPhase::TWO_PHASE_GLOBAL &&
 local_state._num_partition > 
config::partition_topn_partition_threshold &&
-local_state.child_input_rows < 1 * 
local_state._num_partition) {
+local_state._sorted_partition_input_rows < 1 * 
local_state._num_partition) {
 {
-COUNTER_UPDATE(local_state._passthrough_rows_counter,
-   (int64_t)input_block->rows());
+COUNTER_UPDATE(local_state._passthrough_rows_counter, 
(int64_t)current_rows);
 std::lock_guard 
lock(local_state._shared_state->buffer_mutex);
 
local_state._shared_state->blocks_buffer.push(std::move(*input_block));
 // buffer have data, source could read this.
@@ -198,6 +198,8 @@ Status PartitionSortSinkOperatorX::sink(RuntimeState* 
state, vectorized::Block*
 RETURN_IF_ERROR(_split_block_by_partition(input_block, 
local_state, eos));
 RETURN_IF_CANCELLED(state);
 input_block->clear_column_data();
+local_state._sorted_partition_input_rows =
+local_state._sorted_partition_input_rows + 
current_rows;
 }
 }
 }
@@ -220,6 +222,8 @@ Status PartitionSortSinkOperatorX::sink(RuntimeState* 
state, vectorized::Block*
 }
 
 COUNTER_SET(local_state._hash_table_size_counter, 
int64_t(local_state._num_partition));
+COUNTER_SET(local_state._sorted_partition_input_rows_counter,
+local_state._sorted_partition_input_rows);
 //so all data from child have sink completed
 {
 std::unique_lock 
lc(local_state._shared_state->sink_eos_lock);
diff --git a/be/sr
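
The passthrough condition in the hunks above — forward a block unsorted when there are many partitions but, on average, fewer than one row has been fed to the sorters per partition — can be sketched as follows. The threshold constant and function name below are hypothetical stand-ins for `config::partition_topn_partition_threshold` and the operator code, not the real Doris symbols:

```cpp
#include <cstdint>

// Hypothetical stand-in for config::partition_topn_partition_threshold.
constexpr int64_t kPartitionThreshold = 1024;

// Returns true when an incoming block should bypass the per-partition
// sorters and be handed straight to the source operator.
// two_phase_global mirrors TPartTopNPhase::TWO_PHASE_GLOBAL, which must
// always sort all data.
bool should_passthrough(bool two_phase_global, int64_t num_partition,
                        int64_t sorted_partition_input_rows) {
    if (two_phase_global) {
        return false; // the global phase must sort everything
    }
    // Many partitions but on average < 1 row sorted per partition:
    // the hash table is too sparse for top-n pruning to pay off.
    return num_partition > kPartitionThreshold &&
           sorted_partition_input_rows < 1 * num_partition;
}
```

Note the counter the patch introduces, `_sorted_partition_input_rows`, only counts rows that actually reached the sorters, so passthrough rows no longer inflate the heuristic's denominator.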

(doris-website) branch master updated: [doc](node) add some doc about intersect/except node (#975)

2024-08-10 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
 new 8b5b928d2ae [doc](node) add some doc about intersect/except node (#975)
8b5b928d2ae is described below

commit 8b5b928d2ae7f596f0672b2b60806727266b6243
Author: zhangstar333 <87313068+zhangstar...@users.noreply.github.com>
AuthorDate: Sat Aug 10 18:28:23 2024 +0800

[doc](node) add some doc about intersect/except node (#975)
---
 docs/query/query-data/select.md| 41 ++
 .../current/query/query-data/select.md | 40 +
 .../version-2.0/query/query-data/select.md | 40 +
 .../version-2.1/query/query-data/select.md | 39 
 .../version-2.0/query/query-data/select.md | 41 ++
 .../version-2.1/query/query-data/select.md | 41 ++
 6 files changed, 242 insertions(+)

diff --git a/docs/query/query-data/select.md b/docs/query/query-data/select.md
index db96b2cfe88..acd8bfa1dcb 100644
--- a/docs/query/query-data/select.md
+++ b/docs/query/query-data/select.md
@@ -111,6 +111,31 @@ UNION [ALL| DISTINCT] SELECT ..
 
 By default, `UNION` removes duplicate rows from the result. The optional 
`DISTINCT` keyword has no effect beyond the default, as it also specifies 
duplicate row removal. Using the optional `ALL` keyword, no duplicate row 
removal occurs, and the result includes all matching rows from all `SELECT` 
statements.
 
+INTERSECT:
+
+```sql
+SELECT ...
+INTERSECT [DISTINCT] SELECT ..
+[INTERSECT [DISTINCT] SELECT ...]
+```
+
+`INTERSECT` is used to return the intersection of results from multiple 
`SELECT` statements, with duplicate results removed.
+The effect of `INTERSECT` is equivalent to `INTERSECT DISTINCT`. The `ALL` 
keyword is not supported.
+Each `SELECT` query must return the same number of columns; when the column 
types are inconsistent, they will be `CAST` to the same type.
+
+EXCEPT/MINUS:
+
+```sql
+SELECT ...
+EXCEPT [DISTINCT] SELECT ..
+[EXCEPT [DISTINCT] SELECT ...]
+```
+
+The `EXCEPT` clause is used to return the difference between the results of 
multiple queries, meaning it returns the rows from the left query that do not 
exist in the right query, with duplicates removed.
+`EXCEPT` is functionally equivalent to `MINUS`.
+The effect of `EXCEPT` is the same as `EXCEPT DISTINCT`. The `ALL` keyword is 
not supported.
+Each `SELECT` query must return the same number of columns; when the column 
types are inconsistent, they will be `CAST` to the same type.
+
 WITH:
 
 To specify a common table expression, use a `WITH` clause with one or more 
comma-separated subclauses. Each subclause provides a subquery that generates a 
result set and associates a name with the subquery. The following example 
defines CTEs named `cte1` and `cte2` in the `WITH` clause, and refers to them 
in the top-level `SELECT` following the `WITH` clause.
@@ -206,6 +231,22 @@ UNION
 SELECT a FROM t2 WHERE a = 11 AND B = 2 ORDER by a LIMIT 10;
 ```
 
+- INTERSECT
+
+```sql
+SELECT a FROM t1 WHERE a = 10 AND B = 1 ORDER by a LIMIT 10
+INTERSECT
+SELECT a FROM t2 WHERE a = 11 AND B = 2 ORDER by a LIMIT 10;
+```
+
+- EXCEPT
+
+```sql
+SELECT a FROM t1 WHERE a = 10 AND B = 1 ORDER by a LIMIT 10
+EXCEPT
+SELECT a FROM t2 WHERE a = 11 AND B = 2 ORDER by a LIMIT 10;
+```
+
 - WITH clause
 
 ```
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query/query-data/select.md 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query/query-data/select.md
index d5e1d9fbac9..dcc543fb2b4 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query/query-data/select.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query/query-data/select.md
@@ -160,6 +160,30 @@ UNION [ALL| DISTINCT] SELECT ..
 
 By default, `UNION` removes duplicate rows from the result. The optional 
`DISTINCT` keyword has no effect beyond the default, as it also specifies 
duplicate row removal. With the optional `ALL` keyword, no duplicate row 
removal occurs, and the result includes all matching rows from all `SELECT` 
statements.
 
+INTERSECT syntax:
+
+```sql
+SELECT ...
+INTERSECT [DISTINCT] SELECT ..
+[INTERSECT [DISTINCT] SELECT ...]
+```
+
+`INTERSECT` returns the intersection of the results of multiple `SELECT` 
statements, with duplicates removed.
+`INTERSECT` is equivalent to `INTERSECT DISTINCT`. The `ALL` keyword is not supported.
+Each `SELECT` query must return the same number of columns; when the column 
types are inconsistent, they will be `CAST` to the same type.
+
+EXCEPT/MINUS syntax:
+
+```sql
+SELECT ...
+EXCEPT [DISTINCT] SELECT ..
+[EXCEPT [DISTINCT] SELECT ...]
+```
+The `EXCEPT` clause returns the difference between the results of multiple 
queries, i.e. the rows from the left query that do not exist in the right 
query, with duplicates removed.
+`EXCEPT` is functionally equivalent to `MINUS`.
+`EXCEPT` is equivalent to `EXCEPT DISTINCT`. The `ALL` keyword is not supported.
+Each `SELECT` query must return the same number of columns; when the column 
types are inconsistent, they will be `CAST` to the same type.
+
 WITH statement:
 
 To specify a common table expression, use a `WITH` clause with one or more 
comma-separated subclauses. Each subclause provides a subquery that generates 
a result set and associates a name with the subquery. The following example 
defines CTEs named `cte1` and `cte2` in the `WITH` clause and refers to them 
in the top-level `SELECT` following the `WITH` clause:
@@ -259,6 +283,22 @@ UNION
 SELECT a FROM t2 WHERE a = 11 AND B = 2 ORDER by a LIMIT 10;
 ```
 
+- INTERS

(doris) branch master updated: [Bug](rf) fix rf of in filter cast data as different class type maybe return wrong result (#39026)

2024-08-10 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 947397e9994 [Bug](rf) fix rf of in filter cast data as different class 
type maybe return wrong result (#39026)
947397e9994 is described below

commit 947397e999429104ce941e13df1d7369f4077160
Author: zhangstar333 <87313068+zhangstar...@users.noreply.github.com>
AuthorDate: Sat Aug 10 18:27:31 2024 +0800

[Bug](rf) fix rf of in filter cast data as different class type maybe 
return wrong result (#39026)

Two points have changed:
1. in batch_assign function:
const std::string& string_value = column.stringval();
calling **insert(&string_value)** will cast it as a StringRef via
reinterpret_cast(data), which may be wrong;
```
void insert(const void* data) override {
if (data == nullptr) {
_contains_null = true;
return;
}

const auto* value = reinterpret_cast(data);
std::string str_value(value->data, value->size);
_set.insert(str_value);
}
```

2. in the batch_copy function, void_value is cast as T*,
but it->get_value() actually returns a StringRef, so T needs to be
StringRef
```
template 
void batch_copy(PInFilter* filter, HybridSetBase::IteratorBase* it,
void (*set_func)(PColumnValue*, const T*)) {
while (it->has_next()) {
const void* void_value = it->get_value();
auto origin_value = reinterpret_cast(void_value);
set_func(filter->add_values(), origin_value);
it->next();
}
}
```
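
The fix can be sketched with simplified stand-ins for Doris's `StringRef` and string set (the types below are illustrative, not the real Doris classes): by passing the raw bytes and length explicitly, the caller never reinterprets a `std::string*` as a `StringRef*`, which was the source of the wrong results:

```cpp
#include <cstddef>
#include <set>
#include <string>

// Simplified stand-in for Doris's StringRef: a (pointer, length) view.
struct StringRef {
    const char* data;
    size_t size;
};

// Simplified string set mirroring the fixed code path: it accepts raw
// (data, size) pairs, so callers never cast a std::string* to a
// StringRef* (the bug the commit fixes).
struct StringSet {
    std::set<std::string> values;
    void insert(const void* data, size_t size) {
        values.insert(std::string(static_cast<const char*>(data), size));
    }
    bool contains(const std::string& s) const { return values.count(s) > 0; }
};

// The fixed batch_assign path: pass the string's bytes explicitly
// instead of handing over a std::string* to be reinterpreted.
void insert_string(StringSet& set, const std::string& value) {
    set.insert(value.data(), value.size());
}
```

The old path handed `&string_value` to an `insert(const void*)` overload that assumed the pointer addressed a `StringRef`; since `std::string` and `StringRef` have different layouts, the reinterpreted `data`/`size` fields were garbage.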
---
 be/src/exprs/runtime_filter.cpp  | 12 
 regression-test/data/query_p0/join/test_runtimefilter_2.out  |  9 +
 .../suites/query_p0/join/test_runtimefilter_2.groovy | 11 +++
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/be/src/exprs/runtime_filter.cpp b/be/src/exprs/runtime_filter.cpp
index c6fd3338b14..5a241326f90 100644
--- a/be/src/exprs/runtime_filter.cpp
+++ b/be/src/exprs/runtime_filter.cpp
@@ -694,8 +694,10 @@ public:
 case TYPE_CHAR:
 case TYPE_STRING: {
 batch_assign(in_filter, [](std::shared_ptr& set, 
PColumnValue& column) {
-const auto& string_val_ref = column.stringval();
-set->insert(&string_val_ref);
+const std::string& string_value = column.stringval();
+// string_value is a std::string; calling the insert(data, size) 
overload of StringSet will not cast it as a StringRef,
+// avoiding cast errors between different class objects.
+set->insert((void*)string_value.data(), string_value.size());
 });
 break;
 }
@@ -1630,8 +1632,10 @@ void IRuntimeFilter::to_protobuf(PInFilter* filter) {
 case TYPE_CHAR:
 case TYPE_VARCHAR:
 case TYPE_STRING: {
-batch_copy(filter, it, [](PColumnValue* column, const 
std::string* value) {
-column->set_stringval(*value);
+//const void* void_value = it->get_value();
+//Now get_value() returns a void* that actually points to a StringRef
+batch_copy(filter, it, [](PColumnValue* column, const 
StringRef* value) {
+column->set_stringval(value->to_string());
 });
 return;
 }
diff --git a/regression-test/data/query_p0/join/test_runtimefilter_2.out 
b/regression-test/data/query_p0/join/test_runtimefilter_2.out
index d6cc7fc59a0..005406e6793 100644
--- a/regression-test/data/query_p0/join/test_runtimefilter_2.out
+++ b/regression-test/data/query_p0/join/test_runtimefilter_2.out
@@ -2,3 +2,12 @@
 -- !select_1 --
 aaa
 
+-- !select_2 --
+aaa
+
+-- !select_3 --
+BSDSAE1018 1   1   trueBSDSAE1018  1   truetrue
+
+-- !select_4 --
+2  3   BSDSAE1018
+
diff --git a/regression-test/suites/query_p0/join/test_runtimefilter_2.groovy 
b/regression-test/suites/query_p0/join/test_runtimefilter_2.groovy
index 6e6e57c6c2d..50a61a366b1 100644
--- a/regression-test/suites/query_p0/join/test_runtimefilter_2.groovy
+++ b/regression-test/suites/query_p0/join/test_runtimefilter_2.groovy
@@ -30,4 +30,15 @@
  qt_select_1 """
 select "aaa" FROM t_ods_tpisyncjpa4_2 tpisyncjpa4 
inner join ( SELECT USER_ID, MAX(INTERNAL_CODE) 
as INTERNAL_CODE FROM t_ods_tpisyncjpa4_2 WHERE 
STATE_ID = '1' GROUP BY USER_ID ) jpa4 on 
tpisyncjpa4.USER_ID = jpa4.USER_ID;
  """
+ sql """set runtime_filter_type='IN';"""
+ qt_select_2 

(doris) branch master updated: [Performance](opt) opt the order by performance in permutation (#38985)

2024-08-07 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new df5563971bc [Performance](opt) opt the order by performance in 
permutation (#38985)
df5563971bc is described below

commit df5563971bcc9476bcab97501c325027d6729f16
Author: HappenLee 
AuthorDate: Thu Aug 8 10:48:53 2024 +0800

[Performance](opt) opt the order by performance in permutation (#38985)

## Proposed changes

Before:
```
select l_quantity from lineitem order by l_quantity limit 1020;
+--+
| ReturnedRows |
+--+
| 1020 |
+--+
1 row in set (2 min 24.42 sec)

```

after:
```
mysql [tpch]>select l_quantity from lineitem order by l_quantity limit 
1020;
+--+
| ReturnedRows |
+--+
| 1020 |
+--+
1 row in set (28.42 sec)
```


---
 be/src/vec/columns/column_decimal.h  | 25 +
 be/src/vec/columns/column_string.cpp |  9 -
 be/src/vec/columns/column_vector.cpp |  3 ++-
 3 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/be/src/vec/columns/column_decimal.h 
b/be/src/vec/columns/column_decimal.h
index 24982b7504c..cc1661312a8 100644
--- a/be/src/vec/columns/column_decimal.h
+++ b/be/src/vec/columns/column_decimal.h
@@ -21,6 +21,7 @@
 #pragma once
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -269,14 +270,22 @@ protected:
 for (U i = 0; i < s; ++i) res[i] = i;
 
 auto sort_end = res.end();
-if (limit && limit < s) sort_end = res.begin() + limit;
-
-if (reverse)
-std::partial_sort(res.begin(), sort_end, res.end(),
-  [this](size_t a, size_t b) { return data[a] > 
data[b]; });
-else
-std::partial_sort(res.begin(), sort_end, res.end(),
-  [this](size_t a, size_t b) { return data[a] < 
data[b]; });
+if (limit && limit < s / 8.0) {
+sort_end = res.begin() + limit;
+if (reverse)
+std::partial_sort(res.begin(), sort_end, res.end(),
+  [this](size_t a, size_t b) { return data[a] 
> data[b]; });
+else
+std::partial_sort(res.begin(), sort_end, res.end(),
+  [this](size_t a, size_t b) { return data[a] 
< data[b]; });
+} else {
+if (reverse)
+pdqsort(res.begin(), res.end(),
+[this](size_t a, size_t b) { return data[a] > data[b]; 
});
+else
+pdqsort(res.begin(), res.end(),
+[this](size_t a, size_t b) { return data[a] < data[b]; 
});
+}
 }
 
 void ALWAYS_INLINE decimalv2_do_crc(size_t i, uint32_t& hash) const {
diff --git a/be/src/vec/columns/column_string.cpp 
b/be/src/vec/columns/column_string.cpp
index db0902d15a1..952a1a97915 100644
--- a/be/src/vec/columns/column_string.cpp
+++ b/be/src/vec/columns/column_string.cpp
@@ -483,9 +483,8 @@ void ColumnStr::get_permutation(bool reverse, size_t 
limit, int /*nan_directi
 res[i] = i;
 }
 
-if (limit >= s) {
-limit = 0;
-}
+// std::partial_sort only pays off when limit << s (here: limit < s / 8)
+if (limit > (s / 8.0)) limit = 0;
 
 if (limit) {
 if (reverse) {
@@ -495,9 +494,9 @@ void ColumnStr::get_permutation(bool reverse, size_t 
limit, int /*nan_directi
 }
 } else {
 if (reverse) {
-std::sort(res.begin(), res.end(), less(*this));
+pdqsort(res.begin(), res.end(), less(*this));
 } else {
-std::sort(res.begin(), res.end(), less(*this));
+pdqsort(res.begin(), res.end(), less(*this));
 }
 }
 }
diff --git a/be/src/vec/columns/column_vector.cpp 
b/be/src/vec/columns/column_vector.cpp
index ff7ab99d5de..f8d05c3d492 100644
--- a/be/src/vec/columns/column_vector.cpp
+++ b/be/src/vec/columns/column_vector.cpp
@@ -255,7 +255,8 @@ void ColumnVector::get_permutation(bool reverse, size_t 
limit, int nan_direct
 
 if (s == 0) return;
 
-if (limit >= s) limit = 0;
+// std::partial_sort only pays off when limit << s (here: limit < s / 8)
+if (limit > (s / 8.0)) limit = 0;
 
 if (limit) {
 for (size_t i = 0; i < s; ++i) res[i] = i;
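
The dispatch the patch applies can be sketched outside Doris as follows. `std::sort` stands in here for `pdqsort`, which Doris takes from a vendored third-party header, and the `s / 8.0` cutoff mirrors the comment in the diff; the function name and use of `std::vector<int>` are illustrative assumptions:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Builds the sorted permutation of `data`, mirroring the patched
// heuristic: std::partial_sort only wins when limit << s, otherwise a
// full sort (pdqsort in Doris, std::sort here) is faster.
std::vector<size_t> get_permutation(const std::vector<int>& data,
                                    size_t limit, bool reverse) {
    const size_t s = data.size();
    std::vector<size_t> res(s);
    for (size_t i = 0; i < s; ++i) res[i] = i;

    auto cmp = [&](size_t a, size_t b) {
        return reverse ? data[a] > data[b] : data[a] < data[b];
    };
    if (limit != 0 && limit < s / 8.0) {
        // Small limit relative to s: only order the first `limit` slots.
        std::partial_sort(res.begin(), res.begin() + limit, res.end(), cmp);
    } else {
        // limit == 0 or limit too close to s: sort everything.
        std::sort(res.begin(), res.end(), cmp);
    }
    return res;
}
```

The pre-patch code used `std::partial_sort` whenever `limit < s`, which for `limit` close to `s` is much slower than a full pattern-defeating quicksort — hence the 2 min 24 s vs 28 s numbers in the commit message.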


-
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org



(doris) branch master updated (d1bc5258661 -> 7c58c71d1c1)

2024-08-07 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from d1bc5258661 [Improvement] Limit remote scan IO (#39012)
 add 7c58c71d1c1 [fix](function) stddev with DecimalV2 type will result in 
an error (#38731)

No new revisions were added by this update.

Summary of changes:
 .../aggregate_function_stddev.cpp  |   9 --
 .../aggregate_function_stddev.h| 105 +
 .../trees/expressions/functions/agg/Stddev.java|   5 +-
 .../expressions/functions/agg/StddevSamp.java  |   5 +-
 .../trees/expressions/functions/agg/Variance.java  |   5 +-
 .../expressions/functions/agg/VarianceSamp.java|   5 +-
 6 files changed, 8 insertions(+), 126 deletions(-)





(doris) branch master updated (547ff3c8bb7 -> ccd00f8d2aa)

2024-08-07 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from 547ff3c8bb7 [fix](parquet) disable parquet page index by default 
(#38691)
 add ccd00f8d2aa [Feature](function) support array_contains_all function 
(#38376)

No new revisions were added by this update.

Summary of changes:
 .../array/function_array_contains_all.cpp  | 267 +
 .../functions/array/function_array_register.cpp|   2 +
 .../doris/catalog/BuiltinScalarFunctions.java  |   2 +
 .../{ArraysOverlap.java => ArrayContainsAll.java}  |  18 +-
 .../expressions/visitor/ScalarFunctionVisitor.java |   5 +
 gensrc/script/doris_builtins_functions.py  |   1 +
 .../array_functions/test_array_contains_all.out}   | 233 +++---
 .../array_functions/test_array_contains_all.groovy | 125 ++
 8 files changed, 495 insertions(+), 158 deletions(-)
 create mode 100644 be/src/vec/functions/array/function_array_contains_all.cpp
 copy 
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/{ArraysOverlap.java
 => ArrayContainsAll.java} (78%)
 copy regression-test/data/{correctness_p0/test_and_or.out => 
nereids_p0/sql_functions/array_functions/test_array_contains_all.out} (50%)
 create mode 100644 
regression-test/suites/nereids_p0/sql_functions/array_functions/test_array_contains_all.groovy





(doris) branch branch-3.0 updated: [Performance](opt) opt the order by performance in permutation (#39023)

2024-08-07 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 3a96165e724 [Performance](opt) opt the order by performance in 
permutation (#39023)
3a96165e724 is described below

commit 3a96165e7242dc70e06759fabcb9d2463a4b91bb
Author: HappenLee 
AuthorDate: Wed Aug 7 22:46:02 2024 +0800

[Performance](opt) opt the order by performance in permutation (#39023)

## Proposed changes

cherry pick #38985


---
 be/src/vec/columns/column_decimal.h  | 25 +
 be/src/vec/columns/column_string.cpp |  9 -
 be/src/vec/columns/column_vector.cpp |  3 ++-
 3 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/be/src/vec/columns/column_decimal.h 
b/be/src/vec/columns/column_decimal.h
index 24982b7504c..cc1661312a8 100644
--- a/be/src/vec/columns/column_decimal.h
+++ b/be/src/vec/columns/column_decimal.h
@@ -21,6 +21,7 @@
 #pragma once
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -269,14 +270,22 @@ protected:
 for (U i = 0; i < s; ++i) res[i] = i;
 
 auto sort_end = res.end();
-if (limit && limit < s) sort_end = res.begin() + limit;
-
-if (reverse)
-std::partial_sort(res.begin(), sort_end, res.end(),
-  [this](size_t a, size_t b) { return data[a] > 
data[b]; });
-else
-std::partial_sort(res.begin(), sort_end, res.end(),
-  [this](size_t a, size_t b) { return data[a] < 
data[b]; });
+if (limit && limit < s / 8.0) {
+sort_end = res.begin() + limit;
+if (reverse)
+std::partial_sort(res.begin(), sort_end, res.end(),
+  [this](size_t a, size_t b) { return data[a] 
> data[b]; });
+else
+std::partial_sort(res.begin(), sort_end, res.end(),
+  [this](size_t a, size_t b) { return data[a] 
< data[b]; });
+} else {
+if (reverse)
+pdqsort(res.begin(), res.end(),
+[this](size_t a, size_t b) { return data[a] > data[b]; 
});
+else
+pdqsort(res.begin(), res.end(),
+[this](size_t a, size_t b) { return data[a] < data[b]; 
});
+}
 }
 
 void ALWAYS_INLINE decimalv2_do_crc(size_t i, uint32_t& hash) const {
diff --git a/be/src/vec/columns/column_string.cpp 
b/be/src/vec/columns/column_string.cpp
index 8e142208061..ba44d90a234 100644
--- a/be/src/vec/columns/column_string.cpp
+++ b/be/src/vec/columns/column_string.cpp
@@ -458,9 +458,8 @@ void ColumnStr::get_permutation(bool reverse, size_t 
limit, int /*nan_directi
 res[i] = i;
 }
 
-if (limit >= s) {
-limit = 0;
-}
+// std::partial_sort only pays off when limit << s (here: limit < s / 8)
+if (limit > (s / 8.0)) limit = 0;
 
 if (limit) {
 if (reverse) {
@@ -470,9 +469,9 @@ void ColumnStr::get_permutation(bool reverse, size_t 
limit, int /*nan_directi
 }
 } else {
 if (reverse) {
-std::sort(res.begin(), res.end(), less(*this));
+pdqsort(res.begin(), res.end(), less(*this));
 } else {
-std::sort(res.begin(), res.end(), less(*this));
+pdqsort(res.begin(), res.end(), less(*this));
 }
 }
 }
diff --git a/be/src/vec/columns/column_vector.cpp 
b/be/src/vec/columns/column_vector.cpp
index 4812befe3ed..049db3545d3 100644
--- a/be/src/vec/columns/column_vector.cpp
+++ b/be/src/vec/columns/column_vector.cpp
@@ -236,7 +236,8 @@ void ColumnVector::get_permutation(bool reverse, size_t 
limit, int nan_direct
 
 if (s == 0) return;
 
-if (limit >= s) limit = 0;
+// std::partial_sort only pays off when limit << s (here: limit < s / 8)
+if (limit > (s / 8.0)) limit = 0;
 
 if (limit) {
 for (size_t i = 0; i < s; ++i) res[i] = i;





(doris) branch master updated (91595fa340e -> 76c984a914c)

2024-08-04 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from 91595fa340e [fix](testcase) add order by to fix unstable output of 
passwordLeaked (#38813)
 add 76c984a914c [fix](hist) Fix unstable result of aggregate function 
hist (#38608)

No new revisions were added by this update.

Summary of changes:
 .../aggregate_function_histogram.h | 41 ++
 .../test_aggregate_all_functions2.out  | 72 
 .../test_aggregate_all_functions2.groovy   | 95 ++
 3 files changed, 193 insertions(+), 15 deletions(-)





(doris) branch branch-3.0 updated: [Bug](fix) fix coredump case in (not null, null) except (not null, not null) case (#38750)

2024-08-01 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new d1d2e63068f [Bug](fix) fix coredump case in (not null, null) except 
(not null, not null) case (#38750)
d1d2e63068f is described below

commit d1d2e63068f7154a64d10944aec01c6e8c15969f
Author: HappenLee 
AuthorDate: Fri Aug 2 11:28:03 2024 +0800

[Bug](fix) fix coredump case in (not null, null) except (not null, not 
null) case (#38750)

cherry pick #38737
---
 be/src/pipeline/dependency.h| 5 +++--
 be/src/pipeline/exec/set_sink_operator.cpp  | 8 ++--
 regression-test/data/query_p0/except/test_query_except.out  | 2 ++
 regression-test/suites/query_p0/except/test_query_except.groovy | 3 +++
 4 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/be/src/pipeline/dependency.h b/be/src/pipeline/dependency.h
index 8adc24d3b4e..f37766394e0 100644
--- a/be/src/pipeline/dependency.h
+++ b/be/src/pipeline/dependency.h
@@ -761,8 +761,9 @@ public:
 // (select 0) intersect (select null) the build side hash table should 
not
 // ignore null value.
 std::vector data_types;
-for (const auto& ctx : child_exprs_lists[0]) {
-data_types.emplace_back(build_not_ignore_null[0]
+for (int i = 0; i < child_exprs_lists[0].size(); i++) {
+const auto& ctx = child_exprs_lists[0][i];
+data_types.emplace_back(build_not_ignore_null[i]
 ? 
make_nullable(ctx->root()->data_type())
 : ctx->root()->data_type());
 }
diff --git a/be/src/pipeline/exec/set_sink_operator.cpp 
b/be/src/pipeline/exec/set_sink_operator.cpp
index 5fc38f3ca70..6c76f9a57a3 100644
--- a/be/src/pipeline/exec/set_sink_operator.cpp
+++ b/be/src/pipeline/exec/set_sink_operator.cpp
@@ -130,9 +130,13 @@ Status 
SetSinkOperatorX::_extract_build_column(
 block.get_by_position(result_col_id).column =
 
block.get_by_position(result_col_id).column->convert_to_full_column_if_const();
 }
+// Making the column nullable must not modify the origin column and type 
in the origin block,
+// which may otherwise cause a coredump.
 if (local_state._shared_state->build_not_ignore_null[i]) {
-block.get_by_position(result_col_id).column =
-make_nullable(block.get_by_position(result_col_id).column);
+auto column_ptr = 
make_nullable(block.get_by_position(result_col_id).column, false);
+block.insert(
+{column_ptr, 
make_nullable(block.get_by_position(result_col_id).type), ""});
+result_col_id = block.columns() - 1;
 }
 
 raw_ptrs[i] = block.get_by_position(result_col_id).column.get();
diff --git a/regression-test/data/query_p0/except/test_query_except.out 
b/regression-test/data/query_p0/except/test_query_except.out
index 7aea45fde18..763cb44c7f8 100644
--- a/regression-test/data/query_p0/except/test_query_except.out
+++ b/regression-test/data/query_p0/except/test_query_except.out
@@ -14,3 +14,5 @@
 14
 15
 
+-- !select_except2 --
+
diff --git a/regression-test/suites/query_p0/except/test_query_except.groovy 
b/regression-test/suites/query_p0/except/test_query_except.groovy
index a13fd76e7a9..410e24f89b9 100644
--- a/regression-test/suites/query_p0/except/test_query_except.groovy
+++ b/regression-test/suites/query_p0/except/test_query_except.groovy
@@ -22,4 +22,7 @@ suite("test_query_except", "arrow_flight_sql") {
   SELECT * FROM (SELECT k1 FROM test_query_db.baseall
  EXCEPT SELECT k1 FROM test_query_db.test) 
a ORDER BY k1
   """
+qt_select_except2 """
+ select not_null_k1, not_null_k1 from (SELECT 
non_nullable(k1) as not_null_k1 FROM test_query_db.baseall where k1 is not 
null) b1 except select non_nullable(k1), k1 from test_query_db.baseall where k1 
is not null order by 1, 2;
+ """
 }
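
The ownership point of this fix — wrap a *copy* as nullable and append it, rather than replacing the column the origin block still references — can be sketched with toy column/block types. These stand-ins are illustrative only, not Doris's real `IColumn`/`Block` classes:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-ins for Doris's column/block types, just to illustrate the
// ownership rule the fix enforces.
struct Column {
    std::vector<int> data;
    bool nullable = false;
};
using ColumnPtr = std::shared_ptr<Column>;

struct Block {
    std::vector<ColumnPtr> columns;
};

// Fixed approach: append a nullable *copy* of the column and return its
// new index, leaving the original column (which other operators may
// still be reading) untouched.
size_t make_nullable_copy(Block& block, size_t col_id) {
    auto copy = std::make_shared<Column>(*block.columns[col_id]);
    copy->nullable = true;
    block.columns.push_back(std::move(copy));
    return block.columns.size() - 1;
}
```

The pre-fix code overwrote `block.get_by_position(result_col_id).column` in place; since the block was shared with the caller, a later reader saw a column whose type no longer matched its type descriptor, which could crash.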





(doris) branch branch-3.0 updated: [Bug](fix) fix ubsan use int32_t pointer access bool value (#38622)

2024-08-01 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ece72611179 [Bug](fix) fix ubsan use int32_t pointer access bool value 
(#38622)
ece72611179 is described below

commit ece72611179b17ac1eff7769123f38b54cc69ad5
Author: HappenLee 
AuthorDate: Thu Aug 1 21:33:35 2024 +0800

[Bug](fix) fix ubsan use int32_t pointer access bool value (#38622)

Issue Number: close #38617
---
 be/src/exprs/runtime_filter.cpp|  4 +-
 .../query_p0/join/test_runtime_filter_boolean.out  |  9 +++
 .../join/test_runtime_filter_boolean.groovy| 64 ++
 3 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/be/src/exprs/runtime_filter.cpp b/be/src/exprs/runtime_filter.cpp
index ead2af3efb5..d259d694fdc 100644
--- a/be/src/exprs/runtime_filter.cpp
+++ b/be/src/exprs/runtime_filter.cpp
@@ -1691,8 +1691,8 @@ void IRuntimeFilter::to_protobuf(PMinMaxFilter* filter) {
 
 switch (_wrapper->column_type()) {
 case TYPE_BOOLEAN: {
-filter->mutable_min_val()->set_boolval(*reinterpret_cast(min_data));
-filter->mutable_max_val()->set_boolval(*reinterpret_cast(max_data));
+filter->mutable_min_val()->set_boolval(*reinterpret_cast(min_data));
+filter->mutable_max_val()->set_boolval(*reinterpret_cast(max_data));
 return;
 }
 case TYPE_TINYINT: {
diff --git a/regression-test/data/query_p0/join/test_runtime_filter_boolean.out 
b/regression-test/data/query_p0/join/test_runtime_filter_boolean.out
new file mode 100644
index 000..5858b696341
--- /dev/null
+++ b/regression-test/data/query_p0/join/test_runtime_filter_boolean.out
@@ -0,0 +1,9 @@
+-- This file is automatically generated. You should know what you did if you 
want to edit this
+-- !rf_bool --
+10 11
+10 11
+10 11
+11011
+11011
+11011
+
diff --git 
a/regression-test/suites/query_p0/join/test_runtime_filter_boolean.groovy 
b/regression-test/suites/query_p0/join/test_runtime_filter_boolean.groovy
new file mode 100644
index 000..e241909e79c
--- /dev/null
+++ b/regression-test/suites/query_p0/join/test_runtime_filter_boolean.groovy
@@ -0,0 +1,64 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+suite("test_runtime_filter_boolean", "query_p0") {
+sql "drop table if exists test_runtime_filter_boolean0;"
+sql """ create table test_runtime_filter_boolean0(k1 int, v1 boolean)
+DUPLICATE KEY(`k1`)
+DISTRIBUTED BY HASH(`k1`) BUCKETS 1
+properties("replication_num" = "1"); """
+
+sql """insert into test_runtime_filter_boolean0 values 
+(10, false),
+(10, false),
+(10, false),
+(110, false),
+(110, false),
+(110, false);"""
+
+sql "drop table if exists test_runtime_filter_boolean1;"
+sql """ create table test_runtime_filter_boolean1(k1 int, v1 boolean)
+DUPLICATE KEY(`k1`)
+DISTRIBUTED BY HASH(`k1`) BUCKETS 1
+properties("replication_num" = "1"); """
+
+sql """insert into test_runtime_filter_boolean1 values 
+(11, false);"""
+
+
+qt_rf_bool """
+select
+t0.k1, t1.k1
+from
+(
+select
+k1,
+v1
+from
+test_runtime_filter_boolean0
+) t0
+inner join [shuffle] (
+select
+k1,
+v1
+from
+test_runtime_filter_boolean1
+) t1 on t0.v1 = t1.v1
+order by
+1,2;
+"""
+}
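
The UBSan report behind this fix boils down to reading a one-byte `bool` slot through an `int32_t*`, which inspects four bytes where only one is valid and violates the aliasing rules. A minimal sketch of the corrected pattern (function names are illustrative, not Doris APIs): a type-erased pointer must be cast back to the type that was actually stored:

```cpp
#include <cstdint>

// A type-erased min/max slot must be read back through a pointer to the
// type that was actually stored there. Reading a bool slot through
// int32_t* (the old code) reads 4 bytes where only 1 is valid — UBSan
// flags it, and the extra bytes can make the value garbage.
bool read_bool_slot(const void* slot) {
    // Correct: the slot really holds a bool, so cast to const bool*.
    return *reinterpret_cast<const bool*>(slot);
}

int32_t read_int32_slot(const void* slot) {
    // Correct only because this slot really holds an int32_t.
    return *reinterpret_cast<const int32_t*>(slot);
}
```

This is why the patch changes only the `TYPE_BOOLEAN` case of `to_protobuf`: every other case already casts `min_data`/`max_data` to the matching stored type.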





(doris-website) branch master updated: [Refactor](doc) Add debug functions in dev branch (#922)

2024-07-31 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
 new 8526a720d88 [Refactor](doc) Add debug functions in dev branch (#922)
8526a720d88 is described below

commit 8526a720d887db47f07d9a61848fa07c2fea2e17
Author: zclllhhjj 
AuthorDate: Wed Jul 31 21:13:15 2024 +0800

[Refactor](doc) Add debug functions in dev branch (#922)

Debug functions should only appear in dev docs, so there is no need to add
them to the others. No code changes are involved.
---
 .../sleep.md => debug-functions/ignore.md} | 39 +-
 .../sql-functions/debug-functions/non-nullable.md  | 60 ++
 .../sleep.md => debug-functions/nullable.md}   | 36 -
 .../{string-functions => debug-functions}/sleep.md |  9 ++--
 .../docusaurus-plugin-content-docs/current.json|  4 ++
 .../sleep.md => debug-functions/ignore.md} | 37 -
 .../sql-functions/debug-functions/non-nullable.md  | 60 ++
 .../sleep.md => debug-functions/nullable.md}   | 34 +++-
 .../{string-functions => debug-functions}/sleep.md |  7 ++-
 sidebars.json  | 11 +++-
 10 files changed, 233 insertions(+), 64 deletions(-)

diff --git a/docs/sql-manual/sql-functions/string-functions/sleep.md 
b/docs/sql-manual/sql-functions/debug-functions/ignore.md
similarity index 51%
copy from docs/sql-manual/sql-functions/string-functions/sleep.md
copy to docs/sql-manual/sql-functions/debug-functions/ignore.md
index 4aa6b16beb8..ee86d5d4061 100644
--- a/docs/sql-manual/sql-functions/string-functions/sleep.md
+++ b/docs/sql-manual/sql-functions/debug-functions/ignore.md
@@ -1,6 +1,6 @@
 ---
 {
-"title": "SLEEP",
+"title": "IGNORE",
 "language": "en"
 }
 ---
@@ -24,25 +24,36 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-## sleep
-### Description
+## ignore
+### description
+
+:::tip
+For developer debugging only, do not call this function manually in the 
production environment.
+:::
+
  Syntax
 
-`BOOLEAN sleep(INT num)`
+`BOOLEAN ignore(T expr...)`
 
-Make the thread sleep for num seconds.
+Returns `false` for any input.
 
 ### example
 
+```sql
+mysql> select m1, ignore(m1,m2,m1+m2,1) from t_nullable;
++--+--+
+| m1   | ignore(CAST(`m1` AS BIGINT), CAST(`m2` AS BIGINT), (`m1` + `m2`), 1) |
++--+--+
+|1 |0 |
++--+--+
+
+mysql> select ignore();
++--+
+| ignore() |
++--+
+|0 |
++--+
 ```
-mysql> select sleep(10);
-+---+
-| sleep(10) |
-+---+
-| 1 |
-+---+
-1 row in set (10.01 sec)
 
-```
 ### keywords
-sleep
+ignore
\ No newline at end of file
diff --git a/docs/sql-manual/sql-functions/debug-functions/non-nullable.md 
b/docs/sql-manual/sql-functions/debug-functions/non-nullable.md
new file mode 100644
index 000..e3adb890d30
--- /dev/null
+++ b/docs/sql-manual/sql-functions/debug-functions/non-nullable.md
@@ -0,0 +1,60 @@
+---
+{
+"title": "NON_NULLABLE",
+"language": "en"
+}
+---
+
+
+
+## non_nullable
+### description
+
+:::tip
+For developer debugging only, do not call this function manually in the 
production environment.
+:::
+
+#### Syntax
+
+`T non_nullable(T expr)`
+
+Raises an error if `expr` is not nullable, or if it is nullable and contains a `NULL` value. Otherwise, returns the non-nullable data column of the input column.
+
+### example
+
+```sql
+mysql> select k1, non_nullable(k1) from test_nullable_functions order by k1;
++--+--+
+| k1   | non_nullable(k1) |
++--+--+
+|1 |1 |
+|2 |2 |
+|3 |3 |
+|4 |4 |
++--+--+
+
+mysql> select k1, non_nullable(k1) from test_nullable_functions order by k1;
+ERROR 1105 (HY000): errCode = 2, detailMessage = [CANCELLED]There's NULL value 
in column Nullable(Int32) which is illegal for non_nullable
+mysql> select non_nullable(1);
+ERROR 1105 (HY000): errCode = 2, detailMessage = [CANCELLED]Try to use 
originally non-nullable column Int8 in nullable's non-nullable convertion.
+```
+
+### keywords
+non_nullable
\ No newline at end of file
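The semantics above can be modeled with a short Python sketch. This is a hedged illustration only: the list-plus-flag representation of a column and the use of `None` for `NULL` are assumptions, not Doris internals.

```python
def non_nullable(values, nullable):
    """Model of non_nullable(expr): unwrap a nullable column or raise."""
    # Error case 1: the input column was never nullable to begin with.
    if not nullable:
        raise ValueError("originally non-nullable column used in non_nullable conversion")
    # Error case 2: the nullable column actually contains a NULL (None) value.
    if any(v is None for v in values):
        raise ValueError("NULL value in column is illegal for non_nullable")
    # Otherwise return the unwrapped, non-nullable data column.
    return values

print(non_nullable([1, 2, 3, 4], nullable=True))  # [1, 2, 3, 4]
```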
diff --git a/docs/sql-manual/sql-functions/string-functions/sleep.md 
b/docs/sql-manual/sql-functions/debug-functions/nullable.md
similarity index 54%
copy from docs/sql-ma

(doris) branch master updated: [env](patch) use patch to opt bitshuffle (#38378)

2024-07-30 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new a2d0d103631 [env](patch) use patch to opt bitshuffle (#38378)
a2d0d103631 is described below

commit a2d0d103631e5ebbd13ffe776cbd9fb72dba6dc8
Author: Mryange <59914473+mrya...@users.noreply.github.com>
AuthorDate: Wed Jul 31 11:19:20 2024 +0800

[env](patch) use patch to opt bitshuffle (#38378)
---
 .../olap/rowset/segment_v2/bitshuffle_wrapper.cpp  |  13 ++
 thirdparty/build-thirdparty.sh |   7 +-
 thirdparty/download-thirdparty.sh  |  13 ++
 thirdparty/patches/bitshuffle-0.5.1.patch  | 225 +
 4 files changed, 256 insertions(+), 2 deletions(-)

diff --git a/be/src/olap/rowset/segment_v2/bitshuffle_wrapper.cpp 
b/be/src/olap/rowset/segment_v2/bitshuffle_wrapper.cpp
index 7ad20f210c2..2dca068ac6f 100644
--- a/be/src/olap/rowset/segment_v2/bitshuffle_wrapper.cpp
+++ b/be/src/olap/rowset/segment_v2/bitshuffle_wrapper.cpp
@@ -34,6 +34,15 @@
 #undef bshuf_compress_lz4
 #undef bshuf_decompress_lz4
 
+#undef BITSHUFFLE_H
+#define bshuf_compress_lz4_bound bshuf_compress_lz4_bound_neon
+#define bshuf_compress_lz4 bshuf_compress_lz4_neon
+#define bshuf_decompress_lz4 bshuf_decompress_lz4_neon
+#include  // NOLINT(*)
+#undef bshuf_compress_lz4_bound
+#undef bshuf_compress_lz4
+#undef bshuf_decompress_lz4
+
 using base::CPU;
 
 namespace doris {
@@ -63,6 +72,10 @@ __attribute__((constructor)) void 
SelectBitshuffleFunctions() {
 g_bshuf_compress_lz4 = bshuf_compress_lz4;
 g_bshuf_decompress_lz4 = bshuf_decompress_lz4;
 }
+#elif defined(__ARM_NEON) && defined(__aarch64__)
+g_bshuf_compress_lz4_bound = bshuf_compress_lz4_bound_neon;
+g_bshuf_compress_lz4 = bshuf_compress_lz4_neon;
+g_bshuf_decompress_lz4 = bshuf_decompress_lz4_neon;
 #else
 g_bshuf_compress_lz4_bound = bshuf_compress_lz4_bound;
 g_bshuf_compress_lz4 = bshuf_compress_lz4;
diff --git a/thirdparty/build-thirdparty.sh b/thirdparty/build-thirdparty.sh
index cf0d7576d0d..07acd21e44d 100755
--- a/thirdparty/build-thirdparty.sh
+++ b/thirdparty/build-thirdparty.sh
@@ -1141,7 +1141,7 @@ build_bitshuffle() {
 MACHINE_TYPE="$(uname -m)"
    # Because aarch64 doesn't support avx2, disable it.
 if [[ "${MACHINE_TYPE}" == "aarch64" || "${MACHINE_TYPE}" == 'arm64' ]]; 
then
-arches=('default')
+arches=('default' 'neon')
 fi
 
 to_link=""
@@ -1153,6 +1153,9 @@ build_bitshuffle() {
 if [[ "${arch}" == "avx512" ]]; then
 arch_flag="-mavx512bw -mavx512f"
 fi
+if [[ "${arch}" == "neon" ]]; then
+arch_flag="-march=armv8-a+crc"
+fi
 tmp_obj="bitshuffle_${arch}_tmp.o"
 dst_obj="bitshuffle_${arch}.o"
 "${CC}" ${EXTRA_CFLAGS:+${EXTRA_CFLAGS}} ${arch_flag:+${arch_flag}} 
-std=c99 "-I${PREFIX}/include/lz4" -O3 -DNDEBUG -c \
@@ -1162,7 +1165,7 @@ build_bitshuffle() {
 # Merge the object files together to produce a combined .o file.
 "${ld}" -r -o "${tmp_obj}" bitshuffle_core.o bitshuffle.o iochain.o
 # For the AVX2 symbols, suffix them.
-if [[ "${arch}" == "avx2" ]] || [[ "${arch}" == "avx512" ]]; then
+if [[ "${arch}" == "avx2" ]] || [[ "${arch}" == "avx512" ]] || [[ 
"${arch}" == "neon" ]]; then
 local nm="${DORIS_BIN_UTILS}/nm"
 local objcopy="${DORIS_BIN_UTILS}/objcopy"
 
diff --git a/thirdparty/download-thirdparty.sh 
b/thirdparty/download-thirdparty.sh
index ca26f448970..2f2f35734c5 100755
--- a/thirdparty/download-thirdparty.sh
+++ b/thirdparty/download-thirdparty.sh
@@ -502,4 +502,17 @@ if [[ " ${TP_ARCHIVES[*]} " =~ " KRB5 " ]]; then
 echo "Finished patching ${KRB5_SOURCE}"
 fi
 
+# patch bitshuffle
+if [[ " ${TP_ARCHIVES[*]} " =~ " BITSHUFFLE " ]]; then
+if [[ "${BITSHUFFLE_SOURCE}" = "bitshuffle-0.5.1" ]]; then
+cd "${TP_SOURCE_DIR}/${BITSHUFFLE_SOURCE}"
+if [[ ! -f "${PATCHED_MARK}" ]]; then
+patch -p1 <"${TP_PATCH_DIR}/bitshuffle-0.5.1.patch"
+touch "${PATCHED_MARK}"
+fi
+cd -
+fi
+echo "Finished patching ${BITSHUFFLE_SOURCE}"
+fi
+
 # vim: ts=4 sw=4 ts=4 tw=100:
diff --git a/thirdparty/patches/bitshuffle-0.5.1.patch 
b/thirdparty/patches/bitshuffle-0.5.1.patch
new file mode 100644
index 000..b7d7fcd
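The marker-file guard that download-thirdparty.sh uses above can be sketched in isolation. Paths here are temporary stand-ins, and the actual patch(1) invocation is reduced to an echo; the point is the idempotence: the patch runs once, and a touch(1)-ed marker makes repeated runs a no-op.

```shell
SRC_DIR="$(mktemp -d)"
PATCHED_MARK="patched_mark"

apply_patch() {
    cd "$SRC_DIR" || return 1
    if [ ! -f "$PATCHED_MARK" ]; then
        # the real script runs: patch -p1 < "$TP_PATCH_DIR/bitshuffle-0.5.1.patch"
        echo "applying patch"
        touch "$PATCHED_MARK"
    else
        echo "already patched"
    fi
    cd - >/dev/null || return 1
}

apply_patch   # first run applies the patch and creates the marker
apply_patch   # second run sees the marker: "already patched"
```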

(doris-website) branch master updated: [enhance](function) add more explanation about regexp functions (#916)

2024-07-29 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
 new 47cad09b04f [enhance](function) add more explanation about regexp 
functions (#916)
47cad09b04f is described below

commit 47cad09b04f2741d143e5f4745bdfa62d86affbe
Author: zclllhhjj 
AuthorDate: Tue Jul 30 11:15:29 2024 +0800

[enhance](function) add more explanation about regexp functions (#916)

1. add an explanation of Chinese character matching
2. remove all `not_regexp` docs, since it is not a function.
---
 .../string-functions/regexp/not-regexp.md  | 56 --
 .../string-functions/regexp/regexp-extract-all.md  | 11 -
 .../string-functions/regexp/regexp-extract.md  | 13 -
 .../string-functions/regexp/regexp-replace-one.md  | 13 -
 .../string-functions/regexp/regexp-replace.md  | 13 -
 .../string-functions/regexp/regexp.md  | 15 --
 .../string-functions/regexp/not-regexp.md  | 56 --
 .../string-functions/regexp/regexp-extract-all.md  | 11 -
 .../string-functions/regexp/regexp-extract.md  | 14 +-
 .../string-functions/regexp/regexp-replace-one.md  | 13 -
 .../string-functions/regexp/regexp-replace.md  | 13 -
 .../string-functions/regexp/regexp.md  | 15 --
 .../string-functions/regexp/not_regexp.md  | 56 --
 .../string-functions/regexp/regexp.md  | 17 +--
 .../string-functions/regexp/regexp_extract.md  | 16 +--
 .../string-functions/regexp/regexp_extract_all.md  | 13 -
 .../string-functions/regexp/regexp_replace.md  | 15 --
 .../string-functions/regexp/regexp_replace_one.md  | 15 --
 .../string-functions/regexp/not-regexp.md  | 56 --
 .../string-functions/regexp/regexp-extract-all.md  | 11 -
 .../string-functions/regexp/regexp-extract.md  | 14 +-
 .../string-functions/regexp/regexp-replace-one.md  | 13 -
 .../string-functions/regexp/regexp-replace.md  | 13 -
 .../string-functions/regexp/regexp.md  | 15 --
 .../string-functions/regexp/not-regexp.md  | 56 --
 .../string-functions/regexp/regexp-extract-all.md  | 11 -
 .../string-functions/regexp/regexp-extract.md  | 14 +-
 .../string-functions/regexp/regexp-replace-one.md  | 13 -
 .../string-functions/regexp/regexp-replace.md  | 13 -
 .../string-functions/regexp/regexp.md  | 15 --
 .../string-functions/regexp/not-regexp.md  | 56 --
 .../string-functions/regexp/regexp-extract-all.md  | 11 -
 .../string-functions/regexp/regexp-extract.md  | 14 +-
 .../string-functions/regexp/regexp-replace-one.md  | 13 -
 .../string-functions/regexp/regexp-replace.md  | 13 -
 .../string-functions/regexp/regexp.md  | 15 --
 sidebars.json  |  3 +-
 .../string-functions/regexp/not_regexp.md  | 56 --
 .../string-functions/regexp/regexp.md  | 17 +--
 .../string-functions/regexp/regexp_extract.md  | 15 --
 .../string-functions/regexp/regexp_extract_all.md  | 13 -
 .../string-functions/regexp/regexp_replace.md  | 15 --
 .../string-functions/regexp/regexp_replace_one.md  | 15 --
 .../string-functions/regexp/not-regexp.md  | 56 --
 .../string-functions/regexp/regexp-extract-all.md  | 11 -
 .../string-functions/regexp/regexp-extract.md  | 13 -
 .../string-functions/regexp/regexp-replace-one.md  | 13 -
 .../string-functions/regexp/regexp-replace.md  | 13 -
 .../string-functions/regexp/regexp.md  | 15 --
 .../string-functions/regexp/not-regexp.md  | 56 --
 .../string-functions/regexp/regexp-extract-all.md  | 11 -
 .../string-functions/regexp/regexp-extract.md  | 13 -
 .../string-functions/regexp/regexp-replace-one.md  | 13 -
 .../string-functions/regexp/regexp-replace.md  | 13 -
 .../string-functions/regexp/regexp.md  | 15 --
 .../string-functions/regexp/not-regexp.md  | 56 --
 .../string-functions/regexp/regexp-extract-all.md  | 11 -
 .../string-functions/regexp/regexp-extract.md  | 13 -
 .../string-functions/regexp/regexp-replace-one.md  | 13 -
 .../string-functions/regexp/regexp-replace.md  | 13 -
 .../string-functions/regexp/regexp.md  | 15 --
 versioned_sidebars/version-1.2-sidebars.json   |  3 +-
 versioned_sidebars/version-2.0-sidebars.json   |  3 +-
 versioned_sidebars/version-2.1-sidebars.json   |  3 +-
 versioned_sidebars/version-3.0-sidebars.json   |  3 +-
 65 files changed, 570 inser
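The docs change mentions Chinese character matching. As a hedged illustration (the PR's exact regexp guidance is not reproduced in this mail), matching Chinese text in a pattern typically relies on a Unicode code-point range rather than `\w`-style classes; the range chosen below is the common CJK Unified Ideographs block and is an assumption here:

```python
import re

# CJK Unified Ideographs range; an illustrative choice for matching Chinese.
han = re.compile(r'[\u4e00-\u9fff]+')

print(han.findall('Apache Doris 数据库'))  # ['数据库']
print(bool(han.search('Apache Doris')))   # False
```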

(doris) branch master updated: [opt](parse) optimize parsing string to datetime (#38385)

2024-07-29 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new bb5b05b1f44 [opt](parse) optimize parsing string to datetime (#38385)
bb5b05b1f44 is described below

commit bb5b05b1f447bddf21fe1b4ed8c8b720aaa8291f
Author: zclllhhjj 
AuthorDate: Mon Jul 29 23:52:41 2024 +0800

[opt](parse) optimize parsing string to datetime (#38385)
---
 be/src/vec/functions/function_cast.h   | 81 ++
 be/src/vec/runtime/vdatetime_value.cpp | 29 +++-
 2 files changed, 51 insertions(+), 59 deletions(-)

diff --git a/be/src/vec/functions/function_cast.h 
b/be/src/vec/functions/function_cast.h
index af2fadc84c2..5f3968e512b 100644
--- a/be/src/vec/functions/function_cast.h
+++ b/be/src/vec/functions/function_cast.h
@@ -978,9 +978,9 @@ struct NameToDateTime {
 static constexpr auto name = "toDateTime";
 };
 
-template 
+template 
 bool try_parse_impl(typename DataType::FieldType& x, ReadBuffer& rb, 
FunctionContext* context,
-Additions additions [[maybe_unused]] = Additions()) {
+UInt32 scale [[maybe_unused]] = 0) {
 if constexpr (IsDateTimeType) {
 return try_read_datetime_text(x, rb, context->state()->timezone_obj());
 }
@@ -994,7 +994,6 @@ bool try_parse_impl(typename DataType::FieldType& x, 
ReadBuffer& rb, FunctionCon
 }
 
 if constexpr (IsDateTimeV2Type) {
-UInt32 scale = additions;
 return try_read_datetime_v2_text(x, rb, 
context->state()->timezone_obj(), scale);
 }
 
@@ -1032,7 +1031,6 @@ bool try_parse_impl(typename DataType::FieldType& x, 
ReadBuffer& rb, FunctionCon
 
 template 
 StringParser::ParseResult try_parse_decimal_impl(typename DataType::FieldType& 
x, ReadBuffer& rb,
- const cctz::time_zone& 
local_time_zone,
  Additions additions
  [[maybe_unused]] = 
Additions()) {
 if constexpr (IsDataTypeDecimalV2) {
@@ -1461,15 +1459,9 @@ private:
 const char* name;
 };
 
-struct NameCast {
-static constexpr auto name = "CAST";
-};
-
-template 
-struct ConvertThroughParsing {
-static_assert(std::is_same_v,
-  "ConvertThroughParsing is only applicable for String or 
FixedString data types");
-
+// always from DataTypeString
+template 
+struct StringParsing {
 using ToFieldType = typename ToDataType::FieldType;
 
 static bool is_all_read(ReadBuffer& in) { return in.eof(); }
@@ -1482,48 +1474,38 @@ struct ConvertThroughParsing {
 ColumnDecimal, 
ColumnVector>;
 
 const IColumn* col_from = 
block.get_by_position(arguments[0]).column.get();
-const ColumnString* col_from_string = 
check_and_get_column(col_from);
+const auto* col_from_string = 
check_and_get_column(col_from);
 
-if (std::is_same_v && !col_from_string) {
+if (!col_from_string) {
 return Status::RuntimeError("Illegal column {} of first argument 
of function {}",
 col_from->get_name(), Name::name);
 }
 
-size_t size = input_rows_count;
+size_t row = input_rows_count;
 typename ColVecTo::MutablePtr col_to = nullptr;
 
 if constexpr (IsDataTypeDecimal) {
 UInt32 scale = ((PrecisionScaleArg)additions).scale;
 ToDataType::check_type_scale(scale);
-col_to = ColVecTo::create(size, scale);
+col_to = ColVecTo::create(row, scale);
 } else {
-col_to = ColVecTo::create(size);
+col_to = ColVecTo::create(row);
 }
 
 typename ColVecTo::Container& vec_to = col_to->get_data();
 
 ColumnUInt8::MutablePtr col_null_map_to;
 ColumnUInt8::Container* vec_null_map_to [[maybe_unused]] = nullptr;
-col_null_map_to = ColumnUInt8::create(size);
+col_null_map_to = ColumnUInt8::create(row);
 vec_null_map_to = &col_null_map_to->get_data();
 
-const ColumnString::Chars* chars = nullptr;
-const IColumn::Offsets* offsets = nullptr;
-size_t fixed_string_size = 0;
-
-if constexpr (std::is_same_v) {
-chars = &col_from_string->get_chars();
-offsets = &col_from_string->get_offsets();
-}
+const ColumnString::Chars* chars = &col_from_string->get_chars();
+const IColumn::Offsets* offsets = &col_from_string->get_offsets();
 
 size_t current_offset = 0;
-for (size_t i = 0; i < size; ++i) {
-size_t next_offset = std::is_same_v
-

(doris-website) branch master updated: [docs](function) support array_contains_all function (#914)

2024-07-28 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
 new 259468b6d29 [docs](function) support array_contains_all function (#914)
259468b6d29 is described below

commit 259468b6d29dd3c8b877b1333af9f921336b94b3
Author: zhangstar333 <87313068+zhangstar...@users.noreply.github.com>
AuthorDate: Mon Jul 29 10:43:00 2024 +0800

[docs](function) support array_contains_all function (#914)
---
 .../array-functions/array-contains_all.md  | 84 ++
 .../array-functions/array-contains_all.md  | 83 +
 sidebars.json  |  1 +
 3 files changed, 168 insertions(+)

diff --git 
a/docs/sql-manual/sql-functions/array-functions/array-contains_all.md 
b/docs/sql-manual/sql-functions/array-functions/array-contains_all.md
new file mode 100644
index 000..db49aa496b4
--- /dev/null
+++ b/docs/sql-manual/sql-functions/array-functions/array-contains_all.md
@@ -0,0 +1,84 @@
+---
+{
+"title": "ARRAY_CONTAINS_ALL",
+"language": "en"
+}
+---
+
+
+
+## array_contains_all
+
+array_contains_all
+
+### description
+
+#### Syntax
+
+`BOOLEAN array_contains_all(ARRAY array1, ARRAY array2)`
+
+Check whether array1 contains the subarray array2, with the element order exactly the same (the match must be contiguous, as the first example below shows). The return results are as follows:
+
+```
+1- array1 contains subarray array2;
+0- array1 does not contain subarray array2;
+NULL - array1 or array2 is NULL.
+```
+
+### example
+
+```
+mysql [(none)]>select array_contains_all([1,2,3,4], [1,2,4]);
++-+
+| array_contains_all([1, 2, 3, 4], [1, 2, 4]) |
++-+
+|   0 |
++-+
+1 row in set (0.01 sec)
+
+mysql [(none)]>select array_contains_all([1,2,3,4], [1,2]);
++--+
+| array_contains_all([1, 2, 3, 4], [1, 2]) |
++--+
+|1 |
++--+
+1 row in set (0.01 sec)
+
+mysql [(none)]>select array_contains_all([1,2,3,4], []);
++--+
+| array_contains_all([1, 2, 3, 4], cast([] as ARRAY)) |
++--+
+|1 |
++--+
+1 row in set (0.01 sec)
+
+mysql [(none)]>select array_contains_all([1,2,3,4], NULL);
+++
+| array_contains_all([1, 2, 3, 4], NULL) |
+++
+|   NULL |
+++
+1 row in set (0.00 sec)
+```
+
+### keywords
+
+ARRAY,CONTAIN,ARRAY_CONTAINS_ALL
+
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/array-functions/array-contains_all.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/array-functions/array-contains_all.md
new file mode 100644
index 000..7ed794ab296
--- /dev/null
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/array-functions/array-contains_all.md
@@ -0,0 +1,83 @@
+---
+{
+"title": "ARRAY_CONTAINS_ALL",
+"language": "zh-CN"
+}
+---
+
+
+
+## array_contains_all
+
+array_contains_all
+
+### description
+
+#### Syntax
+
+`BOOLEAN array_contains_all(ARRAY array1, ARRAY array2)`
+
+Check whether array1 contains the subarray array2, requiring the element order to be exactly the same. The return values are as follows:
+
+```
+1- array1 contains the subarray array2;
+0- array1 does not contain the subarray array2;
+NULL - array1 or array2 is NULL.
+```
+
+### example
+
+```
+mysql [(none)]>select array_contains_all([1,2,3,4], [1,2,4]);
++-+
+| array_contains_all([1, 2, 3, 4], [1, 2, 4]) |
++-+
+|   0 |
++-+
+1 row in set (0.01 sec)
+
+mysql [(none)]>select array_contains_all([1,2,3,4], [1,2]);
++--+
+| array_contains_all([1, 2, 3, 4], [1, 2]) |
++--+
+|1 |
++--+
+1 row in set (0.01 sec)
+
+mysql [(none)]>select array_contains_all([1,2,3,4], []);
++--+
+| array_contains_all([1, 2, 3, 4], cast([] as ARRAY)) |
++--+
+|  
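The order-sensitive containment this function documents (note that `[1,2,4]` is not matched inside `[1,2,3,4]`, so the subarray must be contiguous) can be modeled with a sliding-window sketch. The function name mirrors the SQL one, but the use of `None` for `NULL` and the Python lists are illustrative only:

```python
def array_contains_all(arr1, arr2):
    # NULL (None) inputs propagate, mirroring the doc's third case.
    if arr1 is None or arr2 is None:
        return None
    # The empty subarray is trivially contained.
    if not arr2:
        return 1
    n, m = len(arr1), len(arr2)
    # Slide a window of len(arr2) over arr1 and compare contiguously.
    return int(any(arr1[i:i + m] == arr2 for i in range(n - m + 1)))

print(array_contains_all([1, 2, 3, 4], [1, 2, 4]))  # 0
print(array_contains_all([1, 2, 3, 4], [1, 2]))     # 1
print(array_contains_all([1, 2, 3, 4], []))         # 1
print(array_contains_all([1, 2, 3, 4], None))       # None
```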

(doris) branch branch-2.0 updated: [opt](MultiCast) Avoid copying while holding a lock (#38348)

2024-07-28 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch branch-2.0
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-2.0 by this push:
 new 535ae2c9806 [opt](MultiCast) Avoid copying while holding a lock 
(#38348)
535ae2c9806 is described below

commit 535ae2c9806309f5168056c4ddf215e566b7e7fd
Author: Mryange <59914473+mrya...@users.noreply.github.com>
AuthorDate: Mon Jul 29 10:40:16 2024 +0800

[opt](MultiCast) Avoid copying while holding a lock (#38348)

Picked from https://github.com/apache/doris/pull/37462.
The code differs significantly between branches, so the original commit could not be cherry-picked directly.
---
 be/src/pipeline/exec/multi_cast_data_streamer.cpp | 62 +--
 be/src/pipeline/exec/multi_cast_data_streamer.h   |  9 +++-
 2 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/be/src/pipeline/exec/multi_cast_data_streamer.cpp 
b/be/src/pipeline/exec/multi_cast_data_streamer.cpp
index 373a896852e..75dfe293500 100644
--- a/be/src/pipeline/exec/multi_cast_data_streamer.cpp
+++ b/be/src/pipeline/exec/multi_cast_data_streamer.cpp
@@ -21,31 +21,41 @@
 
 namespace doris::pipeline {
 
-MultiCastBlock::MultiCastBlock(vectorized::Block* block, int used_count, 
size_t mem_size)
-: _used_count(used_count), _mem_size(mem_size) {
+MultiCastBlock::MultiCastBlock(vectorized::Block* block, int used_count, int 
un_finish_copy,
+   size_t mem_size)
+: _used_count(used_count), _un_finish_copy(un_finish_copy), 
_mem_size(mem_size) {
 _block = 
vectorized::Block::create_unique(block->get_columns_with_type_and_name());
 block->clear();
 }
 
 Status MultiCastDataStreamer::pull(int sender_idx, doris::vectorized::Block* 
block, bool* eos) {
-std::lock_guard l(_mutex);
-auto& pos_to_pull = _sender_pos_to_read[sender_idx];
-if (pos_to_pull != _multi_cast_blocks.end()) {
-if (pos_to_pull->_used_count == 1) {
-DCHECK(pos_to_pull == _multi_cast_blocks.begin());
-pos_to_pull->_block->swap(*block);
+int* un_finish_copy = nullptr;
+int use_count = 0;
+{
+std::lock_guard l(_mutex);
+auto& pos_to_pull = _sender_pos_to_read[sender_idx];
+const auto end = _multi_cast_blocks.end();
+if (pos_to_pull != end) {
+*block = *pos_to_pull->_block;
 
 _cumulative_mem_size -= pos_to_pull->_mem_size;
-pos_to_pull++;
-_multi_cast_blocks.pop_front();
-} else {
-pos_to_pull->_block->create_same_struct_block(0)->swap(*block);
-
RETURN_IF_ERROR(vectorized::MutableBlock(block).merge(*pos_to_pull->_block));
 pos_to_pull->_used_count--;
+use_count = pos_to_pull->_used_count;
+un_finish_copy = &pos_to_pull->_un_finish_copy;
+
 pos_to_pull++;
 }
+*eos = _eos and pos_to_pull == end;
+}
+
+if (un_finish_copy) {
+if (use_count == 0) {
+// will clear _multi_cast_blocks
+_wait_copy_block(block, *un_finish_copy);
+} else {
+_copy_block(block, *un_finish_copy);
+}
 }
-*eos = _eos and pos_to_pull == _multi_cast_blocks.end();
 return Status::OK();
 }
 
@@ -60,12 +70,33 @@ void MultiCastDataStreamer::close_sender(int sender_idx) {
 _multi_cast_blocks.pop_front();
 } else {
 pos_to_pull->_used_count--;
+pos_to_pull->_un_finish_copy--;
 pos_to_pull++;
 }
 }
 _closed_sender_count++;
 }
 
+void MultiCastDataStreamer::_copy_block(vectorized::Block* block, int& 
un_finish_copy) {
+const auto rows = block->rows();
+for (int i = 0; i < block->columns(); ++i) {
+block->get_by_position(i).column = 
block->get_by_position(i).column->clone_resized(rows);
+}
+
+std::unique_lock l(_mutex);
+un_finish_copy--;
+if (un_finish_copy == 0) {
+l.unlock();
+_cv.notify_one();
+}
+}
+
+void MultiCastDataStreamer::_wait_copy_block(vectorized::Block* block, int& 
un_finish_copy) {
+std::unique_lock l(_mutex);
+_cv.wait(l, [&]() { return un_finish_copy == 0; });
+_multi_cast_blocks.pop_front();
+}
+
 Status MultiCastDataStreamer::push(RuntimeState* state, 
doris::vectorized::Block* block, bool eos) {
 auto rows = block->rows();
 COUNTER_UPDATE(_process_rows, rows);
@@ -79,7 +110,8 @@ Status MultiCastDataStreamer::push(RuntimeState* state, 
doris::vectorized::Block
 // TODO: if the [queue back block rows + block->rows()] < batch_size, 
better
 // do merge block. but need check the need_process_count and used_count 
whether
 // equal
-_multi_cast_blocks.emplace_back(block, need_process_count, block_mem_size);
+_multi_cast_blocks.emplac

(doris) branch master updated (629e094c1f3 -> 60eea39cd60)

2024-07-28 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from 629e094c1f3 [Refactor]use async to get be resource (#38389)
 add 60eea39cd60 [fix](type)support runtime predicate for time type  
(#38258)

No new revisions were added by this update.

Summary of changes:
 be/src/exec/olap_common.h  |  3 +++
 be/src/runtime/runtime_predicate.cpp   | 10 
 .../runtime/time_value.h}  | 22 --
 .../time_type/test_time_in_runtimepredicate.out}   | 12 ++
 .../time_type/test_time_in_runtimepredicate.groovy | 27 +-
 5 files changed, 41 insertions(+), 33 deletions(-)
 copy be/src/{http/action/file_cache_action.h => vec/runtime/time_value.h} (65%)
 copy regression-test/data/{correctness_p0/test_case_when_decimal.out => 
datatype_p0/time_type/test_time_in_runtimepredicate.out} (59%)
 copy be/src/cloud/cloud_rowset_writer.h => 
regression-test/suites/datatype_p0/time_type/test_time_in_runtimepredicate.groovy
 (52%)





(doris) branch master updated: [feature](function) support ngram_search function (#38226)

2024-07-26 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new a59fdf4ead0 [feature](function) support ngram_search function (#38226)
a59fdf4ead0 is described below

commit a59fdf4ead09103f492f49c0d4dfe5dcac7a5dc3
Author: Mryange <59914473+mrya...@users.noreply.github.com>
AuthorDate: Fri Jul 26 16:07:22 2024 +0800

[feature](function) support ngram_search function (#38226)

mysql [test]>select ngram_search('123456789' , '12345' , 3);
+---+
| ngram_search('123456789', '12345', 3) |
+---+
|   0.6 |
+---+
1 row in set (0.01 sec)

mysql [test]>select ngram_search("abababab","babababa",2);
+-+
| ngram_search('abababab', 'babababa', 2) |
+-+
|   1 |
+-+
1 row in set (0.01 sec)
```

doc https://github.com/apache/doris-website/pull/899
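Both results above are consistent with a set-based Dice coefficient over character n-grams, 2*|T∩P|/(|T|+|P|), where T and P are the distinct n-grams of the text and pattern. The sketch below is a hedged reading of those semantics, not the BE implementation in function_string.h:

```python
def ngram_search(text: str, pattern: str, n: int) -> float:
    """Set-based Dice similarity over character n-grams (assumed semantics)."""
    grams = lambda s: {s[i:i + n] for i in range(len(s) - n + 1)}
    t, p = grams(text), grams(pattern)
    if not t or not p:
        return 0.0
    return 2 * len(t & p) / (len(t) + len(p))

print(ngram_search('123456789', '12345', 3))    # 0.6, matching the first query
print(ngram_search('abababab', 'babababa', 2))  # 1.0, matching the second
```

Note the second call returns 1.0 even though the strings differ, because their sets of distinct 2-grams coincide; bag (multiset) semantics would give a smaller value.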
---
 be/src/vec/functions/function_string.cpp   |   1 +
 be/src/vec/functions/function_string.h | 127 +
 .../doris/catalog/BuiltinScalarFunctions.java  |   2 +
 .../expressions/functions/scalar/NgramSearch.java  |  78 +
 .../expressions/visitor/ScalarFunctionVisitor.java |   5 +
 .../string_functions/test_string_function.out  | Bin 4217 -> 4562 bytes
 .../string_functions/test_string_function.groovy   |  29 +
 7 files changed, 242 insertions(+)

diff --git a/be/src/vec/functions/function_string.cpp 
b/be/src/vec/functions/function_string.cpp
index 955468f1a16..223d32a5682 100644
--- a/be/src/vec/functions/function_string.cpp
+++ b/be/src/vec/functions/function_string.cpp
@@ -1062,6 +1062,7 @@ void register_function_string(SimpleFunctionFactory& 
factory) {
 factory.register_function>();
 factory.register_function();
 factory.register_function();
+factory.register_function();
 
 factory.register_alias(FunctionLeft::name, "strleft");
 factory.register_alias(FunctionRight::name, "strright");
diff --git a/be/src/vec/functions/function_string.h 
b/be/src/vec/functions/function_string.h
index 5e119e2146c..22eeb93591f 100644
--- a/be/src/vec/functions/function_string.h
+++ b/be/src/vec/functions/function_string.h
@@ -56,6 +56,7 @@
 #include "vec/columns/column.h"
 #include "vec/columns/column_const.h"
 #include "vec/columns/column_vector.h"
+#include "vec/common/hash_table/phmap_fwd_decl.h"
 #include "vec/common/int_exp.h"
 #include "vec/common/memcmp_small.h"
 #include "vec/common/memcpy_small.h"
@@ -3674,4 +3675,130 @@ private:
 }
 }
 };
+
+class FunctionNgramSearch : public IFunction {
+public:
+static constexpr auto name = "ngram_search";
+static FunctionPtr create() { return 
std::make_shared(); }
+String get_name() const override { return name; }
+size_t get_number_of_arguments() const override { return 3; }
+DataTypePtr get_return_type_impl(const DataTypes& arguments) const 
override {
+return std::make_shared();
+}
+
+// ngram_search(text,pattern,gram_num)
+Status execute_impl(FunctionContext* context, Block& block, const 
ColumnNumbers& arguments,
+size_t result, size_t input_rows_count) const override 
{
+CHECK_EQ(arguments.size(), 3);
+auto col_res = ColumnFloat64::create();
+bool col_const[3];
+ColumnPtr argument_columns[3];
+for (int i = 0; i < 3; ++i) {
+std::tie(argument_columns[i], col_const[i]) =
+
unpack_if_const(block.get_by_position(arguments[i]).column);
+}
+// There is no need to check if the 2-th,3-th parameters are const 
here because fe has already checked them.
+auto pattern = assert_cast(argument_columns[1].get())->get_data_at(0);
+auto gram_num = assert_cast(argument_columns[2].get())->get_element(0);
+const auto* text_col = assert_cast(argument_columns[0].get());
+
+if (col_const[0]) {
+_execute_impl(text_col, pattern, gram_num, *col_res, 
input_rows_count);
+} else {
+_execute_impl(text_col, pattern, gram_num, *col_res, 
input_rows_count);
+}
+
+block.replace_by_position(result, std::move(col_res));
+return Status::OK();
+}
+
+private:
+using NgramMap = phmap::flat_hash_map;
+// In the map, the key is the CRC32 hash result of a substring 

(doris) branch master updated: [Refactor](exec) remove the useless file and refactor the agg code (#38193)

2024-07-24 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 8d0b7f17e87 [Refactor](exec) remove the useless file and refactor the agg code (#38193)
8d0b7f17e87 is described below

commit 8d0b7f17e876d985a8da078ad9f9d3ff380f7a30
Author: HappenLee 
AuthorDate: Wed Jul 24 20:53:02 2024 +0800

[Refactor](exec) remove the useless file and refactor the agg code (#38193)

1. delete the file: be/src/vec/utils/count_by_enum_helpers.hpp
2. move the code to: be/src/vec/aggregate_functions/aggregate_function_count_by_enum.h
---
 .../aggregate_function_count_by_enum.h | 45 +--
 be/src/vec/utils/count_by_enum_helpers.hpp | 67 --
 2 files changed, 41 insertions(+), 71 deletions(-)

diff --git a/be/src/vec/aggregate_functions/aggregate_function_count_by_enum.h 
b/be/src/vec/aggregate_functions/aggregate_function_count_by_enum.h
index 93a5103ef59..5d4a3dde355 100644
--- a/be/src/vec/aggregate_functions/aggregate_function_count_by_enum.h
+++ b/be/src/vec/aggregate_functions/aggregate_function_count_by_enum.h
@@ -14,13 +14,15 @@
 // KIND, either express or implied.  See the License for the
 // specific language governing permissions and limitations
 // under the License.
-// This file is copied from
-// 
https://github.com/ClickHouse/ClickHouse/blob/master/src/AggregateFunctions/AggregateFunctionCount.h
-// and modified by Doris
 
 #pragma once
 
+#include 
+#include 
+#include 
+
 #include 
+#include 
 
 #include "common/logging.h"
 #include "vec/aggregate_functions/aggregate_function.h"
@@ -28,10 +30,45 @@
 #include "vec/common/assert_cast.h"
 #include "vec/data_types/data_type_number.h"
 #include "vec/io/io_helper.h"
-#include "vec/utils/count_by_enum_helpers.hpp"
 
 namespace doris::vectorized {
 
+struct CountByEnumData {
+std::unordered_map cbe;
+uint64_t not_null = 0;
+uint64_t null = 0;
+uint64_t all = 0;
+};
+
+void build_json_from_vec(rapidjson::StringBuffer& buffer,
+ const std::vector& data_vec) {
+rapidjson::Document doc;
+doc.SetArray();
+rapidjson::Document::AllocatorType& allocator = doc.GetAllocator();
+
+int vec_size_number = data_vec.size();
+for (int idx = 0; idx < vec_size_number; ++idx) {
+rapidjson::Value obj(rapidjson::kObjectType);
+
+rapidjson::Value obj_cbe(rapidjson::kObjectType);
+std::unordered_map unordered_map = 
data_vec[idx].cbe;
+for (auto it : unordered_map) {
+rapidjson::Value key_cbe(it.first.c_str(), allocator);
+rapidjson::Value value_cbe(it.second);
+obj_cbe.AddMember(key_cbe, value_cbe, allocator);
+}
+obj.AddMember("cbe", obj_cbe, allocator);
+obj.AddMember("notnull", data_vec[idx].not_null, allocator);
+obj.AddMember("null", data_vec[idx].null, allocator);
+obj.AddMember("all", data_vec[idx].all, allocator);
+
+doc.PushBack(obj, allocator);
+}
+
+rapidjson::Writer writer(buffer);
+doc.Accept(writer);
+}
+
 struct AggregateFunctionCountByEnumData {
 using MapType = std::unordered_map;
 
diff --git a/be/src/vec/utils/count_by_enum_helpers.hpp 
b/be/src/vec/utils/count_by_enum_helpers.hpp
deleted file mode 100644
index 20c38b765bc..000
--- a/be/src/vec/utils/count_by_enum_helpers.hpp
+++ /dev/null
@@ -1,67 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements.  See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership.  The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License.  You may obtain a copy of the License at
-//
-//   http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied.  See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-#pragma once
-
-#include 
-#include 
-#include 
-
-#include 
-
-#include "vec/data_types/data_type_decimal.h"
-#include "vec/io/io_helper.h"
-
-namespace doris::vectorized {
-
-struct CountByEnumData {
-std::unordered_map cbe;
-uint64_t not_null;
-uint64_t null;
-uint64_t all;
-};
-
-void build_json_from_vec(rapidjson::StringBuffer& buffer,
- const std::vector<CountByEnumData>& data_vec) {
-rapidjson::Document doc;
-  

(doris) branch master updated: [Bug](agg) fix collect_set function core dump without arena pool (#38234)

2024-07-24 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new c16ea419b27 [Bug](agg) fix collect_set function core dump without 
arena pool (#38234)
c16ea419b27 is described below

commit c16ea419b27095a37d54005815f441512ef5b782
Author: zhangstar333 <87313068+zhangstar...@users.noreply.github.com>
AuthorDate: Wed Jul 24 16:18:07 2024 +0800

[Bug](agg) fix collect_set function core dump without arena pool (#38234)

Previously, add_range_single_place passed nullptr as the arena object, but the
collect_set function needs to save its data in the arena, so it would core dump
without an arena pool.
---
 be/src/pipeline/exec/analytic_source_operator.cpp| 4 +---
 .../data/nereids_function_p0/agg_function/group_unique_array.out | 9 +
 .../nereids_function_p0/agg_function/group_unique_array.groovy   | 3 +++
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/be/src/pipeline/exec/analytic_source_operator.cpp 
b/be/src/pipeline/exec/analytic_source_operator.cpp
index a036481d727..406108fbc4f 100644
--- a/be/src/pipeline/exec/analytic_source_operator.cpp
+++ b/be/src/pipeline/exec/analytic_source_operator.cpp
@@ -279,8 +279,6 @@ void AnalyticLocalState::_destroy_agg_status() {
 }
 }
 
-//now is execute for lead/lag row_number/rank/dense_rank/ntile functions
-//sum min max count avg first_value last_value functions
 void AnalyticLocalState::_execute_for_win_func(int64_t partition_start, 
int64_t partition_end,
int64_t frame_start, int64_t 
frame_end) {
 for (size_t i = 0; i < _agg_functions_size; ++i) {
@@ -292,7 +290,7 @@ void AnalyticLocalState::_execute_for_win_func(int64_t 
partition_start, int64_t
 partition_start, partition_end, frame_start, frame_end,
 _fn_place_ptr +
 
_parent->cast()._offsets_of_aggregate_states[i],
-agg_columns.data(), nullptr);
+agg_columns.data(), _agg_arena_pool.get());
 
 // If the end is not greater than the start, the current window should 
be empty.
 _current_window_empty =
diff --git 
a/regression-test/data/nereids_function_p0/agg_function/group_unique_array.out 
b/regression-test/data/nereids_function_p0/agg_function/group_unique_array.out
index 036ac5ce57f..74c053e38f6 100644
--- 
a/regression-test/data/nereids_function_p0/agg_function/group_unique_array.out
+++ 
b/regression-test/data/nereids_function_p0/agg_function/group_unique_array.out
@@ -8,3 +8,12 @@
 3  ["2023-01-02"]  ["hello"]
 4  ["2023-01-02", "2023-01-03"]["sql"]
 
+-- !3 --
+["doris", "world", "hello", "sql"]
+["doris", "world", "hello", "sql"]
+["doris", "world", "hello", "sql"]
+["doris", "world", "hello", "sql"]
+["doris", "world", "hello", "sql"]
+["doris", "world", "hello", "sql"]
+["doris", "world", "hello", "sql"]
+
diff --git 
a/regression-test/suites/nereids_function_p0/agg_function/group_unique_array.groovy
 
b/regression-test/suites/nereids_function_p0/agg_function/group_unique_array.groovy
index f17eadf73a5..f110f1a50c9 100644
--- 
a/regression-test/suites/nereids_function_p0/agg_function/group_unique_array.groovy
+++ 
b/regression-test/suites/nereids_function_p0/agg_function/group_unique_array.groovy
@@ -48,4 +48,7 @@ suite("group_unique_array") {
 qt_2 """
 select k1,collect_set(k2),collect_set(k3,1) from 
test_group_unique_array_table group by k1 order by k1;
 """
+qt_3 """
+select collect_set(k3) over() from test_group_unique_array_table;
+"""
 }
\ No newline at end of file


-
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org



(doris) branch master updated (7229db0f927 -> 67e1bc331c6)

2024-07-23 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from 7229db0f927 [opt](Nereids): disable PRUNE_EMPTY_PARTITION rule in 
SqlTestBase.java (#38246)
 add 67e1bc331c6 [bug](function) fix conv function get wrong result as 
parse overflow  (#38001)

No new revisions were added by this update.

Summary of changes:
 be/src/vec/functions/function_conv.cpp   | 6 +-
 .../data/nereids_p0/sql_functions/math_functions/test_conv.out   | 9 +
 .../nereids_p0/sql_functions/math_functions/test_conv.groovy | 4 
 3 files changed, 18 insertions(+), 1 deletion(-)





(doris-website) branch master updated: [docs](udf) fix some unreached link on branch 2.1/2.0 (#892)

2024-07-23 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
 new c9f8709445 [docs](udf) fix some unreached link on branch 2.1/2.0 (#892)
c9f8709445 is described below

commit c9f87094452b35cabdd145a8d049bb4107065e79
Author: zhangstar333 <87313068+zhangstar...@users.noreply.github.com>
AuthorDate: Wed Jul 24 10:59:39 2024 +0800

[docs](udf) fix some unreached link on branch 2.1/2.0 (#892)
---
 .../version-2.0/query/udf/java-user-defined-function.md | 6 +++---
 .../version-2.1/query/udf/java-user-defined-function.md | 6 +++---
 versioned_docs/version-2.0/query/udf/java-user-defined-function.md  | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/query/udf/java-user-defined-function.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/query/udf/java-user-defined-function.md
index 63bc8f1756..5ac6ab8773 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/query/udf/java-user-defined-function.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/query/udf/java-user-defined-function.md
@@ -40,7 +40,7 @@ Doris 支持使用 JAVA 编写 UDF、UDAF 和 UDTF。下文如无特殊说明,
 
 否则将会返回错误状态信息 `Couldn't open file ..`。
 
-更多语法帮助可参阅 [CREATE 
FUNCTION](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-FUNCTION.md).
+更多语法帮助可参阅 [CREATE 
FUNCTION](../../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-FUNCTION.md).
 
 ### UDF
 
@@ -89,7 +89,7 @@ UDF 的使用与普通的函数方式一致,唯一的区别在于,内置函
 
 ## 删除 UDF
 
-当你不再需要 UDF 函数时,你可以通过下述命令来删除一个 UDF 函数,可以参考 [DROP 
FUNCTION](../sql-manual/sql-statements/Data-Definition-Statements/Drop/DROP-FUNCTION.md)
+当你不再需要 UDF 函数时,你可以通过下述命令来删除一个 UDF 函数,可以参考 [DROP 
FUNCTION](../../sql-manual/sql-reference/Data-Definition-Statements/Drop/DROP-FUNCTION.md)
 
 ## 类型对应关系
 
@@ -354,7 +354,7 @@ public class MedianUDAF {
 
 UDTF 和 UDF 函数一样,需要用户自主实现一个 `evaluate` 方法, 但是 UDTF 函数的返回值必须是 Array 类型。
 
-另外Doris中表函数会因为 `_outer` 后缀有不同的表现,可查看[OUTER 
组合器](../sql-manual/sql-functions/table-functions/explode-numbers-outer.md)
+另外Doris中表函数会因为 `_outer` 后缀有不同的表现,可查看[OUTER 
组合器](../../sql-manual/sql-functions/table-functions/explode-numbers-outer.md)
 
 ```JAVA
 public class UDTFStringTest {
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query/udf/java-user-defined-function.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query/udf/java-user-defined-function.md
index 97da4da5fa..23dc6302a0 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query/udf/java-user-defined-function.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/query/udf/java-user-defined-function.md
@@ -40,7 +40,7 @@ Doris 支持使用 JAVA 编写 UDF、UDAF 和 UDTF。下文如无特殊说明,
 
 否则将会返回错误状态信息 `Couldn't open file ..`。
 
-更多语法帮助可参阅 [CREATE 
FUNCTION](../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-FUNCTION.md).
+更多语法帮助可参阅 [CREATE 
FUNCTION](../../sql-manual/sql-statements/Data-Definition-Statements/Create/CREATE-FUNCTION.md).
 
 ### UDF
 
@@ -89,7 +89,7 @@ UDF 的使用与普通的函数方式一致,唯一的区别在于,内置函
 
 ## 删除 UDF
 
-当你不再需要 UDF 函数时,你可以通过下述命令来删除一个 UDF 函数,可以参考 [DROP 
FUNCTION](../sql-manual/sql-statements/Data-Definition-Statements/Drop/DROP-FUNCTION.md)
+当你不再需要 UDF 函数时,你可以通过下述命令来删除一个 UDF 函数,可以参考 [DROP 
FUNCTION](../../sql-manual/sql-statements/Data-Definition-Statements/Drop/DROP-FUNCTION.md)
 
 ## 类型对应关系
 
@@ -354,7 +354,7 @@ public class MedianUDAF {
 
 UDTF 和 UDF 函数一样,需要用户自主实现一个 `evaluate` 方法, 但是 UDTF 函数的返回值必须是 Array 类型。
 
-另外Doris中表函数会因为 `_outer` 后缀有不同的表现,可查看[OUTER 
组合器](../sql-manual/sql-functions/table-functions/explode-numbers-outer.md)
+另外Doris中表函数会因为 `_outer` 后缀有不同的表现,可查看[OUTER 
组合器](../../sql-manual/sql-functions/table-functions/explode-numbers-outer.md)
 
 ```JAVA
 public class UDTFStringTest {
diff --git a/versioned_docs/version-2.0/query/udf/java-user-defined-function.md 
b/versioned_docs/version-2.0/query/udf/java-user-defined-function.md
index 68437f53f5..f677d71ec6 100644
--- a/versioned_docs/version-2.0/query/udf/java-user-defined-function.md
+++ b/versioned_docs/version-2.0/query/udf/java-user-defined-function.md
@@ -355,7 +355,7 @@ public class MedianUDAF {
 
 Similar to UDFs, UDTFs require users to implement an `evaluate` method. 
However, the return value of a UDTF must be of the Array type.
 
-Additionally, table functions in Doris may exhibit different behaviors due to 
the `_outer` suffix. For more details, refer to [OUTER 
combinator](../sql-manual/sql-functions/table-functions/explode-numbers-outer.md).
+Additionally, table functions in Doris may exhibit different behaviors due to 
the `_outer` suffix. For more details, refer to [OUTER 
combinator](../../sql-manual/sql-functions/table-functions/explode-numbers-outer.md).
 
```JAVA

(doris) branch master updated: [fix](agg) fix RowsProduced counter is not set (#38271)

2024-07-23 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new ed4abc7ff0a [fix](agg) fix RowsProduced counter is not set (#38271)
ed4abc7ff0a is described below

commit ed4abc7ff0a4089277a9569b3b819ecb554ed6e6
Author: TengJianPing <18241664+jackte...@users.noreply.github.com>
AuthorDate: Wed Jul 24 10:20:36 2024 +0800

[fix](agg) fix RowsProduced counter is not set (#38271)

## Proposed changes
`RowsProduced` and `BlocksProduced` counters of the agg source are not right:

```
Pipeline  :  0(instance_num=4):
  RESULT_SINK_OPERATOR  (id=0):
-  PlanInfo
  -  TABLE:  
rqg_158549.table_10_undef_partitions2_keys3_properties4_distributed_by52(table_10_undef_partitions2_keys3_properties4_distributed_by52),
  PREAGGREGATION:  ON
  -  partitions=1/1  
(table_10_undef_partitions2_keys3_properties4_distributed_by52)
  -  tablets=10/10,  
tabletList=39981436181982,39981436181984,39981436181986  ...
  -  cardinality=10,  avgRowSize=0.0,  
numNodes=1
  -  pushAggOp=NONE
-  BlocksProduced:  sum  4,  avg  1,  max  1,  
min  1
-  CloseTime:  avg  28.396us,  max  43.17us,  
min  19.647us
-  ExecTime:  avg  514.688us,  max  694.677us,  
min  353.811us
-  InitTime:  avg  52.136us,  max  55.309us,  
min  46.966us
-  InputRows:  sum  0,  avg  0,  max  0,  min  0
-  MemoryUsage:  sum  ,  avg  ,  max  ,  min
-  PeakMemoryUsage:  sum  0.00  ,  avg  
0.00  ,  max  0.00  ,  min  0.00
-  OpenTime:  avg  212.328us,  max  249.375us,  
min  170.678us
-  RowsProduced:  sum  8,  avg  2,  max  3,  
min  0
-  WaitForDependencyTime:  avg  0ns,  max  0ns, 
 min  0ns
-  
WaitForDependency[RESULT_SINK_OPERATOR_DEPENDENCY]Time:  avg  0ns,  max  0ns,  
min  0ns
  AGGREGATION_OPERATOR  (id=10  ,  nereids_id=598):
-  PlanInfo
  -  output:  count(pk)[#31]
  -  group  by:  
col_varchar_10__undef_signed
  -  sortByGroupKey:false
  -  cardinality=8
  -  projections:  field1,  
col_varchar_10__undef_signed
  -  project  output  tuple  id:  11
-  BlocksProduced:  sum  0,  avg  0,  max  
0,  min  0
-  CloseTime:  avg  5.617us,  max  6.543us, 
 min  5.247us
-  ExecTime:  avg  1.172ms,  max  1.609ms,  
min  289.815us
-  InitTime:  avg  0ns,  max  0ns,  min  0ns
-  MemoryUsage:  sum  ,  avg  ,  max  ,  min
-  PeakMemoryUsage:  sum  0.00  ,  avg  
0.00  ,  max  0.00  ,  min  0.00
-  OpenTime:  avg  130.883us,  max  
143.370us,  min  120.96us
-  ProjectionTime:  avg  420.824us,  max  
636.882us,  min  763ns
-  RowsProduced:  sum  0,  avg  0,  max  0, 
 min  0
-  
WaitForDependency[AGGREGATION_OPERATOR_DEPENDENCY]Time:  avg  72.547ms,  max  
79.260ms,  min  65.118ms
```
---
 be/src/pipeline/exec/aggregation_source_operator.cpp | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/be/src/pipeline/exec/aggregation_source_operator.cpp 
b/be/src/pipeline/exec/aggregation_source_operator.cpp
index 1b7a151e2af..0c05c965f1f 100644
--- a/be/src/pipeline/exec/aggregation_source_operator.cpp
+++ b/be/src/pipeline/exec/aggregation_source_operator.cpp
@@ -460,6 +460,12 @@ void AggLocalState::do_agg_limit(vectorized::Block* block, 
bool* eos) {
 } else {
 reached_limit(block, eos);
 }
+} else {
+if (auto rows = block->rows()) {
+_num_rows_returned += rows;
+COUNTER_UPDATE(_blocks_returned_counter, 1);
+COUNTER_SET(_rows_returned_counter, _num_rows_returned);
+}
 }
 }
 



(doris) branch master updated: [Chore](topn) add case for topn opt (#38154)

2024-07-23 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new d53f006c6f9 [Chore](topn) add case for topn opt (#38154)
d53f006c6f9 is described below

commit d53f006c6f9ce08a737c4190b24d7fd4f14cb2f8
Author: Pxl 
AuthorDate: Wed Jul 24 00:18:15 2024 +0800

[Chore](topn) add case for topn opt (#38154)
---
 .../data/nereids_arith_p0/topn_alltype.out | 1207 
 .../suites/nereids_arith_p0/load.groovy|   43 +
 .../suites/nereids_arith_p0/topn_alltype.groovy|  433 +++
 3 files changed, 1683 insertions(+)

diff --git a/regression-test/data/nereids_arith_p0/topn_alltype.out 
b/regression-test/data/nereids_arith_p0/topn_alltype.out
new file mode 100644
index 000..22f0f3a24a6
--- /dev/null
+++ b/regression-test/data/nereids_arith_p0/topn_alltype.out
@@ -0,0 +1,1207 @@
+-- This file is automatically generated. You should know what you did if you 
want to edit this
+-- !boolean --
+\N
+1
+2
+
+-- !boolean --
+\N
+1
+2
+3
+4
+5
+6
+7
+13
+14
+15
+16
+17
+18
+19
+
+-- !tinyint --
+\N
+1
+13
+
+-- !tinyint --
+\N
+1
+2
+3
+4
+5
+6
+7
+13
+14
+15
+16
+17
+18
+19
+
+-- !smallint --
+\N
+1
+13
+
+-- !smallint --
+\N
+1
+2
+3
+4
+5
+6
+7
+13
+14
+15
+16
+17
+18
+19
+
+-- !int --
+\N
+1
+13
+
+-- !int --
+\N
+1
+2
+3
+4
+5
+6
+7
+13
+14
+15
+16
+17
+18
+19
+
+-- !bigint --
+\N
+1
+13
+
+-- !bigint --
+\N
+1
+2
+3
+4
+5
+6
+7
+13
+14
+15
+16
+17
+18
+19
+
+-- !largeint --
+\N
+1
+13
+
+-- !largeint --
+\N
+1
+2
+3
+4
+5
+6
+7
+13
+14
+15
+16
+17
+18
+19
+
+-- !varchar --
+18
+19
+24
+
+-- !varchar --
+\N
+1
+4
+13
+14
+15
+16
+17
+18
+19
+20
+21
+22
+23
+24
+
+-- !char --
+13
+19
+20
+
+-- !char --
+1
+4
+7
+13
+14
+15
+16
+17
+18
+19
+20
+21
+22
+23
+24
+
+-- !str --
+13
+14
+20
+
+-- !str --
+\N
+1
+4
+13
+14
+15
+16
+17
+18
+19
+20
+21
+22
+23
+24
+
+-- !date --
+\N
+1
+13
+
+-- !date --
+\N
+1
+2
+3
+4
+5
+6
+7
+13
+14
+15
+16
+17
+18
+19
+
+-- !datev2 --
+\N
+1
+13
+
+-- !datev2 --
+\N
+1
+2
+3
+4
+5
+6
+7
+13
+14
+15
+16
+17
+18
+19
+
+-- !datetime --
+\N
+1
+13
+
+-- !datetime --
+\N
+1
+2
+3
+4
+5
+6
+7
+13
+14
+15
+16
+17
+18
+19
+
+-- !datetimev2 --
+\N
+1
+13
+
+-- !datetimev2 --
+\N
+1
+2
+3
+4
+5
+6
+7
+13
+14
+15
+16
+17
+18
+19
+
+-- !decimal32 --
+\N
+1
+2
+
+-- !decimal32 --
+\N
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+11
+12
+13
+14
+
+-- !decimal64 --
+\N
+1
+2
+
+-- !decimal64 --
+\N
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+11
+12
+13
+14
+
+-- !decimal128 --
+\N
+1
+2
+
+-- !decimal128 --
+\N
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+11
+12
+13
+14
+
+-- !decimalv2 --
+\N
+1
+13
+
+-- !decimalv2 --
+\N
+1
+2
+3
+4
+5
+6
+7
+13
+14
+15
+16
+17
+18
+19
+
+-- !decimal256 --
+1
+2
+3
+
+-- !decimal256 --
+1
+2
+3
+4
+5
+
+-- !ipv4 --
+1
+2
+3
+
+-- !ipv4 --
+1
+1
+2
+2
+3
+
+-- !ipv6 --
+1
+2
+3
+
+-- !ipv6 --
+1
+1
+2
+2
+3
+
+-- !boolean --
+1
+2
+3
+
+-- !boolean --
+1
+2
+3
+4
+5
+6
+7
+8
+13
+14
+15
+16
+17
+18
+19
+
+-- !tinyint --
+1
+2
+13
+
+-- !tinyint --
+1
+2
+3
+4
+5
+6
+7
+8
+13
+14
+15
+16
+17
+18
+19
+
+-- !smallint --
+1
+2
+13
+
+-- !smallint --
+1
+2
+3
+4
+5
+6
+7
+8
+13
+14
+15
+16
+17
+18
+19
+
+-- !int --
+1
+2
+13
+
+-- !int --
+1
+2
+3
+4
+5
+6
+7
+8
+13
+14
+15
+16
+17
+18
+19
+
+-- !bigint --
+1
+2
+13
+
+-- !bigint --
+1
+2
+3
+4
+5
+6
+7
+8
+13
+14
+15
+16
+17
+18
+19
+
+-- !largeint --
+1
+2
+13
+
+-- !largeint --
+1
+2
+3
+4
+5
+6
+7
+8
+13
+14
+15
+16
+17
+18
+19
+
+-- !varchar --
+18
+19
+24
+
+-- !varchar --
+1
+4
+7
+13
+14
+15
+16
+17
+18
+19
+20
+21
+22
+23
+24
+
+-- !char --
+13
+19
+20
+
+-- !char --
+1
+4
+7
+13
+14
+15
+16
+17
+18
+19
+20
+21
+22
+23
+24
+
+-- !str --
+13
+14
+20
+
+-- !str --
+1
+4
+7
+13
+14
+15
+16
+17
+18
+19
+20
+21
+22
+23
+24
+
+-- !date --
+1
+2
+13
+
+-- !date --
+1
+2
+3
+4
+5
+6
+7
+8
+13
+14
+15
+16
+17
+18
+19
+
+-- !datev2 --
+1
+2
+13
+
+-- !datev2 --
+1
+2
+3
+4
+5
+6
+7
+8
+13
+14
+15
+16
+17
+18
+19
+
+-- !datetime --
+1
+2
+13
+
+-- !datetime --
+1
+2
+3
+4
+5
+6
+7
+8
+13
+14
+15
+16
+17
+18
+19
+
+-- !datetimev2 --
+1
+2
+13
+
+-- !datetimev2 --
+1
+2
+3
+4
+5
+6
+7
+8
+13
+14
+15
+16
+17
+18
+19
+
+-- !decimal32 --
+1
+2
+3
+
+-- !decimal32 --
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+11
+12
+13
+14
+15
+
+-- !decimal64 --
+1
+2
+3
+
+-- !decimal64 --
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+11
+12
+13
+14
+15
+
+-- !decimal128 --
+1
+2
+3
+
+-- !decimal128 --
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+11
+12
+13
+14
+15
+
+-- !decimalv2 --
+1
+2
+13
+
+-- !decimalv2 --
+1
+2
+3
+4
+5
+6
+7
+8
+13
+14
+15
+16
+17
+18
+19
+
+-- !decimal256 --
+1
+4
+5
+
+-- !decimal256 --
+1
+2
+3
+4
+5
+
+-- !ipv4 --
+1
+1
+2
+
+-- !ipv4 --
+1
+1
+2
+2
+3
+
+-- !ipv6 --
+2
+3
+3
+
+-- !ipv6 --
+1
+2
+2
+3
+3
+
+-- !boolean --
+\N
+1
+2
+
+-- !boolean --
+\N
+1
+2
+3
+4
+5
+6
+7
+13
+14
+15
+16
+17
+18
+19
+
+-- !tinyint --
+\N
+1
+13
+
+-- !tinyint --
+\N
+1

(doris) branch master updated: [fix](profile) Task state of query profile is not set correctly (#38082)

2024-07-23 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 912a23887fc [fix](profile) Task state of query profile is not set 
correctly (#38082)
912a23887fc is described below

commit 912a23887fcae26c34ff6d03ad76bd221e44c5c8
Author: zhiqiang 
AuthorDate: Tue Jul 23 15:29:45 2024 +0800

[fix](profile) Task state of query profile is not set correctly (#38082)

The task state in the connection context is only updated after the profile is
updated, so the task state of the profile should be set to the coordinator's
query state.
---
 .../src/main/java/org/apache/doris/qe/StmtExecutor.java   | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fe/fe-core/src/main/java/org/apache/doris/qe/StmtExecutor.java 
b/fe/fe-core/src/main/java/org/apache/doris/qe/StmtExecutor.java
index af141ff8d12..681f659b1a2 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/qe/StmtExecutor.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/qe/StmtExecutor.java
@@ -387,8 +387,15 @@ public class StmtExecutor {
 builder.endTime(TimeUtils.longToTimeString(currentTimestamp));
 builder.totalTime(DebugUtil.getPrettyStringMs(currentTimestamp - 
context.getStartTime()));
 }
-builder.taskState(!isFinished && 
context.getState().getStateType().equals(MysqlStateType.OK) ? "RUNNING"
-: context.getState().toString());
+String taskState = "RUNNING";
+if (isFinished) {
+if (coord != null) {
+taskState = coord.queryStatus.getErrorCode().name();
+} else {
+taskState = context.getState().toString();
+}
+}
+builder.taskState(taskState);
 builder.user(context.getQualifiedUser());
 builder.defaultDb(context.getDatabase());
 builder.workloadGroup(context.getWorkloadGroupName());





(doris) branch master updated: [Fix](function) fix coredump for MULTI_MATCH_ANY (#37959)

2024-07-22 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new d10a3dca726 [Fix](function) fix coredump for MULTI_MATCH_ANY (#37959)
d10a3dca726 is described below

commit d10a3dca726a3b192952d5ca45e06c4276fc08ef
Author: zclllhhjj 
AuthorDate: Tue Jul 23 10:24:19 2024 +0800

[Fix](function) fix coredump for MULTI_MATCH_ANY (#37959)

[INVALID_ARGUMENT][E33] Compile regexp expression failed. got Embedded 
start anchors not supported.. some expressions may be illegal
---
 be/src/vec/functions/regexps.h | 58 +++--
 .../search_functions/test_multi_string_search.out  | 24 ++-
 .../test_multi_string_search.groovy| 74 +-
 3 files changed, 90 insertions(+), 66 deletions(-)

diff --git a/be/src/vec/functions/regexps.h b/be/src/vec/functions/regexps.h
index d2963d853f5..efa2f77ccd2 100644
--- a/be/src/vec/functions/regexps.h
+++ b/be/src/vec/functions/regexps.h
@@ -21,9 +21,9 @@
 #pragma once
 
 #include 
+#include 
 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -31,11 +31,10 @@
 #include 
 #include 
 
+#include "common/exception.h"
 #include "vec/common/string_ref.h"
 
-namespace doris::vectorized {
-
-namespace multiregexps {
+namespace doris::vectorized::multiregexps {
 
 template 
 struct HyperscanDeleter {
@@ -75,7 +74,9 @@ public:
 
 Regexps* get() {
 std::lock_guard lock(mutex);
-if (regexps) return &*regexps;
+if (regexps) {
+return &*regexps;
+}
 regexps = constructor();
 return &*regexps;
 }
@@ -136,7 +137,9 @@ Regexps constructRegexps(const std::vector& 
str_patterns,
 /// We mark the patterns to provide the callback results.
 if constexpr (save_indices) {
 ids.reset(new unsigned int[patterns.size()]);
-for (size_t i = 0; i < patterns.size(); ++i) ids[i] = 
static_cast(i + 1);
+for (size_t i = 0; i < patterns.size(); ++i) {
+ids[i] = static_cast(i + 1);
+}
 }
 
 for (auto& pattern : patterns) {
@@ -144,24 +147,28 @@ Regexps constructRegexps(const std::vector& 
str_patterns,
 }
 
 hs_error_t err;
-if constexpr (!WithEditDistance)
+if constexpr (!WithEditDistance) {
 err = hs_compile_multi(patterns.data(), flags.data(), ids.get(),
static_cast(patterns.size()), 
HS_MODE_BLOCK, nullptr, &db,
&compile_error);
-else
+} else {
 err = hs_compile_ext_multi(patterns.data(), flags.data(), ids.get(), 
ext_exprs_ptrs.data(),
static_cast(patterns.size()), 
HS_MODE_BLOCK, nullptr,
&db, &compile_error);
+}
 
-if (err != HS_SUCCESS) {
+if (err != HS_SUCCESS) [[unlikely]] {
 /// CompilerError is a unique_ptr, so correct memory free after the 
exception is thrown.
 CompilerError error(compile_error);
 
-if (error->expression < 0)
-LOG(FATAL) << "Logical error: " + String(error->message);
-else
-LOG(FATAL) << "Bad arguments: Pattern " + 
str_patterns[error->expression] +
-  "failed with error " + 
String(error->message);
+if (error->expression < 0) { // error has nothing to do with the 
patterns themselves
+throw doris::Exception(Status::InternalError("Compile regexp 
expression failed. got {}",
+ error->message));
+} else {
+throw doris::Exception(Status::InvalidArgument(
+"Compile regexp expression failed. got {}. some 
expressions may be illegal",
+error->message));
+}
 }
 
 /// We allocate the scratch space only once, then copy it across multiple 
threads with hs_clone_scratch
@@ -169,8 +176,15 @@ Regexps constructRegexps(const std::vector& 
str_patterns,
 hs_scratch_t* scratch = nullptr;
 err = hs_alloc_scratch(db, &scratch);
 
-/// If not HS_SUCCESS, it is guaranteed that the memory would not be 
allocated for scratch.
-if (err != HS_SUCCESS) LOG(FATAL) << "Could not allocate scratch space for 
hyperscan";
+if (err != HS_SUCCESS) [[unlikely]] {
+if (err == HS_NOMEM) [[unlikely]] {
+throw doris::Exception(Status::MemoryAllocFailed(
+"Allocating memory failed on compiling regexp 
expressions."));
+} else {
+throw doris::Exception(Status::InvalidArgument(
+"Compile regexp expression failed with

(doris) branch master updated: [improve](execution) replace the LOG(FATAL) to throw Exception in query execute layer (#38144)

2024-07-22 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 6ac42a93778 [improve](execution) replace the LOG(FATAL) to throw 
Exception in query execute layer (#38144)
6ac42a93778 is described below

commit 6ac42a937787c33dcc2b596c237e2fdbf038fc73
Author: zhangstar333 <87313068+zhangstar...@users.noreply.github.com>
AuthorDate: Tue Jul 23 10:21:47 2024 +0800

[improve](execution) replace the LOG(FATAL) to throw Exception in query 
execute layer (#38144)

Since LOG(FATAL) causes a BE core dump, change it to throw an exception in the
execution engine layer.

## Proposed changes

Issue Number: close #xxx


---
 .../aggregate_function_distinct.cpp|  5 +--
 .../aggregate_function_min_max.h   |  8 ++---
 .../aggregate_functions/aggregate_function_null.h  |  8 +++--
 .../aggregate_function_reader_first_last.h | 12 +---
 .../aggregate_function_window.h| 12 +---
 be/src/vec/aggregate_functions/factory_helpers.h   | 12 +---
 be/src/vec/data_types/data_type.cpp| 17 +++---
 be/src/vec/data_types/data_type_array.h|  3 +-
 be/src/vec/data_types/data_type_bitmap.h   |  3 +-
 be/src/vec/data_types/data_type_decimal.h  |  3 +-
 be/src/vec/data_types/data_type_factory.cpp|  2 +-
 .../vec/data_types/data_type_fixed_length_object.h |  3 +-
 be/src/vec/data_types/data_type_hll.h  |  2 +-
 be/src/vec/data_types/data_type_map.h  |  2 +-
 be/src/vec/data_types/data_type_nothing.cpp|  4 +--
 be/src/vec/data_types/data_type_nothing.h  | 10 --
 be/src/vec/data_types/data_type_nullable.cpp   |  3 +-
 be/src/vec/data_types/data_type_object.h   |  6 +++-
 be/src/vec/data_types/data_type_quantilestate.h|  4 ++-
 be/src/vec/data_types/data_type_struct.cpp |  6 ++--
 be/src/vec/data_types/data_type_struct.h   |  3 +-
 be/src/vec/data_types/data_type_time.h |  2 +-
 be/src/vec/data_types/data_type_time_v2.h  |  2 +-
 .../vec/data_types/serde/data_type_decimal_serde.h |  3 +-
 .../vec/data_types/serde/data_type_number_serde.h  |  3 +-
 .../vec/functions/array/function_array_cum_sum.cpp |  8 +++--
 .../functions/array/function_array_difference.h|  7 +++--
 .../functions/array/function_array_enumerate.cpp   |  7 +++--
 .../array/function_array_enumerate_uniq.cpp| 12 +---
 be/src/vec/functions/function.h| 26 ++--
 be/src/vec/functions/function_binary_arithmetic.h  | 12 +---
 be/src/vec/functions/function_cast.h   | 10 +++---
 .../function_date_or_datetime_computation.h| 14 +
 .../function_date_or_datetime_to_something.h   | 30 +-
 be/src/vec/functions/function_helpers.cpp  | 16 ++
 be/src/vec/functions/function_jsonb.cpp|  3 +-
 be/src/vec/functions/function_string.cpp   |  5 +--
 be/src/vec/functions/function_string_to_string.h   |  5 +--
 be/src/vec/functions/function_unary_arithmetic.h   |  5 +--
 be/src/vec/functions/function_variadic_arguments.h |  8 +++--
 be/src/vec/functions/functions_comparison.h| 18 ++-
 be/src/vec/functions/functions_logical.cpp | 20 +++-
 be/src/vec/functions/if.cpp|  3 +-
 be/src/vec/functions/round.h   | 36 ++
 be/src/vec/json/json_parser.cpp|  6 ++--
 45 files changed, 243 insertions(+), 146 deletions(-)

diff --git a/be/src/vec/aggregate_functions/aggregate_function_distinct.cpp 
b/be/src/vec/aggregate_functions/aggregate_function_distinct.cpp
index 5b2269a27d9..3155aa24be2 100644
--- a/be/src/vec/aggregate_functions/aggregate_function_distinct.cpp
+++ b/be/src/vec/aggregate_functions/aggregate_function_distinct.cpp
@@ -35,8 +35,9 @@ public:
 
 DataTypes transform_arguments(const DataTypes& arguments) const override {
 if (arguments.empty()) {
-LOG(FATAL)
-<< "Incorrect number of arguments for aggregate function 
with Distinct suffix";
+throw doris::Exception(
+ErrorCode::INTERNAL_ERROR,
+"Incorrect number of arguments for aggregate function with 
Distinct suffix");
 }
 return arguments;
 }
diff --git a/be/src/vec/aggregate_functions/aggregate_function_min_max.h 
b/be/src/vec/aggregate_functions/aggregate_function_min_max.h
index f7cf69b3aca..7fe6e2923e1 100644
--- a/be/src/vec/aggregate_functions/aggregate_function_min_max.h
+++ b/be/src/vec/aggregate_functions/aggregate_function_min_max.h
@@ -528,10 +528,10 @@ public:
 if (StringRef(Data::name()) == String

(doris) branch master updated: [Refactor](common) refactor the Exception code (#38172)

2024-07-21 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 4cc4944ca13 [Refactor](common) refactor the Exception code (#38172)
4cc4944ca13 is described below

commit 4cc4944ca139948dab78459337e91b3612f2b8c5
Author: HappenLee 
AuthorDate: Mon Jul 22 14:11:15 2024 +0800

[Refactor](common) refactor the Exception code (#38172)

1. Remove the useless code in Exception
2. Use fmt to replace stringstream for performance
---
 be/src/common/exception.cpp   | 19 ---
 be/src/common/exception.h | 18 +-
 be/test/common/exception_test.cpp |  6 --
 3 files changed, 5 insertions(+), 38 deletions(-)

diff --git a/be/src/common/exception.cpp b/be/src/common/exception.cpp
index c6139c0f995..48e1229d44e 100644
--- a/be/src/common/exception.cpp
+++ b/be/src/common/exception.cpp
@@ -32,23 +32,4 @@ Exception::Exception(int code, const std::string_view& msg) {
 LOG(FATAL) << "[ExitOnException] error code: " << code << ", message: 
" << msg;
 }
 }
-
-Exception::Exception(const Exception& nested, int code, const 
std::string_view& msg) {
-_code = code;
-_err_msg = std::make_unique();
-_err_msg->_msg = msg;
-if (ErrorCode::error_states[abs(code)].stacktrace) {
-_err_msg->_stack = get_stack_trace();
-}
-_nested_excption = std::make_unique();
-_nested_excption->_code = nested._code;
-_nested_excption->_err_msg = std::make_unique();
-_nested_excption->_err_msg->_msg = nested._err_msg->_msg;
-_nested_excption->_err_msg->_stack = nested._err_msg->_stack;
-
-if (config::exit_on_exception) {
-LOG(FATAL) << "[ExitOnException] error code: " << code << ", message: 
" << msg;
-}
-}
-
 } // namespace doris
\ No newline at end of file
diff --git a/be/src/common/exception.h b/be/src/common/exception.h
index ce44e658749..b35ef7e8ff8 100644
--- a/be/src/common/exception.h
+++ b/be/src/common/exception.h
@@ -19,8 +19,8 @@
 
 #include 
 #include 
-#include 
 
+#include 
 #include 
 #include 
 #include 
@@ -39,9 +39,6 @@ public:
 Exception() : _code(ErrorCode::OK) {}
 Exception(int code, const std::string_view& msg);
 Exception(const Status& status) : Exception(status.code(), status.msg()) {}
-// add nested exception as first param, or the template may could not find
-// the correct method for ...args
-Exception(const Exception& nested, int code, const std::string_view& msg);
 
 // Format message with fmt::format, like the logging functions.
 template 
@@ -63,7 +60,6 @@ private:
         std::string _stack;
     };
     std::unique_ptr<ErrMsg> _err_msg;
-    std::unique_ptr<Exception> _nested_excption;
     mutable std::string _cache_string;
 };
 
@@ -71,16 +67,12 @@ inline const std::string& Exception::to_string() const {
     if (!_cache_string.empty()) {
         return _cache_string;
     }
-    std::stringstream ostr;
-    ostr << "[E" << _code << "] ";
-    ostr << (_err_msg ? _err_msg->_msg : "");
+    fmt::memory_buffer buf;
+    fmt::format_to(buf, "[E{}] {}", _code, _err_msg ? _err_msg->_msg : "");
     if (_err_msg && !_err_msg->_stack.empty()) {
-        ostr << '\n' << _err_msg->_stack;
+        fmt::format_to(buf, "\n{}", _err_msg->_stack);
     }
-    if (_nested_excption != nullptr) {
-        ostr << '\n' << "Caused by:" << _nested_excption->to_string();
-    }
-    _cache_string = ostr.str();
+    _cache_string = fmt::to_string(buf);
     return _cache_string;
 }
 
diff --git a/be/test/common/exception_test.cpp 
b/be/test/common/exception_test.cpp
index 344c0bb1faf..0878c394281 100644
--- a/be/test/common/exception_test.cpp
+++ b/be/test/common/exception_test.cpp
@@ -50,12 +50,6 @@ TEST_F(ExceptionTest, NestedError) {
         throw doris::Exception(ErrorCode::OS_ERROR, "test OS_ERROR {}", "bug");
     } catch (doris::Exception& e1) {
         EXPECT_TRUE(e1.to_string().find("OS_ERROR") != std::string::npos);
-        try {
-            throw doris::Exception(e1, ErrorCode::INVALID_ARGUMENT, "test INVALID_ARGUMENT");
-        } catch (doris::Exception& e2) {
-            EXPECT_TRUE(e2.to_string().find("OS_ERROR") != std::string::npos);
-            EXPECT_TRUE(e2.to_string().find("INVALID_ARGUMENT") != std::string::npos);
-        }
     }
 }
 


-
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org



(doris) branch master updated: [Fix](function) fix some date function impl in FE with special dates (#37766)

2024-07-19 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new a31aae73f34 [Fix](function) fix some date function impl in FE with 
special dates (#37766)
a31aae73f34 is described below

commit a31aae73f34c104b8e68d5b910cf625f6cb5b3cc
Author: zclllhhjj 
AuthorDate: Fri Jul 19 14:07:18 2024 +0800

[Fix](function) fix some date function impl in FE with special dates 
(#37766)

Make the FE constant-folding results of `dayofyear`, `dayofweek`, and
`weekofyear` match MySQL, mainly for dates in BC 1 (year 0000).

before:
```sql
mysql> select DAYOFWEEK('0000-01-01'),DAYOFWEEK('0000-01-02'),DAYOFWEEK('0000-01-03'),DAYOFWEEK('0000-01-04'),DAYOFWEEK('0000-01-05'),DAYOFWEEK('0000-01-06'),DAYOFWEEK('0000-01-07'),DAYOFWEEK('0000-01-08');
+------+------+------+------+------+------+------+------+
|    7 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |
+------+------+------+------+------+------+------+------+

mysql> select DAYOFYEAR('0000-02-27'),DAYOFYEAR('0000-02-28'),DAYOFYEAR('0000-03-01'),DAYOFYEAR('0000-03-02');
+------+------+------+------+
|   58 |   59 |   61 |   62 |
+------+------+------+------+

mysql> select WEEKOFYEAR('0000-01-01'),WEEKOFYEAR('0000-01-02'),WEEKOFYEAR('0000-01-03'),WEEKOFYEAR('0000-01-04'),WEEKOFYEAR('0000-01-05'),WEEKOFYEAR('0000-01-06'),WEEKOFYEAR('0000-01-07'),WEEKOFYEAR('0000-01-08');
+------+------+------+------+------+------+------+------+
|   52 |   52 |    1 |    1 |    1 |    1 |    1 |    1 |
+------+------+------+------+------+------+------+------+

mysql> select WEEKOFYEAR('0000-02-25'),WEEKOFYEAR('0000-02-26'),WEEKOFYEAR('0000-02-27'),WEEKOFYEAR('0000-02-28'),WEEKOFYEAR('0000-03-01'),WEEKOFYEAR('0000-03-02'),WEEKOFYEAR('2022-03-03');
+------+------+------+------+------+------+------+
|    8 |    8 |    8 |    9 |    9 |    9 |    9 |
+------+------+------+------+------+------+------+
```

after:
```sql
mysql> select DAYOFWEEK('0000-01-01'),DAYOFWEEK('0000-01-02'),DAYOFWEEK('0000-01-03'),DAYOFWEEK('0000-01-04'),DAYOFWEEK('0000-01-05'),DAYOFWEEK('0000-01-06'),DAYOFWEEK('0000-01-07'),DAYOFWEEK('0000-01-08');
+------+------+------+------+------+------+------+------+
|    1 |    2 |    3 |    4 |    5 |    6 |    7 |    1 |
+------+------+------+------+------+------+------+------+

mysql> select DAYOFYEAR('0000-02-27'),DAYOFYEAR('0000-02-28'),DAYOFYEAR('0000-03-01'),DAYOFYEAR('0000-03-02');
+------+------+------+------+
|   58 |   59 |   60 |   61 |
+------+------+------+------+

mysql> select WEEKOFYEAR('0000-01-01'),WEEKOFYEAR('0000-01-02'),WEEKOFYEAR('0000-01-03'),WEEKOFYEAR('0000-01-04'),WEEKOFYEAR('0000-01-05'),WEEKOFYEAR('0000-01-06'),WEEKOFYEAR('0000-01-07'),WEEKOFYEAR('0000-01-08');
+------+------+------+------+------+------+------+------+
|   52 |    1 |    1 |    1 |    1 |    1 |    1 |    1 |
+------+------+------+------+------+------+------+------+

mysql> select WEEKOFYEAR('0000-02-25'),WEEKOFYEAR('0000-02-26'),WEEKOFYEAR('0000-02-27'),WEEKOFYEAR('0000-02-28'),WEEKOFYEAR('0000-03-01'),WEEKOFYEAR('0000-03-02'),WEEKOFYEAR('2022-03-03');
+------+------+------+------+------+------+------+
|    8 |    8 |    9 |    9 |    9 |    9 |    9 |
+------+------+------+------+------+------+------+
```

(doris) branch master updated: [regression](limit) Add group by limit regression test case (#37940)

2024-07-18 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 1e65c24455e [regression](limit) Add group by limit regression test 
case (#37940)
1e65c24455e is described below

commit 1e65c24455ebdf9570cd5f97a8bac1b0956a2f14
Author: HappenLee 
AuthorDate: Fri Jul 19 09:54:55 2024 +0800

[regression](limit) Add group by limit regression test case (#37940)

Add group by limit regression test case
---
 be/src/common/config.cpp   |  2 +
 be/src/common/config.h |  4 ++
 be/src/pipeline/exec/aggregation_sink_operator.cpp |  3 +-
 .../data/query_p0/limit/test_group_by_limit.out| 66 ++
 .../query_p0/limit/test_group_by_limit.groovy  | 64 +
 5 files changed, 138 insertions(+), 1 deletion(-)

diff --git a/be/src/common/config.cpp b/be/src/common/config.cpp
index 3e9203987c2..b152111011e 100644
--- a/be/src/common/config.cpp
+++ b/be/src/common/config.cpp
@@ -1343,6 +1343,8 @@ DEFINE_mBool(ignore_not_found_file_in_external_table, 
"true");
 
 DEFINE_mBool(enable_hdfs_mem_limiter, "true");
 
+DEFINE_mInt16(topn_agg_limit_multiplier, "2");
+
 // clang-format off
 #ifdef BE_TEST
 // test s3
diff --git a/be/src/common/config.h b/be/src/common/config.h
index 1ce9c66939c..f4ed1decaa0 100644
--- a/be/src/common/config.h
+++ b/be/src/common/config.h
@@ -1435,6 +1435,10 @@ DECLARE_mBool(ignore_not_found_file_in_external_table);
 
 DECLARE_mBool(enable_hdfs_mem_limiter);
 
+// Define how many percent data in hashtable bigger than limit
+// we should do agg limit opt
+DECLARE_mInt16(topn_agg_limit_multiplier);
+
 #ifdef BE_TEST
 // test s3
 DECLARE_String(test_s3_resource);
diff --git a/be/src/pipeline/exec/aggregation_sink_operator.cpp 
b/be/src/pipeline/exec/aggregation_sink_operator.cpp
index 79ca07281d9..f3a6942c33f 100644
--- a/be/src/pipeline/exec/aggregation_sink_operator.cpp
+++ b/be/src/pipeline/exec/aggregation_sink_operator.cpp
@@ -503,7 +503,8 @@ Status 
AggSinkLocalState::_execute_with_serialized_key_helper(vectorized::Block*
 _shared_state->reach_limit =
 hash_table_size >=
 (_shared_state->do_sort_limit
- ? Base::_parent->template 
cast()._limit * 5
+ ? Base::_parent->template 
cast()._limit *
+   config::topn_agg_limit_multiplier
  : Base::_parent->template 
cast()._limit);
 if (_shared_state->reach_limit && 
_shared_state->do_sort_limit) {
 _shared_state->build_limit_heap(hash_table_size);
diff --git a/regression-test/data/query_p0/limit/test_group_by_limit.out 
b/regression-test/data/query_p0/limit/test_group_by_limit.out
new file mode 100644
index 000..d9ac2a2481a
--- /dev/null
+++ b/regression-test/data/query_p0/limit/test_group_by_limit.out
@@ -0,0 +1,66 @@
+-- This file is automatically generated. You should know what you did if you 
want to edit this
+-- !select --
+253967024  8491AIR
+259556658  8641FOB
+260402265  8669MAIL
+
+-- !select --
+449872500  15000   1
+386605746  12900   2
+320758616  10717   3
+
+-- !select --
+198674527  65880.0
+198679731  65630.01
+198501055  66220.02
+
+-- !select --
+27137  1   1992-02-02
+45697  1   1992-02-04
+114452 5   1992-02-05
+
+-- !select --
+27137  1   1992-02-02T00:00
+45697  1   1992-02-04T00:00
+114452 5   1992-02-05T00:00
+
+-- !select --
+139015016  46321
+130287219  43132
+162309750  53343
+
+-- !select --
+64774969   2166AIR 1
+54166166   1804AIR 2
+45538267   1532AIR 3
+
+-- !select --
+6882631228 AIR 1   0.0
+6756423228 AIR 1   0.01
+7920028254 AIR 1   0.02
+
+-- !select --
+7618   1   AIR 1   0.0 1992-02-06
+2210   1   AIR 1   0.0 1992-03-24
+16807  1   AIR 1   0.0 1992-03-29
+
+-- !select --
+6882631228 AIR 1   0.0
+6756423228 AIR 1   0.01
+7920028254 AIR 1   0.02
+
+-- !select --
+6882631228 AIR 1   0.0
+6756423228 AIR 1   0.01
+7920028254 AIR 1   0.02
+
+-- !select --
+7707018238 TRUCK   1   0.0
+7467045233 TRUCK   1   0.01
+6927206245 TRUCK   1   0.02
+
+-- !select --
+7661562249 TRUCK   1   0.08
+6673139228 TRUCK   1   0.07
+8333862265 TRUCK   1   0.06
+
diff --git a/regression-test/su

(doris) branch master updated: [Fix] Error: missing field initializer (#37403)

2024-07-17 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 52972fce940 [Fix] Error: missing field initializer (#37403)
52972fce940 is described below

commit 52972fce9403ea65c0760060fb102460762cb55d
Author: Uniqueyou <134280716+wyxxx...@users.noreply.github.com>
AuthorDate: Thu Jul 18 14:37:13 2024 +0800

[Fix] Error: missing field initializer (#37403)

Compiling with -Werror turns warnings into errors; clang-18 now compiles cleanly.
---
 be/src/agent/task_worker_pool.cpp |  3 +++
 be/src/cloud/cloud_tablet.cpp |  1 +
 be/src/io/cache/block_file_cache_downloader.cpp   |  1 +
 be/src/io/file_factory.cpp|  6 +-
 be/src/olap/rowset/beta_rowset.cpp|  2 ++
 be/src/olap/rowset/beta_rowset_writer.cpp |  4 +++-
 be/src/olap/rowset/segment_v2/column_reader.h |  2 +-
 be/src/runtime/load_stream.cpp|  1 +
 be/src/runtime/workload_group/workload_group.cpp  |  4 ++--
 be/src/util/s3_util.cpp   |  1 +
 be/src/vec/exec/format/table/paimon_reader.cpp|  1 +
 be/src/vec/exec/scan/new_olap_scanner.cpp | 19 +++
 .../sink/writer/iceberg/viceberg_partition_writer.cpp |  3 ++-
 be/src/vec/sink/writer/vhive_partition_writer.cpp |  3 ++-
 be/src/vec/sink/writer/vtablet_writer_v2.cpp  |  1 +
 15 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/be/src/agent/task_worker_pool.cpp 
b/be/src/agent/task_worker_pool.cpp
index 3e0ac91dce1..efd15d0711b 100644
--- a/be/src/agent/task_worker_pool.cpp
+++ b/be/src/agent/task_worker_pool.cpp
@@ -1385,9 +1385,12 @@ void update_s3_resource(const TStorageResource& param, 
io::RemoteFileSystemSPtr
 auto client = 
static_cast(existed_fs.get())->client_holder();
 auto new_s3_conf = S3Conf::get_s3_conf(param.s3_storage_param);
 S3ClientConf conf {
+.endpoint {},
+.region {},
 .ak = std::move(new_s3_conf.client_conf.ak),
 .sk = std::move(new_s3_conf.client_conf.sk),
 .token = std::move(new_s3_conf.client_conf.token),
+.bucket {},
 .provider = new_s3_conf.client_conf.provider,
 };
 st = client->reset(conf);
diff --git a/be/src/cloud/cloud_tablet.cpp b/be/src/cloud/cloud_tablet.cpp
index ff341ae..50c8765a18d 100644
--- a/be/src/cloud/cloud_tablet.cpp
+++ b/be/src/cloud/cloud_tablet.cpp
@@ -228,6 +228,7 @@ void CloudTablet::add_rowsets(std::vector 
to_add, bool version_
 {
 .expiration_time = 
expiration_time,
 },
+.download_done {},
 });
 }
 #endif
diff --git a/be/src/io/cache/block_file_cache_downloader.cpp 
b/be/src/io/cache/block_file_cache_downloader.cpp
index 9ab172fedd0..02e8f736828 100644
--- a/be/src/io/cache/block_file_cache_downloader.cpp
+++ b/be/src/io/cache/block_file_cache_downloader.cpp
@@ -191,6 +191,7 @@ void FileCacheBlockDownloader::download_segment_file(const 
DownloadFileMeta& met
 FileReaderOptions opts {
 .cache_type = FileCachePolicy::FILE_BLOCK_CACHE,
 .is_doris_table = true,
+.cache_base_path {},
 .file_size = meta.file_size,
 };
 auto st = meta.file_system->open_file(meta.path, &file_reader, &opts);
diff --git a/be/src/io/file_factory.cpp b/be/src/io/file_factory.cpp
index 0c84c2eb74c..7f64ea50710 100644
--- a/be/src/io/file_factory.cpp
+++ b/be/src/io/file_factory.cpp
@@ -55,7 +55,11 @@ constexpr std::string_view RANDOM_CACHE_BASE_PATH = "random";
 
 io::FileReaderOptions FileFactory::get_reader_options(RuntimeState* state,
   const 
io::FileDescription& fd) {
-io::FileReaderOptions opts {.file_size = fd.file_size, .mtime = fd.mtime};
+io::FileReaderOptions opts {
+.cache_base_path {},
+.file_size = fd.file_size,
+.mtime = fd.mtime,
+};
 if (config::enable_file_cache && state != nullptr &&
 state->query_options().__isset.enable_file_cache &&
 state->query_options().enable_file_cache) {
diff --git a/be/src/olap/rowset/beta_rowset.cpp 
b/be/src/olap/rowset/beta_rowset.cpp
index a76cbe636ee..d16c1146142 100644
--- a/be/src/olap/rowset/beta_rowset.cpp
+++ b/be/src/olap/rowset/beta_rowset.cpp
@@ -174,6 +174,7 @@ Status BetaRowset::load_segment(int64_t seg_id, 
segment_v2::SegmentSharedPtr* se
 .cach

(doris) branch master updated: [Fix](function) fix FE impl of some time functions (#37746)

2024-07-17 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 8840daab84c [Fix](function) fix FE impl of some time functions (#37746)
8840daab84c is described below

commit 8840daab84c4276c50f42c869ccdf7430035b644
Author: zclllyybb 
AuthorDate: Thu Jul 18 14:08:00 2024 +0800

[Fix](function) fix FE impl of some time functions (#37746)

before:
```sql
mysql> select date_ceil("2020-12-12 12:12:12.123", interval 2 second);
+---+
| '2020-12-12 12:12:12' |
+---+
| 2020-12-12 12:12:12   |
+---+
1 row in set (0.10 sec)

mysql> select CONVERT_TZ('-12-31 23:59:59.99', 'Pacific/Galapagos', 
'Pacific/Galapagos');
+--+
| NULL |
+--+
| NULL |
+--+
1 row in set (0.09 sec)

mysql [(none)]>select CONVERT_TZ('-12-31 23:59:59.99', 
'Pacific/Galapagos', 'Pacific/GalapaGoS');

+---+
| convert_tz(cast('-12-31 23:59:59.99' as DATETIMEV2(6)), 
'Pacific/Galapagos', 'Pacific/GalapaGoS') |

+---+
| -12-31 23:59:59.99
|

+---+
1 row in set (0.08 sec) --- gone to BE
```
after:
```sql
mysql> select date_ceil("2020-12-12 12:12:12.123", interval 2 second);
+--+
| '2020-12-12 12:12:14.00' |
+--+
| 2020-12-12 12:12:14  |
+--+
1 row in set (0.11 sec)

mysql> select CONVERT_TZ('-12-31 23:59:59.99', 'Pacific/Galapagos', 
'Pacific/Galapagos');

+---+
| convert_tz(cast('-12-31 23:59:59.99' as DATETIMEV2(6)), 
'Pacific/Galapagos', 'Pacific/Galapagos') |

+---+
| -12-31 23:59:59.99
|

+---+
1 row in set (0.23 sec)

mysql> select CONVERT_TZ('-12-31 23:59:59.99', 'Pacific/Galapagos', 
'Pacific/GalapaGoS');
+--+
| '-12-31 23:59:59.99' |
+--+
| -12-31 23:59:59.99   |
+--+
1 row in set (0.11 sec) --- finished in FE
```
---
 .../executable/DateTimeExtractAndTransform.java| 18 --
 .../functions/executable/TimeRoundSeries.java  |  2 +-
 .../trees/expressions/literal/DateLiteral.java |  2 +-
 .../expressions/literal/DateTimeV2Literal.java |  2 +-
 .../nereids/rules/expression/FoldConstantTest.java | 21 +---
 .../data/correctness/test_timev2_fold.out  | 34 +++---
 .../suites/correctness/test_timev2_fold.groovy | 40 ++
 .../nereids_p0/javaudf/test_alias_function.groovy  |  2 +-
 8 files changed, 97 insertions(+), 24 deletions(-)

diff --git 
a/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/executable/DateTimeExtractAndTransform.java
 
b/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/executable/DateTimeExtractAndTransform.java
index f719eea44b3..c14b372f201 100644
--- 
a/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/executable/DateTimeExtractAndTransform.java
+++ 
b/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/executable/DateTimeExtractAndTransform.java
@@ -51,7 +51,9 @@ import java.time.LocalDateTime;
 import java.time.ZoneId;
 import java.time.ZonedDateTime;
 import java.time.format.DateTimeFormatter;
+import java.time.format.DateTimeFormatterBuilder;
 import java.time.format.DateTimeParseException;
+import java.time.format.ResolverStyle;
 import java.time.format.TextStyle;
 import java.time.temporal.ChronoUnit;
 import java.time.temporal.WeekFields;
@@ -645,12 +647,22 @@ public class DateTimeExtractAndTransform 

(doris) branch master updated (86575e2719b -> 3eed7b7c072)

2024-07-16 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from 86575e2719b [fix](ci) adjust failed_suites_threshold for cloud_p* 
(#37968)
 add 3eed7b7c072 [fix](regression) fix regression test case failure (#37853)

No new revisions were added by this update.

Summary of changes:
 .../sql}/test_left_anti_join_batch_size.out   | 0
 .../spill/partitioned_agg_fault_injection.groovy  | 2 +-
 .../spill/partitioned_hash_join_fault_injection.groovy| 2 +-
 .../spill/spill_sort_fault_injection.groovy   | 2 +-
 .../sql}/test_left_anti_join_batch_size.sql   | 4 ++--
 5 files changed, 5 insertions(+), 5 deletions(-)
 rename regression-test/data/{correctness_p0 => 
tpch_unique_sql_zstd_bucket1_p0/sql}/test_left_anti_join_batch_size.out (100%)
 rename regression-test/suites/{fault_injection_p0 => 
tpch_unique_sql_zstd_bucket1_p0}/spill/partitioned_agg_fault_injection.groovy 
(98%)
 rename regression-test/suites/{fault_injection_p0 => 
tpch_unique_sql_zstd_bucket1_p0}/spill/partitioned_hash_join_fault_injection.groovy
 (99%)
 rename regression-test/suites/{fault_injection_p0 => 
tpch_unique_sql_zstd_bucket1_p0}/spill/spill_sort_fault_injection.groovy (98%)
 rename regression-test/suites/{correctness_p0 => 
tpch_unique_sql_zstd_bucket1_p0/sql}/test_left_anti_join_batch_size.sql (76%)


-
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org



(doris) branch master updated: [Fix](bug) fix the divide zero in local shuffle (#37906)

2024-07-16 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new c138902a0de [Fix](bug) fix the divide zero in local shuffle (#37906)
c138902a0de is described below

commit c138902a0dea538343acabff7b54cbd6010e934c
Author: HappenLee 
AuthorDate: Tue Jul 16 20:37:32 2024 +0800

[Fix](bug) fix the divide zero in local shuffle (#37906)

If `num_buckets == 0`, the fragment is colocated by an exchange node rather
than the scan node, so `_num_instances` is used in place of `num_buckets` to
prevent division by zero, while still keeping the colocate plan after local
shuffle.


`coredump`:
```
SIGFPE integer divide by zero (@0x56431791a54a) received by PID 33673 (TID 
37768 OR 0x7f8028018640) from PID 395421002; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, 
siginfo_t*, void*) at 
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/common/signal_handler.h:421
1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in 
/usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
2# JVM_handle_linux_signal in 
/usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
3# 0x7F8C47895520 in /lib/x86_64-linux-gnu/libc.so.6
4# doris::vectorized::Partitioner::do_partitioning(doris::RuntimeState*, 
doris::vectorized::Block*, doris::MemTracker*) const at 
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/runtime/partitioner.cpp:50
5# doris::pipeline::ShuffleExchanger::sink(doris::RuntimeState*, 
doris::vectorized::Block*, bool, doris::pipeline::LocalExchangeSinkLocalState&) 
at 
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/pipeline/local_exchange/local_exchanger.cpp:33
6# doris::pipeline::LocalExchangeSinkOperatorX::sink(doris::RuntimeState*, 
doris::vectorized::Block*, bool) in 
/mnt/ssd01/doris-branch40preview/NEREIDS_ASAN/be/lib/doris_be
7# doris::pipeline::PipelineTask::execute(bool*) at 
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/pipeline/pipeline_task.cpp:359
8# doris::pipeline::TaskScheduler::_do_work(unsigned long) at 
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/pipeline/task_scheduler.cpp:138
9# doris::ThreadPool::dispatch_thread() in 
/mnt/ssd01/doris-branch40preview/NEREIDS_ASAN/be/lib/doris_be
10# doris::Thread::supervise_thread(void*) at 
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/util/thread.cpp:499
11# start_thread at ./nptl/pthread_create.c:442
12# 0x7F8C47979850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
```
---
 be/src/pipeline/pipeline_fragment_context.cpp  |  6 ++-
 .../data/query_p0/limit/sql/withGroupByUnion.out   | 52 ++
 .../suites/query_p0/limit/sql/withGroupByUnion.sql |  1 +
 3 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/be/src/pipeline/pipeline_fragment_context.cpp 
b/be/src/pipeline/pipeline_fragment_context.cpp
index 8138c7594b8..39555d3614e 100644
--- a/be/src/pipeline/pipeline_fragment_context.cpp
+++ b/be/src/pipeline/pipeline_fragment_context.cpp
@@ -884,8 +884,12 @@ Status PipelineFragmentContext::_plan_local_exchange(
 }
 }
 
+    // if 'num_buckets == 0' means the fragment is colocated by exchange node not the
+    // scan node. so here use `_num_instance` to replace the `num_buckets` to prevent dividing 0
+    // still keep colocate plan after local shuffle
     RETURN_IF_ERROR(_plan_local_exchange(
-            _pipelines[pip_idx]->operator_xs().front()->ignore_data_hash_distribution()
+            _pipelines[pip_idx]->operator_xs().front()->ignore_data_hash_distribution() ||
+                    num_buckets == 0
                     ? _num_instances
                     : num_buckets,
             pip_idx, _pipelines[pip_idx], bucket_seq_to_instance_idx,
diff --git a/regression-test/data/query_p0/limit/sql/withGroupByUnion.out 
b/regression-test/data/query_p0/limit/sql/withGroupByUnion.out
new file mode 100644
index 000..2d0e2af41c5
--- /dev/null
+++ b/regression-test/data/query_p0/limit/sql/withGroupByUnion.out
@@ -0,0 +1,52 @@
+-- This file is automatically generated. You should know what you did if you 
want to edit this
+-- !withGroupByUnion --
+0  ALGERIA
+1  ALGERIA
+1  ARGENTINA
+1  BRAZIL
+1  CANADA
+1  CHINA
+1  EGYPT
+1  ETHIOPIA
+1  FRANCE
+1  GERMANY
+1  INDIA
+1  INDONESIA
+1  IRAN
+1  IRAQ
+1  JAPAN
+1  JORDAN
+1  KENYA
+1  MOROCCO
+1  MOZAMBIQUE
+1  PERU
+1  ROMANIA
+1  RUSSIA
+1  SAUDI ARABIA
+1  UNITED KINGDOM
+1  UNITED STATES
+1  VIETNAM
+2  BRAZIL
+3  CANADA
+4  EGYPT
+5  ETHIOPIA
+6  FRANCE
+7  GERMANY
+8  INDIA
+9 

(doris) branch master updated: [opt](MultiCast) Avoid copying while holding a lock (#37462)

2024-07-15 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new ad3e4d5b655 [opt](MultiCast) Avoid copying while holding a lock 
(#37462)
ad3e4d5b655 is described below

commit ad3e4d5b6556f2e4003acea92bf1b108aa827fc3
Author: Mryange <59914473+mrya...@users.noreply.github.com>
AuthorDate: Mon Jul 15 16:55:52 2024 +0800

[opt](MultiCast) Avoid copying while holding a lock (#37462)

Previously, the copy was performed while holding the lock; now the block is
fetched under the lock and the copy happens after the lock is released.
---
 be/src/pipeline/exec/multi_cast_data_streamer.cpp | 115 +-
 be/src/pipeline/exec/multi_cast_data_streamer.h   |  16 ++-
 2 files changed, 78 insertions(+), 53 deletions(-)

diff --git a/be/src/pipeline/exec/multi_cast_data_streamer.cpp 
b/be/src/pipeline/exec/multi_cast_data_streamer.cpp
index deebf7d11bb..d44cf3974a6 100644
--- a/be/src/pipeline/exec/multi_cast_data_streamer.cpp
+++ b/be/src/pipeline/exec/multi_cast_data_streamer.cpp
@@ -23,63 +23,97 @@
 
 namespace doris::pipeline {
 
-MultiCastBlock::MultiCastBlock(vectorized::Block* block, int used_count, size_t mem_size)
-        : _used_count(used_count), _mem_size(mem_size) {
+MultiCastBlock::MultiCastBlock(vectorized::Block* block, int used_count, int un_finish_copy,
+                               size_t mem_size)
+        : _used_count(used_count), _un_finish_copy(un_finish_copy), _mem_size(mem_size) {
     _block = vectorized::Block::create_unique(block->get_columns_with_type_and_name());
     block->clear();
 }
 
 Status MultiCastDataStreamer::pull(int sender_idx, doris::vectorized::Block* 
block, bool* eos) {
-std::lock_guard l(_mutex);
-auto& pos_to_pull = _sender_pos_to_read[sender_idx];
-if (pos_to_pull != _multi_cast_blocks.end()) {
-if (pos_to_pull->_used_count == 1) {
-DCHECK(pos_to_pull == _multi_cast_blocks.begin());
-pos_to_pull->_block->swap(*block);
-
-_cumulative_mem_size -= pos_to_pull->_mem_size;
-pos_to_pull++;
-_multi_cast_blocks.pop_front();
-} else {
-pos_to_pull->_block->create_same_struct_block(0)->swap(*block);
-
RETURN_IF_ERROR(vectorized::MutableBlock(block).merge(*pos_to_pull->_block));
-pos_to_pull->_used_count--;
-pos_to_pull++;
+int* un_finish_copy = nullptr;
+int use_count = 0;
+{
+std::lock_guard l(_mutex);
+auto& pos_to_pull = _sender_pos_to_read[sender_idx];
+const auto end = _multi_cast_blocks.end();
+DCHECK(pos_to_pull != end);
+
+*block = *pos_to_pull->_block;
+
+_cumulative_mem_size -= pos_to_pull->_mem_size;
+
+pos_to_pull->_used_count--;
+use_count = pos_to_pull->_used_count;
+un_finish_copy = &pos_to_pull->_un_finish_copy;
+
+pos_to_pull++;
+
+if (pos_to_pull == end) {
+_block_reading(sender_idx);
 }
+
+*eos = _eos and pos_to_pull == end;
 }
-*eos = _eos and pos_to_pull == _multi_cast_blocks.end();
-if (pos_to_pull == _multi_cast_blocks.end()) {
-_block_reading(sender_idx);
+
+if (use_count == 0) {
+// will clear _multi_cast_blocks
+_wait_copy_block(block, *un_finish_copy);
+} else {
+_copy_block(block, *un_finish_copy);
 }
+
 return Status::OK();
 }
 
+void MultiCastDataStreamer::_copy_block(vectorized::Block* block, int& 
un_finish_copy) {
+const auto rows = block->rows();
+for (int i = 0; i < block->columns(); ++i) {
+block->get_by_position(i).column = 
block->get_by_position(i).column->clone_resized(rows);
+}
+
+std::unique_lock l(_mutex);
+un_finish_copy--;
+if (un_finish_copy == 0) {
+l.unlock();
+_cv.notify_one();
+}
+}
+
+void MultiCastDataStreamer::_wait_copy_block(vectorized::Block* block, int& 
un_finish_copy) {
+std::unique_lock l(_mutex);
+_cv.wait(l, [&]() { return un_finish_copy == 0; });
+_multi_cast_blocks.pop_front();
+}
+
 Status MultiCastDataStreamer::push(RuntimeState* state, 
doris::vectorized::Block* block, bool eos) {
 auto rows = block->rows();
 COUNTER_UPDATE(_process_rows, rows);
 
-auto block_mem_size = block->allocated_bytes();
-std::lock_guard l(_mutex);
-int need_process_count = _cast_sender_count - _closed_sender_count;
-if (need_process_count == 0) {
-return Status::EndOfFile("All data streamer is EOF");
-}
-// TODO: if the [queue back block rows + block->rows()] < batch_size, 
better
-// do merge block. but need check the need_process_count and used_count 
whether
-// equal
-_multi_cast_blocks.emplace_back(block, nee

(doris) branch master updated: [fix](arrow-flight-sql) Open regression-test/pipeline/p0/arrow_flight_sql (#36854)

2024-07-12 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 5e3a4e8bcbd [fix](arrow-flight-sql) Open 
regression-test/pipeline/p0/arrow_flight_sql (#36854)
5e3a4e8bcbd is described below

commit 5e3a4e8bcbd77a5e571a05e197c0cfbaff36ef97
Author: Xinyi Zou 
AuthorDate: Fri Jul 12 19:31:29 2024 +0800

[fix](arrow-flight-sql) Open regression-test/pipeline/p0/arrow_flight_sql 
(#36854)

1. When `enableParallelResultSink` is true, the ADBC client uses QueryId to
fetch results; otherwise it uses FinstId.
2. Add arrow flight sql conf in
regression-test/pipeline/p0/conf/regression-conf.groovy.
3. Add regression-test/suites/arrow_flight_sql_p0/test_select.groovy
---
 .../arrowflight/DorisFlightSqlProducer.java| 11 +---
 regression-test/conf/regression-conf.groovy|  2 +-
 .../data/arrow_flight_sql_p0/test_select.out   |  4 +++
 regression-test/pipeline/p0/conf/fe.conf   |  4 +--
 .../pipeline/p0/conf/regression-conf.groovy|  6 +
 .../suites/arrow_flight_sql_p0/test_select.groovy  | 31 ++
 6 files changed, 52 insertions(+), 6 deletions(-)

diff --git 
a/fe/fe-core/src/main/java/org/apache/doris/service/arrowflight/DorisFlightSqlProducer.java
 
b/fe/fe-core/src/main/java/org/apache/doris/service/arrowflight/DorisFlightSqlProducer.java
index af6d85c954e..16195469af9 100644
--- 
a/fe/fe-core/src/main/java/org/apache/doris/service/arrowflight/DorisFlightSqlProducer.java
+++ 
b/fe/fe-core/src/main/java/org/apache/doris/service/arrowflight/DorisFlightSqlProducer.java
@@ -90,7 +90,7 @@ import java.util.concurrent.Executors;
 
 /**
  * Implementation of Arrow Flight SQL service
- *
+ * 
  * All methods must catch all possible Exceptions, print and throw CallStatus,
  * otherwise error message will be discarded.
  */
@@ -224,8 +224,13 @@ public class DorisFlightSqlProducer implements 
FlightSqlProducer, AutoCloseable
 }
 } else {
 // Now only query stmt will pull results from BE.
-final ByteString handle = ByteString.copyFromUtf8(
-DebugUtil.printId(connectContext.queryId()) + ":" + 
query);
+final ByteString handle;
+if 
(connectContext.getSessionVariable().enableParallelResultSink()) {
+handle = 
ByteString.copyFromUtf8(DebugUtil.printId(connectContext.queryId()) + ":" + 
query);
+} else {
+// only one instance
+handle = 
ByteString.copyFromUtf8(DebugUtil.printId(connectContext.getFinstId()) + ":" + 
query);
+}
 Schema schema = 
flightSQLConnectProcessor.fetchArrowFlightSchema(5000);
 if (schema == null) {
 throw CallStatus.INTERNAL.withDescription("fetch arrow 
flight schema is null").toRuntimeException();
diff --git a/regression-test/conf/regression-conf.groovy 
b/regression-test/conf/regression-conf.groovy
index 6d4d9156339..527b0231394 100644
--- a/regression-test/conf/regression-conf.groovy
+++ b/regression-test/conf/regression-conf.groovy
@@ -200,7 +200,7 @@ s3Region = "ap-hongkong"
 
 //arrow flight sql test config
 extArrowFlightSqlHost = "127.0.0.1"
-extArrowFlightSqlPort = 9090
+extArrowFlightSqlPort = 8080
 extArrowFlightSqlUser = "root"
 extArrowFlightSqlPassword= ""
 
diff --git a/regression-test/data/arrow_flight_sql_p0/test_select.out 
b/regression-test/data/arrow_flight_sql_p0/test_select.out
new file mode 100644
index 000..d643597bbaf
--- /dev/null
+++ b/regression-test/data/arrow_flight_sql_p0/test_select.out
@@ -0,0 +1,4 @@
+-- This file is automatically generated. You should know what you did if you 
want to edit this
+-- !arrow_flight_sql --
+7774
+
diff --git a/regression-test/pipeline/p0/conf/fe.conf 
b/regression-test/pipeline/p0/conf/fe.conf
index ae5a97e2ba4..24853b0a0c6 100644
--- a/regression-test/pipeline/p0/conf/fe.conf
+++ b/regression-test/pipeline/p0/conf/fe.conf
@@ -30,11 +30,11 @@ LOG_DIR = ${DORIS_HOME}/log
 JAVA_OPTS="-Djavax.security.auth.useSubjectCredsOnly=false -Xss4m -Xmx4096m 
-XX:+HeapDumpOnOutOfMemoryError -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC 
-XX:MaxGCPauseMillis=200 -XX:+PrintGCDateStamps -XX:+PrintGCDetails 
-Xloggc:$DORIS_HOME/log/fe.gc.log.$CUR_DATE -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=50M 
-Dlog4j2.formatMsgNoLookups=true 
-Dcom.mysql.cj.disableAbandonedConnectionCleanup=true"
 
 # For jdk 17, this JAVA_OPTS will be used as default JVM options
-JAVA_OPTS_FOR_JDK_17="-Djavax.security.auth.useSubjectCredsOnly=false 
-Xmx8192m -Xms8192m -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=$DORIS

(doris) branch branch-2.1 updated: [branch-2.1](memory) Add HTTP API to clear data cache (#37704)

2024-07-12 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch branch-2.1
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-2.1 by this push:
 new 326b40cde2f [branch-2.1](memory) Add HTTP API to clear data cache 
(#37704)
326b40cde2f is described below

commit 326b40cde2f67f7df08a5791fd40a04b5f878c61
Author: Xinyi Zou 
AuthorDate: Fri Jul 12 17:21:52 2024 +0800

[branch-2.1](memory) Add HTTP API to clear data cache (#37704)

pick #36599

Co-authored-by: Gabriel 
---
 be/src/http/action/clear_cache_action.cpp | 39 +++
 be/src/http/action/clear_cache_action.h   | 35 +++
 be/src/runtime/memory/cache_manager.cpp   |  7 ++
 be/src/runtime/memory/cache_manager.h |  1 +
 be/src/service/http_service.cpp   |  6 +
 5 files changed, 88 insertions(+)

diff --git a/be/src/http/action/clear_cache_action.cpp 
b/be/src/http/action/clear_cache_action.cpp
new file mode 100644
index 000..f42499090c4
--- /dev/null
+++ b/be/src/http/action/clear_cache_action.cpp
@@ -0,0 +1,39 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "http/action/clear_cache_action.h"
+
+#include 
+#include 
+
+#include "http/http_channel.h"
+#include "http/http_headers.h"
+#include "http/http_request.h"
+#include "http/http_status.h"
+#include "runtime/memory/cache_manager.h"
+
+namespace doris {
+
+const static std::string HEADER_JSON = "application/json";
+
+void ClearDataCacheAction::handle(HttpRequest* req) {
+req->add_output_header(HttpHeaders::CONTENT_TYPE, "text/plain; 
version=0.0.4");
+CacheManager::instance()->clear_once();
+HttpChannel::send_reply(req, HttpStatus::OK, "");
+}
+
+} // end namespace doris
diff --git a/be/src/http/action/clear_cache_action.h 
b/be/src/http/action/clear_cache_action.h
new file mode 100644
index 000..3840f63593f
--- /dev/null
+++ b/be/src/http/action/clear_cache_action.h
@@ -0,0 +1,35 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "http/http_handler.h"
+
+namespace doris {
+
+class HttpRequest;
+
+class ClearDataCacheAction : public HttpHandler {
+public:
+ClearDataCacheAction() = default;
+
+~ClearDataCacheAction() override = default;
+
+void handle(HttpRequest* req) override;
+};
+
+} // end namespace doris
diff --git a/be/src/runtime/memory/cache_manager.cpp 
b/be/src/runtime/memory/cache_manager.cpp
index d17954ffe8b..9bf3d1e12d0 100644
--- a/be/src/runtime/memory/cache_manager.cpp
+++ b/be/src/runtime/memory/cache_manager.cpp
@@ -56,6 +56,13 @@ int64_t 
CacheManager::for_each_cache_prune_all(RuntimeProfile* profile) {
 return 0;
 }
 
+void CacheManager::clear_once() {
+std::lock_guard l(_caches_lock);
+for (const auto& pair : _caches) {
+pair.second->prune_all(true);
+}
+}
+
 void CacheManager::clear_once(CachePolicy::CacheType type) {
 std::lock_guard l(_caches_lock);
 _caches[type]->prune_all(true); // will print log
diff --git a/be/src/runtime/memory/cache_manager.h 
b/be/src/runtime/memory/cache_manager.h
index c4d8c7bb6f3..20372366aa1 100644
--- a/be/src/runtime/memory/cache_manager.h
+++ b/be/src/runtime/memory/cache_ma
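The new parameterless `clear_once` above walks every registered cache under the manager lock, while the existing typed overload prunes a single cache. A minimal Python stand-in of that behavior (the `Cache`/`register` names here are hypothetical; only `clear_once` mirrors the patch):

```python
import threading

class Cache:
    """Hypothetical cache exposing the prune_all(force) interface used above."""
    def __init__(self):
        self.pruned = False

    def prune_all(self, force):
        self.pruned = True

class CacheManager:
    def __init__(self):
        self._caches_lock = threading.Lock()
        self._caches = {}  # cache type -> Cache

    def register(self, cache_type, cache):
        with self._caches_lock:
            self._caches[cache_type] = cache

    def clear_once(self, cache_type=None):
        # No argument: prune every cache (the new overload);
        # with an argument: prune only that cache type (the old overload).
        with self._caches_lock:
            if cache_type is None:
                targets = list(self._caches.values())
            else:
                targets = [self._caches[cache_type]]
            for cache in targets:
                cache.prune_all(True)
```

Holding one lock for both paths keeps a typed clear from racing a clear-all, which is the same design the C++ code gets from `_caches_lock`.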

(doris) branch branch-2.1 updated: [branch-2.1](memory) Support make all memory snapshots (#37705)

2024-07-12 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch branch-2.1
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-2.1 by this push:
 new a61030215e4 [branch-2.1](memory) Support make all memory snapshots 
(#37705)
a61030215e4 is described below

commit a61030215e4b039fd6b5b544227d81ecd23d9bc7
Author: Xinyi Zou 
AuthorDate: Fri Jul 12 16:21:37 2024 +0800

[branch-2.1](memory) Support make all memory snapshots (#37705)

pick #36679
---
 be/src/common/daemon.cpp   |  8 ++--
 be/src/http/default_path_handlers.cpp  |  2 +
 be/src/runtime/memory/mem_tracker.cpp  | 16 +++-
 be/src/runtime/memory/mem_tracker.h| 35 +
 be/src/runtime/memory/mem_tracker_limiter.cpp  | 45 +++---
 be/src/runtime/memory/mem_tracker_limiter.h| 37 ++
 ...emory_arbitrator.cpp => memory_reclamation.cpp} | 13 ---
 .../{memory_arbitrator.h => memory_reclamation.h}  |  2 +-
 be/src/vec/sink/writer/vtablet_writer.cpp  |  6 +--
 9 files changed, 102 insertions(+), 62 deletions(-)

diff --git a/be/src/common/daemon.cpp b/be/src/common/daemon.cpp
index 77d0fdaf0e5..d54189bce23 100644
--- a/be/src/common/daemon.cpp
+++ b/be/src/common/daemon.cpp
@@ -50,7 +50,7 @@
 #include "runtime/memory/global_memory_arbitrator.h"
 #include "runtime/memory/mem_tracker.h"
 #include "runtime/memory/mem_tracker_limiter.h"
-#include "runtime/memory/memory_arbitrator.h"
+#include "runtime/memory/memory_reclamation.h"
 #include "runtime/runtime_query_statistics_mgr.h"
 #include "runtime/workload_group/workload_group_manager.h"
 #include "util/cpu_info.h"
@@ -234,7 +234,7 @@ void Daemon::memory_gc_thread() {
 auto process_memory_usage = 
doris::GlobalMemoryArbitrator::process_memory_usage();
 
 // GC excess memory for resource groups that not enable overcommit
-auto tg_free_mem = 
doris::MemoryArbitrator::tg_disable_overcommit_group_gc();
+auto tg_free_mem = 
doris::MemoryReclamation::tg_disable_overcommit_group_gc();
 sys_mem_available += tg_free_mem;
 process_memory_usage -= tg_free_mem;
 
@@ -248,7 +248,7 @@ void Daemon::memory_gc_thread() {
 memory_minor_gc_sleep_time_ms = memory_gc_sleep_time_ms;
 LOG(INFO) << fmt::format("[MemoryGC] start full GC, {}.", 
mem_info);
 doris::MemTrackerLimiter::print_log_process_usage();
-if (doris::MemoryArbitrator::process_full_gc(std::move(mem_info))) 
{
+if 
(doris::MemoryReclamation::process_full_gc(std::move(mem_info))) {
 // If there is not enough memory to be gc, the process memory 
usage will not be printed in the next continuous gc.
 doris::MemTrackerLimiter::enable_print_log_process_usage();
 }
@@ -261,7 +261,7 @@ void Daemon::memory_gc_thread() {
 memory_minor_gc_sleep_time_ms = memory_gc_sleep_time_ms;
 LOG(INFO) << fmt::format("[MemoryGC] start minor GC, {}.", 
mem_info);
 doris::MemTrackerLimiter::print_log_process_usage();
-if 
(doris::MemoryArbitrator::process_minor_gc(std::move(mem_info))) {
+if 
(doris::MemoryReclamation::process_minor_gc(std::move(mem_info))) {
 doris::MemTrackerLimiter::enable_print_log_process_usage();
 }
 } else {
diff --git a/be/src/http/default_path_handlers.cpp 
b/be/src/http/default_path_handlers.cpp
index 5c697539fbc..8d1a14ffda3 100644
--- a/be/src/http/default_path_handlers.cpp
+++ b/be/src/http/default_path_handlers.cpp
@@ -158,6 +158,8 @@ void mem_tracker_handler(const WebPageHandler::ArgumentMap& 
args, std::stringstr
 MemTrackerLimiter::make_type_snapshots(&snapshots, 
MemTrackerLimiter::Type::OTHER);
 } else if (iter->second == "reserved_memory") {
 GlobalMemoryArbitrator::make_reserved_memory_snapshots(&snapshots);
+} else if (iter->second == "all") {
+MemTrackerLimiter::make_all_memory_state_snapshots(&snapshots);
 }
 } else {
 (*output) << "*Notice:\n";
diff --git a/be/src/runtime/memory/mem_tracker.cpp 
b/be/src/runtime/memory/mem_tracker.cpp
index 27b16c76f2c..f5a3853f79f 100644
--- a/be/src/runtime/memory/mem_tracker.cpp
+++ b/be/src/runtime/memory/mem_tracker.cpp
@@ -45,9 +45,11 @@ MemTracker::MemTracker(const std::string& label, 
MemTrackerLimiter* parent) : _l
 
 void MemTracker::bind_parent(MemTrackerLimiter* parent) {
 if (parent) {
+_type = parent->type();
 _parent_label = parent->label();
 _parent_group_num = parent->group_num();
 } else {
+_type =

(doris) branch branch-2.1 updated (62e02305233 -> cf2fb6945a2)

2024-07-11 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch branch-2.1
in repository https://gitbox.apache.org/repos/asf/doris.git


from 62e02305233 [branch-2.1](memory) Add `ThreadMemTrackerMgr` BE UT 
(#37654)
 add cf2fb6945a2 [branch-2.1](memory) Refactor LRU cache policy memory 
tracking (#37658)

No new revisions were added by this update.

Summary of changes:
 be/src/common/config.cpp   |   2 +
 be/src/common/config.h |   5 +-
 be/src/olap/page_cache.cpp |  35 +-
 be/src/olap/page_cache.h   |  54 +++-
 .../segment_v2/bitshuffle_page_pre_decoder.h   |   4 +-
 be/src/olap/rowset/segment_v2/encoding_info.h  |   2 +-
 .../rowset/segment_v2/inverted_index_cache.cpp |  10 +-
 .../olap/rowset/segment_v2/inverted_index_cache.h  |  42 +++
 be/src/olap/rowset/segment_v2/page_io.cpp  |  14 +--
 be/src/olap/schema_cache.h |   9 +-
 be/src/olap/segment_loader.cpp |   6 +-
 be/src/olap/segment_loader.h   |  11 +-
 be/src/olap/storage_engine.h   |  10 +-
 be/src/olap/tablet_meta.h  |  11 +-
 be/src/olap/tablet_schema_cache.cpp|   4 +-
 be/src/olap/tablet_schema_cache.h  |  10 +-
 be/src/olap/txn_manager.h  |  11 +-
 be/src/runtime/load_channel_mgr.h  |   9 +-
 be/src/runtime/memory/cache_manager.h  |   5 +-
 be/src/runtime/memory/cache_policy.h   |  28 -
 be/src/runtime/memory/lru_cache_policy.h   | 140 +
 be/src/runtime/memory/lru_cache_value_base.h   |  12 +-
 be/src/service/point_query_executor.cpp|  10 +-
 be/src/service/point_query_executor.h  |  15 ++-
 be/src/util/obj_lru_cache.cpp  |   6 +-
 be/src/util/obj_lru_cache.h|  11 +-
 be/src/vec/common/allocator.cpp|   5 +-
 be/test/olap/lru_cache_test.cpp|  12 +-
 be/test/olap/page_cache_test.cpp   |  30 +++--
 29 files changed, 294 insertions(+), 229 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org



(doris) branch branch-2.1 updated: [branch-2.1](memory) Add `ThreadMemTrackerMgr` BE UT (#37654)

2024-07-11 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch branch-2.1
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-2.1 by this push:
 new 62e02305233 [branch-2.1](memory) Add `ThreadMemTrackerMgr` BE UT 
(#37654)
62e02305233 is described below

commit 62e02305233b7a5dc2638bbacf396edc876f773f
Author: Xinyi Zou 
AuthorDate: Thu Jul 11 21:03:49 2024 +0800

[branch-2.1](memory) Add `ThreadMemTrackerMgr` BE UT (#37654)

## Proposed changes

pick #35518
---
 be/src/runtime/exec_env.h  |   4 +-
 be/src/runtime/memory/thread_mem_tracker_mgr.h |   8 +-
 be/src/runtime/thread_context.h|   6 -
 .../mem_tracker_test.cpp}  |   2 +-
 .../runtime/memory/thread_mem_tracker_mgr_test.cpp | 455 +
 be/test/testutil/run_all_tests.cpp |   2 +
 6 files changed, 467 insertions(+), 10 deletions(-)

diff --git a/be/src/runtime/exec_env.h b/be/src/runtime/exec_env.h
index 41d8c740326..d877096aec2 100644
--- a/be/src/runtime/exec_env.h
+++ b/be/src/runtime/exec_env.h
@@ -263,7 +263,9 @@ public:
 this->_dummy_lru_cache = dummy_lru_cache;
 }
 void set_write_cooldown_meta_executors();
-
+static void set_tracking_memory(bool tracking_memory) {
+_s_tracking_memory.store(tracking_memory, std::memory_order_acquire);
+}
 #endif
 LoadStreamMapPool* load_stream_map_pool() { return 
_load_stream_map_pool.get(); }
 
diff --git a/be/src/runtime/memory/thread_mem_tracker_mgr.h 
b/be/src/runtime/memory/thread_mem_tracker_mgr.h
index 64c2190a149..9d36cd2d807 100644
--- a/be/src/runtime/memory/thread_mem_tracker_mgr.h
+++ b/be/src/runtime/memory/thread_mem_tracker_mgr.h
@@ -125,6 +125,9 @@ public:
 fmt::to_string(consumer_tracker_buf));
 }
 
+int64_t untracked_mem() const { return _untracked_mem; }
+int64_t reserved_mem() const { return _reserved_mem; }
+
 private:
 // is false: ExecEnv::ready() = false when thread local is initialized
 bool _init = false;
@@ -190,7 +193,7 @@ inline void ThreadMemTrackerMgr::pop_consumer_tracker() {
 
 inline void ThreadMemTrackerMgr::consume(int64_t size, int 
skip_large_memory_check) {
 if (_reserved_mem != 0) {
-if (_reserved_mem >= size) {
+if (_reserved_mem > size) {
 // only need to subtract _reserved_mem, no need to consume 
MemTracker,
 // every time _reserved_mem is minus the sum of size >= 
SYNC_PROC_RESERVED_INTERVAL_BYTES,
 // subtract size from process global reserved memory,
@@ -208,7 +211,8 @@ inline void ThreadMemTrackerMgr::consume(int64_t size, int 
skip_large_memory_che
 }
 return;
 } else {
-// reserved memory is insufficient, the remaining _reserved_mem is 
subtracted from this memory consumed,
+// _reserved_mem <= size, reserved memory used done,
+// the remaining _reserved_mem is subtracted from this memory 
consumed,
 // and reset _reserved_mem to 0, and subtract the remaining 
_reserved_mem from
 // process global reserved memory, this means that all reserved 
memory has been used by BE process.
 size -= _reserved_mem;
diff --git a/be/src/runtime/thread_context.h b/be/src/runtime/thread_context.h
index 72d3c8111f6..7a4695a4e98 100644
--- a/be/src/runtime/thread_context.h
+++ b/be/src/runtime/thread_context.h
@@ -156,14 +156,12 @@ public:
 
 void attach_task(const TUniqueId& task_id,
const std::shared_ptr<MemTrackerLimiter>& mem_tracker) {
-#ifndef BE_TEST
 // will only attach_task at the beginning of the thread function, 
there should be no duplicate attach_task.
 DCHECK(mem_tracker);
 // Orphan is thread default tracker.
 DCHECK(thread_mem_tracker()->label() == "Orphan")
 << ", thread mem tracker label: " << 
thread_mem_tracker()->label()
 << ", attach mem tracker label: " << mem_tracker->label();
-#endif
 _task_id = task_id;
 thread_mem_tracker_mgr->attach_limiter_tracker(mem_tracker);
 thread_mem_tracker_mgr->set_query_id(_task_id);
@@ -374,9 +372,7 @@ public:
 class SwitchThreadMemTrackerLimiter {
 public:
 explicit SwitchThreadMemTrackerLimiter(const 
std::shared_ptr<MemTrackerLimiter>& mem_tracker) {
-#ifndef BE_TEST
 DCHECK(mem_tracker);
-#endif
 ThreadLocalHandle::create_thread_local_if_not_exits();
 _old_mem_tracker = 
thread_context()->thread_mem_tracker_mgr->limiter_mem_tracker();
 
thread_context()->thread_mem_tracker_mgr->attach_limiter_tracker(mem_tracker);
@@ -385,9 +381,7 @@ public:
 explicit SwitchThreadMemTrackerLimiter(const QueryThreadContext& 
query_thread_cont
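The `>=` to `>` change in `ThreadMemTrackerMgr::consume` above decides when the thread-local reservation alone absorbs an allocation: with `>`, an allocation that exactly exhausts the reservation now falls through to the tracker path. A simplified Python model (field names are stand-ins; the real code also syncs process-global reserved memory in batches):

```python
class ThreadMemTrackerMgr:
    def __init__(self, reserved_mem=0):
        self._reserved_mem = reserved_mem
        self._tracked = 0  # memory charged to the limiter MemTracker

    def consume(self, size):
        if self._reserved_mem != 0:
            if self._reserved_mem > size:  # strictly greater, per the fix
                # Fast path: the reservation fully covers this allocation,
                # no MemTracker consume needed.
                self._reserved_mem -= size
                return
            # Reservation used up: charge only the remainder to the tracker
            # and reset the reservation to zero.
            size -= self._reserved_mem
            self._reserved_mem = 0
        self._tracked += size
```

With the old `>=`, the "exactly equal" case left `_reserved_mem` at zero via the fast path without ever taking the reset branch; treating it as exhaustion keeps the two bookkeeping paths consistent.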

(doris-website) branch master updated: [doc](function) add some doc for function ipv4-to-ipv6/cut-ipv6 (#840)

2024-07-10 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
 new 0a528a83aa [doc](function) add some doc for function 
ipv4-to-ipv6/cut-ipv6 (#840)
0a528a83aa is described below

commit 0a528a83aa9c84d622fab2c08ecc3d257afcf82c
Author: zhangstar333 <87313068+zhangstar...@users.noreply.github.com>
AuthorDate: Wed Jul 10 20:32:34 2024 +0800

[doc](function) add some doc for function ipv4-to-ipv6/cut-ipv6 (#840)

https://github.com/apache/doris/pull/36883
---
 .../sql-functions/ip-functions/cut-ipv6.md | 51 +
 .../sql-functions/ip-functions/ipv4-to-ipv6.md | 51 +
 .../sql-functions/ip-functions/cut-ipv6.md | 53 ++
 .../sql-functions/ip-functions/ipv4-to-ipv6.md | 53 ++
 sidebars.json  |  2 +
 5 files changed, 210 insertions(+)

diff --git a/docs/sql-manual/sql-functions/ip-functions/cut-ipv6.md 
b/docs/sql-manual/sql-functions/ip-functions/cut-ipv6.md
new file mode 100644
index 00..f0012c3b3d
--- /dev/null
+++ b/docs/sql-manual/sql-functions/ip-functions/cut-ipv6.md
@@ -0,0 +1,51 @@
+---
+{
+"title": "CUT_IPV6",
+"language": "en"
+}
+---
+
+
+
+## CUT_IPV6
+
+CUT_IPV6
+
+### Description
+
+#### Syntax
+
+`STRING CUT_IPV6(IPV6 ipv6, TinyInt cut_ipv6_bytes, TinyInt cut_ipv4_bytes)`
+
+Accepts an IPv6 address and returns a string containing, in text format, the address with the specified number of bytes removed.
+
+### Example
+
+```sql
+mysql [(none)]>select 
cut_ipv6(to_ipv6('2001:0DB8:AC10:FE01:FEED:BABE:CAFE:F00D'), 10, 0);
++-------------------+
+| '2001:db8:ac10::' |
++-------------------+
+| 2001:db8:ac10::   |
++-------------------+
+1 row in set (0.00 sec)
+```
+
+### Keywords
+
+CUT_IPV6, IP
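A rough Python model of the example above, assuming `CUT_IPV6` simply zeroes the trailing `cut_ipv6_bytes` bytes of the 16-byte address (the real function additionally treats an IPv4-mapped tail specially via `cut_ipv4_bytes`):

```python
import ipaddress

def cut_ipv6(v6: str, cut_ipv6_bytes: int) -> str:
    # Zero the last cut_ipv6_bytes bytes and re-render in text format.
    packed = bytearray(ipaddress.IPv6Address(v6).packed)
    for i in range(16 - cut_ipv6_bytes, 16):
        packed[i] = 0
    return str(ipaddress.IPv6Address(bytes(packed)))
```

Cutting 10 of the 16 bytes keeps only the first three 16-bit groups, which is why the SQL example yields `2001:db8:ac10::`.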
diff --git a/docs/sql-manual/sql-functions/ip-functions/ipv4-to-ipv6.md 
b/docs/sql-manual/sql-functions/ip-functions/ipv4-to-ipv6.md
new file mode 100644
index 00..ba3a0c0092
--- /dev/null
+++ b/docs/sql-manual/sql-functions/ip-functions/ipv4-to-ipv6.md
@@ -0,0 +1,51 @@
+---
+{
+"title": "IPV4_TO_IPV6",
+"language": "en"
+}
+---
+
+
+
+## IPV4_TO_IPV6
+
+IPV4_TO_IPV6
+
+### Description
+
+#### Syntax
+
+`IPV6 IPV4_TO_IPV6(IPV4 ipv4)`
+
+Accepts an IPv4 address and returns the corresponding IPv6 address.
+
+### Example
+
+```sql
+mysql [(none)]>select ipv6_num_to_string(ipv4_to_ipv6(to_ipv4('192.168.0.1')));
++----------------------+
+| '::ffff:192.168.0.1' |
++----------------------+
+| ::ffff:192.168.0.1   |
++----------------------+
+1 row in set (0.02 sec)
+```
+
+### Keywords
+
+IPV4_TO_IPV6, IP
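For reference, the IPv4-mapped form in the example above (`::ffff:a.b.c.d`, defined in RFC 4291) can be reproduced with Python's `ipaddress` module; this is a sketch of the mapping, not Doris code:

```python
import ipaddress

def ipv4_to_ipv6(v4: str) -> ipaddress.IPv6Address:
    # Place the IPv4 value in the low 32 bits under the ::ffff:0:0/96 prefix.
    return ipaddress.IPv6Address((0xFFFF << 32) | int(ipaddress.IPv4Address(v4)))
```

The resulting address compares equal to the mapped form and exposes the original IPv4 address via `ipv4_mapped`.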
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/ip-functions/cut-ipv6.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/ip-functions/cut-ipv6.md
new file mode 100644
index 00..df46575054
--- /dev/null
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/ip-functions/cut-ipv6.md
@@ -0,0 +1,53 @@
+---
+{
+"title": "CUT_IPV6",
+"language": "zh-CN"
+}
+---
+
+
+
+## CUT_IPV6
+
+CUT_IPV6
+
+### Description
+
+#### Syntax
+
+`STRING CUT_IPV6(IPV6 ipv6, TinyInt cut_ipv6_bytes, TinyInt cut_ipv4_bytes)`
+
+
+接受一个 IPv6 类型的地址,并以文本格式返回一个包含指定字节数的地址的字符串。
+
+
+### Example
+
+```sql
+mysql [(none)]>select 
cut_ipv6(to_ipv6('2001:0DB8:AC10:FE01:FEED:BABE:CAFE:F00D'), 10, 0);
++-------------------+
+| '2001:db8:ac10::' |
++-------------------+
+| 2001:db8:ac10::   |
++-------------------+
+1 row in set (0.00 sec)
+```
+
+### Keywords
+
+CUT_IPV6, IP
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/ip-functions/ipv4-to-ipv6.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/ip-functions/ipv4-to-ipv6.md
new file mode 100644
index 00..753afce931
--- /dev/null
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/ip-functions/ipv4-to-ipv6.md
@@ -0,0 +1,53 @@
+---
+{
+"title": "IPV4_TO_IPV6",
+"language": "zh-CN"
+}
+---
+
+
+
+## IPV4_TO_IPV6
+
+IPV4_TO_IPV6
+
+### Description
+
+#### Syntax
+
+`IPV6 IPV4_TO_IPV6(IPV4 ipv4)`
+
+
+接受一个类型为 IPv4 的地址,返回相应 IPv6 的形式。
+
+
+### Example
+
+```sql
+mysql [(none)]>select ipv6_num_to_string(ipv4_to_ipv6(to_ipv4('192.168.0.1')));
++----------------------+
+| '::ffff:192.168.0.1' |
++----------------------+
+| ::ffff:192.168.0.1   |
++----------------------+
+1 row in set (0.02 sec)
+```
+
+### Keywords
+
+IPV4_TO_IPV6, IP
diff --git a/sidebars.json b/sidebars.json
index a937f

(doris) branch master updated: [fix](serde)fix string deserialize with unescaped char (#37251)

2024-07-08 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 308dd2b65e0 [fix](serde)fix string deserialize with unescaped char 
(#37251)
308dd2b65e0 is described below

commit 308dd2b65e0fb5616a5ae38f3e79bff0c7b88ea5
Author: amory 
AuthorDate: Mon Jul 8 17:41:50 2024 +0800

[fix](serde)fix string deserialize with unescaped char (#37251)

Before this PR, if stream load received unescaped characters in JSON format,
they were not handled inside nested types. For example:
```
|   27 | "双引号"| [""双引号"", "反斜\线"]   |
```
---
 be/src/vec/functions/function_cast.h   |  1 +
 .../data/jsonb_p0/test_jsonb_unescaped.csv |  5 ++
 .../data/jsonb_p0/test_jsonb_unescaped.json|  5 ++
 .../jsonb_p0/test_jsonb_with_unescaped_string.out  | 15 
 .../test_jsonb_with_unescaped_string.groovy| 99 ++
 5 files changed, 125 insertions(+)

diff --git a/be/src/vec/functions/function_cast.h 
b/be/src/vec/functions/function_cast.h
index d4b21aacc5c..f896770 100644
--- a/be/src/vec/functions/function_cast.h
+++ b/be/src/vec/functions/function_cast.h
@@ -576,6 +576,7 @@ struct ConvertImplGenericFromString {
 const bool is_complex = is_complex_type(data_type_to);
 DataTypeSerDe::FormatOptions format_options;
 format_options.converted_from_string = true;
+format_options.escape_char = '\\';
 
 for (size_t i = 0; i < size; ++i) {
 const auto& val = col_from_string->get_data_at(i);
diff --git a/regression-test/data/jsonb_p0/test_jsonb_unescaped.csv 
b/regression-test/data/jsonb_p0/test_jsonb_unescaped.csv
new file mode 100644
index 000..e4f859e7511
--- /dev/null
+++ b/regression-test/data/jsonb_p0/test_jsonb_unescaped.csv
@@ -0,0 +1,5 @@
+1  \N
+2  ['{\'x\' : \'{"y" : 1}\', \'t\' : \'{"y" : 2}\'}', '{"x" : 1}']
+3  ['foo\'bar', 'foo"bar', 'foo\\'bar', 'foo\'\'bar']
+4  ['\/some\/cool\/url', '/some/cool/url', 
'a\\_\\c\\l\\i\\c\\k\\h\\o\\u\\s\\e']
+5  ["\"双引号\"", "反斜\\线"]
\ No newline at end of file
diff --git a/regression-test/data/jsonb_p0/test_jsonb_unescaped.json 
b/regression-test/data/jsonb_p0/test_jsonb_unescaped.json
new file mode 100644
index 000..de718c8efde
--- /dev/null
+++ b/regression-test/data/jsonb_p0/test_jsonb_unescaped.json
@@ -0,0 +1,5 @@
+{"id":1,"a":null}
+{"id":2,"a":['{\'x\' : \'{"y" : 1}\', \'t\' : \'{"y" : 2}\'}', \'{"x" : 1}']}
+{"id":3,"a":['foo\'bar', 'foo\"bar', 'foo\\\'bar', 'foo\'\'bar']}
+{"id":4,"a":['\/some\/cool\/url', '/some/cool/url', 
'a\\_\\c\\l\\i\\c\\k\\h\\o\\u\\s\\e']}
+{"id":5,"a":["\"双引号\"", "反斜\\线"]}
\ No newline at end of file
diff --git a/regression-test/data/jsonb_p0/test_jsonb_with_unescaped_string.out 
b/regression-test/data/jsonb_p0/test_jsonb_with_unescaped_string.out
new file mode 100644
index 000..99fb23ef9ee
--- /dev/null
+++ b/regression-test/data/jsonb_p0/test_jsonb_with_unescaped_string.out
@@ -0,0 +1,15 @@
+-- This file is automatically generated. You should know what you did if you 
want to edit this
+-- !select_csv --
+1  \N
+2  ["{'x' : '{"y" : 1}', 't' : '{"y" : 2}'}", "{"x" : 1}"]
+3  ["foo'bar', 'foo"bar', 'foo\\'bar', 'foo''bar"]
+4  ["/some/cool/url", "/some/cool/url", 
"a\\_\\c\\l\\i\\c\\k\\h\\o\\u\\s\\e"]
+5  [""双引号"", "反斜\\线"]
+
+-- !select_json --
+1  \N
+2  ["{'x' : '{"y" : 1}', 't' : '{"y" : 2}'}", "'{"x" : 1}'"]
+3  ["foo'bar', 'foo"bar', 'foo\\'bar', 'foo''bar"]
+4  ["/some/cool/url", "/some/cool/url", 
"a\\_\\c\\l\\i\\c\\k\\h\\o\\u\\s\\e"]
+5  [""双引号"", "反斜\\线"]
+
diff --git 
a/regression-test/suites/jsonb_p0/test_jsonb_with_unescaped_string.groovy 
b/regression-test/suites/jsonb_
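The `format_options.escape_char = '\\'` line above turns on backslash handling when casting strings to nested types. Its effect is roughly the following (a simplified model of escape-aware reading, not the actual SerDe code):

```python
def unescape(s: str, escape_char: str = "\\") -> str:
    # Drop the escape character and emit the following character literally.
    out = []
    i = 0
    while i < len(s):
        if s[i] == escape_char and i + 1 < len(s):
            out.append(s[i + 1])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)
```

Without the escape character set, sequences such as `\"` or `\'` inside nested array elements were passed through verbatim, producing the malformed rows shown in the commit message.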

(doris) branch master updated: [bug](function)fix json_replace check return type error (#37014)

2024-07-05 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 63314b84d25 [bug](function)fix json_replace check return type error 
(#37014)
63314b84d25 is described below

commit 63314b84d25a47e1a0de7f243ae152cf68ea1abb
Author: zhangstar333 <87313068+zhangstar...@users.noreply.github.com>
AuthorDate: Fri Jul 5 18:43:06 2024 +0800

[bug](function)fix json_replace check return type error (#37014)

1. Fix the return-type DCHECK error:
```
mysql [test]>select (json_replace(a, '$.fparam.nested_2', "qwe")) from 
json_table_2 limit 1;
ERROR 1105 (HY000): errCode = 2, detailMessage = 
(10.16.10.8)[INTERNAL_ERROR]Function json_replace get failed, expr is 
VectorizedFnCall[json_replace](arguments=a, String, String, 
String,return=Nullable(String)) and return type is Nullable(String).
```

2. Improve json_replace/json_insert/json_set execution by not converting
const columns; tests show roughly a 1s speedup on a 10M-row table.
---
 be/src/vec/functions/function_json.cpp | 75 +-
 .../json_function/test_query_json_replace.out  |  5 ++
 .../json_function/test_query_json_replace.groovy   | 23 +++
 3 files changed, 74 insertions(+), 29 deletions(-)

diff --git a/be/src/vec/functions/function_json.cpp 
b/be/src/vec/functions/function_json.cpp
index 2faeb24d514..cb667d2dc76 100644
--- a/be/src/vec/functions/function_json.cpp
+++ b/be/src/vec/functions/function_json.cpp
@@ -1346,11 +1346,13 @@ private:
 
 Status 
get_parsed_path_columns(std::vector>>& 
json_paths,
const std::vector& 
data_columns,
-   size_t input_rows_count) const {
+   size_t input_rows_count,
+   std::vector& column_is_consts) const {
 for (auto col = 1; col + 1 < data_columns.size() - 1; col += 2) {
 json_paths.emplace_back(std::vector>());
 for (auto row = 0; row < input_rows_count; row++) {
-const auto path = data_columns[col]->get_data_at(row);
+const auto path = data_columns[col]->get_data_at(
+index_check_const(row, column_is_consts[col]));
 std::string_view path_string(path.data, path.size);
 std::vector parsed_paths;
 
@@ -1384,7 +1386,7 @@ public:
 DataTypePtr get_return_type_impl(const DataTypes& arguments) const 
override {
 bool is_nullable = false;
 // arguments: (json_str, path, val[, path, val...], type_flag)
-for (auto col = 2; col < arguments.size() - 1; col += 2) {
+for (auto col = 0; col < arguments.size() - 1; col += 1) {
 if (arguments[col]->is_nullable()) {
 is_nullable = true;
 break;
@@ -1398,36 +1400,42 @@ public:
 size_t result, size_t input_rows_count) const override 
{
 auto result_column = ColumnString::create();
 bool is_nullable = false;
-auto ret_null_map = ColumnUInt8::create(0, 0);
+ColumnUInt8::MutablePtr ret_null_map = nullptr;
+ColumnUInt8::Container* ret_null_map_data = nullptr;
 
-std::vector column_ptrs; // prevent converted column 
destruct
 std::vector data_columns;
 std::vector nullmaps;
+std::vector column_is_consts;
 for (int i = 0; i < arguments.size(); i++) {
-auto column = block.get_by_position(arguments[i]).column;
-column_ptrs.push_back(column->convert_to_full_column_if_const());
-const ColumnNullable* col_nullable =
-
check_and_get_column(column_ptrs.back().get());
+ColumnPtr arg_col;
+bool arg_const;
+std::tie(arg_col, arg_const) =
+
unpack_if_const(block.get_by_position(arguments[i]).column);
+const auto* col_nullable = 
check_and_get_column(arg_col.get());
+column_is_consts.push_back(arg_const);
 if (col_nullable) {
 if (!is_nullable) {
 is_nullable = true;
-ret_null_map = ColumnUInt8::create(input_rows_count, 0);
 }
-const ColumnUInt8* col_nullmap = 
check_and_get_column(
+const ColumnUInt8* col_nullmap = assert_cast<const ColumnUInt8*>(
 col_nullable->get_null_map_column_ptr().get());
 nullmaps.push_back(col_nullmap);
-const ColumnString* col = check_and_get_column(
+const ColumnString* col = assert_cast<const ColumnString*>(
 col_nullable->get_nested_column_ptr().get());
 data_co
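The `unpack_if_const`/`index_check_const` pattern above lets the function read const columns without materializing them: a const column stores a single physical value, so any logical row index maps to physical index 0. In Python terms (a stand-in for illustration, not the Doris API):

```python
def index_check_const(row: int, is_const: bool) -> int:
    # Const columns hold one value shared by every logical row.
    return 0 if is_const else row

def get_data_at(column, row, is_const):
    # Mirrors column->get_data_at(index_check_const(row, is_const)).
    return column[index_check_const(row, is_const)]
```

This is why the rewrite can drop `convert_to_full_column_if_const`: instead of expanding the const column to `input_rows_count` copies, every read is redirected to the single stored value.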

(doris) branch master updated (8419b40b268 -> d582f11102d)

2024-07-03 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from 8419b40b268 [refact](meta-service) Split `commit_txn` into several 
functions (#36848)
 add d582f11102d [Refactor](timezone) refactor tzdata load to accelerate 
and unify timezone parsing (#37062)

No new revisions were added by this update.

Summary of changes:
 .gitignore |   1 -
 be/src/common/config.cpp   |   3 -
 be/src/common/config.h |   3 -
 be/src/runtime/exec_env_init.cpp   |   1 -
 be/src/util/timezone_utils.cpp | 309 +++--
 be/src/util/timezone_utils.h   |  19 +-
 be/test/vec/function/function_time_test.cpp|   4 +-
 .../utils/arrow_column_to_doris_column_test.cpp|   2 +-
 build.sh   |   7 -
 .../trees/expressions/literal/DateLiteral.java |  11 -
 .../trees/expressions/literal/DateTimeLiteral.java |  11 +-
 .../doris/nereids/util/DateTimeFormatterUtils.java |   1 +
 .../data/datatype_p0/datetimev2/test_timezone.out  |  11 +-
 .../datatype_p0/datetimev2/test_tz_streamload.csv  |   6 +-
 .../datatype_p0/datetimev2/test_tz_streamload2.csv |   6 +-
 .../datatype_p0/datetimev2/test_timezone.groovy|  45 ++-
 .../datetimev2/test_tz_streamload.groovy   |   2 +-
 .../jdbc/test_jdbc_query_mysql.groovy  |  22 +-
 .../fold_constant/fold_constant_by_fe.groovy   |   4 +-
 .../datetime_functions/test_date_function.groovy   |  10 +-
 .../datetime_functions/test_date_function.groovy   |  10 +-
 resource/zoneinfo.tar.gz   | Bin 134456 -> 0 bytes
 22 files changed, 126 insertions(+), 362 deletions(-)
 delete mode 100644 resource/zoneinfo.tar.gz





(doris) branch branch-2.1 updated: [fix](hash join) fix numeric overflow when calculating hash table bucket size #37193 (#37213)

2024-07-03 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch branch-2.1
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-2.1 by this push:
 new fb344b66cae [fix](hash join) fix numeric overflow when calculating 
hash table bucket size #37193 (#37213)
fb344b66cae is described below

commit fb344b66caefe32a07d7d175f54802652fc853a2
Author: TengJianPing <18241664+jackte...@users.noreply.github.com>
AuthorDate: Thu Jul 4 11:12:52 2024 +0800

[fix](hash join) fix numeric overflow when calculating hash table bucket 
size #37193 (#37213)

## Proposed changes

Backport of #37193.
---
 be/src/vec/common/hash_table/join_hash_table.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/be/src/vec/common/hash_table/join_hash_table.h 
b/be/src/vec/common/hash_table/join_hash_table.h
index a869ad419ad..99ce2d13b48 100644
--- a/be/src/vec/common/hash_table/join_hash_table.h
+++ b/be/src/vec/common/hash_table/join_hash_table.h
@@ -19,6 +19,8 @@
 
 #include 
 
+#include <limits>
+
 #include "vec/columns/column_filter_helper.h"
 #include "vec/common/hash_table/hash.h"
 #include "vec/common/hash_table/hash_table.h"
@@ -35,7 +37,8 @@ public:
 
 static uint32_t calc_bucket_size(size_t num_elem) {
 size_t expect_bucket_size = num_elem + (num_elem - 1) / 7;
-return phmap::priv::NormalizeCapacity(expect_bucket_size) + 1;
+return std::min(phmap::priv::NormalizeCapacity(expect_bucket_size) + 1,
+                static_cast<size_t>(std::numeric_limits<uint32_t>::max()));
 }
 
 size_t get_byte_size() const {

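The fix above clamps the bucket count to the `uint32_t` range so that `NormalizeCapacity(...) + 1` cannot overflow when truncated to the 32-bit return type. A Python model of the arithmetic (here `normalize_capacity` approximates `phmap::priv::NormalizeCapacity` as the smallest value of the form `2^k - 1` that is `>= n`):

```python
UINT32_MAX = 2**32 - 1

def normalize_capacity(n: int) -> int:
    # Smallest 2^k - 1 that is >= n (phmap's power-of-two growth policy).
    return (1 << n.bit_length()) - 1 if n > 0 else 0

def calc_bucket_size(num_elem: int) -> int:
    # num_elem plus ~1/7 slack, normalized, then clamped to uint32_t range.
    expect_bucket_size = num_elem + (num_elem - 1) // 7
    return min(normalize_capacity(expect_bucket_size) + 1, UINT32_MAX)
```

For inputs near 2^31 elements, `normalize_capacity` already returns `2^32 - 1`, so the unclamped `+ 1` wrapped to 0 in a 32-bit result; the `min` keeps the bucket size at the largest representable value instead.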




(doris) branch branch-2.0 updated (473c163ceda -> f94ddb7edc9)

2024-07-03 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch branch-2.0
in repository https://gitbox.apache.org/repos/asf/doris.git


from 473c163ceda [fix](merge-on-write) when full clone failed, duplicate 
key might occur (#37001) (#37227)
 add f94ddb7edc9 [branch-2.0](function) fix nereids fold constant wrong 
result of abs (#37065) (#37107)

No new revisions were added by this update.

Summary of changes:
 .../functions/executable/ExecutableFunctions.java  |  8 +--
 .../functions/ExecutableFunctionsTest.java | 64 ++
 2 files changed, 68 insertions(+), 4 deletions(-)
 create mode 100644 
fe/fe-core/src/test/java/org/apache/doris/nereids/trees/expressions/functions/ExecutableFunctionsTest.java





(doris) branch branch-2.0 updated (7c0b113aa83 -> e2caee628f5)

2024-07-03 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch branch-2.0
in repository https://gitbox.apache.org/repos/asf/doris.git


from 7c0b113aa83 [feature](datatype) add BE config to allow zero date 
(#34961) (#37214)
 add e2caee628f5 [branch-2.0](function) fix date_format and from_unixtime 
core when meet long format string (#35883) (#37178)

No new revisions were added by this update.

Summary of changes:
 be/src/olap/types.h| 20 ++---
 .../serde/data_type_datetimev2_serde.cpp   | 15 --
 be/src/vec/functions/date_time_transforms.h| 10 ---
 be/src/vec/runtime/vdatetime_value.cpp | 35 ++
 be/src/vec/runtime/vdatetime_value.h   | 17 +--
 .../data/datatype_p0/date/test_from_unixtime.out   |  3 ++
 .../datetime_functions/test_date_function.out  |  3 ++
 .../datatype_p0/date/test_from_unixtime.groovy |  1 +
 .../datetime_functions/test_date_function.groovy   |  1 +
 9 files changed, 67 insertions(+), 38 deletions(-)





(doris) branch master updated (4984222d8eb -> cb795bd4b73)

2024-07-03 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from 4984222d8eb [Fix](schema change) Fix can't do reorder column schema 
change for MOW table and duplicate key table (#37067)
 add cb795bd4b73 [Performance](func) Opt the any_value agg function no 
group by (#37156)

No new revisions were added by this update.

Summary of changes:
 be/src/vec/aggregate_functions/aggregate_function_min_max.h | 13 +
 1 file changed, 13 insertions(+)





(doris) branch branch-2.1 updated: [fix](null safe equal join) fix coredump if both sides of the conjunct is not nullable #36263 (#37073)

2024-07-01 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch branch-2.1
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-2.1 by this push:
 new 6789f5bc80d [fix](null safe equal join) fix coredump if both sides of 
the conjunct is not nullable #36263 (#37073)
6789f5bc80d is described below

commit 6789f5bc80db8742efbf54ac2a5562325914ff91
Author: TengJianPing <18241664+jackte...@users.noreply.github.com>
AuthorDate: Tue Jul 2 11:01:55 2024 +0800

[fix](null safe equal join) fix coredump if both sides of the conjunct is 
not nullable #36263 (#37073)
---
 be/src/pipeline/exec/hashjoin_build_sink.cpp   |  10 +-
 be/src/pipeline/exec/hashjoin_probe_operator.cpp   |   8 +-
 .../data/query_p0/join/rqg/rqg12257/rqg12257.out   |   5 +
 .../query_p0/join/rqg/rqg12257/rqg12257.groovy | 341 +
 4 files changed, 360 insertions(+), 4 deletions(-)

diff --git a/be/src/pipeline/exec/hashjoin_build_sink.cpp 
b/be/src/pipeline/exec/hashjoin_build_sink.cpp
index 56480096e3d..ccfd934af99 100644
--- a/be/src/pipeline/exec/hashjoin_build_sink.cpp
+++ b/be/src/pipeline/exec/hashjoin_build_sink.cpp
@@ -486,8 +486,14 @@ Status HashJoinBuildSinkOperatorX::init(const TPlanNode& 
tnode, RuntimeState* st
 const auto vexpr = _build_expr_ctxs.back()->root();
 
 /// null safe equal means null = null is true, the operator in SQL 
should be: <=>.
-const bool is_null_safe_equal = eq_join_conjunct.__isset.opcode &&
-eq_join_conjunct.opcode == 
TExprOpcode::EQ_FOR_NULL;
+const bool is_null_safe_equal =
+eq_join_conjunct.__isset.opcode &&
+(eq_join_conjunct.opcode == TExprOpcode::EQ_FOR_NULL) &&
+// For a null safe equal join, FE may generate a plan that
+// both sides of the conjunct are not nullable, we just treat it
+// as a normal equal join conjunct.
+(eq_join_conjunct.right.nodes[0].is_nullable ||
+ eq_join_conjunct.left.nodes[0].is_nullable);
 
 const bool should_convert_to_nullable = is_null_safe_equal &&
 
!eq_join_conjunct.right.nodes[0].is_nullable &&
diff --git a/be/src/pipeline/exec/hashjoin_probe_operator.cpp 
b/be/src/pipeline/exec/hashjoin_probe_operator.cpp
index 00cf6a65eb0..002a79f2db2 100644
--- a/be/src/pipeline/exec/hashjoin_probe_operator.cpp
+++ b/be/src/pipeline/exec/hashjoin_probe_operator.cpp
@@ -540,13 +540,17 @@ Status HashJoinProbeOperatorX::init(const TPlanNode& 
tnode, RuntimeState* state)
 
RETURN_IF_ERROR(vectorized::VExpr::create_expr_tree(eq_join_conjunct.left, 
ctx));
 _probe_expr_ctxs.push_back(ctx);
 bool null_aware = eq_join_conjunct.__isset.opcode &&
-  eq_join_conjunct.opcode == TExprOpcode::EQ_FOR_NULL;
+  eq_join_conjunct.opcode == TExprOpcode::EQ_FOR_NULL 
&&
+  (eq_join_conjunct.right.nodes[0].is_nullable ||
+   eq_join_conjunct.left.nodes[0].is_nullable);
 probe_not_ignore_null[conjuncts_index] =
 null_aware ||
 (_probe_expr_ctxs.back()->root()->is_nullable() && 
probe_dispose_null);
 conjuncts_index++;
 const bool is_null_safe_equal = eq_join_conjunct.__isset.opcode &&
-eq_join_conjunct.opcode == 
TExprOpcode::EQ_FOR_NULL;
+(eq_join_conjunct.opcode == 
TExprOpcode::EQ_FOR_NULL) &&
+
(eq_join_conjunct.right.nodes[0].is_nullable ||
+ 
eq_join_conjunct.left.nodes[0].is_nullable);
 
 /// If it's right anti join,
 /// we should convert the probe to nullable if the build side is 
nullable.
diff --git a/regression-test/data/query_p0/join/rqg/rqg12257/rqg12257.out 
b/regression-test/data/query_p0/join/rqg/rqg12257/rqg12257.out
new file mode 100644
index 000..e82718978b6
--- /dev/null
+++ b/regression-test/data/query_p0/join/rqg/rqg12257/rqg12257.out
@@ -0,0 +1,5 @@
+-- This file is automatically generated. You should know what you did if you 
want to edit this
+-- !rqg12257 --
+
+-- !rqg12257_2 --
+
diff --git a/regression-test/suites/query_p0/join/rqg/rqg12257/rqg12257.groovy 
b/regression-test/suites/query_p0/join/rqg/rqg12257/rqg12257.groovy
new file mode 100644
index 000..c04a1f460d4
--- /dev/null
+++ b/regression-test/suites/query_p0/join/rqg/rqg12257/rqg12257.groovy
@@ -0,0 +1,341 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for
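The predicate change in this patch can be sketched as a standalone check; the struct and field names below are stand-ins for the Thrift plan types, not Doris APIs. A `<=>` conjunct is only treated as null-safe when at least one side can actually produce NULL, otherwise it degenerates to a normal equi-join conjunct:

```cpp
// Hypothetical mirror of the patched is_null_safe_equal logic.
struct EqJoinConjunct {
    bool is_eq_for_null;  // opcode == TExprOpcode::EQ_FOR_NULL
    bool left_nullable;   // left.nodes[0].is_nullable
    bool right_nullable;  // right.nodes[0].is_nullable
};

// Null-safe equality is only meaningful if a NULL can appear on some side.
static bool is_null_safe_equal(const EqJoinConjunct& c) {
    return c.is_eq_for_null && (c.left_nullable || c.right_nullable);
}
```

This is what prevents the coredump: a plan where both sides are non-nullable no longer takes the null-aware code path.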

(doris) branch master updated (06f875726b4 -> 6c03ca1641e)

2024-07-01 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from 06f875726b4 [opt](function) opt pad function process the input const 
column (#36863)
 add 6c03ca1641e [opt](function) opt ParseUrl function by process the input 
const column (#36882)

No new revisions were added by this update.

Summary of changes:
 be/src/vec/columns/column_const.h  |   9 +++
 be/src/vec/functions/function_string.h | 134 +
 2 files changed, 93 insertions(+), 50 deletions(-)





(doris) branch master updated (606ce442438 -> 06f875726b4)

2024-07-01 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from 606ce442438 [opt](nereids) add join shuffle type in explain shape plan 
(#36712)
 add 06f875726b4 [opt](function) opt pad function process the input const 
column (#36863)

No new revisions were added by this update.

Summary of changes:
 be/src/vec/functions/function_string.h | 151 +
 1 file changed, 78 insertions(+), 73 deletions(-)





(doris) branch branch-2.1 updated: [opt](log) Remove unnecessary warning log (#37093)

2024-07-01 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch branch-2.1
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-2.1 by this push:
 new 011f203d717 [opt](log) Remove unnecessary warning log (#37093)
011f203d717 is described below

commit 011f203d717aab2fb37218b6bb9dedbf97892023
Author: zhiqiang 
AuthorDate: Tue Jul 2 10:53:36 2024 +0800

[opt](log) Remove unnecessary warning log (#37093)

When enable_profile = true or report_succeed = true,
fe/fe-core/src/main/java/org/apache/doris/qe/QeProcessorImpl.java::reportExecStatus
is very likely to print many warning logs. They are not necessary.
---
 fe/fe-core/src/main/java/org/apache/doris/qe/QeProcessorImpl.java | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fe/fe-core/src/main/java/org/apache/doris/qe/QeProcessorImpl.java 
b/fe/fe-core/src/main/java/org/apache/doris/qe/QeProcessorImpl.java
index a4cd867a31e..8bc189321bd 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/qe/QeProcessorImpl.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/qe/QeProcessorImpl.java
@@ -245,8 +245,7 @@ public final class QeProcessorImpl implements QeProcessor {
 } else {
 result.setStatus(new TStatus(TStatusCode.RUNTIME_ERROR));
 }
-LOG.warn("ReportExecStatus() runtime error, query {} with type {} 
does not exist",
-DebugUtil.printId(params.query_id), params.query_type);
+
 return result;
 }
 try {





(doris) branch master updated (6a9f003944c -> 98ac57c3af4)

2024-07-01 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from 6a9f003944c [improve](json)improve json support empty keys (#36762)
 add 98ac57c3af4 [fix](stmt) fix show create table consistency (#37074)

No new revisions were added by this update.

Summary of changes:
 fe/fe-core/src/main/java/org/apache/doris/analysis/SlotRef.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)





(doris) branch master updated: [opt](function)avoid virtual function calls in geo functions (#37003)

2024-06-30 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 444f96aa913 [opt](function)avoid virtual function calls in geo 
functions (#37003)
444f96aa913 is described below

commit 444f96aa9136f81ccd53244e0e41769f54f0e064
Author: Mryange <59914473+mrya...@users.noreply.github.com>
AuthorDate: Mon Jul 1 12:53:32 2024 +0800

[opt](function)avoid virtual function calls in geo functions (#37003)
---
 be/src/vec/functions/functions_geo.cpp | 285 +
 be/src/vec/functions/functions_geo.h   |   5 +-
 2 files changed, 189 insertions(+), 101 deletions(-)

diff --git a/be/src/vec/functions/functions_geo.cpp 
b/be/src/vec/functions/functions_geo.cpp
index 036033db2a2..b389bc1636e 100644
--- a/be/src/vec/functions/functions_geo.cpp
+++ b/be/src/vec/functions/functions_geo.cpp
@@ -26,6 +26,7 @@
 #include "geo/geo_common.h"
 #include "geo/geo_types.h"
 #include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
 #include "vec/columns/columns_number.h"
 #include "vec/common/string_ref.h"
 #include "vec/core/block.h"
@@ -33,6 +34,7 @@
 #include "vec/core/field.h"
 #include "vec/data_types/data_type_nullable.h"
 #include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
 #include "vec/functions/simple_function_factory.h"
 
 namespace doris::vectorized {
@@ -41,6 +43,7 @@ struct StPoint {
 static constexpr auto NEED_CONTEXT = false;
 static constexpr auto NAME = "st_point";
 static const size_t NUM_ARGS = 2;
+using Type = DataTypeString;
 static Status execute(Block& block, const ColumnNumbers& arguments, size_t 
result) {
 DCHECK_EQ(arguments.size(), 2);
 auto return_type = block.get_data_type(result);
@@ -52,26 +55,29 @@ struct StPoint {
 
 const auto size = std::max(left_column->size(), right_column->size());
 
-MutableColumnPtr res = return_type->create_column();
-
+auto res = ColumnString::create();
+auto null_map = ColumnUInt8::create(size, 0);
+auto& null_map_data = null_map->get_data();
 GeoPoint point;
 std::string buf;
 if (left_const) {
-const_vector(left_column, right_column, res, size, point, buf);
+const_vector(left_column, right_column, res, null_map_data, size, 
point, buf);
 } else if (right_const) {
-vector_const(left_column, right_column, res, size, point, buf);
+vector_const(left_column, right_column, res, null_map_data, size, 
point, buf);
 } else {
-vector_vector(left_column, right_column, res, size, point, buf);
+vector_vector(left_column, right_column, res, null_map_data, size, 
point, buf);
 }
 
-block.replace_by_position(result, std::move(res));
+block.replace_by_position(result,
+  ColumnNullable::create(std::move(res), 
std::move(null_map)));
 return Status::OK();
 }
 
-static void loop_do(GeoParseStatus& cur_res, MutableColumnPtr& res, 
GeoPoint& point,
-std::string& buf) {
+static void loop_do(GeoParseStatus& cur_res, ColumnString::MutablePtr& 
res, NullMap& null_map,
+int row, GeoPoint& point, std::string& buf) {
 if (cur_res != GEO_PARSE_OK) {
-res->insert_data(nullptr, 0);
+null_map[row] = 1;
+res->insert_default();
 return;
 }
 
@@ -81,32 +87,32 @@ struct StPoint {
 }
 
 static void const_vector(const ColumnPtr& left_column, const ColumnPtr& 
right_column,
- MutableColumnPtr& res, const size_t size, 
GeoPoint& point,
- std::string& buf) {
+ ColumnString::MutablePtr& res, NullMap& null_map, 
const size_t size,
+ GeoPoint& point, std::string& buf) {
 double x = left_column->operator[](0).get();
 for (int row = 0; row < size; ++row) {
 auto cur_res = point.from_coord(x, 
right_column->operator[](row).get());
-loop_do(cur_res, res, point, buf);
+loop_do(cur_res, res, null_map, row, point, buf);
 }
 }
 
 static void vector_const(const ColumnPtr& left_column, const ColumnPtr& 
right_column,
- MutableColumnPtr& res, const size_t size, 
GeoPoint& point,
- std::string& buf) {
+ ColumnString::MutablePtr& res, NullMap& null
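The pattern in the patch above is to build a concrete value column plus a separate null map, instead of inserting through the virtual nullable-column interface on every row, and only wrap the pair in a nullable column at the end. A minimal sketch with plain vectors as stand-ins (these are not Doris column types):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-ins for ColumnString + NullMap: values and a parallel 0/1 null map.
struct NullableStrings {
    std::vector<std::string> values;
    std::vector<uint8_t> null_map;  // 1 = NULL on that row
};

// parse_ok simulates the per-row GeoParseStatus from GeoPoint::from_coord.
static NullableStrings encode_points(const std::vector<bool>& parse_ok) {
    NullableStrings out;
    out.null_map.assign(parse_ok.size(), 0);
    for (size_t row = 0; row < parse_ok.size(); ++row) {
        if (!parse_ok[row]) {
            out.null_map[row] = 1;
            out.values.emplace_back();              // like res->insert_default()
        } else {
            out.values.emplace_back("POINT(x y)");  // stand-in encoded value
        }
    }
    return out;
}
```

Writing into the concrete column directly is what removes the per-row virtual calls that this optimization targets.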

(doris) branch master updated: [Exec](agg) Fix agg limit result error (#37025)

2024-06-30 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
 new 2e6fdc0 [Exec](agg) Fix agg limit result error (#37025)
2e6fdc0 is described below

commit 2e6fdc0815021579cbc137f43d7bb6fc2ac7
Author: HappenLee 
AuthorDate: Mon Jul 1 09:49:04 2024 +0800

[Exec](agg) Fix agg limit result error (#37025)

This PR should be merged first, before merging #34853.
---
 be/src/pipeline/dependency.cpp   | 10 ++
 be/src/pipeline/dependency.h |  3 ++-
 be/src/pipeline/exec/aggregation_sink_operator.cpp   |  4 +++-
 be/src/pipeline/exec/aggregation_source_operator.cpp |  8 +++-
 4 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/be/src/pipeline/dependency.cpp b/be/src/pipeline/dependency.cpp
index 68c00af409d..4938883062a 100644
--- a/be/src/pipeline/dependency.cpp
+++ b/be/src/pipeline/dependency.cpp
@@ -248,7 +248,8 @@ void AggSharedState::build_limit_heap(size_t 
hash_table_size) {
 limit_columns_min = limit_heap.top()._row_id;
 }
 
-bool AggSharedState::do_limit_filter(vectorized::Block* block, size_t 
num_rows) {
+bool AggSharedState::do_limit_filter(vectorized::Block* block, size_t num_rows,
+ const std::vector* key_locs) {
 if (num_rows) {
 cmp_res.resize(num_rows);
 need_computes.resize(num_rows);
@@ -257,9 +258,10 @@ bool AggSharedState::do_limit_filter(vectorized::Block* 
block, size_t num_rows)
 
 const auto key_size = null_directions.size();
 for (int i = 0; i < key_size; i++) {
-block->get_by_position(i).column->compare_internal(
-limit_columns_min, *limit_columns[i], null_directions[i], 
order_directions[i],
-cmp_res, need_computes.data());
+block->get_by_position(key_locs ? key_locs->operator[](i) : i)
+.column->compare_internal(limit_columns_min, 
*limit_columns[i],
+  null_directions[i], 
order_directions[i], cmp_res,
+  need_computes.data());
 }
 
 auto set_computes_arr = [](auto* __restrict res, auto* __restrict 
computes, int rows) {
diff --git a/be/src/pipeline/dependency.h b/be/src/pipeline/dependency.h
index 5214022db13..8adc24d3b4e 100644
--- a/be/src/pipeline/dependency.h
+++ b/be/src/pipeline/dependency.h
@@ -311,7 +311,8 @@ public:
 
 Status reset_hash_table();
 
-bool do_limit_filter(vectorized::Block* block, size_t num_rows);
+bool do_limit_filter(vectorized::Block* block, size_t num_rows,
+ const std::vector* key_locs = nullptr);
 void build_limit_heap(size_t hash_table_size);
 
 // We should call this function only at 1st phase.
diff --git a/be/src/pipeline/exec/aggregation_sink_operator.cpp 
b/be/src/pipeline/exec/aggregation_sink_operator.cpp
index fae987394b4..1dab1669dd5 100644
--- a/be/src/pipeline/exec/aggregation_sink_operator.cpp
+++ b/be/src/pipeline/exec/aggregation_sink_operator.cpp
@@ -329,6 +329,7 @@ Status 
AggSinkLocalState::_merge_with_serialized_key_helper(vectorized::Block* b
 if (limit) {
 need_do_agg = _emplace_into_hash_table_limit(_places.data(), 
block, key_locs,
  key_columns, rows);
+rows = block->rows();
 } else {
 _emplace_into_hash_table(_places.data(), key_columns, rows);
 }
@@ -589,7 +590,8 @@ bool 
AggSinkLocalState::_emplace_into_hash_table_limit(vectorized::AggregateData
 bool need_filter = false;
 {
 SCOPED_TIMER(_hash_table_limit_compute_timer);
-need_filter = 
_shared_state->do_limit_filter(block, num_rows);
+need_filter =
+_shared_state->do_limit_filter(block, 
num_rows, &key_locs);
 }
 
 auto& need_computes = _shared_state->need_computes;
diff --git a/be/src/pipeline/exec/aggregation_source_operator.cpp 
b/be/src/pipeline/exec/aggregation_source_operator.cpp
index 5b371877f36..1b7a151e2af 100644
--- a/be/src/pipeline/exec/aggregation_source_operator.cpp
+++ b/be/src/pipeline/exec/aggregation_source_operator.cpp
@@ -452,8 +452,14 @@ void AggLocalState::do_agg_limit(vectorized::Block* block, 
bool* eos) {
 if (_shared_state->reach_limit) {
 if (_shared_state->do_sort_limit && 
_shared_state->do_limit_filter(block, block->rows())) {
 vectorized::Block::filter_block_internal(block, 
_shared_state->need_computes);
+if (auto rows = block->rows()) {
+   
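The `key_locs` parameter added above matters because in the merge path the grouping-key columns are not necessarily at block positions 0..k-1; the filter must compare the column at `key_locs[i]`, not at position `i`. A single-key sketch of that indirection (plain int vectors stand in for columns; this is not the Doris API):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Column = std::vector<int>;

// Keep rows whose key is below the current limit-heap minimum, reading the
// key column through the key_locs indirection introduced by the patch.
static std::vector<uint8_t> limit_filter(const std::vector<Column>& block,
                                         const std::vector<size_t>& key_locs,
                                         int heap_min) {
    const Column& key = block[key_locs[0]];  // not block[0]
    std::vector<uint8_t> keep(key.size(), 0);
    for (size_t i = 0; i < key.size(); ++i) {
        keep[i] = key[i] < heap_min ? 1 : 0;
    }
    return keep;
}
```

Without the indirection the comparison would run against a non-key column, which is the kind of wrong result this fix addresses.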

(doris) branch branch-2.1 updated: [Bug](runtime-filter) disable sync filter when pipeline engine is off (#36994)

2024-06-28 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch branch-2.1
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-2.1 by this push:
 new cb80ae906f2 [Bug](runtime-filter) disable sync filter when pipeline 
engine is off (#36994)
cb80ae906f2 is described below

commit cb80ae906f2a0e7e4a57b77ba645a81e238b3fcf
Author: Pxl 
AuthorDate: Fri Jun 28 16:59:26 2024 +0800

[Bug](runtime-filter) disable sync filter when pipeline engine is off 
(#36994)

## Proposed changes
1. disable sync filter when pipeline engine is off
2. reduce some warning log
---
 be/src/exprs/runtime_filter.cpp | 2 +-
 be/src/runtime/runtime_state.cpp| 6 --
 be/src/vec/runtime/vdata_stream_mgr.cpp | 4 ++--
 3 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/be/src/exprs/runtime_filter.cpp b/be/src/exprs/runtime_filter.cpp
index 39eb814bbea..1271ec39156 100644
--- a/be/src/exprs/runtime_filter.cpp
+++ b/be/src/exprs/runtime_filter.cpp
@@ -1852,7 +1852,7 @@ RuntimeFilterType IRuntimeFilter::get_real_type() {
 bool IRuntimeFilter::need_sync_filter_size() {
 return (type() == RuntimeFilterType::IN_OR_BLOOM_FILTER ||
 type() == RuntimeFilterType::BLOOM_FILTER) &&
-   _wrapper->get_build_bf_cardinality() && !_is_broadcast_join;
+   _wrapper->get_build_bf_cardinality() && !_is_broadcast_join && 
_enable_pipeline_exec;
 }
 
 Status IRuntimeFilter::update_filter(const UpdateRuntimeFilterParams* param) {
diff --git a/be/src/runtime/runtime_state.cpp b/be/src/runtime/runtime_state.cpp
index 75d06adc561..2713ee441dd 100644
--- a/be/src/runtime/runtime_state.cpp
+++ b/be/src/runtime/runtime_state.cpp
@@ -544,15 +544,9 @@ Status 
RuntimeState::register_consumer_runtime_filter(const doris::TRuntimeFilte
   bool need_local_merge, 
int node_id,
   doris::IRuntimeFilter** 
consumer_filter) {
 if (desc.has_remote_targets || need_local_merge) {
-LOG(WARNING) << "registe global ins:" << _profile.name()
- << " ,mgr: " << global_runtime_filter_mgr()
- << " ,filter id:" << desc.filter_id;
 return global_runtime_filter_mgr()->register_consumer_filter(desc, 
query_options(), node_id,
  
consumer_filter, false, true);
 } else {
-LOG(WARNING) << "registe local ins:" << _profile.name()
- << " ,mgr: " << global_runtime_filter_mgr()
- << " ,filter id:" << desc.filter_id;
 return local_runtime_filter_mgr()->register_consumer_filter(desc, 
query_options(), node_id,
 
consumer_filter, false, false);
 }
diff --git a/be/src/vec/runtime/vdata_stream_mgr.cpp 
b/be/src/vec/runtime/vdata_stream_mgr.cpp
index 46d335fbf00..4e48effb566 100644
--- a/be/src/vec/runtime/vdata_stream_mgr.cpp
+++ b/be/src/vec/runtime/vdata_stream_mgr.cpp
@@ -97,8 +97,8 @@ Status VDataStreamMgr::find_recvr(const TUniqueId& 
fragment_instance_id, PlanNod
 }
 ++range.first;
 }
-return Status::InternalError("Could not find local receiver for node {} 
with instance {}",
- node_id, print_id(fragment_instance_id));
+return Status::InvalidArgument("Could not find local receiver for node {} 
with instance {}",
+   node_id, print_id(fragment_instance_id));
 }
 
 Status VDataStreamMgr::transmit_block(const PTransmitDataParams* request,





(doris) branch branch-2.1 updated: [fix](bitmap) incorrect type of BitmapValue with fastunion (#36834) (#36896)

2024-06-27 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a commit to branch branch-2.1
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-2.1 by this push:
 new f27ae8fa097 [fix](bitmap) incorrect type of BitmapValue with fastunion 
(#36834) (#36896)
f27ae8fa097 is described below

commit f27ae8fa097902d2c7cfb7f0020ebeed5c7f95ac
Author: Jerry Hu 
AuthorDate: Fri Jun 28 11:29:03 2024 +0800

[fix](bitmap) incorrect type of BitmapValue with fastunion (#36834) (#36896)
---
 be/src/util/bitmap_value.h |  5 +++--
 be/test/util/bitmap_value_test.cpp | 23 +++
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/be/src/util/bitmap_value.h b/be/src/util/bitmap_value.h
index d8f68a227e7..ea99fa58baf 100644
--- a/be/src/util/bitmap_value.h
+++ b/be/src/util/bitmap_value.h
@@ -1652,7 +1652,6 @@ public:
 case SINGLE: {
 _set.insert(_sv);
 _type = SET;
-_convert_to_bitmap_if_need();
 break;
 }
 case BITMAP:
@@ -1663,10 +1662,12 @@ public:
 _type = BITMAP;
 break;
 case SET: {
-_convert_to_bitmap_if_need();
 break;
 }
 }
+if (_type == SET) {
+_convert_to_bitmap_if_need();
+}
 }
 
 if (_type == EMPTY && single_values.size() == 1) {
diff --git a/be/test/util/bitmap_value_test.cpp 
b/be/test/util/bitmap_value_test.cpp
index e7652199ab0..d0ad3a82fda 100644
--- a/be/test/util/bitmap_value_test.cpp
+++ b/be/test/util/bitmap_value_test.cpp
@@ -1026,6 +1026,29 @@ TEST(BitmapValueTest, bitmap_union) {
 EXPECT_EQ(3, bitmap3.cardinality());
 bitmap3.fastunion({&bitmap});
 EXPECT_EQ(5, bitmap3.cardinality());
+
+const auto old_config = config::enable_set_in_bitmap_value;
+config::enable_set_in_bitmap_value = true;
+BitmapValue bitmap4; // empty
+
+BitmapValue bitmap_set1;
+BitmapValue bitmap_set2;
+BitmapValue bitmap_set3;
+
+const int set_data1[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 
15};
+bitmap_set1.add_many(set_data1, 15);
+
+const int set_data2[] = {16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 
28, 29, 30};
+bitmap_set2.add_many(set_data2, 15);
+
+const int set_data3[] = {31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 
43, 44, 45};
+bitmap_set3.add_many(set_data3, 15);
+
+bitmap4.fastunion({&bitmap_set1, &bitmap_set2, &bitmap_set3});
+
+EXPECT_EQ(bitmap4.cardinality(), 45);
+EXPECT_EQ(bitmap4.get_type_code(), BitmapTypeCode::BITMAP32);
+config::enable_set_in_bitmap_value = old_config;
 }
 
 TEST(BitmapValueTest, bitmap_intersect) {
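The intent of the fix above is that during `fastunion` the accumulator may stay in SET representation while merging, and the SET-to-BITMAP promotion happens once after all inputs are merged, so the final type code reflects the merged cardinality. A toy sketch (the threshold value and types are assumptions, not the real `BitmapValue`):

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

enum class Repr { EMPTY, SINGLE, SET, BITMAP };

struct TinyBitmap {
    Repr type = Repr::EMPTY;
    std::set<uint64_t> set;
    static constexpr std::size_t kSetLimit = 32;  // assumed promotion threshold

    void add(uint64_t v) {
        set.insert(v);
        type = Repr::SET;
        convert_if_needed();
    }
    void fastunion(const TinyBitmap& other) {
        set.insert(other.set.begin(), other.set.end());
        if (type != Repr::BITMAP) {
            type = Repr::SET;
        }
        convert_if_needed();  // the fix: promote after merging, not per input
    }
    void convert_if_needed() {
        if (type == Repr::SET && set.size() > kSetLimit) {
            type = Repr::BITMAP;
        }
    }
};
```

Mirroring the regression test above, the union of three 15-element sets crosses the threshold and ends up typed as a bitmap rather than an oversized set.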





(doris) branch master updated (6b300ff6af6 -> 4012bceb3c8)

2024-06-27 Thread lihaopeng
This is an automated email from the ASF dual-hosted git repository.

lihaopeng pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


from 6b300ff6af6 [feat](function) Function to encode/decode varchar (#36649)
 add 4012bceb3c8 [Bug](cast) fix cast string to int return wrong result 
(#36788)

No new revisions were added by this update.

Summary of changes:
 be/src/util/string_parser.hpp|  5 +
 regression-test/data/datatype_p0/json/json_cast.out  | 12 
 regression-test/suites/datatype_p0/json/json_cast.groovy |  6 +-
 3 files changed, 22 insertions(+), 1 deletion(-)




