Mryange commented on code in PR #63128:
URL: https://github.com/apache/doris/pull/63128#discussion_r3217123288


##########
be/src/exprs/aggregate/aggregate_function_collect.h:
##########
@@ -393,28 +394,188 @@ struct AggregateFunctionCollectListData<T, HasLimit> {
     void insert_result_into(IColumn& to) const { 
to.insert_range_from(*column_data, 0, size()); }
 };
 
-template <typename Data, bool HasLimit>
+// V2 complex type specialization: uses IColumn::serialize_value_into_arena /
+// deserialize_and_insert_from_arena for binary serialization instead of JSON 
text.
+template <PrimitiveType T, bool HasLimit>
+    requires(!is_string_type(T) && !is_int_or_bool(T) && 
!is_float_or_double(T) && !is_decimal(T) &&
+             !is_date_type(T) && !is_ip(T) && !is_timestamptz_type(T))
+struct AggregateFunctionCollectListDataV2 {
+    static constexpr PrimitiveType PType = T;
+    static constexpr bool is_collect_list = true;
+    using ElementType = StringRef;
+    using Self = AggregateFunctionCollectListDataV2<T, HasLimit>;
+    MutableColumnPtr column_data;
+    Int64 max_size = -1;
+
+    AggregateFunctionCollectListDataV2(const DataTypes& argument_types) {
+        column_data = argument_types[0]->create_column();
+    }
+
+    size_t size() const { return column_data->size(); }
+
+    void add(const IColumn& column, size_t row_num) { 
column_data->insert_from(column, row_num); }
+
+    void merge(const Self& rhs) {
+        if constexpr (HasLimit) {
+            if (max_size == -1) {
+                max_size = rhs.max_size;
+            }
+            max_size = rhs.max_size;
+            column_data->insert_range_from(
+                    *rhs.column_data, 0,
+                    std::min(static_cast<size_t>(max_size - size()), 
rhs.size()));
+        } else {

Review Comment:
   > This V2 complex merge path can exceed the requested limit when different 
partial states have different `max_size` values. For example, with 
`collect_list_v2(array_col, limit_expr)`, one fragment can build a state with 
`max_size = 10` and 10 rows while another builds `rhs.max_size = 1`; merging 
sets `max_size` to 1 and then `static_cast<size_t>(max_size - size())` 
underflows, so `min(huge, rhs.size())` appends the RHS rows instead of 
appending none. The primitive list implementation checks `size() >= max_size` 
before each append, and the set V2 path has a similar guard. Please add the 
same guard here, e.g. return when `max_size != -1 && static_cast<Int64>(size()) 
>= max_size` before computing the remaining capacity, and cover the 
distributed/merge limit case in a test.
   
   这个在V1上也是同样的逻辑,不要检查这块相同位数的有无符号的转换(如果位数变小了,这个才需要检查)
   
   重新review



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to