Copilot commented on code in PR #61503:
URL: https://github.com/apache/doris/pull/61503#discussion_r2957615086


##########
fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java:
##########
@@ -1490,6 +1491,9 @@ public enum IgnoreSplitType {
             needForward = true)
     private boolean enableParallelScan = true;
 
+    @VariableMgr.VarAttr(name = ENABLE_AGGREGATE_FUNCTION_NULL_V2, fuzzy = 
true, needForward = true)
+    private boolean enableAggregateFunctionNullV2 = true;
+

Review Comment:
   SessionVariable defaults for the new forwarded options look inconsistent 
with their Thrift defaults: enable_aggregate_function_null_v2 is declared in 
Thrift with default false, but SessionVariable initializes 
enableAggregateFunctionNullV2 to true and always sets it in toThrift(), 
effectively turning the feature on by default for all queries. If this is meant 
to be opt-in (safer for compatibility), set the Java default to false (and/or 
only set the Thrift field when explicitly enabled). Similar concern applies to 
enableAdjustConjunctOrderByCost defaulting to true while the Thrift field has 
no default.



##########
be/src/agent/be_exec_version_manager.h:
##########
@@ -25,13 +25,7 @@
 
 namespace doris {
 
-constexpr inline int USE_NEW_SERDE = 4;         // release on DORIS version 2.1
-constexpr inline int AGG_FUNCTION_NULLABLE = 5; // change some agg nullable 
property: PR #37215
-constexpr inline int VARIANT_SERDE = 6;         // change variant serde to fix 
PR #38413
-constexpr inline int AGGREGATION_2_1_VERSION =
-        6; // some aggregation changed the data format after this version
-constexpr inline int USE_CONST_SERDE =
-        8; // support const column in serialize/deserialize function: PR #41175
+constexpr inline int USE_NEW_FIXED_OBJECT_SERIALIZATION_VERSION = 10;
 

Review Comment:
   be_exec_version_manager.h now only defines 
USE_NEW_FIXED_OBJECT_SERIALIZATION_VERSION, but many BE code paths still 
reference older version constants like USE_CONST_SERDE / VARIANT_SERDE / 
USE_NEW_SERDE (e.g. multiple vec/data_types files). As-is this will not 
compile. Please either keep the previous constants here (even if deprecated) or 
update all remaining references to use the new versioning scheme / explicit 
numeric values.
   



##########
be/src/vec/data_types/data_type_fixed_length_object.cpp:
##########
@@ -136,7 +190,21 @@ const char* DataTypeFixedLengthObject::deserialize(const 
char* buf, MutableColum
 // data  : item data1 | item data2...
 int64_t DataTypeFixedLengthObject::get_uncompressed_serialized_bytes(const 
IColumn& column,
                                                                      int 
be_exec_version) const {
-    if (be_exec_version >= USE_CONST_SERDE) {
+    if (be_exec_version >= USE_NEW_FIXED_OBJECT_SERIALIZATION_VERSION) {
+        // New format size calculation with streamvbyte
+        auto size = sizeof(bool) + sizeof(size_t) + sizeof(size_t) + 
sizeof(size_t);
+        auto real_need_copy_num = is_column_const(column) ? 1 : column.size();
+        const auto& src_col = assert_cast<const ColumnType&>(column);
+        auto mem_size = src_col.item_size() * real_need_copy_num;
+        if (mem_size <= SERIALIZED_MEM_SIZE_LIMIT) {
+            return size + mem_size;
+        } else {
+            // Throw exception if mem_size is large than UINT32_MAX
+            return size + sizeof(size_t) +
+                   std::max(mem_size, streamvbyte_max_compressedbytes(
+                                              
cast_set<UInt32>(upper_int32(mem_size))));
+        }

Review Comment:
   DataTypeFixedLengthObject::get_uncompressed_serialized_bytes() overestimates 
the new-format size by always including an extra sizeof(size_t) 
("encode_size"), even when mem_size <= SERIALIZED_MEM_SIZE_LIMIT and 
serialize() does not write that field. This mismatch is why the unit test 
removed the pointer-advance ASSERT, but it also weakens safety (callers may 
rely on the returned size for exact buffer bounds). Please make the size 
calculation match the exact bytes written for both branches, and restore/keep 
the serialize() contract that the returned pointer equals buf + computed_size.



##########
be/src/vec/aggregate_functions/aggregate_function.h:
##########
@@ -178,8 +177,10 @@ class IAggregateFunction {
                                                          const IColumn& 
column, size_t begin,
                                                          size_t end, Arena&) 
const = 0;
 
-    virtual void deserialize_and_merge_from_column(AggregateDataPtr __restrict 
place,
-                                                   const IColumn& column, 
Arena&) const = 0;
+    void deserialize_and_merge_from_column(AggregateDataPtr __restrict place, 
const IColumn& column,
+                                           Arena& arena) const {
+        deserialize_and_merge_from_column_range(place, column, 0, 
column.size() - 1, arena);
+    }

Review Comment:
   IAggregateFunction::deserialize_and_merge_from_column() calls 
deserialize_and_merge_from_column_range(place, column, 0, column.size() - 1, 
...). When column is empty this underflows (size_t) and can lead to huge 
indices / OOB access in release builds. Add an empty-column guard (return 
early), and consider documenting that the range end is inclusive to avoid 
future off-by-one mistakes.



##########
be/src/vec/aggregate_functions/helpers.h:
##########
@@ -68,12 +69,13 @@
                                     decltype(&IAggregateFunctionHelper<        
                    \
                                              
FunctionTemplate>::serialize_without_key_to_column)>, \
                     "need to override serialize_without_key_to_column");       
                    \
-            static_assert(!std::is_same_v<                                     
                    \
-                                  
decltype(&FunctionTemplate::deserialize_and_merge_from_column),  \
-                                  decltype(&IAggregateFunctionHelper<          
                    \
-                                           
FunctionTemplate>::deserialize_and_merge_from_column)>, \
-                          "need to override "                                  
                    \
-                          "deserialize_and_merge_from_column");                
                    \
+            static_assert(                                                     
                    \
+                    !std::is_same_v<                                           
                    \
+                            
decltype(&FunctionTemplate::deserialize_and_merge_from_column_range),  \
+                            decltype(&IAggregateFunctionHelper<                
                    \
+                                     
FunctionTemplate>::deserialize_and_merge_from_column)>,       \

Review Comment:
   CHECK_AGG_FUNCTION_SERIALIZED_TYPE is intended to enforce overriding 
deserialize_and_merge_from_column(_range) for non-default serialized types, but 
the current static_assert compares 
FunctionTemplate::deserialize_and_merge_from_column_range against 
IAggregateFunctionHelper<...>::deserialize_and_merge_from_column (different 
signatures), so the check will always pass and won't catch missing overrides. 
Update the static_assert to compare against 
IAggregateFunctionHelper<...>::deserialize_and_merge_from_column_range instead.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to