[PR] DRILL-8545: Disable HashAgg for collect_to_list_varchar due to ordering requirements (drill)

via GitHub Mon, 23 Mar 2026 12:22:05 -0700


rymarm opened a new pull request, #3042:
URL: https://github.com/apache/drill/pull/3042


   
   
   # [DRILL-8545](https://issues.apache.org/jira/browse/DRILL-8545): COLLECT_TO_LIST_VARCHAR function returns incorrect result when Hash Aggregator operator used
   
   ## Description
   
   ### Root cause
   
   The `collect_to_list_varchar` function is incompatible with the **Hash 
Aggregator** because the aggregator processes data in a non-sequential manner, 
while the underlying `ValueVector` framework requires sequential writes for 
variable-length data. Furthermore, the Drill UDF framework lacks a 
straightforward mechanism to buffer these values internally before flushing 
them to the output vector, making it impossible to reorder them on the fly 
during the aggregation phase.
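
To make the sequential-write constraint concrete, here is a minimal toy model of a variable-length vector backed by an offsets array. It is only a sketch: `ToyVarLenVector` and its methods are hypothetical names invented for this illustration and are not Drill's `ValueVector` API. The toy rejects out-of-order writes explicitly so the constraint is visible; a real vector may fail differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical toy model, not Drill's ValueVector implementation.
final class ToyVarLenVector {
    // Value i occupies data[offsets[i] .. offsets[i + 1]).
    private final int[] offsets;
    private byte[] data = new byte[16];
    // Next index that may be written; writes must arrive in this order.
    private int nextIndex = 0;

    ToyVarLenVector(int capacity) {
        offsets = new int[capacity + 1];
    }

    void set(int index, String value) {
        // The start of value i is only known once value i - 1 has been written,
        // so a write to any other index cannot be placed correctly.
        if (index != nextIndex) {
            throw new IllegalStateException(
                "expected write at index " + nextIndex + " but got " + index);
        }
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int start = offsets[index];
        if (start + bytes.length > data.length) {
            // Grow the data buffer when the new value does not fit.
            data = Arrays.copyOf(data, Math.max(data.length * 2, start + bytes.length));
        }
        System.arraycopy(bytes, 0, data, start, bytes.length);
        offsets[index + 1] = start + bytes.length;
        nextIndex++;
    }

    String get(int index) {
        int start = offsets[index];
        int length = offsets[index + 1] - start;
        return new String(data, start, length, StandardCharsets.UTF_8);
    }
}
```

Writing indexes 0, 1, 2 in order works in this model, while jumping from index 1 to index 3 throws, which mirrors the kind of access pattern a hash-based aggregator can produce when it updates groups in arrival order.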
   
   ### Solution
   To ensure data integrity and prevent index out-of-bounds exceptions, I have 
modified the **Hash Aggregator physical planning rule**. The planner will now 
explicitly disallow the Hash Aggregator if a `collect_to_list_varchar` call is 
detected in the aggregate expression. This forces the optimizer to fall back to 
the **Streaming Aggregator**, which provides the necessary ordered input.
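
The planning decision described above can be sketched in isolation as follows. This is an illustrative model only, not the code in this PR: the class `HashAggEligibility`, its methods, and the use of plain function-name strings are assumptions made for the example, and the real rule operates on the planner's aggregate calls rather than on strings.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the planner check; names are invented for illustration.
final class HashAggEligibility {
    // Aggregate functions that need ordered, sequential input (assumed set).
    private static final Set<String> ORDER_SENSITIVE = Set.of("collect_to_list_varchar");

    // Returns false if any aggregate call in the expression is order sensitive.
    static boolean hashAggAllowed(List<String> aggFunctionNames) {
        for (String name : aggFunctionNames) {
            if (ORDER_SENSITIVE.contains(name.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    // Falls back to the streaming aggregator when hash aggregation is disallowed.
    static String chooseAggregator(List<String> aggFunctionNames) {
        return hashAggAllowed(aggFunctionNames) ? "HashAgg" : "StreamingAgg";
    }

    public static void main(String[] args) {
        System.out.println(chooseAggregator(List.of("sum", "count")));
        System.out.println(chooseAggregator(List.of("sum", "COLLECT_TO_LIST_VARCHAR")));
    }
}
```

In this sketch a single order-sensitive call is enough to rule out hash aggregation for the whole aggregate, which matches the all-or-nothing fallback described in the solution.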
   
   ## Documentation
   No changes.
   
   ## Testing
   Updated the existing unit tests so that they cover the problem described above.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

