rymarm opened a new pull request, #3042: URL: https://github.com/apache/drill/pull/3042
# [DRILL-8545](https://issues.apache.org/jira/browse/DRILL-8545): COLLECT_TO_LIST_VARCHAR function returns incorrect result when Hash Aggregator operator used

## Description

### Root cause

The `collect_to_list_varchar` function is incompatible with the **Hash Aggregator** because the aggregator processes data in a non-sequential manner, while the underlying `ValueVector` framework requires sequential writes for variable-length data. Furthermore, the Drill UDF framework lacks a straightforward mechanism to buffer these values internally before flushing them to the output vector, so the function cannot reorder them on the fly during the aggregation phase.

### Solution

To ensure data integrity and prevent index out-of-bounds exceptions, I have modified the **Hash Aggregator physical planning rule**. The planner now explicitly disallows the Hash Aggregator if a `collect_to_list_varchar` call is detected in the aggregate expression. This forces the optimizer to fall back to the **Streaming Aggregator**, which provides the necessary ordered input.

## Documentation

No changes.

## Testing

Updated the existing unit test cases so that they cover this problem.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
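
As an illustration of the root cause described in this pull request, here is a minimal sketch of why an offset-based variable-length buffer needs sequential writes. `ToyVarCharVector` and the class around it are hypothetical names invented for this example. They are not Drill's `ValueVector` API, and the real failure mode in Drill may differ (the PR mentions index out-of-bounds exceptions, whereas this toy throws an `IllegalStateException`).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Apache Drill.
final class SequentialWriteSketch {

    /** Toy append-only vector: offsets[i]..offsets[i + 1] delimit value i in data. */
    static final class ToyVarCharVector {
        private final int[] offsets;
        private byte[] data = new byte[0];
        private int valueCount = 0;

        ToyVarCharVector(int capacity) {
            offsets = new int[capacity + 1];
        }

        void set(int index, String value) {
            // The start of slot `index` is only known once slot `index - 1` is written,
            // so this toy rejects any write that skips ahead or goes back.
            if (index != valueCount) {
                throw new IllegalStateException(
                    "non-sequential write at index " + index + ", expected " + valueCount);
            }
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            int start = offsets[index];
            data = Arrays.copyOf(data, start + bytes.length);
            System.arraycopy(bytes, 0, data, start, bytes.length);
            offsets[index + 1] = start + bytes.length;
            valueCount++;
        }

        String get(int index) {
            int start = offsets[index];
            int length = offsets[index + 1] - start;
            return new String(data, start, length, StandardCharsets.UTF_8);
        }
    }

    /** Ordered input, as a streaming-style aggregation would produce it. */
    static String sequentialWrites() {
        ToyVarCharVector v = new ToyVarCharVector(3);
        v.set(0, "a");
        v.set(1, "bb");
        v.set(2, "ccc");
        return v.get(0) + "," + v.get(1) + "," + v.get(2);
    }

    /** Interleaved input, as a hash-style aggregation might produce it. */
    static boolean interleavedWritesFail() {
        ToyVarCharVector v = new ToyVarCharVector(3);
        v.set(0, "a");
        try {
            v.set(2, "ccc"); // slot 1 has not been written yet
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sequentialWrites());
        System.out.println(interleavedWritesFail());
    }
}
```

In this toy model the ordered writes succeed while the interleaved write is rejected, which mirrors the reasoning for falling back to the Streaming Aggregator rather than trying to buffer and reorder values inside the UDF.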
