[ 
https://issues.apache.org/jira/browse/DRILL-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18069618#comment-18069618
 ] 

ASF GitHub Bot commented on DRILL-8545:
---------------------------------------

rymarm commented on PR #3042:
URL: https://github.com/apache/drill/pull/3042#issuecomment-4155297549

   @cgivre Hi Charles!
   > If a Drill cluster is configured with HashAgg enabled, and a user executes 
a query with collect_to_list_varchar, will Drill fall back to StreamAgg or will 
Drill throw an error? 
   
   Yes, definitely. If a Drill cluster is configured with `HashAgg` enabled and 
`StreamAgg` is enabled either (by default, both `HashAgg` and `StreamAgg` 
operators are enabled), Drill will simply fallback to the `StreamAgg` operator. 
Otherwise, if `HashAgg` is enabled and `StreamAgg` is **DISABLED** - Drill will 
throw an exception `CannotPlanException`, because Drill has only 2 aggregation 
operator implementations at all, and in this case, one is not acceptable, and 
another one is disabled.




> COLLECT_TO_LIST_VARCHAR function returns incorrect result when Hash 
> Aggregator operator used
> --------------------------------------------------------------------------------------------
>
>                 Key: DRILL-8545
>                 URL: https://issues.apache.org/jira/browse/DRILL-8545
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.21.2
>            Reporter: Maksym Rymar
>            Assignee: Maksym Rymar
>            Priority: Major
>             Fix For: 1.23.0
>
>
> Drill returns an incorrect result for a query with the 
> {{COLLECT_TO_LIST_VARCHAR}} function and the Hash Aggregator operator used 
> during execution.
> *Query:*
> {code:java}
> SELECT cars.color,
>        COLLECT_TO_LIST_VARCHAR(cars.id)
> FROM (
>       VALUES (1, 'Volkswagen', 'RED'), 
>                (2, 'Volvo', 'RED'), 
>                (3, 'Hyundai', 'BLACK'), 
>                (4, 'Toyota', 'BLACK'), 
>                (5, 'Fiat', 'RED'), 
>                (6, 'Pegeout', 'RED'), 
>                (7, 'Cintroen', 'RED'), 
>                (8, 'Audi', 'BLACK'), 
>                (9, 'Porsche', 'BLACK')
>       ) AS cars(id, brand, color) 
> GROUP BY cars.color {code}
> {*}Given result:{*}{*}{*}
> {code:java}
> +-------+-----------------------+
> | color |        EXPR$1         |
> +-------+-----------------------+
> | RED   | ["1","2","5","6","7"] |
> | BLACK | ["8","9"]             |
> +-------+-----------------------+{code}
>  
> {*}Expected result:{*}{*}{*}
> {code:java}
> +-------+-----------------------+
> | color |        EXPR$1         |
> +-------+-----------------------+
> | BLACK | ["3","4","8","9"]     |
> | RED   | ["1","2","5","6","7"] |
> +-------+-----------------------+ {code}
> Query plan of the query:
> {code:java}
> 00-00    Screen
> 00-01      Project(color=[$0], EXPR$1=[$1])
> 00-02        HashAgg(group=[{0}], EXPR$1=[COLLECT_TO_LIST_VARCHAR($1)])
> 00-03          Project(color=[$2], id=[$0])
> 00-04            Values(tuples=[[{ 1, 'Volkswagen', 'RED  ' }, { 2, 'Volvo    
>  ', 'RED  ' }, { 3, 'Hyundai   ', 'BLACK' }, { 4, 'Toyota    ', 'BLACK' }, { 
> 5, 'Fiat      ', 'RED  ' }, { 6, 'Pegeout   ', 'RED  ' }, { 7, 'Cintroen  ', 
> 'RED  ' }, { 8, 'Audi      ', 'BLACK' }, { 9, 'Porsche   ', 'BLACK' }]]) 
> {code}
> **
>  
>  
> The same query executed without the Hash Aggregator operator returns the 
> correct result:
>  
> {code:java}
> apache drill> set `planner.enable_hashagg` = false;
> +------+---------------------------------+
> |  ok  |             summary             |
> +------+---------------------------------+
> | true | planner.enable_hashagg updated. |
> +------+---------------------------------+
> 1 row selected (0.292 seconds)
> apache drill> SELECT cars.color,
> 2..semicolon>        COLLECT_TO_LIST_VARCHAR(cars.id)
> 3..semicolon> FROM (
> 4..........)>       VALUES (1, 'Volkswagen', 'RED'),
> 5..........)>                (2, 'Volvo', 'RED'),
> 6..........)>                (3, 'Hyundai', 'BLACK'),
> 7..........)>                (4, 'Toyota', 'BLACK'),
> 8..........)>                (5, 'Fiat', 'RED'),
> 9..........)>                (6, 'Pegeout', 'RED'),
> 10.........)>                (7, 'Cintroen', 'RED'),
> 11.........)>                (8, 'Audi', 'BLACK'),
> 12.........)>                (9, 'Porsche', 'BLACK')
> 13.........)>       ) AS cars(id, brand, color)
> 14.semicolon> GROUP BY cars.color;
> +-------+-----------------------+
> | color |        EXPR$1         |
> +-------+-----------------------+
> | BLACK | ["3","4","8","9"]     |
> | RED   | ["1","2","5","6","7"] |
> +-------+-----------------------+
> 2 rows selected (8.577 seconds) {code}
>  
>  
> h2. Steps to reproduce
>  # Make sure Hash Aggregator is used in the execution plan. It can be forced 
> by disabling Stream Aggregator operator:
> {code:java}
> set `planner.enable_streamagg` = false {code}
> {code:java}
> 00-00    Screen
> 00-01      Project(color=[$0], EXPR$1=[$1])
> 00-02        HashAgg(group=[{0}], EXPR$1=[COLLECT_TO_LIST_VARCHAR($1)])
> 00-03          Project(color=[$2], id=[$0])
> 00-04            Values(tuples=[[{ 1, 'Volkswagen', 'RED  ' }, { 2, 'Volvo    
>  ', 'RED  ' }, { 3, 'Hyundai   ', 'BLACK' }, { 4, 'Toyota    ', 'BLACK' }, { 
> 5, 'Fiat      ', 'RED  ' }, { 6, 'Pegeout   ', 'RED  ' }, { 7, 'Cintroen  ', 
> 'RED  ' }, { 8, 'Audi      ', 'BLACK' }, { 9, 'Porsche   ', 'BLACK' }]]) 
> {code}
>  # Execute a query with aggregation and {{COLLECT_TO_LIST_VARCHAR}} function 
> used:
> {code:java}
> SELECT cars.color,
>        COLLECT_TO_LIST_VARCHAR(cars.id)
> FROM (
>       VALUES (1, 'Volkswagen', 'RED'), 
>                (2, 'Volvo', 'RED'), 
>                (3, 'Hyundai', 'BLACK'), 
>                (4, 'Toyota', 'BLACK'), 
>                (5, 'Fiat', 'RED'), 
>                (6, 'Pegeout', 'RED'), 
>                (7, 'Cintroen', 'RED'), 
>                (8, 'Audi', 'BLACK'), 
>                (9, 'Porsche', 'BLACK')
>       ) AS cars(id, brand, color) 
> GROUP BY cars.color{code}
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to