[
https://issues.apache.org/jira/browse/DRILL-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18069728#comment-18069728
]
ASF GitHub Bot commented on DRILL-8545:
---------------------------------------
cgivre merged PR #3042:
URL: https://github.com/apache/drill/pull/3042
> COLLECT_TO_LIST_VARCHAR function returns incorrect result when Hash
> Aggregator operator used
> --------------------------------------------------------------------------------------------
>
> Key: DRILL-8545
> URL: https://issues.apache.org/jira/browse/DRILL-8545
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.21.2
> Reporter: Maksym Rymar
> Assignee: Maksym Rymar
> Priority: Major
> Fix For: 1.23.0
>
>
> Drill returns an incorrect result for a query with the
> {{COLLECT_TO_LIST_VARCHAR}} function and the Hash Aggregator operator used
> during execution.
> *Query:*
> {code:java}
> SELECT cars.color,
> COLLECT_TO_LIST_VARCHAR(cars.id)
> FROM (
> VALUES (1, 'Volkswagen', 'RED'),
> (2, 'Volvo', 'RED'),
> (3, 'Hyundai', 'BLACK'),
> (4, 'Toyota', 'BLACK'),
> (5, 'Fiat', 'RED'),
> (6, 'Pegeout', 'RED'),
> (7, 'Cintroen', 'RED'),
> (8, 'Audi', 'BLACK'),
> (9, 'Porsche', 'BLACK')
> ) AS cars(id, brand, color)
> GROUP BY cars.color {code}
> {*}Given result:{*}{*}{*}
> {code:java}
> +-------+-----------------------+
> | color | EXPR$1 |
> +-------+-----------------------+
> | RED | ["1","2","5","6","7"] |
> | BLACK | ["8","9"] |
> +-------+-----------------------+{code}
>
> {*}Expected result:{*}{*}{*}
> {code:java}
> +-------+-----------------------+
> | color | EXPR$1 |
> +-------+-----------------------+
> | BLACK | ["3","4","8","9"] |
> | RED | ["1","2","5","6","7"] |
> +-------+-----------------------+ {code}
> Query plan of the query:
> {code:java}
> 00-00 Screen
> 00-01 Project(color=[$0], EXPR$1=[$1])
> 00-02 HashAgg(group=[{0}], EXPR$1=[COLLECT_TO_LIST_VARCHAR($1)])
> 00-03 Project(color=[$2], id=[$0])
> 00-04 Values(tuples=[[{ 1, 'Volkswagen', 'RED ' }, { 2, 'Volvo
> ', 'RED ' }, { 3, 'Hyundai ', 'BLACK' }, { 4, 'Toyota ', 'BLACK' }, {
> 5, 'Fiat ', 'RED ' }, { 6, 'Pegeout ', 'RED ' }, { 7, 'Cintroen ',
> 'RED ' }, { 8, 'Audi ', 'BLACK' }, { 9, 'Porsche ', 'BLACK' }]])
> {code}
> **
>
>
> The same query executed without the Hash Aggregator operator returns the
> correct result:
>
> {code:java}
> apache drill> set `planner.enable_hashagg` = false;
> +------+---------------------------------+
> | ok | summary |
> +------+---------------------------------+
> | true | planner.enable_hashagg updated. |
> +------+---------------------------------+
> 1 row selected (0.292 seconds)
> apache drill> SELECT cars.color,
> 2..semicolon> COLLECT_TO_LIST_VARCHAR(cars.id)
> 3..semicolon> FROM (
> 4..........)> VALUES (1, 'Volkswagen', 'RED'),
> 5..........)> (2, 'Volvo', 'RED'),
> 6..........)> (3, 'Hyundai', 'BLACK'),
> 7..........)> (4, 'Toyota', 'BLACK'),
> 8..........)> (5, 'Fiat', 'RED'),
> 9..........)> (6, 'Pegeout', 'RED'),
> 10.........)> (7, 'Cintroen', 'RED'),
> 11.........)> (8, 'Audi', 'BLACK'),
> 12.........)> (9, 'Porsche', 'BLACK')
> 13.........)> ) AS cars(id, brand, color)
> 14.semicolon> GROUP BY cars.color;
> +-------+-----------------------+
> | color | EXPR$1 |
> +-------+-----------------------+
> | BLACK | ["3","4","8","9"] |
> | RED | ["1","2","5","6","7"] |
> +-------+-----------------------+
> 2 rows selected (8.577 seconds) {code}
>
>
> h2. Steps to reproduce
> # Make sure Hash Aggregator is used in the execution plan. It can be forced
> by disabling Stream Aggregator operator:
> {code:java}
> set `planner.enable_streamagg` = false {code}
> {code:java}
> 00-00 Screen
> 00-01 Project(color=[$0], EXPR$1=[$1])
> 00-02 HashAgg(group=[{0}], EXPR$1=[COLLECT_TO_LIST_VARCHAR($1)])
> 00-03 Project(color=[$2], id=[$0])
> 00-04 Values(tuples=[[{ 1, 'Volkswagen', 'RED ' }, { 2, 'Volvo
> ', 'RED ' }, { 3, 'Hyundai ', 'BLACK' }, { 4, 'Toyota ', 'BLACK' }, {
> 5, 'Fiat ', 'RED ' }, { 6, 'Pegeout ', 'RED ' }, { 7, 'Cintroen ',
> 'RED ' }, { 8, 'Audi ', 'BLACK' }, { 9, 'Porsche ', 'BLACK' }]])
> {code}
> # Execute a query with aggregation and {{COLLECT_TO_LIST_VARCHAR}} function
> used:
> {code:java}
> SELECT cars.color,
> COLLECT_TO_LIST_VARCHAR(cars.id)
> FROM (
> VALUES (1, 'Volkswagen', 'RED'),
> (2, 'Volvo', 'RED'),
> (3, 'Hyundai', 'BLACK'),
> (4, 'Toyota', 'BLACK'),
> (5, 'Fiat', 'RED'),
> (6, 'Pegeout', 'RED'),
> (7, 'Cintroen', 'RED'),
> (8, 'Audi', 'BLACK'),
> (9, 'Porsche', 'BLACK')
> ) AS cars(id, brand, color)
> GROUP BY cars.color{code}
>
>
>
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)