[jira] [Created] (DRILL-8403) Rewritten aggregate functions are incorrectly grouped when used with PIVOT

James Turton (Jira) Wed, 22 Feb 2023 05:11:15 -0800

James Turton created DRILL-8403:
-----------------------------------

             Summary: Rewritten aggregate functions are incorrectly grouped when used with PIVOT
                 Key: DRILL-8403
                 URL: https://issues.apache.org/jira/browse/DRILL-8403
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.21.0
            Reporter: James Turton
            Assignee: Vova Vysotskyi
             Fix For: 1.21.1



The following query should group the aggregates by both marital_status and 
education_level, but it groups them only by education_level. As a result, the 
married_* and single_* columns contain identical values.


apache drill> SELECT
2..semicolon>   *
3..semicolon> FROM
4..semicolon>   (SELECT
5..........)>       education_level,
6..........)>       salary,
7..........)>       marital_status,
8..........)>       extract(year from age(birth_date)) age
9..........)>   FROM
10.........)>       cp.`employee.json`)
11.semicolon> PIVOT (
12.........)>   avg(salary) avg_salary, avg(age) avg_age FOR marital_status IN ('M' married, 'S' single)
13.........)> );
+---------------------+--------------------+--------------------+--------------------+--------------------+
|   education_level   | married_avg_salary |  married_avg_age   | single_avg_salary  |   single_avg_age   |
+---------------------+--------------------+--------------------+--------------------+--------------------+
| Graduate Degree     | 4392.823529411765  | 100.32352941176471 | 4392.823529411765  | 100.32352941176471 |
| Bachelors Degree    | 4492.404181184669  | 102.22996515679442 | 4492.404181184669  | 102.22996515679442 |
| Partial College     | 4047.1180555555557 | 100.10069444444444 | 4047.1180555555557 | 100.10069444444444 |
| High School Degree  | 3516.1565836298932 | 103.12811387900356 | 3516.1565836298932 | 103.12811387900356 |
| Partial High School | 3511.0852713178297 | 102.30232558139535 | 3511.0852713178297 | 102.30232558139535 |
+---------------------+--------------------+--------------------+--------------------+--------------------+
5 rows selected (0.285 seconds)
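
For comparison, the PIVOT above would be expected to behave like the 
conditional aggregation sketched below, where each output column aggregates 
only the rows with the matching marital_status. This is a hand-written 
illustration of the intended semantics, not planner output, and it has not 
been run against 1.21.0; treating the CASE-based form as an equivalent is an 
assumption, not a confirmed workaround.

```sql
-- Hand-written equivalent of the PIVOT query (assumed semantics, untested).
-- avg() skips NULLs, so each CASE restricts an aggregate to one marital_status.
SELECT
  education_level,
  avg(CASE WHEN marital_status = 'M' THEN salary END) AS married_avg_salary,
  avg(CASE WHEN marital_status = 'M' THEN age END)    AS married_avg_age,
  avg(CASE WHEN marital_status = 'S' THEN salary END) AS single_avg_salary,
  avg(CASE WHEN marital_status = 'S' THEN age END)    AS single_avg_age
FROM
  (SELECT
      education_level,
      salary,
      marital_status,
      extract(year from age(birth_date)) age
  FROM
      cp.`employee.json`)
GROUP BY
  education_level;
```

With correct grouping, the married_* and single_* columns for a given 
education_level would generally differ, unlike the result shown above.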
 
The physical plan shows HashAgg grouping only on education_level (group=[{0}]); the marital_status flags $f4 and $f5 computed in 00-04 are not used by any aggregate call:

00-00    Screen : rowType = RecordType(ANY education_level, ANY married_min_salary, DOUBLE married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = 46.3, cumulative cost = {1486.23 rows, 35748.229999999996 cpu, 474630.0 io, 0.0 network, 8148.800000000001 memory}, id = 812
00-01      Project(education_level=[$0], married_min_salary=[$1], married_avg_age=[$2], single_min_salary=[$3], single_avg_age=[$4]) : rowType = RecordType(ANY education_level, ANY married_min_salary, DOUBLE married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = 46.3, cumulative cost = {1481.6 rows, 35743.6 cpu, 474630.0 io, 0.0 network, 8148.800000000001 memory}, id = 811
00-02        Project(education_level=[$0], married_min_salary=[divide(CastHigh(CASE(=($2, 0), null:NULL, $1)), $2)], married_avg_age=[divide(CastHigh(CASE(=($4, 0), null:NULL, $3)), $4)], single_min_salary=[divide(CastHigh(CASE(=($2, 0), null:NULL, $1)), $2)], single_avg_age=[divide(CastHigh(CASE(=($4, 0), null:NULL, $3)), $4)]) : rowType = RecordType(ANY education_level, ANY married_min_salary, DOUBLE married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = 46.3, cumulative cost = {1435.3 rows, 35512.1 cpu, 474630.0 io, 0.0 network, 8148.800000000001 memory}, id = 808
00-03          HashAgg(group=[{0}], agg#0=[$SUM0($2)], agg#1=[COUNT($2)], agg#2=[$SUM0($3)], agg#3=[COUNT($3)]) : rowType = RecordType(ANY education_level, ANY $f1, BIGINT $f2, BIGINT $f3, BIGINT $f4): rowcount = 46.3, cumulative cost = {1389.0 rows, 34725.0 cpu, 474630.0 io, 0.0 network, 8148.800000000001 memory}, id = 807
00-04            Project(education_level=[$0], marital_status=[$1], salary=[$2], age=[EXTRACT(FLAG(YEAR), AGE($3))], $f4=[IS TRUE(=($1, 'M'))], $f5=[IS TRUE(=($1, 'S'))]) : rowType = RecordType(ANY education_level, ANY marital_status, ANY salary, BIGINT age, BOOLEAN $f4, BOOLEAN $f5): rowcount = 463.0, cumulative cost = {926.0 rows, 8797.0 cpu, 474630.0 io, 0.0 network, 0.0 memory}, id = 806
00-05              Scan(table=[[cp, employee.json]], groupscan=[EasyGroupScan [selectionRoot=classpath:/employee.json, numFiles=1, columns=[`education_level`, `marital_status`, `salary`, `birth_date`], files=[classpath:/employee.json], usedMetastore=false, limit=-1, formatConfig=JSONFormatConfig [extensions=[json]]]]) : rowType = RecordType(ANY education_level, ANY marital_status, ANY salary, ANY birth_date): rowcount = 463.0, cumulative cost = {463.0 rows, 1852.0 cpu, 474630.0 io, 0.0 network, 0.0 memory}, id = 805



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
