James Turton created DRILL-8403: ----------------------------------- Summary: Rewritten aggregate functions are incorrectly grouped when used with PIVOT Key: DRILL-8403 URL: https://issues.apache.org/jira/browse/DRILL-8403 Project: Apache Drill Issue Type: Bug Affects Versions: 1.21.0 Reporter: James Turton Assignee: Vova Vysotskyi Fix For: 1.21.1
The following query should group aggregates by both marital_status and education_level but only groups them by education_level. apache drill> SELECT 2..semicolon> * 3..semicolon> FROM 4..semicolon> (SELECT 5..........)> education_level, 6..........)> salary, 7..........)> marital_status, 8..........)> extract(year from age(birth_date)) age 9..........)> FROM 10.........)> cp.`employee.json`) 11.semicolon> PIVOT ( 12.........)> avg(salary) avg_salary, avg(age) avg_age FOR marital_status IN ('M' married, 'S' single) 13.........)> ); +---------------------+--------------------+--------------------+--------------------+--------------------+ | education_level | married_avg_salary | married_avg_age | single_avg_salary | single_avg_age | +---------------------+--------------------+--------------------+--------------------+--------------------+ | Graduate Degree | 4392.823529411765 | 100.32352941176471 | 4392.823529411765 | 100.32352941176471 | | Bachelors Degree | 4492.404181184669 | 102.22996515679442 | 4492.404181184669 | 102.22996515679442 | | Partial College | 4047.1180555555557 | 100.10069444444444 | 4047.1180555555557 | 100.10069444444444 | | High School Degree | 3516.1565836298932 | 103.12811387900356 | 3516.1565836298932 | 103.12811387900356 | | Partial High School | 3511.0852713178297 | 102.30232558139535 | 3511.0852713178297 | 102.30232558139535 | +---------------------+--------------------+--------------------+--------------------+--------------------+ 5 rows selected (0.285 seconds) 00-00 Screen : rowType = RecordType(ANY education_level, ANY married_min_salary, DOUBLE married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = 46.3, cumulative cost = \{1486.23 rows, 35748.229999999996 cpu, 474630.0 io, 0.0 network, 8148.800000000001 memory}, id = 812 00-01 Project(education_level=[$0], married_min_salary=[$1], married_avg_age=[$2], single_min_salary=[$3], single_avg_age=[$4]) : rowType = RecordType(ANY education_level, ANY married_min_salary, DOUBLE married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = 46.3, cumulative cost = \{1481.6 rows, 35743.6 cpu, 474630.0 io, 0.0 network, 8148.800000000001 memory}, id = 811 00-02 Project(education_level=[$0], married_min_salary=[divide(CastHigh(CASE(=($2, 0), null:NULL, $1)), $2)], married_avg_age=[divide(CastHigh(CASE(=($4, 0), null:NULL, $3)), $4)], single_min_salary=[divide(CastHigh(CASE(=($2, 0), null:NULL, $1)), $2)], single_avg_age=[divide(CastHigh(CASE(=($4, 0), null:NULL, $3)), $4)]) : rowType = RecordType(ANY education_level, ANY married_min_salary, DOUBLE married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = 46.3, cumulative cost = \{1435.3 rows, 35512.1 cpu, 474630.0 io, 0.0 network, 8148.800000000001 memory}, id = 808 00-03 HashAgg(group=[\{0}], agg#0=[$SUM0($2)], agg#1=[COUNT($2)], agg#2=[$SUM0($3)], agg#3=[COUNT($3)]) : rowType = RecordType(ANY education_level, ANY $f1, BIGINT $f2, BIGINT $f3, BIGINT $f4): rowcount = 46.3, cumulative cost = \{1389.0 rows, 34725.0 cpu, 474630.0 io, 0.0 network, 8148.800000000001 memory}, id = 807 00-04 Project(education_level=[$0], marital_status=[$1], salary=[$2], age=[EXTRACT(FLAG(YEAR), AGE($3))], $f4=[IS TRUE(=($1, 'M'))], $f5=[IS TRUE(=($1, 'S'))]) : rowType = RecordType(ANY education_level, ANY marital_status, ANY salary, BIGINT age, BOOLEAN $f4, BOOLEAN $f5): rowcount = 463.0, cumulative cost = \{926.0 rows, 8797.0 cpu, 474630.0 io, 0.0 network, 0.0 memory}, id = 806 00-05 Scan(table=[[cp, employee.json]], groupscan=[EasyGroupScan [selectionRoot=classpath:/employee.json, numFiles=1, columns=[`education_level`, `marital_status`, `salary`, `birth_date`], files=[classpath:/employee.json], usedMetastore=false, limit=-1, formatConfig=JSONFormatConfig [extensions=[json]]]]) : rowType = RecordType(ANY education_level, ANY marital_status, ANY salary, ANY birth_date): rowcount = 463.0, cumulative cost = \{463.0 rows, 1852.0 cpu, 474630.0 io, 0.0 network, 0.0 memory}, id = 805 -- This message was sent by Atlassian Jira (v8.20.10#820010)