[jira] [Commented] (DRILL-8403) Generate aggregate function calls are missing required filters when used with PIVOT

ASF GitHub Bot (Jira) Wed, 22 Feb 2023 13:32:44 -0800


    [ 
https://issues.apache.org/jira/browse/DRILL-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692373#comment-17692373
 ]


ASF GitHub Bot commented on DRILL-8403:
---------------------------------------

vvysotskyi opened a new pull request, #2765:
URL: https://github.com/apache/drill/pull/2765

   # [DRILL-8403](https://issues.apache.org/jira/browse/DRILL-8403): Generate 
aggregate function calls are missing required filters when used with PIVOT
   
   ## Description
   Passing filters to agg calls when applying agg reduce rule.
   
   ## Documentation
   NA
   
   ## Testing
   Added UT.
   




> Generate aggregate function calls are missing required filters when used with 
> PIVOT
> -----------------------------------------------------------------------------------
>
>                 Key: DRILL-8403
>                 URL: https://issues.apache.org/jira/browse/DRILL-8403
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.21.0
>            Reporter: James Turton
>            Assignee: Vova Vysotskyi
>            Priority: Major
>             Fix For: 1.21.1
>
>
> The following query should generate aggregates grouped by education_level and 
> containing filters on marital_status but the requisite filters are lost 
> during function rewriting.
> apache drill> SELECT
> 2..semicolon> *
> 3..semicolon> FROM
> 4..semicolon> (SELECT
> 5..........)> education_level,
> 6..........)> salary,
> 7..........)> marital_status,
> 8..........)> extract(year from age(birth_date)) age
> 9..........)> FROM
> 10.........)> cp.`employee.json`)
> 11.semicolon> PIVOT (
> 12.........)> avg(salary) avg_salary, avg(age) avg_age FOR marital_status IN 
> ('M' married, 'S' single)
> 13.........)> );
> {+}--------------------{-}{-}{+}------------------{-}++{-}------------------{-}{-}------------------{-}++{-}-------------------
> |education_level|married_avg_salary|married_avg_age|single_avg_salary|single_avg_age|
> {+}--------------------{-}{-}{+}------------------{-}++{-}------------------{-}{-}------------------{-}++{-}-------------------
> |Graduate 
> Degree|4392.823529411765|100.32352941176471|4392.823529411765|100.32352941176471|
> |Bachelors 
> Degree|4492.404181184669|102.22996515679442|4492.404181184669|102.22996515679442|
> |Partial 
> College|4047.1180555555557|100.10069444444444|4047.1180555555557|100.10069444444444|
> |High School 
> Degree|3516.1565836298932|103.12811387900356|3516.1565836298932|103.12811387900356|
> |Partial High 
> School|3511.0852713178297|102.30232558139535|3511.0852713178297|102.30232558139535|
> {+}--------------------{-}{-}{+}------------------{-}++{-}------------------{-}{-}------------------{-}++{-}-------------------
> 5 rows selected (0.285 seconds)
>  
> 00-00 Screen : rowType = RecordType(ANY education_level, ANY 
> married_min_salary, DOUBLE married_avg_age, ANY single_min_salary, DOUBLE 
> single_avg_age): rowcount = 46.3, cumulative cost = \{1486.23 rows, 
> 35748.229999999996 cpu, 474630.0 io, 0.0 network, 8148.800000000001 memory}, 
> id = 812
> 00-01 Project(education_level=[$0], married_min_salary=[$1], 
> married_avg_age=[$2], single_min_salary=[$3], single_avg_age=[$4]) : rowType 
> = RecordType(ANY education_level, ANY married_min_salary, DOUBLE 
> married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = 
> 46.3, cumulative cost = \{1481.6 rows, 35743.6 cpu, 474630.0 io, 0.0 network, 
> 8148.800000000001 memory}, id = 811
> 00-02 Project(education_level=[$0], 
> married_min_salary=[divide(CastHigh(CASE(=($2, 0), null:NULL, $1)), $2)], 
> married_avg_age=[divide(CastHigh(CASE(=($4, 0), null:NULL, $3)), $4)], 
> single_min_salary=[divide(CastHigh(CASE(=($2, 0), null:NULL, $1)), $2)], 
> single_avg_age=[divide(CastHigh(CASE(=($4, 0), null:NULL, $3)), $4)]) : 
> rowType = RecordType(ANY education_level, ANY married_min_salary, DOUBLE 
> married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = 
> 46.3, cumulative cost = \{1435.3 rows, 35512.1 cpu, 474630.0 io, 0.0 network, 
> 8148.800000000001 memory}, id = 808
> 00-03 HashAgg(group=[\\{0}], agg#0=[$SUM0($2)], agg#1=[COUNT($2)], 
> agg#2=[$SUM0($3)], agg#3=[COUNT($3)]) : rowType = RecordType(ANY 
> education_level, ANY $f1, BIGINT $f2, BIGINT $f3, BIGINT $f4): rowcount = 
> 46.3, cumulative cost = \{1389.0 rows, 34725.0 cpu, 474630.0 io, 0.0 network, 
> 8148.800000000001 memory}, id = 807
> 00-04 Project(education_level=[$0], marital_status=[$1], salary=[$2], 
> age=[EXTRACT(FLAG(YEAR), AGE($3))], $f4=[IS TRUE(=($1, 'M'))], $f5=[IS 
> TRUE(=($1, 'S'))]) : rowType = RecordType(ANY education_level, ANY 
> marital_status, ANY salary, BIGINT age, BOOLEAN $f4, BOOLEAN $f5): rowcount = 
> 463.0, cumulative cost = \{926.0 rows, 8797.0 cpu, 474630.0 io, 0.0 network, 
> 0.0 memory}, id = 806
> 00-05 Scan(table=[[cp, employee.json]], groupscan=[EasyGroupScan 
> [selectionRoot=classpath:/employee.json, numFiles=1, 
> columns=[`education_level`, `marital_status`, `salary`, `birth_date`], 
> files=[classpath:/employee.json], usedMetastore=false, limit=-1, 
> formatConfig=JSONFormatConfig [extensions=[json]]]]) : rowType = 
> RecordType(ANY education_level, ANY marital_status, ANY salary, ANY 
> birth_date): rowcount = 463.0, cumulative cost = \{463.0 rows, 1852.0 cpu, 
> 474630.0 io, 0.0 network, 0.0 memory}, id = 805



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (DRILL-8403) Generate aggregate function calls are missing required filters when used with PIVOT

Reply via email to