Jacques Nadeau created DRILL-4473:
-------------------------------------

             Summary: Removing trivial projects reveals bugs in handling of nonexistent columns in StreamingAggregate
                 Key: DRILL-4473
                 URL: https://issues.apache.org/jira/browse/DRILL-4473
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Jacques Nadeau


We see a couple of unit test failures involving nonexistent columns once 
DRILL-4467 is fixed. This is because trivial projects no longer protect 
StreamingAggregate from nonexistent columns. The failures are likely due to an 
incorrect check before throwing an Unsupported error. An unknown/ANY type 
should probably be allowed when using sum/max/stddev.
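
To make the suspected problem concrete, here is a minimal sketch of the kind of type check described above. It is not Drill's actual code: the `ColType` enum, the `strictCheck`/`lenientCheck` methods, and the `validate` helper are all hypothetical names invented for illustration. The assumption is that the current check rejects any non-numeric input type, including the unknown type that a nonexistent column gets, whereas a more lenient check would let that type through and defer the decision to runtime.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch only; none of these names come from the Drill code base.
class AggTypeCheckSketch {

    // Stand-in for a column's type. UNKNOWN models the unknown/ANY type
    // that a nonexistent column would carry at planning time.
    enum ColType { INT, BIGINT, FLOAT4, FLOAT8, VARCHAR, UNKNOWN }

    private static final Set<ColType> NUMERIC =
        EnumSet.of(ColType.INT, ColType.BIGINT, ColType.FLOAT4, ColType.FLOAT8);

    // Assumed current behavior: only known numeric types pass.
    static boolean strictCheck(ColType type) {
        return NUMERIC.contains(type);
    }

    // Proposed behavior: an unknown type is also allowed through.
    static boolean lenientCheck(ColType type) {
        return type == ColType.UNKNOWN || NUMERIC.contains(type);
    }

    // Throws the "Unsupported" style error when the chosen check fails.
    static void validate(String function, ColType type, boolean lenient) {
        boolean ok = lenient ? lenientCheck(type) : strictCheck(type);
        if (!ok) {
            throw new UnsupportedOperationException(
                function + " is not supported for type " + type);
        }
    }

    public static void main(String[] args) {
        // With the lenient check, sum over a nonexistent column is accepted.
        validate("sum", ColType.UNKNOWN, true);
        try {
            // With the strict check, the same call is rejected.
            validate("sum", ColType.UNKNOWN, false);
        } catch (UnsupportedOperationException e) {
            System.out.println("strict check rejected: " + e.getMessage());
        }
    }
}
```

In this sketch a genuinely unsupported type such as `VARCHAR` is still rejected by both checks; only the unknown type is treated differently.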

{code:title=Plan before DRILL-4467}
VOLCANO:Physical Planning (71ms):
ScreenPrel: rowcount = 1.0, cumulative cost = {464.1 rows, 2375.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 185
  ProjectPrel(col1=[$0], col2=[$1], col3=[$2], col4=[$3], col5=[$4]): rowcount = 1.0, cumulative cost = {464.0 rows, 2375.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 184
    StreamAggPrel(group=[{}], col1=[SUM($0)], col2=[SUM($1)], col3=[SUM($2)], col4=[SUM($3)], col5=[SUM($4)]): rowcount = 1.0, cumulative cost = {464.0 rows, 2375.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 183
      LimitPrel(offset=[0], fetch=[0]): rowcount = 1.0, cumulative cost = {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 182
        ProjectPrel(int_col=[$0], bigint_col=[$3], float4_col=[$4], float8_col=[$1], interval_year_col=[$2]): rowcount = 463.0, cumulative cost = {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 181
          ScanPrel(groupscan=[EasyGroupScan [selectionRoot=classpath:/employee.json, numFiles=1, columns=[`int_col`, `bigint_col`, `float4_col`, `float8_col`, `interval_year_col`], files=[classpath:/employee.json]]]): rowcount = 463.0, cumulative cost = {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 160
{code}

{code:title=Plan after DRILL-4467}
VOLCANO:Physical Planning (63ms):
ScreenPrel: rowcount = 1.0, cumulative cost = {464.1 rows, 2375.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 151
  ProjectPrel(col1=[$0], col2=[$1], col3=[$2], col4=[$3], col5=[$4]): rowcount = 1.0, cumulative cost = {464.0 rows, 2375.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 150
    StreamAggPrel(group=[{}], col1=[SUM($0)], col2=[SUM($1)], col3=[SUM($2)], col4=[SUM($3)], col5=[SUM($4)]): rowcount = 1.0, cumulative cost = {464.0 rows, 2375.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 149
      LimitPrel(offset=[0], fetch=[0]): rowcount = 1.0, cumulative cost = {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 148
        ScanPrel(groupscan=[EasyGroupScan [selectionRoot=classpath:/employee.json, numFiles=1, columns=[`int_col`, `bigint_col`, `float4_col`, `float8_col`, `interval_year_col`], files=[classpath:/employee.json]]]): rowcount = 463.0, cumulative cost = {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 141
{code}

The tests in TestAggregateFunctions that were disabled with a reference to 
this bug show multiple examples of this behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
