Jacques Nadeau created DRILL-4473:
-------------------------------------
Summary: Removing trivial projects reveals bugs in handling of
nonexistent columns in StreamingAggregate
Key: DRILL-4473
URL: https://issues.apache.org/jira/browse/DRILL-4473
Project: Apache Drill
Issue Type: Bug
Reporter: Jacques Nadeau
Once DRILL-4467 is fixed, a couple of unit tests that work with nonexistent
columns fail. Trivial projects no longer shield StreamingAggregate from
nonexistent columns, which exposes what is likely an incorrect check before an
Unsupported error is thrown. An unknown/ANY type should probably be allowed
when using sum/max/stddev.
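As a rough illustration of the kind of check described above, the sketch below contrasts a strict input-type check with a lenient one that lets an unknown type through. Everything in it is hypothetical: the class `AggInputTypeCheck`, the `ColType` enum, and the set of accepted types are stand-ins for illustration and are not taken from Drill's StreamingAggregate code.

```java
import java.util.EnumSet;
import java.util.Set;

class AggInputTypeCheck {

    /** Hypothetical column types; UNKNOWN models a nonexistent (ANY-typed) column. */
    enum ColType { INT, BIGINT, FLOAT4, FLOAT8, VARCHAR, UNKNOWN }

    // Types this sketch treats as valid input to a numeric aggregate such as sum.
    // This list is an assumption for illustration, not Drill's actual list.
    private static final Set<ColType> NUMERIC =
            EnumSet.of(ColType.INT, ColType.BIGINT, ColType.FLOAT4, ColType.FLOAT8);

    /** Strict check: rejects anything that is not a known numeric type. */
    static boolean strictAccepts(ColType type) {
        return NUMERIC.contains(type);
    }

    /** Lenient check: also accepts UNKNOWN so the type can be resolved later. */
    static boolean lenientAccepts(ColType type) {
        return type == ColType.UNKNOWN || NUMERIC.contains(type);
    }

    /** Throws only for types that the lenient check rejects. */
    static void validate(String function, ColType type) {
        if (!lenientAccepts(type)) {
            throw new UnsupportedOperationException(
                    function + " does not support input of type " + type);
        }
    }

    public static void main(String[] args) {
        // The strict check rejects the unknown column; the lenient one accepts it.
        System.out.println("strict:  " + strictAccepts(ColType.UNKNOWN));
        System.out.println("lenient: " + lenientAccepts(ColType.UNKNOWN));
        validate("sum", ColType.UNKNOWN); // does not throw
    }
}
```

With the lenient variant, an aggregate over a nonexistent column is not rejected up front, while a clearly unsupported type (VARCHAR in this sketch) still raises an error.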
{code:title=Plan before DRILL-4467}
VOLCANO:Physical Planning (71ms):
ScreenPrel: rowcount = 1.0, cumulative cost = {464.1 rows, 2375.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 185
  ProjectPrel(col1=[$0], col2=[$1], col3=[$2], col4=[$3], col5=[$4]): rowcount = 1.0, cumulative cost = {464.0 rows, 2375.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 184
    StreamAggPrel(group=[{}], col1=[SUM($0)], col2=[SUM($1)], col3=[SUM($2)], col4=[SUM($3)], col5=[SUM($4)]): rowcount = 1.0, cumulative cost = {464.0 rows, 2375.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 183
      LimitPrel(offset=[0], fetch=[0]): rowcount = 1.0, cumulative cost = {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 182
        ProjectPrel(int_col=[$0], bigint_col=[$3], float4_col=[$4], float8_col=[$1], interval_year_col=[$2]): rowcount = 463.0, cumulative cost = {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 181
          ScanPrel(groupscan=[EasyGroupScan [selectionRoot=classpath:/employee.json, numFiles=1, columns=[`int_col`, `bigint_col`, `float4_col`, `float8_col`, `interval_year_col`], files=[classpath:/employee.json]]]): rowcount = 463.0, cumulative cost = {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 160
{code}
{code:title=Plan after DRILL-4467}
VOLCANO:Physical Planning (63ms):
ScreenPrel: rowcount = 1.0, cumulative cost = {464.1 rows, 2375.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 151
  ProjectPrel(col1=[$0], col2=[$1], col3=[$2], col4=[$3], col5=[$4]): rowcount = 1.0, cumulative cost = {464.0 rows, 2375.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 150
    StreamAggPrel(group=[{}], col1=[SUM($0)], col2=[SUM($1)], col3=[SUM($2)], col4=[SUM($3)], col5=[SUM($4)]): rowcount = 1.0, cumulative cost = {464.0 rows, 2375.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 149
      LimitPrel(offset=[0], fetch=[0]): rowcount = 1.0, cumulative cost = {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 148
        ScanPrel(groupscan=[EasyGroupScan [selectionRoot=classpath:/employee.json, numFiles=1, columns=[`int_col`, `bigint_col`, `float4_col`, `float8_col`, `interval_year_col`], files=[classpath:/employee.json]]]): rowcount = 463.0, cumulative cost = {463.0 rows, 2315.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 141
{code}
The tests in TestAggregateFunctions that are disabled with a reference to this
bug show multiple examples of this behavior.