Khurram Faraaz created DRILL-2504:
-------------------------------------

             Summary: Aggregate query with grouping results in an error
                 Key: DRILL-2504
                 URL: https://issues.apache.org/jira/browse/DRILL-2504
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Relational Operators
    Affects Versions: 0.8.0
         Environment: 4 node cluster
{code}
0: jdbc:drill:> select * from sys.version;
+------------+----------------+-------------+-------------+------------+
| commit_id  | commit_message | commit_time | build_email | build_time |
+------------+----------------+-------------+-------------+------------+
| f658a3c513ddf7f2d1b0ad7aa1f3f65049a594fe | DRILL-2209 Insert ProjectOperator with MuxExchange | 09.03.2015 @ 01:49:18 EDT | Unknown     | 09.03.2015 @ 04:52:49 EDT |
+------------+----------------+-------------+-------------+------------+
1 row selected (0.062 seconds)
{code}
            Reporter: Khurram Faraaz
            Assignee: Chris Westin


The aggregate query below, which groups by one column and computes a mix of distinct and non-distinct aggregates, fails with an exception. Note that hash aggregation was disabled, the query read from a CSV file, and it was run on the four-node cluster. The option was changed at both the system and the session level:

alter system set `planner.enable_hashagg`=false;
alter session set `planner.enable_hashagg`=false;

{code}
0: jdbc:drill:> alter system set `planner.enable_hashagg`=false;
+------------+------------+
|     ok     |  summary   |
+------------+------------+
| true       | planner.enable_hashagg updated. |
+------------+------------+
1 row selected (0.075 seconds)
0: jdbc:drill:> select columns[4], sum(columns[0]), count(distinct columns[1]), max(columns[2]), count(distinct columns[3]), max(columns[5]), min(columns[6]), avg(columns[7])
. . . . . . . > from `conftest.csv`
. . . . . . . > group by columns[4];
Query failed: Query stopped., Failure while trying to materialize incoming schema.  Errors:

Error in expression at index -1.  Error: Missing function implementation: [castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--.. [ 6cd09ba7-3e4b-4b3b-b111-39f74f53e1b0 on centos-01.qa.lab:31010 ]

Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}
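
For isolation, a smaller query over the same file may exercise the same code path under the same settings. This is a hypothetical reduction of the failing query above and has not been verified on this build:

{code}
-- hash aggregation disabled, as in the original report
alter session set `planner.enable_hashagg` = false;

-- keep only the GROUP BY key and a single distinct aggregate from the failing query
select columns[4], count(distinct columns[1])
from `conftest.csv`
group by columns[4];
{code}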

{code}
Stack trace from drillbit.log

2015-03-19 17:47:43,123 [2af4f441-8c04-99f9-1a12-a55a7c72ece7:frag:0:0] ERROR 
o.a.d.e.w.f.AbstractStatusReporter - Error 
bab1babd-48fe-4719-8a77-dc5826027ba7: Failure while running fragment.
org.apache.drill.exec.exception.SchemaChangeException: Failure while trying to 
materialize incoming schema.  Errors:

Error in expression at index -1.  Error: Missing function implementation: 
[castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
        at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:390)
 ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78)
 ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
 ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
 ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
 ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99)
 ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89)
 ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
 ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
 ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
 ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
 ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) 
~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:97)
 ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) 
~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:121)
 ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303)
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_75]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_75]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
2015-03-19 17:47:43,124 [2af4f441-8c04-99f9-1a12-a55a7c72ece7:frag:0:0] INFO  
o.a.drill.exec.work.foreman.Foreman - State change requested.  RUNNING --> 
FAILED
org.apache.drill.exec.rpc.RemoteRpcException: Failure while running fragment., 
Failure while trying to materialize incoming schema.  Errors:

Error in expression at index -1.  Error: Missing function implementation: 
[castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--.. [ 
bab1babd-48fe-4719-8a77-dc5826027ba7 on centos-01.qa.lab:31010 ]
[ bab1babd-48fe-4719-8a77-dc5826027ba7 on centos-01.qa.lab:31010 ]

        at 
org.apache.drill.exec.work.foreman.QueryManager.statusUpdate(QueryManager.java:95)
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.work.foreman.QueryManager$RootStatusReporter.statusChange(QueryManager.java:154)
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:114)
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:110)
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.work.fragment.FragmentExecutor.internalFail(FragmentExecutor.java:176)
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:123)
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303)
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_75]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_75]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
2015-03-19 17:47:43,141 [2af4f441-8c04-99f9-1a12-a55a7c72ece7:frag:0:0] INFO  
o.a.drill.exec.work.foreman.Foreman - State change requested.  FAILED --> 
COMPLETED
2015-03-19 17:47:43,141 [2af4f441-8c04-99f9-1a12-a55a7c72ece7:frag:0:0] WARN  
o.a.drill.exec.work.foreman.Foreman - Dropping request to move to COMPLETED 
state as query is already at FAILED state (which is terminal).
2015-03-19 17:47:43,142 [2af4f441-8c04-99f9-1a12-a55a7c72ece7:frag:0:0] ERROR 
o.a.drill.exec.work.foreman.Foreman - Error 
82bd01dc-d27d-47e4-836d-426f3625511f: RemoteRpcException: Failure while running 
fragment., Failure while trying to materialize incoming schema.  Errors:

Error in expression at index -1.  Error: Missing function implementation: 
[castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--.. [ 
bab1babd-48fe-4719-8a77-dc5826027ba7 on centos-01.qa.lab:31010 ]
[ bab1babd-48fe-4719-8a77-dc5826027ba7 on centos-01.qa.lab:31010 ]

org.apache.drill.exec.rpc.RemoteRpcException: Failure while running fragment., 
Failure while trying to materialize incoming schema.  Errors:

Error in expression at index -1.  Error: Missing function implementation: 
[castINT(BIT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--.. [ 
bab1babd-48fe-4719-8a77-dc5826027ba7 on centos-01.qa.lab:31010 ]
[ bab1babd-48fe-4719-8a77-dc5826027ba7 on centos-01.qa.lab:31010 ]

at 
org.apache.drill.exec.work.foreman.QueryManager.statusUpdate(QueryManager.java:95)
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.work.foreman.QueryManager$RootStatusReporter.statusChange(QueryManager.java:154)
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:114)
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:110)
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.work.fragment.FragmentExecutor.internalFail(FragmentExecutor.java:176)
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:123)
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303)
 [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_75]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_75]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
{code}

Physical plan for the failing query

{code}
00-00    Screen : rowType = RecordType(ANY EXPR$0, ANY EXPR$1, BIGINT EXPR$2, 
ANY EXPR$3, BIGINT EXPR$4, ANY EXPR$5, ANY EXPR$6, ANY EXPR$7): rowcount = 
29.5, cumulative cost = {832.95 rows, 13844.73454409817 cpu, 0.0 io, 0.0 
network, 5852.799999999999 memory}, id = 2049
00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3], 
EXPR$4=[$4], EXPR$5=[$5], EXPR$6=[$6], EXPR$7=[$7]) : rowType = RecordType(ANY 
EXPR$0, ANY EXPR$1, BIGINT EXPR$2, ANY EXPR$3, BIGINT EXPR$4, ANY EXPR$5, ANY 
EXPR$6, ANY EXPR$7): rowcount = 29.5, cumulative cost = {830.0 rows, 
13841.78454409817 cpu, 0.0 io, 0.0 network, 5852.799999999999 memory}, id = 2048
00-02        Project(EXPR$0=[$0], EXPR$1=[CASE(=($2, 0), null, $1)], 
EXPR$2=[$11], EXPR$3=[$3], EXPR$4=[$9], EXPR$5=[$4], EXPR$6=[$5], 
EXPR$7=[CAST(/(CastHigh(CASE(=($7, 0), null, $6)), $7)):ANY]) : rowType = 
RecordType(ANY EXPR$0, ANY EXPR$1, BIGINT EXPR$2, ANY EXPR$3, BIGINT EXPR$4, 
ANY EXPR$5, ANY EXPR$6, ANY EXPR$7): rowcount = 29.5, cumulative cost = {800.5 
rows, 13809.78454409817 cpu, 0.0 io, 0.0 network, 5852.799999999999 memory}, id 
= 2047
00-03          MergeJoin(condition=[IS NOT DISTINCT FROM($0, $10)], 
joinType=[inner]) : rowType = RecordType(ANY EXPR$0, ANY $f1, BIGINT $f2, ANY 
EXPR$3, ANY EXPR$5, ANY EXPR$6, ANY $f6, BIGINT $f7, ANY EXPR$00, BIGINT 
EXPR$4, ANY EXPR$01, BIGINT EXPR$2): rowcount = 29.5, cumulative cost = {771.0 
rows, 13777.78454409817 cpu, 0.0 io, 0.0 network, 5852.799999999999 memory}, id 
= 2046
00-05            MergeJoin(condition=[IS NOT DISTINCT FROM($0, $8)], 
joinType=[inner]) : rowType = RecordType(ANY EXPR$0, ANY $f1, BIGINT $f2, ANY 
EXPR$3, ANY EXPR$5, ANY EXPR$6, ANY $f6, BIGINT $f7, ANY EXPR$00, BIGINT 
EXPR$4): rowcount = 29.5, cumulative cost = {491.70000000000005 rows, 
10177.344151873782 cpu, 0.0 io, 0.0 network, 4814.4 memory}, id = 2038
00-08              StreamAgg(group=[{0}], agg#0=[$SUM0($1)], agg#1=[COUNT($1)], 
EXPR$3=[MAX($3)], EXPR$5=[MAX($5)], EXPR$6=[MIN($6)], agg#5=[$SUM0($7)], 
agg#6=[COUNT($7)]) : rowType = RecordType(ANY EXPR$0, ANY $f1, BIGINT $f2, ANY 
EXPR$3, ANY EXPR$5, ANY EXPR$6, ANY $f6, BIGINT $f7): rowcount = 5.9, 
cumulative cost = {236.0 rows, 6671.303759649394 cpu, 0.0 io, 0.0 network, 
3776.0 memory}, id = 2030
00-11                Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY 
EXPR$0, ANY $f1, ANY $f2, ANY $f3, ANY $f4, ANY $f5, ANY $f6, ANY $f7): 
rowcount = 59.0, cumulative cost = {177.0 rows, 1479.3037596493946 cpu, 0.0 io, 
0.0 network, 3776.0 memory}, id = 2029
00-14                  Project(EXPR$0=[ITEM($0, 4)], $f1=[ITEM($0, 0)], 
$f2=[ITEM($0, 1)], $f3=[ITEM($0, 2)], $f4=[ITEM($0, 3)], $f5=[ITEM($0, 5)], 
$f6=[ITEM($0, 6)], $f7=[ITEM($0, 7)]) : rowType = RecordType(ANY EXPR$0, ANY 
$f1, ANY $f2, ANY $f3, ANY $f4, ANY $f5, ANY $f6, ANY $f7): rowcount = 59.0, 
cumulative cost = {118.0 rows, 91.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 
2028
00-17                    Scan(groupscan=[EasyGroupScan 
[selectionRoot=/tmp/conftest.csv, numFiles=1, columns=[`columns`[4], 
`columns`[0], `columns`[1], `columns`[2], `columns`[3], `columns`[5], 
`columns`[6], `columns`[7]], files=[maprfs:/tmp/conftest.csv]]]) : rowType = 
RecordType(ANY columns): rowcount = 59.0, cumulative cost = {59.0 rows, 59.0 
cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2027
00-07              Project(EXPR$00=[$0], EXPR$4=[$1]) : rowType = 
RecordType(ANY EXPR$00, BIGINT EXPR$4): rowcount = 1.0, cumulative cost = 
{248.8 rows, 3478.440392224387 cpu, 0.0 io, 0.0 network, 1038.4 memory}, id = 
2037
00-10                StreamAgg(group=[{0}], EXPR$4=[COUNT($1)]) : rowType = 
RecordType(ANY EXPR$0, BIGINT EXPR$4): rowcount = 1.0, cumulative cost = {247.8 
rows, 3470.440392224387 cpu, 0.0 io, 0.0 network, 1038.4 memory}, id = 2036
00-13                  Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY 
EXPR$0, ANY $f4): rowcount = 5.9, cumulative cost = {241.9 rows, 
3376.040392224387 cpu, 0.0 io, 0.0 network, 1038.4 memory}, id = 2035
00-16                    StreamAgg(group=[{0, 1}]) : rowType = RecordType(ANY 
EXPR$0, ANY $f4): rowcount = 5.9, cumulative cost = {236.0 rows, 
3315.607519298789 cpu, 0.0 io, 0.0 network, 944.0 memory}, id = 2034
00-19                      Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) 
: rowType = RecordType(ANY EXPR$0, ANY $f4): rowcount = 59.0, cumulative cost = 
{177.0 rows, 2843.607519298789 cpu, 0.0 io, 0.0 network, 944.0 memory}, id = 
2033
00-21                        Project(EXPR$0=[ITEM($0, 4)], $f4=[ITEM($0, 3)]) : 
rowType = RecordType(ANY EXPR$0, ANY $f4): rowcount = 59.0, cumulative cost = 
{118.0 rows, 67.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2032
00-22                          Scan(groupscan=[EasyGroupScan 
[selectionRoot=/tmp/conftest.csv, numFiles=1, columns=[`columns`[4], 
`columns`[3]], files=[maprfs:/tmp/conftest.csv]]]) : rowType = RecordType(ANY 
columns): rowcount = 59.0, cumulative cost = {59.0 rows, 59.0 cpu, 0.0 io, 0.0 
network, 0.0 memory}, id = 2031
00-04            Project(EXPR$01=[$0], EXPR$2=[$1]) : rowType = RecordType(ANY 
EXPR$01, BIGINT EXPR$2): rowcount = 1.0, cumulative cost = {248.8 rows, 
3478.440392224387 cpu, 0.0 io, 0.0 network, 1038.4 memory}, id = 2045
00-06              StreamAgg(group=[{0}], EXPR$2=[COUNT($1)]) : rowType = 
RecordType(ANY EXPR$0, BIGINT EXPR$2): rowcount = 1.0, cumulative cost = {247.8 
rows, 3470.440392224387 cpu, 0.0 io, 0.0 network, 1038.4 memory}, id = 2044
00-09                Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY 
EXPR$0, ANY $f2): rowcount = 5.9, cumulative cost = {241.9 rows, 
3376.040392224387 cpu, 0.0 io, 0.0 network, 1038.4 memory}, id = 2043
00-12                  StreamAgg(group=[{0, 1}]) : rowType = RecordType(ANY 
EXPR$0, ANY $f2): rowcount = 5.9, cumulative cost = {236.0 rows, 
3315.607519298789 cpu, 0.0 io, 0.0 network, 944.0 memory}, id = 2042
00-15                    Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) : 
rowType = RecordType(ANY EXPR$0, ANY $f2): rowcount = 59.0, cumulative cost = 
{177.0 rows, 2843.607519298789 cpu, 0.0 io, 0.0 network, 944.0 memory}, id = 
2041
00-18                      Project(EXPR$0=[ITEM($0, 4)], $f2=[ITEM($0, 1)]) : 
rowType = RecordType(ANY EXPR$0, ANY $f2): rowcount = 59.0, cumulative cost = 
{118.0 rows, 67.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2040
00-20                        Scan(groupscan=[EasyGroupScan 
[selectionRoot=/tmp/conftest.csv, numFiles=1, columns=[`columns`[4], 
`columns`[1]], files=[maprfs:/tmp/conftest.csv]]]) : rowType = RecordType(ANY 
columns): rowcount = 59.0, cumulative cost = {59.0 rows, 59.0 cpu, 0.0 io, 0.0 
network, 0.0 memory}, id = 2039
{code}
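
For reference, a plan like the one above can be reproduced with EXPLAIN PLAN FOR; this is a sketch of how the output was presumably captured, using the same failing query:

{code}
0: jdbc:drill:> explain plan for
. . . . . . . > select columns[4], sum(columns[0]), count(distinct columns[1]), max(columns[2]), count(distinct columns[3]), max(columns[5]), min(columns[6]), avg(columns[7])
. . . . . . . > from `conftest.csv`
. . . . . . . > group by columns[4];
{code}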

Querying through a view that explicitly casts the CSV columns returns correct results:

{code}
0: jdbc:drill:> create view v8(col_decimal, col_int, col_bigint, col_char, col_varchar, col_timestmp, col_float, col_double)
. . . . . . . > as
. . . . . . . > select cast(columns[0] as decimal(11,3)),
. . . . . . . > cast(columns[1] as int),
. . . . . . . > cast(columns[2] as bigint),
. . . . . . . > cast(columns[3] as char(9)),
. . . . . . . > cast(columns[4] as varchar(256)),
. . . . . . . > cast(columns[5] as timestamp),
. . . . . . . > cast(columns[6] as float),
. . . . . . . > cast(columns[7] as double)
. . . . . . . > from `conftest.csv`;
+------------+------------+
|     ok     |  summary   |
+------------+------------+
| true       | View 'v8' created successfully in 'dfs.tmp' schema |
+------------+------------+
1 row selected (0.102 seconds)
0: jdbc:drill:> select col_varchar, sum(col_decimal), count(distinct col_int)
. . . . . . . > from v8
. . . . . . . > group by col_varchar;

209 rows selected (0.713 seconds)
{code}
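
To confirm that the view exposes typed columns (whereas the direct CSV read leaves every columns[i] untyped, shown as ANY in the plan above), the column types can be inspected. This is a sketch that assumes DESCRIBE and INFORMATION_SCHEMA are usable against the view in this build:

{code}
-- show the column types of the view created above in dfs.tmp
describe dfs.tmp.v8;

-- equivalent lookup through the information schema
select COLUMN_NAME, DATA_TYPE
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 'v8';
{code}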

1. The original query over the VIEW returns correct results.
2. The original query over the CSV file, with explicit casts in the select list, also returns correct results.

So the problem appears only when the original aggregate+grouping query is run directly over the CSV file without explicitly casting the columns, as the two queries below show.

{code}
0: jdbc:drill:> select col_varchar, sum(col_decimal), count(distinct col_int), max(col_bigint), count(distinct col_char), max(col_timestmp), min(col_float), avg(col_double)
. . . . . . . > from v8
. . . . . . . > group by col_varchar;
+-------------+------------+------------+------------+------------+------------+------------+------------+
| col_varchar |   EXPR$1   |   EXPR$2   |   EXPR$3   |   EXPR$4   |   EXPR$5   |   EXPR$6   |   EXPR$7   |
+-------------+------------+------------+------------+------------+------------+------------+------------+

209 rows selected (1.286 seconds)

The below query returns correct results

0: jdbc:drill:> select columns[4], sum(cast(columns[0] as decimal(11,3))), count(distinct cast(columns[1] as int)), max(cast(columns[2] as bigint)), count(distinct cast(columns[3] as char(9))), max(cast(columns[5] as timestamp)), min(cast(columns[6] as float)), avg(cast(columns[7] as double))
. . . . . . . > from `conftest.csv`
. . . . . . . > group by columns[4];

209 rows selected (1.129 seconds)
{code}
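
A possible workaround, not verified against this build, is to leave hash aggregation at its default (enabled), since the failure was only observed with `planner.enable_hashagg` set to false and the StreamAgg-based plan that setting forces:

{code}
-- restore the default so the planner can choose hash aggregation again
alter session set `planner.enable_hashagg` = true;

-- re-run the original query
select columns[4], sum(columns[0]), count(distinct columns[1]), max(columns[2]), count(distinct columns[3]), max(columns[5]), min(columns[6]), avg(columns[7])
from `conftest.csv`
group by columns[4];
{code}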


