[ https://issues.apache.org/jira/browse/DRILL-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126582#comment-15126582 ]
Khurram Faraaz commented on DRILL-4255:
---------------------------------------

I have not tried that. I will test with that option set to true and share results here.
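For reference, Drill session options are toggled with ALTER SESSION. Assuming the option in question is `store.json.all_text_mode` (a common suggestion for JSON schema-change errors, since it makes every JSON scalar read as VARCHAR; the earlier comment naming the option is not quoted in this notification, so the name is an assumption), the test would look roughly like this sketch:

{noformat}
-- Sketch only: `store.json.all_text_mode` is assumed to be the option
-- under discussion; it is not named anywhere in this thread.
ALTER SESSION SET `store.json.all_text_mode` = true;

-- Re-run the failing query; with every JSON scalar read as VARCHAR the
-- scanned batches should present a single, stable schema to the HashAgg.
SELECT DISTINCT t.operation FROM `auditlogs` t;
{noformat}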
> SELECT DISTINCT query over JSON data returns UNSUPPORTED OPERATION
> ------------------------------------------------------------------
>
>                 Key: DRILL-4255
>                 URL: https://issues.apache.org/jira/browse/DRILL-4255
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.4.0
>         Environment: CentOS
>            Reporter: Khurram Faraaz
>
> SELECT DISTINCT over MapR-FS-generated audit logs (JSON files) fails with an
> unsupported-operation error. An identical query over a different set of JSON
> data returns correct results.
>
> MapR Drill 1.4.0, commit ID: 9627a80f
> MapRBuildVersion: 5.1.0.36488.GA
> OS: CentOS x86_64 GNU/Linux
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select distinct t.operation from `auditlogs` t;
> Error: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes
> Fragment 3:3
> [Error Id: 1233bf68-13da-4043-a162-cf6d98c07ec9 on example.com:31010] (state=,code=0)
> {noformat}
>
> Stack trace from drillbit.log:
> {noformat}
> 2016-01-08 11:35:35,093 [297060f9-1c7a-b32c-09e8-24b5ad863e73:frag:3:3] INFO  o.a.d.e.p.i.aggregate.HashAggBatch - User Error Occurred
> org.apache.drill.common.exceptions.UserException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes
> [Error Id: 1233bf68-13da-4043-a162-cf6d98c07ec9 ]
>         at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) ~[drill-common-1.4.0.jar:1.4.0]
>         at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:144) [drill-java-exec-1.4.0.jar:1.4.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) [drill-java-exec-1.4.0.jar:1.4.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) [drill-java-exec-1.4.0.jar:1.4.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) [drill-java-exec-1.4.0.jar:1.4.0]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.4.0.jar:1.4.0]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132) [drill-java-exec-1.4.0.jar:1.4.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) [drill-java-exec-1.4.0.jar:1.4.0]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) [drill-java-exec-1.4.0.jar:1.4.0]
>         at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) [drill-java-exec-1.4.0.jar:1.4.0]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) [drill-java-exec-1.4.0.jar:1.4.0]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256) [drill-java-exec-1.4.0.jar:1.4.0]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:250) [drill-java-exec-1.4.0.jar:1.4.0]
>         at java.security.AccessController.doPrivileged(Native Method) [na:1.7.0_65]
>         at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_65]
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) [hadoop-common-2.7.0-mapr-1506.jar:na]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250) [drill-java-exec-1.4.0.jar:1.4.0]
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.4.0.jar:1.4.0]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
> {noformat}
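A note on what "schema changes" means in this error: Drill infers a schema per batch of JSON records, so if `operation` is a VARCHAR in some of the 31 files and absent in others (a missing column materializes as a nullable INT by default), HashAggBatch receives two different input schemas mid-query and aborts. Purely hypothetical records that would reproduce the mismatch — not taken from the actual audit logs:

{noformat}
file A (operation inferred as VARCHAR):
  {"operation": "MKDIR", "uid": 1000}

file B (operation absent, so the projected column comes out as nullable INT):
  {"uid": 1001}
{noformat}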
> Query plan for the above query:
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY operation): rowcount = 141437.16, cumulative cost = {3.4100499276E7 rows, 1.69455861396E8 cpu, 0.0 io, 1.2165858754560001E10 network, 2.7382234176000005E8 memory}, id = 7572
> 00-01      UnionExchange : rowType = RecordType(ANY operation): rowcount = 141437.16, cumulative cost = {3.408635556E7 rows, 1.6944171768E8 cpu, 0.0 io, 1.2165858754560001E10 network, 2.7382234176000005E8 memory}, id = 7571
> 01-01        Project(operation=[$0]) : rowType = RecordType(ANY operation): rowcount = 141437.16, cumulative cost = {3.3944918400000006E7 rows, 1.683102204E8 cpu, 0.0 io, 1.15865321472E10 network, 2.7382234176000005E8 memory}, id = 7570
> 01-02          HashAgg(group=[{0}]) : rowType = RecordType(ANY operation): rowcount = 141437.16, cumulative cost = {3.3944918400000006E7 rows, 1.683102204E8 cpu, 0.0 io, 1.15865321472E10 network, 2.7382234176000005E8 memory}, id = 7569
> 01-03            Project(operation=[$0]) : rowType = RecordType(ANY operation): rowcount = 1414371.6, cumulative cost = {3.2530546800000004E7 rows, 1.569952476E8 cpu, 0.0 io, 1.15865321472E10 network, 2.4892940160000002E8 memory}, id = 7568
> 01-04              HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(ANY operation, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1414371.6, cumulative cost = {3.2530546800000004E7 rows, 1.569952476E8 cpu, 0.0 io, 1.15865321472E10 network, 2.4892940160000002E8 memory}, id = 7567
> 02-01                UnorderedMuxExchange : rowType = RecordType(ANY operation, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1414371.6, cumulative cost = {3.1116175200000003E7 rows, 1.34365302E8 cpu, 0.0 io, 0.0 network, 2.4892940160000002E8 memory}, id = 7566
> 03-01                  Project(operation=[$0], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) : rowType = RecordType(ANY operation, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1414371.6, cumulative cost = {2.97018036E7 rows, 1.329509304E8 cpu, 0.0 io, 0.0 network, 2.4892940160000002E8 memory}, id = 7565
> 03-02                    HashAgg(group=[{0}]) : rowType = RecordType(ANY operation): rowcount = 1414371.6, cumulative cost = {2.8287432E7 rows, 1.27293444E8 cpu, 0.0 io, 0.0 network, 2.4892940160000002E8 memory}, id = 7564
> 03-03                      Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/auditlogs, numFiles=31, columns=[`operation`], files=[
>                              maprfs:/tmp/auditlogs/DBAudit.log-2015-12-30-001.json,
>                              maprfs:/tmp/auditlogs/DBAudit.log-2015-12-28-002.json,
>                              maprfs:/tmp/auditlogs/FSAudit.log-2015-12-31-001.json,
>                              maprfs:/tmp/auditlogs/FSAudit.log-2016-01-06-003.json,
>                              maprfs:/tmp/auditlogs/FSAudit.log-2015-12-28-002.json,
>                              maprfs:/tmp/auditlogs/DBAudit.log-2015-12-28-001.json,
>                              maprfs:/tmp/auditlogs/FSAudit.log-2015-12-30-001.json,
>                              maprfs:/tmp/auditlogs/FSAudit.log-2015-12-28-003.json,
>                              maprfs:/tmp/auditlogs/DBAudit.log-2015-12-31-002.json,
>                              maprfs:/tmp/auditlogs/FSAudit.log-2016-01-04-001.json,
>                              maprfs:/tmp/auditlogs/DBAudit.log-2016-01-06-001.json,
>                              maprfs:/tmp/auditlogs/DBAudit.log-2015-12-28-003.json,
>                              maprfs:/tmp/auditlogs/FSAudit.log-2015-12-31-002.json,
>                              maprfs:/tmp/auditlogs/DBAudit.log-2016-01-06-003.json,
>                              maprfs:/tmp/auditlogs/FSAudit.log-2015-12-31-003.json,
>                              maprfs:/tmp/auditlogs/FSAudit.log-2016-01-06-001.json,
>                              maprfs:/tmp/auditlogs/FSAudit.log-2016-01-03-001.json,
>                              maprfs:/tmp/auditlogs/DBAudit.log-2015-12-31-001.json,
>                              maprfs:/tmp/auditlogs/DBAudit.log-2015-12-29-001.json,
>                              maprfs:/tmp/auditlogs/DBAudit.log-2015-12-28-004.json,
>                              maprfs:/tmp/auditlogs/FSAudit.log-2016-01-01-001.json,
>                              maprfs:/tmp/auditlogs/FSAudit.log-2015-12-28-004.json,
>                              maprfs:/tmp/auditlogs/FSAudit.log-2015-12-29-001.json,
>                              maprfs:/tmp/auditlogs/FSAudit.log-2015-12-28-001.json,
>                              maprfs:/tmp/auditlogs/DBAudit.log-2016-01-01-001.json,
>                              maprfs:/tmp/auditlogs/FSAudit.log-2016-01-06-004.json,
>                              maprfs:/tmp/auditlogs/DBAudit.log-2016-01-06-004.json,
>                              maprfs:/tmp/auditlogs/FSAudit.log-2016-01-06-002.json,
>                              maprfs:/tmp/auditlogs/FSAudit.log-2016-01-07-001.json,
>                              maprfs:/tmp/auditlogs/DBAudit.log-2016-01-06-002.json,
>                              maprfs:/tmp/auditlogs/FSAudit.log-2016-01-08-001.json]]]) : rowType = RecordType(ANY operation): rowcount = 1.4143716E7, cumulative cost = {1.4143716E7 rows, 1.4143716E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 7563
> {noformat}
>
> Another query, exactly like the failing one reported here, returns correct results:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select distinct t.key2 from `twoKeyJsn.json` t;
> +-------+
> | key2  |
> +-------+
> | d     |
> | c     |
> | b     |
> | 1     |
> | a     |
> | 0     |
> | k     |
> | m     |
> | j     |
> | h     |
> | e     |
> | n     |
> | g     |
> | f     |
> | l     |
> | i     |
> +-------+
> 16 rows selected (27.097 seconds)
> {noformat}
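The contrast suggests the failure is data-dependent rather than a general problem with SELECT DISTINCT: `twoKeyJsn.json` is a single file with one consistent shape, while `auditlogs` spans 31 DBAudit/FSAudit files whose layouts may differ. One way to narrow down the offending files (a sketch; the two paths are picked arbitrarily from the plan above) is to run the same DISTINCT file by file:

{noformat}
-- Hypothetical isolation step, run under schema dfs.tmp as above:
select distinct t.operation from `auditlogs/DBAudit.log-2015-12-30-001.json` t;
select distinct t.operation from `auditlogs/FSAudit.log-2015-12-31-001.json` t;
{noformat}

Since a single file presents only one inferred schema, each per-file query should succeed; the schema-change error would then only appear when files with different shapes for `operation` are scanned together, which matches the hash aggregate's complaint.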