[ https://issues.apache.org/jira/browse/DRILL-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453573#comment-16453573 ]
Khurram Faraaz edited comment on DRILL-6359 at 4/26/18 6:40 AM:
----------------------------------------------------------------

[~paul-rogers] On Apache Drill 1.14.0-SNAPSHOT, commit id: 931b43ec54bf47dcbb4aa9ae4499f37a8f21b408 we see the same error message

{noformat}
[root@qa102-45 ~]# cd /opt/mapr/drill/drill-1.14.0/bin
[root@qa102-45 bin]# ./sqlline -u "jdbc:drill:schema=dfs.tmp;drillbit=<IP-ADDRESS>"
apache drill 1.14.0-SNAPSHOT
"the only truly happy people are children, the creative minority and drill users"
0: jdbc:drill:schema=dfs.tmp> SELECT a, b FROM `generated.json` WHERE b IS NOT NULL ORDER BY a;
Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External Sort. Please enable Union type.

Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (INT:OPTIONAL)]], selectionVector=NONE]
Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (VARCHAR:OPTIONAL)]], selectionVector=NONE]

Fragment 0:0

[Error Id: fdc19434-9b8b-4bb8-87cf-dd28b182c93a on qa102-45.qa.lab:31010] (state=,code=0)
0: jdbc:drill:schema=dfs.tmp>
{noformat}

Data looks like this, and there are over 70K rows in the JSON data file, generated.json

{noformat}
{ "a": "5ae16f29fb675a7ed96bb532" }
{ "a": "5ae16f29fb675a7ed96bb532" }
...
{ "a": "5ae16f2906e007f3dcbaa714" }
{ "a": "5ae16f2906e007f3dcbaa714" }
{ "a": "5ae16f29972377bc859056c8","b":"10.5"}
{noformat}

Stack trace from drillbit.log for the above failure

{noformat}
2018-04-25 23:32:08,980 [251e8d97-3a31-2d67-0b14-7389103b1690:frag:0:0] WARN o.a.d.e.e.ExpressionTreeMaterializer - Unable to find value vector of path `b`, returning null instance.
2018-04-25 23:32:09,050 [251e8d97-3a31-2d67-0b14-7389103b1690:frag:0:0] INFO o.a.d.e.p.i.x.m.ExternalSortBatch - User Error Occurred: Schema changes not supported in External Sort. Please enable Union type.
org.apache.drill.common.exceptions.UserException: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External Sort. Please enable Union type.

Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (INT:OPTIONAL)]], selectionVector=NONE]
Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (VARCHAR:OPTIONAL)]], selectionVector=NONE]

[Error Id: fdc19434-9b8b-4bb8-87cf-dd28b182c93a ]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.setupSchema(ExternalSortBatch.java:459) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch(ExternalSortBatch.java:410) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:354) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:299) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:118) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:108) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:80) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:118) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:108) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:83) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:292) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:279) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_161]
    at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_161]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) [hadoop-common-2.7.0-mapr-1707.jar:na]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:279) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_161]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
{noformat}

> All-text mode in JSON still reads missing column as Nullable Int
> ----------------------------------------------------------------
>
>                 Key: DRILL-6359
>                 URL: https://issues.apache.org/jira/browse/DRILL-6359
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Priority: Major
>
> Suppose we have the following file:
> {noformat}
> {a: 0}
> {a: 1}
> ...
> {a: 70001, b: 10.5}
> {noformat}
> Where the "..." indicates another 70K records. (Chosen to force the
> appearance of {{b}} into a second or later batch.)
> Suppose we execute the following query:
> {code}
> ALTER SESSION SET `store.json.all_text_mode` = true;
> SELECT a, b FROM `70Kmissing.json` WHERE b IS NOT NULL ORDER BY a;
> {code}
> The query should work. We have an explicit project for column {{b}} and we've
> told JSON to always use text. So, JSON should have enough information to
> create column {{b}} as {{Nullable VarChar}}.
> Yet, the result of the query in {{sqlline}} is:
> {noformat}
> Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External
> Sort. Please enable Union type.
> Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b`
> (INT:OPTIONAL)]], selectionVector=NONE]
> Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b`
> (VARCHAR:OPTIONAL)]], selectionVector=NONE]
> {noformat}
> The expected result is that the query works because even missing columns
> should be subject to the "all text mode" setting because the JSON reader
> handles projection push-down, and is responsible for filling in the missing
> columns.
> This is with the shipping Drill 1.13 JSON reader. I *think* this is fixed in
> the "batch size handling" JSON reader rewrite, but I've not tested it.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
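
For anyone wanting to regenerate a file shaped like the one in the issue description, here is a minimal sketch. It assumes that about 70,000 filler records are enough to push {{b}} into a second or later batch, as the description suggests; the actual batch size is not stated here. The function name write_repro_file and its filler_rows parameter are hypothetical helpers for illustration, not part of Drill. Note that json.dumps writes quoted keys, unlike the unquoted shorthand used in the description.

```python
import json
from pathlib import Path


def write_repro_file(path, filler_rows=70_000):
    """Write filler_rows records containing only `a`, then one record that adds `b`.

    Hypothetical helper: mirrors the layout sketched in the issue description.
    Returns the total number of records written.
    """
    path = Path(path)
    with path.open("w", encoding="utf-8") as out:
        # Filler records: column `b` is absent from every one of them.
        for i in range(filler_rows):
            out.write(json.dumps({"a": i}) + "\n")
        # The final record is the first place column `b` appears.
        out.write(json.dumps({"a": filler_rows + 1, "b": 10.5}) + "\n")
    return filler_rows + 1
```

Calling write_repro_file("70Kmissing.json") writes 70,001 records, one per line. The file still has to be placed wherever the queried workspace can read it before running the query from the description.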