[jira] [Comment Edited] (DRILL-6359) All-text mode in JSON still reads missing column as Nullable Int

Khurram Faraaz (JIRA) Wed, 25 Apr 2018 23:42:13 -0700

    [ https://issues.apache.org/jira/browse/DRILL-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453573#comment-16453573 ]


Khurram Faraaz edited comment on DRILL-6359 at 4/26/18 6:40 AM:
----------------------------------------------------------------

[~paul-rogers]

On Apache Drill 1.14.0-SNAPSHOT (commit id 931b43ec54bf47dcbb4aa9ae4499f37a8f21b408), we see the same error message:
{noformat}
[root@qa102-45 ~]# cd /opt/mapr/drill/drill-1.14.0/bin
[root@qa102-45 bin]# ./sqlline -u "jdbc:drill:schema=dfs.tmp;drillbit=<IP-ADDRESS>"
apache drill 1.14.0-SNAPSHOT
"the only truly happy people are children, the creative minority and drill users"

0: jdbc:drill:schema=dfs.tmp> SELECT a, b FROM `generated.json` WHERE b IS NOT NULL ORDER BY a;
Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External Sort. Please enable Union type.

Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (INT:OPTIONAL)]], selectionVector=NONE]
Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (VARCHAR:OPTIONAL)]], selectionVector=NONE]
Fragment 0:0

[Error Id: fdc19434-9b8b-4bb8-87cf-dd28b182c93a on qa102-45.qa.lab:31010] (state=,code=0)
0: jdbc:drill:schema=dfs.tmp>
{noformat}
The data looks like this; the JSON data file, generated.json, contains over 70K rows:
{noformat}
{
 "a": "5ae16f29fb675a7ed96bb532"
 }
 {
 "a": "5ae16f29fb675a7ed96bb532"
 }
 ...
 { "a": "5ae16f2906e007f3dcbaa714" }
 { "a": "5ae16f2906e007f3dcbaa714" }
 { "a": "5ae16f29972377bc859056c8","b":"10.5"}
{noformat}
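For anyone trying to reproduce this, a file of the same shape can be generated with a short script. This is only a minimal sketch, not the script that produced generated.json: the function name write_sample, the random 24-hex-digit ids, and the exact row count are assumptions. It writes many records that contain only "a", followed by a single record that also contains "b" as a string.

```python
import json
import random


def write_sample(path, missing_rows=70_000, seed=0):
    """Write `missing_rows` records with only "a", then one record with "a" and "b".

    Returns the total number of records written.
    """
    rng = random.Random(seed)

    def make_id():
        # Hypothetical id format: 24 hex digits, similar in shape to the sample above.
        return "%024x" % rng.getrandbits(96)

    with open(path, "w") as out:
        for _ in range(missing_rows):
            out.write(json.dumps({"a": make_id()}) + "\n")
        # The only record that carries column "b"; it comes last so that
        # "b" first appears well after the start of the file.
        out.write(json.dumps({"a": make_id(), "b": "10.5"}) + "\n")
    return missing_rows + 1
```

Calling write_sample("generated.json") in a directory that the dfs.tmp workspace points to should give a comparable input file.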
Stack trace from drillbit.log for the above failure:
{noformat}
2018-04-25 23:32:08,980 [251e8d97-3a31-2d67-0b14-7389103b1690:frag:0:0] WARN o.a.d.e.e.ExpressionTreeMaterializer - Unable to find value vector of path `b`, returning null instance.
2018-04-25 23:32:09,050 [251e8d97-3a31-2d67-0b14-7389103b1690:frag:0:0] INFO o.a.d.e.p.i.x.m.ExternalSortBatch - User Error Occurred: Schema changes not supported in External Sort. Please enable Union type.
org.apache.drill.common.exceptions.UserException: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External Sort. Please enable Union type.

Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (INT:OPTIONAL)]], selectionVector=NONE]
Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (VARCHAR:OPTIONAL)]], selectionVector=NONE]

[Error Id: fdc19434-9b8b-4bb8-87cf-dd28b182c93a ]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.setupSchema(ExternalSortBatch.java:459) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch(ExternalSortBatch.java:410) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:354) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:299) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:118) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:108) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:80) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:118) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:108) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:164) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:83) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:292) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:279) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_161]
    at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_161]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) [hadoop-common-2.7.0-mapr-1707.jar:na]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:279) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_161]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
{noformat}



> All-text mode in JSON still reads missing column as Nullable Int
> ----------------------------------------------------------------
>
>                 Key: DRILL-6359
>                 URL: https://issues.apache.org/jira/browse/DRILL-6359
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Priority: Major
>
> Suppose we have the following file:
> {noformat}
> {a: 0}
> {a: 1}
> ...
> {a: 70001, b: 10.5}
> {noformat}
> Where the "..." indicates another 70K records. (Chosen to force the 
> appearance of {{b}} into a second or later batch.)
> Suppose we execute the following query:
> {code}
> ALTER SESSION SET `store.json.all_text_mode` = true;
> SELECT a, b FROM `70Kmissing.json` WHERE b IS NOT NULL ORDER BY a;
> {code}
> The query should work. We have an explicit project for column {{b}} and we've 
> told JSON to always use text. So, JSON should have enough information to 
> create column {{b}} as {{Nullable VarChar}}.
> Yet, the result of the query in {{sqlline}} is:
> {noformat}
> Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External 
> Sort. Please enable Union type.
> Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` 
> (INT:OPTIONAL)]], selectionVector=NONE]
> Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` 
> (VARCHAR:OPTIONAL)]], selectionVector=NONE]
> {noformat}
> The expected result is that the query works: even missing columns should be 
> subject to the "all text mode" setting, because the JSON reader handles 
> projection push-down and is responsible for filling in the missing columns.
> This is with the shipping Drill 1.13 JSON reader. I *think* this is fixed in 
> the "batch size handling" JSON reader rewrite, but I've not tested it.
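
To make the failure mode in the description concrete, here is a toy model of it. It is a sketch under stated assumptions, not Drill's reader code: batch_schema, first_schema_change, and the batch_size parameter are hypothetical names, and the rule "a projected column absent from a whole batch becomes INT:OPTIONAL even in all-text mode" is taken from the schemas shown in the error message above. It shows why the column has to first appear in a second or later batch for the schema change to surface.

```python
def batch_schema(batch, projected):
    """Return {column: type} for one batch, assuming all-text mode is on."""
    schema = {}
    for col in projected:
        present = any(col in record for record in batch)
        if present:
            # All-text mode: every column actually seen is read as text.
            schema[col] = "VARCHAR:OPTIONAL"
        else:
            # Assumption modelled on the report: a projected column that is
            # missing from the whole batch is filled in as nullable INT.
            schema[col] = "INT:OPTIONAL"
    return schema


def first_schema_change(records, projected, batch_size):
    """Return (batch_index, previous_schema, incoming_schema) or None."""
    previous = None
    for start in range(0, len(records), batch_size):
        current = batch_schema(records[start:start + batch_size], projected)
        if previous is not None and current != previous:
            return start // batch_size, previous, current
        previous = current
    return None


# Ten records without "b", then one with "b", read four records per batch.
records = [{"a": i} for i in range(10)] + [{"a": 10, "b": 10.5}]
print(first_schema_change(records, ["a", "b"], batch_size=4))
```

In this model, a batch size larger than the whole file yields no schema change at all, which matches the note that the record count was chosen to push {{b}} into a later batch.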



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
