[jira] [Updated] (DRILL-2677) Query does not go beyond 4096 lines in small JSON files
[ https://issues.apache.org/jira/browse/DRILL-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Altekruse updated DRILL-2677:
-----------------------------------
    Fix Version/s: (was: 1.2.0)
                   1.3.0

> Query does not go beyond 4096 lines in small JSON files
> -------------------------------------------------------
>
>                 Key: DRILL-2677
>                 URL: https://issues.apache.org/jira/browse/DRILL-2677
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>         Environment: drill 0.8 official build
>            Reporter: Alexander Reshetov
>            Assignee: Jason Altekruse
>             Fix For: 1.3.0
>
>         Attachments: dataset_4095_and_1.json, dataset_4096_and_1.json,
>                      dataset_sample.json.gz.part-aa, dataset_sample.json.gz.part-ab,
>                      dataset_sample.json.gz.part-ac, dataset_sample.json.gz.part-ad,
>                      dataset_sample.json.gz.part-ae, dataset_sample.json.gz.part-af
>
>
> Hello,
> I'm trying to execute the following query:
> {code}
> select * from (select source.pck, source.`timestamp`,
> flatten(source.HostUpdateTypeNW.Transfers) as entry from
> dfs.`/mnt/data/dataset_4095_and_1.json` as source) as parsed;
> {code}
> It works as expected and returns this result:
> {code}
> +------+------------------+-------+
> | pck  | timestamp        | entry |
> +------+------------------+-------+
> | 3547 | 1419807470286356 | {"TransferingPurpose":"8","TransferingImpact":"88","TransferingKind":"8","TransferingTime":"8","PackageOrigSenderID":"8","TransferingID":"8","TransitCN":"888","PackageChkPnt":"","PackageFullSize":"8","TransferingSessionID":"8","SubpackagesCounter":"8"} |
> +------+------------------+-------+
> 1 row selected (0.188 seconds)
> {code}
> This file contains 4095 identical lines of one JSON string, followed by one
> different JSON line at the end (see attached file dataset_4095_and_1.json).
> The problem is that when the first string repeats more than 4095 times, the
> query throws an exception. Here is the query for a file with 4096 strings of
> the first type plus 1 string of another type (see attached file
> dataset_4096_and_1.json).
> {code}
> select * from (select source.pck, source.`timestamp`,
> flatten(source.HostUpdateTypeNW.Transfers) as entry from
> dfs.`/mnt/data/dataset_4096_and_1.json` as source) as parsed;
> Exception in thread "2ae108ff-b7ea-8f07-054e-84875815d856:frag:0:0"
> java.lang.RuntimeException: Error closing fragment context.
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:224)
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:187)
>     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassCastException:
> org.apache.drill.exec.vector.NullableIntVector cannot be cast to
> org.apache.drill.exec.vector.RepeatedVector
>     at org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.getFlattenFieldTransferPair(FlattenRecordBatch.java:274)
>     at org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.setupNewSchema(FlattenRecordBatch.java:296)
>     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78)
>     at org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.innerNext(FlattenRecordBatch.java:122)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
>     at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89)
>     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
>     at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:68)
>     at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:96)
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:58)
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:163)
>     ... 4 more
> Query failed: RemoteRpcException: Failure while running fragment.,
> org.apache.drill.exec.vector.NullableIntVector cannot be cast to
> org.apache.drill.exec.vector.RepeatedVector [
> cb6c7914-438f-440a-9c74-fe39130feca9 on testlab-
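
For readers without access to the attachments, the two test files described in the report can be approximated with a short script. This is only a sketch under assumptions: the attached files are not reproduced in this thread, so the record shapes below are hypothetical. In particular, it assumes the repeated line has no HostUpdateTypeNW.Transfers array and only the final line does, which is one plausible reading of the single-row result shown in the report. The function name write_dataset and all field values are placeholders.

```python
import json
from pathlib import Path


def write_dataset(path, repeats):
    """Write `repeats` identical records, then one record of a different shape.

    Returns the total number of lines written.
    """
    # Assumed shape of the repeated line: no HostUpdateTypeNW.Transfers field.
    repeated = {"pck": 1, "timestamp": 0}
    # Assumed shape of the final line: it carries the array that flatten()
    # operates on. The values are placeholders, not the attachment contents.
    final = {
        "pck": 3547,
        "timestamp": 1419807470286356,
        "HostUpdateTypeNW": {"Transfers": [{"TransferingID": "8"}]},
    }
    lines = [json.dumps(repeated)] * repeats + [json.dumps(final)]
    # One JSON object per line, matching the layout described in the report.
    Path(path).write_text("\n".join(lines) + "\n")
    return len(lines)
```

Calling write_dataset("generated_4095_and_1.json", 4095) and write_dataset("generated_4096_and_1.json", 4096) would produce a pair of files on either side of the 4096-line boundary. Whether they trigger the same exception depends on how closely the assumed shapes match the real attachments.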
[jira] [Updated] (DRILL-2677) Query does not go beyond 4096 lines in small JSON files
[ https://issues.apache.org/jira/browse/DRILL-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Altekruse updated DRILL-2677:
-----------------------------------
    Fix Version/s: (was: 1.1.0)
                   1.2.0
[jira] [Updated] (DRILL-2677) Query does not go beyond 4096 lines in small JSON files
[ https://issues.apache.org/jira/browse/DRILL-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victoria Markman updated DRILL-2677:
------------------------------------
    Fix Version/s: (was: 1.0.0)
                   1.1.0
[jira] [Updated] (DRILL-2677) Query does not go beyond 4096 lines in small JSON files
[ https://issues.apache.org/jira/browse/DRILL-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Altekruse updated DRILL-2677:
-----------------------------------
    Fix Version/s: 1.0.0
[jira] [Updated] (DRILL-2677) Query does not go beyond 4096 lines in small JSON files
[ https://issues.apache.org/jira/browse/DRILL-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-2677:
----------------------------------
    Component/s: Storage - JSON
[jira] [Updated] (DRILL-2677) Query does not go beyond 4096 lines in small JSON files
[ https://issues.apache.org/jira/browse/DRILL-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Reshetov updated DRILL-2677:
--------------------------------------
    Attachment: dataset_sample.json.gz.part-af
                dataset_sample.json.gz.part-ae
                dataset_sample.json.gz.part-ad
                dataset_sample.json.gz.part-ac
                dataset_sample.json.gz.part-ab
                dataset_sample.json.gz.part-aa

Large file regarding the second part of the issue. Jira does not allow files >10M, so I split the archive into 10M parts. You can merge them via:
{code}
cat dataset_sample.json.gz.part-* > dataset_sample.json.gz
{code}
The MD5 sum of the merged file should be: {{238744304ff5200df5a357d9bac090dc}}
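
The merge and checksum steps from the comment above can also be done in one pass, which may help on systems without cat or md5sum. This is a minimal sketch, not part of the original report; the function name merge_parts and its parameters are made up for illustration. It concatenates the parts in sorted name order, which matches the order a shell glob such as part-* would expand to for these file names.

```python
import hashlib
from pathlib import Path


def merge_parts(directory, prefix, output):
    """Concatenate files named `prefix*` in sorted order into `output`.

    Returns the MD5 hex digest of the merged content.
    """
    parts = sorted(Path(directory).glob(prefix + "*"))
    digest = hashlib.md5()
    with open(output, "wb") as out:
        for part in parts:
            data = part.read_bytes()  # each part is about 10M, so this fits in memory
            out.write(data)
            digest.update(data)
    return digest.hexdigest()
```

With the six attachments downloaded into the current directory, merge_parts(".", "dataset_sample.json.gz.part-", "dataset_sample.json.gz") should return the MD5 sum quoted in the comment if the download is intact.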
[jira] [Updated] (DRILL-2677) Query does not go beyond 4096 lines in small JSON files
[ https://issues.apache.org/jira/browse/DRILL-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Reshetov updated DRILL-2677:
--------------------------------------
    Attachment: dataset_4096_and_1.json
                dataset_4095_and_1.json

The first two files, regarding the root cause of the issue.