[jira] [Updated] (DRILL-2677) Query does not go beyond 4096 lines in small JSON files

2015-08-21 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-2677:
---
Fix Version/s: (was: 1.2.0)
   1.3.0

> Query does not go beyond 4096 lines in small JSON files
> ---
>
> Key: DRILL-2677
> URL: https://issues.apache.org/jira/browse/DRILL-2677
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
> Environment: drill 0.8 official build
>Reporter: Alexander Reshetov
>Assignee: Jason Altekruse
> Fix For: 1.3.0
>
> Attachments: dataset_4095_and_1.json, dataset_4096_and_1.json, 
> dataset_sample.json.gz.part-aa, dataset_sample.json.gz.part-ab, 
> dataset_sample.json.gz.part-ac, dataset_sample.json.gz.part-ad, 
> dataset_sample.json.gz.part-ae, dataset_sample.json.gz.part-af
>
>
> Hello,
> I'm trying to execute the following query:
> {code}
> select * from (select source.pck, source.`timestamp`, 
> flatten(source.HostUpdateTypeNW.Transfers) as entry from 
> dfs.`/mnt/data/dataset_4095_and_1.json` as source) as parsed;
> {code}
> It works as expected, and I get this result:
> {code}
> +-------+------------------+--------+
> |  pck  |    timestamp     | entry  |
> +-------+------------------+--------+
> | 3547  | 1419807470286356 | {"TransferingPurpose":"8","TransferingImpact":"88","TransferingKind":"8","TransferingTime":"8","PackageOrigSenderID":"8","TransferingID":"8","TransitCN":"888","PackageChkPnt":"","PackageFullSize":"8","TransferingSessionID":"8","SubpackagesCounter":"8"} |
> +-------+------------------+--------+
> 1 row selected (0.188 seconds)
> {code}
> This file contains 4095 identical lines of one JSON record, followed by one
> different JSON line at the end (see attached file dataset_4095_and_1.json).
> The problem is that when the first record repeats more than 4095 times, the
> query fails with an exception. Here is the query for a file with 4096 records
> of the first type plus 1 record of the other type (see attached file
> dataset_4096_and_1.json).
> {code}
> select * from (select source.pck, source.`timestamp`, 
> flatten(source.HostUpdateTypeNW.Transfers) as entry from 
> dfs.`/mnt/data/dataset_4096_and_1.json` as source) as parsed;
> Exception in thread "2ae108ff-b7ea-8f07-054e-84875815d856:frag:0:0" java.lang.RuntimeException: Error closing fragment context.
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:224)
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:187)
>   at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassCastException: org.apache.drill.exec.vector.NullableIntVector cannot be cast to org.apache.drill.exec.vector.RepeatedVector
>   at org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.getFlattenFieldTransferPair(FlattenRecordBatch.java:274)
>   at org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.setupNewSchema(FlattenRecordBatch.java:296)
>   at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78)
>   at org.apache.drill.exec.physical.impl.flatten.FlattenRecordBatch.innerNext(FlattenRecordBatch.java:122)
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
>   at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:99)
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:89)
>   at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>   at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:134)
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
>   at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
>   at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:68)
>   at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:96)
>   at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:58)
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:163)
>   ... 4 more
>
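The report does not say why the threshold sits at exactly 4096 lines. One possible reading, offered here only as an assumption and not as a confirmed diagnosis, is that the JSON reader hands records to `flatten` in batches of at most 4096 rows. The sketch below uses a hypothetical `batch_layout` helper and an assumed `BATCH_SIZE` of 4096 purely to show the arithmetic: with 4095 + 1 lines every record, including the final one that carries `Transfers`, lands in a single batch, while with 4096 + 1 lines the final record sits alone in a second batch. A schema change between batches would be consistent with the exception being raised from `FlattenRecordBatch.setupNewSchema`, but the report itself does not establish this.

```python
from typing import List

# Assumption (not stated in the report): records are read into batches of at most
# BATCH_SIZE rows. 4096 is chosen only because it matches the observed threshold.
BATCH_SIZE = 4096


def batch_layout(total_records: int, batch_size: int = BATCH_SIZE) -> List[int]:
    """Return how many records fall into each consecutive batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full, rest = divmod(total_records, batch_size)
    # Full batches first, then one partial batch for any leftover records.
    return [batch_size] * full + ([rest] if rest else [])


if __name__ == "__main__":
    for name, lines in (("dataset_4095_and_1.json", 4095 + 1),
                        ("dataset_4096_and_1.json", 4096 + 1)):
        print(name, batch_layout(lines))
    # prints:
    # dataset_4095_and_1.json [4096]
    # dataset_4096_and_1.json [4096, 1]
```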

[jira] [Updated] (DRILL-2677) Query does not go beyond 4096 lines in small JSON files

2015-06-26 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-2677:
---
Fix Version/s: (was: 1.1.0)
   1.2.0

> Query does not go beyond 4096 lines in small JSON files
> ---
>
> Key: DRILL-2677
> URL: https://issues.apache.org/jira/browse/DRILL-2677
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
> Environment: drill 0.8 official build
>Reporter: Alexander Reshetov
>Assignee: Jason Altekruse
> Fix For: 1.2.0
>
> Attachments: dataset_4095_and_1.json, dataset_4096_and_1.json, 
> dataset_sample.json.gz.part-aa, dataset_sample.json.gz.part-ab, 
> dataset_sample.json.gz.part-ac, dataset_sample.json.gz.part-ad, 
> dataset_sample.json.gz.part-ae, dataset_sample.json.gz.part-af
>
>
>

[jira] [Updated] (DRILL-2677) Query does not go beyond 4096 lines in small JSON files

2015-05-08 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-2677:

Fix Version/s: (was: 1.0.0)
   1.1.0

> Query does not go beyond 4096 lines in small JSON files
> ---
>
> Key: DRILL-2677
> URL: https://issues.apache.org/jira/browse/DRILL-2677
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
> Environment: drill 0.8 official build
>Reporter: Alexander Reshetov
>Assignee: Jason Altekruse
> Fix For: 1.1.0
>
> Attachments: dataset_4095_and_1.json, dataset_4096_and_1.json, 
> dataset_sample.json.gz.part-aa, dataset_sample.json.gz.part-ab, 
> dataset_sample.json.gz.part-ac, dataset_sample.json.gz.part-ad, 
> dataset_sample.json.gz.part-ae, dataset_sample.json.gz.part-af
>
>

[jira] [Updated] (DRILL-2677) Query does not go beyond 4096 lines in small JSON files

2015-05-04 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-2677:
---
Fix Version/s: 1.0.0

> Query does not go beyond 4096 lines in small JSON files
> ---
>
> Key: DRILL-2677
> URL: https://issues.apache.org/jira/browse/DRILL-2677
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
> Environment: drill 0.8 official build
>Reporter: Alexander Reshetov
> Fix For: 1.0.0
>
> Attachments: dataset_4095_and_1.json, dataset_4096_and_1.json, 
> dataset_sample.json.gz.part-aa, dataset_sample.json.gz.part-ab, 
> dataset_sample.json.gz.part-ac, dataset_sample.json.gz.part-ad, 
> dataset_sample.json.gz.part-ae, dataset_sample.json.gz.part-af
>
>

[jira] [Updated] (DRILL-2677) Query does not go beyond 4096 lines in small JSON files

2015-04-10 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-2677:
--
Component/s: Storage - JSON

> Query does not go beyond 4096 lines in small JSON files
> ---
>
> Key: DRILL-2677
> URL: https://issues.apache.org/jira/browse/DRILL-2677
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
> Environment: drill 0.8 official build
>Reporter: Alexander Reshetov
> Attachments: dataset_4095_and_1.json, dataset_4096_and_1.json, 
> dataset_sample.json.gz.part-aa, dataset_sample.json.gz.part-ab, 
> dataset_sample.json.gz.part-ac, dataset_sample.json.gz.part-ad, 
> dataset_sample.json.gz.part-ae, dataset_sample.json.gz.part-af
>
>

[jira] [Updated] (DRILL-2677) Query does not go beyond 4096 lines in small JSON files

2015-04-03 Thread Alexander Reshetov (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Reshetov updated DRILL-2677:
--
Attachment: dataset_sample.json.gz.part-af
dataset_sample.json.gz.part-ae
dataset_sample.json.gz.part-ad
dataset_sample.json.gz.part-ac
dataset_sample.json.gz.part-ab
dataset_sample.json.gz.part-aa

Large file regarding the second part of the issue.

Jira does not allow files larger than 10 MB, so I split the archive into 10 MB parts.
You can merge them with:
{code}
cat dataset_sample.json.gz.part-* > dataset_sample.json.gz
{code}

The MD5 checksum of the merged file should be: {{238744304ff5200df5a357d9bac090dc}}
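On a system without `cat`, the merge and the checksum check can also be done in one pass. This is a minimal Python sketch, assuming the six parts sit in one directory and that lexical order (part-aa, part-ab, ...) is the correct concatenation order, which is what the shell glob in the command above relies on. `merge_parts` is a hypothetical helper, not part of Drill.

```python
import glob
import hashlib


def merge_parts(pattern: str, output_path: str) -> str:
    """Concatenate the files matching `pattern` in sorted (lexical) order into
    `output_path` and return the MD5 hex digest of the merged bytes."""
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    digest = hashlib.md5()
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream 1 MiB chunks so the ~60 MB archive is never held in memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    digest.update(chunk)
    return digest.hexdigest()
```

If all parts downloaded intact, `merge_parts("dataset_sample.json.gz.part-*", "dataset_sample.json.gz")` should return `238744304ff5200df5a357d9bac090dc`; any other value suggests a missing or corrupted part.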

> Query does not go beyond 4096 lines in small JSON files
> ---
>
> Key: DRILL-2677
> URL: https://issues.apache.org/jira/browse/DRILL-2677
> Project: Apache Drill
>  Issue Type: Bug
> Environment: drill 0.8 official build
>Reporter: Alexander Reshetov
> Attachments: dataset_4095_and_1.json, dataset_4096_and_1.json, 
> dataset_sample.json.gz.part-aa, dataset_sample.json.gz.part-ab, 
> dataset_sample.json.gz.part-ac, dataset_sample.json.gz.part-ad, 
> dataset_sample.json.gz.part-ae, dataset_sample.json.gz.part-af
>
>

[jira] [Updated] (DRILL-2677) Query does not go beyond 4096 lines in small JSON files

2015-04-03 Thread Alexander Reshetov (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Reshetov updated DRILL-2677:
--
Attachment: dataset_4096_and_1.json
dataset_4095_and_1.json

The first two files reproduce the root cause of the issue.
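The contents of the attached files are not reproduced in the report. If they are unavailable, a file with the same shape (N identical lines followed by one different line) can be generated with a sketch like the one below. The record contents are assumptions: the repeated line is modelled as a record with no `HostUpdateTypeNW.Transfers` field at all, which is consistent with the 4095 + 1 query returning a single row but is not confirmed, and the final record is modelled on the row shown in the query output. `write_dataset`, `FILLER`, and `LAST` are hypothetical names.

```python
import json

# Hypothetical filler record: the report does not show the repeated line, so this
# assumes it carries no HostUpdateTypeNW.Transfers field. Values are placeholders.
FILLER = {"pck": 1, "timestamp": 1}

# Final record, modelled on the single row returned by the passing query.
LAST = {
    "pck": 3547,
    "timestamp": 1419807470286356,
    "HostUpdateTypeNW": {
        "Transfers": [
            {
                "TransferingPurpose": "8", "TransferingImpact": "88",
                "TransferingKind": "8", "TransferingTime": "8",
                "PackageOrigSenderID": "8", "TransferingID": "8",
                "TransitCN": "888", "PackageChkPnt": "",
                "PackageFullSize": "8", "TransferingSessionID": "8",
                "SubpackagesCounter": "8",
            }
        ]
    },
}


def write_dataset(path: str, repeats: int) -> None:
    """Write `repeats` copies of FILLER, then one LAST record, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for _ in range(repeats):
            f.write(json.dumps(FILLER) + "\n")
        f.write(json.dumps(LAST) + "\n")
```

`write_dataset("dataset_4095_and_1.json", 4095)` and `write_dataset("dataset_4096_and_1.json", 4096)` produce the two variants. Whether they trigger the same ClassCastException depends on how closely the assumed filler record matches the real one.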

> Query does not go beyond 4096 lines in small JSON files
> ---
>
> Key: DRILL-2677
> URL: https://issues.apache.org/jira/browse/DRILL-2677
> Project: Apache Drill
>  Issue Type: Bug
> Environment: drill 0.8 official build
>Reporter: Alexander Reshetov
> Attachments: dataset_4095_and_1.json, dataset_4096_and_1.json
>
>
> Query failed: RemoteRpcException: Failure while running fragment., 
> org.apache.drill.exec.vector.NullableIntVector cannot be cast to 
> org.apache.drill.exec.vector.RepeatedVector [ 
> cb6c7914-438f-440a-9c74-fe39130feca9 on testlab-