[jira] [Commented] (DRILL-4155) Query with two-way join and flatten fails with "IllegalArgumentException: maxCapacity"

2015-12-02 Thread Abhishek Girish (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037103#comment-15037103
 ] 

Abhishek Girish commented on DRILL-4155:


Looks like the profile doesn't get created (or possibly I couldn't find it). 

> Query with two-way join and flatten fails with "IllegalArgumentException: 
> maxCapacity"
> --
>
> Key: DRILL-4155
> URL: https://issues.apache.org/jira/browse/DRILL-4155
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow, Storage - JSON
>Reporter: Abhishek Girish
> Attachments: drillbit.log.txt
>
>
> The following query on the Yelp Academic dataset fails to execute:
> {code}
> select u.name, b.name , flatten(b.categories) from 
> maprfs.yelp_tutorial.`yelp_academic_dataset_user.json` u, 
> maprfs.yelp_tutorial.`yelp_academic_dataset_business.json` b where 
> u.average_stars = b.stars limit 10
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> IllegalArgumentException: maxCapacity: -104845 (expected: >= 0) Fragment 1:0 
> [Error Id: b0d99a6c-3434-49ce-8aa6-181993cdd853 on atsqa6c62.qa.lab:31010]
> {code}
> Tried on multiple setups in distributed mode; the query consistently fails. 
> The dataset can be accessed from: 
> https://s3.amazonaws.com/apache-drill/files/yelp.tgz (gzip-compressed tar 
> archive)
> Log attached. 
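A negative value like {{maxCapacity: -104845}} is the classic symptom of a 32-bit size computation wrapping around, e.g. when flatten multiplies an output row count by a per-row width. The sketch below is hypothetical (it is not Drill's actual code; the class and method names are made up) and only illustrates how such a computation goes negative and how widening to {{long}} avoids it:

```java
// Hypothetical illustration, not Drill's actual code: a negative
// "maxCapacity" suggests an int-sized capacity computation overflowing.
public class CapacityOverflow {

    // Buggy version: int * int silently wraps past Integer.MAX_VALUE.
    static int requiredCapacity(int rowCount, int bytesPerRow) {
        return rowCount * bytesPerRow;
    }

    // Safe version: widen to long before multiplying, then range-check.
    static long requiredCapacitySafe(int rowCount, int bytesPerRow) {
        long cap = (long) rowCount * bytesPerRow;
        if (cap < 0 || cap > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("capacity out of range: " + cap);
        }
        return cap;
    }

    public static void main(String[] args) {
        // 50,000 * 60,000 = 3,000,000,000, which exceeds 2^31 - 1 and
        // wraps to a negative int, much like the error in this report.
        System.out.println(requiredCapacity(50_000, 60_000));
        System.out.println(requiredCapacitySafe(10_000, 60_000));
    }
}
```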



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4155) Query with two-way join and flatten fails with "IllegalArgumentException: maxCapacity"

2015-12-02 Thread Abhishek Girish (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-4155:
---
Attachment: (was: profile.json.txt)






[jira] [Updated] (DRILL-4155) Query with two-way join and flatten fails with "IllegalArgumentException: maxCapacity"

2015-12-02 Thread Abhishek Girish (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-4155:
---
Attachment: profile.json.txt






[jira] [Updated] (DRILL-4155) Query with two-way join and flatten fails with "IllegalArgumentException: maxCapacity"

2015-12-02 Thread Abhishek Girish (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-4155:
---
Attachment: drillbit.log.txt






[jira] [Created] (DRILL-4155) Query with two-way join and flatten fails with "IllegalArgumentException: maxCapacity"

2015-12-02 Thread Abhishek Girish (JIRA)
Abhishek Girish created DRILL-4155:
--

 Summary: Query with two-way join and flatten fails with 
"IllegalArgumentException: maxCapacity"
 Key: DRILL-4155
 URL: https://issues.apache.org/jira/browse/DRILL-4155
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow, Storage - JSON
Reporter: Abhishek Girish
 Attachments: drillbit.log.txt






[jira] [Updated] (DRILL-4154) Metadata Caching : Upgrading cache to v2 from v1 corrupts the cache in some scenarios

2015-12-02 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli updated DRILL-4154:
-
Attachment: fewtypes_varcharpartition.tar.tgz
old-cache.txt
broken-cache.txt

Also, I removed the cache file from a directory and copied in another cache 
file. The data in the directory has not been modified, yet when I run a query 
over that directory, I see that the cache file is updated. To my knowledge this 
should not happen. Am I missing something?

> Metadata Caching : Upgrading cache to v2 from v1 corrupts the cache in some 
> scenarios
> -
>
> Key: DRILL-4154
> URL: https://issues.apache.org/jira/browse/DRILL-4154
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Rahul Challapalli
>Priority: Critical
> Attachments: broken-cache.txt, fewtypes_varcharpartition.tar.tgz, 
> old-cache.txt
>
>
> git.commit.id.abbrev=46c47a2
> I copied the data along with the cache file onto maprfs, ran the upgrade tool 
> (https://github.com/parthchandra/drill-upgrade), and then ran the 
> metadata_caching suite from the functional tests (concurrency 10) without the 
> datagen phase. I see 3 test failures, and the cache file appears to contain 
> wrong information for the varchar column. 
> Sample from the cache :
> {code}
>   {
> "name" : [ "varchar_col" ]
>   }, {
> "name" : [ "float_col" ],
> "mxValue" : 68797.22,
> "nulls" : 0
>   }
> {code}
> When I followed the same steps but ran the "REFRESH TABLE METADATA" command 
> (or any query) on that folder instead of the suites, the cache file was 
> created properly.
> I attached the data and cache files required. Let me know if you need anything.
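In the cache sample above, the upgraded varchar entry has lost its stats fields while the float entry kept them. A minimal sketch of detecting this kind of corruption, assuming a simplified column-metadata model (the class and field names below are hypothetical, not Drill's actual cache schema):

```java
import java.util.ArrayList;
import java.util.List;

public class CacheCheck {

    // Simplified stand-in for one per-column entry in the metadata cache;
    // null stats model the fields the v1 -> v2 upgrade dropped.
    record ColumnMeta(String name, Double mxValue, Long nulls) {}

    // Return the names of columns whose statistics are missing.
    static List<String> missingStats(List<ColumnMeta> cols) {
        List<String> bad = new ArrayList<>();
        for (ColumnMeta c : cols) {
            if (c.mxValue() == null || c.nulls() == null) {
                bad.add(c.name());
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Mirrors the sample from the cache: varchar_col lost its stats.
        List<ColumnMeta> cache = List.of(
            new ColumnMeta("varchar_col", null, null),
            new ColumnMeta("float_col", 68797.22, 0L));
        System.out.println(missingStats(cache));
    }
}
```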





[jira] [Created] (DRILL-4154) Metadata Caching : Upgrading cache to v2 from v1 corrupts the cache in some scenarios

2015-12-02 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-4154:


 Summary: Metadata Caching : Upgrading cache to v2 from v1 corrupts 
the cache in some scenarios
 Key: DRILL-4154
 URL: https://issues.apache.org/jira/browse/DRILL-4154
 Project: Apache Drill
  Issue Type: Bug
Reporter: Rahul Challapalli
Priority: Critical







[jira] [Commented] (DRILL-4109) NPE in RecordIterator

2015-12-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036994#comment-15036994
 ] 

ASF GitHub Bot commented on DRILL-4109:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/282


> NPE in RecordIterator
> -
>
> Key: DRILL-4109
> URL: https://issues.apache.org/jira/browse/DRILL-4109
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Victoria Markman
>Assignee: amit hadke
>Priority: Blocker
> Fix For: 1.4.0
>
> Attachments: 29ac6c1b-9b33-3457-8bc8-9e2dff6ad438.sys.drill, 
> 29b41f37-4803-d7ce-e05f-912d1f65da79.sys.drill, drillbit.log, 
> drillbit.log.debug
>
>
> 4 node cluster
> 36GB of direct memory
> 4GB heap memory
> planner.memory.max_query_memory_per_node=2GB (default)
> planner.enable_hashjoin = false
> Spill directory has 6.4T of space available:
> {noformat}
> [Tue Nov 17 18:23:18 /tmp/drill ] # df -H .
> Filesystem   Size  Used Avail Use% Mounted on
> localhost:/mapr  7.7T  1.4T  6.4T  18% /mapr
> {noformat}
> Run query below: 
> framework/resources/Advanced/tpcds/tpcds_sf100/original/query15.sql
> drillbit.log
> {code}
> 2015-11-18 02:22:12,639 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:9] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Merging and spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_9/operator_17/7
> 2015-11-18 02:22:12,770 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:5] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - Merging and spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_5/operator_17/7
> 2015-11-18 02:22:13,345 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:17] INFO 
>  o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_17/operator_17/7
> 2015-11-18 02:22:13,346 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO 
>  o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
> /tmp/drill/spill/29b41f37-4803-d7ce-e05f-912d1f65da79/major_fragment_3/minor_fragment_13/operator_16/1
> 2015-11-18 02:22:13,346 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] WARN 
>  o.a.d.e.p.i.xsort.ExternalSortBatch - Starting to merge. 34 batch groups. 
> Current allocated memory: 2252186
> 2015-11-18 02:22:13,363 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO 
>  o.a.d.e.w.fragment.FragmentExecutor - 
> 29b41f37-4803-d7ce-e05f-912d1f65da79:3:13: State change requested RUNNING --> 
> FAILED
> 2015-11-18 02:22:13,370 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] INFO 
>  o.a.d.e.w.fragment.FragmentExecutor - 
> 29b41f37-4803-d7ce-e05f-912d1f65da79:3:13: State change requested FAILED --> 
> FINISHED
> 2015-11-18 02:22:13,371 [29b41f37-4803-d7ce-e05f-912d1f65da79:frag:3:13] 
> ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: NullPointerException
> Fragment 3:13
> [Error Id: c5d67dcb-16aa-4951-89f5-599b4b4eb54d on atsqa4-133.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> NullPointerException
> Fragment 3:13
> [Error Id: c5d67dcb-16aa-4951-89f5-599b4b4eb54d on atsqa4-133.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_71]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_71]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> java.lang.NullPointerException: null
> {code}
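The log shows the fragment failing right as the external sort starts merging spilled batches, with the NPE surfacing during cleanup. A hedged sketch of the general defect class (this is illustrative only, not Drill's actual RecordIterator): an iterator whose teardown must tolerate a null current batch when the fragment failed before or between reads.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not Drill's code: a record iterator whose cleanup
// guards against a null current batch, the kind of check whose absence
// produces an NPE when a fragment fails mid-spill.
public class SafeRecordIterator {
    private final Deque<int[]> batches = new ArrayDeque<>();
    private int[] current;

    void add(int[] batch) { batches.push(batch); }

    int[] next() {
        current = batches.isEmpty() ? null : batches.pop();
        return current;
    }

    // Cleanup must handle current == null (e.g. failure before first next()).
    int release() {
        int released = 0;
        if (current != null) {
            released += current.length;
            current = null;
        }
        while (!batches.isEmpty()) {
            released += batches.pop().length;
        }
        return released;
    }

    public static void main(String[] args) {
        SafeRecordIterator it = new SafeRecordIterator();
        System.out.println(it.release()); // no NPE even though next() never ran
        it.add(new int[4]);
        it.add(new int[2]);
        it.next();
        System.out.println(it.release());
    }
}
```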





[jira] [Commented] (DRILL-4109) NPE in RecordIterator

2015-12-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036959#comment-15036959
 ] 

ASF GitHub Bot commented on DRILL-4109:
---

Github user StevenMPhillips commented on the pull request:

https://github.com/apache/drill/pull/282#issuecomment-161478598
  
+1







[jira] [Comment Edited] (DRILL-3572) Provide a simple interface to append metadata to files and directories (.drill)

2015-12-02 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036898#comment-15036898
 ] 

Julien Le Dem edited comment on DRILL-3572 at 12/2/15 11:52 PM:


I created separate sub-tickets for several aspects of the dotdrill file.
Each one of them can be implemented independently assuming they are separate 
fields in a {{.drill}} file:
{noformat}
{
   version: ...
   format: {
     ...
   },
   schema: {
     ...
   },
   error_handling: {
     ...
   }
}
{noformat}



> Provide a simple interface to append metadata to files and directories 
> (.drill)
> ---
>
> Key: DRILL-3572
> URL: https://issues.apache.org/jira/browse/DRILL-3572
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Reporter: Jacques Nadeau
> Fix For: Future
>
>
> We need a way to store small amounts of metadata about a file or a collection 
> of files. The current thinking is a "dot drill file" that ascribes metadata 
> to a particular asset. 
> An initial example file might include the following:
> {code}
> {
>   // Drill version identifier
>   version: "dd1",
>   
>   // Format plugin configuration
>   format: {
>     type: "httpd", 
>     format: "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" 
> \"%{Cookie}i\""
>   },
>   
>   // Traits of the underlying data (a.k.a. physical properties)
>   traits: [
>     {type: "sort_nulls_first", columns: ["request.uri", "client.host"]},
>     {type: "unique", columns: ["abc"]},
>     {type: "unique", columns: ["xy", "zz"]}
>   ],
>   
>   // Mappings between directory names and exposed columns
>   dirs: [
>     {skip: true}, // don't include this directory name in the directory path
>     {name: "year", type: "integer"},
>     {name: "month", type: "integer"},
>     {name: "day", type: "integer"}
>   ],
>   
>   // whether or not a user can add new columns to the table through insert
>   rigid_table: true
> }
> {code}
> We also need to support adding more machine-generated/managed data such as 
> statistics. That should be done in a separate file from the one holding the 
> human-edited description.
> A user should be able to ascribe this metadata directly through the file 
> system as well as through sql commands such as 
> {code}
> ALTER TABLE ADD METADATA ...
> {code}
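The proposed {{dirs}} section maps directory levels under the table root to named columns, with {{skip}} dropping a level. A small sketch of how that mapping could work, under the stated proposal (class and method names here are hypothetical, invented for illustration):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed "dirs" mapping from the .drill file:
// each directory level either becomes a named column or is skipped.
public class DirsMapping {

    record DirRule(boolean skip, String name) {}

    // Map path segments under the table root to column values per the rules.
    static Map<String, String> mapDirs(List<DirRule> rules, String[] segments) {
        Map<String, String> cols = new LinkedHashMap<>();
        for (int i = 0; i < rules.size() && i < segments.length; i++) {
            DirRule r = rules.get(i);
            if (!r.skip()) {
                cols.put(r.name(), segments[i]);
            }
        }
        return cols;
    }

    public static void main(String[] args) {
        // Rules matching the example: skip the first level, then year/month/day.
        List<DirRule> rules = List.of(
            new DirRule(true, null),
            new DirRule(false, "year"),
            new DirRule(false, "month"),
            new DirRule(false, "day"));
        System.out.println(mapDirs(rules, new String[]{"logs", "2015", "12", "02"}));
    }
}
```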







[jira] [Commented] (DRILL-4066) support for format in dot drill file

2015-12-02 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036891#comment-15036891
 ] 

Julien Le Dem commented on DRILL-4066:
--

I'm not currently working on this, as I'm focused on something else for now.
I did a little investigation into the feasibility.
My take is that the DynamicDrillTable can have a compound selection object with 
different FormatPlugin associations for different subsets of the paths to read.
That would allow using multiple FormatPlugins when the dot drill files do not 
all point to the same one, which is useful when the format has changed over time.
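The idea in the comment amounts to a nearest-ancestor lookup: each file uses the format declared by the closest dot drill file above it, so different subtrees can resolve to different FormatPlugins. A minimal sketch, assuming a simple map from directory to declared format (all names below are illustrative, not Drill APIs):

```java
import java.util.Map;

// Hypothetical sketch: resolve a format per path by walking up the directory
// tree to the nearest declaring .drill file, so subtrees whose format changed
// over time can be read with different FormatPlugins.
public class FormatSelection {

    static String formatFor(String path, Map<String, String> dotDrillFormats) {
        String dir = path;
        while (true) {
            int slash = dir.lastIndexOf('/');
            if (slash < 0) {
                return "default"; // no .drill file on the path: fall back
            }
            dir = dir.substring(0, slash);
            String fmt = dotDrillFormats.get(dir);
            if (fmt != null) {
                return fmt; // nearest ancestor with a declared format wins
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> formats = Map.of(
            "/logs/old", "httpd-v1",
            "/logs/new", "httpd-v2");
        System.out.println(formatFor("/logs/old/2014/access.log", formats));
        System.out.println(formatFor("/logs/new/2015/access.log", formats));
    }
}
```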

> support for format in dot drill file
> 
>
> Key: DRILL-4066
> URL: https://issues.apache.org/jira/browse/DRILL-4066
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Other
>Reporter: Julien Le Dem
> Fix For: Future
>
>






[jira] [Updated] (DRILL-4066) support for format in dot drill file

2015-12-02 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated DRILL-4066:
-
Assignee: (was: Julien Le Dem)







[jira] [Updated] (DRILL-3572) Provide a simple interface to append metadata to files and directories (.drill)

2015-12-02 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated DRILL-3572:
-
Assignee: (was: Julien Le Dem)






[jira] [Commented] (DRILL-4109) NPE in RecordIterator

2015-12-02 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036883#comment-15036883
 ] 

Venki Korukanti commented on DRILL-4109:


I just got to know that Vicky is out of the office today. I will check whether 
anybody else has the same configuration to verify.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor

2015-12-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036869#comment-15036869
 ] 

ASF GitHub Bot commented on DRILL-4134:
---

Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/283#discussion_r46493263
  
--- Diff: exec/memory/base/src/main/java/io/netty/buffer/DrillBuf.java ---
@@ -230,20 +249,31 @@ public synchronized boolean release() {
*/
   @Override
   public synchronized boolean release(int decrement) {
+if (isEmpty) {
+  return false;
+}
 
-if(rootBuffer){
-  final long newRefCnt = this.rootRefCnt.addAndGet(-decrement);
-  Preconditions.checkArgument(newRefCnt > -1, "Buffer has negative 
reference count.");
-  if (newRefCnt == 0) {
-b.release(decrement);
-acct.release(this, length);
-return true;
-  }else{
-return false;
-  }
-}else{
-  return b.release(decrement);
+if (decrement < 1) {
+  throw new IllegalStateException(String.format("release(%d) argument 
is not positive. Buffer Info: %s",
+  decrement, toVerboseString()));
 }
+
+if (BaseAllocator.DEBUG) {
+  historicalLog.recordEvent("release(%d)", decrement);
--- End diff --

I think it would be helpful to also record the new `refCnt`
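The pattern under review — a decrement-aware release with a non-negative invariant, plus the suggested logging of the new `refCnt` — can be sketched independently of Drill. All names below are illustrative, not DrillBuf's actual implementation:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; DrillBuf's real release() differs in detail.
class RefCounted {
    private final AtomicInteger refCnt = new AtomicInteger(1);

    void retain(int increment) {
        refCnt.addAndGet(increment);
    }

    /**
     * Decrease the reference count by {@code decrement}.
     * Returns true exactly once, when the count reaches zero
     * and the underlying memory may be freed by the caller.
     */
    boolean release(int decrement) {
        if (decrement < 1) {
            throw new IllegalStateException(
                "release(" + decrement + ") argument is not positive");
        }
        final int newRefCnt = refCnt.addAndGet(-decrement);
        if (newRefCnt < 0) {
            throw new IllegalStateException("Buffer has negative reference count.");
        }
        // Per the review comment: record the resulting count, not just the decrement, e.g.
        // historicalLog.recordEvent("release(%d) -> refCnt=%d", decrement, newRefCnt);
        return newRefCnt == 0;
    }
}
```

Recording the post-decrement count makes the debug history self-explanatory: each entry shows where the count stood, not just how it moved.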


> Incorporate remaining patches from DRILL-1942 Allocator refactor
> 
>
> Key: DRILL-4134
> URL: https://issues.apache.org/jira/browse/DRILL-4134
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.4.0
>
>






[jira] [Resolved] (DRILL-4124) Make all uses of AutoCloseables use addSuppressed exceptions to avoid noise in logs

2015-12-02 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti resolved DRILL-4124.

   Resolution: Fixed
Fix Version/s: 1.4.0

> Make all uses of AutoCloseables use addSuppressed exceptions to avoid noise 
> in logs
> ---
>
> Key: DRILL-4124
> URL: https://issues.apache.org/jira/browse/DRILL-4124
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 1.4.0
>
>
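The improvement described above — collecting close-time failures as suppressed exceptions instead of logging each one separately — follows the standard Java 7+ pattern. A minimal sketch (names are illustrative, not Drill's actual AutoCloseables utility):

```java
// Sketch of the suppressed-exception close pattern.
final class CloseAll {
    private CloseAll() {}

    /**
     * Close every resource. The first failure becomes the primary
     * exception; later failures are attached via addSuppressed(),
     * so no failure is lost and none needs a separate noisy log line.
     */
    static void close(AutoCloseable... resources) throws Exception {
        Exception primary = null;
        for (AutoCloseable resource : resources) {
            if (resource == null) {
                continue;  // tolerate resources that were never opened
            }
            try {
                resource.close();
            } catch (Exception e) {
                if (primary == null) {
                    primary = e;
                } else {
                    primary.addSuppressed(e);
                }
            }
        }
        if (primary != null) {
            throw primary;
        }
    }
}
```

The caller sees one exception whose stack trace carries every close failure as a "Suppressed:" entry, rather than several interleaved log messages.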






[jira] [Commented] (DRILL-4124) Make all uses of AutoCloseables use addSuppressed exceptions to avoid noise in logs

2015-12-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036859#comment-15036859
 ] 

ASF GitHub Bot commented on DRILL-4124:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/281


> Make all uses of AutoCloseables use addSuppressed exceptions to avoid noise 
> in logs
> ---
>
> Key: DRILL-4124
> URL: https://issues.apache.org/jira/browse/DRILL-4124
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>






[jira] [Commented] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file

2015-12-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036816#comment-15036816
 ] 

ASF GitHub Bot commented on DRILL-4145:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/287


> IndexOutOfBoundsException raised during select * query on S3 csv file
> -
>
> Key: DRILL-4145
> URL: https://issues.apache.org/jira/browse/DRILL-4145
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.3.0
> Environment: Drill 1.3.0 on a 3 node distributed-mode cluster on AWS.
> Data files on S3.
> S3 storage plugin configuration:
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "s3a://",
>   "workspaces": {
> "root": {
>   "location": "/",
>   "writable": false,
>   "defaultInputFormat": null
> },
> "views": {
>   "location": "/processed",
>   "writable": true,
>   "defaultInputFormat": null
> },
> "tmp": {
>   "location": "/tmp",
>   "writable": true,
>   "defaultInputFormat": null
> }
>   },
>   "formats": {
> "psv": {
>   "type": "text",
>   "extensions": [
> "tbl"
>   ],
>   "delimiter": "|"
> },
> "csv": {
>   "type": "text",
>   "extensions": [
> "csv"
>   ],
>   "extractHeader": true,
>   "delimiter": ","
> },
> "tsv": {
>   "type": "text",
>   "extensions": [
> "tsv"
>   ],
>   "delimiter": "\t"
> },
> "parquet": {
>   "type": "parquet"
> },
> "json": {
>   "type": "json"
> },
> "avro": {
>   "type": "avro"
> },
> "sequencefile": {
>   "type": "sequencefile",
>   "extensions": [
> "seq"
>   ]
> },
> "csvh": {
>   "type": "text",
>   "extensions": [
> "csvh",
> "csv"
>   ],
>   "extractHeader": true,
>   "delimiter": ","
> }
>   }
> }
>Reporter: Peter McTaggart
>Assignee: Jacques Nadeau
> Attachments: apps1-bad.csv, apps1.csv
>
>
> When trying to query (via sqlline or WebUI) a .csv file I am getting an 
> IndexOutofBoundsException:
> {noformat} 0: jdbc:drill:> select * from 
> s3data.root.`staging/data/apps1-bad.csv` limit 1;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 
> (expected: range(0, 16384))
> Fragment 0:0
> [Error Id: be9856d2-0b80-4b9c-94a4-a1ca38ec5db0 on 
> ip-X.compute.internal:31010] (state=,code=0)
> 0: jdbc:drill:> select * from s3data.root.`staging/data/apps1.csv` limit 1;
> +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
> | FIELD_1  |   FIELD_2| FIELD_3  | FIELD_4  | FIELD_5  |  FIELD_6 
>   | FIELD_7  |  FIELD_8   | FIELD_9  |   FIELD_10   | FIELD_11  |   
> FIELD_12   | FIELD_13  | FIELD_14  | FIELD_15  | FIELD_16  | FIELD_17  | 
> FIELD_18  | FIELD_19  |   FIELD_20   | FIELD_21  | FIELD_22  | 
> FIELD_23  | FIELD_24  | FIELD_25  | FIELD_26  | FIELD_27  | FIELD_28  | 
> FIELD_29  | FIELD_30  | FIELD_31  | FIELD_32  | FIELD_33  | FIELD_34  | 
> FIELD_35  |
> +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
> | 489517   | 27/10/2015 02:05:27  | 261  | 1130232  | 0| 
> 925630488  | 0| 925630488  | -1   | 19531580547  |   | 
> 27/10/2015 02:00:00  |   | 30| 300   | 0 | 0  
>|   |   | 27/10/2015 02:05:27  | 0 | 1 | 0 
> | 35.0  |   |   |   | 505   | 872.0   
>   |   | aBc   |   |   |   |   |
> +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+--

[jira] [Commented] (DRILL-4109) NPE in RecordIterator

2015-12-02 Thread amit hadke (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036757#comment-15036757
 ] 

amit hadke commented on DRILL-4109:
---

[~vicky] I pushed in my changes for DRILL-4125. Could you run query15.sql against 
the latest change?
Repo: https://github.com/amithadke/drill
Branch: DRILL-4109

> NPE in RecordIterator
> -
>
> Key: DRILL-4109
> URL: https://issues.apache.org/jira/browse/DRILL-4109
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Victoria Markman
>Assignee: amit hadke
>Priority: Blocker
> Fix For: 1.4.0
>
> Attachments: 29ac6c1b-9b33-3457-8bc8-9e2dff6ad438.sys.drill, 
> 29b41f37-4803-d7ce-e05f-912d1f65da79.sys.drill, drillbit.log, 
> drillbit.log.debug
>
>
> 4 node cluster
> 36GB of direct memory
> 4GB heap memory





[jira] [Commented] (DRILL-4127) HiveSchema.getSubSchema() should use lazy loading of all the table names

2015-12-02 Thread Jinfeng Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036677#comment-15036677
 ] 

Jinfeng Ni commented on DRILL-4127:
---

For a hive storage plugin with about 8 schema/databases, if I run a simple 
query like this:

select count(*) from hive.table1;

From hive.log, we saw the following numbers of Hive metastore API calls:

Without the patch, with impersonation turned on:
1. # of get_all_databases API calls: 31
2. # of get_all_tables API calls: 30
3. # of get_table API calls: 2

That explains why some Drill users report seeing Drill spend 20-30 seconds on 
planning for such a simple query, making the query not "interactive" at all.

 


> HiveSchema.getSubSchema() should use lazy loading of all the table names
> 
>
> Key: DRILL-4127
> URL: https://issues.apache.org/jira/browse/DRILL-4127
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>
> Currently, HiveSchema.getSubSchema() will pre-load all the table names when 
> it constructs the subschema, even though those table names are not requested 
> at all. This can cause considerable performance overhead, especially 
> when the hive schema contains a large number of objects (thousands of 
> tables/views are not uncommon in some use cases). 
> Instead, we should load the table names on demand: only when all table names 
> are requested do we load them into the hive schema.
> This should help "show schemas", since it only requires the schema names, not 
> the table names in each schema. 
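The on-demand loading proposed above can be sketched as follows. Class and field names here are hypothetical; the real change would live in Drill's HiveSchema:

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: defer the metastore round-trip until table names are needed.
class LazyTableNames {
    private final Supplier<List<String>> loader; // e.g. () -> metastore.getAllTables(dbName)
    private List<String> tableNames;             // stays null until first requested

    LazyTableNames(Supplier<List<String>> loader) {
        this.loader = loader;
    }

    /** Fetches from the metastore at most once, and only if someone asks. */
    synchronized List<String> get() {
        if (tableNames == null) {
            tableNames = loader.get();
        }
        return tableNames;
    }
}
```

With this shape, a query like "show schemas" never calls get(), so the expensive get_all_tables round-trips are skipped entirely.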





[jira] [Commented] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor

2015-12-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036659#comment-15036659
 ] 

ASF GitHub Bot commented on DRILL-4134:
---

Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/283#discussion_r46477393
  
--- Diff: 
exec/memory/base/src/main/java/org/apache/drill/exec/memory/BaseAllocator.java 
---
@@ -0,0 +1,689 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.memory;
+
+import io.netty.buffer.ByteBufAllocator;
+import io.netty.buffer.DrillBuf;
+import io.netty.buffer.UnsafeDirectLittleEndian;
+
+import java.util.Arrays;
+import java.util.IdentityHashMap;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicLong;
+
+import org.apache.drill.common.HistoricalLog;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.memory.AllocatorManager.BufferLedger;
+import org.apache.drill.exec.ops.BufferManager;
+import org.apache.drill.exec.util.AssertionUtil;
+
+import com.google.common.base.Preconditions;
+
+public abstract class BaseAllocator extends Accountant implements 
BufferAllocator {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BaseAllocator.class);
+
+  public static final String DEBUG_ALLOCATOR = 
"drill.memory.debug.allocator";
+
+  private static final AtomicLong ID_GENERATOR = new AtomicLong(0);
+  private static final int CHUNK_SIZE = 
AllocatorManager.INNER_ALLOCATOR.getChunkSize();
+
+  public static final int DEBUG_LOG_LENGTH = 6;
+  public static final boolean DEBUG = AssertionUtil.isAssertionsEnabled()
+  || Boolean.parseBoolean(System.getProperty(DEBUG_ALLOCATOR, 
"false"));
+  private final Object DEBUG_LOCK = DEBUG ? new Object() : null;
+
+  private final BaseAllocator parentAllocator;
+  private final ByteBufAllocator thisAsByteBufAllocator;
+  private final IdentityHashMap childAllocators;
+  private final DrillBuf empty;
+
+  private volatile boolean isClosed = false; // the allocator has been 
closed
+
+  // Package exposed for sharing between AllocatorManger and BaseAllocator 
objects
+  final long id = ID_GENERATOR.incrementAndGet(); // unique ID assigned to 
each allocator
+  final String name;
+  final RootAllocator root;
+
+  // members used purely for debugging
+  private final IdentityHashMap childLedgers;
+  private final IdentityHashMap reservations;
+  private final HistoricalLog historicalLog;
+
+  protected BaseAllocator(
+  final BaseAllocator parentAllocator,
+  final String name,
+  final long initReservation,
+  final long maxAllocation) throws OutOfMemoryException {
+super(parentAllocator, initReservation, maxAllocation);
+
+if (parentAllocator != null) {
+  this.root = parentAllocator.root;
+  empty = parentAllocator.empty;
+} else if (this instanceof RootAllocator) {
+  this.root = (RootAllocator) this;
+  empty = createEmpty();
+} else {
+  throw new IllegalStateException("A parent allocator must either 
carry a root or be the root.");
+}
+
+this.parentAllocator = parentAllocator;
+this.name = name;
+
+// TODO: DRILL-4131
+// this.thisAsByteBufAllocator = new DrillByteBufAllocator(this);
+this.thisAsByteBufAllocator = 
AllocatorManager.INNER_ALLOCATOR.allocator;
+
+if (DEBUG) {
+  childAllocators = new IdentityHashMap<>();
+  reservations = new IdentityHashMap<>();
+  childLedgers = new IdentityHashMap<>();
+  historicalLog = new HistoricalLog(DEBUG_LOG_LENGTH, "allocator[%d]", 
id);
+  hist("created by \"%s\", owned = %d", name.toString(), 
this.getAllocatedMemory());
 

[jira] [Commented] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor

2015-12-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036655#comment-15036655
 ] 

ASF GitHub Bot commented on DRILL-4134:
---

Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/283#discussion_r46477028
  
--- Diff: 
exec/memory/base/src/main/java/org/apache/drill/exec/memory/AllocatorManager.java
 ---
@@ -0,0 +1,386 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.memory;
+
+import static org.apache.drill.exec.memory.BaseAllocator.indent;
+import io.netty.buffer.DrillBuf;
+import io.netty.buffer.PooledByteBufAllocatorL;
+import io.netty.buffer.UnsafeDirectLittleEndian;
+
+import java.util.IdentityHashMap;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.concurrent.locks.Lock;
+import java.util.concurrent.locks.ReadWriteLock;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+
+import org.apache.drill.common.HistoricalLog;
+import org.apache.drill.exec.memory.BaseAllocator.Verbosity;
+import org.apache.drill.exec.metrics.DrillMetrics;
+import org.apache.drill.exec.ops.BufferManager;
+
+import com.carrotsearch.hppc.LongObjectOpenHashMap;
+import com.google.common.base.Preconditions;
+
+/**
+ * Manages the relationship between one or more allocators and a 
particular UDLE. Ensures that one allocator owns the
+ * memory that multiple allocators may be referencing. Manages a 
BufferLedger between each of its associated allocators.
+ * This class is also responsible for managing when memory is allocated 
and returned to the Netty-based
+ * PooledByteBufAllocatorL.
+ *
+ * The only reason that this isn't package private is that we're forced to put 
DrillBuf in Netty's package, which needs access
+ * to these objects or methods.
+ *
+ * Threading: AllocatorManager manages thread-safety internally. 
Operations within the context of a single BufferLedger
+ * are lockless in nature and can be leveraged by multiple threads. 
Operations that cross the context of two ledgers
+ * will acquire a lock on the AllocatorManager instance. Important note, 
there is one AllocatorManager per
+ * UnsafeDirectLittleEndian buffer allocation. As such, there will be 
thousands of these in a typical query. The
+ * contention of acquiring a lock on AllocatorManager should be very low.
+ *
+ */
+public class AllocatorManager {
+  // private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(AllocatorManager.class);
+
+  private static final AtomicLong LEDGER_ID_GENERATOR = new AtomicLong(0);
+  static final PooledByteBufAllocatorL INNER_ALLOCATOR = new 
PooledByteBufAllocatorL(DrillMetrics.getInstance());
+
+  private final RootAllocator root;
+  private volatile BufferLedger owningLedger;
+  private final int size;
+  private final UnsafeDirectLittleEndian underlying;
+  private final ReadWriteLock lock = new ReentrantReadWriteLock();
+  private final LongObjectOpenHashMap map = new 
LongObjectOpenHashMap<>();
+  private final AutoCloseableLock readLock = new 
AutoCloseableLock(lock.readLock());
+  private final AutoCloseableLock writeLock = new 
AutoCloseableLock(lock.writeLock());
+  private final IdentityHashMap buffers =
+  BaseAllocator.DEBUG ? new IdentityHashMap() : null;
+
+  AllocatorManager(BaseAllocator accountingAllocator, int size) {
+Preconditions.checkNotNull(accountingAllocator);
+this.root = accountingAllocator.root;
+this.underlying = INNER_ALLOCATOR.allocate(size);
+this.owningLedger = associate(accountingAllocator);
+this.size = underlying.capacity();
+  }
+
+  /**
+   * Associate the existing underlying buffer with a new allocator.
+   *
+   * @param allocator
+   *  The target alloc
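The `AutoCloseableLock` fields wrapping the read/write lock in the diff above suggest a try-with-resources locking idiom. A minimal sketch of such a wrapper (the actual Drill class may differ):

```java
import java.util.concurrent.locks.Lock;

// Minimal try-with-resources lock wrapper; illustrative, not Drill's exact class.
class AutoCloseableLock implements AutoCloseable {
    private final Lock lock;

    AutoCloseableLock(Lock lock) {
        this.lock = lock;
    }

    /** Acquire the lock and return this, so it can head a try-with-resources block. */
    AutoCloseableLock open() {
        lock.lock();
        return this;
    }

    @Override
    public void close() {
        lock.unlock();  // runs automatically when the try block exits, even on exception
    }
}
```

A caller then writes `try (AutoCloseableLock l = readLock.open()) { ... }`, guaranteeing the unlock on every exit path without a finally block.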

[jira] [Commented] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file

2015-12-02 Thread John Omernik (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036572#comment-15036572
 ] 

John Omernik commented on DRILL-4145:
-

Looks like Steven pushed a change. Steven, does that one-line addition fix this? 
That's awesome if that's all it took! I did confirm that I have the same issue 
on MapRFS as well. 

Peter, the other issue I saw you mention was that adding extractHeader to 
the csv format didn't actually have the desired effect. That may be a bug too; do 
you want to open a JIRA on that as well? (It should work, and when I did my 
testing, it didn't either.) 

Thanks for your work on this, Peter. It's great to find bugs like this. Helps 
everyone!

John


[jira] [Commented] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor

2015-12-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036515#comment-15036515
 ] 

ASF GitHub Bot commented on DRILL-4134:
---

Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/283#discussion_r46468356
  
--- Diff: 
exec/memory/base/src/main/java/org/apache/drill/exec/memory/BaseAllocator.java 
---
 

[jira] [Updated] (DRILL-1760) Count on a map fails with SchemaChangeException

2015-12-02 Thread Hanifi Gunes (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanifi Gunes updated DRILL-1760:

Fix Version/s: (was: 1.4.0)

> Count on a map fails with SchemaChangeException
> ---
>
> Key: DRILL-1760
> URL: https://issues.apache.org/jira/browse/DRILL-1760
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.0.0
>Reporter: Hanifi Gunes
>Assignee: Hanifi Gunes
>
> Take yelp business dataset and run
> {code:sql}
> select count(attributes) from dfs.`/path/to/yelp-business.json`
> {code}
> you should read
> {code:java}
> org.apache.drill.exec.exception.SchemaChangeException: Failure while 
> materializing expression. 
> Error in expression at index -1.  Error: Missing function implementation: 
> [count(MAP-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--.
>   at 
> org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.createAggregatorInternal(StreamingAggBatch.java:221)
>  [classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.createAggregator(StreamingAggBatch.java:173)
>  [classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.buildSchema(StreamingAggBatch.java:89)
>  [classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.buildSchema(IteratorValidatorBatchIterator.java:80)
>  [classes/:na]
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.buildSchema(AbstractSingleRecordBatch.java:109)
>  [classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.buildSchema(IteratorValidatorBatchIterator.java:80)
>  [classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.buildSchema(RemovingRecordBatch.java:64)
>  [classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.buildSchema(IteratorValidatorBatchIterator.java:80)
>  [classes/:na]
>   at 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.buildSchema(ScreenCreator.java:95)
>  [classes/:na]
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:111)
>  [classes/:na]
>   at 
> org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:249)
>  [classes/:na]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_65]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_65]
>   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
> {code}
> I would expect to be able to run a count query on the `attributes` field given
> that I can run a select on the same field.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor

2015-12-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036481#comment-15036481
 ] 

ASF GitHub Bot commented on DRILL-4134:
---

Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/283#discussion_r46466143
  
--- Diff: exec/memory/base/src/main/java/org/apache/drill/exec/memory/BaseAllocator.java ---
@@ -0,0 +1,689 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.memory;
+
+import io.netty.buffer.ByteBufAllocator;
+import io.netty.buffer.DrillBuf;
+import io.netty.buffer.UnsafeDirectLittleEndian;
+
+import java.util.Arrays;
+import java.util.IdentityHashMap;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicLong;
+
+import org.apache.drill.common.HistoricalLog;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.memory.AllocatorManager.BufferLedger;
+import org.apache.drill.exec.ops.BufferManager;
+import org.apache.drill.exec.util.AssertionUtil;
+
+import com.google.common.base.Preconditions;
+
+public abstract class BaseAllocator extends Accountant implements BufferAllocator {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BaseAllocator.class);
+
+  public static final String DEBUG_ALLOCATOR = "drill.memory.debug.allocator";
+
+  private static final AtomicLong ID_GENERATOR = new AtomicLong(0);
+  private static final int CHUNK_SIZE = AllocatorManager.INNER_ALLOCATOR.getChunkSize();
+
+  public static final int DEBUG_LOG_LENGTH = 6;
+  public static final boolean DEBUG = AssertionUtil.isAssertionsEnabled()
+      || Boolean.parseBoolean(System.getProperty(DEBUG_ALLOCATOR, "false"));
+  private final Object DEBUG_LOCK = DEBUG ? new Object() : null;
+
+  private final BaseAllocator parentAllocator;
+  private final ByteBufAllocator thisAsByteBufAllocator;
+  private final IdentityHashMap childAllocators;
+  private final DrillBuf empty;
+
+  private volatile boolean isClosed = false; // the allocator has been closed
+
+  // Package exposed for sharing between AllocatorManager and BaseAllocator objects
+  final long id = ID_GENERATOR.incrementAndGet(); // unique ID assigned to each allocator
+  final String name;
+  final RootAllocator root;
+
+  // members used purely for debugging
+  private final IdentityHashMap childLedgers;
+  private final IdentityHashMap reservations;
+  private final HistoricalLog historicalLog;
+
+  protected BaseAllocator(
+      final BaseAllocator parentAllocator,
+      final String name,
+      final long initReservation,
+      final long maxAllocation) throws OutOfMemoryException {
+    super(parentAllocator, initReservation, maxAllocation);
+
+    if (parentAllocator != null) {
+      this.root = parentAllocator.root;
+      empty = parentAllocator.empty;
+    } else if (this instanceof RootAllocator) {
+      this.root = (RootAllocator) this;
+      empty = createEmpty();
+    } else {
+      throw new IllegalStateException("A parent allocator must either carry a root or be the root.");
+    }
+
+    this.parentAllocator = parentAllocator;
+    this.name = name;
+
+    // TODO: DRILL-4131
+    // this.thisAsByteBufAllocator = new DrillByteBufAllocator(this);
+    this.thisAsByteBufAllocator = AllocatorManager.INNER_ALLOCATOR.allocator;
+
+    if (DEBUG) {
+      childAllocators = new IdentityHashMap<>();
+      reservations = new IdentityHashMap<>();
+      childLedgers = new IdentityHashMap<>();
+      historicalLog = new HistoricalLog(DEBUG_LOG_LENGTH, "allocator[%d]", id);
+      hist("created by \"%s\", owned = %d", name.toString(), this.getAllocatedMemory());
 

[jira] [Commented] (DRILL-4111) turn tests off in travis as they don't work there

2015-12-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036468#comment-15036468
 ] 

ASF GitHub Bot commented on DRILL-4111:
---

Github user julienledem commented on the pull request:

https://github.com/apache/drill/pull/267#issuecomment-161412734
  
Thank you!


> turn tests off in travis as they don't work there
> -
>
> Key: DRILL-4111
> URL: https://issues.apache.org/jira/browse/DRILL-4111
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 1.4.0
>
>
> Since the travis build always fails, we should just turn it off for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4153) Query with "select columnName, *" fails with IOB

2015-12-02 Thread Abhishek Girish (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-4153:
---
Attachment: drillbit.log.txt

> Query with "select columnName, *" fails with IOB
> 
>
> Key: DRILL-4153
> URL: https://issues.apache.org/jira/browse/DRILL-4153
> Project: Apache Drill
>  Issue Type: Bug
>  Components: SQL Parser
>Reporter: Abhishek Girish
> Attachments: drillbit.log.txt
>
>
> Query with select columnName, * fails with IOB:
> {code}
> select c_customer_sk, * as c from dfs.tpcds_sf1_parquet.customer limit 1;
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> IndexOutOfBoundsException: index (-1) must not be negative [Error Id: 
> 05b29f77-7668-48e3-a423-a13f0fe9e79a on atsqa6c64.qa.lab:31010]
> {code}
> This issue isn't seen when * precedes columnName.
> Log attached. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4153) Query with "select columnName, *" fails with IOB

2015-12-02 Thread Abhishek Girish (JIRA)
Abhishek Girish created DRILL-4153:
--

 Summary: Query with "select columnName, *" fails with IOB
 Key: DRILL-4153
 URL: https://issues.apache.org/jira/browse/DRILL-4153
 Project: Apache Drill
  Issue Type: Bug
  Components: SQL Parser
Reporter: Abhishek Girish


Query with select columnName, * fails with IOB:

{code}
select c_customer_sk, * as c from dfs.tpcds_sf1_parquet.customer limit 1;

Query Failed: An Error Occurred
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
IndexOutOfBoundsException: index (-1) must not be negative [Error Id: 
05b29f77-7668-48e3-a423-a13f0fe9e79a on atsqa6c64.qa.lab:31010]
{code}

This issue isn't seen when * precedes columnName.

Log attached. 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-2419) UDF that returns string representation of expression type

2015-12-02 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid resolved DRILL-2419.

   Resolution: Fixed
Fix Version/s: (was: Future)
   1.3.0

Fixed in eb6325dc9b59291582cd7d3c3e5d02efd5d15906. 



> UDF that returns string representation of expression type
> -
>
> Key: DRILL-2419
> URL: https://issues.apache.org/jira/browse/DRILL-2419
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Victoria Markman
>Assignee: Steven Phillips
> Fix For: 1.3.0
>
>
> Suggested name: typeof (credit goes to Aman)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-2419) UDF that returns string representation of expression type

2015-12-02 Thread Mehant Baid (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-2419:
---
Assignee: Steven Phillips  (was: Mehant Baid)

> UDF that returns string representation of expression type
> -
>
> Key: DRILL-2419
> URL: https://issues.apache.org/jira/browse/DRILL-2419
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Victoria Markman
>Assignee: Steven Phillips
> Fix For: Future
>
>
> Suggested name: typeof (credit goes to Aman)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor

2015-12-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036320#comment-15036320
 ] 

ASF GitHub Bot commented on DRILL-4134:
---

Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/283#discussion_r46453393
  
--- Diff: exec/memory/base/src/main/java/org/apache/drill/exec/memory/BaseAllocator.java ---
@@ -0,0 +1,689 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.memory;
+
+import io.netty.buffer.ByteBufAllocator;
+import io.netty.buffer.DrillBuf;
+import io.netty.buffer.UnsafeDirectLittleEndian;
+
+import java.util.Arrays;
+import java.util.IdentityHashMap;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicLong;
+
+import org.apache.drill.common.HistoricalLog;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.memory.AllocatorManager.BufferLedger;
+import org.apache.drill.exec.ops.BufferManager;
+import org.apache.drill.exec.util.AssertionUtil;
+
+import com.google.common.base.Preconditions;
+
+public abstract class BaseAllocator extends Accountant implements BufferAllocator {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BaseAllocator.class);
+
+  public static final String DEBUG_ALLOCATOR = "drill.memory.debug.allocator";
+
+  private static final AtomicLong ID_GENERATOR = new AtomicLong(0);
+  private static final int CHUNK_SIZE = AllocatorManager.INNER_ALLOCATOR.getChunkSize();
+
+  public static final int DEBUG_LOG_LENGTH = 6;
+  public static final boolean DEBUG = AssertionUtil.isAssertionsEnabled()
+      || Boolean.parseBoolean(System.getProperty(DEBUG_ALLOCATOR, "false"));
+  private final Object DEBUG_LOCK = DEBUG ? new Object() : null;
+
+  private final BaseAllocator parentAllocator;
+  private final ByteBufAllocator thisAsByteBufAllocator;
+  private final IdentityHashMap childAllocators;
+  private final DrillBuf empty;
+
+  private volatile boolean isClosed = false; // the allocator has been closed
+
+  // Package exposed for sharing between AllocatorManager and BaseAllocator objects
+  final long id = ID_GENERATOR.incrementAndGet(); // unique ID assigned to each allocator
+  final String name;
+  final RootAllocator root;
+
+  // members used purely for debugging
+  private final IdentityHashMap childLedgers;
+  private final IdentityHashMap reservations;
+  private final HistoricalLog historicalLog;
+
+  protected BaseAllocator(
+      final BaseAllocator parentAllocator,
+      final String name,
+      final long initReservation,
+      final long maxAllocation) throws OutOfMemoryException {
+    super(parentAllocator, initReservation, maxAllocation);
+
+    if (parentAllocator != null) {
+      this.root = parentAllocator.root;
+      empty = parentAllocator.empty;
+    } else if (this instanceof RootAllocator) {
+      this.root = (RootAllocator) this;
+      empty = createEmpty();
+    } else {
+      throw new IllegalStateException("A parent allocator must either carry a root or be the root.");
+    }
+
+    this.parentAllocator = parentAllocator;
+    this.name = name;
+
+    // TODO: DRILL-4131
+    // this.thisAsByteBufAllocator = new DrillByteBufAllocator(this);
+    this.thisAsByteBufAllocator = AllocatorManager.INNER_ALLOCATOR.allocator;
+
+    if (DEBUG) {
+      childAllocators = new IdentityHashMap<>();
+      reservations = new IdentityHashMap<>();
+      childLedgers = new IdentityHashMap<>();
+      historicalLog = new HistoricalLog(DEBUG_LOG_LENGTH, "allocator[%d]", id);
+      hist("created by \"%s\", owned = %d", name.toString(), this.getAllocatedMemory());
 

[jira] [Commented] (DRILL-4134) Incorporate remaining patches from DRILL-1942 Allocator refactor

2015-12-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036314#comment-15036314
 ] 

ASF GitHub Bot commented on DRILL-4134:
---

Github user adeneche commented on a diff in the pull request:

https://github.com/apache/drill/pull/283#discussion_r46452877
  
--- Diff: exec/memory/base/src/main/java/org/apache/drill/exec/memory/BaseAllocator.java ---
@@ -0,0 +1,689 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.memory;
+
+import io.netty.buffer.ByteBufAllocator;
+import io.netty.buffer.DrillBuf;
+import io.netty.buffer.UnsafeDirectLittleEndian;
+
+import java.util.Arrays;
+import java.util.IdentityHashMap;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicLong;
+
+import org.apache.drill.common.HistoricalLog;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.memory.AllocatorManager.BufferLedger;
+import org.apache.drill.exec.ops.BufferManager;
+import org.apache.drill.exec.util.AssertionUtil;
+
+import com.google.common.base.Preconditions;
+
+public abstract class BaseAllocator extends Accountant implements BufferAllocator {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(BaseAllocator.class);
+
+  public static final String DEBUG_ALLOCATOR = "drill.memory.debug.allocator";
+
+  private static final AtomicLong ID_GENERATOR = new AtomicLong(0);
+  private static final int CHUNK_SIZE = AllocatorManager.INNER_ALLOCATOR.getChunkSize();
+
+  public static final int DEBUG_LOG_LENGTH = 6;
+  public static final boolean DEBUG = AssertionUtil.isAssertionsEnabled()
+      || Boolean.parseBoolean(System.getProperty(DEBUG_ALLOCATOR, "false"));
+  private final Object DEBUG_LOCK = DEBUG ? new Object() : null;
+
+  private final BaseAllocator parentAllocator;
+  private final ByteBufAllocator thisAsByteBufAllocator;
+  private final IdentityHashMap childAllocators;
+  private final DrillBuf empty;
+
+  private volatile boolean isClosed = false; // the allocator has been closed
+
+  // Package exposed for sharing between AllocatorManager and BaseAllocator objects
+  final long id = ID_GENERATOR.incrementAndGet(); // unique ID assigned to each allocator
+  final String name;
+  final RootAllocator root;
+
+  // members used purely for debugging
+  private final IdentityHashMap childLedgers;
+  private final IdentityHashMap reservations;
+  private final HistoricalLog historicalLog;
+
+  protected BaseAllocator(
+      final BaseAllocator parentAllocator,
+      final String name,
+      final long initReservation,
+      final long maxAllocation) throws OutOfMemoryException {
+    super(parentAllocator, initReservation, maxAllocation);
+
+    if (parentAllocator != null) {
+      this.root = parentAllocator.root;
+      empty = parentAllocator.empty;
+    } else if (this instanceof RootAllocator) {
+      this.root = (RootAllocator) this;
+      empty = createEmpty();
+    } else {
+      throw new IllegalStateException("A parent allocator must either carry a root or be the root.");
+    }
+
+    this.parentAllocator = parentAllocator;
+    this.name = name;
+
+    // TODO: DRILL-4131
+    // this.thisAsByteBufAllocator = new DrillByteBufAllocator(this);
+    this.thisAsByteBufAllocator = AllocatorManager.INNER_ALLOCATOR.allocator;
+
+    if (DEBUG) {
+      childAllocators = new IdentityHashMap<>();
+      reservations = new IdentityHashMap<>();
+      childLedgers = new IdentityHashMap<>();
+      historicalLog = new HistoricalLog(DEBUG_LOG_LENGTH, "allocator[%d]", id);
+      hist("created by \"%s\", owned = %d", name.toString(), this.getAllocatedMemory());
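The constructor quoted in the diff above wires every allocator to a single root: a child inherits its parent's root, while the root allocator points at itself. A minimal sketch of just that wiring (the class and field names mirror the diff but this is a simplified stand-in, not Drill's actual code):

```java
// Minimal sketch of the root/parent wiring; not Drill's implementation.
class AllocatorSketch {
    final AllocatorSketch parent;  // null for the root allocator
    final AllocatorSketch root;    // every allocator can reach the root directly
    final String name;

    AllocatorSketch(AllocatorSketch parent, String name) {
        this.parent = parent;
        // A child inherits its parent's root; the root points at itself.
        this.root = (parent != null) ? parent.root : this;
        this.name = name;
    }
}
```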
 

[jira] [Created] (DRILL-4152) Add additional logging and metrics to the Parquet reader

2015-12-02 Thread Parth Chandra (JIRA)
Parth Chandra created DRILL-4152:


 Summary: Add additional logging and metrics to the Parquet reader
 Key: DRILL-4152
 URL: https://issues.apache.org/jira/browse/DRILL-4152
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Reporter: Parth Chandra
Assignee: Parth Chandra


In some cases, we see the Parquet reader as the bottleneck in reading from the
file system. RWSpeedTest is able to read 10x faster than the Parquet reader, so
reading from disk is not the issue. This issue is to add more instrumentation
to the Parquet reader so that speed bottlenecks can be better diagnosed.
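As a rough illustration of the kind of instrumentation being proposed, a read path can be wrapped with a timer that accumulates elapsed time and byte counts so an effective throughput can be reported. The class and method names below are hypothetical, not part of the Drill Parquet reader:

```java
import java.util.function.Supplier;

// Hypothetical read-timing wrapper; not part of the Drill Parquet reader.
class ReadTimer {
    long totalNanos;   // cumulative time spent inside timed reads
    long bytesRead;    // cumulative bytes attributed to those reads

    // Runs a read, recording its elapsed time and byte count.
    <T> T time(Supplier<T> read, long bytes) {
        long start = System.nanoTime();
        T result = read.get();
        totalNanos += System.nanoTime() - start;
        bytesRead += bytes;
        return result;
    }

    // Effective throughput in MiB/s over all timed reads.
    double mibPerSecond() {
        return totalNanos == 0 ? 0 : (bytesRead / 1_048_576.0) / (totalNanos / 1e9);
    }
}
```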



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4151) CSV Support with multiline header

2015-12-02 Thread Jaroslaw Sosnicki (JIRA)
Jaroslaw Sosnicki created DRILL-4151:


 Summary: CSV Support with multiline header
 Key: DRILL-4151
 URL: https://issues.apache.org/jira/browse/DRILL-4151
 Project: Apache Drill
  Issue Type: Wish
  Components: Functions - Drill
Reporter: Jaroslaw Sosnicki


Modern data sources produce CSV files with two header lines:
the first line contains field descriptions while the second line contains field types.
Would it be feasible to implement such a format in Drill as an additional storage
format type?

This example demonstrates an output CSV header from one of the data sources.


LDEV_COUNT,MONITORED_LDEV_COUNT,READ_IO_COUNT,READ_IO_RATE,READ_HIT_IO_COUNT,READ_HIT_RATE,WRITE_IO_COUNT,WRITE_IO_RATE,WRITE_HIT_IO_COUNT,WRITE_HIT_RATE,READ_MBYTES,READ_XFER_RATE,WRITE_MBYTES,WRITE_XFER_RATE,INTERVAL,INPUT_RECORD_TYPE,RECORD_TIME
ulong,ulong,double,double,double,double,double,double,double,double,double,double,double,double,ulong,string(8),time_t
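A sketch of how such a two-line header could be consumed, pairing each column name from the first line with its declared type from the second (the helper below is illustrative only, not an existing Drill API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative two-line CSV header parser; not an existing Drill API.
class TwoLineHeader {
    // Pairs each column name from line 1 with its declared type from line 2.
    static List<String[]> parseHeader(String nameLine, String typeLine) {
        String[] names = nameLine.split(",");
        String[] types = typeLine.split(",");
        List<String[]> columns = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            // A missing type entry falls back to plain string.
            String type = i < types.length ? types[i] : "string";
            columns.add(new String[] { names[i], type });
        }
        return columns;
    }
}
```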







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4081) Handle schema changes in ExternalSort

2015-12-02 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti resolved DRILL-4081.

   Resolution: Fixed
Fix Version/s: 1.4.0

> Handle schema changes in ExternalSort
> -
>
> Key: DRILL-4081
> URL: https://issues.apache.org/jira/browse/DRILL-4081
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.4.0
>
>
> This improvement will make use of the Union vector to handle schema changes. 
> When a new schema appears, the schema will be "merged" with the previous 
> schema. The result will be a new schema that uses Union type to store the 
> columns where this is a type conflict. All of the batches (including the 
> batches that have already arrived) will be coerced into this new schema.
> A new comparison function will be included to handle the comparison of Union 
> type. Comparison of union type will work as follows:
> 1. All numeric types can be mutually compared, and will be compared using 
> Drill implicit cast rules.
> 2. All other types will not be compared against other types, but only among 
> values of the same type.
> 3. There will be an overall precedence of types with regards to ordering. 
> This precedence is not yet defined, but will be as part of the work on this 
> issue.
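The three rules above can be sketched as a comparator that compares numerics via a widening cast, compares identical types directly, and otherwise falls back to a type-precedence ordering. The class, method, and the precedence list itself are illustrative only; as the description notes, the actual precedence was still undefined at the time:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative comparator for the three rules; not Drill's implementation.
class UnionComparator {
    // Rule 3: an overall precedence of types (hypothetical ordering).
    static final List<String> PRECEDENCE =
        Arrays.asList("NULL", "BOOLEAN", "NUMERIC", "VARCHAR", "MAP", "LIST");

    @SuppressWarnings("unchecked")
    static int compare(String typeA, Object a, String typeB, Object b) {
        // Rule 1: numerics are mutually comparable via implicit (widening) cast.
        if (typeA.equals("NUMERIC") && typeB.equals("NUMERIC")) {
            return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
        }
        // Rule 2: other types compare only among values of the same type.
        if (typeA.equals(typeB)) {
            return ((Comparable<Object>) a).compareTo(b);
        }
        // Rule 3: otherwise fall back to type precedence.
        return Integer.compare(PRECEDENCE.indexOf(typeA), PRECEDENCE.indexOf(typeB));
    }
}
```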



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4094) Respect -DskipTests=true for JDBC plugin tests

2015-12-02 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti resolved DRILL-4094.

   Resolution: Fixed
Fix Version/s: 1.4.0

> Respect -DskipTests=true for JDBC plugin tests
> --
>
> Key: DRILL-4094
> URL: https://issues.apache.org/jira/browse/DRILL-4094
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Reporter: Andrew
>Assignee: Andrew
>Priority: Trivial
> Fix For: 1.4.0
>
>
> The maven config for the JDBC storage plugin does not respect the -DskipTests 
> option.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4047) Select with options

2015-12-02 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti resolved DRILL-4047.

   Resolution: Fixed
Fix Version/s: 1.4.0

> Select with options
> ---
>
> Key: DRILL-4047
> URL: https://issues.apache.org/jira/browse/DRILL-4047
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 1.4.0
>
>
> Add a mechanism to pass parameters down to the StoragePlugin when writing a 
> Select statement.
> Some discussion here:
> http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAO%2Bvc4AcGK3%2B3QYvQV1-xPPdpG3Tc%2BfG%3D0xDGEUPrhd6ktHv5Q%40mail.gmail.com%3E
> http://mail-archives.apache.org/mod_mbox/drill-dev/201511.mbox/%3ccao+vc4clzylvjevisfjqtcyxb-zsmfy4bqrm-jhbidwzgqf...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4063) Missing files/classes needed for S3a access

2015-12-02 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated DRILL-4063:
---
Fix Version/s: 1.3.0

> Missing files/classes needed for S3a access
> ---
>
> Key: DRILL-4063
> URL: https://issues.apache.org/jira/browse/DRILL-4063
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.3.0
> Environment: All
>Reporter: Nathan Griffith
>Assignee: Abhijit Pol
>  Labels: aws, aws-s3, s3, storage
> Fix For: 1.3.0
>
>
> Specifying
> {code}
> "connection": "s3a://"
> {code}
> results in the following error:
> {code}
> Error: SYSTEM ERROR: ClassNotFoundException: Class 
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> {code}
> I can fix this by dropping in these files from the hadoop binary tarball:
> hadoop-aws-2.6.2.jar
> aws-java-sdk-1.7.4.jar
> And then adding this to my core-site.xml:
> {code:xml}
> <property>
>   <name>fs.s3a.access.key</name>
>   <value>ACCESSKEY</value>
> </property>
> <property>
>   <name>fs.s3a.secret.key</name>
>   <value>SECRETKEY</value>
> </property>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (DRILL-3997) JDBC v1.2 Error: java.lang.IndexOutOfBoundsException: Index: 0

2015-12-02 Thread Jaroslaw Sosnicki (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaroslaw Sosnicki closed DRILL-3997.

Resolution: Not A Problem

No longer a problem

> JDBC v1.2 Error: java.lang.IndexOutOfBoundsException: Index: 0
> --
>
> Key: DRILL-3997
> URL: https://issues.apache.org/jira/browse/DRILL-3997
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.2.0
> Environment: Windows Linux
>Reporter: Jaroslaw Sosnicki
>
> Connecting to Apache Drill V1.2 using JDBC driver supplied with v1.2 
> configured on Squirrel 3.7 produces this error:
> Error: Drill_Dev v1.2: java.sql.SQLException: Unexpected RuntimeException: 
> java.lang.IndexOutOfBoundsException: Index: 0
> A connection alias configured using v1.1 version of JDBC driver does not 
> produce this error and connection succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4053) Reduce metadata cache file size

2015-12-02 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti resolved DRILL-4053.

Resolution: Fixed

> Reduce metadata cache file size
> ---
>
> Key: DRILL-4053
> URL: https://issues.apache.org/jira/browse/DRILL-4053
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: 1.3.0
>Reporter: Parth Chandra
>Assignee: Parth Chandra
> Fix For: 1.4.0
>
>
> The parquet metadata cache file has a fair amount of redundant metadata that
> causes the size of the cache file to bloat. Two things that we can reduce are:
> 1) Schema is repeated for every row group. We can keep a merged schema
> (similar to what was discussed for insert-into functionality).
> 2) The max and min value in the stats are used for partition pruning when the
> values are the same. We can keep the maxValue only, and that too only if it is
> the same as the minValue.
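The max/min reduction amounts to a simple compaction rule. A sketch with hypothetical names (not the actual Drill metadata code):

```java
import java.util.Objects;

// Hypothetical compaction rule: keep maxValue only when it equals minValue,
// which is the case usable for partition pruning.
class StatsCompaction {
    // Returns the single value to store, or null when min != max
    // (in which case the stats carry no pruning value and can be dropped).
    static Object compact(Object minValue, Object maxValue) {
        return Objects.equals(minValue, maxValue) ? maxValue : null;
    }
}
```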



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4108) Query on csv file w/ header fails with an exception when non existing column is requested

2015-12-02 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti resolved DRILL-4108.

Resolution: Fixed
  Assignee: Abhijit Pol

> Query on csv file w/ header fails with an exception when non existing column 
> is requested
> -
>
> Key: DRILL-4108
> URL: https://issues.apache.org/jira/browse/DRILL-4108
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text & CSV
>Affects Versions: 1.3.0
>Reporter: Abhi Pol
>Assignee: Abhijit Pol
> Fix For: 1.4.0
>
>
> Drill query on a csv file with header requesting column(s) that do not exists 
> in header fails with an exception.
> *Current behavior:* once extractHeader is enabled, queried columns must be
> columns from the header.
> *Expected behavior:* non-existing columns should appear with 'null' values,
> like default Drill behavior.
> {noformat}
> 0: jdbc:drill:zk=local> select Category from dfs.`/tmp/cars.csvh` limit 10;
> java.lang.ArrayIndexOutOfBoundsException: -1
>   at 
> org.apache.drill.exec.store.easy.text.compliant.FieldVarCharOutput.<init>(FieldVarCharOutput.java:104)
>   at 
> org.apache.drill.exec.store.easy.text.compliant.CompliantTextRecordReader.setup(CompliantTextRecordReader.java:118)
>   at 
> org.apache.drill.exec.physical.impl.ScanBatch.<init>(ScanBatch.java:108)
>   at 
> org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin.getReaderBatch(EasyFormatPlugin.java:198)
>   at 
> org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:35)
>   at 
> org.apache.drill.exec.store.dfs.easy.EasyReaderBatchCreator.getBatch(EasyReaderBatchCreator.java:28)
>   at 
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:151)
>   at 
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
>   at 
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
>   at 
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
>   at 
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
>   at 
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
>   at 
> org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
>   at 
> org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
>   at 
> org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:105)
>   at 
> org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:79)
>   at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:230)
>   at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Error: SYSTEM ERROR: ArrayIndexOutOfBoundsException: -1
> Fragment 0:0
> [Error Id: f272960e-fa2f-408e-918c-722190398cd3 on blackhole:31010] 
> (state=,code=0)
> {noformat}





[jira] [Commented] (DRILL-4147) Union All operator runs in a single fragment

2015-12-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036014#comment-15036014
 ] 

ASF GitHub Bot commented on DRILL-4147:
---

GitHub user hsuanyi opened a pull request:

https://github.com/apache/drill/pull/288

DRILL-4147: Change UnionPrel's DrillDistributionTrait to be ANY to allow Union-All to be done in parallel

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hsuanyi/incubator-drill DRILL-4147

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/288.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #288


commit 9f31a4c04c2cb219237519070b35d5fae3010908
Author: Hsuan-Yi Chu 
Date:   2015-12-02T00:46:51Z

DRILL-4147: Change UnionPrel's DrillDistributionTrait to be ANY to allow 
Union-All to be done in parallel




> Union All operator runs in a single fragment
> 
>
> Key: DRILL-4147
> URL: https://issues.apache.org/jira/browse/DRILL-4147
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: amit hadke
>Assignee: Sean Hsuan-Yi Chu
>
> A user noticed that running a select over a single directory is much faster 
> than a union all over two directories.
> (https://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/#comment-2349732267)
>  
> It seems the UNION ALL operator doesn't parallelize its sub-scans (it's using 
> SINGLETON for the distribution type), so everything runs in a single fragment.
> We may have to use SubsetTransformer in UnionAllPrule.
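The cost of a single-fragment UNION ALL can be illustrated with a generic sketch, using plain Python threads as stand-ins for Drill fragments. All names here are illustrative; this models only the planning issue, not Drill's actual planner or scan code.

```python
# Two independent "scans" executed serially (SINGLETON distribution: one
# fragment does all the work) vs. concurrently (ANY distribution: each
# sub-scan may land in its own fragment). Illustrative model only.
from concurrent.futures import ThreadPoolExecutor


def scan(directory):
    # Stand-in for a sub-scan: pretend each directory yields a few rows.
    return [f"{directory}/row{i}" for i in range(3)]


def union_all_single_fragment(dirs):
    # One fragment scans each input in turn, one after the other.
    rows = []
    for d in dirs:
        rows.extend(scan(d))
    return rows


def union_all_parallel(dirs):
    # Each sub-scan runs on its own worker; results are concatenated.
    with ThreadPoolExecutor(max_workers=len(dirs)) as pool:
        parts = pool.map(scan, dirs)
    return [row for part in parts for row in part]


rows = union_all_parallel(["dirA", "dirB"])
# Both versions produce the same rows; only wall-clock time differs
# when the scans block on I/O, which matches the user's observation.
```

The output is identical either way; the point is that the serial version's latency is the sum of the two scans, while the parallel version's is roughly the maximum of the two.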





[jira] [Updated] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file

2015-12-02 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips updated DRILL-4145:
---
Assignee: Jacques Nadeau  (was: Steven Phillips)

> IndexOutOfBoundsException raised during select * query on S3 csv file
> -
>
> Key: DRILL-4145
> URL: https://issues.apache.org/jira/browse/DRILL-4145
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.3.0
> Environment: Drill 1.3.0 on a 3 node distributed-mode cluster on AWS.
> Data files on S3.
> S3 storage plugin configuration:
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "s3a://",
>   "workspaces": {
> "root": {
>   "location": "/",
>   "writable": false,
>   "defaultInputFormat": null
> },
> "views": {
>   "location": "/processed",
>   "writable": true,
>   "defaultInputFormat": null
> },
> "tmp": {
>   "location": "/tmp",
>   "writable": true,
>   "defaultInputFormat": null
> }
>   },
>   "formats": {
> "psv": {
>   "type": "text",
>   "extensions": [
> "tbl"
>   ],
>   "delimiter": "|"
> },
> "csv": {
>   "type": "text",
>   "extensions": [
> "csv"
>   ],
>   "extractHeader": true,
>   "delimiter": ","
> },
> "tsv": {
>   "type": "text",
>   "extensions": [
> "tsv"
>   ],
>   "delimiter": "\t"
> },
> "parquet": {
>   "type": "parquet"
> },
> "json": {
>   "type": "json"
> },
> "avro": {
>   "type": "avro"
> },
> "sequencefile": {
>   "type": "sequencefile",
>   "extensions": [
> "seq"
>   ]
> },
> "csvh": {
>   "type": "text",
>   "extensions": [
> "csvh",
> "csv"
>   ],
>   "extractHeader": true,
>   "delimiter": ","
> }
>   }
> }
>Reporter: Peter McTaggart
>Assignee: Jacques Nadeau
> Attachments: apps1-bad.csv, apps1.csv
>
>
> When trying to query (via sqlline or the WebUI) a .csv file, I am getting an 
> IndexOutOfBoundsException:
> {noformat} 0: jdbc:drill:> select * from 
> s3data.root.`staging/data/apps1-bad.csv` limit 1;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 
> (expected: range(0, 16384))
> Fragment 0:0
> [Error Id: be9856d2-0b80-4b9c-94a4-a1ca38ec5db0 on 
> ip-X.compute.internal:31010] (state=,code=0)
> 0: jdbc:drill:> select * from s3data.root.`staging/data/apps1.csv` limit 1;
> +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
> | FIELD_1  |   FIELD_2| FIELD_3  | FIELD_4  | FIELD_5  |  FIELD_6 
>   | FIELD_7  |  FIELD_8   | FIELD_9  |   FIELD_10   | FIELD_11  |   
> FIELD_12   | FIELD_13  | FIELD_14  | FIELD_15  | FIELD_16  | FIELD_17  | 
> FIELD_18  | FIELD_19  |   FIELD_20   | FIELD_21  | FIELD_22  | 
> FIELD_23  | FIELD_24  | FIELD_25  | FIELD_26  | FIELD_27  | FIELD_28  | 
> FIELD_29  | FIELD_30  | FIELD_31  | FIELD_32  | FIELD_33  | FIELD_34  | 
> FIELD_35  |
> +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
> | 489517   | 27/10/2015 02:05:27  | 261  | 1130232  | 0| 
> 925630488  | 0| 925630488  | -1   | 19531580547  |   | 
> 27/10/2015 02:00:00  |   | 30| 300   | 0 | 0  
>|   |   | 27/10/2015 02:05:27  | 0 | 1 | 0 
> | 35.0  |   |   |   | 505   | 872.0   
>   |   | aBc   |   |   |   |   |
> +--+--+--+--+--++--++--+--+---+--+---+---+---+---+---+---+---+--+---+---+---+---+---+---+---+---+---+---+---+---+---

[jira] [Commented] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file

2015-12-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035590#comment-15035590
 ] 

ASF GitHub Bot commented on DRILL-4145:
---

GitHub user StevenMPhillips opened a pull request:

https://github.com/apache/drill/pull/287

DRILL-4145: Handle empty final field in Text reader correctly



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StevenMPhillips/drill drill-4145

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/287.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #287


commit 8f56250aeb29d5d21bcdc6c727cec89607150224
Author: Steven Phillips 
Date:   2015-12-02T10:09:20Z

DRILL-4145: Handle empty final field in Text reader correctly





[jira] [Assigned] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file

2015-12-02 Thread Steven Phillips (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Phillips reassigned DRILL-4145:
--

Assignee: Steven Phillips


[jira] [Commented] (DRILL-4145) IndexOutOfBoundsException raised during select * query on S3 csv file

2015-12-02 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035589#comment-15035589
 ] 

Steven Phillips commented on DRILL-4145:


There is a bug in the case where the last field is an empty string. 
Basically, when the parser encounters an empty final field, it calls the 
output's "endEmptyField()" method. This was ok with the 
RepeatedVarCharOutput, because calling this method resulted in an empty 
string element being added to the array. But in the FieldVarCharOutput, 
ending the field doesn't do anything unless you first start the field.
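The state problem described above can be sketched with a hypothetical minimal model. The class and method names below are illustrative stand-ins, not Drill's actual API; they only mimic the start/end/end-empty protocol the comment describes.

```python
# Minimal model of the two output strategies: an array-style output where
# ending an empty field is always safe, and a column-slot output where
# end_empty_field() is a no-op unless the field was first started.

class RepeatedOutput:
    """Array-style output: every field, empty or not, appends an element."""
    def __init__(self):
        self.fields = []
        self._buf = []
    def start_field(self):
        self._buf = []
    def append(self, ch):
        self._buf.append(ch)
    def end_field(self):
        self.fields.append("".join(self._buf))
    def end_empty_field(self):
        # Safe even if start_field was never called: record an empty element.
        self.fields.append("")

class FieldOutput:
    """Column-slot output: ending a field only records a value if the field
    was started first -- the behavior that drops an empty last field."""
    def __init__(self):
        self.row = []
        self._buf = []
        self._started = False
    def start_field(self):
        self._buf = []
        self._started = True
    def append(self, ch):
        self._buf.append(ch)
    def end_field(self):
        self.row.append("".join(self._buf))
        self._started = False
    def end_empty_field(self):
        if self._started:     # no-op for an unstarted (empty last) field,
            self.end_field()  # so the row comes up one column short

def drive(out, line="a,b,"):
    # Toy parser for a line whose last field is empty ("a,b,").
    for field in line.split(",")[:-1]:
        out.start_field()
        for ch in field:
            out.append(ch)
        out.end_field()
    out.end_empty_field()  # delimiter immediately followed by end of line
    return out

assert drive(RepeatedOutput()).fields == ["a", "b", ""]
assert drive(FieldOutput()).row == ["a", "b"]  # empty last column lost
```

In the dropped-column case, downstream code that assumes one value per declared column can end up indexing past what was actually written, which is the class of out-of-range failure reported in this issue.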


[jira] [Commented] (DRILL-4108) Query on csv file w/ header fails with an exception when non existing column is requested

2015-12-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035584#comment-15035584
 ] 

ASF GitHub Bot commented on DRILL-4108:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/269




