[jira] [Updated] (DRILL-5772) Enable UTF-8 support in query string by default

2017-10-25 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5772:

Labels: doc-impacting ready-to-commit  (was: doc-impacting)

> Enable UTF-8 support in query string by default
> ---
>
> Key: DRILL-5772
> URL: https://issues.apache.org/jira/browse/DRILL-5772
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.12.0
>
>
> Add a unit test to indicate how UTF-8 support can be enabled in Drill.
> To select UTF-8 data, the user needs to set the system property 
> {{saffron.default.charset}} to {{UTF-16LE}} before starting the drillbit. 
> Calcite uses this property to get the default charset; if it is not set, 
> {{ISO-8859-1}} is used by default. Drill gets its default charset from Calcite.
> This information should also be documented, probably in 
> https://drill.apache.org/docs/data-type-conversion/.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5772) Enable UTF-8 support in query string by default

2017-10-25 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5772:

Description: 
Add a unit test to indicate how UTF-8 support can be enabled in Drill.
To select UTF-8 data, the user needs to set the system property 
{{saffron.default.charset}} to {{UTF-16LE}} before starting the drillbit. 
Calcite uses this property to get the default charset; if it is not set, 
{{ISO-8859-1}} is used by default. Drill gets its default charset from Calcite.

Enable 

This information should also be documented, probably in 
https://drill.apache.org/docs/data-type-conversion/.

  was:
Add a unit test to indicate how UTF-8 support can be enabled in Drill.
To select UTF-8 data, the user needs to set the system property 
{{saffron.default.charset}} to {{UTF-16LE}} before starting the drillbit. 
Calcite uses this property to get the default charset; if it is not set, 
{{ISO-8859-1}} is used by default. Drill gets its default charset from Calcite.

This information should also be documented, probably in 
https://drill.apache.org/docs/data-type-conversion/.


> Enable UTF-8 support in query string by default
> ---
>
> Key: DRILL-5772
> URL: https://issues.apache.org/jira/browse/DRILL-5772
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.12.0
>
>
> Add a unit test to indicate how UTF-8 support can be enabled in Drill.
> To select UTF-8 data, the user needs to set the system property 
> {{saffron.default.charset}} to {{UTF-16LE}} before starting the drillbit. 
> Calcite uses this property to get the default charset; if it is not set, 
> {{ISO-8859-1}} is used by default. Drill gets its default charset from Calcite.
> Enable 
> This information should also be documented, probably in 
> https://drill.apache.org/docs/data-type-conversion/.





[jira] [Updated] (DRILL-5772) Enable UTF-8 support in query string by default

2017-10-25 Thread Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5772:

Description: 
A saffron.properties file will now be added to the Drill conf directory; it 
defines the default encoding used to parse the query string.
Content:
{noformat}
saffron.default.charset=UTF-16LE
saffron.default.nationalcharset=UTF-16LE
saffron.default.collation.name=UTF-16LE$en_US 
{noformat}
This information should also be documented, probably in 
https://drill.apache.org/docs/data-type-conversion/.

  was:
Add a unit test to indicate how UTF-8 support can be enabled in Drill.
To select UTF-8 data, the user needs to set the system property 
{{saffron.default.charset}} to {{UTF-16LE}} before starting the drillbit. 
Calcite uses this property to get the default charset; if it is not set, 
{{ISO-8859-1}} is used by default. Drill gets its default charset from Calcite.

Enable 

This information should also be documented, probably in 
https://drill.apache.org/docs/data-type-conversion/.


> Enable UTF-8 support in query string by default
> ---
>
> Key: DRILL-5772
> URL: https://issues.apache.org/jira/browse/DRILL-5772
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.11.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.12.0
>
>
> A saffron.properties file will now be added to the Drill conf directory; it 
> defines the default encoding used to parse the query string.
> Content:
> {noformat}
> saffron.default.charset=UTF-16LE
> saffron.default.nationalcharset=UTF-16LE
> saffron.default.collation.name=UTF-16LE$en_US 
> {noformat}
> This information should also be documented, probably in 
> https://drill.apache.org/docs/data-type-conversion/.
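
The lookup described above (read `saffron.default.charset`, fall back to `ISO-8859-1` when it is absent) can be sketched in plain Java. This is only an illustration under stated assumptions: the class `SaffronCharsetSketch` and the method `resolveDefaultCharset` are hypothetical names, and the code uses `java.util.Properties` directly rather than any Drill or Calcite API, whose actual loading logic is not shown in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Properties;

// Illustrative only: this is not Drill or Calcite code.
class SaffronCharsetSketch {
  // Fallback named in the issue text for the case where the property is not set.
  static final String FALLBACK_CHARSET = "ISO-8859-1";

  // Parses properties-file text and returns the configured default charset name,
  // or the fallback when the key is missing.
  static String resolveDefaultCharset(String propertiesText) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // Properties keeps trailing whitespace in values, so trim before use.
    return props.getProperty("saffron.default.charset", FALLBACK_CHARSET).trim();
  }

  public static void main(String[] args) {
    String content = String.join("\n",
        "saffron.default.charset=UTF-16LE",
        "saffron.default.nationalcharset=UTF-16LE",
        "saffron.default.collation.name=UTF-16LE$en_US");
    String name = resolveDefaultCharset(content);
    System.out.println(name + " supported by this JVM: " + Charset.isSupported(name));
    System.out.println("without the file: " + resolveDefaultCharset(""));
  }
}
```

The trim is there because a properties value keeps any trailing space, as on the collation line quoted above.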





[jira] [Commented] (DRILL-5878) TableNotFound exception is being reported for a wrong storage plugin.

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218394#comment-16218394
 ] 

ASF GitHub Bot commented on DRILL-5878:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/996#discussion_r146815141
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java
 ---
@@ -481,6 +485,19 @@ public RelOptTableImpl getTable(final List 
names) {
 .message("Temporary tables usage is disallowed. Used temporary 
table name: %s.", names)
 .build(logger);
   }
+
+  // Check the schema and throw a valid SchemaNotFound exception 
instead of TableNotFound exception.
--- End diff --

Could you please factor out this logic into a separate method?


> TableNotFound exception is being reported for a wrong storage plugin.
> -
>
> Key: DRILL-5878
> URL: https://issues.apache.org/jira/browse/DRILL-5878
> Project: Apache Drill
>  Issue Type: Bug
>  Components: SQL Parser
>Affects Versions: 1.11.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Minor
> Fix For: 1.12.0
>
>
> Drill is reporting TableNotFound exception for a wrong storage plugin. 
> Consider the following query where employee.json is queried using cp plugin.
> {code}
> 0: jdbc:drill:zk=local> select * from cp.`employee.json` limit 10;
> +--++-++--+-+---++-++--++---+-+-++
> | employee_id  | full_name  | first_name  | last_name  | position_id  
> | position_title  | store_id  | department_id  | birth_date  |   
> hire_date|  salary  | supervisor_id  |  education_level  | 
> marital_status  | gender  |  management_role   |
> +--++-++--+-+---++-++--++---+-+-++
> | 1| Sheri Nowmer   | Sheri   | Nowmer | 1
> | President   | 0 | 1  | 1961-08-26  | 
> 1994-12-01 00:00:00.0  | 8.0  | 0  | Graduate Degree   | S
>| F   | Senior Management  |
> | 2| Derrick Whelply| Derrick | Whelply| 2
> | VP Country Manager  | 0 | 1  | 1915-07-03  | 
> 1994-12-01 00:00:00.0  | 4.0  | 1  | Graduate Degree   | M
>| M   | Senior Management  |
> | 4| Michael Spence | Michael | Spence | 2
> | VP Country Manager  | 0 | 1  | 1969-06-20  | 
> 1998-01-01 00:00:00.0  | 4.0  | 1  | Graduate Degree   | S
>| M   | Senior Management  |
> | 5| Maya Gutierrez | Maya| Gutierrez  | 2
> | VP Country Manager  | 0 | 1  | 1951-05-10  | 
> 1998-01-01 00:00:00.0  | 35000.0  | 1  | Bachelors Degree  | M
>| F   | Senior Management  |
> | 6| Roberta Damstra| Roberta | Damstra| 3
> | VP Information Systems  | 0 | 2  | 1942-10-08  | 
> 1994-12-01 00:00:00.0  | 25000.0  | 1  | Bachelors Degree  | M
>| F   | Senior Management  |
> | 7| Rebecca Kanagaki   | Rebecca | Kanagaki   | 4
> | VP Human Resources  | 0 | 3  | 1949-03-27  | 
> 1994-12-01 00:00:00.0  | 15000.0  | 1  | Bachelors Degree  | M
>| F   | Senior Management  |
> | 8| Kim Brunner| Kim | Brunner| 11   
> | Store Manager   | 9 | 11 | 1922-08-10  | 
> 1998-01-01 00:00:00.0  | 1.0  | 5  | Bachelors Degree  | S
>| F   | Store Management   |
> | 9| Brenda Blumberg| Brenda  | Blumberg   | 11   
> | Store Manager   | 21| 11 | 1979-06-23  | 
> 1998-01-01 00:00:00.0  | 17000.0  | 5  | Graduate Degree   | M
>| F   | Store Management   |
> | 10   | Darren Stanz   | Darren  | Stanz  | 5
> | VP Finance  | 0 | 5  | 1949-08-26  | 
> 1994-12-01 00:00:00.0  | 5.0  | 1  | Partial C

[jira] [Commented] (DRILL-5878) TableNotFound exception is being reported for a wrong storage plugin.

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218392#comment-16218392
 ] 

ASF GitHub Bot commented on DRILL-5878:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/996#discussion_r146816584
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SchemaUtilites.java
 ---
@@ -77,6 +77,22 @@ public static SchemaPlus findSchema(final SchemaPlus 
defaultSchema, final String
 return findSchema(defaultSchema, schemaPathAsList);
   }
 
+  /**
+   * Utility function to get the commonPrefix schema between two supplied 
schemas.
--- End diff --

Could you please add an example to the Javadoc?
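
For illustration, the kind of example the comment asks for might look like the sketch below. The actual signature and semantics of the method in the pull request are not shown in this diff excerpt, so everything here is an assumption: the class `SchemaPrefixSketch`, the method `commonPrefix`, the representation of a schema path as a list of name components, and the exact (case-sensitive) comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the utility added in the pull request.
class SchemaPrefixSketch {
  /**
   * Returns the leading path components shared by both schema paths.
   * Example: [dfs, tmp, a] and [dfs, root] share the prefix [dfs];
   * [cp] and [dfs, tmp] share the empty prefix [].
   */
  static List<String> commonPrefix(List<String> first, List<String> second) {
    List<String> prefix = new ArrayList<>();
    int limit = Math.min(first.size(), second.size());
    for (int i = 0; i < limit; i++) {
      // Assumption: components are compared exactly (case-sensitive).
      if (!first.get(i).equals(second.get(i))) {
        break;
      }
      prefix.add(first.get(i));
    }
    return prefix;
  }

  public static void main(String[] args) {
    System.out.println(commonPrefix(
        Arrays.asList("dfs", "tmp", "a"), Arrays.asList("dfs", "root")));
    System.out.println(commonPrefix(
        Arrays.asList("cp"), Arrays.asList("dfs", "tmp")));
  }
}
```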



[jira] [Commented] (DRILL-5878) TableNotFound exception is being reported for a wrong storage plugin.

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218391#comment-16218391
 ] 

ASF GitHub Bot commented on DRILL-5878:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/996#discussion_r146816101
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/dfs/TestFileSelection.java
 ---
@@ -63,4 +63,17 @@ public void testEmptyFolderThrowsTableNotFound() throws 
Exception {
 }
   }
 
+  @Test(expected = Exception.class)
+  public void testWrongSchemaThrowsSchemaNotFound() throws Exception {
+final String table = String.format("%s/empty", 
TestTools.getTestResourcesPath());
+final String query = String.format("select * from dfs1.`%s`", table);
+try {
+  testNoResult(query);
+} catch (Exception ex) {
+  final String pattern = String.format("[[dfs1]] is not valid with 
respect to either root schema or current default schema").toLowerCase();
+  final boolean isSchemaNotFound = 
ex.getMessage().toLowerCase().contains(pattern);
+  assertTrue(isSchemaNotFound);
+  throw ex;
+}
+  }
--- End diff --

Can you please add test cases for an incorrect workspace: 
a. `select * from dfs.incorrect_wk.table;`
b. 
```
use dfs; 
select * from incorrect_wk.table;
```
I assume it will return incorrect schema exception as well? 



[jira] [Commented] (DRILL-5878) TableNotFound exception is being reported for a wrong storage plugin.

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218393#comment-16218393
 ] 

ASF GitHub Bot commented on DRILL-5878:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/996#discussion_r146818260
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/dfs/TestFileSelection.java
 ---
@@ -63,4 +63,17 @@ public void testEmptyFolderThrowsTableNotFound() throws 
Exception {
 }
   }
 
--- End diff --

Maybe it makes sense to move the new unit tests and 
`testEmptyFolderThrowsTableNotFound` into a separate class that tests the 
behavior when an incorrect object is specified (i.e. schema, workspace, table)?



[jira] [Resolved] (DRILL-5890) Tests Leak Many Open File Descriptors

2017-10-25 Thread salim achouche (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salim achouche resolved DRILL-5890.
---
   Resolution: Fixed
 Reviewer: Parth Chandra
Fix Version/s: 1.12.0

With the fix applied, the number of open file descriptors while running the 
test suite should be an order of magnitude smaller.

> Tests Leak Many Open File Descriptors
> -
>
> Key: DRILL-5890
> URL: https://issues.apache.org/jira/browse/DRILL-5890
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Timothy Farkas
>Assignee: salim achouche
> Fix For: 1.12.0
>
>
> Salim and I have discovered that the tests leak many open file descriptors 
> and the tests can hang with even a 64k open file limit. Also doing an lsof 
> periodically shows the number of open files steadily grows over time as the 
> tests run. Fixing this would likely speed up the unit tests and prevent 
> developers from scratching their heads about why the tests are hanging or 
> throwing Too Many Open file exceptions.





[jira] [Created] (DRILL-5906) java.lang.NullPointerException while querying Hive ORC tables on MapR cluster.

2017-10-25 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-5906:
--

 Summary: java.lang.NullPointerException while querying Hive ORC 
tables on MapR cluster. 
 Key: DRILL-5906
 URL: https://issues.apache.org/jira/browse/DRILL-5906
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.11.0
Reporter: Vitalii Diravka
Assignee: Vitalii Diravka
 Fix For: 1.12.0


The record reader throws an exception when trying to read an empty split.

Possible fix: upgrade Drill's hive.version to 
[1.2.0-mapr-1707|https://maprdocs.mapr.com/52/EcosystemRN/HiveRN-1.2.1-1707.html], 
where this issue was fixed.





[jira] [Commented] (DRILL-5879) Optimize "Like" operator

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219262#comment-16219262
 ] 

ASF GitHub Bot commented on DRILL-5879:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/1001#discussion_r146948741
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java
 ---
@@ -17,37 +17,166 @@
  */
 package org.apache.drill.exec.expr.fn.impl;
 
-public class SqlPatternContainsMatcher implements SqlPatternMatcher {
+public final class SqlPatternContainsMatcher implements SqlPatternMatcher {
   final String patternString;
   CharSequence charSequenceWrapper;
   final int patternLength;
+  final MatcherFcn matcherFcn;
 
   public SqlPatternContainsMatcher(String patternString, CharSequence 
charSequenceWrapper) {
-this.patternString = patternString;
+this.patternString   = patternString;
 this.charSequenceWrapper = charSequenceWrapper;
-patternLength = patternString.length();
+patternLength= patternString.length();
+
+// The idea is to write loops with simple condition checks to allow 
the Java Hotspot achieve
+// better optimizations (especially vectorization)
+if (patternLength == 1) {
+  matcherFcn = new Matcher1();
--- End diff --

How does matcherN perform compared to matcher1, matcher2 and matcher3 for 
pattern lengths 1, 2 and 3? If matcherN performs well for those lengths, we 
can keep a single matcher instead of a separate one for each pattern 
length. 


> Optimize "Like" operator
> 
>
> Key: DRILL-5879
> URL: https://issues.apache.org/jira/browse/DRILL-5879
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
> Environment: * 
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
> Fix For: 1.12.0
>
>
> Query: select  from  where colA like '%a%' or colA like 
> '%xyz%';
> Improvement Opportunities
> # Avoid isAscii computation (full access of the input string) since we're 
> dealing with the same column twice
> # Optimize the "contains" for-loop 
> Implementation Details
> 1)
> * Added a new integer variable "asciiMode" to the VarCharHolder class
> * The default value is -1 which indicates this info is not known
> * Otherwise this value will be set to either 1 or 0 based on the string being 
> in ASCII mode or Unicode
> * The execution plan already shares the same VarCharHolder instance for all 
> evaluations of the same column value
> * The asciiMode will be correctly set during the first LIKE evaluation and 
> will be reused across other LIKE evaluations
> 2) 
> * The "Contains" LIKE operation is quite expensive as the code needs to 
> access the input string to perform character based comparisons
> * Created 4 versions of the same for-loop to a) make the loop simpler to 
> optimize (Vectorization) and b) minimize comparisons
> Benchmarks
> * Lineitem table 100GB
> * Query: select l_returnflag, count(*) from dfs.`` where l_comment 
> not like '%a%' or l_comment like '%the%' group by l_returnflag
> * Before changes: 33sec
> * After changes: 27sec
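
The tri-state `asciiMode` caching described in item 1) can be sketched as follows. This is a minimal illustration, not the Drill change itself: `AsciiModeSketch` and its nested `Holder` are hypothetical stand-ins for the real VarCharHolder, the value is held as a UTF-8 byte array rather than a Drill buffer, and the `scans` counter exists only to make the reuse visible.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the holder described in the issue; not Drill's VarCharHolder.
class AsciiModeSketch {
  static final class Holder {
    final byte[] bytes;
    int asciiMode = -1; // -1 = not known yet, 1 = ASCII, 0 = not ASCII
    int scans = 0;      // demo-only counter of full scans over the value

    Holder(String value) {
      this.bytes = value.getBytes(StandardCharsets.UTF_8);
    }
  }

  // Scans the value at most once; later calls reuse the cached answer.
  static boolean isAscii(Holder holder) {
    if (holder.asciiMode == -1) {
      holder.scans++;
      int mode = 1;
      for (byte b : holder.bytes) {
        if (b < 0) { // high bit set: part of a multi-byte UTF-8 sequence
          mode = 0;
          break;
        }
      }
      holder.asciiMode = mode;
    }
    return holder.asciiMode == 1;
  }

  public static void main(String[] args) {
    Holder comment = new Holder("furiously final ideas");
    // Two LIKE evaluations over the same column value share one holder.
    boolean first = isAscii(comment);
    boolean second = isAscii(comment);
    System.out.println(first + " " + second + " scans=" + comment.scans);
  }
}
```

The benefit depends on the same holder instance being shared across evaluations, which is what the description says the execution plan already does.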





[jira] [Commented] (DRILL-5898) Query returns columns in the wrong order

2017-10-25 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219272#comment-16219272
 ] 

Vitalii Diravka commented on DRILL-5898:


[~rhou] The wrong column ordering for queries with ORDER BY and LIMIT clauses 
was fixed by the commit for DRILL-5845. Feel free to edit the baseline values.
The issue mentioned by Boaz looks different, and its root cause is not yet 
known.

> Query returns columns in the wrong order
> 
>
> Key: DRILL-5898
> URL: https://issues.apache.org/jira/browse/DRILL-5898
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Robert Hou
>Priority: Blocker
> Fix For: 1.12.0
>
>
> This is a regression.  It worked with this commit:
> {noformat}
> f1d1945b3772bb782039fd6811e34a7de66441c8  DRILL-5582: C++ Client: [Threat 
> Modeling] Drillbit may be spoofed by an attacker and this may lead to data 
> being written to the attacker's target instead of Drillbit
> {noformat}
> It fails with this commit, although there are six commits total between the 
> last good one and this one:
> {noformat}
> b0c4e0486d6d4620b04a1bb8198e959d433b4840  DRILL-5876: Use openssl profile 
> to include netty-tcnative dependency with the platform specific classifier
> {noformat}
> Query is:
> {noformat}
> select * from 
> dfs.`/drill/testdata/tpch100_dir_partitioned_5files/lineitem` where 
> dir0=2006 and dir1=12 and dir2=15 and l_discount=0.07 order by l_orderkey, 
> l_extendedprice limit 10
> {noformat}
> Columns are returned in a different order.  Here are the expected results:
> {noformat}
> foxes. furiously final ideas cajol1994-05-27  0.071731.42 4   
> F   653442  4965666.0   1.0 1994-06-23  A   1994-06-22
>   NONESHIP215671  0.07200612  15 (1 time(s))
> lly final account 1994-11-09  0.0745881.783   F   
> 653412  1.320809E7  46.01994-11-24  R   1994-11-08  TAKE 
> BACK RETURNREG AIR 458104  0.08200612  15 (1 time(s))
>  the asymptotes   1997-12-29  0.0760882.8 6   O   653413  
> 1.4271413E7 44.01998-02-04  N   1998-01-20  DELIVER IN 
> PERSON   MAIL21456   0.05200612  15 (1 time(s))
> carefully a   1996-09-23  0.075381.88 2   O   653378  
> 1.6702792E7 3.0 1996-11-14  N   1996-10-15  NONEREG 
> AIR 952809  0.05200612  15 (1 time(s))
> ly final requests. boldly ironic theo 1995-09-04  0.072019.94 2   
> O   653380  2416094.0   2.0 1995-11-14  N   1995-10-18
>   COLLECT COD FOB 166101  0.02200612  15 (1 time(s))
> alongside of the even, e  1996-02-14  0.0786140.322   
> O   653409  5622872.0   48.01996-05-02  N   1996-04-22
>   NONESHIP372888  0.04200612  15 (1 time(s))
> es. regular instruct  1996-10-18  0.0725194.0 1   O   653382  
> 6048060.0   25.01996-08-29  N   1996-08-20  DELIVER IN 
> PERSON   AIR 798079  0.0 200612  15 (1 time(s))
> en package1993-09-19  0.0718718.322   F   653440  
> 1.372054E7  12.01993-09-12  A   1993-09-09  DELIVER IN 
> PERSON   TRUCK   970554  0.0 200612  15 (1 time(s))
> ly regular deposits snooze. unusual, even 1998-01-18  0.07
> 12427.921   O   653413  2822631.0   8.0 1998-02-09
>   N   1998-02-05  TAKE BACK RETURNREG AIR 322636  0.01
> 200612  15 (1 time(s))
>  ironic ideas. bra1996-10-13  0.0764711.533   O   
> 653383  6806672.0   41.01996-12-06  N   1996-11-10  TAKE 
> BACK RETURNAIR 556691  0.01200612  15 (1 time(s))
> {noformat}
> Here are the actual results:
> {noformat}
> 2006  12  15  653383  6806672 556691  3   41.064711.53
> 0.070.01N   O   1996-11-10  1996-10-13  1996-12-06
>   TAKE BACK RETURNAIR  ironic ideas. bra
> 2006  12  15  653378  16702792952809  2   3.0 5381.88 
> 0.070.05N   O   1996-10-15  1996-09-23  1996-11-14
>   NONEREG AIR carefully a
> 2006  12  15  653380  2416094 166101  2   2.0 2019.94 0.07
> 0.02N   O   1995-10-18  1995-09-04  1995-11-14  
> COLLECT COD FOB ly final requests. boldly ironic theo
> 2006  12  15  653413  2822631 322636  1   8.0 12427.92
> 0.070.0

[jira] [Commented] (DRILL-5879) Optimize "Like" operator

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219279#comment-16219279
 ] 

ASF GitHub Bot commented on DRILL-5879:
---

Github user sachouche commented on a diff in the pull request:

https://github.com/apache/drill/pull/1001#discussion_r146952578
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SqlPatternContainsMatcher.java
 ---
@@ -17,37 +17,166 @@
  */
 package org.apache.drill.exec.expr.fn.impl;
 
-public class SqlPatternContainsMatcher implements SqlPatternMatcher {
+public final class SqlPatternContainsMatcher implements SqlPatternMatcher {
   final String patternString;
   CharSequence charSequenceWrapper;
   final int patternLength;
+  final MatcherFcn matcherFcn;
 
   public SqlPatternContainsMatcher(String patternString, CharSequence 
charSequenceWrapper) {
-this.patternString = patternString;
+this.patternString   = patternString;
 this.charSequenceWrapper = charSequenceWrapper;
-patternLength = patternString.length();
+patternLength= patternString.length();
+
+// The idea is to write loops with simple condition checks to allow 
the Java Hotspot achieve
+// better optimizations (especially vectorization)
+if (patternLength == 1) {
+  matcherFcn = new Matcher1();
--- End diff --

I ran two types of tests to compare the generic and custom methods:
Test1 - A match does exist
o The custom method is 3x faster because the code goes to the "else" part
o A nested loop is always slower than unrolled code (the loop is also 
correlated with the outer one); please refer to this article 
(https://en.wikipedia.org/wiki/Loop_unrolling) on the benefits of loop unrolling
o The older match function ran in 59sec

Test2 - A match doesn't exist
o The custom method and the generic one both run in 15sec; this is because both 
perform a comparison and proceed to the next iteration
o The older match function ran in 45sec 
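
The two approaches compared in this thread (a matcher specialized for a short pattern length versus one generic nested-loop matcher) can be sketched as below. This is a simplified illustration and not the code from the pull request: `ContainsMatcherSketch`, its `Matcher` interface and `forPattern` are hypothetical names, the input is a plain `CharSequence` instead of Drill's buffers, and the sketch makes no performance claim of its own.

```java
// Hypothetical sketch of length-specialized "contains" matching; not the PR code.
class ContainsMatcherSketch {
  interface Matcher {
    boolean match(CharSequence input);
  }

  // Picks a matcher once, based on the pattern length.
  static Matcher forPattern(String pattern) {
    final int len = pattern.length();
    if (len == 0) {
      return input -> true; // an empty pattern is contained in any input
    }
    if (len == 1) {
      final char c0 = pattern.charAt(0);
      // Single flat loop with one comparison per position.
      return input -> {
        for (int i = 0; i < input.length(); i++) {
          if (input.charAt(i) == c0) {
            return true;
          }
        }
        return false;
      };
    }
    if (len == 2) {
      final char c0 = pattern.charAt(0);
      final char c1 = pattern.charAt(1);
      // The inner loop is unrolled by hand into two comparisons.
      return input -> {
        final int last = input.length() - 2;
        for (int i = 0; i <= last; i++) {
          if (input.charAt(i) == c0 && input.charAt(i + 1) == c1) {
            return true;
          }
        }
        return false;
      };
    }
    // Generic fallback: nested loop over the pattern at each start position.
    return input -> {
      final int last = input.length() - len;
      for (int i = 0; i <= last; i++) {
        int j = 0;
        while (j < len && input.charAt(i + j) == pattern.charAt(j)) {
          j++;
        }
        if (j == len) {
          return true;
        }
      }
      return false;
    };
  }

  public static void main(String[] args) {
    System.out.println(forPattern("a").match("banana"));
    System.out.println(forPattern("the").match("other"));
    System.out.println(forPattern("xyz").match("xy"));
  }
}
```

Selecting the matcher in the constructor, as the diff does with `matcherFcn`, keeps the per-row call free of any branching on pattern length.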


> Optimize "Like" operator
> 
>
> Key: DRILL-5879
> URL: https://issues.apache.org/jira/browse/DRILL-5879
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Relational Operators
> Environment: * 
>Reporter: salim achouche
>Assignee: salim achouche
>Priority: Minor
> Fix For: 1.12.0
>
>
> Query: select  from  where colA like '%a%' or colA like 
> '%xyz%';
> Improvement Opportunities
> # Avoid isAscii computation (full access of the input string) since we're 
> dealing with the same column twice
> # Optimize the "contains" for-loop 
> Implementation Details
> 1)
> * Added a new integer variable "asciiMode" to the VarCharHolder class
> * The default value is -1 which indicates this info is not known
> * Otherwise this value will be set to either 1 or 0 based on the string being 
> in ASCII mode or Unicode
> * The execution plan already shares the same VarCharHolder instance for all 
> evaluations of the same column value
> * The asciiMode will be correctly set during the first LIKE evaluation and 
> will be reused across other LIKE evaluations
> 2) 
> * The "Contains" LIKE operation is quite expensive as the code needs to 
> access the input string to perform character based comparisons
> * Created 4 versions of the same for-loop to a) make the loop simpler to 
> optimize (Vectorization) and b) minimize comparisons
> Benchmarks
> * Lineitem table 100GB
> * Query: select l_returnflag, count(*) from dfs.`` where l_comment 
> not like '%a%' or l_comment like '%the%' group by l_returnflag
> * Before changes: 33sec
> * After changes: 27sec
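
The asciiMode idea from item 1) of the description can be sketched as below. This is a hedged illustration under stated assumptions: Holder is a hypothetical stand-in, not Drill's VarCharHolder, and the scans counter exists only to show that the full pass over the bytes happens once per value.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of caching the ASCII check on a shared holder.
class AsciiModeSketch {

  /** Stand-in for a value holder shared by all LIKE evaluations of one value. */
  static final class Holder {
    final byte[] bytes;
    int asciiMode = -1; // -1: not known yet, 1: ASCII, 0: not ASCII
    int scans = 0;      // number of full scans performed (illustration only)

    Holder(byte[] bytes) {
      this.bytes = bytes;
    }
  }

  /** Scans the bytes only while asciiMode is unknown, then reuses the result. */
  static boolean isAscii(Holder h) {
    if (h.asciiMode == -1) {
      h.scans++;
      int mode = 1;
      for (byte b : h.bytes) {
        if (b < 0) { // high bit set, so the byte is outside 7-bit ASCII
          mode = 0;
          break;
        }
      }
      h.asciiMode = mode;
    }
    return h.asciiMode == 1;
  }

  public static void main(String[] args) {
    Holder h = new Holder("quickly final".getBytes(StandardCharsets.UTF_8));
    boolean first = isAscii(h);  // performs the scan
    boolean second = isAscii(h); // reuses the cached mode
    System.out.println(first + " " + second + " scans=" + h.scans);
  }
}
```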





[jira] [Created] (DRILL-5907) Cleanup Unit Test Output

2017-10-25 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-5907:
-

 Summary: Cleanup Unit Test Output
 Key: DRILL-5907
 URL: https://issues.apache.org/jira/browse/DRILL-5907
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: salim achouche








[jira] [Commented] (DRILL-5898) Query returns columns in the wrong order

2017-10-25 Thread Robert Hou (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219336#comment-16219336
 ] 

Robert Hou commented on DRILL-5898:
---

Boaz, you are correct.  The order shown in the output is not correct.  The test 
framework handles order by verification separately, so what you see is not the 
order in which the results are displayed.  We do this because on a distributed 
cluster, if a query does not have an order by clause (the majority of our 
tests), then results can be returned in any order, and can differ from run to 
run.  So the framework may display the results in varying orders.

I ran the query through sqlline, and the order is displayed correctly.  If we 
detected an incorrect order, then a different error message would have appeared.



> Query returns columns in the wrong order
> 
>
> Key: DRILL-5898
> URL: https://issues.apache.org/jira/browse/DRILL-5898
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Robert Hou
>Priority: Blocker
> Fix For: 1.12.0
>
>
> This is a regression.  It worked with this commit:
> {noformat}
> f1d1945b3772bb782039fd6811e34a7de66441c8  DRILL-5582: C++ Client: [Threat 
> Modeling] Drillbit may be spoofed by an attacker and this may lead to data 
> being written to the attacker's target instead of Drillbit
> {noformat}
> It fails with this commit, although there are six commits total between the 
> last good one and this one:
> {noformat}
> b0c4e0486d6d4620b04a1bb8198e959d433b4840  DRILL-5876: Use openssl profile 
> to include netty-tcnative dependency with the platform specific classifier
> {noformat}
> Query is:
> {noformat}
> select * from 
> dfs.`/drill/testdata/tpch100_dir_partitioned_5files/lineitem` where 
> dir0=2006 and dir1=12 and dir2=15 and l_discount=0.07 order by l_orderkey, 
> l_extendedprice limit 10
> {noformat}
> Columns are returned in a different order.  Here are the expected results:
> {noformat}
> foxes. furiously final ideas cajol1994-05-27  0.071731.42 4   
> F   653442  4965666.0   1.0 1994-06-23  A   1994-06-22
>   NONESHIP215671  0.07200612  15 (1 time(s))
> lly final account 1994-11-09  0.0745881.783   F   
> 653412  1.320809E7  46.01994-11-24  R   1994-11-08  TAKE 
> BACK RETURNREG AIR 458104  0.08200612  15 (1 time(s))
>  the asymptotes   1997-12-29  0.0760882.8 6   O   653413  
> 1.4271413E7 44.01998-02-04  N   1998-01-20  DELIVER IN 
> PERSON   MAIL21456   0.05200612  15 (1 time(s))
> carefully a   1996-09-23  0.075381.88 2   O   653378  
> 1.6702792E7 3.0 1996-11-14  N   1996-10-15  NONEREG 
> AIR 952809  0.05200612  15 (1 time(s))
> ly final requests. boldly ironic theo 1995-09-04  0.072019.94 2   
> O   653380  2416094.0   2.0 1995-11-14  N   1995-10-18
>   COLLECT COD FOB 166101  0.02200612  15 (1 time(s))
> alongside of the even, e  1996-02-14  0.0786140.322   
> O   653409  5622872.0   48.01996-05-02  N   1996-04-22
>   NONESHIP372888  0.04200612  15 (1 time(s))
> es. regular instruct  1996-10-18  0.0725194.0 1   O   653382  
> 6048060.0   25.01996-08-29  N   1996-08-20  DELIVER IN 
> PERSON   AIR 798079  0.0 200612  15 (1 time(s))
> en package1993-09-19  0.0718718.322   F   653440  
> 1.372054E7  12.01993-09-12  A   1993-09-09  DELIVER IN 
> PERSON   TRUCK   970554  0.0 200612  15 (1 time(s))
> ly regular deposits snooze. unusual, even 1998-01-18  0.07
> 12427.921   O   653413  2822631.0   8.0 1998-02-09
>   N   1998-02-05  TAKE BACK RETURNREG AIR 322636  0.01
> 200612  15 (1 time(s))
>  ironic ideas. bra1996-10-13  0.0764711.533   O   
> 653383  6806672.0   41.01996-12-06  N   1996-11-10  TAKE 
> BACK RETURNAIR 556691  0.01200612  15 (1 time(s))
> {noformat}
> Here are the actual results:
> {noformat}
> 2006  12  15  653383  6806672 556691  3   41.064711.53
> 0.070.01N   O   1996-11-10  1996-10-13  1996-12-06
>   TAKE BACK RETURNAIR  ironic ideas. bra
> 2006  12  15  653378  16702792952809  2   3.0 5381.88 
> 0.070.05N   O   1996-10-15  1996-

[jira] [Created] (DRILL-5908) Regression: Query intermittently may fail with error "Waited for 15000ms, but tasks for 'Get block maps' are not complete."

2017-10-25 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5908:
-

 Summary: Regression: Query intermittently may fail with error 
"Waited for 15000ms, but tasks for 'Get block maps' are not complete."
 Key: DRILL-5908
 URL: https://issues.apache.org/jira/browse/DRILL-5908
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou
Assignee: Pritesh Maker


This is from the Functional-Baseline-88.193 Jenkins run.  The test is in the 
Functional test suite, 
partition_pruning/dfs/csv/plan/csvselectpartormultiplewithdir_MD-185.q

Query is:
{noformat}
explain plan for select 
columns[0],columns[1],columns[4],columns[10],columns[13],dir0 from 
`/drill/testdata/partition_pruning/dfs/lineitempart` where (dir0=1993 and 
columns[0]>29600) or (dir0=1994 and columns[0]>29700)
{noformat}

The error is:
{noformat}
Failed with exception
java.sql.SQLException: RESOURCE ERROR: Waited for 15000ms, but tasks for 'Get block maps' are not complete. Total runnable size 2, parallelism 2.

[Error Id: ab911277-36cb-465c-a9aa-8e3d21bcc09c on atsqa4-195.qa.lab:31010]

    at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
    at org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
    at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1895)
    at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61)
    at oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
    at org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1100)
    at oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
    at org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:181)
    at oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:110)
    at oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130)
    at org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:112)
    at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:224)
    at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:136)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:748)
Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: Waited for 15000ms, but tasks for 'Get block maps' are not complete. Total runnable size 2, parallelism 2.

[Error Id: ab911277-36cb-465c-a9aa-8e3d21bcc09c on atsqa4-195.qa.lab:31010]

    at oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
    at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:465)
    at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:102)
    at oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274)
    at oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:244)
    at oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
    at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
    at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
    at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at oadd.io.netty.channel.AbstractChannelHandlerContext.f

[jira] [Updated] (DRILL-5908) Regression: Query intermittently may fail with error "Waited for 15000ms, but tasks for 'Get block maps' are not complete."

2017-10-25 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-5908:
--
Attachment: 260f8e8b-8b7b-12ef-0752-c178de03d7c7.sys.drill
drillbit.log

> Regression: Query intermittently may fail with error "Waited for 15000ms, but 
> tasks for 'Get block maps' are not complete."
> ---
>
> Key: DRILL-5908
> URL: https://issues.apache.org/jira/browse/DRILL-5908
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Pritesh Maker
> Attachments: 260f8e8b-8b7b-12ef-0752-c178de03d7c7.sys.drill, 
> drillbit.log
>
>

[jira] [Commented] (DRILL-5906) java.lang.NullPointerException while quering Hive ORC tables on MapR cluster.

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219629#comment-16219629
 ] 

ASF GitHub Bot commented on DRILL-5906:
---

GitHub user vdiravka opened a pull request:

https://github.com/apache/drill/pull/1010

DRILL-5906: java.lang.NullPointerException while quering Hive ORC tables on 
MapR cluster

- Upgrade drill to 1.2.0-mapr-1707 hive.version.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vdiravka/drill DRILL-5906

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1010.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1010


commit 9a646bcfa7bf572d1428cd10c7ba80b4320af039
Author: Vitalii Diravka 
Date:   2017-10-25T17:59:09Z

DRILL-5906: java.lang.NullPointerException while quering Hive ORC tables on 
MapR cluster

- Upgrade drill to 1.2.0-mapr-1707 hive.version.




> java.lang.NullPointerException while quering Hive ORC tables on MapR cluster. 
> --
>
> Key: DRILL-5906
> URL: https://issues.apache.org/jira/browse/DRILL-5906
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.12.0
>
>
> Record reader throws an exception when trying to read an empty split.
> Possible fix: upgrade drill to 
> [1.2.0-mapr-1707|https://maprdocs.mapr.com/52/EcosystemRN/HiveRN-1.2.1-1707.html]
>  hive.version, where this issue was fixed.





[jira] [Assigned] (DRILL-5908) Regression: Query intermittently may fail with error "Waited for 15000ms, but tasks for 'Get block maps' are not complete."

2017-10-25 Thread Pritesh Maker (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-5908:


Assignee: Vlad Rozov  (was: Pritesh Maker)

> Regression: Query intermittently may fail with error "Waited for 15000ms, but 
> tasks for 'Get block maps' are not complete."
> ---
>
> Key: DRILL-5908
> URL: https://issues.apache.org/jira/browse/DRILL-5908
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Vlad Rozov
> Attachments: 260f8e8b-8b7b-12ef-0752-c178de03d7c7.sys.drill, 
> drillbit.log
>
>

[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219691#comment-16219691
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147007663
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
@@ -19,24 +19,33 @@
 
 import org.apache.drill.categories.OperatorTest;
 import org.apache.drill.categories.SqlTest;
+import org.apache.drill.test.BaseTestQuery;
+import org.junit.BeforeClass;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 @Category({SqlTest.class, OperatorTest.class})
-public class TestAltSortQueries extends BaseTestQuery{
+public class TestAltSortQueries extends BaseTestQuery {
   static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TestAltSortQueries.class);
 
+  @BeforeClass
+  public static void setupTestFiles() {
+dirTestWatcher.copyFileToRoot("sample-data/region.parquet");
+dirTestWatcher.copyFileToRoot("sample-data/regionsSF");
+dirTestWatcher.copyFileToRoot("sample-data/nation.parquet");
+  }
+
   @Test
   public void testOrderBy() throws Exception{
 test("select R_REGIONKEY " +
- "from dfs_test.`[WORKING_PATH]/../../sample-data/region.parquet` " +
+ "from dfs_test.`/sample-data/region.parquet` " +
--- End diff --

I have now removed **dfs_test** completely. There was no reason for it to 
be added, and it was being mixed inconsistently with **dfs**. The **dfs** 
workspaces are automatically mapped to the correct temp directories for you, 
provided that you use **BaseTestQuery** or the **ClusterFixture**. I will 
update **org.apache.drill.test.package-info.java** with the theory of how this 
works and will add a simple example to **ExampleTest.java**.
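
As a rough picture of what "mapped to the correct temp directories" means, the sketch below copies a file under a per-test temp root and then resolves a workspace-style path against that root. It uses only java.nio.file and is not the Drill test framework: TempRootSketch, copyToRoot, resolve and roundTrip are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; the real mapping lives in Drill's test fixtures.
class TempRootSketch {
  private final Path root;

  TempRootSketch() throws IOException {
    this.root = Files.createTempDirectory("test-root");
  }

  /** Copies sourceBase/relative to root/relative, creating parent directories. */
  Path copyToRoot(Path sourceBase, String relative) throws IOException {
    Path target = root.resolve(relative);
    Files.createDirectories(target.getParent());
    return Files.copy(sourceBase.resolve(relative), target);
  }

  /** Resolves a workspace-style path such as "/sample-data/x" under the temp root. */
  Path resolve(String workspacePath) {
    String rel = workspacePath.startsWith("/") ? workspacePath.substring(1) : workspacePath;
    return root.resolve(rel);
  }

  /** Creates a small source file, copies it, and checks it is visible under the root. */
  static boolean roundTrip() {
    try {
      Path sourceBase = Files.createTempDirectory("test-src");
      Path src = sourceBase.resolve("sample-data/region.txt");
      Files.createDirectories(src.getParent());
      Files.write(src, "AFRICA".getBytes(StandardCharsets.UTF_8));

      TempRootSketch workspace = new TempRootSketch();
      workspace.copyToRoot(sourceBase, "sample-data/region.txt");

      Path resolved = workspace.resolve("/sample-data/region.txt");
      return Files.exists(resolved)
          && "AFRICA".equals(new String(Files.readAllBytes(resolved), StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip());
  }
}
```

The point of the design is that a test refers to a short path relative to the workspace root and never to a build-directory path such as [WORKING_PATH]/../...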


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>






[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219693#comment-16219693
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147007945
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestAltSortQueries.java ---
@@ -64,9 +73,9 @@ public void testJoinWithLimit() throws Exception{
 "  nations.N_NAME,\n" +
 "  regions.R_NAME\n" +
 "FROM\n" +
-"  dfs_test.`[WORKING_PATH]/../../sample-data/nation.parquet` nations\n" +
+"  dfs.`/sample-data/nation.parquet` nations\n" +
--- End diff --

I just mentioned this above but will repeat it here. I have now removed 
**dfs_test** completely. There was no reason for it to be added, and it was 
being mixed inconsistently with **dfs**. If you want to query a file on the 
local filesystem that is not on the classpath, just using **dfs** will be 
sufficient now.


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>






[jira] [Assigned] (DRILL-5211) Queries fail due to direct memory fragmentation

2017-10-25 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-5211:
--

Assignee: Paul Rogers  (was: Paul Rogers)

> Queries fail due to direct memory fragmentation
> ---
>
> Key: DRILL-5211
> URL: https://issues.apache.org/jira/browse/DRILL-5211
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Attachments: ApacheDrillMemoryFragmentationBackground.pdf, 
> ApacheDrillVectorSizeLimits.pdf, EnhancedScanOperator.pdf, 
> ScanSchemaManagement.pdf
>
>
> Consider a test of the external sort as follows:
> * Direct memory: 3GB
> * Input file: 18 GB, with one Varchar column of 8K width
> The sort runs, spilling to disk. Once all data arrives, the sort begins to 
> merge the results. But, to do that, it must first do an intermediate merge. 
> For example, in this sort, there are 190 spill files, but only 19 can be 
> merged at a time. (Each merge file contains 128 MB batches, and only 19 can 
> fit in memory, giving a total footprint of 2.5 GB, well below the 3 GB limit.)
> Yet, when loading batch xx, Drill fails with an OOM error. At that point, 
> total available direct memory is 3,817,865,216. (Obtained from {{maxMemory}} 
> in the {{Bits}} class in the JDK.)
> It appears that Drill wants to allocate 58,257,868 bytes, but the 
> {{totalCapacity}} (again in {{Bits}}) is already 3,800,769,206, causing an 
> OOM.
> The problem is that, at this point, the external sort should not ask the 
> system for more memory. The allocator for the external sort is at just 
> 1,192,350,366 before the allocation request. Plenty of spare memory should be 
> available, released when the in-memory batches were spilled to disk prior to 
> merging. Indeed, earlier in the run, the sort had reached a peak memory usage 
> of 2,710,716,416 bytes. This memory should be available for reuse during 
> merging, and is plenty sufficient to fill the particular request in question.
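
The numbers in the description can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; treating maxMemory as the JVM direct-memory limit and totalCapacity as the amount already reserved follows the description's own reading, and the class and method names are hypothetical.

```java
// Worked arithmetic for the figures quoted in DRILL-5211 (illustration only).
class DirectMemoryHeadroom {

  /** Bytes still available before the JVM-wide direct memory limit is hit. */
  static long headroom(long maxMemory, long totalCapacity) {
    return maxMemory - totalCapacity;
  }

  public static void main(String[] args) {
    long maxMemory = 3_817_865_216L;     // limit reported by the JDK
    long totalCapacity = 3_800_769_206L; // already reserved JVM-wide
    long request = 58_257_868L;          // size of the failing allocation

    long free = headroom(maxMemory, totalCapacity);
    System.out.println("JVM headroom: " + free + " bytes");
    System.out.println("request fits at JVM level: " + (request <= free));

    long sortAllocated = 1_192_350_366L; // external sort allocator before the request
    long sortPeak = 2_710_716_416L;      // earlier peak usage of the same sort
    System.out.println("sort usage after request: " + (sortAllocated + request)
        + " bytes, earlier peak: " + sortPeak + " bytes");
  }
}
```

The JVM-level headroom works out to 17,096,010 bytes, which is smaller than the 58,257,868-byte request, while the sort's own usage after the request (1,250,608,234 bytes) would still be far below its earlier peak. That gap is the mismatch the issue describes.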





[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219695#comment-16219695
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147008923
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestCTASPartitionFilter.java ---
@@ -59,48 +58,48 @@ public void withDistribution() throws Exception {
 test("alter session set `planner.slice_target` = 1");
 test("alter session set `store.partition.hash_distribute` = true");
 test("use dfs_test.tmp");
-test(String.format("create table orders_distribution partition by (o_orderpriority) as select * from dfs_test.`%s/multilevel/parquet`", TEST_RES_PATH));
+test("create table orders_distribution partition by (o_orderpriority) as select * from dfs_test.`/multilevel/parquet`");
 String query = "select * from orders_distribution where o_orderpriority = '1-URGENT'";
-testExcludeFilter(query, 1, "Filter", 24);
+testExcludeFilter(query, 1, "Filter\\(", 24);
--- End diff --

It is no longer sufficient to match "Filter" because the test class name 
contains "Filter" and the test class name is used to create the tmp directory. 
And the fully qualified path of a queried file is included in the plan. We want 
to only match the Filter steps generated in the plan, not the 
Filters in our file paths. In order to do this I tell it to match 
"Filter(" which corresponds to a filter step in the plan.
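
The effect of switching the pattern from "Filter" to "Filter\\(" can be seen with plain java.util.regex. The plan text below is made up for illustration (it is not real Drill plan output), and count is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration of why "Filter\\(" is a safer plan pattern than "Filter".
class FilterPatternSketch {

  /** Counts non-overlapping matches of regex in text. */
  static int count(String regex, String text) {
    Matcher m = Pattern.compile(regex).matcher(text);
    int n = 0;
    while (m.find()) {
      n++;
    }
    return n;
  }

  public static void main(String[] args) {
    // Hypothetical plan: one filter step, plus a file path that happens to
    // contain the test class name "TestCTASPartitionFilter".
    String plan =
        "Filter(condition=[=($0, '1-URGENT')])\n"
      + "  Scan(files=[/tmp/TestCTASPartitionFilter/multilevel/parquet])";

    System.out.println(count("Filter", plan));    // also counts the path
    System.out.println(count("Filter\\(", plan)); // counts only the plan step
  }
}
```

The backslash is doubled because "(" is a regex metacharacter and the pattern is written as a Java string literal.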


> Make code generation in the TopN operator more modular and test it
> --
>
> Key: DRILL-5783
> URL: https://issues.apache.org/jira/browse/DRILL-5783
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>






[jira] [Commented] (DRILL-5783) Make code generation in the TopN operator more modular and test it

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219696#comment-16219696
 ] 

ASF GitHub Bot commented on DRILL-5783:
---

Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/984#discussion_r147009092
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TopN/TopNBatchTest.java
 ---
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.TopN;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.Random;
+
+import com.google.common.collect.Lists;
+import org.apache.drill.test.TestBuilder;
+import org.apache.drill.categories.OperatorTest;
+import org.apache.drill.common.config.DrillConfig;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.logical.data.Order;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.compile.ClassBuilder;
+import org.apache.drill.exec.compile.CodeCompiler;
+import org.apache.drill.exec.expr.fn.FunctionImplementationRegistry;
+import org.apache.drill.exec.memory.RootAllocator;
+import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
+import org.apache.drill.exec.pop.PopUnitTestBase;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.ExpandableHyperContainer;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.server.options.OptionSet;
+import org.apache.drill.test.ClientFixture;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.FixtureBuilder;
+import org.apache.drill.test.OperatorFixture;
+import org.apache.drill.test.BatchUtils;
+import org.apache.drill.test.DirTestWatcher;
+import org.apache.drill.test.rowSet.RowSetBuilder;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category(OperatorTest.class)
+public class TopNBatchTest extends PopUnitTestBase {
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(TopNBatchTest.class);
+
+  // Allows you to look at generated code after tests execute
+  @Rule
+  public DirTestWatcher dirTestWatcher = new DirTestWatcher(false);
+
+  /**
+   * Priority queue unit test.
+   * @throws Exception
+   */
+  @Test
+  public void priorityQueueOrderingTest() throws Exception {
+    Properties properties = new Properties();
+    properties.setProperty(ClassBuilder.CODE_DIR_OPTION, dirTestWatcher.getDirPath());
+
+    DrillConfig drillConfig = DrillConfig.create(properties);
+    OptionSet optionSet = new OperatorFixture.TestOptionSet();
+
+    FieldReference expr = FieldReference.getWithQuotedRef("colA");
+    Order.Ordering ordering = new Order.Ordering(Order.Ordering.ORDER_DESC, expr, Order.Ordering.NULLS_FIRST);
+    List<Order.Ordering> orderings = Lists.newArrayList(ordering);
+
+    MaterializedField colA = MaterializedField.create("colA", Types.required(TypeProtos.MinorType.INT));
+    MaterializedField colB = MaterializedField.create("colB", Types.required(TypeProtos.MinorType.INT));
+
+    List<MaterializedField> cols = Lists.newArrayList(colA, colB);
+    BatchSchema batchSchema = new BatchSchema(BatchSchema.SelectionVectorMode.NONE, cols);
+
+    try (RootAllocator allocator = new RootAllocator(100_000_000)) {
+      VectorContainer expectedVectors = new RowSetBuilder(allocator, batchSchema)
+        .add(110, 10)
+        .add(109, 9)
+        .add(108, 8)
+        .add(107, 7)
+        .add(106, 6)
+        .add(105, 5)
+        .add(104, 4)
+        .add(103, 3)
+        .add(102, 2

[jira] [Resolved] (DRILL-5898) Query returns columns in the wrong order

2017-10-25 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-5898.
---
Resolution: Fixed

Updated expected results file.

> Query returns columns in the wrong order
> 
>
> Key: DRILL-5898
> URL: https://issues.apache.org/jira/browse/DRILL-5898
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Robert Hou
>Priority: Blocker
> Fix For: 1.12.0
>
>
> This is a regression.  It worked with this commit:
> {noformat}
> f1d1945b3772bb782039fd6811e34a7de66441c8  DRILL-5582: C++ Client: [Threat 
> Modeling] Drillbit may be spoofed by an attacker and this may lead to data 
> being written to the attacker's target instead of Drillbit
> {noformat}
> It fails with this commit, although there are six commits total between the 
> last good one and this one:
> {noformat}
> b0c4e0486d6d4620b04a1bb8198e959d433b4840  DRILL-5876: Use openssl profile 
> to include netty-tcnative dependency with the platform specific classifier
> {noformat}
> Query is:
> {noformat}
> select * from 
> dfs.`/drill/testdata/tpch100_dir_partitioned_5files/lineitem` where 
> dir0=2006 and dir1=12 and dir2=15 and l_discount=0.07 order by l_orderkey, 
> l_extendedprice limit 10
> {noformat}
> Columns are returned in a different order.  Here are the expected results:
> {noformat}
> foxes. furiously final ideas cajol1994-05-27  0.071731.42 4   
> F   653442  4965666.0   1.0 1994-06-23  A   1994-06-22
>   NONESHIP215671  0.07200612  15 (1 time(s))
> lly final account 1994-11-09  0.0745881.783   F   
> 653412  1.320809E7  46.01994-11-24  R   1994-11-08  TAKE 
> BACK RETURNREG AIR 458104  0.08200612  15 (1 time(s))
>  the asymptotes   1997-12-29  0.0760882.8 6   O   653413  
> 1.4271413E7 44.01998-02-04  N   1998-01-20  DELIVER IN 
> PERSON   MAIL21456   0.05200612  15 (1 time(s))
> carefully a   1996-09-23  0.075381.88 2   O   653378  
> 1.6702792E7 3.0 1996-11-14  N   1996-10-15  NONEREG 
> AIR 952809  0.05200612  15 (1 time(s))
> ly final requests. boldly ironic theo 1995-09-04  0.072019.94 2   
> O   653380  2416094.0   2.0 1995-11-14  N   1995-10-18
>   COLLECT COD FOB 166101  0.02200612  15 (1 time(s))
> alongside of the even, e  1996-02-14  0.0786140.322   
> O   653409  5622872.0   48.01996-05-02  N   1996-04-22
>   NONESHIP372888  0.04200612  15 (1 time(s))
> es. regular instruct  1996-10-18  0.0725194.0 1   O   653382  
> 6048060.0   25.01996-08-29  N   1996-08-20  DELIVER IN 
> PERSON   AIR 798079  0.0 200612  15 (1 time(s))
> en package1993-09-19  0.0718718.322   F   653440  
> 1.372054E7  12.01993-09-12  A   1993-09-09  DELIVER IN 
> PERSON   TRUCK   970554  0.0 200612  15 (1 time(s))
> ly regular deposits snooze. unusual, even 1998-01-18  0.07
> 12427.921   O   653413  2822631.0   8.0 1998-02-09
>   N   1998-02-05  TAKE BACK RETURNREG AIR 322636  0.01
> 200612  15 (1 time(s))
>  ironic ideas. bra1996-10-13  0.0764711.533   O   
> 653383  6806672.0   41.01996-12-06  N   1996-11-10  TAKE 
> BACK RETURNAIR 556691  0.01200612  15 (1 time(s))
> {noformat}
> Here are the actual results:
> {noformat}
> 2006  12  15  653383  6806672 556691  3   41.064711.53
> 0.070.01N   O   1996-11-10  1996-10-13  1996-12-06
>   TAKE BACK RETURNAIR  ironic ideas. bra
> 2006  12  15  653378  16702792952809  2   3.0 5381.88 
> 0.070.05N   O   1996-10-15  1996-09-23  1996-11-14
>   NONEREG AIR carefully a
> 2006  12  15  653380  2416094 166101  2   2.0 2019.94 0.07
> 0.02N   O   1995-10-18  1995-09-04  1995-11-14  
> COLLECT COD FOB ly final requests. boldly ironic theo
> 2006  12  15  653413  2822631 322636  1   8.0 12427.92
> 0.070.01N   O   1998-02-05  1998-01-18  1998-02-09
>   TAKE BACK RETURNREG AIR ly regular deposits snooze. unusual, even 
> 2006  12  15  653382  6048060 798079  1   25.025194.0 0.07
> 0.0 N   O 

[jira] [Closed] (DRILL-5898) Query returns columns in the wrong order

2017-10-25 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou closed DRILL-5898.
-


[jira] [Commented] (DRILL-5887) Display process user/ groups in Drill UI

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219760#comment-16219760
 ] 

ASF GitHub Bot commented on DRILL-5887:
---

Github user prasadns14 commented on a diff in the pull request:

https://github.com/apache/drill/pull/998#discussion_r147018186
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/DrillRoot.java 
---
@@ -85,19 +86,24 @@ public ClusterInfo getClusterInfoJSON() {
 // For all other cases the user info need-not or should-not be displayed
 OptionManager optionManager = work.getContext().getOptionManager();
 final boolean isUserLoggedIn = AuthDynamicFeature.isUserLoggedIn(sc);
-String adminUsers = isUserLoggedIn ?
-        ExecConstants.ADMIN_USERS_VALIDATOR.getAdminUsers(optionManager) : null;
-String adminUserGroups = isUserLoggedIn ?
-        ExecConstants.ADMIN_USER_GROUPS_VALIDATOR.getAdminUserGroups(optionManager) : null;
+final String processUser = ImpersonationUtil.getProcessUserName();
+final String processUserGroups = Joiner.on(", ").join(ImpersonationUtil.getProcessUserGroupNames());
+String adminUsers = ExecConstants.ADMIN_USERS_VALIDATOR.getAdminUsers(optionManager);
+String adminUserGroups = ExecConstants.ADMIN_USER_GROUPS_VALIDATOR.getAdminUserGroups(optionManager);
 
 // separate groups by comma + space
-if (adminUsers != null) {
+if (adminUsers.length() == 0) {
--- End diff --

Made the changes. @arina-ielchiieva please review


> Display process user/ groups in Drill UI
> 
>
> Key: DRILL-5887
> URL: https://issues.apache.org/jira/browse/DRILL-5887
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>Assignee: Prasad Nagaraj Subramanya
>Priority: Minor
> Fix For: 1.12.0
>
>
> The Drill UI lists only the admin users/groups specified as options.
> We should also display the process user and its groups, which have admin privileges.
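
The "comma + space" formatting mentioned in the diff above can be sketched with the JDK alone. This is only an illustration: the class `UserListFormatter` and its `format` method are hypothetical names, not Drill code, and `System.getProperty("user.name")` stands in for whatever Drill uses to resolve the process user.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Drill. */
final class UserListFormatter {
  private UserListFormatter() {}

  /** Re-joins a comma-separated list using "comma + space", dropping blank entries. */
  static String format(String commaSeparated) {
    List<String> names = new ArrayList<>();
    if (commaSeparated != null) {
      for (String name : commaSeparated.split(",")) {
        String trimmed = name.trim();
        if (!trimmed.isEmpty()) {
          names.add(trimmed);
        }
      }
    }
    return String.join(", ", names);
  }

  public static void main(String[] args) {
    // Stand-in for the process user; Drill's own lookup may differ.
    System.out.println("process user: " + System.getProperty("user.name"));
    System.out.println("admin users:  " + format("alice,bob , carol"));
  }
}
```

An empty or null option value yields an empty string here, so a caller can test `isEmpty()` instead of checking for null.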



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5905) Exclude jdk-tools from project dependencies

2017-10-25 Thread Vlad Rozov (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Rozov updated DRILL-5905:
--
Reviewer: Paul Rogers

> Exclude jdk-tools from project dependencies
> ---
>
> Key: DRILL-5905
> URL: https://issues.apache.org/jira/browse/DRILL-5905
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Reporter: Vlad Rozov
>Assignee: Vlad Rozov
>Priority: Minor
>
> hadoop-annotations and hbase-annotations have a system-scope dependency on the JDK 
> tools.jar. This dependency is provided by the JDK and should be excluded from the 
> project dependencies.





[jira] [Commented] (DRILL-5896) Handle vector creation in HbaseRecordReader to avoid NullableInt vectors later

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219951#comment-16219951
 ] 

ASF GitHub Bot commented on DRILL-5896:
---

Github user prasadns14 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1005#discussion_r147042347
  
--- Diff: 
contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java
 ---
@@ -75,6 +75,8 @@
 
   private TableName hbaseTableName;
   private Scan hbaseScan;
+  private Scan hbaseScan1;
+  Set completeFamilies;
--- End diff --

Fixed


> Handle vector creation in HbaseRecordReader to avoid NullableInt vectors later
> --
>
> Key: DRILL-5896
> URL: https://issues.apache.org/jira/browse/DRILL-5896
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HBase
>Affects Versions: 1.11.0
>Reporter: Prasad Nagaraj Subramanya
>Assignee: Prasad Nagaraj Subramanya
> Fix For: 1.12.0
>
>
> When an HBase query projects both a column family and a column in that column 
> family, the vector for the column is not created in the HBaseRecordReader.
> So, in cases where the scan batch is empty, we create a NullableInt vector for 
> this column. We need to handle column creation in the reader.
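
A rough model of the behaviour the description asks for, assuming projected paths arrive as plain strings such as `f` (whole family) and `f.c1` (one column). `ProjectionPlan` and its methods are hypothetical and use no HBase or Drill classes. The point is only that a named column is remembered even when its family is also projected in full, so its vector can be created before any rows arrive.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of projection handling; not the HBaseRecordReader itself. */
final class ProjectionPlan {
  /** Families projected as a whole, e.g. "f". */
  final Set<String> completeFamilies = new LinkedHashSet<>();
  /** Explicitly named columns keyed by family, e.g. "f" -> {"c1"}. */
  final Map<String, Set<String>> namedColumns = new LinkedHashMap<>();

  static ProjectionPlan of(List<String> projectedPaths) {
    ProjectionPlan plan = new ProjectionPlan();
    for (String path : projectedPaths) {
      int dot = path.indexOf('.');
      if (dot < 0) {
        plan.completeFamilies.add(path);
      } else {
        String family = path.substring(0, dot);
        String qualifier = path.substring(dot + 1);
        // Record the column regardless of whether the family is also complete.
        plan.namedColumns.computeIfAbsent(family, k -> new LinkedHashSet<>()).add(qualifier);
      }
    }
    return plan;
  }

  /** Names of the vectors to create up front: one per family, plus one per named column. */
  List<String> vectorsToCreate() {
    Set<String> families = new LinkedHashSet<>(completeFamilies);
    families.addAll(namedColumns.keySet());
    List<String> vectors = new ArrayList<>();
    for (String family : families) {
      vectors.add(family);
      for (String qualifier : namedColumns.getOrDefault(family, Collections.emptySet())) {
        vectors.add(family + "." + qualifier);
      }
    }
    return vectors;
  }
}
```

For the projection `f, f.c1`, this model lists both `f` and `f.c1`, which is the case the issue says is currently missed.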





[jira] [Commented] (DRILL-5896) Handle vector creation in HbaseRecordReader to avoid NullableInt vectors later

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219953#comment-16219953
 ] 

ASF GitHub Bot commented on DRILL-5896:
---

Github user prasadns14 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1005#discussion_r147042366
  
--- Diff: 
contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java
 ---
@@ -121,16 +125,18 @@ public HBaseRecordReader(Connection connection, HBaseSubScan.HBaseSubScanSpec su
 byte[] family = root.getPath().getBytes();
 transformed.add(SchemaPath.getSimplePath(root.getPath()));
 PathSegment child = root.getChild();
-if (!completeFamilies.contains(new String(family, StandardCharsets.UTF_8).toLowerCase())) {
-  if (child != null && child.isNamed()) {
-byte[] qualifier = child.getNameSegment().getPath().getBytes();
+if (child != null && child.isNamed()) {
+  byte[] qualifier = child.getNameSegment().getPath().getBytes();
+  hbaseScan1.addColumn(family, qualifier);
+  if (!completeFamilies.contains(new String(family, StandardCharsets.UTF_8))) {
--- End diff --

Fixed




[jira] [Commented] (DRILL-5896) Handle vector creation in HbaseRecordReader to avoid NullableInt vectors later

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219952#comment-16219952
 ] 

ASF GitHub Bot commented on DRILL-5896:
---

Github user prasadns14 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1005#discussion_r147042356
  
--- Diff: 
contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java
 ---
@@ -87,6 +89,7 @@ public HBaseRecordReader(Connection connection, HBaseSubScan.HBaseSubScanSpec su
 hbaseTableName = TableName.valueOf(
 Preconditions.checkNotNull(subScanSpec, "HBase reader needs a sub-scan spec").getTableName());
 hbaseScan = new Scan(subScanSpec.getStartRow(), subScanSpec.getStopRow());
+hbaseScan1 = new Scan();
--- End diff --

Fixed




[jira] [Created] (DRILL-5909) need new JMX metrics for (FAILED and CANCELED) queries

2017-10-25 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-5909:
-

 Summary: need new JMX metrics for (FAILED and CANCELED) queries
 Key: DRILL-5909
 URL: https://issues.apache.org/jira/browse/DRILL-5909
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Monitoring
Affects Versions: 1.11.0, 1.12.0
Reporter: Khurram Faraaz


We have these JMX metrics today:

{noformat}
drill.queries.running
drill.queries.completed
{noformat}

We need these new JMX metrics:

{noformat}
drill.queries.failed
drill.queries.canceled
{noformat}
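
As a rough idea of what the requested counters could look like, here is a JDK-only sketch that registers one standard MBean per terminal query state. Everything in it is an assumption made for illustration: the class `QueryMetricsSketch`, the `drill.queries:name=...` object names, and the `Count` attribute. Drill's existing metrics may be wired up differently, so this is not the project's implementation.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical sketch of failed/canceled query counters exposed over JMX. */
final class QueryMetricsSketch {
  private QueryMetricsSketch() {}

  /** Standard MBean interface: getCount() is exposed as the read-only attribute "Count". */
  public interface CounterMBean {
    long getCount();
  }

  /** Thread-safe counter; its name plus "MBean" matches the interface name, as JMX requires. */
  public static final class Counter implements CounterMBean {
    private final AtomicLong count = new AtomicLong();

    public void increment() {
      count.incrementAndGet();
    }

    @Override
    public long getCount() {
      return count.get();
    }
  }

  // Assumed naming scheme: domain "drill.queries", key "name" = query state.
  private static ObjectName nameFor(String state) throws JMException {
    return new ObjectName("drill.queries:name=" + state);
  }

  /** Registers a fresh counter for the given state, replacing any earlier registration. */
  static Counter register(String state) {
    try {
      MBeanServer server = ManagementFactory.getPlatformMBeanServer();
      ObjectName name = nameFor(state);
      if (server.isRegistered(name)) {
        server.unregisterMBean(name);
      }
      Counter counter = new Counter();
      server.registerMBean(counter, name);
      return counter;
    } catch (JMException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Reads the counter back through the MBean server, as a JMX client would. */
  static long read(String state) {
    try {
      MBeanServer server = ManagementFactory.getPlatformMBeanServer();
      return (Long) server.getAttribute(nameFor(state), "Count");
    } catch (JMException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

In this sketch, the code path that moves a query to FAILED or CANCELED would call `increment()` on the matching counter.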





[jira] [Commented] (DRILL-5896) Handle vector creation in HbaseRecordReader to avoid NullableInt vectors later

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219954#comment-16219954
 ] 

ASF GitHub Bot commented on DRILL-5896:
---

Github user prasadns14 commented on a diff in the pull request:

https://github.com/apache/drill/pull/1005#discussion_r147042924
  
--- Diff: 
contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java
 ---
@@ -186,6 +192,10 @@ public void setup(OperatorContext context, OutputMutator output) throws Executio
   }
 }
   }
+
+  for (String familyName : completeFamilies) {
+getOrCreateFamilyVector(familyName, false);
+  }
--- End diff --

It creates only the map vector




[jira] [Updated] (DRILL-5211) Queries fail due to direct memory fragmentation

2017-10-25 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5211:
---
Attachment: BatchSizeControl-AsBuilt.pdf

Attached an updated, "as-built" version of the design specification. This 
outlines the major components of the solution from the vector accessor layer 
through the result set loader and scan operator levels. The sections on other 
operators are still in the design stage.

> Queries fail due to direct memory fragmentation
> ---
>
> Key: DRILL-5211
> URL: https://issues.apache.org/jira/browse/DRILL-5211
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Attachments: ApacheDrillMemoryFragmentationBackground.pdf, 
> ApacheDrillVectorSizeLimits.pdf, BatchSizeControl-AsBuilt.pdf, 
> EnhancedScanOperator.pdf, ScanSchemaManagement.pdf
>
>
> Consider a test of the external sort as follows:
> * Direct memory: 3GB
> * Input file: 18 GB, with one Varchar column of 8K width
> The sort runs, spilling to disk. Once all data arrives, the sort begins to 
> merge the results. But, to do that, it must first do an intermediate merge. 
> For example, in this sort, there are 190 spill files, but only 19 can be 
> merged at a time. (Each merge file contains 128 MB batches, and only 19 can 
> fit in memory, giving a total footprint of 2.5 GB, well below the 3 GB limit.)
> Yet, when loading batch xx, Drill fails with an OOM error. At that point, 
> total available direct memory is 3,817,865,216. (Obtained from {{maxMemory}} 
> in the {{Bits}} class in the JDK.)
> It appears that Drill wants to allocate 58,257,868 bytes, but the 
> {{totalCapacity}} (again in {{Bits}}) is already 3,800,769,206, causing an 
> OOM.
> The problem is that, at this point, the external sort should not ask the 
> system for more memory. The allocator for the external sort is at just 
> 1,192,350,366 before the allocation request. Plenty of spare memory should be 
> available, released when the in-memory batches were spilled to disk prior to 
> merging. Indeed, earlier in the run, the sort had reached a peak memory usage 
> of 2,710,716,416 bytes. This memory should be available for reuse during 
> merging, and is plenty sufficient to fill the particular request in question.
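
The numbers in the description can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above. The class `DirectMemoryCheck` and its method names are hypothetical, and the comparison in `jvmWouldReject` is a simplified reading of the JDK's direct-memory limit check, not a copy of it.

```java
/** Hypothetical arithmetic check of the figures quoted in the issue description. */
final class DirectMemoryCheck {
  private DirectMemoryCheck() {}

  // All values are in bytes and come from the description.
  static final long MAX_DIRECT_MEMORY = 3_817_865_216L;     // maxMemory in Bits
  static final long TOTAL_CAPACITY = 3_800_769_206L;        // totalCapacity in Bits
  static final long REQUEST = 58_257_868L;                  // allocation Drill asks for
  static final long SORT_ALLOCATOR_IN_USE = 1_192_350_366L; // sort allocator before the request
  static final long SORT_ALLOCATOR_PEAK = 2_710_716_416L;   // earlier peak of the sort

  /** True when the request would push system-wide direct memory past the limit. */
  static boolean jvmWouldReject() {
    return TOTAL_CAPACITY + REQUEST > MAX_DIRECT_MEMORY;
  }

  /** Bytes by which the request overshoots the limit. */
  static long shortfall() {
    return TOTAL_CAPACITY + REQUEST - MAX_DIRECT_MEMORY;
  }

  /** Memory the sort once held but no longer uses; ideally it would be reused. */
  static long idleBelowPeak() {
    return SORT_ALLOCATOR_PEAK - SORT_ALLOCATOR_IN_USE;
  }

  /** Number of intermediate merges needed to consume all spill files once. */
  static int firstPassMerges(int spillFiles, int mergeWidth) {
    return (spillFiles + mergeWidth - 1) / mergeWidth;
  }

  public static void main(String[] args) {
    System.out.println("JVM would reject: " + jvmWouldReject());
    System.out.println("shortfall (bytes): " + shortfall());
    System.out.println("idle below sort peak (bytes): " + idleBelowPeak());
    System.out.println("first-pass merges: " + firstPassMerges(190, 19));
  }
}
```

The request overshoots the system-wide limit by 41,161,858 bytes, while the sort's own allocator sits 1,518,366,050 bytes below its earlier peak. That gap is why the description argues the sort should not need to ask the system for more memory.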


