[jira] [Updated] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-27 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-5258:

Affects Version/s: (was: 1.10)
   1.10.0

> Allow "extended" mock tables access from SQL queries
> 
>
> Key: DRILL-5258
> URL: https://issues.apache.org/jira/browse/DRILL-5258
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> DRILL-5152 provided a simple way to generate sample data in SQL using a new, 
> simplified version of the mock data generator. This approach is very 
> convenient, but is inherently limited. For example, the limited syntax 
> available in SQL cannot encode much information about columns, such as 
> repeat count, data generator, and so on. The simple SQL approach also does not 
> allow generating multiple groups of data.
> However, all these features are present in the original mock data source via 
> a special JSON configuration file. Previously, only physical plans could 
> access that extended syntax.
> This ticket requests a SQL interface to the extended mock data source:
> {code}
> SELECT * FROM `mock`.`example/mock-options.json`
> {code}
> Mock data source options are always stored as a JSON file. Since the existing 
> mock data generator for SQL never uses JSON files, a simple rule applies: if 
> the table name ends in ".json", it is a specification file; otherwise the 
> information is encoded in the table and column names.
> The format of the data generation syntax is documented in the mock data 
> source classes.
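As a rough illustration, such a specification file might look like the sketch below. The field names here are purely hypothetical; the actual schema is defined by the mock data source classes, as the ticket notes.

```json
{
  "comment": "Hypothetical mock-options.json; real field names live in the mock data source classes",
  "columns": [
    { "name": "id",   "type": "INT",     "generator": "sequence" },
    { "name": "name", "type": "VARCHAR", "width": 20, "repeat": 5 }
  ],
  "rowCount": 10000
}
```

With such a file placed at `example/mock-options.json`, the SQL form above would select the generated rows.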



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5258) Allow "extended" mock tables access from SQL queries

2017-02-27 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-5258:

Fix Version/s: (was: 1.10)
   1.10.0

> Allow "extended" mock tables access from SQL queries
> 
>
> Key: DRILL-5258
> URL: https://issues.apache.org/jira/browse/DRILL-5258
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5287) Provide option to skip updates of ephemeral state changes in Zookeeper

2017-02-27 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua updated DRILL-5287:

Fix Version/s: (was: 1.10)
   1.10.0

> Provide option to skip updates of ephemeral state changes in Zookeeper
> --
>
> Key: DRILL-5287
> URL: https://issues.apache.org/jira/browse/DRILL-5287
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.10.0
>
>
> We put transient profiles in ZooKeeper and update the state as a query 
> progresses and changes states. It is observed that this adds latency of 
> ~45 msec for each update in the query execution path. This gets even worse 
> when a high number of concurrent queries is in progress. For concurrency=100, 
> the average query response time even for short queries is 8 sec vs 0.2 sec 
> with these updates disabled. For short-lived queries in a high-throughput 
> scenario, there is no value in updating state changes in ZooKeeper. We need 
> an option to disable these updates for short-running operational queries.
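The requested behavior amounts to a guard around the ZooKeeper write. The sketch below illustrates the idea only; the option name and surrounding code are assumptions, not Drill's actual API.

```java
// Minimal sketch: count how many ZooKeeper writes would occur with a
// (hypothetical) transient-state option on vs. off. Each skipped write
// saves the ~45 msec round trip described in the ticket.
public class EphemeralUpdateSketch {
    static int zkWrites(boolean updateTransientState, String[] queryStates) {
        int writes = 0;
        for (String ignored : queryStates) {
            if (updateTransientState) {
                writes++; // stand-in for the real transient-store put(...)
            }
        }
        return writes;
    }

    public static void main(String[] args) {
        String[] states = {"STARTING", "RUNNING", "COMPLETED"};
        System.out.println(zkWrites(true, states));  // 3
        System.out.println(zkWrites(false, states)); // 0
    }
}
```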



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-462) Periodic connection failure with Drillbit

2017-02-27 Thread prasann...@trinitymobility.com (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

prasann...@trinitymobility.com updated DRILL-462:
-

Currently I am using Apache Drill 1.9.0 and am now facing the same issue: my
Java application uses Drill for queries, but it periodically disconnects from
the drillbit service after a few query executions. If I run jps, it shows the
drillbit service is running. I cannot find the cause; am I missing anything in
the flow? I am getting logs such as:

 

ERROR 2017-02-28 10:12 (http-nio-8098-exec-5) org.trinity.social.dao.SocialDataDaoImpl  StatementCallback; uncategorized SQLException for SQL [SELECT source, sentiment, COUNT(*) AS data_count FROM dfs.tmp.social_data_nlp WHERE created_time > DATE_SUB(CURRENT_DATE, 1) AND keyword_search = true AND ( searched_keyword = 'fire' OR searched_keyword = 'acciden' OR user_name = 'aajtak') GROUP BY source, sentiment]; SQL state [null]; error code [0]; SYSTEM ERROR: RetriesExhaustedException: Can't get the locations

[Error Id: 9f6ccd37-ee0f-47b4-9a6b-4eeb4456d542 on trinitybdClusterM02.trinitymobility.local:31010]; nested exception is java.sql.SQLException: SYSTEM ERROR: RetriesExhaustedException: Can't get the locations

[Error Id: 9f6ccd37-ee0f-47b4-9a6b-4eeb4456d542 on trinitybdClusterM02.trinitymobility.local:31010]

org.springframework.jdbc.UncategorizedSQLException: StatementCallback; uncategorized SQLException for SQL [SELECT source, sentiment, COUNT(*) AS data_count FROM dfs.tmp.social_data_nlp WHERE created_time > DATE_SUB(CURRENT_DATE, 1) AND keyword_search = true AND ( searched_keyword = 'fire' OR searched_keyword = 'acciden' OR user_name = 'aajtak') GROUP BY source, sentiment]; SQL state [null]; error code [0]; SYSTEM ERROR: RetriesExhaustedException: Can't get the locations

[Error Id: 9f6ccd37-ee0f-47b4-9a6b-4eeb4456d542 on trinitybdClusterM02.trinitymobility.local:31010]; nested exception is java.sql.SQLException: SYSTEM ERROR: RetriesExhaustedException: Can't get the locations

[Error Id: 9f6ccd37-ee0f-47b4-9a6b-4eeb4456d542 on trinitybdClusterM02.trinitymobility.local:31010]

    at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:84) ~[spring-jdbc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81) ~[spring-jdbc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81) ~[spring-jdbc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:419) ~[spring-jdbc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:474) ~[spring-jdbc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:484) ~[spring-jdbc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.jdbc.core.JdbcTemplate.queryForList(JdbcTemplate.java:510) ~[spring-jdbc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.trinity.social.dao.SocialDataDaoImpl.getAllDataByKeywordCount(SocialDataDaoImpl.java:98) [classes/:?]
    at org.trinity.social.service.SpatialDataFilterServiceImpl.getDataByKeywords(SpatialDataFilterServiceImpl.java:48) [classes/:?]
    at org.trinity.social.controller.SocialDataController.getKeywordData(SocialDataController.java:26) [classes/:?]
    at sun.reflect.GeneratedMethodAccessor106.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_66]
    at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_66]
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:220) [spring-web-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:134) [spring-web-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:116) [spring-webmvc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827) [spring-webmvc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738) [spring-webmvc-4.3.5.RELEASE.jar:4.3.5.RELEASE]
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.hand

[jira] [Commented] (DRILL-5290) Provide an option to build operator table once for built-in static functions and reuse it across queries.

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887206#comment-15887206
 ] 

ASF GitHub Bot commented on DRILL-5290:
---

Github user sudheeshkatkam commented on the issue:

https://github.com/apache/drill/pull/757
  
+1

Please squash commits, and open a ticket for the enhancement you mentioned.


> Provide an option to build operator table once for built-in static functions 
> and reuse it across queries.
> -
>
> Key: DRILL-5290
> URL: https://issues.apache.org/jira/browse/DRILL-5290
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>  Labels: doc-impacting
> Fix For: 1.10.0
>
>
> Currently, DrillOperatorTable, which contains standard SQL operators and 
> functions as well as Drill User Defined Functions (UDFs, built-in and dynamic), 
> gets built for each query as part of creating QueryContext. This is an expensive 
> operation (~30 msec to build) and allocates ~2M on heap for each query. For 
> high-throughput, low-latency operational queries, we quickly run out of heap 
> memory, causing JVM hangs. Build the operator table once during startup for 
> static built-in functions and save it in DrillbitContext, so we can reuse it 
> across queries.
> Provide a system/session option to not use dynamic UDFs so we can use the 
> operator table saved in DrillbitContext and avoid building it each time.
> *Please note, the changes add a new option, exec.udf.use_dynamic, which needs 
> to be documented.*
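The build-once-and-reuse pattern can be sketched as below. This is an illustration only: the class, names, and option handling are assumptions, not Drill's actual implementation.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch: build the static operator table once and share it; fall back to a
// per-query build when dynamic UDFs are enabled (mirroring the proposed
// exec.udf.use_dynamic option).
public class OperatorTableCacheSketch {
    static final AtomicReference<Object> SHARED = new AtomicReference<>();

    static Object buildTable() {
        return new Object(); // stand-in for the ~30 msec, ~2M-heap table build
    }

    static Object tableFor(boolean useDynamicUdfs) {
        if (useDynamicUdfs) {
            return buildTable(); // per-query build, as today
        }
        Object t = SHARED.get();
        if (t == null) {
            SHARED.compareAndSet(null, buildTable()); // first caller wins
            t = SHARED.get();
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(tableFor(false) == tableFor(false)); // true: reused
        System.out.println(tableFor(true) == tableFor(true));   // false: rebuilt
    }
}
```

The thread-safe publish via `AtomicReference` matters here because many queries may request the shared table concurrently.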



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5293) Poor performance of Hash Table due to same hash value as distribution below

2017-02-27 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5293:

Reviewer: Chunhui Shi

Assigned Reviewer to [~cshi]

> Poor performance of Hash Table due to same hash value as distribution below
> ---
>
> Key: DRILL-5293
> URL: https://issues.apache.org/jira/browse/DRILL-5293
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.8.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
>
> The computation of the hash value is basically the same whether for the Hash 
> Table (used by Hash Agg and Hash Join) or for the distribution of rows at the 
> exchange. As a result, a specific Hash Table (in a parallel minor fragment) 
> gets only rows "filtered out" by the partition below ("upstream"), so the 
> pattern of this filtering leads to non-uniform usage of the hash buckets in 
> the table.
>   Here is a simplified example: an exchange partitions into TWO minor 
> fragments, each running a Hash Agg, so the partition sends rows of EVEN hash 
> values to the first and rows of ODD hash values to the second. The first then 
> recomputes the _same_ hash value for its Hash Table -- and only the even 
> buckets get used! (With a partition into EIGHT, possibly only one 
> eighth of the buckets would be used.)
>   This leads to longer hash chains and thus _poor performance_.
> A possible solution: add a distribution function distFunc (only for 
> partitioning) that takes the hash value and "scrambles" it so that the 
> entropy in all the bits affects the low bits of the output. This function 
> should be applied (in HashPrelUtil) over the generated code that produces the 
> hash value, like:
>    distFunc( hash32(field1, hash32(field2, hash32(field3, 0))) );
> Tested with a huge hash aggregate (64M rows) and a parallelism of 8 
> (planner.width.max_per_node = 8); minor fragments 0 and 4 used only 1/8 of 
> their buckets, the others used 1/4 of their buckets. Maybe the reason for 
> this variance is that distribution uses "hash32AsDouble" while hash agg 
> uses "hash32".
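A "scramble" of the kind described could resemble a murmur3-style finalizer, which mixes high-bit entropy into the low bits. This is an illustrative sketch only, not the distFunc Drill actually adopted.

```java
// Illustrative murmur3-style finalizer: an invertible mixing function, so
// distinct hashes stay distinct, but values that agree in their low bits
// (e.g. all-even hashes arriving at one minor fragment) no longer do.
public class DistFuncSketch {
    static int distFunc(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        // All inputs even (low bit 0), as one of two minor fragments would see.
        int oddOutputs = 0;
        for (int i = 2; i <= 200; i += 2) {
            oddOutputs += distFunc(i) & 1;
        }
        // Typically close to half the outputs are odd, so both even and odd
        // buckets get used after scrambling.
        System.out.println(oddOutputs);
    }
}
```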



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5287) Provide option to skip updates of ephemeral state changes in Zookeeper

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887108#comment-15887108
 ] 

ASF GitHub Bot commented on DRILL-5287:
---

Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/758#discussion_r103364711
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryManager.java
 ---
@@ -280,8 +281,15 @@ public void interrupted(final InterruptedException ex) {
 }
   }
 
-  QueryState updateEphemeralState(final QueryState queryState) {
-switch (queryState) {
+  void updateEphemeralState(final QueryState queryState) {
+  // If query is already in zk transient store, ignore the transient state update option.
+  // Else, they will not be removed from transient store upon completion.
+  if (transientProfiles.get(stringQueryId) == null &&
--- End diff --

Why not just check the option?

`transientProfiles.get(stringQueryId)` is quite expensive itself ([contacts 
ZooKeeper and deserializes 
data](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZkEphemeralStore.java#L61)).


> Provide option to skip updates of ephemeral state changes in Zookeeper
> --
>
> Key: DRILL-5287
> URL: https://issues.apache.org/jira/browse/DRILL-5287
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.10
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5221) cancel message is delayed until queryid or data is received

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887012#comment-15887012
 ] 

ASF GitHub Bot commented on DRILL-5221:
---

Github user vkorukanti commented on the issue:

https://github.com/apache/drill/pull/733
  
LGTM, +1. We may end up sending CANCEL twice to server, but the server 
already has state management, so should be fine.


> cancel message is delayed until queryid or data is received
> ---
>
> Key: DRILL-5221
> URL: https://issues.apache.org/jira/browse/DRILL-5221
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - C++
>Affects Versions: 1.9.0
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>
> When a user calls the cancel method of the C++ client, the client waits for 
> a message from the server before replying with a cancellation message.
> For queries that take a long time to return batch results, this means the 
> cancellation won't take effect until the next batch is received, instead of 
> cancelling the query right away (assuming the query id has already been 
> received, which is generally the case).
> It seems this was foreseen by [~vkorukanti] in his initial patch 
> (https://github.com/vkorukanti/drill/commit/e0ef6349aac48de5828b6d725c2cf013905d18eb)
>  but was omitted when I backported it post metadata changes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5167) C++ connector does not set escape string for metadata search pattern

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886964#comment-15886964
 ] 

ASF GitHub Bot commented on DRILL-5167:
---

Github user vkorukanti commented on the issue:

https://github.com/apache/drill/pull/712
  
+1


> C++ connector does not set escape string for metadata search pattern
> 
>
> Key: DRILL-5167
> URL: https://issues.apache.org/jira/browse/DRILL-5167
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>Priority: Minor
>
> The C++ connector does not set the escape string for search patterns when doing 
> metadata operations (getCatalogs/getSchema/getTables/getColumns). The escape is 
> assumed to be '\\', as returned by DrillMetadata::getSearchEscapeString(), but 
> because it is not sent over the wire, the server will actually consider 
> that there is no escape character, and might return different results, or 
> none, compared to what was requested.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886821#comment-15886821
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/761
  
Fixed code review comments. Rebased on latest master.


> Roll-up of final fixes for managed sort
> ---
>
> Key: DRILL-5284
> URL: https://issues.apache.org/jira/browse/DRILL-5284
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> The managed external sort was introduced in DRILL-5080. Since that time, 
> extensive testing has identified a number of minor fixes and improvements. 
> Given the long PR cycles, it is not practical to spend a week or two to do a 
> PR for each fix individually. This ticket represents a roll-up of a 
> combination of a number of fixes. Small fixes are listed here, larger items 
> appear as sub-tasks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886810#comment-15886810
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103332364
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -392,22 +448,31 @@ private void configure(DrillConfig config) {
 // Set too large and the ratio between memory and input data sizes becomes
 // small. Set too small and disk seek times dominate performance.
 
-spillBatchSize = config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
-spillBatchSize = Math.max(spillBatchSize, MIN_SPILL_BATCH_SIZE);
+preferredSpillBatchSize = config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
+
+// In low memory, use no more than 1/4 of memory for each spill batch. Ensures we
+// can merge.
+
+preferredSpillBatchSize = Math.min(preferredSpillBatchSize, memoryLimit / 4);
--- End diff --

In low memory conditions, restrict the spill batch size to 1/4 of memory. 
Why?

* We need to accumulate at least 2 such batches to do a merge. (Now at 1/2 
of memory.)
* We need to create an output batch from the two inputs (3/4 of memory).
* Need overhead for other direct memory uses. (Remaining 1/4 of memory.)

Sadly, memory management in Drill is not very precise: batch sizes can't be 
predicted with any accuracy. Trying to use, say, 1/3 of memory for the spill 
batch would seem more logical. (Two batches into the merge, one out), but the 
allocator issues a fatal error if we guess wrong by even one byte. So, we are 
forced to be conservative.

If we had better control, or a more forgiving allocator, we could make 
different choices.

Also, why try to sort GBs of data in 20 MB? Yet, this is the test case that 
had to be solved and that this particular fix enables.

I'm open to suggestions for better solutions; this is a very tricky area...
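The 1/4-of-memory reasoning above can be checked with a standalone sketch of the two clamping lines from the diff; the concrete numbers below are assumptions for illustration.

```java
// Worked example of the spill-batch clamping: cap the configured size at 1/4
// of the sort's memory budget (room for two inputs plus an output batch plus
// overhead), then floor it to avoid tiny, thrashing spill batches.
public class SpillClampSketch {
    static long clamp(long configured, long memoryLimit, long minSpillBatch) {
        long preferred = Math.min(configured, memoryLimit / 4); // leave room to merge
        return Math.max(preferred, minSpillBatch);              // avoid thrashing
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        // 20 MB budget, 8 MB configured batch, 256 KB floor (assumed values):
        System.out.println(clamp(8 * mb, 20 * mb, 256 * 1024) / mb); // 5
    }
}
```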


> Roll-up of final fixes for managed sort
> ---
>
> Key: DRILL-5284
> URL: https://issues.apache.org/jira/browse/DRILL-5284
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886819#comment-15886819
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103330903
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/SpillSet.java
 ---
@@ -357,9 +393,13 @@ public SpillSet(FragmentContext context, 
PhysicalOperator popConfig) {
 } else {
   fileManager = new HadoopFileManager(spillFs);
 }
-FragmentHandle handle = context.getHandle();
-spillDirName = String.format("%s_major%s_minor%s_op%s", 
QueryIdHelper.getQueryId(handle.getQueryId()),
-handle.getMajorFragmentId(), handle.getMinorFragmentId(), 
popConfig.getOperatorId());
+spillDirName = String.format(
--- End diff --

Either format is fine. Go ahead and overwrite this change with your 
preferred format when you commit your work.


> Roll-up of final fixes for managed sort
> ---
>
> Key: DRILL-5284
> URL: https://issues.apache.org/jira/browse/DRILL-5284
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886816#comment-15886816
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103332807
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -765,12 +838,12 @@ private void processBatch() {
   spillFromMemory();
 }
 
-// Sanity check. We should now be above the spill point.
+// Sanity check. We should now be below the buffer memory maximum.
 
 long startMem = allocator.getAllocatedMemory();
-if (memoryLimit - startMem < spillPoint) {
-  logger.error( "ERROR: Failed to spill below the spill point. Spill point = {}, free memory = {}",
-spillPoint, memoryLimit - startMem);
+if (startMem > bufferMemoryPool) {
+  logger.error( "ERROR: Failed to spill above buffer limit. Buffer pool = {}, memory = {}",
+  bufferMemoryPool, startMem);
--- End diff --

We could. But, at this point, it is a potential problem, not a real one. 
Maybe the input has no rows; we won't overflow memory. Maybe just one or two 
rows and we'll be fine.

This warning says, "if we continue as we are now, and we have large amounts 
of data, we'll run off the rails." It helps explain any later OOM error that 
occurs.


> Roll-up of final fixes for managed sort
> ---
>
> Key: DRILL-5284
> URL: https://issues.apache.org/jira/browse/DRILL-5284
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886811#comment-15886811
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103332576
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -392,22 +448,31 @@ private void configure(DrillConfig config) {
 // Set too large and the ratio between memory and input data sizes becomes
 // small. Set too small and disk seek times dominate performance.
 
-spillBatchSize = config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
-spillBatchSize = Math.max(spillBatchSize, MIN_SPILL_BATCH_SIZE);
+preferredSpillBatchSize = config.getBytes(ExecConstants.EXTERNAL_SORT_SPILL_BATCH_SIZE);
+
+// In low memory, use no more than 1/4 of memory for each spill batch. Ensures we
+// can merge.
+
+preferredSpillBatchSize = Math.min(preferredSpillBatchSize, memoryLimit / 4);
+
+// But, the spill batch should be above some minimum size to prevent complete
+// thrashing.
+
+preferredSpillBatchSize = Math.max(preferredSpillBatchSize, MIN_SPILL_BATCH_SIZE);
--- End diff --

Done later when we tally up memory needs and compare it to available 
memory. We issue an error log message if memory overflow is likely. This way, 
when a query fails, we can look at the log to see that we knew it would fail 
due to low memory limits.


> Roll-up of final fixes for managed sort
> ---
>
> Key: DRILL-5284
> URL: https://issues.apache.org/jira/browse/DRILL-5284
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886812#comment-15886812
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r10813
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -948,50 +1027,50 @@ private void updateMemoryEstimates(long memoryDelta, RecordBatchSizer sizer) {
 // spill batches of either 64K records, or as many records as fit into the
 // amount of memory dedicated to each spill batch, whichever is less.
 
-spillBatchRowCount = (int) Math.max(1, spillBatchSize / estimatedRowWidth);
+spillBatchRowCount = (int) Math.max(1, preferredSpillBatchSize / estimatedRowWidth / 2);
--- End diff --

Yes. Another wonderful Drill artifact. Suppose we have 1023 bytes of data: 
we will allocate a vector of 1024 bytes. Suppose we have 1025 bytes of data 
(just 0.2% more): we allocate a vector of 2048 bytes.

Now, we could be more conservative and assume that, on average, each vector 
will be 3/4 full, so we should use a factor of 1.5 for the calcs. We can file a 
JIRA and experiment with this change as a future enhancement.

It would also help if the allocator didn't kill the query if we allocate 
even one extra byte. But, since math errors are fatal, we are 
super-conservative for now.
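The rounding behavior described above can be sketched as a next-power-of-two function (the method name here is assumed, not Drill's allocator API):

```java
// Next-power-of-two rounding: 1023 bytes -> 1024, but 1025 bytes -> 2048,
// which is why a vector may end up barely half full and why the code divides
// the preferred spill batch size by 2 to stay on the safe side.
public class VectorRoundingSketch {
    static long roundedAlloc(long bytes) {
        if (bytes <= 1) {
            return 1;
        }
        return Long.highestOneBit(bytes - 1) << 1; // smallest power of two >= bytes
    }

    public static void main(String[] args) {
        System.out.println(roundedAlloc(1023)); // 1024
        System.out.println(roundedAlloc(1024)); // 1024
        System.out.println(roundedAlloc(1025)); // 2048
    }
}
```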


> Roll-up of final fixes for managed sort
> ---
>
> Key: DRILL-5284
> URL: https://issues.apache.org/jira/browse/DRILL-5284
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886815#comment-15886815
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103335045
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -1231,52 +1308,44 @@ private boolean consolidateBatches() {
* This method spills only half the accumulated batches
* minimizing unnecessary disk writes. The exact count must lie between
* the minimum and maximum spill counts.
-*/
+   */
 
   private void spillFromMemory() {
 
 // Determine the number of batches to spill to create a spill file
 // of the desired size. The actual file size might be a bit larger
 // or smaller than the target, which is expected.
 
-long estSize = 0;
 int spillCount = 0;
+long spillSize = 0;
 for (InputBatch batch : bufferedBatches) {
-  estSize += batch.getDataSize();
-  if (estSize > spillFileSize) {
-break; }
+  long batchSize = batch.getDataSize();
+  spillSize += batchSize;
   spillCount++;
+  if (spillSize + batchSize / 2 > spillFileSize) {
+break; }
 }
 
-// Should not happen, but just to be sure...
+// Must always spill at least 2, even if this creates an over-size
+// spill file.
 
-if (spillCount == 0) {
-  return; }
+spillCount = Math.max(spillCount, 2);
 
 // Do the actual spill.
 
-logger.trace("Starting spill from memory. Memory = {}, Buffered batch 
count = {}, Spill batch count = {}",
- allocator.getAllocatedMemory(), bufferedBatches.size(), 
spillCount);
 mergeAndSpill(bufferedBatches, spillCount);
   }
 
   private void mergeAndSpill(LinkedList<? extends BatchGroup> source, int count) {
-if (count == 0) {
-  return; }
 spilledRuns.add(doMergeAndSpill(source, count));
   }
 
   private BatchGroup.SpilledRun doMergeAndSpill(LinkedList<? extends BatchGroup> batchGroups, int spillCount) {
 List<BatchGroup> batchesToSpill = Lists.newArrayList();
 spillCount = Math.min(batchGroups.size(), spillCount);
 assert spillCount > 0 : "Spill count to mergeAndSpill must not be 
zero";
-long spillSize = 0;
 for (int i = 0; i < spillCount; i++) {
-  @SuppressWarnings("resource")
-  BatchGroup batch = batchGroups.pollFirst();
-  assert batch != null : "Encountered a null batch during merge and 
spill operation";
-  batchesToSpill.add(batch);
-  spillSize += batch.getDataSize();
+  batchesToSpill.add(batchGroups.pollFirst());
--- End diff --

This won't happen for first-generation spills due to the check in 
`isSpillNeeded`. But, it could happen when spilling for merges, so I fixed the 
code.


> Roll-up of final fixes for managed sort
> ---
>
> Key: DRILL-5284
> URL: https://issues.apache.org/jira/browse/DRILL-5284
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> The managed external sort was introduced in DRILL-5080. Since that time, 
> extensive testing has identified a number of minor fixes and improvements. 
> Given the long PR cycles, it is not practical to spend a week or two to do a 
> PR for each fix individually. This ticket represents a roll-up of a 
> combination of a number of fixes. Small fixes are listed here, larger items 
> appear as sub-tasks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886814#comment-15886814
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103331438
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -219,7 +220,18 @@
 
   private BatchSchema schema;
 
+  /**
+   * Incoming batches buffered in memory prior to spilling
+   * or an in-memory merge.
+   */
+
   private LinkedList<BatchGroup.InputBatch> bufferedBatches = Lists.newLinkedList();
+
+  /**
+   * Spilled runs consisting of a large number of spilled
+   * in-memory batches.
--- End diff --

Fixed.


> Roll-up of final fixes for managed sort
> ---
>
> Key: DRILL-5284
> URL: https://issues.apache.org/jira/browse/DRILL-5284
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> The managed external sort was introduced in DRILL-5080. Since that time, 
> extensive testing has identified a number of minor fixes and improvements. 
> Given the long PR cycles, it is not practical to spend a week or two to do a 
> PR for each fix individually. This ticket represents a roll-up of a 
> combination of a number of fixes. Small fixes are listed here, larger items 
> appear as sub-tasks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886818#comment-15886818
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r103334294
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -948,50 +1027,50 @@ private void updateMemoryEstimates(long memoryDelta, 
RecordBatchSizer sizer) {
 // spill batches of either 64K records, or as many records as fit into 
the
 // amount of memory dedicated to each spill batch, whichever is less.
 
-spillBatchRowCount = (int) Math.max(1, spillBatchSize / 
estimatedRowWidth);
+spillBatchRowCount = (int) Math.max(1, preferredSpillBatchSize / 
estimatedRowWidth / 2);
 spillBatchRowCount = Math.min(spillBatchRowCount, Character.MAX_VALUE);
 
+// Compute the actual spill batch size which may be larger or smaller
+// than the preferred size depending on the row width. Double the 
estimated
+// memory needs to allow for power-of-two rounding.
+
+targetSpillBatchSize = spillBatchRowCount * estimatedRowWidth * 2;
+
 // Determine the number of records per batch per merge step. The goal 
is to
 // merge batches of either 64K records, or as many records as fit into 
the
 // amount of memory dedicated to each merge batch, whichever is less.
 
-targetMergeBatchSize = preferredMergeBatchSize;
-mergeBatchRowCount = (int) Math.max(1, targetMergeBatchSize / 
estimatedRowWidth);
+mergeBatchRowCount = (int) Math.max(1, preferredMergeBatchSize / 
estimatedRowWidth / 2);
 mergeBatchRowCount = Math.min(mergeBatchRowCount, Character.MAX_VALUE);
+targetMergeBatchSize = mergeBatchRowCount * estimatedRowWidth * 2;
 
 // Determine the minimum memory needed for spilling. Spilling is done 
just
 // before accepting a batch, so we must spill if we don't have room 
for a
 // (worst case) input batch. To spill, we need room for the output 
batch created
 // by merging the batches already in memory. Double this to allow for 
power-of-two
 // memory allocations.
 
-spillPoint = estimatedInputBatchSize + 2 * spillBatchSize;
+long spillPoint = estimatedInputBatchSize + 2 * targetSpillBatchSize;
 
 // The merge memory pool assumes we can spill all input batches. To 
make
 // progress, we must have at least two merge batches (same size as an 
output
 // batch) and one output batch. Again, double to allow for power-of-two
 // allocation and add one for a margin of error.
 
-int minMergeBatches = 2 * 3 + 1;
-long minMergeMemory = minMergeBatches * targetMergeBatchSize;
+long minMergeMemory = Math.round((2 * targetSpillBatchSize + 
targetMergeBatchSize) * 1.05);
 
 // If we are in a low-memory condition, then we might not have room 
for the
 // default output batch size. In that case, pick a smaller size.
 
-long minMemory = Math.max(spillPoint, minMergeMemory);
-if (minMemory > memoryLimit) {
-
-  // Figure out the minimum output batch size based on memory, but 
can't be
-  // any smaller than the defined minimum.
-
-  targetMergeBatchSize = Math.max(MIN_MERGED_BATCH_SIZE, memoryLimit / 
minMergeBatches);
+if (minMergeMemory > memoryLimit) {
 
-  // Regardless of anything else, the batch must hold at least one
-  // complete row.
+  // Figure out the minimum output batch size based on memory,
+  // must hold at least one complete row.
 
-  targetMergeBatchSize = Math.max(estimatedRowWidth, 
targetMergeBatchSize);
-  spillPoint = estimatedInputBatchSize + 2 * spillBatchSize;
-  minMergeMemory = minMergeBatches * targetMergeBatchSize;
+  long mergeAllowance = Math.round((memoryLimit - 2 * 
targetSpillBatchSize) * 0.95);
+  targetMergeBatchSize = Math.max(estimatedRowWidth, mergeAllowance / 
2);
+  mergeBatchRowCount = (int) (targetMergeBatchSize / estimatedRowWidth 
/ 2);
--- End diff --

Good catch! Fixed.


> Roll-up of final fixes for managed sort
> ---
>
> Key: DRILL-5284
> URL: https://issues.apache.org/jira/browse/DRILL-5284
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> The managed external sort was introduced in DRILL-5080. Since that time, 
> extensive testing has identified a number of minor fixes and improvements. 
> 

[jira] [Commented] (DRILL-5284) Roll-up of final fixes for managed sort

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886813#comment-15886813
 ] 

ASF GitHub Bot commented on DRILL-5284:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/761#discussion_r10406
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
@@ -934,6 +1005,14 @@ private void updateMemoryEstimates(long memoryDelta, 
RecordBatchSizer sizer) {
 long origInputBatchSize = estimatedInputBatchSize;
 estimatedInputBatchSize = Math.max(estimatedInputBatchSize, 
actualBatchSize);
 
+// The row width may end up as zero if all fields are nulls or some
+// other unusual situation. In this case, assume a width of 10 just
+// to avoid lots of special case code.
+
+if (estimatedRowWidth == 0) {
+  estimatedRowWidth = 10;
--- End diff --

This is a very peculiar case that came up in testing. It seems that we can 
have a row with one column and that one column is always null. Imagine a 
Parquet file that has 1 million Varchars, all of which are null. In every 
batch, the row width will be 0. Since we often divide by the row width, bad 
things happen. So, here, we arbitrarily say that if the row is abnormally 
small, just assume 10 bytes to avoid the need for a bunch of special case 
calcs. (The calcs are complex enough already.)

If there are 1000 columns, all of which are null, we would write 1000 "bit" 
(really byte) vectors, so each row would be 1000 bytes wide. But, in such a 
case, the batch analyzer should have come up with a number other than 0 for the 
row width.
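A minimal sketch of the guard described above; the constant and names here are illustrative, not Drill's actual identifiers:

```java
// If the measured row width is zero (for example, a batch whose only column
// is entirely null), substitute a small arbitrary width so the many
// width-based divisions stay safe.
public class RowWidthGuard {
    static final int DEFAULT_ROW_WIDTH = 10;

    static int safeRowWidth(int estimatedRowWidth) {
        return estimatedRowWidth == 0 ? DEFAULT_ROW_WIDTH : estimatedRowWidth;
    }

    // Without the guard, an all-null batch would trigger division by zero here.
    static int rowsPerBatch(long batchBytes, int estimatedRowWidth) {
        return (int) Math.max(1, batchBytes / safeRowWidth(estimatedRowWidth));
    }
}
```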


> Roll-up of final fixes for managed sort
> ---
>
> Key: DRILL-5284
> URL: https://issues.apache.org/jira/browse/DRILL-5284
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> The managed external sort was introduced in DRILL-5080. Since that time, 
> extensive testing has identified a number of minor fixes and improvements. 
> Given the long PR cycles, it is not practical to spend a week or two to do a 
> PR for each fix individually. This ticket represents a roll-up of a 
> combination of a number of fixes. Small fixes are listed here, larger items 
> appear as sub-tasks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5226) External Sort encountered an error while spilling to disk

2017-02-27 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli updated DRILL-5226:
-
Attachment: scenario3.log
profile_scenario3.sys.drill

One more scenario:
{code}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.memory.max_query_memory_per_node` = 104857600;
select col11 from (select * from dfs.`/drill/testdata/identical1` order by 
col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11 desc) d 
where d.col11 < 10;
{code}

The log file (scenario3.log) and profile (profile_scenario3.sys.drill) are 
attached.

> External Sort encountered an error while spilling to disk
> -
>
> Key: DRILL-5226
> URL: https://issues.apache.org/jira/browse/DRILL-5226
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
> Attachments: 277578d5-8bea-27db-0da1-cec0f53a13df.sys.drill, 
> profile_scenario3.sys.drill, scenario3.log
>
>
> Environment : 
> {code}
> git.commit.id.abbrev=2af709f
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> Nodes in Mapr Cluster : 1
> Data Size : ~ 0.35 GB
> No of Columns : 1
> Width of column : 256 chars
> {code}
> The below query fails before spilling to disk due to wrong estimates of the 
> record batch size.
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.width.max_per_node` = 1;
> +---+--+
> |  ok   |   summary|
> +---+--+
> | true  | planner.width.max_per_node updated.  |
> +---+--+
> 1 row selected (1.11 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.memory.max_query_memory_per_node` = 62914560;
> +---++
> |  ok   |  summary   |
> +---++
> | true  | planner.memory.max_query_memory_per_node updated.  |
> +---++
> 1 row selected (0.362 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.disable_exchanges` = true;
> +---+-+
> |  ok   |   summary   |
> +---+-+
> | true  | planner.disable_exchanges updated.  |
> +---+-+
> 1 row selected (0.277 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select * from (select * from 
> dfs.`/drill/testdata/resource-manager/250wide-small.tbl` order by 
> columns[0])d where d.columns[0] = 'ljdfhwuehnoiueyf';
> Error: RESOURCE ERROR: External Sort encountered an error while spilling to 
> disk
> Unable to allocate buffer of size 1048576 (rounded from 618889) due to memory 
> limit. Current allocation: 62736000
> Fragment 0:0
> [Error Id: 1bb933c8-7dc6-4cbd-8c8e-0e095baac719 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> Exception from the logs
> {code}
> 2017-01-26 15:33:09,307 [277578d5-8bea-27db-0da1-cec0f53a13df:frag:0:0] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - User Error Occurred: External Sort 
> encountered an error while spilling to disk (Unable to allocate buffer of 
> size 1048576 (rounded from 618889) due to memory limit. Current allocation: 
> 62736000)
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: External 
> Sort encountered an error while spilling to disk
> Unable to allocate buffer of size 1048576 (rounded from 618889) due to memory 
> limit. Current allocation: 62736000
> [Error Id: 1bb933c8-7dc6-4cbd-8c8e-0e095baac719 ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
>  ~[drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:603)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:411)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  

[jira] [Commented] (DRILL-5114) Rationalize use of Logback logging in unit tests

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886670#comment-15886670
 ] 

ASF GitHub Bot commented on DRILL-5114:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/762
  
We can't achieve different log levels for different appenders. To favor 
Lilith, we cannot reduce the level for the console. So, we need to come up 
with an alternative solution. Closing again until that is sorted out.


> Rationalize use of Logback logging in unit tests
> 
>
> Key: DRILL-5114
> URL: https://issues.apache.org/jira/browse/DRILL-5114
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Drill uses Logback as its logger. The logger is used in several tests to 
> display some test output. Test output is sent to stdout, rather than a log 
> file. Since Drill also uses Logback, that same configuration sends much Drill 
> logging output to stdout as well, cluttering test output.
> Logback requires that one Logback config file (either logback.xml or 
> logback-test.xml) exist on the class path. Tests store the config file in the 
> src/test/resources folder of each sub-project.
> These files set the default logging level to debug. While this setting is 
> fine when working with individual tests, the output is overwhelming for bulk 
> test runs.
> The first requested change is to set the default logging level to error.
> The existing config files are usually called "logback.xml." Change the name 
> of test files to "logback-test.xml" to make clear that they are, in fact, 
> test configs.
> The {{exec/java-exec/src/test/resources/logback.xml}} config file is a full 
> version of Drill's production config file. Replace this with a config 
> suitable for testing (that is, the same as other modules.)
> The java-exec project includes a production-like config file in its non-test 
> sources: {{exec/java-exec/src/main/resources/logback.xml}}. Remove this as it 
> is not needed. (Instead, rely on the one shipped in the distribution 
> subsystem, which is the one copied to the Drill distribution.)
> Since Logback complains bitterly (via many log messages) when it cannot find 
> a configuration file (and each sub-module must have its own test 
> configuration), add missing logging configuration files:
> * exec/memory/base/src/test/resources/logback-test.xml
> * logical/src/test/resources/logback-test.xml



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5114) Rationalize use of Logback logging in unit tests

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886668#comment-15886668
 ] 

ASF GitHub Bot commented on DRILL-5114:
---

Github user paul-rogers closed the pull request at:

https://github.com/apache/drill/pull/762


> Rationalize use of Logback logging in unit tests
> 
>
> Key: DRILL-5114
> URL: https://issues.apache.org/jira/browse/DRILL-5114
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Drill uses Logback as its logger. The logger is used in several tests to 
> display some test output. Test output is sent to stdout, rather than a log 
> file. Since Drill also uses Logback, that same configuration sends much Drill 
> logging output to stdout as well, cluttering test output.
> Logback requires that one Logback config file (either logback.xml or 
> logback-test.xml) exist on the class path. Tests store the config file in the 
> src/test/resources folder of each sub-project.
> These files set the default logging level to debug. While this setting is 
> fine when working with individual tests, the output is overwhelming for bulk 
> test runs.
> The first requested change is to set the default logging level to error.
> The existing config files are usually called "logback.xml." Change the name 
> of test files to "logback-test.xml" to make clear that they are, in fact, 
> test configs.
> The {{exec/java-exec/src/test/resources/logback.xml}} config file is a full 
> version of Drill's production config file. Replace this with a config 
> suitable for testing (that is, the same as other modules.)
> The java-exec project includes a production-like config file in its non-test 
> sources: {{exec/java-exec/src/main/resources/logback.xml}}. Remove this as it 
> is not needed. (Instead, rely on the one shipped in the distribution 
> subsystem, which is the one copied to the Drill distribution.)
> Since Logback complains bitterly (via many log messages) when it cannot find 
> a configuration file (and each sub-module must have its own test 
> configuration), add missing logging configuration files:
> * exec/memory/base/src/test/resources/logback-test.xml
> * logical/src/test/resources/logback-test.xml



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5114) Rationalize use of Logback logging in unit tests

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886630#comment-15886630
 ] 

ASF GitHub Bot commented on DRILL-5114:
---

Github user jinfengni commented on a diff in the pull request:

https://github.com/apache/drill/pull/762#discussion_r103325603
  
--- Diff: exec/java-exec/src/test/resources/logback-test.xml ---
@@ -32,12 +32,13 @@
   
 
   
-
+
 
+
--- End diff --

Why do we want to put all the log messages in STDOUT? 


> Rationalize use of Logback logging in unit tests
> 
>
> Key: DRILL-5114
> URL: https://issues.apache.org/jira/browse/DRILL-5114
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Drill uses Logback as its logger. The logger is used in several tests to 
> display some test output. Test output is sent to stdout, rather than a log 
> file. Since Drill also uses Logback, that same configuration sends much Drill 
> logging output to stdout as well, cluttering test output.
> Logback requires that one Logback config file (either logback.xml or 
> logback-test.xml) exist on the class path. Tests store the config file in the 
> src/test/resources folder of each sub-project.
> These files set the default logging level to debug. While this setting is 
> fine when working with individual tests, the output is overwhelming for bulk 
> test runs.
> The first requested change is to set the default logging level to error.
> The existing config files are usually called "logback.xml." Change the name 
> of test files to "logback-test.xml" to make clear that they are, in fact, 
> test configs.
> The {{exec/java-exec/src/test/resources/logback.xml}} config file is a full 
> version of Drill's production config file. Replace this with a config 
> suitable for testing (that is, the same as other modules.)
> The java-exec project includes a production-like config file in its non-test 
> sources: {{exec/java-exec/src/main/resources/logback.xml}}. Remove this as it 
> is not needed. (Instead, rely on the one shipped in the distribution 
> subsystem, which is the one copied to the Drill distribution.)
> Since Logback complains bitterly (via many log messages) when it cannot find 
> a configuration file (and each sub-module must have its own test 
> configuration), add missing logging configuration files:
> * exec/memory/base/src/test/resources/logback-test.xml
> * logical/src/test/resources/logback-test.xml



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5208) Finding path to java executable should be deterministic

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886619#comment-15886619
 ] 

ASF GitHub Bot commented on DRILL-5208:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/763
  
Addressed code review comments and rebased onto the latest master.


> Finding path to java executable should be deterministic
> ---
>
> Key: DRILL-5208
> URL: https://issues.apache.org/jira/browse/DRILL-5208
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.10.0
>Reporter: Krystal
>Assignee: Paul Rogers
>Priority: Minor
>
> Command to find JAVA in drill-config.sh is not deterministic.  
> drill-config.sh uses the following command to find JAVA:
> JAVA=`find -L "$JAVA_HOME" -name $JAVA_BIN -type f | head -n 1`
> On one of my node the following command returned 2 entries:
> find -L $JAVA_HOME -name java -type f
> /usr/local/java/jdk1.7.0_67/jre/bin/java
> /usr/local/java/jdk1.7.0_67/bin/java
> On another node, the same command returned entries in different order:
> find -L $JAVA_HOME -name java -type f
> /usr/local/java/jdk1.7.0_67/bin/java
> /usr/local/java/jdk1.7.0_67/jre/bin/java
> The complete command picks the first entry returned, which may not be the 
> same on each node:
> find -L $JAVA_HOME -name java -type f | head -n 1
> /usr/local/java/jdk1.7.0_67/jre/bin/java
> If JAVA_HOME is found, we should just append "bin/java" to the path:
> JAVA=$JAVA_HOME/bin/java



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5290) Provide an option to build operator table once for built-in static functions and reuse it across queries.

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886333#comment-15886333
 ] 

ASF GitHub Bot commented on DRILL-5290:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/757#discussion_r103285068
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
@@ -413,4 +413,8 @@
 
   String DYNAMIC_UDF_SUPPORT_ENABLED = "exec.udf.enable_dynamic_support";
   BooleanValidator DYNAMIC_UDF_SUPPORT_ENABLED_VALIDATOR = new 
BooleanValidator(DYNAMIC_UDF_SUPPORT_ENABLED, true, true);
+
+  String USE_DYNAMIC_UDFS = "exec.udf.use_dynamic";
--- End diff --

OK, we need to use readWriteLocks if we update the table each time a 
function gets added/removed. That is unnecessary overhead and will cause 
contention under concurrency. One option is to split the table into two: one 
for built-in functions (which can be accessed without locks) and the other for 
dynamic functions. That would be a bigger change and, like I mentioned before, 
is not considered worth the effort. 
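The "two tables" idea can be sketched as below. This is an illustration with made-up names, not Drill's API, and `Runnable` stands in for whatever function-holder type the real table uses:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Built-in functions are frozen into an immutable map at startup and read
// without any locking; only dynamically registered UDFs go through a
// concurrent map that tolerates registration and removal at runtime.
public class SplitFunctionTable {
    private final Map<String, Runnable> builtIns; // frozen at startup
    private final ConcurrentHashMap<String, Runnable> dynamic =
        new ConcurrentHashMap<>();

    public SplitFunctionTable(Map<String, Runnable> builtInFunctions) {
        this.builtIns = Collections.unmodifiableMap(new HashMap<>(builtInFunctions));
    }

    public void registerDynamic(String name, Runnable fn) {
        dynamic.put(name, fn);
    }

    public Runnable lookup(String name) {
        Runnable fn = builtIns.get(name); // common case: no synchronization at all
        return fn != null ? fn : dynamic.get(name);
    }
}
```

The design point is that the hot read path for built-ins never touches a lock or a CAS; only the (rare) dynamic-UDF path pays for concurrency.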


> Provide an option to build operator table once for built-in static functions 
> and reuse it across queries.
> -
>
> Key: DRILL-5290
> URL: https://issues.apache.org/jira/browse/DRILL-5290
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>  Labels: doc-impacting
> Fix For: 1.10.0
>
>
> Currently, DrillOperatorTable which contains standard SQL operators and 
> functions and Drill User Defined Functions (UDFs) (built-in and dynamic) gets 
> built for each query as part of creating QueryContext. This is an expensive 
> operation ( ~30 msec to build) and allocates  ~2M on heap for each query. For 
> high throughput, low latency operational queries, we quickly run out of heap 
> memory, causing JVM hangs. Build operator table once during startup for 
> static built-in functions and save in DrillbitContext, so we can reuse it 
> across queries.
> Provide a system/session option to not use dynamic UDFs so we can use the 
> operator table saved in DrillbitContext and avoid building each time.
> *Please note, changes are adding new option exec.udf.use_dynamic which needs 
> to be documented.*



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5287) Provide option to skip updates of ephemeral state changes in Zookeeper

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886231#comment-15886231
 ] 

ASF GitHub Bot commented on DRILL-5287:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/758#discussion_r103271146
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
@@ -413,4 +413,8 @@
 
   String DYNAMIC_UDF_SUPPORT_ENABLED = "exec.udf.enable_dynamic_support";
   BooleanValidator DYNAMIC_UDF_SUPPORT_ENABLED_VALIDATOR = new 
BooleanValidator(DYNAMIC_UDF_SUPPORT_ENABLED, true, true);
+
+  String ZK_QUERY_STATE_UPDATE_KEY = "drill.exec.zk.query.state.update";
--- End diff --

I changed it to QUERY_TRANSIENT_STATE_UPDATE_KEY  and 
exec.query.progress.update. Please review the new diffs. 


> Provide option to skip updates of ephemeral state changes in Zookeeper
> --
>
> Key: DRILL-5287
> URL: https://issues.apache.org/jira/browse/DRILL-5287
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.10
>
>
> We put transient profiles in zookeeper and update state as query progresses 
> and changes states. It is observed that this adds latency of ~45msec for each 
> update in the query execution path. This gets even worse when a high number 
> of concurrent queries are in progress. For concurrency=100, the average query 
> response time even for short queries  is 8 sec vs 0.2 sec with these updates 
> disabled. For short lived queries in a high-throughput scenario, it is of no 
> value to update state changes in zookeeper. We need an option to disable 
> these updates for short running operational queries.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (DRILL-3161) Drill JDBC driver not visible/auto-registered via Service Provider Mechanism

2017-02-27 Thread Laurent Goujon (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laurent Goujon closed DRILL-3161.
-
Resolution: Duplicate

> Drill JDBC driver not visible/auto-registered via Service Provider Mechanism
> 
>
> Key: DRILL-3161
> URL: https://issues.apache.org/jira/browse/DRILL-3161
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Reporter: Daniel Barclay
> Fix For: Future
>
>
> Drill's JDBC driver is not automatically made visible to JDBC's DriverManager 
> and auto-registered, because it does not use Java's Service Provider 
> Mechanism as specified by JDBC 4.0.
> This usually means that instead of just having to put the Drill JDBC driver 
> Jar file on the class path and use a Drill JDBC URL (one starting with 
> "{{jdbc:drill:}}"), users also have to configure their tools or code with the 
> name of the Drill driver class.
> 
> The Drill JDBC driver's Jar file should contain a 
> {{META-INF/services/java.sql.Driver}} file that contains a line consisting of 
> the fully qualified name of the Drill JDBC driver class 
> ({{org.apache.drill.jdbc.Driver}}).
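Concretely, the fix amounts to shipping a one-line service descriptor inside the driver jar; the path and class name below are exactly those stated in the report:

```
# File inside the driver jar: META-INF/services/java.sql.Driver
org.apache.drill.jdbc.Driver
```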



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-3510) Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL identifiers

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886213#comment-15886213
 ] 

ASF GitHub Bot commented on DRILL-3510:
---

Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/520#discussion_r103268387
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
@@ -695,6 +698,33 @@ public void runQuery(QueryType type, 
List planFragments, UserResul
   }
 
   /**
+   * Get server properties that represent the list of server session 
options.
+   *
+   * @return server properties for the server session options.
+   */
+  public ServerProperties getOptions() throws RpcException {
--- End diff --

Sorry, it took me longer than a week, but PR #764 contains the API change for 
server metadata support, with C++ client/JDBC driver support. If approved, it 
should make things way easier for you, as the only change you would need is to 
update the server metadata to get the quoting information from the session.


> Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL 
> identifiers 
> --
>
> Key: DRILL-3510
> URL: https://issues.apache.org/jira/browse/DRILL-3510
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: SQL Parser
>Reporter: Jinfeng Ni
>Assignee: Vitalii Diravka
>  Labels: doc-impacting
> Fix For: 1.10.0
>
> Attachments: DRILL-3510.patch, DRILL-3510.patch
>
>
> Currently Drill's SQL parser uses the backtick as the identifier quote 
> character, as MySQL does. However, this differs from the ANSI SQL 
> specification, where the double quote is used for quoting identifiers.
> MySQL has an option, "ANSI_QUOTES", which can be switched on/off by the user. 
> Drill should offer the same option, so that Drill users do not have to rewrite 
> their existing queries if those queries use double quotes. 
> {code}
> SET sql_mode='ANSI_QUOTES';
> {code}
>





[jira] [Commented] (DRILL-4994) Prepared statement stopped working between 1.8.0 client and < 1.8.0 server

2017-02-27 Thread Laurent Goujon (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886204#comment-15886204
 ] 

Laurent Goujon commented on DRILL-4994:
---

Part of this pull request: https://github.com/apache/drill/pull/613

> Prepared statement stopped working between 1.8.0 client and < 1.8.0 server
> --
>
> Key: DRILL-4994
> URL: https://issues.apache.org/jira/browse/DRILL-4994
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>
> Older servers (pre-1.8.0) don't support the prepared statement RPC method, 
> but the JDBC client doesn't check whether it is available. The end result 
> is that the statement hangs, because the server never responds.
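The missing check described above is a simple version gate: use the server-side prepared statement RPC only when the connected server is new enough, otherwise fall back to the old client-side path. A minimal sketch of that gate; the class and method names here are invented for illustration and are not Drill's actual API:

```java
// Hedged sketch of a client-side capability gate, assuming the prepared
// statement RPC first shipped in server version 1.8.0 (per the ticket).
public class PrepareFallback {

    // Minimum server version that supports the prepared statement RPC.
    static final int[] MIN_PREPARE_VERSION = {1, 8, 0};

    // Compares the server version against the minimum, component by component.
    static boolean supportsServerPrepare(int major, int minor, int patch) {
        int[] v = {major, minor, patch};
        for (int i = 0; i < 3; i++) {
            if (v[i] != MIN_PREPARE_VERSION[i]) {
                return v[i] > MIN_PREPARE_VERSION[i];
            }
        }
        return true; // exactly the minimum version
    }

    public static void main(String[] args) {
        // A 1.7.x server must take the client-side fallback path.
        System.out.println(supportsServerPrepare(1, 7, 5)); // false
        // A 1.8.0 (or newer) server can use the RPC.
        System.out.println(supportsServerPrepare(1, 8, 0)); // true
    }
}
```

The point of the gate is that an old server never receives an RPC it cannot answer, so the client no longer waits forever on a response that will never come.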





[jira] [Assigned] (DRILL-4994) Prepared statement stopped working between 1.8.0 client and < 1.8.0 server

2017-02-27 Thread Laurent Goujon (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laurent Goujon reassigned DRILL-4994:
-

Assignee: Laurent Goujon

> Prepared statement stopped working between 1.8.0 client and < 1.8.0 server
> --
>
> Key: DRILL-4994
> URL: https://issues.apache.org/jira/browse/DRILL-4994
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>
> Older servers (pre-1.8.0) don't support the prepared statement RPC method, 
> but the JDBC client doesn't check whether it is available. The end result 
> is that the statement hangs, because the server never responds.





[jira] [Commented] (DRILL-5301) Add server metadata API

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886202#comment-15886202
 ] 

ASF GitHub Bot commented on DRILL-5301:
---

GitHub user laurentgo opened a pull request:

https://github.com/apache/drill/pull/764

DRILL-5301: Server metadata API

Add a Server metadata API to the User protocol, to query server support of 
various SQL features.

Add support to the client (DrillClient) to query this information.

Add support to the JDBC driver to query this information if the server 
supports the new API, falling back to the previous behaviour (relying on 
Avatica defaults) otherwise.

Add support for the Server metadata API to the C++ client. If the API is not 
supported by the server, fall back to the previous hard-coded values.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/laurentgo/drill laurent/server-meta

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/764.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #764


commit 48bf728c88b8244c0fc51ae8856d0f786bd9e986
Author: Laurent Goujon 
Date:   2016-11-04T20:31:19Z

Refactor DrillCursor

Refactor DrillCursor to be more self-contained.

commit 6583d69df3b972270e146e53ab2ddcf9c4aff93c
Author: Laurent Goujon 
Date:   2016-11-04T20:32:44Z

DRILL-4730: Update JDBC DatabaseMetaData implementation to use new Metadata 
APIs

Update JDBC driver to use Metadata APIs instead of executing SQL queries

commit 17ce38a44d098e744620a28b25c93fd352e7c76d
Author: Laurent Goujon 
Date:   2016-11-05T00:36:42Z

DRILL-4994: Add back JDBC prepared statement for older servers

When the JDBC client is connected to an older Drill server, it always
attempted to use server-side prepared statement with no fallback.

With this change, client will check server version and will fallback to the
previous client-side prepared statement (which is still limited to only 
execute
queries and does not provide metadata).

commit 5048bb650bf3a42e9f7920727e19e33ae59f0188
Author: Laurent Goujon 
Date:   2017-02-24T23:41:07Z

DRILL-5301: Server metadata API

Add a Server metadata API to the User protocol, to query server support
of various SQL features.

Add support to the client (DrillClient) to query this information.

Add support to the JDBC driver to query this information, if the server 
supports
the new API, or fallback to the previous behaviour (rely on Avatica 
defaults) otherwise.

commit d912267efad379e3730f800bc7b3af57bee2aa06
Author: Laurent Goujon 
Date:   2017-02-26T18:23:59Z

DRILL-5301: Add C++ client support for Server metadata API

Add support to the Server metadata API to the C++ client if
available. If the API is not supported to the server, fallback
to the previous hard-coded values.

Update the querySubmitter example program to query the information.




> Add server metadata API
> ---
>
> Key: DRILL-5301
> URL: https://issues.apache.org/jira/browse/DRILL-5301
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server, Client - C++, Client - Java, Client - JDBC, 
> Client - ODBC
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>
> JDBC and ODBC clients expose a lot of metadata regarding the server version 
> and its support of various parts of the SQL standard.
> Currently the returned information is hardcoded in both clients/drivers, which 
> means that the information returned reflects support as of the client version, 
> not the server version.
> Instead, a new method should be provided for clients to query the actual 
> server support. Support on the client or the server should be optional (for 
> example, a client should not use this API if the server doesn't support it, 
> and should fall back to default values).





[jira] [Assigned] (DRILL-5301) Add server metadata API

2017-02-27 Thread Laurent Goujon (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laurent Goujon reassigned DRILL-5301:
-

Assignee: Laurent Goujon

> Add server metadata API
> ---
>
> Key: DRILL-5301
> URL: https://issues.apache.org/jira/browse/DRILL-5301
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server, Client - C++, Client - Java, Client - JDBC, 
> Client - ODBC
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>
> JDBC and ODBC clients expose a lot of metadata regarding the server version 
> and its support of various parts of the SQL standard.
> Currently the returned information is hardcoded in both clients/drivers, which 
> means that the information returned reflects support as of the client version, 
> not the server version.
> Instead, a new method should be provided for clients to query the actual 
> server support. Support on the client or the server should be optional (for 
> example, a client should not use this API if the server doesn't support it, 
> and should fall back to default values).





[jira] [Created] (DRILL-5301) Add server metadata API

2017-02-27 Thread Laurent Goujon (JIRA)
Laurent Goujon created DRILL-5301:
-

 Summary: Add server metadata API
 Key: DRILL-5301
 URL: https://issues.apache.org/jira/browse/DRILL-5301
 Project: Apache Drill
  Issue Type: Improvement
  Components:  Server, Client - C++, Client - Java, Client - JDBC, 
Client - ODBC
Reporter: Laurent Goujon


JDBC and ODBC clients expose a lot of metadata regarding the server version and 
its support of various parts of the SQL standard.

Currently the returned information is hardcoded in both clients/drivers, which 
means that the information returned reflects support as of the client version, 
not the server version.

Instead, a new method should be provided for clients to query the actual 
server support. Support on the client or the server should be optional (for 
example, a client should not use this API if the server doesn't support it, and 
should fall back to default values).
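The optional-support behaviour requested above reduces to one rule: use the server-reported value when the metadata API is available, otherwise keep the client's hard-coded default. A minimal illustrative sketch; all names are invented for illustration and are not Drill's actual API:

```java
import java.util.Optional;

// Hedged sketch: resolving one piece of metadata (the identifier quote
// character) with a fallback for servers that predate the metadata API.
public class ServerMetaFallback {

    // Value the client shipped with, used when the server cannot answer.
    static final String DEFAULT_QUOTE = "`";

    // Optional.empty() models an old server without the metadata API;
    // a present value models a server that reported its actual support.
    static String identifierQuote(Optional<String> serverReported) {
        return serverReported.orElse(DEFAULT_QUOTE);
    }

    public static void main(String[] args) {
        // New server reports ANSI double-quote; old server falls back.
        System.out.println(identifierQuote(Optional.of("\"")));
        System.out.println(identifierQuote(Optional.empty()));
    }
}
```

The same shape applies to every metadata item the drivers expose: each lookup degrades gracefully to today's hard-coded answer when the server is too old to be asked.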





[jira] [Closed] (DRILL-4385) Support metadata and prepare operations on User RPC layer

2017-02-27 Thread Laurent Goujon (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laurent Goujon closed DRILL-4385.
-
Resolution: Duplicate
  Assignee: Venki Korukanti

> Support metadata and prepare operations on User RPC layer
> -
>
> Key: DRILL-4385
> URL: https://issues.apache.org/jira/browse/DRILL-4385
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Right now, we don't support prepare, and metadata operations are done through 
> code and queries in the JDBC and ODBC drivers. This is an umbrella task to 
> implement metadata and prepare operations directly at the RPC layer.





[jira] [Closed] (DRILL-4419) JDBC driver should move to using the new metadata methods provided by DRILL-4385

2017-02-27 Thread Laurent Goujon (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laurent Goujon closed DRILL-4419.
-
Resolution: Fixed
  Assignee: Laurent Goujon

> JDBC driver should move to using the new metadata methods provided by 
> DRILL-4385
> 
>
> Key: DRILL-4419
> URL: https://issues.apache.org/jira/browse/DRILL-4419
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Laurent Goujon
>






[jira] [Commented] (DRILL-4385) Support metadata and prepare operations on User RPC layer

2017-02-27 Thread Laurent Goujon (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886173#comment-15886173
 ] 

Laurent Goujon commented on DRILL-4385:
---

I believe this was done as part of DRILL-4728 and DRILL-4729

> Support metadata and prepare operations on User RPC layer
> -
>
> Key: DRILL-4385
> URL: https://issues.apache.org/jira/browse/DRILL-4385
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Jacques Nadeau
>
> Right now, we don't support prepare, and metadata operations are done through 
> code and queries in the JDBC and ODBC drivers. This is an umbrella task to 
> implement metadata and prepare operations directly at the RPC layer.





[jira] [Comment Edited] (DRILL-5300) SYSTEM ERROR: IllegalStateException: Memory was leaked by query while querying parquet files

2017-02-27 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886026#comment-15886026
 ] 

Zelaine Fong edited comment on DRILL-5300 at 2/27/17 4:06 PM:
--

Based on these lines in your stack trace:

{code}
... 5 common frames omitted
2017-02-27 04:32:57,867 [drill-executor-453] ERROR 
o.a.d.exec.server.BootStrapContext - 
org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
java.lang.NoClassDefFoundError: Could not initialize class 
org.xerial.snappy.Snappy
at 
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$DecompressionHelper.decompress(AsyncPageReader.java:402)
 ~[drill-java-exec-1.9.0.jar:1.9.0]
{code}

The memory leak appears to be DRILL-5160.  

The missing snappy dependency is DRILL-5157.  If you pick up the fix for 
DRILL-5157, that will avoid the dependency problem you're hitting.


was (Author: zfong):
Based on these lines in your stack trace:

... 5 common frames omitted
2017-02-27 04:32:57,867 [drill-executor-453] ERROR 
o.a.d.exec.server.BootStrapContext - 
org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
java.lang.NoClassDefFoundError: Could not initialize class 
org.xerial.snappy.Snappy
at 
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$DecompressionHelper.decompress(AsyncPageReader.java:402)
 ~[drill-java-exec-1.9.0.jar:1.9.0]

The memory leak appears to be DRILL-5160.  

The missing snappy dependency is DRILL-5157.  If you pick up the fix for 
DRILL-5157, that will avoid the dependency problem you're hitting.

> SYSTEM ERROR: IllegalStateException: Memory was leaked by query while 
> querying parquet files
> 
>
> Key: DRILL-5300
> URL: https://issues.apache.org/jira/browse/DRILL-5300
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
> Environment: OS: Linux
>Reporter: Muhammad Gelbana
> Attachments: both_queries_logs.zip
>
>
> Running the following query against parquet files (I modified some values for 
> privacy reasons)
> {code:title=Query causing the long logs|borderStyle=solid}
> SELECT AL4.NAME, AL5.SEGMENT2, SUM(AL1.AMOUNT), AL2.ATTRIBUTE4, 
> AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, 
> AL11.NAME FROM 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA__TRX_LINE_GL_DIST_ALL`
>  AL1, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA_OMER_TRX_ALL`
>  AL2, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX`
>  AL3, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_HR_COMMON/HR_ALL_ORGANIZATION_UNITS`
>  AL4, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_CODE_COMBINATIONS`
>  AL5, 
> dfs.`/disk2/XXX/XXX//data/../parquet//XXAT_AR_MU_TAB` 
> AL8, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX`
>  AL11, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S`
>  AL12, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS`
>  AL13, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL`
>  AL14, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL`
>  AL15, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL`
>  AL16, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL`
>  AL17, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS`
>  AL18, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S`
>  AL19 WHERE (AL2.SHIP_TO__USE_ID = AL15._USE_ID AND 
> AL15.___ID = AL14.___ID AND AL14.X__ID = 
> AL12.X__ID AND AL12.LOCATION_ID = AL13.LOCATION_ID AND 
> AL17.___ID = AL16.___ID AND AL16.X__ID = 
> AL19.X__ID AND AL19.LOCATION_ID = AL18.LOCATION_ID AND 
> AL2.BILL_TO__USE_ID = AL17._USE_ID AND AL2.SET_OF_X_ID = 
> AL3.SET_OF_X_ID AND AL1.CODE_COMBINATION_ID = AL5.CODE_COMBINATION_ID AND 
> AL5.SEGMENT4 = AL8.MU AND AL1.SET_OF_X_ID = AL11.SET_OF_X_ID AND 
> AL2.ORG_ID = AL4.ORGANIZATION_ID AND AL2.OMER_TRX_ID = 
> AL1.OMER_TRX_ID) AND ((AL5.SEGMENT2 = '41' AND AL1.AMOUNT <> 0 AND 
> AL4.NAME IN ('XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 
> 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-') 
> AND AL3.NAME like '%-PR-%')) GROUP BY AL4.NAME, AL5.SEGMENT2, AL2.ATTRIBUTE4, 
> AL2.XXX__CODE, 

[jira] [Commented] (DRILL-5300) SYSTEM ERROR: IllegalStateException: Memory was leaked by query while querying parquet files

2017-02-27 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886026#comment-15886026
 ] 

Zelaine Fong commented on DRILL-5300:
-

Based on these lines in your stack trace:

... 5 common frames omitted
2017-02-27 04:32:57,867 [drill-executor-453] ERROR 
o.a.d.exec.server.BootStrapContext - 
org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
java.lang.NoClassDefFoundError: Could not initialize class 
org.xerial.snappy.Snappy
at 
org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$DecompressionHelper.decompress(AsyncPageReader.java:402)
 ~[drill-java-exec-1.9.0.jar:1.9.0]

The memory leak appears to be DRILL-5160.  

The missing snappy dependency is DRILL-5157.  If you pick up the fix for 
DRILL-5157, that will avoid the dependency problem you're hitting.

> SYSTEM ERROR: IllegalStateException: Memory was leaked by query while 
> querying parquet files
> 
>
> Key: DRILL-5300
> URL: https://issues.apache.org/jira/browse/DRILL-5300
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
> Environment: OS: Linux
>Reporter: Muhammad Gelbana
> Attachments: both_queries_logs.zip
>
>
> Running the following query against parquet files (I modified some values for 
> privacy reasons)
> {code:title=Query causing the long logs|borderStyle=solid}
> SELECT AL4.NAME, AL5.SEGMENT2, SUM(AL1.AMOUNT), AL2.ATTRIBUTE4, 
> AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, 
> AL11.NAME FROM 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA__TRX_LINE_GL_DIST_ALL`
>  AL1, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA_OMER_TRX_ALL`
>  AL2, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX`
>  AL3, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_HR_COMMON/HR_ALL_ORGANIZATION_UNITS`
>  AL4, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_CODE_COMBINATIONS`
>  AL5, 
> dfs.`/disk2/XXX/XXX//data/../parquet//XXAT_AR_MU_TAB` 
> AL8, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX`
>  AL11, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S`
>  AL12, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS`
>  AL13, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL`
>  AL14, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL`
>  AL15, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL`
>  AL16, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL`
>  AL17, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS`
>  AL18, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S`
>  AL19 WHERE (AL2.SHIP_TO__USE_ID = AL15._USE_ID AND 
> AL15.___ID = AL14.___ID AND AL14.X__ID = 
> AL12.X__ID AND AL12.LOCATION_ID = AL13.LOCATION_ID AND 
> AL17.___ID = AL16.___ID AND AL16.X__ID = 
> AL19.X__ID AND AL19.LOCATION_ID = AL18.LOCATION_ID AND 
> AL2.BILL_TO__USE_ID = AL17._USE_ID AND AL2.SET_OF_X_ID = 
> AL3.SET_OF_X_ID AND AL1.CODE_COMBINATION_ID = AL5.CODE_COMBINATION_ID AND 
> AL5.SEGMENT4 = AL8.MU AND AL1.SET_OF_X_ID = AL11.SET_OF_X_ID AND 
> AL2.ORG_ID = AL4.ORGANIZATION_ID AND AL2.OMER_TRX_ID = 
> AL1.OMER_TRX_ID) AND ((AL5.SEGMENT2 = '41' AND AL1.AMOUNT <> 0 AND 
> AL4.NAME IN ('XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 
> 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-') 
> AND AL3.NAME like '%-PR-%')) GROUP BY AL4.NAME, AL5.SEGMENT2, AL2.ATTRIBUTE4, 
> AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, 
> AL11.NAME
> {code}
> {code:title=Query causing the short logs|borderStyle=solid}
> SELECT AL11.NAME
> FROM
> dfs.`/XXX/XXX/XXX/data/../parquet/XXX_XXX_COMMON/GL_XXX` 
> LIMIT 10
> {code}
> This issue may be a duplicate of [this 
> one|https://issues.apache.org/jira/browse/DRILL-4398] but I created a new one 
> based on [this 
> suggestion|https://issues.apache.org/jira/browse/DRILL-4398?focusedCommentId=15884846=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15884846].





[jira] [Updated] (DRILL-5300) SYSTEM ERROR: IllegalStateException: Memory was leaked by query while querying parquet files

2017-02-27 Thread Muhammad Gelbana (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Muhammad Gelbana updated DRILL-5300:

Attachment: both_queries_logs.zip

> SYSTEM ERROR: IllegalStateException: Memory was leaked by query while 
> querying parquet files
> 
>
> Key: DRILL-5300
> URL: https://issues.apache.org/jira/browse/DRILL-5300
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
> Environment: OS: Linux
>Reporter: Muhammad Gelbana
> Attachments: both_queries_logs.zip
>
>
> Running the following query against parquet files (I modified some values for 
> privacy reasons)
> {code:title=Query causing the long logs|borderStyle=solid}
> SELECT AL4.NAME, AL5.SEGMENT2, SUM(AL1.AMOUNT), AL2.ATTRIBUTE4, 
> AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, 
> AL11.NAME FROM 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA__TRX_LINE_GL_DIST_ALL`
>  AL1, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA_OMER_TRX_ALL`
>  AL2, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX`
>  AL3, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_HR_COMMON/HR_ALL_ORGANIZATION_UNITS`
>  AL4, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_CODE_COMBINATIONS`
>  AL5, 
> dfs.`/disk2/XXX/XXX//data/../parquet//XXAT_AR_MU_TAB` 
> AL8, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX`
>  AL11, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S`
>  AL12, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS`
>  AL13, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL`
>  AL14, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL`
>  AL15, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL`
>  AL16, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL`
>  AL17, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS`
>  AL18, 
> dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S`
>  AL19 WHERE (AL2.SHIP_TO__USE_ID = AL15._USE_ID AND 
> AL15.___ID = AL14.___ID AND AL14.X__ID = 
> AL12.X__ID AND AL12.LOCATION_ID = AL13.LOCATION_ID AND 
> AL17.___ID = AL16.___ID AND AL16.X__ID = 
> AL19.X__ID AND AL19.LOCATION_ID = AL18.LOCATION_ID AND 
> AL2.BILL_TO__USE_ID = AL17._USE_ID AND AL2.SET_OF_X_ID = 
> AL3.SET_OF_X_ID AND AL1.CODE_COMBINATION_ID = AL5.CODE_COMBINATION_ID AND 
> AL5.SEGMENT4 = AL8.MU AND AL1.SET_OF_X_ID = AL11.SET_OF_X_ID AND 
> AL2.ORG_ID = AL4.ORGANIZATION_ID AND AL2.OMER_TRX_ID = 
> AL1.OMER_TRX_ID) AND ((AL5.SEGMENT2 = '41' AND AL1.AMOUNT <> 0 AND 
> AL4.NAME IN ('XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 
> 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-') 
> AND AL3.NAME like '%-PR-%')) GROUP BY AL4.NAME, AL5.SEGMENT2, AL2.ATTRIBUTE4, 
> AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, 
> AL11.NAME
> {code}
> {code:title=Query causing the short logs|borderStyle=solid}
> SELECT AL11.NAME
> FROM
> dfs.`/XXX/XXX/XXX/data/../parquet/XXX_XXX_COMMON/GL_XXX` 
> LIMIT 10
> {code}
> This issue may be a duplicate of [this 
> one|https://issues.apache.org/jira/browse/DRILL-4398] but I created a new one 
> based on [this 
> suggestion|https://issues.apache.org/jira/browse/DRILL-4398?focusedCommentId=15884846=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15884846].





[jira] [Created] (DRILL-5300) SYSTEM ERROR: IllegalStateException: Memory was leaked by query while querying parquet files

2017-02-27 Thread Muhammad Gelbana (JIRA)
Muhammad Gelbana created DRILL-5300:
---

 Summary: SYSTEM ERROR: IllegalStateException: Memory was leaked by 
query while querying parquet files
 Key: DRILL-5300
 URL: https://issues.apache.org/jira/browse/DRILL-5300
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.9.0
 Environment: OS: Linux
Reporter: Muhammad Gelbana
 Attachments: both_queries_logs.zip

Running the following query against parquet files (I modified some values for 
privacy reasons)
{code:title=Query causing the long logs|borderStyle=solid}
SELECT AL4.NAME, AL5.SEGMENT2, SUM(AL1.AMOUNT), AL2.ATTRIBUTE4, 
AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, 
AL11.NAME FROM 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA__TRX_LINE_GL_DIST_ALL`
 AL1, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA_OMER_TRX_ALL`
 AL2, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX` 
AL3, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_HR_COMMON/HR_ALL_ORGANIZATION_UNITS`
 AL4, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_CODE_COMBINATIONS`
 AL5, 
dfs.`/disk2/XXX/XXX//data/../parquet//XXAT_AR_MU_TAB` 
AL8, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX` 
AL11, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S`
 AL12, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS`
 AL13, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL`
 AL14, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL`
 AL15, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL`
 AL16, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL`
 AL17, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS`
 AL18, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S`
 AL19 WHERE (AL2.SHIP_TO__USE_ID = AL15._USE_ID AND 
AL15.___ID = AL14.___ID AND AL14.X__ID = 
AL12.X__ID AND AL12.LOCATION_ID = AL13.LOCATION_ID AND 
AL17.___ID = AL16.___ID AND AL16.X__ID = 
AL19.X__ID AND AL19.LOCATION_ID = AL18.LOCATION_ID AND 
AL2.BILL_TO__USE_ID = AL17._USE_ID AND AL2.SET_OF_X_ID = 
AL3.SET_OF_X_ID AND AL1.CODE_COMBINATION_ID = AL5.CODE_COMBINATION_ID AND 
AL5.SEGMENT4 = AL8.MU AND AL1.SET_OF_X_ID = AL11.SET_OF_X_ID AND 
AL2.ORG_ID = AL4.ORGANIZATION_ID AND AL2.OMER_TRX_ID = AL1.OMER_TRX_ID) 
AND ((AL5.SEGMENT2 = '41' AND AL1.AMOUNT <> 0 AND AL4.NAME IN 
('XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 
'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-') AND AL3.NAME like 
'%-PR-%')) GROUP BY AL4.NAME, AL5.SEGMENT2, AL2.ATTRIBUTE4, 
AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, 
AL11.NAME
{code}

{code:title=Query causing the short logs|borderStyle=solid}
SELECT AL11.NAME
FROM
dfs.`/XXX/XXX/XXX/data/../parquet/XXX_XXX_COMMON/GL_XXX` 
LIMIT 10
{code}
This issue may be a duplicate of [this 
one|https://issues.apache.org/jira/browse/DRILL-4398] but I created a new one 
based on [this 
suggestion|https://issues.apache.org/jira/browse/DRILL-4398?focusedCommentId=15884846=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15884846].





[jira] [Issue Comment Deleted] (DRILL-5298) CTAS with 0 records from a SELECT query should create the table with metadata

2017-02-27 Thread Senthilkumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Senthilkumar updated DRILL-5298:

Comment: was deleted

(was: The issue has still not been fixed. The table is not created in the event 
of an empty dataset from a Hive table)

> CTAS with 0 records from a SELECT query should create the table with metadata
> -
>
> Key: DRILL-5298
> URL: https://issues.apache.org/jira/browse/DRILL-5298
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata, Query Planning & Optimization, SQL Parser
>Affects Versions: 1.9.0
> Environment: MapR 5.2
>Reporter: Senthilkumar
> Fix For: 1.9.0
>
>
> Hello team,
> I create a table in Drill using CTAS as
> CREATE TABLE CTAS_TEST AS SELECT * FROM `hive.default`.`test` WHERE 1 = 0
> It runs successfully.
> But the table is not created, because the SELECT query returns 0 records.
> CTAS should still go ahead and create the table with the column metadata.
> When BI tools fire multi-pass queries, with CTAS in the first query, the 
> subsequent queries fail because of the missing table.
> In databases like SQL Server and Postgres, CTAS will create the table even if 
> the SELECT doesn't return any rows.





[jira] [Closed] (DRILL-5298) CTAS with 0 records from a SELECT query should create the table with metadata

2017-02-27 Thread Senthilkumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Senthilkumar closed DRILL-5298.
---
Resolution: Duplicate

> CTAS with 0 records from a SELECT query should create the table with metadata
> -
>
> Key: DRILL-5298
> URL: https://issues.apache.org/jira/browse/DRILL-5298
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata, Query Planning & Optimization, SQL Parser
>Affects Versions: 1.9.0
> Environment: MapR 5.2
>Reporter: Senthilkumar
> Fix For: 1.9.0
>
>
> Hello team,
> I create a table in Drill using CTAS as
> CREATE TABLE CTAS_TEST AS SELECT * FROM `hive.default`.`test` WHERE 1 = 0
> It runs successfully.
> But the table is not created, because the SELECT query returns 0 records.
> CTAS should still go ahead and create the table with the column metadata.
> When BI tools fire multi-pass queries, with CTAS in the first query, the 
> subsequent queries fail because of the missing table.
> In databases like SQL Server and Postgres, CTAS will create the table even if 
> the SELECT doesn't return any rows.





[jira] [Reopened] (DRILL-5298) CTAS with 0 records from a SELECT query should create the table with metadata

2017-02-27 Thread Senthilkumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Senthilkumar reopened DRILL-5298:
-

The issue has still not been fixed. The table is not created in the event of an 
empty dataset from a Hive table.

> CTAS with 0 records from a SELECT query should create the table with metadata
> -
>
> Key: DRILL-5298
> URL: https://issues.apache.org/jira/browse/DRILL-5298
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata, Query Planning & Optimization
>Affects Versions: 1.9.0
> Environment: MapR 5.2
>Reporter: Senthilkumar
> Fix For: 1.9.0
>
>
> Hello team,
> I create a table in Drill using CTAS as
> CREATE TABLE CTAS_TEST AS SELECT * FROM `hive.default`.`test` WHERE 1 = 0
> It runs successfully.
> But the table is not created, because the SELECT query returns 0 records.
> CTAS should still go ahead and create the table with the column metadata.
> When BI tools fire multi-pass queries, with CTAS in the first query, the 
> subsequent queries fail because of the missing table.
> In databases like SQL Server and Postgres, CTAS will create the table even if 
> the SELECT doesn't return any rows.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5298) CTAS with 0 records from a SELECT query should create the table with metadata

2017-02-27 Thread Senthilkumar (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885375#comment-15885375
 ] 

Senthilkumar commented on DRILL-5298:
-

This issue still exists in 1.9.

If you run the following statement

CREATE TABLE CTAS_TEST AS SELECT * FROM `hive.default`.`test` WHERE 1 = 0

and then run SELECT * FROM CTAS_TEST, it still fails with a "no table" 
exception.
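For contrast, in an engine that creates the table even for a zero-row CTAS, the second-pass SELECT succeeds and simply returns no rows; the "no table" failure only arises when the table was never created. A minimal sketch using Python's sqlite3 (the names below are illustrative, not the Drill setup):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (id INTEGER, name TEXT)")
con.execute("CREATE TABLE ctas_test AS SELECT * FROM test WHERE 1 = 0")

# The second-pass query succeeds and returns an empty result set...
second_pass = con.execute("SELECT * FROM ctas_test").fetchall()

# ...whereas querying a table that was never created raises an error,
# which is the failure the multi-pass BI workflow runs into.
try:
    con.execute("SELECT * FROM missing_table")
    failed = False
except sqlite3.OperationalError:
    failed = True
```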



> CTAS with 0 records from a SELECT query should create the table with metadata
> -
>
> Key: DRILL-5298
> URL: https://issues.apache.org/jira/browse/DRILL-5298
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata, Query Planning & Optimization
>Affects Versions: 1.9.0
> Environment: MapR 5.2
>Reporter: Senthilkumar
> Fix For: 1.9.0
>
>
> Hello team,
> I created a table in Drill using CTAS:
> CREATE TABLE CTAS_TEST AS SELECT * FROM `hive.default`.`test` WHERE 1 = 0
> The statement runs successfully, but the table is not created because the
> SELECT query returns 0 records.
> CTAS should still go ahead and create the table with the column metadata.
> When BI tools fire multi-pass queries, with CTAS in the first query, the
> subsequent queries fail because of a missing table.
> In databases such as SQL Server and Postgres, CTAS creates the table even if
> the SELECT doesn't return any rows.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5298) CTAS with 0 records from a SELECT query should create the table with metadata

2017-02-27 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885370#comment-15885370
 ] 

Khurram Faraaz commented on DRILL-5298:
---

The last comment in DRILL-4517 says that Drill will no longer produce empty 
parquet files.

> CTAS with 0 records from a SELECT query should create the table with metadata
> -
>
> Key: DRILL-5298
> URL: https://issues.apache.org/jira/browse/DRILL-5298
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata, Query Planning & Optimization
>Affects Versions: 1.9.0
> Environment: MapR 5.2
>Reporter: Senthilkumar
> Fix For: 1.9.0
>
>
> Hello team,
> I created a table in Drill using CTAS:
> CREATE TABLE CTAS_TEST AS SELECT * FROM `hive.default`.`test` WHERE 1 = 0
> The statement runs successfully, but the table is not created because the
> SELECT query returns 0 records.
> CTAS should still go ahead and create the table with the column metadata.
> When BI tools fire multi-pass queries, with CTAS in the first query, the
> subsequent queries fail because of a missing table.
> In databases such as SQL Server and Postgres, CTAS creates the table even if
> the SELECT doesn't return any rows.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5298) CTAS with 0 records from a SELECT query should create the table with metadata

2017-02-27 Thread Senthilkumar (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885359#comment-15885359
 ] 

Senthilkumar commented on DRILL-5298:
-

Khurram,

I wanted to know if somebody is already working on it.

> CTAS with 0 records from a SELECT query should create the table with metadata
> -
>
> Key: DRILL-5298
> URL: https://issues.apache.org/jira/browse/DRILL-5298
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata, Query Planning & Optimization
>Affects Versions: 1.9.0
> Environment: MapR 5.2
>Reporter: Senthilkumar
> Fix For: 1.9.0
>
>
> Hello team,
> I created a table in Drill using CTAS:
> CREATE TABLE CTAS_TEST AS SELECT * FROM `hive.default`.`test` WHERE 1 = 0
> The statement runs successfully, but the table is not created because the
> SELECT query returns 0 records.
> CTAS should still go ahead and create the table with the column metadata.
> When BI tools fire multi-pass queries, with CTAS in the first query, the
> subsequent queries fail because of a missing table.
> In databases such as SQL Server and Postgres, CTAS creates the table even if
> the SELECT doesn't return any rows.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)