[GitHub] drill pull request #520: DRILL-3510: Add ANSI_QUOTES option so that Drill's ...

2017-03-20 Thread sudheeshkatkam
Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/520#discussion_r107077042
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillDatabaseMetaDataImpl.java
 ---
@@ -184,6 +184,18 @@ private ServerMeta getServerMeta() throws SQLException 
{
 return serverMeta;
   }
 
+  /**
+   * The same as {@link DrillDatabaseMetaDataImpl#getServerMeta()} but 
updates server metadata for the every call.
+   * Can be used for meta data that can be changed over the session
+   * @return server meta information.
+   * @throws SQLException for error when getting server meta
+   */
+  private ServerMeta getUpdatedServerMeta() throws SQLException {
--- End diff --

unused method?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #520: DRILL-3510: Add ANSI_QUOTES option so that Drill's ...

2017-03-20 Thread sudheeshkatkam
Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/520#discussion_r107077892
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
 ---
@@ -262,6 +266,22 @@ public long 
getParquetRowGroupFilterPushDownThreshold() {
 return 
options.getOption(PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD);
   }
 
+  /**
+   * @return Quoting enum for current quoting identifiers character
+   */
+  public Quoting getQuotingIdentifiers() {
--- End diff --

use local var for `options.getOption(QUOTING_IDENTIFIERS)`
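
For illustration only, the suggestion could look roughly like the sketch below. It assumes the getter matches the option value against the enum constants; `QuotingSketch`, `PlannerSettingsSketch`, the option key and the map-backed options are hypothetical stand-ins, not the classes in the pull request.

```java
import java.util.Map;

// Hypothetical stand-in for the quoting enum; not the real Drill/Calcite class.
enum QuotingSketch {
  BACK_TICK("`"), DOUBLE_QUOTE("\""), BRACKET("[");

  final String string;

  QuotingSketch(String string) {
    this.string = string;
  }
}

// Hypothetical stand-in for PlannerSettings, backed by a plain map of options.
class PlannerSettingsSketch {
  // Assumed option key, used only for this sketch.
  static final String QUOTING_IDENTIFIERS = "quoting_identifiers";

  private final Map<String, String> options;

  PlannerSettingsSketch(Map<String, String> options) {
    this.options = options;
  }

  QuotingSketch getQuotingIdentifiers() {
    // Read the option once into a local variable rather than on every iteration.
    final String quotingCharacter = options.get(QUOTING_IDENTIFIERS);
    for (QuotingSketch value : QuotingSketch.values()) {
      if (value.string.equals(quotingCharacter)) {
        return value;
      }
    }
    throw new IllegalArgumentException("Unknown quoting character: " + quotingCharacter);
  }
}
```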




[GitHub] drill pull request #520: DRILL-3510: Add ANSI_QUOTES option so that Drill's ...

2017-03-20 Thread sudheeshkatkam
Github user sudheeshkatkam commented on a diff in the pull request:

https://github.com/apache/drill/pull/520#discussion_r107076868
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserSession.java ---
@@ -125,6 +123,15 @@ public Builder setSupportComplexTypes(boolean 
supportComplexTypes) {
 }
 
 public UserSession build() {
+  if (userSession.properties != null && 
userSession.properties.containsKey(DrillProperties.QUOTING_IDENTIFIERS)) {
--- End diff --

`properties` is already non-null.




[jira] [Created] (DRILL-5372) IntervalVector.getObject() returns non-normalized Period

2017-03-20 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5372:
--

 Summary: IntervalVector.getObject() returns non-normalized Period
 Key: DRILL-5372
 URL: https://issues.apache.org/jira/browse/DRILL-5372
 Project: Apache Drill
  Issue Type: Bug
Reporter: Paul Rogers
Priority: Minor


The Drill {{IntervalVector.Accessor}} class provides a method to return the 
interval as a Joda {{Period}}:

{code}
public Period getObject(int index) {
  final int offsetIndex = index * 16;
  final int months  = data.getInt(offsetIndex);
  final int days= data.getInt(offsetIndex + 4);
  final int millis = data.getInt(offsetIndex + 8);
  final Period p = new Period();
  return p.plusMonths(months).plusDays(days).plusMillis(millis);
}
{code}

This method returns the {{Period}} in non-normalized format. That is, the 
months field can contain a month count greater than 12 (rather than the count 
being split across the year and month fields).

Similarly, the millisecond field contains the entire time portion, rather than 
setting the hour, minute, second and ms fields.

What seems to be happening is that the code uses {{Period}} as a handy way to 
represent the Drill type rather than effectively converting the Drill type to 
the Joda {{Period}} format.

The workaround is to call the {{.normalizedStandard()}} method on the returned 
value:

{code}
IntervalVector.Accessor accessor = ...
Period p = accessor.getObject(rowNo).normalizedStandard();
{code}
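
To make "normalized" concrete, here is a minimal sketch of the field split using plain integer arithmetic. It does not use Joda; the class and method names are hypothetical, it assumes non-negative values, 12 months per year and fixed-length hours, and Joda's own {{normalizedStandard()}} rules may differ in details.

```java
// Hypothetical illustration of splitting raw interval fields; not Drill or Joda code.
class PeriodNormalizationSketch {

  /** Splits a raw month count into {years, months}, assuming 12 months per year. */
  static int[] splitMonths(int months) {
    return new int[] { months / 12, months % 12 };
  }

  /** Splits a raw millisecond count into {hours, minutes, seconds, millis}. */
  static int[] splitMillis(int millis) {
    int hours = millis / 3_600_000;
    int rest = millis % 3_600_000;
    int minutes = rest / 60_000;
    rest %= 60_000;
    return new int[] { hours, minutes, rest / 1_000, rest % 1_000 };
  }

  public static void main(String[] args) {
    int[] ym = splitMonths(14);          // 14 months -> 1 year, 2 months
    int[] hmsm = splitMillis(3_723_004); // -> 1 h, 2 min, 3 s, 4 ms
    System.out.printf("%dY %dM, %dh %dm %ds %dms%n",
        ym[0], ym[1], hmsm[0], hmsm[1], hmsm[2], hmsm[3]);
  }
}
```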



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5371) Large run-time overhead for nested SELECT queries

2017-03-20 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5371:
--

 Summary: Large run-time overhead for nested SELECT queries
 Key: DRILL-5371
 URL: https://issues.apache.org/jira/browse/DRILL-5371
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.10.0
Reporter: Paul Rogers


See DRILL-5370 - a test in which Drill was stress-tested with nested SELECT 
queries of ever-increasing size.

Semantically, the query does nothing other than:

SELECT a AS b AS c AS ... AS z FROM foo;

The above is not valid SQL, of course, but it shows that the nested SELECTs do 
nothing other than create static aliases for columns, and do so many times via 
layers of nested SELECTs.

{code}
SELECT y AS z FROM
(SELECT x AS y FROM
(SELECT w AS x FROM ...
   (SELECT a FROM someTable...))
{code}

Because the nested SELECTs do no actual processing and only impose aliases, the 
optimizer should be able to optimize away the aliasing. That is, there should 
be no need for any run-time work to simply change the name of a column.

However, when run (with 200 columns, each with 500 character names, but only 10 
rows), the overhead in a debug build is somewhere between 1/2 and 1 second per 
nesting.

That is, for just 10 rows, each layer of nested SELECT adds about 1 second to 
the execution time.

Queries of this form may be pathological if written by humans. But, they are 
typical of queries generated by BI tools. Hence, Drill performance for such 
tools can be increased simply by avoiding doing unnecessary work.





[jira] [Created] (DRILL-5370) Drillbit dies for 5 MB SELECT statement

2017-03-20 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-5370:
--

 Summary: Drillbit dies for 5 MB SELECT statement
 Key: DRILL-5370
 URL: https://issues.apache.org/jira/browse/DRILL-5370
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Paul Rogers


Some community users use Drill with BI tools that generate queries. One such 
tool generates queries that map Drill data into a "cube" format for a 
cube-based visualization engine. Such tools tend to create very large, very 
complex queries.

In replicating an issue found by this user, I created a simple program that 
creates deeply-nested queries of the form:

SELECT a99 AS a98 FROM (SELECT a97 AS a98 FROM(… SELECT a1 FROM myTable)…))

The test used 200 columns, each with a name 500 characters long. (Drill has a 
hard limit of 1024 characters for a symbol name.)

The setup was an embedded Drillbit using the new "cluster fixture" test 
framework. The test ran multiple iterations, each wrapping the prior SELECT in 
a new one as shown above. The result is a series of queries that grew in size 
by about 100K each iteration.
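
A generator of this kind can be sketched as follows. This is not the original test program: the class name, the column-naming scheme and the {{myTable}} base table are assumptions made for illustration, and the example uses tiny sizes rather than 200 columns of 500 characters.

```java
// Hypothetical sketch of a nested-SELECT generator; not the program used in the test.
class NestedSelectGenerator {

  /** Builds a column name: prefix plus indexes, padded with 'x' to at least nameLength. */
  static String columnName(int level, int col, int nameLength) {
    StringBuilder sb = new StringBuilder("c" + level + "_" + col + "_");
    while (sb.length() < nameLength) {
      sb.append('x');
    }
    return sb.toString();
  }

  /** Wraps innerQuery in one more SELECT that only renames each column. */
  static String wrap(String innerQuery, int level, int columns, int nameLength) {
    StringBuilder sb = new StringBuilder("SELECT ");
    for (int c = 0; c < columns; c++) {
      if (c > 0) {
        sb.append(", ");
      }
      sb.append(columnName(level - 1, c, nameLength))
        .append(" AS ")
        .append(columnName(level, c, nameLength));
    }
    return sb.append(" FROM (").append(innerQuery).append(")").toString();
  }

  /** Builds the innermost SELECT, then wraps it the requested number of times. */
  static String build(int levels, int columns, int nameLength) {
    StringBuilder base = new StringBuilder("SELECT ");
    for (int c = 0; c < columns; c++) {
      if (c > 0) {
        base.append(", ");
      }
      base.append(columnName(0, c, nameLength));
    }
    String query = base.append(" FROM myTable").toString();
    for (int level = 1; level <= levels; level++) {
      query = wrap(query, level, columns, nameLength);
    }
    return query;
  }

  public static void main(String[] args) {
    // Small sizes so the output stays readable.
    System.out.println(build(2, 2, 6));
  }
}
```

Each call to {{wrap}} adds one nesting level, so the query text grows by a roughly constant amount per iteration, which matches the growth pattern described above.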

Drill handled SELECT statements up to 5 MB in size, after which the Drillbit 
ran out of heap memory, suffered a fatal exception and exited.

One question is why a 5 MB query exhausted multiple GB of heap during query 
parsing and planning.

But, more importantly, Drill should have some way to protect itself from such 
failures. In a production cluster, heap exhaustion will bring down all 
in-flight queries and require a manual restart of the Drillbit.

So, Drill should enforce some limit on the amount of heap memory used by a 
query during the parsing and planning process.

The community user found a failure at around 1 MB, but they very likely had a 
query with much more complex structure than the simple nested-SELECT used in my 
test.





[GitHub] drill issue #791: DRILL-5369: Add initializer for ServerMetaContext

2017-03-20 Thread superbstreak
Github user superbstreak commented on the issue:

https://github.com/apache/drill/pull/791
  
Thanks @laurentgo for the fix!




[jira] [Created] (DRILL-5369) Missing initialization for ServerMetaContext

2017-03-20 Thread Laurent Goujon (JIRA)
Laurent Goujon created DRILL-5369:
-

 Summary: Missing initialization for ServerMetaContext
 Key: DRILL-5369
 URL: https://issues.apache.org/jira/browse/DRILL-5369
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - C++
Reporter: Laurent Goujon
Assignee: Laurent Goujon


{{ServerMetaContext}} is not initialized properly, which might cause 
unexpected issues in some cases (such as getMetadata() returning before the 
answer is received).





[GitHub] drill pull request #790: DRILL-5368: Fix memory leak issue in DrillClientImp...

2017-03-20 Thread laurentgo
GitHub user laurentgo opened a pull request:

https://github.com/apache/drill/pull/790

DRILL-5368: Fix memory leak issue in 
DrillClientImpl::processServerMetaResult

Fix a small memory leak by doing local allocation instead since the
object doesn't escape the function.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/laurentgo/drill laurent/DRILL-5368

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/790.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #790


commit a2e209e772d9613bdc081d7044f66aaeb7338d28
Author: Laurent Goujon 
Date:   2017-03-20T17:55:17Z

DRILL-5368: Fix memory leak issue in 
DrillClientImpl::processServerMetaResult

Fix a small memory leak by doing local allocation instead since the
object doesn't escape the function.






[jira] [Created] (DRILL-5368) Memory leak in C++ server metadata handler

2017-03-20 Thread Laurent Goujon (JIRA)
Laurent Goujon created DRILL-5368:
-

 Summary: Memory leak in C++ server metadata handler
 Key: DRILL-5368
 URL: https://issues.apache.org/jira/browse/DRILL-5368
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - C++
Affects Versions: 1.10.0
Reporter: Laurent Goujon
Assignee: Laurent Goujon
Priority: Minor


When a server metadata response is received, a protobuf ServerMetaResp object 
is dynamically allocated but never freed.

Because this handler does not need to keep the instance attached to it (the 
content is copied over by the MetaData class), a reference is enough and the 
allocation can be done on the stack.





[GitHub] drill pull request #777: DRILL-5330: NPE in FunctionImplementationRegistry

2017-03-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/777




[GitHub] drill pull request #767: DRILL-5226: Managed external sort fixes

2017-03-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/767




[GitHub] drill pull request #786: DRILL-5359: Fix ClassCastException when Drill pushe...

2017-03-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/786




[GitHub] drill pull request #782: DRILL-5352: Profile parser printing for multi fragm...

2017-03-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/782




[GitHub] drill pull request #772: DRILL-5316: Check drillbits size before we attempt ...

2017-03-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/772




[GitHub] drill pull request #770: DRILL-5311: Check handshake result in C++ connector

2017-03-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/770




[GitHub] drill pull request #780: DRILL-5349: Fix TestParquetWriter unit tests when s...

2017-03-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/780




[GitHub] drill issue #782: DRILL-5352: Profile parser printing for multi fragments

2017-03-20 Thread amansinha100
Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/782
  
+1




[GitHub] drill issue #780: DRILL-5349: Fix TestParquetWriter unit tests when synchron...

2017-03-20 Thread amansinha100
Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/780
  
+1




[GitHub] drill issue #772: DRILL-5316: Check drillbits size before we attempt to acce...

2017-03-20 Thread amansinha100
Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/772
  
+1




[GitHub] drill issue #767: DRILL-5226: Managed external sort fixes

2017-03-20 Thread amansinha100
Github user amansinha100 commented on the issue:

https://github.com/apache/drill/pull/767
  
+1




[jira] [Created] (DRILL-5367) Join query returns wrong results

2017-03-20 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-5367:
-

 Summary: Join query returns wrong results
 Key: DRILL-5367
 URL: https://issues.apache.org/jira/browse/DRILL-5367
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.10.0
 Environment: 3 node cluster
Reporter: Khurram Faraaz


Join query returns wrong results

Drill 1.10.0 does not return any results.
{noformat}
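0: jdbc:drill:schema=dfs.tmp> SELECT * FROM using_f1 JOIN (SELECT * FROM using_f2) foo USING(col_prime);
+---------+------------+------------+-------------+---------+-----------+----------+-------------+-------------+--------------+----------+------------+
| col_dt  | col_state  | col_prime  | col_varstr  | col_id  | col_name  | col_dt0  | col_state0  | col_prime0  | col_varstr0  | col_id0  | col_name0  |
+---------+------------+------------+-------------+---------+-----------+----------+-------------+-------------+--------------+----------+------------+
+---------+------------+------------+-------------+---------+-----------+----------+-------------+-------------+--------------+----------+------------+
No rows selected (0.314 seconds)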
{noformat}

Explain plan for the above failing query:

{noformat}
0: jdbc:drill:schema=dfs.tmp> explain plan for SELECT * FROM using_f1 JOIN (SELECT * FROM using_f2) foo USING(col_prime);
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      ProjectAllowDup(*=[$0], *0=[$1])
00-02        Project(T49¦¦*=[$0], T48¦¦*=[$2])
00-03          Project(T49¦¦*=[$1], col_prime=[$2], T48¦¦*=[$0])
00-04            HashJoin(condition=[=($2, $0)], joinType=[inner])
00-06              Project(T48¦¦*=[$0])
00-08                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/using_f2]], selectionRoot=maprfs:/tmp/using_f2, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
00-05              Project(T49¦¦*=[$0], col_prime=[$1])
00-07                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/using_f1]], selectionRoot=maprfs:/tmp/using_f1, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
{noformat}

Whereas Postgres 9.3 returns expected results for the same data.

{noformat}
postgres=# SELECT * FROM using_f1 JOIN (SELECT * FROM using_f2) foo USING(col_prime);
 col_prime | col_dt | col_state | col_varstr | col_id | col_name | col_dt | col_state | col_varstr | col_id | col_name
-----------+--------+-----------+------------+--------+----------+--------+-----------+------------+--------+----------
 103 | 2014-12-24 | TX | LUW2QzWGdJfnxHrqm3vwyndzRBFwH8l5xVDaM3hTiZAanpj | 19462 | Julie Lennox | 1990-01-11 | WV | KKzEOgle6E5hNANduNAAIp9DQnGLGxO | 54217 | Derek Wilson
 103 | 1985-07-18 | CA | aYQ2uLpPxebGGRvcX0fahrAOO4yhkDRvMPES6PuYsIfwkUMrcq6NSdt0j | 48987 | Lillian Lupo | 1990-01-11 | WV | KKzEOgle6E5hNANduNAAIp9DQnGLGxO | 54217 | Derek Wilson
 103 | 1988-02-27 | SC | OcVKheHMyeKLgcvamrJHUxKyCGGJGci3Y9ht2LI9T5Ek1nwckB | 52840 | Martha Rose | 1990-01-11 | WV | KKzEOgle6E5hNANduNAAIp9DQnGLGxO | 54217 | Derek Wilson
 211 | 1989-12-06 | SD | HHlmvV4 | 1131 | Kenneth Hayes | 1989-05-31 | MT | yhHfCGaCqnArXUCD4jRoZQ4fj6IQIKZHUGLlIsSr1L7voCE3lEmj3DOSFqJ0Kq | 49191 | Joan Stein
 43 | 2006-01-24 | NV | EJAN2JjRqoQWgp7rHLT1yPMBR50g1Kil3klu1vPritFKB25EjmL1tLXleagAP | 30179 | William Strassel | 2006-03-02 | MI | W9G0nWo8QNtHr9YxOscigPbtXEtNPZ | 44849 | Catherine Turner
 193 | 1990-01-14 | NV | 9nd3po1bnyasqINVA | 47775 | James Walters
...
1990-01-14 | NV | 9nd3po1bnyasqINVA | 47775 | James Walters | 1980-04-22 | ID | jR8jr1lqDprUFPhAX4xZnulndYNd3 | 5876 | Rosie Johnson
 5 | 2004-01-27 | KS | 0A8Gwqm66k6wQ1KzcUdSQKZU3AZtPImxb8 | 57787 | Dean Salazar | 1997-09-13 | SC | uq35Sqf1GfPtIV1mE2CzwxKaX | 17041 | Dorothy Paulsen
 5 | 1999-07-12 | UT | hQk9DBx1egLNIpi9btvN7GPewgvPROWaNArsxAbRILW3dNfwi526 | 38130 | Beverly Flores | 1997-09-13 | SC | uq35Sqf1GfPtIV1mE2CzwxKaX | 17041 | Dorothy Paulsen
(239 rows)
{noformat}


