[jira] [Commented] (DRILL-7242) Query with range predicate hits IOBE when accessing histogram buckets

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/DRILL-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839009#comment-16839009 ]

ASF GitHub Bot commented on DRILL-7242:
---

gparai commented on pull request #1785: DRILL-7242: Handle additional boundary cases and compute better estim…
URL: https://github.com/apache/drill/pull/1785#discussion_r283598318
 
 

 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillStatsTable.java
 ##
 @@ -483,7 +483,7 @@ public static ObjectMapper getMapper() {
   Map statisticsValues = new HashMap<>();
   Double ndv = statsProvider.getNdv(fieldName);
   if (ndv != null) {
-statisticsValues.put(ColumnStatisticsKind.NVD, ndv);
+statisticsValues.put(ColumnStatisticsKind.NDV, ndv);
 
 Review comment:
   I thought I had fixed the typo in an earlier PR. Not sure why it is still there.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Query with range predicate hits IOBE when accessing histogram buckets
> -
>
> Key: DRILL-7242
> URL: https://issues.apache.org/jira/browse/DRILL-7242
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.16.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.17.0
>
>
> The following query hits an IOBE during histogram access (make sure to run the
> ANALYZE command before running this query):
> {noformat}
>  select 1 from dfs.tmp.employee where store_id > 24;
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 11
>   at 
> org.apache.drill.exec.planner.common.NumericEquiDepthHistogram.getSelectedRows(NumericEquiDepthHistogram.java:215)
>  ~[drill-java-exec-1.16.0.0-mapr.jar:1.16.0.0-mapr]
>   at 
> org.apache.drill.exec.planner.common.NumericEquiDepthHistogram.estimatedSelectivity(NumericEquiDepthHistogram.java:130)
>  ~[drill-java-exec-1.16.0.0-mapr.jar:1.16.0.0-mapr]
>   at 
> org.apache.drill.exec.planner.cost.DrillRelMdSelectivity.computeRangeSelectivity(DrillRelMd
> {noformat}
> Here, 24.0 is the end point of the last histogram bucket, and the boundary
> condition is not being handled correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
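The off-by-one described above can be reproduced in isolation. The sketch below is a hypothetical, simplified version of the bucket lookup (not Drill's actual code): with half-open bucket intervals, a predicate value equal to the last bucket endpoint matches no interval, and the fraction computation then indexes one past the end of the endpoints array, just as the stack trace shows.

```java
public class BucketBoundary {
  // Simplified stand-in for the buggy containing-bucket search: intervals are
  // half-open [buckets[i], buckets[i+1]), so a value equal to the very last
  // endpoint matches no interval and falls through to an out-of-range index.
  static int getContainingBucket(double v, Double[] buckets) {
    for (int i = 0; i < buckets.length - 1; i++) {
      if (v >= buckets[i] && v < buckets[i + 1]) {
        return i;
      }
    }
    return buckets.length - 1;  // buggy fall-through for v == last endpoint
  }

  public static void main(String[] args) {
    Double[] buckets = {0.0, 8.0, 16.0, 24.0};  // 3 buckets, last endpoint 24.0
    int b = getContainingBucket(24.0, buckets);
    try {
      // fraction of the bucket selected by the predicate; reads buckets[b + 1]
      double fraction = (buckets[b + 1] - 24.0) / (buckets[b + 1] - buckets[b]);
      System.out.println("fraction = " + fraction);
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("AIOOBE at index " + (b + 1));  // prints: AIOOBE at index 4
    }
  }
}
```

The fix in PR #1785 handles the equality case explicitly before the lookup, so the value is either rejected (open bound) or clamped into the last bucket.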


[jira] [Commented] (DRILL-7242) Query with range predicate hits IOBE when accessing histogram buckets

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/DRILL-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839000#comment-16839000 ]

ASF GitHub Bot commented on DRILL-7242:
---

amansinha100 commented on pull request #1785: DRILL-7242: Handle additional boundary cases and compute better estim…
URL: https://github.com/apache/drill/pull/1785#discussion_r283596709
 
 

 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/NumericEquiDepthHistogram.java
 ##
 @@ -178,101 +185,136 @@ public Double estimatedSelectivity(final RexNode columnFilter, final long totalR
     return currentRange;
   }
 
-  private long getSelectedRows(final Range range) {
-    final int numBuckets = buckets.length - 1;
+  @VisibleForTesting
+  protected long getSelectedRows(final Range range) {
     double startBucketFraction = 1.0;
     double endBucketFraction = 1.0;
     long numRows = 0;
     int result;
     Double lowValue = null;
     Double highValue = null;
-    final int first = 0;
-    final int last = buckets.length - 1;
-    int startBucket = first;
-    int endBucket = last;
+    final int firstStartPointIndex = 0;
+    final int lastEndPointIndex = buckets.length - 1;
+    int startBucket = firstStartPointIndex;
+    int endBucket = lastEndPointIndex - 1;
 
     if (range.hasLowerBound()) {
       lowValue = (Double) range.lowerEndpoint();
 
-      // if low value is greater than the end point of the last bucket then none of the rows qualify
-      if (lowValue.compareTo(buckets[last]) > 0) {
+      // if low value is greater than the end point of the last bucket or if it is equal but the range is open (i.e
+      // predicate is of type > 5 where 5 is the end point of last bucket) then none of the rows qualify
+      result = lowValue.compareTo(buckets[lastEndPointIndex]);
+      if (result > 0 || result == 0 && range.lowerBoundType() == BoundType.OPEN) {
         return 0;
       }
-
-      result = lowValue.compareTo(buckets[first]);
+      result = lowValue.compareTo(buckets[firstStartPointIndex]);
 
       // if low value is less than or equal to the first bucket's start point then start with the first bucket and all
       // rows in first bucket are included
       if (result <= 0) {
-        startBucket = first;
+        startBucket = firstStartPointIndex;
         startBucketFraction = 1.0;
       } else {
         // Use a simplified logic where we treat > and >= the same when computing selectivity since the
         // difference is going to be very small for reasonable sized data sets
-        startBucket = getContainingBucket(lowValue, numBuckets);
+        startBucket = getContainingBucket(lowValue, lastEndPointIndex, true);
+
         // expecting start bucket to be >= 0 since other conditions have been handled previously
         Preconditions.checkArgument(startBucket >= 0, "Expected start bucket id >= 0");
-        startBucketFraction = ((double) (buckets[startBucket + 1] - lowValue)) / (buckets[startBucket + 1] - buckets[startBucket]);
+
+        if (buckets[startBucket + 1].doubleValue() == buckets[startBucket].doubleValue()) {
+          // if start and end points of the bucket are the same, consider entire bucket
+          startBucketFraction = 1.0;
+        } else {
+          startBucketFraction = ((double) (buckets[startBucket + 1] - lowValue)) / (buckets[startBucket + 1] - buckets[startBucket]);
+        }
       }
     }
 
     if (range.hasUpperBound()) {
       highValue = (Double) range.upperEndpoint();
 
-      // if the high value is less than the start point of the first bucket then none of the rows qualify
-      if (highValue.compareTo(buckets[first]) < 0) {
+      // if the high value is less than the start point of the first bucket or if it is equal but the range is open (i.e
+      // predicate is of type < 1 where 1 is the start point of the first bucket) then none of the rows qualify
+      result = highValue.compareTo(buckets[firstStartPointIndex]);
+      if (result < 0 || (result == 0 && range.upperBoundType() == BoundType.OPEN)) {
         return 0;
       }
 
-      result = highValue.compareTo(buckets[last]);
+      result = highValue.compareTo(buckets[lastEndPointIndex]);
 
       // if high value is greater than or equal to the last bucket's end point then include the last bucket and all rows in
       // last bucket qualify
       if (result >= 0) {
-        endBucket = last;
+        endBucket = lastEndPointIndex - 1;
         endBucketFraction = 1.0;
       } else {
         // Use a simplified logic where we treat < and <= the same when computing selectivity since the
         // difference is going to be very small for reasonable sized data sets
-        endBucket = getContainingBucket(highValue, numBuckets);
+        endBucket = getContainingBucket(highValue, lastEndPointIndex, false);
+
         // expecting end bucket to be >= 0 since other 
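The boundary handling in the patch above can be condensed into a self-contained sketch (a simplification under assumed semantics, not Drill's actual implementation): `buckets` holds N+1 endpoints for N equi-depth buckets of `rowsPerBucket` rows each, an open bound sitting exactly on the histogram's extreme endpoint selects nothing, and partially covered buckets contribute a fraction of their rows.

```java
public class EquiDepthSelectivity {
  // Sketch of the fixed boundary handling. buckets holds N+1 endpoints for
  // N equi-depth buckets; rowsPerBucket rows sit in each bucket.
  static long selectedRows(double low, boolean lowOpen, double high, boolean highOpen,
                           double[] buckets, long rowsPerBucket) {
    int lastEnd = buckets.length - 1;
    // entirely outside the histogram range (or open bound exactly on the
    // extreme endpoint, e.g. "> 24" when 24 is the last endpoint) -> no rows
    if (low > buckets[lastEnd] || (low == buckets[lastEnd] && lowOpen)) return 0;
    if (high < buckets[0] || (high == buckets[0] && highOpen)) return 0;

    int startBucket = 0;
    double startFraction = 1.0;
    if (low > buckets[0]) {
      startBucket = containing(low, buckets);
      startFraction = (buckets[startBucket + 1] - low)
          / (buckets[startBucket + 1] - buckets[startBucket]);
    }
    int endBucket = lastEnd - 1;
    double endFraction = 1.0;
    if (high < buckets[lastEnd]) {
      endBucket = containing(high, buckets);
      endFraction = (high - buckets[endBucket])
          / (buckets[endBucket + 1] - buckets[endBucket]);
    }
    if (startBucket == endBucket) {
      // range falls within one bucket: overlap of the two fractions
      return Math.round((startFraction + endFraction - 1.0) * rowsPerBucket);
    }
    long rows = Math.round(startFraction * rowsPerBucket)
        + Math.round(endFraction * rowsPerBucket);
    rows += (long) (endBucket - startBucket - 1) * rowsPerBucket; // full middle buckets
    return rows;
  }

  // clamp a value equal to the last endpoint into the last bucket (the DRILL-7242 fix)
  static int containing(double v, double[] buckets) {
    for (int i = 0; i < buckets.length - 2; i++) {
      if (v >= buckets[i] && v < buckets[i + 1]) return i;
    }
    return buckets.length - 2;
  }

  public static void main(String[] args) {
    double[] b = {0.0, 8.0, 16.0, 24.0};
    System.out.println(selectedRows(24.0, true, Double.MAX_VALUE, false, b, 100));  // prints 0
    System.out.println(selectedRows(12.0, false, Double.MAX_VALUE, false, b, 100)); // prints 150
  }
}
```

The first call mirrors the failing query `store_id > 24`: an open lower bound on the last endpoint now returns zero rows instead of walking past the end of the array.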

[jira] [Commented] (DRILL-7242) Query with range predicate hits IOBE when accessing histogram buckets

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/DRILL-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839001#comment-16839001 ]

ASF GitHub Bot commented on DRILL-7242:
---

amansinha100 commented on issue #1785: DRILL-7242: Handle additional boundary cases and compute better estim…
URL: https://github.com/apache/drill/pull/1785#issuecomment-492045583
 
 
   @gparai I have addressed your review comments. Could you please take another look?
 



(v7.6.3#76005)


[jira] [Commented] (DRILL-7242) Query with range predicate hits IOBE when accessing histogram buckets

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/DRILL-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838999#comment-16838999 ]

ASF GitHub Bot commented on DRILL-7242:
---

amansinha100 commented on pull request #1785: DRILL-7242: Handle additional boundary cases and compute better estim…
URL: https://github.com/apache/drill/pull/1785#discussion_r283596694
 
 

 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/planner/common/TestNumericEquiDepthHistogram.java
 ##
 @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.common;
+
+import org.apache.drill.categories.PlannerTest;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.BoundType;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.Assert;
+import org.apache.drill.shaded.guava.com.google.common.collect.Range;
+
+
+@Category(PlannerTest.class)
+public class TestNumericEquiDepthHistogram {
+
+  @Test
+  public void testHistogramWithUniqueEndpoints() throws Exception {
+int numBuckets = 10;
+int numRowsPerBucket = 250;
+
+// init array with numBuckets + 1 values
+Double[] buckets = {1.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0};
+
+NumericEquiDepthHistogram histogram = new NumericEquiDepthHistogram(numBuckets);
+
+for (int i = 0; i < buckets.length; i++) {
+  histogram.setBucketValue(i, buckets[i]);
+}
+histogram.setNumRowsPerBucket(numRowsPerBucket);
+
+// Range: <= 1.0
 
 Review comment:
   Done
 





[jira] [Assigned] (DRILL-7256) Query over empty Hive tables fails, we will need to print heap usagePercent details in error message

2019-05-13 Thread Khurram Faraaz (JIRA)


 [ https://issues.apache.org/jira/browse/DRILL-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Khurram Faraaz reassigned DRILL-7256:
-

Assignee: Khurram Faraaz

> Query over empty Hive tables fails, we will need to print heap usagePercent 
> details in error message
> 
>
> Key: DRILL-7256
> URL: https://issues.apache.org/jira/browse/DRILL-7256
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.15.0
>Reporter: Khurram Faraaz
>Assignee: Khurram Faraaz
>Priority: Major
>
> The query below, run from Drill's web UI against Hive tables, failed due to
> insufficient heap memory.
> It fails intermittently from the Drill web UI; note that the two Hive tables
> used in the query are empty, i.e. they contain no data. The query
> does not fail when run from sqlline.
> The error message does not provide information about the heap usagePercent.
> It would be useful to include heap usagePercent information in the
> error message in QueryWrapper.java when usagePercent >
> HEAP_MEMORY_FAILURE_THRESHOLD.
> Drill 1.15.0
> Failing query.
> {noformat}
> SELECT a.event_id
>  FROM hive.cust_bhsf_ce_blob a, hive.t_fct_clinical_event b
>  where 
>  a.event_id=b.event_id
>  and a.blob_contents not like '%dd:contenttype="TESTS"%'
>  and b.EVENT_RELATIONSHIP_CD='B'
> and b.EVENT_CLASS_CD in ('DOC')
> and b.entry_mode_cd='Web'
> and b.RECORD_STATUS_CD='Active'
> and b.RESULT_STATUS_CD ='Auth (Verified)'
> and substring(b.valid_until_dt_tm,1,10) >='2017-12-30'
> and substring(b.event_end_date,1,10) >='2018-01-01'
> {noformat}
> Stack trace from drillbit.log 
> {noformat}
> 2019-05-09 16:25:58,472 [qtp1934687-790] ERROR 
> o.a.d.e.server.rest.QueryResources - Query from Web UI Failed
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: There is 
> not enough heap memory to run this query using the web interface.
> Please try a query with fewer columns or with a filter or limit condition to 
> limit the data returned.
> You can also try an ODBC/JDBC client.
> [Error Id: 91668f42-d88e-426b-b1fe-c0d042700500 ]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
>  ~[drill-common-1.15.0.5-mapr.jar:1.15.0.5-mapr]
>  at org.apache.drill.exec.server.rest.QueryWrapper.run(QueryWrapper.java:103) 
> ~[drill-java-exec-1.15.0.5-mapr.jar:1.15.0.5-mapr]
>  at 
> org.apache.drill.exec.server.rest.QueryResources.submitQueryJSON(QueryResources.java:72)
>  ~[drill-java-exec-1.15.0.5-mapr.jar:1.15.0.5-mapr]
>  at 
> org.apache.drill.exec.server.rest.QueryResources.submitQuery(QueryResources.java:87)
>  ~[drill-java-exec-1.15.0.5-mapr.jar:1.15.0.5-mapr]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.8.0_151]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[na:1.8.0_151]
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_151]
>  at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_151]
>  at 
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
>  [jersey-server-2.8.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:151)
>  [jersey-server-2.8.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:171)
>  [jersey-server-2.8.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:195)
>  [jersey-server-2.8.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:104)
>  [jersey-server-2.8.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:387)
>  [jersey-server-2.8.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:331)
>  [jersey-server-2.8.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:103)
>  [jersey-server-2.8.jar:na]
>  at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:269) 
> [jersey-server-2.8.jar:na]
>  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) 
> [jersey-common-2.8.jar:na]
>  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) 
> [jersey-common-2.8.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:315) 
> 

[jira] [Created] (DRILL-7256) Query over empty Hive tables fails, we will need to print heap usagePercent details in error message

2019-05-13 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-7256:
-

 Summary: Query over empty Hive tables fails, we will need to print heap usagePercent details in error message
 Key: DRILL-7256
 URL: https://issues.apache.org/jira/browse/DRILL-7256
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.15.0
Reporter: Khurram Faraaz


The query below, run from Drill's web UI against Hive tables, failed due to insufficient heap memory.
It fails intermittently from the Drill web UI; note that the two Hive tables used in the query are empty, i.e. they contain no data. The query does not fail when run from sqlline.

The error message does not provide information about the heap usagePercent.
It would be useful to include heap usagePercent information in the error message in QueryWrapper.java when usagePercent > HEAP_MEMORY_FAILURE_THRESHOLD.
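A minimal sketch of the suggested change, using the standard `java.lang.management` heap beans. The threshold name mirrors the HEAP_MEMORY_FAILURE_THRESHOLD constant mentioned above; its value (85%) and the helper names are hypothetical, not Drill's actual code:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapCheck {
  // Assumed threshold value; in Drill this constant lives in QueryWrapper.java
  static final double HEAP_MEMORY_FAILURE_THRESHOLD = 85.0;

  static double usagePercent(long used, long max) {
    return 100.0 * used / max;
  }

  // Builds the error text with the heap usage detail the report asks for
  static String failureMessage(double usagePercent) {
    return String.format(
        "RESOURCE ERROR: There is not enough heap memory to run this query "
            + "using the web interface. (heap usage: %.0f%%, threshold: %.0f%%)",
        usagePercent, HEAP_MEMORY_FAILURE_THRESHOLD);
  }

  public static void main(String[] args) {
    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    // getMax() can be undefined (-1); fall back to committed size in that case
    long max = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
    double pct = usagePercent(heap.getUsed(), max);
    if (pct > HEAP_MEMORY_FAILURE_THRESHOLD) {
      System.out.println(failureMessage(pct));
    } else {
      System.out.printf("heap usage %.0f%% is below the failure threshold%n", pct);
    }
  }
}
```

Surfacing the measured percentage in the message would let users distinguish a genuinely exhausted heap from an overly conservative threshold.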

Drill 1.15.0

Failing query.
{noformat}
SELECT a.event_id
 FROM hive.cust_bhsf_ce_blob a, hive.t_fct_clinical_event b
 where 
 a.event_id=b.event_id
 and a.blob_contents not like '%dd:contenttype="TESTS"%'
 and b.EVENT_RELATIONSHIP_CD='B'
and b.EVENT_CLASS_CD in ('DOC')
and b.entry_mode_cd='Web'
and b.RECORD_STATUS_CD='Active'
and b.RESULT_STATUS_CD ='Auth (Verified)'
and substring(b.valid_until_dt_tm,1,10) >='2017-12-30'
and substring(b.event_end_date,1,10) >='2018-01-01'
{noformat}

Stack trace from drillbit.log 
{noformat}
2019-05-09 16:25:58,472 [qtp1934687-790] ERROR 
o.a.d.e.server.rest.QueryResources - Query from Web UI Failed
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: There is not 
enough heap memory to run this query using the web interface.

Please try a query with fewer columns or with a filter or limit condition to 
limit the data returned.
You can also try an ODBC/JDBC client.

[Error Id: 91668f42-d88e-426b-b1fe-c0d042700500 ]
 at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
 ~[drill-common-1.15.0.5-mapr.jar:1.15.0.5-mapr]
 at org.apache.drill.exec.server.rest.QueryWrapper.run(QueryWrapper.java:103) 
~[drill-java-exec-1.15.0.5-mapr.jar:1.15.0.5-mapr]
 at 
org.apache.drill.exec.server.rest.QueryResources.submitQueryJSON(QueryResources.java:72)
 ~[drill-java-exec-1.15.0.5-mapr.jar:1.15.0.5-mapr]
 at 
org.apache.drill.exec.server.rest.QueryResources.submitQuery(QueryResources.java:87)
 ~[drill-java-exec-1.15.0.5-mapr.jar:1.15.0.5-mapr]
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_151]
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[na:1.8.0_151]
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[na:1.8.0_151]
 at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_151]
 at 
org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
 [jersey-server-2.8.jar:na]
 at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:151)
 [jersey-server-2.8.jar:na]
 at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:171)
 [jersey-server-2.8.jar:na]
 at 
org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:195)
 [jersey-server-2.8.jar:na]
 at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:104)
 [jersey-server-2.8.jar:na]
 at 
org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:387)
 [jersey-server-2.8.jar:na]
 at 
org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:331)
 [jersey-server-2.8.jar:na]
 at 
org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:103)
 [jersey-server-2.8.jar:na]
 at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:269) 
[jersey-server-2.8.jar:na]
 at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) 
[jersey-common-2.8.jar:na]
 at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) 
[jersey-common-2.8.jar:na]
 at org.glassfish.jersey.internal.Errors.process(Errors.java:315) 
[jersey-common-2.8.jar:na]
 at org.glassfish.jersey.internal.Errors.process(Errors.java:297) 
[jersey-common-2.8.jar:na]
 at org.glassfish.jersey.internal.Errors.process(Errors.java:267) 
[jersey-common-2.8.jar:na]
 at 
org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:297)
 [jersey-common-2.8.jar:na]
 at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:252) 
[jersey-server-2.8.jar:na]
 at 

[jira] [Commented] (DRILL-7191) RM blobs persistence in Zookeeper for Distributed RM

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/DRILL-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838875#comment-16838875 ]

ASF GitHub Bot commented on DRILL-7191:
---

HanumathRao commented on pull request #1762: [DRILL-7191 / DRILL-7026]: RM state blob persistence in Zookeeper and Integration of Distributed queue configuration with Planner
URL: https://github.com/apache/drill/pull/1762#discussion_r283541289
 
 

 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/resourcemgr/rmblobmgr/RMConsistentBlobStoreManager.java
 ##
 @@ -0,0 +1,354 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.resourcemgr.rmblobmgr;
+
+import avro.shaded.com.google.common.annotations.VisibleForTesting;
+import com.fasterxml.jackson.core.JsonGenerator;
+import com.fasterxml.jackson.core.JsonParser;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.SerializationFeature;
+import com.fasterxml.jackson.databind.module.SimpleModule;
+import org.apache.curator.framework.recipes.locks.InterProcessMutex;
+import org.apache.drill.common.scanner.persistence.ScanResult;
+import org.apache.drill.exec.coord.zk.ZKClusterCoordinator;
+import org.apache.drill.exec.exception.StoreException;
+import org.apache.drill.exec.resourcemgr.NodeResources;
+import org.apache.drill.exec.resourcemgr.NodeResources.NodeResourcesDe;
+import org.apache.drill.exec.resourcemgr.config.QueryQueueConfig;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.exception.LeaderChangeException;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.exception.RMBlobUpdateException;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.exception.ResourceUnavailableException;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.rmblob.ClusterStateBlob;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.rmblob.ForemanQueueUsageBlob;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.rmblob.ForemanResourceUsage;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.rmblob.ForemanResourceUsage.ForemanResourceUsageDe;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.rmblob.QueueLeadershipBlob;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.rmblob.RMStateBlob;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.sys.PersistentStoreConfig;
+import org.apache.drill.exec.store.sys.store.ZookeeperTransactionalPersistenceStore;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+/**
+ * RM state blobs manager which performs all updates to the blobs under a global lock and in a transactional manner.
+ * Since the blobs are updated by multiple Drillbits at the same time, it uses a global lock shared across all the
+ * Drillbits to maintain strongly consistent information in these blobs.
+ */
+public class RMConsistentBlobStoreManager implements RMBlobStoreManager {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RMConsistentBlobStoreManager.class);
+
+  private static final String RM_BLOBS_ROOT = "rm/blobs";
+
+  private static final String RM_LOCK_ROOT = "/rm/locks";
+
+  private static final String RM_BLOB_GLOBAL_LOCK_NAME = "/rm_blob_lock";
+
+  private static final String RM_BLOB_SER_DE_NAME = "RMStateBlobSerDeModules";
+
+  public static final int RM_STATE_BLOB_VERSION = 1;
+
+  private static final int MAX_ACQUIRE_RETRY = 3;
+
+  private final ZookeeperTransactionalPersistenceStore rmBlobStore;
+
+  private final InterProcessMutex globalBlobMutex;
+
+  private final DrillbitContext context;
+
+  private final ObjectMapper serDeMapper;
+
+  private final Map rmStateBlobs;
+
+  private final StringBuilder exceptionStringBuilder = new StringBuilder();
+
+  public RMConsistentBlobStoreManager(DrillbitContext context, Collection leafQueues) throws StoreException {
+    try {
+      this.context = context;
+      this.serDeMapper = initializeMapper(context.getClasspathScan());
+      this.rmBlobStore = 

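The global-lock pattern the class comment describes can be illustrated with a runnable stand-in. In the actual PR, Curator's InterProcessMutex provides the cross-JVM lock over ZooKeeper; a plain ReentrantLock is substituted below so the read-modify-write discipline can be shown self-contained (the blob names and integer payload are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class BlobUpdateSketch {
  // Stand-in for the cross-Drillbit InterProcessMutex: one global lock
  // guards every blob update so concurrent writers never interleave.
  private final ReentrantLock globalBlobMutex = new ReentrantLock();
  private final Map<String, Integer> blobStore = new HashMap<>();

  // All updates are read-modify-write under the global lock, which is what
  // keeps the shared RM state strongly consistent across writers.
  public int addToBlob(String blobName, int delta) {
    globalBlobMutex.lock();
    try {
      int updated = blobStore.getOrDefault(blobName, 0) + delta;
      blobStore.put(blobName, updated);
      return updated;
    } finally {
      globalBlobMutex.unlock();
    }
  }
}
```

The trade-off of a single global lock is serialized updates across the cluster; the PR accepts that cost in exchange for strong consistency of the RM state blobs.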
[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838827#comment-16838827 ]

ASF GitHub Bot commented on DRILL-7177:
---

cgivre commented on issue #1749: DRILL-7177: Format Plugin for Excel Files
URL: https://github.com/apache/drill/pull/1749#issuecomment-491950505
 
 
   @kkhatua 
   This plugin will only work with Excel files from Excel 2007 or later (the XML-based Excel files). It does perform data type detection, though with limits: all numeric columns are interpreted as doubles.
   If a column is recognized as a time, the plugin will map it to a Drill date/time column.
   The plugin will execute formulas.
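The type mapping described in this comment can be stated as a small helper. The enum and method names below are illustrative only, not the plugin's actual API:

```java
public class ExcelTypeMapping {
  enum DrillType { FLOAT8, TIMESTAMP, VARCHAR }

  // Hypothetical mapping mirroring the behavior described above: every numeric
  // column becomes a double (FLOAT8), numeric cells with a date/time format
  // become TIMESTAMP, and everything else is read as text.
  static DrillType mapCell(boolean isNumeric, boolean isDateFormatted) {
    if (isNumeric) {
      return isDateFormatted ? DrillType.TIMESTAMP : DrillType.FLOAT8;
    }
    return DrillType.VARCHAR;
  }
}
```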
   
 



> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Commented] (DRILL-7191) RM blobs persistence in Zookeeper for Distributed RM

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/DRILL-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838816#comment-16838816 ]

ASF GitHub Bot commented on DRILL-7191:
---

HanumathRao commented on pull request #1762: [DRILL-7191 / DRILL-7026]: RM state blob persistence in Zookeeper and Integration of Distributed queue configuration with Planner
URL: https://github.com/apache/drill/pull/1762#discussion_r283490423
 
 

 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/resourcemgr/rmblobmgr/RMConsistentBlobStoreManager.java
 ##
 @@ -0,0 +1,354 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.resourcemgr.rmblobmgr;
+
+import avro.shaded.com.google.common.annotations.VisibleForTesting;
+import com.fasterxml.jackson.core.JsonGenerator;
+import com.fasterxml.jackson.core.JsonParser;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.SerializationFeature;
+import com.fasterxml.jackson.databind.module.SimpleModule;
+import org.apache.curator.framework.recipes.locks.InterProcessMutex;
+import org.apache.drill.common.scanner.persistence.ScanResult;
+import org.apache.drill.exec.coord.zk.ZKClusterCoordinator;
+import org.apache.drill.exec.exception.StoreException;
+import org.apache.drill.exec.resourcemgr.NodeResources;
+import org.apache.drill.exec.resourcemgr.NodeResources.NodeResourcesDe;
+import org.apache.drill.exec.resourcemgr.config.QueryQueueConfig;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.exception.LeaderChangeException;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.exception.RMBlobUpdateException;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.exception.ResourceUnavailableException;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.rmblob.ClusterStateBlob;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.rmblob.ForemanQueueUsageBlob;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.rmblob.ForemanResourceUsage;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.rmblob.ForemanResourceUsage.ForemanResourceUsageDe;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.rmblob.QueueLeadershipBlob;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.rmblob.RMStateBlob;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.sys.PersistentStoreConfig;
+import org.apache.drill.exec.store.sys.store.ZookeeperTransactionalPersistenceStore;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+/**
+ * RM state blob manager that performs all updates to the blobs under a global lock and in a
+ * transactional manner. Since the blobs are updated by multiple Drillbits at the same time, it uses a
+ * global lock shared across all the Drillbits to keep the information in these blobs strongly consistent.
+ */
+public class RMConsistentBlobStoreManager implements RMBlobStoreManager {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RMConsistentBlobStoreManager.class);
+
+  private static final String RM_BLOBS_ROOT = "rm/blobs";
 
 Review comment:
   do we need to start this path with "/" ?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
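The class under review serializes every blob update behind a single inter-process lock so that concurrent Drillbits see strongly consistent state. A minimal stdlib-only sketch of that pattern (a ReentrantLock stands in for Curator's InterProcessMutex, an in-memory map for the ZK store; all names are hypothetical, not Drill's actual code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: all-or-nothing updates to shared "blobs" under one global lock. */
public class BlobStoreSketch {
  private final Map<String, byte[]> blobs = new HashMap<>();
  private final ReentrantLock globalLock = new ReentrantLock(); // stands in for InterProcessMutex

  /** Applies every update or none, mirroring a ZK multi-op transaction. */
  public void updateAsTransaction(Map<String, byte[]> updates) {
    globalLock.lock();
    try {
      // Validate first: every path must already exist, as the quoted javadoc requires.
      for (String path : updates.keySet()) {
        if (!blobs.containsKey(path)) {
          throw new IllegalStateException("missing blob path: " + path);
        }
      }
      blobs.putAll(updates); // then commit atomically with respect to the lock
    } finally {
      globalLock.unlock();
    }
  }

  public void create(String path, byte[] data) {
    globalLock.lock();
    try {
      blobs.put(path, data);
    } finally {
      globalLock.unlock();
    }
  }

  public byte[] read(String path) {
    globalLock.lock();
    try {
      return blobs.get(path);
    } finally {
      globalLock.unlock();
    }
  }
}
```

In the real manager the lock is distributed (held via ZooKeeper), so the mutual exclusion spans processes, not just threads.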


> RM blobs persistence in Zookeeper for Distributed RM
> 
>
> Key: DRILL-7191
> URL: https://issues.apache.org/jira/browse/DRILL-7191
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components:  Server, Query Planning  Optimization
>Affects Versions: 1.17.0
>Reporter: Hanumath Rao Maduri
>Assignee: Sorabh Hamirwasia
>Priority: Major
> Fix For: 1.17.0
>
>
> Changes to support storing 

[jira] [Commented] (DRILL-7191) RM blobs persistence in Zookeeper for Distributed RM

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838818#comment-16838818
 ] 

ASF GitHub Bot commented on DRILL-7191:
---

HanumathRao commented on pull request #1762: [DRILL-7191 / DRILL-7026]: RM 
state blob persistence in Zookeeper and Integration of Distributed queue 
configuration with Planner
URL: https://github.com/apache/drill/pull/1762#discussion_r283490473
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZookeeperClient.java
 ##
 @@ -301,6 +324,54 @@ public void put(final String path, final byte[] data, 
DataChangeVersion version)
 }
   }
 
+  public void putAsTransaction(Map pathsWithData) {
+putAsTransaction(pathsWithData, null);
+  }
+
+  /**
+   * Puts the given set of blobs and their data in a transactional manner. It expects all the blob
+   * paths to exist before calling this API.
+   * @param pathsWithData - map of blob paths to update and the final data
+   * @param version - version holder
+   */
+  public void putAsTransaction(Map pathsWithData, 
DataChangeVersion version) {
+Preconditions.checkNotNull(pathsWithData, "paths and their data to write 
as transaction is missing");
+List targetPaths = new ArrayList<>();
+CuratorTransaction transaction = curator.inTransaction();
+long totalDataBytes = 0;
+
+try {
+  for (Map.Entry entry : pathsWithData.entrySet()) {
+final String target = PathUtils.join(root, entry.getKey());
+
+if (version != null) {
+  transaction = 
transaction.setData().withVersion(version.getVersion()).forPath(target, 
entry.getValue()).and();
+} else {
+  transaction = transaction.setData().forPath(target, 
entry.getValue()).and();
+}
+targetPaths.add(target);
+totalDataBytes += entry.getValue().length;
+  }
+
+  // If total set operator payload is greater than 1MB then curator set 
operation will fail
 
 Review comment:
   greater than or equal to?
 

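The 1 MB figure in the exchange above corresponds to ZooKeeper's default jute.maxbuffer limit on a single request. A hedged sketch of the size accounting the quoted loop performs (threshold and names are illustrative, not Drill's actual code):

```java
import java.util.Map;

/** Illustrative guard: reject a multi-path update whose combined payload would exceed ZK's default request limit. */
public class TransactionSizeGuard {
  // ZooKeeper's default jute.maxbuffer is 1 MB; the real limit is configurable on the server.
  static final long MAX_TX_BYTES = 1024L * 1024L;

  /** Returns the total payload size, throwing if a single transaction would be too large. */
  public static long checkedPayloadSize(Map<String, byte[]> pathsWithData) {
    long totalDataBytes = 0;
    for (Map.Entry<String, byte[]> entry : pathsWithData.entrySet()) {
      totalDataBytes += entry.getValue().length;
    }
    // ">=" vs ">" is exactly the boundary question raised in the review comment.
    if (totalDataBytes >= MAX_TX_BYTES) {
      throw new IllegalArgumentException("transaction payload too large: " + totalDataBytes + " bytes");
    }
    return totalDataBytes;
  }
}
```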


> RM blobs persistence in Zookeeper for Distributed RM
> 
>
> Key: DRILL-7191
> URL: https://issues.apache.org/jira/browse/DRILL-7191
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components:  Server, Query Planning  Optimization
>Affects Versions: 1.17.0
>Reporter: Hanumath Rao Maduri
>Assignee: Sorabh Hamirwasia
>Priority: Major
> Fix For: 1.17.0
>
>
> Changes to support storing UUID for each Drillbit Service Instance locally to 
> be used by planner and execution layer. This UUID is used to uniquely 
> identify a Drillbit and register Drillbit information in the RM StateBlobs.
> Introduced a PersistentStore named ZookeeperTransactionalPersistenceStore 
> with Transactional capabilities using Zookeeper Transactional API’s. This is 
> used for updating RM State blobs as all the updates need to happen in 
> transactional manner. Added RMStateBlobs definition and support for serde to 
> Zookeeper.
> Implementation for DistributedRM and its corresponding QueryRM apis and state 
> management.
> Updated the state management of Query in Foreman so that same Foreman object 
> can be submitted multiple times. Also introduced concept of 2 maps keeping 
> track of waiting and running queries. These were done to support for async 
> admit protocol which will be needed with Distributed RM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7191) RM blobs persistence in Zookeeper for Distributed RM

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838815#comment-16838815
 ] 

ASF GitHub Bot commented on DRILL-7191:
---

HanumathRao commented on pull request #1762: [DRILL-7191 / DRILL-7026]: RM 
state blob persistence in Zookeeper and Integration of Distributed queue 
configuration with Planner
URL: https://github.com/apache/drill/pull/1762#discussion_r283490400
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/resourcemgr/rmblobmgr/RMConsistentBlobStoreManager.java
 ##
 @@ -0,0 +1,354 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.resourcemgr.rmblobmgr;
+
+import avro.shaded.com.google.common.annotations.VisibleForTesting;
+import com.fasterxml.jackson.core.JsonGenerator;
+import com.fasterxml.jackson.core.JsonParser;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.SerializationFeature;
+import com.fasterxml.jackson.databind.module.SimpleModule;
+import org.apache.curator.framework.recipes.locks.InterProcessMutex;
+import org.apache.drill.common.scanner.persistence.ScanResult;
+import org.apache.drill.exec.coord.zk.ZKClusterCoordinator;
+import org.apache.drill.exec.exception.StoreException;
+import org.apache.drill.exec.resourcemgr.NodeResources;
+import org.apache.drill.exec.resourcemgr.NodeResources.NodeResourcesDe;
+import org.apache.drill.exec.resourcemgr.config.QueryQueueConfig;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.exception.LeaderChangeException;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.exception.RMBlobUpdateException;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.exception.ResourceUnavailableException;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.rmblob.ClusterStateBlob;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.rmblob.ForemanQueueUsageBlob;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.rmblob.ForemanResourceUsage;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.rmblob.ForemanResourceUsage.ForemanResourceUsageDe;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.rmblob.QueueLeadershipBlob;
+import org.apache.drill.exec.resourcemgr.rmblobmgr.rmblob.RMStateBlob;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.sys.PersistentStoreConfig;
+import org.apache.drill.exec.store.sys.store.ZookeeperTransactionalPersistenceStore;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+/**
+ * RM state blob manager that performs all updates to the blobs under a global lock and in a
+ * transactional manner. Since the blobs are updated by multiple Drillbits at the same time, it uses a
+ * global lock shared across all the Drillbits to keep the information in these blobs strongly consistent.
+ */
+public class RMConsistentBlobStoreManager implements RMBlobStoreManager {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(RMConsistentBlobStoreManager.class);
+
+  private static final String RM_BLOBS_ROOT = "rm/blobs";
+
+  private static final String RM_LOCK_ROOT = "/rm/locks";
+
+  private static final String RM_BLOB_GLOBAL_LOCK_NAME = "/rm_blob_lock";
+
+  private static final String RM_BLOB_SER_DE_NAME = "RMStateBlobSerDeModules";
+
+  public static final int RM_STATE_BLOB_VERSION = 1;
+
+  private static final int MAX_ACQUIRE_RETRY = 3;
+
+  private final ZookeeperTransactionalPersistenceStore rmBlobStore;
+
+  private final InterProcessMutex globalBlobMutex;
+
+  private final DrillbitContext context;
+
+  private final ObjectMapper serDeMapper;
+
+  private final Map rmStateBlobs;
+
+  private final StringBuilder exceptionStringBuilder = new StringBuilder();
+
+  public RMConsistentBlobStoreManager(DrillbitContext context, Collection leafQueues) throws
+StoreException {
+try {
+  this.context = context;
+  this.serDeMapper = initializeMapper(context.getClasspathScan());
+  this.rmBlobStore = 

[jira] [Commented] (DRILL-7191) RM blobs persistence in Zookeeper for Distributed RM

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838778#comment-16838778
 ] 

ASF GitHub Bot commented on DRILL-7191:
---

HanumathRao commented on pull request #1762: [DRILL-7191 / DRILL-7026]: RM 
state blob persistence in Zookeeper and Integration of Distributed queue 
configuration with Planner
URL: https://github.com/apache/drill/pull/1762#discussion_r283471547
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZookeeperClient.java
 ##
 @@ -301,6 +324,54 @@ public void put(final String path, final byte[] data, 
DataChangeVersion version)
 }
   }
 
+  public void putAsTransaction(Map pathsWithData) {
+putAsTransaction(pathsWithData, null);
+  }
+
+  /**
+   * Puts the given set of blobs and their data in a transactional manner. It expects all the blob
+   * paths to exist before calling this API.
+   * @param pathsWithData - map of blob paths to update and the final data
+   * @param version - version holder
+   */
+  public void putAsTransaction(Map pathsWithData, 
DataChangeVersion version) {
 
 Review comment:
   Do we currently use a non-null version? If not, can you please mention in the 
comment that it is needed for future use.
 



> RM blobs persistence in Zookeeper for Distributed RM
> 
>
> Key: DRILL-7191
> URL: https://issues.apache.org/jira/browse/DRILL-7191
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components:  Server, Query Planning  Optimization
>Affects Versions: 1.17.0
>Reporter: Hanumath Rao Maduri
>Assignee: Sorabh Hamirwasia
>Priority: Major
> Fix For: 1.17.0
>
>
> Changes to support storing UUID for each Drillbit Service Instance locally to 
> be used by planner and execution layer. This UUID is used to uniquely 
> identify a Drillbit and register Drillbit information in the RM StateBlobs.
> Introduced a PersistentStore named ZookeeperTransactionalPersistenceStore 
> with Transactional capabilities using Zookeeper Transactional API’s. This is 
> used for updating RM State blobs as all the updates need to happen in 
> transactional manner. Added RMStateBlobs definition and support for serde to 
> Zookeeper.
> Implementation for DistributedRM and its corresponding QueryRM apis and state 
> management.
> Updated the state management of Query in Foreman so that same Foreman object 
> can be submitted multiple times. Also introduced concept of 2 maps keeping 
> track of waiting and running queries. These were done to support for async 
> admit protocol which will be needed with Distributed RM.





[jira] [Commented] (DRILL-7191) RM blobs persistence in Zookeeper for Distributed RM

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838777#comment-16838777
 ] 

ASF GitHub Bot commented on DRILL-7191:
---

HanumathRao commented on pull request #1762: [DRILL-7191 / DRILL-7026]: RM 
state blob persistence in Zookeeper and Integration of Distributed queue 
configuration with Planner
URL: https://github.com/apache/drill/pull/1762#discussion_r283471493
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZookeeperClient.java
 ##
 @@ -227,13 +230,33 @@ public void create(final String path) {
 
 final String target = PathUtils.join(root, path);
 try {
-  curator.create().withMode(mode).forPath(target);
+  
curator.create().creatingParentsIfNeeded().withMode(mode).forPath(target);
   getCache().rebuildNode(target);
 } catch (final Exception e) {
   throw new DrillRuntimeException("unable to put ", e);
 }
   }
 
+  public void createAsTransaction(List paths) {
+Preconditions.checkNotNull(paths, "no paths provided to create");
 
 Review comment:
   Is it also better to check for an empty list of paths? It might also be good 
to add a comment for this function.
 

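The suggested empty-list check can be sketched with stdlib preconditions; a hypothetical version of the validation (not the actual patch, which uses Guava's Preconditions):

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical argument validation for a transactional create, per the review suggestion. */
public class PathPreconditions {
  /** Validates both null and empty input up front, returning the list for chaining. */
  public static List<String> validatePaths(List<String> paths) {
    Objects.requireNonNull(paths, "no paths provided to create");
    if (paths.isEmpty()) {
      // The extra check the reviewer asks for: a non-null but empty list is also an error.
      throw new IllegalArgumentException("empty list of paths provided to create");
    }
    return paths;
  }
}
```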


> RM blobs persistence in Zookeeper for Distributed RM
> 
>
> Key: DRILL-7191
> URL: https://issues.apache.org/jira/browse/DRILL-7191
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components:  Server, Query Planning  Optimization
>Affects Versions: 1.17.0
>Reporter: Hanumath Rao Maduri
>Assignee: Sorabh Hamirwasia
>Priority: Major
> Fix For: 1.17.0
>
>
> Changes to support storing UUID for each Drillbit Service Instance locally to 
> be used by planner and execution layer. This UUID is used to uniquely 
> identify a Drillbit and register Drillbit information in the RM StateBlobs.
> Introduced a PersistentStore named ZookeeperTransactionalPersistenceStore 
> with Transactional capabilities using Zookeeper Transactional API’s. This is 
> used for updating RM State blobs as all the updates need to happen in 
> transactional manner. Added RMStateBlobs definition and support for serde to 
> Zookeeper.
> Implementation for DistributedRM and its corresponding QueryRM apis and state 
> management.
> Updated the state management of Query in Foreman so that same Foreman object 
> can be submitted multiple times. Also introduced concept of 2 maps keeping 
> track of waiting and running queries. These were done to support for async 
> admit protocol which will be needed with Distributed RM.





[jira] [Commented] (DRILL-7191) RM blobs persistence in Zookeeper for Distributed RM

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838774#comment-16838774
 ] 

ASF GitHub Bot commented on DRILL-7191:
---

HanumathRao commented on pull request #1762: [DRILL-7191 / DRILL-7026]: RM 
state blob persistence in Zookeeper and Integration of Distributed queue 
configuration with Planner
URL: https://github.com/apache/drill/pull/1762#discussion_r283471413
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZKClusterCoordinator.java
 ##
 @@ -249,14 +253,19 @@ public RegistrationHandle update(RegistrationHandle 
handle, State state) {
*/
   @Override
   public Collection getOnlineEndPoints() {
-Collection runningEndPoints = new ArrayList<>();
-for (DrillbitEndpoint endpoint: endpoints){
-  if(isDrillbitInState(endpoint, State.ONLINE)) {
-runningEndPoints.add(endpoint);
+return getOnlineEndpointsUUID().keySet();
+  }
+
+  @Override
+  public Map getOnlineEndpointsUUID() {
 
 Review comment:
   Looks like this is duplicate code, similar to that of the 
LocalClusterCoordinator. Can you move this function into a common place and use 
it in both places?
 



> RM blobs persistence in Zookeeper for Distributed RM
> 
>
> Key: DRILL-7191
> URL: https://issues.apache.org/jira/browse/DRILL-7191
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components:  Server, Query Planning  Optimization
>Affects Versions: 1.17.0
>Reporter: Hanumath Rao Maduri
>Assignee: Sorabh Hamirwasia
>Priority: Major
> Fix For: 1.17.0
>
>
> Changes to support storing UUID for each Drillbit Service Instance locally to 
> be used by planner and execution layer. This UUID is used to uniquely 
> identify a Drillbit and register Drillbit information in the RM StateBlobs.
> Introduced a PersistentStore named ZookeeperTransactionalPersistenceStore 
> with Transactional capabilities using Zookeeper Transactional API’s. This is 
> used for updating RM State blobs as all the updates need to happen in 
> transactional manner. Added RMStateBlobs definition and support for serde to 
> Zookeeper.
> Implementation for DistributedRM and its corresponding QueryRM apis and state 
> management.
> Updated the state management of Query in Foreman so that same Foreman object 
> can be submitted multiple times. Also introduced concept of 2 maps keeping 
> track of waiting and running queries. These were done to support for async 
> admit protocol which will be needed with Distributed RM.





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838760#comment-16838760
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

kkhatua commented on issue #1749: DRILL-7177: Format Plugin for Excel Files
URL: https://github.com/apache/drill/pull/1749#issuecomment-491924798
 
 
   @cgivre This would be a nice add-on.
   Can you rebase your commit and resolve the conflicts?
   Also, are there limitations, such as versions, data type detection, etc? 
   
 



> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838752#comment-16838752
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

kkhatua commented on issue #1779: DRILL-7222: Visualize estimated and actual 
row counts for a query
URL: https://github.com/apache/drill/pull/1779#issuecomment-491922589
 
 
  @arina-ielchiieva the feature allows an advanced user to compare estimated 
row counts with the actual row counts for the query. Since the estimates are 
part of the logical plan text, but in scientific notation, we need to parse 
them to represent each as a whole number with proper formatting (i.e. a comma 
as the thousands separator). I used a GitHub snippet since that had the most 
references and usage, rather than looking for a (possible?) JavaScript library 
for this feature. 
   @sohami could you take a look as well?
 

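The parsing under discussion turns plan estimates such as 1.23E5 into grouped whole numbers. The actual change is client-side JavaScript; an equivalent transformation in Java, for illustration only (names are hypothetical):

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Illustrative: render a scientific-notation row estimate as a whole number with thousands separators. */
public class RowCountFormat {
  public static String format(String scientific) {
    // Plan text carries estimates like "1.23E5"; round to the nearest whole row count.
    long rows = Math.round(Double.parseDouble(scientific));
    return NumberFormat.getIntegerInstance(Locale.US).format(rows);
  }
}
```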


> Visualize estimated and actual row counts for a query
> -
>
> Key: DRILL-7222
> URL: https://issues.apache.org/jira/browse/DRILL-7222
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
>  Labels: doc-impacting, user-experience
> Fix For: 1.17.0
>
>
> With statistics in place, it would be useful to have the *estimated* rowcount 
> alongside the *actual* rowcount in the query profile's operator overview.
> We can extract this from the Physical Plan section of the profile.





[jira] [Created] (DRILL-7255) Support nulls for all levels of nesting

2019-05-13 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7255:
---

 Summary: Support nulls for all levels of nesting
 Key: DRILL-7255
 URL: https://issues.apache.org/jira/browse/DRILL-7255
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Igor Guzenko








[jira] [Assigned] (DRILL-7255) Support nulls for all levels of nesting

2019-05-13 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-7255:
---

Assignee: Igor Guzenko

> Support nulls for all levels of nesting
> ---
>
> Key: DRILL-7255
> URL: https://issues.apache.org/jira/browse/DRILL-7255
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>






[jira] [Created] (DRILL-7254) Read Hive union w/o nulls

2019-05-13 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7254:
---

 Summary: Read Hive union w/o nulls
 Key: DRILL-7254
 URL: https://issues.apache.org/jira/browse/DRILL-7254
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Igor Guzenko








[jira] [Assigned] (DRILL-7252) Read Hive map using canonical Map vector

2019-05-13 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-7252:
---

Assignee: Igor Guzenko

> Read Hive map using canonical Map vector
> -
>
> Key: DRILL-7252
> URL: https://issues.apache.org/jira/browse/DRILL-7252
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>






[jira] [Assigned] (DRILL-7253) Read Hive struct w/o nulls

2019-05-13 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-7253:
---

Assignee: Igor Guzenko

> Read Hive struct w/o nulls
> --
>
> Key: DRILL-7253
> URL: https://issues.apache.org/jira/browse/DRILL-7253
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>






[jira] [Assigned] (DRILL-7254) Read Hive union w/o nulls

2019-05-13 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-7254:
---

Assignee: Igor Guzenko

> Read Hive union w/o nulls
> -
>
> Key: DRILL-7254
> URL: https://issues.apache.org/jira/browse/DRILL-7254
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>






[jira] [Created] (DRILL-7253) Read Hive struct w/o nulls

2019-05-13 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7253:
---

 Summary: Read Hive struct w/o nulls
 Key: DRILL-7253
 URL: https://issues.apache.org/jira/browse/DRILL-7253
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Igor Guzenko








[jira] [Created] (DRILL-7252) Read Hive map using canonical Map vector

2019-05-13 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7252:
---

 Summary: Read Hive map using canonical Map vector
 Key: DRILL-7252
 URL: https://issues.apache.org/jira/browse/DRILL-7252
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Igor Guzenko








[jira] [Updated] (DRILL-7097) Rename MapVector to StructVector

2019-05-13 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7097:

Issue Type: Sub-task  (was: Improvement)
Parent: DRILL-3290

> Rename MapVector to StructVector
> 
>
> Key: DRILL-7097
> URL: https://issues.apache.org/jira/browse/DRILL-7097
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>
> For a long time Drill's MapVector was actually more suitable for representing 
> Struct data. And in Apache Arrow it was actually renamed to StructVector. To 
> align our code with Arrow and give space for planned implementation of 
> canonical Map (DRILL-7096) we need to rename existing MapVector and all 
> related classes. 





[jira] [Assigned] (DRILL-7251) Read Hive array w/o nulls

2019-05-13 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-7251:
---

Assignee: Igor Guzenko

> Read Hive array w/o nulls
> -
>
> Key: DRILL-7251
> URL: https://issues.apache.org/jira/browse/DRILL-7251
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Hive
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>






[jira] [Updated] (DRILL-7096) Develop vector for canonical Map

2019-05-13 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7096:

Issue Type: Sub-task  (was: Improvement)
Parent: DRILL-3290

> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
>
> A canonical Map datatype can be represented using a combination of three 
> value vectors:
> keysVector - vector storing the keys of each map
> valuesVector - vector storing the values of each map
> offsetsVector - vector storing the start index of each map
> So it's not very hard to create such a Map vector, but there is a major issue 
> with such a map representation: it's hard to search map values by key in such 
> a vector. We need to investigate some advanced techniques to make such a 
> search efficient, or find other more suitable options to represent the map 
> datatype in the world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they just have a logical Map type 
> definition where they define Map like: List<Struct<key:key_type, 
> value:value_type>>. So an implementation of such a value vector would be 
> useful for Arrow too.



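The keysVector/valuesVector/offsetsVector layout described in DRILL-7096 can be sketched with plain arrays; lookup by key degenerates to a linear scan within one map's slice, which is exactly the efficiency concern the ticket raises (all names are hypothetical):

```java
/** Hypothetical sketch of the three-vector canonical Map layout from DRILL-7096. */
public class MapVectorSketch {
  // Two maps packed end to end: {a:1, b:2} and {b:3}.
  static final String[] KEYS    = {"a", "b", "b"};
  static final int[]    VALUES  = {1, 2, 3};
  // OFFSETS[i]..OFFSETS[i+1] is the key/value range belonging to map i.
  static final int[]    OFFSETS = {0, 2, 3};

  /** Linear search for a key within map {@code mapIndex}; returns -1 if absent. */
  public static int getByKey(int mapIndex, String key) {
    for (int i = OFFSETS[mapIndex]; i < OFFSETS[mapIndex + 1]; i++) {
      if (KEYS[i].equals(key)) {
        return VALUES[i];
      }
    }
    return -1;
  }
}
```

The scan is O(entries per map); making key lookup sub-linear is the open question the ticket points at.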


[jira] [Created] (DRILL-7251) Read Hive array w/o nulls

2019-05-13 Thread Igor Guzenko (JIRA)
Igor Guzenko created DRILL-7251:
---

 Summary: Read Hive array w/o nulls
 Key: DRILL-7251
 URL: https://issues.apache.org/jira/browse/DRILL-7251
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Storage - Hive
Reporter: Igor Guzenko








[jira] [Commented] (DRILL-7222) Visualize estimated and actual row counts for a query

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838537#comment-16838537
 ] 

ASF GitHub Bot commented on DRILL-7222:
---

arina-ielchiieva commented on issue #1779: DRILL-7222: Visualize estimated and 
actual row counts for a query
URL: https://github.com/apache/drill/pull/1779#issuecomment-49182
 
 
  @kkhatua could you please request someone to review the JS part. I am not a JS 
expert, but the changes do not look good to me. I am not sure why we need such 
advanced number parsing, so feedback from someone who knows what is going on 
will be very helpful.
 



> Visualize estimated and actual row counts for a query
> -
>
> Key: DRILL-7222
> URL: https://issues.apache.org/jira/browse/DRILL-7222
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Major
>  Labels: doc-impacting, user-experience
> Fix For: 1.17.0
>
>
> With statistics in place, it would be useful to have the *estimated* rowcount 
> alongside the *actual* rowcount in the query profile's operator overview.
> We can extract this from the Physical Plan section of the profile.





[jira] [Updated] (DRILL-7206) Tuning hash join code using primitive int

2019-05-13 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7206:

Reviewer: Boaz Ben-Zvi

> Tuning hash join code using primitive int
> -
>
> Key: DRILL-7206
> URL: https://issues.apache.org/jira/browse/DRILL-7206
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> This issue is to tune the HashJoin implementation code. For the right or full 
> join types, the HashJoinProbe creates a List<Integer> member to record 
> the unmatched composed indexes. It can be replaced by a primitive 
> IntArrayList without box/unbox operations, which also benefits the GC when there are 
> more unmatched items.
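The boxed-to-primitive swap described above can be sketched with a minimal primitive int list. This is illustrative only: Drill would likely use an existing primitive-collection class (e.g. from HPPC or fastutil) rather than this hand-rolled one.

```java
// Minimal growable int list: stores raw ints in a flat array, so adds and
// reads involve no Integer boxing and create no per-element garbage.
class IntArrayList {
    private int[] data = new int[4];
    private int size;

    void add(int v) {
        if (size == data.length) {
            // Double capacity when full, amortizing the copy cost.
            data = java.util.Arrays.copyOf(data, size * 2);
        }
        data[size++] = v;
    }

    int get(int i) {
        return data[i];
    }

    int size() {
        return size;
    }
}
```

A `List<Integer>` allocates one heap object per stored index; the flat `int[]` here keeps the unmatched-index bookkeeping in a single contiguous array, which is the GC benefit the description mentions.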





[jira] [Updated] (DRILL-7206) Tuning hash join code using primitive int

2019-05-13 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7206:

Labels: ready-to-commit  (was: )

> Tuning hash join code using primitive int
> -
>
> Key: DRILL-7206
> URL: https://issues.apache.org/jira/browse/DRILL-7206
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: weijie.tong
>Assignee: weijie.tong
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> This issue is to tune the HashJoin implementation code. For the right or full 
> join types, the HashJoinProbe creates a List<Integer> member to record 
> the unmatched composed indexes. It can be replaced by a primitive 
> IntArrayList without box/unbox operations, which also benefits the GC when there are 
> more unmatched items.





[jira] [Created] (DRILL-7250) Query with CTE fails when its name matches the table name without access

2019-05-13 Thread Volodymyr Vysotskyi (JIRA)
Volodymyr Vysotskyi created DRILL-7250:
--

 Summary: Query with CTE fails when its name matches the table 
name without access
 Key: DRILL-7250
 URL: https://issues.apache.org/jira/browse/DRILL-7250
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.16.0
Reporter: Volodymyr Vysotskyi
Assignee: Volodymyr Vysotskyi
 Fix For: 1.17.0


When impersonation is enabled, and, for example, we have a {{lineitem}} table with 
permissions {{750}} which is owned by {{user0_1:group0_1}}, then {{user2_1}} 
doesn't have access to it.

The following query:
{code:sql}
use mini_dfs_plugin.user0_1;
with lineitem as (SELECT 1 as a) select * from lineitem
{code}
submitted from {{user2_1}} fails with the following error:
{noformat}
java.lang.Exception: org.apache.hadoop.security.AccessControlException: 
Permission denied: user=user2_1, access=READ_EXECUTE, 
inode="/user/user0_1/lineitem":user0_1:group0_1:drwxr-x---
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:317)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:229)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:199)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1752)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1736)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1710)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getListingInt(FSDirStatAndListingOp.java:70)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:4432)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:999)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:646)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2213)

at ...(:0) ~[na:na]
at 
org.apache.drill.exec.util.FileSystemUtil.listRecursive(FileSystemUtil.java:253)
 ~[classes/:na]
at 
org.apache.drill.exec.util.FileSystemUtil.list(FileSystemUtil.java:208) 
~[classes/:na]
at 
org.apache.drill.exec.util.FileSystemUtil.listFiles(FileSystemUtil.java:104) 
~[classes/:na]
at 
org.apache.drill.exec.util.DrillFileSystemUtil.listFiles(DrillFileSystemUtil.java:86)
 ~[classes/:na]
at 
org.apache.drill.exec.store.dfs.FileSelection.minusDirectories(FileSelection.java:178)
 ~[classes/:na]
at 
org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.detectEmptySelection(WorkspaceSchemaFactory.java:669)
 ~[classes/:na]
at 
org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create(WorkspaceSchemaFactory.java:633)
 ~[classes/:na]
at 
org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.create(WorkspaceSchemaFactory.java:283)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.getNewEntry(ExpandingConcurrentMap.java:96)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.sql.ExpandingConcurrentMap.get(ExpandingConcurrentMap.java:90)
 ~[classes/:na]
at 
org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory$WorkspaceSchema.getTable(WorkspaceSchemaFactory.java:439)
 ~[classes/:na]
at 
org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:83)
 ~[calcite-core-1.18.0-drill-r1.jar:1.18.0-drill-r1]
at 
org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:286) 
~[calcite-core-1.18.0-drill-r1.jar:1.18.0-drill-r1]
at 
org.apache.calcite.sql.validate.SqlValidatorUtil.getTableEntryFrom(SqlValidatorUtil.java:1046)
 ~[calcite-core-1.18.0-drill-r1.jar:1.18.0-drill-r1]
at 
org.apache.calcite.sql.validate.SqlValidatorUtil.getTableEntry(SqlValidatorUtil.java:1003)
 ~[calcite-core-1.18.0-drill-r1.jar:1.18.0-drill-r1]
at 
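The expected name-resolution order behind this bug can be sketched as follows. This is a hedged illustration, not Calcite's or Drill's actual validator code: `NameResolver`, `pushCtes`, and `resolve` are hypothetical names. The point is that a CTE in scope should shadow any schema table of the same name, so no storage (and hence no permission check) is ever touched for the shadowed table.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;
import java.util.Set;

// Toy resolver: CTE scopes are consulted before any schema lookup.
class NameResolver {
    private final Deque<Set<String>> cteScopes = new ArrayDeque<>();
    private final Set<String> schemaTables;

    NameResolver(Set<String> schemaTables) {
        this.schemaTables = schemaTables;
    }

    // Enter a WITH clause: its names shadow schema tables for the query body.
    void pushCtes(Set<String> names) {
        cteScopes.push(names);
    }

    // Innermost CTE scope wins; only fall back to the schema if no CTE matches.
    String resolve(String table) {
        for (Set<String> scope : cteScopes) {
            if (scope.contains(table)) {
                return "CTE";
            }
        }
        if (schemaTables.contains(table)) {
            return "SCHEMA";
        }
        throw new NoSuchElementException("Table not found: " + table);
    }
}
```

In the failing query, `lineitem` should resolve in the CTE scope, so the HDFS listing of `/user/user0_1/lineitem` seen in the stack trace should never be attempted.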

[jira] [Updated] (DRILL-4782) TO_TIME function cannot separate time from date time string

2019-05-13 Thread Igor Guzenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-4782:

Labels: ready-to-commit  (was: )

> TO_TIME function cannot separate time from date time string
> ---
>
> Key: DRILL-4782
> URL: https://issues.apache.org/jira/browse/DRILL-4782
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server
>Affects Versions: 1.6.0, 1.7.0
> Environment: CentOS 7
>Reporter: Matt Keranen
>Assignee: Dmytriy Grinchenko
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> TO_TIME('2016-03-03 00:00', 'yyyy-MM-dd HH:mm') returns "05:14:46.656" 
> instead of the expected "00:00:00".
> Adding an additional split does work as expected: TO_TIME(SPLIT('2016-03-03 
> 00:00', ' ')[1], 'HH:mm')
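The expected behavior can be sketched with java.time. This is a hedged illustration of the desired semantics, not Drill's actual TO_TIME implementation (which has its own format handling); `ToTimeSketch` and `toTime` are illustrative names.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Desired TO_TIME semantics: parse the full date-time string with the given
// pattern, then discard the date part and keep only the time-of-day.
class ToTimeSketch {
    static LocalTime toTime(String input, String pattern) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern);
        return LocalDateTime.parse(input, fmt).toLocalTime();
    }
}
```

With this approach, `toTime("2016-03-03 00:00", "yyyy-MM-dd HH:mm")` yields midnight, matching the "00:00:00" the reporter expected, with no SPLIT workaround needed.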





[jira] [Updated] (DRILL-7196) Queries are still runnable on disabled plugins

2019-05-13 Thread Dmytriy Grinchenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytriy Grinchenko updated DRILL-7196:
--
Description: 
The issue was partially addressed by the DRILL-6732 task. However, when 
issuing a select request with the full path to the table, the select is still able to 
process it regardless of the plugin state:

{code}
0: jdbc:drill:zk=local> select * from `mongo.my_binaries`.`master.files`;

0: jdbc:drill:zk=local> select md5, chunkSize from 
`mongo.ambari_binaries`.`master.files`;
+---++
|md5| chunkSize  |
+---++
| 4e30dc965bb8aa064af4f2a9608fa7ae  | 261120 |
| b223e44347e63fcae296e8432755e07c  | 261120 |
| 2720c5e10e739142d1b28996349a1d7e  | 261120 |
+---++

0: jdbc:drill:zk=local> show schemas;
+-+
| SCHEMA_NAME |
+-+
| cp.default  |
| dfs.default |
| dfs.root|
| dfs.test|
| dfs.tmp |
| hive.default|
| information_schema  |
| sys |
+-+
8 rows selected (0.127 seconds)
{code}

  was:
The issue was partially addressed by the DRILL-6732 task. However, when 
issuing a select request with the full path to the table, the select is still able to 
process it regardless of the plugin state:

{code}
0: jdbc:drill:zk=local> select * from `mongo.my_binaries`.`master.files`;

+--------------------------+----------------------------------+-----------+-----------+--------------------------+
|           _id            |               md5                | chunkSize |  length   |        uploadDate        |
+--------------------------+----------------------------------+-----------+-----------+--------------------------+
|                          | 4e30dc965bb8aa064af4f2a9608fa7ae | 261120    | 291830238 | 2019-04-24 07:16:22.355  |
|                          | b223e44347e63fcae296e8432755e07c | 261120    | 649511    | 2019-05-08 06:59:33.618  |
|                          | 2720c5e10e739142d1b28996349a1d7e | 261120    | 395283140 | 2019-05-10 08:23:27.041  |
+--------------------------+----------------------------------+-----------+-----------+--------------------------+

0: jdbc:drill:zk=local> show schemas;
+-+
| SCHEMA_NAME |
+-+
| cp.default  |
| dfs.default |
| dfs.root|
| dfs.test|
| dfs.tmp |
| hive.default|
| information_schema  |
| sys |
+-+
8 rows selected (0.127 seconds)
{code}


> Queries are still runnable on disabled plugins
> --
>
> Key: DRILL-7196
> URL: https://issues.apache.org/jira/browse/DRILL-7196
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.12.0
>Reporter: Dmytriy Grinchenko
>Assignee: Dmytriy Grinchenko
>Priority: Major
> Fix For: 1.17.0
>
>
> The issue was partially addressed by the DRILL-6732 task. However, when 
> issuing a select request with the full path to the table, the select is still able to 
> process it regardless of the plugin state:
> {code}
> 0: jdbc:drill:zk=local> select * from `mongo.my_binaries`.`master.files`;
> 0: jdbc:drill:zk=local> select md5, chunkSize from 
> `mongo.ambari_binaries`.`master.files`;
> +---++
> |md5| chunkSize  |
> +---++
> | 4e30dc965bb8aa064af4f2a9608fa7ae  | 261120 |
> | b223e44347e63fcae296e8432755e07c  | 261120 |
> | 2720c5e10e739142d1b28996349a1d7e  | 261120 |
> +---++
> 0: jdbc:drill:zk=local> show schemas;
> +-+
> | SCHEMA_NAME |
> +-+
> | cp.default  |
> | dfs.default |
> | dfs.root|
> | dfs.test|
> | dfs.tmp |
> | hive.default|
> | information_schema  |
> | sys |
> +-+
> 8 rows selected (0.127 seconds)
> {code}





[jira] [Updated] (DRILL-7196) Queries are still runnable on disabled plugins

2019-05-13 Thread Dmytriy Grinchenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytriy Grinchenko updated DRILL-7196:
--
Description: 
The issue was partially addressed by the DRILL-6732 task. However, when 
issuing a select request with the full path to the table, the select is still able to 
process it regardless of the plugin state:

{code}
0: jdbc:drill:zk=local> select * from `mongo.my_binaries`.`master.files`;

+---++
|md5| chunkSize  |
+---++
| 4e30dc965bb8aa064af4f2a9608fa7ae  | 261120 |
| b223e44347e63fcae296e8432755e07c  | 261120 |
| 2720c5e10e739142d1b28996349a1d7e  | 261120 |
+---++

0: jdbc:drill:zk=local> show schemas;
+-+
| SCHEMA_NAME |
+-+
| cp.default  |
| dfs.default |
| dfs.root|
| dfs.test|
| dfs.tmp |
| hive.default|
| information_schema  |
| sys |
+-+
8 rows selected (0.127 seconds)
{code}

  was:
The issue was partially addressed by the DRILL-6732 task. However, when 
issuing a select request with the full path to the table, the select is still able to 
process it regardless of the plugin state:

{code}
0: jdbc:drill:zk=local> select * from `mongo.my_binaries`.`master.files`;

0: jdbc:drill:zk=local> select md5, chunkSize from 
`mongo.ambari_binaries`.`master.files`;
+---++
|md5| chunkSize  |
+---++
| 4e30dc965bb8aa064af4f2a9608fa7ae  | 261120 |
| b223e44347e63fcae296e8432755e07c  | 261120 |
| 2720c5e10e739142d1b28996349a1d7e  | 261120 |
+---++

0: jdbc:drill:zk=local> show schemas;
+-+
| SCHEMA_NAME |
+-+
| cp.default  |
| dfs.default |
| dfs.root|
| dfs.test|
| dfs.tmp |
| hive.default|
| information_schema  |
| sys |
+-+
8 rows selected (0.127 seconds)
{code}


> Queries are still runnable on disabled plugins
> --
>
> Key: DRILL-7196
> URL: https://issues.apache.org/jira/browse/DRILL-7196
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.12.0
>Reporter: Dmytriy Grinchenko
>Assignee: Dmytriy Grinchenko
>Priority: Major
> Fix For: 1.17.0
>
>
> The issue was partially addressed by the DRILL-6732 task. However, when 
> issuing a select request with the full path to the table, the select is still able to 
> process it regardless of the plugin state:
> {code}
> 0: jdbc:drill:zk=local> select * from `mongo.my_binaries`.`master.files`;
> +---++
> |md5| chunkSize  |
> +---++
> | 4e30dc965bb8aa064af4f2a9608fa7ae  | 261120 |
> | b223e44347e63fcae296e8432755e07c  | 261120 |
> | 2720c5e10e739142d1b28996349a1d7e  | 261120 |
> +---++
> 0: jdbc:drill:zk=local> show schemas;
> +-+
> | SCHEMA_NAME |
> +-+
> | cp.default  |
> | dfs.default |
> | dfs.root|
> | dfs.test|
> | dfs.tmp |
> | hive.default|
> | information_schema  |
> | sys |
> +-+
> 8 rows selected (0.127 seconds)
> {code}





[jira] [Updated] (DRILL-7196) Queries are still runnable on disabled plugins

2019-05-13 Thread Dmytriy Grinchenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytriy Grinchenko updated DRILL-7196:
--
Description: 
The issue was partially addressed by the DRILL-6732 task. However, when 
issuing a select request with the full path to the table, the select is still able to 
process it regardless of the plugin state:

{code}
0: jdbc:drill:zk=local> select * from `mongo.my_binaries`.`master.files`;

+--------------------------+----------------------------------+-----------+-----------+--------------------------+
|           _id            |               md5                | chunkSize |  length   |        uploadDate        |
+--------------------------+----------------------------------+-----------+-----------+--------------------------+
|                          | 4e30dc965bb8aa064af4f2a9608fa7ae | 261120    | 291830238 | 2019-04-24 07:16:22.355  |
|                          | b223e44347e63fcae296e8432755e07c | 261120    | 649511    | 2019-05-08 06:59:33.618  |
|                          | 2720c5e10e739142d1b28996349a1d7e | 261120    | 395283140 | 2019-05-10 08:23:27.041  |
+--------------------------+----------------------------------+-----------+-----------+--------------------------+

0: jdbc:drill:zk=local> show schemas;
+-+
| SCHEMA_NAME |
+-+
| cp.default  |
| dfs.default |
| dfs.root|
| dfs.test|
| dfs.tmp |
| hive.default|
| information_schema  |
| sys |
+-+
8 rows selected (0.127 seconds)
{code}

  was:
Currently the Drillbit returns an exception like the one below: 
{code}
Error: SYSTEM ERROR: AssertionError: Rule's description should be unique; 
existing rule=TestRuleName; new rule=TestRuleName
{code}

This error message is not representative of the real failure cause. Instead 
we need to provide a clearer exception, for example:
{code} "Unable to execute query using disabled  storage plugin." 
{code}



> Queries are still runnable on disabled plugins
> --
>
> Key: DRILL-7196
> URL: https://issues.apache.org/jira/browse/DRILL-7196
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.12.0
>Reporter: Dmytriy Grinchenko
>Assignee: Dmytriy Grinchenko
>Priority: Major
> Fix For: 1.17.0
>
>
> The issue was partially addressed by the DRILL-6732 task. However, when 
> issuing a select request with the full path to the table, the select is still able to 
> process it regardless of the plugin state:
> {code}
> 0: jdbc:drill:zk=local> select * from `mongo.my_binaries`.`master.files`;
> +--------------------------+----------------------------------+-----------+-----------+--------------------------+
> |           _id            |               md5                | chunkSize |  length   |        uploadDate        |
> +--------------------------+----------------------------------+-----------+-----------+--------------------------+
> |                          | 4e30dc965bb8aa064af4f2a9608fa7ae | 261120    | 291830238 | 2019-04-24 07:16:22.355  |
> |                          | b223e44347e63fcae296e8432755e07c | 261120    | 649511    | 2019-05-08 06:59:33.618  |
> |                          | 2720c5e10e739142d1b28996349a1d7e | 261120    | 395283140 | 2019-05-10 08:23:27.041  |
> +--------------------------+----------------------------------+-----------+-----------+--------------------------+
> 0: jdbc:drill:zk=local> show schemas;
> +-+
> | SCHEMA_NAME |
> +-+
> | cp.default  |
> | dfs.default |
> | dfs.root|
> | dfs.test|
> | dfs.tmp |
> | hive.default|
> | information_schema  |
> | sys |
> +-+
> 8 rows selected (0.127 seconds)
> {code}





[jira] [Updated] (DRILL-7196) Queries are still runnable on disabled plugins

2019-05-13 Thread Dmytriy Grinchenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytriy Grinchenko updated DRILL-7196:
--
Summary: Queries are still runnable on disabled plugins  (was: Queries are 
still runable when plugin )

> Queries are still runnable on disabled plugins
> --
>
> Key: DRILL-7196
> URL: https://issues.apache.org/jira/browse/DRILL-7196
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.12.0
>Reporter: Dmytriy Grinchenko
>Assignee: Dmytriy Grinchenko
>Priority: Major
> Fix For: 1.17.0
>
>
> Currently the Drillbit returns an exception like the one below: 
> {code}
> Error: SYSTEM ERROR: AssertionError: Rule's description should be unique; 
> existing rule=TestRuleName; new rule=TestRuleName
> {code}
> This error message is not representative of the real failure cause.  
> Instead we need to provide a clearer exception, for example:
> {code} "Unable to execute query using disabled  storage plugin." 
> {code}
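The suggested fix can be sketched as an up-front validation of the plugin's state, raising the clearer message instead of letting planning fail later with the rule-registration AssertionError. This is a hedged illustration: `StoragePluginConfig` here is a stand-in class, not Drill's real one, and `checkUsable` is a hypothetical helper.

```java
// Stand-in for a storage plugin configuration with an enabled flag.
class StoragePluginConfig {
    private final String name;
    private final boolean enabled;

    StoragePluginConfig(String name, boolean enabled) {
        this.name = name;
        this.enabled = enabled;
    }

    boolean isEnabled() { return enabled; }
    String getName() { return name; }
}

// Validate plugin state before planning so the user sees the real cause.
class PluginRegistry {
    void checkUsable(StoragePluginConfig config) {
        if (!config.isEnabled()) {
            throw new IllegalStateException(
                "Unable to execute query using disabled " + config.getName()
                    + " storage plugin.");
        }
    }
}
```

With a check like this at table-resolution time, a fully qualified path such as `mongo.my_binaries`.`master.files` would fail fast with the intended message when the `mongo` plugin is disabled.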


