[jira] [Commented] (DRILL-6310) limit batch size for hash aggregate

2018-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525930#comment-16525930
 ] 

ASF GitHub Bot commented on DRILL-6310:
---

Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch 
size for hash aggregate
URL: https://github.com/apache/drill/pull/1324#discussion_r198697870
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
 ##
 @@ -84,6 +97,63 @@
       "htRowIdx" /* workspace index */, "incoming" /* read container */, "outgoing" /* write container */,
       "aggrValuesContainer" /* workspace container */, UPDATE_AGGR_INSIDE, UPDATE_AGGR_OUTSIDE, UPDATE_AGGR_INSIDE);
 
+  public int getOutputRowCount() {
+    return hashAggMemoryManager.getOutputRowCount();
+  }
+
+  public RecordBatchMemoryManager getRecordBatchMemoryManager() {
+    return hashAggMemoryManager;
+  }
+
+  private class HashAggMemoryManager extends RecordBatchMemoryManager {
+    private int valuesRowWidth = 0;
+
+    HashAggMemoryManager(int outputBatchSize) {
+      super(outputBatchSize);
+    }
+
+    @Override
+    public void update() {
+      // Get sizing information for the batch.
+      setRecordBatchSizer(new RecordBatchSizer(incoming));
+
+      int fieldId = 0;
+      int newOutgoingRowWidth = 0;
+      for (VectorWrapper w : container) {
+        if (w.getValueVector() instanceof FixedWidthVector) {
+          newOutgoingRowWidth += ((FixedWidthVector) w.getValueVector()).getValueWidth();
+          if (fieldId >= numGroupByExprs) {
+            valuesRowWidth += ((FixedWidthVector) w.getValueVector()).getValueWidth();
+          }
+        } else {
+          RecordBatchSizer.ColumnSize columnSize = getRecordBatchSizer().getColumn(columnMapping.get(w.getValueVector().getField().getName()));
+          newOutgoingRowWidth += columnSize.getAllocSizePerEntry();
+          if (fieldId >= numGroupByExprs) {
+            valuesRowWidth += columnSize.getAllocSizePerEntry();
+          }
+        }
+        fieldId++;
+      }
+
+      updateIncomingStats();
+      if (logger.isDebugEnabled()) {
+        logger.debug("BATCH_STATS, incoming: {}", getRecordBatchSizer());
+      }
+
+      // We do not want to keep adjusting batch holders target row count
 
 Review comment:
   The code below is correct; however, a suggestion for elegance: all the code 
below uses a single local variable (*newOutgoingRowWidth*), and the rest is 
multiple calls to methods of the **RecordBatchMemoryManager** class.
   So -- how about moving all that code (from here to the end of update()) 
into a new method in the **RecordBatchMemoryManager** class, called possibly 
`updateMemoryManagerIfNeeded(newOutgoingRowWidth)`?
   This looks simpler and cleaner.
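   For concreteness, a rough sketch of what that extraction might look like 
(standalone Java; the method body, names, and row-count formula here are 
assumptions based on this comment, not the actual patch):
   ```
   // Hypothetical sketch of the suggested helper; the real
   // RecordBatchMemoryManager API may differ.
   class MemoryManagerSketch {
     private final int outputBatchSize;
     private int outgoingRowWidth;
     private int outputRowCount;

     MemoryManagerSketch(int outputBatchSize) {
       this.outputBatchSize = outputBatchSize;
     }

     // All the per-update bookkeeping moves behind one call that takes the
     // single value the caller computes: the new outgoing row width.
     void updateMemoryManagerIfNeeded(int newOutgoingRowWidth) {
       if (newOutgoingRowWidth == outgoingRowWidth) {
         return; // unchanged; keep the current batch holders' target row count
       }
       outgoingRowWidth = newOutgoingRowWidth;
       outputRowCount = Math.max(1, outputBatchSize / Math.max(1, outgoingRowWidth));
     }

     int getOutputRowCount() { return outputRowCount; }
   }
   ```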
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> limit batch size for hash aggregate
> ---
>
> Key: DRILL-6310
> URL: https://issues.apache.org/jira/browse/DRILL-6310
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> limit batch size for hash aggregate based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6310) limit batch size for hash aggregate

2018-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525931#comment-16525931
 ] 

ASF GitHub Bot commented on DRILL-6310:
---

Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch 
size for hash aggregate
URL: https://github.com/apache/drill/pull/1324#discussion_r198711357
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
 ##
 @@ -84,6 +97,63 @@
       "htRowIdx" /* workspace index */, "incoming" /* read container */, "outgoing" /* write container */,
       "aggrValuesContainer" /* workspace container */, UPDATE_AGGR_INSIDE, UPDATE_AGGR_OUTSIDE, UPDATE_AGGR_INSIDE);
 
+  public int getOutputRowCount() {
+    return hashAggMemoryManager.getOutputRowCount();
+  }
+
+  public RecordBatchMemoryManager getRecordBatchMemoryManager() {
+    return hashAggMemoryManager;
+  }
+
+  private class HashAggMemoryManager extends RecordBatchMemoryManager {
+    private int valuesRowWidth = 0;
+
+    HashAggMemoryManager(int outputBatchSize) {
+      super(outputBatchSize);
+    }
+
+    @Override
+    public void update() {
+      // Get sizing information for the batch.
+      setRecordBatchSizer(new RecordBatchSizer(incoming));
+
+      int fieldId = 0;
+      int newOutgoingRowWidth = 0;
+      for (VectorWrapper w : container) {
+        if (w.getValueVector() instanceof FixedWidthVector) {
+          newOutgoingRowWidth += ((FixedWidthVector) w.getValueVector()).getValueWidth();
+          if (fieldId >= numGroupByExprs) {
+            valuesRowWidth += ((FixedWidthVector) w.getValueVector()).getValueWidth();
+          }
+        } else {
+          RecordBatchSizer.ColumnSize columnSize = getRecordBatchSizer().getColumn(columnMapping.get(w.getValueVector().getField().getName()));
+          newOutgoingRowWidth += columnSize.getAllocSizePerEntry();
 
 Review comment:
   I rebased and tested, and there was a failure here (`columnSize == null`) in 
some of the new EMIT unit tests. Initially I thought that this had to do with 
the first outgoing batch (which always has OK_NEW_SCHEMA) being non-empty; but 
eventually I made a fix (see below) for the case where the 'name' and the 
'expr' are different.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> limit batch size for hash aggregate
> ---
>
> Key: DRILL-6310
> URL: https://issues.apache.org/jira/browse/DRILL-6310
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> limit batch size for hash aggregate based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6310) limit batch size for hash aggregate

2018-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525933#comment-16525933
 ] 

ASF GitHub Bot commented on DRILL-6310:
---

Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch 
size for hash aggregate
URL: https://github.com/apache/drill/pull/1324#discussion_r198711656
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
 ##
 @@ -283,16 +363,23 @@ private HashAggregator createAggregatorInternal() throws SchemaChangeException,
         continue;
       }
 
-      if ( expr instanceof FunctionHolderExpression ) {
-        String funcName = ((FunctionHolderExpression) expr).getName();
-        if ( funcName.equals("sum") || funcName.equals("max") || funcName.equals("min") ) {extraNonNullColumns++;}
-      }
       final MaterializedField outputField = MaterializedField.create(ne.getRef().getAsNamePart().getName(), expr.getMajorType());
-      @SuppressWarnings("resource")
-      ValueVector vv = TypeHelper.getNewVector(outputField, oContext.getAllocator());
+      @SuppressWarnings("resource") ValueVector vv = TypeHelper.getNewVector(outputField, oContext.getAllocator());
       aggrOutFieldIds[i] = container.add(vv);
 
       aggrExprs[i] = new ValueVectorWriteExpression(aggrOutFieldIds[i], expr, true);
+
+      if (expr instanceof FunctionHolderExpression) {
+        String funcName = ((FunctionHolderExpression) expr).getName();
+        if (funcName.equals("sum") || funcName.equals("max") || funcName.equals("min")) {
+          extraNonNullColumns++;
+        }
+        if (((FunctionCall) ne.getExpr()).args.get(0) instanceof SchemaPath) {
+          columnMapping.put(outputField.getName(), ((SchemaPath) ((FunctionCall) ne.getExpr()).args.get(0)).getAsNamePart().getName());
+        }
 
 Review comment:
   When I tested (TestHashAggEmitOutcome), there were cases that did not match 
**SchemaPath**. So I added here:
   ```
   else if (((FunctionCall) ne.getExpr()).args.get(0) instanceof FunctionCall) {
     columnMapping.put(outputField.getName(), ((FunctionCall) ((FunctionCall) ne.getExpr()).args.get(0)).getName());
   }
   ```
   The execution did get there, but I did not check further (e.g., whether this 
is used correctly in computing the row size).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> limit batch size for hash aggregate
> ---
>
> Key: DRILL-6310
> URL: https://issues.apache.org/jira/browse/DRILL-6310
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> limit batch size for hash aggregate based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6310) limit batch size for hash aggregate

2018-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525929#comment-16525929
 ] 

ASF GitHub Bot commented on DRILL-6310:
---

Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch 
size for hash aggregate
URL: https://github.com/apache/drill/pull/1324#discussion_r198702179
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java
 ##
 @@ -646,14 +647,18 @@ public PutStatus put(int incomingRowIdx, IndexPointer htIdxHolder, int hashCode)
     currentIdx = freeIndex++;
     boolean addedBatch = false;
     try {  // ADD A BATCH
-      addedBatch = addBatchIfNeeded(currentIdx);
+      addedBatch = addBatchIfNeeded(currentIdx, targetBatchRowCount);
+      if (addedBatch) {
+        // If we just added the batch, update the current index to point to beginning of new batch.
+        currentIdx = (batchHolders.size() - 1) * BATCH_SIZE;
+        freeIndex = currentIdx + 1;
+      }
     } catch (OutOfMemoryException OOME) {
-      retryAfterOOM( currentIdx < batchHolders.size() * BATCH_SIZE );
+      retryAfterOOM( currentIdx < totalBatchHoldersSize);
 
 Review comment:
   *Just a comment:* The idea of using this "max, but not actual" batch size 
(the constant BATCH_SIZE) is on one hand smart (much less code needs to 
change), but also a little awkward, as things no longer mean exactly what 
their names say (e.g., totalBatchHoldersSize).
   Maybe in the future this code should be cleaned up; e.g., keep a count of 
the batches and compare against that count, instead of the not-real total.
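   A minimal standalone illustration of that suggested cleanup (the names and 
structure here are assumptions, not code from the PR):
   ```
   import java.util.ArrayList;
   import java.util.List;

   // Detect "a batch was added" by comparing batch counts, instead of deriving
   // it from batchHolders.size() * BATCH_SIZE, which is a nominal maximum
   // rather than the actual number of rows held.
   class BatchCountSketch {
     private final List<Object> batchHolders = new ArrayList<>();

     boolean putWithCountCheck() {
       int countBefore = batchHolders.size();
       addBatchIfNeeded();
       return batchHolders.size() > countBefore; // explicit: was a batch added?
     }

     private void addBatchIfNeeded() {
       batchHolders.add(new Object()); // stand-in for the real batch allocation
     }

     public static void main(String[] args) {
       BatchCountSketch sketch = new BatchCountSketch();
       System.out.println(sketch.putWithCountCheck()); // true: a batch was added
     }
   }
   ```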


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> limit batch size for hash aggregate
> ---
>
> Key: DRILL-6310
> URL: https://issues.apache.org/jira/browse/DRILL-6310
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> limit batch size for hash aggregate based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6310) limit batch size for hash aggregate

2018-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525934#comment-16525934
 ] 

ASF GitHub Bot commented on DRILL-6310:
---

Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch 
size for hash aggregate
URL: https://github.com/apache/drill/pull/1324#discussion_r198698608
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchMemoryManager.java
 ##
 @@ -201,6 +201,10 @@ public static int adjustOutputRowCount(int rowCount) {
     return (Math.min(MAX_NUM_ROWS, Math.max(Integer.highestOneBit(rowCount) - 1, MIN_NUM_ROWS)));
   }
 
+  public static int computeOutputRowCount(int batchSize, int rowWidth) {
+    return adjustOutputRowCount(RecordBatchSizer.safeDivide(batchSize, rowWidth));
 
 Review comment:
   BTW, `safeDivide` uses `Math.ceil()`, so it may return a number one greater 
than the integer division. E.g., safeDivide(15, 2) = 8, not 7. But the `- 1` 
in the adjustment above probably fixes that issue.
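   To make the interaction concrete, a small standalone check (the clamping 
constants are assumptions; the real MIN_NUM_ROWS/MAX_NUM_ROWS live in 
RecordBatchMemoryManager):
   ```
   public class CeilAdjustCheck {
     static final int MIN_NUM_ROWS = 1;          // assumed bound
     static final int MAX_NUM_ROWS = 64 * 1024;  // assumed bound

     // Mirrors the quoted adjustOutputRowCount(): round down to 2^k - 1, then clamp.
     static int adjustOutputRowCount(int rowCount) {
       return Math.min(MAX_NUM_ROWS, Math.max(Integer.highestOneBit(rowCount) - 1, MIN_NUM_ROWS));
     }

     // Ceiling division, as safeDivide is described above.
     static int safeDivide(int a, int b) {
       return (int) Math.ceil((double) a / b);
     }

     public static void main(String[] args) {
       int rows = safeDivide(15, 2);                    // 8, one more than 15 / 2
       System.out.println(adjustOutputRowCount(rows));  // 7: the "- 1" absorbs the ceiling
     }
   }
   ```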
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> limit batch size for hash aggregate
> ---
>
> Key: DRILL-6310
> URL: https://issues.apache.org/jira/browse/DRILL-6310
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> limit batch size for hash aggregate based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6310) limit batch size for hash aggregate

2018-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525932#comment-16525932
 ] 

ASF GitHub Bot commented on DRILL-6310:
---

Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch 
size for hash aggregate
URL: https://github.com/apache/drill/pull/1324#discussion_r198712153
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
 ##
 @@ -263,13 +343,13 @@ private HashAggregator createAggregatorInternal() throws SchemaChangeException,
 
       // add this group-by vector to the output container
       groupByOutFieldIds[i] = container.add(vv);
+      columnMapping.put(outputField.getName(), ne.getRef().getAsNamePart().getName());
 
 Review comment:
   The regression tests in *TestHashAggEmitOutcome* set the "name" and the 
"expr" to different strings, hence the failure above. So I changed this to:
   ```
   columnMapping.put(outputField.getName(), ne.getExpr().toString().replace('`', ' ').trim());
   ```
   to make the regressions (e.g., **testHashAggrMultipleEMITOutcome()**) pass.
   It is a little ugly (stripping the '`' before and after), but I could not 
think of anything nicer.
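   A quick standalone check of that backtick stripping (the input string is a 
made-up example of an expression rendered with backquotes):
   ```
   public class BacktickStrip {
     public static void main(String[] args) {
       String expr = "`col_a`";                           // hypothetical ne.getExpr().toString() output
       System.out.println(expr.replace('`', ' ').trim()); // prints: col_a
     }
   }
   ```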



This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> limit batch size for hash aggregate
> ---
>
> Key: DRILL-6310
> URL: https://issues.apache.org/jira/browse/DRILL-6310
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> limit batch size for hash aggregate based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6549) batch sizing for nested loop join

2018-06-27 Thread Padma Penumarthy (JIRA)
Padma Penumarthy created DRILL-6549:
---

 Summary: batch sizing for nested loop join
 Key: DRILL-6549
 URL: https://issues.apache.org/jira/browse/DRILL-6549
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators
Affects Versions: 1.13.0
Reporter: Padma Penumarthy
Assignee: Padma Penumarthy
 Fix For: 1.14.0


limit output batch size for nested loop join based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6519) Add String Distance and Phonetic Functions

2018-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525881#comment-16525881
 ] 

ASF GitHub Bot commented on DRILL-6519:
---

cgivre commented on a change in pull request #1331: DRILL-6519: Add String 
Distance and Phonetic Functions
URL: https://github.com/apache/drill/pull/1331#discussion_r198701570
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestStringDistanceFunctions.java
 ##
 @@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.fn.impl;
+
+import org.apache.drill.categories.SqlFunctionTest;
+import org.apache.drill.categories.UnlikelyTest;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterFixtureBuilder;
+import org.apache.drill.test.ClusterTest;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+import static org.junit.Assert.assertEquals;
+
+@Category({UnlikelyTest.class, SqlFunctionTest.class})
+public class TestStringDistanceFunctions extends ClusterTest {
+
+  @BeforeClass
+  public static void setup() throws Exception {
 
 Review comment:
   @arina-ielchiieva I added a singletonDouble() function and reworked the unit 
tests accordingly. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add String Distance and Phonetic Functions
> --
>
> Key: DRILL-6519
> URL: https://issues.apache.org/jira/browse/DRILL-6519
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.14.0
>
>
> From a recent project, this collection of functions makes it possible to do 
> fuzzy string matching as well as phonetic matching on strings. 
>  
> The following functions are all phonetic functions and map text to a number 
> or string based on how the word sounds.  For instance "Jayme" and "Jaime" 
> have the same soundex values and hence these functions can be used to match 
> similar sounding words.
>  * caverphone1( <string> )
>  * caverphone2( <string> )
>  * cologne_phonetic( <string> )
>  * dm_soundex( <string> )
>  * double_metaphone( <string> )
>  * match_rating_encoder( <string> )
>  * metaphone( <string> )
>  * nysiis( <string> )
>  * refined_soundex( <string> )
>  * soundex( <string> )
> Additionally, there is the
> {code:java}
> sounds_like(<string1>, <string2>){code}
> function which can be used to find strings that sound similar.   For instance:
>  
> {code:java}
> SELECT * 
> FROM <table>
> WHERE sounds_like( last_name, 'Gretsky' )
> {code}
> h2. String Distance Functions
> In addition to the phonetic functions, there are a series of distance 
> functions which measure the difference between two strings.  The functions 
> include:
>  * cosine_distance(<string1>, <string2>)
>  * fuzzy_score(<string1>, <string2>)
>  * hamming_distance(<string1>, <string2>)
>  * jaccard_distance(<string1>, <string2>)
>  * jaro_distance(<string1>, <string2>)
>  * levenshtein_distance(<string1>, <string2>)
>  * longest_common_substring_distance(<string1>, <string2>)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6422) Update guava to 23.0 and shade it

2018-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525834#comment-16525834
 ] 

ASF GitHub Bot commented on DRILL-6422:
---

vrozov commented on a change in pull request #1264:  DRILL-6422: Update guava 
to 23.0 and shade it
URL: https://github.com/apache/drill/pull/1264#discussion_r198691250
 
 

 ##
 File path: drill-shaded/pom.xml
 ##
 @@ -0,0 +1,84 @@
+
+
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <groupId>org.apache</groupId>
+    <artifactId>apache</artifactId>
+    <version>18</version>
+  </parent>
+
+  <groupId>org.apache.drill</groupId>
+  <artifactId>drill-shaded</artifactId>
+  <version>1.0</version>
+
+  <name>drill-shaded</name>
+  <packaging>pom</packaging>
+
+  <modules>
+    <module>guava-shaded</module>
+  </modules>
+
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
+        <artifactId>maven-compiler-plugin</artifactId>
+
 
 Review comment:
   AFAIK the class version should not change when a class is relocated. If the 
guava classes have version 52, the same version will be in the shaded library.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Update guava to 23.0 and shade it
> -
>
> Key: DRILL-6422
> URL: https://issues.apache.org/jira/browse/DRILL-6422
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.14.0
>
>
> Some hadoop libraries use old versions of guava and most of them are 
> incompatible with guava 23.0.
> To allow usage of new guava version, it should be shaded and shaded version 
> should be used in the project.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6422) Update guava to 23.0 and shade it

2018-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525830#comment-16525830
 ] 

ASF GitHub Bot commented on DRILL-6422:
---

vrozov commented on a change in pull request #1264:  DRILL-6422: Update guava 
to 23.0 and shade it
URL: https://github.com/apache/drill/pull/1264#discussion_r198690513
 
 

 ##
 File path: drill-shaded/guava-shaded/pom.xml
 ##
 @@ -0,0 +1,147 @@
+
+
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <groupId>org.apache</groupId>
+    <artifactId>apache</artifactId>
+    <version>18</version>
+  </parent>
+
+  <groupId>org.apache.drill</groupId>
+  <artifactId>guava-shaded</artifactId>
+  <version>${dep.guava.version}</version>
+  <name>drill-shaded/guava-shaded</name>
+
+  <packaging>jar</packaging>
+
+  <properties>
+    <!-- one property tag name was lost in the mail archive; its value was "false" -->
+    <dep.guava.version>23.0</dep.guava.version>
+  </properties>
+
+  <dependencies>
+    <dependency>
+      <groupId>com.google.guava</groupId>
+      <artifactId>guava</artifactId>
+      <version>${dep.guava.version}</version>
+      <type>jar</type>
+    </dependency>
+  </dependencies>
+
+  <build>
+    <!-- a directory setting was also lost; its value was "${project.build.directory}/shaded-sources" -->
+    <plugins>
 
 Review comment:
   It is meaningless to bind the `default-compile` or `default-testCompile` 
execution `id`s for `maven-surefire-plugin`, as that plugin does not define 
such execution `id`s. The intention of binding a default execution `id` to 
phase `none` is to disable execution of those plugins when desired, instead 
of relying on other settings (like `skip` or `skipTests`).
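   For illustration, disabling a plugin's built-in execution by rebinding its 
`id` to phase `none` looks like this (a standard Maven idiom; the plugin shown 
is just an example):
   ```
   <plugin>
     <groupId>org.apache.maven.plugins</groupId>
     <artifactId>maven-compiler-plugin</artifactId>
     <executions>
       <execution>
         <!-- rebind the plugin's default execution so it never runs -->
         <id>default-compile</id>
         <phase>none</phase>
       </execution>
     </executions>
   </plugin>
   ```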


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Update guava to 23.0 and shade it
> -
>
> Key: DRILL-6422
> URL: https://issues.apache.org/jira/browse/DRILL-6422
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.14.0
>
>
> Some hadoop libraries use old versions of guava and most of them are 
> incompatible with guava 23.0.
> To allow usage of new guava version, it should be shaded and shaded version 
> should be used in the project.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6422) Update guava to 23.0 and shade it

2018-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525815#comment-16525815
 ] 

ASF GitHub Bot commented on DRILL-6422:
---

vrozov commented on a change in pull request #1264:  DRILL-6422: Update guava 
to 23.0 and shade it
URL: https://github.com/apache/drill/pull/1264#discussion_r198689121
 
 

 ##
 File path: pom.xml
 ##
 @@ -3334,5 +3351,6 @@
     <module>exec</module>
     <module>drill-yarn</module>
     <module>distribution</module>
+    <module>drill-shaded</module>
 
 Review comment:
   What type of changes do you expect in pom.xml? Why would you prefer to build 
the same artifact over and over again even though it does not change?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Update guava to 23.0 and shade it
> -
>
> Key: DRILL-6422
> URL: https://issues.apache.org/jira/browse/DRILL-6422
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.14.0
>
>
> Some hadoop libraries use old versions of guava and most of them are 
> incompatible with guava 23.0.
> To allow usage of new guava version, it should be shaded and shaded version 
> should be used in the project.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-6454) Native MapR DB plugin support for Hive MapR-DB json table

2018-06-27 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522841#comment-16522841
 ] 

Bridget Bevens edited comment on DRILL-6454 at 6/28/18 12:21 AM:
-

Hi [~vitalii],

Thank you for adding the doc notes to the description. 

I'll add the following description to the MapR Drill release notes:
A new option, store.hive.maprdb_json.optimize_scan_with_native_reader, enables 
Drill to use the native Drill reader to read Hive external tables that were 
created from MapR-DB JSON tables. When you enable this option, Drill performs 
faster reads of the data and applies filter pushdown optimizations. 

I'll add the following info to the MapR MapR-DB format plugin doc page:
Starting in Drill 1.14 (MEP 6.0), Drill can use the native Drill reader to read 
Hive external tables that were created from MapR-DB JSON tables. When using the 
native reader, Drill performs faster reads of the data and can apply filter 
pushdown optimizations. Use the SET command with the 
store.hive.maprdb_json.optimize_scan_with_native_reader option to enable this 
functionality:
SET `store.hive.maprdb_json.optimize_scan_with_native_reader` = true;  

Regarding the store.hive.parquet.optimize_scan_with_native_reader option, I've 
updated the following pages with the new option and included a note on the 
options page about the old option being deprecated in Drill 1.15:
https://drill.apache.org/docs/querying-hive/ 
https://drill.apache.org/docs/configuration-options-introduction/ 

I'm going to set doc status to doc-complete, but please let me know if you see 
any issues with the doc updates.

Thanks,
Bridget


was (Author: bbevens):
Hi [~vitalii],

Thank you for adding the doc notes to the description. 
Can you please review the following description for the 
store.hive.maprdb_json.optimize_scan_with_native_reader option?

Description:
When you enable the store.hive.maprdb_json.optimize_scan_with_native_reader 
option, Drill can use the native Drill reader to read [Hive external tables 
that were created from MapR-DB JSON 
tables|https://maprdocs.mapr.com/home/Hive/ConnectingToMapR-DB.html]. The 
native Drill reader enables Drill to perform faster reads of data and apply 
filter pushdown optimizations.  

Thanks,

Bridget


> Native MapR DB plugin support for Hive MapR-DB json table
> -
>
> Key: DRILL-6454
> URL: https://issues.apache.org/jira/browse/DRILL-6454
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive, Storage - MapRDB
>Affects Versions: 1.13.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-complete, ready-to-commit
> Fix For: 1.14.0
>
>
> Hive can create and query MapR-DB tables via maprdb-json-handler:
>  [https://maprdocs.mapr.com/home/Hive/ConnectingToMapR-DB.html]
> The aim of this Jira is to implement a Drill native reader for Hive MapR-DB 
> tables (similar to parquet).
> Design proposal is:
>  - to use JsonTableGroupScan instead of HiveScan;
>  - to add storage planning rule to convert HiveScan to MapRDBGroupScan;
>  - to add system/session option to enable using of this native reader;
>  - native reader can be used only for Drill build with mapr profile (there is 
> no reason to leverage it for default profile);
>  
> *For documentation:*
> two new options were added:
> store.hive.parquet.optimize_scan_with_native_reader: false,
> store.hive.maprdb_json.optimize_scan_with_native_reader: false,
> store.hive.parquet.optimize_scan_with_native_reader is new option used 
> instead of store.hive.optimize_scan_with_native_readers. The latter is 
> deprecated and will be removed in 1.15.
> (https://issues.apache.org/jira/browse/DRILL-6527).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6454) Native MapR DB plugin support for Hive MapR-DB json table

2018-06-27 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens updated DRILL-6454:
--
Labels: doc-complete ready-to-commit  (was: doc-impacting ready-to-commit)

> Native MapR DB plugin support for Hive MapR-DB json table
> -
>
> Key: DRILL-6454
> URL: https://issues.apache.org/jira/browse/DRILL-6454
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - Hive, Storage - MapRDB
>Affects Versions: 1.13.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-complete, ready-to-commit
> Fix For: 1.14.0
>
>
> Hive can create and query MapR-DB tables via maprdb-json-handler:
>  [https://maprdocs.mapr.com/home/Hive/ConnectingToMapR-DB.html]
> The aim of this Jira is to implement a Drill native reader for Hive MapR-DB 
> tables (similar to parquet).
> Design proposal is:
>  - to use JsonTableGroupScan instead of HiveScan;
>  - to add storage planning rule to convert HiveScan to MapRDBGroupScan;
>  - to add system/session option to enable using of this native reader;
>  - native reader can be used only for Drill build with mapr profile (there is 
> no reason to leverage it for default profile);
>  
> *For documentation:*
> two new options were added:
> store.hive.parquet.optimize_scan_with_native_reader: false,
> store.hive.maprdb_json.optimize_scan_with_native_reader: false,
> store.hive.parquet.optimize_scan_with_native_reader is new option used 
> instead of store.hive.optimize_scan_with_native_readers. The latter is 
> deprecated and will be removed in 1.15.
> (https://issues.apache.org/jira/browse/DRILL-6527).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6527) Update option name for Drill Parquet native reader

2018-06-27 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525749#comment-16525749
 ] 

Bridget Bevens commented on DRILL-6527:
---

I've updated the name where I could find it in the docs, on the following pages:
https://drill.apache.org/docs/querying-hive/ 
https://drill.apache.org/docs/configuration-options-introduction/ 
Thanks,
Bridget

> Update option name for Drill Parquet native reader
> --
>
> Key: DRILL-6527
> URL: https://issues.apache.org/jira/browse/DRILL-6527
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - Parquet
>Affects Versions: 1.14.0
>Reporter: Vitalii Diravka
>Priority: Minor
> Fix For: 1.15.0
>
>
> The old option name to enable Drill parquet reader is 
> "store.hive.optimize_scan_with_native_readers".
> Starting from DRILL-6454 one new native reader is introduced, therefore more 
> precise option name is added for parquet native reader too.
> A new option name for parquet reader is 
> "store.hive.parquet.optimize_scan_with_native_reader".
> The old one is deprecated and should be removed starting from Drill 1.15.0 
> release.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6494) Drill Plugins Handler

2018-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525731#comment-16525731
 ] 

ASF GitHub Bot commented on DRILL-6494:
---

vdiravka commented on issue #1345: DRILL-6494: Drill Plugins Handler
URL: https://github.com/apache/drill/pull/1345#issuecomment-400864708
 
 
   @sohami Please review


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill Plugins Handler
> -
>
> Key: DRILL-6494
> URL: https://issues.apache.org/jira/browse/DRILL-6494
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Tools, Build & Test
>Affects Versions: 1.13.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: 1.14.0
>
>
> A new service for updating Drill's plugin configs could be implemented.
> Please find details from design overview document:
> https://docs.google.com/document/d/14JKb2TA8dGnOIE5YT2RImkJ7R0IAYSGjJg8xItL5yMI/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6494) Drill Plugins Handler

2018-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525730#comment-16525730
 ] 

ASF GitHub Bot commented on DRILL-6494:
---

vdiravka opened a new pull request #1345: DRILL-6494: Drill Plugins Handler
URL: https://github.com/apache/drill/pull/1345
 
 
   - StoragePluginsHandler is added, with StoragePluginsUpdater as its 
implementation. It is used in the init() stage of StoragePluginRegistryImpl 
and updates storage plugin configs from the storage-plugins.conf file (see 
the sketch after this list). If plugin configs are present in the persistence 
store, they are updated; otherwise the bootstrap plugins are updated and the 
resulting configs are loaded into the persistence store. If the enabled 
status is absent in the storage-plugins.conf file, the last plugin config's 
enabled status persists.
   - The "NULL" issue with updating the Hive plugin config via REST is solved. 
But clients are still being instantiated for disabled plugins - DRILL-6412.
   - The "org.honton.chas.hocon:jackson-dataformat-hocon" library is added for 
properly deserializing the HOCON conf file.
   - Additional refactoring: "com.typesafe:config" and 
"org.apache.commons:commons-lang3" are placed into the dependencyManagement 
block with proper versions; correct properties in the DrillMetrics class are 
specified.
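   For context, a hypothetical storage-plugins.conf entry of the kind such an 
updater would consume (the plugin name, connection, and keys below are 
illustrative only; see the design document linked in the Jira for the actual 
format):
   ```
   "storage": {
     dfs: {
       type: "file",
       connection: "hdfs://localhost:9000/",
       enabled: true
     }
   }
   ```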


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Drill Plugins Handler
> -
>
> Key: DRILL-6494
> URL: https://issues.apache.org/jira/browse/DRILL-6494
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Tools, Build & Test
>Affects Versions: 1.13.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: 1.14.0
>
>
> A new service for updating Drill's plugin configs could be implemented.
> Please find details from design overview document:
> https://docs.google.com/document/d/14JKb2TA8dGnOIE5YT2RImkJ7R0IAYSGjJg8xItL5yMI/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6498) Support for EMIT outcome in ExternalSortBatch

2018-06-27 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6498:
-
Labels: ready-to-commit  (was: )

> Support for EMIT outcome in ExternalSortBatch
> -
>
> Key: DRILL-6498
> URL: https://issues.apache.org/jira/browse/DRILL-6498
> Project: Apache Drill
>  Issue Type: Task
>  Components: Execution - Relational Operators
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> With Lateral and Unnest, if Sort is present in the sub-query, then it needs to 
> handle the EMIT outcome correctly. This means that when an EMIT is received, 
> Sort performs the sort operation on the records buffered so far and produces 
> output with them. After EMIT, Sort should refresh its state and again work on 
> the next batches of incoming records until an EMIT is seen again.
> For a first cut, Sort will not support spilling in the subquery between Lateral 
> and Unnest, since spilling is very unlikely there. The worst case that can 
> happen is that Lateral will get a batch with only 1 row of data because the 
> repeated-type column's data size is too big. In that case Unnest will produce 
> only 1 output batch, and Sort or other blocking operators anyway need enough 
> memory to hold at least 1 incoming batch. So in ideal cases spilling should 
> not happen. But if there is an operator between Sort and Unnest which 
> increases the data size, then Sort might be in a situation where it has to 
> spill, but that is not a common case for now.
>  
> *Description of Changes:*
>  Currently the sort operator is implemented as described below. This is to 
> provide a general high-level view of how Sort works and how EMIT support was 
> implemented.
> 1) In the buildSchema phase, SORT creates an empty container with SV NONE and 
> sends that downstream.
> 2) Post buildSchema phase, it goes into a LOAD state and keeps calling next() 
> on the upstream until it sees NONE or there is a failure.
> 3) For each batch it receives, it applies an SV2 (if the batch doesn't already 
> have one), sorts it, and then buffers the batch after converting it into 
> something called BatchGroup.InputBatch.
> 4) During buffering it watches for memory pressure and spills as needed.
> 5) Once all the batches are received and it gets NONE from upstream, it 
> starts a merge phase.
> 6) In the merge phase it checks whether the merge can happen in memory or 
> spilling is needed, and performs the merge accordingly.
> 7) Sort has a concept of SortResults, which represents the different kinds of 
> output container that Sort can generate based on input batches and memory 
> conditions. For example, if it's an in-memory merge, then the output container 
> of Sort is an SV4 container with SortResults of type MergeSortWrapper. If it's 
> spill-and-merge, then the container is of SV_NONE type with SortResults as 
> BatchMerger. There are SortResults types for empty and single batches (not 
> used anywhere).
> 8) SortResults basically provides an abstraction such that with each next() 
> call it provides an output result backed by the output container of the 
> ExternalSortRecordBatch, along with the correct recordCount and SV2/SV4 as 
> needed. So, for example, in the case of MergeSortWrapper all the inputs are in 
> memory and hence all output is also in memory, backed by an SV4. For each 
> next() call the SV4 is updated with the start index and length, which informs 
> the caller about the record boundary it should consume. For BatchMerger, based 
> on memory pressure and the number of records Sort can output with each output 
> container, it fills the output container with that many records and sends it 
> downstream.
> 9) Also, the abstraction of SortResults is such that at the beginning of the 
> merge phase the output container held by SortResults is cleared, and it is 
> later re-initialized after the merge is completed.
> Now, in the current code, since SORT is a blocking operator, it was clearing 
> the output container's ValueVectors post buildSchema phase and in the load 
> phase. Later it creates the final output container (with ValueVector objects) 
> after it has seen all the incoming data. The very first output batch is 
> always returned with OK_NEW_SCHEMA such that the downstream operator can set 
> up the correct SV mode and schema with the first output batch, since the 
> schema returned in the buildSchema phase was a dummy one. So the vector 
> references maintained by the downstream operator in the buildSchema phase are 
> updated with the vector references in the first output batch.
> With EMIT, however, SORT will go into the load phase multiple times, and hence 
> we cannot clear off the output container of Sort after each EMIT boundary. If 
> we do that, then the ValueVector references to ExternalSort's output 
> container held by the downstream operator will become invalid

[jira] [Assigned] (DRILL-6529) Project Batch Sizing causes two LargeFileCompilation tests to timeout

2018-06-27 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-6529:


Assignee: Pritesh Maker  (was: Karthikeyan Manivannan)

> Project Batch Sizing causes two LargeFileCompilation tests to timeout
> -
>
> Key: DRILL-6529
> URL: https://issues.apache.org/jira/browse/DRILL-6529
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Karthikeyan Manivannan
>Assignee: Pritesh Maker
>Priority: Major
> Fix For: 1.14.0
>
>
> Timeout failures are seen in TestLargeFileCompilation testExternal_Sort and 
> testTop_N_Sort. These tests are stress tests for compilation where the 
> queries cover projections over 5000 columns and sort over 500 columns. These 
> tests pass if they are run stand-alone. Something triggers the timeouts when 
> the tests are run in parallel as part of a unit test run.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6529) Project Batch Sizing causes two LargeFileCompilation tests to timeout

2018-06-27 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker reassigned DRILL-6529:


Assignee: Karthikeyan Manivannan  (was: Pritesh Maker)

> Project Batch Sizing causes two LargeFileCompilation tests to timeout
> -
>
> Key: DRILL-6529
> URL: https://issues.apache.org/jira/browse/DRILL-6529
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
>Priority: Major
> Fix For: 1.14.0
>
>
> Timeout failures are seen in TestLargeFileCompilation testExternal_Sort and 
> testTop_N_Sort. These tests are stress tests for compilation where the 
> queries cover projections over 5000 columns and sort over 500 columns. These 
> tests pass if they are run stand-alone. Something triggers the timeouts when 
> the tests are run in parallel as part of a unit test run.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6145) Enable usage of Hive MapR-DB JSON handler

2018-06-27 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens updated DRILL-6145:
--
Labels: doc-complete  (was: doc-impacting)

> Enable usage of Hive MapR-DB JSON handler
> -
>
> Key: DRILL-6145
> URL: https://issues.apache.org/jira/browse/DRILL-6145
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - MapRDB
>Affects Versions: 1.12.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-complete
> Fix For: 1.14.0
>
>
> Similar to "hive-hbase-storage-handler", to support querying Hive external 
> tables backed by MapR-DB it is necessary to add a "hive-maprdb-json-handler".
> Use case:
>  # Create a MapR-DB JSON table:
> {code}
> _> mapr dbshell_
> _maprdb root:> create /tmp/table/json_  (make sure /tmp/table exists)
> {code}
> -- insert data
> {code}
> insert /tmp/table/json --value '\{"_id":"movie002" , "title":"Developers 
> on the Edge", "studio":"Command Line Studios"}'
> insert /tmp/table/json --id movie003 --value '\{"title":"The Golden 
> Master", "studio":"All-Nighter"}'
> {code} 
>  #  Create a Hive external table:
> {code}
> hive> CREATE EXTERNAL TABLE mapr_db_json_hive_tbl ( 
> > movie_id string, title string, studio string) 
> > STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' 
> > TBLPROPERTIES("maprdb.table.name" = 
> "/tmp/table/json","maprdb.column.id" = "movie_id");
> {code}
>  
>  #  Use hive schema to query this table via Drill:
> {code}
> 0: jdbc:drill:> select * from hive.mapr_db_json_hive_tbl;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6145) Enable usage of Hive MapR-DB JSON handler

2018-06-27 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525716#comment-16525716
 ] 

Bridget Bevens commented on DRILL-6145:
---

Setting doc status to doc-complete, but you can let me know if you see any 
issues and I can make the necessary changes. 
Thanks,
Bridget

> Enable usage of Hive MapR-DB JSON handler
> -
>
> Key: DRILL-6145
> URL: https://issues.apache.org/jira/browse/DRILL-6145
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Hive, Storage - MapRDB
>Affects Versions: 1.12.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-complete
> Fix For: 1.14.0
>
>
> Similar to "hive-hbase-storage-handler", to support querying Hive external 
> tables backed by MapR-DB it is necessary to add a "hive-maprdb-json-handler".
> Use case:
>  # Create a MapR-DB JSON table:
> {code}
> _> mapr dbshell_
> _maprdb root:> create /tmp/table/json_  (make sure /tmp/table exists)
> {code}
> -- insert data
> {code}
> insert /tmp/table/json --value '\{"_id":"movie002" , "title":"Developers 
> on the Edge", "studio":"Command Line Studios"}'
> insert /tmp/table/json --id movie003 --value '\{"title":"The Golden 
> Master", "studio":"All-Nighter"}'
> {code} 
>  #  Create a Hive external table:
> {code}
> hive> CREATE EXTERNAL TABLE mapr_db_json_hive_tbl ( 
> > movie_id string, title string, studio string) 
> > STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' 
> > TBLPROPERTIES("maprdb.table.name" = 
> "/tmp/table/json","maprdb.column.id" = "movie_id");
> {code}
>  
>  #  Use hive schema to query this table via Drill:
> {code}
> 0: jdbc:drill:> select * from hive.mapr_db_json_hive_tbl;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6094) Decimal data type enhancements

2018-06-27 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens updated DRILL-6094:
--
Labels: doc-complete  (was: doc-impacting)

> Decimal data type enhancements
> --
>
> Key: DRILL-6094
> URL: https://issues.apache.org/jira/browse/DRILL-6094
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: doc-complete
> Fix For: 1.14.0
>
>
> Currently, Decimal types are disabled by default since the existing Decimal 
> implementation has a lot of flaws and performance problems. The goal of this 
> Jira is to describe the majority of them and possible ways of improving the 
> existing implementation so that Decimal data types can be enabled by default.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6094) Decimal data type enhancements

2018-06-27 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525711#comment-16525711
 ] 

Bridget Bevens commented on DRILL-6094:
---

I edited the following pages based on the work done for DECIMAL data type and 
input from Volodymyr. (Thank you, [~vvysotskyi]!)
https://drill.apache.org/docs/aggregate-and-aggregate-statistical/ 
https://drill.apache.org/docs/aggregate-window-functions/ 
https://drill.apache.org/docs/data-type-conversion/#data-type-conversion-examples
 
https://drill.apache.org/docs/supported-data-types/ 
https://drill.apache.org/docs/parquet-format/ 
https://drill.apache.org/docs/math-and-trig/   

I also created a new section called “Decimal Data Type” on this page to cover 
some of the changes: 
https://drill.apache.org/docs/supported-data-types/#decimal-data-type  

I'm setting doc status to doc-complete, but please let me know if I missed 
anything or need to make any changes.

Thanks,
Bridget


> Decimal data type enhancements
> --
>
> Key: DRILL-6094
> URL: https://issues.apache.org/jira/browse/DRILL-6094
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.14.0
>
>
> Currently, Decimal types are disabled by default since the existing Decimal 
> implementation has a lot of flaws and performance problems. The goal of this 
> Jira is to describe the majority of them and possible ways of improving the 
> existing implementation so that Decimal data types can be enabled by default.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6548) IllegalStateException: Unexpected EMIT outcome received in buildSchema phase

2018-06-27 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-6548:
-

 Summary: IllegalStateException: Unexpected EMIT outcome received 
in buildSchema phase
 Key: DRILL-6548
 URL: https://issues.apache.org/jira/browse/DRILL-6548
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.14.0
Reporter: Khurram Faraaz
Assignee: Sorabh Hamirwasia


On a four-node cluster running the Apache Drill 1.14.0 master branch, against 
TPC-DS SF1 parquet data (parquet views)
git.commit.id.abbrev=b92f599

TPC-DS query 69 fails with IllegalStateException: Unexpected EMIT outcome 
received in buildSchema phase

The failing query is:

{noformat}
2018-06-27 15:24:39,493 [24cbf157-e95c-42ab-7307-f75f5943a277:foreman] INFO 
o.a.drill.exec.work.foreman.Foreman - Query text for query id 
24cbf157-e95c-42ab-7307-f75f5943a277: SELECT cd_gender,
cd_marital_status,
cd_education_status,
Count(*) cnt1,
cd_purchase_estimate,
Count(*) cnt2,
cd_credit_rating,
FROM customer c,
customer_address ca,
customer_demographics
WHERE c.c_current_addr_sk = ca.ca_address_sk
AND ca_state IN ( 'KS', 'AZ', 'NE' )
AND cd_demo_sk = c.c_current_cdemo_sk
AND EXISTS (SELECT *
FROM store_sales,
date_dim
WHERE c.c_customer_sk = ss_customer_sk
AND ss_sold_date_sk = d_date_sk
AND d_year = 2004
AND d_moy BETWEEN 3 AND 3 + 2)
AND ( NOT EXISTS (SELECT *
FROM web_sales,
date_dim
WHERE c.c_customer_sk = ws_bill_customer_sk
AND ws_sold_date_sk = d_date_sk
AND d_year = 2004
AND d_moy BETWEEN 3 AND 3 + 2)
AND NOT EXISTS (SELECT *
FROM catalog_sales,
date_dim
WHERE c.c_customer_sk = cs_ship_customer_sk
AND cs_sold_date_sk = d_date_sk
AND d_year = 2004
AND d_moy BETWEEN 3 AND 3 + 2) )
GROUP BY cd_gender,
cd_marital_status,
cd_education_status,
cd_purchase_estimate,
cd_credit_rating
ORDER BY cd_gender,
cd_marital_status,
cd_education_status,
cd_purchase_estimate,
cd_credit_rating
cd_credit_rating
LIMIT 100
{noformat}

Stack trace from drillbit.log

{noformat}
2018-06-27 15:24:42,130 [24cbf157-e95c-42ab-7307-f75f5943a277:frag:0:0] ERROR 
o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
Unexpected EMIT outcome received in buildSchema phase

Fragment 0:0

[Error Id: ba1a35e0-807e-4bab-b820-8aa6aad80e87 on qa102-45.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
IllegalStateException: Unexpected EMIT outcome received in buildSchema phase

Fragment 0:0

[Error Id: ba1a35e0-807e-4bab-b820-8aa6aad80e87 on qa102-45.qa.lab:31010]
 at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
 ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
 at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
 [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
 at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
 [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
 at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
 [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
 at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[na:1.8.0_161]
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_161]
 at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
Caused by: java.lang.IllegalStateException: Unexpected EMIT outcome received in 
buildSchema phase
 at 
org.apache.drill.exec.physical.impl.TopN.TopNBatch.buildSchema(TopNBatch.java:178)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
 at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
 at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
 at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
 at 
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
 at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
 at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
 at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
 at 
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
 at 
org.ap

[jira] [Updated] (DRILL-6531) Errors in example for "Aggregate Function Interface" Boaz Ben-Zvi Fri 6/15, 5:54 PM Bridget Bevens

2018-06-27 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens updated DRILL-6531:
--
Labels: doc-complete  (was: )

> Errors in example for "Aggregate Function Interface" Boaz Ben-Zvi Fri 6/15, 
> 5:54 PM Bridget Bevens
> --
>
> Key: DRILL-6531
> URL: https://issues.apache.org/jira/browse/DRILL-6531
> Project: Apache Drill
>  Issue Type: Task
>  Components: Documentation
>Reporter: Bridget Bevens
>Assignee: Bridget Bevens
>Priority: Minor
>  Labels: doc-complete
> Fix For: 1.14.0
>
>
> Hi Bridget,
>  
>There seems to be an error in the example shown in 
> https://drill.apache.org/docs/custom-function-interfaces/
> The error is logical, not relating to the main topic (Aggregate Function 
> Interface), but may slightly confuse anyone carefully reading this doc (like 
> me ☺)
> The error is – the red line should come before the brown line:
> @Override
> public void add() {
> if (in.value < min.value) {
>   min.value = in.value;
>   secondMin.value = min.value;
> }
> That is - Should be:
>  
> @Override
> public void add() {
> if (in.value < min.value) {
>   secondMin.value = min.value;
>   min.value = in.value;
> }
>   This comes from interpreting the name of the new function (“The second most 
> minimum”).
> While on the subject – looks like the reset() function is also wrong (need to 
> reset to high numbers, not zero):
>  
> @Override
> public void reset() {
>   min.value = 0;  →  9
>   secondMin.value = 0;  →  9
> }
>   Thanks,
>  
> Boaz
>  
>  
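Putting both corrections together, a minimal sketch of the fixed example in the doc page's DrillAggFunc style (a high sentinel such as Double.MAX_VALUE stands in for the "9" placeholder; this is not verbatim from the published page):

{code:java}
import org.apache.drill.exec.expr.DrillAggFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.annotations.Workspace;
import org.apache.drill.exec.expr.holders.Float8Holder;

@FunctionTemplate(name = "second_min",
    scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE,
    nulls = FunctionTemplate.NullHandling.INTERNAL)
public class SecondMinFunction implements DrillAggFunc {

  @Param Float8Holder in;
  @Workspace Float8Holder min;
  @Workspace Float8Holder secondMin;
  @Output Float8Holder out;

  @Override
  public void setup() {
    min.value = Double.MAX_VALUE;        // start high, not at zero
    secondMin.value = Double.MAX_VALUE;
  }

  @Override
  public void add() {
    if (in.value < min.value) {
      secondMin.value = min.value;       // save the old minimum first
      min.value = in.value;              // then record the new minimum
    }
  }

  @Override
  public void output() {
    out.value = secondMin.value;
  }

  @Override
  public void reset() {
    min.value = Double.MAX_VALUE;        // reset to a high value, not zero
    secondMin.value = Double.MAX_VALUE;
  }
}
{code}

As in the published example, add() ignores the case where a value falls between min and secondMin; the sketch only fixes the ordering and reset issues Boaz points out.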



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-6531) Errors in example for "Aggregate Function Interface" Boaz Ben-Zvi Fri 6/15, 5:54 PM Bridget Bevens

2018-06-27 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens closed DRILL-6531.
-

doc complete, closing issue

> Errors in example for "Aggregate Function Interface" Boaz Ben-Zvi Fri 6/15, 
> 5:54 PM Bridget Bevens
> --
>
> Key: DRILL-6531
> URL: https://issues.apache.org/jira/browse/DRILL-6531
> Project: Apache Drill
>  Issue Type: Task
>  Components: Documentation
>Reporter: Bridget Bevens
>Assignee: Bridget Bevens
>Priority: Minor
>  Labels: doc-complete
> Fix For: 1.14.0
>
>
> Hi Bridget,
>  
>There seems to be an error in the example shown in 
> https://drill.apache.org/docs/custom-function-interfaces/
> The error is logical, not relating to the main topic (Aggregate Function 
> Interface), but may slightly confuse anyone carefully reading this doc (like 
> me ☺)
> The error is – the red line should come before the brown line:
> @Override
> public void add() {
> if (in.value < min.value) {
>   min.value = in.value;
>   secondMin.value = min.value;
> }
> That is - Should be:
>  
> @Override
> public void add() {
> if (in.value < min.value) {
>   secondMin.value = min.value;
>   min.value = in.value;
> }
>   This comes from interpreting the name of the new function (“The second most 
> minimum”).
> While on the subject – looks like the reset() function is also wrong (need to 
> reset to high numbers, not zero):
>  
> @Override
> public void reset() {
>   min.value = 0;  →  9
>   secondMin.value = 0;  →  9
> }
>   Thanks,
>  
> Boaz
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6531) Errors in example for "Aggregate Function Interface" Boaz Ben-Zvi Fri 6/15, 5:54 PM Bridget Bevens

2018-06-27 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens resolved DRILL-6531.
---
Resolution: Fixed

updated doc with the suggested changes. 
thanks,
Bridget

> Errors in example for "Aggregate Function Interface" Boaz Ben-Zvi Fri 6/15, 
> 5:54 PM Bridget Bevens
> --
>
> Key: DRILL-6531
> URL: https://issues.apache.org/jira/browse/DRILL-6531
> Project: Apache Drill
>  Issue Type: Task
>  Components: Documentation
>Reporter: Bridget Bevens
>Assignee: Bridget Bevens
>Priority: Minor
>  Labels: doc-complete
> Fix For: 1.14.0
>
>
> Hi Bridget,
>  
>There seems to be an error in the example shown in 
> https://drill.apache.org/docs/custom-function-interfaces/
> The error is logical, not relating to the main topic (Aggregate Function 
> Interface), but may slightly confuse anyone carefully reading this doc (like 
> me ☺)
> The error is – the red line should come before the brown line:
> @Override
> public void add() {
> if (in.value < min.value) {
>   min.value = in.value;
>   secondMin.value = min.value;
> }
> That is - Should be:
>  
> @Override
> public void add() {
> if (in.value < min.value) {
>   secondMin.value = min.value;
>   min.value = in.value;
> }
>   This comes from interpreting the name of the new function (“The second most 
> minimum”).
> While on the subject – looks like the reset() function is also wrong (need to 
> reset to high numbers, not zero):
>  
> @Override
> public void reset() {
>   min.value = 0;  →  9
>   secondMin.value = 0;  →  9
> }
>   Thanks,
>  
> Boaz
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-6393) Radians should take an argument (x)

2018-06-27 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens closed DRILL-6393.
-

Doc-complete, closing issue.

> Radians should take an argument (x)
> ---
>
> Key: DRILL-6393
> URL: https://issues.apache.org/jira/browse/DRILL-6393
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.13.0
>Reporter: Robert Hou
>Assignee: Bridget Bevens
>Priority: Major
>  Labels: doc-complete
> Fix For: 1.14.0
>
>
> The radians function is missing an argument on this webpage:
>https://drill.apache.org/docs/math-and-trig/
> The table has this information:
> {noformat}
> RADIANS   FLOAT8  Converts x degress to radians.
> {noformat}
> It should be:
> {noformat}
> RADIANS(x)  FLOAT8  Converts x degrees to radians.
> {noformat}
> Also, degress is mis-spelled.  It should be degrees.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6393) Radians should take an argument (x)

2018-06-27 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens updated DRILL-6393:
--
Labels: doc-complete  (was: doc-impacting)

> Radians should take an argument (x)
> ---
>
> Key: DRILL-6393
> URL: https://issues.apache.org/jira/browse/DRILL-6393
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.13.0
>Reporter: Robert Hou
>Assignee: Bridget Bevens
>Priority: Major
>  Labels: doc-complete
> Fix For: 1.14.0
>
>
> The radians function is missing an argument on this webpage:
>https://drill.apache.org/docs/math-and-trig/
> The table has this information:
> {noformat}
> RADIANS   FLOAT8  Converts x degress to radians.
> {noformat}
> It should be:
> {noformat}
> RADIANS(x)  FLOAT8  Converts x degrees to radians.
> {noformat}
> Also, degress is mis-spelled.  It should be degrees.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6393) Radians should take an argument (x)

2018-06-27 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens resolved DRILL-6393.
---
Resolution: Fixed

Made the changes and updates are published. 
Thanks,
Bridget

> Radians should take an argument (x)
> ---
>
> Key: DRILL-6393
> URL: https://issues.apache.org/jira/browse/DRILL-6393
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.13.0
>Reporter: Robert Hou
>Assignee: Bridget Bevens
>Priority: Major
>  Labels: doc-complete
> Fix For: 1.14.0
>
>
> The radians function is missing an argument on this webpage:
>https://drill.apache.org/docs/math-and-trig/
> The table has this information:
> {noformat}
> RADIANS   FLOAT8  Converts x degress to radians.
> {noformat}
> It should be:
> {noformat}
> RADIANS(x)  FLOAT8  Converts x degrees to radians.
> {noformat}
> Also, degress is mis-spelled.  It should be degrees.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6547) IllegalStateException: Tried to remove unmanaged buffer.

2018-06-27 Thread Robert Hou (JIRA)
Robert Hou created DRILL-6547:
-

 Summary: IllegalStateException: Tried to remove unmanaged buffer.
 Key: DRILL-6547
 URL: https://issues.apache.org/jira/browse/DRILL-6547
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Robert Hou
Assignee: Pritesh Maker


This is the query:
select * from (
select Index, concat(BinaryValue, 'aaa') NewVarcharValue from (select * from 
dfs.`/drill/testdata/batch_memory/alltypes_large_1MB.parquet`)) d where d.Index 
= 1;

This is the plan:
{noformat}
00-00    Screen
00-01      Project(Index=[$0], NewVarcharValue=[$1])
00-02        SelectionVectorRemover
00-03          Filter(condition=[=($0, 1)])
00-04            Project(Index=[$0], NewVarcharValue=[CONCAT($1, 'aaa')])
00-05              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=maprfs:///drill/testdata/batch_memory/alltypes_large_1MB.parquet]], 
selectionRoot=maprfs:/drill/testdata/batch_memory/alltypes_large_1MB.parquet, 
numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`Index`, 
`BinaryValue`]]])
{noformat}

Here is the stack trace from drillbit.log:
{noformat}
2018-06-27 13:55:03,291 [24cc0659-30b7-b290-7fae-ecb1c1f15c05:frag:0:0] ERROR 
o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
Tried to remove unmanaged buffer.

Fragment 0:0

[Error Id: bc1f2f72-c31b-4b9a-964f-96dec9e0f388 on qa-node186.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
IllegalStateException: Tried to remove unmanaged buffer.

Fragment 0:0

[Error Id: bc1f2f72-c31b-4b9a-964f-96dec9e0f388 on qa-node186.qa.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
 ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361)
 [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216)
 [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327)
 [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[na:1.8.0_161]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_161]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
Caused by: java.lang.IllegalStateException: Tried to remove unmanaged buffer.
at 
org.apache.drill.exec.ops.BufferManagerImpl.replace(BufferManagerImpl.java:50) 
~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at io.netty.buffer.DrillBuf.reallocIfNeeded(DrillBuf.java:97) 
~[drill-memory-base-1.14.0-SNAPSHOT.jar:4.0.48.Final]
at 
org.apache.drill.exec.test.generated.ProjectorGen4046.doEval(ProjectorTemplate.java:77)
 ~[na:na]
at 
org.apache.drill.exec.test.generated.ProjectorGen4046.projectRecords(ProjectorTemplate.java:67)
 ~[na:na]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:236)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:117)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:147)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAP
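The failing frame is DrillBuf.reallocIfNeeded() called from the generated projector's doEval(), which in turn calls BufferManagerImpl.replace(). A minimal sketch of the pattern involved, under the assumption that this mirrors the generated code's buffer handling (illustrative only, not the actual generated source):

{code:java}
import io.netty.buffer.DrillBuf;

// Sketch of the buffer-growth pattern in generated projection code.
// reallocIfNeeded() may allocate a replacement buffer and hand the old one
// back to the fragment's BufferManager; if that old buffer is no longer
// tracked by the manager, BufferManagerImpl.replace() throws
// "Tried to remove unmanaged buffer".
public class ReallocSketch {
  static DrillBuf growForOutput(DrillBuf buffer, int neededBytes) {
    return buffer.reallocIfNeeded(neededBytes);
  }
}
{code}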

[jira] [Reopened] (DRILL-6529) Project Batch Sizing causes two LargeFileCompilation tests to timeout

2018-06-27 Thread Daniel Gruno (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Gruno reopened DRILL-6529:
-

oops, wrong ticket! sorry!!

> Project Batch Sizing causes two LargeFileCompilation tests to timeout
> -
>
> Key: DRILL-6529
> URL: https://issues.apache.org/jira/browse/DRILL-6529
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
>Priority: Major
> Fix For: 1.14.0
>
>
> Timeout failures are seen in TestLargeFileCompilation testExternal_Sort and 
> testTop_N_Sort. These tests are stress tests for compilation where the 
> queries cover projections over 5000 columns and sort over 500 columns. These 
> tests pass if they are run stand-alone. Something triggers the timeouts when 
> the tests are run in parallel as part of a unit test run.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6529) Project Batch Sizing causes two LargeFileCompilation tests to timeout

2018-06-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525620#comment-16525620
 ] 

ASF GitHub Bot commented on DRILL-6529:
---

bitblender commented on issue #1335: DRILL-6529: Project Batch Sizing causes 
two LargeFileCompilation tests to timeout
URL: https://github.com/apache/drill/pull/1335#issuecomment-400823880
 
 
   @vvysotskyi @ilooner  NUM_PROJECT_COLUMNS controls 3 other tests besides 
testTopN and testExternalSort - testPARQUET_WRITER(), testTEXT_Writer and 
TestProject.  How about we set NUM_PROJECT_COLUMNS=2500 and then introduce a 
new constant NUM_PROJECT_TEST_COLUMNS=1 for testProject? Basically, reduce 
the stress on SORTers and WRITERs but bump up the column count on testProject 
to a number which will push the code generated for Project over the constant 
pool limit. testProject takes about 130s on my Mac with 1 columns.
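For context on what these constants drive: the stress tests generate their SQL programmatically, so the column-count constant directly controls how much code Drill has to generate and compile. A rough sketch of how such a wide projection can be built (hypothetical helper, not TestLargeFileCompilation's actual code):

{code:java}
// Hypothetical sketch: build a SELECT with numColumns generated expressions
// to stress code generation (not the test's actual helper).
static String wideProjection(String table, int numColumns) {
  StringBuilder sql = new StringBuilder("SELECT ");
  for (int i = 0; i < numColumns; i++) {
    if (i > 0) {
      sql.append(", ");
    }
    // each distinct expression adds entries to the generated class's constant pool
    sql.append("employee_id + ").append(i).append(" AS col").append(i);
  }
  return sql.append(" FROM ").append(table).toString();
}
{code}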


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Project Batch Sizing causes two LargeFileCompilation tests to timeout
> -
>
> Key: DRILL-6529
> URL: https://issues.apache.org/jira/browse/DRILL-6529
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
>Priority: Major
> Fix For: 1.14.0
>
>
> Timeout failures are seen in TestLargeFileCompilation testExternal_Sort and 
> testTop_N_Sort. These tests are stress tests for compilation where the 
> queries cover projections over 5000 columns and sort over 500 columns. These 
> tests pass if they are run stand-alone. Something triggers the timeouts when 
> the tests are run in parallel as part of a unit test run.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6461) Add Basic Data Correctness Unit Tests

2018-06-27 Thread Timothy Farkas (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas updated DRILL-6461:
--
Reviewer: salim achouche

> Add Basic Data Correctness Unit Tests
> -
>
> Key: DRILL-6461
> URL: https://issues.apache.org/jira/browse/DRILL-6461
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>Priority: Major
>
> There are no data correctness unit tests for HashAgg. We need to add some.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6512) Remove unnecessary processing overhead from RecordBatchSizer

2018-06-27 Thread Padma Penumarthy (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Padma Penumarthy updated DRILL-6512:

  Labels: ready-to-commit  (was: )
Reviewer: Karthikeyan Manivannan

> Remove unnecessary processing overhead from RecordBatchSizer
> 
>
> Key: DRILL-6512
> URL: https://issues.apache.org/jira/browse/DRILL-6512
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> The record batch sizer collects a lot of information about the record batch. 
> Since it is now used in every operator, for every batch, it makes sense to 
> make it as efficient as possible. Remove anything that is not needed and, 
> possibly, provide two options: one lightweight and another more comprehensive. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6537) Limit the batch size for buffering operators based on how much memory they get

2018-06-27 Thread Pritesh Maker (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525348#comment-16525348
 ] 

Pritesh Maker commented on DRILL-6537:
--

[https://github.com/apache/drill/pull/1342] ([~ppenumarthy])

> Limit the batch size for buffering operators based on how much memory they get
> --
>
> Key: DRILL-6537
> URL: https://issues.apache.org/jira/browse/DRILL-6537
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> We use 16 MB as the default output batch size for all operators. However, 
> for buffering operators, depending on how much memory they are allocated, 
> 16 MB may be too large an output batch size, causing them to spill sometimes.
> Set the output batch size to the minimum of 16 MB and 20% of the allocated memory.
>  
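A minimal sketch of the proposed rule (constant and method names are illustrative, not Drill's actual option names):

{code:java}
// Proposed rule: output batch size = min(16 MB, 20% of allocated memory).
public class OutputBatchSizing {
  static final long DEFAULT_OUTPUT_BATCH_BYTES = 16L * 1024 * 1024;  // 16 MB

  static long outputBatchSize(long allocatedMemoryBytes) {
    return Math.min(DEFAULT_OUTPUT_BATCH_BYTES,
        (long) (0.20 * allocatedMemoryBytes));
  }
}
{code}

For example, a buffering operator allocated 40 MB would get an 8 MB output batch rather than the 16 MB default.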



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6410) Memory leak in Parquet Reader during cancellation

2018-06-27 Thread Pritesh Maker (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525259#comment-16525259
 ] 

Pritesh Maker commented on DRILL-6410:
--

[~vrozov] is this the PR - [https://github.com/apache/drill/pull/1333] ?

> Memory leak in Parquet Reader during cancellation
> -
>
> Key: DRILL-6410
> URL: https://issues.apache.org/jira/browse/DRILL-6410
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Reporter: salim achouche
>Assignee: Vlad Rozov
>Priority: Major
> Fix For: 1.14.0
>
>
> Occasionally, a memory leak is observed within the flat Parquet reader when 
> query cancellation is invoked.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6539) Record count not set for this vector container error

2018-06-27 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6539:
-
Reviewer: Karthikeyan Manivannan  (was: Timothy Farkas)

> Record count not set for this vector container error 
> -
>
> Key: DRILL-6539
> URL: https://issues.apache.org/jira/browse/DRILL-6539
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> This error is randomly seen when executing queries.
> [Error Id: 6a2a49e5-28d9-4587-ab8b-5262c07f8fdc on drill196:31010]
>   (java.lang.IllegalStateException) Record count not set for this vector 
> container
> com.google.common.base.Preconditions.checkState():173
> org.apache.drill.exec.record.VectorContainer.getRecordCount():394
> org.apache.drill.exec.record.RecordBatchSizer.<init>():681
> org.apache.drill.exec.record.RecordBatchSizer.<init>():665
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.getActualSize():441
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.getActualSize():882
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.makeDebugString():891
> 
> org.apache.drill.exec.physical.impl.common.HashPartition.makeDebugString():578
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.makeDebugString():937
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase():754
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():335
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.test.generated.HashAggregatorGen89497.doWork():617
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.test.generated.HashAggregatorGen89497.doWork():617
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch():403
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():354
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():299
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.physical.impl.BaseRootExec.next():103
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
> org.apache.drill.exec.physical.impl.BaseRootExec.next():93
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
> org.apache.drill.common.SelfC
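The exception originates in the Preconditions check that VectorContainer.getRecordCount() performs (top of the trace). A simplified sketch of the invariant, with illustrative field names rather than Drill's actual implementation:

{code:java}
import com.google.common.base.Preconditions;

// Simplified sketch of the invariant behind the error: a container's record
// count must be set before any consumer (here, RecordBatchSizer) reads it.
class VectorContainerSketch {
  private int recordCount = -1;   // -1 marks "not yet set"

  void setRecordCount(int count) {
    this.recordCount = count;
  }

  int getRecordCount() {
    Preconditions.checkState(recordCount != -1,
        "Record count not set for this vector container");
    return recordCount;
  }
}
{code}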

[jira] [Updated] (DRILL-6539) Record count not set for this vector container error

2018-06-27 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6539:
-
Labels: ready-to-commit  (was: )

> Record count not set for this vector container error 
> -
>
> Key: DRILL-6539
> URL: https://issues.apache.org/jira/browse/DRILL-6539
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> This error is randomly seen when executing queries.
> [Error Id: 6a2a49e5-28d9-4587-ab8b-5262c07f8fdc on drill196:31010]
>   (java.lang.IllegalStateException) Record count not set for this vector 
> container
> com.google.common.base.Preconditions.checkState():173
> org.apache.drill.exec.record.VectorContainer.getRecordCount():394
> org.apache.drill.exec.record.RecordBatchSizer.<init>():681
> org.apache.drill.exec.record.RecordBatchSizer.<init>():665
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.getActualSize():441
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.getActualSize():882
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.makeDebugString():891
> 
> org.apache.drill.exec.physical.impl.common.HashPartition.makeDebugString():578
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.makeDebugString():937
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase():754
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():335
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.test.generated.HashAggregatorGen89497.doWork():617
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.test.generated.HashAggregatorGen89497.doWork():617
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch():403
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():354
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():299
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.physical.impl.BaseRootExec.next():103
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
> org.apache.drill.exec.physical.impl.BaseRootExec.next():93
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
> org.apache.drill.common.SelfCleaningRunnable.run():3

[jira] [Commented] (DRILL-6539) Record count not set for this vector container error

2018-06-27 Thread Pritesh Maker (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525256#comment-16525256
 ] 

Pritesh Maker commented on DRILL-6539:
--

[~ppenumarthy] I don't see the PR with this JIRA

> Record count not set for this vector container error 
> -
>
> Key: DRILL-6539
> URL: https://issues.apache.org/jira/browse/DRILL-6539
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> This error is randomly seen when executing queries.
> [Error Id: 6a2a49e5-28d9-4587-ab8b-5262c07f8fdc on drill196:31010]
>   (java.lang.IllegalStateException) Record count not set for this vector 
> container
> com.google.common.base.Preconditions.checkState():173
> org.apache.drill.exec.record.VectorContainer.getRecordCount():394
> org.apache.drill.exec.record.RecordBatchSizer.<init>():681
> org.apache.drill.exec.record.RecordBatchSizer.<init>():665
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.getActualSize():441
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.getActualSize():882
> 
> org.apache.drill.exec.physical.impl.common.HashTableTemplate.makeDebugString():891
> 
> org.apache.drill.exec.physical.impl.common.HashPartition.makeDebugString():578
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.makeDebugString():937
> 
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase():754
> org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():335
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.test.generated.HashAggregatorGen89497.doWork():617
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.test.generated.HashAggregatorGen89497.doWork():617
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch():403
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():354
> 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():299
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137
> org.apache.drill.exec.record.AbstractRecordBatch.next():172
> org.apache.drill.exec.physical.impl.BaseRootExec.next():103
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
> org.apache.drill.exec.physical.impl.BaseRootExec.next():93
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecuto

[jira] [Updated] (DRILL-4020) The not-equal operator returns incorrect results when used on the HBase row key

2018-06-27 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-4020:
-
Reviewer: Parth Chandra

> The not-equal operator returns incorrect results when used on the HBase row 
> key
> ---
>
> Key: DRILL-4020
> URL: https://issues.apache.org/jira/browse/DRILL-4020
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HBase
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
> Environment: Drill Sandbox
>Reporter: Akihiko Kusanagi
>Assignee: Akihiko Kusanagi
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Create a test HBase table:
> {noformat}
> hbase> create 'table', 'f'
> hbase> put 'table', 'row1', 'f:c', 'value1'
> hbase> put 'table', 'row2', 'f:c', 'value2'
> hbase> put 'table', 'row3', 'f:c', 'value3'
> {noformat}
> The table looks like this:
> {noformat}
> 0: jdbc:drill:zk=maprdemo:5181> SELECT CONVERT_FROM(row_key, 'UTF8') FROM 
> hbase.`table`;
> +---------+
> | EXPR$0  |
> +---------+
> | row1    |
> | row2    |
> | row3    |
> +---------+
> 1 row selected (4.596 seconds)
> {noformat}
> However, this query returns incorrect results when a not-equal operator is 
> used on the row key:
> {noformat}
> 0: jdbc:drill:zk=maprdemo:5181> SELECT CONVERT_FROM(row_key, 'UTF8') FROM 
> hbase.`table` WHERE row_key <> 'row1';
> +---------+
> | EXPR$0  |
> +---------+
> | row1    |
> | row2    |
> | row3    |
> +---------+
> 1 row selected (0.573 seconds)
> {noformat}
> In the query plan, there is no RowFilter:
> {noformat}
> 00-00    Screen
> 00-01      Project(EXPR$0=[CONVERT_FROMUTF8($0)])
> 00-02        Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec 
> [tableName=table, startRow=, stopRow=, filter=null], columns=[`row_key`]]])
> {noformat}
> When the query has multiple not-equal operators, it works fine:
> {noformat}
> 0: jdbc:drill:zk=maprdemo:5181> SELECT CONVERT_FROM(row_key, 'UTF8') FROM 
> hbase.`table` WHERE row_key <> 'row1' AND row_key <> 'row2';
> +---------+
> | EXPR$0  |
> +---------+
> | row3    |
> +---------+
> 1 row selected (0.255 seconds)
> {noformat}
> In the query plan, a FilterList has two RowFilters with NOT_EQUAL operators:
> {noformat}
> 00-00    Screen
> 00-01      Project(EXPR$0=[CONVERT_FROMUTF8($0)])
> 00-02        Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec 
> [tableName=table, startRow=, stopRow=, filter=FilterList AND (2/2): 
> [RowFilter (NOT_EQUAL, row1), RowFilter (NOT_EQUAL, row2)]], 
> columns=[`row_key`]]])
> {noformat}
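For reference, the single-predicate scan should carry the same kind of filter that the two-predicate FilterList above already contains. A sketch using the standard HBase client API (illustrative; not Drill's actual pushdown code):

{code:java}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class NotEqualScanSketch {
  // Build a scan whose filter matches what the two-predicate plan shows:
  // RowFilter (NOT_EQUAL, row1)
  static Scan notEqualScan(String rowKey) {
    Scan scan = new Scan();
    scan.setFilter(new RowFilter(CompareFilter.CompareOp.NOT_EQUAL,
        new BinaryComparator(Bytes.toBytes(rowKey))));
    return scan;
  }
}
{code}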



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-4020) The not-equal operator returns incorrect results when used on the HBase row key

2018-06-27 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-4020:
-
Fix Version/s: 1.14.0

> The not-equal operator returns incorrect results when used on the HBase row 
> key
> ---
>
> Key: DRILL-4020
> URL: https://issues.apache.org/jira/browse/DRILL-4020
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HBase
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
> Environment: Drill Sandbox
>Reporter: Akihiko Kusanagi
>Assignee: Akihiko Kusanagi
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Create a test HBase table:
> {noformat}
> hbase> create 'table', 'f'
> hbase> put 'table', 'row1', 'f:c', 'value1'
> hbase> put 'table', 'row2', 'f:c', 'value2'
> hbase> put 'table', 'row3', 'f:c', 'value3'
> {noformat}
> The table looks like this:
> {noformat}
> 0: jdbc:drill:zk=maprdemo:5181> SELECT CONVERT_FROM(row_key, 'UTF8') FROM 
> hbase.`table`;
> +---------+
> | EXPR$0  |
> +---------+
> | row1    |
> | row2    |
> | row3    |
> +---------+
> 1 row selected (4.596 seconds)
> {noformat}
> However, this query returns incorrect results when a not-equal operator is 
> used on the row key:
> {noformat}
> 0: jdbc:drill:zk=maprdemo:5181> SELECT CONVERT_FROM(row_key, 'UTF8') FROM 
> hbase.`table` WHERE row_key <> 'row1';
> +---------+
> | EXPR$0  |
> +---------+
> | row1    |
> | row2    |
> | row3    |
> +---------+
> 1 row selected (0.573 seconds)
> {noformat}
> In the query plan, there is no RowFilter:
> {noformat}
> 00-00    Screen
> 00-01      Project(EXPR$0=[CONVERT_FROMUTF8($0)])
> 00-02        Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec 
> [tableName=table, startRow=, stopRow=, filter=null], columns=[`row_key`]]])
> {noformat}
> When the query has multiple not-equal operators, it works fine:
> {noformat}
> 0: jdbc:drill:zk=maprdemo:5181> SELECT CONVERT_FROM(row_key, 'UTF8') FROM 
> hbase.`table` WHERE row_key <> 'row1' AND row_key <> 'row2';
> +---------+
> | EXPR$0  |
> +---------+
> | row3    |
> +---------+
> 1 row selected (0.255 seconds)
> {noformat}
> In the query plan, a FilterList has two RowFilters with NOT_EQUAL operators:
> {noformat}
> 00-00    Screen
> 00-01      Project(EXPR$0=[CONVERT_FROMUTF8($0)])
> 00-02        Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec 
> [tableName=table, startRow=, stopRow=, filter=FilterList AND (2/2): 
> [RowFilter (NOT_EQUAL, row1), RowFilter (NOT_EQUAL, row2)]], 
> columns=[`row_key`]]])
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-4020) The not-equal operator returns incorrect results when used on the HBase row key

2018-06-27 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-4020:
-
Labels: ready-to-commit  (was: )

> The not-equal operator returns incorrect results when used on the HBase row 
> key
> ---
>
> Key: DRILL-4020
> URL: https://issues.apache.org/jira/browse/DRILL-4020
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HBase
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
> Environment: Drill Sandbox
>Reporter: Akihiko Kusanagi
>Assignee: Akihiko Kusanagi
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Create a test HBase table:
> {noformat}
> hbase> create 'table', 'f'
> hbase> put 'table', 'row1', 'f:c', 'value1'
> hbase> put 'table', 'row2', 'f:c', 'value2'
> hbase> put 'table', 'row3', 'f:c', 'value3'
> {noformat}
> The table looks like this:
> {noformat}
> 0: jdbc:drill:zk=maprdemo:5181> SELECT CONVERT_FROM(row_key, 'UTF8') FROM 
> hbase.`table`;
> +---------+
> | EXPR$0  |
> +---------+
> | row1    |
> | row2    |
> | row3    |
> +---------+
> 1 row selected (4.596 seconds)
> {noformat}
> However, this query returns incorrect results when a not-equal operator is 
> used on the row key:
> {noformat}
> 0: jdbc:drill:zk=maprdemo:5181> SELECT CONVERT_FROM(row_key, 'UTF8') FROM 
> hbase.`table` WHERE row_key <> 'row1';
> +---------+
> | EXPR$0  |
> +---------+
> | row1    |
> | row2    |
> | row3    |
> +---------+
> 1 row selected (0.573 seconds)
> {noformat}
> In the query plan, there is no RowFilter:
> {noformat}
> 00-00    Screen
> 00-01      Project(EXPR$0=[CONVERT_FROMUTF8($0)])
> 00-02        Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec 
> [tableName=table, startRow=, stopRow=, filter=null], columns=[`row_key`]]])
> {noformat}
> When the query has multiple not-equal operators, it works fine:
> {noformat}
> 0: jdbc:drill:zk=maprdemo:5181> SELECT CONVERT_FROM(row_key, 'UTF8') FROM 
> hbase.`table` WHERE row_key <> 'row1' AND row_key <> 'row2';
> +---------+
> | EXPR$0  |
> +---------+
> | row3    |
> +---------+
> 1 row selected (0.255 seconds)
> {noformat}
> In the query plan, a FilterList has two RowFilters with NOT_EQUAL operators:
> {noformat}
> 00-00    Screen
> 00-01      Project(EXPR$0=[CONVERT_FROMUTF8($0)])
> 00-02        Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec 
> [tableName=table, startRow=, stopRow=, filter=FilterList AND (2/2): 
> [RowFilter (NOT_EQUAL, row1), RowFilter (NOT_EQUAL, row2)]], 
> columns=[`row_key`]]])
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6537) Limit the batch size for buffering operators based on how much memory they get

2018-06-27 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6537:
-
Reviewer: Boaz Ben-Zvi

> Limit the batch size for buffering operators based on how much memory they get
> --
>
> Key: DRILL-6537
> URL: https://issues.apache.org/jira/browse/DRILL-6537
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.13.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>Priority: Major
> Fix For: 1.14.0
>
>
> We use 16 MB as the default output batch size for all operators. However, 
> for buffering operators, depending on how much memory they are allocated, 
> 16 MB may be too large an output batch size, causing them to spill sometimes.
> Set the output batch size to the minimum of 16 MB and 20% of the allocated memory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6546) Allow unnest function with nested columns and complex expressions

2018-06-27 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-6546:
-
Fix Version/s: 1.14.0

> Allow unnest function with nested columns and complex expressions
> -
>
> Key: DRILL-6546
> URL: https://issues.apache.org/jira/browse/DRILL-6546
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.14.0
>
>
> Currently, queries with unnest over nested columns or complex expressions 
> fail:
> {code:sql}
> select u.item from cp.`lateraljoin/nested-customer.parquet` c,
> unnest(c.orders.items) as u(item)
> {code}
> fails with the error:
> {noformat}
> VALIDATION ERROR: From line 2, column 10 to line 2, column 21: Column 
> 'orders.items' not found in table 'c'
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6546) Allow unnest function with nested columns and complex expressions

2018-06-27 Thread Volodymyr Vysotskyi (JIRA)
Volodymyr Vysotskyi created DRILL-6546:
--

 Summary: Allow unnest function with nested columns and complex 
expressions
 Key: DRILL-6546
 URL: https://issues.apache.org/jira/browse/DRILL-6546
 Project: Apache Drill
  Issue Type: Bug
Reporter: Volodymyr Vysotskyi
Assignee: Volodymyr Vysotskyi


Currently, queries with unnest over nested columns or complex expressions 
fail:
{code:sql}
select u.item from cp.`lateraljoin/nested-customer.parquet` c,
unnest(c.orders.items) as u(item)
{code}
fails with the error:
{noformat}
VALIDATION ERROR: From line 2, column 10 to line 2, column 21: Column 
'orders.items' not found in table 'c'
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6545) Projection Push down into Lateral Join operator.

2018-06-27 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6545:
---
Affects Version/s: (was: 1.13.0)

> Projection Push down into Lateral Join operator.
> 
>
> Key: DRILL-6545
> URL: https://issues.apache.org/jira/browse/DRILL-6545
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.14.0
>
>
> For the Lateral’s logical and physical plan node, we would need to add an 
> output RowType such that a Projection can be pushed down to Lateral. 
> Currently, Lateral will produce all columns from left and right and it 
> depends on a subsequent Project to eliminate unneeded columns. However, this 
> will blow up the memory use of Lateral since each column from the left will 
> be replicated N times based on N rows coming from UNNEST. We can have a 
> ProjectLateralPushdownRule that pushes only the plain columns onto LATERAL 
> but keeps the expression evaluations as part of the Project above the 
> Lateral.
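A back-of-envelope illustration of the replication cost described above (all numbers are made up for the example):

{code:java}
// Illustrative arithmetic only: every left-side column value is repeated
// once per row that UNNEST emits, so columns LATERAL carries unnecessarily
// multiply the memory cost.
public class LateralCostSketch {
  public static void main(String[] args) {
    long leftRows = 1_000;      // rows entering LATERAL from the left
    long leftRowBytes = 400;    // bytes per left row across carried columns
    long unnestFactor = 100;    // average rows UNNEST emits per left row

    long lateralOutputBytes = leftRows * unnestFactor * leftRowBytes;
    System.out.println(lateralOutputBytes + " bytes");  // 40,000,000 here
  }
}
{code}

That is about 40 MB of Lateral output for what was originally 400 KB of left-side data, which is why projecting away unneeded columns below Lateral matters.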



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6545) Projection Push down into Lateral Join operator.

2018-06-27 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6545:
---
Issue Type: Improvement  (was: Bug)

> Projection Push down into Lateral Join operator.
> 
>
> Key: DRILL-6545
> URL: https://issues.apache.org/jira/browse/DRILL-6545
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.14.0
>
>
> For the Lateral’s logical and physical plan node, we would need to add an 
> output RowType such that a Projection can be pushed down to Lateral. 
> Currently, Lateral will produce all columns from left and right and it 
> depends on a subsequent Project to eliminate unneeded columns. However, this 
> will blow up the memory use of Lateral since each column from the left will 
> be replicated N times based on N rows coming from UNNEST. We can have a 
> ProjectLateralPushdownRule that pushes only the plain columns onto LATERAL 
> but keeps the expression evaluations as part of the Project above the 
> Lateral.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6545) Projection Push down into Lateral Join operator.

2018-06-27 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-6545:
--

 Summary: Projection Push down into Lateral Join operator.
 Key: DRILL-6545
 URL: https://issues.apache.org/jira/browse/DRILL-6545
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.13.0
Reporter: Hanumath Rao Maduri
Assignee: Hanumath Rao Maduri
 Fix For: 1.14.0


For the Lateral’s logical and physical plan node, we would need to add an 
output RowType such that a Projection can be pushed down to Lateral. Currently, 
Lateral will produce all columns from left and right and it depends on a 
subsequent Project to eliminate unneeded columns. However, this will blow up 
the memory use of Lateral since each column from the left will be replicated N 
times based on N rows coming from UNNEST. We can have a 
ProjectLateralPushdownRule that pushes only the plain columns onto LATERAL but 
keeps the expression evaluations as part of the Project above the Lateral.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6534) Upgrade ZooKeeper patch version to 3.4.12 and add Apache Curator to dependencyManagement

2018-06-27 Thread Bohdan Kazydub (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bohdan Kazydub updated DRILL-6534:
--
Summary: Upgrade ZooKeeper patch version to 3.4.12 and add Apache Curator 
to dependencyManagement  (was: Upgrade ZooKeeper patch version to 3.4.12 and 
add Apache curator to dependencyManagement)

> Upgrade ZooKeeper patch version to 3.4.12 and add Apache Curator to 
> dependencyManagement
> 
>
> Key: DRILL-6534
> URL: https://issues.apache.org/jira/browse/DRILL-6534
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Minor
> Fix For: 1.14.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6534) Upgrade ZooKeeper patch version to 3.4.12 and add Apache curator to dependencyManagement

2018-06-27 Thread Bohdan Kazydub (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bohdan Kazydub updated DRILL-6534:
--
Summary: Upgrade ZooKeeper patch version to 3.4.12 and add Apache curator 
to dependencyManagement  (was: Upgrade ZooKeeper patch version to 3.4.12)

> Upgrade ZooKeeper patch version to 3.4.12 and add Apache curator to 
> dependencyManagement
> 
>
> Key: DRILL-6534
> URL: https://issues.apache.org/jira/browse/DRILL-6534
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Minor
> Fix For: 1.14.0
>
>
> Also moved apache curator dependencies to dependencyManagement



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6534) Upgrade ZooKeeper patch version to 3.4.12 and add Apache curator to dependencyManagement

2018-06-27 Thread Bohdan Kazydub (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bohdan Kazydub updated DRILL-6534:
--
Description: (was: Also moved apache curator dependencies to 
dependencyManagement)

> Upgrade ZooKeeper patch version to 3.4.12 and add Apache curator to 
> dependencyManagement
> 
>
> Key: DRILL-6534
> URL: https://issues.apache.org/jira/browse/DRILL-6534
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Minor
> Fix For: 1.14.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6534) Upgrade ZooKeeper patch version to 3.4.12

2018-06-27 Thread Bohdan Kazydub (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bohdan Kazydub updated DRILL-6534:
--
Description: Also moved apache curator dependencies to dependencyManagement

> Upgrade ZooKeeper patch version to 3.4.12
> -
>
> Key: DRILL-6534
> URL: https://issues.apache.org/jira/browse/DRILL-6534
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Minor
> Fix For: 1.14.0
>
>
> Also moved apache curator dependencies to dependencyManagement



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6534) Upgrade ZooKeeper patch version to 3.4.12

2018-06-27 Thread Bohdan Kazydub (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bohdan Kazydub updated DRILL-6534:
--
Summary: Upgrade ZooKeeper patch version to 3.4.12  (was: Upgrade ZooKeeper 
patch version to 3.4.11)

> Upgrade ZooKeeper patch version to 3.4.12
> -
>
> Key: DRILL-6534
> URL: https://issues.apache.org/jira/browse/DRILL-6534
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Minor
> Fix For: 1.14.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Issue Comment Deleted] (DRILL-6540) Upgrade to HADOOP-3.1 libraries

2018-06-27 Thread Bohdan Kazydub (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bohdan Kazydub updated DRILL-6540:
--
Comment: was deleted

(was: Note: I've tried to set version 4.0.1 explicitly for the Curator 
libraries and everything seemed to work, but the maxsize of the drill-jdbc-all 
module needs to increase by 3 MB, as curator-client.jar 4.0.1 is 2.7 MB, 
compared to 70 kB for the current (2.7.1) version.)

> Upgrade to HADOOP-3.1 libraries 
> 
>
> Key: DRILL-6540
> URL: https://issues.apache.org/jira/browse/DRILL-6540
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Vitalii Diravka
>Priority: Major
>
> Currently Drill uses version 2.7.1 of the Hadoop libraries (hadoop-common, 
> hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, 
> hadoop-yarn-client).
> Half a year ago, [Hadoop 
> 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was released, and it 
> was recently followed by an update, [Hadoop 
> 3.1|https://hadoop.apache.org/docs/r3.1.0/]. 
> To run Drill on a Hadoop 3.0 distribution we need this upgrade. The newer 
> version also includes new features that can be useful for Drill.
> This upgrade is also needed to leverage the newest version of the ZooKeeper 
> libraries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6540) Upgrade to HADOOP-3.1 libraries

2018-06-27 Thread Bohdan Kazydub (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524934#comment-16524934
 ] 

Bohdan Kazydub commented on DRILL-6540:
---

Note: I've tried to set version 4.0.1 explicitly for the Curator libraries 
and everything seemed to work, but the maxsize of the drill-jdbc-all module 
needs to increase by 3 MB, as curator-client.jar 4.0.1 is 2.7 MB, compared 
to 70 kB for the current (2.7.1) version.

> Upgrade to HADOOP-3.1 libraries 
> 
>
> Key: DRILL-6540
> URL: https://issues.apache.org/jira/browse/DRILL-6540
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Vitalii Diravka
>Priority: Major
>
> Currently Drill uses version 2.7.1 of the Hadoop libraries (hadoop-common, 
> hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, 
> hadoop-yarn-client).
> Half a year ago, [Hadoop 
> 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was released, and it 
> was recently followed by an update, [Hadoop 
> 3.1|https://hadoop.apache.org/docs/r3.1.0/]. 
> To run Drill on a Hadoop 3.0 distribution we need this upgrade. The newer 
> version also includes new features that can be useful for Drill.
> This upgrade is also needed to leverage the newest version of the ZooKeeper 
> libraries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6544) Timestamp value in Drill UI showed inconsistently with the same value retrieved from sqlline

2018-06-27 Thread Anton Gozhiy (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Gozhiy reassigned DRILL-6544:
---

Assignee: Anton Gozhiy

> Timestamp value in Drill UI showed inconsistently with the same value 
> retrieved from sqlline
> ---
>
> Key: DRILL-6544
> URL: https://issues.apache.org/jira/browse/DRILL-6544
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Anton Gozhiy
>Assignee: Anton Gozhiy
>Priority: Minor
>
> *Query:*
> {code:sql}
> select timestamp '2008-2-23 12:23:34' from (values(1));
> {code}
> *Expected result (from sqlline):*
> 2008-02-23 12:23:34.0
> *Actual result (from Drill UI):*
> 2008-02-23T12:23:34
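The gap is consistent with two standard Java renderings of the same instant: java.sql.Timestamp.toString() (what sqlline prints) versus ISO-8601 LocalDateTime.toString(). A sketch of the formatting difference (plain Java, not Drill's UI code):

{code:java}
import java.sql.Timestamp;
import java.time.LocalDateTime;

public class TimestampRendering {
  public static void main(String[] args) {
    LocalDateTime ldt = LocalDateTime.of(2008, 2, 23, 12, 23, 34);
    System.out.println(Timestamp.valueOf(ldt)); // 2008-02-23 12:23:34.0  (sqlline style)
    System.out.println(ldt);                    // 2008-02-23T12:23:34    (Web UI style)
  }
}
{code}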



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6544) Timestamp value in Drill UI showed inconsistently with the same value retrieved from sqlline

2018-06-27 Thread Anton Gozhiy (JIRA)
Anton Gozhiy created DRILL-6544:
---

 Summary: Timestamp value in Drill UI showed inconsistently with 
the same value retrieved from sqlline
 Key: DRILL-6544
 URL: https://issues.apache.org/jira/browse/DRILL-6544
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.14.0
Reporter: Anton Gozhiy


*Query:*
{code:sql}
select timestamp '2008-2-23 12:23:34' from (values(1));
{code}

*Expected result (from sqlline):*
2008-02-23 12:23:34.0

*Actual result (from Drill UI):*
2008-02-23T12:23:34



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)