[jira] [Commented] (DRILL-6310) limit batch size for hash aggregate
[ https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525930#comment-16525930 ] ASF GitHub Bot commented on DRILL-6310: --- Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch size for hash aggregate URL: https://github.com/apache/drill/pull/1324#discussion_r198697870 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java ## @@ -84,6 +97,63 @@ "htRowIdx" /* workspace index */, "incoming" /* read container */, "outgoing" /* write container */, "aggrValuesContainer" /* workspace container */, UPDATE_AGGR_INSIDE, UPDATE_AGGR_OUTSIDE, UPDATE_AGGR_INSIDE); + public int getOutputRowCount() { +return hashAggMemoryManager.getOutputRowCount(); + } + + public RecordBatchMemoryManager getRecordBatchMemoryManager() { +return hashAggMemoryManager; + } + + private class HashAggMemoryManager extends RecordBatchMemoryManager { +private int valuesRowWidth = 0; + +HashAggMemoryManager(int outputBatchSize) { + super(outputBatchSize); +} + +@Override +public void update() { + // Get sizing information for the batch. 
+ setRecordBatchSizer(new RecordBatchSizer(incoming)); + + int fieldId = 0; + int newOutgoingRowWidth = 0; + for (VectorWrapper w : container) { +if (w.getValueVector() instanceof FixedWidthVector) { + newOutgoingRowWidth += ((FixedWidthVector) w.getValueVector()).getValueWidth(); + if (fieldId >= numGroupByExprs) { +valuesRowWidth += ((FixedWidthVector) w.getValueVector()).getValueWidth(); + } +} else { + RecordBatchSizer.ColumnSize columnSize = getRecordBatchSizer().getColumn(columnMapping.get(w.getValueVector().getField().getName())); + newOutgoingRowWidth += columnSize.getAllocSizePerEntry(); + if (fieldId >= numGroupByExprs) { +valuesRowWidth += columnSize.getAllocSizePerEntry(); + } +} +fieldId++; + } + + updateIncomingStats(); + if (logger.isDebugEnabled()) { +logger.debug("BATCH_STATS, incoming: {}", getRecordBatchSizer()); + } + + // We do not want to keep adjusting batch holders target row count Review comment: The code below is correct, however suggestion for elegance: All the code below is using a single local parameter (*newOutgoingRowWidth*) , and all the rest is multiple calls to methods of the **RecordBatchMemoryManager** class. So -- how about moving all the code (from here to the end of update() ) into a new method in the **RecordBatchMemoryManager** class, called possibly `updateMemoryManagerIfNeeded(newOutgoingRowWidth)` This looks simpler and cleaner. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > limit batch size for hash aggregate > --- > > Key: DRILL-6310 > URL: https://issues.apache.org/jira/browse/DRILL-6310 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Flow >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Fix For: 1.14.0 > > > limit batch size for hash aggregate based on memory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
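To make the width bookkeeping in the quoted update() method easier to follow, here is a simplified, self-contained model of the loop: fixed-width columns contribute their value width, variable-width columns contribute a per-entry allocation estimate (what RecordBatchSizer supplies in Drill), and only columns at index >= numGroupByExprs count toward the values-only row width. The ColumnVector types below are toy stand-ins for illustration, not Drill's actual classes.

```java
import java.util.List;

// Toy stand-ins for Drill's FixedWidthVector / RecordBatchSizer column sizing
// (assumed names, illustration only).
interface ColumnVector { int widthPerEntry(); }

class FixedWidthColumn implements ColumnVector {
  private final int width;
  FixedWidthColumn(int width) { this.width = width; }
  public int widthPerEntry() { return width; }
}

class VarWidthColumn implements ColumnVector {
  private final int allocSizePerEntry; // would come from RecordBatchSizer in Drill
  VarWidthColumn(int allocSizePerEntry) { this.allocSizePerEntry = allocSizePerEntry; }
  public int widthPerEntry() { return allocSizePerEntry; }
}

public class RowWidthModel {
  // Mirrors the loop in HashAggMemoryManager.update(): every column's width is
  // summed into the outgoing row width; columns past the group-by columns
  // (fieldId >= numGroupByExprs) also count toward the values-only row width.
  public static int[] computeWidths(List<ColumnVector> columns, int numGroupByExprs) {
    int outgoingRowWidth = 0;
    int valuesRowWidth = 0;
    int fieldId = 0;
    for (ColumnVector c : columns) {
      outgoingRowWidth += c.widthPerEntry();
      if (fieldId >= numGroupByExprs) {
        valuesRowWidth += c.widthPerEntry();
      }
      fieldId++;
    }
    return new int[] { outgoingRowWidth, valuesRowWidth };
  }

  public static void main(String[] args) {
    // One 4-byte group-by key, one 8-byte aggregate, one ~20-byte varchar value.
    List<ColumnVector> cols = List.of(
        new FixedWidthColumn(4), new FixedWidthColumn(8), new VarWidthColumn(20));
    int[] w = computeWidths(cols, 1);
    System.out.println(w[0] + " " + w[1]); // 32 28
  }
}
```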
[jira] [Commented] (DRILL-6310) limit batch size for hash aggregate
[ https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525931#comment-16525931 ] ASF GitHub Bot commented on DRILL-6310: --- Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch size for hash aggregate URL: https://github.com/apache/drill/pull/1324#discussion_r198711357 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java ## @@ -84,6 +97,63 @@ "htRowIdx" /* workspace index */, "incoming" /* read container */, "outgoing" /* write container */, "aggrValuesContainer" /* workspace container */, UPDATE_AGGR_INSIDE, UPDATE_AGGR_OUTSIDE, UPDATE_AGGR_INSIDE); + public int getOutputRowCount() { +return hashAggMemoryManager.getOutputRowCount(); + } + + public RecordBatchMemoryManager getRecordBatchMemoryManager() { +return hashAggMemoryManager; + } + + private class HashAggMemoryManager extends RecordBatchMemoryManager { +private int valuesRowWidth = 0; + +HashAggMemoryManager(int outputBatchSize) { + super(outputBatchSize); +} + +@Override +public void update() { + // Get sizing information for the batch. + setRecordBatchSizer(new RecordBatchSizer(incoming)); + + int fieldId = 0; + int newOutgoingRowWidth = 0; + for (VectorWrapper w : container) { +if (w.getValueVector() instanceof FixedWidthVector) { + newOutgoingRowWidth += ((FixedWidthVector) w.getValueVector()).getValueWidth(); + if (fieldId >= numGroupByExprs) { +valuesRowWidth += ((FixedWidthVector) w.getValueVector()).getValueWidth(); + } +} else { + RecordBatchSizer.ColumnSize columnSize = getRecordBatchSizer().getColumn(columnMapping.get(w.getValueVector().getField().getName())); + newOutgoingRowWidth += columnSize.getAllocSizePerEntry(); Review comment: I rebased and tested, and there was a failure here `columnSize == null` in some of the new EMIT unit test. 
Initially I thought that this had to do with the first outgoing batch (which always has OK_NEW_SCHEMA) being non-empty; but eventually I made a fix (see below) where the 'name' and the 'expr' are different.
[jira] [Commented] (DRILL-6310) limit batch size for hash aggregate
[ https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525933#comment-16525933 ] ASF GitHub Bot commented on DRILL-6310: --- Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch size for hash aggregate URL: https://github.com/apache/drill/pull/1324#discussion_r198711656 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java ## @@ -283,16 +363,23 @@ private HashAggregator createAggregatorInternal() throws SchemaChangeException, continue; } - if ( expr instanceof FunctionHolderExpression ) { - String funcName = ((FunctionHolderExpression) expr).getName(); - if ( funcName.equals("sum") || funcName.equals("max") || funcName.equals("min") ) {extraNonNullColumns++;} - } final MaterializedField outputField = MaterializedField.create(ne.getRef().getAsNamePart().getName(), expr.getMajorType()); - @SuppressWarnings("resource") - ValueVector vv = TypeHelper.getNewVector(outputField, oContext.getAllocator()); + @SuppressWarnings("resource") ValueVector vv = TypeHelper.getNewVector(outputField, oContext.getAllocator()); aggrOutFieldIds[i] = container.add(vv); aggrExprs[i] = new ValueVectorWriteExpression(aggrOutFieldIds[i], expr, true); + + if (expr instanceof FunctionHolderExpression) { +String funcName = ((FunctionHolderExpression) expr).getName(); +if (funcName.equals("sum") || funcName.equals("max") || funcName.equals("min")) { + extraNonNullColumns++; +} +if (((FunctionCall) ne.getExpr()).args.get(0) instanceof SchemaPath) { + columnMapping.put(outputField.getName(), ((SchemaPath) ((FunctionCall) ne.getExpr()).args.get(0)).getAsNamePart().getName()); +} Review comment: When I tested (TestHashAggEmitOutcome) there were cases that did not match **SchemaPath**. 
So I added here:
```
else if (((FunctionCall) ne.getExpr()).args.get(0) instanceof FunctionCall) {
  columnMapping.put(outputField.getName(),
      ((FunctionCall) ((FunctionCall) ne.getExpr()).args.get(0)).getName());
}
```
The execution did get there, but I did not check further (e.g., whether this was used correctly in computing the row size).
[jira] [Commented] (DRILL-6310) limit batch size for hash aggregate
[ https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525929#comment-16525929 ] ASF GitHub Bot commented on DRILL-6310: --- Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch size for hash aggregate URL: https://github.com/apache/drill/pull/1324#discussion_r198702179 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java ## @@ -646,14 +647,18 @@ public PutStatus put(int incomingRowIdx, IndexPointer htIdxHolder, int hashCode) currentIdx = freeIndex++; boolean addedBatch = false; try { // ADD A BATCH - addedBatch = addBatchIfNeeded(currentIdx); + addedBatch = addBatchIfNeeded(currentIdx, targetBatchRowCount); + if (addedBatch) { +// If we just added the batch, update the current index to point to beginning of new batch. +currentIdx = (batchHolders.size() - 1) * BATCH_SIZE; +freeIndex = currentIdx + 1; + } } catch (OutOfMemoryException OOME) { - retryAfterOOM( currentIdx < batchHolders.size() * BATCH_SIZE ); + retryAfterOOM( currentIdx < totalBatchHoldersSize); Review comment: *Just a comment:* The idea of using this "max, but not actual" size of batch (the constant BATCH_SIZE) is on one hand smart (much less code needs to be changed), but also a little awkward, as things don't mean exactly what they are (e.g. totalBatchHolderSize). Maybe in the future this code should be cleaned; e.g., keep a count of the batches and compare to the count, instead of the not-real total. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
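The index arithmetic behind the reviewed change, and behind the reviewer's "max, but not actual" observation, can be sketched as follows. BATCH_SIZE = 65536 is assumed here to match HashTableTemplate's fixed per-batch capacity; the class is purely illustrative.

```java
public class BatchIndexMath {
  // Assumed fixed per-batch capacity (a power of two), as in HashTableTemplate.
  static final int BATCH_SIZE = 65536;

  // Global index of the first slot in batch b under the fixed-size scheme.
  static int firstIndexOfBatch(int b) { return b * BATCH_SIZE; }

  // Which batch a global index lands in, and its offset within that batch.
  static int batchOf(int idx) { return idx / BATCH_SIZE; }
  static int offsetOf(int idx) { return idx % BATCH_SIZE; }

  public static void main(String[] args) {
    // After adding a third batch (batchHolders.size() == 3), the new current
    // index points at the start of that batch, as in the patch:
    int numBatches = 3;
    int currentIdx = (numBatches - 1) * BATCH_SIZE;
    System.out.println(currentIdx);          // 131072
    System.out.println(batchOf(currentIdx)); // 2
    // "totalBatchHoldersSize" = numBatches * BATCH_SIZE is an upper bound, not
    // the sum of the actual per-batch target row counts -- the reviewer's point.
    System.out.println(numBatches * BATCH_SIZE); // 196608
  }
}
```

The alternative the reviewer suggests (tracking a batch count and comparing against the count) would avoid relying on this "not-real total".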
[jira] [Commented] (DRILL-6310) limit batch size for hash aggregate
[ https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525934#comment-16525934 ] ASF GitHub Bot commented on DRILL-6310: --- Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch size for hash aggregate URL: https://github.com/apache/drill/pull/1324#discussion_r198698608 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchMemoryManager.java ## @@ -201,6 +201,10 @@ public static int adjustOutputRowCount(int rowCount) { return (Math.min(MAX_NUM_ROWS, Math.max(Integer.highestOneBit(rowCount) - 1, MIN_NUM_ROWS))); } + public static int computeOutputRowCount(int batchSize, int rowWidth) { +return adjustOutputRowCount(RecordBatchSizer.safeDivide(batchSize, rowWidth)); Review comment: BTW, `safeDivide` uses `Math.ceil()`, so it may return a number one bigger than the actual division. E.g., safeDivide(15, 2) = 8, not 7. But probably the `- 1` in the adjustment above fixes that issue.
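A small sketch of the arithmetic under discussion: adjustOutputRowCount follows the quoted hunk, while the MIN_NUM_ROWS/MAX_NUM_ROWS bounds are assumed values, not necessarily Drill's actual constants.

```java
public class RowCountSketch {
  static final int MIN_NUM_ROWS = 1;      // assumed bound, for illustration
  static final int MAX_NUM_ROWS = 65536;  // assumed bound, for illustration

  // safeDivide rounds up, per the review comment: safeDivide(15, 2) == 8, not 7.
  static int safeDivide(int num, int denom) {
    return (int) Math.ceil((double) num / denom);
  }

  // Rounds down to (highest power of two) - 1, clamped to [MIN, MAX],
  // as in the quoted hunk.
  static int adjustOutputRowCount(int rowCount) {
    return Math.min(MAX_NUM_ROWS, Math.max(Integer.highestOneBit(rowCount) - 1, MIN_NUM_ROWS));
  }

  static int computeOutputRowCount(int batchSize, int rowWidth) {
    return adjustOutputRowCount(safeDivide(batchSize, rowWidth));
  }

  public static void main(String[] args) {
    System.out.println(safeDivide(15, 2));            // 8
    System.out.println(computeOutputRowCount(15, 2)); // 7
    // Note: the rounded-up 8 adjusts to 7, while an exact 7 would adjust to 3,
    // so the ceil can change the result -- the "- 1" does not always cancel it.
    System.out.println(adjustOutputRowCount(7));      // 3
  }
}
```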
[jira] [Commented] (DRILL-6310) limit batch size for hash aggregate
[ https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525932#comment-16525932 ] ASF GitHub Bot commented on DRILL-6310: --- Ben-Zvi commented on a change in pull request #1324: DRILL-6310: limit batch size for hash aggregate URL: https://github.com/apache/drill/pull/1324#discussion_r198712153 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java ## @@ -263,13 +343,13 @@ private HashAggregator createAggregatorInternal() throws SchemaChangeException, // add this group-by vector to the output container groupByOutFieldIds[i] = container.add(vv); + columnMapping.put(outputField.getName(), ne.getRef().getAsNamePart().getName()); Review comment: The regressions *TestHashAggEmitOutcome* set the "name" and the "expr" to different strings, hence the failure above. So I changed this to:
```
columnMapping.put(outputField.getName(), ne.getExpr().toString().replace('`', ' ').trim());
```
To make the regressions (e.g. **testHashAggrMultipleEMITOutcome()**) pass. It is a little ugly (removing the '`' before and after), but I could not think of something nicer.
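For clarity, the backquote-stripping workaround behaves like this; the snippet below is a standalone re-creation of the one-liner for illustration, not Drill code.

```java
public class ColumnKeyCleanup {
  // Mirrors the workaround in the comment: Drill renders an expression such as
  // a SchemaPath as `name`, so the surrounding backquotes are turned into
  // spaces and trimmed off to recover the plain name.
  static String exprKey(String exprString) {
    return exprString.replace('`', ' ').trim();
  }

  public static void main(String[] args) {
    System.out.println(exprKey("`last_name`")); // last_name
    // The "a little ugly" part: an interior backquote becomes an embedded
    // space rather than being removed.
    System.out.println(exprKey("`a`.`b`"));     // a . b
  }
}
```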
[jira] [Created] (DRILL-6549) batch sizing for nested loop join
Padma Penumarthy created DRILL-6549: --- Summary: batch sizing for nested loop join Key: DRILL-6549 URL: https://issues.apache.org/jira/browse/DRILL-6549 Project: Apache Drill Issue Type: Improvement Components: Execution - Relational Operators Affects Versions: 1.13.0 Reporter: Padma Penumarthy Assignee: Padma Penumarthy Fix For: 1.14.0 limit output batch size for nested loop join based on memory.
[jira] [Commented] (DRILL-6519) Add String Distance and Phonetic Functions
[ https://issues.apache.org/jira/browse/DRILL-6519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525881#comment-16525881 ] ASF GitHub Bot commented on DRILL-6519: --- cgivre commented on a change in pull request #1331: DRILL-6519: Add String Distance and Phonetic Functions URL: https://github.com/apache/drill/pull/1331#discussion_r198701570 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/fn/impl/TestStringDistanceFunctions.java ## @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.drill.exec.fn.impl; + +import org.apache.drill.categories.SqlFunctionTest; +import org.apache.drill.categories.UnlikelyTest; +import org.apache.drill.test.ClusterFixture; +import org.apache.drill.test.ClusterFixtureBuilder; +import org.apache.drill.test.ClusterTest; +import org.junit.BeforeClass; +import org.junit.Test; +import org.junit.experimental.categories.Category; + +import static org.junit.Assert.assertEquals; + +@Category({UnlikelyTest.class, SqlFunctionTest.class}) +public class TestStringDistanceFunctions extends ClusterTest { + + @BeforeClass + public static void setup() throws Exception { Review comment: @arina-ielchiieva I added a singletonDouble() function and reworked the unit tests accordingly. > Add String Distance and Phonetic Functions > -- > > Key: DRILL-6519 > URL: https://issues.apache.org/jira/browse/DRILL-6519 > Project: Apache Drill > Issue Type: Improvement >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > > From a recent project, this collection of functions makes it possible to do > fuzzy string matching as well as phonetic matching on strings. > > The following functions are all phonetic functions and map text to a number > or string based on how the word sounds. For instance "Jayme" and "Jaime" > have the same soundex values and hence these functions can be used to match > similar sounding words.
> * caverphone1(string) > * caverphone2(string) > * cologne_phonetic(string) > * dm_soundex(string) > * double_metaphone() > * match_rating_encoder(string) > * metaphone() > * nysiis(string) > * refined_soundex() > * soundex() > Additionally, there is the > {code:java} > sounds_like(string1, string2){code} > function which can be used to find strings that sound similar. For instance: > > {code:java} > SELECT * > FROM <table> > WHERE sounds_like( last_name, 'Gretsky' ) > {code} > h2. String Distance Functions > In addition to the phonetic functions, there are a series of distance > functions which measure the difference between two strings. The functions > include: > * cosine_distance(string1, string2) > * fuzzy_score(string1, string2) > * hamming_distance(string1, string2) > * jaccard_distance(string1, string2) > * jaro_distance(string1, string2) > * levenshtein_distance(string1, string2) > * longest_common_substring_distance(string1, string2)
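For readers unfamiliar with these metrics, levenshtein_distance counts the minimum number of single-character edits (insertions, deletions, substitutions) between two strings. A generic textbook implementation, not Drill's (which delegates to a library), looks like this:

```java
public class EditDistance {
  // Classic two-row dynamic-programming Levenshtein distance.
  static int levenshtein(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j; // distance from empty prefix of a to b[0..j)
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                    prev[j] + 1),     // deletion
                           prev[j - 1] + cost);       // substitution / match
      }
      int[] tmp = prev; prev = curr; curr = tmp;
    }
    return prev[b.length()];
  }

  public static void main(String[] args) {
    System.out.println(levenshtein("kitten", "sitting"));  // 3
    System.out.println(levenshtein("Gretsky", "Gretzky")); // 1
  }
}
```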
[jira] [Commented] (DRILL-6422) Update guava to 23.0 and shade it
[ https://issues.apache.org/jira/browse/DRILL-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525834#comment-16525834 ] ASF GitHub Bot commented on DRILL-6422: --- vrozov commented on a change in pull request #1264: DRILL-6422: Update guava to 23.0 and shade it URL: https://github.com/apache/drill/pull/1264#discussion_r198691250 ## File path: drill-shaded/pom.xml ## @@ -0,0 +1,84 @@ + + +http://maven.apache.org/POM/4.0.0"; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd";> + 4.0.0 + + +org.apache +apache +18 + + + + org.apache.drill + drill-shaded + 1.0 + + drill-shaded + pom + + +guava-shaded + + + + + +org.apache.maven.plugins +maven-compiler-plugin + Review comment: AFAIK class version should not change when a class is relocated. If guava classes have version 52, the same version will be in the shaded library. > Update guava to 23.0 and shade it > - > > Key: DRILL-6422 > URL: https://issues.apache.org/jira/browse/DRILL-6422 > Project: Apache Drill > Issue Type: Task >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.14.0 > > > Some hadoop libraries use old versions of guava and most of them are > incompatible with guava 23.0. > To allow usage of new guava version, it should be shaded and shaded version > should be used in the project.
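The "class version" here is the major_version field in the class-file header: shading/relocation rewrites names in the constant pool but leaves that field untouched. A small illustration of where the field lives; the header bytes are hand-built for the example, not read from a real guava jar.

```java
import java.nio.ByteBuffer;

public class ClassVersionCheck {
  // Every .class file starts with: u4 magic (0xCAFEBABE), u2 minor_version,
  // u2 major_version. Major version 52 corresponds to Java 8.
  static int majorVersion(byte[] classFileHeader) {
    ByteBuffer buf = ByteBuffer.wrap(classFileHeader); // big-endian by default
    int magic = buf.getInt();
    if (magic != 0xCAFEBABE) {
      throw new IllegalArgumentException("not a class file");
    }
    buf.getShort();             // skip minor_version
    return buf.getShort() & 0xFFFF;
  }

  public static void main(String[] args) {
    // Hand-built header of a hypothetical Java 8 class (major version 0x34 = 52).
    byte[] header = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52 };
    System.out.println(majorVersion(header)); // 52
  }
}
```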
[jira] [Commented] (DRILL-6422) Update guava to 23.0 and shade it
[ https://issues.apache.org/jira/browse/DRILL-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525830#comment-16525830 ] ASF GitHub Bot commented on DRILL-6422: --- vrozov commented on a change in pull request #1264: DRILL-6422: Update guava to 23.0 and shade it URL: https://github.com/apache/drill/pull/1264#discussion_r198690513 ## File path: drill-shaded/guava-shaded/pom.xml ## @@ -0,0 +1,147 @@ + + +http://maven.apache.org/POM/4.0.0"; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd";> + 4.0.0 + + +org.apache +apache +18 + + + org.apache.drill + guava-shaded + ${dep.guava.version} + drill-shaded/guava-shaded + + jar + + + +false +23.0 + + + + + com.google.guava + guava + ${dep.guava.version} + jar + + + + + + ${project.build.directory}/shaded-sources + + + Review comment: It is meaningless to bind the `default-compile` or `default-testCompile` execution `id` for `maven-surefire-plugin`, as it does not define such an execution `id`. The intention of binding a default execution `id` to phase `none` is to disable execution of those plugins when desired, instead of relying on other settings (like `skip` or `skipTests`).
[jira] [Commented] (DRILL-6422) Update guava to 23.0 and shade it
[ https://issues.apache.org/jira/browse/DRILL-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525815#comment-16525815 ] ASF GitHub Bot commented on DRILL-6422: --- vrozov commented on a change in pull request #1264: DRILL-6422: Update guava to 23.0 and shade it URL: https://github.com/apache/drill/pull/1264#discussion_r198689121 ## File path: pom.xml ## @@ -3334,5 +3351,6 @@ exec drill-yarn distribution +drill-shaded Review comment: What type of changes do you expect in pom.xml? Why would you prefer to build the same artifact over and over again even though it does not change?
[jira] [Comment Edited] (DRILL-6454) Native MapR DB plugin support for Hive MapR-DB json table
[ https://issues.apache.org/jira/browse/DRILL-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522841#comment-16522841 ] Bridget Bevens edited comment on DRILL-6454 at 6/28/18 12:21 AM: - Hi [~vitalii], Thank you for adding the doc notes to the description. I'll add the following description to the MapR Drill release notes: A new option, store.hive.maprdb_json.optimize_scan_with_native_reader, enables Drill to use the native Drill reader to read Hive external tables that were created from MapR-DB JSON tables. When you enable this option, Drill performs faster reads of the data and applies filter pushdown optimizations. I'll add the following info to the MapR MapR-DB format plugin doc page: Starting in Drill 1.14 (MEP 6.0), Drill can use the native Drill reader to read Hive external tables that were created from MapR-DB JSON tables. When using the native reader, Drill performs faster reads of the data and can apply filter pushdown optimizations. Use the SET command with the store.hive.maprdb_json.optimize_scan_with_native_reader option to enable this functionality: SET `store.hive.maprdb_json.optimize_scan_with_native_reader` = true; Regarding the store.hive.parquet.optimize_scan_with_native_reader: false option, I've updated the following pages with the new option and included a note on the options page about the old option being deprecated in Drill 1.15: https://drill.apache.org/docs/querying-hive/ https://drill.apache.org/docs/configuration-options-introduction/ I'm going to set doc status to doc-complete, but please let me know if you see any issues with the doc updates. Thanks, Bridget was (Author: bbevens): Hi [~vitalii], Thank you for adding the doc notes to the description. Can you please review the following description for the store.hive.maprdb_json.optimize_scan_with_native_reader option? 
Description: When you enable the store.hive.maprdb_json.optimize_scan_with_native_reader option, Drill can use the native Drill reader to read [Hive external tables that were created from MapR-DB JSON tables|https://maprdocs.mapr.com/home/Hive/ConnectingToMapR-DB.html]. The native Drill reader enables Drill to perform faster reads of data and apply filter pushdown optimizations. Thanks, Bridget > Native MapR DB plugin support for Hive MapR-DB json table > - > > Key: DRILL-6454 > URL: https://issues.apache.org/jira/browse/DRILL-6454 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Hive, Storage - MapRDB >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Labels: doc-complete, ready-to-commit > Fix For: 1.14.0 > > > Hive can create and query MapR-DB tables via maprdb-json-handler: > [https://maprdocs.mapr.com/home/Hive/ConnectingToMapR-DB.html] > The aim of this Jira is to implement a Drill native reader for Hive MapR-DB tables > (similar to parquet). > Design proposal is: > - to use JsonTableGroupScan instead of HiveScan; > - to add storage planning rule to convert HiveScan to MapRDBGroupScan; > - to add system/session option to enable using of this native reader; > - native reader can be used only for Drill build with mapr profile (there is > no reason to leverage it for default profile); > > *For documentation:* > two new options were added: > store.hive.parquet.optimize_scan_with_native_reader: false, > store.hive.maprdb_json.optimize_scan_with_native_reader: false, > store.hive.parquet.optimize_scan_with_native_reader is the new option used > instead of store.hive.optimize_scan_with_native_readers. The latter is > deprecated and will be removed in 1.15. > (https://issues.apache.org/jira/browse/DRILL-6527).
[jira] [Updated] (DRILL-6454) Native MapR DB plugin support for Hive MapR-DB json table
[ https://issues.apache.org/jira/browse/DRILL-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bridget Bevens updated DRILL-6454: -- Labels: doc-complete ready-to-commit (was: doc-impacting ready-to-commit)
[jira] [Commented] (DRILL-6527) Update option name for Drill Parquet native reader
[ https://issues.apache.org/jira/browse/DRILL-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525749#comment-16525749 ] Bridget Bevens commented on DRILL-6527: --- I've updated the name where I could find it in the docs, on the following pages: https://drill.apache.org/docs/querying-hive/ https://drill.apache.org/docs/configuration-options-introduction/ Thanks, Bridget > Update option name for Drill Parquet native reader > -- > > Key: DRILL-6527 > URL: https://issues.apache.org/jira/browse/DRILL-6527 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive, Storage - Parquet >Affects Versions: 1.14.0 >Reporter: Vitalii Diravka >Priority: Minor > Fix For: 1.15.0 > > > The old option name to enable the Drill parquet reader is > "store.hive.optimize_scan_with_native_readers". > Starting from DRILL-6454 one new native reader is introduced, therefore a more > precise option name is added for the parquet native reader too. > The new option name for the parquet reader is > "store.hive.parquet.optimize_scan_with_native_reader". > The old one is deprecated and should be removed starting from the Drill 1.15.0 > release.
[jira] [Commented] (DRILL-6494) Drill Plugins Handler
[ https://issues.apache.org/jira/browse/DRILL-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525731#comment-16525731 ] ASF GitHub Bot commented on DRILL-6494: --- vdiravka commented on issue #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345#issuecomment-400864708 @sohami Please review This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Drill Plugins Handler > - > > Key: DRILL-6494 > URL: https://issues.apache.org/jira/browse/DRILL-6494 > Project: Apache Drill > Issue Type: New Feature > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > The new service of updating Drill's plugins configs could be implemented. > Please find details from design overview document: > https://docs.google.com/document/d/14JKb2TA8dGnOIE5YT2RImkJ7R0IAYSGjJg8xItL5yMI/edit?usp=sharing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6494) Drill Plugins Handler
[ https://issues.apache.org/jira/browse/DRILL-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525730#comment-16525730 ] ASF GitHub Bot commented on DRILL-6494: --- vdiravka opened a new pull request #1345: DRILL-6494: Drill Plugins Handler URL: https://github.com/apache/drill/pull/1345 - StoragePluginsHandler is added, with StoragePluginsUpdater as its implementation. It is used in the init() stage of StoragePluginRegistryImpl and updates storage plugin configs from the storage-plugins.conf file. If plugin configs are present in the persistence store, they are updated; otherwise bootstrap plugins are updated and the resulting configs are loaded into the persistence store. If the enabled status is absent in the storage-plugins.conf file, the last plugin config's enabled status persists. - The "NULL" issue with updating the Hive plugin config via REST is solved. But clients are still being instantiated for disabled plugins - DRILL-6412. - The "org.honton.chas.hocon:jackson-dataformat-hocon" library is added for properly deserializing the HOCON conf file - additional refactoring: "com.typesafe:config" and "org.apache.commons:commons-lang3" are placed into the DependencyManagement block with proper versions; correct properties in the DrillMetrics class are specified This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Drill Plugins Handler > - > > Key: DRILL-6494 > URL: https://issues.apache.org/jira/browse/DRILL-6494 > Project: Apache Drill > Issue Type: New Feature > Components: Tools, Build & Test >Affects Versions: 1.13.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.14.0 > > > A new service for updating Drill's plugin configs could be implemented.
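The update semantics described in the PR text — configs already in the persistence store are updated, bootstrap plugins serve as the base otherwise, and the last enabled status persists when storage-plugins.conf omits it — can be sketched as a small merge function. This is only an illustrative model: the class, field names, and the default-to-disabled choice for brand-new plugins with no status are assumptions, not the actual StoragePluginsUpdater API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the storage-plugins.conf update pass described above.
public class PluginsUpdateSketch {
  // Minimal stand-in for a storage plugin config: a connection string plus enabled flag.
  static final class PluginConfig {
    final String connection;
    final Boolean enabled; // null means "not specified in storage-plugins.conf"
    PluginConfig(String connection, Boolean enabled) {
      this.connection = connection;
      this.enabled = enabled;
    }
  }

  /**
   * Applies configs from storage-plugins.conf on top of what is already in the
   * persistence store. If the store is empty, bootstrap plugins are the base.
   * When the conf file omits the enabled status, the previous status persists.
   */
  static Map<String, PluginConfig> update(Map<String, PluginConfig> store,
                                          Map<String, PluginConfig> bootstrap,
                                          Map<String, PluginConfig> confFile) {
    Map<String, PluginConfig> base =
        new HashMap<>(store.isEmpty() ? bootstrap : store);
    for (Map.Entry<String, PluginConfig> e : confFile.entrySet()) {
      PluginConfig incoming = e.getValue();
      PluginConfig previous = base.get(e.getKey());
      // Keep the last enabled status if the conf file does not specify one
      // (assumption: a new plugin with no status anywhere defaults to disabled).
      boolean enabled = incoming.enabled != null ? incoming.enabled
          : (previous != null && Boolean.TRUE.equals(previous.enabled));
      base.put(e.getKey(), new PluginConfig(incoming.connection, enabled));
    }
    return base; // the result is what gets loaded back into the persistence store
  }
}
```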
> Please find details in the design overview document: > https://docs.google.com/document/d/14JKb2TA8dGnOIE5YT2RImkJ7R0IAYSGjJg8xItL5yMI/edit?usp=sharing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6498) Support for EMIT outcome in ExternalSortBatch
[ https://issues.apache.org/jira/browse/DRILL-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker updated DRILL-6498: - Labels: ready-to-commit (was: ) > Support for EMIT outcome in ExternalSortBatch > - > > Key: DRILL-6498 > URL: https://issues.apache.org/jira/browse/DRILL-6498 > Project: Apache Drill > Issue Type: Task > Components: Execution - Relational Operators >Reporter: Sorabh Hamirwasia >Assignee: Sorabh Hamirwasia >Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > > > With Lateral and Unnest, if Sort is present in the sub-query, then it needs to > handle the EMIT outcome correctly. This means that when an EMIT is received, Sort > performs the sort operation on the records buffered so far and produces output > with them. After EMIT, Sort should refresh its state and again work on the next > batches of incoming records until an EMIT is seen again. > For the first cut, Sort will not support spilling in the subquery between Lateral > and Unnest, since spilling is very unlikely. The worst case that can happen is > that Lateral will get a batch with only 1 row of data because the repeated-type > column's data size is too big. In that case Unnest will produce only 1 > output batch, and Sort or other blocking operators anyway need enough > memory to hold at least 1 incoming batch. So in ideal cases spilling should > not happen. But if there is an operator between Sort and Unnest which > increases the data size, then Sort might be in a situation to spill, but that's > not a common case for now. > > *Description of Changes:* > Currently the sort operator is implemented in the way below. This provides a > general high-level view of how Sort works and how EMIT support was implemented. > 1) In the buildSchema phase, SORT creates an empty container with SV NONE and > sends that downstream. > 2) Post buildSchema phase, it goes into a LOAD phase and keeps calling next() on > upstream until it sees NONE or there is a failure.
> 3) For each batch it receives, it applies an SV2 if the batch doesn't already > have one, sorts it, and then buffers the batch after converting it into > something called BatchGroup.InputBatch. > 4) During buffering it watches for memory pressure and spills as needed. > 5) Once all the batches are received and it gets NONE from upstream, it > starts a merge phase. > 6) In the merge phase it checks whether the merge can happen in memory or spilling is > needed, and performs the merge accordingly. > 7) Sort has a concept of SortResults, which represents the different kinds of > output containers that Sort can generate based on input batches and memory > conditions. For example, if it's an in-memory merge then the output container of > Sort is an SV4 container with SortResults of type MergeSortWrapper. If it's spill > and merge, then the container is of SV_NONE type with SortResults as BatchMerger. > There are SortResults types for empty and single batches (not used anywhere). > 8) SortResults basically provides an abstraction such that it provides an output > result with each next() call, backed by the output container of the > ExternalSortRecordBatch along with the correct recordCount and SV2/SV4 as needed. > So for example: in the case of MergeSortWrapper all the inputs are in memory, and > hence all output is also in memory, backed by SV4. For each next() call, > the SV4 is updated with the start index and length, which informs the caller > about the record boundary it should consume. For BatchMerger, based on memory > pressure and the number of records Sort can output with each output container, it > fills the output container with that many records and sends it downstream. > 9) Also, the abstraction of SortResults is such that at the beginning of the > merge phase the output container held by SortResults is cleared off and > later re-initialized after the merge is completed. > Now, in the current code, since SORT is a blocking operator, it was clearing > the output container ValueVectors post buildSchema phase and in the load phase.
> And later it creates the final output container (with ValueVector objects) after it has seen all the incoming data. The very first output batch is > always returned with OK_NEW_SCHEMA so that the downstream operator can set up > the correct SV mode and schema with the first output batch, since the schema > returned in the buildSchema phase was a dummy one. So the vector references > maintained by the downstream operator in the buildSchema phase are updated with the vector > references in the first output batch. > With EMIT, however, SORT will go into the load phase multiple times, and hence we > cannot clear off the output container of Sort after each EMIT boundary. If we > do that, then the ExternalSort output container ValueVector references held by the downstream operator will become invalid
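The EMIT behavior described above — buffer and sort rows until an EMIT boundary, produce one output batch, then reset and continue with the next batches — reduces to a small driver loop. This is a simplified sketch, not the actual ExternalSortBatch code: the Outcome enum and Batch class here are hypothetical stand-ins for Drill's IterOutcome and record batches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified model of a sort operator that honors EMIT boundaries.
public class EmitSortSketch {
  enum Outcome { OK, EMIT, NONE } // stand-ins for Drill's IterOutcome values

  // A "batch" here is just a list of int rows plus the outcome that delivered it.
  static final class Batch {
    final List<Integer> rows;
    final Outcome outcome;
    Batch(List<Integer> rows, Outcome outcome) { this.rows = rows; this.outcome = outcome; }
  }

  /**
   * Buffers incoming rows; on EMIT, sorts what is buffered so far, emits it as
   * one output batch, and resets state; on NONE, emits the final sorted batch.
   */
  static List<List<Integer>> run(List<Batch> incoming) {
    List<List<Integer>> output = new ArrayList<>();
    List<Integer> buffered = new ArrayList<>();
    for (Batch b : incoming) {
      buffered.addAll(b.rows);
      if (b.outcome == Outcome.EMIT || b.outcome == Outcome.NONE) {
        Collections.sort(buffered);           // sort only rows inside this boundary
        output.add(new ArrayList<>(buffered));
        buffered.clear();                     // refresh state for the next boundary
      }
    }
    return output;
  }
}
```

Note how the reset after each EMIT touches only the row buffer, mirroring the constraint above that the output container itself must not be cleared between boundaries.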
[jira] [Assigned] (DRILL-6529) Project Batch Sizing causes two LargeFileCompilation tests to timeout
[ https://issues.apache.org/jira/browse/DRILL-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker reassigned DRILL-6529: Assignee: Pritesh Maker (was: Karthikeyan Manivannan) > Project Batch Sizing causes two LargeFileCompilation tests to timeout > - > > Key: DRILL-6529 > URL: https://issues.apache.org/jira/browse/DRILL-6529 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Reporter: Karthikeyan Manivannan >Assignee: Pritesh Maker >Priority: Major > Fix For: 1.14.0 > > > Timeout failures are seen in TestLargeFileCompilation testExternal_Sort and > testTop_N_Sort. These tests are stress tests for compilation where the > queries cover projections over 5000 columns and sort over 500 columns. These > tests pass if they are run stand-alone. Something triggers the timeouts when > the tests are run in parallel as part of a unit test run. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-6529) Project Batch Sizing causes two LargeFileCompilation tests to timeout
[ https://issues.apache.org/jira/browse/DRILL-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker reassigned DRILL-6529: Assignee: Karthikeyan Manivannan (was: Pritesh Maker) > Project Batch Sizing causes two LargeFileCompilation tests to timeout > - > > Key: DRILL-6529 > URL: https://issues.apache.org/jira/browse/DRILL-6529 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Reporter: Karthikeyan Manivannan >Assignee: Karthikeyan Manivannan >Priority: Major > Fix For: 1.14.0 > > > Timeout failures are seen in TestLargeFileCompilation testExternal_Sort and > testTop_N_Sort. These tests are stress tests for compilation where the > queries cover projections over 5000 columns and sort over 500 columns. These > tests pass if they are run stand-alone. Something triggers the timeouts when > the tests are run in parallel as part of a unit test run. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6145) Enable usage of Hive MapR-DB JSON handler
[ https://issues.apache.org/jira/browse/DRILL-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bridget Bevens updated DRILL-6145: -- Labels: doc-complete (was: doc-impacting) > Enable usage of Hive MapR-DB JSON handler > - > > Key: DRILL-6145 > URL: https://issues.apache.org/jira/browse/DRILL-6145 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive, Storage - MapRDB >Affects Versions: 1.12.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Labels: doc-complete > Fix For: 1.14.0 > > > Similar to "hive-hbase-storage-handler" to support querying MapR-DB Hive's > external tables it is necessary to add "hive-maprdb-json-handler". > Use case: > # Create a table MapR-DB JSON table: > {code} > _> mapr dbshell_ > _maprdb root:> create /tmp/table/json_ (make sure /tmp/table exists) > {code} > -- insert data > {code} > insert /tmp/table/json --value '\{"_id":"movie002" , "title":"Developers > on the Edge", "studio":"Command Line Studios"}' > insert /tmp/table/json --id movie003 --value '\{"title":"The Golden > Master", "studio":"All-Nighter"}' > {code} > # Create a Hive external table: > {code} > hive> CREATE EXTERNAL TABLE mapr_db_json_hive_tbl ( > > movie_id string, title string, studio string) > > STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' > > TBLPROPERTIES("maprdb.table.name" = > "/tmp/table/json","maprdb.column.id" = "movie_id"); > {code} > > # Use hive schema to query this table via Drill: > {code} > 0: jdbc:drill:> select * from hive.mapr_db_json_hive_tbl; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6145) Enable usage of Hive MapR-DB JSON handler
[ https://issues.apache.org/jira/browse/DRILL-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525716#comment-16525716 ] Bridget Bevens commented on DRILL-6145: --- Setting doc status to doc-complete, but you can let me know if you see any issues and I can make the necessary changes. Thanks, Bridget > Enable usage of Hive MapR-DB JSON handler > - > > Key: DRILL-6145 > URL: https://issues.apache.org/jira/browse/DRILL-6145 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Hive, Storage - MapRDB >Affects Versions: 1.12.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Major > Labels: doc-complete > Fix For: 1.14.0 > > > Similar to "hive-hbase-storage-handler" to support querying MapR-DB Hive's > external tables it is necessary to add "hive-maprdb-json-handler". > Use case: > # Create a table MapR-DB JSON table: > {code} > _> mapr dbshell_ > _maprdb root:> create /tmp/table/json_ (make sure /tmp/table exists) > {code} > -- insert data > {code} > insert /tmp/table/json --value '\{"_id":"movie002" , "title":"Developers > on the Edge", "studio":"Command Line Studios"}' > insert /tmp/table/json --id movie003 --value '\{"title":"The Golden > Master", "studio":"All-Nighter"}' > {code} > # Create a Hive external table: > {code} > hive> CREATE EXTERNAL TABLE mapr_db_json_hive_tbl ( > > movie_id string, title string, studio string) > > STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' > > TBLPROPERTIES("maprdb.table.name" = > "/tmp/table/json","maprdb.column.id" = "movie_id"); > {code} > > # Use hive schema to query this table via Drill: > {code} > 0: jdbc:drill:> select * from hive.mapr_db_json_hive_tbl; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bridget Bevens updated DRILL-6094: -- Labels: doc-complete (was: doc-impacting) > Decimal data type enhancements > -- > > Key: DRILL-6094 > URL: https://issues.apache.org/jira/browse/DRILL-6094 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.12.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Labels: doc-complete > Fix For: 1.14.0 > > > Currently, Decimal types are disabled by default since existing Decimal > implementation has a lot of flaws and performance problems. The goal of this > Jira to describe majority of them and possible ways of improving existing > implementation to be able to enable Decimal data types by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6094) Decimal data type enhancements
[ https://issues.apache.org/jira/browse/DRILL-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525711#comment-16525711 ] Bridget Bevens commented on DRILL-6094: --- I edited the following pages based on the work done for DECIMAL data type and input from Volodymyr. (Thank you, [~vvysotskyi]!) https://drill.apache.org/docs/aggregate-and-aggregate-statistical/ https://drill.apache.org/docs/aggregate-window-functions/ https://drill.apache.org/docs/data-type-conversion/#data-type-conversion-examples https://drill.apache.org/docs/supported-data-types/ https://drill.apache.org/docs/parquet-format/ https://drill.apache.org/docs/math-and-trig/ I also created a new section called “Decimal Data Type” on this page to cover some of the changes: https://drill.apache.org/docs/supported-data-types/#decimal-data-type I'm setting doc status to doc-complete, but please let me know if I missed anything or need to make any changes. Thanks, Bridget > Decimal data type enhancements > -- > > Key: DRILL-6094 > URL: https://issues.apache.org/jira/browse/DRILL-6094 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.12.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Labels: doc-impacting > Fix For: 1.14.0 > > > Currently, Decimal types are disabled by default since existing Decimal > implementation has a lot of flaws and performance problems. The goal of this > Jira to describe majority of them and possible ways of improving existing > implementation to be able to enable Decimal data types by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6548) IllegalStateException: Unexpected EMIT outcome received in buildSchema phase
Khurram Faraaz created DRILL-6548: - Summary: IllegalStateException: Unexpected EMIT outcome received in buildSchema phase Key: DRILL-6548 URL: https://issues.apache.org/jira/browse/DRILL-6548 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.14.0 Reporter: Khurram Faraaz Assignee: Sorabh Hamirwasia On a four node Apache Drill 1.14.0 master branch against TPC-DS SF1 parquet data (parquet views) git.commit.id.abbrev=b92f599 TPC-DS query 69 fails with IllegalStateException: Unexpected EMIT outcome received in buildSchema phase Failing query is, {noformat} 2018-06-27 15:24:39,493 [24cbf157-e95c-42ab-7307-f75f5943a277:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id 24cbf157-e95c-42ab-7307-f75f5943a277: SELECT cd_gender, cd_marital_status, cd_education_status, Count(*) cnt1, cd_purchase_estimate, Count(*) cnt2, cd_credit_rating, FROM customer c, customer_address ca, customer_demographics WHERE c.c_current_addr_sk = ca.ca_address_sk AND ca_state IN ( 'KS', 'AZ', 'NE' ) AND cd_demo_sk = c.c_current_cdemo_sk AND EXISTS (SELECT * FROM store_sales, date_dim WHERE c.c_customer_sk = ss_customer_sk AND ss_sold_date_sk = d_date_sk AND d_year = 2004 AND d_moy BETWEEN 3 AND 3 + 2) AND ( NOT EXISTS (SELECT * FROM web_sales, date_dim WHERE c.c_customer_sk = ws_bill_customer_sk AND ws_sold_date_sk = d_date_sk AND d_year = 2004 AND d_moy BETWEEN 3 AND 3 + 2) AND NOT EXISTS (SELECT * FROM catalog_sales, date_dim WHERE c.c_customer_sk = cs_ship_customer_sk AND cs_sold_date_sk = d_date_sk AND d_year = 2004 AND d_moy BETWEEN 3 AND 3 + 2) ) GROUP BY cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate, cd_credit_rating ORDER BY cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate, cd_credit_rating cd_credit_rating LIMIT 100 {noformat} Stack trace from drillbit.log {noformat} 2018-06-27 15:24:42,130 [24cbf157-e95c-42ab-7307-f75f5943a277:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - 
SYSTEM ERROR: IllegalStateException: Unexpected EMIT outcome received in buildSchema phase Fragment 0:0 [Error Id: ba1a35e0-807e-4bab-b820-8aa6aad80e87 on qa102-45.qa.lab:31010] org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalStateException: Unexpected EMIT outcome received in buildSchema phase Fragment 0:0 [Error Id: ba1a35e0-807e-4bab-b820-8aa6aad80e87 on qa102-45.qa.lab:31010] at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_161] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_161] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161] Caused by: java.lang.IllegalStateException: Unexpected EMIT outcome received in buildSchema phase at org.apache.drill.exec.physical.impl.TopN.TopNBatch.buildSchema(TopNBatch.java:178) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) 
~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.ap
[jira] [Updated] (DRILL-6531) Errors in example for "Aggregate Function Interface" Boaz Ben-Zvi Fri 6/15, 5:54 PM Bridget Bevens
[ https://issues.apache.org/jira/browse/DRILL-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bridget Bevens updated DRILL-6531: -- Labels: doc-complete (was: ) > Errors in example for "Aggregate Function Interface" Boaz Ben-Zvi Fri 6/15, > 5:54 PM Bridget Bevens > -- > > Key: DRILL-6531 > URL: https://issues.apache.org/jira/browse/DRILL-6531 > Project: Apache Drill > Issue Type: Task > Components: Documentation >Reporter: Bridget Bevens >Assignee: Bridget Bevens >Priority: Minor > Labels: doc-complete > Fix For: 1.14.0 > > > Hi Bridget, > >There seems to be an error in the example shown in > https://drill.apache.org/docs/custom-function-interfaces/ > Custom Function Interfaces - Apache Drill > drill.apache.org > Implement the Drill interface appropriate for the type of function that you > want to develop. Each interface provides a set of required holders where you > input data types that your function uses and required methods that Drill > calls to perform your function’s operations. > The error is logical, not relating to the main topic (Aggregate Function > Interface), but may slightly confuse anyone carefully reading this doc (like > me ☺) > The error is – the red line should come before the brown line: > @Override > public void add() { > if (in.value < min.value) { > min.value = in.value; > secondMin.value = min.value; > } > That is - Should be: > > @Override > public void add() { > if (in.value < min.value) { > secondMin.value = min.value; > min.value = in.value; > } > This comes from interpreting the name of the new function (“The second most > minimum”). > While on the subject – looks like the reset() function is also wrong (need to > reset to high numbers, not zero): > > @Override > public void reset() { > min.value = 0; → 9 > secondMin.value = 0; → 9 > } > Thanks, > > Boaz > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
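The fix Boaz describes — update secondMin before overwriting min, and reset both to a high value rather than zero — can be checked with a small stand-alone version of the add()/reset() pair. Drill's real aggregate UDFs use @Workspace holder objects, which are omitted here; plain fields suffice to demonstrate the ordering. The extra else-if branch is an addition beyond the issue's snippet, handling values that fall between min and secondMin.

```java
// Stand-alone model of the "second most minimum" aggregate discussed above.
public class SecondMinSketch {
  double min;
  double secondMin;

  public void reset() {
    // Reset to a high value, not zero, so the first incoming values replace them.
    min = Double.MAX_VALUE;
    secondMin = Double.MAX_VALUE;
  }

  public void add(double in) {
    if (in < min) {
      secondMin = min;  // preserve the old minimum first...
      min = in;         // ...then install the new one
    } else if (in < secondMin) {
      secondMin = in;   // value falls between min and secondMin (not in the issue's snippet)
    }
  }
}
```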
[jira] [Closed] (DRILL-6531) Errors in example for "Aggregate Function Interface" Boaz Ben-Zvi Fri 6/15, 5:54 PM Bridget Bevens
[ https://issues.apache.org/jira/browse/DRILL-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bridget Bevens closed DRILL-6531. - doc complete, closing issue > Errors in example for "Aggregate Function Interface" Boaz Ben-Zvi Fri 6/15, > 5:54 PM Bridget Bevens > -- > > Key: DRILL-6531 > URL: https://issues.apache.org/jira/browse/DRILL-6531 > Project: Apache Drill > Issue Type: Task > Components: Documentation >Reporter: Bridget Bevens >Assignee: Bridget Bevens >Priority: Minor > Labels: doc-complete > Fix For: 1.14.0 > > > Hi Bridget, > >There seems to be an error in the example shown in > https://drill.apache.org/docs/custom-function-interfaces/ > Custom Function Interfaces - Apache Drill > drill.apache.org > Implement the Drill interface appropriate for the type of function that you > want to develop. Each interface provides a set of required holders where you > input data types that your function uses and required methods that Drill > calls to perform your function’s operations. > The error is logical, not relating to the main topic (Aggregate Function > Interface), but may slightly confuse anyone carefully reading this doc (like > me ☺) > The error is – the red line should come before the brown line: > @Override > public void add() { > if (in.value < min.value) { > min.value = in.value; > secondMin.value = min.value; > } > That is - Should be: > > @Override > public void add() { > if (in.value < min.value) { > secondMin.value = min.value; > min.value = in.value; > } > This comes from interpreting the name of the new function (“The second most > minimum”). > While on the subject – looks like the reset() function is also wrong (need to > reset to high numbers, not zero): > > @Override > public void reset() { > min.value = 0; → 9 > secondMin.value = 0; → 9 > } > Thanks, > > Boaz > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (DRILL-6531) Errors in example for "Aggregate Function Interface" Boaz Ben-Zvi Fri 6/15, 5:54 PM Bridget Bevens
[ https://issues.apache.org/jira/browse/DRILL-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bridget Bevens resolved DRILL-6531. --- Resolution: Fixed updated doc with the suggested changes. thanks, Bridget > Errors in example for "Aggregate Function Interface" Boaz Ben-Zvi Fri 6/15, > 5:54 PM Bridget Bevens > -- > > Key: DRILL-6531 > URL: https://issues.apache.org/jira/browse/DRILL-6531 > Project: Apache Drill > Issue Type: Task > Components: Documentation >Reporter: Bridget Bevens >Assignee: Bridget Bevens >Priority: Minor > Labels: doc-complete > Fix For: 1.14.0 > > > Hi Bridget, > >There seems to be an error in the example shown in > https://drill.apache.org/docs/custom-function-interfaces/ > Custom Function Interfaces - Apache Drill > drill.apache.org > Implement the Drill interface appropriate for the type of function that you > want to develop. Each interface provides a set of required holders where you > input data types that your function uses and required methods that Drill > calls to perform your function’s operations. > The error is logical, not relating to the main topic (Aggregate Function > Interface), but may slightly confuse anyone carefully reading this doc (like > me ☺) > The error is – the red line should come before the brown line: > @Override > public void add() { > if (in.value < min.value) { > min.value = in.value; > secondMin.value = min.value; > } > That is - Should be: > > @Override > public void add() { > if (in.value < min.value) { > secondMin.value = min.value; > min.value = in.value; > } > This comes from interpreting the name of the new function (“The second most > minimum”). > While on the subject – looks like the reset() function is also wrong (need to > reset to high numbers, not zero): > > @Override > public void reset() { > min.value = 0; → 9 > secondMin.value = 0; → 9 > } > Thanks, > > Boaz > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (DRILL-6393) Radians should take an argument (x)
[ https://issues.apache.org/jira/browse/DRILL-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bridget Bevens closed DRILL-6393. - Doc-complete, closing issue. > Radians should take an argument (x) > --- > > Key: DRILL-6393 > URL: https://issues.apache.org/jira/browse/DRILL-6393 > Project: Apache Drill > Issue Type: Bug > Components: Documentation >Affects Versions: 1.13.0 >Reporter: Robert Hou >Assignee: Bridget Bevens >Priority: Major > Labels: doc-complete > Fix For: 1.14.0 > > > The radians function is missing an argument on this webpage: >https://drill.apache.org/docs/math-and-trig/ > The table has this information: > {noformat} > RADIANS FLOAT8 Converts x degress to radians. > {noformat} > It should be: > {noformat} > RADIANS(x)FLOAT8 Converts x degrees to radians. > {noformat} > Also, degress is mis-spelled. It should be degrees. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6393) Radians should take an argument (x)
[ https://issues.apache.org/jira/browse/DRILL-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bridget Bevens updated DRILL-6393: -- Labels: doc-complete (was: doc-impacting) > Radians should take an argument (x) > --- > > Key: DRILL-6393 > URL: https://issues.apache.org/jira/browse/DRILL-6393 > Project: Apache Drill > Issue Type: Bug > Components: Documentation >Affects Versions: 1.13.0 >Reporter: Robert Hou >Assignee: Bridget Bevens >Priority: Major > Labels: doc-complete > Fix For: 1.14.0 > > > The radians function is missing an argument on this webpage: >https://drill.apache.org/docs/math-and-trig/ > The table has this information: > {noformat} > RADIANS FLOAT8 Converts x degress to radians. > {noformat} > It should be: > {noformat} > RADIANS(x)FLOAT8 Converts x degrees to radians. > {noformat} > Also, degress is mis-spelled. It should be degrees. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (DRILL-6393) Radians should take an argument (x)
[ https://issues.apache.org/jira/browse/DRILL-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bridget Bevens resolved DRILL-6393. --- Resolution: Fixed Made the changes and updates are published. Thanks, Bridget > Radians should take an argument (x) > --- > > Key: DRILL-6393 > URL: https://issues.apache.org/jira/browse/DRILL-6393 > Project: Apache Drill > Issue Type: Bug > Components: Documentation >Affects Versions: 1.13.0 >Reporter: Robert Hou >Assignee: Bridget Bevens >Priority: Major > Labels: doc-complete > Fix For: 1.14.0 > > > The radians function is missing an argument on this webpage: >https://drill.apache.org/docs/math-and-trig/ > The table has this information: > {noformat} > RADIANS FLOAT8 Converts x degress to radians. > {noformat} > It should be: > {noformat} > RADIANS(x)FLOAT8 Converts x degrees to radians. > {noformat} > Also, degress is mis-spelled. It should be degrees. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6547) IllegalStateException: Tried to remove unmanaged buffer.
Robert Hou created DRILL-6547: - Summary: IllegalStateException: Tried to remove unmanaged buffer. Key: DRILL-6547 URL: https://issues.apache.org/jira/browse/DRILL-6547 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.14.0 Reporter: Robert Hou Assignee: Pritesh Maker This is the query: select * from ( select Index, concat(BinaryValue, 'aaa') NewVarcharValue from (select * from dfs.`/drill/testdata/batch_memory/alltypes_large_1MB.parquet`)) d where d.Index = 1; This is the plan: {noformat} | 00-00Screen 00-01 Project(Index=[$0], NewVarcharValue=[$1]) 00-02SelectionVectorRemover 00-03 Filter(condition=[=($0, 1)]) 00-04Project(Index=[$0], NewVarcharValue=[CONCAT($1, 'aaa')]) 00-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/batch_memory/alltypes_large_1MB.parquet]], selectionRoot=maprfs:/drill/testdata/batch_memory/alltypes_large_1MB.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`Index`, `BinaryValue`]]]) {noformat} Here is the stack trace from drillbit.log: {noformat} 2018-06-27 13:55:03,291 [24cc0659-30b7-b290-7fae-ecb1c1f15c05:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: Tried to remove unmanaged buffer. Fragment 0:0 [Error Id: bc1f2f72-c31b-4b9a-964f-96dec9e0f388 on qa-node186.qa.lab:31010] org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalStateException: Tried to remove unmanaged buffer. 
Fragment 0:0
[Error Id: bc1f2f72-c31b-4b9a-964f-96dec9e0f388 on qa-node186.qa.lab:31010]
        at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_161]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_161]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
Caused by: java.lang.IllegalStateException: Tried to remove unmanaged buffer.
        at org.apache.drill.exec.ops.BufferManagerImpl.replace(BufferManagerImpl.java:50) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at io.netty.buffer.DrillBuf.reallocIfNeeded(DrillBuf.java:97) ~[drill-memory-base-1.14.0-SNAPSHOT.jar:4.0.48.Final]
        at org.apache.drill.exec.test.generated.ProjectorGen4046.doEval(ProjectorTemplate.java:77) ~[na:na]
        at org.apache.drill.exec.test.generated.ProjectorGen4046.projectRecords(ProjectorTemplate.java:67) ~[na:na]
        at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:236) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:117) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:147) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.14.0-SNAP
[jira] [Reopened] (DRILL-6529) Project Batch Sizing causes two LargeFileCompilation tests to timeout
[ https://issues.apache.org/jira/browse/DRILL-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Gruno reopened DRILL-6529: - oops, wrong ticket! sorry!! > Project Batch Sizing causes two LargeFileCompilation tests to timeout > - > > Key: DRILL-6529 > URL: https://issues.apache.org/jira/browse/DRILL-6529 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Reporter: Karthikeyan Manivannan >Assignee: Karthikeyan Manivannan >Priority: Major > Fix For: 1.14.0 > > > Timeout failures are seen in TestLargeFileCompilation testExternal_Sort and > testTop_N_Sort. These tests are stress tests for compilation where the > queries cover projections over 5000 columns and sort over 500 columns. These > tests pass if they are run stand-alone. Something triggers the timeouts when > the tests are run in parallel as part of a unit test run. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6529) Project Batch Sizing causes two LargeFileCompilation tests to timeout
[ https://issues.apache.org/jira/browse/DRILL-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525620#comment-16525620 ]

ASF GitHub Bot commented on DRILL-6529:
---

bitblender commented on issue #1335: DRILL-6529: Project Batch Sizing causes two LargeFileCompilation tests to timeout
URL: https://github.com/apache/drill/pull/1335#issuecomment-400823880

@vvysotskyi @ilooner NUM_PROJECT_COLUMNS controls 3 other tests besides testTopN and testExternalSort: testPARQUET_WRITER(), testTEXT_Writer, and TestProject. How about we set NUM_PROJECT_COLUMNS=2500 and then introduce a new constant NUM_PROJECT_TEST_COLUMNS=1 for testProject? Basically, reduce the stress on SORTers and WRITERs but bump up the column count on testProject to a number which will push the code generated for Project over the constant pool limit. testProject takes about 130s on my Mac with 1 columns.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (DRILL-6461) Add Basic Data Correctness Unit Tests
[ https://issues.apache.org/jira/browse/DRILL-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Farkas updated DRILL-6461: -- Reviewer: salim achouche > Add Basic Data Correctness Unit Tests > - > > Key: DRILL-6461 > URL: https://issues.apache.org/jira/browse/DRILL-6461 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > There are no data correctness unit tests for HashAgg. We need to add some. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6512) Remove unnecessary processing overhead from RecordBatchSizer
[ https://issues.apache.org/jira/browse/DRILL-6512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Padma Penumarthy updated DRILL-6512: Labels: ready-to-commit (was: ) Reviewer: Karthikeyan Manivannan > Remove unnecessary processing overhead from RecordBatchSizer > > > Key: DRILL-6512 > URL: https://issues.apache.org/jira/browse/DRILL-6512 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > > > record batch sizer collects lot of information about the record batch. Since > it is used now in every operator, for every batch, it makes sense to make it > as efficient as possible. Remove anything that is not needed and also, may be > provide two options one light weight and another which is more comprehensive. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6537) Limit the batch size for buffering operators based on how much memory they get
[ https://issues.apache.org/jira/browse/DRILL-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525348#comment-16525348 ] Pritesh Maker commented on DRILL-6537: -- [https://github.com/apache/drill/pull/1342] ([~ppenumarthy]) > Limit the batch size for buffering operators based on how much memory they get > -- > > Key: DRILL-6537 > URL: https://issues.apache.org/jira/browse/DRILL-6537 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Fix For: 1.14.0 > > > we are using 16MB for output batch size for all operators by default. > however, for buffering operators, depending upon how much memory they get > allocated, it is possible that 16MB might be too much to use for output batch > size, causing them to spill sometimes. > Have output batch size to be minimum of 16 MB or 20% of the allocated memory. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
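The sizing rule proposed above (output batch size = minimum of the 16 MB default and 20% of the operator's allocated memory) can be sketched as follows; the helper name and the integer-division treatment of 20% are illustrative assumptions, not Drill's actual implementation:

```python
DEFAULT_OUTPUT_BATCH_SIZE = 16 * 1024 * 1024  # 16 MB default for all operators

def buffering_output_batch_size(allocated_memory_bytes):
    # Hypothetical helper: cap the output batch at 20% of the memory granted
    # to the buffering operator, so a small allocation does not force spills.
    return min(DEFAULT_OUTPUT_BATCH_SIZE, allocated_memory_bytes // 5)

# A 1 GB allocation keeps the 16 MB default; a 40 MB allocation is capped at 8 MB.
print(buffering_output_batch_size(1 << 30))           # 16777216
print(buffering_output_batch_size(40 * 1024 * 1024))  # 8388608
```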
[jira] [Commented] (DRILL-6410) Memory leak in Parquet Reader during cancellation
[ https://issues.apache.org/jira/browse/DRILL-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525259#comment-16525259 ] Pritesh Maker commented on DRILL-6410: -- [~vrozov] is this the PR - [https://github.com/apache/drill/pull/1333] ? > Memory leak in Parquet Reader during cancellation > - > > Key: DRILL-6410 > URL: https://issues.apache.org/jira/browse/DRILL-6410 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Reporter: salim achouche >Assignee: Vlad Rozov >Priority: Major > Fix For: 1.14.0 > > > Occasionally, a memory leak is observed within the flat Parquet reader when > query cancellation is invoked. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6539) Record count not set for this vector container error
[ https://issues.apache.org/jira/browse/DRILL-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker updated DRILL-6539: - Reviewer: Karthikeyan Manivannan (was: Timothy Farkas) > Record count not set for this vector container error > - > > Key: DRILL-6539 > URL: https://issues.apache.org/jira/browse/DRILL-6539 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.13.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy >Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > > > This error is randomly seen when executing queries. > [Error Id: 6a2a49e5-28d9-4587-ab8b-5262c07f8fdc on drill196:31010] > (java.lang.IllegalStateException) Record count not set for this vector > container > com.google.common.base.Preconditions.checkState():173 > org.apache.drill.exec.record.VectorContainer.getRecordCount():394 > org.apache.drill.exec.record.RecordBatchSizer.():681 > org.apache.drill.exec.record.RecordBatchSizer.():665 > > org.apache.drill.exec.physical.impl.common.HashTableTemplate$BatchHolder.getActualSize():441 > > org.apache.drill.exec.physical.impl.common.HashTableTemplate.getActualSize():882 > > org.apache.drill.exec.physical.impl.common.HashTableTemplate.makeDebugString():891 > > org.apache.drill.exec.physical.impl.common.HashPartition.makeDebugString():578 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.makeDebugString():937 > > org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase():754 > org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():335 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137 > 
org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.test.generated.HashAggregatorGen89497.doWork():617 > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.test.generated.HashAggregatorGen89497.doWork():617 > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():176 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch():403 > > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():354 > > org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():299 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():137 > org.apache.drill.exec.record.AbstractRecordBatch.next():172 > 
org.apache.drill.exec.physical.impl.BaseRootExec.next():103 > > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83 > org.apache.drill.exec.physical.impl.BaseRootExec.next():93 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():422 > org.apache.hadoop.security.UserGroupInformation.doAs():1595 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():281 > org.apache.drill.common.SelfC
[jira] [Updated] (DRILL-6539) Record count not set for this vector container error
[ https://issues.apache.org/jira/browse/DRILL-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pritesh Maker updated DRILL-6539:
-
Labels: ready-to-commit (was: )

> Record count not set for this vector container error
> -
>
> Key: DRILL-6539
> URL: https://issues.apache.org/jira/browse/DRILL-6539
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 1.13.0
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.14.0
[jira] [Commented] (DRILL-6539) Record count not set for this vector container error
[ https://issues.apache.org/jira/browse/DRILL-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525256#comment-16525256 ]

Pritesh Maker commented on DRILL-6539:
--
[~ppenumarthy] I don't see the PR with this JIRA

> Record count not set for this vector container error
> -
>
> Key: DRILL-6539
> URL: https://issues.apache.org/jira/browse/DRILL-6539
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 1.13.0
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.14.0
[jira] [Updated] (DRILL-4020) The not-equal operator returns incorrect results when used on the HBase row key
[ https://issues.apache.org/jira/browse/DRILL-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker updated DRILL-4020: - Reviewer: Parth Chandra > The not-equal operator returns incorrect results when used on the HBase row > key > --- > > Key: DRILL-4020 > URL: https://issues.apache.org/jira/browse/DRILL-4020 > Project: Apache Drill > Issue Type: Bug > Components: Storage - HBase >Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0 > Environment: Drill Sandbox >Reporter: Akihiko Kusanagi >Assignee: Akihiko Kusanagi >Priority: Critical > Labels: ready-to-commit > Fix For: 1.14.0 > > > Create a test HBase table: > {noformat} > hbase> create 'table', 'f' > hbase> put 'table', 'row1', 'f:c', 'value1' > hbase> put 'table', 'row2', 'f:c', 'value2' > hbase> put 'table', 'row3', 'f:c', 'value3' > {noformat} > The table looks like this: > {noformat} > 0: jdbc:drill:zk=maprdemo:5181> SELECT CONVERT_FROM(row_key, 'UTF8') FROM > hbase.`table`; > +-+ > | EXPR$0 | > +-+ > | row1| > | row2| > | row3| > +-+ > 1 row selected (4.596 seconds) > {noformat} > However, this query returns incorrect results when a not-equal operator is > used on the row key: > {noformat} > 0: jdbc:drill:zk=maprdemo:5181> SELECT CONVERT_FROM(row_key, 'UTF8') FROM > hbase.`table` WHERE row_key <> 'row1'; > +-+ > | EXPR$0 | > +-+ > | row1| > | row2| > | row3| > +-+ > 1 row selected (0.573 seconds) > {noformat} > In the query plan, there is no RowFilter: > {noformat} > 00-00Screen > 00-01 Project(EXPR$0=[CONVERT_FROMUTF8($0)]) > 00-02Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec > [tableName=table, startRow=, stopRow=, filter=null], columns=[`row_key`]]]) > {noformat} > When the query has multiple not-equal operators, it works fine: > {noformat} > 0: jdbc:drill:zk=maprdemo:5181> SELECT CONVERT_FROM(row_key, 'UTF8') FROM > hbase.`table` WHERE row_key <> 'row1' AND row_key <> 'row2'; > +-+ > | EXPR$0 | > +-+ > | row3| > +-+ > 1 row selected (0.255 seconds) > {noformat} > 
In the query plan, a FilterList has two RowFilters with NOT_EQUAL operators: > {noformat} > 00-00Screen > 00-01 Project(EXPR$0=[CONVERT_FROMUTF8($0)]) > 00-02Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec > [tableName=table, startRow=, stopRow=, filter=FilterList AND (2/2): > [RowFilter (NOT_EQUAL, row1), RowFilter (NOT_EQUAL, row2)]], > columns=[`row_key`]]]) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
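The expected semantics of the missing RowFilter can be illustrated with a small sketch (plain Python over an in-memory list of row keys, standing in for HBase's server-side RowFilter with a NOT_EQUAL comparator; the function name is hypothetical):

```python
def apply_not_equal_filters(row_keys, excluded_keys):
    # Each excluded key corresponds to one RowFilter(NOT_EQUAL, key);
    # combining several behaves like a FilterList with AND semantics.
    return [k for k in row_keys if all(k != e for e in excluded_keys)]

rows = ["row1", "row2", "row3"]
print(apply_not_equal_filters(rows, ["row1"]))          # ['row2', 'row3'] - what the single <> query should return
print(apply_not_equal_filters(rows, ["row1", "row2"]))  # ['row3'] - matches the two-filter query above
```

Because the plan for the single `<>` query carries `filter=null`, the scan returns all three rows instead of the filtered result shown here.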
[jira] [Updated] (DRILL-4020) The not-equal operator returns incorrect results when used on the HBase row key
[ https://issues.apache.org/jira/browse/DRILL-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pritesh Maker updated DRILL-4020:
-
Fix Version/s: 1.14.0

> The not-equal operator returns incorrect results when used on the HBase row
> key
> ---
>
> Key: DRILL-4020
> URL: https://issues.apache.org/jira/browse/DRILL-4020
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - HBase
> Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
> Environment: Drill Sandbox
> Reporter: Akihiko Kusanagi
> Assignee: Akihiko Kusanagi
> Priority: Critical
> Labels: ready-to-commit
> Fix For: 1.14.0
[jira] [Updated] (DRILL-4020) The not-equal operator returns incorrect results when used on the HBase row key
[ https://issues.apache.org/jira/browse/DRILL-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pritesh Maker updated DRILL-4020:
-
Labels: ready-to-commit (was: )

> The not-equal operator returns incorrect results when used on the HBase row
> key
> ---
>
> Key: DRILL-4020
> URL: https://issues.apache.org/jira/browse/DRILL-4020
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - HBase
> Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
> Environment: Drill Sandbox
> Reporter: Akihiko Kusanagi
> Assignee: Akihiko Kusanagi
> Priority: Critical
> Labels: ready-to-commit
> Fix For: 1.14.0
[jira] [Updated] (DRILL-6537) Limit the batch size for buffering operators based on how much memory they get
[ https://issues.apache.org/jira/browse/DRILL-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pritesh Maker updated DRILL-6537:
-
Reviewer: Boaz Ben-Zvi

> Limit the batch size for buffering operators based on how much memory they get
> --
>
> Key: DRILL-6537
> URL: https://issues.apache.org/jira/browse/DRILL-6537
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 1.13.0
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Priority: Major
> Fix For: 1.14.0
[jira] [Updated] (DRILL-6546) Allow unnest function with nested columns and complex expressions
[ https://issues.apache.org/jira/browse/DRILL-6546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pritesh Maker updated DRILL-6546: - Fix Version/s: 1.14.0 > Allow unnest function with nested columns and complex expressions > - > > Key: DRILL-6546 > URL: https://issues.apache.org/jira/browse/DRILL-6546 > Project: Apache Drill > Issue Type: Bug > Reporter: Volodymyr Vysotskyi > Assignee: Volodymyr Vysotskyi > Priority: Major > Fix For: 1.14.0 > > > Currently, queries with unnest over nested columns or complex expressions > fail: > {code:sql} > select u.item from cp.`lateraljoin/nested-customer.parquet` c, > unnest(c.orders.items) as u(item) > {code} > fails with error: > {noformat} > VALIDATION ERROR: From line 2, column 10 to line 2, column 21: Column > 'orders.items' not found in table 'c' > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6546) Allow unnest function with nested columns and complex expressions
Volodymyr Vysotskyi created DRILL-6546: -- Summary: Allow unnest function with nested columns and complex expressions Key: DRILL-6546 URL: https://issues.apache.org/jira/browse/DRILL-6546 Project: Apache Drill Issue Type: Bug Reporter: Volodymyr Vysotskyi Assignee: Volodymyr Vysotskyi Currently, queries with unnest over nested columns or complex expressions fail: {code:sql} select u.item from cp.`lateraljoin/nested-customer.parquet` c, unnest(c.orders.items) as u(item) {code} fails with error: {noformat} VALIDATION ERROR: From line 2, column 10 to line 2, column 21: Column 'orders.items' not found in table 'c' {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
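The intended behavior of `unnest(c.orders.items)` can be modeled in a few lines. This is an illustrative Python sketch of the expected semantics only (resolve a dotted nested-column path per row, then emit one output row per array element); `resolve` and `unnest` are hypothetical names, not Drill internals.

```python
# Model of unnest over a nested column path, as the query in DRILL-6546 expects.
def resolve(row, path):
    """Walk a dotted path like 'orders.items' through nested records."""
    for part in path.split("."):
        row = row[part]
    return row

def unnest(rows, path):
    """Yield one output value per element of the resolved array, per input row."""
    for row in rows:
        for item in resolve(row, path):
            yield item

customers = [{"orders": {"items": ["apple", "pear"]}},
             {"orders": {"items": ["milk"]}}]
print(list(unnest(customers, "orders.items")))  # ['apple', 'pear', 'milk']
```

The validation error shows the resolver step failing: `orders.items` is treated as a single top-level column name instead of being walked as a path.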
[jira] [Updated] (DRILL-6545) Projection Push down into Lateral Join operator.
[ https://issues.apache.org/jira/browse/DRILL-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanumath Rao Maduri updated DRILL-6545: --- Affects Version/s: (was: 1.13.0) > Projection Push down into Lateral Join operator. > > > Key: DRILL-6545 > URL: https://issues.apache.org/jira/browse/DRILL-6545 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization > Reporter: Hanumath Rao Maduri > Assignee: Hanumath Rao Maduri > Priority: Major > Fix For: 1.14.0 > > > For the Lateral's logical and physical plan node, we would need to add an > output RowType such that a Projection can be pushed down to Lateral. > Currently, Lateral will produce all columns from left and right and it > depends on a subsequent Project to eliminate unneeded columns. However, this > will blow up the memory use of Lateral since each column from the left will > be replicated N times based on N rows coming from UNNEST. We can have a > ProjectLateralPushdownRule that pushes only the plain columns onto LATERAL > but keeps the expression evaluations as part of the Project above the > Lateral. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6545) Projection Push down into Lateral Join operator.
[ https://issues.apache.org/jira/browse/DRILL-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanumath Rao Maduri updated DRILL-6545: --- Issue Type: Improvement (was: Bug) > Projection Push down into Lateral Join operator. > > > Key: DRILL-6545 > URL: https://issues.apache.org/jira/browse/DRILL-6545 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization > Reporter: Hanumath Rao Maduri > Assignee: Hanumath Rao Maduri > Priority: Major > Fix For: 1.14.0 > > > For the Lateral's logical and physical plan node, we would need to add an > output RowType such that a Projection can be pushed down to Lateral. > Currently, Lateral will produce all columns from left and right and it > depends on a subsequent Project to eliminate unneeded columns. However, this > will blow up the memory use of Lateral since each column from the left will > be replicated N times based on N rows coming from UNNEST. We can have a > ProjectLateralPushdownRule that pushes only the plain columns onto LATERAL > but keeps the expression evaluations as part of the Project above the > Lateral. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6545) Projection Push down into Lateral Join operator.
Hanumath Rao Maduri created DRILL-6545: -- Summary: Projection Push down into Lateral Join operator. Key: DRILL-6545 URL: https://issues.apache.org/jira/browse/DRILL-6545 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 1.13.0 Reporter: Hanumath Rao Maduri Assignee: Hanumath Rao Maduri Fix For: 1.14.0 For the Lateral's logical and physical plan node, we would need to add an output RowType such that a Projection can be pushed down to Lateral. Currently, Lateral will produce all columns from left and right and it depends on a subsequent Project to eliminate unneeded columns. However, this will blow up the memory use of Lateral since each column from the left will be replicated N times based on N rows coming from UNNEST. We can have a ProjectLateralPushdownRule that pushes only the plain columns onto LATERAL but keeps the expression evaluations as part of the Project above the Lateral. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
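The memory blow-up described above is just multiplication. A back-of-envelope sketch with hypothetical numbers (the row widths and row counts below are made up for illustration, not measurements from Drill):

```python
# Why pushing the Project below LATERAL matters: every left-side byte is
# replicated once per row that UNNEST produces for that left row.
def lateral_output_bytes(left_row_bytes, unnest_rows_per_left_row, left_rows):
    return left_row_bytes * unnest_rows_per_left_row * left_rows

# Carrying all left columns (assumed 1000 bytes/row) vs. only the needed
# plain columns (assumed 40 bytes/row), 50 unnested rows per left row:
full_row = lateral_output_bytes(1000, 50, 10_000)
projected = lateral_output_bytes(40, 50, 10_000)
print(full_row // projected)  # 25x more memory without the pushdown
```

Keeping expression evaluations in the Project above LATERAL, as the rule proposes, means only cheap plain columns get replicated.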
[jira] [Updated] (DRILL-6534) Upgrade ZooKeeper patch version to 3.4.12 and add Apache Curator to dependencyManagement
[ https://issues.apache.org/jira/browse/DRILL-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bohdan Kazydub updated DRILL-6534: -- Summary: Upgrade ZooKeeper patch version to 3.4.12 and add Apache Curator to dependencyManagement (was: Upgrade ZooKeeper patch version to 3.4.12 and add Apache curator to dependencyManagement) > Upgrade ZooKeeper patch version to 3.4.12 and add Apache Curator to > dependencyManagement > > > Key: DRILL-6534 > URL: https://issues.apache.org/jira/browse/DRILL-6534 > Project: Apache Drill > Issue Type: Task >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Minor > Fix For: 1.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6534) Upgrade ZooKeeper patch version to 3.4.12 and add Apache curator to dependencyManagement
[ https://issues.apache.org/jira/browse/DRILL-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bohdan Kazydub updated DRILL-6534: -- Summary: Upgrade ZooKeeper patch version to 3.4.12 and add Apache curator to dependencyManagement (was: Upgrade ZooKeeper patch version to 3.4.12) > Upgrade ZooKeeper patch version to 3.4.12 and add Apache curator to > dependencyManagement > > > Key: DRILL-6534 > URL: https://issues.apache.org/jira/browse/DRILL-6534 > Project: Apache Drill > Issue Type: Task >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Minor > Fix For: 1.14.0 > > > Also moved apache curator dependencies to dependencyManagement -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6534) Upgrade ZooKeeper patch version to 3.4.12 and add Apache curator to dependencyManagement
[ https://issues.apache.org/jira/browse/DRILL-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bohdan Kazydub updated DRILL-6534: -- Description: (was: Also moved apache curator dependencies to dependencyManagement) > Upgrade ZooKeeper patch version to 3.4.12 and add Apache curator to > dependencyManagement > > > Key: DRILL-6534 > URL: https://issues.apache.org/jira/browse/DRILL-6534 > Project: Apache Drill > Issue Type: Task >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Minor > Fix For: 1.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6534) Upgrade ZooKeeper patch version to 3.4.12
[ https://issues.apache.org/jira/browse/DRILL-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bohdan Kazydub updated DRILL-6534: -- Description: Also moved apache curator dependencies to dependencyManagement > Upgrade ZooKeeper patch version to 3.4.12 > - > > Key: DRILL-6534 > URL: https://issues.apache.org/jira/browse/DRILL-6534 > Project: Apache Drill > Issue Type: Task >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Minor > Fix For: 1.14.0 > > > Also moved apache curator dependencies to dependencyManagement -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-6534) Upgrade ZooKeeper patch version to 3.4.12
[ https://issues.apache.org/jira/browse/DRILL-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bohdan Kazydub updated DRILL-6534: -- Summary: Upgrade ZooKeeper patch version to 3.4.12 (was: Upgrade ZooKeeper patch version to 3.4.11) > Upgrade ZooKeeper patch version to 3.4.12 > - > > Key: DRILL-6534 > URL: https://issues.apache.org/jira/browse/DRILL-6534 > Project: Apache Drill > Issue Type: Task >Reporter: Bohdan Kazydub >Assignee: Bohdan Kazydub >Priority: Minor > Fix For: 1.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (DRILL-6540) Upgrade to HADOOP-3.1 libraries
[ https://issues.apache.org/jira/browse/DRILL-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bohdan Kazydub updated DRILL-6540: -- Comment: was deleted (was: Note: I've tried to set 4.0.1 version explicitly for curator libraries, everything seemed to work, but there is a need to increase maxsize of drill-jdbc-all module by 3 MB as curator-client.jar of version 4.0.1 has size of 2.7 MB compared to current (2.7.1) version's size of 70 kB.) > Upgrade to HADOOP-3.1 libraries > > > Key: DRILL-6540 > URL: https://issues.apache.org/jira/browse/DRILL-6540 > Project: Apache Drill > Issue Type: Improvement > Reporter: Vitalii Diravka > Priority: Major > > Currently Drill uses version 2.7.1 of the hadoop libraries (hadoop-common, > hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, > hadoop-yarn-client). > Half a year ago [Hadoop 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was released, and > recently it was updated to [Hadoop 3.1|https://hadoop.apache.org/docs/r3.1.0/]. > To use Drill under a Hadoop 3.0 distribution we need this upgrade. The > newer version also includes new features, which can be useful for Drill. > This upgrade is also needed to leverage the newest version of the ZooKeeper > libraries. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6540) Upgrade to HADOOP-3.1 libraries
[ https://issues.apache.org/jira/browse/DRILL-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524934#comment-16524934 ] Bohdan Kazydub commented on DRILL-6540: --- Note: I've tried to set version 4.0.1 explicitly for the curator libraries and everything seemed to work, but the maxsize of the drill-jdbc-all module needs to be increased by 3 MB, as curator-client.jar version 4.0.1 has a size of 2.7 MB compared to the current (2.7.1) version's size of 70 kB. > Upgrade to HADOOP-3.1 libraries > > > Key: DRILL-6540 > URL: https://issues.apache.org/jira/browse/DRILL-6540 > Project: Apache Drill > Issue Type: Improvement > Reporter: Vitalii Diravka > Priority: Major > > Currently Drill uses version 2.7.1 of the hadoop libraries (hadoop-common, > hadoop-hdfs, hadoop-annotations, hadoop-aws, hadoop-yarn-api, hadoop-client, > hadoop-yarn-client). > Half a year ago [Hadoop 3.0|https://hadoop.apache.org/docs/r3.0.0/index.html] was released, and > recently it was updated to [Hadoop 3.1|https://hadoop.apache.org/docs/r3.1.0/]. > To use Drill under a Hadoop 3.0 distribution we need this upgrade. The > newer version also includes new features, which can be useful for Drill. > This upgrade is also needed to leverage the newest version of the ZooKeeper > libraries. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (DRILL-6544) Timestamp value in Drill UI showed inconsistently with the same value retrieved from sqlline
[ https://issues.apache.org/jira/browse/DRILL-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Gozhiy reassigned DRILL-6544: --- Assignee: Anton Gozhiy > Timestamp value in Drill UI showed inconsistently with the same value > retrieved from sqlline > --- > > Key: DRILL-6544 > URL: https://issues.apache.org/jira/browse/DRILL-6544 > Project: Apache Drill > Issue Type: Bug > Affects Versions: 1.14.0 > Reporter: Anton Gozhiy > Assignee: Anton Gozhiy > Priority: Minor > > *Query:* > {code:sql} > select timestamp '2008-2-23 12:23:34' from (values(1)); > {code} > *Expected result (from sqlline):* > 2008-02-23 12:23:34.0 > *Actual result (from Drill UI):* > 2008-02-23T12:23:34 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-6544) Timestamp value in Drill UI showed inconsistently with the same value retrieved from sqlline
Anton Gozhiy created DRILL-6544: --- Summary: Timestamp value in Drill UI showed inconsistently with the same value retrieved from sqlline Key: DRILL-6544 URL: https://issues.apache.org/jira/browse/DRILL-6544 Project: Apache Drill Issue Type: Bug Affects Versions: 1.14.0 Reporter: Anton Gozhiy *Query:* {code:sql} select timestamp '2008-2-23 12:23:34' from (values(1)); {code} *Expected result (from sqlline):* 2008-02-23 12:23:34.0 *Actual result (from Drill UI):* 2008-02-23T12:23:34 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
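The two outputs in the report encode the same instant; only the rendering differs. A small Python sketch of the two formats, based solely on the values quoted in the issue (the attribution of each style to sqlline vs. the Web UI follows the report, not Drill's source):

```python
# The same timestamp, rendered JDBC-style (space separator, trailing ".0")
# and ISO-8601-style ("T" separator) as in DRILL-6544.
from datetime import datetime

ts = datetime(2008, 2, 23, 12, 23, 34)
jdbc_style = ts.strftime("%Y-%m-%d %H:%M:%S") + ".0"  # what sqlline showed
iso_style = ts.isoformat()                            # what the Drill UI showed
print(jdbc_style)  # 2008-02-23 12:23:34.0
print(iso_style)   # 2008-02-23T12:23:34
```

So the fix is a display-formatting change, not a correction of the underlying value.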