[jira] [Updated] (DRILL-5382) Error: Missing function implementation: [isnotnull(MAP-REQUIRED)]

2017-03-29 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-5382:
----------------------------------
Attachment: drill_3562_b.json

> Error: Missing function implementation: [isnotnull(MAP-REQUIRED)]
> -----------------------------------------------------------------
>
> Key: DRILL-5382
> URL: https://issues.apache.org/jira/browse/DRILL-5382
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.11.0
>Reporter: Khurram Faraaz
> Attachments: drill_3562_b.json
>
>
> Projecting a map from JSON data and filtering for non-null values results in a 
> SchemaChangeException - Error: Missing function implementation: 
> [isnotnull(MAP-REQUIRED)].
> The data used in the test is available here - wget 
> http://data.githubarchive.org/2015-01-01-15.json.gz
> Drill 1.11.0 commit id: adbf363
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select org from `2015-01-01-15.json` where org is not null;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: [isnotnull(MAP-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: 3a776c68-6476-4bd8-a9e6-928bfc2ef5bd on centos-01.qa.lab:31010] (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp>
> {noformat}
> The same select without the IS NOT NULL filter returns correct results.
> {noformat}
> apache drill 1.11.0-SNAPSHOT
> "just drill it"
> 0: jdbc:drill:schema=dfs.tmp> select org from `2015-01-01-15.json`;
> +------+
> | org  |
> +------+
> | {} |
> | {} |
> | {} |
> | {"id":9285252,"login":"visionmedia","gravatar_id":"","url":"https://api.github.com/orgs/visionmedia","avatar_url":"https://avatars.githubusercontent.com/u/9285252?"} |
> | {} |
> | {} |
> | {} |
> ...
> ...
> | {} |
> | {} |
> | {} |
> | {"id":9216151,"login":"osp","gravatar_id":"","url":"https://api.github.com/orgs/osp","avatar_url":"https://avatars.githubusercontent.com/u/9216151?"} |
> | {} |
> | {"id":296074,"login":"zendframework","gravatar_id":"","url":"https://api.github.com/orgs/zendframework","avatar_url":"https://avatars.githubusercontent.com/u/296074?"} |
> | {} |
> | {} |
> | {} |
> | {} |
> | {"id":9216151,"login":"osp","gravatar_id":"","url":"https://api.github.com/orgs/osp","avatar_url":"https://avatars.githubusercontent.com/u/9216151?"} |
> | {} |
> | {"id":5822862,"login":"SuaveIO","gravatar_id":"","url":"https://api.github.com/orgs/SuaveIO","avatar_url":"https://avatars.githubusercontent.com/u/5822862?"} |
> | {"id":5822862,"login":"SuaveIO","gravatar_id":"","url":"https://api.github.com/orgs/SuaveIO","avatar_url":"https://avatars.githubusercontent.com/u/5822862?"} |
> | {} |
> | {"id":5822862,"login":"SuaveIO","gravatar_id":"","url":"https://api.github.com/orgs/SuaveIO","avatar_url":"https://avatars.githubusercontent.com/u/5822862?"} |
> | {"id":2918581,"login":"twbs","gravatar_id":"","url":"https://api.github.com/orgs/twbs","avatar_url":"https://avatars.githubusercontent.com/u/2918581?"} |
> | {"id":1104713,"login":"s9y","gravatar_id":"","url":"https://api.github.com/orgs/s9y","avatar_url":"https://avatars.githubusercontent.com/u/1104713?"} |
> | {} |
> | {} |
> | {} |
> | {} |
> +------+
> 11,351 rows selected (0.865 seconds)
> {noformat} 
> Stack trace from drillbit.log
> {noformat}
> Caused by: org.apache.drill.exec.exception.SchemaChangeException: Failure 
> while trying to materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: 
> [isnotnull(MAP-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--..
> at org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer(FilterRecordBatch.java:186) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema(FilterRecordBatch.java:111) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) 

[jira] [Commented] (DRILL-5382) Error: Missing function implementation: [isnotnull(MAP-REQUIRED)]

2017-03-29 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948425#comment-15948425
 ] 

Khurram Faraaz commented on DRILL-5382:
---------------------------------------

Another failing case.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from (select FLATTEN(t.a.b.c) AS c from `drill_3562_b.json` t) flat WHERE flat.c IS NOT NULL limit 1;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema.  Errors:

Error in expression at index -1.  Error: Missing function implementation: [isnotnull(MAP-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--..

Fragment 0:0

[Error Id: 4f5a3769-4b6c-432b-8beb-a893e6e59949 on centos-01.qa.lab:31010] (state=,code=0)
0: jdbc:drill:schema=dfs.tmp> select * from (select FLATTEN(t.a.b.c) AS c from `drill_3562_b.json` t) flat WHERE flat.c IS NULL limit 1;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema.  Errors:

Error in expression at index -1.  Error: Missing function implementation: [isnull(MAP-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--..

Fragment 0:0

[Error Id: 5d9c5285-1131-4275-8ebf-be342fcd8f7e on centos-01.qa.lab:31010] (state=,code=0)
{noformat}
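
As an aside, a possible workaround (assuming the maps of interest carry a scalar 
key such as {{id}}, as in the githubarchive dataset above) may be to filter on 
that scalar rather than on the map itself:

{noformat}
0: jdbc:drill:schema=dfs.tmp> select t.org from `2015-01-01-15.json` t where t.org.id is not null;
{noformat}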


[jira] [Commented] (DRILL-5323) Provide test tools to create, populate and compare row sets

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948235#comment-15948235
 ] 

ASF GitHub Bot commented on DRILL-5323:
---------------------------------------

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/785#discussion_r108768096
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetUtilities.java ---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.test.rowSet;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.selection.SelectionVector2;
+import org.apache.drill.exec.vector.accessor.AccessorUtilities;
+import org.apache.drill.exec.vector.accessor.ColumnAccessor.ValueType;
+import org.apache.drill.exec.vector.accessor.ColumnWriter;
+import org.apache.drill.test.rowSet.RowSet.RowSetWriter;
+import org.joda.time.Duration;
+import org.joda.time.Period;
+
+public class RowSetUtilities {
+
+  private RowSetUtilities() { }
+
+  public static void reverse(SelectionVector2 sv2) {
+    int count = sv2.getCount();
+    for (int i = 0; i < count / 2; i++) {
+      char temp = sv2.getIndex(i);
+      int dest = count - 1 - i;
+      sv2.setIndex(i, sv2.getIndex(dest));
+      sv2.setIndex(dest, temp);
+    }
+  }
+
+  /**
+   * Set a test data value from an int. Uses the type information of the
+   * column to handle interval types. Else, uses the value type of the
+   * accessor. The value set here is purely for testing; the mapping
+   * from ints to intervals has no real meaning.
+   *
+   * @param rowWriter
+   * @param index
+   * @param value
+   */
+
+  public static void setFromInt(RowSetWriter rowWriter, int index, int value) {
+    ColumnWriter writer = rowWriter.column(index);
+    if (writer.valueType() == ValueType.PERIOD) {
+      setPeriodFromInt(writer, rowWriter.schema().column(index).getType().getMinorType(), value);
+    } else {
+      AccessorUtilities.setFromInt(writer, value);
+    }
+  }
+
+  public static void setPeriodFromInt(ColumnWriter writer, MinorType minorType,
+      int value) {
+    switch (minorType) {
+    case INTERVAL:
+      writer.setPeriod(Duration.millis(value).toPeriod());
+      break;
+    case INTERVALYEAR:
+      writer.setPeriod(Period.years(value / 12).withMonths(value % 12));
+      break;
+    case INTERVALDAY:
+      int sec = value % 60;
+      value = value / 60;
+      int min = value % 60;
+      value = value / 60;
+      writer.setPeriod(Period.days(value).withMinutes(min).withSeconds(sec));
--- End diff --

Looks like it's missing the calculation for hours and then days.
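
For reference, a sketch of that case with the hours step added (a hedged 
illustration only, using the same Joda-Time API as the diff above):

{code}
case INTERVALDAY:
  int sec = value % 60;
  value = value / 60;
  int min = value % 60;
  value = value / 60;
  int hours = value % 24;  // previously the hours were silently folded into days
  value = value / 24;      // value now holds whole days
  writer.setPeriod(Period.days(value).withHours(hours).withMinutes(min).withSeconds(sec));
  break;
{code}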


> Provide test tools to create, populate and compare row sets
> -----------------------------------------------------------
>
> Key: DRILL-5323
> URL: https://issues.apache.org/jira/browse/DRILL-5323
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> Operators work with individual row sets. A row set is a collection of records 
> stored as column vectors. (Drill uses various terms for this concept. A 
> record batch is a row set with an operator implementation wrapped around it. 
> A vector container is a row set, but with much functionality left as an 
> exercise for the developer. And so on.)
> To simplify tests, we need a {{TestRowSet}} concept that wraps a 
> {{VectorContainer}} and provides easy ways to:
> * Define a schema for the row set.
> * Create a set of vectors that implement the schema.
> * Populate the row set with test data via code.
> * Add an SV2 to the row set.
> * Pass the row set to operator components (such as generated code blocks.)
> * Compare the results of the operation with an expected result set.
> * Dispose of the underlying direct memory when work is done.

[jira] [Commented] (DRILL-5323) Provide test tools to create, populate and compare row sets

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948233#comment-15948233
 ] 

ASF GitHub Bot commented on DRILL-5323:
---------------------------------------

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/785#discussion_r108777101
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetPrinter.java ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.test.rowSet;
+
+import java.io.PrintStream;
+
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.test.rowSet.RowSet.RowSetReader;
+import org.apache.drill.test.rowSet.RowSetSchema.AccessSchema;
+
+public class RowSetPrinter {
+  private RowSet rowSet;
+
+  public RowSetPrinter(RowSet rowSet) {
+    this.rowSet = rowSet;
+  }
+
+  public void print() {
+    print(System.out);
+  }
+
+  public void print(PrintStream out) {
+    SelectionVectorMode selectionMode = rowSet.getIndirectionType();
+    RowSetReader reader = rowSet.reader();
+    int colCount = reader.width();
+    printSchema(out, selectionMode);
+    while (reader.next()) {
+      printHeader(out, reader, selectionMode);
+      for (int i = 0; i < colCount; i++) {
+        if (i > 0) {
+          out.print(", ");
+        }
+        out.print(reader.getAsString(i));
+      }
+      out.println();
+    }
+  }
+
+  private void printSchema(PrintStream out, SelectionVectorMode selectionMode) {
+    out.print("#");
+    switch (selectionMode) {
+    case FOUR_BYTE:
+      out.print(" (batch #, row #)");
+      break;
+    case TWO_BYTE:
+      out.print(" (row #)");
+      break;
+    default:
+      break;
+    }
+    out.print(": ");
+    AccessSchema schema = rowSet.schema().access();
+    for (int i = 0; i < schema.count(); i++) {
+      if (i > 0) {
--- End diff --

How about _maps_ inside AccessSchema?
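
One hedged sketch of what that could look like, reusing the PhysicalSchema / 
LogicalColumn API that appears in AbstractSingleRowSet later in this thread 
(the printColumns helper itself is hypothetical):

{code}
// Hypothetical: recurse into map columns when printing the schema header.
private void printColumns(PrintStream out, PhysicalSchema schema) {
  for (int i = 0; i < schema.count(); i++) {
    if (i > 0) {
      out.print(", ");
    }
    LogicalColumn col = schema.column(i);
    out.print(col.field.getName());
    if (col.field.getType().getMinorType() == MinorType.MAP) {
      out.print("(");
      printColumns(out, col.mapSchema);  // print the map's own columns in parentheses
      out.print(")");
    }
  }
}
{code}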







[jira] [Commented] (DRILL-5323) Provide test tools to create, populate and compare row sets

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948234#comment-15948234
 ] 

ASF GitHub Bot commented on DRILL-5323:
---------------------------------------

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/785#discussion_r108808598
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/AbstractSingleRowSet.java ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.test.rowSet;
+
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.physical.impl.spill.RecordBatchSizer;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.MapVector;
+import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.test.rowSet.RowSetSchema.LogicalColumn;
+import org.apache.drill.test.rowSet.RowSetSchema.PhysicalSchema;
+
+public abstract class AbstractSingleRowSet extends AbstractRowSet implements SingleRowSet {
+
+  public abstract static class StructureBuilder {
+    protected final PhysicalSchema schema;
+    protected final BufferAllocator allocator;
+    protected final ValueVector[] valueVectors;
+    protected final MapVector[] mapVectors;
+    protected int vectorIndex;
+    protected int mapIndex;
+
+    public StructureBuilder(BufferAllocator allocator, RowSetSchema schema) {
+      this.allocator = allocator;
+      this.schema = schema.physical();
+      valueVectors = new ValueVector[schema.access().count()];
+      if (schema.access().mapCount() == 0) {
+        mapVectors = null;
+      } else {
+        mapVectors = new MapVector[schema.access().mapCount()];
+      }
+    }
+  }
+
+  public static class VectorBuilder extends StructureBuilder {
+
+    public VectorBuilder(BufferAllocator allocator, RowSetSchema schema) {
+      super(allocator, schema);
+    }
+
+    public ValueVector[] buildContainer(VectorContainer container) {
+      for (int i = 0; i < schema.count(); i++) {
+        LogicalColumn colSchema = schema.column(i);
+        @SuppressWarnings("resource")
+        ValueVector v = TypeHelper.getNewVector(colSchema.field, allocator, null);
+        container.add(v);
+        if (colSchema.field.getType().getMinorType() == MinorType.MAP) {
+          MapVector mv = (MapVector) v;
+          mapVectors[mapIndex++] = mv;
+          buildMap(mv, colSchema.mapSchema);
+        } else {
+          valueVectors[vectorIndex++] = v;
+        }
+      }
+      container.buildSchema(SelectionVectorMode.NONE);
+      return valueVectors;
+    }
+
+    private void buildMap(MapVector mapVector, PhysicalSchema mapSchema) {
+      for (int i = 0; i < mapSchema.count(); i++) {
+        LogicalColumn colSchema = mapSchema.column(i);
+        MajorType type = colSchema.field.getType();
+        Class vectorClass = TypeHelper.getValueVectorClass(type.getMinorType(), type.getMode());
+        @SuppressWarnings("resource")
+        ValueVector v = mapVector.addOrGet(colSchema.field.getName(), type, vectorClass);
+        if (type.getMinorType() == MinorType.MAP) {
+          MapVector mv = (MapVector) v;
+          mapVectors[mapIndex++] = mv;
+          buildMap(mv, colSchema.mapSchema);
+        } else {
+          valueVectors[vectorIndex++] = v;
+        }
+      }
+    }
+  }
+
+  public

[jira] [Commented] (DRILL-5323) Provide test tools to create, populate and compare row sets

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948229#comment-15948229
 ] 

ASF GitHub Bot commented on DRILL-5323:
---------------------------------------

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/785#discussion_r108814142
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/DirectRowSet.java ---
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.test.rowSet;
+
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector2;
+import org.apache.drill.exec.vector.AllocationHelper;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.test.rowSet.AbstractRowSetAccessor.AbstractRowIndex;
+import org.apache.drill.test.rowSet.AbstractRowSetAccessor.BoundedRowIndex;
+import org.apache.drill.test.rowSet.RowSet.ExtendableRowSet;
+
+public class DirectRowSet extends AbstractSingleRowSet implements ExtendableRowSet {
+
+  private static class DirectRowIndex extends BoundedRowIndex {
+
+    public DirectRowIndex(int rowCount) {
+      super(rowCount);
+    }
+
+    @Override
+    public int index() { return rowIndex; }
+
+    @Override
+    public int batch() { return 0; }
--- End diff --

This can be moved to the AbstractRowIndex class to return 0 by default, and 
overridden by a derived class like HyperRowIndex that needs a different 
implementation.
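
A hedged sketch of that refactoring (names taken from the diffs in this 
thread; purely illustrative):

{code}
// The abstract base supplies the single-batch default...
public abstract static class AbstractRowIndex {
  protected int rowIndex;

  public abstract int index();

  public int batch() { return 0; }  // single-batch row sets always use batch 0
}
{code}

HyperRowIndex (shown later in this thread) would keep its SV4-based override, 
and DirectRowIndex could drop its batch() method entirely.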







[jira] [Commented] (DRILL-5323) Provide test tools to create, populate and compare row sets

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948231#comment-15948231
 ] 

ASF GitHub Bot commented on DRILL-5323:
---------------------------------------

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/785#discussion_r108761814
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/SchemaBuilder.java ---
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.test.rowSet;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.record.MaterializedField;
+
+/**
+ * Builder of a row set schema expressed as a list of materialized
+ * fields. Optimized for use when creating schemas by hand in tests.
+ * 
+ * Example usage to create the following schema: 
+ * (c: INT, a: MAP(c: VARCHAR, d: INT, e: MAP(f: VARCHAR), g: INT), h: BIGINT)
--- End diff --

... a: MAP(_b: VARCHAR_ ... instead of _c: VARCHAR_
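
That is, the javadoc example presumably should read:
(c: INT, a: MAP(b: VARCHAR, d: INT, e: MAP(f: VARCHAR), g: INT), h: BIGINT)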








[jira] [Commented] (DRILL-5323) Provide test tools to create, populate and compare row sets

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948230#comment-15948230
 ] 

ASF GitHub Bot commented on DRILL-5323:
---------------------------------------

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/785#discussion_r108758552
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/HyperRowSetImpl.java ---
@@ -0,0 +1,221 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.test.rowSet;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.record.HyperVectorWrapper;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.accessor.AccessorUtilities;
+import org.apache.drill.exec.vector.complex.AbstractMapVector;
+import org.apache.drill.test.rowSet.AbstractRowSetAccessor.BoundedRowIndex;
+import org.apache.drill.test.rowSet.RowSet.HyperRowSet;
+import org.apache.drill.test.rowSet.RowSetSchema.LogicalColumn;
+import org.apache.drill.test.rowSet.RowSetSchema.PhysicalSchema;
+
+public class HyperRowSetImpl extends AbstractRowSet implements HyperRowSet {
+
+  public static class HyperRowIndex extends BoundedRowIndex {
+
+    private final SelectionVector4 sv4;
+
+    public HyperRowIndex(SelectionVector4 sv4) {
+      super(sv4.getCount());
+      this.sv4 = sv4;
+    }
+
+    @Override
+    public int index() {
+      return AccessorUtilities.sv4Index(sv4.get(rowIndex));
+    }
+
+    @Override
+    public int batch() {
+      return AccessorUtilities.sv4Batch(sv4.get(rowIndex));
+    }
+  }
+
+  /**
+   * Build a hyper row set by restructuring a hyper vector bundle into a uniform
+   * shape. Consider this schema:
+   * { a: 10, b: { c: 20, d: { e: 30 } } }
+   *
+   * The hyper container, with two batches, has this structure:
+   *
+   *   Batch    a             b
+   *   0        Int vector    Map Vector(Int vector, Map Vector(Int vector))
+   *   1        Int vector    Map Vector(Int vector, Map Vector(Int vector))
+   *
+   * The above table shows that top-level scalar vectors (such as the Int Vector for column
+   * a) appear "end-to-end" as a hyper-vector. Maps also appear end-to-end. But, the
+   * contents of the map (column c) do not appear end-to-end. Instead, they appear as
+   * contents in the map vector. To get to c, one indexes into the map vector, steps inside
+   * the map to find c and indexes to the right row.
+   *
+   * Similarly, the maps for d do not appear end-to-end; one must step to the right batch
+   * in b, then step to d.
+   *
+   * Finally, to get to e, one must step into the hyper vector for b, then step to the
+   * proper batch, step to d, step to e, and finally step to the row within e. This is a
+   * very complex, costly indexing scheme that differs depending on map nesting depth.
+   *
+   * To simplify access, this class restructures the maps to flatten the scalar vectors
+   * into end-to-end hyper vectors. For example, for the above:
+   *
+   *   Batch    a             c             d
+   *   0        Int vector    Int vector    Int vector
+   *   1        Int vector    Int vector    Int vector
+   *
+   * The maps are still available as hyper vectors, but separated into map fields.
+   * (Scalar access no longer needs to access the maps.) The result is a uniform
+   * addressing scheme for both top-level and nested vectors.
+   */
+
+  public static class HyperVectorBuilder {
+
 

[jira] [Commented] (DRILL-5323) Provide test tools to create, populate and compare row sets

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948232#comment-15948232
 ] 

ASF GitHub Bot commented on DRILL-5323:
---------------------------------------

Github user sohami commented on a diff in the pull request:

https://github.com/apache/drill/pull/785#discussion_r108823826
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/RowSetWriterImpl.java ---
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.test.rowSet;
+
+import java.math.BigDecimal;
+
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.accessor.AbstractColumnWriter;
+import org.apache.drill.exec.vector.accessor.ColumnAccessorFactory;
+import org.apache.drill.exec.vector.accessor.ColumnWriter;
+import org.apache.drill.test.rowSet.RowSet.RowSetWriter;
+import org.joda.time.Period;
+
+/**
+ * Implements a row set writer on top of a {@link RowSet}
+ * container.
+ */
+
+public class RowSetWriterImpl extends AbstractRowSetAccessor implements RowSetWriter {
+
+  private final AbstractColumnWriter writers[];
+
+  public RowSetWriterImpl(AbstractSingleRowSet recordSet, AbstractRowIndex rowIndex) {
+    super(rowIndex, recordSet.schema().access());
+    ValueVector[] valueVectors = recordSet.vectors();
+    writers = new AbstractColumnWriter[valueVectors.length];
+    int posn = 0;
+    for (int i = 0; i < writers.length; i++) {
+      writers[posn] = ColumnAccessorFactory.newWriter(valueVectors[i].getField().getType());
--- End diff --

We can use _i_ instead of _posn_ here.
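
That is, the extra posn counter looks redundant; a hedged sketch of the 
simplified loop (only the lines visible in the diff above):

{code}
for (int i = 0; i < writers.length; i++) {
  // i already tracks the write position, so no separate posn counter is needed
  writers[i] = ColumnAccessorFactory.newWriter(valueVectors[i].getField().getType());
}
{code}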







[jira] [Commented] (DRILL-5394) Optimize query planning for MapR-DB tables by caching row counts

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948218#comment-15948218
 ] 

ASF GitHub Bot commented on DRILL-5394:
---------------------------------------

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/802#discussion_r108823495
  
--- Diff: contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushFilterIntoScan.java ---
@@ -184,6 +184,8 @@ protected void doPushFilterIntoBinaryGroupScan(final RelOptRuleCall call,
       return; //no filter pushdown ==> No transformation.
     }
 
+    // Set rowCount in newScanSpec so we do not go and fetch rowCount (an expensive operation) again from MapR DB client.
+    newScanSpec.setRowCount(groupScan.getHBaseScanSpec().getRowCount());
--- End diff --

Change the constructor for BinaryTableGroupScan to also pass in TableStats, 
and add a getter for TableStats. This should be sufficient for passing the 
row count around.
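
A hedged sketch of that suggestion (the constructor signature is hypothetical 
and elides the group scan's other arguments):

{code}
// Hypothetical: carry the already-fetched stats on the group scan itself,
// so the pushdown rule can hand them to the new scan it builds.
public class BinaryTableGroupScan {
  private final HBaseScanSpec hbaseScanSpec;
  private final MapRDBTableStats tableStats;  // fetched once at planning time

  public BinaryTableGroupScan(HBaseScanSpec hbaseScanSpec, MapRDBTableStats tableStats) {
    this.hbaseScanSpec = hbaseScanSpec;
    this.tableStats = tableStats;
  }

  public MapRDBTableStats getTableStats() { return tableStats; }
}
{code}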


> Optimize query planning for MapR-DB tables by caching row counts
> ----------------------------------------------------------------
>
> Key: DRILL-5394
> URL: https://issues.apache.org/jira/browse/DRILL-5394
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization, Storage - MapRDB
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Abhishek Girish
>Assignee: Padma Penumarthy
>  Labels: MapR-DB-Binary
> Fix For: 1.11.0
>
>
> On large MapR-DB tables, it was observed that the query planning time was 
> longer than expected. The DEBUG logs showed that multiple calls were being 
> made to get MapR-DB region locations and to fetch the total row count for 
> tables.
> {code}
> 2017-02-23 13:59:55,246 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:05,006 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Function
> ...
> 2017-02-23 14:00:05,031 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:16,438 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:16,439 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:28,479 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:28,480 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:42,396 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:42,397 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:54,609 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Physical Planning (49588ms):
> {code}
> We should cache these stats and reuse them wherever required during query 
> planning. This should help reduce query planning time.





[jira] [Commented] (DRILL-5394) Optimize query planning for MapR-DB tables by caching row counts

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948216#comment-15948216
 ] 

ASF GitHub Bot commented on DRILL-5394:
---------------------------------------

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/802#discussion_r108822233
  
--- Diff: contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/binary/BinaryTableGroupScan.java ---
@@ -115,8 +112,11 @@ private void init() {
     try (Admin admin = formatPlugin.getConnection().getAdmin();
          RegionLocator locator = formatPlugin.getConnection().getRegionLocator(tableName)) {
       hTableDesc = admin.getTableDescriptor(tableName);
-      tableStats = new MapRDBTableStats(getHBaseConf(), hbaseScanSpec.getTableName());
-
+      // Fetch rowCount only once and cache it in hbaseScanSpec.
+      if (hbaseScanSpec.getRowCount() == hbaseScanSpec.ROW_COUNT_UNKNOWN) {
+        MapRDBTableStats tableStats = new MapRDBTableStats(getHBaseConf(), hbaseScanSpec.getTableName());
+        hbaseScanSpec.setRowCount(tableStats.getNumRows());
--- End diff --

This looks weird. We create the TableStats relying on some information from 
the ScanSpec and then proceed to modify the same ScanSpec with the information 
retrieved from TableStats. Please look at the comment below as well, regarding 
ScanSpec mutability.

Should we instead just overload the MapRDBTableStats constructor to allow 
passing numRows - since that is what the existing constructor ends up doing, 
but with a call to the DB client? So instead of populating the ScanSpec we 
populate the tableStats using this new constructor?
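
A hedged sketch of that overload (hypothetical; the field name follows 
getNumRows() from the diff above):

{code}
// Hypothetical overload: reuse a row count that was already fetched,
// instead of calling back into the MapR-DB client.
public MapRDBTableStats(long numRows) {
  this.numRows = numRows;
}
{code}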







[jira] [Commented] (DRILL-5394) Optimize query planning for MapR-DB tables by caching row counts

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948217#comment-15948217
 ] 

ASF GitHub Bot commented on DRILL-5394:
---------------------------------------

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/802#discussion_r108822541
  
--- Diff: contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/binary/BinaryTableGroupScan.java ---
@@ -174,7 +174,7 @@ public MapRDBSubScan getSpecificScan(int minorFragmentId) {
   @Override
   public ScanStats getScanStats() {
     //TODO: look at stats for this.
-    long rowCount = (long) ((hbaseScanSpec.getFilter() != null ? .5 : 1) * tableStats.getNumRows());
+    long rowCount = (long) ((hbaseScanSpec.getFilter() != null ? .5 : 1) * hbaseScanSpec.getRowCount());
--- End diff --

This code change would not be required if we used the alternative approach?







[jira] [Commented] (DRILL-5394) Optimize query planning for MapR-DB tables by caching row counts

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948219#comment-15948219
 ] 

ASF GitHub Bot commented on DRILL-5394:
---------------------------------------

Github user gparai commented on a diff in the pull request:

https://github.com/apache/drill/pull/802#discussion_r108821978
  
--- Diff: contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseScanSpec.java ---
@@ -76,6 +78,10 @@ public String getTableName() {
     return stopRow == null ? HConstants.EMPTY_START_ROW : stopRow;
   }
 
+  public long getRowCount() { return rowCount; }
+
+  public void setRowCount(long numRowCount) { rowCount = numRowCount; }
--- End diff --

Should the ScanSpec be mutable, given that it serves as a specification? I 
looked at other ScanSpecs but none seem to set ScanSpec members. Maybe all the 
members should be final and initialized via the constructor?
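
A hedged sketch of the immutable alternative (the constructor is shown with 
only the row count; the spec's other members are elided, and the -1 sentinel 
is an assumption based on the ROW_COUNT_UNKNOWN reference in the diff above):

{code}
public class HBaseScanSpec {
  public static final long ROW_COUNT_UNKNOWN = -1;  // assumed sentinel value

  private final long rowCount;  // final: fixed at construction time

  public HBaseScanSpec(long rowCount /* , table, start/stop rows, filter, ... */) {
    this.rowCount = rowCount;
  }

  public long getRowCount() { return rowCount; }
}
{code}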







[jira] [Commented] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK

2017-03-29 Thread Rob Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948192#comment-15948192
 ] 

Rob Wu commented on DRILL-5316:
-------------------------------

Hi [~rhou], v1.3.4.0024's Drill Client contains this patch. You would have to 
test with a driver prior to 1.3.4.

> C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children 
> completed with ZOK
> ----------------------------------------------------------------------------
>
> Key: DRILL-5316
> URL: https://issues.apache.org/jira/browse/DRILL-5316
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - C++
>Reporter: Rob Wu
>Assignee: Chun Chang
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> When connecting to a drillbit with Zookeeper, occasionally the C++ client 
> would crash without any apparent reason.
> A further look into the code revealed that during this call 
> rc = zoo_get_children(p_zh.get(), m_path.c_str(), 0, &drillbitsVector); 
> zoo_get_children returns ZOK (0) but drillbitsVector.count is 0.
> This causes drillbits to stay empty, and thus 
> causes err = zook.getEndPoint(drillbits[drillbits.size() - 1], endpoint); to 
> crash.
> A size check should be done to prevent this from happening.





[jira] [Updated] (DRILL-5378) Put more information into SchemaChangeException when HashJoin hit SchemaChangeException

2017-03-29 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni updated DRILL-5378:
------------------------------
Labels: ready-to-commit  (was: )

> Put more information into SchemaChangeException when HashJoin hit 
> SchemaChangeException
> ---------------------------------------------------------------------------------------
>
> Key: DRILL-5378
> URL: https://issues.apache.org/jira/browse/DRILL-5378
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>Priority: Minor
>  Labels: ready-to-commit
>
> HashJoin currently does not allow schema change on either the build side or 
> the probe side. When HashJoin hits a SchemaChangeException in the middle of 
> execution, Drill reports a brief error message about the 
> SchemaChangeException, without providing any information about what schemas 
> are in the incoming batches. That makes it hard to analyze the error and 
> understand what's going on. 
> It probably makes sense to put the two differing schemas in the error 
> message, so that the user could get a better idea about the schema change. 
> Before Drill can provide support for schema change in HashJoin, the detailed 
> error message would help users debug the error. 





[jira] [Updated] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK

2017-03-29 Thread Chun Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chang updated DRILL-5316:
------------------------------
Reviewer: Robert Hou  (was: Chun Chang)






[jira] [Commented] (DRILL-4678) Tune metadata by generating a dispatcher at runtime

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948154#comment-15948154
 ] 

ASF GitHub Bot commented on DRILL-4678:
---------------------------------------

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/793


> Tune metadata by generating a dispatcher at runtime
> ---------------------------------------------------
>
> Key: DRILL-4678
> URL: https://issues.apache.org/jira/browse/DRILL-4678
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.7.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Serhii Harnyk
>Priority: Critical
> Attachments: hung_Date_Query.log
>
>
> Below query hangs
> {noformat}
> 2016-05-16 10:33:57,506 [28c65de9-9f67-dadb-5e4e-e1a12f8dda49:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 28c65de9-9f67-dadb-5e4e-e1a12f8dda49: SELECT DISTINCT dt FROM (
> VALUES(CAST('1964-03-07' AS DATE)),
>   (CAST('2002-03-04' AS DATE)),
>   (CAST('1966-09-04' AS DATE)),
>   (CAST('1993-08-18' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1970-06-11' AS DATE)),
>   (CAST('1959-10-23' AS DATE)),
>   (CAST('1992-01-14' AS DATE)),
>   (CAST('1994-07-24' AS DATE)),
>   (CAST('1979-11-25' AS DATE)),
>   (CAST('1945-01-14' AS DATE)),
>   (CAST('1982-07-25' AS DATE)),
>   (CAST('1966-09-06' AS DATE)),
>   (CAST('1989-05-01' AS DATE)),
>   (CAST('1996-03-08' AS DATE)),
>   (CAST('1998-08-19' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
>   (CAST('2013-08-13' AS DATE)),
> (CAST('1999-07-20' AS DATE)),
> (CAST('1962-07-03' AS DATE)),
>   (CAST('2011-08-17' AS DATE)),
>   (CAST('2011-05-16' AS DATE)),
>   (CAST('1946-05-08' AS DATE)),
>   (CAST('1994-02-13' AS DATE)),
>   (CAST('1978-08-09' AS DATE)),
>   (CAST('1978-08-09' AS DATE)),
>   (CAST('1978-08-09' AS DATE)),
>   (CAST('1978-08-09' AS DATE)),
>   (CAST('1958-02-06' AS DATE)),
>   (CAST('2012-06-11' AS DATE)),
>   (CAST('2012-06-11' AS DATE)),
>   (CAST('2012-06-11' AS DATE)),
>   (CAST('2012-06-11' AS DATE)),
>   (CAST('1998-03-26' AS DATE)),
>   (CAST('1996-11-04' AS DATE)),
>   (CAST('1953-09-25' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('2003-06-17' AS DATE)),
>   (CAST('1980-07-05' AS DATE)),
>   (CAST('1982-06-15' AS DATE)),
>   (CAST('1951-05-16' AS DATE)))
> tbl(dt)
> {noformat}
> Details from the Web UI Profile tab; note that the query is still in the 
> STARTING state
> {noformat}
> Running Queries
> Time  UserQuery   State   Foreman
> 05/16/2016 10:33:57   
> mapr
>  SELECT DISTINCT dt FROM ( VALUES(CAST('1964-03-07' AS DATE)), 
> (CAST('2002-03-04' AS DATE)), (CAST('1966-09-04' AS DATE)), (CAST('199
> STARTING
> centos-01.qa.lab
> {noformat}
> There is no other useful information in drillbit.log. jstack output is 
> attached here for your reference.
> The same query works fine on Postgres 9.3



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5400) Random IndexOutOfBoundsException when running a kvgen query on top of nested json data

2017-03-29 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948133#comment-15948133
 ] 

Rahul Challapalli commented on DRILL-5400:
--

Dataset used in the query:
{code}
{
  "type": "FeatureCollection",
  "maps": [
    {"m1": "val1", "m2": "val2"},
    {"m3": {"m4": "val4"}}
  ],
  "metadata": {
    "generated": 1406245017000,
    "url": "http://comcat.cr.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2000-07-24%2000%3A00%3A00&minmagnitude=6&endtime=2014-07-24%2023%3A59%3A59&orderby=time",
    "title": "USGS Earthquakes",
    "status": 200,
    "api": "1.0.13",
    "count": 2184
  },
  "features": [
    {
      "type": "Feature",
      "properties": {
        "mag": 6.9,
        "time": 1405954481000,
        "updated": 1405983436259
      },
      "geometry": {
        "type": "Point",
        "coordinates": "100,90"
      },
      "id": "usb000ruzk",
      "location": {
        "zip": "95134",
        "street": "zanker",
        "bldgs": {
          "bldg1": "HQ1",
          "bldg2": "HQ2"
        }
      }
    }
  ]
}
{code}

> Random IndexOutOfBoundsException when running a kvgen query on top of nested 
> json data
> --
>
> Key: DRILL-5400
> URL: https://issues.apache.org/jira/browse/DRILL-5400
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>
> git.commit.id.abbrev=38ef562
> The below query did not fail when I ran it in isolation. However, when I ran 
> the test suite at [1] (which also contains the below query) with 50 threads 
> submitting queries concurrently, I hit the below error.
> {code}
> select geo.features[0].location.bldgs, kvgen(geo.features[0].location.bldgs) 
> from `json_kvgenflatten/nested.json` geo
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: IndexOutOfBoundsException: index: 0, 
> length: 4 (expected: range(0, 0))
> Fragment 0:0
> [Error Id: 9bf434d1-2199-498d-b0a5-b487bbc7690b on qa-node182.qa.lab:31010]
>   (java.lang.IndexOutOfBoundsException) index: 0, length: 4 (expected: 
> range(0, 0))
> io.netty.buffer.DrillBuf.checkIndexD():123
> io.netty.buffer.DrillBuf.chk():147
> io.netty.buffer.DrillBuf.getInt():520
> org.apache.drill.exec.vector.UInt4Vector$Accessor.get():358
> org.apache.drill.exec.vector.VarCharVector$Mutator.setSafe():534
> 
> org.apache.drill.exec.vector.NullableVarCharVector$Mutator.fillEmpties():480
> 
> org.apache.drill.exec.vector.NullableVarCharVector$Mutator.setValueCount():591
> org.apache.drill.exec.vector.complex.MapVector$Mutator.setValueCount():346
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setValueCount():273
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():206
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1895)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61)
>   at 
> oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
>   at 
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1100)
>   at 
> 

[jira] [Created] (DRILL-5400) Random IndexOutOfBoundsException when running a kvgen query on top of nested json data

2017-03-29 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5400:


 Summary: Random IndexOutOfBoundsException when running a kvgen 
query on top of nested json data
 Key: DRILL-5400
 URL: https://issues.apache.org/jira/browse/DRILL-5400
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JSON
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=38ef562
The below query did not fail when I ran it in isolation. However, when I ran the 
test suite at [1] (which also contains the below query) with 50 threads 
submitting queries concurrently, I hit the below error.
{code}
select geo.features[0].location.bldgs, kvgen(geo.features[0].location.bldgs) 
from `json_kvgenflatten/nested.json` geo
Failed with exception
java.sql.SQLException: SYSTEM ERROR: IndexOutOfBoundsException: index: 0, 
length: 4 (expected: range(0, 0))

Fragment 0:0

[Error Id: 9bf434d1-2199-498d-b0a5-b487bbc7690b on qa-node182.qa.lab:31010]

  (java.lang.IndexOutOfBoundsException) index: 0, length: 4 (expected: range(0, 
0))
io.netty.buffer.DrillBuf.checkIndexD():123
io.netty.buffer.DrillBuf.chk():147
io.netty.buffer.DrillBuf.getInt():520
org.apache.drill.exec.vector.UInt4Vector$Accessor.get():358
org.apache.drill.exec.vector.VarCharVector$Mutator.setSafe():534
org.apache.drill.exec.vector.NullableVarCharVector$Mutator.fillEmpties():480

org.apache.drill.exec.vector.NullableVarCharVector$Mutator.setValueCount():591
org.apache.drill.exec.vector.complex.MapVector$Mutator.setValueCount():346

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setValueCount():273
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():206
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
org.apache.drill.exec.record.AbstractRecordBatch.next():162

org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745

at 
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
at 
org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1895)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
at 
org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1100)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
at 
org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:180)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130)
at 
org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:112)
at 
org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:177)
at 
org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:101)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
ERROR: IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 0))

Fragment 0:0

[Error Id: 9bf434d1-2199-498d-b0a5-b487bbc7690b on qa-node182.qa.lab:31010]

  (java.lang.IndexOutOfBoundsException) index: 0, length: 4 (expected: range(0, 
0))
io.netty.buffer.DrillBuf.checkIndexD():123

[jira] [Commented] (DRILL-5399) Random Error : Flatten does not support inputs of non-list values.

2017-03-29 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948104#comment-15948104
 ] 

Rahul Challapalli commented on DRILL-5399:
--

Dataset used in the query:
{code}
{"map":{"rm": [ {"rptd": [{ "a": "foo"}]}]}}
{code}

> Random Error : Flatten does not support inputs of non-list values.
> --
>
> Key: DRILL-5399
> URL: https://issues.apache.org/jira/browse/DRILL-5399
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - JSON
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>
> git.commit.id.abbrev=38ef562
> The below query did not fail when I ran it in isolation. However, when I ran 
> the test suite at [1] (which also contains the below query) with 50 threads 
> submitting queries concurrently, I hit the below error.
> {code}
> select flatten(sub.fk.`value`) from (select flatten(kvgen(map)) fk from 
> `json_kvgenflatten/nested3.json`) sub
> Failed with exception
> java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Flatten does not support 
> inputs of non-list values.
> Fragment 0:0
> [Error Id: 90026283-0b95-4bda-948e-54ed57a62edf on qa-node183.qa.lab:31010]
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1895)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61)
>   at 
> oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
>   at 
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1100)
>   at 
> oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
>   at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:180)
>   at 
> oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
>   at 
> oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130)
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:112)
>   at 
> org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:177)
>   at 
> org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:101)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: 
> UNSUPPORTED_OPERATION ERROR: Flatten does not support inputs of non-list 
> values.
> Fragment 0:0
> [Error Id: 90026283-0b95-4bda-948e-54ed57a62edf on qa-node183.qa.lab:31010]
>   at 
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
>   at 
> oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:343)
>   at 
> oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:88)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:244)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
>   at 
> 

[jira] [Created] (DRILL-5399) Random Error : Flatten does not support inputs of non-list values.

2017-03-29 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5399:


 Summary: Random Error : Flatten does not support inputs of 
non-list values.
 Key: DRILL-5399
 URL: https://issues.apache.org/jira/browse/DRILL-5399
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types, Storage - JSON
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=38ef562

The below query did not fail when I ran it in isolation. However, when I ran the 
test suite at [1] (which also contains the below query) with 50 threads 
submitting queries concurrently, I hit the below error.
{code}
select flatten(sub.fk.`value`) from (select flatten(kvgen(map)) fk from 
`json_kvgenflatten/nested3.json`) sub
Failed with exception
java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Flatten does not support 
inputs of non-list values.

Fragment 0:0

[Error Id: 90026283-0b95-4bda-948e-54ed57a62edf on qa-node183.qa.lab:31010]


at 
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
at 
org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1895)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
at 
org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1100)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
at 
org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:180)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130)
at 
org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:112)
at 
org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:177)
at 
org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:101)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: 
UNSUPPORTED_OPERATION ERROR: Flatten does not support inputs of non-list values.

Fragment 0:0

[Error Id: 90026283-0b95-4bda-948e-54ed57a62edf on qa-node183.qa.lab:31010]


at 
oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
at 
oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:343)
at 
oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:88)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274)
at 
oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:244)
at 
oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at 
oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at 

[jira] [Commented] (DRILL-5398) Memory Allocator randomly throws an IllegalStateException when reading nested json data

2017-03-29 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948100#comment-15948100
 ] 

Rahul Challapalli commented on DRILL-5398:
--

Dataset used in the above query:
{code}
{
  "rownum": 1,
  "bigintegercol": { "int_1": 1, "int_2": 2, "int_3": 3 },
  "varcharcol": { "varchar_1": "abc", "varchar_2": "def", "varchar_3": "xyz" },
  "boolcol": { "boolean_1": true, "boolean_2": false, "boolean_3": true },
  "float8col": { "f8_1": 1.1, "f8_2": 2.2 },
  "complex": [
    { "col1": 3 },
    { "col2": 2, "col3": 1 },
    { "col1": 7 }
  ]
}
{
  "rownum": 2,
  "bigintegercol": { "int_1": 1, "int_2": 2 },
  "varcharcol": { "varchar_1": "abcd" },
  "boolcol": { "boolean_1": true },
  "float8col": { "f8_1": 1.1, "f8_2": 2.2, "f8_3": 3.3 },
  "complex": [
    { "col2": 2, "col3": 1 },
    { "col1": 7 }
  ]
}
{
  "rownum": 3,
  "bigintegercol": { "int_1": 1, "int_3": 3 },
  "varcharcol": { "varchar_1": "abcde", "varchar_2": null, "varchar_3": "xyz", "varchar_4": "xyz2" },
  "boolcol": { "boolean_1": true, "boolean_2": false },
  "float8col": { "f8_1": 1.1, "f8_3": 6.6 },
  "complex": [
    { "col1": 2, "col3": 1 }
  ]
}
{
  "rownum": 4,
  "bigintegercol": { "int_2": 2, "int_3": 3 },
  "varcharcol": { "varchar_1": "abc", "varchar_2": "def" },
  "boolcol": { "boolean_1": true, "boolean_2": false, "boolean_3": null },
  "float8col": { "f8_1": 1.1, "f8_2": 2.2 },
  "complex": [
    { "col1": 3, "col2": 2 },
    { "col3": 1, "col1": 7 }
  ]
}
{code}

> Memory Allocator randomly throws an IllegalStateException when reading nested 
> json data
> ---
>
> Key: DRILL-5398
> URL: https://issues.apache.org/jira/browse/DRILL-5398
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - JSON
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>
> git.commit.id.abbrev=38ef562
> The below query did not fail when I ran it in isolation. However, when I ran 
> the test suite at [1] (which also contains the below query) with 50 threads 
> submitting queries concurrently, I hit the below error. 
> {code}
> select kvgen(bigintegercol), kvgen(float8col) from 
> `json_kvgenflatten/kvgen1.json`
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: IllegalStateException: 
> Allocator[op:0:0:2:Project] closed with outstanding buffers allocated (6).
> Allocator(op:0:0:2:Project) 100/110592/434176/100 
> (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 6
> ledger[2747] allocator: op:0:0:2:Project), isOwning: true, size: 32768, 
> references: 1, life: 13372865831733586..0, allocatorManager: [694, life: 
> 13372864956215117..0] holds 1 buffers. 
> DrillBuf[3165], udle: [694 0..32768]
> ledger[2756] allocator: op:0:0:2:Project), isOwning: true, size: 4096, 
> references: 1, life: 13372865832378236..0, allocatorManager: [702, life: 
> 13372864958396200..0] holds 1 buffers. 
> DrillBuf[3176], udle: [703 0..4096]
> ledger[2775] allocator: op:0:0:2:Project), isOwning: true, size: 32768, 
> references: 1, life: 13372865833922897..0, allocatorManager: [706, life: 
> 13372864959861722..0] holds 1 buffers. 
> DrillBuf[3196], udle: [707 0..32768]
> ledger[2761] allocator: op:0:0:2:Project), isOwning: true, size: 32768, 
> references: 1, life: 13372865833009204..0, allocatorManager: [700, life: 
> 13372864957931156..0] holds 1 buffers. 
> DrillBuf[3181], udle: [701 0..32768]
> ledger[2769] allocator: op:0:0:2:Project), isOwning: true, size: 4096, 
> references: 1, life: 13372865833502205..0, allocatorManager: [708, life: 
> 13372864960352194..0] holds 1 buffers. 
> DrillBuf[3189], udle: [709 0..4096]
> ledger[2741] allocator: op:0:0:2:Project), isOwning: true, size: 4096, 
> references: 1, life: 13372865831092092..0, allocatorManager: [696, life: 
> 13372864956764681..0] holds 1 buffers. 
> DrillBuf[3160], udle: [697 0..4096]
>   reservations: 0
> Fragment 0:0
> [Error Id: f8d274e2-8119-495c-8a38-017c834f9931 on qa-node183.qa.lab:31010]
>   (java.lang.IllegalStateException) 

[jira] [Created] (DRILL-5398) Memory Allocator randomly throws an IllegalStateException when reading nested json data

2017-03-29 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5398:


 Summary: Memory Allocator randomly throws an IllegalStateException 
when reading nested json data
 Key: DRILL-5398
 URL: https://issues.apache.org/jira/browse/DRILL-5398
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types, Storage - JSON
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=38ef562

The below query did not fail when I ran it in isolation. However, when I ran the 
test suite at [1] (which also contains the below query) with 50 threads 
submitting queries concurrently, I hit the below error. 

{code}
select kvgen(bigintegercol), kvgen(float8col) from 
`json_kvgenflatten/kvgen1.json`
Failed with exception
java.sql.SQLException: SYSTEM ERROR: IllegalStateException: 
Allocator[op:0:0:2:Project] closed with outstanding buffers allocated (6).
Allocator(op:0:0:2:Project) 100/110592/434176/100 
(res/actual/peak/limit)
  child allocators: 0
  ledgers: 6
ledger[2747] allocator: op:0:0:2:Project), isOwning: true, size: 32768, 
references: 1, life: 13372865831733586..0, allocatorManager: [694, life: 
13372864956215117..0] holds 1 buffers. 
DrillBuf[3165], udle: [694 0..32768]
ledger[2756] allocator: op:0:0:2:Project), isOwning: true, size: 4096, 
references: 1, life: 13372865832378236..0, allocatorManager: [702, life: 
13372864958396200..0] holds 1 buffers. 
DrillBuf[3176], udle: [703 0..4096]
ledger[2775] allocator: op:0:0:2:Project), isOwning: true, size: 32768, 
references: 1, life: 13372865833922897..0, allocatorManager: [706, life: 
13372864959861722..0] holds 1 buffers. 
DrillBuf[3196], udle: [707 0..32768]
ledger[2761] allocator: op:0:0:2:Project), isOwning: true, size: 32768, 
references: 1, life: 13372865833009204..0, allocatorManager: [700, life: 
13372864957931156..0] holds 1 buffers. 
DrillBuf[3181], udle: [701 0..32768]
ledger[2769] allocator: op:0:0:2:Project), isOwning: true, size: 4096, 
references: 1, life: 13372865833502205..0, allocatorManager: [708, life: 
13372864960352194..0] holds 1 buffers. 
DrillBuf[3189], udle: [709 0..4096]
ledger[2741] allocator: op:0:0:2:Project), isOwning: true, size: 4096, 
references: 1, life: 13372865831092092..0, allocatorManager: [696, life: 
13372864956764681..0] holds 1 buffers. 
DrillBuf[3160], udle: [697 0..4096]
  reservations: 0


Fragment 0:0

[Error Id: f8d274e2-8119-495c-8a38-017c834f9931 on qa-node183.qa.lab:31010]

  (java.lang.IllegalStateException) Allocator[op:0:0:2:Project] closed with 
outstanding buffers allocated (6).
Allocator(op:0:0:2:Project) 100/110592/434176/100 
(res/actual/peak/limit)
  child allocators: 0
  ledgers: 6
ledger[2747] allocator: op:0:0:2:Project), isOwning: true, size: 32768, 
references: 1, life: 13372865831733586..0, allocatorManager: [694, life: 
13372864956215117..0] holds 1 buffers. 
DrillBuf[3165], udle: [694 0..32768]
ledger[2756] allocator: op:0:0:2:Project), isOwning: true, size: 4096, 
references: 1, life: 13372865832378236..0, allocatorManager: [702, life: 
13372864958396200..0] holds 1 buffers. 
DrillBuf[3176], udle: [703 0..4096]
ledger[2775] allocator: op:0:0:2:Project), isOwning: true, size: 32768, 
references: 1, life: 13372865833922897..0, allocatorManager: [706, life: 
13372864959861722..0] holds 1 buffers. 
DrillBuf[3196], udle: [707 0..32768]
ledger[2761] allocator: op:0:0:2:Project), isOwning: true, size: 32768, 
references: 1, life: 13372865833009204..0, allocatorManager: [700, life: 
13372864957931156..0] holds 1 buffers. 
DrillBuf[3181], udle: [701 0..32768]
ledger[2769] allocator: op:0:0:2:Project), isOwning: true, size: 4096, 
references: 1, life: 13372865833502205..0, allocatorManager: [708, life: 
13372864960352194..0] holds 1 buffers. 
DrillBuf[3189], udle: [709 0..4096]
ledger[2741] allocator: op:0:0:2:Project), isOwning: true, size: 4096, 
references: 1, life: 13372865831092092..0, allocatorManager: [696, life: 
13372864956764681..0] holds 1 buffers. 
DrillBuf[3160], udle: [697 0..4096]
  reservations: 0

org.apache.drill.exec.memory.BaseAllocator.close():486
org.apache.drill.exec.ops.OperatorContextImpl.close():149
org.apache.drill.exec.ops.FragmentContext.suppressingClose():422
org.apache.drill.exec.ops.FragmentContext.close():411
org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():318
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():155
org.apache.drill.exec.work.fragment.FragmentExecutor.run():262
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745

at 

[jira] [Commented] (DRILL-5397) Random Error : Unable to get holder type for minor type [LATE] and mode [OPTIONAL]

2017-03-29 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948087#comment-15948087
 ] 

Rahul Challapalli commented on DRILL-5397:
--

Dataset used in the above query:
{code}
{
  "type": "FeatureCollection",
  "maps": [
    {"m1": "val1", "m2": "val2"},
    {"m3": {"m4": "val4"}}
  ],
  "metadata": {
    "generated": 1406245017000,
    "url": "http://comcat.cr.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2000-07-24%2000%3A00%3A00&minmagnitude=6&endtime=2014-07-24%2023%3A59%3A59&orderby=time",
    "title": "USGS Earthquakes",
    "status": 200,
    "api": "1.0.13",
    "count": 2184
  },
  "features": [
    {
      "type": "Feature",
      "properties": {
        "mag": 6.9,
        "time": 1405954481000,
        "updated": 1405983436259
      },
      "geometry": {
        "type": "Point",
        "coordinates": "100,90"
      },
      "id": "usb000ruzk",
      "location": {
        "zip": "95134",
        "street": "zanker",
        "bldgs": {
          "bldg1": "HQ1",
          "bldg2": "HQ2"
        }
      }
    }
  ]
}
{code}

> Random Error : Unable to get holder type for minor type [LATE] and mode 
> [OPTIONAL]
> --
>
> Key: DRILL-5397
> URL: https://issues.apache.org/jira/browse/DRILL-5397
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - JSON
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>
> git.commit.id.abbrev=38ef562
> The below query did not fail when running sequentially. However, when I ran 
> the test suite at [1] (which contains the below query) with 50 threads 
> submitting queries concurrently, I hit the below error.
> {code}
> select kvgen(bldgs[0]) from (select kvgen(geo.features[0].location.bldgs) 
> bldgs from `json_kvgenflatten/nested.json` geo)
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: UnsupportedOperationException: Unable to 
> get holder type for minor type [LATE] and mode [OPTIONAL]
> Fragment 0:0
> [Error Id: 67223a94-b24b-4bde-a87a-743b093b23a6 on qa-node183.qa.lab:31010]
>   (java.lang.UnsupportedOperationException) Unable to get holder type for 
> minor type [LATE] and mode [OPTIONAL]
> org.apache.drill.exec.expr.TypeHelper.getHolderType():602
> org.apache.drill.exec.expr.ClassGenerator.getHolderType():666
> org.apache.drill.exec.expr.ClassGenerator.declare():368
> org.apache.drill.exec.expr.ClassGenerator.declare():364
> 
> org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitUnknown():349
> 
> org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitUnknown():1320
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():1026
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():795
> 
> org.apache.drill.common.expression.visitors.AbstractExprVisitor.visitNullConstant():162
> 
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitNullConstant():1003
> 
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitNullConstant():795
> org.apache.drill.common.expression.TypedNullConstant.accept():46
> 
> org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitFunctionHolderExpression():193
> 
> org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitFunctionHolderExpression():1077
> 
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():815
> 
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():795
> org.apache.drill.common.expression.FunctionHolderExpression.accept():47
> org.apache.drill.exec.expr.EvaluationVisitor.addExpr():104
> org.apache.drill.exec.expr.ClassGenerator.addExpr():261
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema():458
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> 

[jira] [Created] (DRILL-5397) Random Error : Unable to get holder type for minor type [LATE] and mode [OPTIONAL]

2017-03-29 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5397:


 Summary: Random Error : Unable to get holder type for minor type 
[LATE] and mode [OPTIONAL]
 Key: DRILL-5397
 URL: https://issues.apache.org/jira/browse/DRILL-5397
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types, Storage - JSON
Affects Versions: 1.10.0
Reporter: Rahul Challapalli


git.commit.id.abbrev=38ef562

The below query did not fail when running sequentially. However, when I ran the 
test suite at [1] (which contains the below query) with 50 threads 
submitting queries concurrently, I hit the below error.

{code}
select kvgen(bldgs[0]) from (select kvgen(geo.features[0].location.bldgs) bldgs 
from `json_kvgenflatten/nested.json` geo)
Failed with exception
java.sql.SQLException: SYSTEM ERROR: UnsupportedOperationException: Unable to 
get holder type for minor type [LATE] and mode [OPTIONAL]

Fragment 0:0

[Error Id: 67223a94-b24b-4bde-a87a-743b093b23a6 on qa-node183.qa.lab:31010]

  (java.lang.UnsupportedOperationException) Unable to get holder type for minor 
type [LATE] and mode [OPTIONAL]
org.apache.drill.exec.expr.TypeHelper.getHolderType():602
org.apache.drill.exec.expr.ClassGenerator.getHolderType():666
org.apache.drill.exec.expr.ClassGenerator.declare():368
org.apache.drill.exec.expr.ClassGenerator.declare():364
org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitUnknown():349

org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitUnknown():1320
org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():1026
org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():795

org.apache.drill.common.expression.visitors.AbstractExprVisitor.visitNullConstant():162

org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitNullConstant():1003

org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitNullConstant():795
org.apache.drill.common.expression.TypedNullConstant.accept():46

org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitFunctionHolderExpression():193

org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitFunctionHolderExpression():1077

org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():815

org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():795
org.apache.drill.common.expression.FunctionHolderExpression.accept():47
org.apache.drill.exec.expr.EvaluationVisitor.addExpr():104
org.apache.drill.exec.expr.ClassGenerator.addExpr():261

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema():458
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
org.apache.drill.exec.record.AbstractRecordBatch.next():162

org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745

at 
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
at 
org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1895)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:61)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
at 
org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1100)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
at 
org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:180)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130)
at 
org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:112)
 

[jira] [Updated] (DRILL-5396) A flatten query on top of 2 files with one record each causes oversize allocation error randomly

2017-03-29 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli updated DRILL-5396:
-
Affects Version/s: 1.10.0
  Description: 
git.commit.id.abbrev=38ef562

As part of verifying DRILL-3562, I came up with the below 2 files

File 1:
{code}
{ "a": { "b": { "c": [] } } }
{code}

File 2:
{code}
{ "a": { "b": { "c": [1] } } }
{code}

Now, the below query works on individual files; however, when I run the query on a 
directory containing both files, I randomly hit the below error
{code}
select FLATTEN(t.a.b.c) AS c from 
dfs.`/drill/testdata/json_kvgenflatten/drill3562` t;
Error: SYSTEM ERROR: OversizedAllocationException: Unable to expand the buffer. 
Max allowed buffer size is reached.

Fragment 0:0

[Error Id: e556243b-7ad6-4131-81f7-e8b225c9c8bc on qa-node182.qa.lab:31010] 
(state=,code=0)
{code}

This could be related to the fix for DRILL-3562

  was:
As part of verifying DRILL-3562, I came up with the below 2 files

File 1:
{code}
{ "a": { "b": { "c": [] } } }
{code}

File 2:
{code}
{ "a": { "b": { "c": [1] } } }
{code}

Now, the below query works on individual files; however, when I run the query on a 
directory containing both files, I randomly hit the below error
{code}
select FLATTEN(t.a.b.c) AS c from 
dfs.`/drill/testdata/json_kvgenflatten/drill3562` t;
Error: SYSTEM ERROR: OversizedAllocationException: Unable to expand the buffer. 
Max allowed buffer size is reached.

Fragment 0:0

[Error Id: e556243b-7ad6-4131-81f7-e8b225c9c8bc on qa-node182.qa.lab:31010] 
(state=,code=0)
{code}

This could be related to the fix for DRILL-3562


> A flatten query on top of 2 files with one record each causes oversize 
> allocation error randomly
> 
>
> Key: DRILL-5396
> URL: https://issues.apache.org/jira/browse/DRILL-5396
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - JSON
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Rahul Challapalli
>
> git.commit.id.abbrev=38ef562
> As part of verifying DRILL-3562, I came up with the below 2 files
> File 1:
> {code}
> { "a": { "b": { "c": [] } } }
> {code}
> File 2:
> {code}
> { "a": { "b": { "c": [1] } } }
> {code}
> Now, the below query works on individual files; however, when I run the query on 
> a directory containing both files, I randomly hit the below error
> {code}
> select FLATTEN(t.a.b.c) AS c from 
> dfs.`/drill/testdata/json_kvgenflatten/drill3562` t;
> Error: SYSTEM ERROR: OversizedAllocationException: Unable to expand the 
> buffer. Max allowed buffer size is reached.
> Fragment 0:0
> [Error Id: e556243b-7ad6-4131-81f7-e8b225c9c8bc on qa-node182.qa.lab:31010] 
> (state=,code=0)
> {code}
> This could be related to the fix for DRILL-3562



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5396) A flatten query on top of 2 files with one record each causes oversize allocation error randomly

2017-03-29 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli updated DRILL-5396:
-
Affects Version/s: 1.11.0

> A flatten query on top of 2 files with one record each causes oversize 
> allocation error randomly
> 
>
> Key: DRILL-5396
> URL: https://issues.apache.org/jira/browse/DRILL-5396
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - JSON
>Affects Versions: 1.11.0
>Reporter: Rahul Challapalli
>
> As part of verifying DRILL-3562, I came up with the below 2 files
> File 1:
> {code}
> { "a": { "b": { "c": [] } } }
> {code}
> File 2:
> {code}
> { "a": { "b": { "c": [1] } } }
> {code}
> Now, the below query works on individual files; however, when I run the query on 
> a directory containing both files, I randomly hit the below error
> {code}
> select FLATTEN(t.a.b.c) AS c from 
> dfs.`/drill/testdata/json_kvgenflatten/drill3562` t;
> Error: SYSTEM ERROR: OversizedAllocationException: Unable to expand the 
> buffer. Max allowed buffer size is reached.
> Fragment 0:0
> [Error Id: e556243b-7ad6-4131-81f7-e8b225c9c8bc on qa-node182.qa.lab:31010] 
> (state=,code=0)
> {code}
> This could be related to the fix for DRILL-3562



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (DRILL-5396) A flatten query on top of 2 files with one record each causes oversize allocation error randomly

2017-03-29 Thread Rahul Challapalli (JIRA)
Rahul Challapalli created DRILL-5396:


 Summary: A flatten query on top of 2 files with one record each 
causes oversize allocation error randomly
 Key: DRILL-5396
 URL: https://issues.apache.org/jira/browse/DRILL-5396
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types, Storage - JSON
Reporter: Rahul Challapalli


As part of verifying DRILL-3562, I came up with the below 2 files

File 1:
{code}
{ "a": { "b": { "c": [] } } }
{code}

File 2:
{code}
{ "a": { "b": { "c": [1] } } }
{code}

Now, the below query works on individual files; however, when I run the query on a 
directory containing both files, I randomly hit the below error
{code}
select FLATTEN(t.a.b.c) AS c from 
dfs.`/drill/testdata/json_kvgenflatten/drill3562` t;
Error: SYSTEM ERROR: OversizedAllocationException: Unable to expand the buffer. 
Max allowed buffer size is reached.

Fragment 0:0

[Error Id: e556243b-7ad6-4131-81f7-e8b225c9c8bc on qa-node182.qa.lab:31010] 
(state=,code=0)
{code}

This could be related to the fix for DRILL-3562



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (DRILL-4812) Wildcard queries fail on Windows

2017-03-29 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua closed DRILL-4812.
---

Verified on Windows 10 by creating a nested table from nation (a 25-row parquet 
file) with a directory structure of 2 level-1 dirs (a, b), 3 level-2 dirs 
(1, 2, 3), and 3 level-3 dirs (yes, no, idk).

{code}
0: jdbc:drill:zk=local> select count(n_nationkey) from 
dfs.root.`/drill/nation/a/1/idk/part-m-0.parquet`;
+-+
| EXPR$0  |
+-+
| 25  |
+-+
1 row selected (0.178 seconds)
0: jdbc:drill:zk=local> select count(n_nationkey) from 
dfs.root.`/drill/nation/a/1/idk`;
+-+
| EXPR$0  |
+-+
| 25  |
+-+
1 row selected (0.169 seconds)
0: jdbc:drill:zk=local> select count(n_nationkey) from 
dfs.root.`/drill/nation/a/*/idk`;
+-+
| EXPR$0  |
+-+
| 75  |
+-+
1 row selected (0.167 seconds)
0: jdbc:drill:zk=local> select count(n_nationkey) from 
dfs.root.`/drill/nation/*/*/idk`;
+-+
| EXPR$0  |
+-+
| 150 |
+-+
1 row selected (0.226 seconds)
0: jdbc:drill:zk=local> select count(n_nationkey) from 
dfs.root.`/drill/nation/*/*/*`;
+-+
| EXPR$0  |
+-+
| 450 |
+-+
1 row selected (0.225 seconds)
{code}

> Wildcard queries fail on Windows
> 
>
> Key: DRILL-4812
> URL: https://issues.apache.org/jira/browse/DRILL-4812
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.7.0
> Environment: Windows 7
>Reporter: Mike Lavender
>  Labels: easyfix, easytest, ready-to-commit, windows
> Fix For: 1.10.0
>
>
> Wildcards within the path of a query are not handled on Windows and result in 
> a "String index out of range" exception.
> for example:
> {noformat}
> 0: jdbc:drill:zk=local> SELECT SUM(qty) as num FROM 
> dfs.parquet.`/trends/2016/1/*/*/3701`;
> Error: VALIDATION ERROR: String index out of range: -1
> SQL Query null
> {noformat}
> 
> The problem exists within:
> exec\java-exec\src\main\java\org\apache\drill\exec\store\dfs\FileSelection.java
> private static Path handleWildCard(final String root)
> This function looks for the index of the system-specific PATH_SEPARATOR, 
> which on Windows is '\' (from System.getProperty("file.separator")). The 
> path passed in to handleWildCard will never contain that kind of path 
> separator, as the Path constructor (from org.apache.hadoop.fs.Path) sets all 
> the path separators to '/'.
> NOTE:
> private static String removeLeadingSlash(String path)
> in that same file explicitly looks for '/' and does not use the 
> system-specific PATH_SEPARATOR.
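
A hedged sketch of the fix the description implies: derive the wildcard-free prefix by scanning for '/', which org.apache.hadoop.fs.Path guarantees, instead of the platform file.separator. The method body below is illustrative, not the committed patch:

{code}
// Sketch: find the directory prefix before the first wildcard using '/',
// the separator Path normalizes to, rather than '\' on Windows.
static String wildcardFreePrefix(String root) {
    int wildcard = root.indexOf('*');
    if (wildcard < 0) {
        return root;                      // no wildcard: nothing to strip
    }
    int lastSlash = root.lastIndexOf('/', wildcard);
    return lastSlash < 0 ? "" : root.substring(0, lastSlash);
}
{code}

For example, wildcardFreePrefix("/trends/2016/1/*/*/3701") yields "/trends/2016/1" on any platform.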



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5290) Provide an option to build operator table once for built-in static functions and reuse it across queries.

2017-03-29 Thread Kunal Khatua (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947967#comment-15947967
 ] 

Kunal Khatua commented on DRILL-5290:
-

Verified through low-latency tests that Drill is able to reuse the operator 
table and, as a result, run faster by avoiding the overhead of reconstructing 
the table.

> Provide an option to build operator table once for built-in static functions 
> and reuse it across queries.
> -
>
> Key: DRILL-5290
> URL: https://issues.apache.org/jira/browse/DRILL-5290
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
>
> Currently, DrillOperatorTable, which contains the standard SQL operators and 
> functions plus Drill User Defined Functions (UDFs, built-in and dynamic), gets 
> built for each query as part of creating the QueryContext. This is an expensive 
> operation (~30 ms to build) that allocates ~2 MB on the heap per query. For 
> high-throughput, low-latency operational queries, we quickly run out of heap 
> memory, causing JVM hangs. Build the operator table once during startup for 
> the static built-in functions and save it in DrillbitContext, so we can reuse 
> it across queries.
> Provide a system/session option to not use dynamic UDFs, so we can use the 
> operator table saved in DrillbitContext and avoid rebuilding it each time.
> *Please note, the changes add a new option, exec.udf.use_dynamic, which needs 
> to be documented.*
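
A minimal sketch of the reuse pattern described above, under illustrative names (the real change stores the table in DrillbitContext and consults the exec.udf.use_dynamic option):

{code}
// Sketch: pay the ~30 ms / ~2 MB table construction once at startup and
// hand out the cached instance whenever dynamic UDFs are disabled.
final class OperatorTableHolder {
    private final Object staticTable = buildOperatorTable();   // built once

    Object tableFor(boolean useDynamicUdfs) {
        // Dynamic UDFs can change between queries, so only then rebuild.
        return useDynamicUdfs ? buildOperatorTable() : staticTable;
    }

    private static Object buildOperatorTable() {
        // Stand-in for building DrillOperatorTable from the function registry.
        return new Object();
    }
}
{code}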



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (DRILL-5290) Provide an option to build operator table once for built-in static functions and reuse it across queries.

2017-03-29 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua closed DRILL-5290.
---

> Provide an option to build operator table once for built-in static functions 
> and reuse it across queries.
> -
>
> Key: DRILL-5290
> URL: https://issues.apache.org/jira/browse/DRILL-5290
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
>
> Currently, DrillOperatorTable, which contains the standard SQL operators and 
> functions plus Drill User Defined Functions (UDFs, built-in and dynamic), gets 
> built for each query as part of creating the QueryContext. This is an expensive 
> operation (~30 ms to build) that allocates ~2 MB on the heap per query. For 
> high-throughput, low-latency operational queries, we quickly run out of heap 
> memory, causing JVM hangs. Build the operator table once during startup for 
> the static built-in functions and save it in DrillbitContext, so we can reuse 
> it across queries.
> Provide a system/session option to not use dynamic UDFs, so we can use the 
> operator table saved in DrillbitContext and avoid rebuilding it each time.
> *Please note, the changes add a new option, exec.udf.use_dynamic, which needs 
> to be documented.*



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5257) Provide option to save query profiles sync, async or not at all

2017-03-29 Thread Kunal Khatua (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947962#comment-15947962
 ] 

Kunal Khatua commented on DRILL-5257:
-

Verified that the new features work and that query profile writes can be skipped.

> Provide option to save query profiles sync, async or not at all
> ---
>
> Key: DRILL-5257
> URL: https://issues.apache.org/jira/browse/DRILL-5257
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> DRILL-5123 improved perceived query performance by writing the query profile 
> after sending the final response to the client. This is the desired behavior 
> in most situations. However, some tests want to verify certain results by 
> reading the query profile from disk. Doing so works best when the query 
> profile is written before returning the final query results.
> This ticket requests that the timing of the query profile writing be 
> configurable:
> * Sync: write the profile before the final client response.
> * Async: write the profile after the final client response. (Default)
> * None: don't write the query profile at all.
> A config option (boot time? run time?) should control this. A boot-time 
> option is fine for testing.
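
A small sketch of the requested knob, with illustrative names (the actual option name and plumbing are left to the implementation):

{code}
// Sketch: order profile writing relative to the final client response.
enum ProfileSaveMode { SYNC, ASYNC, NONE }

final class QueryCompletion {
    void finish(ProfileSaveMode mode) {
        switch (mode) {
            case SYNC:  writeProfile(); respond(); break;  // tests can read the profile immediately
            case ASYNC: respond(); writeProfile(); break;  // default: client sees results sooner
            case NONE:  respond(); break;                  // no profile written at all
        }
    }

    private void writeProfile() { /* persist the query profile */ }
    private void respond()      { /* send the final response to the client */ }
}
{code}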



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (DRILL-5257) Provide option to save query profiles sync, async or not at all

2017-03-29 Thread Kunal Khatua (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Khatua closed DRILL-5257.
---

> Provide option to save query profiles sync, async or not at all
> ---
>
> Key: DRILL-5257
> URL: https://issues.apache.org/jira/browse/DRILL-5257
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> DRILL-5123 improved perceived query performance by writing the query profile 
> after sending the final response to the client. This is the desired behavior 
> in most situations. However, some tests want to verify certain results by 
> reading the query profile from disk. Doing so works best when the query 
> profile is written before returning the final query results.
> This ticket requests that the timing of the query profile writing be 
> configurable:
> * Sync: write the profile before the final client response.
> * Async: write the profile after the final client response. (Default)
> * None: don't write the query profile at all.
> A config option (boot time? run time?) should control this. A boot-time 
> option is fine for testing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (DRILL-4095) Regression: TPCDS query 30 throws can not plan exception

2017-03-29 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli closed DRILL-4095.

Resolution: Fixed

TPCDS query 30 works with the Drill 1.10 release. Automated the test. 

Also, Vicky draws a comparison in the description of this jira between TPCDS 
query 30 and a TPCH query, which I think are not the same. The TPCH query is 
still not supported.

> Regression: TPCDS query 30 throws can not plan exception
> 
>
> Key: DRILL-4095
> URL: https://issues.apache.org/jira/browse/DRILL-4095
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Victoria Markman
>
> Fails in: mapr-drill-1.2.0.201510190924-1.noarch.rpm
>  mapr-drill-1.3.0.20152348-1.noarch.rpm
>   
> It looks like this bug was fixed as  part of DRILL-2949 , but has regressed 
> since then.
> Reproduction with data shipped with drill (TPCH):
> {code}
> 0: jdbc:drill:schema=dfs> select 
> . . . . . . . . . . . . > count(*)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > cp.`tpch/nation.parquet` a,
> . . . . . . . . . . . . > cp.`tpch/region.parquet` b
> . . . . . . . . . . . . > where 
> . . . . . . . . . . . . > a.n_regionkey > (select max(b.r_regionkey) 
> from cp.`tpch/region.parquet` b where b.r_nationkey = a.n_nationkey);
> Error: UNSUPPORTED_OPERATION ERROR: This query cannot be planned possibly due 
> to either a cartesian join or an inequality join
> [Error Id: a52ca497-f654-46ba-b1a7-20d1c0147129 on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> {code}
> If you remove table from the join, query executes:
> {code}
> 0: jdbc:drill:schema=dfs> select 
> . . . . . . . . . . . . > count(*)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > cp.`tpch/nation.parquet` a
> . . . . . . . . . . . . > --cp.`tpch/region.parquet` b
> . . . . . . . . . . . . > where 
> . . . . . . . . . . . . > a.n_regionkey > (select max(b.r_regionkey) 
> from cp.`tpch/region.parquet` b where b.r_nationkey = a.n_nationkey);
> +-+
> | EXPR$0  |
> +-+
> | 0   |
> +-+
> 1 row selected (0.921 seconds)
> {code}
> This affects TPCDS query 30:
> {code}
> 0: jdbc:drill:schema=dfs> WITH customer_total_return
> . . . . . . . . . . . . >  AS (SELECT wr_returning_customer_sk AS 
> ctr_customer_sk,
> . . . . . . . . . . . . > ca_state AS 
> ctr_state,
> . . . . . . . . . . . . > Sum(wr_return_amt)   AS 
> ctr_total_return
> . . . . . . . . . . . . >  FROM   web_returns,
> . . . . . . . . . . . . > date_dim,
> . . . . . . . . . . . . > customer_address
> . . . . . . . . . . . . >  WHERE  wr_returned_date_sk = d_date_sk
> . . . . . . . . . . . . > AND d_year = 2000
> . . . . . . . . . . . . > AND wr_returning_addr_sk = 
> ca_address_sk
> . . . . . . . . . . . . >  GROUP  BY wr_returning_customer_sk,
> . . . . . . . . . . . . >ca_state)
> . . . . . . . . . . . . > SELECT c_customer_id,
> . . . . . . . . . . . . >c_salutation,
> . . . . . . . . . . . . >c_first_name,
> . . . . . . . . . . . . >c_last_name,
> . . . . . . . . . . . . >c_preferred_cust_flag,
> . . . . . . . . . . . . >c_birth_day,
> . . . . . . . . . . . . >c_birth_month,
> . . . . . . . . . . . . >c_birth_year,
> . . . . . . . . . . . . >c_birth_country,
> . . . . . . . . . . . . >c_login,
> . . . . . . . . . . . . >c_email_address,
> . . . . . . . . . . . . >c_last_review_date,
> . . . . . . . . . . . . >ctr_total_return
> . . . . . . . . . . . . > FROM   customer_total_return ctr1,
> . . . . . . . . . . . . >customer_address,
> . . . . . . . . . . . . >customer
> . . . . . . . . . . . . > WHERE  ctr1.ctr_total_return > (SELECT 
> Avg(ctr_total_return) * 1.2
> . . . . . . . . . . . . > FROM   
> customer_total_return ctr2
> . . . . . . . . . . . . > WHERE  
> ctr1.ctr_state = ctr2.ctr_state)
> . . . . . . . . . . . . >AND ca_address_sk = c_current_addr_sk
> . . . . . . . . . . . . >AND ca_state = 'IN'
> . . . . . . . . . . . . >AND ctr1.ctr_customer_sk = c_customer_sk
> . . . . . . . . . . . . > LIMIT 100;
> Error: UNSUPPORTED_OPERATION ERROR: This query cannot be planned possibly due 
> to either a cartesian join or an inequality join
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (DRILL-4980) Upgrading of the approach of parquet date correctness status detection

2017-03-29 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli closed DRILL-4980.


Tests for DRILL-4203 also apply to this jira

> Upgrading of the approach of parquet date correctness status detection
> --
>
> Key: DRILL-4980
> URL: https://issues.apache.org/jira/browse/DRILL-4980
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.10.0
>
>
> This jira is an addition to 
> [DRILL-4203|https://issues.apache.org/jira/browse/DRILL-4203].
> The date correctness label for newly generated parquet files should be 
> upgraded. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (DRILL-4217) Query parquet file treat INT_16 & INT_8 as INT32

2017-03-29 Thread Rahul Challapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Challapalli closed DRILL-4217.


Verified and automated DRILL-4764, which covers this issue as well.

> Query parquet file treat INT_16 & INT_8 as INT32
> 
>
> Key: DRILL-4217
> URL: https://issues.apache.org/jira/browse/DRILL-4217
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Data Types
>Reporter: Low Chin Wei
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
>
> Encountered this issue while trying to query a parquet file:
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> UnsupportedOperationException: unsupported type: INT32 INT_16 Fragment 1:1 
> We can treat the following field types as INTEGER until support for Short & 
> Byte is implemented: 
> - INT32 INT_16
> - INT32 INT_8
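
A hedged sketch of the proposed mapping (class and method names are
illustrative, not the committed fix): widen Parquet's INT32 columns annotated
INT_8 or INT_16 to Drill's 32-bit INT minor type until real SMALLINT/TINYINT
support exists.

{code}
import org.apache.drill.common.types.TypeProtos.MinorType;
import org.apache.parquet.format.ConvertedType;

public class SmallIntWidening {
  public static MinorType widen(ConvertedType convertedType) {
    switch (convertedType) {
      case INT_8:
      case INT_16:
        return MinorType.INT;  // treat as plain INTEGER for now
      default:
        throw new UnsupportedOperationException("unsupported type: INT32 " + convertedType);
    }
  }
}
{code}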



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (DRILL-4675) Root allocator should prevent allocating more than the available direct memory

2017-03-29 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong reassigned DRILL-4675:
---

Assignee: Boaz Ben-Zvi  (was: Karthikeyan Manivannan)

[~ben-zvi] - assigning to you since you're working on spill to disk for hash 
agg. The stack trace in the issue indicates that the OOM is occurring in hash 
agg, which applies to the first query, since it has a group by. It would be 
good to try this query with your spill-to-disk hash agg code to see if it 
addresses that problem.

> Root allocator should prevent allocating more than the available direct memory
> --
>
> Key: DRILL-4675
> URL: https://issues.apache.org/jira/browse/DRILL-4675
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow, Execution - Monitoring
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Boaz Ben-Zvi
> Attachments: error.log
>
>
> git commit # : 09b262776e965ea17a6a863801f7e1ee3e5b3d5a
> I ran the 2 queries below (each query duplicated 10 times, so 20 queries in 
> total) using 10 different clients on an 8-node cluster. The drillbit on one 
> of the nodes hits an OOM error. The allocator should have caught this earlier.
> Query 1:
> {code}
> select count(*) 
> from (
> select l_orderkey, l_partkey, l_suppkey 
> from lineitem_nocompression_256
> group by l_orderkey, l_partkey, l_suppkey
> ) s
> {code} 
> Query 2 :
> {code}
> select count(*) from
> dfs.concurrency.customer_nocompression_256_filtered c,
> dfs.concurrency.orders_nocompression_256 o,
> dfs.concurrency.lineitem_nocompression_256 l
> where
> c.c_custkey = o.o_custkey
> and l.l_orderkey = o.o_orderkey
> {code}
> Exception from the logs 
> {code}
> Failure allocating buffer.
> [Error Id: cd71a6a0-7f41-4fe4-8bbb-294119adfebf ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267)
>  [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_51]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_51]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Failure 
> allocating buffer.
> at 
> io.netty.buffer.PooledByteBufAllocatorL.allocate(PooledByteBufAllocatorL.java:64)
>  ~[drill-memory-base-1.7.0-SNAPSHOT.jar:4.0.27.Final]
> at 
> org.apache.drill.exec.memory.AllocationManager.&lt;init&gt;(AllocationManager.java:80)
>  ~[drill-memory-base-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation(BaseAllocator.java:239)
>  ~[drill-memory-base-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:221) 
> ~[drill-memory-base-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:191) 
> ~[drill-memory-base-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.IntVector.allocateBytes(IntVector.java:200) 
> ~[vector-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.vector.IntVector.allocateNew(IntVector.java:182) 
> ~[vector-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.test.generated.HashTableGen54.allocMetadataVector(HashTableTemplate.java:757)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashTableGen54.resizeAndRehashIfNeeded(HashTableTemplate.java:722)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashTableGen54.insertEntry(HashTableTemplate.java:631)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashTableGen54.put(HashTableTemplate.java:609)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashTableGen54.put(HashTableTemplate.java:542)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen52.checkGroupAndAggrValues(HashAggTemplate.java:542)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.HashAggregatorGen52.doWork(HashAggTemplate.java:300)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:133)
>  ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> 
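
For context, a minimal illustration of the kind of guard the summary asks for
(all names are hypothetical, not Drill's allocator API): reserve against a
direct-memory budget up front and fail fast, instead of letting the buffer
allocation hit the hard OOM later.

{code}
import java.util.concurrent.atomic.AtomicLong;

class RootAllocatorGuard {
  private final long directMemoryLimit;  // e.g. derived from -XX:MaxDirectMemorySize
  private final AtomicLong allocated = new AtomicLong();

  RootAllocatorGuard(long directMemoryLimit) {
    this.directMemoryLimit = directMemoryLimit;
  }

  void reserve(long bytes) {
    long newTotal = allocated.addAndGet(bytes);
    if (newTotal > directMemoryLimit) {
      allocated.addAndGet(-bytes);  // roll back the failed reservation
      throw new IllegalStateException("Allocating " + bytes
          + " bytes would exceed the direct memory limit of " + directMemoryLimit);
    }
  }

  void release(long bytes) {
    allocated.addAndGet(-bytes);
  }
}
{code}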

[jira] [Commented] (DRILL-5378) Put more information into SchemaChangeException when HashJoin hit SchemaChangeException

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947339#comment-15947339
 ] 

ASF GitHub Bot commented on DRILL-5378:
---

Github user jinfengni commented on the issue:

https://github.com/apache/drill/pull/801
  
If we change the flow to handle schema change, it makes sense to add new 
tests. Here, we only change the error message, not how we handle schema 
change. I'm not inclined to add new tests just to verify the error message, 
since the error message might evolve over time. 


> Put more information into SchemaChangeException when HashJoin hit 
> SchemaChangeException
> ---
>
> Key: DRILL-5378
> URL: https://issues.apache.org/jira/browse/DRILL-5378
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>Priority: Minor
>
> HashJoin currently does not allow schema change in either the build side or 
> the probe side. When HashJoin hits a SchemaChangeException in the middle of 
> execution, Drill reports a brief error message about the 
> SchemaChangeException, without providing any information about what schemas 
> are in the incoming batches. That makes it hard to analyze the error and 
> understand what's going on. 
> It probably makes sense to put the two differing schemas in the error 
> message, so that the user can get a better idea about the schema change. 
> Before Drill can provide support for schema change in HashJoin, the detailed 
> error message would help users debug the error. 
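
One possible shape for the richer message, sketched under the assumption that
both schemas are available as BatchSchema objects at the point of failure (the
helper below is illustrative, not the actual patch):

{code}
import org.apache.drill.exec.exception.SchemaChangeException;
import org.apache.drill.exec.record.BatchSchema;

class HashJoinSchemaErrors {
  static SchemaChangeException schemaChanged(String side, BatchSchema prior, BatchSchema incoming) {
    return new SchemaChangeException(String.format(
        "Hash join does not support schema changes in the %s side. Prior schema: %s. New schema: %s.",
        side, prior, incoming));
  }
}
{code}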



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5356) Refactor Parquet Record Reader

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947294#comment-15947294
 ] 

ASF GitHub Bot commented on DRILL-5356:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/789#discussion_r108561492
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetRecordReader.java
 ---
@@ -307,164 +231,49 @@ public FragmentContext getFragmentContext() {
 return fragmentContext;
   }
 
-  /**
-   * Returns data type length for a given {@see ColumnDescriptor} and its corresponding
-   * {@see SchemaElement}. Neither is enough information alone as the max
-   * repetition level (indicating if it is an array type) is in the ColumnDescriptor and
-   * the length of a fixed width field is stored at the schema level.
-   *
-   * @return the length if fixed width, else -1
-   */
-  private int getDataTypeLength(ColumnDescriptor column, SchemaElement se) {
-    if (column.getType() != PrimitiveType.PrimitiveTypeName.BINARY) {
-      if (column.getMaxRepetitionLevel() > 0) {
-        return -1;
-      }
-      if (column.getType() == PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY) {
-        return se.getType_length() * 8;
-      } else {
-        return getTypeLengthInBits(column.getType());
-      }
-    } else {
-      return -1;
-    }
-  }
-
-  @SuppressWarnings({ "resource", "unchecked" })
   @Override
   public void setup(OperatorContext operatorContext, OutputMutator output) throws ExecutionSetupException {
     this.operatorContext = operatorContext;
-    if (!isStarQuery()) {
-      columnsFound = new boolean[getColumns().size()];
-      nullFilledVectors = new ArrayList<>();
+    if (isStarQuery()) {
+      schema = new ParquetSchema(fragmentContext.getOptions(), rowGroupIndex);
+    } else {
+      schema = new ParquetSchema(fragmentContext.getOptions(), getColumns());
--- End diff --

Why do we need to pass rowGroupIndex in one case and not the other? Can we add 
comments here? Is it possible to have a single constructor for ParquetSchema?
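
For illustration, one way the two constructors could be unified, as the
comment suggests (a sketch only, not the actual patch): always pass
rowGroupIndex and treat a null column list as a star query.

{code}
public ParquetSchema(OptionManager options, int rowGroupIndex,
                     Collection<SchemaPath> selectedCols) {
  this.options = options;
  this.rowGroupIndex = rowGroupIndex;
  this.selectedCols = selectedCols;  // null means a star query
  if (selectedCols != null) {
    columnsFound = new boolean[selectedCols.size()];
  }
}
{code}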


> Refactor Parquet Record Reader
> --
>
> Key: DRILL-5356
> URL: https://issues.apache.org/jira/browse/DRILL-5356
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> The Parquet record reader class is a key part of Drill that has evolved over 
> time to become somewhat hard to follow.
> A number of us are working on Parquet-related tasks and find we have to spend 
> an uncomfortable amount of time trying to understand the code. In particular, 
> this writer needs to figure out how to convince the reader to provide 
> higher-density record batches.
> Rather than continue to decipher the complex code multiple times, this ticket 
> requests to refactor the code to make it functionally identical, but 
> structurally cleaner. The result will be faster time to value when working 
> with this code.
> This is a lower-priority change and will be coordinated with others working 
> on this code base. This ticket is only for the record reader class itself; it 
> does not include the various readers and writers that Parquet uses since 
> another project is actively modifying those classes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5356) Refactor Parquet Record Reader

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947298#comment-15947298
 ] 

ASF GitHub Bot commented on DRILL-5356:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/789#discussion_r108560194
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetSchema.java
 ---
@@ -0,0 +1,207 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.columnreaders;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.parquet.ParquetReaderUtility;
+import org.apache.drill.exec.vector.NullableIntVector;
+import org.apache.parquet.column.ColumnDescriptor;
+import org.apache.parquet.format.SchemaElement;
+import org.apache.parquet.hadoop.metadata.BlockMetaData;
+import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
+import org.apache.parquet.hadoop.metadata.ParquetMetadata;
+
+import com.google.common.collect.Lists;
+
+/**
+ * Mapping from the schema of the Parquet file to that of the record reader
+ * to the schema that Drill and the Parquet reader uses.
+ */
+
+public class ParquetSchema {
+  private Collection<SchemaPath> selectedCols;
+  // This is a parallel list to the columns list above, it is used to determine
+  // the subset of the project pushdown columns that do not appear in this file
+  private boolean[] columnsFound;
+  private ParquetMetadata footer;
+  private Map<String, SchemaElement> schemaElements;
+  private int columnsToScan;
+  private List<ColumnDescriptor> columns;
+  private List<ParquetColumnMetadata> columnMd = new ArrayList<>();
+  private int bitWidthAllFixedFields;
+  private boolean allFieldsFixedLength = true;
+  private long groupRecordCount;
+  private int recordsPerBatch;
+  private int rowGroupIndex;
+  private final OptionManager options;
+
+  public ParquetSchema(OptionManager options, int rowGroupIndex) {
+selectedCols = null;
+this.rowGroupIndex = rowGroupIndex;
+this.options = options;
+  }
+
+  public ParquetSchema(OptionManager options, Collection<SchemaPath> selectedCols) {
+this.options = options;
--- End diff --

It is not clear which constructor is supposed to be used when. Please add 
some comments. Why is rowGroupIndex not needed in the second case?


> Refactor Parquet Record Reader
> --
>
> Key: DRILL-5356
> URL: https://issues.apache.org/jira/browse/DRILL-5356
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> The Parquet record reader class is a key part of Drill that has evolved over 
> time to become somewhat hard to follow.
> A number of us are working on Parquet-related tasks and find we have to spend 
> an uncomfortable amount of time trying to understand the code. In particular, 
> this writer needs to figure out how to convince the reader to provide 
> higher-density record batches.
> Rather than continue to decipher the complex code multiple times, this ticket 
> requests to refactor the code to make 

[jira] [Commented] (DRILL-5356) Refactor Parquet Record Reader

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947296#comment-15947296
 ] 

ASF GitHub Bot commented on DRILL-5356:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/789#discussion_r108667588
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetInternalsTest.java
 ---
@@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet;
+
+import static org.junit.Assert.*;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.drill.TestBuilder;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.test.FixtureBuilder;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+public class ParquetInternalsTest extends ClusterTest {
+
+  @BeforeClass
+  public static void setup( ) throws Exception {
+FixtureBuilder builder = ClusterFixture.builder()
+  // Set options, etc.
+  ;
+startCluster(builder);
+  }
+
+  @Test
+  public void testFixedWidth() throws Exception {
+String sql = "SELECT l_orderkey, l_partkey, l_suppkey, l_linenumber, 
l_quantity\n" +
+ "FROM `cp`.`tpch/lineitem.parquet` LIMIT 20";
+//client.queryBuilder().sql(sql).printCsv();
+
+Map<SchemaPath, TypeProtos.MajorType> typeMap = new HashMap<>();
+typeMap.put(TestBuilder.parsePath("l_orderkey"), Types.required(TypeProtos.MinorType.INT));
+typeMap.put(TestBuilder.parsePath("l_partkey"), Types.required(TypeProtos.MinorType.INT));
+typeMap.put(TestBuilder.parsePath("l_suppkey"), Types.required(TypeProtos.MinorType.INT));
+typeMap.put(TestBuilder.parsePath("l_linenumber"), Types.required(TypeProtos.MinorType.INT));
+typeMap.put(TestBuilder.parsePath("l_quantity"), Types.required(TypeProtos.MinorType.FLOAT8));
+client.testBuilder()
+  .sqlQuery(sql)
+  .unOrdered()
+  .csvBaselineFile("parquet/expected/fixedWidth.csv")
+  .baselineColumns("l_orderkey", "l_partkey", "l_suppkey", 
"l_linenumber", "l_quantity")
+  .baselineTypes(typeMap)
+  .build()
+  .run();
+  }
+
+
+  @Test
+  public void testVariableWidth() throws Exception {
+String sql = "SELECT s_name, s_address, s_phone, s_comment\n" +
+ "FROM `cp`.`tpch/supplier.parquet` LIMIT 20";
+client.queryBuilder().sql(sql).printCsv();
+
+Map<SchemaPath, TypeProtos.MajorType> typeMap = new HashMap<>();
+typeMap.put(TestBuilder.parsePath("s_name"), Types.required(TypeProtos.MinorType.VARCHAR));
+typeMap.put(TestBuilder.parsePath("s_address"), Types.required(TypeProtos.MinorType.VARCHAR));
+typeMap.put(TestBuilder.parsePath("s_phone"), Types.required(TypeProtos.MinorType.VARCHAR));
+typeMap.put(TestBuilder.parsePath("s_comment"), Types.required(TypeProtos.MinorType.VARCHAR));
+client.testBuilder()
+  .sqlQuery(sql)
+  .unOrdered()
+  .csvBaselineFile("parquet/expected/variableWidth.csv")
+  .baselineColumns("s_name", "s_address", "s_phone", "s_comment")
+  .baselineTypes(typeMap)
+  .build()
+  .run();
+  }
+
+  @Test
+  public void testMixedWidth() throws Exception {
+String sql = "SELECT s_suppkey, s_name, s_address, s_phone, 
s_acctbal\n" +
+ "FROM `cp`.`tpch/supplier.parquet` LIMIT 20";
+//client.queryBuilder().sql(sql).printCsv();
+
+Map<SchemaPath, TypeProtos.MajorType> typeMap = new HashMap<>();
+typeMap.put(TestBuilder.parsePath("s_suppkey"), 

[jira] [Commented] (DRILL-5356) Refactor Parquet Record Reader

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947291#comment-15947291
 ] 

ASF GitHub Bot commented on DRILL-5356:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/789#discussion_r108667453
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetInternalsTest.java
 ---
@@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet;
+
+import static org.junit.Assert.*;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.drill.TestBuilder;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.test.FixtureBuilder;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+public class ParquetInternalsTest extends ClusterTest {
+
+  @BeforeClass
+  public static void setup( ) throws Exception {
+FixtureBuilder builder = ClusterFixture.builder()
+  // Set options, etc.
+  ;
+startCluster(builder);
+  }
+
+  @Test
+  public void testFixedWidth() throws Exception {
+String sql = "SELECT l_orderkey, l_partkey, l_suppkey, l_linenumber, 
l_quantity\n" +
+ "FROM `cp`.`tpch/lineitem.parquet` LIMIT 20";
+//client.queryBuilder().sql(sql).printCsv();
+
+Map<SchemaPath, TypeProtos.MajorType> typeMap = new HashMap<>();
+typeMap.put(TestBuilder.parsePath("l_orderkey"), Types.required(TypeProtos.MinorType.INT));
+typeMap.put(TestBuilder.parsePath("l_partkey"), Types.required(TypeProtos.MinorType.INT));
+typeMap.put(TestBuilder.parsePath("l_suppkey"), Types.required(TypeProtos.MinorType.INT));
+typeMap.put(TestBuilder.parsePath("l_linenumber"), Types.required(TypeProtos.MinorType.INT));
+typeMap.put(TestBuilder.parsePath("l_quantity"), Types.required(TypeProtos.MinorType.FLOAT8));
+client.testBuilder()
+  .sqlQuery(sql)
+  .unOrdered()
+  .csvBaselineFile("parquet/expected/fixedWidth.csv")
+  .baselineColumns("l_orderkey", "l_partkey", "l_suppkey", "l_linenumber", "l_quantity")
+  .baselineTypes(typeMap)
+  .build()
+  .run();
+  }
+
+
+  @Test
+  public void testVariableWidth() throws Exception {
+String sql = "SELECT s_name, s_address, s_phone, s_comment\n" +
+ "FROM `cp`.`tpch/supplier.parquet` LIMIT 20";
+client.queryBuilder().sql(sql).printCsv();
--- End diff --

Do you want to comment out this line?


> Refactor Parquet Record Reader
> --
>
> Key: DRILL-5356
> URL: https://issues.apache.org/jira/browse/DRILL-5356
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> The Parquet record reader class is a key part of Drill that has evolved over 
> time to become somewhat hard to follow.
> A number of us are working on Parquet-related tasks and find we have to spend 
> an uncomfortable amount of time trying to understand the code. In particular, 
> this writer needs to figure out how to convince the reader to provide 
> higher-density record batches.
> Rather than continue to decipher the complex code multiple times, this ticket 
> requests to refactor the code to make it functionally identical, but 
> structurally cleaner. The result will be faster time to value when working 
> with this code.
> This is a lower-priority change and will be coordinated with others working 
> on this code base. This ticket is only for the 

[jira] [Commented] (DRILL-5356) Refactor Parquet Record Reader

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947297#comment-15947297
 ] 

ASF GitHub Bot commented on DRILL-5356:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/789#discussion_r108557760
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetRecordReader.java
 ---
@@ -307,164 +231,49 @@ public FragmentContext getFragmentContext() {
 return fragmentContext;
   }
 
-  /**
-   * Returns data type length for a given {@see ColumnDescriptor} and its corresponding
-   * {@see SchemaElement}. Neither is enough information alone as the max
-   * repetition level (indicating if it is an array type) is in the ColumnDescriptor and
-   * the length of a fixed width field is stored at the schema level.
-   *
-   * @return the length if fixed width, else -1
-   */
-  private int getDataTypeLength(ColumnDescriptor column, SchemaElement se) {
-    if (column.getType() != PrimitiveType.PrimitiveTypeName.BINARY) {
-      if (column.getMaxRepetitionLevel() > 0) {
-        return -1;
-      }
-      if (column.getType() == PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY) {
-        return se.getType_length() * 8;
-      } else {
-        return getTypeLengthInBits(column.getType());
-      }
-    } else {
-      return -1;
-    }
-  }
-
-  @SuppressWarnings({ "resource", "unchecked" })
   @Override
   public void setup(OperatorContext operatorContext, OutputMutator output) throws ExecutionSetupException {
     this.operatorContext = operatorContext;
-    if (!isStarQuery()) {
-      columnsFound = new boolean[getColumns().size()];
-      nullFilledVectors = new ArrayList<>();
+    if (isStarQuery()) {
+      schema = new ParquetSchema(fragmentContext.getOptions(), rowGroupIndex);
+    } else {
+      schema = new ParquetSchema(fragmentContext.getOptions(), getColumns());
     }
-    columnStatuses = new ArrayList<>();
-    List<ColumnDescriptor> columns = footer.getFileMetaData().getSchema().getColumns();
-    allFieldsFixedLength = true;
-    ColumnDescriptor column;
-    ColumnChunkMetaData columnChunkMetaData;
-    int columnsToScan = 0;
-    mockRecordsRead = 0;
 
-    MaterializedField field;
+    //ParquetMetadataConverter metaConverter = new ParquetMetadataConverter();
+    //FileMetaData fileMetaData;
 
--- End diff --

Instead of commenting them out, remove these lines if they are not needed.


> Refactor Parquet Record Reader
> --
>
> Key: DRILL-5356
> URL: https://issues.apache.org/jira/browse/DRILL-5356
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> The Parquet record reader class is a key part of Drill that has evolved over 
> time to become somewhat hard to follow.
> A number of us are working on Parquet-related tasks and find we have to spend 
> an uncomfortable amount of time trying to understand the code. In particular, 
> this writer needs to figure out how to convince the reader to provide 
> higher-density record batches.
> Rather than continue to decipher the complex code multiple times, this ticket 
> requests to refactor the code to make it functionally identical, but 
> structurally cleaner. The result will be faster time to value when working 
> with this code.
> This is a lower-priority change and will be coordinated with others working 
> on this code base. This ticket is only for the record reader class itself; it 
> does not include the various readers and writers that Parquet uses since 
> another project is actively modifying those classes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5356) Refactor Parquet Record Reader

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947295#comment-15947295
 ] 

ASF GitHub Bot commented on DRILL-5356:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/789#discussion_r108559596
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetSchema.java
 ---
@@ -0,0 +1,207 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.columnreaders;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.parquet.ParquetReaderUtility;
+import org.apache.drill.exec.vector.NullableIntVector;
+import org.apache.parquet.column.ColumnDescriptor;
+import org.apache.parquet.format.SchemaElement;
+import org.apache.parquet.hadoop.metadata.BlockMetaData;
+import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
+import org.apache.parquet.hadoop.metadata.ParquetMetadata;
+
+import com.google.common.collect.Lists;
+
+/**
+ * Mapping from the schema of the Parquet file to that of the record reader
+ * to the schema that Drill and the Parquet reader uses.
+ */
+
+public class ParquetSchema {
+  private Collection<SchemaPath> selectedCols;
+  // This is a parallel list to the columns list above, it is used to determine
+  // the subset of the project pushdown columns that do not appear in this file
+  private boolean[] columnsFound;
+  private ParquetMetadata footer;
+  private Map<String, SchemaElement> schemaElements;
+  private int columnsToScan;
+  private List<ColumnDescriptor> columns;
+  private List<ParquetColumnMetadata> columnMd = new ArrayList<>();
+  private int bitWidthAllFixedFields;
+  private boolean allFieldsFixedLength = true;
+  private long groupRecordCount;
+  private int recordsPerBatch;
+  private int rowGroupIndex;
+  private final OptionManager options;
+
+  public ParquetSchema(OptionManager options, int rowGroupIndex) {
+selectedCols = null;
+this.rowGroupIndex = rowGroupIndex;
+this.options = options;
+  }
+
+  public ParquetSchema(OptionManager options, Collection<SchemaPath> selectedCols) {
+this.options = options;
+this.selectedCols = selectedCols;
+columnsFound = new boolean[selectedCols.size()];
+  }
+
+  public void buildSchema(ParquetMetadata footer, long batchSize) throws Exception {
+this.footer = footer;
+columns = footer.getFileMetaData().getSchema().getColumns();
+groupRecordCount = footer.getBlocks().get(rowGroupIndex).getRowCount();
+loadParquetSchema();
+computeFixedPart();
+//rowGroupOffset = footer.getBlocks().get(rowGroupIndex).getColumns().get(0).getFirstDataPageOffset();
+
+if (columnsToScan != 0 && allFieldsFixedLength) {
+  recordsPerBatch = (int) Math.min(Math.min(batchSize / bitWidthAllFixedFields,
+  footer.getBlocks().get(0).getColumns().get(0).getValueCount()), ParquetRecordReader.DEFAULT_RECORDS_TO_READ_IF_FIXED_WIDTH);
+} else {
+  recordsPerBatch = ParquetRecordReader.DEFAULT_RECORDS_TO_READ_IF_VARIABLE_WIDTH;
+}
+  }
+
+  private void loadParquetSchema() {
+// TODO - figure out how to deal with this 

[jira] [Commented] (DRILL-5356) Refactor Parquet Record Reader

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947292#comment-15947292
 ] 

ASF GitHub Bot commented on DRILL-5356:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/789#discussion_r108692365
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ReadState.java
 ---
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.columnreaders;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.parquet.ParquetReaderStats;
+import org.apache.drill.exec.vector.NullableIntVector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.parquet.column.ColumnDescriptor;
+import org.apache.parquet.hadoop.metadata.BlockMetaData;
+
+/**
+ * Internal state for reading from a Parquet file.
+ */
+
+public class ReadState {
+  private final ParquetSchema schema;
+  ParquetReaderStats parquetReaderStats;
+  private VarLenBinaryReader varLengthReader;
+  // For columns not found in the file, we need to return a schema element with
+  // the correct number of values at that position in the schema. Currently this
+  // requires a vector be present. Here is a list of all of these vectors that
+  // need only have their value count set at the end of each call to next(), as
+  // the values default to null.
+  private List<NullableIntVector> nullFilledVectors;
+  // Keeps track of the number of records returned in the case where only columns
+  // outside of the file were selected. No actual data needs to be read out of the
+  // file, we only need to return batches until we have 'read' the number of
+  // records specified in the row group metadata.
+  long mockRecordsRead;
+  private List<ColumnReader<?>> columnStatuses = new ArrayList<>();
+  private long numRecordsToRead; // number of records to read
+  private long totalRecordsRead;
+  boolean useAsyncColReader;
+
+  public ReadState(ParquetSchema schema, ParquetReaderStats parquetReaderStats, long numRecordsToRead, boolean useAsyncColReader) {
+this.schema = schema;
+this.parquetReaderStats = parquetReaderStats;
+this.useAsyncColReader = useAsyncColReader;
+if (! schema.isStarQuery()) {
+  nullFilledVectors = new ArrayList<>();
+}
+mockRecordsRead = 0;
+// Callers can pass -1 if they want to read all rows.
+if (numRecordsToRead == ParquetRecordReader.NUM_RECORDS_TO_READ_NOT_SPECIFIED) {
+  this.numRecordsToRead = schema.rowCount();
+} else {
+  assert (numRecordsToRead >= 0);
+  this.numRecordsToRead = Math.min(numRecordsToRead, schema.rowCount());
+}
+  }
+
+  @SuppressWarnings("unchecked")
+  public void buildReader(ParquetRecordReader reader, OutputMutator output) throws Exception {
+final ArrayList<VarLengthColumn<? extends ValueVector>> varLengthColumns = new ArrayList<>();
+// initialize all of the column read status objects
+BlockMetaData rowGroupMetadata = schema.getRowGroupMetadata();
+Map<String, Integer> columnChunkMetadataPositionsInList = schema.buildChunkMap(rowGroupMetadata);
+for (ParquetColumnMetadata colMd : schema.getColumnMetadata()) {
+  ColumnDescriptor column = colMd.column;
+  colMd.columnChunkMetaData = rowGroupMetadata.getColumns().get(
+      columnChunkMetadataPositionsInList.get(Arrays.toString(column.getPath())));
+  colMd.buildVector(output);
+  if (! colMd.isFixedLength( )) {
+// create a reader and add it to the appropriate list
+varLengthColumns.add(colMd.makeVariableWidthReader(reader));
+  } else if (colMd.isRepeated()) {
+

[jira] [Commented] (DRILL-5356) Refactor Parquet Record Reader

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947293#comment-15947293
 ] 

ASF GitHub Bot commented on DRILL-5356:
---

Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/789#discussion_r108693574
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/BatchReader.java
 ---
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet.columnreaders;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.base.Stopwatch;
+import com.google.common.collect.Lists;
+
+/**
+ * Base strategy for reading a batch of Parquet records.
+ */
+public abstract class BatchReader {
+
+  protected final ReadState readState;
+
+  public BatchReader(ReadState readState) {
+this.readState = readState;
+  }
+
+  public int readBatch() throws Exception {
+ColumnReader<?> firstColumnStatus = readState.getFirstColumnStatus();
+long recordsToRead = Math.min(getReadCount(firstColumnStatus), readState.getRecordsToRead());
+int readCount = readRecords(firstColumnStatus, recordsToRead);
+readState.fillNullVectors(readCount);
+return readCount;
+  }
+
+  protected abstract long getReadCount(ColumnReader<?> firstColumnStatus);
+
+  protected abstract int readRecords(ColumnReader<?> firstColumnStatus, long recordsToRead) throws Exception;
+
+  protected void readAllFixedFields(long recordsToRead) throws Exception {
+Stopwatch timer = Stopwatch.createStarted();
+if (readState.useAsyncColReader()) {
+  readAllFixedFieldsParallel(recordsToRead);
+} else {
+  readAllFixedFieldsSerial(recordsToRead);
+}
+readState.parquetReaderStats.timeFixedColumnRead.addAndGet(timer.elapsed(TimeUnit.NANOSECONDS));
+  }
+
+  protected void readAllFixedFieldsSerial(long recordsToRead) throws IOException {
+for (ColumnReader<?> crs : readState.getReaders()) {
+  crs.processPages(recordsToRead);
+}
+  }
+
+  protected void readAllFixedFieldsParallel(long recordsToRead) throws Exception {
+ArrayList<Future<Long>> futures = Lists.newArrayList();
+for (ColumnReader<?> crs : readState.getReaders()) {
+  Future<Long> f = crs.processPagesAsync(recordsToRead);
+  futures.add(f);
+}
+Exception exception = null;
+for (Future<Long> f : futures) {
+  if (exception != null) {
+f.cancel(true);
+  } else {
+try {
+  f.get();
+} catch (Exception e) {
+  f.cancel(true);
+  exception = e;
+}
+  }
+}
+if (exception != null) {
+  throw exception;
+}
+  }
+
+  /**
+   * Strategy for reading mock records. (What are these?)
+   */
+
+  public static class MockBatchReader extends BatchReader {
+
+public MockBatchReader(ReadState readState) {
+  super(readState);
+}
+
+@Override
+protected long getReadCount(ColumnReader firstColumnStatus) {
+  if (readState.mockRecordsRead == readState.schema().getGroupRecordCount()) {
+return 0;
--- End diff --

How about moving mockRecordsRead to this class instead of keeping it in 
readState?


> Refactor Parquet Record Reader
> --
>
> Key: DRILL-5356
> URL: https://issues.apache.org/jira/browse/DRILL-5356
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.11.0
>
>
> The Parquet record reader 

[jira] [Updated] (DRILL-5395) Query on MapR-DB table fails with NPE due to an issue with assignment logic

2017-03-29 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5395:

Reviewer: Chunhui Shi

Assigned Reviewer to [~cshi]

> Query on MapR-DB table fails with NPE due to an issue with assignment logic
> ---
>
> Key: DRILL-5395
> URL: https://issues.apache.org/jira/browse/DRILL-5395
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - MapRDB
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Abhishek Girish
>Assignee: Padma Penumarthy
>  Labels: MapR-DB-Binary
> Fix For: 1.11.0
>
> Attachments: drillbit.log.txt
>
>
> We uncovered this issue when working on DRILL-5394. 
> The MapR-DB table in question had 5 tablets with skewed data distribution (~6 
> million rows). A partial WIP fix for DRILL-5394 caused the number of rows to 
> be reported incorrectly (~300,000). Two minor fragments were created (due to 
> filter selectivity) for scanning the 5 tablets. This resulted in an NPE, 
> possibly related to an issue with the assignment logic, that was now exposed. 
> Representative query:
> {code}
> SELECT Convert_from(avail.customer, 'UTF8') AS ABC, 
>        Convert_from(prop.customer, 'UTF8')  AS PQR 
> FROM   (SELECT Convert_from(a.row_key, 'UTF8') AS customer, 
>                Cast(Convert_from(a.data.`l_discount`, 'double_be') AS FLOAT) AS availability 
>         FROM   db.tpch_maprdb.lineitem_1 a 
>         WHERE  Convert_from(a.row_key, 'UTF8') = '%004%') AS avail 
>        JOIN 
>        (SELECT Convert_from(b.row_key, 'UTF8') AS customer, 
>                Cast(Convert_from(b.data.`l_discount`, 'double_be') AS FLOAT) AS availability 
>         FROM   db.tpch_maprdb.lineitem_1 b 
>         WHERE  Convert_from(b.row_key, 'UTF8') LIKE '%003%') AS prop 
>          ON avail.customer = prop.customer; 
> {code}
> Error:
> {code}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> NullPointerException
> {code}
> Log attached. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5394) Optimize query planning for MapR-DB tables by caching row counts

2017-03-29 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5394:

Reviewer: Gautam Kumar Parai

Assigned Reviewer to [~gparai]

> Optimize query planning for MapR-DB tables by caching row counts
> 
>
> Key: DRILL-5394
> URL: https://issues.apache.org/jira/browse/DRILL-5394
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization, Storage - MapRDB
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Abhishek Girish
>Assignee: Padma Penumarthy
>  Labels: MapR-DB-Binary
> Fix For: 1.11.0
>
>
> On large MapR-DB tables, it was observed that the query planning time was 
> longer than expected. With DEBUG logs, it was understood that there were 
> multiple calls being made to get MapR-DB region locations and to fetch total 
> row count for tables.
> {code}
> 2017-02-23 13:59:55,246 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:05,006 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Function
> ...
> 2017-02-23 14:00:05,031 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:16,438 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:16,439 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:28,479 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:28,480 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:42,396 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.planner.logical.DrillOptiq - Special
> ...
> 2017-02-23 14:00:42,397 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.s.m.d.b.BinaryTableGroupScan - Getting region locations
> 2017-02-23 14:00:54,609 [27513143-8718-7a47-a2d4-06850755568a:foreman] DEBUG 
> o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Physical Planning (49588ms):
> {code}
> We should cache these stats and reuse them wherever required during query 
> planning. This should help reduce query planning time.
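
A hedged sketch of the caching idea (names are hypothetical, not the MapR-DB
plugin's API): fetch region locations and the row count once per group scan
and reuse the memoized values across the planner's repeated calls.

{code}
import java.util.List;
import java.util.function.Supplier;

class CachedTableStats {
  private final Supplier<Long> rowCountFetcher;        // wraps the remote row-count call
  private final Supplier<List<String>> regionFetcher;  // wraps the region-location call
  private Long rowCount;                               // null until first use
  private List<String> regionLocations;

  CachedTableStats(Supplier<Long> rowCountFetcher, Supplier<List<String>> regionFetcher) {
    this.rowCountFetcher = rowCountFetcher;
    this.regionFetcher = regionFetcher;
  }

  synchronized long rowCount() {
    if (rowCount == null) {
      rowCount = rowCountFetcher.get();  // one remote call, then cached
    }
    return rowCount;
  }

  synchronized List<String> regionLocations() {
    if (regionLocations == null) {
      regionLocations = regionFetcher.get();
    }
    return regionLocations;
  }
}
{code}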



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-4253) Some functional tests are failing because sort limit is too low

2017-03-29 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-4253:

Fix Version/s: (was: 1.10.0)
   1.11.0

> Some functional tests are failing because sort limit is too low
> ---
>
> Key: DRILL-4253
> URL: https://issues.apache.org/jira/browse/DRILL-4253
> Project: Apache Drill
>  Issue Type: Test
>  Components: Tools, Build & Test
>Affects Versions: 1.5.0
> Environment: 4-node cluster, 32 cores each
>Reporter: Deneche A. Hakim
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> The following tests are running out of memory:
> {noformat}
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/q174.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/q171.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/q168_DRILL-2046.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/q162_DRILL-1985.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/q165.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/q177_DRILL-2046.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/q159_DRILL-2046.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/large/q157_DRILL-1985.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/large/q175_DRILL-1985.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/q160_DRILL-1985.q
> framework/resources/Functional/data-shapes/wide-columns/5000/1000rows/parquet/q163_DRILL-2046.q
> {noformat}
> With errors similar to the following:
> {noformat}
> java.sql.SQLException: SYSTEM ERROR: DrillRuntimeException: Failed to 
> pre-allocate memory for SV. Existing recordCount*4 = 0, incoming batch 
> recordCount*4 = 696
> {noformat}
> {noformat}
> Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.
> {noformat}
> Those queries operate on wide tables and the sort limit is too low when using 
> the default value for {{planner.memory.max_query_memory_per_node}}.
> We should update those tests to set a higher limit (4 GB worked well for me) 
> for {{planner.memory.max_query_memory_per_node}}.
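
For a JUnit-style test, raising the limit could look like the sketch below
(assuming a BaseTestQuery-style test() helper; the functional .q tests would
instead put the same ALTER SESSION statement at the top of each script):

{code}
@BeforeClass
public static void raiseSortMemory() throws Exception {
  // 4 GB = 4294967296 bytes
  test("ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 4294967296");
}
{code}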



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5310) Memory leak in managed sort if OOM during sv2 allocation

2017-03-29 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5310:

Fix Version/s: (was: 1.10.0)
   1.11.0

> Memory leak in managed sort if OOM during sv2 allocation
> 
>
> Key: DRILL-5310
> URL: https://issues.apache.org/jira/browse/DRILL-5310
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> See the "identical1" test case in DRILL-5266. Due to misconfiguration, the 
> sort was given too little memory to make progress. An OOM error occurred when 
> allocating an SV2.
> In this scenario, the "converted" record batch is leaked.
> Normally, a converted batch is added to the list of in-memory batches, then 
> released on {{close()}}. But, in this case, the batch is only a local 
> variable, and so leaks.
> The code must release this batch in this condition.
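
A sketch of the fix the ticket describes (helper and field names are
illustrative, not the actual sort code): if SV2 allocation fails, release the
locally held converted batch before propagating the error, rather than relying
on close() to free a batch it never saw.

{code}
VectorContainer converted = convertBatch(incoming);  // the local variable that leaks today
try {
  SelectionVector2 sv2 = makeSelectionVector2(converted);
  bufferedBatches.add(new BatchGroup(converted, sv2));
} catch (OutOfMemoryException e) {
  converted.clear();  // release the batch's buffers before rethrowing
  throw e;
}
{code}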



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5164) Equi-join query results in CompileException when inputs have large number of columns

2017-03-29 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5164:

Fix Version/s: (was: 1.10.0)
   1.11.0

> Equi-join query results in CompileException when inputs have large number of 
> columns
> 
>
> Key: DRILL-5164
> URL: https://issues.apache.org/jira/browse/DRILL-5164
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>Assignee: Volodymyr Vysotskyi
>Priority: Critical
> Fix For: 1.11.0
>
> Attachments: manyColsInJson.json
>
>
> Drill 1.9.0 
> git commit ID : 4c1b420b
> 4 node CentOS cluster
> JSON file has 4095 keys (columns)
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select * from `manyColsInJson.json` t1, 
> `manyColsInJson.json` t2 where t1.key2000 = t2.key2000;
> Error: SYSTEM ERROR: CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen294.java]',
>  Line 16397, Column 17: HashJoinProbeGen294.java:16397: error: code too large
> public void doSetup(FragmentContext context, VectorContainer buildBatch, 
> RecordBatch probeBatch, RecordBatch outgoing)
> ^ (compiler.err.limit.code)
> Fragment 0:0
> [Error Id: 7d0efa7e-e183-4c40-939a-4908699f94bf on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-12-26 09:52:11,321 [279f17fd-c8f0-5d18-1124-76099f0a5cc8:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen294.java]',
>  Line 16397, Column 17: HashJoinProbeGen294.java:16397: error: code too large
> public void doSetup(FragmentContext context, VectorContainer buildBatch, 
> RecordBatch probeBatch, RecordBatch outgoing)
> ^ (compiler.err.limit.code)
> Fragment 0:0
> [Error Id: 7d0efa7e-e183-4c40-939a-4908699f94bf on centos-01.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> CompileException: File 
> 'org.apache.drill.exec.compile.DrillJavaFileObject[HashJoinProbeGen294.java]',
>  Line 16397, Column 17: HashJoinProbeGen294.java:16397: error: code too large
> public void doSetup(FragmentContext context, VectorContainer buildBatch, 
> RecordBatch probeBatch, RecordBatch outgoing)
> ^ (compiler.err.limit.code)
> Fragment 0:0
> [Error Id: 7d0efa7e-e183-4c40-939a-4908699f94bf on centos-01.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:293)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262)
>  [drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.9.0.jar:1.9.0]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_91]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_91]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: 
> org.apache.drill.exec.exception.SchemaChangeException: 
> org.apache.drill.exec.exception.ClassTransformationException: 
> java.util.concurrent.ExecutionException: 
> org.apache.drill.exec.exception.ClassTransformationException: Failure 
> generating transformation classes for value:
> package org.apache.drill.exec.test.generated;
> ...
> public class HashJoinProbeGen294 {
> NullableVarCharVector[] vv0;
> NullableVarCharVector vv3;
> NullableVarCharVector[] vv6;
> ...
> vv49137 .copyFromSafe((probeIndex), (outIndex), vv49134);
> vv49143 .copyFromSafe((probeIndex), (outIndex), vv49140);
> vv49149 .copyFromSafe((probeIndex), (outIndex), vv49146);
> }
> }
> 
> public void __DRILL_INIT__()
> throws SchemaChangeException
> {
> }
> }
> at 
> org.apache.drill.exec.compile.ClassTransformer.getImplementationClass(ClassTransformer.java:302)
>  ~[drill-java-exec-1.9.0.jar:1.9.0]
> at 
> org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:78) 
> ~[drill-java-exec-1.9.0.jar:1.9.0]
> at 
> 

[jira] [Updated] (DRILL-5270) Improve loading of profiles listing in the WebUI

2017-03-29 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5270:

Fix Version/s: (was: 1.10.0)
   1.11.0

> Improve loading of profiles listing in the WebUI
> 
>
> Key: DRILL-5270
> URL: https://issues.apache.org/jira/browse/DRILL-5270
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Web Server
>Affects Versions: 1.9.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
> Fix For: 1.11.0
>
>
> Currently, as the number of profiles increases, we reload the same list of 
> profiles from the FS every time.
> An ideal improvement would be to detect whether there are any new profiles and 
> reload from disk only then. Otherwise, a cached list is sufficient.
> For a directory of 280K profiles, the load time is close to 6 seconds on a 
> 32-core server. With caching, we can get it down to as little as a few 
> milliseconds.
> To decide whether the cache is invalid, we inspect the last-modified time of 
> the directory to confirm whether a reload is needed. 
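>
> A minimal sketch of the proposed invalidation strategy (class and method 
> names here are hypothetical, not the actual Drill code):
> {code:java}
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.Collections;
> import java.util.List;
>
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> // Hypothetical sketch: serve a cached listing unless the directory's
> // last-modified time shows that new profile files have been written.
> public class ProfileListingCache {
>   private List<String> cachedProfiles = Collections.emptyList();
>   private long lastSeenModTime = -1;
>
>   public synchronized List<String> getProfiles(FileSystem fs, Path profileDir)
>       throws IOException {
>     long modTime = fs.getFileStatus(profileDir).getModificationTime();
>     if (modTime != lastSeenModTime) {
>       // Directory changed: reload the listing from the file system.
>       List<String> names = new ArrayList<>();
>       for (FileStatus status : fs.listStatus(profileDir)) {
>         names.add(status.getPath().getName());
>       }
>       cachedProfiles = names;
>       lastSeenModTime = modTime;
>     }
>     return cachedProfiles;
>   }
> }
> {code}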



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5312) "Record batch sizer" does not include overhead for variable-sized vectors

2017-03-29 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5312:

Fix Version/s: (was: 1.10.0)
   1.11.0

> "Record batch sizer" does not include overhead for variable-sized vectors
> -
>
> Key: DRILL-5312
> URL: https://issues.apache.org/jira/browse/DRILL-5312
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> The new "record batch sizer" computes the actual data size of a record given 
> a batch of vectors. For most purposes, the record width must include the 
> overhead of the offset vectors for variable-sized vectors. The initial code 
> drop included only the character data, but not the offset vector size when 
> computing row width.
> Since the "managed" external sort relies on the computed row size to 
> determine memory usage, the underestimation of row width can cause an 
> OOM under certain low-memory conditions.
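>
> A rough sketch of the corrected accounting (hypothetical code, not the 
> actual "record batch sizer"):
> {code:java}
> // Hypothetical sketch: a variable-width column's per-row cost is its average
> // data size plus the 4-byte entry it occupies in the offset vector.
> public class RowWidthEstimate {
>
>   public static int varWidthColumnWidth(long totalDataBytes, int rowCount) {
>     int avgDataBytes = (int) Math.ceil((double) totalDataBytes / rowCount);
>     int offsetVectorBytes = 4; // one 4-byte offset per value
>     return avgDataBytes + offsetVectorBytes;
>   }
>
>   public static void main(String[] args) {
>     // 1000 VARCHAR values averaging 20 bytes cost 24 bytes per row,
>     // not the 20 bytes the initial code drop would report.
>     System.out.println(varWidthColumnWidth(20_000, 1000));
>   }
> }
> {code}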



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-4139) Exception while trying to prune partition. java.lang.UnsupportedOperationException: Unsupported type: BIT & Interval

2017-03-29 Thread Volodymyr Vysotskyi (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947003#comment-15947003
 ] 

Volodymyr Vysotskyi commented on DRILL-4139:


When querying a parquet table partitioned by an optional VARCHAR column that has 
a few NULL values, partition pruning does not happen. 
{code:sql}
explain plan for select * from vrchr_partition where col_vrchr = 'John 
Mcginity';
{code}
{noformat}
00-00    Screen
00-01      Project(*=[$0])
00-02        Project(T2¦¦*=[$0])
00-03          SelectionVectorRemover
00-04            Filter(condition=[=($1, 'John Mcginity')])
00-05              Project(T2¦¦*=[$0], col_vrchr=[$1])
00-06                Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=file:/tmp/vrchr_partition]], 
selectionRoot=file:/tmp/vrchr_partition, numFiles=1, usedMetadataFile=false, 
columns=[`*`]]])
{noformat}
In sqlline.log there are no errors or warnings. 
When the partition field does not contain nulls but is still optional, partition 
pruning occurs. 
For all other types except INTERVAL, partition pruning works even when the 
partition column has a few NULL values.

> Exception while trying to prune partition. 
> java.lang.UnsupportedOperationException: Unsupported type: BIT & Interval
> 
>
> Key: DRILL-4139
> URL: https://issues.apache.org/jira/browse/DRILL-4139
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.3.0
> Environment: 4 node cluster on CentOS
>Reporter: Khurram Faraaz
>Assignee: Volodymyr Vysotskyi
>
> Exception while trying to prune partition.
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> is seen in drillbit.log after Functional run on 4 node cluster.
> Drill 1.3.0 sys.version => d61bb83a8
> {code}
> 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2
> 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 0 ms
> 2015-11-27 03:12:19,810 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] WARN  
> o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune 
> partition.
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> at 
> org.apache.drill.exec.store.parquet.ParquetGroupScan.populatePruningVector(ParquetGroupScan.java:479)
>  ~[drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.ParquetPartitionDescriptor.populatePartitionVectors(ParquetPartitionDescriptor.java:96)
>  ~[drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:235)
>  ~[drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
>  [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808)
>  [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) 
> [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:303) 
> [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.logicalPlanningVolcanoAndLopt(DefaultSqlHandler.java:545)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:213)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:248)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:184)
>  [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:905) 
> [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:244) 
> [drill-java-exec-1.3.0.jar:1.3.0]
> at 
> 

[jira] [Comment Edited] (DRILL-4980) Upgrading of the approach of parquet date correctness status detection

2017-03-29 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946915#comment-15946915
 ] 

Vitalii Diravka edited comment on DRILL-4980 at 3/29/17 10:49 AM:
--

[~rkins] After this fix, parquet files generated by Drill carry a different 
metadata field for the detection of corrupt date values. But it doesn't have any 
functional impact. It is just code refactoring and a redesign of the approach to 
parquet date correctness status detection.


was (Author: vitalii):
[~rkins] No, it doesn't have any functional impact. It is just code refactoring 
and a redesign of the approach to parquet date correctness status detection.

> Upgrading of the approach of parquet date correctness status detection
> --
>
> Key: DRILL-4980
> URL: https://issues.apache.org/jira/browse/DRILL-4980
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.10.0
>
>
> This jira is an addition for the 
> [DRILL-4203|https://issues.apache.org/jira/browse/DRILL-4203].
> The date correctness label for the new generated parquet files should be 
> upgraded. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946913#comment-15946913
 ] 

ASF GitHub Bot commented on DRILL-5375:
---

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/794
  

Thanks for bringing up this point. I have done some investigation and found 
out that implicit casts for nested loop join are already included during 
materialization.
The join condition is transformed into a FunctionCall [1] which is later 
materialized using ExpressionTreeMaterializer [2]. 
`ExpressionTreeMaterializer.visitFunctionCall` includes a section which adds 
implicit casts [3].
Actually these casts are more permissive than those used during hash and 
merge joins.
For example, during hash and merge joins only casts between numeric types, 
date and timestamp, varchar and varbinary are supported, i.e.
a join of int and varchar columns won't be performed. The following error 
will be returned: `Join only supports implicit casts between 1. Numeric data
 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: INT, Right 
type: VARCHAR. Add explicit casts to avoid this error`. 
In our case nested loop join will be able to join int and varchar columns 
without adding explicit casts.

[1] 
https://github.com/arina-ielchiieva/drill/blob/71628e70a525d9bd27b4f5f56259dce84c75154d/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/NestedLoopJoinPrel.java#L95
[2] 
https://github.com/arina-ielchiieva/drill/blob/71628e70a525d9bd27b4f5f56259dce84c75154d/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinBatch.java#L265
[3] 
https://github.com/apache/drill/blob/9411b26ece34ed8b2f498deea5e41f1901eb1013/exec/java-exec/src/main/java/org/apache/drill/exec/expr/ExpressionTreeMaterializer.java#L362
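
A tiny standalone illustration of the behavior described above (hypothetical 
helper, not Drill's actual materializer API):
{code:java}
// Hypothetical sketch: when the join condition compares different types,
// the materializer wraps one side in an implicit cast so the comparison
// function can be resolved.
public class ImplicitCastSketch {

  enum Type { INT, VARCHAR }

  static String materializeEquals(Type leftType, Type rightType,
      String leftExpr, String rightExpr) {
    if (leftType == Type.INT && rightType == Type.VARCHAR) {
      rightExpr = "cast(" + rightExpr + " as int)"; // implicit cast inserted
    } else if (leftType == Type.VARCHAR && rightType == Type.INT) {
      leftExpr = "cast(" + leftExpr + " as int)";
    }
    return "equal(" + leftExpr + ", " + rightExpr + ")";
  }

  public static void main(String[] args) {
    // t1.c1 (INT) = t2.c1 (VARCHAR) resolves after an implicit cast is added.
    System.out.println(materializeEquals(Type.INT, Type.VARCHAR, "t1.c1", "t2.c1"));
  }
}
{code}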


> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +------------+---------+---------+-------------------+
> | dt         | fyq     | who     | event             |
> +------------+---------+---------+-------------------+
> | 2016-01-01 | NULL    | aperson | went wild         |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas     |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing      |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> +------------+---------+---------+-------------------+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-------------+----------+----------+--------------------+
> |     dt      |   fyq    |   who    |       event        |
> +-------------+----------+----------+--------------------+
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas      |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing       |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-------------+----------+----------+--------------------+
> 3 rows selected (2.523 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946908#comment-15946908
 ] 

ASF GitHub Bot commented on DRILL-5375:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/794#discussion_r108640631
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
 ---
@@ -70,27 +70,65 @@
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DrillOptiq.class);
 
   /**
-   * Converts a tree of {@link RexNode} operators into a scalar expression 
in Drill syntax.
+   * Converts a tree of {@link RexNode} operators into a scalar expression 
in Drill syntax using one input.
+   *
+   * @param context parse context which contains planner settings
+   * @param input data input
+   * @param expr expression to be converted
+   * @return converted expression
*/
   public static LogicalExpression toDrill(DrillParseContext context, 
RelNode input, RexNode expr) {
-final RexToDrill visitor = new RexToDrill(context, input);
+return toDrill(context, Lists.newArrayList(input), expr);
+  }
+
+  /**
+   * Converts a tree of {@link RexNode} operators into a scalar expression 
in Drill syntax using multiple inputs.
+   *
+   * @param context parse context which contains planner settings
+   * @param inputs multiple data inputs
+   * @param expr expression to be converted
+   * @return converted expression
+   */
+  public static LogicalExpression toDrill(DrillParseContext context, 
List<RelNode> inputs, RexNode expr) {
+final RexToDrill visitor = new RexToDrill(context, inputs);
 return expr.accept(visitor);
   }
 
   private static class RexToDrill extends 
RexVisitorImpl<LogicalExpression> {
-private final RelNode input;
+private final List<RelNode> inputs;
 private final DrillParseContext context;
+private final List<RelDataTypeField> fieldList;
 
-RexToDrill(DrillParseContext context, RelNode input) {
+RexToDrill(DrillParseContext context, List<RelNode> inputs) {
   super(true);
   this.context = context;
-  this.input = input;
+  this.inputs = inputs;
+  this.fieldList = Lists.newArrayList();
+  /*
+ Fields are enumerated by their presence order in input. Details 
{@link org.apache.calcite.rex.RexInputRef}.
+ Thus we can merge field list from several inputs by adding them 
into the list in order of appearance.
+ Each field index in the list will match field index in the 
RexInputRef instance which will allow us
+ to retrieve field from filed list by index in {@link 
#visitInputRef(RexInputRef)} method. Example:
+
+ Query: select t1.c1, t2.c1. t2.c2 from t1 inner join t2 on t1.c1 
between t2.c1 and t2.c2
+
+ Input 1: $0
+ Input 2: $1, $2
+
+ Result: $0, $1, $2
+   */
+  for (RelNode input : inputs) {
--- End diff --

Yes, in `public LogicalExpression visitInputRef(RexInputRef inputRef)` we 
determine to which input a field belongs. Before, we had only one input, so 
we did a simple get operation `input.getRowType().getFieldList().get(index)`, 
but now we have two inputs, so we would have to do the get operation on one 
input and, if the field is not found, try the second. I could iterate over the 
two inputs, do the get operation and break the loop once the field is found, OR 
I could merge the field lists into one and do a simple get operation 
`fieldList.get(index)`. For performance reasons, I decided to merge the field 
lists in the constructor and use them in `public 
LogicalExpression visitInputRef(RexInputRef inputRef)` rather than iterating 
over them for each field.
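
A minimal sketch of the merged-list lookup described above (names hypothetical):
{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: concatenate the field lists of all inputs once in the
// constructor so that a RexInputRef index resolves with a single list lookup.
public class MergedFieldLookup {

  private final List<String> fieldList = new ArrayList<>();

  public MergedFieldLookup(List<List<String>> inputFieldLists) {
    for (List<String> fields : inputFieldLists) {
      fieldList.addAll(fields); // fields keep their order of appearance
    }
  }

  public String visitInputRef(int index) {
    return fieldList.get(index); // $0, $1, $2 ... across all inputs
  }
}
{code}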


> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala 

[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946907#comment-15946907
 ] 

ASF GitHub Bot commented on DRILL-5375:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/794#discussion_r108382080
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinTemplate.java
 ---
@@ -40,132 +41,133 @@
   // Record count of the left batch currently being processed
   private int leftRecordCount = 0;
 
-  // List of record counts  per batch in the hyper container
+  // List of record counts per batch in the hyper container
   private List<Integer> rightCounts = null;
 
   // Output batch
   private NestedLoopJoinBatch outgoing = null;
 
-  // Next right batch to process
-  private int nextRightBatchToProcess = 0;
-
-  // Next record in the current right batch to process
-  private int nextRightRecordToProcess = 0;
-
-  // Next record in the left batch to process
-  private int nextLeftRecordToProcess = 0;
+  // Iteration status tracker
+  private IterationStatusTracker tracker = new IterationStatusTracker();
 
   /**
* Method initializes necessary state and invokes the doSetup() to set 
the
-   * input and output value vector references
+   * input and output value vector references.
+   *
* @param context Fragment context
* @param left Current left input batch being processed
* @param rightContainer Hyper container
+   * @param rightCounts Counts for each right container
* @param outgoing Output batch
*/
-  public void setupNestedLoopJoin(FragmentContext context, RecordBatch 
left,
+  public void setupNestedLoopJoin(FragmentContext context,
+  RecordBatch left,
   ExpandableHyperContainer rightContainer,
   LinkedList<Integer> rightCounts,
   NestedLoopJoinBatch outgoing) {
 this.left = left;
-leftRecordCount = left.getRecordCount();
+this.leftRecordCount = left.getRecordCount();
 this.rightCounts = rightCounts;
 this.outgoing = outgoing;
 
 doSetup(context, rightContainer, left, outgoing);
   }
 
   /**
-   * This method is the core of the nested loop join. For every record on 
the right we go over
-   * the left batch and produce the cross product output
+   * Main entry point for producing the output records. Thin wrapper 
around populateOutgoingBatch(), this method
+   * controls which left batch we are processing and fetches the next left 
input batch once we exhaust the current one.
+   *
+   * @param joinType join type (INNER or LEFT)
+   * @return the number of records produced in the output batch
+   */
+  public int outputRecords(JoinRelType joinType) {
+int outputIndex = 0;
+while (leftRecordCount != 0) {
+  outputIndex = populateOutgoingBatch(joinType, outputIndex);
+  if (outputIndex >= NestedLoopJoinBatch.MAX_BATCH_SIZE) {
+break;
+  }
+  // reset state and get next left batch
+  resetAndGetNextLeft();
+}
+return outputIndex;
+  }
+
+  /**
+   * This method is the core of the nested loop join. For each left batch 
record it looks for a matching record
+   * in the list of right batches. A match is checked by calling the {@link 
#doEval(int, int, int)} method.
+   * If a matching record is found, both left and right records are written 
into the output batch;
+   * otherwise, if the join type is LEFT, only the left record is written and 
the right batch record values will be null.
+   *
+   * @param joinType join type (INNER or LEFT)
* @param outputIndex index to start emitting records at
* @return final outputIndex after producing records in the output batch
*/
-  private int populateOutgoingBatch(int outputIndex) {
-
-// Total number of batches on the right side
-int totalRightBatches = rightCounts.size();
-
-// Total number of records on the left
-int localLeftRecordCount = leftRecordCount;
-
-/*
- * The below logic is the core of the NLJ. To have better performance 
we copy the instance members into local
- * method variables, once we are done with the loop we need to update 
the instance variables to reflect the new
- * state. To avoid code duplication of resetting the instance members 
at every exit point in the loop we are using
- * 'goto'
- */
-int localNextRightBatchToProcess = nextRightBatchToProcess;
-int localNextRightRecordToProcess = nextRightRecordToProcess;
  

[jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946909#comment-15946909
 ] 

ASF GitHub Bot commented on DRILL-5375:
---

Github user arina-ielchiieva commented on a diff in the pull request:

https://github.com/apache/drill/pull/794#discussion_r108421451
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/BatchReference.java ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.expr;
+
+import com.google.common.base.Preconditions;
+
+/**
+ * Holder class that contains the batch name, batch index and record index. The 
batch index is used when the batch is a hyper container.
+ * Used to distinguish batches in non-equi conditions during expression 
materialization.
+ * Mostly used for nested loop join, which allows non-equi joins.
+ *
+ * Example:
+ * BatchReference{batchName='leftBatch', batchIndex='leftIndex', 
recordIndex='leftIndex'}
+ * BatchReference{batchName='rightContainer', 
batchIndex='rightBatchIndex', recordIndex='rightRecordIndexWithinBatch'}
+ *
+ */
+public final class BatchReference {
--- End diff --

A `BatchReference` instance can be created during batch initialization (ex: in 
an instance of `AbstractRecordBatch`), since the naming of the batches used 
won't change during data processing. The info from the batch references, though, 
will only be used during schema build (i.e. once per OK_NEW_SCHEMA).
Added this info to the `BatchReference` javadoc.
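
For illustration, construction for the two sides might look like this (a sketch 
assuming the constructors implied by the javadoc example, since the diff above 
is truncated):
{code:java}
// Left side: plain record batch, the record index also serves as batch index.
BatchReference leftRef = new BatchReference("leftBatch", "leftIndex");
// Right side: hyper container, so batch and record indexes are separate.
BatchReference rightRef = new BatchReference("rightContainer",
    "rightBatchIndex", "rightRecordIndexWithinBatch");
{code}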


> Nested loop join: return correct result for left join
> -
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +------------+---------+---------+-------------------+
> | dt         | fyq     | who     | event             |
> +------------+---------+---------+-------------------+
> | 2016-01-01 | NULL    | aperson | went wild         |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas     |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing      |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> +------------+---------+---------+-------------------+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-------------+----------+----------+--------------------+
> |     dt      |   fyq    |   who    |       event        |
> +-------------+----------+----------+--------------------+
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas      |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing       |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-------------+----------+----------+--------------------+
> 3 rows selected (2.523 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5391) CTAS: folder and file permission should be configurable

2017-03-29 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946874#comment-15946874
 ] 

Arina Ielchiieva commented on DRILL-5391:
-

Zelaine, regarding the decision: with the CTTAS introduction, the approach for 
creating files in Drill changed, since we had to distinguish which files are 
temporary and which are not. Writers were not aware which files they created 
(fs.create(fileName)). To make the solution configurable, it was decided to pass 
the permission which should be applied to the folder, rather than indicating 
whether these are temp files or not (so if the permission needs to be changed 
for some reason, it can be easily maintained). Permission 755 / 644 for 
persistent files is common among systems and applications. Using this permission 
for Drill did not seem to break anything at that time (well, I guess this Jira's 
use case proves we were wrong). 

I think it is a good idea for Drill to create files with a defined permission 
(like it's done in Hive), though I guess making this option configurable is what 
we have missed. For example, for views Drill also creates files with a specified 
permission, and that could be made configurable as well.
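
A small sketch of what a configurable approach might look like (the option name 
"drill.exec.ctas.dir-permission" is invented for illustration):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Hypothetical sketch: read the desired permission from configuration instead
// of hard-coding it, then apply it when creating the CTAS output folder.
public class CtasPermissionSketch {
  public static void createTableDir(Configuration conf, Path tableDir)
      throws Exception {
    FileSystem fs = FileSystem.get(conf);
    String octal = conf.get("drill.exec.ctas.dir-permission", "755");
    fs.mkdirs(tableDir, new FsPermission(octal));
  }
}
{code}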

> CTAS: folder and file permission should be configurable
> ---
>
> Key: DRILL-5391
> URL: https://issues.apache.org/jira/browse/DRILL-5391
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.0
> Environment: CentOS 7, HDP 2.4
>Reporter: Chua Tianxiang
>Priority: Minor
> Attachments: Drill-1-10.PNG, Drill-1-9.PNG
>
>
> In Drill 1.9, CREATE TABLE AS creates a folder with permissions 777, while on 
> Drill 1.10, the same commands creates a folder with permission 775. Both 
> drills are started with root user, installed on the same servers and accesses 
> the same HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5391) CTAS: folder and file permission should be configurable

2017-03-29 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946831#comment-15946831
 ] 

Arina Ielchiieva commented on DRILL-5391:
-

Not sure why Drill 1.9 creates folders with 777 in your environment. For 
example, for me on Linux or maprfs, Drill 1.9 creates folders with 755.

> CTAS: folder and file permission should be configurable
> ---
>
> Key: DRILL-5391
> URL: https://issues.apache.org/jira/browse/DRILL-5391
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.0
> Environment: CentOS 7, HDP 2.4
>Reporter: Chua Tianxiang
>Priority: Minor
> Attachments: Drill-1-10.PNG, Drill-1-9.PNG
>
>
> In Drill 1.9, CREATE TABLE AS creates a folder with permissions 777, while on 
> Drill 1.10, the same commands creates a folder with permission 775. Both 
> drills are started with root user, installed on the same servers and accesses 
> the same HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5378) Put more information into SchemaChangeException when HashJoin hit SchemaChangeException

2017-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946759#comment-15946759
 ] 

ASF GitHub Bot commented on DRILL-5378:
---

Github user gparai commented on the issue:

https://github.com/apache/drill/pull/801
  
Would it be a good idea to add tests for the same? Otherwise, LGTM.


> Put more information into SchemaChangeException when HashJoin hit 
> SchemaChangeException
> ---
>
> Key: DRILL-5378
> URL: https://issues.apache.org/jira/browse/DRILL-5378
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>Priority: Minor
>
> HashJoin currently does not allow a schema change on either the build side or 
> the probe side. When HashJoin hits a SchemaChangeException in the middle of 
> execution, Drill reports a brief error message about the SchemaChangeException, 
> without providing any information about what schemas are in the incoming 
> batches. That makes it hard to analyze the error and understand what's going on. 
> It probably makes sense to put the two differing schemas in the error 
> message, so that the user can get a better idea about the schema change. 
> Until Drill provides support for schema changes in HashJoin, the detailed 
> error message will help users debug the error. 
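>
> A sketch of the kind of message enrichment being proposed (hypothetical code; 
> plain strings stand in for the schemas):
> {code:java}
> // Hypothetical sketch: put both schemas into the error text instead of
> // reporting a bare SchemaChangeException.
> public class SchemaChangeMessage {
>
>   public static String build(String side, String priorSchema, String newSchema) {
>     return String.format(
>         "Hash join does not support schema changes on the %s side. "
>             + "Prior schema: %s. New incoming schema: %s.",
>         side, priorSchema, newSchema);
>   }
>
>   public static void main(String[] args) {
>     System.out.println(build("build", "(a INT, b VARCHAR)", "(a BIGINT, b VARCHAR)"));
>   }
> }
> {code}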



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5391) CTAS: folder and file permission should be configurable

2017-03-29 Thread Chua Tianxiang (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946758#comment-15946758
 ] 

Chua Tianxiang commented on DRILL-5391:
---

Arina, I got the following: 755 (umask 022).
drwxr-xr-x   - root  hdfs  0 2017-03-29 08:58 /tmp/folder


> CTAS: folder and file permission should be configurable
> ---
>
> Key: DRILL-5391
> URL: https://issues.apache.org/jira/browse/DRILL-5391
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.0
> Environment: CentOS 7, HDP 2.4
>Reporter: Chua Tianxiang
>Priority: Minor
> Attachments: Drill-1-10.PNG, Drill-1-9.PNG
>
>
> In Drill 1.9, CREATE TABLE AS creates a folder with permissions 777, while on 
> Drill 1.10, the same commands creates a folder with permission 775. Both 
> drills are started with root user, installed on the same servers and accesses 
> the same HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-5391) CTAS: folder and file permission should be configurable

2017-03-29 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946750#comment-15946750
 ] 

Arina Ielchiieva commented on DRILL-5391:
-

Chua, when you create a folder on HDFS manually (hadoop fs -mkdir /some-dir) 
under the user who started the drillbit (root), with what permissions is the 
directory created?

> CTAS: folder and file permission should be configurable
> ---
>
> Key: DRILL-5391
> URL: https://issues.apache.org/jira/browse/DRILL-5391
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.10.0
> Environment: CentOS 7, HDP 2.4
>Reporter: Chua Tianxiang
>Priority: Minor
> Attachments: Drill-1-10.PNG, Drill-1-9.PNG
>
>
> In Drill 1.9, CREATE TABLE AS creates a folder with permissions 777, while on 
> Drill 1.10, the same commands creates a folder with permission 775. Both 
> drills are started with root user, installed on the same servers and accesses 
> the same HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)