[GitHub] Ben-Zvi closed pull request #1114: DRILL-6104: Added Logfile Reader

2018-07-18 Thread GitBox
Ben-Zvi closed pull request #1114: DRILL-6104: Added Logfile Reader
URL: https://github.com/apache/drill/pull/1114
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogFormatConfig.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogFormatConfig.java
new file mode 100644
index 000..c3cf97e26c5
--- /dev/null
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogFormatConfig.java
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.log;
+
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Objects;
+import org.apache.drill.common.logical.FormatPluginConfig;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+@JsonTypeName("logRegex")
+public class LogFormatConfig implements FormatPluginConfig {
+
+  private String regex;
+  private String extension;
+  private int maxErrors = 10;
+  private List<LogFormatField> schema;
+
+  public String getRegex() {
+    return regex;
+  }
+
+  public String getExtension() {
+    return extension;
+  }
+
+  public int getMaxErrors() {
+    return maxErrors;
+  }
+
+  public List<LogFormatField> getSchema() {
+    return schema;
+  }
+
+  //Setters
+  public void setExtension(String ext) {
+    this.extension = ext;
+  }
+
+  public void setMaxErrors(int errors) {
+    this.maxErrors = errors;
+  }
+
+  public void setRegex(String regex) {
+    this.regex = regex;
+  }
+
+  public void setSchema() {
+    this.schema = new ArrayList<LogFormatField>();
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+    if (this == obj) {
+      return true;
+    }
+    if (obj == null || getClass() != obj.getClass()) {
+      return false;
+    }
+    LogFormatConfig other = (LogFormatConfig) obj;
+    return Objects.equal(regex, other.regex) &&
+        Objects.equal(maxErrors, other.maxErrors) &&
+        Objects.equal(schema, other.schema) &&
+        Objects.equal(extension, other.extension);
+  }
+
+  @Override
+  public int hashCode() {
+    return Arrays.hashCode(new Object[]{regex, maxErrors, schema, extension});
+  }
+
+  @JsonIgnore
+  public List<String> getFieldNames() {
+    List<String> result = new ArrayList<String>();
+    if (this.schema == null) {
+      return result;
+    }
+
+    for (LogFormatField field : this.schema) {
+      result.add(field.getFieldName());
+    }
+    return result;
+  }
+
+  @JsonIgnore
+  public String getDataType(int fieldIndex) {
+    LogFormatField f = this.schema.get(fieldIndex);
+    return f.getFieldType().toUpperCase();
+  }
+
+  @JsonIgnore
+  public LogFormatField getField(int fieldIndex) {
+    return this.schema.get(fieldIndex);
+  }
+
+  @JsonIgnore
+  public String getDateFormat(int patternIndex) {
+    LogFormatField f = this.schema.get(patternIndex);
+    return f.getFormat();
+  }
+}
\ No newline at end of file
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogFormatField.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogFormatField.java
new file mode 100644
index 000..64a6db76ad7
--- /dev/null
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/log/LogFormatField.java
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, eithe
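
For context, a minimal sketch of how the logRegex configuration above binds from JSON via Jackson. The property names follow the getters/setters of LogFormatConfig shown earlier in this message; the sample regex and values are illustrative only, and the snippet assumes LogFormatConfig is on the classpath:

import com.fasterxml.jackson.databind.ObjectMapper;

public class LogRegexConfigExample {
  public static void main(String[] args) throws Exception {
    // Illustrative config; "regex", "extension" and "maxErrors" map onto the
    // setters of LogFormatConfig above.
    String json = "{ \"regex\": \"(\\\\d{6}) (\\\\w+)\", \"extension\": \"log\", \"maxErrors\": 10 }";
    LogFormatConfig config = new ObjectMapper().readValue(json, LogFormatConfig.class);
    System.out.println(config.getRegex());      // (\d{6}) (\w+)
    System.out.println(config.getFieldNames()); // [] until a schema is set
  }
}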

[GitHub] Ben-Zvi closed pull request #1336: DRILL-6496: Added missing logging statement in VectorUtil.showVectorAccessibleContent(VectorAccessible va, int[] columnWidths)

2018-07-18 Thread GitBox
Ben-Zvi closed pull request #1336: DRILL-6496: Added missing logging statement 
in VectorUtil.showVectorAccessibleContent(VectorAccessible va, int[] 
columnWidths)
URL: https://github.com/apache/drill/pull/1336
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/json/BaseJsonTest.java b/contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/json/BaseJsonTest.java
index ee32aa18ef6..550fb73b330 100644
--- a/contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/json/BaseJsonTest.java
+++ b/contrib/format-maprdb/src/test/java/com/mapr/drill/maprdb/tests/json/BaseJsonTest.java
@@ -55,11 +55,11 @@ public static void tearDownAfterClass() throws Exception {
 
   protected void runSQLAndVerifyCount(String sql, int expectedRowCount) throws Exception{
     List<QueryDataBatch> results = runHBaseSQLlWithResults(sql);
-    printResultAndVerifyRowCount(results, expectedRowCount);
+    logResultAndVerifyRowCount(results, expectedRowCount);
   }
 
-  private void printResultAndVerifyRowCount(List<QueryDataBatch> results, int expectedRowCount) throws SchemaChangeException {
-    int rowCount = printResult(results);
+  private void logResultAndVerifyRowCount(List<QueryDataBatch> results, int expectedRowCount) throws SchemaChangeException {
+    int rowCount = logResult(results);
     if (expectedRowCount != -1) {
       Assert.assertEquals(expectedRowCount, rowCount);
     }
diff --git a/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/BaseHBaseTest.java b/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/BaseHBaseTest.java
index dd7ce674e49..ab75eda1ee2 100644
--- a/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/BaseHBaseTest.java
+++ b/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/BaseHBaseTest.java
@@ -79,7 +79,7 @@ protected String getPlanText(String planFile, String tableName) throws IOExcepti
   protected void runHBasePhysicalVerifyCount(String planFile, String tableName, int expectedRowCount) throws Exception{
     String physicalPlan = getPlanText(planFile, tableName);
     List<QueryDataBatch> results = testPhysicalWithResults(physicalPlan);
-    printResultAndVerifyRowCount(results, expectedRowCount);
+    logResultAndVerifyRowCount(results, expectedRowCount);
   }
 
   protected List<QueryDataBatch> runHBaseSQLlWithResults(String sql) throws Exception {
@@ -89,11 +89,11 @@ protected void runHBasePhysicalVerifyCount(String planFile, String tableName, in
 
   protected void runHBaseSQLVerifyCount(String sql, int expectedRowCount) throws Exception{
     List<QueryDataBatch> results = runHBaseSQLlWithResults(sql);
-    printResultAndVerifyRowCount(results, expectedRowCount);
+    logResultAndVerifyRowCount(results, expectedRowCount);
   }
 
-  private void printResultAndVerifyRowCount(List<QueryDataBatch> results, int expectedRowCount) throws SchemaChangeException {
-    int rowCount = printResult(results);
+  private void logResultAndVerifyRowCount(List<QueryDataBatch> results, int expectedRowCount) throws SchemaChangeException {
+    int rowCount = logResult(results);
     if (expectedRowCount != -1) {
       Assert.assertEquals(expectedRowCount, rowCount);
     }
diff --git a/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseCFAsJSONString.java b/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseCFAsJSONString.java
index 592dda07396..50cda8fb292 100644
--- a/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseCFAsJSONString.java
+++ b/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseCFAsJSONString.java
@@ -55,7 +55,7 @@ public static void closeMyClient() throws IOException {
   public void testColumnFamiliesAsJSONString() throws Exception {
     setColumnWidths(new int[] {112, 12});
     List<QueryDataBatch> resultList = runHBaseSQLlWithResults("SELECT f, f2 FROM hbase.`[TABLE_NAME]` tableName LIMIT 1");
-    printResult(resultList);
+    logResult(resultList);
   }
 
 }
diff --git a/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseQueries.java b/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseQueries.java
index abd76a7e693..27882b59e70 100644
--- a/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseQueries.java
+++ b/contrib/storage-hbase/src/test/java/org/apache/drill/hbase/TestHBaseQueries.java
@@ -96,7 +96,7 @@ public void testCastEmptyStrings() throws Exception {
     List<QueryDataBatch> resultList = runHBaseSQLlWithResults("SELECT row_key,\n"
         + " CAST(t.f.c1 as INT) c1, CAST(t.f.c2 as BIGINT) c2, CAST(t.f.c3 as INT) c3,\n"
         + " CAST(t.f.c4 as INT) c4 FROM hbase.TestTableNullStr t where row_key='a1'");
-    printResult(resultList);
+    logResult(resultList);
 }
 finally {
 test("alter system rese

[GitHub] Ben-Zvi closed pull request #1381: DRILL-6475: Unnest: Null fieldId Pointer.

2018-07-18 Thread GitBox
Ben-Zvi closed pull request #1381: DRILL-6475: Unnest: Null fieldId Pointer.
URL: https://github.com/apache/drill/pull/1381
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/AdjustOperatorsSchemaVisitor.java b/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/AdjustOperatorsSchemaVisitor.java
new file mode 100644
index 000..c46b7255a28
--- /dev/null
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/AdjustOperatorsSchemaVisitor.java
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.physical.visitor;
+
+import java.util.ArrayList;
+import java.util.List;
+import com.google.common.base.Preconditions;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.drill.exec.planner.physical.JoinPrel;
+import org.apache.drill.exec.planner.physical.LateralJoinPrel;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.calcite.rel.RelNode;
+
+import com.google.common.collect.Lists;
+import org.apache.drill.exec.planner.physical.UnnestPrel;
+
+/**
+ * AdjustOperatorsSchemaVisitor visits operators which, depending upon their functionality,
+ * adjust their output row types. The adjusting mechanism is unique to each operator. In the case of joins, this
+ * visitor adjusts the field names to make sure that the upstream operator sees only unique field names, even when
+ * the children of the join have the same field names. In the case of lateral/unnest operators, it changes the
+ * correlated field and also the unnest operator's output row type.
+ */
+public class AdjustOperatorsSchemaVisitor extends BasePrelVisitor<Prel, Void, RuntimeException>{
+
+  private Prel registeredPrel = null;
+
+  private static AdjustOperatorsSchemaVisitor INSTANCE = new AdjustOperatorsSchemaVisitor();
+
+  public static Prel adjustSchema(Prel prel){
+    return prel.accept(INSTANCE, null);
+  }
+
+  private void register(Prel prel) {
+    this.registeredPrel = prel;
+  }
+
+  private Prel getRegisteredPrel() {
+    return this.registeredPrel;
+  }
+
+  @Override
+  public Prel visitPrel(Prel prel, Void value) throws RuntimeException {
+    return preparePrel(prel, getChildren(prel));
+  }
+
+  public void unRegister() {
+    this.registeredPrel = null;
+  }
+
+  private List<RelNode> getChildren(Prel prel, int registerForChild) {
+    int ch = 0;
+    List<RelNode> children = Lists.newArrayList();
+    for(Prel child : prel){
+      if (ch == registerForChild) {
+        register(prel);
+      }
+      child = child.accept(this, null);
+      if (ch == registerForChild) {
+        unRegister();
+      }
+      children.add(child);
+      ch++;
+    }
+    return children;
+  }
+
+  private List<RelNode> getChildren(Prel prel) {
+    return getChildren(prel, -1);
+  }
+
+  private Prel preparePrel(Prel prel, List<RelNode> renamedNodes) {
+    return (Prel) prel.copy(prel.getTraitSet(), renamedNodes);
+  }
+
+  @Override
+  public Prel visitJoin(JoinPrel prel, Void value) throws RuntimeException {
+
+    List<RelNode> children = getChildren(prel);
+
+    final int leftCount = children.get(0).getRowType().getFieldCount();
+
+    List<RelNode> reNamedChildren = Lists.newArrayList();
+
+    RelNode left = prel.getJoinInput(0, children.get(0));
+    RelNode right = prel.getJoinInput(leftCount, children.get(1));
+
+    reNamedChildren.add(left);
+    reNamedChildren.add(right);
+
+    return preparePrel(prel, reNamedChildren);
+  }
+
+  @Override
+  public Prel visitLateral(LateralJoinPrel prel, Void value) throws RuntimeException {
+
+    List<RelNode> children = getChildren(prel, 1);
+    List<RelNode> reNamedChildren = new ArrayList<>();
+
+    for (int i = 0; i < children.size(); i++) {
+      reNamedChildren.add(prel.getLateralInput(i, children.get(i)));
+    }
+
+re
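
The quoted diff is cut off here, but the static entry point shown above already gives the usage pattern. A hedged sketch of a call site (illustrative only; the actual invocation point in Drill's plan finalization is not shown in this message):

// Illustrative only: apply the visitor to the root of a finalized physical plan.
Prel adjusted = AdjustOperatorsSchemaVisitor.adjustSchema(rootPrel);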

Re: [ANNOUNCE] New PMC Chair of Apache Drill

2018-07-18 Thread Robert Hou
Congratulations, Arina!

--Robert

On Wed, Jul 18, 2018 at 9:12 PM, Sorabh Hamirwasia 
wrote:

> Congratulations Arina!
>
> On Wed, Jul 18, 2018 at 6:13 PM, Charles Givre  wrote:
>
> > Congrats Arina!! Well done!
> >
> > > On Jul 18, 2018, at 20:59, Paul Rogers 
> > wrote:
> > >
> > > Congratulations Arina!
> > >
> > > - Paul
> > >
> > >
> > >
> > >On Wednesday, July 18, 2018, 2:19:44 PM PDT, Aman Sinha <
> > amansi...@apache.org> wrote:
> > >
> > > Drill developers,
> > > Time flies and it is time for a new PMC chair !  Thank you all for your
> > > support during the past year.
> > >
> > > I am very pleased to announce that the Drill PMC has voted to elect
> Arina
> > > Ielchiieva as the new PMC chair of Apache Drill.  She has also been
> > > approved unanimously by the Apache Board in today's board meeting.
> > Please
> > > join me in congratulating Arina !
> > >
> > > Thanks,
> > > Aman
> >
> >
>


Re: [ANNOUNCE] New PMC Chair of Apache Drill

2018-07-18 Thread Sorabh Hamirwasia
Congratulations Arina!

On Wed, Jul 18, 2018 at 6:13 PM, Charles Givre  wrote:

> Congrats Arina!! Well done!
>
> > On Jul 18, 2018, at 20:59, Paul Rogers 
> wrote:
> >
> > Congratulations Arina!
> >
> > - Paul
> >
> >
> >
> >On Wednesday, July 18, 2018, 2:19:44 PM PDT, Aman Sinha <
> amansi...@apache.org> wrote:
> >
> > Drill developers,
> > Time flies and it is time for a new PMC chair !  Thank you all for your
> > support during the past year.
> >
> > I am very pleased to announce that the Drill PMC has voted to elect Arina
> > Ielchiieva as the new PMC chair of Apache Drill.  She has also been
> > approved unanimously by the Apache Board in today's board meeting.
> Please
> > join me in congratulating Arina !
> >
> > Thanks,
> > Aman
>
>


Re: [DISCUSS] 1.14.0 release

2018-07-18 Thread Charles Givre
Hi Boaz,
DRILL-6104 is ready to release.  Do you think we’ll have an RC this week?
Thanks,
— C

> On Jul 2, 2018, at 23:01, Boaz Ben-Zvi  wrote:
> 
>   Let's try to make progress on the 1.14 release, aiming for a Release 
> Candidate towards the end of this week (a little ambitious, with the July 4th 
> and people on vacations).
> 
> Current Status of the previously requested Jiras:
> 
> ==
> 
> In Progress - DRILL-6104: Generic Logfile Format Plugin
> 
> PR - DRILL-6422: Update Guava to 23.0 and shade it
> 
> PR - DRILL-5999 (DRILL-6516): Support for EMIT outcome in Streaming Agg
> 
> Ready2Commit: DRILL-5977: predicate pushdown support kafkaMsgOffset
> 
> Ready2Commit: DRILL-6519: Add String Distance and Phonetic Functions
> 
> Ready2Commit: DRILL-6577: Change Hash-Join default to not fallback (into 
> pre-1.14 unlimited memory)
> 
> Committed: DRILL-6353: Upgrade Parquet MR dependencies
> 
> Committed: DRILL-6310: limit batch size for hash aggregate
> 
> ===
> 
> And there are a few more open or in a PR state.
> 
>    Let's try and get most of these ready by the end of the week.
> 
>Boaz
> 
> 



[GitHub] vrozov commented on a change in pull request #1387: DRILL-6603: Set num_nulls for parquet statistics to -1 when actual number is not defined.

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1387: DRILL-6603: Set num_nulls 
for parquet statistics to -1 when actual number is not defined.
URL: https://github.com/apache/drill/pull/1387#discussion_r203544029
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java
 ##
 @@ -48,13 +48,19 @@ static boolean isAllNulls(Statistics stat, long rowCount) {
   }
 
   /**
-   * Checks that column chunk's statistics does not have nulls
+   * Checks that column chunk's statistics does not have nulls.
+   *
+   * <p>
+   * If number of nulls is more than 0, it means the column has nulls.
 
 Review comment:
   I am not sure that this comment adds any value. It is the same as stating 
that if a number of elements in a set is greater than zero, the set is not 
empty.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vrozov commented on a change in pull request #1387: DRILL-6603: Set num_nulls for parquet statistics to -1 when actual number is not defined.

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1387: DRILL-6603: Set num_nulls 
for parquet statistics to -1 when actual number is not defined.
URL: https://github.com/apache/drill/pull/1387#discussion_r203543266
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java
 ##
 @@ -48,13 +48,19 @@ static boolean isAllNulls(Statistics stat, long rowCount) {
   }
 
   /**
-   * Checks that column chunk's statistics does not have nulls
+   * Checks that column chunk's statistics does not have nulls.
+   *
+   * <p>
+   * If number of nulls is more than 0, it means the column has nulls.
+   * If number of nulls is less than zero (usually -1), it means that stats are unavailable.
 
 Review comment:
   Please see the documentation for `getNumNulls()`. In case the number of 
nulls is not set it is `-1`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
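
For readers skimming the thread: Parquet's column statistics expose the null count through Statistics.getNumNulls(), which returns -1 when no count was recorded. A minimal sketch of the distinction being discussed (illustrative only, not the PR's final code):

// Sketch under the semantics described above: getNumNulls() on
// org.apache.parquet.column.statistics.Statistics returns -1 when the
// writer did not record a null count.
static boolean hasNulls(Statistics stat) {
  return stat.getNumNulls() > 0;      // strictly positive: nulls are present
}

static boolean numNullsUnknown(Statistics stat) {
  return stat.getNumNulls() < 0;      // usually -1: statistics unavailable
}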


[GitHub] vrozov commented on a change in pull request #1387: DRILL-6603: Set num_nulls for parquet statistics to -1 when actual number is not defined.

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1387: DRILL-6603: Set num_nulls 
for parquet statistics to -1 when actual number is not defined.
URL: https://github.com/apache/drill/pull/1387#discussion_r203590106
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/MetadataBase.java
 ##
 @@ -108,6 +105,18 @@
 
 
   public static abstract class ColumnMetadata {
+
+/**
+ * Number of nulls is considered to be valid if its value is not null and not -1.
+ *
+ * @return true if nulls value is defined, false otherwise
+ */
+@JsonIgnore
+public boolean isNullsDefined() {
+  Long nulls = getNulls();
 
 Review comment:
   Why is it necessary to deal both with `null` and `-1`? Why is `getNulls` a 
`Long` and not a `long`? Consider renaming it to `getNumNulls`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vrozov commented on a change in pull request #1387: DRILL-6603: Set num_nulls for parquet statistics to -1 when actual number is not defined.

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1387: DRILL-6603: Set num_nulls 
for parquet statistics to -1 when actual number is not defined.
URL: https://github.com/apache/drill/pull/1387#discussion_r203590424
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/MetadataBase.java
 ##
 @@ -108,6 +105,18 @@
 
 
   public static abstract class ColumnMetadata {
+
+/**
+ * Number of nulls is considered to be valid if its value is not null and not -1.
+ *
+ * @return true if nulls value is defined, false otherwise
+ */
+@JsonIgnore
+public boolean isNullsDefined() {
 
 Review comment:
   Consider `isNumNullsSet()`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
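
Putting vrozov's two comments together, the check on ColumnMetadata might end up looking like the following sketch (getNulls() is the accessor shown in the quoted diff; treating both null and -1 as "not set" mirrors the javadoc under review):

@JsonIgnore
public boolean isNumNullsSet() {
  // getNulls() is a Long in the quoted code: null means the metadata field
  // was absent, and -1 means Parquet did not record a null count.
  Long nulls = getNulls();
  return nulls != null && nulls != -1;
}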


[GitHub] vrozov commented on a change in pull request #1387: DRILL-6603: Set num_nulls for parquet statistics to -1 when actual number is not defined.

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1387: DRILL-6603: Set num_nulls 
for parquet statistics to -1 when actual number is not defined.
URL: https://github.com/apache/drill/pull/1387#discussion_r203541339
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetPredicatesHelper.java
 ##
 @@ -48,13 +48,19 @@ static boolean isAllNulls(Statistics stat, long rowCount) {
   }
 
   /**
-   * Checks that column chunk's statistics does not have nulls
+   * Checks that column chunk's statistics does not have nulls.
 
 Review comment:
   Please be consistent with `isAllNulls()` formatting: both should use (or not 
use) period.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Ben-Zvi commented on a change in pull request #1384: DRILL-6606: Fixed bug in HashJoin that caused it not to return OK_NEW_SCHEMA in some cases.

2018-07-18 Thread GitBox
Ben-Zvi commented on a change in pull request #1384: DRILL-6606: Fixed bug in 
HashJoin that caused it not to return OK_NEW_SCHEMA in some cases.
URL: https://github.com/apache/drill/pull/1384#discussion_r203586629
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
 ##
 @@ -213,86 +219,185 @@ public int getRecordCount() {
 
   @Override
   protected void buildSchema() throws SchemaChangeException {
-if (! prefetchFirstBatchFromBothSides()) {
-  return;
+// We must first get the schemas from upstream operators before we can 
build
+// our schema.
+boolean validSchema = sniffNewSchemas();
+
+if (validSchema) {
+  // We are able to construct a valid schema from the upstream data.
+  // Setting the state here makes sure AbstractRecordBatch returns 
OK_NEW_SCHEMA
+  state = BatchState.BUILD_SCHEMA;
+} else {
+  // We were not able to build a valid schema, so we need to set our termination state.
+  final Optional<BatchState> batchStateOpt = getBatchStateTermination();
+  state = batchStateOpt.get(); // There should be a state here.
 }
 
+// If we have a valid schema, this will build a valid container. If we were unable to obtain a valid schema,
+// we still need to build a dummy schema. This code handles both cases for us.
+setupOutputContainerSchema();
+container.buildSchema(BatchSchema.SelectionVectorMode.NONE);
+
 // Initialize the hash join helper context
-if (rightUpstream != IterOutcome.NONE) {
+if (rightUpstream == IterOutcome.OK_NEW_SCHEMA) {
+  // We only need the hash tables if we have data on the build side.
   setupHashTable();
 }
-setupOutputContainerSchema();
+
 try {
   hashJoinProbe = setupHashJoinProbe();
 } catch (IOException | ClassTransformationException e) {
   throw new SchemaChangeException(e);
 }
-
-container.buildSchema(BatchSchema.SelectionVectorMode.NONE);
   }
 
   @Override
   protected boolean prefetchFirstBatchFromBothSides() {
-leftUpstream = sniffNonEmptyBatch(0, left);
-rightUpstream = sniffNonEmptyBatch(1, right);
+if (leftUpstream != IterOutcome.NONE) {
+  // We can only get data if there is data available
+  leftUpstream = sniffNonEmptyBatch(leftUpstream, LEFT_INDEX, left);
+}
+
+if (rightUpstream != IterOutcome.NONE) {
+  // We can only get data if there is data available
+  rightUpstream = sniffNonEmptyBatch(rightUpstream, RIGHT_INDEX, right);
+}
+
+buildSideIsEmpty = rightUpstream == IterOutcome.NONE;
+
+final Optional<BatchState> batchStateOpt = getBatchStateTermination();
+
+if (batchStateOpt.isPresent()) {
+  // We reached a termination state
+  state = batchStateOpt.get();
+
+  switch (state) {
+case STOP:
+case OUT_OF_MEMORY:
+  // Terminate processing now
+  return false;
+case DONE:
+  // No more data but take operation to completion
+  return true;
+default:
+  throw new IllegalStateException();
+  }
+} else {
+  // For build side, use aggregate i.e. average row width across batches
+  batchMemoryManager.update(LEFT_INDEX, 0);
+  batchMemoryManager.update(RIGHT_INDEX, 0, true);
 
-// For build side, use aggregate i.e. average row width across batches
-batchMemoryManager.update(LEFT_INDEX, 0);
-batchMemoryManager.update(RIGHT_INDEX, 0, true);
+  logger.debug("BATCH_STATS, incoming left: {}", 
batchMemoryManager.getRecordBatchSizer(LEFT_INDEX));
+  logger.debug("BATCH_STATS, incoming right: {}", 
batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX));
 
-logger.debug("BATCH_STATS, incoming left: {}", 
batchMemoryManager.getRecordBatchSizer(LEFT_INDEX));
-logger.debug("BATCH_STATS, incoming right: {}", 
batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX));
+  // Got our first batch(es)
+  state = BatchState.FIRST;
+  return true;
+}
+  }
 
+  /**
+   * Checks if a termination state has been reached, and returns the 
appropriate termination state if it has been reached.
+   * @return The termination state if it has been reached. Otherwise empty.
+   */
+  private Optional<BatchState> getBatchStateTermination() {
 if (leftUpstream == IterOutcome.STOP || rightUpstream == IterOutcome.STOP) 
{
-  state = BatchState.STOP;
-  return false;
+  return Optional.of(BatchState.STOP);
 }
 
 if (leftUpstream == IterOutcome.OUT_OF_MEMORY || rightUpstream == 
IterOutcome.OUT_OF_MEMORY) {
-  state = BatchState.OUT_OF_MEMORY;
-  return false;
+  return Optional.of(BatchState.OUT_OF_MEMORY);
 }
 
 if (checkForEarlyFinish(leftUpstream, rightUpstream)) {
-  state = BatchState.DONE;
-  return false;
+  return Optional.of(BatchState.DONE);
+}
+
+return Optional.empty();
+  }
+
+  /**
+   * Sniffs all data necessary to construct a schema.
+   * @return True if all the 

[GitHub] Ben-Zvi commented on a change in pull request #1384: DRILL-6606: Fixed bug in HashJoin that caused it not to return OK_NEW_SCHEMA in some cases.

2018-07-18 Thread GitBox
Ben-Zvi commented on a change in pull request #1384: DRILL-6606: Fixed bug in 
HashJoin that caused it not to return OK_NEW_SCHEMA in some cases.
URL: https://github.com/apache/drill/pull/1384#discussion_r203538175
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
 ##
 @@ -213,86 +219,185 @@ public int getRecordCount() {
 
   @Override
   protected void buildSchema() throws SchemaChangeException {
-if (! prefetchFirstBatchFromBothSides()) {
-  return;
+// We must first get the schemas from upstream operators before we can 
build
+// our schema.
+boolean validSchema = sniffNewSchemas();
+
+if (validSchema) {
+  // We are able to construct a valid schema from the upstream data.
+  // Setting the state here makes sure AbstractRecordBatch returns 
OK_NEW_SCHEMA
+  state = BatchState.BUILD_SCHEMA;
+} else {
+  // We were not able to build a valid schema, so we need to set our termination state.
+  final Optional<BatchState> batchStateOpt = getBatchStateTermination();
+  state = batchStateOpt.get(); // There should be a state here.
 }
 
+// If we have a valid schema, this will build a valid container. If we were unable to obtain a valid schema,
+// we still need to build a dummy schema. This code handles both cases for us.
+setupOutputContainerSchema();
+container.buildSchema(BatchSchema.SelectionVectorMode.NONE);
+
 // Initialize the hash join helper context
-if (rightUpstream != IterOutcome.NONE) {
+if (rightUpstream == IterOutcome.OK_NEW_SCHEMA) {
+  // We only need the hash tables if we have data on the build side.
   setupHashTable();
 }
-setupOutputContainerSchema();
+
 try {
   hashJoinProbe = setupHashJoinProbe();
 } catch (IOException | ClassTransformationException e) {
   throw new SchemaChangeException(e);
 }
-
-container.buildSchema(BatchSchema.SelectionVectorMode.NONE);
   }
 
   @Override
   protected boolean prefetchFirstBatchFromBothSides() {
-leftUpstream = sniffNonEmptyBatch(0, left);
-rightUpstream = sniffNonEmptyBatch(1, right);
+if (leftUpstream != IterOutcome.NONE) {
+  // We can only get data if there is data available
+  leftUpstream = sniffNonEmptyBatch(leftUpstream, LEFT_INDEX, left);
+}
+
+if (rightUpstream != IterOutcome.NONE) {
+  // We can only get data if there is data available
+  rightUpstream = sniffNonEmptyBatch(rightUpstream, RIGHT_INDEX, right);
+}
+
+buildSideIsEmpty = rightUpstream == IterOutcome.NONE;
+
+final Optional<BatchState> batchStateOpt = getBatchStateTermination();
+
+if (batchStateOpt.isPresent()) {
+  // We reached a termination state
+  state = batchStateOpt.get();
+
+  switch (state) {
+case STOP:
+case OUT_OF_MEMORY:
+  // Terminate processing now
+  return false;
+case DONE:
+  // No more data but take operation to completion
+  return true;
+default:
+  throw new IllegalStateException();
+  }
+} else {
+  // For build side, use aggregate i.e. average row width across batches
+  batchMemoryManager.update(LEFT_INDEX, 0);
+  batchMemoryManager.update(RIGHT_INDEX, 0, true);
 
-// For build side, use aggregate i.e. average row width across batches
-batchMemoryManager.update(LEFT_INDEX, 0);
-batchMemoryManager.update(RIGHT_INDEX, 0, true);
+  logger.debug("BATCH_STATS, incoming left: {}", 
batchMemoryManager.getRecordBatchSizer(LEFT_INDEX));
+  logger.debug("BATCH_STATS, incoming right: {}", 
batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX));
 
-logger.debug("BATCH_STATS, incoming left: {}", 
batchMemoryManager.getRecordBatchSizer(LEFT_INDEX));
-logger.debug("BATCH_STATS, incoming right: {}", 
batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX));
+  // Got our first batch(es)
+  state = BatchState.FIRST;
+  return true;
+}
+  }
 
+  /**
+   * Checks if a termination state has been reached, and returns the 
appropriate termination state if it has been reached.
+   * @return The termination state if it has been reached. Otherwise empty.
+   */
+  private Optional<BatchState> getBatchStateTermination() {
 
 Review comment:
   Why re-implement *verifyOutcomeToSetBatchState()* ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Ben-Zvi commented on a change in pull request #1384: DRILL-6606: Fixed bug in HashJoin that caused it not to return OK_NEW_SCHEMA in some cases.

2018-07-18 Thread GitBox
Ben-Zvi commented on a change in pull request #1384: DRILL-6606: Fixed bug in 
HashJoin that caused it not to return OK_NEW_SCHEMA in some cases.
URL: https://github.com/apache/drill/pull/1384#discussion_r203588145
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
 ##
 @@ -213,86 +219,185 @@ public int getRecordCount() {
 
   @Override
   protected void buildSchema() throws SchemaChangeException {
-if (! prefetchFirstBatchFromBothSides()) {
-  return;
+// We must first get the schemas from upstream operators before we can 
build
+// our schema.
+boolean validSchema = sniffNewSchemas();
+
+if (validSchema) {
+  // We are able to construct a valid schema from the upstream data.
+  // Setting the state here makes sure AbstractRecordBatch returns 
OK_NEW_SCHEMA
+  state = BatchState.BUILD_SCHEMA;
+} else {
+  // We were not able to build a valid schema, so we need to set our termination state.
+  final Optional<BatchState> batchStateOpt = getBatchStateTermination();
+  state = batchStateOpt.get(); // There should be a state here.
 }
 
+// If we have a valid schema, this will build a valid container. If we were unable to obtain a valid schema,
+// we still need to build a dummy schema. This code handles both cases for us.
+setupOutputContainerSchema();
+container.buildSchema(BatchSchema.SelectionVectorMode.NONE);
+
 // Initialize the hash join helper context
-if (rightUpstream != IterOutcome.NONE) {
+if (rightUpstream == IterOutcome.OK_NEW_SCHEMA) {
+  // We only need the hash tables if we have data on the build side.
   setupHashTable();
 }
-setupOutputContainerSchema();
+
 try {
   hashJoinProbe = setupHashJoinProbe();
 } catch (IOException | ClassTransformationException e) {
   throw new SchemaChangeException(e);
 }
-
-container.buildSchema(BatchSchema.SelectionVectorMode.NONE);
   }
 
   @Override
   protected boolean prefetchFirstBatchFromBothSides() {
-leftUpstream = sniffNonEmptyBatch(0, left);
-rightUpstream = sniffNonEmptyBatch(1, right);
+if (leftUpstream != IterOutcome.NONE) {
+  // We can only get data if there is data available
+  leftUpstream = sniffNonEmptyBatch(leftUpstream, LEFT_INDEX, left);
+}
+
+if (rightUpstream != IterOutcome.NONE) {
+  // We can only get data if there is data available
+  rightUpstream = sniffNonEmptyBatch(rightUpstream, RIGHT_INDEX, right);
+}
+
+buildSideIsEmpty = rightUpstream == IterOutcome.NONE;
+
+final Optional<BatchState> batchStateOpt = getBatchStateTermination();
+
+if (batchStateOpt.isPresent()) {
+  // We reached a termination state
+  state = batchStateOpt.get();
+
+  switch (state) {
+case STOP:
+case OUT_OF_MEMORY:
+  // Terminate processing now
+  return false;
+case DONE:
+  // No more data but take operation to completion
+  return true;
+default:
+  throw new IllegalStateException();
+  }
+} else {
+  // For build side, use aggregate i.e. average row width across batches
+  batchMemoryManager.update(LEFT_INDEX, 0);
+  batchMemoryManager.update(RIGHT_INDEX, 0, true);
 
-// For build side, use aggregate i.e. average row width across batches
-batchMemoryManager.update(LEFT_INDEX, 0);
-batchMemoryManager.update(RIGHT_INDEX, 0, true);
+  logger.debug("BATCH_STATS, incoming left: {}", 
batchMemoryManager.getRecordBatchSizer(LEFT_INDEX));
+  logger.debug("BATCH_STATS, incoming right: {}", 
batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX));
 
-logger.debug("BATCH_STATS, incoming left: {}", 
batchMemoryManager.getRecordBatchSizer(LEFT_INDEX));
-logger.debug("BATCH_STATS, incoming right: {}", 
batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX));
+  // Got our first batch(es)
+  state = BatchState.FIRST;
+  return true;
+}
+  }
 
+  /**
+   * Checks if a termination state has been reached, and returns the 
appropriate termination state if it has been reached.
+   * @return The termination state if it has been reached. Otherwise empty.
+   */
+  private Optional<BatchState> getBatchStateTermination() {
 if (leftUpstream == IterOutcome.STOP || rightUpstream == IterOutcome.STOP) 
{
-  state = BatchState.STOP;
-  return false;
+  return Optional.of(BatchState.STOP);
 }
 
 if (leftUpstream == IterOutcome.OUT_OF_MEMORY || rightUpstream == 
IterOutcome.OUT_OF_MEMORY) {
-  state = BatchState.OUT_OF_MEMORY;
-  return false;
+  return Optional.of(BatchState.OUT_OF_MEMORY);
 }
 
 if (checkForEarlyFinish(leftUpstream, rightUpstream)) {
-  state = BatchState.DONE;
-  return false;
+  return Optional.of(BatchState.DONE);
+}
+
+return Optional.empty();
+  }
+
+  /**
+   * Sniffs all data necessary to construct a schema.
+   * @return True if all the 

[GitHub] Ben-Zvi commented on a change in pull request #1384: DRILL-6606: Fixed bug in HashJoin that caused it not to return OK_NEW_SCHEMA in some cases.

2018-07-18 Thread GitBox
Ben-Zvi commented on a change in pull request #1384: DRILL-6606: Fixed bug in 
HashJoin that caused it not to return OK_NEW_SCHEMA in some cases.
URL: https://github.com/apache/drill/pull/1384#discussion_r203576783
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
 ##
 @@ -317,7 +422,25 @@ public HashJoinMemoryCalculator getCalculatorImpl() {
 
   @Override
   public IterOutcome innerNext() {
-// In case incoming was killed before, just cleanup and return
+if (!prefetched) {
+  // If we didn't retrieve our first data hold batch, we need to do it now.
+  prefetched = true;
+  prefetchFirstBatchFromBothSides();
+
+  // Handle emitting the correct outcome for termination conditions
+  // Use the state set by prefetchFirstBatchFromBothSides to emit the 
correct termination outcome.
+  switch (state) {
+case DONE:
+  return IterOutcome.NONE;
+case STOP:
+  return IterOutcome.STOP;
+case OUT_OF_MEMORY:
+  return IterOutcome.OUT_OF_MEMORY;
+default:
+  // No termination condition so continue processing.
 
 Review comment:
   So basically this code was taken from AbstractRecordBatch:next(), following the buildSchema(), and moved into this innerNext() call. It is a hack.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Ben-Zvi commented on a change in pull request #1384: DRILL-6606: Fixed bug in HashJoin that caused it not to return OK_NEW_SCHEMA in some cases.

2018-07-18 Thread GitBox
Ben-Zvi commented on a change in pull request #1384: DRILL-6606: Fixed bug in 
HashJoin that caused it not to return OK_NEW_SCHEMA in some cases.
URL: https://github.com/apache/drill/pull/1384#discussion_r203557114
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
 ##
 @@ -213,86 +219,185 @@ public int getRecordCount() {
 
   @Override
   protected void buildSchema() throws SchemaChangeException {
-if (! prefetchFirstBatchFromBothSides()) {
-  return;
+// We must first get the schemas from upstream operators before we can 
build
+// our schema.
+boolean validSchema = sniffNewSchemas();
+
+if (validSchema) {
+  // We are able to construct a valid schema from the upstream data.
+  // Setting the state here makes sure AbstractRecordBatch returns 
OK_NEW_SCHEMA
+  state = BatchState.BUILD_SCHEMA;
+} else {
+  // We were not able to build a valid schema, so we need to set our termination state.
+  final Optional<BatchState> batchStateOpt = getBatchStateTermination();
+  state = batchStateOpt.get(); // There should be a state here.
 }
 
+// If we have a valid schema, this will build a valid container. If we were unable to obtain a valid schema,
+// we still need to build a dummy schema. This code handles both cases for us.
+setupOutputContainerSchema();
+container.buildSchema(BatchSchema.SelectionVectorMode.NONE);
+
 // Initialize the hash join helper context
-if (rightUpstream != IterOutcome.NONE) {
+if (rightUpstream == IterOutcome.OK_NEW_SCHEMA) {
+  // We only need the hash tables if we have data on the build side.
   setupHashTable();
 }
-setupOutputContainerSchema();
+
 try {
   hashJoinProbe = setupHashJoinProbe();
 } catch (IOException | ClassTransformationException e) {
   throw new SchemaChangeException(e);
 }
-
-container.buildSchema(BatchSchema.SelectionVectorMode.NONE);
   }
 
   @Override
   protected boolean prefetchFirstBatchFromBothSides() {
-leftUpstream = sniffNonEmptyBatch(0, left);
-rightUpstream = sniffNonEmptyBatch(1, right);
+if (leftUpstream != IterOutcome.NONE) {
+  // We can only get data if there is data available
+  leftUpstream = sniffNonEmptyBatch(leftUpstream, LEFT_INDEX, left);
+}
+
+if (rightUpstream != IterOutcome.NONE) {
+  // We can only get data if there is data available
+  rightUpstream = sniffNonEmptyBatch(rightUpstream, RIGHT_INDEX, right);
+}
+
+buildSideIsEmpty = rightUpstream == IterOutcome.NONE;
+
+final Optional<BatchState> batchStateOpt = getBatchStateTermination();
+
+if (batchStateOpt.isPresent()) {
+  // We reached a termination state
+  state = batchStateOpt.get();
+
+  switch (state) {
+case STOP:
+case OUT_OF_MEMORY:
+  // Terminate processing now
+  return false;
+case DONE:
+  // No more data but take operation to completion
+  return true;
+default:
+  throw new IllegalStateException();
+  }
+} else {
+  // For build side, use aggregate i.e. average row width across batches
+  batchMemoryManager.update(LEFT_INDEX, 0);
+  batchMemoryManager.update(RIGHT_INDEX, 0, true);
 
-// For build side, use aggregate i.e. average row width across batches
-batchMemoryManager.update(LEFT_INDEX, 0);
-batchMemoryManager.update(RIGHT_INDEX, 0, true);
+  logger.debug("BATCH_STATS, incoming left: {}", 
batchMemoryManager.getRecordBatchSizer(LEFT_INDEX));
+  logger.debug("BATCH_STATS, incoming right: {}", 
batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX));
 
-logger.debug("BATCH_STATS, incoming left: {}", 
batchMemoryManager.getRecordBatchSizer(LEFT_INDEX));
-logger.debug("BATCH_STATS, incoming right: {}", 
batchMemoryManager.getRecordBatchSizer(RIGHT_INDEX));
+  // Got our first batch(es)
+  state = BatchState.FIRST;
+  return true;
+}
+  }
 
+  /**
+   * Checks if a termination state has been reached, and returns the 
appropriate termination state if it has been reached.
+   * @return The termination state if it has been reached. Otherwise empty.
+   */
+  private Optional<BatchState> getBatchStateTermination() {
 if (leftUpstream == IterOutcome.STOP || rightUpstream == IterOutcome.STOP) 
{
-  state = BatchState.STOP;
-  return false;
+  return Optional.of(BatchState.STOP);
 }
 
 if (leftUpstream == IterOutcome.OUT_OF_MEMORY || rightUpstream == 
IterOutcome.OUT_OF_MEMORY) {
-  state = BatchState.OUT_OF_MEMORY;
-  return false;
+  return Optional.of(BatchState.OUT_OF_MEMORY);
 }
 
 if (checkForEarlyFinish(leftUpstream, rightUpstream)) {
-  state = BatchState.DONE;
-  return false;
+  return Optional.of(BatchState.DONE);
+}
+
+return Optional.empty();
+  }
+
+  /**
+   * Sniffs all data necessary to construct a schema.
+   * @return True if all the 

[GitHub] Ben-Zvi commented on a change in pull request #1384: DRILL-6606: Fixed bug in HashJoin that caused it not to return OK_NEW_SCHEMA in some cases.

2018-07-18 Thread GitBox
Ben-Zvi commented on a change in pull request #1384: DRILL-6606: Fixed bug in 
HashJoin that caused it not to return OK_NEW_SCHEMA in some cases.
URL: https://github.com/apache/drill/pull/1384#discussion_r203555686
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
 ##
 @@ -213,86 +219,185 @@ public int getRecordCount() {
 
   @Override
   protected void buildSchema() throws SchemaChangeException {
-if (! prefetchFirstBatchFromBothSides()) {
-  return;
+// We must first get the schemas from upstream operators before we can 
build
+// our schema.
+boolean validSchema = sniffNewSchemas();
+
+if (validSchema) {
+  // We are able to construct a valid schema from the upstream data.
+  // Setting the state here makes sure AbstractRecordBatch returns 
OK_NEW_SCHEMA
+  state = BatchState.BUILD_SCHEMA;
+} else {
+  // We were not able to build a valid schema, so we need to set our termination state.
+  final Optional<BatchState> batchStateOpt = getBatchStateTermination();
+  state = batchStateOpt.get(); // There should be a state here.
 }
 
+// If we have a valid schema, this will build a valid container. If we were unable to obtain a valid schema,
+// we still need to build a dummy schema. This code handles both cases for us.
+setupOutputContainerSchema();
+container.buildSchema(BatchSchema.SelectionVectorMode.NONE);
+
 // Initialize the hash join helper context
-if (rightUpstream != IterOutcome.NONE) {
+if (rightUpstream == IterOutcome.OK_NEW_SCHEMA) {
+  // We only need the hash tables if we have data on the build side.
   setupHashTable();
 }
-setupOutputContainerSchema();
+
 try {
   hashJoinProbe = setupHashJoinProbe();
 } catch (IOException | ClassTransformationException e) {
   throw new SchemaChangeException(e);
 }
-
-container.buildSchema(BatchSchema.SelectionVectorMode.NONE);
   }
 
   @Override
   protected boolean prefetchFirstBatchFromBothSides() {
-leftUpstream = sniffNonEmptyBatch(0, left);
-rightUpstream = sniffNonEmptyBatch(1, right);
+if (leftUpstream != IterOutcome.NONE) {
+  // We can only get data if there is data available
+  leftUpstream = sniffNonEmptyBatch(leftUpstream, LEFT_INDEX, left);
+}
+
+if (rightUpstream != IterOutcome.NONE) {
+  // We can only get data if there is data available
+  rightUpstream = sniffNonEmptyBatch(rightUpstream, RIGHT_INDEX, right);
+}
+
+buildSideIsEmpty = rightUpstream == IterOutcome.NONE;
+
+final Optional<BatchState> batchStateOpt = getBatchStateTermination();
+
+if (batchStateOpt.isPresent()) {
+  // We reached a termination state
+  state = batchStateOpt.get();
+
+  switch (state) {
+case STOP:
+case OUT_OF_MEMORY:
+  // Terminate processing now
+  return false;
+case DONE:
+  // No more data but take operation to completion
+  return true;
+default:
+  throw new IllegalStateException();
+  }
+} else {
+  // For build side, use aggregate i.e. average row width across batches
+  batchMemoryManager.update(LEFT_INDEX, 0);
 
 Review comment:
   While the special handling of STOP above prevents failures of an uninitialized container (DRILL-6517) for STOP, this code does not cover the case of an uninitialized container (from a single side) with a NONE.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: [ANNOUNCE] New PMC Chair of Apache Drill

2018-07-18 Thread Charles Givre
Congrats Arina!! Well done!

> On Jul 18, 2018, at 20:59, Paul Rogers  wrote:
> 
> Congratulations Arina!
> 
> - Paul
> 
> 
> 
>On Wednesday, July 18, 2018, 2:19:44 PM PDT, Aman Sinha 
>  wrote:  
> 
> Drill developers,
> Time flies and it is time for a new PMC chair !  Thank you all for your
> support during the past year.
> 
> I am very pleased to announce that the Drill PMC has voted to elect Arina
> Ielchiieva as the new PMC chair of Apache Drill.  She has also been
> approved unanimously by the Apache Board in today's board meeting.  Please
> join me in congratulating Arina !
> 
> Thanks,
> Aman



Re: [ANNOUNCE] New PMC Chair of Apache Drill

2018-07-18 Thread Paul Rogers
Congratulations Arina!

- Paul

 

On Wednesday, July 18, 2018, 2:19:44 PM PDT, Aman Sinha 
 wrote:  
 
 Drill developers,
Time flies and it is time for a new PMC chair !  Thank you all for your
support during the past year.

I am very pleased to announce that the Drill PMC has voted to elect Arina
Ielchiieva as the new PMC chair of Apache Drill.  She has also been
approved unanimously by the Apache Board in today's board meeting.  Please
join me in congratulating Arina !

Thanks,
Aman
  

[GitHub] vrozov commented on a change in pull request #1333: DRILL-6410: Memory leak in Parquet Reader during cancellation

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1333: DRILL-6410: Memory leak in 
Parquet Reader during cancellation
URL: https://github.com/apache/drill/pull/1333#discussion_r203570718
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/AsyncPageReader.java
 ##
 @@ -83,21 +80,18 @@
 class AsyncPageReader extends PageReader {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(AsyncPageReader.class);
 
-  private ExecutorService threadPool;
-  private long queueSize;
-  private LinkedBlockingQueue<ReadStatus> pageQueue;
-  private ConcurrentLinkedQueue<Future<Void>> asyncPageRead;
-  private long totalPageValuesRead = 0;
-  private Object pageQueueSyncronize = new Object(); // Object to use to synchronize access to the page Queue.
- // FindBugs complains if we synchronize on a Concurrent Queue
+  private final ExecutableTasksLatch executableTasksLatch;
 
 Review comment:
   - IMO, most of the time, the source code should document itself; Javadoc is necessary for an API distributed as an already compiled library/jar.
   - I would prefer reviewers to point to obscure code rather than to rely on the documentation.
   - Waiting for review comments helps to avoid inconsistency between the code and the documentation (quite a common problem), as code usually evolves faster and documentation lags behind.
   - I already added documentation for `ExecutableTasksLatch` and `ExecutableTask`.
   - I'll change comments for `AsyncPageReader` during rebase and merge conflict resolution.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


RE: [ANNOUNCE] New PMC Chair of Apache Drill

2018-07-18 Thread Robert Wu
Congratulations, Arina!

Best regards,

Rob

-Original Message-
From: Padma Penumarthy  
Sent: Wednesday, July 18, 2018 5:01 PM
To: dev@drill.apache.org
Subject: Re: [ANNOUNCE] New PMC Chair of Apache Drill

Arina, Congratulations and best wishes.

Thanks
Padma



On Wed, Jul 18, 2018 at 4:54 PM, Bridget Bevens  wrote:

> Congratulations, Arina!!!
>
> On Wed, Jul 18, 2018 at 3:20 PM, Parth Chandra  wrote:
>
> > Congratulations
> >
> > On Wed, Jul 18, 2018 at 3:14 PM, Kunal Khatua  wrote:
> >
> > > Congratulations, Arina !
> > > On 7/18/2018 2:26:05 PM, Volodymyr Vysotskyi 
> > > 
> > wrote:
> > > Congratulations, Arina! Well deserved!
> > >
> > > Kind regards,
> > > Volodymyr Vysotskyi
> > >
> > >
> > > On Thu, Jul 19, 2018 at 12:24 AM Abhishek Girish wrote:
> > >
> > > > Congratulations, Arina!
> > > >
> > > > On Wed, Jul 18, 2018 at 2:19 PM Aman Sinha wrote:
> > > >
> > > > > Drill developers,
> > > > > Time flies and it is time for a new PMC chair ! Thank you all 
> > > > > for
> > your
> > > > > support during the past year.
> > > > >
> > > > > I am very pleased to announce that the Drill PMC has voted to 
> > > > > elect
> > > Arina
> > > > > Ielchiieva as the new PMC chair of Apache Drill. She has also 
> > > > > been approved unanimously by the Apache Board in today's board 
> > > > > meeting.
> > > > Please
> > > > > join me in congratulating Arina !
> > > > >
> > > > > Thanks,
> > > > > Aman
> > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] New PMC Chair of Apache Drill

2018-07-18 Thread Padma Penumarthy
Arina, Congratulations and best wishes.

Thanks
Padma



On Wed, Jul 18, 2018 at 4:54 PM, Bridget Bevens  wrote:

> Congratulations, Arina!!!
>
> On Wed, Jul 18, 2018 at 3:20 PM, Parth Chandra  wrote:
>
> > Congratulations
> >
> > On Wed, Jul 18, 2018 at 3:14 PM, Kunal Khatua  wrote:
> >
> > > Congratulations, Arina !
> > > On 7/18/2018 2:26:05 PM, Volodymyr Vysotskyi 
> > wrote:
> > > Congratulations, Arina! Well deserved!
> > >
> > > Kind regards,
> > > Volodymyr Vysotskyi
> > >
> > >
> > > On Thu, Jul 19, 2018 at 12:24 AM Abhishek Girish wrote:
> > >
> > > > Congratulations, Arina!
> > > >
> > > > On Wed, Jul 18, 2018 at 2:19 PM Aman Sinha wrote:
> > > >
> > > > > Drill developers,
> > > > > Time flies and it is time for a new PMC chair ! Thank you all for
> > your
> > > > > support during the past year.
> > > > >
> > > > > I am very pleased to announce that the Drill PMC has voted to elect
> > > Arina
> > > > > Ielchiieva as the new PMC chair of Apache Drill. She has also been
> > > > > approved unanimously by the Apache Board in today's board meeting.
> > > > Please
> > > > > join me in congratulating Arina !
> > > > >
> > > > > Thanks,
> > > > > Aman
> > > > >
> > > >
> > >
> >
>


[GitHub] vrozov commented on a change in pull request #1333: DRILL-6410: Memory leak in Parquet Reader during cancellation

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1333: DRILL-6410: Memory leak in 
Parquet Reader during cancellation
URL: https://github.com/apache/drill/pull/1333#discussion_r203566882
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/util/ExecutableTasksLatch.java
 ##
 @@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.util;
+
+import java.util.Collection;
+import java.util.LinkedList;
+import java.util.Queue;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.ThreadPoolExecutor;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.locks.LockSupport;
+import java.util.function.BooleanSupplier;
+
+import org.apache.drill.exec.testing.CountDownLatchInjection;
+import org.apache.drill.exec.testing.NoOpControlsInjector;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+
+/**
+ * A wrapper class around {@linkplain ExecutorService an execution service} 
that allows a thread that instantiated an
+ * instance of the class to wait for a submitted task to complete (either 
successfully or unsuccessfully) or to wait for
+ * all submitted tasks to complete.
+ * @param <C> type of tasks to execute. C must extend {@linkplain Callable}{@literal <Void>}
+ */
+public class ExecutableTasksLatch<C extends Callable<Void>> {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ExecutableTasksLatch.class);
+
+  /**
+   * An interface that class {@literal } may optionally implement to 
receive a callback when a submitted for execution
+   * task starts or finishes.
+   */
+  public interface Notifiable {
+/**
+ * Notify a task that it is assigned to a thread for execution
+ */
+void started();
+
+/**
+ * A callback in case a task is considered to be successfully completed 
(not cancelled).
+ */
+void finished();
+  }
+
+  private final ExecutorService executor;
+  private final Thread thread;
+  private final AtomicInteger count;
+  private final Queue> executableTasks;
+  private final CountDownLatchInjection testCountDownLatch;
+
+  /**
+   * Constructs an instance of ExecutableTasksLatch. A thread where 
construction is done becomes the waiting
+   * thread.
+   * @param executor instance of {@linkplain ExecutorService execution 
service} to wrap
+   * @param testCountDownLatch optional {@linkplain CountDownLatchInjection}
+   */
+  public ExecutableTasksLatch(ExecutorService executor, 
CountDownLatchInjection testCountDownLatch) {
+this.executor = executor;
+thread = Thread.currentThread();
+count = new AtomicInteger();
+executableTasks = new LinkedList<>();
+this.testCountDownLatch = testCountDownLatch == null ? 
NoOpControlsInjector.LATCH : testCountDownLatch;
+  }
+
+  /**
+   * Waits for an earliest submitted task to complete and removes task from a 
collection of known tasks.
+   * @throws ExecutionException if task threw an Exception during execution
+   * @throws InterruptedException if wait is interrupted
+   */
+  public void take() throws ExecutionException, InterruptedException {
+final ExecutableTask task = executableTasks.peek();
+Preconditions.checkState(task != null, "No tasks are scheduled for 
execution");
+while (ExecutableTask.State.COMPLETING.compareTo(task.state.get()) > 0) {
+  if (Thread.interrupted()) {
+throw new InterruptedException();
+  }
+  LockSupport.park();
+}
+executableTasks.remove();
+while (ExecutableTask.State.COMPLETING.compareTo(task.state.get()) == 0) {
+  Thread.yield();
+}
+
+if (task.exception != null) {
+  throw task.exception;
+}
+  }
+
+  /**
+   * @return immutable collection of submitted for execution tasks that are 
not yet taken from
+   * the {@linkplain ExecutableTasksLatch}
+   */
+  public Collection> getExecutableTasks() {
+return ImmutableList.copyOf(executableTasks);
+  }
+
+  /**
+   * submits a task for execution by {@linkplain ExecutorService}
+   * @param callable task to e

Re: [ANNOUNCE] New PMC Chair of Apache Drill

2018-07-18 Thread Bridget Bevens
Congratulations, Arina!!!

On Wed, Jul 18, 2018 at 3:20 PM, Parth Chandra  wrote:

> Congratulations
>
> On Wed, Jul 18, 2018 at 3:14 PM, Kunal Khatua  wrote:
>
> > Congratulations, Arina !
> > On 7/18/2018 2:26:05 PM, Volodymyr Vysotskyi 
> wrote:
> > Congratulations, Arina! Well deserved!
> >
> > Kind regards,
> > Volodymyr Vysotskyi
> >
> >
> > On Thu, Jul 19, 2018 at 12:24 AM Abhishek Girish wrote:
> >
> > > Congratulations, Arina!
> > >
> > > On Wed, Jul 18, 2018 at 2:19 PM Aman Sinha wrote:
> > >
> > > > Drill developers,
> > > > Time flies and it is time for a new PMC chair ! Thank you all for
> your
> > > > support during the past year.
> > > >
> > > > I am very pleased to announce that the Drill PMC has voted to elect
> > Arina
> > > > Ielchiieva as the new PMC chair of Apache Drill. She has also been
> > > > approved unanimously by the Apache Board in today's board meeting.
> > > Please
> > > > join me in congratulating Arina !
> > > >
> > > > Thanks,
> > > > Aman
> > > >
> > >
> >
>


[GitHub] vrozov commented on a change in pull request #1333: DRILL-6410: Memory leak in Parquet Reader during cancellation

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1333: DRILL-6410: Memory leak in 
Parquet Reader during cancellation
URL: https://github.com/apache/drill/pull/1333#discussion_r203566491
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/ExecutableTasksLatch.java
 ##
 @@ -0,0 +1,255 @@
 [The quoted file context is identical to the ExecutableTasksLatch.java excerpt shown earlier in this digest and is omitted here.]

[GitHub] vrozov commented on a change in pull request #1333: DRILL-6410: Memory leak in Parquet Reader during cancellation

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1333: DRILL-6410: Memory leak in 
Parquet Reader during cancellation
URL: https://github.com/apache/drill/pull/1333#discussion_r203566303
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/AsyncPageReader.java
 ##
 @@ -284,30 +278,9 @@ protected void nextInternal() throws IOException {
 
   }
 
-  private void waitForExecutionResult() throws InterruptedException, ExecutionException {
-    // Get the execution result but don't remove the Future object from the "asyncPageRead" queue yet;
-    // this will ensure that cleanup will happen properly in case of an exception being thrown
-    asyncPageRead.peek().get(); // get the result of execution
-    // Alright now remove the Future object
-    asyncPageRead.poll();
-  }
-
   @Override public void clear() {
     //Cancelling all existing AsyncPageReaderTasks
-    while (asyncPageRead != null && !asyncPageRead.isEmpty()) {
-      try {
-        Future<Void> f = asyncPageRead.poll();
-        if(!f.isDone() && !f.isCancelled()){
-          f.cancel(true);
-        } else {
-          f.get(1, TimeUnit.MILLISECONDS);
-        }
-      } catch (RuntimeException e) {
-        // Do Nothing
-      } catch (Exception e) {
-        // Do nothing.
-      }
-    }
+    executableTasksLatch.await(() -> true);
 
 Review comment:
   No, `ExecutableTasksLatch.await()` guarantees that when it returns, all tasks submitted for execution are either done or cancelled. `Future.cancel()` does not wait for the `FutureTask` to be cancelled; it merely interrupts the thread where the `FutureTask` is running (in case it is already running). Note that after a `Future` is cancelled it is not possible to check whether it is finished or not (`Future.get` throws `CancellationException`).
   
   The old code was missing the guarantee that all tasks are finished when `clear()` returns, so some tasks kept running and referencing vectors, and the allocator reported a memory leak.
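   To make the `Future.cancel()` semantics above concrete, here is a minimal, self-contained Java sketch (illustrative code written for this explanation, not taken from the PR):

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CancelSemanticsDemo {
  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    Future<Void> future = pool.submit(() -> {
      try {
        TimeUnit.SECONDS.sleep(60);       // stands in for a long async page read
      } catch (InterruptedException e) {
        // cancel(true) only delivers this interrupt; the task body may keep running
        TimeUnit.MILLISECONDS.sleep(500); // simulate cleanup that outlives cancel()
      }
      return null;
    });

    TimeUnit.MILLISECONDS.sleep(100);     // let the task start
    future.cancel(true);                  // returns immediately, does not wait for the task
    System.out.println("isDone() right after cancel(): " + future.isDone()); // prints true
    try {
      future.get();                       // cannot be used to observe real completion any more
    } catch (CancellationException e) {
      System.out.println("get() throws CancellationException once cancelled");
    }
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
  }
}
```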


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: [ANNOUNCE] New PMC Chair of Apache Drill

2018-07-18 Thread Gautam Parai
Congratulations Arina!

Gautam

On Wed, Jul 18, 2018 at 4:03 PM, Boaz Ben-Zvi  wrote:

>"plus one" more congratulations 
>
>
>
> On 7/18/18 3:20 PM, Parth Chandra wrote:
>
>> Congratulations
>>
>> On Wed, Jul 18, 2018 at 3:14 PM, Kunal Khatua  wrote:
>>
>> Congratulations, Arina !
>>> On 7/18/2018 2:26:05 PM, Volodymyr Vysotskyi 
>>> wrote:
>>> Congratulations, Arina! Well deserved!
>>>
>>> Kind regards,
>>> Volodymyr Vysotskyi
>>>
>>>
>>> On Thu, Jul 19, 2018 at 12:24 AM Abhishek Girish wrote:
>>>
>>>> Congratulations, Arina!
>>>>
>>>> On Wed, Jul 18, 2018 at 2:19 PM Aman Sinha wrote:
>>>>
>>>>> Drill developers,
>>>>> Time flies and it is time for a new PMC chair ! Thank you all for your
>>>>> support during the past year.
>>>>>
>>>>> I am very pleased to announce that the Drill PMC has voted to elect Arina
>>>>> Ielchiieva as the new PMC chair of Apache Drill. She has also been
>>>>> approved unanimously by the Apache Board in today's board meeting. Please
>>>>> join me in congratulating Arina !
>>>>>
>>>>> Thanks,
>>>>> Aman


Re: [ANNOUNCE] New PMC Chair of Apache Drill

2018-07-18 Thread Boaz Ben-Zvi

   "plus one" more congratulations 

On 7/18/18 3:20 PM, Parth Chandra wrote:
> Congratulations
>
> On Wed, Jul 18, 2018 at 3:14 PM, Kunal Khatua wrote:
>
>> Congratulations, Arina !
>> On 7/18/2018 2:26:05 PM, Volodymyr Vysotskyi wrote:
>> Congratulations, Arina! Well deserved!
>>
>> Kind regards,
>> Volodymyr Vysotskyi
>>
>>
>> On Thu, Jul 19, 2018 at 12:24 AM Abhishek Girish wrote:
>>
>>> Congratulations, Arina!
>>>
>>> On Wed, Jul 18, 2018 at 2:19 PM Aman Sinha wrote:
>>>
>>>> Drill developers,
>>>> Time flies and it is time for a new PMC chair !  Thank you all for your
>>>> support during the past year.
>>>>
>>>> I am very pleased to announce that the Drill PMC has voted to elect Arina
>>>> Ielchiieva as the new PMC chair of Apache Drill.  She has also been
>>>> approved unanimously by the Apache Board in today's board meeting.  Please
>>>> join me in congratulating Arina !
>>>>
>>>> Thanks,
>>>> Aman





[GitHub] parthchandra commented on a change in pull request #1333: DRILL-6410: Memory leak in Parquet Reader during cancellation

2018-07-18 Thread GitBox
parthchandra commented on a change in pull request #1333: DRILL-6410: Memory 
leak in Parquet Reader during cancellation
URL: https://github.com/apache/drill/pull/1333#discussion_r203556768
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/AsyncPageReader.java
 ##
 @@ -284,30 +278,9 @@ protected void nextInternal() throws IOException {
 [The quoted diff is identical to the AsyncPageReader.clear() hunk shown earlier in this digest and is omitted here.]
 
 Review comment:
   Isn't the original code doing the same thing that `await` is doing? TBH, I'd really like to understand where the memory leak was occurring. (Just trying to understand how this PR does, in fact, fix the issue.)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] parthchandra commented on a change in pull request #1333: DRILL-6410: Memory leak in Parquet Reader during cancellation

2018-07-18 Thread GitBox
parthchandra commented on a change in pull request #1333: DRILL-6410: Memory 
leak in Parquet Reader during cancellation
URL: https://github.com/apache/drill/pull/1333#discussion_r203207677
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/AsyncPageReader.java
 ##
 @@ -83,21 +80,18 @@
 class AsyncPageReader extends PageReader {
   static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(AsyncPageReader.class);
 
-  private ExecutorService threadPool;
-  private long queueSize;
-  private LinkedBlockingQueue<ReadStatus> pageQueue;
-  private ConcurrentLinkedQueue<Future<Void>> asyncPageRead;
-  private long totalPageValuesRead = 0;
-  private Object pageQueueSyncronize = new Object(); // Object to use to synchronize access to the page Queue.
-                                                     // FindBugs complains if we synchronize on a Concurrent Queue
+  private final ExecutableTasksLatch executableTasksLatch;
 
 Review comment:
   The documentation would be useful for the reviewer too. It is easier to check the code if one knows what guarantees ExecutableTasksLatch is supposed to provide.
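   In the meantime, here is a minimal, self-contained sketch of the waiting technique that `take()` appears to use (park until the task state passes COMPLETING, then briefly yield); this is illustrative code written for this review thread, not the PR's implementation:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

public class ParkWaitDemo {
  enum State { RUNNING, COMPLETING, DONE }

  public static void main(String[] args) {
    final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);
    final Thread waiter = Thread.currentThread();

    Thread worker = new Thread(() -> {
      try {
        Thread.sleep(200);               // simulate work
      } catch (InterruptedException ignored) {
      }
      state.set(State.COMPLETING);       // publish "about to finish"
      state.set(State.DONE);             // publish completion
      LockSupport.unpark(waiter);        // wake the waiting thread
    });
    worker.start();

    // Park until the state reaches at least COMPLETING; the loop guards
    // against spurious wakeups, and a pre-delivered unpark permit makes
    // park() return immediately, so the wakeup cannot be lost.
    while (State.COMPLETING.compareTo(state.get()) > 0) {
      LockSupport.park();
    }
    // Spin briefly while the worker finishes publishing its final state.
    while (state.get() == State.COMPLETING) {
      Thread.yield();
    }
    System.out.println("task observed in state " + state.get());
  }
}
```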


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] parthchandra commented on a change in pull request #1333: DRILL-6410: Memory leak in Parquet Reader during cancellation

2018-07-18 Thread GitBox
parthchandra commented on a change in pull request #1333: DRILL-6410: Memory 
leak in Parquet Reader during cancellation
URL: https://github.com/apache/drill/pull/1333#discussion_r203210709
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/AsyncPageReader.java
 ##
 @@ -83,21 +80,18 @@
 [The quoted diff is identical to the AsyncPageReader field-declarations hunk shown earlier in this digest and is omitted here.]
 
 Review comment:
   Note that the class comment for AsyncPageReader needs to be updated too.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] parthchandra commented on a change in pull request #1333: DRILL-6410: Memory leak in Parquet Reader during cancellation

2018-07-18 Thread GitBox
parthchandra commented on a change in pull request #1333: DRILL-6410: Memory 
leak in Parquet Reader during cancellation
URL: https://github.com/apache/drill/pull/1333#discussion_r203482144
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/ExecutableTasksLatch.java
 ##
 @@ -0,0 +1,255 @@
 [The quoted file context is identical to the ExecutableTasksLatch.java excerpt shown earlier in this digest and is omitted here.]

[GitHub] parthchandra commented on a change in pull request #1333: DRILL-6410: Memory leak in Parquet Reader during cancellation

2018-07-18 Thread GitBox
parthchandra commented on a change in pull request #1333: DRILL-6410: Memory 
leak in Parquet Reader during cancellation
URL: https://github.com/apache/drill/pull/1333#discussion_r203556793
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/ExecutableTasksLatch.java
 ##
 @@ -0,0 +1,255 @@
 [The quoted file context is identical to the ExecutableTasksLatch.java excerpt shown earlier in this digest and is omitted here.]

Re: [ANNOUNCE] New PMC Chair of Apache Drill

2018-07-18 Thread Parth Chandra
Congratulations

On Wed, Jul 18, 2018 at 3:14 PM, Kunal Khatua  wrote:

> Congratulations, Arina !
> On 7/18/2018 2:26:05 PM, Volodymyr Vysotskyi  wrote:
> Congratulations, Arina! Well deserved!
>
> Kind regards,
> Volodymyr Vysotskyi
>
>
> On Thu, Jul 19, 2018 at 12:24 AM Abhishek Girish wrote:
>
> > Congratulations, Arina!
> >
> > On Wed, Jul 18, 2018 at 2:19 PM Aman Sinha wrote:
> >
> > > Drill developers,
> > > Time flies and it is time for a new PMC chair ! Thank you all for your
> > > support during the past year.
> > >
> > > I am very pleased to announce that the Drill PMC has voted to elect
> Arina
> > > Ielchiieva as the new PMC chair of Apache Drill. She has also been
> > > approved unanimously by the Apache Board in today's board meeting.
> > Please
> > > join me in congratulating Arina !
> > >
> > > Thanks,
> > > Aman
> > >
> >
>


Re: [ANNOUNCE] New PMC Chair of Apache Drill

2018-07-18 Thread Kunal Khatua
Congratulations, Arina !
On 7/18/2018 2:26:05 PM, Volodymyr Vysotskyi  wrote:
Congratulations, Arina! Well deserved!

Kind regards,
Volodymyr Vysotskyi


On Thu, Jul 19, 2018 at 12:24 AM Abhishek Girish wrote:

> Congratulations, Arina!
>
> On Wed, Jul 18, 2018 at 2:19 PM Aman Sinha wrote:
>
> > Drill developers,
> > Time flies and it is time for a new PMC chair ! Thank you all for your
> > support during the past year.
> >
> > I am very pleased to announce that the Drill PMC has voted to elect Arina
> > Ielchiieva as the new PMC chair of Apache Drill. She has also been
> > approved unanimously by the Apache Board in today's board meeting.
> Please
> > join me in congratulating Arina !
> >
> > Thanks,
> > Aman
> >
>


[jira] [Resolved] (DRILL-6605) TPCDS-84 Query does not return any rows

2018-07-18 Thread Robert Hou (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-6605.
---
Resolution: Fixed

> TPCDS-84 Query does not return any rows
> ---
>
> Key: DRILL-6605
> URL: https://issues.apache.org/jira/browse/DRILL-6605
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Robert Hou
>Assignee: Robert Hou
>Priority: Major
> Attachments: drillbit.log.node80, drillbit.log.node81, 
> drillbit.log.node82, drillbit.log.node83, drillbit.log.node85, 
> drillbit.log.node86, drillbit.log.node87, drillbit.log.node88
>
>
> Query is:
> Advanced/tpcds/tpcds_sf100/hive/parquet/query84.sql
> This uses the hive parquet reader.
> {code:sql}
> SELECT c_customer_id   AS customer_id,
> c_last_name
> || ', '
> || c_first_name AS customername
> FROM   customer,
> customer_address,
> customer_demographics,
> household_demographics,
> income_band,
> store_returns
> WHERE  ca_city = 'Green Acres'
> AND c_current_addr_sk = ca_address_sk
> AND ib_lower_bound >= 54986
> AND ib_upper_bound <= 54986 + 5
> AND ib_income_band_sk = hd_income_band_sk
> AND cd_demo_sk = c_current_cdemo_sk
> AND hd_demo_sk = c_current_hdemo_sk
> AND sr_cdemo_sk = cd_demo_sk
> ORDER  BY c_customer_id
> LIMIT 100
> {code}
> This query should return 100 rows.  It does not return any rows.
> Here is the explain plan:
> {noformat}
> | 00-00Screen
> 00-01  Project(customer_id=[$0], customername=[$1])
> 00-02SelectionVectorRemover
> 00-03  Limit(fetch=[100])
> 00-04SingleMergeExchange(sort0=[0])
> 01-01  OrderedMuxExchange(sort0=[0])
> 02-01SelectionVectorRemover
> 02-02  TopN(limit=[100])
> 02-03HashToRandomExchange(dist0=[[$0]])
> 03-01  Project(customer_id=[$0], customername=[||(||($5, 
> ', '), $4)])
> 03-02Project(c_customer_id=[$1], 
> c_current_cdemo_sk=[$2], c_current_hdemo_sk=[$3], c_current_addr_sk=[$4], 
> c_first_name=[$5], c_last_name=[$6], ca_address_sk=[$8], ca_city=[$9], 
> cd_demo_sk=[$7], hd_demo_sk=[$10], hd_income_band_sk=[$11], 
> ib_income_band_sk=[$12], ib_lower_bound=[$13], ib_upper_bound=[$14], 
> sr_cdemo_sk=[$0])
> 03-03  HashJoin(condition=[=($7, $0)], 
> joinType=[inner])
> 03-05HashToRandomExchange(dist0=[[$0]])
> 04-01  Scan(groupscan=[HiveScan 
> [table=Table(dbName:tpcds100_parquet, tableName:store_returns), 
> columns=[`sr_cdemo_sk`], numPartitions=0, partitions= null, 
> inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/web_returns], 
> confProperties={}]])
> 03-04HashToRandomExchange(dist0=[[$6]])
> 05-01  HashJoin(condition=[=($2, $9)], 
> joinType=[inner])
> 05-03HashJoin(condition=[=($3, $7)], 
> joinType=[inner])
> 05-05  HashJoin(condition=[=($1, $6)], 
> joinType=[inner])
> 05-07Scan(groupscan=[HiveScan 
> [table=Table(dbName:tpcds100_parquet, tableName:customer), 
> columns=[`c_customer_id`, `c_current_cdemo_sk`, `c_current_hdemo_sk`, 
> `c_current_addr_sk`, `c_first_name`, `c_last_name`], numPartitions=0, 
> partitions= null, 
> inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/customer], 
> confProperties={}]])
> 05-06BroadcastExchange
> 06-01  Scan(groupscan=[HiveScan 
> [table=Table(dbName:tpcds100_parquet, tableName:customer_demographics), 
> columns=[`cd_demo_sk`], numPartitions=0, partitions= null, 
> inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/customer_demographics],
>  confProperties={}]])
> 05-04  BroadcastExchange
> 07-01SelectionVectorRemover
> 07-02  Filter(condition=[=($1, 'Green 
> Acres')])
> 07-03Scan(groupscan=[HiveScan 
> [table=Table(dbName:tpcds100_parquet, tableName:customer_address), 
> columns=[`ca_address_sk`, `ca_city`], numPartitions=0, partitions= null, 
> inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/customer_address],
>  confProperties={}]])
> 05-02BroadcastExchange
> 08-01  HashJoin(condition=[=($1, $2)], 
> joinType=[inner])
> 08-03Scan(groupscan=[HiveScan 
> [table=Table(dbName:tpcds100_parquet, tableName:household_demographics), 
> columns=[`hd_demo_sk`, `hd_income_band_sk`], numPartitions=0, partitions= 
> null, 
> inputDirectories=[maprfs:/drill/

[GitHub] superbstreak opened a new pull request #1388: DRILL-6610: Add support for Minimum TLS restriction.

2018-07-18 Thread GitBox
superbstreak opened a new pull request #1388: DRILL-6610: Add support for 
Minimum TLS restriction.
URL: https://github.com/apache/drill/pull/1388
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: [ANNOUNCE] New PMC Chair of Apache Drill

2018-07-18 Thread Volodymyr Vysotskyi
 Congratulations, Arina! Well deserved!

Kind regards,
Volodymyr Vysotskyi


On Thu, Jul 19, 2018 at 12:24 AM Abhishek Girish  wrote:

> Congratulations, Arina!
>
> On Wed, Jul 18, 2018 at 2:19 PM Aman Sinha  wrote:
>
> > Drill developers,
> > Time flies and it is time for a new PMC chair !  Thank you all for your
> > support during the past year.
> >
> > I am very pleased to announce that the Drill PMC has voted to elect Arina
> > Ielchiieva as the new PMC chair of Apache Drill.  She has also been
> > approved unanimously by the Apache Board in today's board meeting.
> Please
> > join me in congratulating Arina !
> >
> > Thanks,
> > Aman
> >
>


Re: [ANNOUNCE] New PMC Chair of Apache Drill

2018-07-18 Thread Abhishek Girish
Congratulations, Arina!

On Wed, Jul 18, 2018 at 2:19 PM Aman Sinha  wrote:

> Drill developers,
> Time flies and it is time for a new PMC chair !  Thank you all for your
> support during the past year.
>
> I am very pleased to announce that the Drill PMC has voted to elect Arina
> Ielchiieva as the new PMC chair of Apache Drill.  She has also been
> approved unanimously by the Apache Board in today's board meeting.  Please
> join me in congratulating Arina !
>
> Thanks,
> Aman
>


[ANNOUNCE] New PMC Chair of Apache Drill

2018-07-18 Thread Aman Sinha
Drill developers,
Time flies and it is time for a new PMC chair !  Thank you all for your
support during the past year.

I am very pleased to announce that the Drill PMC has voted to elect Arina
Ielchiieva as the new PMC chair of Apache Drill.  She has also been
approved unanimously by the Apache Board in today's board meeting.  Please
join me in congratulating Arina !

Thanks,
Aman


[GitHub] arina-ielchiieva commented on a change in pull request #1296: DRILL-5365: Prevent plugin config from changing default fs. Make DrillFileSystem Immutable.

2018-07-18 Thread GitBox
arina-ielchiieva commented on a change in pull request #1296: DRILL-5365: 
Prevent plugin config from changing default fs. Make DrillFileSystem Immutable.
URL: https://github.com/apache/drill/pull/1296#discussion_r203505315
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystemCache.java
 ##
 @@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.dfs;
+
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * <h4>Motivation</h4>
+ * <p>
+ *   This cache is intended to work around the bugs in the {@link org.apache.hadoop.fs.FileSystem} static cache (DRILL-5365). Specifically, as of Hadoop 2.7.x the
+ *   {@link org.apache.hadoop.fs.FileSystem} cache has the following bad behavior:
+ * </p>
+ * <ul>
+ *   <li>
+ *     The {@link org.apache.hadoop.conf.Configuration} object is not considered when constructing keys for the {@link org.apache.hadoop.fs.FileSystem} cache of
+ *     {@link org.apache.hadoop.fs.FileSystem} objects.
+ *   </li>
+ *   <li>
+ *     The {@link org.apache.hadoop.fs.FileSystem} cache does not honor the <code>fs.default.name</code> property when constructing keys, only <code>fs.defaultFS</code> is used to construct
+ *     keys in the cache.
+ *   </li>
+ * </ul>
+ *
+ * <h4>Usage</h4>
+ * <ul>
+ *   <li>
+ *     A prerequisite for usage is that all {@link org.apache.hadoop.conf.Configuration} objects are normalized with
+ *     {@link org.apache.drill.exec.store.dfs.DrillFileSystem#normalize(Configuration)}.
+ *   </li>
+ *   <li>
+ *     This cache should only be used from {@link org.apache.drill.exec.store.dfs.DrillFileSystem}.
+ *   </li>
+ * </ul>
+ *
+ * <h4>TODO</h4>
+ * <ul>
+ *   <li>
+ *     Drill currently keeps a {@link org.apache.drill.exec.store.dfs.DrillFileSystem} open indefinitely. This will be corrected
+ *     in DRILL-6608. As a result this cache currently has no methods to remove {@link org.apache.hadoop.fs.FileSystem} objects
+ *     after they are created.
+ *   </li>
+ * </ul>
+ */
+class DrillFileSystemCache {
+  private Map<Map<String, String>, FileSystem> cache = new HashMap<>();
 
 Review comment:
   static final?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] sohami commented on a change in pull request #1296: DRILL-5365: Prevent plugin config from changing default fs. Make DrillFileSystem Immutable.

2018-07-18 Thread GitBox
sohami commented on a change in pull request #1296: DRILL-5365: Prevent plugin 
config from changing default fs. Make DrillFileSystem Immutable.
URL: https://github.com/apache/drill/pull/1296#discussion_r203504583
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystemCache.java
 ##
 @@ -0,0 +1,99 @@
 [License header and class javadoc are identical to the DrillFileSystemCache.java excerpt shown earlier in this digest and are omitted here.]
+class DrillFileSystemCache {
+  private Map<Map<String, String>, FileSystem> cache = new HashMap<>();
+
+  /**
+   * If a {@link org.apache.hadoop.fs.FileSystem} object corresponding to the given {@link org.apache.hadoop.conf.Configuration}
+   * exists in the cache, then it is returned. If no corresponding {@link org.apache.hadoop.fs.FileSystem} exists, then it is created,
+   * added to the cache, and returned.
+   * @param configuration The {@link org.apache.hadoop.conf.Configuration} corresponding to the desired {@link org.apache.hadoop.fs.FileSystem}
+   *                      object. It is expected that this configuration is first normalized with {@link org.apache.drill.exec.store.dfs.DrillFileSystem#normalize(Configuration)}.
+   * @return The {@link org.apache.hadoop.fs.FileSystem} object corresponding to the given {@link org.apache.hadoop.conf.Configuration}.
+   * @throws IOException An error when creating the desired {@link org.apache.hadoop.fs.FileSystem} object.
+   */
+  protected synchronized FileSystem get(final Configuration configuration) throws IOException {
+    final Map<String, String> map = new HashMap<>(configToMap(configuration));
+
+    if (!cache.containsKey(map)) {
+      cache.put(map, FileSystem.newInstance(configuration));
+    }
+
+    return cache.get(map);
+  }
+
+  static Map<String, String> configToMap(final Configuration configuration) {
+    Preconditions.checkNotNull(configuration);
+    final Map<String, String> map = new HashMap<>();
+
+    for (Map.Entry<String, String> entry : configuration) {
+      map.put(entry.getKey().trim(), entry.getValue());
 
 Review comment:
   In this implementation we are only considering the `Configuration` object in the cache key. What about the other parameters which the [Hadoop FileSystem cache](https://github.com/apache/hadoop/blob/branch-2.7.1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L2812) is using? Especially, I am concerned about the UGI object, from which the user information is retrieved and later checked for access control.
   
   Also, for handling `fs.default.name` and `fs.defaultFS`, once it's set in the configuration parameter and File

[GitHub] sohami commented on a change in pull request #1296: DRILL-5365: Prevent plugin config from changing default fs. Make DrillFileSystem Immutable.

2018-07-18 Thread GitBox
sohami commented on a change in pull request #1296: DRILL-5365: Prevent plugin 
config from changing default fs. Make DrillFileSystem Immutable.
URL: https://github.com/apache/drill/pull/1296#discussion_r203504920
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java
 ##
 @@ -65,46 +62,105 @@
 import com.google.common.collect.Maps;
 
 /**
- * DrillFileSystem is the wrapper around the actual FileSystem implementation.
+ * DrillFileSystem is the wrapper around the actual FileSystem implementation. The {@link DrillFileSystem} is
+ * immutable.
  *
  * If {@link org.apache.drill.exec.ops.OperatorStats} are provided it returns an instrumented FSDataInputStream to
  * measure IO wait time and tracking file open/close operations.
  */
 public class DrillFileSystem extends FileSystem implements OpenFileTracker {
   static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DrillFileSystem.class);
   private final static boolean TRACKING_ENABLED = AssertionUtil.isAssertionsEnabled();
+  private final static DrillFileSystemCache CACHE = new DrillFileSystemCache();
 
+  public static final String FS_DEFAULT_NAME = "fs.default.name";
   public static final String UNDERSCORE_PREFIX = "_";
   public static final String DOT_PREFIX = ".";
 
   private final ConcurrentMap<DrillFSDataInputStream, DebugStackTrace> openedFiles = Maps.newConcurrentMap();
 
+  private final Configuration fsConf;
   private final FileSystem underlyingFs;
   private final OperatorStats operatorStats;
   private final CompressionCodecFactory codecFactory;
 
+  private boolean initialized = false;
+
   public DrillFileSystem(Configuration fsConf) throws IOException {
     this(fsConf, null);
   }
 
   public DrillFileSystem(Configuration fsConf, OperatorStats operatorStats) throws IOException {
-    this.underlyingFs = FileSystem.get(fsConf);
+    // Configuration objects are mutable, and the underlying FileSystem object may directly use a passed in Configuration.
+    // In order to avoid scenarios where a Configuration can change after a DrillFileSystem is created, we make a copy
+    // of the Configuration.
+    this.fsConf = new Configuration(fsConf);
+    normalize(fsConf);
 
 Review comment:
   Why not use `Configuration.setDeprecateProperties` instead? See [here](https://github.com/apache/hadoop/blob/branch-2.7.1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L577)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] arina-ielchiieva commented on a change in pull request #1296: DRILL-5365: Prevent plugin config from changing default fs. Make DrillFileSystem Immutable.

2018-07-18 Thread GitBox
arina-ielchiieva commented on a change in pull request #1296: DRILL-5365: 
Prevent plugin config from changing default fs. Make DrillFileSystem Immutable.
URL: https://github.com/apache/drill/pull/1296#discussion_r203505132
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java
 ##
 @@ -65,46 +62,105 @@
 [The quoted context is identical to the DrillFileSystem.java excerpt shown earlier in this digest, continuing with:]
+    this.underlyingFs = CACHE.get(fsConf);
     this.codecFactory = new CompressionCodecFactory(fsConf);
     this.operatorStats = operatorStats;
+    this.initialized = true;
 
 Review comment:
   @ilooner if this flag is set to true in the constructor, do we need it at all?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vrozov commented on issue #1264: DRILL-6422: Update guava to 23.0 and shade it

2018-07-18 Thread GitBox
vrozov commented on issue #1264:  DRILL-6422: Update guava to 23.0 and shade it
URL: https://github.com/apache/drill/pull/1264#issuecomment-406040097
 
 
   The shaded jar needs to be published to Maven no matter what. In the case of a single PR, publishing to Maven needs to happen **before** the PR is merged. In the case of two PRs, there is no such dependency: the first PR can be merged without publishing the shaded jars to Maven. Otherwise LGTM.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (DRILL-6619) Lateral changes for implicit column

2018-07-18 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-6619:


 Summary: Lateral changes for implicit column
 Key: DRILL-6619
 URL: https://issues.apache.org/jira/browse/DRILL-6619
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Sorabh Hamirwasia
Assignee: Sorabh Hamirwasia


1) Update Lateral to consume a right batch that can contain rows for multiple left incoming rows.
2) Update Lateral to exclude the implicit field (name passed in PopConfig) with rowId from the output container. The type of the implicit field will be IntVector.
3) Fix all existing unit tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6618) Unnest changes for implicit column

2018-07-18 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-6618:


 Summary: Unnest changes for implicit column
 Key: DRILL-6618
 URL: https://issues.apache.org/jira/browse/DRILL-6618
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Sorabh Hamirwasia
Assignee: Parth Chandra


1) Update Unnest to work on the entire left incoming batch instead of row-by-row processing.
2) Update Unnest to generate an implicit field (name passed in PopConfig) with the rowId of each output row being generated. The type of the implicit field will be IntVector.
3) Fix all existing unit tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6616) Batch Processing for Lateral/Unnest

2018-07-18 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-6616:


 Summary: Batch Processing for Lateral/Unnest
 Key: DRILL-6616
 URL: https://issues.apache.org/jira/browse/DRILL-6616
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators
Affects Versions: 1.14.0
Reporter: Sorabh Hamirwasia
Assignee: Sorabh Hamirwasia


Implement the execution-side and planner-side changes for the batch processing done by Lateral and Unnest. Based on the prototype, we found performance to be much better compared to the initial row-by-row execution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6617) Planner Side changed to propagate $drill_implicit_field$ information

2018-07-18 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-6617:


 Summary: Planner Side changed to propagate $drill_implicit_field$ 
information
 Key: DRILL-6617
 URL: https://issues.apache.org/jira/browse/DRILL-6617
 Project: Apache Drill
  Issue Type: Sub-task
  Components: Query Planning & Optimization
Affects Versions: 1.14.0
Reporter: Sorabh Hamirwasia
Assignee: Hanumath Rao Maduri


*+Implement support on the planning side for the following:+*
1) Propagate the implicit column $drill_implicit_field$ to both the Lateral and Unnest operators using PopConfig.
2) Update the expressions for operators between the Lateral/Unnest subquery to use this implicit column.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] ilooner commented on a change in pull request #1296: DRILL-5365: Prevent plugin config from changing default fs. Make DrillFileSystem Immutable.

2018-07-18 Thread GitBox
ilooner commented on a change in pull request #1296: DRILL-5365: Prevent plugin 
config from changing default fs. Make DrillFileSystem Immutable.
URL: https://github.com/apache/drill/pull/1296#discussion_r203484229
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystemCache.java
 ##
 @@ -0,0 +1,99 @@
 [The quoted file context is identical to the DrillFileSystemCache.java excerpt shown earlier in this digest and is omitted here.]
 
 Review comment:
   @arina-ielchiieva Thanks for catching this.
   
   Making the maps concurrent wouldn't be enough; there would still be a race 
condition in **get**, since we check for a file system and then set it, so we 
could create two copies of the same file system. I made the get method 
synchronized instead.
   
   Please take another look.
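   For illustration, here is a minimal sketch of the check-then-act hazard and 
the synchronized fix. The String key and the `FileSystem.newInstance` call are 
stand-ins for exposition, not the actual DrillFileSystemCache internals:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

class FileSystemCacheSketch {
  private final Map<String, FileSystem> cache = new HashMap<>();

  // Racy even with a ConcurrentHashMap: two threads can both see a miss,
  // then each creates and caches its own FileSystem for the same key.
  FileSystem racyGet(String key, Configuration conf) throws IOException {
    FileSystem fs = cache.get(key);      // check
    if (fs == null) {
      fs = FileSystem.newInstance(conf); // expensive create
      cache.put(key, fs);                // act: may clobber a concurrent insert
    }
    return fs;
  }

  // Synchronizing the method makes the check-and-insert atomic.
  synchronized FileSystem get(String key, Configuration conf) throws IOException {
    FileSystem fs = cache.get(key);
    if (fs == null) {
      fs = FileSystem.newInstance(conf);
      cache.put(key, fs);
    }
    return fs;
  }
}
```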




[GitHub] priteshm commented on issue #1384: DRILL-6606: Fixed bug in HashJoin that caused it not to return OK_NEW_SCHEMA in some cases.

2018-07-18 Thread GitBox
priteshm commented on issue #1384: DRILL-6606: Fixed bug in HashJoin that 
caused it not to return OK_NEW_SCHEMA in some cases.
URL: https://github.com/apache/drill/pull/1384#issuecomment-406028909
 
 
   Spoke with @Ben-Zvi - he will review this shortly.




[GitHub] priteshm commented on issue #1381: DRILL-6475: Unnest: Null fieldId Pointer.

2018-07-18 Thread GitBox
priteshm commented on issue #1381: DRILL-6475: Unnest: Null fieldId Pointer.
URL: https://github.com/apache/drill/pull/1381#issuecomment-406022366
 
 
   @amansinha100, @HanumathRao: since it has an overall +1, I added the 
ready-to-commit label on the JIRA.




[GitHub] vvysotskyi commented on a change in pull request #1264: DRILL-6422: Update guava to 23.0 and shade it

2018-07-18 Thread GitBox
vvysotskyi commented on a change in pull request #1264:  DRILL-6422: Update 
guava to 23.0 and shade it
URL: https://github.com/apache/drill/pull/1264#discussion_r203471930
 
 

 ##
 File path: drill-shaded/pom.xml
 ##
 @@ -0,0 +1,72 @@
+
+
+http://maven.apache.org/POM/4.0.0"; 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd";>
+  4.0.0
+
+  
+org.apache
+apache
+18
+
+  
+
+  org.apache.drill
+  drill-shaded
+  1.0
+
+  drill-shaded
+  pom
+
+  
+
+  
+
+  org.apache.maven.plugins
+  maven-shade-plugin
+  3.1.0
+  
+
+  package
+  
+shade
+  
+
+  
+  
+true
+true
+true
 
 Review comment:
   Thanks, removed it.




[GitHub] vvysotskyi commented on a change in pull request #1264: DRILL-6422: Update guava to 23.0 and shade it

2018-07-18 Thread GitBox
vvysotskyi commented on a change in pull request #1264:  DRILL-6422: Update 
guava to 23.0 and shade it
URL: https://github.com/apache/drill/pull/1264#discussion_r203472049
 
 

 ##
 File path: drill-shaded/pom.xml
 ##
 @@ -0,0 +1,72 @@
+
+
+http://maven.apache.org/POM/4.0.0"; 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd";>
+  4.0.0
+
+  
+org.apache
+apache
+18
+
+  
+
+  org.apache.drill
+  drill-shaded
+  1.0
+
+  drill-shaded
+  pom
+
+  
+
+  
+
+  org.apache.maven.plugins
+  maven-shade-plugin
+  3.1.0
+  
+
+  package
+  
+shade
+  
+
+  
+  
+true
+true
+true
+
${project.build.directory}/dependency-reduced-pom.xml
+
+  
+  
 
 Review comment:
   added `false`




[GitHub] arina-ielchiieva commented on issue #1387: DRILL-6603: Set num_nulls for parquet statistics to -1 when actual number is not defined.

2018-07-18 Thread GitBox
arina-ielchiieva commented on issue #1387: DRILL-6603: Set num_nulls for 
parquet statistics to -1 when actual number is not defined.
URL: https://github.com/apache/drill/pull/1387#issuecomment-406009671
 
 
   @vvysotskyi could you please review?




[GitHub] arina-ielchiieva opened a new pull request #1387: DRILL-6603: Set num_nulls for parquet statistics to -1 when actual number is not defined.

2018-07-18 Thread GitBox
arina-ielchiieva opened a new pull request #1387: DRILL-6603: Set num_nulls for 
parquet statistics to -1 when actual number is not defined.
URL: https://github.com/apache/drill/pull/1387
 
 
   Details in [DRILL-6603](https://issues.apache.org/jira/browse/DRILL-6603).




[GitHub] ilooner commented on issue #1384: DRILL-6606: Fixed bug in HashJoin that caused it not to return OK_NEW_SCHEMA in some cases.

2018-07-18 Thread GitBox
ilooner commented on issue #1384: DRILL-6606: Fixed bug in HashJoin that caused 
it not to return OK_NEW_SCHEMA in some cases.
URL: https://github.com/apache/drill/pull/1384#issuecomment-405991291
 
 
   @KazydubB I saw that bug as well after opening the PR and pushed a fix. The 
test is now passing on both my laptop and Jenkins. Please give it another try.




[GitHub] KazydubB commented on a change in pull request #1386: DRILL-6574: Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)

2018-07-18 Thread GitBox
KazydubB commented on a change in pull request #1386: DRILL-6574: Add option to 
push LIMIT(0) on top of SCAN (late limit 0 optimization)
URL: https://github.com/apache/drill/pull/1386#discussion_r203434125
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/FindLimit0Visitor.java
 ##
 @@ -126,9 +161,84 @@ public static boolean containsLimit0(final RelNode rel) {
 return visitor.isContains();
   }
 
-  private boolean contains = false;
+  /**
+   * TODO(DRILL-3993): Use RelBuilder to create a limit node to allow for 
applying this optimization in potentially
+   * any of the transformations, but currently this can be applied after Drill 
logical transformation, and before
+   * Drill physical transformation.
+   */
+  public static DrillRel addLimitOnTopOfLeafNodes(final DrillRel rel) {
+final Pointer isUnsupported = new Pointer<>(false);
 
 Review comment:
   Because the variable is re-assigned and is used inside anonymous classes 
(the unsupportedFunctionsVisitor and unsupportedOperationsVisitor variables), 
where any captured local must be (effectively) final; see the sketch below.
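   A minimal, self-contained illustration of the constraint (not the Drill 
code itself; the boolean array stands in for Drill's Pointer):

```java
public class CaptureSketch {
  public static void main(String[] args) {
    // Locals captured by an anonymous class must be (effectively) final,
    // so a mutable holder is used instead of reassigning a plain boolean.
    final boolean[] found = {false};
    Runnable visitor = new Runnable() {
      @Override
      public void run() {
        // found = new boolean[]{true}; // reassignment would not compile
        found[0] = true;                // mutating the holder's contents is allowed
      }
    };
    visitor.run();
    System.out.println(found[0]);       // prints: true
  }
}
```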






[GitHub] vrozov commented on a change in pull request #1383: DRILL-6613: Refactor MaterializedField

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1383: DRILL-6613: Refactor 
MaterializedField
URL: https://github.com/apache/drill/pull/1383#discussion_r203432621
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/MaterializedField.java
 ##
 @@ -115,56 +175,17 @@ public void removeChild(MaterializedField field) {
*/
 
   public void replaceType(MajorType newType) {
-assert type.getMinorType() == newType.getMinorType();
-assert type.getMode() == newType.getMode();
+Preconditions.checkArgument(type.getMinorType() == newType.getMinorType());
+Preconditions.checkArgument(type.getMode() == newType.getMode());
 
 Review comment:
   I would be concerned about the cost only if `replaceType()` were called for 
every row (tuple). Note that in Java there is no test-time assert; asserts are 
optimized out only when the bytecode is compiled by HotSpot. And I have no idea 
what it means to replace a type.
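   On the assert point, a small sketch of the behavioral difference: `assert` 
is a no-op unless the JVM is started with `-ea`, while Guava's `Preconditions` 
always checks:

```java
import com.google.common.base.Preconditions;

public class AssertVsPreconditions {
  public static void main(String[] args) {
    int mode = 1;

    // Skipped entirely unless the JVM runs with -ea/-enableassertions.
    assert mode == 0 : "not evaluated by default";

    // Always evaluated; throws IllegalArgumentException on failure.
    Preconditions.checkArgument(mode == 1, "mode must be 1, got %s", mode);

    System.out.println("passed");
  }
}
```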




[GitHub] vvysotskyi commented on a change in pull request #1386: DRILL-6574: Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)

2018-07-18 Thread GitBox
vvysotskyi commented on a change in pull request #1386: DRILL-6574: Add option 
to push LIMIT(0) on top of SCAN (late limit 0 optimization)
URL: https://github.com/apache/drill/pull/1386#discussion_r203412894
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/FindLimit0Visitor.java
 ##
 @@ -126,9 +161,84 @@ public static boolean containsLimit0(final RelNode rel) {
 return visitor.isContains();
   }
 
-  private boolean contains = false;
+  /**
+   * TODO(DRILL-3993): Use RelBuilder to create a limit node to allow for 
applying this optimization in potentially
+   * any of the transformations, but currently this can be applied after Drill 
logical transformation, and before
+   * Drill physical transformation.
+   */
+  public static DrillRel addLimitOnTopOfLeafNodes(final DrillRel rel) {
+final Pointer isUnsupported = new Pointer<>(false);
 
 Review comment:
   What is the reason for using a Pointer instead of a boolean variable?




[GitHub] vvysotskyi commented on a change in pull request #1386: DRILL-6574: Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)

2018-07-18 Thread GitBox
vvysotskyi commented on a change in pull request #1386: DRILL-6574: Add option 
to push LIMIT(0) on top of SCAN (late limit 0 optimization)
URL: https://github.com/apache/drill/pull/1386#discussion_r203427626
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/limit/TestLateLimit0Optimization.java
 ##
 @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.limit;
+
+import org.apache.drill.test.BaseTestQuery;
+import org.apache.drill.PlanTestBase;
+import org.junit.Test;
+
+public class TestLateLimit0Optimization extends BaseTestQuery {
+
+  @Test
+  public void convertFromJson() throws Exception {
+checkThatQueryIsNotOptimized("SELECT CONVERT_FROM('{x:100, y:215.6}' 
,'JSON') AS MYCOL FROM (VALUES(1))");
+  }
+
+  private static void checkThatQueryIsNotOptimized(final String query) throws 
Exception {
+PlanTestBase.testPlanMatchingPatterns(wrapLimit0(query),
+new String[]{},
 
 Review comment:
   No need to create an empty array; `null` may be passed, as sketched below.
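   That is, the call in the quoted test could become (a sketch of the 
suggested change, reusing the names from the diff above):

```java
PlanTestBase.testPlanMatchingPatterns(wrapLimit0(query),
    null, // no patterns that must be present
    new String[] {
        ".*Limit\\(offset=\\[0\\], fetch=\\[0\\]\\)(.*[\n\r])+.*Scan.*"
    });
```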




[GitHub] vvysotskyi commented on a change in pull request #1386: DRILL-6574: Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)

2018-07-18 Thread GitBox
vvysotskyi commented on a change in pull request #1386: DRILL-6574: Add option 
to push LIMIT(0) on top of SCAN (late limit 0 optimization)
URL: https://github.com/apache/drill/pull/1386#discussion_r203428413
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/limit/TestLateLimit0Optimization.java
 ##
 @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.limit;
+
+import org.apache.drill.test.BaseTestQuery;
+import org.apache.drill.PlanTestBase;
+import org.junit.Test;
+
+public class TestLateLimit0Optimization extends BaseTestQuery {
 
 Review comment:
   It would be good to add a test which checks that `limit 0` isn't inserted 
when the option is disabled; a sketch follows.
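   A rough sketch of such a test; the option name and the `test(...)` session 
helper are assumptions for illustration, not the actual names in this PR:

```java
@Test
public void limitZeroNotAddedWhenOptionDisabled() throws Exception {
  try {
    // Hypothetical option name; substitute whichever option gates the feature.
    test("ALTER SESSION SET `planner.enable_late_limit0` = false");
    PlanTestBase.testPlanMatchingPatterns(wrapLimit0("SELECT 1"),
        null,
        new String[] {
            ".*Limit\\(offset=\\[0\\], fetch=\\[0\\]\\)(.*[\n\r])+.*Scan.*"
        });
  } finally {
    test("ALTER SESSION RESET `planner.enable_late_limit0`");
  }
}
```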




[GitHub] vvysotskyi commented on a change in pull request #1386: DRILL-6574: Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)

2018-07-18 Thread GitBox
vvysotskyi commented on a change in pull request #1386: DRILL-6574: Add option 
to push LIMIT(0) on top of SCAN (late limit 0 optimization)
URL: https://github.com/apache/drill/pull/1386#discussion_r203399743
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/FindLimit0Visitor.java
 ##
 @@ -126,9 +161,84 @@ public static boolean containsLimit0(final RelNode rel) {
 return visitor.isContains();
   }
 
-  private boolean contains = false;
+  /**
+   * TODO(DRILL-3993): Use RelBuilder to create a limit node to allow for 
applying this optimization in potentially
+   * any of the transformations, but currently this can be applied after Drill 
logical transformation, and before
+   * Drill physical transformation.
+   */
+  public static DrillRel addLimitOnTopOfLeafNodes(final DrillRel rel) {
+final Pointer isUnsupported = new Pointer<>(false);
+
+// to visit unsupported functions
+final RexShuttle unsupportedFunctionsVisitor = new RexShuttle() {
+  @Override
+  public RexNode visitCall(RexCall call) {
+final SqlOperator operator = call.getOperator();
+if (isUnsupportedScalarFunction(operator)) {
+  isUnsupported.value = true;
+  return call;
+}
+return super.visitCall(call);
+  }
+};
+
+// to visit unsupported operators
+final RelShuttle unsupportedOperationsVisitor = new RelShuttleImpl() {
+  @Override
+  public RelNode visit(RelNode other) {
+if (other instanceof DrillUnionRelBase) {
+  isUnsupported.value = true;
+  return other;
+} else if (other instanceof DrillProjectRelBase) {
+  other.accept(unsupportedFunctionsVisitor);
 
 Review comment:
   Should we also check the `isUnsupported` value before calling 
`other.accept()`, to cover the case where a rel node has several inputs and one 
of the previous inputs has already set `isUnsupported`? Something like the 
sketch below.
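   That is, reusing the names from the quoted visitor (a sketch of the 
suggestion, not tested):

```java
@Override
public RelNode visit(RelNode other) {
  if (isUnsupported.value) {
    return other; // a previous input already hit an unsupported construct
  }
  if (other instanceof DrillUnionRelBase) {
    isUnsupported.value = true;
    return other;
  } else if (other instanceof DrillProjectRelBase) {
    other.accept(unsupportedFunctionsVisitor);
  }
  return super.visit(other); // remainder as in the original visitor
}
```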




[GitHub] vvysotskyi commented on a change in pull request #1386: DRILL-6574: Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)

2018-07-18 Thread GitBox
vvysotskyi commented on a change in pull request #1386: DRILL-6574: Add option 
to push LIMIT(0) on top of SCAN (late limit 0 optimization)
URL: https://github.com/apache/drill/pull/1386#discussion_r203427936
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/limit/TestLateLimit0Optimization.java
 ##
 @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.limit;
+
+import org.apache.drill.test.BaseTestQuery;
+import org.apache.drill.PlanTestBase;
+import org.junit.Test;
+
+public class TestLateLimit0Optimization extends BaseTestQuery {
+
+  @Test
+  public void convertFromJson() throws Exception {
+checkThatQueryIsNotOptimized("SELECT CONVERT_FROM('{x:100, y:215.6}' 
,'JSON') AS MYCOL FROM (VALUES(1))");
+  }
+
+  private static void checkThatQueryIsNotOptimized(final String query) throws 
Exception {
+PlanTestBase.testPlanMatchingPatterns(wrapLimit0(query),
+new String[]{},
+new String[]{
+".*Limit\\(offset=\\[0\\], fetch=\\[0\\]\\)(.*[\n\r])+.*Scan.*"
+});
+  }
+
+  private static String wrapLimit0(final String query) {
+return "SELECT * FROM (" + query + ") LZT LIMIT 0";
+  }
+
+  @Test
+  public void convertToIntBE() throws Exception {
+checkThatQueryIsOptimized("SELECT CONVERT_TO(r_regionkey, 'INT_BE') FROM 
cp.`tpch/region.parquet`");
+  }
+
+  private static void checkThatQueryIsOptimized(final String query) throws 
Exception {
+PlanTestBase.testPlanMatchingPatterns(wrapLimit0(query),
 
 Review comment:
   `PlanTestBase` has a method which takes two arguments; please use that 
method here.




[GitHub] vvysotskyi commented on a change in pull request #1386: DRILL-6574: Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)

2018-07-18 Thread GitBox
vvysotskyi commented on a change in pull request #1386: DRILL-6574: Add option 
to push LIMIT(0) on top of SCAN (late limit 0 optimization)
URL: https://github.com/apache/drill/pull/1386#discussion_r203419904
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/FindLimit0Visitor.java
 ##
 @@ -126,9 +161,84 @@ public static boolean containsLimit0(final RelNode rel) {
 return visitor.isContains();
   }
 
-  private boolean contains = false;
+  /**
+   * TODO(DRILL-3993): Use RelBuilder to create a limit node to allow for 
applying this optimization in potentially
 
 Review comment:
   Please remove this comment. I don't think this should be done in 
`RelBuilder`, since the rel node should be checked starting from the top, but 
`RelBuilder` may be used at any level.




[GitHub] vrozov commented on a change in pull request #1383: DRILL-6613: Refactor MaterializedField

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1383: DRILL-6613: Refactor 
MaterializedField
URL: https://github.com/apache/drill/pull/1383#discussion_r203427835
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/MaterializedField.java
 ##
 @@ -49,39 +54,79 @@ private MaterializedField(String name, MajorType type, 
LinkedHashSet(size));
+  }
+
+  private <T> void copyFrom(Collection<T> source, Function<T, MaterializedField> transformation) {
+Preconditions.checkState(children.isEmpty());
+source.forEach(child -> children.add(transformation.apply(child)));
+  }
+
+  public static MaterializedField create(String name, MajorType type) {
+return new MaterializedField(name, type, 0);
+  }
+
   public static MaterializedField create(SerializedField serField) {
-LinkedHashSet children = new LinkedHashSet<>();
-for (SerializedField sf : serField.getChildList()) {
-  children.add(MaterializedField.create(sf));
+MaterializedField field = new 
MaterializedField(serField.getNamePart().getName(), serField.getMajorType(), 
serField.getChildCount());
+if (OFFSETS_FIELD.equals(field)) {
+  return OFFSETS_FIELD;
 }
-return new MaterializedField(serField.getNamePart().getName(), 
serField.getMajorType(), children);
+field.copyFrom(serField.getChildList(), MaterializedField::create);
+return field;
   }
 
-  /**
-   * Create and return a serialized field based on the current state.
-   */
-  public SerializedField getSerializedField() {
-SerializedField.Builder serializedFieldBuilder = getAsBuilder();
-for(MaterializedField childMaterializedField : getChildren()) {
-  
serializedFieldBuilder.addChild(childMaterializedField.getSerializedField());
+  public MaterializedField copy() {
+return copy(getName(), getType());
+  }
+
+  public MaterializedField copy(MajorType type) {
+return copy(name, type);
+  }
+
+  public MaterializedField copy(String name) {
+return copy(name, getType());
+  }
+
+  public MaterializedField copy(String name, final MajorType type) {
+if (this == OFFSETS_FIELD) {
+  return this;
 }
-return serializedFieldBuilder.build();
+MaterializedField field = new MaterializedField(name, type, 
getChildren().size());
+field.copyFrom(getChildren(), MaterializedField::copy);
+return field;
   }
 
-  public SerializedField.Builder getAsBuilder() {
-return SerializedField.newBuilder()
-.setMajorType(type)
-.setNamePart(NamePart.newBuilder().setName(name).build());
+  public String getName() {
+return name;
+  }
+
+  public MajorType getType() {
+return type;
   }
 
   public Collection getChildren() {
-return new ArrayList<>(children);
+return children;
+  }
+
+  public int getWidth() {
+return type.getWidth();
+  }
+
 
 Review comment:
   The PR does not introduce new functions.




[GitHub] vrozov commented on a change in pull request #1383: DRILL-6613: Refactor MaterializedField

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1383: DRILL-6613: Refactor 
MaterializedField
URL: https://github.com/apache/drill/pull/1383#discussion_r203427239
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/MaterializedField.java
 ##
 @@ -49,39 +54,79 @@ private MaterializedField(String name, MajorType type, 
LinkedHashSet(size));
+  }
+
+  private <T> void copyFrom(Collection<T> source, Function<T, MaterializedField> transformation) {
+Preconditions.checkState(children.isEmpty());
+source.forEach(child -> children.add(transformation.apply(child)));
+  }
+
+  public static MaterializedField create(String name, MajorType type) {
+return new MaterializedField(name, type, 0);
+  }
+
   public static MaterializedField create(SerializedField serField) {
-LinkedHashSet children = new LinkedHashSet<>();
-for (SerializedField sf : serField.getChildList()) {
-  children.add(MaterializedField.create(sf));
+MaterializedField field = new 
MaterializedField(serField.getNamePart().getName(), serField.getMajorType(), 
serField.getChildCount());
+if (OFFSETS_FIELD.equals(field)) {
+  return OFFSETS_FIELD;
 }
-return new MaterializedField(serField.getNamePart().getName(), 
serField.getMajorType(), children);
+field.copyFrom(serField.getChildList(), MaterializedField::create);
+return field;
   }
 
-  /**
-   * Create and return a serialized field based on the current state.
-   */
-  public SerializedField getSerializedField() {
-SerializedField.Builder serializedFieldBuilder = getAsBuilder();
-for(MaterializedField childMaterializedField : getChildren()) {
-  
serializedFieldBuilder.addChild(childMaterializedField.getSerializedField());
+  public MaterializedField copy() {
+return copy(getName(), getType());
+  }
+
+  public MaterializedField copy(MajorType type) {
+return copy(name, type);
+  }
+
+  public MaterializedField copy(String name) {
+return copy(name, getType());
+  }
+
+  public MaterializedField copy(String name, final MajorType type) {
+if (this == OFFSETS_FIELD) {
+  return this;
 }
-return serializedFieldBuilder.build();
+MaterializedField field = new MaterializedField(name, type, 
getChildren().size());
+field.copyFrom(getChildren(), MaterializedField::copy);
+return field;
   }
 
-  public SerializedField.Builder getAsBuilder() {
-return SerializedField.newBuilder()
-.setMajorType(type)
-.setNamePart(NamePart.newBuilder().setName(name).build());
+  public String getName() {
+return name;
+  }
+
+  public MajorType getType() {
+return type;
   }
 
   public Collection getChildren() {
-return new ArrayList<>(children);
+return children;
+  }
+
+  public int getWidth() {
+return type.getWidth();
+  }
 
 Review comment:
   `getWidth()` is still used.




[GitHub] vrozov commented on a change in pull request #1383: DRILL-6613: Refactor MaterializedField

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1383: DRILL-6613: Refactor 
MaterializedField
URL: https://github.com/apache/drill/pull/1383#discussion_r203426875
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/MaterializedField.java
 ##
 @@ -49,39 +54,79 @@ private MaterializedField(String name, MajorType type, 
LinkedHashSet(size));
+  }
+
+  private <T> void copyFrom(Collection<T> source, Function<T, MaterializedField> transformation) {
+Preconditions.checkState(children.isEmpty());
+source.forEach(child -> children.add(transformation.apply(child)));
+  }
+
+  public static MaterializedField create(String name, MajorType type) {
+return new MaterializedField(name, type, 0);
+  }
+
   public static MaterializedField create(SerializedField serField) {
-LinkedHashSet children = new LinkedHashSet<>();
-for (SerializedField sf : serField.getChildList()) {
-  children.add(MaterializedField.create(sf));
+MaterializedField field = new 
MaterializedField(serField.getNamePart().getName(), serField.getMajorType(), 
serField.getChildCount());
+if (OFFSETS_FIELD.equals(field)) {
+  return OFFSETS_FIELD;
 }
-return new MaterializedField(serField.getNamePart().getName(), 
serField.getMajorType(), children);
+field.copyFrom(serField.getChildList(), MaterializedField::create);
+return field;
   }
 
-  /**
-   * Create and return a serialized field based on the current state.
-   */
-  public SerializedField getSerializedField() {
-SerializedField.Builder serializedFieldBuilder = getAsBuilder();
-for(MaterializedField childMaterializedField : getChildren()) {
-  
serializedFieldBuilder.addChild(childMaterializedField.getSerializedField());
+  public MaterializedField copy() {
+return copy(getName(), getType());
+  }
+
+  public MaterializedField copy(MajorType type) {
+return copy(name, type);
+  }
+
+  public MaterializedField copy(String name) {
+return copy(name, getType());
+  }
+
+  public MaterializedField copy(String name, final MajorType type) {
+if (this == OFFSETS_FIELD) {
+  return this;
 }
-return serializedFieldBuilder.build();
+MaterializedField field = new MaterializedField(name, type, 
getChildren().size());
+field.copyFrom(getChildren(), MaterializedField::copy);
 
 Review comment:
   `children` in `MaterializedField` is a `Set`. Please see `equals()` and 
`hashCode()` for `MaterializedField`.




[GitHub] vrozov commented on a change in pull request #1383: DRILL-6613: Refactor MaterializedField

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1383: DRILL-6613: Refactor 
MaterializedField
URL: https://github.com/apache/drill/pull/1383#discussion_r203418113
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/MaterializedField.java
 ##
 @@ -49,39 +54,79 @@ private MaterializedField(String name, MajorType type, 
LinkedHashSet(size));
+  }
+
+  private <T> void copyFrom(Collection<T> source, Function<T, MaterializedField> transformation) {
+Preconditions.checkState(children.isEmpty());
+source.forEach(child -> children.add(transformation.apply(child)));
+  }
+
+  public static MaterializedField create(String name, MajorType type) {
+return new MaterializedField(name, type, 0);
+  }
+
   public static MaterializedField create(SerializedField serField) {
-LinkedHashSet children = new LinkedHashSet<>();
-for (SerializedField sf : serField.getChildList()) {
-  children.add(MaterializedField.create(sf));
+MaterializedField field = new 
MaterializedField(serField.getNamePart().getName(), serField.getMajorType(), 
serField.getChildCount());
+if (OFFSETS_FIELD.equals(field)) {
+  return OFFSETS_FIELD;
 }
-return new MaterializedField(serField.getNamePart().getName(), 
serField.getMajorType(), children);
+field.copyFrom(serField.getChildList(), MaterializedField::create);
+return field;
   }
 
-  /**
-   * Create and return a serialized field based on the current state.
-   */
-  public SerializedField getSerializedField() {
-SerializedField.Builder serializedFieldBuilder = getAsBuilder();
-for(MaterializedField childMaterializedField : getChildren()) {
-  
serializedFieldBuilder.addChild(childMaterializedField.getSerializedField());
+  public MaterializedField copy() {
+return copy(getName(), getType());
+  }
+
+  public MaterializedField copy(MajorType type) {
+return copy(name, type);
+  }
+
+  public MaterializedField copy(String name) {
+return copy(name, getType());
+  }
+
+  public MaterializedField copy(String name, final MajorType type) {
+if (this == OFFSETS_FIELD) {
 
 Review comment:
   I agree that it would be good to unify the `bits` field as well, as part of 
another JIRA/PR.




[GitHub] vrozov commented on a change in pull request #1383: DRILL-6613: Refactor MaterializedField

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1383: DRILL-6613: Refactor 
MaterializedField
URL: https://github.com/apache/drill/pull/1383#discussion_r203416964
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/MaterializedField.java
 ##
 @@ -49,39 +54,79 @@ private MaterializedField(String name, MajorType type, 
LinkedHashSet(size));
+  }
+
+  private <T> void copyFrom(Collection<T> source, Function<T, MaterializedField> transformation) {
+Preconditions.checkState(children.isEmpty());
+source.forEach(child -> children.add(transformation.apply(child)));
+  }
+
+  public static MaterializedField create(String name, MajorType type) {
+return new MaterializedField(name, type, 0);
+  }
+
   public static MaterializedField create(SerializedField serField) {
-LinkedHashSet children = new LinkedHashSet<>();
-for (SerializedField sf : serField.getChildList()) {
-  children.add(MaterializedField.create(sf));
+MaterializedField field = new 
MaterializedField(serField.getNamePart().getName(), serField.getMajorType(), 
serField.getChildCount());
+if (OFFSETS_FIELD.equals(field)) {
+  return OFFSETS_FIELD;
 }
-return new MaterializedField(serField.getNamePart().getName(), 
serField.getMajorType(), children);
+field.copyFrom(serField.getChildList(), MaterializedField::create);
+return field;
   }
 
-  /**
-   * Create and return a serialized field based on the current state.
-   */
-  public SerializedField getSerializedField() {
-SerializedField.Builder serializedFieldBuilder = getAsBuilder();
-for(MaterializedField childMaterializedField : getChildren()) {
-  
serializedFieldBuilder.addChild(childMaterializedField.getSerializedField());
+  public MaterializedField copy() {
+return copy(getName(), getType());
+  }
+
+  public MaterializedField copy(MajorType type) {
+return copy(name, type);
+  }
 
 Review comment:
   Please see my comment regarding `clone()`, the copy constructor, and the 
builder design pattern.




[GitHub] vrozov commented on a change in pull request #1383: DRILL-6613: Refactor MaterializedField

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1383: DRILL-6613: Refactor 
MaterializedField
URL: https://github.com/apache/drill/pull/1383#discussion_r203416278
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/MaterializedField.java
 ##
 @@ -38,6 +41,8 @@
  */
 
 public class MaterializedField {
+  public final static MaterializedField OFFSETS_FIELD = new 
MaterializedField(ValueVector.OFFSETS_VECTOR_NAME, 
Types.required(MinorType.UINT4), 0);
 
 Review comment:
   I disagree. I don't see any problem with the static `OFFSETS_FIELD` 
belonging to `MaterializedField`.




[GitHub] vrozov commented on a change in pull request #1383: DRILL-6613: Refactor MaterializedField

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1383: DRILL-6613: Refactor 
MaterializedField
URL: https://github.com/apache/drill/pull/1383#discussion_r203415144
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/record/TestMaterializedField.java
 ##
 @@ -58,32 +58,24 @@ public void initialize() {
   }
 
   @Test
-  public void testClone() {
-final MaterializedField cloneParent = parent.clone();
-final boolean isParentEqual = parent.equals(cloneParent);
-assertTrue("Cloned parent does not match the original", isParentEqual);
+  public void testCopy() {
+final MaterializedField cloneParent = parent.copy();
+assertEquals("Parent copy does not match the original", parent, 
cloneParent);
 
-final MaterializedField cloneChild = child.clone();
-final boolean isChildEqual = child.equals(cloneChild);
-assertTrue("Cloned child does not match the original", isChildEqual);
+final MaterializedField cloneChild = child.copy();
+assertEquals("Child copy does not match the original", child, cloneChild);
 
-for (final MaterializedField field:new MaterializedField[]{parent, child}) 
{
-  for (Object[] args:matrix) {
-final String path = args[0].toString();
-final TypeProtos.MajorType type = 
TypeProtos.MajorType.class.cast(args[1]);
-
-final MaterializedField clone = field.withPathAndType(path, type);
-
-final boolean isPathEqual = path.equals(clone.getName());
-assertTrue("Cloned path does not match the original", isPathEqual);
-
-final boolean isTypeEqual = type.equals(clone.getType());
-assertTrue("Cloned type does not match the original", isTypeEqual);
-
-final boolean isChildrenEqual = 
field.getChildren().equals(clone.getChildren());
-assertTrue("Cloned children do not match the original", 
isChildrenEqual);
-  }
+for (Object[] args : matrix) {
+  assertTypeAndPath(parent, (String)args[0], 
(TypeProtos.MajorType)args[1]);
+  assertTypeAndPath(child, (String)args[0], (TypeProtos.MajorType)args[1]);
 }
+  }
+
+  private void assertTypeAndPath(MaterializedField field, String path, 
TypeProtos.MajorType type) {
+final MaterializedField clone = field.copy(path, type);
 
 Review comment:
   I am not sure what the concern is. Is it with the test or with `copy`?




[GitHub] vrozov commented on a change in pull request #1383: DRILL-6613: Refactor MaterializedField

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1383: DRILL-6613: Refactor 
MaterializedField
URL: https://github.com/apache/drill/pull/1383#discussion_r203414024
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/record/TestMaterializedField.java
 ##
 @@ -58,32 +58,24 @@ public void initialize() {
   }
 
   @Test
-  public void testClone() {
-final MaterializedField cloneParent = parent.clone();
-final boolean isParentEqual = parent.equals(cloneParent);
-assertTrue("Cloned parent does not match the original", isParentEqual);
+  public void testCopy() {
+final MaterializedField cloneParent = parent.copy();
+assertEquals("Parent copy does not match the original", parent, 
cloneParent);
 
 Review comment:
   The goal of the PR is to address usage of the `clone()` instead of the copy 
constructor with a few minor fixes of badly written Java code. The change here 
addresses the issue of using `assertTrue` where it should assert for equality.




[GitHub] vrozov commented on a change in pull request #1383: DRILL-6613: Refactor MaterializedField

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1383: DRILL-6613: Refactor 
MaterializedField
URL: https://github.com/apache/drill/pull/1383#discussion_r203408822
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/ColumnState.java
 ##
 @@ -123,7 +123,7 @@ public static MapArrayColumnState 
build(ResultSetLoaderImpl resultSetLoader,
   // Create the map's offset vector.
 
   UInt4Vector offsetVector = new UInt4Vector(
-  BaseRepeatedValueVector.OFFSETS_FIELD,
+  MaterializedField.OFFSETS_FIELD,
 
 Review comment:
   It is not a name. It is an immutable static `MaterializedField` that can be 
shared between vectors. The name still belongs to `ValueVector`.




[GitHub] cgivre commented on issue #1114: DRILL-6104: Added Logfile Reader

2018-07-18 Thread GitBox
cgivre commented on issue #1114: DRILL-6104: Added Logfile Reader
URL: https://github.com/apache/drill/pull/1114#issuecomment-405957791
 
 
   Commits squashed and a proper commit message added.
   
   Regarding the unit tests, is there an example of a unit test for when we 
expect Drill to throw an exception?
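   For reference, one common JUnit 4 pattern for this, sketched with a 
placeholder query runner rather than the actual Drill test harness:

```java
import static org.junit.Assert.assertTrue;
import static org.junit.Assert.fail;

import org.junit.Test;

public class ExpectedFailureSketch {

  @Test
  public void queryShouldThrow() {
    try {
      runQuery("SELECT * FROM bad_input"); // hypothetical helper, defined below
      fail("Expected the query to throw");
    } catch (Exception e) {
      // Assert on the message so an unrelated failure still fails the test.
      assertTrue(e.getMessage().contains("expected error text"));
    }
  }

  // Stand-in for a real runner that surfaces Drill errors as exceptions.
  private void runQuery(String sql) throws Exception {
    throw new Exception("expected error text");
  }
}
```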
   




[GitHub] vrozov commented on a change in pull request #1383: DRILL-6613: Refactor MaterializedField

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1383: DRILL-6613: Refactor 
MaterializedField
URL: https://github.com/apache/drill/pull/1383#discussion_r203407400
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
 ##
 @@ -792,7 +792,7 @@ private void setupOutputContainerSchema() {
 }
 
 // make sure to project field with children for children to show up in 
the schema
-final MaterializedField projected = field.withType(outputType);
+final MaterializedField projected = field.copy(outputType);
 
 Review comment:
   `copy` refers to a copy constructor with deep-copy semantics; `with...` 
usually applies to a builder, and that pattern is secondary here. See the 
sketch below.
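   To illustrate the distinction, a generic sketch (not the MaterializedField 
code itself):

```java
import java.util.ArrayList;
import java.util.List;

final class Node {
  final String name;
  final List<Node> children;

  Node(String name, List<Node> children) {
    this.name = name;
    this.children = children;
  }

  // Copy-constructor semantics: a deep copy, children copied recursively.
  Node copy() {
    List<Node> copied = new ArrayList<>();
    for (Node child : children) {
      copied.add(child.copy());
    }
    return new Node(name, copied);
  }

  // Builder-style with...(): a sibling that shares state except one attribute.
  Node withName(String newName) {
    return new Node(newName, children);
  }
}
```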




[GitHub] vrozov commented on a change in pull request #1383: DRILL-6613: Refactor MaterializedField

2018-07-18 Thread GitBox
vrozov commented on a change in pull request #1383: DRILL-6613: Refactor 
MaterializedField
URL: https://github.com/apache/drill/pull/1383#discussion_r203406367
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java
 ##
 @@ -453,22 +453,15 @@ protected boolean setupNewSchema() throws 
SchemaChangeException {
 cg.addExpr(expr);
   } else {
 // need to do evaluation.
-final MaterializedField outputField;
+ValueVector incomingVector = null;
 if (expr instanceof ValueVectorReadExpression) {
   final TypedFieldId id = 
ValueVectorReadExpression.class.cast(expr).getFieldId();
-  @SuppressWarnings("resource")
-  final ValueVector incomingVector = 
incoming.getValueAccessorById(id.getIntermediateClass(), 
id.getFieldIds()).getValueVector();
-  // outputField is taken from the incoming schema to avoid the loss 
of nested fields
-  // when the first batch will be empty.
-  if (incomingVector != null) {
-outputField = incomingVector.getField().clone();
-  } else {
-outputField = MaterializedField.create(outputName, 
expr.getMajorType());
-  }
-} else {
-  outputField = MaterializedField.create(outputName, 
expr.getMajorType());
+  incomingVector = 
incoming.getValueAccessorById(id.getIntermediateClass(), 
id.getFieldIds()).getValueVector();
 
 Review comment:
   There is no dropped code; the code is re-organized to avoid the nested 
if/else.




[GitHub] arina-ielchiieva closed pull request #1385: DRILL-6612: Query fails with AssertionError when joining persistent a…

2018-07-18 Thread GitBox
arina-ielchiieva closed pull request #1385: DRILL-6612: Query fails with 
AssertionError when joining persistent a…
URL: https://github.com/apache/drill/pull/1385
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/exec/java-exec/src/main/java/org/apache/calcite/jdbc/DynamicRootSchema.java 
b/exec/java-exec/src/main/java/org/apache/calcite/jdbc/DynamicRootSchema.java
index cde46f2c28b..e6b8f49499d 100644
--- 
a/exec/java-exec/src/main/java/org/apache/calcite/jdbc/DynamicRootSchema.java
+++ 
b/exec/java-exec/src/main/java/org/apache/calcite/jdbc/DynamicRootSchema.java
@@ -26,6 +26,7 @@
 import org.apache.calcite.schema.impl.AbstractSchema;
 import org.apache.calcite.util.BuiltInMethod;
 import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.planner.sql.SchemaUtilites;
 import org.apache.drill.exec.store.SchemaConfig;
 import org.apache.drill.exec.store.StoragePlugin;
 import org.apache.drill.exec.store.StoragePluginRegistry;
@@ -83,9 +84,9 @@ public void loadSchemaFactory(String schemaName, boolean 
caseSensitive) {
   }
 
   // Could not find the plugin of schemaName. The schemaName could be 
`dfs.tmp`, a 2nd level schema under 'dfs'
-  String[] paths = schemaName.split("\\.");
-  if (paths.length == 2) {
-plugin = getSchemaFactories().getPlugin(paths[0]);
+  List<String> paths = SchemaUtilites.getSchemaPathAsList(schemaName);
+  if (paths.size() == 2) {
+plugin = getSchemaFactories().getPlugin(paths.get(0));
 if (plugin == null) {
   return;
 }
@@ -95,7 +96,7 @@ public void loadSchemaFactory(String schemaName, boolean 
caseSensitive) {
 plugin.registerSchemas(schemaConfig, thisPlus);
 
 // Load second level schemas for this storage plugin
-final SchemaPlus firstlevelSchema = thisPlus.getSubSchema(paths[0]);
+final SchemaPlus firstlevelSchema = 
thisPlus.getSubSchema(paths.get(0));
 final List<SchemaPlus> secondLevelSchemas = Lists.newArrayList();
 for (String secondLevelSchemaName : 
firstlevelSchema.getSubSchemaNames()) {
   
secondLevelSchemas.add(firstlevelSchema.getSubSchema(secondLevelSchemaName));
diff --git 
a/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SchemaUtilites.java
 
b/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SchemaUtilites.java
index 7d42e57e9ad..a2623634916 100644
--- 
a/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SchemaUtilites.java
+++ 
b/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SchemaUtilites.java
@@ -27,6 +27,7 @@
 import org.apache.drill.exec.store.AbstractSchema;
 import org.apache.drill.exec.store.dfs.WorkspaceSchemaFactory;
 
+import java.util.Arrays;
 import java.util.Collections;
 import java.util.List;
 
@@ -73,7 +74,7 @@ public static SchemaPlus findSchema(final SchemaPlus defaultSchema, final String schemaPath) {
-final List<String> schemaPathAsList = Lists.newArrayList(schemaPath.split("\\."));
+final List<String> schemaPathAsList = getSchemaPathAsList(schemaPath);
 return findSchema(defaultSchema, schemaPathAsList);
   }
 
@@ -144,6 +145,11 @@ public static String getSchemaPath(List<String> schemaPath) {
 return SCHEMA_PATH_JOINER.join(schemaPath);
   }
 
+  /** Utility method to get the list with schema path components for given 
schema path string. */
+  public static List<String> getSchemaPathAsList(String schemaPath) {
+return Arrays.asList(schemaPath.split("\\."));
+  }
+
   /** Utility method to get the schema path as list for given schema instance. 
*/
   public static List<String> getSchemaPathAsList(SchemaPlus schema) {
 if (isRootSchema(schema)) {
diff --git 
a/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java
 
b/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java
index eee141e77a1..d4da23f43e7 100644
--- 
a/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java
+++ 
b/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java
@@ -18,6 +18,7 @@
 package org.apache.drill.exec.planner.sql;
 
 import java.math.BigDecimal;
+import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;
 import java.util.Map;
@@ -52,7 +53,6 @@
 import org.apache.calcite.sql.SqlCall;
 import org.apache.calcite.sql.SqlIdentifier;
 import org.apache.calcite.sql.SqlNode;
-import org.apache.calcite.sql.SqlKind;
 import org.apache.calcite.sql.SqlOperatorTable;
 import org.apache.calcite.sql.parser.SqlParseException;
 import org.apache.calcite.sql.parser.SqlParser;
@@ -251,24 +251,24 @@ protected void validateFrom(
 SqlValidatorScope scope) {
   switch (node.getKind()) {
 case AS:
-  if (((SqlCall) node).operand(0) instanceof SqlIden

[GitHub] arina-ielchiieva commented on issue #1114: DRILL-6104: Added Logfile Reader

2018-07-18 Thread GitBox
arina-ielchiieva commented on issue #1114: DRILL-6104: Added Logfile Reader
URL: https://github.com/apache/drill/pull/1114#issuecomment-405931381
 
 
   @cgivre looks like the PR is ready to commit. Please squash the commits and 
leave one with a proper commit message. Maybe add the extra unit tests Paul 
asked for, if you have time.




[GitHub] Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added pcapng-format support

2018-07-18 Thread GitBox
Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added 
pcapng-format support
URL: https://github.com/apache/drill/pull/1126#discussion_r203371503
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/pcapng/TestPcapngHeaders.java
 ##
 @@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcapng;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.TupleSchema;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.test.rowSet.RowSet;
+import org.apache.drill.test.rowSet.RowSetBuilder;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.nio.file.Paths;
+
+public class TestPcapngHeaders extends ClusterTest {
+  @BeforeClass
+  public static void setupTestFiles() throws Exception {
+    startCluster(ClusterFixture.builder(dirTestWatcher).maxParallelization(1));
+    dirTestWatcher.copyResourceToRoot(Paths.get("store", "pcapng"));
+  }
+
+  @Test
+  public void testValidHeaders() throws IOException {
+    String query = "select * from dfs.`store/pcapng/sniff.pcapng`";
+    RowSet actual = client.queryBuilder().sql(query).rowSet();
+
+    TupleSchema expectedSchema = new TupleSchema();
+
+    expectedSchema.add(MaterializedField.create("tcp_flags_ece_ecn_capable", Types.optional(TypeProtos.MinorType.INT)));
+    expectedSchema.add(MaterializedField.create("tcp_flags_ece_congestion_experienced", Types.optional(TypeProtos.MinorType.INT)));
+    expectedSchema.add(MaterializedField.create("tcp_flags_psh", Types.optional(TypeProtos.MinorType.INT)));
+    expectedSchema.add(MaterializedField.create("type", Types.optional(TypeProtos.MinorType.VARCHAR)));
+    expectedSchema.add(MaterializedField.create("tcp_flags_cwr", Types.optional(TypeProtos.MinorType.INT)));
+    expectedSchema.add(MaterializedField.create("dst_ip", Types.optional(TypeProtos.MinorType.VARCHAR)));
+    expectedSchema.add(MaterializedField.create("src_ip", Types.optional(TypeProtos.MinorType.VARCHAR)));
+    expectedSchema.add(MaterializedField.create("tcp_flags_fin", Types.optional(TypeProtos.MinorType.INT)));
+    expectedSchema.add(MaterializedField.create("tcp_flags_ece", Types.optional(TypeProtos.MinorType.INT)));
+    expectedSchema.add(MaterializedField.create("tcp_flags", Types.optional(TypeProtos.MinorType.INT)));
+    expectedSchema.add(MaterializedField.create("tcp_flags_ack", Types.optional(TypeProtos.MinorType.INT)));
+    expectedSchema.add(MaterializedField.create("src_mac_address", Types.optional(TypeProtos.MinorType.VARCHAR)));
+    expectedSchema.add(MaterializedField.create("tcp_flags_syn", Types.optional(TypeProtos.MinorType.INT)));
+    expectedSchema.add(MaterializedField.create("tcp_flags_rst", Types.optional(TypeProtos.MinorType.INT)));
+    expectedSchema.add(MaterializedField.create("timestamp", Types.optional(TypeProtos.MinorType.TIMESTAMP)));
+    expectedSchema.add(MaterializedField.create("tcp_session", Types.optional(TypeProtos.MinorType.BIGINT)));
+    expectedSchema.add(MaterializedField.create("packet_data", Types.optional(TypeProtos.MinorType.VARCHAR)));
+    expectedSchema.add(MaterializedField.create("tcp_parsed_flags", Types.optional(TypeProtos.MinorType.VARCHAR)));
+    expectedSchema.add(MaterializedField.create("tcp_flags_ns", Types.optional(TypeProtos.MinorType.INT)));
+    expectedSchema.add(MaterializedField.create("src_port", Types.optional(TypeProtos.MinorType.INT)));
+    expectedSchema.add(MaterializedField.create("packet_length", Types.optional(TypeProtos.MinorType.INT)));
+    expectedSchema.add(MaterializedField.create("tcp_flags_urg", Types.optional(TypeProtos.MinorType.INT)));
+    expectedSchema.add(MaterializedField.create("tcp_ack", Types.optional(TypeProtos.MinorType.INT)));
+    expectedSchema.add(MaterializedField.create("dst_port", Types.optional(TypeProtos.MinorType.INT)));
+    expectedSchema.add
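
[The quoted diff is truncated by the archive at this point. A schema-verification test of this shape typically concludes by building an empty expected row set from the schema and comparing it with the actual result; a minimal sketch, assuming the RowSetBuilder and RowSetComparison utilities imported above behave as in Drill's test framework:

    RowSet expected = new RowSetBuilder(client.allocator(), expectedSchema).build();
    new RowSetComparison(expected).verifyAndClearAll(actual);
]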

[GitHub] Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added pcapng-format support

2018-07-18 Thread GitBox
Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added 
pcapng-format support
URL: https://github.com/apache/drill/pull/1126#discussion_r203371455
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcapng/schema/Schema.java
 ##
 @@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcapng.schema;
+
+import org.apache.drill.exec.store.pcapng.schema.columns.Column;
+import org.apache.drill.exec.store.pcapng.schema.columns.DstIpImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.DstMacImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.DstPortImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.PacketDataImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.PacketLenImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.SrcIpImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.SrcMacImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.SrcPortImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.TcpAckImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.TcpFlags;
+import org.apache.drill.exec.store.pcapng.schema.columns.TcpFlagsAckImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.TcpFlagsCwrImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.TcpFlagsEceCongestionExperiencedImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.TcpFlagsEceEcnCapableImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.TcpFlagsEceImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.TcpFlagsFinImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.TcpFlagsNsImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.TcpFlagsPshImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.TcpFlagsRstImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.TcpFlagsSynImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.TcpFlagsUrgIml;
+import org.apache.drill.exec.store.pcapng.schema.columns.TcpParsedFlags;
+import org.apache.drill.exec.store.pcapng.schema.columns.TcpSessionImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.TimestampImpl;
+import org.apache.drill.exec.store.pcapng.schema.columns.TypeImpl;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Set;
+
+public class Schema {
 
 Review comment:
   Thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added pcapng-format support

2018-07-18 Thread GitBox
Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added 
pcapng-format support
URL: https://github.com/apache/drill/pull/1126#discussion_r203371427
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcapng/PcapngRecordReader.java
 ##
 @@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcapng;
+
+import com.google.common.collect.ImmutableList;
+import fr.bmartel.pcapdecoder.PcapDecoder;
+import fr.bmartel.pcapdecoder.structure.types.IPcapngType;
+import fr.bmartel.pcapdecoder.structure.types.inter.IEnhancedPacketBLock;
+import org.apache.commons.io.IOUtils;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.pcapng.schema.Schema;
+import org.apache.drill.exec.store.pcapng.schema.columns.Column;
+import org.apache.drill.exec.store.pcapng.schema.columns.DummyImpl;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+
+public class PcapngRecordReader extends AbstractRecordReader {
+  private static final Logger logger = LoggerFactory.getLogger(PcapngRecordReader.class);
+
+  // batch size should not exceed max allowed record count
+  private static final int BATCH_SIZE = 40_000;
+
+  private final Path pathToFile;
+  private OutputMutator output;
+  private ImmutableList<ProjectedColumnInfo> projectedCols;
+  private FileSystem fs;
+  private FSDataInputStream in;
+  private List<SchemaPath> columns;
+
+  private Iterator<IPcapngType> it;
+
+  public PcapngRecordReader(final String pathToFile,
+                            final FileSystem fileSystem,
+                            final List<SchemaPath> columns) {
+    this.fs = fileSystem;
+    this.pathToFile = fs.makeQualified(new Path(pathToFile));
+    this.columns = columns;
+    setColumns(columns);
+  }
+
+  @Override
+  public void setup(final OperatorContext context, final OutputMutator output) throws ExecutionSetupException {
+    try {
+      this.output = output;
+      this.in = fs.open(pathToFile);
+      PcapDecoder decoder = new PcapDecoder(IOUtils.toByteArray(in));
+      decoder.decode();
+      this.it = decoder.getSectionList().iterator();
+      setupProjection();
+    } catch (IOException io) {
+      throw UserException.dataReadError(io)
+          .addContext("File name:", pathToFile.toUri().getPath())
+          .build(logger);
+    }
+  }
+
+  @Override
+  public int next() {
+    if (isSkipQuery()) {
+      return getBatchOfBlocks().size();
+    } else {
+      return putToTable(getBatchOfBlocks());
+    }
+  }
+
+  private int putToTable(final List<IEnhancedPacketBLock> batchOfBlocks) {
+    int counter = 0;
+    for (IEnhancedPacketBLock bLock : batchOfBlocks) {
+      for (ProjectedColumnInfo pci : projectedCols) {
+        pci.getColumn().process(bLock, pci.getVv(), counter);
+      }
+      counter++;
+    }
+    return counter;
+  }
+
+  @Override
+  public void close() throws Exception {
+    if (in != null) {
+      in.close();
+    }
+  }
+
+  private void setupProjection() {
+    if (isSkipQuery()) {
+      projectedCols = projectNone();
+    } else if (isStarQuery()) {
+      projectedCols = projectAllCols(Schema.getColumnsNames());
+    } else {
+      projectedCols = projectCols(columns);
+    }
+  }
+
+  private ImmutableList<ProjectedColumnInfo> projectNone() {
+    ImmutableList.Builder<ProjectedColumnInfo> pciBuilder = ImmutableList.builder();
+    pciBuilder.add(makeCol
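
[The quoted diff is truncated by the archive at this point. From its use in putToTable() above, ProjectedColumnInfo is evidently a small holder pairing a value vector with its column decoder; a minimal sketch consistent with that usage (constructor and field details are assumed, as they fall outside the quoted portion):

  private static class ProjectedColumnInfo {
    private final ValueVector vv;   // vector the decoded value is written into
    private final Column column;    // per-column decoder from the pcapng schema

    ProjectedColumnInfo(ValueVector vv, Column column) {
      this.vv = vv;
      this.column = column;
    }

    ValueVector getVv() { return vv; }
    Column getColumn() { return column; }
  }
]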

[GitHub] Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added pcapng-format support

2018-07-18 Thread GitBox
Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added 
pcapng-format support
URL: https://github.com/apache/drill/pull/1126#discussion_r203371346
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcapng/PcapngRecordReader.java
 ##
 [Quoted diff of PcapngRecordReader.java omitted; identical to the excerpt above, quoted through the close() method.]
 
 Review comment:
   Okay


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added pcapng-format support

2018-07-18 Thread GitBox
Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added 
pcapng-format support
URL: https://github.com/apache/drill/pull/1126#discussion_r203371324
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcapng/schema/Schema.java
 ##
 [License header and column imports omitted; identical to the Schema.java excerpt above.]
+
+public class Schema {
+
+  private final static Map<String, Column> columns = new HashMap<>();
+
+  static {
+    columns.put("timestamp", new TimestampImpl());
 
 Review comment:
   Sure, I'll move it to the one class.
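
[For illustration, a consolidated column registry along the lines discussed might look like the sketch below; the class name and the getColumn() helper are hypothetical, not taken from the PR:

  public final class PcapngSchema {
    private static final Map<String, Column> COLUMNS = new HashMap<>();

    static {
      // every column implementation registered in one place
      COLUMNS.put("timestamp", new TimestampImpl());
      COLUMNS.put("src_ip", new SrcIpImpl());
      // ... remaining column implementations registered here
    }

    public static Set<String> getColumnsNames() {
      return COLUMNS.keySet();
    }

    public static Column getColumn(String name) {
      return COLUMNS.get(name);
    }
  }
]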


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on issue #1385: DRILL-6612: Query fails with AssertionError when joining persistent a…

2018-07-18 Thread GitBox
vvysotskyi commented on issue #1385: DRILL-6612: Query fails with 
AssertionError when joining persistent a…
URL: https://github.com/apache/drill/pull/1385#issuecomment-405924429
 
 
   @arina-ielchiieva, thanks for the review!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added pcapng-format support

2018-07-18 Thread GitBox
Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added 
pcapng-format support
URL: https://github.com/apache/drill/pull/1126#discussion_r203371235
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcapng/PcapngRecordReader.java
 ##
 [Quoted diff of PcapngRecordReader.java omitted; identical to the excerpt above.]

[GitHub] Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added pcapng-format support

2018-07-18 Thread GitBox
Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added 
pcapng-format support
URL: https://github.com/apache/drill/pull/1126#discussion_r203371152
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcapng/PcapngRecordReader.java
 ##
 [Quoted diff of PcapngRecordReader.java omitted; identical to the excerpt above.]

[GitHub] Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added pcapng-format support

2018-07-18 Thread GitBox
Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added 
pcapng-format support
URL: https://github.com/apache/drill/pull/1126#discussion_r203371188
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcapng/PcapngRecordReader.java
 ##
 [Quoted diff of PcapngRecordReader.java omitted; identical to the excerpt above.]

[GitHub] Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added pcapng-format support

2018-07-18 Thread GitBox
Vlad-Storona commented on a change in pull request #1126: DRILL-6179: Added 
pcapng-format support
URL: https://github.com/apache/drill/pull/1126#discussion_r203371117
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcapng/PcapngRecordReader.java
 ##
 [Quoted diff of PcapngRecordReader.java omitted; identical to the excerpt above.]

[GitHub] arina-ielchiieva commented on issue #1385: DRILL-6612: Query fails with AssertionError when joining persistent a…

2018-07-18 Thread GitBox
arina-ielchiieva commented on issue #1385: DRILL-6612: Query fails with 
AssertionError when joining persistent a…
URL: https://github.com/apache/drill/pull/1385#issuecomment-405923205
 
 
   +1, LGTM.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi commented on a change in pull request #1385: DRILL-6612: Query fails with AssertionError when joining persistent a…

2018-07-18 Thread GitBox
vvysotskyi commented on a change in pull request #1385: DRILL-6612: Query fails 
with AssertionError when joining persistent a…
URL: https://github.com/apache/drill/pull/1385#discussion_r203368482
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SchemaUtilites.java
 ##
 @@ -144,6 +144,11 @@ public static String getSchemaPath(List<String> schemaPath) {
     return SCHEMA_PATH_JOINER.join(schemaPath);
   }
 
+  /** Utility method to get the list with schema path components for given schema path string. */
+  public static List<String> getSchemaPathList(String schemaPath) {
 
 Review comment:
   Thanks, done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] arina-ielchiieva commented on a change in pull request #1385: DRILL-6612: Query fails with AssertionError when joining persistent a…

2018-07-18 Thread GitBox
arina-ielchiieva commented on a change in pull request #1385: DRILL-6612: Query 
fails with AssertionError when joining persistent a…
URL: https://github.com/apache/drill/pull/1385#discussion_r203363652
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SchemaUtilites.java
 ##
 @@ -144,6 +144,11 @@ public static String getSchemaPath(List<String> schemaPath) {
     return SCHEMA_PATH_JOINER.join(schemaPath);
   }
 
+  /** Utility method to get the list with schema path components for given schema path string. */
+  public static List<String> getSchemaPathList(String schemaPath) {
 
 Review comment:
   Please rename to `getSchemaPathAsList`.
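
[For context, the new utility is the inverse of SCHEMA_PATH_JOINER; a minimal sketch of the renamed method, assuming '.' is the only separator (backtick-quoted path segments are ignored here for brevity):

  public static List<String> getSchemaPathAsList(String schemaPath) {
    return Arrays.asList(schemaPath.split("\\."));
  }
]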


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] arina-ielchiieva commented on issue #1386: DRILL-6574: Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)

2018-07-18 Thread GitBox
arina-ielchiieva commented on issue #1386: DRILL-6574: Add option to push 
LIMIT(0) on top of SCAN (late limit 0 optimization)
URL: https://github.com/apache/drill/pull/1386#issuecomment-405911162
 
 
   Please note that the Travis checks do not currently pass, since #1384 should 
be merged first.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] arina-ielchiieva commented on issue #1386: DRILL-6574: Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)

2018-07-18 Thread GitBox
arina-ielchiieva commented on issue #1386: DRILL-6574: Add option to push 
LIMIT(0) on top of SCAN (late limit 0 optimization)
URL: https://github.com/apache/drill/pull/1386#issuecomment-405893185
 
 
   @vvysotskyi could you please review?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] KazydubB opened a new pull request #1386: DRILL-6574: Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)

2018-07-18 Thread GitBox
KazydubB opened a new pull request #1386: DRILL-6574: Add option to push 
LIMIT(0) on top of SCAN (late limit 0 optimization)
URL: https://github.com/apache/drill/pull/1386
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] KazydubB commented on issue #1384: DRILL-6606: Fixed bug in HashJoin that caused it not to return OK_NEW_SCHEMA in some cases.

2018-07-18 Thread GitBox
KazydubB commented on issue #1384: DRILL-6606: Fixed bug in HashJoin that 
caused it not to return OK_NEW_SCHEMA in some cases.
URL: https://github.com/apache/drill/pull/1384#issuecomment-405886995
 
 
   @ilooner, was verifying my changes with your fix and found that there is a 
failing test:
   ```
   Failed tests: 
 
TestDrillbitResilience.memoryLeaksWhenCancelled:914->assertCancelledWithoutException:556->assertStateCompleted:542
 Query state is incorrect (expected: CANCELED, actual: FAILED) AND/OR 
   Exception thrown: org.apache.drill.common.exceptions.UserRemoteException: 
SYSTEM ERROR: IllegalStateException: Batch data read operation 
(getRecordCount()) attempted when last next() call on batch [#4207, 
HashJoinBatch] returned STOP (not OK or OK_NEW_SCHEMA).
   
   ```
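
[The failure above reflects the operator iterator contract: batch data may be read only after next() returned OK or OK_NEW_SCHEMA. A minimal self-contained illustration of that state check (hypothetical names, not Drill's actual implementation):

  enum IterOutcome { OK, OK_NEW_SCHEMA, STOP, NONE }

  final class BatchReadGuard {
    private IterOutcome lastOutcome = IterOutcome.NONE;

    void recordOutcome(IterOutcome outcome) {
      lastOutcome = outcome;
    }

    // mirrors the IllegalStateException in the quoted log
    void checkReadAllowed() {
      if (lastOutcome != IterOutcome.OK && lastOutcome != IterOutcome.OK_NEW_SCHEMA) {
        throw new IllegalStateException(
            "Batch data read attempted when last next() returned " + lastOutcome);
      }
    }
  }
]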


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] arina-ielchiieva commented on issue #1296: DRILL-5365: Prevent plugin config from changing default fs. Make DrillFileSystem Immutable.

2018-07-18 Thread GitBox
arina-ielchiieva commented on issue #1296: DRILL-5365: Prevent plugin config 
from changing default fs. Make DrillFileSystem Immutable.
URL: https://github.com/apache/drill/pull/1296#issuecomment-405874956
 
 
   @ilooner removed the ready-to-commit label since I want to make sure we 
address the comment about concurrency.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vvysotskyi opened a new pull request #1385: DRILL-6612: Query fails with AssertionError when joining persistent a…

2018-07-18 Thread GitBox
vvysotskyi opened a new pull request #1385: DRILL-6612: Query fails with 
AssertionError when joining persistent a…
URL: https://github.com/apache/drill/pull/1385
 
 
   …nd temporary tables
   
   For details please see 
[DRILL-6612](https://issues.apache.org/jira/browse/DRILL-6612)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ilooner commented on issue #1384: DRILL-6606: Fixed bug in HashJoin that caused it not to return OK_NEW_SCHEMA in some cases.

2018-07-18 Thread GitBox
ilooner commented on issue #1384: DRILL-6606: Fixed bug in HashJoin that caused 
it not to return OK_NEW_SCHEMA in some cases.
URL: https://github.com/apache/drill/pull/1384#issuecomment-405840660
 
 
   @Ben-Zvi please review. This should be merged before DRILL-6453.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

