[GitHub] weijietong commented on issue #1334: DRILL-6385: Support JPPD feature

2018-08-18 Thread GitBox
weijietong commented on issue #1334: DRILL-6385: Support JPPD feature
URL: https://github.com/apache/drill/pull/1334#issuecomment-414051224
 
 
   Thanks for your effort to review this PR @amansinha100 @sohami. It has been rebased onto the master and squashed into one commit.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] asfgit closed pull request #1419: DRILL-6662: Access AWS access key ID and secret access key using Cred…

2018-08-18 Thread GitBox
asfgit closed pull request #1419: DRILL-6662: Access AWS access key ID and secret access key using Cred…
URL: https://github.com/apache/drill/pull/1419
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/distribution/src/resources/core-site-example.xml b/distribution/src/resources/core-site-example.xml
index 854e54dbe67..c7225a14784 100644
--- a/distribution/src/resources/core-site-example.xml
+++ b/distribution/src/resources/core-site-example.xml
@@ -30,4 +30,18 @@
 ENTER_YOUR_SECRETKEY
 
 
+
+
+
+
+
+
 
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java
index b1f41a41451..cb66913ad13 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java
@@ -20,6 +20,7 @@
 import static org.apache.drill.exec.store.dfs.FileSystemSchemaFactory.DEFAULT_WS_NAME;
 
 import java.io.IOException;
+import java.net.URI;
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
@@ -77,6 +78,10 @@ public FileSystemPlugin(FileSystemConfig config, DrillbitContext context, String
     fsConf.set("fs.classpath.impl", ClassPathFileSystem.class.getName());
     fsConf.set("fs.drill-local.impl", LocalSyncableFileSystem.class.getName());
 
+    if (isS3Connection(fsConf)) {
+      handleS3Credentials(fsConf);
+    }
+
     formatCreator = newFormatCreator(config, context, fsConf);
     List matchers = new ArrayList<>();
     formatPluginsByConfig = new HashMap<>();
@@ -104,6 +109,33 @@ public FileSystemPlugin(FileSystemConfig config, DrillbitContext context, String
     }
   }
 
+  private boolean isS3Connection(Configuration conf) {
+    URI uri = FileSystem.getDefaultUri(conf);
+    return uri.getScheme().equals("s3a");
+  }
+
+  /**
+   * Retrieve secret and access keys from configured (with
+   * {@link org.apache.hadoop.security.alias.CredentialProviderFactory#CREDENTIAL_PROVIDER_PATH} property)
+   * credential providers and set it into {@code conf}. If provider path is not configured or credential
+   * is absent in providers, it will conditionally fallback to configuration setting. The fallback will occur unless
+   * {@link org.apache.hadoop.security.alias.CredentialProvider#CLEAR_TEXT_FALLBACK} is set to {@code false}.
+   *
+   * @param conf {@code Configuration} which will be updated with credentials from provider
+   * @throws IOException thrown if a credential cannot be retrieved from provider
+   */
+  private void handleS3Credentials(Configuration conf) throws IOException {
+    String[] credentialKeys = {"fs.s3a.secret.key", "fs.s3a.access.key"};
+    for (String key : credentialKeys) {
+      char[] credentialChars = conf.getPassword(key);
+      if (credentialChars == null) {
+        logger.warn(String.format("Property '%s' is absent.", key));
+      } else {
+        conf.set(key, String.valueOf(credentialChars));
+      }
+    }
+  }
+
   /**
* Creates a new FormatCreator instance.
*

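The credential handling above leans on Hadoop's `Configuration.getPassword()`, which consults any configured credential providers first and only falls back to clear-text properties when fallback is allowed. A minimal self-contained sketch of that resolution order; `MiniConf` is a hypothetical stand-in, not Hadoop's `Configuration`:

```java
import java.util.HashMap;
import java.util.Map;

public class CredentialDemo {

    // MiniConf is a hypothetical stand-in for Hadoop's Configuration; it models
    // the getPassword() contract: credential provider first, then clear text,
    // unless clear-text fallback is disabled.
    static class MiniConf {
        final Map<String, char[]> providerStore = new HashMap<>(); // jceks-like store
        final Map<String, String> props = new HashMap<>();         // plain config properties
        boolean clearTextFallback = true;

        char[] getPassword(String key) {
            char[] fromProvider = providerStore.get(key);
            if (fromProvider != null) {
                return fromProvider;
            }
            String clear = props.get(key);
            return (clearTextFallback && clear != null) ? clear.toCharArray() : null;
        }
    }

    public static void main(String[] args) {
        MiniConf conf = new MiniConf();
        conf.props.put("fs.s3a.access.key", "PLAIN");
        conf.providerStore.put("fs.s3a.secret.key", "SECRET".toCharArray());

        // Provider wins for the secret key; clear text is the fallback for the access key.
        System.out.println(new String(conf.getPassword("fs.s3a.secret.key"))); // SECRET
        System.out.println(new String(conf.getPassword("fs.s3a.access.key"))); // PLAIN
    }
}
```

This mirrors why the patch copies provider-resolved values back into the configuration with `conf.set`: downstream S3A code reads plain properties.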

 




[GitHub] asfgit closed pull request #1432: DRILL-6689: Include query user information to drillbit.log

2018-08-18 Thread GitBox
asfgit closed pull request #1432: DRILL-6689: Include query user information to drillbit.log
URL: https://github.com/apache/drill/pull/1432
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java b/exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java
index 16e8eb0d8cd..fa085e99b04 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java
@@ -260,9 +260,10 @@ public void run() {
         break;
       case SQL:
         final String sql = queryRequest.getPlan();
-        // log query id and query text before starting any real work. Also, put
+        // log query id, username and query text before starting any real work. Also, put
         // them together such that it is easy to search based on query id
-        logger.info("Query text for query id {}: {}", this.queryIdString, sql);
+        logger.info("Query text for query with id {} issued by {}: {}", queryIdString,
+            queryContext.getQueryUserName(), sql);
         runSQL(sql);
         break;
       case EXECUTION:
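The patched log line uses SLF4J-style `{}` placeholders. As a rough illustration, that substitution can be previewed without a Drill runtime; `QueryLogDemo` and its sample values are hypothetical:

```java
import java.util.Objects;

public class QueryLogDemo {

    // Minimal stand-in for SLF4J's "{}" placeholder substitution, so the
    // log line from the patch can be previewed in isolation.
    static String format(String template, Object... args) {
        StringBuilder sb = new StringBuilder();
        int argIdx = 0, from = 0, at;
        while ((at = template.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            sb.append(template, from, at).append(Objects.toString(args[argIdx++]));
            from = at + 2;
        }
        return sb.append(template.substring(from)).toString();
    }

    public static void main(String[] args) {
        // Hypothetical values; in Drill they come from queryIdString and
        // queryContext.getQueryUserName() inside Foreman.
        String line = format("Query text for query with id {} issued by {}: {}",
                "2a1f0b7c", "alice", "SELECT 1");
        System.out.println(line);
    }
}
```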


 




[GitHub] asfgit closed pull request #1428: DRILL-6670: align Parquet TIMESTAMP_MICROS logical type handling with earlier versions + minor fixes

2018-08-18 Thread GitBox
asfgit closed pull request #1428: DRILL-6670: align Parquet TIMESTAMP_MICROS logical type handling with earlier versions + minor fixes
URL: https://github.com/apache/drill/pull/1428
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java
index 47f2e18a9a7..e0ff85d4c3c 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java
@@ -106,11 +106,21 @@ protected ScanBatch getBatch(ExecutorFragmentContext context, AbstractParquetRow
           ParquetReaderUtility.detectCorruptDates(footer, rowGroupScan.getColumns(), autoCorrectCorruptDates);
       logger.debug("Contains corrupt dates: {}", containsCorruptDates);
 
-      if (!context.getOptions().getBoolean(ExecConstants.PARQUET_NEW_RECORD_READER)
-          && !ParquetReaderUtility.containsComplexColumn(footer, rowGroupScan.getColumns())) {
-        logger.debug("Query {} qualifies for new Parquet reader",
-            QueryIdHelper.getQueryId(oContext.getFragmentContext().getHandle().getQueryId()));
-        readers.add(new ParquetRecordReader(context,
+      boolean useNewReader = context.getOptions().getBoolean(ExecConstants.PARQUET_NEW_RECORD_READER);
+      boolean containsComplexColumn = ParquetReaderUtility.containsComplexColumn(footer, rowGroupScan.getColumns());
+      logger.debug("PARQUET_NEW_RECORD_READER is {}. Complex columns {}.", useNewReader ? "enabled" : "disabled",
+          containsComplexColumn ? "found." : "not found.");
+      RecordReader reader;
+
+      if (useNewReader || containsComplexColumn) {
+        reader = new DrillParquetReader(context,
+            footer,
+            rowGroup,
+            columnExplorer.getTableColumns(),
+            fs,
+            containsCorruptDates);
+      } else {
+        reader = new ParquetRecordReader(context,
             rowGroup.getPath(),
             rowGroup.getRowGroupIndex(),
             rowGroup.getNumRecordsToRead(),
@@ -118,18 +128,14 @@ protected ScanBatch getBatch(ExecutorFragmentContext context, AbstractParquetRow
             CodecFactory.createDirectCodecFactory(fs.getConf(), new ParquetDirectByteBufferAllocator(oContext.getAllocator()), 0),
             footer,
             rowGroupScan.getColumns(),
-            containsCorruptDates));
-      } else {
-        logger.debug("Query {} doesn't qualify for new reader, using old one",
-            QueryIdHelper.getQueryId(oContext.getFragmentContext().getHandle().getQueryId()));
-        readers.add(new DrillParquetReader(context,
-            footer,
-            rowGroup,
-            columnExplorer.getTableColumns(),
-            fs,
-            containsCorruptDates));
+            containsCorruptDates);
       }
 
+      logger.debug("Query {} uses {}",
+          QueryIdHelper.getQueryId(oContext.getFragmentContext().getHandle().getQueryId()),
+          reader.getClass().getSimpleName());
+      readers.add(reader);
+
       List partitionValues = rowGroupScan.getPartitionValues(rowGroup);
       Map implicitValues = columnExplorer.populateImplicitColumns(rowGroup.getPath(), partitionValues, rowGroupScan.supportsFileImplicitColumns());
       implicitColumns.add(implicitValues);
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java
index 798d3c27306..03d5382fcda 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java
@@ -77,6 +77,10 @@
       } else if (!columnChunkMetaData.getEncodings().contains(Encoding.PLAIN_DICTIONARY) && (
           columnChunkMetaData.getType() == PrimitiveType.PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY
           || columnChunkMetaData.getType() == PrimitiveType.PrimitiveTypeName.INT96)) {
+        if (convertedType == null) {
+          return new FixedByteAlignedReader.FixedBinaryReader(recordReader, descriptor,
+              columnChunkMetaData, (VariableWidthVector) v, schemaElement);
+        }
         switch (convertedType) {
           case DECIMAL:
             return new FixedByteAlignedReader.VarDecimalReader(recordReader, descriptor,
@
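The refactoring in AbstractParquetScanBatchCreator reduces to one decision: choose a reader implementation, then log its class and register it once. A sketch of that selection predicate with stand-in types (not Drill's actual classes):

```java
public class ReaderSelectionDemo {

    // Stand-ins for Drill's two Parquet readers; only the selection
    // logic from the patch is modeled here.
    interface RecordReader {}
    static class DrillParquetReader implements RecordReader {}
    static class ParquetRecordReader implements RecordReader {}

    static RecordReader select(boolean useNewReader, boolean containsComplexColumn) {
        // Same predicate as the patch: DrillParquetReader is chosen when the
        // option is set or when complex columns are present.
        if (useNewReader || containsComplexColumn) {
            return new DrillParquetReader();
        }
        return new ParquetRecordReader();
    }

    public static void main(String[] args) {
        // Logging the chosen class once, as the patch does via getSimpleName().
        System.out.println(select(false, true).getClass().getSimpleName());
        System.out.println(select(false, false).getClass().getSimpleName());
    }
}
```

Collapsing the two `readers.add(...)` calls into a single `reader` variable is what lets the debug line report the chosen class uniformly.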

[GitHub] ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader

2018-08-18 Thread GitBox
ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader
URL: https://github.com/apache/drill/pull/1429#discussion_r211061681
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/test/rowSet/test/TestVariantAccessors.java
 ##
 @@ -0,0 +1,1178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.test.rowSet.test;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertNull;
+import static org.junit.Assert.assertSame;
+import static org.junit.Assert.assertTrue;
+
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.NullableBigIntVector;
+import org.apache.drill.exec.vector.NullableFloat8Vector;
+import org.apache.drill.exec.vector.NullableIntVector;
+import org.apache.drill.exec.vector.NullableVarCharVector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.accessor.ArrayReader;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ObjectReader;
+import org.apache.drill.exec.vector.accessor.ObjectType;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarReader;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleReader;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.exec.vector.accessor.VariantReader;
+import org.apache.drill.exec.vector.accessor.VariantWriter;
+import org.apache.drill.exec.vector.complex.ListVector;
+import org.apache.drill.exec.vector.complex.MapVector;
+import org.apache.drill.exec.vector.complex.UnionVector;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSetReader;
+import org.apache.drill.test.rowSet.RowSetWriter;
+import org.apache.drill.test.rowSet.RowSet.ExtendableRowSet;
+import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.test.rowSet.schema.SchemaBuilder;
+import org.junit.Test;
+
+/**
+ * Tests for readers and writers for union and list types.
+ * 
+ * Note that the union type is only partially supported in Drill.
+ * The list type is unsupported. (However, the list type works in
+ * the schema builder, row set writer, row set reader and the
+ * result set builder. It does not, however, work in the Project
+ * and other operators. Some assembly required for future use.)
+ */
+
+public class TestVariantAccessors extends SubOperatorTest {
+
+  @SuppressWarnings("resource")
+  @Test
+  public void testBuildRowSetUnion() {
+    final TupleMetadata schema = new SchemaBuilder()
+
+        // Union with simple and complex types
+
+        .addUnion("u")
+          .addType(MinorType.INT)
+          .addMap()
+            .addNullable("c", MinorType.BIGINT)
+            .addNullable("d", MinorType.VARCHAR)
+            .resumeUnion()
+          .addList()
+            .addType(MinorType.VARCHAR)
+            .buildNested()
+          .resumeSchema()
+        .buildSchema();
+
+    final ExtendableRowSet rowSet = fixture.rowSet(schema);
+    final VectorContainer vc = rowSet.container();
+    assertEquals(1, vc.getNumberOfColumns());
+
+    // Single union
+
+    final ValueVector vector = vc.getValueVector(0).getValueVector();
+    assertTrue(vector instanceof UnionVector);
+    final UnionVector union = (UnionVector) vector;
+
+    final MapVector typeMap = union.getTypeMap();
+    ValueVector member = typeMap.getChild(MinorType.INT.name());
+    assertTrue(member instanceof NullableIntVector);
+
+    // Inner map
+
+    member = typeMap.getChild(MinorType.MAP.name());
+    assertTrue(member instanceof MapVector);
+    member = typeMap.getChild(MinorType.MAP.name());
+    assertTrue(member ins

[GitHub] ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader

2018-08-18 Thread GitBox
ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader
URL: https://github.com/apache/drill/pull/1429#discussion_r211063693
 
 

 ##
 File path: exec/vector/src/main/java/org/apache/drill/exec/vector/complex/ListVector.java
 ##
@@ -250,22 +270,140 @@ public void load(UserBitShared.SerializedField metadata, DrillBuf buffer) {
     bits.load(bitMetadata, buffer.slice(offsetLength, bitLength));
 
     final UserBitShared.SerializedField vectorMetadata = metadata.getChild(2);
-    if (getDataVector() == DEFAULT_DATA_VECTOR) {
+    if (isEmptyType()) {
       addOrGetVector(VectorDescriptor.create(vectorMetadata.getMajorType()));
     }
 
     final int vectorLength = vectorMetadata.getBufferLength();
     vector.load(vectorMetadata, buffer.slice(offsetLength + bitLength, vectorLength));
   }
 
+  public boolean isEmptyType() {
+    return getDataVector() == DEFAULT_DATA_VECTOR;
+  }
+
+  @Override
+  public void setChildVector(ValueVector childVector) {
+
+    // Unlike the repeated list vector, the (plain) list vector
+    // adds the dummy vector as a child type.
+
+    assert field.getChildren().size() == 1;
+    assert field.getChildren().iterator().next().getType().getMinorType() == MinorType.LATE;
+    field.removeChild(vector.getField());
+
+    super.setChildVector(childVector);
+
+    // Initial LATE type vector not added as a subtype initially.
+    // So, no need to remove it, just add the new subtype. Since the
+    // MajorType is immutable, must build a new one and replace the type
+    // in the materialized field. (We replace the type, rather than creating
+    // a new materialized field, to preserve the link to this field from
+    // a parent map, list or union.)
+
+    assert field.getType().getSubTypeCount() == 0;
+    field.replaceType(
+        field.getType().toBuilder()
+          .addSubType(childVector.getField().getType().getMinorType())
+          .build());
+  }
+
+  /**
+   * Promote the list to a union. Called from old-style writers. This implementation
+   * relies on the caller to set the types vector for any existing values.
+   * This method simply clears the existing vector.
+   *
+   * @return the new union vector
+   */
+
   public UnionVector promoteToUnion() {
-    MaterializedField newField = MaterializedField.create(getField().getName(), Types.optional(MinorType.UNION));
-    UnionVector vector = new UnionVector(newField, allocator, null);
+    UnionVector vector = createUnion();
+
+    // Replace the current vector, clearing its data. (This is the
+    // old behavior.
+
     replaceDataVector(vector);
     reader = new UnionListReader(this);
     return vector;
   }
 
+  /**
+   * Revised form of promote to union that correctly fixes up the list
+   * field metadata to match the new union type.
+   *
+   * @return the new union vector
+   */
+
+  public UnionVector promoteToUnion2() {
 
 Review comment:
   Do we need both versions of promoteToUnion? Please fix the comments: both of them say "old behavior". Also, if we want to keep both of them, can we name them better? Maybe promoteToUnionNew instead of promoteToUnion2?




[GitHub] ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader

2018-08-18 Thread GitBox
ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader
URL: https://github.com/apache/drill/pull/1429#discussion_r211062745
 
 

 ##
 File path: exec/vector/src/main/java/org/apache/drill/exec/vector/complex/BaseRepeatedValueVector.java
 ##
@@ -212,6 +212,18 @@ protected void replaceDataVector(ValueVector v) {
     vector = v;
   }
 
+  public void setChildVector(ValueVector childVector) {
+
+    // When created, the list uses the default vector of type LATE.
+    // That entry appears as a child vector. Remove it and add the
+    // new type instead.
+
+    assert vector == DEFAULT_DATA_VECTOR;
+    replaceDataVector(childVector);
 
 Review comment:
   Do we want to check that childVector is valid, i.e. not null? Any other checks?
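A sketch of the kind of validation the reviewer is asking about, using a hypothetical `ChildHolder` in place of `BaseRepeatedValueVector`. Whether explicit checks or the existing `assert` is right for Drill is the open question; this only shows the explicit-check style:

```java
public class ChildVectorCheckDemo {

    // ChildHolder is a hypothetical stand-in for BaseRepeatedValueVector;
    // it replaces the bare assert with explicit precondition checks.
    static class ChildHolder {
        private Object child;             // stand-in for the data vector
        private boolean isDefault = true; // stand-in for vector == DEFAULT_DATA_VECTOR

        void setChild(Object newChild) {
            if (newChild == null) {
                throw new IllegalArgumentException("child vector must not be null");
            }
            if (!isDefault) {
                throw new IllegalStateException("child vector already set");
            }
            child = newChild;
            isDefault = false;
        }
    }

    public static void main(String[] args) {
        ChildHolder holder = new ChildHolder();
        holder.setChild("varchar-vector"); // ok: replaces the default child
        try {
            holder.setChild(null);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

An `assert` disappears unless the JVM runs with `-ea`, which is the usual argument for explicit checks on public entry points.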




[GitHub] ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader

2018-08-18 Thread GitBox
ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader
URL: https://github.com/apache/drill/pull/1429#discussion_r211063114
 
 

 ##
 File path: exec/vector/src/main/java/org/apache/drill/exec/vector/complex/ListVector.java
 ##
 @@ -41,25 +43,42 @@
 import org.apache.drill.exec.vector.complex.reader.FieldReader;
 import org.apache.drill.exec.vector.complex.writer.FieldWriter;
 
-import java.util.List;
-import java.util.Set;
+import com.google.common.collect.ObjectArrays;
+
+import io.netty.buffer.DrillBuf;
 
 public class ListVector extends BaseRepeatedValueVector {
 
-  private UInt4Vector offsets;
+  public static final String UNION_VECTOR_NAME = "$union$";
+
   private final UInt1Vector bits;
-  private Mutator mutator = new Mutator();
-  private Accessor accessor = new Accessor();
+  private final Mutator mutator = new Mutator();
+  private final Accessor accessor = new Accessor();
   private UnionListWriter writer;
   private UnionListReader reader;
 
   public ListVector(MaterializedField field, BufferAllocator allocator, CallBack callBack) {
     super(field, allocator);
+    // Can't do this. See below.
 
 Review comment:
   What are the implications of this "can't do this", and of the commented-out code below? Do we want to leave commented-out code here?




[GitHub] ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader

2018-08-18 Thread GitBox
ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader
URL: https://github.com/apache/drill/pull/1429#discussion_r211076352
 
 

 ##
 File path: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/VariantReader.java
 ##
 @@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.accessor;
+
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.metadata.VariantMetadata;
+
+/**
+ * Reader for a Drill "union vector." The union vector is presented
+ * as a reader over a set of variants. In the old Visual Basic world,
+ * "the Variant data type is a tagged union that can be used to
+ * represent any other data type." The term is used here to avoid
+ * confusion with the "union operator" which is something entirely
+ * different.
+ * 
+ * At read time, the set of possible types is fixed. A request to
+ * obtain a reader for an unused type returns a null pointer.
+ * 
+ * This reader is essentially a map of types: it allows access to
+ * type-specific readers for the set of types supported in the
+ * current vector. A client checks the type of each value, then
+ * uses the proper type-specific reader to access that value.
+ *
+ * @see {@link VariantWriter}
+ */
+
+public interface VariantReader extends ColumnReader {
+
+  VariantMetadata variantSchema();
+
+  int size();
+
+  /**
+   * Determine if a given type is supported by the union vector
+   * for some value in the result set.
+   *
+   * @param type the Drill minor type to query
+   *
+   * @return true if a reader for the given type is available,
+   * false if a request for a reader of that type will return
+   * null.
+   */
+
+  boolean hasType(MinorType type);
+
+  /**
+   * Return the member reader for the given type. The type must be a member
+   * of the union. Allows caching readers across rows.
+   *
+   * @param type member type
+   * @return reader for that type
+   */
+
+  ObjectReader member(MinorType type);
+
+  /**
+   * Return the scalar reader for the given type member. The type must be a
+   * member of the union. Allows caching readers across rows. Identical to:
+   * member(type).scalar()
+   *
+   * @param type member type
+   * @return scalar reader for that type
+   */
+
+  ScalarReader scalar(MinorType type);
+
+  /**
+   * Return the data type of the current value. (What happens if the row is
+   * null, must it be a null of some type?)
+   *
+   * @return data type of the current data value
+   */
+
+  MinorType dataType();
+
+  /**
+   * Return the writer for the member type of the current row.
 
 Review comment:
   return the "reader"
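The "map of types" behavior the javadoc describes, with `hasType` returning false and `member` returning null for unused types, can be sketched with simplified stand-ins for `MinorType` and the per-type readers (none of this is Drill code):

```java
import java.util.EnumMap;
import java.util.Map;

public class VariantDemo {

    // Simplified stand-ins; names mirror the VariantReader interface above
    // but the types are hypothetical.
    enum MinorType { INT, VARCHAR }

    interface Reader { Object read(); }

    static class Variant {
        private final Map<MinorType, Reader> members = new EnumMap<>(MinorType.class);

        void addMember(MinorType type, Reader reader) { members.put(type, reader); }

        // Like VariantReader.hasType(): is a reader available for this type?
        boolean hasType(MinorType type) { return members.containsKey(type); }

        // Like VariantReader.member(): null for an unused type.
        Reader member(MinorType type) { return members.get(type); }
    }

    public static void main(String[] args) {
        Variant variant = new Variant();
        variant.addMember(MinorType.INT, () -> 42);

        // A client checks the type first, then uses the type-specific reader.
        System.out.println(variant.hasType(MinorType.INT));
        System.out.println(variant.member(MinorType.VARCHAR));
    }
}
```

Caching the reader returned by `member(type)` across rows, as the javadoc suggests, is just holding the map lookup result.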




[GitHub] ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader

2018-08-18 Thread GitBox
ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader
URL: https://github.com/apache/drill/pull/1429#discussion_r211061460
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/rowSet/impl/TestResultSetLoaderRepeatedList.java
 ##
 @@ -0,0 +1,390 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.rowSet.impl;
+
+import static org.apache.drill.test.rowSet.RowSetUtilities.objArray;
+import static org.apache.drill.test.rowSet.RowSetUtilities.strArray;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertTrue;
+
+import java.util.Arrays;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.rowSet.ResultSetLoader;
+import org.apache.drill.exec.physical.rowSet.RowSetLoader;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.ColumnMetadata.StructureType;
+import org.apache.drill.exec.record.metadata.RepeatedListColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.accessor.ArrayReader;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ObjectType;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarReader;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.ValueType;
+import org.apache.drill.exec.vector.accessor.writer.RepeatedListWriter;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSet;
+import org.apache.drill.test.rowSet.RowSetReader;
+import org.apache.drill.test.rowSet.RowSetUtilities;
+import org.apache.drill.test.rowSet.schema.SchemaBuilder;
+import org.junit.Test;
+
+import com.google.common.base.Charsets;
+
+/**
+ * Tests repeated list support. Repeated lists add another layer of dimensionality
+ * on top of other repeated types. Since a repeated list can wrap another repeated
+ * list, the repeated list allows creating 2D, 3D or higher dimensional arrays (lists,
+ * actually, since the different "slices" need not have the same length...)
+ * Repeated lists appear to be used only by JSON.
+ */
+
+public class TestResultSetLoaderRepeatedList extends SubOperatorTest {
+
+  @Test
+  public void test2DEarlySchema() {
+    TupleMetadata schema = new SchemaBuilder()
+        .add("id", MinorType.INT)
+        .addRepeatedList("list2")
+          .addArray(MinorType.VARCHAR)
+          .resumeSchema()
+        .buildSchema();
+
+    ResultSetLoaderImpl.ResultSetOptions options = new OptionBuilder()
+        .setSchema(schema)
+        .build();
+    ResultSetLoader rsLoader = new ResultSetLoaderImpl(fixture.allocator(), options);
+
+    do2DTest(schema, rsLoader);
+    rsLoader.close();
+  }
+
+  private void do2DTest(TupleMetadata schema, ResultSetLoader rsLoader) {
+    RowSetLoader writer = rsLoader.writer();
 
 Review comment:
   Not a must, but how about writing a test for 3D, i.e. RepeatedRepeatedList?
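For intuition, the reviewer's 3D case is just one more nesting level. In plain Java collections (a rough analogy, not the Drill vector model):

```java
import java.util.Arrays;
import java.util.List;

public class NestedListDemo {

    // A repeated list adds one array dimension; a 3D case corresponds
    // to a list of lists of lists. Plain collections only.
    static List<List<List<String>>> cube() {
        return Arrays.asList(
            Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c")),
            Arrays.asList(Arrays.asList("d")));
    }

    public static void main(String[] args) {
        List<List<List<String>>> cube = cube();
        // "Slices" need not share a length, as the test-class javadoc notes.
        System.out.println(cube.get(0).size()); // 2
        System.out.println(cube.get(1).size()); // 1
    }
}
```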




[GitHub] ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader

2018-08-18 Thread GitBox
ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader
URL: https://github.com/apache/drill/pull/1429#discussion_r211076015
 
 

 ##
 File path: exec/vector/src/main/java/org/apache/drill/exec/vector/complex/ListVector.java
 ##
@@ -250,22 +270,140 @@ public void load(UserBitShared.SerializedField metadata, DrillBuf buffer) {
     bits.load(bitMetadata, buffer.slice(offsetLength, bitLength));
 
     final UserBitShared.SerializedField vectorMetadata = metadata.getChild(2);
-    if (getDataVector() == DEFAULT_DATA_VECTOR) {
+    if (isEmptyType()) {
       addOrGetVector(VectorDescriptor.create(vectorMetadata.getMajorType()));
     }
 
     final int vectorLength = vectorMetadata.getBufferLength();
     vector.load(vectorMetadata, buffer.slice(offsetLength + bitLength, vectorLength));
   }
 
+  public boolean isEmptyType() {
+return getDataVector() == DEFAULT_DATA_VECTOR;
+  }
+
+  @Override
+  public void setChildVector(ValueVector childVector) {
+
+// Unlike the repeated list vector, the (plain) list vector
+// adds the dummy vector as a child type.
+
+assert field.getChildren().size() == 1;
+assert field.getChildren().iterator().next().getType().getMinorType() == MinorType.LATE;
+field.removeChild(vector.getField());
+
+super.setChildVector(childVector);
+
+// Initial LATE type vector not added as a subtype initially.
+// So, no need to remove it, just add the new subtype. Since the
+// MajorType is immutable, must build a new one and replace the type
+// in the materialized field. (We replace the type, rather than creating
+// a new materialized field, to preserve the link to this field from
+// a parent map, list or union.)
+
+assert field.getType().getSubTypeCount() == 0;
+field.replaceType(
+field.getType().toBuilder()
+  .addSubType(childVector.getField().getType().getMinorType())
+  .build());
+  }
+
+  /**
+   * Promote the list to a union. Called from old-style writers. This implementation
+   * relies on the caller to set the types vector for any existing values.
+   * This method simply clears the existing vector.
+   *
+   * @return the new union vector
+   */
+
   public UnionVector promoteToUnion() {
-MaterializedField newField = MaterializedField.create(getField().getName(), Types.optional(MinorType.UNION));
-UnionVector vector = new UnionVector(newField, allocator, null);
+UnionVector vector = createUnion();
+
+// Replace the current vector, clearing its data. (This is the
+// old behavior.)
+
 replaceDataVector(vector);
 reader = new UnionListReader(this);
 return vector;
   }
 
+  /**
+   * Revised form of promote to union that correctly fixes up the list
+   * field metadata to match the new union type.
+   *
+   * @return the new union vector
+   */
+
+  public UnionVector promoteToUnion2() {
+UnionVector unionVector = createUnion();
+
+// Replace the current vector, clearing its data. (This is the
+// old behavior.)
+
+setChildVector(unionVector);
+return unionVector;
+  }
+
+  /**
+   * Promote to a union, preserving the existing data vector as a member of
+   * the new union. Back-fill the types vector with the proper type value
+   * for existing rows.
+   *
+   * @return the new union vector
+   */
+
+  public UnionVector convertToUnion(int allocValueCount, int valueCount) {
+assert allocValueCount >= valueCount;
+UnionVector unionVector = createUnion();
+unionVector.allocateNew(allocValueCount);
+
+// Preserve the current vector (and its data) if it is other than
+// the default. (New behavior used by column writers.)
+
+if (! isEmptyType()) {
+  unionVector.addType(vector);
+  int prevType = vector.getField().getType().getMinorType().getNumber();
+  UInt1Vector.Mutator typeMutator = unionVector.getTypeVector().getMutator();
+
+  // If the previous vector was nullable, then promote the nullable state
+  // to the type vector by setting either the null marker or the type
+  // marker depending on the original nullable values.
+
+  if (vector instanceof NullableVector) {
+UInt1Vector.Accessor bitsAccessor =
+((UInt1Vector) ((NullableVector) vector).getBitsVector()).getAccessor();
+for (int i = 0; i < valueCount; i++) {
+  typeMutator.setSafe(i, (bitsAccessor.get(i) == 0)
+  ? UnionVector.NULL_MARKER
+  : prevType);
+}
+  } else {
+
+// The value is not nullable. (Perhaps it is a map.)
+// Note that the original design of lists has a flaw: if the sole member
+// is a map, then map entries can't be nullable when the map is the only
+// type, but become nullable when in a union. What a mess...
 
 Review comment:
   Should we add those JIRAs in the comments? There is a lot of detailed information.
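The nullable-to-union back-fill that convertToUnion performs on the types vector can be sketched without Drill's vector classes; the marker value and array stand-ins below are assumptions for illustration, not Drill's actual constants:

```java
public class TypeBackfillSketch {
    // Stand-in for UnionVector.NULL_MARKER; the actual value is an assumption here.
    static final int NULL_MARKER = 0;

    // Mirrors the loop in convertToUnion(): each row's nullable "bits" flag
    // decides whether the union's type entry becomes the null marker or the
    // previous vector's type code.
    static int[] backfillTypes(byte[] bits, int valueCount, int prevType) {
        int[] types = new int[valueCount];
        for (int i = 0; i < valueCount; i++) {
            types[i] = (bits[i] == 0) ? NULL_MARKER : prevType;
        }
        return types;
    }

    public static void main(String[] args) {
        // Rows 0 and 2 hold values of type code 5; row 1 was null.
        int[] types = backfillTypes(new byte[] {1, 0, 1}, 3, 5);
        System.out.println(java.util.Arrays.toString(types)); // [5, 0, 5]
    }
}
```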

---

[GitHub] ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader

2018-08-18 Thread GitBox
ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List 
and Repeated List types to Result Set Loader
URL: https://github.com/apache/drill/pull/1429#discussion_r211064266
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/BuildFromSchema.java
 ##
 @@ -96,4 +132,67 @@ private void expandMap(ObjectWriter colWriter, ColumnMetadata colSchema) {
   buildTuple(colWriter.tuple(), colSchema.mapSchema());
 }
   }
+
+  private void buildVariant(ParentShim parent, ColumnMetadata colSchema) {
+final ObjectWriter colWriter = parent.add(colSchema.cloneEmpty());
+expandVariant(colWriter, colSchema);
+  }
+
+  private void expandVariant(ObjectWriter colWriter, ColumnMetadata colSchema) {
+if (colSchema.isArray()) {
+  buildUnion(colWriter.array().variant(), colSchema.variantSchema());
+} else {
+  buildUnion(colWriter.variant(), colSchema.variantSchema());
+}
+  }
+
+  public void buildUnion(VariantWriter writer, VariantMetadata schema) {
+final UnionShim unionShim = new UnionShim(writer);
+for (final ColumnMetadata member : schema.members()) {
+  buildColumn(unionShim, member);
+}
+  }
+
+  private void buildSingleList(ParentShim parent, ColumnMetadata colSchema) {
+final ColumnMetadata seed = colSchema.cloneEmpty();
+final ColumnMetadata subtype = colSchema.variantSchema().listSubtype();
+seed.variantSchema().addType(subtype.cloneEmpty());
+seed.variantSchema().becomeSimple();
+final ObjectWriter listWriter = parent.add(seed);
+expandColumn(listWriter, subtype);
+  }
+
+  /**
+   * Expand a repeated list. The list may be multi-dimensional, meaning that
+   * it may have many layers of other repeated lists before we get to the
+   * element (inner-most) array.
+   *
+   * @param parent shim for the parent that holds the array
+   * @param colSchema schema definition of the array
+   */
+
+  private void buildRepeatedList(ParentShim parent, ColumnMetadata colSchema) {
+final ColumnMetadata seed = colSchema.cloneEmpty();
+final RepeatedListWriter listWriter = (RepeatedListWriter) parent.add(seed).array();
+final ColumnMetadata elements = colSchema.childSchema();
+if (elements != null) {
+  final RepeatedListShim listShim = new RepeatedListShim(listWriter);
+  buildColumn(listShim, elements);
+}
+  }
+
+  private void expandColumn(ObjectWriter colWriter, ColumnMetadata colSchema) {
+
+if (colSchema.structureType() == StructureType.MULTI_ARRAY) {
+  assert false;
+} else if (colSchema.isMap()) {
+  expandMap(colWriter, colSchema);
+} else if (isSingleList(colSchema)) {
+  assert false;
+} else if (colSchema.isVariant()) {
+  expandVariant(colWriter, colSchema);
+} else {
+  // Nothing to expand for primitives
 
 Review comment:
   No need for this else branch then? You can leave the comment.




[GitHub] ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader

2018-08-18 Thread GitBox
ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List 
and Repeated List types to Result Set Loader
URL: https://github.com/apache/drill/pull/1429#discussion_r211061558
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/physical/rowSet/impl/TestResultSetLoaderUnions.java
 ##
 @@ -0,0 +1,647 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.rowSet.impl;
+
+import static org.apache.drill.test.rowSet.RowSetUtilities.listValue;
+import static org.apache.drill.test.rowSet.RowSetUtilities.mapValue;
+import static org.apache.drill.test.rowSet.RowSetUtilities.strArray;
+import static org.apache.drill.test.rowSet.RowSetUtilities.variantArray;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertSame;
+import static org.junit.Assert.assertTrue;
+
+import java.util.Arrays;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.rowSet.ResultSetLoader;
+import org.apache.drill.exec.physical.rowSet.RowSetLoader;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.NullableIntVector;
+import org.apache.drill.exec.vector.NullableVarCharVector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ObjectType;
+import org.apache.drill.exec.vector.accessor.ObjectWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.exec.vector.accessor.ValueType;
+import org.apache.drill.exec.vector.accessor.VariantWriter;
+import org.apache.drill.exec.vector.accessor.writer.EmptyListShim;
+import org.apache.drill.exec.vector.accessor.writer.ListWriterImpl;
+import org.apache.drill.exec.vector.accessor.writer.SimpleListShim;
+import org.apache.drill.exec.vector.accessor.writer.UnionVectorShim;
+import org.apache.drill.exec.vector.accessor.writer.UnionWriterImpl;
+import org.apache.drill.exec.vector.complex.ListVector;
+import org.apache.drill.exec.vector.complex.UnionVector;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSet;
+import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.test.rowSet.RowSetBuilder;
+import org.apache.drill.test.rowSet.RowSetReader;
+import org.apache.drill.test.rowSet.RowSetUtilities;
+import org.apache.drill.test.rowSet.schema.SchemaBuilder;
+import org.junit.Test;
+
+import com.google.common.base.Charsets;
+
+/**
+ * Tests the result set loader support for union vectors. Union vectors
+ * are only lightly supported in Apache Drill and not supported at all
+ * in commercial versions. They have many problems, both in code and in theory.
+ * Most operators do not support them. But, JSON uses them, so they must
+ * be made to work in the result set loader layer.
+ */
+
+public class TestResultSetLoaderUnions extends SubOperatorTest {
+
+  @Test
+  public void testUnionBasics() {
+final TupleMetadata schema = new SchemaBuilder()
+.add("id", MinorType.INT)
+.addUnion("u")
+  .addType(MinorType.VARCHAR)
+  .addMap()
+.addNullable("a", MinorType.INT)
+.addNullable("b", MinorType.VARCHAR)
+.resumeUnion()
+  .resumeSchema()
+.buildSchema();
+
+final ResultSetLoaderImpl.ResultSetOptions options = new OptionBuilder()
+.setSchema(schema)
+.build();
+final ResultSetLoader rsLoader = new ResultSetLoaderImpl(fixture.allocator(), options);
+final RowSetLoader writer = rsLoader.writer();
+
+// Sanity check of writer structure
+
+final ObjectWriter wo = writer.column(1);
+assertEquals(

[GitHub] ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List and Repeated List types to Result Set Loader

2018-08-18 Thread GitBox
ppadma commented on a change in pull request #1429: DRILL-6676: Add Union, List 
and Repeated List types to Result Set Loader
URL: https://github.com/apache/drill/pull/1429#discussion_r211064165
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/BuildFromSchema.java
 ##
 @@ -60,17 +90,23 @@ public ObjectWriter add(ColumnMetadata colSchema) {
*/
 
   public void buildTuple(TupleWriter writer, TupleMetadata schema) {
-ParentShim tupleShim = new TupleShim(writer);
+final ParentShim tupleShim = new TupleShim(writer);
 for (int i = 0; i < schema.size(); i++) {
-  ColumnMetadata colSchema = schema.metadata(i);
+  final ColumnMetadata colSchema = schema.metadata(i);
   buildColumn(tupleShim, colSchema);
 }
   }
 
   private void buildColumn(ParentShim parent, ColumnMetadata colSchema) {
 
-if (colSchema.isMap()) {
+if (colSchema.structureType() == StructureType.MULTI_ARRAY) {
 
 Review comment:
   can we have isMultiArray method ?




[GitHub] vvysotskyi commented on issue #1409: DRILL-6644: Don't reserve space for incoming probe batches unnecessarily during the build phase.

2018-08-18 Thread GitBox
vvysotskyi commented on issue #1409: DRILL-6644: Don't reserve space for 
incoming probe batches unnecessarily during the build phase.
URL: https://github.com/apache/drill/pull/1409#issuecomment-414060965
 
 
   @ilooner, this PR causes Advanced test failures:
   ```
   tpcds/tpcds_sf1/original/parquet/query11.sql
   tpcds/tpcds_sf1/original/parquet/query4.sql
   
   java.sql.SQLException: SYSTEM ERROR: IllegalStateException
   
   Fragment 35:1
   
   [Error Id: 81d1f358-6871-4ab4-a79c-d632b6e5c1ae on cv1:31010]
   
  (java.lang.IllegalStateException) null
    com.google.common.base.Preconditions.checkState():158
    org.apache.drill.exec.physical.impl.join.BatchSizePredictorImpl.predictBatchSize():76
    org.apache.drill.exec.physical.impl.join.HashJoinMemoryCalculatorImpl$PostBuildCalculationsImpl.initialize():635
    org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase():844
    org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():414
    org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():544
    org.apache.drill.exec.record.AbstractRecordBatch.next():172
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():142
    org.apache.drill.exec.record.AbstractRecordBatch.next():172
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():142
    org.apache.drill.exec.record.AbstractRecordBatch.next():172
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.test.generated.HashAggregatorGen75169.doWork():649
    org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():273
    org.apache.drill.exec.record.AbstractRecordBatch.next():172
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():142
    org.apache.drill.exec.record.AbstractRecordBatch.next():172
    org.apache.drill.exec.physical.impl.BaseRootExec.next():103
    org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
    org.apache.drill.exec.physical.impl.BaseRootExec.next():93
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1595
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():748
   ```
   Could you please take a look?
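Incidentally, the unhelpful `IllegalStateException: null` at the top of the trace is what a message-less `Preconditions.checkState(...)` produces. A minimal local stand-in (not Guava itself) shows the difference; the message text below is hypothetical:

```java
public class CheckStateSketch {
    // Minimal stand-ins for Guava's Preconditions.checkState overloads.
    static void checkState(boolean expression) {
        if (!expression) { throw new IllegalStateException(); }
    }
    static void checkState(boolean expression, Object message) {
        if (!expression) { throw new IllegalStateException(String.valueOf(message)); }
    }

    public static void main(String[] args) {
        try {
            checkState(false); // the no-message form behind the trace above
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // null
        }
        try {
            checkState(false, "batch size not yet predicted"); // hypothetical message
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // batch size not yet predicted
        }
    }
}
```

Passing a message to the precondition check would make failures like this one much easier to diagnose from logs alone.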




"Crude-but-effective" Arrow integration

2018-08-18 Thread Paul Rogers
Hi All,

Charles recently suggested why Arrow integration could be helpful. (See quote
below.) When we looked at reworking Drill's internals to use Arrow, we found
the project to be costly, with little direct benefit in terms of performance
or stability. But, as Charles points out, the real value is in data exchange,
not in changing Drill's internals.

It might be fairly simple to integrate with Arrow for input or output. Why? As
it turns out (last time I checked), the memory layout of Arrow vectors is
identical to Drill's, so it is simply a matter of reinterpreting Drill's
vectors as Arrow vectors (or vice versa), possibly passing memory ownership
somehow. (I suspect the memory ownership issue will be the fussiest part of
the whole exercise.)
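The "reinterpret, don't copy" idea can be sketched with two views over one shared buffer; the Drill/Arrow naming here is purely conceptual, since real integration would wrap each system's vector classes around shared memory:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Two readers interpret the same backing memory, with no serialization
// and no copying of values. The "Drill view" / "Arrow view" labels are
// conceptual stand-ins, not real APIs.
public class ReinterpretSketch {
    public static void main(String[] args) {
        // One shared buffer holding four little-endian 32-bit ints,
        // the layout both systems use for fixed-width vectors.
        ByteBuffer shared = ByteBuffer.allocateDirect(16).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < 4; i++) {
            shared.putInt(i * 4, i + 10);
        }

        // Two windows over the same memory; reads see identical values.
        IntBuffer drillView = shared.asIntBuffer();
        IntBuffer arrowView = shared.asIntBuffer();

        System.out.println(drillView.get(2)); // 12
        System.out.println(arrowView.get(2)); // 12
    }
}
```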


Drill and Arrow use different metadata formats. But, since they both describe 
the same in-memory layout, we can probably translate from one to the other with 
some straightforward code. Since metadata is a small part of a typical result 
set, the overhead of the metadata translation is likely negligible.
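A name-level sketch of that metadata translation, with a deliberately tiny and hypothetical mapping table; the Arrow type names are from memory and should be checked against Arrow's schema definitions:

```java
import java.util.Map;

// Hypothetical Drill-to-Arrow type-name mapping, suggesting the shape of
// the metadata translation; the entries are illustrative, not authoritative.
public class TypeNameMap {
    static final Map<String, String> DRILL_TO_ARROW = Map.of(
        "INT", "Int(32, signed)",
        "BIGINT", "Int(64, signed)",
        "FLOAT8", "FloatingPoint(DOUBLE)",
        "VARCHAR", "Utf8");

    public static void main(String[] args) {
        System.out.println(DRILL_TO_ARROW.get("VARCHAR")); // Utf8
    }
}
```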


If an Arrow client wants to consume Drill output, someone could wrap the Drill 
native Drill Client API that speaks Drill value vectors. The wrapper could 
reinterpret Drill vectors as Arrow vectors, and convert metadata.


If we want Drill to consume Arrow data, then we'd have to play the same trick 
in reverse: reinterpret Arrow vectors as Drill vectors, then convert Arrow 
metadata to Drill format.

Such integration could be built by the community. Granted, this approach is a
bit on the "crude-but-effective" side. But, if it proves valuable, then there
is justification for a next round of deeper integration.


 Charles' original comment from the discussion about project state:

(quote)
The first [suggested improvement] is the Arrow integration.  I’m not enough of 
a software engineer to understand
all the internal details here, but as I understand it, the promise of Arrow is 
that many tools
will share a common memory model and that it will be possible to transfer data 
from one tool
to the other without having to serialize/deserialize the data.  In the data 
science community
many of the major platforms, Python-pandas, R, and Spark are moving or have 
adopted Arrow.
 
Drill’s strength is the ease that it can query many different data sources and 
if Drill
were to adopt Arrow, I suspect that many people would adopt it as a part of a 
machine learning
pipeline.  Just recently, I attempted to do some data manipulation using Spark, 
and couldn’t
help but notice how difficult it was in contrast with Drill. I’m sure this is a
very complex
task, but I do think that it could be worth it in the end.

(unquote)

Thanks,
- Paul