[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011406#comment-17011406
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-572380826
 
 
   Thank you @ihuzenko and @arina-ielchiieva for your reviews. Addressed 
remaining minor issues. Squashed commits.
   
   @ihuzenko, your many suggestions made this a much better solution. It is 
almost, but not quite, clean enough that I could write code gen unit tests. 
Need to clean up that pesky `ExpressionTreeMaterializer` issue, then we'll be 
able to write such tests. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -------------------------
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.
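As a sketch of the separation the description aims at (hypothetical names; the
actual PR routes physical work through a ProjectionMaterializer.BatchBuilder
implementation, quoted later in this thread), the logical planning of
projection expressions can be isolated from the physical setup of vectors
behind a small builder interface:

{code:java}
import java.util.List;

// Hypothetical sketch: the logical planner classifies projected columns and
// delegates all physical vector setup to a small builder interface, so each
// half can be understood (and tested) on its own.
interface BatchBuilder {
  void addTransferField(String name);   // physical: straight vector transfer
  void addEvalField(String expression); // physical: code-generated evaluation
}

class ProjectionPlanner {
  private final BatchBuilder builder;

  ProjectionPlanner(BatchBuilder builder) {
    this.builder = builder;
  }

  // Logical phase only: decide, per projected column, whether it is a plain
  // transfer or an expression that needs generated evaluation code.
  void setup(List<String> columns) {
    for (String col : columns) {
      if (col.indexOf('(') < 0) {
        builder.addTransferField(col);  // plain column reference
      } else {
        builder.addEvalField(col);      // expression requiring codegen
      }
    }
  }
}
{code}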



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011402#comment-17011402
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364554593
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/TypedFieldId.java
 ##
 @@ -241,7 +241,7 @@ public TypedFieldId build() {
 secondaryFinal = finalType;
   }
 
-  MajorType actualFinalType = finalType;
+  //MajorType actualFinalType = finalType;
 
 Review comment:
   Not sure what the comments here are about, so I thought I'd leave them; 
I'm just commenting out an unused variable.
 



[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011399#comment-17011399
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364554333
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectBatchBuilder.java
 ##
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.expr.ValueVectorReadExpression;
+import org.apache.drill.exec.expr.ValueVectorWriteExpression;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.record.TypedFieldId;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.vector.FixedWidthVector;
+import org.apache.drill.exec.vector.SchemaChangeCallBack;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+
+public class ProjectBatchBuilder implements 
ProjectionMaterializer.BatchBuilder {
+  private final ProjectRecordBatch projectBatch;
+  private final VectorContainer container;
+  private final SchemaChangeCallBack callBack;
+  private final RecordBatch incomingBatch;
+  final List<TransferPair> transfers = new ArrayList<>();
 
 Review comment:
   Made private and added a getter, but left the initializer with the field: 
this is a final member, and we often initialize such objects at the 
declaration as a way of saying that the field does not depend on constructor 
input.
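A minimal sketch of the idiom being described (hypothetical class and names, 
not the actual ProjectBatchBuilder code):

{code:java}
import java.util.ArrayList;
import java.util.List;

class TransferHolder {
  // Final member initialized at its declaration: the value does not depend
  // on any constructor input, and a getter exposes it to callers.
  private final List<String> transfers = new ArrayList<>();

  public List<String> getTransfers() {
    return transfers;
  }
}
{code}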
 



[jira] [Commented] (DRILL-7502) Incorrect/invalid codegen for typeof() with UNION

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011396#comment-17011396
 ] 

ASF GitHub Bot commented on DRILL-7502:
---

paul-rogers commented on issue #1945: DRILL-7502: Invalid codegen for typeof() 
with UNION
URL: https://github.com/apache/drill/pull/1945#issuecomment-572377107
 
 
   @vvysotskyi, thanks much for the review. Addressed the one comment I missed 
last time. Squashed commits. 
 



> Incorrect/invalid codegen for typeof() with UNION
> -------------------------------------------------
>
> Key: DRILL-7502
> URL: https://issues.apache.org/jira/browse/DRILL-7502
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> The {{typeof()}} function is defined as follows:
> {code:java}
>   @FunctionTemplate(names = {"typeOf"},
>   scope = FunctionTemplate.FunctionScope.SIMPLE,
>   nulls = NullHandling.INTERNAL)
>   public static class GetType implements DrillSimpleFunc {
> @Param
> FieldReader input;
> @Output
> VarCharHolder out;
> @Inject
> DrillBuf buf;
> @Override
> public void setup() {}
> @Override
> public void eval() {
>   String typeName = input.getTypeString();
>   byte[] type = typeName.getBytes();
>   buf = buf.reallocIfNeeded(type.length);
>   buf.setBytes(0, type);
>   out.buffer = buf;
>   out.start = 0;
>   out.end = type.length;
> }
>   }
> {code}
> Note that the {{input}} field is defined as {{FieldReader}} which has a 
> method called {{getTypeString()}}. As a result, the code works fine in all 
> existing tests in {{TestTypeFns}}.
> I tried to add a function to use {{typeof()}} on a column of type {{UNION}}. 
> When I did, the query failed with a compile error in generated code:
> {noformat}
> SYSTEM ERROR: CompileException: Line 42, Column 43: 
>   A method named "getTypeString" is not declared in any enclosing class nor 
> any supertype, nor through a static import
> {noformat}
> The stack trace shows the generated code; note that the type of {{input}} 
> changes from a reader to a holder, making the generated code invalid:
> {code:java}
> public class ProjectorGen0 {
> DrillBuf work0;
> UnionVector vv1;
> VarCharVector vv6;
> DrillBuf work9;
> VarCharVector vv11;
> DrillBuf work14;
> VarCharVector vv16;
> public void doEval(int inIndex, int outIndex)
> throws SchemaChangeException
> {
> {
> UnionHolder out4 = new UnionHolder();
> {
> out4 .isSet = vv1 .getAccessor().isSet((inIndex));
> if (out4 .isSet == 1) {
> vv1 .getAccessor().get((inIndex), out4);
> }
> }
> // start of eval portion of typeOf function. //
> VarCharHolder out5 = new VarCharHolder();
> {
> final VarCharHolder out = new VarCharHolder();
> UnionHolder input = out4;
> DrillBuf buf = work0;
> UnionFunctions$GetType_eval:
> {
> String typeName = input.getTypeString();
> byte[] type = typeName.getBytes();
> buf = buf.reallocIfNeeded(type.length);
> buf.setBytes(0, type);
> out.buffer = buf;
> out.start = 0;
> out.end = type.length;
> }
> {code}
> By contrast, here is the generated code for one of the existing 
> {{TestTypeFns}} tests where things work:
> {code:java}
> public class ProjectorGen0
> extends ProjectorTemplate
> {
> DrillBuf work0;
> NullableBigIntVector vv1;
> VarCharVector vv7;
> public ProjectorGen0() {
> try {
> __DRILL_INIT__();
> } catch (SchemaChangeException e) {
> throw new UnsupportedOperationException(e);
> }
> }
> public void doEval(int inIndex, int outIndex)
> throws SchemaChangeException
> {
> {
>..
> // start of eval portion of typeOf function. //
> VarCharHolder out6 = new VarCharHolder();
> {
> final VarCharHolder out = new VarCharHolder();
> FieldReader input = new NullableIntHolderReaderImpl(out5);
> DrillBuf buf = work0;
> UnionFunctions$GetType_eval:
> {
> String typeName = input.getTypeString();
> byte[] type = typeName.getBytes();

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011341#comment-17011341
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on issue #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#issuecomment-572223262
 
 
   Hi @arina-ielchiieva 
   I fixed the issues you raised.   I went through all the warnings in IntelliJ 
and addressed all the relevant ones.  Just FYI, I did do this earlier in the 
review process, but it recommended some changes that didn't seem to pass 
review.  For instance, changing a lot of the functions to package private.  In 
any event, I removed almost all of the warnings.  (There are two remaining in 
the batch reader relating to reworking a for loop, but those did not work well, 
so I'm leaving them.) 
   
   I'm not sure how to write a generic matrix transpose method for Java 
primitives.  It's easy enough to write for each type, but I don't think 
there's a way to do it in a generic manner.  If anyone can show me how to do 
that, I'm all ears.
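For reference, a minimal per-type sketch (an editorial illustration, not code
from the PR) shows the limitation: Java type parameters cannot range over
primitives, so each of float, double, int, and long ends up needing its own
overload, like this float version:

{code:java}
// Single-type transpose of a rectangular matrix; Java generics cannot
// abstract over primitive element types, hence one overload per type.
static float[][] transpose(float[][] src) {
  float[][] dest = new float[src[0].length][src.length];
  for (int row = 0; row < src.length; row++) {
    for (int col = 0; col < src[row].length; col++) {
      dest[col][row] = src[row][col];
    }
  }
  return dest;
}
{code}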
   
   Thank you for your help and patience with this.  As you know, this isn't my 
full time job and this particular plugin is really complicated.  
 



> Format Plugin for HDF5
> ----------------------
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
> defaultPath:
> h3. Example Configuration
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.
> {{"hdf5": {
>   "type": "hdf5",
>   "extensions": [
> "h5"
>   ],
>   "defaultPath": null
> }}}
> h3. Usage
> Since HDF5 can be viewed as a file system within a file, a single file can 
> contain many datasets. For instance, if you have a simple HDF5 file, a star 
> query will produce the following result:
> {{apache drill> select * from dfs.test.`dset.h5`;
> +-------+-----------+-----------+---------------------------------------------------------------------------+
> | path  | data_type | file_name | int_data                                                                    |
> +-------+-----------+-----------+---------------------------------------------------------------------------+
> | /dset | DATASET   | dset.h5   | [[1,2,3,4,5,6],[7,8,9,10,11,12],[13,14,15,16,17,18],[19,20,21,22,23,24]]   |
> +-------+-----------+-----------+---------------------------------------------------------------------------+}}
> The actual data in this file is mapped to a column called int_data. In order 
> to effectively access the data, you should use Drill's FLATTEN() function on 
> the int_data column, which produces the following result.
> {{apache drill> select flatten(int_data) as int_data from dfs.test.`dset.h5`;
> +---------------------+
> |      int_data       |
> +---------------------+
> | [1,2,3,4,5,6]       |
> | [7,8,9,10,11,12]    |
> | [13,14,15,16,17,18] |
> | [19,20,21,22,23,24] |
> +---------------------+}}
> Once you have the data in this form, you can access it similarly to how you 
> might access nested data in JSON or other files.
> {{apache drill> SELECT int_data[0] as col_0,
> . .semicolon> int_data[1] as col_1,
> . .semicolon> int_data[2] as col_2
> . .semicolon> FROM ( SELECT flatten(int_data) AS int_data
> . . . . . .)> FROM dfs.test.`dset.h5`
> . . . . . .)> );
> +-------+-------+-------+
> | col_0 | col_1 | col_2 |
> +-------+-------+-------+
> | 1     | 2     | 3     |
> | 7     | 8     | 9     |
> | 13    | 14    | 15    |
> | 19    | 20    | 21    |
> +-------+-------+-------+}}
> Alternatively, a better way to query the actual data in an HDF5 file is to 
> use the defaultPath field in your query. If the defaultPath field is defined 
> in the query, or via the plugin configuration, Drill will only return the 
> data, rather than the file metadata.
> ** Note: Once you have determined which data set you are querying, it is 
> advisable to use this method to query HDF5 data. **
> You can set the defaultPath variable in either the plugin configuration, or 
> at query time using the table() function as shown in the example below:
> {{SELECT * 
> FROM table(dfs.test.`dset.h5` (type => 'hdf5', defaultPath => '/dset'))}}
> This query will return the result below:
> {{apache drill> SELECT * FROM table(dfs.test.`dset.h5` (type => 'hdf5', 
> defaultPath => '/dset'));

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011336#comment-17011336
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364537105
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##
 @@ -0,0 +1,1164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import ch.systemsx.cisd.hdf5.HDF5CompoundMemberInformation;
+import ch.systemsx.cisd.hdf5.HDF5DataSetInformation;
+import ch.systemsx.cisd.hdf5.HDF5FactoryProvider;
+import ch.systemsx.cisd.hdf5.HDF5LinkInformation;
+import ch.systemsx.cisd.hdf5.IHDF5Factory;
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.commons.io.IOUtils;
+import org.apache.drill.common.config.DrillConfig;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.ExecConstants;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MapBuilder;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.hdf5.writers.HDF5DataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5DoubleDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5EnumDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5FloatDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5IntDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5LongDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5MapDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5StringDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5TimestampDataWriter;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+
+import org.apache.drill.shaded.guava.com.google.common.io.Files;
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.file.StandardCopyOption;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.BitSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+public class HDF5BatchReader implements ManagedReader<FileSchemaNegotiator> {
+  private static final Logger logger = 
LoggerFactory.getLogger(HDF5BatchReader.class);
+
+  private static final String PATH_COLUMN_NAME = "path";
+
+  private static final String DATA_TYPE_COLUMN_NAME = "data_type";
+
+  private static final String FILE_NAME_COLUMN_NAME = "file_name";
+
+  private static final String INT_COLUMN_PREFIX = "int_col_";
+
+  private static final String LONG_COLUMN_PREFIX = "long_col_";
+
+  private static final String FLOAT_COLUMN_PREFIX = "float_col_";
+
+  private static final String DOUBLE_COLUMN_PREFIX = "double_col_";
+
+  private static final String INT_COLUMN_NAME = "int_data";
+
+  private static final String FLOAT_COLUMN_NAME = "float_data";
+
+  private static final String DOUBLE_COLUMN_NAME = "double_data";
+
+  private static final String LONG_COLUMN_NAME = "long_data";
+
+  private FileSplit split;
+
+  private IHDF5Reader hdf5Reader;
+
+  private File infile;
+
+  private BufferedReader reader;
+
+  private RowSetLoader rowWriter;
+
+  private Iterator metadataIterator;
+
+  

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011334#comment-17011334
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on issue #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#issuecomment-572354784
 
 
   @arina-ielchiieva 
   Thanks for your help and this is ready for the next round. 
 


[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011333#comment-17011333
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on issue #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#issuecomment-572223262
 
 
   Hi @arina-ielchiieva 
   I fixed most of the issues you raised.   I went through all the warnings in 
IntelliJ and addressed all the relevant ones.  Just FYI, I did do this earlier 
in the review process, but it recommended some changes that didn't seem to pass 
review.  For instance, changing a lot of the functions to package private.  In 
any event, I removed almost all of the warnings.  (There are two remaining in 
the batch reader relating to reworking a for loop.) 
   
   There are two issues which I still need to look at from this review:
   1.  A generic matrix transpose method for primitives
   2.  Moving the `path` assignment to another method.
   
   Thank you for your help and patience with this.  As you know, this isn't my 
full time job and this particular plugin is really complicated. 
 


[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011329#comment-17011329
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364535478
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5FloatDataWriter.java
 ##
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5.writers;
+
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.store.hdf5.HDF5Utils;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+
+import java.util.List;
+
+public class HDF5FloatDataWriter extends HDF5DataWriter {
+
+  private final float[] data;
+
+  private final ScalarWriter rowWriter;
+
+  // This constructor is used when the data is a 1D column.  The column is 
inferred from the datapath
+  public HDF5FloatDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, 
String datapath) {
+super(reader, columnWriter, datapath);
+data = reader.readFloatArray(datapath);
+fieldName = HDF5Utils.getNameFromPath(datapath);
+rowWriter = makeWriter(columnWriter, fieldName, 
TypeProtos.MinorType.FLOAT8, TypeProtos.DataMode.OPTIONAL);
+  }
+
+  // This constructor is used when the data is part of a 2D array.  In this 
case the column name is provided in the constructor
+  public HDF5FloatDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, 
String datapath, String fieldName, int currentColumn) {
+super(reader, columnWriter, datapath, fieldName, currentColumn);
+// Get dimensions
+long[] dimensions = 
reader.object().getDataSetInformation(datapath).getDimensions();
+float[][] tempData;
+if (dimensions.length == 2) {
+  tempData = transpose(reader.readFloatMatrix(datapath));
+} else {
+  tempData = transpose(reader.float32().readMDArray(datapath).toMatrix());
+}
+data = tempData[currentColumn];
+rowWriter = makeWriter(columnWriter, fieldName, 
TypeProtos.MinorType.FLOAT8, TypeProtos.DataMode.OPTIONAL);
+  }
+
+  public HDF5FloatDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, 
String fieldName, List tempListData) {
+super(reader, columnWriter, null);
+this.fieldName = fieldName;
+data = new float[tempListData.size()];
+for (int i = 0; i < tempListData.size(); i++) {
+  data[i] = (Float)tempListData.get(i);
+}
+rowWriter = makeWriter(columnWriter, fieldName, 
TypeProtos.MinorType.FLOAT8, TypeProtos.DataMode.OPTIONAL);
+  }
+
+
+  public boolean write() {
+if (counter > data.length) {
+  return false;
+} else {
+  rowWriter.setDouble(data[counter++]);
+  return true;
+}
+  }
+
+  public boolean hasNext() {
+return counter < data.length;
+  }
+
+  private float[][] transpose (float[][] array) {
 
 Review comment:
   Added javadoc.  I also renamed the variables so that this is clearer.  
Basically, this is a computer science 101 problem. ;-)
 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011327#comment-17011327
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364535389
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5DoubleDataWriter.java
 ##
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5.writers;
+
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.store.hdf5.HDF5Utils;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+
+import java.util.List;
+
+public class HDF5DoubleDataWriter extends HDF5DataWriter {
+
+  private final double[] data;
+
+  private final ScalarWriter rowWriter;
+
+  // This constructor is used when the data is a 1D column.  The column is 
inferred from the datapath
+  public HDF5DoubleDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, 
String datapath) {
+super(reader, columnWriter, datapath);
+data = reader.readDoubleArray(datapath);
+
+fieldName = HDF5Utils.getNameFromPath(datapath);
+rowWriter = makeWriter(columnWriter, fieldName, 
TypeProtos.MinorType.FLOAT8, TypeProtos.DataMode.OPTIONAL);
+  }
+
+  // This constructor is used when the data is part of a 2D array.  In this 
case the column name is provided in the constructor
+  public HDF5DoubleDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, 
String datapath, String fieldName, int currentColumn) {
+super(reader, columnWriter, datapath, fieldName, currentColumn);
+// Get dimensions
+long[] dimensions = 
reader.object().getDataSetInformation(datapath).getDimensions();
+double[][] tempData;
+if (dimensions.length == 2) {
+  tempData = transpose(reader.readDoubleMatrix(datapath));
+} else {
+  tempData = transpose(reader.float64().readMDArray(datapath).toMatrix());
+}
+data = tempData[currentColumn];
+rowWriter = makeWriter(columnWriter, fieldName, 
TypeProtos.MinorType.FLOAT8, TypeProtos.DataMode.OPTIONAL);
+  }
+
+  public HDF5DoubleDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, 
String fieldName, List tempListData) {
+super(reader, columnWriter, null);
+this.fieldName = fieldName;
+data = new double[tempListData.size()];
+for (int i = 0; i < tempListData.size(); i++) {
+  data[i] = (Double)tempListData.get(i);
+}
+rowWriter = makeWriter(columnWriter, fieldName, 
TypeProtos.MinorType.FLOAT8, TypeProtos.DataMode.OPTIONAL);
+  }
+
+
+  public boolean write() {
+if (counter > data.length) {
+  return false;
+} else {
+  rowWriter.setDouble(data[counter++]);
+  return true;
+}
+  }
+
+  public boolean hasNext() {
+return counter < data.length;
+  }
+
+  private double[][] transpose(double[][] array) {
 
 Review comment:
   I couldn't figure out any way to do this generically; however, I'm not an 
expert on some of Java's tricks for generic data types. 
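One possible trick, sketched here as an editorial suggestion rather than code
from the PR, is java.lang.reflect.Array, which handles any rectangular 2D
array (primitive or boxed) at the cost of type safety and speed:

{code:java}
import java.lang.reflect.Array;

// Reflection-based transpose sketch; callers cast the result, e.g.
//   double[][] t = (double[][]) transpose(matrix);
static Object transpose(Object matrix) {
  int rows = Array.getLength(matrix);
  Object firstRow = Array.get(matrix, 0);
  int cols = Array.getLength(firstRow);
  Class<?> elementType = firstRow.getClass().getComponentType();
  Object result = Array.newInstance(elementType, cols, rows);
  for (int r = 0; r < rows; r++) {
    Object row = Array.get(matrix, r);
    for (int c = 0; c < cols; c++) {
      // Array.get boxes primitive elements; Array.set unboxes them again.
      Array.set(Array.get(result, c), r, Array.get(row, c));
    }
  }
  return result;
}
{code}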
 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011305#comment-17011305
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364530041
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##
 @@ -0,0 +1,1122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import ch.systemsx.cisd.hdf5.HDF5CompoundMemberInformation;
+import ch.systemsx.cisd.hdf5.HDF5DataSetInformation;
+import ch.systemsx.cisd.hdf5.HDF5FactoryProvider;
+import ch.systemsx.cisd.hdf5.HDF5LinkInformation;
+import ch.systemsx.cisd.hdf5.IHDF5Factory;
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.commons.io.IOUtils;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MapBuilder;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.hdf5.writers.HDF5DataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5DoubleDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5EnumDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5FloatDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5IntDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5LongDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5MapDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5StringDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5TimestampDataWriter;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.file.StandardCopyOption;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.BitSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+public class HDF5BatchReader implements ManagedReader<FileSchemaNegotiator> {
+  private static final Logger logger = 
LoggerFactory.getLogger(HDF5BatchReader.class);
+
+  private static final String PATH_COLUMN_NAME = "path";
+
+  private static final String DATA_TYPE_COLUMN_NAME = "data_type";
+
+  private static final String FILE_NAME_COLUMN_NAME = "file_name";
+
+  private static final String INT_COLUMN_PREFIX = "int_col_";
+
+  private static final String LONG_COLUMN_PREFIX = "long_col_";
+
+  private static final String FLOAT_COLUMN_PREFIX = "float_col_";
+
+  private static final String DOUBLE_COLUMN_PREFIX = "double_col_";
+
+  private static final String INT_COLUMN_NAME = "int_data";
+
+  private static final String FLOAT_COLUMN_NAME = "float_data";
+
+  private static final String DOUBLE_COLUMN_NAME = "double_data";
+
+  private static final String LONG_COLUMN_NAME = "long_data";
+
+  private final HDF5ReaderConfig readerConfig;
+
+  private final List<HDF5DataWriter> dataWriters;
+
+  private FileSplit split;
+
+  private IHDF5Reader hdf5Reader;
+
+  private File inFile;
+
+  private BufferedReader reader;
+
+  private RowSetLoader rowWriter;
+
+  private Iterator metadataIterator;
+
+  private ScalarWriter pathWriter;
+
+  private ScalarWriter dataTypeWriter;
+

[jira] [Commented] (DRILL-7437) Storage Plugin for Generic HTTP REST API

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010999#comment-17010999
 ] 

ASF GitHub Bot commented on DRILL-7437:
---

cgivre commented on pull request #1892: DRILL-7437: Storage Plugin for Generic 
HTTP REST API
URL: https://github.com/apache/drill/pull/1892#discussion_r364427520
 
 

 ##
 File path: 
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/util/SimpleHttp.java
 ##
 @@ -0,0 +1,235 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.http.util;
+
+import okhttp3.Cache;
+import okhttp3.Credentials;
+import okhttp3.FormBody;
+import okhttp3.Interceptor;
+import okhttp3.OkHttpClient;
+import okhttp3.OkHttpClient.Builder;
+import okhttp3.Request;
+import okhttp3.Response;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.store.http.HttpAPIConfig;
+import org.apache.drill.exec.store.http.HttpStoragePluginConfig;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.Map;
+import java.util.concurrent.TimeUnit;
+import java.util.regex.Pattern;
+
+
+/**
+ * This class performs the actual HTTP requests for the HTTP Storage Plugin. 
The core method is the getInputStream()
+ * method which accepts a url and opens an InputStream with that URL's 
contents.
+ */
+public class SimpleHttp {
+  private static final Logger logger = 
LoggerFactory.getLogger(SimpleHttp.class);
+
+  private final OkHttpClient client;
+
+  private final HttpStoragePluginConfig config;
+
+  private final FragmentContext context;
+
+  private final HttpAPIConfig apiConfig;
+
+  public SimpleHttp(HttpStoragePluginConfig config, FragmentContext context, 
String connectionName) {
+this.config = config;
+this.context = context;
+this.apiConfig = config.connections().get(connectionName);
+client = setupServer();
+  }
+
+
+
+  public InputStream getInputStream(String urlStr) {
+Request.Builder requestBuilder;
+
+// The configuration does not allow for any other request types other than 
POST and GET.
+if (apiConfig.method().equals("get")) {
 
 Review comment:
   The HttpAPIConfig class validates this field and converts it to lowercase. 
The plugin accepts only `get` and `post` requests; if the field is `null` or 
some other unsupported method, the plugin falls back to `get`.
   
   
https://github.com/apache/drill/blob/3190a0d43cf7959fa6f17f75e966dc1a924a6c1b/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpAPIConfig.java#L56-L60
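A minimal sketch of the normalization being described (a hypothetical helper, 
not the actual HttpAPIConfig code):

{code:java}
// Hypothetical helper: normalize the configured HTTP method to lowercase,
// falling back to "get" for null or unsupported values, as the config class
// is described to do.
private static String normalizeMethod(String method) {
  if (method == null) {
    return "get";
  }
  String m = method.trim().toLowerCase();
  return (m.equals("get") || m.equals("post")) ? m : "get";
}
{code}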
 



> Storage Plugin for Generic HTTP REST API
> ----------------------------------------
>
> Key: DRILL-7437
> URL: https://issues.apache.org/jira/browse/DRILL-7437
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Minor
> Fix For: Future
>
>
> In many data analytic situations there is a need to obtain reference data 
> which is volatile or hosted on a service with a REST API.  
> For instance, consider the case of a financial dataset which you want to run 
> a currency conversion.  Or in the security arena, an organization might have 
> a service that returns network information about an IT asset.  The goal is 
> to enable Drill to quickly incorporate external data that is only accessible 
> via REST API. 
> This plugin is not intended to be a substitute for dedicated storage plugins 
> with systems that use a REST API, such as Apache Solr or ElasticSearch.  
> This plugin is based on several projects that 

[jira] [Commented] (DRILL-7437) Storage Plugin for Generic HTTP REST API

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010998#comment-17010998
 ] 

ASF GitHub Bot commented on DRILL-7437:
---

cgivre commented on pull request #1892: DRILL-7437: Storage Plugin for Generic 
HTTP REST API
URL: https://github.com/apache/drill/pull/1892#discussion_r364428141
 
 

 ##
 File path: 
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/util/SimpleHttp.java
 ##
 @@ -0,0 +1,235 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.http.util;
+
+import okhttp3.Cache;
+import okhttp3.Credentials;
+import okhttp3.FormBody;
+import okhttp3.Interceptor;
+import okhttp3.OkHttpClient;
+import okhttp3.OkHttpClient.Builder;
+import okhttp3.Request;
+import okhttp3.Response;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.store.http.HttpAPIConfig;
+import org.apache.drill.exec.store.http.HttpStoragePluginConfig;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.Map;
+import java.util.concurrent.TimeUnit;
+import java.util.regex.Pattern;
+
+
+/**
+ * This class performs the actual HTTP requests for the HTTP Storage Plugin. 
The core method is the getInputStream()
+ * method which accepts a url and opens an InputStream with that URL's 
contents.
+ */
+public class SimpleHttp {
+  private static final Logger logger = 
LoggerFactory.getLogger(SimpleHttp.class);
+
+  private final OkHttpClient client;
+
+  private final HttpStoragePluginConfig config;
+
+  private final FragmentContext context;
+
+  private final HttpAPIConfig apiConfig;
+
+  public SimpleHttp(HttpStoragePluginConfig config, FragmentContext context, 
String connectionName) {
+this.config = config;
+this.context = context;
+this.apiConfig = config.connections().get(connectionName);
+client = setupServer();
+  }
+
+
+
+  public InputStream getInputStream(String urlStr) {
+Request.Builder requestBuilder;
+
+// The configuration does not allow for any other request types other than 
POST and GET.
+if (apiConfig.method().equals("get")) {
+  // Handle GET requests
+  requestBuilder = new Request.Builder().url(urlStr);
+} else {
+  // Handle POST requests
+  FormBody.Builder formBodyBuilder = buildPostBody();
+  requestBuilder = new Request.Builder()
+.url(urlStr)
+.post(formBodyBuilder.build());
+}
+
+// Add headers to request
+if (apiConfig.headers() != null) {
+  for (Map.Entry<String, String> entry : apiConfig.headers().entrySet()) {
+String key = entry.getKey();
+String value = entry.getValue();
+requestBuilder.addHeader(key, value);
+  }
+}
+
+// Build the request object
+Request request = requestBuilder.build();
+logger.debug("Headers: {}", request.headers());
+
+try {
+  // Execute the request
+  Response response = client
+.newCall(request)
+.execute();
+  logger.debug(response.toString());
+
+  // If the request is unsuccessful, throw a UserException
+  if (!response.isSuccessful()) {
+throw UserException.dataReadError()
+  .message("Error retrieving data: " + response.code() + " " + 
response.message())
+  .addContext("Response code: ", response.code())
+  .build(logger);
+  }
+  logger.debug("HTTP Request for {} successful.", urlStr);
+  logger.debug("Response Headers: {} ", response.headers().toString());
+
+  // Return the InputStream of the response
+  return response.body().byteStream();
+} catch (IOException e) {
+  throw UserException.functionError()
+.message("Error retrieving data")
+.addContext(e.getMessage())
+.build(logger);
+}
+  }
+
+  /**
+   * Function configures the OkHTTP3 server object with configuration info 
from the user.
+   *
+   * @return OkHttpClient configured server
+   */
+  private 

[jira] [Commented] (DRILL-7437) Storage Plugin for Generic HTTP REST API

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010997#comment-17010997
 ] 

ASF GitHub Bot commented on DRILL-7437:
---

cgivre commented on pull request #1892: DRILL-7437: Storage Plugin for Generic 
HTTP REST API
URL: https://github.com/apache/drill/pull/1892#discussion_r364427520
 
 

 ##
 File path: 
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/util/SimpleHttp.java
 ##
 @@ -0,0 +1,235 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.http.util;
+
+import okhttp3.Cache;
+import okhttp3.Credentials;
+import okhttp3.FormBody;
+import okhttp3.Interceptor;
+import okhttp3.OkHttpClient;
+import okhttp3.OkHttpClient.Builder;
+import okhttp3.Request;
+import okhttp3.Response;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.store.http.HttpAPIConfig;
+import org.apache.drill.exec.store.http.HttpStoragePluginConfig;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.Map;
+import java.util.concurrent.TimeUnit;
+import java.util.regex.Pattern;
+
+
+/**
+ * This class performs the actual HTTP requests for the HTTP Storage Plugin. 
The core method is the getInputStream()
+ * method which accepts a url and opens an InputStream with that URL's 
contents.
+ */
+public class SimpleHttp {
+  private static final Logger logger = 
LoggerFactory.getLogger(SimpleHttp.class);
+
+  private final OkHttpClient client;
+
+  private final HttpStoragePluginConfig config;
+
+  private final FragmentContext context;
+
+  private final HttpAPIConfig apiConfig;
+
+  public SimpleHttp(HttpStoragePluginConfig config, FragmentContext context, 
String connectionName) {
+this.config = config;
+this.context = context;
+this.apiConfig = config.connections().get(connectionName);
+client = setupServer();
+  }
+
+  public InputStream getInputStream(String urlStr) {
+Request.Builder requestBuilder;
+
+// The configuration allows no request types other than GET and POST.
+if (apiConfig.method().equals("get")) {
 
 Review comment:
   In the HttpAPIConfig class, this field is validated and converted to 
lowercase. The plugin accepts only `get` and `post` requests.  If the field is 
`null` or some other unsupported method, the plugin falls back to `get`.
   
   [1]: 
https://github.com/apache/drill/blob/3190a0d43cf7959fa6f17f75e966dc1a924a6c1b/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpAPIConfig.java#L56-L60
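   
   As a minimal sketch of that fallback (hypothetical helper method, not the 
actual HttpAPIConfig source):
   
   private static String normalizeMethod(String method) {
     if (method == null) {
       return "get";  // default when nothing is configured
     }
     String m = method.toLowerCase();
     // Only GET and POST are supported; anything else falls back to GET.
     return m.equals("post") ? "post" : "get";
   }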
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Storage Plugin for Generic HTTP REST API
> 
>
> Key: DRILL-7437
> URL: https://issues.apache.org/jira/browse/DRILL-7437
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Minor
> Fix For: Future
>
>
> In many data analytic situations there is a need to obtain reference data 
> that is volatile or hosted on a service with a REST API.  
> For instance, consider the case of a financial dataset on which you want to 
> run a currency conversion.  Or in the security arena, an organization might 
> have a service that returns network information about an IT asset.  The goal 
> is to enable Drill to quickly incorporate external data that is only 
> accessible via a REST API. 
> This plugin is not intended to be a substitute for dedicated storage plugins 
> for systems that use a REST API, such as Apache Solr or ElasticSearch.  
> This plugin is based on several projects 

[jira] [Commented] (DRILL-7437) Storage Plugin for Generic HTTP REST API

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010994#comment-17010994
 ] 

ASF GitHub Bot commented on DRILL-7437:
---

cgivre commented on pull request #1892: DRILL-7437: Storage Plugin for Generic 
HTTP REST API
URL: https://github.com/apache/drill/pull/1892#discussion_r364425361
 
 

 ##
 File path: 
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpSubScan.java
 ##
 @@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.http;
+
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Objects;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.physical.base.AbstractBase;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.physical.base.PhysicalVisitor;
+import org.apache.drill.exec.physical.base.SubScan;
+import org.apache.drill.exec.proto.UserBitShared.CoreOperatorType;
+import org.apache.drill.shaded.guava.com.google.common.base.MoreObjects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableSet;
+
+@JsonTypeName("http-sub-scan")
+public class HttpSubScan extends AbstractBase implements SubScan {
+
+  private final HttpScanSpec tableSpec;
+  private final HttpStoragePluginConfig config;
+  private final List<SchemaPath> columns;
+
+  @JsonCreator
+  public HttpSubScan(
+@JsonProperty("config") HttpStoragePluginConfig config,
+@JsonProperty("tableSpec") HttpScanSpec tableSpec,
+@JsonProperty("columns") List columns) {
+super("user-if-needed");
+this.config = config;
+this.tableSpec = tableSpec;
+this.columns = columns;
+  }
+
+  @JsonProperty("tableSpec")
+  public HttpScanSpec tableSpec() {
+return tableSpec;
+  }
+
+  @JsonProperty("columns")
+  public List<SchemaPath> columns() {
+return columns;
+  }
+
+  @JsonProperty("config")
+  public HttpStoragePluginConfig config() {
+return config;
+  }
+
+  @JsonIgnore
+  public String getURL() {
+return tableSpec.getURL();
+  }
+
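+  // Builds the full request URL: the configured connection's base URL with
+  // the table name appended.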
+  @JsonIgnore
+  public String getFullURL() {
+String selectedConnection = tableSpec.database();
+String url = config.connections().get(selectedConnection).url();
+return url + tableSpec.tableName();
+  }
+
+  @Override
+  public <T, X, E extends Throwable> T accept(
+      PhysicalVisitor<T, X, E> physicalVisitor, X value) throws E {
+return physicalVisitor.visitSubScan(this, value);
+  }
 
 Review comment:
   +1
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Storage Plugin for Generic HTTP REST API
> 
>
> Key: DRILL-7437
> URL: https://issues.apache.org/jira/browse/DRILL-7437
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Minor
> Fix For: Future
>
>
> In many data analytic situations there is a need to obtain reference data 
> that is volatile or hosted on a service with a REST API.  
> For instance, consider the case of a financial dataset on which you want to 
> run a currency conversion.  Or in the security arena, an organization might 
> have a service that returns network information about an IT asset.  The goal 
> is to enable Drill to quickly incorporate external data that is only 
> accessible via a REST API. 
> This plugin is not intended to be a substitute for dedicated storage plugins 
> for systems that use a REST API, such as Apache Solr or ElasticSearch.  
> This plugin is 

[jira] [Commented] (DRILL-7437) Storage Plugin for Generic HTTP REST API

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010992#comment-17010992
 ] 

ASF GitHub Bot commented on DRILL-7437:
---

cgivre commented on pull request #1892: DRILL-7437: Storage Plugin for Generic 
HTTP REST API
URL: https://github.com/apache/drill/pull/1892#discussion_r364424899
 
 

 ##
 File path: 
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpGroupScan.java
 ##
 @@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.http;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.Objects;
+
+import com.fasterxml.jackson.annotation.JacksonInject;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.expression.SchemaPath;
+
+import org.apache.drill.exec.physical.base.AbstractGroupScan;
+import org.apache.drill.exec.physical.base.GroupScan;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.physical.base.ScanStats;
+import org.apache.drill.exec.physical.base.ScanStats.GroupScanProperty;
+import org.apache.drill.exec.physical.base.SubScan;
+import org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.shaded.guava.com.google.common.base.MoreObjects;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@JsonTypeName("http-scan")
+public class HttpGroupScan extends AbstractGroupScan {
+  private static final Logger logger = 
LoggerFactory.getLogger(HttpGroupScan.class);
+
+  private List<SchemaPath> columns;
+  private final HttpScanSpec httpScanSpec;
+  private final HttpStoragePluginConfig config;
+
+  public HttpGroupScan (
+HttpStoragePluginConfig config,
+HttpScanSpec scanSpec,
+List<SchemaPath> columns
+  ) {
+super("no-user");
+this.config = config;
+this.httpScanSpec = scanSpec;
+this.columns = columns == null || columns.size() == 0 ? ALL_COLUMNS : 
columns;
+  }
+
+  public HttpGroupScan(HttpGroupScan that) {
+super(that);
+config = that.config();
+httpScanSpec = that.httpScanSpec();
+columns = that.getColumns();
+  }
+
+  @JsonCreator
+  public HttpGroupScan(
+@JsonProperty("config") HttpStoragePluginConfig config,
+@JsonProperty("columns") List columns,
+@JsonProperty("httpScanSpec") HttpScanSpec httpScanSpec,
+@JacksonInject StoragePluginRegistry engineRegistry
+  ) {
+super("no-user");
+this.config = config;
+this.columns = columns;
+this.httpScanSpec = httpScanSpec;
+  }
+
+  @JsonProperty("config")
+  public HttpStoragePluginConfig config() { return config; }
+
+  @JsonProperty("columns")
+  public List<SchemaPath> columns() { return columns; }
+
+  @JsonProperty("httpScanSpec")
+  public HttpScanSpec httpScanSpec() { return httpScanSpec; }
+
+  @Override
+  public void applyAssignments(List<DrillbitEndpoint> endpoints) {
+logger.debug("HttpGroupScan applyAssignments");
+  }
+
+  @Override
+  @JsonIgnore
+  public int getMaxParallelizationWidth() {
+return 0;
+  }
+
+  @Override
+  public boolean canPushdownProjects(List<SchemaPath> columns) {
+return true;
+  }
+
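+  // The scan is not parallelized (max width 0), so every minor fragment
+  // receives the same sub-scan.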
+  @Override
+  public SubScan getSpecificScan(int minorFragmentId) {
+logger.debug("HttpGroupScan getSpecificScan");
+return new HttpSubScan(config, httpScanSpec, columns);
+  }
+
+  @Override
+  public GroupScan clone(List<SchemaPath> columns) {
+logger.debug("HttpGroupScan clone {}", columns);
+HttpGroupScan newScan = new HttpGroupScan(this);
+newScan.columns = columns;
 
 Review comment:
   I hear what you're saying, but it still needs the `scanSpec`.  Where should 
that come from?  
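   
   As a sketch of one option (not code from this PR): take the scanSpec from 
the scan being cloned, via a copy constructor that substitutes only the 
columns:
   
   // Sketch only: scanSpec and config come from the existing scan (this);
   // only the projected columns are replaced.
   private HttpGroupScan(HttpGroupScan that, List<SchemaPath> columns) {
     super(that);
     this.config = that.config;
     this.httpScanSpec = that.httpScanSpec;
     this.columns = columns;
   }
   
   @Override
   public GroupScan clone(List<SchemaPath> columns) {
     return new HttpGroupScan(this, columns);
   }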
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For 

[jira] [Updated] (DRILL-7520) Cannot connect to Drill with PLAIN authentication enabled using JDBC client (mapr profile)

2020-01-08 Thread Vova Vysotskyi (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Vysotskyi updated DRILL-7520:
--
Affects Version/s: 1.16.0

> Cannot connect to Drill with PLAIN authentication enabled using JDBC client 
> (mapr profile)
> --
>
> Key: DRILL-7520
> URL: https://issues.apache.org/jira/browse/DRILL-7520
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0, 1.17.0
>Reporter: Anton Gozhiy
>Priority: Major
>
> *Prerequisites:*
> # Drill with the JDBC driver is built with "mapr" profile
> # Security is enabled and PLAIN authentication is configured
> *Steps:*
> # Use some external JDBC client to connect (e.g. DBeaver)
> # Connection string: "jdbc:drill:drillbit=node1:31010"
> # Set appropriate user/password
> # Test Connection
> *Expected result:*
> Connection successful.
> *Actual result:*
> Exception happens:
> {noformat}
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
> Exception in thread "main" java.sql.SQLNonTransientConnectionException: 
> Failure in connecting to Drill: oadd.org.apache.drill.exec.rpc.RpcException: 
> HANDSHAKE_VALIDATION : org/apache/hadoop/conf/Configuration
>   at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:178)
>   at 
> org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:67)
>   at 
> org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.java:67)
>   at 
> oadd.org.apache.calcite.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:138)
>   at org.apache.drill.jdbc.Driver.connect(Driver.java:75)
>   at java.sql.DriverManager.getConnection(DriverManager.java:664)
>   at java.sql.DriverManager.getConnection(DriverManager.java:247)
>   at TheBestClientEver.main(TheBestClientEver.java:28)
> Caused by: oadd.org.apache.drill.exec.rpc.RpcException: HANDSHAKE_VALIDATION 
> : org/apache/hadoop/conf/Configuration
>   at 
> oadd.org.apache.drill.exec.rpc.user.UserClient$2.connectionFailed(UserClient.java:315)
>   at 
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler$ChannelClosedHandler.connectionFailed(QueryResultHandler.java:396)
>   at 
> oadd.org.apache.drill.exec.rpc.ConnectionMultiListener$HandshakeSendHandler.success(ConnectionMultiListener.java:170)
>   at 
> oadd.org.apache.drill.exec.rpc.ConnectionMultiListener$HandshakeSendHandler.success(ConnectionMultiListener.java:143)
>   at 
> oadd.org.apache.drill.exec.rpc.RequestIdMap$RpcListener.set(RequestIdMap.java:134)
>   at 
> oadd.org.apache.drill.exec.rpc.BasicClient$ClientHandshakeHandler.consumeHandshake(BasicClient.java:318)
>   at 
> oadd.org.apache.drill.exec.rpc.AbstractHandshakeHandler.decode(AbstractHandshakeHandler.java:57)
>   at 
> oadd.org.apache.drill.exec.rpc.AbstractHandshakeHandler.decode(AbstractHandshakeHandler.java:29)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>   at 
> oadd.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
>   at 
> oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>   at 
> oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>   at 
> 

[jira] [Updated] (DRILL-7520) Cannot connect to Drill with PLAIN authentication enabled using JDBC client (mapr profile)

2020-01-08 Thread Anton Gozhiy (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Gozhiy updated DRILL-7520:

Summary: Cannot connect to Drill with PLAIN authentication enabled using 
JDBC client (mapr profile)  (was: Cannot connect to Drill with PLAIN 
authentication enabled using JDBC client)

> Cannot connect to Drill with PLAIN authentication enabled using JDBC client 
> (mapr profile)
> --
>
> Key: DRILL-7520
> URL: https://issues.apache.org/jira/browse/DRILL-7520
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Anton Gozhiy
>Priority: Major
>
> *Prerequisites:*
> # Drill with the JDBC driver is built with "mapr" profile
> # Security is enabled and PLAIN authentication is configured
> *Steps:*
> # Use some external JDBC client to connect (e.g. DBeaver)
> # Connection string: "jdbc:drill:drillbit=node1:31010"
> # Set appropriate user/password
> # Test Connection
> *Expected result:*
> Connection successful.
> *Actual result:*
> Exception happens:
> {noformat}
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
> Exception in thread "main" java.sql.SQLNonTransientConnectionException: 
> Failure in connecting to Drill: oadd.org.apache.drill.exec.rpc.RpcException: 
> HANDSHAKE_VALIDATION : org/apache/hadoop/conf/Configuration
>   at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:178)
>   at 
> org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:67)
>   at 
> org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.java:67)
>   at 
> oadd.org.apache.calcite.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:138)
>   at org.apache.drill.jdbc.Driver.connect(Driver.java:75)
>   at java.sql.DriverManager.getConnection(DriverManager.java:664)
>   at java.sql.DriverManager.getConnection(DriverManager.java:247)
>   at TheBestClientEver.main(TheBestClientEver.java:28)
> Caused by: oadd.org.apache.drill.exec.rpc.RpcException: HANDSHAKE_VALIDATION 
> : org/apache/hadoop/conf/Configuration
>   at 
> oadd.org.apache.drill.exec.rpc.user.UserClient$2.connectionFailed(UserClient.java:315)
>   at 
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler$ChannelClosedHandler.connectionFailed(QueryResultHandler.java:396)
>   at 
> oadd.org.apache.drill.exec.rpc.ConnectionMultiListener$HandshakeSendHandler.success(ConnectionMultiListener.java:170)
>   at 
> oadd.org.apache.drill.exec.rpc.ConnectionMultiListener$HandshakeSendHandler.success(ConnectionMultiListener.java:143)
>   at 
> oadd.org.apache.drill.exec.rpc.RequestIdMap$RpcListener.set(RequestIdMap.java:134)
>   at 
> oadd.org.apache.drill.exec.rpc.BasicClient$ClientHandshakeHandler.consumeHandshake(BasicClient.java:318)
>   at 
> oadd.org.apache.drill.exec.rpc.AbstractHandshakeHandler.decode(AbstractHandshakeHandler.java:57)
>   at 
> oadd.org.apache.drill.exec.rpc.AbstractHandshakeHandler.decode(AbstractHandshakeHandler.java:29)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>   at 
> oadd.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
>   at 
> oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>   at 
> 

[jira] [Created] (DRILL-7520) Cannot connect to Drill with PLAIN authentication enabled using JDBC client

2020-01-08 Thread Anton Gozhiy (Jira)
Anton Gozhiy created DRILL-7520:
---

 Summary: Cannot connect to Drill with PLAIN authentication enabled 
using JDBC client
 Key: DRILL-7520
 URL: https://issues.apache.org/jira/browse/DRILL-7520
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.17.0
Reporter: Anton Gozhiy


*Prerequisites:*
# Drill with the JDBC driver is built with "mapr" profile
# Security is enabled and PLAIN authentication is configured

*Steps:*
# Use some external JDBC client to connect (e.g. DBeaver)
# Connection string: "jdbc:drill:drillbit=node1:31010"
# Set appropriate user/password
# Test Connection

*Expected result:*
Connection successful.

*Actual result:*
Exception happens:
{noformat}
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
Exception in thread "main" java.sql.SQLNonTransientConnectionException: Failure 
in connecting to Drill: oadd.org.apache.drill.exec.rpc.RpcException: 
HANDSHAKE_VALIDATION : org/apache/hadoop/conf/Configuration
at 
org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:178)
at 
org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:67)
at 
org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.java:67)
at 
oadd.org.apache.calcite.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:138)
at org.apache.drill.jdbc.Driver.connect(Driver.java:75)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at TheBestClientEver.main(TheBestClientEver.java:28)
Caused by: oadd.org.apache.drill.exec.rpc.RpcException: HANDSHAKE_VALIDATION : 
org/apache/hadoop/conf/Configuration
at 
oadd.org.apache.drill.exec.rpc.user.UserClient$2.connectionFailed(UserClient.java:315)
at 
oadd.org.apache.drill.exec.rpc.user.QueryResultHandler$ChannelClosedHandler.connectionFailed(QueryResultHandler.java:396)
at 
oadd.org.apache.drill.exec.rpc.ConnectionMultiListener$HandshakeSendHandler.success(ConnectionMultiListener.java:170)
at 
oadd.org.apache.drill.exec.rpc.ConnectionMultiListener$HandshakeSendHandler.success(ConnectionMultiListener.java:143)
at 
oadd.org.apache.drill.exec.rpc.RequestIdMap$RpcListener.set(RequestIdMap.java:134)
at 
oadd.org.apache.drill.exec.rpc.BasicClient$ClientHandshakeHandler.consumeHandshake(BasicClient.java:318)
at 
oadd.org.apache.drill.exec.rpc.AbstractHandshakeHandler.decode(AbstractHandshakeHandler.java:57)
at 
oadd.org.apache.drill.exec.rpc.AbstractHandshakeHandler.decode(AbstractHandshakeHandler.java:29)
at 
oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
at 
oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
at 
oadd.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
at 
oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
at 
oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at 
oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
at 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010963#comment-17010963
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on issue #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#issuecomment-572223262
 
 
   Hi @arina-ielchiieva 
   I fixed most of the issues you raised.  **Please do NOT do another pass yet 
as there are still a few things I need to fix, which I will do either tonight 
or tomorrow**.  I went through all the warnings in IntelliJ and addressed all 
the relevant ones.  Just FYI, I did this earlier in the review process, but 
IntelliJ recommended some changes that didn't pass review, such as making many 
of the functions package-private.  In any event, I removed almost all of the 
warnings.  (Two remain in the batch reader, relating to reworking a for loop.) 
   
   There are two issues from this review that I still need to look at:
   1.  A generic matrix transpose method for primitives (see the sketch after 
this comment)
   2.  Moving the `path` assignment to another method.
   
   Thank you for your help and patience with this.  As you know, this isn't my 
full-time job, and this particular plugin is really complicated. 
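   
   For item 1, a sketch for one primitive type (a fully generic version would 
need either boxing or one overload per primitive array type):
   
   static long[][] transpose(long[][] m) {
     long[][] t = new long[m[0].length][m.length];
     for (int r = 0; r < m.length; r++) {
       for (int c = 0; c < m[r].length; c++) {
         t[c][r] = m[r][c];
       }
     }
     return t;
   }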
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> h2. 
> Per wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
> defaultPath:
> h3. Example Configuration
> h3. 
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.
> {{"hdf5": {
>   "type": "hdf5",
>   "extensions": [
> "h5"
>   ],
>   "defaultPath": null
> }}}
> h3. Usage
> Since HDF5 can be viewed as a file system within a file, a single file can 
> contain many datasets. For instance, if you have a simple HDF5 file, a star 
> query will produce the following result:
> {{apache drill> select * from dfs.test.`dset.h5`;
> +---+---+---+--+
> | path  | data_type | file_name | int_data
>  |
> +---+---+---+--+
> | /dset | DATASET   | dset.h5   | 
> [[1,2,3,4,5,6],[7,8,9,10,11,12],[13,14,15,16,17,18],[19,20,21,22,23,24]] |
> +---+---+---+--+}}
> The actual data in this file is mapped to a column called int_data. In order 
> to effectively access the data, you should use Drill's FLATTEN() function on 
> the int_data column, which produces the following result.
> {{apache drill> select flatten(int_data) as int_data from dfs.test.`dset.h5`;
> +-+
> |  int_data   |
> +-+
> | [1,2,3,4,5,6]   |
> | [7,8,9,10,11,12]|
> | [13,14,15,16,17,18] |
> | [19,20,21,22,23,24] |
> +-+}}
> Once you have the data in this form, you can access it similarly to how you 
> might access nested data in JSON or other files.
> {{apache drill> SELECT int_data[0] as col_0,
> . .semicolon> int_data[1] as col_1,
> . .semicolon> int_data[2] as col_2
> . .semicolon> FROM ( SELECT flatten(int_data) AS int_data
> . . . . . .)> FROM dfs.test.`dset.h5`
> . . . . . .)> );
> +---+---+---+
> | col_0 | col_1 | col_2 |
> +---+---+---+
> | 1 | 2 | 3 |
> | 7 | 8 | 9 |
> | 13| 14| 15

[jira] [Commented] (DRILL-7517) Drill 1.16.0 shuts down frequently

2020-01-08 Thread Nitin Pawar (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010961#comment-17010961
 ] 

Nitin Pawar commented on DRILL-7517:


[~volodymyr], I am afraid I cannot download the data files locally, as that is 
customer data. 

These queries run perfectly fine on Drill 1.13, which we are using currently; 
the OOM issue occurs only on 1.16.

The documentation's current guideline is a heap of 4 to 8 GB; I kept it at 
12 GB.

For now I have shut down the 1.16 Drill instances, as they had a major impact 
on our production infrastructure.

Our current load is at most 100 queries running in parallel on the Drill 
cluster at any given time.  There may also be queries such as "explain plan 
extended" to get more details about the queries, but I am not sure they cause 
many issues, since they process no data; we always run the plan against 
cp.`employee.json`.

I will try to get a heap dump and share it with you tomorrow.

> Drill 1.16.0 shuts down frequently
> --
>
> Key: DRILL-7517
> URL: https://issues.apache.org/jira/browse/DRILL-7517
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Nitin Pawar
>Priority: Critical
>
> We see following exception every few hours
> Our drillbit cluster queries data from S3. The only queries we make to web 
> interface are for explain plan and no actual query goes via WEB UI. 
> here is the full exception
> 2020-01-07 16:34:02,922 [qtp80683229-962] INFO 
> o.a.d.exec.server.rest.QueryWrapper - User Error Occurred: There is not 
> enough heap memory to run this query using the web interface. 
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: There is 
> not enough heap memory to run this query using the web interface. 
> Please try a query with fewer columns or with a filter or limit condition to 
> limit the data returned. 
> You can also try an ODBC/JDBC client. 
> [Error Id: 7ad61839-a2e8-4fdd-a600-e662fc0f03e0 ]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:630)
>  ~[drill-common-1.16.0.jar:1.16.0]
>  at org.apache.drill.exec.server.rest.QueryWrapper.run(QueryWrapper.java:115) 
> [drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.server.rest.QueryResources.submitQueryJSON(QueryResources.java:74)
>  [drill-java-exec-1.16.0.jar:1.16.0]
>  at sun.reflect.GeneratedMethodAccessor212.invoke(Unknown Source) ~[na:na]
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_222]
>  at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_222]
>  at 
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
>  [jersey-server-2.25.1.jar:na]
>  at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326) 
> [jersey-server-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:315) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:297) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:267) 
> [jersey-common-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
>  [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305) 
> [jersey-server-2.25.1.jar:na]
>  at 
> 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010960#comment-17010960
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364405199
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5LongDataWriter.java
 ##
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5.writers;
+
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.store.hdf5.HDF5Utils;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+
+import java.util.List;
+
+public class HDF5LongDataWriter extends HDF5DataWriter {
+
+  private final long[] data;
+
+  private final ScalarWriter rowWriter;
+
+  // This constructor is used when the data is a 1D column.  The column is 
inferred from the datapath
+  public HDF5LongDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, 
String datapath) {
+super(reader, columnWriter, datapath);
+data = reader.readLongArray(datapath);
+
+fieldName = HDF5Utils.getNameFromPath(datapath);
+rowWriter = makeWriter(columnWriter, fieldName, 
TypeProtos.MinorType.BIGINT, TypeProtos.DataMode.OPTIONAL);
+
+  }
+
+  // This constructor is used when the data is part of a 2D array.  In this 
case the column name is provided in the constructor
+  public HDF5LongDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, 
String datapath, String fieldName, int currentColumn) {
+super(reader, columnWriter, datapath, fieldName, currentColumn);
+// Get dimensions
+long[] dimensions = 
reader.object().getDataSetInformation(datapath).getDimensions();
+long[][] tempData;
+if (dimensions.length == 2) {
+  tempData = transpose(reader.readLongMatrix(datapath));
+} else {
+  tempData = transpose(reader.int64().readMDArray(datapath).toMatrix());
+}
+data = tempData[currentColumn];
+rowWriter = makeWriter(columnWriter, fieldName, 
TypeProtos.MinorType.BIGINT, TypeProtos.DataMode.OPTIONAL);
+  }
+
+  public HDF5LongDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, 
String fieldName, List<Long> tempListData) {
 
 Review comment:
   Fixed
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> h2. 
> Per wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
> defaultPath:
> h3. Example Configuration
> h3. 
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.
> {{"hdf5": {
>   "type": "hdf5",
>   

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010950#comment-17010950
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364396602
 
 

 ##
 File path: 
contrib/format-hdf5/src/test/java/org/apache/drill/exec/store/hdf5/TestHDF5Format.java
 ##
 @@ -0,0 +1,907 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.ExecTest;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.rpc.RpcException;
+import org.apache.drill.exec.store.dfs.ZipCodec;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetBuilder;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.hadoop.io.compress.CompressionCodec;
+import org.apache.hadoop.io.compress.CompressionCodecFactory;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+
+@Category(RowSetTests.class)
+public class TestHDF5Format extends ClusterTest {
+
+  @BeforeClass
+  public static void setup() throws Exception {
+ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher));
+
+HDF5FormatConfig formatConfig = new HDF5FormatConfig();
+cluster.defineFormat("dfs", "hdf5", formatConfig);
+cluster.defineFormat("cp", "hdf5", formatConfig);
+dirTestWatcher.copyResourceToRoot(Paths.get("hdf5/"));
+  }
+
+  @Test
+  public void testExplicitQuery() throws RpcException {
+String sql = "SELECT path, data_type, file_name FROM cp.`hdf5/dset.h5`";
+RowSet results = client.queryBuilder().sql(sql).rowSet();
+TupleMetadata expectedSchema = new SchemaBuilder()
+  .add("path", TypeProtos.MinorType.VARCHAR, TypeProtos.DataMode.OPTIONAL)
+  .add("data_type", TypeProtos.MinorType.VARCHAR, 
TypeProtos.DataMode.OPTIONAL)
+  .add("file_name", TypeProtos.MinorType.VARCHAR, 
TypeProtos.DataMode.OPTIONAL)
+  .buildSchema();
+
+RowSet expected = new RowSetBuilder(client.allocator(), expectedSchema)
+  .addRow("/dset", "DATASET", "dset.h5")
+  .build();
+new RowSetComparison(expected).unorderedVerifyAndClearAll(results);
+  }
+
+  @Test
+  public void testStarQuery() throws Exception {
+List<Integer> t1 = Arrays.asList(1, 2, 3, 4, 5, 6);
+List<Integer> t2 = Arrays.asList(7, 8, 9, 10, 11, 12);
+List<Integer> t3 = Arrays.asList(13, 14, 15, 16, 17, 18);
+List<Integer> t4 = Arrays.asList(19, 20, 21, 22, 23, 24);
+List<List<Integer>> finalList = new ArrayList<>();
+finalList.add(t1);
+finalList.add(t2);
+finalList.add(t3);
+finalList.add(t4);
+
+testBuilder()
+  .sqlQuery("SELECT * FROM cp.`hdf5/dset.h5`")
+  .unOrdered()
+  .baselineColumns("path", "data_type", "file_name", "int_data")
+  .baselineValues("/dset", "DATASET", "dset.h5", finalList)
+  .go();
+  }
+
+  @Test
+  public void testSimpleExplicitQuery() throws Exception {
+List<Integer> t1 = Arrays.asList(1, 2, 3, 4, 5, 6);
+List<Integer> t2 = Arrays.asList(7, 8, 9, 10, 11, 12);
+List<Integer> t3 = Arrays.asList(13, 14, 15, 16, 17, 18);
+List<Integer> t4 = Arrays.asList(19, 20, 21, 22, 23, 24);
+List<List<Integer>> finalList = new ArrayList<>();
+finalList.add(t1);

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010926#comment-17010926
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364388158
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##
 @@ -0,0 +1,1122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import ch.systemsx.cisd.hdf5.HDF5CompoundMemberInformation;
+import ch.systemsx.cisd.hdf5.HDF5DataSetInformation;
+import ch.systemsx.cisd.hdf5.HDF5FactoryProvider;
+import ch.systemsx.cisd.hdf5.HDF5LinkInformation;
+import ch.systemsx.cisd.hdf5.IHDF5Factory;
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.commons.io.IOUtils;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MapBuilder;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.hdf5.writers.HDF5DataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5DoubleDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5EnumDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5FloatDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5IntDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5LongDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5MapDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5StringDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5TimestampDataWriter;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.file.StandardCopyOption;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.BitSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+public class HDF5BatchReader implements ManagedReader<FileSchemaNegotiator> {
+  private static final Logger logger = 
LoggerFactory.getLogger(HDF5BatchReader.class);
+
+  private static final String PATH_COLUMN_NAME = "path";
+
+  private static final String DATA_TYPE_COLUMN_NAME = "data_type";
+
+  private static final String FILE_NAME_COLUMN_NAME = "file_name";
+
+  private static final String INT_COLUMN_PREFIX = "int_col_";
+
+  private static final String LONG_COLUMN_PREFIX = "long_col_";
+
+  private static final String FLOAT_COLUMN_PREFIX = "float_col_";
+
+  private static final String DOUBLE_COLUMN_PREFIX = "double_col_";
+
+  private static final String INT_COLUMN_NAME = "int_data";
+
+  private static final String FLOAT_COLUMN_NAME = "float_data";
+
+  private static final String DOUBLE_COLUMN_NAME = "double_data";
+
+  private static final String LONG_COLUMN_NAME = "long_data";
+
+  private final HDF5ReaderConfig readerConfig;
+
+  private final List<HDF5DataWriter> dataWriters;
+
+  private FileSplit split;
+
+  private IHDF5Reader hdf5Reader;
+
+  private File inFile;
+
+  private BufferedReader reader;
+
+  private RowSetLoader rowWriter;
+
+  private Iterator metadataIterator;
+
+  private ScalarWriter pathWriter;
+
+  private ScalarWriter dataTypeWriter;
+

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010925#comment-17010925
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364388071
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##
 @@ -0,0 +1,1122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import ch.systemsx.cisd.hdf5.HDF5CompoundMemberInformation;
+import ch.systemsx.cisd.hdf5.HDF5DataSetInformation;
+import ch.systemsx.cisd.hdf5.HDF5FactoryProvider;
+import ch.systemsx.cisd.hdf5.HDF5LinkInformation;
+import ch.systemsx.cisd.hdf5.IHDF5Factory;
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.commons.io.IOUtils;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MapBuilder;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.hdf5.writers.HDF5DataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5DoubleDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5EnumDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5FloatDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5IntDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5LongDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5MapDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5StringDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5TimestampDataWriter;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.file.StandardCopyOption;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.BitSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+public class HDF5BatchReader implements ManagedReader<FileSchemaNegotiator> {
+  private static final Logger logger = 
LoggerFactory.getLogger(HDF5BatchReader.class);
+
+  private static final String PATH_COLUMN_NAME = "path";
+
+  private static final String DATA_TYPE_COLUMN_NAME = "data_type";
+
+  private static final String FILE_NAME_COLUMN_NAME = "file_name";
+
+  private static final String INT_COLUMN_PREFIX = "int_col_";
+
+  private static final String LONG_COLUMN_PREFIX = "long_col_";
+
+  private static final String FLOAT_COLUMN_PREFIX = "float_col_";
+
+  private static final String DOUBLE_COLUMN_PREFIX = "double_col_";
+
+  private static final String INT_COLUMN_NAME = "int_data";
+
+  private static final String FLOAT_COLUMN_NAME = "float_data";
+
+  private static final String DOUBLE_COLUMN_NAME = "double_data";
+
+  private static final String LONG_COLUMN_NAME = "long_data";
+
+  private final HDF5ReaderConfig readerConfig;
+
+  private final List<HDF5DataWriter> dataWriters;
+
+  private FileSplit split;
+
+  private IHDF5Reader hdf5Reader;
+
+  private File inFile;
+
+  private BufferedReader reader;
+
+  private RowSetLoader rowWriter;
+
+  private Iterator metadataIterator;
+
+  private ScalarWriter pathWriter;
+
+  private ScalarWriter dataTypeWriter;
+

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010945#comment-17010945
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364391749
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5LongDataWriter.java
 ##
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5.writers;
+
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.store.hdf5.HDF5Utils;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+
+import java.util.List;
+
+public class HDF5LongDataWriter extends HDF5DataWriter {
+
+  private final long[] data;
+
+  private final ScalarWriter rowWriter;
+
+  // This constructor is used when the data is a 1D column.  The column is 
inferred from the datapath
+  public HDF5LongDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, 
String datapath) {
+super(reader, columnWriter, datapath);
+data = reader.readLongArray(datapath);
+
+fieldName = HDF5Utils.getNameFromPath(datapath);
+rowWriter = makeWriter(columnWriter, fieldName, 
TypeProtos.MinorType.BIGINT, TypeProtos.DataMode.OPTIONAL);
+
+  }
+
+  // This constructor is used when the data is part of a 2D array.  In this 
case the column name is provided in the constructor
+  public HDF5LongDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, 
String datapath, String fieldName, int currentColumn) {
+super(reader, columnWriter, datapath, fieldName, currentColumn);
+// Get dimensions
+long[] dimensions = 
reader.object().getDataSetInformation(datapath).getDimensions();
+long[][] tempData;
+if (dimensions.length == 2) {
+  tempData = transpose(reader.readLongMatrix(datapath));
+} else {
+  tempData = transpose(reader.int64().readMDArray(datapath).toMatrix());
+}
+data = tempData[currentColumn];
+rowWriter = makeWriter(columnWriter, fieldName, 
TypeProtos.MinorType.BIGINT, TypeProtos.DataMode.OPTIONAL);
+  }
+
+  public HDF5LongDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, String fieldName, List<Long> tempListData) {
+super(reader, columnWriter, null);
+this.fieldName = fieldName;
+data = new long[tempListData.size()];
+for (int i = 0; i < tempListData.size(); i++) {
+  data[i] = (Long)tempListData.get(i);
 
 Review comment:
   Fixed... by removing casts
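   For illustration, a minimal sketch (an assumption, not necessarily the PR's final code) of the cast-free copy once the parameter is typed as List<Long>: auto-unboxing converts each Long to long, so the explicit (Long) cast disappears.
 
   // Hypothetical helper: copies a List<Long> into a primitive long[] without casts.
   static long[] toLongArray(java.util.List<Long> values) {
     long[] result = new long[values.size()];
     int i = 0;
     for (Long value : values) {
       result[i++] = value; // auto-unboxing, no explicit (Long) cast
     }
     return result;
   }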
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> h2. 
> Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010946#comment-17010946
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364391838
 
 

 ##
 File path: 
contrib/format-hdf5/src/test/java/org/apache/drill/exec/store/hdf5/TestHDF5Format.java
 ##
 @@ -0,0 +1,907 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.ExecTest;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.rpc.RpcException;
+import org.apache.drill.exec.store.dfs.ZipCodec;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetBuilder;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.hadoop.io.compress.CompressionCodec;
+import org.apache.hadoop.io.compress.CompressionCodecFactory;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+
+@Category(RowSetTests.class)
+public class TestHDF5Format extends ClusterTest {
+
+  @BeforeClass
+  public static void setup() throws Exception {
+ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher));
+
+HDF5FormatConfig formatConfig = new HDF5FormatConfig();
+cluster.defineFormat("dfs", "hdf5", formatConfig);
+cluster.defineFormat("cp", "hdf5", formatConfig);
+dirTestWatcher.copyResourceToRoot(Paths.get("hdf5/"));
+  }
+
+  @Test
+  public void testExplicitQuery() throws RpcException {
+String sql = "SELECT path, data_type, file_name FROM cp.`hdf5/dset.h5`";
+RowSet results = client.queryBuilder().sql(sql).rowSet();
+TupleMetadata expectedSchema = new SchemaBuilder()
+  .add("path", TypeProtos.MinorType.VARCHAR, TypeProtos.DataMode.OPTIONAL)
+  .add("data_type", TypeProtos.MinorType.VARCHAR, 
TypeProtos.DataMode.OPTIONAL)
+  .add("file_name", TypeProtos.MinorType.VARCHAR, 
TypeProtos.DataMode.OPTIONAL)
+  .buildSchema();
+
+RowSet expected = new RowSetBuilder(client.allocator(), expectedSchema)
+  .addRow("/dset", "DATASET", "dset.h5")
+  .build();
+new RowSetComparison(expected).unorderedVerifyAndClearAll(results);
+  }
+
+  @Test
+  public void testStarQuery() throws Exception {
+List<Integer> t1 = Arrays.asList(1, 2, 3, 4, 5, 6);
+List<Integer> t2 = Arrays.asList(7, 8, 9, 10, 11, 12);
+List<Integer> t3 = Arrays.asList(13, 14, 15, 16, 17, 18);
+List<Integer> t4 = Arrays.asList(19, 20, 21, 22, 23, 24);
+List<List<Integer>> finalList = new ArrayList<>();
+finalList.add(t1);
+finalList.add(t2);
+finalList.add(t3);
+finalList.add(t4);
+
+testBuilder()
+  .sqlQuery("SELECT * FROM cp.`hdf5/dset.h5`")
+  .unOrdered()
+  .baselineColumns("path", "data_type", "file_name", "int_data")
+  .baselineValues("/dset", "DATASET", "dset.h5", finalList)
+  .go();
+  }
+
+  @Test
+  public void testSimpleExplicitQuery() throws Exception {
+List<Integer> t1 = Arrays.asList(1, 2, 3, 4, 5, 6);
+List<Integer> t2 = Arrays.asList(7, 8, 9, 10, 11, 12);
+List<Integer> t3 = Arrays.asList(13, 14, 15, 16, 17, 18);
+List<Integer> t4 = Arrays.asList(19, 20, 21, 22, 23, 24);
+List<List<Integer>> finalList = new ArrayList<>();
+finalList.add(t1);

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010943#comment-17010943
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364391498
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5IntDataWriter.java
 ##
 @@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5.writers;
+
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.store.hdf5.HDF5Utils;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import java.util.List;
+
+public class HDF5IntDataWriter extends HDF5DataWriter {
+
+  private final int[] data;
+
+  private final ScalarWriter rowWriter;
+
+  // This constructor is used when the data is a 1D column.  The column is 
inferred from the datapath
+  public HDF5IntDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, 
String datapath) {
+super(reader, columnWriter, datapath);
+data = reader.readIntArray(datapath);
+
+fieldName = HDF5Utils.getNameFromPath(datapath);
+
+rowWriter = makeWriter(columnWriter, fieldName, TypeProtos.MinorType.INT, 
TypeProtos.DataMode.OPTIONAL);
+  }
+
+  // This constructor is used when the data is part of a 2D array.  In this 
case the column name is provided in the constructor
+  public HDF5IntDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, 
String datapath, String fieldName, int currentColumn) {
+super(reader, columnWriter, datapath, fieldName, currentColumn);
+// Get dimensions
+long[] dimensions = 
reader.object().getDataSetInformation(datapath).getDimensions();
+int[][] tempData;
+if (dimensions.length == 2) {
+  tempData = transpose(reader.readIntMatrix(datapath));
+} else {
+  tempData = transpose(reader.int32().readMDArray(datapath).toMatrix());
+}
+data = tempData[currentColumn];
+
+rowWriter = makeWriter(columnWriter, fieldName, TypeProtos.MinorType.INT, 
TypeProtos.DataMode.OPTIONAL);
+  }
+
+  // This constructor is used for compound data types.
+  public HDF5IntDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, String fieldName, List<Integer> tempListData) {
+super(reader, columnWriter, null);
+this.fieldName = fieldName;
+data = new int[tempListData.size()];
+for (int i = 0; i < tempListData.size(); i++) {
+  data[i] = (Integer)tempListData.get(i);
 
 Review comment:
   Fixed
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> h2. 
> Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010942#comment-17010942
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364391385
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5IntDataWriter.java
 ##
 @@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5.writers;
+
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.store.hdf5.HDF5Utils;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import java.util.List;
+
+public class HDF5IntDataWriter extends HDF5DataWriter {
+
+  private final int[] data;
+
+  private final ScalarWriter rowWriter;
+
+  // This constructor is used when the data is a 1D column.  The column is 
inferred from the datapath
+  public HDF5IntDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, 
String datapath) {
+super(reader, columnWriter, datapath);
+data = reader.readIntArray(datapath);
+
+fieldName = HDF5Utils.getNameFromPath(datapath);
+
+rowWriter = makeWriter(columnWriter, fieldName, TypeProtos.MinorType.INT, 
TypeProtos.DataMode.OPTIONAL);
+  }
+
+  // This constructor is used when the data is part of a 2D array.  In this 
case the column name is provided in the constructor
+  public HDF5IntDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, 
String datapath, String fieldName, int currentColumn) {
+super(reader, columnWriter, datapath, fieldName, currentColumn);
+// Get dimensions
+long[] dimensions = 
reader.object().getDataSetInformation(datapath).getDimensions();
+int[][] tempData;
+if (dimensions.length == 2) {
+  tempData = transpose(reader.readIntMatrix(datapath));
+} else {
+  tempData = transpose(reader.int32().readMDArray(datapath).toMatrix());
+}
+data = tempData[currentColumn];
+
+rowWriter = makeWriter(columnWriter, fieldName, TypeProtos.MinorType.INT, 
TypeProtos.DataMode.OPTIONAL);
+  }
+
+  // This constructor is used for compound data types.
+  public HDF5IntDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, String fieldName, List<Integer> tempListData) {
 
 Review comment:
   Fixed
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> h2. 
> Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
> defaultPath:
> h3. Example Configuration
> h3. 
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010941#comment-17010941
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364391231
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5DoubleDataWriter.java
 ##
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5.writers;
+
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.store.hdf5.HDF5Utils;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+
+import java.util.List;
+
+public class HDF5DoubleDataWriter extends HDF5DataWriter {
+
+  private final double[] data;
+
+  private final ScalarWriter rowWriter;
+
+  // This constructor is used when the data is a 1D column.  The column is 
inferred from the datapath
+  public HDF5DoubleDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, 
String datapath) {
+super(reader, columnWriter, datapath);
+data = reader.readDoubleArray(datapath);
+
+fieldName = HDF5Utils.getNameFromPath(datapath);
+rowWriter = makeWriter(columnWriter, fieldName, 
TypeProtos.MinorType.FLOAT8, TypeProtos.DataMode.OPTIONAL);
+  }
+
+  // This constructor is used when the data is part of a 2D array.  In this 
case the column name is provided in the constructor
+  public HDF5DoubleDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, 
String datapath, String fieldName, int currentColumn) {
+super(reader, columnWriter, datapath, fieldName, currentColumn);
+// Get dimensions
+long[] dimensions = 
reader.object().getDataSetInformation(datapath).getDimensions();
+double[][] tempData;
+if (dimensions.length == 2) {
+  tempData = transpose(reader.readDoubleMatrix(datapath));
+} else {
+  tempData = transpose(reader.float64().readMDArray(datapath).toMatrix());
+}
+data = tempData[currentColumn];
+rowWriter = makeWriter(columnWriter, fieldName, 
TypeProtos.MinorType.FLOAT8, TypeProtos.DataMode.OPTIONAL);
+  }
+
+  public HDF5DoubleDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, String fieldName, List<Double> tempListData) {
+super(reader, columnWriter, null);
+this.fieldName = fieldName;
+data = new double[tempListData.size()];
+for (int i = 0; i < tempListData.size(); i++) {
+  data[i] = (Double)tempListData.get(i);
+}
+rowWriter = makeWriter(columnWriter, fieldName, 
TypeProtos.MinorType.FLOAT8, TypeProtos.DataMode.OPTIONAL);
+  }
+
+
+  public boolean write() {
+if (counter >= data.length) { // >= prevents reading data[data.length]
+  return false;
+} else {
+  rowWriter.setDouble(data[counter++]);
+  return true;
+}
+  }
+
+  public boolean hasNext() {
+return counter < data.length;
+  }
+
+  private double[][] transpose(double[][] array) {
 
 Review comment:
   Hi Arina, 
   Basically, I needed a way to transpose primitive 2D arrays. Since they are 
primitives, I implemented a version of this for each of the different primitive 
data types. Is there a way to do this generically for all primitive types?
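   For what it's worth, java.lang.reflect.Array would allow a single transpose for all primitive element types; the following is a sketch under the assumption that reflective boxing per element is acceptable, not a drop-in for the PR:
 
   import java.lang.reflect.Array;
 
   // Sketch: transposes any rectangular 2D array (primitive or object element type).
   // The caller casts the result, e.g. double[][] t = (double[][]) transpose(m);
   static Object transpose(Object matrix) {
     int rows = Array.getLength(matrix);
     Object firstRow = Array.get(matrix, 0);
     int cols = Array.getLength(firstRow);
     Object result = Array.newInstance(firstRow.getClass().getComponentType(), cols, rows);
     for (int r = 0; r < rows; r++) {
       Object row = Array.get(matrix, r);
       for (int c = 0; c < cols; c++) {
         Array.set(Array.get(result, c), r, Array.get(row, c)); // result[c][r] = matrix[r][c]
       }
     }
     return result;
   }
 
   The hand-written primitive versions avoid that per-element boxing, so keeping them on hot paths is a defensible trade-off.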
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> h2. 
> 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010937#comment-17010937
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364390483
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5FormatPlugin.java
 ##
 @@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import org.apache.drill.common.config.DrillConfig;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileScanBuilder;
+
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.proto.UserBitShared;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.drill.shaded.guava.com.google.common.io.Files;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
+import org.apache.drill.exec.store.hdf5.HDF5BatchReader.HDF5ReaderConfig;
+
+import java.io.File;
+
+
+public class HDF5FormatPlugin extends EasyFormatPlugin<HDF5FormatConfig> {
+
+  public static final String DEFAULT_NAME = "hdf5";
+
+  private final DrillbitContext context;
+
+  public HDF5FormatPlugin(String name, DrillbitContext context, Configuration 
fsConf, StoragePluginConfig storageConfig, HDF5FormatConfig formatConfig) {
+super(name, easyConfig(fsConf, formatConfig), context, storageConfig, 
formatConfig);
+this.context = context;
+
 
 Review comment:
   Fixed
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> h2. 
> Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
> defaultPath:
> h3. Example Configuration
> h3. 
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.
> {{"hdf5": {
>   "type": "hdf5",
>   "extensions": [
> "h5"
>   ],
>   "defaultPath": null
> }}}
> h3. Usage
> Since HDF5 can be viewed as a file system within a file, a single file can 
> contain many datasets. For instance, if you have a simple HDF5 file, a star 
> query will produce the following result:
> {{apache drill> select * from dfs.test.`dset.h5`;
> 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010938#comment-17010938
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364390577
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5DataWriter.java
 ##
 @@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5.writers;
+
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+
+import java.util.ArrayList;
+import java.util.List;
+
+public abstract class HDF5DataWriter {
+  protected final RowSetLoader columnWriter;
+
+  protected final IHDF5Reader reader;
+
+  protected final String datapath;
+
+  protected String fieldName;
+
+  protected int colCount;
+
+  protected int counter;
+
+  protected Object[][] compoundData;
+
+  public HDF5DataWriter(IHDF5Reader reader, RowSetLoader columnWriter, String 
datapath) {
+this.reader = reader;
+this.columnWriter = columnWriter;
+this.datapath = datapath;
+  }
+
+  public HDF5DataWriter(IHDF5Reader reader, RowSetLoader columnWriter, String 
datapath, String fieldName, int colCount) {
+this.reader = reader;
+this.columnWriter = columnWriter;
+this.datapath = datapath;
+this.fieldName = fieldName;
+this.colCount = colCount;
+  }
+
+  public boolean write() {
+return false;
+  }
+
+  public boolean hasNext() {
+return false;
+  }
+
+  public int currentRowCount() {
+return counter;
+  }
+
+  public List<Object> getColumn(int columnIndex) {
+List<Object> result = new ArrayList<>();
 
 Review comment:
   Fixed
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> h2. 
> Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
> defaultPath:
> h3. Example Configuration
> h3. 
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.
> {{"hdf5": {
>   "type": "hdf5",
>   "extensions": [
> "h5"
>   ],
>   "defaultPath": null
> }}}
> h3. Usage
> Since HDF5 can be viewed as a file system within a file, a single file can 
> contain many datasets. For instance, if you have a simple HDF5 file, a star 
> query will produce the following result:
> {{apache drill> select * from dfs.test.`dset.h5`;
> 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010924#comment-17010924
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364387938
 
 

 ##
 File path: 
contrib/format-hdf5/src/test/java/org/apache/drill/exec/store/hdf5/TestHDF5Format.java
 ##
 @@ -0,0 +1,907 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.ExecTest;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.rpc.RpcException;
+import org.apache.drill.exec.store.dfs.ZipCodec;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetBuilder;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.hadoop.io.compress.CompressionCodec;
+import org.apache.hadoop.io.compress.CompressionCodecFactory;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+
+@Category(RowSetTests.class)
+public class TestHDF5Format extends ClusterTest {
+
+  @BeforeClass
+  public static void setup() throws Exception {
+ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher));
 
 Review comment:
   Fixed
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> h2. 
> Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
> defaultPath:
> h3. Example Configuration
> h3. 
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.
> {{"hdf5": {
>   "type": "hdf5",
>   "extensions": [
> "h5"
>   ],
>   "defaultPath": null
> }}}
> h3. Usage
> Since HDF5 can be viewed as a file system within a file, a single file can 
> contain many datasets. For instance, if you have a simple HDF5 file, a star 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010923#comment-17010923
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364387869
 
 

 ##
 File path: 
contrib/format-hdf5/src/test/java/org/apache/drill/exec/store/hdf5/TestHDF5Format.java
 ##
 @@ -0,0 +1,907 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.ExecTest;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.rpc.RpcException;
+import org.apache.drill.exec.store.dfs.ZipCodec;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetBuilder;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.hadoop.io.compress.CompressionCodec;
+import org.apache.hadoop.io.compress.CompressionCodecFactory;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+
+@Category(RowSetTests.class)
+public class TestHDF5Format extends ClusterTest {
+
+  @BeforeClass
+  public static void setup() throws Exception {
+ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher));
+
+HDF5FormatConfig formatConfig = new HDF5FormatConfig();
+cluster.defineFormat("dfs", "hdf5", formatConfig);
+cluster.defineFormat("cp", "hdf5", formatConfig);
 
 Review comment:
   Removed `cp`.
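   With that change the test setup registers the format on the dfs plugin only; a minimal sketch of the resulting lines, assuming the rest of the setup above is unchanged:
 
   HDF5FormatConfig formatConfig = new HDF5FormatConfig();
   cluster.defineFormat("dfs", "hdf5", formatConfig);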
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> h2. 
> Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
> defaultPath:
> h3. Example Configuration
> h3. 
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.
> {{"hdf5": {
>   "type": "hdf5",
>   "extensions": [
> "h5"
>   ],
>   "defaultPath": 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010935#comment-17010935
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364390104
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5DrillMetadata.java
 ##
 @@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import java.util.HashMap;
+import java.util.Map;
+
+public class HDF5DrillMetadata {
+  private String path;
+
+  private String dataType;
+
+  private Map<String, Object> attributes;
+
+  public HDF5DrillMetadata() {
+attributes = new HashMap<>();
 
 Review comment:
   Fixed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for HDF5
> --
>
> Key: DRILL-7233
> URL: https://issues.apache.org/jira/browse/DRILL-7233
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.18.0
>
>
> h2. Drill HDF5 Format Plugin
> h2. 
> Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats 
> designed to store and organize large amounts of data. Originally developed at 
> the National Center for Supercomputing Applications, it is supported by The 
> HDF Group, a non-profit corporation whose mission is to ensure continued 
> development of HDF5 technologies and the continued accessibility of data 
> stored in HDF.
> This plugin enables Apache Drill to query HDF5 files.
> h3. Configuration
> There are three configuration variables in this plugin:
> type: This should be set to hdf5.
> extensions: This is a list of the file extensions used to identify HDF5 
> files. Typically HDF5 uses .h5 or .hdf5 as file extensions. This defaults to 
> .h5.
> defaultPath:
> h3. Example Configuration
> h3. 
> For most uses, the configuration below will suffice to enable Drill to query 
> HDF5 files.
> {{"hdf5": {
>   "type": "hdf5",
>   "extensions": [
> "h5"
>   ],
>   "defaultPath": null
> }}}
> h3. Usage
> Since HDF5 can be viewed as a file system within a file, a single file can 
> contain many datasets. For instance, if you have a simple HDF5 file, a star 
> query will produce the following result:
> {{apache drill> select * from dfs.test.`dset.h5`;
> +---+---+---+--+
> | path  | data_type | file_name | int_data
>  |
> +---+---+---+--+
> | /dset | DATASET   | dset.h5   | 
> [[1,2,3,4,5,6],[7,8,9,10,11,12],[13,14,15,16,17,18],[19,20,21,22,23,24]] |
> +---+---+---+--+}}
> The actual data in this file is mapped to a column called int_data. In order 
> to effectively access the data, you should use Drill's FLATTEN() function on 
> the int_data column, which produces the following result.
> {{apache drill> select flatten(int_data) as int_data from dfs.test.`dset.h5`;
> +-+
> |  int_data   |
> +-+
> | [1,2,3,4,5,6]   |
> | [7,8,9,10,11,12]|
> | [13,14,15,16,17,18] |
> | [19,20,21,22,23,24] |
> +-+}}
> Once you have the data in this form, you can access it similarly to how you 
> might access nested data in JSON or other files.
> {{apache drill> SELECT int_data[0] as col_0,
> . .semicolon> 

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010929#comment-17010929
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364389029
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##
 @@ -0,0 +1,1122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import ch.systemsx.cisd.hdf5.HDF5CompoundMemberInformation;
+import ch.systemsx.cisd.hdf5.HDF5DataSetInformation;
+import ch.systemsx.cisd.hdf5.HDF5FactoryProvider;
+import ch.systemsx.cisd.hdf5.HDF5LinkInformation;
+import ch.systemsx.cisd.hdf5.IHDF5Factory;
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.commons.io.IOUtils;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MapBuilder;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.hdf5.writers.HDF5DataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5DoubleDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5EnumDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5FloatDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5IntDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5LongDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5MapDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5StringDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5TimestampDataWriter;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.file.StandardCopyOption;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.BitSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+public class HDF5BatchReader implements ManagedReader<FileSchemaNegotiator> {
+  private static final Logger logger = 
LoggerFactory.getLogger(HDF5BatchReader.class);
+
+  private static final String PATH_COLUMN_NAME = "path";
+
+  private static final String DATA_TYPE_COLUMN_NAME = "data_type";
+
+  private static final String FILE_NAME_COLUMN_NAME = "file_name";
+
+  private static final String INT_COLUMN_PREFIX = "int_col_";
+
+  private static final String LONG_COLUMN_PREFIX = "long_col_";
+
+  private static final String FLOAT_COLUMN_PREFIX = "float_col_";
+
+  private static final String DOUBLE_COLUMN_PREFIX = "double_col_";
+
+  private static final String INT_COLUMN_NAME = "int_data";
+
+  private static final String FLOAT_COLUMN_NAME = "float_data";
+
+  private static final String DOUBLE_COLUMN_NAME = "double_data";
+
+  private static final String LONG_COLUMN_NAME = "long_data";
+
+  private final HDF5ReaderConfig readerConfig;
+
+  private final List<HDF5DataWriter> dataWriters;
+
+  private FileSplit split;
+
+  private IHDF5Reader hdf5Reader;
+
+  private File inFile;
+
+  private BufferedReader reader;
+
+  private RowSetLoader rowWriter;
+
+  private Iterator<HDF5DrillMetadata> metadataIterator;
+
+  private ScalarWriter pathWriter;
+
+  private ScalarWriter dataTypeWriter;
+

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010930#comment-17010930
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364389125
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##
 @@ -0,0 +1,1122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import ch.systemsx.cisd.hdf5.HDF5CompoundMemberInformation;
+import ch.systemsx.cisd.hdf5.HDF5DataSetInformation;
+import ch.systemsx.cisd.hdf5.HDF5FactoryProvider;
+import ch.systemsx.cisd.hdf5.HDF5LinkInformation;
+import ch.systemsx.cisd.hdf5.IHDF5Factory;
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.commons.io.IOUtils;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MapBuilder;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.hdf5.writers.HDF5DataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5DoubleDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5EnumDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5FloatDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5IntDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5LongDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5MapDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5StringDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5TimestampDataWriter;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.file.StandardCopyOption;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.BitSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+public class HDF5BatchReader implements ManagedReader<FileSchemaNegotiator> {
+  private static final Logger logger = 
LoggerFactory.getLogger(HDF5BatchReader.class);
+
+  private static final String PATH_COLUMN_NAME = "path";
+
+  private static final String DATA_TYPE_COLUMN_NAME = "data_type";
+
+  private static final String FILE_NAME_COLUMN_NAME = "file_name";
+
+  private static final String INT_COLUMN_PREFIX = "int_col_";
+
+  private static final String LONG_COLUMN_PREFIX = "long_col_";
+
+  private static final String FLOAT_COLUMN_PREFIX = "float_col_";
+
+  private static final String DOUBLE_COLUMN_PREFIX = "double_col_";
+
+  private static final String INT_COLUMN_NAME = "int_data";
+
+  private static final String FLOAT_COLUMN_NAME = "float_data";
+
+  private static final String DOUBLE_COLUMN_NAME = "double_data";
+
+  private static final String LONG_COLUMN_NAME = "long_data";
+
+  private final HDF5ReaderConfig readerConfig;
+
+  private final List<HDF5DataWriter> dataWriters;
+
+  private FileSplit split;
+
+  private IHDF5Reader hdf5Reader;
+
+  private File inFile;
+
+  private BufferedReader reader;
+
+  private RowSetLoader rowWriter;
+
+  private Iterator<HDF5DrillMetadata> metadataIterator;
+
+  private ScalarWriter pathWriter;
+
+  private ScalarWriter dataTypeWriter;
+

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010932#comment-17010932
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364389284
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##
 @@ -0,0 +1,1122 @@
   [Quoted diff context omitted: identical HDF5BatchReader license header, imports, and field declarations as in the excerpt above.]

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010931#comment-17010931
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364389201
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##
 @@ -0,0 +1,1122 @@
   [Quoted diff context omitted: identical HDF5BatchReader license header, imports, and field declarations as in the excerpt above.]

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010933#comment-17010933
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364389461
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##
 @@ -0,0 +1,1122 @@
   [Quoted diff context omitted: identical HDF5BatchReader license header, imports, and field declarations as in the excerpt above.]

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010928#comment-17010928
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364388979
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##
 @@ -0,0 +1,1122 @@
   [Quoted diff context omitted: identical HDF5BatchReader license header, imports, and field declarations as in the excerpt above.]

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010927#comment-17010927
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364388908
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##
 @@ -0,0 +1,1122 @@
   [Quoted diff context omitted: identical HDF5BatchReader license header, imports, and field declarations as in the excerpt above.]

[jira] [Commented] (DRILL-7517) Drill 1.16.0 shuts down frequently

2020-01-08 Thread Vova Vysotskyi (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010869#comment-17010869
 ] 

Vova Vysotskyi commented on DRILL-7517:
---

[~nitinpawar432], also, if possible, could you please share the heap dump taken 
after the OOM is thrown?
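(For capturing it: the standard HotSpot options -XX:+HeapDumpOnOutOfMemoryError 
and -XX:HeapDumpPath=/some/dir, added to the drillbit JVM options, will write a 
dump automatically on OOM, or one can be taken on demand with 
jmap -dump:format=b,file=heap.hprof <pid>.)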

> Drill 1.16.0 shuts down frequently
> --
>
> Key: DRILL-7517
> URL: https://issues.apache.org/jira/browse/DRILL-7517
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Nitin Pawar
>Priority: Critical
>
> We see the following exception every few hours.
> Our drillbit cluster queries data from S3. The only queries we make to the 
> web interface are for explain plans, and no actual queries go via the Web UI.
> Here is the full exception:
> 2020-01-07 16:34:02,922 [qtp80683229-962] INFO 
> o.a.d.exec.server.rest.QueryWrapper - User Error Occurred: There is not 
> enough heap memory to run this query using the web interface. 
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: There is 
> not enough heap memory to run this query using the web interface. 
> Please try a query with fewer columns or with a filter or limit condition to 
> limit the data returned. 
> You can also try an ODBC/JDBC client. 
> [Error Id: 7ad61839-a2e8-4fdd-a600-e662fc0f03e0 ]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:630)
>  ~[drill-common-1.16.0.jar:1.16.0]
>  at org.apache.drill.exec.server.rest.QueryWrapper.run(QueryWrapper.java:115) 
> [drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.server.rest.QueryResources.submitQueryJSON(QueryResources.java:74)
>  [drill-java-exec-1.16.0.jar:1.16.0]
>  at sun.reflect.GeneratedMethodAccessor212.invoke(Unknown Source) ~[na:na]
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_222]
>  at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_222]
>  at 
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
>  [jersey-server-2.25.1.jar:na]
>  at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326) 
> [jersey-server-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:315) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:297) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:267) 
> [jersey-common-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
>  [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305) 
> [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473) 
> [jersey-container-servlet-core-2.25.1.jar:na]
>  at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427) 
> [jersey-container-servlet-core-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
>  [jersey-container-servlet-core-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
>  [jersey-container-servlet-core-2.25.1.jar:na]
>  at 
> 

[jira] [Commented] (DRILL-7517) Drill 1.16.0 shuts down frequently

2020-01-08 Thread Vova Vysotskyi (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010846#comment-17010846
 ] 

Vova Vysotskyi commented on DRILL-7517:
---

[~nitinpawar432], could you please also try downloading the files to a local 
machine and then selecting from them, to see whether the OOM happens in that case?

> Drill 1.16.0 shuts down frequently
> [Quoted issue description and stack trace omitted: identical to the copy above.]

[jira] [Commented] (DRILL-7517) Drill 1.16.0 shuts down frequently

2020-01-08 Thread Vova Vysotskyi (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010787#comment-17010787
 ] 

Vova Vysotskyi commented on DRILL-7517:
---

[~nitinpawar432], this available memory may be memory that simply hasn't been 
used as direct memory yet. From the log you attached, it looks like a real OOM, 
not the issue with the check I mentioned above. Could you please reduce the 
direct memory size and increase the heap size, to see how heap usage grows?

Also, it would be useful to see some examples of the queries for which memory 
usage increased.
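For reference, both limits live in conf/drill-env.sh; the values below are 
illustrative only:

  export DRILL_HEAP="8G"                # increase heap
  export DRILL_MAX_DIRECT_MEMORY="8G"   # reduce direct memory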

> Drill 1.16.0 shuts down frequently
> [Quoted issue description and stack trace omitted: identical to the copy above.]

[jira] [Comment Edited] (DRILL-7517) Drill 1.16.0 shuts down frequently

2020-01-08 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010786#comment-17010786
 ] 

Arina Ielchiieva edited comment on DRILL-7517 at 1/8/20 3:47 PM:
-

[~nitinpawar432] 
{quote}
drill now adds double quote to csv by itself and we do the same so it gets 
doubled and data gets messed up
{quote}
Apparently, your data contains spaces, so adding quotes is the correct behavior. 
I understand why you have been adding quotes yourself, since previously Drill 
did not behave correctly; now that it does, the best you can do is modify your 
app not to add quotes.

{quote}
If there is a way where we could provide quotechar and escapechar to drill, we 
could move forward to 1.17
{quote}

There are {{quote}} and {{escape}} options; you can set them in the text format 
plugin configuration (https://drill.apache.org/docs/text-files-csv-tsv-psv/).
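For example, a CSV entry in the text format configuration along these lines sets 
both options (a sketch; values are illustrative):

  "csv": {
    "type": "text",
    "extensions": ["csv"],
    "quote": "\"",
    "escape": "\""
  }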


was (Author: arina):
[~nitinpawar432] 
{quote}
drill now adds double quote to csv by itself and we do the same so it gets 
doubled and data gets messed up
{quote}
Apparently, your data contains spaces so adding quotes is the correct behavior. 
I understand why you have been adding quotes yourself since before Drill did 
not behave correctly but since now it does, the best you can do is to modify 
your app not to add quotes.

{quote}
If there is a way where we could provide quotechar and escapechar to drill, we 
could move forward to 1.17
{quote}

There is an option to provide quotechar and escapechar, you can do in text 
format plugin configuration 
(https://drill.apache.org/docs/text-files-csv-tsv-psv/): {{quote}} and 
{{escape}} options.

> Drill 1.16.0 shuts down frequently
> [Quoted issue description and stack trace omitted: identical to the copy above.]

[jira] [Commented] (DRILL-7517) Drill 1.16.0 shuts down frequently

2020-01-08 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010786#comment-17010786
 ] 

Arina Ielchiieva commented on DRILL-7517:
-

[~nitinpawar432] 
{quote}
drill now adds double quote to csv by itself and we do the same so it gets 
doubled and data gets messed up
{quote}
Apparently, your data contains spaces, so adding quotes is the correct behavior. 
I understand why you have been adding quotes yourself, since previously Drill 
did not behave correctly; now that it does, the best you can do is modify your 
app not to add quotes.

{quote}
If there is a way where we could provide quotechar and escapechar to drill, we 
could move forward to 1.17
{quote}

There are options to provide the quote and escape characters; you can set them 
in the text format plugin configuration 
(https://drill.apache.org/docs/text-files-csv-tsv-psv/) via the {{quote}} and 
{{escape}} options.

> Drill 1.16.0 shuts down frequently
> [Quoted issue description and stack trace omitted: identical to the copy above.]

[jira] [Updated] (DRILL-7518) Parquet INT64 Nullable Type Support

2020-01-08 Thread Vova Vysotskyi (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Vysotskyi updated DRILL-7518:
--
Labels: ready-to-commit  (was: )

> Parquet INT64 Nullable Type Support
> ---
>
> Key: DRILL-7518
> URL: https://issues.apache.org/jira/browse/DRILL-7518
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
> Environment: Tested on the apache/drill:1.17.0 docker image.
>Reporter: David Severski
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Querying a Parquet file with fields of type INT64 fails immediately in the 
> complex Parquet reader with the error "Unsupported nullable converted type 
> INT_64 for primitive type INT64". Attempts to work around this via explicit 
> CAST() and CONVERT_FROM() are unsuccessful. The suggestion from drill-users 
> is that an implementation needs to be added at 
> https://github.com/apache/drill/blob/9993fa3547b029db5fe33a2210fa6f07e8ac1990/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java#L303.
>  
> Possibly related: a similar INT32-typed field in the same file exhibits this 
> problem, but it can be worked around via an explicit CAST() to INT.
>  
> At this time, I do not have a sanitized parquet file to submit as a reference 
> example. :(
>  
> Reference thread on drill-users list: 
> http://mail-archives.apache.org/mod_mbox/drill-user/202001.mbox/%3ccajguoa53ldkxqsh1fsvtj+dk5421eg4aw4paim++8bferxd...@mail.gmail.com%3e



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7518) Parquet INT64 Nullable Type Support

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010778#comment-17010778
 ] 

ASF GitHub Bot commented on DRILL-7518:
---

arina-ielchiieva commented on pull request #1952: DRILL-7518: Support INT_64 
for nullable INT64 in Parquet
URL: https://github.com/apache/drill/pull/1952
 
 
   Jira - [DRILL-7518](https://issues.apache.org/jira/browse/DRILL-7518).
   
   Sadly, no file to reproduce the issue was attached. It's hard to generate 
such a file using Java code since such converted types are deprecated. 
Apparently, `pyspark` generates such files, but I did not try to generate one 
with it.
   
   Anyway, I checked all supported converted types for INT32 and INT64 against 
the documentation 
(https://github.com/apache/parquet-format/blob/master/LogicalTypes.md) and made 
sure all of them are supported by both the non-nullable and the nullable 
readers. Only this one was missing.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Parquet INT64 Nullable Type Support
> [Quoted issue description omitted: identical to the copy above.]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010775#comment-17010775
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

arina-ielchiieva commented on pull request #1944: DRILL-7503: Refactor the 
project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364290284
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectionMaterializer.java
 ##
 @@ -0,0 +1,625 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.project;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+
+import org.apache.commons.collections.map.CaseInsensitiveMap;
+import org.apache.drill.common.expression.ConvertExpression;
+import org.apache.drill.common.expression.ErrorCollector;
+import org.apache.drill.common.expression.ErrorCollectorImpl;
+import org.apache.drill.common.expression.ExpressionPosition;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.FunctionCall;
+import org.apache.drill.common.expression.FunctionCallFactory;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.expression.ValueExpressions;
+import org.apache.drill.common.expression.PathSegment.NameSegment;
+import org.apache.drill.common.expression.fn.FunctionReplacementUtils;
+import org.apache.drill.common.logical.data.NamedExpression;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.exception.ClassTransformationException;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.ClassGenerator;
+import org.apache.drill.exec.expr.CodeGenerator;
+import org.apache.drill.exec.expr.DrillFuncHolderExpr;
+import org.apache.drill.exec.expr.ExpressionTreeMaterializer;
+import org.apache.drill.exec.expr.ValueVectorReadExpression;
+import org.apache.drill.exec.expr.ValueVectorWriteExpression;
+import org.apache.drill.exec.expr.fn.FunctionLookupContext;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.planner.StarColumnHelper;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.carrotsearch.hppc.IntHashSet;
+
+/**
+ * Plans the projection given the incoming and requested outgoing schemas. 
Works
+ * with the {@link VectorState} to create required vectors, writers and so on.
+ * Populates the code generator with the "projector" expressions.
+ */
+class ProjectionMaterializer {
+  private static final Logger logger = 
LoggerFactory.getLogger(ProjectionMaterializer.class);
+  private static final String EMPTY_STRING = "";
+
+  /**
+   * Abstracts the physical vector setup operations to separate
+   * the physical setup, in ProjectRecordBatch, from the
+   * logical setup in the materializer class.
+   */
+  public interface BatchBuilder {
+void addTransferField(String name, ValueVector vvIn);
+ValueVectorWriteExpression addOutputVector(String name, LogicalExpression 
expr);
+int addDirectTransfer(FieldReference ref, ValueVectorReadExpression 
vectorRead);
+void addComplexField(FieldReference ref);
+ValueVectorWriteExpression addEvalVector(String outputName,
+LogicalExpression expr);
+  }
+
+  private static class ClassifierResult {
+private boolean isStar;
+private List<String> outputNames;
+private String prefix = "";
+
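
Since the review centers on this new seam, one illustration may help: the interface lets the materializer's logical planning be exercised by a stand-in that does no vector work at all. A minimal sketch, assuming a stub in the same package (the RecordingBatchBuilder name is invented for illustration, and the imports match those shown in the snippet above):

```java
// Hypothetical recording stub: captures what the materializer asks for
// without creating any vectors, illustrating how BatchBuilder separates
// logical projection planning from physical vector setup.
class RecordingBatchBuilder implements ProjectionMaterializer.BatchBuilder {
  final List<String> requestedFields = new ArrayList<>();

  @Override
  public void addTransferField(String name, ValueVector vvIn) {
    requestedFields.add(name);  // record only; no transfer pair is created
  }

  @Override
  public ValueVectorWriteExpression addOutputVector(String name, LogicalExpression expr) {
    requestedFields.add(name);
    // dummy field id; a real builder resolves it from the output container
    return new ValueVectorWriteExpression(null, expr, true);
  }

  @Override
  public int addDirectTransfer(FieldReference ref, ValueVectorReadExpression vectorRead) {
    requestedFields.add(ref.getAsNamePart().getName());
    return vectorRead.getFieldId().getFieldIds()[0];
  }

  @Override
  public void addComplexField(FieldReference ref) {
    requestedFields.add(ref.getAsNamePart().getName());
  }

  @Override
  public ValueVectorWriteExpression addEvalVector(String outputName, LogicalExpression expr) {
    requestedFields.add(outputName);
    return new ValueVectorWriteExpression(null, expr, true);
  }
}
```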

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010774#comment-17010774
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

arina-ielchiieva commented on pull request #1944: DRILL-7503: Refactor the 
project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364289943
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectionMaterializer.java
 ##
 @@ -0,0 +1,625 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.project;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+
+import org.apache.commons.collections.map.CaseInsensitiveMap;
+import org.apache.drill.common.expression.ConvertExpression;
+import org.apache.drill.common.expression.ErrorCollector;
+import org.apache.drill.common.expression.ErrorCollectorImpl;
+import org.apache.drill.common.expression.ExpressionPosition;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.FunctionCall;
+import org.apache.drill.common.expression.FunctionCallFactory;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.expression.ValueExpressions;
+import org.apache.drill.common.expression.PathSegment.NameSegment;
+import org.apache.drill.common.expression.fn.FunctionReplacementUtils;
+import org.apache.drill.common.logical.data.NamedExpression;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.exception.ClassTransformationException;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.ClassGenerator;
+import org.apache.drill.exec.expr.CodeGenerator;
+import org.apache.drill.exec.expr.DrillFuncHolderExpr;
+import org.apache.drill.exec.expr.ExpressionTreeMaterializer;
+import org.apache.drill.exec.expr.ValueVectorReadExpression;
+import org.apache.drill.exec.expr.ValueVectorWriteExpression;
+import org.apache.drill.exec.expr.fn.FunctionLookupContext;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.planner.StarColumnHelper;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.carrotsearch.hppc.IntHashSet;
+
+/**
+ * Plans the projection given the incoming and requested outgoing schemas. 
Works
+ * with the {@link VectorState} to create required vectors, writers and so on.
+ * Populates the code generator with the "projector" expressions.
+ */
+class ProjectionMaterializer {
+  private static final Logger logger = 
LoggerFactory.getLogger(ProjectionMaterializer.class);
+  private static final String EMPTY_STRING = "";
+
+  /**
+   * Abstracts the physical vector setup operations to separate
+   * the physical setup, in ProjectRecordBatch, from the
+   * logical setup in the materializer class.
+   */
+  public interface BatchBuilder {
+void addTransferField(String name, ValueVector vvIn);
+ValueVectorWriteExpression addOutputVector(String name, LogicalExpression 
expr);
+int addDirectTransfer(FieldReference ref, ValueVectorReadExpression 
vectorRead);
+void addComplexField(FieldReference ref);
+ValueVectorWriteExpression addEvalVector(String outputName,
+LogicalExpression expr);
+  }
+
+  private static class ClassifierResult {
+private boolean isStar;
+private List<String> outputNames;
+private String prefix = "";
+

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010769#comment-17010769
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

arina-ielchiieva commented on pull request #1944: DRILL-7503: Refactor the 
project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364288964
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/TypedFieldId.java
 ##
 @@ -241,7 +241,7 @@ public TypedFieldId build() {
 secondaryFinal = finalType;
   }
 
-  MajorType actualFinalType = finalType;
+  //MajorType actualFinalType = finalType;
 
 Review comment:
   Should we keep this commented code?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7517) Drill 1.16.0 shuts down frequently

2020-01-08 Thread Nitin Pawar (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010770#comment-17010770
 ] 

Nitin Pawar commented on DRILL-7517:


[~volodymyr], we did look at the free memory available on the machine; it had more 
than 30 GB free when the OOM occurred.

> Drill 1.16.0 shuts down frequently
> --
>
> Key: DRILL-7517
> URL: https://issues.apache.org/jira/browse/DRILL-7517
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Nitin Pawar
>Priority: Critical
>
> We see the following exception every few hours.
> Our drillbit cluster queries data from S3. The only queries we issue through the 
> web interface are for explain plans; no actual query goes through the Web UI. 
> Here is the full exception:
> 2020-01-07 16:34:02,922 [qtp80683229-962] INFO 
> o.a.d.exec.server.rest.QueryWrapper - User Error Occurred: There is not 
> enough heap memory to run this query using the web interface. 
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: There is 
> not enough heap memory to run this query using the web interface. 
> Please try a query with fewer columns or with a filter or limit condition to 
> limit the data returned. 
> You can also try an ODBC/JDBC client. 
> [Error Id: 7ad61839-a2e8-4fdd-a600-e662fc0f03e0 ]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:630)
>  ~[drill-common-1.16.0.jar:1.16.0]
>  at org.apache.drill.exec.server.rest.QueryWrapper.run(QueryWrapper.java:115) 
> [drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.server.rest.QueryResources.submitQueryJSON(QueryResources.java:74)
>  [drill-java-exec-1.16.0.jar:1.16.0]
>  at sun.reflect.GeneratedMethodAccessor212.invoke(Unknown Source) ~[na:na]
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_222]
>  at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_222]
>  at 
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
>  [jersey-server-2.25.1.jar:na]
>  at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326) 
> [jersey-server-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:315) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:297) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:267) 
> [jersey-common-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
>  [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305) 
> [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473) 
> [jersey-container-servlet-core-2.25.1.jar:na]
>  at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427) 
> [jersey-container-servlet-core-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
>  [jersey-container-servlet-core-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
>  [jersey-container-servlet-core-2.25.1.jar:na]
>  at 
> 

[jira] [Commented] (DRILL-7517) Drill 1.16.0 shuts down frequently

2020-01-08 Thread Nitin Pawar (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010768#comment-17010768
 ] 

Nitin Pawar commented on DRILL-7517:


[~volodymyr], we do not have an option to upgrade to 1.17, as it includes some CSV 
fixes that break our current code (e.g., Drill now adds double quotes to CSV output 
by itself; we do the same, so the quotes get doubled and the data gets mangled).

If there were a way to provide the quote and escape characters to Drill, we could 
move forward to 1.17.
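
For what it's worth, Drill's delimited-text format config does expose quote and escape characters. A hedged sketch in the test-fixture style used later in this digest; the field names are assumed from the 1.x TextFormatPlugin.TextFormatConfig and should be verified against the Drill version in use:

```java
import java.util.Collections;

import org.apache.drill.exec.store.easy.text.TextFormatPlugin;

// Define a custom CSV format with explicit quote and escape characters.
TextFormatPlugin.TextFormatConfig csv = new TextFormatPlugin.TextFormatConfig();
csv.extensions = Collections.singletonList("csv");
csv.fieldDelimiter = ',';
csv.quote = '"';    // character used to quote fields
csv.escape = '"';   // character used to escape embedded quotes
cluster.defineFormat("dfs", "csvCustom", csv);  // as in the HDF5 tests below
```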

> Drill 1.16.0 shuts down frequently
> --
>
> Key: DRILL-7517
> URL: https://issues.apache.org/jira/browse/DRILL-7517
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Nitin Pawar
>Priority: Critical
>
> We see the following exception every few hours.
> Our drillbit cluster queries data from S3. The only queries we issue through the 
> web interface are for explain plans; no actual query goes through the Web UI. 
> Here is the full exception:
> 2020-01-07 16:34:02,922 [qtp80683229-962] INFO 
> o.a.d.exec.server.rest.QueryWrapper - User Error Occurred: There is not 
> enough heap memory to run this query using the web interface. 
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: There is 
> not enough heap memory to run this query using the web interface. 
> Please try a query with fewer columns or with a filter or limit condition to 
> limit the data returned. 
> You can also try an ODBC/JDBC client. 
> [Error Id: 7ad61839-a2e8-4fdd-a600-e662fc0f03e0 ]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:630)
>  ~[drill-common-1.16.0.jar:1.16.0]
>  at org.apache.drill.exec.server.rest.QueryWrapper.run(QueryWrapper.java:115) 
> [drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.server.rest.QueryResources.submitQueryJSON(QueryResources.java:74)
>  [drill-java-exec-1.16.0.jar:1.16.0]
>  at sun.reflect.GeneratedMethodAccessor212.invoke(Unknown Source) ~[na:na]
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_222]
>  at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_222]
>  at 
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
>  [jersey-server-2.25.1.jar:na]
>  at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326) 
> [jersey-server-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:315) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:297) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:267) 
> [jersey-common-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
>  [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305) 
> [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473) 
> [jersey-container-servlet-core-2.25.1.jar:na]
>  at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427) 
> [jersey-container-servlet-core-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
>  

[jira] [Commented] (DRILL-7517) Drill 1.16.0 shuts down frequently

2020-01-08 Thread Nitin Pawar (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010766#comment-17010766
 ] 

Nitin Pawar commented on DRILL-7517:


Here is the OOM log

 

2020-01-07 09:30:37,254 [21ebb195-58af-7ac6-e1d1-a7cebb1de4c8:foreman] ERROR 
o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, exiting. 
Information message: Unable to handle out of memory condition in Foreman.
java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:3332) ~[na:1.8.0_222]
 at 
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
 ~[na:1.8.0_222]
 at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448) 
~[na:1.8.0_222]
 at java.lang.StringBuilder.append(StringBuilder.java:136) ~[na:1.8.0_222]
 at 
org.apache.drill.common.expression.ExpressionStringBuilder.visitUnknown(ExpressionStringBuilder.java:351)
 ~[drill-logical-1.16.0.jar:1.16.0]
 at 
org.apache.drill.common.expression.ExpressionStringBuilder.visitUnknown(ExpressionStringBuilder.java:45)
 ~[drill-logical-1.16.0.jar:1.16.0]
 at 
org.apache.drill.common.expression.visitors.AbstractExprVisitor.visitTypedFieldExpr(AbstractExprVisitor.java:195)
 ~[drill-logical-1.16.0.jar:1.16.0]
 at 
org.apache.drill.common.expression.TypedFieldExpr.accept(TypedFieldExpr.java:38)
 ~[drill-logical-1.16.0.jar:1.16.0]
 at 
org.apache.drill.common.expression.ExpressionStringBuilder.visitFunctionHolderExpression(ExpressionStringBuilder.java:97)
 ~[drill-logical-1.16.0.jar:1.16.0]
 at 
org.apache.drill.common.expression.ExpressionStringBuilder.visitFunctionHolderExpression(ExpressionStringBuilder.java:45)
 ~[drill-logical-1.16.0.jar:1.16.0]
 at 
org.apache.drill.common.expression.FunctionHolderExpression.accept(FunctionHolderExpression.java:53)
 ~[drill-logical-1.16.0.jar:1.16.0]
 at 
org.apache.drill.common.expression.ExpressionStringBuilder.visitFunctionCall(ExpressionStringBuilder.java:77)
 ~[drill-logical-1.16.0.jar:1.16.0]
 at 
org.apache.drill.common.expression.ExpressionStringBuilder.visitBooleanOperator(ExpressionStringBuilder.java:85)
 ~[drill-logical-1.16.0.jar:1.16.0]
 at 
org.apache.drill.common.expression.ExpressionStringBuilder.visitBooleanOperator(ExpressionStringBuilder.java:45)
 ~[drill-logical-1.16.0.jar:1.16.0]
 at 
org.apache.drill.common.expression.BooleanOperator.accept(BooleanOperator.java:35)
 ~[drill-logical-1.16.0.jar:1.16.0]
 at 
org.apache.drill.common.expression.ExpressionStringBuilder.visitFunctionCall(ExpressionStringBuilder.java:77)
 ~[drill-logical-1.16.0.jar:1.16.0]
 at 
org.apache.drill.common.expression.ExpressionStringBuilder.visitBooleanOperator(ExpressionStringBuilder.java:85)
 ~[drill-logical-1.16.0.jar:1.16.0]
 at 
org.apache.drill.common.expression.ExpressionStringBuilder.visitBooleanOperator(ExpressionStringBuilder.java:45)
 ~[drill-logical-1.16.0.jar:1.16.0]
 at 
org.apache.drill.common.expression.BooleanOperator.accept(BooleanOperator.java:35)
 ~[drill-logical-1.16.0.jar:1.16.0]
 at 
org.apache.drill.common.expression.ExpressionStringBuilder.toString(ExpressionStringBuilder.java:52)
 ~[drill-logical-1.16.0.jar:1.16.0]
 at 
org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.getFilterPredicate(AbstractGroupScanWithMetadata.java:314)
 ~[drill-java-exec-1.16.0.jar:1.16.0]
 at 
org.apache.drill.exec.store.parquet.AbstractParquetGroupScan.applyFilter(AbstractParquetGroupScan.java:216)
 ~[drill-java-exec-1.16.0.jar:1.16.0]
 at 
org.apache.drill.exec.store.parquet.ParquetPushDownFilter.doOnMatch(ParquetPushDownFilter.java:175)
 ~[drill-java-exec-1.16.0.jar:1.16.0]
 at 
org.apache.drill.exec.store.parquet.ParquetPushDownFilter$2.onMatch(ParquetPushDownFilter.java:103)
 ~[drill-java-exec-1.16.0.jar:1.16.0]
 at 
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:319)
 ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
 at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:561) 
~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
 at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:420) 
~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
 at 
org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:257) 
~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
 at 
org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
 ~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
 at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:216) 
~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
 at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:203) 
~[calcite-core-1.18.0-drill-r0.jar:1.18.0-drill-r0]
 at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:419)
 ~[drill-java-exec-1.16.0.jar:1.16.0]

> Drill 1.16.0 shuts down frequently
> --
>
>   

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010765#comment-17010765
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

arina-ielchiieva commented on pull request #1778: DRILL-7233: Format Plugin for 
HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364287412
 
 

 ##
 File path: 
contrib/format-hdf5/src/test/java/org/apache/drill/exec/store/hdf5/TestHDF5Format.java
 ##
 @@ -0,0 +1,907 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.ExecTest;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.rpc.RpcException;
+import org.apache.drill.exec.store.dfs.ZipCodec;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetBuilder;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.hadoop.io.compress.CompressionCodec;
+import org.apache.hadoop.io.compress.CompressionCodecFactory;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+
+@Category(RowSetTests.class)
+public class TestHDF5Format extends ClusterTest {
+
+  @BeforeClass
+  public static void setup() throws Exception {
+ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher));
+
+HDF5FormatConfig formatConfig = new HDF5FormatConfig();
+cluster.defineFormat("dfs", "hdf5", formatConfig);
+cluster.defineFormat("cp", "hdf5", formatConfig);
+dirTestWatcher.copyResourceToRoot(Paths.get("hdf5/"));
+  }
+
+  @Test
+  public void testExplicitQuery() throws RpcException {
+String sql = "SELECT path, data_type, file_name FROM cp.`hdf5/dset.h5`";
+RowSet results = client.queryBuilder().sql(sql).rowSet();
+TupleMetadata expectedSchema = new SchemaBuilder()
+  .add("path", TypeProtos.MinorType.VARCHAR, TypeProtos.DataMode.OPTIONAL)
+  .add("data_type", TypeProtos.MinorType.VARCHAR, 
TypeProtos.DataMode.OPTIONAL)
+  .add("file_name", TypeProtos.MinorType.VARCHAR, 
TypeProtos.DataMode.OPTIONAL)
+  .buildSchema();
+
+RowSet expected = new RowSetBuilder(client.allocator(), expectedSchema)
+  .addRow("/dset", "DATASET", "dset.h5")
+  .build();
+new RowSetComparison(expected).unorderedVerifyAndClearAll(results);
+  }
+
+  @Test
+  public void testStarQuery() throws Exception {
+List<Integer> t1 = Arrays.asList(1, 2, 3, 4, 5, 6);
+List<Integer> t2 = Arrays.asList(7, 8, 9, 10, 11, 12);
+List<Integer> t3 = Arrays.asList(13, 14, 15, 16, 17, 18);
+List<Integer> t4 = Arrays.asList(19, 20, 21, 22, 23, 24);
+List<List<Integer>> finalList = new ArrayList<>();
+finalList.add(t1);
+finalList.add(t2);
+finalList.add(t3);
+finalList.add(t4);
+
+testBuilder()
+  .sqlQuery("SELECT * FROM cp.`hdf5/dset.h5`")
+  .unOrdered()
+  .baselineColumns("path", "data_type", "file_name", "int_data")
+  .baselineValues("/dset", "DATASET", "dset.h5", finalList)
+  .go();
+  }
+
+  @Test
+  public void testSimpleExplicitQuery() throws Exception {
+List<Integer> t1 = Arrays.asList(1, 2, 3, 4, 5, 6);
+List<Integer> t2 = Arrays.asList(7, 8, 9, 10, 11, 12);
+List<Integer> t3 = Arrays.asList(13, 14, 15, 16, 17, 18);
+List<Integer> t4 = Arrays.asList(19, 20, 21, 22, 23, 24);
+List<List<Integer>> finalList = new ArrayList<>();
+

[jira] [Commented] (DRILL-7233) Format Plugin for HDF5

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010760#comment-17010760
 ] 

ASF GitHub Bot commented on DRILL-7233:
---

cgivre commented on pull request #1778: DRILL-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r364280559
 
 

 ##
 File path: 
contrib/format-hdf5/src/test/java/org/apache/drill/exec/store/hdf5/TestHDF5Format.java
 ##
 @@ -0,0 +1,907 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.ExecTest;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.rpc.RpcException;
+import org.apache.drill.exec.store.dfs.ZipCodec;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetBuilder;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.hadoop.io.compress.CompressionCodec;
+import org.apache.hadoop.io.compress.CompressionCodecFactory;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+
+@Category(RowSetTests.class)
+public class TestHDF5Format extends ClusterTest {
+
+  @BeforeClass
+  public static void setup() throws Exception {
+ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher));
+
+HDF5FormatConfig formatConfig = new HDF5FormatConfig();
+cluster.defineFormat("dfs", "hdf5", formatConfig);
+cluster.defineFormat("cp", "hdf5", formatConfig);
+dirTestWatcher.copyResourceToRoot(Paths.get("hdf5/"));
+  }
+
+  @Test
+  public void testExplicitQuery() throws RpcException {
+String sql = "SELECT path, data_type, file_name FROM cp.`hdf5/dset.h5`";
+RowSet results = client.queryBuilder().sql(sql).rowSet();
+TupleMetadata expectedSchema = new SchemaBuilder()
+  .add("path", TypeProtos.MinorType.VARCHAR, TypeProtos.DataMode.OPTIONAL)
+  .add("data_type", TypeProtos.MinorType.VARCHAR, 
TypeProtos.DataMode.OPTIONAL)
+  .add("file_name", TypeProtos.MinorType.VARCHAR, 
TypeProtos.DataMode.OPTIONAL)
+  .buildSchema();
+
+RowSet expected = new RowSetBuilder(client.allocator(), expectedSchema)
+  .addRow("/dset", "DATASET", "dset.h5")
+  .build();
+new RowSetComparison(expected).unorderedVerifyAndClearAll(results);
+  }
+
+  @Test
+  public void testStarQuery() throws Exception {
+List<Integer> t1 = Arrays.asList(1, 2, 3, 4, 5, 6);
+List<Integer> t2 = Arrays.asList(7, 8, 9, 10, 11, 12);
+List<Integer> t3 = Arrays.asList(13, 14, 15, 16, 17, 18);
+List<Integer> t4 = Arrays.asList(19, 20, 21, 22, 23, 24);
+List<List<Integer>> finalList = new ArrayList<>();
+finalList.add(t1);
+finalList.add(t2);
+finalList.add(t3);
+finalList.add(t4);
+
+testBuilder()
+  .sqlQuery("SELECT * FROM cp.`hdf5/dset.h5`")
+  .unOrdered()
+  .baselineColumns("path", "data_type", "file_name", "int_data")
+  .baselineValues("/dset", "DATASET", "dset.h5", finalList)
+  .go();
+  }
+
+  @Test
+  public void testSimpleExplicitQuery() throws Exception {
+List<Integer> t1 = Arrays.asList(1, 2, 3, 4, 5, 6);
+List<Integer> t2 = Arrays.asList(7, 8, 9, 10, 11, 12);
+List<Integer> t3 = Arrays.asList(13, 14, 15, 16, 17, 18);
+List<Integer> t4 = Arrays.asList(19, 20, 21, 22, 23, 24);
+List<List<Integer>> finalList = new ArrayList<>();
+finalList.add(t1);

[jira] [Comment Edited] (DRILL-7517) Drill 1.16.0 shuts down frequently

2020-01-08 Thread Vova Vysotskyi (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010738#comment-17010738
 ] 

Vova Vysotskyi edited comment on DRILL-7517 at 1/8/20 3:01 PM:
---

I think this issue may be connected with the fix for DRILL-6477, which added a 
check for heap memory usage even though the actual memory usage stayed the same.
[~nitinpawar432], could you please check with a profiling tool such as VisualVM 
whether memory usage really increased?

Also, Drill 1.17 added a session option, 
{{drill.exec.http.memory.heap.failure.threshold}}, to configure the heap threshold 
or disable this check. Setting the value to 0 disables the check.


was (Author: vvysotskyi):
I think this issue may be connected with the fix for DRILL-6477, which added a 
check for heap memory usage even though the actual memory usage stayed the same.
[~nitinpawar432], could you please check with a profiling tool such as VisualVM 
whether memory usage really increased?

Also, Drill 1.17 added a session option to configure the heap threshold or 
disable this check.
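
For reference, a minimal sketch of disabling the check, assuming the option is indeed settable per session as described above (Drill 1.17+); the queryBuilder() call mirrors the ClusterTest fixtures used elsewhere in this digest:

```java
// Disable the Web-UI heap check for the current session (0 = disabled).
String sql = "ALTER SESSION SET `drill.exec.http.memory.heap.failure.threshold` = 0";
client.queryBuilder().sql(sql).run();
```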

> Drill 1.16.0 shuts down frequently
> --
>
> Key: DRILL-7517
> URL: https://issues.apache.org/jira/browse/DRILL-7517
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Nitin Pawar
>Priority: Critical
>
> We see the following exception every few hours.
> Our drillbit cluster queries data from S3. The only queries we issue through the 
> web interface are for explain plans; no actual query goes through the Web UI. 
> Here is the full exception:
> 2020-01-07 16:34:02,922 [qtp80683229-962] INFO 
> o.a.d.exec.server.rest.QueryWrapper - User Error Occurred: There is not 
> enough heap memory to run this query using the web interface. 
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: There is 
> not enough heap memory to run this query using the web interface. 
> Please try a query with fewer columns or with a filter or limit condition to 
> limit the data returned. 
> You can also try an ODBC/JDBC client. 
> [Error Id: 7ad61839-a2e8-4fdd-a600-e662fc0f03e0 ]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:630)
>  ~[drill-common-1.16.0.jar:1.16.0]
>  at org.apache.drill.exec.server.rest.QueryWrapper.run(QueryWrapper.java:115) 
> [drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.server.rest.QueryResources.submitQueryJSON(QueryResources.java:74)
>  [drill-java-exec-1.16.0.jar:1.16.0]
>  at sun.reflect.GeneratedMethodAccessor212.invoke(Unknown Source) ~[na:na]
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_222]
>  at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_222]
>  at 
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
>  [jersey-server-2.25.1.jar:na]
>  at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326) 
> [jersey-server-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:315) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:297) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:267) 
> [jersey-common-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
>  [jersey-common-2.25.1.jar:na]
>  at 

[jira] [Commented] (DRILL-7517) Drill 1.16.0 shuts down frequently

2020-01-08 Thread Vova Vysotskyi (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010738#comment-17010738
 ] 

Vova Vysotskyi commented on DRILL-7517:
---

I think this issue may be connected with the fix for DRILL-6477, which added a 
check for heap memory usage even though the actual memory usage stayed the same.
[~nitinpawar432], could you please check with a profiling tool such as VisualVM 
whether memory usage really increased?

Also, Drill 1.17 added a session option to configure the heap threshold or 
disable this check.

> Drill 1.16.0 shuts down frequently
> --
>
> Key: DRILL-7517
> URL: https://issues.apache.org/jira/browse/DRILL-7517
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Nitin Pawar
>Priority: Critical
>
> We see the following exception every few hours.
> Our drillbit cluster queries data from S3. The only queries we issue through the 
> web interface are for explain plans; no actual query goes through the Web UI. 
> Here is the full exception:
> 2020-01-07 16:34:02,922 [qtp80683229-962] INFO 
> o.a.d.exec.server.rest.QueryWrapper - User Error Occurred: There is not 
> enough heap memory to run this query using the web interface. 
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: There is 
> not enough heap memory to run this query using the web interface. 
> Please try a query with fewer columns or with a filter or limit condition to 
> limit the data returned. 
> You can also try an ODBC/JDBC client. 
> [Error Id: 7ad61839-a2e8-4fdd-a600-e662fc0f03e0 ]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:630)
>  ~[drill-common-1.16.0.jar:1.16.0]
>  at org.apache.drill.exec.server.rest.QueryWrapper.run(QueryWrapper.java:115) 
> [drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.server.rest.QueryResources.submitQueryJSON(QueryResources.java:74)
>  [drill-java-exec-1.16.0.jar:1.16.0]
>  at sun.reflect.GeneratedMethodAccessor212.invoke(Unknown Source) ~[na:na]
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_222]
>  at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_222]
>  at 
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
>  [jersey-server-2.25.1.jar:na]
>  at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326) 
> [jersey-server-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:315) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:297) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:267) 
> [jersey-common-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
>  [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305) 
> [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473) 
> [jersey-container-servlet-core-2.25.1.jar:na]
>  at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427) 
> [jersey-container-servlet-core-2.25.1.jar:na]
>  at 
> 

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010735#comment-17010735
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364234843
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectionMaterializer.java
 ##
 @@ -0,0 +1,625 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.project;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+
+import org.apache.commons.collections.map.CaseInsensitiveMap;
+import org.apache.drill.common.expression.ConvertExpression;
+import org.apache.drill.common.expression.ErrorCollector;
+import org.apache.drill.common.expression.ErrorCollectorImpl;
+import org.apache.drill.common.expression.ExpressionPosition;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.FunctionCall;
+import org.apache.drill.common.expression.FunctionCallFactory;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.expression.ValueExpressions;
+import org.apache.drill.common.expression.PathSegment.NameSegment;
+import org.apache.drill.common.expression.fn.FunctionReplacementUtils;
+import org.apache.drill.common.logical.data.NamedExpression;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.exception.ClassTransformationException;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.ClassGenerator;
+import org.apache.drill.exec.expr.CodeGenerator;
+import org.apache.drill.exec.expr.DrillFuncHolderExpr;
+import org.apache.drill.exec.expr.ExpressionTreeMaterializer;
+import org.apache.drill.exec.expr.ValueVectorReadExpression;
+import org.apache.drill.exec.expr.ValueVectorWriteExpression;
+import org.apache.drill.exec.expr.fn.FunctionLookupContext;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.planner.StarColumnHelper;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.carrotsearch.hppc.IntHashSet;
+
+/**
+ * Plans the projection given the incoming and requested outgoing schemas. 
Works
+ * with the {@link VectorState} to create required vectors, writers and so on.
 
 Review comment:
   please update the comment since ```VectorState``` is gone.
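  A possible rewording (sketch only), swapping the stale VectorState reference for the BatchBuilder abstraction introduced by the patch:
  
  ```java
  /**
   * Plans the projection given the incoming and requested outgoing schemas.
   * Works with the {@link BatchBuilder} to create required vectors, writers,
   * and so on. Populates the code generator with the "projector" expressions.
   */
  ```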
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" 

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010733#comment-17010733
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364218502
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectBatchBuilder.java
 ##
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.expr.ValueVectorReadExpression;
+import org.apache.drill.exec.expr.ValueVectorWriteExpression;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.record.TypedFieldId;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.vector.FixedWidthVector;
+import org.apache.drill.exec.vector.SchemaChangeCallBack;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+
+public class ProjectBatchBuilder implements 
ProjectionMaterializer.BatchBuilder {
+  private final ProjectRecordBatch projectBatch;
+  private final VectorContainer container;
+  private final SchemaChangeCallBack callBack;
+  private final RecordBatch incomingBatch;
+  final List<TransferPair> transfers = new ArrayList<>();
 
 Review comment:
  please also make this private, add a getter, and move the ```= new ArrayList<>()``` 
initialization into the constructor.
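
  For concreteness, a sketch of the suggested shape (not the committed code); constructor parameters are kept from the patch, with the other field assignments elided for brevity:

  ```java
  public class ProjectBatchBuilder implements ProjectionMaterializer.BatchBuilder {
    private final List<TransferPair> transfers;

    public ProjectBatchBuilder(ProjectRecordBatch projectBatch, VectorContainer container,
        SchemaChangeCallBack callBack, RecordBatch incomingBatch) {
      // ... assign the other fields as in the patch ...
      this.transfers = new ArrayList<>();  // initialized here, not at the declaration
    }

    public List<TransferPair> getTransfers() {
      return transfers;
    }
  }
  ```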
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010734#comment-17010734
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364223072
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectBatchBuilder.java
 ##
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.expr.ValueVectorReadExpression;
+import org.apache.drill.exec.expr.ValueVectorWriteExpression;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.record.TypedFieldId;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.vector.FixedWidthVector;
+import org.apache.drill.exec.vector.SchemaChangeCallBack;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+
+public class ProjectBatchBuilder implements 
ProjectionMaterializer.BatchBuilder {
+  private final ProjectRecordBatch projectBatch;
+  private final VectorContainer container;
+  private final SchemaChangeCallBack callBack;
+  private final RecordBatch incomingBatch;
+  final List<TransferPair> transfers = new ArrayList<>();
+
+  public ProjectBatchBuilder(ProjectRecordBatch projectBatch, VectorContainer 
container,
+  SchemaChangeCallBack callBack, RecordBatch incomingBatch) {
+this.projectBatch = projectBatch;
+this.container = container;
+this.callBack = callBack;
+this.incomingBatch = incomingBatch;
+  }
+
+  @Override
+  public void addTransferField(String name, ValueVector vvIn) {
+FieldReference ref = new FieldReference(name);
+ValueVector vvOut = 
container.addOrGet(MaterializedField.create(ref.getAsNamePart().getName(),
+  vvIn.getField().getType()), callBack);
+projectBatch.memoryManager.addTransferField(vvIn, 
vvIn.getField().getName(), vvOut.getField().getName());
+transfers.add(vvIn.makeTransferPair(vvOut));
+  }
+
+  @Override
+  public int addDirectTransfer(FieldReference ref, ValueVectorReadExpression 
vectorRead) {
+TypedFieldId id = vectorRead.getFieldId();
+ValueVector vvIn = 
incomingBatch.getValueAccessorById(id.getIntermediateClass(), 
id.getFieldIds()).getValueVector();
+Preconditions.checkNotNull(incomingBatch);
+
+ValueVector vvOut =
+
container.addOrGet(MaterializedField.create(ref.getLastSegment().getNameSegment().getPath(),
+vectorRead.getMajorType()), callBack);
+TransferPair tp = vvIn.makeTransferPair(vvOut);
+projectBatch.memoryManager.addTransferField(vvIn, TypedFieldId.getPath(id, 
incomingBatch), vvOut.getField().getName());
+transfers.add(tp);
+return vectorRead.getFieldId().getFieldIds()[0];
+  }
+
+  @Override
+  public ValueVectorWriteExpression addOutputVector(String name, 
LogicalExpression expr) {
+MaterializedField outputField = MaterializedField.create(name, 
expr.getMajorType());
+ValueVector vv = container.addOrGet(outputField, callBack);
+projectBatch.allocationVectors.add(vv);
+TypedFieldId fid = 
container.getValueVectorId(SchemaPath.getSimplePath(outputField.getName()));
+ValueVectorWriteExpression write = new ValueVectorWriteExpression(fid, 
expr, true);
+projectBatch.memoryManager.addNewField(vv, write);
+return write;
+  }
+
+  @Override
+  public void addComplexField(FieldReference ref) {
+initComplexWriters();
+if (projectBatch.complexFieldReferencesList == null) {
+  projectBatch.complexFieldReferencesList = 

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010732#comment-17010732
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364210835
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/OutputWidthVisitor.java
 ##
 @@ -43,238 +43,213 @@
 
 import java.util.ArrayList;
 
-public class OutputWidthVisitor extends 
AbstractExecExprVisitor<OutputWidthExpression, OutputWidthVisitorState, RuntimeException> {
-
-@Override
-public OutputWidthExpression visitVarDecimalConstant(VarDecimalExpression 
varDecimalExpression,
- 
OutputWidthVisitorState state) throws RuntimeException {
-
Preconditions.checkArgument(varDecimalExpression.getMajorType().hasPrecision());
-return new 
FixedLenExpr(varDecimalExpression.getMajorType().getPrecision());
-}
-
-
-/**
- *
- * Records the {@link IfExpression} as a {@link IfElseWidthExpr}. 
IfElseWidthExpr will be reduced to
- * a {@link FixedLenExpr} by taking the max of the if-expr-width and the 
else-expr-width.
- *
- * @param ifExpression
- * @param state
- * @return IfElseWidthExpr
- * @throws RuntimeException
- */
-@Override
-public OutputWidthExpression visitIfExpression(IfExpression ifExpression, 
OutputWidthVisitorState state)
-throws 
RuntimeException {
-IfExpression.IfCondition condition = ifExpression.ifCondition;
-LogicalExpression ifExpr = condition.expression;
-LogicalExpression elseExpr = ifExpression.elseExpression;
-
-OutputWidthExpression ifWidthExpr = ifExpr.accept(this, state);
-OutputWidthExpression elseWidthExpr = null;
-if (elseExpr != null) {
-elseWidthExpr = elseExpr.accept(this, state);
-}
-return new IfElseWidthExpr(ifWidthExpr, elseWidthExpr);
+public class OutputWidthVisitor
+extends AbstractExecExprVisitor {
+
+  @Override
+  public OutputWidthExpression visitVarDecimalConstant(VarDecimalExpression 
varDecimalExpression,
+   OutputWidthVisitorState 
state) throws RuntimeException {
+
Preconditions.checkArgument(varDecimalExpression.getMajorType().hasPrecision());
+return new 
FixedLenExpr(varDecimalExpression.getMajorType().getPrecision());
+  }
+
+  /**
+   *
 
 Review comment:
   ```suggestion
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
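
The Javadoc quoted in the diff above says an IfElseWidthExpr is later
reduced to a FixedLenExpr by taking the max of the branch widths. As a
hedged illustration of that reduction (the names below are ours, not the
actual implementation):

{code:java}
// Once both branches resolve to fixed widths, the if/else expression's
// output width is simply the larger of the two.
static int reduceIfElseWidth(int ifWidth, int elseWidth) {
  return Math.max(ifWidth, elseWidth);
}
{code}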


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010731#comment-17010731
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364208438
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/OutputWidthVisitor.java
 ##
 @@ -43,238 +43,213 @@
 
 import java.util.ArrayList;
 
-public class OutputWidthVisitor extends AbstractExecExprVisitor<OutputWidthExpression, OutputWidthVisitorState, RuntimeException> {
-
-    @Override
-    public OutputWidthExpression visitVarDecimalConstant(VarDecimalExpression varDecimalExpression,
-                                                         OutputWidthVisitorState state) throws RuntimeException {
-        Preconditions.checkArgument(varDecimalExpression.getMajorType().hasPrecision());
-        return new FixedLenExpr(varDecimalExpression.getMajorType().getPrecision());
-    }
-
-
-    /**
-     *
-     * Records the {@link IfExpression} as a {@link IfElseWidthExpr}. IfElseWidthExpr will be reduced to
-     * a {@link FixedLenExpr} by taking the max of the if-expr-width and the else-expr-width.
-     *
-     * @param ifExpression
-     * @param state
-     * @return IfElseWidthExpr
-     * @throws RuntimeException
-     */
-    @Override
-    public OutputWidthExpression visitIfExpression(IfExpression ifExpression, OutputWidthVisitorState state)
-                                                   throws RuntimeException {
-        IfExpression.IfCondition condition = ifExpression.ifCondition;
-        LogicalExpression ifExpr = condition.expression;
-        LogicalExpression elseExpr = ifExpression.elseExpression;
-
-        OutputWidthExpression ifWidthExpr = ifExpr.accept(this, state);
-        OutputWidthExpression elseWidthExpr = null;
-        if (elseExpr != null) {
-            elseWidthExpr = elseExpr.accept(this, state);
-        }
-        return new IfElseWidthExpr(ifWidthExpr, elseWidthExpr);
+public class OutputWidthVisitor
+    extends AbstractExecExprVisitor<OutputWidthExpression, OutputWidthVisitorState, RuntimeException> {
 
 Review comment:
   Please reformat to: 
   
   ```java
   public class OutputWidthVisitor extends
       AbstractExecExprVisitor<OutputWidthExpression, OutputWidthVisitorState, RuntimeException> {
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7519) Error on CASE when different branches are arrays of the same type but built differently

2020-01-08 Thread benj (Jira)
benj created DRILL-7519:
---

 Summary: Error on CASE when different branches are arrays of the same type but built differently
 Key: DRILL-7519
 URL: https://issues.apache.org/jira/browse/DRILL-7519
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.17.0
Reporter: benj


With 3 arrays built like this:
{code:sql}
SELECT T.s, typeof(T.s), modeof(T.s)
  ,T.j, typeof(T.j), modeof(T.j)
  ,T.j2.a, typeof(T.j2.a), modeof(T.j2.a)
FROM (
 SELECT split('a,b',',') as s
 , convert_fromJSON('["c","d"]') AS j
 , convert_fromJSON('{"tag":["e","f"]}') AS j2
) AS T
+-----------+---------+--------+-----------+---------+--------+-----------+---------+--------+
|     s     | EXPR$1  | EXPR$2 |     j     | EXPR$4  | EXPR$5 |  EXPR$6   | EXPR$7  | EXPR$8 |
+-----------+---------+--------+-----------+---------+--------+-----------+---------+--------+
| ["a","b"] | VARCHAR | ARRAY  | ["c","d"] | VARCHAR | ARRAY  | ["e","f"] | VARCHAR | ARRAY  |
+-----------+---------+--------+-----------+---------+--------+-----------+---------+--------+
{code}
it is possible to use *s* and *j* as branches of the same CASE, but combining 
*s* or *j* with *j2.tag* misbehaves:
{code:sql}
SELECT CASE WHEN true THEN T.s ELSE T.j END
     , CASE WHEN false THEN T.s ELSE T.j END
FROM (
 SELECT split('a,b',',') AS s
      , convert_fromJSON('["c","d"]') AS j
      , convert_fromJSON('{"tag":["e","f"]}') AS j2
) AS T
+-----------+-----------+
|  EXPR$0   |  EXPR$1   |
+-----------+-----------+
| ["a","b"] | ["c","d"] |
+-----------+-----------+

SELECT CASE WHEN true THEN T.j2.tag ELSE T.s /*idem with T.j*/ END
     , CASE WHEN false THEN T.j2.tag ELSE T.s /*idem with T.j*/ END
FROM (SELECT split('a,b',',') AS s, convert_fromJSON('["c","d"]') AS j,
      convert_fromJSON('{"tag":["e","f"]}') AS j2) AS T;
+-----------+-----------+
|  EXPR$0   |  EXPR$1   |
+-----------+-----------+
| ["e","f"] | ["a","b"] |
+-----------+-----------+

/* But surprisingly */
SELECT CASE WHEN false THEN T.j2.tag ELSE T.s /*idem with T.j*/ END
FROM (SELECT split('a,b',',') AS s, convert_fromJSON('["c","d"]') AS j,
      convert_fromJSON('{"tag":["e","f"]}') AS j2) AS T;
Error: SYSTEM ERROR: NullPointerException

/* and */
SELECT CASE WHEN true THEN T.j2.tag ELSE T.s /*idem with T.j*/ END
FROM (SELECT split('a,b',',') AS s, convert_fromJSON('["c","d"]') AS j,
      convert_fromJSON('{"tag":["e","f"]}') AS j2) AS T;
+-----------+
|  EXPR$0   |
+-----------+
| ["e","f"] |
+-----------+
{code}
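
Condensed from the examples above, the smallest failing shape appears to be
a CASE whose FALSE branch selects the map-nested array (a distillation of
the report, not a new test case):

{code:sql}
SELECT CASE WHEN false THEN T.j2.tag ELSE T.s END
FROM (SELECT split('a,b',',') AS s,
             convert_fromJSON('{"tag":["e","f"]}') AS j2) AS T;
-- Error: SYSTEM ERROR: NullPointerException
{code}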



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (DRILL-7518) Parquet INT64 Nullable Type Support

2020-01-08 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7518:
---

Assignee: Arina Ielchiieva

> Parquet INT64 Nullable Type Support
> ---
>
> Key: DRILL-7518
> URL: https://issues.apache.org/jira/browse/DRILL-7518
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
> Environment: Tested on the apache/drill:1.17.0 docker image.
>Reporter: David Severski
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.18.0
>
>
> Querying a parquet file with fields of type INT64 generates an immediate 
> error in the complex parquet reader with an error of "Unsupported nullable 
> converted type INT_64 for primitive type INT64". Attempts to work around this 
> via explicit CAST() and CONVERT_FROM() are unsuccessful. The suggestion from 
> drill-users is that an implementation needs to be made at 
> https://github.com/apache/drill/blob/9993fa3547b029db5fe33a2210fa6f07e8ac1990/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java#L303.
>  
> Possibly related: a similar INT32-typed field in the same file exhibits this 
> problem, but it can be worked around via an explicit CAST() to INT.
>  
> At this time, I do not have a sanitized parquet file to submit as a reference 
> example. :(
>  
> Reference thread on drill-users list: 
> http://mail-archives.apache.org/mod_mbox/drill-user/202001.mbox/%3ccajguoa53ldkxqsh1fsvtj+dk5421eg4aw4paim++8bferxd...@mail.gmail.com%3e



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
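
For reference, the INT32 workaround the report mentions looks like this
(table path and column name are placeholders, not taken from the report):

{code:sql}
-- The INT32 field reads once cast explicitly; per the report, the INT64
-- field has no equivalent workaround.
SELECT CAST(my_int32_col AS INT) AS v
FROM dfs.`/path/to/file.parquet`;
{code}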


[jira] [Updated] (DRILL-7518) Parquet INT64 Nullable Type Support

2020-01-08 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7518:

Reviewer: Vova Vysotskyi

> Parquet INT64 Nullable Type Support
> ---
>
> Key: DRILL-7518
> URL: https://issues.apache.org/jira/browse/DRILL-7518
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
> Environment: Tested on the apache/drill:1.17.0 docker image.
>Reporter: David Severski
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.18.0
>
>
> Querying a parquet file with fields of type INT64 generates an immediate 
> error in the complex parquet reader with an error of "Unsupported nullable 
> converted type INT_64 for primitive type INT64". Attempts to work around this 
> via explicit CAST() and CONVERT_FROM() are unsuccessful. The suggestion from 
> drill-users is that an implementation needs to be made at 
> https://github.com/apache/drill/blob/9993fa3547b029db5fe33a2210fa6f07e8ac1990/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java#L303.
>  
> Possibly related: a similar INT32-typed field in the same file exhibits this 
> problem, but it can be worked around via an explicit CAST() to INT.
>  
> At this time, I do not have a sanitized parquet file to submit as a reference 
> example. :(
>  
> Reference thread on drill-users list: 
> http://mail-archives.apache.org/mod_mbox/drill-user/202001.mbox/%3ccajguoa53ldkxqsh1fsvtj+dk5421eg4aw4paim++8bferxd...@mail.gmail.com%3e



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7518) Parquet INT64 Nullable Type Support

2020-01-08 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7518:

Fix Version/s: 1.18.0

> Parquet INT64 Nullable Type Support
> ---
>
> Key: DRILL-7518
> URL: https://issues.apache.org/jira/browse/DRILL-7518
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
> Environment: Tested on the apache/drill:1.17.0 docker image.
>Reporter: David Severski
>Priority: Major
> Fix For: 1.18.0
>
>
> Querying a parquet file with fields of type INT64 generates an immediate 
> error in the complex parquet reader with an error of "Unsupported nullable 
> converted type INT_64 for primitive type INT64". Attempts to work around this 
> via explicit CAST() and CONVERT_FROM() are unsuccessful. The suggestion from 
> drill-users is that an implementation needs to be made at 
> https://github.com/apache/drill/blob/9993fa3547b029db5fe33a2210fa6f07e8ac1990/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java#L303.
>  
> Possibly related: a similar INT32-typed field in the same file exhibits this 
> problem, but it can be worked around via an explicit CAST() to INT.
>  
> At this time, I do not have a sanitized parquet file to submit as a reference 
> example. :(
>  
> Reference thread on drill-users list: 
> http://mail-archives.apache.org/mod_mbox/drill-user/202001.mbox/%3ccajguoa53ldkxqsh1fsvtj+dk5421eg4aw4paim++8bferxd...@mail.gmail.com%3e



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7502) Incorrect/invalid codegen for typeof() with UNION

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010609#comment-17010609
 ] 

ASF GitHub Bot commented on DRILL-7502:
---

vvysotskyi commented on issue #1945: DRILL-7502: Invalid codegen for typeof() 
with UNION
URL: https://github.com/apache/drill/pull/1945#issuecomment-572012181
 
 
   @paul-rogers, could you please squash the commits.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Incorrect/invalid codegen for typeof() with UNION
> -
>
> Key: DRILL-7502
> URL: https://issues.apache.org/jira/browse/DRILL-7502
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> The {{typeof()}} function is defined as follows:
> {code:java}
>   @FunctionTemplate(names = {"typeOf"},
>   scope = FunctionTemplate.FunctionScope.SIMPLE,
>   nulls = NullHandling.INTERNAL)
>   public static class GetType implements DrillSimpleFunc {
> @Param
> FieldReader input;
> @Output
> VarCharHolder out;
> @Inject
> DrillBuf buf;
> @Override
> public void setup() {}
> @Override
> public void eval() {
>   String typeName = input.getTypeString();
>   byte[] type = typeName.getBytes();
>   buf = buf.reallocIfNeeded(type.length);
>   buf.setBytes(0, type);
>   out.buffer = buf;
>   out.start = 0;
>   out.end = type.length;
> }
>   }
> {code}
> Note that the {{input}} field is defined as {{FieldReader}} which has a 
> method called {{getTypeString()}}. As a result, the code works fine in all 
> existing tests in {{TestTypeFns}}.
> I tried to add a function to use {{typeof()}} on a column of type {{UNION}}. 
> When I did, the query failed with a compile error in generated code:
> {noformat}
> SYSTEM ERROR: CompileException: Line 42, Column 43: 
>   A method named "getTypeString" is not declared in any enclosing class nor 
> any supertype, nor through a static import
> {noformat}
> The stack trace shows the generated code; note that the type of {{input}} 
> changes from a reader to a holder, which makes the generated code invalid:
> {code:java}
> public class ProjectorGen0 {
> DrillBuf work0;
> UnionVector vv1;
> VarCharVector vv6;
> DrillBuf work9;
> VarCharVector vv11;
> DrillBuf work14;
> VarCharVector vv16;
> public void doEval(int inIndex, int outIndex)
> throws SchemaChangeException
> {
> {
> UnionHolder out4 = new UnionHolder();
> {
> out4 .isSet = vv1 .getAccessor().isSet((inIndex));
> if (out4 .isSet == 1) {
> vv1 .getAccessor().get((inIndex), out4);
> }
> }
> // start of eval portion of typeOf function. //
> VarCharHolder out5 = new VarCharHolder();
> {
> final VarCharHolder out = new VarCharHolder();
> UnionHolder input = out4;
> DrillBuf buf = work0;
> UnionFunctions$GetType_eval:
> {
> String typeName = input.getTypeString();
> byte[] type = typeName.getBytes();
> buf = buf.reallocIfNeeded(type.length);
> buf.setBytes(0, type);
> out.buffer = buf;
> out.start = 0;
> out.end = type.length;
> }
> {code}
> By contrast, here is the generated code for one of the existing 
> {{TestTypeFns}} tests where things work:
> {code:java}
> public class ProjectorGen0
> extends ProjectorTemplate
> {
> DrillBuf work0;
> NullableBigIntVector vv1;
> VarCharVector vv7;
> public ProjectorGen0() {
> try {
> __DRILL_INIT__();
> } catch (SchemaChangeException e) {
> throw new UnsupportedOperationException(e);
> }
> }
> public void doEval(int inIndex, int outIndex)
> throws SchemaChangeException
> {
> {
>..
> // start of eval portion of typeOf function. //
> VarCharHolder out6 = new VarCharHolder();
> {
> final VarCharHolder out = new VarCharHolder();
> FieldReader input = new NullableIntHolderReaderImpl(out5);
> DrillBuf buf = work0;
> UnionFunctions$GetType_eval:
> {
> String typeName = input.getTypeString();
> byte[] type = typeName.getBytes();
> buf = buf.reallocIfNeeded(type.length);
> 
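
The quoted description is truncated above; per the issue, typeof() over a
concrete cast works while typeof() over a UNION column triggers the
CompileException. A hedged reconstruction of the working pattern, with one
concrete type substituted for the test's castType and a column alias added
for illustration:

{code:sql}
-- Works: typeof() on a column cast to a concrete (non-UNION) type
SELECT typeof(CAST(a AS INT)) FROM (VALUES (1)) AS t(a);
{code}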

[jira] [Commented] (DRILL-7230) Add README.md with instructions for release

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010538#comment-17010538
 ] 

ASF GitHub Bot commented on DRILL-7230:
---

vvysotskyi commented on issue #1937: DRILL-7230: Add README.md with 
instructions for release and release scripts
URL: https://github.com/apache/drill/pull/1937#issuecomment-571978751
 
 
   I have squashed the commits.
   @arina-ielchiieva, thanks a lot for the review; it only became good after 
addressing your comments.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add README.md with instructions for release
> ---
>
> Key: DRILL-7230
> URL: https://issues.apache.org/jira/browse/DRILL-7230
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Reporter: Sorabh Hamirwasia
>Assignee: Vova Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7502) Incorrect/invalid codegen for typeof() with UNION

2020-01-08 Thread Vova Vysotskyi (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Vysotskyi updated DRILL-7502:
--
Labels: ready-to-commit  (was: )

> Incorrect/invalid codegen for typeof() with UNION
> -
>
> Key: DRILL-7502
> URL: https://issues.apache.org/jira/browse/DRILL-7502
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> The {{typeof()}} function is defined as follows:
> {code:java}
>   @FunctionTemplate(names = {"typeOf"},
>   scope = FunctionTemplate.FunctionScope.SIMPLE,
>   nulls = NullHandling.INTERNAL)
>   public static class GetType implements DrillSimpleFunc {
> @Param
> FieldReader input;
> @Output
> VarCharHolder out;
> @Inject
> DrillBuf buf;
> @Override
> public void setup() {}
> @Override
> public void eval() {
>   String typeName = input.getTypeString();
>   byte[] type = typeName.getBytes();
>   buf = buf.reallocIfNeeded(type.length);
>   buf.setBytes(0, type);
>   out.buffer = buf;
>   out.start = 0;
>   out.end = type.length;
> }
>   }
> {code}
> Note that the {{input}} field is defined as {{FieldReader}} which has a 
> method called {{getTypeString()}}. As a result, the code works fine in all 
> existing tests in {{TestTypeFns}}.
> I tried to add a function to use {{typeof()}} on a column of type {{UNION}}. 
> When I did, the query failed with a compile error in generated code:
> {noformat}
> SYSTEM ERROR: CompileException: Line 42, Column 43: 
>   A method named "getTypeString" is not declared in any enclosing class nor 
> any supertype, nor through a static import
> {noformat}
> The stack trace shows the generated code; note that the type of {{input}} 
> changes from a reader to a holder, which makes the generated code invalid:
> {code:java}
> public class ProjectorGen0 {
> DrillBuf work0;
> UnionVector vv1;
> VarCharVector vv6;
> DrillBuf work9;
> VarCharVector vv11;
> DrillBuf work14;
> VarCharVector vv16;
> public void doEval(int inIndex, int outIndex)
> throws SchemaChangeException
> {
> {
> UnionHolder out4 = new UnionHolder();
> {
> out4 .isSet = vv1 .getAccessor().isSet((inIndex));
> if (out4 .isSet == 1) {
> vv1 .getAccessor().get((inIndex), out4);
> }
> }
> // start of eval portion of typeOf function. //
> VarCharHolder out5 = new VarCharHolder();
> {
> final VarCharHolder out = new VarCharHolder();
> UnionHolder input = out4;
> DrillBuf buf = work0;
> UnionFunctions$GetType_eval:
> {
> String typeName = input.getTypeString();
> byte[] type = typeName.getBytes();
> buf = buf.reallocIfNeeded(type.length);
> buf.setBytes(0, type);
> out.buffer = buf;
> out.start = 0;
> out.end = type.length;
> }
> {code}
> By contrast, here is the generated code for one of the existing 
> {{TestTypeFns}} tests where things work:
> {code:java}
> public class ProjectorGen0
> extends ProjectorTemplate
> {
> DrillBuf work0;
> NullableBigIntVector vv1;
> VarCharVector vv7;
> public ProjectorGen0() {
> try {
> __DRILL_INIT__();
> } catch (SchemaChangeException e) {
> throw new UnsupportedOperationException(e);
> }
> }
> public void doEval(int inIndex, int outIndex)
> throws SchemaChangeException
> {
> {
>..
> // start of eval portion of typeOf function. //
> VarCharHolder out6 = new VarCharHolder();
> {
> final VarCharHolder out = new VarCharHolder();
> FieldReader input = new NullableIntHolderReaderImpl(out5);
> DrillBuf buf = work0;
> UnionFunctions$GetType_eval:
> {
> String typeName = input.getTypeString();
> byte[] type = typeName.getBytes();
> buf = buf.reallocIfNeeded(type.length);
> buf.setBytes(0, type);
> out.buffer = buf;
> out.start = 0;
> out.end = type.length;
> }
> work0 = buf;
> out6 .start = out.start;
> out6 .end = out.end;
> out6 .buffer = out.buffer;
> }
> // end of eval portion of typeOf function. //
> {code}
> Notice that the {{input}} variable is of type {{FieldReader}} as expected.
> Queries that work:
> {code:java}
> String sql = "SELECT typeof(CAST(a AS " + castType + ")) FROM (VALUES 
> (1))