[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347724198
 
 

 ##
 File path: contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5StringDataWriter.java
 ##
 @@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5.writers;
+
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.store.hdf5.HDF5Utils;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+
+public class HDF5StringDataWriter extends HDF5DataWriter {
+
+  private String[] data;
+
+  private ArrayList<String> listData;
+
+  private ScalarWriter rowWriter;
+
+  // This constructor is used when the data is a 1D column.  The column is inferred from the datapath
+  public HDF5StringDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, String datapath) {
+    super(reader, columnWriter, datapath);
+    data = reader.readStringArray(datapath);
+    listData = new ArrayList<>();
+
+    listData.addAll(Arrays.asList(data));
+
+    fieldName = HDF5Utils.getNameFromPath(datapath);
+    ColumnMetadata colSchema = MetadataUtils.newScalar(fieldName, TypeProtos.MinorType.VARCHAR, TypeProtos.DataMode.OPTIONAL);
+    int index = columnWriter.addColumn(colSchema);
+    rowWriter = columnWriter.scalar(index);
+  }
 
 Review comment:
   Unfortunately, I didn't see any iterator classes in the HDF5 library. It seemed you had the choice of either:
   1. Reading the first element of a dataset
   2. Reading the entire dataset
   
   There didn't seem to be an Iterator or any other obvious way of iterating over a dataset. All the sample code and examples I found online used this technique; see the sketch below.
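   
   For reference, a minimal sketch of the whole-dataset technique described above, using only JHDF5 calls quoted elsewhere in this PR; the file name and datapath are hypothetical:
   
   import java.io.File;
   import ch.systemsx.cisd.hdf5.HDF5FactoryProvider;
   import ch.systemsx.cisd.hdf5.IHDF5Reader;
   
   public class HDF5ReadSketch {
     public static void main(String[] args) {
       IHDF5Reader reader = HDF5FactoryProvider.get().openForReading(new File("sample.h5"));  // hypothetical file
       // Option 2 above: read the entire dataset into memory in one call
       String[] all = reader.readStringArray("/mydata");  // hypothetical datapath
       // Option 1 above then reduces to taking the first element of that same
       // array, since no narrower read was found in the library
       String first = all.length > 0 ? all[0] : null;
       System.out.println("rows=" + all.length + ", first=" + first);
       reader.close();
     }
   }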



[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347718505
 
 

 ##
 File path: contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5Utils.java
 ##
 @@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import ch.systemsx.cisd.hdf5.HDF5DataClass;
+import ch.systemsx.cisd.hdf5.HDF5EnumerationValue;
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import ch.systemsx.cisd.hdf5.HDF5DataSetInformation;
+import ch.systemsx.cisd.hdf5.HDF5DataTypeInformation;
+import org.apache.commons.lang3.ArrayUtils;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+
+import java.io.IOException;
+import java.util.BitSet;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class HDF5Utils {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(HDF5Utils.class);
+
+  private static final boolean caseSensitive = false;
+
+  /**
+   * This function returns an HDF5Attribute object for use when Drill maps the attributes.
+   *
+   * @param pathName the path to the dataset or group within the HDF5 file
+   * @param key the attribute name, or one of the synthetic keys "dimensions" and "dataType"
+   * @param reader the HDF5 reader for the open file
+   * @return the attribute, or null if the path or attribute does not exist
+   * @throws IOException if the file cannot be read
+   */
+  public static HDF5Attribute getAttribute(String pathName, final String key, IHDF5Reader reader) throws IOException {
+    if (pathName.equals("")) {
+      pathName = "/";
+    }
+
+    if (!reader.exists(pathName)) {
+      return null;
+    }
+
+    if (key.equals("dimensions")) {
+      final HDF5DataSetInformation datasetInfo = reader.object().getDataSetInformation(pathName);
+      final long[] dimensions = datasetInfo.getDimensions();
+      ArrayUtils.reverse(dimensions);
+      return new HDF5Attribute(MinorType.LIST, "dimensions", dimensions);
+    }
+
+    if (key.equals("dataType")) {
+      final HDF5DataSetInformation datasetInfo = reader.object().getDataSetInformation(pathName);
+      return new HDF5Attribute(getDataType(datasetInfo), "DataType", datasetInfo.getTypeInformation().getDataClass());
+    }
+
+    if (!reader.object().hasAttribute(pathName, key)) {
+      return null;
+    }
+
+    HDF5DataTypeInformation attributeInfo = reader.object().getAttributeInformation(pathName, key);
+    Class<?> type = attributeInfo.tryGetJavaType();
+    if (type.isAssignableFrom(long[].class)) {
+      if (attributeInfo.isSigned()) {
+        return new HDF5Attribute(MinorType.BIGINT, key, reader.int64().getAttr(pathName, key));
+      } else {
+        return new HDF5Attribute(MinorType.BIGINT, key, reader.uint64().getAttr(pathName, key));
+      }
+    } else if (type.isAssignableFrom(int[].class)) {
+      if (attributeInfo.isSigned()) {
+        return new HDF5Attribute(MinorType.INT, key, reader.int32().getAttr(pathName, key));
+      } else {
+        return new HDF5Attribute(MinorType.INT, key, reader.uint32().getAttr(pathName, key));
+      }
+    } else if (type.isAssignableFrom(short[].class)) {
+      if (attributeInfo.isSigned()) {
+        return new HDF5Attribute(MinorType.INT, key, reader.int16().getAttr(pathName, key));
+      } else {
+        return new HDF5Attribute(MinorType.INT, key, reader.uint16().getAttr(pathName, key));
+      }
+    } else if (type.isAssignableFrom(byte[].class)) {
+      if (attributeInfo.isSigned()) {
+        return new HDF5Attribute(MinorType.INT, key, reader.int8().getAttr(pathName, key));
+      } else {
+        return new HDF5Attribute(MinorType.INT, key, reader.uint8().getAttr(pathName, key));
+      }
+    } else if (type.isAssignableFrom(double[].class)) {
+      return new HDF5Attribute(MinorType.FLOAT8, key, reader.float64().getAttr(pathName, key));
+    } else if (type.isAssignableFrom(float[].class)) {
+      return new HDF5Attribute(MinorType.FLOAT8, key, reader.float32().getAttr(pathName, key));
+    } else if (type.isAssignableFrom(String[].class)) {
+      return new HDF5Attribute(MinorType.VARCHAR, key, reader.string().getAttr(pathName, key));
+    } else if (type.isAssignableFrom(long.class)) {
+      if (attributeInfo.isSigned()) {
+        return new HDF5Attribute(Min
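 
 For context, a minimal sketch of how a caller might exercise getAttribute() as declared above; the file name and datapath are hypothetical, and the sketch assumes it sits in the same org.apache.drill.exec.store.hdf5 package as HDF5Utils and HDF5Attribute:
 
 import java.io.File;
 import java.io.IOException;
 import ch.systemsx.cisd.hdf5.HDF5FactoryProvider;
 import ch.systemsx.cisd.hdf5.IHDF5Reader;
 
 public class AttributeSketch {
   public static void main(String[] args) throws IOException {
     IHDF5Reader reader = HDF5FactoryProvider.get().openForReading(new File("sample.h5"));  // hypothetical file
     // "dimensions" and "dataType" are the two synthetic keys handled above;
     // any other key is looked up as a real HDF5 attribute on the object.
     HDF5Attribute dims = HDF5Utils.getAttribute("/mydata", "dimensions", reader);  // hypothetical datapath
     if (dims == null) {
       System.out.println("path does not exist");
     }
     reader.close();
   }
 }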

[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347717747
 
 

 ##
 File path: contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5StringDataWriter.java
 ##
 @@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5.writers;
+
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.store.hdf5.HDF5Utils;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+
+public class HDF5StringDataWriter extends HDF5DataWriter {
+
+  private String[] data;
+
+  private ArrayList<String> listData;
+
+  private ScalarWriter rowWriter;
+
+  // This constructor is used when the data is a 1D column.  The column is inferred from the datapath
+  public HDF5StringDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, String datapath) {
+    super(reader, columnWriter, datapath);
+    data = reader.readStringArray(datapath);
+    listData = new ArrayList<>();
+
+    listData.addAll(Arrays.asList(data));
 
 Review comment:
   Fixed




[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347715490
 
 

 ##
 File path: contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5IntDataWriter.java
 ##
 @@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5.writers;
+
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.store.hdf5.HDF5Utils;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+
+import java.util.ArrayList;
+
+public class HDF5IntDataWriter extends HDF5DataWriter {
+
+  private int[] data;
+
+  private ArrayList<Integer> listData;
+
+  private ScalarWriter rowWriter;
+
+  private final boolean flattenMatrix = false;
+
+  // TODO Deal with Long Ints
+
+  // This constructor is used when the data is a 1D column.  The column is inferred from the datapath
+  public HDF5IntDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, String datapath) {
+    super(reader, columnWriter, datapath);
+    data = reader.readIntArray(datapath);
+    listData = new ArrayList<>();
+
+    for (int datum : data) {
+      listData.add(new Integer(datum));
+    }
+
+    fieldName = HDF5Utils.getNameFromPath(datapath);
+    ColumnMetadata colSchema = MetadataUtils.newScalar(fieldName, TypeProtos.MinorType.INT, TypeProtos.DataMode.OPTIONAL);
+    int index = columnWriter.addColumn(colSchema);
+    rowWriter = columnWriter.scalar(index);
+  }
+
+  // This constructor is used when the data is part of a 2D array.  In this case the column name is provided in the constructor
+  public HDF5IntDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, String datapath, String fieldName, int currentColumn) {
+    super(reader, columnWriter, datapath, fieldName, currentColumn);
+    listData = new ArrayList<>();
+    // Get dimensions
+    long[] dimensions = reader.object().getDataSetInformation(datapath).getDimensions();
+    int[][] tempData;
+    if (dimensions.length == 2) {
+      tempData = transpose(reader.readIntMatrix(datapath));
+    } else {
+      tempData = transpose(reader.int32().readMDArray(datapath).toMatrix());
+    }
+    data = tempData[currentColumn];
+    for (int datum : data) {
+      listData.add(datum);
+    }
+
+    ColumnMetadata colSchema = MetadataUtils.newScalar(fieldName, TypeProtos.MinorType.INT, TypeProtos.DataMode.OPTIONAL);
+    int index = columnWriter.addColumn(colSchema);
+    rowWriter = columnWriter.scalar(index);
+  }
+
+  public HDF5IntDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, String fieldName, ArrayList<Integer> data) {
+    super(reader, columnWriter, null);
+    this.fieldName = fieldName;
+    this.listData = data;
+
+    ColumnMetadata colSchema = MetadataUtils.newScalar(fieldName, TypeProtos.MinorType.INT, TypeProtos.DataMode.OPTIONAL);
+    int index = columnWriter.addColumn(colSchema);
+    rowWriter = columnWriter.scalar(index);
+  }
+
+
+  public boolean write() {
+    if (counter > listData.size()) {
+      return false;
+    } else {
+      rowWriter.setInt(listData.get(counter));
+      counter++;
 
 Review comment:
   Fixed
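   
   The 2D constructor quoted above relies on a transpose(...) helper that is not visible in this hunk. A minimal sketch of what such a helper presumably does (an assumption, not necessarily the PR's code): swap rows and columns of a rectangular matrix so one column of the row-major HDF5 matrix can be handed to Drill as a column vector.
   
   // Hedged sketch: transpose a rectangular int[][]; assumes at least one row
   public static int[][] transpose(int[][] matrix) {
     int rows = matrix.length;
     int cols = matrix[0].length;
     int[][] result = new int[cols][rows];
     for (int r = 0; r < rows; r++) {
       for (int c = 0; c < cols; c++) {
         result[c][r] = matrix[r][c];
       }
     }
     return result;
   }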




[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347714219
 
 

 ##
 File path: contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##
 @@ -0,0 +1,887 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import ch.systemsx.cisd.hdf5.HDF5CompoundMemberInformation;
+import ch.systemsx.cisd.hdf5.HDF5DataClass;
+import ch.systemsx.cisd.hdf5.HDF5DataSetInformation;
+import ch.systemsx.cisd.hdf5.HDF5FactoryProvider;
+import ch.systemsx.cisd.hdf5.HDF5LinkInformation;
+import ch.systemsx.cisd.hdf5.IHDF5Factory;
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.commons.io.IOUtils;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.expr.holders.BigIntHolder;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.file.StandardCopyOption;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.BitSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+
+public class HDF5BatchReader implements ManagedReader<FileSchemaNegotiator> {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(HDF5BatchReader.class);
+  private FileSplit split;
+  private HDF5FormatConfig formatConfig;
+  private ResultSetLoader loader;
+  private String tempFileName;
+  private IHDF5Reader HDF5reader;
+  private File infile;
+  private BufferedReader reader;
+  protected HDF5ReaderConfig readerConfig;
+  private boolean finish;
+
+
+  public static class HDF5ReaderConfig {
+    protected final HDF5FormatPlugin plugin;
+    protected TupleMetadata schema;
+    protected String defaultPath;
+    protected HDF5FormatConfig formatConfig;
+
+    public HDF5ReaderConfig(HDF5FormatPlugin plugin, HDF5FormatConfig formatConfig) {
+      this.plugin = plugin;
+      this.formatConfig = formatConfig;
+      this.defaultPath = formatConfig.getDefaultPath();
+    }
+  }
+
+
+  public HDF5BatchReader(HDF5ReaderConfig readerConfig) {
+    this.readerConfig = readerConfig;
+    this.formatConfig = readerConfig.formatConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    split = negotiator.split();
+    loader = negotiator.build();
+    openFile(negotiator);
+    this.loader = negotiator.build();
+    return true;
+  }
+
+  private void openFile(FileSchemaNegotiator negotiator) {
+    InputStream in;
+    try {
+      in = negotiator.fileSystem().open(split.getPath());
+      IHDF5Factory factory = HDF5FactoryProvider.get();
+      this.infile = convertInputStreamToFile(in);
+      this.HDF5reader = factory.openForReading(infile);
+    } catch (Exception e) {
+      throw UserException
+        .dataReadError(e)
+        .message("Failed to open input file: %s", split.getPath())
+        .build(logger);
+    }
+    reader = new BufferedReader(new InputStreamReader(in));
+  }
+
+  /**
+   * This function converts the Drill InputStream into a File object for the HDF5 library.  This
+   * function exists due to a known limitation in the
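 
 The javadoc above cuts off mid-sentence. Based on the imports in this hunk (File, InputStream, StandardCopyOption), a plausible sketch of the convertInputStreamToFile(...) helper it names follows; the temp-file naming and deleteOnExit() cleanup are assumptions, not necessarily the PR's code:
 
 import java.io.File;
 import java.io.IOException;
 import java.io.InputStream;
 import java.nio.file.Files;
 import java.nio.file.StandardCopyOption;
 
 class StreamToFileSketch {
   // The JHDF5 library opens java.io.File objects only, so the Drill input
   // stream is spooled to a local temporary file before opening the reader.
   static File convertInputStreamToFile(InputStream stream) throws IOException {
     File tempFile = File.createTempFile("drill-hdf5-", ".h5");
     tempFile.deleteOnExit();
     Files.copy(stream, tempFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
     return tempFile;
   }
 }
 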

[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347713986
 
 ##
 File path: contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##

[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347713849
 
 ##
 File path: contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##

[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347713828
 
 ##
 File path: contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##

[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347713678
 
 ##
 File path: contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##

[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347713607
 
 ##
 File path: contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##

[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347712719
 
 ##
 File path: contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##

[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin 
for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347711575
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##
 @@ -0,0 +1,887 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import ch.systemsx.cisd.hdf5.HDF5CompoundMemberInformation;
+import ch.systemsx.cisd.hdf5.HDF5DataClass;
+import ch.systemsx.cisd.hdf5.HDF5DataSetInformation;
+import ch.systemsx.cisd.hdf5.HDF5FactoryProvider;
+import ch.systemsx.cisd.hdf5.HDF5LinkInformation;
+import ch.systemsx.cisd.hdf5.IHDF5Factory;
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.commons.io.IOUtils;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.expr.holders.BigIntHolder;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.file.StandardCopyOption;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.BitSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+
+public class HDF5BatchReader implements ManagedReader<FileSchemaNegotiator> {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(HDF5BatchReader.class);
+  private FileSplit split;
+  private HDF5FormatConfig formatConfig;
+  private ResultSetLoader loader;
+  private String tempFileName;
+  private IHDF5Reader HDF5reader;
+  private File infile;
+  private BufferedReader reader;
+  protected HDF5ReaderConfig readerConfig;
+  private boolean finish;
+
+  public static class HDF5ReaderConfig {
+    protected final HDF5FormatPlugin plugin;
+    protected TupleMetadata schema;
+    protected String defaultPath;
+    protected HDF5FormatConfig formatConfig;
+
+    public HDF5ReaderConfig(HDF5FormatPlugin plugin, HDF5FormatConfig formatConfig) {
+      this.plugin = plugin;
+      this.formatConfig = formatConfig;
+      this.defaultPath = formatConfig.getDefaultPath();
+    }
+  }
+
+  public HDF5BatchReader(HDF5ReaderConfig readerConfig) {
+    this.readerConfig = readerConfig;
+    this.formatConfig = readerConfig.formatConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    split = negotiator.split();
+    loader = negotiator.build();
+    openFile(negotiator);
+    return true;
+  }
+
+  private void openFile(FileSchemaNegotiator negotiator) {
+    InputStream in;
+    try {
+      in = negotiator.fileSystem().open(split.getPath());
+      IHDF5Factory factory = HDF5FactoryProvider.get();
+      this.infile = convertInputStreamToFile(in);
+      this.HDF5reader = factory.openForReading(infile);
+    } catch (Exception e) {
+      throw UserException
+        .dataReadError(e)
+        .message("Failed to open input file: %s", split.getPath())
+        .build(logger);
+    }
+    reader = new BufferedReader(new InputStreamReader(in));
+  }
+
+  /**
+   * This function converts the Drill InputStream into a File object for the HDF5 library. This function
+   * exists due to a known limitation in the 
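
The message is truncated here in the archive, but the quoted openFile() method calls a convertInputStreamToFile() helper. A minimal sketch of what such a conversion can look like, assuming the limitation is that the HDF5 library only opens java.io.File objects; the temp-file naming and cleanup below are illustrative, not the PR's actual implementation:

```java
  // Illustrative only: spool the Drill-provided InputStream to a local
  // temporary file so a file-based HDF5 reader can open it.
  private File convertInputStreamToFile(InputStream stream) throws IOException {
    File tempFile = File.createTempFile("drill-hdf5-", ".h5");  // hypothetical prefix/suffix
    tempFile.deleteOnExit();                                    // best-effort cleanup on JVM exit
    java.nio.file.Files.copy(stream, tempFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
    stream.close();
    return tempFile;
  }
```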

[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin 
for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347711645
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##
 @@ -0,0 +1,887 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import ch.systemsx.cisd.hdf5.HDF5CompoundMemberInformation;
+import ch.systemsx.cisd.hdf5.HDF5DataClass;
+import ch.systemsx.cisd.hdf5.HDF5DataSetInformation;
+import ch.systemsx.cisd.hdf5.HDF5FactoryProvider;
+import ch.systemsx.cisd.hdf5.HDF5LinkInformation;
+import ch.systemsx.cisd.hdf5.IHDF5Factory;
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.commons.io.IOUtils;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.expr.holders.BigIntHolder;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.file.StandardCopyOption;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.BitSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+
+public class HDF5BatchReader implements ManagedReader<FileSchemaNegotiator> {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(HDF5BatchReader.class);
+  private FileSplit split;
+  private HDF5FormatConfig formatConfig;
+  private ResultSetLoader loader;
+  private String tempFileName;
+  private IHDF5Reader HDF5reader;
+  private File infile;
+  private BufferedReader reader;
+  protected HDF5ReaderConfig readerConfig;
+  private boolean finish;
+
+  public static class HDF5ReaderConfig {
+    protected final HDF5FormatPlugin plugin;
+    protected TupleMetadata schema;
+    protected String defaultPath;
+    protected HDF5FormatConfig formatConfig;
+
+    public HDF5ReaderConfig(HDF5FormatPlugin plugin, HDF5FormatConfig formatConfig) {
+      this.plugin = plugin;
+      this.formatConfig = formatConfig;
+      this.defaultPath = formatConfig.getDefaultPath();
+    }
+  }
+
+  public HDF5BatchReader(HDF5ReaderConfig readerConfig) {
+    this.readerConfig = readerConfig;
+    this.formatConfig = readerConfig.formatConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    split = negotiator.split();
+    loader = negotiator.build();
+    openFile(negotiator);
+    return true;
+  }
+
+  private void openFile(FileSchemaNegotiator negotiator) {
+    InputStream in;
+    try {
+      in = negotiator.fileSystem().open(split.getPath());
+      IHDF5Factory factory = HDF5FactoryProvider.get();
+      this.infile = convertInputStreamToFile(in);
+      this.HDF5reader = factory.openForReading(infile);
+    } catch (Exception e) {
+      throw UserException
+        .dataReadError(e)
+        .message("Failed to open input file: %s", split.getPath())
+        .build(logger);
+    }
+    reader = new BufferedReader(new InputStreamReader(in));
+  }
+
+  /**
+   * This function converts the Drill InputStream into a File object for the HDF5 library. This function
+   * exists due to a known limitation in the 

[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin 
for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347711097
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##
 @@ -0,0 +1,887 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import ch.systemsx.cisd.hdf5.HDF5CompoundMemberInformation;
+import ch.systemsx.cisd.hdf5.HDF5DataClass;
+import ch.systemsx.cisd.hdf5.HDF5DataSetInformation;
+import ch.systemsx.cisd.hdf5.HDF5FactoryProvider;
+import ch.systemsx.cisd.hdf5.HDF5LinkInformation;
+import ch.systemsx.cisd.hdf5.IHDF5Factory;
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.commons.io.IOUtils;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.expr.holders.BigIntHolder;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.file.StandardCopyOption;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.BitSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+
+public class HDF5BatchReader implements ManagedReader<FileSchemaNegotiator> {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(HDF5BatchReader.class);
+  private FileSplit split;
+  private HDF5FormatConfig formatConfig;
+  private ResultSetLoader loader;
+  private String tempFileName;
+  private IHDF5Reader HDF5reader;
+  private File infile;
+  private BufferedReader reader;
+  protected HDF5ReaderConfig readerConfig;
+  private boolean finish;
+
+  public static class HDF5ReaderConfig {
+    protected final HDF5FormatPlugin plugin;
+    protected TupleMetadata schema;
+    protected String defaultPath;
+    protected HDF5FormatConfig formatConfig;
+
+    public HDF5ReaderConfig(HDF5FormatPlugin plugin, HDF5FormatConfig formatConfig) {
+      this.plugin = plugin;
+      this.formatConfig = formatConfig;
+      this.defaultPath = formatConfig.getDefaultPath();
+    }
+  }
+
+  public HDF5BatchReader(HDF5ReaderConfig readerConfig) {
+    this.readerConfig = readerConfig;
+    this.formatConfig = readerConfig.formatConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    split = negotiator.split();
+    loader = negotiator.build();
+    openFile(negotiator);
+    return true;
+  }
+
+  private void openFile(FileSchemaNegotiator negotiator) {
+    InputStream in;
+    try {
+      in = negotiator.fileSystem().open(split.getPath());
+      IHDF5Factory factory = HDF5FactoryProvider.get();
+      this.infile = convertInputStreamToFile(in);
+      this.HDF5reader = factory.openForReading(infile);
+    } catch (Exception e) {
+      throw UserException
+        .dataReadError(e)
+        .message("Failed to open input file: %s", split.getPath())
+        .build(logger);
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the

[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin 
for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347708891
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5Utils.java
 ##
 @@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import ch.systemsx.cisd.hdf5.HDF5DataClass;
+import ch.systemsx.cisd.hdf5.HDF5EnumerationValue;
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import ch.systemsx.cisd.hdf5.HDF5DataSetInformation;
+import ch.systemsx.cisd.hdf5.HDF5DataTypeInformation;
+import org.apache.commons.lang3.ArrayUtils;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+
+import java.io.IOException;
+import java.util.BitSet;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class HDF5Utils {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(HDF5Utils.class);
+
+  private static final boolean caseSensitive = false;
+
+  /**
+   * This function returns an HDF5Attribute object for use when Drill maps the attributes.
+   *
+   * @param pathName the path to the dataset or group whose attribute is requested
+   * @param key the name of the attribute to retrieve
+   * @param reader the HDF5 reader for the open file
+   * @return the requested attribute, or null if the path or attribute does not exist
+   * @throws IOException if the file cannot be read
+   */
+  public static HDF5Attribute getAttribute(String pathName, final String key, IHDF5Reader reader) throws IOException {
+if (pathName.equals("")) {
+  pathName = "/";
+}
+
+if (!reader.exists(pathName)) {
+  return null;
+}
+
+if (key.equals("dimensions")) {
+  final HDF5DataSetInformation datasetInfo = 
reader.object().getDataSetInformation(pathName);
+  final long[] dimensions = datasetInfo.getDimensions();
+  ArrayUtils.reverse(dimensions);
+  return new HDF5Attribute(MinorType.LIST, "dimensions", dimensions);
+}
+
+if (key.equals("dataType")) {
+  final HDF5DataSetInformation datasetInfo = 
reader.object().getDataSetInformation(pathName);
+  return new HDF5Attribute(getDataType(datasetInfo), "DataType", 
datasetInfo.getTypeInformation().getDataClass());
+}
+
+if (!reader.object().hasAttribute(pathName, key)) {
+  return null;
+}
+
+HDF5DataTypeInformation attributeInfo = 
reader.object().getAttributeInformation(pathName, key);
+Class<?> type = attributeInfo.tryGetJavaType();
+if (type.isAssignableFrom(long[].class)) {
+  if (attributeInfo.isSigned()) {
+return new HDF5Attribute(MinorType.BIGINT, key, 
reader.int64().getAttr(pathName, key));
+  } else {
+return new HDF5Attribute(MinorType.BIGINT, key, 
reader.uint64().getAttr(pathName, key));
+  }
+} else if (type.isAssignableFrom(int[].class)) {
+  if (attributeInfo.isSigned()) {
+return new HDF5Attribute(MinorType.INT, key, 
reader.int32().getAttr(pathName, key));
+  } else {
+return new HDF5Attribute(MinorType.INT, key, 
reader.uint32().getAttr(pathName, key));
+  }
+} else if (type.isAssignableFrom(short[].class)) {
+  if (attributeInfo.isSigned()) {
+return new HDF5Attribute(MinorType.INT, key, 
reader.int16().getAttr(pathName, key));
+  } else {
+return new HDF5Attribute(MinorType.INT, key, 
reader.uint16().getAttr(pathName, key));
+  }
+} else if (type.isAssignableFrom(byte[].class)) {
+  if (attributeInfo.isSigned()) {
+return new HDF5Attribute(MinorType.INT, key, 
reader.int8().getAttr(pathName, key));
+  } else {
+return new HDF5Attribute(MinorType.INT, key, 
reader.uint8().getAttr(pathName, key));
+  }
+} else if (type.isAssignableFrom(double[].class)) {
+  return new HDF5Attribute(MinorType.FLOAT8, key, 
reader.float64().getAttr(pathName, key));
+} else if (type.isAssignableFrom(float[].class)) {
+  return new HDF5Attribute(MinorType.FLOAT8, key, 
reader.float32().getAttr(pathName, key));
+} else if (type.isAssignableFrom(String[].class)) {
+  return new HDF5Attribute(MinorType.VARCHAR, key, 
reader.string().getAttr(pathName, key));
+} else if (type.isAssignableFrom(long.class)) {
+  if (attributeInfo.isSigned()) {
+return new HDF5Attribute(Min
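
The quote is cut off above, but the type-dispatch logic is complete enough to illustrate a call site. A hedged usage sketch, assuming the demo class lives in the same package as HDF5Utils; the file name, dataset path, and attribute key are invented for illustration:

```java
import java.io.File;
import java.io.IOException;

import ch.systemsx.cisd.hdf5.HDF5FactoryProvider;
import ch.systemsx.cisd.hdf5.IHDF5Reader;

public class GetAttributeDemo {
  public static void main(String[] args) throws IOException {
    // Open a local HDF5 file directly with the JHDF5 factory.
    IHDF5Reader reader = HDF5FactoryProvider.get().openForReading(new File("test.h5"));
    try {
      // getAttribute() returns null when the path or attribute is missing.
      HDF5Attribute attr = HDF5Utils.getAttribute("/group1/dataset1", "units", reader);
      System.out.println(attr == null ? "attribute not found" : attr);
    } finally {
      reader.close();
    }
  }
}
```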

[GitHub] [drill] paul-rogers commented on issue #1901: DRILL-7388: Kafka improvements

2019-11-18 Thread GitBox
paul-rogers commented on issue #1901: DRILL-7388: Kafka improvements
URL: https://github.com/apache/drill/pull/1901#issuecomment-555231426
 
 
   @arina-ielchiieva, we do now have multiple plugins using EVF: text (which I did) and several that Charles has done, including Pcap, HDF5, and regex.
   
   You are correct: since the Kafka reader uses the JSON reader, it makes sense to get that merged as part of the JSON format reader. Once that is stable, we can apply it here (and in Charles' HTTP storage plugin).
   
   The JSON format plugin will get merged once I finish resolving the 
batch/container record count issues. Those were causing unexpected test 
failures when I enabled the new JSON reader.
   
   So, nothing yet to do on Kafka for EVF; but your changes will make the 
eventual upgrade easier. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on issue #1901: DRILL-7388: Kafka improvements

2019-11-18 Thread GitBox
arina-ielchiieva commented on issue #1901: DRILL-7388: Kafka improvements
URL: https://github.com/apache/drill/pull/1901#issuecomment-555204411
 
 
   @vvysotskyi / @paul-rogers thanks for the code review. Addressed code review 
comments.
   
   @paul-rogers, regarding moving Kafka to EVF: does it depend on the JSON format conversion to EVF, or will `org.apache.drill.exec.vector.complex.fn.JsonReader` remain unchanged with only the code that uses it changing? Also, it looks like we have a couple of examples of format plugins on EVF; do we have EVF-based plugins? Only the Mock plugin?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1901: DRILL-7388: Kafka improvements

2019-11-18 Thread GitBox
paul-rogers commented on a change in pull request #1901: DRILL-7388: Kafka 
improvements
URL: https://github.com/apache/drill/pull/1901#discussion_r347570726
 
 

 ##
 File path: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/ReadOptions.java
 ##
 @@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.kafka;
+
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.server.options.OptionManager;
+
+import java.util.StringJoiner;
+
+/**
+ * Holds all system / session options that are used during data read from 
Kafka.
+ */
+public class ReadOptions {
+
+  private final String messageReader;
+  private final long pollTimeOut;
+  private final boolean allTextMode;
+  private final boolean readNumbersAsDouble;
+  private final boolean enableUnionType;
+  private final boolean skipInvalidRecords;
+  private final boolean allowNanInf;
+  private final boolean allowEscapeAnyChar;
+
+  public ReadOptions(OptionManager optionManager) {
+this.messageReader = 
optionManager.getString(ExecConstants.KAFKA_RECORD_READER);
+this.pollTimeOut = optionManager.getLong(ExecConstants.KAFKA_POLL_TIMEOUT);
+this.allTextMode = 
optionManager.getBoolean(ExecConstants.KAFKA_ALL_TEXT_MODE);
+this.readNumbersAsDouble = 
optionManager.getBoolean(ExecConstants.KAFKA_READER_READ_NUMBERS_AS_DOUBLE);
+this.enableUnionType = 
optionManager.getBoolean(ExecConstants.ENABLE_UNION_TYPE_KEY);
+this.skipInvalidRecords = 
optionManager.getBoolean(ExecConstants.KAFKA_READER_SKIP_INVALID_RECORDS);
+this.allowNanInf = 
optionManager.getBoolean(ExecConstants.KAFKA_READER_NAN_INF_NUMBERS);
+this.allowEscapeAnyChar = 
optionManager.getBoolean(ExecConstants.KAFKA_READER_ESCAPE_ANY_CHAR);
 
 Review comment:
   Nicely done; this is a clean, simple way to hide option complexity from the 
rest of the code.
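
For readers skimming the digest, a self-contained sketch of the pattern being praised here: resolve every option once in a constructor, then expose plain getters. The class below uses a Map as a stand-in for Drill's OptionManager, and all names and option keys are invented for illustration:

```java
import java.util.Map;

// Demo of the "options holder" pattern: one up-front resolution of
// configuration, then simple getters for the rest of the code.
class DemoReadOptions {
  private final long pollTimeout;
  private final boolean skipInvalidRecords;

  DemoReadOptions(Map<String, String> options) {  // stand-in for an OptionManager
    this.pollTimeout = Long.parseLong(options.getOrDefault("kafka.poll.timeout", "200"));
    this.skipInvalidRecords = Boolean.parseBoolean(options.getOrDefault("kafka.skip.invalid", "false"));
  }

  long getPollTimeout() { return pollTimeout; }
  boolean isSkipInvalidRecords() { return skipInvalidRecords; }
}
```

The rest of the reader code then branches on these plain fields and never touches the option-lookup machinery directly.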


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on issue #1902: DRILL-7448: Fix warnings when running Drill memory tests

2019-11-18 Thread GitBox
vvysotskyi commented on issue #1902: DRILL-7448: Fix warnings when running 
Drill memory tests
URL: https://github.com/apache/drill/pull/1902#issuecomment-555173031
 
 
   @KazydubB, thanks for fixing it. Could you please also add the Janino dependency in the other places where `drill-common:tests` is used, to avoid possible issues for unit tests in other modules? Also, please add a comment above the added dependencies explaining why they need to be present.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] KazydubB opened a new pull request #1902: DRILL-7448: Fix warnings when running Drill memory tests

2019-11-18 Thread GitBox
KazydubB opened a new pull request #1902: DRILL-7448: Fix warnings when running 
Drill memory tests
URL: https://github.com/apache/drill/pull/1902
 
 
   Jira - [DRILL-7448](https://issues.apache.org/jira/browse/DRILL-7448)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1901: DRILL-7388: Kafka improvements

2019-11-18 Thread GitBox
vvysotskyi commented on a change in pull request #1901: DRILL-7388: Kafka 
improvements
URL: https://github.com/apache/drill/pull/1901#discussion_r347534328
 
 

 ##
 File path: 
contrib/storage-kafka/src/main/java/org/apache/drill/exec/store/kafka/KafkaPartitionScanSpecBuilder.java
 ##
 @@ -31,30 +31,30 @@
 import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
 import org.apache.kafka.common.TopicPartition;
 import org.apache.kafka.common.serialization.ByteArrayDeserializer;
+
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
-import java.util.concurrent.TimeUnit;
 
-public class KafkaPartitionScanSpecBuilder extends
-AbstractExprVisitor<List<KafkaPartitionScanSpec>,Void,RuntimeException> {
-  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(KafkaPartitionScanSpecBuilder.class);
+public class KafkaPartitionScanSpecBuilder
+  extends AbstractExprVisitor<List<KafkaPartitionScanSpec>,Void,RuntimeException>
 
 Review comment:
   ```suggestion
  extends AbstractExprVisitor<List<KafkaPartitionScanSpec>, Void, RuntimeException>
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi merged pull request #1900: DRILL-7446: Fix Eclipse compilation issue in AbstractParquetGroupScan

2019-11-18 Thread GitBox
vvysotskyi merged pull request #1900: DRILL-7446: Fix Eclipse compilation issue 
in AbstractParquetGroupScan
URL: https://github.com/apache/drill/pull/1900
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi edited a comment on issue #1895: DRILL-6540: Upgrade to HADOOP-3.x libraries

2019-11-18 Thread GitBox
vvysotskyi edited a comment on issue #1895: DRILL-6540: Upgrade to HADOOP-3.x 
libraries
URL: https://github.com/apache/drill/pull/1895#issuecomment-555059751
 
 
   @agozhiy, could you please add these binary files to the project sources and add a README.md with information on how to build them.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin 
for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347457857
 
 

 ##
 File path: contrib/format-hdf5/README.md
 ##
 @@ -0,0 +1,129 @@
+# Drill HDF5 Format Plugin
+Per Wikipedia, Hierarchical Data Format (HDF) is a set of file formats designed to store and organize large amounts of data. Originally developed at the National Center for Supercomputing Applications, it is supported by The HDF Group, a non-profit corporation whose mission is to ensure continued development of HDF5 technologies and the continued accessibility of data stored in HDF.
 
 Review comment:
   Fixed


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin 
for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347458039
 
 

 ##
 File path: contrib/format-hdf5/src/main/resources/bootstrap-format-plugins.json
 ##
 @@ -0,0 +1,26 @@
+{
+  "storage":{
+"dfs": {
+  "type": "file",
+  "formats": {
+"hdf5": {
+  "type": "hdf5",
+  "extensions": [
+"h5"
+  ]
+}
+  }
+},
+"s3": {
+  "type": "file",
+  "formats": {
+"hdf5": {
+  "type": "hdf5",
+  "extensions": [
+"h5"
+  ]
+}
+  }
+}
+  }
+}
 
 Review comment:
   Fixed


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on issue #1895: DRILL-6540: Upgrade to HADOOP-3.x libraries

2019-11-18 Thread GitBox
vvysotskyi commented on issue #1895: DRILL-6540: Upgrade to HADOOP-3.x libraries
URL: https://github.com/apache/drill/pull/1895#issuecomment-555059751
 
 
   @agozhiy, since the winutils binaries are hosted in your repository, could you please add a `README.md` file with a link to the repository and the steps for publishing new versions. `Calcite.md` may be used as an example.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin 
for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347429261
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5MapDataWriter.java
 ##
 @@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5.writers;
+
+import ch.systemsx.cisd.hdf5.HDF5CompoundMemberInformation;
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.List;
+
+public class HDF5MapDataWriter extends HDF5DataWriter {
+  private static final Logger logger = 
LoggerFactory.getLogger(HDF5MapDataWriter.class);
+
+  private final String UNSAFE_SPACE_SEPARATOR = " ";
+
+  private final String SAFE_SPACE_SEPARATOR = "_";
+
+  private ArrayList<HDF5DataWriter> dataWriters;
+
+  private ArrayList<String> fieldNames;
+
+
+  public HDF5MapDataWriter(IHDF5Reader reader, RowSetLoader columnWriter, 
String datapath) {
+super(reader, columnWriter, datapath);
+fieldNames = new ArrayList<>();
+
+compoundData = reader.compounds().readArray(datapath, Object[].class);
+dataWriters = new ArrayList<>();
+fieldNames = getFieldNames();
+try {
+  getDataWriters();
+} catch (Exception e) {
+  throw UserException.dataReadError().addContext("Error writing Compound 
Field: ").addContext(e.getMessage()).build(logger);
 
 Review comment:
   Fixed


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin 
for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347429397
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java
 ##
 @@ -0,0 +1,1210 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import ch.systemsx.cisd.hdf5.HDF5CompoundMemberInformation;
+import ch.systemsx.cisd.hdf5.HDF5DataClass;
+import ch.systemsx.cisd.hdf5.HDF5DataSetInformation;
+import ch.systemsx.cisd.hdf5.HDF5FactoryProvider;
+import ch.systemsx.cisd.hdf5.HDF5LinkInformation;
+import ch.systemsx.cisd.hdf5.IHDF5Factory;
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.commons.io.IOUtils;
+import org.apache.drill.common.config.DrillConfig;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.expr.holders.BigIntHolder;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MapBuilder;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.hdf5.writers.HDF5DataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5DoubleDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5EnumDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5FloatDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5IntDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5LongDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5MapDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5StringDataWriter;
+import org.apache.drill.exec.store.hdf5.writers.HDF5TimestampDataWriter;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+import org.apache.drill.shaded.guava.com.google.common.io.Files;
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.file.StandardCopyOption;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.BitSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class HDF5BatchReader implements ManagedReader<FileSchemaNegotiator> {
+  private static final Logger logger = 
LoggerFactory.getLogger(HDF5BatchReader.class);
+
+  private static final String PATH_COLUMN_NAME = "path";
+
+  private static final String DATA_TYPE_COLUMN_NAME = "data_type";
+
+  private static final String FILE_NAME_COLUMN_NAME = "file_name";
+
+  private static final String INT_COLUMN_PREFIX = "int_col_";
+
+  private static final String LONG_COLUMN_PREFIX = "long_col_";
+
+  private static final String FLOAT_COLUMN_PREFIX = "float_col_";
+
+  private static final String DOUBLE_COLUMN_PREFIX = "double_col_";
+
+  private static final String INT_COLUMN_NAME = "int_data";
+
+  private static final String FLOAT_COLUMN_NAME = "float_data";
+
+  private static final String DOUBLE_COLUMN_NAME = "double_data";
+
+  private static final String LONG_COLUMN_NAME = "long_data";
+
+  private static final String STRING_COLUMN_NAME = "string_data";
+
+  private static final String BOOLEAN_COLUMN_NAME = "boolean_data";
+
+  private static final String PATH_PATTERN_REGEX = "/*.*/(.+?)$";

[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin 
for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347429310
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/writers/HDF5MapDataWriter.java
 ##
 @@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5.writers;
+
+import ch.systemsx.cisd.hdf5.HDF5CompoundMemberInformation;
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.ArrayList;
+import java.util.List;
+
+public class HDF5MapDataWriter extends HDF5DataWriter {
+  private static final Logger logger = 
LoggerFactory.getLogger(HDF5MapDataWriter.class);
+
+  private final String UNSAFE_SPACE_SEPARATOR = " ";
+
+  private final String SAFE_SPACE_SEPARATOR = "_";
+
+  private ArrayList<HDF5DataWriter> dataWriters;
+
+  private ArrayList<String> fieldNames;
 
 Review comment:
   Fixed


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] cgivre commented on issue #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on issue #1778: Drill-7233: Format Plugin for HDF5
URL: https://github.com/apache/drill/pull/1778#issuecomment-555052931
 
 
   @paul-rogers @arina-ielchiieva 
   I still have to fix one thing in the `HDFUtils` class and check on Windows 
environment, so please hold off on review until tomorrow. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] agozhiy commented on issue #1895: DRILL-6540: Upgrade to HADOOP-3.x libraries

2019-11-18 Thread GitBox
agozhiy commented on issue #1895: DRILL-6540: Upgrade to HADOOP-3.x libraries
URL: https://github.com/apache/drill/pull/1895#issuecomment-555051790
 
 
   As it turned out, the new Hadoop library version required updating hadoop-winutils, though there is only an artifact for version 2.7.0 in the Maven repository. So I created the needed artifact using jitpack.io. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin 
for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347427267
 
 

 ##
 File path: 
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5Utils.java
 ##
 @@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import ch.systemsx.cisd.hdf5.HDF5DataClass;
+import ch.systemsx.cisd.hdf5.HDF5EnumerationValue;
+import ch.systemsx.cisd.hdf5.IHDF5Reader;
+import ch.systemsx.cisd.hdf5.HDF5DataSetInformation;
+import ch.systemsx.cisd.hdf5.HDF5DataTypeInformation;
+import org.apache.commons.lang3.ArrayUtils;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+
+import java.io.IOException;
+import java.util.BitSet;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class HDF5Utils {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(HDF5Utils.class);
+
+  private static final boolean caseSensitive = false;
+
+  /**
+   * This function returns an HDF5Attribute object for use when Drill maps the attributes.
+   *
+   * @param pathName the path to the dataset or group whose attribute is requested
+   * @param key the name of the attribute to retrieve
+   * @param reader the HDF5 reader for the open file
+   * @return the requested attribute, or null if the path or attribute does not exist
+   * @throws IOException if the file cannot be read
+   */
+  public static HDF5Attribute getAttribute(String pathName, final String key, IHDF5Reader reader) throws IOException {
+if (pathName.equals("")) {
+  pathName = "/";
+}
+
+if (!reader.exists(pathName)) {
+  return null;
+}
+
+if (key.equals("dimensions")) {
+  final HDF5DataSetInformation datasetInfo = 
reader.object().getDataSetInformation(pathName);
+  final long[] dimensions = datasetInfo.getDimensions();
+  ArrayUtils.reverse(dimensions);
+  return new HDF5Attribute(MinorType.LIST, "dimensions", dimensions);
+}
+
+if (key.equals("dataType")) {
+  final HDF5DataSetInformation datasetInfo = 
reader.object().getDataSetInformation(pathName);
+  return new HDF5Attribute(getDataType(datasetInfo), "DataType", 
datasetInfo.getTypeInformation().getDataClass());
+}
+
+if (!reader.object().hasAttribute(pathName, key)) {
+  return null;
+}
+
+HDF5DataTypeInformation attributeInfo = 
reader.object().getAttributeInformation(pathName, key);
+Class<?> type = attributeInfo.tryGetJavaType();
+if (type.isAssignableFrom(long[].class)) {
+  if (attributeInfo.isSigned()) {
+return new HDF5Attribute(MinorType.BIGINT, key, 
reader.int64().getAttr(pathName, key));
+  } else {
+return new HDF5Attribute(MinorType.BIGINT, key, 
reader.uint64().getAttr(pathName, key));
+  }
+} else if (type.isAssignableFrom(int[].class)) {
+  if (attributeInfo.isSigned()) {
+return new HDF5Attribute(MinorType.INT, key, 
reader.int32().getAttr(pathName, key));
+  } else {
+return new HDF5Attribute(MinorType.INT, key, 
reader.uint32().getAttr(pathName, key));
+  }
+} else if (type.isAssignableFrom(short[].class)) {
+  if (attributeInfo.isSigned()) {
+return new HDF5Attribute(MinorType.INT, key, 
reader.int16().getAttr(pathName, key));
+  } else {
+return new HDF5Attribute(MinorType.INT, key, 
reader.uint16().getAttr(pathName, key));
+  }
+} else if (type.isAssignableFrom(byte[].class)) {
+  if (attributeInfo.isSigned()) {
+return new HDF5Attribute(MinorType.INT, key, 
reader.int8().getAttr(pathName, key));
+  } else {
+return new HDF5Attribute(MinorType.INT, key, 
reader.uint8().getAttr(pathName, key));
+  }
+} else if (type.isAssignableFrom(double[].class)) {
+  return new HDF5Attribute(MinorType.FLOAT8, key, 
reader.float64().getAttr(pathName, key));
+} else if (type.isAssignableFrom(float[].class)) {
+  return new HDF5Attribute(MinorType.FLOAT8, key, 
reader.float32().getAttr(pathName, key));
+} else if (type.isAssignableFrom(String[].class)) {
+  return new HDF5Attribute(MinorType.VARCHAR, key, 
reader.string().getAttr(pathName, key));
+} else if (type.isAssignableFrom(long.class)) {
+  if (attributeInfo.isSigned()) {
+return new HDF5Attribute(Min

[GitHub] [drill] cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin for HDF5

2019-11-18 Thread GitBox
cgivre commented on a change in pull request #1778: Drill-7233: Format Plugin 
for HDF5
URL: https://github.com/apache/drill/pull/1778#discussion_r347427673
 
 

 ##
 File path: 
contrib/format-hdf5/src/test/java/org/apache/drill/exec/store/hdf5/TestHDF5Utils.java
 ##
 @@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.hdf5;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNull;
+
+public class TestHDF5Utils {
+
+  @Test
+  public void testGetNameFromPath() {
+String path1 = "/group1";
+String path2 = "/group1/group2/group3";
+String emptyPath = "";
+String nullPath = null;
+String rootPath = "/";
+
+assertEquals(HDF5Utils.getNameFromPath(path1), "group1");
+assertEquals(HDF5Utils.getNameFromPath(path2), "group3");
+assertEquals(HDF5Utils.getNameFromPath(emptyPath), "");
+assertNull(HDF5Utils.getNameFromPath(nullPath));
+assertEquals(HDF5Utils.getNameFromPath(rootPath), "");
+  }
+
+}
 
 Review comment:
   I removed 1 unit test and the associated 1MB file.  There are still 4 test 
files but they are very small. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347386586
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/UnionWriterImpl.java
 ##
 @@ -332,6 +334,14 @@ public int writeIndex() {
 return index.vectorIndex();
   }
 
+  @Override
+  public void copy(ColumnReader from) {
+if (! from.isNull()) {
 
 Review comment:
   ```suggestion
   if (!from.isNull()) {
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347359164
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ResultSetCopierImpl.java
 ##
 @@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.physical.impl.protocol.BatchAccessor;
+import org.apache.drill.exec.physical.resultSet.ResultSetCopier;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.ResultSetReader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.physical.rowSet.RowSetReader;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ColumnReader;
+import org.apache.drill.exec.vector.accessor.ColumnWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+
+public class ResultSetCopierImpl implements ResultSetCopier {
+
+  private enum State {
+START,
+NO_SCHEMA,
+BETWEEN_BATCHES,
+BATCH_ACTIVE,
+NEW_SCHEMA,
+SCHEMA_PENDING,
+CLOSED
+  }
+
+  private interface BlockCopy {
 
 Review comment:
   do we really need the interface with only one implementation?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347342670
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/ResultSetCopier.java
 ##
 @@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet;
+
+import org.apache.drill.exec.physical.impl.aggregate.BatchIterator;
+import org.apache.drill.exec.record.VectorContainer;
+
+/**
+ * Copies rows from an input batch to an output batch. The input
+ * batch is assumed to have a selection vector, or the caller
+ * will pick the rows to copy.
+ * 
+ * Works to create full output batches to minimize per-batch
+ * overhead and to eliminate unnecessary empty batches if no
+ * rows are copied.
+ * 
+ * The output batches are assumed to have the same schema as
+ * input batches. (No projection occurs.) The output schema will
+ * change each time the input schema changes. (For an SV4, the
+ * the upstream operator must have ensured all batches covered
+ * by the SV4 have the same schema.)
+ * 
+ * This implementation works with a single stream of batches which,
+ * following Drill's rules, must consist of the same set of vectors on
+ * each non-schema-change batch.
+ *
+ * Protocol
+ * Overall lifecycle:
+ * <ol>
+ * <li>Create an instance of the
+ * {@link org.apache.drill.exec.physical.resultSet.impl.ResultSetCopierImpl
+ *  ResultSetCopierImpl} class, passing the input batch
+ *  accessor to the constructor.</li>
+ * <li>Loop to process each output batch as shown below. That is, continually
+ * process calls to the {@link BatchIterator#next()} method.</li>
+ * <li>Call {@link #close()}.</li>
+ * </ol>
+ *
+ *
+ * To build each output batch:
+ *
+ * 
+ * public IterOutcome next() {
+ *   copier.startBatch();
+ *   while (!copier.isFull()) {
+ *     copier.freeInput();
+ *     IterOutcome innerResult = inner.next();
+ *     if (innerResult == DONE) { break; }
+ *     copier.startInput();
+ *     copier.copyAll();
+ *   }
+ *   if (copier.hasRows()) {
+ *     outputContainer = copier.harvest();
+ *     return outputContainer.isSchemaChanged() ? OK_NEW_SCHEMA : OK;
+ *   } else { return DONE; }
+ * }
+ * 
+ * 
+ * The above assumes that the upstream operator can be polled
+ * multiple times in the DONE state. The extra polling is
+ * needed to handle any in-flight copies when the input
+ * exhausts its batches.
+ * 
+ * The above also shows that the copier handles and reports
+ * schema changes by setting the schema change flag in the
+ * output container. Real code must handle multiple calls to
+ * next() in the DONE state, and work around lack of such support
+ * in its input (perhaps by tracking a state).
+ * 
+ * An input batch is processed by copying the rows. Copying can be done
+ * row by row, via a row range, or by copying the entire input batch as
+ * shown in the example.
+ * Copying the entire batch makes sense when the input batch carries a
+ * selection vector that identifies which rows to copy, in which
+ * order.
+ * 
+ * Because we wish to fill the output batch, we may be able to copy
+ * part of a batch, the whole batch, or multiple batches to the output.
+ */
+
+public interface ResultSetCopier {
+
+  /**
+   * Start the next output batch.
+   */
+  void startBatch();
+
+  /**
+   * Start the next input batch. The input batch must be held
+   * by the VectorAccessor passed into the constructor.
+   */
+  void startInput();
+
+  /**
+   * If copying rows one by one, copy the next row from the
+   * input.
+   *
+   * @return true if more rows remain on the input, false
+   * if all rows are exhausted
+   */
+  boolean copyNext();
+
+  /**
+   * Copy a row at the given position. For those cases in
+   * which random copying is needed, but a selection vector
+   * is not available. Note that this version is slow because
+   * of the need to reset indexes for every row. Better to
+   * use a selection vector, then copy sequentially.
+   *
+   * @param posn the input row position. If a selection vector
+   * is attached, then this is the selection vector position
+   */
+  void copyRe

[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347366534
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ResultSetCopierImpl.java
 ##
 @@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.physical.impl.protocol.BatchAccessor;
+import org.apache.drill.exec.physical.resultSet.ResultSetCopier;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.ResultSetReader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.physical.rowSet.RowSetReader;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ColumnReader;
+import org.apache.drill.exec.vector.accessor.ColumnWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+
+public class ResultSetCopierImpl implements ResultSetCopier {
+
+  private enum State {
+START,
+NO_SCHEMA,
+BETWEEN_BATCHES,
+BATCH_ACTIVE,
+NEW_SCHEMA,
+SCHEMA_PENDING,
+CLOSED
+  }
+
+  private interface BlockCopy {
+void copy();
+boolean hasMore();
+  }
+
+  private class CopyAll implements BlockCopy {
+
+@Override
+public void copy() {
+  while (!rowWriter.isFull() && rowReader.next()) {
+project();
+  }
+}
+
+@Override
+public boolean hasMore() {
+  return rowReader.hasNext();
+}
+  }
+
+  private static class CopyPair {
+protected final ColumnWriter writer;
+protected final ColumnReader reader;
+
+protected CopyPair(ColumnWriter writer, ColumnReader reader) {
+  this.writer = writer;
+  this.reader = reader;
+}
+  }
+
+  // Input state
+
+  private int currentSchemaVersion = -1;
+  private final ResultSetReader resultSetReader;
+  protected RowSetReader rowReader;
+
+  // Output state
+
+  private final BufferAllocator allocator;
+  private final OptionBuilder writerOptions;
+  private ResultSetLoader resultSetWriter;
+  private RowSetLoader rowWriter;
+
+  // Copy state
+
+  private State state;
+  private CopyPair[] projection;
+  private BlockCopy activeCopy;
+
+  public ResultSetCopierImpl(BufferAllocator allocator, BatchAccessor 
inputBatch) {
+this(allocator, inputBatch, new OptionBuilder());
+  }
+
+  public ResultSetCopierImpl(BufferAllocator allocator, BatchAccessor 
inputBatch,
+  OptionBuilder outputOptions) {
+this.allocator = allocator;
+resultSetReader = new ResultSetReaderImpl(inputBatch);
+writerOptions = outputOptions;
+writerOptions.setVectorCache(new ResultVectorCacheImpl(allocator));
+state = State.START;
+  }
+
+  @Override
+  public void startBatch() {
+if (state == State.START) {
+
+  // No schema yet. Defer real batch start until we see an input
+  // batch.
+
+  state = State.NO_SCHEMA;
+  return;
+}
+Preconditions.checkState(state == State.BETWEEN_BATCHES || state == 
State.SCHEMA_PENDING);
+if (state == State.SCHEMA_PENDING) {
+
+  // We have a pending new schema. Create new writers to match.
+
+  createMapping();
+}
+resultSetWriter.startBatch();
+state = State.BATCH_ACTIVE;
+if (isCopyPending()) {
+
+  // Resume copying if a copy is active.
+
+  copyBlock();
+}
+  }
+
+  @Override
+  public void startInput() {
+Preconditions.checkState(state == State.NO_SCHEMA || state == 
State.NEW_SCHEMA ||
+ state == State.BATCH_ACTIVE,
+"Can only start input while in an output batch");
+Preconditions.checkState(!isCopyPending(),
+"Finish the pending copy before changing input");
+
+bindInput();
+
+if (state == State.BATCH_ACTIVE) {
+
+  // If no schema change, we are ready to copy.
+
+  if (currentSchemaVersion == 
res
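
   For orientation, a minimal sketch of driving this copier for a single
   input batch, using only the API shown above (startBatch, startInput,
   copyAll, hasRows, harvest, close). The allocator and input values are
   assumed to come from the enclosing operator; real code would loop over
   many input batches as in the interface's Javadoc example.
   ```java
   // Assumes a BufferAllocator "allocator" and a BatchAccessor "input"
   // supplied by the enclosing operator (hypothetical here).
   ResultSetCopier copier = new ResultSetCopierImpl(allocator, input);

   copier.startBatch();   // open an output batch
   copier.startInput();   // bind the currently buffered input batch
   copier.copyAll();      // copy every (selected) input row

   if (copier.hasRows()) {
     VectorContainer out = copier.harvest();  // take ownership of the output
     // ... hand "out" to the downstream operator ...
   }
   copier.close();
   ```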

[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347342851
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/ResultSetCopier.java
 ##
 @@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet;
+
+import org.apache.drill.exec.physical.impl.aggregate.BatchIterator;
+import org.apache.drill.exec.record.VectorContainer;
+
+/**
+ * Copies rows from an input batch to an output batch. The input
+ * batch is assumed to have a selection vector, or the caller
+ * will pick the rows to copy.
+ * 
+ * Works to create full output batches to minimize per-batch
+ * overhead and to eliminate unnecessary empty batches if no
+ * rows are copied.
+ * 
+ * The output batches are assumed to have the same schema as
+ * input batches. (No projection occurs.) The output schema will
+ * change each time the input schema changes. (For an SV4, the
+ * upstream operator must have ensured that all batches covered
+ * by the SV4 have the same schema.)
+ * 
+ * This implementation works with a single stream of batches which,
+ * following Drill's rules, must consist of the same set of vectors on
+ * each non-schema-change batch.
+ *
+ * Protocol
+ * Overall lifecycle:
+ * 
+ * Create an instance of the
+ * {@link org.apache.drill.exec.physical.resultSet.impl.ResultSetCopierImpl
+ *  ResultSetCopierImpl} class, passing the input batch
+ *  accessor to the constructor.
+ * Loop to process each output batch as shown below. That is, continually
+ * process calls to the {@link BatchIterator#next()} method.
+ * Call {@link #close()}.
+ * 
+ * 
+ *
+ * To build each output batch:
+ *
+ * 
+ * public IterOutcome next() {
+ *   copier.startBatch();
+ *   while (!copier.isFull()) {
+ * copier.freeInput();
+ * IterOutcome innerResult = inner.next();
+ * if (innerResult == DONE) { break; }
+ * copier.startInput();
+ * copier.copyAll();
+ *   }
+ *   if (copier.hasRows()) {
+ * outputContainer = copier.harvest();
+ * return outputContainer.isSchemaChanged() ? OK_NEW_SCHEMA : OK;
+ *   } else { return DONE; }
+ * }
+ * 
+ * 
+ * The above assumes that the upstream operator can be polled
+ * multiple times in the DONE state. The extra polling is
+ * needed to handle any in-flight copies when the input
+ * exhausts its batches.
+ * 
+ * The above also shows that the copier handles and reports
+ * schema changes by setting the schema change flag in the
+ * output container. Real code must handle multiple calls to
+ * next() in the DONE state, and work around any lack of such support
+ * in its input (perhaps by tracking that state itself).
+ * 
+ * An input batch is processed by copying the rows. Copying can be done
+ * row by row, via a row range, or by copying the entire input batch as
+ * shown in the example.
+ * Copying the entire batch makes sense when the input batch carries a
+ * selection vector that identifies which rows to copy, and in which
+ * order.
+ * 
+ * Because we wish to fill the output batch, we may be able to copy
+ * part of a batch, the whole batch, or multiple batches to the output.
+ */
+
+public interface ResultSetCopier {
+
+  /**
+   * Start the next output batch.
+   */
+  void startBatch();
+
+  /**
+   * Start the next input batch. The input batch must be held
+   * by the VectorAccessor passed into the constructor.
+   */
+  void startInput();
+
+  /**
+   * If copying rows one by one, copy the next row from the
+   * input.
+   *
+   * @return true if more rows remain on the input, false
+   * if all rows are exhausted
+   */
+  boolean copyNext();
+
+  /**
+   * Copy a row at the given position. For those cases in
+   * which random copying is needed, but a selection vector
+   * is not available. Note that this version is slow because
+   * of the need to reset indexes for every row. Better to
+   * use a selection vector, then copy sequentially.
+   *
+   * @param posn the input row position. If a selection vector
+   * is attached, then this is the selection vector position
+   */
+  void copyRe

[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347343283
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/ResultSetCopier.java
 ##
 @@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet;
+
+import org.apache.drill.exec.physical.impl.aggregate.BatchIterator;
+import org.apache.drill.exec.record.VectorContainer;
+
+/**
+ * Copies rows from an input batch to an output batch. The input
+ * batch is assumed to have a selection vector, or the caller
+ * will pick the rows to copy.
+ * 
+ * Works to create full output batches to minimize per-batch
+ * overhead and to eliminate unnecessary empty batches if no
+ * rows are copied.
+ * 
+ * The output batches are assumed to have the same schema as
+ * input batches. (No projection occurs.) The output schema will
+ * change each time the input schema changes. (For an SV4, the
+ * upstream operator must have ensured that all batches covered
+ * by the SV4 have the same schema.)
+ * 
+ * This implementation works with a single stream of batches which,
+ * following Drill's rules, must consist of the same set of vectors on
+ * each non-schema-change batch.
+ *
+ * Protocol
+ * Overall lifecycle:
+ * 
+ * Create an instance of the
+ * {@link org.apache.drill.exec.physical.resultSet.impl.ResultSetCopierImpl
+ *  ResultSetCopierImpl} class, passing the input batch
+ *  accessor to the constructor.
+ * Loop to process each output batch as shown below. That is, continually
+ * process calls to the {@link BatchIterator#next()} method.
+ * Call {@link #close()}.
+ * 
+ * 
+ *
+ * To build each output batch:
+ *
+ * 
+ * public IterOutcome next() {
+ *   copier.startBatch();
+ *   while (!copier.isFull()) {
+ * copier.freeInput();
+ * IterOutcome innerResult = inner.next();
+ * if (innerResult == DONE) { break; }
+ * copier.startInput();
+ * copier.copyAll();
+ *   }
+ *   if (copier.hasRows()) {
+ * outputContainer = copier.harvest();
+ * return outputContainer.isSchemaChanged() ? OK_NEW_SCHEMA : OK;
+ *   } else { return DONE; }
+ * }
+ * 
+ * 
+ * The above assumes that the upstream operator can be polled
+ * multiple times in the DONE state. The extra polling is
+ * needed to handle any in-flight copies when the input
+ * exhausts its batches.
+ * 
+ * The above also shows that the copier handles and reports
+ * schema changes by setting the schema change flag in the
+ * output container. Real code must handle multiple calls to
+ * next() in the DONE state, and work around any lack of such support
+ * in its input (perhaps by tracking that state itself).
+ * 
+ * An input batch is processed by copying the rows. Copying can be done
+ * row by row, via a row range, or by copying the entire input batch as
+ * shown in the example.
+ * Copying the entire batch makes sense when the input batch carries a
+ * selection vector that identifies which rows to copy, and in which
+ * order.
+ * 
+ * Because we wish to fill the output batch, we may be able to copy
+ * part of a batch, the whole batch, or multiple batches to the output.
+ */
+
+public interface ResultSetCopier {
+
+  /**
+   * Start the next output batch.
+   */
+  void startBatch();
+
+  /**
+   * Start the next input batch. The input batch must be held
+   * by the VectorAccessor passed into the constructor.
+   */
+  void startInput();
+
+  /**
+   * If copying rows one by one, copy the next row from the
+   * input.
+   *
+   * @return true if more rows remain on the input, false
+   * if all rows are exhausted
+   */
+  boolean copyNext();
+
+  /**
+   * Copy a row at the given position. For those cases in
+   * which random copying is needed, but a selection vector
+   * is not available. Note that this version is slow because
+   * of the need to reset indexes for every row. Better to
+   * use a selection vector, then copy sequentially.
+   *
+   * @param posn the input row position. If a selection vector
+   * is attached, then this is the selection vector position
+   */
+  void copyRe

[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347342043
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/ResultSetCopier.java
 ##
 @@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet;
+
+import org.apache.drill.exec.physical.impl.aggregate.BatchIterator;
+import org.apache.drill.exec.record.VectorContainer;
+
+/**
+ * Copies rows from an input batch to an output batch. The input
+ * batch is assumed to have a selection vector, or the caller
+ * will pick the rows to copy.
+ * 
+ * Works to create full output batches to minimize per-batch
+ * overhead and to eliminate unnecessary empty batches if no
+ * rows are copied.
+ * 
+ * The output batches are assumed to have the same schema as
+ * input batches. (No projection occurs.) The output schema will
+ * change each time the input schema changes. (For an SV4, the
+ * upstream operator must have ensured that all batches covered
+ * by the SV4 have the same schema.)
+ * 
+ * This implementation works with a single stream of batches which,
+ * following Drill's rules, must consist of the same set of vectors on
+ * each non-schema-change batch.
+ *
+ * Protocol
+ * Overall lifecycle:
+ * 
+ * Create an instance of the
+ * {@link org.apache.drill.exec.physical.resultSet.impl.ResultSetCopierImpl
+ *  ResultSetCopierImpl} class, passing the input batch
+ *  accessor to the constructor.
+ * Loop to process each output batch as shown below. That is, continually
+ * process calls to the {@link BatchIterator#next()} method.
+ * Call {@link #close()}.
+ * 
+ * 
+ *
+ * To build each output batch:
+ *
+ * 
+ * public IterOutcome next() {
+ *   copier.startBatch();
+ *   while (!copier.isFull()) {
+ * copier.freeInput();
+ * IterOutcome innerResult = inner.next();
+ * if (innerResult == DONE) { break; }
+ * copier.startInput();
+ * copier.copyAll();
+ *   }
+ *   if (copier.hasRows()) {
+ * outputContainer = copier.harvest();
+ * return outputContainer.isSchemaChanged() ? OK_NEW_SCHEMA : OK;
+ *   } else { return DONE; }
+ * }
+ * 
+ * 
+ * The above assumes that the upstream operator can be polled
+ * multiple times in the DONE state. The extra polling is
+ * needed to handle any in-flight copies when the input
+ * exhausts its batches.
+ * 
+ * The above also shows that the copier handles and reports
+ * schema changes by setting the schema change flag in the
+ * output container. Real code must handle multiple calls to
+ * next() in the DONE state, and work around any lack of such support
+ * in its input (perhaps by tracking that state itself).
+ * 
+ * An input batch is processed by copying the rows. Copying can be done
+ * row by row, via a row range, or by copying the entire input batch as
+ * shown in the example.
+ * Copying the entire batch makes sense when the input batch carries a
+ * selection vector that identifies which rows to copy, and in which
+ * order.
+ * 
+ * Because we wish to fill the output batch, we may be able to copy
+ * part of a batch, the whole batch, or multiple batches to the output.
+ */
+
+public interface ResultSetCopier {
+
+  /**
+   * Start the next output batch.
+   */
+  void startBatch();
+
+  /**
+   * Start the next input batch. The input batch must be held
+   * by the VectorAccessor passed into the constructor.
+   */
+  void startInput();
+
+  /**
+   * If copying rows one by one, copy the next row from the
+   * input.
+   *
+   * @return true if more rows remain on the input, false
+   * if all rows are exhausted
+   */
+  boolean copyNext();
+
+  /**
+   * Copy a row at the given position. For those cases in
+   * which random copying is needed, but a selection vector
+   * is not available. Note that this version is slow because
+   * of the need to reset indexes for every row. Better to
+   * use a selection vector, then copy sequentially.
+   *
+   * @param posn the input row position. If a selection vector
+   * is attached, then this is the selection vector position
+   */
+  void copyRe

[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347341277
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/ResultSetCopier.java
 ##
 @@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet;
+
+import org.apache.drill.exec.physical.impl.aggregate.BatchIterator;
+import org.apache.drill.exec.record.VectorContainer;
+
+/**
+ * Copies rows from an input batch to an output batch. The input
+ * batch is assumed to have a selection vector, or the caller
+ * will pick the rows to copy.
+ * 
+ * Works to create full output batches to minimize per-batch
+ * overhead and to eliminate unnecessary empty batches if no
+ * rows are copied.
+ * 
+ * The output batches are assumed to have the same schema as
+ * input batches. (No projection occurs.) The output schema will
+ * change each time the input schema changes. (For an SV4, the
+ * upstream operator must have ensured that all batches covered
+ * by the SV4 have the same schema.)
+ * 
+ * This implementation works with a single stream of batches which,
+ * following Drill's rules, must consist of the same set of vectors on
+ * each non-schema-change batch.
+ *
+ * Protocol
+ * Overall lifecycle:
+ * 
+ * Create an instance of the
+ * {@link org.apache.drill.exec.physical.resultSet.impl.ResultSetCopierImpl
+ *  ResultSetCopierImpl} class, passing the input batch
+ *  accessor to the constructor.
+ * Loop to process each output batch as shown below. That is, continually
+ * process calls to the {@link BatchIterator#next()} method.
+ * Call {@link #close()}.
+ * 
+ * 
+ *
+ * To build each output batch:
+ *
+ * 
+ * public IterOutcome next() {
+ *   copier.startBatch();
+ *   while (!copier.isFull()) {
+ * copier.freeInput();
+ * IterOutcome innerResult = inner.next();
+ * if (innerResult == DONE) { break; }
+ * copier.startInput();
+ * copier.copyAll();
+ *   }
+ *   if (copier.hasRows()) {
+ * outputContainer = copier.harvest();
+ * return outputContainer.isSchemaChanged() ? OK_NEW_SCHEMA : OK;
+ *   } else { return DONE; }
+ * }
+ * 
+ * 
+ * The above assumes that the upstream operator can be polled
+ * multiple times in the DONE state. The extra polling is
+ * needed to handle any in-flight copies when the input
+ * exhausts its batches.
+ * 
+ * The above also shows that the copier handles and reports
+ * schema changes by setting the schema change flag in the
+ * output container. Real code must handle multiple calls to
+ * next() in the DONE state, and work around any lack of such support
+ * in its input (perhaps by tracking that state itself).
+ * 
+ * An input batch is processed by copying the rows. Copying can be done
+ * row by row, via a row range, or by copying the entire input batch as
+ * shown in the example.
+ * Copying the entire batch makes sense when the input batch carries a
+ * selection vector that identifies which rows to copy, and in which
+ * order.
+ * 
+ * Because we wish to fill the output batch, we may be able to copy
+ * part of a batch, the whole batch, or multiple batches to the output.
+ */
+
+public interface ResultSetCopier {
+
+  /**
+   * Start the next output batch.
+   */
+  void startBatch();
 
 Review comment:
   ```suggestion
 void startOutputBatch();
   ```
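
   Taken together with the input side, the rename would give the lifecycle a
   symmetric pair of names. A sketch by analogy only; startInputBatch is not
   suggested anywhere in this thread:
   ```java
   copier.startOutputBatch();  // begin filling a new output batch
   copier.startInputBatch();   // bind the next input batch to copy from
   ```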


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347358798
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ResultSetCopierImpl.java
 ##
 @@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.physical.impl.protocol.BatchAccessor;
+import org.apache.drill.exec.physical.resultSet.ResultSetCopier;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.ResultSetReader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.physical.rowSet.RowSetReader;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ColumnReader;
+import org.apache.drill.exec.vector.accessor.ColumnWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+
+public class ResultSetCopierImpl implements ResultSetCopier {
+
+  private enum State {
+START,
+NO_SCHEMA,
+BETWEEN_BATCHES,
+BATCH_ACTIVE,
+NEW_SCHEMA,
+SCHEMA_PENDING,
+CLOSED
+  }
+
+  private interface BlockCopy {
+void copy();
+boolean hasMore();
+  }
+
+  private class CopyAll implements BlockCopy {
+
+@Override
+public void copy() {
+  while (!rowWriter.isFull() && rowReader.next()) {
+project();
+  }
+}
+
+@Override
+public boolean hasMore() {
+  return rowReader.hasNext();
+}
+  }
+
+  private static class CopyPair {
+protected final ColumnWriter writer;
+protected final ColumnReader reader;
+
+protected CopyPair(ColumnWriter writer, ColumnReader reader) {
+  this.writer = writer;
+  this.reader = reader;
+}
+  }
+
+  // Input state
+
+  private int currentSchemaVersion = -1;
+  private final ResultSetReader resultSetReader;
+  protected RowSetReader rowReader;
+
+  // Output state
+
+  private final BufferAllocator allocator;
+  private final OptionBuilder writerOptions;
+  private ResultSetLoader resultSetWriter;
+  private RowSetLoader rowWriter;
+
+  // Copy state
+
+  private State state;
+  private CopyPair[] projection;
+  private BlockCopy activeCopy;
+
+  public ResultSetCopierImpl(BufferAllocator allocator, BatchAccessor 
inputBatch) {
+this(allocator, inputBatch, new OptionBuilder());
+  }
+
+  public ResultSetCopierImpl(BufferAllocator allocator, BatchAccessor 
inputBatch,
+  OptionBuilder outputOptions) {
+this.allocator = allocator;
+resultSetReader = new ResultSetReaderImpl(inputBatch);
+writerOptions = outputOptions;
+writerOptions.setVectorCache(new ResultVectorCacheImpl(allocator));
+state = State.START;
 
 Review comment:
   ```suggestion
   this.state = State.START;
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347358541
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ResultSetCopierImpl.java
 ##
 @@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.physical.impl.protocol.BatchAccessor;
+import org.apache.drill.exec.physical.resultSet.ResultSetCopier;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.ResultSetReader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.physical.rowSet.RowSetReader;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ColumnReader;
+import org.apache.drill.exec.vector.accessor.ColumnWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+
+public class ResultSetCopierImpl implements ResultSetCopier {
+
+  private enum State {
+START,
+NO_SCHEMA,
+BETWEEN_BATCHES,
+BATCH_ACTIVE,
+NEW_SCHEMA,
+SCHEMA_PENDING,
+CLOSED
+  }
+
+  private interface BlockCopy {
+void copy();
+boolean hasMore();
+  }
+
+  private class CopyAll implements BlockCopy {
+
+@Override
+public void copy() {
+  while (!rowWriter.isFull() && rowReader.next()) {
+project();
+  }
+}
+
+@Override
+public boolean hasMore() {
+  return rowReader.hasNext();
+}
+  }
+
+  private static class CopyPair {
+protected final ColumnWriter writer;
+protected final ColumnReader reader;
+
+protected CopyPair(ColumnWriter writer, ColumnReader reader) {
+  this.writer = writer;
+  this.reader = reader;
+}
+  }
+
+  // Input state
+
+  private int currentSchemaVersion = -1;
+  private final ResultSetReader resultSetReader;
+  protected RowSetReader rowReader;
+
+  // Output state
+
+  private final BufferAllocator allocator;
+  private final OptionBuilder writerOptions;
+  private ResultSetLoader resultSetWriter;
+  private RowSetLoader rowWriter;
+
+  // Copy state
+
+  private State state;
+  private CopyPair[] projection;
+  private BlockCopy activeCopy;
+
+  public ResultSetCopierImpl(BufferAllocator allocator, BatchAccessor 
inputBatch) {
+this(allocator, inputBatch, new OptionBuilder());
+  }
+
+  public ResultSetCopierImpl(BufferAllocator allocator, BatchAccessor 
inputBatch,
+  OptionBuilder outputOptions) {
+this.allocator = allocator;
+resultSetReader = new ResultSetReaderImpl(inputBatch);
+writerOptions = outputOptions;
 
 Review comment:
   ```suggestion
   this.writerOptions = outputOptions.setVectorCache(new ResultVectorCacheImpl(allocator));
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347358651
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ResultSetCopierImpl.java
 ##
 @@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.physical.impl.protocol.BatchAccessor;
+import org.apache.drill.exec.physical.resultSet.ResultSetCopier;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.ResultSetReader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.physical.rowSet.RowSetReader;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ColumnReader;
+import org.apache.drill.exec.vector.accessor.ColumnWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+
+public class ResultSetCopierImpl implements ResultSetCopier {
+
+  private enum State {
+START,
+NO_SCHEMA,
+BETWEEN_BATCHES,
+BATCH_ACTIVE,
+NEW_SCHEMA,
+SCHEMA_PENDING,
+CLOSED
+  }
+
+  private interface BlockCopy {
+void copy();
+boolean hasMore();
+  }
+
+  private class CopyAll implements BlockCopy {
+
+@Override
+public void copy() {
+  while (!rowWriter.isFull() && rowReader.next()) {
+project();
+  }
+}
+
+@Override
+public boolean hasMore() {
+  return rowReader.hasNext();
+}
+  }
+
+  private static class CopyPair {
+protected final ColumnWriter writer;
+protected final ColumnReader reader;
+
+protected CopyPair(ColumnWriter writer, ColumnReader reader) {
+  this.writer = writer;
+  this.reader = reader;
+}
+  }
+
+  // Input state
+
+  private int currentSchemaVersion = -1;
+  private final ResultSetReader resultSetReader;
+  protected RowSetReader rowReader;
+
+  // Output state
+
+  private final BufferAllocator allocator;
+  private final OptionBuilder writerOptions;
+  private ResultSetLoader resultSetWriter;
+  private RowSetLoader rowWriter;
+
+  // Copy state
+
+  private State state;
+  private CopyPair[] projection;
+  private BlockCopy activeCopy;
+
+  public ResultSetCopierImpl(BufferAllocator allocator, BatchAccessor 
inputBatch) {
+this(allocator, inputBatch, new OptionBuilder());
+  }
+
+  public ResultSetCopierImpl(BufferAllocator allocator, BatchAccessor 
inputBatch,
+  OptionBuilder outputOptions) {
+this.allocator = allocator;
+resultSetReader = new ResultSetReaderImpl(inputBatch);
+writerOptions = outputOptions;
+writerOptions.setVectorCache(new ResultVectorCacheImpl(allocator));
 
 Review comment:
   ```suggestion
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347341843
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/ResultSetCopier.java
 ##
 @@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet;
+
+import org.apache.drill.exec.physical.impl.aggregate.BatchIterator;
+import org.apache.drill.exec.record.VectorContainer;
+
+/**
+ * Copies rows from an input batch to an output batch. The input
+ * batch is assumed to have a selection vector, or the caller
+ * will pick the rows to copy.
+ * 
+ * Works to create full output batches to minimize per-batch
+ * overhead and to eliminate unnecessary empty batches if no
+ * rows are copied.
+ * 
+ * The output batches are assumed to have the same schema as
+ * input batches. (No projection occurs.) The output schema will
+ * change each time the input schema changes. (For an SV4, the
+ * upstream operator must have ensured that all batches covered
+ * by the SV4 have the same schema.)
+ * 
+ * This implementation works with a single stream of batches which,
+ * following Drill's rules, must consist of the same set of vectors on
+ * each non-schema-change batch.
+ *
+ * Protocol
+ * Overall lifecycle:
+ * 
+ * Create an instance of the
+ * {@link org.apache.drill.exec.physical.resultSet.impl.ResultSetCopierImpl
+ *  ResultSetCopierImpl} class, passing the input batch
+ *  accessor to the constructor.
+ * Loop to process each output batch as shown below. That is, continually
+ * process calls to the {@link BatchIterator#next()} method.
+ * Call {@link #close()}.
+ * 
+ * 
+ *
+ * To build each output batch:
+ *
+ * 
+ * public IterOutcome next() {
+ *   copier.startBatch();
+ *   while (!copier.isFull()) {
+ * copier.freeInput();
+ * IterOutcome innerResult = inner.next();
+ * if (innerResult == DONE) { break; }
+ * copier.startInput();
+ * copier.copyAll();
+ *   }
+ *   if (copier.hasRows()) {
+ * outputContainer = copier.harvest();
+ * return outputContainer.isSchemaChanged() ? OK_NEW_SCHEMA : OK;
+ *   } else { return DONE; }
+ * }
+ * 
+ * 
+ * The above assumes that the upstream operator can be polled
+ * multiple times in the DONE state. The extra polling is
+ * needed to handle any in-flight copies when the input
+ * exhausts its batches.
+ * 
+ * The above also shows that the copier handles and reports
+ * schema changes by setting the schema change flag in the
+ * output container. Real code must handle multiple calls to
+ * next() in the DONE state, and work around any lack of such support
+ * in its input (perhaps by tracking that state itself).
+ * 
+ * An input batch is processed by copying the rows. Copying can be done
+ * row by row, via a row range, or by copying the entire input batch as
+ * shown in the example.
+ * Copying the entire batch makes sense when the input batch carries a
+ * selection vector that identifies which rows to copy, and in which
+ * order.
+ * 
+ * Because we wish to fill the output batch, we may be able to copy
+ * part of a batch, the whole batch, or multiple batches to the output.
+ */
+
+public interface ResultSetCopier {
+
+  /**
+   * Start the next output batch.
+   */
+  void startBatch();
+
+  /**
+   * Start the next input batch. The input batch must be held
+   * by the VectorAccessor passed into the constructor.
+   */
+  void startInput();
+
+  /**
+   * If copying rows one by one, copy the next row from the
+   * input.
+   *
+   * @return true if more rows remain on the input, false
+   * if all rows are exhausted
+   */
+  boolean copyNext();
 
 Review comment:
   ```suggestion
 boolean copyNextRow();
   ```
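
   With this rename, the row-by-row protocol described in the Javadoc would
   read roughly as follows (a sketch; isFull() is taken from the usage
   example above, and copyNextRow() copies one row and reports whether more
   input remains):
   ```java
   // Copy one row at a time until the output batch fills or the input
   // batch runs out of rows.
   while (!copier.isFull() && copier.copyNextRow()) {
     // each iteration moves exactly one row from the input to the output
   }
   ```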


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347261592
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/protocol/VectorContainerAccessor.java
 ##
 @@ -22,41 +22,83 @@
 
 import org.apache.drill.common.expression.SchemaPath;
 import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.RecordBatch;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
 import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
 
 public class VectorContainerAccessor implements BatchAccessor {
 
-  public static class ContainerAndSv2Accessor extends VectorContainerAccessor {
+  public static class ExtendedContainerAccessor extends 
VectorContainerAccessor {
 
 private SelectionVector2 sv2;
+private SelectionVector4 sv4;
+
+public void setBatch(RecordBatch batch) {
+  addBatch(batch.getContainer());
+  switch (container.getSchema().getSelectionVectorMode()) {
+  case TWO_BYTE:
+ setSelectionVector(batch.getSelectionVector2());
+ break;
+  case FOUR_BYTE:
+ setSelectionVector(batch.getSelectionVector4());
+ break;
+   default:
+ break;
 
 Review comment:
   please remove the redundant default case if we can't get rid of the method completely.
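
   For illustration, the method with the redundant default dropped might read
   as below; in the NONE mode there is simply no selection vector to capture.
   This is a sketch of the quoted code, not the final change:
   ```java
   public void setBatch(RecordBatch batch) {
     addBatch(batch.getContainer());
     switch (container.getSchema().getSelectionVectorMode()) {
       case TWO_BYTE:
         setSelectionVector(batch.getSelectionVector2());
         break;
       case FOUR_BYTE:
         setSelectionVector(batch.getSelectionVector4());
         break;
       // NONE: nothing to capture
     }
   }
   ```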


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347379920
 
 

 ##
 File path: exec/vector/src/main/codegen/templates/ColumnAccessors.java
 ##
 @@ -592,6 +592,32 @@ public final void setDefaultValue(final Object value) {
   }
 
 }
+
+@Override
+public final void copy(ColumnReader from) {
+  ${drillType}ColumnReader source = (${drillType}ColumnReader) from;
+  final DrillBuf sourceBuf = source.buffer();
+  <#-- First cut, copy materialized value.
+<#if varWidth>
+  byte[] bytes = source.getBytes();
+  setBytes(bytes, bytes.length);
+<#else>
+  set${label}(source.get${label}());
+ -->
 
 Review comment:
   please remove commented lines
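
   For context, the FreeMarker comment marks an abandoned "first cut" that
   copied each materialized value instead of the underlying buffer. Expanded
   for a fixed-width type it would look roughly like this; the reader class
   and accessor names are derived from the template's ${drillType}/${label}
   placeholders and are illustrative only:
   ```java
   // Hypothetical expansion for drillType = Int, label = Int.
   @Override
   public final void copy(ColumnReader from) {
     IntColumnReader source = (IntColumnReader) from;
     setInt(source.getInt());   // copy one materialized value per call
   }
   ```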


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347362163
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ResultSetCopierImpl.java
 ##
 @@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.physical.impl.protocol.BatchAccessor;
+import org.apache.drill.exec.physical.resultSet.ResultSetCopier;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.ResultSetReader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.physical.rowSet.RowSetReader;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ColumnReader;
+import org.apache.drill.exec.vector.accessor.ColumnWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+
+public class ResultSetCopierImpl implements ResultSetCopier {
+
+  private enum State {
+START,
+NO_SCHEMA,
+BETWEEN_BATCHES,
+BATCH_ACTIVE,
+NEW_SCHEMA,
+SCHEMA_PENDING,
+CLOSED
+  }
+
+  private interface BlockCopy {
+void copy();
+boolean hasMore();
+  }
+
+  private class CopyAll implements BlockCopy {
+
+@Override
+public void copy() {
+  while (!rowWriter.isFull() && rowReader.next()) {
+project();
+  }
+}
+
+@Override
+public boolean hasMore() {
+  return rowReader.hasNext();
+}
+  }
+
+  private static class CopyPair {
+protected final ColumnWriter writer;
+protected final ColumnReader reader;
+
+protected CopyPair(ColumnWriter writer, ColumnReader reader) {
+  this.writer = writer;
+  this.reader = reader;
+}
+  }
+
+  // Input state
+
+  private int currentSchemaVersion = -1;
+  private final ResultSetReader resultSetReader;
+  protected RowSetReader rowReader;
+
+  // Output state
+
+  private final BufferAllocator allocator;
+  private final OptionBuilder writerOptions;
+  private ResultSetLoader resultSetWriter;
+  private RowSetLoader rowWriter;
+
+  // Copy state
+
+  private State state;
+  private CopyPair[] projection;
+  private BlockCopy activeCopy;
+
+  public ResultSetCopierImpl(BufferAllocator allocator, BatchAccessor 
inputBatch) {
+this(allocator, inputBatch, new OptionBuilder());
+  }
+
+  public ResultSetCopierImpl(BufferAllocator allocator, BatchAccessor 
inputBatch,
+  OptionBuilder outputOptions) {
+this.allocator = allocator;
+resultSetReader = new ResultSetReaderImpl(inputBatch);
+writerOptions = outputOptions;
+writerOptions.setVectorCache(new ResultVectorCacheImpl(allocator));
+state = State.START;
+  }
+
+  @Override
+  public void startBatch() {
+if (state == State.START) {
+
+  // No schema yet. Defer real batch start until we see an input
+  // batch.
+
+  state = State.NO_SCHEMA;
+  return;
+}
+Preconditions.checkState(state == State.BETWEEN_BATCHES || state == 
State.SCHEMA_PENDING);
+if (state == State.SCHEMA_PENDING) {
+
+  // We have a pending new schema. Create new writers to match.
+
+  createMapping();
+}
+resultSetWriter.startBatch();
+state = State.BATCH_ACTIVE;
+if (isCopyPending()) {
+
+  // Resume copying if a copy is active.
+
+  copyBlock();
+}
+  }
+
+  @Override
+  public void startInput() {
+Preconditions.checkState(state == State.NO_SCHEMA || state == 
State.NEW_SCHEMA ||
+ state == State.BATCH_ACTIVE,
+"Can only start input while in an output batch");
+Preconditions.checkState(!isCopyPending(),
+"Finish the pending copy before changing input");
+
+bindInput();
+
+if (state == State.BATCH_ACTIVE) {
+
+  // If no schema change, we are ready to copy.
+
+  if (currentSchemaVersion == 
res

[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347383123
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/convert/AbstractWriteConverter.java
 ##
 @@ -143,4 +144,9 @@ public void setTime(LocalTime value) {
   public void setTimestamp(Instant value) {
 baseWriter.setTimestamp(value);
   }
+
+  @Override
+  public void copy(ColumnReader from) {
+throw new UnsupportedOperationException("Cannot copy values through a type 
converter");
 
 Review comment:
   If type converters shouldn't support copy, then it may make sense to mark the method as final here.
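
   That is, the override would become (the suggestion applied to the quoted
   method):
   ```java
   @Override
   public final void copy(ColumnReader from) {
     throw new UnsupportedOperationException(
         "Cannot copy values through a type converter");
   }
   ```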


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347364583
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ResultSetCopierImpl.java
 ##
 @@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.physical.impl.protocol.BatchAccessor;
+import org.apache.drill.exec.physical.resultSet.ResultSetCopier;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.ResultSetReader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.physical.rowSet.RowSetReader;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ColumnReader;
+import org.apache.drill.exec.vector.accessor.ColumnWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+
+public class ResultSetCopierImpl implements ResultSetCopier {
+
+  private enum State {
+START,
+NO_SCHEMA,
+BETWEEN_BATCHES,
+BATCH_ACTIVE,
+NEW_SCHEMA,
+SCHEMA_PENDING,
+CLOSED
+  }
+
+  private interface BlockCopy {
+void copy();
+boolean hasMore();
+  }
+
+  private class CopyAll implements BlockCopy {
+
+@Override
+public void copy() {
+  while (!rowWriter.isFull() && rowReader.next()) {
+project();
+  }
+}
+
+@Override
+public boolean hasMore() {
+  return rowReader.hasNext();
+}
+  }
+
+  private static class CopyPair {
+protected final ColumnWriter writer;
+protected final ColumnReader reader;
+
+protected CopyPair(ColumnWriter writer, ColumnReader reader) {
+  this.writer = writer;
+  this.reader = reader;
+}
+  }
+
+  // Input state
+
+  private int currentSchemaVersion = -1;
+  private final ResultSetReader resultSetReader;
+  protected RowSetReader rowReader;
+
+  // Output state
+
+  private final BufferAllocator allocator;
+  private final OptionBuilder writerOptions;
+  private ResultSetLoader resultSetWriter;
+  private RowSetLoader rowWriter;
+
+  // Copy state
+
+  private State state;
+  private CopyPair[] projection;
+  private BlockCopy activeCopy;
+
+  public ResultSetCopierImpl(BufferAllocator allocator, BatchAccessor 
inputBatch) {
+this(allocator, inputBatch, new OptionBuilder());
+  }
+
+  public ResultSetCopierImpl(BufferAllocator allocator, BatchAccessor 
inputBatch,
+  OptionBuilder outputOptions) {
+this.allocator = allocator;
+resultSetReader = new ResultSetReaderImpl(inputBatch);
+writerOptions = outputOptions;
+writerOptions.setVectorCache(new ResultVectorCacheImpl(allocator));
+state = State.START;
+  }
+
+  @Override
+  public void startBatch() {
+if (state == State.START) {
+
+  // No schema yet. Defer real batch start until we see an input
+  // batch.
+
+  state = State.NO_SCHEMA;
+  return;
+}
+Preconditions.checkState(state == State.BETWEEN_BATCHES || state == 
State.SCHEMA_PENDING);
+if (state == State.SCHEMA_PENDING) {
+
+  // We have a pending new schema. Create new writers to match.
+
+  createMapping();
+}
+resultSetWriter.startBatch();
+state = State.BATCH_ACTIVE;
+if (isCopyPending()) {
+
+  // Resume copying if a copy is active.
+
+  copyBlock();
+}
+  }
+
+  @Override
+  public void startInput() {
+Preconditions.checkState(state == State.NO_SCHEMA || state == 
State.NEW_SCHEMA ||
+ state == State.BATCH_ACTIVE,
+"Can only start input while in an output batch");
+Preconditions.checkState(!isCopyPending(),
+"Finish the pending copy before changing input");
+
+bindInput();
+
+if (state == State.BATCH_ACTIVE) {
+
+  // If no schema change, we are ready to copy.
+
+  if (currentSchemaVersion == 
res

[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347358260
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/impl/ResultSetCopierImpl.java
 ##
 @@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.physical.impl.protocol.BatchAccessor;
+import org.apache.drill.exec.physical.resultSet.ResultSetCopier;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.ResultSetReader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.physical.rowSet.RowSetReader;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ColumnReader;
+import org.apache.drill.exec.vector.accessor.ColumnWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+
+public class ResultSetCopierImpl implements ResultSetCopier {
+
+  private enum State {
+START,
+NO_SCHEMA,
+BETWEEN_BATCHES,
+BATCH_ACTIVE,
+NEW_SCHEMA,
+SCHEMA_PENDING,
+CLOSED
+  }
+
+  private interface BlockCopy {
+void copy();
+boolean hasMore();
+  }
+
+  private class CopyAll implements BlockCopy {
+
+@Override
+public void copy() {
+  while (!rowWriter.isFull() && rowReader.next()) {
+project();
+  }
+}
+
+@Override
+public boolean hasMore() {
+  return rowReader.hasNext();
+}
+  }
+
+  private static class CopyPair {
+protected final ColumnWriter writer;
+protected final ColumnReader reader;
+
+protected CopyPair(ColumnWriter writer, ColumnReader reader) {
+  this.writer = writer;
+  this.reader = reader;
+}
+  }
+
+  // Input state
+
+  private int currentSchemaVersion = -1;
+  private final ResultSetReader resultSetReader;
+  protected RowSetReader rowReader;
+
+  // Output state
+
+  private final BufferAllocator allocator;
+  private final OptionBuilder writerOptions;
+  private ResultSetLoader resultSetWriter;
+  private RowSetLoader rowWriter;
+
+  // Copy state
+
+  private State state;
+  private CopyPair[] projection;
+  private BlockCopy activeCopy;
+
+  public ResultSetCopierImpl(BufferAllocator allocator, BatchAccessor inputBatch) {
+this(allocator, inputBatch, new OptionBuilder());
+  }
+
+  public ResultSetCopierImpl(BufferAllocator allocator, BatchAccessor inputBatch,
+      OptionBuilder outputOptions) {
+this.allocator = allocator;
+resultSetReader = new ResultSetReaderImpl(inputBatch);
 
 Review comment:
   ```suggestion
   this.resultSetReader = new ResultSetReaderImpl(inputBatch);
   ```
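   The surrounding assignments in this constructor (`this.allocator = allocator`) already use the `this.` prefix, so this keeps the constructor internally consistent.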


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347343078
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/ResultSetCopier.java
 ##
 @@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet;
+
+import org.apache.drill.exec.physical.impl.aggregate.BatchIterator;
+import org.apache.drill.exec.record.VectorContainer;
+
+/**
+ * Copies rows from an input batch to an output batch. The input
+ * batch is assumed to have a selection vector, or the caller
+ * will pick the rows to copy.
+ * 
+ * Works to create full output batches to minimize per-batch
+ * overhead and to eliminate unnecessary empty batches if no
+ * rows are copied.
+ * 
+ * The output batches are assumed to have the same schema as
+ * input batches. (No projection occurs.) The output schema will
+ * change each time the input schema changes. (For an SV4, the
+ * upstream operator must have ensured all batches covered
+ * by the SV4 have the same schema.)
+ * 
+ * This implementation works with a single stream of batches which,
+ * following Drill's rules, must consist of the same set of vectors on
+ * each non-schema-change batch.
+ *
+ * Protocol
+ * Overall lifecycle:
+ * 
+ * Create an instance of the
+ * {@link org.apache.drill.exec.physical.resultSet.impl.ResultSetCopierImpl
+ *  ResultSetCopierImpl} class, passing the input batch
+ *  accessor to the constructor.
+ * Loop to process each output batch as shown below. That is, continually
+ * process calls to the {@link BatchIterator#next()} method.
+ * Call {@link #close()}.
+ * 
+ * 
+ *
+ * To build each output batch:
+ *
+ * 
+ * public IterOutcome next() {
+ *   copier.startBatch();
+ *   while (!copier.isFull()) {
+ * copier.freeInput();
+ * IterOutcome innerResult = inner.next();
+ * if (innerResult == DONE) { break; }
+ * copier.startInput();
+ * copier.copyAll();
+ *   }
+ *   if (copier.hasRows()) {
+ * outputContainer = copier.harvest();
+ * return outputContainer.isSchemaChanged() ? OK_NEW_SCHEMA : OK;
+ *   } else { return DONE; }
+ * }
+ * 
+ * 
+ * The above assumes that the upstream operator can be polled
+ * multiple times in the DONE state. The extra polling is
+ * needed to handle any in-flight copies when the input
+ * exhausts its batches.
+ * 
+ * The above also shows that the copier handles and reports
+ * schema changes by setting the schema change flag in the
+ * output container. Real code must handle multiple calls to
+ * next() in the DONE state, and work around lack of such support
+ * in its input (perhaps by tracking a state.)
+ * 
+ * An input batch is processed by copying the rows. Copying can be done
+ * row-by-row, via a row range, or by copying the entire input batch as
+ * shown in the example.
+ * Copying the entire batch makes sense when the input batch carries a
+ * selection vector that identifies which rows to copy, in which
+ * order.
+ * 
+ * Because we wish to fill the output batch, we may be able to copy
+ * part of a batch, the whole batch, or multiple batches to the output.
+ */
+
+public interface ResultSetCopier {
+
+  /**
+   * Start the next output batch.
+   */
+  void startBatch();
+
+  /**
+   * Start the next input batch. The input batch must be held
+   * by the VectorAccessor passed into the constructor.
+   */
+  void startInput();
+
+  /**
+   * If copying rows one by one, copy the next row from the
+   * input.
+   *
+   * @return true if more rows remain on the input, false
+   * if all rows are exhausted
+   */
+  boolean copyNext();
+
+  /**
+   * Copy a row at the given position. For those cases in
+   * which random copying is needed, but a selection vector
+   * is not available. Note that this version is slow because
+   * of the need to reset indexes for every row. Better to
+   * use a selection vector, then copy sequentially.
+   *
+   * @param posn the input row position. If a selection vector
+   * is attached, then this is the selection vector position
+   */
+  void copyRe

[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347376061
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/selection/SelectionVector2Builder.java
 ##
 @@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.selection;
+
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.record.VectorContainer;
+
+public class SelectionVector2Builder {
 
 Review comment:
   Seems like the class is only intended for test purposes. If so, please move 
to test sources. 
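   (For instance under exec/java-exec/src/test/java, mirroring the current 
package, assuming no production code depends on the class.)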


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347260214
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/protocol/VectorContainerAccessor.java
 ##
 @@ -22,41 +22,83 @@
 
 import org.apache.drill.common.expression.SchemaPath;
 import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.RecordBatch;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
 import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
 
 public class VectorContainerAccessor implements BatchAccessor {
 
-  public static class ContainerAndSv2Accessor extends VectorContainerAccessor {
+  public static class ExtendedContainerAccessor extends VectorContainerAccessor {
 
 private SelectionVector2 sv2;
+private SelectionVector4 sv4;
+
+public void setBatch(RecordBatch batch) {
 
 Review comment:
   This method seems to be unused; can we remove it?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347259525
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/protocol/VectorContainerAccessor.java
 ##
 @@ -22,41 +22,83 @@
 
 import org.apache.drill.common.expression.SchemaPath;
 import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.RecordBatch;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
 import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
 
 public class VectorContainerAccessor implements BatchAccessor {
 
-  public static class ContainerAndSv2Accessor extends VectorContainerAccessor {
+  public static class ExtendedContainerAccessor extends VectorContainerAccessor {
 
 Review comment:
   Please extract the class to a separate file and provide javadoc. Also, I know 
it's hard, but it would be cool to give the class a more meaningful name, maybe 
something like ```ExternalSelectionVectorsContainerAccessor```. Also, if a batch 
can contain only either an sv2 or an sv4, it may be better to have two separate 
classes; a sketch of that split follows.
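   A minimal sketch of the two-class route, in case it helps (a sketch only; the 
class names are illustrative, each class would live in its own file, and the 
same imports as VectorContainerAccessor are assumed):
   
   ```java
   // Hypothetical accessor for batches carrying a 2-byte selection vector.
   public class Sv2ContainerAccessor extends VectorContainerAccessor {
     private SelectionVector2 sv2;
   
     public void setSelectionVector(SelectionVector2 sv2) { this.sv2 = sv2; }
   
     @Override
     public SelectionVector2 selectionVector2() { return sv2; }
   
     @Override
     public int rowCount() {
       // When a selection vector is attached, it defines the row count.
       return sv2 == null ? super.rowCount() : sv2.getCount();
     }
   }
   
   // Hypothetical accessor for batches carrying a 4-byte selection vector.
   public class Sv4ContainerAccessor extends VectorContainerAccessor {
     private SelectionVector4 sv4;
   
     public void setSelectionVector(SelectionVector4 sv4) { this.sv4 = sv4; }
   
     @Override
     public SelectionVector4 selectionVector4() { return sv4; }
   
     @Override
     public int rowCount() {
       return sv4 == null ? super.rowCount() : sv4.getCount();
     }
   }
   ```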


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347342178
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/ResultSetCopier.java
 ##
 @@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet;
+
+import org.apache.drill.exec.physical.impl.aggregate.BatchIterator;
+import org.apache.drill.exec.record.VectorContainer;
+
+/**
+ * Copies rows from an input batch to an output batch. The input
+ * batch is assumed to have a selection vector, or the caller
+ * will pick the rows to copy.
+ * 
+ * Works to create full output batches to minimize per-batch
+ * overhead and to eliminate unnecessary empty batches if no
+ * rows are copied.
+ * 
+ * The output batches are assumed to have the same schema as
+ * input batches. (No projection occurs.) The output schema will
+ * change each time the input schema changes. (For an SV4, the
+ * upstream operator must have ensured all batches covered
+ * by the SV4 have the same schema.)
+ * 
+ * This implementation works with a single stream of batches which,
+ * following Drill's rules, must consist of the same set of vectors on
+ * each non-schema-change batch.
+ *
+ * Protocol
+ * Overall lifecycle:
+ * 
+ * Create an instance of the
+ * {@link org.apache.drill.exec.physical.resultSet.impl.ResultSetCopierImpl
+ *  ResultSetCopierImpl} class, passing the input batch
+ *  accessor to the constructor.
+ * Loop to process each output batch as shown below. That is, continually
+ * process calls to the {@link BatchIterator#next()} method.
+ * Call {@link #close()}.
+ * 
+ * 
+ *
+ * To build each output batch:
+ *
+ * 
+ * public IterOutcome next() {
+ *   copier.startBatch();
+ *   while (!copier.isFull()) {
+ * copier.freeInput();
+ * IterOutcome innerResult = inner.next();
+ * if (innerResult == DONE) { break; }
+ * copier.startInput();
+ * copier.copyAll();
+ *   }
+ *   if (copier.hasRows()) {
+ * outputContainer = copier.harvest();
+ * return outputContainer.isSchemaChanged() ? OK_NEW_SCHEMA : OK;
+ *   } else { return DONE; }
+ * }
+ * 
+ * 
+ * The above assumes that the upstream operator can be polled
+ * multiple times in the DONE state. The extra polling is
+ * needed to handle any in-flight copies when the input
+ * exhausts its batches.
+ * 
+ * The above also shows that the copier handles and reports
+ * schema changes by setting the schema change flag in the
+ * output container. Real code must handle multiple calls to
+ * next() in the DONE state, and work around lack of such support
+ * in its input (perhaps by tracking a state.)
+ * 
+ * An input batch is processed by copying the rows. Copying can be done
+ * row-by-row, via a row range, or by copying the entire input batch as
+ * shown in the example.
+ * Copying the entire batch makes sense when the input batch carries a
+ * selection vector that identifies which rows to copy, in which
+ * order.
+ * 
+ * Because we wish to fill the output batch, we may be able to copy
+ * part of a batch, the whole batch, or multiple batches to the output.
+ */
+
+public interface ResultSetCopier {
+
+  /**
+   * Start the next output batch.
+   */
+  void startBatch();
+
+  /**
+   * Start the next input batch. The input batch must be held
+   * by the VectorAccessor passed into the constructor.
+   */
+  void startInput();
+
+  /**
+   * If copying rows one by one, copy the next row from the
+   * input.
+   *
+   * @return true if more rows remain on the input, false
+   * if all rows are exhausted
+   */
+  boolean copyNext();
+
+  /**
+   * Copy a row at the given position. For those cases in
+   * which random copying is needed, but a selection vector
+   * is not available. Note that this version is slow because
+   * of the need to reset indexes for every row. Better to
+   * use a selection vector, then copy sequentially.
+   *
+   * @param posn the input row position. If a selection vector
+   * is attached, then this is the selection vector position
+   */
+  void copyRe

[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347268987
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/protocol/VectorContainerAccessor.java
 ##
 @@ -22,41 +22,83 @@
 
 import org.apache.drill.common.expression.SchemaPath;
 import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.RecordBatch;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
 import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
 
 public class VectorContainerAccessor implements BatchAccessor {
 
-  public static class ContainerAndSv2Accessor extends VectorContainerAccessor {
+  public static class ExtendedContainerAccessor extends 
VectorContainerAccessor {
 
 private SelectionVector2 sv2;
+private SelectionVector4 sv4;
+
+public void setBatch(RecordBatch batch) {
+  addBatch(batch.getContainer());
+  switch (container.getSchema().getSelectionVectorMode()) {
+  case TWO_BYTE:
+ setSelectionVector(batch.getSelectionVector2());
+ break;
+  case FOUR_BYTE:
+ setSelectionVector(batch.getSelectionVector4());
+ break;
+   default:
+ break;
+  }
+}
 
 public void setSelectionVector(SelectionVector2 sv2) {
+  Preconditions.checkState(sv4 == null);
   this.sv2 = sv2;
 }
 
+public void setSelectionVector(SelectionVector4 sv4) {
+  Preconditions.checkState(sv2 == null);
+  this.sv4 = sv4;
+}
+
 @Override
 public SelectionVector2 selectionVector2() {
   return sv2;
 }
-  }
-
-  public static class ContainerAndSv4Accessor extends VectorContainerAccessor {
-
-private SelectionVector4 sv4;
 
 @Override
 public SelectionVector4 selectionVector4() {
   return sv4;
 }
+
+@Override
+public int rowCount() {
+  if (sv2 != null) {
+return sv2.getCount();
+  } else if (sv4 != null) {
+return sv4.getCount();
+  } else {
+return super.rowCount();
+  }
+}
+
+@Override
+public void release() {
+  super.release();
+  if (sv2 != null) {
+sv2.clear();
+sv2 = null;
+  }
+  if (sv4 != null) {
+sv4.clear();
+sv4 = null;
+  }
+}
   }
 
-  private VectorContainer container;
-  private SchemaTracker schemaTracker = new SchemaTracker();
+  protected VectorContainer container;
 
 Review comment:
   I think package-private access is enough for the extracted child class. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347267693
 
 

 ##
 File path: common/src/main/java/org/apache/drill/common/types/Types.java
 ##
 @@ -806,16 +806,20 @@ public static boolean isSortable(MinorType type) {
 return typeBuilder;
   }
 
+  public static boolean isSameType(MajorType type1, MajorType type2) {
+return type1.getMinorType() == type2.getMinorType() &&
+   type1.getMode() == type2.getMode() &&
+   type1.getScale() == type2.getScale() &&
+   type1.getPrecision() == type2.getPrecision();
+  }
+
   public static boolean isEquivalent(MajorType type1, MajorType type2) {
 
 // Requires full type equality, including fields such as precision and 
scale.
 // But, unset fields are equivalent to 0. Can't use the protobuf-provided
 // isEquals() which treats set and unset fields as different.
 
 Review comment:
   This comment could be converted to javadoc for the newly created 
```isSameType(MajorType type1, MajorType type2)``` method; a rough sketch 
follows.
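   A sketch of what that conversion might look like (the wording is 
illustrative; the method body is copied from the diff above):
   
   ```java
   /**
    * Checks strict type equality, including fields such as precision and
    * scale. Unset fields are equivalent to 0; the protobuf-provided
    * isEquals() can't be used here because it treats set and unset fields
    * as different.
    */
   public static boolean isSameType(MajorType type1, MajorType type2) {
     return type1.getMinorType() == type2.getMinorType() &&
            type1.getMode() == type2.getMode() &&
            type1.getScale() == type2.getScale() &&
            type1.getPrecision() == type2.getPrecision();
   }
   ```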


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347267013
 
 

 ##
 File path: common/src/main/java/org/apache/drill/common/types/Types.java
 ##
 @@ -806,16 +806,20 @@ public static boolean isSortable(MinorType type) {
 return typeBuilder;
   }
 
+  public static boolean isSameType(MajorType type1, MajorType type2) {
 
 Review comment:
   ```suggestion
 private static boolean isSameType(MajorType type1, MajorType type2) {
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347341416
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/resultSet/ResultSetCopier.java
 ##
 @@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet;
+
+import org.apache.drill.exec.physical.impl.aggregate.BatchIterator;
+import org.apache.drill.exec.record.VectorContainer;
+
+/**
+ * Copies rows from an input batch to an output batch. The input
+ * batch is assumed to have a selection vector, or the caller
+ * will pick the rows to copy.
+ * 
+ * Works to create full output batches to minimize per-batch
+ * overhead and to eliminate unnecessary empty batches if no
+ * rows are copied.
+ * 
+ * The output batches are assumed to have the same schema as
+ * input batches. (No projection occurs.) The output schema will
+ * change each time the input schema changes. (For an SV4, the
+ * upstream operator must have ensured all batches covered
+ * by the SV4 have the same schema.)
+ * 
+ * This implementation works with a single stream of batches which,
+ * following Drill's rules, must consist of the same set of vectors on
+ * each non-schema-change batch.
+ *
+ * Protocol
+ * Overall lifecycle:
+ * 
+ * Create an instance of the
+ * {@link org.apache.drill.exec.physical.resultSet.impl.ResultSetCopierImpl
+ *  ResultSetCopierImpl} class, passing the input batch
+ *  accessor to the constructor.
+ * Loop to process each output batch as shown below. That is, continually
+ * process calls to the {@link BatchIterator#next()} method.
+ * Call {@link #close()}.
+ * 
+ * 
+ *
+ * To build each output batch:
+ *
+ * 
+ * public IterOutcome next() {
+ *   copier.startBatch();
+ *   while (!copier.isFull()) {
+ * copier.freeInput();
+ * IterOutcome innerResult = inner.next();
+ * if (innerResult == DONE) { break; }
+ * copier.startInput();
+ * copier.copyAll();
+ *   }
+ *   if (copier.hasRows()) {
+ * outputContainer = copier.harvest();
+ * return outputContainer.isSchemaChanged() ? OK_NEW_SCHEMA : OK;
+ *   } else { return DONE; }
+ * }
+ * 
+ * 
+ * The above assumes that the upstream operator can be polled
+ * multiple times in the DONE state. The extra polling is
+ * needed to handle any in-flight copies when the input
+ * exhausts its batches.
+ * 
+ * The above also shows that the copier handles and reports
+ * schema changes by setting the schema change flag in the
+ * output container. Real code must handle multiple calls to
+ * next() in the DONE state, and work around lack of such support
+ * in its input (perhaps by tracking a state.)
+ * 
+ * An input batch is processed by copying the rows. Copying can be done
+ * row-by-row, via a row range, or by copying the entire input batch as
+ * shown in the example.
+ * Copying the entire batch makes sense when the input batch carries a
+ * selection vector that identifies which rows to copy, in which
+ * order.
+ * 
+ * Because we wish to fill the output batch, we may be able to copy
+ * part of a batch, the whole batch, or multiple batches to the output.
+ */
+
+public interface ResultSetCopier {
+
+  /**
+   * Start the next output batch.
+   */
+  void startBatch();
+
+  /**
+   * Start the next input batch. The input batch must be held
+   * by the VectorAccessor passed into the constructor.
+   */
+  void startInput();
 
 Review comment:
   ```suggestion
 void startInputBatch();
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch copier based on result set framework

2019-11-18 Thread GitBox
ihuzenko commented on a change in pull request #1899: DRILL-7445: Create batch 
copier based on result set framework
URL: https://github.com/apache/drill/pull/1899#discussion_r347265336
 
 

 ##
 File path: common/src/main/java/org/apache/drill/common/types/Types.java
 ##
 @@ -806,16 +806,20 @@ public static boolean isSortable(MinorType type) {
 return typeBuilder;
   }
 
+  public static boolean isSameType(MajorType type1, MajorType type2) {
+return type1.getMinorType() == type2.getMinorType() &&
+   type1.getMode() == type2.getMode() &&
+   type1.getScale() == type2.getScale() &&
+   type1.getPrecision() == type2.getPrecision();
+  }
+
   public static boolean isEquivalent(MajorType type1, MajorType type2) {
 
 // Requires full type equality, including fields such as precision and scale.
 // But, unset fields are equivalent to 0. Can't use the protobuf-provided
 // isEquals() which treats set and unset fields as different.
 
-if (type1.getMinorType() != type2.getMinorType() ||
-type1.getMode() != type2.getMode() ||
-type1.getScale() != type2.getScale() ||
-type1.getPrecision() != type2.getPrecision()) {
+if (! isSameType(type1, type2)) {
 
 Review comment:
   ```suggestion
   if (!isSameType(type1, type2)) {
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1896: DRILL-7441: Fix issues with fillEmpties, offset vectors

2019-11-18 Thread GitBox
vvysotskyi commented on a change in pull request #1896: DRILL-7441: Fix issues 
with fillEmpties, offset vectors
URL: https://github.com/apache/drill/pull/1896#discussion_r347331872
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/test/BaseTestQuery.java
 ##
 @@ -377,6 +379,14 @@ public static void resetSessionOption(String option) {
 }
   }
 
+  public static void resetAllSessionOptions() {
+try {
+  test("ALTER SESSION RESET ALL");
+} catch(final Exception e) {
 
 Review comment:
   ```suggestion
   } catch (Exception e) {
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1896: DRILL-7441: Fix issues with fillEmpties, offset vectors

2019-11-18 Thread GitBox
vvysotskyi commented on a change in pull request #1896: DRILL-7441: Fix issues 
with fillEmpties, offset vectors
URL: https://github.com/apache/drill/pull/1896#discussion_r347331938
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/test/BaseTestQuery.java
 ##
 @@ -377,6 +379,14 @@ public static void resetSessionOption(String option) {
 }
   }
 
+  public static void resetAllSessionOptions() {
+try {
+  test("ALTER SESSION RESET ALL");
+} catch(final Exception e) {
+  fail("Failed to reset all session option");
 
 Review comment:
   ```suggestion
 fail("Failed to reset all session options");
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] vvysotskyi commented on a change in pull request #1896: DRILL-7441: Fix issues with fillEmpties, offset vectors

2019-11-18 Thread GitBox
vvysotskyi commented on a change in pull request #1896: DRILL-7441: Fix issues 
with fillEmpties, offset vectors
URL: https://github.com/apache/drill/pull/1896#discussion_r347331522
 
 

 ##
 File path: exec/vector/src/main/codegen/templates/NullableValueVectors.java
 ##
 @@ -844,29 +841,35 @@ public void generateTestData(int valueCount) {
 
 @Override
 public void reset() {
-  setCount = 0;
   <#if type.major = "VarLen">lastSet = -1;
 }
 
 <#if type.major = "VarLen">
 @VisibleForTesting
 public int getLastSet() { return lastSet; }
-
+
 
+@Override
+public void setSetCount(int n) {
+  <#if type.major = "VarLen">lastSet = n - 1;
+}
+
 // For nullable vectors, exchanging buffers (done elsewhere)
 // requires also exchanging mutator state (done here.)
 
 @Override
 public void exchange(ValueVector.Mutator other) {
-  final Mutator target = (Mutator) other;
-  int temp = setCount;
-  setCount = target.setCount;
-  target.setCount = temp;
+  <#if type.major == "VarLen">
 
 Review comment:
   Thanks for the explanation!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva opened a new pull request #1901: DRILL-7388: Kafka improvements

2019-11-18 Thread GitBox
arina-ielchiieva opened a new pull request #1901: DRILL-7388: Kafka improvements
URL: https://github.com/apache/drill/pull/1901
 
 
   Jira - [DRILL-7388](https://issues.apache.org/jira/browse/DRILL-7388).
   
   1. Upgraded Kafka libraries to 2.3.1 (DRILL-6739).
   2. Added new options to support the same features as the native JSON reader 
(see the example session settings after this list):
 a. store.kafka.reader.skip_invalid_records, default: false (DRILL-6723);
 b. store.kafka.reader.allow_nan_inf, default: true;
 c. store.kafka.reader.allow_escape_any_char, default: false.
   3. Fixed an issue when a Kafka topic contains only one message (DRILL-7388).
   4. Replaced the Gson parser with Jackson to parse JSON in the same manner as 
the Drill native JSON reader.
   5. Performance improvements: Kafka consumers are now closed asynchronously, a 
resource leak was fixed (DRILL-7290), and unnecessary info logging was moved to 
debug level.
   6. Updated bootstrap-storage-plugins.json to reflect actual Kafka connection 
properties.
   7. Added unit tests.
   8. Refactoring and code clean up.
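   
   For illustration, the new reader options could be toggled per session like 
this (the option names are from item 2 above; the values shown are examples 
only, not recommendations):
   
   ```sql
   ALTER SESSION SET `store.kafka.reader.skip_invalid_records` = true;
   ALTER SESSION SET `store.kafka.reader.allow_nan_inf` = true;
   ALTER SESSION SET `store.kafka.reader.allow_escape_any_char` = false;
   ```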


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: [DISCUSS] 1.17.0 release

2019-11-18 Thread Volodymyr Vysotskyi
Hi Charles,

We found a blocking issue for the release: DRILL-7450.
I need a couple of days to fix it. I'll share updates soon.

Kind regards,
Volodymyr Vysotskyi


On Fri, Nov 15, 2019 at 6:56 PM Charles Givre  wrote:

> Hi Volodymyr,
> I wanted to follow up to see how we are doing with the Drill release
> candidate. Are we getting close to a release?
> Thanks,
> -- C
>
>
> > On Nov 4, 2019, at 1:57 PM, Volodymyr Vysotskyi 
> wrote:
> >
> > Hello Drillers,
> >
> > About 6 months have passed since the previous release, and it's time to
> > discuss and start planning for 1.17.0.
> > I volunteer to manage the new release.
> >
> > We have 6 Jira tickets with "reviewable" status, 2 "in progress" and 6 open
> > tickets [1]. Jira tickets marked as ready to commit will be merged soon, and
> > I hope other PRs from this list will be completed before the cut-off date.
> >
> > Among these tickets, I want to include DRILL-7273 [2] in the release (a pull
> > request is already open, but CR comments should be addressed).
> >
> > I would like to propose a preliminary release cut-off date of the middle of
> > next week (Nov 13) or the beginning of the week after that (Nov 18).
> >
> > Please let me know if there are any other Jira tickets you are working on
> > that should be included in this release.
> >
> > [1]
> >
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=185&projectKey=DRILL&quickFilter=1425
> > [2] https://issues.apache.org/jira/browse/DRILL-7273
> >
> > Kind regards,
> > Volodymyr Vysotskyi
>
>


[jira] [Created] (DRILL-7450) Improve performance for ANALYZE command

2019-11-18 Thread Vova Vysotskyi (Jira)
Vova Vysotskyi created DRILL-7450:
-

 Summary: Improve performance for ANALYZE command
 Key: DRILL-7450
 URL: https://issues.apache.org/jira/browse/DRILL-7450
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.17.0
Reporter: Vova Vysotskyi
 Fix For: 1.17.0


In the scope of DRILL-7273, the ANALYZE command was introduced for collecting 
metadata and storing it in the metastore.
But the current implementation uses too much memory and performs poorly. It uses 
a stream aggregate to collect metadata, so all incoming data must be sorted 
before the aggregation is produced. Memory usage may be reduced by using a hash 
aggregate instead, which avoids the sort and should improve performance.
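
For reference, a metastore ANALYZE invocation of the kind this ticket targets 
looks roughly like the following (the table name is illustrative):

{code:sql}
ANALYZE TABLE dfs.tmp.`lineitem` REFRESH METADATA;
{code}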



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7449) memory leak parse_url function

2019-11-18 Thread benj (Jira)
benj created DRILL-7449:
---

 Summary: memory leak parse_url function
 Key: DRILL-7449
 URL: https://issues.apache.org/jira/browse/DRILL-7449
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.16.0
Reporter: benj


Requests with *parse_url* work well when the number of processed rows is low, 
but produce a memory leak when the number of rows grows (roughly between 500,000 
and 1 million). For certain row counts the request sometimes works and sometimes 
fails with a memory leak.

Extract from the dataset tested:
{noformat}
{"Attributable":true,"Description":"Website has been identified as malicious by 
Bing","FirstReportedDateTime":"2018-03-12T18:49:38Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:49:38Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"172.217.8.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/beginilah-cara-orang-jepang-berpacaran.html","Version":1.5}
{"Attributable":true,"Description":"Website has been identified as malicious by 
Bing","FirstReportedDateTime":"2018-03-12T18:14:51Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:14:51Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"216.58.192.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/cara-membuat-widget-slideshow-postingan.html","Version":1.5}
{noformat}
Request tested:
{code:sql}
ALTER SESSION SET `store.format`='parquet';
ALTER SESSION SET `store.parquet.use_new_reader` = true;
ALTER SESSION SET `store.parquet.compression` = 'snappy';
ALTER SESSION SET `drill.exec.functions.cast_empty_string_to_null`= true;
ALTER SESSION SET `store.json.all_text_mode` = true;
ALTER SESSION SET `exec.enable_union_type` = true;
ALTER SESSION SET `store.json.all_text_mode` = true;

CREATE TABLE dfs.test.`output_pqt` AS
(
SELECT R.parsed.host AS Domain
FROM ( 
  SELECT parse_url(T.Url) AS parsed
  FROM dfs.test.`file.json` AS T
) AS R 
ORDER BY Domain
);
{code}
 
 Result when the memory leak occurs:
{noformat}
Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory 
leaked: (256)
Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)


Fragment 3:0

Please, refer to logs for more information.

[Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]

  (java.lang.IllegalStateException) Memory was leaked by query. Memory leaked: 
(256)
Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)

org.apache.drill.exec.memory.BaseAllocator.close():520
org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
org.apache.drill.exec.ops.FragmentContextImpl.close():546
org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():214
org.apache.drill.exec.work.fragment.FragmentExecutor.run():329
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748 (state=,code=0)
java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Memory was leaked 
by query. Memory leaked: (256)
Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)


Fragment 3:0

Please, refer to logs for more information.

[Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]

  (java.lang.IllegalStateException) Memory was leaked by query. Memory leaked: 
(256)
Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)

org.apache.drill.exec.memory.BaseAllocator.close():520
org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
org.apache.drill.exec.ops.FragmentContextImpl.close():546
org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():214
org.apache.drill.exec.work.fragment.FragmentExecutor.run():329
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1149
java.util.concurrent.ThreadPoolExecutor$Worker.run():624
java.lang.Thread.run():748

at 
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:538)
at 
org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:610)
at 
org.apache.drill.jdbc.impl.Dri