[jira] [Resolved] (DRILL-5914) CSV (text) reader fails to parse quoted newlines in trailing fields

2019-10-05 Thread Paul Rogers (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers resolved DRILL-5914.

Resolution: Fixed

This issue was fixed as part of the "Compliant text reader V3" project. The 
test cited in the description now correctly reports 4 records for the 
{{COUNT(*)}} query.

> CSV (text) reader fails to parse quoted newlines in trailing fields
> ---
>
> Key: DRILL-5914
> URL: https://issues.apache.org/jira/browse/DRILL-5914
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Consider the existing `TestCsvHeader.testCountOnCsvWithHeader()` unit test. 
> The input file is as follows:
> {noformat}
> Year,Make,Model,Description,Price
> 1997,Ford,E350,"ac, abs, moon",3000.00
> 1999,Chevy,"Venture ""Extended Edition""","",4900.00
> 1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00
> 1996,Jeep,Grand Cherokee,"MUST SELL!
> air, moon roof, loaded",4799.00
> {noformat}
> Note the newline inside the description field of the last record.
> If we do a `SELECT *` query, the file is parsed fine; we get 4 records.
> If we do a `SELECT Year, Model` query, the CSV reader uses a special trick: 
> it short-circuits reads of the three unwanted columns:
> {code}
> TextReader.parseRecord() {
>   ...
>   if (earlyTerm) {
>     if (ch != newLine) {
>       input.skipLines(1); // <-- skip lines
>     }
>     break;
>   }
>   ...
> }
> {code}
> This method skips forward in the file, discarding characters until it hits a 
> newline:
> {code}
> do {
>   nextChar();
> } while (lineCount < expectedLineCount);
> {code}
> Note that this code handles individual characters; it is not aware of 
> per-field semantics. That is, unlike the higher-level parser methods, the 
> `nextChar()` method does not consider newlines inside quoted fields to be 
> special.
> This problem shows up acutely in a `SELECT COUNT(*)` style query that skips 
> all fields; the result is that we count the input as five records, not four.
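For reference, a correct skip must track whether the scan position lies inside 
a quoted field before treating a newline as a record boundary. A minimal sketch 
of the idea (hypothetical method and field names, not the actual {{TextInput}} 
API):
{code}
// Skip one logical record, honoring newlines inside quoted fields.
// 'quote' and 'newLine' are the configured delimiter characters;
// nextChar() is assumed to return -1 at end of input.
void skipRecord() throws IOException {
  boolean inQuotes = false;
  int ch;
  while ((ch = nextChar()) != -1) {
    if (ch == quote) {
      inQuotes = !inQuotes;  // a doubled "" escape toggles twice; net state is correct
    } else if (ch == newLine && !inQuotes) {
      break;  // a true record boundary, not a quoted newline
    }
  }
}
{code}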



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r331765083
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapFormatConfig.java
 ##
 @@ -17,12 +17,27 @@
  */
 package org.apache.drill.exec.store.pcap;
 
+import com.fasterxml.jackson.annotation.JsonInclude;
 import com.fasterxml.jackson.annotation.JsonTypeName;
 import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.List;
 
 @JsonTypeName("pcap")
 public class PcapFormatConfig implements FormatPluginConfig {
 
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("pcap");
 
 Review comment:
   Best if we define `"pcap"` to be a constant and use that constant here and 
above.
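   
   For example (hypothetical constant name; the annotation accepts it because 
a `static final String` is a compile-time constant):
   
   ```java
   import com.fasterxml.jackson.annotation.JsonTypeName;
   import org.apache.drill.common.logical.FormatPluginConfig;
   import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
   
   import java.util.List;
   
   @JsonTypeName(PcapFormatConfig.PLUGIN_NAME)
   public class PcapFormatConfig implements FormatPluginConfig {
     public static final String PLUGIN_NAME = "pcap";
     private static final List<String> DEFAULT_EXTS = ImmutableList.of(PLUGIN_NAME);
   }
   ```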


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r331767121
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapFormatPlugin.java
 ##
 @@ -17,112 +17,71 @@
  */
 package org.apache.drill.exec.store.pcap;
 
-import org.apache.drill.exec.planner.common.DrillStatsTable;
-import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileReaderFactory;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileScanBuilder;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
 import org.apache.drill.common.exceptions.ExecutionSetupException;
-import org.apache.drill.common.expression.SchemaPath;
 import org.apache.drill.common.logical.StoragePluginConfig;
-import org.apache.drill.exec.ops.FragmentContext;
-import org.apache.drill.exec.planner.logical.DrillTable;
 import org.apache.drill.exec.proto.UserBitShared;
 import org.apache.drill.exec.server.DrillbitContext;
-import org.apache.drill.exec.store.RecordReader;
-import org.apache.drill.exec.store.RecordWriter;
-import org.apache.drill.exec.store.SchemaConfig;
-import org.apache.drill.exec.store.dfs.BasicFormatMatcher;
-import org.apache.drill.exec.store.dfs.DrillFileSystem;
-import org.apache.drill.exec.store.dfs.FileSelection;
-import org.apache.drill.exec.store.dfs.FileSystemPlugin;
-import org.apache.drill.exec.store.dfs.FormatMatcher;
-import org.apache.drill.exec.store.dfs.FormatSelection;
-import org.apache.drill.exec.store.dfs.MagicString;
 import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
-import org.apache.drill.exec.store.dfs.easy.EasyWriter;
-import org.apache.drill.exec.store.dfs.easy.FileWork;
 import org.apache.hadoop.conf.Configuration;
-
-import java.io.IOException;
-import java.util.List;
-import java.util.regex.Pattern;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
+import org.apache.drill.exec.store.pcap.PcapBatchReader.PcapReaderConfig;
 
public class PcapFormatPlugin extends EasyFormatPlugin<PcapFormatConfig> {
 
-  private final PcapFormatMatcher matcher;
-
-  public PcapFormatPlugin(String name, DrillbitContext context, Configuration 
fsConf,
-  StoragePluginConfig storagePluginConfig) {
-this(name, context, fsConf, storagePluginConfig, new PcapFormatConfig());
-  }
-
-  public PcapFormatPlugin(String name, DrillbitContext context, Configuration 
fsConf, StoragePluginConfig config, PcapFormatConfig formatPluginConfig) {
-super(name, context, fsConf, config, formatPluginConfig, true, false, 
true, false, Lists.newArrayList("pcap"), "pcap");
-this.matcher = new PcapFormatMatcher(this);
-  }
-
-  @Override
-  public boolean supportsPushDown() {
-return true;
-  }
-
-  @Override
-  public RecordReader getRecordReader(FragmentContext context, DrillFileSystem 
dfs, FileWork fileWork, List<SchemaPath> columns, String userName) throws 
ExecutionSetupException {
-return new PcapRecordReader(fileWork.getPath(), dfs, columns);
-  }
-
-  @Override
-  public RecordWriter getRecordWriter(FragmentContext context, EasyWriter 
writer) throws IOException {
-throw new UnsupportedOperationException("unimplemented");
-  }
+  private static class PcapReaderFactory extends FileReaderFactory {
+private final PcapReaderConfig readerConfig;
 
-  @Override
-  public int getReaderOperatorType() {
-return UserBitShared.CoreOperatorType.PCAP_SUB_SCAN_VALUE;
-  }
+public PcapReaderFactory(PcapReaderConfig config) {
+  readerConfig = config;
+}
 
-  @Override
-  public int getWriterOperatorType() {
-throw new UnsupportedOperationException();
+@Override
+public ManagedReader<? extends FileSchemaNegotiator> newReader() {
+  return new PcapBatchReader(readerConfig);
+}
   }
-
-  @Override
-  public FormatMatcher getMatcher() {
-return this.matcher;
+  public PcapFormatPlugin(String name, DrillbitContext context,
+   Configuration fsConf, StoragePluginConfig 
storageConfig,
+   PcapFormatConfig formatConfig) {
+super(name, easyConfig(fsConf, formatConfig), context, storageConfig, 
formatConfig);
   }
 
-  @Override
-  public boolean supportsStatistics() {
-return false;
+  private static EasyFormatConfig easyConfig(Configuration fsConf, 
PcapFormatConfig pluginConfig) {
+EasyFormatConfig config = new EasyFormatConfig();
+config.readable = true;
+config.writable = false;
+  

[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r331767290
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/pcap/TestPcapRecordReader.java
 ##
 @@ -45,8 +45,16 @@ public void testStarQuery() throws Exception {
   @Test
   public void testCorruptPCAPQuery() throws Exception {
 runSQLVerifyCount("select * from dfs.`store/pcap/testv1.pcap`", 7000);
-runSQLVerifyCount("select * from dfs.`store/pcap/testv1.pcap` WHERE 
is_corrupt=false", 6408);
-runSQLVerifyCount("select * from dfs.`store/pcap/testv1.pcap` WHERE 
is_corrupt=true", 592);
+  }
+
+  @Test
+  public void testTrueCorruptPCAPQuery() throws Exception {
+runSQLVerifyCount("select * from dfs.`store/pcap/testv1.pcap` WHERE 
is_corrupt=true", 16);
+  }
+
+  @Test
+  public void testNotCorruptPCAPQuery() throws Exception {
+runSQLVerifyCount("select * from dfs.`store/pcap/testv1.pcap` WHERE 
is_corrupt=false", 6984);
 
 Review comment:
  Looks like the original tests were pretty light. They leave it to the user 
to test per-column setup and type conversions.
  
  Would recommend adding tests that:
  
  1) Verify the schema: names and types.
  2) Verify the data (using the usual mechanism to read a few rows and 
validate the results), as sketched below.
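  
  Something like the usual row-set pattern would do it (a sketch only: it 
assumes a `ClusterTest`-style fixture with the row-set test imports, and the 
column values are illustrative, not the file's actual contents):
  
  ```java
  @Test
  public void testSchemaAndValues() throws Exception {
    String sql = "select type, packet_length from dfs.`store/pcap/testv1.pcap` limit 2";
    RowSet actual = client.queryBuilder().sql(sql).rowSet();
  
    // Verify column names and types.
    TupleMetadata expectedSchema = new SchemaBuilder()
        .addNullable("type", MinorType.VARCHAR)
        .addNullable("packet_length", MinorType.INT)
        .buildSchema();
  
    // Verify a few rows of data; verify() checks both schema and values,
    // then releases the row sets.
    RowSet expected = new RowSetBuilder(client.allocator(), expectedSchema)
        .addRow("TCP", 74)
        .addRow("TCP", 74)
        .build();
    RowSetUtilities.verify(expected, actual);
  }
  ```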
   




[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r331764812
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapBatchReader.java
 ##
 @@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.pcap.decoder.Packet;
+import org.apache.drill.exec.store.pcap.decoder.PacketDecoder;
+import org.apache.drill.exec.store.pcap.schema.Schema;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import org.apache.hadoop.fs.Path;
+
+import static 
org.apache.drill.exec.store.pcap.PcapFormatUtils.parseBytesToASCII;
+
+public class PcapBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private FileSplit split;
+
+  private PcapReaderConfig readerConfig;
+
+  private PacketDecoder decoder;
+
+  private ResultSetLoader loader;
+
+  private FSDataInputStream fsStream;
+
+  private Schema pcapSchema;
+
+  private int validBytes;
+
+  private byte[] buffer;
+
+  private int offset = 0;
+
+  static final int BUFFER_SIZE = 500_000;
+
+  private static final Logger logger = 
LoggerFactory.getLogger(PcapBatchReader.class);
+
+  public static class PcapReaderConfig {
+
+protected final PcapFormatPlugin plugin;
+public PcapReaderConfig(PcapFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public PcapBatchReader(PcapReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+split = negotiator.split();
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder();
+this.pcapSchema = new Schema();
+TupleMetadata schema = pcapSchema.buildSchema(builder);
+negotiator.setTableSchema(schema, false);
+this.loader = negotiator.build();
+return true;
+  }
+
+  @Override
+  public boolean next() {
+RowSetLoader rowWriter = loader.writer();
+while (!rowWriter.isFull()) {
+  if (!parseNextPacket(rowWriter)) {
+return false;
+  }
+}
+return true;
+  }
+
+  @Override
+  public void close() {
+try {
+  fsStream.close();
+} catch (IOException e) {
+  throw UserException.
+dataReadError()
+.addContext("Error closing InputStream: " + e.getMessage())
+.build(logger);
+}
+fsStream = null;
+this.buffer = null;
+this.decoder = null;
+  }
+
+  private void openFile(FileSchemaNegotiator negotiator) {
+String filePath = null;
+try {
+  filePath = split.getPath().toString();
+  this.fsStream = negotiator.fileSystem().open(new Path(filePath));
+  this.decoder = new PacketDecoder(fsStream);
+  this.buffer = new byte[BUFFER_SIZE + decoder.getMaxLength()];
+  this.validBytes = fsStream.read(buffer);
+} catch (IOException io) {
+  throw UserException.dataReadError(io).addContext("File name:", 
filePath).build(logger);
+}
+  }
+
+  private boolean parseNextPacket(RowSetLoader rowWriter){
+Packet packet = new Packet();
+
+if(offset >= validBytes){
 
 Review comment:
   Nit: space before {



[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r331767252
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/schema/Schema.java
 ##
 @@ -78,4 +81,23 @@ public ColumnDto getColumnByIndex(int i) {
   public int getNumberOfColumns() {
 return columns.size();
   }
+
+  public TupleMetadata buildSchema(SchemaBuilder builder) {
+    for(ColumnDto column : columns) {
+      if(column.getColumnType() == PcapTypes.BOOLEAN) {
+        builder.addNullable(column.getColumnName(), TypeProtos.MinorType.BIT);
+      } else if(column.getColumnType() == PcapTypes.INTEGER) {
+        builder.addNullable(column.getColumnName(), TypeProtos.MinorType.INT);
+      } else if(column.getColumnType() == PcapTypes.STRING) {
+        builder.addNullable(column.getColumnName(), TypeProtos.MinorType.VARCHAR);
+      } else if(column.getColumnType() == PcapTypes.TIMESTAMP) {
+        builder.addNullable(column.getColumnName(), TypeProtos.MinorType.TIMESTAMP);
+      } else if(column.getColumnType() == PcapTypes.LONG) {
+        builder.addNullable(column.getColumnName(), TypeProtos.MinorType.BIGINT);
+      }
+    }
+
+    TupleMetadata schema = builder.buildSchema();
+    return schema;
 
 Review comment:
   Nit: `return builder.buildSchema();`




[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r331764935
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapBatchReader.java
 ##
 @@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.pcap.decoder.Packet;
+import org.apache.drill.exec.store.pcap.decoder.PacketDecoder;
+import org.apache.drill.exec.store.pcap.schema.Schema;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import org.apache.hadoop.fs.Path;
+
+import static 
org.apache.drill.exec.store.pcap.PcapFormatUtils.parseBytesToASCII;
+
+public class PcapBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private FileSplit split;
+
+  private PcapReaderConfig readerConfig;
+
+  private PacketDecoder decoder;
+
+  private ResultSetLoader loader;
+
+  private FSDataInputStream fsStream;
+
+  private Schema pcapSchema;
+
+  private int validBytes;
+
+  private byte[] buffer;
+
+  private int offset = 0;
+
+  static final int BUFFER_SIZE = 500_000;
+
+  private static final Logger logger = 
LoggerFactory.getLogger(PcapBatchReader.class);
+
+  public static class PcapReaderConfig {
+
+protected final PcapFormatPlugin plugin;
+public PcapReaderConfig(PcapFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public PcapBatchReader(PcapReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+split = negotiator.split();
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder();
+this.pcapSchema = new Schema();
+TupleMetadata schema = pcapSchema.buildSchema(builder);
+negotiator.setTableSchema(schema, false);
+this.loader = negotiator.build();
+return true;
+  }
+
+  @Override
+  public boolean next() {
+RowSetLoader rowWriter = loader.writer();
+while (!rowWriter.isFull()) {
+  if (!parseNextPacket(rowWriter)) {
+return false;
+  }
+}
+return true;
+  }
+
+  @Override
+  public void close() {
+try {
+  fsStream.close();
+} catch (IOException e) {
+  throw UserException.
+dataReadError()
+.addContext("Error closing InputStream: " + e.getMessage())
+.build(logger);
+}
+fsStream = null;
+this.buffer = null;
+this.decoder = null;
+  }
+
+  private void openFile(FileSchemaNegotiator negotiator) {
+String filePath = null;
+try {
+  filePath = split.getPath().toString();
+  this.fsStream = negotiator.fileSystem().open(new Path(filePath));
+  this.decoder = new PacketDecoder(fsStream);
+  this.buffer = new byte[BUFFER_SIZE + decoder.getMaxLength()];
+  this.validBytes = fsStream.read(buffer);
+} catch (IOException io) {
+  throw UserException.dataReadError(io).addContext("File name:", 
filePath).build(logger);
+}
+  }
+
+  private boolean parseNextPacket(RowSetLoader rowWriter){
+Packet packet = new Packet();
+
+if(offset >= validBytes){
+  return false;
+}
+if (validBytes - offset < decoder.getMaxLength()) {
+  getNextPacket(rowWriter);
+}
+

[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r331764842
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapBatchReader.java
 ##
 @@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.pcap.decoder.Packet;
+import org.apache.drill.exec.store.pcap.decoder.PacketDecoder;
+import org.apache.drill.exec.store.pcap.schema.Schema;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import org.apache.hadoop.fs.Path;
+
+import static 
org.apache.drill.exec.store.pcap.PcapFormatUtils.parseBytesToASCII;
+
+public class PcapBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private FileSplit split;
+
+  private PcapReaderConfig readerConfig;
+
+  private PacketDecoder decoder;
+
+  private ResultSetLoader loader;
+
+  private FSDataInputStream fsStream;
+
+  private Schema pcapSchema;
+
+  private int validBytes;
+
+  private byte[] buffer;
+
+  private int offset = 0;
+
+  static final int BUFFER_SIZE = 500_000;
+
+  private static final Logger logger = 
LoggerFactory.getLogger(PcapBatchReader.class);
+
+  public static class PcapReaderConfig {
+
+protected final PcapFormatPlugin plugin;
+public PcapReaderConfig(PcapFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public PcapBatchReader(PcapReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+split = negotiator.split();
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder();
+this.pcapSchema = new Schema();
+TupleMetadata schema = pcapSchema.buildSchema(builder);
+negotiator.setTableSchema(schema, false);
+this.loader = negotiator.build();
+return true;
+  }
+
+  @Override
+  public boolean next() {
+RowSetLoader rowWriter = loader.writer();
+while (!rowWriter.isFull()) {
+  if (!parseNextPacket(rowWriter)) {
+return false;
+  }
+}
+return true;
+  }
+
+  @Override
+  public void close() {
+try {
+  fsStream.close();
+} catch (IOException e) {
+  throw UserException.
+dataReadError()
+.addContext("Error closing InputStream: " + e.getMessage())
+.build(logger);
+}
+fsStream = null;
+this.buffer = null;
+this.decoder = null;
+  }
+
+  private void openFile(FileSchemaNegotiator negotiator) {
+String filePath = null;
+try {
+  filePath = split.getPath().toString();
+  this.fsStream = negotiator.fileSystem().open(new Path(filePath));
+  this.decoder = new PacketDecoder(fsStream);
+  this.buffer = new byte[BUFFER_SIZE + decoder.getMaxLength()];
+  this.validBytes = fsStream.read(buffer);
+} catch (IOException io) {
+  throw UserException.dataReadError(io).addContext("File name:", 
filePath).build(logger);
+}
+  }
+
+  private boolean parseNextPacket(RowSetLoader rowWriter){
+Packet packet = new Packet();
+
+if(offset >= validBytes){
+  return false;
+}
+if (validBytes - offset < decoder.getMaxLength()) {
+  getNextPacket(rowWriter);
+}
+

[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r331764819
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapBatchReader.java
 ##
 @@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.pcap.decoder.Packet;
+import org.apache.drill.exec.store.pcap.decoder.PacketDecoder;
+import org.apache.drill.exec.store.pcap.schema.Schema;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import org.apache.hadoop.fs.Path;
+
+import static 
org.apache.drill.exec.store.pcap.PcapFormatUtils.parseBytesToASCII;
+
+public class PcapBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private FileSplit split;
+
+  private PcapReaderConfig readerConfig;
+
+  private PacketDecoder decoder;
+
+  private ResultSetLoader loader;
+
+  private FSDataInputStream fsStream;
+
+  private Schema pcapSchema;
+
+  private int validBytes;
+
+  private byte[] buffer;
+
+  private int offset = 0;
+
+  static final int BUFFER_SIZE = 500_000;
+
+  private static final Logger logger = 
LoggerFactory.getLogger(PcapBatchReader.class);
+
+  public static class PcapReaderConfig {
+
+protected final PcapFormatPlugin plugin;
+public PcapReaderConfig(PcapFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public PcapBatchReader(PcapReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+split = negotiator.split();
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder();
+this.pcapSchema = new Schema();
+TupleMetadata schema = pcapSchema.buildSchema(builder);
+negotiator.setTableSchema(schema, false);
+this.loader = negotiator.build();
+return true;
+  }
+
+  @Override
+  public boolean next() {
+RowSetLoader rowWriter = loader.writer();
+while (!rowWriter.isFull()) {
+  if (!parseNextPacket(rowWriter)) {
+return false;
+  }
+}
+return true;
+  }
+
+  @Override
+  public void close() {
+try {
+  fsStream.close();
+} catch (IOException e) {
+  throw UserException.
+dataReadError()
+.addContext("Error closing InputStream: " + e.getMessage())
+.build(logger);
+}
+fsStream = null;
+this.buffer = null;
+this.decoder = null;
+  }
+
+  private void openFile(FileSchemaNegotiator negotiator) {
+String filePath = null;
+try {
+  filePath = split.getPath().toString();
+  this.fsStream = negotiator.fileSystem().open(new Path(filePath));
+  this.decoder = new PacketDecoder(fsStream);
+  this.buffer = new byte[BUFFER_SIZE + decoder.getMaxLength()];
+  this.validBytes = fsStream.read(buffer);
+} catch (IOException io) {
+  throw UserException.dataReadError(io).addContext("File name:", 
filePath).build(logger);
+}
+  }
+
+  private boolean parseNextPacket(RowSetLoader rowWriter){
+Packet packet = new Packet();
+
+if(offset >= validBytes){
+  return false;
+}
+if (validBytes - offset < decoder.getMaxLength()) {
+  getNextPacket(rowWriter);
+}
+

[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r331765097
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapFormatConfig.java
 ##
 @@ -17,12 +17,27 @@
  */
 package org.apache.drill.exec.store.pcap;
 
+import com.fasterxml.jackson.annotation.JsonInclude;
 import com.fasterxml.jackson.annotation.JsonTypeName;
 import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.List;
 
 @JsonTypeName("pcap")
 public class PcapFormatConfig implements FormatPluginConfig {
 
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("pcap");
+  public List<String> extensions;
+
+  @JsonInclude(JsonInclude.Include.NON_DEFAULT)
+  public List<String> getExtensions() {
+    if (extensions == null) {
+      return DEFAULT_EXTS;
+    }
+    return extensions;
 
 Review comment:
   Nit: `return extensions == null ? DEFAULT_EXTS : extensions`




[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r331765111
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapFormatPlugin.java
 ##
 @@ -17,112 +17,71 @@
  */
 package org.apache.drill.exec.store.pcap;
 
-import org.apache.drill.exec.planner.common.DrillStatsTable;
-import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileReaderFactory;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileScanBuilder;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
 import org.apache.drill.common.exceptions.ExecutionSetupException;
-import org.apache.drill.common.expression.SchemaPath;
 import org.apache.drill.common.logical.StoragePluginConfig;
-import org.apache.drill.exec.ops.FragmentContext;
-import org.apache.drill.exec.planner.logical.DrillTable;
 import org.apache.drill.exec.proto.UserBitShared;
 import org.apache.drill.exec.server.DrillbitContext;
-import org.apache.drill.exec.store.RecordReader;
-import org.apache.drill.exec.store.RecordWriter;
-import org.apache.drill.exec.store.SchemaConfig;
-import org.apache.drill.exec.store.dfs.BasicFormatMatcher;
-import org.apache.drill.exec.store.dfs.DrillFileSystem;
-import org.apache.drill.exec.store.dfs.FileSelection;
-import org.apache.drill.exec.store.dfs.FileSystemPlugin;
-import org.apache.drill.exec.store.dfs.FormatMatcher;
-import org.apache.drill.exec.store.dfs.FormatSelection;
-import org.apache.drill.exec.store.dfs.MagicString;
 import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
-import org.apache.drill.exec.store.dfs.easy.EasyWriter;
-import org.apache.drill.exec.store.dfs.easy.FileWork;
 import org.apache.hadoop.conf.Configuration;
-
-import java.io.IOException;
-import java.util.List;
-import java.util.regex.Pattern;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
+import org.apache.drill.exec.store.pcap.PcapBatchReader.PcapReaderConfig;
 
public class PcapFormatPlugin extends EasyFormatPlugin<PcapFormatConfig> {
 
-  private final PcapFormatMatcher matcher;
-
-  public PcapFormatPlugin(String name, DrillbitContext context, Configuration 
fsConf,
-  StoragePluginConfig storagePluginConfig) {
-this(name, context, fsConf, storagePluginConfig, new PcapFormatConfig());
-  }
-
-  public PcapFormatPlugin(String name, DrillbitContext context, Configuration 
fsConf, StoragePluginConfig config, PcapFormatConfig formatPluginConfig) {
-super(name, context, fsConf, config, formatPluginConfig, true, false, 
true, false, Lists.newArrayList("pcap"), "pcap");
-this.matcher = new PcapFormatMatcher(this);
-  }
-
-  @Override
-  public boolean supportsPushDown() {
-return true;
-  }
-
-  @Override
-  public RecordReader getRecordReader(FragmentContext context, DrillFileSystem 
dfs, FileWork fileWork, List<SchemaPath> columns, String userName) throws 
ExecutionSetupException {
-return new PcapRecordReader(fileWork.getPath(), dfs, columns);
-  }
-
-  @Override
-  public RecordWriter getRecordWriter(FragmentContext context, EasyWriter 
writer) throws IOException {
-throw new UnsupportedOperationException("unimplemented");
-  }
+  private static class PcapReaderFactory extends FileReaderFactory {
+private final PcapReaderConfig readerConfig;
 
-  @Override
-  public int getReaderOperatorType() {
-return UserBitShared.CoreOperatorType.PCAP_SUB_SCAN_VALUE;
-  }
+public PcapReaderFactory(PcapReaderConfig config) {
+  readerConfig = config;
+}
 
-  @Override
-  public int getWriterOperatorType() {
-throw new UnsupportedOperationException();
+@Override
+public ManagedReader<? extends FileSchemaNegotiator> newReader() {
+  return new PcapBatchReader(readerConfig);
+}
   }
-
-  @Override
-  public FormatMatcher getMatcher() {
-return this.matcher;
+  public PcapFormatPlugin(String name, DrillbitContext context,
+   Configuration fsConf, StoragePluginConfig 
storageConfig,
+   PcapFormatConfig formatConfig) {
+super(name, easyConfig(fsConf, formatConfig), context, storageConfig, 
formatConfig);
   }
 
-  @Override
-  public boolean supportsStatistics() {
-return false;
+  private static EasyFormatConfig easyConfig(Configuration fsConf, 
PcapFormatConfig pluginConfig) {
+EasyFormatConfig config = new EasyFormatConfig();
+config.readable = true;
+config.writable = false;
+  

[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r328425281
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapBatchReader.java
 ##
 @@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.pcap.decoder.Packet;
+import org.apache.drill.exec.store.pcap.decoder.PacketDecoder;
+import org.apache.drill.exec.store.pcap.schema.Schema;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import org.apache.hadoop.fs.Path;
+
+import static 
org.apache.drill.exec.store.pcap.PcapFormatUtils.parseBytesToASCII;
+
+public class PcapBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private FileSplit split;
+
+  private PcapReaderConfig readerConfig;
+
+  private PacketDecoder decoder;
+
+  private ResultSetLoader loader;
+
+  private FSDataInputStream fsStream;
+
+  private Schema pcapSchema;
+
+  private int validBytes;
+
+  private byte[] buffer;
+
+  private int offset = 0;
 
 Review comment:
   No need for = 0




[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r331767130
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/schema/Schema.java
 ##
 @@ -78,4 +81,23 @@ public ColumnDto getColumnByIndex(int i) {
   public int getNumberOfColumns() {
 return columns.size();
   }
+
+  public TupleMetadata buildSchema(SchemaBuilder builder) {
+    for(ColumnDto column : columns) {
+      if(column.getColumnType() == PcapTypes.BOOLEAN) {
 
 Review comment:
   Nit: space after if




[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r331764799
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapBatchReader.java
 ##
 @@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.pcap.decoder.Packet;
+import org.apache.drill.exec.store.pcap.decoder.PacketDecoder;
+import org.apache.drill.exec.store.pcap.schema.Schema;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import org.apache.hadoop.fs.Path;
+
+import static 
org.apache.drill.exec.store.pcap.PcapFormatUtils.parseBytesToASCII;
+
+public class PcapBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private FileSplit split;
+
+  private PcapReaderConfig readerConfig;
+
+  private PacketDecoder decoder;
+
+  private ResultSetLoader loader;
+
+  private FSDataInputStream fsStream;
+
+  private Schema pcapSchema;
+
+  private int validBytes;
+
+  private byte[] buffer;
+
+  private int offset = 0;
+
+  static final int BUFFER_SIZE = 500_000;
+
+  private static final Logger logger = 
LoggerFactory.getLogger(PcapBatchReader.class);
+
+  public static class PcapReaderConfig {
+
+protected final PcapFormatPlugin plugin;
+public PcapReaderConfig(PcapFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public PcapBatchReader(PcapReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+split = negotiator.split();
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder();
+this.pcapSchema = new Schema();
+TupleMetadata schema = pcapSchema.buildSchema(builder);
+negotiator.setTableSchema(schema, false);
+this.loader = negotiator.build();
+return true;
+  }
+
+  @Override
+  public boolean next() {
+RowSetLoader rowWriter = loader.writer();
+while (!rowWriter.isFull()) {
+  if (!parseNextPacket(rowWriter)) {
+return false;
+  }
+}
+return true;
+  }
+
+  @Override
+  public void close() {
+try {
+  fsStream.close();
+} catch (IOException e) {
+  throw UserException.
+dataReadError()
+.addContext("Error closing InputStream: " + e.getMessage())
+.build(logger);
+}
+fsStream = null;
+this.buffer = null;
+this.decoder = null;
+  }
+
+  private void openFile(FileSchemaNegotiator negotiator) {
+String filePath = null;
+try {
+  filePath = split.getPath().toString();
+  this.fsStream = negotiator.fileSystem().open(new Path(filePath));
+  this.decoder = new PacketDecoder(fsStream);
+  this.buffer = new byte[BUFFER_SIZE + decoder.getMaxLength()];
+  this.validBytes = fsStream.read(buffer);
 
 Review comment:
   Nit: The this. prefix is not needed.



[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r331765043
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapBatchReader.java
 ##
 @@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.pcap.decoder.Packet;
+import org.apache.drill.exec.store.pcap.decoder.PacketDecoder;
+import org.apache.drill.exec.store.pcap.schema.Schema;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import org.apache.hadoop.fs.Path;
+
+import static 
org.apache.drill.exec.store.pcap.PcapFormatUtils.parseBytesToASCII;
+
+public class PcapBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private FileSplit split;
+
+  private PcapReaderConfig readerConfig;
+
+  private PacketDecoder decoder;
+
+  private ResultSetLoader loader;
+
+  private FSDataInputStream fsStream;
+
+  private Schema pcapSchema;
+
+  private int validBytes;
+
+  private byte[] buffer;
+
+  private int offset = 0;
+
+  static final int BUFFER_SIZE = 500_000;
+
+  private static final Logger logger = 
LoggerFactory.getLogger(PcapBatchReader.class);
+
+  public static class PcapReaderConfig {
+
+protected final PcapFormatPlugin plugin;
+public PcapReaderConfig(PcapFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public PcapBatchReader(PcapReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+split = negotiator.split();
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder();
+this.pcapSchema = new Schema();
+TupleMetadata schema = pcapSchema.buildSchema(builder);
+negotiator.setTableSchema(schema, false);
+this.loader = negotiator.build();
+return true;
+  }
+
+  @Override
+  public boolean next() {
+RowSetLoader rowWriter = loader.writer();
+while (!rowWriter.isFull()) {
+  if (!parseNextPacket(rowWriter)) {
+return false;
+  }
+}
+return true;
+  }
+
+  @Override
+  public void close() {
+try {
+  fsStream.close();
+} catch (IOException e) {
+  throw UserException.
+dataReadError()
+.addContext("Error closing InputStream: " + e.getMessage())
+.build(logger);
+}
+fsStream = null;
+this.buffer = null;
+this.decoder = null;
+  }
+
+  private void openFile(FileSchemaNegotiator negotiator) {
+String filePath = null;
+try {
+  filePath = split.getPath().toString();
+  this.fsStream = negotiator.fileSystem().open(new Path(filePath));
+  this.decoder = new PacketDecoder(fsStream);
+  this.buffer = new byte[BUFFER_SIZE + decoder.getMaxLength()];
+  this.validBytes = fsStream.read(buffer);
+} catch (IOException io) {
+  throw UserException.dataReadError(io).addContext("File name:", 
filePath).build(logger);
+}
+  }
+
+  private boolean parseNextPacket(RowSetLoader rowWriter){
+Packet packet = new Packet();
+
+if(offset >= validBytes){
+  return false;
+}
+if (validBytes - offset < decoder.getMaxLength()) {
+  getNextPacket(rowWriter);
+}
+

[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r328425343
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapBatchReader.java
 ##
 @@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.pcap.decoder.Packet;
+import org.apache.drill.exec.store.pcap.decoder.PacketDecoder;
+import org.apache.drill.exec.store.pcap.schema.Schema;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import org.apache.hadoop.fs.Path;
+
+import static 
org.apache.drill.exec.store.pcap.PcapFormatUtils.parseBytesToASCII;
+
+public class PcapBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private FileSplit split;
+
+  private PcapReaderConfig readerConfig;
+
+  private PacketDecoder decoder;
+
+  private ResultSetLoader loader;
+
+  private FSDataInputStream fsStream;
+
+  private Schema pcapSchema;
+
+  private int validBytes;
+
+  private byte[] buffer;
+
+  private int offset = 0;
+
+  static final int BUFFER_SIZE = 500_000;
+
+  private static final Logger logger = 
LoggerFactory.getLogger(PcapBatchReader.class);
 
 Review comment:
   Nit: statics usually go at the top of a class




[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r328425702
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapBatchReader.java
 ##
 @@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.pcap.decoder.Packet;
+import org.apache.drill.exec.store.pcap.decoder.PacketDecoder;
+import org.apache.drill.exec.store.pcap.schema.Schema;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import org.apache.hadoop.fs.Path;
+
+import static org.apache.drill.exec.store.pcap.PcapFormatUtils.parseBytesToASCII;
+
+public class PcapBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private FileSplit split;
+
+  private PcapReaderConfig readerConfig;
+
+  private PacketDecoder decoder;
+
+  private ResultSetLoader loader;
+
+  private FSDataInputStream fsStream;
+
+  private Schema pcapSchema;
+
+  private int validBytes;
+
+  private byte[] buffer;
+
+  private int offset = 0;
+
+  static final int BUFFER_SIZE = 500_000;
+
+  private static final Logger logger = LoggerFactory.getLogger(PcapBatchReader.class);
+
+  public static class PcapReaderConfig {
+
+    protected final PcapFormatPlugin plugin;
 
 Review comment:
   If this config holds only a single item, then it can be omitted. Just pass 
the plugin into the batch reader instead of this class (see the sketch below).
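   
   A minimal sketch of that change (assuming the reader needs nothing from 
the config beyond the plugin itself):
   
   ```
   public class PcapBatchReader implements ManagedReader<FileSchemaNegotiator> {
   
     private final PcapFormatPlugin plugin;
   
     // Take the plugin directly; the single-field wrapper class goes away.
     public PcapBatchReader(PcapFormatPlugin plugin) {
       this.plugin = plugin;
     }
   }
   ```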


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r331764958
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapBatchReader.java
 ##
 @@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.pcap.decoder.Packet;
+import org.apache.drill.exec.store.pcap.decoder.PacketDecoder;
+import org.apache.drill.exec.store.pcap.schema.Schema;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import org.apache.hadoop.fs.Path;
+
+import static org.apache.drill.exec.store.pcap.PcapFormatUtils.parseBytesToASCII;
+
+public class PcapBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private FileSplit split;
+
+  private PcapReaderConfig readerConfig;
+
+  private PacketDecoder decoder;
+
+  private ResultSetLoader loader;
+
+  private FSDataInputStream fsStream;
+
+  private Schema pcapSchema;
+
+  private int validBytes;
+
+  private byte[] buffer;
+
+  private int offset = 0;
+
+  static final int BUFFER_SIZE = 500_000;
+
+  private static final Logger logger = LoggerFactory.getLogger(PcapBatchReader.class);
+
+  public static class PcapReaderConfig {
+
+    protected final PcapFormatPlugin plugin;
+    public PcapReaderConfig(PcapFormatPlugin plugin) {
+      this.plugin = plugin;
+    }
+  }
+
+  public PcapBatchReader(PcapReaderConfig readerConfig) {
+    this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    split = negotiator.split();
+    openFile(negotiator);
+    SchemaBuilder builder = new SchemaBuilder();
+    this.pcapSchema = new Schema();
+    TupleMetadata schema = pcapSchema.buildSchema(builder);
+    negotiator.setTableSchema(schema, false);
+    this.loader = negotiator.build();
+    return true;
+  }
+
+  @Override
+  public boolean next() {
+    RowSetLoader rowWriter = loader.writer();
+    while (!rowWriter.isFull()) {
+      if (!parseNextPacket(rowWriter)) {
+        return false;
+      }
+    }
+    return true;
+  }
+
+  @Override
+  public void close() {
+    try {
+      fsStream.close();
+    } catch (IOException e) {
+      throw UserException.dataReadError()
+        .addContext("Error closing InputStream: " + e.getMessage())
+        .build(logger);
+    }
+    fsStream = null;
+    this.buffer = null;
+    this.decoder = null;
+  }
+
+  private void openFile(FileSchemaNegotiator negotiator) {
+    String filePath = null;
+    try {
+      filePath = split.getPath().toString();
+      this.fsStream = negotiator.fileSystem().open(new Path(filePath));
+      this.decoder = new PacketDecoder(fsStream);
+      this.buffer = new byte[BUFFER_SIZE + decoder.getMaxLength()];
+      this.validBytes = fsStream.read(buffer);
+    } catch (IOException io) {
+      throw UserException.dataReadError(io).addContext("File name:", filePath).build(logger);
+    }
+  }
+
+  private boolean parseNextPacket(RowSetLoader rowWriter) {
+    Packet packet = new Packet();
+
+    if (offset >= validBytes) {
+      return false;
+    }
+    if (validBytes - offset < decoder.getMaxLength()) {
+      getNextPacket(rowWriter);
+    }
+

[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r331765065
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/PcapBatchReader.java
 ##
 @@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.pcap;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.pcap.decoder.Packet;
+import org.apache.drill.exec.store.pcap.decoder.PacketDecoder;
+import org.apache.drill.exec.store.pcap.schema.Schema;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import org.apache.hadoop.fs.Path;
+
+import static org.apache.drill.exec.store.pcap.PcapFormatUtils.parseBytesToASCII;
+
+public class PcapBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private FileSplit split;
+
+  private PcapReaderConfig readerConfig;
+
+  private PacketDecoder decoder;
+
+  private ResultSetLoader loader;
+
+  private FSDataInputStream fsStream;
+
+  private Schema pcapSchema;
+
+  private int validBytes;
+
+  private byte[] buffer;
+
+  private int offset = 0;
+
+  static final int BUFFER_SIZE = 500_000;
+
+  private static final Logger logger = LoggerFactory.getLogger(PcapBatchReader.class);
+
+  public static class PcapReaderConfig {
+
+    protected final PcapFormatPlugin plugin;
+    public PcapReaderConfig(PcapFormatPlugin plugin) {
+      this.plugin = plugin;
+    }
+  }
+
+  public PcapBatchReader(PcapReaderConfig readerConfig) {
+    this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    split = negotiator.split();
+    openFile(negotiator);
+    SchemaBuilder builder = new SchemaBuilder();
+    this.pcapSchema = new Schema();
+    TupleMetadata schema = pcapSchema.buildSchema(builder);
+    negotiator.setTableSchema(schema, false);
+    this.loader = negotiator.build();
+    return true;
+  }
+
+  @Override
+  public boolean next() {
+    RowSetLoader rowWriter = loader.writer();
+    while (!rowWriter.isFull()) {
+      if (!parseNextPacket(rowWriter)) {
+        return false;
+      }
+    }
+    return true;
+  }
+
+  @Override
+  public void close() {
+    try {
+      fsStream.close();
+    } catch (IOException e) {
+      throw UserException.dataReadError()
+        .addContext("Error closing InputStream: " + e.getMessage())
+        .build(logger);
+    }
+    fsStream = null;
+    this.buffer = null;
+    this.decoder = null;
+  }
+
+  private void openFile(FileSchemaNegotiator negotiator) {
+    String filePath = null;
+    try {
+      filePath = split.getPath().toString();
+      this.fsStream = negotiator.fileSystem().open(new Path(filePath));
+      this.decoder = new PacketDecoder(fsStream);
+      this.buffer = new byte[BUFFER_SIZE + decoder.getMaxLength()];
+      this.validBytes = fsStream.read(buffer);
+    } catch (IOException io) {
+      throw UserException.dataReadError(io).addContext("File name:", filePath).build(logger);
+    }
+  }
+
+  private boolean parseNextPacket(RowSetLoader rowWriter) {
+    Packet packet = new Packet();
+
+    if (offset >= validBytes) {
+      return false;
+    }
+    if (validBytes - offset < decoder.getMaxLength()) {
+      getNextPacket(rowWriter);
+    }
+

[GitHub] [drill] paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert PCAP Format Plugin to EVF

2019-10-05 Thread GitBox
paul-rogers commented on a change in pull request #1862: DRILL-7385: Convert 
PCAP Format Plugin to EVF
URL: https://github.com/apache/drill/pull/1862#discussion_r331767234
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/schema/Schema.java
 ##
 @@ -78,4 +81,23 @@ public ColumnDto getColumnByIndex(int i) {
   public int getNumberOfColumns() {
     return columns.size();
   }
+
+  public TupleMetadata buildSchema(SchemaBuilder builder) {
+    for (ColumnDto column : columns) {
+      if (column.getColumnType() == PcapTypes.BOOLEAN) {
+        builder.addNullable(column.getColumnName(), TypeProtos.MinorType.BIT);
+      } else if (column.getColumnType() == PcapTypes.INTEGER) {
+        builder.addNullable(column.getColumnName(), TypeProtos.MinorType.INT);
+      } else if (column.getColumnType() == PcapTypes.STRING) {
+        builder.addNullable(column.getColumnName(), TypeProtos.MinorType.VARCHAR);
+      } else if (column.getColumnType() == PcapTypes.TIMESTAMP) {
+        builder.addNullable(column.getColumnName(), TypeProtos.MinorType.TIMESTAMP);
+      } else if (column.getColumnType() == PcapTypes.LONG) {
+        builder.addNullable(column.getColumnName(), TypeProtos.MinorType.BIGINT);
 
 Review comment:
   Nit: would be simpler to do something like the following:
   
   ```
   for (ColumnDto column : columns) {
     builder.addNullable(column.getColumnName(), convertType(column));
   }
   ...
   MinorType convertType(ColumnDto column) {
     switch (column.getColumnType()) {
       case BOOLEAN:
         return MinorType.BIT;
       ...
   ```
   
   The above requires that we handle the case of a type that is not in the 
switch: maybe throw an exception or some such.
   
   Also, if the type ordinals are integers and contiguous (which they are if 
PcapTypes is an enum), then you can create a mapping table:
   
   ```
   MinorType[] typeMap = new MinorType[PcapTypes.values().length];
   typeMap[PcapTypes.BOOLEAN.ordinal()] = MinorType.BIT;
   ...
   ```
   
   Then:
   
   ```
   for (ColumnDto column : columns) {
     builder.addNullable(column.getColumnName(), typeMap[column.getColumnType().ordinal()]);
   }
   ```
   
   The mapping table is the fastest solution, but this is not super critical 
in a once-per-file activity.
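   
   A fuller sketch of the switch variant, with the unmapped-type case handled 
(the exception choice is illustrative, not from the patch):
   
   ```
   private TypeProtos.MinorType convertType(ColumnDto column) {
     switch (column.getColumnType()) {
       case BOOLEAN:
         return TypeProtos.MinorType.BIT;
       case INTEGER:
         return TypeProtos.MinorType.INT;
       case STRING:
         return TypeProtos.MinorType.VARCHAR;
       case TIMESTAMP:
         return TypeProtos.MinorType.TIMESTAMP;
       case LONG:
         return TypeProtos.MinorType.BIGINT;
       default:
         // Fail fast on a PcapTypes value with no known Drill mapping.
         throw new UnsupportedOperationException(
             "Unsupported PCAP type: " + column.getColumnType());
     }
   }
   ```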


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers opened a new pull request #1867: DRILL-7358: Fix COUNT(*) for empty text files

2019-10-05 Thread GitBox
paul-rogers opened a new pull request #1867: DRILL-7358: Fix COUNT(*) for empty 
text files
URL: https://github.com/apache/drill/pull/1867
 
 
   Fixes a subtle error when a text file has a header (and so has a
   schema) but appears in a COUNT(*) query, so that no columns are
   projected. Ensures that, in this case, an empty schema is
   treated as a valid result set.
   
   Tests: updated CSV tests to include this case.
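   
   An illustrative test for the fixed case, in the style of Drill's 
cluster-fixture CSV tests (file name and setup are hypothetical, not copied 
from the patch):
   
   ```
   @Test
   public void testCountOnHeaderOnlyCsv() throws Exception {
     // headerOnly.csv contains just the header line: "Year,Make,Model"
     String sql = "SELECT COUNT(*) FROM `dfs.data`.`headerOnly.csv`";
     long count = client.queryBuilder().sql(sql).singletonLong();
     // With the fix, a zero-column (empty) schema is a valid result set,
     // so the query returns a count of 0 instead of failing.
     assertEquals(0, count);
   }
   ```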


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers commented on issue #1867: DRILL-7358: Fix COUNT(*) for empty text files

2019-10-05 Thread GitBox
paul-rogers commented on issue #1867: DRILL-7358: Fix COUNT(*) for empty text 
files
URL: https://github.com/apache/drill/pull/1867#issuecomment-538706729
 
 
   @arina-ielchiieva, please review. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services