[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-02-12 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r256074380
 
 

 ##
 File path: 
protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java
 ##
 @@ -895,6 +903,7 @@ public static CoreOperatorType valueOf(int value) {
 case 55: return PCAPNG_SUB_SCAN;
 case 56: return RUNTIME_FILTER;
 case 57: return ROWKEY_JOIN;
+case 58: return SYSLOG_SUB_SCAN;
 
 Review comment:
   Based on the message `Could NOT find Protobuf (missing: Protobuf_LIBRARIES Protobuf_INCLUDE_DIR)`, I suspect you skipped `brew install protobuf`.
   Please perform steps 0.1 through 2.7 from the [doc](https://github.com/apache/drill/blob/master/contrib/native/client/readme.macos#L40).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-02-12 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r256075337
 
 

 ##
 File path: distribution/src/assemble/component.xml
 ##
 @@ -426,4 +427,4 @@
 
   
 
-
+
 
 Review comment:
   ```suggestion
   
   
   ```


[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-02-12 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r256084710
 
 

 ##
 File path: 
contrib/format-syslog/src/main/java/org/apache/drill/exec/store/syslog/SyslogFormatPlugin.java
 ##
 @@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.syslog;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.RecordReader;
+import org.apache.drill.exec.store.RecordWriter;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
+import org.apache.drill.exec.store.dfs.easy.EasyWriter;
+import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.drill.exec.proto.UserBitShared.CoreOperatorType;
+
+
+import java.util.List;
+
+public class SyslogFormatPlugin extends EasyFormatPlugin<SyslogFormatConfig> {
+
+  public static final String DEFAULT_NAME = "syslog";
+  private final SyslogFormatConfig formatConfig;
+
+  public SyslogFormatPlugin(String name, DrillbitContext context,
+                            Configuration fsConf, StoragePluginConfig storageConfig,
+                            SyslogFormatConfig formatConfig) {
+    super(name, context, fsConf, storageConfig, formatConfig,
+        true,  // readable
+        false, // writable
+        true,  // blockSplittable
+        true,  // compressible
+        Lists.newArrayList(formatConfig.getExtensions()),
+        DEFAULT_NAME);
+    this.formatConfig = formatConfig;
+  }
+
+  @Override
+  public RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs, FileWork fileWork,
+                                      List<SchemaPath> columns, String userName) throws ExecutionSetupException {
+    return new SyslogRecordReader(context, dfs, fileWork, columns, userName, formatConfig);
+  }
+
+  @Override
+  public boolean supportsPushDown() {
+    return true;
+  }
+
+  @Override
+  public RecordWriter getRecordWriter(FragmentContext context,
+                                      EasyWriter writer) throws UnsupportedOperationException {
+    throw new UnsupportedOperationException("Drill does not support writing records to Syslog format.");
+  }
+
+  @Override
+  public int getReaderOperatorType() {
+    return CoreOperatorType.SYSLOG_SUB_SCAN_VALUE;
+  }
+
+  @Override
+  public int getWriterOperatorType() {
+    throw new UnsupportedOperationException("Drill does not support writing records to Syslog format.");
+  }
+}
 
 Review comment:
   ```suggestion
   }
   
   ```


[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-02-12 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r256077643
 
 

 ##
 File path: contrib/format-syslog/src/main/resources/drill-module.conf
 ##
 @@ -0,0 +1,22 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+//
+//  This file tells Drill to consider this module when class path scanning.
+//  This file can also include any supplementary configuration information.
+//  This file is in HOCON format, see https://github.com/typesafehub/config/blob/master/HOCON.md for more information.
+
+drill.classpath.scanning: {
+  packages += "org.apache.drill.exec.store.syslog"
+}
 
 Review comment:
   ```suggestion
   }
   
   ```


[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-02-12 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r256081405
 
 

 ##
 File path: 
contrib/format-syslog/src/main/java/org/apache/drill/exec/store/syslog/SyslogRecordReader.java
 ##
 @@ -0,0 +1,403 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.syslog;
+
+import com.google.common.base.Charsets;
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+import org.apache.hadoop.fs.Path;
+import org.realityforge.jsyslog.message.StructuredDataParameter;
+import org.realityforge.jsyslog.message.SyslogMessage;
+
+import java.io.BufferedReader;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.text.SimpleDateFormat;
+import java.util.List;
+import java.util.Map;
+import java.util.Iterator;
+
+public class SyslogRecordReader extends AbstractRecordReader {
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(SyslogRecordReader.class);
+  private static final int MAX_RECORDS_PER_BATCH = 4096;
+
+  private final DrillFileSystem fileSystem;
+  private final FileWork fileWork;
+  private final String userName;
+  private BufferedReader reader;
+  private DrillBuf buffer;
+  private VectorContainerWriter writer;
+  private SyslogFormatConfig config;
+  private int maxErrors;
+  private boolean flattenStructuredData;
+  private int errorCount;
+  private int lineCount;
+  private List<SchemaPath> projectedColumns;
+  private String line;
+
+  private SimpleDateFormat df;
+
+  public SyslogRecordReader(FragmentContext context,
+                            DrillFileSystem fileSystem,
+                            FileWork fileWork,
+                            List<SchemaPath> columns,
+                            String userName,
+                            SyslogFormatConfig config) throws OutOfMemoryException {
+
+    this.fileSystem = fileSystem;
+    this.fileWork = fileWork;
+    this.userName = userName;
+    this.config = config;
+    this.maxErrors = config.getMaxErrors();
+    this.df = getValidDateObject("-MM-dd'T'HH:mm:ss.SSS'Z'");
+    this.errorCount = 0;
+    this.buffer = context.getManagedBuffer(2048);
 
 Review comment:
   I can't find any other usage of this method in the project.
   Usually `context.getManagedBuffer()` or `context.getManagedBuffer().reallocIfNeeded(xxx)` is used.
   Could you clarify why `context.getManagedBuffer(2048)` is used here, and how the value `2048` was chosen?
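A minimal sketch of the grow-on-demand idea behind `reallocIfNeeded`, written in plain `java.nio` so it runs without Drill. It is not Drill's `DrillBuf` API: the class name `GrowableBufferDemo`, the 256-byte starting size and the round-up-to-a-power-of-two policy are assumptions made for illustration. The point is that the buffer only grows when a value actually needs more room, so no fixed size such as 2048 has to be justified up front.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical illustration of the "realloc if needed" pattern using java.nio,
 * not Drill's DrillBuf. Start small and grow only when a value does not fit.
 */
public class GrowableBufferDemo {

  // Assumed small starting size; chosen for the example only.
  private ByteBuffer buffer = ByteBuffer.allocate(256);

  /** Returns a buffer with capacity >= size, growing to the next power of two if needed. */
  ByteBuffer reallocIfNeeded(int size) {
    if (buffer.capacity() >= size) {
      return buffer;
    }
    // Next power of two >= size (size > 256 here, so size - 1 > 0).
    int newCapacity = Integer.highestOneBit(size - 1) << 1;
    // Old contents are discarded: fine for a scratch buffer reused per value.
    buffer = ByteBuffer.allocate(newCapacity);
    return buffer;
  }

  /** Copies a string's UTF-8 bytes into the scratch buffer and returns its capacity. */
  int write(String value) {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = reallocIfNeeded(bytes.length);
    buf.clear();
    buf.put(bytes);
    return buf.capacity();
  }

  public static void main(String[] args) {
    GrowableBufferDemo demo = new GrowableBufferDemo();
    System.out.println(demo.write("short"));          // fits in the initial buffer
    System.out.println(demo.write("x".repeat(3000))); // grows once
  }
}
```

Growing to a power of two keeps the number of reallocations logarithmic in the largest value seen, at the cost of some unused capacity.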


[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-02-12 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r256077359
 
 

 ##
 File path: 
contrib/format-syslog/src/test/java/org/apache/drill/exec/store/syslog/TestSyslogFormat.java
 ##
 @@ -0,0 +1,302 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.syslog;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.rpc.RpcException;
+import org.apache.drill.exec.server.Drillbit;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.dfs.FileSystemConfig;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.test.BaseDirTestWatcher;
+import org.apache.drill.test.rowSet.RowSet;
+import org.apache.drill.test.rowSet.RowSetBuilder;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.ClassRule;
+
+public class TestSyslogFormat extends ClusterTest {
+
+  @ClassRule
+  public static final BaseDirTestWatcher dirTestWatcher = new BaseDirTestWatcher();
+
+  @BeforeClass
+  public static void setup() throws Exception {
+    ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher).maxParallelization(1));
+    defineSyslogPlugin();
+  }
+
+  private static void defineSyslogPlugin() throws ExecutionSetupException {
+    SyslogFormatConfig sampleConfig = new SyslogFormatConfig();
+    sampleConfig.setExtension("syslog");
+
+    SyslogFormatConfig flattenedDataConfig = new SyslogFormatConfig();
+    flattenedDataConfig.setExtension("syslog1");
+    flattenedDataConfig.setFlattenStructuredData(true);
+
+    // Define a temporary plugin for the "cp" storage plugin.
+    Drillbit drillbit = cluster.drillbit();
+    final StoragePluginRegistry pluginRegistry = drillbit.getContext().getStorage();
+    final FileSystemPlugin plugin = (FileSystemPlugin) pluginRegistry.getPlugin("cp");
+    final FileSystemConfig pluginConfig = (FileSystemConfig) plugin.getConfig();
+    pluginConfig.getFormats().put("sample", sampleConfig);
+    pluginConfig.getFormats().put("flat", flattenedDataConfig);
+    pluginRegistry.createOrUpdate("cp", pluginConfig, false);
+  }
+
+  @Test
+  public void testNonComplexFields() throws RpcException {
+    String sql = "SELECT event_date," +
+        "severity_code," +
+        "severity," +
+        "facility_code," +
+        "facility," +
+        "ip," +
+        "process_id," +
+        "message_id," +
+        "structured_data_text " +
+        "FROM cp.`syslog/logs.syslog`";
+
+    RowSet results = client.queryBuilder().sql(sql).rowSet();
+
+    TupleMetadata expectedSchema = new SchemaBuilder()
+        .add("event_date", TypeProtos.MinorType.TIMESTAMP, TypeProtos.DataMode.OPTIONAL)
+        .add("severity_code", TypeProtos.MinorType.INT, TypeProtos.DataMode.OPTIONAL)
+        .add("severity", TypeProtos.MinorType.VARCHAR, TypeProtos.DataMode.OPTIONAL)
+        .add("facility_code", TypeProtos.MinorType.INT, TypeProtos.DataMode.OPTIONAL)
+        .add("facility", TypeProtos.MinorType.VARCHAR, TypeProtos.DataMode.OPTIONAL)
+        .add("ip", TypeProtos.MinorType.VARCHAR, TypeProtos.DataMode.OPTIONAL)
+        .add("process_id", TypeProtos.MinorType.VARCHAR, TypeProtos.DataMode.OPTIONAL)
+        .add("message_id", TypeProtos.MinorType.VARCHAR, TypeProtos.DataMode.OPTIONAL)
+        .add("structured_data_text", TypeProtos.MinorType.VARCHAR, TypeProtos.DataMode.OPTIONAL)
+        .buildSchema();
+
+    RowSet expected = new RowSetBuilder(client.allocator(), expectedSchema)
+        .addRow(1065910455003L, 2, "CRIT", 4, "AUTH", "mymachine.example.com", null, "", "")
+        .addRow(482196050520L, 2, "CRIT", 4, "AUTH", "mymachine.example.com", null,

[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-02-12 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r256076616
 
 

 ##
 File path: contrib/native/client/src/protobuf/UserBitShared.pb.h
 ##
 @@ -1,65 +1,152 @@
 // Generated by the protocol buffer compiler.  DO NOT EDIT!
 // source: UserBitShared.proto
 
-#ifndef PROTOBUF_UserBitShared_2eproto__INCLUDED
-#define PROTOBUF_UserBitShared_2eproto__INCLUDED
+#ifndef PROTOBUF_INCLUDED_UserBitShared_2eproto
+#define PROTOBUF_INCLUDED_UserBitShared_2eproto
 
 #include 
 
 #include 
 
-#if GOOGLE_PROTOBUF_VERSION < 2005000
+#if GOOGLE_PROTOBUF_VERSION < 3006001
 
 Review comment:
   Please install protobuf version 2.5; Drill currently uses it.
   It will be updated soon, though; see #1639.
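The version guards in the quoted header compare against protobuf's single-integer version number, which encodes major × 1,000,000 + minor × 1,000 + micro. A small decoding sketch (the class name `ProtobufVersion` is hypothetical) shows what the old and new thresholds mean: the old header required protobuf 2.5.0 or newer, while the regenerated one requires 3.6.1 or newer, which suggests the file was regenerated with a newer `protoc` than the one Drill uses.

```java
/**
 * Hypothetical helper: decodes protobuf's GOOGLE_PROTOBUF_VERSION integer,
 * which is encoded as major * 1_000_000 + minor * 1_000 + micro.
 */
public class ProtobufVersion {

  static String decode(int version) {
    int major = version / 1_000_000;
    int minor = (version / 1_000) % 1_000;
    int micro = version % 1_000;
    return major + "." + minor + "." + micro;
  }

  public static void main(String[] args) {
    System.out.println(decode(2005000)); // threshold in the old generated header
    System.out.println(decode(3006001)); // threshold in the regenerated header
  }
}
```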


[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-02-05 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r253767075
 
 

 ##
 File path: 
contrib/format-syslog/src/main/java/org/apache/drill/exec/store/syslog/SyslogRecordReader.java
 ##
 @@ -0,0 +1,403 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.syslog;
+
+import com.google.common.base.Charsets;
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+import org.apache.hadoop.fs.Path;
+import org.realityforge.jsyslog.message.StructuredDataParameter;
+import org.realityforge.jsyslog.message.SyslogMessage;
+
+import java.io.BufferedReader;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.text.SimpleDateFormat;
+import java.util.List;
+import java.util.Map;
+import java.util.Iterator;
+
+public class SyslogRecordReader extends AbstractRecordReader {
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(SyslogRecordReader.class);
+  private static final int MAX_RECORDS_PER_BATCH = 4096;
+
+  private final DrillFileSystem fileSystem;
+  private final FileWork fileWork;
+  private final String userName;
+  private BufferedReader reader;
+  private DrillBuf buffer;
+  private VectorContainerWriter writer;
+  private SyslogFormatConfig config;
+  private int maxErrors;
+  private boolean flattenStructuredData;
+  private int errorCount;
+  private int lineCount;
+  private List<SchemaPath> projectedColumns;
+  private String line;
+
+  private SimpleDateFormat df;
+
+  public SyslogRecordReader(FragmentContext context,
+                            DrillFileSystem fileSystem,
+                            FileWork fileWork,
+                            List<SchemaPath> columns,
+                            String userName,
+                            SyslogFormatConfig config) throws OutOfMemoryException {
+
+    this.fileSystem = fileSystem;
+    this.fileWork = fileWork;
+    this.userName = userName;
+    this.config = config;
+    this.maxErrors = config.getMaxErrors();
+    this.df = getValidDateObject("-MM-dd'T'HH:mm:ss.SSS'Z'");
+    this.errorCount = 0;
+    this.buffer = context.getManagedBuffer(2048);
 
 Review comment:
   There is a constant for 4096 (`MAX_RECORDS_PER_BATCH`), but here a new literal value, 2048, is introduced and used instead?
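One way to address this, sketched with hypothetical names: give each sizing value its own named, commented constant so the two numbers cannot be confused. `INITIAL_BUFFER_SIZE` and the class `ReaderSizing` are illustrative only and do not appear in the PR.

```java
/**
 * Hypothetical sketch: every sizing value gets a named constant that says
 * what it bounds, instead of an inline literal.
 */
public class ReaderSizing {

  // Upper bound on rows written per batch (mirrors the constant in the quoted code).
  static final int MAX_RECORDS_PER_BATCH = 4096;

  // Initial size, in bytes, of the scratch buffer used to copy one string value.
  // Hypothetical name; the value should be justified or replaced by grow-on-demand.
  static final int INITIAL_BUFFER_SIZE = 2048;

  public static void main(String[] args) {
    System.out.println("rows per batch: " + MAX_RECORDS_PER_BATCH);
    System.out.println("initial buffer bytes: " + INITIAL_BUFFER_SIZE);
  }
}
```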


[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-02-05 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r253763788
 
 

 ##
 File path: contrib/format-syslog/README.md
 ##
 @@ -0,0 +1,41 @@
+# Syslog Format Plugin
 
 Review comment:
   Once you create a Jira, please post its number here.
   
   I think you meant `contrib`. It is not a strict division, but core components live in `java-exec`, and almost all storage/format plugins are in the `contrib` module.


[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-02-05 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r253784583
 
 

 ##
 File path: 
contrib/format-syslog/src/main/java/org/apache/drill/exec/store/syslog/SyslogFormatConfig.java
 ##
 @@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.syslog;
+
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Objects;
+import org.apache.drill.common.logical.FormatPluginConfig;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.ArrayList;
+
+@JsonTypeName("syslog")
+public class SyslogFormatConfig implements FormatPluginConfig {
+
+  public List<String> extensions;
+  public int maxErrors = 10;
+  public boolean flattenStructuredData = false;
+
+  public boolean getFlattenStructuredData() {
+    return flattenStructuredData;
+  }
+
+  public int getMaxErrors() {
+    return maxErrors;
+  }
+
+  public List<String> getExtensions() {
+    return extensions;
+  }
+
+  public void setExtensions(List<String> ext) {
+    this.extensions = ext;
+  }
+
+  public void setExtension(String ext) {
+    if (this.extensions == null) {
+      this.extensions = new ArrayList<>();
+    }
+    this.extensions.add(ext);
+  }
+
+  public void setMaxErrors(int errors) {
+    this.maxErrors = errors;
+  }
+
+  public void setFlattenStructuredData(boolean flattenErrors) {
+    this.flattenStructuredData = flattenErrors;
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+    if (this == obj) {
+      return true;
+    }
+    if (obj == null || getClass() != obj.getClass()) {
+      return false;
+    }
+    SyslogFormatConfig other = (SyslogFormatConfig) obj;
+    return Objects.equal(extensions, other.extensions) &&
+        Objects.equal(maxErrors, other.maxErrors) &&
+        Objects.equal(flattenStructuredData, other.flattenStructuredData);
+  }
+
+  @Override
+  public int hashCode() {
+    return Arrays.hashCode(new Object[]{maxErrors, flattenStructuredData, extensions});
+  }
+
+}
 
 Review comment:
   It looks like the trailing newline is still missing (in some other files too).
   ```suggestion
   }
   
   ```
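A missing trailing newline can be caught before pushing with a small check. This is a hypothetical helper (`MissingNewlineCheck` is not part of Drill or its build); it walks a directory and reports `.java` and `.conf` files whose last byte is not `\n`, which is what a diff flags as "No newline at end of file".

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Hypothetical helper: reports source files that do not end with '\n'. */
public class MissingNewlineCheck {

  /** True if the file's last byte is '\n'; empty files are treated as fine. */
  static boolean endsWithNewline(Path file) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
      long length = raf.length();
      if (length == 0) {
        return true;
      }
      raf.seek(length - 1);
      return raf.read() == '\n';
    }
  }

  public static void main(String[] args) throws IOException {
    Path root = Paths.get(args.length > 0 ? args[0] : ".");
    try (Stream<Path> files = Files.walk(root)) {
      files.filter(Files::isRegularFile)
          .filter(p -> p.toString().endsWith(".java") || p.toString().endsWith(".conf"))
          .forEach(p -> {
            try {
              if (!endsWithNewline(p)) {
                System.out.println("missing trailing newline: " + p);
              }
            } catch (IOException e) {
              System.err.println("could not read " + p + ": " + e.getMessage());
            }
          });
    }
  }
}
```

Reading only the last byte keeps the check cheap even for large files.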


[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-01-16 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r248421269
 
 

 ##
 File path: 
contrib/format-syslog/src/main/java/org/apache/drill/exec/store/syslog/SyslogRecordReader.java
 ##
 @@ -0,0 +1,395 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.syslog;
+
+import com.google.common.base.Charsets;
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+import org.apache.hadoop.fs.Path;
+import org.realityforge.jsyslog.message.StructuredDataParameter;
+import org.realityforge.jsyslog.message.SyslogMessage;
+
+import java.io.BufferedReader;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.text.SimpleDateFormat;
+import java.util.List;
+import java.util.Map;
+import java.util.Iterator;
+
+public class SyslogRecordReader extends AbstractRecordReader {
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(SyslogRecordReader.class);
+  private static final int MAX_RECORDS_PER_BATCH = 8096;
+
+  private final DrillFileSystem fileSystem;
+  private final FileWork fileWork;
+  private final String userName;
+  private BufferedReader reader;
+  private DrillBuf buffer;
+  private VectorContainerWriter writer;
+  private SyslogFormatConfig config;
+  private int maxErrors;
+  private boolean flattenStructuredData;
+  private int errorCount;
+  private int lineCount;
+  private List<SchemaPath> projectedColumns;
+  private String line;
+
+  private SimpleDateFormat df;
+
+
+  public SyslogRecordReader(FragmentContext context,
+                            DrillFileSystem fileSystem,
+                            FileWork fileWork,
+                            List<SchemaPath> columns,
+                            String userName,
+                            SyslogFormatConfig config) throws OutOfMemoryException {
+
+    this.fileSystem = fileSystem;
+    this.fileWork = fileWork;
+    this.userName = userName;
+    this.config = config;
+    this.maxErrors = config.getMaxErrors();
+    this.df = getValidDateObject("-MM-dd'T'HH:mm:ss.SSS'Z'");
+    this.errorCount = 0;
+    this.buffer = context.getManagedBuffer(4096);
+    this.projectedColumns = columns;
+    this.flattenStructuredData = config.getFlattenStructuredData();
+
+
 
 Review comment:
   remove one empty line


[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-01-16 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r247707978
 
 

 ##
 File path: 
contrib/format-syslog/src/test/java/org/apache/drill/exec/store/syslog/TestSyslogFormat.java
 ##
 @@ -0,0 +1,310 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.syslog;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.rpc.RpcException;
+import org.apache.drill.exec.server.Drillbit;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.dfs.FileSystemConfig;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.test.BaseDirTestWatcher;
+import org.apache.drill.test.rowSet.RowSet;
+import org.apache.drill.test.rowSet.RowSetBuilder;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.ClassRule;
+
+public class TestSyslogFormat extends ClusterTest {
+
+  @ClassRule
+  public static final BaseDirTestWatcher dirTestWatcher = new BaseDirTestWatcher();
+
+  @BeforeClass
+  public static void setup() throws Exception {
+ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher).maxParallelization(1));
+defineSyslogPlugin();
+  }
+
+  private static void defineSyslogPlugin() throws ExecutionSetupException {
+SyslogFormatConfig sampleConfig = new SyslogFormatConfig();
+sampleConfig.setExtension("syslog");
+
+SyslogFormatConfig flattenedDataConfig = new SyslogFormatConfig();
+flattenedDataConfig.setExtension("syslog1");
+flattenedDataConfig.setFlattenStructuredData(true);
+
+
+// Define a temporary plugin for the "cp" storage plugin.
+Drillbit drillbit = cluster.drillbit();
+final StoragePluginRegistry pluginRegistry = drillbit.getContext().getStorage();
+final FileSystemPlugin plugin = (FileSystemPlugin) pluginRegistry.getPlugin("cp");
+final FileSystemConfig pluginConfig = (FileSystemConfig) plugin.getConfig();
+pluginConfig.getFormats().put("sample", sampleConfig);
+pluginConfig.getFormats().put("flat", flattenedDataConfig);
+pluginRegistry.createOrUpdate("cp", pluginConfig, false);
+
+  }
+
+  @Test
+  public void testNonComplexFields() throws RpcException {
+String sql = "SELECT event_date," +
+"severity_code," +
 
 Review comment:
   Add a space after each comma in the query string.
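   A minimal illustration of the point, assuming the query is built by concatenating string literals as in `testNonComplexFields()`. Both strings should parse the same way, so the space is about the readability of the generated query text rather than correctness. The class name and the table path in the `FROM` clause are hypothetical placeholders.

   ```java
   final class SqlConcatDemo {

     // Without a space after each comma, the concatenated literals run together.
     static String withoutSpace() {
       return "SELECT event_date," +
           "severity_code " +
           "FROM cp.`example.syslog`";
     }

     // With a space after each comma, the generated query reads naturally.
     static String withSpace() {
       return "SELECT event_date, " +
           "severity_code " +
           "FROM cp.`example.syslog`";
     }

     public static void main(String[] args) {
       System.out.println(withoutSpace()); // SELECT event_date,severity_code FROM cp.`example.syslog`
       System.out.println(withSpace());    // SELECT event_date, severity_code FROM cp.`example.syslog`
     }
   }
   ```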



[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-01-16 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r244857160
 
 

 ##
 File path: contrib/format-syslog/src/main/resources/checkstyle-suppressions.xml
 ##
 @@ -0,0 +1,28 @@
+
+
+

[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-01-16 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r248424980
 
 

 ##
 File path: 
contrib/format-syslog/src/main/java/org/apache/drill/exec/store/syslog/SyslogRecordReader.java
 ##
 @@ -0,0 +1,395 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.syslog;
+
+import com.google.common.base.Charsets;
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+import org.apache.hadoop.fs.Path;
+import org.realityforge.jsyslog.message.StructuredDataParameter;
+import org.realityforge.jsyslog.message.SyslogMessage;
+
+import java.io.BufferedReader;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.text.SimpleDateFormat;
+import java.util.List;
+import java.util.Map;
+import java.util.Iterator;
+
+public class SyslogRecordReader extends AbstractRecordReader {
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(SyslogRecordReader.class);
+  private static final int MAX_RECORDS_PER_BATCH = 8096;
+
+  private final DrillFileSystem fileSystem;
+  private final FileWork fileWork;
+  private final String userName;
+  private BufferedReader reader;
+  private DrillBuf buffer;
+  private VectorContainerWriter writer;
+  private SyslogFormatConfig config;
+  private int maxErrors;
+  private boolean flattenStructuredData;
+  private int errorCount;
+  private int lineCount;
+  private List projectedColumns;
+  private String line;
+
+  private SimpleDateFormat df;
+
+
+  public SyslogRecordReader(FragmentContext context,
+DrillFileSystem fileSystem,
+FileWork fileWork,
+List columns,
+String userName,
+SyslogFormatConfig config) throws OutOfMemoryException {
+
+this.fileSystem = fileSystem;
+this.fileWork = fileWork;
+this.userName = userName;
+this.config = config;
+this.maxErrors = config.getMaxErrors();
+this.df = getValidDateObject("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
+this.errorCount = 0;
+this.buffer = context.getManagedBuffer(4096);
+this.projectedColumns = columns;
+this.flattenStructuredData = config.getFlattenStructuredData();
+
+
+setColumns(columns);
+  }
+
+  public void setup(final OperatorContext context, final OutputMutator output) throws ExecutionSetupException {
+openFile();
+this.writer = new VectorContainerWriter(output);
+  }
+
+  private void openFile() {
+InputStream in;
+try {
+  in = fileSystem.open(new Path(fileWork.getPath()));
+} catch (Exception e) {
+  throw UserException
+  .dataReadError(e)
+  .message("Failed to open input file: %s", fileWork.getPath())
+  .addContext("User name", this.userName)
+  .build(logger);
+}
+this.lineCount = 0;
+reader = new BufferedReader(new InputStreamReader(in, Charsets.UTF_8));
+  }
+
+  public int next() {
+this.writer.allocate();
+this.writer.reset();
+
+int recordCount = 0;
+
+try {
+  BaseWriter.MapWriter map = this.writer.rootAsMap();
+  String line = null;
+
+  while (recordCount < MAX_RECORDS_PER_BATCH && (line = this.reader.readLine()) != null) {
+lineCount++;
+
+// Skip empty lines
+line = line.trim();
+if (line.length() == 0) {

[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-01-16 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r236334709
 
 

 ##
 File path: 
contrib/format-syslog/src/main/java/org/apache/drill/exec/store/syslog/SyslogFormatConfig.java
 ##
 @@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.syslog;
+
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Objects;
+import org.apache.drill.common.logical.FormatPluginConfig;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.ArrayList;
+
+@JsonTypeName("syslog")
 
 Review comment:
   Consider adding `@JsonInclude(Include.NON_DEFAULT)` to exclude `{"extensions":[]}`.
   
   It looks like `LogFormatConfig` requires it too.



[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-01-16 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r244859297
 
 

 ##
 File path: distribution/src/assemble/bin.xml
 ##
 @@ -103,6 +103,7 @@
 org.apache.drill.contrib:drill-jdbc-storage
 org.apache.drill.contrib:drill-kudu-storage
 org.apache.drill.contrib:drill-storage-kafka
+org.apache.drill.contrib:drill-format-syslog
 
 Review comment:
   `distribution/assembly` was reorganized. Please put it into the `/drill/distribution/src/assemble/component.xml` file.



[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-01-16 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r248423541
 
 

 ##
 File path: 
contrib/format-syslog/src/main/java/org/apache/drill/exec/store/syslog/SyslogRecordReader.java
 ##
 @@ -0,0 +1,395 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.syslog;
+
+import com.google.common.base.Charsets;
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+import org.apache.hadoop.fs.Path;
+import org.realityforge.jsyslog.message.StructuredDataParameter;
+import org.realityforge.jsyslog.message.SyslogMessage;
+
+import java.io.BufferedReader;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.text.SimpleDateFormat;
+import java.util.List;
+import java.util.Map;
+import java.util.Iterator;
+
+public class SyslogRecordReader extends AbstractRecordReader {
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(SyslogRecordReader.class);
+  private static final int MAX_RECORDS_PER_BATCH = 8096;
+
+  private final DrillFileSystem fileSystem;
+  private final FileWork fileWork;
+  private final String userName;
+  private BufferedReader reader;
+  private DrillBuf buffer;
+  private VectorContainerWriter writer;
+  private SyslogFormatConfig config;
+  private int maxErrors;
+  private boolean flattenStructuredData;
+  private int errorCount;
+  private int lineCount;
+  private List projectedColumns;
+  private String line;
+
+  private SimpleDateFormat df;
+
+
+  public SyslogRecordReader(FragmentContext context,
+DrillFileSystem fileSystem,
+FileWork fileWork,
+List columns,
+String userName,
+SyslogFormatConfig config) throws OutOfMemoryException {
+
+this.fileSystem = fileSystem;
+this.fileWork = fileWork;
+this.userName = userName;
+this.config = config;
+this.maxErrors = config.getMaxErrors();
+this.df = getValidDateObject("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
+this.errorCount = 0;
+this.buffer = context.getManagedBuffer(4096);
+this.projectedColumns = columns;
+this.flattenStructuredData = config.getFlattenStructuredData();
+
+
+setColumns(columns);
+  }
+
+  public void setup(final OperatorContext context, final OutputMutator output) throws ExecutionSetupException {
 
 Review comment:
   Please add the `@Override` annotation. It is not mandatory, but it makes the code easier to understand.
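   A small sketch of what `@Override` buys, using hypothetical stand-in types rather than Drill's real reader hierarchy: the annotation documents that `setup()` implements an inherited method, and `javac` rejects the class if the signature stops matching (for example after a typo such as `setUp`).

   ```java
   // Hypothetical stand-in for a reader base class; NOT Drill's real API.
   abstract class HypotheticalReaderBase {
     abstract void setup(String context);

     abstract int next();
   }

   final class OverrideDemoReader extends HypotheticalReaderBase {
     private boolean ready;

     // @Override documents intent and makes the compiler verify that this
     // method really overrides a superclass method.
     @Override
     void setup(String context) {
       ready = true;
     }

     // Returns 1 once setup() has run, 0 before that.
     @Override
     int next() {
       return ready ? 1 : 0;
     }
   }
   ```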



[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-01-16 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r248429446
 
 

 ##
 File path: 
contrib/format-syslog/src/main/java/org/apache/drill/exec/store/syslog/SyslogRecordReader.java
 ##
 @@ -0,0 +1,395 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.syslog;
+
+import com.google.common.base.Charsets;
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+import org.apache.hadoop.fs.Path;
+import org.realityforge.jsyslog.message.StructuredDataParameter;
+import org.realityforge.jsyslog.message.SyslogMessage;
+
+import java.io.BufferedReader;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.text.SimpleDateFormat;
+import java.util.List;
+import java.util.Map;
+import java.util.Iterator;
+
+public class SyslogRecordReader extends AbstractRecordReader {
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(SyslogRecordReader.class);
+  private static final int MAX_RECORDS_PER_BATCH = 8096;
+
+  private final DrillFileSystem fileSystem;
+  private final FileWork fileWork;
+  private final String userName;
+  private BufferedReader reader;
+  private DrillBuf buffer;
+  private VectorContainerWriter writer;
+  private SyslogFormatConfig config;
+  private int maxErrors;
+  private boolean flattenStructuredData;
+  private int errorCount;
+  private int lineCount;
+  private List projectedColumns;
+  private String line;
+
+  private SimpleDateFormat df;
+
+
+  public SyslogRecordReader(FragmentContext context,
+DrillFileSystem fileSystem,
+FileWork fileWork,
+List columns,
+String userName,
+SyslogFormatConfig config) throws OutOfMemoryException {
+
+this.fileSystem = fileSystem;
+this.fileWork = fileWork;
+this.userName = userName;
+this.config = config;
+this.maxErrors = config.getMaxErrors();
+this.df = getValidDateObject("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
+this.errorCount = 0;
+this.buffer = context.getManagedBuffer(4096);
 
 Review comment:
   Use a constant for `4096`, possibly with an explanation of why `context.getManagedBuffer()` is not sufficient.
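   One possible shape for this, sketched with `java.nio.ByteBuffer` standing in for Drill's `DrillBuf` and a hypothetical constant name. It assumes the buffer holds one line at a time and must grow when a line is longer than the initial size; the real reader may handle this differently, so treat the growth logic only as an illustration of why the size deserves a name and a comment.

   ```java
   import java.nio.ByteBuffer;
   import java.nio.charset.StandardCharsets;

   final class LineBufferDemo {
     // Hypothetical named constant replacing the bare 4096; its comment should
     // record why this size was chosen.
     static final int INITIAL_LINE_BUFFER_BYTES = 4096;

     private ByteBuffer buffer = ByteBuffer.allocate(INITIAL_LINE_BUFFER_BYTES);

     // Copies the UTF-8 bytes of one line into the buffer, doubling the
     // capacity until the line fits. Returns the number of bytes written.
     int write(String line) {
       byte[] bytes = line.getBytes(StandardCharsets.UTF_8);
       int capacity = buffer.capacity();
       while (capacity < bytes.length) {
         capacity *= 2;
       }
       if (capacity != buffer.capacity()) {
         buffer = ByteBuffer.allocate(capacity);
       }
       buffer.clear();
       buffer.put(bytes);
       return bytes.length;
     }

     int capacity() {
       return buffer.capacity();
     }
   }
   ```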



[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-01-16 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r248420290
 
 

 ##
 File path: 
contrib/format-syslog/src/main/java/org/apache/drill/exec/store/syslog/SyslogFormatPlugin.java
 ##
 @@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.syslog;
+
+import com.google.common.collect.Lists;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.RecordReader;
+import org.apache.drill.exec.store.RecordWriter;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
+import org.apache.drill.exec.store.dfs.easy.EasyWriter;
+import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.hadoop.conf.Configuration;
+
+import java.util.List;
+
+public class SyslogFormatPlugin extends EasyFormatPlugin {
+
+  public static final String DEFAULT_NAME = "syslog";
+  private final SyslogFormatConfig formatConfig;
+
+  public SyslogFormatPlugin(String name, DrillbitContext context,
+Configuration fsConf, StoragePluginConfig storageConfig,
+SyslogFormatConfig formatConfig) {
+super(name, context, fsConf, storageConfig, formatConfig,
+true,  // readable
+false, // writable
+true, // blockSplittable
+true,  // compressible
+Lists.newArrayList(formatConfig.getExtensions()),
+DEFAULT_NAME);
+this.formatConfig = formatConfig;
+  }
+
+  @Override
+  public RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs, FileWork fileWork,
+  List columns, String userName) throws ExecutionSetupException {
+return new SyslogRecordReader(context, dfs, fileWork, columns, userName, formatConfig);
+  }
+
+  @Override
+  public boolean supportsPushDown() {
+return true;
+  }
+
+  @Override
+  public RecordWriter getRecordWriter(FragmentContext context,
+  EasyWriter writer) throws UnsupportedOperationException {
+throw new UnsupportedOperationException("unimplemented");
+  }
+
+  @Override
+  public int getReaderOperatorType() {
+return 0;
+  }  //TODO Add protobuf for this
+
+  @Override
+  public int getWriterOperatorType() {
+throw new UnsupportedOperationException("unimplemented");
 
 Review comment:
   Let's add a more precise exception description, for instance:
   `Drill doesn't currently support writing to SysLog files.`
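   A minimal sketch of the suggested change, using a hypothetical stand-in class rather than the real `SyslogFormatPlugin`. Keeping the message in one constant would also let `getRecordWriter()` reuse it.

   ```java
   final class SyslogWriterSupportDemo {
     // Hypothetical constant holding the message suggested in the review.
     static final String WRITE_UNSUPPORTED =
         "Drill doesn't currently support writing to SysLog files.";

     // Mirrors the shape of getWriterOperatorType(): the format is read-only,
     // so the method fails fast with a message that tells the user why.
     static int getWriterOperatorType() {
       throw new UnsupportedOperationException(WRITE_UNSUPPORTED);
     }
   }
   ```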



[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-01-16 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r244856976
 
 

 ##
 File path: contrib/format-syslog/src/main/resources/checkstyle-config.xml
 ##
 @@ -0,0 +1,40 @@
+
+
+http://www.puppycrawl.com/dtds/configuration_1_2.dtd";>
+
+
 
 Review comment:
   Why is this file needed? What is the difference from the default `drill/src/main/resources/checkstyle-config.xml`?



[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-01-16 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r248419639
 
 

 ##
 File path: 
contrib/format-syslog/src/main/java/org/apache/drill/exec/store/syslog/SyslogFormatPlugin.java
 ##
 @@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.syslog;
+
+import com.google.common.collect.Lists;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.RecordReader;
+import org.apache.drill.exec.store.RecordWriter;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
+import org.apache.drill.exec.store.dfs.easy.EasyWriter;
+import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.hadoop.conf.Configuration;
+
+import java.util.List;
+
+public class SyslogFormatPlugin extends EasyFormatPlugin {
+
+  public static final String DEFAULT_NAME = "syslog";
+  private final SyslogFormatConfig formatConfig;
+
+  public SyslogFormatPlugin(String name, DrillbitContext context,
+Configuration fsConf, StoragePluginConfig storageConfig,
+SyslogFormatConfig formatConfig) {
+super(name, context, fsConf, storageConfig, formatConfig,
+true,  // readable
+false, // writable
+true, // blockSplittable
+true,  // compressible
+Lists.newArrayList(formatConfig.getExtensions()),
+DEFAULT_NAME);
+this.formatConfig = formatConfig;
+  }
+
+  @Override
+  public RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs, FileWork fileWork,
+  List columns, String userName) throws ExecutionSetupException {
+return new SyslogRecordReader(context, dfs, fileWork, columns, userName, formatConfig);
+  }
+
+  @Override
+  public boolean supportsPushDown() {
+return true;
+  }
+
+  @Override
+  public RecordWriter getRecordWriter(FragmentContext context,
+  EasyWriter writer) throws UnsupportedOperationException {
+throw new UnsupportedOperationException("unimplemented");
+  }
+
+  @Override
+  public int getReaderOperatorType() {
+return 0;
+  }  //TODO Add protobuf for this
 
 Review comment:
   The proper way is to do it along with this PR. Without it, the feature is not complete.
   Just set up the correct Protobuf version (2.5.0) on your laptop and run the following commands:
   1) Java Protobuf files:
   `mvn process-sources -P proto-compile`
   The full guide is here: https://github.com/apache/drill/tree/master/protocol
   2) C++ Protobuf files: 
   `cd contrib/native/client`
   `rm -rf build`
   `mkdir build`
   `cd build && cmake -G "Unix Makefiles" ..`
   `make cpProtobufs`
   The full guide for macOS is here: https://github.com/apache/drill/blob/master/contrib/native/client/readme.macos#L88
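   For context, a sketch of what wiring in the generated type might look like once the Protobuf files are regenerated. The enum below is a hypothetical stand-in for the protoc-generated `UserBitShared.CoreOperatorType`, modeling only two constants; the numbers follow the generated `switch` in the later `UserBitShared.java` diff of this PR (57 = `ROWKEY_JOIN`, 58 = `SYSLOG_SUB_SCAN`). With the real generated code, the plugin would more likely return an int constant such as `CoreOperatorType.SYSLOG_SUB_SCAN_VALUE`.

   ```java
   // Hypothetical stand-in for the protoc-generated CoreOperatorType enum.
   enum StandInOperatorType {
     ROWKEY_JOIN(57),
     SYSLOG_SUB_SCAN(58);

     private final int number;

     StandInOperatorType(int number) {
       this.number = number;
     }

     int getNumber() {
       return number;
     }

     // Mirrors the generated valueOf(int): unknown numbers map to null.
     static StandInOperatorType valueOf(int value) {
       switch (value) {
         case 57: return ROWKEY_JOIN;
         case 58: return SYSLOG_SUB_SCAN;
         default: return null;
       }
     }
   }

   final class StandInSyslogFormatPlugin {
     // Reports the registered operator id instead of the placeholder 0
     // that the TODO in the plugin flags as unfinished.
     int getReaderOperatorType() {
       return StandInOperatorType.SYSLOG_SUB_SCAN.getNumber();
     }
   }
   ```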



[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-01-16 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r244854710
 
 

 ##
 File path: 
contrib/format-syslog/src/main/java/org/apache/drill/exec/store/syslog/SyslogRecordReader.java
 ##
 @@ -0,0 +1,395 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.syslog;
+
+import com.google.common.base.Charsets;
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+import org.apache.hadoop.fs.Path;
+import org.realityforge.jsyslog.message.StructuredDataParameter;
+import org.realityforge.jsyslog.message.SyslogMessage;
+
+import java.io.BufferedReader;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.text.SimpleDateFormat;
+import java.util.List;
+import java.util.Map;
+import java.util.Iterator;
+
+public class SyslogRecordReader extends AbstractRecordReader {
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(SyslogRecordReader.class);
+  private static final int MAX_RECORDS_PER_BATCH = 8096;
 
 Review comment:
   How was it obtained? For instance, in most Drill readers 4096 (or 4000) is the default row count per batch.
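   A minimal sketch of a batch loop capped at the more common 4096 rows, using plain `java.io` types instead of Drill's vector writers. The class name is hypothetical; the loop mirrors the reader under review (check the row limit before reading, trim each line, skip empty lines). Note that 8096 is neither 4096 nor a power of two (8192).

   ```java
   import java.io.BufferedReader;
   import java.io.IOException;
   import java.util.ArrayList;
   import java.util.List;

   final class BatchedLineReader {
     // Row limit per batch; 4096 is the default the review cites for most readers.
     static final int MAX_RECORDS_PER_BATCH = 4096;

     private final BufferedReader reader;
     private final int maxRecords;

     BatchedLineReader(BufferedReader reader, int maxRecords) {
       this.reader = reader;
       this.maxRecords = maxRecords;
     }

     // Returns up to maxRecords trimmed, non-empty lines. The size check runs
     // before readLine(), so no line is consumed and then dropped. An empty
     // list means the input is exhausted.
     List<String> nextBatch() throws IOException {
       List<String> batch = new ArrayList<>();
       String line;
       while (batch.size() < maxRecords && (line = reader.readLine()) != null) {
         line = line.trim();
         if (line.isEmpty()) {
           continue; // skip blank lines
         }
         batch.add(line);
       }
       return batch;
     }
   }
   ```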



[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-01-16 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG 
(RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r247707338
 
 

 ##
 File path: 
contrib/format-syslog/src/main/java/org/apache/drill/exec/store/syslog/SyslogFormatConfig.java
 ##
 @@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.syslog;
+
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Objects;
+import org.apache.drill.common.logical.FormatPluginConfig;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.ArrayList;
+
+@JsonTypeName("syslog")
+public class SyslogFormatConfig implements FormatPluginConfig {
+
+  public List<String> extensions;
+  public int maxErrors = 10;
+  public boolean flattenStructuredData = false;
 
 Review comment:
   `false` is the default value. Is it necessary to specify it explicitly?
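   Java gives every field a default value when it has no initializer, which is why the explicit `= false` is redundant. A minimal sketch with a hypothetical `Config` class whose fields mirror the ones under review:

```java
import java.util.List;

public class FieldDefaultsSketch {

  // Hypothetical stand-in for SyslogFormatConfig's fields.
  public static class Config {
    int errorCount;                 // no initializer: int defaults to 0
    boolean flattenStructuredData;  // no initializer: boolean defaults to false
    List<String> extensions;        // no initializer: references default to null
    int maxErrors = 10;             // a non-default value still needs an initializer
  }

  public static void main(String[] args) {
    Config c = new Config();
    // prints "0 false null 10"
    System.out.println(c.errorCount + " " + c.flattenStructuredData + " "
        + c.extensions + " " + c.maxErrors);
  }
}
```

   These defaults apply only to fields; local variables must be assigned before they are read.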


[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-01-16 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r244821981
 
 

 ##
 File path: contrib/format-syslog/src/main/java/org/apache/drill/exec/store/syslog/SyslogFormatConfig.java
 ##
 @@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.syslog;
+
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Objects;
+import org.apache.drill.common.logical.FormatPluginConfig;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.ArrayList;
+
+@JsonTypeName("syslog")
+public class SyslogFormatConfig implements FormatPluginConfig {
+
+  public List<String> extensions;
+  public int maxErrors = 10;
+  public boolean flattenStructuredData = false;
+
+  public boolean getFlattenStructuredData() {
+    return flattenStructuredData;
+  }
+
+  public int getMaxErrors() {
+    return maxErrors;
+  }
+
+  public List<String> getExtensions() {
+    return extensions;
+  }
+
+  public void setExtensions(List<String> ext) {
+    this.extensions = ext;
+  }
+
+  public void setExtension(String ext) {
+    if (this.extensions == null) {
+      this.extensions = new ArrayList<String>();
+    }
+    this.extensions.add(ext);
+  }
+
+  public void setMaxErrors(int errors) {
+    this.maxErrors = errors;
+  }
+
+  public void setFlattenStructuredData(boolean flattenErrors) {
+    this.flattenStructuredData = flattenErrors;
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+    if (this == obj) {
+      return true;
+    }
+    if (obj == null || getClass() != obj.getClass()) {
+      return false;
+    }
+    SyslogFormatConfig other = (SyslogFormatConfig) obj;
+    return Objects.equal(extensions, other.extensions) &&
+        Objects.equal(maxErrors, other.maxErrors) &&
+        Objects.equal(flattenStructuredData, other.flattenStructuredData);
+  }
+
+  @Override
+  public int hashCode() {
+    return Arrays.hashCode(new Object[]{maxErrors, flattenStructuredData, extensions});
+  }
+
+}
 
 Review comment:
   Please add a newline at the end of the file, here and in the other places.
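   Git flags such files with `\ No newline at end of file` in diffs, and many editors can add the final newline on save. As an alternative, a small helper like the sketch below could find and fix them before pushing; `TrailingNewlineCheck` and its methods are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class TrailingNewlineCheck {

  // True if the content is empty or its last byte is '\n'.
  static boolean endsWithNewline(byte[] content) {
    return content.length == 0 || content[content.length - 1] == '\n';
  }

  // Appends one '\n' when a non-empty file lacks a trailing newline.
  // Returns true if the file was modified.
  static boolean ensureTrailingNewline(Path file) {
    try {
      byte[] content = Files.readAllBytes(file);
      if (endsWithNewline(content)) {
        return false;
      }
      Files.write(file, "\n".getBytes(StandardCharsets.UTF_8), StandardOpenOption.APPEND);
      return true;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Usage: java TrailingNewlineCheck SyslogFormatConfig.java pom.xml ...
    for (String arg : args) {
      if (ensureTrailingNewline(Paths.get(arg))) {
        System.out.println("Added trailing newline: " + arg);
      }
    }
  }
}
```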


[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-01-16 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r248420763
 
 

 ##
 File path: contrib/format-syslog/src/main/java/org/apache/drill/exec/store/syslog/SyslogRecordReader.java
 ##
 @@ -0,0 +1,395 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.syslog;
+
+import com.google.common.base.Charsets;
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.store.AbstractRecordReader;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.easy.FileWork;
+import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+import org.apache.hadoop.fs.Path;
+import org.realityforge.jsyslog.message.StructuredDataParameter;
+import org.realityforge.jsyslog.message.SyslogMessage;
+
+import java.io.BufferedReader;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.text.SimpleDateFormat;
+import java.util.List;
+import java.util.Map;
+import java.util.Iterator;
+
+public class SyslogRecordReader extends AbstractRecordReader {
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(SyslogRecordReader.class);
+  private static final int MAX_RECORDS_PER_BATCH = 8096;
+
+  private final DrillFileSystem fileSystem;
+  private final FileWork fileWork;
+  private final String userName;
+  private BufferedReader reader;
+  private DrillBuf buffer;
+  private VectorContainerWriter writer;
+  private SyslogFormatConfig config;
+  private int maxErrors;
+  private boolean flattenStructuredData;
+  private int errorCount;
+  private int lineCount;
+  private List<SchemaPath> projectedColumns;
+  private String line;
+
+  private SimpleDateFormat df;
+
+
+  public SyslogRecordReader(FragmentContext context,
+                            DrillFileSystem fileSystem,
+                            FileWork fileWork,
+                            List<SchemaPath> columns,
+                            String userName,
+                            SyslogFormatConfig config) throws OutOfMemoryException {
+
+    this.fileSystem = fileSystem;
+    this.fileWork = fileWork;
+    this.userName = userName;
+    this.config = config;
+    this.maxErrors = config.getMaxErrors();
+    this.df = getValidDateObject("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
+    this.errorCount = 0;
 
 Review comment:
   `0` is the default value. Is there any benefit to specifying it explicitly?
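   Separately from the `errorCount` question, the constructor passes a date pattern to `getValidDateObject`, whose body is not part of this excerpt. Assuming the intended pattern is `yyyy-MM-dd'T'HH:mm:ss.SSS'Z'` and that the helper wraps a `SimpleDateFormat`, the sketch below shows what that pattern accepts, next to a `java.time` alternative. `SyslogTimestampSketch` and its methods are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.OffsetDateTime;
import java.util.TimeZone;

public class SyslogTimestampSketch {

  // Assumed pattern. The quoted 'Z' is a literal, so only timestamps ending
  // in "Z" match, and the fraction is read as a whole number of milliseconds.
  static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'";

  static long parseWithSimpleDateFormat(String ts) {
    SimpleDateFormat df = new SimpleDateFormat(PATTERN);
    // The literal 'Z' carries no zone information, so UTC must be set
    // explicitly; otherwise the JVM's default time zone would be applied.
    df.setTimeZone(TimeZone.getTimeZone("UTC"));
    try {
      return df.parse(ts).getTime();
    } catch (ParseException e) {
      throw new IllegalArgumentException("Unparseable timestamp: " + ts, e);
    }
  }

  // java.time alternative: ISO-8601 offset parsing accepts variable-length
  // fractions and numeric offsets such as "-07:00".
  static long parseWithJavaTime(String ts) {
    return OffsetDateTime.parse(ts).toInstant().toEpochMilli();
  }

  public static void main(String[] args) {
    System.out.println(parseWithSimpleDateFormat("2003-10-11T22:14:15.003Z"));
    System.out.println(parseWithJavaTime("2003-10-11T22:14:15.003Z"));
    System.out.println(parseWithJavaTime("2003-08-24T05:14:15.000003-07:00"));
  }
}
```

   RFC 5424 allows up to six fractional digits and numeric UTC offsets. With the `SimpleDateFormat` pattern, a value such as `1985-04-12T23:20:50.52Z` may be misread and one ending in `-07:00` is rejected, whereas `OffsetDateTime.parse` handles both.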


[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-01-16 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r244850744
 
 

 ##
 File path: contrib/format-syslog/README.md
 ##
 @@ -0,0 +1,41 @@
+# Syslog Format Plugin
 
 Review comment:
   Good documentation.
   
   Some notes regarding your previous Logfile plugin (DRILL-6104):
   * Could you please clarify the plugin type in the header of `/exec/java-exec/src/main/java/org/apache/drill/exec/store/log/README.md`:
   `# Drill Regex/Logfile Plugin` -> `# Drill Regex/Logfile Format Plugin`
   * Should `LogFormatPlugin` also be moved to the `contrib` package, like `SyslogFormatPlugin`?
   
   No changes need to be made in this PR. If you agree with the notes above, please just create a new Jira for them.


[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin

2019-01-16 Thread GitBox
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r236317105
 
 

 ##
 File path: contrib/format-syslog/pom.xml
 ##
 @@ -0,0 +1,88 @@
+
+
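+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+<modelVersion>4.0.0</modelVersion>
+
+<parent>
+<artifactId>drill-contrib-parent</artifactId>
+<groupId>org.apache.drill.contrib</groupId>
+<version>1.15.0-SNAPSHOT</version>
+</parent>
+
+<artifactId>drill-format-syslog</artifactId>
+<name>contrib/format-syslog</name>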
+
+
+
 
 Review comment:
   The Drill project usually uses a two-space indent for XML files.
   You can import the [formatter configuration](https://drill.apache.org/docs/apache-drill-contribution-guidelines/#formatter-configuration) into your IDE to apply these settings.

