[jira] [Created] (DRILL-7731) JDBC Storage Plugin for salesforce

2020-05-04 Thread Mohammed Zeeshan (Jira)
Mohammed Zeeshan created DRILL-7731:
---

 Summary: JDBC Storage Plugin for salesforce 
 Key: DRILL-7731
 URL: https://issues.apache.org/jira/browse/DRILL-7731
 Project: Apache Drill
  Issue Type: Task
  Components: Storage - Other
Affects Versions: 1.17.0
Reporter: Mohammed Zeeshan


Team,
 
I have a question about creating a storage plugin for Salesforce with JDBC.
 
I installed the necessary JDBC driver and tried the configuration below:
{
  "type": "jdbc",
  "driver": "oracle.jdbc.driver.OracleDriver",
  "url": 
"[jdbc:oracle:thin:@login.salesforce.com|mailto:jdbc%3aoracle%3athin...@login.salesforce.com];,
  "username": "XXXMyUserXXX",
  "password": "XXXMyPasswordXXX",
  "enabled": true
}
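For reference, a Drill JDBC storage configuration generally takes the shape sketched below. Note that the Oracle thin driver only speaks to Oracle databases, so connecting to Salesforce would need a Salesforce-capable JDBC driver; the driver class and URL here are placeholders, not a verified Salesforce setup.

```json
{
  "type": "jdbc",
  "driver": "com.example.salesforce.jdbc.Driver",
  "url": "jdbc:salesforce:login.salesforce.com",
  "username": "user",
  "password": "secret",
  "enabled": true
}
```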
 
But I eventually end up with an error. I have followed the documentation, but unfortunately had no luck:
 
 *Please retry: error (unable to create/ update storage)* 
 
Could you please help me find the missing piece?
 
Best Wishes,
Mohammed Zeeshan



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [drill] paul-rogers opened a new pull request #2076: DRILL-7729: Use java.time in column accessors

2020-05-04 Thread GitBox


paul-rogers opened a new pull request #2076:
URL: https://github.com/apache/drill/pull/2076


   # [DRILL-7729](https://issues.apache.org/jira/browse/DRILL-7729): Use java.time in column accessors
   
   
   ## Description
   
   Uses `java.time` classes in the column accessors. Leaves Joda-Time for `Interval`, as it has no `java.time` equivalent.
   
   This change allows the column accessors to work with Drill's JSON writer 
implementation. This PR includes a new `JsonWriter` based on a `RowSetReader`.
   
   ## Documentation
   
   The change is transparent to users **except** in one particular case: when the "provided schema" feature supplies a format for a `DATE`, `TIME`, or `TIMESTAMP` column. The `java.time` format patterns are slightly different from their Joda predecessors.
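As an illustration of the kind of pattern difference involved (a generic `java.time` example, not code from this PR): Joda-Time's `Y` means year-of-era, while in `java.time`'s `DateTimeFormatter` it means week-based year, so a pattern such as `YYYY-MM-dd` that looked harmless under Joda can silently shift dates near year boundaries:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class PatternDiff {
  public static void main(String[] args) {
    // Dec 29, 2014 falls in week 1 of week-based year 2015.
    LocalDate d = LocalDate.of(2014, 12, 29);

    // 'y' (year-of-era) prints the calendar year: 2014-12-29
    System.out.println(d.format(DateTimeFormatter.ofPattern("yyyy-MM-dd", Locale.US)));

    // 'Y' is the week-based year in java.time: 2015-12-29
    System.out.println(d.format(DateTimeFormatter.ofPattern("YYYY-MM-dd", Locale.US)));
  }
}
```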
   
   ## Testing
   
   Modified all tests which used Joda formats. Reran the full unit test suite.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] cgivre commented on a change in pull request #2067: DRILL-7716: Create Format Plugin for SPSS Files

2020-05-04 Thread GitBox


cgivre commented on a change in pull request #2067:
URL: https://github.com/apache/drill/pull/2067#discussion_r419816763



##
File path: contrib/format-spss/README.md
##
@@ -0,0 +1,83 @@
+# Format Plugin for SPSS (SAV) Files
+This format plugin enables Apache Drill to read and query Statistical Package for the Social Sciences (SPSS) (or Statistical Product and Service Solutions) data files. According to Wikipedia: (https://en.wikipedia.org/wiki/SPSS)
+ ***
+ SPSS is a widely used program for statistical analysis in social science. It is also used by market researchers, health researchers, survey companies, government, education researchers, marketing organizations, data miners, and others. The original SPSS manual (Nie, Bent & Hull, 1970) has been described as one of "sociology's most influential books" for allowing ordinary researchers to do their own statistical analysis. In addition to statistical analysis, data management (case selection, file reshaping, creating derived data) and data documentation (a metadata dictionary is stored in the datafile) are features of the base software.
+ ***
+
+## Configuration
+To configure Drill to read SPSS files, simply add the following code to the formats section of your file-based storage plugin.  This should happen automatically for the default
Review comment:
   My thought is that it's better to add it in the bootstrap so that people know it's there. Just my $0.02...
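For context, registering the format in the bootstrap configuration means shipping a `formats` entry roughly like the sketch below; the type name and extension are assumptions based on the plugin's README, not copied from the PR:

```json
"formats": {
  "spss": {
    "type": "spss",
    "extensions": ["sav"]
  }
}
```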









[GitHub] [drill] cgivre commented on pull request #2067: DRILL-7716: Create Format Plugin for SPSS Files

2020-05-04 Thread GitBox


cgivre commented on pull request #2067:
URL: https://github.com/apache/drill/pull/2067#issuecomment-623789077


   @paul-rogers 
   Thanks for the review.  I believe I addressed all your comments. 







[GitHub] [drill] cgivre commented on a change in pull request #2067: DRILL-7716: Create Format Plugin for SPSS Files

2020-05-04 Thread GitBox


cgivre commented on a change in pull request #2067:
URL: https://github.com/apache/drill/pull/2067#discussion_r419816168



##
File path: contrib/format-spss/src/main/java/org/apache/drill/exec/store/spss/SpssBatchReader.java
##
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.spss;
+
+import com.bedatadriven.spss.SpssDataFileReader;
+import com.bedatadriven.spss.SpssVariable;
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.hadoop.mapred.FileSplit;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+public class SpssBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private static final Logger logger = LoggerFactory.getLogger(SpssBatchReader.class);
+
+  private static final String VALUE_LABEL = "_value";
+
+  private FileSplit split;
+
+  private InputStream fsStream;
+
+  private SpssDataFileReader spssReader;
+
+  private RowSetLoader rowWriter;
+
+  private List<SpssVariable> variableList;
+
+  private List<SpssColumnWriter> writerList;
+
+  private CustomErrorContext errorContext;
+
+
+  public static class SpssReaderConfig {
+
+    protected final SpssFormatPlugin plugin;
+
+    public SpssReaderConfig(SpssFormatPlugin plugin) {
+      this.plugin = plugin;
+    }
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    split = negotiator.split();
+    openFile(negotiator);
+    negotiator.tableSchema(buildSchema(), true);
+    errorContext = negotiator.parentErrorContext();
+    ResultSetLoader loader = negotiator.build();
+    rowWriter = loader.writer();
+    buildReaderList();
+
+    return true;
+  }
+
+  @Override
+  public boolean next() {
+    while (!rowWriter.isFull()) {
+      if (!processNextRow()) {
+        return false;
+      }
+    }
+    return true;
+  }
+
+  @Override
+  public void close() {
+    if (fsStream != null) {
+      try {
+        fsStream.close();
+      } catch (IOException e) {
+        logger.warn("Error when closing SPSS File Stream resource: {}", e.getMessage());
+      }
+      fsStream = null;
+    }
+  }
+
+  private void openFile(FileSchemaNegotiator negotiator) {
+    try {
+      fsStream = negotiator.fileSystem().openPossiblyCompressedStream(split.getPath());
+      spssReader = new SpssDataFileReader(fsStream);
+    } catch (IOException e) {
+      throw UserException
+        .dataReadError(e)
+        .message("Unable to open SPSS File %s", split.getPath())
+        .addContext(e.getMessage())
+        .addContext(errorContext)
+        .build(logger);
+    }
+  }
+
+  private boolean processNextRow() {
+    try {
+      // Stop reading when you run out of data
+      if (!spssReader.readNextCase()) {
+        return false;
+      }
+
+      rowWriter.start();
+      for (SpssColumnWriter spssColumnWriter : writerList) {
+        spssColumnWriter.load(spssReader);
+      }
+      rowWriter.save();
+
+    } catch (IOException e) {
+      throw UserException
+        .dataReadError(e)
+        .message("Error reading SPSS File.")
+        .addContext(e.getMessage())
+        .addContext(errorContext)
+        .build(logger);
+    }
+    return true;
+  }
+
+  private TupleMetadata buildSchema() {
+    SchemaBuilder builder = new SchemaBuilder();
+    variableList = spssReader.getVariables();
+
+    for (SpssVariable variable : variableList) {
+      String varName = variable.getVariableName();
+
+      if (variable.isNumeric()) {
+        builder.addNullable(varName, 

[GitHub] [drill] cgivre commented on a change in pull request #2067: DRILL-7716: Create Format Plugin for SPSS Files

2020-05-04 Thread GitBox


cgivre commented on a change in pull request #2067:
URL: https://github.com/apache/drill/pull/2067#discussion_r419814031



##
File path: contrib/format-spss/src/main/java/org/apache/drill/exec/store/spss/SpssBatchReader.java
##

[GitHub] [drill] cgivre commented on a change in pull request #2067: DRILL-7716: Create Format Plugin for SPSS Files

2020-05-04 Thread GitBox


cgivre commented on a change in pull request #2067:
URL: https://github.com/apache/drill/pull/2067#discussion_r419813495



##
File path: contrib/format-spss/src/main/java/org/apache/drill/exec/store/spss/SpssBatchReader.java
##
+    } catch (IOException e) {
+      throw UserException
+        .dataReadError(e)
+        .message("Error reading SPSS File.")
+        .addContext(e.getMessage())

Review comment:
   I think this is what you meant here. 

##
File path: contrib/format-spss/src/main/java/org/apache/drill/exec/store/spss/SpssBatchReader.java
##

[GitHub] [drill] cgivre commented on a change in pull request #2067: DRILL-7716: Create Format Plugin for SPSS Files

2020-05-04 Thread GitBox


cgivre commented on a change in pull request #2067:
URL: https://github.com/apache/drill/pull/2067#discussion_r419812878



##
File path: contrib/format-spss/src/main/java/org/apache/drill/exec/store/spss/SpssBatchReader.java
##

[GitHub] [drill] cgivre commented on a change in pull request #2067: DRILL-7716: Create Format Plugin for SPSS Files

2020-05-04 Thread GitBox


cgivre commented on a change in pull request #2067:
URL: https://github.com/apache/drill/pull/2067#discussion_r419812810



##
File path: contrib/format-spss/src/main/java/org/apache/drill/exec/store/spss/SpssBatchReader.java
##

[GitHub] [drill] cgivre commented on a change in pull request #2067: DRILL-7716: Create Format Plugin for SPSS Files

2020-05-04 Thread GitBox


cgivre commented on a change in pull request #2067:
URL: https://github.com/apache/drill/pull/2067#discussion_r419812606



##
File path: contrib/format-spss/src/main/java/org/apache/drill/exec/store/spss/SpssBatchReader.java
##
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.spss;
+
+import com.bedatadriven.spss.SpssDataFileReader;
+import com.bedatadriven.spss.SpssVariable;
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import 
org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.hadoop.mapred.FileSplit;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+public class SpssBatchReader implements ManagedReader {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(SpssBatchReader.class);
+
+  private static final String VALUE_LABEL = "_value";
+
+  private FileSplit split;
+
+  private InputStream fsStream;
+
+  private SpssDataFileReader spssReader;
+
+  private RowSetLoader rowWriter;
+
+  private List variableList;
+
+  private List writerList;
+
+  private CustomErrorContext errorContext;
+
+
+  public static class SpssReaderConfig {
+
+protected final SpssFormatPlugin plugin;
+
+public SpssReaderConfig(SpssFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+split = negotiator.split();
+openFile(negotiator);
+negotiator.tableSchema(buildSchema(), true);
+errorContext = negotiator.parentErrorContext();
+ResultSetLoader loader = negotiator.build();
+rowWriter = loader.writer();
+buildReaderList();
+
+return true;
+  }
+
+  @Override
+  public boolean next() {
+while (!rowWriter.isFull()) {
+  if (!processNextRow()) {
+return false;
+  }
+}
+return true;
+  }
+
+  @Override
+  public void close() {
+if (fsStream != null) {
+  try {
+fsStream.close();
+  } catch (IOException e) {
+logger.warn("Error when closing SPSS File Stream resource: {}", e.getMessage());
+  }
+  fsStream = null;
+}
+  }
+
+  private void openFile(FileSchemaNegotiator negotiator) {
+try {
+      fsStream = negotiator.fileSystem().openPossiblyCompressedStream(split.getPath());
+  spssReader = new SpssDataFileReader(fsStream);
+} catch (IOException e) {
+  throw UserException
+.dataReadError(e)
+.message("Unable to open SPSS File %s", split.getPath())
+.addContext(e.getMessage())
+.addContext(errorContext)
+.build(logger);
+}
+  }
+
+  private boolean processNextRow() {
+try {
+  // Stop reading when you run out of data
+  if (!spssReader.readNextCase()) {
+return false;
+  }
+
+  rowWriter.start();
+  for (SpssColumnWriter spssColumnWriter : writerList) {
+spssColumnWriter.load(spssReader);
+  }
+  rowWriter.save();
+
+} catch (IOException e) {
+  throw UserException
+.dataReadError(e)
+.message("Error reading SPSS File.")
+.addContext(e.getMessage())
+.addContext(errorContext)
+.build(logger);
+}
+return true;
+  }
+
+  private TupleMetadata buildSchema() {
+SchemaBuilder builder = new SchemaBuilder();
+variableList = spssReader.getVariables();
+
+for (SpssVariable variable : variableList) {
+  String varName = variable.getVariableName();
+
+  if (variable.isNumeric()) {
+builder.addNullable(varName, 

[GitHub] [drill] cgivre commented on a change in pull request #2067: DRILL-7716: Create Format Plugin for SPSS Files

2020-05-04 Thread GitBox


cgivre commented on a change in pull request #2067:
URL: https://github.com/apache/drill/pull/2067#discussion_r419812106



##
File path: contrib/format-spss/pom.xml
##
@@ -0,0 +1,88 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <artifactId>drill-contrib-parent</artifactId>
+    <groupId>org.apache.drill.contrib</groupId>
+    <version>1.18.0-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>drill-format-spss</artifactId>
+  <name>contrib/format-spss</name>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.drill.exec</groupId>
+      <artifactId>drill-java-exec</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>com.bedatadriven.spss</groupId>
+      <artifactId>spss-reader</artifactId>
+      <version>1.3</version>
+    </dependency>
+
+    <!-- Test dependencies -->
+    <dependency>
+      <groupId>org.apache.drill.exec</groupId>
+      <artifactId>drill-java-exec</artifactId>
+      <classifier>tests</classifier>
+      <version>${project.version}</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.drill</groupId>
+      <artifactId>drill-common</artifactId>
+      <classifier>tests</classifier>
+      <version>${project.version}</version>
+      <scope>test</scope>
+    </dependency>
+  </dependencies>
+
+  <build>
+    <plugins>
+      <plugin>
+        <artifactId>maven-resources-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>copy-java-sources</id>
+            <phase>process-sources</phase>
+            <goals>
+              <goal>copy-resources</goal>
+            </goals>
+            <configuration>
+              <outputDirectory>${basedir}/target/classes/org/apache/drill/exec/store/syslog</outputDirectory>

Review comment:
   Oops ... Fixed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] cgivre commented on a change in pull request #2067: DRILL-7716: Create Format Plugin for SPSS Files

2020-05-04 Thread GitBox


cgivre commented on a change in pull request #2067:
URL: https://github.com/apache/drill/pull/2067#discussion_r419811899



##
File path: contrib/format-spss/README.md
##
@@ -0,0 +1,83 @@
+# Format Plugin for SPSS (SAV) Files
+This format plugin enables Apache Drill to read and query Statistical Package 
for the Social Sciences (SPSS) (or Statistical Product and Service Solutions) 
data files. According
+ to Wikipedia: (https://en.wikipedia.org/wiki/SPSS)
+ ***
+ SPSS is a widely used program for statistical analysis in social science. It 
is also used by market researchers, health researchers, survey companies, 
government, education researchers, marketing organizations, data miners, and 
others. The original SPSS manual (Nie, Bent & Hull, 1970) has been described as 
one of "sociology's most influential books" for allowing ordinary researchers 
to do their own statistical analysis. In addition to statistical analysis, data 
management (case selection, file reshaping, creating derived data) and data 
documentation (a metadata dictionary is stored in the datafile) are features of 
the base software.

Review comment:
   Fixed









[GitHub] [drill] paul-rogers opened a new pull request #2075: DRILL-7730: Improve web query efficiency

2020-05-04 Thread GitBox


paul-rogers opened a new pull request #2075:
URL: https://github.com/apache/drill/pull/2075


   # [DRILL-7730](https://issues.apache.org/jira/browse/DRILL-7730): Improve web query efficiency
   
   ## Description
   
   Drill provides a REST API to run queries: `http://<host>:8047/query` and `/query.json`. This PR improves the memory efficiency of these queries.
   
   Drill runs queries as a DAG of operators, rooted on the "Screen" operator. 
The Screen operator takes each output batch of the query and hands it over to a 
`UserClientConnection` object. The original design is that 
`UserClientConnection` corresponded to an RPC connection. So, the Screen 
operator converted the vectors in the outgoing batch into a 
`QueryWritableBatch` which is an ordered list of buffers ready to send via 
Netty.
   
   When the REST API was added, the simplest thing was to add a new 
REST-specific version of `UserClientConnection`, called `WebUserConnection`. 
Rather than sending our list of buffers off to the network, the web version 
converts the buffers back into a set of value vectors using the same 
deserialization code used in the Drill client. However, that deserialization 
code needs the data in the form of a single large buffer. So, the REST code 
copies the entire batch from the list of buffers into one large direct memory 
buffer. Then it converts that back into vectors.
   
   Clearly, all this work simply gets us back where we started: the Screen 
operator has a batch of vectors, the `WebUserConnection` recreates them, 
consuming lots of memory and CPU in the process. All of this work occurs in the 
query thread (not the REST request thread), making the query more costly than 
necessary.
   
   So, the major part of this PR is to avoid the copy: allow the REST code to 
work with the batch given to Screen.
   
   This is done by creating a new level of indirection, the `QueryDataPackage` 
class. Now, Screen simply wraps the outgoing batch of vectors in a data package 
and hands that off to the `UserClientConnection`. The RPC version calls a 
method which does the conversion from vectors into a list of buffers. But, the 
REST version calls a different method which returns the original batch of 
vectors. Voila, no more copying and no more extra direct memory overhead.
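
   The shape of that indirection can be sketched in isolation. Everything below other than the name `QueryDataPackage` is invented for illustration; it is not Drill's actual API:

```java
import java.util.List;

// Sketch of the indirection described above: Screen wraps the outgoing batch
// in a package, and each consumer pulls only the representation it needs,
// so the REST path never serializes and re-deserializes the vectors.
public class QueryDataPackageSketch {

  // Stand-in for a batch of value vectors (invented for this sketch).
  static final class VectorBatch {
    final List<String> columns;
    final int rowCount;
    VectorBatch(List<String> columns, int rowCount) {
      this.columns = columns;
      this.rowCount = rowCount;
    }
  }

  interface QueryDataPackage {
    VectorBatch batch();           // REST path: use the vectors directly
    List<byte[]> toWireBuffers();  // RPC path: serialize for Netty
  }

  static QueryDataPackage wrap(VectorBatch batch) {
    return new QueryDataPackage() {
      @Override public VectorBatch batch() { return batch; }
      @Override public List<byte[]> toWireBuffers() {
        // Serialization cost is paid only when the RPC path asks for it.
        return List.of(new byte[batch.rowCount]);
      }
    };
  }

  public static void main(String[] args) {
    QueryDataPackage pkg = wrap(new VectorBatch(List.of("a", "b"), 3));
    System.out.println(pkg.batch().rowCount);       // prints 3; no copy made
    System.out.println(pkg.toWireBuffers().size()); // prints 1
  }
}
```

   The point of the design is that each consumer chooses its representation lazily: the RPC path pays the serialization cost, while the REST path reads the batch handed to Screen as-is.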
   
   The `WebUserConnection` uses the vectors to create three on-heap structures: 
a list of column names, a list of column types, and a list of maps of rows. The 
rows are particularly inefficient and will be addressed in a separate PR. As it 
turns out, the code that handled the column and metadata list had a bug: every 
incoming batch of data would append another copy to the in-memory list, 
resulting in many redundant objects. That bug is fixed in this PR.
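
   That append bug boils down to a per-batch repetition of initialization that should happen once per query. A minimal standalone illustration (names invented, not the actual Drill code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrates the bug described above: appending column metadata once per
// incoming batch instead of once per query leaves N redundant copies of the
// schema after N batches.
public class MetadataAppendBug {

  static List<String> buggy(List<String> schema, int batches) {
    List<String> columns = new ArrayList<>();
    for (int i = 0; i < batches; i++) {
      columns.addAll(schema);          // bug: runs for every batch
    }
    return columns;
  }

  static List<String> fixed(List<String> schema, int batches) {
    List<String> columns = new ArrayList<>();
    for (int i = 0; i < batches; i++) {
      if (columns.isEmpty()) {         // fix: capture the schema only once
        columns.addAll(schema);
      }
    }
    return columns;
  }

  public static void main(String[] args) {
    List<String> schema = List.of("name", "age");
    System.out.println(buggy(schema, 3).size());  // prints 6 (redundant)
    System.out.println(fixed(schema, 3).size());  // prints 2
  }
}
```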
   
   The work to understand all this resulted in a "grand tour" of parts of Drill. 
Much code cleanup resulted. Also, `WebUserConnection` is split into two classes 
as part of the next phase (removing the on-heap buffered results).
   
   ## Documentation
   
   N/A: the user-visible behavior of Drill is unchanged (though REST queries 
might be a bit faster).
   
   ## Testing
   
   Reran all unit tests. Though, to be fair, the test suite includes basically 
no tests of the REST API. The test run instead ensured that nothing was broken 
in the main RPC pathway.
   







[GitHub] [drill] cgivre commented on a change in pull request #2067: DRILL-7716: Create Format Plugin for SPSS Files

2020-05-04 Thread GitBox


cgivre commented on a change in pull request #2067:
URL: https://github.com/apache/drill/pull/2067#discussion_r419668523



##
File path: pom.xml
##
@@ -359,6 +359,7 @@
 <exclude>**/*.pcap</exclude>
 <exclude>**/*.log1</exclude>
 <exclude>**/*.log2</exclude>
+<exclude>**/*.sav</exclude>

Review comment:
   Done.









[GitHub] [drill] vvysotskyi commented on a change in pull request #2072: DRILL-7724: Refactor metadata controller batch

2020-05-04 Thread GitBox


vvysotskyi commented on a change in pull request #2072:
URL: https://github.com/apache/drill/pull/2072#discussion_r419664485



##
File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/metadata/MetadataControllerBatch.java
##
@@ -127,113 +126,93 @@ protected MetadataControllerBatch(MetadataControllerPOP 
popConfig,
 ? null
 : popConfig.getContext().metadataToHandle().stream()
 .collect(Collectors.toMap(MetadataInfo::identifier, 
Function.identity()));
-this.metadataUnits = new ArrayList<>();
-this.statisticsCollector = new StatisticsCollectorImpl();
 this.columnNamesOptions = new ColumnNamesOptions(context.getOptions());
   }
 
-  protected boolean setupNewSchema() {
-container.clear();
-container.addOrGet(MetastoreAnalyzeConstants.OK_FIELD_NAME, 
Types.required(TypeProtos.MinorType.BIT), null);
-container.addOrGet(MetastoreAnalyzeConstants.SUMMARY_FIELD_NAME, 
Types.required(TypeProtos.MinorType.VARCHAR), null);
-container.buildSchema(BatchSchema.SelectionVectorMode.NONE);
-container.setEmpty();
-return true;
-  }
-
   @Override
   public IterOutcome innerNext() {
-IterOutcome outcome;
-boolean finishedLeft;
-if (finished) {
-  return IterOutcome.NONE;
-}
+while (state != State.FINISHED) {
+  switch (state) {
+case RIGHT: {
 
-if (!finishedRight) {
-  outcome = handleRightIncoming();
-  if (outcome != null) {
-return outcome;
+  // Can only return NOT_YET
+  IterOutcome outcome = handleRightIncoming();
+  if (outcome != null) {
+return outcome;
+  }
+  break;
+}
+case LEFT: {
+
+  // Can only return NOT_YET
+  IterOutcome outcome = handleLeftIncoming();
+  if (outcome != null) {
+return outcome;
+  }
+  break;
+}
+case WRITE:
+  writeToMetastore();
+  createSummary();
+  state = State.FINISHED;
+  return IterOutcome.OK_NEW_SCHEMA;
+
+case FINISHED:
+  break;
+
+default:
+  throw new IllegalStateException(state.name());
   }
 }
+return IterOutcome.NONE;
+  }
 
+  private IterOutcome handleRightIncoming() {
 outer:
-while (true) {
-  outcome = next(0, left);
+for (;;) {

Review comment:
   Thanks for adding it to the Jira! Some time ago I looked into how 
we can apply separate checkstyle rules without enforcing other rules we 
don't want to touch, but tools like https://github.com/diffplug/spotless or 
IDEA autoformatting update the code using the complete list of rules.









[GitHub] [drill] vvysotskyi commented on a change in pull request #2067: DRILL-7716: Create Format Plugin for SPSS Files

2020-05-04 Thread GitBox


vvysotskyi commented on a change in pull request #2067:
URL: https://github.com/apache/drill/pull/2067#discussion_r419660268



##
File path: pom.xml
##
@@ -359,6 +359,7 @@
 <exclude>**/*.pcap</exclude>
 <exclude>**/*.log1</exclude>
 <exclude>**/*.log2</exclude>
+<exclude>**/*.sav</exclude>

Review comment:
   Please also update the exclusion list for `license-maven-plugin`.









[GitHub] [drill] vvysotskyi commented on a change in pull request #2071: DRILL-7727 Fix protobuf warning message

2020-05-04 Thread GitBox


vvysotskyi commented on a change in pull request #2071:
URL: https://github.com/apache/drill/pull/2071#discussion_r419654433



##
File path: contrib/native/client/CMakeLists.txt
##
@@ -25,6 +25,13 @@ cmake_policy(SET CMP0043 NEW)
 cmake_policy(SET CMP0048 NEW)
 enable_testing()
 
+#
+# required version for dependencies
+#
+set (BOOST_MINIMUM_VERSION 1.54.0)
+set (PROTOBUF_MINIMUM_VERSION 3.6.1)

Review comment:
   The protoc compiler is required to be installed when regenerating C++ 
protobuf files, not Java classes, which is why I pointed to those PRs in my 
previous comment.
   
   But this change doesn't make things worse, so it may be merged as it is.









[jira] [Created] (DRILL-7729) Use java.time in column accessors

2020-05-04 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7729:
--

 Summary: Use java.time in column accessors
 Key: DRILL-7729
 URL: https://issues.apache.org/jira/browse/DRILL-7729
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.17.0
Reporter: Paul Rogers
Assignee: Paul Rogers
 Fix For: 1.18.0


Use {{java.time}} classes in the column accessors, except for {{Interval}}, 
which has no {{java.time}} equivalent. Doing so allows us to create a row-set 
version of Drill's JSON writer.
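
The gap is easy to see with a minimal sketch (illustrative only): {{java.time}} splits interval-like values into a date-based {{Period}} and a time-based {{Duration}}, while Drill's {{INTERVAL}} carries both parts in a single value.

```java
import java.time.Duration;
import java.time.Period;

public class IntervalGapSketch {
  public static void main(String[] args) {
    // java.time splits interval-like values into two disjoint types:
    Period datePart = Period.of(1, 2, 3);                   // 1 year, 2 months, 3 days
    Duration timePart = Duration.ofHours(4).plusMinutes(5); // 4 hours, 5 minutes

    // No single java.time type carries both parts at once, which is why
    // the column accessors keep Joda time for Drill's INTERVAL type.
    System.out.println(datePart);  // P1Y2M3D
    System.out.println(timePart);  // PT4H5M
  }
}
```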



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [drill] paul-rogers commented on a change in pull request #2072: DRILL-7724: Refactor metadata controller batch

2020-05-04 Thread GitBox


paul-rogers commented on a change in pull request #2072:
URL: https://github.com/apache/drill/pull/2072#discussion_r419628787



##
File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/metadata/MetadataControllerBatch.java
##
@@ -127,113 +126,93 @@ protected MetadataControllerBatch(MetadataControllerPOP 
popConfig,
 ? null
 : popConfig.getContext().metadataToHandle().stream()
 .collect(Collectors.toMap(MetadataInfo::identifier, 
Function.identity()));
-this.metadataUnits = new ArrayList<>();
-this.statisticsCollector = new StatisticsCollectorImpl();
 this.columnNamesOptions = new ColumnNamesOptions(context.getOptions());
   }
 
-  protected boolean setupNewSchema() {
-container.clear();
-container.addOrGet(MetastoreAnalyzeConstants.OK_FIELD_NAME, 
Types.required(TypeProtos.MinorType.BIT), null);
-container.addOrGet(MetastoreAnalyzeConstants.SUMMARY_FIELD_NAME, 
Types.required(TypeProtos.MinorType.VARCHAR), null);
-container.buildSchema(BatchSchema.SelectionVectorMode.NONE);
-container.setEmpty();
-return true;
-  }
-
   @Override
   public IterOutcome innerNext() {
-IterOutcome outcome;
-boolean finishedLeft;
-if (finished) {
-  return IterOutcome.NONE;
-}
+while (state != State.FINISHED) {
+  switch (state) {
+case RIGHT: {
 
-if (!finishedRight) {
-  outcome = handleRightIncoming();
-  if (outcome != null) {
-return outcome;
+  // Can only return NOT_YET
+  IterOutcome outcome = handleRightIncoming();
+  if (outcome != null) {
+return outcome;
+  }
+  break;
+}
+case LEFT: {
+
+  // Can only return NOT_YET
+  IterOutcome outcome = handleLeftIncoming();
+  if (outcome != null) {
+return outcome;
+  }
+  break;
+}
+case WRITE:
+  writeToMetastore();
+  createSummary();
+  state = State.FINISHED;
+  return IterOutcome.OK_NEW_SCHEMA;
+
+case FINISHED:
+  break;
+
+default:
+  throw new IllegalStateException(state.name());
   }
 }
+return IterOutcome.NONE;
+  }
 
+  private IterOutcome handleRightIncoming() {
 outer:
-while (true) {
-  outcome = next(0, left);
+for (;;) {

Review comment:
   Thanks for the explanation! I've updated DRILL-7352, our roll-up of 
coding standards, with the `while (true)` preference.









[GitHub] [drill] paul-rogers commented on pull request #2068: DRILL-7717: Support Mongo extended types in V2 JSON loader

2020-05-04 Thread GitBox


paul-rogers commented on pull request #2068:
URL: https://github.com/apache/drill/pull/2068#issuecomment-623616523


   @vvysotskyi, thanks much for the review. Addressed your remaining comments. 







[GitHub] [drill] laurentgo commented on a change in pull request #2071: DRILL-7727 Fix protobuf warning message

2020-05-04 Thread GitBox


laurentgo commented on a change in pull request #2071:
URL: https://github.com/apache/drill/pull/2071#discussion_r419589392



##
File path: contrib/native/client/CMakeLists.txt
##
@@ -25,6 +25,13 @@ cmake_policy(SET CMP0043 NEW)
 cmake_policy(SET CMP0048 NEW)
 enable_testing()
 
+#
+# required version for dependencies
+#
+set (BOOST_MINIMUM_VERSION 1.54.0)
+set (PROTOBUF_MINIMUM_VERSION 3.6.1)

Review comment:
   Looks like regenerating with the correct version present on the user 
machine would be enough to fix the issue, but Drill has protobuf-generated 
classes merged into the code base instead of having them generated on the fly (not 
sure if people would be willing to revisit this? protoc compilers are now 
available in Maven Central, so you don't need to have one installed on the system 
to build the Java code).
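
   For reference, resolving protoc from Maven Central is usually wired up with the `os-maven-plugin` build extension plus the `protobuf-maven-plugin`. A sketch only — the version numbers are illustrative and this is not Drill's current build:

```xml
<build>
  <extensions>
    <!-- Detects the host OS so the right protoc binary classifier resolves -->
    <extension>
      <groupId>kr.motd.maven</groupId>
      <artifactId>os-maven-plugin</artifactId>
      <version>1.6.2</version>
    </extension>
  </extensions>
  <plugins>
    <plugin>
      <groupId>org.xolstice.maven.plugins</groupId>
      <artifactId>protobuf-maven-plugin</artifactId>
      <version>0.6.1</version>
      <configuration>
        <!-- protoc is downloaded from Maven Central; nothing installed locally -->
        <protocArtifact>com.google.protobuf:protoc:3.6.1:exe:${os.detected.classifier}</protocArtifact>
      </configuration>
      <executions>
        <execution>
          <goals>
            <goal>compile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```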









Re: compile issue with MapR repo is being worked

2020-05-04 Thread Ted Dunning
Forwarded.



On Mon, May 4, 2020 at 6:15 AM Charles Givre  wrote:

> HI Ted, Vova,
> My PR is still blocked by the MapR repos.  After reverting back to the
> HTTP repo (which does seem to be working) we're now getting the following
> error:
>
> [ERROR] Failed to execute goal on project drill-format-mapr: Could not
> resolve dependencies for project
> org.apache.drill.contrib:drill-format-mapr:jar:1.18.0-SNAPSHOT: Could not
> transfer artifact com.mapr.hadoop:maprfs:jar:6.1.0-mapr from/to
> mapr-releases (http://repository.mapr.com/maven/): GET request of:
> com/mapr/hadoop/maprfs/6.1.0-mapr/maprfs-6.1.0-mapr.jar from mapr-releases
> failed: Premature end of Content-Length delimited message body (expected:
> 67,884,262; received: 47,333,376) -> [Help 1]
> 3146
>  <
> https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3146>[ERROR]
>
> 3147
>  <
> https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3147>[ERROR]
> To see the full stack trace of the errors, re-run Maven with the -e switch.
> 3148
>  <
> https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3148>[ERROR]
> Re-run Maven using the -X switch to enable full debug logging.
> 3149
>  <
> https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3149>[ERROR]
>
> 3150
>  <
> https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3150>[ERROR]
> For more information about the errors and possible solutions, please read
> the following articles:
> 3151
>  <
> https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3151>[ERROR]
> [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
> 3152
>  <
> https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3152>[ERROR]
>
> 3153
>  <
> https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3153>[ERROR]
> After correcting the problems, you can resume the build with the command
> 3154
>  <
> https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3154>[ERROR]
>  mvn  -rf :drill-format-mapr
> 3155
>  <
> https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3155>##[error]Process
> completed with exit code 1.
>
> Thanks for your help on quickly addressing this issue.
> -- C
>
> > On May 4, 2020, at 2:48 AM, Vova Vysotskyi  wrote:
> >
> > Hi Ted,
> >
> > Thanks for your help! It looks like for http protocol this issue was
> > resolved.
> >
> > Kind regards,
> > Volodymyr Vysotskyi
> >
> >
> > On Mon, May 4, 2020 at 4:19 AM Charles Givre  wrote:
> >
> >> Hi Ted,
> >> Thanks for your help.  You can view the logs here:
> >> https://github.com/apache/drill/pull/2067 <
> >> https://github.com/apache/drill/pull/2067> in the CI stuff.
> >> -- C
> >>
> >>
> >>
> >>
> >>> On May 3, 2020, at 9:16 PM, Ted Dunning  wrote:
> >>>
> >>> I will pass the word.
> >>>
> >>> Do you have logs?
> >>>
> >>>
> >>> On Sun, May 3, 2020 at 4:15 PM Charles Givre  wrote:
> >>>
>  Hi Ted,
>  Thanks for looking into this so quickly.  Unfortunately, I re-ran the
> CI
>  jobs from github and it is still producing the same errors.
>  Best,
>  --C
> 
> > On May 3, 2020, at 5:58 PM, Ted Dunning 
> wrote:
> >
> > It appears that the certificate issue is resolved.
> >
> > Can somebody verify this by doing a compilation?
> >
> > I have to add that based on the number of off-line and on-list pings
> I
>  got
> > about this issue I can say that there were quite a few people
> compiling
> > Drill on a Sunday morning. That bodes well, I think, for community
>  health.
> >
> >
> >
> > On Sun, May 3, 2020 at 11:27 AM Ted Dunning 
>  wrote:
> >
> >>
> >> I just got word back that the team is looking at the issue.
> >>
> >> Not surprisingly, their first look indicates that the issue isn't
> what
>  it
> >> appears to be (i.e. not a bad cert)
> >>
> >>
> >>
> 
> 
> >>
> >>
>
>


Re: compile issue with MapR repo is being worked

2020-05-04 Thread Charles Givre
Hi Ted, Vova, 
My PR is still blocked by the MapR repos.  After reverting back to the HTTP 
repo (which does seem to be working) we're now getting the following error:

[ERROR] Failed to execute goal on project drill-format-mapr: Could not resolve 
dependencies for project 
org.apache.drill.contrib:drill-format-mapr:jar:1.18.0-SNAPSHOT: Could not 
transfer artifact com.mapr.hadoop:maprfs:jar:6.1.0-mapr from/to mapr-releases 
(http://repository.mapr.com/maven/): GET request of: 
com/mapr/hadoop/maprfs/6.1.0-mapr/maprfs-6.1.0-mapr.jar from mapr-releases 
failed: Premature end of Content-Length delimited message body (expected: 
67,884,262; received: 47,333,376) -> [Help 1]
3146
 <https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3146>[ERROR]
3147
 <https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3147>[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
3148
 <https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3148>[ERROR] Re-run Maven using the -X switch to enable full debug logging.
3149
 <https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3149>[ERROR]
3150
 <https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3150>[ERROR] For more information about the errors and possible solutions, please read the following articles:
3151
 <https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3151>[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
3152
 <https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3152>[ERROR]
3153
 <https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3153>[ERROR] After correcting the problems, you can resume the build with the command
3154
 <https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3154>[ERROR]   mvn  -rf :drill-format-mapr
3155
 <https://github.com/apache/drill/pull/2067/checks?check_run_id=642773818#step:6:3155>##[error]Process completed with exit code 1.

Thanks for your help on quickly addressing this issue.
-- C

> On May 4, 2020, at 2:48 AM, Vova Vysotskyi  wrote:
> 
> Hi Ted,
> 
> Thanks for your help! It looks like for http protocol this issue was
> resolved.
> 
> Kind regards,
> Volodymyr Vysotskyi
> 
> 
> On Mon, May 4, 2020 at 4:19 AM Charles Givre  wrote:
> 
>> Hi Ted,
>> Thanks for your help.  You can view the logs here:
>> https://github.com/apache/drill/pull/2067 <
>> https://github.com/apache/drill/pull/2067> in the CI stuff.
>> -- C
>> 
>> 
>> 
>> 
>>> On May 3, 2020, at 9:16 PM, Ted Dunning  wrote:
>>> 
>>> I will pass the word.
>>> 
>>> Do you have logs?
>>> 
>>> 
>>> On Sun, May 3, 2020 at 4:15 PM Charles Givre  wrote:
>>> 
 Hi Ted,
 Thanks for looking into this so quickly.  Unfortunately, I re-ran the CI
 jobs from github and it is still producing the same errors.
 Best,
 --C
 
> On May 3, 2020, at 5:58 PM, Ted Dunning  wrote:
> 
> It appears that the certificate issue is resolved.
> 
> Can somebody verify this by doing a compilation?
> 
> I have to add that based on the number of off-line and on-list pings I
 got
> about this issue I can say that there were quite a few people compiling
> Drill on a Sunday morning. That bodes well, I think, for community
 health.
> 
> 
> 
> On Sun, May 3, 2020 at 11:27 AM Ted Dunning 
 wrote:
> 
>> 
>> I just got word back that the team is looking at the issue.
>> 
>> Not surprisingly, their first look indicates that the issue isn't what
 it
>> appears to be (i.e. not a bad cert)
>> 
>> 
>> 
 
 
>> 
>> 



[GitHub] [drill] vvysotskyi commented on a change in pull request #2072: DRILL-7724: Refactor metadata controller batch

2020-05-04 Thread GitBox


vvysotskyi commented on a change in pull request #2072:
URL: https://github.com/apache/drill/pull/2072#discussion_r419244221



##
File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/metadata/MetadataControllerBatch.java
##
@@ -127,113 +126,93 @@ protected MetadataControllerBatch(MetadataControllerPOP 
popConfig,
 ? null
 : popConfig.getContext().metadataToHandle().stream()
 .collect(Collectors.toMap(MetadataInfo::identifier, 
Function.identity()));
-this.metadataUnits = new ArrayList<>();
-this.statisticsCollector = new StatisticsCollectorImpl();
 this.columnNamesOptions = new ColumnNamesOptions(context.getOptions());
   }
 
-  protected boolean setupNewSchema() {
-container.clear();
-container.addOrGet(MetastoreAnalyzeConstants.OK_FIELD_NAME, 
Types.required(TypeProtos.MinorType.BIT), null);
-container.addOrGet(MetastoreAnalyzeConstants.SUMMARY_FIELD_NAME, 
Types.required(TypeProtos.MinorType.VARCHAR), null);
-container.buildSchema(BatchSchema.SelectionVectorMode.NONE);
-container.setEmpty();
-return true;
-  }
-
   @Override
   public IterOutcome innerNext() {
-IterOutcome outcome;
-boolean finishedLeft;
-if (finished) {
-  return IterOutcome.NONE;
-}
+while (state != State.FINISHED) {
+  switch (state) {
+case RIGHT: {
 
-if (!finishedRight) {
-  outcome = handleRightIncoming();
-  if (outcome != null) {
-return outcome;
+  // Can only return NOT_YET
+  IterOutcome outcome = handleRightIncoming();
+  if (outcome != null) {
+return outcome;
+  }
+  break;
+}
+case LEFT: {
+
+  // Can only return NOT_YET
+  IterOutcome outcome = handleLeftIncoming();
+  if (outcome != null) {
+return outcome;
+  }
+  break;
+}
+case WRITE:
+  writeToMetastore();
+  createSummary();
+  state = State.FINISHED;
+  return IterOutcome.OK_NEW_SCHEMA;
+
+case FINISHED:
+  break;
+
+default:
+  throw new IllegalStateException(state.name());
   }
 }
+return IterOutcome.NONE;
+  }
 
+  private IterOutcome handleRightIncoming() {
 outer:
-while (true) {
-  outcome = next(0, left);
+for (;;) {

Review comment:
   I use IntelliJ IDEA, the warning was the following: `'for' loop may be 
replaced with 'while' loop`.
   Its description is
   ```
   Reports for loops which contain neither initialization or update components, 
and can thus be replaced by simpler while statements. Example:
 for(; exitCondition(); ) {
   process();
 }
   This loop can be replaced with
 while(exitCondition()) {
   process();
 }
   A fix action is also available for other for loops, so you can replace any 
for loop with while. 
   Use the checkbox below if you wish this inspection to ignore for loops with 
trivial or non-existent conditions.
   
   ```









[GitHub] [drill] vvysotskyi commented on a change in pull request #2068: DRILL-7717: Support Mongo extended types in V2 JSON loader

2020-05-04 Thread GitBox


vvysotskyi commented on a change in pull request #2068:
URL: https://github.com/apache/drill/pull/2068#discussion_r419241974



##
File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/values/UtcTimestampValueListener.java
##
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.easy.json.values;
+
+import java.time.Instant;
+import java.time.ZoneId;
+
+import org.apache.drill.exec.store.easy.json.loader.JsonLoaderImpl;
+import org.apache.drill.exec.store.easy.json.parser.TokenIterator;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+
+import com.fasterxml.jackson.core.JsonToken;
+
+/**
+ * Per the <a href="https://docs.mongodb.com/manual/reference/mongodb-extended-json-v1/#bson.data_date">
+ * V1 docs</a>:
+ * <p>
+ * In Strict mode, {@code <date>} is an ISO-8601 date format with a mandatory time zone field
+ * following the template YYYY-MM-DDTHH:mm:ss.mmm<+/-Offset>.
+ * <p>
+ * In Shell mode, {@code <date>} is the JSON representation of a 64-bit signed
+ * integer giving the number of milliseconds since epoch UTC.
+ * <p>
+ * Drill dates are in the local time zone, so conversion is needed.
+ */
+public class UtcTimestampValueListener extends ScalarListener {
+
+  private static final ZoneId localZoneId = ZoneId.systemDefault();

Review comment:
   Please rename it to upper case.
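
   Separately from the naming nit, the local-zone shift the Javadoc describes can be sketched with plain `java.time`. The fixed UTC+2 offset below is chosen only to make the output deterministic; the listener itself uses `ZoneId.systemDefault()`:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class UtcToLocalSketch {
  public static void main(String[] args) {
    // A Mongo "Shell mode" date: milliseconds since epoch UTC.
    long utcMillis = 0L;  // 1970-01-01T00:00:00Z

    // Shift the instant into a zone and drop the zone marker, as needed for
    // a local-time TIMESTAMP. A fixed UTC+2 offset keeps the result
    // deterministic; the real code would use the system default zone.
    LocalDateTime local = Instant.ofEpochMilli(utcMillis)
        .atZone(ZoneOffset.ofHours(2))
        .toLocalDateTime();

    System.out.println(local);  // 1970-01-01T02:00
  }
}
```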









Re: compile issue with MapR repo is being worked

2020-05-04 Thread Vova Vysotskyi
Hi Ted,

Thanks for your help! It looks like for http protocol this issue was
resolved.

Kind regards,
Volodymyr Vysotskyi


On Mon, May 4, 2020 at 4:19 AM Charles Givre  wrote:

> Hi Ted,
> Thanks for your help.  You can view the logs here:
> https://github.com/apache/drill/pull/2067 <
> https://github.com/apache/drill/pull/2067> in the CI stuff.
> -- C
>
>
>
>
> > On May 3, 2020, at 9:16 PM, Ted Dunning  wrote:
> >
> > I will pass the word.
> >
> > Do you have logs?
> >
> >
> > On Sun, May 3, 2020 at 4:15 PM Charles Givre  wrote:
> >
> >> Hi Ted,
> >> Thanks for looking into this so quickly.  Unfortunately, I re-ran the CI
> >> jobs from github and it is still producing the same errors.
> >> Best,
> >> --C
> >>
> >>> On May 3, 2020, at 5:58 PM, Ted Dunning  wrote:
> >>>
> >>> It appears that the certificate issue is resolved.
> >>>
> >>> Can somebody verify this by doing a compilation?
> >>>
> >>> I have to add that based on the number of off-line and on-list pings I
> >> got
> >>> about this issue I can say that there were quite a few people compiling
> >>> Drill on a Sunday morning. That bodes well, I think, for community
> >> health.
> >>>
> >>>
> >>>
> >>> On Sun, May 3, 2020 at 11:27 AM Ted Dunning 
> >> wrote:
> >>>
> 
>  I just got word back that the team is looking at the issue.
> 
>  Not surprisingly, their first look indicates that the issue isn't what
> >> it
>  appears to be (i.e. not a bad cert)
> 
> 
> 
> >>
> >>
>
>