[jira] [Commented] (DRILL-7405) Build fails due to inaccessible apache-drill on S3 storage

2019-10-16 Thread Boaz Ben-Zvi (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953226#comment-16953226
 ] 

Boaz Ben-Zvi commented on DRILL-7405:
-

Seems to be working now, but that file should be placed on another storage, not 
AWS.

 

> Build fails due to inaccessible apache-drill on S3 storage
> --
>
> Key: DRILL-7405
> URL: https://issues.apache.org/jira/browse/DRILL-7405
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.16.0
>Reporter: Boaz Ben-Zvi
>Assignee: Abhishek Girish
>Priority: Minor
> Fix For: 1.17.0
>
>
>   A new clean build (e.g. after deleting the ~/.m2 local repository) now 
> fails with:  
> Access denied to: 
> http://apache-drill.s3.amazonaws.com/files/sf-0.01_tpc-h_parquet_typed.tgz
> (e.g., for the test data sf-0.01_tpc-h_parquet_typed.tgz)
> A new publicly available storage location is needed, plus appropriate changes in 
> Drill to fetch these resources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7405) Build fails due to inaccessible apache-drill on S3 storage

2019-10-16 Thread Abhishek Girish (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953225#comment-16953225
 ] 

Abhishek Girish commented on DRILL-7405:


Changing priority as it's no longer blocking builds. Will keep JIRA open to 
find a long term solution.



[jira] [Updated] (DRILL-7405) Build fails due to inaccessible apache-drill on S3 storage

2019-10-16 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7405:
---
Priority: Minor  (was: Blocker)



[jira] [Commented] (DRILL-7405) Build fails due to inaccessible apache-drill on S3 storage

2019-10-16 Thread Abhishek Girish (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953220#comment-16953220
 ] 

Abhishek Girish commented on DRILL-7405:


Hey [~ben-zvi], this works for me. I am able to download the below file:

http://apache-drill.s3.amazonaws.com/files/sf-0.01_tpc-h_parquet_typed.tgz

Could you please try again?



[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953104#comment-16953104
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

cgivre commented on pull request #1749: DRILL-7177: Format Plugin for Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335650653
 
 

 ##
 File path: contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
 ##
 @@ -0,0 +1,398 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.CellValue;
+import org.apache.poi.ss.usermodel.DateUtil;
+import org.apache.poi.ss.usermodel.FormulaEvaluator;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.xssf.usermodel.XSSFSheet;
+import org.apache.poi.xssf.usermodel.XSSFWorkbook;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.joda.time.Instant;
+
+import java.util.Iterator;
+import java.io.IOException;
+import java.util.ArrayList;
+
+public class ExcelBatchReader implements ManagedReader<FileSchemaNegotiator> {
+  private ExcelReaderConfig readerConfig;
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ExcelBatchReader.class);
+
+  private XSSFWorkbook workbook;
+
+  private FSDataInputStream fsStream;
+
+  private static final String SAFE_WILDCARD = "_$";
+
+  private static final String SAFE_SEPARATOR = "_";
+
+  private static final String PARSER_WILDCARD = ".*";
+
+  private static final String MISSING_FIELD_NAME_HEADER = "field_";
+
+  private static final String SAFE_NEWLINE_REPLACEMENT = " ";
+
+  private XSSFSheet sheet;
+
+  private FormulaEvaluator evaluator;
+
+  private ArrayList excelFieldNames;
+
+  private Iterator rowIterator;
+
+  private int totalColumnCount;
+
+  private int lineCount;
+
+  private FileSplit split;
+
+  private ResultSetLoader loader;
+
+  private int recordCount;
+
+  public static class ExcelReaderConfig {
+    protected final ExcelFormatPlugin plugin;
+
+    protected int headerRow;
+
+    protected int lastRow;
+
+    protected int firstColumn;
+
+    protected int lastColumn;
+
+    protected boolean readAllFieldsAsVarChar;
+
+    protected boolean evaluateFormulae;
+
+    protected TupleMetadata schema;
+
+    protected String sheetName;
+
+    public ExcelReaderConfig(ExcelFormatPlugin plugin, int headerRow, int lastRow, int firstColumn, int lastColumn, boolean readAllFieldsAsVarChar, boolean evaluateFormulae,
+                             //TupleMetadata schema,
+                             String sheetName) {
+      this.plugin = plugin;
+      this.headerRow = headerRow;
+      this.lastRow = lastRow;
+      this.firstColumn = firstColumn;
+      this.lastColumn = lastColumn;
+      this.readAllFieldsAsVarChar = readAllFieldsAsVarChar;
+      this.evaluateFormulae = evaluateFormulae;
+      this.sheetName = sheetName;
+
+    }
+  }
+
+  public ExcelBatchReader(ExcelReaderConfig readerConfig) {
+    this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    verifyConfigOptions();
+    split = negotiator.split();
+op

[jira] [Assigned] (DRILL-6990) IllegalStateException: The current reader doesn't support getting next information

2019-10-16 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko reassigned DRILL-6990:
---

Assignee: Igor Guzenko

> IllegalStateException: The current reader doesn't support getting next 
> information
> --
>
> Key: DRILL-6990
> URL: https://issues.apache.org/jira/browse/DRILL-6990
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Igor Guzenko
>Priority: Major
> Attachments: parqt_nestedArray.parquet.tar
>
>
> Reading a parquet file created from Spark, returns IllegalStateException: The 
> current reader doesn't support getting next information
> Drill 1.14.0, parquet file created from Spark is attached here.
> //Steps to create parquet file from Spark 2.3.1
> [root@ba102-495 ~]# cd /opt/mapr/spark/spark-2.3.1
> [root@ba102-495 spark-2.3.1]# cd bin
> [root@ba102-495 bin]# ./spark-shell
> 19/01/21 22:57:05 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Spark context Web UI available at http://qa102-45.qa.lab:4040
> Spark context available as 'sc' (master = local[*], app id = 
> local-1548111430809).
> Spark session available as 'spark'.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.1-mapr-SNAPSHOT
>       /_/
> Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_191)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> import spark.implicits._
> import spark.implicits._
> scala> val df = spark.read.json("/apps/nestedDataJson.json")
> df: org.apache.spark.sql.DataFrame = [id: bigint, nested_array: array<array<bigint>>]
> scala> df.write.parquet("/apps/parqt_nestedArray.parquet")
> Data used in test
> {noformat}
> [root@ba102-495 ~]# cat nestedDataJson.json
> {"id":19,"nested_array":[[1,2,3,4],[5,6,7,8],[9,10,12]]}
> {"id":14121,"nested_array":[[1,3,4],[5,6,8],[9,11,12]]}
> {"id":18894,"nested_array":[[1,3,4],[5,6,7,8],[9,10,11,12]]}
> {"id":12499,"nested_array":[[1,4],[5,7,8],[9,11,12]]}
> {"id":120,"nested_array":[[1,4],[5,7,8],[9,10,11,12]]}
> {"id":12,"nested_array":[[1,2,3,4],[5,6,7,8],[11,12]]}
> {"id":13,"nested_array":[[1,2,3,4],[5,8],[9,10,11,12]]}
> {"id":14,"nested_array":[[1,2,3,4],[5,68],[9,10,11,12]]}
> {"id":123,"nested_array":[[1,2,3,4],[5,8],[9,10,11,12]]}
> {"id":124,"nested_array":[[1,2,4],[5,6,7,8],[9,10,11,12]]}
> {"id":134,"nested_array":[[1,4],[5,8],[9,12]]}
> {noformat}
> From drillbit.log
> {noformat}
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalStateException: The current reader doesn't support getting next information. Fragment 0:0 [Error Id: c16c70dd-6565-463f-83a7-118ccd8442e2 on ba102-495.qa.lab:31010]
> ...
> ...
> 2019-01-21 23:08:11,268 [23b9af24-10b9-ad11-5583-ecc3e0c562e6:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: The current reader doesn't support getting next information.
> Fragment 0:0
> [Error Id: c16c70dd-6565-463f-83a7-118ccd8442e2 on ba102-495.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalStateException: The current reader doesn't support getting next information.
> Fragment 0:0
> [Error Id: c16c70dd-6565-463f-83a7-118ccd8442e2 on ba102-495.qa.lab:31010]
>  at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.14.0-mapr.jar:1.14.0-mapr]
>  at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361) [drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>  at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216) [drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>  at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327) [drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>  at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.14.0-mapr.jar:1.14.0-mapr]
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
> Caused by: java.lang.IllegalStateException: The current reader doesn't support getting next information.
>  at org.apache.drill.exec.vector.complex.impl.AbstractBaseReader.next(AbstractBaseReader.java:64) ~[vector-1.14.0-mapr.jar:1.14.0-mapr]
>  at org.apache.drill.exec.vector.complex.impl.SingleMapReaderImpl.next(SingleMapReade

[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952995#comment-16952995
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

paul-rogers commented on issue #1749: DRILL-7177: Format Plugin for Excel Files
URL: https://github.com/apache/drill/pull/1749#issuecomment-542792797
 
 
   One additional thought: how are column types handled? Do we require all 
cells within a column to have the same type?
   
   Name | Balance
   ---- | -------
   Fred | 123.45
   Barney | 556.78
   
   If so, then it might make sense to hold an array of column handler objects, 
like in the earlier version of the Regex plugin, each of which holds the column 
accessor and performs any required type conversions. This would be cleaner and 
faster than doing a `switch` per column.
   
   Can a column type vary?
   
   Name | Balance
   ---- | -------
   Fred | 123.45
   Barney | "Not sure"
   
   If so, then the code has to handle conversions. The above would generate an 
error, but would we allow:
   
   Name | Balance
   ---- | -------
   Fred | 123.45
   Barney | "556.78"
   
   Can a column start as empty (null) so that we have to defer type selection?
   
   Name | Balance
   ---- | -------
   Fred | 
   Barney | 556.78
   
   If so, then we need a way to defer column type selection until we see the 
first non-null value. The (not yet merged) new JSON reader handles the 
deferred-type case; I can share that logic if you need it.
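
For illustration only, the column-handler idea combined with deferred type selection might look roughly like the sketch below. It is a minimal model, not Drill's or the JSON reader's actual logic: `DeferredColumnTypes`, `ColumnHandler`, `ColumnType`, and `load` are hypothetical names, and plain Java objects stand in for Excel cells and Drill column writers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: one handler per column, with the type chosen lazily. */
final class DeferredColumnTypes {

  enum ColumnType { UNKNOWN, DOUBLE, VARCHAR }

  /** Remembers the type chosen for one column and converts each cell to it. */
  static final class ColumnHandler {
    private ColumnType type = ColumnType.UNKNOWN;

    ColumnType type() {
      return type;
    }

    /** Converts a raw cell; null cells never fix the column type. */
    Object convert(Object cell) {
      if (cell == null) {
        return null;
      }
      if (type == ColumnType.UNKNOWN) {
        // The first non-null value decides the column type.
        type = (cell instanceof Number) ? ColumnType.DOUBLE : ColumnType.VARCHAR;
      }
      switch (type) {
        case DOUBLE:
          if (cell instanceof Number) {
            return ((Number) cell).doubleValue();
          }
          // Accepts "556.78"; throws NumberFormatException for "Not sure".
          return Double.parseDouble(cell.toString());
        default:
          return cell.toString();
      }
    }
  }

  /** Stand-in for a reader loop: converts every row using one handler per column. */
  static List<Object[]> load(List<Object[]> rows, int columnCount) {
    ColumnHandler[] handlers = new ColumnHandler[columnCount];
    for (int i = 0; i < columnCount; i++) {
      handlers[i] = new ColumnHandler();
    }
    List<Object[]> out = new ArrayList<>();
    for (Object[] row : rows) {
      Object[] converted = new Object[columnCount];
      for (int i = 0; i < columnCount; i++) {
        converted[i] = handlers[i].convert(row[i]);
      }
      out.add(converted);
    }
    return out;
  }

  public static void main(String[] args) {
    // The "column starts as empty" case from the last table above.
    List<Object[]> rows = Arrays.asList(
        new Object[] {"Fred", null},
        new Object[] {"Barney", 556.78});
    for (Object[] row : load(rows, 2)) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

In this model the handler array is built once per sheet, so the per-cell work is a single virtual call instead of a `switch` over cell types; whether a string such as "556.78" should be accepted in a numeric column remains a policy decision, here answered with "yes" purely as an assumption.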
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Resolved] (DRILL-7195) Query returns incorrect result or does not fail when cast with is null is used in filter condition

2019-10-16 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-7195.
-
Resolution: Fixed

> Query returns incorrect result or does not fail when cast with is null is 
> used in filter condition
> --
>
> Key: DRILL-7195
> URL: https://issues.apache.org/jira/browse/DRILL-7195
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. When a filter combines {{is null}} with a {{cast}} that cannot succeed, 
> the query does not fail:
> {code:sql}
> select * from dfs.tmp.`a.json` as t where cast(t.a as integer) is null;
> +---+
> | a |
> +---+
> +---+
> No rows selected (0.142 seconds)
> {code}
> where
> {noformat}
> cat /tmp/a.json
> {"a":"aaa"}
> {noformat}
> But when the same condition is specified in the project list, the query 
> fails as expected:
> {code:sql}
> select cast(t.a as integer) is null from dfs.tmp.`a.json` t;
> Error: SYSTEM ERROR: NumberFormatException: aaa
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: ed3982ce-a12f-4d63-bc6e-cafddf28cc24 on user515050-pc:31010] 
> (state=,code=0)
> {code}
> This is a regression: in Drill 1.15, both the first and the second queries 
> fail:
> {code:sql}
> select * from dfs.tmp.`a.json` as t where cast(t.a as integer) is null;
> Error: SYSTEM ERROR: NumberFormatException: aaa
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: 2f878f15-ddaa-48cd-9dfb-45c04db39048 on user515050-pc:31010] 
> (state=,code=0)
> {code}
> 2. When {{drill.exec.functions.cast_empty_string_to_null}} is enabled, this 
> issue causes wrong results:
> {code:sql}
> alter system set `drill.exec.functions.cast_empty_string_to_null`=true;
> select * from dfs.tmp.`a1.json` t where cast(t.a as integer) is null;
> +---+
> | a |
> +---+
> +---+
> No rows selected (1.759 seconds)
> {code}
> where
> {noformat}
> cat /tmp/a1.json 
> {"a":"1"}
> {"a":""}
> {noformat}
> Result for Drill 1.15.0:
> {code:sql}
> select * from dfs.tmp.`a1.json` t where cast(t.a as integer) is null;
> +----+
> | a  |
> +----+
> |    |
> +----+
> 1 row selected (1.724 seconds)
> {code}
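
The behavior the report treats as correct (the Drill 1.15 results above) can be modeled with a small sketch. This is not Drill code: `CastIsNullFilter`, `castToInteger`, and `whereCastIsNull` are hypothetical names, and the `emptyToNull` flag merely stands in for {{drill.exec.functions.cast_empty_string_to_null}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the semantics the report expects; not Drill code. */
final class CastIsNullFilter {

  /** Mimics cast(value as integer); emptyToNull models the session option. */
  static Integer castToInteger(String value, boolean emptyToNull) {
    if (value == null || (emptyToNull && value.isEmpty())) {
      return null;
    }
    // Throws NumberFormatException for non-numeric input such as "aaa".
    return Integer.parseInt(value);
  }

  /** Keeps the rows for which cast(a as integer) is null. */
  static List<String> whereCastIsNull(List<String> column, boolean emptyToNull) {
    List<String> selected = new ArrayList<>();
    for (String a : column) {
      if (castToInteger(a, emptyToNull) == null) {
        selected.add(a);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    // a1.json with the option enabled: the empty-string row is selected.
    System.out.println(whereCastIsNull(Arrays.asList("1", ""), true).size());
    try {
      // a.json: the cast must be evaluated, so the filter fails.
      whereCastIsNull(Arrays.asList("aaa"), false);
    } catch (NumberFormatException e) {
      System.out.println("NumberFormatException: " + e.getMessage());
    }
  }
}
```

The point of the model is that the cast has to be evaluated for every row before the null check; skipping it gives both symptoms described above (no failure for "aaa", and no row for the empty string).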





[jira] [Updated] (DRILL-7401) Sqlline 1.9 upgrade

2019-10-16 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7401:

Description: 
Upgrade to SqlLine 1.9 once it is released 
(https://github.com/julianhyde/sqlline/issues/350).

*TODO:*
1. Add SqlLine properties: 
{{connectInteractionMode: useNPTogetherOrEmpty}} - supports the connection 
mechanism used in SqlLine 1.17 and earlier:
a. if user and password are not indicated, connects without them (user and 
password are set to empty string): {{./drill-embedded}}
b. if user is indicated, asks for password in interactive mode: 
{{./drill-embedded -n "user1"}}
c. if user is indicated as empty string, behaves as in point a (user and 
password are set to empty string): {{./drill-embedded -n ""}}
d. if user and password are indicated, connects using provided input 
{{./drill-embedded -n "user1" -p "123"}}

{{showLineNumbers: true}} - adds line numbers when query is more than one line:
{noformat}
apache drill> select
2..semicolon> *
3..semicolon> from
4..semicolon> sys.version;
{noformat}

2. Remove nohup support code from sqlline.sh since it is no longer needed 
(nohup support works without the flag):
{code}
To add nohup support for SQLline script
if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then
   export SQLLINE_JAVA_OPTS="$SQLLINE_JAVA_OPTS -Djline.terminal=jline.UnsupportedTerminal"
fi
{code}

3. Add {{-Dorg.jline.terminal.dumb=true}} to avoid JLine terminal warning when 
submitting query in sqlline.sh to execute via {{-e}} or {{-f}}:
{noformat}
Oct 11, 2019 2:14:45 PM org.jline.utils.Log logr
WARNING: Unable to create a system terminal, creating a dumb terminal (enable 
debug logging for more information)
{noformat}

4. Remove unneeded echo commands in sqlline.bat during start up:
{noformat}
drill-embedded.bat
DRILL_ARGS - " -u jdbc:drill:zk=local -n user1 -p ppp"
Calculating HADOOP_CLASSPATH ...
HBASE_HOME not detected...
Calculating Drill classpath...
Apache Drill 1.17.0-SNAPSHOT
"Data is the new oil. Ready to Drill some?"
apache drill>
{noformat}

  was:
Upgrade to SqlLine 1.9 once it is released 
(https://github.com/julianhyde/sqlline/issues/350).

*TODO:*
1. Add SqlLine properties: 
{{connectInteractionMode: useNPTogetherOrEmpty}} - supports the connection 
mechanism used in SqlLine 1.17 and earlier:
a. if user and password are not indicated, connects without them (user and 
password are set to empty string): {{./drill-embedded}}
b. if user is indicated, asks for password in interactive mode: 
{{./drill-embedded -n "user1"}}
c. if user is indicated as empty string, behaves as in point a (user and 
password are set to empty string): {{./drill-embedded -n ""}}
d. if user and password are indicated, connects using provided input 
{{./drill-embedded -n "user1" -p ''123"}}

{{showLineNumbers: true}} - adds line numbers when query is more than one line:
{noformat}
apache drill> select
2..semicolon> *
3..semicolon> from
4..semicolon> sys.version;
{noformat}

2. Remove nohup support code from sqlline.sh since it is no longer needed 
(nohup support works without the flag):
{code}
To add nohup support for SQLline script
if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then
   export SQLLINE_JAVA_OPTS="$SQLLINE_JAVA_OPTS -Djline.terminal=jline.UnsupportedTerminal"
fi
{code}

3. Add {{-Dorg.jline.terminal.dumb=true}} to avoid JLine terminal warning when 
submitting query in sqlline.sh to execute via {{-e}} or {{-f}}:
{noformat}
Oct 11, 2019 2:14:45 PM org.jline.utils.Log logr
WARNING: Unable to create a system terminal, creating a dumb terminal (enable 
debug logging for more information)
{noformat}

4. Remove unneeded echo commands in sqlline.bat during start up:
{noformat}
drill-embedded.bat
DRILL_ARGS - " -u jdbc:drill:zk=local -n user1 -p ppp"
Calculating HADOOP_CLASSPATH ...
HBASE_HOME not detected...
Calculating Drill classpath...
Apache Drill 1.17.0-SNAPSHOT
"Data is the new oil. Ready to Drill some?"
apache drill>
{noformat}
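
As a reading aid, rules a-d listed under {{connectInteractionMode: useNPTogetherOrEmpty}} can be expressed as a small decision function. This is a hypothetical model of the behavior described in this issue, not SqlLine's implementation; `ConnectRules`, `Credentials`, `resolve`, and the `prompt` supplier are invented for the sketch, and a null argument means the corresponding {{-n}} or {{-p}} option was not given.

```java
import java.util.function.Supplier;

/** Hypothetical model of rules a-d; this is not SqlLine's implementation. */
final class ConnectRules {

  static final class Credentials {
    final String user;
    final String password;

    Credentials(String user, String password) {
      this.user = user;
      this.password = password;
    }
  }

  /**
   * Resolves the credentials to connect with.
   * A null user or password means the -n or -p option was absent.
   */
  static Credentials resolve(String user, String password, Supplier<String> prompt) {
    if (user == null || user.isEmpty()) {
      // Rules a and c: no user, or an empty user, connects without prompting.
      // Assumption: a password given without a user is passed through as-is;
      // the rules above do not cover that combination.
      return new Credentials("", password == null ? "" : password);
    }
    if (password == null) {
      // Rule b: user given without a password, so ask interactively.
      return new Credentials(user, prompt.get());
    }
    // Rule d: both given, so use them as provided.
    return new Credentials(user, password);
  }

  public static void main(String[] args) {
    Supplier<String> prompt = () -> "typed-at-prompt";
    Credentials c = resolve("user1", null, prompt);
    System.out.println(c.user + " / " + c.password);
  }
}
```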


> Sqlline 1.9 upgrade
> ---
>
> Key: DRILL-7401
> URL: https://issues.apache.org/jira/browse/DRILL-7401
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> Upgrade to SqlLine 1.9 once it is released 
> (https://github.com/julianhyde/sqlline/issues/350).
> *TODO:*
> 1. Add SqlLine properties: 
> {{connectInteractionMode: useNPTogetherOrEmpty}} - supports the connection 
> mechanism used in SqlLine 1.17 and earlier:
> a. if user and password are not indicated, connects without them (user and 
> password are set to empty string): {{./drill-embedded}}
> b. if user is indicated, asks for password in interactive mode: 
> {{./drill-embedded -n "user1"}}
> c. if user is indicated as empty string, behaves as in point a (user a

[jira] [Updated] (DRILL-7401) Sqlline 1.9 upgrade

2019-10-16 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7401:

Description: 
Upgrade to SqlLine 1.9 once it is released 
(https://github.com/julianhyde/sqlline/issues/350).

*TODO:*
1. Add SqlLine properties: 
{{connectInteractionMode: useNPTogetherOrEmpty}} - supports the connection 
mechanism used in SqlLine 1.17 and earlier:
a. if user and password are not indicated, connects without them (user and 
password are set to empty string): {{./drill-embedded}}
b. if user is indicated, asks for password in interactive mode: 
{{./drill-embedded -n "user1"}}
c. if user is indicated as empty string, behaves as in point a (user and 
password are set to empty string): {{./drill-embedded -n ""}}
d. if user and password are indicated, connects using provided input 
{{./drill-embedded -n "user1" -p ''123"}}

{{showLineNumbers: true}} - adds line numbers when query is more than one line:
{noformat}
apache drill> select
2..semicolon> *
3..semicolon> from
4..semicolon> sys.version;
{noformat}

2. Remove nohup support code from sqlline.sh since it is no longer needed 
(nohup support works without the flag):
{code}
To add nohup support for SQLline script
if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then
   export SQLLINE_JAVA_OPTS="$SQLLINE_JAVA_OPTS -Djline.terminal=jline.UnsupportedTerminal"
fi
{code}

3. Add {{-Dorg.jline.terminal.dumb=true}} to avoid JLine terminal warning when 
submitting query in sqlline.sh to execute via {{-e}} or {{-f}}:
{noformat}
Oct 11, 2019 2:14:45 PM org.jline.utils.Log logr
WARNING: Unable to create a system terminal, creating a dumb terminal (enable 
debug logging for more information)
{noformat}
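
One way item 3 might look in sqlline.sh is sketched below. It assumes the script collects JVM options in {{SQLLINE_JAVA_OPTS}}, as the nohup snippet in item 2 does; the helper name {{add_dumb_terminal_opt}} is hypothetical, and whether the flag is added only for {{-e}}/{{-f}} or unconditionally is a choice this ticket leaves open.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from sqlline.sh): appends the JLine
# "dumb terminal" flag to SQLLINE_JAVA_OPTS when a query is submitted
# via -e or a script file via -f.
add_dumb_terminal_opt() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      -e|-f)
        SQLLINE_JAVA_OPTS="${SQLLINE_JAVA_OPTS:-} -Dorg.jline.terminal.dumb=true"
        return 0
        ;;
    esac
  done
  return 0
}

# Example: non-interactive execution of a single query.
SQLLINE_JAVA_OPTS="-Xmx1g"
add_dumb_terminal_opt -u "jdbc:drill:zk=local" -e "select * from sys.version"
echo "$SQLLINE_JAVA_OPTS"
```

Interactive sessions (no {{-e}} or {{-f}}) leave {{SQLLINE_JAVA_OPTS}} unchanged in this sketch, so a real terminal would still be used there.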

4. Remove unneeded echo commands in sqlline.bat during start up:
{noformat}
drill-embedded.bat
DRILL_ARGS - " -u jdbc:drill:zk=local -n user1 -p ppp"
Calculating HADOOP_CLASSPATH ...
HBASE_HOME not detected...
Calculating Drill classpath...
Apache Drill 1.17.0-SNAPSHOT
"Data is the new oil. Ready to Drill some?"
apache drill>
{noformat}

  was:
Upgrade to SqlLine 1.9 once it is released 
(https://github.com/julianhyde/sqlline/issues/350).

*TODO:*
1. Add SqlLine properties: 
{{connectInteractionMode: useNPTogetherOrEmpty}} - supports connection mechanism 
used in SqlLine 1.17 and earlier:
a. if user and password are not indicated, connects without them (user and 
password are set to empty string): {{./drill-embedded}}
b. if user is indicated, asks for password in interactive mode: 
{{./drill-embedded -n "user1"}}
c. if user is indicated as empty string, behaves like in point a (user and 
password are set to empty string): {{./drill-embedded -n ""}}
d. if user and password are indicated, connects using provided input: 
{{./drill-embedded -n "user1" -p "123"}}

{{showLineNumbers: true}} - adds line numbers when query is more than one line:
{noformat}
apache drill> select
2..semicolon> *
3..semicolon> from
4..semicolon> sys.version;
{noformat}

2. Remove nohup support code from sqlline.sh since it is not needed any more 
(nohup support works without flag):
{code}
# To add nohup support for SQLline script
if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then
   export SQLLINE_JAVA_OPTS="$SQLLINE_JAVA_OPTS -Djline.terminal=jline.UnsupportedTerminal"
fi
{code}

3. Add {{-Dorg.jline.terminal.dumb=true}} to avoid JLine terminal warning when 
submitting query in sqlline.sh to execute via {{-e}} or {{-f}}:
{noformat}
Oct 11, 2019 2:14:45 PM org.jline.utils.Log logr
WARNING: Unable to create a system terminal, creating a dumb terminal (enable 
debug logging for more information)
{noformat}

4. Remove unneeded echo commands in sqlline.bat during start up:
{noformat}
drill-embedded.bat
DRILL_ARGS - " -u jdbc:drill:zk=local -n user1 -p ppp"
Calculating HADOOP_CLASSPATH ...
HBASE_HOME not detected...
Calculating Drill classpath...
Apache Drill 1.17.0-SNAPSHOT
"Data is the new oil. Ready to Drill some?"
apache drill>
{noformat}


> Sqlline 1.9 upgrade
> ---
>
> Key: DRILL-7401
> URL: https://issues.apache.org/jira/browse/DRILL-7401
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> Upgrade to SqlLine 1.9 once it is released 
> (https://github.com/julianhyde/sqlline/issues/350).
> *TODO:*
> 1. Add SqlLine properties: 
> {{connectInteractionMode: useNPTogetherOrEmpty}} - supports connection 
> mechanism used in SqlLine 1.17 and earlier:
> a. if user and password are not indicated, connects without them (user and 
> password are set to empty string): {{./drill-embedded}}
> b. if user is indicated, asks for password in interactive mode: 
> {{./drill-embedded -n "user1"}}
> c. if user is indicated as empty string, behaves like in point a (user a

[jira] [Updated] (DRILL-7401) Sqlline 1.9 upgrade

2019-10-16 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7401:

Description: 
Upgrade to SqlLine 1.9 once it is released 
(https://github.com/julianhyde/sqlline/issues/350).

*TODO:*
1. Add SqlLine properties: 
{{connectInteractionMode: useNPTogetherOrEmpty}} - supports connection mechanism 
used in SqlLine 1.17 and earlier:
a. if user and password are not indicated, connects without them (user and 
password are set to empty string): {{./drill-embedded}}
b. if user is indicated, asks for password in interactive mode: 
{{./drill-embedded -n "user1"}}
c. if user is indicated as empty string, behaves like in point a (user and 
password are set to empty string): {{./drill-embedded -n ""}}
d. if user and password are indicated, connects using provided input: 
{{./drill-embedded -n "user1" -p "123"}}

{{showLineNumbers: true}} - adds line numbers when query is more than one line:
{noformat}
apache drill> select
2..semicolon> *
3..semicolon> from
4..semicolon> sys.version;
{noformat}

2. Remove nohup support code from sqlline.sh since it is not needed any more 
(nohup support works without flag):
{code}
# To add nohup support for SQLline script
if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then
   export SQLLINE_JAVA_OPTS="$SQLLINE_JAVA_OPTS -Djline.terminal=jline.UnsupportedTerminal"
fi
{code}

3. Add {{-Dorg.jline.terminal.dumb=true}} to avoid JLine terminal warning when 
submitting query in sqlline.sh to execute via {{-e}} or {{-f}}:
{noformat}
Oct 11, 2019 2:14:45 PM org.jline.utils.Log logr
WARNING: Unable to create a system terminal, creating a dumb terminal (enable 
debug logging for more information)
{noformat}

4. Remove unneeded echo commands in sqlline.bat during start up:
{noformat}
drill-embedded.bat
DRILL_ARGS - " -u jdbc:drill:zk=local -n user1 -p ppp"
Calculating HADOOP_CLASSPATH ...
HBASE_HOME not detected...
Calculating Drill classpath...
Apache Drill 1.17.0-SNAPSHOT
"Data is the new oil. Ready to Drill some?"
apache drill>
{noformat}

  was:
Upgrade to SqlLine 1.9 once it is released 
(https://github.com/julianhyde/sqlline/issues/350).

*TODO:*
1. Add SqlLine properties: 
{{connectInteractionMode: useNPTogetherOrEmpty}} - supports connection mechanism 
used in SqlLine 1.17 and earlier:
a. if user and password are not indicated, connects without them (user and 
password are set to empty string): {{./drill-embedded}}
b. if user is indicated, asks for password in interactive mode: 
{{./drill-embedded -n "user1"}}
c. if user is indicated as empty string, behaves like in point a (user and 
password are set to empty string): {{./drill-embedded -n ""}}
d. if user and password are indicated, connects using provided input: 
{{./drill-embedded -n "user1" -p "123"}}

{{showLineNumbers: true}} - adds line numbers when query is more than one line:
{noformat}
apache drill> select
2..semicolon> *
3..semicolon> from
4..semicolon> sys.version;
{noformat}

2. Remove nohup support code from sqlline.sh since it is not needed any more 
(nohup support works without flag):
{code}
# To add nohup support for SQLline script
if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then
   export SQLLINE_JAVA_OPTS="$SQLLINE_JAVA_OPTS -Djline.terminal=jline.UnsupportedTerminal"
fi
{code}

3. Add {{-Dorg.jline.terminal.dumb=true}} to avoid JLine terminal warning when 
submitting query in sqlline.sh to execute via {{-e}} or {{-f}}:
{noformat}
Oct 11, 2019 2:14:45 PM org.jline.utils.Log logr
WARNING: Unable to create a system terminal, creating a dumb terminal (enable 
debug logging for more information)
{noformat}

4. Remove unneeded echo commands in sqlline.bat during start up:
{noformat}
drill-embedded.bat
DRILL_ARGS - " -u jdbc:drill:zk=local -n user1 -p ppp"
Calculating HADOOP_CLASSPATH ...
HBASE_HOME not detected...
Calculating Drill classpath...
Apache Drill 1.17.0-SNAPSHOT
"Data is the new oil. Ready to Drill some?"
apache drill>
{noformat}


> Sqlline 1.9 upgrade
> ---
>
> Key: DRILL-7401
> URL: https://issues.apache.org/jira/browse/DRILL-7401
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> Upgrade to SqlLine 1.9 once it is released 
> (https://github.com/julianhyde/sqlline/issues/350).
> *TODO:*
> 1. Add SqlLine properties: 
> {{connectInteractionMode: useNPTogetherOrEmpty}} - supports connection 
> mechanism used in SqlLine 1.17 and earlier:
> a. if user and password are not indicated, connects without them (user and 
> password are set to empty string): {{./drill-embedded}}
> b. if user is indicated, asks for password in interactive mode: 
> {{./drill-embedded -n "user1"}}
> c. if user is indicated as empty string, behaves like in point a (user an

[jira] [Updated] (DRILL-7401) Sqlline 1.9 upgrade

2019-10-16 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7401:

Description: 
Upgrade to SqlLine 1.9 once it is released 
(https://github.com/julianhyde/sqlline/issues/350).

*TODO:*
1. Add SqlLine properties: 
{{connectInteractionMode: useNPTogetherOrEmpty}} - supports connection mechanism 
used in SqlLine 1.17 and earlier:
a. if user and password are not indicated, connects without them (user and 
password are set to empty string): {{./drill-embedded}}
b. if user is indicated, asks for password in interactive mode: 
{{./drill-embedded -n "user1"}}
c. if user is indicated as empty string, behaves like in point a (user and 
password are set to empty string): {{./drill-embedded -n ""}}
d. if user and password are indicated, connects using provided input: 
{{./drill-embedded -n "user1" -p "123"}}

{{showLineNumbers: true}} - adds line numbers when query is more than one line:
{noformat}
apache drill> select
2..semicolon> *
3..semicolon> from
4..semicolon> sys.version;
{noformat}

2. Remove nohup support code from sqlline.sh since it is not needed any more 
(nohup support works without flag):
{code}
# To add nohup support for SQLline script
if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then
   export SQLLINE_JAVA_OPTS="$SQLLINE_JAVA_OPTS -Djline.terminal=jline.UnsupportedTerminal"
fi
{code}

3. Add {{-Dorg.jline.terminal.dumb=true}} to avoid JLine terminal warning when 
submitting query in sqlline.sh to execute via {{-e}} or {{-f}}:
{noformat}
Oct 11, 2019 2:14:45 PM org.jline.utils.Log logr
WARNING: Unable to create a system terminal, creating a dumb terminal (enable 
debug logging for more information)
{noformat}

4. Remove unneeded echo commands in sqlline.bat during start up:
{noformat}
drill-embedded.bat
DRILL_ARGS - " -u jdbc:drill:zk=local -n user1 -p ppp"
Calculating HADOOP_CLASSPATH ...
HBASE_HOME not detected...
Calculating Drill classpath...
Apache Drill 1.17.0-SNAPSHOT
"Data is the new oil. Ready to Drill some?"
apache drill>
{noformat}

  was:
Upgrade to SqlLine 1.9 once it is released 
(https://github.com/julianhyde/sqlline/issues/350).

*TODO:*
1. Add SqlLine properties: 
connectInteractionMode: useNPTogetherOrEmpty
showLineNumbers: true

2. Remove nohup support code from sqlline.sh since it is not needed any more 
(nohup support works without flag):
{code}
# To add nohup support for SQLline script
if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then
   export SQLLINE_JAVA_OPTS="$SQLLINE_JAVA_OPTS -Djline.terminal=jline.UnsupportedTerminal"
fi
{code}

3. Add {{-Dorg.jline.terminal.dumb=true}} to avoid JLine terminal warning when 
submitting query in sqlline.sh to execute via {{-e}} or {{-f}}:
{noformat}
Oct 11, 2019 2:14:45 PM org.jline.utils.Log logr
WARNING: Unable to create a system terminal, creating a dumb terminal (enable 
debug logging for more information)
{noformat}

4. Remove unneeded echo commands in sqlline.bat during start up:
{noformat}
drill-embedded.bat
DRILL_ARGS - " -u jdbc:drill:zk=local -n user1 -p ppp"
Calculating HADOOP_CLASSPATH ...
HBASE_HOME not detected...
Calculating Drill classpath...
Apache Drill 1.17.0-SNAPSHOT
"Data is the new oil. Ready to Drill some?"
apache drill>
{noformat}


> Sqlline 1.9 upgrade
> ---
>
> Key: DRILL-7401
> URL: https://issues.apache.org/jira/browse/DRILL-7401
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> Upgrade to SqlLine 1.9 once it is released 
> (https://github.com/julianhyde/sqlline/issues/350).
> *TODO:*
> 1. Add SqlLine properties: 
> {{connectInteractionMode: useNPTogetherOrEmpty}} - supports connection 
> mechanism used in SqlLine 1.17 and earlier:
> a. if user and password are not indicated, connects without them (user and 
> password are set to empty string): {{./drill-embedded}}
> b. if user is indicated, asks for password in interactive mode: 
> {{./drill-embedded -n "user1"}}
> c. if user is indicated as empty string, behaves like in point a (user and 
> password are set to empty string): {{./drill-embedded -n ""}}
> d. if user and password are indicated, connects using provided input: 
> {{./drill-embedded -n "user1" -p "123"}}
> {{showLineNumbers: true}} - adds line numbers when query is more than one 
> line:
> {noformat}
> apache drill> select
> 2..semicolon> *
> 3..semicolon> from
> 4..semicolon> sys.version;
> {noformat}
> 2. Remove nohup support code from sqlline.sh since it is not needed any more 
> (nohup support works without flag):
> {code}
> # To add nohup support for SQLline script
> if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then
>export SQLLINE_JAVA_OPTS="$SQLLINE_JAVA_OPTS 
> -Djline.terminal=jline.Unsupport

[jira] [Updated] (DRILL-7401) Sqlline 1.9 upgrade

2019-10-16 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7401:

Description: 
Upgrade to SqlLine 1.9 once it is released 
(https://github.com/julianhyde/sqlline/issues/350).

*TODO:*
1. Add SqlLine properties: 
connectInteractionMode: useNPTogetherOrEmpty
showLineNumbers: true

2. Remove nohup support code from sqlline.sh since it is not needed any more 
(nohup support works without flag):
{code}
# To add nohup support for SQLline script
if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then
   export SQLLINE_JAVA_OPTS="$SQLLINE_JAVA_OPTS -Djline.terminal=jline.UnsupportedTerminal"
fi
{code}

3. Add {{-Dorg.jline.terminal.dumb=true}} to avoid JLine terminal warning when 
submitting query in sqlline.sh to execute via {{-e}} or {{-f}}:
{noformat}
Oct 11, 2019 2:14:45 PM org.jline.utils.Log logr
WARNING: Unable to create a system terminal, creating a dumb terminal (enable 
debug logging for more information)
{noformat}

4. Remove unneeded echo commands in sqlline.bat during start up:
{noformat}
drill-embedded.bat
DRILL_ARGS - " -u jdbc:drill:zk=local -n user1 -p ppp"
Calculating HADOOP_CLASSPATH ...
HBASE_HOME not detected...
Calculating Drill classpath...
Apache Drill 1.17.0-SNAPSHOT
"Data is the new oil. Ready to Drill some?"
apache drill>
{noformat}

  was:
Upgrade to SqlLine 1.9 once it is released 
(https://github.com/julianhyde/sqlline/issues/350).

*TODO:*
Add SqlLine properties: 
connectInteractionMode: useNPTogetherOrEmpty
showLineNumbers: true

Remove nohup support code from sqlline.sh since it is not needed any more 
(nohup support works without flag):
{code}
# To add nohup support for SQLline script
if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then
   export SQLLINE_JAVA_OPTS="$SQLLINE_JAVA_OPTS -Djline.terminal=jline.UnsupportedTerminal"
fi
{code}

Add {{-Dorg.jline.terminal.dumb=true}} to avoid JLine terminal warning when 
submitting query in sqlline.sh to execute via {{-e}} or {{-f}}:
{noformat}
Oct 11, 2019 2:14:45 PM org.jline.utils.Log logr
WARNING: Unable to create a system terminal, creating a dumb terminal (enable 
debug logging for more information)
{noformat}


> Sqlline 1.9 upgrade
> ---
>
> Key: DRILL-7401
> URL: https://issues.apache.org/jira/browse/DRILL-7401
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> Upgrade to SqlLine 1.9 once it is released 
> (https://github.com/julianhyde/sqlline/issues/350).
> *TODO:*
> 1. Add SqlLine properties: 
> connectInteractionMode: useNPTogetherOrEmpty
> showLineNumbers: true
> 2. Remove nohup support code from sqlline.sh since it is not needed any more 
> (nohup support works without flag):
> {code}
> # To add nohup support for SQLline script
> if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then
>    export SQLLINE_JAVA_OPTS="$SQLLINE_JAVA_OPTS -Djline.terminal=jline.UnsupportedTerminal"
> fi
> {code}
> 3. Add {{-Dorg.jline.terminal.dumb=true}} to avoid JLine terminal warning 
> when submitting query in sqlline.sh to execute via {{-e}} or {{-f}}:
> {noformat}
> Oct 11, 2019 2:14:45 PM org.jline.utils.Log logr
> WARNING: Unable to create a system terminal, creating a dumb terminal (enable 
> debug logging for more information)
> {noformat}
> 4. Remove unneeded echo commands in sqlline.bat during start up:
> {noformat}
> drill-embedded.bat
> DRILL_ARGS - " -u jdbc:drill:zk=local -n user1 -p ppp"
> Calculating HADOOP_CLASSPATH ...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> Apache Drill 1.17.0-SNAPSHOT
> "Data is the new oil. Ready to Drill some?"
> apache drill>
> {noformat}





[jira] [Updated] (DRILL-7401) Sqlline 1.9 upgrade

2019-10-16 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7401:

Description: 
Upgrade to SqlLine 1.9 once it is released 
(https://github.com/julianhyde/sqlline/issues/350).

*TODO:*
Add SqlLine properties: 
connectInteractionMode: useNPTogetherOrEmpty
showLineNumbers: true

Remove nohup support code from sqlline.sh since it is not needed any more 
(nohup support works without flag):
{code}
# To add nohup support for SQLline script
if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then
   export SQLLINE_JAVA_OPTS="$SQLLINE_JAVA_OPTS -Djline.terminal=jline.UnsupportedTerminal"
fi
{code}

Add {{-Dorg.jline.terminal.dumb=true}} to avoid JLine terminal warning when 
submitting query in sqlline.sh to execute via {{-e}} or {{-f}}:
{noformat}
Oct 11, 2019 2:14:45 PM org.jline.utils.Log logr
WARNING: Unable to create a system terminal, creating a dumb terminal (enable 
debug logging for more information)
{noformat}

  was:
Upgrade to SqlLine 1.9 once it is released 
(https://github.com/julianhyde/sqlline/issues/350).

TODO:
Add SqlLine properties: 
connectInteractionMode: useNPTogetherOrEmpty
showLineNumbers: true


> Sqlline 1.9 upgrade
> ---
>
> Key: DRILL-7401
> URL: https://issues.apache.org/jira/browse/DRILL-7401
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> Upgrade to SqlLine 1.9 once it is released 
> (https://github.com/julianhyde/sqlline/issues/350).
> *TODO:*
> Add SqlLine properties: 
> connectInteractionMode: useNPTogetherOrEmpty
> showLineNumbers: true
> Remove nohup support code from sqlline.sh since it is not needed any more 
> (nohup support works without flag):
> {code}
> # To add nohup support for SQLline script
> if [[ ( ! $(ps -o stat= -p $$) =~ "+" ) && ! ( -p /dev/stdin ) ]]; then
>    export SQLLINE_JAVA_OPTS="$SQLLINE_JAVA_OPTS -Djline.terminal=jline.UnsupportedTerminal"
> fi
> {code}
> Add {{-Dorg.jline.terminal.dumb=true}} to avoid JLine terminal warning when 
> submitting query in sqlline.sh to execute via {{-e}} or {{-f}}:
> {noformat}
> Oct 11, 2019 2:14:45 PM org.jline.utils.Log logr
> WARNING: Unable to create a system terminal, creating a dumb terminal (enable 
> debug logging for more information)
> {noformat}





[jira] [Updated] (DRILL-7405) Build fails due to inaccessible apache-drill on S3 storage

2019-10-16 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7405:

Fix Version/s: 1.17.0

> Build fails due to inaccessible apache-drill on S3 storage
> --
>
> Key: DRILL-7405
> URL: https://issues.apache.org/jira/browse/DRILL-7405
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.16.0
>Reporter: Boaz Ben-Zvi
>Assignee: Abhishek Girish
>Priority: Blocker
> Fix For: 1.17.0
>
>
>   A new clean build (e.g. after deleting the ~/.m2 local repository) would 
> fail now due to:  
> Access denied to: 
> [http://apache-drill.s3.amazonaws.com|https://urldefense.proofpoint.com/v2/url?u=http-3A__apache-2Ddrill.s3.amazonaws.com_files_sf-2D0.01-5Ftpc-2Dh-5Fparquet-5Ftyped.tgz&d=DwMGaQ&c=C5b8zRQO1miGmBeVZ2LFWg&r=KLC1nKJ8dIOnUay2kR6CAw&m=08mf7Xfn1orlbAA60GKLIuj_PTtfaSAijrKDLOucMPU&s=CX97We3sm3ZZ_aVJIrsUdXVJ3CNMYg7p3IsxbJpuXWk&e=]
>  
> (e.g., for the test data  sf-0.01_tpc-h_parquet_typed.tgz )
> A new publicly available storage place is needed, plus appropriate changes in 
> Drill to get to these resources.





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952924#comment-16952924
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335541105
 
 

 ##
 File path: contrib/format-excel/README.md
 ##
 @@ -0,0 +1,36 @@
+# Excel Format Plugin
+This plugin enables Drill to read Microsoft Excel files.  This format is best 
used with Excel files that do not have extensive formatting, however it will 
work with formatted files, by allowing you to define a region within the file 
where the data is.  
+
+The plugin will automatically evaluate cells which contain formulae.  
+
+## Plugin Configuration 
+This plugin has several configuration variables which must be set in order to 
read Excel files effectively.  Since Excel files often contain other elements 
besides data, you can use the configuration variables to define a region within 
your spreadsheet in which Drill should extract data.  This is potentially 
useful if your spreadsheet contains a lot of formatting or other complications. 
+
+* `headerRow`:  Set to -1 if there are no column headers.  
+* `lastRow`:  This defines the last row of your data.  The default is an 
arbitrary large number.  You only will need to set this if you want Drill to 
stop reading at a specific location.
+* `sheetName`:  This is the name of the sheet you want to query.  This will 
default to the first sheet in the file if left undefined. 
+* `firstColumn`:  If you want to define a region within a spreadsheet, this is 
the left-most column index.  This is indexed from one.  If set to `0` Drill 
will start at the left most column.
+* `lastColumn`:  If you want to define a region within a spreadsheet, this is 
the right-most column index.  This is indexed from one.  If set to `0` Drill 
will read all available columns.  This is not inclusive, so if you ask for 
columns 2-5 you will get columns 2,3 and 4. 
+
+## Usage
+You can specify the configuration at runtime via the `table()` method or in 
the storage plugin configuration.  For instance, if you just want to query an 
Excel file, you could execute the query as follows:
 
 Review comment:
   ```suggestion
   You can specify the configuration at runtime via the `table()` function or 
in the storage plugin configuration.  For instance, if you just want to query 
an Excel file, you could execute the query as follows:
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952929#comment-16952929
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335545045
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
 ##
 @@ -0,0 +1,398 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.CellValue;
+import org.apache.poi.ss.usermodel.DateUtil;
+import org.apache.poi.ss.usermodel.FormulaEvaluator;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.xssf.usermodel.XSSFSheet;
+import org.apache.poi.xssf.usermodel.XSSFWorkbook;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.joda.time.Instant;
+
+import java.util.Iterator;
+import java.io.IOException;
+import java.util.ArrayList;
+
+public class ExcelBatchReader implements ManagedReader {
+  private ExcelReaderConfig readerConfig;
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ExcelBatchReader.class);
+
+  private XSSFWorkbook workbook;
+
+  private FSDataInputStream fsStream;
+
+  private static final String SAFE_WILDCARD = "_$";
+
+  private static final String SAFE_SEPARATOR = "_";
+
+  private static final String PARSER_WILDCARD = ".*";
+
+  private static final String MISSING_FIELD_NAME_HEADER = "field_";
+
+  private static final String SAFE_NEWLINE_REPLACEMENT = " ";
+
+  private XSSFSheet sheet;
 
 Review comment:
   can some of the fields be final?
 



> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952935#comment-16952935
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335546864
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
 ##
 @@ -0,0 +1,398 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.CellValue;
+import org.apache.poi.ss.usermodel.DateUtil;
+import org.apache.poi.ss.usermodel.FormulaEvaluator;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.xssf.usermodel.XSSFSheet;
+import org.apache.poi.xssf.usermodel.XSSFWorkbook;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.joda.time.Instant;
+
+import java.util.Iterator;
+import java.io.IOException;
+import java.util.ArrayList;
+
+public class ExcelBatchReader implements ManagedReader {
+  private ExcelReaderConfig readerConfig;
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ExcelBatchReader.class);
+
+  private XSSFWorkbook workbook;
+
+  private FSDataInputStream fsStream;
+
+  private static final String SAFE_WILDCARD = "_$";
+
+  private static final String SAFE_SEPARATOR = "_";
+
+  private static final String PARSER_WILDCARD = ".*";
+
+  private static final String MISSING_FIELD_NAME_HEADER = "field_";
+
+  private static final String SAFE_NEWLINE_REPLACEMENT = " ";
+
+  private XSSFSheet sheet;
+
+  private FormulaEvaluator evaluator;
+
+  private ArrayList excelFieldNames;
+
+  private Iterator rowIterator;
+
+  private int totalColumnCount;
+
+  private int lineCount;
+
+  private FileSplit split;
+
+  private ResultSetLoader loader;
+
+  private int recordCount;
+
+  public static class ExcelReaderConfig {
+protected final ExcelFormatPlugin plugin;
+
+protected int headerRow;
+
+protected int lastRow;
+
+protected int firstColumn;
+
+protected int lastColumn;
+
+protected boolean readAllFieldsAsVarChar;
+
+protected boolean evaluateFormulae;
+
+protected TupleMetadata schema;
+
+protected String sheetName;
+
+public ExcelReaderConfig(ExcelFormatPlugin plugin, int headerRow, int lastRow, int firstColumn, int lastColumn, boolean readAllFieldsAsVarChar, boolean evaluateFormulae,
+ //TupleMetadata schema,
+ String sheetName) {
+  this.plugin = plugin;
+  this.headerRow = headerRow;
+  this.lastRow = lastRow;
+  this.firstColumn = firstColumn;
+  this.lastColumn = lastColumn;
+  this.readAllFieldsAsVarChar = readAllFieldsAsVarChar;
+  this.evaluateFormulae = evaluateFormulae;
+  this.sheetName = sheetName;
+
+}
+  }
+
+  public ExcelBatchReader(ExcelReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+verifyConfigOptions();
+split = negotiator.split(

[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952922#comment-16952922
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335540761
 
 

 ##
 File path: contrib/format-excel/README.md
 ##
 @@ -0,0 +1,36 @@
+# Excel Format Plugin
+This plugin enables Drill to read Microsoft Excel files.  This format is best used with Excel files that do not have extensive formatting, however it will work with formatted files, by allowing you to define a region within the file where the data is.  
+
+The plugin will automatically evaluate cells which contain formulae.  
+
+## Plugin Configuration 
+This plugin has several configuration variables which must be set in order to read Excel files effectively.  Since Excel files often contain other elements besides data, you can use the configuration variables to define a region within your spreadsheet in which Drill should extract data.  This is potentially useful if your spreadsheet contains a lot of formatting or other complications. 
+
+* `headerRow`:  Set to -1 if there are no column headers.  
+* `lastRow`:  This defines the last row of your data.  The default is an arbitrary large number.  You only will need to set this if you want Drill to stop reading at a specific location.
 
 Review comment:
   I am not sure that addressing the reader as `you` is a common documentation style...
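
For context on the `headerRow` option quoted above: the reader class in this pull request declares a constant `MISSING_FIELD_NAME_HEADER = "field_"`, which suggests that columns receive generated names when a sheet has no header row. The following is a minimal sketch of that idea only. The class name, the `defaultFieldNames` method, and numbering the names from 1 are assumptions made for illustration, not the plugin's actual code.

```java
import java.util.ArrayList;
import java.util.List;

class HeaderNameSketch {
  // Same literal as the constant declared in ExcelBatchReader.
  private static final String MISSING_FIELD_NAME_HEADER = "field_";

  // Hypothetical helper: builds placeholder column names when headerRow is -1.
  // Returns an empty list otherwise, since real names would come from the sheet.
  static List<String> defaultFieldNames(int headerRow, int columnCount) {
    List<String> names = new ArrayList<>();
    if (headerRow != -1) {
      return names;
    }
    for (int i = 1; i <= columnCount; i++) {
      names.add(MISSING_FIELD_NAME_HEADER + i);
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println(defaultFieldNames(-1, 3)); // prints [field_1, field_2, field_3]
  }
}
```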
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952925#comment-16952925
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335541229
 
 

 ##
 File path: contrib/format-excel/README.md
 ##
 @@ -0,0 +1,36 @@
+# Excel Format Plugin
+This plugin enables Drill to read Microsoft Excel files.  This format is best used with Excel files that do not have extensive formatting, however it will work with formatted files, by allowing you to define a region within the file where the data is.  
+
+The plugin will automatically evaluate cells which contain formulae.  
+
+## Plugin Configuration 
+This plugin has several configuration variables which must be set in order to read Excel files effectively.  Since Excel files often contain other elements besides data, you can use the configuration variables to define a region within your spreadsheet in which Drill should extract data.  This is potentially useful if your spreadsheet contains a lot of formatting or other complications. 
+
+* `headerRow`:  Set to -1 if there are no column headers.  
+* `lastRow`:  This defines the last row of your data.  The default is an arbitrary large number.  You only will need to set this if you want Drill to stop reading at a specific location.
+* `sheetName`:  This is the name of the sheet you want to query.  This will default to the first sheet in the file if left undefined. 
+* `firstColumn`:  If you want to define a region within a spreadsheet, this is the left-most column index.  This is indexed from one.  If set to `0` Drill will start at the left most column.
+* `lastColumn`:  If you want to define a region within a spreadsheet, this is the right-most column index.  This is indexed from one.  If set to `0` Drill will read all available columns.  This is not inclusive, so if you ask for columns 2-5 you will get columns 2,3 and 4. 
+
+## Usage
+You can specify the configuration at runtime via the `table()` method or in the storage plugin configuration.  For instance, if you just want to query an Excel file, you could execute the query as follows:
+
+```
+SELECT  
+FROM dfs.`somefile.xlsx`
+```
+
+If you wanted to query a different sheet other than the default, use the `table()` method as shown below:
 
 Review comment:
   ```suggestion
   If you wanted to query a different sheet other than the default, use the `table()` function as shown below:
   ```
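
The `firstColumn` and `lastColumn` description in the README hunk above (indexed from one, `0` meaning no limit, `lastColumn` not inclusive) is easy to misread, so here is a small sketch of the selection rule as the README words it. The class and method names are hypothetical, and the plugin's real implementation may differ.

```java
import java.util.ArrayList;
import java.util.List;

class ColumnRangeSketch {
  // Returns the 1-based column indexes that would be read, following the
  // README wording: 0 means "no limit" and lastColumn is exclusive.
  static List<Integer> columnsRead(int firstColumn, int lastColumn, int availableColumns) {
    int start = (firstColumn == 0) ? 1 : firstColumn;
    int endExclusive = (lastColumn == 0) ? availableColumns + 1 : lastColumn;
    List<Integer> columns = new ArrayList<>();
    for (int col = start; col < endExclusive && col <= availableColumns; col++) {
      columns.add(col);
    }
    return columns;
  }

  public static void main(String[] args) {
    System.out.println(columnsRead(2, 5, 10)); // prints [2, 3, 4]
    System.out.println(columnsRead(0, 0, 4));  // prints [1, 2, 3, 4]
  }
}
```

The first call reproduces the README's own example: asking for columns 2-5 yields columns 2, 3 and 4.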
 



[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952930#comment-16952930
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335543565
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
 ##
 @@ -0,0 +1,398 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.CellValue;
+import org.apache.poi.ss.usermodel.DateUtil;
+import org.apache.poi.ss.usermodel.FormulaEvaluator;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.xssf.usermodel.XSSFSheet;
+import org.apache.poi.xssf.usermodel.XSSFWorkbook;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.joda.time.Instant;
+
+import java.util.Iterator;
+import java.io.IOException;
+import java.util.ArrayList;
+
+public class ExcelBatchReader implements ManagedReader {
+  private ExcelReaderConfig readerConfig;
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ExcelBatchReader.class);
+
+  private XSSFWorkbook workbook;
+
+  private FSDataInputStream fsStream;
+
+  private static final String SAFE_WILDCARD = "_$";
+
+  private static final String SAFE_SEPARATOR = "_";
+
+  private static final String PARSER_WILDCARD = ".*";
+
+  private static final String MISSING_FIELD_NAME_HEADER = "field_";
+
+  private static final String SAFE_NEWLINE_REPLACEMENT = " ";
+
+  private XSSFSheet sheet;
+
+  private FormulaEvaluator evaluator;
+
+  private ArrayList excelFieldNames;
+
+  private Iterator rowIterator;
+
+  private int totalColumnCount;
+
+  private int lineCount;
+
+  private FileSplit split;
+
+  private ResultSetLoader loader;
+
+  private int recordCount;
+
+  public static class ExcelReaderConfig {
+protected final ExcelFormatPlugin plugin;
+
+protected int headerRow;
+
+protected int lastRow;
+
+protected int firstColumn;
+
+protected int lastColumn;
+
+protected boolean readAllFieldsAsVarChar;
+
+protected boolean evaluateFormulae;
+
+protected TupleMetadata schema;
+
+protected String sheetName;
+
+public ExcelReaderConfig(ExcelFormatPlugin plugin, int headerRow, int lastRow, int firstColumn, int lastColumn, boolean readAllFieldsAsVarChar, boolean evaluateFormulae,
+ //TupleMetadata schema,
+ String sheetName) {
+  this.plugin = plugin;
+  this.headerRow = headerRow;
+  this.lastRow = lastRow;
+  this.firstColumn = firstColumn;
+  this.lastColumn = lastColumn;
+  this.readAllFieldsAsVarChar = readAllFieldsAsVarChar;
+  this.evaluateFormulae = evaluateFormulae;
+  this.sheetName = sheetName;
+
+}
+  }
+
+  public ExcelBatchReader(ExcelReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+verifyConfigOptions();
+split = negotiator.split(

[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952931#comment-16952931
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335543367
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
 ##
 @@ -0,0 +1,398 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.CellValue;
+import org.apache.poi.ss.usermodel.DateUtil;
+import org.apache.poi.ss.usermodel.FormulaEvaluator;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.xssf.usermodel.XSSFSheet;
+import org.apache.poi.xssf.usermodel.XSSFWorkbook;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.joda.time.Instant;
+
+import java.util.Iterator;
+import java.io.IOException;
+import java.util.ArrayList;
+
+public class ExcelBatchReader implements ManagedReader {
+  private ExcelReaderConfig readerConfig;
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ExcelBatchReader.class);
+
+  private XSSFWorkbook workbook;
+
+  private FSDataInputStream fsStream;
+
+  private static final String SAFE_WILDCARD = "_$";
+
+  private static final String SAFE_SEPARATOR = "_";
+
+  private static final String PARSER_WILDCARD = ".*";
+
+  private static final String MISSING_FIELD_NAME_HEADER = "field_";
+
+  private static final String SAFE_NEWLINE_REPLACEMENT = " ";
+
+  private XSSFSheet sheet;
+
+  private FormulaEvaluator evaluator;
+
+  private ArrayList excelFieldNames;
+
+  private Iterator rowIterator;
+
+  private int totalColumnCount;
+
+  private int lineCount;
+
+  private FileSplit split;
+
+  private ResultSetLoader loader;
+
+  private int recordCount;
+
+  public static class ExcelReaderConfig {
+protected final ExcelFormatPlugin plugin;
+
+protected int headerRow;
+
+protected int lastRow;
+
+protected int firstColumn;
+
+protected int lastColumn;
+
+protected boolean readAllFieldsAsVarChar;
+
+protected boolean evaluateFormulae;
+
+protected TupleMetadata schema;
+
+protected String sheetName;
+
+public ExcelReaderConfig(ExcelFormatPlugin plugin, int headerRow, int lastRow, int firstColumn, int lastColumn, boolean readAllFieldsAsVarChar, boolean evaluateFormulae,
+ //TupleMetadata schema,
+ String sheetName) {
+  this.plugin = plugin;
+  this.headerRow = headerRow;
+  this.lastRow = lastRow;
+  this.firstColumn = firstColumn;
+  this.lastColumn = lastColumn;
+  this.readAllFieldsAsVarChar = readAllFieldsAsVarChar;
+  this.evaluateFormulae = evaluateFormulae;
+  this.sheetName = sheetName;
+
+}
+  }
+
+  public ExcelBatchReader(ExcelReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+verifyConfigOptions();
+split = negotiator.split(

[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952927#comment-16952927
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335541785
 
 

 ##
 File path: contrib/format-excel/README.md
 ##
 @@ -0,0 +1,36 @@
+# Excel Format Plugin
+This plugin enables Drill to read Microsoft Excel files.  This format is best used with Excel files that do not have extensive formatting, however it will work with formatted files, by allowing you to define a region within the file where the data is.  
+
+The plugin will automatically evaluate cells which contain formulae.  
+
+## Plugin Configuration 
+This plugin has several configuration variables which must be set in order to read Excel files effectively.  Since Excel files often contain other elements besides data, you can use the configuration variables to define a region within your spreadsheet in which Drill should extract data.  This is potentially useful if your spreadsheet contains a lot of formatting or other complications. 
+
+* `headerRow`:  Set to -1 if there are no column headers.  
+* `lastRow`:  This defines the last row of your data.  The default is an arbitrary large number.  You only will need to set this if you want Drill to stop reading at a specific location.
+* `sheetName`:  This is the name of the sheet you want to query.  This will default to the first sheet in the file if left undefined. 
+* `firstColumn`:  If you want to define a region within a spreadsheet, this is the left-most column index.  This is indexed from one.  If set to `0` Drill will start at the left most column.
+* `lastColumn`:  If you want to define a region within a spreadsheet, this is the right-most column index.  This is indexed from one.  If set to `0` Drill will read all available columns.  This is not inclusive, so if you ask for columns 2-5 you will get columns 2,3 and 4. 
+
+## Usage
+You can specify the configuration at runtime via the `table()` method or in the storage plugin configuration.  For instance, if you just want to query an Excel file, you could execute the query as follows:
+
+```
+SELECT  
+FROM dfs.`somefile.xlsx`
+```
+
+If you wanted to query a different sheet other than the default, use the `table()` method as shown below:
+```
+SELECT  
+FROM table( dfs.`test_data.xlsx` (type => 'excel', sheetName => 'secondSheet'))
+```
+Theoretically, you could join data together from different sheets as follows:
 
 Review comment:
   Why theoretically? Did you try this? If yes and it works, then please remove "theoretically" :) If it does not, better to remove the example.
 



[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952937#comment-16952937
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335546719
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
 ##
 @@ -0,0 +1,398 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.CellValue;
+import org.apache.poi.ss.usermodel.DateUtil;
+import org.apache.poi.ss.usermodel.FormulaEvaluator;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.xssf.usermodel.XSSFSheet;
+import org.apache.poi.xssf.usermodel.XSSFWorkbook;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.joda.time.Instant;
+
+import java.util.Iterator;
+import java.io.IOException;
+import java.util.ArrayList;
+
+public class ExcelBatchReader implements ManagedReader {
+  private ExcelReaderConfig readerConfig;
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ExcelBatchReader.class);
+
+  private XSSFWorkbook workbook;
+
+  private FSDataInputStream fsStream;
+
+  private static final String SAFE_WILDCARD = "_$";
+
+  private static final String SAFE_SEPARATOR = "_";
+
+  private static final String PARSER_WILDCARD = ".*";
+
+  private static final String MISSING_FIELD_NAME_HEADER = "field_";
+
+  private static final String SAFE_NEWLINE_REPLACEMENT = " ";
+
+  private XSSFSheet sheet;
+
+  private FormulaEvaluator evaluator;
+
+  private ArrayList excelFieldNames;
+
+  private Iterator rowIterator;
+
+  private int totalColumnCount;
+
+  private int lineCount;
+
+  private FileSplit split;
+
+  private ResultSetLoader loader;
+
+  private int recordCount;
+
+  public static class ExcelReaderConfig {
+protected final ExcelFormatPlugin plugin;
+
+protected int headerRow;
+
+protected int lastRow;
+
+protected int firstColumn;
+
+protected int lastColumn;
+
+protected boolean readAllFieldsAsVarChar;
+
+protected boolean evaluateFormulae;
+
+protected TupleMetadata schema;
+
+protected String sheetName;
+
+public ExcelReaderConfig(ExcelFormatPlugin plugin, int headerRow, int lastRow, int firstColumn, int lastColumn, boolean readAllFieldsAsVarChar, boolean evaluateFormulae,
+ //TupleMetadata schema,
+ String sheetName) {
+  this.plugin = plugin;
+  this.headerRow = headerRow;
+  this.lastRow = lastRow;
+  this.firstColumn = firstColumn;
+  this.lastColumn = lastColumn;
+  this.readAllFieldsAsVarChar = readAllFieldsAsVarChar;
+  this.evaluateFormulae = evaluateFormulae;
+  this.sheetName = sheetName;
+
+}
+  }
+
+  public ExcelBatchReader(ExcelReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+verifyConfigOptions();
+split = negotiator.split(

[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952933#comment-16952933
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335542533
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
 ##
 @@ -0,0 +1,398 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.CellValue;
+import org.apache.poi.ss.usermodel.DateUtil;
+import org.apache.poi.ss.usermodel.FormulaEvaluator;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.xssf.usermodel.XSSFSheet;
+import org.apache.poi.xssf.usermodel.XSSFWorkbook;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.joda.time.Instant;
+
+import java.util.Iterator;
+import java.io.IOException;
+import java.util.ArrayList;
+
+public class ExcelBatchReader implements ManagedReader {
+  private ExcelReaderConfig readerConfig;
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ExcelBatchReader.class);
+
+  private XSSFWorkbook workbook;
+
+  private FSDataInputStream fsStream;
+
+  private static final String SAFE_WILDCARD = "_$";
+
+  private static final String SAFE_SEPARATOR = "_";
+
+  private static final String PARSER_WILDCARD = ".*";
+
+  private static final String MISSING_FIELD_NAME_HEADER = "field_";
+
+  private static final String SAFE_NEWLINE_REPLACEMENT = " ";
+
+  private XSSFSheet sheet;
+
+  private FormulaEvaluator evaluator;
+
+  private ArrayList excelFieldNames;
+
+  private Iterator rowIterator;
+
+  private int totalColumnCount;
+
+  private int lineCount;
+
+  private FileSplit split;
+
+  private ResultSetLoader loader;
+
+  private int recordCount;
+
+  public static class ExcelReaderConfig {
+protected final ExcelFormatPlugin plugin;
+
+protected int headerRow;
+
+protected int lastRow;
+
+protected int firstColumn;
+
+protected int lastColumn;
+
+protected boolean readAllFieldsAsVarChar;
+
+protected boolean evaluateFormulae;
+
+protected TupleMetadata schema;
+
+protected String sheetName;
+
+public ExcelReaderConfig(ExcelFormatPlugin plugin, int headerRow, int lastRow, int firstColumn, int lastColumn, boolean readAllFieldsAsVarChar, boolean evaluateFormulae,
+ //TupleMetadata schema,
 
 Review comment:
   Remove comment.
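
To make the request concrete, here is a rough sketch of the constructor once the commented-out `TupleMetadata schema` parameter is dropped. `ReaderConfigSketch` is a hypothetical stand-in for `ExcelReaderConfig`: it leaves out the `ExcelFormatPlugin` reference and the unused `schema` field so that it compiles on its own, and it is not the code that was eventually merged.

```java
class ReaderConfigSketch {
  final int headerRow;
  final int lastRow;
  final int firstColumn;
  final int lastColumn;
  final boolean readAllFieldsAsVarChar;
  final boolean evaluateFormulae;
  final String sheetName;

  // Same parameter order as the constructor under review, minus the plugin
  // argument and the commented-out schema argument.
  ReaderConfigSketch(int headerRow, int lastRow, int firstColumn, int lastColumn,
                     boolean readAllFieldsAsVarChar, boolean evaluateFormulae,
                     String sheetName) {
    this.headerRow = headerRow;
    this.lastRow = lastRow;
    this.firstColumn = firstColumn;
    this.lastColumn = lastColumn;
    this.readAllFieldsAsVarChar = readAllFieldsAsVarChar;
    this.evaluateFormulae = evaluateFormulae;
    this.sheetName = sheetName;
  }

  public static void main(String[] args) {
    ReaderConfigSketch config =
        new ReaderConfigSketch(0, 1000, 0, 0, false, true, "secondSheet");
    System.out.println(config.sheetName); // prints secondSheet
  }
}
```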
 


[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952923#comment-16952923
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335539415
 
 

 ##
 File path: contrib/format-excel/README.md
 ##
 @@ -0,0 +1,36 @@
+# Excel Format Plugin
+This plugin enables Drill to read Microsoft Excel files.  This format is best used with Excel files that do not have extensive formatting, however it will work with formatted files, by allowing you to define a region within the file where the data is.  
 
 Review comment:
   Please remove the additional spaces after periods... Not sure what formatting was used, but they appear everywhere in the .md file.
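
One possible way to apply this across the file is a small regular-expression pass. The sketch below is only an illustration with a hypothetical class name: it collapses two or more spaces after a period into one and strips trailing whitespace from a line. It has not been run against the actual README, and it would also touch text inside code fences, so the result should be reviewed by hand.

```java
class SentenceSpacingSketch {
  // Collapses runs of two or more spaces that follow a period into a single
  // space, then removes any trailing whitespace from the line.
  static String normalize(String line) {
    return line.replaceAll("\\. {2,}", ". ").replaceAll("\\s+$", "");
  }

  public static void main(String[] args) {
    String before = "This plugin enables Drill to read Microsoft Excel files.  This format is best  ";
    // prints: This plugin enables Drill to read Microsoft Excel files. This format is best
    System.out.println(normalize(before));
  }
}
```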
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952934#comment-16952934
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335541996
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
 ##
 @@ -0,0 +1,398 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.CellValue;
+import org.apache.poi.ss.usermodel.DateUtil;
+import org.apache.poi.ss.usermodel.FormulaEvaluator;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.xssf.usermodel.XSSFSheet;
+import org.apache.poi.xssf.usermodel.XSSFWorkbook;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.joda.time.Instant;
+
+import java.util.Iterator;
+import java.io.IOException;
+import java.util.ArrayList;
+
+public class ExcelBatchReader implements ManagedReader {
+  private ExcelReaderConfig readerConfig;
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ExcelBatchReader.class);
 
 Review comment:
   Imports...
 



> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952926#comment-16952926
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335540172
 
 

 ##
 File path: contrib/format-excel/README.md
 ##
 @@ -0,0 +1,36 @@
+# Excel Format Plugin
+This plugin enables Drill to read Microsoft Excel files.  This format is best used with Excel files that do not have extensive formatting, however it will work with formatted files, by allowing you to define a region within the file where the data is.  
+
+The plugin will automatically evaluate cells which contain formulae.  
+
+## Plugin Configuration 
+This plugin has several configuration variables which must be set in order to read Excel files effectively.  Since Excel files often contain other elements besides data, you can use the configuration variables to define a region within your spreadsheet in which Drill should extract data.  This is potentially useful if your spreadsheet contains a lot of formatting or other complications. 
+
+* `headerRow`:  Set to -1 if there are no column headers.  
+* `lastRow`:  This defines the last row of your data.  The default is an arbitrary large number.  You only will need to set this if you want Drill to stop reading at a specific location.
 
 Review comment:
   Can you please specify the exact number?
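
   One way to make the default citable in the README is to give it a name in code. The sketch below is an assumption-laden illustration, not the plugin's implementation: `ExcelRegionSketch` and its members are hypothetical, the placeholder value is NOT the real default (this thread never states it), and treating `lastRow` as inclusive is a guess based on the README wording "the last row of your data". Only the `-1` sentinel for `headerRow` comes from the README text itself.

```java
// Hypothetical names throughout; only the -1 sentinel is taken from the README.
final class ExcelRegionSketch {
  private ExcelRegionSketch() {}

  /** README: headerRow is set to -1 when the sheet has no column headers. */
  static final int NO_HEADER_ROW = -1;

  /** Placeholder only. The actual default for lastRow is not given in this thread. */
  static final int PLACEHOLDER_DEFAULT_LAST_ROW = Integer.MAX_VALUE;

  /** True when the configuration names a header row. */
  static boolean hasHeaderRow(int headerRow) {
    return headerRow != NO_HEADER_ROW;
  }

  /** True while rowIndex is at or before lastRow (inclusive bound is an assumption). */
  static boolean withinRegion(int rowIndex, int lastRow) {
    return rowIndex <= lastRow;
  }
}
```

   With a named constant, the README could state the number once and the reader code could reference the same value.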
 



> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952932#comment-16952932
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335543067
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
 ##
 @@ -0,0 +1,398 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.CellValue;
+import org.apache.poi.ss.usermodel.DateUtil;
+import org.apache.poi.ss.usermodel.FormulaEvaluator;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.xssf.usermodel.XSSFSheet;
+import org.apache.poi.xssf.usermodel.XSSFWorkbook;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.joda.time.Instant;
+
+import java.util.Iterator;
+import java.io.IOException;
+import java.util.ArrayList;
+
+public class ExcelBatchReader implements ManagedReader {
+  private ExcelReaderConfig readerConfig;
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ExcelBatchReader.class);
+
+  private XSSFWorkbook workbook;
+
+  private FSDataInputStream fsStream;
+
+  private static final String SAFE_WILDCARD = "_$";
+
+  private static final String SAFE_SEPARATOR = "_";
+
+  private static final String PARSER_WILDCARD = ".*";
+
+  private static final String MISSING_FIELD_NAME_HEADER = "field_";
+
+  private static final String SAFE_NEWLINE_REPLACEMENT = " ";
+
+  private XSSFSheet sheet;
+
+  private FormulaEvaluator evaluator;
+
+  private ArrayList excelFieldNames;
+
+  private Iterator rowIterator;
+
+  private int totalColumnCount;
+
+  private int lineCount;
+
+  private FileSplit split;
+
+  private ResultSetLoader loader;
+
+  private int recordCount;
+
+  public static class ExcelReaderConfig {
+protected final ExcelFormatPlugin plugin;
+
+protected int headerRow;
+
+protected int lastRow;
+
+protected int firstColumn;
+
+protected int lastColumn;
+
+protected boolean readAllFieldsAsVarChar;
+
+protected boolean evaluateFormulae;
+
+protected TupleMetadata schema;
+
+protected String sheetName;
+
+public ExcelReaderConfig(ExcelFormatPlugin plugin, int headerRow, int lastRow, int firstColumn, int lastColumn, boolean readAllFieldsAsVarChar, boolean evaluateFormulae,
+ //TupleMetadata schema,
+ String sheetName) {
+  this.plugin = plugin;
+  this.headerRow = headerRow;
+  this.lastRow = lastRow;
+  this.firstColumn = firstColumn;
+  this.lastColumn = lastColumn;
+  this.readAllFieldsAsVarChar = readAllFieldsAsVarChar;
+  this.evaluateFormulae = evaluateFormulae;
+  this.sheetName = sheetName;
+
+}
+  }
+
+  public ExcelBatchReader(ExcelReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+verifyConfigOptions();
+split = negotiator.split(

[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952936#comment-16952936
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335544085
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
 ##
 @@ -0,0 +1,398 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.CellValue;
+import org.apache.poi.ss.usermodel.DateUtil;
+import org.apache.poi.ss.usermodel.FormulaEvaluator;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.xssf.usermodel.XSSFSheet;
+import org.apache.poi.xssf.usermodel.XSSFWorkbook;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.joda.time.Instant;
+
+import java.util.Iterator;
+import java.io.IOException;
+import java.util.ArrayList;
+
+public class ExcelBatchReader implements ManagedReader {
+  private ExcelReaderConfig readerConfig;
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ExcelBatchReader.class);
+
+  private XSSFWorkbook workbook;
+
+  private FSDataInputStream fsStream;
+
+  private static final String SAFE_WILDCARD = "_$";
+
+  private static final String SAFE_SEPARATOR = "_";
+
+  private static final String PARSER_WILDCARD = ".*";
+
+  private static final String MISSING_FIELD_NAME_HEADER = "field_";
+
+  private static final String SAFE_NEWLINE_REPLACEMENT = " ";
+
+  private XSSFSheet sheet;
+
+  private FormulaEvaluator evaluator;
+
+  private ArrayList excelFieldNames;
+
+  private Iterator rowIterator;
+
+  private int totalColumnCount;
+
+  private int lineCount;
+
+  private FileSplit split;
+
+  private ResultSetLoader loader;
+
+  private int recordCount;
+
+  public static class ExcelReaderConfig {
+protected final ExcelFormatPlugin plugin;
+
+protected int headerRow;
+
+protected int lastRow;
+
+protected int firstColumn;
+
+protected int lastColumn;
+
+protected boolean readAllFieldsAsVarChar;
+
+protected boolean evaluateFormulae;
+
+protected TupleMetadata schema;
+
+protected String sheetName;
+
+public ExcelReaderConfig(ExcelFormatPlugin plugin, int headerRow, int lastRow, int firstColumn, int lastColumn, boolean readAllFieldsAsVarChar, boolean evaluateFormulae,
+ //TupleMetadata schema,
+ String sheetName) {
+  this.plugin = plugin;
+  this.headerRow = headerRow;
+  this.lastRow = lastRow;
+  this.firstColumn = firstColumn;
+  this.lastColumn = lastColumn;
+  this.readAllFieldsAsVarChar = readAllFieldsAsVarChar;
+  this.evaluateFormulae = evaluateFormulae;
+  this.sheetName = sheetName;
+
+}
+  }
+
+  public ExcelBatchReader(ExcelReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+verifyConfigOptions();
+split = negotiator.split(

[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952928#comment-16952928
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335544654
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
 ##
 @@ -0,0 +1,398 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.CellValue;
+import org.apache.poi.ss.usermodel.DateUtil;
+import org.apache.poi.ss.usermodel.FormulaEvaluator;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.xssf.usermodel.XSSFSheet;
+import org.apache.poi.xssf.usermodel.XSSFWorkbook;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.joda.time.Instant;
+
+import java.util.Iterator;
+import java.io.IOException;
+import java.util.ArrayList;
+
+public class ExcelBatchReader implements ManagedReader {
+  private ExcelReaderConfig readerConfig;
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ExcelBatchReader.class);
+
+  private XSSFWorkbook workbook;
+
+  private FSDataInputStream fsStream;
+
+  private static final String SAFE_WILDCARD = "_$";
+
+  private static final String SAFE_SEPARATOR = "_";
+
+  private static final String PARSER_WILDCARD = ".*";
+
+  private static final String MISSING_FIELD_NAME_HEADER = "field_";
+
+  private static final String SAFE_NEWLINE_REPLACEMENT = " ";
+
+  private XSSFSheet sheet;
+
+  private FormulaEvaluator evaluator;
+
+  private ArrayList excelFieldNames;
+
+  private Iterator rowIterator;
+
+  private int totalColumnCount;
+
+  private int lineCount;
+
+  private FileSplit split;
+
+  private ResultSetLoader loader;
+
+  private int recordCount;
+
+  public static class ExcelReaderConfig {
+protected final ExcelFormatPlugin plugin;
+
+protected int headerRow;
+
+protected int lastRow;
+
+protected int firstColumn;
+
+protected int lastColumn;
+
+protected boolean readAllFieldsAsVarChar;
+
+protected boolean evaluateFormulae;
+
+protected TupleMetadata schema;
+
+protected String sheetName;
+
+public ExcelReaderConfig(ExcelFormatPlugin plugin, int headerRow, int lastRow, int firstColumn, int lastColumn, boolean readAllFieldsAsVarChar, boolean evaluateFormulae,
+ //TupleMetadata schema,
+ String sheetName) {
+  this.plugin = plugin;
+  this.headerRow = headerRow;
+  this.lastRow = lastRow;
+  this.firstColumn = firstColumn;
+  this.lastColumn = lastColumn;
+  this.readAllFieldsAsVarChar = readAllFieldsAsVarChar;
+  this.evaluateFormulae = evaluateFormulae;
+  this.sheetName = sheetName;
+
+}
+  }
+
+  public ExcelBatchReader(ExcelReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+verifyConfigOptions();
+split = negotiator.split(

[jira] [Updated] (DRILL-7406) Update Calcite to 1.21.0

2019-10-16 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7406:

Reviewer: Vova Vysotskyi

> Update Calcite to 1.21.0
> 
>
> Key: DRILL-7406
> URL: https://issues.apache.org/jira/browse/DRILL-7406
> Project: Apache Drill
>  Issue Type: Task
>  Components: Query Planning & Optimization, SQL Parser
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>






[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952772#comment-16952772
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

cgivre commented on pull request #1749: DRILL-7177: Format Plugin for Excel 
Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335443550
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
 ##
 @@ -0,0 +1,398 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.CellValue;
+import org.apache.poi.ss.usermodel.DateUtil;
+import org.apache.poi.ss.usermodel.FormulaEvaluator;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.xssf.usermodel.XSSFSheet;
+import org.apache.poi.xssf.usermodel.XSSFWorkbook;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.joda.time.Instant;
+
+import java.util.Iterator;
+import java.io.IOException;
+import java.util.ArrayList;
+
+public class ExcelBatchReader implements ManagedReader {
+  private ExcelReaderConfig readerConfig;
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ExcelBatchReader.class);
+
+  private XSSFWorkbook workbook;
+
+  private FSDataInputStream fsStream;
+
+  private static final String SAFE_WILDCARD = "_$";
+
+  private static final String SAFE_SEPARATOR = "_";
+
+  private static final String PARSER_WILDCARD = ".*";
+
+  private static final String MISSING_FIELD_NAME_HEADER = "field_";
+
+  private static final String SAFE_NEWLINE_REPLACEMENT = " ";
+
+  private XSSFSheet sheet;
+
+  private FormulaEvaluator evaluator;
+
+  private ArrayList excelFieldNames;
+
+  private Iterator rowIterator;
+
+  private int totalColumnCount;
+
+  private int lineCount;
+
+  private FileSplit split;
+
+  private ResultSetLoader loader;
+
+  private int recordCount;
+
+  public static class ExcelReaderConfig {
+protected final ExcelFormatPlugin plugin;
+
+protected int headerRow;
+
+protected int lastRow;
+
+protected int firstColumn;
+
+protected int lastColumn;
+
+protected boolean readAllFieldsAsVarChar;
+
+protected boolean evaluateFormulae;
+
+protected TupleMetadata schema;
+
+protected String sheetName;
+
+public ExcelReaderConfig(ExcelFormatPlugin plugin, int headerRow, int lastRow, int firstColumn, int lastColumn, boolean readAllFieldsAsVarChar, boolean evaluateFormulae,
+ //TupleMetadata schema,
+ String sheetName) {
+  this.plugin = plugin;
+  this.headerRow = headerRow;
+  this.lastRow = lastRow;
+  this.firstColumn = firstColumn;
+  this.lastColumn = lastColumn;
+  this.readAllFieldsAsVarChar = readAllFieldsAsVarChar;
+  this.evaluateFormulae = evaluateFormulae;
+  this.sheetName = sheetName;
+
+}
+  }
+
+  public ExcelBatchReader(ExcelReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+verifyConfigOptions();
+split = negotiator.split();
+op

[jira] [Updated] (DRILL-7401) Sqlline 1.9 upgrade

2019-10-16 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7401:

Description: 
Upgrade to SqlLine 1.9 once it is released 
(https://github.com/julianhyde/sqlline/issues/350).

TODO:
Add SqlLine properties: 
connectInteractionMode: useNPTogetherOrEmpty
showLineNumbers: true

  was:
Upgrade to SqlLine 1.9 once it is released 
(https://github.com/julianhyde/sqlline/issues/350).

TODO:
Add SqlLine property: connectInteractionMode: useNPTogetherOrEmpty


> Sqlline 1.9 upgrade
> ---
>
> Key: DRILL-7401
> URL: https://issues.apache.org/jira/browse/DRILL-7401
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> Upgrade to SqlLine 1.9 once it is released 
> (https://github.com/julianhyde/sqlline/issues/350).
> TODO:
> Add SqlLine properties: 
> connectInteractionMode: useNPTogetherOrEmpty
> showLineNumbers: true





[jira] [Updated] (DRILL-7401) Sqlline 1.9 upgrade

2019-10-16 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7401:

Reviewer: Vova Vysotskyi

> Sqlline 1.9 upgrade
> ---
>
> Key: DRILL-7401
> URL: https://issues.apache.org/jira/browse/DRILL-7401
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> Upgrade to SqlLine 1.9 once it is released 
> (https://github.com/julianhyde/sqlline/issues/350).
> TODO:
> Add SqlLine property: connectInteractionMode: useNPTogetherOrEmpty





[jira] [Updated] (DRILL-7406) Update Calcite to 1.21.0

2019-10-16 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7406:

Affects Version/s: 1.16.0

> Update Calcite to 1.21.0
> 
>
> Key: DRILL-7406
> URL: https://issues.apache.org/jira/browse/DRILL-7406
> Project: Apache Drill
>  Issue Type: Task
>  Components: Query Planning & Optimization, SQL Parser
>Affects Versions: 1.16.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>






[jira] [Updated] (DRILL-7406) Update Calcite to 1.21.0

2019-10-16 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7406:

Affects Version/s: (was: 1.16.0)

> Update Calcite to 1.21.0
> 
>
> Key: DRILL-7406
> URL: https://issues.apache.org/jira/browse/DRILL-7406
> Project: Apache Drill
>  Issue Type: Task
>  Components: Query Planning & Optimization, SQL Parser
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>






[jira] [Commented] (DRILL-7406) Update Calcite to 1.21.0

2019-10-16 Thread Vova Vysotskyi (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952714#comment-16952714
 ] 

Vova Vysotskyi commented on DRILL-7406:
---

Please also cherry-pick the fix for CALCITE-1178; together with the changes 
made in CALCITE-2302, it will help remove the Drill-specific commit

 {{Drill-specific changes: added a general class for Date/Time/Timestamp 
literals (TimestampString, DateString, TimeString) to avoid class cast 
exceptions.}}

Also, please track CALCITE-2018; it may be merged into master soon with some 
additional changes, so it would be good to have those commits instead of the 
Drill-specific ones.

Finally, please cherry-pick the fix for CALCITE-3390 once it is merged, since 
it is required for DRILL-7391.


> Update Calcite to 1.21.0
> 
>
> Key: DRILL-7406
> URL: https://issues.apache.org/jira/browse/DRILL-7406
> Project: Apache Drill
>  Issue Type: Task
>  Components: Query Planning & Optimization, SQL Parser
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>






[jira] [Created] (DRILL-7406) Update Calcite to 1.21.0

2019-10-16 Thread Igor Guzenko (Jira)
Igor Guzenko created DRILL-7406:
---

 Summary: Update Calcite to 1.21.0
 Key: DRILL-7406
 URL: https://issues.apache.org/jira/browse/DRILL-7406
 Project: Apache Drill
  Issue Type: Task
  Components: Query Planning & Optimization, SQL Parser
Reporter: Igor Guzenko
Assignee: Igor Guzenko





