[jira] [Commented] (DRILL-7424) Project operator fails to set the container row count

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961588#comment-16961588
 ] 

ASF GitHub Bot commented on DRILL-7424:
---

paul-rogers commented on issue #1882: DRILL-7424: Project operator fails to set 
the container row count
URL: https://github.com/apache/drill/pull/1882#issuecomment-547215825
 
 
   Addressed the comment and squashed the minor change.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Project operator fails to set the container row count
> -
>
> Key: DRILL-7424
> URL: https://issues.apache.org/jira/browse/DRILL-7424
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>
> Enabled the "batch validator" for the Project operator. Ran tests. Exceptions 
> occurred because, in some paths, the Project operator fails to set the 
> container row count.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7424) Project operator fails to set the container row count

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961587#comment-16961587
 ] 

ASF GitHub Bot commented on DRILL-7424:
---

paul-rogers commented on pull request #1882: DRILL-7424: Project operator fails 
to set the container row count
URL: https://github.com/apache/drill/pull/1882#discussion_r339860025
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/BatchValidator.java
 ##
 @@ -54,7 +53,7 @@
  */
 
 public class BatchValidator {
-  private static final Logger logger = LoggerFactory.getLogger(BatchValidator.class);
+  private static Logger logger = LoggerFactory.getLogger(BatchValidator.class);
 
 Review comment:
   Thanks for catching that error.
 





[jira] [Created] (DRILL-7427) Efficiency Improvements for ESRI Shapefile Format Plugin

2019-10-28 Thread Charles Givre (Jira)
Charles Givre created DRILL-7427:


 Summary: Efficiency Improvements for ESRI Shapefile Format Plugin
 Key: DRILL-7427
 URL: https://issues.apache.org/jira/browse/DRILL-7427
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.18.0
Reporter: Charles Givre


The ESRI Shapefile format plugin has a few inefficiencies in writing the 
columns.  The schema for shapefiles is only partially known, and is not in a 
definite order. [~Paul.Rogers] suggested that there are ways of increasing the 
efficiency of the ESRI Shapefile reader by optimizing how Drill reads the 
fields.  





[jira] [Commented] (DRILL-7426) Json support lists of different types

2019-10-28 Thread Paul Rogers (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961336#comment-16961336
 ] 

Paul Rogers commented on DRILL-7426:


[~cgivre], I should have seen that one coming...

But, seriously, a provided schema turns out to be the best way to predict the 
future.

> Json support lists of different types
> -
>
> Key: DRILL-7426
> URL: https://issues.apache.org/jira/browse/DRILL-7426
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.16.0
>Reporter: benj
>Priority: Trivial
>
> With a file.json like
> {code:json}
> {
> "name": "toto",
> "info": [["LOAD", []]],
> "response": 1
> }
> {code}
> A simple SELECT gives an error
> {code:sql}
> apache drill> SELECT * FROM dfs.test.`file.json`;
> Error: UNSUPPORTED_OPERATION ERROR: In a list of type VARCHAR, encountered a 
> value of type LIST. Drill does not support lists of different types.
> {code}
> But there is an option, _exec.enable_union_type_, that allows this query to run
> {code:sql}
> apache drill> ALTER SESSION SET `exec.enable_union_type` = true;
> apache drill> SELECT * FROM dfs.test.`file.json`;
> +------+---------------+----------+
> | name | info          | response |
> +------+---------------+----------+
> | toto | [["LOAD",[]]] | 1        |
> +------+---------------+----------+
> 1 row selected (0.283 seconds)
> {code}
> This option is not easy to discover, so it would be useful for the error 
> message to mention that it can be set.
> {noformat}
> Error: UNSUPPORTED_OPERATION ERROR: In a list of type VARCHAR, encountered a 
> value of type LIST. Drill does not support lists of different types.  SET 
> the option 'exec.enable_union_type' to true and try again;
> {noformat}
> Other errors already behave this way, for example:
> {noformat}
> ...
> Error: UNSUPPORTED_OPERATION ERROR: This query cannot be planned possibly due 
> to either a cartesian join or an inequality join. 
> If a cartesian or inequality join is used intentionally, set the option 
> 'planner.enable_nljoin_for_scalar_only' to false and try again.
> {noformat}





[jira] [Commented] (DRILL-7426) Json support lists of different types

2019-10-28 Thread Charles Givre (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961326#comment-16961326
 ] 

Charles Givre commented on DRILL-7426:
--

Dammit [~paul-rogers], can't you just figure out how to predict the future 
already? ;)



[jira] [Commented] (DRILL-7426) Json support lists of different types

2019-10-28 Thread Paul Rogers (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961322#comment-16961322
 ] 

Paul Rogers commented on DRILL-7426:


[~cgivre], the query in question used the wildcard, which asks to read all 
columns. In general, the reader cannot predict the future: it cannot tell that 
`info` will contain mixed data.

However, Drill should work if the query were `SELECT name, response FROM ...`. 
If not, then that is a bug that is fixable.

The issue is that the user seems to need the data. One workaround is to rewrite 
the JSON so that the array is represented as an object:

{noformat}
{
  "name": "toto",
  "info": { "command": "LOAD", "values": [] },
  "response": 1
}
{noformat}

But, here we run into the empty-array issue: we don't know the type of the 
`values` array...

In general, JSON can represent a wider set of data structures than relational 
tuples. How much of that variety Drill should handle has always been an open 
question. I think most users end up running an ETL to convert the data 
into a relational format (then store the data in Parquet for better 
performance). So, one could debate whether it is worth adding more complexity 
to Drill.



[jira] [Commented] (DRILL-7426) Json support lists of different types

2019-10-28 Thread Charles Givre (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961297#comment-16961297
 ] 

Charles Givre commented on DRILL-7426:
--

[~paul-rogers], I wonder if it might be worthwhile for Drill to skip unreadable 
data, or at least do something other than dump a stack trace, when it 
encounters it. For instance, what if, in this case, it read the 
entire field as a string?



[jira] [Commented] (DRILL-7418) MetadataDirectGroupScan improvements

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961285#comment-16961285
 ] 

ASF GitHub Bot commented on DRILL-7418:
---

vvysotskyi commented on pull request #1883: DRILL-7418: MetadataDirectGroupScan 
improvements
URL: https://github.com/apache/drill/pull/1883#discussion_r339690335
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/planner/logical/TestConvertCountToDirectScan.java
 ##
 @@ -17,364 +17,426 @@
  */
 package org.apache.drill.exec.planner.logical;
 
-import org.apache.drill.PlanTestBase;
 import org.apache.drill.categories.PlannerTest;
+import org.apache.drill.categories.UnlikelyTest;
 import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterFixtureBuilder;
+import org.apache.drill.test.ClusterTest;
 import org.junit.BeforeClass;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
 
 import java.nio.file.Paths;
 
-@Category(PlannerTest.class)
-public class TestConvertCountToDirectScan extends PlanTestBase {
+import static org.junit.Assert.assertEquals;
+
+@Category({PlannerTest.class, UnlikelyTest.class})
+public class TestConvertCountToDirectScan extends ClusterTest {
 
   @BeforeClass
-  public static void setupTestFiles() {
+  public static void setup() throws Exception {
+ClusterFixtureBuilder builder = ClusterFixture.builder(dirTestWatcher);
 dirTestWatcher.copyResourceToRoot(Paths.get("directcount.parquet"));
+startCluster(builder);
   }
 
   @Test
-  public void ensureCaseDoesNotConvertToDirectScan() throws Exception {
-testPlanMatchingPatterns(
-"select count(case when n_name = 'ALGERIA' and n_regionkey = 2 then n_nationkey else null end) as cnt\n" +
-"from dfs.`directcount.parquet`", new String[]{"CASE"});
+  public void testCaseDoesNotConvertToDirectScan() throws Exception {
+queryBuilder()
+  .sql("select " +
+  "count(case when n_name = 'ALGERIA' and n_regionkey = 2 then n_nationkey else null end) as cnt " +
+  "from dfs.`directcount.parquet`")
+  .planMatcher()
+  .include("CASE")
+  .match();
   }
 
   @Test
-  public void ensureConvertSimpleCountToDirectScan() throws Exception {
+  public void testConvertSimpleCountToDirectScan() throws Exception {
 String sql = "select count(*) as cnt from cp.`tpch/nation.parquet`";
-testPlanMatchingPatterns(sql, new String[]{"DynamicPojoRecordReader"});
+
+queryBuilder()
+  .sql(sql)
+  .planMatcher()
+  .include("DynamicPojoRecordReader")
+  .match();
 
 testBuilder()
-.sqlQuery(sql)
-.unOrdered()
-.baselineColumns("cnt")
-.baselineValues(25L)
-.go();
+  .sqlQuery(sql)
+  .unOrdered()
+  .baselineColumns("cnt")
+  .baselineValues(25L)
+  .go();
   }
 
   @Test
-  public void ensureConvertSimpleCountConstToDirectScan() throws Exception {
+  public void testConvertSimpleCountConstToDirectScan() throws Exception {
 String sql = "select count(100) as cnt from cp.`tpch/nation.parquet`";
-testPlanMatchingPatterns(sql, new String[]{"DynamicPojoRecordReader"});
+
+queryBuilder()
+  .sql(sql)
+  .planMatcher()
+  .include("DynamicPojoRecordReader")
+  .match();
 
 testBuilder()
-.sqlQuery(sql)
-.unOrdered()
-.baselineColumns("cnt")
-.baselineValues(25L)
-.go();
+  .sqlQuery(sql)
+  .unOrdered()
+  .baselineColumns("cnt")
+  .baselineValues(25L)
+  .go();
   }
 
   @Test
-  public void ensureConvertSimpleCountConstExprToDirectScan() throws Exception {
+  public void testConvertSimpleCountConstExprToDirectScan() throws Exception {
 String sql = "select count(1 + 2) as cnt from cp.`tpch/nation.parquet`";
-testPlanMatchingPatterns(sql, new String[]{"DynamicPojoRecordReader"});
+
+queryBuilder()
+  .sql(sql)
+  .planMatcher()
+  .include("DynamicPojoRecordReader")
+  .match();
 
 testBuilder()
-.sqlQuery(sql)
-.unOrdered()
-.baselineColumns("cnt")
-.baselineValues(25L)
-.go();
+  .sqlQuery(sql)
+  .unOrdered()
+  .baselineColumns("cnt")
+  .baselineValues(25L)
+  .go();
   }
 
   @Test
-  public void ensureDoesNotConvertForDirectoryColumns() throws Exception {
+  public void testDoesNotConvertForDirectoryColumns() throws Exception {
 String sql = "select count(dir0) as cnt from cp.`tpch/nation.parquet`";
-testPlanMatchingPatterns(sql, new String[]{"ParquetGroupScan"});
+
+queryBuilder()
+  .sql(sql)
+  .planMatcher()
+  .include("ParquetGroupScan")
+  .match();
 
 testBuilder()
-.sqlQuery(sql)
-.unOrdered()
-.baselineColumns("cnt")
-.baselineValues(0L)
-.go();
+  .sqlQuery(sql)
+  .unOrdered()

[jira] [Commented] (DRILL-7418) MetadataDirectGroupScan improvements

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961284#comment-16961284
 ] 

ASF GitHub Bot commented on DRILL-7418:
---

vvysotskyi commented on pull request #1883: DRILL-7418: MetadataDirectGroupScan 
improvements
URL: https://github.com/apache/drill/pull/1883#discussion_r339697900
 
 

 ##
 File path: exec/java-exec/src/test/java/org/apache/drill/test/QueryBuilder.java
 ##
 @@ -764,4 +765,45 @@ protected String queryPlan(String columnName) throws Exception {
 
 return builder.toString();
   }
+
+  /**
+   * Collects expected and non-expected query patterns.
+   * Upon {@link #match()} method call, matches given patterns to the query plan.
+   */
+  public static class PlanMatcher {
+
+private static final String EXPECTED_NOT_FOUND = "Did not find expected pattern";
+private static final String UNEXPECTED_FOUND = "Found unwanted pattern";
+
+private final String plan;
+private final List<String> included = new ArrayList<>();
+private final List<String> excluded = new ArrayList<>();
+
+public PlanMatcher(String plan) {
+  this.plan = plan;
+}
+
+public PlanMatcher include(String... patterns) {
+  included.addAll(Arrays.asList(patterns));
+  return this;
+}
+
+public PlanMatcher exclude(String... patterns) {
+  excluded.addAll(Arrays.asList(patterns));
+  return this;
+}
+
+public void match() {
 
 Review comment:
   Could you please add JavaDoc for this method? Because the names are similar, 
it is easy to confuse it with `Matcher.matches()`, but here we do not require the 
entire string to match the pattern, only parts of it.
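
To illustrate the distinction this review comment draws, here is a minimal, self-contained sketch. It is not the `PlanMatcher` code from the pull request: the class `PlanMatchSketch`, its two methods, and the sample plan text are hypothetical. It assumes that "matching only parts of the string" corresponds to `java.util.regex.Matcher.find()`, while `Matcher.matches()` succeeds only when the whole input matches the pattern.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not the PlanMatcher class from the PR.
class PlanMatchSketch {

  // Partial match: true if the pattern occurs anywhere in the plan text.
  static boolean containsPattern(String plan, String regex) {
    return Pattern.compile(regex).matcher(plan).find();
  }

  // Whole-string match: true only if the entire plan text matches the pattern.
  static boolean matchesEntirely(String plan, String regex) {
    return Pattern.compile(regex).matcher(plan).matches();
  }

  public static void main(String[] args) {
    // Made-up, abbreviated plan text used only for this example.
    String plan = "00-02 DirectScan(groupscan=[DynamicPojoRecordReader{records = [[25]]}])";
    System.out.println("find():    " + containsPattern(plan, "DynamicPojoRecordReader"));
    System.out.println("matches(): " + matchesEntirely(plan, "DynamicPojoRecordReader"));
  }
}
```

With this input, the partial check succeeds and the whole-string check does not, which is the kind of confusion the requested JavaDoc is meant to prevent.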
 



> MetadataDirectGroupScan improvements
> 
>
> Key: DRILL-7418
> URL: https://issues.apache.org/jira/browse/DRILL-7418
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.17.0
>
>
> When a count is converted to a direct scan (that is, when statistics or table 
> metadata are available and there is no need to perform the count operation), 
> {{MetadataDirectGroupScan}} is used. Proposed {{MetadataDirectGroupScan}} 
> enhancements:
> 1. Show the table selection root instead of listing all table files. If a table 
> has lots of files, the query plan gets polluted with the enumeration of every 
> file. Since the files are not used for the calculation (only metadata is), they 
> are not relevant and can be excluded from the plan.
> Before:
> {noformat}
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02        DirectScan(groupscan=[files = 
> [/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_0.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_5.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_4.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_9.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_3.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_6.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_7.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_10.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_2.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_1.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_8.parquet], 
> numFiles = 11, usedMetadataSummaryFile = false, 
> DynamicPojoRecordReader{records = [[1560060, 2880404, 2880404, 0]]}])
> {noformat}
> After:
> {noformat}
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02        DirectScan(groupscan=[selectionRoot = 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all, numFiles = 11, 
> usedMetadataSummaryFile = false, DynamicPojoRecordReader{records = [[1560060, 
> 2880404, 2880404, 0]]}])
> {noformat}
> For Hive tables that were scanned directly, the selection root is not available 
> and thus will be omitted.
> 2. Submission of a physical plan that contains {{MetadataDirectGroupScan}} 
> fails with deserialization errors; proper serialization / deserialization 
> should be implemented.





[jira] [Updated] (DRILL-7347) Upgrade Apache Iceberg to released version

2019-10-28 Thread Vova Vysotskyi (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Vysotskyi updated DRILL-7347:
--
Labels: ready-to-commit  (was: )

> Upgrade Apache Iceberg to released version
> --
>
> Key: DRILL-7347
> URL: https://issues.apache.org/jira/browse/DRILL-7347
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> Currently Drill uses an Apache Iceberg build from a specific commit, obtained 
> via JitPack, since there is no officially released version. Once the first 
> Iceberg version is released, we need to use the officially released version 
> instead of the commit.
> The first official Iceberg version is 0.7.0-incubating.





[jira] [Commented] (DRILL-7426) Json support lists of different types

2019-10-28 Thread Paul Rogers (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961257#comment-16961257
 ] 

Paul Rogers commented on DRILL-7426:


As it turns out, this is a known limitation of Drill. Drill is a relational 
engine, designed to serve relational clients such as JDBC and ODBC. Although 
Drill has a Union data type, that type remains experimental and not fully 
supported.

At present, it seems that the Union type can be passed through the scan 
operator to a SqlLine client, where it is converted to a string for display, as 
shown in your example. However, it is not supported by most other operators, 
resulting in the failure you reported.

The fundamental problem is that it is not clear how the Union type should work 
with clients (JDBC, ODBC) that require a traditional relational schema. Drill 
does not support extended SQL syntax (such as SQL++), just traditional 
relational SQL.

We have seen cases in which JSON authors use arrays as a compact representation 
of a tuple:

{noformat}
[ 10, "fred", "flintstone", "male", 12.34 ]
{noformat}

Is this the case with your example that contains, it seems, both a string and 
an array?

At present, Drill has no way to map such a tuple into a relational structure. 
One could imagine converting the array into, say, a Map with field names 
defined somehow.

Here, "all text mode" will not help as that mode can't handle array/string 
conflicts, only string/number conflicts.
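
The idea of converting a positional array into a Map with named fields could be sketched roughly as follows. This only illustrates the concept and is not Drill code: the class `TupleArraySketch`, the method `toNamedMap`, and the field names are all hypothetical, and the field names are assumed to be supplied by the user (for example, through a provided schema).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; nothing here is Drill API.
class TupleArraySketch {

  // Pairs each positional value with the field name at the same index.
  // The field names are an assumption: they must come from the user.
  static Map<String, Object> toNamedMap(List<String> fieldNames, List<?> values) {
    if (fieldNames.size() != values.size()) {
      throw new IllegalArgumentException(
          "Expected " + fieldNames.size() + " values but found " + values.size());
    }
    Map<String, Object> row = new LinkedHashMap<>(); // preserves field order
    for (int i = 0; i < values.size(); i++) {
      row.put(fieldNames.get(i), values.get(i));
    }
    return row;
  }

  public static void main(String[] args) {
    // Made-up field names for the example array shown above.
    List<String> names = Arrays.asList("id", "first", "last", "gender", "amount");
    List<Object> tuple = Arrays.<Object>asList(10, "fred", "flintstone", "male", 12.34);
    System.out.println(toNamedMap(names, tuple));
  }
}
```

The sketch sidesteps the hard part noted above: each value keeps whatever type it had in the array, and an empty nested array would still carry no element type.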



[jira] [Commented] (DRILL-7424) Project operator fails to set the container row count

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961256#comment-16961256
 ] 

ASF GitHub Bot commented on DRILL-7424:
---

vvysotskyi commented on pull request #1882: DRILL-7424: Project operator fails 
to set the container row count
URL: https://github.com/apache/drill/pull/1882#discussion_r339680258
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/BatchValidator.java
 ##
 @@ -54,7 +53,7 @@
  */
 
 public class BatchValidator {
-  private static final Logger logger = LoggerFactory.getLogger(BatchValidator.class);
+  private static Logger logger = LoggerFactory.getLogger(BatchValidator.class);
 
 Review comment:
   I think it would be better to leave `logger` final.
 





[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread Vova Vysotskyi (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961240#comment-16961240
 ] 

Vova Vysotskyi commented on DRILL-4303:
---

Merged into Apache master with commit id 
[8f40dc9e|https://github.com/apache/drill/commit/8f40dc9ea50c036a36ddc25183f48ce578a5154e].

> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read ESRI shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)





[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961238#comment-16961238
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

vvysotskyi commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858
 
 
   
 





[jira] [Created] (DRILL-7426) Json support lists of different types

2019-10-28 Thread benj (Jira)
benj created DRILL-7426:
---

 Summary: Json support lists of different types
 Key: DRILL-7426
 URL: https://issues.apache.org/jira/browse/DRILL-7426
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.16.0
Reporter: benj


With a file.json like
{code:json}
{
"name": "toto",
"info": [["LOAD", []]],
"response": 1
}
{code}
A simple SELECT fails with an error:
{code:sql}
apache drill> SELECT * FROM dfs.test.`file.json`;
Error: UNSUPPORTED_OPERATION ERROR: In a list of type VARCHAR, encountered a 
value of type LIST. Drill does not support lists of different types.
{code}
However, setting the option _exec.enable_union_type_ allows this query to run:
{code:sql}
apache drill> ALTER SESSION SET `exec.enable_union_type` = true;
apache drill> SELECT * FROM dfs.test.`file.json`;
+------+---------------+----------+
| name | info          | response |
+------+---------------+----------+
| toto | [["LOAD",[]]] | 1        |
+------+---------------+----------+
1 row selected (0.283 seconds)
{code}
This option is not easy to discover, so it would be useful for the error 
message to mention that it can be set:
{noformat}
Error: UNSUPPORTED_OPERATION ERROR: In a list of type VARCHAR, encountered a 
value of type LIST. Drill does not support lists of different types.  SET 
the option 'exec.enable_union_type' to true and try again;
{noformat}
Other error messages already follow this pattern, for example:
{noformat}
...
Error: UNSUPPORTED_OPERATION ERROR: This query cannot be planned possibly due 
to either a cartesian join or an inequality join. 
If a cartesian or inequality join is used intentionally, set the option 
'planner.enable_nljoin_for_scalar_only' to false and try again.
{noformat}
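
The proposal amounts to appending a "set this option and try again" hint to the existing message. A minimal sketch of that idea follows. It is not Drill's actual error-handling code: the class `OptionHint` and the method `withOptionHint` are hypothetical names, and only the option name and the message wording are taken from the report above.

```java
public class OptionHint {

  // Hypothetical helper (not part of Drill's API): appends a hint naming the
  // option that would let the failing query run.
  static String withOptionHint(String message, String option, String value) {
    return String.format("%s SET the option '%s' to %s and try again.",
        message.trim(), option, value);
  }

  public static void main(String[] args) {
    String base = "UNSUPPORTED_OPERATION ERROR: In a list of type VARCHAR, "
        + "encountered a value of type LIST. "
        + "Drill does not support lists of different types.";
    System.out.println(withOptionHint(base, "exec.enable_union_type", "true"));
  }
}
```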





[jira] [Updated] (DRILL-7423) Create More Efficient Way to Read Excel Cells

2019-10-28 Thread Charles Givre (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Givre updated DRILL-7423:
-
Description: 
The Excel format plugin reads cells but there are ways to make the reading 
process more efficient.  Since the schema of an Excel file is not known in 
advance, Drill must read the first row of data in order to extract the schema.  

It is actually a bit more complex.  To read the schema, Drill must first read 
the header rows and convert them all into Strings.  This gets us the header 
names if present.

Drill cannot create writers until it actually reads the first row of data where 
it will determine the data types.  This creates an inefficiency in that when 
Drill is writing the columns, it has to do a hash lookup for each column.  
Since the columns are in a fixed order, it may be possible to store the writers 
in an array and gain some efficiency there.

Also at present, if the columns are heterogeneous, Drill requires the user to 
use allTextMode to query the data.  It would be nice if Drill could query the 
data without having to set that.

  was:
The Excel format plugin reads cells but there are ways to make the reading 
process more efficient.  Since the schema of an Excel file is not known in 
advance, Drill must read the first row of data in order to extract the schema.  

It is actually a bit more complex.  To read the schema, Drill must first read 
the header rows and convert them all into Strings.  This gets us the header 
names if present.

Drill cannot create writers until it actually reads the first row of data where 
it will determine the data types.  This creates an inefficiency in that when 
Drill is writing the columns, it has to do a hash lookup for each column.  
Since the columns are in a fixed order, it may be possible to store the writers 
in a 


> Create More Efficient Way to Read Excel Cells
> -
>
> Key: DRILL-7423
> URL: https://issues.apache.org/jira/browse/DRILL-7423
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Charles Givre
>Priority: Major
>
> The Excel format plugin reads cells but there are ways to make the reading 
> process more efficient.  Since the schema of an Excel file is not known in 
> advance, Drill must read the first row of data in order to extract the 
> schema.  
> It is actually a bit more complex.  To read the schema, Drill must first read 
> the header rows and convert them all into Strings.  This gets us the header 
> names if present.
> Drill cannot create writers until it actually reads the first row of data 
> where it will determine the data types.  This creates an inefficiency in that 
> when Drill is writing the columns, it has to do a hash lookup for each 
> column.  Since the columns are in a fixed order, it may be possible to store 
> the writers in an array and gain some efficiency there.
> Also at present, if the columns are heterogeneous, Drill requires the user to 
> use allTextMode to query the data.  It would be nice if Drill could query the 
> data without having to set that.
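
The array idea in the description can be sketched as follows. This is an illustration under stated assumptions, not the Excel plugin's code: `CellWriter` and `RecordingWriter` are hypothetical stand-ins for Drill's column writers, and cell values are simplified to strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WriterLookupSketch {

  // Hypothetical stand-in for a column writer; Drill's real interfaces are richer.
  interface CellWriter {
    void write(String value);
  }

  // Records written values so both strategies can be observed.
  static final class RecordingWriter implements CellWriter {
    final List<String> values = new ArrayList<>();

    @Override
    public void write(String value) {
      values.add(value);
    }
  }

  // One hash lookup per cell, keyed by column name (the cost the ticket describes).
  static void writeRowByName(Map<String, CellWriter> writers, String[] names, String[] row) {
    for (int i = 0; i < row.length; i++) {
      writers.get(names[i]).write(row[i]);
    }
  }

  // Writers resolved once and stored by column position; no per-cell hashing.
  static void writeRowByIndex(CellWriter[] writers, String[] row) {
    for (int i = 0; i < row.length; i++) {
      writers[i].write(row[i]);
    }
  }

  public static void main(String[] args) {
    String[] names = {"id", "name"};
    Map<String, CellWriter> byName = new HashMap<>();
    CellWriter[] byIndex = new CellWriter[names.length];
    for (int i = 0; i < names.length; i++) {
      RecordingWriter w = new RecordingWriter();
      byName.put(names[i], w);
      byIndex[i] = w; // same writer, resolved once up front
    }
    writeRowByName(byName, names, new String[] {"1", "alice"});
    writeRowByIndex(byIndex, new String[] {"2", "bob"});
    System.out.println(((RecordingWriter) byIndex[1]).values);
  }
}
```

The positional variant relies on the columns staying in a fixed order, which is the premise stated in the ticket.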





[jira] [Updated] (DRILL-7423) Create More Efficient Way to Read Excel Cells

2019-10-28 Thread Charles Givre (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Givre updated DRILL-7423:
-
Description: 
The Excel format plugin reads cells but there are ways to make the reading 
process more efficient.  Since the schema of an Excel file is not known in 
advance, Drill must read the first row of data in order to extract the schema.  

It is actually a bit more complex.  To read the schema, Drill must first read 
the header rows and convert them all into Strings.  This gets us the header 
names if present.

Drill cannot create writers until it actually reads the first row of data where 
it will determine the data types.  This creates an inefficiency in that when 
Drill is writing the columns, it has to do a hash lookup for each column.  
Since the columns are in a fixed order, it may be possible to store the writers 
in a 

  was:The Excel format plugin reads cells but there are ways to make the 
reading process more efficient.  


> Create More Efficient Way to Read Excel Cells
> -
>
> Key: DRILL-7423
> URL: https://issues.apache.org/jira/browse/DRILL-7423
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Charles Givre
>Priority: Major
>
> The Excel format plugin reads cells but there are ways to make the reading 
> process more efficient.  Since the schema of an Excel file is not known in 
> advance, Drill must read the first row of data in order to extract the 
> schema.  
> It is actually a bit more complex.  To read the schema, Drill must first read 
> the header rows and convert them all into Strings.  This gets us the header 
> names if present.
> Drill cannot create writers until it actually reads the first row of data 
> where it will determine the data types.  This creates an inefficiency in that 
> when Drill is writing the columns, it has to do a hash lookup for each 
> column.  Since the columns are in a fixed order, it may be possible to store 
> the writers in a 





[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961124#comment-16961124
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on issue #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#issuecomment-546985478
 
 
   Thanks.
 



> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)





[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961120#comment-16961120
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on issue #1858: DRILL-4303: ESRI Shapefile (shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#issuecomment-546984435
 
 
   I will create the ticket shortly. 
   
   > On Oct 28, 2019, at 10:54 AM, Arina Ielchiieva wrote:
   > 
   > @cgivre one more thing, I think you forgot to create a Jira for the reader 
   > enhancement Paul mentioned.
   
   
 



> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)





[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961117#comment-16961117
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on issue #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#issuecomment-546984086
 
 
   @cgivre one more thing, I think you forgot to create a Jira for the reader 
enhancement Paul mentioned.
 



> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)





[jira] [Commented] (DRILL-7406) Update Calcite to 1.21.0

2019-10-28 Thread Bohdan Kazydub (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961118#comment-16961118
 ] 

Bohdan Kazydub commented on DRILL-7406:
---

Also, please revert the changes made to 
{{org.apache.drill.exec.planner.physical.JoinPrel}} as part of DRILL-7200, 
since the underlying issue is already fixed in CALCITE-3174.

> Update Calcite to 1.21.0
> 
>
> Key: DRILL-7406
> URL: https://issues.apache.org/jira/browse/DRILL-7406
> Project: Apache Drill
>  Issue Type: Task
>  Components: Query Planning & Optimization, SQL Parser
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>
> DRILL-7340 should be fixed by this update.





[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961100#comment-16961100
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on issue #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#issuecomment-546979600
 
 
   Once the full test run passes, the PR will be merged.
 



> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)





[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961099#comment-16961099
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on issue #1858: DRILL-4303: ESRI Shapefile (shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#issuecomment-546977509
 
 
   Thank you @arina-ielchiieva and @paul-rogers for the review.  Also thank you 
@k255 for the original PR.  Commits squashed. 
 



> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)





[jira] [Updated] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4303:

Reviewer: Paul Rogers

> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)





[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961098#comment-16961098
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on issue #1858: DRILL-4303: ESRI Shapefile (shp) 
Format Plugin
URL: https://github.com/apache/drill/pull/1858#issuecomment-546975836
 
 
   +1, please rebase if needed and squash the commits.
 



> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)





[jira] [Updated] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4303:

Labels: doc-impacting ready-to-commit  (was: doc-impacting)

> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)





[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961087#comment-16961087
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339584367
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,318 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private InputStream fileReaderShp = null;
+  private InputStream fileReaderDbf = null;
+  private InputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+    protected final ShpFormatPlugin plugin;
+
+    public ShpReaderConfig(ShpFormatPlugin plugin) {
+      this.plugin = plugin;
+    }
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+    this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    this.split = negotiator.split();
+    this.hadoopShp = split.getPath();
+
+    String filePath = split.getPath().toString();
+    this.hadoopDbf = new Path(filePath.replace(".shp", ".dbf"));
+    this.hadoopPrj = new Path(filePath.replace(".shp", ".prj"));
+
+    openFile(negotiator);
+    SchemaBuilder builder = new SchemaBuilder()
+      .addNullable("gid", TypeProtos.MinorType.INT)
+      .addNullable("srid", TypeProtos.MinorType.INT)
+      .addNullable("shapeType", TypeProtos.MinorType.VARCHAR)
+      .addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+    negotiator.setTableSchema(builder.buildSchema(), false);
+    loader = negotiator.build();
+
+    rowWriter = loader.writer();
+    gidWriter = rowWriter.scalar("gid");
+    sridWriter = rowWriter.scalar("srid");
+    shapeTypeWriter = rowWriter.scalar("shapeType");
+    geomWriter = rowWriter.scalar("geom");
+
+    return true;
+  }
+
+  @Override
+  public boolean next() {
+    Geometry geom = null;
+
+    while (!rowWriter.isFull()) {
+      Object[] 

[jira] [Commented] (DRILL-7347) Upgrade Apache Iceberg to released version

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961086#comment-16961086
 ] 

ASF GitHub Bot commented on DRILL-7347:
---

arina-ielchiieva commented on pull request #1884: DRILL-7347: Upgrade Apache 
Iceberg to released version
URL: https://github.com/apache/drill/pull/1884
 
 
   Jira - [DRILL-7347](https://issues.apache.org/jira/browse/DRILL-7347).
 



> Upgrade Apache Iceberg to released version
> --
>
> Key: DRILL-7347
> URL: https://issues.apache.org/jira/browse/DRILL-7347
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> Drill currently uses an Apache Iceberg build pinned to a specific commit via 
> JitPack, because no official release was available. Once the first Iceberg 
> version is released, we need to switch to the officially released version 
> instead of the commit.
> The first official Iceberg version is 0.7.0-incubating.





[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961067#comment-16961067
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339576671
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,318 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private FileSplit split;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private InputStream fileReaderShp = null;
+  private InputStream fileReaderDbf = null;
+  private InputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+protected final ShpFormatPlugin plugin;
+
+public ShpReaderConfig(ShpFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+this.split = negotiator.split();
+this.hadoopShp = split.getPath();
+
+String filePath = split.getPath().toString();
+this.hadoopDbf = new Path(filePath.replace(".shp", ".dbf"));
+this.hadoopPrj = new Path(filePath.replace(".shp", ".prj"));
+
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder()
+  .addNullable("gid", TypeProtos.MinorType.INT)
+  .addNullable("srid", TypeProtos.MinorType.INT)
+  .addNullable("shapeType", TypeProtos.MinorType.VARCHAR)
+  .addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+negotiator.setTableSchema(builder.buildSchema(), false);
+loader = negotiator.build();
+
+rowWriter = loader.writer();
+gidWriter = rowWriter.scalar("gid");
+sridWriter = rowWriter.scalar("srid");
+shapeTypeWriter = rowWriter.scalar("shapeType");
+geomWriter = rowWriter.scalar("geom");
+
+return true;
+  }
+
+  @Override
+  public boolean next() {
+Geometry geom = null;
+
+while (!rowWriter.isFull()) {
+  Object[] dbfRow = 

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961060#comment-16961060
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339573014
 
 

 ##
 File path: 
contrib/format-esri/src/test/java/org/apache/drill/exec/store/esri/TestShapefileFormatPlugin.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.server.Drillbit;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.dfs.FileSystemConfig;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.test.BaseDirTestWatcher;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category(RowSetTests.class)
+public class TestShapefileFormatPlugin extends ClusterTest {
 
 Review comment:
   Please replace `openPossiblyCompressedStream` with `open`. Since 
`config.compressible = false;` is set, compressed files will never be 
processed, so the reader is doing extra work that is not needed.
 



> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)
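
Because the three files share a base name, a reader that is handed the .shp path has to derive the .dbf and .prj paths from it. The diff in the PR does this with `String.replace`, which rewrites every occurrence of ".shp" in the path. The sketch below is one possible alternative that swaps only the trailing extension; the class and method names are hypothetical and are not part of the Drill code base.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only; not the code from the PR.
final class ShapefileSiblingsSketch {

  // Swap only a trailing ".shp" so that ".shp" elsewhere in the path is left alone.
  static String sibling(String shpPath, String newExt) {
    String suffix = ".shp";
    if (!shpPath.toLowerCase(Locale.ROOT).endsWith(suffix)) {
      throw new IllegalArgumentException("Not a .shp path: " + shpPath);
    }
    return shpPath.substring(0, shpPath.length() - suffix.length()) + "." + newExt;
  }

  public static void main(String[] args) {
    // A directory name that happens to contain ".shp" is preserved.
    String shp = "/data/roads.shp/2019/roads.shp";
    System.out.println(sibling(shp, "dbf"));
    System.out.println(sibling(shp, "prj"));
  }
}
```

Whether this extra care is needed depends on how the paths are produced; for paths that contain ".shp" only once, both approaches give the same result.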





[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961061#comment-16961061
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339573443
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,323 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+  private static final String GID_FIELD_NAME = "gid";
+  private static final String SRID_FIELD_NAME = "srid";
+  private static final String SHAPE_TYPE_FIELD_NAME = "shapeType";
+  private static final String GEOM_FIELD_NAME = "geom";
+  private static final String SRID_PATTERN_TEXT = "AUTHORITY\\[\"\\w+\"\\s*,\\s*\"*(\\d+)\"*\\]\\]$";
+
+  private FileSplit split;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private InputStream fileReaderShp = null;
+  private InputStream fileReaderDbf = null;
+  private InputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+  private int srid;
+  private SpatialReference spatialReference;
+
+  public static class ShpReaderConfig {
+protected final ShpFormatPlugin plugin;
+
+public ShpReaderConfig(ShpFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+split = negotiator.split();
+hadoopShp = split.getPath();
+
+String filePath = split.getPath().toString();
+hadoopDbf = new Path(filePath.replace(".shp", ".dbf"));
+hadoopPrj = new Path(filePath.replace(".shp", ".prj"));
+
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder()
+  .addNullable(GID_FIELD_NAME, TypeProtos.MinorType.INT)
+  .addNullable(SRID_FIELD_NAME, TypeProtos.MinorType.INT)
+  .addNullable(SHAPE_TYPE_FIELD_NAME, TypeProtos.MinorType.VARCHAR)
+  .addNullable(GEOM_FIELD_NAME, TypeProtos.MinorType.VARBINARY);
+
+negotiator.setTableSchema(builder.buildSchema(), false);
+loader = negotiator.build();
+
+

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961056#comment-16961056
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339571526
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+public class ShpFormatConfig implements FormatPluginConfig {
+
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
+
+  public List<String> extensions;
 
 Review comment:
   Done
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961052#comment-16961052
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339569884
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ShpFormatConfig implements FormatPluginConfig {
+
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
 
 Review comment:
   Fixed.
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961053#comment-16961053
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339570276
 
 

 ##
 File path: 
contrib/format-esri/src/test/java/org/apache/drill/exec/store/esri/TestShapefileFormatPlugin.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.server.Drillbit;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.dfs.FileSystemConfig;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.test.BaseDirTestWatcher;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category(RowSetTests.class)
+public class TestShapefileFormatPlugin extends ClusterTest {
 
 Review comment:
   I removed the unit test.  However, I left the open method using the 
`openPossiblyCompressedStream()` function in case a file is compressed.
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961050#comment-16961050
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339569795
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ShpFormatConfig implements FormatPluginConfig {
+
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
+
+  public List<String> extensions;
+
+  @JsonInclude(JsonInclude.Include.NON_DEFAULT)
+  public List<String> getExtensions() {
+if (extensions == null) {
+  return DEFAULT_EXTS;
+}
+return extensions;
+  }
+
+  public ShpBatchReader.ShpReaderConfig getReaderConfig(ShpFormatPlugin plugin) {
+ShpBatchReader.ShpReaderConfig readerConfig = new ShpBatchReader.ShpReaderConfig(plugin);
+
+return readerConfig;
+  }
+
+  @Override
+  public int hashCode() {
+return Arrays.hashCode(new Object[]{extensions});
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+if (this == obj) {
+  return true;
+}
+if (obj == null || getClass() != obj.getClass()) {
+  return false;
+}
+ShpFormatConfig other = (ShpFormatConfig)obj;
+return Objects.equal(extensions, other.getExtensions() );
 
 Review comment:
   Fixed and replaced with `java.util.Objects`
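
For readers following the thread, a minimal sketch of what an `equals`/`hashCode` pair built on `java.util.Objects` can look like. This is not the code that was committed; `ExtensionsConfigSketch` is a hypothetical stand-in for the config class. It assumes both methods should go through `getExtensions()`, so that a null `extensions` field and an explicit default list compare as equal, whereas the diff above compares the raw field on one side with the getter on the other.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for the format config; not the actual Drill class.
final class ExtensionsConfigSketch {

  private static final List<String> DEFAULT_EXTS = Arrays.asList("shp", "dbf");

  private final List<String> extensions;

  ExtensionsConfigSketch(List<String> extensions) {
    this.extensions = extensions;
  }

  List<String> getExtensions() {
    return extensions == null ? DEFAULT_EXTS : extensions;
  }

  @Override
  public int hashCode() {
    // Hash the same value that equals() compares.
    return Objects.hash(getExtensions());
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (obj == null || getClass() != obj.getClass()) {
      return false;
    }
    ExtensionsConfigSketch other = (ExtensionsConfigSketch) obj;
    // Null-safe comparison on the effective extension list of both sides.
    return Objects.equals(getExtensions(), other.getExtensions());
  }

  public static void main(String[] args) {
    ExtensionsConfigSketch implicit = new ExtensionsConfigSketch(null);
    ExtensionsConfigSketch explicit = new ExtensionsConfigSketch(Arrays.asList("shp", "dbf"));
    System.out.println(implicit.equals(explicit));
    System.out.println(implicit.hashCode() == explicit.hashCode());
  }
}
```

Comparing through the getter on both sides keeps `equals` symmetric and consistent with `hashCode`; whether the real config should treat a null list and the default list as equal is a design decision for the plugin author.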
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961048#comment-16961048
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339569525
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ShpFormatConfig implements FormatPluginConfig {
+
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
 
 Review comment:
   Fixed
 



[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961047#comment-16961047
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

cgivre commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format 
Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339569053
 
 

 ##
 File path: 
contrib/format-esri/src/test/java/org/apache/drill/exec/store/esri/TestShapefileFormatPlugin.java
 ##
 @@ -0,0 +1,161 @@
+/*
 
 Review comment:
   Removed file.  Not sure why that was there.
 



[jira] [Commented] (DRILL-1709) desc => describe command

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961011#comment-16961011
 ] 

ASF GitHub Bot commented on DRILL-1709:
---

arina-ielchiieva commented on pull request #1881: DRILL-1709: Add desc alias 
for describe command
URL: https://github.com/apache/drill/pull/1881
 
 
   
 



> desc => describe command
> 
>
> Key: DRILL-1709
> URL: https://issues.apache.org/jira/browse/DRILL-1709
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 0.6.0
> Environment: MapR 4.0.1
>Reporter: Hari Sekhon
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> There is no desc command, can you please add that shortcut to describe.
> Regards,
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon
> DESCRIBE statements in Drill that should support desc alias:
> 1. describe schema for table dfs.tmp.`test_table`;
> 2. describe schema dfs.tmp;
> 3. describe information_schema.`catalogs`;
> 4. describe table information_schema.`catalogs`;





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961012#comment-16961012
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749
 
 
   
 



> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Commented] (DRILL-7351) WebUI is Vulnerable to CSRF

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960974#comment-16960974
 ] 

ASF GitHub Bot commented on DRILL-7351:
---

agozhiy commented on pull request #1864: DRILL-7351: Added tokens to Web forms 
to prevent CSRF attacks
URL: https://github.com/apache/drill/pull/1864#discussion_r339526572
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/CsrfTokenInjectFilter.java
 ##
 @@ -43,11 +43,14 @@ public void init(FilterConfig filterConfig) throws 
ServletException {
   public void doFilter(ServletRequest request, ServletResponse response, 
FilterChain chain) throws IOException, ServletException {
 HttpServletRequest httpRequest = (HttpServletRequest) request;
 if (HttpMethod.GET.equalsIgnoreCase(httpRequest.getMethod())) {
+  // We don't create a session with this call as we need to check if there 
is a session already (i.e. if a user is logged in).
   HttpSession session = httpRequest.getSession(false);
   if (session != null) {
 String csrfToken = (String) 
session.getAttribute(WebServerConstants.CSRF_TOKEN);
 if (csrfToken == null) {
-  csrfToken = RandomStringUtils.random(20, 0, 0, true, true, null, new 
SecureRandom());
+  byte[] buffer = new byte[32];
 
 Review comment:
   Done.
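
For readers following the diff above, here is a minimal sketch of the "fill a 32-byte buffer from SecureRandom" approach. The class name `CsrfTokenSketch`, the method `newToken`, and the Base64 encoding step are assumptions made for illustration; the quoted hunk is cut off before it shows how the buffer becomes a String, so this is not necessarily what the PR does.

```java
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper, not a class from the Drill code base.
final class CsrfTokenSketch {
  private static final SecureRandom RANDOM = new SecureRandom();

  // Builds a token from 32 cryptographically strong random bytes (256 bits).
  // Assumption: the bytes are rendered as URL-safe Base64 without padding.
  static String newToken() {
    byte[] buffer = new byte[32];
    RANDOM.nextBytes(buffer);
    return Base64.getUrlEncoder().withoutPadding().encodeToString(buffer);
  }

  public static void main(String[] args) {
    // Prints one freshly generated token.
    System.out.println(newToken());
  }
}
```

A token produced this way contains only letters, digits, '-' and '_', so it can be embedded in a hidden form field or a URL without further escaping.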
 



> WebUI is Vulnerable to CSRF
> ---
>
> Key: DRILL-7351
> URL: https://issues.apache.org/jira/browse/DRILL-7351
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Don Perial
>Assignee: Anton Gozhiy
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
> Attachments: Screen Shot 2019-08-19 at 10.11.50 AM.png, 
> drill-csrf.html
>
>
> There is no way to protect the WebUI from CSRF, and the fact that the value 
> for the access-control-allow-origin header is '*' appears to compound this 
> issue as well.
> The attached file demonstrates the vulnerability.
> Steps to replicate:
>  # Log in to an instance of the Drill WebUI.
>  # Edit the attached [^drill-csrf.html]. Replace DRILL_HOST with the hostname 
> of the Drill WebUI from step #1.
>  # Load the file from #2 in the same browser as #1; either a new tab or the 
> same window will do.
>  # Return to the Drill WebUI and click on 'Profiles'.
> Observed results:
> The query 'SELECT 100' appears in the list of executed queries (see:  
> [^Screen Shot 2019-08-19 at 10.11.50 AM.png] ).
> Expected results:
> It should be possible to whitelist domains, or to completely prevent code 
> from other domain names from submitting queries to the WebUI.
> Risks:
> Potential for code execution by unauthorized parties.
>  
>  





[jira] [Commented] (DRILL-7351) WebUI is Vulnerable to CSRF

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960971#comment-16960971
 ] 

ASF GitHub Bot commented on DRILL-7351:
---

agozhiy commented on pull request #1864: DRILL-7351: Added tokens to Web forms 
to prevent CSRF attacks
URL: https://github.com/apache/drill/pull/1864#discussion_r339526475
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/QueryResources.java
 ##
 @@ -124,7 +126,7 @@ public QueryPage(WorkManager work, HttpServletRequest 
request) {
   onlyImpersonationEnabled = WebServer.isOnlyImpersonationEnabled(config);
   autoLimitEnabled = 
config.getBoolean(ExecConstants.HTTP_WEB_CLIENT_RESULTSET_AUTOLIMIT_CHECKED);
   defaultRowsAutoLimited = 
config.getInt(ExecConstants.HTTP_WEB_CLIENT_RESULTSET_AUTOLIMIT_ROWS);
-  csrfToken = Utilities.getCsrfTokenFromHttpRequest(request);
+  csrfToken = 
Optional.ofNullable(WebUtils.getCsrfTokenFromHttpRequest(request)).orElse("");
 
 Review comment:
   Done.
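
The `Optional.ofNullable(...).orElse("")` pattern in the diff above can be sketched in isolation as follows. The class `NullSafeTokenSketch`, the map-based `findToken` stand-in, and the `"csrfToken"` key are hypothetical; only the Optional idiom itself is taken from the quoted change.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration, not Drill code.
final class NullSafeTokenSketch {
  // Stand-in for a lookup that returns null when no token is stored,
  // for example because no session exists yet.
  static String findToken(Map<String, String> sessionAttributes) {
    return sessionAttributes.get("csrfToken");
  }

  // Same idiom as in the diff: fall back to "" so callers never see null.
  static String tokenOrEmpty(Map<String, String> sessionAttributes) {
    return Optional.ofNullable(findToken(sessionAttributes)).orElse("");
  }

  public static void main(String[] args) {
    // With an empty map there is no token, so an empty line is printed.
    System.out.println(tokenOrEmpty(new java.util.HashMap<>()));
  }
}
```

Falling back to an empty string means code that renders the value into a page template does not need its own null check.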
 



[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960970#comment-16960970
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

cgivre commented on issue #1749: DRILL-7177: Format Plugin for Excel Files
URL: https://github.com/apache/drill/pull/1749#issuecomment-546915225
 
 
   Thanks @arina-ielchiieva and @paul-rogers for your review and help with this 
plugin!
 



[jira] [Updated] (DRILL-7351) WebUI is Vulnerable to CSRF

2019-10-28 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7351:

Reviewer: Arina Ielchiieva



[jira] [Commented] (DRILL-7351) WebUI is Vulnerable to CSRF

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960960#comment-16960960
 ] 

ASF GitHub Bot commented on DRILL-7351:
---

arina-ielchiieva commented on issue #1864: DRILL-7351: Added tokens to Web 
forms to prevent CSRF attacks
URL: https://github.com/apache/drill/pull/1864#issuecomment-546913158
 
 
   +1, LGTM
 



[jira] [Updated] (DRILL-7351) WebUI is Vulnerable to CSRF

2019-10-28 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7351:

Labels: ready-to-commit  (was: )



[jira] [Commented] (DRILL-7351) WebUI is Vulnerable to CSRF

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960961#comment-16960961
 ] 

ASF GitHub Bot commented on DRILL-7351:
---

arina-ielchiieva commented on issue #1864: DRILL-7351: Added tokens to Web 
forms to prevent CSRF attacks
URL: https://github.com/apache/drill/pull/1864#issuecomment-546913158
 
 
   +1, LGTM
   Please squash the commits.
 



[jira] [Updated] (DRILL-7347) Upgrade Apache Iceberg to released version

2019-10-28 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7347:

Reviewer: Vova Vysotskyi

> Upgrade Apache Iceberg to released version
> --
>
> Key: DRILL-7347
> URL: https://issues.apache.org/jira/browse/DRILL-7347
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> Currently Drill uses an Apache Iceberg build pinned to a specific commit via 
> JitPack, since there is no officially released version. Once the first Iceberg 
> version is released, we need to use the officially released version instead of 
> the commit.
> The first official Iceberg version is 0.7.0-incubating.





[jira] [Commented] (DRILL-7418) MetadataDirectGroupScan improvements

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960955#comment-16960955
 ] 

ASF GitHub Bot commented on DRILL-7418:
---

arina-ielchiieva commented on pull request #1883: DRILL-7418: 
MetadataDirectGroupScan improvements
URL: https://github.com/apache/drill/pull/1883
 
 
   1. Replaced the file listing with selection root information to reduce query 
plan size in MetadataDirectGroupScan.
   2. Fixed MetadataDirectGroupScan ser / de issues.
   3. Added PlanMatcher to QueryBuilder for more convenient plan matching.
   4. Rewrote TestConvertCountToDirectScan to use ClusterTest.
   5. Refactoring and code cleanup.
   
   Jira - [DRILL-7418](https://issues.apache.org/jira/browse/DRILL-7418).
 



> MetadataDirectGroupScan improvements
> 
>
> Key: DRILL-7418
> URL: https://issues.apache.org/jira/browse/DRILL-7418
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.17.0
>
>
> When a count is converted to a direct scan (the case when statistics or table 
> metadata are available and there is no need to perform the count operation), 
> {{MetadataDirectGroupScan}} is used. Proposed {{MetadataDirectGroupScan}} 
> enhancements:
> 1. Show the table selection root instead of listing all table files. If a table 
> has lots of files, the query plan gets polluted with the enumeration of all 
> files. Since the files are not used for the calculation (only metadata is), 
> they are not relevant and can be excluded from the plan.
> Before:
> {noformat}
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02        DirectScan(groupscan=[files = 
> [/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_0.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_5.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_4.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_9.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_3.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_6.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_7.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_10.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_2.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_1.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_8.parquet], 
> numFiles = 11, usedMetadataSummaryFile = false, 
> DynamicPojoRecordReader{records = [[1560060, 2880404, 2880404, 0]]}])
> {noformat}
> After:
> {noformat}
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02        DirectScan(groupscan=[selectionRoot = 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all, numFiles = 11, 
> usedMetadataSummaryFile = false, DynamicPojoRecordReader{records = [[1560060, 
> 2880404, 2880404, 0]]}])
> {noformat}
> For Hive tables which were scanned directly, the selection root is not 
> available and is therefore omitted.
> 2. Submission of a physical plan which contains {{MetadataDirectGroupScan}} 
> fails with deserialization errors; proper ser / de should be implemented.
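
The difference between the "Before" and "After" plan text in the first enhancement can be sketched as below. This is only an illustration of the idea, assuming a plain string digest; `ScanDigestSketch` and its methods are hypothetical names and do not reproduce the real MetadataDirectGroupScan code.

```java
import java.util.List;

// Hypothetical illustration, not the actual group scan implementation.
final class ScanDigestSketch {
  // Before: every file path is printed, so the text grows with the file count.
  static String digestWithFiles(List<String> files) {
    return "files = " + files + ", numFiles = " + files.size();
  }

  // After: only the selection root and the file count are printed.
  // A null root (the Hive case described above) is simply left out.
  static String digestWithRoot(String selectionRoot, int numFiles) {
    StringBuilder sb = new StringBuilder();
    if (selectionRoot != null) {
      sb.append("selectionRoot = ").append(selectionRoot).append(", ");
    }
    return sb.append("numFiles = ").append(numFiles).toString();
  }

  public static void main(String[] args) {
    System.out.println(digestWithRoot("/drill/testdata/t", 11));
  }
}
```

With the root-based form, the length of the plan text no longer depends on how many files the table contains.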





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960952#comment-16960952
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on issue #1749: DRILL-7177: Format Plugin for Excel 
Files
URL: https://github.com/apache/drill/pull/1749#issuecomment-546907590
 
 
   I'll run the full test suite and merge the PR soon.
 



[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960951#comment-16960951
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on issue #1749: DRILL-7177: Format Plugin for Excel 
Files
URL: https://github.com/apache/drill/pull/1749#issuecomment-546906349
 
 
   +1, LGTM
 



[jira] [Updated] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7177:

Labels: doc-impacting ready-to-commit  (was: doc-impacting)



[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960948#comment-16960948
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

cgivre commented on pull request #1749: DRILL-7177: Format Plugin for Excel 
Files
URL: https://github.com/apache/drill/pull/1749#discussion_r339515525
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelFormatConfig.java
 ##
 @@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.exec.store.excel.ExcelBatchReader.ExcelReaderConfig;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.Objects;
+
+@JsonTypeName(ExcelFormatPlugin.DEFAULT_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ExcelFormatConfig implements FormatPluginConfig {
+
+  private static final List<String> DEFAULT_EXTS = Arrays.asList("xlsx");
 
 Review comment:
   Done
 



[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960949#comment-16960949
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

cgivre commented on pull request #1749: DRILL-7177: Format Plugin for Excel 
Files
URL: https://github.com/apache/drill/pull/1749#discussion_r339515560
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelFormatConfig.java
 ##
 @@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.exec.store.excel.ExcelBatchReader.ExcelReaderConfig;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.Objects;
+
+@JsonTypeName(ExcelFormatPlugin.DEFAULT_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ExcelFormatConfig implements FormatPluginConfig {
+
+  private static final List<String> DEFAULT_EXTS = Arrays.asList("xlsx");
+
+  // This is the theoretical maximum number of rows in an Excel spreadsheet
+  private final int MAX_ROWS = 1048576;
+
+  public List<String> extensions = DEFAULT_EXTS;
 
 Review comment:
   Done
 



[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960944#comment-16960944
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r339513312
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelFormatConfig.java
 ##
 @@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.exec.store.excel.ExcelBatchReader.ExcelReaderConfig;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.Objects;
+
+@JsonTypeName(ExcelFormatPlugin.DEFAULT_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ExcelFormatConfig implements FormatPluginConfig {
+
+  private static final List<String> DEFAULT_EXTS = Arrays.asList("xlsx");
 
 Review comment:
   Please remove.
 



[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960943#comment-16960943
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r339513664
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelFormatConfig.java
 ##
 @@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.exec.store.excel.ExcelBatchReader.ExcelReaderConfig;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.Objects;
+
+@JsonTypeName(ExcelFormatPlugin.DEFAULT_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ExcelFormatConfig implements FormatPluginConfig {
+
+  private static final List DEFAULT_EXTS = Arrays.asList("xlsx");
+
+  // This is the theoretical maximum number of rows in an Excel spreadsheet
+  private final int MAX_ROWS = 1048576;
+
+  public List extensions = DEFAULT_EXTS;
 
 Review comment:
   ```suggestion
 public List extensions = Collections.singletonList("xlsx");
   ```
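
For context on the suggestion above, here is a minimal sketch (not code from the pull request; the class name `SingletonListDemo` and the helper `rejectsSet` are hypothetical) of how the two JDK factories differ. `Arrays.asList` returns a fixed-size list that still allows `set`, while `Collections.singletonList` returns an immutable one-element list, which is usually what a shared default value wants.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SingletonListDemo {

  // Returns true if the list refuses in-place modification through set().
  static boolean rejectsSet(List<String> list) {
    try {
      list.set(0, "xls");
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // Fixed-size view over an array: elements can still be overwritten.
    List<String> fromArrays = Arrays.asList("xlsx");
    // Immutable single-element list: any modification throws.
    List<String> singleton = Collections.singletonList("xlsx");

    System.out.println("Arrays.asList rejects set: " + rejectsSet(fromArrays));
    System.out.println("singletonList rejects set: " + rejectsSet(singleton));
  }
}
```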
 



> Format Plugin for Excel Files
> -----------------------------
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960924#comment-16960924
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

cgivre commented on pull request #1749: DRILL-7177: Format Plugin for Excel 
Files
URL: https://github.com/apache/drill/pull/1749#discussion_r339502837
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelFormatConfig.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.exec.store.excel.ExcelBatchReader.ExcelReaderConfig;
+
+import java.util.Arrays;
+import java.util.List;
+
+@JsonTypeName(ExcelFormatPlugin.DEFAULT_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ExcelFormatConfig implements FormatPluginConfig {
+
+  private static final List DEFAULT_EXTS = ImmutableList.of("xlsx");
 
 Review comment:
   Fixed
 



> Format Plugin for Excel Files
> -----------------------------
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960922#comment-16960922
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

cgivre commented on pull request #1749: DRILL-7177: Format Plugin for Excel 
Files
URL: https://github.com/apache/drill/pull/1749#discussion_r339500656
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelFormatConfig.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.exec.store.excel.ExcelBatchReader.ExcelReaderConfig;
+
+import java.util.Arrays;
+import java.util.List;
+
+@JsonTypeName(ExcelFormatPlugin.DEFAULT_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ExcelFormatConfig implements FormatPluginConfig {
+
+  private static final List DEFAULT_EXTS = ImmutableList.of("xlsx");
 
 Review comment:
   Fixed.  Is `ImmutableList` ok to leave?
 



> Format Plugin for Excel Files
> -----------------------------
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960915#comment-16960915
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

cgivre commented on pull request #1749: DRILL-7177: Format Plugin for Excel 
Files
URL: https://github.com/apache/drill/pull/1749#discussion_r339500436
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelFormatConfig.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.exec.store.excel.ExcelBatchReader.ExcelReaderConfig;
+
+import java.util.Arrays;
+import java.util.List;
+
+@JsonTypeName(ExcelFormatPlugin.DEFAULT_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ExcelFormatConfig implements FormatPluginConfig {
+
+  private static final List DEFAULT_EXTS = ImmutableList.of("xlsx");
+
+  // This is the theoretical maximum number of rows in an Excel spreadsheet
+  private final int MAX_ROWS = 1048576;
+
+  public List extensions;
 
 Review comment:
   Fixed
 



> Format Plugin for Excel Files
> -----------------------------
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960917#comment-16960917
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

cgivre commented on pull request #1749: DRILL-7177: Format Plugin for Excel 
Files
URL: https://github.com/apache/drill/pull/1749#discussion_r339500656
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelFormatConfig.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.exec.store.excel.ExcelBatchReader.ExcelReaderConfig;
+
+import java.util.Arrays;
+import java.util.List;
+
+@JsonTypeName(ExcelFormatPlugin.DEFAULT_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ExcelFormatConfig implements FormatPluginConfig {
+
+  private static final List DEFAULT_EXTS = ImmutableList.of("xlsx");
 
 Review comment:
   Fixed.  Is `ImmutableList` ok to leave?
 



> Format Plugin for Excel Files
> -----------------------------
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960916#comment-16960916
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

cgivre commented on pull request #1749: DRILL-7177: Format Plugin for Excel 
Files
URL: https://github.com/apache/drill/pull/1749#discussion_r339500487
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelFormatConfig.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.exec.store.excel.ExcelBatchReader.ExcelReaderConfig;
+
+import java.util.Arrays;
+import java.util.List;
+
+@JsonTypeName(ExcelFormatPlugin.DEFAULT_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ExcelFormatConfig implements FormatPluginConfig {
+
+  private static final List DEFAULT_EXTS = ImmutableList.of("xlsx");
+
+  // This is the theoretical maximum number of rows in an Excel spreadsheet
+  private final int MAX_ROWS = 1048576;
+
+  public List extensions;
+
+  public int headerRow;
+
+  public int lastRow = MAX_ROWS;
+
+  public int firstColumn;
+
+  public int lastColumn;
+
+  public boolean allTextMode;
+
+  public String sheetName = "";
+
+  public int getHeaderRow() {
+    return headerRow;
+  }
+
+  public int getLastRow() {
+    return lastRow;
+  }
+
+  public String getSheetName() {
+    return sheetName;
+  }
+
+  public int getFirstColumn() {
+    return firstColumn;
+  }
+
+  public int getLastColumn() {
+    return lastColumn;
+  }
+
+  public boolean getAllTextMode() {
+    return allTextMode;
+  }
+
+  public ExcelReaderConfig getReaderConfig(ExcelFormatPlugin plugin) {
+    ExcelReaderConfig readerConfig = new ExcelReaderConfig(plugin);
+    return readerConfig;
+  }
+
+  @JsonInclude(JsonInclude.Include.NON_DEFAULT)
+  public List getExtensions() {
+    if (extensions == null) {
+      return DEFAULT_EXTS;
+    }
+    return extensions;
+  }
+
+  @Override
+  public int hashCode() {
+    return Arrays.hashCode(
+      new Object[]{extensions, headerRow, lastRow, sheetName, firstColumn, lastColumn, allTextMode});
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+    if (this == obj) {
+      return true;
+    }
+    if (obj == null || getClass() != obj.getClass()) {
+      return false;
+    }
+    ExcelFormatConfig other = (ExcelFormatConfig) obj;
+    return Objects.equal(headerRow, other.headerRow)
 
 Review comment:
   Fixed
 



> Format Plugin for Excel Files
> -----------------------------
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Updated] (DRILL-7347) Upgrade Apache Iceberg to released version

2019-10-28 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7347:

Description: 
Currently Drill uses an Apache Iceberg build from a specific commit, obtained 
via JitPack, since there is no officially released version. Once the first 
Iceberg version is released, we need to use the officially released version 
instead of the commit.

The first official Iceberg version is 0.7.0-incubating.

  was: Currently Drill uses an Apache Iceberg build from a specific commit, 
obtained via JitPack, since there is no officially released version. Once the 
first Iceberg version is released, we need to use the officially released 
version instead of the commit.


> Upgrade Apache Iceberg to released version
> ------------------------------------------
>
> Key: DRILL-7347
> URL: https://issues.apache.org/jira/browse/DRILL-7347
> Project: Apache Drill
> Issue Type: Task
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Priority: Major
> Fix For: 1.17.0
>
>
> Currently Drill uses an Apache Iceberg build from a specific commit, obtained 
> via JitPack, since there is no officially released version. Once the first 
> Iceberg version is released, we need to use the officially released version 
> instead of the commit.
> The first official Iceberg version is 0.7.0-incubating.





[jira] [Updated] (DRILL-7397) Fix logback errors when building the project

2019-10-28 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7397:

Reviewer: Vova Vysotskyi

> Fix logback errors when building the project
> --------------------------------------------
>
> Key: DRILL-7397
> URL: https://issues.apache.org/jira/browse/DRILL-7397
> Project: Apache Drill
> Issue Type: Task
> Affects Versions: 1.16.0
> Reporter: Arina Ielchiieva
> Assignee: Bohdan Kazydub
> Priority: Major
> Fix For: 1.17.0
>
>
> {noformat}
> [INFO] Compiling 75 source files to /.../drill/common/target/classes
> [WARNING] Unable to autodetect 'javac' path, using 'javac' from the environment.
> [INFO] 
> [INFO] --- exec-maven-plugin:1.6.0:java (default) @ drill-common ---
> 17:46:05,674 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]
> 17:46:05,675 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback-test.xml] at [file:/.../drill/common/src/test/resources/logback-test.xml]
> 17:46:05,712 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set
> 17:46:05,714 |-ERROR in ch.qos.logback.core.joran.conditional.IfAction - Could not find Janino library on the class path. Skipping conditional processing.
> 17:46:05,714 |-ERROR in ch.qos.logback.core.joran.conditional.IfAction - See also http://logback.qos.ch/codes.html#ifJanino
> 17:46:05,714 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
> 17:46:05,719 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [STDOUT]
> 17:46:05,724 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
> 17:46:05,740 |-INFO in ch.qos.logback.classic.joran.action.LevelAction - ROOT level set to ERROR
> 17:46:05,740 |-ERROR in ch.qos.logback.core.joran.conditional.IfAction - Could not find Janino library on the class path. Skipping conditional processing.
> 17:46:05,740 |-ERROR in ch.qos.logback.core.joran.conditional.IfAction - See also http://logback.qos.ch/codes.html#ifJanino
> 17:46:05,740 |-ERROR in ch.qos.logback.core.joran.action.AppenderRefAction - Could not find an AppenderAttachable at the top of execution stack. Near [appender-ref] line 59
> 17:46:05,740 |-WARN in ch.qos.logback.classic.joran.action.RootLoggerAction - The object on the top the of the stack is not the root logger
> 17:46:05,740 |-WARN in ch.qos.logback.classic.joran.action.RootLoggerAction - It is: ch.qos.logback.core.joran.conditional.IfAction
> 17:46:05,740 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.
> 17:46:05,741 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@58e3a2c7 - Registering current configuration as safe fallback point
> {noformat}





[jira] [Updated] (DRILL-7347) Upgrade Apache Iceberg to released version

2019-10-28 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7347:

Fix Version/s: 1.17.0

> Upgrade Apache Iceberg to released version
> ------------------------------------------
>
> Key: DRILL-7347
> URL: https://issues.apache.org/jira/browse/DRILL-7347
> Project: Apache Drill
> Issue Type: Task
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Priority: Major
> Fix For: 1.17.0
>
>
> Currently Drill uses an Apache Iceberg build from a specific commit, obtained 
> via JitPack, since there is no officially released version. Once the first 
> Iceberg version is released, we need to use the officially released version 
> instead of the commit.





[jira] [Assigned] (DRILL-7347) Upgrade Apache Iceberg to released version

2019-10-28 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7347:
---

Assignee: Arina Ielchiieva

> Upgrade Apache Iceberg to released version
> ------------------------------------------
>
> Key: DRILL-7347
> URL: https://issues.apache.org/jira/browse/DRILL-7347
> Project: Apache Drill
> Issue Type: Task
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Priority: Major
>
> Currently Drill uses an Apache Iceberg build from a specific commit, obtained 
> via JitPack, since there is no officially released version. Once the first 
> Iceberg version is released, we need to use the officially released version 
> instead of the commit.





[jira] [Updated] (DRILL-7397) Fix logback errors when building the project

2019-10-28 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7397:

Fix Version/s: 1.17.0

> Fix logback errors when building the project
> --------------------------------------------
>
> Key: DRILL-7397
> URL: https://issues.apache.org/jira/browse/DRILL-7397
> Project: Apache Drill
> Issue Type: Task
> Affects Versions: 1.16.0
> Reporter: Arina Ielchiieva
> Assignee: Bohdan Kazydub
> Priority: Major
> Fix For: 1.17.0
>
>
> {noformat}
> [INFO] Compiling 75 source files to /.../drill/common/target/classes
> [WARNING] Unable to autodetect 'javac' path, using 'javac' from the environment.
> [INFO] 
> [INFO] --- exec-maven-plugin:1.6.0:java (default) @ drill-common ---
> 17:46:05,674 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]
> 17:46:05,675 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback-test.xml] at [file:/.../drill/common/src/test/resources/logback-test.xml]
> 17:46:05,712 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set
> 17:46:05,714 |-ERROR in ch.qos.logback.core.joran.conditional.IfAction - Could not find Janino library on the class path. Skipping conditional processing.
> 17:46:05,714 |-ERROR in ch.qos.logback.core.joran.conditional.IfAction - See also http://logback.qos.ch/codes.html#ifJanino
> 17:46:05,714 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
> 17:46:05,719 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [STDOUT]
> 17:46:05,724 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
> 17:46:05,740 |-INFO in ch.qos.logback.classic.joran.action.LevelAction - ROOT level set to ERROR
> 17:46:05,740 |-ERROR in ch.qos.logback.core.joran.conditional.IfAction - Could not find Janino library on the class path. Skipping conditional processing.
> 17:46:05,740 |-ERROR in ch.qos.logback.core.joran.conditional.IfAction - See also http://logback.qos.ch/codes.html#ifJanino
> 17:46:05,740 |-ERROR in ch.qos.logback.core.joran.action.AppenderRefAction - Could not find an AppenderAttachable at the top of execution stack. Near [appender-ref] line 59
> 17:46:05,740 |-WARN in ch.qos.logback.classic.joran.action.RootLoggerAction - The object on the top the of the stack is not the root logger
> 17:46:05,740 |-WARN in ch.qos.logback.classic.joran.action.RootLoggerAction - It is: ch.qos.logback.core.joran.conditional.IfAction
> 17:46:05,740 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.
> 17:46:05,741 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@58e3a2c7 - Registering current configuration as safe fallback point
> {noformat}





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960839#comment-16960839
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r339433907
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelFormatConfig.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.exec.store.excel.ExcelBatchReader.ExcelReaderConfig;
+
+import java.util.Arrays;
+import java.util.List;
+
+@JsonTypeName(ExcelFormatPlugin.DEFAULT_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ExcelFormatConfig implements FormatPluginConfig {
+
+  private static final List DEFAULT_EXTS = ImmutableList.of("xlsx");
+
+  // This is the theoretical maximum number of rows in an Excel spreadsheet
+  private final int MAX_ROWS = 1048576;
+
+  public List extensions;
+
+  public int headerRow;
+
+  public int lastRow = MAX_ROWS;
+
+  public int firstColumn;
+
+  public int lastColumn;
+
+  public boolean allTextMode;
+
+  public String sheetName = "";
+
+  public int getHeaderRow() {
+    return headerRow;
+  }
+
+  public int getLastRow() {
+    return lastRow;
+  }
+
+  public String getSheetName() {
+    return sheetName;
+  }
+
+  public int getFirstColumn() {
+    return firstColumn;
+  }
+
+  public int getLastColumn() {
+    return lastColumn;
+  }
+
+  public boolean getAllTextMode() {
+    return allTextMode;
+  }
+
+  public ExcelReaderConfig getReaderConfig(ExcelFormatPlugin plugin) {
+    ExcelReaderConfig readerConfig = new ExcelReaderConfig(plugin);
+    return readerConfig;
+  }
+
+  @JsonInclude(JsonInclude.Include.NON_DEFAULT)
+  public List getExtensions() {
+    if (extensions == null) {
+      return DEFAULT_EXTS;
+    }
+    return extensions;
+  }
+
+  @Override
+  public int hashCode() {
+    return Arrays.hashCode(
+      new Object[]{extensions, headerRow, lastRow, sheetName, firstColumn, lastColumn, allTextMode});
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+    if (this == obj) {
+      return true;
+    }
+    if (obj == null || getClass() != obj.getClass()) {
+      return false;
+    }
+    ExcelFormatConfig other = (ExcelFormatConfig) obj;
+    return Objects.equal(headerRow, other.headerRow)
 
 Review comment:
   Use native Java equivalent instead of Guava `Objects`
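
A minimal sketch of what the native replacement could look like, assuming `java.util.Objects` (Java 7+) is the intended equivalent. The class `ConfigSketch` and its reduced field set are hypothetical stand-ins for `ExcelFormatConfig`, not the code from the pull request:

```java
import java.util.List;
import java.util.Objects;

// Hypothetical, trimmed-down stand-in for ExcelFormatConfig (not the PR's code).
final class ConfigSketch {
  final List<String> extensions;
  final int headerRow;
  final String sheetName;

  ConfigSketch(List<String> extensions, int headerRow, String sheetName) {
    this.extensions = extensions;
    this.headerRow = headerRow;
    this.sheetName = sheetName;
  }

  @Override
  public int hashCode() {
    // java.util.Objects.hash can stand in for Arrays.hashCode(new Object[]{...}).
    return Objects.hash(extensions, headerRow, sheetName);
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (obj == null || getClass() != obj.getClass()) {
      return false;
    }
    ConfigSketch other = (ConfigSketch) obj;
    // java.util.Objects.equals is null-safe, like Guava's Objects.equal.
    return headerRow == other.headerRow
        && Objects.equals(extensions, other.extensions)
        && Objects.equals(sheetName, other.sheetName);
  }
}
```

With this shape the shaded Guava `Objects` import is no longer needed in the class.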
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960840#comment-16960840
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r339433311
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelFormatConfig.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.exec.store.excel.ExcelBatchReader.ExcelReaderConfig;
+
+import java.util.Arrays;
+import java.util.List;
+
+@JsonTypeName(ExcelFormatPlugin.DEFAULT_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ExcelFormatConfig implements FormatPluginConfig {
+
+  private static final List DEFAULT_EXTS = ImmutableList.of("xlsx");
 
 Review comment:
   Please prefer native Java methods instead of Guava: `Arrays.asList`
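
A small sketch of the swap, with one caveat worth keeping in mind: `Arrays.asList` returns a fixed-size but writable list (`set` works, `add` throws), whereas Guava's `ImmutableList.of` rejects all mutation. Wrapping the list in `Collections.unmodifiableList` is one way to keep the constant read-only. The class name `DefaultExts` is hypothetical:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical holder class, used only to illustrate the two JDK options.
final class DefaultExts {
  // Plain JDK replacement for ImmutableList.of("xlsx").
  static final List<String> PLAIN = Arrays.asList("xlsx");

  // Read-only variant: any attempt to modify it throws
  // UnsupportedOperationException.
  static final List<String> READ_ONLY =
      Collections.unmodifiableList(Arrays.asList("xlsx"));
}
```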
 



> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960841#comment-16960841
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r339433777
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelFormatConfig.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.exec.store.excel.ExcelBatchReader.ExcelReaderConfig;
+
+import java.util.Arrays;
+import java.util.List;
+
+@JsonTypeName(ExcelFormatPlugin.DEFAULT_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ExcelFormatConfig implements FormatPluginConfig {
+
+  private static final List DEFAULT_EXTS = ImmutableList.of("xlsx");
+
+  // This is the theoretical maximum number of rows in an Excel spreadsheet
+  private final int MAX_ROWS = 1048576;
+
+  public List extensions;
 
 Review comment:
   You can assign the default extensions right away and simply use `public List 
getExtensions() { return extensions; }` without additional logic, similar to 
what you do for `sheetName`.
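
A minimal sketch of that suggestion, assuming the field keeps its public visibility. The class `ExtensionsSketch` is a hypothetical, trimmed-down stand-in for the config class, and `Collections.singletonList` is just one JDK way to build the default:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical, trimmed-down config class illustrating the suggestion.
final class ExtensionsSketch {
  // Default assigned at declaration, the same way sheetName defaults to "".
  public List<String> extensions = Collections.singletonList("xlsx");

  public String sheetName = "";

  public List<String> getExtensions() {
    // No null check is needed for the default case.
    return extensions;
  }
}
```

If the field is later assigned `null` explicitly, the getter returns `null`, so this form relies on callers not doing that.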
 



> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960833#comment-16960833
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339430477
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ShpFormatConfig implements FormatPluginConfig {
+
+  private static final List DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
+
+  public List extensions;
+
+  @JsonInclude(JsonInclude.Include.NON_DEFAULT)
+  public List getExtensions() {
+if (extensions == null) {
+  return DEFAULT_EXTS;
+}
+return extensions;
+  }
+
+  public ShpBatchReader.ShpReaderConfig getReaderConfig(ShpFormatPlugin plugin) {
+ShpBatchReader.ShpReaderConfig readerConfig = new ShpBatchReader.ShpReaderConfig(plugin);
+
+return readerConfig;
+  }
+
+  @Override
+  public int hashCode() {
+return Arrays.hashCode(new Object[]{extensions});
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+if (this == obj) {
+  return true;
+}
+if (obj == null || getClass() != obj.getClass()) {
+  return false;
+}
+ShpFormatConfig other = (ShpFormatConfig)obj;
+return Objects.equal(extensions, other.getExtensions() );
 
 Review comment:
   Is there an analogous method that is not from Guava? We should try to use 
built-in Java methods when possible...
   
   ```suggestion
   return Objects.equal(extensions, other.getExtensions());
   ```
 



> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)





[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960830#comment-16960830
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339429171
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ShpFormatConfig implements FormatPluginConfig {
+
+  private static final List DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
 
 Review comment:
   No, I meant set `List extensions = Arrays.asList("shp", "dbf");`
   and `public List getExtensions() { return extensions; }`
   
   In the test class I see you only read from the shp extension. Is dbf also 
valid? I thought it is used for the second of the three files. As far as I 
understand, only shp will be valid for Drill...
   So I think the proper code should be `List extensions = 
Collections.singletonList("shp");`
 



> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)





[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960835#comment-16960835
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339429916
 
 

 ##
 File path: 
contrib/format-esri/src/test/java/org/apache/drill/exec/store/esri/TestShapefileFormatPlugin.java
 ##
 @@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.server.Drillbit;
+import org.apache.drill.exec.store.StoragePluginRegistry;
+import org.apache.drill.exec.store.dfs.FileSystemConfig;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.test.BaseDirTestWatcher;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category(RowSetTests.class)
+public class TestShapefileFormatPlugin extends ClusterTest {
 
 Review comment:
   Drill does not support such functionality. Only one compressed file can be 
read at a time.
   In this case you need to use 
`negotiator.fileSystem().open(split.getPath());` and set `config.compressible 
= false;`. Also remove the unnecessary compression code from the test class.
 



> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)





[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960831#comment-16960831
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339432499
 
 

 ##
 File path: 
contrib/format-esri/src/test/java/org/apache/drill/exec/store/esri/TestShapefileFormatPlugin.java
 ##
 @@ -0,0 +1,161 @@
+/*
 
 Review comment:
   Why do you need to add 
`contrib/format-esri/src/test/resources/shapefiles/CA-cities.parquet` to the 
resources?
 



> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)





[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960834#comment-16960834
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339429242
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
 ##
 @@ -0,0 +1,318 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import com.esri.core.geometry.Geometry;
+import com.esri.core.geometry.GeometryCursor;
+import com.esri.core.geometry.ShapefileReader;
+import com.esri.core.geometry.SpatialReference;
+import com.esri.core.geometry.ogc.OGCGeometry;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+import org.jamel.dbf.DbfReader;
+import org.jamel.dbf.structure.DbfField;
+import org.joda.time.Instant;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ShpBatchReader implements ManagedReader {
+
+  private FileSplit split;
+  private ResultSetLoader loader;
+  private ShpReaderConfig readerConfig;
+  private Path hadoopShp;
+  private Path hadoopDbf;
+  private Path hadoopPrj;
+  private InputStream fileReaderShp = null;
+  private InputStream fileReaderDbf = null;
+  private InputStream fileReaderPrj = null;
+  private GeometryCursor geomCursor = null;
+  private DbfReader dbfReader = null;
+  private ScalarWriter gidWriter;
+  private ScalarWriter sridWriter;
+  private ScalarWriter shapeTypeWriter;
+  private ScalarWriter geomWriter;
+  private RowSetLoader rowWriter;
+  private int srid;
+  private SpatialReference spatialReference;
+  private static final Logger logger = LoggerFactory.getLogger(ShpBatchReader.class);
+
+  public static class ShpReaderConfig {
+protected final ShpFormatPlugin plugin;
+
+public ShpReaderConfig(ShpFormatPlugin plugin) {
+  this.plugin = plugin;
+}
+  }
+
+  public ShpBatchReader(ShpReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+this.split = negotiator.split();
+this.hadoopShp = split.getPath();
+
+String filePath = split.getPath().toString();
+this.hadoopDbf = new Path(filePath.replace(".shp", ".dbf"));
+this.hadoopPrj = new Path(filePath.replace(".shp", ".prj"));
+
+openFile(negotiator);
+SchemaBuilder builder = new SchemaBuilder()
+  .addNullable("gid", TypeProtos.MinorType.INT)
+  .addNullable("srid", TypeProtos.MinorType.INT)
+  .addNullable("shapeType", TypeProtos.MinorType.VARCHAR)
+  .addNullable("geom", TypeProtos.MinorType.VARBINARY);
+
+negotiator.setTableSchema(builder.buildSchema(), false);
+loader = negotiator.build();
+
+rowWriter = loader.writer();
+gidWriter = rowWriter.scalar("gid");
+sridWriter = rowWriter.scalar("srid");
+shapeTypeWriter = rowWriter.scalar("shapeType");
+geomWriter = rowWriter.scalar("geom");
+
+return true;
+  }
+
+  @Override
+  public boolean next() {
+Geometry geom = null;
+
+while (!rowWriter.isFull()) {
+  Object[] 

[jira] [Commented] (DRILL-4303) ESRI Shapefile (shp) format plugin

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960832#comment-16960832
 ] 

ASF GitHub Bot commented on DRILL-4303:
---

arina-ielchiieva commented on pull request #1858: DRILL-4303: ESRI Shapefile 
(shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r339430653
 
 

 ##
 File path: 
contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatConfig.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.esri;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Format plugin config for shapefile data files.
+ */
+@JsonTypeName(ShpFormatPlugin.PLUGIN_NAME)
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ShpFormatConfig implements FormatPluginConfig {
+
+  private static final List DEFAULT_EXTS = ImmutableList.of("shp", "dbf");
 
 Review comment:
   Please avoid using Guava: `ImmutableList.of` -> `Arrays.asList`
 



> ESRI Shapefile (shp) format plugin
> --
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.17.0
>Reporter: Karol Potocki
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read esri shapefiles, one of the most popular 
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - srid information, dbf - data fields, shp - 
> geometry data)





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960827#comment-16960827
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r339432072
 
 

 ##
 File path: 
protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java
 ##
 @@ -901,6 +905,10 @@ private FragmentState(int value) {
  * LTSV_SUB_SCAN = 62;
  */
 public static final int LTSV_SUB_SCAN_VALUE = 62;
+/**
+ * EXCEL_SUB_SCAN = 64;
+ */
+public static final int EXCEL_SUB_SCAN_VALUE = 64;
 
 Review comment:
   @cgivre did you generate the C++ code? Please check step 3 of the protobuf 
generation instructions:
   ```
   If changes are made to the DrillClient's protobuf, you would need to 
regenerate the sources for the C++ client as well.
   Steps for regenerating the sources are available 
https://github.com/apache/drill/blob/master/contrib/native/client/
   ```
   https://github.com/apache/drill/tree/master/protocol
 



> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960826#comment-16960826
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

arina-ielchiieva commented on pull request #1749: DRILL-7177: Format Plugin for 
Excel Files
URL: https://github.com/apache/drill/pull/1749#discussion_r339432072
 
 

 ##
 File path: 
protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java
 ##
 @@ -901,6 +905,10 @@ private FragmentState(int value) {
  * LTSV_SUB_SCAN = 62;
  */
 public static final int LTSV_SUB_SCAN_VALUE = 62;
+/**
+ * EXCEL_SUB_SCAN = 64;
+ */
+public static final int EXCEL_SUB_SCAN_VALUE = 64;
 
 Review comment:
   @cgivre did you generate the C++ code? Please check step 3 of the protobuf 
generation instructions:
   ```
   If changes are made to the DrillClient's protobuf, you would need to 
regenerate the sources for the C++ client as well.
   Steps for regenerating the sources are available 
https://github.com/apache/drill/blob/master/contrib/native/client/
   ```
   https://github.com/apache/drill/tree/master/protocol
 



> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 





[jira] [Updated] (DRILL-7424) Project operator fails to set the container row count

2019-10-28 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7424:

Reviewer: Vova Vysotskyi

> Project operator fails to set the container row count
> -
>
> Key: DRILL-7424
> URL: https://issues.apache.org/jira/browse/DRILL-7424
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.16.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.17.0
>
>
> Enabled the "batch validator" for the Project operator. Ran tests. Exceptions 
> occurred because, in some paths, the Project operator fails to set the 
> container row count.





[jira] [Updated] (DRILL-7424) Project operator fails to set the container row count

2019-10-28 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7424:

Fix Version/s: 1.17.0

> Project operator fails to set the container row count
> -
>
> Key: DRILL-7424
> URL: https://issues.apache.org/jira/browse/DRILL-7424
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.17.0
>
>
> Enabled the "batch validator" for the Project operator. Ran tests. Exceptions 
> occurred because, in some paths, the Project operator fails to set the 
> container row count.





[jira] [Updated] (DRILL-7424) Project operator fails to set the container row count

2019-10-28 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7424:

Affects Version/s: 1.16.0

> Project operator fails to set the container row count
> -
>
> Key: DRILL-7424
> URL: https://issues.apache.org/jira/browse/DRILL-7424
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.16.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.17.0
>
>
> Enabled the "batch validator" for the Project operator. Ran tests. Exceptions 
> occurred because, in some paths, the Project operator fails to set the 
> container row count.


