[jira] [Commented] (DRILL-7403) Validate batch checks, vector integrity in unit tests

2019-10-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954272#comment-16954272
 ] 

ASF GitHub Bot commented on DRILL-7403:
---

paul-rogers commented on issue #1871: DRILL-7403: Validate batch checks, vector 
integrity in unit tests
URL: https://github.com/apache/drill/pull/1871#issuecomment-543515384
 
 
   @arina-ielchiieva, addressed the comments. Since they were minor, went ahead 
and squashed commits. Local tests passed up to the one that usually fails for 
me:
   
   ```
   [ERROR] Errors: 
   [ERROR]   TestDynamicUDFSupport.testReRegisterTheSameJarWithDifferentContent:600->BaseTestQuery.testRunAndReturn:340 » Rpc
   ```
   
   Tried enabling the check for only the "new" scan. But, somehow, this still 
checked the Parquet reader:
   
   ```
   [INFO] Running org.apache.drill.exec.store.parquet2.TestDrillParquetReader
   columns-offsets - UInt4Vector: Invalid offset at index 2049 = 4098 exceeds maximum of 4096
   columns-offsets - UInt4Vector: Invalid offset at index 2050 = 4100 exceeds maximum of 4096
   ```
   
   Since this PR is just about introducing the test code, I went ahead and 
disabled calls to the code. Later PRs will try to enable the checks 
operator-by-operator so we can find issues gradually.
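
For readers unfamiliar with the kind of check that produced the log lines above, here is a minimal sketch. It is not the `BatchValidator` code from the PR: it uses a plain `int[]` in place of a `UInt4Vector`, and the class name, method name, and the two extra checks (first offset is zero, offsets never decrease) are assumptions made for illustration. Only the "exceeds maximum" message is modeled on the log output.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for an offset-vector check; all names here are hypothetical.
final class OffsetCheckSketch {

  // Returns one message per offending entry; an empty list means the offsets look sane.
  static List<String> validateOffsets(String name, int[] offsets, int valueCount, int maxOffset) {
    List<String> errors = new ArrayList<>();
    if (valueCount == 0) {
      return errors; // nothing to check for an empty batch
    }
    // n values need n + 1 offsets: offsets[i]..offsets[i + 1] bounds value i.
    if (offsets.length < valueCount + 1) {
      errors.add(String.format("%s: expected %d offsets, found %d",
          name, valueCount + 1, offsets.length));
      return errors;
    }
    if (offsets[0] != 0) {
      errors.add(String.format("%s: offset at index 0 must be 0 but was %d", name, offsets[0]));
    }
    int prev = offsets[0];
    for (int i = 1; i <= valueCount; i++) {
      int offset = offsets[i];
      if (offset < prev) {
        errors.add(String.format("%s: offset at index %d = %d is less than previous %d",
            name, i, offset, prev));
      } else if (offset > maxOffset) {
        // Same shape as the log lines quoted above: the offset points past the data buffer.
        errors.add(String.format("%s: Invalid offset at index %d = %d exceeds maximum of %d",
            name, i, offset, maxOffset));
      }
      prev = offset;
    }
    return errors;
  }
}
```

With a data length of 4096, an entry such as index 2049 = 4098 would be reported by the last branch, which is the situation the Parquet reader log shows.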
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Validate batch checks, vector integrity in unit tests
> ---
>
> Key: DRILL-7403
> URL: https://issues.apache.org/jira/browse/DRILL-7403
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.16.0, 1.17.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
> Fix For: 1.17.0
>
>
> Drill provides a {{BatchValidator}} that checks vectors. It is disabled by 
> default. This enhancement adds more checks, including checks for row counts 
> (of which there are surprisingly many).
> Since most operators will fail if the check is enabled, this enhancement also 
> adds a table that tracks which operators pass the checks (and so should have 
> the checks enabled) and which still need work. This allows the checks to 
> exist in the code and to be enabled incrementally as we fix the various 
> problems.
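
One way to picture the tracking table described above is a small lookup from operator to check level. The sketch below is only an illustration of that idea: the description does not say how the table is keyed or what levels exist, so the string keys, the `CheckLevel` enum, and every name here are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-operator table; operators not listed get no validation.
final class ValidationTableSketch {

  // Assumed levels: nothing, row-count checks only, or full vector checks.
  enum CheckLevel { NONE, COUNTS_ONLY, FULL }

  private final Map<String, CheckLevel> table = new HashMap<>();

  // Record that an operator is known to pass checks up to the given level.
  void enable(String operatorName, CheckLevel level) {
    table.put(operatorName, level);
  }

  // Operators that still need work are simply absent, so they default to NONE.
  CheckLevel levelFor(String operatorName) {
    return table.getOrDefault(operatorName, CheckLevel.NONE);
  }
}
```

Defaulting to `NONE` matches the incremental approach in the description: an operator is only checked once someone has fixed it and added it to the table.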



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7403) Validate batch checks, vector integrity in unit tests

2019-10-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954271#comment-16954271
 ] 

ASF GitHub Bot commented on DRILL-7403:
---

paul-rogers commented on issue #1871: DRILL-7403: Validate batch checks, vector 
integrity in unit tests
URL: https://github.com/apache/drill/pull/1871#issuecomment-543515384
 
 
   @arina-ielchiieva, addressed the comments. Since they were minor, went ahead 
and squashed commits. Local tests passed up to the one that usually fails for 
me:
   
   ```
   [ERROR] Errors: 
   [ERROR]   TestDynamicUDFSupport.testReRegisterTheSameJarWithDifferentContent:600->BaseTestQuery.testRunAndReturn:340 » Rpc
   ```
   
   Please let me know if any of the pre-commit tests fail; there is a slight 
chance that even the one case where the check is now enabled will turn up an 
issue.
   
   If this all works, next step will be to enable testing, and submit fixes, 
for each operator one by one.
 



[jira] [Commented] (DRILL-7403) Validate batch checks, vector integrity in unit tests

2019-10-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954269#comment-16954269
 ] 

ASF GitHub Bot commented on DRILL-7403:
---

paul-rogers commented on issue #1871: DRILL-7403: Validate batch checks, vector 
integrity in unit tests
URL: https://github.com/apache/drill/pull/1871#issuecomment-543515384
 
 
   @arina-ielchiieva, addressed the comments. Since they were minor, went ahead 
and squashed commits. 
 



[jira] [Commented] (DRILL-7403) Validate batch checks, vector integrity in unit tests

2019-10-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954266#comment-16954266
 ] 

ASF GitHub Bot commented on DRILL-7403:
---

paul-rogers commented on pull request #1871: DRILL-7403: Validate batch checks, 
vector integrity in unit tests
URL: https://github.com/apache/drill/pull/1871#discussion_r336315771
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/validate/BatchValidator.java
 ##
 @@ -101,47 +290,90 @@ private void validateVector(ValueVector vector) {
     }
   }
 
-  private void validateVariableWidthVector(String name, VariableWidthVector vector, int entryCount) {
+  private void validateNullableVector(String name, NullableVector vector) {
+    int outerCount = vector.getAccessor().getValueCount();
+    ValueVector valuesVector = vector.getValuesVector();
+    int valueCount = valuesVector.getAccessor().getValueCount();
+    if (valueCount != outerCount) {
+      error(name, vector, String.format(
+          "Outer value count = %d, but inner value count = %d",
+          outerCount, valueCount));
+    }
+    verifyIsSetVector(vector, (UInt1Vector) vector.getBitsVector());
+    validateVector(name + "-values", valuesVector);
+  }
+
+  private void validateVariableWidthVector(String name, VariableWidthVector vector) {
 
     // Offsets are in the derived classes. Handle only VarChar for now.
 
     if (vector instanceof VarCharVector) {
-      validateVarCharVector(name, (VarCharVector) vector, entryCount);
+      validateVarCharVector(name, (VarCharVector) vector);
     } else {
       logger.debug("Don't know how to validate vector: " + name + " of class " + vector.getClass().getSimpleName());
     }
   }
 
-  private void validateVarCharVector(String name, VarCharVector vector, int entryCount) {
-//    int dataLength = vector.getAllocatedByteCount(); // Includes offsets and data.
-    int dataLength = vector.getBuffer().capacity();
-    validateOffsetVector(name + "-offsets", vector.getOffsetVector(), entryCount, dataLength);
+  private void validateVarCharVector(String name, VarCharVector vector) {
+    int size = vector.getAccessor().getValueCount();
+
+    // Disabled because a large number of operators
+    // set up offset vectors wrongly.
+    if (size == 0) {
+      return;
+    }
+
+    int dataLength = vector.getBuffer().writerIndex();
+    validateOffsetVector(name + "-offsets", vector.getOffsetVector(), false, size, dataLength);
   }
 
   private void validateRepeatedVector(String name, BaseRepeatedValueVector vector) {
-
     int dataLength = Integer.MAX_VALUE;
     if (vector instanceof RepeatedVarCharVector) {
-      dataLength = ((RepeatedVarCharVector) vector).getOffsetVector().getValueCapacity();
+      dataLength = ((RepeatedVarCharVector) vector).getDataVector().getBuffer().writerIndex();
     } else if (vector instanceof RepeatedFixedWidthVectorLike) {
-      dataLength = ((BaseDataValueVector) vector.getDataVector()).getBuffer().capacity();
+      dataLength = ((BaseDataValueVector) vector.getDataVector()).getBuffer().writerIndex();
     }
-    int itemCount = validateOffsetVector(name + "-offsets", vector.getOffsetVector(), rowCount, dataLength);
+    int valueCount = vector.getAccessor().getValueCount();
+    int itemCount = validateOffsetVector(name + "-offsets", vector.getOffsetVector(), true, valueCount, dataLength);
 
     // Special handling of repeated VarChar vectors
     // The nested data vectors are not quite exactly like top-level vectors.
 
     ValueVector dataVector = vector.getDataVector();
+    if (dataVector.getAccessor().getValueCount() != itemCount) {
+      error(name, vector, String.format(
+          "Vector has %d values, but offset vector labels %d values",
+          valueCount, itemCount));
+    }
     if (dataVector instanceof VariableWidthVector) {
-      validateVariableWidthVector(name + "-data", (VariableWidthVector) dataVector, itemCount);
+      validateVariableWidthVector(name + "-data", (VariableWidthVector) dataVector);
     }
   }
 
-  private int validateOffsetVector(String name, UInt4Vector offsetVector, int valueCount, int maxOffset) {
+  private void validateFixedWidthVector(String name, FixedWidthVector vector) {
+    // Not much to do
+  }
+
+  private int validateOffsetVector(String name, UInt4Vector offsetVector, boolean repeated, int valueCount, int maxOffset) {
+    UInt4Vector.Accessor accessor = offsetVector.getAccessor();
+    int offsetCount = accessor.getValueCount();
+    // Disabled because a large number of operators
 
 Review comment:
   This is a problem, but one to fix after the basics are fixed. Changed it to 
a TODO.
 


[jira] [Commented] (DRILL-7402) Suppress batch dumps for expected failures in tests

2019-10-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954246#comment-16954246
 ] 

ASF GitHub Bot commented on DRILL-7402:
---

paul-rogers commented on issue #1872: DRILL-7402: Suppress batch dumps for 
expected failures in tests
URL: https://github.com/apache/drill/pull/1872#issuecomment-543489731
 
 
   @arina-ielchiieva, fixed the import issue. To save time, went ahead and 
squashed commits. 
 



> Suppress batch dumps for expected failures in tests
> ---
>
> Key: DRILL-7402
> URL: https://issues.apache.org/jira/browse/DRILL-7402
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.16.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
> Fix For: 1.17.0
>
>
> Drill provides a way to dump the last few batches when an error occurs. 
> However, in tests, we often deliberately cause something to fail. In this 
> case, the batch dump is unnecessary.
> This enhancement adds a config property, disabled in tests, that controls the 
> dump activity.
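
As a rough illustration of the gating described above, the sketch below reads a boolean property and skips the dump when it is off. The description does not name the actual property or say how Drill reads it, so the key `example.debug.dump_batches`, the use of `java.util.Properties`, the default of `true`, and all class and method names are assumptions.

```java
import java.util.Properties;

// Hypothetical gate around a batch dump; not Drill's actual config mechanism.
final class BatchDumpSketch {

  // Made-up key for illustration; the real property name is not given above.
  static final String DUMP_BATCHES_KEY = "example.debug.dump_batches";

  // Assumed default: dumps stay on unless a config (e.g. the test config) turns them off.
  static boolean dumpEnabled(Properties config) {
    return Boolean.parseBoolean(config.getProperty(DUMP_BATCHES_KEY, "true"));
  }

  // Returns the text that would be logged on failure, or an empty string when suppressed.
  static String onFailure(Properties config, String batchSummary) {
    return dumpEnabled(config) ? "Batch dump: " + batchSummary : "";
  }
}
```

A test setup would then set the property to `false` once, so that tests which fail on purpose do not produce dump output.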





[jira] [Commented] (DRILL-7405) Build fails due to inaccessible apache-drill on S3 storage

2019-10-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954216#comment-16954216
 ] 

ASF GitHub Bot commented on DRILL-7405:
---

Agirish commented on issue #1874: DRILL-7405: Avoiding download of TPC-H data
URL: https://github.com/apache/drill/pull/1874#issuecomment-543463326
 
 
   Unit tests as well as Functional & Advanced Regression tests from [1] are 
successful.
   
   [1] https://github.com/mapr/drill-test-framework
 



> Build fails due to inaccessible apache-drill on S3 storage
> --
>
> Key: DRILL-7405
> URL: https://issues.apache.org/jira/browse/DRILL-7405
> Project: Apache Drill
> Issue Type: Bug
> Components: Tools, Build & Test
> Affects Versions: 1.16.0
> Reporter: Boaz Ben-Zvi
> Assignee: Abhishek Girish
> Priority: Critical
> Fix For: 1.17.0
>
>
> A new clean build (e.g. after deleting the ~/.m2 local repository) now fails 
> due to:  
> Access denied to: 
> http://apache-drill.s3.amazonaws.com/files/sf-0.01_tpc-h_parquet_typed.tgz
> (e.g., for the test data sf-0.01_tpc-h_parquet_typed.tgz)
> A new publicly available storage place is needed, plus appropriate changes in 
> Drill to get to these resources.





[jira] [Comment Edited] (DRILL-7405) Build fails due to inaccessible apache-drill on S3 storage

2019-10-17 Thread Abhishek Girish (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954179#comment-16954179
 ] 

Abhishek Girish edited comment on DRILL-7405 at 10/18/19 1:12 AM:
--

Switching priority to Critical - as the S3 link will only be available for a 
short period. 

I have a PR [1] open - it moves the files to GitHub as they are just a few MB 
in size. [~shamirwasia]/[~sorabh] can you please take a look?

[1] https://github.com/apache/drill/pull/1874 


was (Author: agirish):
Switching priority to Critical - as the S3 link will only be available for a 
short period. 

I have a PR open - it moves the files to GitHub as they are just a few MB in 
size.

https://github.com/apache/drill/pull/1874

[~shamirwasia]/[~sorabh] can you please take a look?



[jira] [Commented] (DRILL-7405) Build fails due to inaccessible apache-drill on S3 storage

2019-10-17 Thread Abhishek Girish (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954179#comment-16954179
 ] 

Abhishek Girish commented on DRILL-7405:


Switching priority to Critical - as the S3 link will only be available for a 
short period. 

I have a PR open - it moves the files to GitHub as they are just a few MB in 
size.

https://github.com/apache/drill/pull/1874

[~shamirwasia]/[~sorabh] can you please take a look?



[jira] [Updated] (DRILL-7405) Build fails due to inaccessible apache-drill on S3 storage

2019-10-17 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7405:
---
Priority: Critical  (was: Blocker)



[jira] [Updated] (DRILL-7405) Build fails due to inaccessible apache-drill on S3 storage

2019-10-17 Thread Abhishek Girish (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-7405:
---
Reviewer: Sorabh Hamirwasia
Priority: Blocker  (was: Minor)



[jira] [Commented] (DRILL-7405) Build fails due to inaccessible apache-drill on S3 storage

2019-10-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954178#comment-16954178
 ] 

ASF GitHub Bot commented on DRILL-7405:
---

Agirish commented on pull request #1874: DRILL-7405: Avoiding download of TPC-H 
data
URL: https://github.com/apache/drill/pull/1874
 
 
   
 



[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953767#comment-16953767
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

cgivre commented on pull request #1749: DRILL-7177: Format Plugin for Excel 
Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335696309
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
 ##
 @@ -0,0 +1,398 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.CellValue;
+import org.apache.poi.ss.usermodel.DateUtil;
+import org.apache.poi.ss.usermodel.FormulaEvaluator;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.xssf.usermodel.XSSFSheet;
+import org.apache.poi.xssf.usermodel.XSSFWorkbook;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.joda.time.Instant;
+
+import java.util.Iterator;
+import java.io.IOException;
+import java.util.ArrayList;
+
+public class ExcelBatchReader implements ManagedReader<FileSchemaNegotiator> {
+  private ExcelReaderConfig readerConfig;
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ExcelBatchReader.class);
+
+  private XSSFWorkbook workbook;
+
+  private FSDataInputStream fsStream;
+
+  private static final String SAFE_WILDCARD = "_$";
+
+  private static final String SAFE_SEPARATOR = "_";
+
+  private static final String PARSER_WILDCARD = ".*";
+
+  private static final String MISSING_FIELD_NAME_HEADER = "field_";
+
+  private static final String SAFE_NEWLINE_REPLACEMENT = " ";
+
+  private XSSFSheet sheet;
+
+  private FormulaEvaluator evaluator;
+
+  private ArrayList<String> excelFieldNames;
+
+  private Iterator<Row> rowIterator;
+
+  private int totalColumnCount;
+
+  private int lineCount;
+
+  private FileSplit split;
+
+  private ResultSetLoader loader;
+
+  private int recordCount;
+
+  public static class ExcelReaderConfig {
+    protected final ExcelFormatPlugin plugin;
+
+    protected int headerRow;
+
+    protected int lastRow;
+
+    protected int firstColumn;
+
+    protected int lastColumn;
+
+    protected boolean readAllFieldsAsVarChar;
+
+    protected boolean evaluateFormulae;
+
+    protected TupleMetadata schema;
+
+    protected String sheetName;
+
+    public ExcelReaderConfig(ExcelFormatPlugin plugin, int headerRow, int lastRow, int firstColumn, int lastColumn, boolean readAllFieldsAsVarChar, boolean evaluateFormulae,
+                             //TupleMetadata schema,
+                             String sheetName) {
+      this.plugin = plugin;
+      this.headerRow = headerRow;
+      this.lastRow = lastRow;
+      this.firstColumn = firstColumn;
+      this.lastColumn = lastColumn;
+      this.readAllFieldsAsVarChar = readAllFieldsAsVarChar;
+      this.evaluateFormulae = evaluateFormulae;
+      this.sheetName = sheetName;
+
+    }
+  }
+
+  public ExcelBatchReader(ExcelReaderConfig readerConfig) {
+    this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    verifyConfigOptions();
+    split = negotiator.split();
+

[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953768#comment-16953768
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

cgivre commented on pull request #1749: DRILL-7177: Format Plugin for Excel 
Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335699083
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelFormatConfig.java
 ##
 @@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.shaded.guava.com.google.common.base.Objects;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.exec.store.excel.ExcelBatchReader.ExcelReaderConfig;
+
+import java.util.Arrays;
+import java.util.List;
+
+@JsonTypeName("excel")
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class ExcelFormatConfig implements FormatPluginConfig {
+  //This is the theoretical maximum number of rows in an Excel spreadsheet
+  private final int MAX_ROWS = 1048576;
+
+  public List<String> extensions;
+
+  public int headerRow = 0;
+
+  public int lastRow = MAX_ROWS;
+
+  public int firstColumn = 0;
+
+  public int lastColumn = 0;
+
+  public boolean readAllFieldsAsVarChar = false;
+
+  public boolean evaluateFormulae = true;
+
+  public String sheetName = "";
+
+  public ExcelFormatConfig() {
+  }
+
+  private static final List<String> DEFAULT_EXTS = ImmutableList.of("xlsx");
+
+  public boolean getEvaluateFormulae() {
+    return evaluateFormulae;
+  }
+
+  public int getHeaderRow() {
+    return headerRow;
+  }
+
+  public int getLastRow() {
+    return lastRow;
+  }
+
+  public String getSheetName() {
+    return sheetName;
+  }
+
+  public int getFirstColumn() {
+    return firstColumn;
+  }
+
+  public int getLastColumn() {
+    return lastColumn;
+  }
+
+  public boolean getReadAllFieldsAsVarChar() {
+    return readAllFieldsAsVarChar;
+  }
+
+  public void setHeaderRow(int row) {
+    this.headerRow = row;
+  }
+
+  public void setLastRow(int row) {
+    this.lastRow = row;
+  }
+
+  public void setFirstColumn(int column) {
+    this.firstColumn = column;
+  }
+
+  public void setLastColumn(int column) {
+    this.lastColumn = column;
+  }
+
+  public void setSheetName(String sn) {
+    this.sheetName = sn;
+  }
+
+  public void setEvaluateFormulae(boolean value) {
+    this.evaluateFormulae = value;
+  }
+
+  public ExcelReaderConfig getReaderConfig(ExcelFormatPlugin plugin) {
 
 Review comment:
   Done
 



> Format Plugin for Excel Files
> -
>
> Key: DRILL-7177
> URL: https://issues.apache.org/jira/browse/DRILL-7177
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Labels: doc-impacting
> Fix For: 1.17.0
>
>
> This pull request adds the functionality which enables Drill to query 
> Microsoft Excel files. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953769#comment-16953769
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

cgivre commented on pull request #1749: DRILL-7177: Format Plugin for Excel 
Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335698017
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
 ##
 @@ -0,0 +1,398 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.CellValue;
+import org.apache.poi.ss.usermodel.DateUtil;
+import org.apache.poi.ss.usermodel.FormulaEvaluator;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.xssf.usermodel.XSSFSheet;
+import org.apache.poi.xssf.usermodel.XSSFWorkbook;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.joda.time.Instant;
+
+import java.util.Iterator;
+import java.io.IOException;
+import java.util.ArrayList;
+
+public class ExcelBatchReader implements ManagedReader<FileSchemaNegotiator> {
+  private ExcelReaderConfig readerConfig;
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ExcelBatchReader.class);
+
+  private XSSFWorkbook workbook;
+
+  private FSDataInputStream fsStream;
+
+  private static final String SAFE_WILDCARD = "_$";
+
+  private static final String SAFE_SEPARATOR = "_";
+
+  private static final String PARSER_WILDCARD = ".*";
+
+  private static final String MISSING_FIELD_NAME_HEADER = "field_";
+
+  private static final String SAFE_NEWLINE_REPLACEMENT = " ";
+
+  private XSSFSheet sheet;
+
+  private FormulaEvaluator evaluator;
+
+  private ArrayList<String> excelFieldNames;
+
+  private Iterator<Row> rowIterator;
+
+  private int totalColumnCount;
+
+  private int lineCount;
+
+  private FileSplit split;
+
+  private ResultSetLoader loader;
+
+  private int recordCount;
+
+  public static class ExcelReaderConfig {
+protected final ExcelFormatPlugin plugin;
+
+protected int headerRow;
+
+protected int lastRow;
+
+protected int firstColumn;
+
+protected int lastColumn;
+
+protected boolean readAllFieldsAsVarChar;
+
+protected boolean evaluateFormulae;
+
+protected TupleMetadata schema;
+
+protected String sheetName;
+
+public ExcelReaderConfig(ExcelFormatPlugin plugin, int headerRow, int lastRow, int firstColumn, int lastColumn, boolean readAllFieldsAsVarChar, boolean evaluateFormulae,
+ //TupleMetadata schema,
+ String sheetName) {
+  this.plugin = plugin;
+  this.headerRow = headerRow;
+  this.lastRow = lastRow;
+  this.firstColumn = firstColumn;
+  this.lastColumn = lastColumn;
+  this.readAllFieldsAsVarChar = readAllFieldsAsVarChar;
+  this.evaluateFormulae = evaluateFormulae;
+  this.sheetName = sheetName;
+
+}
+  }
+
+  public ExcelBatchReader(ExcelReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+verifyConfigOptions();
+split = negotiator.split();
+

[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953765#comment-16953765
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

cgivre commented on pull request #1749: DRILL-7177: Format Plugin for Excel 
Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335694604
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
 ##
 @@ -0,0 +1,398 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.CellValue;
+import org.apache.poi.ss.usermodel.DateUtil;
+import org.apache.poi.ss.usermodel.FormulaEvaluator;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.xssf.usermodel.XSSFSheet;
+import org.apache.poi.xssf.usermodel.XSSFWorkbook;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.joda.time.Instant;
+
+import java.util.Iterator;
+import java.io.IOException;
+import java.util.ArrayList;
+
+public class ExcelBatchReader implements ManagedReader<FileSchemaNegotiator> {
+  private ExcelReaderConfig readerConfig;
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ExcelBatchReader.class);
+
+  private XSSFWorkbook workbook;
+
+  private FSDataInputStream fsStream;
+
+  private static final String SAFE_WILDCARD = "_$";
+
+  private static final String SAFE_SEPARATOR = "_";
+
+  private static final String PARSER_WILDCARD = ".*";
+
+  private static final String MISSING_FIELD_NAME_HEADER = "field_";
+
+  private static final String SAFE_NEWLINE_REPLACEMENT = " ";
+
+  private XSSFSheet sheet;
+
+  private FormulaEvaluator evaluator;
+
+  private ArrayList<String> excelFieldNames;
+
+  private Iterator<Row> rowIterator;
+
+  private int totalColumnCount;
+
+  private int lineCount;
+
+  private FileSplit split;
+
+  private ResultSetLoader loader;
+
+  private int recordCount;
+
+  public static class ExcelReaderConfig {
+protected final ExcelFormatPlugin plugin;
+
+protected int headerRow;
+
+protected int lastRow;
+
+protected int firstColumn;
+
+protected int lastColumn;
+
+protected boolean readAllFieldsAsVarChar;
+
+protected boolean evaluateFormulae;
+
+protected TupleMetadata schema;
+
+protected String sheetName;
+
+public ExcelReaderConfig(ExcelFormatPlugin plugin, int headerRow, int lastRow, int firstColumn, int lastColumn, boolean readAllFieldsAsVarChar, boolean evaluateFormulae,
+ //TupleMetadata schema,
+ String sheetName) {
+  this.plugin = plugin;
+  this.headerRow = headerRow;
+  this.lastRow = lastRow;
+  this.firstColumn = firstColumn;
+  this.lastColumn = lastColumn;
+  this.readAllFieldsAsVarChar = readAllFieldsAsVarChar;
+  this.evaluateFormulae = evaluateFormulae;
+  this.sheetName = sheetName;
+
+}
+  }
+
+  public ExcelBatchReader(ExcelReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+verifyConfigOptions();
 
 Review comment:
   Where would you suggest putting 

[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953770#comment-16953770
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

cgivre commented on pull request #1749: DRILL-7177: Format Plugin for Excel 
Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335700464
 
 

 ##
 File path: 
protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java
 ##
 @@ -980,6 +988,7 @@ public static CoreOperatorType forNumber(int value) {
 case 60: return UNPIVOT_MAPS;
 case 61: return STATISTICS_MERGE;
 case 62: return LTSV_SUB_SCAN;
+case 64: return EXCEL_SUB_SCAN;
 
 Review comment:
   I was saving 63 for HDF5.  64 is ok.
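
   The numbering point above can be illustrated with a small sketch. This is not the generated `UserBitShared` code: `OperatorTypeSketch` is a hypothetical stand-in that copies only the two numbers visible in the diff, and it assumes the usual protobuf-Java convention that `forNumber` returns `null` for a number with no enum constant. Under that assumption, leaving 63 unassigned is harmless and the slot stays free for a later operator.

   ```java
   // Hypothetical stand-in for a generated protobuf enum (not Drill's CoreOperatorType).
   enum OperatorTypeSketch {
     LTSV_SUB_SCAN(62),
     // 63 is deliberately left unassigned, as in the review comment (held for HDF5).
     EXCEL_SUB_SCAN(64);

     private final int number;

     OperatorTypeSketch(int number) {
       this.number = number;
     }

     int getNumber() {
       return number;
     }

     // Maps a wire number back to a constant; unknown numbers (including the gap) yield null.
     static OperatorTypeSketch forNumber(int value) {
       switch (value) {
         case 62: return LTSV_SUB_SCAN;
         case 64: return EXCEL_SUB_SCAN;
         default: return null;
       }
     }
   }

   class OperatorTypeSketchDemo {
     public static void main(String[] args) {
       System.out.println(OperatorTypeSketch.forNumber(64)); // the Excel constant
       System.out.println(OperatorTypeSketch.forNumber(63)); // no constant assigned yet
     }
   }
   ```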
 



[jira] [Commented] (DRILL-7177) Format Plugin for Excel Files

2019-10-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953766#comment-16953766
 ] 

ASF GitHub Bot commented on DRILL-7177:
---

cgivre commented on pull request #1749: DRILL-7177: Format Plugin for Excel 
Files
URL: https://github.com/apache/drill/pull/1749#discussion_r335697332
 
 

 ##
 File path: 
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
 ##
 @@ -0,0 +1,398 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.excel;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileSplit;
+
+import org.apache.poi.ss.usermodel.Cell;
+import org.apache.poi.ss.usermodel.CellValue;
+import org.apache.poi.ss.usermodel.DateUtil;
+import org.apache.poi.ss.usermodel.FormulaEvaluator;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.xssf.usermodel.XSSFSheet;
+import org.apache.poi.xssf.usermodel.XSSFWorkbook;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.joda.time.Instant;
+
+import java.util.Iterator;
+import java.io.IOException;
+import java.util.ArrayList;
+
+public class ExcelBatchReader implements ManagedReader<FileSchemaNegotiator> {
+  private ExcelReaderConfig readerConfig;
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(ExcelBatchReader.class);
+
+  private XSSFWorkbook workbook;
+
+  private FSDataInputStream fsStream;
+
+  private static final String SAFE_WILDCARD = "_$";
+
+  private static final String SAFE_SEPARATOR = "_";
+
+  private static final String PARSER_WILDCARD = ".*";
+
+  private static final String MISSING_FIELD_NAME_HEADER = "field_";
+
+  private static final String SAFE_NEWLINE_REPLACEMENT = " ";
+
+  private XSSFSheet sheet;
+
+  private FormulaEvaluator evaluator;
+
+  private ArrayList<String> excelFieldNames;
+
+  private Iterator<Row> rowIterator;
+
+  private int totalColumnCount;
+
+  private int lineCount;
+
+  private FileSplit split;
+
+  private ResultSetLoader loader;
+
+  private int recordCount;
+
+  public static class ExcelReaderConfig {
+protected final ExcelFormatPlugin plugin;
+
+protected int headerRow;
+
+protected int lastRow;
+
+protected int firstColumn;
+
+protected int lastColumn;
+
+protected boolean readAllFieldsAsVarChar;
+
+protected boolean evaluateFormulae;
+
+protected TupleMetadata schema;
+
+protected String sheetName;
+
+public ExcelReaderConfig(ExcelFormatPlugin plugin, int headerRow, int lastRow, int firstColumn, int lastColumn, boolean readAllFieldsAsVarChar, boolean evaluateFormulae,
+ //TupleMetadata schema,
+ String sheetName) {
+  this.plugin = plugin;
+  this.headerRow = headerRow;
+  this.lastRow = lastRow;
+  this.firstColumn = firstColumn;
+  this.lastColumn = lastColumn;
+  this.readAllFieldsAsVarChar = readAllFieldsAsVarChar;
+  this.evaluateFormulae = evaluateFormulae;
+  this.sheetName = sheetName;
+
+}
+  }
+
+  public ExcelBatchReader(ExcelReaderConfig readerConfig) {
+this.readerConfig = readerConfig;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+verifyConfigOptions();
+split = negotiator.split();
+

[jira] [Commented] (DRILL-7407) drill hive struct not support

2019-10-17 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953736#comment-16953736
 ] 

Arina Ielchiieva commented on DRILL-7407:
-

Well, this decision was made by the community when releasing Drill 1.13. 
Supporting multiple versions of various products is tedious work that results 
in backward-compatibility problems, and it would prevent Drill from evolving 
as quickly as other products do. You can use Drill 1.12 if needed, but if 
you need new Drill features, use a newer version of Hive.

> drill hive struct not support
> -
>
> Key: DRILL-7407
> URL: https://issues.apache.org/jira/browse/DRILL-7407
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
> Environment: !image-2019-10-17-18-11-58-563.png|width=819,height=84!
>Reporter: liuchao
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: image-2019-10-17-18-11-58-563.png, 
> image-2019-10-17-18-12-03-639.png
>
>
> !image-2019-10-17-18-12-03-639.png!





[jira] [Commented] (DRILL-7407) drill hive struct not support

2019-10-17 Thread liuchao (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953733#comment-16953733
 ] 

liuchao commented on DRILL-7407:


[~arina] OK.

But it is not a good idea for 1.13+ to drop support for Hive 1.x.



[jira] [Closed] (DRILL-7408) 1.16.0 not support Hive1.2.1

2019-10-17 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva closed DRILL-7408.
---
Resolution: Invalid

As of Drill 1.13, only Hive 2.3.2 is supported.

> 1.16.0 not  support Hive1.2.1
> -
>
> Key: DRILL-7408
> URL: https://issues.apache.org/jira/browse/DRILL-7408
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: liuchao
>Priority: Major
> Attachments: image-2019-10-17-21-01-01-268.png
>
>
> !image-2019-10-17-21-01-01-268.png!





[jira] [Commented] (DRILL-7407) drill hive struct not support

2019-10-17 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953727#comment-16953727
 ] 

Arina Ielchiieva commented on DRILL-7407:
-

[~liuchao8158] to use Drill 1.13+, you need to upgrade Hive to 2.3.2.



[jira] [Commented] (DRILL-7408) 1.16.0 not support Hive1.2.1

2019-10-17 Thread liuchao (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953726#comment-16953726
 ] 

liuchao commented on DRILL-7408:


With 1.12.0 it works.

But with 1.16.0 the query fails.



[jira] [Created] (DRILL-7408) 1.16.0 not support Hive1.2.1

2019-10-17 Thread liuchao (Jira)
liuchao created DRILL-7408:
--

 Summary: 1.16.0 not  support Hive1.2.1
 Key: DRILL-7408
 URL: https://issues.apache.org/jira/browse/DRILL-7408
 Project: Apache Drill
  Issue Type: Bug
Reporter: liuchao
 Attachments: image-2019-10-17-21-01-01-268.png

!image-2019-10-17-21-01-01-268.png!





[jira] [Commented] (DRILL-7407) drill hive struct not support

2019-10-17 Thread liuchao (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953711#comment-16953711
 ] 

liuchao commented on DRILL-7407:


[~arina]



[jira] [Commented] (DRILL-7407) drill hive struct not support

2019-10-17 Thread liuchao (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953710#comment-16953710
 ] 

liuchao commented on DRILL-7407:


[~IhorHuzenko]  hive 1.2.1   drill 1.12.0 

 

java.util.concurrent.ExecutionException: org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
 at org.apache.drill.shaded.guava.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:502) ~[drill-shaded-guava-23.0.jar:23.0]
 at org.apache.drill.shaded.guava.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:461) ~[drill-shaded-guava-23.0.jar:23.0]
 at org.apache.drill.shaded.guava.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:83) ~[drill-shaded-guava-23.0.jar:23.0]
 at org.apache.drill.shaded.guava.com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:142) ~[drill-shaded-guava-23.0.jar:23.0]
 at org.apache.drill.shaded.guava.com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2453) ~[drill-shaded-guava-23.0.jar:23.0]
 at org.apache.drill.shaded.guava.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2417) ~[drill-shaded-guava-23.0.jar:23.0]
 at org.apache.drill.shaded.guava.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2299) ~[drill-shaded-guava-23.0.jar:23.0]



[jira] [Commented] (DRILL-7407) drill hive struct not support

2019-10-17 Thread liuchao (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953709#comment-16953709
 ] 

liuchao commented on DRILL-7407:


[~IhorHuzenko]  hive 1.2.1   drill 1.12.0 

!image-2019-10-17-20-47-11-071.png!



[jira] [Commented] (DRILL-7407) drill hive struct not support

2019-10-17 Thread liuchao (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953690#comment-16953690
 ] 

liuchao commented on DRILL-7407:


Hello Igor Guzenko, my Hive is 1.2.1. I tried 1.16.0, but it could not be 
started; I can only use 1.12.0.

So I do not know whether 1.17.0 supports Hive 1.2.1.



[jira] [Commented] (DRILL-7407) drill hive struct not support

2019-10-17 Thread liuchao (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953691#comment-16953691
 ] 

liuchao commented on DRILL-7407:


[~IhorHuzenko]



[jira] [Updated] (DRILL-7406) Update Calcite to 1.21.0

2019-10-17 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7406:

Description: DRILL-7340 should be fixed by this update.

> Update Calcite to 1.21.0
> 
>
> Key: DRILL-7406
> URL: https://issues.apache.org/jira/browse/DRILL-7406
> Project: Apache Drill
>  Issue Type: Task
>  Components: Query Planning & Optimization, SQL Parser
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>
> DRILL-7340 should be fixed by this update.





[jira] [Updated] (DRILL-7177) Format Plugin for Excel Files

2019-10-17 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7177:

Reviewer: Paul Rogers



[jira] [Resolved] (DRILL-6990) IllegalStateException: The current reader doesn't support getting next information

2019-10-17 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko resolved DRILL-6990.
-
Resolution: Fixed

Fixed in scope of DRILL-7268. 

> IllegalStateException: The current reader doesn't support getting next 
> information
> --
>
> Key: DRILL-6990
> URL: https://issues.apache.org/jira/browse/DRILL-6990
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: parqt_nestedArray.parquet.tar
>
>
> Reading a parquet file created from Spark, returns IllegalStateException: The 
> current reader doesn't support getting next information
> Drill 1.14.0, parquet file created from Spark is attached here.
> //Steps to create parquet file from Spark 2.3.1
> [root@ba102-495 ~]# cd /opt/mapr/spark/spark-2.3.1
> [root@ba102-495 spark-2.3.1]# cd bin
> [root@ba102-495 bin]# ./spark-shell
> 19/01/21 22:57:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Spark context Web UI available at http://qa102-45.qa.lab:4040
> Spark context available as 'sc' (master = local[*], app id = local-1548111430809).
> Spark session available as 'spark'.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.1-mapr-SNAPSHOT
>       /_/
> Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_191)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> import spark.implicits._
> import spark.implicits._
> scala> val df = spark.read.json("/apps/nestedDataJson.json")
> df: org.apache.spark.sql.DataFrame = [id: bigint, nested_array: array<array<bigint>>]
> scala> df.write.parquet("/apps/parqt_nestedArray.parquet")
> Data used in test
> {noformat}
> [root@ba102-495 ~]# cat nestedDataJson.json
> {"id":19,"nested_array":[[1,2,3,4],[5,6,7,8],[9,10,12]]}
> {"id":14121,"nested_array":[[1,3,4],[5,6,8],[9,11,12]]}
> {"id":18894,"nested_array":[[1,3,4],[5,6,7,8],[9,10,11,12]]}
> {"id":12499,"nested_array":[[1,4],[5,7,8],[9,11,12]]}
> {"id":120,"nested_array":[[1,4],[5,7,8],[9,10,11,12]]}
> {"id":12,"nested_array":[[1,2,3,4],[5,6,7,8],[11,12]]}
> {"id":13,"nested_array":[[1,2,3,4],[5,8],[9,10,11,12]]}
> {"id":14,"nested_array":[[1,2,3,4],[5,68],[9,10,11,12]]}
> {"id":123,"nested_array":[[1,2,3,4],[5,8],[9,10,11,12]]}
> {"id":124,"nested_array":[[1,2,4],[5,6,7,8],[9,10,11,12]]}
> {"id":134,"nested_array":[[1,4],[5,8],[9,12]]}
> {noformat}
> From drillbit.log
> {noformat}
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalStateException: The current reader doesn't support getting next information. Fragment 0:0 [Error Id: c16c70dd-6565-463f-83a7-118ccd8442e2 on ba102-495.qa.lab:31010]
> ...
> ...
> 2019-01-21 23:08:11,268 [23b9af24-10b9-ad11-5583-ecc3e0c562e6:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: The current reader doesn't support getting next information.
> Fragment 0:0
> [Error Id: c16c70dd-6565-463f-83a7-118ccd8442e2 on ba102-495.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalStateException: The current reader doesn't support getting next information.
> Fragment 0:0
> [Error Id: c16c70dd-6565-463f-83a7-118ccd8442e2 on ba102-495.qa.lab:31010]
>  at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.14.0-mapr.jar:1.14.0-mapr]
>  at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361) [drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>  at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216) [drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>  at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327) [drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>  at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.14.0-mapr.jar:1.14.0-mapr]
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
> Caused by: java.lang.IllegalStateException: The current reader doesn't support getting next information.
>  at org.apache.drill.exec.vector.complex.impl.AbstractBaseReader.next(AbstractBaseReader.java:64) ~[vector-1.14.0-mapr.jar:1.14.0-mapr]
>  at 
> 

[jira] [Updated] (DRILL-6990) IllegalStateException: The current reader doesn't support getting next information

2019-10-17 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6990:

Fix Version/s: 1.17.0

> IllegalStateException: The current reader doesn't support getting next 
> information
> --
>
> Key: DRILL-6990
> URL: https://issues.apache.org/jira/browse/DRILL-6990
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: parqt_nestedArray.parquet.tar
>
>
> Reading a Parquet file created from Spark returns IllegalStateException: The 
> current reader doesn't support getting next information.
> Drill version is 1.14.0; the Parquet file created from Spark is attached here.
> //Steps to create parquet file from Spark 2.3.1
> [root@ba102-495 ~]# cd /opt/mapr/spark/spark-2.3.1
> [root@ba102-495 spark-2.3.1]# cd bin
> [root@ba102-495 bin]# ./spark-shell
> 19/01/21 22:57:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Spark context Web UI available at http://qa102-45.qa.lab:4040
> Spark context available as 'sc' (master = local[*], app id = local-1548111430809).
> Spark session available as 'spark'.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.1-mapr-SNAPSHOT
>       /_/
> Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_191)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> import spark.implicits._
> import spark.implicits._
> scala> val df = spark.read.json("/apps/nestedDataJson.json")
> df: org.apache.spark.sql.DataFrame = [id: bigint, nested_array: 
> array<array<bigint>>]
> scala> df.write.parquet("/apps/parqt_nestedArray.parquet")
> Data used in test
> {noformat}
> [root@ba102-495 ~]# cat nestedDataJson.json
> {"id":19,"nested_array":[[1,2,3,4],[5,6,7,8],[9,10,12]]}
> {"id":14121,"nested_array":[[1,3,4],[5,6,8],[9,11,12]]}
> {"id":18894,"nested_array":[[1,3,4],[5,6,7,8],[9,10,11,12]]}
> {"id":12499,"nested_array":[[1,4],[5,7,8],[9,11,12]]}
> {"id":120,"nested_array":[[1,4],[5,7,8],[9,10,11,12]]}
> {"id":12,"nested_array":[[1,2,3,4],[5,6,7,8],[11,12]]}
> {"id":13,"nested_array":[[1,2,3,4],[5,8],[9,10,11,12]]}
> {"id":14,"nested_array":[[1,2,3,4],[5,68],[9,10,11,12]]}
> {"id":123,"nested_array":[[1,2,3,4],[5,8],[9,10,11,12]]}
> {"id":124,"nested_array":[[1,2,4],[5,6,7,8],[9,10,11,12]]}
> {"id":134,"nested_array":[[1,4],[5,8],[9,12]]}
> {noformat}
> From drillbit.log
> {noformat}
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalStateException: The current reader doesn't support getting next information. Fragment 0:0 [Error Id: c16c70dd-6565-463f-83a7-118ccd8442e2 on ba102-495.qa.lab:31010]
> ...
> ...
> 2019-01-21 23:08:11,268 [23b9af24-10b9-ad11-5583-ecc3e0c562e6:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: The current reader doesn't support getting next information.
> Fragment 0:0
> [Error Id: c16c70dd-6565-463f-83a7-118ccd8442e2 on ba102-495.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalStateException: The current reader doesn't support getting next information.
> Fragment 0:0
> [Error Id: c16c70dd-6565-463f-83a7-118ccd8442e2 on ba102-495.qa.lab:31010]
>  at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.14.0-mapr.jar:1.14.0-mapr]
>  at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361) [drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>  at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216) [drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>  at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327) [drill-java-exec-1.14.0-mapr.jar:1.14.0-mapr]
>  at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.14.0-mapr.jar:1.14.0-mapr]
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
> Caused by: java.lang.IllegalStateException: The current reader doesn't support getting next information.
>  at org.apache.drill.exec.vector.complex.impl.AbstractBaseReader.next(AbstractBaseReader.java:64) ~[vector-1.14.0-mapr.jar:1.14.0-mapr]
>  at 
> 

[jira] [Updated] (DRILL-7407) drill hive struct not support

2019-10-17 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7407:

Fix Version/s: 1.17.0

> drill hive struct not support
> -
>
> Key: DRILL-7407
> URL: https://issues.apache.org/jira/browse/DRILL-7407
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
> Environment: !image-2019-10-17-18-11-58-563.png|width=819,height=84!
>Reporter: liuchao
>Priority: Major
> Fix For: 1.17.0
>
> Attachments: image-2019-10-17-18-11-58-563.png, 
> image-2019-10-17-18-12-03-639.png
>
>
> !image-2019-10-17-18-12-03-639.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7407) drill hive struct not support

2019-10-17 Thread Igor Guzenko (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953610#comment-16953610
 ] 

Igor Guzenko commented on DRILL-7407:
-

Hello [~liuchao8158], support for Hive struct is already merged into master as 
DRILL-7253. You can either build Drill from master or wait for release 1.17.0. 
I believe the latter will be released before December. 






[jira] [Closed] (DRILL-7407) drill hive struct not support

2019-10-17 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva closed DRILL-7407.
---
Resolution: Duplicate

Fixed in the scope of https://issues.apache.org/jira/browse/DRILL-7253.






[jira] [Commented] (DRILL-7407) drill hive struct not support

2019-10-17 Thread liuchao (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953609#comment-16953609
 ] 

liuchao commented on DRILL-7407:


drill 1.12.0

> drill hive struct not support
> -
>
> Key: DRILL-7407
> URL: https://issues.apache.org/jira/browse/DRILL-7407
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
> Environment: !image-2019-10-17-18-11-58-563.png|width=819,height=84!
>Reporter: liuchao
>Priority: Major
> Attachments: image-2019-10-17-18-11-58-563.png, 
> image-2019-10-17-18-12-03-639.png
>
>
> !image-2019-10-17-18-12-03-639.png!





[jira] [Updated] (DRILL-7407) drill hive struct not support

2019-10-17 Thread Igor Guzenko (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Guzenko updated DRILL-7407:

Environment: !image-2019-10-17-18-11-58-563.png|width=819,height=84!  (was: 
!image-2019-10-17-18-11-58-563.png!)






[jira] [Created] (DRILL-7407) drill hive struct not support

2019-10-17 Thread liuchao (Jira)
liuchao created DRILL-7407:
--

 Summary: drill hive struct not support
 Key: DRILL-7407
 URL: https://issues.apache.org/jira/browse/DRILL-7407
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Hive
 Environment: !image-2019-10-17-18-11-58-563.png!
Reporter: liuchao
 Attachments: image-2019-10-17-18-11-58-563.png, 
image-2019-10-17-18-12-03-639.png

!image-2019-10-17-18-12-03-639.png!


