[jira] [Commented] (DRILL-7913) Improve the tips using the Tooltips

2021-05-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338152#comment-17338152
 ] 

ASF GitHub Bot commented on DRILL-7913:
---

luocooong opened a new pull request #2213:
URL: https://github.com/apache/drill/pull/2213


   # [DRILL-7913](https://issues.apache.org/jira/browse/DRILL-7913): Improve the tips using the Tooltips
   
   ## Description
   
   The tips in the web console are currently hard to use:
   
   1. They appear too slowly (a delay of more than 2 seconds).
   2. They are hard to read (font size and background color).
   
   Use the [Tooltips](https://getbootstrap.com/docs/4.0/components/tooltips/) module, which builds on Bootstrap (already present) and Popper.js (newly added), to improve both.
   
   ## Documentation
   N/A
   
   ## Testing
   
![image](https://user-images.githubusercontent.com/50079619/116840144-64c03a00-ac07-11eb-852a-3cd5f760d114.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve the tips using the Tooltips
> ---
>
> Key: DRILL-7913
> URL: https://issues.apache.org/jira/browse/DRILL-7913
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Cong Luo
> Assignee: Cong Luo
> Priority: Major
> Fix For: 1.19.0
>
>
> The tips in the web console are currently hard to use:
>  # They appear too slowly (a delay of more than 2 seconds).
>  # They are hard to read (font size and background color).
> Use the Tooltips module, which builds on Bootstrap (already present) and 
> Popper.js (newly added), to improve both.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7913) Improve the tips using the Tooltips

2021-05-02 Thread Cong Luo (Jira)
Cong Luo created DRILL-7913:
---

 Summary: Improve the tips using the Tooltips
 Key: DRILL-7913
 URL: https://issues.apache.org/jira/browse/DRILL-7913
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Cong Luo
Assignee: Cong Luo
 Fix For: 1.19.0


The tips in the web console are currently hard to use:
 # They appear too slowly (a delay of more than 2 seconds).
 # They are hard to read (font size and background color).

Use the Tooltips module, which builds on Bootstrap (already present) and 
Popper.js (newly added), to improve both.





[jira] [Commented] (DRILL-7912) Add Sheet Names to Excel Reader

2021-05-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338104#comment-17338104
 ] 

ASF GitHub Bot commented on DRILL-7912:
---

cgivre merged pull request #2211:
URL: https://github.com/apache/drill/pull/2211


   




> Add Sheet Names to Excel Reader
> ---
>
> Key: DRILL-7912
> URL: https://issues.apache.org/jira/browse/DRILL-7912
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Other
> Affects Versions: 1.18.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.19.0
>
>
> Currently, there is no way to determine what sheets are available in an Excel 
> file.  This PR adds a metadata field called `_sheets` which a user can query 
> to determine available sheets in a given file. 





[jira] [Commented] (DRILL-7864) Parquet file could not be read correctly

2021-05-02 Thread Vova Vysotskyi (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338074#comment-17338074
 ] 

Vova Vysotskyi commented on DRILL-7864:
---

[~matthros], I tried querying the attached Parquet file on a fresh build of 
Drill master, and it returned the correct results, so it looks like this was 
already fixed (perhaps by the Parquet update). Could you please confirm that it 
works as expected?

> Parquet file could not be read correctly
> 
>
> Key: DRILL-7864
> URL: https://issues.apache.org/jira/browse/DRILL-7864
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.18.0
> Reporter: Matthias Rosenthaler
> Priority: Major
> Attachments: drill_query.csv, output.parquet, parquet-dotnet.csv
>
>
> The attached Parquet file, which was generated by ParquetSharp (which uses 
> the underlying Apache Arrow C++ library), is not readable by Drill. The values 
> of the columns are displaced. If I write the affected float32 columns 
> "InjectionRate" and "I_injection_IA" as float64, everything is fine.
> Update: It seems that the bug is *caused by dictionary encoding*. If I turn 
> this feature off, Drill is able to read the file. So please take a look at how 
> Drill reads dictionary-encoded columns to solve the bug.
> I also created a ticket for the Arrow project, but they redirected me to the 
> Drill project. https://issues.apache.org/jira/browse/ARROW-11629
>  





[jira] [Commented] (DRILL-7912) Add Sheet Names to Excel Reader

2021-05-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338058#comment-17338058
 ] 

ASF GitHub Bot commented on DRILL-7912:
---

cgivre commented on a change in pull request #2211:
URL: https://github.com/apache/drill/pull/2211#discussion_r624711472



##
File path: contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
##
@@ -116,6 +117,23 @@ public String getFieldName() {
     }
   }
 
+  private enum IMPLICIT_LIST_COLUMN {
+    /**
+     * A list of the available sheets in the file.
+     */
+    SHEETS("_sheets");
+
+    private final String fieldName;
+
+    IMPLICIT_LIST_COLUMN(String fieldName) {
+      this.fieldName = fieldName;
+    }
+
+    public String getFieldName() {

Review comment:
   I fixed this.  Really that function should be called something else, so 
I made it `void` and renamed the function. 






[jira] [Commented] (DRILL-7912) Add Sheet Names to Excel Reader

2021-05-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338059#comment-17338059
 ] 

ASF GitHub Bot commented on DRILL-7912:
---

cgivre commented on pull request #2211:
URL: https://github.com/apache/drill/pull/2211#issuecomment-830825978


   Hi @luocooong 
   I addressed your comments.  Could you please take another look?
   Thx!




[jira] [Commented] (DRILL-7912) Add Sheet Names to Excel Reader

2021-05-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338057#comment-17338057
 ] 

ASF GitHub Bot commented on DRILL-7912:
---

cgivre commented on a change in pull request #2211:
URL: https://github.com/apache/drill/pull/2211#discussion_r624711391



##
File path: contrib/format-excel/pom.xml
##
@@ -67,7 +67,7 @@
     <dependency>
       <groupId>com.github.pjfanning</groupId>
       <artifactId>excel-streaming-reader</artifactId>
-      <version>3.0.3</version>
+      <version>3.0.4</version>

Review comment:
   It is noted in the PR description.






[jira] [Commented] (DRILL-7912) Add Sheet Names to Excel Reader

2021-05-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338056#comment-17338056
 ] 

ASF GitHub Bot commented on DRILL-7912:
---

cgivre commented on a change in pull request #2211:
URL: https://github.com/apache/drill/pull/2211#discussion_r624710583



##
File path: contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
##
@@ -116,6 +117,23 @@ public String getFieldName() {
     }
   }
 
+  private enum IMPLICIT_LIST_COLUMN {
+    /**
+     * A list of the available sheets in the file.
+     */
+    SHEETS("_sheets");
+
+    private final String fieldName;
+
+    IMPLICIT_LIST_COLUMN(String fieldName) {
+      this.fieldName = fieldName;
+    }
+
+    public String getFieldName() {

Review comment:
   I don't believe that will work in this case.  The line `this.fieldName = 
fieldName` assigns the variable.  If you call `getFieldName()` when that is not 
assigned, you will get some error. 
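
For reference, the enum pattern under discussion can be sketched in isolation as below. This is a minimal, standalone illustration, not the Drill source: the wrapper class `EnumFieldSketch` and the name `ImplicitListColumn` are made up for the example. Because `fieldName` is `final`, the compiler requires the constructor to assign it, and the getter then returns that value for every constant.

```java
class EnumFieldSketch {
  // Illustrative stand-in for the enum in the diff; names are hypothetical.
  enum ImplicitListColumn {
    /** A list of the available sheets in the file. */
    SHEETS("_sheets");

    // A final field has to be assigned in the constructor, or the enum does not compile.
    private final String fieldName;

    ImplicitListColumn(String fieldName) {
      this.fieldName = fieldName;
    }

    // Package-private getter, as suggested in the review.
    String getFieldName() {
      return fieldName;
    }
  }

  public static void main(String[] args) {
    // Prints the field name attached to the SHEETS constant.
    System.out.println(ImplicitListColumn.SHEETS.getFieldName());
  }
}
```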






[jira] [Commented] (DRILL-7912) Add Sheet Names to Excel Reader

2021-05-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338053#comment-17338053
 ] 

ASF GitHub Bot commented on DRILL-7912:
---

cgivre commented on a change in pull request #2211:
URL: https://github.com/apache/drill/pull/2211#discussion_r624710376



##
File path: contrib/format-excel/pom.xml
##
@@ -67,7 +67,7 @@
     <dependency>
       <groupId>com.github.pjfanning</groupId>
       <artifactId>excel-streaming-reader</artifactId>
-      <version>3.0.3</version>
+      <version>3.0.4</version>

Review comment:
   Sure.  I should have broken that up into multiple commits. 






[jira] [Commented] (DRILL-7912) Add Sheet Names to Excel Reader

2021-05-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338051#comment-17338051
 ] 

ASF GitHub Bot commented on DRILL-7912:
---

cgivre commented on a change in pull request #2211:
URL: https://github.com/apache/drill/pull/2211#discussion_r624710325



##
File path: contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
##
@@ -566,6 +598,34 @@ private void writeMetadata() {
         metadataColumnWriters.get(index).setTimestamp(Instant.ofEpochMilli(timeValue.getTime()));
       }
     }
+
+    // Write the sheet names.  Since this is the only list field
+    int listIndex = IMPLICIT_STRING_COLUMN.values().length + IMPLICIT_TIMESTAMP_COLUMN.values().length;
+    String sheetColumnName = IMPLICIT_LIST_COLUMN.SHEETS.fieldName;
+    List<String> sheetNames = listMetadata.get(sheetColumnName);
+
+    if (sheetNameWriter == null) {
+      int sheetColumnIndex = rowWriter.tupleSchema().index(IMPLICIT_LIST_COLUMN.SHEETS.getFieldName());
+      if (sheetColumnIndex == -1) {
+        ColumnMetadata colSchema = MetadataUtils.newScalar(sheetColumnName, MinorType.VARCHAR, DataMode.REPEATED);
+        colSchema.setBooleanProperty(ColumnMetadata.EXCLUDE_FROM_WILDCARD, true);
+        listIndex = rowWriter.addColumn(colSchema);
+      }
+      sheetNameWriter = rowWriter.column(listIndex).array().scalar();
+    }
+
+    for (String sheetName : sheetNames) {
+      sheetNameWriter.setString(sheetName);
+    }
+  }
+
+  private List<String> getSheetNames() {
+    List<String> sheets = new ArrayList<>();
+    int sheetCount = streamingWorkbook.getNumberOfSheets();
+    for (int i = 0; i < sheetCount; i++) {
+      sheets.add(streamingWorkbook.getSheetName(i));

Review comment:
   Excel populates the names by default as `Sheet 1` etc.  I don't think 
Excel will let you have a `null` name, and if it is blank you'd just get `""` 
which will work. 
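
To make the blank-name case concrete, here is a small standalone sketch of the same loop. It does not use the real workbook class: `SheetSource` is a hypothetical interface that only mirrors the two methods the diff calls (`getNumberOfSheets()` and `getSheetName(int)`), and `SheetNamesSketch` and `of(...)` are made-up helpers. Under that assumption, an empty name is simply carried through as `""`.

```java
import java.util.ArrayList;
import java.util.List;

class SheetNamesSketch {
  /** Hypothetical stand-in for the two workbook methods used in the diff. */
  interface SheetSource {
    int getNumberOfSheets();
    String getSheetName(int index);
  }

  /** Collects sheet names in workbook order; blank names are kept unchanged. */
  static List<String> getSheetNames(SheetSource workbook) {
    List<String> sheets = new ArrayList<>();
    int sheetCount = workbook.getNumberOfSheets();
    for (int i = 0; i < sheetCount; i++) {
      sheets.add(workbook.getSheetName(i));
    }
    return sheets;
  }

  /** Builds an in-memory SheetSource backed by a fixed list of names. */
  static SheetSource of(List<String> names) {
    return new SheetSource() {
      @Override public int getNumberOfSheets() { return names.size(); }
      @Override public String getSheetName(int index) { return names.get(index); }
    };
  }

  public static void main(String[] args) {
    // The second sheet has an empty name and still appears in the result.
    System.out.println(getSheetNames(of(List.of("Summary", "", "Data"))));
  }
}
```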






[jira] [Commented] (DRILL-7912) Add Sheet Names to Excel Reader

2021-05-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338050#comment-17338050
 ] 

ASF GitHub Bot commented on DRILL-7912:
---

cgivre commented on a change in pull request #2211:
URL: https://github.com/apache/drill/pull/2211#discussion_r624710173



##
File path: contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
##
@@ -566,6 +598,34 @@ private void writeMetadata() {
         metadataColumnWriters.get(index).setTimestamp(Instant.ofEpochMilli(timeValue.getTime()));
       }
     }
+
+    // Write the sheet names.  Since this is the only list field
+    int listIndex = IMPLICIT_STRING_COLUMN.values().length + IMPLICIT_TIMESTAMP_COLUMN.values().length;
+    String sheetColumnName = IMPLICIT_LIST_COLUMN.SHEETS.fieldName;
+    List<String> sheetNames = listMetadata.get(sheetColumnName);
+
+    if (sheetNameWriter == null) {
+      int sheetColumnIndex = rowWriter.tupleSchema().index(IMPLICIT_LIST_COLUMN.SHEETS.getFieldName());
+      if (sheetColumnIndex == -1) {
+        ColumnMetadata colSchema = MetadataUtils.newScalar(sheetColumnName, MinorType.VARCHAR, DataMode.REPEATED);
+        colSchema.setBooleanProperty(ColumnMetadata.EXCLUDE_FROM_WILDCARD, true);

Review comment:
   > Great feature. Does this function also apply to other format plugins 
(`EXCLUDE_FROM_WILDCARD` set to true)?
   
   The `EXCLUDE_FROM_WILDCARD` feature is meant for metadata fields or other 
info that you'd want access to, but that you'd also want the user to ask for 
explicitly.  In this case, the sheet names... 
   
   Some other format plugins, such as the log reader, have some metadata fields 
like this.  I think there may be a few in various storage plugins as well. 
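
The behavior described above can be sketched without any Drill classes. Everything in this example is hypothetical (`WildcardSketch`, `Column`, `project`); it is not Drill's `ColumnMetadata` API, only an illustration of the idea: a `*` projection skips columns flagged as excluded, while a column that is named explicitly is still returned.

```java
import java.util.ArrayList;
import java.util.List;

class WildcardSketch {
  /** Hypothetical column descriptor; not Drill's ColumnMetadata. */
  static final class Column {
    final String name;
    final boolean excludeFromWildcard;

    Column(String name, boolean excludeFromWildcard) {
      this.name = name;
      this.excludeFromWildcard = excludeFromWildcard;
    }
  }

  /** Resolves a projection list: "*" expands to every column not flagged as excluded. */
  static List<String> project(List<Column> schema, List<String> requested) {
    List<String> result = new ArrayList<>();
    for (String item : requested) {
      if (item.equals("*")) {
        for (Column column : schema) {
          if (!column.excludeFromWildcard) {
            result.add(column.name);
          }
        }
      } else {
        // Explicitly named columns are passed through, including hidden ones.
        result.add(item);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Column> schema = List.of(
        new Column("id", false),
        new Column("name", false),
        new Column("_sheets", true));
    System.out.println(project(schema, List.of("*")));             // hidden column is skipped
    System.out.println(project(schema, List.of("id", "_sheets"))); // hidden column is returned
  }
}
```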






[jira] [Commented] (DRILL-7893) Column alias is not working for a parquet file

2021-05-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337999#comment-17337999
 ] 

ASF GitHub Bot commented on DRILL-7893:
---

luocooong merged pull request #2208:
URL: https://github.com/apache/drill/pull/2208


   




> Column alias is not working for a parquet file
> --
>
> Key: DRILL-7893
> URL: https://issues.apache.org/jira/browse/DRILL-7893
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.17.0
> Reporter: Matthias Rosenthaler
> Assignee: Vova Vysotskyi
> Priority: Major
> Fix For: 1.19.0
>
> Attachments: values.parquet
>
>
> The following query results in a column name of "shot_id" instead of the 
> expected "x".
> SELECT shot_id as x FROM values.parquet WHERE step = 'RPCurve_001'
> The strange thing is, if I modify the query, like adding a limit clause, it 
> is working:
> SELECT shot_id as x FROM values.parquet WHERE step = 'RPCurve_001' LIMIT 1000
> [^values.parquet]





[jira] [Commented] (DRILL-7893) Column alias is not working for a parquet file

2021-05-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337998#comment-17337998
 ] 

ASF GitHub Bot commented on DRILL-7893:
---

vvysotskyi commented on pull request #2208:
URL: https://github.com/apache/drill/pull/2208#issuecomment-830798068


   @luocooong, there is no need to test these changes in distributed mode. 
The change to the final plan is minimal: it either creates an additional project 
if necessary or leaves the plan unchanged.
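
As a rough illustration of that "add a project only if necessary" idea, the sketch below works on plain strings rather than on Drill's planner classes. All names here (`AliasProjectSketch`, `withExpectedNames`, the string form of the plan) are assumptions made for the example and do not reflect how the pull request is implemented.

```java
import java.util.ArrayList;
import java.util.List;

class AliasProjectSketch {
  /**
   * Returns the plan unchanged when the output names already match the expected
   * ones; otherwise wraps it in a renaming project step.
   */
  static String withExpectedNames(String plan, List<String> actual, List<String> expected) {
    if (actual.size() != expected.size()) {
      throw new IllegalArgumentException("field counts differ");
    }
    if (actual.equals(expected)) {
      return plan;
    }
    List<String> renames = new ArrayList<>();
    for (int i = 0; i < actual.size(); i++) {
      renames.add(actual.get(i) + " AS " + expected.get(i));
    }
    return "Project(" + String.join(", ", renames) + ") over " + plan;
  }

  public static void main(String[] args) {
    String scan = "Scan(values.parquet)";
    // Names differ, so a renaming project is added on top of the scan.
    System.out.println(withExpectedNames(scan, List.of("shot_id"), List.of("x")));
    // Names already match, so the plan is returned as-is.
    System.out.println(withExpectedNames(scan, List.of("x"), List.of("x")));
  }
}
```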




[jira] [Commented] (DRILL-7893) Column alias is not working for a parquet file

2021-05-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337995#comment-17337995
 ] 

ASF GitHub Bot commented on DRILL-7893:
---

luocooong commented on pull request #2208:
URL: https://github.com/apache/drill/pull/2208#issuecomment-830794800


   @vvysotskyi Hi.  Have you tested these changes in a real-world environment 
(preferably in distributed mode)?




[jira] [Commented] (DRILL-7912) Add Sheet Names to Excel Reader

2021-05-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337992#comment-17337992
 ] 

ASF GitHub Bot commented on DRILL-7912:
---

luocooong commented on a change in pull request #2211:
URL: https://github.com/apache/drill/pull/2211#discussion_r624658770



##
File path: contrib/format-excel/pom.xml
##
@@ -67,7 +67,7 @@
     <dependency>
       <groupId>com.github.pjfanning</groupId>
       <artifactId>excel-streaming-reader</artifactId>
-      <version>3.0.3</version>
+      <version>3.0.4</version>

Review comment:
   It's always good to use the latest version. Is it possible to record this 
upgrade in the PR description and in the commit message?

##
File path: contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
##
@@ -566,6 +598,34 @@ private void writeMetadata() {
         metadataColumnWriters.get(index).setTimestamp(Instant.ofEpochMilli(timeValue.getTime()));
       }
     }
+
+    // Write the sheet names.  Since this is the only list field
+    int listIndex = IMPLICIT_STRING_COLUMN.values().length + IMPLICIT_TIMESTAMP_COLUMN.values().length;
+    String sheetColumnName = IMPLICIT_LIST_COLUMN.SHEETS.fieldName;
+    List<String> sheetNames = listMetadata.get(sheetColumnName);
+
+    if (sheetNameWriter == null) {
+      int sheetColumnIndex = rowWriter.tupleSchema().index(IMPLICIT_LIST_COLUMN.SHEETS.getFieldName());
+      if (sheetColumnIndex == -1) {
+        ColumnMetadata colSchema = MetadataUtils.newScalar(sheetColumnName, MinorType.VARCHAR, DataMode.REPEATED);
+        colSchema.setBooleanProperty(ColumnMetadata.EXCLUDE_FROM_WILDCARD, true);
+        listIndex = rowWriter.addColumn(colSchema);
+      }
+      sheetNameWriter = rowWriter.column(listIndex).array().scalar();
+    }
+
+    for (String sheetName : sheetNames) {
+      sheetNameWriter.setString(sheetName);
+    }
+  }
+
+  private List<String> getSheetNames() {
+    List<String> sheets = new ArrayList<>();
+    int sheetCount = streamingWorkbook.getNumberOfSheets();
+    for (int i = 0; i < sheetCount; i++) {
+      sheets.add(streamingWorkbook.getSheetName(i));

Review comment:
   Does this work correctly if `getSheetName()` returns a blank name?

##
File path: contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
##
@@ -566,6 +598,34 @@ private void writeMetadata() {
         metadataColumnWriters.get(index).setTimestamp(Instant.ofEpochMilli(timeValue.getTime()));
       }
     }
+
+    // Write the sheet names.  Since this is the only list field
+    int listIndex = IMPLICIT_STRING_COLUMN.values().length + IMPLICIT_TIMESTAMP_COLUMN.values().length;
+    String sheetColumnName = IMPLICIT_LIST_COLUMN.SHEETS.fieldName;
+    List<String> sheetNames = listMetadata.get(sheetColumnName);
+
+    if (sheetNameWriter == null) {
+      int sheetColumnIndex = rowWriter.tupleSchema().index(IMPLICIT_LIST_COLUMN.SHEETS.getFieldName());
+      if (sheetColumnIndex == -1) {
+        ColumnMetadata colSchema = MetadataUtils.newScalar(sheetColumnName, MinorType.VARCHAR, DataMode.REPEATED);
+        colSchema.setBooleanProperty(ColumnMetadata.EXCLUDE_FROM_WILDCARD, true);

Review comment:
   Great feature. Does this function also apply to other format plugins 
(`EXCLUDE_FROM_WILDCARD` set to true)?

##
File path: contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
##
@@ -116,6 +117,23 @@ public String getFieldName() {
     }
   }
 
+  private enum IMPLICIT_LIST_COLUMN {
+    /**
+     * A list of the available sheets in the file.
+     */
+    SHEETS("_sheets");
+
+    private final String fieldName;
+
+    IMPLICIT_LIST_COLUMN(String fieldName) {
+      this.fieldName = fieldName;
+    }
+
+    public String getFieldName() {

Review comment:
   This can simply be declared as `String getFieldName()`. The same applies to 
the code above.



