[jira] [Commented] (DRILL-7913) Improve the tips using the Tooltips
[ https://issues.apache.org/jira/browse/DRILL-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338152#comment-17338152 ]

ASF GitHub Bot commented on DRILL-7913:
---

luocooong opened a new pull request #2213:
URL: https://github.com/apache/drill/pull/2213

# [DRILL-7913](https://issues.apache.org/jira/browse/DRILL-7913): Improve the tips using the Tooltips

## Description
Currently, the tips in the web console are hard to use:
1. They appear too slowly (a delay of more than 2 s).
2. They are hard to read (font size and background color).

Based on Bootstrap (already in use) and Popper.js (new), use the [Tooltips](https://getbootstrap.com/docs/4.0/components/tooltips/) module to improve both points.

## Documentation
N/A

## Testing
![image](https://user-images.githubusercontent.com/50079619/116840144-64c03a00-ac07-11eb-852a-3cd5f760d114.png)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Improve the tips using the Tooltips
> ---
>
> Key: DRILL-7913
> URL: https://issues.apache.org/jira/browse/DRILL-7913
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Cong Luo
> Assignee: Cong Luo
> Priority: Major
> Fix For: 1.19.0
>
> Currently, the tips in the web console are hard to use:
> # They appear too slowly (a delay of more than 2 s).
> # They are hard to read (font size and background color).
> Based on Bootstrap (already in use) and Popper.js (new), use the Tooltips module to improve both points.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (DRILL-7913) Improve the tips using the Tooltips
Cong Luo created DRILL-7913:
---

Summary: Improve the tips using the Tooltips
Key: DRILL-7913
URL: https://issues.apache.org/jira/browse/DRILL-7913
Project: Apache Drill
Issue Type: Improvement
Reporter: Cong Luo
Assignee: Cong Luo
Fix For: 1.19.0

Currently, the tips in the web console are hard to use:
# They appear too slowly (a delay of more than 2 s).
# They are hard to read (font size and background color).
Based on Bootstrap (already in use) and Popper.js (new), use the Tooltips module to improve both points.
[jira] [Commented] (DRILL-7912) Add Sheet Names to Excel Reader
[ https://issues.apache.org/jira/browse/DRILL-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338104#comment-17338104 ]

ASF GitHub Bot commented on DRILL-7912:
---

cgivre merged pull request #2211:
URL: https://github.com/apache/drill/pull/2211

> Add Sheet Names to Excel Reader
> ---
>
> Key: DRILL-7912
> URL: https://issues.apache.org/jira/browse/DRILL-7912
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Other
> Affects Versions: 1.18.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.19.0
>
> Currently, there is no way to determine what sheets are available in an Excel file. This PR adds a metadata field called `_sheets` which a user can query to determine available sheets in a given file.
[jira] [Commented] (DRILL-7864) Parquet file could not be read correctly
[ https://issues.apache.org/jira/browse/DRILL-7864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338074#comment-17338074 ]

Vova Vysotskyi commented on DRILL-7864:
---

[~matthros], I have tried querying the attached parquet file on a fresh build of Drill master, and it returned the correct results, so it looks like the problem has already been fixed (perhaps by the parquet update). Could you please confirm that it works as expected?

> Parquet file could not be read correctly
>
> Key: DRILL-7864
> URL: https://issues.apache.org/jira/browse/DRILL-7864
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.18.0
> Reporter: Matthias Rosenthaler
> Priority: Major
> Attachments: drill_query.csv, output.parquet, parquet-dotnet.csv
>
> The following parquet file, which is generated by ParquetSharp (which uses the underlying Apache Arrow C++ library), is not readable by Drill. The values of the columns are displaced. If I write the affected float32 columns "InjectionRate" and "I_injection_IA" as float64, everything is fine.
> Update: It seems that the bug is *caused by dictionary encoding*. If I turn this feature off, Drill is able to read the file. So please take a look at how Drill reads dictionary-encoded columns to solve the bug.
> I also created a ticket for the Arrow project, but they redirected me to the Drill project. https://issues.apache.org/jira/browse/ARROW-11629
[jira] [Commented] (DRILL-7912) Add Sheet Names to Excel Reader
[ https://issues.apache.org/jira/browse/DRILL-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338058#comment-17338058 ]

ASF GitHub Bot commented on DRILL-7912:
---

cgivre commented on a change in pull request #2211:
URL: https://github.com/apache/drill/pull/2211#discussion_r624711472

## File path: contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java ##
@@ -116,6 +117,23 @@ public String getFieldName() {
     }
   }
 
+  private enum IMPLICIT_LIST_COLUMN {
+    /**
+     * A list of the available sheets in the file.
+     */
+    SHEETS("_sheets");
+
+    private final String fieldName;
+
+    IMPLICIT_LIST_COLUMN(String fieldName) {
+      this.fieldName = fieldName;
+    }
+
+    public String getFieldName() {

Review comment:
I fixed this. Really that function should be called something else, so I made it `void` and renamed the function.
[jira] [Commented] (DRILL-7912) Add Sheet Names to Excel Reader
[ https://issues.apache.org/jira/browse/DRILL-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338059#comment-17338059 ]

ASF GitHub Bot commented on DRILL-7912:
---

cgivre commented on pull request #2211:
URL: https://github.com/apache/drill/pull/2211#issuecomment-830825978

Hi @luocooong
I addressed your comments. Could you please take another look?
Thx!
[jira] [Commented] (DRILL-7912) Add Sheet Names to Excel Reader
[ https://issues.apache.org/jira/browse/DRILL-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338057#comment-17338057 ]

ASF GitHub Bot commented on DRILL-7912:
---

cgivre commented on a change in pull request #2211:
URL: https://github.com/apache/drill/pull/2211#discussion_r624711391

## File path: contrib/format-excel/pom.xml ##
@@ -67,7 +67,7 @@
       <groupId>com.github.pjfanning</groupId>
       <artifactId>excel-streaming-reader</artifactId>
-      <version>3.0.3</version>
+      <version>3.0.4</version>

Review comment:
It is noted in the PR description.
[jira] [Commented] (DRILL-7912) Add Sheet Names to Excel Reader
[ https://issues.apache.org/jira/browse/DRILL-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338056#comment-17338056 ]

ASF GitHub Bot commented on DRILL-7912:
---

cgivre commented on a change in pull request #2211:
URL: https://github.com/apache/drill/pull/2211#discussion_r624710583

## File path: contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java ##
@@ -116,6 +117,23 @@ public String getFieldName() {
     }
   }
 
+  private enum IMPLICIT_LIST_COLUMN {
+    /**
+     * A list of the available sheets in the file.
+     */
+    SHEETS("_sheets");
+
+    private final String fieldName;
+
+    IMPLICIT_LIST_COLUMN(String fieldName) {
+      this.fieldName = fieldName;
+    }
+
+    public String getFieldName() {

Review comment:
I don't believe that will work in this case. The line `this.fieldName = fieldName` assigns the variable. If you call `getFieldName()` when that is not assigned, you will get some error.
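
The enum pattern under discussion in this comment can be illustrated with a minimal, stand-alone sketch. It assumes a hypothetical top-level enum named `ImplicitListColumn` that mirrors the shape of the PR's `IMPLICIT_LIST_COLUMN`; it is not the code that was merged. Because `fieldName` is `final`, the constructor has to assign it, and the accessor is declared without `public`, as the reviewer suggested.

```java
// Hypothetical stand-in for the PR's IMPLICIT_LIST_COLUMN enum; not Drill code.
enum ImplicitListColumn {
  /** A list of the available sheets in the file. */
  SHEETS("_sheets");

  // Final field: the constructor must assign it, otherwise the enum does not compile.
  private final String fieldName;

  ImplicitListColumn(String fieldName) {
    this.fieldName = fieldName;
  }

  // Package-private accessor; callers in the same package can still use it.
  String getFieldName() {
    return fieldName;
  }
}

class ImplicitListColumnDemo {
  public static void main(String[] args) {
    // Prints the column name carried by the enum constant.
    System.out.println(ImplicitListColumn.SHEETS.getFieldName());
  }
}
```

Since every constant is created through the constructor, `getFieldName()` cannot observe an unassigned field in this sketch.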
[jira] [Commented] (DRILL-7912) Add Sheet Names to Excel Reader
[ https://issues.apache.org/jira/browse/DRILL-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338053#comment-17338053 ]

ASF GitHub Bot commented on DRILL-7912:
---

cgivre commented on a change in pull request #2211:
URL: https://github.com/apache/drill/pull/2211#discussion_r624710376

## File path: contrib/format-excel/pom.xml ##
@@ -67,7 +67,7 @@
       <groupId>com.github.pjfanning</groupId>
       <artifactId>excel-streaming-reader</artifactId>
-      <version>3.0.3</version>
+      <version>3.0.4</version>

Review comment:
Sure. I should have broken that up into multiple commits.
[jira] [Commented] (DRILL-7912) Add Sheet Names to Excel Reader
[ https://issues.apache.org/jira/browse/DRILL-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338051#comment-17338051 ]

ASF GitHub Bot commented on DRILL-7912:
---

cgivre commented on a change in pull request #2211:
URL: https://github.com/apache/drill/pull/2211#discussion_r624710325

## File path: contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java ##
@@ -566,6 +598,34 @@ private void writeMetadata() {
         metadataColumnWriters.get(index).setTimestamp(Instant.ofEpochMilli(timeValue.getTime()));
       }
     }
+
+    // Write the sheet names. Since this is the only list field
+    int listIndex = IMPLICIT_STRING_COLUMN.values().length + IMPLICIT_TIMESTAMP_COLUMN.values().length;
+    String sheetColumnName = IMPLICIT_LIST_COLUMN.SHEETS.fieldName;
+    List<String> sheetNames = listMetadata.get(sheetColumnName);
+
+    if (sheetNameWriter == null) {
+      int sheetColumnIndex = rowWriter.tupleSchema().index(IMPLICIT_LIST_COLUMN.SHEETS.getFieldName());
+      if (sheetColumnIndex == -1) {
+        ColumnMetadata colSchema = MetadataUtils.newScalar(sheetColumnName, MinorType.VARCHAR, DataMode.REPEATED);
+        colSchema.setBooleanProperty(ColumnMetadata.EXCLUDE_FROM_WILDCARD, true);
+        listIndex = rowWriter.addColumn(colSchema);
+      }
+      sheetNameWriter = rowWriter.column(listIndex).array().scalar();
+    }
+
+    for (String sheetName : sheetNames) {
+      sheetNameWriter.setString(sheetName);
+    }
+  }
+
+  private List<String> getSheetNames() {
+    List<String> sheets = new ArrayList<>();
+    int sheetCount = streamingWorkbook.getNumberOfSheets();
+    for (int i = 0; i < sheetCount; i++) {
+      sheets.add(streamingWorkbook.getSheetName(i));

Review comment:
Excel populates the names by default as `Sheet 1` etc. I don't think Excel will let you have a `null` name, and if it is blank you'd just get `""`, which will work.
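
The blank-name case raised in this review thread can be tried out with a small self-contained sketch. It assumes a hypothetical `SheetSource` interface that stands in for the two workbook methods called in the diff (`getNumberOfSheets()` and `getSheetName(int)`), plus an in-memory helper `of(...)`; neither is part of Drill or of the Excel library. The loop itself follows the `getSheetNames()` method shown in the hunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the two workbook methods used in the diff.
interface SheetSource {
  int getNumberOfSheets();
  String getSheetName(int index);
}

class SheetNames {
  // Collects sheet names in workbook order; a blank name is kept as "".
  static List<String> getSheetNames(SheetSource workbook) {
    List<String> sheets = new ArrayList<>();
    int sheetCount = workbook.getNumberOfSheets();
    for (int i = 0; i < sheetCount; i++) {
      sheets.add(workbook.getSheetName(i));
    }
    return sheets;
  }

  // Builds an in-memory SheetSource for illustration only.
  static SheetSource of(String... names) {
    List<String> list = Arrays.asList(names);
    return new SheetSource() {
      @Override public int getNumberOfSheets() { return list.size(); }
      @Override public String getSheetName(int index) { return list.get(index); }
    };
  }

  public static void main(String[] args) {
    // The second sheet has a blank name and is still collected.
    System.out.println(getSheetNames(of("Sheet1", "", "Data")));
  }
}
```

In this model a blank name is simply carried through as an empty string, which matches the behavior described in the review reply; whether a real workbook can contain such a name is not established here.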
[jira] [Commented] (DRILL-7912) Add Sheet Names to Excel Reader
[ https://issues.apache.org/jira/browse/DRILL-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338050#comment-17338050 ]

ASF GitHub Bot commented on DRILL-7912:
---

cgivre commented on a change in pull request #2211:
URL: https://github.com/apache/drill/pull/2211#discussion_r624710173

## File path: contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java ##
@@ -566,6 +598,34 @@ private void writeMetadata() {
         metadataColumnWriters.get(index).setTimestamp(Instant.ofEpochMilli(timeValue.getTime()));
       }
     }
+
+    // Write the sheet names. Since this is the only list field
+    int listIndex = IMPLICIT_STRING_COLUMN.values().length + IMPLICIT_TIMESTAMP_COLUMN.values().length;
+    String sheetColumnName = IMPLICIT_LIST_COLUMN.SHEETS.fieldName;
+    List<String> sheetNames = listMetadata.get(sheetColumnName);
+
+    if (sheetNameWriter == null) {
+      int sheetColumnIndex = rowWriter.tupleSchema().index(IMPLICIT_LIST_COLUMN.SHEETS.getFieldName());
+      if (sheetColumnIndex == -1) {
+        ColumnMetadata colSchema = MetadataUtils.newScalar(sheetColumnName, MinorType.VARCHAR, DataMode.REPEATED);
+        colSchema.setBooleanProperty(ColumnMetadata.EXCLUDE_FROM_WILDCARD, true);

Review comment:
> Great feature. Does this function also apply to other format plugins (`EXCLUDE_FROM_WILDCARD ` set to true)?

The `EXCLUDE_FROM_WILDCARD` feature is meant for metadata fields or other information that you'd want access to, but that you'd also want the user to ask for explicitly. In this case, that is the sheet names. Some other format plugins, such as the log reader, have metadata fields like this. I think there may be a few in various storage plugins as well.
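
The behavior described in this reply, a column that is left out of `SELECT *` but returned when asked for by name, can be modeled in a few lines. The sketch below is a toy model only: `WildcardModel`, `add`, `expandWildcard`, and `canProject` are hypothetical names invented for illustration and do not correspond to Drill's projection code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of wildcard expansion; NOT Drill's projection implementation.
class WildcardModel {
  // Column name -> true if the column is excluded from a wildcard projection.
  private final Map<String, Boolean> columns = new LinkedHashMap<>();

  WildcardModel add(String name, boolean excludeFromWildcard) {
    columns.put(name, excludeFromWildcard);
    return this;
  }

  // Expands `*`: every column except those flagged as excluded, in insertion order.
  List<String> expandWildcard() {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Boolean> e : columns.entrySet()) {
      if (!e.getValue()) {
        out.add(e.getKey());
      }
    }
    return out;
  }

  // Explicit projection: a flagged column is still available when named directly.
  boolean canProject(String name) {
    return columns.containsKey(name);
  }

  public static void main(String[] args) {
    WildcardModel m = new WildcardModel()
        .add("id", false)
        .add("amount", false)
        .add("_sheets", true);
    System.out.println(m.expandWildcard());
    System.out.println(m.canProject("_sheets"));
  }
}
```

The design point is that the flag changes only what the wildcard expands to; the column remains part of the schema, so a query that names `_sheets` explicitly still finds it.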
[jira] [Commented] (DRILL-7893) Column alias is not working for a parquet file
[ https://issues.apache.org/jira/browse/DRILL-7893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337999#comment-17337999 ]

ASF GitHub Bot commented on DRILL-7893:
---

luocooong merged pull request #2208:
URL: https://github.com/apache/drill/pull/2208

> Column alias is not working for a parquet file
> ---
>
> Key: DRILL-7893
> URL: https://issues.apache.org/jira/browse/DRILL-7893
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.17.0
> Reporter: Matthias Rosenthaler
> Assignee: Vova Vysotskyi
> Priority: Major
> Fix For: 1.19.0
> Attachments: values.parquet
>
> The following query results in a column name of "shot_id" instead of the expected "x".
> SELECT shot_id as x FROM values.parquet WHERE step = 'RPCurve_001'
> The strange thing is, if I modify the query, like adding a limit clause, it is working:
> SELECT shot_id as x FROM values.parquet WHERE step = 'RPCurve_001' LIMIT 1000
> [^values.parquet]
[jira] [Commented] (DRILL-7893) Column alias is not working for a parquet file
[ https://issues.apache.org/jira/browse/DRILL-7893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337998#comment-17337998 ]

ASF GitHub Bot commented on DRILL-7893:
---

vvysotskyi commented on pull request #2208:
URL: https://github.com/apache/drill/pull/2208#issuecomment-830798068

@luocooong, there is no need to test these changes in distributed mode. The change in the final plan is minimal: it either creates an additional project if necessary or leaves the plan unchanged.
[jira] [Commented] (DRILL-7893) Column alias is not working for a parquet file
[ https://issues.apache.org/jira/browse/DRILL-7893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337995#comment-17337995 ]

ASF GitHub Bot commented on DRILL-7893:
---

luocooong commented on pull request #2208:
URL: https://github.com/apache/drill/pull/2208#issuecomment-830794800

@vvysotskyi Hi. Have you tested these changes in a real-world setup (preferably in distributed mode)?
[jira] [Commented] (DRILL-7912) Add Sheet Names to Excel Reader
[ https://issues.apache.org/jira/browse/DRILL-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337992#comment-17337992 ]

ASF GitHub Bot commented on DRILL-7912:
---

luocooong commented on a change in pull request #2211:
URL: https://github.com/apache/drill/pull/2211#discussion_r624658770

## File path: contrib/format-excel/pom.xml ##
@@ -67,7 +67,7 @@
       <groupId>com.github.pjfanning</groupId>
       <artifactId>excel-streaming-reader</artifactId>
-      <version>3.0.3</version>
+      <version>3.0.4</version>

Review comment:
It's always good to use the latest version. Is it possible to record it in the `Description` of the PR and the `Message` of the commits?

## File path: contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java ##
@@ -566,6 +598,34 @@ private void writeMetadata() {
         metadataColumnWriters.get(index).setTimestamp(Instant.ofEpochMilli(timeValue.getTime()));
       }
     }
+
+    // Write the sheet names. Since this is the only list field
+    int listIndex = IMPLICIT_STRING_COLUMN.values().length + IMPLICIT_TIMESTAMP_COLUMN.values().length;
+    String sheetColumnName = IMPLICIT_LIST_COLUMN.SHEETS.fieldName;
+    List<String> sheetNames = listMetadata.get(sheetColumnName);
+
+    if (sheetNameWriter == null) {
+      int sheetColumnIndex = rowWriter.tupleSchema().index(IMPLICIT_LIST_COLUMN.SHEETS.getFieldName());
+      if (sheetColumnIndex == -1) {
+        ColumnMetadata colSchema = MetadataUtils.newScalar(sheetColumnName, MinorType.VARCHAR, DataMode.REPEATED);
+        colSchema.setBooleanProperty(ColumnMetadata.EXCLUDE_FROM_WILDCARD, true);
+        listIndex = rowWriter.addColumn(colSchema);
+      }
+      sheetNameWriter = rowWriter.column(listIndex).array().scalar();
+    }
+
+    for (String sheetName : sheetNames) {
+      sheetNameWriter.setString(sheetName);
+    }
+  }
+
+  private List<String> getSheetNames() {
+    List<String> sheets = new ArrayList<>();
+    int sheetCount = streamingWorkbook.getNumberOfSheets();
+    for (int i = 0; i < sheetCount; i++) {
+      sheets.add(streamingWorkbook.getSheetName(i));

Review comment:
Does this work correctly if `getSheetName()` returns a blank name?

## File path: contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java ##
@@ -566,6 +598,34 @@ private void writeMetadata() {
         metadataColumnWriters.get(index).setTimestamp(Instant.ofEpochMilli(timeValue.getTime()));
       }
     }
+
+    // Write the sheet names. Since this is the only list field
+    int listIndex = IMPLICIT_STRING_COLUMN.values().length + IMPLICIT_TIMESTAMP_COLUMN.values().length;
+    String sheetColumnName = IMPLICIT_LIST_COLUMN.SHEETS.fieldName;
+    List<String> sheetNames = listMetadata.get(sheetColumnName);
+
+    if (sheetNameWriter == null) {
+      int sheetColumnIndex = rowWriter.tupleSchema().index(IMPLICIT_LIST_COLUMN.SHEETS.getFieldName());
+      if (sheetColumnIndex == -1) {
+        ColumnMetadata colSchema = MetadataUtils.newScalar(sheetColumnName, MinorType.VARCHAR, DataMode.REPEATED);
+        colSchema.setBooleanProperty(ColumnMetadata.EXCLUDE_FROM_WILDCARD, true);

Review comment:
Great feature. Does this function also apply to other format plugins (`EXCLUDE_FROM_WILDCARD ` set to true)?

## File path: contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java ##
@@ -116,6 +117,23 @@ public String getFieldName() {
     }
   }
 
+  private enum IMPLICIT_LIST_COLUMN {
+    /**
+     * A list of the available sheets in the file.
+     */
+    SHEETS("_sheets");
+
+    private final String fieldName;
+
+    IMPLICIT_LIST_COLUMN(String fieldName) {
+      this.fieldName = fieldName;
+    }
+
+    public String getFieldName() {

Review comment:
This can simply be declared as `String getFieldName()`. The same applies to the code above.