[ https://issues.apache.org/jira/browse/DRILL-7423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Givre resolved DRILL-7423.
----------------------------------
    Resolution: Resolved

> Create More Efficient Way to Read Excel Cells
> ---------------------------------------------
>
>                 Key: DRILL-7423
>                 URL: https://issues.apache.org/jira/browse/DRILL-7423
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.18.0
>            Reporter: Charles Givre
>            Priority: Major
>
> The Excel format plugin reads cells, but there are ways to make the reading
> process more efficient. Since the schema of an Excel file is not known in
> advance, Drill must read the first row of data in order to extract the
> schema.
>
> It is actually a bit more complex. To read the schema, Drill must first read
> the header rows and convert them all into Strings. This gives us the header
> names, if present.
>
> Drill cannot create writers until it actually reads the first row of data,
> which is where it determines the data types. This creates an inefficiency:
> when Drill writes the columns, it has to do a hash lookup for each column of
> every row. Since the columns are in a fixed order, it may be possible to store
> the writers in an array and gain some efficiency there.
>
> Also, at present, if the columns are heterogeneous, Drill requires the user to
> enable allTextMode to query the data. It would be nice if Drill could query
> the data without having to set that.
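The writer-array idea in the description can be sketched in plain Java. This is a minimal, self-contained illustration under stated assumptions, not Drill's implementation: `ColumnWriter`, `ListWriter`, `CellType`, `resolveType`, `writeRowByName` and `writeRowByIndex` are all hypothetical names, and no Apache POI or Drill classes are used. It contrasts a per-cell hash lookup by column name with indexing into an array built once in header order. It also shows one possible (assumed) way to avoid a global text mode: resolving each column's type independently and falling back to a string type only for a column whose sampled cells disagree.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExcelRowWriterSketch {

    // Hypothetical stand-in for a column writer; this is NOT Drill's actual writer API.
    interface ColumnWriter {
        void write(Object value);
    }

    // Hypothetical writer that collects values so the effect is observable.
    static final class ListWriter implements ColumnWriter {
        final List<Object> values = new ArrayList<>();

        @Override
        public void write(Object value) {
            values.add(value);
        }
    }

    // Hypothetical per-column type label used only by this sketch.
    enum CellType { NUMERIC, STRING }

    // One possible strategy (an assumption, not Drill's confirmed behavior):
    // a column whose sampled cells mix numbers and text falls back to STRING,
    // so only that column is read as text instead of the whole sheet.
    static CellType resolveType(List<Object> sampledCells) {
        CellType resolved = null;
        for (Object cell : sampledCells) {
            CellType current = (cell instanceof Number) ? CellType.NUMERIC : CellType.STRING;
            if (resolved == null) {
                resolved = current;
            } else if (resolved != current) {
                return CellType.STRING;
            }
        }
        return resolved == null ? CellType.STRING : resolved;
    }

    // Name-based lookup: one hash lookup per cell in every row.
    static void writeRowByName(Map<String, ColumnWriter> writers, String[] names, Object[] cells) {
        for (int i = 0; i < cells.length; i++) {
            writers.get(names[i]).write(cells[i]);
        }
    }

    // Index-based lookup: the column order is fixed once the header is read,
    // so cell position i maps directly to writers[i] with no hashing.
    static void writeRowByIndex(ColumnWriter[] writers, Object[] cells) {
        for (int i = 0; i < cells.length; i++) {
            writers[i].write(cells[i]);
        }
    }

    public static void main(String[] args) {
        // Header row, read as strings to obtain the column names.
        String[] names = {"id", "name", "price"};

        // Build the writers once, in column order, and keep them in an array.
        ColumnWriter[] writers = new ColumnWriter[names.length];
        Map<String, ColumnWriter> byName = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            writers[i] = new ListWriter();
            byName.put(names[i], writers[i]);
        }

        writeRowByName(byName, names, new Object[] {1.0, "apple", 0.5});
        writeRowByIndex(writers, new Object[] {2.0, "pear", 0.75});

        // Both paths fill the same writers: prints [apple, pear]
        System.out.println(((ListWriter) writers[1]).values);
        // A column with mixed cells resolves to STRING: prints STRING
        System.out.println(resolveType(List.of(1.0, "n/a", 3.0)));
    }
}
```

The array version only works because the column order is fixed after the header is read; if a reader had to tolerate columns appearing or disappearing mid-file, the name-based map would still be needed at least for those cases.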