[GitHub] [drill] cgivre commented on pull request #2427: DRILL-8095: upgrade to poi 5.2.0

GitBox Sat, 15 Jan 2022 15:21:32 -0800


cgivre commented on pull request #2427:
URL: https://github.com/apache/drill/pull/2427#issuecomment-1013770006



   > tbh styles should be used if you are inferring the schema - I just thought 
the existing code wasn't using them
   > 
   > One enhancement that I think Drill needs is to allow the format-excel code 
to process a number of rows (maybe defaulting to something like 5 or 10) when 
inferring the schema. This would help if the first data row has null values in 
some columns. It would also help when a column sometimes has numeric values 
and other times text, in which case the schema would need to treat that column 
as text.
   
   I'd definitely agree with this approach.  I am actually working on a storage 
plugin for Google Sheets [1] which does exactly that.  There is a `typifier` 
class which reads the data from a column and attempts to infer the data type.  
Perhaps once that gets merged, we can reuse the `typifier` for the Excel reader 
and use this approach. 
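
   As a rough illustration of the row-sampling idea, here is a minimal sketch. 
It is not the `typifier` from the Google Sheets plugin and not the existing 
format-excel code; `SchemaSampler`, `ColumnType`, `classify`, `merge` and 
`inferSchema` are hypothetical names, and cells are modelled as plain Java 
objects rather than POI cells. It assumes a null cell carries no type 
information and that a column mixing numeric and text values falls back to text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Drill.
final class SchemaSampler {

    // Simplified stand-in for a real type system.
    enum ColumnType { UNKNOWN, NUMERIC, TEXT }

    // Classify one cell value; null says nothing about the column type.
    static ColumnType classify(Object cell) {
        if (cell == null) {
            return ColumnType.UNKNOWN;
        }
        return (cell instanceof Number) ? ColumnType.NUMERIC : ColumnType.TEXT;
    }

    // Combine the type seen so far with a new observation.
    // UNKNOWN yields to anything; conflicting types widen to TEXT.
    static ColumnType merge(ColumnType seen, ColumnType next) {
        if (seen == ColumnType.UNKNOWN) {
            return next;
        }
        if (next == ColumnType.UNKNOWN) {
            return seen;
        }
        return (seen == next) ? seen : ColumnType.TEXT;
    }

    // Infer one type per column from at most sampleSize leading data rows.
    static List<ColumnType> inferSchema(List<List<Object>> rows, int sampleSize) {
        List<ColumnType> schema = new ArrayList<>();
        int limit = Math.min(sampleSize, rows.size());
        for (int r = 0; r < limit; r++) {
            List<Object> row = rows.get(r);
            for (int c = 0; c < row.size(); c++) {
                if (c >= schema.size()) {
                    schema.add(ColumnType.UNKNOWN); // first time this column is seen
                }
                schema.set(c, merge(schema.get(c), classify(row.get(c))));
            }
        }
        return schema;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = new ArrayList<>();
        rows.add(Arrays.<Object>asList(null, 1.0, "a"));  // null in the first data row
        rows.add(Arrays.<Object>asList(2.0, "n/a", "b")); // second column mixes types
        System.out.println(inferSchema(rows, 5));
    }
}
```

   A column that is null in every sampled row stays `UNKNOWN` here, so a real 
reader would still need a default for that case.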
   
   
   [1]: 
https://github.com/cgivre/drill/tree/storage-googlesheets/contrib/storage-googlesheets


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
