cgivre commented on pull request #2427: URL: https://github.com/apache/drill/pull/2427#issuecomment-1013770006
> tbh styles should be used if you are inferring the schema - I just thought the existing code wasn't using them
>
> One enhancement that I think Drill needs is to allow the format-excel code to process a number of rows (maybe defaulting to something like 5 or 10) when inferring the schema - this would help if the first data row has null values in some columns - or you might have a column that sometimes has numeric values but other times text, and in that case the schema would need to make that cell text based.

I'd definitely agree with this approach. I am actually working on a storage plugin for Google Sheets [1] which does exactly that. There is a `typifier` class which reads the data from a column and attempts to infer the data type. Perhaps once that gets merged, we can reuse the `typifier` for the Excel reader and use this approach.

[1]: https://github.com/cgivre/drill/tree/storage-googlesheets/contrib/storage-googlesheets

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
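
For illustration, the row-sampling idea discussed above could look roughly like the sketch below. This is not the `typifier` class from the Google Sheets branch, and it is not existing Drill or format-excel code: the class `SchemaSampler`, the `ColumnType` enum, and the method `inferColumnType` are all hypothetical names made up for this example. It assumes cell values have already been read as strings, skips null or blank cells, and falls back to text as soon as any sampled cell is non-numeric.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: not part of Drill, format-excel, or the Google Sheets plugin.
class SchemaSampler {

  enum ColumnType { UNKNOWN, NUMERIC, TEXT }

  // Inspects at most sampleSize leading cells of one column and guesses its type.
  static ColumnType inferColumnType(List<String> cells, int sampleSize) {
    ColumnType inferred = ColumnType.UNKNOWN;
    int limit = Math.min(sampleSize, cells.size());
    for (int i = 0; i < limit; i++) {
      String cell = cells.get(i);
      if (cell == null || cell.trim().isEmpty()) {
        continue; // null or blank cells carry no type information
      }
      if (!isNumeric(cell)) {
        return ColumnType.TEXT; // one text value makes the whole column text
      }
      inferred = ColumnType.NUMERIC;
    }
    return inferred;
  }

  // Simplification: treats anything Double.parseDouble accepts as numeric.
  static boolean isNumeric(String value) {
    try {
      Double.parseDouble(value.trim());
      return true;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // First data row is null, later rows are numeric.
    System.out.println(inferColumnType(Arrays.asList(null, "3.5", "7"), 10));
    // Mixed numeric and text values in the same column.
    System.out.println(inferColumnType(Arrays.asList("12", "n/a", "15"), 10));
  }
}
```

With a sample size of 5 or 10, a null in the first data row no longer decides the column type, and a column mixing numbers and text ends up as text. A real reader would also need to handle dates, booleans, and formula cells, which this sketch ignores.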