dzamo commented on pull request #2359: URL: https://github.com/apache/drill/pull/2359#issuecomment-963148567
> * Sometimes the PDF reader does not read tables perfectly and you get a mix of found headers and not-found headers, so that's one reason I took that approach.

This consideration only applies when `extractHeaders = true`, right? In that case all of the readers do split out columns, so I think we're good.

> * I actually dislike the `columns` approach from the CSV readers because it increases the complexity of queries. In theory, if someone is querying a table (doesn't matter from where) they will want it broken into columns and rows. The columns array approach (IMHO) makes this a lot harder than it needs to be.

The columns array does allow text files to contain jagged arrays (rows with differing numbers of fields), which is perhaps valuable? I don't see a huge saving in typing `select field_0, field_1` over `select columns[0], columns[1]`.

> * This actually follows the model used in the Excel reader.

My fear, and I don't know whether it's real, is that if we proliferate plugin-specific quirks in how the schema is presented to the user, then the promise of a standard SQL interface to the data gets tainted and developers and users will be put off. Things like:

* "It's standard, except every plugin presents its data the way its author prefers."
* "That fragment of code I sent you for the columns array won't work here because this plugin does it differently."
* "This SQL script makes for confusing reading because the plugins involved name their generated columns differently: sometimes 'column', sometimes 'field', sometimes 'var'."

I don't feel all that strongly about `field_0` vs `columns[0]` (I mean, maybe we deprecate the columns array?), but I find myself thinking more and more about consistency.
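To make the trade-off around jagged rows concrete, here is a toy Python model of the two shapes a reader could expose. It is not how Drill's readers are implemented. The function names are hypothetical, and using `None` to stand in for SQL NULL when padding short rows is an assumption made for illustration.

```python
def as_columns_array(rows):
    """Columns-array style: each record keeps its own list of values,
    so jagged input rows stay jagged (columns[0], columns[1], ...)."""
    return [{"columns": list(row)} for row in rows]


def as_named_fields(rows):
    """Named-field style: a fixed schema field_0..field_{n-1}, where n is the
    widest row; shorter rows are padded with None (standing in for NULL)."""
    width = max((len(row) for row in rows), default=0)
    return [
        {f"field_{i}": (row[i] if i < len(row) else None) for i in range(width)}
        for row in rows
    ]


if __name__ == "__main__":
    jagged = [["a", "b", "c"], ["d"]]
    print(as_columns_array(jagged))
    print(as_named_fields(jagged))
```

In the columns-array shape, the second record simply has a shorter list. In the named-field shape, every record has the same keys, which is what lets a query write `field_1` without knowing how long each row was.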