dzamo edited a comment on pull request #2359:
URL: https://github.com/apache/drill/pull/2359#issuecomment-963148567
> * Sometimes the PDF reader does not read tables perfectly and you get a
> mix of found headers and not found headers, so that's one reason I took that
> approach.
This consideration will only apply for `extractHeaders = true`, right? In
this case all of the readers do split out columns so I think we're good.
> * I actually dislike the `columns` approach from the CSV readers because
> it increases the level of complexity of queries. In theory, if someone is
> querying a table (doesn't matter from where) they will want that broken into
> columns and rows. The columns array approach (IMHO) makes this a lot harder
> than it needs to be.
The columns array does allow text files to contain jagged arrays, which is
perhaps valuable? I don't see a huge saving in typing `select field_0,
field_1` over `select columns[0], columns[1]`. Am I missing some other
complication?
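
To make the jagged-array point concrete, here is a toy sketch in plain Python. It is not Drill code and does not use any Drill API. The helper names `as_columns_array` and `as_named_fields` are hypothetical, and it assumes that a fixed `field_N` schema would fill missing values with nulls, which may not match what a given reader actually does.

```python
# Toy model only: plain Python, not Drill code or Drill's reader API.
rows = [
    ["a", "b", "c"],
    ["d"],        # a short (jagged) row
    ["e", "f"],
]


def as_columns_array(row):
    """Model of the `columns` approach: one array-valued column per row."""
    return {"columns": list(row)}


def as_named_fields(row, width):
    """Model of the `field_N` approach (assumed): fixed names, gaps become None."""
    padded = list(row) + [None] * (width - len(row))
    return {f"field_{i}": value for i, value in enumerate(padded)}


# The named-field schema needs a single width, taken here from the widest row.
width = max(len(row) for row in rows)
array_rows = [as_columns_array(r) for r in rows]
named_rows = [as_named_fields(r, width) for r in rows]
```

In this model the array form keeps each row at its own length, while the named form has to settle on one width up front and pad the shorter rows.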
> * This actually follows the model used in the Excel reader.
My fear, and I don't know if this is real or not, is that if we proliferate
plugin-specific quirks in how the schema is represented to the user then the
promise of a standard SQL interface to the data gets tainted and developers and
users will be put off.
"It's standard except every plugin presents its data the way that author
prefers"
"That fragment of code I sent you for the columns array won't work here
because this plugin does it differently"
"This SQL script makes for confusing reading because the plugins involved
name their generated columns differently, sometimes 'column', sometimes
'field', sometimes 'var'"
etc.
I don't feel all that strongly about `field_0` vs `columns[0]` (I mean,
maybe we deprecate the columns array?) but I am finding myself thinking more
and more about consistency.