[GitHub] [drill] cgivre commented on pull request #2359: DRILL-8028: Add PDF Format Plugin

GitBox Mon, 08 Nov 2021 04:41:04 -0800


cgivre commented on pull request #2359:
URL: https://github.com/apache/drill/pull/2359#issuecomment-963111943



   > @cgivre Instead of generating `field_0`, `field_1` when no column names can be determined, is it possible to generate a `columns` array to be accessed using `columns[0], columns[1], ...` to remain consistent with what we do for CSV?
   
   It's possible to do that, but a few things:
   1. Sometimes the PDF reader does not extract tables perfectly, and you get a mix of headers it found and headers it did not find. That's one reason I took this approach.
   2. I actually dislike the `columns` approach from the CSV readers because it increases the complexity of queries. In theory, if someone is querying a table (no matter where it comes from), they will want it broken into columns and rows. The columns array approach (IMHO) makes this a lot harder than it needs to be.
   3. This approach actually follows the model used in the Excel reader.
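   To make point 1 concrete, here is a minimal Java sketch of a per-column fallback, assuming the table extractor returns one header string per column, with null or blank entries for headers it could not read. The class and the `resolveColumnNames` helper are hypothetical and are not the PDF plugin's actual code. The `field_<index>` pattern comes from the question above; using the column's position as the index is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HeaderFallbackSketch {

  // Hypothetical helper: keep each header the extractor found and
  // substitute field_<position> for any header it could not read.
  static List<String> resolveColumnNames(List<String> rawHeaders) {
    List<String> names = new ArrayList<>();
    for (int i = 0; i < rawHeaders.size(); i++) {
      String header = rawHeaders.get(i);
      if (header == null || header.trim().isEmpty()) {
        names.add("field_" + i);   // header not found: positional fallback name
      } else {
        names.add(header.trim());  // header found: use it as the column name
      }
    }
    return names;
  }

  public static void main(String[] args) {
    // A partially extracted header row: the second and fourth headers were not recognized.
    List<String> raw = Arrays.asList("name", "", "city", null);
    System.out.println(resolveColumnNames(raw));
    // prints [name, field_1, city, field_3]
  }
}
```

   With names like these, a query can treat `name` and `field_1` as ordinary columns instead of indexing every value as `columns[1]`. The sketch does not handle duplicate header names, which a real reader would also need to deal with.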

