cgivre commented on pull request #2359: URL: https://github.com/apache/drill/pull/2359#issuecomment-963111943
> @cgivre Instead of generating `field_0`, `field_1` when no column names can be determined, is it possible to generate a `columns` array to be accessed using `columns[0], columns[1], ...` to remain consistent with what we do for CSV?

It's possible to do that, but a few things:

1. Sometimes the PDF reader does not read tables perfectly and you get a mix of found and not-found headers, so that's one reason I took that approach.
2. I actually dislike the `columns` approach from the CSV readers because it increases the complexity of queries. In theory, if someone is querying a table (it doesn't matter from where), they will want it broken into columns and rows. The `columns` array approach (IMHO) makes this a lot harder than it needs to be.
3. This actually follows the model used in the Excel reader.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
