[
https://issues.apache.org/jira/browse/DRILL-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678473#comment-17678473
]
ASF GitHub Bot commented on DRILL-8390:
---------------------------------------
cgivre opened a new pull request, #2742:
URL: https://github.com/apache/drill/pull/2742
# [DRILL-8390](https://issues.apache.org/jira/browse/DRILL-8390): Minor Improvements to PDF Reader
## Description
This PR makes some minor improvements to the PDF reader including:
* Fixes a minor bug where, in certain configurations, the first row of data was
skipped
* Fixes a minor bug where empty tables caused crashes when the
spreadsheet extraction algorithm was used
* Adds a `_table_count` metadata field
* Adds a `_table_index` metadata field to reflect the current table
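
To illustrate what the two new metadata fields are meant to convey, here is a minimal Python sketch. It is not the Drill implementation: the `tag_rows` helper and the sample data are hypothetical, and it assumes that `_table_index` is 1-based and that `_table_count` includes empty tables. Neither assumption is stated in this PR.

```python
from typing import Iterator


def tag_rows(tables: list[list[dict]]) -> Iterator[dict]:
    """Yield every row of every table, tagged with table metadata.

    Hypothetical helper for illustration only; not part of Drill.
    """
    table_count = len(tables)  # assumption: empty tables are counted
    # assumption: table indexes start at 1
    for table_index, rows in enumerate(tables, start=1):
        # An empty table simply yields no rows instead of raising,
        # and the first row of each table is never skipped.
        for row in rows:
            yield {
                **row,
                "_table_count": table_count,
                "_table_index": table_index,
            }


# Three tables extracted from one document; the second one is empty.
tables = [
    [{"a": 1}, {"a": 2}],
    [],
    [{"a": 3}],
]
tagged = list(tag_rows(tables))
```

Under these assumptions, every row carries the total number of tables in the document alongside the index of the table it came from, so rows from different tables can be told apart in a single result set.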
## Documentation
See above. Updated README.
## Testing
Ran existing unit tests. Manually tested against customer data.
> Minor Improvements to PDF Reader
> --------------------------------
>
> Key: DRILL-8390
> URL: https://issues.apache.org/jira/browse/DRILL-8390
> Project: Apache Drill
> Issue Type: Improvement
> Components: Format - PDF
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
>
> This PR makes some minor improvements to the PDF reader including:
> * Fixes a minor bug where, in certain configurations, the first row of data was
> skipped
> * Fixes a minor bug where empty tables caused crashes when the
> spreadsheet extraction algorithm was used
> * Adds a _table_count metadata field
> * Adds a _table_index metadata field to reflect the current table
--
This message was sent by Atlassian Jira
(v8.20.10#820010)