Michael McCandless created PDFBOX-1303:
------------------------------------------
Summary: Tika's PDFParser fails to parse documents embedded in a PDF Package
Key: PDFBOX-1303
URL: https://issues.apache.org/jira/browse/PDFBOX-1303
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Reporter: Michael McCandless
Fix For: 1.7.0
While working on PDFBOX-1297, I realized that Tika's PDFParser also doesn't
visit documents embedded within a PDF document (i.e., a PDF Package).
Tika can actually handle this better than ExtractText, since it can
recurse into embedded documents of any type (not just PDFs) and parse
them as well, whereas ExtractText only extracts embedded documents that
are themselves PDFs.
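The difference between the two behaviours can be illustrated with a minimal, self-contained Java sketch. It deliberately does not use the real Tika or PDFBox APIs. `Doc`, `extractAll`, `extractPdfOnly` and `samplePackage` are hypothetical names that model a PDF Package as a tree of documents: a PDF carrying one PDF attachment and one non-PDF attachment. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class EmbeddedRecursionSketch {
    // Hypothetical in-memory model: a document's name, MIME type, its own
    // extracted text, and any documents embedded inside it.
    record Doc(String name, String mimeType, String text, List<Doc> embedded) {}

    // Tika-style behaviour (as described in the issue): recurse into every
    // embedded document, whatever its type.
    static void extractAll(Doc doc, List<String> out) {
        out.add(doc.name() + ": " + doc.text());
        for (Doc child : doc.embedded()) {
            extractAll(child, out);
        }
    }

    // ExtractText-style behaviour: only descend into embedded documents
    // that are themselves PDFs; other attachments are skipped.
    static void extractPdfOnly(Doc doc, List<String> out) {
        out.add(doc.name() + ": " + doc.text());
        for (Doc child : doc.embedded()) {
            if ("application/pdf".equals(child.mimeType())) {
                extractPdfOnly(child, out);
            }
        }
    }

    // Hypothetical PDF Package with one PDF and one Word attachment.
    static Doc samplePackage() {
        return new Doc("package.pdf", "application/pdf", "cover page", List.of(
            new Doc("report.pdf", "application/pdf", "quarterly report", List.of()),
            new Doc("notes.docx",
                "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                "meeting notes", List.of())));
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        extractAll(samplePackage(), all);

        List<String> pdfOnly = new ArrayList<>();
        extractPdfOnly(samplePackage(), pdfOnly);

        System.out.println(all.size());     // 3: the package plus both attachments
        System.out.println(pdfOnly.size()); // 2: notes.docx is never visited
    }
}
```

The only difference between the two traversals is the type filter on the recursive call, which is why a parser that can dispatch on any embedded type (as Tika does) can recover more text from the same package.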
--
This message is automatically generated by JIRA.