[jira] [Created] (PDFBOX-1303) Tika's PDFParser fails to parse documents embedded in a PDF Package

Michael McCandless (JIRA) Sat, 05 May 2012 03:42:15 -0700

Michael McCandless created PDFBOX-1303:
------------------------------------------


             Summary: Tika's PDFParser fails to parse documents embedded in a PDF Package
                 Key: PDFBOX-1303
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1303
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
            Reporter: Michael McCandless
             Fix For: 1.7.0


While working on PDFBOX-1297, I realized that Tika's PDFParser also doesn't
visit documents embedded within a PDF document (i.e. a PDF package).

Tika can actually handle this better than ExtractText: it can recurse
into embedded documents of any type (not just PDFs) and parse them as
well, whereas ExtractText only extracts text when the embedded
documents are also PDFs.
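
The difference between the two behaviours can be sketched roughly as
follows. This is only a toy model, not Tika's or PDFBox's actual API: the
Doc class, its fields, and the extractPdfOnly/extractAllTypes methods are
all hypothetical names made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class EmbeddedRecursionSketch {

    /** Hypothetical stand-in for a document that may carry embedded files. */
    static final class Doc {
        final String name;
        final String mimeType;
        final String text;
        final List<Doc> embedded = new ArrayList<Doc>();

        Doc(String name, String mimeType, String text) {
            this.name = name;
            this.mimeType = mimeType;
            this.text = text;
        }

        /** Attaches an embedded document and returns this for chaining. */
        Doc embed(Doc child) {
            embedded.add(child);
            return this;
        }
    }

    /** ExtractText-like behaviour: descend only into embedded PDFs. */
    static List<String> extractPdfOnly(Doc doc) {
        List<String> out = new ArrayList<String>();
        collect(doc, true, out);
        return out;
    }

    /** Tika-like behaviour: descend into embedded documents of any type. */
    static List<String> extractAllTypes(Doc doc) {
        List<String> out = new ArrayList<String>();
        collect(doc, false, out);
        return out;
    }

    /** Depth-first walk that gathers text from the document and its children. */
    private static void collect(Doc doc, boolean pdfOnly, List<String> out) {
        out.add(doc.text);
        for (Doc child : doc.embedded) {
            if (pdfOnly && !"application/pdf".equals(child.mimeType)) {
                continue; // non-PDF attachment is skipped entirely
            }
            collect(child, pdfOnly, out);
        }
    }

    public static void main(String[] args) {
        Doc pkg = new Doc("package.pdf", "application/pdf", "cover sheet")
                .embed(new Doc("a.pdf", "application/pdf", "embedded PDF text"))
                .embed(new Doc("b.doc", "application/msword", "embedded Word text"));

        System.out.println(extractPdfOnly(pkg));  // [cover sheet, embedded PDF text]
        System.out.println(extractAllTypes(pkg)); // [cover sheet, embedded PDF text, embedded Word text]
    }
}
```

In this model the PDF-only walk silently drops the Word attachment, while
the any-type walk reaches it.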

