[jira] [Commented] (TIKA-738) Tika fails to extract text from PDF annotations

Michael McCandless (Commented) (JIRA) Tue, 18 Oct 2011 10:49:35 -0700

    [ https://issues.apache.org/jira/browse/TIKA-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129889#comment-13129889 ]


Michael McCandless commented on TIKA-738:
-----------------------------------------

I opened PDFBOX-1143 to improve PDFTextStripper so that it visits text 
annotations.

I also worked out a simple patch to PDF2XHTML to directly extract the 
annotations ourselves until PDFBOX-1143 is fixed.
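
As a rough illustration of the workaround idea (not the actual patch), the
sketch below emits a page's body text and then walks that page's annotations,
appending the contents of the text annotations. The Page and Annotation
classes and the extractPage method are hypothetical stand-ins invented for
this example; they are not PDFBox or Tika types.

```java
import java.util.ArrayList;
import java.util.List;

public class AnnotationTextSketch {

    // Hypothetical stand-in for a PDF annotation; not a PDFBox or Tika type.
    static final class Annotation {
        final String subtype;   // e.g. "Text" or "Link"
        final String contents;  // may be null when the annotation has no text

        Annotation(String subtype, String contents) {
            this.subtype = subtype;
            this.contents = contents;
        }
    }

    // Hypothetical stand-in for a page: already-stripped body text plus annotations.
    static final class Page {
        final String bodyText;
        final List<Annotation> annotations;

        Page(String bodyText, List<Annotation> annotations) {
            this.bodyText = bodyText;
            this.annotations = annotations;
        }
    }

    // Returns the body text followed by the non-empty contents of each
    // text annotation on the page, in the order the annotations are listed.
    static List<String> extractPage(Page page) {
        List<String> out = new ArrayList<>();
        out.add(page.bodyText);
        for (Annotation a : page.annotations) {
            if (!"Text".equals(a.subtype) || a.contents == null) {
                continue; // skip non-text annotations and ones without contents
            }
            String trimmed = a.contents.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }
}
```

In a real parser the same loop would run once per page, right after that
page's regular text has been written out, so the annotation text stays next
to the page it belongs to.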
                
> Tika fails to extract text from PDF annotations
> -----------------------------------------------
>
>                 Key: TIKA-738
>                 URL: https://issues.apache.org/jira/browse/TIKA-738
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>
> Spinoff from TIKA-717.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
