[jira] [Commented] (PDFBOX-1502) Not Extracting Text from PDF Document

JIRA Sat, 02 Feb 2013 11:06:16 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569624#comment-13569624
 ]


Andreas Lehmkühler commented on PDFBOX-1502:
--------------------------------------------

I can't confirm that. PDFBox works as expected: all of the text can be 
extracted (see attachment). The only missing part isn't stored as text but as 
an annotation. Even Adobe Reader isn't able to extract that text.
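
For readers who still need the text held in annotations, here is a minimal 
sketch of reading each annotation's contents string directly, since 
PDFTextStripper only walks the page content stream. It assumes PDFBox 1.8.x 
on the classpath (in 2.x, getAllPages() was replaced by getPages()). The class 
name AnnotationText and the command-line path argument are hypothetical, and 
whether this recovers the missing values in the attached PDF is an assumption, 
not something verified against that file.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;

// Hypothetical helper class; not part of PDFBox.
public class AnnotationText {

    // Collects the non-empty contents string of every annotation in the document.
    static List<String> annotationContents(PDDocument document) throws IOException {
        List<String> contents = new ArrayList<String>();
        // getAllPages() is the PDFBox 1.x call; the lists are treated as raw, hence the casts.
        for (Object p : document.getDocumentCatalog().getAllPages()) {
            PDPage page = (PDPage) p;
            for (Object a : page.getAnnotations()) {
                String text = ((PDAnnotation) a).getContents();
                if (text != null && text.trim().length() > 0) {
                    contents.add(text);
                }
            }
        }
        return contents;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the PDF to inspect (assumed to be supplied by the caller)
        PDDocument document = PDDocument.load(new File(args[0]));
        try {
            for (String text : annotationContents(document)) {
                System.out.println(text);
            }
        } finally {
            document.close();
        }
    }
}
```

Annotation contents are kept separate from the PDFTextStripper output here, so 
the caller can decide whether and where to merge them into the extracted text.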
                
> Not Extracting Text from PDF Document
> -------------------------------------
>
>                 Key: PDFBOX-1502
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1502
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator, 1.7.1, 1.8.0
>         Environment: Mac OS, JDK 1.7
>            Reporter: deepak
>         Attachments: PDFBOX1502-RenewalAdvice.txt, Renewal Advice .pdf
>
>
> PDDocument document = PDDocument.load(inputStream);
> PDFTextStripper stripper = new PDFTextStripper();
>
> stripper.getText(document) does not return some of the text content in the 
> attached PDF document. It returns only the form fields, and their values 
> are empty. The bug is reproducible in both the 1.8.0-SNAPSHOT and 1.7.1 
> codebases.
> Please help in resolving the issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
