[ 
https://issues.apache.org/jira/browse/PDFBOX-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673299#comment-13673299
 ] 

Maruan Sahyoun commented on PDFBOX-1502:
----------------------------------------

Hi,

As far as I can see, the text extraction works as expected. Text extraction is 
meant to extract the boilerplate text, not the field values. This works 
similarly to Adobe Reader: if you save the filled-out form as text, you will 
not get the field values either. So from my perspective the software works as 
designed and in line with what Adobe Reader does.
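
If the field values are needed alongside the extracted text, they have to be 
read separately and combined with the stripper output. A minimal sketch of 
that combining step follows, written without PDFBox so it runs on its own. 
The class FormTextMerger, the method merge and the field names are 
hypothetical and only for illustration. In PDFBox 1.8 the map would 
typically be filled from document.getDocumentCatalog().getAcroForm(), using 
each PDField's getFullyQualifiedName() and getValue(); that part is assumed 
here, not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of PDFBox.
public class FormTextMerger {

    // Appends one "name: value" line per form field after the extracted text.
    // A null value (an empty field) is written as an empty string.
    static String merge(String pageText, Map<String, String> fieldValues) {
        StringBuilder out = new StringBuilder(pageText.trim());
        for (Map.Entry<String, String> entry : fieldValues.entrySet()) {
            String value = entry.getValue() == null ? "" : entry.getValue();
            out.append('\n').append(entry.getKey()).append(": ").append(value);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: pageText stands in for the result of stripper.getText(document),
        // and the map stands in for values read from the document's AcroForm.
        String pageText = "Renewal Advice\n";
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("policyNumber", "12345"); // made-up field name and value
        fields.put("renewalDate", null);     // a field that was left empty
        System.out.println(merge(pageText, fields));
    }
}
```

A LinkedHashMap is used so the fields come out in the order they were added.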

BR
Maruan
                
> Not Extracting Text from PDF Document
> -------------------------------------
>
>                 Key: PDFBOX-1502
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1502
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator, 1.7.1, 1.8.0
>         Environment: Mac OS , jdk 1.7
>            Reporter: deepak
>            Assignee: Andreas Lehmkühler
>         Attachments: PDFBOX1502-RenewalAdvice.txt, 
> Renewal_Advice_Edited_Extracted_Text.txt, Renewal_Advice_Edited.pdf, Renewal 
> Advice .pdf
>
>
> PDDocument document = PDDocument.load(inputStream);
> PDFTextStripper stripper = new PDFTextStripper();
> stripper.getText(document) is not returning some of the text content in the 
> attached PDF document. It returns only the form fields, but the values 
> are empty. The bug is reproducible in both the 1.8.0-SNAPSHOT and 1.7.1 
> codebases.
> Please help in resolving the issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
