[ https://issues.apache.org/jira/browse/PDFBOX-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860415#comment-13860415 ]
Andreas Lehmkühler commented on PDFBOX-1502:
--------------------------------------------

OK, I'll try to summarize all prior comments:
- PDFBox extracts all text of a PDF (if possible), excluding form values, annotations, metadata etc.
- have a look at the [PrintFields|http://svn.apache.org/repos/asf/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/fdf/PrintFields.java] example on how to extract form values
- updated PDFs, like the edited one, have to be read using the non-sequential parser (use -nonSeq as a command-line option / use PDDocument#loadNonSeq instead of PDDocument#load within your own code), as the old parser can't handle incremental updates

If there are any further questions, please address those to our [mailing lists|http://pdfbox.apache.org/mailinglists.html]. We don't use JIRA as a Q&A tool.

> Not Extracting Text from PDF Document
> -------------------------------------
>
>                 Key: PDFBOX-1502
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1502
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator, 1.7.1, 1.8.0
>         Environment: Mac OS, jdk 1.7
>            Reporter: deepak
>            Assignee: Andreas Lehmkühler
>         Attachments: PDFBOX1502-RenewalAdvice.txt, Renewal Advice .pdf, Renewal_Advice_Edited.pdf, Renewal_Advice_Edited_Extracted_Text.txt
>
>
> PDDocument document = PDDocument.load(Inputstream);
> PDFTextStripper stripper = new PDFTextStripper();
> stripper.getText(document) is not returning some text content in the attached PDF Document. It is just returning the form fields but the values are empty. The bug is reproducible both in 1.8.0-Snapshot and 1.7.1 codebase.
> Please help in resolving the issue

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)