[ https://issues.apache.org/jira/browse/PDFBOX-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexandre updated PDFBOX-3800:
------------------------------
    Comment: was deleted

(was: I can provide another example PDF if someone wants it.)

> I extract text of a pdf using PDFTextStripper and part of the text is missing.
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-3800
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3800
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.6, 2.0.7
>         Environment: Mac OS X under Eclipse
>            Reporter: Alexandre
>         Attachments: Smith.pdf
>
>
> Hi,
> I am quite unfamiliar with PDFBox. Still, I spent some time trying to figure
> out how to solve the following issue.
> Part of the text is missing when I extract the text of the attached PDF.
> As you can see, the PDF contains the text "Mapping Twitter topic
> networks: ... " until "... hub and spokes", but the result of PDFTextStripper
> getText() does not contain any of these characters.
> I checked, and the community has already fixed similar bugs in the past.
> Any help would be appreciated.
> Cheers,
> A.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
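
For anyone trying to reproduce the report, the check it describes (does the extracted text contain the passage from "Mapping Twitter topic networks" to "hub and spokes"?) could be sketched roughly as below. This is only an illustration: `ExtractionCheck`, `normalize` and `missingPhrases` are hypothetical names, not part of PDFBox, and the extracted text is a placeholder string so the sketch runs without PDFBox on the classpath. The commented lines show how the text would presumably be obtained with PDFBox 2.0.x, assuming the attachment is saved locally as Smith.pdf. The comparison is case-sensitive, so the phrases may need adjusting to the exact wording in the PDF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for reproducing the report; not part of PDFBox.
final class ExtractionCheck {

    // Collapse runs of whitespace so that line breaks in the extracted
    // text do not hide a phrase that is actually present.
    static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    // Return the expected phrases that do not occur in the extracted text.
    static List<String> missingPhrases(String extracted, List<String> expected) {
        String haystack = normalize(extracted);
        List<String> missing = new ArrayList<>();
        for (String phrase : expected) {
            if (!haystack.contains(normalize(phrase))) {
                missing.add(phrase);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // In the reported scenario the text would come from PDFBox 2.0.x:
        //   try (PDDocument doc = PDDocument.load(new File("Smith.pdf"))) {
        //       extracted = new PDFTextStripper().getText(doc);
        //   }
        // Placeholder text is used here instead (an assumption for the sketch).
        String extracted = "Some other text\nfrom the document";
        List<String> expected =
                Arrays.asList("Mapping Twitter topic networks", "hub and spokes");
        System.out.println("Missing phrases: " + missingPhrases(extracted, expected));
    }
}
```

If both phrases are reported as missing for the real extraction result, that matches the behaviour described in the issue.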