[ https://issues.apache.org/jira/browse/PDFBOX-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr closed PDFBOX-2991.
-----------------------------------
    Fix Version/s: 3.0.0 PDFBox, 2.0.22
       Resolution: Duplicate

Likely duplicate of PDFBOX-5002. No longer happens since 2.0.22.

> Improper word concatenation when extracting pdf
> -----------------------------------------------
>
>                 Key: PDFBOX-2991
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2991
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.10, 1.8.11, 2.0.0
>            Reporter: Java Developer
>            Priority: Major
>             Fix For: 3.0.0 PDFBox, 2.0.22
>
>         Attachments: sample-resume.pdf
>
>
> The code below will output text for a pdf. Words that are on different lines
> are concatenated together.
>
>     PDDocument pdDoc = PDDocument.load(new File("sample-resume.pdf"));
>     StringWriter writer = new StringWriter();
>     new PDFTextStripper().writeText(pdDoc, writer);
>     pdDoc.close();
>     System.out.println(writer.toString());

--
This message was sent by Atlassian Jira
(v8.20.10#820010)