[ 
https://issues.apache.org/jira/browse/PDFBOX-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908062#comment-14908062
 ] 

Ben McCann edited comment on PDFBOX-2991 at 10/15/15 10:19 PM:
---------------------------------------------------------------

The big problem here for me is that it's not recognizing them as different 
words. It should be able to tell that whitespace belongs between "California" 
and "[email protected]", right?


was (Author: chengas123):
I don't care whether it thinks they're on the same line or not. The big problem 
here for me is that it's not recognizing them as different words. It should be 
able to tell that whitespace belongs between "CA" and "[email protected]", right?

> Improper word concatenation when extracting pdf
> -----------------------------------------------
>
>                 Key: PDFBOX-2991
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2991
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Ben McCann
>         Attachments: sample-resume.pdf
>
>
> The code below outputs the text of a PDF. Words that are on different lines 
> are concatenated together:
>     PDDocument pdDoc = PDDocument.load(new File("sample-resume.pdf"));
>     StringWriter writer = new StringWriter();
>     new PDFTextStripper().writeText(pdDoc, writer);
>     pdDoc.close();
>     System.out.println(writer.toString());
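
To make the expectation in the comment concrete, here is a minimal sketch of one way an extractor could decide where to separate words. It is not PDFBox's implementation (the real logic lives in PDFTextStripper and is more involved). The `Run` type, the `join` method, the `gapFactor` and `lineTolerance` parameters, the coordinates, and the placeholder address `user@example.com` are all hypothetical and chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class WordGapSketch {
    /** Hypothetical positioned text run; NOT a PDFBox type. */
    static final class Run {
        final String text;
        final float x;      // left edge of the run
        final float y;      // baseline of the run
        final float width;  // total width of the run

        Run(String text, float x, float y, float width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    /**
     * Joins runs in reading order. A baseline change larger than
     * lineTolerance starts a new line; otherwise a horizontal gap wider
     * than gapFactor times the previous run's average character width
     * inserts a space.
     */
    static String join(List<Run> runs, float gapFactor, float lineTolerance) {
        StringBuilder out = new StringBuilder();
        Run prev = null;
        for (Run cur : runs) {
            if (prev != null) {
                float avgCharWidth = prev.width / Math.max(1, prev.text.length());
                float gap = cur.x - (prev.x + prev.width);
                if (Math.abs(cur.y - prev.y) > lineTolerance) {
                    out.append('\n');   // different baseline: separate lines
                } else if (gap > gapFactor * avgCharWidth) {
                    out.append(' ');    // same line, visible gap: separate words
                }
            }
            out.append(cur.text);
            prev = cur;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up coordinates: two runs stacked on different baselines.
        List<Run> runs = new ArrayList<>();
        runs.add(new Run("California", 72f, 700f, 50f));
        runs.add(new Run("user@example.com", 72f, 688f, 80f));
        System.out.println(join(runs, 0.5f, 2f));
    }
}
```

Under this heuristic the two runs are never glued together, because a baseline change alone is enough to emit a separator, whether or not the extractor considers them to be on the same line.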



