[ https://issues.apache.org/jira/browse/PDFBOX-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088233#comment-14088233 ]
John Hewson commented on PDFBOX-2259:
-------------------------------------

I'm not sure what you mean. The linked webpage doesn't contain the phrase "semi-space" anywhere. What output were you expecting? Can you paste an example?

> PDFTextStripper has problem with semi-space characters
> ------------------------------------------------------
>
>                 Key: PDFBOX-2259
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2259
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.6
>            Reporter: Amir
>            Priority: Critical
>         Attachments: test.pdf
>
>
> In some right-to-left languages, compound words are separated using
> "semi-space" (please take a look at Unicode spaces:
> https://www.cs.tut.fi/~jkorpela/chars/spaces.html). When the input document
> contains these words, PDFTextStripper neglects semi-space character and
> concatenates words together.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
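
One way to answer the request for an example is to compare the expected and the extracted strings code point by code point. The sketch below is plain Java and calls no PDFBox API. It assumes that "semi-space" means U+200C ZERO WIDTH NON-JOINER, which the report does not confirm. The class name `SemiSpaceCheck`, its two helpers, and the sample Persian word are hypothetical and are not taken from test.pdf.

```java
public class SemiSpaceCheck {
    // Assumption: the reporter's "semi-space" is U+200C (ZERO WIDTH NON-JOINER).
    public static final char ZWNJ = '\u200C';

    // Counts how many ZWNJ characters a string contains.
    public static int countZwnj(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == ZWNJ) {
                count++;
            }
        }
        return count;
    }

    // Renders every non-ASCII char as a backslash-u escape so that
    // invisible characters become visible when pasted into an issue.
    public static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative compound word with a ZWNJ between its two parts.
        String expected = "\u0645\u06CC\u200C\u0631\u0648\u0645";
        // The same word with the ZWNJ dropped, as the report describes.
        String concatenated = "\u0645\u06CC\u0631\u0648\u0645";

        System.out.println("expected:     " + escapeNonAscii(expected)
                + " (ZWNJ count: " + countZwnj(expected) + ")");
        System.out.println("concatenated: " + escapeNonAscii(concatenated)
                + " (ZWNJ count: " + countZwnj(concatenated) + ")");
    }
}
```

Running the same two helpers on the text returned by PDFTextStripper for test.pdf, and on the text expected from it, would show whether a separator character is actually missing from the output.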