[ https://issues.apache.org/jira/browse/PDFBOX-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179883#comment-14179883 ]
EugenePig commented on PDFBOX-2409:
-----------------------------------

I am sure I ran ExtractText with "-sort -encoding UTF-8". I uploaded a new picture, "THESSALONIANS.txt.mac.jpg", captured on the Mac. The result is still wrong. Does it look the same as yours?

> got the wrong result from Arabic text extraction
> ------------------------------------------------
>
>                 Key: PDFBOX-2409
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2409
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.7, 2.0.0
>         Environment: Ubuntu 14.04 64bit
> java version "1.8.0_20"
>            Reporter: EugenePig
>            Assignee: John Hewson
>         Attachments: THESSALONIANS.pdf, THESSALONIANS.txt, THESSALONIANS.txt.jpg
>
>
> java -jar pdfbox-app-1.8.7.jar ExtractText -sort -encoding UTF-8 THESSALONIANS.pdf
> java -jar pdfbox-app-2.0.0-SNAPSHOT.jar ExtractText -sort -encoding UTF-8 THESSALONIANS.pdf
> Please compare THESSALONIANS.txt.jpg with THESSALONIANS.pdf. There are a lot of differences. I just marked a few differences with red circles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)