[ https://issues.apache.org/jira/browse/PDFBOX-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson updated PDFBOX-2547:
--------------------------------
    Affects Version/s: 2.0.0

> maybe encoding error
> --------------------
>
>                 Key: PDFBOX-2547
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2547
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.7, 2.0.0
>            Reporter: Michał
>            Priority: Minor
>
> Hi,
> I just downloaded a PDF from this page:
> http://download.jw.org/files/media_books/32/es15_P.pdf
> and want to extract text from this document.
> I use the command:
> java -jar pdfbox-app-1.8.7.jar ExtractText -encoding UTF-8 es15_P.pdf
> resultFile-UTF-8.txt
> But I see some problems, for example:
> 1. In the text file I see 'STX' and 'ETX' instead of 'ę' and 'ą'.
> 2. The extractor returns the text 'naprzykładmiłe' instead of 'na przykład miłe'
> (page 4, line 6).
> Maybe these are just small problems.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
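
A note for anyone triaging symptom 1 of the report: STX and ETX are the standard names of the control characters U+0002 and U+0003, so the extracted file presumably contains those code points where 'ę' and 'ą' were expected. The following is a minimal Java sketch, not part of the original report and not using any PDFBox API, that lists such characters in an extracted text file. The class ControlCharScan and its method findControlChars are hypothetical names; the default file name is the one from the reporter's command.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: reports C0 control characters
// (such as STX = U+0002 and ETX = U+0003) found in extracted text.
class ControlCharScan {

    // Returns one entry per control character, formatted as "offset:U+XXXX",
    // where offset is the index of the char within the string.
    // Tab, line feed and carriage return are skipped because they are
    // expected in ordinary text output.
    public static List<String> findControlChars(String text) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                hits.add(String.format("%d:U+%04X", i, (int) c));
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the output file name used in the report's command.
        String path = args.length > 0 ? args[0] : "resultFile-UTF-8.txt";
        String text = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        for (String hit : findControlChars(text)) {
            System.out.println(hit);
        }
    }
}
```

Saved as ControlCharScan.java, this could be compiled with javac and run as "java ControlCharScan resultFile-UTF-8.txt". Any U+0002 or U+0003 entries in the output would confirm that the wrong code points are already present in the extracted text rather than being introduced by the viewer.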