[ 
https://issues.apache.org/jira/browse/PDFBOX-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson updated PDFBOX-2547:
--------------------------------
    Affects Version/s: 2.0.0

> maybe encoding error
> --------------------
>
>                 Key: PDFBOX-2547
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2547
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.7, 2.0.0
>            Reporter: Michał
>            Priority: Minor
>
> Hi,
> I just downloaded a PDF from this page:
> http://download.jw.org/files/media_books/32/es15_P.pdf
> and want to extract the text from this document.
> I use the command:
> java -jar pdfbox-app-1.8.7.jar ExtractText -encoding UTF-8 es15_P.pdf resultFile-UTF-8.txt
> But I see some problems, for example:
> 1. In the text file I see 'STX' and 'ETX' instead of 'ę' and 'ą'.
> 2. The extractor returns the text 'naprzykładmiłe' instead of 'na przykład miłe'
> (page 4, line 6).
> Maybe these are just small problems.
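
The first symptom can be checked mechanically. The sketch below assumes that the 'STX' and 'ETX' seen in the output are the control characters U+0002 and U+0003 as rendered by a text editor; the report does not state this explicitly. The class and method names are hypothetical, and the code uses only the JDK, not the PDFBox API. It counts control characters in the extracted text so the affected file can be confirmed without eyeballing it:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of PDFBox.
final class ControlCharScan {

    /**
     * Counts C0 control characters (below U+0020) in the text,
     * ignoring tab, line feed and carriage return.
     * Keys are formatted code points such as "U+0002".
     */
    static Map<String, Integer> countControlChars(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean ordinaryWhitespace = c == '\t' || c == '\n' || c == '\r';
            if (c < 0x20 && !ordinaryWhitespace) {
                counts.merge(String.format("U+%04X", (int) c), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the extracted text file, e.g. resultFile-UTF-8.txt
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String text = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(countControlChars(text));
    }
}
```

If saved as ControlCharScan.java, it could be run on Java 11 or later with `java ControlCharScan.java resultFile-UTF-8.txt`. Non-zero counts for U+0002 and U+0003 would support the assumption that the glyphs for 'ę' and 'ą' are being mapped to raw character codes rather than to their Unicode values.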



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
