Greek text extraction
---------------------
Key: PDFBOX-770
URL: https://issues.apache.org/jira/browse/PDFBOX-770
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 1.2.0
Environment: Ubuntu 10.04
Reporter: Manos Karampasis
Greek text extraction error
I have a Greek PDF, but after extraction the Greek letter π comes out as the Latin letters "pi".
For example:
Original text in the PDF:
"φυσικών προσώπων"
Extracted text:
"φυσικών piροσώpiων"
Because of this problem, Solr does not index the documents correctly.
Is there any configuration I can change to fix this?
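
Until the extraction itself is fixed, one possible stopgap is to post-process the extracted text before it reaches Solr. The sketch below is an assumption, not a PDFBox setting: the class name GreekPiFix and the method restorePi are hypothetical, and it uses only the Java standard library. It replaces the lowercase Latin sequence "pi" with π only when a Greek letter sits directly before or after it, so ordinary Latin words are left alone. It does not handle a capital Π, and a standalone "pi" with no Greek neighbour stays unchanged.

```java
import java.util.regex.Pattern;

public class GreekPiFix {
    // Hypothetical helper, not part of PDFBox.
    // Matches Latin "pi" only when a Greek-script character is
    // immediately before it (lookbehind) or immediately after it (lookahead).
    private static final Pattern LATIN_PI =
            Pattern.compile("(?<=\\p{IsGreek})pi|pi(?=\\p{IsGreek})");

    // Returns the text with each matched "pi" replaced by the Greek letter π.
    public static String restorePi(String extracted) {
        return LATIN_PI.matcher(extracted).replaceAll("π");
    }

    public static void main(String[] args) {
        // Sample taken from the extracted text quoted in this report.
        System.out.println(restorePi("φυσικών piροσώpiων"));
    }
}
```

For the sample above, restorePi returns "φυσικών προσώπων", which matches the original text in the PDF.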