Jakub Wilk:
> Could you attach the HTML files to the bug report (or, alternatively,
> send them to me in a private mail)?

Hi Jakub,

thank you for responding so quickly. I reported the same issue to gscan2pdf and attached the minimal hocr test file that Jeffrey Ratcliffe uses.

The tesseract utf-8 issue is also already reported upstream:
http://code.google.com/p/tesseract-ocr/issues/detail?id=690

Maybe you want to star it?

Best regards,

Thomas Koch, http://www.koch.ro
Pße