https://bugs.kde.org/show_bug.cgi?id=472692
Bug ID: 472692
Summary: Tesseract OCR does not take language selection into account
Classification: Applications
Product: digikam
Version: 8.2.0
Platform: Microsoft Windows
OS: Microsoft Windows
Status: REPORTED
Severity: normal
Priority: NOR
Component: Plugin-Generic-OcrTextConverter
Assignee: digikam-bugs-n...@kde.org
Reporter: claus.peja+...@gmail.com
Target Milestone: ---

Created attachment 160560
  --> https://bugs.kde.org/attachment.cgi?id=160560&action=edit
One of the files I want to extract text from.

SUMMARY
I use Tesseract OCR with digiKam 8.2.0 (20.07.2023) on Windows 10 Pro to extract text from a JPG.
With 'Languages: Default' I get a result, but the German umlauts ä, ö, and ü are recognized incorrectly as a, o, and u (and yes, that makes a difference in German ;-) ).
With 'Languages: deu' I get no result: no text is found at all. Selecting another language, e.g. eng, also gives no result.
However, when I run Tesseract (v5.3.1.20230401) directly on the command line with the switch -l deu, it works.

Tesseract command that works:
    tesseract /dir/pic1.jpg /text/pic1.ocr-result -l deu

I attach one of the pictures I use. I marked the last sentence and the umlauts in it.

STEPS TO REPRODUCE
1. Open the attached image in 'OCR text converter...'
2. Select 'Languages: Default'. The choice of 'Segmentation mode' and 'Engine mode' makes no difference. DPI=72
3. Start OCR
4. Now you get the result without umlauts (ö ü)
5. Close OCR
6. Open the same image again in 'OCR text converter...'
7. Select 'Languages: deu'. The choice of 'Segmentation mode' and 'Engine mode' makes no difference. DPI=72
8. Start OCR
9. Now you get no result

OBSERVED RESULT
With 'Languages: Default', the sentence is recognized as:
Die Giebel und Traufen konnen durch Wind- bzw. Traufen- oder Tropf-bretter geschutzt werden.

EXPECTED RESULT
The correct sentence is:
Die Giebel und Traufen können durch Wind- bzw. Traufen- oder Tropf-bretter geschützt werden.

SOFTWARE/OS VERSIONS
Windows 10, 22H2

ADDITIONAL INFORMATION
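
To cross-check the language handling outside digiKam, the working command-line call from the summary could be scripted roughly as follows. This is only a minimal sketch: the helper names build_tesseract_cmd, run_ocr, and compare_languages are hypothetical, the script assumes the tesseract executable is on the PATH, and it does not reflect how digiKam itself invokes Tesseract.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Dict, List, Optional, Sequence


def build_tesseract_cmd(image: str, output_base: str,
                        lang: Optional[str] = None) -> List[str]:
    """Build the argument list for one Tesseract run (hypothetical helper).

    With lang=None no -l switch is passed, so Tesseract falls back to
    its own default language.
    """
    cmd = ["tesseract", image, output_base]
    if lang:
        cmd += ["-l", lang]
    return cmd


def run_ocr(image: str, output_base: str, lang: Optional[str] = None) -> str:
    """Run Tesseract once and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_cmd(image, output_base, lang), check=True)
    # For plain-text output, Tesseract appends ".txt" to the output base.
    return Path(output_base + ".txt").read_text(encoding="utf-8")


def compare_languages(image: str, out_dir: str,
                      langs: Sequence[Optional[str]] = (None, "deu", "eng"),
                      ) -> Dict[str, str]:
    """Run the same image through several language settings."""
    results = {}
    for lang in langs:
        label = lang or "default"
        base = str(Path(out_dir) / f"result-{label}")
        results[label] = run_ocr(image, base, lang)
    return results
```

Calling compare_languages("/dir/pic1.jpg", "/text") would run the image once without a language switch and once each with -l deu and -l eng, so the three outputs can be compared with what the plugin returns for the same settings.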