Thanks for giving it a try! I ended up generating 11 versions of the same 
picture, each with slightly different filtering, and at least one version 
always comes out fully readable. So for now I'm happy with the solution and 
the ideas provided here.
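
For reference, a minimal sketch of how the readable version could be picked 
automatically. It assumes the OCR text of each filtered variant has already 
been collected into a list, and that a valid result is always one uppercase 
letter, a space, and four digits. The names LABEL_PATTERN and 
pick_best_reading are made up for this example, and the majority vote is only 
one possible selection rule, not necessarily the one I use.

```python
import re
from collections import Counter

# Assumed shape of a valid result: one uppercase letter, a space, four digits
# (for example "C 0135"). Adjust the pattern if the real labels differ.
LABEL_PATTERN = re.compile(r"[A-Z] \d{4}")


def pick_best_reading(readings):
    """Return the most frequent reading that matches LABEL_PATTERN, or None.

    `readings` is a list of raw OCR strings, one per filtered variant of the
    same picture. Surrounding whitespace is stripped before matching.
    """
    valid = []
    for raw in readings:
        text = raw.strip()
        if LABEL_PATTERN.fullmatch(text):
            valid.append(text)
    if not valid:
        return None
    # Majority vote among the well-formed readings.
    return Counter(valid).most_common(1)[0][0]
```

With the readings seen in this thread, such as "0135", "C 9135" and two 
variants giving "C 0135", the malformed result is discarded and the vote 
returns "C 0135".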

JMS

On Wednesday, April 3, 2024 at 11:35:56 AM UTC-4, renec...@gmail.com 
wrote:

> I processed your image with the same parameters and it works well with 
> tesseract 5.3.
>
> On Tue, Apr 2, 2024 at 16:12, Jean-Marc Spaggiari <jean...@spaggiari.org> 
> wrote:
>
>> Oh, interesting! Thanks for the suggestion.
>>
>> It does indeed recover the C. However, when I do that, it mistakes a 0 
>> for a 9 on another example :(
>>
>> [image: output6.png]
>>
>> I get C 9135 with the 'fra' option and 0135 without. 
>>
>> I built a small application to split the letters one by one and to run 
>> them individually through tesseract and I get C 0135 correctly. But it 
>> fails with other images. I'm wondering what's wrong with my input picture 
>> :-/
>>
>> JMS
>>
>>
>> On Tuesday, April 2, 2024 at 9:49:49 AM UTC-4, renec...@gmail.com 
>> wrote:
>>
>>> Hi Jean-Marc,
>>> I tested your picture with the French language parameter 
>>> (--psm 6 -l 'fra') and it works well.
>>> With the English language (-l eng), the C is indeed dropped.
>>> In fact, the C is read as a euro sign (€).
>>> Hope it helps.
>>>
>>> Best regards
>>> René
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Apr 2, 2024 at 14:46, Jean-Marc Spaggiari <jean...@spaggiari.org> 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm trying to OCR short strings in the form of a letter, a space, and 4 
>>>> digits.
>>>>
>>>> I'm doing a lot of pre-processing to clean up the picture, and so far 
>>>> I get something like this:
>>>> [image: output6.png]
>>>> My challenge is that tesseract only detects the digits. I tried 
>>>> all the possible PSMs with the same result. The leading C is always ignored.
>>>>
>>>> This is the command line that I am running:
>>>> tesseract -c tessedit_char_whitelist=" 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" output6.png stdout
>>>>
>>>> I tried with tesseract 5.3.0 and tesseract 5.3.4-45-g87a15 with the 
>>>> same result. 
>>>>
>>>> I'm looking for recommendations on what I can do better to help 
>>>> tesseract detect the leading C correctly.
>>>>
>>>> Thanks,
>>>>
>>>> JMS
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/cf6a3a25-732a-4214-8ce3-03a90a719c8dn%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/cf6a3a25-732a-4214-8ce3-03a90a719c8dn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>

