Oh, interesting! Thanks for the suggestion.

It does indeed work for getting the C recognized. However, when I do that, 
it mistakes a 0 for a 9 in another example :(

[image: output6.png]

I get C 9135 with the 'fra' option and 0135 without. 
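
A minimal sketch for reproducing this comparison, assuming Python and a `tesseract` binary on the PATH. The helper names `build_cmd` and `ocr` are hypothetical, the whitelist is the one from the original command quoted below, and `--psm 6` is the value René reported using. It is only an illustration of how the two runs can be set up side by side, not the exact script behind the results above.

```python
import subprocess

# Whitelist taken from the original command (note the leading space).
WHITELIST = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def build_cmd(image, lang=None, psm=None, whitelist=WHITELIST):
    """Build a tesseract argv list that prints the recognized text to stdout."""
    cmd = ["tesseract", image, "stdout"]
    if lang is not None:
        cmd += ["-l", lang]            # e.g. "fra"; tesseract defaults to eng
    if psm is not None:
        cmd += ["--psm", str(psm)]     # page segmentation mode
    if whitelist:
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return cmd


def ocr(image, **options):
    """Run tesseract and return its stripped stdout (needs tesseract installed)."""
    result = subprocess.run(build_cmd(image, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Example usage (not executed here, it needs the image and tesseract):
# for lang in (None, "fra"):
#     print(lang, repr(ocr("output6.png", lang=lang, psm=6)))
```

Passing the arguments as a list avoids shell quoting problems with the space inside the whitelist.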

I built a small application to split the characters one by one and run them 
individually through tesseract, and that way I correctly get C 0135. But it 
fails with other images. I'm wondering what's wrong with my input picture :-/
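
One common way to do this kind of per-character splitting is a column projection: find runs of columns that contain ink and crop each run. The sketch below assumes the image is already binarized and given as a list of rows where 1 is ink and 0 is background. The function names are hypothetical and this is not necessarily how the application mentioned above does it. Each crop could then be sent to tesseract in single-character mode (`--psm 10`), which is an assumption since the PSM used for the individual runs is not stated.

```python
def split_columns(rows):
    """Return (start, end) column ranges of glyphs in a binary image.

    `rows` is a list of equal-length rows; 1 marks ink, 0 marks background.
    A glyph is taken to be a maximal run of columns with at least one ink pixel,
    so touching characters are NOT separated by this simple approach.
    """
    if not rows:
        return []
    width = len(rows[0])
    has_ink = [any(row[x] for row in rows) for x in range(width)]

    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a glyph begins
        elif not ink and start is not None:
            spans.append((start, x))       # a glyph ends (end is exclusive)
            start = None
    if start is not None:
        spans.append((start, width))       # glyph touching the right edge
    return spans


def crop(rows, span):
    """Cut the columns in `span` out of every row."""
    start, end = span
    return [row[start:end] for row in rows]
```

In practice it usually helps to add a few pixels of background padding around each crop before running OCR on it.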

JMS


On Tuesday, April 2, 2024 at 09:49:49 UTC-4, renec...@gmail.com wrote:

> Hi Jean-Marc,
> I tested your picture with the French language parameter (--psm 6 -l 'fra') 
> and it works well.
> With the English language (-l eng), the C is indeed dropped.
> In fact, the C is read as a euro sign (€).
> Hope it helps.
>
> Best regards
> René
>
> On Tue, Apr 2, 2024 at 14:46, Jean-Marc Spaggiari <jean...@spaggiari.org> 
> wrote:
>
>> Hi,
>>
>> I'm trying to OCR short strings made of a letter, a space, and 4 digits.
>>
>> I'm doing a lot of pre-processing to clean up the picture, and so far I 
>> end up with something like this:
>> [image: output6.png]
>> My challenge is that tesseract only detects the digits. I tried all 
>> the possible PSM values with the same result. The leading C is always ignored.
>>
>> This is the command line that I am running:
>> tesseract -c tessedit_char_whitelist=" 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" output6.png stdout
>>
>> I tried with tesseract 5.3.0 and tesseract 5.3.4-45-g87a15 with the same 
>> result. 
>>
>> I'm looking for recommendations on what I can do better to help 
>> tesseract detect the leading C correctly.
>>
>> Thanks,
>>
>> JMS
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/cf6a3a25-732a-4214-8ce3-03a90a719c8dn%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/tesseract-ocr/cf6a3a25-732a-4214-8ce3-03a90a719c8dn%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

