Thanks for giving it a try! I ended up generating 11 versions of the same picture, each with slightly different filtering, and at least one version always comes out fully readable. So for now I'm happy with this solution and the ideas provided here.
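For anyone finding this thread later, here is a minimal sketch of how the "several filtered versions" idea could be automated. It assumes the filtered variants already exist as image files. The format check (one letter, a space, four digits) and the majority vote are my own assumptions for picking the readable version, not something tesseract provides, and the names `run_tesseract` and `best_reading` are hypothetical. The tesseract options are the ones suggested earlier in the thread.

```python
import re
import subprocess
from collections import Counter
from typing import Callable, Iterable, Optional

# Expected shape described in the thread: a letter, a space, 4 digits (e.g. "C 0135").
EXPECTED = re.compile(r"[A-Z] \d{4}")


def run_tesseract(image_path: str) -> str:
    """OCR one image with the tesseract CLI (--psm 6 -l fra, as suggested in the thread)."""
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "--psm", "6", "-l", "fra"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def best_reading(
    image_paths: Iterable[str],
    ocr: Callable[[str], str] = run_tesseract,
) -> Optional[str]:
    """Return the most frequent reading that matches the expected format, or None."""
    readings = (ocr(path).strip() for path in image_paths)
    # Assumption: a reading that does not match "letter space 4 digits" is a misread.
    valid = [text for text in readings if EXPECTED.fullmatch(text)]
    if not valid:
        return None
    # Majority vote across the variants that produced a well-formed result.
    return Counter(valid).most_common(1)[0][0]
```

Calling `best_reading(["variant_01.png", "variant_02.png"])` (hypothetical file names) would OCR each variant and return the winning text. Note that the vote only helps if misreads such as 9 for 0 are in the minority across variants.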
JMS

On Wednesday, April 3, 2024 at 11:35:56 AM UTC-4, renec...@gmail.com wrote:

> I processed your image with the same parameters and it works well with
> tesseract 5.3.
>
> On Tue, Apr 2, 2024 at 16:12, Jean-Marc Spaggiari <jean...@spaggiari.org>
> wrote:
>
>> Oh, interesting! Thanks for the suggestion.
>>
>> That does indeed bring back the C. However, when I do that, it
>> confuses a 0 with a 9 on another example :(
>>
>> [image: output6.png]
>>
>> I get C 9135 with the 'fra' option and 0135 without.
>>
>> I built a small application to split the letters one by one and run
>> them individually through tesseract, and I get C 0135 correctly. But it
>> fails with other images. I'm wondering what's wrong with my input picture
>> :-/
>>
>> JMS
>>
>> On Tuesday, April 2, 2024 at 09:49:49 AM UTC-4, renec...@gmail.com
>> wrote:
>>
>>> Hi Jean-Marc,
>>> I tested your picture with the French language parameter: --psm 6 -l
>>> 'fra' works well.
>>> With the English language (-l eng), the C is indeed dropped.
>>> In fact, the C is read as a euro sign (€).
>>> Hope it helps
>>>
>>> Best regards
>>> René
>>>
>>> On Tue, Apr 2, 2024 at 14:46, Jean-Marc Spaggiari <jean...@spaggiari.org>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm trying to OCR short words in the form of a letter, a space, 4
>>>> numbers.
>>>>
>>>> I'm doing a lot of pre-processing to get the picture cleaned, and so far
>>>> I arrive at something like this:
>>>> [image: output6.png]
>>>> My challenge is that tesseract is only detecting the numbers. I tried
>>>> all the possible PSM values with the same result. The leading C is
>>>> always ignored.
>>>>
>>>> This is the command line that I am running:
>>>> tesseract -c tessedit_char_whitelist=" 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" output6.png stdout
>>>>
>>>> I tried with tesseract 5.3.0 and tesseract 5.3.4-45-g87a15 with the
>>>> same result.
>>>>
>>>> I'm looking for some recommendations on what I can do better to help
>>>> tesseract detect the leading C correctly.
>>>>
>>>> Thanks,
>>>>
>>>> JMS
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/cf6a3a25-732a-4214-8ce3-03a90a719c8dn%40googlegroups.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/c7f7c846-d9d2-4fba-b708-d02aeb5612c4n%40googlegroups.com.