Oh, interesting! Thanks for the suggestion. It does indeed bring back the C. However, when I do that, it confuses a 0 with a 9 on another example :(
[image: output6.png]

I get C 9135 with the 'fra' option and 0135 without.

I built a small application that splits the characters one by one and runs each of them through tesseract individually, and with that I correctly get C 0135. But it fails with other images. I'm wondering what's wrong with my input picture :-/

JMS

On Tuesday, April 2, 2024 at 09:49:49 UTC-4, renec...@gmail.com wrote:

> Hi Jean-Marc,
> I tested your picture with the French language parameter (--psm 6 -l 'fra') and it works well.
> With the English language (-l eng), the C is indeed dropped.
> In fact, the C is read as a euro sign (€).
> Hope it helps.
>
> Best regards,
> René
>
> On Tue, Apr 2, 2024 at 14:46, Jean-Marc Spaggiari <jean...@spaggiari.org> wrote:
>
>> Hi,
>>
>> I'm trying to OCR short strings made of a letter, a space, and 4 digits.
>>
>> I'm doing a lot of pre-processing to clean up the picture, and so far I end up with something like this:
>> [image: output6.png]
>> My challenge is that tesseract only detects the digits. I tried all the possible PSM values with the same result. The leading C is always ignored.
>>
>> This is the command line that I am running:
>> tesseract -c tessedit_char_whitelist=" 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" output6.png stdout
>>
>> I tried with tesseract 5.3.0 and tesseract 5.3.4-45-g87a15 with the same result.
>>
>> I'm looking for recommendations on what I can do better to help tesseract detect the leading C correctly.
>>
>> Thanks,
>>
>> JMS
>>
>> --
>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/cf6a3a25-732a-4214-8ce3-03a90a719c8dn%40googlegroups.com.
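
For anyone wanting to reproduce the comparison discussed in this thread, here is a minimal Python sketch that runs the tesseract CLI with and without a language option and checks whether the output has the expected shape (one letter, a space, four digits). The helper names (`build_command`, `is_expected_format`, `run_ocr`, `first_valid_result`) are made up for illustration and are not part of tesseract. The flags `--psm`, `-l` and `-c tessedit_char_whitelist=...` are the ones quoted above. It assumes the `tesseract` binary is on the PATH with the `fra` and `eng` data installed.

```python
import re
import subprocess

# Same whitelist as in the command quoted in the thread:
# a space, the digits, and the uppercase letters.
WHITELIST = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Expected shape described in the thread: one letter, one space, four digits.
EXPECTED = re.compile(r"[A-Z] [0-9]{4}")


def build_command(image, lang=None, psm=6):
    """Build the tesseract argument list for one image (hypothetical helper)."""
    cmd = ["tesseract", image, "stdout", "--psm", str(psm)]
    if lang:
        cmd += ["-l", lang]
    cmd += ["-c", "tessedit_char_whitelist=" + WHITELIST]
    return cmd


def is_expected_format(text):
    """Return True if the OCR text is exactly 'letter, space, 4 digits'."""
    return EXPECTED.fullmatch(text.strip()) is not None


def run_ocr(image, lang=None, psm=6):
    """Run tesseract on the image and return its trimmed stdout."""
    result = subprocess.run(
        build_command(image, lang, psm),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def first_valid_result(image, langs=("fra", "eng", None)):
    """Try each language in turn; return (lang, text) for the first
    output that has the expected shape, or None if none does."""
    for lang in langs:
        text = run_ocr(image, lang)
        if is_expected_format(text):
            return lang, text
    return None
```

Calling `first_valid_result("output6.png")` would try 'fra', then 'eng', then the default language. Note the limit of this check: it catches a dropped letter (0135 is rejected), but it cannot catch a 0 read as a 9, since C 9135 still has a valid shape.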