Cropping the image to include only the relevant area can significantly 
improve results in cases where recognition was poor because image 
processing or layout analysis failed.  A sign that this is happening is 
that words are missing entirely from the final output (rather than being 
misidentified).  
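
As a rough sketch of the cropping step: since screenshots may come in at 
different resolutions, one option is to describe the relevant region as 
fractions of the image size and convert that to pixels per image.  The 
function name and the example fractions below are made up for illustration; 
the real values would have to be measured from sample screenshots, and 
fractions alone will not fully compensate for layouts that shift with 
aspect ratio.

```python
def crop_box(width, height, left=0.0, top=0.0, right=1.0, bottom=1.0):
    """Convert fractional region bounds into a (left, upper, right, lower) pixel box.

    The fractions are hypothetical inputs: measure them once on a sample
    screenshot, then reuse them for screenshots of other sizes.
    """
    if not (0.0 <= left < right <= 1.0 and 0.0 <= top < bottom <= 1.0):
        raise ValueError("expected 0 <= left < right <= 1 and 0 <= top < bottom <= 1")
    return (
        round(left * width),
        round(top * height),
        round(right * width),
        round(bottom * height),
    )


# Example: keep the middle half horizontally and the lower half vertically.
box = crop_box(1920, 1080, left=0.25, top=0.5, right=0.75, bottom=1.0)
```

The resulting tuple uses the (left, upper, right, lower) order that Pillow's 
Image.crop expects, so it can be passed there directly if you use Pillow.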

Knowing the font is less useful.  At least with the higher-performing LSTM 
model, there is no easy way to limit Tesseract to certain fonts. 

Other things you can try given your input image:

   - Try disabling the dictionaries (DAWGs)
      - Tesseract is heavily biased towards dictionary words.  That 
      generally makes sense, but probably not in the context of usernames 
      (which are generally not dictionary words).
   - Try upscaling the image
      - Tesseract sometimes performs better if you upscale the image, even 
      if it's a "dumb" upscaling (splitting every pixel into 4).
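
A minimal sketch of both suggestions, assuming the tesseract command-line 
tool is installed and on your PATH.  The helper names are made up for 
illustration.  load_system_dawg and load_freq_dawg are the Tesseract config 
variables for the main and frequent-word dictionaries; whether turning them 
off actually helps depends on your model and images, so compare results 
with and without.  The upscaling helper works on a plain grid of pixel 
values to show what "splitting every pixel into 4" means; in practice an 
imaging library's nearest-neighbour resize does the same job.

```python
import subprocess


def build_tesseract_command(image_path, disable_dictionaries=True):
    """Return the argument list for a tesseract CLI call that prints text to stdout."""
    cmd = ["tesseract", image_path, "stdout"]
    if disable_dictionaries:
        # Switch off the system and frequent-word dictionaries (DAWGs).
        cmd += ["-c", "load_system_dawg=0", "-c", "load_freq_dawg=0"]
    return cmd


def upscale_2x(pixels):
    """'Dumb' 2x upscale of a row-major pixel grid: each pixel becomes a 2x2 block."""
    out = []
    for row in pixels:
        wide = [value for value in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))  # duplicate the row
    return out


def run_ocr(image_path):
    """Run tesseract on an image file and return the recognised text."""
    result = subprocess.run(
        build_tesseract_command(image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

run_ocr is not called here because it needs a real image file and a 
Tesseract installation; build_tesseract_command and upscale_2x can be tried 
on their own.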
   
On Saturday, April 6, 2024 at 1:00:59 PM UTC-7 Shatter wrote:

> Hey y'all o/
>
> I am maintaining a discord bot for a small community of a mobile game and 
> want to develop a new feature.
>
> The game has tournaments and a different user is maintaining a website to 
> track the performance of the best users.
> I want to give members of my community the option to relatively easily 
> check who they're up against. Unlike e.g. op.gg or porofessor for 
> League of Legends, there's no API for this game, so my idea went to text 
> recognition.
>
> I know which typeface is going to be used, and should be able to crop some 
> irrelevant stuff (though aspect ratios could screw with this).
>
> Does knowing this and thus restricting the expected inputs make the job 
> easier and less error prone?
>
> I am adding an image of what an input could look like; the yellow frame 
> marks the relevant data. [image: firefox_aNtcjbCPMI.png]
>
> Any pointers or help are very welcome ^^
>
>
