Hi experts,

I’ve read that Tesseract is not good at OCR on images such as photos from the internet, but does well on PDF text.

Is this true, or do I need to set up some more complex training to guide it?


On Feb 14, 2024, at 12:28, Glenn C <gck...@gmail.com> wrote:

Hi all,

I'm trying to build a meme text extractor. Since I don't know the font, location, or other details of the text in advance, I can't use any of the documented or internet recommendations such as character whitelists or single-line mode. In this example, the detection is too accurate: I want the meme text and not the other things in the image.

What are the best methods for filtering or image processing in cases like this? (Most internet recommendations are for noise filtering, color inversion, etc., which aren't really useful here.)

I've attached a sample image, along with my Tesseract output.

thanks in advance!
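
One possible way to filter the output, as a rough sketch only and not tested on the attached image: assume the meme caption is the largest text in the picture, and keep only words whose bounding box is close in height to the tallest confidently recognized word. The function name keep_large_words and the thresholds min_conf and height_ratio are made up for illustration and would need tuning. The input is assumed to be the dict of parallel lists that pytesseract.image_to_data returns with output_type=Output.DICT.

```python
def keep_large_words(data, min_conf=60.0, height_ratio=0.6):
    """Keep words whose box height is close to the tallest confident word.

    `data` is a dict of parallel lists with at least the keys "text",
    "conf" and "height". With pytesseract it could be obtained like this
    (assumes pytesseract and Pillow are installed):

        import pytesseract
        from PIL import Image
        data = pytesseract.image_to_data(
            Image.open("meme.jpg"), output_type=pytesseract.Output.DICT)

    Assumption: the caption is the largest text in the image.
    """
    words = []
    for text, conf, height in zip(data["text"], data["conf"], data["height"]):
        text = text.strip()
        conf = float(conf)  # conf may arrive as int, float or str
        # Skip empty entries (layout rows) and low-confidence words.
        if text and conf >= min_conf:
            words.append((text, height))
    if not words:
        return []
    tallest = max(height for _, height in words)
    # Keep only words at least `height_ratio` times as tall as the tallest.
    return [text for text, height in words if height >= height_ratio * tallest]
```

If the caption is not the largest text, the same word boxes could be filtered by position instead (for example, only the top and bottom bands of the image), using the "top" and "left" fields of the same dict.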

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/5126e29c-b2af-43db-b570-d6d7af2e57acn%40googlegroups.com.
<IMG_5592.jpg>
<imgtotext.png>
