Hello, I've run into some trouble using Tesseract OCR in a Python program that does some screen scraping. I can't quite wrap my head around why this one value is giving so much more trouble than the others on the same page, which have the same contrast and font.
This is the image in question:

It was scraped from a 1080p screenshot, sliced into individual images for each value in a grid, scaled up 10x, inverted (from white-on-black to this), thresholded, and passed to Tesseract. I have also tried various Gaussian and median blurs, but those just seem to make other strings fail more often. I have tried most of the PSM options that make sense, and I have passed an allow list containing only numerals, $, comma, and decimal point. I've tried all the different interpolation methods OpenCV has to offer. Tesseract consistently chokes on this value.

It's a little frustrating because the only OCR I've found that works with this value is an A9T9 model (I think) through the free API at ocr.space ( https://ocr.space/ocrapi#ocrengine2 ). Unfortunately there doesn't appear to be a way for me to run that locally, and the string seems like it should be simple for OCR to read.

Any advice on poking Tesseract in the right way to read this, or some fancy filtering I could do to make the image clearer for it?

Thanks!
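For reference, the pipeline described above looks roughly like the sketch below. It is not the exact code: the function names are made up for this post, and the cubic interpolation, Otsu threshold, and PSM 7 are just one combination among those tried. It assumes the `opencv-python` and `pytesseract` packages and a local Tesseract install.

```python
def build_tesseract_config(psm=7, whitelist="0123456789$,."):
    """Build the option string passed to Tesseract.

    psm=7 (treat the image as a single text line) is an assumed choice;
    tessedit_char_whitelist restricts output to the listed characters.
    """
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def preprocess(cell_bgr, scale=10):
    """Upscale, invert, and threshold one grid cell cut from the screenshot."""
    import cv2  # imported here so the config helper works without OpenCV

    gray = cv2.cvtColor(cell_bgr, cv2.COLOR_BGR2GRAY)
    # Scale up 10x; INTER_CUBIC stands in for whichever interpolation is used.
    big = cv2.resize(gray, None, fx=scale, fy=scale,
                     interpolation=cv2.INTER_CUBIC)
    # White-on-black source becomes black-on-white.
    inverted = cv2.bitwise_not(big)
    # Otsu picks the threshold automatically (an assumption; a fixed
    # threshold value would go in place of the 0 without THRESH_OTSU).
    _, binary = cv2.threshold(inverted, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary


def read_value(cell_bgr):
    """Run Tesseract on one preprocessed cell and return the stripped text."""
    import pytesseract

    return pytesseract.image_to_string(
        preprocess(cell_bgr), config=build_tesseract_config()
    ).strip()
```

Each cell is a BGR NumPy array sliced out of the screenshot, so a call looks like `read_value(screenshot[y0:y1, x0:x1])`, where the slice bounds are whatever the grid layout dictates.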