I am trying to extract data from a large number of images using pytesser, but I am not getting the results I want. Running recognition on the entire image gave very inconsistent output, so I used PIL to crop out the sections I am interested in and ran the OCR on each crop separately. The crops that still fail are the date and time block and the numerical data. I have tried ImageEnhance and ImageFilter, but I have not been able to improve the accuracy. Does anyone have experience making the text in an image easier for an OCR engine to detect?
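A minimal sketch of the kind of preprocessing in question, assuming a current Pillow install (`from PIL import ...`) rather than the old top-level `ImageEnhance` module. The helper name `prepare_for_ocr`, the scale factor, the contrast factor and the threshold are illustrative assumptions, not tuned values, and there is no guarantee they help on these particular images.

```python
from PIL import Image, ImageEnhance, ImageFilter


def prepare_for_ocr(region, scale=3, contrast=2.0, threshold=128):
    """Return an enlarged, high-contrast, two-level copy of a cropped region.

    All parameter defaults are placeholders for experimentation.
    """
    # Work in 8-bit grayscale so colour noise does not confuse the OCR.
    gray = region.convert('L')

    # Small crops (for example 109x15 pixels) give the OCR very little to
    # work with, so enlarge the region with bicubic interpolation first.
    width, height = gray.size
    big = gray.resize((width * scale, height * scale), Image.BICUBIC)

    # Stretch the contrast and sharpen the glyph edges.
    big = ImageEnhance.Contrast(big).enhance(contrast)
    big = big.filter(ImageFilter.SHARPEN)

    # Hard threshold: every pixel becomes either black (0) or white (255).
    return big.point(lambda p: 255 if p > threshold else 0)
```

The result could then be handed to the OCR call in place of the raw crop, e.g. `image_to_string(prepare_for_ocr(im1))`, trying a few different scale and threshold values per region.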
import ImageEnhance, ImageFilter
from pytesser import *

im = Image.open('C:\\Users\\bryan\\Desktop\\10-28-08.bmp')
im1 = im.crop([156,105,265,120])  # Date/Time
im1 = im1.convert('RGB')
im2 = im.crop([380,815,430,833])  # Couch Vrt
im2 = im2.convert('RGB')
text1, text2 = image_to_string(im1), image_to_string(im2)

>>> text1
'\n'
>>> text2
'HIS\n\n'

-- 
"The game of science can accurately be described as a never-ending insult to human intelligence." - João Magueijo
<<attachment: DateTime.bmp>>
<<attachment: CouchVrt.bmp>>
_______________________________________________
Image-SIG maillist - Image-SIG@python.org
http://mail.python.org/mailman/listinfo/image-sig