I am trying to extract data from a large number of images using pytesser, but I am not getting the results I want. Recognizing the entire image at once gave inconsistent output, so I used PIL to crop out the sections I am interested in and ran OCR on each crop separately. The problem is with the date/time block and the numerical data. I have tried ImageEnhance and ImageFilter, but I have not been able to increase the accuracy. Does anyone have experience making the text in an image easier for an OCR engine to detect?
import Image, ImageEnhance, ImageFilter
from pytesser import *

im = Image.open('C:\\Users\\bryan\\Desktop\\10-28-08.bmp')

# Crop boxes are (left, upper, right, lower) in pixels.
im1 = im.crop((156, 105, 265, 120)).convert('RGB')  # Date/Time
im2 = im.crop((380, 815, 430, 833)).convert('RGB')  # Couch Vrt

text1, text2 = image_to_string(im1), image_to_string(im2)
>>> text1
'\n'
>>> text2
'HIS\n\n'
>>>
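For reference, here is a minimal sketch of the kind of preprocessing commonly tried on small crops like these: convert to grayscale, upscale, stretch the contrast, sharpen, then binarize. It assumes a current Pillow install (hence `from PIL import ...` rather than the bare `import Image` used above). The helper name `preprocess_for_ocr`, the scale factor of 3, and the threshold of 128 are illustrative assumptions, not values taken from my script, and would need tuning per image.

```python
from PIL import Image, ImageFilter, ImageOps


def preprocess_for_ocr(img, scale=3, threshold=128):
    """Return an upscaled, high-contrast, two-level copy of img.

    scale and threshold are illustrative defaults (assumptions),
    not values known to work for any particular image.
    """
    # Work in 8-bit grayscale so the later steps see one channel.
    gray = img.convert("L")
    # Enlarge the crop; very small glyphs tend to be hard for OCR.
    big = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)
    # Stretch the histogram so text and background are further apart.
    big = ImageOps.autocontrast(big)
    # Mild sharpening to firm up the edges blurred by resampling.
    big = big.filter(ImageFilter.SHARPEN)
    # Binarize: every pixel becomes pure black or pure white.
    return big.point(lambda p: 255 if p > threshold else 0)
```

The date/time crop above is only 109 x 15 pixels, so the upscaling step is probably the one that matters most. The result could then be passed to the OCR call in place of the raw crop, e.g. `image_to_string(preprocess_for_ocr(im1))`, assuming pytesser accepts the grayscale image it returns.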
--
"The game of science can accurately be described as a never-ending
insult to human intelligence." - João Magueijo
<<attachment: DateTime.bmp>>
<<attachment: CouchVrt.bmp>>
