
I am currently working with the Tesseract OCR engine, which is owned by Google
and released under the Apache 2.0 license.

The issue I am running into is text accuracy. 

The current process: convert the target text to black and the background to
white, maximize the contrast, then pass the image to the OCR.
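
A minimal sketch of that preprocessing step, for illustration only. It assumes
the image has already been reduced to grayscale (a list of rows of 0-255
values) and that the text is darker than the background. The threshold is
chosen with Otsu's method, which is a stand-in for whatever rule the real
pipeline uses; the function names are hypothetical, and the hand-off to
Tesseract is not shown.

```python
# Sketch only: pure-Python binarization of a grayscale image.
# Assumption: pixels is a list of rows, each a list of ints in 0..255,
# with dark text on a lighter background.

def otsu_threshold(pixels):
    """Pick the gray level that best separates dark pixels from light ones."""
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    dark_count = 0
    dark_sum = 0
    for level in range(256):
        dark_count += hist[level]
        dark_sum += level * hist[level]
        light_count = total - dark_count
        if dark_count == 0:
            continue  # no pixels at or below this level yet
        if light_count == 0:
            break     # every pixel is already in the dark class
        dark_mean = dark_sum / dark_count
        light_mean = (sum_all - dark_sum) / light_count
        # Between-class variance (up to a constant factor).
        variance = dark_count * light_count * (dark_mean - light_mean) ** 2
        if variance > best_variance:
            best_variance, best_level = variance, level
    return best_level


def binarize(pixels):
    """Force text to pure black (0) and background to pure white (255)."""
    threshold = otsu_threshold(pixels)
    return [[0 if value <= threshold else 255 for value in row]
            for row in pixels]


if __name__ == "__main__":
    sample = [[30, 40, 200],
              [210, 35, 220]]
    for row in binarize(sample):
        print(row)
```

The binarized rows would then be written out as an image file and handed to
the OCR.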

With documents from modern word processors this approach is accurate 98% of the
time. When trying to read commercial serial numbers or IDs, which can be very
compact, the result has the correct number of characters, but the characters
themselves are wrong.
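
One way to quantify that failure mode is to compare the OCR output against a
known-good value position by position. The sketch below is only an
illustration: the helper name and both sample strings are made up, not real
OCR results.

```python
# Sketch only: list the positions where OCR output differs from the
# expected text. Assumes both strings have the same length, which matches
# the "right count, wrong characters" symptom described above.

def char_mismatches(expected, actual):
    """Return (index, expected_char, actual_char) for each differing position."""
    return [(i, e, a)
            for i, (e, a) in enumerate(zip(expected, actual))
            if e != a]


if __name__ == "__main__":
    expected = "SN0O1I8B"  # hypothetical serial number
    actual = "SNOO1l8B"    # hypothetical OCR output of the same length
    print(len(expected) == len(actual))
    for index, want, got in char_mismatches(expected, actual):
        print(index, want, got)
```

Running this over a batch of known serials would show which character pairs
get confused most often.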

Has anyone worked with this system before who knows of a possible solution? I
am currently looking into ImageMagick.

gnhlug-discuss mailing list
