Re: [CODE4LIB] Free/Open OCR solutions?

Francis Kayiwa Wed, 28 Jul 2010 09:03:03 -0700

On 7/28/10 10:46 AM, Andy Kelly wrote:

I'm working on scanning some documents in a collection and then performing
OCR on the documents. Thus far, I've used Adobe Acrobat Pro's OCR function
with some success, but the machines I'm working on are fairly old Pentium 4
Dell boxes. This makes opening 600 DPI scans painful and performing OCR an
entirely valid excuse for a long coffee break.


As you might expect, I'm looking for a way to speed up this process at the
OCR end of things, since the scanning can only move so quickly. I'm
wondering if any of you have experience with any open OCR solutions such as:
Tesseract-OCR<http://code.google.com/p/tesseract-ocr/>  or
ocropus<http://code.google.com/p/ocropus/>.
At a glance, Tesseract seems to be further along in development. Any other
suggestions on how best to approach this sort of task would be appreciated
if you've done similar work.


I've used Tesseract quite a bit but moved on to

https://launchpad.net/cuneiform-linux

Cuneiform is OCR software that used to be a top-rated Russian OCR product and has since been open-sourced. It is especially useful for non-English languages.
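
For batch work on a server, either engine can be driven from a small script so a whole directory of scans runs unattended. What follows is only a rough sketch, not a tested recipe. It assumes the `tesseract` and `cuneiform` binaries are on the PATH, that the scans are TIFF files, and that the installed builds accept the options shown (`tesseract IMAGE OUTBASE -l LANG` and `cuneiform -l LANG -f text -o OUT IMAGE`). The function names, directory names, and the `*.tif` pattern are made up for illustration. Check each tool's help output before relying on it, and note that some Cuneiform builds may only read certain image formats.

```python
import subprocess
from pathlib import Path


def ocr_command(image, out_dir, engine="tesseract", lang="eng"):
    """Build the command line for one page (hypothetical helper)."""
    image = Path(image)
    out_dir = Path(out_dir)
    if engine == "tesseract":
        # tesseract takes an output base name and appends ".txt" itself
        return ["tesseract", str(image), str(out_dir / image.stem), "-l", lang]
    if engine == "cuneiform":
        # cuneiform takes the full output file name via -o
        out_file = out_dir / (image.stem + ".txt")
        return ["cuneiform", "-l", lang, "-f", "text", "-o", str(out_file), str(image)]
    raise ValueError("unknown engine: %s" % engine)


def ocr_directory(scan_dir, out_dir, engine="tesseract", lang="eng", pattern="*.tif"):
    """Run the chosen engine over every matching scan; return the pages that failed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for image in sorted(Path(scan_dir).glob(pattern)):
        result = subprocess.run(ocr_command(image, out_dir, engine, lang))
        if result.returncode != 0:
            failed.append(image)
    return failed
```

Calling `ocr_directory("scans", "text", engine="cuneiform")` would then write one text file per scan into `text/` and return the list of pages the engine could not process, so they can be retried or checked by hand.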


./fxk


I've got my own Ubuntu Server I'm planning on evaluating one or both of
these on, as much for my own interest as the project's or the
organization's. Since I'm an unpaid part-time intern and the only one who's
working on this project, I'm willing to learn to do things the hard way so
they're easier in the long run.

Thanks for any suggestions or advice you may be able to offer.





--
Maintainer's Motto:
        If we can't fix it, it ain't broke.
