On 3/20/07, Geoffrey S. Mendelson <[EMAIL PROTECTED]> wrote:
On Tue, Mar 20, 2007 at 03:51:50PM +0200, Hetz Ben Hamo wrote:

> * can use Sane to scan a document
> * can save it to PDF
> * The PDF shouldn't be a dumb TIFF/JPG file page/collection, but a
> "real" PDF (so I can search/grep for words in the scanned doc)
> * Should have some basic hebrew OCR (optionally)
>
> Any suggestions?

Windows. Seriously, OCR is not a new technology; what makes some programs
better than others is the large library of fonts they know how to
handle. Commercial programs include lots of code to handle many fonts
of different types and sizes.


How about Tesseract?
http://code.google.com/p/tesseract-ocr/

It's English only, but it's been on the market for a long time (only
recently open sourced), so I would not expect it to have this kind of
font-handling problem.
Any war/success stories about it here?
