cranky old OCR engine that apparently sucks less than most
  other ones out there.  a friend of mine asked for it after
  seeing something on Groklaw where they used it with image-based PDFs
  and xpdf (or something like it) to snarf the text out of them.

  without the stuff in ${SUPDISTFILES}, the user has to train the
  OCR engine themselves.  that's reasonably documented on their wiki,
  but it also looks laborious and annoying if you don't otherwise need
  that level of accuracy, hence grabbing the ${SUPDISTFILES} stuff.

  Apache License 2.0

-- 

  jared

Attachment: tesseract-ocr.tgz
Description: application/tar-gz
