Cranky old OCR engine that apparently sucks less than most of the other ones out there. A friend of mine asked for it after seeing something on Groklaw where they used it with image-based PDFs and xpdf (or something similar) to snarf the text out of them.
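The Groklaw-style workflow (pull the page images out of an image-based PDF with xpdf's `pdfimages`, then OCR each one) might look roughly like this. It's a hypothetical Python wrapper, not something shipped with the package. It assumes `pdfimages` and `tesseract` are on PATH, that each scanned page is a single embedded image, and that your tesseract reads PNM files. Older releases may only accept TIFF, in which case you'd need to convert the images first. `extract_cmd`, `ocr_cmd` and `pdf_to_text` are made-up names.

```python
import subprocess
import sys
from pathlib import Path


def extract_cmd(pdf: str, image_root: str) -> list[str]:
    # pdfimages dumps the embedded images as <root>-NNN.ppm / .pbm
    return ["pdfimages", pdf, image_root]


def ocr_cmd(image: Path) -> list[str]:
    # "tesseract <image> <outputbase>" writes the recognized text to <outputbase>.txt
    return ["tesseract", str(image), str(image.with_suffix(""))]


def pdf_to_text(pdf: str, workdir: str) -> str:
    # Hypothetical helper: assumes one embedded image per scanned page
    root = Path(workdir) / "page"
    subprocess.run(extract_cmd(pdf, str(root)), check=True)
    chunks = []
    # sorted() keeps the zero-padded order that pdfimages produced
    for image in sorted(root.parent.glob(root.name + "-*.p?m")):
        subprocess.run(ocr_cmd(image), check=True)
        chunks.append(image.with_suffix(".txt").read_text())
    return "\n".join(chunks)


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python pdf_ocr.py scanned.pdf /some/empty/dir
    print(pdf_to_text(sys.argv[1], sys.argv[2]))
```

Building the command lists separately from running them makes them easy to check or tweak, for example to add a TIFF conversion step for an old tesseract.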
Without the stuff in ${SUPDISTFILES}, the user has to train the OCR engine. That's reasonably well documented on their wiki, but it also looks laborious and annoying if you don't otherwise need that level of accuracy, hence grabbing the ${SUPDISTFILES} stuff. License is Apache 2.0. -- jared
tesseract-ocr.tgz
Description: application/tar-gz