On Thu, 17 Jul 2008, Antoine Jacoutot wrote:
> On Wed, 16 Jul 2008, jared r r spiegel wrote:
>
> > cranky old OCR engine that apparently sucks less than most
> > other ones out there ??. friend of mine asked for it in response
> > to seeing something on groklaw where they used it with image-based PDFs
> > and xpdf or something to snarf the text out of them
> >
> > without the stuff in ${SUPDISTFILES}, the user has to train the
> > OCR engine, which is reasonably documented on their wiki but
> > also looks laborious and annoying if you don't otherwise need
> > that level of accuracy, hence grabbing the SUPDISTFILE stuff.
> >
> > apache license 2.0
>
> I'll take care of that.
> Thanks for your submission.
Ok... I finally got some time to look into this.
I reworked your port so that the language files are in corresponding
subpackages. I also tweaked the patches a bit, changed DESCR, and now
provide docs and samples...
I assume you still want to maintain this, right?

Anyway, it seems to work fine on i386 (for those who don't have a
scanner, two sample files are provided for testing).

Comments/OK?

-- 
Antoine
tesseract.tar.gz