Re: new: textproc/tesseract-ocr

Antoine Jacoutot Sun, 07 Sep 2008 06:01:27 -0700

On Thu, 17 Jul 2008, Antoine Jacoutot wrote:

> On Wed, 16 Jul 2008, jared r r spiegel wrote:
> 
> >   cranky old OCR engine that apparently sucks less than most
> >   other ones out there ??.  friend of mine asked for it in response
> >   to seeing something on groklaw where they used it with image-based PDFs
> >   and xpdf or something to snarf the text out of them
> > 
> >   without the stuff in ${SUPDISTFILES}, the user has to train the
> >   OCR engine, which is reasonably documented on their wiki but
> >   also looks laborious and annoying if you don't otherwise need
> >   that level of accuracy, hence grabbing the SUPDISTFILE stuff.
> > 
> >   apache license 2.0
> 
> 
> I'll take care of that.
> Thanks for your submission.


Ok... I finally got some time to look into this.
I reworked your port so that the language files are in corresponding
subpackages. I also tweaked the patches a bit, changed DESCR, and added
docs and samples.
I assume you still want to maintain this, right?

Anyway, it seems to work fine on i386.
(for those who don't have a scanner, two sample files are provided for
testing).

Comments/OK?

-- 
Antoine

tesseract.tar.gz
Description: Binary data
