That is brilliant!

On Jan 25, 2008 6:12 AM, mark harwood <[EMAIL PROTECTED]> wrote:
> Probably not a practical solution for you to set up, but I love this idea:
> http://blog.wired.com/monkeybites/2007/05/recaptcha_fight.html
>
> ----- Original Message ----
> From: Renaud Waldura <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Friday, 25 January, 2008 1:43:06 AM
> Subject: Lucene to index OCR text
>
> I've been poking around the list archives and didn't really come up
> against anything interesting. Anyone using Lucene to index OCR text?
> Any strategies/algorithms/packages you recommend?
>
> I have a large collection (10^7 docs) that's mostly the result of OCR.
> We index/search/etc. with Lucene without any trouble, but OCR errors
> are a problem, particularly when doing exact phrase matches. I'm
> looking for ideas on how to deal with this thorny problem.
>
> --
> Renaud Waldura
> Applications Group Manager
> Library and Center for Knowledge Management
> University of California, San Francisco
> (415) 502-6660