Probably not a practical solution for you to set up, but I love this idea:
http://blog.wired.com/monkeybites/2007/05/recaptcha_fight.html

----- Original Message ----
From: Renaud Waldura <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, 25 January, 2008 1:43:06 AM
Subject: Lucene to index OCR text

I've been poking around the list archives and didn't really come up against
anything interesting. Anyone using Lucene to index OCR text? Any
strategies/algorithms/packages you recommend?
 
I have a large collection (10^7 docs) that's mostly the result of OCR. We
index/search/etc. with Lucene without any trouble, but OCR errors are a
problem, particularly when doing exact phrase matches. I'm looking for
ideas on how to deal with this thorny problem.
 
--
Renaud Waldura
Applications Group Manager
Library and Center for Knowledge Management
University of California, San Francisco
(415) 502-6660