Lucene to index OCR text

Renaud Waldura Thu, 24 Jan 2008 17:43:46 -0800

I've been poking around the list archives and didn't really come across
anything interesting. Is anyone using Lucene to index OCR text? Any
strategies/algorithms/packages you recommend?
 
I have a large collection (10^7 docs) that's mostly the result of OCR. We
index/search/etc. with Lucene without any trouble, but OCR errors are a
problem, particularly for exact phrase matches. I'm looking for ideas on
how to deal with this thorny problem.
 
--
Renaud Waldura
Applications Group Manager
Library and Center for Knowledge Management
University of California, San Francisco
(415) 502-6660
