I've been poking around the list archives and didn't come across anything
relevant. Is anyone using Lucene to index OCR text? Any
strategies/algorithms/packages you'd recommend?
I have a large collection (10^7 docs) that's mostly the result of OCR. We
index/search/etc. with Lucene without any trouble, but OCR errors are a
problem, particularly for exact phrase matches. I'm looking for
ideas on how to deal with this thorny problem.
Renaud Waldura
Applications Group Manager
Library and Center for Knowledge Management
University of California, San Francisco
(415) 502-6660

