On February 2, Walter Lewis wrote:

> The "good" news from the perspective of searching is that a
> reasonable percentage of those errors will affect terms that are
> either rarely used in searching or are repeated correctly in the
> vicinity.

This is why OCR should be done by a search engine company (such as Google), which has statistics on what real people actually search for and can improve the OCR process as it goes. Software vendors such as ABBYY or the makers of OmniPage never get that kind of feedback from actual users. They cover only a fraction of the entire feedback loop. None of my experience from scanning old Swedish and Danish books with ABBYY FineReader ever got back to ABBYY; they never asked for any of that feedback.

I have no idea to what degree Google Book Search does this right, but by controlling the entire scan-search loop they have one excuse less to fail.

-- 
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se