On 2011-12-29 23:31, Ralf Stephan wrote:
>
> On Dec 30, 2011, at 7:12 AM, Janusz S. Bień wrote:
>> On Thu, 29 Dec 2011 Edward Betts wrote:
>>> As you point out the OCR doesn't properly handle blackletter type.
>>
>> There is a solution to it, but it is expensive:
>>
>> http://www.frakturschrift.com/
>
> tesseract is free and has support for broken fonts in German,
> Swedish and Danish. The results are nearly as good as with ABBYY.
>
>>> A system for correcting OCR is often requested, conceptually it is quite
>>> simple.
>
> What about the interface of Distributed Proofreaders pgdp.net?
> It's written in PHP and provides a full editor.
Does it maintain scanned page image coordinates for corrected words?

A while back I built a prototype for correcting OCR errors in Internet Archive scanned books:

http://edwardbetts.com/correct

It shows one page at a time and displays each line of text both as an image and as OCR text. You can click on a word to correct it. The prototype is very rough: it is ugly, incomplete, and buggy.

--
Edward.

_______________________________________________
Ol-discuss mailing list
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
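The question about page image coordinates comes down to whether a corrected word stays linked to its position on the scan. The following is a minimal sketch of that idea. It is not code from Edward's prototype or from pgdp.net, and the `Word` class and the `word_at` / `correct_word` helpers are hypothetical. It assumes each OCR'd word carries a bounding box in page-image pixels, with (x0, y0) as the top-left and (x1, y1) as the bottom-right corner. A click is mapped to a word, and a correction replaces only the text while keeping the box.

```python
from dataclasses import dataclass, replace


@dataclass
class Word:
    """One OCR'd word plus its (hypothetical) bounding box on the page image, in pixels."""
    text: str
    x0: int  # left
    y0: int  # top
    x1: int  # right
    y1: int  # bottom

    def contains(self, x: int, y: int) -> bool:
        # True if the point (x, y) falls inside this word's box (edges inclusive).
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def word_at(words, x, y):
    """Return the index of the word under a click at (x, y), or None if no word is hit."""
    for i, w in enumerate(words):
        if w.contains(x, y):
            return i
    return None


def correct_word(words, index, new_text):
    """Replace a word's text in place while keeping its page-image coordinates."""
    words[index] = replace(words[index], text=new_text)
    return words[index]


# Usage: a two-word line where OCR misread "The" as "Tbe".
page = [Word("Tbe", 10, 20, 48, 40), Word("quick", 55, 20, 110, 40)]
i = word_at(page, 30, 30)  # the click lands inside the first word's box
correct_word(page, i, "The")
```

Keeping the box next to the text is what would let a corrected word still be highlighted on, or re-aligned with, the scanned image. A real tool would also have to handle corrections that split or merge words, which this sketch ignores.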
