Re: [ol-discuss] Recording the quality of a book's OCR

Edward Betts Fri, 30 Dec 2011 00:05:28 -0800

On 2011-12-29 23:31, Ralf Stephan wrote:
>
> On Dec 30, 2011, at 7:12 AM, Janusz S. Bień wrote:
>> On Thu, 29 Dec 2011  Edward Betts<[email protected]>  wrote:
>>> As you point out the OCR doesn't properly handle blackletter type.
>>
>> There is a solution to it, but it is expensive:
>>
>>       http://www.frakturschrift.com/
>
> Tesseract is free and supports Fraktur ("broken") fonts for German,
> Swedish and Danish. The results are nearly as good as with ABBYY.
>
>>> A system for correcting OCR is often requested; conceptually it is quite
>>> simple.
>
> What about the interface of Distributed Proofreaders pgdp.net?
> It's written in PHP and provides a full editor.


Does it maintain scanned page image coordinates for corrected words?

A while back I built a prototype for correcting OCR errors in Internet 
Archive scanned books.

http://edwardbetts.com/correct

It shows one page at a time and lets you see each line of text both as an
image and as text. You can click on a word to correct it. The prototype is
very rough: it is ugly, incomplete, and contains bugs.
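
To make the coordinate question concrete, here is a minimal sketch of one
way a correction could keep a word tied to its place on the scanned page.
It is not the prototype's code: the Word record, the correct_word helper,
and the pixel box layout are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Word:
    """One OCR word plus its position on the page image (hypothetical layout)."""
    text: str
    box: Tuple[int, int, int, int]  # assumed (left, top, right, bottom) in pixels
    corrected: bool = False


def correct_word(line: List[Word], index: int, new_text: str) -> List[Word]:
    """Return a copy of the line with one word's text replaced.

    Only the text changes; the bounding box is carried over untouched, so
    the corrected word still maps to the same region of the scan.
    """
    fixed = replace(line[index], text=new_text, corrected=True)
    return line[:index] + [fixed] + line[index + 1:]


# Made-up example data: the OCR misread "The" as "Tbe".
ocr_line = [
    Word("Tbe", (120, 340, 188, 372)),
    Word("quick", (200, 340, 290, 372)),
]
fixed_line = correct_word(ocr_line, 0, "The")
```

Because the original line is left unmodified, the uncorrected OCR output
stays available for comparison or for undoing a bad correction.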

-- 
Edward.