On 01/04/2012 08:06 AM, Ralf Stephan wrote:
> Regarding a subproblem that nevertheless makes books
> unreadable:
>
> Are language-other-than-English (LOTE) books OCR'd
> with English set? Could they be re-OCR'd when the language
> option is changed? Can a user trigger re-OCR?

A few years back (2008?) the Internet Archive switched to
ABBYY FineReader, where the language is set when a user
uploads scanned images. Since then, OCR quality has been
quite good.

I scanned and uploaded this book in November 2010,
http://www.archive.org/details/Ned_med_vapnen
as you can tell from the dates in the "HTTP" file menu.
The OCR produced by "ABBYY FineReader 8.0" has only
very few OCR errors, despite the old Swedish spelling.

Another Swedish book, from January 2008, does not say
"OCR: ABBYY", and its OCR quality seems slightly worse to my eye,
http://www.archive.org/details/konstanteckninga00brun

One problem is that older scans were OCRed with older
software, with worse results. Should one go back and
run a new OCR on these? Perpetually, every 5 years?


-- 
   Lars Aronsson (l...@aronsson.se)
   Aronsson Datateknik - http://aronsson.se

   Project Runeberg - free Nordic literature - http://runeberg.org/


_______________________________________________
Ol-discuss mailing list
Ol-discuss@archive.org
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
To unsubscribe from this mailing list, send email to 
ol-discuss-unsubscr...@archive.org
