On 01/04/2012 08:06 AM, Ralf Stephan wrote:
> Regarding a subproblem that nevertheless makes books
> unreadable:
>
> Are language-other-than-English (LOTE) books OCR'd
> with English set? Could they be re-OCR'd when the language
> option is changed? Can a user trigger re-OCR?

A few years back (2008?) the Internet Archive switched to
ABBYY FineReader, where the language is set when a user
uploads scanned images. Since then, OCR quality has been quite good.

I scanned and uploaded this book in November 2010,
http://www.archive.org/details/Ned_med_vapnen
as you can tell from the dates in the "HTTP" file menu.
The OCR produced by "ABBYY FineReader 8.0" has very few
errors, despite the old Swedish spelling.

Another Swedish book, from January 2008, does not say
"OCR: ABBYY", and its OCR quality seems slightly worse to my eye,
http://www.archive.org/details/konstanteckninga00brun

One problem arises if older scans were OCRed with older
software and gave worse results. Should one go back and run
a new OCR on these? Perpetually, every 5 years?

-- 
  Lars Aronsson (l...@aronsson.se)
  Aronsson Datateknik - http://aronsson.se

  Project Runeberg - free Nordic literature - http://runeberg.org/