On 03-10, Ray Olszewski wrote:
> At 08:59 AM 3/10/2004 -0500, Hal MacArgle wrote:
> [...]
> >> 1. Run a program that will scan to an image file, then a separate program
> >> that will do OCR on the scanned image.
> >
> >         Done - scan to PBM (P4) then run Gocr to get a bit mapped
> >file.. Trouble is; even using resolution 360, slow and a big PBM
> >file, the final copy is about 90% accurate and the original format of
> >the letter compromised, losing paragraphs, offsets, indents, etc...
> >Taking the time to do all this plus fix up the immediate above would
> >take almost as long as manually re-typing the page..
>
> Yes, this is always the problem with OCR. Back when I did a lot of it
> (about 12-15 years ago, on a Macintosh), OCR packages included the "brag"
> that they were 99% accurate. I, like any serious user, had no trouble
> translating "99% accurate" to "an error every 3 lines of text" and was
> unimpressed. For everyday use, OCR needs about "four 9s" of accuracy,
> translating to one error every few pages of text.
>
> If your 90% estimate is correct, it translates to (on average) several
> errors per line of text, making the process close to worthless for you.
>
> Of course, any OCR package is better on some images than others. I don't
> know what your source pages look like. For example, serif fonts (e.g.,
> Times Roman, Century Schoolbook, Palatino) are generally easier to OCR well
> than sans-serif fonts (e.g., Arial, anything with "sans" in the name). Fresh
> printouts are better than third-generation Xeroxes. And so on.
>
> I expect that you are at the point where you need help from someone with
> real and current expertise in OCR work, preferably on Linux. That's not me,
> and from the surrounding silence, I suspect it is not to be found on this
> list.
>

        Greetings: And your detailed comments are most valuable, as
usual.. It stands to reason the fonts must "match" the design.. I just
looked at my cheque book and the "crazy" numeral "style."
I read further that OCR packages, even the pricey ones, have a real
problem when the text switches to italics after a long run of "normal."
The banking system had better "match," eh??
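
For anyone who wants to check the accuracy arithmetic quoted above, here
is a rough sketch in Python. It assumes the percentages are per-character
accuracy and that a typed page has about 70 characters per line and 45
lines per page; those two figures and the function names are my own
assumptions, not anything from Ray's message, and the results shift with
them (a shorter line stretches "an error every N lines" accordingly).

```python
# Rough OCR error-rate arithmetic. All layout figures are assumptions.
CHARS_PER_LINE = 70   # assumed typical typed line
LINES_PER_PAGE = 45   # assumed typical typed page


def errors_per_line(accuracy, chars_per_line=CHARS_PER_LINE):
    """Average number of wrong characters on one line of text."""
    return (1.0 - accuracy) * chars_per_line


def chars_between_errors(accuracy):
    """Average number of characters read for each wrong character."""
    return 1.0 / (1.0 - accuracy)


def pages_between_errors(accuracy,
                         chars_per_line=CHARS_PER_LINE,
                         lines_per_page=LINES_PER_PAGE):
    """Average number of pages read for each wrong character."""
    return chars_between_errors(accuracy) / (chars_per_line * lines_per_page)


if __name__ == "__main__":
    for accuracy in (0.90, 0.99, 0.9999):
        print("accuracy %.4f: %.2f errors/line, one error per %.2f pages"
              % (accuracy,
                 errors_per_line(accuracy),
                 pages_between_errors(accuracy)))
```

Under these assumptions, 90% works out to about seven errors per line
("several errors per line"), and "four 9s" to roughly one error every
three pages ("every few pages").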

> >         I fetched Clara but could only find a .rpm file, no tarball
> >for Slackware, etc.. Slack has a rpm program but the
> >dependencies needed to extract looked like they were mostly Red Hat's
> >filenames..
>
> Well ... a source .tgz can be downloaded from a link on this page --
> http://www.claraocr.org/
>
> (Thank you, Google).
>

        I, of course, used google/linux but entered ocr and got the
"wrong" Clara site.. Thanks. I will try the tarball but am pessimistic,
of course.. Methinks Patrick at Slackware doesn't include ocr in his
packages for good reason...

        Appreciate!!

Hal - in Terra Alta, WV - Slackware GNU/Linux 9.0 (2.4.20)
Utrum Per Hebdomadem Perveniam