UCLA developed a very good scanning OCR solution, but I don't think it was pure FOSS. I will ask.
Joseph

Tim Churches wrote:
> Tim Churches wrote:
>> Karsten Hilbert wrote:
>>> Well, the path of least resistance here is to scan it and
>>> use it as a background image in some text editor or other so
>>> that what you type appears to be written into the fields
>>> while it is (technically) written on top of the background
>>> image. We then save the result as any other old document
>>> tied into the medical record.
>> No, we need the data in computable form for epidemiological (aggregate)
>> analysis - images of numbers and characters must be converted to ASCII or
>> Unicode bytes. There is a commercial product, Teleform, which does this
>> reasonably well - see
>> http://www.cardiff.com/products/teleform/index.html - and we may just
>> provide an interface which can load data which has been scanned off
>> hand-written forms using that, but gee, an open source solution would be
>> nice. Suggestions very welcome.
>
> A few months ago Google released Tesseract OCR, an OCR engine developed
> in the 1990s by Hewlett-Packard. Apparently it was state-of-the-art in
> 1995, but that's over a decade ago, and it has not been developed since.
> There don't seem to be any other open source OCR engines around that are
> being actively developed or which are anything more than demos or
> proofs-of-concept. And Teleform seems to have the OCR-from-paper-forms
> market almost to themselves. I think we'll have to build a batch input
> interface that Teleform can be plugged into - I think it exports to XML,
> or at the very least CSV files.
>
> But if anyone can suggest an alternative for turning data recorded on
> paper forms into data (as opposed to raster image) files, we'd love to
> hear of it.
>
> Tim C
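For what it's worth, the batch input interface Tim describes could start out very small if the forms tool does export CSV. Below is a rough sketch in Python, assuming a CSV export with one row per scanned form. The column names (form_id, age, sex), the sample data and the function names load_batch and count_by_sex are all hypothetical; I haven't checked what Teleform actually exports. Rows with missing fields or a non-numeric age (e.g. the letter O read in place of a zero) are set aside for manual checking rather than imported, and the accepted rows can then be aggregated, which is the "computable form" an image can't give you.

```python
import csv
import io
from collections import Counter

# HYPOTHETICAL column layout: check the real export format of
# whatever forms-OCR product is used before relying on this.
REQUIRED = ("form_id", "age", "sex")


def load_batch(text):
    """Parse one CSV batch into (accepted, rejected) lists of rows.

    Rows with an empty required field or a non-integer age (a typical
    OCR misread) go to the rejected list for manual verification.
    """
    accepted, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        # Short rows get None for missing columns, so guard with `or ""`.
        if any(not (row.get(k) or "").strip() for k in REQUIRED):
            rejected.append(row)
            continue
        try:
            age = int(row["age"])  # int() tolerates surrounding spaces
        except ValueError:
            rejected.append(row)
            continue
        accepted.append({"form_id": row["form_id"].strip(),
                         "age": age,
                         "sex": row["sex"].strip().upper()})
    return accepted, rejected


def count_by_sex(records):
    """A trivial aggregate over the accepted records."""
    return Counter(r["sex"] for r in records)


if __name__ == "__main__":
    sample = "form_id,age,sex\nF001,34,m\nF002,4O,f\nF003,51,F\n"
    ok, bad = load_batch(sample)
    print(len(ok), len(bad))  # prints: 2 1
    print(count_by_sex(ok))
```

Keeping the rejects instead of guessing a correction means the database only ever holds values a human or the OCR step was sure of, which matters more for epidemiological counts than getting every form in automatically.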