On Tue, Feb 20, 2007 at 07:25:28AM +1100, Tim Churches wrote:

> >> No, we need the data in computable form
> > OK, that kills the easy solution. Or it might not. If you
> > don't blend both sources of information (background image
> > and user input) but rather keep them separate and blend on
> > display/printing, you'd still have the computable user input.
> > The drawback is that it lacks any metadata (apart from which
> > form it belongs to), as all the metadata would be encoded in
> > the *location* of what the user typed. That in itself just
> > *might* lend itself to an OCR-like solution where a mask
> > image is overlaid onto the data, thereby adding metadata to
> > it.
> 
> Hmm, that's a nice idea. It would be interesting to use PIL (the
> Python Imaging Library) to do the form subtraction you mention,
> leaving just the handwritten entries, and then present that to
> Tesseract OCR (which is in C and could be wrapped as a Python
> library, I'm sure), and see how it performs.
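
Just to illustrate the subtraction part: a rough sketch, where the
file names and field rectangles are invented and, lacking a proper
wrapper, the OCR step simply shells out to the tesseract binary:

import subprocess
from PIL import Image, ImageChops

# hypothetical scans: one of the blank form, one of the filled-in form
blank  = Image.open("blank_form.png").convert("L")
filled = Image.open("filled_form.png").convert("L")

# subtract the form: identical pixels go to 0, only handwriting survives
diff = ImageChops.difference(blank, filled)

# binarize and invert so the OCR engine sees black ink on white paper
entries = diff.point(lambda p: 255 if p < 32 else 0)

# the mask-overlay idea: known field locations turn pixels into named data
fields = {"surname": (100, 200, 400, 240),   # made-up coordinates
          "dob":     (100, 260, 400, 300)}

for name, box in fields.items():
    entries.crop(box).save("%s.tif" % name)
    # tesseract <image> <output base> writes <output base>.txt
    subprocess.call(["tesseract", "%s.tif" % name, name])
    print(name, open("%s.txt" % name).read().strip())

The per-field crop is what the mask overlay boils down to: the
*location* of each crop supplies the metadata.
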
Wait, the initial idea was slightly different still:

1) scan the legacy paper form
2) put it into an OOo document as a background image
3) define text areas in OOo to have users type data into them
4) later on read the data from the text entry areas in the OOo document

The data retrieved from step 4 will be computable data! Not
particularly well constrained, but not just image data either.
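
To make step 4 concrete, a minimal sketch, assuming the text areas
are plain OOo text input fields in an ODF .odt file (the file name
is made up; real form controls would live in the form: namespace
instead):

import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

def read_form_entries(odt_path):
    # an .odt file is a ZIP archive; the document body is content.xml
    with zipfile.ZipFile(odt_path) as z:
        root = ET.fromstring(z.read("content.xml"))
    # assumes the fill-in areas are text:text-input fields whose
    # label was put into the field's description
    entries = {}
    for field in root.iter("{%s}text-input" % TEXT_NS):
        label = field.get("{%s}description" % TEXT_NS) or "unnamed"
        entries[label] = field.text or ""
    return entries

print(read_form_entries("legacy_form.odt"))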

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346
