Karsten Hilbert wrote:
> On Tue, Feb 20, 2007 at 07:25:28AM +1100, Tim Churches wrote:
> 
>>>> No, we need the data in computable form
>>> OK, that kills the easy solution. Or it might not. If you
>>> don't blend both sources of information (background image
>>> and user input) but rather keep them separate and blend on
>>> display/printing you'd still have the computable user input.
>>> The drawback is that it lacks any metadata (apart from which
>>> form it belongs to) as all the metadata would be encoded in
>>> the *location* of what the user typed. Which in itself just
>>> *might* lend itself to an OCR-like solution where a mask
>>> image is overlaid onto the data thereby adding metadata to
>>> it.
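
That "mask" could be as lightweight as a per-form map of field names to
pixel regions, which is then used to slice up the user-input layer. A very
rough sketch only -- the field names and coordinates below are invented,
and each legacy form would need its own map:

    from PIL import Image

    # Map form-field names to pixel regions on the page, worked out once
    # per legacy form. Boxes are (left, upper, right, lower) in pixels.
    FIELD_REGIONS = {
        "surname":       (120,  80, 480, 110),
        "date_of_birth": (120, 130, 320, 160),
    }

    def label_user_input(user_layer_path):
        """Cut the user-input layer into per-field snippets keyed by field name."""
        page = Image.open(user_layer_path)
        return {name: page.crop(box) for name, box in FIELD_REGIONS.items()}
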
>> Hmm, that's a nice idea. It would be interesting to use PIL (the Python
>> Imaging Library) to do the form subtraction that you mention, leaving just
>> the handwritten entries, and then present that to Tesseract OCR (which
>> is in C and could be wrapped as a Python library, I'm sure), and see how
>> it performs.
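
For what it's worth, that subtraction-plus-OCR step might look roughly like
this with PIL and the tesseract command-line tool (the file names are made
up, and it assumes the blank form and the filled-in scan are the same size
and reasonably well aligned):

    import subprocess
    from PIL import Image, ImageChops

    blank  = Image.open("blank_form.png").convert("L")
    filled = Image.open("filled_form.png").convert("L")

    # Pixels that match the blank form go to black; only the handwriting
    # (or typing) survives as non-zero pixels.
    user_input = ImageChops.difference(blank, filled)

    # Tesseract prefers dark text on a light background, so invert first.
    ImageChops.invert(user_input).save("user_input_only.png")

    # Call the tesseract binary rather than wrapping the C library;
    # it writes its result to user_input_only.txt.
    subprocess.call(["tesseract", "user_input_only.png", "user_input_only"])
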
> Wait, the initial idea was slightly different still:
> 
> 1) scan the legacy paper form
> 2) put it into an OOo document as a background image
> 3) define text areas in OOo to have users type data into them
> 4) later on read the data from the text entry areas in the OOo document
> 
> The data retrieved from step 4 will be computable data! Not
> particularly well constrained, but not just image data either.
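
Step 4 could perhaps be as simple as unzipping the document and pulling the
values out of content.xml -- a rough sketch, assuming the entry areas are
OOo "input fields" (text:text-input elements in ODF) and with an invented
file name:

    import zipfile
    import xml.etree.ElementTree as ET

    TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

    def read_form_values(odt_path):
        """Return (field description, typed-in value) pairs from an ODT form."""
        with zipfile.ZipFile(odt_path) as odt:
            content = ET.fromstring(odt.read("content.xml"))
        fields = content.iter("{%s}text-input" % TEXT_NS)
        return [(f.get("{%s}description" % TEXT_NS), f.text) for f in fields]

    print(read_form_values("legacy_form.odt"))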

Ah, OK. Our problem is that many users only want to record data with a
pen, on paper. No typing, no computers. And then scan the paper forms in
and have their data appear, automagically, in the database just as if
they had typed it. Nearly every user over the age of 50 asks for that.
Mind you, these are mobile users, and paper forms and a pen are highly
portable, robust and never need to be plugged in, so they do have a bit
of a valid case.

Tim C
