Thanks, everyone, for your ideas and suggestions! Some had occurred to us
but were discarded because we feel our solution needs to be automated:
45 million pages is far too much for any human-driven effort.

I like Itamar's idea of doing "competing" OCR, and keeping the best
result. Unfortunately OCR software is far from cheap, and the cost of 2
different product licenses may be too high for the project.

I've also looked into Tesseract/OCRopus, but while the ideas are good,
they're not quite there yet.


> On Jan 25, 2008 6:12 AM, mark harwood <[EMAIL PROTECTED]> wrote:
>
>> Probably not a practical solution for you to set up but I love this
>> idea:
>>  http://blog.wired.com/monkeybites/2007/05/recaptcha_fight.html
>>