> On Dec 31, 2018, at 7:13 PM, dwight via cctalk <cctalk@classiccmp.org> wrote:
> 
> Fred is right: OCR is only worth it if the document is in perfect condition.
> I just finished getting an old 4004 listing working. I made only two mistakes
> in the 4K of code that were not due to the poor quality of the listing.
> Twice I put LDM instead of LD. LDM was the most commonly used.

I wouldn't put it quite so strongly.  OCR, even if not perfect, can help a lot.
You can often OCR, test-assemble, and proofread faster than you can retype,
especially since retyping also requires fixing typos and proofreading.  Many OCR
errors are caught by the assembler, though of course not all of them.  I've done
both in an ongoing software preservation project; my conclusion is still to use
OCR when it works "well enough".  A couple of errors per page is definitely "well
enough".

The program used matters.  I looked at Tesseract briefly, but its quality was
vastly inferior to that of commercial products on the examples I tried.  I now use
ABBYY FineReader, which handles a lot of line printer and typewriter material
quite well.

        paul

