I am investigating aspell for use on a large set of scanned pages with text that was generated through OCR.
I searched through the mailing list archive and found http://lists.gnu.org/archive/html/aspell-user/2002-07/msg00003.html wherein Kevin Atkinson explains that aspell was not designed for OCR-type errors. Nevertheless, I chose to proceed a bit, primarily because I was unable to find anything open source that was better.

Unfortunately I did not get very far. aspell seems to ignore any words with digits in them, and my OCR text has plenty of digit/character confusion. I was unable to find any options to control behavior with digits.

Searching the mailing list again I found http://lists.gnu.org/archive/html/aspell-user/2006-08/msg00013.html wherein Thomas Güttler suggested modifying the cset table so that additional characters could be treated as word characters. I tried copying the .cset file, modifying it to turn the Digits into Letters, and specifying my cset using --encoding on the command line. However, the behavior did not change: words with digits in them were still ignored and did not show up with --list.

Any comments/suggestions/advice appreciated.

Michael
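
To make the digit/character confusion concrete, here is a minimal Python sketch of a pre-processing pass that could run before the spell checker. It is a possible workaround, not an aspell feature: the look-alike table `DIGIT_TO_LETTER` and the function names `mixed_tokens` and `letter_candidate` are assumptions made up for illustration, and the table would need tuning to the confusions a particular OCR engine actually produces.

```python
import re

# Hypothetical look-alike table (an assumption, not taken from aspell);
# adjust it to the digit/letter confusions seen in the OCR output.
DIGIT_TO_LETTER = {"0": "o", "1": "l", "5": "s", "8": "b"}

# A token is a run of ASCII letters and digits (kept simple for brevity).
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def mixed_tokens(text):
    """Return tokens containing at least one letter and at least one digit."""
    return [
        tok for tok in TOKEN_RE.findall(text)
        if any(c.isdigit() for c in tok) and any(c.isalpha() for c in tok)
    ]


def letter_candidate(token):
    """Replace each digit that has a look-alike letter; keep everything else."""
    return "".join(DIGIT_TO_LETTER.get(c, c) for c in token)


if __name__ == "__main__":
    sample = "he11o w0rld, printed in 1984"
    for tok in mixed_tokens(sample):
        print(tok, "->", letter_candidate(tok))
```

Tokens made only of digits, such as a year, are left alone. The digit-free candidates contain only letters, so they could then be handed to the spell checker as ordinary words.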