I am investigating aspell for use on a large set of scanned pages with
text that was generated through OCR.

I searched through the mailing list archive and found
  http://lists.gnu.org/archive/html/aspell-user/2002-07/msg00003.html
wherein Kevin Atkinson explains that aspell was not designed for
OCR-type errors.

Nevertheless, I chose to proceed a bit ... primarily because I was
unable to find anything open source that was better. Unfortunately I
did not get very far.

aspell seems to ignore any words with digits in them, and my OCR text
has plenty of digit/character confusion. I was unable to find any
options to control behavior with digits.
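
To illustrate the kind of token in question, here is a minimal Python
sketch (not part of aspell) that pulls out words mixing letters and
digits. The function name and the sample sentence are made up for
illustration, and the pattern assumes ASCII letters only.

```python
import re

# Matches a whole token that contains at least one ASCII letter AND at
# least one digit, e.g. "c0mpany". Pure numbers such as "1999" and pure
# words such as "The" are not matched.
MIXED = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")

def mixed_tokens(text):
    """Return the tokens in text that mix letters and digits (hypothetical helper)."""
    return MIXED.findall(text)

if __name__ == "__main__":
    sample = "The c0mpany sold 100 un1ts in 1999"
    print(mixed_tokens(sample))
```

These are the words I would want a checker to flag as misspelled rather
than skip.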

Searching the mailing list again I found
  http://lists.gnu.org/archive/html/aspell-user/2006-08/msg00013.html
wherein Thomas Güttler suggested modifying the cset table so that
additional characters could be treated as word characters. I tried
copying the .cset file, modifying it to turn the Digits into Letters,
and specifying my cset using --encoding on the command line. However,
the behavior did not change ... words with digits in them were still
ignored and did not show up with --list.
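
One possible workaround outside aspell, sketched here in Python under
stated assumptions: replace each digit with the letters it is commonly
confused with and check the resulting candidates against a word list.
The confusion table below is an illustrative guess, not something taken
from aspell or from my OCR output, and the function name is
hypothetical.

```python
from itertools import product

# Assumed digit -> letter confusions; adjust to match the OCR engine.
CONFUSIONS = {"0": "oO", "1": "liI", "5": "sS", "8": "B"}

def candidates(token):
    """Return every spelling obtained by swapping mapped digits for letters.

    Characters without an entry in CONFUSIONS are kept unchanged.
    """
    options = (CONFUSIONS.get(ch, ch) for ch in token)
    return ["".join(chars) for chars in product(*options)]

if __name__ == "__main__":
    print(candidates("c0mpany"))
    print(candidates("un1ts"))
```

The candidates could then be run through a spell checker to see whether
any of them is a known word. The number of candidates grows quickly for
tokens with several digits, so this only suits short words.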

Any comments/suggestions/advice appreciated.


Michael


_______________________________________________
Aspell-user mailing list
Aspell-user@gnu.org
http://lists.gnu.org/mailman/listinfo/aspell-user
