As this thread seems to be at an end, I'll just tidy up a couple of loose ends.

1. My problem 1, the treatment of hyphens during tokenization.  Carlo's 
suggestion of two-pass checking, first with the hyphen as a letter and then 
with the hyphen as a punctuation mark, is interesting.  But won't the first 
pass object to all the productive compounds like "half-moon", potentially 
infinite in number, which will not be in the dictionary?  It may still be 
workable if the input to the second pass is not the whole text over again but 
only the list of words rejected in the first pass.  Interactive use of the 
checker, however, seems to be ruled out unless both checks can be done in the 
same pass, as the MS spell checker does.
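
To make the "same pass" idea concrete, here is a minimal sketch in Python.  It 
is not how aspell or the MS checker works internally; the word list, the 
tokenizing pattern and the function names are all made up for illustration.  
Each token is first looked up with the hyphen treated as a letter, and only if 
that fails are its hyphen-separated parts looked up individually.

```python
import re

# Toy word list, purely illustrative; a real checker would consult its dictionary.
DICTIONARY = {"half", "moon", "well-being", "the", "rose"}

# Hyphen treated as a letter: a token is a run of letters with internal hyphens.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def check_token(token, dictionary=DICTIONARY):
    """Return the parts of `token` that are rejected (empty list if accepted)."""
    lowered = token.lower()
    # First check: the whole token, hyphen counted as a letter.
    if lowered in dictionary:
        return []
    parts = lowered.split("-")
    if len(parts) == 1:
        # No hyphen to fall back on, so the word itself is rejected.
        return [lowered]
    # Second check: hyphen counted as punctuation, each part looked up alone.
    return [part for part in parts if part not in dictionary]


def check_text(text, dictionary=DICTIONARY):
    """Check every token of `text` in one pass and collect the rejected parts."""
    rejected = []
    for match in TOKEN_RE.finditer(text):
        rejected.extend(check_token(match.group(), dictionary))
    return rejected
```

With this toy list, "half-moon" is accepted through its parts although the 
compound itself is not listed, "well-being" is accepted as a whole although 
"being" is not listed, and "half-mooon" is rejected on the part "mooon" only.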

2. Kevin referred to the file 
http://aspell.net/man-html/Words-With-Symbols-in-Them.html, from which I quote:
"The case where the symbol can appear at the beginning or end of the word is 
more difficult to deal with. The symbol may or may not actually be part of the 
word. Aspell currently handles this case by first trying to spell check the 
word with the symbol and if that fails, try it without."
I cannot reconcile this with what I observed in my problem 2, where aspell 
appeared to check only the word without the symbol (the apostrophe).  I found 
that "'twas" was rejected when the dictionary contained "'twas" but not "twas", 
and accepted when the dictionary contained "twas" but not "'twas".
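
The discrepancy can be stated as two small lookup rules.  The Python below is 
only a model of the two behaviours, not aspell's code; the function names and 
the one-word dictionaries are hypothetical.

```python
def check_documented(word, dictionary, symbol="'"):
    """Rule as quoted from the manual: try with the symbol, then without."""
    if word in dictionary:
        return True
    return word.strip(symbol) in dictionary


def check_observed(word, dictionary, symbol="'"):
    """Rule that would match what I saw: only the stripped form is looked up."""
    return word.strip(symbol) in dictionary


# The two cases from my problem 2.
for dictionary in ({"'twas"}, {"twas"}):
    print(sorted(dictionary),
          check_documented("'twas", dictionary),
          check_observed("'twas", dictionary))
```

Under the quoted rule, 'twas would be accepted in both cases.  Under the 
stripped-only rule it is rejected in the first case and accepted in the 
second, which is what I observed.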

Ciarán Ó Duibhín.

 
_______________________________________________
Aspell-user mailing list
Aspell-user@gnu.org
https://lists.gnu.org/mailman/listinfo/aspell-user
