As this thread seems to be at an end, I'll just tidy up a couple of loose ends.
1. My problem 1: the treatment of hyphens during tokenization. Carlo's suggestion of two-pass checking, the first pass treating the hyphen as a letter and the second treating it as a punctuation mark, is interesting. But won't the first pass object to all the productive compounds like "half-moon", potentially infinite in number, which will not be in the dictionary? It may still be workable if the input to the second pass is not the whole text over again but only the list of words rejected in the first pass. Interactive use of the checker, however, seems to be ruled out unless both checks can be done in the same pass, as the MS spell checker does.

2. Kevin referred to the file http://aspell.net/man-html/Words-With-Symbols-in-Them.html, from which I quote:

> The case where the symbol can appear at the beginning or end of the word is more difficult to deal with. The symbol may or may not actually be part of the word. Aspell currently handles this case by first trying to spell check the word with the symbol and if that fails, try it without.

I cannot reconcile this with what I observed in my problem 2, where aspell appeared to check the word only without the symbol (the apostrophe). I found that 'twas was rejected when the dictionary contained 'twas but not twas, and accepted when the dictionary contained twas but not 'twas.

Ciarán Ó Duibhín.

_______________________________________________
Aspell-user mailing list
Aspell-user@gnu.org
https://lists.gnu.org/mailman/listinfo/aspell-user
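On point 1 (hyphens): the first check treats the hyphen as a letter, and the second treats it as punctuation. The second check is applied only to the token the first check has just rejected. That amounts to feeding a second pass only the rejected words, but it needs no second pass over the text, so it would still suit interactive use.

The sketch below makes some assumptions. It uses a plain Python set as the dictionary and a simple regex tokenizer, and `DICTIONARY`, `check_token` and `misspellings` are hypothetical names. It is not how Aspell or the MS checker is actually implemented.

```python
import re

# Hypothetical toy dictionary; a real checker would load a word list.
DICTIONARY = {"half", "moon", "the", "was", "a", "well-known", "fact"}

# Tokenizer for the first check: the hyphen counts as a word character.
HYPHEN_AS_LETTER = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")


def check_token(token: str, dictionary: set[str]) -> bool:
    """Accept a token if it is listed whole (hyphen as a letter), or else
    if every hyphen-separated part is listed (hyphen as punctuation)."""
    word = token.lower()
    if word in dictionary:          # first check
        return True
    parts = word.split("-")         # second check, only for rejected tokens
    return len(parts) > 1 and all(p in dictionary for p in parts)


def misspellings(text: str, dictionary: set[str]) -> list[str]:
    """One pass over the text: both checks are made on each token as it
    is reached, so nothing has to be re-read."""
    return [t for t in HYPHEN_AS_LETTER.findall(text)
            if not check_token(t, dictionary)]
```

Here "half-moon" is accepted although only "half" and "moon" are listed, while "well-known" is accepted as a whole word. The design choice is that a compound passes if all of its parts pass. Whether that is acceptable for a given language is a separate question.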
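On point 2 (the apostrophe): the sketch below compares the rule quoted from the Aspell manual with the behaviour reported for 'twas. It assumes a plain set as the dictionary. `documented_check` is a hypothetical model of the manual's wording, and `observed_check` is a hypothetical model chosen only because it fits the two observations. Neither is Aspell's actual code.

```python
def documented_check(word: str, dictionary: set[str], symbol: str = "'") -> bool:
    """Rule as quoted from the manual: try the word with the leading or
    trailing symbol first; if that fails, try it without."""
    if word in dictionary:
        return True
    stripped = word.strip(symbol)
    return stripped != word and stripped in dictionary


def observed_check(word: str, dictionary: set[str], symbol: str = "'") -> bool:
    """Hypothetical model of the reported behaviour: only the form
    without the symbol is ever looked up."""
    return word.strip(symbol) in dictionary
```

Under the documented rule, 'twas should be accepted with either dictionary. The two models disagree only when the dictionary contains 'twas but not twas, and that is exactly the case where the word was rejected.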