Hi Ruud,

You are absolutely right. Compound recognition will let a lot of typos through, but Hunspell already has the suggested feature for rejecting the ugliest spelling mistakes that compound analysis would otherwise accept: if the (pseudo) compound word can be produced from a dictionary word (or from one of its affixed forms) by one of the REP replacement rules, Hunspell won't accept it.

For example, one of the most typical Hungarian spelling mistakes is the i↔í replacement. With the rules

    REP i í
    REP í i

the bad compounds "szer+víz" and "elit+élt" aren't accepted, because the dictionary contains the words "szerviz" and "elítélt".

You may also have to extend the REP rules with similar one-character replacements to catch the most important spelling mistakes of your language.

I think that for everyday word processing in a language with an arbitrary number of compound words, it is much better to use Hunspell's compound recognition. But for other tasks, especially checking and editing artificially distorted texts such as the output of an OCR program, you may need to add new REP rules (for the typical OCR errors) or offer an optional dictionary without compound recognition.

Regards,
László

2009/2/17 R.J. Baars <[email protected]>:
> Laszlo,
>
> One of my colleagues in OpenTaal (also project leader of OOo NL) is worried
> about the compounding supporting compounds that could easily be a mistake.
>
> Of course we can try and find these, and flag them as forbiddenword, but
> did you ever think of a function, detecting whether the compounded word is
> a possible typo for a word that is in the list itself, and if so, forbid
> it?
>
> Ruud
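
To make the REP check from the reply above concrete, here is a rough Python sketch of the idea. It is a toy model only, not Hunspell's actual code: the names REP_RULES, DICTIONARY, rep_variants and accept_compound are made up for illustration, the word list holds just the words from the Hungarian example, and affixed forms are ignored.

```python
# Toy model of the REP-based compound filter; not Hunspell's implementation.
# All names and the tiny word list below are illustrative assumptions.
REP_RULES = [("i", "í"), ("í", "i")]
DICTIONARY = {"szer", "víz", "szerviz", "elit", "élt", "elítélt"}


def rep_variants(word, rules):
    """Yield every string made by applying one rule at one position."""
    for old, new in rules:
        start = word.find(old)
        while start != -1:
            yield word[:start] + new + word[start + len(old):]
            start = word.find(old, start + 1)


def accept_compound(parts, dictionary=DICTIONARY, rules=REP_RULES):
    """Accept a compound only if every part is a dictionary word and no
    single REP replacement turns the joined word into a dictionary word."""
    if not all(part in dictionary for part in parts):
        return False
    joined = "".join(parts)
    return not any(v in dictionary for v in rep_variants(joined, rules))


if __name__ == "__main__":
    for parts in (["szer", "víz"], ["elit", "élt"], ["víz", "szer"]):
        print("+".join(parts), accept_compound(parts))
```

In this sketch "szer+víz" and "elit+élt" are rejected because one replacement leads to "szerviz" and "elítélt", while the made-up test compound "víz+szer" passes because no single replacement yields a word in the list. Adding OCR-style pairs to REP_RULES would tighten the filter in the same way.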
