On Friday 30 June 2006 09:52 am, Simon Brouwer wrote:
> > The shorter the words, the more catastrophic the error rate.
>
> It might then be a good idea if the spell checker would reject guessed
> compounds below a certain minimum length (configurable in the affix file).

Yes, I know that aspell allows this. Hunspell has so many compound flags
that I am not sure what it allows and what it does not. This could be the
first step in creating quality word lists: take only the words shorter
than, say, 9 characters from a web corpus, and leave the rest to machine
compounding. Then go on to 12 characters, 15 characters, and the rest. In
this way we can eliminate the error-prone mechanical compounding step by
step.

> > I assume, that the results are in German analogous, because the first
> > investigations showed that quite clearly, if I have time, I
> > will look also into that somewhat deeper.
>
> I notice that the German examples show mostly wrong compounds that are
> misspellings of other words. Maybe that list is not representative, but
> such errors would be more common and are more difficult to spot by the
> user.
> So a possible improvement could be to disqualify a guessed compound if
> it is too similar to a word that is actually in the word list. The
> existing suggestion mechanism could be used to determine this.
>
> Or maybe such mechanisms have already been implemented?

You are right. I found that very often just one character more or less
is what makes the mechanically compounded word senseless and erroneous.
However, this check would also eliminate a lot of potentially good words,
so it needs to be verified. Maybe this approach would throw the baby out
with the bath water. The best course is to eliminate mechanical
compounding completely, the sooner the better.

Regards,
Eleonora
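
For illustration, the two filters discussed in this thread (a minimum length for guessed compounds, and rejecting a guess that is only one character away from a listed word) could be sketched roughly as follows in Python. Everything here is hypothetical: the function names, the two-part split, the toy lexicon and the threshold of 7 are assumptions made for the example, and none of it reflects how Hunspell or aspell actually implement compounding.

```python
def within_one_edit(a, b):
    """True if a and b differ by at most one insertion, deletion or substitution."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = j = edits = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
            continue
        edits += 1
        if edits > 1:
            return False
        if len(a) == len(b):
            i += 1  # substitution: advance in both strings
        j += 1      # otherwise skip the extra character in the longer string
    # an unmatched trailing character in b counts as one more edit
    return edits + (len(b) - j) <= 1


def guessed_parts(word, lexicon, min_part=3):
    """Return one split of word into two listed parts, or None (two-part split only)."""
    for k in range(min_part, len(word) - min_part + 1):
        head, tail = word[:k], word[k:]
        if head in lexicon and tail in lexicon:
            return head, tail
    return None


def accept_guessed_compound(word, lexicon, min_length):
    """Decide whether a word is accepted under the two hypothetical filters."""
    if word in lexicon:
        return True   # listed word, no guessing needed
    if len(word) < min_length:
        return False  # guessed compound is too short
    if guessed_parts(word, lexicon) is None:
        return False  # not a compound of listed words at all
    # reject a guess that is one edit away from a listed word
    return not any(within_one_edit(word, known) for known in lexicon)


# Toy lexicon, assumed for the example only.
lexicon = {"water", "fall", "all", "rain", "sea", "waterfall"}
for candidate in ("waterfall", "rainwater", "waterall", "seaall"):
    print(candidate, accept_guessed_compound(candidate, lexicon, min_length=7))
```

In this toy run "waterall" is rejected because it is one deletion away from the listed "waterfall", while "rainwater" is accepted. The concern raised above still applies: a distance-one filter of this kind will also throw out some legitimate compounds, so it would have to be checked against real data before being relied on.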
