One more comment: > For a reference: > http://en.wikipedia.org/wiki/Levenshtein_distance > > Befor looking for a completely different solution I'd like to try > experimenting with a slight modification of that algorithm > > Usually the Levenshtein distance is calculated by counting changes with > the following weight (as in the link above) > (A) > - adding a character: +1 > - deleting a character: +1 > - changing a character: +1 > But sometimes I have seen weights that prefer changes over insertions > and deletions, in those cases the weights were like this: > (B) > - adding a character: +2 > - deleting a character: +1 > - changing a character: +2 > > Because of this, and since the actual problem is only with getting a > better proposal if the character differs only in its 'decoration' I'd > like to suggest trying the following idea: shifting the weights. > > Provided hunspell uses weights like this (the actual values do not matter!) > - adding a character: +A > - deleting a character: +D > - changing a character: +C > then the weights should be calculated like this instead > - adding a character: +2*A > - deleting a character: +2*D > - changing a character: +2*C, if the characters differ not just by > 'decoration' > - changing a character: +C, if the characters differ *only in* the > decoration > > That way changes like é to c will have double weight and changes like e > to é will have only single weight. Thus the latter changes should be > preferable compared to other changes and therefor the respective > suggestions being higher up in the list of proposals. > > Obviously the suggestion mechanism now needs to accept words of twice > the Levenshtein distance than before.
If that idea basically proves to be functional but not good enough you may try two more modifications: a) use a even higher standard weight (that is use +5*A instead if +2*A, ...) b) you may consider changes like the ones in decoration to be completely neglectable and give them a weight of 0. But since I'm not aware how a weight of 0 might actually impact the reliability of the algorithm one needs to verify that the algorithm will still be fine with a weight of 0 Thomas --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
