One more comment:

> For a reference:
> http://en.wikipedia.org/wiki/Levenshtein_distance
> 
> Before looking for a completely different solution, I'd like to try
> experimenting with a slight modification of that algorithm.
> 
> Usually the Levenshtein distance is calculated by counting changes with
> the following weights (as in the link above):
> (A)
>   - adding a character:   +1
>   - deleting a character: +1
>   - changing a character: +1
> But sometimes I have seen weights that prefer changes over insertions
> and deletions; in those cases the weights were like this:
> (B)
>   - adding a character:   +2
>   - deleting a character: +1
>   - changing a character: +2
> 
> Because of this, and since the actual problem is only about getting
> better proposals when characters differ only in their 'decoration',
> I'd like to suggest trying the following idea: shifting the weights.
> 
> Provided hunspell uses weights like this (the actual values do not matter!):
>   - adding a character:   +A
>   - deleting a character: +D
>   - changing a character: +C
> then the weights should be calculated like this instead:
>   - adding a character:   +2*A
>   - deleting a character: +2*D
>   - changing a character: +2*C, if the characters differ by more than
>     just 'decoration'
>   - changing a character: +C, if the characters differ *only in* the
>     decoration
> 
> That way changes like é to c will have double weight and changes like e
> to é will have only single weight. Thus the latter changes should be
> preferred over other changes, and therefore the respective
> suggestions should appear higher up in the list of proposals.
> 
> Obviously the suggestion mechanism now needs to accept words with twice
> the Levenshtein distance it accepted before.
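
To make the quoted idea concrete, here is a minimal sketch in Python. It
is not hunspell's code; I do not know how hunspell computes its weights
internally. The names strip_decoration and weighted_distance are made up
for this example, and "decoration" is assumed to mean Unicode combining
marks (accents, cedillas and the like), detected via the standard
unicodedata module.

```python
import unicodedata


def strip_decoration(ch: str) -> str:
    """Return ch without combining marks (assumed meaning of 'decoration')."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def weighted_distance(src: str, dst: str,
                      add: int = 1, delete: int = 1, change: int = 1) -> int:
    """Levenshtein distance with doubled weights A, D, C, except that a
    change affecting only the decoration keeps the single weight C."""
    # Compare precomposed characters so that 'é' is one unit, not 'e' + accent.
    src = unicodedata.normalize("NFC", src)
    dst = unicodedata.normalize("NFC", dst)

    # prev[j] = cost of turning the empty prefix of src into dst[:j]
    prev = [j * 2 * add for j in range(len(dst) + 1)]
    for i, a in enumerate(src, start=1):
        cur = [i * 2 * delete]
        for j, b in enumerate(dst, start=1):
            if a == b:
                sub = 0
            elif strip_decoration(a) == strip_decoration(b):
                sub = change          # differs only in decoration: +C
            else:
                sub = 2 * change      # a real change: +2*C
            cur.append(min(prev[j] + 2 * delete,   # delete a
                           cur[j - 1] + 2 * add,   # add b
                           prev[j - 1] + sub))     # change a into b
        prev = cur
    return prev[-1]


print(weighted_distance("e", "\u00e9"))   # e -> é, prints 1
print(weighted_distance("\u00e9", "c"))   # é -> c, prints 2
```

With these weights a candidate that differs only by an accent gets a
smaller distance than one that needs a real change, so it would sort
higher if the proposals are ordered by distance.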

If that idea basically proves to be functional but not good enough, you
may try two more modifications:

a) Use an even higher standard weight
   (that is, use +5*A instead of +2*A, ...).
b) Consider changes that affect only the decoration to be completely
   negligible and give them a weight of 0.
   But since I don't know how a weight of 0 might actually affect the
   reliability of the algorithm, one needs to verify that the algorithm
   still works fine with a weight of 0.
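
Both variants can be tried out with a small, self-contained sketch like
the one below. Again, this is only an illustration and not hunspell
code: the names base_letter and distance and the parameters scale and
decoration_weight are hypothetical, the base weights A, D and C are all
assumed to be 1, and "decoration" is assumed to mean Unicode combining
marks.

```python
import unicodedata


def base_letter(ch):
    """Return ch without combining marks (assumed meaning of 'decoration')."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def distance(src, dst, scale=2, decoration_weight=1):
    """Edit distance where add/delete/change cost `scale` and a
    decoration-only change costs `decoration_weight`."""
    src = unicodedata.normalize("NFC", src)
    dst = unicodedata.normalize("NFC", dst)
    prev = [j * scale for j in range(len(dst) + 1)]
    for i, a in enumerate(src, start=1):
        cur = [i * scale]
        for j, b in enumerate(dst, start=1):
            if a == b:
                sub = 0
            elif base_letter(a) == base_letter(b):
                sub = decoration_weight
            else:
                sub = scale
            cur.append(min(prev[j] + scale,       # delete a
                           cur[j - 1] + scale,    # add b
                           prev[j - 1] + sub))    # change a into b
        prev = cur
    return prev[-1]


# Variant a): standard weight 5, decoration-only change still 1.
print(distance("resume", "r\u00e9sum\u00e9", scale=5))              # prints 2
# Variant b): decoration-only changes cost nothing.
print(distance("resume", "r\u00e9sum\u00e9", decoration_weight=0))  # prints 0
```

The second call shows one thing to watch out for with a weight of 0: two
different words can end up with distance 0, so any code that treats
"distance 0" as "identical word" would have to be checked.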

Thomas
