again, this had better been sent to the list as well }:-(
--- Begin Message ---
Am Mittwoch, den 26.04.2006, 23:28 -0700 schrieb Kon Lovett:
> I re-packaged the leventhshein egg last year, not re-writing the
> existing impl, just extending w/ new procedures.
Uh, looks interesting.
May I suggest to add one more: (levenshtein< s t limit) => boolean.
This is much cheaper than (< (levenshtein-distance s t) limit) , which
in turn is quite useful when ignoring minor typos.
How does it work with UTF-8? Is (levenshtein-distance "ä" "n") 1?
> Interested in what
> "obvious optimisation" was missing from the original.
The original, or what I recall about it, doesn't do common
prefix/postfix stripping. The Levenshtein algorithm is simply applied
to the whole string. This is unnecessary costly.
> Is the "deadly incompatible with user level threading" due to
> possible context switch during a match? If so, then this is not
> currently a problem.
The old code used to feed possibly huge strings to a C procedure, which
could then work almost forever, preventing context switches. I can't
tell from a quick glance, but did I get you right, that's gone? Great.
The other problem I recall, but I don't even recall whether this was in
the egg or some other C implementation, was that the whole matrix was
allocated, while only space for the smallest line (the shorter one
columns or rows) is required.
Best regards
/Jörg
--- End Message ---
_______________________________________________
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users