Hi Brenton!

2011/2/16 Brenton <bashw...@gmail.com>

> Laurent,
>
> I have been doing some work on a diff library for Clojure sequences (I
> need to get back to it and finish it up).
>
> http://github.com/brentonashworth/clj-diff
>
> The main goal of this library is to compute sequential diffs quickly.
>

While mine goals was really just to first see how I could express the
algorithm using higher order functions and clojure datastructures / sequence
abstraction, even internally.


> Whenever I see someone doing something similar I like to compare
> performance just in case you know something that I don't.
>

Fair enough ;)


> Other algorithms usually perform well on small sequences but then
> break down as the sizes grow. For example, I did a quick test of this
> algorithm on two 10,000 character strings and your algorithm took 80
> seconds while mine computed the edit distance is 120 ms.
>

It could be interesting to plot some more intermediate results between 10
character strings comparisons and 10,000, so that we can see if this is just
a problem of a constant factor, or a divergence in big O complexity.

Please note that my algorithm is very close in implementation (and thus
big-O complexity, unless I've made a mistake) to the one described on the
wikipedia page.
It's not an "approximate" levenshtein distance, it is the levenshtein
distance.

Do you know what the performance of your version could become if you really
implemented the computation of the levenshtein distance by really computing
and comparing all edit paths ?

(Not saying that strictly sticking to levenshtein distance/algorithm is a
rational goal, just trying to compare apples with apples)


> While my library is primarily concerned with diffs and edit distance,
> I did add a levenshtein-distance function which attempts to compute
> this distance from a previously computed minimum edit path. It is not
> always accurate because there may be many minimum edit paths with
> shorter or longer levenshtein distances. If the algorithm is modified
> slightly so that the edit path with the minimum levenshtein distance
> is chosen then it would be able to do both.
>
> I can't take credit for the algorithm, I just implemented what I read
> in a paper. But I do think this approach will get the job done as
> quickly as possible. Of course there is a lot more code to read than
> your very impressive ten lines.
>

You're welcome,

Cheers,

-- 
Laurent

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Reply via email to