Laurent,

I have been doing some work on a diff library for Clojure sequences (I
need to get back to it and finish it up).

http://github.com/brentonashworth/clj-diff

The main goal of this library is to compute sequential diffs quickly.
Whenever I see someone doing something similar I like to compare
performance just in case you know something that I don't.

Other algorithms usually perform well on small sequences but then
break down as the sizes grow. For example, I did a quick test on two
10,000-character strings: your algorithm took 80 seconds, while mine
computed the edit distance in 120 ms.

While my library is primarily concerned with diffs and edit distance,
I did add a levenshtein-distance function which attempts to compute
this distance from a previously computed minimum edit path. It is not
always accurate because there may be many minimum edit paths, and the
Levenshtein distance derived from each one can be shorter or longer.
If the algorithm were modified slightly so that the edit path with the
minimum Levenshtein distance is chosen, then it would be able to do
both.
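
To make the point about multiple minimum edit paths concrete, here is a
small sketch in Python (chosen for brevity; it is not code from clj-diff,
and the function names and the edit-script encoding are hypothetical). It
assumes an edit script is a string of '=' (keep), '-' (delete) and '+'
(insert), and that a delete and an insert in the same run of edits can be
paired into one substitution. Under those assumptions, two equally short
edit paths from "xa" to "aya" give different estimates, and only one
matches the true Levenshtein distance.

```python
def levenshtein(a, b):
    """Reference Levenshtein distance, keeping only the previous row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete x
                            curr[j - 1] + 1,            # insert y
                            prev[j - 1] + (x != y)))    # keep or substitute
        prev = curr
    return prev[-1]


def distance_from_script(script):
    """Estimate Levenshtein distance from an insert/delete edit script.

    Assumed encoding: '=' keep, '-' delete, '+' insert. Within each run
    of edits between kept elements, deletes and inserts are paired into
    substitutions, so the run costs max(deletes, inserts).
    """
    total = deletes = inserts = 0
    for op in script + "=":          # trailing '=' flushes the last run
        if op == "-":
            deletes += 1
        elif op == "+":
            inserts += 1
        else:
            total += max(deletes, inserts)
            deletes = inserts = 0
    return total


# Two minimum edit paths (3 edits each) turning "xa" into "aya":
path_1 = "-=++"   # delete x, keep a, insert y, insert a
path_2 = "-++="   # delete x, insert a, insert y, keep a

print(distance_from_script(path_1))   # 3
print(distance_from_script(path_2))   # 2
print(levenshtein("xa", "aya"))       # 2
```

Both paths have the same number of inserts and deletes, so a diff
algorithm has no reason to prefer one, yet only the second yields the
true Levenshtein distance.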

I can't take credit for the algorithm; I just implemented what I read
in a paper. But I do think this approach will get the job done as
quickly as possible. Of course there is a lot more code to read than
your very impressive ten lines.

Cheers,
Brenton




On Feb 15, 2:38 pm, Laurent PETIT <laurent.pe...@gmail.com> wrote:
> Hi,
>
> Was playing with levenshtein (argh, where do I place the h -sorry mister
> levenshtein-), and thought it could be interesting to share my current
> result here, to get some feedback.
>
> The following version works with any seq-able (not only Strings), but
> hardwires function = for equality testing of seq values (rather good default
> IMHO), but also hardwires the cost of 1 for either element insertion,
> deletion, or swap.
>
> It is functional, it does not use arrays, nor a MxN matrix, just the
> required data for computing a given "row" of the "virtual matrix" (e.g. the
> previous row)
>
> I'm quite sure it can still be improved for better readability and
> performance without losing any of the above-mentioned characteristics:
>
> https://gist.github.com/828413
>
> Cheers,
>
> --
> Laurent
