Laurent, When I have some time, I will take another look at my code to see if I can get an accurate levenshtein distance without adding significant complexity. I am optimistic.
I'll let you know what I find. Thanks again for sharing your code. Brenton On Feb 16, 2:12 am, Laurent PETIT <laurent.pe...@gmail.com> wrote: > Hi Brenton! > > 2011/2/16 Brenton <bashw...@gmail.com> > > > Laurent, > > > I have been doing some work on a diff library for Clojure sequences (I > > need to get back to it and finish it up). > > >http://github.com/brentonashworth/clj-diff > > > The main goal of this library is to compute sequential diffs quickly. > > While mine goals was really just to first see how I could express the > algorithm using higher order functions and clojure datastructures / sequence > abstraction, even internally. > > > Whenever I see someone doing something similar I like to compare > > performance just in case you know something that I don't. > > Fair enough ;) > > > Other algorithms usually perform well on small sequences but then > > break down as the sizes grow. For example, I did a quick test of this > > algorithm on two 10,000 character strings and your algorithm took 80 > > seconds while mine computed the edit distance is 120 ms. > > It could be interesting to plot some more intermediate results between 10 > character strings comparisons and 10,000, so that we can see if this is just > a problem of a constant factor, or a divergence in big O complexity. > > Please note that my algorithm is very close in implementation (and thus > big-O complexity, unless I've made a mistake) to the one described on the > wikipedia page. > It's not an "approximate" levenshtein distance, it is the levenshtein > distance. > > Do you know what the performance of your version could become if you really > implemented the computation of the levenshtein distance by really computing > and comparing all edit paths ? > > (Not saying that strictly sticking to levenshtein distance/algorithm is a > rational goal, just trying to compare apples with apples) > > > While my library is primarily concerned with diffs and edit distance, > > I did add a levenshtein-distance function which attempts to compute > > this distance from a previously computed minimum edit path. It is not > > always accurate because there may be many minimum edit paths with > > shorter or longer levenshtein distances. If the algorithm is modified > > slightly so that the edit path with the minimum levenshtein distance > > is chosen then it would be able to do both. > > > I can't take credit for the algorithm, I just implemented what I read > > in a paper. But I do think this approach will get the job done as > > quickly as possible. Of course there is a lot more code to read than > > your very impressive ten lines. > > You're welcome, > > Cheers, > > -- > Laurent -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en