"Marcos Sanches" <[EMAIL PROTECTED]> wrote: Ls1<-length(s1) Ls2<-length(s2) for ( p in 1:ls1){ for (q in 1:ls2){ t1<-levenshteinFast(s1[p],s2[q]) ...
  Ls1 = 42000, Ls2 = 70000. I think I will wait for months until this
  program ends. Do you have any suggestion to increase the speed?

The first suggestion has to be "search HARD in the on-line literature",
as others are bound to have had similar problems.

My second suggestion is to consider sorting. The cost of distance(x, y)
is proportional to |x|*|y|. Now suppose y and z have a common prefix of
length n. Then distance(x, z) can be computed in |x|*(|z|-n) time by
reusing the first n+1 columns of the matrix developed for y. The
simplest way to arrange strings so that strings with a common prefix are
adjacent is to sort them. Depending on what your strings are like, this
might make a big improvement, or it might be a waste of time.

There are various approximate algorithms out there; the bioinformatics
literature is vast. But the use of the simple insert=1, delete=1,
change=1 cost function is *also* an approximation. If the strings were
typed, some keys are closer than others. If they were transcribed from
speech, some sounds are more similar than others, and some deletions
(such as reduced vowels) are likelier than others. So don't be afraid of
approximations.

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
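[Editor's note: the sort-and-reuse idea above can be sketched as follows. The sketch is in Python rather than R for brevity, and all function names (`all_distances`, `common_prefix_len`) are illustrative, not from any package. It keeps one DP row per target-string character, which is the same reuse as the "first n+1 columns" in the post, just with the matrix transposed.]

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n

def all_distances(x, targets):
    """Levenshtein distance from x to every string in targets.

    Sorting targets puts strings with a common prefix next to each
    other, so when consecutive targets share an n-character prefix,
    the first n+1 DP rows survive from the previous target.
    """
    targets = sorted(targets)
    # rows[i][j] = distance(target[:i], x[:j])
    rows = [list(range(len(x) + 1))]   # row for the empty prefix
    prev = ""
    out = {}
    for t in targets:
        keep = common_prefix_len(prev, t)
        del rows[keep + 1:]            # rows 0..keep are still valid
        for i in range(keep, len(t)):
            above = rows[-1]
            cur = [i + 1]
            for j in range(1, len(x) + 1):
                cost = 0 if t[i] == x[j - 1] else 1
                cur.append(min(above[j] + 1,          # delete t[i]
                               cur[j - 1] + 1,        # insert x[j-1]
                               above[j - 1] + cost))  # substitute
            rows.append(cur)
        out[t] = rows[len(t)][len(x)]
        prev = t
    return out
```

Whether this pays off depends entirely on how much prefix overlap the sorted data actually has, as the post says.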
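[Editor's note: a minimal sketch of the weighted cost function the post alludes to, again in Python. The substitution table here is a made-up example, not a real keyboard or phonetic confusion model; only the dynamic-programming shape is the point.]

```python
# Hypothetical confusion costs: "similar" characters cost less to swap.
SUBST = {("m", "n"): 0.4, ("a", "e"): 0.5}

def sub_cost(a, b):
    """Substitution cost; falls back to 1.0 for unlisted pairs."""
    if a == b:
        return 0.0
    return SUBST.get((a, b), SUBST.get((b, a), 1.0))

def weighted_distance(s, t, ins=1.0, dele=1.0):
    """Edit distance with per-pair substitution costs."""
    prev = [j * ins for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * dele]
        for j in range(1, len(t) + 1):
            cur.append(min(prev[j] + dele,       # delete s[i-1]
                           cur[j - 1] + ins,     # insert t[j-1]
                           prev[j - 1] + sub_cost(s[i - 1], t[j - 1])))
        prev = cur
    return prev[-1]
```

With all weights set to 1 this reduces to the plain insert=1, delete=1, change=1 distance.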