Frédéric Chiroleu <frederic.chiroleu <at> cirad.fr> writes:

> Hi,
>
> I misunderstand the definition of Canberra distance in R.
>
> On Internet and in function description pages of dist() from stats and
> Dist() from amap, Canberra distance between vectors x and y, d(x,y), is :
>
> d(x,y) = sum(abs(x-y)/(x+y))
>
> But in use, through simple examples, we find that the formula is :
>
> d(x,y) = (NZ + 1)/NZ * sum(abs(x-y)/(x+y))
>
> with NZ = nb of pairs of coordinates that are different from (0,0) (Non
> Zeros)

I think you should try another example. At least in my simple experiments the multiplier was NZ/NZ, i.e. exactly one, rather than your (NZ + 1)/NZ, and that is also the documented case. I could not find any difference from the documentation. However, the dist documentation has a note about "double zeros" (zero numerator and zero denominator). Could that cause the difference?
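One way to check whether double zeros are behind the discrepancy is a small experiment like the sketch below. It assumes the behaviour described in ?dist: terms with zero numerator and denominator are dropped as if missing, and the sum is then scaled up in proportion to the number of columns used. The vectors x and y are made up for illustration.

```r
# Two made-up vectors sharing one "double zero" coordinate (third position).
x <- c(1, 2, 0)
y <- c(3, 1, 0)

# Plain Canberra sum over the coordinate pairs that are not (0, 0).
keep  <- !(x == 0 & y == 0)
plain <- sum(abs(x[keep] - y[keep]) / abs(x[keep] + y[keep]))

# Value reported by stats::dist() for the same pair of vectors.
reported <- as.numeric(dist(rbind(x, y), method = "canberra"))

# Factor implied by the rescaling rule in ?dist (an assumption to be checked):
# total number of coordinates divided by the number actually used.
n  <- length(x)
nz <- sum(keep)

print(c(plain = plain, reported = reported,
        ratio = reported / plain, n_over_nz = n / nz))
```

If the last two entries agree, the extra factor is n/NZ, which equals (NZ + 1)/NZ whenever exactly one coordinate pair is (0,0) and is exactly one when there are no double zeros.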
If you really want to know how the distance is calculated, download the R source and look there. If you want to know how the index was originally suggested to be calculated, you must find the Lance & Williams paper in Aust. Comput. J. 1, 15-20, 1967 (I haven't found it, but would be curious to see it).

Cheers, jari oksanen