On Sat, Sep 25, 2010 at 3:53 PM, Lorenzo Isella <lorenzo.ise...@gmail.com> wrote:
> On 09/25/2010 03:23 PM, Rainer M Krug wrote:
>
>> Evaluate, for me, does not necessarily mean "test if they are
>> significantly different", but rather to quantify the difference. If that
>> is what you are looking for, you could look at the "Earth Mover's
>> Distance", for which a package is available at R-forge
>> (https://r-forge.r-project.org/projects/earthmovdist/) which I co-wrote
>> and have used before.
>>
>> Cheers,
>>
>> Rainer
>>
>
> Thanks Rainer. I had a quick look at Wikipedia and the package you mention,
> and it seems to be what I am looking for.
>

Great. Could you please give me some feedback after using the package, in case something could be improved? Thanks.

> Just a question about normalization of the distance calculated by the
> algorithm.
> Let us say that I have 4 distributions A, B, C, D coupled this way: (A,B) and
> (C,D).
> The length of data in A is equal to the length of data in B, and the same
> applies to C and D, but length(A) != length(C).
> Now, the argument I would like to make is that A and B are more similar
> than C and D, and show a couple of numbers to prove this.
> Bottom line: provided my data lists are long enough, does this distance
> scale with the number of data? And if it does, how should I normalize this
> distance to compare the results?
>

You could represent the distance as a proportion of the maximum possible distance, i.e. scale it to lie between 0 and 1. An example:

A and B have the same length (x), and you calculate emd(A, B), which gives d.

Now you have to determine the maximum possible distance between two distributions of that length. Remembering the analogy of moving earth, the biggest distance would occur if in A all of the mass were in A(1) and all other elements were zero, while in B all elements were zero except B(x). Calculate the distance between these two, and you get dmax.

The last step is to divide d/dmax, which scales the value to between 0 and 1.
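As a rough illustration of the d/dmax idea, here is a minimal Python sketch. It does not use the earthmovdist package, and whether that package's emd() uses the same ground distance and scaling is not confirmed here. The sketch assumes both histograms sit on the same equally spaced bins 1..n, that moving one unit of mass from bin i to bin j costs |i - j|, and it rescales both histograms to sum to 1 (i.e. it compares shapes only). The function names emd_1d and emd_ratio are hypothetical.

```python
from itertools import accumulate


def emd_1d(a, b):
    # Assumption: a and b are histograms over the same equally spaced bins
    # 1..n, and moving one unit of mass from bin i to bin j costs |i - j|.
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    sa, sb = sum(a), sum(b)
    if sa <= 0 or sb <= 0:
        raise ValueError("histograms must have positive total mass")
    # Rescale both to sum to 1 so only the shapes are compared; this also
    # makes the result symmetric: emd_1d(a, b) == emd_1d(b, a).
    cdf_a = accumulate(x / sa for x in a)
    cdf_b = accumulate(y / sb for y in b)
    # In 1-D with unit bin spacing, the EMD equals the summed absolute
    # difference between the two cumulative distributions.
    return sum(abs(p - q) for p, q in zip(cdf_a, cdf_b))


def emd_ratio(a, b):
    # Hypothetical helper: d / dmax on the 0..1 scale described above.
    n = len(a)
    if n < 2:
        raise ValueError("need at least two bins")
    # Most distant pair on n bins: all mass in the first bin vs all mass
    # in the last bin.
    first = [1.0] + [0.0] * (n - 1)
    last = [0.0] * (n - 1) + [1.0]
    dmax = emd_1d(first, last)  # equals n - 1 with unit bin spacing
    return emd_1d(a, b) / dmax
```

For the question above, one would compute emd_ratio(A, B) and emd_ratio(C, D) and compare the two numbers, since both are then on the same 0 to 1 scale even though length(A) != length(C).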
This value can then be compared with the same ratio obtained from C and D (length y).

One important point to keep in mind when using the EMD: if sum(A) is not the same as sum(B), emd(A,B) is NOT EQUAL to emd(B,A). If this applies to your case, you have to decide what to do, but one option is to standardise A and B so that their sums are the same (effectively comparing the SHAPES and not the actual values).

If you need a reference where we used this approach (for comparison of different maps from different areas), see:

@ARTICLE{Roura-Pascual2009_rmkc,
  author = {Roura-Pascual, N\'{u}ria and Krug, Rainer M. and Richardson, David M. and Hui, Cang},
  title = {Spatially-explicit sensitivity analysis for conservation management: exploring the influence of decisions in invasive alien plant management},
  journal = {Diversity and Distributions},
  year = {2010},
  volume = {16},
  pages = {426--438},
  doi = {10.1111/j.1472-4642.2010.00659.x}
}

Please feel free to contact me if you have further questions.

Cheers,

Rainer

> Cheers
>
> Lorenzo
>

-- 
NEW GERMAN FAX NUMBER!!!

Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa

Cell: +27 - (0)83 9479 042
Fax: +27 - (0)86 516 2782
Fax: +49 - (0)321 2125 2244
email: rai...@krugs.de
Skype: RMkrug
Google: r.m.k...@gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.