On Sat, Sep 25, 2010 at 3:53 PM, Lorenzo Isella <lorenzo.ise...@gmail.com> wrote:

> On 09/25/2010 03:23 PM, Rainer M Krug wrote:
>
>
>> Evaluate, for me, does not necessarily mean "test if they are
>> significantly different", but rather to quantify the difference. If that
>> is what you are looking for, you could look at the "Earth Movers
>> Distance", where a package is available at R-forge
>> (https://r-forge.r-project.org/projects/earthmovdist/) which I co-wrote
>> and used before.
>>
>> Cheers,
>>
>> Rainer
>>
>>
> Thanks Rainer. I had a quick look at Wikipedia and the package you mention,
> and it seems to be what I am looking for.
>

Great. Once you have used the package, could you please give me some
feedback on anything that could be improved? Thanks.


> Just a question about normalization of the distance calculated by the
> algorithm.
> Let us say that I have 4 distributions A, B, C, D, paired as (A,B) and
> (C,D).
> The length of the data in A is equal to the length of the data in B, and the
> same applies to C and D, but length(A) != length(C).
> Now, the argument I would like to make is that A and B are more similar
> than C and D, and I would like to show a couple of numbers to prove this.
> Bottom line: provided my data lists are long enough, does this distance
> scale with the number of data points? And if it does, how should I
> normalize this distance to compare the results?
>

You could express the distance as a proportion of the maximum possible
distance, i.e. scale it to lie between 0 and 1.

An example:
A and B have the same length (x), and you calculate emd(A, B), which gives d.
Next you have to determine the maximum possible distance between two
distributions of this length. Remembering the earth-moving analogy, the
distance is largest when all the mass of A sits in A(1) (all other elements
zero) and all the mass of B sits in B(x) (all other elements zero).
Calculating the emd between these two extreme distributions gives dmax.
The last step is to compute d/dmax, which scales the distance to a value
between 0 and 1.

This value can then be compared with the same ratio obtained for C and D,
which have length y.
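A minimal Python sketch of this normalisation, under stated assumptions: A and B are 1-D histograms over the same x unit-spaced bins, the ground distance between bins i and j is |i - j|, and sum(A) == sum(B). Under those assumptions the emd reduces to the sum of the absolute differences of the cumulative sums. This does not call the earthmovdist package; emd_1d, emd_max and emd_normalised are hypothetical helper names used only for illustration.

```python
from itertools import accumulate


def emd_1d(a, b):
    """Earth mover's distance between two histograms on the same 1-D bins.

    Assumes (hypothetically) unit-spaced bins, ground distance |i - j|,
    and equal total mass in a and b.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    if abs(sum(a) - sum(b)) > 1e-9 * max(abs(sum(a)), 1.0):
        raise ValueError("a and b must have the same total mass")
    # In 1-D, the earth that must cross the boundary after bin k equals
    # the difference between the cumulative sums of a and b up to bin k.
    return sum(abs(c) for c in accumulate(x - y for x, y in zip(a, b)))


def emd_max(n, mass):
    """dmax: all mass in the first bin of A, all mass in the last bin of B."""
    if n < 2:
        raise ValueError("need at least two bins for a non-zero dmax")
    a = [mass] + [0.0] * (n - 1)
    b = [0.0] * (n - 1) + [mass]
    return emd_1d(a, b)  # equals mass * (n - 1)


def emd_normalised(a, b):
    """d / dmax, a value between 0 (identical) and 1 (maximally different)."""
    if sum(a) <= 0:
        raise ValueError("total mass must be positive")
    return emd_1d(a, b) / emd_max(len(a), sum(a))


if __name__ == "__main__":
    # Made-up pair (A, B) of length x = 4 and pair (C, D) of length y = 6.
    print(emd_normalised([1, 2, 3, 4], [4, 3, 2, 1]))
    print(emd_normalised([0, 1, 0, 0, 0, 1], [0, 0, 1, 0, 1, 0]))
```

With these made-up inputs the first ratio is 10/30 (about 0.333) and the second is 2/10 = 0.2; because both are scaled by their own dmax, the two numbers can be compared directly even though x != y, and the smaller ratio indicates the more similar pair.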

One important point to keep in mind when using the emd: if sum(A) is not
the same as sum(B), emd(A,B) is NOT EQUAL to emd(B,A). If this applies to
your case, you have to decide what to do, but one option is to standardise A
and B so that their sums are the same (effectively comparing the SHAPES and
not the actual values).
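A minimal Python sketch of the standardisation option, assuming the same 1-D, unit-spaced-bin setting with ground distance |i - j|. Each histogram is rescaled to sum to 1 before the distance is computed, so only the shapes are compared. The helper names standardise and emd_1d are hypothetical, and the cumulative-sum formula is an illustration, not the algorithm used in the earthmovdist package.

```python
from itertools import accumulate


def standardise(h, total=1.0):
    """Rescale histogram h so that its elements sum to `total`."""
    s = sum(h)
    if s <= 0:
        raise ValueError("histogram must have a positive sum")
    return [total * x / s for x in h]


def emd_1d(a, b):
    """1-D emd for equal-mass histograms on unit-spaced bins (assumption)."""
    return sum(abs(c) for c in accumulate(x - y for x, y in zip(a, b)))


if __name__ == "__main__":
    A = [2, 4, 2]  # sum(A) == 8
    B = [1, 1, 2]  # sum(B) == 4, so A and B differ in total mass
    a, b = standardise(A), standardise(B)
    # After standardising, both sum to 1 and the two directions agree.
    print(emd_1d(a, b), emd_1d(b, a))
```

Here both calls return 0.25. Standardising to a common total discards the information about absolute amounts, which is exactly the trade-off of comparing shapes rather than values.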


If you need a reference where we used this approach (to compare
different maps from different areas), see:

@ARTICLE{Roura-Pascual2009_rmkc,
  author = {Roura-Pascual, N\'{u}ria and Krug, Rainer M. and Richardson, David M. and Hui, Cang},
  title = {Spatially-explicit sensitivity analysis for conservation management: exploring the influence of decisions in invasive alien plant management},
  journal = {Diversity and Distributions},
  year = {2010},
  volume = {16},
  pages = {426--438},
  doi = {10.1111/j.1472-4642.2010.00659.x},
  file = {Article:Roura-Pascual2009_rmkc.pdf:PDF},
  owner = {rkrug},
  timestamp = {2009.03.11}
}

Please feel free to contact me if you have further questions,

Cheers,

Rainer



> Cheers
>
> Lorenzo
>



-- 
NEW GERMAN FAX NUMBER!!!

Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology,
UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa

Cell:           +27 - (0)83 9479 042
Fax:            +27 - (0)86 516 2782
Fax:            +49 - (0)321 2125 2244
email:          rai...@krugs.de

Skype:          RMkrug
Google:         r.m.k...@gmail.com


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
