Re: [R] Measure Difference Between Two Distributions

Lorenzo Isella Sat, 25 Sep 2010 09:26:35 -0700

ld represent the distance as the proportion of maximum possible
distance, i.e. scaling it to be between 0 and 1.


An example:
A and B have the same length (x), and you calculate emd(A, B), which
is d.
Now you have to determine the maximum distance between these two.
Remembering the analogy of moving earth, the biggest distance between
the two distributions occurs when all the mass of A sits in its first
element, A(1), with every other element zero, and all the mass of B sits
in its last element, B(x), with every other element zero. Now you can
calculate the distance between these two extreme cases, and you get dmax.
The last step is to divide d by dmax, i.e. scaling to a value between 0 and 1.
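
The steps above can be sketched as follows. This is only an illustration, written in Python, and not the emd function discussed in this thread: it assumes one-dimensional histograms on the same bins, a distance of 1 between neighbouring bins, and equal totals in A and B, in which case the EMD reduces to a sum of absolute cumulative differences. The names emd_1d and scaled_emd are made up for the example; the same arithmetic carries over to R with cumsum().

```python
from itertools import accumulate


def emd_1d(a, b):
    """EMD between two histograms on the same bins.

    Assumes unit distance between neighbouring bins and sum(a) == sum(b).
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    # The running surplus after each bin is the earth that still has to be
    # carried one bin further to the right (or left, if negative).
    return sum(abs(c) for c in accumulate(diffs))


def scaled_emd(a, b):
    """d / dmax, where dmax is the worst case for this length and total."""
    x = len(a)
    total = sum(a)
    worst_a = [total] + [0] * (x - 1)   # everything in A(1)
    worst_b = [0] * (x - 1) + [total]   # everything in B(x)
    dmax = emd_1d(worst_a, worst_b)
    if dmax == 0:
        raise ValueError("dmax is zero (single bin or empty histograms)")
    return emd_1d(a, b) / dmax


A = [2, 1, 1, 0]
B = [0, 1, 1, 2]
d = emd_1d(A, B)          # 6
ratio = scaled_emd(A, B)  # 6 / 12 = 0.5
```

Under these assumptions dmax is simply the total mass times (x - 1), so the ratio is 1 only when the two histograms are the extreme cases themselves.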

This value can then be compared with the same ratio obtained from C and
D with length y.

One important point to keep in mind when using the emd: if sum(A) is
not the same as sum(B), emd(A,B) is NOT EQUAL to emd(B,A). If this
applies to your case, you have to decide what to do, but one option is
to standardise A and B so that their sums are the same (effectively
comparing the SHAPES and not the actual values).
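
A minimal sketch of the standardisation option, again in Python and again only an illustration: each histogram is divided by its own sum so that both total 1 before the distance is taken. The helper names standardise and emd_1d are hypothetical, and emd_1d is the cumulative-sum form, which only makes sense once both inputs carry the same total (one-dimensional bins, unit distance between neighbours assumed).

```python
from itertools import accumulate


def standardise(h):
    """Rescale a histogram so that its entries sum to 1."""
    total = sum(h)
    if total <= 0:
        raise ValueError("histogram must have a positive sum")
    return [v / total for v in h]


def emd_1d(a, b):
    """EMD for equal-mass histograms on the same bins (unit bin spacing)."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    return sum(abs(c) for c in accumulate(diffs))


A = [2, 4, 2]  # sum(A) is 8
B = [1, 1, 2]  # sum(B) is 4, so the raw totals differ

pA = standardise(A)  # [0.25, 0.5, 0.25]
pB = standardise(B)  # [0.25, 0.25, 0.5]

d_ab = emd_1d(pA, pB)
d_ba = emd_1d(pB, pA)
```

After standardisation the order of the arguments no longer matters (d_ab equals d_ba), and multiplying A by a constant leaves the result unchanged, which is what comparing shapes rather than actual values amounts to.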


OK, I see. The standardization part is not a terrible problem, I guess.

The other bit is less clear (to me). What are A(1) and B(x)? Am I piling up all the elements in A and B in a single bin?

Cheers

Lorenzo

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
