-------- Original Message --------
Subject: RE: [No Subject Provided: Corrected Mahalanobis D - the mod]
Date: Sun, 2 Mar 2008 12:54:04 -0800 (PST)
From: F. James Rohlf <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Organization: Stony Brook University
To: morphmet@morphometrics.org
References: <[EMAIL PROTECTED]>

That correction is based on the relationship given in equation (4) on page 219 of Anderson (1984), "An Introduction to Multivariate Statistical Analysis", 2nd ed. It shows that the expected (average) value of the sample squared generalized distance is not the population value, so the sample value is biased for small sample sizes.
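
For reference, here is a sketch of that relationship in the notation of the question quoted further down (n1, n2, p). It is obtained simply by inverting the quoted correction, not copied from Anderson. D^2 is the sample squared distance and \Delta^2 (a symbol assumed here) is the population value:

```latex
E\left[D^2\right] = \frac{n_1 + n_2 - 2}{n_1 + n_2 - p - 3}
                    \left( \Delta^2 + \frac{(n_1 + n_2)\, p}{n_1 n_2} \right)
```

Even with \Delta^2 = 0 the right-hand side is positive, and it grows with p.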

One can invert the equation to get the bias-corrected estimate that you quote. However, this can sometimes yield negative estimates. The formula gives an estimate that on average equals the true squared distance; thus some estimates have to be too small (perhaps even negative) and others too large. The formula describes only their average.
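
A minimal simulation sketch of this point, assuming multivariate normal data with identity covariance and a pooled within-group covariance estimate. The function names, sample sizes, and true squared distance are illustrative choices, not anything from the thread:

```python
import numpy as np


def mahalanobis_d2(x1, x2):
    """Sample squared Mahalanobis distance between two group means,
    using the pooled within-group covariance matrix."""
    n1, n2 = len(x1), len(x2)
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    pooled = ((n1 - 1) * np.cov(x1, rowvar=False)
              + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    return float(diff @ np.linalg.solve(pooled, diff))


def corrected_d2(d2, n1, n2, p):
    """Bias-corrected squared distance, as in the quoted formula."""
    return (n1 + n2 - p - 3) / (n1 + n2 - 2) * d2 - (n1 + n2) * p / (n1 * n2)


def simulate(true_d2=0.5, n1=10, n2=10, p=5, reps=5000, seed=0):
    """Draw many pairs of samples whose true squared distance is true_d2
    and return the raw and corrected estimates."""
    rng = np.random.default_rng(seed)
    shift = np.zeros(p)
    shift[0] = np.sqrt(true_d2)  # identity covariance, so D^2 = |shift|^2
    raw = np.empty(reps)
    corr = np.empty(reps)
    for i in range(reps):
        x1 = rng.standard_normal((n1, p))
        x2 = rng.standard_normal((n2, p)) + shift
        raw[i] = mahalanobis_d2(x1, x2)
        corr[i] = corrected_d2(raw[i], n1, n2, p)
    return raw, corr


if __name__ == "__main__":
    raw, corr = simulate()
    print("mean raw D^2:           ", raw.mean())
    print("mean corrected D^2:     ", corr.mean())
    print("share of negative values:", (corr < 0).mean())
```

Under these assumptions the mean of the raw estimates should sit well above the true value, the mean of the corrected estimates should be close to it, and a noticeable share of the individual corrected estimates should be negative.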

------------------------
F. James Rohlf, Distinguished Professor
Ecology & Evolution, Stony Brook University
www: http://life.bio.sunysb.edu/ee/rohlf


-----Original Message-----
From: morphmet [mailto:[EMAIL PROTECTED]
Sent: Friday, February 29, 2008 2:13 PM
To: morphmet
Subject: [No Subject Provided: Corrected Mahalanobis D - the mod]

-------- Original Message --------
Date:   Fri, 29 Feb 2008 11:05:27 -0800 (PST)
From:   Elsa et Stéphane BOUEE <[EMAIL PROTECTED]>
To:     <morphmet@morphometrics.org>



Speaking about Mahalanobis distance (D) I have a question/remark.

Because of random fluctuation in a finite number of observations, D is not
zero and will increase with the number of variables.

Markus has proposed a formula that takes this into account (I did
not find the mathematical derivation of this formula):



Corrected(D) = [(n1+n2-p-3)*D/(n1+n2-2)] - [(n1+n2)*p/(n1*n2)]

With: D = Mahalanobis distance

       n1 and n2: number of observations in the 2 groups

       p: number of variables



I applied this formula to a dataset and obtained negative results (even
with a small number of variables (5)), which is embarrassing for a
distance...



Therefore, I used another method to account for this bias. I randomly
permuted the variables with the observations (I cannot use my hands to
explain here, but I hope everyone can understand) and calculated 10000
random D values this way. Then I subtracted the mean of those random D
values from the true D calculated on my dataset.
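
A minimal sketch of one possible reading of this procedure: it assumes that "permuting" means randomly reassigning observations to the two groups, and that D is the squared distance computed with a pooled covariance matrix. The function names are hypothetical, and the adjusted value is not guaranteed to be non-negative either:

```python
import numpy as np


def mahalanobis_d2(x1, x2):
    """Sample squared Mahalanobis distance between two group means,
    using the pooled within-group covariance matrix."""
    n1, n2 = len(x1), len(x2)
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    pooled = ((n1 - 1) * np.cov(x1, rowvar=False)
              + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    return float(diff @ np.linalg.solve(pooled, diff))


def permutation_adjusted_d2(x1, x2, n_perm=10000, seed=0):
    """Return (observed D^2, mean D^2 under random relabelling,
    observed minus that mean).

    Assumption: group labels are shuffled across observations while the
    group sizes are kept fixed."""
    rng = np.random.default_rng(seed)
    stacked = np.vstack([x1, x2])
    n1 = len(x1)
    observed = mahalanobis_d2(x1, x2)
    null = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.permutation(len(stacked))
        null[i] = mahalanobis_d2(stacked[idx[:n1]], stacked[idx[n1:]])
    null_mean = float(null.mean())
    return observed, null_mean, observed - null_mean


if __name__ == "__main__":
    # Synthetic data for illustration only.
    rng = np.random.default_rng(1)
    a = rng.standard_normal((15, 3))
    b = rng.standard_normal((15, 3)) + 3.0
    print(permutation_adjusted_d2(a, b, n_perm=1000))
```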



Am I correct in doing so?

Does anyone have an idea of a better (exact mathematical) way to correct
D without getting negative values?



Thank you for your answers



Stéphane BOUEE




--
Replies will be sent to the list.
For more information visit http://www.morphometrics.org



