RE: [No Subject Provided: Corrected Mahalanobis D - the mod]

morphmet Mon, 03 Mar 2008 12:34:15 -0800

-------- Original Message --------
Subject: RE: [No Subject Provided: Corrected Mahalanobis D - the mod]
Date: Sun, 2 Mar 2008 12:54:04 -0800 (PST)
From: F. James Rohlf <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Organization: Stony Brook University
To: [email protected]
References: <[EMAIL PROTECTED]>

That correction is based on the relationship given in equation (4) on page 219 of Anderson (1984) "An introduction to multivariate statistical analysis", 2nd ed. It shows that the expected (average) value of the sample generalized distance squared is not the population value, and thus there is a bias for small sample sizes.

One can invert the equation to get the estimate corrected for bias that you quote. However, this can sometimes yield negative estimates. What the formula does is give an estimate which on average will equal the true distance squared. Thus some estimates have to be too small (perhaps even negative) and others too large. The formula only gives the relationship for their average.
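To make the point about negative estimates concrete, here is a minimal sketch of the quoted correction in Python. It assumes that the D in the quoted formula is the squared sample distance (as "distance squared" above suggests) and that the last term is divided by the product n1*n2. The function name corrected_d2 and the sample values are illustrative only and do not come from the original messages.

```python
def corrected_d2(d2, n1, n2, p):
    """Bias-corrected squared Mahalanobis distance between two groups.

    d2     : sample squared distance (assumed to use the pooled covariance)
    n1, n2 : number of observations in each group
    p      : number of variables
    """
    if n1 + n2 - p - 3 <= 0:
        raise ValueError("the correction needs n1 + n2 - p - 3 > 0")
    shrink = (n1 + n2 - p - 3) / (n1 + n2 - 2)   # multiplicative part
    offset = p * (n1 + n2) / (n1 * n2)           # additive part
    return shrink * d2 - offset


# Illustrative values: two groups of 10 observations, 5 variables.
# Here shrink = 12/18 and offset = 1, so any sample d2 below 1.5
# gives a negative corrected value.
for d2 in (1.0, 1.5, 4.0):
    print(d2, corrected_d2(d2, 10, 10, 5))
```

With these values the corrected estimate changes sign at a sample value of 1.5, which is why small observed distances in small samples can come out negative even though the estimator is unbiased on average.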


------------------------
F. James Rohlf, Distinguished Professor
Ecology & Evolution, Stony Brook University
www: http://life.bio.sunysb.edu/ee/rohlf

-----Original Message-----
From: morphmet [mailto:[EMAIL PROTECTED]
Sent: Friday, February 29, 2008 2:13 PM
To: morphmet
Subject: [No Subject Provided: Corrected Mahalanobis D - the mod]

-------- Original Message --------
Date:   Fri, 29 Feb 2008 11:05:27 -0800 (PST)
From:   Elsa et Stéphane BOUEE <[EMAIL PROTECTED]>
To:     <[email protected]>

Speaking about Mahalanobis distance (D) I have a question/remark.

Due to random fluctuation in a finite number of observations, D is not
zero and will increase with the number of variables.

Markus has proposed a formula that takes this fact into account (I did
not find the mathematical derivation of this formula):

Corrected(D)=[(n1+n2-p-3)*D/(n1+n2-2)]-[(n1+n2)*p/(n1*n2)]

With: D: Mahalanobis distance

       n1 and n2: number of observations in the 2 groups

       p: number of variables

I applied this formula on a dataset and found negative results (even
with a small number of variables (5)), which is embarrassing for a
distance...

Therefore, I used another method to account for this bias. I randomly
permuted the variables among the observations (I cannot gesture with my
hands here, but I hope everyone can understand) and calculated 10000
random D values with this method. Then I subtracted the mean of those
random D values from the true D calculated on my dataset.

Am I correct in doing so?

Does anyone have an idea of a better (exact mathematical) way to correct
D without getting negative values?

Thank you for your answers

Stéphane BOUEE

--
Replies will be sent to the list.
For more information visit http://www.morphometrics.org




