-------- Original Message --------
Subject: RE: [No Subject Provided: Corrected Mahalanobis D - the mod]
Date: Sun, 2 Mar 2008 12:54:04 -0800 (PST)
From: F. James Rohlf <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Organization: Stony Brook University
To: morphmet@morphometrics.org
References: <[EMAIL PROTECTED]>

That correction is based on the relationship given in equation (4) on page 219 of Anderson (1984), "An Introduction to Multivariate Statistical Analysis", 2nd ed. It shows that the expected (average) value of the squared sample generalized distance is not the population value, and thus there is a bias for small sample sizes.

One can invert the equation to get the bias-corrected estimate that you quote. However, this can sometimes yield negative estimates. What the formula does is give an estimate that on average equals the true squared distance; thus some estimates have to be too small (perhaps even negative) and others too large. The formula only gives the relationship for their average.
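
To make the point about averages concrete, here is a minimal simulation sketch in Python with NumPy. It is not part of the original exchange. It treats the D in the quoted formula as the squared distance, and it assumes a hypothetical setup: two groups of 15 observations on 5 variables, both drawn from the same standard normal population, so the true squared distance is zero. The function names are illustrative only.

```python
import numpy as np


def mahalanobis_d2(x1, x2):
    """Squared sample Mahalanobis distance between two group means,
    computed with the pooled within-group covariance matrix."""
    n1, n2 = len(x1), len(x2)
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    pooled = ((n1 - 1) * np.cov(x1, rowvar=False)
              + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    pooled = np.atleast_2d(pooled)  # keeps the single-variable case working
    return float(diff @ np.linalg.solve(pooled, diff))


def corrected_d2(d2, n1, n2, p):
    """Bias-corrected squared distance, using the formula quoted in this thread."""
    return (n1 + n2 - p - 3) * d2 / (n1 + n2 - 2) - (n1 + n2) * p / (n1 * n2)


# Assumed setup: both samples come from the SAME population (true distance 0).
rng = np.random.default_rng(0)
n1, n2, p, n_rep = 15, 15, 5, 2000

raw = np.empty(n_rep)
for i in range(n_rep):
    x1 = rng.standard_normal((n1, p))
    x2 = rng.standard_normal((n2, p))
    raw[i] = mahalanobis_d2(x1, x2)

corrected = corrected_d2(raw, n1, n2, p)

print("mean raw D^2:      ", raw.mean())
print("mean corrected D^2:", corrected.mean())
print("share negative:    ", (corrected < 0).mean())
```

Under these assumptions the raw values should average well above zero, the corrected values should average close to zero, and a sizeable share of the individual corrected values should be negative, which is what an unbiased estimate of a zero distance has to look like.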

------------------------
F. James Rohlf, Distinguished Professor
Ecology & Evolution, Stony Brook University
www: http://life.bio.sunysb.edu/ee/rohlf

-----Original Message-----
From: morphmet [mailto:[EMAIL PROTECTED]]
Sent: Friday, February 29, 2008 2:13 PM
To: morphmet
Subject: [No Subject Provided: Corrected Mahalanobis D - the mod]

-------- Original Message --------
Date: Fri, 29 Feb 2008 11:05:27 -0800 (PST)
From: Elsa et Stéphane BOUEE <[EMAIL PROTECTED]>
To: <morphmet@morphometrics.org>

Speaking of the Mahalanobis distance (D), I have a question/remark. Due to random fluctuation in a finite number of observations, D is not null and will increase with the number of variables. Markus has proposed a formula that takes this into account (I did not find the mathematical derivation of this formula):

Corrected(D) = [(n1+n2-p-3)*D/(n1+n2-2)] - [(n1+n2)*p/(n1*n2)]

With:
D = Mahalanobis distance
n1 and n2 = number of observations in the 2 groups
p = number of variables

I applied this formula to a dataset and found negative results (even with a small number of variables (5)), which is embarrassing for a distance...

Therefore, I used another method to account for this bias. I randomly permuted the variables with the observations (I cannot use my hands to explain either, but I hope everyone can understand) and calculated 10000 random D using this method. Then I subtracted the mean of those random D from the true D calculated on my dataset.

Am I correct in doing so? Does anyone have an idea of a better (exact mathematical) way to correct D without getting negative values?

Thank you for your answers

Stéphane BOUEE

-- Replies will be sent to the list. For more information visit http://www.morphometrics.org
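
The permutation approach described in the quoted message could be sketched roughly as follows in Python with NumPy. The message does not say exactly what was permuted, so this sketch assumes that the observations were randomly reassigned to the two groups (keeping the group sizes) and that D is the squared distance computed with the pooled covariance. The function names and the example data are hypothetical.

```python
import numpy as np


def mahalanobis_d2(x1, x2):
    """Squared sample Mahalanobis distance between two group means,
    computed with the pooled within-group covariance matrix."""
    n1, n2 = len(x1), len(x2)
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    pooled = ((n1 - 1) * np.cov(x1, rowvar=False)
              + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    pooled = np.atleast_2d(pooled)
    return float(diff @ np.linalg.solve(pooled, diff))


def permutation_adjusted_d2(x1, x2, n_perm=10000, seed=0):
    """Return (observed D^2, mean permuted D^2, their difference).

    Assumption: 'permuting' means shuffling which observations belong
    to which group, with group sizes n1 and n2 held fixed.
    """
    rng = np.random.default_rng(seed)
    data = np.vstack([x1, x2])
    n1 = len(x1)
    observed = mahalanobis_d2(x1, x2)
    permuted = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.permutation(len(data))
        permuted[i] = mahalanobis_d2(data[idx[:n1]], data[idx[n1:]])
    baseline = permuted.mean()
    return observed, baseline, observed - baseline


# Hypothetical example data: 5 variables, second group shifted by 1 unit.
rng = np.random.default_rng(1)
group1 = rng.standard_normal((20, 5))
group2 = rng.standard_normal((20, 5)) + 1.0

observed, baseline, adjusted = permutation_adjusted_d2(group1, group2, n_perm=500)
print("observed D^2:", observed)
print("permutation mean:", baseline)
print("adjusted D^2:", adjusted)
```

Note that subtracting a permutation mean does not guarantee a non-negative result either: whenever the observed value falls below the average of the permuted values, the difference is negative.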