Re: 'Distance' between two normal distributions
Is there any optimality argument, or other reason, behind the choice of the two distances below? There are surely many other possibilities (e.g. Mallows' distance), which may be less appropriate, but at the moment I do not see the reasoning. Could you please comment or advise on this?

TIA
Robert Németh

Herman Rubin wrote:
>In article,
>Francis Dermot Sweeney <[EMAIL PROTECTED]> wrote:
>>If I have two normal distributions N(m1, s1) and N(m2, s2), what is a
>>good measure of the distance between them? I was thinking of something
>>like a K-S distance like max|phi1-phi2|. I know it probably depends on
>>what I want it for, or what exactly I mean by distance, but any ideas
>>would be helpful.
>
>If you are testing simple against simple for large samples,
>you want the Cramer-Chernoff distance (see Chernoff), which
>is essentially -ln(min_t \int f(x)^t g(x)^(1-t) dx), where f
>and g are the two densities and the minimum is over 0 < t < 1.
>If you are doing a sequential test with small cost of observation,
>the distance is given by the pair of Kullback-Leibler numbers.
>--
>This address is for information only. I do not claim that these views
>are those of the Statistics Department or of Purdue University.
>Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
>[EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558

=
Instructions for joining and leaving this list, remarks about the problem of INAPPROPRIATE MESSAGES, and archives are available at http://jse.stat.ncsu.edu/
=
Re: 'Distance' between two normal distributions
In article, Francis Dermot Sweeney <[EMAIL PROTECTED]> wrote:
>If I have two normal distributions N(m1, s1) and N(m2, s2), what is a
>good measure of the distance between them? I was thinking of something
>like a K-S distance like max|phi1-phi2|. I know it probably depends on
>what I want it for, or what exactly I mean by distance, but any ideas
>would be helpful.

If you are testing simple against simple for large samples, you want the Cramer-Chernoff distance (see Chernoff), which is essentially

    -ln(min_t \int f(x)^t g(x)^(1-t) dx),

where f and g are the two densities and the minimum is taken over 0 < t < 1. If you are doing a sequential test with small cost of observation, the distance is given by the pair of Kullback-Leibler numbers.

--
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
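
As a rough illustration of the two quantities named above, the following Python sketch evaluates them for two univariate normals. It assumes that s1 and s2 in N(m, s) are standard deviations (the thread does not say), uses the closed form of -ln \int f^t g^(1-t) dx for normal densities, and maximizes that over 0 < t < 1 by ternary search, which is the same as minimizing the integral. The function names are invented for this example.

```python
import math


def kl_normal(m1, s1, m2, s2):
    """Kullback-Leibler number KL(f || g) for f = N(m1, s1), g = N(m2, s2).

    s1 and s2 are treated as standard deviations (an assumption).
    """
    return math.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5


def neg_log_chernoff_integral(t, m1, s1, m2, s2):
    """-ln of the integral of f(x)^t g(x)^(1-t) dx, closed form for normals."""
    v1, v2 = s1**2, s2**2
    mix = t * v2 + (1 - t) * v1
    quad = t * (1 - t) * (m1 - m2) ** 2 / (2 * mix)
    log_term = 0.5 * (math.log(mix) - (1 - t) * math.log(v1) - t * math.log(v2))
    return quad + log_term


def chernoff_distance(m1, s1, m2, s2, iters=200):
    """Return (distance, t) maximizing -ln int f^t g^(1-t) dx over t in [0, 1].

    The objective is concave in t, so a ternary search is sufficient.
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        fa = neg_log_chernoff_integral(a, m1, s1, m2, s2)
        fb = neg_log_chernoff_integral(b, m1, s1, m2, s2)
        if fa < fb:
            lo = a  # maximum lies to the right of a
        else:
            hi = b  # maximum lies to the left of b
    t = (lo + hi) / 2
    return neg_log_chernoff_integral(t, m1, s1, m2, s2), t


if __name__ == "__main__":
    dist, t_best = chernoff_distance(0.0, 1.0, 1.0, 2.0)
    print("Chernoff distance:", dist, "at t =", t_best)
    # The "pair" of Kullback-Leibler numbers: one in each direction.
    print("KL(f||g):", kl_normal(0.0, 1.0, 1.0, 2.0))
    print("KL(g||f):", kl_normal(1.0, 2.0, 0.0, 1.0))
```

For N(0, 1) against N(1, 1) the search should land near t = 0.5 with a distance of about 0.125, i.e. (m1 - m2)^2 / (8 s^2), and both Kullback-Leibler numbers equal 0.5. With unequal standard deviations the two Kullback-Leibler numbers differ, which is why a pair is needed.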
Re: 'Distance' between two normal distributions
Francis Dermot Sweeney wrote:

> If I have two normal distributions N(m1, s1) and N(m2, s2), what is
> a good measure of the distance between them? I was thinking of
> something like a K-S distance like max|phi1-phi2|. I know it probably
> depends on what I want it for, or what exactly I mean by distance,
> but any ideas would be helpful.

Francis,

This question arises in receiver operating characteristic (ROC) analysis, where an effective ("latent") pair of univariate normal data distributions often may be assumed to underlie an ROC curve. Given two univariate normal probability densities with generally different means (m1 and m2) and standard deviations (s1 and s2), the common indices of separation are

    d'_e = (m1 - m2)/((s1 + s2)/2)

and

    d_a = (m1 - m2)/SQRT((s1**2 + s2**2)/2),

whereas a less well-known measure is Sakitt's

    D = (m1 - m2)/SQRT(s1 * s2).

In the special case where s1 = s2 = s, all three of these indices reduce to

    d' = (m1 - m2)/s.

All three indices also apply rigorously to *non-normal* decision-variable densities in ROC analysis if some (usually unknown) monotonic transformation of the decision variable yields normal densities. This generalization is possible because ROC curves are invariant under any monotonic transformation of the decision axis, so the requirement for strict interpretability of the indices becomes one of having an ROC curve that plots as a straight line on "normal deviate axes" (e.g., see Metz CE. ROC methodology in radiologic imaging. Investigative Radiology 1986; 21: 720).

In non-normal situations of this kind, the indices are *not* defined in terms of means and standard deviations, but instead in terms of the straight-line ROC curve on normal-deviate axes. If the "y intercept" and "slope" of such an ROC are given by "a" and "b", respectively, then

    d'_e = 2a/(1 + b)

and

    d_a = a*SQRT(2/(1 + b**2)),

whereas Sakitt's

    D = a/SQRT(b).
All of these indices reduce to

    d' = a

in the special case where b = 1.

When an ROC curve plots as a straight line on normal-deviate axes, its value of d_a equals SQRT(2) times the normal deviate that corresponds to the area under the ROC when that curve is plotted on *conventional* (i.e., probability, rather than normal-deviate) axes. The latter interpretation of d_a is sometimes used for other ROC curve forms as well, which isn't strictly "legal" but, from a practical standpoint, is rarely misleading.

If you'd like to do some additional reading, I would recommend that you begin with Simpson AJ, Fitter MJ. What is the best index of detectability? Psych Bull 1973; 80:481-488.

And finally, I feel obliged to emphasize the importance of a point that you raised yourself: The validity of any summary index *does* depend -- sometimes strongly -- upon what it's used for.

Hoping this helps,
Charles Metz
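
A small Python sketch of the three indices, written from the formulas in this message, may help in checking that the two sets of definitions agree. The mapping a = (m1 - m2)/s1 and b = s2/s1 (population 1 treated as the "signal" distribution) is an assumption of this sketch, as are the function names. The area function uses the standard binormal result that the area under the ROC is Phi(d_a / SQRT(2)), where Phi is the standard normal CDF.

```python
import math
from statistics import NormalDist


def indices_from_moments(m1, s1, m2, s2):
    """d'_e, d_a and Sakitt's D from means and standard deviations."""
    diff = m1 - m2
    return {
        "d_e": diff / ((s1 + s2) / 2),
        "d_a": diff / math.sqrt((s1**2 + s2**2) / 2),
        "D": diff / math.sqrt(s1 * s2),
    }


def indices_from_roc_line(a, b):
    """The same indices from the intercept a and slope b of an ROC that is
    a straight line on normal-deviate axes."""
    return {
        "d_e": 2 * a / (1 + b),
        "d_a": a * math.sqrt(2 / (1 + b**2)),
        "D": a / math.sqrt(b),
    }


def area_under_binormal_roc(d_a):
    """Area under the ROC on probability axes: Phi(d_a / sqrt(2))."""
    return NormalDist().cdf(d_a / math.sqrt(2))


if __name__ == "__main__":
    m1, s1, m2, s2 = 2.0, 1.0, 0.0, 2.0
    by_moments = indices_from_moments(m1, s1, m2, s2)
    # Assumed mapping to the ROC line: a = (m1 - m2)/s1, b = s2/s1.
    by_line = indices_from_roc_line((m1 - m2) / s1, s2 / s1)
    print(by_moments)
    print(by_line)
    print("area under ROC:", area_under_binormal_roc(by_moments["d_a"]))
```

With equal standard deviations (b = 1) all three values coincide with d', and d_a = 0 gives an area of 0.5, the chance diagonal.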
Re: 'Distance' between two normal distributions
seems, as you have said, that it depends on what you want to do with it

if there is considerable overlap, then whatever distance you use will have some of both distributions included ... if there is essentially no overlap ... then any pair of values ... one from each ... will reflect a real difference

of course, if there is a small difference in means but very large sds ... that is one thing, whereas ... if there were the same small difference in means but minuscule sds ... that would be another thing

the simple thing would be to use the mean difference but that really does not reflect whether there is any overlap between the two and that seems to be part of the issue

At 07:28 PM 2/6/02 +, Francis Dermot Sweeney wrote:
>If I have two normal distributions N(m1, s1) and N(m2, s2), what is a
>good measure of the distance between them? I was thinking of something
>like a K-S distance like max|phi1-phi2|. I know it probably depends on
>what I want it for, or what exactly I mean by distance, but any ideas
>would be helpful.
>
>Thanks,
>Francis.
>
>--
>
>Francis Sweeney
>Dept. of Aero/Astro
>Stanford U.

Dennis Roberts, 208 Cedar Bldg., University Park PA 16802
WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm
AC 8148632401
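
To make the overlap point concrete, here is a hedged Python sketch that computes both the K-S style distance max|F1 - F2| from the original question and the overlap of the two densities (the integral of min(f1, f2)). It assumes s1 and s2 are standard deviations, and it relies on the fact that the CDF difference can only peak where the two densities cross; the function names are invented for this example.

```python
import math


def norm_cdf(x, m, s):
    """CDF of N(m, s) with s a standard deviation (an assumption)."""
    return 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))


def density_crossings(m1, s1, m2, s2):
    """Sorted points where the two normal densities are equal (0, 1 or 2)."""
    if s1 == s2:
        return [] if m1 == m2 else [(m1 + m2) / 2.0]
    # Equating the log-densities gives A x^2 + B x + C = 0.
    A = 1.0 / s1**2 - 1.0 / s2**2
    B = -2.0 * (m1 / s1**2 - m2 / s2**2)
    C = m1**2 / s1**2 - m2**2 / s2**2 + 2.0 * math.log(s1 / s2)
    root = math.sqrt(max(B * B - 4.0 * A * C, 0.0))
    return sorted([(-B - root) / (2.0 * A), (-B + root) / (2.0 * A)])


def ks_distance(m1, s1, m2, s2):
    """max over x of |F1(x) - F2(x)|, evaluated at the density crossings."""
    xs = density_crossings(m1, s1, m2, s2)
    gaps = (abs(norm_cdf(x, m1, s1) - norm_cdf(x, m2, s2)) for x in xs)
    return max(gaps, default=0.0)


def overlap(m1, s1, m2, s2):
    """Integral of min(f1, f2): 1 for identical densities, near 0 if disjoint."""
    xs = density_crossings(m1, s1, m2, s2)
    if not xs:
        return 1.0
    # Between consecutive crossings one density lies below the other, so the
    # smaller of the two probabilities on each piece is the integral of the min.
    edges = [-math.inf] + xs + [math.inf]
    total = 0.0
    for left, right in zip(edges, edges[1:]):
        p1 = norm_cdf(right, m1, s1) - norm_cdf(left, m1, s1)
        p2 = norm_cdf(right, m2, s2) - norm_cdf(left, m2, s2)
        total += min(p1, p2)
    return total


if __name__ == "__main__":
    # Same small mean difference, very different spreads.
    for sd in (10.0, 0.01):
        print(sd, ks_distance(0.0, sd, 1.0, sd), overlap(0.0, sd, 1.0, sd))
```

The loop at the end mirrors the contrast drawn above: the mean difference is 1 in both cases, but with sd = 10 the overlap is close to 1 and the K-S distance close to 0, while with sd = 0.01 the overlap is essentially 0 and the K-S distance essentially 1.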