----- Original Message -----
From: "Frank E Harrell Jr" <[EMAIL PROTECTED]>
To: "John Sorkin" <[EMAIL PROTECTED]>
Cc: <r-help@r-project.org>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, October 13, 2008 2:09 PM
Subject: Re: [R] Fw: Logistic regresion - Interpreting (SENS) and (SPEC)
John Sorkin wrote:
Frank,
Perhaps I was not clear in my previous email message. Sensitivity and
specificity do tell us about the quality of a test: given two tests, the
one with higher sensitivity will be better at identifying subjects who
have a disease among those who have the disease, and the one with higher
specificity will be better at identifying subjects who do not have a
disease among those who do not have the disease. It is true that positive
and negative predictive values are of greater utility to a clinician, but
as you know these two measures are functions of sensitivity, specificity
and disease prevalence. All other things being equal, given two tests one
would select the one with greater sensitivity and specificity, so in a
sense they do measure the "quality" of a clinical test, but not, as I
tried to explain, the quality of a statistical model.
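The dependence of predictive values on prevalence can be made concrete with Bayes' theorem. This is a minimal sketch, assuming a single binary test; the function name `predictive_values` and the sensitivity 0.90, specificity 0.75 and prevalence values are hypothetical, chosen only for illustration.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' theorem."""
    # Joint probabilities of each cell of the 2 x 2 table in the tested population
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    true_neg = spec * (1.0 - prevalence)
    false_neg = (1.0 - sens) * prevalence
    ppv = true_pos / (true_pos + false_pos)  # P(disease | test positive)
    npv = true_neg / (true_neg + false_neg)  # P(no disease | test negative)
    return ppv, npv


# Hypothetical test with fixed sensitivity 0.90 and specificity 0.75,
# applied to populations with different disease prevalence
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.75, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Holding sensitivity and specificity fixed, PPV rises and NPV falls as prevalence increases, which is why the predictive values are population-specific.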
That is not very relevant, John. It is a function of all those things
because those quantities are all deficient.
I would select the test that can move the pre-test probability a great
deal in one or both directions.
Of course, this quantity is known as a likelihood ratio, and it is a
function of sensitivity and specificity. For 2 x 2 data one often speaks
of the positive likelihood ratio and the negative likelihood ratio, but
for a multi-row contingency table one can define likelihood ratios for a
series of cut-off points. This has become a popular approach in
evidence-based medicine when diagnostic tests have continuous rather than
binary outputs.
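A short sketch of how a likelihood ratio moves a pre-test probability: convert to odds, multiply by the LR, convert back. For the multi-row case it computes interval (stratum-specific) likelihood ratios, one common reading of "likelihood ratios for a series of cut-off points". All function names, the test characteristics and the counts below are hypothetical.

```python
def likelihood_ratios(sens, spec):
    """Positive and negative likelihood ratios of a binary test."""
    lr_pos = sens / (1.0 - spec)   # P(test+ | disease) / P(test+ | no disease)
    lr_neg = (1.0 - sens) / spec   # P(test- | disease) / P(test- | no disease)
    return lr_pos, lr_neg


def post_test_probability(pre_test_prob, lr):
    """Update a pre-test probability with a likelihood ratio on the odds scale."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


def interval_likelihood_ratios(diseased_counts, healthy_counts):
    """Stratum-specific LR for each row (result interval) of a k x 2 table.

    Row i gets P(result in row i | disease) / P(result in row i | no disease).
    Every row is assumed to contain at least one non-diseased subject,
    otherwise the ratio is undefined (division by zero).
    """
    n_diseased = sum(diseased_counts)
    n_healthy = sum(healthy_counts)
    return [(d / n_diseased) / (h / n_healthy)
            for d, h in zip(diseased_counts, healthy_counts)]


lr_pos, lr_neg = likelihood_ratios(0.90, 0.75)       # hypothetical test
print(lr_pos, lr_neg)
print(post_test_probability(0.20, lr_pos))           # after a positive result
print(post_test_probability(0.20, lr_neg))           # after a negative result
print(interval_likelihood_ratios([10, 30, 60], [70, 20, 10]))  # hypothetical counts
```

A test that "moves the pre-test probability a great deal" is one whose LRs lie far from 1 in either direction.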
You are of course correct that sensitivity and specificity are not truly
"inherent" characteristics of a test, as their values may change from
population to population, but practically speaking, they don't change all
that much, certainly not as much as positive and negative predictive
values.
They change quite a bit, and mathematically must change if the disease is
not all-or-nothing.
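The point about diseases that are not all-or-nothing can be illustrated with a small simulation. This is a sketch under a hypothetical model, not anyone's data: each subject has a continuous latent severity, "disease" is that severity dichotomized at 0, and the test measures severity with noise. The function name, the normal distributions, the cut-off of 0.5 and the seed are all assumptions.

```python
import random


def simulate_sens_spec(mean_severity, cutoff=0.5, n=100_000, seed=2008):
    """Estimate sensitivity and specificity in a simulated population.

    Hypothetical model: latent severity ~ Normal(mean_severity, 1);
    disease = severity > 0; the test reading is severity plus Normal(0, 1)
    measurement error and is called positive when it exceeds `cutoff`.
    """
    rng = random.Random(seed)
    tp = fn = tn = fp = 0
    for _ in range(n):
        severity = rng.gauss(mean_severity, 1.0)
        marker = severity + rng.gauss(0.0, 1.0)
        diseased = severity > 0.0
        positive = marker > cutoff
        if diseased and positive:
            tp += 1
        elif diseased:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Same test, same cut-off, two populations that differ only in severity mix
for mu in (-1.0, 1.0):
    sens, spec = simulate_sens_spec(mu)
    print(f"mean severity {mu:+.1f}: sensitivity={sens:.3f} specificity={spec:.3f}")
```

Because diseased subjects in the higher-severity population sit farther above the disease threshold, the same test shows higher sensitivity there, while its specificity drops because non-diseased subjects sit closer to the threshold.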
I guess we will disagree about the utility of sensitivity and specificity
as simplifying concepts.
Thank you as always for your clear thoughts and stimulating comments.
And thanks for yours, John.
Frank
John
John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
Frank E Harrell Jr <[EMAIL PROTECTED]> 10/13/2008 2:35 PM >>>
John Sorkin wrote:
Jumping into a thread can be like jumping into a den of lions, but here
goes . . .
Sensitivity and specificity are not designed to determine the quality of
a fit (i.e., whether your model is good); rather, they are
characteristics of a test. A test that has high sensitivity will properly
identify a large proportion of people with a disease (or a
characteristic) of interest. A test with high specificity will properly
identify a large proportion of people without the disease (or
characteristic) of interest. Sensitivity and specificity inform the end
user about the "quality" of a test. Other metrics have been designed to
determine the quality of the fit; none that I know of are completely
satisfactory. The pseudo R squared is one such measure.
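There are several pseudo R squared variants; the sketch below computes McFadden's, chosen here only as one common example, from 0/1 outcomes and fitted probabilities. The function names and the four data points are hypothetical.

```python
import math


def log_likelihood(outcomes, probs):
    """Bernoulli log-likelihood of 0/1 outcomes under predicted probabilities."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for y, p in zip(outcomes, probs))


def mcfadden_pseudo_r2(outcomes, probs):
    """McFadden's pseudo R^2: 1 - logL(model) / logL(intercept-only model).

    The intercept-only model predicts the overall event rate for everyone.
    Probabilities must lie strictly between 0 and 1, and both outcome
    values must occur (otherwise logL(null) is 0 and the ratio is undefined).
    """
    base_rate = sum(outcomes) / len(outcomes)
    ll_model = log_likelihood(outcomes, probs)
    ll_null = log_likelihood(outcomes, [base_rate] * len(outcomes))
    return 1.0 - ll_model / ll_null


# Hypothetical outcomes and fitted probabilities
y = [1, 0, 1, 0]
p = [0.9, 0.1, 0.8, 0.3]
print(round(mcfadden_pseudo_r2(y, p), 4))
```

A model that predicts only the base rate scores 0, and the value approaches 1 as fitted probabilities approach the observed outcomes.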
For a given diagnostic test (or classification scheme), different
cut-off points for identifying subjects who have the disease can be
examined with ROC curves, which show how each cut-off affects sensitivity
and 1-specificity.
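The cut-off sweep behind an ROC curve can be sketched in a few lines: for each distinct score, call subjects positive at or above it and record (1 - specificity, sensitivity). The function names, the ">= cut-off" convention and the eight scores below are hypothetical; the area under the curve is computed with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """(1 - specificity, sensitivity) pairs over all cut-offs.

    A subject is called positive when score >= cut-off; labels are 1 for
    diseased and 0 for non-diseased. Points run from (0, 0) to (1, 1).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores (e.g. predicted probabilities) and true disease status
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
for fpr, tpr in pts:
    print(f"1-spec={fpr:.2f}  sens={tpr:.2f}")
print("AUC:", auc_trapezoid(pts))
```

The AUC equals the proportion of diseased/non-diseased pairs in which the diseased subject has the higher score, so it summarizes all cut-offs at once.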
I await the flames that will surely come my way
John
John, this has been much debated, but I fail to see how backwards
probabilities are that helpful in judging the usefulness of a test. Why
not condition on what we know (the test result and other baseline
variables) and quit conditioning on what we are trying to find out
(disease status)? The data collected in most studies (other than
case-control studies) allow one to use logistic modeling with the correct
time order.
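To make the "backwards" versus "forward" distinction concrete, the sketch below computes both from the same hypothetical 2 x 2 counts. In the special case where the binary test result is the only predictor, a logistic regression is saturated and its fitted probabilities equal the forward row proportions shown here; conditioning on other baseline variables as well requires an actual model fit (for example `glm(..., family = binomial)` in R). The function name and counts are hypothetical.

```python
def backward_and_forward(tp, fn, fp, tn):
    """Contrast 'backward' and 'forward' probabilities from a 2 x 2 table.

    tp/fn: diseased subjects testing positive/negative;
    fp/tn: non-diseased subjects testing positive/negative.
    """
    return {
        # Backward: condition on disease status, the thing we want to find out
        "sensitivity": tp / (tp + fn),            # P(T+ | D)
        "specificity": tn / (tn + fp),            # P(T- | not D)
        # Forward: condition on the test result, the thing we know
        "p_disease_given_pos": tp / (tp + fp),    # P(D | T+)
        "p_disease_given_neg": fn / (fn + tn),    # P(D | T-)
    }


for name, value in backward_and_forward(tp=90, fn=10, fp=25, tn=75).items():
    print(f"{name}: {value:.3f}")
```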
Furthermore, sensitivity and specificity are not constants but vary with
subjects' characteristics. So they are not even useful as simplifying
concepts.
Frank
Frank E Harrell Jr <[EMAIL PROTECTED]> 10/13/2008 12:27 PM >>>
Maithili Shiva wrote:
Dear Mr Peter Dalgaard and Mr Dieter Menne,
I sincerely thank you for helping me out with my problem. The thing is
that I have already calculated SENS = Gg / (Gg + Bg) = 89.97%
and SPEC = Bb / (Bb + Gb) = 74.38%.
Now I have values of SENS and SPEC, which are absolute in nature. My
question was how do I interpret these absolute values. How do these
values help me find out whether my model is good?
With regards
Ms Maithili Shiva
I can't understand why you are interested in probabilities that are in
backwards time order.
Frank
________________________________________________________________________
Subject: [R] Logistic regresion - Interpreting (SENS) and (SPEC)
To: r-help@r-project.org Date: Friday, October 10, 2008, 5:54 AM
Hi,
I am working on a credit scoring model using logistic
regression. I have a main sample of 42500 clients and, based on
their status with regard to defaulted / non-defaulted, I
have generated the probability of default.
I have a hold-out sample of 5000 clients. I have calculated
(1) the number of correctly classified goods (Gg), (2) the number of
correctly classified bads (Bb), (3) the number of wrongly classified
bads (Gb) and (4) the number of wrongly classified goods (Bg).
My problem is how to interpret these results. What I have
arrived at are absolute figures.
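For reference, the SENS and SPEC formulas used in this thread can be computed directly from the four counts. This is a sketch under an assumed reading of the notation, inferred from the formulas SENS = Gg / (Gg + Bg) and SPEC = Bb / (Bb + Gb): first letter = predicted class, second letter = actual class. The function name and the counts below are hypothetical, not the poster's data.

```python
def credit_sens_spec(Gg, Bg, Bb, Gb):
    """SENS, SPEC and overall accuracy using the thread's notation.

    Assumed notation: first letter = predicted class, second = actual class,
    so Gg and Bb are correct classifications and Bg and Gb are errors.
    Capitalized argument names mirror the thread's symbols.
    """
    sens = Gg / (Gg + Bg)   # share of actual goods predicted good
    spec = Bb / (Bb + Gb)   # share of actual bads predicted bad
    accuracy = (Gg + Bb) / (Gg + Bg + Bb + Gb)
    return sens, spec, accuracy


# Hypothetical hold-out counts summing to 5000 (not the poster's data)
sens, spec, acc = credit_sens_spec(Gg=3600, Bg=400, Bb=700, Gb=300)
print(f"SENS={sens:.2%}  SPEC={spec:.2%}  accuracy={acc:.2%}")
```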
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.