Kjetil Halvorsen wrote:
Hello!
If the original questioner wants guidance on which variables to measure IN THE FUTURE, when using his model in practice, then I think he will be unhappy with any advice that forces him to measure each of the 44 variables when a small subset will probably do! What is wrong with first using, let us say, penalized likelihood, maybe with CV to choose the degree of smoothing, and SECOND using stepwise selection (maybe stepAIC from MASS) on the predicted values from the first-step model, to get a good few-variable approximation which can be used in practice? If my memory isn't too bad, that idea is from Harrell's book.
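A minimal sketch of that two-step idea in R, using invented data and stepAIC from the MASS package (variable names, dimensions, and the choice of an ordinary full fit as the stand-in first step are all assumptions, not details from the thread):

```r
# Sketch of the two-step idea: fit a full model, then approximate its
# predictions with a small submodel chosen by stepwise AIC (MASS::stepAIC).
# Data and dimensions are invented for illustration.
library(MASS)

set.seed(1)
n <- 200
X <- as.data.frame(matrix(rnorm(n * 10), n, 10))
names(X) <- paste0("x", 1:10)
y <- rbinom(n, 1, plogis(X$x1 - X$x2))

# Stand-in for the penalized first-step model: here an ordinary full
# logistic fit; its linear predictor is what step two approximates.
full <- glm(y ~ ., data = cbind(y = y, X), family = binomial)
lp <- predict(full)  # linear predictor from step one

# Step two: few-variable approximation of the step-one predictions.
approx_fit <- stepAIC(lm(lp ~ ., data = X), trace = FALSE)
names(coef(approx_fit))  # the variables kept in the approximation
```

Whether the second, unpenalized stepwise step is sound is exactly the point debated in the reply below.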
No, I never recommended that. The problem is the extremely low probability that the method will find the "right" variables. The process is unstable, and the predicted values do not validate well. If you want parsimony, then a unified approach based on an L1 penalty (the lasso and its derivatives) is worth a look. These methods select variables and penalize the coefficients of the remaining variables. The coefficients will be different from what they would have been had the remaining variables been put into an unpenalized model; i.e., the method penalizes for the context of not knowing the right variables in advance.
The penalized likelihood step you proposed is a good one, but the unpenalized stepwise method in the second step runs into problems.
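A base-R sketch of the lasso idea for a linear model, via coordinate descent with soft-thresholding (the data, dimensions, and penalty value are invented; in practice one would use a dedicated package such as glmnet rather than this toy implementation):

```r
# Illustrative lasso by coordinate descent (linear model, base R only).
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)  # soft-threshold

lasso_cd <- function(X, y, lambda, iters = 100) {
  X <- scale(X); y <- y - mean(y)
  n <- nrow(X); b <- rep(0, ncol(X))
  for (it in seq_len(iters)) {
    for (j in seq_along(b)) {
      r <- y - X[, -j, drop = FALSE] %*% b[-j]       # partial residual
      b[j] <- soft(crossprod(X[, j], r) / n, lambda) /
              (crossprod(X[, j]) / n)
    }
  }
  b
}

set.seed(1)
n <- 100; p <- 8
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - X[, 2] + rnorm(n)
b <- lasso_cd(X, y, lambda = 0.3)
# Several coefficients come out exactly zero (selection), and the
# survivors are shrunken relative to an unpenalized refit -- the point
# above about penalizing for not knowing the right variables in advance.
```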
Frank
Kjetil
On Mon, Sep 29, 2008 at 9:50 PM, Frank E Harrell Jr <[EMAIL PROTECTED]> wrote:
Greg Snow wrote:
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Frank E Harrell Jr
Sent: Saturday, September 27, 2008 7:15 PM
To: Darin Brooks
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [R] FW: logistic regression
Darin Brooks wrote:
Glad you were amused.
I assume that "booking this as a fortune" means that this was an idiotic way to model the data?
Dieter was nominating this for the "fortunes" package in R. (Thanks Dieter)
MARS? Boosted regression trees? Are any of these a better choice for extracting significant predictors (from a list of about 44) for a measured dependent variable?
Or use a data reduction method (principal components, variable clustering, etc.) or redundancy analysis (to remove individual predictors before examining associations with Y), or fit the full model using penalized maximum likelihood estimation. Lasso and lasso-like methods are also worth pursuing.
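A base-R sketch of the data-reduction route listed first above: summarize the predictors with a few principal components chosen without looking at Y, then fit Y on the component scores (dimensions, data, and the number of components kept are invented for illustration):

```r
# Data reduction via principal components (unsupervised), then model Y
# on a few component scores instead of all raw predictors.
set.seed(1)
n <- 150; p <- 20                      # stand-in for the ~44 predictors
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(rowMeans(X[, 1:3])))

pc <- prcomp(X, scale. = TRUE)         # components computed from X only
scores <- pc$x[, 1:5]                  # keep 5 components, fixed a priori
fit <- glm(y ~ scores, family = binomial)
length(coef(fit))                      # intercept + 5 component terms
```

Because the components are chosen without reference to Y, this avoids the selection instability that stepwise methods suffer from.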
Frank (and any others who want to share an opinion):
What are your thoughts on model averaging as part of the above list?
Model averaging has good performance but no advantage over fitting a
single complex model using penalized maximum likelihood estimation.
Frank
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
801.408.8111
--
Frank E Harrell Jr, Professor and Chair, Department of Biostatistics, School of Medicine, Vanderbilt University
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Frank E Harrell Jr, Professor and Chair, Department of Biostatistics, School of Medicine, Vanderbilt University