Kjetil Halvorsen wrote:
Hello!

If the original questioner wants guidance as to which variables to measure
IN THE FUTURE, when using his model in practice, then I think he will be
unhappy with any advice that forces him to measure each of the 44 variables
when a small subset will probably do!  What is wrong with first using, let
us say, penalized likelihood, maybe with CV to choose the degree of smoothing,
and SECONDLY using stepwise (maybe stepAIC from MASS) with the predicted
values from the first-step model to get a good few-variable approximation
which can be used in practice?  If my memory isn't too bad, that idea is
from Harrell's book.
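
(A rough R sketch of that two-step idea, assuming a hypothetical data frame
dat holding the binary outcome y and the 44 candidate predictors; cv.glmnet
from the glmnet package stands in here for the penalized first step:)

library(glmnet)   # penalized logistic regression, CV to choose the penalty
library(MASS)     # stepAIC

X <- model.matrix(y ~ ., data = dat)[, -1]

## Step 1: quadratically penalized (ridge, alpha = 0) logistic fit;
## cross-validation chooses the amount of smoothing
cvfit <- cv.glmnet(X, dat$y, family = "binomial", alpha = 0)

## Step 2: approximate the penalized linear predictor with a small
## unpenalized model selected by stepwise AIC
dat2    <- subset(dat, select = -y)
dat2$lp <- as.vector(predict(cvfit, newx = X, s = "lambda.min"))
small   <- stepAIC(lm(lp ~ ., data = dat2), trace = FALSE)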

No, I never recommended that.  The problem is the extremely low probability that the method will find the "right" variables.  The process is unstable, and the predicted values do not validate well.  If you want parsimony, then a unified approach based on an L1 penalty (the lasso and its derivatives) is worth a look.  These methods select variables and penalize the coefficients of the variables that remain.  Those coefficients will be different from what you would get had the remaining variables been put into an unpenalized model, i.e., the method penalizes for the context of not knowing the right variables in advance.
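
(A minimal glmnet sketch of that, assuming a numeric predictor matrix X,
e.g. from model.matrix, and a binary outcome y; alpha = 1 requests the pure
L1, i.e. lasso, penalty:)

library(glmnet)

## Lasso-penalized logistic regression; cross-validation picks the
## penalty strength lambda
fit <- cv.glmnet(X, y, family = "binomial", alpha = 1)

## Coefficients shrunk exactly to zero drop those variables from the model;
## the coefficients that survive remain shrunk toward zero
coef(fit, s = "lambda.1se")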

The penalized likelihood step you proposed is a good one, but the unpenalized stepwise method in the second step runs into the problems described above.

Frank


Kjetil

On Mon, Sep 29, 2008 at 9:50 PM, Frank E Harrell Jr <[EMAIL PROTECTED]> wrote:

    Greg Snow wrote:

            -----Original Message-----
            From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf
            Of Frank E Harrell Jr
            Sent: Saturday, September 27, 2008 7:15 PM
            To: Darin Brooks
            Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
            Subject: Re: [R] FW: logistic regression

            Darin Brooks wrote:

                Glad you were amused.

                I assume that "booking this as a fortune" means that this
                was an idiotic way to model the data?

            Dieter was nominating this for the "fortunes" package in R.
            (Thanks Dieter)

                MARS?  Boosted Regression Trees?  Any of these a better
                choice to extract significant predictors (from a list of
                about 44) for a measured dependent variable?

            Or use a data reduction method (principal components, variable
            clustering, etc.) or redundancy analysis (to remove individual
            predictors before examining associations with Y), or fit the
            full model using penalized maximum likelihood estimation.  The
            lasso and lasso-like methods are also worth pursuing.
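
            (A rough sketch of the first two of those steps with the Hmisc
            package; here x1, x2, x3, x4 are hypothetical stand-ins for the
            44 candidate predictors in a data frame dat, and the outcome is
            deliberately left out:)

            library(Hmisc)

            ## Variable clustering: find groups of correlated predictors
            ## that could be summarized by a single score or representative
            vc <- varclus(~ x1 + x2 + x3 + x4, data = dat)
            plot(vc)

            ## Redundancy analysis: flag predictors that can be predicted
            ## (R^2 >= 0.9) from the remaining predictors, ignoring Y
            redun(~ x1 + x2 + x3 + x4, data = dat, r2 = 0.9)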


        Frank (and any others who want to share an opinion):

        What are your thoughts on model averaging as part of the above list?


    Model averaging has good performance but no advantage over fitting a
    single complex model using penalized maximum likelihood estimation.
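
    (A minimal sketch of that single penalized fit, using lrm and pentrace
    from the rms package, the successor to Design; the data frame dat, the
    binary outcome y, and the predictors x1, x2, x3, x4 are hypothetical
    stand-ins:)

    library(rms)

    ## Full additive logistic model; x=TRUE, y=TRUE keep what pentrace needs
    f <- lrm(y ~ x1 + x2 + x3 + x4, data = dat, x = TRUE, y = TRUE)

    ## Scan candidate penalties and pick the best by effective AIC
    p <- pentrace(f, seq(0.5, 20, by = 0.5))

    ## Refit with the chosen penalty
    f.pen <- update(f, penalty = p$penalty)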

    Frank




        --
        Gregory (Greg) L. Snow Ph.D.
        Statistical Data Center
        Intermountain Healthcare
        [EMAIL PROTECTED]
        801.408.8111









--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
