Kjetil Halvorsen wrote:
Hello!
If the original questioner wants guidance on which variables to measure IN THE FUTURE, when using his model in practice, then I think he will be unhappy with any advice that forces him to measure each of the 44 variables when a small subset will probably do! What is wrong with first using, let us say, penalized likelihood, maybe with CV to choose the degree of smoothing, and SECOND using stepwise selection (maybe stepAIC from MASS) on the predicted values from the first-step model, to get a good few-variable approximation which can be used in practice? If my memory isn't too bad, that idea is from Harrell's book.
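A minimal sketch of that two-step idea in R, using invented data and stepAIC from the MASS package (variable names, dimensions, and the choice of an ordinary full fit as the stand-in first step are all assumptions, not details from the thread):

```r
# Sketch of the two-step idea: fit a full model, then approximate its
# predictions with a small submodel chosen by stepwise AIC (MASS::stepAIC).
# Data and dimensions are invented for illustration.
library(MASS)

set.seed(1)
n <- 200
X <- as.data.frame(matrix(rnorm(n * 10), n, 10))
names(X) <- paste0("x", 1:10)
y <- rbinom(n, 1, plogis(X$x1 - X$x2))

# Stand-in for the penalized first-step model: here an ordinary full
# logistic fit; its linear predictor is what step two approximates.
full <- glm(y ~ ., data = cbind(y = y, X), family = binomial)
lp <- predict(full)  # linear predictor from step one

# Step two: few-variable approximation of the step-one predictions.
approx_fit <- stepAIC(lm(lp ~ ., data = X), trace = FALSE)
names(coef(approx_fit))  # the variables kept in the approximation
```

Whether the second, unpenalized stepwise step is sound is exactly the point debated in the reply below.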
No, I never recommended that. The problem is the extremely low probability that the method will find the "right" variables. The process is unstable, and the predicted values do not validate well. If you want parsimony, then a unified approach based on an L1 penalty (the lasso and its derivatives) is worth a look. These methods select variables and penalize the coefficients of the remaining variables. The coefficients will be different from what they would have been had the remaining variables been put into an unpenalized model; i.e., the method penalizes for the context of not knowing the right variables in advance.
The penalized likelihood step you proposed is a good one, but the unpenalized stepwise method in the second step runs into problems.
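A base-R sketch of the lasso idea for a linear model, via coordinate descent with soft-thresholding (the data, dimensions, and penalty value are invented; in practice one would use a dedicated package such as glmnet rather than this toy implementation):

```r
# Illustrative lasso by coordinate descent (linear model, base R only).
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)  # soft-threshold

lasso_cd <- function(X, y, lambda, iters = 100) {
  X <- scale(X); y <- y - mean(y)
  n <- nrow(X); b <- rep(0, ncol(X))
  for (it in seq_len(iters)) {
    for (j in seq_along(b)) {
      r <- y - X[, -j, drop = FALSE] %*% b[-j]       # partial residual
      b[j] <- soft(crossprod(X[, j], r) / n, lambda) /
              (crossprod(X[, j]) / n)
    }
  }
  b
}

set.seed(1)
n <- 100; p <- 8
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - X[, 2] + rnorm(n)
b <- lasso_cd(X, y, lambda = 0.3)
# Several coefficients come out exactly zero (selection), and the
# survivors are shrunken relative to an unpenalized refit -- the point
# above about penalizing for not knowing the right variables in advance.
```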
Frank
Kjetil
On Mon, Sep 29, 2008 at 9:50 PM, Frank E Harrell Jr <[EMAIL PROTECTED]> wrote:
Greg Snow wrote:
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Frank E Harrell Jr
Sent: Saturday, September 27, 2008 7:15 PM
To: Darin Brooks
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [R] FW: logistic regression
Darin Brooks wrote:
Glad you were amused.
I assume that "booking this as a fortune" means that this was an idiotic way to model the data?
Dieter was nominating this for the "fortunes" package in R. (Thanks Dieter)
MARS? Boosted regression trees? Are any of these a better choice for extracting significant predictors (from a list of about 44) for a measured dependent variable?
Or use a data reduction method (principal components, variable clustering, etc.) or redundancy analysis (to remove individual predictors before examining associations with Y), or fit the full model using penalized maximum likelihood estimation. Lasso and lasso-like methods are also worth pursuing.
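A base-R sketch of the data-reduction route listed first above: summarize the predictors with a few principal components chosen without looking at Y, then fit Y on the component scores (dimensions, data, and the number of components kept are invented for illustration):

```r
# Data reduction via principal components (unsupervised), then model Y
# on a few component scores instead of all raw predictors.
set.seed(1)
n <- 150; p <- 20                      # stand-in for the ~44 predictors
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(rowMeans(X[, 1:3])))

pc <- prcomp(X, scale. = TRUE)         # components computed from X only
scores <- pc$x[, 1:5]                  # keep 5 components, fixed a priori
fit <- glm(y ~ scores, family = binomial)
length(coef(fit))                      # intercept + 5 component terms
```

Because the components are chosen without reference to Y, this avoids the selection instability that stepwise methods suffer from.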
Frank (and any others who want to share an opinion):
What are your thoughts on model averaging as part of the above list?
Model averaging has good performance but no advantage over fitting a
single complex model using penalized maximum likelihood estimation.
Frank
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
801.408.8111
--
Frank E Harrell Jr, Professor and Chair, Department of Biostatistics, School of Medicine, Vanderbilt University
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Frank E Harrell Jr, Professor and Chair, Department of Biostatistics, School of Medicine, Vanderbilt University