Re: [R] Regression w/ interactions

Frank E Harrell Jr Thu, 15 Apr 2010 09:41:43 -0700

Michael Dykes wrote:

So, am I wrong to /assume/ that the reason my professor is asking us to find a high R^2 & adjusted R^2, low Cp (near what, p+1, if I remember correctly), low PRESS, & AIC is b/c the data is randomly generated (b/c he has stated that all of the data for *all* of these hw assignments are randomly generated)? And I am not /exactly/ sure what you are referring to when you say 'low signal to noise ratios'. Do you mean /low/ R^2 to epsilon_i's? Or /low/ predictors to epsilon_i's? Please excuse my ignorance in these matters, but I am asking these questions not only for hw's sake but for my future, as I hope to study for the actuarial exams & take the Probability Test sometime either next Spring or Summer [after taking this professor's Calculus-based Prob & Stat sequence in the coming Fall & Spring Semesters].
Thanks again for your help, Professor.

Let me just add that a valid test of whether any of the variables or interactions is associated with Y is to formulate a model with all the parameters in it and to use the global F test.
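
A minimal sketch of such a global test in R, under stated assumptions: the data frame `d`, its simulated coefficients, and the object names `full`, `null` and `global` are hypothetical stand-ins (this is not the homework data). It fits all four main effects plus all six two-way interactions and compares that model with an intercept-only model in a single F test.

```r
# Simulated stand-in data; variable names and coefficients are hypothetical.
set.seed(1)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 1 + 0.5 * d$x1 + 0.3 * d$x2 * d$x3 + rnorm(n)

# Full model: all main effects and all two-way interactions (10 slopes).
full <- lm(y ~ (x1 + x2 + x3 + x4)^2, data = d)

# Intercept-only model to compare against.
null <- lm(y ~ 1, data = d)

# Global F test: are any of the 10 slope parameters associated with y?
global <- anova(null, full)
print(global)
```

The same F statistic is reported on the last line of `summary(full)`; the explicit `anova()` comparison just makes the two models being compared visible.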

Stepwise techniques such as you are being asked to use are not scientific. If the true R^2 (which you do not know) is not high, the low signal:noise ratio makes the data incapable of telling you the "right" variables to include with any reliability. Unfortunately, most teachers of statistics do not understand this point, so you might be graded off for providing the right answer.
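
A small simulation can illustrate the point. This is only a sketch under assumptions: the helper `recover_rate`, the six candidate predictors, the coefficients of 0.5, and the two noise levels are all hypothetical choices, and backward selection by AIC via `step()` stands in for whatever stepwise procedure is actually being used. It counts how often the selected model contains exactly the two truly active predictors.

```r
set.seed(3)

# Fraction of simulated data sets in which backward AIC selection via step()
# ends with exactly the truly active predictors x1 and x2.
# sigma is the residual SD: a larger sigma means a lower signal:noise ratio.
recover_rate <- function(sigma, n = 50, reps = 100) {
  hits <- 0
  for (r in seq_len(reps)) {
    X <- as.data.frame(matrix(rnorm(n * 6), n, 6))
    names(X) <- paste0("x", 1:6)
    # Only x1 and x2 affect y; x3..x6 are pure noise candidates.
    X$y <- 0.5 * X$x1 + 0.5 * X$x2 + rnorm(n, sd = sigma)
    fit <- step(lm(y ~ ., data = X), trace = 0)
    chosen <- sort(attr(terms(fit), "term.labels"))
    if (identical(chosen, c("x1", "x2"))) hits <- hits + 1
  }
  hits / reps
}

rate_high_signal <- recover_rate(sigma = 0.2)  # high true R^2
rate_low_signal  <- recover_rate(sigma = 3)    # low true R^2
print(c(high_signal = rate_high_signal, low_signal = rate_low_signal))
```

Comparing the two rates shows how much of the "right variables" answer depends on the noise level rather than on the selection procedure.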


Frank

On Thu, Apr 15, 2010 at 8:26 AM, Frank E Harrell Jr <f.harr...@vanderbilt.edu> wrote:


    Michael Dykes wrote:

        I have a project due in my Linear Regression class re: regression
        on a data set & my professor gave us a hint that there were
        *exactly* 2 sig interactions. The data set is attached. We have to
        find which predictors are significant, & which 2 interactions are
        sig. Also, I need some guidance for this & selecting the best
        model. I tried the `full' model, that being:
        z=lm(y~x1+x2+x3+x4+x1*x2+x2*x3...+x3*x4). I then ran an anova(z), &
        summary(z). My R^2 & R^2_a were *really* low. I am not sure how to
        do PRESS, AIC & Cp in R yet though. Any help would be appreciated.
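
On the purely technical question of computing PRESS, AIC and Cp, one possible base-R approach is sketched below. The data frame `d`, its simulated coefficients, and the helper names `press` and `mallows_cp` are hypothetical; `AIC()`, `hatvalues()`, `residuals()`, `nobs()` and `df.residual()` are standard functions in R's stats package. Cp is computed here relative to the full model's mean squared error, with p counting all coefficients including the intercept.

```r
# Simulated stand-in data; variable names and coefficients are hypothetical.
set.seed(2)
n <- 60
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 2 + d$x1 - 0.5 * d$x2 + rnorm(n)

full <- lm(y ~ (x1 + x2 + x3 + x4)^2, data = d)  # all two-way interactions
sub  <- lm(y ~ x1 + x2, data = d)                # one candidate submodel

# PRESS: sum of squared leave-one-out prediction errors, e_i / (1 - h_ii).
press <- function(fit) sum((residuals(fit) / (1 - hatvalues(fit)))^2)

# Mallows' Cp = SSE_p / MSE_full - (n - 2p).
mallows_cp <- function(fit, full) {
  n <- nobs(fit)
  p <- length(coef(fit))
  sse <- sum(residuals(fit)^2)
  mse_full <- sum(residuals(full)^2) / df.residual(full)
  sse / mse_full - (n - 2 * p)
}

print(c(PRESS = press(sub), AIC = AIC(sub), Cp = mallows_cp(sub, full)))
```

By construction the full model's own Cp equals its number of coefficients, which is why a candidate model is usually judged by how close its Cp is to its own p.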



    Michael, this is not really the place for help on homework other than
    perhaps on technical roadblocks.  Note that the strategy you are
    being told to follow is one whose statistical properties have been
    severely criticized in the statistical literature.  Only with a very
    high signal to noise ratio (e.g., high true R^2) can torturing data
    lead to a confession to something other than what the analyst wants
    to hear.  I suppose that in simulated data there is a "true" model
    out there waiting to be found, but beware of using this approach
    with real data with low signal to noise ratios.

    Frank

--
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                     Department of Biostatistics   Vanderbilt University


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
