Inline below:

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Steve Lianoglou
Sent: Friday, April 02, 2010 2:34 PM
To: Jay
Cc: r-help@r-project.org
Subject: Re: [R] Cross-validation for parameter selection (glm/logit)

Hi,

On Fri, Apr 2, 2010 at 9:14 AM, Jay <josip.2...@gmail.com> wrote:
> If my aim is to select a good subset of parameters for my final logit
> model built using glm().

-- Define "good"

> What is the best way to cross-validate the

-- Define "best"

> results so that they are reliable?

-- Define "reliable"

Answers depend on what you mean by these terms. I suggest you consult a
statistician to work with you. These are huge issues for which you would
profit by some guidance.

Cheers,
Bert

>
> Let's say that I have a large dataset of 1000's of observations. I
> split this data into two groups, one that I use for training and
> another for validation. First I use the training set to build a model,
> and then the stepAIC() with a Forward-Backward search. BUT, if I base
> my parameter selection purely on this result, I suppose it will be
> somewhat skewed due to the 1-time data split (I use only 1 training
> dataset)

Another approach would be to use penalized regression models.

The glmnet package has lasso and elastic net models for both logistic
and "normal" regression models.

Intuitively: in addition to minimizing the loss (say, the squared error,
or the negative log-likelihood for a logit model), the model has to pay
a cost, scaled by lambda, for the size of its coefficients. With the
lasso penalty this drives some coefficients to exactly zero, which in
turn provides sparse models. You can use CV to fine-tune the value for
lambda.

If you're not familiar with these penalized models, the glmnet package
has a few references to get you started.

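As a rough sketch of what tuning lambda by CV could look like with glmnet (the simulated data, the variable names, the choice of alpha = 1 for a pure lasso, 10 folds, and the use of lambda.1se are all assumptions made for illustration, not something from the original question):

```r
library(glmnet)

set.seed(1)

# Simulated stand-in for the real data: 1000 observations, 20 candidate
# predictors, of which only x1, x2 and x3 truly affect the outcome.
n <- 1000
p <- 20
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
eta <- 1.5 * x[, 1] - 1.0 * x[, 2] + 0.5 * x[, 3]
y <- rbinom(n, 1, plogis(eta))

# 10-fold cross-validation over the lambda path of a lasso-penalized
# logistic regression (alpha = 1 is the lasso; 0 < alpha < 1 is elastic net).
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1,
                   nfolds = 10, type.measure = "deviance")

# lambda.min minimizes the CV deviance; lambda.1se is the largest lambda
# within one standard error of that minimum (a sparser model).
cvfit$lambda.min
cvfit$lambda.1se

# Predictors with non-zero coefficients at lambda.1se.
b <- as.matrix(coef(cvfit, s = "lambda.1se"))
selected <- setdiff(rownames(b)[b[, 1] != 0], "(Intercept)")
print(selected)
```

The fold assignment is random, so the selected set can change from run to run. Repeating the call with different seeds and looking at which predictors are picked consistently is one way to see how stable the selection is, which bears on the worry about relying on a single data split.
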
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.