On Sat, 21 Feb 2009, Tal Galili wrote:

Hello dear R mailing list members.

I have recently become curious about the possibility of applying model
selection algorithms (even one as simple as AIC) to regressions on large
datasets.


Large in the sense of many observations, one assumes.

But how large in terms of the number of variables?

If there are not too many variables, then you can form the regression sums of squares for all 2^p combinations of regressors from a single biglm() fit of all variables, since biglm provides coef() and vcov() methods.
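
A minimal sketch of that idea in R follows. It uses lm() on simulated data as a stand-in for a biglm() fit (an assumption, so that the example runs with base R alone); the only quantities taken from the fit are coef(), vcov(), and the full-model residual sum of squares, which a biglm fit would also have to supply. The helper name rss_without() and the variables x1..x4 are hypothetical. The identity used is that dropping a set D of regressors raises the residual sum of squares by sigma^2 * b_D' V_DD^{-1} b_D, where b and V are the full-model coefficients and their covariance matrix.

```r
# Simulated data; x1..x4 are made-up regressors for illustration only.
set.seed(1)
n <- 500
X <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("x", 1:4)))
dat <- data.frame(y = 1 + 2 * X[, 1] - X[, 3] + rnorm(n), X)

# Full fit. lm() stands in for biglm() here (assumption).
full     <- lm(y ~ x1 + x2 + x3 + x4, data = dat)
b        <- coef(full)
V        <- vcov(full)                    # sigma2 * (X'X)^{-1}
rss_full <- deviance(full)                # residual sum of squares
sigma2   <- rss_full / df.residual(full)

vars <- names(b)[-1]                      # regressors, intercept excluded
p    <- length(vars)

# RSS of the submodel that omits the regressors named in `out`:
#   RSS_sub = RSS_full + sigma2 * b_D' V_DD^{-1} b_D
rss_without <- function(out) {
  if (length(out) == 0) return(rss_full)
  bD  <- b[out]
  VDD <- V[out, out, drop = FALSE]
  rss_full + sigma2 * as.numeric(crossprod(bD, solve(VDD, bD)))
}

# Enumerate all 2^p subsets; TRUE means the regressor is kept.
keep <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), p)))
colnames(keep) <- vars
rss    <- apply(keep, 1, function(k) rss_without(vars[!k]))
n_coef <- rowSums(keep) + 1               # + 1 for the intercept
aic    <- n * log(rss / n) + 2 * n_coef   # AIC up to an additive constant

res <- data.frame(keep, rss = rss, aic = aic)
head(res[order(res$aic), ], 3)            # best few models by AIC
```

No submodel is refitted: each of the 2^p rows costs one small linear solve, so the pass over the large dataset happens only once, in the full fit.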

If the number of variables is large, then you will most likely need to subsample the observations, use lm() and friends on the subsample to reduce the number of variables to 'not too many', and then apply the above strategy.
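
One way the subsampling step might look, as a rough sketch under stated assumptions: the data frame `big`, its size, the subsample size of 2000, and the choice of forward selection with step() are all illustrative and not prescribed above.

```r
# Simulated stand-in for a large dataset with many candidate regressors.
set.seed(2)
n <- 20000
p <- 30
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
big <- data.frame(y = X[, 1] - 0.5 * X[, 2] + rnorm(n), X)

# Draw a random subsample small enough for lm() and step() to handle.
sub <- big[sample(nrow(big), 2000), ]

# Forward selection by AIC on the subsample, starting from the
# intercept-only model; the scope formula lists every candidate.
null_fit <- lm(y ~ 1, data = sub)
scope    <- reformulate(setdiff(names(sub), "y"))
sel      <- step(null_fit, scope = scope, direction = "forward", trace = 0)

# Regressors that survived screening; these would go into the full fit
# on all observations.
kept <- attr(terms(sel), "term.labels")
kept
```

The screened set in `kept` is then small enough that a single fit on all observations, followed by the all-subsets calculation, becomes feasible.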

I searched as best I could, but couldn't find any
reference or wrapper for using step or stepAIC with packages such as
biglm.


Surely any direct implementation of step() would be hopelessly long in execution time.


HTH,

Chuck



Any ideas or directions on how to implement such a concept?


Best,
Tal

--
----------------------------------------------


My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
www.talgalili.com
www.biostatistics.co.il

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
