Prof Brian Ripley wrote:
> I would not have expected glm to be more than say 5x slower than lm
> if CPU cycles and not memory were the limiting factor. In that case
> more RAM might be all you need.
The ratio between glm and lm might well be about 5x, but that is still a big difference for us. I am pretty sure that RAM is not the main problem: according to the Windows Task Manager the machine is at close to 100% CPU usage, and no swapping is going on.

Of course the L1/L2 caches may still be something one can work on, but I am not sure whether glm has enough repeated access to the same data for that to help. (I don't know how glm works internally, but I guess it makes many scans through the whole data set, and that the working memory it needs during these scans is essentially a function of the number of parameters, not of the number of observations; is that right? A rough timing sketch is in the P.S. below.)

Many thanks for your observations about subset selection, by the way; they are a great help. Would a good approach be, say, to use a stricter criterion such as BIC for choosing a model (see the step() sketch in the P.P.S.), and then use non-statistical methods to check the plausibility of the chosen parameters?

best wishes,
George Russell
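P.S. If glm does fit by iteratively reweighted least squares (as I believe it does), each iteration would cost roughly one weighted lm fit, which would square with a small constant-factor slowdown rather than a memory problem. Here is a rough timing sketch one could run to compare the two directly; the size and the synthetic data are made up and would need adjusting to the real problem:

n <- 1e6                      # hypothetical size; adjust to the real data
x <- rnorm(n)
y <- rbinom(n, 1, plogis(x))  # synthetic 0/1 response
system.time(fit.lm  <- lm(y ~ x))
system.time(fit.glm <- glm(y ~ x, family = binomial))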
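P.P.S. On the BIC point: if I understand correctly, step() accepts a penalty multiplier k, and setting k = log(n) turns its default AIC search into a BIC one. A self-contained sketch with made-up data (the names dat and x1..x3 are mine):

set.seed(1)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(dat$x1))   # only x1 really matters here
full <- glm(y ~ x1 + x2 + x3, family = binomial, data = dat)
## k = log(n) makes step() use the BIC penalty instead of AIC's k = 2
step(full, k = log(nrow(dat)))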