On Fri, 1 Sep 2006, [EMAIL PROTECTED] wrote:

> Prof Brian Ripley wrote:
> > I would not have expected glm to be more than say 5x slower than lm if
> > CPU cycles and not memory were the limiting factor. In that case more
> > RAM might be all you need.
>
> The ratio between glm and lm might well be about 5x, but that's still a
> big difference for us.

You said lm was 'very fast', so I did not expect 5x 'very fast' to be 'too
slow'.

> I am pretty sure that RAM is not the main problem; according to the
> Windows Task Manager the computer is at close to 100% CPU usage, and
> swapping is not going on. Of course L1/L2 caches may still be something
> one can work on, but I'm not sure whether glm has enough repeated access
> to the same data for that to help. (I don't know how glm works, but I
> guess it does a lot of scans through the whole data set, and that the
> amount of working memory it needs during these scans is basically a
> function of the number of parameters, not the number of observations, is
> that right?)

Not so. Because glm does weighted fits, it needs to access the whole data
matrix at each iteration (to re-weight).

> Many thanks for your observations about subset selection by the way, they
> are a lot of help. Would a good approach be, say, to use some stricter
> criteria like BIC for choosing a model, and then use non-statistical
> methods to improve the plausibility of the chosen parameters?

The latter entirely I would say. All statistics can say is that a variable
improves the fit measurably more than one that is unrelated to the
response: whether it improves it enough to be worthwhile in your
application is non-statistical. The point here is that all but the most
useless variables will measurably improve the fit in large problems with
few variables.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
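
To illustrate the re-weighting point made in the message, here is a minimal sketch of iteratively reweighted least squares (IRLS) for a logistic model with an intercept and one predictor. It is plain Python for illustration only and is not R's implementation: the function name `irls_logistic`, the toy data and the tolerance are all assumptions. It also accumulates 2x2 normal equations to stay short, whereas R's `glm.fit`, as far as I know, runs a QR decomposition on a weighted copy of the full model matrix. What the sketch does show is that the weights depend on the current coefficients, so every iteration has to go back through all n observations.

```python
import math


def irls_logistic(x, y, max_iter=25, tol=1e-10):
    """Fit logit P(y = 1) = b0 + b1 * x by IRLS (illustrative sketch only).

    Returns (b0, b1, passes), where passes counts full scans of the data.
    """
    b0, b1 = 0.0, 0.0
    passes = 0
    for _ in range(max_iter):
        # Sums that make up the weighted normal equations X'WX b = X'Wz.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):  # one full pass over all n observations
            eta = b0 + b1 * xi
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)      # weight changes whenever b0, b1 change
            z = eta + (yi - mu) / w  # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        passes += 1

        # Solve the 2x2 weighted least-squares system by Cramer's rule.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det

        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1, passes


# Hypothetical toy data; not perfectly separable, so the estimates are finite.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [0, 0, 1, 0, 1, 1, 0, 1]
b0, b1, passes = irls_logistic(x, y)
print("intercept:", b0, "slope:", b1, "full passes over the data:", passes)
```

An ordinary least-squares fit needs only one such solve, while the loop above repeats a weighted solve until the coefficients stop changing. That is at least consistent with glm costing a small multiple of lm when CPU is the limiting factor.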