I guess you meant X has 20,000 levels (and 40 observations each)? In that case lm() will attempt to create a design matrix that's 8e5 by 2e4, and that's unlikely to fit in the RAM.
It is very easy to compute by hand. I'm using a smaller data size first to check the result against summary(lm()), then the size you specified (hopefully with more correct arithmetics...) to show that it's quite doable even on a modest laptop: > set.seed(1) > y <- rnorm(80) > x <- factor(rep(paste("x", 1:20, sep=""), 4)) > summary(lm(y ~ x))$r.squared [1] 0.2555885 > rsq <- function(x, y) { + sse <- sum(ave(y, x, FUN=function(x) x - mean(x))^2) + sstotal <- var(y) * (length(y) - 1) + 1 - sse / sstotal + } > rsq(x, y) [1] 0.2555885 > set.seed(1) > y <- rnorm(8e5) > x <- factor(rep(paste("x", 1:2e4, sep=""), 40)) > system.time(rsq(x, y)) [1] 1.99 0.03 2.06 NA NA Andy From: ivo welch > > dear R wizards: > > X is factor with 20,000*20=800,000 observations of 20,000 factors. > I.e., each factor has 20 observations. y is 800,000 normally > distributed data points. I want to see how much R^2 the X > factors can provide. Easy, right? > > > lm ( y ~ X) > and > > aov( y ~ X) > Error: cannot allocate vector of size 3125000 Kb > > is this computationally infeasible? (I am not an expert, but > off-hand, I thought this can work as long as the X's are just > factors = fitted means.) > > > help.search("fixed.effects"); > > fixed.effects(nlme) Extract Fixed Effects > fixed.effects.lmList(nlme) Extract lmList Fixed Effects > lme(nlme) Linear Mixed-Effects Models > lmeStruct(nlme) Linear Mixed-Effects Structure > nlme(nlme) Nonlinear Mixed-Effects Models > nlmeStruct(nlme) Nonlinear Mixed-Effects Structure > > Type 'help(FOO, package = PKG)' to inspect entry 'FOO(PKG) TITLE'. > > ok---I want to read the fixed.effects help. could the help > system tell me how to inspect these entries? > > > help("fixed.effects") > wrong > > help.search("fixed.effects") > two entries, as above. nothing new. > > ?fixed.effects > wrong > > eventually, it dawned on me that nlme in parens was not a > function argument, but the name of the package within which > fixed.effects lives. Suggestion: maybe a different notation > to name packages would be more intuitive in the help system. > yes, I know it now, but other novices may not. even a colon > instead of a () may be more intuitive in this context. > > > library(nlme); ?lme > and then > > lme(y ~ X) > Error in getGroups.data.frame(dataMix, groups) : > Invalid formula for groups > > > now I have to beg for some help. ok, blatant free-riding. > the lme docs tells me it does the Laird and Ware model, but I > do not know this model. the only two examples given at the > end of the lme help file seem to be similar to what I just > specified. so, how do I execute a simple fixed-effects > model? (later, I may want to add a variable Z that is a > continuous random variable.) could someone please give me > one quick example? help is, as always, highly appreciated. > > sincerely, > > /ivo welch > > ______________________________________________ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > > ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html