Dear John, interesting. There must be a bottleneck somewhere, which possibly went unnoticed because econometricians seldom use so many data points. In fact 'plm' wasn't designed to handle "only" 700 Megs of data at a time; but we're happy to investigate in this direction too. E.g., I was aware of some efficiency problems if effect="twoways" but I seem to understand that you are using effect="individual"? --> which takes me to the main point.
I understand that enclosing the data for a reproducible report, as requested by the posting guide, is awkward for such a big dataset. Yet it would be of great help if you at least produced: - an output of your procedure, in order to see what goes wrong and where - the output of traceback() called immediately after you got the error (idem) and possibly gave it a try with lm() applied to the very same formula and data, maybe into a system.time( ... ) statement. Else, the information you provide is way too scant to even make an educated guess. For example, it isn't clear whether the problem is related to plm() or to vcovHC.plm etc. As far as "simple demeaning" is concerned, you might try the following code, which really does only that. Be aware that **standard errors are biased** etc. etc., this is not meant to be a proper function but just a computational test for your data and a quick demonstration of demeaning. 'plm()' is far more structured, for a number of reasons. Please execute it inside system.time() again. ######### test function for within model, BIASED SEs !! ############# ## ## ## example: ## data(Produc, package="plm") ## mod <- FEmod(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, index=Produc$state, data=Produc) ## summary(mod) ## ## compare with: ## library(plm) ## example(plm) demean<-function(x,index,lambda=1,na.rm=F) { as.vector(x-lambda*tapply(x,index,mean,na.rm=na.rm)[unclass(as.factor(in dex))]) } FEmod<-function(formula,index,data=ls()) { ## fit a model without intercept in any case formula<-as.formula(paste(deparse(formula(formula)),"-1",sep="")) X<-model.matrix(formula,data=data) y<-model.response(model.frame(formula,data=data)) ## reduce index accordingly names(index)<-row.names(data) ind<-index[which(names(index)%in%row.names(X))] ## within transf. MX<-matrix(NA,ncol=dim(X)[[2]],nrow=dim(X)[[1]]) for(i in 1:dim(X)[[2]]) { MX[,i]<-demean(X[,i],index=ind,lambda=1) } My<-demean(y,index=ind,lambda=1) ## estimate within model femod<-lm(My~MX-1) return(femod) } ####### end test function ######## Best, Giovanni ########### original message ############# ------------------------------ Message: 28 Date: Tue, 07 Feb 2012 15:35:07 +0100 From: caribou...@gmx.fr To: r-help@r-project.org Subject: [R] fixed effects with clustered standard errors Message-ID: <20120207143507.142...@gmx.com> Content-Type: text/plain; charset="utf-8" Dear R-helpers, I have a very simple question and I really hope that someone could help me I would like to estimate a simple fixed effect regression model with clustered standard errors by individuals. For those using Stata, the counterpart would be xtreg with the "fe" option, or areg with the "absorb" option and in both case the clustering is achieved with "vce(cluster id)" My question is : how could I do that with R ? An important point is that I have too many individuals, therefore I cannot include dummies and should use the demeaning "usual" procedure. I tried with the plm package with the "within" option, but R quikcly tells me that the memory limits are attained (I have over 10go ram!) while the dataset is only 700mo (about 50 000 individuals, highly unbalanced) I dont understand... plm do indeed demean the data so the computation should be fast and light enough... ?! Are there any other solutions ? Many thanks in advance ! ;) John ############ end original message ############ Giovanni Millo, PhD Research Dept., Assicurazioni Generali SpA Via Machiavelli 4, 34132 Trieste (Italy) tel. +39 040 671184 fax +39 040 671160 Ai sensi del D.Lgs. 196/2003 si precisa che le informazi...{{dropped:12}} ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.