Dimitri,

You might try applying ave() to each column. E.g., use

   f2 <- function(frame) {
      for(i in 2:ncol(frame)) {
         frame[,i] <- ave(frame[,i], frame[,1],
                          FUN=function(x)x/mean(x,na.rm=TRUE))
      }
      frame
   }

Note that this returns a data.frame and retains the grouping
column (the first) while your original code returns a matrix
without the grouping column.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter
> Sent: Tuesday, March 30, 2010 10:52 AM
> To: 'Dimitri Liakhovitski'; 'r-help'
> Subject: Re: [R] Code is too slow: mean-centering variables
> in a data frame by subgroup
>
> ?scale
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
>
>
> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On
> Behalf Of Dimitri Liakhovitski
> Sent: Tuesday, March 30, 2010 8:05 AM
> To: r-help
> Subject: [R] Code is too slow: mean-centering variables in a
> data frame by subgroup
>
> Dear R-ers,
>
> I have a large data frame (several thousands of rows and about 2.5
> thousand columns). One variable ("group") is a grouping variable with
> over 30 levels. And I have a lot of NAs.
> For each variable, I need to divide each value by variable mean - by
> subgroup. I have the code but it's way too slow - takes me about 1.5
> hours.
> Below is a data example and my code that is too slow. Is there a
> different, faster way of doing the same thing?
> Thanks a lot for your advice!
>
> Dimitri
>
>
> # Building an example frame - with groups and a lot of NAs:
> set.seed(1234)
> frame<-data.frame(group=rep(paste("group",1:10),10),
>    a=rnorm(1:100),b=rnorm(1:100),c=rnorm(1:100),d=rnorm(1:100),
>    e=rnorm(1:100),f=rnorm(1:100),g=rnorm(1:100))
> frame<-frame[order(frame$group),]
> names.used<-names(frame)[2:length(frame)]
> set.seed(1234)
> for(i in names.used){
>    i.for.NA<-sample(1:100,60)
>    frame[[i]][i.for.NA]<-NA
> }
> frame
>
> ### Code that does what's needed but is too slow:
> Start<-Sys.time()
> frame <- do.call(cbind, lapply(names.used, function(x){
>    unlist(by(frame, frame$group, function(y) y[,x] /
>       mean(y[,x],na.rm=T)))
> }))
> Finish<-Sys.time()
> print(Finish-Start) # Takes too long
>
> --
> Dimitri Liakhovitski
> Ninah.com
> dimitri.liakhovit...@ninah.com
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.