Just adding my two cents to this: rowMedians(x) is roughly 4-10 times faster than apply(x, MARGIN=1, FUN=median), at least in my local tests on 64-bit Windows 7. You can run these simple benchmarks yourself via the matrixStats/tests/rowMedians.R system test, cf. http://goo.gl/YCJed [R-forge].
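For a quick check without the package's system test, a rough sketch along the following lines should show the difference on the grouped-columns problem from this thread. The matrix size is scaled down so it finishes quickly, and the seed and the helper names med_apply() and med_fast() are made up for illustration; the speedup you see will depend on your machine and data. The rowMedians() part only runs if matrixStats is installed.

```r
# Sketch: time apply(x, 1, median) against matrixStats::rowMedians(x)
# on a scaled-down version of the problem discussed in this thread.
# Assumptions: sizes, seed and helper names are illustrative only.

set.seed(1)
n_row <- 2000   # the thread uses 250,000 rows; reduced so this runs quickly
n_col <- 60     # the thread uses 300 columns
x <- matrix(sample(200, n_row * n_col, replace = TRUE), nrow = n_row)
colnames(x) <- sample(letters[1:20], n_col, replace = TRUE)

# Column indices grouped by (duplicated) column name
groups <- split(seq_len(ncol(x)), colnames(x))

# Baseline: apply() with median(), one group of columns at a time
med_apply <- function(x, groups) {
  lapply(groups, function(idx)
    apply(x[, idx, drop = FALSE], MARGIN = 1, FUN = median, na.rm = TRUE))
}

t_apply <- system.time(res_apply <- med_apply(x, groups))
print(t_apply)

if (requireNamespace("matrixStats", quietly = TRUE)) {
  # Same grouping, but with rowMedians() doing the per-row work
  med_fast <- function(x, groups) {
    lapply(groups, function(idx)
      matrixStats::rowMedians(x[, idx, drop = FALSE], na.rm = TRUE))
  }
  t_fast <- system.time(res_fast <- med_fast(x, groups))
  print(t_fast)
  # Both approaches should give the same medians
  stopifnot(isTRUE(all.equal(res_apply, res_fast, check.attributes = FALSE)))
} else {
  message("matrixStats is not installed; skipping the rowMedians() timing")
}
```

Note that x is kept as a matrix throughout, so there is no repeated as.matrix() conversion of a data frame inside the loop.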
/Henrik

On Wed, May 23, 2012 at 10:30 AM, Preeti <pre...@sci.utah.edu> wrote:
> Hmm.. that is interesting... I did this on our server machine which has
> about 200 cores. So memory is not an issue. Also, building the dataframe
> takes about a few minutes maximum for me. My code is similar to yours but
> for the fact that I create my dataframe from read.delim("filename") and
> then I drop the first column because it has characters. I don't know why it
> takes long on my machine.
>
> On Wed, May 23, 2012 at 11:26 AM, Benno Pütz <pu...@mpipsykl.mpg.de> wrote:
>
>> I wonder how you do this (or maybe on what kind of machine you execute it).
>>
>> I tried it out of curiosity and get
>>
>> > df = as.data.frame(lapply(1:300,function(x)sample(200,250000,T)))
>> > colnames(df) = sample(letters[1:20],300,T)
>> > system.time(dfmed<-lapply(unique(colnames(df)), function(x)
>> + rowMedians(as.matrix(df[,colnames(df) == x]),na.rm=TRUE)))
>>    user  system elapsed
>>   5.680   0.952   7.171
>>
>> and those times are in seconds! The time consuming part was building the
>> data.frame not the calculation.
>>
>> The only thing I noticed is that my R process claims some 1.4 GB of memory
>> but that should not be a problem on any recent hardware but my guess at
>> answering your question would be that this might be your problem,
>> especially if you have other memory-hogging variables like this data frame
>> lying around and you see severe memory swapping effects
>>
>> Benno
>>
>> Hello Everybody,
>>
>> The code:
>>
>> dfmed<-lapply(unique(colnames(df)), function(x)
>> rowMedians(as.matrix(df[,colnames(df) == x]),na.rm=TRUE))
>>
>> takes really long time to execute ( in hours). Is there a faster way to do
>> this?
>>
>> Thanks!
>>
>> On Tue, May 22, 2012 at 3:46 PM, Preeti <pre...@sci.utah.edu> wrote:
>>
>> Thanks Henrik! Here is the one-liner that I wrote:
>> >>
>> >> dfmed<-lapply(unique(colnames(df)), function(x)
>> >> rowMedians(as.matrix(df[,colnames(df) == x]),na.rm=TRUE))
>> >>
>> >> Thanks again!
>> >>
>> >> On Tue, May 22, 2012 at 3:23 PM, Henrik Bengtsson <h...@biostat.ucsf.edu> wrote:
>> >>
>> >> See rowMedians() of the matrixStats package for replacing apply(x,
>> >> MARGIN=1, FUN=median). /Henrik
>> >>
>> >> On Tue, May 22, 2012 at 12:34 PM, Preeti <pre...@sci.utah.edu> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I have a 250,000 by 300 matrix. I am trying to calculate the median of
>> >> those columns (by row) with column names that are identical. I would like
>> >> this to be efficient since apply(x,1,median) where x is created by choosing
>> >> only those columns with same column name and looping on this is taking a
>> >> really long time. Is there an efficient way to do this?
>> >>
>> >> Thanks!
>> >>
>> >> [[alternative HTML version deleted]]
>> >>
>> >> ______________________________________________
>> >> R-help@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>>
>> Benno Pütz
>> Statistical Genetics
>> MPI of Psychiatry
>> Kraepelinstr. 2-10
>> 80804 Munich, Germany
>> T: ++49-(0)89-306 22 222
>> F: ++49-(0)89-306 22 601