Just adding a few cents to this:

rowMedians(x) is roughly 4-10 times faster than apply(x, MARGIN=1,
FUN=median), at least in my local tests on 64-bit Windows 7. You can run
these simple benchmarks yourself via the
matrixStats/tests/rowMedians.R system test, cf. http://goo.gl/YCJed
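
For a quick check without the package's test script, something like the
following minimal sketch should do. It assumes matrixStats is installed; the
matrix size and the variable names (x, m1, m2, t_apply, t_rowmed) are
illustrative choices, not taken from the system test, and the speed-up will
vary by machine.

```r
# Minimal benchmark sketch; requires the matrixStats package.
library(matrixStats)

set.seed(1)
# Smaller than the 250,000 x 300 matrix in the thread, to keep it quick.
x <- matrix(rnorm(20000 * 30), nrow = 20000, ncol = 30)

# Time the plain-R approach and the matrixStats approach.
t_apply  <- system.time(m1 <- apply(x, MARGIN = 1, FUN = median))
t_rowmed <- system.time(m2 <- rowMedians(x))

# Both approaches should give the same medians.
stopifnot(isTRUE(all.equal(m1, m2)))

# Elapsed wall-clock time in seconds for each approach.
print(c(apply = t_apply[["elapsed"]], rowMedians = t_rowmed[["elapsed"]]))
```

Scaling nrow up towards 250,000 gives timings closer to the case discussed
below.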

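As for the grouped calculation discussed below, here is a sketch of one
possible variant, not necessarily faster than the one-liner in the thread:
convert to a numeric matrix once and index columns by position. The toy data
and the names m, groups and med are assumptions made for illustration.

```r
library(matrixStats)

set.seed(1)
# Toy stand-in for the data in the thread: 1,000 rows and 30 columns,
# with column names drawn from five letters so that names repeat.
m <- matrix(sample(200, 1000 * 30, replace = TRUE), nrow = 1000)
colnames(m) <- sample(letters[1:5], 30, replace = TRUE)

# Guard against a non-numeric matrix (e.g. a stray character column).
stopifnot(is.numeric(m))

# Column indices grouped by column name.
groups <- split(seq_len(ncol(m)), colnames(m))

# One column of row medians per distinct name; drop = FALSE keeps a
# single-column group as a matrix so rowMedians() still accepts it.
med <- sapply(groups, function(idx)
  rowMedians(m[, idx, drop = FALSE], na.rm = TRUE))

dim(med)
```

When the data come from read.delim(), it may also be worth checking that the
duplicated column names survived the import: the default check.names = TRUE
makes column names unique, so check.names = FALSE may be needed for the
grouping by name to work as intended.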

On Wed, May 23, 2012 at 10:30 AM, Preeti <pre...@sci.utah.edu> wrote:
> Hmm, that is interesting. I ran this on our server machine, which has
> about 200 cores, so memory is not an issue. Also, building the data frame
> takes a few minutes at most for me. My code is similar to yours, except
> that I create my data frame with read.delim("filename") and then drop the
> first column because it contains characters. I don't know why it takes so
> long on my machine.
> On Wed, May 23, 2012 at 11:26 AM, Benno Pütz <pu...@mpipsykl.mpg.de> wrote:
>> I wonder how you do this (or maybe on what kind of machine you execute it).
>> I tried it out of curiosity and get
>> > df = as.data.frame(lapply(1:300, function(x) sample(200, 250000, TRUE)))
>> > colnames(df) = sample(letters[1:20], 300, TRUE)
>> > system.time(dfmed <- lapply(unique(colnames(df)), function(x)
>> +   rowMedians(as.matrix(df[, colnames(df) == x]), na.rm = TRUE)))
>>    user  system elapsed
>>   5.680   0.952   7.171
>> and those times are in seconds! The time-consuming part was building the
>> data.frame, not the calculation.
>> The only thing I noticed is that my R process claims some 1.4 GB of memory.
>> That should not be a problem on any recent hardware, but my guess is that
>> memory might be your problem, especially if you have other memory-hogging
>> variables like this data frame lying around and you see severe swapping.
>> Benno
>> Hello Everybody,
>> The code:
>> dfmed<-lapply(unique(colnames(df)), function(x)
>> rowMedians(as.matrix(df[,colnames(df) == x]),na.rm=TRUE))
>> takes a really long time to execute (hours). Is there a faster way to do
>> this?
>> Thanks!
>> On Tue, May 22, 2012 at 3:46 PM, Preeti <pre...@sci.utah.edu> wrote:
>> Thanks Henrik! Here is the one-liner that I wrote:
>> dfmed<-lapply(unique(colnames(df)), function(x)
>> rowMedians(as.matrix(df[,colnames(df) == x]),na.rm=TRUE))
>> Thanks again!
>> On Tue, May 22, 2012 at 3:23 PM, Henrik Bengtsson <h...@biostat.ucsf.edu> wrote:
>> See rowMedians() of the matrixStats package for replacing apply(x,
>> MARGIN=1, FUN=median). /Henrik
>> On Tue, May 22, 2012 at 12:34 PM, Preeti <pre...@sci.utah.edu> wrote:
>> Hi,
>> I have a 250,000 by 300 matrix. I am trying to calculate the row-wise
>> median across columns that share the same column name. I would like this
>> to be efficient: looping over the names and calling apply(x,1,median),
>> where x holds only the columns with the same name, is taking a really
>> long time. Is there an efficient way to do this?
>> Thanks!
>>       [[alternative HTML version deleted]]
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> Benno Pütz
>> Statistical Genetics
>> MPI of Psychiatry
>> Kraepelinstr. 2-10
>> 80804 Munich, Germany
>> T: ++49-(0)89-306 22 222
>> F: ++49-(0)89-306 22 601
