Assuming your original matrix IS a matrix (call it yourmat) and not a data frame (whose columns *must* have unique names unless you have changed the check.names default), then maybe:

    #### UNTESTED!!! ###
    thenames <- unique(colnames(yourmat))
    ans <- lapply(thenames, function(nm) {
        # compare against ALL column names, not just the unique ones;
        # drop = FALSE keeps a single matching column as a matrix for apply()
        apply(yourmat[, colnames(yourmat) == nm, drop = FALSE], 1, median, na.rm = TRUE)
    })

If I got it right, ans should be a list of vectors, one per unique column name, each giving the row-wise medians of the columns with that name. The list can be combined into a new matrix, e.g. by do.call(cbind, ans), if you like. You could get a matrix answer directly by using sapply or, maybe faster, vapply instead of lapply, but I find lists simpler to begin with.

I believe this should be reasonably fast. Converting to and from data frames and operating on data frames slows things down a lot, because data frames are very general structures that carry a lot of overhead when being worked on. Matrices do not.

-- Bert

On Wed, May 23, 2012 at 9:46 AM, Preeti <pre...@sci.utah.edu> wrote:
> Hello Everybody,
>
> The code:
>
> dfmed<-lapply(unique(colnames(df)), function(x)
> rowMedians(as.matrix(df[,colnames(df) == x]),na.rm=TRUE))
>
> takes a really long time to execute (hours). Is there a faster way to do
> this?
>
> Thanks!
>
> On Tue, May 22, 2012 at 3:46 PM, Preeti <pre...@sci.utah.edu> wrote:
>
>> Thanks Henrik! Here is the one-liner that I wrote:
>>
>> dfmed<-lapply(unique(colnames(df)), function(x)
>> rowMedians(as.matrix(df[,colnames(df) == x]),na.rm=TRUE))
>>
>> Thanks again!
>>
>> On Tue, May 22, 2012 at 3:23 PM, Henrik Bengtsson
>> <h...@biostat.ucsf.edu> wrote:
>>
>>> See rowMedians() of the matrixStats package for replacing apply(x,
>>> MARGIN=1, FUN=median). /Henrik
>>>
>>> On Tue, May 22, 2012 at 12:34 PM, Preeti <pre...@sci.utah.edu> wrote:
>>> > Hi,
>>> >
>>> > I have a 250,000 by 300 matrix. I am trying to calculate the median of
>>> > those columns (by row) with column names that are identical. I would like
>>> > this to be efficient since apply(x,1,median) where x is created by choosing
>>> > only those columns with same column name and looping on this is taking a
>>> > really long time. Is there an efficient way to do this?
>>> >
>>> > Thanks!

--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
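
For reference, here is a small self-contained sketch of the grouped row-median idea from this thread, run on a toy matrix so the result can be checked by hand. The matrix m, its column names, and the helper name row_medians_by_name are made up for illustration and are not from the original posts. Only base R is used; if matrixStats is installed, rowMedians() (as Henrik suggested) could presumably replace the apply() call.

```r
# Toy matrix with duplicated column names (hypothetical data).
# matrix() fills column by column: a = (1,2,3), b = (4,5,6),
# a = (7,8,9), b = (10,NA,12).
m <- matrix(c(1, 2, 3,
              4, 5, 6,
              7, 8, 9,
              10, NA, 12),
            nrow = 3,
            dimnames = list(NULL, c("a", "b", "a", "b")))

# Hypothetical helper: row-wise medians over columns that share a name.
row_medians_by_name <- function(mat) {
  nms <- colnames(mat)
  unms <- unique(nms)
  out <- vapply(unms, function(nm) {
    # drop = FALSE keeps a one-column selection as a matrix for apply()
    apply(mat[, nms == nm, drop = FALSE], 1, median, na.rm = TRUE)
  }, numeric(nrow(mat)))
  # Force a matrix result even when mat has a single row
  matrix(out, nrow = nrow(mat), dimnames = list(rownames(mat), unms))
}

res <- row_medians_by_name(m)
print(res)
# Column "a" holds 4, 5, 6 and column "b" holds 7, 5, 9
# (row 2 of "b" is the median of 5 alone, because the NA is dropped).
```

vapply is used here instead of lapply so the result comes back as a matrix with one column per unique name; the lapply plus do.call(cbind, ...) route gives the same values.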