Thanks to Peter, William, and Hadley for your help. Your code is much more concise than mine. :P
William's and Hadley's suggestions are the same. Here is their code.

f <- function(dataMatrix) rowMeans(dataMatrix=="02")

And Peter's code is the following.

apply(yourMatrix, 1, function(x) length(x[x==yourPattern]))/ncol(yourMatrix)

In terms of running time, the first one ran faster than the latter on my dataset (2.5 mins vs. 6.4 mins).
The memory consumption of the first one, however, is much higher than that of the latter (>8G vs. ~3G).

Any thoughts? My guess is that rowMeans creates extra copies to perform its calculation, but I am not sure.
I am also interested in understanding ways to handle memory issues. I hope someone can shed light on this for me. :)

Best,
Mike

-----Original Message-----
From: Peter Alspach [mailto:palsp...@hortresearch.co.nz]
Sent: Thursday, May 14, 2009 4:47 PM
To: Ping-Hsun Hsieh
Subject: RE: [R] memory usage grows too fast

Tena koe Mike

If I understand you correctly, you should be able to use something like:

apply(yourMatrix, 1, function(x) length(x[x==yourPattern]))/ncol(yourMatrix)

I see you've divided by nrow(yourMatrix) so perhaps I am missing something.

HTH ...

Peter Alspach

> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Ping-Hsun Hsieh
> Sent: Friday, 15 May 2009 11:22 a.m.
> To: r-help@r-project.org
> Subject: [R] memory usage grows too fast
>
> Hi All,
>
> I have a 1000x1000000 matrix.
> The calculation I would like to do is actually very simple:
> for each row, calculate the frequency of a given pattern. For
> example, a toy dataset is as follows.
>
> Col1 Col2 Col3 Col4
> 01 02 02 00 => Freq of "02" is 0.5
> 02 02 02 01 => Freq of "02" is 0.75
> 00 02 01 01 ...
>
> My code is quite simple as the following to find the pattern "02".
>
> OccurrenceRate_Fun<-function(dataMatrix)
> {
>   tmp<-NULL
>   tmpMatrix<-apply(dataMatrix,1,match,"02")
>   for ( i in 1: ncol(tmpMatrix))
>   {
>     tmpRate<-table(tmpMatrix[,i])[[1]]/ nrow(tmpMatrix)
>     tmp<-c(tmp,tmpRate)
>   }
>   rm(tmpMatrix)
>   rm(tmpRate)
>   gc()
>   return(tmp)
> }
>
> The problem is the memory usage grows very fast and hard to
> be handled on machines with less RAM.
> Could anyone please give me some comments on how to reduce
> the space complexity in this calculation?
>
> Thanks,
> Mike
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
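
On the memory question in the message above, one possible direction is sketched here. The comparison dataMatrix=="02" builds a logical matrix with the same dimensions as the input before rowMeans ever sees it, and R stores each logical in 4 bytes, so for a 1000x1000000 matrix that intermediate alone is on the order of several GB. Whether rowMeans makes further copies was not checked. The helper below is hypothetical (chunkedFreq and chunkSize are names made up for illustration, not from anyone's reply); it applies the same rowMeans idea to blocks of rows so that only one block's logical matrix exists at a time.

```r
# Hypothetical helper, not from the thread: per-row frequency of `pattern`,
# computed block by block to bound the size of the logical intermediate.
chunkedFreq <- function(dataMatrix, pattern = "02", chunkSize = 100) {
  n <- nrow(dataMatrix)
  out <- numeric(n)
  if (n == 0) return(out)
  for (start in seq(1, n, by = chunkSize)) {
    idx <- start:min(start + chunkSize - 1, n)
    # drop = FALSE keeps a one-row block as a matrix so rowMeans still works
    out[idx] <- rowMeans(dataMatrix[idx, , drop = FALSE] == pattern)
  }
  out
}

# Toy data modelled on the example in the original question
toy <- matrix(c("01", "02", "02", "00",
                "02", "02", "02", "01",
                "00", "02", "01", "01"),
              nrow = 3, byrow = TRUE)
chunkedFreq(toy, chunkSize = 2)
# [1] 0.50 0.75 0.25
```

A smaller chunkSize should lower peak memory at the cost of more loop iterations; the value that works best on the real dataset would have to be found by trial.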