Thanks to Peter, William, and Hadley for their help.
Your code is much more concise than mine.  :P
 
William's and Hadley's suggestions are the same. Here is their code.

        f <- function(dataMatrix) rowMeans(dataMatrix == "02")

And Peter's code is the following.

        apply(yourMatrix, 1, function(x)
            length(x[x == yourPattern])) / ncol(yourMatrix)
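For reference, both versions give the same answer on the toy dataset from my original message (a quick sanity check; yourPattern here is just "02"):

```r
# Toy 3x4 matrix from the original example
yourMatrix <- matrix(c("01", "02", "02", "00",
                       "02", "02", "02", "01",
                       "00", "02", "01", "01"),
                     nrow = 3, byrow = TRUE)
yourPattern <- "02"

# William/Hadley's version: logical matrix, then row means
f <- function(dataMatrix) rowMeans(dataMatrix == "02")

# Peter's version: count matches per row, divide by the column count
g <- apply(yourMatrix, 1, function(x)
    length(x[x == yourPattern])) / ncol(yourMatrix)

f(yourMatrix)  # 0.50 0.75 0.25
g              # same values
```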


In terms of running time, the first one ran faster than the latter on my
dataset (2.5 mins vs. 6.4 mins).
The memory consumption of the first one, however, is much higher than that of
the latter ( >8G vs. ~3G ).

Any thoughts? My guess is that rowMeans creates extra copies to perform its
calculation, but I am not so sure.
I am also interested in understanding ways to handle memory issues. Hope
someone could shed light on this for me. :)
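One way to keep the peak footprint down (a sketch only, not timed on the full 1000x1000000 matrix; blockRowMeans and blockSize are names I made up here) is to apply rowMeans to blocks of rows, so the intermediate logical matrix dataMatrix == "02" is never materialized for the whole matrix at once:

```r
# Frequency of a pattern per row, computed in row blocks to cap the size of
# the intermediate logical matrix produced by dataMatrix == pattern.
blockRowMeans <- function(dataMatrix, pattern = "02", blockSize = 100) {
  n <- nrow(dataMatrix)
  out <- numeric(n)
  for (s in seq(1, n, by = blockSize)) {
    e <- min(s + blockSize - 1, n)
    # Only a blockSize x ncol logical matrix is alive at any one time
    out[s:e] <- rowMeans(dataMatrix[s:e, , drop = FALSE] == pattern)
  }
  out
}
```

The trade-off is a small amount of extra bookkeeping for a roughly blockSize/nrow reduction in the size of the temporary logical matrix.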

Best,
Mike

-----Original Message-----
From: Peter Alspach [mailto:palsp...@hortresearch.co.nz] 
Sent: Thursday, May 14, 2009 4:47 PM
To: Ping-Hsun Hsieh
Subject: RE: [R] memory usage grows too fast

Tena koe Mike

If I understand you correctly, you should be able to use something like:

apply(yourMatrix, 1, function(x)
length(x[x==yourPattern]))/ncol(yourMatrix)

I see you've divided by nrow(yourMatrix) so perhaps I am missing
something.

HTH ...

Peter Alspach

 

> -----Original Message-----
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Ping-Hsun Hsieh
> Sent: Friday, 15 May 2009 11:22 a.m.
> To: r-help@r-project.org
> Subject: [R] memory usage grows too fast
> 
> Hi All,
> 
> I have a 1000x1000000 matrix. 
> The calculation I would like to do is actually very simple: 
> for each row, calculate the frequency of a given pattern. For 
> example, a toy dataset is as follows.
> 
> Col1  Col2    Col3    Col4
> 01    02      02      00              => Freq of "02" is 0.5
> 02    02      02      01              => Freq of "02" is 0.75
> 00    02      01      01              ...
> 
> My code is quite simple as the following to find the pattern "02".
> 
> OccurrenceRate_Fun<-function(dataMatrix)
> {
>   tmp<-NULL
>   tmpMatrix<-apply(dataMatrix,1,match,"02")
>    for ( i in 1: ncol(tmpMatrix))
>   {
>     tmpRate<-table(tmpMatrix[,i])[[1]]/ nrow(tmpMatrix)
>     tmp<-c(tmp,tmpRate)
>   }
>   rm(tmpMatrix)
>   rm(tmpRate)
>   gc()
>   return(tmp)
> }
> 
> The problem is the memory usage grows very fast and hard to 
> be handled on machines with less RAM.
> Could anyone please give me some comments on how to reduce 
> the space complexity in this calculation?
> 
> Thanks,
> Mike
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

