I am running into memory problems when calculating correlation scores with 
cor.test() on R 2.13.0:

  R version 2.13.0 (2011-04-13) ...
  Platform: x86_64-unknown-linux-gnu (64-bit)

In my test case, I read in a pair of ~150M vectors, using pipe() and scan() to 
pull a specific column of numeric values out of each of two text files. Once I 
have the two vectors, I run cor.test() on them.
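
For reference, the loading step looks roughly like this (the file names, the 
column number, and the cut command are stand-ins for my actual pipeline):

  ## placeholder file names and column; each file holds one vector per column
  x <- scan(pipe("cut -f3 x_values.txt"), what = double())
  y <- scan(pipe("cut -f3 y_values.txt"), what = double())
  result <- cor.test(x, y)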

When I run this on our compute cluster (which runs SGE), I can set hard limits 
on the memory assigned to the compute slot or node that my R task is sent to; 
this keeps R from grabbing so much memory that other, non-R tasks on the 
cluster stall and fail.

If I set the hard limits (h_data and h_vmem) to under 8 GB, the R task 
terminates early with the following error:

  Error: cannot allocate vector of size 2.0 Gb

What confuses me is that I am running a 64-bit build of R, so hard limits of 
4 GB for the data itself (or, say, 5 GB with a generous 1 GB allowance for 
overhead) should be enough for this input size of two 2 GB vectors.
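
Once the vectors are loaded, I can confirm each one's in-memory footprint 
directly (a quick check, assuming the vectors are named x and y as in the 
sketch above):

  ## per-vector footprint, as reported by R
  print(object.size(x), units = "Gb")
  print(object.size(y), units = "Gb")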

Based on the hard limits I have tried, the overhead appears to be closer to 
4 GB on its own, on top of the 4 GB for the two input vectors: whenever the 
hard limits are under 8 GB, the job fails.

Does cor.test() really require this much extra working space, or have I missed 
some compilation option or other setting that would reduce its memory 
footprint?

Thanks for your advice.

Regards,
Alex