Richard Pugh wrote:
...
I have run into some issues regarding the way R handles its memory,
especially on NT.
...

Actually, you've run into NT's nasty memory management. Welcome! :)
R-core have worked very hard to get around Windows' memory issues, so they've probably got a better answer than I can give. I'll give you a few quick answers, and then wait for correction when one of them replies.


A typical call may look like this:


myInputData <- matrix(sample(1:100, 7500000, T), nrow=5000)
myPortfolio <- createPortfolio(myInputData)

It seems I can only repeat this code process 2 or 3 times before I have to
restart R (to get the memory back). I use the same object names
(myInputData and myPortfolio) each time, so I am not creating more large
objects ...

Actually, you do. Re-using a name does not re-use the same blocks of memory. The size of the object may change, for example.
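
A quick way to see this for yourself (a minimal sketch; tracemem() is part of base R, though on builds compiled without memory profiling it won't report anything useful):

x <- matrix(0, 1000, 1000)   # ~8 MB of doubles
tracemem(x)                  # prints the address of x's block, e.g. "<0x0d2f0020>"
x <- matrix(0, 1500, 1000)   # same name, different size, brand new object
tracemem(x)                  # a different address; the old block sits there until gc() runs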


I think the problems I have are illustrated with the following example
from a small R session:


# Memory usage for Rgui process = 19,800k
testData <- matrix(rnorm(10000000), 1000) # Create big matrix
# Memory usage for Rgui process = 254,550k
rm(testData)
# Memory usage for Rgui process = 254,550k
gc()

         used (Mb) gc trigger  (Mb)
Ncells 369277  9.9     667722  17.9
Vcells  87650  0.7   24286664 185.3

# Memory usage for Rgui process = 20,200k

In the above code, R cannot recover all the memory used, so the memory
usage increases from 19,800k to 20,200k. However, the following example is
more typical of the environments I use:


# Memory 128,100k
myTestData <- matrix(rnorm(10000000), 1000)
# Memory 357,272k
rm(myTestData)
# Memory 357,272k
gc()

          used (Mb) gc trigger  (Mb)
Ncells  478197 12.8     818163  21.9
Vcells 9309525 71.1   31670210 241.7

# Memory 279,152k

R can return memory to Windows, but it cannot *make* Windows take it back. Exiting the app is the only guaranteed way to do this, for any application.
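
If you'd rather watch this from inside R than from the Task Manager, the Windows build has memory.size() and memory.limit() (Windows-only functions, so just a sketch for your NT box):

memory.size()            # how much memory the R process is currently using
memory.size(max = TRUE)  # the most it has obtained from Windows so far this session
memory.limit()           # the ceiling R will ask Windows for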


The fact that you get this with matrices makes me suspect memory fragmentation rather than a pure lack of memory. Here, the free memory is disorganised, thanks to some programmers in Redmond. When a matrix gets assigned, all of its memory needs to be contiguous. If your machine has, say, 250 MB free, but only in 1 MB chunks, and you need to build a 2 MB matrix, you're out of luck.

From the sound of your calculations, they *must* be done as big matrices (true?). If not, try a data structure that isn't a matrix or array; these require *contiguous* blocks of memory. Lists, by comparison, can store their components in separate blocks. Would a list of smaller matrices work?
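
As an illustration, here is a minimal sketch that holds the 7,500,000 values from your first call as a list of three 5000 x 500 matrices instead of one 5000 x 1500 block (the 500-column chunk size is an arbitrary assumption):

n.rows   <- 5000
n.cols   <- 1500
chunk    <- 500
col.sets <- split(seq_len(n.cols), ceiling(seq_len(n.cols) / chunk))

myInputList <- lapply(col.sets, function(cols)
    matrix(sample(1:100, n.rows * length(cols), replace = TRUE), nrow = n.rows))

# Work chunk by chunk, e.g. column means across the whole data set:
all.col.means <- unlist(lapply(myInputList, colMeans))

Whether createPortfolio() can be fed chunks like this depends on what it does internally, of course.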

Could anyone point out what I could do to rectify this (if anything), or
generally what strategy I could take to improve this?

Some suggestions:


1) Call gc() somewhere inside your routines regularly. Not guaranteed to help, but worth a try (see the sketch after this list).

2) Get even more RAM, and hope it stabilises.

3) Change data structures to something other than one huge matrix. Matrices have huge computational advantages, but are pigs for memory.

4) Export the data-crunching part of the application to an operating system that isn't notorious for bad memory management. <opinion, subjective=yes> I've almost never had anguish from Solaris. Linux and FreeBSD are not bad. </opinion> Have you considered running the calculations on a different machine, and storing the results in a fresh table in the same database you get the raw data from?
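
To flesh out suggestion 1, a rough sketch, assuming you loop over batches of input; inputFiles, readInput() and runAnalysis() below are hypothetical stand-ins for your own code:

results <- vector("list", length(inputFiles))
for (i in seq_along(inputFiles)) {
    myInputData  <- readInput(inputFiles[i])   # hypothetical: however you load one batch
    results[[i]] <- runAnalysis(myInputData)   # hypothetical: your createPortfolio() step
    rm(myInputData)                            # drop the big object explicitly ...
    gc()                                       # ... and collect before the next big allocation
}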

Hope that helps.

Jason
--
Indigo Industrial Controls Ltd.
http://www.indigoindustrial.co.nz
64-21-343-545
[EMAIL PROTECTED]
