On Feb 5, 2008 10:12 AM, Henrik Bengtsson <[EMAIL PROTECTED]> wrote:
> On Feb 5, 2008 8:01 AM, Iago Mosqueira <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > After experiencing some difficulties with large arrays, I was surprised
> > to see the apparent need for calls to gc() after creating fairly large
> > arrays. For example, calling
> >
> >   a <- array(2, dim=c(10,10,10,10,10,100))
> >
> > makes the memory usage of a fresh session of R jump from 13.8 Mb to
> > 166.4 Mb. A call to gc() brought it down to 90.8 Mb,
> >
> > > gc()
> >            used (Mb) gc trigger  (Mb) max used  (Mb)
> > Ncells   132619  3.6     350000   9.4   350000   9.4
> > Vcells 10086440 77.0   21335887 162.8 20086792 153.3
> >
> > as expected by
> >
> > > object.size(a)
> > [1] 80000136
>
> I think the reason for this is that array() has to "expand" the input
> data to the right length internally;
>
>   data <- rep(data, length.out = vl)
>
> That is a so-called "NAMED" object internally and when the following
> call to
>
>   dim(data) <- dim
>
> occurs, the safest thing R can do is to create a copy. [Anyone,
> correct me if I'm wrong].
>
> If you expand the input data yourself, you won't see that extra copy, e.g.
>
>   data <- 2
>   dim <- c(10,10,10,10,10,100)
>   data <- rep(data, length.out=prod(dim))
>   a <- array(data, dim=dim)

My bad here; that does indeed create an extra copy. rep() is the
problem, and you can see that when you call gc() after rep(). It seems
to be hard to allocate an array with values without creating an extra
copy, e.g.

  dim <- c(10,10,10,10,10,100)
  data <- numeric(prod(dim))
  dim(data) <- dim

will not create an extra copy, but as soon as you try to set a value it
will happen, e.g.

  data[1,2,3,4,5,6] <- 2

Again, I believe this has to do with the fact that R is taking the
safest path possible and not risking overwriting an existing object in
memory (R is copy-by-value). Note that when you do a second assignment,
that "safety copy" has already been created, so no more copies will be
made, e.g. calling

  data[1,2,3,4,5,7] <- 3

after the above will not create an extra copy.

/Henrik

> >
> > Do I need to call gc() after creating every large array, or can I set up
> > the system to do this more often or efficiently?
>
> The R garbage collector will free/deallocate that memory when
> "needed". However, calling gc() explicitly should minimize the risk
> of over-fragmented memory. Basically, if there are several blocks of
> garbage memory hanging around, you might end up with a situation where
> you have a lot of *total* memory available, but you will only be able to
> allocate small chunks of memory at any time. Even calling gc() in
> that situation will not help; there is no mechanism that defragments
> memory in R. So calling gc() after large allocations will add some
> protection against that.
>
> /Henrik
>
> >
> > Thanks very much,
> >
> >
> > Iago
> >
> >
> > $platform
> > [1] "i686-pc-linux-gnu"
> > $version.string
> > [1] "R version 2.6.1 (2007-11-26)"
> >
> > ______________________________________________
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
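
For anyone who wants to check the copying behaviour discussed in this thread on their own R build, here is a minimal sketch. It uses only gc() and its "max used" statistics. The helper name peak_mb() is hypothetical (it is not part of R), and the array is scaled down to 10^6 elements to keep the run light. The numbers depend on the R version, so none are quoted here; in particular, the extra copies described above for R 2.6.1 may not appear in later releases.

```r
# Hypothetical helper (not part of R): rough increase in vector-heap
# usage, in Mb, observed while evaluating 'expr'.
peak_mb <- function(expr) {
  g0 <- gc(reset = TRUE)          # collect, and reset the "max used" statistics
  before <- g0["Vcells", 2]       # Mb of vector heap in use before evaluation
  force(expr)                     # evaluate the lazily passed expression
  g1 <- gc()                      # "max used" now covers the evaluation
  g1["Vcells", ncol(g1)] - before # last column is "max used" in Mb
}

# Scaled-down version of the array from the thread (10^6 doubles).
dims <- c(10, 10, 10, 10, 10, 10)
n <- prod(dims)

# Let array() expand the single value itself.
via_array <- peak_mb(array(2, dim = dims))

# Allocate first, then set the dimensions.
via_dim <- peak_mb({ x <- numeric(n); dim(x) <- dims; x })

# Allocate, set dimensions, then assign a single element.
via_assign <- peak_mb({
  x <- numeric(n); dim(x) <- dims
  x[1, 2, 3, 4, 5, 6] <- 2
  x
})

cat("array():               ", via_array, "Mb\n")
cat("numeric() + dim<-:     ", via_dim, "Mb\n")
cat("... + one assignment:  ", via_assign, "Mb\n")
cat("size of one copy:      ", n * 8 / 2^20, "Mb\n")
```

A reported figure close to the size of one copy suggests no extra copy was made along that path; a figure near twice that size suggests one was.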