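Usually R is pretty good about not copying objects when it doesn't need to. However, the list() function seems to make unnecessary copies of its arguments. For example, building a list from two large vectors takes about as long as allocating both of them: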

> system.time(x<-double(10^9))
   user  system elapsed
  1.772   4.280   7.017
> system.time(y<-double(10^9))
   user  system elapsed
  2.564   3.368   5.943
> system.time(z<-list(x,y))
   user  system elapsed
  5.520   6.748  12.304
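
One way to check that the time really goes into duplicating the vectors would be tracemem(), which reports when a traced object is copied. This is only a sketch: it assumes R was built with memory profiling (tested via capabilities("profmem")) and uses much smaller vectors than the timings above. I am not stating what it prints, since that may depend on the R version.

```r
# Smaller vectors than in the timings above so the check runs quickly.
x <- double(1e6)
y <- double(1e6)

# tracemem() is only available when R was built with memory profiling.
if (capabilities("profmem")) {
  invisible(tracemem(x))
  invisible(tracemem(y))
}

z <- list(x, y)   # a "tracemem[...]" message here would indicate a duplicate

# Stop tracing again.
if (capabilities("profmem")) {
  untracemem(x)
  untracemem(y)
}
```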

I have a function where I create two large arrays, manipulate them in certain ways, and then return both as a list. I'm optimizing the function, so I'd like to be able to build the return list quickly. The two large arrays drop out of scope immediately after I make the list and return it, so copying them is completely unnecessary.

Is there some way to build the list without copying? I'm not familiar with manipulating lists through the .Call interface, and haven't been able to find much about this in the documentation. Might it be possible to write a fast (but possibly unsafe) list function using .Call that doesn't make copies of its arguments?

P.S. A few things I've tried. First, the slowdown is not caused by garbage collection being triggered: even if I call gc() immediately before list(x,y), the call still takes a long time.

Also, I've tried rewriting the function so that it creates the list at the beginning, as in:
result <- list(x=double(10^9),y=double(10^9))
and then manipulates result$x and result$y directly. This made my code run slower, as R seemed to be making other unnecessary copies whenever I modified the elements of the list this way.

I've considered (though not implemented) creating an environment rather than a list, and returning the environment, but I'd rather find a simple way of creating a list without making copies if possible.
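
For reference, the environment variant I have in mind would look roughly like the sketch below. The function name make_arrays and the two assignments standing in for the real manipulation are placeholders, and the vectors are kept tiny. I have not timed this, so whether it actually avoids the copies is an assumption on my part.

```r
# Hypothetical function: builds two vectors and returns them in an environment.
make_arrays <- function(n) {
  x <- double(n)
  y <- double(n)

  # Stand-ins for the real manipulation of the two arrays.
  x[1] <- 1
  y[n] <- 2

  # Bind the existing vectors to names in a fresh environment
  # instead of passing them through list().
  out <- new.env(parent = emptyenv())
  out$x <- x
  out$y <- y
  out
}

res <- make_arrays(10)
res$x[1]        # elements are accessed with $ just as for a list
length(res$y)
```

The caller would have to know that it gets an environment back, since environments have reference semantics and cannot be indexed by position like a list.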
