On 19/08/2010 12:57 PM, li...@jdadesign.net wrote:
I understand R is a "Pass-By-Value" language. I have a few practical
questions, however.

I'm dealing with a "large" dataset (~1GB) and so my understanding of the
nuances of memory usage in R is becoming important.

In an example such as:
> d <- read.csv("file.csv");
> n <- apply(d, 1, sum);
must "d" be copied to another location in memory in order to be used by
apply? In general, is copying only done when a variable is updated within
a function?

Generally, R copies only when the variable is modified, but its rules for detecting modification are sometimes overly conservative, so you may get some unnecessary copying. For example,

d[1,1] <- 3

will probably not make a full copy of d when the internal version of "[<-" is used, but it probably will if an R-level version is used. I forget whether the data frame method is internal or R-level.

In the apply(d, 1, sum) example, R would probably make a copy of each row to pass to sum. It would never copy a whole array; a data frame, however, is first coerced with as.matrix() inside apply, and that conversion does allocate a new matrix the size of the data.

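To make the copy-on-modify behaviour concrete, here is a minimal sketch. The function set_first and the small objects x and d are made up for illustration and are not from the original question. tracemem() only reports duplications in R builds compiled with memory profiling, so that part is guarded, and what it prints can differ between R versions.

```r
# Hypothetical helper: modifies its argument, which forces a copy if the
# vector is still shared with the caller.
set_first <- function(v) {
  v[1] <- 3
  v
}

x <- c(10, 20, 30)
y <- set_first(x)
x   # the caller's vector is unchanged
y   # the modified copy returned by the function

# tracemem() prints a "tracemem[...]" line whenever the traced object is
# duplicated; it is only available when R was built with memory profiling.
if (capabilities("profmem")) {
  d <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
  tracemem(d)
  d[1, 1] <- 3   # any duplications triggered here are reported
  untracemem(d)
}
```

Running the guarded part on the real data frame is probably the quickest way to see whether a particular assignment copies in your version of R.
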
Would the following example be any different in terms of memory usage?
> d <- read.csv("file.csv");
> n <- apply(d[,2:10], 1, sum);
or can R reference the original "d" object since no changes to the object
are being made?

This would create a new object containing d[,2:10] and pass that object to apply.

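A small sketch of that point, using a made-up 5 x 10 data frame as a stand-in for the contents of file.csv: the subset is an object of its own, so changing it leaves d alone. How much memory the subset actually shares with d is an implementation detail that may vary across R versions.

```r
# Stand-in for read.csv("file.csv"): 5 rows, 10 numeric columns (X1..X10).
d <- data.frame(matrix(as.numeric(1:50), nrow = 5))

sub <- d[, 2:10]          # a new data frame holding columns 2 to 10
n <- apply(sub, 1, sum)   # apply() coerces sub with as.matrix(), then sums each row

sub[1, 1] <- -1           # modifying the subset...
d[1, 2]                   # ...does not touch the original data frame
```
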
I'm familiar with FF and BigMemory, but are there any packages/tricks
which allow for passing such objects by reference without having to code
in C?
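
One base-R possibility, offered only as a sketch and not something stated in this reply: environments are not copied when passed to a function, so data stored inside one can be read and updated in place. The names store and add_row_sums are hypothetical.

```r
# Hypothetical helper: reads data from an environment and writes the result
# back into the same environment, so nothing needs to be returned.
add_row_sums <- function(env) {
  env$n <- rowSums(env$d)
  invisible(NULL)
}

store <- new.env()
store$d <- matrix(as.numeric(1:6), nrow = 2)   # small stand-in for the real data

add_row_sums(store)   # the environment itself is passed by reference
store$n               # row sums are now visible to the caller
```

Modifying store$d inside the function still follows the usual copy-on-modify rules for the matrix itself; only the environment is shared by reference.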

Duncan Murdoch

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
