On 19/08/2010 12:57 PM, li...@jdadesign.net wrote:
> I understand R is a "Pass-By-Value" language. I have a few practical
> questions, however.
>
> I'm dealing with a "large" dataset (~1GB) and so my understanding of the
> nuances of memory usage in R is becoming important.
>
> In an example such as:
>
> > d <- read.csv("file.csv");
> > n <- apply(d, 1, sum);
>
> must "d" be copied to another location in memory in order to be used by
> apply? In general, is copying only done when a variable is updated within
> a function?

Generally R only copies when the variable is modified, but its rules for
detecting this are sometimes overly conservative, so you may get some
unnecessary copying. For example,

    d[1,1] <- 3

will probably not make a full copy of d when the internal version of
"[<-" is used, but if you have an R-level version, it probably will. I
forget whether the dataframe method is internal or R level.
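
A minimal sketch of how one might check this, assuming an R build with
memory profiling enabled (capabilities("profmem") reports that).
tracemem() prints a line whenever the traced object is duplicated. The
objects m and d below are made-up stand-ins, not the poster's data. In
current versions of R the data frame method `[<-.data.frame` is an
ordinary R function, so the second assignment goes through R-level code.

```r
# Two objects holding the same numbers: a plain matrix and a data frame.
# Both are illustrative stand-ins, not the poster's data.
m <- matrix(0, nrow = 1000, ncol = 10)
d <- as.data.frame(m)

# tracemem() needs memory profiling support in this build of R.
can_trace <- capabilities("profmem")

if (can_trace) invisible(tracemem(m))
m[1, 1] <- 3    # handled by the internal "[<-"
if (can_trace) untracemem(m)

if (can_trace) invisible(tracemem(d))
d[1, 1] <- 3    # dispatches to the R-level method `[<-.data.frame`
if (can_trace) untracemem(d)
```

Any lines printed by tracemem() indicate a duplication. What is printed
depends on the R version and on the front end, since some front ends may
hold extra references to objects and so trigger additional copies.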
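
In the apply(d, 1, sum) example, it would probably make a copy of each
row to pass to sum. If d is a matrix, that never involves a copy of the
whole array. If d is a dataframe, though, apply() first converts it with
as.matrix(), which does allocate a full-size matrix.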
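
A small sketch of what apply() does with a dataframe: it works on
as.matrix(X) rather than on the dataframe itself, which is easiest to
see with mixed column types. The toy dataframes num_df and mixed_df are
hypothetical and exist only for illustration.

```r
# Hypothetical toy data frames; the names are illustrative only.
num_df   <- data.frame(a = c(1, 2), b = c(3, 4))
mixed_df <- data.frame(a = c(1, 2), b = c("x", "y"))

# apply() works on as.matrix(X) when X is a data frame, so every row of a
# mixed data frame reaches FUN as a character vector.
row_classes <- apply(mixed_df, 1, class)

# For an all-numeric data frame the intermediate matrix is numeric and has
# the same dimensions as the data frame, i.e. it is a full-size object.
mat <- as.matrix(num_df)
row_sums <- apply(num_df, 1, sum)
```

For a dataframe of about 1GB this intermediate matrix is worth keeping
in mind when estimating peak memory use.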

> Would the following example be any different in terms of memory usage?
>
> > d <- read.csv("file.csv");
> > n <- apply(d[,2:10], 1, sum);
>
> or can R reference the original "d" object since no changes to the object
> are being made?

This would make a new object containing d[,2:10], and would pass that to
apply.
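
As a rough illustration, assuming a made-up dataframe in place of the
one read from file.csv, the subset can be bound to a name (sub here is
hypothetical) and inspected as an object of its own:

```r
# Stand-in for read.csv("file.csv"): 1000 rows, 12 numeric columns.
d <- as.data.frame(matrix(runif(1000 * 12), ncol = 12))

# d[, 2:10] evaluates to a new data frame with 9 columns.
sub <- d[, 2:10]
size_d   <- object.size(d)
size_sub <- object.size(sub)

# Changing the subset leaves the original untouched.
sub[1, 1] <- -1
original_value <- d[1, 2]

# This is the object that apply() receives in the second example.
n <- apply(d[, 2:10], 1, sum)
```

object.size() reports the size of each object separately. It does not
show whether R shares column vectors between d and sub internally, so
the two sizes should not simply be added to estimate memory use.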

> I'm familiar with FF and BigMemory, but are there any packages/tricks
> which allow for passing such objects by reference without having to code
> in C?

Duncan Murdoch
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help