Re: [R] Pass By Value Questions
Jeff,

R has 'environments' as a general mechanism to pass around objects by reference. However, that does not help with most functions like 'apply', which take arguments other than environments.

> I'm familiar with FF and BigMemory, but are there any packages/tricks
> which allow for passing such objects by reference without having to code
> in C?

With ff (and I assume with bigmemory as well) you can pass around objects by reference without C coding. To be more precise with regard to ff: atomic ff objects have 'hybrid copying semantics', which means that two references to an ff object will share the data and SOME features (like the 'length'), while OTHER features (like 'dim') are copied on modify (see 'vt' for a powerful application of this concept). You might want to have a look at 'ffapply' and friends and at 'chunk'.

HTH
Jens Oehlschlägel

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
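As a rough illustration of the point about environments, here is a minimal base-R sketch. The function names `fill_env`, `bump` and `bump_list` are made up for this example. An environment passed to a function is modified in place, whereas a list passed the same way is copied on modify, so the caller's list is unchanged.

```r
# Hypothetical helpers: they receive an environment, which R does not copy,
# so assignments into it are visible to the caller.
fill_env <- function(env, n) {
  env$x <- seq_len(n)
  invisible(NULL)
}

bump <- function(env) {
  env$x[1] <- env$x[1] + 100L  # changes x inside the caller's environment
  invisible(NULL)
}

e <- new.env()
fill_env(e, 5)
bump(e)
print(e$x)  # 101 2 3 4 5

# The same operation on a list only changes the function's local copy.
bump_list <- function(lst) {
  lst$x[1] <- lst$x[1] + 100L
  lst
}

l <- list(x = 1:5)
invisible(bump_list(l))
print(l$x)  # still 1 2 3 4 5
```

As noted above, this does not help with functions such as apply() that expect a matrix or data frame rather than an environment.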
Re: [R] Pass By Value Questions
To: r-help Cc: Jeff, Matt, Duncan, Hadley [ using Nabble to cc ]

Jeff, Matt,

How about the 'refdata' class in package ref? There is also Hadley's immutable data.frame in plyr 1.1. If I understand correctly, both allow you to refer to subsets of a data.frame or matrix by reference.

Matthew

http://datatable.r-forge.r-project.org/
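The packages mentioned above have their own interfaces, which are not reproduced here. As a hedged, base-R-only sketch of the underlying idea, a "view" can hold a reference to an environment that stores the data once, plus a vector of row indices. The helpers `make_view()` and `view_col_sum()` are hypothetical and are not the API of ref or plyr.

```r
# Hypothetical "view": a reference to the environment holding the data plus
# the rows of interest. Creating a view does not copy the data frame.
make_view <- function(data_env, rows) {
  list(env = data_env, rows = rows)
}

# Extract only the requested rows of a single column when they are needed.
view_col_sum <- function(view, col) {
  sum(view$env$data[[col]][view$rows])
}

store <- new.env()
store$data <- data.frame(a = 1:10, b = (1:10) * 2)

v <- make_view(store, rows = 2:4)
print(view_col_sum(v, "b"))  # 4 + 6 + 8 = 18

# Because the view refers to the environment, later changes are visible.
store$data$b[3] <- 0
print(view_col_sum(v, "b"))  # 4 + 0 + 8 = 12
```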
Re: [R] Pass By Value Questions
On Thu, 2010-08-19 at 14:27 -0400, Duncan Murdoch wrote:
> On 19/08/2010 12:57 PM, li...@jdadesign.net wrote:
> > I understand R is a "Pass-By-Value" language. I have a few practical
> > questions, however.
> >
> > I'm dealing with a "large" dataset (~1GB) and so my understanding of the
> > nuances of memory usage in R is becoming important.
> >
> > In an example such as:
> > > d <- read.csv("file.csv");
> > > n <- apply(d, 1, sum);
> > must "d" be copied to another location in memory in order to be used by
> > apply? In general, is copying only done when a variable is updated within
> > a function?
> >
>
> Generally R only copies when the variable is modified, but its rules for
> detecting this are sometimes overly conservative, so you may get some
> unnecessary copying. For example,
>
> d[1,1] <- 3
>
> will probably not make a full copy of d when the internal version of
> "[<-" is used, but if you have an R-level version, it probably will. I
> forget whether the dataframe method is internal or R level.
>
> In the apply(d, 1, sum) example, it would probably make a copy of each
> row to pass to sum, but never a copy of the whole dataframe/array.
>
> > Would the following example be any different in terms of memory usage?
> > > d <- read.csv("file.csv");
> > > n <- apply(d[,2:10], 1, sum);
> > or can R reference the original "d" object since no changes to the object
> > are being made?
> >
>
> This would make a new object containing d[,2:10], and would pass that to
> apply.

Since d is a data.frame, subsetting the columns would create a new data.frame, as Duncan says. However, the columns of the new data.frame would internally _reference_ the appropriate columns of d, until either were modified. This does not apply to row subsetting. That is, d[2:10,] would create a new data.frame and copy the relevant data. Nor does it apply to _any_ subsetting of matrices.
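One detail worth checking against the apply() discussion: according to ?apply, when X is a data frame (an object with a two-dimensional dim), apply() first coerces it with as.matrix(), which builds a new matrix holding all of the selected data. The sketch below uses a small made-up data frame `d` in place of the poster's file. It compares apply() with Reduce(`+`, ...), a swapped-in base-R alternative that adds the numeric columns as vectors and so never forms an nrow-by-ncol matrix (it assumes the selected columns are numeric).

```r
# Small stand-in for the poster's data: an id column plus numeric columns.
d <- data.frame(id = 1:5,
                x1 = c(1, 2, 3, 4, 5),
                x2 = c(10, 20, 30, 40, 50),
                x3 = c(0.5, 0.5, 0.5, 0.5, 0.5))

# apply() converts the data frame with as.matrix() before looping, so the
# whole column selection is first copied into a new matrix.
n_apply <- apply(d[, 2:4], 1, sum)

# Reduce() adds the selected columns pairwise, so only vectors of length
# nrow(d) are allocated, never a full matrix.
n_reduce <- Reduce(`+`, d[2:4])

print(n_reduce)  # 11.5 22.5 33.5 44.5 55.5
```

Note that rowSums() would not help here: it also calls as.matrix() when given a data frame.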
> > I'm familiar with FF and BigMemory, but are there any packages/tricks
> > which allow for passing such objects by reference without having to code
> > in C?
> >

It's difficult to determine exactly when data is copied internally by R. The tracemem function may be used to track when entire objects are duplicated. However, tracemem would not detect the duplication that occurs, for example, when subsetting the rows of d. Otherwise, we can monitor memory usage with gc() and experiment with code on a trial-and-error basis.

I have had limited success in avoiding duplication by utilizing R environments. See for example http://biostatmatt.com/archives/663 . However, this may be more trouble than it's worth.

-Matt

> Duncan Murdoch

--
Matthew S. Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina
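A minimal sketch of the two tools mentioned above follows. It assumes an R build with memory profiling enabled; the guard on capabilities("profmem") skips tracemem() otherwise. The vector sizes are arbitrary. tracemem() prints a message when the traced object is duplicated, and gc() reports current and peak memory use. The exact printed output varies by platform and R version, so none is shown.

```r
x <- runif(1e6)

# tracemem() only works when R was built with memory profiling.
traced <- capabilities("profmem")
if (traced) invisible(tracemem(x))  # start reporting duplications of x

y <- x     # no copy yet: x and y share the same memory
y[1] <- 0  # modifying y forces a duplicate; tracemem reports it

if (traced) untracemem(x)

# gc() runs a garbage collection and returns a matrix of memory usage;
# reset = TRUE restarts the "max used" figures from the current level.
invisible(gc(reset = TRUE))
z <- x * 2  # allocates a new vector of the same length
mem <- gc()
print(mem)
```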
Re: [R] Pass By Value Questions
On 19/08/2010 12:57 PM, li...@jdadesign.net wrote:
> I understand R is a "Pass-By-Value" language. I have a few practical
> questions, however.
>
> I'm dealing with a "large" dataset (~1GB) and so my understanding of the
> nuances of memory usage in R is becoming important.
>
> In an example such as:
> > d <- read.csv("file.csv");
> > n <- apply(d, 1, sum);
> must "d" be copied to another location in memory in order to be used by
> apply? In general, is copying only done when a variable is updated within
> a function?

Generally R only copies when the variable is modified, but its rules for detecting this are sometimes overly conservative, so you may get some unnecessary copying. For example,

d[1,1] <- 3

will probably not make a full copy of d when the internal version of "[<-" is used, but if you have an R-level version, it probably will. I forget whether the dataframe method is internal or R level.

In the apply(d, 1, sum) example, it would probably make a copy of each row to pass to sum, but never a copy of the whole dataframe/array.

> Would the following example be any different in terms of memory usage?
> > d <- read.csv("file.csv");
> > n <- apply(d[,2:10], 1, sum);
> or can R reference the original "d" object since no changes to the object
> are being made?

This would make a new object containing d[,2:10], and would pass that to apply.

> I'm familiar with FF and BigMemory, but are there any packages/tricks
> which allow for passing such objects by reference without having to code
> in C?

Duncan Murdoch
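To see the copy-on-modify rule described above, the sketch below passes a small data frame to two hypothetical functions (`read_only` and `modify`), one that only reads its argument and one that assigns into it. In both cases the caller's d is unchanged, which is the pass-by-value behaviour. With tracemem() enabled (same profmem guard as usual), a duplication is typically reported only for the function that modifies its argument, though the number of copies reported can differ between R versions. The last line checks that the data-frame method of "[<-" is an ordinary R function rather than a primitive, which bears on the internal-versus-R-level question.

```r
d <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Hypothetical functions for illustration.
read_only <- function(df) sum(df$a)  # only reads df, so no duplicate is needed
modify <- function(df) {
  df[1, 1] <- 3  # assigning into the local df forces a duplicate
  df
}

traced <- capabilities("profmem")
if (traced) invisible(tracemem(d))
s <- read_only(d)
d2 <- modify(d)
if (traced) untracemem(d)

print(d[1, 1])   # 1: the caller's d is untouched
print(d2[1, 1])  # 3

# The data-frame assignment method is written in R, not a primitive.
print(is.primitive(`[<-.data.frame`))  # FALSE
```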
[R] Pass By Value Questions
I understand R is a "Pass-By-Value" language. I have a few practical questions, however.

I'm dealing with a "large" dataset (~1GB) and so my understanding of the nuances of memory usage in R is becoming important.

In an example such as:
> d <- read.csv("file.csv");
> n <- apply(d, 1, sum);
must "d" be copied to another location in memory in order to be used by apply? In general, is copying only done when a variable is updated within a function?

Would the following example be any different in terms of memory usage?
> d <- read.csv("file.csv");
> n <- apply(d[,2:10], 1, sum);
or can R reference the original "d" object since no changes to the object are being made?

I'm familiar with FF and BigMemory, but are there any packages/tricks which allow for passing such objects by reference without having to code in C?

Regards,
Jeff Allen