Re: [R] Pass By Value Questions

2010-08-20 Thread Matthew Dowle


To: r-help
Cc: Jeff, Matt, Duncan, Hadley   [ using Nabble to cc ]

Jeff, Matt,

How about the 'refdata' class in package ref?
Also, Hadley's immutable data.frame in plyr 1.1.

Both, if I understand correctly, let you refer to subsets of a data.frame
or matrix by reference.

Matthew

http://datatable.r-forge.r-project.org/




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Pass By Value Questions

2010-08-20 Thread Jens Oehlschlägel
Jeff, 
R has 'environments' as a general mechanism to pass around objects by 
reference. However, that does not help with most functions like 'apply' which 
take arguments other than environments. 

> I'm familiar with FF and BigMemory, but are there any packages/tricks which
> allow for passing such objects by reference without having to code in C?

With ff (and I assume with bigmemory as well) you can pass around objects by 
reference without C-coding. To be more precise with regard to ff: atomic ff 
objects have 'hybrid copying semantics', which means that two references to an 
ff object will share the data and SOME features (like the 'length') while OTHER 
features (like 'dim') are copied on modify (see 'vt' for a powerful 
application of this concept). You might want to have a look at 'ffapply' and 
friends and at 'chunk'.
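
A minimal base-R sketch of the environment mechanism mentioned at the top of this message may help. The names 'store' and 'fill_first_column' are made up for illustration; nothing here comes from ff or bigmemory.

```r
# Hypothetical example: keep a large object inside an environment so that
# a function can update it in place instead of returning a modified copy.
store <- new.env()
store$d <- data.frame(a = 1:3, b = 4:6)

# Environments are not copied when passed to a function, so the assignment
# below changes the data frame that the caller sees.
fill_first_column <- function(env, value) {
  env$d$a <- rep(value, nrow(env$d))
  invisible(NULL)
}

fill_first_column(store, 0L)
store$d$a   # all zeros: the caller's object was updated
```

As noted above, this only helps with functions written to accept an environment; 'apply' and most other functions still expect the object itself.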

HTH

Jens Oehlschlägel



[R] Pass By Value Questions

2010-08-19 Thread lists
I understand R is a Pass-By-Value language. I have a few practical
questions, however.

I'm dealing with a large dataset (~1GB) and so my understanding of the
nuances of memory usage in R is becoming important.

In an example such as:
 d <- read.csv("file.csv");
 n <- apply(d, 1, sum);
must d be copied to another location in memory in order to be used by
apply? In general, is copying only done when a variable is updated within
a function?

Would the following example be any different in terms of memory usage?
 d <- read.csv("file.csv");
 n <- apply(d[,2:10], 1, sum);
or can R reference the original d object since no changes to the object
are being made?

I'm familiar with FF and BigMemory, but are there any packages/tricks
which allow for passing such objects by reference without having to code
in C?

Regards,
Jeff Allen



Re: [R] Pass By Value Questions

2010-08-19 Thread Duncan Murdoch

On 19/08/2010 12:57 PM, li...@jdadesign.net wrote:

> I understand R is a Pass-By-Value language. I have a few practical
> questions, however.
>
> I'm dealing with a large dataset (~1GB) and so my understanding of the
> nuances of memory usage in R is becoming important.
>
> In an example such as:
>  d <- read.csv("file.csv");
>  n <- apply(d, 1, sum);
> must d be copied to another location in memory in order to be used by
> apply? In general, is copying only done when a variable is updated within
> a function?

Generally R only copies when the variable is modified, but its rules for 
detecting this are sometimes overly conservative, so you may get some 
unnecessary copying.  For example,

d[1,1] <- 3

will probably not make a full copy of d when the internal version of 
[<- is used, but if you have an R-level version, it probably will.  I 
forget whether the dataframe method is internal or R level. 

In the apply(d, 1, sum) example, it would probably make a copy of each 
row to pass to sum, but never a copy of the whole dataframe/array.
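
A minimal sketch of the copy-on-modify rule described above, using a small made-up data frame in place of the 1GB file. The function names are hypothetical, and the sketch shows only the visible semantics, not how many copies R makes internally.

```r
d <- data.frame(a = 1:3, b = 4:6)

# Only reads its argument, so nothing about d needs to change.
row_totals <- function(x) rowSums(x)

# Modifies its argument; the change applies to the function's own version,
# never to the caller's d.
overwrite_first <- function(x) {
  x[1, 1] <- 99L
  x
}

totals <- row_totals(d)
d2 <- overwrite_first(d)

d[1, 1]    # 1: the original is untouched
d2[1, 1]   # 99: only the returned object carries the change
```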

> Would the following example be any different in terms of memory usage?
>  d <- read.csv("file.csv");
>  n <- apply(d[,2:10], 1, sum);
> or can R reference the original d object since no changes to the object
> are being made?

This would make a new object containing d[,2:10], and would pass that to 
apply.

> I'm familiar with FF and BigMemory, but are there any packages/tricks
> which allow for passing such objects by reference without having to code
> in C?

Duncan Murdoch



Re: [R] Pass By Value Questions

2010-08-19 Thread Matt Shotwell
On Thu, 2010-08-19 at 14:27 -0400, Duncan Murdoch wrote:
> On 19/08/2010 12:57 PM, li...@jdadesign.net wrote:
> > I understand R is a Pass-By-Value language. I have a few practical
> > questions, however.
> >
> > I'm dealing with a large dataset (~1GB) and so my understanding of the
> > nuances of memory usage in R is becoming important.
> >
> > In an example such as:
> >  d <- read.csv("file.csv");
> >  n <- apply(d, 1, sum);
> > must d be copied to another location in memory in order to be used by
> > apply? In general, is copying only done when a variable is updated within
> > a function?
>
> Generally R only copies when the variable is modified, but its rules for
> detecting this are sometimes overly conservative, so you may get some
> unnecessary copying.  For example,
>
> d[1,1] <- 3
>
> will probably not make a full copy of d when the internal version of
> [<- is used, but if you have an R-level version, it probably will.  I
> forget whether the dataframe method is internal or R level.
>
> In the apply(d, 1, sum) example, it would probably make a copy of each
> row to pass to sum, but never a copy of the whole dataframe/array.
>
> > Would the following example be any different in terms of memory usage?
> >  d <- read.csv("file.csv");
> >  n <- apply(d[,2:10], 1, sum);
> > or can R reference the original d object since no changes to the object
> > are being made?
>
> This would make a new object containing d[,2:10], and would pass that to
> apply.

Since d is a data.frame, subsetting the columns would create a new
data.frame, as Duncan says. However, the columns of the new data.frame
would internally _reference_ the appropriate columns of d, until either
were modified. This does not apply to row subsetting. That is, d[2:10,]
would create a new data.frame and copy the relevant data. Nor does it
apply to _any_ subsetting of matrices.

> > I'm familiar with FF and BigMemory, but are there any packages/tricks
> > which allow for passing such objects by reference without having to code
> > in C?


It's difficult to determine exactly when data is copied internally by R.
The tracemem function may be used to track when entire objects are
duplicated. However, tracemem would not detect the duplication that
occurs, for example, when subsetting the rows of d. Otherwise, we can
monitor memory usage with gc(), and experiment with code on a trial and
error basis.
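
As a rough sketch of that trial-and-error approach (the vector and its size are made up for illustration, and tracemem is only available when R was built with memory profiling, hence the capabilities() check):

```r
x <- runif(1e6)                       # roughly 8 MB of doubles

can_trace <- capabilities("profmem")  # TRUE if tracemem is supported
if (can_trace) tracemem(x)            # start tracking duplications of x

y <- x        # no duplication yet: y refers to the same memory as x
y[1] <- 0     # modifying y forces a copy; tracemem should report it here

if (can_trace) untracemem(x)

print(object.size(x), units = "MB")   # size of one of the two vectors
gc()                                  # 'used' columns show current memory use
```

Calling gc() before and after a step and comparing the 'used' columns gives an approximate picture of how much that step allocated.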

I have had limited success in avoiding duplication by utilizing R
environments. See for example http://biostatmatt.com/archives/663 .
However, this may be more trouble than it's worth.

-Matt

> Duncan Murdoch
 

-- 
Matthew S. Shotwell
Graduate Student 
Division of Biostatistics and Epidemiology
Medical University of South Carolina
