Re: [Rd] full copy on assignment?

2010-04-04 Thread Norm Matloff
Thanks very much.

By the way, I tried setting a GDB breakpoint at duplicate1(), with the
following:

   x <- 1:1000
   x[3] <- 8
   x[33] <- 88

I found that duplicate1() was called on both of the latter two lines.
I was a bit surprised, since copy-on-write would seem to imply that
copying would be done in that second line but NOT on the third.
Moreover, system.time() gave 0.284 user time for the second and 0 on
the third.  YET duplicate1() WAS called on the third, and in stepping
through the code, there didn't seem to be an immediate exit.

Thanks to both John and Duncan for their comment on the fact that using
`[<-` directly is a very different situation.  That's not what I asked,
but the comment is useful to me for other reasons.

Norm

> Message: 4
> Date: Sat, 03 Apr 2010 17:54:58 -0700
> From: John Chambers j...@r-project.org
> To: r-devel@r-project.org
> Subject: Re: [Rd] full copy on assignment?
> ...
> ...
> How often does y get duplicated? Hopefully not a million times.  One can
> look at this in gdb, by trapping calls to duplicate1.  The answer is:
> just once, to ensure that the object is local.  Then the duplicated
> version has only one reference and the primitive replacement doesn't
> copy it.
> ...

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] full copy on assignment?

2010-04-04 Thread Martin Morgan
On 04/04/2010 05:27 PM, Norm Matloff wrote:
> Thanks very much.
>
> By the way, I tried setting a GDB breakpoint at duplicate1(), with the
> following:
>
>    x <- 1:1000
>    x[3] <- 8
>    x[33] <- 88

Here's how I investigated this, with the last line somewhat surprising:

  R -d gdb
  (gdb) r
  ... Ctrl-C
  (gdb) break duplicate1
  (gdb) commands
   call Rf_PrintValue(s)
   c
   end
  (gdb) c

then

  x=5:1
  x[1L] = 1L # no copy
  x[1L] = 10 # type coercion, new alloc but no copy
  x[1] = 20 # copy of index (!)
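
For readers without a debugger at hand, a similar experiment can be approximated from within R using tracemem(), which prints a message when a traced object is duplicated. This is only a sketch and not what was run above: it assumes an R build with memory profiling enabled (checked here with capabilities("profmem")), and whether a copy is reported at a given step can differ between R versions and front ends.

```r
# Sketch: repeat the three assignments and let R report duplications itself.
x <- 5:1                                    # integer vector
if (capabilities("profmem")) tracemem(x)    # report future duplications of x

x[1L] <- 1L                                 # integer into integer: type unchanged
stopifnot(is.integer(x))

x[1L] <- 10                                 # double into integer: x is coerced to double
stopifnot(is.double(x))

x[1] <- 20                                  # double index, double value

if (capabilities("profmem")) untracemem(x)
print(x)
```

Any "tracemem[...]" lines printed between the assignments mark the points where R duplicated the vector; the stopifnot() calls only confirm the type change that the second assignment forces.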

Martin

 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



[Rd] full copy on assignment?

2010-04-03 Thread Norm Matloff

Here's a basic question that doesn't seem to be completely answered in
the docs, and which unfortunately I've not had time to figure out by
wading through the R source code:

In a vector (or array) element assignment such as 

   z[3] <- 8

is there in actuality a full rewriting of the entire vector pointed to
by z, as implied by

   z <- `[<-`(z,3,value=8)

Assume that an element of z has already been changed previously, so
that copy-on-change issues don't apply, with z being reassigned back to
the same memory address.

I seem to recall reading somewhere that recent R versions make some
attempt to avoid rewriting the entire vector, and my timing experiments
seem to suggest that it's true.  

So, is a full rewrite avoided?  And where in the source code is this
done?

Thanks.

Norm Matloff



Re: [Rd] full copy on assignment?

2010-04-03 Thread Duncan Murdoch

On 03/04/2010 6:34 PM, Norm Matloff wrote:

> Here's a basic question that doesn't seem to be completely answered in
> the docs, and which unfortunately I've not had time to figure out by
> wading through the R source code:
>
> In a vector (or array) element assignment such as
>
>    z[3] <- 8
>
> is there in actuality a full rewriting of the entire vector pointed to
> by z, as implied by
>
>    z <- `[<-`(z,3,value=8)
>
> Assume that an element of z has already been changed previously, so
> that copy-on-change issues don't apply, with z being reassigned back to
> the same memory address.
>
> I seem to recall reading somewhere that recent R versions make some
> attempt to avoid rewriting the entire vector, and my timing experiments
> seem to suggest that it's true.
>
> So, is a full rewrite avoided?  And where in the source code is this
> done?


It depends.  User-written assignment functions can't avoid the copy.
They act like the expansion

   z <- `[<-`(z,3,value=8)

and in that, R can't tell that the newly created result of
`[<-`(z,3,value=8) will later overwrite z.

However, if z is a regular vector without a class and you're using the
built-in version of z[3] <- 8, it can take some shortcuts.  This happens
in multiple places; one is around line 488 of subassign.c, and another is
around line 1336.  In each of these places copies are made in some
circumstances, but not in general.
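
The two spellings discussed here can be compared side by side. The following is a small sketch (the names z1 and z2 are illustrative, not from the thread); it checks only that both forms produce the same result and says nothing about how many copies R makes internally.

```r
# Sketch: replacement syntax versus an explicit call to the replacement function.
z <- c(1, 2, 3, 4, 5)

z1 <- z
z1[3] <- 8                        # built-in replacement syntax

z2 <- z
z2 <- `[<-`(z2, 3, value = 8)     # the expansion written out by hand

stopifnot(identical(z1, z2))      # same result either way
print(z1)
```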


Duncan Murdoch



Re: [Rd] full copy on assignment?

2010-04-03 Thread John Chambers
In particular, Duncan's comment applies in situations where the 
replacement is in a loop, obviously the case one worries about.


What happens in the stupid little function:

 foo <- function(x) { for(i in seq_along(x)) x[i] <- x[i] + 1; x }

for the case

 y <- 1:1e6
 y1 <- foo(y)

How often does y get duplicated? Hopefully not a million times.  One can 
look at this in gdb, by trapping calls to duplicate1.  The answer is:  
just once, to ensure that the object is local.  Then the duplicated 
version has only one reference and the primitive replacement doesn't 
copy it.


Unfortunately, as Duncan said, changing the definition to a user-written 
replacement function:


 `sub<-` <- function(x, i, value) { x[i] <- value; x }
 foo <- function(x) { for(i in seq_along(x)) sub(x, i) <- x[i] + 1; x }

does duplicate a million times, since every call to `sub<-` gets an
argument with two references.
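
A rough way to see the difference without gdb is to time the two versions against each other. This is a sketch under a few assumptions: the second function is renamed foo2 (a hypothetical name) so both can coexist, the vector is shortened to 1e4 elements so the slow version finishes quickly, and the timings will depend on the R version and machine, so no particular numbers should be expected.

```r
# Sketch: primitive replacement versus a user-written replacement function.
foo <- function(x) { for (i in seq_along(x)) x[i] <- x[i] + 1; x }

`sub<-` <- function(x, i, value) { x[i] <- value; x }
foo2 <- function(x) { for (i in seq_along(x)) sub(x, i) <- x[i] + 1; x }

y <- 1:1e4                               # shorter than 1e6 to keep the run brief

stopifnot(identical(foo(y), foo2(y)))    # both compute the same result

print(system.time(foo(y)))               # primitive `[<-` inside the loop
print(system.time(foo2(y)))              # user-written `sub<-` inside the loop
```

If the user-written version duplicates on every iteration, as described above, its elapsed time should grow much faster with the length of y than that of the primitive version.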


John





Re: [Rd] full copy on assignment?

2010-04-03 Thread Norm Matloff

Thanks, Martin and Duncan, for the quick, clear replies.

Norm
