[Rd] reference class internals

2014-01-09 Thread Norm Matloff
I have a question about reference classes, which someone here
undoubtedly can answer immediately, saving me hours of wading through
indecipherable internal code. :-)  Thanks in advance.  

Reference class data is mutable, fine, but in what sense?  Is it really
physical,  or is it just a view given to the programmer?
 
If, for instance, I have a vector as a field in a reference class, and I
change one element of the vector, is it really true that the change is
guaranteed to be made in place, with no copying, no memory reallocation, etc.?

Norm

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] reference class internals

2014-01-09 Thread Hadley Wickham
It's a bit of a simplification, but reference classes are wrappers around
environments.  So if modifying a value in an environment would create
a copy, then modifying the same value in a reference class will also
create a copy.

The situation with modifying a vector is a bit complicated as it will
sometimes be modified in place and sometimes be duplicated and
modified (depending on whether its NAMED attribute is 1 or 2, and
exactly how you're modifying it).
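
A minimal sketch of the point above, using a plain environment in place of a
reference class object. The names e, alias and w are made up for
illustration. What is guaranteed is the semantics: the environment is
shared, the vector is not. Whether R copies the vector internally is an
implementation detail; tracemem() can report duplications, but only in
builds with memory profiling enabled, and what it reports depends on the R
version.

```r
e <- new.env()
e$v <- c(1, 2, 3)

alias <- e        # a second name for the SAME environment, not a copy
w <- e$v          # w has value semantics: it must not change when e$v does

e$v[1] <- 10      # R may do this in place or duplicate first; w is safe either way

if (capabilities("profmem")) {
  tracemem(e$v)   # ask R to report duplications of this vector
  e$v[2] <- 20    # any message printed here is version-dependent
  untracemem(e$v)
}

alias$v           # reflects both changes, because alias and e are one object
w                 # still the original values
```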

Hadley




-- 
http://had.co.nz/



Re: [Rd] reference class internals

2014-01-09 Thread Norm Matloff

Bottom line:  Really no different from the case of ordinary vectors that
are not in reference classes, right?  In other words, not true
pass-by-reference.

Norm




Re: [Rd] reference class internals

2014-01-09 Thread Simon Urbanek
On Jan 9, 2014, at 6:20 PM, Norm Matloff  wrote:

> Bottom line:  Really no different from the case of ordinary vectors that are 
> not in reference classes, right?  In other words, not true pass-by-reference.
> 

The pass-by-reference applies to the object itself, not necessarily to anything 
you obtain by calling a function on the object (like extracting a part from 
it). Vectors are not reference-semantics objects so regular rules apply.

If you pass a reference semantics object to a function, the function can modify 
the object. If you pass any other object, the contents are guaranteed to not be 
touched. Reference-semantics objects in R are literally passed by reference 
(same C pointer), so yes, it is true pass-by-reference.
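
A short sketch of the distinction drawn above, assuming a hypothetical
Counter reference class and helper functions bump_ref and bump_vec
invented for this example. A function can modify a reference class
object it receives, while an ordinary vector passed to a function is left
untouched in the caller.

```r
library(methods)

# Hypothetical reference class with one numeric field
Counter <- setRefClass("Counter", fields = list(count = "numeric"))

bump_ref <- function(obj) {
  obj$count <- obj$count + 1   # modifies the caller's object
  invisible(NULL)
}

bump_vec <- function(v) {
  v[1] <- v[1] + 1             # modifies only the function's local v
  invisible(NULL)
}

cnt <- Counter$new(count = 0)
bump_ref(cnt)
cnt$count                      # changed by the call

v <- c(0, 0)
bump_vec(v)
v                              # unchanged by the call

same <- cnt                    # another name for the same object
indep <- cnt$copy()            # an independent object
bump_ref(cnt)
c(same$count, indep$count)     # same follows cnt, indep does not
```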

Cheers,
Simon


(*) - technically, there is a thin non-reference wrapper around the instances of 
reference classes, because there are things you don't want to happen to your 
ref-semantics instance - e.g. you don't want unclass(x) to destroy x and all 
instances of it (which it would do if there was no wrapper). But the actual 
payload of the object is a true ref-semantics object - an environment - that is 
always passed by reference.






Re: [Rd] reference class internals

2014-01-09 Thread Norm Matloff

Thanks, Hadley and Simon.

The reason I asked today was that when reference classes first came out,
it had appeared to me that there is no performance advantage to using
reference classes, that it was mainly a style issue (encapsulation,
etc.).  Unless I'm missing something, both of you have confirmed my
original impression, correct?

Norm




Re: [Rd] reference class internals

2014-01-09 Thread Martin Morgan

On 01/09/2014 07:53 PM, Norm Matloff wrote:
> Thanks, Hadley and Simon.
>
> The reason I asked today was that when reference classes first came out,
> it had appeared to me that there is no performance advantage to using
> reference classes, that it was mainly a style issue (encapsulation,
> etc.).  Unless I'm missing something, both of you have confirmed my
> original impression, correct?


We've used reference classes for a performance benefit. For example, updating 
a single (say, small) field in an S4 object triggers a copy of the entire 
object, whereas the fields of a reference class can be updated independently. 
This is especially true inside function calls (e.g., slot access within a 
method), where the object is marked to be duplicated.




> a = setClass("A", representation(x="numeric"))(x=1:5)
> .Internal(inspect(a))

@5237508 25 S4SXP g0c0 [OBJ,NAM(2),S4,gp=0x10,ATT]
ATTRIB:
  @5237460 02 LISTSXP g0c0 []
TAG: @12ea3a0 01 SYMSXP g0c0 [NAM(2)] "x"
@5225db8 13 INTSXP g0c3 [NAM(2)] (len=5, tl=0) 1,2,3,4,5
TAG: @1284b08 01 SYMSXP g0c0 [LCK,gp=0x4000] "class" (has value)
@52355c8 16 STRSXP g0c1 [NAM(2),ATT] (len=1, tl=0)
  @4740e48 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] "A"
ATTRIB:
  @52373f0 02 LISTSXP g0c0 []
TAG: @128e500 01 SYMSXP g0c0 [NAM(2)] "package"
@5235598 16 STRSXP g0c1 [NAM(2)] (len=1, tl=0)
  @12ee2b8 09 CHARSXP g0c2 [gp=0x61] [ASCII] [cached] ".GlobalEnv"

> a@x[1]=2L
> .Internal(inspect(a))  ## almost everything duplicated!

@5243cd0 25 S4SXP g0c0 [OBJ,NAM(2),S4,gp=0x10,ATT]
ATTRIB:
  @5243c60 02 LISTSXP g0c0 []
TAG: @12ea3a0 01 SYMSXP g0c0 [NAM(2)] "x"
@5225b30 13 INTSXP g0c3 [NAM(1)] (len=5, tl=0) 2,2,3,4,5
TAG: @1284b08 01 SYMSXP g0c0 [LCK,gp=0x4000] "class" (has value)
@52405f8 16 STRSXP g0c1 [NAM(2),ATT] (len=1, tl=0)
  @4740e48 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] "A"
ATTRIB:
  @5243bf0 02 LISTSXP g0c0 []
TAG: @128e500 01 SYMSXP g0c0 [NAM(2)] "package"
@52405c8 16 STRSXP g0c1 [NAM(2)] (len=1, tl=0)
  @12ee2b8 09 CHARSXP g0c2 [gp=0x61] [ASCII] [cached] ".GlobalEnv"

This also influences the performance of other R objects, of course, e.g.,

> f = function(x) { x$a[1] = 2L; x }
> l = list(a=1:5); .Internal(inspect(l))
@53f8448 19 VECSXP g0c1 [NAM(1),ATT] (len=1, tl=0)
  @53cef48 13 INTSXP g0c3 [] (len=5, tl=0) 1,2,3,4,5
ATTRIB:
  @53f9190 02 LISTSXP g0c0 []
TAG: @1284638 01 SYMSXP g0c0 [LCK,gp=0x4000] "names" (has value)
@53f8418 16 STRSXP g0c1 [] (len=1, tl=0)
  @146b128 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] "a"
> .Internal(inspect(f(l)))
@53f83e8 19 VECSXP g0c1 [NAM(1),ATT] (len=1, tl=0)
  @53cef00 13 INTSXP g0c3 [] (len=5, tl=0) 2,2,3,4,5
ATTRIB:
  @53f9988 02 LISTSXP g0c0 []
TAG: @1284638 01 SYMSXP g0c0 [LCK,gp=0x4000] "names" (has value)
@53f83b8 16 STRSXP g0c1 [NAM(2)] (len=1, tl=0)
  @146b128 09 CHARSXP g0c1 [gp=0x61] [ASCII] [cached] "a"

Copies are localized to the updated field with reference classes. (I can't show 
this with .Internal(inspect()), though, because x = new.env(); x$x = x; 
.Internal(inspect(x)) [mimicking .self in reference classes] recurses 
infinitely, or at least for longer than I was willing to wait.)
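
Since .Internal(inspect()) is awkward here, one possible way to observe
copying is tracemem(). This is only a sketch: the class names PairRef and
PairS4 and their fields are invented for the example, tracemem() works only
in builds with memory profiling enabled, and the messages it prints (if any)
depend on the R version, so no particular output is claimed.

```r
library(methods)

# Hypothetical classes with one large and one small field/slot
PairRef <- setRefClass("PairRef", fields = list(big = "numeric", small = "numeric"))
setClass("PairS4", representation(big = "numeric", small = "numeric"))

p <- PairRef$new(big = numeric(1e5), small = 1)
s <- new("PairS4", big = numeric(1e5), small = 1)

if (capabilities("profmem")) {
  tracemem(p$big)   # watch the large field of the reference object
  p$small <- 2      # updating another field should leave big alone
  untracemem(p$big)

  tracemem(s)       # watch the S4 object as a whole
  s@small <- 2      # a tracemem message here means the object was duplicated
  untracemem(s)
} else {
  p$small <- 2
  s@small <- 2
}
```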


I think actually reference classes have a surprising performance _hit_ compared 
to other R approaches to minimizing copying; this has come up on this or the R 
mailing list before, but I've lost track of the original. Here's a StackOverflow 
version


http://stackoverflow.com/questions/18677696/stack-class-in-r-something-more-concise/18678440#18678440
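
A rough way to compare for yourself, not taken from the linked answer: the
sketch below times the same stack written as a reference class and as a
closure. StackRef, make_stack and the loop size k are all hypothetical, and
the timings will vary by machine and R version, so none are shown.

```r
library(methods)

# Stack as a reference class (hypothetical example class)
StackRef <- setRefClass("StackRef",
  fields = list(items = "list", n = "numeric"),
  methods = list(
    push = function(x) {
      n <<- n + 1            # update the field in the object's environment
      items[[n]] <<- x
      invisible(.self)
    },
    size = function() n
  )
)

# Same stack as a closure over local variables
make_stack <- function() {
  items <- list()
  n <- 0
  list(
    push = function(x) {
      n <<- n + 1            # update the enclosing function's variables
      items[[n]] <<- x
      invisible(NULL)
    },
    size = function() n
  )
}

k <- 1000
s_ref <- StackRef$new(items = list(), n = 0)
t_ref <- system.time(for (i in seq_len(k)) s_ref$push(i))

s_clo <- make_stack()
t_clo <- system.time(for (i in seq_len(k)) s_clo$push(i))

rbind(refclass = t_ref, closure = t_clo)   # compare elapsed times
```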

Martin




Re: [Rd] reference class internals

2014-01-09 Thread Norm Matloff

I guess I should explain where I'm coming from in all this.

I've always been something of a skeptic on object-oriented programming.
Though I agree it has some advantages, and I do use it myself (in
Python), in general I think it makes one work far too hard for the
potential benefit.  C++ templates (which I use in Thrust) drive me
crazy, very frustrating.

So I am, for better or worse, one of those people who don't even like S4
(again a style issue).  Obviously those who do like S4 may get a
performance benefit via reference classes in the situation Martin
describes.

I've been meaning for some time to look into whether there might
actually be a performance benefit for non-OOP programmers like me,
thinking the answer would be no but wanting to confirm.  So,
today I finally got around to asking, and immediately got three quick,
cogent and informative replies.  This testifies to the quality of the
membership of this list!  Thanks very much.

Norm
