Re: [Rd] surprising behaviour of names<-

2009-03-16 Thread Wacek Kusnierczyk

Thomas Lumley wrote:
>
> Wacek,
>
> In this case I think the *tmp* dates from the days before backticks,
> when it was not a legal name (it still isn't) and it was much, much
> harder to use illegal names, so the collision issue really didn't exist.
>

thanks for the explanation.

> You're right about the documentation.
>


thanks for the acknowledgement.

vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] surprising behaviour of names<-

2009-03-16 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:
>
>> '*tmp*' = 0
>> `*tmp*`
>> # 0
>>
>> x = 1
>> names(x) = 'foo'
>> `*tmp*`
>> # error: object "*tmp*" not found
>>
>> `*ugly*`
>> 
>
> I agree, and I am a bit flabbergasted.  I had not expected that
> something like this would happen and I am indeed not aware of anything
> in the documentation that warns about this; but others may prove me
> wrong on this.
>   

hopefully.

>   
>> given that `*tmp*` is a perfectly legal (though some would say
>> 'non-standard') name, it would be good if somewhere here a warning
>> were issued -- perhaps where i assign to `*tmp*`, because `*tmp*` is
>> not just any non-standard name, but one that is 'obviously' used
>> under the hood to perform black magic.
>> 
>
> Now I wonder whether there are any other objects (with non-standard
> names) that can be nuked by operations performed under the hood.  
>   

any such risk should be clearly documented, if not with a warning issued
each time the user risks h{is,er} workspace corrupted by the under-the-hood.

> I guess the best thing is to stay away from non-standard names, if only
> to save the typing of back-ticks. :)
>   

agree.  but then, there may be -- and probably are -- other such 'best
to stay away' things in r, all of which should be documented so that a
user knows what may happen on the surface, *without* having to peek under
the hood.

> Thanks for letting me know, I have learned something new today.
>   

wow.  most of my fiercely truculent ranting is meant to point out things
that may not be intentional, or if they are, they seem to me design
flaws rather than features -- so that either i learn that i am ignorant
or wrong, or someone else does, pro bono.  hopefully.

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-16 Thread Thomas Lumley



Wacek,

In this case I think the *tmp* dates from the days before backticks, when it 
was not a legal name (it still isn't) and it was much, much harder to use 
illegal names, so the collision issue really didn't exist.

You're right about the documentation.

  -thomas


On Sun, 15 Mar 2009, Wacek Kusnierczyk wrote:


Berwin A Turlach wrote:


Obviously, assuming that R really executes
*tmp* <- x
x <- "names<-"('*tmp*', value=c("a","b"))
under the hood, in the C code, then *tmp* does not end up in the symbol
table and does not persist beyond the execution of
names(x) <- c("a","b")


to prove that i take you seriously, i have peeked into the code, and
found that indeed there is a temporary binding for *tmp* made behind the
scenes -- sort of. unfortunately, it is not done carefully enough to
avoid possible interference with the user's code:

'*tmp*' = 0
`*tmp*`
# 0

x = 1
names(x) = 'foo'
`*tmp*`
# error: object "*tmp*" not found

`*ugly*`

given that `*tmp*` is a perfectly legal (though some would say
'non-standard') name, it would be good if somewhere here a warning were
issued -- perhaps where i assign to `*tmp*`, because `*tmp*` is not just
any non-standard name, but one that is 'obviously' used under the hood
to perform black magic.

it also appears that the explanation given in, e.g., the r language
definition (draft, of course) sec. 3.4.4:

"
Assignment to subsets of a structure is a special case of a general
mechanism for complex assignment:
x[3:5] <- 13:15
The result of this commands is as if the following had been executed
`*tmp*` <- x
x <- "[<-"(`*tmp*`, 3:5, value=13:15)
"

is incomplete (because the final result is not '*tmp*' having the value
of x, as it might seem, but rather '*tmp*' having been unbound).

so the suggestion for the documenters is to add to the end of the
section (or wherever else it is appropriate) a warning to the effect
that in the end '*tmp*' will be removed, even if the user has explicitly
defined it earlier in the same scope.

or maybe have the implementation not rely on a user-forgeable name? for
example, the '.Last.value' name is automatically bound to the most
recently returned value, but it resides in package:base and does not
collide with bindings using it made by the user:

.Last.value = 0

1
.Last.value
# 0, not 1

1
base::.Last.value
# 1, not 0


why could not '*tmp*' be bound and unbound outside of the user's
namespace? (i guess it's easier to update the docs -- or just ignore the
issue.)


on the margin, trace('<-') will pick only one of the uses of '<-'
suggested by the code above:

x <- 1:10

trace('<-')
x[3:5] <- 13:15
# trace: x[3:5] <- 13:15
# trace: x <- `[<-`(`*tmp*`, 3:5, value = 13:15)

which is somewhat confusing, because then '*tmp*' appears in the trace
somewhat ex machina. (again, the explanation is in the source code, but
the trace output could have been more informative.)

cheers,
vQ




Thomas Lumley              Assoc. Professor, Biostatistics
tlum...@u.washington.edu   University of Washington, Seattle


Re: [Rd] surprising behaviour of names<-

2009-03-15 Thread Berwin A Turlach

G'day Wacek,

On Sun, 15 Mar 2009 21:01:33 +0100
Wacek Kusnierczyk  wrote:

> Berwin A Turlach wrote:
> >
> > Obviously, assuming that R really executes 
> > *tmp* <- x
> > x <- "names<-"('*tmp*', value=c("a","b"))
> > under the hood, in the C code, then *tmp* does not end up in the
> > symbol table and does not persist beyond the execution of 
> > names(x) <- c("a","b")
> >
> >   
> 
> to prove that i take you seriously, i have peeked into the code, and
> found that indeed there is a temporary binding for *tmp* made behind
> the scenes -- sort of. unfortunately, it is not done carefully enough
> to avoid possible interference with the user's code:
> 
> '*tmp*' = 0
> `*tmp*`
> # 0
> 
> x = 1
> names(x) = 'foo'
> `*tmp*`
> # error: object "*tmp*" not found
> 
> `*ugly*`

I agree, and I am a bit flabbergasted.  I had not expected that
something like this would happen and I am indeed not aware of anything
in the documentation that warns about this; but others may prove me
wrong on this.

> given that `*tmp*` is a perfectly legal (though some would say
> 'non-standard') name, it would be good if somewhere here a warning
> were issued -- perhaps where i assign to `*tmp*`, because `*tmp*` is
> not just any non-standard name, but one that is 'obviously' used
> under the hood to perform black magic.

Now I wonder whether there are any other objects (with non-standard
names) that can be nuked by operations performed under the hood.  

I guess the best thing is to stay away from non-standard names, if only
to save the typing of back-ticks. :)

Thanks for letting me know, I have learned something new today.

Cheers,

Berwin


Re: [Rd] surprising behaviour of names<-

2009-03-15 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:
>
> Obviously, assuming that R really executes 
>   *tmp* <- x
>   x <- "names<-"('*tmp*', value=c("a","b"))
> under the hood, in the C code, then *tmp* does not end up in the symbol
> table and does not persist beyond the execution of 
>   names(x) <- c("a","b")
>
>   

to prove that i take you seriously, i have peeked into the code, and
found that indeed there is a temporary binding for *tmp* made behind the
scenes -- sort of. unfortunately, it is not done carefully enough to
avoid possible interference with the user's code:

'*tmp*' = 0
`*tmp*`
# 0

x = 1
names(x) = 'foo'
`*tmp*`
# error: object "*tmp*" not found

`*ugly*`

given that `*tmp*` is a perfectly legal (though some would say
'non-standard') name, it would be good if somewhere here a warning were
issued -- perhaps where i assign to `*tmp*`, because `*tmp*` is not just
any non-standard name, but one that is 'obviously' used under the hood
to perform black magic.

it also appears that the explanation given in, e.g., the r language
definition (draft, of course) sec. 3.4.4:

"
Assignment to subsets of a structure is a special case of a general
mechanism for complex assignment:
x[3:5] <- 13:15
The result of this commands is as if the following had been executed
`*tmp*` <- x
x <- "[<-"(`*tmp*`, 3:5, value=13:15)
"

is incomplete (because the final result is not '*tmp*' having the value
of x, as it might seem, but rather '*tmp*' having been unbound).

so the suggestion for the documenters is to add to the end of the
section (or wherever else it is appropriate) a warning to the effect
that in the end '*tmp*' will be removed, even if the user has explicitly
defined it earlier in the same scope.
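
a minimal sketch of how one might check this without touching the global workspace. it evaluates the assignment in a scratch environment and only reports what it finds; `env` and `tmp_survived` are names invented for this illustration. the observation reported in this thread is that the user's binding is gone afterwards, but the outcome may depend on the R version and on whether the code is byte-compiled, so no result is asserted here.

```r
# Scratch environment so the experiment does not disturb the workspace.
env <- new.env()

# A user-defined `*tmp*` plus an ordinary variable.
assign("*tmp*", 0, envir = env)
assign("x", 1, envir = env)

# Run the complex assignment inside the scratch environment.
eval(quote(names(x) <- "foo"), env)

# Did the user's `*tmp*` survive? (inherits = FALSE: look only in env)
tmp_survived <- exists("*tmp*", envir = env, inherits = FALSE)
print(tmp_survived)

# The assignment itself should have worked as usual.
print(names(get("x", envir = env)))
```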

or maybe have the implementation not rely on a user-forgeable name? for
example, the '.Last.value' name is automatically bound to the most
recently returned value, but it resides in package:base and does not
collide with bindings using it made by the user:

.Last.value = 0

1
.Last.value
# 0, not 1

1
base::.Last.value
# 1, not 0

why could not '*tmp*' be bound and unbound outside of the user's
namespace? (i guess it's easier to update the docs -- or just ignore the
issue.)

on the margin, trace('<-') will pick only one of the uses of '<-'
suggested by the code above:

x <- 1:10

trace('<-')
x[3:5] <- 13:15
# trace: x[3:5] <- 13:15
# trace: x <- `[<-`(`*tmp*`, 3:5, value = 13:15)

which is somewhat confusing, because then '*tmp*' appears in the trace
somewhat ex machina. (again, the explanation is in the source code, but
the trace output could have been more informative.)

cheers,
vQ


Re: [Rd] surprising behaviour of names<-

2009-03-14 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:
> On Sat, 14 Mar 2009 07:22:34 +0100
> Wacek Kusnierczyk  wrote:
>
> [...]
>   
>>> Well, I don't see any new object created in my workspace after
>>> x <- 4
>>> names(x) <- "foo"
>>> Do you?
>>>   
>>>   
>> of course not.  that's why i'd say the two above are *not*
>> equivalent. 
>>
>> i haven't noticed the 'in the c code';  do you mean the r interpreter
>> actually generates, in the c code, such r expressions for itself to
>> evaluate?
>> 
>
> As I said before, I have little knowledge about how the parser works and
> what goes on under the hood; and I have also little time and
> inclination to learn about it.  
>
> But if you are interested in these details, then by all means invest
> the time to investigate.
>
>   

berwin, you're playing radio erewan now.  i talk about what the user
sees at the interface, and you talk about c code.  then you admit you
don't know the code, and suggest i examine it if i'm interested.  i
incidentally am, but the whole point was that the user should not be
forced to look under the hood to know the interface to a function. 
prefix 'names<-' seems to have a certain behaviour that is not properly
documented.

> Alternatively, you would hope that Simon eventually finishes the book
> that he is writing on programming in R; as I understand it, that book
> would explain part of these issues in details.  Hopefully, along with
> the book he makes the tools that he has for introspection available.
>   

simon:  i'd be happy to contribute in any way you might find useful.

>   
>>>> i guess you have looked under the hood;  point me to the relevant
>>>> code. 
 
>>> No I did not, because I am not interested in knowing such intimate
>>> details of R, but it seems you were interested.
>>>   
>>>   
>> yes, but then your claim about what happens under the hood, in the c
>> code, is a pure stipulation.  
>> 
>
> I made no claim about what is going on under the hood because I have no
> knowledge about these matters.  But, yes, I was speculating about what
> might go on.
>   

owe me a beer.

>   
>> and you got the example from the r language definition sec. 10.2,
>> which says the forms are equivalent, with no 'under the hood, in the
>> c code' comment.
>> 
>
> Trying to figure out what a writer/painter actually means/says beyond
> the explicitly stated/painted, something that is summed up in Australia
> (and other places) under the term "critical thinking", was not high in
> the curriculum of your school, was it? :-)
>   

sure, but probably not the way you seem to think about.  have you
incidentally read ferdydurke by gombrowicz? 


>   
>> you're just showing that your statements cannot be taken seriously.
>> 
>
> Usually, my statement can be taken seriously, unless followed by some
> indication that I said them tongue-in-cheek.  Of course, statements
> that I allegedly made but were in fact put into my mouth cannot, and
> should not, be taken seriously.
>   

i'm talking about your speculations about what the parser does (wrt.
infix and prefix forms having exactly the same parse tree), rather vague
statements such as "'names<-'(x,'foo') should create (more or less) a
parse tree equivalent to that expression", and other statements (surely,
qualified with 'assuming', 'strongly suggests', and the like), coupled
with your admitting that you in fact don't know what happens there, is
not particularly reassuring.
>   
>>>> yes, *if* you are able to predict the refcount of the object
>>>> passed to 'names<-' *then* you can predict what 'names<-' will do,
>>>> [...] 
 
>>> I think Simon pointed already out that you seem to have a wrong
>>> picture of what is going on.  [...]
>>>   
>> so what you quote effectively talks about a specific refcount
>> mechanism.  it's not refcount that would be used by the garbage
>> collector, but it's a refcount, or maybe refflag.
>> 
>
> Fair enough, if you call this a refcount then there is no problem.
> Whenever I came across the term refcount in my readings, it was
> referring to different mechanisms, typically mechanisms that kept exact
> track on how often an object was referred to.  So I would not call the
> value of the named field a refcount.  And we can agree to call it from
> now on a refcount as long as we realise what mechanism is really used.
>   

the major point of the discussion was that 'names<-' will sometimes
modify and othertimes copy its argument.  you chose to justify this by
looking under the hood, and i suppose you were pretty clear what i meant
by refcount, because it should have been clear from the context.

>  
>   
>> yes, that's my opinion:  the effects of implementation tricks should
>> not be observable by the user, because they can lead to hard to
>> explain and debug behaviour in the user's program.  you surely don't
>> suggest that all users consult the source code before writing
>> programs in r.
>> 
>
> Indeed, I am not suggesting

Re: [Rd] surprising behaviour of names<-

2009-03-14 Thread Berwin A Turlach

On Sat, 14 Mar 2009 07:22:34 +0100
Wacek Kusnierczyk  wrote:

[...]
> > Well, I don't see any new object created in my workspace after
> > x <- 4
> > names(x) <- "foo"
> > Do you?
> >   
> 
> of course not.  that's why i'd say the two above are *not*
> equivalent. 
> 
> i haven't noticed the 'in the c code';  do you mean the r interpreter
> actually generates, in the c code, such r expressions for itself to
> evaluate?

As I said before, I have little knowledge about how the parser works and
what goes on under the hood; and I have also little time and
inclination to learn about it.  

But if you are interested in these details, then by all means invest
the time to investigate.

Alternatively, you would hope that Simon eventually finishes the book
that he is writing on programming in R; as I understand it, that book
would explain part of these issues in details.  Hopefully, along with
the book he makes the tools that he has for introspection available.

> >> i guess you have looked under the hood;  point me to the relevant
> >> code. 
> >
> > No I did not, because I am not interested in knowing such intimate
> > details of R, but it seems you were interested.
> >   
> 
> yes, but then your claim about what happens under the hood, in the c
> code, is a pure stipulation.  

I made no claim about what is going on under the hood because I have no
knowledge about these matters.  But, yes, I was speculating about what
might go on.

> and you got the example from the r language definition sec. 10.2,
> which says the forms are equivalent, with no 'under the hood, in the
> c code' comment.

Trying to figure out what a writer/painter actually means/says beyond
the explicitly stated/painted, something that is summed up in Australia
(and other places) under the term "critical thinking", was not high in
the curriculum of your school, was it? :-)

> you're just showing that your statements cannot be taken seriously.

Usually, my statement can be taken seriously, unless followed by some
indication that I said them tongue-in-cheek.  Of course, statements
that I allegedly made but were in fact put into my mouth cannot, and
should not, be taken seriously.

> >> yes, *if* you are able to predict the refcount of the object
> >> passed to 'names<-' *then* you can predict what 'names<-' will do,
> >> [...] 
> >
> > I think Simon pointed already out that you seem to have a wrong
> > picture of what is going on.  [...]
>
> so what you quote effectively talks about a specific refcount
> mechanism.  it's not refcount that would be used by the garbage
> collector, but it's a refcount, or maybe refflag.

Fair enough, if you call this a refcount then there is no problem.
Whenever I came across the term refcount in my readings, it was
referring to different mechanisms, typically mechanisms that kept exact
track on how often an object was referred to.  So I would not call the
value of the named field a refcount.  And we can agree to call it from
now on a refcount as long as we realise what mechanism is really used.

> yes, that's my opinion:  the effects of implementation tricks should
> not be observable by the user, because they can lead to hard to
> explain and debug behaviour in the user's program.  you surely don't
> suggest that all users consult the source code before writing
> programs in r.

Indeed, I am not suggesting this.  Only users who use/rely on
features that are not sufficiently documented would have to study the
source code to find out what the exact behaviour is.  But, of course,
this could be fraught with danger since the behaviour could change
without warning.

> i have indeed learned what prefix 'names<-' does and now i know that
> the surprising behaviour is due to the observability of the internal
> optimization.
> 
> thanks to simon, peter, and you for your answers which allowed me to
> learn this ugly detail.

You are welcome.

Cheers,

Berwin


Re: [Rd] surprising behaviour of names<-

2009-03-14 Thread Thomas Lumley


On Fri, 13 Mar 2009, William Dunlap wrote:


Would it make anyone any happier if the manual said
that the replacement functions should not be called
in the form
  xNew <- `func<-` (xOld, value)
and should only be used as
  func(xToBeChanged) <- value
?


That was my reaction, too.  The discussion reminded me of old comp.lang.c 
threads about i=i++ and similar issues. The anomalies in
  xNew <- `func<-` (xOld, value) 
arise precisely because it isn't supposed to be used that way.
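
For illustration, a small sketch of the usage that wording would sanction. `second<-` is a made-up replacement function, not part of R, and `v` is just a sample vector; the point is only that the function is invoked through `second(v) <- value` and never called directly.

```r
# Hypothetical replacement function: sets the second element of a vector.
`second<-` <- function(x, value) {
  x[2] <- value
  x  # a replacement function must return the modified object
}

v <- 1:5
second(v) <- 10L  # the supported form: func(xToBeChanged) <- value
print(v)
```

Used this way, the question of whether the function's argument is modified in place or copied never becomes visible to the caller.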


My other proposal for 'rigidly defined areas of doubt and uncertainty' has been 
the evaluation order of the *apply family (eg, does apply process the columns 
left to right, or right to left, or however it feels like?).
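
A hedged sketch of how one could observe the order on a given installation. It records the visiting order through a side effect; `visited` is a name invented here. Whatever order it prints is an observation about one R build, not something code should rely on, which is the point of treating this as an area of doubt.

```r
# Record the order in which sapply() visits its input.
visited <- integer(0)
invisible(sapply(1:3, function(i) {
  visited <<- c(visited, i)  # side effect, used only to observe the order
  i
}))
print(visited)
```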


  -thomas

Thomas Lumley              Assoc. Professor, Biostatistics
tlum...@u.washington.edu   University of Washington, Seattle


Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:
> On Fri, 13 Mar 2009 19:41:42 +0100
> Wacek Kusnierczyk  wrote:
>
>
>   
>> indeed, you said "R supposedly uses call-by-value (though we know how
>> to circumvent that, don't we?)".
>>
>> in that vein, R supposedly can be used to do valid statistical
>> computations (though we know how to circumvent it) ;)
>> 
>
> Sure, use Excel? ;-)
>   

no, it has a buggy round

>  
>   
>>> Indeed, if you type these two commands on the command line, then it
>>> is not surprising that a copy of tmp is returned since you create a
>>> temporary object that ends up in the symbol table and persist after
>>> the commands are finished.
>>>   
>>>   
>> what does command line have to do with it?
>> 
>
> If you want to find out what goes on under the hood, it is not
> necessarily sufficient to do the same calculations on the command line.
>  
>   
>>> Obviously, assuming that R really executes 
>>> *tmp* <- x
>>> x <- "names<-"('*tmp*', value=c("a","b"))
>>> under the hood, in the C code, then *tmp* does not end up in the
>>> symbol table 
>>>   
>> no?
>> 
>
> Well, I don't see any new object created in my workspace after
>   x <- 4
>   names(x) <- "foo"
> Do you?
>   

of course not.  that's why i'd say the two above are *not* equivalent. 

i haven't noticed the 'in the c code';  do you mean the r interpreter
actually generates, in the c code, such r expressions for itself to
evaluate?


>   
>> i guess you have looked under the hood;  point me to the relevant
>> code.
>> 
>
> No I did not, because I am not interested in knowing such intimate
> details of R, but it seems you were interested.
>   

yes, but then your claim about what happens under the hood, in the c
code, is a pure stipulation.  and you got the example from the r
language definition sec. 10.2, which says the forms are equivalent, with
no 'under the hood, in the c code' comment.

you're just showing that your statements cannot be taken seriously.


>  
>   
>> yes, *if* you are able to predict the refcount of the object passed to
>> 'names<-' *then* you can predict what 'names<-' will do, [...] 
>> 
>
> I think Simon pointed already out that you seem to have a wrong
> picture of what is going on.  As far as I know, there is no refcount
> for objects.  
>
> The relevant documentation would be the R Internals manual, 1.1 SEXPs:
>
>   What R users think of as variables or objects are symbols which are
>   bound to a value. The value can be thought of as either a SEXP (a
>   pointer), or the structure it points to, a SEXPREC (and there are
>   alternative forms used for vectors, namely VECSXP pointing to
>   VECTOR_SEXPREC structures).
>
> and 1.1.2 Rest of header:
>
>   The named field is set and accessed by the SET_NAMED
>   and NAMED macros, and take values 0, 1 and 2. R has a `call by value'
>   illusion, so an assignment like
>
>   b <- a
>
>   appears to make a copy of a and refer to it as b. However, if neither
>   a nor b are subsequently altered there is no need to copy. What really
>   happens is that a new symbol b is bound to the same value as a and the
>   named field on the value object is set (in this case to 2). When an
>   object is about to be altered, the named field is consulted. A value
>   of 2 means that the object must be duplicated before being changed.
>   (Note that this does not say that it is necessary to duplicate, only
>   that it should be duplicated whether necessary or not.) A value of 0
>   means that it is known that no other SEXP shares data with this
>   object, and so it may safely be altered. A value of 1 is used for
>   situations like
>
>   dim(a) <- c(7, 2)
>
>   where in principle two copies of a exist for the duration of the
>   computation as (in principle)
>
>   a <- `dim<-`(a, c(7, 2))
>
>   but for no longer, and so some primitive functions can be optimized to
>   avoid a copy in this case. 
>
>   

so what you quote effectively talks about a specific refcount
mechanism.  it's not refcount that would be used by the garbage
collector, but it's a refcount, or maybe refflag.


>> and in general, this should not matter because it should be
>> unobservable, but it isn't.
>> 
>
> That's your opinion (to which you are entitled).  

yes, that's my opinion:  the effects of implementation tricks should not
be observable by the user, because they can lead to hard to explain and
debug behaviour in the user's program.  you surely don't suggest that
all users consult the source code before writing programs in r.


> Unfortunately (for
> you), the designers of R decided on a design which allows them to
> reduce the number of copies that have to be made.
>   

and that's excellent, only that they failed to hide the mechanism below
the interface.  or maybe they decided not to hide it?

> I was under the impression that you were interested to understand what
> happens if you issue the commands
>   names(x) <- "foo"
> and
>   "names<-"(x, "foo")
>

Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Berwin A Turlach

On Fri, 13 Mar 2009 19:41:42 +0100
Wacek Kusnierczyk  wrote:

> > Glad to see that we agree on this.
> >   
> 
> owe you a beer.

O.k., if we ever meet it is first your shout and then mine.

> >> haven't objected to that.  i object to your 'r uses pass by value',
> >> which is only partially correct.
> >> 
> >
> > Well, I used qualifiers and did not state it categorically. 
> >   
> 
> indeed, you said "R supposedly uses call-by-value (though we know how
> to circumvent that, don't we?)".
> 
> in that vein, R supposedly can be used to do valid statistical
> computations (though we know how to circumvent it) ;)

Sure, use Excel? ;-)

> > Indeed, if you type these two commands on the command line, then it
> > is not surprising that a copy of tmp is returned since you create a
> > temporary object that ends up in the symbol table and persist after
> > the commands are finished.
> >   
> 
> what does command line have to do with it?

If you want to find out what goes on under the hood, it is not
necessarily sufficient to do the same calculations on the command line.

> > Obviously, assuming that R really executes 
> > *tmp* <- x
> > x <- "names<-"('*tmp*', value=c("a","b"))
> > under the hood, in the C code, then *tmp* does not end up in the
> > symbol table 
> 
> no?

Well, I don't see any new object created in my workspace after
x <- 4
names(x) <- "foo"
Do you?

> i guess you have looked under the hood;  point me to the relevant
> code.

No I did not, because I am not interested in knowing such intimate
details of R, but it seems you were interested.

> yes, *if* you are able to predict the refcount of the object passed to
> 'names<-' *then* you can predict what 'names<-' will do, [...] 

I think Simon pointed already out that you seem to have a wrong
picture of what is going on.  As far as I know, there is no refcount
for objects.  

The relevant documentation would be the R Internals manual, 1.1 SEXPs:

  What R users think of as variables or objects are symbols which are
  bound to a value. The value can be thought of as either a SEXP (a
  pointer), or the structure it points to, a SEXPREC (and there are
  alternative forms used for vectors, namely VECSXP pointing to
  VECTOR_SEXPREC structures).

and 1.1.2 Rest of header:

  The named field is set and accessed by the SET_NAMED
  and NAMED macros, and take values 0, 1 and 2. R has a `call by value'
  illusion, so an assignment like

  b <- a

  appears to make a copy of a and refer to it as b. However, if neither
  a nor b are subsequently altered there is no need to copy. What really
  happens is that a new symbol b is bound to the same value as a and the
  named field on the value object is set (in this case to 2). When an
  object is about to be altered, the named field is consulted. A value
  of 2 means that the object must be duplicated before being changed.
  (Note that this does not say that it is necessary to duplicate, only
  that it should be duplicated whether necessary or not.) A value of 0
  means that it is known that no other SEXP shares data with this
  object, and so it may safely be altered. A value of 1 is used for
  situations like

  dim(a) <- c(7, 2)

  where in principle two copies of a exist for the duration of the
  computation as (in principle)

  a <- `dim<-`(a, c(7, 2))

  but for no longer, and so some primitive functions can be optimized to
  avoid a copy in this case. 
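
What the user sees of this mechanism can be sketched as follows. This is only an illustration of the `call by value' illusion at the R level (the variables `a` and `b` follow the quoted text, the values are made up); it does not inspect the named field itself.

```r
a <- c(1, 2, 3)
b <- a                        # no copy needs to happen at this point
names(b) <- c("x", "y", "z")  # b is altered, so it must stop sharing with a

print(names(a))  # a is unaffected by the change to b
print(names(b))
```

Whether and when a physical copy is made is the implementation's business; the visible contract is only that changing b leaves a alone.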

> but in general you may not have the chance. [...]

Agreed.

> and in general, this should not matter because it should be
> unobservable, but it isn't.

That's your opinion (to which you are entitled).  Unfortunately (for
you), the designers of R decided on a design which allows them to
reduce the number of copies that have to be made.

> >> you suggested that "One reads the manual, (...) one reflects and
> >> investigates, ..."
> >> 
> >
> > Indeed, and I am not giving up hope that one day you will master
> > this art.
> >   
> 
> well, this time i meant you.

Rest assured I have read and reflected on that part of the manual.  

And I guess it boils down to how you interpret what "is equivalent to"
means.

For me it means that those two commands are what is executed in the C
engine once the "names(x)<-c("a","b")" expression is parsed and the
parse list arrives at the interpreter.  To investigate whether that is
the case, one would have to look at the C code, and I have little
inclination to do so.  But that would be necessary to answer the
question whether *tmp* or a copy of *tmp* is returned, if one is really
interested in this question.  Or whether a *tmp* object is created at
all.

You seem to take "is equivalent to" to mean that issuing
"names(x)<-c("a","b")" on the command line has the same effect as
issuing those two other commands on the command line and addressing
whether *tmp* or a copy of *tmp* is returned in this case.  Fair
enough, but it addresses a different question.  And, as you said
yourself in a

Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Wacek Kusnierczyk

Tony Plate wrote:
> Wacek Kusnierczyk wrote:
>> Tony Plate wrote:
>>
>>> Is there anything incorrect or missing in the help page for normal
>>> usage of the replacement function for 'names'? (i.e., when used in an
>>> expression like 'names(x) <- ...')
>>> 
>>
>> what is missing here in the first place is a specification of what
>> 'normal' means.  as far as i can see from the man page, 'normal' does
>> not exclude prefix use.  and if so, what is missing in the help page is
>> a clear statement what an application of 'names<-' will do, in the sense
>> of what a user may observe.
>>   
> Fair enough.  I looked at the help page for "names" after sending my
> email, and was surprised to see the following in the "DETAILS" section:
>
>   "It is possible to update just part of the names attribute via the
> general rules: see the examples. This works because the expression
> there is evaluated as |z <- "names<-"(z, "[<-"(names(z), 3, "c2"))|. "
>
> To me, this paragraph is far more confusing than enlightening,
> especially as it also gives the impression that it's OK to use a
> replacement function in a functional form.  In my own personal opinion
> it would be an enhancement to remove that example from the
> documentation, and just say you can do things like 'names(x)[2:3] <-
> c("a","b")'.
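
for reference, a small sketch comparing the two forms mentioned in that paragraph. the vector `z` and the helper names `z1` and `z2` are made up for the illustration; the expanded form is the one quoted from the help page, with its result assigned to a new name rather than relied on for side effects.

```r
z <- c(a = 1, b = 2, c = 3)

# Form recommended for everyday use.
z1 <- z
names(z1)[3] <- "c2"

# Expanded form quoted from the help page, with the result assigned explicitly.
z2 <- `names<-`(z, `[<-`(names(z), 3, "c2"))

print(identical(z1, z2))
```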

i must say that this part of the man page does explain things to me. 
much less the code [1] berwin suggested as a piece to read and
investigate (slightly modified):

tmp = x
x = 'names<-'(tmp, 'foo')

berwin's conclusion seemed to be that this code
hints/suggests/fortune-tells the user that 'names<-' might be doing side
effects. 

this code illustrates what names(x) = 'foo' (the infix form) does --
that it destructively modifies x.  now, if the code were to illustrate
that the prefix form does perform side effects too, then the following
would be enough:

'names<-'(x, 'foo')

if the code were to illustrate that the prefix form, unlike the infix
form, does not perform side effects, then the following would suffice
for a discussion:

x = 'names<-'(x, 'foo')

if the code were to illustrate that the prefix form may or may not do
side effects depending on the situation, then it surely fails to show
that, unless the user performs some sophisticated inference which i am
not capable of, or, more likely, unless the user already knows that this
was to be shown.

without a discussion, the example is simply unworked rubbish.  and
it's obviously wrong; it says that (slightly and irrelevantly simplified)

names(x) = 'foo'

"is equivalent to"

tmp = x
x = 'names<-'(tmp, 'foo')

which is nonsense, because in the latter case you either have an
additional binding that you don't have in the former case, or, worse,
you rebind, possibly with a different value, a name that has had a
binding already.  it's a gritty-nitty detail, but so is most of
statistics based on nitty-gritty details which non-statisticians are
happy to either ignore or be ignorant about.

[1] http://stat.ethz.ch/R-manual/R-devel/doc/manual/R-lang.html#Comments

>
> I often use name replacement functions in a functional way, and
> because one can't use 'names<-' etc in this way, 

note, this 'because' does not follow in any way from the man page, or
the section of 'r language definition' referred to above.

> I define my own functions like the following:
>
> set.names <- function(n,x) {names(x) <- n; x}

it appears that

set.names = function(n, x) 'names<-'(x, n)

would do the job (guess why).
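
For readers who want to compare the two definitions above side by side, here is a minimal sketch. The names set.names.infix and set.names.prefix are made up for this illustration, and the comments describe what current R releases are expected to do, not a guarantee taken from the help page.

```r
# Tony Plate's wrapper: infix replacement on a local copy, then return it.
set.names.infix <- function(n, x) { names(x) <- n; x }

# The one-line prefix variant suggested in the reply.
set.names.prefix <- function(n, x) `names<-`(x, n)

v <- c(1, 2, 3)
a <- set.names.infix(c("a", "b", "c"), v)
b <- set.names.prefix(c("a", "b", "c"), v)

identical(a, b)   # both wrappers return the same named vector
names(v)          # the caller's v is expected to stay unnamed
```

In both cases the caller works with the returned object, so nothing depends on whether the argument was modified in place.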

>
> (and similarly for set.rownames(), set.colnames(), etc.)
>
> I would highly recommend you do this rather than try to use a call
> like "names<-"(x, ...).

i'm almost tempted to extend your recommendation to 'define your own
function for about every function already in r' ;)

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Tony Plate


Wacek Kusnierczyk wrote:
> Tony Plate wrote:
>> Wacek Kusnierczyk wrote:
>>> [snip]
>>> i just can't get it why the manual does not manifestly explain what
>>> 'names<-' does, and leaves you doing the guesswork you suggest.
>>
>> I'm having trouble understanding the point of this discussion.
>> Someone is calling a replacement function in a way that it's not meant
>> to be used, and is then complaining about it not doing what he thinks
>> it should, or about the documentation not describing what happens when
>> one does that?
>
> where is it written that the function is not meant to be used this way?
> you get an example in the man page, showing precisely how it could be
> used that way.  it also explains the value of 'names<-':
>
> "
>  For 'names<-', the updated object.  (Note that the value of
>  'names(x) <- value' is that of the assignment, 'value', not the
>  return value from the left-hand side.)
> "
>
> it does speak of 'names<-' used in prefix form, and does not do it in
> any negative (discouraging) way.
>
>> Is there anything incorrect or missing in the help page for normal
>> usage of the replacement function for 'names'? (i.e., when used in an
>> expression like 'names(x) <- ...')
>
> what is missing here in the first place is a specification of what
> 'normal' means.  as far as i can see from the man page, 'normal' does
> not exclude prefix use.  and if so, what is missing in the help page is
> a clear statement what an application of 'names<-' will do, in the sense
> of what a user may observe.

Fair enough.  I looked at the help page for "names" after sending my 
email, and was surprised to see the following in the "DETAILS" section:


  "It is possible to update just part of the names attribute via the 
general rules: see the examples. This works because the expression there 
is evaluated as |z <- "names<-"(z, "[<-"(names(z), 3, "c2"))|. "


To me, this paragraph is far more confusing than enlightening, 
especially as it also gives the impression that it's OK to use a 
replacement function in a functional form.  In my own personal opinion 
it would be an enhancement to remove that example from the documentation, 
and just say you can do things like 'names(x)[2:3] <- c("a","b")'.
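
The expansion quoted from the help page can be checked directly. A small sketch, with z as a made-up example vector:

```r
z <- c(a = 1, b = 2, c = 3)

# Infix form, as a user would normally write it.
z1 <- z
names(z1)[3] <- "c2"

# Functional expansion quoted from the help page, spelled with backticks.
z2 <- `names<-`(z, `[<-`(names(z), 3, "c2"))

identical(z1, z2)   # the two forms give the same object
names(z1)
```

The comparison only uses the returned objects; it says nothing about what either form does to its argument.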


I often use name replacement functions in a functional way, and because 
one can't use 'names<-' etc in this way, I define my own functions like 
the following:


set.names <- function(n,x) {names(x) <- n; x}

(and similarly for set.rownames(), set.colnames(), etc.)

I would highly recommend you do this rather than try to use a call like 
"names<-"(x, ...).


-- Tony Plate

(I guess that if on the label of the fridge there is a picture of a guy 
carrying it on his back, then Mr. Fridge-Racer might have some grounds 
for suing.)



Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Wacek Kusnierczyk

Tony Plate wrote:
> Wacek Kusnierczyk wrote:
>> [snip]
>> i just can't get it why the manual does not manifestly explain what
>> 'names<-' does, and leaves you doing the guesswork you suggest.
>>
>>   
> I'm having trouble understanding the point of this discussion. 
> Someone is calling a replacement function in a way that it's not meant
> to be used, and is then complaining about it not doing what he thinks
> it should, or about the documentation not describing what happens when
> one does that?

where is it written that the function is not meant to be used this way? 
you get an example in the man page, showing precisely how it could be
used that way.  it also explains the value of 'names<-':

"
 For 'names<-', the updated object.  (Note that the value of
 'names(x) <- value' is that of the assignment, 'value', not the
 return value from the left-hand side.)
"

it does speak of 'names<-' used in prefix form, and does not do it in
any negative (discouraging) way.
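
The distinction drawn in the quoted Value section can be seen in a short sketch (x, v1 and v2 are illustrative names only):

```r
x <- c(1, 2)

# Infix assignment used as an expression: its value is the right-hand side.
v1 <- (names(x) <- c("a", "b"))

# Prefix call: its value is the updated object.
v2 <- `names<-`(c(1, 2), c("a", "b"))

v1          # the character vector c("a", "b")
v2          # a named numeric vector
names(x)    # x itself was updated by the infix form
```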

>
> Is there anything incorrect or missing in the help page for normal
> usage of the replacement function for 'names'? (i.e., when used in an
> expression like 'names(x) <- ...')

what is missing here in the first place is a specification of what
'normal' means.  as far as i can see from the man page, 'normal' does
not exclude prefix use.  and if so, what is missing in the help page is
a clear statement what an application of 'names<-' will do, in the sense
of what a user may observe.

>
> R does give one the ability to use its facilities in non-standard
> ways.  However, I don't see much value in the help page for 'gun'
> attempting to describe the ways in which the bones in your foot will
> be shattered should you choose to point the gun at your foot and pull
> the trigger.  Reminds me of the story of the guy in New York, who
> after injuring his back in a refrigerator-carrying race, sued the
> manufacturer of the refrigerator for not having a warning label
> against that sort of use.

very funny.  little relevant.

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Wacek Kusnierczyk

William Dunlap wrote:
> Would it make anyone any happier if the manual said
> that the replacement functions should not be called
> in the form
>    xNew <- `func<-` (xOld, value)
> and should only be used as
>    func(xToBeChanged) <- value
>   

surely better than guesswork.

> ? 
>
> The explanation
>   names(x) <- c("a","b")
>   is equivalent to
>   `*tmp*` <- x
>   x <- "names<-"(`*tmp*`, value=c("a","b"))
> could also be extended a bit, adding a line like
>   rm(`*tmp*`)
> Those 3 lines should be considered an atomic operation:
> the value that `*tmp*` or `x` may have or what is
> in the symbol table at various points in that sequence 
> is not defined.  (Letting details be explicitly undefined
> is important: it gives developers room to improve the
> efficiency of the interpreter and tells users where not to go.) 
>   

there is a difference between letting things be undefined and explicitly
stating that things are unspecified.  the c99 standard [1], for example,
is explicit about the non-determinism of expressions that involve side
effects, as it is about the fact that some expressions may actually not
be evaluated if the optimizer decides so.

berwin has already suggested that one reads from what docs do *not*
say;  it's a very bad idea.  it's best that the documentation *does* say
that, for example, a particular function should be used only in the
infix form because the semantics of the prefix form are not guaranteed
and may change in future versions.

if the current state is that 'names<-' will modify the object it is
given as an argument in some situations, but not in others, and this is
visible to the user, the best thing to do is to give an explicit warning
-- perhaps with an annotation that things may change, if they may.

best,
vQ

[1] http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf


Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Tony Plate


Wacek Kusnierczyk wrote:
> [snip]
> i just can't get it why the manual does not manifestly explain what
> 'names<-' does, and leaves you doing the guesswork you suggest.

I'm having trouble understanding the point of this discussion.  Someone 
is calling a replacement function in a way that it's not meant to be 
used, and is then complaining about it not doing what he thinks it 
should, or about the documentation not describing what happens when one 
does that?


Is there anything incorrect or missing in the help page for normal usage 
of the replacement function for 'names'? (i.e., when used in an 
expression like 'names(x) <- ...')


R does give one the ability to use its facilities in non-standard ways.  
However, I don't see much value in the help page for 'gun' attempting to 
describe the ways in which the bones in your foot will be shattered 
should you choose to point the gun at your foot and pull the trigger.  
Reminds me of the story of the guy in New York, who after injuring his 
back in a refrigerator-carrying race, sued the manufacturer of the 
refrigerator for not having a warning label against that sort of use.


-- Tony Plate


Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread William Dunlap

Would it make anyone any happier if the manual said
that the replacement functions should not be called
in the form
   xNew <- `func<-` (xOld, value)
and should only be used as
   func(xToBeChanged) <- value
? 

The explanation
  names(x) <- c("a","b")
  is equivalent to
  `*tmp*` <- x
  x <- "names<-"(`*tmp*`, value=c("a","b"))
could also be extended a bit, adding a line like
  rm(`*tmp*`)
Those 3 lines should be considered an atomic operation:
the value that `*tmp*` or `x` may have or what is
in the symbol table at various points in that sequence 
is not defined.  (Letting details be explicitly undefined
is important: it gives developers room to improve the
efficiency of the interpreter and tells users where not to go.) 
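
As a rough, user-level sketch of the three-line expansion described above (this only mimics the documented equivalence by hand; it is not a claim about what the interpreter does internally, and y is a made-up second variable used for comparison):

```r
# Infix form.
x <- c(1, 2)
names(x) <- c("a", "b")

# The documented expansion written out by hand, including the extra rm().
y <- c(1, 2)
`*tmp*` <- y
y <- `names<-`(`*tmp*`, value = c("a", "b"))
rm(list = "*tmp*")

identical(x, y)                     # same end result for x and y
exists("*tmp*", inherits = FALSE)   # no temporary left behind after rm()
```

Only the state after the last line is meant to be compared; the intermediate values are exactly the part the proposal would leave undefined.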


Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com  

> -Original Message-
> From: r-devel-boun...@r-project.org 
> [mailto:r-devel-boun...@r-project.org] On Behalf Of Wacek Kusnierczyk
> Sent: Friday, March 13, 2009 11:42 AM
> To: Berwin A Turlach
> Cc: r-devel@r-project.org List
> Subject: Re: [Rd] surprising behaviour of names<-
> ... blah blah blah
> >> x = 1
> >> tmp = x
> >> x = 'names<-'(tmp, 'foo')
> >> names(tmp)
> >> # NULL


Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:
>
>> sure!
>> 
>
> Glad to see that we agree on this.
>   

owe you a beer.

>   
>>> Read section 2.1.10 ("Environments") in the R
>>> Language Definition, 
>>>   
>> haven't objected to that.  i object to your 'r uses pass by value',
>> which is only partially correct.
>> 
>
> Well, I used qualifiers and did not state it categorically. 
>   

indeed, you said "R supposedly uses call-by-value (though we know how to
circumvent that, don't we?)".

in that vein, R supposedly can be used to do valid statistical
computations (though we know how to circumvent it) ;)

>  
>   
>>>> and actually, in the example we discuss, 'names<-' does *not*
>>>> return an updated *tmp*, so there's even less to entertain.  

>>> How do you know?  Are you sure?  Have you by now studied what goes
>>> on under the hood?
>>>   
>> yes, a bit.  but in this example, it's enough to look into *tmp* to
>> see that it hasn't got the names added, and since x does have names,
>> names<- must have returned a copy of *tmp* rather than *tmp* changed:
>>
>> x = 1
>> tmp = x
>> x = 'names<-'(tmp, 'foo')
>> names(tmp)
>> # NULL
>> 
>
> Indeed, if you type these two commands on the command line, then it is
> not surprising that a copy of tmp is returned since you create a
> temporary object that ends up in the symbol table and persists after the
> commands are finished.
>   

what does command line have to do with it?

> Obviously, assuming that R really executes 
>   `*tmp*` <- x
>   x <- "names<-"(`*tmp*`, value=c("a","b"))
> under the hood, in the C code, then *tmp* does not end up in the symbol
> table 

no?

> and does not persist beyond the execution of 
>   names(x) <- c("a","b")
>   

no?

i guess you have looked under the hood;  point me to the relevant code.

> This looks to me as one of the situations where a value of 1 is used
> for the named field of some of the objects involved so that a copy can
> be avoided.  That's why I asked whether you looked under the hood.
>   

anyway, what happens under the hood is much less interesting from the
user's perspective than what can be seen over the hood.  what i can see,
is that 'names<-' will incoherently perform in-place modification or
copy-on-assignment. 

yes, *if* you are able to predict the refcount of the object passed to
'names<-' *then* you can predict what 'names<-' will do, but in general
you may not have the chance.  and in general, this should not matter
because it should be unobservable, but it isn't.

back to your i += i++ example, the outcome may differ from compiler to
compiler, but, i guess, compilers will implement the order coherently,
so that whatever version they choose, the outcome will be predictable,
and not dependent on some earlier code.  (prove me wrong.  or maybe i'll
do it myself.)

>   
>> you suggested that "One reads the manual, (...) one reflects and
>> investigates, ..."
>> 
>
> Indeed, and I am not giving up hope that one day you will master this
> art.
>   

well, this time i meant you.

>   
>> -- had you done it, you wouldn't have asked the  question.
>> 
>
> Sorry, I forgot that you have a tendency to interpret statements
> extremely verbatim 

yes, i have two hooks installed:  one says \begin{verbatim}, the other
says \end{verbatim}.

> and with little reference to the context in which
> they are made.  

not that you're trying to be extremely accurate or polite here...

> I will try to be more explicit in future.
>   

it will certainly do good to you.

>>
>> i just can't get it why the manual does not manifestly explain what
>> 'names<-' does, and leaves you doing the guesswork you suggest.
>> 
>
> As I said before, patches to documentation are also welcome.
>   

i'll give it a try.

> Best wishes,
>   

hope you mean it.

likewise,
vQ


Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Berwin A Turlach

On Fri, 13 Mar 2009 11:43:55 +0100
Wacek Kusnierczyk  wrote:

> Berwin A Turlach wrote:
>
> > And it is documented behaviour.  
> 
> sure!

Glad to see that we agree on this.

> > Read section 2.1.10 ("Environments") in the R
> > Language Definition, 
> 
> haven't objected to that.  i object to your 'r uses pass by value',
> which is only partially correct.

Well, I used qualifiers and did not state it categorically. 

> >> and actually, in the example we discuss, 'names<-' does *not*
> >> return an updated *tmp*, so there's even less to entertain.  
> >> 
> >
> > How do you know?  Are you sure?  Have you by now studied what goes
> > on under the hood?
> 
> yes, a bit.  but in this example, it's enough to look into *tmp* to
> see that it hasn't got the names added, and since x does have names,
> names<- must have returned a copy of *tmp* rather than *tmp* changed:
>
> x = 1
> tmp = x
> x = 'names<-'(tmp, 'foo')
> names(tmp)
> # NULL

Indeed, if you type these two commands on the command line, then it is
not surprising that a copy of tmp is returned since you create a
temporary object that ends up in the symbol table and persists after the
commands are finished.

Obviously, assuming that R really executes 
`*tmp*` <- x
x <- "names<-"(`*tmp*`, value=c("a","b"))
under the hood, in the C code, then *tmp* does not end up in the symbol
table and does not persist beyond the execution of 
names(x) <- c("a","b")

This looks to me as one of the situations where a value of 1 is used
for the named field of some of the objects involved so that a copy can
be avoided.  That's why I asked whether you looked under the hood.

> you suggested that "One reads the manual, (...) one reflects and
> investigates, ..."

Indeed, and I am not giving up hope that one day you will master this
art.

> -- had you done it, you wouldn't have asked the  question.

Sorry, I forgot that you have a tendency to interpret statements
extremely verbatim and with little reference to the context in which
they are made.  I will try to be more explicit in future.

> >> for fun and more guesswork, the example could have been:
> >>
> >> x = x
> >> x = 'names<-'(x, value=c('a', 'b'))
> >> 
> >
> > But it is manifestly not written that way in the manual; and for
> > > good reasons since 'names<-' might have side effects which invoke
> > > undefined behaviour in the last line.  Just as in the equivalent C
> > snippet that I mentioned.
> 
> i just can't get it why the manual does not manifestly explain what
> 'names<-' does, and leaves you doing the guesswork you suggest.

As I said before, patches to documentation are also welcome.

Best wishes,

Berwin


Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:
>
>> foo = function(arg) arg$foo = foo
>>
>> e = new.env()
>> foo(e)
>> e$foo
>>   
>> are you sure this is pass by value?
>> 
>
> But that is what environments are for, aren't they?  

might be.

> And it is
> documented behaviour.  

sure!

> Read section 2.1.10 ("Environments") in the R
> Language Definition, 

haven't objected to that.  i object to your 'r uses pass by value',
which is only partially correct.

> in particular the last paragraph:
>
>   Unlike most other R objects, environments are not copied when 
>   passed to functions or used in assignments.  Thus, if you assign the
>   same environment to several symbols and change one, the others will
>   change too.  In particular, assigning attributes to an environment can
>   lead to surprises.
>
> [..]
>   
>> and actually, in the example we discuss, 'names<-' does *not* return
>> an updated *tmp*, so there's even less to entertain.  
>> 
>
> How do you know?  Are you sure?  Have you by now studied what goes on
> under the hood?
>   

yes, a bit.  but in this example, it's enough to look into *tmp* to see
that it hasn't got the names added, and since x does have names, names<-
must have returned a copy of *tmp* rather than *tmp* changed:
   
x = 1
tmp = x
x = 'names<-'(tmp, 'foo')
names(tmp)
# NULL

you suggested that "One reads the manual, (...) one reflects and
investigates, ..." -- had you done it, you wouldn't have asked the question.



>   
>> for fun and more guesswork, the example could have been:
>>
>> x = x
>> x = 'names<-'(x, value=c('a', 'b'))
>> 
>
> But it is manifestly not written that way in the manual; and for good
> reasons since 'names<-' might have side effects which invoke
> undefined behaviour in the last line.  Just as in the equivalent C snippet
> that I mentioned.
>   

i just can't get it why the manual does not manifestly explain what
'names<-' does, and leaves you doing the guesswork you suggest.

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Berwin A Turlach

On Thu, 12 Mar 2009 21:26:15 +0100
Wacek Kusnierczyk  wrote:

> > YMMV, but when I read a passage like this in R documentation, I
> > start to wonder why it is stated that 
> > names(x) <- c("a","b")
> > is equivalent to 
> > `*tmp*` <- x
> > x <- "names<-"(`*tmp*`, value=c("a","b"))
> > and the simpler construct
> > x <- "names<-"(x, value=c("a", "b"))
> > is not used.  There must be a reason, 
> 
> got an explanation:  because it probably is as drafty as the
> aforementioned document.

Your grasp of what "draft manual" means in the context of R
documentation seems to be as tenuous as the grasp of intelligent
design/creationist proponents on what it means in science to label a
body of knowledge a "(scientific) theory". :)

[...]
> but it is possible to send an argument to a function that makes an
> assignment to the argument, and yet the assignment is made to the
> original, not to a copy:
> 
> foo = function(arg) arg$foo = foo
> 
> e = new.env()
> foo(e)
> e$foo
>   
> are you sure this is pass by value?

But that is what environments are for, aren't they?  And it is
documented behaviour.  Read section 2.1.10 ("Environments") in the R
Language Definition, in particular the last paragraph:

  Unlike most other R objects, environments are not copied when 
  passed to functions or used in assignments.  Thus, if you assign the
  same environment to several symbols and change one, the others will
  change too.  In particular, assigning attributes to an environment can
  lead to surprises.
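
A minimal sketch of the difference the quoted paragraph describes, contrasting an environment with an ordinary list (set_foo, e and l are names invented for this illustration):

```r
# A function that assigns into its argument.
set_foo <- function(arg) { arg$foo <- "changed"; invisible(NULL) }

# With an environment, the caller's object is modified.
e <- new.env()
set_foo(e)
e$foo

# With a list, the function only modifies its own copy.
l <- list()
set_foo(l)
l$foo
```

After the two calls, e$foo holds the assigned string while l$foo is still NULL.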

[..]
> and actually, in the example we discuss, 'names<-' does *not* return
> an updated *tmp*, so there's even less to entertain.  

How do you know?  Are you sure?  Have you by now studied what goes on
under the hood?

> for fun and more guesswork, the example could have been:
> 
> x = x
> x = 'names<-'(x, value=c('a', 'b'))

But it is manifestly not written that way in the manual; and for good
reasons since 'names<-' might have side effects which invoke
undefined behaviour in the last line.  Just as in the equivalent C snippet
that I mentioned.

> for your interest in well written documentation, ?names says that the
> argument x is 'an r object', and nowhere does it say that environment
> is not an r object.  it also says what the value of 'names<-' applied
> to pairlists is.  the following error message is doubly surprising:
> 
> e = new.env()
> 'names<-'(e, 'foo')
> # Error: names() applied to a non-vector

But names are implemented by assigning a "names" attribute to the
object; as you should know.  And the above documentation suggests that
it is not a good idea to assign attributes to environments.  So why
would you expect this to work?

> firstly, because it would seem that there's nothing wrong in applying
> names to an environment;  from ?'$':
> 
> "
> x$name
> 
> name: A literal character string or a name (possibly backtick
>   quoted).  For extraction, this is normally (see under
>   'Environments') partially matched to the 'names' of the
>   object.
> "

I fail to see the relevance of this.

> secondly, because, as ?names says, names can be applied to pairlists,

Yes, but it does not say that names can be applied to environment.
And it explicitly says that the "default methods get and set the
'"names"' attribute of..." and (other) documentation warns you about
setting attributes on environments.

> which are not vectors, and the following does not give an error as
> above:
> 
> p = pairlist()
> is.vector(p)
> # FALSE
> names(p)
> # names successfully applied to a non-vector
>
> assure me this is not a mess, but a well-documented design feature.

It is documented; whether it is well-documented depends on your
definition of "well-documented". :)

> ... and one wonders why r man pages have to be read in O(e^n) time.

I believe patches to documentation are also welcome; and perhaps more
readily accepted than patches to code. 

[...]  
> >>> I guess that would require a rewrite (or extension) of the parser.
> >>> To me, Section 10.1.2 of the Language Definition manual suggests
> >>> that once an expression is parsed, you cannot distinguish any more
> >>> whether 'names<-' was called using infix syntax or prefix syntax.
> >>>   
> >>>   
> >> but this must be nonsense, since:
> >>
> >> x = 1
> >> 'names<-'(x, 'foo')
> >> names(x)
> >> # NULL
> >>
> >> x = 1
> >> names(x) <- 'foo'
> >> names(x)
> >> # "foo"
> >>
> >> clearly, there is not only syntactic difference here.  but it
> >> might be that 10.1.2 does not suggest anything like what you say.
> >> 
> >
> > Please tell me how this example contradicts my reading of 10.1.2
> > that the expressions 
> > 'names<-'(x, 'foo')
> > and
> > names(x) <- 'foo'
> > once they are parsed, produce exactly the same parse tree and that
> > it becomes impossible to tell from the parse tree whether
> > originally the infix syntax or

Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Wacek Kusnierczyk

G. Jay Kerns wrote:
> Wacek Kusnierczyk wrote:
>
>
>   
> I am prompted to imagine someone pointing out to the volunteers of the
> International Red Cross - on the field of a natural disaster, no less
> - that their uniforms are not an acceptably consistent shade of
> pink... or that the screws on their tourniquets do not have the
> appropriate pitch as to minimize the friction for the turner...
>
>   

not that it is very accurate, because unintuitive and confusing
semantics may lead to hidden and dangerous errors in users' code.  wrong
shade of a uniform might lead to the person being shot, for example, but
then your point vanishes.

> As a practicing statistician I am simply thankful that the bleeding is
> stopped.   :-)
>   

when it is stopped, not turned to an internal bleeding, which you simply
don't see.

> Cheers to R-Core (and the hundreds of other volunteers).
>
>   

absolutely.

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Josh Ulrich

On Thu, Mar 12, 2009 at 3:24 PM, G. Jay Kerns  wrote:
> Wacek Kusnierczyk wrote:
>
> [snip]
>
>> as i explained a few months ago, i study r to find examples of bad
>> design.  if anyone in the r core is interested in having the problems i
>> report fixed, i'm happy to get involved in a discussion about the design
>> and implementation.  if not, i'm happy with just pointing out the issues.
>
> :-)
>
> I am prompted to imagine someone pointing out to the volunteers of the
> International Red Cross - on the field of a natural disaster, no less
> - that their uniforms are not an acceptably consistent shade of
> pink... or that the screws on their tourniquets do not have the
> appropriate pitch as to minimize the friction for the turner...

Your analogy may overstate the case a bit, since R volunteers - while
providing a valuable service to the community - are not dealing with
matters of life and death.

Habitat for Humanity (an organization that provides free housing to
the under-privileged) would be a better comparison.  I'm sure those
volunteers would appreciate a critique of their work, provided the
critique was not condescending and focused on serving the community
better, not to showcase the acumen of the one giving the critique.

>
> As a practicing statistician I am simply thankful that the bleeding is
> stopped.   :-)
>
> Cheers to R-Core (and the hundreds of other volunteers).
> Jay
>

I second that.  Thanks to R-Core et al for all their generous efforts.

>
>
> ***
> G. Jay Kerns, Ph.D.
> Associate Professor
> Department of Mathematics & Statistics
> Youngstown State University
> Youngstown, OH 44555-0002 USA
> Office: 1035 Cushwa Hall
> Phone: (330) 941-3310 Office (voice mail)
> -3302 Department
> -3170 FAX
> E-mail: gke...@ysu.edu
> http://www.cc.ysu.edu/~gjkerns/
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

Best,
Josh
--
http://quantemplation.blogspot.com


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Wacek Kusnierczyk

Simon Urbanek wrote:
>
> On Mar 12, 2009, at 11:12 , Wacek Kusnierczyk wrote:
>
>> Simon Urbanek wrote:
>>>
>>> On Mar 11, 2009, at 10:52 , Simon Urbanek wrote:
>>>
>>>> Wacek,
>>>>
>>>> Peter gave you a full answer explaining it very well. If you really
>>>> want to be able to trace each instance yourself, you have to learn
>>>> far more about R internals than you apparently know (and Peter hinted
>>>> at that). Internally x=1 and x=c(1) are slightly different in that the
>>>> former has NAMED(x) = 2 whereas the latter has NAMED(x) = 0 which is
>>>> what causes the difference in behavior as Peter explained. The reason
>>>> is that c(1) creates a copy of the 1 (which is a constant
>>>> [=immutable] thus requiring a copy) and the new copy has no other
>>>> references and thus can be modified and hence NAMED(x) = 0.
>>>>
>>>
>>> Errata: to be precise replace NAMED(x) = 0 with NAMED(x) = 1 above --
>>> since NAMED(c(1)) = 0 and once it's assigned to x it becomes NAMED(x)
>>> = 1 -- this is just a detail on how things work with assignment, the
>>> explanation above is still correct since duplication happens
>>> conditional on NAMED == 2.
>>
>> there is an interesting corollary.  self-assignment seems to increase
>> the reference count:
>>
>>    x = 1;  'names<-'(x, 'foo'); names(x)
>>    # NULL
>>
>>    x = 1;  x = x;  'names<-'(x, 'foo'); names(x)
>>    # "foo"
>>
>
> Not for me, at least in current R:

not for me either.  i messed up the example, sorry.  here's the intended
version:

x = c(1);  'names<-'(x, 'foo');  names(x)
# "foo"

x = c(1);  x = x; 'names<-'(x, 'foo');  names(x)
# NULL

>
> > x = 1;  'names<-'(x, 'foo'); names(x)
> foo
>   1
> NULL
> > x = 1;  x = x;  'names<-'(x, 'foo'); names(x)
> foo
>   1
> NULL
>
> (both R 2.8.1 and R-devel 3/11/09, darwin 9.6)
>
> In addition, you still got it backwards - your output suggests that
> the assignment created a new, clean copy. Functional call of `names<-`
> (whose side-effect on x is undefined BTW) is destructive when you get
> a clean copy (e.g. as a result of the c function) and non-destructive
> when the object was referenced. It is left as an exercise to the
> reader to reason why constants such as 1 are referenced.

all true, again because of my mistake. 
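
Given that the side effect on x is described above as undefined, a defensive pattern is to rely only on the returned object. A minimal sketch (y is an illustrative name; no claim is made here about what happens to x on any particular R version):

```r
x <- c(1)

# Rely on the returned object, which always carries the new names.
y <- `names<-`(x, "foo")
names(y)

# Whether x itself was touched by the call above has varied with R's
# internal bookkeeping, so portable code should not depend on names(x) here.
```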

anyway, it may be surprising that with all its smartness (i mean it)
about copy-on-assignment, r does not see that it makes no sense to
increase refcount here.  of course, you can't judge from just the
syntactic form 'x=x', but still it should not be very difficult to have
the interpreter see when it finds an object named 'x' in the same
environment where it attempts the assignment.  (of course, who'd do
self-assignments in practical code?)

cheers,
vQ


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:
> On Thu, 12 Mar 2009 15:21:50 +0100
> Wacek Kusnierczyk  wrote:
>
>   
>> seems to suggest?  is not the purpose of documentation to clearly,
>> ideally beyond any doubt, specify what is to be specified?
>> 
>
> The R Language Definition manual is still a draft. :)
>   

this is indeed a good explanation for all sorts of nonsense.  worse if
stuff tends to persist despite critique.

>   
>>> that in this case the infix and prefix syntax
>>> is not equivalent as it does not say that 
>>>   
>>>   
>> are you suggesting fortune telling from what the docs do *not* say?
>> 
>
> My experience is that sometimes you have to realise what is not
> stated.  

in general, yes.  in r, this often ends up with 'have you seen the
documentation saying that??' in response.

> I remember a discussion with somebody who asked why he could
> not run, on windows, R CMD INSTALL on a *.zip file.  I pointed out to
> him that the documentation states that you can run R CMD INSTALL on
> *.tar.gz or *.tgz files and, thus, there should be no expectation that
> it can be run on *.zip file.
>   

yes, that's a good point.  this reminds me of a (possibly anecdotal)
lady who sued the manufacturer of her microwave after she had dried her
cat in it after a bath.

> YMMV, but when I read a passage like this in R documentation, I start
> to wonder why it is stated that 
>   names(x) <- c("a","b")
> is equivalent to 
>   `*tmp*` <- x
>   x <- "names<-"(`*tmp*`, value=c("a","b"))
> and the simpler construct
>   x <- "names<-"(x, value=c("a", "b"))
> is not used.  There must be a reason, 

got an explanation:  because it probably is as drafty as the
aforementioned document.

> nobody likes to type
> unnecessarily long code.  And, after thinking about this for a while,
> the penny might drop.
>   

that's cool.  instead of stating what 'names<-' does or does not, one
expresses it in a convoluted way and makes you guess from a *tmp*
variable. a nice exercise, i like it.

> [...] 
>   
>>>> does this say anything about what 'names<-'(...) actually
>>>> returns?  updated *tmp*, or a copy of it?
>>>>
>>> Since R uses pass-by-value, 
>>>   
>> since?  it doesn't!
>> 
>
> For all practical purposes it is as long as standard evaluation is
> used.  One just has to be aware that some functions evaluate their
> arguments in a non-standard way.  
>   

it's maybe a bit of hairsplitting, but what you have in r is not exactly
what is called 'pass by value'.  here's a relevant quote from [1], p. 309:

"
In the call-by-name (CBN) mechanism, a formal parameter names the
computation designated by an unevaluated argument expression.

In the call-by-value (CBV) mechanism, a formal parameter names the value
of an evaluated argument expression.

In the call-by-need or lazy evaluation (CBL), the formal parameter name
can be bound to a location that originally stores the computation of the
argument expression. The first time the parameter is referenced, the
computation is performed, but the resulting value is cached at the
location and is used on every subsequent reference. Thus, the argument
expression is evaluated at most once and is never evaluated at all if
the parameter is never referenced.
"

note the 'unevaluated' and 'evaluated'.  you're free to have your pick. 
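
a small sketch of the call-by-need part in r itself.  the helper names
(noisy, unused, twice) are made up for illustration and come neither
from the book nor from this thread:

```r
# count how often the argument expression is actually evaluated
counter <- 0
noisy <- function() {
  counter <<- counter + 1
  42
}

unused <- function(arg) "arg never touched"  # never references arg
twice  <- function(arg) arg + arg            # references arg two times

unused(noisy())
after_unused <- counter      # 0: the argument was never evaluated

result <- twice(noisy())
after_twice <- counter       # 1: evaluated once, then the value is reused

print(c(after_unused, after_twice, result))  # 0 1 84
```

the argument expression is evaluated at most once, and not at all when
the parameter is never referenced, which is the CBL case of the quote.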

but it is possible to send an argument to a function that makes an
assignment to the argument, and yet the assignment is made to the
original, not to a copy:

foo = function(arg) arg$foo = foo

e = new.env()
foo(e)
e$foo

are you sure this is pass by value?
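
to make the contrast explicit, here is a sketch that sends an
environment and a list through the same function.  set_flag is a
hypothetical helper, not something from base r:

```r
# the same function body, applied to an environment and to a list
set_flag <- function(arg) {
  arg$flag <- TRUE
  invisible(NULL)
}

e <- new.env()
l <- list()

set_flag(e)
set_flag(l)

print(exists("flag", envir = e, inherits = FALSE))  # TRUE: e was changed
print(is.null(l$flag))                              # TRUE: l was not
```

the assignment inside the function reaches the caller's environment
object, while the caller's list is left as it was.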

it appears that r has a pass-by-need mechanism that dispatches to
pass-by-value or pass-by-reference depending on the type of the object. 
with this semantics, all sorts of mess are possible, and 'names<-'
provides one example.

[1] design concepts in programming languages, turbak and gifford, mit
press 2008

> [...]
>   
>>> If you entertain the idea that 'names<-' updates *tmp* and
>>> returns the updated *tmp*, then you believe that 'names<-' behaves
>>> in a non-standard way and should take appropriate care. 
>>>   
>> i got lost in your argumentation.  [..]
>> 
>
> I was commenting on "does this say anything about what 'names<-'(...)
> actually returns?  updated *tmp*, or a copy of it?"
>
> As I said, if you entertain the idea that 'names<-' returns an updated
> *tmp*, then you believe that 'names<-' behaves in a non-standard way
> and appropriate care has to be taken.
>
>   

i can check, by experimentation, whether 'names<-' returns a copy or the
original; even if i can establish that it returns the original after
having modified it, it's not something to entertain.  maybe you
entertain the idea of your users performing the guesswork instead of
reading an unambiguous specification.  you have already said that you
don't care if your users get confused, it would fit the image.

and actually, in the example we discuss, 'names<-' does *not*

Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread G. Jay Kerns

Wacek Kusnierczyk wrote:

[snip]

> as i explained a few months ago, i study r to find examples of bad
> design.  if anyone in the r core is interested in having the problems i
> report fixed, i'm happy to get involved in a discussion about the design
> and implementation.  if not, i'm happy with just pointing out the issues.

:-)

I am prompted to imagine someone pointing out to the volunteers of the
International Red Cross - on the field of a natural disaster, no less
- that their uniforms are not an acceptably consistent shade of
pink... or that the screws on their tourniquets do not have the
appropriate pitch as to minimize the friction for the turner...

As a practicing statistician I am simply thankful that the bleeding is
stopped.   :-)

Cheers to R-Core (and the hundreds of other volunteers).
Jay



***
G. Jay Kerns, Ph.D.
Associate Professor
Department of Mathematics & Statistics
Youngstown State University
Youngstown, OH 44555-0002 USA
Office: 1035 Cushwa Hall
Phone: (330) 941-3310 Office (voice mail)
-3302 Department
-3170 FAX
E-mail: gke...@ysu.edu
http://www.cc.ysu.edu/~gjkerns/


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Simon Urbanek

On Mar 12, 2009, at 11:12 , Wacek Kusnierczyk wrote:

> Simon Urbanek wrote:
>>
>> On Mar 11, 2009, at 10:52 , Simon Urbanek wrote:
>>
>>> Wacek,
>>>
>>> Peter gave you a full answer explaining it very well. If you really
>>> want to be able to trace each instance yourself, you have to learn
>>> far more about R internals than you apparently know (and Peter hinted
>>> at that). Internally x=1 and x=c(1) are slightly different in that the
>>> former has NAMED(x) = 2 whereas the latter has NAMED(x) = 0 which is
>>> what causes the difference in behavior as Peter explained. The reason
>>> is that c(1) creates a copy of the 1 (which is a constant
>>> [=immutable] thus requiring a copy) and the new copy has no other
>>> references and thus can be modified and hence NAMED(x) = 0.
>>
>> Errata: to be precise replace NAMED(x) = 0 with NAMED(x) = 1 above --
>> since NAMED(c(1)) = 0 and once it's assigned to x it becomes NAMED(x)
>> = 1 -- this is just a detail on how things work with assignment, the
>> explanation above is still correct since duplication happens
>> conditional on NAMED == 2.
>
> there is an interesting corollary.  self-assignment seems to increase
> the reference count:
>
>    x = 1;  'names<-'(x, 'foo'); names(x)
>    # NULL
>
>    x = 1;  x = x;  'names<-'(x, 'foo'); names(x)
>    # "foo"

Not for me, at least in current R:

> x = 1;  'names<-'(x, 'foo'); names(x)
foo
  1
NULL
> x = 1;  x = x;  'names<-'(x, 'foo'); names(x)
foo
  1
NULL

(both R 2.8.1 and R-devel 3/11/09, darwin 9.6)

In addition, you still got it backwards - your output suggests that  
the assignment created a new, clean copy. Functional call of `names<-`  
(whose side-effect on x is undefined BTW) is destructive when you get  
a clean copy (e.g. as a result of the c function) and non-destructive  
when the object was referenced. It is left as an exercise to the  
reader to reason why constants such as 1 are referenced.

Cheers,
Simon


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Berwin A Turlach

On Thu, 12 Mar 2009 15:21:50 +0100
Wacek Kusnierczyk  wrote:

[...]   
> >>> And the R Language manual (ignoring for the moment that it is a
> >>> draft and all that), 
> >>>   
> >> since we must...
> >>
> >> 
> >>> clearly states that 
> >>>
> >>>   names(x) <- c("a","b")
> >>>
> >>> is equivalent to
> >>>   
> >>>   `*tmp*` <- x
> >>>  x <- "names<-"(`*tmp*`, value=c("a","b"))
> >>>   
> >>>   
> >> ... and?  
> >> 
> >
> > This seems to suggest 
> 
> seems to suggest?  is not the purpose of documentation to clearly,
> ideally beyond any doubt, specify what is to be specified?

The R Language Definition manual is still a draft. :)

> > that in this case the infix and prefix syntax
> > is not equivalent as it does not say that 
> >   
> 
> are you suggesting fortune telling from what the docs do *not* say?

My experience is that sometimes you have to realise what is not
stated.  I remember a discussion with somebody who asked why he could
not run, on windows, R CMD INSTALL on a *.zip file.  I pointed out to
him that the documentation states that you can run R CMD INSTALL on
*.tar.gz or *.tgz files and, thus, there should be no expectation that
it can be run on *.zip file.

YMMV, but when I read a passage like this in R documentation, I start
to wonder why it is stated that 
names(x) <- c("a","b")
is equivalent to 
`*tmp*` <- x
x <- "names<-"(`*tmp*`, value=c("a","b"))
and the simpler construct
x <- "names<-"(x, value=c("a", "b"))
is not used.  There must be a reason, nobody likes to type
unnecessarily long code.  And, after thinking about this for a while,
the penny might drop.

[...] 
> >> does this say anything about what 'names<-'(...) actually
> >> returns?  updated *tmp*, or a copy of it?
> >> 
> >
> > Since R uses pass-by-value, 
> 
> since?  it doesn't!

For all practical purposes it is as long as standard evaluation is
used.  One just has to be aware that some functions evaluate their
arguments in a non-standard way.  

[...]
> > If you entertain the idea that 'names<-' updates *tmp* and
> > returns the updated *tmp*, then you believe that 'names<-' behaves
> > in a non-standard way and should take appropriate care. 
> 
> i got lost in your argumentation.  [..]

I was commenting on "does this say anything about what 'names<-'(...)
actually returns?  updated *tmp*, or a copy of it?"

As I said, if you entertain the idea that 'names<-' returns an updated
*tmp*, then you believe that 'names<-' behaves in a non-standard way
and appropriate care has to be taken.

> > And the fact that a variable *tmp* is used hints at the fact that
> > 'names<-' might have side effects.  
> 
> are you suggesting fortune telling from the fact that a variable *tmp*
> is used?

Nothing to do with fortune telling.  One reads the manual, one wonders
why is this construct used instead of an apparently much more simple
one, one reflects and investigates, one realises why the given
construct is stated as the equivalent: because "names<-" has
side-effects.
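
A minimal sketch of the replacement-function mechanism, using a made-up
replacement function `second<-` (not part of R) so that nothing depends
on the internals of "names<-".  It only checks that the infix form and
the explicit prefix call followed by an assignment leave the variable
with the same value:

```r
# hypothetical replacement function: sets the second element of a vector
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

# infix form
v <- c(1, 2, 3)
second(v) <- 10

# explicit prefix call, with the result assigned back
w <- c(1, 2, 3)
w <- `second<-`(w, value = 10)

print(v)                 # 1 10 3
print(identical(v, w))   # TRUE
```

Note that the prefix call on its own, without the assignment back to w,
would only return the modified value; for an ordinary R function like
this one it would not change w.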

> > This is similar to the discussion what value i should have in the
> > following C snippet:
> > i = 0;
> > i += i++;
> >   
> 
> nonsense, it's a *completely* different issue.  here you touch the
> issue of the order of evaluation, and not of whether an object is
> copied or modified;  above, the inverse is true.

Sorry, there was a typo above.  The second statement should have been
i = i++;

Then on some abstract level they are the same; an object appears on the
left hand side of an assignment but is also modified in the expression
assigned to it.  So what value should it end up with?

> >> why?  you can still use the infix names<- with destructive
> >> semantics to avoid copying. 
> >> 
> >
> > I guess that would require a rewrite (or extension) of the parser.
> > To me, Section 10.1.2 of the Language Definition manual suggests
> > that once an expression is parsed, you cannot distinguish any more
> > whether 'names<-' was called using infix syntax or prefix syntax.
> >   
> 
> but this must be nonsense, since:
> 
> x = 1
> 'names<-'(x, 'foo')
> names(x)
> # NULL
> 
> x = 1
> names(x) <- 'foo'
> names(x)
> # "foo"
> 
> clearly, there is not only syntactic difference here.  but it might be
> that 10.1.2 does not suggest anything like what you say.

Please tell me how this example contradicts my reading of 10.1.2 that
the expressions 
'names<-'(x, 'foo')
and
names(x) <- 'foo'
once they are parsed, produce exactly the same parse tree and that it
becomes impossible to tell from the parse tree whether originally the
infix syntax or the prefix syntax was used.  In fact, the last sentence
in section 10.1.2 strongly suggests to me that the parse tree stores
all function calls as if prefix notation was used.  But it is probably
my English again.
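
One way to look at what the parser actually stores is quote().  The
sketch below uses only base R; the variable names are illustrative.  It
suggests that an ordinary operator such as + has the same parsed form
in infix and prefix notation, whereas the parsed form of
names(x) <- "foo" is a call to `<-`, so the call to `names<-` is only
constructed when the assignment is evaluated:

```r
# an ordinary operator: infix and prefix notation parse to the same call
infix_plus  <- quote(1 + 2)
prefix_plus <- quote(`+`(1, 2))
print(identical(infix_plus, prefix_plus))     # TRUE

# the replacement form: the two expressions parse to different calls
infix_names  <- quote(names(x) <- "foo")
prefix_names <- quote(`names<-`(x, "foo"))
print(as.character(infix_names[[1]]))         # "<-"
print(as.character(prefix_names[[1]]))        # "names<-"
print(identical(infix_names, prefix_names))   # FALSE
```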

> > Thus, I guess you want to start a discussion with R Core whether it
> > is worthwhile to change t

Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Wacek Kusnierczyk

Simon Urbanek wrote:
>
> On Mar 11, 2009, at 10:52 , Simon Urbanek wrote:
>
>> Wacek,
>>
>> Peter gave you a full answer explaining it very well. If you really
>> want to be able to trace each instance yourself, you have to learn
>> far more about R internals than you apparently know (and Peter hinted
>> at that). Internally x=1 and x=c(1) are slightly different in that the
>> former has NAMED(x) = 2 whereas the latter has NAMED(x) = 0 which is
>> what causes the difference in behavior as Peter explained. The reason
>> is that c(1) creates a copy of the 1 (which is a constant
>> [=immutable] thus requiring a copy) and the new copy has no other
>> references and thus can be modified and hence NAMED(x) = 0.
>>
>
> Errata: to be precise replace NAMED(x) = 0 with NAMED(x) = 1 above --
> since NAMED(c(1)) = 0 and once it's assigned to x it becomes NAMED(x)
> = 1 -- this is just a detail on how things work with assignment, the
> explanation above is still correct since duplication happens
> conditional on NAMED == 2.

there is an interesting corollary.  self-assignment seems to increase
the reference count:

x = 1;  'names<-'(x, 'foo'); names(x)
# NULL

x = 1;  x = x;  'names<-'(x, 'foo'); names(x)
# "foo"

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Wacek Kusnierczyk

Wacek Kusnierczyk wrote:
> Berwin A Turlach wrote:
>   
>
>> This is similar to the discussion what value i should have in the
>> following C snippet:
>>  i = 0;
>>  i += i++;
>>   
>> 
>
>
> in fact, your example is useless because the result here is clearly
> specified by the semantics (as far as i know -- prove me wrong).  you
> lookup i (0) and i (0) (the order does not matter here), add these
> values (0), assign to i (0), and increase i (1). 
>   

i'm happy to prove myself wrong.  the c programming language, 2nd ed. by
kernighan and ritchie, has the following discussion:

"
One unhappy situation is typified by the statement

a[i] = i++;

The question is whether the subscript is the old value of i or the new.
Compilers can interpret
this in different ways, and generate different answers depending on
their interpretation. The
standard intentionally leaves most such matters unspecified.
"

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:
> On Thu, 12 Mar 2009 10:53:19 +0100
> Wacek Kusnierczyk  wrote:
>
>   
>> well, ?'names<-' says:
>>
>> "
>> Value:
>>  For 'names<-', the updated object. 
>> "
>>
>> which is only partially correct, in that the value will sometimes be
>> an updated *copy* of the object.
>> 
>
> But since R supposedly 

*supposedly*

> uses call-by-value (though we know how to
> circumvent that, don't we?) 

we know how a lot of built-ins hack around this, don't we, and we also
know that call-by-value is not really the argument passing mechanism in r.

> wouldn't you always expect that a copy of
> the object is returned?
>   

indeed!  that's what i have said previously, no?  there is still space
for the smart (i mean it) copy-on-assignment behaviour, but it should
not be visible to the user, in particular, not in that 'names<-'
destructively modifies the object it is given when the refcount is 1. 
in my humble opinion, there is either a design flaw or a bug here.

>  
>   
>>> And the R Language manual (ignoring for the moment that it is a
>>> draft and all that), 
>>>   
>> since we must...
>>
>> 
>>> clearly states that 
>>>
>>> names(x) <- c("a","b")
>>>
>>> is equivalent to
>>> 
>>> `*tmp*` <- x
>>>  x <- "names<-"(`*tmp*`, value=c("a","b"))
>>>   
>>>   
>> ... and?  
>> 
>
> This seems to suggest 

seems to suggest?  is not the purpose of documentation to clearly,
ideally beyond any doubt, specify what is to be specified?

> that in this case the infix and prefix syntax
> is not equivalent as it does not say that 
>   

are you suggesting fortune telling from what the docs do *not* say?

>   names(x) <- c("a","b")
> is equivalent to
>   x <- "names<-"(x, value=c("a","b"))
> and I was commenting on the claim that the infix syntax is equivalent
> to the prefix syntax.
>
>   
>> does this say anything about what 'names<-'(...) actually
>> returns?  updated *tmp*, or a copy of it?
>> 
>
> Since R uses pass-by-value, 

since?  it doesn't!

> you would expect the latter, wouldn't
> you?  

yes, that's what i'd expect in a functional language.

> If you entertain the idea that 'names<-' updates *tmp* and
> returns the updated *tmp*, then you believe that 'names<-' behaves in a
> non-standard way and should take appropriate care.
>   

i got lost in your argumentation.  i have given examples of where
'names<-' destructively modifies and returns the updated object, not a
copy.  what is your point here?

> And the fact that a variable *tmp* is used hints at the fact that
> 'names<-' might have side effects.  

are you suggesting fortune telling from the fact that a variable *tmp*
is used?

> If 'names<-' has side effects,
> then it might not be well defined with what value x ends up with if
> one executes:
>   x <- 'names<-'(x, value=c("a","b"))  
>   

not really, unless you mean the returned object in the referential sense
(memory location) versus value conceptually.  here x will obviously have
the value of the original x plus the names, *but* indeed you cannot tell
from this snippet whether after the assignment x will be the same,
though updated, object or will rather be an updated copy:

x = c(1)
x = 'names<-'(x, 'foo')
# x is the same object

x = c(1)
y = x
x = 'names<-'(x, 'foo')
# x is another object

so, as you say, it is not well defined with what object will x end up as
its value, though the value of the object visible to the user is well
defined.  rewrite the above and play:

x = c(1)
y = 'names<-'(x, 'foo')
names(x)

what are the names of x?  is y identical (sensu reference) with x, is y
different (sensu reference) but indiscernible (sensu value) from x, or
is y different (sensu value) from x in that y has names and x doesn't?

> This is similar to the discussion what value i should have in the
> following C snippet:
>   i = 0;
>   i += i++;
>   

nonsense, it's a *completely* different issue.  here you touch the issue
of the order of evaluation, and not of whether an object is copied or
modified;  above, the inverse is true.

in fact, your example is useless because the result here is clearly
specified by the semantics (as far as i know -- prove me wrong).  you
lookup i (0) and i (0) (the order does not matter here), add these
values (0), assign to i (0), and increase i (1). 

i have a better example for you:

int i = 0;
i += ++i - ++i;

which will give different final values for i in c (2 with gcc 4.2, 1
with gcc 3.4), c# and java (-1), perl (2) and php (1).  again, this has
nothing to do with the above.

>  
> [..]
>   
>>> I am not sure whether R ever behaved in that way, but as Peter
>>> pointed out, this would be quite undesirable from a memory
>>> management and performance point of view.  
>>>   
>> why?  you can still use the infix names<- with destructive semantics
>> to avoid copying. 
>> 
>
> I guess that would require a rewrite (or extension) of the par

Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Berwin A Turlach

On Thu, 12 Mar 2009 10:53:19 +0100
Wacek Kusnierczyk  wrote:

> well, ?'names<-' says:
> 
> "
> Value:
>  For 'names<-', the updated object. 
> "
> 
> which is only partially correct, in that the value will sometimes be
> an updated *copy* of the object.

But since R supposedly uses call-by-value (though we know how to
circumvent that, don't we?) wouldn't you always expect that a copy of
the object is returned?

> > And the R Language manual (ignoring for the moment that it is a
> > draft and all that), 
> 
> since we must...
> 
> > clearly states that 
> >
> > names(x) <- c("a","b")
> >
> > is equivalent to
> > 
> > `*tmp*` <- x
> >  x <- "names<-"(`*tmp*`, value=c("a","b"))
> >   
> 
> ... and?  

This seems to suggest that in this case the infix and prefix syntax
is not equivalent as it does not say that 
names(x) <- c("a","b")
is equivalent to
x <- "names<-"(x, value=c("a","b"))
and I was commenting on the claim that the infix syntax is equivalent
to the prefix syntax.

> does this say anything about what 'names<-'(...) actually
> returns?  updated *tmp*, or a copy of it?

Since R uses pass-by-value, you would expect the latter, wouldn't
you?  If you entertain the idea that 'names<-' updates *tmp* and
returns the updated *tmp*, then you believe that 'names<-' behaves in a
non-standard way and should take appropriate care.

And the fact that a variable *tmp* is used hints at the fact that
'names<-' might have side effects.  If 'names<-' has side effects,
then it might not be well defined with what value x ends up with if
one executes:
x <- 'names<-'(x, value=c("a","b"))  

This is similar to the discussion what value i should have in the
following C snippet:
i = 0;
i += i++;

[..]
> > I am not sure whether R ever behaved in that way, but as Peter
> > pointed out, this would be quite undesirable from a memory
> > management and performance point of view.  
> 
> why?  you can still use the infix names<- with destructive semantics
> to avoid copying. 

I guess that would require a rewrite (or extension) of the parser.  To
me, Section 10.1.2 of the Language Definition manual suggests that once
an expression is parsed, you cannot distinguish any more whether
'names<-' was called using infix syntax or prefix syntax.

Thus, I guess you want to start a discussion with R Core whether it is
worthwhile to change the parser such that it keeps track of whether a
function was used with infix notation or prefix notation and to
provide for most (all?) assignment operators implementations that use
destructive semantics if the infix version was used and always copy if
the prefix notation is used. 

Cheers,

Berwin


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:
>
> Whoever said that must have been at that moment not as precise as he or
> she could have been.  Also, R does not behave according to what people
> say on this list (which is good, because sometimes people say wrong
> things on this list) but according to how it is documented to do; at
> least that is what people on this list (and others) say. :)
>   

well, ?'names<-' says:

"
Value:
 For 'names<-', the updated object. 
"

which is only partially correct, in that the value will sometimes be an
updated *copy* of the object.

> And the R Language manual (ignoring for the moment that it is a draft
> and all that), 

since we must...

> clearly states that 
>
>   names(x) <- c("a","b")
>
> is equivalent to
>   
>   `*tmp*` <- x
>  x <- "names<-"(`*tmp*`, value=c("a","b"))
>   

... and?  does this say anything about what 'names<-'(...) actually
returns?  updated *tmp*, or a copy of it?


> [...]
>   
>> well, i can imagine a user using the prefix 'names<-' precisely under
>> the assumption that it will perform functionally;  
>> 
>
> You mean
>   y <- 'names<-'(x, "foo")
> instead of
>   y <- x
>   names(y) <- "foo"
> ?
>   

what i mean is, rather precisely, that 'names<-'(x, 'foo') will produce
a *new* object with a copy of the value of x and names as specified, and
will *not*, under any circumstances, modify x.

the first line above does not quite address this, e.g.:

x = c(1)
y = 'names<-'(x, 'foo')
names(x)
# "foo", 'should' be NULL


> Fair enough.  But I would still prefer the latter version since it is
> (for me) easier to read and to decipher the intention of the code.
>   

you're welcome to use it.  but this is personal preference, and i'm
trying to discuss the semantics of r here.  what you show is a way to
clutter the code, and you need to explicitly name the new object, while,
in functional programming, it is typical to operate on anonymous objects
passed from one function to another, e.g.

f('names<-'(x, 'foo'))

which would have to become

y = x
names(y) = 'foo'
f(y)

or

f({y = x; names(y) = 'foo'; y})

with 'y' being a nuisance name.
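
one way to avoid the extra name, assuming the stats package is attached
(it is by default), is setNames(), which returns a renamed copy of its
argument.  show_names below is a hypothetical stand-in for f:

```r
# hypothetical stand-in for f: just reports the names it receives
show_names <- function(v) names(v)

x <- c(1)

# pass an anonymous, renamed copy of x straight into the function
received <- show_names(setNames(x, "foo"))

print(received)   # "foo"
print(names(x))   # NULL: x itself is left untouched
```

setNames() does the renaming on its own local argument, so the caller's
x keeps its (missing) names regardless of how x was created.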


>> i.e., 'names<-'(x, 'foo') will always produce a copy of x with the
>> new names, and never change the x.  
>> 
>
> I am not sure whether R ever behaved in that way, but as Peter pointed
> out, this would be quite undesirable from a memory management and
> performance point of view.  

why?  you can still use the infix names<- with destructive semantics to
avoid copying. 


> Imagine that every time you modify a (name)
> component of a large object a new copy of that object is created.
>   

see above.  besides, r has been several times claimed here (but see your
remark above) to be a functional language, and in this context it is
surprising that the smart (i mean it) copy-on-assignment mechanism,
which is an implementational optimization, not only becomes visible, but
also makes functions (hmm, procedures?) such as 'names<-' non-functional
-- in some, but not all, cases.

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Berwin A Turlach

On Thu, 12 Mar 2009 10:05:36 +0100
Wacek Kusnierczyk  wrote:

> well, as far as i remember, it has been said on this list that in r
> the infix syntax is equivalent to the prefix syntax, [...]

Whoever said that must have been at that moment not as precise as he or
she could have been.  Also, R does not behave according to what people
say on this list (which is good, because sometimes people say wrong
things on this list) but according to how it is documented to do; at
least that is what people on this list (and others) say. :)

And the R Language manual (ignoring for the moment that it is a draft
and all that), clearly states that 

names(x) <- c("a","b")

is equivalent to

`*tmp*` <- x
x <- "names<-"(`*tmp*`, value=c("a","b"))

[...]
> well, i can imagine a user using the prefix 'names<-' precisely under
> the assumption that it will perform functionally;  

You mean
y <- 'names<-'(x, "foo")
instead of
y <- x
names(y) <- "foo"
?

Fair enough.  But I would still prefer the latter version since it is
(for me) easier to read and to decipher the intention of the code.

> i.e., 'names<-'(x, 'foo') will always produce a copy of x with the
> new names, and never change the x.  

I am not sure whether R ever behaved in that way, but as Peter pointed
out, this would be quite undesirable from a memory management and
performance point of view.  Imagine that every time you modify a (name)
component of a large object a new copy of that object is created.

> cheers, and thanks for the discussion.

You are welcome.

Cheers,

Berwin


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Wacek Kusnierczyk

Wacek Kusnierczyk wrote:
>
> is precisely why i'd think that the prefix 'names<-' should never do
> destructive modifications, because that's what x = 'names<-'(x, 'foo'),
> and thus also names(x) = 'foo', is for.
>
>   

to make the point differently, i'd expect the following two to be
equivalent:

x = c(1); 'names<-'(x, 'foo'); names(x)
# "foo"

x = c(1); do.call('names<-', list(x, 'foo')); names(x)
# NULL

but they're obviously not.  and of course, just that i'd expect it is
not a strong argument.
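
a hedged sketch of what can be relied upon here: the values returned by
the two call styles agree.  whether x itself is touched by a direct
prefix call depends on the internals discussed in this thread (and
possibly on the r version), so the sketch deliberately makes no claim
about names(x) after such a call:

```r
x <- c(1)

# call the replacement function indirectly, through do.call
via_do_call <- do.call("names<-", list(x, "foo"))

# call it directly in prefix form, on a fresh vector
via_prefix <- `names<-`(c(1), "foo")

print(names(via_do_call))                  # "foo"
print(identical(via_do_call, via_prefix))  # TRUE
```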

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:
> On Wed, 11 Mar 2009 20:29:14 +0100
> Wacek Kusnierczyk  wrote:
>
>   
>> Simon Urbanek wrote:
>> 
>>> Wacek,
>>>
>>> Peter gave you a full answer explaining it very well. If you really
>>> want to be able to trace each instance yourself, you have to learn
>>> far more about R internals than you apparently know (and Peter
>>> hinted at that). Internally x=1 and x=c(1) are slightly different in
>>> that the former has NAMED(x) = 2 whereas the latter has NAMED(x) =
>>> 0 which is what causes the difference in behavior as Peter
>>> explained. The reason is that c(1) creates a copy of the 1 (which
>>> is a constant [=immutable] thus requiring a copy) and the new copy
>>> has no other references and thus can be modified and hence NAMED(x)
>>> = 0.
>>>   
>> simon, thanks for the explanation, it's now as clear as i might
>> expect.
>>
>> now i'm concerned with what you say:  that to understand something
>> visible to the user one needs to "learn far more about R internals
>> than one apparently knows".  your response suggests that to use r
>> without confusion one needs to know the internals, 
>> 
>
> Simon can probably speak for himself, but according to my reading he
> has not suggested anything similar to what you suggest he suggested. :)
>   

so i did not say *he* suggested this.  'your response suggests' does
not, on my reading, imply any intention from simon's side.  but it's you
who is an expert in (a dialect of) english, so i won't argue.


>   
>> and this would be a really bad thing to say.. 
>> 
>
> No problems, since he did not say anything vaguely similar to what you
> suggest he said.
>   

let's not depart from the point.

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:
> On Wed, 11 Mar 2009 20:31:18 +0100
> Wacek Kusnierczyk  wrote:
>
>   
>> Simon Urbanek wrote:
>> 
>>> On Mar 11, 2009, at 10:52 , Simon Urbanek wrote:
>>>
>>>   
>>>> Wacek,
>>>>
>>>> Peter gave you a full answer explaining it very well. If you really
>>>> want to be able to trace each instance yourself, you have to learn
>>>> far more about R internals than you apparently know (and Peter
>>>> hinted at that). Internally x=1 and x=c(1) are slightly different
>>>> in that the former has NAMED(x) = 2 whereas the latter has
>>>> NAMED(x) = 0 which is what causes the difference in behavior as
>>>> Peter explained. The reason is that c(1) creates a copy of the 1
>>>> (which is a constant [=immutable] thus requiring a copy) and the
>>>> new copy has no other references and thus can be modified and
>>>> hence NAMED(x) = 0.
>>>>
>>> Errata: to be precise replace NAMED(x) = 0 with NAMED(x) = 1 above
>>> -- since NAMED(c(1)) = 0 and once it's assigned to x it becomes
>>> NAMED(x) = 1 -- this is just a detail on how things work with
>>> assignment, the explanation above is still correct since
>>> duplication happens conditional on NAMED == 2.
>>>   
>> i guess this is what every user needs to know to understand the
>> behaviour one can observe on the surface? 
>> 
>
> Nope, only users who prefer to write '+'(1,2) instead of 1+2, or
> 'names<-'(x, 'foo') instead of names(x)='foo'.
>
>   

well, as far as i remember, it has been said on this list that in r the
infix syntax is equivalent to the prefix syntax, so no one wanting to
use the form above should be afraid of different semantics;  these two
forms should be perfectly equivalent.  after all,

x = 1
names(x) = 'foo'
names(x)

should return NULL, because when the second assignment is made, we need
to make a copy of the value of x, so it is the copy that should have
changed names, not the value of x (which would still be the original 1).

on the other hand, the fact that

names(x) = 'foo'

is (or so it seems) a shorthand for

x = 'names<-'(x, 'foo')

is precisely why i'd think that the prefix 'names<-' should never do
destructive modifications, because that's what x = 'names<-'(x, 'foo'),
and thus also names(x) = 'foo', is for.

i guess the above is sort of blasphemy.
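
A minimal sketch of the equivalence discussed above, limited to results that do not depend on the internal NAMED bookkeeping (the variable names are illustrative only):

```r
# Infix replacement form: rebinds x to the renamed value.
x <- 1
names(x) <- "foo"

# Prefix form with explicit reassignment, which the infix form is
# described above as being shorthand for.
y <- 1
y <- `names<-`(y, "foo")

# Both routes end with the same named vector.
same <- identical(x, y)
print(names(x))
print(same)
```

The sketch deliberately reassigns the result of the prefix call; whether a bare prefix call without reassignment touches its argument is exactly the point under dispute in this thread.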

> Attempting to change the name attribute of x via 'names<-'(x, 'foo')
> looks to me as if one relies on a side effect of the function
> 'names<-'; which, in my book would be a bad thing.  

indeed;  so, for coherence, 'names<-' should always do the modification
on a copy.  it would then have semantics different from the infix form
of 'names<-', but at least consistently so.

> I.e. relying on side
> effects of a function, or writing functions with side effects which are
> then called for their side-effects;  this, of course, excludes
> functions like plot() :)  I never had the need to call 'names<-'()
> directly and cannot foresee circumstances in which I would do so.
>   

> Plenty of users, including me, are happy using the latter forms and,
> hence, never have to bother with understanding these implementation
> details or have to bother about them.  
>
> Your mileage obviously varies, but that is when you have to learn about
> these internal details.  If you call functions because of their
> side-effects, you better learn what the side-effects are exactly.
>   

well, i can imagine a user using the prefix 'names<-' precisely under
the assumption that it will perform functionally;  i.e., 'names<-'(x,
'foo') will always produce a copy of x with the new names, and never
change the x.  that there will be a destructive modification made to x
on some, but not all, occasions, is hardly a good thing in this context
-- and it's not a situation where a user wants to use the function
"because of its side effects", quite to the contrary.  this was actually
the situation i had when i first discovered the surprising behaviour of
'names<-';  i thought 'names<-' did *not* have side effects.
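
For a user who wants the purely functional behaviour described above, one option (a sketch, not something proposed in this thread) is stats::setNames(), which wraps the replacement form inside an ordinary function, so the renaming happens on the function's own local copy:

```r
# stats::setNames(object, nm) assigns names inside a closure and
# returns the result; the caller's variable is not rebound.
x <- c(1, 2)
y <- stats::setNames(x, c("foo", "bar"))

print(names(y))   # names of the renamed result
print(names(x))   # names of the original binding
```

Here the renamed vector is only reachable through y, and x keeps whatever names it had before the call.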

cheers, and thanks for the discussion.
vQ


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Berwin A Turlach

On Wed, 11 Mar 2009 20:29:14 +0100
Wacek Kusnierczyk  wrote:

> Simon Urbanek wrote:
> > Wacek,
> >
> > Peter gave you a full answer explaining it very well. If you really
> > want to be able to trace each instance yourself, you have to learn
> > far more about R internals than you apparently know (and Peter
> > hinted at that). Internally x=1 and x=c(1) are slightly different in
> > that the former has NAMED(x) = 2 whereas the latter has NAMED(x) =
> > 0 which is what causes the difference in behavior as Peter
> > explained. The reason is that c(1) creates a copy of the 1 (which
> > is a constant [=immutable] thus requiring a copy) and the new copy
> > has no other references and thus can be modified and hence NAMED(x)
> > = 0.
> 
> 
> simon, thanks for the explanation, it's now as clear as i might
> expect.
> 
> now i'm concerned with what you say:  that to understand something
> visible to the user one needs to "learn far more about R internals
> than one apparently knows".  your response suggests that to use r
> without confusion one needs to know the internals, 

Simon can probably speak for himself, but according to my reading he
has not suggested anything similar to what you suggest he suggested. :)

> and this would be a really bad thing to say.. 

No problems, since he did not say anything vaguely similar to what you
suggest he said.

Cheers,

Berwin


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Berwin A Turlach

On Wed, 11 Mar 2009 20:31:18 +0100
Wacek Kusnierczyk  wrote:

> Simon Urbanek wrote:
> >
> > On Mar 11, 2009, at 10:52 , Simon Urbanek wrote:
> >
> >> Wacek,
> >>
> >> Peter gave you a full answer explaining it very well. If you really
> >> want to be able to trace each instance yourself, you have to learn
> >> far more about R internals than you apparently know (and Peter
> >> hinted at that). Internally x=1 and x=c(1) are slightly different
> >> in that the former has NAMED(x) = 2 whereas the latter has
> >> NAMED(x) = 0 which is what causes the difference in behavior as
> >> Peter explained. The reason is that c(1) creates a copy of the 1
> >> (which is a constant [=immutable] thus requiring a copy) and the
> >> new copy has no other references and thus can be modified and
> >> hence NAMED(x) = 0.
> >>
> >
> > Errata: to be precise replace NAMED(x) = 0 with NAMED(x) = 1 above
> > -- since NAMED(c(1)) = 0 and once it's assigned to x it becomes
> > NAMED(x) = 1 -- this is just a detail on how things work with
> > assignment, the explanation above is still correct since
> > duplication happens conditional on NAMED == 2.
> 
> i guess this is what every user needs to know to understand the
> behaviour one can observe on the surface? 

Nope, only users who prefer to write '+'(1,2) instead of 1+2, or
'names<-'(x, 'foo') instead of names(x)='foo'.

Attempting to change the name attribute of x via 'names<-'(x, 'foo')
looks to me as if one relies on a side effect of the function
'names<-'; which, in my book would be a bad thing.  I.e. relying on side
effects of a function, or writing functions with side effects which are
then called for their side-effects;  this, of course, excludes
functions like plot() :)  I never had the need to call 'names<-'()
directly and cannot foresee circumstances in which I would do so.

Plenty of users, including me, are happy using the latter forms and,
hence, never have to bother with understanding these implementation
details or have to bother about them.  

Your mileage obviously varies, but that is when you have to learn about
these internal details.  If you call functions because of their
side-effects, you better learn what the side-effects are exactly.

Cheers,

Berwin


Re: [Rd] surprising behaviour of names<-

2009-03-11 Thread Wacek Kusnierczyk

Simon Urbanek wrote:
>
> On Mar 11, 2009, at 10:52 , Simon Urbanek wrote:
>
>> Wacek,
>>
>> Peter gave you a full answer explaining it very well. If you really
>> want to be able to trace each instance yourself, you have to learn
>> far more about R internals than you apparently know (and Peter hinted
>> at that). Internally x=1 and x=c(1) are slightly different in that the
>> former has NAMED(x) = 2 whereas the latter has NAMED(x) = 0 which is
>> what causes the difference in behavior as Peter explained. The reason
>> is that c(1) creates a copy of the 1 (which is a constant
>> [=immutable] thus requiring a copy) and the new copy has no other
>> references and thus can be modified and hence NAMED(x) = 0.
>>
>
> Errata: to be precise replace NAMED(x) = 0 with NAMED(x) = 1 above --
> since NAMED(c(1)) = 0 and once it's assigned to x it becomes NAMED(x)
> = 1 -- this is just a detail on how things work with assignment, the
> explanation above is still correct since duplication happens
> conditional on NAMED == 2.

i guess this is what every user needs to know to understand the
behaviour one can observe on the surface?  thanks for further
clarifications.

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-11 Thread Wacek Kusnierczyk

Simon Urbanek wrote:
> Wacek,
>
> Peter gave you a full answer explaining it very well. If you really
> want to be able to trace each instance yourself, you have to learn far
> more about R internals than you apparently know (and Peter hinted at
> that). Internally x=1 and x=c(1) are slightly different in that the
> former has NAMED(x) = 2 whereas the latter has NAMED(x) = 0 which is
> what causes the difference in behavior as Peter explained. The reason
> is that c(1) creates a copy of the 1 (which is a constant [=immutable]
> thus requiring a copy) and the new copy has no other references and
> thus can be modified and hence NAMED(x) = 0.

simon, thanks for the explanation, it's now as clear as i might expect.

now i'm concerned with what you say:  that to understand something
visible to the user one needs to "learn far more about R internals than
one apparently knows".  your response suggests that to use r without
confusion one needs to know the internals, and this would be a really
bad thing to say..  i have long been concerned that r unnecessarily
exposes users to its internals, and here's one more example of how the
interface fails to hide the guts.  (and peter did not give me a full
answer, but a vague hint.)

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-11 Thread Simon Urbanek



On Mar 11, 2009, at 10:52 , Simon Urbanek wrote:


> Wacek,
>
> Peter gave you a full answer explaining it very well. If you really
> want to be able to trace each instance yourself, you have to learn
> far more about R internals than you apparently know (and Peter
> hinted at that). Internally x=1 and x=c(1) are slightly different in
> that the former has NAMED(x) = 2 whereas the latter has NAMED(x) = 0
> which is what causes the difference in behavior as Peter explained.
> The reason is that c(1) creates a copy of the 1 (which is a constant
> [=immutable] thus requiring a copy) and the new copy has no other
> references and thus can be modified and hence NAMED(x) = 0.




Errata: to be precise replace NAMED(x) = 0 with NAMED(x) = 1 above --  
since NAMED(c(1)) = 0 and once it's assigned to x it becomes NAMED(x)  
= 1 -- this is just a detail on how things work with assignment, the  
explanation above is still correct since duplication happens  
conditional on NAMED == 2.


Cheers,
Simon



> On Mar 10, 2009, at 18:16 , Wacek Kusnierczyk wrote:
>
>> i got an offline response saying that my original post may have not
>> been clear as to what the problem was, essentially, and that i may
>> need to restate it in words, in addition to code.
>>
>> the problem is:  the performance of 'names<-' is incoherent, in that
>> in some situations it acts in a functional manner, producing a copy
>> of its argument with the names changed, while in others it changes
>> the object in-place (and returns it), without copying first.  your
>> explanation below is of course valid, but does not seem to address
>> the issue.  in the examples below, there is always (or so it seems)
>> just one reference to the object.
>>
>> why are the following functional:
>>
>>   x = 1;  'names<-'(x, 'foo'); names(x)
>>   x = 'foo'; 'names<-'(x, 'foo');  names(x)
>>
>> while these are destructive:
>>
>>   x = c(1);  'names<-'(x, 'foo'); names(x)
>>   x = c('foo'); 'names<-'(x, 'foo');  names(x)
>>
>> it is claimed that in r a singular value is a one-element vector, and
>> indeed,
>>
>>   identical(1, c(1))
>>   # TRUE
>>   all.equal(is(1), is(c(1)))
>>   # TRUE
>>
>> i also do not understand the difference here:
>>
>>   x = c(1); 'names<-'(x, 'foo'); names(x)
>>   # "foo"
>>   x = c(1); names(x); 'names<-'(x, 'foo'); names(x)
>>   # "foo"
>>   x = c(1); print(x); 'names<-'(x, 'foo'); names(x)
>>   # NULL
>>   x = c(1); print(c(x)); 'names<-'(x, 'foo'); names(x)
>>   # "foo"
>>
>> does print, but not names, increase the reference count for x when
>> applied to x, but not to c(x)?
>>
>> if the issue is that there is, in those examples where x is left
>> unchanged, an additional reference to x that causes the value of x
>> to be copied, could you please explain how and when this additional
>> reference is created?
>>
>> thanks,
>> vQ
>>
>> Peter Dalgaard wrote:
>>>
>>>> is there something i misunderstand here?
>>>
>>> Only the ideology/pragmatism... In principle, R has call-by-value
>>> semantics and a function does not destructively modify its
>>> arguments(*), and foo(x)<-bar behaves like x <- "foo<-"(x, bar).
>>> HOWEVER, this has obvious performance repercussions (think
>>> x <- rnorm(1e7); x[1] <- 0), so we do allow destructive modification
>>> by replacement functions, PROVIDED that the x is not used by
>>> anything else. On the least suspicion that something else is using
>>> the object, a copy of x is made before the modification.
>>>
>>> So
>>>
>>> (A) you should not use code like y <- "foo<-"(x, bar)
>>>
>>> because
>>>
>>> (B) you cannot (easily) predict whether or not x will be modified
>>> destructively
>>>
>>> (*) unless you mess with match.call() or substitute() and the like.
>>> But that's a different story.







Re: [Rd] surprising behaviour of names<-

2009-03-11 Thread Simon Urbanek


Wacek,

Peter gave you a full answer explaining it very well. If you really  
want to be able to trace each instance yourself, you have to learn far  
more about R internals than you apparently know (and Peter hinted at  
that). Internally x=1 and x=c(1) are slightly different in that the
former has NAMED(x) = 2 whereas the latter has NAMED(x) = 0 which is
what causes the difference in behavior as Peter explained. The reason
is that c(1) creates a copy of the 1 (which is a constant [=immutable]
thus requiring a copy) and the new copy has no other references and  
thus can be modified and hence NAMED(x) = 0.
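
As a rough way to look at the bookkeeping described above, the sketch below compares the two bindings at the R level and then dumps their internal headers. It assumes that `.Internal(inspect())` is available in the R build at hand; it is an internal, undocumented debugging aid, and the fields it prints differ between R versions, so no particular output is shown here.

```r
x1 <- 1      # bound directly to a constant from the parsed code
x2 <- c(1)   # bound to a fresh vector returned by c()

# At the R level the two values cannot be told apart.
same_value <- identical(x1, x2)
print(same_value)

# Internal header dump (address, type, sharing information, length).
# Interpret the sharing field with care: its name and meaning are
# version-dependent.
.Internal(inspect(x1))
.Internal(inspect(x2))
```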


Cheers,
Simon


On Mar 10, 2009, at 18:16 , Wacek Kusnierczyk wrote:

> i got an offline response saying that my original post may have not been
> clear as to what the problem was, essentially, and that i may need to
> restate it in words, in addition to code.
>
> the problem is:  the performance of 'names<-' is incoherent, in that in
> some situations it acts in a functional manner, producing a copy of its
> argument with the names changed, while in others it changes the object
> in-place (and returns it), without copying first.  your explanation
> below is of course valid, but does not seem to address the issue.  in
> the examples below, there is always (or so it seems) just one reference
> to the object.
>
> why are the following functional:
>
>    x = 1;  'names<-'(x, 'foo'); names(x)
>    x = 'foo'; 'names<-'(x, 'foo');  names(x)
>
> while these are destructive:
>
>    x = c(1);  'names<-'(x, 'foo'); names(x)
>    x = c('foo'); 'names<-'(x, 'foo');  names(x)
>
> it is claimed that in r a singular value is a one-element vector, and
> indeed,
>
>    identical(1, c(1))
>    # TRUE
>    all.equal(is(1), is(c(1)))
>    # TRUE
>
> i also do not understand the difference here:
>
>    x = c(1); 'names<-'(x, 'foo'); names(x)
>    # "foo"
>    x = c(1); names(x); 'names<-'(x, 'foo'); names(x)
>    # "foo"
>    x = c(1); print(x); 'names<-'(x, 'foo'); names(x)
>    # NULL
>    x = c(1); print(c(x)); 'names<-'(x, 'foo'); names(x)
>    # "foo"
>
> does print, but not names, increase the reference count for x when
> applied to x, but not to c(x)?
>
> if the issue is that there is, in those examples where x is left
> unchanged, an additional reference to x that causes the value of x to be
> copied, could you please explain how and when this additional reference
> is created?
>
> thanks,
> vQ
>
> Peter Dalgaard wrote:
>>
>>> is there something i misunderstand here?
>>
>> Only the ideology/pragmatism... In principle, R has call-by-value
>> semantics and a function does not destructively modify its arguments(*),
>> and foo(x)<-bar behaves like x <- "foo<-"(x, bar). HOWEVER, this has
>> obvious performance repercussions (think x <- rnorm(1e7); x[1] <- 0), so
>> we do allow destructive modification by replacement functions, PROVIDED
>> that the x is not used by anything else. On the least suspicion that
>> something else is using the object, a copy of x is made before the
>> modification.
>>
>> So
>>
>> (A) you should not use code like y <- "foo<-"(x, bar)
>>
>> because
>>
>> (B) you cannot (easily) predict whether or not x will be modified
>> destructively
>>
>> (*) unless you mess with match.call() or substitute() and the like. But
>> that's a different story.







Re: [Rd] surprising behaviour of names<-

2009-03-10 Thread Wacek Kusnierczyk

i got an offline response saying that my original post may have not been
clear as to what the problem was, essentially, and that i may need to
restate it in words, in addition to code.

the problem is:  the performance of 'names<-' is incoherent, in that in
some situations it acts in a functional manner, producing a copy of its
argument with the names changed, while in others it changes the object
in-place (and returns it), without copying first.  your explanation
below is of course valid, but does not seem to address the issue.  in
the examples below, there is always (or so it seems) just one reference
to the object.

why are the following functional:

x = 1;  'names<-'(x, 'foo'); names(x)
x = 'foo'; 'names<-'(x, 'foo');  names(x)

while these are destructive:

x = c(1);  'names<-'(x, 'foo'); names(x)
x = c('foo'); 'names<-'(x, 'foo');  names(x)

it is claimed that in r a singular value is a one-element vector, and
indeed,

identical(1, c(1))
# TRUE
all.equal(is(1), is(c(1)))
# TRUE

i also do not understand the difference here:

x = c(1); 'names<-'(x, 'foo'); names(x)
# "foo"
x = c(1); names(x); 'names<-'(x, 'foo'); names(x)
# "foo"
x = c(1); print(x); 'names<-'(x, 'foo'); names(x)
# NULL
x = c(1); print(c(x)); 'names<-'(x, 'foo'); names(x)
# "foo"

does print, but not names, increase the reference count for x when
applied to x, but not to c(x)?

if the issue is that there is, in those examples where x is left
unchanged, an additional reference to x that causes the value of x to be
copied, could you please explain how and when this additional reference
is created?

thanks,
vQ

Peter Dalgaard wrote:
>
>> is there something i misunderstand here?
>> 
>
> Only the ideology/pragmatism... In principle, R has call-by-value
> semantics and a function does not destructively modify its arguments(*),
> and foo(x)<-bar behaves like x <- "foo<-"(x, bar). HOWEVER, this has
> obvious performance repercussions (think x <- rnorm(1e7); x[1] <- 0), so
> we do allow destructive modification by replacement functions, PROVIDED
> that the x is not used by anything else. On the least suspicion that
> something else is using the object, a copy of x is made before the
> modification.
>
> So
>
> (A) you should not use code like y <- "foo<-"(x, bar)
>
> because
>
> (B) you cannot (easily) predict whether or not x will be modified
> destructively
>
> 
> (*) unless you mess with match.call() or substitute() and the like. But
> that's a different story.
>
>
>   

-- 
---
Wacek Kusnierczyk, MD PhD

Email: w...@idi.ntnu.no
Phone: +47 73591875, +47 72574609

Department of Computer and Information Science (IDI)
Faculty of Information Technology, Mathematics and Electrical Engineering (IME)
Norwegian University of Science and Technology (NTNU)
Sem Saelands vei 7, 7491 Trondheim, Norway
Room itv303

Bioinformatics & Gene Regulation Group
Department of Cancer Research and Molecular Medicine (IKM)
Faculty of Medicine (DMF)
Norwegian University of Science and Technology (NTNU)
Laboratory Center, Erling Skjalgsons gt. 1, 7030 Trondheim, Norway
Room 231.05.060


Re: [Rd] surprising behaviour of names<-

2009-03-10 Thread Wacek Kusnierczyk

Peter Dalgaard wrote:
>
> (*) unless you mess with match.call() or substitute() and the like. But
> that's a different story.
>   

different or not, it is a story that happens quite often -- too often,
perhaps -- to the degree that one may be tempted to say that the
semantics of argument passing in r is a mess. which of course is not
true, but since it is possible to mess with match.call & co, people
(including r core) do mess with them, and the result is obviously a
mess.  on top of the clear call-by-need semantics -- and on the surface,
you cannot tell how the arguments of a function will be taken (by
value?  by reference?  not at all?), which in effect looks like a messy
semantics.
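
Two small illustrations of the point about argument passing, as a sketch only (`show_expr` and `ignore_arg` are hypothetical helpers invented for this example): with substitute() a function can see the unevaluated expression it was given, and with lazy evaluation an argument that is never used is never evaluated at all.

```r
# Hypothetical helper: returns the unevaluated expression supplied as `arg`.
show_expr <- function(arg) substitute(arg)

# Hypothetical helper: never touches `arg`, so its promise is never forced.
ignore_arg <- function(arg) "arg was never evaluated"

expr <- show_expr(a + b)           # a and b do not need to exist
res  <- ignore_arg(stop("boom"))   # no error: stop() is never run

print(expr)
print(res)
```

From the call site alone, neither `show_expr(a + b)` nor `ignore_arg(stop("boom"))` reveals how the argument will be treated, which is the concern raised above.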

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-10 Thread Wacek Kusnierczyk

Stavros Macrakis wrote:
>>> (B) you cannot (easily) predict whether or not x will be modified
>>> destructively
>>>   
>> that's fine, thanks, but i must be terribly stupid as i do not see how
>> this explains the examples above.  where is the x used by something else
>> in the first example, so that 'names<-'(x, 'foo') does *not* modify x
>> destructively, while it does in the other cases?
>>
>> i just can't see how your explanation fits the examples -- it probably
>> does, but i beg you show it explicitly.
>> 
>
> I think the following shows what Peter was referring to:
>
> In this case, there is only one pointer to the value of x:
>
> > x <- c(1,2)
> > "names<-"(x,"foo")
>  foo <NA>
>    1    2
> > x
>  foo <NA>
>    1    2
>
> In this case, there are two:
>
> > x <- c(1,2)
> > y <- x
> > "names<-"(x,"foo")
>  foo <NA>
>    1    2
> > x
> [1] 1 2
> > y
> [1] 1 2
>

that is and was clear to me, but none of my examples was of the second
form, and hence i think peter's answer did not answer my question. 
what's the difference here:

x = 1
'names<-'(x, 'foo')
names(x)
# NULL

x = c(foo=1)
'names<-'(x, 'foo')
names(x)
# "foo"

certainly not something like what you show.   what's the difference here:

x = 1
'names<-'(x, 'foo')
names(x)
# NULL
  
x = 1:2
'names<-'(x, c('foo', 'bar'))
names(x)
# "foo" "bar"

certainly not something like what you show.

> It seems as though `names<-` and the like cannot be treated as R
> functions (which do not modify their arguments) but as special
> internal routines which do sometimes modify their arguments.
>   

they seem to behave somewhat like macros:

'names<-'(a, b)

with the destructive 'names<-' is sort of replaced with

a = 'names<-'(a, b)

with a functional 'names<-'.  but this still does not explain the
incoherence above.  my problem was and is not that 'names<-' is not a
pure function, but that it sometimes is, sometimes is not, without any
obvious explanation.  that is, i suspect (not claim) that the behaviour
is not a design feature, but an accident.

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-10 Thread Stavros Macrakis

>> (B) you cannot (easily) predict whether or not x will be modified
>> destructively
>
> that's fine, thanks, but i must be terribly stupid as i do not see how
> this explains the examples above.  where is the x used by something else
> in the first example, so that 'names<-'(x, 'foo') does *not* modify x
> destructively, while it does in the other cases?
>
> i just can't see how your explanation fits the examples -- it probably
> does, but i beg you show it explicitly.

I think the following shows what Peter was referring to:

In this case, there is only one pointer to the value of x:

> x <- c(1,2)
> "names<-"(x,"foo")
 foo <NA>
   1    2
> x
 foo <NA>
   1    2

In this case, there are two:

> x <- c(1,2)
> y <- x
> "names<-"(x,"foo")
 foo <NA>
   1    2
> x
[1] 1 2
> y
[1] 1 2

It seems as though `names<-` and the like cannot be treated as R
functions (which do not modify their arguments) but as special
internal routines which do sometimes modify their arguments.

  -s


Re: [Rd] surprising behaviour of names<-

2009-03-10 Thread Wacek Kusnierczyk

Peter Dalgaard wrote:
> Wacek Kusnierczyk wrote:
>   
>> playing with 'names<-', i observed the following:
>>   
>> x = 1
>> names(x)
>> # NULL
>> 'names<-'(x, 'foo')
>> # c(foo=1)
>> names(x)
>> # NULL
>>
>> where 'names<-' has a functional flavour (does not change x), but:
>>
>> x = 1:2
>> names(x)
>> # NULL
>> 'names<-'(x, 'foo')
>> # c(foo=1, 2)
>> names(x)
>> # "foo" NA
>>   
>> where 'names<-' seems to perform a side effect on x (destructively
>> modifies x).  furthermore:
>>
>> x = c(foo=1)
>> names(x)
>> # "foo"
>> 'names<-'(x, NULL)
>> names(x)
>> # NULL
>> 'names<-'(x, 'bar')
>> names(x)
>> # "bar" !!!
>>
>> x = c(foo=1)
>> names(x)
>> # "foo"
>> 'names<-'(x, 'bar')
>> names(x)
>> # "bar" !!!
>>
>> where 'names<-' is not only able to destructively remove names from x,
>> but also destructively add or modify them (quite unlike in the first
>> example above).
>>
>> analogous code but using 'dimnames<-' on a matrix performs a side effect
>> on the matrix even if it initially does not have dimnames:
>>
>> x = matrix(1,1,1)
>> dimnames(x)
>> # NULL
>> 'dimnames<-'(x, list('foo', 'bar'))
>> dimnames(x)
>> # list("foo", "bar")
>>
>> this is incoherent with the first example above, in that in both cases
>> the structure initially has no names or dimnames attribute, but the end
>> result is different in the two examples.
>>
>> is there something i misunderstand here?
>> 
>
> Only the ideology/pragmatism... In principle, R has call-by-value
> semantics and a function does not destructively modify its arguments(*),
> and foo(x)<-bar behaves like x <- "foo<-"(x, bar). HOWEVER, this has
> obvious performance repercussions (think x <- rnorm(1e7); x[1] <- 0), so
> we do allow destructive modification by replacement functions, PROVIDED
> that the x is not used by anything else. On the least suspicion that
> something else is using the object, a copy of x is made before the
> modification.
>
> So
>
> (A) you should not use code like y <- "foo<-"(x, bar)
>
> because
>
> (B) you cannot (easily) predict whether or not x will be modified
> destructively
>
>   

that's fine, thanks, but i must be terribly stupid as i do not see how
this explains the examples above.  where is the x used by something else
in the first example, so that 'names<-'(x, 'foo') does *not* modify x
destructively, while it does in the other cases?

i just can't see how your explanation fits the examples -- it probably
does, but i beg you show it explicitly.
thanks.

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-10 Thread Peter Dalgaard

Wacek Kusnierczyk wrote:
> playing with 'names<-', i observed the following:
>   
> x = 1
> names(x)
> # NULL
> 'names<-'(x, 'foo')
> # c(foo=1)
> names(x)
> # NULL
> 
> where 'names<-' has a functional flavour (does not change x), but:
> 
> x = 1:2
> names(x)
> # NULL
> 'names<-'(x, 'foo')
> # c(foo=1, 2)
> names(x)
> # "foo" NA
>   
> where 'names<-' seems to perform a side effect on x (destructively
> modifies x).  furthermore:
> 
> x = c(foo=1)
> names(x)
> # "foo"
> 'names<-'(x, NULL)
> names(x)
> # NULL
> 'names<-'(x, 'bar')
> names(x)
> # "bar" !!!
> 
> x = c(foo=1)
> names(x)
> # "foo"
> 'names<-'(x, 'bar')
> names(x)
> # "bar" !!!
> 
> where 'names<-' is not only able to destructively remove names from x,
> but also destructively add or modify them (quite unlike in the first
> example above).
> 
> analogous code but using 'dimnames<-' on a matrix performs a side effect
> on the matrix even if it initially does not have dimnames:
> 
> x = matrix(1,1,1)
> dimnames(x)
> # NULL
> 'dimnames<-'(x, list('foo', 'bar'))
> dimnames(x)
> # list("foo", "bar")
> 
> this is incoherent with the first example above, in that in both cases
> the structure initially has no names or dimnames attribute, but the end
> result is different in the two examples.
> 
> is there something i misunderstand here?

Only the ideology/pragmatism... In principle, R has call-by-value
semantics and a function does not destructively modify its arguments(*),
and foo(x)<-bar behaves like x <- "foo<-"(x, bar). HOWEVER, this has
obvious performance repercussions (think x <- rnorm(1e7); x[1] <- 0), so
we do allow destructive modification by replacement functions, PROVIDED
that the x is not used by anything else. On the least suspicion that
something else is using the object, a copy of x is made before the
modification.

So

(A) you should not use code like y <- "foo<-"(x, bar)

because

(B) you cannot (easily) predict whether or not x will be modified
destructively


(*) unless you mess with match.call() or substitute() and the like. But
that's a different story.
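
The advice in (A) and (B) can be sketched as follows, using names() as the concrete replacement function (the variable names are illustrative): use the replacement form to change an object, and copy first when the original should stay as it is.

```r
x <- c(1, 2)

# Replacement form: the supported way to rename x itself.
names(x) <- c("a", "b")

# To get a renamed copy while keeping x unchanged, bind a second name
# first and then use the replacement form on that second name.
y <- x
names(y) <- c("foo", "bar")

print(names(x))
print(names(y))
```

Because y and x refer to the same value when the second replacement runs, R copies before modifying, so x keeps the names "a" and "b".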


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907

