Re: [Rd] surprising behaviour of names<-

2009-03-16 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:

 '*tmp*' = 0
 `*tmp*`
 # 0

 x = 1
 names(x) = 'foo'
 `*tmp*`
 # error: object *tmp* not found

 `*ugly*`
 

 I agree, and I am a bit flabbergasted.  I had not expected that
 something like this would happen and I am indeed not aware of anything
 in the documentation that warns about this; but others may prove me
 wrong on this.
   

hopefully.

   
 given that `*tmp*` is a perfectly legal (though some would say
 'non-standard') name, it would be good if somewhere here a warning
 were issued -- perhaps where i assign to `*tmp*`, because `*tmp*` is
 not just any non-standard name, but one that is 'obviously' used
 under the hood to perform black magic.
 

 Now I wonder whether there are any other objects (with non-standard
 names) that can be nuked by operations performed under the hood.
   

any such risk should be clearly documented, if not with a warning issued
each time the user risks having h{is,er} workspace corrupted by what
happens under the hood.


 I guess the best thing is to stay away from non-standard names, if only
 to save the typing of back-ticks. :)
   

agree.  but then, there may be -- and probably are -- other such 'best
to stay away' things in r, all of which should be documented so that a
user knows what may happen on the surface, *without* having to peek under
the hood.


 Thanks for letting me know, I have learned something new today.
   

wow.  most of my fiercely truculent ranting is meant to point out things
that may not be intentional, or if they are, they seem to me design
flaws rather than features -- so that either i learn that i am ignorant
or wrong, or someone else does, pro bono.  hopefully.

vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] surprising behaviour of names<-

2009-03-16 Thread Wacek Kusnierczyk

Thomas Lumley wrote:

 Wacek,

 In this case I think the *tmp* dates from the days before backticks,
 when it was not a legal name (it still isn't) and it was much, much
 harder to use illegal names, so the collision issue really didn't exist.


thanks for the explanation.

 You're right about the documentation.



thanks for the acknowledgement.

vQ

Re: [Rd] surprising behaviour of names<-

2009-03-15 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:

 Obviously, assuming that R really executes
   `*tmp*` <- x
   x <- "names<-"(`*tmp*`, value=c("a","b"))
 under the hood, in the C code, then *tmp* does not end up in the symbol
 table and does not persist beyond the execution of
   names(x) <- c("a","b")

   

to prove that i take you seriously, i have peeked into the code, and
found that indeed there is a temporary binding for *tmp* made behind the
scenes -- sort of. unfortunately, it is not done carefully enough to
avoid possible interference with the user's code:

'*tmp*' = 0
`*tmp*`
# 0

x = 1
names(x) = 'foo'
`*tmp*`
# error: object *tmp* not found

`*ugly*`

given that `*tmp*` is a perfectly legal (though some would say
'non-standard') name, it would be good if somewhere here a warning were
issued -- perhaps where i assign to `*tmp*`, because `*tmp*` is not just
any non-standard name, but one that is 'obviously' used under the hood
to perform black magic.

it also appears that the explanation given in, e.g., the r language
definition (draft, of course) sec. 3.4.4:


Assignment to subsets of a structure is a special case of a general
mechanism for complex assignment:
x[3:5] <- 13:15
The result of this command is as if the following had been executed
`*tmp*` <- x
x <- "[<-"(`*tmp*`, 3:5, value=13:15)


is incomplete (because the final result is not '*tmp*' having the value
of x, as it might seem, but rather '*tmp*' having been unbound).
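
The equivalence claimed in sec. 3.4.4 can be checked for the final value of x with a small sketch. The variable names x and y are illustrative only, and the prefix call is assigned back to y rather than relied on for any side effect:

```r
# Compare the infix subset assignment with the prefix call it is
# documented to be equivalent to.
x <- 1:10
y <- x                                # second binding for the prefix form

x[3:5] <- 13:15                       # infix form
y <- `[<-`(y, 3:5, value = 13:15)     # prefix form, result assigned back

same <- identical(x, y)               # compares the resulting vectors only
print(same)
```

This only compares the resulting vectors; it says nothing about what happens to a `*tmp*` binding along the way.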

so the suggestion for the documenters is to add to the end of the
section (or wherever else it is appropriate) a warning to the effect
that in the end '*tmp*' will be removed, even if the user has explicitly
defined it earlier in the same scope.
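
A hedged way to probe this without touching the workspace is to run the assignment in a scratch environment and ask whether `*tmp*` is still bound afterwards. The name env is an assumption made for this sketch, and the value of `after` is deliberately not stated: it may depend on the R version and on how the code is evaluated.

```r
# Run a complex assignment in a scratch environment that already
# contains a user-made `*tmp*` binding.
env <- new.env()
assign("*tmp*", 0, envir = env)

before <- exists("*tmp*", envir = env, inherits = FALSE)  # binding was just made

evalq({
  x <- 1
  names(x) <- "foo"      # complex assignment evaluated inside env
}, env)

after <- exists("*tmp*", envir = env, inherits = FALSE)   # not guaranteed either way
print(c(before = before, after = after))
```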

or maybe have the implementation not rely on a user-forgeable name? for
example, the '.Last.value' name is automatically bound to the most
recently returned value, but it resides in package:base and does not
collide with bindings using it made by the user:

.Last.value = 0

1
.Last.value
# 0, not 1

1
base::.Last.value
# 1, not 0


why could not '*tmp*' be bound and unbound outside of the user's
namespace? (i guess it's easier to update the docs -- or just ignore the
issue.)


on the margin, trace('<-') will pick only one of the uses of '<-'
suggested by the code above:

x <- 1:10

trace('<-')
x[3:5] <- 13:15
# trace: x[3:5] <- 13:15
# trace: x <- `[<-`(`*tmp*`, 3:5, value = 13:15)

which is somewhat confusing, because then '*tmp*' appears in the trace
somewhat ex machina. (again, the explanation is in the source code, but
the trace output could have been more informative.)

cheers,
vQ

Re: [Rd] surprising behaviour of names<-

2009-03-15 Thread Berwin A Turlach

G'day Wacek,

On Sun, 15 Mar 2009 21:01:33 +0100
Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

 Berwin A Turlach wrote:
 
  Obviously, assuming that R really executes
  `*tmp*` <- x
  x <- "names<-"(`*tmp*`, value=c("a","b"))
  under the hood, in the C code, then *tmp* does not end up in the
  symbol table and does not persist beyond the execution of
  names(x) <- c("a","b")
 

 
 to prove that i take you seriously, i have peeked into the code, and
 found that indeed there is a temporary binding for *tmp* made behind
 the scenes -- sort of. unfortunately, it is not done carefully enough
 to avoid possible interference with the user's code:
 
 '*tmp*' = 0
 `*tmp*`
 # 0
 
 x = 1
 names(x) = 'foo'
 `*tmp*`
 # error: object *tmp* not found
 
 `*ugly*`

I agree, and I am a bit flabbergasted.  I had not expected that
something like this would happen and I am indeed not aware of anything
in the documentation that warns about this; but others may prove me
wrong on this.

 given that `*tmp*` is a perfectly legal (though some would say
 'non-standard') name, it would be good if somewhere here a warning
 were issued -- perhaps where i assign to `*tmp*`, because `*tmp*` is
 not just any non-standard name, but one that is 'obviously' used
 under the hood to perform black magic.

Now I wonder whether there are any other objects (with non-standard
names) that can be nuked by operations performed under the hood.

I guess the best thing is to stay away from non-standard names, if only
to save the typing of back-ticks. :)

Thanks for letting me know, I have learned something new today.

Cheers,

Berwin

Re: [Rd] surprising behaviour of names<-

2009-03-14 Thread Thomas Lumley


On Fri, 13 Mar 2009, William Dunlap wrote:


Would it make anyone any happier if the manual said
that the replacement functions should not be called
in the form
  xNew <- `func<-`(xOld, value)
and should only be used as
  func(xToBeChanged) <- value
?


That was my reaction, too.  The discussion reminded me of old comp.lang.c 
threads about i=i++ and similar issues. The anomalies in
  xNew <- `func<-`(xOld, value)
arise precisely because it isn't supposed to be used that way.


My other proposal for 'rigidly defined areas of doubt and uncertainty' has been 
the evaluation order of the *apply family (eg, does apply process the columns 
left to right, or right to left, or however it feels like?).


  -thomas

Thomas Lumley                 Assoc. Professor, Biostatistics
tlum...@u.washington.edu      University of Washington, Seattle

Re: [Rd] surprising behaviour of names<-

2009-03-14 Thread Berwin A Turlach

On Sat, 14 Mar 2009 07:22:34 +0100
Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

[...]
  Well, I don't see any new object created in my workspace after
  x <- 4
  names(x) <- "foo"
  Do you?

 
 of course not.  that's why i'd say the two above are *not*
 equivalent. 
 
 i haven't noticed the 'in the c code';  do you mean the r interpreter
 actually generates, in the c code, such r expressions for itself to
 evaluate?

As I said before, I have little knowledge about how the parser works and
what goes on under the hood; and I have also little time and
inclination to learn about it.  

But if you are interested in these details, then by all means invest
the time to investigate.

Alternatively, you would hope that Simon eventually finishes the book
that he is writing on programming in R; as I understand it, that book
would explain part of these issues in details.  Hopefully, along with
the book he makes the tools that he has for introspection available.

  i guess you have looked under the hood;  point me to the relevant
  code. 
 
  No I did not, because I am not interested in knowing such intimate
  details of R, but it seems you were interested.

 
 yes, but then your claim about what happens under the hood, in the c
 code, is a pure stipulation.  

I made no claim about what is going on under the hood because I have no
knowledge about these matters.  But, yes, I was speculating about what
might go on.

 and you got the example from the r language definition sec. 10.2,
 which says the forms are equivalent, with no 'under the hood, in the
 c code' comment.

Trying to figure out what a writer/painter actually means/says beyond
the explicitly stated/painted, something that is summed up in Australia
(and other places) under the term critical thinking, was not high in
the curriculum of your school, was it? :-)

 you're just showing that your statements cannot be taken seriously.

Usually, my statements can be taken seriously, unless followed by some
indication that I said them tongue-in-cheek.  Of course, statements
that I allegedly made but were in fact put into my mouth cannot, and
should not, be taken seriously.

  yes, *if* you are able to predict the refcount of the object
  passed to 'names<-' *then* you can predict what 'names<-' will do,
  [...] 
 
  I think Simon pointed already out that you seem to have a wrong
  picture of what is going on.  [...]

 so what you quote effectively talks about a specific refcount
 mechanism.  it's not refcount that would be used by the garbage
 collector, but it's a refcount, or maybe refflag.

Fair enough, if you call this a refcount then there is no problem.
Whenever I came across the term refcount in my readings, it was
referring to different mechanisms, typically mechanisms that kept exact
track of how often an object was referred to.  So I would not call the
value of the named field a refcount.  And we can agree to call it from
now on a refcount as long as we realise what mechanism is really used.
 
 yes, that's my opinion:  the effects of implementation tricks should
 not be observable by the user, because they can lead to hard to
 explain and debug behaviour in the user's program.  you surely don't
 suggest that all users consult the source code before writing
 programs in r.

Indeed, I am not suggesting this.  Only users who use/rely on
features that are not sufficiently documented would have to study the
source code to find out what the exact behaviour is.  But, of course,
this could be fraught with danger since the behaviour could change
without warning.

 i have indeed learned what prefix 'names<-' does and now i know that
 the surprising behaviour is due to the observability of the internal
 optimization.
 
 thanks to simon, peter, and you for your answers which allowed me to
 learn this ugly detail.

You are welcome.

Cheers,

Berwin

Re: [Rd] surprising behaviour of names<-

2009-03-14 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:
 On Sat, 14 Mar 2009 07:22:34 +0100
 Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

 [...]
   
 Well, I don't see any new object created in my workspace after
 x <- 4
 names(x) <- "foo"
 Do you?
   
   
 of course not.  that's why i'd say the two above are *not*
 equivalent. 

 i haven't noticed the 'in the c code';  do you mean the r interpreter
 actually generates, in the c code, such r expressions for itself to
 evaluate?
 

 As I said before, I have little knowledge about how the parser works and
 what goes on under the hood; and I have also little time and
 inclination to learn about it.  

 But if you are interested in these details, then by all means invest
 the time to investigate.

   

berwin, you're playing radio erewan now.  i talk about what the user
sees at the interface, and you talk about c code.  then you admit you
don't know the code, and suggest i examine it if i'm interested.  i
incidentally am, but the whole point was that the user should not be
forced to look under the hood to know the interface to a function. 
prefix 'names<-' seems to have a certain behaviour that is not properly
documented.

 Alternatively, you would hope that Simon eventually finishes the book
 that he is writing on programming in R; as I understand it, that book
 would explain part of these issues in details.  Hopefully, along with
 the book he makes the tools that he has for introspection available.
   

simon:  i'd be happy to contribute in any way you might find useful.

   
 i guess you have looked under the hood;  point me to the relevant
 code. 
 
 No I did not, because I am not interested in knowing such intimate
 details of R, but it seems you were interested.
   
   
 yes, but then your claim about what happens under the hood, in the c
 code, is a pure stipulation.  
 

 I made no claim about what is going on under the hood because I have no
 knowledge about these matters.  But, yes, I was speculating about what
 might go on.
   

owe me a beer.

   
 and you got the example from the r language definition sec. 10.2,
 which says the forms are equivalent, with no 'under the hood, in the
 c code' comment.
 

 Trying to figure out what a writer/painter actually means/says beyond
 the explicitly stated/painted, something that is summed up in Australia
 (and other places) under the term critical thinking, was not high in
 the curriculum of your school, was it? :-)
   

sure, but probably not the way you seem to think about.  have you
incidentally read ferdydurke by gombrowicz? 


   
 you're just showing that your statements cannot be taken seriously.
 

 Usually, my statements can be taken seriously, unless followed by some
 indication that I said them tongue-in-cheek.  Of course, statements
 that I allegedly made but were in fact put into my mouth cannot, and
 should not, be taken seriously.
   

i'm talking about your speculations about what the parser does (wrt.
infix and prefix forms having exactly the same parse tree), rather vague
statements such as "'names<-'(x,'foo') should create (more or less) a
parse tree equivalent to that expression", and other statements (surely,
qualified with 'assuming', 'strongly suggests', and the like).  these,
coupled with your admitting that you in fact don't know what happens
there, are not particularly reassuring.
   
 yes, *if* you are able to predict the refcount of the object
 passed to 'names<-' *then* you can predict what 'names<-' will do,
 [...] 
 
 I think Simon pointed already out that you seem to have a wrong
 picture of what is going on.  [...]
   
 so what you quote effectively talks about a specific refcount
 mechanism.  it's not refcount that would be used by the garbage
 collector, but it's a refcount, or maybe refflag.
 

 Fair enough, if you call this a refcount then there is no problem.
 Whenever I came across the term refcount in my readings, it was
 referring to different mechanisms, typically mechanisms that kept exact
 track of how often an object was referred to.  So I would not call the
 value of the named field a refcount.  And we can agree to call it from
 now on a refcount as long as we realise what mechanism is really used.
   

the major point of the discussion was that 'names<-' will sometimes
modify and at other times copy its argument.  you chose to justify this by
looking under the hood, and i suppose you were pretty clear what i meant
by refcount, because it should have been clear from the context.

  
   
 yes, that's my opinion:  the effects of implementation tricks should
 not be observable by the user, because they can lead to hard to
 explain and debug behaviour in the user's program.  you surely don't
 suggest that all users consult the source code before writing
 programs in r.
 

 Indeed, I am not suggesting this.  Only users who use/rely on
 features that are not sufficiently documented would have to study the
 source code to find out what the exact

Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:

 foo = function(arg) arg$foo = foo

 e = new.env()
 foo(e)
 e$foo
   
 are you sure this is pass by value?
 

 But that is what environments are for, aren't they?  

might be.

 And it is
 documented behaviour.  

sure!

 Read section 2.1.10 (Environments) in the R
 Language Definition, 

haven't objected to that.  i object to your 'r uses pass by value',
which is only partially correct.

 in particular the last paragraph:

   Unlike most other R objects, environments are not copied when 
   passed to functions or used in assignments.  Thus, if you assign the
   same environment to several symbols and change one, the others will
   change too.  In particular, assigning attributes to an environment can
   lead to surprises.
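
 A minimal sketch of the contrast that paragraph describes, assuming a
 hypothetical helper set_foo (not taken from the thread): the same function
 body changes an environment for the caller but leaves a list untouched.

```r
# set_foo() is an illustrative helper: it assigns to a component of its argument.
set_foo <- function(arg) {
  arg$foo <- "foo"
  invisible(NULL)
}

e <- new.env()
set_foo(e)            # environments are not copied: the caller sees the change
env_value <- e$foo

l <- list()
set_foo(l)            # lists are copied on modification: the caller's list is unchanged
list_value <- l$foo

print(env_value)
print(list_value)
```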

 [..]
   
 and actually, in the example we discuss, 'names<-' does *not* return
 an updated *tmp*, so there's even less to entertain.  
 

 How do you know?  Are you sure?  Have you by now studied what goes on
 under the hood?
   

yes, a bit.  but in this example, it's enough to look into *tmp* to see
that it hasn't got the names added, and since x does have names, names<-
must have returned a copy of *tmp* rather than *tmp* changed:
   
x = 1
tmp = x
x = 'names<-'(tmp, 'foo')
names(tmp)
# NULL

you suggested that "One reads the manual, (...) one reflects and
investigates, ..." -- had you done it, you wouldn't have asked the question.



   
 for fun and more guesswork, the example could have been:

 x = x
 x = 'names<-'(x, value=c('a', 'b'))
 

 But it is manifestly not written that way in the manual; and for good
 reasons since 'names<-' might have side effects which invoke in the
 last line undefined behaviour.  Just as in the equivalent C snippet
 that I mentioned.
   

i just can't get why the manual does not manifestly explain what
'names<-' does, and leaves you doing the guesswork you suggest.

vQ

Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Berwin A Turlach

On Fri, 13 Mar 2009 11:43:55 +0100
Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

 Berwin A Turlach wrote:

  And it is documented behaviour.  
 
 sure!

Glad to see that we agree on this.

  Read section 2.1.10 (Environments) in the R
  Language Definition, 
 
 haven't objected to that.  i object to your 'r uses pass by value',
 which is only partially correct.

Well, I used qualifiers and did not state it categorically.
 
  and actually, in the example we discuss, 'names<-' does *not*
  return an updated *tmp*, so there's even less to entertain.  
  
 
  How do you know?  Are you sure?  Have you by now studied what goes
  on under the hood?
 
 yes, a bit.  but in this example, it's enough to look into *tmp* to
 see that it hasn't got the names added, and since x does have names,
 names<- must have returned a copy of *tmp* rather than *tmp* changed:

 x = 1
 tmp = x
 x = 'names<-'(tmp, 'foo')
 names(tmp)
 # NULL

Indeed, if you type these two commands on the command line, then it is
not surprising that a copy of tmp is returned since you create a
temporary object that ends up in the symbol table and persists after the
commands are finished.

Obviously, assuming that R really executes
`*tmp*` <- x
x <- "names<-"(`*tmp*`, value=c("a","b"))
under the hood, in the C code, then *tmp* does not end up in the symbol
table and does not persist beyond the execution of
names(x) <- c("a","b")

This looks to me as one of the situations where a value of 1 is used
for the named field of some of the objects involved so that a copy can
be avoided.  That's why I asked whether you looked under the hood.

 you suggested that "One reads the manual, (...) one reflects and
 investigates, ..."

Indeed, and I am not giving up hope that one day you will master this
art.

 -- had you done it, you wouldn't have asked the  question.

Sorry, I forgot that you have a tendency to interpret statements
extremely verbatim and with little reference to the context in which
they are made.  I will try to be more explicit in future.

  for fun and more guesswork, the example could have been:
 
  x = x
  x = 'names<-'(x, value=c('a', 'b'))
  
 
  But it is manifestly not written that way in the manual; and for
  good reasons since 'names<-' might have side effects which invoke
  in the last line undefined behaviour.  Just as in the equivalent C
  snippet that I mentioned.
 
 i just can't get why the manual does not manifestly explain what
 'names<-' does, and leaves you doing the guesswork you suggest.

As I said before, patches to the documentation are also welcome.

Best wishes,

Berwin

Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:

 sure!
 

 Glad to see that we agree on this.
   

owe you a beer.

   
 Read section 2.1.10 (Environments) in the R
 Language Definition, 
   
 haven't objected to that.  i object to your 'r uses pass by value',
 which is only partially correct.
 

 Well, I used qualifiers and did not state it categorically.
   

indeed, you said "R supposedly uses call-by-value (though we know how to
circumvent that, don't we?)".

in that vein, R supposedly can be used to do valid statistical
computations (though we know how to circumvent it) ;)


  
   
 and actually, in the example we discuss, 'names<-' does *not*
 return an updated *tmp*, so there's even less to entertain.  
 
 
 How do you know?  Are you sure?  Have you by now studied what goes
 on under the hood?
   
 yes, a bit.  but in this example, it's enough to look into *tmp* to
 see that it hasn't got the names added, and since x does have names,
 names<- must have returned a copy of *tmp* rather than *tmp* changed:

 x = 1
 tmp = x
 x = 'names<-'(tmp, 'foo')
 names(tmp)
 # NULL
 

 Indeed, if you type these two commands on the command line, then it is
 not surprising that a copy of tmp is returned since you create a
 temporary object that ends up in the symbol table and persists after the
 commands are finished.
   

what does command line have to do with it?

 Obviously, assuming that R really executes
   `*tmp*` <- x
   x <- "names<-"(`*tmp*`, value=c("a","b"))
 under the hood, in the C code, then *tmp* does not end up in the symbol
 table 

no?

 and does not persist beyond the execution of 
   names(x) <- c("a","b")
   

no?

i guess you have looked under the hood;  point me to the relevant code.

 This looks to me as one of the situations where a value of 1 is used
 for the named field of some of the objects involved so that a copy can
 be avoided.  That's why I asked whether you looked under the hood.
   

anyway, what happens under the hood is much less interesting from the
user's perspective than what can be seen over the hood.  what i can see,
is that 'names<-' will incoherently perform in-place modification or
copy-on-assignment.

yes, *if* you are able to predict the refcount of the object passed to
'names<-' *then* you can predict what 'names<-' will do, but in general
you may not have the chance.  and in general, this should not matter
because it should be unobservable, but it isn't.
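
for contrast, a small sketch of the behaviour the infix form does give when two names share one value: changing one leaves the other alone. the variable names are illustrative, and nothing here is claimed about the prefix call, whose behaviour is the open question in this thread.

```r
# Two bindings of the same vector; modify one through the infix form.
x <- c(1, 2)
y <- x
names(y) <- c("a", "b")   # infix replacement: y is updated

x_names <- names(x)       # x is unaffected by the change to y
y_names <- names(y)

print(x_names)
print(y_names)
```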

back to your i += i++ example, the outcome may differ from compiler to
compiler, but, i guess, compilers will implement the order coherently,
so that whatever version they choose, the outcome will be predictable,
and not dependent on some earlier code.  (prove me wrong.  or maybe i'll
do it myself.)

   
 you suggested that "One reads the manual, (...) one reflects and
 investigates, ..."
 

 Indeed, and I am not giving up hope that one day you will master this
 art.
   

well, this time i meant you.


   
 -- had you done it, you wouldn't have asked the  question.
 

 Sorry, I forgot that you have a tendency to interpret statements
 extremely verbatim 

yes, i have two hooks installed:  one says \begin{verbatim}, the other
says \end{verbatim}.


 and with little reference to the context in which
 they are made.  

not that you're trying to be extremely accurate or polite here...

 I will try to be more explicit in future.
   

it will certainly do good to you.



 i just can't get why the manual does not manifestly explain what
 'names<-' does, and leaves you doing the guesswork you suggest.
 

 As I said before, patches to the documentation are also welcome.
   

i'll give it a try.


 Best wishes,
   

hope you mean it.

likewise,
vQ

Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread William Dunlap

Would it make anyone any happier if the manual said
that the replacement functions should not be called
in the form
   xNew <- `func<-`(xOld, value)
and should only be used as
   func(xToBeChanged) <- value
?

The explanation
  names(x) <- c("a","b")
  is equivalent to
  `*tmp*` <- x
  x <- "names<-"(`*tmp*`, value=c("a","b"))
could also be extended a bit, adding a line like
  rm(`*tmp*`)
Those 3 lines should be considered an atomic operation:
the value that `*tmp*` or `x` may have or what is
in the symbol table at various points in that sequence 
is not defined.  (Letting details be explicitly undefined
is important: it gives developers room to improve the
efficiency of the interpreter and tells users where not to go.) 
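
A minimal sketch of the recommended usage, assuming a hypothetical replacement function `second<-` that is not part of R or of this thread: it is defined once and then used only in the func(x) <- value form.

```r
# A user-defined replacement function: the last argument must be named `value`.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v <- c(10, 20, 30)
second(v) <- 99     # recommended form: func(xToBeChanged) <- value
print(v)
```

Used this way, the caller never needs to know whether the function modified its argument in place or worked on a copy.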


Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com  

 -----Original Message-----
 From: r-devel-boun...@r-project.org
 [mailto:r-devel-boun...@r-project.org] On Behalf Of Wacek Kusnierczyk
 Sent: Friday, March 13, 2009 11:42 AM
 To: Berwin A Turlach
 Cc: r-devel@r-project.org List
 Subject: Re: [Rd] surprising behaviour of names<-
 ... blah blah blah
  x = 1
  tmp = x
  x = 'names<-'(tmp, 'foo')
  names(tmp)
  # NULL

Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Tony Plate


Wacek Kusnierczyk wrote:

[snip]
i just can't get why the manual does not manifestly explain what
'names<-' does, and leaves you doing the guesswork you suggest.

  
I'm having trouble understanding the point of this discussion.  Someone 
is calling a replacement function in a way that it's not meant to be 
used, and is then complaining about it not doing what he thinks it
should, or about the documentation not describing what happens when one 
does that?


Is there anything incorrect or missing in the help page for normal usage 
of the replacement function for 'names'? (i.e., when used in an 
expression like 'names(x) <- ...')


R does give one the ability to use its facilities in non-standard ways.  
However, I don't see much value in the help page for 'gun' attempting to 
describe the ways in which the bones in your foot will be shattered 
should you choose to point the gun at your foot and pull the trigger.  
Reminds me of the story of the guy in New York, who after injuring his 
back in a refrigerator-carrying race, sued the manufacturer of the
refrigerator for not having a warning label against that sort of use.


-- Tony Plate

Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Wacek Kusnierczyk

William Dunlap wrote:
 Would it make anyone any happier if the manual said
 that the replacement functions should not be called
 in the form
    xNew <- `func<-`(xOld, value)
 and should only be used as
    func(xToBeChanged) <- value
   

surely better than guesswork.

 ? 

 The explanation
   names(x) <- c("a","b")
   is equivalent to
   `*tmp*` <- x
   x <- "names<-"(`*tmp*`, value=c("a","b"))
 could also be extended a bit, adding a line like
   rm(`*tmp*`)
 Those 3 lines should be considered an atomic operation:
 the value that `*tmp*` or `x` may have or what is
 in the symbol table at various points in that sequence 
 is not defined.  (Letting details be explicitly undefined
 is important: it gives developers room to improve the
 efficiency of the interpreter and tells users where not to go.) 
   

there is a difference between letting things be undefined and explicitly
stating that things are unspecified.  the c99 standard [1], for example,
is explicit about the non-determinism of expressions that involve side
effects, as it is about the fact that some expressions may not be
evaluated at all if the optimizer decides so.

berwin has already suggested that one reads from what docs do *not*
say;  it's a very bad idea.  it's best that the documentation *does* say
that, for example, a particular function should be used only in the
infix form because the semantics of the prefix form are not guaranteed
and may change in future versions.

if the current state is that 'names<-' will modify the object it is
given as an argument in some situations, but not in others, and this is
visible to the user, the best thing to do is to give an explicit warning
-- perhaps with an annotation that things may change, if they may.

best,
vQ


[1] http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf

Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Wacek Kusnierczyk

Tony Plate wrote:
 Wacek Kusnierczyk wrote:
 [snip]
 i just can't get why the manual does not manifestly explain what
 'names<-' does, and leaves you doing the guesswork you suggest.

   
 I'm having trouble understanding the point of this discussion. 
 Someone is calling a replacement function in a way that it's not meant
 to be used, and is then complaining about it not doing what he thinks
 it should, or about the documentation not describing what happens when
 one does that?

where is it written that the function is not meant to be used this way? 
you get an example in the man page, showing precisely how it could be
used that way.  it also explains the value of 'names<-':


 For 'names<-', the updated object.  (Note that the value of
 'names(x) <- value' is that of the assignment, 'value', not the
 return value from the left-hand side.)


it does speak of 'names<-' used in prefix form, and does not do it in
any negative (discouraging) way.
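
the distinction drawn in that note can be seen in a short sketch. the variable names assigned and returned are illustrative; the prefix call is given a fresh vector so that nothing depends on whether it modifies its argument.

```r
x <- 1:2

# Value of the infix assignment expression: the right-hand side, 'value'.
assigned <- (names(x) <- c("a", "b"))

# Value of the prefix call: the updated object.
returned <- `names<-`(1:2, c("a", "b"))

print(assigned)
print(returned)
```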


 Is there anything incorrect or missing in the help page for normal
 usage of the replacement function for 'names'? (i.e., when used in an
 expression like 'names(x) <- ...')

what is missing here in the first place is a specification of what
'normal' means.  as far as i can see from the man page, 'normal' does
not exclude prefix use.  and if so, what is missing in the help page is
a clear statement of what an application of 'names<-' will do, in the sense
of what a user may observe.


 R does give one the ability to use its facilities in non-standard
 ways.  However, I don't see much value in the help page for 'gun'
 attempting to describe the ways in which the bones in your foot will
 be shattered should you choose to point the gun at your foot and pull
 the trigger.  Reminds me of the story of the guy in New York, who
 after injuring his back in a refrigerator-carrying race, sued the
 manufacturer of the refrigerator for not having a warning label
 against that sort of use.

very funny.  little relevant.

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Wacek Kusnierczyk

Tony Plate wrote:
 Wacek Kusnierczyk wrote:
 Tony Plate wrote:

 Is there anything incorrect or missing in the help page for normal
 usage of the replacement function for 'names'? (i.e., when used in an
 expression like 'names(x) <- ...')
 

 what is missing here in the first place is a specification of what
 'normal' means.  as far as i can see from the man page, 'normal' does
 not exclude prefix use.  and if so, what is missing in the help page is
 a clear statement what an application of 'names<-' will do, in the sense
 of what a user may observe.
   
 Fair enough.  I looked at the help page for names after sending my
 email, and was surprised to see the following in the DETAILS section:

   It is possible to update just part of the names attribute via the
 general rules: see the examples. This works because the expression
 there is evaluated as z <- "names<-"(z, "[<-"(names(z), 3, "c2")).

 To me, this paragraph is far more confusing than enlightening,
 especially as it also gives the impression that it's OK to use a
 replacement function in a functional form.  In my own personal opinion
 it would be an enhancement to remove that example from the
 documentation, and just say you can do things like 'names(x)[2:3] <-
 c("a","b")'.

i must say that this part of the man page does explain things to me. 
much less the code [1] berwin suggested as a piece to read and
investigate (slightly modified):

tmp = x
x = 'names<-'(tmp, 'foo')

berwin's conclusion seemed to be that this code
hints/suggests/fortune-tells the user that 'names<-' might be doing side
effects. 

this code illustrates what names(x) = 'foo' (the infix form) does --
that it destructively modifies x.  now, if the code were to illustrate
that the prefix form does perform side effects too, then the following
would be enough:

'names<-'(x, 'foo')

if the code were to illustrate that the prefix form, unlike the infix
form, does not perform side effects, then the following would suffice
for a discussion:

x = 'names<-'(x, 'foo')

if the code were to illustrate that the prefix form may or may not do
side effects depending on the situation, then it surely fails to show
that, unless the user performs some sophisticated inference which i am
not capable of, or, more likely, unless the user already knows that this
was to be shown.

without a discussion, the example is simply an unworked rubbish.  and
it's obviously wrong; it says that (slightly and irrelevantly simplified)

names(x) = 'foo'

is equivalent to

tmp = x
x = 'names<-'(tmp, 'foo')

which is nonsense, because in the latter case you either have an
additional binding that you don't have in the former case, or, worse,
you rebind, possibly with a different value, a name that has had a
binding already.  it's a gritty-nitty detail, but so is most of
statistics based on nitty-gritty details which non-statisticians are
happy to either ignore or be ignorant about.


[1] http://stat.ethz.ch/R-manual/R-devel/doc/manual/R-lang.html#Comments


 I often use name replacement functions in a functional way, and
 because one can't use 'names<-' etc in this way, 

note, this 'because' does not follow in any way from the man page, or
the section of 'r language definition' referred to above.


 I define my own functions like the following:

 set.names <- function(n,x) {names(x) <- n; x}

it appears that

set.names = function(n, x) 'names<-'(x, n)

would do the job (guess why).
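
A minimal sketch of the two wrapper definitions discussed above, assuming only base R. The name set.names is from the thread; set.names2 and the test vector are illustrative. In both variants the replacement acts on the function's own argument, so the caller's object should be left alone.

```r
# Wrapper as proposed in the thread: the infix replacement is applied to
# the function's local x, and the modified value is returned.
set.names <- function(n, x) {
  names(x) <- n
  x
}

# One-line variant using the prefix form inside the function body
# (set.names2 is a hypothetical name used only for this comparison).
set.names2 <- function(n, x) `names<-`(x, n)

x <- c(1, 2)
y1 <- set.names(c("a", "b"), x)
y2 <- set.names2(c("a", "b"), x)

print(names(y1))  # names carried by the first result
print(names(y2))  # names carried by the second result
print(names(x))   # names of the original object
```

Either definition can then be used in a pipeline of function calls without naming an intermediate object.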


 (and similarly for set.rownames(), set.colnames(), etc.)

 I would highly recommend you do this rather than try to use a call
 like 'names<-'(x, ...).

i'm almost tempted to extend your recommendation to 'define your own
function for about every function already in r' ;)

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-13 Thread Berwin A Turlach

On Fri, 13 Mar 2009 19:41:42 +0100
Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

  Glad to see that we agree on this.

 
 owe you a beer.

O.k., if we ever meet it is first your shout and then mine.
 
  haven't objected to that.  i object to your 'r uses pass by value',
  which is only partially correct.
  
 
  Well, I used qualifiers and did not state it categorically. 

 
 indeed, you said R supposedly uses call-by-value (though we know how
 to circumvent that, don't we?).
 
 in that vein, R supposedly can be used to do valid statistical
 computations (though we know how to circumvent it) ;)

Sure, use Excel? ;-)
 
  Indeed, if you type these two commands on the command line, then it
  is not surprising that a copy of tmp is returned since you create a
  temporary object that ends up in the symbol table and persist after
  the commands are finished.

 
 what does command line have to do with it?

If you want to find out what goes on under the hood, it is not
necessarily sufficient to do the same calculations on the command line.
 
  Obviously, assuming that R really executes 
  '*tmp*' <- x
  x <- "names<-"('*tmp*', value=c("a","b"))
  under the hood, in the C code, then *tmp* does not end up in the
  symbol table 
 
 no?

Well, I don't see any new object created in my workspace after
x <- 4
names(x) <- "foo"
Do you?
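
One way to check this, as a sketch: run the two statements in a scratch environment and list what is bound there afterwards. The names env and left_behind are illustrative, and the scratch environment is used only so that the listing does not depend on the rest of the workspace.

```r
# Evaluate the assignment in a fresh environment.
env <- new.env()
evalq({
  x <- 4
  names(x) <- "foo"
}, env)

# List every binding the two statements left behind, including any
# non-standard names such as `*tmp*`.
left_behind <- ls(env, all.names = TRUE)
print(left_behind)
```

If a temporary were left in place, it would show up in this listing next to x.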

 i guess you have looked under the hood;  point me to the relevant
 code.

No I did not, because I am not interested in knowing such intimate
details of R, but it seems you were interested.
 
 yes, *if* you are able to predict the refcount of the object passed to
 'names<-' *then* you can predict what 'names<-' will do, [...] 

I think Simon pointed already out that you seem to have a wrong
picture of what is going on.  As far as I know, there is no refcount
for objects.  

The relevant documentation would be the R Internals manual, 1.1 SEXPs:

  What R users think of as variables or objects are symbols which are
  bound to a value. The value can be thought of as either a SEXP (a
  pointer), or the structure it points to, a SEXPREC (and there are
  alternative forms used for vectors, namely VECSXP pointing to
  VECTOR_SEXPREC structures).

and 1.1.2 Rest of header:

  The named field is set and accessed by the SET_NAMED
  and NAMED macros, and take values 0, 1 and 2. R has a `call by value'
  illusion, so an assignment like

  b <- a

  appears to make a copy of a and refer to it as b. However, if neither
  a nor b are subsequently altered there is no need to copy. What really
  happens is that a new symbol b is bound to the same value as a and the
  named field on the value object is set (in this case to 2). When an
  object is about to be altered, the named field is consulted. A value
  of 2 means that the object must be duplicated before being changed.
  (Note that this does not say that it is necessary to duplicate, only
  that it should be duplicated whether necessary or not.) A value of 0
  means that it is known that no other SEXP shares data with this
  object, and so it may safely be altered. A value of 1 is used for
  situations like

  dim(a) <- c(7, 2)

  where in principle two copies of a exist for the duration of the
  computation as (in principle)

  a <- `dim<-`(a, c(7, 2))

  but for no longer, and so some primitive functions can be optimized to
  avoid a copy in this case. 
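
The user-visible side of the passage above can be sketched as follows. This shows only the values one observes, not the internal named field; the variable names are illustrative.

```r
# After b <- a, changing b must not be visible through a.
a <- c(1, 2, 3)
b <- a
names(b) <- c("x", "y", "z")
print(names(a))  # names seen through a
print(names(b))  # names seen through b

# The infix replacement and the explicit prefix form with reassignment
# produce the same value.
a1 <- 1:14
dim(a1) <- c(7, 2)

a2 <- 1:14
a2 <- `dim<-`(a2, c(7, 2))

print(identical(a1, a2))
```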

 but in general you may not have the chance. [...]

Agreed.

 and in general, this should not matter because it should be
 unobservable, but it isn't.

That's your opinion (to which you are entitled).  Unfortunately (for
you), the designers of R decided on a design which allows them to
reduce the number of copies that have to be made.

  you suggested that One reads the manual, (...) one reflects and
  investigates, ...
  
 
  Indeed, and I am not giving up hope that one day you will master
  this art.

 
 well, this time i meant you.
 
Rest assured I have read and reflected on that part of the manual.  

And I guess it boils down to how you interpret what "is equivalent to"
means.

For me it means that those two commands are what is executed in the C
engine once the names(x) <- c("a","b") expression is parsed and the
parse list arrives at the interpreter.  To investigate whether that is
the case, one would have to look at the C code, and I have little
inclination to do so.  But that would be necessary to answer the
question whether *tmp* or a copy of *tmp* is returned, if one is really
interested in this question.  Or whether a *tmp* object is created at
all.

You seem to take "is equivalent to" to mean that issuing
names(x) <- c("a","b") on the command line has the same effect as
issuing those two other commands on the command line and addressing
whether *tmp* or a copy of *tmp* is returned in this case.  Fair
enough, but it addresses a different question.  And, as you said
yourself in another e-mail, on the command line these two versions are
not equivalent since

Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Berwin A Turlach

On Wed, 11 Mar 2009 20:31:18 +0100
Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

 Simon Urbanek wrote:
 
  On Mar 11, 2009, at 10:52 , Simon Urbanek wrote:
 
  Wacek,
 
  Peter gave you a full answer explaining it very well. If you really
  want to be able to trace each instance yourself, you have to learn
  far more about R internals than you apparently know (and Peter
  hinted at that). Internally x=1 and x=c(1) are slightly different
  in that the former has NAMED(x) = 2 whereas the latter has
  NAMED(x) = 0 which is what causes the difference in behavior as
  Peter explained. The reason is that c(1) creates a copy of the 1
  (which is a constant [=immutable] thus requiring a copy) and the
  new copy has no other references and thus can be modified and
  hence NAMED(x) = 0.
 
 
  Errata: to be precise replace NAMED(x) = 0 with NAMED(x) = 1 above
  -- since NAMED(c(1)) = 0 and once it's assigned to x it becomes
  NAMED(x) = 1 -- this is just a detail on how things work with
  assignment, the explanation above is still correct since
  duplication happens conditional on NAMED == 2.
 
 i guess this is what every user needs to know to understand the
 behaviour one can observe on the surface? 

Nope, only users who prefer to write '+'(1,2) instead of 1+2, or
'names<-'(x, 'foo') instead of names(x)='foo'.

Attempting to change the name attribute of x via 'names<-'(x, 'foo')
looks to me as if one relies on a side effect of the function
'names<-'; which, in my book would be a bad thing.  I.e. relying on side
effects of a function, or writing functions with side effects which are
then called for their side-effects;  this, of course, excludes
functions like plot() :)  I never had the need to call 'names<-'()
directly and cannot foresee circumstances in which I would do so.

Plenty of users, including me, are happy using the latter forms and,
hence, never have to bother with understanding these implementation
details or have to bother about them.  

Your mileage obviously varies, but that is when you have to learn about
these internal details.  If you call functions because of their
side-effects, you better learn what the side-effects are exactly.

Cheers,

Berwin


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Berwin A Turlach

On Wed, 11 Mar 2009 20:29:14 +0100
Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

 Simon Urbanek wrote:
  Wacek,
 
  Peter gave you a full answer explaining it very well. If you really
  want to be able to trace each instance yourself, you have to learn
  far more about R internals than you apparently know (and Peter
  hinted at that). Internally x=1 and x=c(1) are slightly different in
  that the former has NAMED(x) = 2 whereas the latter has NAMED(x) =
  0 which is what causes the difference in behavior as Peter
  explained. The reason is that c(1) creates a copy of the 1 (which
  is a constant [=immutable] thus requiring a copy) and the new copy
  has no other references and thus can be modified and hence NAMED(x)
  = 0.
 
 
 simon, thanks for the explanation, it's now as clear as i might
 expect.
 
 now i'm concerned with what you say:  that to understand something
 visible to the user one needs to learn far more about R internals
 than one apparently knows.  your response suggests that to use r
 without confusion one needs to know the internals, 

Simon can probably speak for himself, but according to my reading he
has not suggested anything similar to what you suggest he suggested. :)

 and this would be a really bad thing to say.. 

No problems, since he did not say anything vaguely similar to what you
suggest he said.

Cheers,

Berwin


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:
 On Wed, 11 Mar 2009 20:31:18 +0100
 Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

   
 Simon Urbanek wrote:
 
 On Mar 11, 2009, at 10:52 , Simon Urbanek wrote:

   
 Wacek,

 Peter gave you a full answer explaining it very well. If you really
 want to be able to trace each instance yourself, you have to learn
 far more about R internals than you apparently know (and Peter
 hinted at that). Internally x=1 and x=c(1) are slightly different
 in that the former has NAMED(x) = 2 whereas the latter has
 NAMED(x) = 0 which is what causes the difference in behavior as
 Peter explained. The reason is that c(1) creates a copy of the 1
 (which is a constant [=immutable] thus requiring a copy) and the
 new copy has no other references and thus can be modified and
 hence NAMED(x) = 0.

 
 Errata: to be precise replace NAMED(x) = 0 with NAMED(x) = 1 above
 -- since NAMED(c(1)) = 0 and once it's assigned to x it becomes
 NAMED(x) = 1 -- this is just a detail on how things work with
 assignment, the explanation above is still correct since
 duplication happens conditional on NAMED == 2.
   
 i guess this is what every user needs to know to understand the
 behaviour one can observe on the surface? 
 

 Nope, only users who prefer to write '+'(1,2) instead of 1+2, or
 'names<-'(x, 'foo') instead of names(x)='foo'.

   

well, as far as i remember, it has been said on this list that in r the
infix syntax is equivalent to the prefix syntax, so no one wanting to
use the form above should be afraid of different semantics;  these two
forms should be perfectly equivalent.  after all,

x = 1
names(x) = 'foo'
names(x)

should return NULL, because when the second assignment is made, we need
to make a copy of the value of x, so it is the copy that should have
changed names, not the value of x (which would still be the original 1).

on the other hand, the fact that

names(x) = 'foo'

is (or so it seems) a shorthand for

x = 'names<-'(x, 'foo')

is precisely why i'd think that the prefix 'names<-' should never do
destructive modifications, because that's what x = 'names<-'(x, 'foo'),
and thus also names(x) = 'foo', is for.

i guess the above is sort of blasphemy.

 Attempting to change the name attribute of x via 'names<-'(x, 'foo')
 looks to me as if one relies on a side effect of the function
 'names<-'; which, in my book would be a bad thing.  

indeed;  so, for coherence, 'names<-' should always do the modification
on a copy.  it would then have semantics different from the infix form
of 'names<-', but at least consistently so.



 I.e. relying on side
 effects of a function, or writing functions with side effects which are
 then called for their side-effects;  this, of course, excludes
 functions like plot() :)  I never had the need to call 'names<-'()
 directly and cannot foresee circumstances in which I would do so.
   

 Plenty of users, including me, are happy using the latter forms and,
 hence, never have to bother with understanding these implementation
 details or have to bother about them.  

 Your mileage obviously varies, but that is when you have to learn about
 these internal details.  If you call functions because of their
 side-effects, you better learn what the side-effects are exactly.
   

well, i can imagine a user using the prefix 'names<-' precisely under
the assumption that it will perform functionally;  i.e., 'names<-'(x,
'foo') will always produce a copy of x with the new names, and never
change the x.  that there will be a destructive modification made to x
on some, but not all, occasions, is hardly a good thing in this context
-- and it's not a situation where a user wants to use the function
because of its side effects, quite to the contrary.  this was actually
the situation i had when i first discovered the surprising behaviour of
'names<-';  i thought 'names<-' did *not* have side effects.

cheers, and thanks for the discussion.
vQ


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:
 On Wed, 11 Mar 2009 20:29:14 +0100
 Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

   
 Simon Urbanek wrote:
 
 Wacek,

 Peter gave you a full answer explaining it very well. If you really
 want to be able to trace each instance yourself, you have to learn
 far more about R internals than you apparently know (and Peter
 hinted at that). Internally x=1 and x=c(1) are slightly different in
 that the former has NAMED(x) = 2 whereas the latter has NAMED(x) =
 0 which is what causes the difference in behavior as Peter
 explained. The reason is that c(1) creates a copy of the 1 (which
 is a constant [=immutable] thus requiring a copy) and the new copy
 has no other references and thus can be modified and hence NAMED(x)
 = 0.
   
 simon, thanks for the explanation, it's now as clear as i might
 expect.

 now i'm concerned with what you say:  that to understand something
 visible to the user one needs to learn far more about R internals
 than one apparently knows.  your response suggests that to use r
 without confusion one needs to know the internals, 
 

 Simon can probably speak for himself, but according to my reading he
 has not suggested anything similar to what you suggest he suggested. :)
   

so i did not say *he* suggested this.  'your response suggests' does
not, on my reading, imply any intention from simon's side.  but it's you
who is an expert in (a dialect of) english, so i won't argue.


   
 and this would be a really bad thing to say.. 
 

 No problems, since he did not say anything vaguely similar to what you
 suggest he said.
   

let's not depart from the point.

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Berwin A Turlach

On Thu, 12 Mar 2009 10:05:36 +0100
Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

 well, as far as i remember, it has been said on this list that in r
 the infix syntax is equivalent to the prefix syntax, [...]

Whoever said that must have been at that moment not as precise as he or
she could have been.  Also, R does not behave according to what people
say on this list (which is good, because sometimes people say wrong
things on this list) but according to how it is documented to do; at
least that is what people on this list (and others) say. :)

And the R Language manual (ignoring for the moment that it is a draft
and all that), clearly states that 

names(x) <- c("a","b")

is equivalent to

'*tmp*' <- x
x <- "names<-"('*tmp*', value=c("a","b"))

[...]
 well, i can imagine a user using the prefix 'names<-' precisely under
 the assumption that it will perform functionally;  

You mean
y <- 'names<-'(x, "foo")
instead of
y <- x
names(y) <- "foo"
?

Fair enough.  But I would still prefer the latter version since it is
(for me) easier to read and to decipher the intention of the code.

 i.e., 'names<-'(x, 'foo') will always produce a copy of x with the
 new names, and never change the x.  

I am not sure whether R ever behaved in that way, but as Peter pointed
out, this would be quite undesirable from a memory management and
performance point of view.  Imagine that every time you modify a (name)
component of a large object a new copy of that object is created.
 
 cheers, and thanks for the discussion.

You are welcome.

Cheers,

Berwin


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:

 Whoever said that must have been at that moment not as precise as he or
 she could have been.  Also, R does not behave according to what people
 say on this list (which is good, because sometimes people say wrong
 things on this list) but according to how it is documented to do; at
 least that is what people on this list (and others) say. :)
   

well, ?'names<-' says:


Value:
 For 'names<-', the updated object. 


which is only partially correct, in that the value will sometimes be an
updated *copy* of the object.

 And the R Language manual (ignoring for the moment that it is a draft
 and all that), 

since we must...

 clearly states that 

   names(x) <- c("a","b")

 is equivalent to
   
   '*tmp*' <- x
   x <- "names<-"('*tmp*', value=c("a","b"))
   

... and?  does this say anything about what 'names<-'(...) actually
returns?  updated *tmp*, or a copy of it?


 [...]
   
 well, i can imagine a user using the prefix 'names<-' precisely under
 the assumption that it will perform functionally;  
 

 You mean
   y <- 'names<-'(x, "foo")
 instead of
   y <- x
   names(y) <- "foo"
 ?
   

what i mean is, rather precisely, that 'names<-'(x, 'foo') will produce
a *new* object with a copy of the value of x and names as specified, and
will *not*, under any circumstances, modify x.

the first line above does not quite address this, e.g.:

x = c(1)
y = 'names<-'(x, 'foo')
names(x)
# foo, 'should' be NULL


 Fair enough.  But I would still prefer the latter version since it is
 (for me) easier to read and to decipher the intention of the code.
   

you're welcome to use it.  but this is personal preference, and i'm
trying to discuss the semantics of r here.  what you show is a way to
clutter the code, and you need to explicitly name the new object, while,
in functional programming, it is typical to operate on anonymous objects
passed from one function to another, e.g.

f('names<-'(x, 'foo'))

which would have to become

y = x
names(y) = 'foo'
f(y)

or

f({y = x; names(y) = 'foo'; y})

with 'y' being a nuisance name.
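
A sketch of one more way to pass a named copy to a function without introducing a nuisance name, assuming stats::setNames is available (it ships with current R). Here f is a hypothetical consumer and out is an illustrative name; neither comes from the thread's actual code.

```r
# f stands in for any function that consumes a named vector (hypothetical).
f <- function(v) paste(names(v), v, sep = "=")

x <- c(1)

# setNames() returns a named copy of its first argument, so the call can
# be nested directly inside f().
out <- f(stats::setNames(x, "foo"))

print(out)
print(names(x))  # names of the original object
```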


 i.e., 'names<-'(x, 'foo') will always produce a copy of x with the
 new names, and never change the x.  
 

 I am not sure whether R ever behaved in that way, but as Peter pointed
 out, this would be quite undesirable from a memory management and
 performance point of view.  

why?  you can still use the infix names<- with destructive semantics to
avoid copying. 


 Imagine that every time you modify a (name)
 component of a large object a new copy of that object is created.
   

see above.  besides, r has been several times claimed here (but see your
remark above) to be a functional language, and in this context it is
surprising that the smart (i mean it) copy-on-assignment mechanism,
which is an implementational optimization, not only becomes visible, but
also makes functions (hmm, procedures?) such as 'names<-' non-functional
-- in some, but not all, cases.

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Wacek Kusnierczyk

Wacek Kusnierczyk wrote:

 is precisely why i'd think that the prefix 'names<-' should never do
 destructive modifications, because that's what x = 'names<-'(x, 'foo'),
 and thus also names(x) = 'foo', is for.

   

to make the point differently, i'd expect the following two to be
equivalent:

x = c(1); 'names<-'(x, 'foo'); names(x)
# foo

x = c(1); do.call('names<-', list(x, 'foo')); names(x)
# NULL

but they're obviously not.  and of course, just that i'd expect it is
not a strong argument.
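
A sketch of the do.call form, which the thread reports as leaving x untouched. Whether the direct prefix call alters x depends on the R version and on how many references the object has, so only the do.call variant is spelled out here. The name res is illustrative.

```r
x <- c(1)

# Building the argument list creates another reference to the value of x,
# so the replacement function works on a copy.
res <- do.call("names<-", list(x, "foo"))

print(names(res))  # names carried by the returned value
print(names(x))    # names of the original object
```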

vQ


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Berwin A Turlach

On Thu, 12 Mar 2009 10:53:19 +0100
Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

 well, ?'names<-' says:
 
 
 Value:
  For 'names<-', the updated object. 
 
 
 which is only partially correct, in that the value will sometimes be
 an updated *copy* of the object.

But since R supposedly uses call-by-value (though we know how to
circumvent that, don't we?) wouldn't you always expect that a copy of
the object is returned?
 
  And the R Language manual (ignoring for the moment that it is a
  draft and all that), 
 
 since we must...
 
  clearly states that 
 
  names(x) <- c("a","b")
 
  is equivalent to
  
  '*tmp*' <- x
  x <- "names<-"('*tmp*', value=c("a","b"))

 
 ... and?  

This seems to suggest that in this case the infix and prefix syntax
is not equivalent as it does not say that 
names(x) <- c("a","b")
is equivalent to
x <- "names<-"(x, value=c("a","b"))
and I was commenting on the claim that the infix syntax is equivalent
to the prefix syntax.

 does this say anything about what 'names<-'(...) actually
 returns?  updated *tmp*, or a copy of it?

Since R uses pass-by-value, you would expect the latter, wouldn't
you?  If you entertain the idea that 'names<-' updates *tmp* and
returns the updated *tmp*, then you believe that 'names<-' behaves in a
non-standard way and should take appropriate care.

And the fact that a variable *tmp* is used hints to the fact that
'names<-' might have side-effect.  If 'names<-' has side effects,
then it might not be well defined with what value x ends up with if
one executes:
x <- 'names<-'(x, value=c("a","b"))  

This is similar to the discussion what value i should have in the
following C snippet:
i = 0;
i += i++;
 
[..]
  I am not sure whether R ever behaved in that way, but as Peter
  pointed out, this would be quite undesirable from a memory
  management and performance point of view.  
 
 why?  you can still use the infix names<- with destructive semantics
 to avoid copying. 

I guess that would require a rewrite (or extension) of the parser.  To
me, Section 10.1.2 of the Language Definition manual suggests that once
an expression is parsed, you cannot distinguish any more whether
'names<-' was called using infix syntax or prefix syntax.

Thus, I guess you want to start a discussion with R Core whether it is
worthwhile to change the parser such that it keeps track on whether a
function was used with infix notation or prefix notation and to
provide for most (all?) assignment operators implementations that use
destructive semantics if the infix version was used and always copy if
the prefix notation is used. 

Cheers,

Berwin


Re: [Rd] surprising behaviour of names<-

2009-03-12 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:
 On Thu, 12 Mar 2009 10:53:19 +0100
 Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

   
 well, ?'names<-' says:

 
 Value:
  For 'names<-', the updated object. 
 

 which is only partially correct, in that the value will sometimes be
 an updated *copy* of the object.
 

 But since R supposedly 

*supposedly*

 uses call-by-value (though we know how to
 circumvent that, don't we?) 

we know how a lot of built-ins hack around this, don't we, and we also
know that call-by-value is not really the argument passing mechanism in r.

 wouldn't you always expect that a copy of
 the object is returned?
   

indeed!  that's what i have said previously, no?  there is still space
for the smart (i mean it) copy-on-assignment behaviour, but it should
not be visible to the user, in particular, not in that 'names<-'
destructively modifies the object it is given when the refcount is 1. 
in my humble opinion, there is either a design flaw or a bug here.


  
   
 And the R Language manual (ignoring for the moment that it is a
 draft and all that), 
   
 since we must...

 
 clearly states that 

 names(x) <- c("a","b")

 is equivalent to
 
 '*tmp*' <- x
 x <- "names<-"('*tmp*', value=c("a","b"))
   
   
 ... and?  
 

 This seems to suggest 

seems to suggest?  is not the purpose of documentation to clearly,
ideally beyond any doubt, specify what is to be specified?

 that in this case the infix and prefix syntax
 is not equivalent as it does not say that 
   

are you suggesting fortune telling from what the docs do *not* say?

   names(x) <- c("a","b")
 is equivalent to
   x <- "names<-"(x, value=c("a","b"))
 and I was commenting on the claim that the infix syntax is equivalent
 to the prefix syntax.

   
 does this say anything about what 'names<-'(...) actually
 returns?  updated *tmp*, or a copy of it?
 

 Since R uses pass-by-value, 

since?  it doesn't!

 you would expect the latter, wouldn't
 you?  

yes, that's what i'd expect in a functional language.

 If you entertain the idea that 'names<-' updates *tmp* and
 returns the updated *tmp*, then you believe that 'names<-' behaves in a
 non-standard way and should take appropriate care.
   

i got lost in your argumentation.  i have given examples of where
'names<-' destructively modifies and returns the updated object, not a
copy.  what is your point here?

 And the fact that a variable *tmp* is used hints to the fact that
 'names<-' might have side-effect.  

are you suggesting fortune telling from the fact that a variable *tmp*
is used?


 If 'names<-' has side effects,
 then it might not be well defined with what value x ends up with if
 one executes:
   x <- 'names<-'(x, value=c("a","b"))  
   

not really, unless you mean the returned object in the referential sense
(memory location) versus value conceptually.  here x will obviously have
the value of the original x plus the names, *but* indeed you cannot tell
from this snippet whether after the assignment x will be the same,
though updated, object or will rather be an updated copy:

x = c(1)
x = 'names<-'(x, 'foo')
# x is the same object

x = c(1)
y = x
x = 'names<-'(x, 'foo')
# x is another object

so, as you say, it is not well defined with what object will x end up as
its value, though the value of the object visible to the user is well
defined.  rewrite the above and play:

x = c(1)
y = 'names<-'(x, 'foo')
names(x)

what are the names of x?  is y identical (sensu reference) with x, is y
different (sensu reference) but indiscernible (sensu value) from x, or
is y different (sensu value) from x in that y has names and x doesn't?



 This is similar to the discussion what value i should have in the
 following C snippet:
   i = 0;
   i += i++;
   

nonsense, it's a *completely* different issue.  here you touch the issue
of the order of evaluation, and not of whether an object is copied or
modified;  above, the inverse is true.

in fact, your example is useless because the result here is clearly
specified by the semantics (as far as i know -- prove me wrong).  you
lookup i (0) and i (0) (the order does not matter here), add these
values (0), assign to i (0), and increase i (1). 

i have a better example for you:

int i = 0;
i += ++i - ++i;

which will give different final values for i in c (2 with gcc 4.2, 1
with gcc 3.4), c# and java (-1), perl (2) and php (1).  again, this has
nothing to do with the above.



  
 [..]
   
 I am not sure whether R ever behaved in that way, but as Peter
 pointed out, this would be quite undesirable from a memory
 management and performance point of view.  
   
 why?  you can still use the infix names<- with destructive semantics
 to avoid copying. 
 

 I guess that would require a rewrite (or extension) of the parser.  To
 me, Section 10.1.2 of the Language Definition manual suggests that once
 an expression is parsed, you cannot distinguish any more whether

Re: [Rd] surprising behaviour of names-

2009-03-12 Thread Wacek Kusnierczyk

Wacek Kusnierczyk wrote:
 Berwin A Turlach wrote:
   

 This is similar to the discussion what value i should have in the
 following C snippet:
  i = 0;
  i += i++;
   
 


 in fact, your example is useless because the result here is clearly
 specified by the semantics (as far as i know -- prove me wrong).  you
 lookup i (0) and i (0) (the order does not matter here), add these
 values (0), assign to i (0), and increase i (1). 
   

i'm happy to prove myself wrong.  the c programming language, 2nd ed. by
kernighan and ritchie, has the following discussion:


One unhappy situation is typified by the statement

a[i] = i++;

The question is whether the subscript is the old value of i or the new.
Compilers can interpret this in different ways, and generate different
answers depending on their interpretation. The standard intentionally
leaves most such matters unspecified.


vQ


Re: [Rd] surprising behaviour of names-

2009-03-12 Thread Wacek Kusnierczyk

Simon Urbanek wrote:

 On Mar 11, 2009, at 10:52 , Simon Urbanek wrote:

 Wacek,

 Peter gave you a full answer explaining it very well. If you really
 want to be able to trace each instance yourself, you have to learn
 far more about R internals than you apparently know (and Peter hinted
 at that). Internally x=1 and x=c(1) are slightly different in that the
 former has NAMED(x) = 2 whereas the latter has NAMED(x) = 0 which is
 what causes the difference in behavior as Peter explained. The reason
 is that c(1) creates a copy of the 1 (which is a constant
 [=immutable] thus requiring a copy) and the new copy has no other
 references and thus can be modified and hence NAMED(x) = 0.


 Errata: to be precise replace NAMED(x) = 0 with NAMED(x) = 1 above --
 since NAMED(c(1)) = 0 and once it's assigned to x it becomes NAMED(x)
 = 1 -- this is just a detail on how things work with assignment, the
 explanation above is still correct since duplication happens
 conditional on NAMED == 2.

there is an interesting corollary.  self-assignment seems to increase
the reference count:

x = 1;  'names<-'(x, 'foo'); names(x)
# NULL

x = 1;  x = x;  'names<-'(x, 'foo'); names(x)
# foo

vQ


Re: [Rd] surprising behaviour of names-

2009-03-12 Thread Berwin A Turlach

On Thu, 12 Mar 2009 15:21:50 +0100
Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

[...]   
  And the R Language manual (ignoring for the moment that it is a
  draft and all that), 

  since we must...
 
  
  clearly states that 
 
    names(x) <- c("a","b")
 
  is equivalent to

    `*tmp*` <- x
    x <- "names<-"(`*tmp*`, value=c("a","b"))


  ... and?  
  
 
  This seems to suggest 
 
 seems to suggest?  is not the purpose of documentation to clearly,
 ideally beyond any doubt, specify what is to be specified?

The R Language Definition manual is still a draft. :)

  that in this case the infix and prefix syntax
  is not equivalent as it does not say that 

 
 are you suggesting fortune telling from what the docs do *not* say?

My experience is that sometimes you have to realise what is not
stated.  I remember a discussion with somebody who asked why he could
not run, on windows, R CMD INSTALL on a *.zip file.  I pointed out to
him that the documentation states that you can run R CMD INSTALL on
*.tar.gz or *.tgz files and, thus, there should be no expectation that
it can be run on *.zip file.

YMMV, but when I read a passage like this in R documentation, I start
to wonder why it is stated that
    names(x) <- c("a","b")
is equivalent to
    `*tmp*` <- x
    x <- "names<-"(`*tmp*`, value=c("a","b"))
and the simpler construct
    x <- "names<-"(x, value=c("a", "b"))
is not used.  There must be a reason, nobody likes to type
unnecessarily long code.  And, after thinking about this for a while,
the penny might drop.
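
The expansion can also be run by hand.  The following is a minimal sketch, assuming the rm() step that the R Language Definition gives as the final line of the expansion; the variable x and its values are illustrative only:

```r
x <- c(1, 2)

# Manual version of: names(x) <- c("a", "b")
`*tmp*` <- x
x <- `names<-`(`*tmp*`, value = c("a", "b"))
rm(`*tmp*`)

names(x)                           # "a" "b"
exists("*tmp*", inherits = FALSE)  # FALSE: the temporary is gone
```

The rm() step would also account for a user-defined variable called `*tmp*` disappearing after a replacement call in the same environment.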

[...] 
  does this say anything about what 'names-'(...) actually
  returns?  updated *tmp*, or a copy of it?
  
 
  Since R uses pass-by-value, 
 
 since?  it doesn't!

For all practical purposes it is, as long as standard evaluation is
used.  One just has to be aware that some functions evaluate their
arguments in a non-standard way.  

[...]
  If you entertain the idea that 'names-' updates *tmp* and
  returns the updated *tmp*, then you believe that 'names-' behaves
  in a non-standard way and should take appropriate care. 
 
 i got lost in your argumentation.  [..]

I was commenting on "does this say anything about what 'names<-'(...)
actually returns?  updated *tmp*, or a copy of it?"

As I said, if you entertain the idea that 'names<-' returns an updated
*tmp*, then you believe that 'names<-' behaves in a non-standard way
and appropriate care has to be taken.

  And the fact that a variable *tmp* is used hints to the fact that
  'names-' might have side-effect.  
 
 are you suggesting fortune telling from the fact that a variable *tmp*
 is used?

Nothing to do with fortune telling.  One reads the manual, one wonders
why this construct is used instead of an apparently much simpler
one, one reflects and investigates, one realises why the given
construct is stated as the equivalent: because 'names<-' has
side-effects.

  This is similar to the discussion what value i should have in the
  following C snippet:
  i = 0;
  i += i++;

 
 nonsense, it's a *completely* different issue.  here you touch the
 issue of the order of evaluation, and not of whether an object is
 copied or modified;  above, the inverse is true.

Sorry, there was a typo above.  The second statement should have been
i = i++;

Then on some abstract level they are the same; an object appears on the
left hand side of an assignment but is also modified in the expression
assigned to it.  So what value should it end up with?

   
  why?  you can still use the infix names- with destructive
  semantics to avoid copying. 
  
 
  I guess that would require a rewrite (or extension) of the parser.
  To me, Section 10.1.2 of the Language Definition manual suggests
  that once an expression is parsed, you cannot distinguish any more
  whether 'names-' was called using infix syntax or prefix syntax.

 
 but this must be nonsense, since:
 
 x = 1
 'names<-'(x, 'foo')
 names(x)
 # NULL
 
 x = 1
 names(x) <- 'foo'
 names(x)
 # foo
 
 clearly, there is not only syntactic difference here.  but it might be
 that 10.1.2 does not suggest anything like what you say.

Please tell me how this example contradicts my reading of 10.1.2 that
the expressions 
    'names<-'(x, 'foo')
and
    names(x) <- 'foo'
once they are parsed, produce exactly the same parse tree and that it
becomes impossible to tell from the parse tree whether originally the
infix syntax or the prefix syntax was used.  In fact, the last sentence
in section 10.1.2 strongly suggests to me that the parse tree stores
all function calls as if prefix notation was used.  But it is probably
my English again.
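
The parse trees in question can be inspected directly with quote().  A minimal sketch (the expressions are illustrative, and this only looks at the parsed calls, not at how the evaluator later treats them): an operator written infix or prefix yields the same call object, while the assignment form names(x) <- "foo" is stored as a call to `<-` rather than as a call to `names<-`.

```r
# Infix and prefix spellings of the same operator parse identically.
infix  <- quote(x + y)
prefix <- quote(`+`(x, y))
identical(infix, prefix)             # TRUE

# The assignment form and the direct call are different call objects.
assign_form <- quote(names(x) <- "foo")
direct_form <- quote(`names<-`(x, "foo"))
as.character(assign_form[[1]])       # "<-"
as.character(direct_form[[1]])       # "names<-"
identical(assign_form, direct_form)  # FALSE
```

So the two spellings of the names example are distinguishable after parsing; the expansion into a call to `names<-` happens when the `<-` call is evaluated.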

  Thus, I guess you want to start a discussion with R Core whether it
  is worthwhile to change the parser such that it keeps track on
  whether a function was used with infix notation or prefix notation
  and to provide for most (all?) assignment operators implementations
  that

Re: [Rd] surprising behaviour of names-

2009-03-12 Thread Simon Urbanek



On Mar 12, 2009, at 11:12 , Wacek Kusnierczyk wrote:


Simon Urbanek wrote:


On Mar 11, 2009, at 10:52 , Simon Urbanek wrote:


Wacek,

Peter gave you a full answer explaining it very well. If you really
want to be able to trace each instance yourself, you have to learn
far more about R internals than you apparently know (and Peter hinted
at that). Internally x=1 and x=c(1) are slightly different in that the
former has NAMED(x) = 2 whereas the latter has NAMED(x) = 0 which is
what causes the difference in behavior as Peter explained. The reason
is that c(1) creates a copy of the 1 (which is a constant
[=immutable] thus requiring a copy) and the new copy has no other
references and thus can be modified and hence NAMED(x) = 0.



Errata: to be precise replace NAMED(x) = 0 with NAMED(x) = 1 above --
since NAMED(c(1)) = 0 and once it's assigned to x it becomes NAMED(x)
= 1 -- this is just a detail on how things work with assignment, the
explanation above is still correct since duplication happens
conditional on NAMED == 2.


there is an interesting corollary.  self-assignment seems to
increase the reference count:


   x = 1;  'names<-'(x, 'foo'); names(x)
   # NULL

   x = 1;  x = x;  'names<-'(x, 'foo'); names(x)
   # foo



Not for me, at least in current R:

> x = 1;  'names<-'(x, 'foo'); names(x)
foo
  1
NULL
> x = 1;  x = x;  'names<-'(x, 'foo'); names(x)
foo
  1
NULL

(both R 2.8.1 and R-devel 3/11/09, darwin 9.6)

In addition, you still got it backwards - your output suggests that
the assignment created a new, clean copy. Functional call of `names<-`
(whose side-effect on x is undefined BTW) is destructive when you get
a clean copy (e.g. as a result of the c function) and non-destructive
when the object was referenced. It is left as an exercise to the
reader to reason why constants such as 1 are referenced.


Cheers,
Simon


Re: [Rd] surprising behaviour of names-

2009-03-12 Thread G. Jay Kerns

Wacek Kusnierczyk wrote:

[snip]

 as i explained a few months ago, i study r to find examples of bad
 design.  if anyone in the r core is interested in having the problems i
 report fixed, i'm happy to get involved in a discussion about the design
 and implementation.  if not, i'm happy with just pointing out the issues.

:-)

I am prompted to imagine someone pointing out to the volunteers of the
International Red Cross - on the field of a natural disaster, no less
- that their uniforms are not an acceptably consistent shade of
pink... or that the screws on their tourniquets do not have the
appropriate pitch as to minimize the friction for the turner...

As a practicing statistician I am simply thankful that the bleeding is
stopped.   :-)

Cheers to R-Core (and the hundreds of other volunteers).
Jay



***
G. Jay Kerns, Ph.D.
Associate Professor
Department of Mathematics & Statistics
Youngstown State University
Youngstown, OH 44555-0002 USA
Office: 1035 Cushwa Hall
Phone: (330) 941-3310 Office (voice mail)
-3302 Department
-3170 FAX
E-mail: gke...@ysu.edu
http://www.cc.ysu.edu/~gjkerns/


Re: [Rd] surprising behaviour of names-

2009-03-12 Thread Wacek Kusnierczyk

Berwin A Turlach wrote:
 On Thu, 12 Mar 2009 15:21:50 +0100
 Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

   
 seems to suggest?  is not the purpose of documentation to clearly,
 ideally beyond any doubt, specify what is to be specified?
 

 The R Language Definition manual is still a draft. :)
   

this is indeed a good explanation for all sorts of nonsense.  worse if
stuff tends to persist despite critique.

   
 that in this case the infix and prefix syntax
 is not equivalent as it does not say that 
   
   
 are you suggesting fortune telling from what the docs do *not* say?
 

 My experience is that sometimes you have to realise what is not
 stated.  

in general, yes.  in r, this often ends up with 'have you seen the
documentation saying that??' in response.

 I remember a discussion with somebody who asked why he could
 not run, on windows, R CMD INSTALL on a *.zip file.  I pointed out to
 him that the documentation states that you can run R CMD INSTALL on
 *.tar.gz or *.tgz files and, thus, there should be no expectation that
 it can be run on *.zip file.
   

yes, that's a good point.  this reminds me of a (possibly anecdotal)
lady who sued the manufacturer of her microwave after she had dried her
cat in it after a bath.

 YMMV, but when I read a passage like this in R documentation, I start
 to wonder why it is stated that 
   names(x) <- c("a","b")
 is equivalent to 
   `*tmp*` <- x
   x <- "names<-"(`*tmp*`, value=c("a","b"))
 and the simpler construct
   x <- "names<-"(x, value=c("a", "b"))
 is not used.  There must be a reason, 

got an explanation:  because it probably is as drafty as the
aforementioned document.

 nobody likes to type
 unnecessarily long code.  And, after thinking about this for a while,
 the penny might drop.
   

that's cool.  instead of stating what 'names<-' does or does not, one
expresses it in a convoluted way and makes you guess from a *tmp*
variable. a nice exercise, i like it.

 [...] 
   
 does this say anything about what 'names-'(...) actually
 returns?  updated *tmp*, or a copy of it?
 
 
 Since R uses pass-by-value, 
   
 since?  it doesn't!
 

 For all practical purposes it is as long as standard evaluation is
 used.  One just have to be aware that some functions evaluate their
 arguments in a non-standard way.  
   

it's maybe a bit of hairsplitting, but what you have in r is not exactly
what is called 'pass by value'.  here's a relevant quote from [1], p. 309:


In the call-by-name (CBN) mechanism, a formal parameter names the
computation designated by an unevaluated argument expression.

In the call-by-value (CBV) mechanism, a formal parameter names the value
of an evaluated argument expression.

In the call-by-need or lazy evaluation (CBL), the formal parameter name
can be bound to a location that originally stores the computation of the
argument expression. The first time the parameter is referenced, the
computation is performed, but the resulting value is cached at the
location and is used on every subsequent reference. Thus, the argument
expression is evaluated at most once and is never evaluated at all if
the parameter is never referenced.


note the 'unevaluated' and 'evaluated'.  you're free to have your pick. 

but it is possible to send an argument to a function that makes an
assignment to the argument, and yet the assignment is made to the
original, not to a copy:

foo = function(arg) arg$foo = foo

e = new.env()
foo(e)
e$foo
  
are you sure this is pass by value?

it appears that r has a pass-by-need mechanism that dispatches to
pass-by-value or pass-by-reference depending on the type of the object. 
with this semantics, all sorts of mess are possible, and 'names<-'
provides one example.
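
the contrast between the two kinds of object can be sketched as follows.  this is a minimal illustration; set_field is a hypothetical helper, not part of r:

```r
# Hypothetical helper: assigns a field on whatever it is given.
set_field <- function(arg) {
  arg$field <- "set inside"
  invisible(NULL)
}

e <- new.env()
set_field(e)
e$field   # "set inside": the caller's environment was modified

l <- list()
set_field(l)
l$field   # NULL: only the function's local copy was modified
```

the same function body thus has a visible side effect for an environment and none for a list.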

[1] design concepts in programming languages, turbak and gifford, mit
press 2008


 [...]
   
 If you entertain the idea that 'names-' updates *tmp* and
 returns the updated *tmp*, then you believe that 'names-' behaves
 in a non-standard way and should take appropriate care. 
   
 i got lost in your argumentation.  [..]
 

 I was commenting on does this say anything about what 'names-'(...)
 actually returns?  updated *tmp*, or a copy of it?

 As I said, if you entertain the idea that 'names-' returns an updated
 *tmp*, then you believe that 'names-' behaves in a non-standard way
 and appropriate care has to be taken.

   

i can check, by experimentation, whether 'names<-' returns a copy or the
original; even if i can establish that it returns the original after
having modified it, it's not something to entertain.  maybe you
entertain the idea of your users performing the guesswork instead of
reading an unambiguous specification.  you have already said that you
don't care if your users get confused, it would fit the image.

and actually, in the example we discuss, 'names<-' does *not* return an
updated *tmp*, so there's even less to entertain.  for fun and more
guesswork, the example could

Re: [Rd] surprising behaviour of names-

2009-03-12 Thread Wacek Kusnierczyk

Simon Urbanek wrote:

 On Mar 12, 2009, at 11:12 , Wacek Kusnierczyk wrote:

 Simon Urbanek wrote:

 On Mar 11, 2009, at 10:52 , Simon Urbanek wrote:

 Wacek,

 Peter gave you a full answer explaining it very well. If you really
 want to be able to trace each instance yourself, you have to learn
 far more about R internals than you apparently know (and Peter hinted
 at that). Internally x=1 an x=c(1) are slightly different in that the
 former has NAMED(x) = 2 whereas the latter has NAMED(x) = 0 which is
 what causes the difference in behavior as Peter explained. The reason
 is that c(1) creates a copy of the 1 (which is a constant
 [=unmutable] thus requiring a copy) and the new copy has no other
 references and thus can be modified and hence NAMED(x) = 0.


 Errata: to be precise replace NAMED(x) = 0 with NAMED(x) = 1 above --
 since NAMED(c(1)) = 0 and once it's assigned to x it becomes NAMED(x)
 = 1 -- this is just a detail on how things work with assignment, the
 explanation above is still correct since duplication happens
 conditional on NAMED == 2.

 there is an interesting corollary.  self-assignment seems to increase
 the reference count:

x = 1;  'names<-'(x, 'foo'); names(x)
# NULL

x = 1;  x = x;  'names<-'(x, 'foo'); names(x)
# foo


 Not for me, at least in current R:

not for me either.  i messed up the example, sorry.  here's the intended
version:

x = c(1);  'names<-'(x, 'foo');  names(x)
# foo

x = c(1);  x = x; 'names<-'(x, 'foo');  names(x)
# NULL
  


 > x = 1;  'names<-'(x, 'foo'); names(x)
 foo
   1
 NULL
 > x = 1;  x = x;  'names<-'(x, 'foo'); names(x)
 foo
   1
 NULL

 (both R 2.8.1 and R-devel 3/11/09, darwin 9.6)

 In addition, you still got it backwards - your output suggests that
 the assignment created a new, clean copy. Functional call of `names<-`
 (whose side-effect on x is undefined BTW) is destructive when you get
 a clean copy (e.g. as a result of the c function) and non-destructive
 when the object was referenced. It is left as an exercise to the
 reader to reason why constants such as 1 are referenced.

all true, again because of my mistake. 

anyway, it may be surprising that with all its smartness (i mean it)
about copy-on-assignment, r does not see that it makes no sense to
increase refcount here.  of course, you can't judge from just the
syntactic form 'x=x', but still it should not be very difficult to have
the interpreter see when it finds an object named 'x' in the same
environment where it attempts the assignment.  (of course, who'd do
self-assignments in practical code?)

cheers,
vQ


Re: [Rd] surprising behaviour of names-

2009-03-12 Thread Josh Ulrich

On Thu, Mar 12, 2009 at 3:24 PM, G. Jay Kerns gke...@ysu.edu wrote:
 Wacek Kusnierczyk wrote:

 [snip]

 as i explained a few months ago, i study r to find examples of bad
 design.  if anyone in the r core is interested in having the problems i
 report fixed, i'm happy to get involved in a discussion about the design
 and implementation.  if not, i'm happy with just pointing out the issues.

 :-)

 I am prompted to imagine someone pointing out to the volunteers of the
 International Red Cross - on the field of a natural disaster, no less
 - that their uniforms are not an acceptably consistent shade of
 pink... or that the screws on their tourniquets do not have the
 appropriate pitch as to minimize the friction for the turner...

Your analogy may overstate the case a bit, since R volunteers - while
providing a valuable service to the community - are not dealing with
matters of life and death.

Habitat for Humanity (an organization that provides free housing to
the under-privileged) would be a better comparison.  I'm sure those
volunteers would appreciate a critique of their work, provided the
critique was not condescending and focused on serving the community
better, not to showcase the acumen of the one giving the critique.


 As a practicing statistician I am simply thankful that the bleeding is
 stopped.   :-)

 Cheers to R-Core (and the hundreds of other volunteers).
 Jay


I second that.  Thanks to R-Core et al for all their generous efforts.



 ***
 G. Jay Kerns, Ph.D.
 Associate Professor
 Department of Mathematics  Statistics
 Youngstown State University
 Youngstown, OH 44555-0002 USA
 Office: 1035 Cushwa Hall
 Phone: (330) 941-3310 Office (voice mail)
 -3302 Department
 -3170 FAX
 E-mail: gke...@ysu.edu
 http://www.cc.ysu.edu/~gjkerns/



Best,
Josh
--
http://quantemplation.blogspot.com


Re: [Rd] surprising behaviour of names-

2009-03-12 Thread Wacek Kusnierczyk

G. Jay Kerns wrote:
 Wacek Kusnierczyk wrote:


   
 I am prompted to imagine someone pointing out to the volunteers of the
 International Red Cross - on the field of a natural disaster, no less
 - that their uniforms are not an acceptably consistent shade of
 pink... or that the screws on their tourniquets do not have the
 appropriate pitch as to minimize the friction for the turner...

   

not that it is very accurate, because unintuitive and confusing
semantics may lead to hidden and dangerous errors in users' code.  wrong
shade of a uniform might lead to the person being shot, for example, but
then your point vanishes.


 As a practicing statistician I am simply thankful that the bleeding is
 stopped.   :-)
   

when it is stopped, not turned to an internal bleeding, which you simply
don't see.

 Cheers to R-Core (and the hundreds of other volunteers).

   

absolutely.

vQ


Re: [Rd] surprising behaviour of names-

2009-03-12 Thread Berwin A Turlach

On Thu, 12 Mar 2009 21:26:15 +0100
Wacek Kusnierczyk waclaw.marcin.kusnierc...@idi.ntnu.no wrote:

  YMMV, but when I read a passage like this in R documentation, I
  start to wonder why it is stated that 
    names(x) <- c("a","b")
  is equivalent to 
    `*tmp*` <- x
    x <- "names<-"(`*tmp*`, value=c("a","b"))
  and the simpler construct
    x <- "names<-"(x, value=c("a", "b"))
  is not used.  There must be a reason, 
 
 got an explanation:  because it probably is as drafty as the
 aforementioned document.

Your grasp of what draft manual means in the context of R
documentation seems to be as tenuous as the grasp of intelligent
design/creationist proponents on what it means in science to label a
body of knowledge a (scientific) theory. :)

[...]
 but it is possible to send an argument to a function that makes an
 assignment to the argument, and yet the assignment is made to the
 original, not to a copy:
 
 foo = function(arg) arg$foo = foo
 
 e = new.env()
 foo(e)
 e$foo
   
 are you sure this is pass by value?

But that is what environments are for, aren't they?  And it is
documented behaviour.  Read section 2.1.10 (Environments) in the R
Language Definition, in particular the last paragraph:

  Unlike most other R objects, environments are not copied when 
  passed to functions or used in assignments.  Thus, if you assign the
  same environment to several symbols and change one, the others will
  change too.  In particular, assigning attributes to an environment can
  lead to surprises.

[..]
 and actually, in the example we discuss, 'names-' does *not* return
 an updated *tmp*, so there's even less to entertain.  

How do you know?  Are you sure?  Have you by now studied what goes on
under the hood?

 for fun and more guesswork, the example could have been:
 
 x = x
 x = 'names<-'(x, value=c('a', 'b'))

But it is manifestly not written that way in the manual; and for good
reasons, since 'names<-' might have side effects, which would invoke
undefined behaviour in the last line.  Just as in the equivalent C snippet
that I mentioned.

 for your interest in well written documentation, ?names says that the
 argument x is 'an r object', and nowhere does it say that environment
 is not an r object.  it also says what the value of 'names<-' applied
 to pairlists is.  the following error message is doubly surprising:
 
 e = new.env()
 'names<-'(e, 'foo')
 # Error: names() applied to a non-vector

But names are implemented by assigning a 'names' attribute to the
object; as you should know.  And the above documentation suggests that
it is not a good idea to assign attributes to environments.  So why
would you expect this to work?

 firstly, because it would seem that there's nothing wrong in applying
 names to an environment;  from ?'$':
 
 
 x$name
 
 name: A literal character string or a name (possibly backtick
   quoted).  For extraction, this is normally (see under
   'Environments') partially matched to the 'names' of the
   object.
 

I fail to see the relevance of this.

 secondly, because, as ?names says, names can be applied to pairlists,

Yes, but it does not say that names can be applied to environments.
And it explicitly says that the default methods get and set the
'names' attribute of... and (other) documentation warns you about
setting attributes on environments.

 which are not vectors, and the following does not give an error as
 above:
 
 p = pairlist()
 is.vector(p)
 # FALSE
 names(p)
 # names successfully applied to a non-vector

 assure me this is not a mess, but a well-documented design feature.

It is documented; whether it is well-documented depends on your definition
of well-documented. :)

 ... and one wonders why r man pages have to be read in O(e^n) time.

I believe patches to documentation are also welcome; and perhaps more
readily accepted than patches to code. 

[...]  
  I guess that would require a rewrite (or extension) of the parser.
  To me, Section 10.1.2 of the Language Definition manual suggests
  that once an expression is parsed, you cannot distinguish any more
  whether 'names-' was called using infix syntax or prefix syntax.


  but this must be nonsense, since:
 
  x = 1
  'names<-'(x, 'foo')
  names(x)
  # NULL
 
  x = 1
  names(x) <- 'foo'
  names(x)
  # foo
 
  clearly, there is not only syntactic difference here.  but it
  might be that 10.1.2 does not suggest anything like what you say.
  
 
  Please tell me how this example contradicts my reading of 10.1.2
  that the expressions 
  'names<-'(x, 'foo')
  and
  names(x) <- 'foo'
  once they are parsed, produce exactly the same parse tree and that
  it becomes impossible to tell from the parse tree whether
  originally the infix syntax or the prefix syntax was used.  
 
 because if they produced the same parse tree, you would either have to
 have the same result in both cases (because the same parse tree

Re: [Rd] surprising behaviour of names-

2009-03-11 Thread Simon Urbanek


Wacek,

Peter gave you a full answer explaining it very well. If you really
want to be able to trace each instance yourself, you have to learn far
more about R internals than you apparently know (and Peter hinted at
that). Internally x=1 and x=c(1) are slightly different in that the
former has NAMED(x) = 2 whereas the latter has NAMED(x) = 0 which is
what causes the difference in behavior as Peter explained. The reason
is that c(1) creates a copy of the 1 (which is a constant [=immutable]
thus requiring a copy) and the new copy has no other references and
thus can be modified and hence NAMED(x) = 0.


Cheers,
Simon


On Mar 10, 2009, at 18:16 , Wacek Kusnierczyk wrote:

i got an offline response saying that my original post may have not been
clear as to what the problem was, essentially, and that i may need to
restate it in words, in addition to code.

the problem is:  the performance of 'names<-' is incoherent, in that in
some situations it acts in a functional manner, producing a copy of its
argument with the names changed, while in others it changes the object
in-place (and returns it), without copying first.  your explanation
below is of course valid, but does not seem to address the issue.  in
the examples below, there is always (or so it seems) just one reference
to the object.

why are the following functional:

   x = 1;  'names<-'(x, 'foo'); names(x)
   x = 'foo'; 'names<-'(x, 'foo');  names(x)

while these are destructive:

   x = c(1);  'names<-'(x, 'foo'); names(x)
   x = c('foo'); 'names<-'(x, 'foo');  names(x)

it is claimed that in r a singular value is a one-element vector, and
indeed,

   identical(1, c(1))
   # TRUE
   all.equal(is(1), is(c(1)))
   # TRUE

i also do not understand the difference here:

   x = c(1); 'names<-'(x, 'foo'); names(x)
   # foo
   x = c(1); names(x); 'names<-'(x, 'foo'); names(x)
   # foo
   x = c(1); print(x); 'names<-'(x, 'foo'); names(x)
   # NULL
   x = c(1); print(c(x)); 'names<-'(x, 'foo'); names(x)
   # foo

does print, but not names, increase the reference count for x when
applied to x, but not to c(x)?

if the issue is that there is, in those examples where x is left
unchanged, an additional reference to x that causes the value of x to be
copied, could you please explain how and when this additional reference
is created?


thanks,
vQ




Peter Dalgaard wrote:



is there something i misunderstand here?



Only the ideology/pragmatism... In principle, R has call-by-value
semantics and a function does not destructively modify its arguments(*),
and foo(x) <- bar behaves like x <- "foo<-"(x, bar). HOWEVER, this has
obvious performance repercussions (think x <- rnorm(1e7); x[1] <- 0), so
we do allow destructive modification by replacement functions, PROVIDED
that the x is not used by anything else. On the least suspicion that
something else is using the object, a copy of x is made before the
modification.

So

(A) you should not use code like y <- "foo<-"(x, bar)

because

(B) you cannot (easily) predict whether or not x will be modified
destructively


(*) unless you mess with match.call() or substitute() and the like. But
that's a different story.
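
The equivalence described above can be tried with a user-defined replacement function.  This is a minimal sketch; `second<-` is a hypothetical function written for illustration, and because it is an ordinary closure it always works on its own copy of the argument:

```r
# Hypothetical replacement function: sets the second element of a vector.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

# Assignment form: the evaluator rebinds x to the function's result.
x <- c(1, 2, 3)
second(x) <- 10

# Explicit form, rebinding the same variable to the result.
y <- c(1, 2, 3)
y <- `second<-`(y, value = 10)

identical(x, y)   # TRUE
x                 # 1 10 3
```

Both forms rebind the variable that was passed in, which is the pattern point (A) recommends over assigning the result to a different variable.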






--
---
Wacek Kusnierczyk, MD PhD

Email: w...@idi.ntnu.no
Phone: +47 73591875, +47 72574609

Department of Computer and Information Science (IDI)
Faculty of Information Technology, Mathematics and Electrical  
Engineering (IME)

Norwegian University of Science and Technology (NTNU)
Sem Saelands vei 7, 7491 Trondheim, Norway
Room itv303

Bioinformatics  Gene Regulation Group
Department of Cancer Research and Molecular Medicine (IKM)
Faculty of Medicine (DMF)
Norwegian University of Science and Technology (NTNU)
Laboratory Center, Erling Skjalgsons gt. 1, 7030 Trondheim, Norway
Room 231.05.060






Re: [Rd] surprising behaviour of names-

2009-03-11 Thread Simon Urbanek



On Mar 11, 2009, at 10:52 , Simon Urbanek wrote:


Wacek,

Peter gave you a full answer explaining it very well. If you really
want to be able to trace each instance yourself, you have to learn
far more about R internals than you apparently know (and Peter
hinted at that). Internally x=1 and x=c(1) are slightly different in
that the former has NAMED(x) = 2 whereas the latter has NAMED(x) = 0
which is what causes the difference in behavior as Peter explained.
The reason is that c(1) creates a copy of the 1 (which is a constant
[=immutable] thus requiring a copy) and the new copy has no other
references and thus can be modified and hence NAMED(x) = 0.




Errata: to be precise replace NAMED(x) = 0 with NAMED(x) = 1 above --  
since NAMED(c(1)) = 0 and once it's assigned to x it becomes NAMED(x)  
= 1 -- this is just a detail on how things work with assignment, the  
explanation above is still correct since duplication happens  
conditional on NAMED == 2.


Cheers,
Simon



On Mar 10, 2009, at 18:16 , Wacek Kusnierczyk wrote:

i got an offline response saying that my original post may have not been
clear as to what the problem was, essentially, and that i may need to
restate it in words, in addition to code.

the problem is:  the performance of 'names<-' is incoherent, in that in
some situations it acts in a functional manner, producing a copy of its
argument with the names changed, while in others it changes the object
in-place (and returns it), without copying first.  your explanation
below is of course valid, but does not seem to address the issue.  in
the examples below, there is always (or so it seems) just one reference
to the object.

why are the following functional:

  x = 1;  'names<-'(x, 'foo'); names(x)
  x = 'foo'; 'names<-'(x, 'foo');  names(x)

while these are destructive:

  x = c(1);  'names<-'(x, 'foo'); names(x)
  x = c('foo'); 'names<-'(x, 'foo');  names(x)

it is claimed that in r a singular value is a one-element vector, and
indeed,

  identical(1, c(1))
  # TRUE
  all.equal(is(1), is(c(1)))
  # TRUE

i also do not understand the difference here:

  x = c(1); 'names<-'(x, 'foo'); names(x)
  # foo
  x = c(1); names(x); 'names<-'(x, 'foo'); names(x)
  # foo
  x = c(1); print(x); 'names<-'(x, 'foo'); names(x)
  # NULL
  x = c(1); print(c(x)); 'names<-'(x, 'foo'); names(x)
  # foo

does print, but not names, increase the reference count for x when
applied to x, but not to c(x)?

if the issue is that there is, in those examples where x is left
unchanged, an additional reference to x that causes the value of x to be
copied, could you please explain how and when this additional reference
is created?


thanks,
vQ




Peter Dalgaard wrote:



is there something i misunderstand here?



Only the ideology/pragmatism... In principle, R has call-by-value
semantics and a function does not destructively modify its arguments(*),
and foo(x)<-bar behaves like x <- "foo<-"(x, bar). HOWEVER, this has
obvious performance repercussions (think x <- rnorm(1e7); x[1] <- 0), so
we do allow destructive modification by replacement functions, PROVIDED
that the x is not used by anything else. On the least suspicion that
something else is using the object, a copy of x is made before the
modification.

So

(A) you should not use code like y <- "foo<-"(x, bar)

because

(B) you cannot (easily) predict whether or not x will be modified
destructively


(*) unless you mess with match.call() or substitute() and the like. But
that's a different story.






--
---
Wacek Kusnierczyk, MD PhD

Email: w...@idi.ntnu.no
Phone: +47 73591875, +47 72574609

Department of Computer and Information Science (IDI)
Faculty of Information Technology, Mathematics and Electrical Engineering (IME)
Norwegian University of Science and Technology (NTNU)
Sem Saelands vei 7, 7491 Trondheim, Norway
Room itv303

Bioinformatics & Gene Regulation Group
Department of Cancer Research and Molecular Medicine (IKM)
Faculty of Medicine (DMF)
Norwegian University of Science and Technology (NTNU)
Laboratory Center, Erling Skjalgsons gt. 1, 7030 Trondheim, Norway
Room 231.05.060

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel





Re: [Rd] surprising behaviour of names<-

2009-03-11 Thread Wacek Kusnierczyk

Simon Urbanek wrote:
 Wacek,

 Peter gave you a full answer explaining it very well. If you really
 want to be able to trace each instance yourself, you have to learn far
 more about R internals than you apparently know (and Peter hinted at
 that). Internally x=1 and x=c(1) are slightly different in that the
 former has NAMED(x) = 2 whereas the latter has NAMED(x) = 0 which is
 what causes the difference in behavior as Peter explained. The reason
 is that c(1) creates a copy of the 1 (which is a constant [=immutable]
 thus requiring a copy) and the new copy has no other references and
 thus can be modified and hence NAMED(x) = 0.


simon, thanks for the explanation, it's now as clear as i might expect.

now i'm concerned with what you say:  that to understand something
visible to the user one needs to learn far more about R internals than
one apparently knows.  your response suggests that to use r without
confusion one needs to know the internals, and this would be a really
bad thing to say..  i have long been concerned with that r unnecessarily
exposes users to its internals, and here's one more example of how the
interface fails to hide the guts.  (and peter did not give me a full
answer, but a vague hint.)

vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] surprising behaviour of names<-

2009-03-11 Thread Wacek Kusnierczyk

Simon Urbanek wrote:

 On Mar 11, 2009, at 10:52 , Simon Urbanek wrote:

 Wacek,

 Peter gave you a full answer explaining it very well. If you really
 want to be able to trace each instance yourself, you have to learn
 far more about R internals than you apparently know (and Peter hinted
 at that). Internally x=1 and x=c(1) are slightly different in that the
 former has NAMED(x) = 2 whereas the latter has NAMED(x) = 0 which is
 what causes the difference in behavior as Peter explained. The reason
 is that c(1) creates a copy of the 1 (which is a constant
 [=immutable] thus requiring a copy) and the new copy has no other
 references and thus can be modified and hence NAMED(x) = 0.


 Errata: to be precise replace NAMED(x) = 0 with NAMED(x) = 1 above --
 since NAMED(c(1)) = 0 and once it's assigned to x it becomes NAMED(x)
 = 1 -- this is just a detail on how things work with assignment, the
 explanation above is still correct since duplication happens
 conditional on NAMED == 2.

i guess this is what every user needs to know to understand the
behaviour one can observe on the surface?  thanks for further
clarifications.

vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] surprising behaviour of names<-

2009-03-10 Thread Wacek Kusnierczyk

playing with 'names<-', i observed the following:

x = 1
names(x)
# NULL
'names<-'(x, 'foo')
# c(foo=1)
names(x)
# NULL

where 'names<-' has a functional flavour (does not change x), but:

x = 1:2
names(x)
# NULL
'names<-'(x, 'foo')
# c(foo=1, 2)
names(x)
# foo NA

where 'names<-' seems to perform a side effect on x (destructively
modifies x).  furthermore:

x = c(foo=1)
names(x)
# foo
'names<-'(x, NULL)
names(x)
# NULL
'names<-'(x, 'bar')
names(x)
# bar !!!

x = c(foo=1)
names(x)
# foo
'names<-'(x, 'bar')
names(x)
# bar !!!

where 'names<-' is not only able to destructively remove names from x,
but also destructively add or modify them (quite unlike in the first
example above).

analogous code but using 'dimnames<-' on a matrix performs a side effect
on the matrix even if it initially does not have dimnames:

x = matrix(1,1,1)
dimnames(x)
# NULL
'dimnames<-'(x, list('foo', 'bar'))
dimnames(x)
# list(foo, bar)

this is incoherent with the first example above, in that in both cases
the structure initially has no names or dimnames attribute, but the end
result is different in the two examples.

is there something i misunderstand here?


there is another, minor issue with names:

'names<-'(1, c('foo', 'bar'))
# error: 'names' attribute [2] must be the same length as the vector [1]

'names<-'(1:2, 'foo')
# no error

since ?names says that "If 'value' is shorter than 'x', it is extended
by character 'NA's to the length of 'x'" (where x is the vector and
value is the names vector), the error message above should say that the
names attribute must be *at most*, not *exactly*, of the length of the
vector.
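
the two length cases can be sketched with the ordinary replacement syntax, in R. the variable names are made up for illustration, and the error is captured with tryCatch() so that its text can be looked at without assuming the exact wording:

```r
# Shorter 'value': the missing names are filled in with NA, no error.
x <- 1:2
names(x) <- "foo"
short_names <- names(x)

# Longer 'value': an error is signalled; keep its message as a string.
y <- 1
msg <- tryCatch({
  names(y) <- c("foo", "bar")
  NA_character_                      # only reached if no error occurs
}, error = function(e) conditionMessage(e))

short_names
msg
```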

regards,
vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] surprising behaviour of names<-

2009-03-10 Thread Peter Dalgaard

Wacek Kusnierczyk wrote:
 playing with 'names<-', i observed the following:
 
 x = 1
 names(x)
 # NULL
 'names<-'(x, 'foo')
 # c(foo=1)
 names(x)
 # NULL
 
 where 'names<-' has a functional flavour (does not change x), but:
 
 x = 1:2
 names(x)
 # NULL
 'names<-'(x, 'foo')
 # c(foo=1, 2)
 names(x)
 # foo NA
 
 where 'names<-' seems to perform a side effect on x (destructively
 modifies x).  furthermore:
 
 x = c(foo=1)
 names(x)
 # foo
 'names<-'(x, NULL)
 names(x)
 # NULL
 'names<-'(x, 'bar')
 names(x)
 # bar !!!
 
 x = c(foo=1)
 names(x)
 # foo
 'names<-'(x, 'bar')
 names(x)
 # bar !!!
 
 where 'names<-' is not only able to destructively remove names from x,
 but also destructively add or modify them (quite unlike in the first
 example above).
 
 analogous code but using 'dimnames<-' on a matrix performs a side effect
 on the matrix even if it initially does not have dimnames:
 
 x = matrix(1,1,1)
 dimnames(x)
 # NULL
 'dimnames<-'(x, list('foo', 'bar'))
 dimnames(x)
 # list(foo, bar)
 
 this is incoherent with the first example above, in that in both cases
 the structure initially has no names or dimnames attribute, but the end
 result is different in the two examples.
 
 is there something i misunderstand here?

Only the ideology/pragmatism... In principle, R has call-by-value
semantics and a function does not destructively modify its arguments(*),
and foo(x)<-bar behaves like x <- "foo<-"(x, bar). HOWEVER, this has
obvious performance repercussions (think x <- rnorm(1e7); x[1] <- 0), so
we do allow destructive modification by replacement functions, PROVIDED
that the x is not used by anything else. On the least suspicion that
something else is using the object, a copy of x is made before the
modification.

So

(A) you should not use code like y <- "foo<-"(x, bar)

because

(B) you cannot (easily) predict whether or not x will be modified
destructively


(*) unless you mess with match.call() or substitute() and the like. But
that's a different story.
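
Point (A) can be followed with either of two idioms, sketched here in R. The variable names are illustrative. Both are expected to leave x alone whatever its internal sharing state, because the replacement is only ever applied to a separate binding or to a function argument.

```r
x <- c(1, 2)

# Idiom 1: create the new binding first, then use the replacement form on it.
y <- x
names(y) <- c("a", "b")

# Idiom 2: setNames() wraps the same replacement inside a function call.
z <- stats::setNames(x, c("a", "b"))

names(x)  # x still has no names
```

Neither idiom relies on knowing whether the direct call would have modified x in place.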


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] surprising behaviour of names<-

2009-03-10 Thread Wacek Kusnierczyk

Peter Dalgaard wrote:
 Wacek Kusnierczyk wrote:
   
 playing with 'names<-', i observed the following:
 
 x = 1
 names(x)
 # NULL
 'names<-'(x, 'foo')
 # c(foo=1)
 names(x)
 # NULL

 where 'names<-' has a functional flavour (does not change x), but:

 x = 1:2
 names(x)
 # NULL
 'names<-'(x, 'foo')
 # c(foo=1, 2)
 names(x)
 # foo NA
 
 where 'names<-' seems to perform a side effect on x (destructively
 modifies x).  furthermore:

 x = c(foo=1)
 names(x)
 # foo
 'names<-'(x, NULL)
 names(x)
 # NULL
 'names<-'(x, 'bar')
 names(x)
 # bar !!!

 x = c(foo=1)
 names(x)
 # foo
 'names<-'(x, 'bar')
 names(x)
 # bar !!!

 where 'names<-' is not only able to destructively remove names from x,
 but also destructively add or modify them (quite unlike in the first
 example above).

 analogous code but using 'dimnames<-' on a matrix performs a side effect
 on the matrix even if it initially does not have dimnames:

 x = matrix(1,1,1)
 dimnames(x)
 # NULL
 'dimnames<-'(x, list('foo', 'bar'))
 dimnames(x)
 # list(foo, bar)

 this is incoherent with the first example above, in that in both cases
 the structure initially has no names or dimnames attribute, but the end
 result is different in the two examples.

 is there something i misunderstand here?
 

 Only the ideology/pragmatism... In principle, R has call-by-value
 semantics and a function does not destructively modify its arguments(*),
 and foo(x)<-bar behaves like x <- "foo<-"(x, bar). HOWEVER, this has
 obvious performance repercussions (think x <- rnorm(1e7); x[1] <- 0), so
 we do allow destructive modification by replacement functions, PROVIDED
 that the x is not used by anything else. On the least suspicion that
 something else is using the object, a copy of x is made before the
 modification.

 So

 (A) you should not use code like y <- "foo<-"(x, bar)

 because

 (B) you cannot (easily) predict whether or not x will be modified
 destructively

   

that's fine, thanks, but i must be terribly stupid as i do not see how
this explains the examples above.  where is the x used by something else
in the first example, so that 'names<-'(x, 'foo') does *not* modify x
destructively, while it does in the other cases?

i just can't see how your explanation fits the examples -- it probably
does, but i beg you show it explicitly.
thanks.

vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] surprising behaviour of names<-

2009-03-10 Thread Stavros Macrakis

 (B) you cannot (easily) predict whether or not x will be modified
 destructively

 that's fine, thanks, but i must be terribly stupid as i do not see how
 this explains the examples above.  where is the x used by something else
 in the first example, so that 'names<-'(x, 'foo') does *not* modify x
 destructively, while it does in the other cases?

 i just can't see how your explanation fits the examples -- it probably
 does, but i beg you show it explicitly.

I think the following shows what Peter was referring to:

In this case, there is only one pointer to the value of x:

> x <- c(1,2)
> "names<-"(x,"foo")
 foo <NA>
   1    2
> x
 foo <NA>
   1    2

In this case, there are two:

> x <- c(1,2)
> y <- x
> "names<-"(x,"foo")
 foo <NA>
   1    2
> x
[1] 1 2
> y
[1] 1 2

It seems as though `names<-` and the like cannot be treated as R
functions (which do not modify their arguments) but as special
internal routines which do sometimes modify their arguments.

  -s

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] surprising behaviour of names<-

2009-03-10 Thread Wacek Kusnierczyk

Stavros Macrakis wrote:
 (B) you cannot (easily) predict whether or not x will be modified
 destructively
   
 that's fine, thanks, but i must be terribly stupid as i do not see how
 this explains the examples above.  where is the x used by something else
 in the first example, so that 'names<-'(x, 'foo') does *not* modify x
 destructively, while it does in the other cases?

 i just can't see how your explanation fits the examples -- it probably
 does, but i beg you show it explicitly.
 

 I think the following shows what Peter was referring to:

 In this case, there is only one pointer to the value of x:

 > x <- c(1,2)
 > "names<-"(x,"foo")
  foo <NA>
    1    2
 > x
  foo <NA>
    1    2

 In this case, there are two:

 > x <- c(1,2)
 > y <- x
 > "names<-"(x,"foo")
  foo <NA>
    1    2
 > x
 [1] 1 2
 > y
 [1] 1 2
   

that is and was clear to me, but none of my examples was of the second
form, and hence i think peter's answer did not answer my question. 
what's the difference here:

x = 1
'names<-'(x, 'foo')
names(x)
# NULL

x = c(foo=1)
'names<-'(x, 'foo')
names(x)
# foo

certainly not something like what you show.   what's the difference here:

x = 1
'names<-'(x, 'foo')
names(x)
# NULL
  
x = 1:2
'names<-'(x, c('foo', 'bar'))
names(x)
# foo bar

certainly not something like what you show.

 It seems as though `names<-` and the like cannot be treated as R
 functions (which do not modify their arguments) but as special
 internal routines which do sometimes modify their arguments.
   

they seem to behave somewhat like macros:

'names<-'(a, b)

with the destructive 'names<-' is sort of replaced with

a = 'names<-'(a, b)

with a functional 'names<-'.  but this still does not explain the
incoherence above.  my problem was and is not that 'names<-' is not a
pure function, but that it sometimes is, sometimes is not, without any
obvious explanation.  that is, i suspect (not claim) that the behaviour
is not a design feature, but an incident.
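
the macro-like reading is close to how the R Language Definition describes replacement calls: names(a) <- b is evaluated roughly as an assignment of the result of the replacement function applied to a temporary copy of the binding. a minimal sketch in R, where tmp is a made-up stand-in for the internal `*tmp*` variable and the other names are illustrative:

```r
a <- c(1, 2)
b <- c("foo", "bar")

# Replacement syntax.
a1 <- a
names(a1) <- b

# Roughly the same steps written out by hand.
a2 <- a
tmp <- a2
a2 <- `names<-`(tmp, value = b)
rm(tmp)

identical(a1, a2)  # both routes give the same named vector
```

in both routes the result is bound back to a name, so whether the function worked in place or on a copy does not matter for the outcome.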

vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] surprising behaviour of names<-

2009-03-10 Thread Wacek Kusnierczyk

Peter Dalgaard wrote:

 (*) unless you mess with match.call() or substitute() and the like. But
 that's a different story.
   

different or not, it is a story that happens quite often -- too often,
perhaps -- to the degree that one may be tempted to say that the
semantics of argument passing in r is a mess. which of course is not
true, but since it is possible to mess with match.call & co, people
(including r core) do mess with them, and the result is obviously a
mess.  on top of the clear call-by-need semantics -- and on the surface,
you cannot tell how the arguments of a function will be taken (by
value?  by reference?  not at all?), which in effect looks like a messy
semantics.

vQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] surprising behaviour of names<-

2009-03-10 Thread Wacek Kusnierczyk

i got an offline response saying that my original post may have not been
clear as to what the problem was, essentially, and that i may need to
restate it in words, in addition to code.

the problem is:  the performance of 'names<-' is incoherent, in that in
some situations it acts in a functional manner, producing a copy of its
argument with the names changed, while in others it changes the object
in-place (and returns it), without copying first.  your explanation
below is of course valid, but does not seem to address the issue.  in
the examples below, there is always (or so it seems) just one reference
to the object.

why are the following functional:

x = 1;  'names<-'(x, 'foo'); names(x)
x = 'foo'; 'names<-'(x, 'foo');  names(x)

while these are destructive:

x = c(1);  'names<-'(x, 'foo'); names(x)
x = c('foo'); 'names<-'(x, 'foo');  names(x)

it is claimed that in r a singular value is a one-element vector, and
indeed,

identical(1, c(1))
# TRUE
all.equal(is(1), is(c(1)))
# TRUE

i also do not understand the difference here:

x = c(1); 'names<-'(x, 'foo'); names(x)
# foo
x = c(1); names(x); 'names<-'(x, 'foo'); names(x)
# foo
x = c(1); print(x); 'names<-'(x, 'foo'); names(x)
# NULL
x = c(1); print(c(x)); 'names<-'(x, 'foo'); names(x)
# foo

does print, but not names, increase the reference count for x when
applied to x, but not to c(x)?

if the issue is that there is, in those examples where x is left
unchanged, an additional reference to x that causes the value of x to be
copied, could you please explain how and when this additional reference
is created?


thanks,
vQ




Peter Dalgaard wrote:

 is there something i misunderstand here?
 

 Only the ideology/pragmatism... In principle, R has call-by-value
 semantics and a function does not destructively modify its arguments(*),
 and foo(x)<-bar behaves like x <- "foo<-"(x, bar). HOWEVER, this has
 obvious performance repercussions (think x <- rnorm(1e7); x[1] <- 0), so
 we do allow destructive modification by replacement functions, PROVIDED
 that the x is not used by anything else. On the least suspicion that
 something else is using the object, a copy of x is made before the
 modification.

 So

 (A) you should not use code like y <- "foo<-"(x, bar)

 because

 (B) you cannot (easily) predict whether or not x will be modified
 destructively

 
 (*) unless you mess with match.call() or substitute() and the like. But
 that's a different story.


   


-- 
---
Wacek Kusnierczyk, MD PhD

Email: w...@idi.ntnu.no
Phone: +47 73591875, +47 72574609

Department of Computer and Information Science (IDI)
Faculty of Information Technology, Mathematics and Electrical Engineering (IME)
Norwegian University of Science and Technology (NTNU)
Sem Saelands vei 7, 7491 Trondheim, Norway
Room itv303

Bioinformatics & Gene Regulation Group
Department of Cancer Research and Molecular Medicine (IKM)
Faculty of Medicine (DMF)
Norwegian University of Science and Technology (NTNU)
Laboratory Center, Erling Skjalgsons gt. 1, 7030 Trondheim, Norway
Room 231.05.060

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
