Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Gabor Grothendieck wrote:

 but I think R is stuck with what it has due to compatibility and the large
 base of users, yet it's still possible to add functions in packages or new
 functions to R, so a new variant of subset would be possible, in which
 case one could decide to use the new function in place of the old one.
   

you're probably correct.

but then it might be worth asking whether carrying on with a misdesign for
backward compatibility outweighs the guaranteed crashes in future users'
programs, which result in confused complaints, the need for responses
suggesting hacks to bypass the design, and possibly incorrect results
published because r is likely to do everything but what the user expects.

r suffers from poor decisions made early on, but that in itself is not
a good reason to carry on with them.

vQ

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Gabor Grothendieck wrote:

 Regarding the convenience it occurs in expressions like this:

iris2 <- subset(iris, select = - Species)

 to create a data frame without the Species column.
   

aha!  so what's your best guess about the result here:

d = data.frame(a = 1)
d$`-b` = 2
names(d)
# here we go

subset(d, select = -b)
# to b or not to b?

b = "a"
subset(d, select = -b)
# tragedy

d$b = 3
subset(d, select = -b)
# catharsis

(for whatever reason a user may choose to have a column named '-b')
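A minimal sketch of what `select = -b` evaluates to in each of the three cases above. It assumes, as the rest of the thread implies, that `b` holds the string "a", and that `subset.data.frame()` binds each column name to its column position before evaluating `select`, looking up any other name in the caller's environment (the real code can be printed with `getS3method("subset", "data.frame")`). `resolve_select()` is a hypothetical helper that reproduces only that lookup step:

```r
# resolve_select() is hypothetical, not part of base R: it mimics how
# subset.data.frame() evaluates 'select' (column names bound to positions,
# other names looked up in the caller's environment).
resolve_select <- function(x, select) {
  nl <- as.list(seq_along(x))   # one entry per column: 1, 2, ...
  names(nl) <- names(x)         # ... named after the columns
  eval(substitute(select), nl, parent.frame())
}

d <- data.frame(a = 1)
d$`-b` <- 2

# Case 1: 'b' is neither a column nor a variable, so the lookup fails
case1 <- tryCatch(resolve_select(d, -b), error = conditionMessage)

# Case 2: 'b' is a global character variable, and -"a" is not numeric
b <- "a"
case2 <- tryCatch(resolve_select(d, -b), error = conditionMessage)

# Case 3: once 'b' is also a column, the column shadows the global variable
d$b <- 3
case3 <- resolve_select(d, -b)   # minus the position of column 'b'

print(case1)
print(case2)
print(case3)
```

The same expression `-b` fails in one case and selects columns in another, purely because of which names `d` happens to contain.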


vQ



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Gavin Simpson
On Tue, 2008-11-11 at 09:49 +0100, Wacek Kusnierczyk wrote:
 Gabor Grothendieck wrote:
 
  Regarding the convenience it occurs in expressions like this:
 
 iris2 <- subset(iris, select = - Species)
 
  to create a data frame without the Species column.

 
 aha!  so what's your best guess about the result here:

I'm not sure I see too much of a problem here.

 
 d = data.frame(a = 1)
 d$`-b` = 2
 names(d)
 # here we go
 
 subset(d, select = -b)
 # to b or not to b?

but -b is not the name of the column; you explicitly called it `-b` and
you should refer to it as such. If you use non-standard names then
expect to do a bit more work.

 subset(d, select = `-b`)
  -b
1  2
 subset(d, select = - `-b`)
  a
1 1

 
 b = "a"
 subset(d, select = -b)
 # tragedy

For this, I interpret it as not finding a column named b, so it tries to
evaluate:

 b = "a"
 `-`(b)
Error in -b : invalid argument to unary operator

`-` is a function remember.
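Gavin's point that `-` is an ordinary function can be checked directly. This sketch assumes, as above, that `b` holds the string "a"; the operator form and the function-call form are the same call, and both reject a character argument:

```r
x <- 3
op_form <- -x        # operator syntax
fn_form <- `-`(x)    # the same call written as a function application

b <- "a"
msg <- tryCatch(`-`(b), error = conditionMessage)
print(msg)   # the "invalid argument to unary operator" error text
```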

If you want this to work you can use get()

 subset(d, select = - get(b))
  -b
1  2
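Why `- get(b)` works where `-b` fails: by default `get()` searches the environment it is called from, and inside `subset()` that is the environment built from the column names, so `get("a")` returns the position of column `a`. In this sketch the list `nl` is a hand-built stand-in for that column-name-to-position mapping (an assumption mirroring what `subset.data.frame()` does):

```r
# Stand-in for the column-name -> column-position list that subset() builds
nl <- list(a = 1L, `-b` = 2L)
b <- "a"   # global variable holding a column name

# 'b' is not in nl, so it is found in the enclosing (global) environment: "a".
# get("a") then searches the evaluation environment first and finds a = 1L.
sel <- eval(quote(-get(b)), nl, globalenv())
print(sel)   # -1, i.e. "drop column 1"

d <- data.frame(a = 1)
d$`-b` <- 2
print(subset(d, select = -get(b)))   # only the `-b` column remains
```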

 
 d$b = 3
 subset(d, select = -b)
 # catharsis
 
 (for whatever reason a user may choose to have a column named '-b')

Yes, but the user is warned about not using standard naming conventions
in the Introduction to R manual. You aren't stopped from using names
like `-b` but if you use them, you have to expect to work a little
harder.

Reading ?subset we have:

  select: expression, indicating columns to select from a data frame.



 For data frames, the 'subset' argument works on the rows.  Note
 that 'subset' will be evaluated in the data frame, so columns can
 be referred to (by name) as variables in the expression (see the
 examples).

which I think is reasonably explicit, is it not? It explains why your
second example fails and why '- get(b)' doesn't, and also why your other
examples don't give you what you want. You aren't using the appropriate
'name'.

I'm sure we could all find aspects of R that don't work in exactly the
way we might preconceive or think of as being intuitive. But if it works
as documented then I don't see what the problem is unless i) you are
offering to rewrite the code to make it work better, ii) R Core
thinks any proposal works better and iii) in doing so it doesn't break
most of the R code out there in R itself or in add-on packages.

G

 
 
 vQ
 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:
 On Tue, 11 Nov 2008 09:27:41 +0100
 Wacek Kusnierczyk [EMAIL PROTECTED] wrote:

   
 but then it might be worth asking whether carrying on with misdesign
 for backward compatibility outbalances guaranteed crashes in future
 users' programs, [...]
 

 Why is it worth asking this if nobody else asks it?  


i guess most of the people who do ask questions here care little about r
itself, they just want it to solve a problem, even if it involves
hacking the language.

those outside the r team who care about language design have probably
left the list long ago, if they were ever subscribed.  the fact that
only i am asking proves nothing statistically.  i do talk to people, and know
many who'd ask, but they just don't care, because they have already
given up on r.  instead of discouraging me, make use of the fact that i care to ask.

vQ



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Gavin Simpson wrote:

 d = data.frame(a = 1)
 d$`-b` = 2
 names(d)
 # here we go

 subset(d, select = -b)
 # to b or not to b?
 

 but -b is not the name of the column; you explicitly called it `-b` and
 you should refer to it as such. If you use non-standard names then
 expect to do a bit more work.
   
identical(names(d)[2], "-b")

if i do

d$`c` = 4

then you claim d has no column named 'c'?  do i have to refer to the c
column as `c`?


   
 subset(d, select = `-b`)
 
   -b
 1  2
   

... and i have to use

subset(d, select = `a`)

and not

subset(d, select = a)

right?  besides, subset(d, select = `-b`) should rather return the
column(s) whose names are the value of the variable `-b`:

`-a` = "a"
subset(d, select = `-a`)
# returns all columns except for the one named 'a', rather than the
# column named '-a' -- but that's just because there is no such column in
# d;  if there were, this one would be returned.

so even with backquotes used, there is no obvious interpretation of what
select=`-b` should mean, because it depends on what names the components of
the first argument have.  and this breaks the concept of referential
transparency.

so the problem is not so easily explained away.  what subset does *is*
messy.


 subset(d, select = - `-b`)
 
   a
 1 1

   
 b = a
 subset(d, select = -b)
 # tragedy
 

 For this, I interpret it as not finding a column named b so tries to
 evaluate:

   

you interpret it.  how obvious is this for most users?
it tries to find a column named 'b', not the column named by the value of b.  that's the
problem with subset.


 b = "a"
 `-`(b)
 
 Error in -b : invalid argument to unary operator

 `-` is a function remember.

 If you want this to work you can use get()

   
 subset(d, select = - get(b))
 
   -b
 1  2

   

use this hack to get around the design.

 d$b = 3
 subset(d, select = -b)
 # catharsis

 (for whatever reason a user may choose to have a column named '-b')
 

 Yes, but the user is warned about not using standard naming conventions
 in the Introduction to R manual. You aren't stopped from using names
 like `-b` but if you use them, you have to expect to work a little
 harder.
   

i'd like you to point me to that warning, as i apparently need to read
it, but i haven't found it in the manual yet.  thanks.

 Reading ?subset we have:

   select: expression, indicating columns to select from a data frame.

 

  For data frames, the 'subset' argument works on the rows.  Note
  that 'subset' will be evaluated in the data frame, so columns can
  be referred to (by name) as variables in the expression (see the
  examples).

 which I think is reasonably explicit is it not? 

about?  it says nothing about how the expression passed as the select
argument is treated.  it just says that the select argument is an
expression indicating columns (but how?), and then, in the middle of
explaining the subset parameter, it mentions that columns can be
referred to by name as variables in the expression.  how clear is this?

the following does not work -- i'd expect it to, by virtue of the clear
explanation:

d = data.frame(a=1, b=2)
subset(d, select=c(a, b))
# what??  it does not break any 'specification' given in the docs


 It explains why your
 second example fails and why '- get(b)' doesn't, and also why your other
 examples don't give you what you want. You aren't using the appropriate
 'name'.
   
that's still too confusing.  ?get:

get(x, ...)

x: a variable name (given as a character string)

so:

get("b")
# "a", because we get the variable b, whose value is "a"

get(b)
# variable "a" not found

in '-get(b)', get(b) should evaluate to the value of the variable named
in b; b is "a", so get should look up the value of the variable a, but
there is none (unless you defined it), so this should break.  instead,
'get(b)' is replaced with 'a', and '-a' in subset(d, select=-a) is not
treated as an application of the function `-` to the variable a, but
literally as the specification 'all but the column named "a"'.

it must be painfully obvious to a casual user.


 I'm sure we could all find aspects of R that don't work in exactly the
 way we might preconceive or think of as being intuitive. 

most of it, seems like.

 But if it works
 as documented 

in many cases, the documentation is insufficient, confusing, and
unhelpful when it comes to this sort of what you might call 'optimizations'.

 then I don't see what the problem is unless i) you are
 offering to rewrite the code to make it work better, ii) that R Core
 thinks any proposal works better and iii) in doing so it doesn't break
 most of the R code out there in R itself or in add-on packages.
   

i'd prefer r to work better rather than work better.  i'm afraid that
serious improvements to r must, by necessity, break quite a lot of
earlier code, which exploits such design, if only because it cannot avoid
doing so.

it certainly is a good idea to offer to contribute and i'd be happy to
do so, but i wouldn't be given a 

Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Gavin Simpson wrote:
 On Tue, 2008-11-11 at 11:08 +0100, Wacek Kusnierczyk wrote:
   
 Gavin Simpson wrote:
 
 d = data.frame(a = 1)
 d$`-b` = 2
 names(d)
 # here we go

 subset(d, select = -b)
 # to b or not to b?
 
 
 but -b is not the name of the column; you explicitly called it `-b` and
 you should refer to it as such. If you use non-standard names then
 expect to do a bit more work.
   
   
 identical(names(d)[2], -b)

 if i do

 d$`c` = 4

 then you claim d has no column named 'c'?
 

 No, where do you get that from?
   
by simple analogy to the above, just read your own comments.  if you're
suggesting one should not expect this bit to be consistent, it would be
just another example of messy semantics.

   
   do i have to refer to the c
 column as `c`?
 

 No, but then c is a name that doesn't need to be quoted. -b is a name
 that needs to be quoted and if you quote it, things work as you might
 expect.
   

not necessarily, as one of my examples showed:  again, the result of
subset(d, select=`-b`) will depend on whether d has a column named '-b',
and if it doesn't, on whether there is a variable called '-b' that is a
character vector.  there is no way out of this issue, backquoting is no
solution.  no further comment.

   
 
   
   
 subset(d, select = `-b`)
 
 
   -b
 1  2
   
   
 ... and i have to use

 subset(d, select = `a`)

 and not

 subset(d, select = a)

 right?
 

 Is a a name in d? You can quote it if you want but it doesn't need to
 be quoted, so you can use either.
   

you see, you need to know whether 'a' is a name in d to know what
subset(d, select=a) would do.  no further comment.
   
   besides, subset(d, select = `-b`) should rather return the
 column(s) whose names are the value of the variable `-b`:

 `-a` = "a"
 subset(d, select = `-a`)
 # returns all columns except for the one named 'a', rather than the
 # column named '-a' -- but that's just because there is no such column in
 # d;  if there were, this one would be returned.
 

 No, it returns a if you are following on from your original examples.
 `-a` refers to a variable (object) and that evaluates to a and a is
 component of d so is returned.
   
you're right here, but the problem remains: subset(d, select=`-a`) will
treat `-a` as a column name or as a name of a variable with a vector of
column names, depending on what's in the data.  no further comment.


   
 so even with backquotes used, there is no obvious interpretation of what
 select=`-b` should mean, because it depends on what names the components of
 the first argument have.  and this breaks the concept of referential
 transparency.

 so the problem is not so easily explained away.  what subset does *is*
 messy.
 

 In your opinion.
   

yes, but not only mine.  perhaps some more r users will want to support
this claim; just wait.

 And without wanting to be rude or anything, your opinion carries very
 little weight in a project like R. You've arrived on the list and been
 very critical of the work of others. Now there is nothing wrong with
 being critical if it is constructive, and additionally with something
 like R you need to be constructive *and* contribute back. I'm not saying
 that if you did patch R to work the way you think is correct R Core will
 accept them as they need to maintain backwards compatibility and with S
 and not annoy the hundreds of package authors. But coming on here and
 criticising the work of others isn't going to win you many friends.
   

that's really sad.  you're saying no one should ever criticize r without
reading the source code.  you are *really* not interested in feedback. 
note, problems with the *design*, as opposed to the implementation, are not
fixed by sending a patch.  you have a serious misconception here.

if i buy a tv, and read the quick guide, and start using it, and push
buttons, and suddenly get an electric shock, and complain to the
manufacturer, and they say i should have carefully read the 2K-page
manual because it says there i can get high voltage on my fingers while
pushing the buttons, and it's my fault, and if i want to complain i
should first study the schematics --- what??  they're just crazy, no?

 Also, subset (and the other things you've been harping on about) work as
 documented. So you kind of have to like it or lump it.
   

we've just gone through the docs, and it's *you* who thinks it's so
beautifully clear from the docs what subset does.  i lump it.

   
 
 subset(d, select = - `-b`)
 
 
   a
 1 1

   
   
 b = "a"
 subset(d, select = -b)
 # tragedy
 
 
 For this, I interpret it as not finding a column named b so tries to
 evaluate:

   
   
 you interpret it.  how obvious is this for most users?
 it tries to find a column named 'b', not a column named b.  that's the
 problem with subset.
 

 If users read the documentation then they'd know about unary operators.
   

if you read the reference you'd know touching this 

Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Duncan Murdoch wrote:
 On 11/11/2008 8:53 AM, hadley wickham wrote:
 On Mon, Nov 10, 2008 at 1:04 PM, Wacek Kusnierczyk
 [EMAIL PROTECTED] wrote:
 pardon me, but does this address in any way the legitimate complaint of
 the rightfully confused user?

 consider the following:

 d = data.frame(a=1, b=2)
 a = c("a", "b")
 z = a
 # that is, both a and z are c("a", "b")

 subset(d, select=z)
 # gives two columns, since z is a two element vector whose elements are
 # valid column names

 subset(d, select=a)
 # gives one column, since the name 'a' (not the value of a) is a valid column name

 subset(d, select=c(a,b))
 # gives two columns
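These three calls can be checked with the real `subset()`; `ncol()` simply counts the columns returned. Inside `select`, the column `a` shadows the global vector `a`, while `z`, which is not a column, falls through to the global value:

```r
d <- data.frame(a = 1, b = 2)
a <- c("a", "b")   # global character vector of column names
z <- a             # same value, different name

n_z <- ncol(subset(d, select = z))        # z is not a column: global vector used
n_a <- ncol(subset(d, select = a))        # a is a column: evaluates to position 1
n_c <- ncol(subset(d, select = c(a, b)))  # column positions 1 and 2
c(n_z, n_a, n_c)                          # 2 1 2
```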


 this is certainly what the authors intended, and they may have good
 grounds for this smart design.  but this must break the expectation of a
 naive (r-naive, for that matter) user, who may otherwise have excellent
 experience in using a functional programming language, e.g., scheme.
 (especially scheme, where symbols and expressions are first-class
 objects, yet the distinction between a symbol or an expression and their
 referent is made painfully clear, perhaps except for when one hacks with
 macros.)

 the examples above illustrate the notorious problem with r that one can
 never tell whether 'a' means the value referred to with the identifier
 'a' or the symbol 'a', unless one gets ugly surprises and is forced
 to study the documentation.  and even then one may not get a clear
 answer.

 I agree, with some caveats.  There are basically two uses of R: as an
 interactive data analysis package and as a statistical programming
 language.  These uses come into conflict: in the interactive
 environment, you want to minimise typing so that you can be as speedy
 as possible.  It doesn't matter if R occasionally makes a wrong guess
 when you have specified something implicitly, because you can fix it
 on the fly.  When you are programming, you care less about saving
 typing and more about reproducibility.  You want to be explicit so
 your function is robust to widely varying inputs, even if it means you
 have to type a lot more.  You see this tension in quite a few places:

  * drop = T
  * functions that return different types of output (e.g. sapply)
 depending on input parameters
  * partial matching of argument names
  * using unevaluated expressions instead of strings (e.g. library,
 subset, ...)

 These are all things that are helpful for interactive use, but make
 life as a programmer more difficult.  I find the last one particularly
 frustrating because it means it is very difficult to program with some
 functions (i.e. subset) without resorting to complex quoting,
 substituting and evaluating tricks.  I have tried to steer away from
 this technique in my packages, and where it's just too convenient for
 interactive use, insulating the deparsing into special functions that
 the data analyst must use (e.g. aes() in ggplot, and .() in plyr),
 along with providing alternatives for the programmer.
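The "quoting, substituting and evaluating tricks" needed to drive `subset()` from inside a function look roughly like this. `drop_column()` is a hypothetical wrapper, sketched under the assumption that the column name arrives as a character string; `as.name()` turns it into a symbol that `substitute()` splices into the call before it is evaluated:

```r
# drop_column() is hypothetical: it builds subset(df, select = -<col>)
# with <col> inserted as a symbol, then evaluates that call.
drop_column <- function(df, col) {
  cl <- substitute(subset(df, select = -COL), list(COL = as.name(col)))
  eval(cl)   # evaluated in this function's frame, where 'df' is defined
}

d <- data.frame(a = 1, b = 2)
d$`-b` <- 3
names(drop_column(d, "b"))    # "a" "-b"
names(drop_column(d, "-b"))   # "a" "b"
```

Because the name is spliced in as a symbol, a non-syntactic name such as `-b` needs no extra backquoting by the caller.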

 I don't understand why you're getting so much push-back on this issue.
  R is a fantastic language, but it has some genuinely nasty corners.
 In my opinion, this is one of them.

 I think your analysis is correct, that the goals of casual use and
 programming are inconsistent.  But in general I think there's always
 going to be support for providing alternative ways that are
 programmer-safe.

you know, in ipython you can write, e.g., m 1 instead of m(1) to call
the function m on the value 1.  but this is a syntactic shorthand which is
not valid in python, and you can see how it gets translated into python
when you try it.  so you can have your cake and eat it -- there is a
consistent (at least, much more consistent than in r) policy on the
syntax, and you can still have conveniences in the interactive interpreter.

r, on the other hand, prefers solutions such as the subset one, which
are the best recipe for confusion.  why doesn't the r team have a look
at what others are doing?  programming language design has progressed a
lot since the so often cited reference for r was written in 1988.

vQ



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread hadley wickham
 I think your analysis is correct, that the goals of casual use and
 programming are inconsistent.  But in general I think there's always going
 to be support for providing alternative ways that are programmer-safe.

 For instance, library(foo, character.only=TRUE) says that foo is a
 character vector, not the name of a package.  I don't know of anything that
 subset() provides that is not available in other ways (I think of it as
 purely a convenience function, and my first piece of advice to Karl was not
 to use it).
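Duncan's `library()` example spelled out: with `character.only = TRUE` the argument is evaluated as a character string, whereas `library(pkg)` on its own would look for a package literally named `pkg`. The base package `stats` is used here only because it ships with R:

```r
pkg <- "stats"
# character.only = TRUE: use the value of 'pkg', not the symbol itself
library(pkg, character.only = TRUE)
"package:stats" %in% search()   # TRUE
```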

Good points - every function optimised for interactive use should have
a companion that is optimised for programmatic use.

 However, if there really is something there, then it would be
 worthwhile pointing that out, and either modifying subset() to make it safe,
 or providing an alternative function.

When I teach subsetting I try to make this clear - using [ will always
work, there's no magic and everything is explicit.  subset() has more
magic which saves you typing, but occasionally the magic doesn't work
and you'll be left scratching your head as to why.  In my experience
students prefer subset() until they encounter strange behaviour that
they don't understand.
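For comparison, an explicit `[` version of "drop one column" takes the names as ordinary character data, so the result cannot depend on which variables happen to exist in the calling environment. A minimal sketch:

```r
d <- data.frame(a = 1, b = 2)
d$`-b` <- 3

keep <- setdiff(names(d), "-b")   # column names are plain strings here
d2 <- d[, keep, drop = FALSE]     # drop = FALSE keeps a data frame even for one column
names(d2)                         # "a" "b"
```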

 I think this tension is a fundamental part of the character of S and R.  But
 it is also fundamental to R that there are QC tests that apply to code in
 packages:  so writing new tests that detect dangerous usage (e.g. to
 disallow partial name matching) would be another way to improve reliability.
  Writing a test for misuse of drop=TRUE seems quite hard, but there are
 probably ways a debugger could do it:  e.g. to tag the invocation as to
 whether any indices were dropped on the first call, and then warn if the
 result isn't the same on every subsequent call.

A similar thing would be to force package authors to explicitly
specify na.rm to ensure that they have thought about how to deal with
missing values (this always trips me up).  Perhaps you could treat
drop similarly - in non-interactive code drop should not have a
default value.   Presumably this wouldn't be too hard to implement - R
CMD check would just switch out [ for a version that didn't have a
default value, in a similar way to what happens with T and F (another
example of implicit interactive use vs. explicit programmatic use)
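The `drop` behaviour discussed above in a small illustration: the same indexing expression yields a data frame or a bare vector depending on how many columns are selected, unless `drop = FALSE` is given explicitly:

```r
d <- data.frame(a = 1:3, b = 4:6)

two <- d[, c("a", "b")]            # two columns: still a data frame
one <- d[, "a"]                    # one column: default drop gives a vector
one_df <- d[, "a", drop = FALSE]   # explicit drop = FALSE keeps a data frame

c(class(two), class(one), class(one_df))   # "data.frame" "integer" "data.frame"
```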

 Conceivably Karl's problem could be detected in the same way:  tag each name
 in the expression as to whether it was found in the data frame or some other
 environment, and then warn if that tag ever changes.  Or maybe the test
 should just warn that subset() is a convenience function, not meant for
 programming.
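Duncan's idea of tagging each name by where it was found could be prototyped along these lines. `where_found()` is a hypothetical helper, not an existing R CMD check test; it only classifies the symbols that `all.vars()` extracts from a quoted expression:

```r
# where_found() is hypothetical: for each variable name in a quoted
# expression, report whether it would resolve to a column of 'data'
# or to something in the environment 'env'.
where_found <- function(expr, data, env = parent.frame()) {
  force(env)   # capture the caller's environment now
  vars <- all.vars(expr)
  vapply(vars, function(v) {
    if (v %in% names(data)) {
      "column"
    } else if (exists(v, envir = env)) {
      "environment"
    } else {
      "missing"
    }
  }, character(1))
}

d <- data.frame(a = 1)
b <- "a"
first <- where_found(quote(-b), d)    # b comes from the environment
d$b <- 3
second <- where_found(quote(-b), d)   # now the column 'b' shadows it
print(first)
print(second)
```

A check built on this idea could warn whenever the classification of a name changes between calls.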

It would be nice if the documentation was clearer on these issues.  I
can imagine every function having a numeric value associated with it
which gave its position on the interactive vs programming continuum.
Then you could sum up all the values in a function and warn the author
if it was too high.  Not very practical to implement though!

Hadley
-- 
http://had.co.nz/



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Gavin Simpson
On Tue, 2008-11-11 at 15:54 +0100, Wacek Kusnierczyk wrote:

  Have you tried? But bear in mind that R Core has more to balance than
  just whether you think a design flaw or infelicity etc should be fixed
  when it decides whether to accept patches.

 
 my whole post is an attempt, if you care to notice.
 
 vQ

Did you read what you wrote? And you still wonder why you get little
response from certain quarters?

1) Don't say "no further comment" - it is quite arrogant to think that
you are right and everyone who disagrees is wrong.

2) You are being critical of other people's work in a manner that is not
polite or respectful of the efforts of others.

There is nothing wrong with being critical - I never said there was -
but there is a right way to go about it and a wrong way.

Also, you have to consider where we are now with R and where we have
come from. Whilst it would, in an ideal world, be great to fix every
design flaw that you think is in R, there is too much inertia there now
to change some things, or it will take a lot of effort on the part of a
team of people who give that time for free. This has to be a
consideration alongside all the other considerations of good design,
improving the logic of how R works etc. You might not agree, but as long
as things are documented to work in a particular way then we might have
to live with them, unless a good case can be made to break existing code
and someone steps up to make the changes.

G



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Berwin A Turlach
G'day Duncan,

On Tue, 11 Nov 2008 09:37:57 -0500
Duncan Murdoch [EMAIL PROTECTED] wrote:

 I think this tension is a fundamental part of the character of S and
 R. But it is also fundamental to R that there are QC tests that apply
 to code in packages:  so writing new tests that detect dangerous
 usage (e.g. to disallow partial name matching) would be another way
 to improve reliability.  [...]

Please not. :)
After years of using R, it is now second nature to me to type (yes,
I always spell out from and to) 
seq(from=xx, to=yy, length=zz)
and I never understood why the full name of that argument had to be
length.out.  I would hate to see lots of warning messages because I am
using partial matching.
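Berwin's call relies on partial matching: `length` is an unambiguous prefix of the `length.out` argument of `seq()`, so both calls below give the same result. R already has an opt-in option, `warnPartialMatchArgs`, that warns when this happens; it is off by default:

```r
short <- seq(from = 1, to = 10, length = 4)       # 'length' partially matches 'length.out'
full  <- seq(from = 1, to = 10, length.out = 4)   # argument name spelled out
identical(short, full)                            # TRUE

# Uncommenting the next line makes R warn about partial argument matches:
# options(warnPartialMatchArgs = TRUE)
```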

Cheers,

Berwin



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk

Gavin Simpson wrote:

 I've found several of these discussions involving Wacek's questions very
 enlightening at times; once you get past the "it doesn't work as I
 expect so it's wrong" attitude.
   

just one fix:  my attitude is 'it doesn't work as i imagine an average
user would expect it to, so it's potentially confusing'.

vQ



-- 
---
Wacek Kusnierczyk, MD PhD

Email: [EMAIL PROTECTED]
Phone: +47 73591875, +47 72574609

Department of Computer and Information Science (IDI)
Faculty of Information Technology, Mathematics and Electrical Engineering (IME)
Norwegian University of Science and Technology (NTNU)
Sem Saelands vei 7, 7491 Trondheim, Norway
Room itv303

Bioinformatics  Gene Regulation Group
Department of Cancer Research and Molecular Medicine (IKM)
Faculty of Medicine (DMF)
Norwegian University of Science and Technology (NTNU)
Laboratory Center, Erling Skjalgsons gt. 1, 7030 Trondheim, Norway
Room 231.05.060



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Gavin Simpson wrote:

 my whole posting is an attempt, may you try to notice.

 vQ
 

 Did you read what you wrote. And you still wonder why you get little
 response from certain quarters?

 1) Don't say "no further comment" - it is quite arrogant to think that
 you are right and everyone who disagrees is wrong.
   

that meant 'i won't comment further on this, i give up'.  i thought for
a while about explaining this, but then i thought i might use the r
strategy -- let it be ambiguous.

 2) You are being critical of other people's work in a manner that is not
 polite or respectful of the efforts of others.
   

i can certainly agree that i don't pay attention to diplomacy.  my
favourite philosopher said 'you should seek friends, not truth';  i
betray him here, fortunately or not.  anyway, if you mean a post raising
serious issues should be ignored just because it is not polished enough,
let it be.  you gain peace, you lose feedback.

i can promise to make more effort to wrap the essence in a cake, and
drop unnecessary puns (you know, i have drop=FALSE by default, because
that's how many languages other than r behave), if this helps.

vQ



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Peter Dalgaard
Berwin A Turlach wrote:
 G'day Duncan,
 
 On Tue, 11 Nov 2008 09:37:57 -0500
 Duncan Murdoch [EMAIL PROTECTED] wrote:
 
 I think this tension is a fundamental part of the character of S and
 R. But it is also fundamental to R that there are QC tests that apply
 to code in packages:  so writing new tests that detect dangerous
 usage (e.g. to disallow partial name matching) would be another way
 to improve reliability.  [...]
 
 Please not. :)
 After years of using R, it is now second nature to me to type (yes,
 I always spell out from and to) 
   seq(from=xx, to=yy, length=zz)
 and I never understood why the full name of that argument had to be
 length.out.  I would hate to see lots of warning messages because I am
 using partial matching.

I think the story is this:

At some point in time, in a galaxy not too far away, and using one of
the R-like languages, calling the argument length gave you trouble
calling length(from) inside the function ("attempt to call non-function"
or some such error). Later, this issue was fixed so that function calls
would look for functions only, but by then, the name couldn't be changed
since some people had been writing it out in full.
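The fix Peter describes is visible in current R: when a name appears in call position, lookup skips bindings that are not functions. A small sketch with a hypothetical function `f` whose argument is named `length`:

```r
f <- function(length) {
  # 'length' is a vector argument here, but length(...) in call position
  # skips that non-function binding and finds base::length()
  length(length)
}
f(1:5)   # 5
```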

(There are a couple of other cases, one of them involving an argument
ending in a ., but I forgot what they are. I don't think there was
ever an along() function, so along.with escapes me.)

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread hadley wickham
On Mon, Nov 10, 2008 at 1:04 PM, Wacek Kusnierczyk
[EMAIL PROTECTED] wrote:
 pardon me, but does this address in any way the legitimate complaint of
 the rightfully confused user?

 consider the following:

 d = data.frame(a=1, b=2)
 a = c("a", "b")
 z = a
 # that is, both a and z are c("a", "b")

 subset(d, select=z)
 # gives two columns, since z is a two element vector whose elements are
 # valid column names

 subset(d, select=a)
 # gives one column, since 'a' (but not a) is a valid column name

 subset(d, select=c(a,b))
 # gives two columns


 this is certainly what the authors intended, and they may have good
 grounds for this smart design.  but this must break the expectation of a
 naive (r-naive, for that matter) user, who may otherwise have excellent
 experience in using a functional programming language, e.g., scheme.
 (especially scheme, where symbols and expressions are first-class
 objects, yet the distinction between a symbol or an expression and their
 referent is made painfully clear, perhaps except for when one hacks with
 macros.)

 the examples above illustrate the notorious problem with r that one can
 never tell whether 'a' means the value referred to with the identifier
 'a' or the symbol 'a', unless one gets ugly surprises and is forced
 to study the documentation.  and even then one may not get a clear answer.
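
Wacek's example can be run as below, with the double quotes that the archive dropped restored. As far as we can tell, `subset.data.frame()` evaluates `select` in a list that maps each column name to its position. Any other name is looked up in the calling environment. That explains both results. This is an illustrative sketch, not part of the original message.

```r
d <- data.frame(a = 1, b = 2)
a <- c("a", "b")   # a global variable that shares its name with a column
z <- a

# `z` is not a column name, so subset() falls back to the calling
# environment and uses the value c("a", "b").
print(names(subset(d, select = z)))   # "a" "b"

# `a` *is* a column name; inside `select` it stands for that column's
# position (1), so the global variable `a` is never consulted.
print(names(subset(d, select = a)))   # "a"

# Explicit character indexing has no such ambiguity.
print(names(d[, a, drop = FALSE]))    # "a" "b"
```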

I agree, with some caveats.  There are basically two uses of R: as an
interactive data analysis package and as a statistical programming
language.  These uses come into conflict: in the interactive
environment, you want to minimise typing so that you can be as speedy
as possible.  It doesn't matter if R occasionally makes a wrong guess
when you have specified something implicitly, because you can fix it
on the fly.  When you are programming, you care less about saving
typing and more about reproducibility.  You want to be explicit so
your function is robust to widely varying inputs, even if it means you
have to type a lot more.  You see this tension in quite a few places:

 * drop = T
 * functions that return different types of output (e.g. sapply)
depending on input parameters
 * partial matching of argument names
 * using unevaluated expressions instead of strings (e.g. library, subset, ...)
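
The first two bullets are easy to see in a few lines of base R. The data frame below is made up for illustration.

```r
d <- data.frame(a = 1:3, b = 4:6)

# drop = TRUE is the default for `[` on data frames: selecting one column
# returns a bare vector, selecting two returns a data frame.
one <- d[, "a"]
two <- d[, c("a", "b")]
print(class(one))                       # "integer"
print(class(two))                       # "data.frame"
print(class(d[, "a", drop = FALSE]))    # "data.frame"

# sapply() picks its output type from the results it happens to get.
print(class(sapply(1:3, function(i) i)))         # "integer"
print(class(sapply(integer(0), function(i) i)))  # "list" for empty input

# vapply() states the expected type up front, so the result is stable.
print(vapply(integer(0), function(i) i, integer(1)))   # integer(0)
```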

These are all things that are helpful for interactive use, but make
life as a programmer more difficult.  I find the last one particularly
frustrating because it means it is very difficult to program with some
functions (e.g. subset) without resorting to complex quoting,
substituting and evaluating tricks.  I have tried to steer away from
this technique in my packages, and where it's just too convenient for
interactive use, insulating the deparsing into special functions that
the data analyst must use (e.g. aes() in ggplot, and .() in plyr),
along with providing alternatives for the programmer.
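
This sketch illustrates the "quoting, substituting and evaluating tricks" Hadley mentions. The hypothetical data frame below has a column named `cols`, which hides the variable of the same name inside `select`. One workaround splices the value into the call with base R's `bquote()`, whose `.()` is unrelated to plyr's `.()`. Plain `[` indexing avoids the issue entirely.

```r
d <- data.frame(a = 1, b = 2, cols = 3)
cols <- c("a", "b")   # the column names we actually want

# Inside `select`, the symbol `cols` resolves to the *column* named cols.
print(names(subset(d, select = cols)))                   # "cols"

# One way to force the variable's value in: splice it into the call
# with bquote() and evaluate the constructed call.
print(names(eval(bquote(subset(d, select = .(cols))))))  # "a" "b"

# The programmer-friendly alternative needs no quoting tricks at all.
print(names(d[, cols, drop = FALSE]))                    # "a" "b"
```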

I don't understand why you're getting so much push-back on this issue.
 R is a fantastic language, but it has some genuinely nasty corners.
In my opinion, this is one of them.

Hadley

-- 
http://had.co.nz/



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Gavin Simpson
On Tue, 2008-11-11 at 08:14 -0600, hadley wickham wrote:
  And without wanting to be rude or anything, your opinion carries very
  little weight in a project like R. You've arrived on the list and been
  very critical of the work of others. Now there is nothing wrong with
  being critical if it is constructive, and additionally with something
  like R you need to be constructive *and* contribute back. I'm not saying
 
 You are holding Wacek to a very high standard.  Why is it not acceptable
 to say that this part of R is hard to understand without having to
 provide a better solution?

Ok, reading back I should have said "if you want something fixed, patches
are welcome". I didn't mean to say that to get help you had to contribute
back. However, Wacek's approach was (and I'm paraphrasing): "subset
doesn't work logically or as I expect. It is a mess and needs fixing."

I'm sure no-one on the list minds if people don't understand things and
want to ask questions - I know I ask plenty of questions here about
things I don't understand. But just as there is a posting guide that
says how to go about phrasing a question that is likely to get a
response, we don't need people denigrating the work of others whilst
asking for assistance with what are admittedly hard concepts (ones I
don't fully understand either).

I've found several of these discussions involving Wacek's questions very
enlightening at times, once you get past the "it doesn't work as I
expect so is wrong" attitude.

 
 subset() _is_ confusing to novice R users.  You can not anticipate
 what subset(df, select = a) will do unless you know what variables are
 defined in the local environment and what variables are defined in the
 data frame.  It is hard to understand how it works without a deep
 understanding of environments and it is hard to teach all the special
 cases.   It is difficult to reliably use subset within another
 function.

I agree, but one can read the documentation for help. It isn't perfect
and expects you to know a bit (a lot) about environments etc., but I don't
think it is too confusing if you know what is in df (otherwise how do
you know what to select?), you read the help page and follow the
examples.

G

 
 This comes from my personal experience with subset (good for
 interactive use, never program with) and from my experiences teaching
 ~80 students how to use R over the last two years.
 
 Hadley
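
The situation that started this thread, calling subset() from inside another function, can be reproduced with a hypothetical wrapper, `show_cols()`. Whether the argument is honoured depends on the column names of the data frame that happens to be passed in.

```r
# Hypothetical wrapper that forwards a character vector to subset().
show_cols <- function(df, cols) {
  subset(df, select = cols)
}

d1 <- data.frame(a = 1, b = 2)
print(names(show_cols(d1, c("a", "b"))))   # "a" "b": no column called cols

d2 <- data.frame(a = 1, b = 2, cols = 3)
print(names(show_cols(d2, c("a", "b"))))   # "cols": the column shadows the argument
```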
 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Petr PIKAL
Hi

[EMAIL PROTECTED] wrote on 11.11.2008 11:32:27:

 Berwin A Turlach wrote:
  On Tue, 11 Nov 2008 09:27:41 +0100
  Wacek Kusnierczyk [EMAIL PROTECTED] wrote:
 
  
  but then it might be worth asking whether carrying on with misdesign
  for backward compatibility outbalances guaranteed crashes in future
  users' programs, [...]
  
 
  Why is it worth asking this if nobody else asks it? 
 
 
 i guess most of the people who do ask questions here care little about r
 itself, they just want it to solve a problem, even if it involves
 hacking the language.

Well, if somebody does not care what he/she is doing then he/she should
stop immediately.
If you do not care about how to use a machine-gun correctly you could easily
harm yourself or others.

 
 those outside the r team who care about language design have probably
 left the list long ago, if only they were subscribed.  the fact that

I am just an ordinary user (a BFU, as we say in Czech), although I have
been one for some time, so I have learned many virtues from the capable
people who develop and use R. I started with R when I had to change from
DOS Statgraphics to some Windows-based program, and got used to it.

It is like buying new shoes. If somebody just puts them on, goes
mountaineering, finds out that they cause blisters, discards them and buys a
new pair, then he probably does not get rid of the blisters.

 it's only me asking is no statistics.  i do talk to people, and know
 many who'd ask, but they just don't care, because they have already
 trashed r.  instead of discouraging me, make use of that i care to ask.

If I understand correctly, see Gabor's post:

 Gabor Grothendieck wrote: 
 Certainly this has been recognized as a potential problem: 
 
 http://developer.r-project.org/nonstandard-eval.pdf 
 
 however, it is convenient when you are performing 
 an analysis and entering commands directly as opposed 
 to writing a program although possibly the potential ambiguities 
 overshadow the convenience. 

But changing it could be quite difficult and is not high on the developers'
priority list.

Just my 2c

Regards
Petr

 
 vQ
 



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:
 On Tue, 11 Nov 2008 09:27:41 +0100
 Wacek Kusnierczyk [EMAIL PROTECTED] wrote:

   
 but then it might be worth asking whether carrying on with misdesign
 for backward compatibility outbalances guaranteed crashes in future
 users' programs, [...]
 

 Why is it worth asking this if nobody else asks it?  Most notably a
 certain software company in Redmond, Washington, which is famous for
 carrying on with bad designs and bugs all in the name of backward
 compatibility.  Apparently this company also sets industry standards so
 it must be o.k. to do that. ;-)
   

sure.  i have had this analogy in mind for a long time, but just didn't
want to say it aloud.  indeed, r carries on with bad design, but since
there are more and more users, it's just fine.

   
 which result in confused complaints, 
 

 Didn't see any confused complaints yet.  

really.  the discussion was motivated precisely by a user's complaint. 
just scan this list;  a large part of the questions stems from
confusion, which results directly from r's design. 

 Only polite requests for
 enlightenment after coming across behaviour that useRs found surprising
 given their knowledge of R.  The confused complaints seem to be posted
 as responses to responses to such question by people who for what ever
 reason seem to have an axe to grind with R. 
   
 the need for responses suggesting hacks to bypass the design, 
 

 Not to bypass the design, but to achieve what the person wants.  Like any
 programming language, R is Turing complete and anything can be done
 with it; it is just a question of how.
   

yes, to bypass the design.  to achieve what one would normally expect an
expression to be evaluated to, but r does it differently.

   
 and possibly incorrect results published 
 

 I guess such things cannot be avoided no matter what software you are
 using.  I am more worried about all the analysis done in MS Excel, in
 particular in the financial maths/stats world.  Also, to me it seems
 that getting incorrect results is a relative small problem compared with
 the frequent misinterpretation of correct results or the use of
 inappropriate statistical techniques.  
   

could not agree more, which does not contradict my complaints in any way.


   
 because r is likely to do everything but what the user expects.
 

 This is quite a strong statement, and I wonder what the basis is for
 such a statement.  Care to provide any evidence?
   

i could think of organizing a (don't)useR conference, where submissions
would provide such evidence.  whatever i say here, is mostly discarded
as nonsense comments (while it certainly isn't), you say i make the
problem up (while i just follow up user's complaints).  seriously, i may
have exaggerated in the immediately above, but lots of comments made
here by the users convince me that r very often breaks expectations.

 R is a tool; a very powerful one and hence also very sharp.  It is easy
 to cut yourself with it, but when one knows how to use it gives the
 results that one expects.  I guess the problem in this age of instant
 gratification is that people are not willing to put in the time and
 effort to learn about the tools they are using.  
   

but a good tool should be made with care for how users will use it.  r
apparently fits the ideas of its developers, while confusing naive
users.  i do not opt for redmond-like 'i know better what you want'
intelligence, but i think some of the confusions should be predicted and
the design tuned accordingly.

 How about spending some time learning about R instead of continuously
 griping about it?  Just imagine how much you could have learned in the
 time you spend writing all those e-mails. :)
   

i learn a lot while writing these emails, because i do read manuals and
make up tests.  but there would be little progress if we all were buying
what we are given instead of critically examining it.  i can stop
posting at any moment, but i don't think it would help the community ;)

 r suffers from early made poor decisions, but then this in itself is
 not a good reason to carry on.
 

 Radford Neal is also complaining on his blog
 (http://radfordneal.wordpress.com/) about what he thinks are design
 flaws in R.  Why don't you two get together and design a good
 substitute without any flaws?  Or is that too hard? ;-)
   

it's certainly hard to design and implement a system of the size of r. 
it's certainly easier to just complain rather than make a better tool. 
but it would really be a pitiful world if all of us were just
developing, and no one would criticize.  my purpose is not (or not just,
if you prefer) to annoy the r team, but to point out and document issues
that really need rethinking.  discouragingly, many of these issues
appear to be known already, but simply ignored. 

vQ


Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Berwin A Turlach
On Tue, 11 Nov 2008 09:49:31 +0100
Wacek Kusnierczyk [EMAIL PROTECTED] wrote:

 (for whatever reason a user may choose to have a column named '-b')

For whatever reason, people also jump from bridges.  Does that mean
all bridges have an inherently flawed design and should be abolished?

Wait, then we would only have level crossing and some people, for
whatever reason, think it is a good idea to race trains to level
crossings.  Gee, we better abolish them too since they are such a bad
design.  

Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Gavin Simpson
On Tue, 2008-11-11 at 11:08 +0100, Wacek Kusnierczyk wrote:
 Gavin Simpson wrote:
 
  d = data.frame(a = 1)
  d$`-b` = 2
  names(d)
  # here we go
 
  subset(d, select = -b)
  # to b or not to b?
  
 
  but -b is not the name of the column; you explicitly called it `-b` and
  you should refer to it as such. If you use non-standard names then
  expect to do a bit more work.

 identical(names(d)[2], "-b")
 
 if i do
 
 d$`c` = 4
 
 then you claim d has no column named 'c'?

No, where do you get that from?

   do i have to refer to the c
 column as `c`?

No, but then c is a name that doesn't need to be quoted. -b is a name
that needs to be quoted and if you quote it, things work as you might
expect.

 
 

  subset(d, select = `-b`)
  
-b
  1  2

 
 ... and i have to use
 
 subset(d, select = `a`)
 
 and not
 
 subset(d, select = a)
 
 right?

Is a a name in d? You can quote it if you want but it doesn't need to
be quoted, so you can use either.
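
Here is a minimal sketch of the naming rules Gavin refers to (see ?Quotes). `make.names()` shows which names are syntactic. Backquotes turn any string into a symbol; for a syntactic name such as `a` they are optional.

```r
d <- data.frame(a = 1)
d$`-b` <- 2                            # a non-syntactic column name
print(names(d))                        # "a"  "-b"

# make.names() shows what a syntactic version of each name would be.
print(make.names(c("a", "c", "-b")))   # "a" "c" "X.b"

# Backquotes are optional for syntactic names, required for "-b".
print(identical(quote(`a`), quote(a))) # TRUE
print(names(subset(d, select = `-b`))) # "-b"
print(names(subset(d, select = `a`)))  # "a"
```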

   besides, subset(d, select = `-b`) should rather return the
 column(s) whose names are the value of the variable `-b`:
 
 `-a` = "a"
 subset(d, select = `-a`)
 # returns all columns except for the one named 'a', rather than the
 column named '-a' -- but that's just because there is no such column in
 d;  if there were, this one would be returned. 

No, it returns column a if you are following on from your original
examples. `-a` refers to a variable (object) that evaluates to "a", and
a is a component of d, so it is returned.
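
Wacek's `-a` example and Gavin's reading of it can be checked directly. This sketch assumes a fresh session, so that no other `a` or `-a` objects interfere.

```r
d <- data.frame(a = 1)
d$`-b` <- 2
`-a` <- "a"                              # an ordinary variable named -a

# No column is named -a, so the variable's value "a" selects column a.
print(names(subset(d, select = `-a`)))   # "a"

# Once a column named -a exists, it takes precedence over the variable.
d$`-a` <- 3
print(names(subset(d, select = `-a`)))   # "-a"
```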

 
 so even with backquotes used, there is no obvious interpretation of what
 select=`-b` should mean, because it depends on what names the components of
 the first argument have.  and this breaks the concept of referential
 transparency.
 
 so the problem is not so easily explained away.  what subset does *is*
 messy.

In your opinion.

And without wanting to be rude or anything, your opinion carries very
little weight in a project like R. You've arrived on the list and been
very critical of the work of others. Now there is nothing wrong with
being critical if it is constructive, and additionally with something
like R you need to be constructive *and* contribute back. I'm not saying
that if you did patch R to work the way you think is correct, R Core would
accept the patches, as they need to maintain backwards compatibility with
S and not annoy the hundreds of package authors. But coming on here and
criticising the work of others isn't going to win you many friends.

Also, subset (and the other things you've been harping on about) work as
documented. So you kind of have to like it or lump it.

 
 
  subset(d, select = - `-b`)
  
a
  1 1
 

  b = "a"
  subset(d, select = -b)
  # tragedy
  
 
  For this, I interpret it as not finding a column named b so tries to
  evaluate:
 

 
 you interpret it.  how obvious is this for most users?
 it tries to find a column named 'b', not a column named b.  that's the
 problem with subset.

If users read the documentation then they'd know about unary operators.

 
 
  b = "a"
  `-`(b)
  
  Error in -b : invalid argument to unary operator
 
  `-` is a function remember.
 
  If you want this to work you can use get()
 

  subset(d, select = - get(b))
  
-b
  1  2
 

 
 use this hack to get around the design.

No hack, that is what get() is for. b is *not* a component of d. - b (or
`-`(b)) evaluates to an error. If you want to select columns except the
column referenced by the contents of b (which is "a"), then you can use
get().
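
This short sketch puts Gavin's get() approach next to a standard-evaluation equivalent. As we understand it, `get(b)` runs inside subset()'s evaluation of `select`, where the name `a` stands for the column's position, so `-get(b)` becomes `-1`.

```r
d <- data.frame(a = 1)
d$`-b` <- 2
b <- "a"                        # holds the *name* of the column to drop

# Unary minus on a character string fails, as in the quoted error.
msg <- tryCatch(-b, error = conditionMessage)
print(msg)                      # "invalid argument to unary operator"

# Drop the column named by b from within subset().
print(names(subset(d, select = -get(b))))                # "-b"

# Standard-evaluation equivalent without subset():
print(names(d[, setdiff(names(d), b), drop = FALSE]))    # "-b"
```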

 
  d$b = 3
  subset(d, select = -b)
  # catharsis
 
  (for whatever reason a user may choose to have a column named '-b')
  
 
  Yes, but the user is warned about not using standard naming conventions
  in the Introduction to R manual. You aren't stopped from using names
  like `-b` but if you use them, you have to expect to work a little
  harder.

 
 i'd like you to point me to that warning, as i apparently need to read
 it, but i haven't found it in the manual yet.  thanks.

You could look at section 1.8 of An Introduction to R for a
starter. ?Syntax is also a logical place to start and it explicitly
refers you to details in the See Also section. If you read all of those
(but I'll save you some time and point you to ?Quotes) you find the
answers to how things like this work. ?Quotes explains what syntactic
names are and how to use '`' backticks to quote non-syntactic
names.

Ok, ?Syntax and ?Quotes may not jump out at you as being very obvious
places to look. If so, grab the source to the introduction to R manual,
find a logical place to put this information or note to point people to
the help pages and patch it accordingly. Then contribute that back for the
good of everyone.

 
  Reading ?subset we have:
 
select: expression, indicating columns to select from a data frame.
 
  
 
   For data frames, the 'subset' argument works on the rows.  Note
   that 'subset' will be evaluated in the data frame, so columns can
   be referred 

Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Thomas Lumley


Some of the uses of non-standard evaluation are undoubtedly a problem in 
R. Probably the worst is in model.frame, because it is much harder to 
work around. I have never used subset(,select=) and hence have never 
been at risk of confusion (if you don't like how it works, I suggest you 
do the same), but model.frame() is inside lots of things.


 There are two issues here that I think are worth pointing out:
1/ Some things are just not fixable any more. They can only be fixed in a 
new language. The people thinking about new statistical languages mostly 
know what the problems are, because they have been using S and/or R for 
many years and it's really not that hard to notice the problems. The 
document on non-standard evaluation demonstrates that R-core is aware of 
this particular problem.


2/ There are some uses of non-standard evaluation that don't seem to 
confuse people, and an interesting question is how to characterise them. 
These are what I referred to as 'macro-like functions' in the document 
that you have already been referred to.  For example, subset(,subset=) and 
with() don't seem to be as confusing or to cause problems for programmers 
in the same way. There is an empirical question as to what these 
relatively non-problematic constructs are, and a theoretical question as 
to why they are different. In particular, with() not only has non-standard 
evaluation, it is quite similar to the notoriously confusing attach().
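
This short sketch makes Thomas's two "macro-like" examples concrete with a made-up data frame and compares them with the standard-evaluation spelling.

```r
d <- data.frame(x = 1:5, y = c(2, 4, 6, 8, 10))

# with(): evaluate an expression with the columns of d visible as variables.
print(with(d, sum(x * y)))             # 110

# subset(, subset = ): the row condition refers to the column x.
rows_nse <- subset(d, subset = x > 2)
# The same rows with standard evaluation.
rows_se <- d[d$x > 2, ]
print(identical(rows_nse$x, rows_se$x))  # TRUE
```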



-thomas


On Tue, 11 Nov 2008, Wacek Kusnierczyk wrote:


Berwin A Turlach wrote:

On Tue, 11 Nov 2008 09:27:41 +0100
Wacek Kusnierczyk [EMAIL PROTECTED] wrote:



but then it might be worth asking whether carrying on with misdesign
for backward compatibility outbalances guaranteed crashes in future
users' programs, [...]



Why is it worth asking this if nobody else asks it?



i guess most of the people who do ask questions here care little about r
itself, they just want it to solve a problem, even if it involves
hacking the language.

those outside the r team who care about language design have probably
left the list long ago, if only they were subscribed.  the fact that
it's only me asking is no statistics.  i do talk to people, and know
many who'd ask, but they just don't care, because they have already
trashed r.  instead of discouraging me, make use of that i care to ask.

vQ




Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Bert Gunter
Ummm... as today is still Armistice day  (in my time zone, anyway), maybe we
should call a truce and end this flame war...

Cheers,
Bert


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Berwin A Turlach
Sent: Tuesday, November 11, 2008 9:31 AM
To: Duncan Murdoch
Cc: R help
Subject: Re: [R] Variable passed to function not used in function in
select=... in subset

G'day Duncan,

On Tue, 11 Nov 2008 09:37:57 -0500
Duncan Murdoch [EMAIL PROTECTED] wrote:

 I think this tension is a fundamental part of the character of S and
 R. But it is also fundamental to R that there are QC tests that apply
 to code in packages:  so writing new tests that detect dangerous
 usage (e.g. to disallow partial name matching) would be another way
 to improve reliability.  [...]

Please not. :)
After years of using R, it is now second nature to me to type (yes,
I always spell out from and to) 
seq(from=xx, to=yy, length=zz)
and I never understood why the full name of that argument had to be
length.out.  I would hate to see lots of warning messages because I am
using partial matching.

Cheers,

Berwin



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Duncan Murdoch

On 11/11/2008 8:53 AM, hadley wickham wrote:

On Mon, Nov 10, 2008 at 1:04 PM, Wacek Kusnierczyk
[EMAIL PROTECTED] wrote:

pardon me, but does this address in any way the legitimate complaint of
the rightfully confused user?

consider the following:

d = data.frame(a=1, b=2)
a = c("a", "b")
z = a
# that is, both a and z are c("a", "b")

subset(d, select=z)
# gives two columns, since z is a two element vector whose elements are
# valid column names

subset(d, select=a)
# gives one column, since 'a' (but not a) is a valid column name

subset(d, select=c(a,b))
# gives two columns


this is certainly what the authors intended, and they may have good
grounds for this smart design.  but this must break the expectation of a
naive (r-naive, for that matter) user, who may otherwise have excellent
experience in using a functional programming language, e.g., scheme.
(especially scheme, where symbols and expressions are first-class
objects, yet the distinction between a symbol or an expression and their
referent is made painfully clear, perhaps except for when one hacks with
macros.)

the examples above illustrate the notorious problem with r that one can
never tell whether 'a' means the value referred to with the identifier
'a' or the symbol 'a', unless one gets ugly surprises and is forced
to study the documentation.  and even then one may not get a clear answer.


I agree, with some caveats.  There are basically two uses of R: as an
interactive data analysis package and as a statistical programming
language.  These uses come into conflict: in the interactive
environment, you want to minimise typing so that you can be as speedy
as possible.  It doesn't matter if R occasionally makes a wrong guess
when you have specified something implicitly, because you can fix it
on the fly.  When you are programming, you care less about saving
typing and more about reproducibility.  You want to be explicit so
your function is robust to widely varying inputs, even if it means you
have to type a lot more.  You see this tension in quite a few places:

 * drop = T
 * functions that return different types of output (e.g. sapply)
depending on input parameters
 * partial matching of argument names
 * using unevaluated expressions instead of strings (e.g. library, subset, ...)

These are all things that are helpful for interactive use, but make
life as a programmer more difficult.  I find the last one particularly
frustrating because it means it is very difficult to program with some
functions (e.g. subset) without resorting to complex quoting,
substituting and evaluating tricks.  I have tried to steer away from
this technique in my packages, and where it's just too convenient for
interactive use, insulating the deparsing into special functions that
the data analyst must use (e.g. aes() in ggplot, and .() in plyr),
along with providing alternatives for the programmer.

I don't understand why you're getting so much push-back on this issue.
 R is a fantastic language, but it has some genuinely nasty corners.
In my opinion, this is one of them.


I think your analysis is correct, that the goals of casual use and 
programming are inconsistent.  But in general I think there's always 
going to be support for providing alternative ways that are 
programmer-safe.


For instance, library( foo, character.only=TRUE) says that foo is a 
character vector, not the name of a package.  I don't know of anything 
that subset() provides that is not available in other ways (I think of 
it as purely a convenience function, and my first piece of advice to 
Karl was not to use it).  However, if there really is something there, 
then it would be worthwhile pointing that out, and either modifying 
subset() to make it safe, or providing an alternative function.
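
Duncan's `character.only` example is shown in this sketch with the always-installed stats package, so that it runs anywhere. `requireNamespace()` is shown as another interface that takes the package name as an ordinary string.

```r
pkg <- "stats"                        # the package name held in a variable

# library(pkg) would look for a package literally named "pkg";
# character.only = TRUE makes library() use the value of pkg instead.
library(pkg, character.only = TRUE)
print("package:stats" %in% search())  # TRUE

# requireNamespace() uses standard evaluation and returns a logical.
print(requireNamespace(pkg, quietly = TRUE))   # TRUE
```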


I think this tension is a fundamental part of the character of S and R. 
 But it is also fundamental to R that there are QC tests that apply to 
code in packages:  so writing new tests that detect dangerous usage 
(e.g. to disallow partial name matching) would be another way to improve 
reliability.  Writing a test for misuse of drop=TRUE seems quite hard, 
but there are probably ways a debugger could do it:  e.g. to tag the 
invocation as to whether any indices were dropped on the first call, and 
then warn if the result isn't the same on every subsequent call.


Conceivably Karl's problem could be detected in the same way:  tag each 
name in the expression as to whether it was found in the data frame or 
some other environment, and then warn if that tag ever changes.  Or 
maybe the test should just warn that subset() is a convenience function, 
not meant for programming.
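
Duncan's tagging idea could start from something like the hypothetical `name_sources()` below (not an R function). It reports where each name in a `select`-style expression would be found. It only does the tagging step; remembering the tags across calls and warning when they change is left out.

```r
# Hypothetical checker (not part of R): for each variable name in an
# unevaluated expression, report where subset()-style evaluation would
# find it, among the columns of df or in the calling environment.
name_sources <- function(df, expr, env = parent.frame()) {
  force(env)                      # capture the caller's environment now
  vars <- all.vars(expr)          # symbols only; function names are skipped
  vapply(vars, function(v) {
    if (v %in% names(df)) {
      "column"
    } else if (exists(v, envir = env)) {
      "environment"
    } else {
      "unresolved"
    }
  }, character(1))                # character input, so results are named
}

d <- data.frame(a = 1, b = 2)
z <- c("a", "b")
print(name_sources(d, quote(c(a, z, not_defined_anywhere))))
# a: "column", z: "environment", not_defined_anywhere: "unresolved"
```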


Duncan Murdoch



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Duncan Murdoch

On 11/11/2008 2:56 PM, Bert Gunter wrote:

Ummm... as today is still Armistice day  (in my time zone, anyway), maybe we
should call a truce and end this flame war...


I haven't seen very many flames  --  there have been disagreements, but 
generally it's been quite civil.  Certainly I don't think Berwin flamed me.


If we were to add in a warning about partial name matching, it would 
have to be accompanied by some way to deal with common uses like the one 
Berwin mentioned.  (There are at least 100 uses of seq(..., length=...) 
in the core and recommended packages.  I wouldn't want to fix all of 
those.)  But it could still be useful, in the same way the checks for 
using TRUE and FALSE instead of T and F are useful.
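
The T/F check Duncan mentions exists because `T` and `F` are ordinary variables that user code can mask, while `TRUE` and `FALSE` are reserved words. A short sketch:

```r
# T and F are ordinary variables in package base, so user code can mask them.
print(isTRUE(T))     # TRUE
T <- 0               # e.g. a variable meant to hold a time or temperature
print(isTRUE(T))     # FALSE: the new T masks base::T
rm(T)                # remove the masking copy
print(isTRUE(T))     # TRUE again

# TRUE and FALSE are reserved words and cannot be reassigned.
res <- tryCatch(eval(parse(text = "TRUE <- 0")),
                error = function(e) "error")
print(res)           # "error"
```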


Duncan Murdoch



Cheers,
Bert


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Berwin A Turlach
Sent: Tuesday, November 11, 2008 9:31 AM
To: Duncan Murdoch
Cc: R help
Subject: Re: [R] Variable passed to function not used in function in
select=... in subset

G'day Duncan,

On Tue, 11 Nov 2008 09:37:57 -0500
Duncan Murdoch [EMAIL PROTECTED] wrote:


I think this tension is a fundamental part of the character of S and
R. But it is also fundamental to R that there are QC tests that apply
to code in packages:  so writing new tests that detect dangerous
usage (e.g. to disallow partial name matching) would be another way
to improve reliability.  [...]


Please not. :)
After years of using R, it is now second nature to me to type (yes,
I always spell out from and to) 
	seq(from=xx, to=yy, length=zz)

and I never understood why the full name of that argument had to be
length.out.  I would hate to see lots of warning messages because I am
using partial matching.

Cheers,

Berwin



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Kenn Konstabel wrote:

 On the other hand, while there may be ground to complain,  it may be easier
 to make your own version of subset.data.frame and  advertise it to everyone:

   

sure, but:

a) it may actually increase the mess, and reduce portability
b) it is still vulnerable to the idiosyncrasies of the functions you use to
develop your own function.

to b), that was the original case; the user wanted to implement a
function that did print-names-subset, and he got caught by subset.


it should be preferred to have a clean and consistent protocol for how
functions treat their arguments, rather than to multiply implementations
of the same operation to provide versions that differ in nitty-gritty
details just because the original does something odd.

vQ



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Berwin A Turlach
On Tue, 11 Nov 2008 11:27:30 +0100
Wacek Kusnierczyk [EMAIL PROTECTED] wrote:

 Berwin A Turlach wrote:

  Why is it worth asking this if nobody else asks it?  Most notably a
  certain software company in Redmond, Washington, which is famous for
  carrying on with bad designs and bugs all in the name of backward
  compatibility.  Apparently this company also sets industry
  standards so it must be o.k. to do that. ;-)

 
 sure.  i have had this analogy in mind for a long time, but just
 didn't want to say it aloud.  

Mate, if you contemplate comparing R to anything coming out of Redmond,
Washington, then you should first heed the old saying that it is
better to remain silent and let people believe that one is a fool than
to open one's mouth and remove any doubt. :)

 indeed, r carries on with bad design, but since there are more and
 more users, it's just fine.

Whether R carries on with bad design is debatable.  Clearly the changes
that you would like to see would lead to big changes that might break a
lot of existing code and programming idioms.  Such changes could
estrange a large part of the user base and, in a worst case scenario,
make R unusable for many tasks it is used for now.  No wonder that nobody
is eager to implement such design changes.  Apparently Python is
planning such wholesale changes when moving to version 3.x.  Let's see
what that does to the popularity of Python and the uptake of the new
version.

  Didn't see any confused complaints yet.  
 
 really.  the discussion was motivated precisely by a user's
 complaint. 

We must have different definitions of what constitutes a complaint.  I
looked at the initial posting again.  In my book there was no
complaint.  Just a user who asked how to achieve a certain aim because
the way he tried to achieve it did not work.  There were three or four
constructive answers that pointed out how it can be done and then all
of a sudden complaints about alleged design flaws of R started.

 just scan this list;  a large part of the questions stems from
 confusion, which results directly from r's design. 

That's your opinion, to which you are of course entitled.  In my
opinion, a large part of the questions on r-help these days stem from
the fact that in this age of instant gratification it seems to be
easier to fire off an e-mail to a mailing list and try to pick the
brain of thousands of subscribers instead of spending time on trying
to read the documentation, learn about R and figure out the question on
one's own.

  because r is likely to do everything but what the user expects.
 
  This is quite a strong statement, and I wonder what the basis is for
  such a claim.  Care to provide any evidence?
 
 i could think of organizing a (don't)useR conference, where
 submissions would provide such evidence.  

Please do so.  Such a conference would probably turn out to be more
hilarious and funnier than the Melbourne International Comedy Festival;
should be real fun to attend. :)

 whatever i say here, is mostly discarded as nonsense comments (while
 it certainly isn't), you say i make the problem up (while i just
 follow up user's complaints).  seriously, i may have exaggerated in
 the immediately above, but lots of comments made here by the users
 convince me that r very often breaks expectations.

Ever heard about biased sampling?  On a list like this you, of course,
hear questions by useRs who had the wrong expectations about how R
should behave and got surprised.  You do not hear of all the instances
in which useRs had the correct expectations which promptly were met by
R.  

  R is a tool; a very powerful one and hence also very sharp.  It is
  easy to cut yourself with it, but when one knows how to use it
  gives the results that one expects.  I guess the problem in this
  age of instant gratification is that people are not willing to put
  in the time and effort to learn about the tools they are using.  
 
 but a good tool should be made with care for how users will use it.  

But the group of users changes, and sometimes one cannot foresee all
possible ways in which future users may use the software.  As a
programming paradigm says, you cannot make a piece of software
idiot-proof; nature will always come up with a better idiot.  

 r apparently fits the ideas of its developers, 

That's the prerogative of the developers, isn't it?  But if it only
fitted their ideas, then it would only be used by them.  The fact
that it is used by many others seems to indicate that it also fits the
ideas of many others.

 while confuses naive users.  

Well, many jurisdictions have staged driver licences for motorcycles,
initially allowing only low-powered machines for new users, with
increasingly powerful machines allowed for more experienced users.  Some
people in Australia would like to introduce a similar system for
car drivers since, apparently, too many P-platers kill themselves with
high-powered V8 cars (though, I am not sure whether this is a problem
of 

Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Berwin A Turlach
On Tue, 11 Nov 2008 09:27:41 +0100
Wacek Kusnierczyk [EMAIL PROTECTED] wrote:

 but then it might be worth asking whether carrying on with misdesign
 for backward compatibility outbalances guaranteed crashes in future
 users' programs, [...]

Why is it worth asking this if nobody else asks it?  Most notably a
certain software company in Redmond, Washington, which is famous for
carrying on with bad designs and bugs all in the name of backward
compatibility.  Apparently this company also sets industry standards so
it must be o.k. to do that. ;-)

 which result in confused complaints, 

Didn't see any confused complaints yet.  Only polite requests for
enlightenment after coming across behaviour that useRs found surprising
given their knowledge of R.  The confused complaints seem to be posted
as responses to responses to such questions by people who for whatever
reason seem to have an axe to grind with R. 

 the need for responses suggesting hacks to bypass the design, 

Not to bypass the design, but to achieve what the person wants.  Like any
programming language, R is Turing complete and anything can be done
with it; it is just a question of how.

 and possibly incorrect results published 

I guess such things cannot be avoided no matter what software you are
using.  I am more worried about all the analysis done in MS Excel, in
particular in the financial maths/stats world.  Also, to me it seems
that getting incorrect results is a relatively small problem compared with
the frequent misinterpretation of correct results or the use of
inappropriate statistical techniques.  

 because r is likely to do everything but what the user expects.

This is quite a strong statement, and I wonder what the basis is for
such a claim.  Care to provide any evidence?

R is a tool; a very powerful one and hence also very sharp.  It is easy
to cut yourself with it, but when one knows how to use it gives the
results that one expects.  I guess the problem in this age of instant
gratification is that people are not willing to put in the time and
effort to learn about the tools they are using.  

How about spending some time learning about R instead of continuously
griping about it?  Just imagine how much you could have learned in the
time you spent writing all those e-mails. :)

 r suffers from early made poor decisions, but then this in itself is
 not a good reason to carry on.

Radford Neal is also complaining on his blog
(http://radfordneal.wordpress.com/) about what he thinks are design
flaws in R.  Why don't you two get together and design a good
substitute without any flaws?  Or is that too hard? ;-)

Cheers,

Berwin

=== Full address =
Berwin A Turlach                            Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability        +65 6516 6650 (self)
Faculty of Science                          FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7          e-mail: [EMAIL PROTECTED]
Singapore 117546                    http://www.stat.nus.edu.sg/~statba



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Kenn Konstabel
On Tue, Nov 11, 2008 at 12:27 PM, Wacek Kusnierczyk 
[EMAIL PROTECTED] wrote:

 it's certainly hard to design and implement a system of the size of r.
 it's certainly easier to just complain rather than make a better tool.
 but it would really be a pitiful world if all of us were just
 developing, and no one would criticize.  my purpose is not (or not just,
 if you prefer) to annoy the r team, but to point out and document issues
 that really need rethinking.  discouragingly, many of these issues
 appear to be known already, but simply ignored.


On the other hand, while there may be grounds to complain, it may be easier
to make your own version of subset.data.frame and advertise it to everyone:

Drop the second call to `substitute` in subset.data.frame, i.e.,
replace
   vars <- eval(substitute(select), nl, parent.frame())
.. with
  vars <- eval(select, nl, parent.frame())
.. and it will behave as you want (if I understood you).

# suppose you have modified subset.data.frame this way
# and called it waceks.subset
> df1 <- data.frame(group="G1", visit="V1", value=0.9)
> group <- c("group", "visit")

> subset(df1, select=group)
  group
1    G1

> waceks.subset(df1, select=group)
  group visit
1    G1    V1
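Kenn's modification amounts to evaluating `select` as an ordinary argument. Below is a minimal sketch of such a variant under the hypothetical name `subset_se()`, which is not in base R. It keeps the usual non-standard evaluation for the row condition `subset`, but it treats `select` as a plain value, that is, a character or numeric vector of columns:

```r
# Hypothetical variant of subset.data.frame: `subset` (rows) is still
# evaluated inside the data frame, but `select` is evaluated like any
# ordinary argument, in the caller's environment.
subset_se <- function(x, subset, select, drop = FALSE) {
  rows <- if (missing(subset)) {
    rep_len(TRUE, nrow(x))
  } else {
    r <- eval(substitute(subset), x, parent.frame())
    r & !is.na(r)                 # treat NA conditions as FALSE, like subset()
  }
  cols <- if (missing(select)) TRUE else select
  x[rows, cols, drop = drop]
}

df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)
group <- c("group", "visit")
names(subset(df1, select = group))     # "group": the column name wins
names(subset_se(df1, select = group))  # "group" "visit": the variable's value is used
```

The trade-off is that interactive shortcuts such as `select = -Species` no longer work, because `-Species` would now be evaluated as an ordinary expression.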


KK



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread hadley wickham
 And without wanting to be rude or anything, your opinion carries very
 little weight in a project like R. You've arrived on the list and been
 very critical of the work of others. Now there is nothing wrong with
 being critical if it is constructive, and additionally with something
 like R you need to be constructive *and* contribute back. I'm not saying

You are holding Wacek to a very high standard.  Why is it not acceptable
to say that this part of R is hard to understand without having to
provide a better solution?

subset() _is_ confusing to novice R users.  You cannot anticipate
what subset(df, select = a) will do unless you know what variables are
defined in the local environment and what variables are defined in the
data frame.  It is hard to understand how it works without a deep
understanding of environments and it is hard to teach all the special
cases.  It is difficult to reliably use subset within another
function.
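The same dependence on which names live in the data frame and which live in the calling environment also affects the row condition, not just `select`. The small illustration below uses made-up data; the column `cutoff` is hypothetical:

```r
df <- data.frame(x = 1:5)
cutoff <- 2
nrow(subset(df, x > cutoff))   # 3 rows: `cutoff` comes from the workspace

df$cutoff <- 10                # now the data frame also has a `cutoff` column
nrow(subset(df, x > cutoff))   # 0 rows: the column silently takes precedence
```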

This comes from my personal experience with subset (good for
interactive use, never program with) and from my experiences teaching
~80 students how to use R over the last two years.

Hadley

-- 
http://had.co.nz/



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Duncan Murdoch

On 11/11/2008 5:00 AM, Berwin A Turlach wrote:


Radford Neal is also complaining on his blog
(http://radfordneal.wordpress.com/) about what he thinks are design
flaws in R.  Why don't you two get together and design a good
substitute without any flaws?  Or is that too hard? ;-)


I agree with Radford (who was complaining about surprising behaviour 
with dropped dimensions in array indexing, and the result of 1:n when n 
is zero), but I don't particularly like his solution.  It seems to me 
that introducing a new operator that returns a sequence from 1 up to n 
is a good idea, but having a new data type is not:  there is too much 
legacy code that would not be able to handle it.  So we need some other 
way to handle the array indexing problem, such as ways to detect 
unintentional omissions of drop=FALSE, if we want to handle it.
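The two surprises mentioned here can be shown in a few lines, together with the usual defensive idioms. This is a sketch only; `seq_len()` and `drop = FALSE` are standard base R:

```r
n <- 0
1:n            # c(1, 0): a length-2 sequence counting down, not an empty one
seq_len(n)     # integer(0): the safe way to iterate over 1..n

m <- matrix(1:6, nrow = 3)
dim(m[, 1])                # NULL: a single column drops to a plain vector
dim(m[, 1, drop = FALSE])  # 3 1: the matrix structure is kept
```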


Duncan Murdoch



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Petr PIKAL wrote:

 Well, if somebody does not care what he/she is doing then he/she should 
 stop immediately. 
   

then many r users should perhaps stop using r.

but seriously, when one buys a complicated device one typically reads a
quick start guide, and makes intuitive assumptions about how the device
will work, turning back to the reference when the expectations fail. 
good design should aim at reducing the need for checking why an
intuitive assumption fails.


 If you do not care about how to use a machine-gun correctly you could easily 
 harm yourself or others. 
   
indeed, and i'm scared to think that some of the published research can
be harmful because the researcher refused to read the whole r reference
before doing a stats analysis.

 those outside the r team who care about language design have probably
 left the list long ago, if only they were subscribed.  the fact that
 

 I am just a BFU, although I have been one for some time already, and I have
 learned many virtues from capable persons who are developing and using R.
 I started with R when I had to change from DOS Statgraphics to some
 Windows-based program and got used to it. 

 It is like buying new shoes. If somebody just puts them on, goes 
 mountaineering, finds out that they cause blisters, discards them and buys a 
 new pair, then he probably does not get rid of blisters.
   

you see, i'm not complaining about my own analyses failing because i
have not read the appropriate section in the reference.  if this were
the problem, i'd just read more and keep silent.

i'm complaining about the need to read, by anyone who starts up with r,
in all gory details, about the intricacies of r before doing anything,
because the behaviour is often so unexpected.  i'm using a whole range
of programming languages, including functional ones, they differ a lot,
they do surprise me at times, but once you learn a few general rules
about the syntax and semantics, it goes well.  it won't with r, because
every single function can do its own tricks with the arguments you give
it, and it can do so in an inconsistent manner.  *this* is what should
be changed for r to be coherent and reliable.

vQ



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Berwin A Turlach
On Tue, 11 Nov 2008 12:53:31 +0100
Wacek Kusnierczyk [EMAIL PROTECTED] wrote:

 but seriously, when one buys a complicated device one typically reads
 a quick start guide, and makes intuitive assumptions about how the
 device will work, turning back to the reference when the expectations
 fail. good design should aim at reducing the need for checking why an
 intuitive assumption fails.

And on what are these intuitive assumptions based if not on familiarity
with similar devices?  And people have different intuitions; why should
yours be the correct one and the gold standard?

I know that if I buy a complicated device and never owned something
similar I read more than the quick start guide to get familiar with the
device before breaking something due to using wrong assumptions.

When I started to use S-PLUS, I had used GAUSS before.  Still I took
the time off to work through the blue book and make myself familiar
with S-PLUS before using it for serious work.  Based on my experience
with R, I found R very intuitive and easy to use; but still try to keep
up with relevant documentation.

It really seems that your problem is that you have an attitude of
wanting to have instant gratification.

  If you do not care about how to use a machine-gun correctly you could
  easily harm yourself or others. 

 indeed, and i'm scared to think that some of the published research
 can be harmful because the researcher refused to read the whole r
 reference before doing a stats analysis.

Sorry, but this is absolute rubbish.  There are plenty of statistical
analyses that can be done without reading the complete R reference.
However, one or two good books might help.

My concern would rather be that everybody thinks that they can do
statistics and that software like R makes such people really
think they can do it.  I am far more concerned about inappropriate
analyses and wrong interpretations.  How often is absence of evidence
taken as evidence of absence?

 you see, i'm not complaining about my own analyses failing because i
 have not read the appropriate section in the reference.  if this were
 the problem, i'd just read more and keep silent.
 
 i'm complaining about the need to read, by anyone who starts up with
 r, in all gory details, about the intricacies of r before doing
 anything, because the behaviour is often so unexpected.  

I guess Frank Harrell had people like you in mind when he wrote:
  https://stat.ethz.ch/pipermail/r-help/2005-April/068625.html

Would you also not expect to learn about surgery in all its gory
details before attempting brain surgery because brain surgery is so
intuitive and doesn't need any study?

Believe it or not, there are lots of useful things that you can do in R
without knowing all the gory details.  There are even people who got
books on R published who obviously don't know all the gory details and
they still show useful applications of R.

Cheers,

Berwin

=== Full address =
Berwin A Turlach                            Tel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability        +65 6515 6650 (self)
Faculty of Science                          FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7          e-mail: [EMAIL PROTECTED]
Singapore 117546                    http://www.stat.nus.edu.sg/~statba



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-10 Thread Henrique Dallazuanna
Try this:

TestFunc <- function(df, group) {
return(names(eval(bquote(subset(df, select = .(group))))))
}
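This trick works because `bquote()` splices the value of `group` into the call before `subset()` ever sees it. As a result, no symbol is left for `subset()` to look up among the columns. The sketch below prints the constructed call; the name `pns_bquote` is made up for illustration:

```r
pns_bquote <- function(df, group) {
  cl <- bquote(subset(df, select = .(group)))  # value of `group` spliced into the call
  print(cl)                                    # shows the call with the vector inlined
  names(eval(cl))                              # evaluated in this function's frame
}

df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)
pns_bquote(df1, c("group", "visit"))
```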

On Mon, Nov 10, 2008 at 1:18 PM, Karl Knoblick [EMAIL PROTECTED] wrote:

 Hello!

 I have the problem that in my function the passed variable is not used, but
 the variable name of the dataframe itself - difficult to explain, but an
 easy example:

 TestFunc <- function(df, group) {
 print(names(subset(df, select=group)))
 }
 df1 <- data.frame(group="G1", visit="V1", value=0.9)
 TestFunc(df1, c("group", "visit"))

 Result:
 [1] "group"

 But I expected and want to have [1] "group" "visit" as result! Does anybody
 know how to get this result?

 Thanks!
 Karl








-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-10 Thread Gabor Grothendieck
Here are a few things to try:

TestFunc1 <- get("[")

TestFunc2 <- function(DF, group) DF[group]

TestFunc3 <- function(...) subset(..., subset = TRUE)
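The following is a quick check of the three suggestions, using the example data frame from the original question. Here `get("[")` simply retrieves the extraction operator, `DF[group]` uses ordinary evaluation, and passing everything through `...` lets `subset()` see the caller's original expression rather than a local symbol named `group`:

```r
TestFunc1 <- get("[")
TestFunc2 <- function(DF, group) DF[group]
TestFunc3 <- function(...) subset(..., subset = TRUE)

df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)
names(TestFunc1(df1, c("group", "visit")))
names(TestFunc2(df1, c("group", "visit")))
names(TestFunc3(df1, c("group", "visit")))   # all three give "group" "visit"
```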



On Mon, Nov 10, 2008 at 10:18 AM, Karl Knoblick [EMAIL PROTECTED] wrote:
 Hello!

 I have the problem that in my function the passed variable is not used, but 
 the variable name of the dataframe itself - difficult to explain, but an easy 
 example:

 TestFunc <- function(df, group) {
 print(names(subset(df, select=group)))
 }
 df1 <- data.frame(group="G1", visit="V1", value=0.9)
 TestFunc(df1, c("group", "visit"))

 Result:
 [1] "group"

 But I expected and want to have [1] "group" "visit" as result! Does anybody 
 know how to get this result?

 Thanks!
 Karl






Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-10 Thread Duncan Murdoch

On 11/10/2008 10:18 AM, Karl Knoblick wrote:

Hello!

I have the problem that in my function the passed variable is not used, but the 
variable name of the dataframe itself - difficult to explain, but an easy 
example:

TestFunc <- function(df, group) {
print(names(subset(df, select=group)))
}
df1 <- data.frame(group="G1", visit="V1", value=0.9)
TestFunc(df1, c("group", "visit"))

Result:
[1] "group"

But I expected and want to have [1] "group" "visit" as result! Does anybody know how to get this result?


Don't use subset.  You can get what you want using


print(names(df[,group]))

Or alternatively, you can force group to be found in the right place in 
this way:


e <- environment()
print(names(subset(df, select=e$group)))
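One caveat applies to `df[, group]`. When `group` names a single column, the result is dropped to a plain vector, and `names()` then returns `NULL`. The sketch below shows the difference using the example data frame from the question, along with the second suggestion wrapped in a function (the name `f` is arbitrary). `df[group]` or `drop = FALSE` keeps a data frame either way:

```r
df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)

names(df1[, c("group", "visit")])      # "group" "visit"
names(df1[, "group"])                  # NULL: one column is dropped to a vector
names(df1[, "group", drop = FALSE])    # "group"
names(df1["group"])                    # "group": list-style indexing never drops

# The second suggestion: name the function's own environment explicitly
f <- function(df, group) {
  e <- environment()
  names(subset(df, select = e$group))
}
f(df1, c("group", "visit"))            # "group" "visit"
```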



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-10 Thread Wacek Kusnierczyk
pardon me, but does this address in any way the legitimate complaint of
the rightfully confused user?

consider the following:

d = data.frame(a=1, b=2)
a = c("a", "b")
z = a
# that is, both a and z are c("a", "b")

subset(d, select=z)
# gives two columns, since z is a two element vector whose elements are
# valid column names

subset(d, select=a)
# gives one column, since the symbol a is itself a valid column name,
# so its value c("a", "b") is never used

subset(d, select=c(a,b))
# gives two columns


this is certainly what the authors intended, and they may have good
grounds for this smart design.  but this must break the expectation of a
naive (r-naive, for that matter) user, who may otherwise have excellent
experience in using a functional programming language, e.g., scheme. 
(especially scheme, where symbols and expressions are first-class
objects, yet the distinction between a symbol or an expression and their
referent is made painfully clear, perhaps except for when one hacks with
macros.)

the examples above illustrate the notorious problem with r that one can
never tell whether 'a' means the value referred to with the identifier
'a' or the symbol 'a', unless one gets ugly surprises and is forced
to study the documentation.  and even then one may not get a clear answer.

the example given by the confused user is a red flag warning.  it's a
typical abstraction where a nested sequence of operations (here print
over names over subset) is abstracted into a single procedure, which can
be called with whatever arguments are valid:

pns = function(d, g) print(names(subset(d, select=g)))

what sane person, without carefully studying the gory details of subset,
will ever expect that if the first argument happens to have a column
named 'g', only this one will be selected, while if it doesn't, subset
will select the columns named by the components of what 'g' evaluates
to.  i wonder how many users have *not* noticed that what they get is
not what they assume they get because of such tricky tricks, and in
consequence were not able to publish their analyses (or worse, have
published them). 
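The `pns()` scenario can be reproduced directly. This version returns the names instead of printing them, so the result can be inspected. Both data frames are made up; the second one has a column that happens to be called `g`:

```r
pns <- function(d, g) names(subset(d, select = g))

d1 <- data.frame(x = 1, y = 2)
pns(d1, c("x", "y"))     # "x" "y": no column called g, so g's value is used

d2 <- data.frame(g = 0, x = 1, y = 2)
pns(d2, c("x", "y"))     # "g": the column named g shadows the argument
```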

what is scary is that this may happen with about any other function in
r, because the design is pervasive.  no one should ever use any r
function without first carefully reading the docs (which is not
guaranteed to help) or trying it first on a number of carefully crafted
test cases.  if such care is not taken, results obtained with r cannot
be taken seriously.


vQ


Gabor Grothendieck wrote:
 Forgot the name part.  Try:

 TestFunc2 <- function(DF, group) names(DF[group])
 TestFunc3 <- function(...) names(subset(..., subset = TRUE))
 TestFunc4 <- function(...) eval.parent(names(subset(..., subset = TRUE)))

 # e.g.
 df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)
 TestFunc2(df1, c("group", "visit"))
 TestFunc3(df1, c("group", "visit"))
 TestFunc4(df1, c("group", "visit"))
 TestFunc4(df1, c(group, visit)) # this works too

 On Mon, Nov 10, 2008 at 10:43 AM, Gabor Grothendieck
 [EMAIL PROTECTED] wrote:
   
 Here are a few things to try:

 TestFunc1 <- get("[")

 TestFunc2 <- function(DF, group) DF[group]

 TestFunc3 <- function(...) subset(..., subset = TRUE)



 On Mon, Nov 10, 2008 at 10:18 AM, Karl Knoblick [EMAIL PROTECTED] wrote:
 
 Hello!

 I have the problem that in my function the passed variable is not used, but 
 the variable name of the dataframe itself - difficult to explain, but an 
 easy example:

 TestFunc <- function(df, group) {
 print(names(subset(df, select=group)))
 }
 df1 <- data.frame(group="G1", visit="V1", value=0.9)
 TestFunc(df1, c("group", "visit"))

 Result:
 [1] "group"

 But I expected and want to have [1] "group" "visit" as result! Does anybody 
 know how to get this result?

 Thanks!
 Karl




Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-10 Thread Gabor Grothendieck
Certainly this has been recognized as a potential problem:

http://developer.r-project.org/nonstandard-eval.pdf

However, it is convenient when you are performing
an analysis and entering commands directly as opposed
to writing a program, although possibly the potential ambiguities
overshadow the convenience.
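The convenience described here is mostly interactive. The sketch below contrasts the shorthand with a standard-evaluation equivalent that is safer inside functions, using the built-in `iris` data. The use of `setdiff()` is an editorial choice, not something proposed in the thread:

```r
# Interactive shorthand: drop a column by writing its bare name
a <- subset(iris, select = -Species)

# Programmatic equivalent: column names are ordinary character data
drop_cols <- "Species"
b <- iris[setdiff(names(iris), drop_cols)]

identical(names(a), names(b))   # TRUE
```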

On Mon, Nov 10, 2008 at 2:04 PM, Wacek Kusnierczyk
[EMAIL PROTECTED] wrote:
 pardon me, but does this address in any way the legitimate complaint of
 the rightfully confused user?

 consider the following:

 d = data.frame(a=1, b=2)
 a = c(a, b)
 z = a
 # that is, both a and z are c(a, b)

 subset(d, select=z)
 # gives two columns, since z is a two element vector whose elements are
 valid column names

 subset(d, select=a)
 # gives one column, since 'a' (but not a) is a valid column name

 subset(d, select=c(a,b))
 # gives two columns


 this is certainly what the authors intended, and they may have good
 grounds for this smart design.  but this must break the expectation of a
 naive (r-naive, for that matter) user, who may otherwise have excellent
 experience in using a functional programming language, e.g., scheme.
 (especially scheme, where symbols and expressions are first-class
 objects, yet the distinction between a symbol or an expression and their
 referent is made painfully clear, perhaps except for when one hacks with
 macros.)

 the examples above illustrate the notorious problem with r that one can
 never tell whether 'a' means the value referred to with the identifier
 'a' or the symbol 'a', unless one gets ugly surprises and is forced
 to study the documentation.  and even then one may not get a clear answer.

 the example given by the confused user is a red flag warning.  it's a
 typical abstraction where a nested sequence of operations (here print
 over names over subset) is abstracted into a single procedure, which can
 be called with whatever arguments are valid:

 pns = function(d, g) print(names(subset(d, select=g)))

 what sane person, without carefully studying the gory details of subset,
 will ever expect that if the first argument happens to have a column
 named 'g', only this one will be selected, while if it doesn't, subset
 will select the columns named by the components of what 'g' evaluates
 to.  i wonder how many users have *not* noticed that what they get is
 not what they assume they get because of such tricky tricks, and in
 consequence were not able to publish their analyses (or worse, have
 published them).

 what is scary is that this may happen with about any other function in
 r, because the design is pervasive.  no one should ever use any r
 function without first carefully reading the docs (which is not
 guaranteed to help) or trying it first on a number of carefully crafted
 test cases.  if such care is not taken, results obtained with r cannot
 be taken seriously.


 vQ


 Gabor Grothendieck wrote:
 Forgot the name part.  Try:

 TestFunc2 <- function(DF, group) names(DF[group])
 TestFunc3 <- function(...) names(subset(..., subset = TRUE))
 TestFunc4 <- function(...) eval.parent(names(subset(..., subset = TRUE)))

 # e.g.
 df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)
 TestFunc2(df1, c("group", "visit"))
 TestFunc3(df1, c("group", "visit"))
 TestFunc4(df1, c("group", "visit"))
 TestFunc4(df1, c(group, visit)) # this works too

 On Mon, Nov 10, 2008 at 10:43 AM, Gabor Grothendieck
 [EMAIL PROTECTED] wrote:

 Here are a few things to try:

 TestFunc1 <- get("[")

 TestFunc2 <- function(DF, group) DF[group]

 TestFunc3 <- function(...) subset(..., subset = TRUE)



 On Mon, Nov 10, 2008 at 10:18 AM, Karl Knoblick [EMAIL PROTECTED] wrote:

 Hello!

 I have the problem that in my function the passed variable is not used;
 instead, the data frame column with the same name as the variable is
 used.  It's difficult to explain, but here is an easy example:

 TestFunc <- function(df, group) {
   print(names(subset(df, select=group)))
 }
 df1 <- data.frame(group="G1", visit="V1", value=0.9)
 TestFunc(df1, c("group", "visit"))

 Result:
 [1] "group"

 But I expected and want to have [1] "group" "visit" as result!  Does
 anybody know how to get this result?

 Thanks!
 Karl






Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-10 Thread Gabor Grothendieck
Forgot the name part.  Try:

TestFunc2 <- function(DF, group) names(DF[group])
TestFunc3 <- function(...) names(subset(..., subset = TRUE))
TestFunc4 <- function(...) eval.parent(names(subset(..., subset = TRUE)))

# e.g.
df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)
TestFunc2(df1, c("group", "visit"))
TestFunc3(df1, c("group", "visit"))
TestFunc4(df1, c("group", "visit"))
TestFunc4(df1, c(group, visit)) # this works too
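A quick self-contained check of the first three suggestions (TestFunc4 is left out for brevity). Run in a fresh session, each call should give the names "group" and "visit". The variable names r1 to r4 are just for illustration.

```r
df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)

TestFunc1 <- get("[")                                # the `[` operator itself
TestFunc2 <- function(DF, group) names(DF[group])    # ordinary (standard) indexing
TestFunc3 <- function(...) names(subset(..., subset = TRUE))

r1 <- names(TestFunc1(df1, c("group", "visit")))
r2 <- TestFunc2(df1, c("group", "visit"))
r3 <- TestFunc3(df1, c("group", "visit"))
# passed through ..., subset() still sees the caller's unevaluated
# expression, so unquoted column names work here as well
r4 <- TestFunc3(df1, c(group, visit))
```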

On Mon, Nov 10, 2008 at 10:43 AM, Gabor Grothendieck
[EMAIL PROTECTED] wrote:
 Here are a few things to try:

 TestFunc1 <- get("[")

 TestFunc2 <- function(DF, group) DF[group]

 TestFunc3 <- function(...) subset(..., subset = TRUE)



 On Mon, Nov 10, 2008 at 10:18 AM, Karl Knoblick [EMAIL PROTECTED] wrote:
 Hello!

 I have the problem that in my function the passed variable is not used;
 instead, the data frame column with the same name as the variable is
 used.  It's difficult to explain, but here is an easy example:

 TestFunc <- function(df, group) {
   print(names(subset(df, select=group)))
 }
 df1 <- data.frame(group="G1", visit="V1", value=0.9)
 TestFunc(df1, c("group", "visit"))

 Result:
 [1] "group"

 But I expected and want to have [1] "group" "visit" as result!  Does
 anybody know how to get this result?

 Thanks!
 Karl









Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-10 Thread Wacek Kusnierczyk
Gabor Grothendieck wrote:
 Certainly this has been recognized as a potential problem:

 http://developer.r-project.org/nonstandard-eval.pdf

 however, it is convenient when you are performing
 an analysis and entering commands directly as opposed
 to writing a program although possibly the potential ambiguities
 overshadow the convenience.
   

in most cases, i do not see why one could not use a string literal
passed by value instead of having an expression deparsed within the
function, which may lead to confusing behaviour.  this would give much
more consistent and predictable code.  this has nothing to do with the
evaluation mechanism, which can still be lazy. 
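For comparison, a minimal sketch of the string-based alternative argued for above. select_cols is a hypothetical helper, not part of base R. The column names arrive as an ordinary, normally evaluated character vector, so the result never depends on whether the data frame has a column whose name matches the argument's name.

```r
# hypothetical helper: 'cols' is always a character vector evaluated in
# the usual way, never looked up among the data frame's column names
select_cols <- function(d, cols) {
  stopifnot(is.character(cols))
  missing_cols <- setdiff(cols, names(d))
  if (length(missing_cols) > 0L)
    stop("no such column(s): ", paste(missing_cols, collapse = ", "))
  d[, cols, drop = FALSE]
}

d <- data.frame(group = "G1", visit = "V1", value = 0.9)
group <- c("visit", "value")

names(select_cols(d, group))    # "visit" "value": the value of the variable group
names(select_cols(d, "group"))  # "group": the literal column name
```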



in the case of subset, i do not really see how this design might be
helpful, but it's easy to see how it can be harmful; examples have just
been given.  the convenience here amounts at most to being able to omit
quotes, at the risk of having columns selected where they should not be,
and vice versa.  the worst thing is that it destroys the benefit of
lexical scoping:

subset(d, select=group)

did the programmer intend to select the column named 'group'?  or the
columns whose names appear in the vector group?  is d supposed not to
have a column named 'group'?  should one change the identifier if d does
have such a column, to avoid selecting that column instead of whatever
else would be selected?  etc.  could this not be written as

subset(d, select="group")

(two extra characters), and have it cleanly and always mean "pick the
one column named 'group'"?

so there are actually three problems here:
- one, that a programmer may be unaware that her own code does not do
what she wants;
- another, that a user may be unaware that the code she uses behaves
this way;
- a third, that a user may not be sure whether the code may be reused as
is, or must be modified so as not to interfere with the particular data.

the dependence of subset's behaviour on the particular data it is
applied to is confusing.  and here's an example of how it breaks its own
smart semantics:

d = data.frame(a=1)
d$`c(a,b)` = 2
d
# no problem, two columns
names(d)
# one named 'c(a,b)'

subset(d, select=c(a,b))
# so what?  the expression given to select certainly is a valid and
# actual name of a column in d, but subset complains there's no such
# column (well, it actually says 'object "b" not found', by which it
# probably means that object b, i.e., the object named 'b', has not been
# found.  not only is this uninformative as a message in this situation,
# it also reveals the pervasive confusion of the name and the named, as
# the object "b", a one-character string, has not been mentioned here at
# all.  what a mess.)
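The example above in runnable form. This is a sketch; the exact wording of the error message differs between R versions, so it is captured rather than hard-coded. Ordinary indexing by the literal column name retrieves the column that subset() cannot reach.

```r
d <- data.frame(a = 1)
d$`c(a,b)` <- 2
names(d)  # "a" "c(a,b)"

# subset() evaluates c(a,b) with column names bound to column positions:
# 'a' is found among the columns, 'b' is neither a column nor defined here
msg <- tryCatch(subset(d, select = c(a, b)), error = conditionMessage)
print(msg)

# standard indexing with the literal name selects the column as intended
d[, "c(a,b)", drop = FALSE]
```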

this can't possibly be considered good design, can it?  the dubious
benefit is heavily outweighed by the drawbacks.

vQ



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-10 Thread Gabor Grothendieck
On Mon, Nov 10, 2008 at 4:17 PM, Wacek Kusnierczyk
[EMAIL PROTECTED] wrote:
 Gabor Grothendieck wrote:
 Certainly this has been recognized as a potential problem:

 http://developer.r-project.org/nonstandard-eval.pdf

 however, it is convenient when you are performing
 an analysis and entering commands directly as opposed
 to writing a program although possibly the potential ambiguities
 overshadow the convenience.


 in most cases, i do not see why one could not use a string literal
 passed by value instead of having an expression deparsed within the
 function, which may lead to confusing behaviour.  this would give much
 more consistent and predictable code.  this has nothing to do with the
 evaluation mechanism, which can still be lazy.



 in the case of subset, i do not really see how this design might be
 helpful, but it's easy to see how it can be harmful, examples have just

I think the thrust of your comments was already made in the document I referenced.

Regarding the convenience, it occurs in expressions like this:

   iris2 <- subset(iris, select = - Species)

to create a data frame without the Species column.
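A small check of the example above, set against a standard-evaluation equivalent that spells the column name out as a string. This is a sketch; iris is taken explicitly from the datasets package.

```r
ir <- datasets::iris

iris2 <- subset(ir, select = -Species)                      # non-standard evaluation
iris3 <- ir[, setdiff(names(ir), "Species"), drop = FALSE]  # column name as a string

names(iris2)  # "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
```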

Perhaps this would have been better done by allowing an optional
formula for the select clause:

   iris2 <- subset(iris, select = ~ - Species)

but I think R is stuck with what it has due to compatibility and the large
base of users.  Still, it's possible to add functions in packages, or new
functions to R, so a new variant of subset would be possible, in which
case one could decide to use the new function in place of the old one.
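A rough sketch of what such a variant might look like, loosely following the formula idea above. subset_sel and its behaviour are assumptions for illustration, not an existing function in base R or in any package. A plain vector is always used as given, and column names are resolved non-standardly only when the caller explicitly passes a formula.

```r
# hypothetical variant: a character (or numeric) 'select' is used exactly
# as given; column names are resolved non-standardly only when the caller
# explicitly passes a formula such as ~ -Species
subset_sel <- function(d, select) {
  if (inherits(select, "formula")) {
    # bind each column name to its position, as subset.data.frame does,
    # then evaluate the formula's right-hand side in that context
    nl <- as.list(seq_along(d))
    names(nl) <- names(d)
    vars <- eval(select[[length(select)]], nl, environment(select))
  } else {
    vars <- select
  }
  d[, vars, drop = FALSE]
}

d <- data.frame(group = "G1", visit = "V1", value = 0.9)
group <- c("visit", "value")

names(subset_sel(d, group))     # "visit" "value": the variable's value
names(subset_sel(d, ~ group))   # "group": the column, by explicit request
names(subset_sel(d, ~ -value))  # "group" "visit"
```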
