Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Duncan Murdoch

On 11/11/2008 2:56 PM, Bert Gunter wrote:

> Ummm... as today is still Armistice day (in my time zone, anyway), maybe we
> should call a truce and end this flame war...


I haven't seen very many flames  --  there have been disagreements, but 
generally it's been quite civil.  Certainly I don't think Berwin flamed me.


If we were to add in a warning about partial name matching, it would 
have to be accompanied by some way to deal with common uses like the one 
Berwin mentioned.  (There are at least 100 uses of seq(..., length=...) 
in the core & recommended packages.  I wouldn't want to fix all of 
those.)  But it could still be useful, in the same way the checks for 
using TRUE and FALSE instead of T and F are useful.
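A small illustration of why the checks for `T` and `F` are useful. This is a sketch; the variable name `flag` is made up. `TRUE` and `FALSE` are reserved words, but `T` and `F` are ordinary bindings in the base package, so any script can shadow them.

```r
# T is an ordinary variable in package:base, so it can be masked.
T <- 0
flag <- if (isTRUE(T)) "on" else "off"   # T is now 0, not TRUE
print(flag)                               # [1] "off"

# TRUE is a reserved word: assigning to it is an error.
res <- tryCatch(eval(parse(text = "TRUE <- 0")),
                error = function(e) "error: TRUE is reserved")
print(res)

# Remove the shadowing binding so that T refers to base::T again.
rm(T)
stopifnot(identical(T, TRUE))
```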


Duncan Murdoch




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Bert Gunter
Ummm... as today is still Armistice day  (in my time zone, anyway), maybe we
should call a truce and end this flame war...

Cheers,
Bert


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Berwin A Turlach
Sent: Tuesday, November 11, 2008 9:31 AM
To: Duncan Murdoch
Cc: R help
Subject: Re: [R] Variable passed to function not used in function in
select=... in subset

G'day Duncan,

On Tue, 11 Nov 2008 09:37:57 -0500
Duncan Murdoch <[EMAIL PROTECTED]> wrote:

> I think this tension is a fundamental part of the character of S and
> R. But it is also fundamental to R that there are QC tests that apply
> to code in packages:  so writing new tests that detect dangerous
> usage (e.g. to disallow partial name matching) would be another way
> to improve reliability.  [...]

Please not. :)
After years of using R, it is now second nature to me to type (yes,
I always spell out "from" and "to") 
seq(from=xx, to=yy, length=zz)
and I never understood why the full name of that argument had to be
length.out.  I would hate to see lots of warning messages because I am
using partial matching.

Cheers,

Berwin



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Thomas Lumley


Some of the uses of non-standard evaluation are undoubtedly a problem in 
R. Probably the worst is in model.frame, because it is much harder to 
work around. I have never used subset(,select=) and hence have never 
been at risk of confusion (if you don't like how it works, I suggest you 
do the same), but model.frame() is inside lots of things.


 There are two issues here that I think are worth pointing out:
1/ Some things are just not fixable any more. They can only be fixed in a 
new language. The people thinking about new statistical languages mostly 
know what the problems are, because they have been using S and/or R for 
many years and it's really not that hard to notice the problems. The 
document on non-standard evaluation demonstrates that R-core is aware of 
this particular problem.


2/ There are some uses of non-standard evaluation that don't seem to 
confuse people, and an interesting question is how to characterise them. 
These are what I referred to as 'macro-like functions' in the document 
that you have already been referred to.  For example, subset(,subset=) and 
with() don't seem to be as confusing or to cause problems for programmers 
in the same way. There is an empirical question as to what these 
relatively non-problematic constructs are, and a theoretical question as 
to why they are different. In particular, with() not only has non-standard 
evaluation, it is quite similar to the notoriously confusing attach().
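A minimal sketch of the lookup rule that makes `subset(, subset=)` and `with()` feel "macro-like": names in the expression are resolved among the columns of the data frame first, and only then in the enclosing environment. The data and the name `threshold` are made up for illustration.

```r
d <- data.frame(x = 1:5, y = c(2, 4, 6, 8, 10))
threshold <- 3   # not a column of d, so it is found in the calling environment

# subset(, subset=) evaluates 'x > threshold' with the columns of d visible
# first; 'threshold' falls through to the enclosing environment.
s <- subset(d, x > threshold)

# with() applies the same lookup rule to an arbitrary expression.
total <- with(d, sum(y[x > threshold]))

print(s)       # rows with x = 4 and x = 5
print(total)   # 8 + 10 = 18

# The catch: a column with the same name silently takes precedence.
d$threshold <- 0
nrow(subset(d, x > threshold))  # all 5 rows, because d$threshold is 0
```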



-thomas


On Tue, 11 Nov 2008, Wacek Kusnierczyk wrote:


> Berwin A Turlach wrote:
>> On Tue, 11 Nov 2008 09:27:41 +0100
>> Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:
>>
>>> but then it might be worth asking whether carrying on with misdesign
>>> for backward compatibility outbalances guaranteed crashes in future
>>> users' programs, [...]
>>
>> Why is it worth asking this if nobody else asks it?
>
> i guess most of the people who ask questions here care little about r
> itself; they just want it to solve a problem, even if that involves
> hacking the language.
>
> those outside the r team who care about language design have probably
> left the list long ago, if they were ever subscribed.  the fact that
> only i am asking proves nothing statistically.  i do talk to people, and
> know many who would ask, but they just don't care, because they have
> already trashed r.  instead of discouraging me, make use of the fact
> that i care to ask.
>
> vQ




Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Peter Dalgaard
Berwin A Turlach wrote:
> G'day Duncan,
> 
> On Tue, 11 Nov 2008 09:37:57 -0500
> Duncan Murdoch <[EMAIL PROTECTED]> wrote:
> 
>> I think this tension is a fundamental part of the character of S and
>> R. But it is also fundamental to R that there are QC tests that apply
>> to code in packages:  so writing new tests that detect dangerous
>> usage (e.g. to disallow partial name matching) would be another way
>> to improve reliability.  [...]
> 
> Please not. :)
> After years of using R, it is now second nature to me to type (yes,
> I always spell out "from" and "to") 
>   seq(from=xx, to=yy, length=zz)
> and I never understood why the full name of that argument had to be
> length.out.  I would hate to see lots of warning messages because I am
> using partial matching.

I think the story is this:

At some point in time, in a galaxy not too far away, and using one of
the R-like languages, calling the argument "length" gave you trouble
calling length(from) inside the function ("attempt to call non-function"
or some such error). Later, this issue was fixed so that function calls
would look for functions only, but by then, the name couldn't be changed
since some people had been writing it out in full.

(There are a couple of other cases, one of them involving an argument
ending in a ".", but I forgot what they are. I don't think there was
ever an along() function, so "along.with" escapes me.)
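In current R, the fix Peter describes is how lookup works: when a name appears in function-call position, non-function bindings are skipped. A small sketch with a made-up function `g` whose argument is called `length`:

```r
# Hypothetical function with an argument named 'length'.
g <- function(x, length = 2) {
  n <- length(x)   # call position: finds base::length, not the numeric argument
  rep(n, length)   # value position: 'length' is the argument (2)
}
g(1:3)
# [1] 3 3
```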

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Gavin Simpson wrote:
>
>> my whole posting is an attempt, may you try to notice.
>>
>> vQ
>> 
>
> Did you read what you wrote? And you still wonder why you get little
> response from certain quarters?
>
> 1) Don't say "no further comment" - that is quite arrogant to think that
> you are right and everyone who disagrees is wrong.
>   

that meant 'i won't further comment on this, i give up'.  i thought for
a while about explaining this, but then i thought i might use the r
strategy -- let it be ambiguous.

> 2) You are being critical of other people's work in a manner that is not
> polite or respectful of the efforts of others.
>   

i can certainly agree that i don't pay attention to diplomacy.  my
favourite philosopher said 'you should seek friends, not truth';  i
betray him here, fortunately or not.  anyway, if you mean a post raising
serious issues should be ignored just because it is not polished enough,
let it be.  you gain peace, you lose feedback.

i can promise to make more effort to wrap the essence in a cake, and
drop unnecessary puns (you know, i have drop=FALSE by default, because
that's how many languages other than r behave), if this helps.

vQ



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk

Gavin Simpson wrote:
>
> I've found several of these discussions involving Wacek's questions very
> enlightening at times; once you get past the "it doesn't work as I
> expect so is wrong" attitude.
>   

just one fix:  my attitude is 'it doesn't work as i imagine an average
user would expect it so it's potentially confusing'.

vQ



-- 
---
Wacek Kusnierczyk, MD PhD

Email: [EMAIL PROTECTED]
Phone: +47 73591875, +47 72574609

Department of Computer and Information Science (IDI)
Faculty of Information Technology, Mathematics and Electrical Engineering (IME)
Norwegian University of Science and Technology (NTNU)
Sem Saelands vei 7, 7491 Trondheim, Norway
Room itv303

Bioinformatics & Gene Regulation Group
Department of Cancer Research and Molecular Medicine (IKM)
Faculty of Medicine (DMF)
Norwegian University of Science and Technology (NTNU)
Laboratory Center, Erling Skjalgsons gt. 1, 7030 Trondheim, Norway
Room 231.05.060



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Berwin A Turlach
G'day Duncan,

On Tue, 11 Nov 2008 09:37:57 -0500
Duncan Murdoch <[EMAIL PROTECTED]> wrote:

> I think this tension is a fundamental part of the character of S and
> R. But it is also fundamental to R that there are QC tests that apply
> to code in packages:  so writing new tests that detect dangerous
> usage (e.g. to disallow partial name matching) would be another way
> to improve reliability.  [...]

Please not. :)
After years of using R, it is now second nature to me to type (yes,
I always spell out "from" and "to") 
seq(from=xx, to=yy, length=zz)
and I never understood why the full name of that argument had to be
length.out.  I would hate to see lots of warning messages because I am
using partial matching.
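For readers unfamiliar with the mechanism: R accepts any unambiguous prefix of a formal argument name, which is why `length=` reaches `length.out`. The sketch below uses a made-up function `f`; the `warnPartialMatchArgs` option shown is one existing way to make such matches visible, not the package-check test Duncan proposed.

```r
# Toy function with a long argument name, for illustration only.
f <- function(x, length.out = 1) rep(x, length.out)

f(7, length = 3)   # 'length' partially matches 'length.out'
# [1] 7 7 7

# seq() relies on the same mechanism: 'length' -> 'length.out'.
stopifnot(identical(seq(from = 1, to = 10, length = 4),
                    seq(from = 1, to = 10, length.out = 4)))

# With this option set, each partial match raises a warning.
op <- options(warnPartialMatchArgs = TRUE)
msg <- tryCatch({ f(7, length = 3); "no warning" },
                warning = function(w) conditionMessage(w))
options(op)   # restore the previous setting
print(msg)
```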

Cheers,

Berwin



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Berwin A Turlach
On Tue, 11 Nov 2008 12:53:31 +0100
Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:

> but seriously, when one buys a complicated device one typically reads
> a quick start guide, and makes intuitive assumptions about how the
> device will work, turning back to the reference when the expectations
> fail. good design should aim at reducing the need for checking why an
> intuitive assumption fails.

And on what are these intuitive assumptions based, if not on familiarity
with similar devices?  And people have different intuitions; why should
yours be the correct one and the gold standard?

I know that if I buy a complicated device and have never owned anything
similar, I read more than the quick start guide to get familiar with the
device before breaking something by acting on wrong assumptions.

When I started to use S-PLUS, I had used GAUSS before.  Still, I took
the time to work through the blue book and make myself familiar
with S-PLUS before using it for serious work.  Based on my experience
with R, I found R very intuitive and easy to use, but I still try to keep
up with the relevant documentation.

It really seems that your problem is that you have an attitude of
wanting to have instant gratification.

> > If you do not care about how to use machine-gun correctly you could
> > easily harm yourself or others. 
> >   
> indeed, and i'm scared to think that some of the published research
> can be harmful because the researcher refused to read the whole r
> reference before doing a stats analysis.

Sorry, but this is absolute rubbish.  There are plenty of statistical
analyses that can be done without reading the complete R reference.
However, one or two good books might help.

My concern would rather be that everybody thinks that they can do
statistics, and that software such as R makes such people really
think they can do it.  I am far more concerned about inappropriate
analyses and wrong interpretations.  How often is absence of evidence
taken as evidence of absence?

> you see, i'm not complaining about my own analyses failing because i
> have not read the appropriate section in the reference.  if this were
> the problem, i'd just read more and keep silent.
> 
> i'm complaining about the need to read, by anyone who starts up with
> r, in all gory details, about the intricacies of r before doing
> anything, because the behaviour is often so unexpected.  

I guess Frank Harrell had people like you in mind when he wrote:
  https://stat.ethz.ch/pipermail/r-help/2005-April/068625.html

Would you also not expect to learn about surgery in all its gory
details before attempting brain surgery because brain surgery is so
intuitive and doesn't need any study?

Believe it or not, there are lots of useful things that you can do in R
without knowing all the gory details.  There are even people who got
books on R published who obviously don't know all the gory details and
they still show useful applications of R.

Cheers,

Berwin

=== Full address =
Berwin A Turlach                            Tel.: +65 6515 4416 (secr)
Dept of Statistics and Applied Probability        +65 6515 6650 (self)
Faculty of Science                          FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7         e-mail: [EMAIL PROTECTED]
Singapore 117546                    http://www.stat.nus.edu.sg/~statba



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Berwin A Turlach
On Tue, 11 Nov 2008 11:27:30 +0100
Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:

> Berwin A Turlach wrote:

> > Why is it worth asking this if nobody else asks it?  Most notably a
> > certain software company in Redmond, Washington, which is famous for
> > carrying on with bad designs and bugs all in the name of backward
> > compatibility.  Apparently this company also sets industry
> > standards so it must be o.k. to do that. ;-)
> >   
> 
> sure.  i have had this analogy in mind for a long time, but just
> didn't want to say it aloud.  

Mate, if you contemplate comparing R to anything coming out of Redmond,
Washington, then you should first heed the old saying that "it is
better to remain silent and let people believe that one is a fool than
to open one's mouth and remove any doubt". :)

> indeed, r carries on with bad design, but since there are more and
> more users, it's just fine.

Whether R carries on with bad design is debatable.  Clearly the changes
that you would like to see would be big changes that might break a
lot of existing code and programming idioms.  Such changes could
estrange a large part of the user base and, in a worst-case scenario,
make R unusable for many tasks it is used for now.  No wonder nobody
is eager to implement such design changes.  Apparently Python is
planning such wholesale changes when moving to version 3.x.  Let's see
what that does to the popularity of Python and the uptake of the new
version.

> > Didn't see any confused complaints yet.  
> 
> really.  the discussion was motivated precisely by a user's
> complaint. 

We must have different definitions of what constitutes a complaint.  I
looked at the initial posting again.  In my book there was no
complaint, just a user who asked how to achieve a certain aim because
the way he tried to achieve it did not work.  There were three or four
constructive answers that pointed out how it can be done, and then all
of a sudden complaints about alleged design flaws of R started.

> just scan this list;  a large part of the questions stems from
> confusion, which results directly from r's design. 

That's your opinion, to which you are of course entitled.  In my
opinion, a large part of the questions on r-help these days stem from
the fact that in this age of instant gratification it seems to be
easier to fire off an e-mail to a mailing list and try to pick the
brains of thousands of subscribers instead of spending time trying
to read the documentation, learn about R and work out the answer on
one's own.

> >> because r is likely to do everything but what the user expects.
> >
> > This is quite a strong statement, and I wonder what the basis is for
> > such a statement.  Care to provide any evidence?
> 
> i could think of organizing a (don't)useR conference, where
> submissions would provide such evidence.  

Please do so.  Such a conference would probably turn out to be more
hilarious and funnier than the Melbourne International Comedy Festival;
should be real fun to attend. :)

> whatever i say here, is mostly discarded as nonsense comments (while
> it certainly isn't), you say i make the problem up (while i just
> follow up user's complaints).  seriously, i may have exaggerated in
> the immediately above, but lots of comments made here by the users
> convince me that r very often breaks expectations.

Ever heard of biased sampling?  On a list like this you, of course,
hear questions from useRs who had the wrong expectations about how R
should behave and got surprised.  You do not hear of all the instances
in which useRs had the correct expectations, which were promptly met by
R.

> > R is a tool; a very powerful one and hence also very sharp.  It is
> > easy to cut yourself with it, but when one knows how to use it
> > gives the results that one expects.  I guess the problem in this
> > age of instant gratification is that people are not willing to put
> > in the time and effort to learn about the tools they are using.  
> 
> but a good tool should be made with care for how users will use it.  

But the group of users changes, and sometimes one cannot foresee all
possible ways in which future users may use the software.  As a
programming adage says, "you cannot make a piece of software
idiot-proof; nature will always come up with a better idiot".

> r apparently fits the ideas of its developers, 

That's the prerogative of the developers, isn't it?  But if it only
fit their ideas, then it would only be used by them.  The fact
that it is used by many others seems to indicate that it also fits the
ideas of many others.

> while confuses naive users.  

Well, many jurisdictions have staged driver licenses for motorcycles,
initially allowing only low-powered machines for new riders, with
increasingly powerful machines allowed for more experienced riders.  Some
people in Australia would like to introduce a similar system for
car drivers since, apparently, too many P-platers kill themselves with
high-powered V8 cars [...]

Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Gavin Simpson
On Tue, 2008-11-11 at 15:54 +0100, Wacek Kusnierczyk wrote:

> > Have you tried? But bear in mind that R Core has more to balance than
> > just whether you think a design "flaw" or infelicity etc should be fixed
> > when it decides whether to accept patches.
> >   
> 
> my whole posting is an attempt, may you try to notice.
> 
> vQ

Did you read what you wrote? And you still wonder why you get little
response from certain quarters?

1) Don't say "no further comment" - that is quite arrogant to think that
you are right and everyone who disagrees is wrong.

2) You are being critical of other people's work in a manner that is not
polite or respectful of the efforts of others.

There is nothing wrong with being critical - I never said there was -
but there is a right way to go about it and a wrong way.

Also, you have to consider where we are now with R and where we have
come from. Whilst it would, in an ideal world, be great to fix every
design flaw that you think is in R, there is too much inertia there now
to change some things, or it will take a lot of effort on the part of a
team of people who give that time for free. This has to be a
consideration alongside all the other considerations of good design,
improving the logic of how R works, etc. You might not agree, but as long
as things are documented to work in a particular way, we might have
to live with them, unless a good case can be made to break existing code
and someone steps up to make the changes.

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Gavin Simpson
On Tue, 2008-11-11 at 08:14 -0600, hadley wickham wrote:
> > And without wanting to be rude or anything, your opinion carries very
> > little weight in a project like R. You've arrived on the list and been
> > very critical of the work of others. Now there is nothing wrong with
> > being critical if it is constructive, and additionally with something
> > like R you need to be constructive *and* contribute back. I'm not saying
> 
> You are holding Wacek to a very high standard.  Why is it not acceptable
> to say that this part of R is hard to understand without having to
> provide a better solution?

Ok, reading back I should have said if you want something fixed, patches
are welcome. I didn't mean to say that to get help you had to contribute
back. However, Wacek's approach was (and I'm paraphrasing): subset
doesn't work logically or as I expect. It is a mess and needs fixing.

I'm sure no-one on the list minds if people don't understand things and
want to ask questions - I know I ask plenty of questions here about
things I don't understand. But just as there is a posting guide that
says how to go about phrasing a question that is likely to get a
response, we don't need people denigrating the work of others whilst
asking for assistance with what are admittedly hard concepts (ones I
don't fully understand either).

I've found several of these discussions involving Wacek's questions very
enlightening at times; once you get past the "it doesn't work as I
expect so is wrong" attitude.

> 
> subset() _is_ confusing to novice R users.  You can not anticipate
> what subset(df, select = a) will do unless you know what variables are
> defined in the local environment and what variables are defined in the
> data frame.  It is hard to understand how it works without a deep
> understanding of environments and it is hard to teach all the special
> cases.   It is difficult to reliably use subset within another
> function.

I agree, but one can read the documentation for help. It isn't perfect
and expects you know a bit (a lot) about environments etc, but I don't
think it is too confusing if you know what is in df (otherwise how do
you know what to select?), you read the help page and follow the
examples.
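The problem that started this thread can be reduced to a few lines. A sketch, with made-up helper names: `subset()` evaluates `select=` with the column names of the data frame bound to their positions, so a function argument that shares a name with a column is silently replaced by that column's index. Indexing with `[` has no such lookup.

```r
d <- data.frame(a = 1:3, b = 4:6)

# Hypothetical helper whose argument happens to share a name with a column.
pick_subset <- function(df, a) subset(df, select = a)

# Programmer-safe alternative: '[' takes the column names as ordinary values.
pick_bracket <- function(df, a) df[, a, drop = FALSE]

cols <- c("a", "b")
names(pick_subset(d, cols))   # "a": inside subset(), 'a' means column 1 of d
names(pick_bracket(d, cols))  # "a" "b"
```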

G

> 
> This comes from my personal experience with subset (good for
> interactive use, never program with) and from my experiences teaching
> ~80 students how to use R over the last two years.
> 
> Hadley
> 


Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread hadley wickham
> I think your analysis is correct, that the goals of casual use and
> programming are inconsistent.  But in general I think there's always going
> to be support for providing alternative ways that are programmer-safe.
>
> For instance, library( foo, character.only=TRUE) says that foo is a
> character vector, not the name of a package.  I don't know of anything that
> subset() provides that is not available in other ways (I think of it as
> purely a convenience function, and my first piece of advice to Karl was not
> to use it).

Good points - every function optimised for interactive use should have
a companion that is optimised for programmatic use.

> However, if there really is something there, then it would be
> worthwhile pointing that out, and either modifying subset() to make it safe,
> or providing an alternative function.

When I teach subsetting I try to make this clear - using [ will always
work, there's no magic and everything is explicit.  subset() has more
magic which saves you typing, but occasionally the magic doesn't work
and you'll be left scratching your head as to why.  In my experience
students prefer subset() until they encounter strange behaviour that
they don't understand.
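A minimal sketch of the interactive/programmatic pairs discussed here, with made-up data: the same selection written with `subset()` and with explicit `[`, followed by the string form of `library()` that Duncan mentions.

```r
d <- data.frame(g = c("x", "y", "x"), v = c(1, 2, 3))

# Interactive version: g and v are looked up among the columns of d.
s1 <- subset(d, g == "x", select = v)

# Explicit version with '[': every name is a string or an ordinary variable.
keep <- d[["g"]] == "x"
s2 <- d[keep, "v", drop = FALSE]

stopifnot(identical(s1, s2))

# library() also has a programmer-safe mode: the package name as a string.
pkg <- "stats"
library(pkg, character.only = TRUE)
```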

> I think this tension is a fundamental part of the character of S and R.  But
> it is also fundamental to R that there are QC tests that apply to code in
> packages:  so writing new tests that detect dangerous usage (e.g. to
> disallow partial name matching) would be another way to improve reliability.
>  Writing a test for misuse of drop=TRUE seems quite hard, but there are
> probably ways a debugger could do it:  e.g. to tag the invocation as to
> whether any indices were dropped on the first call, and then warn if the
> result isn't the same on every subsequent call).

A similar thing would be to force package authors to explicitly
specify na.rm to ensure that they have thought about how to deal with
missing values (this always trips me up).  Perhaps you could treat
drop similarly - in non-interactive code drop should not have a
default value.   Presumably this wouldn't be too hard to implement - R
CMD check would just switch out [ for a version that didn't have a
default value, in a similar way to what happens with T and F (another
example of implicit interactive use vs. explicit programmatic use)
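A short sketch of the two defaults in question. It does not implement the proposed `R CMD check` behaviour, which is only an idea in this thread; it just shows what `drop = TRUE` and `na.rm = FALSE` do when left implicit.

```r
m <- matrix(1:6, nrow = 2)

# With the default drop = TRUE, selecting one row silently yields a vector...
r1 <- m[1, ]
# ...whereas drop = FALSE keeps the 1 x 3 matrix shape.
r2 <- m[1, , drop = FALSE]
print(dim(r1))  # NULL
print(dim(r2))  # [1] 1 3

# na.rm also has a default; stating it explicitly documents the choice.
x <- c(1, NA, 3)
mean(x)                 # [1] NA
mean(x, na.rm = TRUE)   # [1] 2
```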

> Conceivably Karl's problem could be detected in the same way:  tag each name
> in the expression as to whether it was found in the data frame or some other
> environment, and then warn if that tag ever changes.  Or maybe the test
> should just warn that subset() is a convenience function, not meant for
> programming.

It would be nice if the documentation were clearer on these issues.  I
can imagine every function having a numeric value associated with it
which gave its position on the interactive vs programming continuum.
Then you could sum up all the values in a function and warn the author
if it was too high.  Not very practical to implement though!

Hadley
-- 
http://had.co.nz/



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Duncan Murdoch wrote:
> On 11/11/2008 8:53 AM, hadley wickham wrote:
>> On Mon, Nov 10, 2008 at 1:04 PM, Wacek Kusnierczyk
>> <[EMAIL PROTECTED]> wrote:
>>> pardon me, but does this address in any way the legitimate complaint of
>>> the rightfully confused user?
>>>
>>> consider the following:
>>>
>>> d = data.frame(a=1, b=2)
>>> a = c("a", "b")
>>> z = a
>>> # that is, both a and z are c("a", "b")
>>>
>>> subset(d, select=z)
>>> # gives two columns, since z is a two element vector whose elements are
>>> # valid column names
>>>
>>> subset(d, select=a)
>>> # gives one column, since 'a' (but not a) is a valid column name
>>>
>>> subset(d, select=c(a,b))
>>> # gives two columns
>>>
>>>
>>> this is certainly what the authors intended, and they may have good
>>> grounds for this smart design.  but this must break the expectation
>>> of a naive (r-naive, for that matter) user, who may otherwise have
>>> excellent experience in using a functional programming language,
>>> e.g., scheme.  (especially scheme, where symbols and expressions are
>>> first-class objects, yet the distinction between a symbol or an
>>> expression and their referent is made painfully clear, perhaps except
>>> for when one hacks with macros.)
>>>
>>> the examples above illustrate the notorious problem with r that one can
>>> never tell whether 'a' means "the value referred to with the identifier
>>> 'a'" or "the symbol 'a'", unless one gets ugly surprises and is forced
>>> to study the documentation.  and even then one may not get a clear
>>> answer.
>>
>> I agree, with some caveats.  There are basically two uses of R: as an
>> interactive data analysis package and as a statistical programming
>> language.  These uses come into conflict: in the interactive
>> environment, you want to minimise typing so that you can be as speedy
>> as possible.  It doesn't matter if R occasionally makes a wrong guess
>> when you have specified something implicitly, because you can fix it
>> on the fly.  When you are programming, you care less about saving
>> typing and more about reproducibility.  You want to be explicit so
>> your function is robust to widely varying inputs, even if it means you
>> have to type a lot more.  You see this tension in quite a few places:
>>
>>  * drop = T
>>  * functions that return different types of output (e.g. sapply)
>> depending on input parameters
>>  * partial matching of argument names
>>  * using unevaluated expressions instead of strings (e.g. library,
>> subset, ...)
>>
>> These are all things that are helpful for interactive use, but make
>> life as a programmer more difficult.  I find the last one particularly
>> frustrating because it means it is very difficult to program with some
>> functions (e.g. subset) without resorting to complex quoting,
>> substituting and evaluating tricks.  I have tried to steer away from
>> this technique in my packages, and where it's just too convenient for
>> interactive use, insulating the deparsing into special functions that
>> the data analyst must use (e.g. aes() in ggplot, and .() in plyr),
>> along with providing alternatives for the programmer.
>>
>> I don't understand why you're getting so much push-back on this issue.
>>  R is a fantastic language, but it has some genuinely nasty corners.
>> In my opinion, this is one of them.
>
> I think your analysis is correct, that the goals of casual use and
> programming are inconsistent.  But in general I think there's always
> going to be support for providing alternative ways that are
> programmer-safe.

you know, in ipython you can write, e.g., m 1 instead of m(1) to call
the method m on the value 1.  but this is a syntactic shorthand which is
not valid in python, and you can see how it gets translated into python
when you try it.  so you have your cake and eat it too: there is a
consistent (at least, much more consistent than in r) policy on the
syntax, and you can still have conveniences in the interactive interpreter.

r, on the other hand, prefers solutions such as the subset one, which
are the best recipe for confusion.  why wouldn't the r team look at
what others are doing?  programming language design has progressed a
lot since the so often cited reference for r was written in 1988.

vQ



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Gavin Simpson wrote:
> On Tue, 2008-11-11 at 11:08 +0100, Wacek Kusnierczyk wrote:
>   
>> Gavin Simpson wrote:
>> 
>>>> d = data.frame(a = 1)
>>>> d$`-b` = 2
>>>> names(d)
>>>> # here we go
>>>>
>>>> subset(d, select = -b)
>>>> # to b or not to b?
>>>>
>>>
>>> but -b is not the name of the column; you explicitly called it `-b` and
>>> you should refer to it as such. If you use "non-standard" names then
>>> expect to do a bit more work.
>>>   
>>>   
>> identical(names(d)[2], "-b")
>>
>> if i do
>>
>> d$`c` = 4
>>
>> then you claim d has no column named 'c'?
>> 
>
> No, where do you get that from?
>   
by simple analogy to the above, just read your own comments.  if you're
suggesting one should not expect this bit to be consistent, it would be
just another example of messy semantics.

>   
>>   do i have to refer to the c
>> column as `c`?
>> 
>
> No, but then "c" is a name that doesn't need to be quoted. -b is a name
> that needs to be quoted and if you quote it, things work as you might
> expect.
>   

not necessarily, as one of my examples showed:  again, the result of
subset(d, select=`-b`) will depend on whether d has a column named '-b',
and if it doesn't, on whether there is a variable called '-b' that is a
character vector.  there is no way out of this issue, backquoting is no
solution.  no further comment.

>   
>> 
>>>   
>>>   
>>>> subset(d, select = `-b`)
>>>>
>>>
>>>   -b
>>> 1  2
>>>   
>>>   
>> ... and i have to use
>>
>> subset(d, select = `a`)
>>
>> and not
>>
>> subset(d, select = a)
>>
>> right?
>> 
>
> Is "a" a name in d? You can quote it if you want but it doesn't need to
> be quoted, so you can use either.
>   

you see, you need to know whether 'a' is a name in d to know what
subset(d, select=a) would do.  no further comment.
>   
>>   besides, subset(d, select = `-b`) should rather return the
>> column(s) whose names are the value of the variable `-b`:
>>
>> `-a` = "a"
>> subset(d, select = `-a`)
>> # returns all columns except for the one named 'a', rather than the
>> column named '-a' -- but that's just because there is no such column in
>> d;  if there were, this one would be returned. 
>> 
>
> No, it returns a if you are following on from your original examples.
> `-a` refers to a variable (object) and that evaluates to "a" and "a" is
> component of d so is returned.
>   
you're right here, but the problem remains: subset(d, select=`-a`) will
treat `-a` as a column name or as a name of a variable with a vector of
column names, depending on what's in the data.  no further comment.


>   
>> so even with backquotes used, there is no obvious interpretation of what
>> select=`-b` should mean, because it depends on what names the components of
>> the first argument have.  and this breaks the concept of referential
>> transparency.
>>
>> so the problem is not so easily explained away.  what subset does *is*
>> messy.
>> 
>
> In your opinion.
>   

yes, but not only mine.  perhaps some more r users will want to support
this claim; just wait.

> And without wanting to be rude or anything, your opinion carries very
> little weight in a project like R. You've arrived on the list and been
> very critical of the work of others. Now there is nothing wrong with
> being critical if it is constructive, and additionally with something
> like R you need to be constructive *and* contribute back. I'm not saying
> that if you did patch R to work the way you think is correct R Core will
> accept them as they need to maintain backwards compatibility and with S
> and not annoy the hundreds of package authors. but coming on here and
> criticising the work of others isn't going to win you many friends.
>   

that's really sad.  you're saying no one should ever criticize r without
reading the source code.  you are *really* not interested in feedback. 
note, feedback on the *design*, not implementation, is not fixed by
sending a patch.  you have a serious misconception here.

if i buy a tv, and read the quick guide, and start using it, and push
buttons, and suddenly get an electric shock, and complain to the
manufacturer, and they say i should have carefully read the 2K-page
manual because it says there i can get high voltage on my fingers while
pushing the buttons, and it's my fault, and if i want to complain i
should first study the schematics --- what??  they're just crazy, no?

> Also, subset (and the other things you've been harping on about) work as
> documented. So you kind of have to like it or lump it.
>   

we've just gone through the docs, and it's *you* who thinks it's so
beautifully clear from the docs what subset does.  i lump it.

>   
>> 
>>>> subset(d, select = - `-b`)
>>>>
>>>
>>>   a
>>> 1 1
>>>
>>>   
>>>   
>>>> b = "a"
>>>> subset(d, select = -b)
>>>> # tragedy
>>>>
>>>
>>> For this, I interpret it as not finding a column named b so tries to
>>> evaluate:
>>>
>>>   
>>>   
>> y

Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Duncan Murdoch

On 11/11/2008 8:53 AM, hadley wickham wrote:
> On Mon, Nov 10, 2008 at 1:04 PM, Wacek Kusnierczyk
> <[EMAIL PROTECTED]> wrote:
>> pardon me, but does this address in any way the legitimate complaint of
>> the rightfully confused user?
>>
>> consider the following:
>>
>> d = data.frame(a=1, b=2)
>> a = c("a", "b")
>> z = a
>> # that is, both a and z are c("a", "b")
>>
>> subset(d, select=z)
>> # gives two columns, since z is a two-element vector whose elements are
>> # valid column names
>>
>> subset(d, select=a)
>> # gives one column, since 'a' (but not a) is a valid column name
>>
>> subset(d, select=c(a,b))
>> # gives two columns
>>
>>
>> this is certainly what the authors intended, and they may have good
>> grounds for this smart design.  but this must break the expectation of a
>> naive (r-naive, for that matter) user, who may otherwise have excellent
>> experience in using a functional programming language, e.g., scheme.
>> (especially scheme, where symbols and expressions are first-class
>> objects, yet the distinction between a symbol or an expression and their
>> referent is made painfully clear, perhaps except for when one hacks with
>> macros.)
>>
>> the examples above illustrate the notorious problem with r that one can
>> never tell whether 'a' means "the value referred to with the identifier
>> 'a'" or "the symbol 'a'", unless one gets ugly surprises and is forced
>> to study the documentation.  and even then one may not get a clear answer.
>
> I agree, with some caveats.  There are basically two uses of R: as an
> interactive data analysis package and as a statistical programming
> language.  These uses come into conflict: in the interactive
> environment, you want to minimise typing so that you can be as speedy
> as possible.  It doesn't matter if R occasionally makes a wrong guess
> when you have specified something implicitly, because you can fix it
> on the fly.  When you are programming, you care less about saving
> typing and more about reproducibility.  You want to be explicit so
> your function is robust to widely varying inputs, even if it means you
> have to type a lot more.  You see this tension in quite a few places:
>
>  * drop = T
>  * functions that return different types of output (e.g. sapply)
> depending on input parameters
>  * partial matching of argument names
>  * using unevaluated expressions instead of strings (e.g. library, subset, ...)
>
> These are all things that are helpful for interactive use, but make
> life as a programmer more difficult.  I find the last one particularly
> frustrating because it means it is very difficult to program with some
> functions (e.g. subset) without resorting to complex quoting,
> substituting and evaluating tricks.  I have tried to steer away from
> this technique in my packages, and where it's just too convenient for
> interactive use, insulating the deparsing into special functions that
> the data analyst must use (e.g. aes() in ggplot, and .() in plyr),
> along with providing alternatives for the programmer.
>
> I don't understand why you're getting so much push-back on this issue.
> R is a fantastic language, but it has some genuinely nasty corners.
> In my opinion, this is one of them.


I think your analysis is correct, that the goals of casual use and 
programming are inconsistent.  But in general I think there's always 
going to be support for providing alternative ways that are 
programmer-safe.


For instance, library( foo, character.only=TRUE) says that foo is a 
character vector, not the name of a package.  I don't know of anything 
that subset() provides that is not available in other ways (I think of 
it as purely a convenience function, and my first piece of advice to 
Karl was not to use it).  However, if there really is something there, 
then it would be worthwhile pointing that out, and either modifying 
subset() to make it safe, or providing an alternative function.
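A short illustration of the programmer-safe alternatives mentioned above, assuming a fresh R session: ordinary `[` indexing treats `a` as a variable, whereas subset() first looks for a column named `a`; and `character.only = TRUE` makes library() use the value of its argument. The package name "stats" is used only because it ships with R.

```r
d <- data.frame(a = 1, b = 2)
a <- c("a", "b")

# subset() evaluates `select` with the column names in scope first,
# so the bare name a refers to column "a", not to the variable a.
names(subset(d, select = a))          # "a"

# Plain indexing evaluates a as an ordinary variable: both columns.
names(d[, a, drop = FALSE])           # "a" "b"

# library() normally takes its argument as a bare name;
# character.only = TRUE makes it use the value of the variable instead.
pkg <- "stats"
library(pkg, character.only = TRUE)   # attaches the package named by pkg
```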


I think this tension is a fundamental part of the character of S and R. 
 But it is also fundamental to R that there are QC tests that apply to 
code in packages:  so writing new tests that detect dangerous usage 
(e.g. to disallow partial name matching) would be another way to improve 
reliability.  Writing a test for misuse of drop=TRUE seems quite hard, 
but there are probably ways a debugger could do it, e.g. by tagging the 
invocation as to whether any indices were dropped on the first call, and 
then warning if the outcome isn't the same on every subsequent call.
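A rough sketch of that kind of check, written as an ordinary closure rather than as debugger support; `make_drop_checker` is a hypothetical name, not an existing R function. It records whether the first call dropped dimensions and warns if a later call behaves differently.

```r
# Hypothetical helper, not part of R: returns an indexing function that
# remembers whether its first call dropped dimensions.
make_drop_checker <- function() {
  first_dropped <- NULL
  function(m, i, j) {
    res <- m[i, j]                  # default drop = TRUE
    dropped <- is.null(dim(res))
    if (is.null(first_dropped)) {
      first_dropped <<- dropped     # tag the first invocation
    } else if (!identical(first_dropped, dropped)) {
      warning("dimensions dropped inconsistently; did you mean drop = FALSE?")
    }
    res
  }
}

idx <- make_drop_checker()
m <- matrix(1:6, nrow = 3)
idx(m, 1:2, 1:2)   # 2 x 2 matrix: no dimensions dropped
idx(m, 1:2, 1)     # warns: this call returns a plain vector
```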


Conceivably Karl's problem could be detected in the same way:  tag each 
name in the expression as to whether it was found in the data frame or 
some other environment, and then warn if that tag ever changes.  Or 
maybe the test should just warn that subset() is a convenience function, 
not meant for programming.
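The tagging idea could start from something like the sketch below, which reports for each name in an expression whether it would be found among the data frame's columns or only in the calling environment. `where_found` is a hypothetical helper; a real check would also have to store these tags across calls and warn when they change.

```r
# Hypothetical helper: for each variable name in `expr`, report where
# subset() would find it (columns are searched before the environment).
where_found <- function(data, expr, env = parent.frame()) {
  vapply(all.vars(expr), function(n) {
    if (n %in% names(data)) "data frame"
    else if (exists(n, envir = env)) "environment"
    else "not found"
  }, character(1))
}

d <- data.frame(a = 1, b = 2)
z <- c("a", "b")
where_found(d, quote(c(a, z)))   # c(a = "data frame", z = "environment")
```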


Duncan Murdoch



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread hadley wickham
> And without wanting to be rude or anything, your opinion carries very
> little weight in a project like R. You've arrived on the list and been
> very critical of the work of others. Now there is nothing wrong with
> being critical if it is constructive, and additionally with something
> like R you need to be constructive *and* contribute back. I'm not saying

You are holding Wacek to a very high standard.  Why is it not acceptable
to say that this part of R is hard to understand without having to
provide a better solution?

subset() _is_ confusing to novice R users.  You cannot anticipate
what subset(df, select = a) will do unless you know what variables are
defined in the local environment and what variables are defined in the
data frame.  It is hard to understand how it works without a deep
understanding of environments and it is hard to teach all the special
cases.   It is difficult to reliably use subset within another
function.
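To make the point concrete: in this minimal example (the data frames and the variable `a` are made up for illustration) the same call, subset(df, select = a), selects different columns depending on whether the data frame happens to contain a column named `a`.

```r
a <- "y"                                  # a variable holding a column name
df_with    <- data.frame(a = 1, y = 2)    # has a column called a
df_without <- data.frame(x = 1, y = 2)    # does not

names(subset(df_with, select = a))        # "a": the column a wins
names(subset(df_without, select = a))     # "y": falls back to the variable a
```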

This comes from my personal experience with subset (good for
interactive use, never program with) and from my experiences teaching
~80 students how to use R over the last two years.

Hadley

-- 
http://had.co.nz/



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Duncan Murdoch

On 11/11/2008 5:00 AM, Berwin A Turlach wrote:


> Radford Neal is also complaining on his blog
> (http://radfordneal.wordpress.com/) about what he thinks are design
> flaws in R.  Why don't you two get together and design a good
> substitute without any flaws?  Or is that too hard? ;-)


I agree with Radford (who was complaining about surprising behaviour 
with dropped dimensions in array indexing, and the result of 1:n when n 
is zero), but I don't particularly like his solution.  It seems to me 
that introducing a new operator that returns "a sequence from 1 up to n" 
is a good idea, but having a new data type is not:  there is too much 
legacy code that would not be able to handle it.  So we need some other 
way to handle the array indexing problem, such as ways to detect 
unintentional omissions of "drop=FALSE", if we want to handle it.
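Both behaviours are easy to reproduce. For the sequence problem, base R already has seq_len(), which returns an empty sequence when n is zero (it is a function, not the new operator under discussion); for the indexing problem, drop = FALSE has to be requested explicitly.

```r
n <- 0
1:n          # c(1, 0): counts down, so a for loop over it runs twice
seq_len(n)   # integer(0): a loop over it runs zero times

m <- matrix(1:6, nrow = 2)
dim(m[1, ])                # NULL: the single row became a plain vector
dim(m[1, , drop = FALSE])  # c(1, 3): still a 1 x 3 matrix
```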


Duncan Murdoch



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Gavin Simpson
On Tue, 2008-11-11 at 11:08 +0100, Wacek Kusnierczyk wrote:
> Gavin Simpson wrote:
> >
> >> d = data.frame(a = 1)
> >> d$`-b` = 2
> >> names(d)
> >> # here we go
> >>
> >> subset(d, select = -b)
> >> # to b or not to b?
> >> 
> >
> > but -b is not the name of the column; you explicitly called it `-b` and
> > you should refer to it as such. If you use "non-standard" names then
> > expect to do a bit more work.
> >   
> identical(names(d)[2], "-b")
> 
> if i do
> 
> d$`c` = 4
> 
> then you claim d has no column named 'c'?

No, where do you get that from?

>   do i have to refer to the c
> column as `c`?

No, but then "c" is a name that doesn't need to be quoted. -b is a name
that needs to be quoted and if you quote it, things work as you might
expect.
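A small check of the backquoting behaviour described here, rebuilding the data frame from the quoted example:

```r
d <- data.frame(a = 1)
d$`-b` <- 2                        # $<- keeps the non-syntactic name as is
names(d)                           # "a" "-b"

# Backquotes turn -b into a single name, which matches the column
names(subset(d, select = `-b`))    # "-b"
# Unary minus applied to that column's position drops it
names(subset(d, select = -`-b`))   # "a"
```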

> 
> 
> >   
> >> subset(d, select = `-b`)
> >> 
> >   -b
> > 1  2
> >   
> 
> ... and i have to use
> 
> subset(d, select = `a`)
> 
> and not
> 
> subset(d, select = a)
> 
> right?

Is "a" a name in d? You can quote it if you want but it doesn't need to
be quoted, so you can use either.

>   besides, subset(d, select = `-b`) should rather return the
> column(s) whose names are the value of the variable `-b`:
> 
> `-a` = "a"
> subset(d, select = `-a`)
> # returns all columns except for the one named 'a', rather than the
> column named '-a' -- but that's just because there is no such column in
> d;  if there were, this one would be returned. 

No, it returns a if you are following on from your original examples.
`-a` refers to a variable (object) and that evaluates to "a" and "a" is
component of d so is returned.

> 
> so even with backquotes used, there is no obvious interpretation of what
> select=`-b` should mean, because it depends on what names the components of
> the first argument have.  and this breaks the concept of referential
> transparency.
> 
> so the problem is not so easily explained away.  what subset does *is*
> messy.

In your opinion.

And without wanting to be rude or anything, your opinion carries very
little weight in a project like R. You've arrived on the list and been
very critical of the work of others. Now there is nothing wrong with
being critical if it is constructive, and additionally with something
like R you need to be constructive *and* contribute back. I'm not saying
that if you did patch R to work the way you think is correct R Core will
accept them as they need to maintain backwards compatibility and with S
and not annoy the hundreds of package authors. but coming on here and
criticising the work of others isn't going to win you many friends.

Also, subset (and the other things you've been harping on about) work as
documented. So you kind of have to like it or lump it.

> 
> 
> >> subset(d, select = - `-b`)
> >> 
> >   a
> > 1 1
> >
> >   
> >> b = "a"
> >> subset(d, select = -b)
> >> # tragedy
> >> 
> >
> > For this, I interpret it as not finding a column named b so tries to
> > evaluate:
> >
> >   
> 
> you interpret it.  how obvious is this for most users?
> it tries to find a column named 'b', not a column named b.  that's the
> problem with subset.

If users read the documentation then they'd know about unary operators.

> 
> 
> >> b = "a"
> >> `-`(b)
> >> 
> > Error in -b : invalid argument to unary operator
> >
> > `-` is a function remember.
> >
> > If you want this to work you can use get()
> >
> >   
> >> subset(d, select = - get(b))
> >> 
> >   -b
> > 1  2
> >
> >   
> 
> "use this hack to get around the design."

No hack, that is what get() is for. b is *not* a component of d. - b (or
`-`(b)) evaluates to an error. If you want to select columns except the
column referenced by the contents of b (which is "a") then you can use
get().
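Here is the get() approach next to a plain-indexing alternative that avoids subset()'s name lookup altogether; using setdiff() on the column names is just one way to express "all columns except the one named in b".

```r
d <- data.frame(a = 1)
d$`-b` <- 2
b <- "a"                             # b holds the name of the column to drop

# get(b) is get("a"); inside subset() the column names are bound to
# their positions, so -get(b) evaluates to -1.
names(subset(d, select = -get(b)))   # "-b"

# Without subset(): exclude the named column by ordinary indexing.
names(d[, setdiff(names(d), b), drop = FALSE])   # "-b"
```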

> 
> >> d$b = 3
> >> subset(d, select = -b)
> >> # catharsis
> >>
> >> (for whatever reason a user may choose to have a column named '-b')
> >> 
> >
> > Yes, but the user is warned about not using standard naming conventions
> > in the Introduction to R manual. You aren't stopped from using names
> > like `-b` but if you use them, you have to expect to work a little
> > harder.
> >   
> 
> i'd like you to point me to that warning, as i apparently need to read
> it, but i haven't found it in the manual yet.  thanks.

You could look at section 1.8 of An Introduction to R for a
starter. ?Syntax is also a logical place to start and it explicitly
refers you to details in the See Also section. If you read all of those
(but I'll save you some time and point you to ?Quotes) you find the
answers to how things like this work. ?Quotes explains what are
syntactic names and how to use '`' backticks to quote non-syntactic
names.

Ok, ?Syntax and ?Quotes may not jump out at you as being very obvious
places to look. If so, grab the source of the An Introduction to R manual,
find a logical place to put this information, or a note pointing people to
the help pages, and patch it accordingly. Then contribute that back for the
good of everyone.

> 
> > Reading ?subset we have:
> >
> >   select: e

Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread hadley wickham
On Mon, Nov 10, 2008 at 1:04 PM, Wacek Kusnierczyk
<[EMAIL PROTECTED]> wrote:
> pardon me, but does this address in any way the legitimate complaint of
> the rightfully confused user?
>
> consider the following:
>
> d = data.frame(a=1, b=2)
> a = c("a", "b")
> z = a
> # that is, both a and z are c("a", "b")
>
> subset(d, select=z)
> # gives two columns, since z is a two-element vector whose elements are
> # valid column names
>
> subset(d, select=a)
> # gives one column, since 'a' (but not a) is a valid column name
>
> subset(d, select=c(a,b))
> # gives two columns
>
>
> this is certainly what the authors intended, and they may have good
> grounds for this smart design.  but this must break the expectation of a
> naive (r-naive, for that matter) user, who may otherwise have excellent
> experience in using a functional programming language, e.g., scheme.
> (especially scheme, where symbols and expressions are first-class
> objects, yet the distinction between a symbol or an expression and their
> referent is made painfully clear, perhaps except for when one hacks with
> macros.)
>
> the examples above illustrate the notorious problem with r that one can
> never tell whether 'a' means "the value referred to with the identifier
> 'a'" or "the symbol 'a'", unless one gets ugly surprises and is forced
> to study the documentation.  and even then one may not get a clear answer.

I agree, with some caveats.  There are basically two uses of R: as an
interactive data analysis package and as a statistical programming
language.  These uses come into conflict: in the interactive
environment, you want to minimise typing so that you can be as speedy
as possible.  It doesn't matter if R occasionally makes a wrong guess
when you have specified something implicitly, because you can fix it
on the fly.  When you are programming, you care less about saving
typing and more about reproducibility.  You want to be explicit so
your function is robust to widely varying inputs, even if it means you
have to type a lot more.  You see this tension in quite a few places:

 * drop = T
 * functions that return different types of output (e.g. sapply)
depending on input parameters
 * partial matching of argument names
 * using unevaluated expressions instead of strings (e.g. library, subset, ...)
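Two of the items in this list can be seen in a couple of lines. The examples are illustrative only: vapply() is base R's type-stable counterpart to sapply(), and `length` in the seq() call is partially matched to `length.out`.

```r
# sapply(): the result type depends on the input
class(sapply(1:3, function(i) i))            # "integer"
class(sapply(1:3, function(i) seq_len(i)))   # "list": results differ in length
class(sapply(integer(0), function(i) i))     # "list": empty input
# vapply() fixes the result type up front
vapply(integer(0), function(i) i, integer(1))   # integer(0)

# Partial matching: `length` is taken to mean `length.out`
seq(1, 10, length = 4)                       # 1 4 7 10
```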

These are all things that are helpful for interactive use, but make
life as a programmer more difficult.  I find the last one particularly
frustrating because it means it is very difficult to program with some
functions (e.g. subset) without resorting to complex quoting,
substituting and evaluating tricks.  I have tried to steer away from
this technique in my packages, and where it's just too convenient for
interactive use, insulating the deparsing into special functions that
the data analyst must use (e.g. aes() in ggplot, and .() in plyr),
along with providing alternatives for the programmer.
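A minimal sketch of the pattern described in this paragraph, assuming the programmer-facing function takes plain character vectors and a separate helper converts bare names for interactive use. `cols()` and `select_cols()` are hypothetical names; this is not how aes() or plyr's .() are actually implemented.

```r
# Programmer interface: takes a character vector, no non-standard evaluation.
select_cols <- function(data, vars) data[, vars, drop = FALSE]

# Interactive helper: captures bare names unevaluated and returns strings.
cols <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1L]
  vapply(exprs, deparse, character(1))
}

d <- data.frame(a = 1, b = 2)
a <- "b"                          # a variable that shadows a column name

names(select_cols(d, cols(a)))    # "a": the bare name, via cols()
names(select_cols(d, a))          # "b": the value of the variable a
```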

I don't understand why you're getting so much push-back on this issue.
 R is a fantastic language, but it has some genuinely nasty corners.
In my opinion, this is one of them.

Hadley

-- 
http://had.co.nz/



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:
> On Tue, 11 Nov 2008 09:49:31 +0100
> Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:
>
>   
>> (for whatever reason a user may choose to have a column named '-b')
>> 
>
> For whatever reason, people also jump from bridges.  Does that mean
> all bridges have an inherently flawed design and should be abolished?
>
> Wait, then we would only have level crossing and some people, for
> whatever reason, think it is a good idea to race trains to level
> crossings.  Gee, we better abolish them too since they are such a bad
> design.  
>   
i agree that the case of -b is extreme, but your response is still
unfair to the original problem.  people that jump from bridges usually
do that intentionally.  the intention of the user who complained about
his code (below) was certainly not to jump off a bridge, but to walk
over it.  and yet he's fallen into cold water.  a bridge which makes you
fall when you want to walk and not to jump has flawed design and is a
good candidate for abolishing.

testfunc = function(data, group) print(names(subset(data, select=group)))
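The one-liner above reproduces the original poster's problem when the data frame has a column with the same name as the argument. A version built on ordinary indexing (`testfunc2` is a hypothetical name) behaves as one would expect:

```r
testfunc <- function(data, group) print(names(subset(data, select = group)))
# Same task without non-standard evaluation: group is always a value.
testfunc2 <- function(data, group) print(names(data[, group, drop = FALSE]))

df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)
testfunc(df1, c("group", "visit"))    # "group": select = group hit the column
testfunc2(df1, c("group", "visit"))   # "group" "visit"
```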

vQ



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Petr PIKAL wrote:
>
> Well, if somebody does not care what he/she is doing then he/she should 
> stop immediately. 
>   

then many r users should perhaps stop using r.

but seriously, when one buys a complicated device one typically reads a
quick start guide, and makes intuitive assumptions about how the device
will work, turning back to the reference when the expectations fail. 
good design should aim at reducing the need for checking why an
intuitive assumption fails.


> If you do not care about how to use a machine gun correctly you could easily 
> harm yourself or others. 
>   
indeed, and i'm scared to think that some of the published research can
be harmful because the researcher refused to read the whole r reference
before doing a stats analysis.

>> those outside the r team who care about language design have probably
>> left the list long ago, if only they were subscribed.  the fact that
>> 
>
> I am just an ordinary user, although for some time already, so I have
> learned a lot from capable persons who are developing and using R. I
> started with R when I had to change from DOS Statgraphics to some
> Windows-based program and get used to it.
>
> It is like buying new shoes. If somebody just puts them on, goes
> mountaineering, finds out that they cause blisters, discards them and buys
> a new pair, then he probably does not get rid of blisters.
>   

you see, i'm not complaining about my own analyses failing because i
have not read the appropriate section in the reference.  if this were
the problem, i'd just read more and keep silent.

i'm complaining about the need to read, by anyone who starts up with r,
in all gory details, about the intricacies of r before doing anything,
because the behaviour is often so unexpected.  i'm using a whole range
of programming languages, including functional ones, they differ a lot,
they do surprise me at times, but once you learn a few general rules
about the syntax and semantics, it goes well.  it won't with r, because
every single function can do its own tricks with the arguments you give
it, and it can do so in an inconsistent manner.  *this* is what should
be changed for r to be coherent and reliable.

vQ



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Petr PIKAL
Hi

[EMAIL PROTECTED] wrote on 11.11.2008 11:32:27:

> Berwin A Turlach wrote:
> > On Tue, 11 Nov 2008 09:27:41 +0100
> > Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:
> >
> > 
> >> but then it might be worth asking whether carrying on with misdesign
> >> for backward compatibility outbalances guaranteed crashes in future
> >> users' programs, [...]
> >> 
> >
> > Why is it worth asking this if nobody else asks it? 
> 
> 
> i guess most of the people who do ask questions here care little about r
> itself, they just want it to solve a problem, even if it involves
> hacking the language.

Well, if somebody does not care what he/she is doing then he/she should 
stop immediately. 
If you do not care about how to use a machine gun correctly you could easily 
harm yourself or others. 

> 
> those outside the r team who care about language design have probably
> left the list long ago, if only they were subscribed.  the fact that

I am just an ordinary user, although for some time already, so I have
learned a lot from capable persons who are developing and using R. I
started with R when I had to change from DOS Statgraphics to some
Windows-based program and get used to it.

It is like buying new shoes. If somebody just puts them on, goes
mountaineering, finds out that they cause blisters, discards them and buys
a new pair, then he probably does not get rid of blisters.

> it's only me asking is no statistics.  i do talk to people, and know
> many who'd ask, but they just don't care, because they have already
> trashed r.  instead of discouraging me, make use of that i care to ask.

If I understand correctly, see Gabor's post:

> Gabor Grothendieck wrote: 
>> Certainly this has been recognized as a potential problem: 
>> 
>> http://developer.r-project.org/nonstandard-eval.pdf 
>> 
>> however, it is convenient when you are performing 
>> an analysis and entering commands directly as opposed 
>> to writing a program although possibly the potential ambiguities 
>> overshadow the convenience. 

But changing it could be quite difficult, and it is not high on the
developers' priority list.

Just my 2c

Regards
Petr

> 
> vQ
> 


Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Kenn Konstabel wrote:
>
> On the other hand, while there may be ground to complain,  it may be easier
> to make your own version of subset.data.frame and  advertise it to everyone:
>
>   

sure, but:

a) it may actually increase the mess, and reduce portability
b) is still vulnerable to the idiosyncrasies of the functions you use to
develop your own function.

to b), that was the original case; the user wanted to implement a
function that did print-names-subset, and he got caught by subset.


it should be preferred to have a clean and consistent protocol for how
functions treat their arguments, rather than to multiply implementations
of the same operation to provide versions that differ in nitty-gritty
details just because the original does something odd.

vQ



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Kenn Konstabel
On Tue, Nov 11, 2008 at 12:27 PM, Wacek Kusnierczyk <
[EMAIL PROTECTED]> wrote:

> it's certainly hard to design and implement a system of the size of r.
> it's certainly easier to just complain rather than make a better tool.
> but it would really be a pitiful world if all of us were just
> developing, and no one would criticize.  my purpose is not (or not just,
> if you prefer) to annoy the r team, but to point out and document issues
> that really need rethinking.  discouragingly, many of these issues
> appear to be known already, but simply ignored.
>

On the other hand, while there may be grounds to complain, it may be easier
to make your own version of subset.data.frame and advertise it to everyone:

Remove the second `substitute` call in subset.data.frame, i.e., replace

   vars <- eval(substitute(select), nl, parent.frame())

with

   vars <- eval(select, nl, parent.frame())

and it will behave as you want (if I understood you).

# suppose you have modified subset.data.frame this way
# and called it waceks.subset
df1 <- data.frame(group="G1", visit="V1", value=0.9)
group <- c("group", "visit")

> subset(df1, select=group)
  group
1    G1

> waceks.subset(df1, select=group)
  group visit
1    G1    V1
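A minimal sketch of the modified function described above, for readers who want to try it. The name subset_se is hypothetical, and the body is a simplified reconstruction rather than a copy of subset.data.frame. It keeps the usual evaluation of the subset (row) argument but simply uses the value of select. For character, numeric or logical column selectors this should have the same effect as replacing the line with eval(select, nl, parent.frame()). Bare column names such as select = -value are not supported by this variant.

```r
# Hypothetical standard-evaluation variant of subset.data.frame (sketch).
# 'subset' keeps the usual non-standard evaluation (rows), but 'select'
# is used as an ordinary value: a character, numeric or logical vector.
subset_se <- function(x, subset, select, drop = FALSE) {
  if (missing(subset)) {
    r <- TRUE
  } else {
    r <- eval(substitute(subset), x, parent.frame())
    if (!is.logical(r)) stop("'subset' must evaluate to logical")
    r <- r & !is.na(r)
  }
  # No substitute() here: 'select' is forced in the caller's environment.
  vars <- if (missing(select)) TRUE else select
  x[r, vars, drop = drop]
}

df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)
group <- c("group", "visit")
names(subset(df1, select = group))     # "group": the column shadows the variable
names(subset_se(df1, select = group))  # "group" "visit"
```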


KK




Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:
> On Tue, 11 Nov 2008 09:27:41 +0100
> Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:
>
>   
>> but then it might be worth asking whether carrying on with misdesign
>> for backward compatibility outbalances guaranteed crashes in future
>> users' programs, [...]
>> 
>
> Why is it worth asking this if nobody else asks it?  


i guess most of the people who do ask questions here care little about r
itself; they just want it to solve a problem, even if it involves
hacking the language.

those outside the r team who care about language design have probably
left the list long ago, if they were ever subscribed.  the fact that
it's only me asking is no statistics.  i do talk to people, and know
many who'd ask, but they just don't care, because they have already
trashed r.  instead of discouraging me, make use of the fact that i care to ask.

vQ



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Berwin A Turlach wrote:
> On Tue, 11 Nov 2008 09:27:41 +0100
> Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:
>
>   
>> but then it might be worth asking whether carrying on with misdesign
>> for backward compatibility outbalances guaranteed crashes in future
>> users' programs, [...]
>> 
>
> Why is it worth asking this if nobody else asks it?  Most notably a
> certain software company in Redmond, Washington, which is famous for
> carrying on with bad designs and bugs all in the name of backward
> compatibility.  Apparently this company also sets industry standards so
> it must be o.k. to do that. ;-)
>   

sure.  i have had this analogy in mind for a long time, but just didn't
want to say it aloud.  indeed, r carries on with bad design, but since
there are more and more users, it's just fine.

>   
>> which result in confused complaints, 
>> 
>
> Didn't see any confused complaints yet.  

really.  the discussion was motivated precisely by a user's complaint. 
just scan this list;  a large part of the questions stems from
confusion, which results directly from r's design. 

> Only polite requests for
> enlightenment after coming across behaviour that useRs found surprising
> given their knowledge of R.  The confused complaints seem to be posted
> as responses to responses to such questions by people who for whatever
> reason seem to have an axe to grind with R. 
>   
>> the need for responses suggesting hacks to bypass the design, 
>> 
>
> Not to bypass the design, but to achieve what the person wants.  As any
> programming language, R is a Turing machine and anything can be done
> with it; it is just a question how.
>   

yes, to bypass the design.  to achieve what one would normally expect an
expression to be evaluated to, but r does it differently.

>   
>> and possibly incorrect results published 
>> 
>
> I guess such things cannot be avoided no matter what software you are
> using.  I am more worried about all the analysis done in MS Excel, in
> particular in the financial maths/stats world.  Also, to me it seems
> that getting incorrect results is a relatively small problem compared with
> the frequent misinterpretation of correct results or the use of
> inappropriate statistical techniques.  
>   

could not agree more, which does not oppose my complaints in any way.


>   
>> because r is likely to do everything but what the user expects.
>> 
>
> This is quite a strong statement, and I wonder what the basis is for
> such a statement.  Care to provide any evidence?
>   

i could think of organizing a (don't)useR conference, where submissions
would provide such evidence.  whatever i say here is mostly dismissed
as nonsense (while it certainly isn't); you say i make the
problem up (while i just follow up on users' complaints).  seriously, i may
have exaggerated in the sentence above, but lots of comments made
here by users convince me that r very often breaks expectations.

> R is a tool; a very powerful one and hence also very sharp.  It is easy
> to cut yourself with it, but when one knows how to use it, it gives the
> results that one expects.  I guess the problem in this age of instant
> gratification is that people are not willing to put in the time and
> effort to learn about the tools they are using.  
>   

but a good tool should be made with care for how users will use it.  r
apparently fits the ideas of its developers, while it confuses naive
users.  i do not argue for redmond-like 'i know better what you want'
intelligence, but i think some of the confusions should be predicted and
the design tuned accordingly.

> How about spending some time learning about R instead of continuously
> griping about it?  Just imagine how much you could have learned in the
> time you spend writing all those e-mails. :)
>   

i learn a lot while writing these emails, because i do read manuals and
make up tests.  but there would be little progress if we all were buying
what we are given instead of critically examining it.  i can stop
posting at any moment, but i don't think it would help the community ;)

>> r suffers from early made poor decisions, but then this in itself is
>> not a good reason to carry on.
>> 
>
> Radford Neal is also complaining on his blog
> (http://radfordneal.wordpress.com/) about what he thinks are design
> flaws in R.  Why don't you two get together and design a good
> substitute without any flaws?  Or is that too hard? ;-)
>   

it's certainly hard to design and implement a system of the size of r. 
it's certainly easier to just complain rather than make a better tool. 
but it would really be a pitiful world if all of us were just
developing, and no one would criticize.  my purpose is not (or not just,
if you prefer) to annoy the r team, but to point out and document issues
that really need rethinking.  discouragingly, many of these issues
appear to be known already, but simply ignored. 

vQ


Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Gavin Simpson wrote:
>
>> d = data.frame(a = 1)
>> d$`-b` = 2
>> names(d)
>> # here we go
>>
>> subset(d, select = -b)
>> # to b or not to b?
>> 
>
> but -b is not the name of the column; you explicitly called it `-b` and
> you should refer to it as such. If you use "non-standard" names then
> expect to do a bit more work.
>   
identical(names(d)[2], "-b")

if i do

d$`c` = 4

then you claim d has no column named 'c'?  do i have to refer to the c
column as `c`?


>   
>> subset(d, select = `-b`)
>> 
>   -b
> 1  2
>   

... and i have to use

subset(d, select = `a`)

and not

subset(d, select = a)

right?  besides, subset(d, select = `-b`) should rather return the
column(s) whose names are the value of the variable `-b`:

`-a` = "a"
subset(d, select = `-a`)
# returns the column named 'a' (the value of the variable `-a`), rather
# than the column named '-a' -- but that's just because there is no such
# column in d;  if there were, that one would be returned.

so even with backquotes used, there is no obvious interpretation of what
select=`-b` should mean, because it depends on which names the components of
the first argument have.  and this breaks the concept of referential
transparency.

so the problem is not so easily explained away.  what subset does *is*
messy.


>> subset(d, select = - `-b`)
>> 
>   a
> 1 1
>
>   
>> b = "a"
>> subset(d, select = -b)
>> # tragedy
>> 
>
> For this, I interpret it as not finding a column named b, so it tries to
> evaluate:
>
>   

you interpret it.  how obvious is this for most users?
it tries to find a column named 'b', not a column named b.  that's the
problem with subset.


>> b = "a"
>> `-`(b)
>> 
> Error in -b : invalid argument to unary operator
>
> `-` is a function remember.
>
> If you want this to work you can use get()
>
>   
>> subset(d, select = - get(b))
>> 
>   -b
> 1  2
>
>   

"use this hack to get around the design."

>> d$b = 3
>> subset(d, select = -b)
>> # catharsis
>>
>> (for whatever reason a user may choose to have a column named '-b')
>> 
>
> Yes, but the user is warned about not using standard naming conventions
> in the Introduction to R manual. You aren't stopped from using names
> like `-b` but if you use them, you have to expect to work a little
> harder.
>   

i'd like you to point me to that warning, as i apparently need to read
it, but i haven't found it in the manual yet.  thanks.

> Reading ?subset we have:
>
>   select: expression, indicating columns to select from a data frame.
>
> 
>
>  For data frames, the 'subset' argument works on the rows.  Note
>  that 'subset' will be evaluated in the data frame, so columns can
>  be referred to (by name) as variables in the expression (see the
>  examples).
>
> which I think is reasonably explicit is it not? 

explicit about what?  it says nothing about how the expression passed as the select
argument is treated.  it just says that the select argument is an
expression indicating columns (but how?), and then, in the middle of
explaining the subset parameter, it mentions that columns can be
referred to by name as variables in the expression.  how clear is this?

the following does not work -- i'd expect it to, by virtue of the clear
explanation:

d = data.frame(a=1, b=2)
subset(d, select=c(a, "b"))
# what??  it does not break any 'specification' given in the docs
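The failure can be reproduced by hand. The sketch below assumes, as the R sources of subset.data.frame suggest, that select is evaluated in a list binding each column name to its column position; the variable names nl, sel and msg are illustrative. In that list, a evaluates to the number 1, and c() then coerces 1 and "b" to the character vector c("1", "b"), which names no existing column.

```r
d <- data.frame(a = 1, b = 2)

# Simplified version of the lookup list that select= is evaluated in:
# each column name is bound to its column position.
nl <- as.list(seq_along(d))
names(nl) <- names(d)

sel <- eval(quote(c(a, "b")), nl, environment())
sel  # "1" "b": position 1 has been coerced to the string "1"

# Indexing by c("1", "b") then fails, since there is no column named "1".
msg <- tryCatch(d[, sel, drop = FALSE], error = function(e) conditionMessage(e))
msg
```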


> It explains why your
> second example fails and why '- get(b)' doesn't, and also why your other
> examples don't give you what you want. You aren't using the appropriate
> 'name'.
>   
that's still too confusing.  ?get:

get(x, ...)

x: a variable name (given as a character string)

so:

get("b")
# "a", because we get the variable b, whose value is "a"

get(b)
# variable "a" not found

in '-get(b)', get(b) should evaluate to the value of the variable named
in b; b is "a", so get should look up the value of the variable a, but
there is none (unless you defined it), so this should break.  instead,
'get(b)' is replaced with 'a', and '-a' in subset(d, select=-a) is not
treated as an application of the function `-` to the variable a, but
literally as the specification 'all but the column named 'a''.

it must be painfully obvious to a casual user.
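A sketch of what happens in '- get(b)', under the same assumption that select is evaluated in a list mapping column names to positions. The names nl and pos are illustrative. get() is itself called inside that list. Because b is not a column name, it is found in the calling environment as "a". get("a") then returns the position bound to the column name a, not the value of some global variable a.

```r
d <- data.frame(a = 1)
d$`-b` <- 2
b <- "a"

nl <- as.list(seq_along(d))
names(nl) <- names(d)

# get(b) runs with nl as its calling environment: 'b' is not a column
# name, so it is taken from the enclosing environment ("a"), and then
# get("a") finds the column position bound to 'a' in nl.
pos <- eval(quote(get(b)), nl, environment())
pos                              # 1
names(d[, -pos, drop = FALSE])   # "-b"
```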


> I'm sure we could all find aspects of R that don't work in exactly the
> way we might preconceive or think of as being intuitive. 

most of it, seems like.

> But if it works
> as documented 

in many cases, the documentation is insufficient, confusing, and
unhelpful when it comes to this sort of what you might call 'optimizations'.

> then I don't see what the problem is unless i) you are
> offering to rewrite the code to make it "work better", ii) that R Core
> thinks any proposal "works better" and iii) in doing so it doesn't break
> most of the R code out there in R itself or in add-on packages.
>   

i'd prefer r to work better rather than "work better".  i'm afraid that
serious improvements to r must, by necessity, break quite a lot of
earlier code, which exploits, if only due the impossibil

Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Berwin A Turlach
On Tue, 11 Nov 2008 09:49:31 +0100
Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:

> (for whatever reason a user may choose to have a column named '-b')

For whatever reason, people also jump from bridges.  Does that mean
all bridges have an inherently flawed design and should be abolished?

Wait, then we would only have level crossings and some people, for
whatever reason, think it is a good idea to race trains to level
crossings.  Gee, we better abolish them too since they are such a bad
design.  

Cheers,

Berwin

=== Full address =
Berwin A Turlach                            Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability        +65 6516 6650 (self)
Faculty of Science                          FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7         e-mail: [EMAIL PROTECTED]
Singapore 117546                    http://www.stat.nus.edu.sg/~statba



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Berwin A Turlach
On Tue, 11 Nov 2008 09:27:41 +0100
Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:

> but then it might be worth asking whether carrying on with misdesign
> for backward compatibility outbalances guaranteed crashes in future
> users' programs, [...]

Why is it worth asking this if nobody else asks it?  Most notably a
certain software company in Redmond, Washington, which is famous for
carrying on with bad designs and bugs all in the name of backward
compatibility.  Apparently this company also sets industry standards so
it must be o.k. to do that. ;-)

> which result in confused complaints, 

Didn't see any confused complaints yet.  Only polite requests for
enlightenment after coming across behaviour that useRs found surprising
given their knowledge of R.  The confused complaints seem to be posted
as responses to responses to such questions by people who for whatever
reason seem to have an axe to grind with R. 

> the need for responses suggesting hacks to bypass the design, 

Not to bypass the design, but to achieve what the person wants.  As any
programming language, R is a Turing machine and anything can be done
with it; it is just a question how.

> and possibly incorrect results published 

I guess such things cannot be avoided no matter what software you are
using.  I am more worried about all the analysis done in MS Excel, in
particular in the financial maths/stats world.  Also, to me it seems
that getting incorrect results is a relatively small problem compared with
the frequent misinterpretation of correct results or the use of
inappropriate statistical techniques.  

> because r is likely to do everything but what the user expects.

This is quite a strong statement, and I wonder what the basis is for
such a statement.  Care to provide any evidence?

R is a tool; a very powerful one and hence also very sharp.  It is easy
to cut yourself with it, but when one knows how to use it, it gives the
results that one expects.  I guess the problem in this age of instant
gratification is that people are not willing to put in the time and
effort to learn about the tools they are using.  

How about spending some time learning about R instead of continuously
griping about it?  Just imagine how much you could have learned in the
time you spend writing all those e-mails. :)

> r suffers from early made poor decisions, but then this in itself is
> not a good reason to carry on.

Radford Neal is also complaining on his blog
(http://radfordneal.wordpress.com/) about what he thinks are design
flaws in R.  Why don't you two get together and design a good
substitute without any flaws?  Or is that too hard? ;-)

Cheers,

Berwin



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Gavin Simpson
On Tue, 2008-11-11 at 09:49 +0100, Wacek Kusnierczyk wrote:
> Gabor Grothendieck wrote:
> >
> > Regarding the convenience it occurs in expressions like this:
> >
> >iris2 <- subset(iris, select = - Species)
> >
> > to create a data frame without the Species column.
> >   
> 
> aha!  so what's your best guess about the result here:

I'm not sure I see too much of a problem here.

> 
> d = data.frame(a = 1)
> d$`-b` = 2
> names(d)
> # here we go
> 
> subset(d, select = -b)
> # to b or not to b?

but -b is not the name of the column; you explicitly called it `-b` and
you should refer to it as such. If you use "non-standard" names then
expect to do a bit more work.

> subset(d, select = `-b`)
  -b
1  2
> subset(d, select = - `-b`)
  a
1 1

> 
> b = "a"
> subset(d, select = -b)
> # tragedy

For this, I interpret it as not finding a column named b, so it tries to
evaluate:

> b = "a"
> `-`(b)
Error in -b : invalid argument to unary operator

`-` is a function remember.

If you want this to work you can use get()

> subset(d, select = - get(b))
  -b
1  2

> 
> d$b = 3
> subset(d, select = -b)
> # catharsis
> 
> (for whatever reason a user may choose to have a column named '-b')

Yes, but the user is warned about not using standard naming conventions
in the Introduction to R manual. You aren't stopped from using names
like `-b` but if you use them, you have to expect to work a little
harder.

Reading ?subset we have:

  select: expression, indicating columns to select from a data frame.



 For data frames, the 'subset' argument works on the rows.  Note
 that 'subset' will be evaluated in the data frame, so columns can
 be referred to (by name) as variables in the expression (see the
 examples).

which I think is reasonably explicit is it not? It explains why your
second example fails and why '- get(b)' doesn't, and also why your other
examples don't give you what you want. You aren't using the appropriate
'name'.
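The lookup described here can be sketched directly. The sketch assumes, as the R sources of subset.data.frame suggest, that select is evaluated in a list mapping each column name to its position. The names nl, keep and msg are illustrative. A backquoted `-b` is a single name and is found in that list. A plain -b is a call to the function `-`, and its argument b has to be found elsewhere.

```r
d <- data.frame(a = 1)
d$`-b` <- 2

nl <- as.list(seq_along(d))
names(nl) <- names(d)

# `-b` (backquoted) is bound in nl, so -`-b` is -2, i.e. "drop column 2".
keep <- eval(quote(-`-b`), nl, environment())
names(d[, keep, drop = FALSE])   # "a"

# A plain b is not a column name, so it is looked up outside nl; with
# b <- "a" the expression becomes -"a", which is an error.
b <- "a"
msg <- tryCatch(eval(quote(-b), nl, environment()),
                error = function(e) conditionMessage(e))
msg
```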

I'm sure we could all find aspects of R that don't work in exactly the
way we might preconceive or think of as being intuitive. But if it works
as documented then I don't see what the problem is unless i) you are
offering to rewrite the code to make it "work better", ii) that R Core
thinks any proposal "works better" and iii) in doing so it doesn't break
most of the R code out there in R itself or in add-on packages.

G

> 
> 
> vQ
> 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Gabor Grothendieck wrote:
>
> Regarding the convenience it occurs in expressions like this:
>
>iris2 <- subset(iris, select = - Species)
>
> to create a data frame without the Species column.
>   

aha!  so what's your best guess about the result here:

d = data.frame(a = 1)
d$`-b` = 2
names(d)
# here we go

subset(d, select = -b)
# to b or not to b?

b = "a"
subset(d, select = -b)
# tragedy

d$b = 3
subset(d, select = -b)
# catharsis

(for whatever reason a user may choose to have a column named '-b')


vQ



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-11 Thread Wacek Kusnierczyk
Gabor Grothendieck wrote:
>
> but I think R is stuck with what it has, due to compatibility and the large
> base of users; yet it's still possible to add functions in packages or new
> functions to R, so a new variant of subset would be possible, in which
> case one could decide to use the new function in place of the old one.
>   

you're probably correct.

but then it might be worth asking whether carrying on with misdesign for
backward compatibility outbalances guaranteed crashes in future users'
programs, which result in confused complaints, the need for responses
suggesting hacks to bypass the design, and possibly incorrect results
published because r is likely to do everything but what the user expects.

r suffers from early made poor decisions, but then this in itself is not
a good reason to carry on.

vQ



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-10 Thread Gabor Grothendieck
On Mon, Nov 10, 2008 at 4:17 PM, Wacek Kusnierczyk
<[EMAIL PROTECTED]> wrote:
> Gabor Grothendieck wrote:
>> Certainly this has been recognized as a potential problem:
>>
>> http://developer.r-project.org/nonstandard-eval.pdf
>>
>> however, it is convenient when you are performing
>> an analysis and entering commands directly as opposed
>> to writing a program although possibly the potential ambiguities
>> overshadow the convenience.
>>
>
> in most cases, i do not see why one could not use a string literal
> passed by value instead of having an expression deparsed within the
> function, which may lead to confusing behaviour.  this would give much
> more consistent and predictable code.  this has nothing to do with the
> evaluation mechanism, which can still be lazy.
>
>
>
> in the case of subset, i do not really see how this design might be
> helpful, but it's easy to see how it can be harmful, examples have just

I think the thrust of your comments was already made by reference.

Regarding the convenience it occurs in expressions like this:

   iris2 <- subset(iris, select = - Species)

to create a data frame without the Species column.

Perhaps this would have better been done by allowing an optional
formula for the select clause:

   iris2 <- subset(iris, select = ~ - Species)

but I think R is stuck with what it has, due to compatibility and the large
base of users; yet it's still possible to add functions in packages or new
functions to R, so a new variant of subset would be possible, in which
case one could decide to use the new function in place of the old one.
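The formula idea can be prototyped outside base R. The helper below, select_cols, is hypothetical, and subset() itself does not accept a formula for select. It evaluates the right-hand side of a one-sided formula in a list of column positions, using the formula's own environment as the enclosure, so the non-standard evaluation at least becomes visible at the call site.

```r
# Hypothetical helper: select data frame columns via a one-sided formula.
select_cols <- function(DF, f) {
  # Expect a one-sided formula such as ~ -Species or ~ c(a, b).
  stopifnot(inherits(f, "formula"), length(f) == 2L)
  # Bind each column name to its position, as subset() does for select=.
  nl <- as.list(seq_along(DF))
  names(nl) <- names(DF)
  # Free variables not matching a column are looked up where f was written.
  vars <- eval(f[[2L]], nl, environment(f))
  DF[, vars, drop = FALSE]
}

iris2 <- select_cols(iris, ~ -Species)
names(iris2)   # "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"

names(select_cols(iris, ~ c(Sepal.Length, Species)))  # "Sepal.Length" "Species"
```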



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-10 Thread Wacek Kusnierczyk
Gabor Grothendieck wrote:
> Certainly this has been recognized as a potential problem:
>
> http://developer.r-project.org/nonstandard-eval.pdf
>
> however, it is convenient when you are performing
> an analysis and entering commands directly as opposed
> to writing a program although possibly the potential ambiguities
> overshadow the convenience.
>   

in most cases, i do not see why one could not use a string literal
passed by value instead of having an expression deparsed within the
function, which may lead to confusing behaviour.  this would give much
more consistent and predictable code.  this has nothing to do with the
evaluation mechanism, which can still be lazy. 



in the case of subset, i do not really see how this design might be
helpful, but it's easy to see how it can be harmful, examples have just
been given.  the convenience here amounts at most to being able to omit
quotes, at the risk of having columns selected where they should not,
and vice versa.  the worst thing is that it destroys the benefit of
lexical scoping:

subset(d, select=group)

did the programmer intend to select the column named 'group'?  or the
columns whose names appear in the vector group?  is d supposed not to
have a column named 'group', should one change the identifier if d does
have such a column, to avoid selecting that column instead of whatever
else would be selected?  etc.  could this not be written as

subset(d, select="group") 

(two extra characters), and have it cleanly and always mean 'pick the
one column named 'group''? 
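A short sketch of the data dependence described above. The same unquoted call returns different columns depending on whether the data frame happens to have a column called group. The quoted form, which subset() already accepts, does not have this problem. The data frames d1 and d2 are made up for illustration, and the expected names follow from the column lookup discussed in this thread.

```r
group <- c("visit", "value")   # a variable in the calling environment

d1 <- data.frame(visit = "V1", value = 0.9)                # no 'group' column
d2 <- data.frame(group = "G1", visit = "V1", value = 0.9)  # has a 'group' column

names(subset(d1, select = group))    # "visit" "value": the variable is used
names(subset(d2, select = group))    # "group": the column shadows the variable
names(subset(d2, select = "group"))  # "group": a string names the column directly
names(d2[group])                     # "visit" "value": ordinary indexing
```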

so there are actually three problems here:
- one, that a programmer may be unaware that her own code does not do what
she wants;
- another, that a user may be unaware that the code she uses behaves
this way;
- a third, that a user may not be sure whether the code can be reused as
is, or must be modified so as not to interfere with the particular data.

the dependence of subset's behaviour on the particular data it is
applied to is confusing.  and here's an example of how it breaks its own
smart semantics:

d = data.frame(a=1)
d$`c(a,b)` = 2
d
# no problem, two columns
names(d)
# one named 'c(a,b)'

subset(d, select=c(a,b))
# so what?  the expression given to select certainly is a valid and
# actual name of a column in d, but subset complains there's no such
# column (well, it actually says object "b" not found, by which it
# probably means that object b, i.e., object named 'b', has not been
# found.  not only uninformative as a message in this situation, but also
# revealing the pervasive confusion of the name and the named, as the
# object "b" -- a one-character string -- has not been mentioned here at
# all.  what a mess.)

this can't possibly be considered good design, can it?  the dubious
benefit is heavily outweighed by the drawbacks.
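For comparison, a sketch showing that the column can still be reached by backquoting. In that case `c(a,b)` is a single name that matches the column name. In the unquoted case the outcome depends on whether some b exists in the calling environment, so the sketch only records whether an error occurred. The exact error message may vary between R versions.

```r
d <- data.frame(a = 1)
d$`c(a,b)` <- 2

# Backquoted: `c(a,b)` is one symbol, which matches the column name.
names(subset(d, select = `c(a,b)`))   # "c(a,b)"

# Unquoted: c(a, b) is a call; 'a' is column 1, but 'b' is not a column,
# so it is looked up in the calling environment and usually fails there.
res <- tryCatch(subset(d, select = c(a, b)), error = function(e) "error")
res  # "error", unless some variable b exists in the calling environment
```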

vQ



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-10 Thread Gabor Grothendieck
Forgot the name part.  Try:

TestFunc2 <- function(DF, group) names(DF[group])
TestFunc3 <- function(...) names(subset(..., subset = TRUE))
TestFunc4 <- function(...) eval.parent(names(subset(..., subset = TRUE)))

# e.g.
df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)
TestFunc2(df1, c("group", "visit"))
TestFunc3(df1, c("group", "visit"))
TestFunc4(df1, c("group", "visit"))
TestFunc4(df1, c(group, visit)) # this works too

On Mon, Nov 10, 2008 at 10:43 AM, Gabor Grothendieck
<[EMAIL PROTECTED]> wrote:
> Here are a few things to try:
>
> TestFunc1 <- get("[")
>
> TestFunc2 <- function(DF, group) DF[group]
>
> TestFunc3 <- function(...) subset(..., subset = TRUE)
>
>
>
> On Mon, Nov 10, 2008 at 10:18 AM, Karl Knoblick <[EMAIL PROTECTED]> wrote:
>> Hello!
>>
>> I have the problem that in my function the passed variable is not used, but 
>> the variable name of the dataframe itself - difficult to explain, but an 
>> easy example:
>>
>> TestFunc<-function(df, group) {
>> print(names(subset(df, select=group)))
>> }
>> df1<-data.frame(group="G1", visit="V1", value=0.9)
>> TestFunc(df1, c("group", "visit"))
>>
>> Result:
>> [1] "group"
>>
>> But I expected and want to have [1] "group" "visit" as result! Does anybody 
>> know how to get this result?
>>
>> Thanks!
>> Karl
>>
>>
>>
>>
>>
>



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-10 Thread Gabor Grothendieck
Certainly this has been recognized as a potential problem:

http://developer.r-project.org/nonstandard-eval.pdf

however, it is convenient when you are performing
an analysis and entering commands directly as opposed
to writing a program although possibly the potential ambiguities
overshadow the convenience.

On Mon, Nov 10, 2008 at 2:04 PM, Wacek Kusnierczyk
<[EMAIL PROTECTED]> wrote:
> pardon me, but does this address in any way the legitimate complaint of
> the rightfully confused user?
>
> consider the following:
>
> d = data.frame(a=1, b=2)
> a = c("a", "b")
> z = a
> # that is, both a and z are c("a", "b")
>
> subset(d, select=z)
> # gives two columns, since z is a two-element vector whose elements are
> # valid column names
>
> subset(d, select=a)
> # gives one column, since 'a' (but not a) is a valid column name
>
> subset(d, select=c(a,b))
> # gives two columns
>
>
> this is certainly what the authors intended, and they may have good
> grounds for this smart design.  but this must break the expectation of a
> naive (r-naive, for that matter) user, who may otherwise have excellent
> experience in using a functional programming language, e.g., scheme.
> (especially scheme, where symbols and expressions are first-class
> objects, yet the distinction between a symbol or an expression and their
> referent is made painfully clear, perhaps except for when one hacks with
> macros.)
>
> the examples above illustrate the notorious problem with r that one can
> never tell whether 'a' means "the value referred to with the identifier
> 'a'" or "the symbol 'a'", unless one gets ugly surprises and is forced
> to study the documentation.  and even then one may not get a clear answer.
>
> the example given by the confused user is a red flag warning.  it's a
> typical abstraction where a nested sequence of operations (here print
> over names over subset) is abstracted into a single procedure, which can
> be called with whatever arguments are valid:
>
> pns = function(d, g) print(names(subset(d, select=g)))
>
> what sane person, without carefully studying the gory details of subset,
> will ever expect that if the first argument happens to have a column
> named 'g', only this one will be selected, while if it doesn't, subset
> will select the columns named by the components of what 'g' evaluates
> to.  i wonder how many users have *not* noticed that what they get is
> not what they assume they get because of such tricky tricks, and in
> consequence were not able to publish their analyses (or worse, have
> published them).
>
> what is scary is that this may happen with about any other function in
> r, because the design is pervasive.  no one should ever use any r
> function without first carefully reading the docs (which is not
> guaranteed to help) or trying it first on a number of carefully crafted
> test cases.  if such care is not taken, results obtained with r cannot
> be taken seriously.
>
>
> vQ
>
>
> Gabor Grothendieck wrote:
>> Forgot the name part.  Try:
>>
>> TestFunc2 <- function(DF, group) names(DF[group])
>> TestFunc3 <- function(...) names(subset(..., subset = TRUE))
>> TestFunc4 <- function(...) eval.parent(names(subset(..., subset = TRUE)))
>>
>> # e.g.
>> df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)
>> TestFunc2(df1, c("group", "visit"))
>> TestFunc3(df1, c("group", "visit"))
>> TestFunc4(df1, c("group", "visit"))
>> TestFunc4(df1, c(group, visit)) # this works too
>>
>> On Mon, Nov 10, 2008 at 10:43 AM, Gabor Grothendieck
>> <[EMAIL PROTECTED]> wrote:
>>
>>> Here are a few things to try:
>>>
>>> TestFunc1 <- get("[")
>>>
>>> TestFunc2 <- function(DF, group) DF[group]
>>>
>>> TestFunc3 <- function(...) subset(..., subset = TRUE)
>>>
>>>
>>>
>>> On Mon, Nov 10, 2008 at 10:18 AM, Karl Knoblick <[EMAIL PROTECTED]> wrote:
>>>
 Hello!

 I have the problem that in my function the passed variable is not used, 
 but the variable name of the dataframe itself - difficult to explain, but 
 an easy example:

 TestFunc<-function(df, group) {
 print(names(subset(df, select=group)))
 }
 df1<-data.frame(group="G1", visit="V1", value=0.9)
 TestFunc(df1, c("group", "visit"))

 Result:
 [1] "group"

 But I expected and want to have [1] "group" "visit" as result! Does 
 anybody know how to get this result?

 Thanks!
 Karl

>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-10 Thread Wacek Kusnierczyk
pardon me, but does this address in any way the legitimate complaint of
the rightfully confused user?

consider the following:

d = data.frame(a=1, b=2)
a = c("a", "b")
z = a
# that is, both a and z are c("a", "b")

subset(d, select=z)
# gives two columns, since z is a two element vector whose elements are
# valid column names

subset(d, select=a)
# gives one column, since 'a' (but not a) is a valid column name

subset(d, select=c(a,b))
# gives two columns
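The behaviour follows from how select is handled. The sketch below paraphrases the approach taken in base R's subset.data.frame (the exact source may differ between R versions): the unevaluated select expression is evaluated in a list that maps each column name to its column position, with the calling environment as the fallback.

```r
d <- data.frame(a = 1, b = 2)
a <- c("a", "b")
z <- a

# Paraphrase of subset's select handling: column names bound to column positions
nl <- as.list(seq_along(d))
names(nl) <- names(d)

eval(quote(a), nl, environment())        # 1L: the column a wins over the variable a
eval(quote(z), nl, environment())        # c("a", "b"): z is not a column, so the variable is used
eval(quote(c(a, b)), nl, environment())  # c(1L, 2L): both names are columns
```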


this is certainly what the authors intended, and they may have good
grounds for this smart design.  but this must break the expectation of a
naive (r-naive, for that matter) user, who may otherwise have excellent
experience in using a functional programming language, e.g., scheme. 
(especially scheme, where symbols and expressions are first-class
objects, yet the distinction between a symbol or an expression and their
referent is made painfully clear, perhaps except for when one hacks with
macros.)

the examples above illustrate the notorious problem with r that one can
never tell whether 'a' means "the value referred to with the identifier
'a'" or "the symbol 'a'", unless one gets ugly surprises and is forced
to study the documentation.  and even then one may not get a clear answer.
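The two readings can be made explicit in R itself. In this minimal sketch (the function `show_both` is hypothetical), substitute() returns the symbol the caller wrote, while ordinary evaluation returns the value bound to it; subset() starts from the former for its select argument.

```r
# Hypothetical function returning both readings of its argument
show_both <- function(arg) {
  list(symbol = substitute(arg),  # what the caller typed, unevaluated
       value  = arg)              # what that expression evaluates to
}

a <- c("a", "b")
r <- show_both(a)
r$symbol  # the symbol a
r$value   # the character vector c("a", "b")
```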

the example given by the confused user is a red flag warning.  it's a
typical abstraction where a nested sequence of operations (here print
over names over subset) is abstracted into a single procedure, which can
be called with whatever arguments are valid:

pns = function(d, g) print(names(subset(d, select=g)))

what sane person, without carefully studying the gory details of subset,
will ever expect that if the first argument happens to have a column
named 'g', only this one will be selected, while if it doesn't, subset
will select the columns named by the components of what 'g' evaluates
to.  i wonder how many users have *not* noticed that what they get is
not what they assume they get because of such tricky tricks, and in
consequence were not able to publish their analyses (or worse, have
published them). 
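The trap can be reproduced with two small data frames (d1, d2 and the variant pns_safe are hypothetical names). Replacing subset() with ordinary `[` indexing, which evaluates g as a normal argument, makes the result independent of the column names.

```r
pns <- function(d, g) print(names(subset(d, select = g)))
# Standard-evaluation variant: g is always used as a value
pns_safe <- function(d, g) print(names(d[g]))

d1 <- data.frame(x = 1, y = 2, w = 3)
d2 <- data.frame(g = 1, y = 2, w = 3)

pns(d1, c("y", "w"))       # "y" "w": no column g, so the value of g is used
pns(d2, c("y", "w"))       # "g": the column named g is selected instead
pns_safe(d2, c("y", "w"))  # "y" "w"
```

print() returns its argument invisibly, so the selected names can also be captured and compared.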

what is scary is that this may happen with just about any other function in
r, because the design is pervasive.  no one should ever use any r
function without first carefully reading the docs (which is not
guaranteed to help) or trying it first on a number of carefully crafted
test cases.  if such care is not taken, results obtained with r cannot
be taken seriously.


vQ


Gabor Grothendieck wrote:
> Forgot the name part.  Try:
>
> TestFunc2 <- function(DF, group) names(DF[group])
> TestFunc3 <- function(...) names(subset(..., subset = TRUE))
> TestFunc4 <- function(...) eval.parent(names(subset(..., subset = TRUE)))
>
> # e.g.
> df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)
> TestFunc2(df1, c("group", "visit"))
> TestFunc3(df1, c("group", "visit"))
> TestFunc4(df1, c("group", "visit"))
> TestFunc4(df1, c(group, visit)) # this works too
>
> On Mon, Nov 10, 2008 at 10:43 AM, Gabor Grothendieck
> <[EMAIL PROTECTED]> wrote:
>   
>> Here are a few things to try:
>>
>> TestFunc1 <- get("[")
>>
>> TestFunc2 <- function(DF, group) DF[group]
>>
>> TestFunc3 <- function(...) subset(..., subset = TRUE)
>>
>>
>>
>> On Mon, Nov 10, 2008 at 10:18 AM, Karl Knoblick <[EMAIL PROTECTED]> wrote:
>> 
>>> Hello!
>>>
>>> I have the problem that in my function the passed variable is not used, but 
>>> the variable name of the dataframe itself - difficult to explain, but an 
>>> easy example:
>>>
>>> TestFunc<-function(df, group) {
>>> print(names(subset(df, select=group)))
>>> }
>>> df1<-data.frame(group="G1", visit="V1", value=0.9)
>>> TestFunc(df1, c("group", "visit"))
>>>
>>> Result:
>>> [1] "group"
>>>
>>> But I expected and want to have [1] "group" "visit" as result! Does anybody 
>>> know how to get this result?
>>>
>>> Thanks!
>>> Karl
>>>



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-10 Thread Duncan Murdoch

On 11/10/2008 10:18 AM, Karl Knoblick wrote:

Hello!

I have the problem that in my function the passed variable is not used, but the 
variable name of the dataframe itself - difficult to explain, but an easy 
example:

TestFunc<-function(df, group) {
print(names(subset(df, select=group)))
}
df1<-data.frame(group="G1", visit="V1", value=0.9)
TestFunc(df1, c("group", "visit"))

Result:
[1] "group"
 
But I expected and want to have [1] "group" "visit" as result! Does anybody know how to get this result?


Don't use subset.  You can get what you want using

print(names(df[, group, drop = FALSE]))

Alternatively, you can force group to be found in the right place in
this way:

# inside TestFunc: e is the function's own evaluation frame
e <- environment()
print(names(subset(df, select=e$group)))
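Both suggestions can be wrapped up as complete functions (TestFuncIndex and TestFuncEnv are hypothetical names). The drop = FALSE argument guards against the one-column case: without it, df[, group] returns a bare vector, whose names() is NULL, when group names a single column.

```r
df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)

# Suggestion 1: plain indexing instead of subset()
TestFuncIndex <- function(df, group) names(df[, group, drop = FALSE])

# Suggestion 2: refer to the function's own frame explicitly, so e$group is not
# mistaken for a column (unless df happens to have a column named e)
TestFuncEnv <- function(df, group) {
  e <- environment()
  names(subset(df, select = e$group))
}

TestFuncIndex(df1, c("group", "visit"))
TestFuncEnv(df1, c("group", "visit"))
TestFuncIndex(df1, "value")
```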



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-10 Thread Gabor Grothendieck
Here are a few things to try:

TestFunc1 <- get("[")

TestFunc2 <- function(DF, group) DF[group]

TestFunc3 <- function(...) subset(..., subset = TRUE)
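A quick check of the three suggestions on the original poster's data (the stopifnot() checks are added here for illustration). Naming subset = TRUE leaves the second positional argument to be matched to select, and because the character vector reaches subset() through `...` as the caller's own expression, subset() never sees the symbol group.

```r
TestFunc1 <- get("[")
TestFunc2 <- function(DF, group) DF[group]
TestFunc3 <- function(...) subset(..., subset = TRUE)

df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)

names(TestFunc1(df1, c("group", "visit")))
names(TestFunc2(df1, c("group", "visit")))
names(TestFunc3(df1, c("group", "visit")))
```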



On Mon, Nov 10, 2008 at 10:18 AM, Karl Knoblick <[EMAIL PROTECTED]> wrote:
> Hello!
>
> I have the problem that in my function the passed variable is not used, but 
> the variable name of the dataframe itself - difficult to explain, but an easy 
> example:
>
> TestFunc<-function(df, group) {
> print(names(subset(df, select=group)))
> }
> df1<-data.frame(group="G1", visit="V1", value=0.9)
> TestFunc(df1, c("group", "visit"))
>
> Result:
> [1] "group"
>
> But I expected and want to have [1] "group" "visit" as result! Does anybody 
> know how to get this result?
>
> Thanks!
> Karl
>



Re: [R] Variable passed to function not used in function in select=... in subset

2008-11-10 Thread Henrique Dallazuanna
Try this:

TestFunc<-function(df, group) {
return(names(eval(bquote(subset(df, select = .(group))))))
}
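The idea is that bquote() splices the value of group into the call before subset() sees it, so select receives a literal character vector rather than the symbol group. A minimal sketch (TestFuncBq is a hypothetical name; it evaluates the call against its df argument, not a global data frame):

```r
TestFuncBq <- function(df, group) {
  # .(group) is replaced by the current value of group, e.g. c("group", "visit")
  cl <- bquote(subset(df, select = .(group)))
  names(eval(cl))  # eval() defaults to this function's frame, where df lives
}

df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)
TestFuncBq(df1, c("group", "visit"))

# Inspect the spliced call: select holds the vector itself, not a symbol
cl <- bquote(subset(df1, select = .(c("group", "visit"))))
cl[["select"]]
```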

On Mon, Nov 10, 2008 at 1:18 PM, Karl Knoblick <[EMAIL PROTECTED]>wrote:

> Hello!
>
> I have the problem that in my function the passed variable is not used, but
> the variable name of the dataframe itself - difficult to explain, but an
> easy example:
>
> TestFunc<-function(df, group) {
> print(names(subset(df, select=group)))
> }
> df1<-data.frame(group="G1", visit="V1", value=0.9)
> TestFunc(df1, c("group", "visit"))
>
> Result:
> [1] "group"
>
> But I expected and want to have [1] "group" "visit" as result! Does anybody
> know how to get this result?
>
> Thanks!
> Karl
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
