Re: [R] Variable passed to function not used in function in select=... in subset
On 11/11/2008 2:56 PM, Bert Gunter wrote:
> Ummm... as today is still Armistice day (in my time zone, anyway), maybe we should call a truce and end this flame war...

I haven't seen very many flames -- there have been disagreements, but generally it's been quite civil. Certainly I don't think Berwin flamed me.

If we were to add in a warning about partial name matching, it would have to be accompanied by some way to deal with common uses like the one Berwin mentioned. (There are at least 100 uses of seq(..., length=...) in the core & recommended packages. I wouldn't want to fix all of those.) But it could still be useful, in the same way the checks for using TRUE and FALSE instead of T and F are useful.

Duncan Murdoch

> Cheers,
> Bert
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Berwin A Turlach
> Sent: Tuesday, November 11, 2008 9:31 AM
> To: Duncan Murdoch
> Cc: R help
> Subject: Re: [R] Variable passed to function not used in function in select=... in subset
>
> G'day Duncan,
>
> On Tue, 11 Nov 2008 09:37:57 -0500 Duncan Murdoch <[EMAIL PROTECTED]> wrote:
>> I think this tension is a fundamental part of the character of S and R. But it is also fundamental to R that there are QC tests that apply to code in packages: so writing new tests that detect dangerous usage (e.g. to disallow partial name matching) would be another way to improve reliability. [...]
>
> Please not. :)
> After years of using of R, it is now second nature to me to type (yes, I always spell out "from" and "to")
> seq(from=xx, to=yy, length=zz)
> and I never understood why the full name of that argument had to be length.out. I would hate to see lots of warning messages because I am using partial matching.
>
> Cheers,
> Berwin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Variable passed to function not used in function in select=... in subset
Ummm... as today is still Armistice day (in my time zone, anyway), maybe we should call a truce and end this flame war...

Cheers,
Bert

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Berwin A Turlach
Sent: Tuesday, November 11, 2008 9:31 AM
To: Duncan Murdoch
Cc: R help
Subject: Re: [R] Variable passed to function not used in function in select=... in subset

G'day Duncan,

On Tue, 11 Nov 2008 09:37:57 -0500 Duncan Murdoch <[EMAIL PROTECTED]> wrote:
> I think this tension is a fundamental part of the character of S and R. But it is also fundamental to R that there are QC tests that apply to code in packages: so writing new tests that detect dangerous usage (e.g. to disallow partial name matching) would be another way to improve reliability. [...]

Please not. :)
After years of using of R, it is now second nature to me to type (yes, I always spell out "from" and "to")
seq(from=xx, to=yy, length=zz)
and I never understood why the full name of that argument had to be length.out. I would hate to see lots of warning messages because I am using partial matching.

Cheers,
Berwin
Re: [R] Variable passed to function not used in function in select=... in subset
Some of the uses of non-standard evaluation are undoubtedly a problem in R. Probably the worst is in model.frame, because it is much harder to work around. I have never used subset(,select=) and hence have never been at risk of confusion (if you don't like how it works, I suggest you do the same), but model.frame() is inside lots of things.

There are two issues here that I think are worth pointing out:

1/ Some things are just not fixable any more. They can only be fixed in a new language. The people thinking about new statistical languages mostly know what the problems are, because they have been using S and/or R for many years and it's really not that hard to notice the problems. The document on non-standard evaluation demonstrates that R-core is aware of this particular problem.

2/ There are some uses of non-standard evaluation that don't seem to confuse people, and an interesting question is how to characterise them. These are what I referred to as 'macro-like functions' in the document that you have already been referred to. For example, subset(,subset=) and with() don't seem to be as confusing or to cause problems for programmers in the same way. There is an empirical question as to what these relatively non-problematic constructs are, and a theoretical question as to why they are different. In particular, with() not only has non-standard evaluation, it is quite similar to the notoriously confusing attach().

-thomas

On Tue, 11 Nov 2008, Wacek Kusnierczyk wrote:
> Berwin A Turlach wrote:
>> On Tue, 11 Nov 2008 09:27:41 +0100 Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:
>>> but then it might be worth asking whether carrying on with misdesign for backward compatibility outbalances guaranteed crashes in future users' programs,
>>> [...]
>>
>> Why is it worth asking this if nobody else asks it?
>
> i guess most of the people who do ask questions here care little about r itself, they just want it to solve a problem, even if it involves hacking the language. those outside the r team who care about language design have probably left the list long ago, if only they were subscribed. the fact that it's only me asking is no statistics. i do talk to people, and know many who'd ask, but they just don't care, because they have already trashed r. instead of discouraging me, make use of that i care to ask.
>
> vQ

Thomas Lumley
Assoc. Professor, Biostatistics
[EMAIL PROTECTED]
University of Washington, Seattle
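As a rough illustration of the 'macro-like' constructs mentioned in point 2/, here is a minimal sketch on a made-up data frame (the names d, x and y are assumptions for the example, not taken from the thread). It shows with() and subset(, subset=) next to their fully explicit equivalents:

```r
# Hypothetical toy data; d, x and y are illustrative names only.
d <- data.frame(x = 1:3, y = c(2, 4, 6))

# with() evaluates the expression with the columns of d visible as variables.
total <- with(d, sum(x * y))

# The subset= argument of subset() is evaluated in the same spirit.
big <- subset(d, subset = x > 1)

# Explicit equivalents that do not rely on non-standard evaluation.
total_explicit <- sum(d$x * d$y)
big_explicit <- d[d$x > 1, , drop = FALSE]
```

Both pairs should agree here; the explicit forms are longer to type but do not depend on where a name happens to be found.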
Re: [R] Variable passed to function not used in function in select=... in subset
Berwin A Turlach wrote:
> G'day Duncan,
>
> On Tue, 11 Nov 2008 09:37:57 -0500 Duncan Murdoch <[EMAIL PROTECTED]> wrote:
>
>> I think this tension is a fundamental part of the character of S and R. But it is also fundamental to R that there are QC tests that apply to code in packages: so writing new tests that detect dangerous usage (e.g. to disallow partial name matching) would be another way to improve reliability. [...]
>
> Please not. :)
> After years of using of R, it is now second nature to me to type (yes, I always spell out "from" and "to")
> seq(from=xx, to=yy, length=zz)
> and I never understood why the full name of that argument had to be length.out. I would hate to see lots of warning messages because I am using partial matching.

I think the story is this: At some point in time, in a galaxy not too far away, and using one of the R-like languages, calling the argument "length" gave you trouble calling length(from) inside the function ("attempt to call non-function" or some such error). Later, this issue was fixed so that function calls would look for functions only, but by then, the name couldn't be changed since some people had been writing it out in full.

(There are a couple of other cases, one of them involving an argument ending in a ".", but I forgot what they are. I don't think there was ever an along() function, so "along.with" escapes me.)

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
([EMAIL PROTECTED])
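The lookup rule described above can be checked directly in current R. The following is a small sketch with a hypothetical function f whose argument is deliberately named length; a call such as length(x) still finds the base function, because R skips non-function bindings when resolving a name in call position:

```r
# Hypothetical function: the argument 'length' holds a number, not a function.
f <- function(x, length) {
  # length(x) resolves to base::length(); the bare name 'length' is the argument.
  length(x) + length
}

res <- f(c(10, 20, 30), 2)
```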
Re: [R] Variable passed to function not used in function in select=... in subset
Gavin Simpson wrote:
>> my whole posting is an attempt, may you try to notice.
>>
>> vQ
>
> Did you read what you wrote. And you still wonder why you get little response from certain quarters?
>
> 1) Don't say "no further comment" - that is quite arrogant to think that you are right and everyone who disagrees is wrong.

that meant 'i won't further comment on this, i give up'. i thought for a while about explaining this, but then i thought i might use the r strategy -- let it be ambiguous.

> 2) You are being critical of other people's work in a manner that is not polite or respectful of the efforts of others.

i can certainly agree that i don't pay attention to diplomacy. my favourite philosopher said 'you should seek friends, not truth'; i betray him here, fortunately or not.

anyway, if you mean a post raising serious issues should be ignored just because it is not polished enough, let it be. you gain peace, you lose feedback. i can promise to make more effort to wrap the essence in a cake, and drop unnecessary pun (you know, i have drop=FALSE by default, because that's the way many languages other than r have), if this helps.

vQ
Re: [R] Variable passed to function not used in function in select=... in subset
Gavin Simpson wrote:
> I've found several of these discussions involving Wacek's questions very enlightening at times; once you get past the "it doesn't work as I expect so is wrong" attitude.

just one fix: my attitude is 'it doesn't work as i imagine an average user would expect it so it's potentially confusing'.

vQ

--
Wacek Kusnierczyk, MD PhD
Email: [EMAIL PROTECTED]
Phone: +47 73591875, +47 72574609

Department of Computer and Information Science (IDI)
Faculty of Information Technology, Mathematics and Electrical Engineering (IME)
Norwegian University of Science and Technology (NTNU)
Sem Saelands vei 7, 7491 Trondheim, Norway
Room itv303

Bioinformatics & Gene Regulation Group
Department of Cancer Research and Molecular Medicine (IKM)
Faculty of Medicine (DMF)
Norwegian University of Science and Technology (NTNU)
Laboratory Center, Erling Skjalgsons gt. 1, 7030 Trondheim, Norway
Room 231.05.060
Re: [R] Variable passed to function not used in function in select=... in subset
G'day Duncan,

On Tue, 11 Nov 2008 09:37:57 -0500 Duncan Murdoch <[EMAIL PROTECTED]> wrote:
> I think this tension is a fundamental part of the character of S and R. But it is also fundamental to R that there are QC tests that apply to code in packages: so writing new tests that detect dangerous usage (e.g. to disallow partial name matching) would be another way to improve reliability. [...]

Please not. :)
After years of using of R, it is now second nature to me to type (yes, I always spell out "from" and "to")
seq(from=xx, to=yy, length=zz)
and I never understood why the full name of that argument had to be length.out. I would hate to see lots of warning messages because I am using partial matching.

Cheers,
Berwin
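For readers who want to see the partial matching under discussion, here is a small sketch. The helper g is hypothetical and exists only to make the warning easy to capture; the option warnPartialMatchArgs is a standard R option, though whether one wants it switched on is exactly the matter of taste debated in this thread:

```r
# 'length' is partially matched to the formal argument 'length.out'.
a <- seq(from = 1, to = 10, length = 4)
b <- seq(from = 1, to = 10, length.out = 4)
same <- identical(a, b)

# Hypothetical helper with a single formal argument named 'length.out'.
g <- function(length.out) length.out

# Ask R to warn about partial argument matches, then restore the old setting.
old <- options(warnPartialMatchArgs = TRUE)
msg <- tryCatch(g(length = 3), warning = function(w) conditionMessage(w))
options(old)
```

With the option enabled, msg should hold the text of the warning rather than the value 3; with the option at its default, the partial match is silent.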
Re: [R] Variable passed to function not used in function in select=... in subset
On Tue, 11 Nov 2008 12:53:31 +0100 Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote: > but seriously, when one buys a complicated device one typically reads > a quick start guide, and makes intuitive assumptions about how the > device will work, turning back to the reference when the expectations > fail. good design should aim at reducing the need for checking why an > intuitive assumption fails. And on what are these intuitive assumptions based if not on familiarity with similar devices? And people have different intuition, why should yours be the correct one and the golden standard? I know that if I buy a complicated device and never owned something similar I read more than the quick start guide to get familiar with the device before breaking something due to using wrong assumptions. When I started to use S-PLUS, I had used GAUSS before. Still I took the time off to work through the blue book and make myself familiar with S-PLUS before using it for serious work. Based on my experience with R, I found R very intuitive and easy to use; but still try to keep up with relevant documentation. It really seems that your problem is that you have an attitude of wanting to have instant gratification. > > If you do not care about how to use machine-gun correctly you could > > easily harm yourself or others. > > > indeed, and i'm scared to think that some of the published research > can be harmful because the researcher denied to read the whole r > reference before doing a stats analysis. Sorry, but this is absolute rubbish. There are plenty of statistical analyses that can be done without reading the complete R reference. However, one or two good books might help. My concern would rather be that everybody thinks that they can do statistics and that software project of R makes such people really think they can do it. I am far more concerned about inappropriate analyses and wrong interpretations. How often is absence of evidence taken as evidence of absence? 
> you see, i'm not complaining about my own analyses failing because i > have not read the appropriate section in the reference. if this were > the problem, i'd just read more and keep silent. > > i'm complaining about the need to read, by anyone who starts up with > r, in all gory details, about the intricacies of r before doing > anything, because the behaviour is often so unexpected. I guess Frank Harrell had people like you in mind when he wrote: https://stat.ethz.ch/pipermail/r-help/2005-April/068625.html Would you also not expect to learn about surgery in all its gory details before attempting brain surgery because brain surgery is so intuitive and doesn't need any study? Believe it or not, there are lots of useful things that you can do in R without knowing all the gory details. There are even people who got books on R published who obviously don't know all the gory details and they still show useful applications of R. Cheers, Berwin === Full address = Berwin A TurlachTel.: +65 6515 4416 (secr) Dept of Statistics and Applied Probability+65 6515 6650 (self) Faculty of Science FAX : +65 6872 3919 National University of Singapore 6 Science Drive 2, Blk S16, Level 7 e-mail: [EMAIL PROTECTED] Singapore 117546http://www.stat.nus.edu.sg/~statba __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Variable passed to function not used in function in select=... in subset
On Tue, 11 Nov 2008 11:27:30 +0100 Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote: > Berwin A Turlach wrote: > > Why is it worth asking this if nobody else asks it? Most notably a > > certain software company in Redmond, Washington, which is famous for > > carrying on with bad designs and bugs all in the name of backward > > compatibility. Apparently this company also sets industry > > standards so it must be o.k. to do that. ;-) > > > > sure. i have had this analogy in mind for a long time, but just > didn't want to say it aloud. Mate, if you contemplate comparing R to anything coming out of Redmond, Washington, then you should first heed the old saying that "it is better to remain silent and let people believe that one is a fool than to open one's mouth and remove any doubt". :) > indeed, r carries on with bad design, but since there are more and > more users, it's just fine. Whether R carries on with bad design is debatable. Clearly the changes that you would like to see would lead to big changes that might break a lot of existing code and programming idioms. Such changes could estrange large part of the user base and, in a worst case scenario, make R unusable for many tasks it is used now. No wonder that nobody is eager to implement such design changes. Apparently Python is planning such whole sale changes when moving to version 3.x. Let's see what that does to the popularity of Python and the uptake of the new version. > > Didn't see any confused complaints yet. > > really. the discussion was motivated precisely by a user's > complaint. We must have different definition of what constitutes a complaint. I looked at the initial posting again. In my book there was no complaint. Just a user who asked how to achieve a certain aim because the way he tried to achieve it did not work. There were three or four constructive answers that pointed out how it can be done and then all of a sudden complaints about alleged design flaws of R started. 
> just scan this list; a large part of the questions stems from > confusion, which results directly from r's design. That's your opinion, to which you are of course entitled to. In my opinion, a large part of the questions on r-help these days stem from the fact that in this age of instant gratification it seems to be easier to fire off an e-mail to a mailing list and try to pick the brain of thousands of subscribers instead of spending time on trying to read the documentation, learn about R and figure out the question on one's own. > >> because r is likely to do everything but what the user expects. > > > > This is quite a strong statement, and I wonder what the basis is for > > that a statement. Care to provide any evidence? > > i could think of organizing a (don't)useR conference, where > submissions would provide such evidence. Please do so. Such a conference would probably turn out to be more hilarious and funnier than the Melbourne International Comedy Festival; should be real fun to attend. :) > whatever i say here, is mostly discarded as nonsense comments (while > it certainly isn't), you say i make the problem up (while i just > follow up user's complaints). seriously, i may have exaggerated in > the immediately above, but lots of comments made here by the users > convince me that r very often breaks expectations. Ever heard about biased sampling? On a list like this you, of course, hear questions by useRs who had the wrong expectations about how R should behave and got surprised. You do not hear of all the instances in which useRs had the correct expectations which promptly were met by R. > > R is a tool; a very powerful one and hence also very sharp. It is > > easy to cut yourself with it, but when one knows how to use it > > gives the results that one expects. I guess the problem in this > > age of instant gratification is that people are not willing to put > > in the time and effort to learn about the tools they are using. 
> > but a good tool should be made with care for how users will use it. But the group of users change, and sometimes one cannot foresee all possible ways in which future users may use the software. As a programming paradigm says, "you cannot make a piece of software idiot-proof; nature will always come up with a better idiot". > r apparently fits the ideas of its developers, That's the prerogative of the developers, isn't it? But if it would only fit their ideas, then it would only be used by them. The fact that it is used by many others seem to indicate that it fits also the ideas of many others. > while confuses naive users. Well, many judiciaries have staged driver licenses for motorcycle; initially allowing only low-powered machine for new users with increasing powerful machines allowed for more experiences users. Some people in Australia would like to introduce a similar system for car-drivers since, apparently, too many P-platers kill themselves with high-powered V8 c
Re: [R] Variable passed to function not used in function in select=... in subset
On Tue, 2008-11-11 at 15:54 +0100, Wacek Kusnierczyk wrote: > > Have you tried? But bear in mind that R Core has more to balance that > > just whether you think a design "flaw" or infelicity etc should be fixed > > when it decides whether to accept patches. > > > > my whole posting is an attempt, may you try to notice. > > vQ Did you read what you wrote. And you still wonder why you get little response from certain quarters? 1) Don't say "no further comment" - that is quite arrogant to think that you are right and everyone who disagrees is wrong. 2) You are being critical of other people's work in a manner that is not polite or respectful of the efforts of others. There is nothing wrong with being critical - I never said there was - but there is a right way to go about it and a wrong way. Also, you have to consider where we are now with R and where we have come from. Whilst it would, in an ideal world, be great to fix every design flaw that you think is in R, there is too much inertia there now to change somethings or it will take a lot of effort on the part of a team of people who give that time for free. This has to be a consideration along side all the other considerations of good design, improving the logic of how R works etc. You might not agree, but as long as things are documented to work in a particular way then we might have to live with them, unless a good case can be made to break existing code and someone steps up to make the changes. G -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. 
[w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Variable passed to function not used in function in select=... in subset
On Tue, 2008-11-11 at 08:14 -0600, hadley wickham wrote: > > And without wanting to be rude or anything, your opinion carries very > > little weight in a project like R. You've arrived on the list and been > > very critical of the work of others. Now there is nothing wrong with > > being critical if it is constructive, and additionally with something > > like R you need to be constructive *and* contribute back. I'm not saying > > You are holding Wacek to a very high standard. Why is not acceptable > to say that this part of R is hard to understand without having to > provide a better solution? Ok, reading back I should have said if you want something fixed, patches are welcome. I didn't mean to say that to get help you had to contribute back. However, Wacek's approach was (and I'm paraphrasing): subset doesn't work logically or as I expect. It is a mess and needs fixing. I'm sure no-one on the list minds if people don't understand things and want to ask questions - I know I ask plenty of questions here about things I don't understand. But just as there is a posting guide that says how to go about phrasing a question that is likely to get a response, we don't need people denigrating the work of others whilst asking for assistance with what are admittedly hard concepts (ones I don't fully understand either). I've found several of these discussions involving Wacek's questions very enlightening at times; once you get past the "it doesn't work as I expect so is wrong" attitude. > > subset() _is_ confusing to novice R users. You can not anticipate > what subset(df, select = a) will do unless you know what variables are > defined in the local environment and what variables are defined in the > data frame. It is hard to understand how it works without a deep > understanding of environments and it is hard to teach all the special > cases. It is difficult to reliably use subset within another > function. I agree, but one can read the documentation for help. 
It isn't perfect and expects you know a bit (a lot) about environments etc, but I don't think it is too confusing if you know what is in df (otherwise how do you know what to select?), you read the help page and follow the examples. G > > This comes from my personal experience with subset (good for > interactive use, never program with) and from my experiences teaching > ~80 students how to use R over the last two years. > > Hadley > -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
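To make the "difficult to reliably use subset within another function" point concrete, here is a minimal sketch. The functions pick and pick_safe and the data frames d1 and d2 are hypothetical names invented for the example. The wrapper works until the data frame happens to contain a column with the same name as the wrapper's argument:

```r
# Hypothetical wrapper that passes its argument on to subset(select=).
pick <- function(df, cols) subset(df, select = cols)

# Explicit alternative using `[`, which always treats cols as a value.
pick_safe <- function(df, cols) df[, cols, drop = FALSE]

d1 <- data.frame(a = 1, b = 2)
d2 <- data.frame(a = 1, b = 2, cols = 3)

n1 <- names(pick(d1, c("a", "b")))       # 'cols' is found in pick()'s frame
n2 <- names(pick(d2, c("a", "b")))       # 'cols' now matches a column of d2
n3 <- names(pick_safe(d2, c("a", "b")))  # unaffected by the column names
```

Here n1 and n3 are the requested columns, while n2 is the single column named cols, because select= is looked up among the column names first.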
Re: [R] Variable passed to function not used in function in select=... in subset
> I think your analysis is correct, that the goals of casual use and programming are inconsistent. But in general I think there's always going to be support for providing alternative ways that are programmer-safe.
>
> For instance, library( foo, character.only=TRUE) says that foo is a character vector, not the name of a package. I don't know of anything that subset() provides that is not available in other ways (I think of it as purely a convenience function, and my first piece of advice to Karl was not to use it).

Good points - every function optimised for interactive use should have a companion that is optimised for programmatic use.

> However, if there really is something there, then it would be worthwhile pointing that out, and either modifying subset() to make it safe, or providing an alternative function.

When I teach subsetting I try to make this clear - using [ will always work, there's no magic and everything is explicit. subset() has more magic which saves you typing, but occasionally the magic doesn't work and you'll be left scratching your head as to why. In my experience students prefer subset() until they encounter strange behaviour that they don't understand.

> I think this tension is a fundamental part of the character of S and R. But it is also fundamental to R that there are QC tests that apply to code in packages: so writing new tests that detect dangerous usage (e.g. to disallow partial name matching) would be another way to improve reliability.
> Writing a test for misuse of drop=TRUE seems quite hard, but there are probably ways a debugger could do it: e.g. to tag the invocation as to whether any indices were dropped on the first call, and then warn if the result isn't the same on every subsequent call).

A similar thing would be to force package authors to explicitly specify na.rm to ensure that they have thought about how to deal with missing values (this always trips me up). Perhaps you could treat drop similarly - in non-interactive code drop should not have a default value. Presumably this wouldn't be too hard to implement - R CMD check would just switch out [ for a version that didn't have a default value, in a similar way to what happens with T and F (another example of implicit interactive use vs. explicit programmatic use)

> Conceivably Karl's problem could be detected in the same way: tag each name in the expression as to whether it was found in the data frame or some other environment, and then warn if that tag ever changes. Or maybe the test should just warn that subset() is a convenience function, not meant for programming.

It would be nice if the documentation was clearer on these issues. I can imagine every function having a numeric value associated with it which gave its position on the interactive vs programming continuum. Then you could sum up all the values in a function and warn the author if it was too high. Not very practical to implement though!

Hadley

--
http://had.co.nz/
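Two of the interactive-versus-programmatic pairs mentioned in this message can be sketched briefly. This is only an illustration with made-up object names (m, row_vec, row_mat, pkg), not a recommendation from the thread: the default drop = TRUE of `[` changes the shape of the result, and character.only = TRUE makes library() treat its argument as a value:

```r
m <- matrix(1:6, nrow = 2)

# Default drop = TRUE: selecting one row silently returns a plain vector.
row_vec <- m[1, ]

# drop = FALSE keeps the result a 1 x 3 matrix regardless of the selection.
row_mat <- m[1, , drop = FALSE]

# library() normally takes an unevaluated name; character.only = TRUE
# makes it use the value of the variable instead.
pkg <- "stats"
library(pkg, character.only = TRUE)
```

In code that must handle selections of any size, spelling out drop = FALSE avoids the result changing type when a single row or column is selected.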
Re: [R] Variable passed to function not used in function in select=... in subset
Duncan Murdoch wrote:
> On 11/11/2008 8:53 AM, hadley wickham wrote:
>> On Mon, Nov 10, 2008 at 1:04 PM, Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:
>>> pardon me, but does this address in any way the legitimate complaint of the rightfully confused user?
>>>
>>> consider the following:
>>>
>>> d = data.frame(a=1, b=2)
>>> a = c("a", "b")
>>> z = a
>>> # that is, both a and z are c("a", "b")
>>>
>>> subset(d, select=z)
>>> # gives two columns, since z is a two element vector whose elements are valid column names
>>>
>>> subset(d, select=a)
>>> # gives one column, since 'a' (but not a) is a valid column name
>>>
>>> subset(d, select=c(a,b))
>>> # gives two columns
>>>
>>> this is certainly what the authors intended, and they may have good grounds for this smart design. but this must break the expectation of a naive (r-naive, for that matter) user, who may otherwise have excellent experience in using a functional programming language, e.g., scheme. (especially scheme, where symbols and expressions are first-class objects, yet the distinction between a symbol or an expression and their referent is made painfully clear, perhaps except for when one hacks with macros.)
>>>
>>> the examples above illustrate the notorious problem with r that one can never tell whether 'a' means "the value referred to with the identifier 'a'" or "the symbol 'a'", unless one gets ugly surprises and is forced to study the documentation. and even then one may not get a clear answer.
>>
>> I agree, with some caveats. There are basically two uses of R: as an interactive data analysis package and as a statistical programming language. These uses come into conflict: in the interactive environment, you want to minimise typing so that you can be as speedy as possible. It doesn't matter if R occasionally makes a wrong guess when you have specified something implicitly, because you can fix it on the fly. When you are programming, you care less about saving typing and more about reproducibility. You want to be explicit so your function is robust to widely varying inputs, even if it means you have to type a lot more. You see this tension in quite a few places:
>>
>> * drop = T
>> * functions that return different types of output (e.g. sapply) depending on input parameters
>> * partial matching of argument names
>> * using unevaluated expressions instead of strings (e.g. library, subset, ...)
>>
>> These are all things that are helpful for interactive use, but make life as a programmer more difficult. I find the last one particularly frustrating because it means it is very difficult to program with some functions (i.e. subset) without resorting to complex quoting, substituting and evaluating tricks. I have tried to steer away from this technique in my packages, and where it's just too convenient for interactive use, insulating the deparsing into special functions that the data analyst must use (e.g. aes() in ggplot, and .() in plyr), along with providing alternatives for the programmer.
>>
>> I don't understand why you're getting so much push-back on this issue. R is a fantastic language, but it has some genuinely nasty corners. In my opinion, this is one of them.
>
> I think your analysis is correct, that the goals of casual use and programming are inconsistent. But in general I think there's always going to be support for providing alternative ways that are programmer-safe.

you know, in ipython you can write, e.g., m 1 instead of m(1) to call the method m on the value 1. but this is a syntactic shorthand which is not valid in python, and you can see how it gets translated into python when you try it. so you have the cake and you eat it -- there is consistent (at least, much more consistent than in r) policy on the syntax, and you can still have conveniences in the interactive interpreter.

r, on the other hand, prefers solutions such as the subset one, which are the best recipe for confusion. why would not the r team have a look at what others are doing? programming language design has progressed a lot since the so often cited reference for r was written in 1988.

vQ
Re: [R] Variable passed to function not used in function in select=... in subset
Gavin Simpson wrote: > On Tue, 2008-11-11 at 11:08 +0100, Wacek Kusnierczyk wrote: > >> Gavin Simpson wrote: >> d = data.frame(a = 1) d$`-b` = 2 names(d) # here we go subset(d, select = -b) # to b or not to b? >>> but -b is not the name of the column; you explicitly called it `-b` and >>> you should refer to it as such. If you use "non-standard" names then >>> expect to do a bit more work. >>> >>> >> identical(names(d)[2], "-b") >> >> if i do >> >> d$`c` = 4 >> >> then you claim d has no column named 'c'? >> > > No, where do you get that from? > by simple analogy to the above, just read your own comments. if you're suggesting one should not expect this bit to be consistent, it would be just another example of messy semantics. > >> do i have to refer to the c >> column as `c`? >> > > No, but then "c" is a name that doesn't need to be quoted. -b is a name > that needs to be quoted and if you quote it, things work as you might > expect. > not necessarily, as one of my examples showed: again, the result of subset(d, select=`-b`) will depend on whether d has a column named '-b', and if it doesn't, on whether there is a variable called '-b' that is a character vector. there is no way out of this issue, backquoting is no solution. no further comment. > >> >>> >>> subset(d, select = `-b`) >>> -b >>> 1 2 >>> >>> >> ... and i have to use >> >> subset(d, select = `a`) >> >> and not >> >> subset(d, select = a) >> >> right? >> > > Is "a" a name in d? You can quote it if you want but it doesn't need to > be quoted, so you can use either. > you see, yo need to know whether 'a' is a name in d to know what subset(d, select=a) would do. no further comment. 
> >> besides, subset(d, select = `-b`) should rather return the >> column(s) whose names are the value of the variable `-b`: >> >> `-a` = "a" >> subset(d, select = `-a`) >> # returns all columns except for the one named 'a', rather than the >> column named '-a' -- but that's just because there is no such column in >> d; if there were, this one would be returned. >> > > No, it returns a if you are following on from your original examples. > `-a` refers to a variable (object) and that evaluates to "a" and "a" is > component of d so is returned. > you're right here, but the problem remains: subset(d, select=`-a`) will treat `-a` as a column name or as a name of a variable with a vector of column names, depending on what's in the data. no further comment. > >> so even with backquotes used, there is no obvious interpretation of what >> select=`-b`should mean, because it depends on what names components of >> the first argument have. and this breaks the concept of referential >> transparency. >> >> so the problem is not so easily explained away. what subset does *is* >> messy. >> > > In your opinion. > yes, but not only mine. perhaps some more r users will want to support this claim; just wait. > And without wanting to be rude or anything, your opinion carries very > little weight in a project like R. You've arrived on the list and been > very critical of the work of others. Now there is nothing wrong with > being critical if it is constructive, and additionally with something > like R you need to be constructive *and* contribute back. I'm not saying > that if you did patch R to work the way you think is correct R Core will > accept them as they need to maintain backwards compatibility and with S > and not annoy the hundreds of package authors. but coming on here and > criticising the work of others isn't going to win you many friends. > that's really sad. you're saying no one should ever criticize r without reading the source code. 
you are *really* not interested in feedback. note, feedback on the *design*, not implementation, is not fixed by sending a patch. you have a serious misconception here. if i buy a tv, and read the quick guide, and start using it, and push buttons, and suddenly get an electric shock, and complain to the manufacturer, and they say i should have carefully read the 2K pages manual because it says there i can get high voltage on my fingers while pushing the buttons, and it's my fault, and if i want to complain i should first study the schematics --- what?? they're just crazy, no? > Also, subset (and the other things you've been harping on about) work as > documented. So you kind of have to like it or lump it. > we've just gone through the docs, and it's *you* who thinks it's so beautifully clear from the docs what subset does. i lump it. > >> subset(d, select = - `-b`) >>> a >>> 1 1 >>> >>> >>> b = "a" subset(d, select = -b) # tragedy >>> For this, I interpret it as not finding a column named b so tries to >>> evaluate: >>> >>> >>> >> y
Re: [R] Variable passed to function not used in function in select=... in subset
On 11/11/2008 8:53 AM, hadley wickham wrote: On Mon, Nov 10, 2008 at 1:04 PM, Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote: pardon me, but does this address in any way the legitimate complaint of the rightfully confused user? consider the following: d = data.frame(a=1, b=2) a = c("a", "b") z = a # that is, both a and z are c("a", "b") subset(d, select=z) # gives two columns, since z is a two element vector whose elements are valid column names subset(d, select=a) # gives one column, since 'a' (but not a) is a valid column name subset(d, select=c(a,b)) # gives two columns this is certainly what the authors intended, and they may have good grounds for this smart design. but this must break the expectation of a naive (r-naive, for that matter) user, who may otherwise have excellent experience in using a functional programming language, e.g., scheme. (especially scheme, where symbols and expressions are first-class objects, yet the distinction between a symbol or an expression and their referent is made painfully clear, perhaps except for when one hacks with macros.) the examples above illustrate the notorious problem with r that one can never tell whether 'a' means "the value referred to with the identifier 'a'" or "the symbol 'a'", unless one gets ugly surprises and is forced to study the documentation. and even then one may not get a clear answer. I agree, with some caveats. There are basically two uses of R: as a interactive data analysis package and as a statistical programming language. These uses come into conflict: in the interactive environment, you want to minimise typing so that you can be as speedy as possible. It doesn't matter if R occasionally makes a wrong guess when you have specified something implicitly, because you can fix it on the fly. When you are programming, you care less about saving typing and more about reproducibility. You want to be explicit so your function is robust to widely varying inputs, even if it means you have to type a lot more. 
You see this tension in quite a few places: * drop = T * functions that return different types of output (e.g. sapply) depending on input parameters * partial matching of argument names * using unevaluated expressions instead of strings (e.g. library, subset, ...) These are all things that are helpful for interactive use, but make life as a programmer more difficult. I find the last one particularly frustrating because it means it is very difficult to program with some functions (i.e subset) without resorting to complex quoting, substituting and evaluating tricks. I have tried to steer away from this technique in my packages, and where it's just too convenient for interactive use, insulating the deparsing into special functions that the data analyst must use (e.g. aes() in ggplot, and .() in plyr), along with providing alternatives for the programmer. I don't understand why you're getting so much push-back on this issue. R is a fantastic language, but it has some genuinely nasty corners. In my opinion, this is one of them. I think your analysis is correct, that the goals of casual use and programming are inconsistent. But in general I think there's always going to be support for providing alternative ways that are programmer-safe. For instance, library( foo, character.only=TRUE) says that foo is a character vector, not the name of a package. I don't know of anything that subset() provides that is not available in other ways (I think of it as purely a convenience function, and my first piece of advice to Karl was not to use it). However, if there really is something there, then it would be worthwhile pointing that out, and either modifying subset() to make it safe, or providing an alternative function. I think this tension is a fundamental part of the character of S and R. But it is also fundamental to R that there are QC tests that apply to code in packages: so writing new tests that detect dangerous usage (e.g. 
to disallow partial name matching) would be another way to improve reliability. Writing a test for misuse of drop=TRUE seems quite hard, but there are probably ways a debugger could do it: e.g. to tag the invocation as to whether any indices were dropped on the first call, and then warn if the result isn't the same on every subsequent call. Conceivably Karl's problem could be detected in the same way: tag each name in the expression as to whether it was found in the data frame or some other environment, and then warn if that tag ever changes. Or maybe the test should just warn that subset() is a convenience function, not meant for programming.

Duncan Murdoch
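Two of the programmer-safe routes touched on in this message can be sketched in a few lines. The first uses `character.only = TRUE`, as quoted above. The second relies on the `warnPartialMatchArgs` option found in current versions of R; the thread does not say whether it was available at the time, so treat that part as an assumption. The function `f` and the variable `pkg` are made up for illustration.

```r
# library() with the package name held in a variable
pkg <- "stats"                       # illustrative; any installed package name
library(pkg, character.only = TRUE)  # `pkg` is evaluated, not taken as a name

# Hypothetical function with a long argument name, like seq()'s length.out
f <- function(length.out) length.out

plain <- f(length = 3)               # partial matching: length -> length.out

# Ask R to warn whenever an argument name is matched only partially
old <- options(warnPartialMatchArgs = TRUE)
msg <- tryCatch(f(length = 3), warning = function(w) conditionMessage(w))
options(old)                         # restore the previous setting
```

With the option switched on, `msg` holds the text of the warning rather than the value 3, so a check of this kind could flag partial matching in package code without forbidding it at the prompt.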
Re: [R] Variable passed to function not used in function in select=... in subset
> And without wanting to be rude or anything, your opinion carries very
> little weight in a project like R. You've arrived on the list and been
> very critical of the work of others. Now there is nothing wrong with
> being critical if it is constructive, and additionally with something
> like R you need to be constructive *and* contribute back. I'm not saying

You are holding Wacek to a very high standard. Why is it not acceptable to say that this part of R is hard to understand without having to provide a better solution?

subset() _is_ confusing to novice R users. You cannot anticipate what subset(df, select = a) will do unless you know what variables are defined in the local environment and what variables are defined in the data frame. It is hard to understand how it works without a deep understanding of environments and it is hard to teach all the special cases. It is difficult to reliably use subset within another function.

This comes from my personal experience with subset (good for interactive use, never program with) and from my experiences teaching ~80 students how to use R over the last two years.

Hadley

--
http://had.co.nz/
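The point about not being able to anticipate subset(df, select = a) can be made concrete with a small sketch. The data frames `d1` and `d2` and the variable `a` are invented for illustration; the same call gives a different column depending on whether the data frame happens to have a column called a.

```r
a <- "b"                        # a variable in the calling environment

d1 <- data.frame(a = 1, b = 2)  # has a column called a
d2 <- data.frame(b = 2, c = 3)  # has no column called a

n1 <- names(subset(d1, select = a))  # `a` is taken as the column a
n2 <- names(subset(d2, select = a))  # `a` falls back to the variable, i.e. "b"

# Indexing with `[` always evaluates `a` as an ordinary variable
s1 <- names(d1[, a, drop = FALSE])
s2 <- names(d2[, a, drop = FALSE])
```

Here `n1` is "a" and `n2` is "b", while `s1` and `s2` are both "b". The `[` form is more typing, but its result does not depend on which names the data frame contains.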
Re: [R] Variable passed to function not used in function in select=... in subset
On 11/11/2008 5:00 AM, Berwin A Turlach wrote:
> Radford Neal is also complaining on his blog
> (http://radfordneal.wordpress.com/) about what he thinks are design
> flaws in R. Why don't you two get together and design a good
> substitute without any flaws? Or is that too hard? ;-)

I agree with Radford (who was complaining about surprising behaviour with dropped dimensions in array indexing, and the result of 1:n when n is zero), but I don't particularly like his solution. It seems to me that introducing a new operator that returns "a sequence from 1 up to n" is a good idea, but having a new data type is not: there is too much legacy code that would not be able to handle it. So we need some other way to handle the array indexing problem, such as ways to detect unintentional omissions of "drop=FALSE", if we want to handle it.

Duncan Murdoch
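For readers who have not met the two surprises mentioned here, a short sketch follows. It uses only base R; `seq_len()` is an existing base function and is shown as one way to get an empty sequence, not as the new operator discussed above. The object names are illustrative.

```r
m <- matrix(1:6, nrow = 2)
v <- m[1, ]                  # dimensions dropped: a plain vector
k <- m[1, , drop = FALSE]    # still a 1 x 3 matrix

n <- 0
bad  <- 1:n                  # counts down, 1 then 0, instead of being empty
good <- seq_len(n)           # an empty integer vector

loops <- 0
for (i in 1:n) loops <- loops + 1                   # body runs twice

safe_loops <- 0
for (i in seq_len(n)) safe_loops <- safe_loops + 1  # body never runs
```

Code written for the general case tends to break on the one-row matrix or the zero-length input, which is why a check for a missing drop=FALSE would be useful.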
Re: [R] Variable passed to function not used in function in select=... in subset
On Tue, 2008-11-11 at 11:08 +0100, Wacek Kusnierczyk wrote: > Gavin Simpson wrote: > > > >> d = data.frame(a = 1) > >> d$`-b` = 2 > >> names(d) > >> # here we go > >> > >> subset(d, select = -b) > >> # to b or not to b? > >> > > > > but -b is not the name of the column; you explicitly called it `-b` and > > you should refer to it as such. If you use "non-standard" names then > > expect to do a bit more work. > > > identical(names(d)[2], "-b") > > if i do > > d$`c` = 4 > > then you claim d has no column named 'c'? No, where do you get that from? > do i have to refer to the c > column as `c`? No, but then "c" is a name that doesn't need to be quoted. -b is a name that needs to be quoted and if you quote it, things work as you might expect. > > > > > >> subset(d, select = `-b`) > >> > > -b > > 1 2 > > > > ... and i have to use > > subset(d, select = `a`) > > and not > > subset(d, select = a) > > right? Is "a" a name in d? You can quote it if you want but it doesn't need to be quoted, so you can use either. > besides, subset(d, select = `-b`) should rather return the > column(s) whose names are the value of the variable `-b`: > > `-a` = "a" > subset(d, select = `-a`) > # returns all columns except for the one named 'a', rather than the > column named '-a' -- but that's just because there is no such column in > d; if there were, this one would be returned. No, it returns a if you are following on from your original examples. `-a` refers to a variable (object) and that evaluates to "a" and "a" is component of d so is returned. > > so even with backquotes used, there is no obvious interpretation of what > select=`-b`should mean, because it depends on what names components of > the first argument have. and this breaks the concept of referential > transparency. > > so the problem is not so easily explained away. what subset does *is* > messy. In your opinion. And without wanting to be rude or anything, your opinion carries very little weight in a project like R. 
You've arrived on the list and been very critical of the work of others. Now there is nothing wrong with being critical if it is constructive, and additionally with something like R you need to be constructive *and* contribute back. I'm not saying that if you did patch R to work the way you think is correct R Core will accept them as they need to maintain backwards compatibility and with S and not annoy the hundreds of package authors. but coming on here and criticising the work of others isn't going to win you many friends. Also, subset (and the other things you've been harping on about) work as documented. So you kind of have to like it or lump it. > > > >> subset(d, select = - `-b`) > >> > > a > > 1 1 > > > > > >> b = "a" > >> subset(d, select = -b) > >> # tragedy > >> > > > > For this, I interpret it as not finding a column named b so tries to > > evaluate: > > > > > > you interpret it. how obvious is this for most users? > it tries to find a column named 'b', not a column named b. that's the > problem with subset. If users read the documentation then they'd know about unary operators. > > > >> b = "a" > >> `-`(b) > >> > > Error in -b : invalid argument to unary operator > > > > `-` is a function remember. > > > > If you want this to work you can use get() > > > > > >> subset(d, select = - get(b)) > >> > > -b > > 1 2 > > > > > > "use this hack to get around the design." No hack, that is what get() is for. b is *not* a component of d. - b (or `-`(b) evaluates to an error. If you want to select columns except the column referenced by the contents of b (which is "a") then you can use get(). > > >> d$b = 3 > >> subset(d, select = -b) > >> # catharsis > >> > >> (for whatever reason a user may choose to have a column named '-b') > >> > > > > Yes, but the user is warned about not using standard naming conventions > > in the Introduction to R manual. You aren't stopped from using names > > like `-b` but if you use them, you have to expect to work a little > > harder. 
> > > > i'd like you to point me to that warning, as i apparently need to read > it, but i haven't found it in the manual yet. thanks. You could look at section 1.8 of An Introduction to R for a starter. ?Syntax is also a logical place to start and it explicitly refers you to details in the See Also section. If you read all of those (but I'll save you some time and point you to ?Quotes) you find the answers to how things like this work. ?Quotes explains what are syntactic names and how to use '`' backticks to quote non-syntactic names. Ok, ?Syntax and ?Quotes may not jump out at you as being very obvious places to look. If so, grab the source to the introduction to R manual, find a logical place to put this information or note to point people to the help pages and patch it accordingly. Then contribute that back to good of everyone. > > > Reading ?subset we have: > > > > select: e
Re: [R] Variable passed to function not used in function in select=... in subset
On Mon, Nov 10, 2008 at 1:04 PM, Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote: > pardon me, but does this address in any way the legitimate complaint of > the rightfully confused user? > > consider the following: > > d = data.frame(a=1, b=2) > a = c("a", "b") > z = a > # that is, both a and z are c("a", "b") > > subset(d, select=z) > # gives two columns, since z is a two element vector whose elements are > valid column names > > subset(d, select=a) > # gives one column, since 'a' (but not a) is a valid column name > > subset(d, select=c(a,b)) > # gives two columns > > > this is certainly what the authors intended, and they may have good > grounds for this smart design. but this must break the expectation of a > naive (r-naive, for that matter) user, who may otherwise have excellent > experience in using a functional programming language, e.g., scheme. > (especially scheme, where symbols and expressions are first-class > objects, yet the distinction between a symbol or an expression and their > referent is made painfully clear, perhaps except for when one hacks with > macros.) > > the examples above illustrate the notorious problem with r that one can > never tell whether 'a' means "the value referred to with the identifier > 'a'" or "the symbol 'a'", unless one gets ugly surprises and is forced > to study the documentation. and even then one may not get a clear answer. I agree, with some caveats. There are basically two uses of R: as a interactive data analysis package and as a statistical programming language. These uses come into conflict: in the interactive environment, you want to minimise typing so that you can be as speedy as possible. It doesn't matter if R occasionally makes a wrong guess when you have specified something implicitly, because you can fix it on the fly. When you are programming, you care less about saving typing and more about reproducibility. 
You want to be explicit so your function is robust to widely varying inputs, even if it means you have to type a lot more. You see this tension in quite a few places: * drop = T * functions that return different types of output (e.g. sapply) depending on input parameters * partial matching of argument names * using unevaluated expressions instead of strings (e.g. library, subset, ...) These are all things that are helpful for interactive use, but make life as a programmer more difficult. I find the last one particularly frustrating because it means it is very difficult to program with some functions (i.e subset) without resorting to complex quoting, substituting and evaluating tricks. I have tried to steer away from this technique in my packages, and where it's just too convenient for interactive use, insulating the deparsing into special functions that the data analyst must use (e.g. aes() in ggplot, and .() in plyr), along with providing alternatives for the programmer. I don't understand why you're getting so much push-back on this issue. R is a fantastic language, but it has some genuinely nasty corners. In my opinion, this is one of them. Hadley -- http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Variable passed to function not used in function in select=... in subset
Berwin A Turlach wrote:
> On Tue, 11 Nov 2008 09:49:31 +0100
> Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:
>
>> (for whatever reason a user may choose to have a column named '-b')
>
> For whatever reason, people also jump from bridges. Does that mean
> all bridges have an inherently flawed design and should be abolished?
>
> Wait, then we would only have level crossings and some people, for
> whatever reason, think it is a good idea to race trains to level
> crossings. Gee, we better abolish them too since they are such a bad
> design.

i agree that the case of -b is extreme, but your response is still unfair to the original problem. people that jump from bridges usually do that intentionally. the intention of the user who complained about his code (below) was certainly not to jump off a bridge, but to walk over it. and yet he's fallen into cold water. a bridge which makes you fall when you want to walk and not to jump has flawed design and is a good candidate for abolishing.

testfunc = function(data, group)
    print(names(subset(data, select=group)))

vQ
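A minimal sketch of how the function above goes wrong, and of one way around it, is given below. It assumes a data frame that has a column named group, which is what triggers the problem. `testfunc_safe`, `df1` and `wanted` are names introduced here, and both functions return the names instead of printing them so the results can be compared.

```r
# The function from the complaint, returning the names instead of printing
testfunc <- function(data, group) names(subset(data, select = group))

# Variant that indexes by the value of `group`
testfunc_safe <- function(data, group) names(data[, group, drop = FALSE])

df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)
wanted <- c("group", "visit")

r1 <- testfunc(df1, wanted)       # only "group": the column hides the argument
r2 <- testfunc_safe(df1, wanted)  # "group" "visit"
```

The caller asked for two columns in both cases. Inside subset() the name group is looked up among the column names first, so the function's own argument is never consulted.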
Re: [R] Variable passed to function not used in function in select=... in subset
Petr PIKAL wrote: > > Well, if somebody does not care what is he/she doing then he/she should > stop immediately. > then many r users should perhaps stop using r. but seriously, when one buys a complicated device one typically reads a quick start guide, and makes intuitive assumptions about how the device will work, turning back to the reference when the expectations fail. good design should aim at reducing the need for checking why an intuitive assumption fails. > If you do not care about how to use machine-gun correctly you could easily > harm yourself or others. > indeed, and i'm scared to think that some of the published research can be harmful because the researcher denied to read the whole r reference before doing a stats analysis. >> those outside the r team who care about language design have probably >> left the list long ago, if only they were subscribed. the fact that >> > > I am just a BFU although for some time already, so I learned much virtues > from capable persons who are developing and using R. I started with R when > I had to change from DOS Statgraphics to some Windows based program and > get used to it. > > It is like buying new shoes. If somebody just put them on, go for a some > mountaineering, find out that they cause blisters, discard them and buy a > new pair then he probable does not get rid of blisters. > you see, i'm not complaining about my own analyses failing because i have not read the appropriate section in the reference. if this were the problem, i'd just read more and keep silent. i'm complaining about the need to read, by anyone who starts up with r, in all gory details, about the intricacies of r before doing anything, because the behaviour is often so unexpected. i'm using a whole range of programming languages, including functional ones, they differ a lot, they do surprise me at times, but once you learn a few general rules about the syntax and semantics, it goes well. 
it won't with r, because every single function can do its own tricks with the arguments you give it, and it can do so in an inconsistent manner. *this* is what should be changed for r to be coherent and reliable.

vQ
Re: [R] Variable passed to function not used in function in select=... in subset
Hi [EMAIL PROTECTED] napsal dne 11.11.2008 11:32:27: > Berwin A Turlach wrote: > > On Tue, 11 Nov 2008 09:27:41 +0100 > > Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote: > > > > > >> but then it might be worth asking whether carrying on with misdesign > >> for backward compatibility outbalances guaranteed crashes in future > >> users' programs, [...] > >> > > > > Why is it worth asking this if nobody else asks it? > > > i guess most of the people who do ask questions here care little about r > itself, they just want it to solve a problem, even if it involves > hacking the language. Well, if somebody does not care what is he/she doing then he/she should stop immediately. If you do not care about how to use machine-gun correctly you could easily harm yourself or others. > > those outside the r team who care about language design have probably > left the list long ago, if only they were subscribed. the fact that I am just a BFU although for some time already, so I learned much virtues from capable persons who are developing and using R. I started with R when I had to change from DOS Statgraphics to some Windows based program and get used to it. It is like buying new shoes. If somebody just put them on, go for a some mountaineering, find out that they cause blisters, discard them and buy a new pair then he probable does not get rid of blisters. > it's only me asking is no statistics. i do talk to people, and know > many who'd ask, but they just don't care, because they have already > trashed r. instead of discouraging me, make use of that i care to ask. If i understand - see Gabors post > Gabor Grothendieck wrote: >> Certainly this has been recognized as a potential problem: >> >> http://developer.r-project.org/nonstandard-eval.pdf >> >> however, it is convenient when you are performing >> an analysis and entering commands directly as opposed >> to writing a program although possibly the potential ambiguities >> overshadow the convenience. 
But changing it could be quite difficult, and it is probably not high on the developers' priority list.

Just my 2c

Regards
Petr
Re: [R] Variable passed to function not used in function in select=... in subset
Kenn Konstabel wrote:
>
> On the other hand, while there may be ground to complain, it may be easier
> to make your own version of subset.data.frame and advertise it to everyone:
>

sure, but:

a) it may actually increase the mess, and reduce portability
b) is still vulnerable to the idiosyncrasies of the functions you use to develop your own function.

to b), that was the original case; the user wanted to implement a function that did print-names-subset, and he got caught by subset.

it should be preferred to have a clean and consistent protocol for how functions treat their arguments, rather than to multiply implementations of the same operation to provide versions that differ in nitty-gritty details just because the original does something odd.

vQ
Re: [R] Variable passed to function not used in function in select=... in subset
On Tue, Nov 11, 2008 at 12:27 PM, Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:
> it's certainly hard to design and implement a system of the size of r.
> it's certainly easier to just complain rather than make a better tool.
> but it would really be a pitiful world if all of us were just
> developing, and no one would criticize. my purpose is not (or not just,
> if you prefer) to annoy the r team, but to point out and document issues
> that really need rethinking. discouragingly, many of these issues
> appear to be known already, but simply ignored.

On the other hand, while there may be ground to complain, it may be easier to make your own version of subset.data.frame and advertise it to everyone:

Substitute the second `substitute` in subset.data.frame for nothing, i.e., replace

    vars <- eval(substitute(select), nl, parent.frame())

.. with

    vars <- eval(select, nl, parent.frame())

.. and it will behave as you want (if I understood you).

    # suppose you have modified subset.data.frame this way
    # and called it waceks.subset

    df1<-data.frame(group="G1", visit="V1", value=0.9)
    group <- c("group", "visit")

    > subset(df1, select=group)
      group
    1    G1
    > waceks.subset(df1, select=group)
      group visit
    1    G1    V1

KK
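The effect of dropping `substitute` can be tried without touching subset.data.frame itself. The sketch below is a simplified stand-in modelled on the line quoted above: it handles only `select` and ignores row selection. The names `nse_select` and `std_select` are invented here, and this is not the code of the hypothetical waceks.subset.

```r
# Simplified stand-in for the `select` part of subset.data.frame
nse_select <- function(x, select) {
  nl <- as.list(seq_along(x))
  names(nl) <- names(x)                     # column names map to positions
  vars <- eval(substitute(select), nl, parent.frame())
  x[, vars, drop = FALSE]
}

# The same function with substitute() removed, as suggested above
std_select <- function(x, select) {
  nl <- as.list(seq_along(x))
  names(nl) <- names(x)
  vars <- eval(select, nl, parent.frame())  # `select` is already a value here
  x[, vars, drop = FALSE]
}

df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)
group <- c("group", "visit")

n_nse <- names(nse_select(df1, select = group))  # column `group` wins
n_std <- names(std_select(df1, select = group))  # the variable's value is used
```

The price of the second version is that unquoted column names stop working: a call such as std_select(df1, select = c(visit, value)) would fail unless variables with those names exist in the calling environment.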
Re: [R] Variable passed to function not used in function in select=... in subset
Berwin A Turlach wrote:
> On Tue, 11 Nov 2008 09:27:41 +0100
> Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:
>
>> but then it might be worth asking whether carrying on with misdesign
>> for backward compatibility outbalances guaranteed crashes in future
>> users' programs, [...]
>
> Why is it worth asking this if nobody else asks it?

i guess most of the people who do ask questions here care little about r itself, they just want it to solve a problem, even if it involves hacking the language.

those outside the r team who care about language design have probably left the list long ago, if only they were subscribed. the fact that it's only me asking is no statistics. i do talk to people, and know many who'd ask, but they just don't care, because they have already trashed r. instead of discouraging me, make use of the fact that i care to ask.

vQ
Re: [R] Variable passed to function not used in function in select=... in subset
Berwin A Turlach wrote:
> On Tue, 11 Nov 2008 09:27:41 +0100
> Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:
>
>> but then it might be worth asking whether carrying on with misdesign
>> for backward compatibility outbalances guaranteed crashes in future
>> users' programs, [...]
>
> Why is it worth asking this if nobody else asks it? Most notably a
> certain software company in Redmond, Washington, which is famous for
> carrying on with bad designs and bugs all in the name of backward
> compatibility. Apparently this company also sets industry standards so
> it must be o.k. to do that. ;-)

sure. i have had this analogy in mind for a long time, but just didn't
want to say it aloud. indeed, r carries on with bad design, but since
there are more and more users, it's just fine.

>> which result in confused complaints,
>
> Didn't see any confused complaints yet.

really. the discussion was motivated precisely by a user's complaint.
just scan this list; a large part of the questions stems from confusion,
which results directly from r's design.

> Only polite requests for enlightenment after coming across behaviour
> that useRs found surprising given their knowledge of R. The confused
> complaints seem to be posted as responses to responses to such
> questions by people who for whatever reason seem to have an axe to
> grind with R.
>
>> the need for responses suggesting hacks to bypass the design,
>
> Not to bypass the design, but to achieve what the person wants. As any
> programming language, R is a Turing machine and anything can be done
> with it; it is just a question how.

yes, to bypass the design. to achieve what one would normally expect an
expression to be evaluated to, but r does it differently.

>> and possibly incorrect results published
>
> I guess such things cannot be avoided no matter what software you are
> using. I am more worried about all the analysis done in MS Excel, in
> particular in the financial maths/stats world. Also, to me it seems
> that getting incorrect results is a relatively small problem compared
> with the frequent misinterpretation of correct results or the use of
> inappropriate statistical techniques.

could not agree more, which does not oppose my complaints in any way.

>> because r is likely to do everything but what the user expects.
>
> This is quite a strong statement, and I wonder what the basis is for
> such a statement. Care to provide any evidence?

i could think of organizing a (don't)useR conference, where submissions
would provide such evidence. whatever i say here is mostly discarded as
nonsense comments (while it certainly isn't), you say i make the problem
up (while i just follow up users' complaints). seriously, i may have
exaggerated in the immediately above, but lots of comments made here by
the users convince me that r very often breaks expectations.

> R is a tool; a very powerful one and hence also very sharp. It is easy
> to cut yourself with it, but when one knows how to use it, it gives
> the results that one expects. I guess the problem in this age of
> instant gratification is that people are not willing to put in the
> time and effort to learn about the tools they are using.

but a good tool should be made with care for how users will use it. r
apparently fits the ideas of its developers, while confusing naive
users. i do not opt for redmond-like 'i know better what you want'
intelligence, but i think some of the confusions should be predicted and
the design tuned accordingly.

> How about spending some time learning about R instead of continuously
> griping about it? Just imagine how much you could have learned in the
> time you spent writing all those e-mails. :)

i learn a lot while writing these emails, because i do read manuals and
make up tests. but there would be little progress if we all were buying
what we are given instead of critically examining it. i can stop posting
at any moment, but i don't think it would help the community ;)

>> r suffers from early made poor decisions, but then this in itself is
>> not a good reason to carry on.
>
> Radford Neal is also complaining on his blog
> (http://radfordneal.wordpress.com/) about what he thinks are design
> flaws in R. Why don't you two get together and design a good
> substitute without any flaws? Or is that too hard? ;-)

it's certainly hard to design and implement a system of the size of r.
it's certainly easier to just complain rather than make a better tool.
but it would really be a pitiful world if all of us were just
developing, and no one would criticize.

my purpose is not (or not just, if you prefer) to annoy the r team, but
to point out and document issues that really need rethinking.
discouragingly, many of these issues appear to be known already, but
simply ignored.

vQ
Re: [R] Variable passed to function not used in function in select=... in subset
Gavin Simpson wrote:
>
>> d = data.frame(a = 1)
>> d$`-b` = 2
>> names(d)
>> # here we go
>>
>> subset(d, select = -b)
>> # to b or not to b?
>
> but -b is not the name of the column; you explicitly called it `-b` and
> you should refer to it as such. If you use "non-standard" names then
> expect to do a bit more work.

  identical(names(d)[2], "-b")

if i do

  d$`c` = 4

then you claim d has no column named 'c'? do i have to refer to the c
column as `c`?

>> subset(d, select = `-b`)
>   -b
> 1  2

... and i have to use

  subset(d, select = `a`)

and not

  subset(d, select = a)

right?

besides, subset(d, select = `-b`) should rather return the column(s)
whose names are the value of the variable `-b`:

  `-a` = "a"
  subset(d, select = `-a`)
  # returns all columns except for the one named 'a', rather than the
  # column named '-a' -- but that's just because there is no such column
  # in d; if there were, this one would be returned.

so even with backquotes used, there is no obvious interpretation of what
select=`-b` should mean, because it depends on what names the components
of the first argument have. and this breaks the concept of referential
transparency. so the problem is not so easily explained away. what
subset does *is* messy.

>> subset(d, select = - `-b`)
>   a
> 1 1
>
>> b = "a"
>> subset(d, select = -b)
>> # tragedy
>
> For this, I interpret it as not finding a column named b so tries to
> evaluate:

you interpret it. how obvious is this for most users? it tries to find a
column named 'b', not a column named b. that's the problem with subset.

>> b = "a"
>> `-`(b)
> Error in -b : invalid argument to unary operator
>
> `-` is a function remember.
>
> If you want this to work you can use get()
>
>> subset(d, select = - get(b))
>   -b
> 1  2

"use this hack to get around the design."

>> d$b = 3
>> subset(d, select = -b)
>> # catharsis
>>
>> (for whatever reason a user may choose to have a column named '-b')
>
> Yes, but the user is warned about not using standard naming conventions
> in the Introduction to R manual. You aren't stopped from using names
> like `-b` but if you use them, you have to expect to work a little
> harder.

i'd like you to point me to that warning, as i apparently need to read
it, but i haven't found it in the manual yet. thanks.

> Reading ?subset we have:
>
>   select: expression, indicating columns to select from a data frame.
>
>   For data frames, the 'subset' argument works on the rows. Note
>   that 'subset' will be evaluated in the data frame, so columns can
>   be referred to (by name) as variables in the expression (see the
>   examples).
>
> which I think is reasonably explicit is it not?

about? it says nothing about how the expression passed as the select
argument is treated. it just says that the select argument is an
expression indicating columns (but how?), and then, in the middle of
explaining the subset parameter, it mentions that columns can be
referred to by name as variables in the expression. how clear is this?
the following does not work -- i'd expect it to, by virtue of the clear
explanation:

  d = data.frame(a=1, b=2)
  subset(d, select=c(a, "b"))
  # what??

it does not break any 'specification' given in the docs.

> It explains why your second example fails and why '- get(b)' doesn't,
> and also why your other examples don't give you what you want. You
> aren't using the appropriate 'name'.

that's still too confusing. ?get:

  get(x, ...)
  x: a variable name (given as a character string)

so:

  get("b")
  # "a", because we get the variable b, whose value is "a"
  get(b)
  # variable "a" not found

in '-get(b)', get(b) should evaluate to the value of the variable named
in b; b is "a", so get should look up the value of the variable a, but
there is none (unless you defined it), so this should break. instead,
'get(b)' is replaced with 'a', and '-a' in subset(d, select=-a) is not
treated as an application of the function `-` to the variable a, but
literally as the specification 'but column named 'a''. it must be
painfully obvious to a casual user.

> I'm sure we could all find aspects of R that don't work in exactly the
> way we might preconceive or think of as being intuitive.

most of it, seems like.

> But if it works as documented

in many cases, the documentation is insufficient, confusing, and
unhelpful when it comes to this sort of what you might call
'optimizations'.

> then I don't see what the problem is unless i) you are
> offering to rewrite the code to make it "work better", ii) that R Core
> thinks any proposal "works better" and iii) in doing so it doesn't break
> most of the R code out there in R itself or in add-on packages.

i'd prefer r to work better rather than "work better". i'm afraid that
serious improvements to r must, by necessity, break quite a lot of
earlier code, which exploits, if only due the impossibil
Re: [R] Variable passed to function not used in function in select=... in subset
On Tue, 11 Nov 2008 09:49:31 +0100
Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:

> (for whatever reason a user may choose to have a column named '-b')

For whatever reason, people also jump from bridges. Does that mean all
bridges have an inherently flawed design and should be abolished?

Wait, then we would only have level crossings and some people, for
whatever reason, think it is a good idea to race trains to level
crossings. Gee, we better abolish them too since they are such a bad
design.

Cheers,

        Berwin

=== Full address ===
Berwin A Turlach                            Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability        +65 6516 6650 (self)
Faculty of Science                          FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7         e-mail: [EMAIL PROTECTED]
Singapore 117546                    http://www.stat.nus.edu.sg/~statba
Re: [R] Variable passed to function not used in function in select=... in subset
On Tue, 11 Nov 2008 09:27:41 +0100
Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:

> but then it might be worth asking whether carrying on with misdesign
> for backward compatibility outbalances guaranteed crashes in future
> users' programs, [...]

Why is it worth asking this if nobody else asks it? Most notably a
certain software company in Redmond, Washington, which is famous for
carrying on with bad designs and bugs all in the name of backward
compatibility. Apparently this company also sets industry standards so
it must be o.k. to do that. ;-)

> which result in confused complaints,

Didn't see any confused complaints yet. Only polite requests for
enlightenment after coming across behaviour that useRs found surprising
given their knowledge of R. The confused complaints seem to be posted
as responses to responses to such questions by people who for whatever
reason seem to have an axe to grind with R.

> the need for responses suggesting hacks to bypass the design,

Not to bypass the design, but to achieve what the person wants. As any
programming language, R is a Turing machine and anything can be done
with it; it is just a question how.

> and possibly incorrect results published

I guess such things cannot be avoided no matter what software you are
using. I am more worried about all the analysis done in MS Excel, in
particular in the financial maths/stats world. Also, to me it seems
that getting incorrect results is a relatively small problem compared
with the frequent misinterpretation of correct results or the use of
inappropriate statistical techniques.

> because r is likely to do everything but what the user expects.

This is quite a strong statement, and I wonder what the basis is for
such a statement. Care to provide any evidence?

R is a tool; a very powerful one and hence also very sharp. It is easy
to cut yourself with it, but when one knows how to use it, it gives the
results that one expects. I guess the problem in this age of instant
gratification is that people are not willing to put in the time and
effort to learn about the tools they are using.

How about spending some time learning about R instead of continuously
griping about it? Just imagine how much you could have learned in the
time you spent writing all those e-mails. :)

> r suffers from early made poor decisions, but then this in itself is
> not a good reason to carry on.

Radford Neal is also complaining on his blog
(http://radfordneal.wordpress.com/) about what he thinks are design
flaws in R. Why don't you two get together and design a good
substitute without any flaws? Or is that too hard? ;-)

Cheers,

        Berwin
Re: [R] Variable passed to function not used in function in select=... in subset
On Tue, 2008-11-11 at 09:49 +0100, Wacek Kusnierczyk wrote:
> Gabor Grothendieck wrote:
> >
> > Regarding the convenience it occurs in expressions like this:
> >
> >    iris2 <- subset(iris, select = - Species)
> >
> > to create a data frame without the Species column.
> >
>
> aha! so what's you best guess about the result here:

I'm not sure I see too much of a problem here.

> d = data.frame(a = 1)
> d$`-b` = 2
> names(d)
> # here we go
>
> subset(d, select = -b)
> # to b or not to b?

but -b is not the name of the column; you explicitly called it `-b` and
you should refer to it as such. If you use "non-standard" names then
expect to do a bit more work.

> subset(d, select = `-b`)
  -b
1  2
> subset(d, select = - `-b`)
  a
1 1

>
> b = "a"
> subset(d, select = -b)
> # tragedy

For this, I interpret it as not finding a column named b so tries to
evaluate:

> b = "a"
> `-`(b)
Error in -b : invalid argument to unary operator

`-` is a function remember.

If you want this to work you can use get()

> subset(d, select = - get(b))
  -b
1  2

>
> d$b = 3
> subset(d, select = -b)
> # catharsis
>
> (for whatever reason a user may choose to have a column named '-b')

Yes, but the user is warned about not using standard naming conventions
in the Introduction to R manual. You aren't stopped from using names
like `-b` but if you use them, you have to expect to work a little
harder.

Reading ?subset we have:

  select: expression, indicating columns to select from a data frame.

  For data frames, the 'subset' argument works on the rows. Note
  that 'subset' will be evaluated in the data frame, so columns can
  be referred to (by name) as variables in the expression (see the
  examples).

which I think is reasonably explicit is it not? It explains why your
second example fails and why '- get(b)' doesn't, and also why your other
examples don't give you what you want. You aren't using the appropriate
'name'.

I'm sure we could all find aspects of R that don't work in exactly the
way we might preconceive or think of as being intuitive. But if it works
as documented then I don't see what the problem is unless i) you are
offering to rewrite the code to make it "work better", ii) that R Core
thinks any proposal "works better" and iii) in doing so it doesn't break
most of the R code out there in R itself or in add-on packages.

G

>
> vQ

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Re: [R] Variable passed to function not used in function in select=... in subset
Gabor Grothendieck wrote:
>
> Regarding the convenience it occurs in expressions like this:
>
>    iris2 <- subset(iris, select = - Species)
>
> to create a data frame without the Species column.

aha! so what's your best guess about the result here:

  d = data.frame(a = 1)
  d$`-b` = 2
  names(d)
  # here we go

  subset(d, select = -b)
  # to b or not to b?

  b = "a"
  subset(d, select = -b)
  # tragedy

  d$b = 3
  subset(d, select = -b)
  # catharsis

(for whatever reason a user may choose to have a column named '-b')

vQ
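For readers who want to run the three cases above without the session stopping at the first error, here is a minimal sketch. It assumes a fresh session in which no variable named b exists before the first call; the names res1, res2 and res3 and the "error" placeholder are made up for illustration.

```r
d <- data.frame(a = 1)
d$`-b` <- 2

# Case 1: neither a column nor a variable named b exists, so evaluating
# the expression -b inside subset() fails.
res1 <- tryCatch(names(subset(d, select = -b)), error = function(e) "error")

# Case 2: b is now a character variable; unary minus on a string fails.
b <- "a"
res2 <- tryCatch(names(subset(d, select = -b)), error = function(e) "error")

# Case 3: once d has a column named b, -b is read as "drop column b".
d$b <- 3
res3 <- names(subset(d, select = -b))

print(res1)
print(res2)
print(res3)
```

In none of the three cases is the column literally named '-b' picked by writing -b; to select it by name, ordinary indexing such as d["-b"] avoids the question entirely.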
Re: [R] Variable passed to function not used in function in select=... in subset
Gabor Grothendieck wrote:
>
> but I think R is stuck with what it has due to compatibility and the large
> base of users yet its still possible to add functions in packages or new
> functions to R so a new variant of subset would be possible in which
> case one could decide to use the new function in place of the old one.

you're probably correct. but then it might be worth asking whether
carrying on with misdesign for backward compatibility outbalances
guaranteed crashes in future users' programs, which result in confused
complaints, the need for responses suggesting hacks to bypass the
design, and possibly incorrect results published because r is likely to
do everything but what the user expects.

r suffers from early made poor decisions, but then this in itself is not
a good reason to carry on.

vQ
Re: [R] Variable passed to function not used in function in select=... in subset
On Mon, Nov 10, 2008 at 4:17 PM, Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:
> Gabor Grothendieck wrote:
>> Certainly this has been recognized as a potential problem:
>>
>>   http://developer.r-project.org/nonstandard-eval.pdf
>>
>> however, it is convenient when you are performing
>> an analysis and entering commands directly as opposed
>> to writing a program although possibly the potential ambiguities
>> overshadow the convenience.
>
> in most cases, i do not see why one could not use a string literal
> passed by value instead of having an expression deparsed within the
> function, which may lead to confusing behaviour. this would give much
> more consistent and predictable code. this has nothing to do with the
> evaluation mechanism, which can still be lazy.
>
> in the case of subset, i do not really see how this design might be
> helpful, but it's easy to see how it can be harmful, examples have just

I think the thrust of your comments was already made by reference.

Regarding the convenience, it occurs in expressions like this:

   iris2 <- subset(iris, select = - Species)

to create a data frame without the Species column. Perhaps this would
have better been done by allowing an optional formula for the select
clause:

   iris2 <- subset(iris, select = ~ - Species)

but I think R is stuck with what it has due to compatibility and the
large base of users, yet it's still possible to add functions in
packages or new functions to R, so a new variant of subset would be
possible, in which case one could decide to use the new function in
place of the old one.
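The formula idea above is only a suggestion in this thread, and base R's subset does not accept a formula for select. As one possible reading of it, the sketch below defines a separate helper; the name select_by_formula and the choice of baseenv() as the enclosure are assumptions made for illustration, not anything proposed verbatim here.

```r
# Hypothetical variant: select must be a one-sided formula such as
# ~ -Species. The function name is made up for this sketch.
select_by_formula <- function(x, select) {
  stopifnot(is.data.frame(x),
            inherits(select, "formula"),
            length(select) == 2L)
  # Map each column name to its position, e.g. list(a = 1, b = 2).
  positions <- as.list(seq_along(x))
  names(positions) <- names(x)
  # Evaluate the right-hand side against column positions only. Using
  # baseenv() as the enclosure makes operators such as `-` and c()
  # available, but not variables defined by the caller.
  vars <- eval(select[[2L]], positions, baseenv())
  x[, vars, drop = FALSE]
}

iris2 <- select_by_formula(iris, ~ -Species)
print(names(iris2))

# A symbol that is not a column is an error here, rather than a silent
# fallback to some variable of the same name in the calling code.
not_a_column <- tryCatch(select_by_formula(iris, ~ group),
                         error = function(e) "error")
print(not_a_column)
```

The design choice in this sketch is that a formula can only ever refer to columns, so a character vector held in a variable would have to go through ordinary indexing instead.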
Re: [R] Variable passed to function not used in function in select=... in subset
Gabor Grothendieck wrote:
> Certainly this has been recognized as a potential problem:
>
>   http://developer.r-project.org/nonstandard-eval.pdf
>
> however, it is convenient when you are performing
> an analysis and entering commands directly as opposed
> to writing a program although possibly the potential ambiguities
> overshadow the convenience.

in most cases, i do not see why one could not use a string literal
passed by value instead of having an expression deparsed within the
function, which may lead to confusing behaviour. this would give much
more consistent and predictable code. this has nothing to do with the
evaluation mechanism, which can still be lazy.

in the case of subset, i do not really see how this design might be
helpful, but it's easy to see how it can be harmful, examples have just
been given. the convenience here amounts at most to being able to omit
quotes, at the risk of having columns selected where they should not be,
and vice versa. the worst thing is that it destroys the benefit of
lexical scoping:

  subset(d, select=group)

did the programmer intend to select the column named 'group'? or the
columns whose names appear in the vector group? is d supposed not to
have a column named 'group'? should one change the identifier if d does
have such a column, to avoid selecting that column instead of whatever
else would be selected? etc. could this not be written as

  subset(d, select="group")

(two extra characters), and have it cleanly and always mean 'pick the
one column named 'group''?

so there are actually three problems here:
- one that a programmer may be unaware that her own code may not do what
  she wants;
- another that a user may be unaware that the code she uses performs
  this way;
- another that a user may not be sure whether the code may be reused as
  is, or must be modified so as not to interfere with the particular
  data.

the dependence of subset's behaviour on the particular data it is
applied to is confusing. and here's an example of how it breaks its own
smart semantics:

  d = data.frame(a=1)
  d$`c(a,b)` = 2
  d
  # no problem, two columns
  names(d)
  # one named 'c(a,b)'
  subset(d, select=c(a,b))
  # so what?

the expression given to select certainly is a valid and actual name of a
column in d, but subset complains there's no such column (well, it
actually says object "b" not found, by which it probably means that
object b, i.e., object named 'b', has not been found. not only
uninformative as a message in this situation, but also revealing the
pervasive confusion of the name and the named, as the object "b" -- a
one-character string -- has not been mentioned here at all. what a
mess.)

this can't possibly be considered good design, can it? the dubious
benefit is heavily outweighed by the drawbacks.

vQ
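The data-dependent lookup described above can be reproduced with a few lines. The helper below is a simplified sketch of the idea (evaluate the unevaluated select expression against a list of column positions, falling back to the caller's variables); pick_cols is a hypothetical name and this is not presented as the base R source.

```r
# Hypothetical helper imitating the lookup order discussed above.
pick_cols <- function(x, select) {
  # Map each column name to its position: list(a = 1, b = 2, ...).
  positions <- as.list(seq_along(x))
  names(positions) <- names(x)
  # substitute() captures the expression as typed by the caller; eval()
  # then tries column names first and the caller's variables second.
  vars <- eval(substitute(select), positions, parent.frame())
  x[, vars, drop = FALSE]
}

d <- data.frame(a = 1, b = 2)
a <- c("a", "b")
z <- a

via_z  <- names(pick_cols(d, z))        # z is not a column: variable z is used
via_a  <- names(pick_cols(d, a))        # a is a column: variable a is ignored
via_ab <- names(pick_cols(d, c(a, b)))  # both symbols resolve to columns

print(via_z)
print(via_a)
print(via_ab)
```

Although a and z hold the same value, the two calls give different results, which is the point of the complaint: the meaning of the expression depends on which column names the data frame happens to have.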
Re: [R] Variable passed to function not used in function in select=... in subset
Forgot the name part. Try:

  TestFunc2 <- function(DF, group) names(DF[group])
  TestFunc3 <- function(...) names(subset(..., subset = TRUE))
  TestFunc4 <- function(...) eval.parent(names(subset(..., subset = TRUE)))

  # e.g.
  df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)
  TestFunc2(df1, c("group", "visit"))
  TestFunc3(df1, c("group", "visit"))
  TestFunc4(df1, c("group", "visit"))
  TestFunc4(df1, c(group, visit)) # this works too

On Mon, Nov 10, 2008 at 10:43 AM, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
> Here are a few things to try:
>
>   TestFunc1 <- get("[")
>
>   TestFunc2 <- function(DF, group) DF[group]
>
>   TestFunc3 <- function(...) subset(..., subset = TRUE)
>
> On Mon, Nov 10, 2008 at 10:18 AM, Karl Knoblick <[EMAIL PROTECTED]> wrote:
>> Hello!
>>
>> I have the problem that in my function the passed variable is not used, but
>> the variable name of the dataframe itself - difficult to explain, but an
>> easy example:
>>
>> TestFunc<-function(df, group) {
>>   print(names(subset(df, select=group)))
>> }
>> df1<-data.frame(group="G1", visit="V1", value=0.9)
>> TestFunc(df1, c("group", "visit"))
>>
>> Result:
>> [1] "group"
>>
>> But I expected and want to have [1] "group" "visit" as result! Does anybody
>> know how to get this result?
>>
>> Thanks!
>> Karl
Re: [R] Variable passed to function not used in function in select=... in subset
Certainly this has been recognized as a potential problem:

  http://developer.r-project.org/nonstandard-eval.pdf

however, it is convenient when you are performing an analysis and
entering commands directly as opposed to writing a program although
possibly the potential ambiguities overshadow the convenience.

On Mon, Nov 10, 2008 at 2:04 PM, Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:
> pardon me, but does this address in any way the legitimate complaint of
> the rightfully confused user?
> [...]
Re: [R] Variable passed to function not used in function in select=... in subset
pardon me, but does this address in any way the legitimate complaint of the rightfully confused user? consider the following: d = data.frame(a=1, b=2) a = c("a", "b") z = a # that is, both a and z are c("a", "b") subset(d, select=z) # gives two columns, since z is a two element vector whose elements are valid column names subset(d, select=a) # gives one column, since 'a' (but not a) is a valid column name subset(d, select=c(a,b)) # gives two columns this is certainly what the authors intended, and they may have good grounds for this smart design. but this must break the expectation of a naive (r-naive, for that matter) user, who may otherwise have excellent experience in using a functional programming language, e.g., scheme. (especially scheme, where symbols and expressions are first-class objects, yet the distinction between a symbol or an expression and their referent is made painfully clear, perhaps except for when one hacks with macros.) the examples above illustrate the notorious problem with r that one can never tell whether 'a' means "the value referred to with the identifier 'a'" or "the symbol 'a'", unless one gets ugly surprises and is forced to study the documentation. and even then one may not get a clear answer. the example given by the confused user is a red flag warning. it's a typical abstraction where a nested sequence of operations (here print over names over subset) is abstracted into a single procedure, which can be called with whatever arguments are valid: pns = function(d, g) print(names(subset(d, select=g))) what sane person, without carefully studying the gory details of subset, will ever expect that if the first argument happens to have a column named 'g', only this one will be selected, while if it doesn't, subset will select the columns named by the components of what 'g' evaluates to. 
i wonder how many users have *not* noticed that what they get is not what they assume they get because of such tricky tricks, and in consequence were not able to publish their analyses (or worse, have published them). what is scary is that this may happen with about any other function in r, because the design is pervasive. no one should ever use any r function without first carefully reading the docs (which is not guaranteed to help) or trying it first on a number of carefully crafted test cases. if such care is not taken, results obtained with r cannot be taken seriously.

vQ

Gabor Grothendieck wrote:
> Forgot the name part. Try:
>
> TestFunc2 <- function(DF, group) names(DF[group])
> TestFunc3 <- function(...) names(subset(..., subset = TRUE))
> TestFunc4 <- function(...) eval.parent(names(subset(..., subset = TRUE)))
>
> # e.g.
> df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)
> TestFunc2(df1, c("group", "visit"))
> TestFunc3(df1, c("group", "visit"))
> TestFunc4(df1, c("group", "visit"))
> TestFunc4(df1, c(group, visit)) # this works too
>
> On Mon, Nov 10, 2008 at 10:43 AM, Gabor Grothendieck
> <[EMAIL PROTECTED]> wrote:
>
>> Here are a few things to try:
>>
>> TestFunc1 <- get("[")
>>
>> TestFunc2 <- function(DF, group) DF[group]
>>
>> TestFunc3 <- function(...) subset(..., subset = TRUE)
>>
>> On Mon, Nov 10, 2008 at 10:18 AM, Karl Knoblick <[EMAIL PROTECTED]> wrote:
>>
>>> Hello!
>>>
>>> I have the problem that in my function the passed variable is not used, but
>>> the variable name of the dataframe itself - difficult to explain, but an
>>> easy example:
>>>
>>> TestFunc<-function(df, group) {
>>> print(names(subset(df, select=group)))
>>> }
>>> df1<-data.frame(group="G1", visit="V1", value=0.9)
>>> TestFunc(df1, c("group", "visit"))
>>>
>>> Result:
>>> [1] "group"
>>>
>>> But I expected and want to have [1] "group" "visit" as result! Does anybody
>>> know how to get this result?
>>>
>>> Thanks!
>>> Karl
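The behaviour discussed in the message above can be reproduced with a short sketch. The helper select_cols() is hypothetical: it is a simplified illustration of how an unevaluated select expression can be resolved against column names first and the caller's environment second, and it should not be read as the actual source of subset(). The wrapper at the end is the pns function from the message with the print() call left out.

```r
# Simplified sketch of non-standard evaluation of `select`.
# select_cols() is a hypothetical helper, not the real subset() code.
select_cols <- function(x, select) {
  # Map every column name to its position, e.g. list(a = 1L, b = 2L)
  nl <- as.list(seq_along(x))
  names(nl) <- names(x)
  # Evaluate the unevaluated expression: column names are looked up
  # first, and only then the environment of the caller.
  vars <- eval(substitute(select), nl, parent.frame())
  x[, vars, drop = FALSE]
}

d <- data.frame(a = 1, b = 2)
a <- c("a", "b")
z <- a

names(select_cols(d, z))        # z is not a column, so its value is used
names(select_cols(d, a))        # a is a column, so it means column 1
names(select_cols(d, c(a, b)))  # both symbols resolve to column positions

# The wrapper from the message: the result depends on whether the data
# frame happens to contain a column called g.
pns <- function(d, g) names(subset(d, select = g))
pns(data.frame(a = 1, b = 2), c("a", "b"))         # "a" "b"
pns(data.frame(g = 1, a = 2, b = 3), c("a", "b"))  # "g"
```

The last two calls pass the same second argument and differ only in the column names of the data frame, which is the surprise the original poster ran into.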
Re: [R] Variable passed to function not used in function in select=... in subset
On 11/10/2008 10:18 AM, Karl Knoblick wrote:
> Hello!
>
> I have the problem that in my function the passed variable is not used, but
> the variable name of the dataframe itself - difficult to explain, but an
> easy example:
>
> TestFunc<-function(df, group) {
> print(names(subset(df, select=group)))
> }
> df1<-data.frame(group="G1", visit="V1", value=0.9)
> TestFunc(df1, c("group", "visit"))
>
> Result:
> [1] "group"
>
> But I expected and want to have [1] "group" "visit" as result! Does anybody
> know how to get this result?

Don't use subset. You can get what you want using

    print(names(df[,group]))

Or alternatively, you can force group to be found in the right place in this way:

    e <- environment()
    print(names(subset(df, select=e$group)))
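As a minimal sketch, the two suggestions above can be wrapped into complete functions and run against the data frame from the question. The function names TestFuncIndex and TestFuncEnv are made up for this illustration, and drop = FALSE is an addition that is not in the reply: it keeps the result a data frame when only one column is requested.

```r
df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)

# Plain indexing: `group` is an ordinary argument here, so its value is
# used. drop = FALSE (an addition) avoids collapsing a single column
# to a vector, for which names() would return NULL.
TestFuncIndex <- function(df, group) names(df[, group, drop = FALSE])

# Qualifying the name with the function's own environment: the
# expression e$group cannot be mistaken for the column `group`.
TestFuncEnv <- function(df, group) {
  e <- environment()
  names(subset(df, select = e$group))
}

TestFuncIndex(df1, c("group", "visit"))  # "group" "visit"
TestFuncEnv(df1, c("group", "visit"))    # "group" "visit"
TestFuncIndex(df1, "visit")              # "visit"
```

The second variant still relies on the data frame having no column called e, so the indexing form is probably the more robust of the two inside a function.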
Re: [R] Variable passed to function not used in function in select=... in subset
Here are a few things to try:

    TestFunc1 <- get("[")

    TestFunc2 <- function(DF, group) DF[group]

    TestFunc3 <- function(...) subset(..., subset = TRUE)

On Mon, Nov 10, 2008 at 10:18 AM, Karl Knoblick <[EMAIL PROTECTED]> wrote:
> Hello!
>
> I have the problem that in my function the passed variable is not used, but
> the variable name of the dataframe itself - difficult to explain, but an easy
> example:
>
> TestFunc<-function(df, group) {
> print(names(subset(df, select=group)))
> }
> df1<-data.frame(group="G1", visit="V1", value=0.9)
> TestFunc(df1, c("group", "visit"))
>
> Result:
> [1] "group"
>
> But I expected and want to have [1] "group" "visit" as result! Does anybody
> know how to get this result?
>
> Thanks!
> Karl
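A possible reading of why the third suggestion sidesteps the problem: the wrapper has no argument named group, so the caller's expression travels through ... and reaches subset() unchanged. The sketch below uses a made-up name, TestFuncDots, and adds names() so the output matches what the question asked for.

```r
df1 <- data.frame(group = "G1", visit = "V1", value = 0.9)

# No argument of the wrapper is called `group`; the caller's expression
# is forwarded through ... and subset() sees it exactly as written.
# subset = TRUE keeps all rows, so the forwarded expression is matched
# to the `select` argument.
TestFuncDots <- function(...) names(subset(..., subset = TRUE))

TestFuncDots(df1, c("group", "visit"))  # character vector of column names
TestFuncDots(df1, c(group, visit))      # bare column names work as well
```

The trade-off is that the wrapper can no longer name or inspect its arguments, so this form suits thin pass-through functions best.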
Re: [R] Variable passed to function not used in function in select=... in subset
Try this:

    TestFunc <- function(df, group) {
      return(names(eval(bquote(subset(df, select = .(group))))))
    }

On Mon, Nov 10, 2008 at 1:18 PM, Karl Knoblick <[EMAIL PROTECTED]> wrote:
> Hello!
>
> I have the problem that in my function the passed variable is not used, but
> the variable name of the dataframe itself - difficult to explain, but an
> easy example:
>
> TestFunc<-function(df, group) {
> print(names(subset(df, select=group)))
> }
> df1<-data.frame(group="G1", visit="V1", value=0.9)
> TestFunc(df1, c("group", "visit"))
>
> Result:
> [1] "group"
>
> But I expected and want to have [1] "group" "visit" as result! Does anybody
> know how to get this result?
>
> Thanks!
> Karl

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
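To make the bquote() idea easier to follow, here is a small sketch that builds the call in a separate step before evaluating it. The name TestFuncBquote and the intermediate variable are illustrative only. The point is that .(group) is replaced by the value of group before subset() ever sees the expression, so the column named group no longer gets in the way.

```r
TestFuncBquote <- function(df, group) {
  # .(group) is replaced by the *value* of `group`, giving a call such as
  # subset(df, select = c("group", "visit"))
  call <- bquote(subset(df, select = .(group)))
  # eval() runs the call in this function's frame, where `df` is defined
  names(eval(call))
}

dat <- data.frame(group = "G1", visit = "V1", value = 0.9)
TestFuncBquote(dat, c("group", "visit"))  # "group" "visit"
TestFuncBquote(dat, "value")              # "value"
```

Because the call refers to the argument df and not to a global object, the function gives the same answer whatever the data frame is called at the top level.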