Gavin Simpson wrote:
>
>> d = data.frame(a = 1)
>> d$`-b` = 2
>> names(d)
>> # here we go
>>
>> subset(d, select = -b)
>> # to b or not to b?
>>
>
> but -b is not the name of the column; you explicitly called it `-b` and
> you should refer to it as such. If you use "non-standard" names then
> expect to do a bit more work.
>

identical(names(d)[2], "-b")

if i do d$`c` = 4 then you claim d has no column named 'c'? do i have
to refer to the c column as `c`?

>
>> subset(d, select = `-b`)
>>
>   -b
> 1  2
>

... and i have to use subset(d, select = `a`) and not
subset(d, select = a), right?

besides, subset(d, select = `-b`) should rather return the column(s)
whose names are the value of the variable `-b`:

`-a` = "a"
subset(d, select = `-a`)
# returns the column named 'a' (the value of the variable `-a`),
# rather than a column named '-a' -- but that's just because there is
# no such column in d; if there were, that one would be returned.

so even with backquotes used, there is no obvious interpretation of
what select=`-b` should mean, because it depends on what names the
components of the first argument have. and this breaks the concept of
referential transparency. so the problem is not so easily explained
away. what subset does *is* messy.

>> subset(d, select = - `-b`)
>>
>   a
> 1 1
>
>
>> b = "a"
>> subset(d, select = -b)
>> # tragedy
>>
>
> For this, I interpret it as not finding a column named b so tries to
> evaluate:
>
>

you interpret it. how obvious is this for most users? it tries to find
a column named 'b', not a column named b. that's the problem with
subset.

>> b = "a"
>> `-`(b)
>>
> Error in -b : invalid argument to unary operator
>
> `-` is a function remember.
>
> If you want this to work you can use get()
>
>
>> subset(d, select = - get(b))
>>
>   -b
> 1  2
>
>

"use this hack to get around the design."

>> d$b = 3
>> subset(d, select = -b)
>> # catharsis
>>
>> (for whatever reason a user may choose to have a column named '-b')
>>
>
> Yes, but the user is warned about not using standard naming conventions
> in the Introduction to R manual. You aren't stopped from using names
> like `-b` but if you use them, you have to expect to work a little
> harder.
>

i'd like you to point me to that warning, as i apparently need to read
it, but i haven't found it in the manual yet. thanks.
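
for what it's worth, a minimal sketch of one way to sidestep the
ambiguity discussed above: plain `[` indexing takes column names as
ordinary character strings, so the result does not depend on which
variables happen to exist. the data frame mirrors the one in the
thread; via_subset, keep and drop_a are names made up for this
illustration.

```r
# Same data frame as in the thread: one ordinary name, one non-standard name.
d <- data.frame(a = 1)
d$`-b` <- 2

# subset() with a backquoted name selects the column literally named "-b".
via_subset <- subset(d, select = `-b`)

# Standard indexing: the name is an ordinary string, no special evaluation.
keep <- d[, "-b", drop = FALSE]

# Dropping a column by name, again using plain strings only.
drop_a <- d[, setdiff(names(d), "a"), drop = FALSE]
```

this does not make subset() any less surprising; it only shows that the
string-based route gives the same columns without relying on how the
select expression is evaluated.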

> Reading ?subset we have:
>
>   select: expression, indicating columns to select from a data frame.
>
> ....
>
>      For data frames, the 'subset' argument works on the rows. Note
>      that 'subset' will be evaluated in the data frame, so columns can
>      be referred to (by name) as variables in the expression (see the
>      examples).
>
> which I think is reasonably explicit is it not?

explicit about what? it says nothing about how the expression passed as
the select argument is treated. it just says that the select argument
is an expression indicating columns (but how?), and then, in the middle
of explaining the subset parameter, it mentions that columns can be
referred to by name as variables in the expression. how clear is this?
the following does not work -- i'd expect it to, by virtue of the clear
explanation:

d = data.frame(a=1, b=2)
subset(d, select=c(a, "b"))
# what?? it does not break any 'specification' given in the docs

> It explains why your
> second example fails and why '- get(b)' doesn't, and also why your other
> examples don't give you what you want. You aren't using the appropriate
> 'name'.
>

that's still too confusing. ?get:

get(x, ...)
x: a variable name (given as a character string)

so:

get("b") # "a", because we get the variable b, whose value is "a"
get(b) # variable "a" not found

in '-get(b)', get(b) should evaluate to the value of the variable named
in b; b is "a", so get should look up the value of the variable a, but
there is none (unless you defined it), so this should break. instead,
'get(b)' is replaced with 'a', and '-a' in subset(d, select=-a) is not
treated as an application of the function `-` to the variable a, but
literally as the specification 'all but the column named a'. it must
be painfully obvious to a casual user.

> I'm sure we could all find aspects of R that don't work in exactly the
> way we might preconceive or think of as being intuitive.

most of it, seems like.
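
a rough sketch of the mechanism, for anyone trying to follow the
select and get() discussion above. as far as i can tell from the source
of subset.data.frame, the select expression is evaluated in a list that
maps each column name to its position, with the caller's environment as
a fallback. the snippet imitates that by hand; nl and pick are
illustrative names, not part of any api, and the real function may
differ in detail.

```r
d <- data.frame(a = 1, b = 2)

# Map column names to their positions: list(a = 1L, b = 2L).
nl <- as.list(seq_along(d))
names(nl) <- names(d)

# Hypothetical helper: evaluate an unevaluated selection against that list,
# falling back to the caller's environment for names that are not columns.
pick <- function(expr) eval(expr, nl, parent.frame())

neg_a <- pick(quote(-a))         # `-` applied to the position of column a
both  <- pick(quote(c(a, b)))    # positions of columns a and b
mixed <- pick(quote(c(a, "b")))  # the position is coerced to the string "1"
```

under this reading, select = -a becomes a negative index, and mixing a
bare name with a string, as in c(a, "b"), ends up asking for columns
named "1" and "b", which would explain why that call fails. the same
lookup would presumably explain why '- get(b)' finds the position of
column a rather than failing.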

> But if it works
> as documented

in many cases, the documentation is insufficient, confusing, and
unhelpful when it comes to this sort of what you might call
'optimizations'.

> then I don't see what the problem is unless i) you are
> offering to rewrite the code to make it "work better", ii) that R Core
> thinks any proposal "works better" and iii) in doing so it doesn't break
> most of the R code out there in R itself or in add-on packages.
>

i'd prefer r to work better rather than "work better". i'm afraid that
serious improvements to r must, by necessity, break quite a lot of
earlier code, which exploits, if only due to the impossibility of not
doing so, such design. it certainly is a good idea to offer to
contribute and i'd be happy to do so, but i wouldn't be given a chance,
i suppose. besides, i try not to imagine what hides under the surface
of a language with such a design.

vQ

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help