On 12/12/2008 11:38 AM, hadley wickham wrote:
On Fri, Dec 12, 2008 at 8:41 AM, Duncan Murdoch <murd...@stats.uwo.ca> wrote:
On 12/12/2008 8:25 AM, hadley wickham wrote:

From which you might conclude that I don't like the design of subset, and
you'd be right.  However, I don't think this is a counterexample to my
general rule.  In the subset function, the select argument is treated as an
unevaluated expression, and then there are rules about what to do with it
(i.e. try to look up name `a` in the data frame, if that fails, ...).
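For readers unfamiliar with the mechanism, here is a minimal sketch of treating select as an unevaluated expression. It roughly mirrors what base R's subset.data.frame does: bind each column name to its position, then evaluate the captured expression. The helper name select_cols() is hypothetical.

```r
# Hypothetical helper illustrating a subset()-style select argument.
select_cols <- function(df, select) {
  # Map each column name to its column number
  nl <- as.list(seq_along(df))
  names(nl) <- names(df)
  # Evaluate the captured expression with those bindings first; names not
  # found there are looked up in the caller's environment
  vars <- eval(substitute(select), nl, parent.frame())
  df[, vars, drop = FALSE]
}

df <- data.frame(a = 1:3, b = 4:6, c = 7:9)
select_cols(df, a:b)   # columns a and b
select_cols(df, -b)    # columns a and c
```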

For the requested behaviour to similarly fall within the general rule, we'd
have to treat all indices to all kinds of things (vectors, matrices,
data frames, etc.) as unevaluated expressions, with special handling for the
particular symbol `end`.
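Here is a sketch of that special handling for `end`. It assumes a hypothetical wrapper, index_with_end(), rather than a change to `[` itself. The index is captured unevaluated and `end` is bound to length(x). Matrices and data frames would need a separate `end` for each dimension, which is part of why the change would reach so far.

```r
# Hypothetical wrapper: evaluate the index expression with the symbol `end`
# bound to length(x), then subset x as usual.
index_with_end <- function(x, i) {
  i <- eval(substitute(i), list(end = length(x)), parent.frame())
  x[i]
}

v <- c(10, 20, 30, 40, 50, 60)
index_with_end(v, 5:end)     # 50 60
index_with_end(v, end - 1)   # 50
```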

Except you wouldn't necessarily have to change indexing - you could
change seq instead.  Then 5:end could produce some kind of special
data structure (maybe an iterator) that was recognised by the various
indexing functions.

Ummm, doesn't that require changes to *both* indexing and seq?

Oops, yes.  I meant it wouldn't require indexing to use unevaluated
expressions.
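Here is a minimal sketch of the seq-based alternative. It also shows why the idea needs changes in two places: the range constructor and the indexing function both have to know about the new structure. Because `:` is a primitive, the sketch uses a hypothetical `%to%` operator and a hypothetical sentinel, end_marker. It avoids the name `end` because stats already defines an end() generic. A hypothetical idx() wrapper plays the role of the modified indexing function.

```r
# Hypothetical sentinel standing in for "the last index"
end_marker <- structure(list(), class = "end_marker")

# Hypothetical replacement for `:`: when the upper bound is the sentinel,
# return an unresolved range object instead of an integer vector
`%to%` <- function(from, to) {
  if (inherits(to, "end_marker")) {
    structure(list(from = from), class = "open_range")
  } else {
    seq(from, to)
  }
}

# Hypothetical indexing wrapper: resolves open ranges against length(x),
# otherwise behaves like ordinary `[`
idx <- function(x, i) {
  if (inherits(i, "open_range")) {
    n <- length(x)
    i <- if (i$from <= n) seq(i$from, n) else integer(0)
  }
  x[i]
}

v <- letters[1:8]
idx(v, 5 %to% end_marker)   # "e" "f" "g" "h"
idx(v, 2 %to% 3)            # "b" "c"
```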

This would still be a lot of work for not a lot
of payoff, but it would be a logically consistent way of adding this
behaviour to indexing, and the basic work would make it possible to
develop other sorts of indexing, e.g. df[evens(), ] or
df[last(5), last(3)].
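The following makes df[evens(), ] and df[last(5), last(3)] concrete. It assumes each selector is a function of one dimension's extent that returns integer positions. evens(), last(), resolve() and sel2() are all hypothetical. sel2() stands in for a `[` method that resolves selectors separately against nrow() and ncol().

```r
# Hypothetical selectors: each returns a function that, given the extent n
# of a dimension, returns the integer positions to keep
evens <- function() {
  structure(function(n) which(seq_len(n) %% 2 == 0), class = "selector")
}
last <- function(k) {
  structure(function(n) which(seq_len(n) > n - k), class = "selector")
}

# Resolve a selector against an extent; leave other index types unchanged
resolve <- function(i, n) if (inherits(i, "selector")) i(n) else i

# Hypothetical two-dimensional wrapper standing in for df[i, j]
sel2 <- function(df, i, j) {
  if (missing(i)) i <- seq_len(nrow(df))
  if (missing(j)) j <- seq_len(ncol(df))
  df[resolve(i, nrow(df)), resolve(j, ncol(df)), drop = FALSE]
}

df <- as.data.frame(matrix(1:60, nrow = 10))  # 10 rows, columns V1..V6
sel2(df, evens())            # rows 2, 4, 6, 8, 10; all columns
sel2(df, last(5), last(3))   # rows 6 to 10, columns V4, V5, V6
```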

I agree:  it would be a nice addition, but a fair bit of work.  I think it
would be quite doable for the indexable things in the base packages, but
there are a lot of contributed packages that define [ methods, and those
methods would all need to be modified too.

That's true, although I suspect many contributed [.methods eventually
delegate to base methods and might work without further modification.

(Just to be clear, when I say doable, I'm thinking that your iterators
return functions that compute subsets of index ranges.  For example, evens()
might be implemented as

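evens <- function() {
  # The iterator is a function that keeps only the even positions
  # from the vector of indices it is given
  result <- function(indices) {
    indices[indices %% 2 == 0]
  }
  class(result) <- "iterator"
  return(result)
}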

and then `[` in v[evens()] would recognize that it had been passed an
iterator, and would pass 1:length(v) to the iterator to get the subset of
even indices.  Is that what you had in mind?)
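Here is a runnable sketch of the dispatch step just described. Since `[` is not redefined here, a hypothetical subset_iter() wrapper stands in for it. It passes seq_along(v) rather than 1:length(v), so an empty v yields an empty index instead of c(1, 0).

```r
# The evens() iterator from above, repeated so this block runs on its own
evens <- function() {
  result <- function(indices) indices[indices %% 2 == 0]
  class(result) <- "iterator"
  result
}

# Hypothetical stand-in for the modified `[`: if the index is an
# "iterator", hand it the full index range of v and use what it returns
subset_iter <- function(v, i) {
  if (inherits(i, "iterator")) i <- i(seq_along(v))
  v[i]
}

v <- c(11, 12, 13, 14, 15)
subset_iter(v, evens())   # 12 14
```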

Yes, that's exactly what I was thinking, although you'd have to put
some thought into the conventions - would it be better to pass in the
length of the vector instead of a vector of indices?  Should all
iterators return logical vectors?  That way you could do x[evens() &
last(5)] to get the even indices out of the last 5, as opposed to
x[evens()][last(5)] which would return the last 5 even indices.
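Here is a sketch of the length-in, logical-out convention, showing that the two expressions return different results. `&` cannot combine two functions directly, so the sketch uses a hypothetical sel_and() combinator. evens(), last() and sel() are also hypothetical.

```r
# Hypothetical convention: a selector receives the length n and returns a
# logical vector of length n
evens <- function() structure(function(n) seq_len(n) %% 2 == 0, class = "selector")
last  <- function(k) structure(function(n) seq_len(n) > n - k, class = "selector")

# Combine two selectors elementwise; both are evaluated at the same n
sel_and <- function(s1, s2) {
  structure(function(n) s1(n) & s2(n), class = "selector")
}

# Hypothetical stand-in for the modified `[`
sel <- function(x, s) if (inherits(s, "selector")) x[s(length(x))] else x[s]

x <- 1:20
sel(x, sel_and(evens(), last(5)))   # even positions within the last 5: 16 18 20
sel(sel(x, evens()), last(5))       # last 5 of the even positions: 12 14 16 18 20
```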

Actually, I don't think so. "evens() & last(5)" would fail to evaluate,
because you're trying to do a logical combination of two functions, not of
two logical vectors. Or are we going to extend the logical operators to
work on iterators/selectors too?
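Here is one way the logical operators could be extended. It assumes selectors are classed functions of the length that return logical vectors. An S3 Ops group method for a hypothetical "selector" class returns a new selector, which applies the operator after both operands have been evaluated. Only selector operands are handled; mixing in a plain logical vector would need extra code.

```r
evens <- function() structure(function(n) seq_len(n) %% 2 == 0, class = "selector")
last  <- function(k) structure(function(n) seq_len(n) > n - k, class = "selector")

# S3 group method: an operator applied to selectors yields a new selector
# that applies the same operator to their logical results.
# Both operands are assumed to be selectors.
Ops.selector <- function(e1, e2) {
  op <- get(.Generic, mode = "function")
  if (missing(e2)) {
    # unary operator, e.g. !evens()
    structure(function(n) op(e1(n)), class = "selector")
  } else {
    structure(function(n) op(e1(n), e2(n)), class = "selector")
  }
}

# Hypothetical stand-in for the modified `[`
sel <- function(x, s) if (inherits(s, "selector")) x[s(length(x))] else x[s]

x <- 1:20
sel(x, evens() & last(5))      # 16 18 20
sel(x, (!evens()) & last(5))   # 17 19
```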

Duncan Murdoch

You could also imagine similar iterators for random sampling, like
samp(0.2) to choose 20% of the indices, or boot(0.8) to choose 80%
with replacement.  first(n) could also be useful, selecting the first
min(n, length(vector)) observations.  An iterator version of rev()
would also be handy.
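Here is a sketch of these selectors under the index-vector convention from evens() above. first(), samp(), boot() and rev_sel() are hypothetical; rev_sel() is named that way to avoid masking base rev(). The sel() wrapper is also hypothetical. The samplers call sample.int() rather than sample(), so a single candidate index is never mistaken for a population size.

```r
# Hypothetical selectors: each receives the candidate positions and
# returns the positions to keep
make_selector <- function(f) structure(f, class = "selector")

first   <- function(n) make_selector(function(idx) head(idx, n))
samp    <- function(p) make_selector(function(idx) {
  idx[sample.int(length(idx), round(p * length(idx)))]
})
boot    <- function(p) make_selector(function(idx) {
  idx[sample.int(length(idx), round(p * length(idx)), replace = TRUE)]
})
rev_sel <- function() make_selector(function(idx) rev(idx))

# Hypothetical stand-in for the modified `[`
sel <- function(x, s) if (inherits(s, "selector")) x[s(seq_along(x))] else x[s]

x <- 1:10
sel(x, first(3))    # 1 2 3
sel(x, first(50))   # all 10 elements, since head() stops at length(x)
sel(x, rev_sel())   # 10 9 8 ... 1
set.seed(1)
sel(x, samp(0.2))   # 2 distinct elements of x
sel(x, boot(0.8))   # 8 elements of x, possibly repeated
```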

Maybe selector would be a better name than iterator though, as these
don't have the same feel as iterators in other languages.

Hadley


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
