Re: [R] dropping rows

Richard A. O'Keefe Thu, 02 Dec 2004 15:07:50 -0800

Douglas Bates <[EMAIL PROTECTED]> wrote:
        In R this is called subsetting and the simplest way to do this
        is with the subset function.
        
        older <- subset(master, year < 1960)
        
I'm not sure that it's the "simplest".
Since rows for year < 1960 were to be dropped,
I'd say the _simplest_ way to do it is one which exploits
a primitive feature of R:


    master[master$year >= 1960,]
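
As a small illustration, here is a sketch on a made-up data frame. The names
master and year follow the thread; the id column and all the values are
invented. It also shows one place where the two forms are not
interchangeable: if year contains an NA, logical indexing returns an all-NA
row for it, while subset() drops that row.

```r
# Toy data; the values and the id column are invented for illustration.
master <- data.frame(id = 1:5,
                     year = c(1955, 1960, NA, 1972, 1948))

# Logical indexing: an NA in the condition yields an all-NA row.
by_index <- master[master$year >= 1960, ]

# subset(): rows where the condition is NA are dropped.
by_subset <- subset(master, year >= 1960)

# To get the same rows with "[", exclude the NAs explicitly.
by_index_no_na <- master[!is.na(master$year) & master$year >= 1960, ]
```

So "simplest" comes with a caveat: with missing values in the indexing
column, the bare "[" form needs the explicit !is.na() test.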

For me, the fact that the 'subset' argument of subset() is evaluated
in the scope of the data frame makes subset() quite a complicated way
to do things.  It's certainly something I'd hesitate to use inside a
function which might be given a data frame without knowing _exactly_
which column names were going to be in scope for the 2nd argument.
The fact that the 'subset' argument is *not* evaluated in the scope
of the 1st argument when that argument is not a data frame (a plain
vector, say) also makes subset() a somewhat confusing function,
compared with simple logical indexing.
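
To make the worry about functions concrete, here is a minimal sketch. The
helpers keep_from() and keep_from_index() and both data frames are
hypothetical, made up for this example. The only difference between the
two data frames is that one happens to have a column named cutoff, which
is also the name of the function's argument.

```r
# Hypothetical helper built on subset(): keep rows from `cutoff` onwards.
keep_from <- function(df, cutoff) {
  subset(df, year >= cutoff)
}

# The same helper built on plain logical indexing.
keep_from_index <- function(df, cutoff) {
  df[df$year >= cutoff, ]
}

plain <- data.frame(year = c(1950, 1965, 1980), value = 1:3)
# Same years, but with a column that happens to be called `cutoff`.
clash <- data.frame(year = c(1950, 1965, 1980), cutoff = c(2000, 2000, 2000))

n_plain <- nrow(keep_from(plain, 1960))        # 2: cutoff is the argument
n_clash <- nrow(keep_from(clash, 1960))        # 0: cutoff is now the column
n_index <- nrow(keep_from_index(clash, 1960))  # 2: always the argument
```

Inside subset() the column shadows the argument without any warning, which
is the reason for hesitating to use it on a data frame whose column names
are not known in advance.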

Strengths of subset() include
 - you can select which columns you want, either instead of choosing
   a subset of rows or at the same time (but you can do this with
   indexing too)
 - its drop= argument defaults to FALSE, whereas the drop= argument of
   "[" generally defaults to TRUE
   (but this is not a problem when indexing the rows of a data frame
    with more than one column, where master[master$year == 1960,] will
    give you a data frame even if there is exactly one row with year 1960)
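
Both points can be checked on a toy data frame. This is only a sketch: the
id and score columns and all the values are invented.

```r
# Toy data, invented for illustration.
master <- data.frame(id = 1:4,
                     year = c(1955, 1960, 1972, 1981),
                     score = c(2.5, 3.1, 4.0, 1.8))

# Rows and columns at once, both ways.
a <- subset(master, year >= 1960, select = c(id, score))
b <- master[master$year >= 1960, c("id", "score")]

# Exactly one matching row still gives a data frame.
one_row <- master[master$year == 1960, ]

# Selecting a single column is where drop= makes a difference.
v <- master[master$year >= 1960, "score"]                # numeric vector
d <- master[master$year >= 1960, "score", drop = FALSE]  # data frame
s <- subset(master, year >= 1960, select = score)        # data frame
```

With "[" it is the single-column case, not the single-row case, that needs
an explicit drop = FALSE to keep the result a data frame.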

I would suggest that people who aren't yet thoroughly familiar with
what a simple "[" can do should add subset() to the list of things to
learn about _after_ they've done learning about "[".  On second thoughts,
maybe looking at the implementation of subset.default and subset.data.frame
would be helpful in learning about "[".
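
For anyone who wants the gist before reading the real thing, here is a
rough, stripped-down imitation of the data frame method. my_subset() is a
hypothetical name and this is not R's own code; it leaves out column
selection altogether. The real source can be seen by typing
subset.data.frame at the R prompt.

```r
# my_subset() is a hypothetical, simplified imitation, not R's own code.
my_subset <- function(x, condition) {
  # Capture the condition unevaluated...
  e <- substitute(condition)
  # ...then evaluate it with the columns of x in scope, falling back to
  # the caller's environment for anything that is not a column.
  r <- eval(e, x, parent.frame())
  if (!is.logical(r)) stop("condition must be logical")
  # Treat NA as FALSE, then hand over to ordinary "[" indexing.
  x[r & !is.na(r), , drop = FALSE]
}

# Toy data, invented for illustration.
master <- data.frame(id = 1:4, year = c(1955, 1960, NA, 1972))
kept <- my_subset(master, year >= 1960)
```

Everything after the eval() call is plain "[" indexing, so the function
adds the scoping trick and the NA handling and nothing else.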

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
