On Tue, Jan 25, 2005 at 10:43:24AM -0500, Liaw, Andy wrote:
> > From: Göran Broström
> >
> > I have a data frame containing children, with variables 'year' = birth
> > year, and 'm.id' = mother's id number. Let's assume that all the births of
> > each mother are represented in the data frame.
> >
> > Now I want to create a subset of this data frame containing all children
> > whose mother's first birth was in the year 1816 or later. This seems to
> > work:
> >
> > mid <- tapply(dat$year, dat$m.id, min)
> > mid <- as.numeric(names(mid)[mid >= 1816])
> > dat <- dat[dat$m.id %in% mid, ]
> >
> > but I'm worried about the second line, because the output from 'tapply'
> > isn't documented to have a 'dimnames' attribute (although it has one, at
> > least in R-2.1.0, 2005-01-19). Another aspect is that this code relies on
> > m.id being numeric; I would have to change it if the type of m.id changes
> > to, e.g., character.
> >
> > So, question: Is there a better way of doing this?
>
> Would this work?
>
> dat <- dat[ave(dat$year, dat$m.id, min) >= 1816, ]
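To make the tapply route concrete, here is a minimal sketch on a small made-up data frame (the values in `dat` are hypothetical). It keeps Göran's three steps but compares the group names against `as.character(dat$m.id)` instead of converting the names back with `as.numeric`, so it no longer depends on `m.id` being numeric. It still relies on `tapply` returning a vector named by the groups, which is exactly the point he was unsure about.

```r
# Hypothetical toy data: four mothers with one or two births each
dat <- data.frame(
  year = c(1810, 1818, 1816, 1820, 1815, 1830, 1817),
  m.id = c(1, 1, 2, 2, 3, 3, 4)
)

# Earliest birth year per mother; the result is named by the m.id values
first <- tapply(dat$year, dat$m.id, min)

# Ids (as character strings) of mothers whose first birth was in 1816 or later
keep <- names(first)[first >= 1816]

# Compare as character so the same line works for numeric or character ids
sub.tapply <- dat[as.character(dat$m.id) %in% keep, ]
print(sub.tapply)
```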
Yes, but you (or I) need

> dat <- dat[ave(dat$year, dat$m.id, FUN = min) >= 1816, ]
                                     ^^^^^

(took me some time to figure out), because

?ave
Usage:
     ave(x, ..., FUN = mean)

Thanks Andy for giving me 'ave'! And thanks to Dimitris for his suggestion.

Göran

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
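The corrected call can be checked on a small data frame with hypothetical values. Because `FUN` comes after `...` in `ave(x, ..., FUN = mean)`, it has to be named: an unnamed `min` is collected by `...` and treated as a second grouping variable rather than as the function to apply. `ave` returns a vector the same length as `x`, with each element replaced by its group's minimum, so it can be compared with 1816 directly as a row filter, and nothing depends on the type of `m.id`.

```r
# Hypothetical toy data: four mothers with one or two births each
dat <- data.frame(
  year = c(1810, 1818, 1816, 1820, 1815, 1830, 1817),
  m.id = c(1, 1, 2, 2, 3, 3, 4)
)

# Each child's entry is replaced by its mother's earliest birth year.
# Writing ave(dat$year, dat$m.id, min) would pass min into '...' as an
# extra grouping variable instead of using it as FUN.
first.birth <- ave(dat$year, dat$m.id, FUN = min)
print(first.birth)  # 1810 1810 1816 1816 1815 1815 1817

sub.ave <- dat[first.birth >= 1816, ]

# Same filter with character ids: ave only uses m.id to form the groups
dat.chr <- transform(dat, m.id = paste0("mother", m.id))
sub.chr <- dat.chr[ave(dat.chr$year, dat.chr$m.id, FUN = min) >= 1816, ]
```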