Another strategy is to sort by month, id, and, in reverse order, distance and select the rows that start each month/id run. This can be much faster than the other ways when there are lots of month/id combinations.
f1 <- function (DATA) { stopifnot(is.data.frame(DATA), all(c("distance", "id", "month") %in% names(DATA))) DATA <- DATA[order(DATA$month, DATA$id, -DATA$distance), ] ldiff <- function(x) c(TRUE, x[-1] != x[-length(x)]) DATA[ldiff(DATA$month) | ldiff(DATA$id), ] } Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -----Original Message----- > From: Peter Ehlers [mailto:ehl...@ucalgary.ca] > Sent: Friday, August 24, 2012 10:51 AM > To: William Dunlap > Cc: Peter Alspach; rjb; r-help@r-project.org > Subject: Re: [R] Extracting data from dataframe with tied rows > > Here's another pretty straightforward solution, using the plyr pkg: > > DF <- data.frame(id, month, distance, bearing) > # variables as defined in the OP > > require(plyr) > DF1<-ddply(DF, .(id,month), summarize, > maxdist = max(distance), > maxbearing = bearing[which.max(distance)]) > > Peter Ehlers > > On 2012-08-24 09:54, William Dunlap wrote: > > Or use ave() to compute the within-group ranks (reversed, so max has rank > > 1) and > select > > the elements whose ranks are 1: > > f2 <- function (DATA) > > { > > stopifnot(is.data.frame(DATA), all(c("distance", "id", "month") %in% > > names(DATA))) > > revRanks <- ave(DATA[["distance"]], DATA[["id"]], DATA[["month"]], > > FUN = function(x) rank(-x, ties = "first")) > > DATA[revRanks == 1, ] > > } > > > > Bill Dunlap > > Spotfire, TIBCO Software > > wdunlap tibco.com > > > > > >> -----Original Message----- > >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On > Behalf > >> Of Peter Alspach > >> Sent: Thursday, August 23, 2012 4:37 PM > >> To: rjb; r-help@r-project.org > >> Subject: Re: [R] Extracting data from dataframe with tied rows > >> > >> Tena koe John > >> > >> One way: > >> > >> johnData <- data.frame(id=rep(LETTERS[1:5],20), distance=rnorm(1:100, mean > >> = 100), > >> bearing=sample(1:360,100,replace=T), month=sample(1:12,100,replace=T)) > >> johnAgg <- aggregate(johnData[,'distance'], johnData[,c('id','month')], > >> max) > >> names(johnAgg)[3] <- 'distance' > >> merge(johnAgg, johnData) > >> > >> HTH .... > >> > >> Peter Alspach > >> > >> -----Original Message----- > >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On > Behalf > >> Of rjb > >> Sent: Friday, 24 August 2012 9:19 a.m. > >> To: r-help@r-project.org > >> Subject: [R] Extracting data from dataframe with tied rows > >> > >> Hi R help, > >> > >> I'm a fairly experienced R user but this manipulation has me stumped, > >> please > >> help: > >> > >> DATA > >> id<-rep(LETTERS[1:5],20) > >> distance<-rnorm(1:100, mean = 100) > >> bearing<-sample(1:360,100,replace=T) > >> month<-sample(1:12,100,replace=T) > >> > >> I have a dataset with records of individuals (id) , each with a distance > >> (distance) & direction (bearing) recorded for each month (month). > >> I want to find the largest distance per individual per month, which is easy > >> with /tapply/ or /melt/cast (reshape)/, > >> head(DATA_m<-melt(DATA,id=c("id","month"))) > >> cast(DATA_m,id+month~.,max) > >> OR > >> na.omit(melt(tapply(distance,list(id,month),max))) > >> > >> *BUT THE CATCH IS* , > >> I also want the the *corresponding* bearing for that maximum distance per > >> month. I've tried the steps above plus using which.max() and loops, but > >> can't solve the problem. The real dataset is about 6000 rows. > >> > >> I'm guessing the answer is in finding the row number from the original DATA > >> but I can't figure how to do that with tapply or melt. > >> > >> Any suggestions would be greatly appreciated. > >> > >> John Burnside > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.