This might be quicker.

Para.5C.sorted <- Para.5C[order(Para.5C[, 1]), ]
Para.5C.final  <- Para.5C.sorted[!duplicated(Para.5C.sorted$REQ.NR), ]
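A minimal, self-contained check of the sort-then-deduplicate idea (the column name ENTRY.DATE and the sample values below are invented for illustration; only REQ.NR comes from the original post):

```r
## Toy data: duplicated tracking numbers entered on different dates.
Para.5C <- data.frame(
  ENTRY.DATE = as.Date(c("2012-09-01", "2012-09-03", "2012-09-02",
                         "2012-09-05", "2012-09-04")),
  REQ.NR     = c("A100", "A100", "B200", "B200", "C300")
)

## Sort by date (column 1), then keep the first -- i.e. oldest -- row
## for each tracking number. duplicated() marks all but the first
## occurrence, so negating it keeps exactly one row per REQ.NR.
Para.5C.sorted <- Para.5C[order(Para.5C[, 1]), ]
Para.5C.final  <- Para.5C.sorted[!duplicated(Para.5C.sorted$REQ.NR), ]
Para.5C.final
## Each REQ.NR now appears once, paired with its earliest ENTRY.DATE.
```

This is vectorized, so it avoids the per-tracking-number subset() and rbind() calls that make the loop version slow.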
If your data are already sorted by date, then you can skip the first step and just run

Para.5C.final <- Para.5C[!duplicated(Para.5C$REQ.NR), ]

Jean

wwreith <reith_will...@bah.com> wrote on 09/26/2012 10:19:21 AM:
>
> I have several thousand rows of shipment data imported into R as a data
> frame, with two columns of particular interest: col 1 is the entry date, and
> col 2 is the tracking number (colname is REQ.NR). Tracking numbers should be
> unique but on occasion aren't, because they get entered more than once. This
> creates two or more rows with the same tracking number but different
> dates. I wrote a for loop that will keep the row with the oldest date, but it
> is extremely slow.
>
> Any suggestions on how I should write this so that it is faster?
>
> # Create a vector of the unique tracking numbers #
> u <- na.omit(unique(Para.5C$REQ.NR))
>
> # Create a data frame to rbind unique rows to #
> Para.5C.final <- data.frame()
>
> # For each value in u, subset Para.5C, find the min date, and rbind it to
> Para.5C.final #
> for (i in 1:length(u))
> {
>   x <- subset(Para.5C, Para.5C$REQ.NR == u[i])
>   Para.5C.final <- rbind(Para.5C.final, x[which(x[, 1] == min(x[, 1])), ])
> }

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.