This might be quicker.

Para.5C.sorted <- Para.5C[order(Para.5C[, 1]), ]
Para.5C.final  <- Para.5C.sorted[!duplicated(Para.5C.sorted$REQ.NR), ]
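A minimal, self-contained check of the sort-then-deduplicate idea (the column name ENTRY.DATE and the sample values below are invented for illustration; only REQ.NR comes from the original post):

```r
## Toy data: duplicated tracking numbers entered on different dates.
Para.5C <- data.frame(
  ENTRY.DATE = as.Date(c("2012-09-01", "2012-09-03", "2012-09-02",
                         "2012-09-05", "2012-09-04")),
  REQ.NR     = c("A100", "A100", "B200", "B200", "C300")
)

## Sort by date (column 1), then keep the first -- i.e. oldest -- row
## for each tracking number. duplicated() marks all but the first
## occurrence, so negating it keeps exactly one row per REQ.NR.
Para.5C.sorted <- Para.5C[order(Para.5C[, 1]), ]
Para.5C.final  <- Para.5C.sorted[!duplicated(Para.5C.sorted$REQ.NR), ]
Para.5C.final
## Each REQ.NR now appears once, paired with its earliest ENTRY.DATE.
```

This is vectorized, so it avoids the per-tracking-number subset() and rbind() calls that make the loop version slow.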
If your data are already sorted by date, then you can skip the first step and just run

Para.5C.final <- Para.5C[!duplicated(Para.5C$REQ.NR), ]

Jean

wwreith <reith_will...@bah.com> wrote on 09/26/2012 10:19:21 AM:
>
> I have several thousand rows of shipment data imported into R as a data
> frame, with two columns of particular interest: col 1 is the entry date, and
> col 2 is the tracking number (colname is REQ.NR). Tracking numbers should be
> unique but on occasion aren't, because they get entered more than once. This
> creates two or more rows with the same tracking number but different
> dates. I wrote a for loop that will keep the row with the oldest date, but it
> is extremely slow.
>
> Any suggestions on how I should write this so that it is faster?
>
> # Create a vector of the unique tracking numbers #
> u <- na.omit(unique(Para.5C$REQ.NR))
>
> # Create a data frame to rbind unique rows to #
> Para.5C.final <- data.frame()
>
> # For each value in u, subset Para.5C, find the min date, and rbind it to
> Para.5C.final #
> for (i in 1:length(u))
> {
>   x <- subset(Para.5C, Para.5C$REQ.NR == u[i])
>   Para.5C.final <- rbind(Para.5C.final, x[which(x[, 1] == min(x[, 1])), ])
> }

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.