Marc,

I have tried unique(), but unique() looks at the entire row. I have a data set with a variable TRIPID. The data set has 469,000 rows. In most cases TRIPID is a unique value. However, in some cases the same TRIPID value appears on more than one row, with different values for the other variables. This amounts to a data entry error. I need to get rid of the repeated rows that have the same TRIPID but different covariates.

Thanks for your help.

Cam

Cameron Guenther, Ph.D.
Associate Research Scientist
FWC/FWRI, Marine Fisheries Research
100 8th Avenue S.E.
St. Petersburg, FL 33701
(727)896-8626 Ext. 4305
[EMAIL PROTECTED]

-----Original Message-----
From: Marc Schwartz (via MN) [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 16, 2006 2:50 PM
To: Guenther, Cameron
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] subset

On Tue, 2006-05-16 at 14:37 -0400, Guenther, Cameron wrote:
> Hello everyone,
>
> I have a large dataset (x) with some rows that have duplicate
> variables that I would like to remove. I find which rows are the
> duplicates with X1<-which(duplicated(x)). That gives me the rows with
> duplicated variables. Now, how can I remove just those rows from the
> original data frame. I think I can create a new data frame without
> the duplicates using subset. I have tried:
> Subset(x,!x1) and subset(x,!x[x1,])
> I can't seem to find the correct syntax. Any advice.
> Thanks in advance

Even easier would be to use unique():

  NewDF <- unique(x)

NewDF will contain rows from 'x' with duplicates removed.

See ?unique for more information.

unique(), which has a data.frame method, is basically:

  x[!duplicated(x), , drop = FALSE]

which covers the case where the result may contain a single row and which remains a data frame.

Note that the above presumes that you want to test all columns in 'x' for dups.

HTH,

Marc Schwartz
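
For the follow-up at the top of the thread (same TRIPID, different values in the other columns), one possible approach is to apply duplicated() to the TRIPID column alone rather than to the whole data frame. What follows is only a minimal sketch on a toy data frame: CATCH and GEAR are made-up column names, and which of the conflicting rows should survive is a judgment call that the thread does not settle, so two options are shown.

```r
# Toy data standing in for the real 469,000-row data set.
# CATCH and GEAR are hypothetical columns used only for illustration.
x <- data.frame(
  TRIPID = c(101, 102, 102, 103, 104, 104),
  CATCH  = c(5, 7, 9, 2, 4, 4),
  GEAR   = c("A", "B", "B", "C", "A", "A"),
  stringsAsFactors = FALSE
)

# Whole-row test: only the exact repeat of trip 104 is dropped;
# the two conflicting rows for trip 102 both remain.
by_row <- unique(x)

# Option 1: test TRIPID alone and keep the first row seen for each trip.
first_per_trip <- x[!duplicated(x$TRIPID), , drop = FALSE]

# Option 2: drop exact repeats first, then discard every trip that still
# has more than one row (i.e. its records genuinely conflict).
dedup        <- unique(x)
conflict_ids <- dedup$TRIPID[duplicated(dedup$TRIPID)]
clean        <- dedup[!(dedup$TRIPID %in% conflict_ids), , drop = FALSE]
```

duplicated() flags the second and later occurrences of a value, so Option 1 keeps whichever row comes first; sort the data frame beforehand if a different row should win. Option 2 is the more conservative choice when neither conflicting record can be trusted.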