Here is one way of doing it: > a <- + data.frame(id=c(c("A1","A2","A3","A4","A5"), + c("A3","A2","A3","A4","A5")),loc=c("B1","B2","B3","B4","B5"), + clm=c(rep(("General"),6),rep("Life",4))) > # split the indices based on 'id' & 'loc' > a.indx <- split(seq(nrow(a)), paste(a$id, a$loc)) > # now take each group and see if 'clm' differs (don't know what you want to > # do if more than 2 are in the group) > result <- lapply(a.indx, function(.indx){ + if (length(.indx) == 1) return(.indx) + if (any(a$clm[.indx[1]] != a$clm[.indx])) return(NULL) + .indx + }) > # output the matches > a[unlist(result),,drop=FALSE] id loc clm 1 A1 B1 General 6 A3 B1 General > >
On Wed, Oct 28, 2009 at 9:30 PM, Steven Kang <stochastick...@gmail.com> wrote: > Dear R users, > > > Basically, from the following arbitrary data set: > > a <- > data.frame(id=c(c("A1","A2","A3","A4","A5"),c("A3","A2","A3","A4","A5")),loc=c("B1","B2","B3","B4","B5"),clm=c(rep(("General"),6),rep("Life",4))) > >> a > id loc clm > 1 A1 B1 General > 2 A2 B2 General > 3 A3 B3 General > 4 A4 B4 General > 5 A5 B5 General > 6 A3 B1 General > 7 A2 B2 Life > 8 A3 B3 Life > 9 A4 B4 Life > 10 A5 B5 Life > > I desire removing records (highlighted records above) with identical values > in each fields ("id" & "loc") but with different value of "clm" (i.e > according to category) > i.e >> categ <- table(a$id,a$clm) >> categ > > General Life > A1 1 0 > A2 1 1 > A3 2 1 > A4 1 1 > A5 1 1 > > The desired output is > > id loc clm > 1 A1 B1 General > 6 A3 B1 General > > Because the data set I am working on is quite big (~ 800,000 x 20) > with majority of the fields values being long strings, looping turned out to > be very inefficient in comapring individual rows.. > > Are there any alternative efficient methods in implementing this problem? > > Greatly appreciate for your expertise. > > > > Steven > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.