Try this (x is your data frame df):

> x  # print data
   id     A v1 v2 v3 v4 v5 numMiss
1 id1 11905 NA NA NA  N  0       3
2 id1 11907  3  2  1  Y  0       0
3 id1 11907 NA NA NA  N  0       3
4 id2 11829  1  2  1  Y  1       0
5 id2 11829  2 NA NA  N  0       2
6 id2 11829 NA NA NA  N  0       3
> # select best data
> xBest <- do.call(rbind, lapply(split(x, x$A), function(.grp){
+     best <- which.min(apply(.grp, 1, function(a) sum(is.na(a))))
+     .grp[best, ]
+ }))
> xBest
       id     A v1 v2 v3 v4 v5 numMiss
11829 id2 11829  1  2  1  Y  1       0
11905 id1 11905 NA NA NA  N  0       3
11907 id1 11907  3  2  1  Y  0       0
>
> xWorst <- do.call(rbind, lapply(split(x, x$A), function(.grp){
+     worst <- which.max(apply(.grp, 1, function(a) sum(is.na(a))))
+     .grp[worst, ]
+ }))
> xWorst
       id     A v1 v2 v3 v4 v5 numMiss
11829 id2 11829 NA NA NA  N  0       3
11905 id1 11905 NA NA NA  N  0       3
11907 id1 11907 NA NA NA  N  0       3
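The split() above groups by A alone, which works for this data because each value of A belongs to a single id. The question asks for grouping by id and A, and then for keeping the smallest A within each id. Below is a minimal base-R sketch of both steps that swaps in a different technique (order() plus duplicated() instead of split()), assuming the df from the question and that numMiss already holds the per-row count of missing values:

```r
# Sample data from the question
df <- data.frame(id = c("id1", "id1", "id1", "id2", "id2", "id2"),
                 A = c(11905, 11907, 11907, 11829, 11829, 11829),
                 v1 = c(NA, 3, NA, 1, 2, NA),
                 v2 = c(NA, 2, NA, 2, NA, NA),
                 v3 = c(NA, 1, NA, 1, NA, NA),
                 v4 = c("N", "Y", "N", "Y", "N", "N"),
                 v5 = c(0, 0, 0, 1, 0, 0),
                 numMiss = c(3, 0, 3, 0, 2, 3))

# Step 1: sort by id, then A, then numMiss, so that within each (id, A)
# pair the row with the fewest missing values comes first; then keep
# only the first row of each pair. order() leaves remaining ties in
# their original order, so on a tie the earlier row wins.
sorted <- df[order(df$id, df$A, df$numMiss), ]
best <- sorted[!duplicated(sorted[, c("id", "A")]), ]
print(best)

# Step 2: 'best' is still sorted by id and then A, so the first row of
# each id is the one with the smallest A.
smallestA <- best[!duplicated(best$id), ]
print(smallestA)
```

With the question's data this keeps rows 1, 2 and 4 after step 1, and rows 1 and 4 after step 2, matching the desired output. If numMiss were not already in the data, rowSums(is.na(df)) would give the same counts here.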
On Sat, Apr 14, 2012 at 3:03 PM, francy <francy.casal...@gmail.com> wrote:
> Dear R experts,
>
> Sorry for this basic question, but I can't seem to find a solution.
>
> I have this data frame:
> df <- data.frame(id = c("id1", "id1", "id1", "id2", "id2", "id2"),
>                  A = c(11905, 11907, 11907, 11829, 11829, 11829),
>                  v1 = c(NA, 3, NA, 1, 2, NA),
>                  v2 = c(NA, 2, NA, 2, NA, NA),
>                  v3 = c(NA, 1, NA, 1, NA, NA),
>                  v4 = c("N", "Y", "N", "Y", "N", "N"),
>                  v5 = c(0, 0, 0, 1, 0, 0),
>                  numMiss = c(3, 0, 3, 0, 2, 3))
>
> > df
>    id     A v1 v2 v3 v4 v5 numMiss
> 1 id1 11905 NA NA NA  N  0       3
> 2 id1 11907  3  2  1  Y  0       0
> 3 id1 11907 NA NA NA  N  0       3
> 4 id2 11829  1  2  1  Y  1       0
> 5 id2 11829  2 NA NA  N  0       2
> 6 id2 11829 NA NA NA  N  0       3
>
> Of the rows that have the same value for "A" within an id, I need to keep
> only the ones with the fewest missing values across all the variables
> (with min(numMiss)), to get this:
>
>    id     A v1 v2 v3 v4 v5 numMiss
> 1 id1 11905 NA NA NA  N  0       3
> 2 id1 11907  3  2  1  Y  0       0
> 4 id2 11829  1  2  1  Y  1       0
>
> Then, of the rows that have the same id, I have to choose the records with
> the smallest value of "A", like this:
>
>    id     A v1 v2 v3 v4 v5 numMiss
> 1 id1 11905 NA NA NA  N  0       3
> 4 id2 11829  1  2  1  Y  1       0
>
> For groupings I have used the package "plyr" before, but this would involve
> a sort of double grouping, by id and by duplicated values of A. Could you
> please help me understand how this can be done?
>
> Thank you very much.
> -f
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Choose-between-duplicated-rows-tp4557833p4557833.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.