Wow! Jim, this is really impressive. I can't wrap my head around how you figured this out.

Thank you,

AC

On Sun, Feb 21, 2010 at 12:02 AM, jim holtman <jholt...@gmail.com> wrote:
> This will do it.  You can see two different values for id=1:
>
>> x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1),mean))
>> x
>    id mod1      r
> 1   1    1  0.980
> 2   4    1  0.640
> 3   7    1  0.490
> 4  10    1  0.180
> 5   1    2  0.295
> 6   5    2  0.490
> 7   8    2  0.330
> 8  11    2  0.600
> 9   6    3 -0.040
> 10  9    3  0.580
> 11 12    3  0.210
>> # choose random duplicate to use
>> do.call(rbind, lapply(split(x, x$id), function(.data) .data[sample(nrow(.data), 1),]))
>    id mod1     r
> 1   1    1  0.98
> 4   4    1  0.64
> 5   5    2  0.49
> 6   6    3 -0.04
> 7   7    1  0.49
> 8   8    2  0.33
> 9   9    3  0.58
> 10 10    1  0.18
> 11 11    2  0.60
> 12 12    3  0.21
>>
>> # choose random duplicate to use - try to see if a different one comes up
>> do.call(rbind, lapply(split(x, x$id), function(.data) .data[sample(nrow(.data), 1),]))
>    id mod1      r
> 1   1    2  0.295
> 4   4    1  0.640
> 5   5    2  0.490
> 6   6    3 -0.040
> 7   7    1  0.490
> 8   8    2  0.330
> 9   9    3  0.580
> 10 10    1  0.180
> 11 11    2  0.600
> 12 12    3  0.210
>
> On Sat, Feb 20, 2010 at 9:50 PM, AC Del Re <acde...@gmail.com> wrote:
>>
>> OK, this is great, Jim. Last question: How about if I want the 1 copy
>> of each id to be selected randomly versus taking the first one?
>>
>> AC
>>
>> On Sat, Feb 20, 2010 at 8:37 PM, jim holtman <jholt...@gmail.com> wrote:
>> > I am not sure what you mean by eliminating a row.  Now if you want
>> > only one copy of each 'id', and it is the first one, then you can use
>> > 'duplicated':
>> >
>> >> x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1),mean))
>> >> x
>> >    id mod1      r
>> > 1   1    1  0.980
>> > 2   4    1  0.640
>> > 3   7    1  0.490
>> > 4  10    1  0.180
>> > 5   1    2  0.295
>> > 6   5    2  0.490
>> > 7   8    2  0.330
>> > 8  11    2  0.600
>> > 9   6    3 -0.040
>> > 10  9    3  0.580
>> > 11 12    3  0.210
>> >> subset(x, !duplicated(id))
>> >    id mod1     r
>> > 1   1    1  0.98
>> > 2   4    1  0.64
>> > 3   7    1  0.49
>> > 4  10    1  0.18
>> > 6   5    2  0.49
>> > 7   8    2  0.33
>> > 8  11    2  0.60
>> > 9   6    3 -0.04
>> > 10  9    3  0.58
>> > 11 12    3  0.21
>> >
>> > On Sat, Feb 20, 2010 at 8:07 PM, AC Del Re <de...@wisc.edu> wrote:
>> >>
>> >> Perfect! Thanks Jim.
>> >>
>> >> Do you know how I could then reduce the data even further?
>> >> Specifically, reducing it to 1 id per row? In this dataset, id 1 would
>> >> have one row eliminated.
>> >> Assume the data is much larger and cannot be deleted by visual
>> >> inspection and elimination one row at a time.
>> >>
>> >> Thank you,
>> >>
>> >> AC
>> >>
>> >> On Sat, Feb 20, 2010 at 6:26 PM, jim holtman <jholt...@gmail.com> wrote:
>> >> > This seems to work fine (notice the missing 'c(...)'; why did you
>> >> > think you needed it);
>> >> >
>> >> >> with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1),mean))
>> >> >    id mod1      r
>> >> > 1   1    1  0.980
>> >> > 2   4    1  0.640
>> >> > 3   7    1  0.490
>> >> > 4  10    1  0.180
>> >> > 5   1    2  0.295
>> >> > 6   5    2  0.490
>> >> > 7   8    2  0.330
>> >> > 8  11    2  0.600
>> >> > 9   6    3 -0.040
>> >> > 10  9    3  0.580
>> >> > 11 12    3  0.210
>> >> >
>> >> > On Sat, Feb 20, 2010 at 6:54 PM, AC Del Re <de...@wisc.edu> wrote:
>> >> >>
>> >> >> Hi All,
>> >> >>
>> >> >> I am interested in aggregating a data frame based on 2
>> >> >> categories--mean effect size (r) for each 'id's' 'mod1'. The
>> >> >> 'with' function works well when aggregating on one category (e.g.,
>> >> >> based on 'id' below) but doesn't work if I try 2 categories. How can
>> >> >> this be accomplished?
>> >> >>
>> >> >> # sample data
>> >> >>
>> >> >> id<-c(1,1,1,rep(4:12))
>> >> >> n<-c(10,20,13,22,28,12,12,36,19,12, 15,8)
>> >> >> r<-c(.98,.56,.03,.64,.49,-.04,.49,.33,.58,.18, .6,.21)
>> >> >> mod1<-factor(c(1,2,2, rep(c(1,2,3),3)))
>> >> >> mod2<-c(1,2,15,rep(3,9))
>> >> >> datas<-data.frame(id,n,r,mod1,mod2)
>> >> >>
>> >> >> # one category works perfect:
>> >> >>
>> >> >> with(datas, aggregate(list(r = r), by = list(id = id),mean))
>> >> >>
>> >> >>    id          r
>> >> >> 1   1  0.5233333
>> >> >> 2   4  0.6400000
>> >> >> 3   5  0.4900000
>> >> >> 4   6 -0.0400000
>> >> >> 5   7  0.4900000
>> >> >> 6   8  0.3300000
>> >> >> 7   9  0.5800000
>> >> >> 8  10  0.1800000
>> >> >> 9  11  0.6000000
>> >> >> 10 12  0.2100000
>> >> >>
>> >> >> # trying with 2 categories:
>> >> >>
>> >> >> with(datas, aggregate(list(r = r), by = list(c(id = id, mod1 = mod1)),mean))
>> >> >>
>> >> >> Error in FUN(X[[1L]], ...) : arguments must have same length
>> >> >>
>> >> >> Thank you,
>> >> >>
>> >> >> AC
>> >> >>
>> >> >> ______________________________________________
>> >> >> R-help@r-project.org mailing list
>> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >> PLEASE do read the posting guide
>> >> >> http://www.R-project.org/posting-guide.html
>> >> >> and provide commented, minimal, self-contained, reproducible code.
>> >> >
>> >> > --
>> >> > Jim Holtman
>> >> > Cincinnati, OH
>> >> > +1 513 646 9390
>> >> >
>> >> > What is the problem that you are trying to solve?
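
For anyone reading this thread in the archive, the three steps discussed above (aggregate on two categories, keep the first row per id, keep a random row per id) can be collected into one self-contained R sketch. It assumes the sample data from the original post. The set.seed() call, the seed value, and the helper name pick_one are additions for illustration and do not appear in the replies; the seed only makes the random pick repeatable.

```r
# Sample data from the original post, built directly as a data frame
datas <- data.frame(
  id   = c(1, 1, 1, 4:12),
  n    = c(10, 20, 13, 22, 28, 12, 12, 36, 19, 12, 15, 8),
  r    = c(.98, .56, .03, .64, .49, -.04, .49, .33, .58, .18, .6, .21),
  mod1 = factor(c(1, 2, 2, rep(c(1, 2, 3), 3))),
  mod2 = c(1, 2, 15, rep(3, 9))
)

# Mean r for each (id, mod1) pair: every grouping variable is its own
# element of the 'by' list, with no c(...) wrapped around them
x <- with(datas, aggregate(list(r = r),
                           by = list(id = id, mod1 = mod1),
                           FUN = mean))

# Keep the first row seen for each id
first_per_id <- x[!duplicated(x$id), ]

# Keep one randomly chosen row for each id
set.seed(42)  # hypothetical seed, added only so the result is repeatable
pick_one <- function(d) d[sample(nrow(d), 1), , drop = FALSE]  # hypothetical helper
random_per_id <- do.call(rbind, lapply(split(x, x$id), pick_one))
```

The call in the original post failed because c(id = id, mod1 = mod1) concatenates the two vectors into a single vector of length 24, while r has length 12, hence "arguments must have same length". The formula interface, aggregate(r ~ id + mod1, data = datas, FUN = mean), should give the same means if you prefer that style.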