On 3/18/06, Dan Bolser <[EMAIL PROTECTED]> wrote:
> Gabor Grothendieck wrote:
> > If you are just looking for something simple that may be good enough
> > then assign the largest one to group 1, the second largest to group 2,
> > ..., the 8th largest to group 8 and then start over again with group 1
> > and so on.
> >
> > # test data
> > set.seed(1)
> > x <- sample(100, 100, rep = TRUE)
> >
> > xs <- sort(x)
> > g <- gl(8, 1, length(xs)) # 8 groups
> >
> > # so that g contains the groups that correspond to xs.
> >
> > tapply(xs, g, sum) # 659 671 687 701 612 622 629 646
> >
>
> That is a fairly neat way of getting groups with a good 'approximate
> same size', however, in general I would like to be able to order my data
> in any way, and still cut it into equal 'size' groups (like quantiles
> for rows, but for row variable totals instead).

Do you mean you want g to be in the original order of x? order(x) is the
permutation which sorts x, and order(order(x)) is its inverse permutation,
so apply that to the gl expression:

x <- c(10, 4, 15, 2, 20, 13)
g <- gl(2, 1, length(x))[order(order(x))]

# check it
identical(tapply(sort(x), gl(2, 1, length(x)), sum), tapply(x, g, sum))

>
> Seems it should be possible without an explicit loop (and some more
> 'refinement' of the final group sizes), but I can't work it out.
>
> >
> >
> > On 3/17/06, Dan Bolser <[EMAIL PROTECTED]> wrote:
> >
> >>Dan Bolser wrote:
> >>
> >>>Hi,
> >>>
> >>>I have tuples of data in rows of a data.frame, each column is a variable
> >>>for the 'items' (one per row).
> >>>
> >>>One of the variables is the 'size' of the item (row).
> >>>
> >>>I would like to cut my data.frame into groups such that each group has
> >>>the same *total size*. So, assuming that we order by size, some groups
> >>>should have several small items while other groups have a few large
> >>>items. All the groups should have approximately the same total size.
> >>>
> >>>I have tried various combinations of cut, quantile, and ecdf, and I just
> >>>can't work out how to do this!
> >>>
> >>>Any help is greatly appreciated!
> >>>
> >>>All the best,
> >>>Dan.
> >>>
> >>
> >>Perhaps there is a clever way, but I just wrote this in desperation...
> >>
> >>
> >>my.groups <- 8
> >>
> >>my.total <-
> >>  sum(my.res.1$TOT) ## The 'size' variable in my data.frame
> >>
> >>my.approx.size <-
> >>  my.total/
> >>  my.groups
> >>
> >>my.j <- 1
> >>my.roll <- 0
> >>my.factor <- numeric()
> >>
> >>for(i in sort(my.res.1$TOT)){
> >>
> >>  my.roll <-
> >>    my.roll + i
> >>
> >>  if (my.roll > my.approx.size * my.j)
> >>    my.j <- my.j + 1
> >>
> >>  my.factor <-
> >>    append(my.factor,my.j)
> >>}
> >>
> >>my.factor <-
> >>  as.factor(my.factor)
> >>
> >>
> >>
> >>Then...
> >>
> >> > tapply(my.factor,my.factor,length)
> >>  1   2   3   4   5   6   7   8
> >>152  62  45  34  25  21  14   8
> >>
> >>
> >>And...
> >>
> >> > tapply(sort(my.res.1$TOT),my.factor,sum)
> >>   1    2    3    4    5    6    7    8
> >>2880 2848 2912 2893 2832 2906 2776 3029
> >>
> >>
> >>Which isn't bad.
> >>
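
A loop-free variant of the quoted grouping loop can be sketched with cumsum(). This is only an illustration under stated assumptions, not code from the thread: `size` is made-up data standing in for my.res.1$TOT, and the names n.groups, target, idx, g.sorted and g.orig are hypothetical. Each sorted item goes to the group that its running total falls in, and order(order(size)) then carries the group labels back to the original row order.

```r
# Made-up data standing in for my.res.1$TOT (an assumption, not thread data).
set.seed(1)
size <- sample(100, 361, replace = TRUE)
n.groups <- 8

xs <- sort(size)                    # ascending, as in the quoted loop
target <- sum(xs) / n.groups        # approximate total size per group

# Group index = which multiple of 'target' the running total has reached.
# pmin/pmax guard against floating-point overshoot and zero-size items.
idx <- ceiling(cumsum(xs) / target)
idx <- pmin(pmax(idx, 1), n.groups)
g.sorted <- factor(idx, levels = seq_len(n.groups))

# order(order(size)) is the inverse of the sorting permutation, so this
# puts the group labels back in the original order of 'size'.
g.orig <- g.sorted[order(order(size))]

tapply(size, g.orig, length)  # items per group
tapply(size, g.orig, sum)     # total size per group
```

Since g.orig lines up with the unsorted data, it can be used directly in tapply() or split() on the original rows. Note that an item larger than target would leave at least one group empty, so the sketch assumes many items that are small relative to the group total.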