on 11/16/2008 08:12 PM jeffc wrote:
> Hi, 
> 
> I have a data set similar to the following
> 
> State Gender  Quantity
> TX    Male    1
> NY    Female  2
> TX    Male    3
> NY    Female  4
> 
> 
> I need to calculate cumulative sum of the quantity by State and Gender. The
> expected output is 
> State Gender  Quantity        CumQuantity
> TX    Male    1       1
> TX    Male    3       4
> NY    Female  2       2
> NY    Female  4       6
> 
> I highly appreciate if someone can give me some hints on solving that in R.
> 
> Hao

I would verify this, but something along the lines of the following:

> DF
  State Gender Quantity
1    TX   Male        1
2    NY Female        2
3    TX   Male        3
4    NY Female        4


do.call(rbind,
        lapply(split(DF, list(DF$State, DF$Gender), drop = TRUE),
               function(x) cbind(x, cumsum(x$Quantity))))

which yields:

            State Gender Quantity cumsum(x$Quantity)
NY.Female.2    NY Female        2                  2
NY.Female.4    NY Female        4                  6
TX.Male.1      TX   Male        1                  1
TX.Male.3      TX   Male        3                  4



To take this step by step:

First, split() DF by the two factors:

> split(DF, list(DF$State, DF$Gender), drop = TRUE)
$NY.Female
  State Gender Quantity
2    NY Female        2
4    NY Female        4

$TX.Male
  State Gender Quantity
1    TX   Male        1
3    TX   Male        3



Pass that to lapply(), in which we do the cumsum() and cbind():

> lapply(split(DF, list(DF$State, DF$Gender), drop = TRUE),
         function(x) cbind(x, cumsum(x$Quantity)))
$NY.Female
  State Gender Quantity cumsum(x$Quantity)
2    NY Female        2                  2
4    NY Female        4                  6

$TX.Male
  State Gender Quantity cumsum(x$Quantity)
1    TX   Male        1                  1
3    TX   Male        3                  4



Pass that to do.call() to rbind() the results together:

> do.call(rbind,
          lapply(split(DF, list(DF$State, DF$Gender), drop = TRUE),
                 function(x) cbind(x, cumsum(x$Quantity))))
            State Gender Quantity cumsum(x$Quantity)
NY.Female.2    NY Female        2                  2
NY.Female.4    NY Female        4                  6
TX.Male.1      TX   Male        1                  1
TX.Male.3      TX   Male        3                  4


See ?split, ?do.call, ?rbind and ?cumsum.

If you want the exact row ordering as you had it in your post, you can
alter the factor levels, otherwise they will be sorted by alpha (eg. NY
before TX and Female before Male).

HTH,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to