Hi,

As Jeff said, more than one grouping variable can be supplied to ave(), and there is an example at the bottom of its help page. The same goes for by(), but there the order in which you supply the grouping variables matters: whichever grouping variable is supplied first to by() changes its levels fastest in the output sequence. You can see from your dataset:

d2 <- data.frame(city=rep(1:2, each=6),
    year=c(rep(2001, 3), rep(2002, 3), rep(2001, 3), rep(2002, 3)),
    num=c(25,75,150,35,65,120,25,95,150,35,110,120))

d2
#    city year num
# 1     1 2001  25
# 2     1 2001  75
# 3     1 2001 150
# 4     1 2002  35
# 5     1 2002  65
# 6     1 2002 120
# 7     2 2001  25
# 8     2 2001  95
# 9     2 2001 150
# 10    2 2002  35
# 11    2 2002 110
# 12    2 2002 120

that `year' changes its levels fastest as you move down the table, and `city' changes more slowly. You want your new column to align with this sequence. If you put `city' first in the list of grouping variables for by(), rather than `year', the output won't follow the sequence in your dataset:

by(d2$num, d2[c('city', 'year')], function(x) x - x[1])

# city: 1
# year: 2001
# [1]   0  50 125
# -----------------------------
# city: 2
# year: 2001
# [1]   0  70 125
# -----------------------------
# city: 1
# year: 2002
# [1]  0 30 85
# -----------------------------
# city: 2
# year: 2002
# [1]  0 75 85
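
For comparison, here is a minimal sketch of the same call with `year' listed first, so that the by() groups come out in the order the rows appear in d2. It assumes the rows of d2 are already sorted by city and then year, as they are here; diff_by is just an illustrative name:

```r
# Rebuild the example data so the block runs on its own
d2 <- data.frame(city = rep(1:2, each = 6),
                 year = rep(rep(c(2001, 2002), each = 3), times = 2),
                 num  = c(25, 75, 150, 35, 65, 120, 25, 95, 150, 35, 110, 120))

# 'year' first: year changes fastest across the by() groups, matching d2
res <- by(d2$num, d2[c('year', 'city')], function(x) x - x[1])

# Flatten the groups in output order; this only lines up with the rows
# because d2 is sorted by city and then year (an assumption about the data)
diff_by <- unlist(res, use.names = FALSE)
diff_by
```

If the rows were in any other order, the flattened vector would no longer line up with them, so it would not be safe to assign it back as a column.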

Compared with by(), which I suggested earlier, using match() to build an index of the first row of each `city/year' combination seems a more explicit and safer way to do the calculation, because it does not depend on the order of any output. Adapting an earlier solution provided in this thread:

year.city <- with(d2, interaction(year, city))
indexOfFirstYearCity <- match(year.city, year.city)
indexOfFirstYearCity
# [1]  1  1  1  4  4  4  7  7  7 10 10 10

d2$diff <- d2$num - d2$num[indexOfFirstYearCity]
d2

#    city year num diff
# 1     1 2001  25    0
# 2     1 2001  75   50
# 3     1 2001 150  125
# 4     1 2002  35    0
# 5     1 2002  65   30
# 6     1 2002 120   85
# 7     2 2001  25    0
# 8     2 2001  95   70
# 9     2 2001 150  125
# 10    2 2002  35    0
# 11    2 2002 110   75
# 12    2 2002 120   85
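
Jeff's ave() suggestion can be sketched the same way. ave() returns its result in the original row order, so the order of the grouping variables does not matter here; diff_ave is an illustrative column name, not something from the thread:

```r
# Rebuild the example data so the block runs on its own
d2 <- data.frame(city = rep(1:2, each = 6),
                 year = rep(rep(c(2001, 2002), each = 3), times = 2),
                 num  = c(25, 75, 150, 35, 65, 120, 25, 95, 150, 35, 110, 120))

# Subtract the first value within each city/year group;
# FUN must be named because it follows '...' in ave()'s argument list
d2$diff_ave <- ave(d2$num, d2$city, d2$year, FUN = function(x) x - x[1])
d2$diff_ave
```

Swapping the two grouping variables (d2$year, d2$city) should give the same column, since each value is placed back at its own row position.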


Philip

On 29/10/2016 3:15 PM, Jeff Newmiller wrote:
Now would be an excellent time to read the help page for ?ave. You can specify 
multiple grouping variables.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
