Hi all,

This is my first ever post, so please forgive me and let me know if my
etiquette falls short.

I am looking for a faster way of subtracting group means within a
data frame than the solution I have found so far, which uses aggregate()
and merge().

I'll flesh out my question with a trivial example: I have a data
frame Z with two columns, X (values) and Y (group labels):

> Z
    X    Y
1    1    4
2    2    4
3    3    5
4    4    5
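
In the spirit of the posting guide's request for self-contained, reproducible code, here is a minimal sketch that rebuilds Z. It assumes Y holds plain numeric labels; a factor would work equally well with the code further down.

```r
# Rebuild the toy data frame shown above.
# Assumption: Y is stored as numeric labels (it could also be a factor).
Z <- data.frame(X = c(1, 2, 3, 4),
                Y = c(4, 4, 5, 5))
print(Z)
```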

I want to compute the group means (for the two groups Y=4 and Y=5) and
subtract them from X, giving the vector Result = (-0.5, 0.5, -0.5, 0.5).
I have found a (slow) way of achieving this, using aggregate() to get
the group means and then merge() to build a vector M holding each
row's group mean:

> A <- aggregate(Z["X"], by = Z["Y"], FUN = mean)
> A
   Y     X
1   4   1.5
2   5   3.5

> M <- merge(Z,A,by="Y")[,3]
> M
[1] 1.5 1.5 3.5 3.5

> Result <- Z["X"] - M
> Result
    X
1 -0.5
2  0.5
3 -0.5
4  0.5
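
One possible source of overhead is that merge() performs a general join and, by default (sort = TRUE), sorts its output by the key. This also means that merge(...)[,3] only lines up with Z's rows when Z is already ordered by Y. A minimal sketch, offered as an alternative rather than a measured fix: keep aggregate() for the means, but replace merge() with a positional lookup via match(), which returns each row's group mean in Z's original row order.

```r
Z <- data.frame(X = c(1, 2, 3, 4), Y = c(4, 4, 5, 5))

# Group means, one row per distinct Y (columns Y and X)
A <- aggregate(Z["X"], by = Z["Y"], FUN = mean)

# For each element of Z$Y, match() gives the row of A holding that group,
# so indexing A$X with it expands the means back to one value per row of Z
M <- A$X[match(Z$Y, A$Y)]

Result <- Z$X - M
print(Result)
```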

My problem: with many records, aggregate() is very fast but merge()
is very slow, and in practice I need to call this routine many times
over a very large dataset. Could anyone help me find a faster way of
achieving the same result?
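
Base R's ave() (in the stats package) applies a function within each group, with FUN = mean as the default, and returns a vector with the same length and row order as its input. That removes the aggregate()/merge() round trip entirely. The sketch below runs on the toy data only; no timing claims are made, and its speed on the real dataset would need to be checked (for example with system.time()).

```r
Z <- data.frame(X = c(1, 2, 3, 4), Y = c(4, 4, 5, 5))

# ave() returns each row's group mean, aligned with Z's original row order
M <- ave(Z$X, Z$Y)
Result <- Z$X - M
print(Result)

# Equivalent: centre X within each group in a single call
Result2 <- ave(Z$X, Z$Y, FUN = function(v) v - mean(v))
stopifnot(isTRUE(all.equal(Result, Result2)))
```

Because ave() preserves row order, it also works when Z is not sorted by Y.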

Many thanks,

Ben Cocker
MSc Statistics at UCL, London, UK

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
