Hi all,

This is my first ever post, so please forgive me, and let me know, if my etiquette falls short of what is required.

I am searching for a faster way of subtracting group means within a data frame than the solution I've found so far, which uses aggregate() and merge(). I'll flesh my question out using a trivial example.

I have a data frame Z with two columns, one X of values and one Y of labels:

    > Z
      X Y
    1 1 4
    2 2 4
    3 3 5
    4 4 5

I want to take the group means (for the two groups Y=4 and Y=5) and subtract them from X, resulting in the vector Result = (-0.5, 0.5, -0.5, 0.5).

I have found a (slow) way of achieving this, using aggregate() to get the group means and then merge() to construct an appropriate vector of these values, M:

    > A <- aggregate(Z["X"], by=Z["Y"], FUN=mean)
    > A
      Y   X
    1 4 1.5
    2 5 3.5
    > M <- merge(Z,A,by="Y")[,3]
    > M
    [1] 1.5 1.5 3.5 3.5
    > Result <- Z["X"] - M
    > Result
         X
    1 -0.5
    2  0.5
    3 -0.5
    4  0.5

My problem: for lots of records, while aggregate() is very fast, merge() is very slow. In real life I need to call this routine many times over a very large dataset.

Could anyone help me find a faster way of achieving the same goal?

Many thanks,

Ben Cocker
MSc Statistics at UCL, London, UK

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
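
P.S. In case it helps anyone reproduce the timings, here is a minimal, self-contained sketch of the aggregate/merge approach described above. The helper name demean_by_merge and the simulated data sizes are made up for illustration only. Note that merge() sorts its result by the key, so the sketch assumes Z is already ordered by Y; otherwise the means would not line up with the original rows.

```r
# Sketch of the aggregate + merge approach described above.
# demean_by_merge is a hypothetical helper name, not from any package.
demean_by_merge <- function(Z) {
  # Group means of X for each level of Y (result has columns Y and X)
  A <- aggregate(Z["X"], by = Z["Y"], FUN = mean)
  # merge() returns columns Y, X.x (original values), X.y (group means),
  # with rows sorted by Y
  M <- merge(Z, A, by = "Y")[, 3]
  # Assumes Z is already sorted by Y, so that M lines up with Z$X
  Z$X - M
}

# The toy example from the post
Z <- data.frame(X = c(1, 2, 3, 4), Y = c(4, 4, 5, 5))
Result <- demean_by_merge(Z)
print(Result)

# Simulated larger data (sizes are arbitrary assumptions) to time the steps
set.seed(1)
n <- 1e5
big <- data.frame(X = rnorm(n),
                  Y = sort(sample.int(1000, n, replace = TRUE)))
print(system.time(aggregate(big["X"], by = big["Y"], FUN = mean)))
print(system.time(big_result <- demean_by_merge(big)))
```

The two system.time() calls separate the cost of the aggregate() step from the cost of the whole routine, which should make it easier to see how much of the time goes to merge() on a given machine.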