The bottleneck of ave is the call to interaction (i.e. not the call to split/lapply).
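
A minimal sketch of why interaction is the costly step, on toy data (the vectors g1, g2 and g3 are made up for this illustration): by default interaction creates one level for every possible combination of the grouping values, whether or not that combination occurs in the data, so the number of levels is the product of the per-variable counts.

```r
# Toy grouping variables (hypothetical, for illustration only)
set.seed(1)
n  <- 1000
g1 <- sample(1:100, n, TRUE)
g2 <- sample(1:3,   n, TRUE)
g3 <- sample(1:50,  n, TRUE)

# By default, interaction() keeps a level for every combination of values
n_all  <- nlevels(interaction(g1, g2, g3))

# Only the combinations actually observed are needed for grouping
n_used <- nlevels(interaction(g1, g2, g3, drop = TRUE))

# The full count is the product of the numbers of distinct values
n_prod <- length(unique(g1)) * length(unique(g2)) * length(unique(g3))

c(all = n_all, used = n_used, product = n_prod)
```

For the df2 example quoted below, the same product is roughly 1e6 * 3 * 5e4 = 1.5e11 levels, which at 8 bytes per level is on the order of a terabyte and therefore consistent with the reported allocation error.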
Therefore, the following code runs as expected (but I may be missing something...):

    ave2 <- function(x, ..., FUN = mean) {
      if (missing(...)) {
        x[] <- FUN(x)
      } else {
        # g <- interaction(...)
        # Build the grouping key directly; the separator keeps e.g.
        # (1, 12) and (11, 2) from collapsing into the same key
        g <- paste(..., sep = ".")
        split(x, g) <- lapply(split(x, g), FUN)
      }
      x
    }

    df2$diff <- ave2(df2$val, df2$id1, df2$id2, df2$id3,
                     FUN = function(i) c(diff(i), 0))

Of course, I can also simply solve my current issue with:

    df2$id123 <- paste(df2$id1, df2$id2, df2$id3, sep = ".")
    df2$diff <- ave(df2$val, df2$id123, FUN = function(i) c(diff(i), 0))

In addition, ave2 also avoids warnings in case of unused levels (see point 2 in my previous message).

________________________________________
From: SOEIRO Thomas
Sent: Friday, 12 March 2021 23:59
To: r-devel@r-project.org
Subject: Potential improvements of ave?

Dear all,

I have two questions/suggestions about ave, but I am not sure whether they are relevant for bug reports.

1) I have performance issues with ave in a case where I didn't expect them. The following code runs as expected:

    set.seed(1)

    df1 <- data.frame(id1 = sample(1:1e2, 5e2, TRUE),
                      id2 = sample(1:3, 5e2, TRUE),
                      id3 = sample(1:5, 5e2, TRUE),
                      val = sample(1:300, 5e2, TRUE))

    df1$diff <- ave(df1$val, df1$id1, df1$id2, df1$id3,
                    FUN = function(i) c(diff(i), 0))

    head(df1[order(df1$id1, df1$id2, df1$id3), ])

But when expanding the data.frame (* 1e4), ave fails (Error: cannot allocate vector of size 1110.0 Gb):

    df2 <- data.frame(id1 = sample(1:(1e2 * 1e4), 5e2 * 1e4, TRUE),
                      id2 = sample(1:3, 5e2 * 1e4, TRUE),
                      id3 = sample(1:(5 * 1e4), 5e2 * 1e4, TRUE),
                      val = sample(1:300, 5e2 * 1e4, TRUE))

    df2$diff <- ave(df2$val, df2$id1, df2$id2, df2$id3,
                    FUN = function(i) c(diff(i), 0))

This use case does not seem extreme to me (e.g. aggregate et al. work perfectly on this data.frame).

So my question is: Is this expected/intended/reasonable? i.e. Does ave need to be optimized?

2) Gabor Grothendieck pointed out in 2011 that drop = TRUE is needed to avoid warnings in case of unused levels (https://stat.ethz.ch/pipermail/r-devel/2011-February/059947.html). Is it relevant/possible to expose the drop argument explicitly?

Thanks,
Thomas
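
For point 2), a minimal sketch of what exposing drop could look like. The name ave3 and its drop parameter are hypothetical and not part of base R; the body simply forwards drop to interaction, so this is an illustration of the idea rather than a proposed patch.

```r
# Hypothetical variant of ave with an explicit drop argument (not base R)
ave3 <- function(x, ..., FUN = mean, drop = TRUE) {
  if (missing(...)) {
    x[] <- FUN(x)
  } else {
    # drop = TRUE discards level combinations that do not occur in the data
    g <- interaction(..., drop = drop)
    split(x, g) <- lapply(split(x, g), FUN)
  }
  x
}

# A factor with an unused level "c"
f <- factor(c("a", "a", "b", "b"), levels = c("a", "b", "c"))
res <- ave3(c(1, 3, 5, 9), f)
res
```

With drop = TRUE the unused level never reaches split, so FUN is only called on non-empty groups. Note that this sketch only addresses the unused levels of point 2); it still calls interaction and so does not by itself address the performance concern of point 1).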