Hello, this is one problem at a time :)
I have a data frame df that looks like this:

   time partitioning_mode workload runtime
1     1          sharding    query     607
2     1          sharding    query      85
3     1          sharding    query      52
4     1          sharding    query      79
5     1          sharding    query      77
6     1          sharding    query      67
7     1          sharding    query      98
8     1          sharding  refresh    2932
9     1          sharding  refresh    2870
10    1          sharding  refresh    2877
11    1          sharding  refresh    2868
12    1       replication    query    2891
13    1       replication    query    2907
14    1       replication    query    2922
15    1       replication    query    2937

If I could use SQL (omg, I really wish I could!), I would do exactly this:

insert into throughput
select time, partitioning_mode, count(*)
from data.frame
group by time, partitioning_mode

My attempted R versions are wrong and produce very cryptic error messages:

> throughput <- aggregate(x=df[,c("time", "partitioning_mode")],
+     by=list(df$time,df$partitioning_mode), count)
Error in `[.default`(df2, u_id, , drop = FALSE) :
  incorrect number of dimensions

> throughput <- aggregate(x=df, by=list(df$time,df$partitioning_mode), count)
Error in `[.default`(df2, u_id, , drop = FALSE) :
  incorrect number of dimensions

> throughput <- tapply(X=df$time, INDEX=list(df$time,df$partitioning), FUN=count)

I can't comprehend what comes out of this last one ... :(
And I thought C++ template errors were the most cryptic ;P

Many many thanks in advance,
Best regards,
Giovanni
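
For reference, here is a hedged sketch of one way to get the SQL-style count per (time, partitioning_mode) in base R. It assumes the data frame is called df as in the post and rebuilds it from the rows shown; the result column name "count" is an illustrative choice, not something from the post. The df2 and u_id in the error messages suggest that count is plyr's count() (an assumption, since the post does not say which packages are loaded). That function expects a data frame, while aggregate() and tapply() hand it plain vectors. Using length as the summary function sidesteps this.

```r
# Rebuild the example data frame from the rows shown in the post.
df <- data.frame(
  time = rep(1, 15),
  partitioning_mode = c(rep("sharding", 11), rep("replication", 4)),
  workload = c(rep("query", 7), rep("refresh", 4), rep("query", 4)),
  runtime = c(607, 85, 52, 79, 77, 67, 98,
              2932, 2870, 2877, 2868,
              2891, 2907, 2922, 2937),
  stringsAsFactors = FALSE
)

# Equivalent of: select time, partitioning_mode, count(*) ... group by ...
# length() counts the runtime values that fall into each group.
throughput <- aggregate(runtime ~ time + partitioning_mode,
                        data = df, FUN = length)

# "count" is an illustrative column name (not from the original post).
names(throughput)[names(throughput) == "runtime"] <- "count"

print(throughput)
```

Another option might be as.data.frame(table(df$time, df$partitioning_mode)), though that also lists combinations that have zero rows.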