Here is the way that I would do it. Using 'lapply' to process the list and create a matrix; take less than 1 second: > dat <- data.frame(D=sample(32000:33000, 33000, T), + Fid=sample(1:10,33000,T), A=sample(1:5,33000,T)) > system.time({ + result <- lapply(split(seq(nrow(dat)), dat$D), function(.d){ # split by first level + lapply(split(.d, dat$Fid[.d]), function(.f){ # now by the second + # create the sum and count + c(D=dat$D[.f[1]], Fid=dat$Fid[.f[1]], sum=sum(dat$A[.f]), cnt=length(.f)) + }) + }) + mat <- do.call('rbind',lapply(result, function(x) do.call('rbind',x))) + }) [1] 0.66 0.00 0.73 NA NA > > mat[1:20,] D Fid sum cnt 1 32000 1 8 3 2 32000 2 11 4 3 32000 3 11 3 4 32000 4 2 1 5 32000 5 8 2 6 32000 6 4 2 7 32000 7 21 6 8 32000 8 13 3 9 32000 9 12 4 10 32000 10 10 3 1 32001 1 12 4 2 32001 2 2 1 3 32001 3 10 4 4 32001 4 12 3 5 32001 5 10 3 6 32001 6 8 2 7 32001 7 22 7 8 32001 8 3 2 9 32001 9 7 3 10 32001 10 3 2 >
On 10/14/05, TEMPL Matthias <[EMAIL PROTECTED]> wrote: > > Hi, > > Yesterday, I have analysed data with 160000 rows and 10 columns. > Aggregation would be impossible with a data frame format, but when > converting it to a matrix with *numeric* entries (check, if the variables > are of class numeric!) the computation needs only 7 seconds on a Pentium > III. I´m sadly to say, that this is also slow in comparsion with the proc > summary in SAS (less than one second), but the code is much more elegant in > R! > > Best, > Matthias > > > > Hi, > > > > I use the code below to aggregate / cnt my test data. It > > works fine, but the problem is with my real data (33'000 > > rows) where the function is really slow (nothing happened in > > half an hour). > > > > Does anybody know of other functions that I could use? > > > > Thanks, > > Hans-Peter > > > > -------------- > > dat <- data.frame( Datum = c( 32586, 32587, 32587, 32625, > > 32656, 32656, 32656, 32672, 32672, 32699 ), > > FischerID = c( 58395, 58395, 58395, 88434, > > 89953, 89953, 89953, 64395, 62896, 62870 ), > > Anzahl = c( 2, 2, 1, 1, 2, 1, 7, 1, 1, 2 ) ) > > f <- function(x) data.frame( Datum = x[1,1], FischerID = > > x[1,2], Anzahl = sum( x[,3] ), Cnt = dim( x )[1] ) > > t.a <- do.call("rbind", by(dat, dat[,1:2], f)) # slow for > > 33'000 rows > > t.a <- t.a[order( t.a[,1], t.a[,2] ),] > > > > # show data > > dat > > t.a > > > > ______________________________________________ > > R-help@stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read > > the posting guide! http://www.R-project.org/posting-guide.html > > > > ______________________________________________ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > -- Jim Holtman Cincinnati, OH +1 513 247 0281 What the problem you are trying to solve? [[alternative HTML version deleted]]
______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html