Takes 0.6 seconds on my slow laptop:

> n <- 1e6
> x <- data.frame(a=sample(LETTERS, n, TRUE))
> system.time(print(tapply(x$a, x$a, length)))
    A     B     C     D     E     F     G     H     I     J     K     L     M     N     O     P     Q
38555 38349 38647 38271 38456 38352 38644 38679 38575 38730 38471 38379 38540 38413 38365 38501 38555
    R     S     T     U     V     W     X     Y     Z
38379 38417 38326 38509 38238 38395 38625 38175 38454
   user  system elapsed
   0.59    0.02    0.63
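
For comparison, here is a minimal sketch of two other base R ways to get the same per-group counts. It assumes the same synthetic data as above; the set.seed() call and the variable names counts_tapply, counts_table and counts_split are additions for illustration, and lengths() needs R 3.2.0 or later. I have not timed these, so wrap each in system.time() on your own data before picking one.

```r
# Same synthetic data as in the timing above.
# set.seed() is an addition here, only to make the run reproducible.
set.seed(1)
n <- 1e6
x <- data.frame(a = sample(LETTERS, n, TRUE))

# tapply(), as in the session above: one count per group
counts_tapply <- tapply(x$a, x$a, length)

# table() tabulates the grouping column directly
counts_table <- table(x$a)

# split() applied to the bare column (not the whole data frame),
# followed by lengths() to count the elements in each group
counts_split <- lengths(split(x$a, x$a))

# All three approaches should give identical counts
stopifnot(all(as.vector(counts_tapply) == as.vector(counts_table)))
stopifnot(all(as.vector(counts_split) == as.vector(counts_table)))
```

Note that split() here is called on the single column, not on the whole data frame, which is a different operation from splitting my.df itself.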

On Wed, Sep 2, 2009 at 6:39 PM, Leo Alekseyev <dnqu...@gmail.com> wrote:
> I have a data frame with about 10^6 rows; I want to group the data
> according to entries in one of the columns and do something with it.
> For instance, suppose I want to count up the number of elements in
> each group. I tried something like aggregate(my.df$my.field,
> list(my.df$my.field), length) but it seems to be very slow. Likewise,
> the split() function was slow (I killed it before it completed). Is
> there a way to efficiently accomplish this in R?.. I am almost
> tempted to write an external Perl/Python script entering every row
> into a hashtable keyed by my.field and iterating over the keys...
> Might this be faster?..
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?