Hello,

This is to be expected. A matrix can hold only one type of data, so the question of type is settled once and for all. A data frame can hold many types of data, so the code that handles it must work out which type it is dealing with on every access.
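As an illustration, here is a minimal sketch of one way to avoid the repeated data frame access in your example: extract the column once, loop over the plain vector, and assign it back once. The name fill_forward is hypothetical, and the sketch assumes a data frame with a numeric column x, as in dumkoll(). The last lines show that as.matrix() coerces mixed columns to a single type.

```r
## Hypothetical helper: do the loop on an atomic vector, not on the data frame.
fill_forward <- function(dfr) {
    x <- dfr$x                      # extract the column once
    for (i in seq_along(x)[-1]) {   # 2, 3, ..., length(x); empty if length(x) < 2
        x[i] <- x[i - 1]            # plain vector assignment
    }
    dfr$x <- x                      # one data frame assignment at the end
    dfr
}

set.seed(1)
dfr <- data.frame(x = rnorm(1000), y = rnorm(1000))
res <- fill_forward(dfr)

## A matrix has a single type: mixed columns are coerced to a common one.
mixed <- data.frame(a = 1:2, b = c("u", "v"))
m <- as.matrix(mixed)
typeof(m)
```

Wrapping fill_forward(dfr) in system.time() lets you compare it with both branches of dumkoll() on your own machine.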

Hope this helps,

Rui Barradas

On 16-03-2014 18:57, Göran Broström wrote:
I have always known that "matrices are faster than data frames", for
instance this function:


dumkoll <- function(n = 1000, df = TRUE){
     dfr <- data.frame(x = rnorm(n), y = rnorm(n))
     if (df){
         ## Update the column directly in the data frame, row by row
         for (i in 2:NROW(dfr)){
             if (!(i %% 100)) cat("i = ", i, "\n")
             dfr$x[i] <- dfr$x[i-1]
         }
     }else{
         ## Same loop on a matrix copy; write the column back once at the end
         dm <- as.matrix(dfr)
         for (i in 2:NROW(dm)){
             if (!(i %% 100)) cat("i = ", i, "\n")
             dm[i, 1] <- dm[i-1, 1]
         }
         dfr$x <- dm[, 1]
     }
}

--------------------
 > system.time(dumkoll())

    user  system elapsed
   0.046   0.000   0.045

 > system.time(dumkoll(df = FALSE))

    user  system elapsed
   0.007   0.000   0.008
----------------------

OK, no big deal, but I stumbled over a data frame with one million
records. Then, with df = TRUE,
----------------------------
      user    system   elapsed
44677.141  1271.544 46016.754
----------------------------
This is around 12 hours.

With df = FALSE, it took only six seconds! About 7500 times faster.

I was really surprised by the huge difference, and I wonder if this is
to be expected, or if it is some peculiarity with my installation: I'm
running Ubuntu 13.10 on a MacBook Pro with 8 GB of memory, R-3.0.3.

Göran B.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
