Hello,
This is to be expected. A matrix can hold only one type of data, so the
question of which type to handle is settled once and for all. A data
frame can hold many types of data, so the code that handles it must
determine the type on every access.
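If the loop only touches one column, a common workaround is to pull that
column out as a plain vector, loop over the vector, and write it back
once. Every pass through dfr$x[i] <- ... calls the data frame
replacement method, while x[i] <- ... on a bare vector does not. A
minimal sketch, assuming only column x is needed; the name dumkoll_vec
is made up here and is not from the original post:

```r
## Hypothetical variant of dumkoll(): same loop, run on a plain vector.
dumkoll_vec <- function(n = 1000){
    dfr <- data.frame(x = rnorm(n), y = rnorm(n))
    x <- dfr$x                  # extract the column once
    for (i in 2:NROW(dfr)){
        x[i] <- x[i-1]          # plain vector assignment
    }
    dfr$x <- x                  # single write back to the data frame
    dfr                         # return the result so it can be inspected
}
```

I have not timed this on the one-million-row case, so treat it as a
sketch rather than a measured fix.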
Hope this helps,
Rui Barradas
On 16-03-2014 18:57, Göran Broström wrote:
I have always known that "matrices are faster than data frames"; take
this function, for instance:
dumkoll <- function(n = 1000, df = TRUE){
    dfr <- data.frame(x = rnorm(n), y = rnorm(n))
    if (df){
        ## Update the data frame column one element at a time
        for (i in 2:NROW(dfr)){
            if (!(i %% 100)) cat("i = ", i, "\n")
            dfr$x[i] <- dfr$x[i-1]
        }
    }else{
        ## Convert to a matrix, update it, then copy the column back
        dm <- as.matrix(dfr)
        for (i in 2:NROW(dm)){
            if (!(i %% 100)) cat("i = ", i, "\n")
            dm[i, 1] <- dm[i-1, 1]
        }
        dfr$x <- dm[, 1]
    }
}
--------------------
> system.time(dumkoll())
   user  system elapsed
  0.046   0.000   0.045
> system.time(dumkoll(df = FALSE))
   user  system elapsed
  0.007   0.000   0.008
----------------------
OK, no big deal, but I stumbled over a data frame with one million
records. Then, with df = TRUE,
----------------------------
     user    system   elapsed
44677.141  1271.544 46016.754
----------------------------
This is nearly 13 hours.
With df = FALSE, it took only six seconds! About 7500 times faster.
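As a quick check of the figures above, taking the reported elapsed time
and assuming the matrix run took exactly 6 seconds (the post only says
"six seconds"):

```r
elapsed_df <- 46016.754    # elapsed seconds reported for df = TRUE
elapsed_dm <- 6            # assumed: "six seconds" taken as exactly 6

hours   <- elapsed_df / 3600         # about 12.78 hours
speedup <- elapsed_df / elapsed_dm   # about 7669 times faster
```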
I was really surprised by the huge difference, and I wonder if this is
to be expected, or if it is some peculiarity with my installation: I'm
running Ubuntu 13.10 on a MacBook Pro with 8 GB of memory, R-3.0.3.
Göran B.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.