I see a big speed difference between 2.15.2 and 3.0.2 in parse() (which is used by source()) when it is parsing long vectors of numeric data. dump/source has never been an efficient way of transferring data between different R sessions, but it is much worse now for long vectors. In 2.15.2, doubling the size of the vector (for lengths in the range 10^4 to 10^7) makes the time to parse go up by a factor of about 2.1. In 3.0.2 that factor is more like 4.4.
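To make the doubling factors easy to check, here is a minimal sketch that computes the ratio of successive elapsed times from the timings reported in this message. The helper name doubling_factor is hypothetical, introduced only for this illustration. A factor near 2 per doubling is consistent with roughly linear time, and a factor near 4 with roughly quadratic time.

```r
# Elapsed times in seconds, copied from the timing table in this message.
# NA marks sizes that did not finish under 3.0.2.
n <- 2^(11:25)
elapsed_2152 <- c(0.003, 0.006, 0.013, 0.025, 0.050, 0.100, 0.219, 0.808,
                  2.022, 4.918, 9.857, 22.916, 49.671, 101.042, 512.719)
elapsed_302  <- c(0.018, 0.065, 0.254, 1.067, 4.114, 16.236, 66.013,
                  291.883, 1285.265, rep(NA_real_, 6))

# Hypothetical helper: ratio of each timing to the one for half the size.
doubling_factor <- function(elapsed) elapsed[-1] / elapsed[-length(elapsed)]

ratios <- cbind(n        = n[-1],
                "2.15.2" = doubling_factor(elapsed_2152),
                "3.0.2"  = doubling_factor(elapsed_302))
print(round(ratios, 2))

# log2 of the factor gives the apparent exponent k in time ~ n^k.
print(round(log2(ratios[, c("2.15.2", "3.0.2")]), 2))
```

The smallest sizes have timings of only a few milliseconds, so their ratios are noisy; the larger sizes are the ones to look at.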

           n  elapsed-2.15.2  elapsed-3.0.2
        2048           0.003          0.018
        4096           0.006          0.065
        8192           0.013          0.254
       16384           0.025          1.067
       32768           0.050          4.114
       65536           0.100         16.236
      131072           0.219         66.013
      262144           0.808        291.883
      524288           2.022       1285.265
     1048576           4.918             NA
     2097152           9.857             NA
     4194304          22.916             NA
     8388608          49.671             NA
    16777216         101.042             NA
    33554432         512.719             NA

I tried this with 64-bit R on a Linux box. The NAs represent sizes that did not finish while I was at a 1 1/2 hour dentist's appointment. The timing function was:

    test <- function(n = 2^(11:25))
    {
        tf <- tempfile()
        on.exit(unlink(tf))  # remove the temporary file when done
        t(sapply(n, function(n){
            # write a numeric vector of length n as R source text
            dput(log(seq_len(n)), file=tf)
            # time parse() alone; print user, system and elapsed times
            print(c(n=n, system.time(parse(file=tf))[1:3]))
        }))
    }

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf
> Of Carl Witthoft
> Sent: Wednesday, October 30, 2013 5:29 AM
> To: r-help@r-project.org
> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
>
> Did you run the identical code on the identical machine, and did you verify
> there were no other tasks running which might have limited the RAM available
> to R? And equally important, did you run these tests in the reverse order
> (in case R was storing large objects from the first run, thus chewing up
> RAM)?
>
>
> Dear All,
>
> is it known that source works much faster in R 2.15.2 than in R 3.0.2 ?
> In the example below I observe e.g. for a data.frame with 10^7 rows the
> following timings:
>
> R version 2.15.2 Patched (2012-11-29 r61184)
> length: 1e+07
>    user  system elapsed
>   62.04    0.22   62.26
>
> R version 3.0.2 Patched (2013-10-27 r64116)
> length: 1e+07
>    user  system elapsed
>  388.63  176.42  566.41
>
> Is there a way to speed R version 3.0.2 up to the performance of R
> version 2.15.2?
>
> best regards,
>
> Heinz Tüchler
>
>
> example:
> sessionInfo()
> sample.vec <-
>   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
>     'named', 'file', 'or', 'URL', 'or', 'connection')
> dmp.size <- c(10^(1:7))
> set.seed(37)
>
> for(i in dmp.size) {
>   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
>   dump('df0', file='testdump')
>   cat('length:', i, '\n')
>   print(system.time(source('testdump', keep.source = FALSE,
>                            encoding='')))
> }
>
> output for R version 2.15.2 Patched (2012-11-29 r61184):
> > sessionInfo()
> R version 2.15.2 Patched (2012-11-29 r61184)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Switzerland.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> > sample.vec <-
> +   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
> +     'named', 'file', 'or', 'URL', 'or', 'connection')
> > dmp.size <- c(10^(1:7))
> > set.seed(37)
> >
> > for(i in dmp.size) {
> +   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
> +   dump('df0', file='testdump')
> +   cat('length:', i, '\n')
> +   print(system.time(source('testdump', keep.source = FALSE,
> +                            encoding='')))
> + }
> length: 10
>    user  system elapsed
>       0       0       0
> length: 100
>    user  system elapsed
>       0       0       0
> length: 1000
>    user  system elapsed
>       0       0       0
> length: 10000
>    user  system elapsed
>    0.02    0.00    0.01
> length: 1e+05
>    user  system elapsed
>    0.21    0.00    0.20
> length: 1e+06
>    user  system elapsed
>    4.47    0.04    4.51
> length: 1e+07
>    user  system elapsed
>   62.04    0.22   62.26
>
>
> output for R version 3.0.2 Patched (2013-10-27 r64116):
> > sessionInfo()
> R version 3.0.2 Patched (2013-10-27 r64116)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Switzerland.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> > sample.vec <-
> +   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
> +     'named', 'file', 'or', 'URL', 'or', 'connection')
> > dmp.size <- c(10^(1:7))
> > set.seed(37)
> >
> > for(i in dmp.size) {
> +   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
> +   dump('df0', file='testdump')
> +   cat('length:', i, '\n')
> +   print(system.time(source('testdump', keep.source = FALSE,
> +                            encoding='')))
> + }
> length: 10
>    user  system elapsed
>       0       0       0
> length: 100
>    user  system elapsed
>       0       0       0
> length: 1000
>    user  system elapsed
>       0       0       0
> length: 10000
>    user  system elapsed
>    0.01    0.00    0.01
> length: 1e+05
>    user  system elapsed
>    0.36    0.06    0.42
> length: 1e+06
>    user  system elapsed
>    6.02    1.86    7.88
> length: 1e+07
>    user  system elapsed
>  388.63  176.42  566.41
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/big-speed-difference-in-source-btw-R-2-15-2-and-R-3-0-2-tp4679314p4679346.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.