On 30/10/2013 21:15, William Dunlap wrote:
I have to defer to others for policy declarations like how long
the current format used by load and save should be readable.

You could also ask how long R will last ....

R can still read (but not write) save() formats used in the 1990s. We would expect R to be able to read files saved by any version since R 1.0.0 for as long as R exists. And as R is Open Source, you would be able to compile it and dump the objects you want for as long as suitable compilers and OSes exist .... And of course R is not the only application that can read the format.

There is no guarantee that source() will be able to parse dumps from earlier versions of R, and in the past it has not always been able to do so.
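Heinz's question concerns data frames whose columns carry extra attributes. The following is a minimal sketch of the two binary alternatives to dump()/source(), saveRDS()/readRDS() and save()/load(). It assumes one of these fits the workflow. The attribute names 'label' and 'units' are hypothetical examples, not anything R requires.

```r
# Hypothetical example data: a data frame whose column 'x' carries
# extra attributes ('label' and 'units' are made-up names).
df0 <- data.frame(x = c(1.5, 2.5, 3.5), g = c("a", "b", "a"),
                  stringsAsFactors = FALSE)
attr(df0$x, "label") <- "measurement"
attr(df0$x, "units") <- "mm"

# saveRDS()/readRDS(): serialize a single object; the reader picks the name.
rds <- tempfile(fileext = ".rds")
saveRDS(df0, file = rds)
df1 <- readRDS(rds)

# save()/load(): store named objects; load() recreates 'df0' in 'envir'.
rda <- tempfile(fileext = ".rda")
save(df0, file = rda)
e <- new.env()
load(rda, envir = e)

# Both binary round trips return an identical object, attributes included.
stopifnot(identical(df0, df1), identical(df0, e$df0))
print(attributes(df1$x))

unlink(c(rds, rda))
```

Unlike dump(), neither function goes through the parser, so the parse() slowdown discussed in this thread does not apply to them.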

People commenting on parse() speed should note the NEWS for R-devel:

    • The parser has been modified to use less memory.



Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


-----Original Message-----
From: Heinz Tuechler [mailto:tuech...@gmx.at]
Sent: Wednesday, October 30, 2013 1:43 PM
To: William Dunlap
Cc: Carl Witthoft; r-help@r-project.org
Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

Many thanks for confirming my impression. I use dump for storing large
data.frames with a number of attributes for each variable. save/load is
much faster, but I am unsure whether such files will still be readable by
R versions years from now.
What format/functions would you suggest for data storage/transfer
between different (future) R versions?

best regards,
Heinz

On 30.10.2013 20:11, William Dunlap wrote:
I see a big 2.15.2/3.0.2 speed difference in parse() (which is used by source())
when it is parsing long vectors of numeric data. dump/source has never been an
efficient way of transferring data between different R sessions, but it is much
worse now for long vectors. In 2.15.2 doubling the size of the vector (of lengths
in the range 10^4 to 10^7) makes the time to parse go up by a factor of c. 2.1.
In 3.0.2 that factor is more like 4.4.

         n  2.15.2 elapsed (s)  3.0.2 elapsed (s)
      2048               0.003              0.018
      4096               0.006              0.065
      8192               0.013              0.254
     16384               0.025              1.067
     32768               0.050              4.114
     65536               0.100             16.236
    131072               0.219             66.013
    262144               0.808            291.883
    524288               2.022           1285.265
   1048576               4.918                 NA
   2097152               9.857                 NA
   4194304              22.916                 NA
   8388608              49.671                 NA
  16777216             101.042                 NA
  33554432             512.719                 NA

I tried this with 64-bit R on a Linux box.  The NAs represent sizes that did
not finish while I was at a 1 1/2 hour dentist's appointment.  The timing
function was:
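    test <- function(n = 2^(11:25))
    {
        tf <- tempfile()
        on.exit(unlink(tf))
        # For each size, write a numeric vector as R source with dput(),
        # then time parse() on that file; keep user, system and elapsed.
        t(sapply(n, function(n){
            dput(log(seq_len(n)), file=tf)
            print(c(n=n, system.time(parse(file=tf))[1:3]))
        }))
    }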
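The doubling factors quoted above can be checked against the table. The small sketch below copies the elapsed times for the sizes that both versions finished (2^11 to 2^19). From those it computes the ratio between consecutive rows and the exponent k of a log-log least-squares fit of time against n. A ratio near 2 (k near 1) indicates roughly linear cost, and a ratio near 4 (k near 2) indicates roughly quadratic cost, since (2n)^2 / n^2 = 4. The helper names growth() and slope() are hypothetical.

```r
# Elapsed times (seconds) copied from the table above, for the sizes
# where both versions finished: n = 2^11, ..., 2^19.
n     <- 2^(11:19)
r2152 <- c(0.003, 0.006, 0.013, 0.025, 0.050, 0.100, 0.219, 0.808, 2.022)
r302  <- c(0.018, 0.065, 0.254, 1.067, 4.114, 16.236, 66.013, 291.883,
           1285.265)

# Hypothetical helper: factor by which the time grows each time n doubles.
growth <- function(t) t[-1] / t[-length(t)]

# Hypothetical helper: exponent k in time ~ n^k from a log-log
# least-squares fit.
slope <- function(t) unname(coef(lm(log(t) ~ log(n)))[2])

print(round(cbind(n = n[-1], "2.15.2" = growth(r2152),
                  "3.0.2" = growth(r302)), 2))
print(round(c("2.15.2" = slope(r2152), "3.0.2" = slope(r302)), 2))
```

With these numbers the fitted exponent is about 1.15 for 2.15.2 and about 2.0 for 3.0.2. This is consistent with roughly linear versus roughly quadratic growth in this particular test.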

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Carl Witthoft
Sent: Wednesday, October 30, 2013 5:29 AM
To: r-help@r-project.org
Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

Did you run the identical code on the identical machine, and did you verify
there were no other tasks running which might have limited the RAM available
to R?  And equally important, did you run these tests in the reverse order
(in case R was storing large objects from the first run, thus chewing up
RAM)?



Dear All,

Is it known that source() works much faster in R 2.15.2 than in R 3.0.2?
In the example below I observe e.g. for a data.frame with 10^7 rows the
following timings:

R version 2.15.2 Patched (2012-11-29 r61184)
length: 1e+07
      user  system elapsed
     62.04    0.22   62.26

R version 3.0.2 Patched (2013-10-27 r64116)
length: 1e+07
      user  system elapsed
    388.63  176.42  566.41

Is there a way to speed R version 3.0.2 up to the performance of R
version 2.15.2?
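Bill's reply notes that dump()/source() has never been an efficient way to move data between R sessions. The following is a minimal sketch comparing that text round trip with a binary one (saveRDS()/readRDS()) on the same data frame. The size n = 1e4 is an arbitrary choice that keeps the text round trip fast. Timings depend on the machine and the R version, so none are shown.

```r
# Example data in the spirit of the test above (n chosen arbitrarily).
set.seed(37)
n <- 1e4
words <- c("source", "causes", "R", "to", "accept", "its", "input")
df0 <- data.frame(x = sample(words, n, replace = TRUE),
                  stringsAsFactors = FALSE)

dmp <- tempfile(fileext = ".R")
rds <- tempfile(fileext = ".rds")

# Text round trip: dump() writes R code, source() parses and evaluates it
# into the environment 'e'.
t_text <- system.time({
    dump("df0", file = dmp)
    e <- new.env()
    source(dmp, local = e, keep.source = FALSE)
})

# Binary round trip: saveRDS() serializes, readRDS() restores the object.
t_bin <- system.time({
    saveRDS(df0, file = rds)
    df_rds <- readRDS(rds)
})

# User, system and elapsed seconds for each approach.
print(rbind(text = t_text[1:3], binary = t_bin[1:3]))

# Both routes should reproduce the original object exactly.
stopifnot(identical(e$df0, df0), identical(df_rds, df0))
unlink(c(dmp, rds))
```

The binary route skips the parser entirely, which is why it is unaffected by the parse() slowdown reported in this thread.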

best regards,

Heinz Tüchler


Example:
sessionInfo()
sample.vec <-
     c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
       'named', 'file', 'or', 'URL', 'or', 'connection')
dmp.size <- c(10^(1:7))
set.seed(37)

# For each size: build a one-column data.frame of sampled words, write it
# as R code with dump(), then time reading it back with source().
for(i in dmp.size) {
     df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
     dump('df0', file='testdump')
     cat('length:', i, '\n')
     print(system.time(source('testdump', keep.source = FALSE,
                              encoding='')))
}

output for R version 2.15.2 Patched (2012-11-29 r61184):
sessionInfo()
R version 2.15.2 Patched (2012-11-29 r61184)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
sample.vec <-
+   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
+     'named', 'file', 'or', 'URL', 'or', 'connection')
dmp.size <- c(10^(1:7))
set.seed(37)

for(i in dmp.size) {
+   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
+   dump('df0', file='testdump')
+   cat('length:', i, '\n')
+   print(system.time(source('testdump', keep.source = FALSE,
+                            encoding='')))
+ }
length: 10
      user  system elapsed
         0       0       0
length: 100
      user  system elapsed
         0       0       0
length: 1000
      user  system elapsed
         0       0       0
length: 10000
      user  system elapsed
      0.02    0.00    0.01
length: 1e+05
      user  system elapsed
      0.21    0.00    0.20
length: 1e+06
      user  system elapsed
      4.47    0.04    4.51
length: 1e+07
      user  system elapsed
     62.04    0.22   62.26



output for R version 3.0.2 Patched (2013-10-27 r64116):
sessionInfo()
R version 3.0.2 Patched (2013-10-27 r64116)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
sample.vec <-
+   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
+     'named', 'file', 'or', 'URL', 'or', 'connection')
dmp.size <- c(10^(1:7))
set.seed(37)

for(i in dmp.size) {
+   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
+   dump('df0', file='testdump')
+   cat('length:', i, '\n')
+   print(system.time(source('testdump', keep.source = FALSE,
+                            encoding='')))
+ }
length: 10
      user  system elapsed
         0       0       0
length: 100
      user  system elapsed
         0       0       0
length: 1000
      user  system elapsed
         0       0       0
length: 10000
      user  system elapsed
      0.01    0.00    0.01
length: 1e+05
      user  system elapsed
      0.36    0.06    0.42
length: 1e+06
      user  system elapsed
      6.02    1.86    7.88
length: 1e+07
      user  system elapsed
    388.63  176.42  566.41







______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

