Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()
Looks like this is an unavoidable interaction between the way source references and lazy loading are implemented. The link back to the crash_dumps environment comes though source references on an unevaluated argument promise. Creating a fresh environment is .onLoad() avoids this and is probably your best bet. Having an option to serialize without source references might be nice but would probably not be high enough on anyone's priority list to get done anytime soon. Best, luke On Thu, 8 Apr 2021, luke-tier...@uiowa.edu wrote: I see that now also. Not sure yet what is going on. One work-around that may work for you is to create a fresh crash dump in a .onLoad function; somehting like crash_dumps <- NULL .onLoad <- function(...) crash_dumps <<- new.env() Best, luke On Wed, 7 Apr 2021, Andreas Kersting wrote: Hi Dirk, hi Luke, Thanks for checking! I could narrow it down further. I have the issue only if I install --with-keep.source, i.e. R CMD INSTALL --with-keep.source dumpTest Since this is the default in RStudio when clicking "Install and Restart", I was always having the issue - also from base R. If I install using e.g. devtools::install_github() directly it is also fine for me. Could you please confirm? Thanks! Regards, Andreas 2021-04-07 16:20 GMT+02:00 "Dirk Eddelbuettel" : On 7 April 2021 at 16:06, Andreas Kersting wrote: | Hi Luke, | | Please see https://github.com/akersting/dumpTest for the package. | | Here a session showing my issue: | | > library(dumpTest) | > sessionInfo() | R version 4.0.5 (2021-03-31) | Platform: x86_64-pc-linux-gnu (64-bit) | Running under: Debian GNU/Linux 10 (buster) | | Matrix products: default | BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0 | LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0 | | locale: | [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C | [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 | [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 | [7] LC_PAPER=en_US.UTF-8 LC_NAME=C | [9] LC_ADDRESS=C LC_TELEPHONE=C | [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C | | attached base packages: | [1] stats graphics grDevices utils datasets methods base | | other attached packages: | [1] dumpTest_0.1.0 | | loaded via a namespace (and not attached): | [1] compiler_4.0.5 | > for (i in 1:100) { | + print(i) | + print(system.time(f())) | + } | [1] 1 |user system elapsed | 0.028 0.004 0.034 | [1] 2 |user system elapsed | 0.067 0.008 0.075 | [1] 3 |user system elapsed | 0.176 0.000 0.176 | [1] 4 |user system elapsed | 0.335 0.012 0.349 | [1] 5 |user system elapsed | 0.745 0.023 0.770 | [1] 6 |user system elapsed | 1.495 0.060 1.572 | [1] 7 |user system elapsed | 2.902 0.136 3.040 | [1] 8 |user system elapsed | 5.753 0.272 6.034 | [1] 9 |user system elapsed | 11.807 0.708 12.597 | [1] 10 | ^C | Timing stopped at: 6.638 0.549 7.214 | | I had to interrupt in iteration 10 because I was running low on RAM. No issue here. Ubuntu 20.10, R 4.0.5 'from CRAN' i.e. Michael's PPA build off my Debian package, hence instrumentation as in the Debian package. edd@rob:~$ installGithub.r akersting/dumpTest Using github PAT from envvar GITHUB_PAT Downloading GitHub repo akersting/dumpTest@HEAD ✔ checking for file ‘/tmp/remotes3f9af733166ccd/akersting-dumpTest-3bed8e2/DESCRIPTION’ ... ─ preparing ‘dumpTest’: ✔ checking DESCRIPTION meta-information ... ─ checking for LF line-endings in source and make files and shell scripts ─ checking for empty or unneeded directories ─ building ‘dumpTest_0.1.0.tar.gz’ Installing package into ‘/usr/local/lib/R/site-library’ (as ‘lib’ is unspecified) * installing *source* package ‘dumpTest’ ... ** using staged installation ** R ** byte-compile and prepare package for lazy loading ** help No man pages found in package ‘dumpTest’ *** installing help indices ** building package indices ** testing if installed package can be loaded from temporary location ** testing if installed package can be loaded from final location ** testing if installed package keeps a record of temporary installation path * DONE (dumpTest) edd@rob:~$ Rscript -e 'system.time({for (i in 1:100) dumpTest::f()})' user system elapsed 0.481 0.019 0.500 edd@rob:~$ (I also ran the variant you showed with the dual print statements, it just consumes more screen real estate and ends on [...] [1] 97 user system elapsed 0.004 0.000 0.005 [1] 98 user system elapsed 0.004 0.000 0.005 [1] 99 user system elapsed 0.004 0.000 0.004 [1] 100 user system elapsed 0.005 0.000 0.005 edd@rob:~$ ) Dirk -- https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax:
Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()
I see that now also. Not sure yet what is going on. One work-around that may work for you is to create a fresh crash dump in a .onLoad function; somehting like crash_dumps <- NULL .onLoad <- function(...) crash_dumps <<- new.env() Best, luke On Wed, 7 Apr 2021, Andreas Kersting wrote: Hi Dirk, hi Luke, Thanks for checking! I could narrow it down further. I have the issue only if I install --with-keep.source, i.e. R CMD INSTALL --with-keep.source dumpTest Since this is the default in RStudio when clicking "Install and Restart", I was always having the issue - also from base R. If I install using e.g. devtools::install_github() directly it is also fine for me. Could you please confirm? Thanks! Regards, Andreas 2021-04-07 16:20 GMT+02:00 "Dirk Eddelbuettel" : On 7 April 2021 at 16:06, Andreas Kersting wrote: | Hi Luke, | | Please see https://github.com/akersting/dumpTest for the package. | | Here a session showing my issue: | | > library(dumpTest) | > sessionInfo() | R version 4.0.5 (2021-03-31) | Platform: x86_64-pc-linux-gnu (64-bit) | Running under: Debian GNU/Linux 10 (buster) | | Matrix products: default | BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0 | LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0 | | locale: | [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C | [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 | [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 | [7] LC_PAPER=en_US.UTF-8 LC_NAME=C | [9] LC_ADDRESS=C LC_TELEPHONE=C | [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C | | attached base packages: | [1] stats graphics grDevices utils datasets methods base | | other attached packages: | [1] dumpTest_0.1.0 | | loaded via a namespace (and not attached): | [1] compiler_4.0.5 | > for (i in 1:100) { | + print(i) | + print(system.time(f())) | + } | [1] 1 |user system elapsed | 0.028 0.004 0.034 | [1] 2 |user system elapsed | 0.067 0.008 0.075 | [1] 3 |user system elapsed | 0.176 0.000 0.176 | [1] 4 |user system elapsed | 0.335 0.012 0.349 | [1] 5 |user system elapsed | 0.745 0.023 0.770 | [1] 6 |user system elapsed | 1.495 0.060 1.572 | [1] 7 |user system elapsed | 2.902 0.136 3.040 | [1] 8 |user system elapsed | 5.753 0.272 6.034 | [1] 9 |user system elapsed | 11.807 0.708 12.597 | [1] 10 | ^C | Timing stopped at: 6.638 0.549 7.214 | | I had to interrupt in iteration 10 because I was running low on RAM. No issue here. Ubuntu 20.10, R 4.0.5 'from CRAN' i.e. Michael's PPA build off my Debian package, hence instrumentation as in the Debian package. edd@rob:~$ installGithub.r akersting/dumpTest Using github PAT from envvar GITHUB_PAT Downloading GitHub repo akersting/dumpTest@HEAD ✔ checking for file ‘/tmp/remotes3f9af733166ccd/akersting-dumpTest-3bed8e2/DESCRIPTION’ ... ─ preparing ‘dumpTest’: ✔ checking DESCRIPTION meta-information ... ─ checking for LF line-endings in source and make files and shell scripts ─ checking for empty or unneeded directories ─ building ‘dumpTest_0.1.0.tar.gz’ Installing package into ‘/usr/local/lib/R/site-library’ (as ‘lib’ is unspecified) * installing *source* package ‘dumpTest’ ... ** using staged installation ** R ** byte-compile and prepare package for lazy loading ** help No man pages found in package ‘dumpTest’ *** installing help indices ** building package indices ** testing if installed package can be loaded from temporary location ** testing if installed package can be loaded from final location ** testing if installed package keeps a record of temporary installation path * DONE (dumpTest) edd@rob:~$ Rscript -e 'system.time({for (i in 1:100) dumpTest::f()})' user system elapsed 0.481 0.019 0.500 edd@rob:~$ (I also ran the variant you showed with the dual print statements, it just consumes more screen real estate and ends on [...] [1] 97 user system elapsed 0.004 0.000 0.005 [1] 98 user system elapsed 0.004 0.000 0.005 [1] 99 user system elapsed 0.004 0.000 0.004 [1] 100 user system elapsed 0.005 0.000 0.005 edd@rob:~$ ) Dirk -- https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()
Hi, For (hopefully) full reproducibility: docker run rocker/tidyverse:4.0.5 Rscript -e 'devtools::install_github("akersting/dumpTest", INSTALL_opts = "--with-keep.source"); library(dumpTest); for (i in 1:100) {print(i); print(system.time(f()))}' Regards, Andreas 2021-04-07 17:09 GMT+02:00 "Andreas Kersting" : > Hi Dirk, hi Luke, > > Thanks for checking! > > I could narrow it down further. I have the issue only if I install > --with-keep.source, i.e. > > R CMD INSTALL --with-keep.source dumpTest > > Since this is the default in RStudio when clicking "Install and Restart", I > was always having the issue - also from base R. If I install using e.g. > devtools::install_github() directly it is also fine for me. > > Could you please confirm? Thanks! > > Regards, > Andreas > > 2021-04-07 16:20 GMT+02:00 "Dirk Eddelbuettel" : >> >> On 7 April 2021 at 16:06, Andreas Kersting wrote: >> | Hi Luke, >> | >> | Please see https://github.com/akersting/dumpTest for the package. >> | >> | Here a session showing my issue: >> | >> | > library(dumpTest) >> | > sessionInfo() >> | R version 4.0.5 (2021-03-31) >> | Platform: x86_64-pc-linux-gnu (64-bit) >> | Running under: Debian GNU/Linux 10 (buster) >> | >> | Matrix products: default >> | BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0 >> | LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0 >> | >> | locale: >> | [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> | [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 >> | [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 >> | [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> | [9] LC_ADDRESS=C LC_TELEPHONE=C >> | [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> | >> | attached base packages: >> | [1] stats graphics grDevices utils datasets methods base >> | >> | other attached packages: >> | [1] dumpTest_0.1.0 >> | >> | loaded via a namespace (and not attached): >> | [1] compiler_4.0.5 >> | > for (i in 1:100) { >> | + print(i) >> | + print(system.time(f())) >> | + } >> | [1] 1 >> |user system elapsed >> | 0.028 0.004 0.034 >> | [1] 2 >> |user system elapsed >> | 0.067 0.008 0.075 >> | [1] 3 >> |user system elapsed >> | 0.176 0.000 0.176 >> | [1] 4 >> |user system elapsed >> | 0.335 0.012 0.349 >> | [1] 5 >> |user system elapsed >> | 0.745 0.023 0.770 >> | [1] 6 >> |user system elapsed >> | 1.495 0.060 1.572 >> | [1] 7 >> |user system elapsed >> | 2.902 0.136 3.040 >> | [1] 8 >> |user system elapsed >> | 5.753 0.272 6.034 >> | [1] 9 >> |user system elapsed >> | 11.807 0.708 12.597 >> | [1] 10 >> | ^C >> | Timing stopped at: 6.638 0.549 7.214 >> | >> | I had to interrupt in iteration 10 because I was running low on RAM. >> >> No issue here. Ubuntu 20.10, R 4.0.5 'from CRAN' i.e. Michael's PPA build >> off my Debian package, hence instrumentation as in the Debian package. >> >> edd@rob:~$ installGithub.r akersting/dumpTest >> Using github PAT from envvar GITHUB_PAT >> Downloading GitHub repo akersting/dumpTest@HEAD >> ✔ checking for file >> ‘/tmp/remotes3f9af733166ccd/akersting-dumpTest-3bed8e2/DESCRIPTION’ ... >> ─ preparing ‘dumpTest’: >> ✔ checking DESCRIPTION meta-information ... >> ─ checking for LF line-endings in source and make files and shell scripts >> ─ checking for empty or unneeded directories >> ─ building ‘dumpTest_0.1.0.tar.gz’ >> >> Installing package into ‘/usr/local/lib/R/site-library’ >> (as ‘lib’ is unspecified) >> * installing *source* package ‘dumpTest’ ... >> ** using staged installation >> ** R >> ** byte-compile and prepare package for lazy loading >> ** help >> No man pages found in package ‘dumpTest’ >> *** installing help indices >> ** building package indices >> ** testing if installed package can be loaded from temporary location >> ** testing if installed package can be loaded from final location >> ** testing if installed package keeps a record of temporary installation path >> * DONE (dumpTest) >> edd@rob:~$ Rscript -e 'system.time({for (i in 1:100) dumpTest::f()})' >>user system elapsed >> 0.481 0.019 0.500 >> edd@rob:~$ >> >> (I also ran the variant you showed with the dual print statements, it just >> consumes more screen real estate and ends on >> >> [...] >> [1] 97 >>user system elapsed >> 0.004 0.000 0.005 >> [1] 98 >> >>user system elapsed >> 0.004 0.000 0.005 >> [1] 99 >>user system elapsed >> 0.004 0.000 0.004 >> >> [1] 100 >>
Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()
Hi Dirk, hi Luke, Thanks for checking! I could narrow it down further. I have the issue only if I install --with-keep.source, i.e. R CMD INSTALL --with-keep.source dumpTest Since this is the default in RStudio when clicking "Install and Restart", I was always having the issue - also from base R. If I install using e.g. devtools::install_github() directly it is also fine for me. Could you please confirm? Thanks! Regards, Andreas 2021-04-07 16:20 GMT+02:00 "Dirk Eddelbuettel" : > > On 7 April 2021 at 16:06, Andreas Kersting wrote: > | Hi Luke, > | > | Please see https://github.com/akersting/dumpTest for the package. > | > | Here a session showing my issue: > | > | > library(dumpTest) > | > sessionInfo() > | R version 4.0.5 (2021-03-31) > | Platform: x86_64-pc-linux-gnu (64-bit) > | Running under: Debian GNU/Linux 10 (buster) > | > | Matrix products: default > | BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0 > | LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0 > | > | locale: > | [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > | [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 > | [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 > | [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > | [9] LC_ADDRESS=C LC_TELEPHONE=C > | [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > | > | attached base packages: > | [1] stats graphics grDevices utils datasets methods base > | > | other attached packages: > | [1] dumpTest_0.1.0 > | > | loaded via a namespace (and not attached): > | [1] compiler_4.0.5 > | > for (i in 1:100) { > | + print(i) > | + print(system.time(f())) > | + } > | [1] 1 > |user system elapsed > | 0.028 0.004 0.034 > | [1] 2 > |user system elapsed > | 0.067 0.008 0.075 > | [1] 3 > |user system elapsed > | 0.176 0.000 0.176 > | [1] 4 > |user system elapsed > | 0.335 0.012 0.349 > | [1] 5 > |user system elapsed > | 0.745 0.023 0.770 > | [1] 6 > |user system elapsed > | 1.495 0.060 1.572 > | [1] 7 > |user system elapsed > | 2.902 0.136 3.040 > | [1] 8 > |user system elapsed > | 5.753 0.272 6.034 > | [1] 9 > |user system elapsed > | 11.807 0.708 12.597 > | [1] 10 > | ^C > | Timing stopped at: 6.638 0.549 7.214 > | > | I had to interrupt in iteration 10 because I was running low on RAM. > > No issue here. Ubuntu 20.10, R 4.0.5 'from CRAN' i.e. Michael's PPA build > off my Debian package, hence instrumentation as in the Debian package. > > edd@rob:~$ installGithub.r akersting/dumpTest > Using github PAT from envvar GITHUB_PAT > Downloading GitHub repo akersting/dumpTest@HEAD > ✔ checking for file > ‘/tmp/remotes3f9af733166ccd/akersting-dumpTest-3bed8e2/DESCRIPTION’ ... > ─ preparing ‘dumpTest’: > ✔ checking DESCRIPTION meta-information ... > ─ checking for LF line-endings in source and make files and shell scripts > ─ checking for empty or unneeded directories > ─ building ‘dumpTest_0.1.0.tar.gz’ > > Installing package into ‘/usr/local/lib/R/site-library’ > (as ‘lib’ is unspecified) > * installing *source* package ‘dumpTest’ ... > ** using staged installation > ** R > ** byte-compile and prepare package for lazy loading > ** help > No man pages found in package ‘dumpTest’ > *** installing help indices > ** building package indices > ** testing if installed package can be loaded from temporary location > ** testing if installed package can be loaded from final location > ** testing if installed package keeps a record of temporary installation path > * DONE (dumpTest) > edd@rob:~$ Rscript -e 'system.time({for (i in 1:100) dumpTest::f()})' >user system elapsed > 0.481 0.019 0.500 > edd@rob:~$ > > (I also ran the variant you showed with the dual print statements, it just > consumes more screen real estate and ends on > > [...] > [1] 97 >user system elapsed > 0.004 0.000 0.005 > [1] 98 > >user system elapsed > 0.004 0.000 0.005 > [1] 99 >user system elapsed > 0.004 0.000 0.004 > > [1] 100 > >user system elapsed > > 0.005 0.000 0.005 > edd@rob:~$ ) > > Dirk > > -- > https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org > __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()
No issues here with that either. Looks like something is different on your end. Best, luke On Wed, 7 Apr 2021, Andreas Kersting wrote: Hi Luke, Please see https://github.com/akersting/dumpTest for the package. Here a session showing my issue: library(dumpTest) sessionInfo() R version 4.0.5 (2021-03-31) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Debian GNU/Linux 10 (buster) Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] dumpTest_0.1.0 loaded via a namespace (and not attached): [1] compiler_4.0.5 for (i in 1:100) { + print(i) + print(system.time(f())) + } [1] 1 user system elapsed 0.028 0.004 0.034 [1] 2 user system elapsed 0.067 0.008 0.075 [1] 3 user system elapsed 0.176 0.000 0.176 [1] 4 user system elapsed 0.335 0.012 0.349 [1] 5 user system elapsed 0.745 0.023 0.770 [1] 6 user system elapsed 1.495 0.060 1.572 [1] 7 user system elapsed 2.902 0.136 3.040 [1] 8 user system elapsed 5.753 0.272 6.034 [1] 9 user system elapsed 11.807 0.708 12.597 [1] 10 ^C Timing stopped at: 6.638 0.549 7.214 I had to interrupt in iteration 10 because I was running low on RAM. Regards, Andreas 2021-04-07 15:28 GMT+02:00 luke-tier...@uiowa.edu: On Wed, 7 Apr 2021, Andreas Kersting wrote: Hi, please consider the following minimal reproducible example: Create a new R package which just contains the following two (exported) objects: I would not expect this behavior and I don't see it when I make such a package (in R 4.0.3 or R-devel on Ubuntu). You will need to provide a more complete reproducible example if you want help with what you are trying to do; also sessionInfo() would help. Best, luke crash_dumps <- new.env() f <- function() { x <- runif(1e5) dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL))) assign("last.dump", dump, crash_dumps) } WARNING: the following will probably eat all your RAM! Attach this package and run: for (i in 1:100) { print(i) f() } You will notice that with each iteration the execution of f() slows down significantly while the memory consumption of the R process (v4.0.5 on Linux) quickly explodes. I am having a hard time to understand what exactly is happening here. Something w.r.t. too deeply nested environments? Could someone please enlighten me? Thanks! Regards, Andreas Background: In an R package I store crash dumps on error in a parallel processes in a way similar to what I have just shown (hence the (un)serialize(), which happens as part of returning the objects to the parent process). The first 2 or 3 times I do so in a session everything is fine, but afterwards it takes very long and I soon run out of memory. Some more observations: - If I omit `x <- runif(1e5)`, the issues seem to be less pronounced. - If I assign to .GlobalEnv instead of crash_dumps, there seems to be no issue - probably because .GlobalEnv is not included in sys.frames(), while crash_dumps is indirectly via the namespace of the package being the parent.env of some of the sys.frames()!? - If I omit the lapply(...), i.e. use `dump <- unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no issue. The immediate consequence is that there are less sys.frames and - in particular - there is no frame which has the base namespace as its parent.env. - If I make crash_dumps a list and use assignInMyNamespace() to store the dump in it, there also seems to be no issue. I will probably use this as a workaround: crash_dumps <- list() f <- function() { x <- runif(1e5) dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL))) crash_dumps[["last.dump"]] <- dump assignInMyNamespace("crash_dumps", crash_dumps) } __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241
Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()
On 7 April 2021 at 16:06, Andreas Kersting wrote: | Hi Luke, | | Please see https://github.com/akersting/dumpTest for the package. | | Here a session showing my issue: | | > library(dumpTest) | > sessionInfo() | R version 4.0.5 (2021-03-31) | Platform: x86_64-pc-linux-gnu (64-bit) | Running under: Debian GNU/Linux 10 (buster) | | Matrix products: default | BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0 | LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0 | | locale: | [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C | [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 | [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 | [7] LC_PAPER=en_US.UTF-8 LC_NAME=C | [9] LC_ADDRESS=C LC_TELEPHONE=C | [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C | | attached base packages: | [1] stats graphics grDevices utils datasets methods base | | other attached packages: | [1] dumpTest_0.1.0 | | loaded via a namespace (and not attached): | [1] compiler_4.0.5 | > for (i in 1:100) { | + print(i) | + print(system.time(f())) | + } | [1] 1 |user system elapsed | 0.028 0.004 0.034 | [1] 2 |user system elapsed | 0.067 0.008 0.075 | [1] 3 |user system elapsed | 0.176 0.000 0.176 | [1] 4 |user system elapsed | 0.335 0.012 0.349 | [1] 5 |user system elapsed | 0.745 0.023 0.770 | [1] 6 |user system elapsed | 1.495 0.060 1.572 | [1] 7 |user system elapsed | 2.902 0.136 3.040 | [1] 8 |user system elapsed | 5.753 0.272 6.034 | [1] 9 |user system elapsed | 11.807 0.708 12.597 | [1] 10 | ^C | Timing stopped at: 6.638 0.549 7.214 | | I had to interrupt in iteration 10 because I was running low on RAM. No issue here. Ubuntu 20.10, R 4.0.5 'from CRAN' i.e. Michael's PPA build off my Debian package, hence instrumentation as in the Debian package. edd@rob:~$ installGithub.r akersting/dumpTest Using github PAT from envvar GITHUB_PAT Downloading GitHub repo akersting/dumpTest@HEAD ✔ checking for file ‘/tmp/remotes3f9af733166ccd/akersting-dumpTest-3bed8e2/DESCRIPTION’ ... ─ preparing ‘dumpTest’: ✔ checking DESCRIPTION meta-information ... ─ checking for LF line-endings in source and make files and shell scripts ─ checking for empty or unneeded directories ─ building ‘dumpTest_0.1.0.tar.gz’ Installing package into ‘/usr/local/lib/R/site-library’ (as ‘lib’ is unspecified) * installing *source* package ‘dumpTest’ ... ** using staged installation ** R ** byte-compile and prepare package for lazy loading ** help No man pages found in package ‘dumpTest’ *** installing help indices ** building package indices ** testing if installed package can be loaded from temporary location ** testing if installed package can be loaded from final location ** testing if installed package keeps a record of temporary installation path * DONE (dumpTest) edd@rob:~$ Rscript -e 'system.time({for (i in 1:100) dumpTest::f()})' user system elapsed 0.481 0.019 0.500 edd@rob:~$ (I also ran the variant you showed with the dual print statements, it just consumes more screen real estate and ends on [...] [1] 97 user system elapsed 0.004 0.000 0.005 [1] 98 user system elapsed 0.004 0.000 0.005 [1] 99 user system elapsed 0.004 0.000 0.004 [1] 100 user system elapsed 0.005 0.000 0.005 edd@rob:~$ ) Dirk -- https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()
Hi Luke, Please see https://github.com/akersting/dumpTest for the package. Here a session showing my issue: > library(dumpTest) > sessionInfo() R version 4.0.5 (2021-03-31) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Debian GNU/Linux 10 (buster) Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] dumpTest_0.1.0 loaded via a namespace (and not attached): [1] compiler_4.0.5 > for (i in 1:100) { + print(i) + print(system.time(f())) + } [1] 1 user system elapsed 0.028 0.004 0.034 [1] 2 user system elapsed 0.067 0.008 0.075 [1] 3 user system elapsed 0.176 0.000 0.176 [1] 4 user system elapsed 0.335 0.012 0.349 [1] 5 user system elapsed 0.745 0.023 0.770 [1] 6 user system elapsed 1.495 0.060 1.572 [1] 7 user system elapsed 2.902 0.136 3.040 [1] 8 user system elapsed 5.753 0.272 6.034 [1] 9 user system elapsed 11.807 0.708 12.597 [1] 10 ^C Timing stopped at: 6.638 0.549 7.214 I had to interrupt in iteration 10 because I was running low on RAM. Regards, Andreas 2021-04-07 15:28 GMT+02:00 luke-tier...@uiowa.edu: > On Wed, 7 Apr 2021, Andreas Kersting wrote: > >> Hi, >> >> please consider the following minimal reproducible example: >> >> Create a new R package which just contains the following two (exported) >> objects: > > I would not expect this behavior and I don't see it when I make such a > package (in R 4.0.3 or R-devel on Ubuntu). You will need to provide a > more complete reproducible example if you want help with what you are > trying to do; also sessionInfo() would help. > > Best, > > luke > >> >> >> crash_dumps <- new.env() >> >> f <- function() { >> x <- runif(1e5) >> dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL))) >> assign("last.dump", dump, crash_dumps) >> } >> >> >> WARNING: the following will probably eat all your RAM! >> >> Attach this package and run: >> >> for (i in 1:100) { >> print(i) >> f() >> } >> >> You will notice that with each iteration the execution of f() slows down >> significantly while the memory consumption of the R process (v4.0.5 on >> Linux) quickly explodes. >> >> I am having a hard time to understand what exactly is happening here. >> Something w.r.t. too deeply nested environments? Could someone please >> enlighten me? Thanks! >> >> Regards, >> Andreas >> >> >> Background: >> In an R package I store crash dumps on error in a parallel processes in a >> way similar to what I have just shown (hence the (un)serialize(), which >> happens as part of returning the objects to the parent process). The first 2 >> or 3 times I do so in a session everything is fine, but afterwards it takes >> very long and I soon run out of memory. >> >> Some more observations: >> - If I omit `x <- runif(1e5)`, the issues seem to be less pronounced. >> - If I assign to .GlobalEnv instead of crash_dumps, there seems to be no >> issue - probably because .GlobalEnv is not included in sys.frames(), while >> crash_dumps is indirectly via the namespace of the package being the >> parent.env of some of the sys.frames()!? >> - If I omit the lapply(...), i.e. use `dump <- >> unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no >> issue. The immediate consequence is that there are less sys.frames and - in >> particular - there is no frame which has the base namespace as its >> parent.env. >> - If I make crash_dumps a list and use assignInMyNamespace() to store the >> dump in it, there also seems to be no issue. I will probably use this as a >> workaround: >> >> crash_dumps <- list() >> >> f <- function() { >> x <- runif(1e5) >> dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL))) >> crash_dumps[["last.dump"]] <- dump >> assignInMyNamespace("crash_dumps", crash_dumps) >> } >> >> __ >> R-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> > > -- > Luke Tierney > Ralph E. Wareham Professor of Mathematical Sciences > University of Iowa Phone: 319-335-3386 > Department of Statistics andFax: 319-335-3017 >Actuarial Science > 241 Schaeffer Hall email: luke-tier...@uiowa.edu > Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu >
Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()
On Wed, 7 Apr 2021, Andreas Kersting wrote: Hi, please consider the following minimal reproducible example: Create a new R package which just contains the following two (exported) objects: I would not expect this behavior and I don't see it when I make such a package (in R 4.0.3 or R-devel on Ubuntu). You will need to provide a more complete reproducible example if you want help with what you are trying to do; also sessionInfo() would help. Best, luke crash_dumps <- new.env() f <- function() { x <- runif(1e5) dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL))) assign("last.dump", dump, crash_dumps) } WARNING: the following will probably eat all your RAM! Attach this package and run: for (i in 1:100) { print(i) f() } You will notice that with each iteration the execution of f() slows down significantly while the memory consumption of the R process (v4.0.5 on Linux) quickly explodes. I am having a hard time to understand what exactly is happening here. Something w.r.t. too deeply nested environments? Could someone please enlighten me? Thanks! Regards, Andreas Background: In an R package I store crash dumps on error in a parallel processes in a way similar to what I have just shown (hence the (un)serialize(), which happens as part of returning the objects to the parent process). The first 2 or 3 times I do so in a session everything is fine, but afterwards it takes very long and I soon run out of memory. Some more observations: - If I omit `x <- runif(1e5)`, the issues seem to be less pronounced. - If I assign to .GlobalEnv instead of crash_dumps, there seems to be no issue - probably because .GlobalEnv is not included in sys.frames(), while crash_dumps is indirectly via the namespace of the package being the parent.env of some of the sys.frames()!? - If I omit the lapply(...), i.e. use `dump <- unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no issue. The immediate consequence is that there are less sys.frames and - in particular - there is no frame which has the base namespace as its parent.env. - If I make crash_dumps a list and use assignInMyNamespace() to store the dump in it, there also seems to be no issue. I will probably use this as a workaround: crash_dumps <- list() f <- function() { x <- runif(1e5) dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL))) crash_dumps[["last.dump"]] <- dump assignInMyNamespace("crash_dumps", crash_dumps) } __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics andFax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tier...@uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel