Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()

2021-04-08 Thread luke-tierney

Looks like this is an unavoidable interaction between the way source
references and lazy loading are implemented. The link back to the
crash_dumps environment comes through source references on an
unevaluated argument promise. Creating a fresh environment in
.onLoad() avoids this and is probably your best bet.

Having an option to serialize without source references might be nice
but would probably not be high enough on anyone's priority list to get
done anytime soon.
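
As a rough illustration of what dropping source references before
serializing can look like for a single object, the sketch below uses
utils::removeSource() on one function. The function g and the size
comparison are made up for illustration; this is not an option of
serialize() itself, and it does not touch source references held
elsewhere in a set of frames.

```r
# Parse a small function with source references kept, similar in spirit to
# what installing with --with-keep.source does for package code.
g <- eval(parse(text = "function(x) {\n  x + 1\n}", keep.source = TRUE))

# The closure carries a "srcref" attribute pointing back at its source.
g_has_srcref <- !is.null(attr(g, "srcref"))

# utils::removeSource() returns a copy of the function without srcrefs.
g_plain <- utils::removeSource(g)

# Serialized size in bytes, with and without the source references.
size_with <- length(serialize(g, NULL))
size_without <- length(serialize(g_plain, NULL))
print(c(with_srcref = size_with, without_srcref = size_without))
```

The stripped copy still evaluates the same way; only the attached source
text and file information are gone.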

Best,

luke


Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()

2021-04-08 Thread luke-tierney

I see that now also. Not sure yet what is going on.

One work-around that may work for you is to create a fresh crash_dumps
environment in a .onLoad function; something like

crash_dumps <- NULL
.onLoad <- function(...) crash_dumps <<- new.env()
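
Filled out a little, this might look as follows. It is a sketch only:
f() is copied from the original report, libname and pkgname are the
conventional .onLoad() argument names, and the explicit .onLoad() call
at the end merely simulates what R does when the namespace is loaded.

```r
# Placeholder binding; the real environment is created at load time,
# as suggested above, rather than when the package is installed.
crash_dumps <- NULL

.onLoad <- function(libname, pkgname) {
  crash_dumps <<- new.env()
}

# f() as in the original report.
f <- function() {
  x <- runif(1e5)
  dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
  assign("last.dump", dump, crash_dumps)
}

# Outside a package, simulate loading the namespace and store one dump.
.onLoad("", "")
f()
print(ls(crash_dumps))
```

In a real package the last three lines would be dropped, since R calls
.onLoad() itself.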

Best,

luke


--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:   319-335-3386
Department of Statistics and        Fax:     319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:     http://www.stat.uiowa.edu
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()

2021-04-08 Thread Andreas Kersting
Hi,

For (hopefully) full reproducibility:

docker run rocker/tidyverse:4.0.5 Rscript -e \
  'devtools::install_github("akersting/dumpTest", INSTALL_opts = "--with-keep.source");
   library(dumpTest);
   for (i in 1:100) {print(i); print(system.time(f()))}'
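
When trying the same loop outside a throwaway container, a guard that
stops once a single call becomes slow may be useful. The helper below is
hypothetical (run_guarded and max_elapsed are names chosen here), and it
limits time only, not memory.

```r
# Hypothetical helper: call fn() up to n times, but stop as soon as a single
# call takes longer than max_elapsed seconds.
run_guarded <- function(fn, n = 100, max_elapsed = 2) {
  timings <- numeric(0)
  for (i in seq_len(n)) {
    elapsed <- system.time(fn())[["elapsed"]]
    timings <- c(timings, elapsed)
    if (elapsed > max_elapsed) {
      message("stopping after iteration ", i, ": ", elapsed, " s elapsed")
      break
    }
  }
  timings
}

# With the package installed: timings <- run_guarded(dumpTest::f)
# Stand-in workload so the sketch runs without the package:
timings <- run_guarded(function() Sys.sleep(0.01), n = 3)
print(length(timings))
```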

Regards,
Andreas


Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()

2021-04-07 Thread Andreas Kersting
Hi Dirk, hi Luke,

Thanks for checking!

I could narrow it down further. I have the issue only if I install 
--with-keep.source, i.e.

R CMD INSTALL --with-keep.source dumpTest

Since this is the default in RStudio when clicking "Install and Restart", I was 
always having the issue - also from base R. If I install using e.g. 
devtools::install_github() directly it is also fine for me.
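
To see which kind of installation is in use, one can look for a "srcref"
attribute on a function. The sketch below assumes that functions kept
with their source carry this attribute; keeps_source is a hypothetical
helper name, and the two parsed functions stand in for a package
installed with and without --with-keep.source.

```r
# Hypothetical helper: TRUE if a function carries source references.
keeps_source <- function(fn) !is.null(attr(fn, "srcref"))

code <- "function(x) x + 1"
with_src <- eval(parse(text = code, keep.source = TRUE))
without_src <- eval(parse(text = code, keep.source = FALSE))

print(keeps_source(with_src))
print(keeps_source(without_src))

# With the package installed, the same check would be:
#   keeps_source(dumpTest::f)
```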

Could you please confirm? Thanks!

Regards,
Andreas



Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()

2021-04-07 Thread luke-tierney

No issues here with that either. Looks like something is different on
your end.

Best,

luke


Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()

2021-04-07 Thread Dirk Eddelbuettel



No issue here.  Ubuntu 20.10, R 4.0.5 'from CRAN' i.e. Michael's PPA build
off my Debian package, hence instrumentation as in the Debian package.

edd@rob:~$ installGithub.r akersting/dumpTest
Using github PAT from envvar GITHUB_PAT
Downloading GitHub repo akersting/dumpTest@HEAD
✔  checking for file ‘/tmp/remotes3f9af733166ccd/akersting-dumpTest-3bed8e2/DESCRIPTION’ ...
─  preparing ‘dumpTest’:
✔  checking DESCRIPTION meta-information ...
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘dumpTest_0.1.0.tar.gz’
   
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
* installing *source* package ‘dumpTest’ ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
No man pages found in package  ‘dumpTest’ 
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (dumpTest)
edd@rob:~$ Rscript -e 'system.time({for (i in 1:100) dumpTest::f()})'
   user  system elapsed 
  0.481   0.019   0.500 
edd@rob:~$

(I also ran the variant you showed with the dual print statements, it just
consumes more screen real estate and ends on

[...]
[1] 97
   user  system elapsed
  0.004   0.000   0.005
[1] 98
   user  system elapsed
  0.004   0.000   0.005
[1] 99
   user  system elapsed
  0.004   0.000   0.004
[1] 100
   user  system elapsed
  0.005   0.000   0.005
edd@rob:~$ )

Dirk

-- 
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()

2021-04-07 Thread Andreas Kersting
Hi Luke,

Please see https://github.com/akersting/dumpTest for the package.

Here a session showing my issue:

> library(dumpTest)
> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] dumpTest_0.1.0

loaded via a namespace (and not attached):
[1] compiler_4.0.5
> for (i in 1:100) {
+   print(i)
+   print(system.time(f()))
+ }
[1] 1
   user  system elapsed 
  0.028   0.004   0.034 
[1] 2
   user  system elapsed 
  0.067   0.008   0.075 
[1] 3
   user  system elapsed 
  0.176   0.000   0.176 
[1] 4
   user  system elapsed 
  0.335   0.012   0.349 
[1] 5
   user  system elapsed 
  0.745   0.023   0.770 
[1] 6
   user  system elapsed 
  1.495   0.060   1.572 
[1] 7
   user  system elapsed 
  2.902   0.136   3.040 
[1] 8
   user  system elapsed 
  5.753   0.272   6.034 
[1] 9
   user  system elapsed 
 11.807   0.708  12.597 
[1] 10
^C
Timing stopped at: 6.638 0.549 7.214

I had to interrupt in iteration 10 because I was running low on RAM.
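
The growth pattern in these timings can be made explicit with a little
arithmetic. The sketch below only reuses the elapsed times printed in
the session above; the ratios suggest each call takes roughly twice as
long as the one before, which would be consistent with the amount of
serialized data doubling per call, although the timings alone do not
establish that.

```r
# Elapsed seconds for iterations 1 to 9, copied from the session above.
elapsed <- c(0.034, 0.075, 0.176, 0.349, 0.770, 1.572, 3.040, 6.034, 12.597)

# Ratio of each iteration's elapsed time to the previous one.
ratios <- elapsed[-1] / elapsed[-length(elapsed)]
print(round(ratios, 2))
print(round(mean(ratios), 2))
```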

Regards,
Andreas


Re: [Rd] [External] memory consumption of nested (un)serialize of sys.frames()

2021-04-07 Thread luke-tierney



On Wed, 7 Apr 2021, Andreas Kersting wrote:


> Hi,
>
> please consider the following minimal reproducible example:
>
> Create a new R package which just contains the following two (exported)
> objects:

I would not expect this behavior and I don't see it when I make such a
package (in R 4.0.3 or R-devel on Ubuntu).  You will need to provide a
more complete reproducible example if you want help with what you are
trying to do; also sessionInfo() would help.

Best,

luke

> crash_dumps <- new.env()
>
> f <- function() {
>   x <- runif(1e5)
>   dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
>   assign("last.dump", dump, crash_dumps)
> }
>
>
> WARNING: the following will probably eat all your RAM!
>
> Attach this package and run:
>
> for (i in 1:100) {
>   print(i)
>   f()
> }
>
> You will notice that with each iteration the execution of f() slows down
> significantly while the memory consumption of the R process (v4.0.5 on Linux)
> quickly explodes.
>
> I am having a hard time understanding what exactly is happening here.
> Something w.r.t. too deeply nested environments? Could someone please
> enlighten me? Thanks!
>
> Regards,
> Andreas
>
>
> Background:
> In an R package I store crash dumps on error in parallel processes in a way
> similar to what I have just shown (hence the (un)serialize(), which happens as
> part of returning the objects to the parent process). The first 2 or 3 times I
> do so in a session everything is fine, but afterwards it takes very long and I
> soon run out of memory.
>
> Some more observations:
> - If I omit `x <- runif(1e5)`, the issues seem to be less pronounced.
> - If I assign to .GlobalEnv instead of crash_dumps, there seems to be no
>   issue, probably because .GlobalEnv is not included in sys.frames(), while
>   crash_dumps is included indirectly, via the namespace of the package being
>   the parent.env of some of the sys.frames()!?
> - If I omit the lapply(...), i.e. use `dump <-
>   unserialize(serialize(sys.frames(), NULL))` directly, there seems to be no
>   issue. The immediate consequence is that there are fewer sys.frames and, in
>   particular, there is no frame which has the base namespace as its
>   parent.env.
> - If I make crash_dumps a list and use assignInMyNamespace() to store the
>   dump in it, there also seems to be no issue. I will probably use this as a
>   workaround:
>
> crash_dumps <- list()
>
> f <- function() {
>   x <- runif(1e5)
>   dump <- lapply(1:2, function(i) unserialize(serialize(sys.frames(), NULL)))
>   crash_dumps[["last.dump"]] <- dump
>   assignInMyNamespace("crash_dumps", crash_dumps)
> }
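
The observation above about frames whose parent.env is the base
namespace can be checked with a small sketch. The helper names
(frame_parents, with_lapply, without_lapply) are made up for
illustration; the only assumption is that lapply() is an ordinary
closure defined in base, so its evaluation frame is enclosed by the
base namespace.

```r
# Hypothetical helper: for each frame on the call stack, report the name of
# its enclosing environment ("" for unnamed environments such as another
# function's evaluation frame).
frame_parents <- function(frames) {
  vapply(as.list(frames),
         function(e) environmentName(parent.env(e)),
         character(1))
}

# Capture the frames from inside lapply(), as f() does.
with_lapply <- function() {
  lapply(1:2, function(i) {
    frames <- sys.frames()
    frame_parents(frames)
  })[[1]]
}

# Capture the frames directly, without lapply().
without_lapply <- function() {
  frames <- sys.frames()
  frame_parents(frames)
}

print(with_lapply())
print(without_lapply())
```

The lapply() variant sees two extra frames: the frame of lapply()
itself, enclosed by the base namespace, and the frame of the anonymous
function.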
