Re: [R] Issue with gc() on Ubuntu 20.04
On 27-08-2023 21:02, Ivan Krylov wrote:
> On Sun, 27 Aug 2023 19:54:23 +0100 John Logsdon wrote:
>
>> Not so although it did lower the gc() time to 95.84%.
>>
>> This was on a 16 core Threadripper 1950X box so I was intending to use library parallel but I tried it on my lowly windows box that is years old and got it down to 88.07%.
>
> Does the Windows box have the same version of R on it?

Yes, they are both 4.3.1

>> The only thing I can think of is that there are quite a lot of cases where a function is generated on the fly as in:
>>
>> eval(parse(t=paste("dprob <- function(x,l,s){",dist.functions[2,][dist.functions[1,]==distn],"(x,l,s)}",sep="")))
>
> This isn't very idiomatic. If you need dprob to call the function named in dist.functions[2,][dist.functions[1,]==distn], wouldn't it be easier for R to assign that function straight to dprob?
>
> dprob <- get(dist.functions[2,][dist.functions[1,]==distn])
>
> This way, you avoid the need to parse the code, which is typically not the fastest part of a programming language.
>
> (Generally in R and other programming languages with recursive data structures, storing variable names in other variables is not very efficient. Why not put functions directly into a list?)

Agreed but this statement and other similar ones are only assigned once in an outer loop.

> Rprof() samples the whole call stack. Can you find out which functions result in a call to gc()? I haven't experimented with a wide sample of R code, but I don't usually encounter gc() as a major entry in my Rprof() outputs.

From the first table, removing all the system functions, it suggests that the function do.combx() is mainly guilty. I have recoded that and gc() no longer appears - as it shouldn't with it switched off!

One difference was that the new code used the built in combn function while the old code used gtools::combinations. I need gtools::permutations elsewhere but that is not time critical.

Thanks Ivan for making me think!
-- John Logsdon Quantex Research Ltd m:+447717758675/h:+441614454951 __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
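The switch to the built-in combn function mentioned above can be illustrated with a short sketch. do.combx() itself is not shown in the thread, so the vector v, the subset size r and the variable names below are hypothetical stand-ins and not the poster's code.

```r
# Hypothetical stand-in for the recoded step: enumerate all size-r
# subsets of a vector with base R's utils::combn().
v <- c("a", "b", "c", "d")
r <- 2

# combn() returns one combination per COLUMN.
combos <- combn(v, r)

# Transposing gives one combination per ROW, which is the layout
# that gtools::combinations() returns.
combos_by_row <- t(combos)

# combn() can also apply a function to each combination as it is
# generated, so only the per-combination results are kept.
sums <- combn(1:5, 3, FUN = sum)
```

Whether a change like this accounts for the difference in gc() time is not established in the thread; the sketch only shows the two result layouts side by side.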
Re: [R] Issue with gc() on Ubuntu 20.04
On Sun, 27 Aug 2023 19:54:23 +0100 John Logsdon wrote:

> Not so although it did lower the gc() time to 95.84%.
>
> This was on a 16 core Threadripper 1950X box so I was intending to
> use library parallel but I tried it on my lowly windows box that is
> years old and got it down to 88.07%.

Does the Windows box have the same version of R on it?

> The only thing I can think of is that there are quite a lot of cases
> where a function is generated on the fly as in:
>
> eval(parse(t=paste("dprob <- function(x,l,s){",dist.functions[2,][dist.functions[1,]==distn],"(x,l,s)}",sep="")))

This isn't very idiomatic. If you need dprob to call the function named in dist.functions[2,][dist.functions[1,]==distn], wouldn't it be easier for R to assign that function straight to dprob?

dprob <- get(dist.functions[2,][dist.functions[1,]==distn])

This way, you avoid the need to parse the code, which is typically not the fastest part of a programming language.

(Generally in R and other programming languages with recursive data structures, storing variable names in other variables is not very efficient. Why not put functions directly into a list?)

Rprof() samples the whole call stack. Can you find out which functions result in a call to gc()? I haven't experimented with a wide sample of R code, but I don't usually encounter gc() as a major entry in my Rprof() outputs.

--
Best regards,
Ivan
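The suggestion to put functions directly into a list could look roughly like the sketch below. The contents of dist.functions are not shown in the thread, so the list name dens, the distribution names and the argument mappings are assumptions made purely for illustration.

```r
# Hypothetical lookup table: distribution name -> density function.
# The names and argument mappings are illustrative only; the original
# dist.functions matrix is not shown in the thread.
dens <- list(
  weibull = function(x, l, s) dweibull(x, shape = s, scale = l),
  lnorm   = function(x, l, s) dlnorm(x, meanlog = l, sdlog = s)
)

distn <- "weibull"

# Option 1: take the function object straight out of the list.
dprob <- dens[[distn]]

# Option 2: if only the function's name is stored as a string,
# match.fun() (or get()) resolves it without parsing any code.
dname  <- "dnorm"
dprob2 <- match.fun(dname)
```

Either way dprob is an ordinary function object, so a call such as dprob(x, l, s) works as before and no eval(parse(...)) step is needed.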
[R] Issue with gc() on Ubuntu 20.04
Folks

I have come across an issue with gc() hogging the processor according to Rprof.

Platform is Ubuntu 20.04, all up to date
R version 4.3.1
libraries: survival, MASS, gtools and openxlsx.

With default gc.auto options, the profiler notes the garbage collector as self.pct 99.39%.

So I have tried switching it off using options(gc.auto=Inf) in the R session before running my program using source(). This lowered self.pct to 99.36%. Not much there.

After some pondering, I added an options(gc.auto=Inf) at the beginning of each function, not resetting it at exit, but expecting the offending function(s) to plead guilty.

Not so although it did lower the gc() time to 95.84%.

This was on a 16 core Threadripper 1950X box so I was intending to use library parallel but I tried it on my lowly Windows box that is years old and got it down to 88.07%.

The only thing I can think of is that there are quite a lot of cases where a function is generated on the fly as in:

eval(parse(t=paste("dprob <- function(x,l,s){",dist.functions[2,][dist.functions[1,]==distn],"(x,l,s)}",sep="")))

I haven't added the options to any of these.

The highest time used by any of my functions is 0.05% - the rest is dominated by gc().

There may not be much point in parallelising the code until I can reduce the garbage collection. I am not short of memory and would like to disable it fully but despite adding to all routines, I haven't managed to do this yet.

Can anyone advise me? And why is the Linux version so much worse than Windows?

TIA

--
John Logsdon
Quantex Research Ltd
m:+447717758675/h:+441614454951
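One way to cross-check a profiler figure like the one above is to ask R directly how long the collector has been running, using base R's gc.time(). This is a minimal sketch only: workload() is a made-up function standing in for the real program, which is not shown in the thread.

```r
# Hypothetical workload standing in for the real program: it allocates
# many short-lived objects, which is what triggers garbage collection.
workload <- function(n = 2000) {
  out <- vector("list", n)
  for (i in seq_len(n)) {
    out[[i]] <- cumsum(runif(1000))
  }
  invisible(out)
}

gc.time(on = TRUE)        # make sure GC timing is being recorded
gc_before <- gc.time()    # cumulative GC time so far in this session
elapsed   <- system.time(workload())
gc_after  <- gc.time()

# Seconds attributed to the collector while workload() ran; the first
# three elements are user, system and elapsed time.
gc_spent <- gc_after - gc_before
total    <- elapsed[["elapsed"]]
```

Comparing gc_spent[3] with total gives a rough share of run time spent in garbage collection that does not depend on how Rprof attributes its samples.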