Folks

I have come across an issue with gc() hogging the processor according to Rprof.

Platform: Ubuntu 20.04, fully up to date
R version: 4.3.1
Libraries: survival, MASS, gtools and openxlsx.

With the default gc.auto setting, the profiler reports the garbage collector at self.pct 99.39%.
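For reference, the profiling was run along these lines (the output file and script names are placeholders):

Rprof("profile.out")
source("myprogram.R")                 # the program in question
Rprof(NULL)
summaryRprof("profile.out")$by.self   # self.time / self.pct per function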

So I tried switching it off with options(gc.auto=Inf) in the R session before running my program via source().

This only lowered self.pct to 99.36%. Not much there.

After some pondering, I added options(gc.auto=Inf) at the beginning of each function (not resetting it on exit), expecting the offending function(s) to plead guilty.
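Concretely, the pattern in each function was as follows (the function name and body here are placeholders):

fit.one <- function(x) {
    options(gc.auto = Inf)   # set on entry, deliberately not reset on exit
    sum(x)                   # stand-in for the real work
}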

None did, although this did lower the gc() time to 95.84%.

This was on a 16-core Threadripper 1950X box, on which I was intending to use the parallel library. I also tried it on my lowly, years-old Windows box and got it down to 88.07%.
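(For the 16 cores I had something like the following in mind; inputs and run.one.case are placeholders for my actual data and worker function.)

library(parallel)
results <- mclapply(inputs, run.one.case, mc.cores = 16)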

The only thing I can think of is that there are quite a lot of cases where a function is generated on the fly, as in:

eval(parse(text = paste0("dprob <- function(x, l, s) {",
                         dist.functions[2, ][dist.functions[1, ] == distn],
                         "(x, l, s)}")))

I haven't added the options() call to any of these generated functions.
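(As an aside, these could probably be built without parse() at all - something like the following, assuming dist.functions[2, ] holds bare function names as strings - though I haven't tested whether it changes the gc() picture.)

fname <- dist.functions[2, ][dist.functions[1, ] == distn]
f <- match.fun(fname)                  # resolve the name to the function object
dprob <- function(x, l, s) f(x, l, s)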

The highest self.pct for any of my own functions is 0.05% - the rest is dominated by gc().

There may not be much point in parallelising the code until I can reduce the garbage collection.

I am not short of memory and would like to disable garbage collection fully, but despite adding the option to all routines, I haven't managed to do so yet.

Can anyone advise me?

And why is the Linux version so much worse than Windows?

TIA

--
John Logsdon
Quantex Research Ltd
m:+447717758675/h:+441614454951

