Dear all,

In today's R Core meeting, both the CRAN team and R Core agreed with Simon's suggestion below.

Let me repeat the key points:

- We will try to add an interface to R that allows for more unified control over the various ways of parallelisation. That should allow users to opt in to more than 2 cores and/or threads and/or processes. Details will follow, as this is not simple.

- As long as users do not have simple ways of controlling how demanding code is (e.g., different ways of parallelization are used, even in nested ways), CRAN will continue to protect users and enforce that packages do not use more than 2 cores by default.
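
To illustrate the second point (purely as a sketch, not an official or planned API): a package could implement "at most 2 cores unless the user opts in" by checking a package-specific option first, then a common environment limit, and only then falling back to 2. The option name "mypkg.threads" and the helper below are invented for illustration.

    ## Hypothetical sketch: default to at most 2 threads, honour an explicit opt-in.
    choose_threads <- function() {
      ## explicit opt-in via an (invented) package-specific option
      opt <- getOption("mypkg.threads")
      if (!is.null(opt)) return(as.integer(opt))
      ## respect an OpenMP-style environment limit if the user or admin set one
      env <- suppressWarnings(as.integer(Sys.getenv("OMP_THREAD_LIMIT")))
      if (!is.na(env) && env > 0) return(min(env, parallel::detectCores()))
      ## conservative default of at most 2 cores
      2L
    }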

Best,
Uwe Ligges



On 26.08.2023 02:05, Simon Urbanek wrote:


On Aug 26, 2023, at 11:01 AM, Dirk Eddelbuettel <e...@debian.org> wrote:


On 25 August 2023 at 18:45, Duncan Murdoch wrote:
| The real problem is that there are two stubborn groups opposing each
| other:  the data.table developers and the CRAN maintainers.  The former
| think users should by default dedicate their whole machine to
| data.table.  The latter think users should opt in to do that.

No, it feels more like it is CRAN versus the rest of the world.



In reality it's more people running R on their laptops vs the rest of the
world. Although people with laptops are the vast majority, they are also the
least impacted by the decision going either way. I think Jeff summed up the
core reasoning pretty well: harm is done by excessive use, not the other way
around.

That said, I think this thread is really missing the key point: there is no
central mechanism that would govern the use of CPU resources. OMP_THREAD_LIMIT
is just one of many ways, and even that is vastly insufficient for reasons
discussed (e.g., recursive use of processes). It is not CRAN's responsibility to
figure out for each package what it needs to behave sanely - it has no way of
knowing what type of parallelism is used, under which circumstances, and how to
control it. Only the package author knows that (hopefully), which is why it's
on them. So instead of complaining here, a better use of time would be to look at
what's being used in packages and come up with a unified approach to monitoring
core usage and a mechanism by which the packages could self-govern to respect
the desired limits. If there were one canonical place, it would also be easy for
users to opt in/out as they desire - and I'd be happy to help if any components
of it need to be in core R.
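
As a crude sketch of what monitoring core usage could look like (not an existing mechanism, and the helper name is invented): from within R one can compare total CPU time to elapsed time for a chunk of code; a ratio well above the permitted core count suggests excessive parallelism.

    ## Crude illustration: estimate how many cores a call effectively used
    ## by comparing total CPU time to wall-clock time.
    effective_cores <- function(expr) {
      tm <- system.time(expr)   # forces evaluation of expr while timing it
      cpu <- sum(tm[c("user.self", "sys.self", "user.child", "sys.child")],
                 na.rm = TRUE)
      cpu / tm[["elapsed"]]
    }
    ## e.g. effective_cores(Sys.sleep(1)) is close to 0, while code that keeps
    ## 8 threads busy for the whole call would return a value close to 8.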



Take but one example, and as I may have mentioned elsewhere, my day job consists of
providing software so that (to take one recent example) bioinformatics specialists can
slice huge amounts of genomics data.  When that happens on dedicated (expensive)
hardware with dozens of cores, it would be wasteful to have an unconditional default of
two threads. It would be the end of R among serious people, no more, no less. Can you
imagine how the internet headlines would go: "R defaults to two threads".


If you run on such a machine, then you or your admin certainly know how to set the desired
limits. From experience, the problem is exactly the opposite - it's far more common for
users not to know how to avoid overloading such a machine. As for internet headlines, they
will always say blatantly false things like "R is not for large data"
even though we have been using it to analyze terabytes of data per minute ...
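
For what it's worth, these are the kinds of knobs such an admin would reach for today (the values below are arbitrary, and each setting only covers its own flavour of parallelism, which is exactly the fragmentation problem):

    ## In ~/.Renviron or the shell environment:
    ##   OMP_NUM_THREADS=32
    ##   OMP_THREAD_LIMIT=32
    ## Or from within R, e.g. in Rprofile.site:
    Sys.setenv(OMP_NUM_THREADS = "32")                 # OpenMP-based code
    options(mc.cores = 32L)                            # parallel::mclapply() and friends
    if (requireNamespace("data.table", quietly = TRUE))
      data.table::setDTthreads(32L)                    # data.table's own setter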

Cheers,
Simon



And it is not just data.table: even in the long thread over in its repo we
have people chiming in who use OpenMP in their own code (as data.table does,
but OpenMP needs a different setter than the data.table thread count).

It is the CRAN servers that (rightly!) want to impose constraints for when
packages are tested.  Nobody objects to that.

But some of us wonder whether setting these defaults for all R users, all the time,
unconditionally, is really the right thing to do.  Anyway, Uwe told me he will
take it to an internal discussion, so let's hope sanity prevails.




______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel