I’m definitely sympathetic to both sides but have come around to the view of Greg, Dirk et al. It seems sensible to have a default that benefits the majority of “normal” users and to require explicit action in shared environments, not vice versa.
That is not to say that data.table could not do better with its heuristics (e.g. respecting cgroups settings, as raised by Henrik in https://github.com/Rdatatable/data.table/issues/5620), but the current default (50% of logical CPUs) seems reasonable for, dare I say, most users.

Tim

> On 26 Aug 2023, at 03:20, Greg Hunt <g...@firmansyah.com> wrote:
>
> The question should be, in how many cases is the current behaviour a
> problem? In a shared environment, sure, you have to be more careful. I'd
> say don't let the teenagers in there. The CRAN build server does need to do
> something to protect itself and I don't greatly mind the 2 thread limit, I
> implemented it by hand in my examples and didn't think about it
> afterwards. On most 8, 16 or 32 way environments, dedicated or
> semi-dedicated to a particular workload, the defaults make some level of
> sense and they are probably most of the use cases. Protecting high
> processor count environments from people who don't know what they are doing
> would seem to be a mismatch between the people and the environment, not so
> much a matter of software.
>
>> On Sat, 26 Aug 2023 at 11:49, Jeff Newmiller <jdnew...@dcn.davis.ca.us>
>> wrote:
>>
>> You have a really bizarre way of twisting what others are saying, Dirk. I
>> have seen no-one here saying 'limit R to 2 threads' except for you, as a
>> way to paint opposing views to be absurd.
>>
>> What _is_ being said is that _users need to be in control_, but _the
>> default needs to do least harm_ until those users take responsibility for
>> that control. Do not turn the throttle up until the user is prepared for
>> the consequences. Trying to subvert that responsibility into packages by
>> default is going to make more trouble than giving the people using those
>> packages simple examples of how to take that control.
>>
>> A similar problem happens when users discover .Rprofile and insert all
>> those pesky library statements into it, making their scripts
>> irreproducible. If data.table made a warp10() function that activated this
>> current default performance setting then the user would be clearly at fault
>> for using it in an inappropriate environment like a shared HPC or the CRAN
>> servers. Don't put a brick on the accelerator of a teenager's car before
>> they even figure out where the brakes are.
>>
>>> On August 25, 2023 6:17:04 PM PDT, Dirk Eddelbuettel <e...@debian.org>
>>> wrote:
>>>
>>>> On 26 August 2023 at 12:05, Simon Urbanek wrote:
>>> | In reality it's more people running R on their laptops vs the rest of the world.
>>>
>>> My point was that we also have 'single user on really Yuge workstation'.
>>>
>>> Plus we all know that those users are often not sysadmins, and do not have
>>> our levels of accumulated systems knowledge.
>>>
>>> So we should give _more_ power by default, not less.
>>>
>>> | [...] they will always be saying blatantly false things like "R is not for large data"
>>>
>>> By limiting R (and/or packages) to two threads we will only get more of
>>> these. Our collective call.
>>>
>>> This whole thread is pretty sad, actually.
>>>
>>> Dirk
>>>
>>
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>> ______________________________________________
>> R-package-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>>
>
> [[alternative HTML version deleted]]