Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time
Dear all,

in today's R Core meeting both the CRAN team and R Core agree with Simon's suggestion below. Let me repeat the key points:

- We will try to add an interface to R that allows more unified control over the various ways of parallelisation. That should allow users to opt in to more than 2 cores and/or threads and/or processes. Details will follow, as this is not simple.
- As long as users do not have simple ways of controlling how demanding code is (e.g., different ways of parallelisation are used, even in nested ways), CRAN will continue to protect users and enforce that packages do not use more than 2 cores by default.

Best,
Uwe Ligges

On 26.08.2023 02:05, Simon Urbanek wrote:
> On Aug 26, 2023, at 11:01 AM, Dirk Eddelbuettel wrote:
> > On 25 August 2023 at 18:45, Duncan Murdoch wrote:
> > | The real problem is that there are two stubborn groups opposing each
> > | other: the data.table developers and the CRAN maintainers. The former
> > | think users should by default dedicate their whole machine to
> > | data.table. The latter think users should opt in to do that.
> >
> > No, it feels more like it is CRAN versus the rest of the world.
>
> In reality it's more people running R on their laptops vs the rest of the world. Although people with laptops are the vast majority, they also are the least impacted by the decision going either way. I think Jeff summed up the core reasoning pretty well. Harm is done by excessive use, not the other way around.
>
> That said, I think this thread is really missing the key point: there is no central mechanism that would govern the use of CPU resources. OMP_THREAD_LIMIT is just one of many ways and even that is vastly insufficient for reasons discussed (e.g., recursive use of processes). It is not CRAN's responsibility to figure out for each package what it needs to behave sanely - it has no way of knowing what type of parallelism is used, under which circumstances and how to control it. Only the package author knows that (hopefully), which is why it's on them. So instead of complaining here, a better use of time would be to look at what's being used in packages and come up with a unified approach to monitoring core usage and a mechanism by which the packages could self-govern to respect the desired limits. If there was one canonical place, it would also be easy for users to opt in/out as they desire - and I'd be happy to help if any components of it need to be in core R.
>
> > Take but one example, and as I may have mentioned elsewhere, my day job consists in providing software so that (to take one recent example) bioinformatics specialists can slice huge amounts of genomics data. When that happens on dedicated (expensive) hardware with dozens of cores, it would be wasteful to have an unconditional default of two threads. It would be the end of R among serious people, no more, no less. Can you imagine how the internet headlines would go: "R defaults to two threads".
>
> If you run on such a machine then you or your admin certainly know how to set the desired limits. From experience the problem is exactly the opposite - it's far more common for users to not know how to not overload such a machine. As for internet headlines, they will always be saying blatantly false things like "R is not for large data" even though we have been using it to analyze terabytes of data per minute ...
>
> Cheers,
> Simon
>
> > And it is not just data.table as even in the long thread over in its repo we have people chiming in using OpenMP in their code (as data.table does but which needs a different setter than the data.table thread count).
> >
> > It is the CRAN servers which (rightly !!) want to impose constraints for when packages are tested. Nobody objects to that.
> >
> > But some of us wonder if setting these defaults for all R users, all the time, unconditionally is really the right thing to do. Anyway, Uwe told me he will take it to an internal discussion, so let's hope sanity prevails.
__ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time
Tim,

I think that things like data.table have a different set of problems depending on the environment. Working out the right degree of parallelism for an IO workload is a hard question that depends on the characteristics of the IO subsystem, the characteristics of the dataset and on what problem you really have (really, how much it's worth spending to achieve an optimal answer). It would be interesting to see how well data.table would do with several tens of threads on several tens of processors reading a file. I suspect it might not be pretty (coordination overheads could be large relative to the actual gains from IO parallelism), but it's not a subject I've looked at. It would not surprise me if the right answer was to cap the number of threads, but that cap would probably still be higher than the usual number of processors in the average physical or virtual box.

This stuff is not easy and it's saturated with "it depends" answers. The underlying problem here is that to get optimal or optimal-enough behaviour, a 96-way or more box will require different configuration of the software to an 8 or 16-way VM.

Greg

On Sat, 26 Aug 2023 at 18:15, Tim Taylor wrote:
> I’m definitely sympathetic to both sides but have come around to the view of Greg, Dirk et al. It seems sensible to have a default that benefits the majority of “normal” users and require explicit action in shared environments, not vice versa.
>
> That is not to say that data.table could not do better with its heuristics (e.g. respecting CGroups settings as raised by Henrik in https://github.com/Rdatatable/data.table/issues/5620) but the current defaults (50%) seem reasonable for, dare I say, most users.
>
> Tim
Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time
I’m definitely sympathetic to both sides but have come around to the view of Greg, Dirk et al. It seems sensible to have a default that benefits the majority of “normal” users and require explicit action in shared environments, not vice versa.

That is not to say that data.table could not do better with its heuristics (e.g. respecting CGroups settings as raised by Henrik in https://github.com/Rdatatable/data.table/issues/5620) but the current defaults (50%) seem reasonable for, dare I say, most users.

Tim

> On 26 Aug 2023, at 03:20, Greg Hunt wrote:
>
> The question should be, in how many cases is the current behaviour a problem? In a shared environment, sure, you have to be more careful. I'd say don't let the teenagers in there. The CRAN build server does need to do something to protect itself and I don't greatly mind the 2 thread limit; I implemented it by hand in my examples and didn't think about it afterwards. On most 8, 16 or 32 way environments, dedicated or semi-dedicated to a particular workload, the defaults make some level of sense and they are probably most of the use cases. Protecting high processor count environments from people who don't know what they are doing would seem to be a mismatch between the people and the environment, not so much a matter of software.
Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time
The question should be, in how many cases is the current behaviour a problem? In a shared environment, sure, you have to be more careful. I'd say don't let the teenagers in there. The CRAN build server does need to do something to protect itself and I don't greatly mind the 2 thread limit; I implemented it by hand in my examples and didn't think about it afterwards. On most 8, 16 or 32 way environments, dedicated or semi-dedicated to a particular workload, the defaults make some level of sense and they are probably most of the use cases. Protecting high processor count environments from people who don't know what they are doing would seem to be a mismatch between the people and the environment, not so much a matter of software.

On Sat, 26 Aug 2023 at 11:49, Jeff Newmiller wrote:
> You have a really bizarre way of twisting what others are saying, Dirk. I have seen no-one here saying 'limit R to 2 threads' except for you, as a way to paint opposing views to be absurd.
>
> What _is_ being said is that _users need to be in control_, but _the default needs to do least harm_ until those users take responsibility for that control. Do not turn the throttle up until the user is prepared for the consequences. Trying to subvert that responsibility into packages by default is going to make more trouble than giving the people using those packages simple examples of how to take that control.
>
> A similar problem happens when users discover .Rprofile and insert all those pesky library statements into it, making their scripts irreproducible. If data.table made a warp10() function that activated this current default performance setting then the user would be clearly at fault for using it in an inappropriate environment like a shared HPC or the CRAN servers. Don't put a brick on the accelerator of a teenager's car before they even figure out where the brakes are.
Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time
On 25 August 2023 at 18:48, Jeff Newmiller wrote:
| You have a really bizarre way of twisting what others are saying, Dirk. I have seen no-one here saying 'limit R to 2 threads' except for you, as a way to paint opposing views to be absurd.

That's too cute. Nobody needs to repeat it, and some of us know that "it is law" as the "CRAN Repository Policy" (which each package upload promises to adhere to) says

   If running a package uses multiple threads/cores it must never use more
   than two simultaneously: the check farm is a shared resource and will
   typically be running many checks simultaneously.

You may find reading the document informative. The source reference (mirrored for convenience at GH) of that line is

   https://github.com/r-devel/r-dev-web/blob/master/CRAN/Policy/CRAN_policies.texi#L244-L246

and the rendered page is at

   https://cran.r-project.org/web/packages/policies.html

Dirk

--
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
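For readers wondering what "implementing the two-thread limit by hand" might look like in examples, tests or vignettes, here is a minimal sketch. It is not taken from any message in this thread: the helper name `with_thread_cap()` is hypothetical, and it assumes that capping `OMP_THREAD_LIMIT` plus data.table's own thread count is enough for the package in question. OpenMP runtimes usually read `OMP_THREAD_LIMIT` when they start up, so setting it inside an already running session may come too late for code that has initialised OpenMP before; treat that part as best effort.

```r
# Hypothetical helper (not part of R or data.table): evaluate `expr`
# with thread use capped, then restore the previous settings.
with_thread_cap <- function(expr, threads = 2L) {
  # Remember the current OMP_THREAD_LIMIT (NA if it is not set).
  old_limit <- Sys.getenv("OMP_THREAD_LIMIT", unset = NA_character_)
  Sys.setenv(OMP_THREAD_LIMIT = as.character(threads))
  on.exit({
    if (is.na(old_limit)) {
      Sys.unsetenv("OMP_THREAD_LIMIT")
    } else {
      Sys.setenv(OMP_THREAD_LIMIT = old_limit)
    }
  }, add = TRUE)

  # data.table keeps its own thread count; cap it only if it is installed.
  if (requireNamespace("data.table", quietly = TRUE)) {
    old_dt <- data.table::setDTthreads(threads)  # returns the previous value
    on.exit(data.table::setDTthreads(old_dt), add = TRUE)
  }

  # `expr` is a promise, so it is only evaluated here, under the cap.
  force(expr)
}

# Usage: run a chunk of example code with at most two threads.
result <- with_thread_cap(sum(sqrt(1:10)))
```

Other sources of parallelism (forked workers, BLAS threads, nested process use) need their own controls, which is exactly the gap discussed in this thread.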
Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time
You have a really bizarre way of twisting what others are saying, Dirk. I have seen no-one here saying 'limit R to 2 threads' except for you, as a way to paint opposing views to be absurd.

What _is_ being said is that _users need to be in control_, but _the default needs to do least harm_ until those users take responsibility for that control. Do not turn the throttle up until the user is prepared for the consequences. Trying to subvert that responsibility into packages by default is going to make more trouble than giving the people using those packages simple examples of how to take that control.

A similar problem happens when users discover .Rprofile and insert all those pesky library statements into it, making their scripts irreproducible. If data.table made a warp10() function that activated this current default performance setting then the user would be clearly at fault for using it in an inappropriate environment like a shared HPC or the CRAN servers. Don't put a brick on the accelerator of a teenager's car before they even figure out where the brakes are.

On August 25, 2023 6:17:04 PM PDT, Dirk Eddelbuettel wrote:
>
>On 26 August 2023 at 12:05, Simon Urbanek wrote:
>| In reality it's more people running R on their laptops vs the rest of the world.
>
>My point was that we also have 'single user on really Yuge workstation'.
>
>Plus we all know that those users are often not sysadmins, and do not have our levels of accumulated systems knowledge.
>
>So we should give _more_ power by default, not less.
>
>| [...] they will always be saying blatantly false things like "R is not for large data"
>
>By limiting R (and/or packages) to two threads we will only get more of these. Our collective call.
>
>This whole thread is pretty sad, actually.
>
>Dirk

--
Sent from my phone. Please excuse my brevity.
Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time
On 26 August 2023 at 12:05, Simon Urbanek wrote: | In reality it's more people running R on their laptops vs the rest of the world. My point was that we also have 'single user on really Yuge workstation'. Plus we all know that those users are often not sysadmins, and do not have our levels of accumulated systems knowledge. So we should give _more_ power by default, not less. | [...] they will always be saying blatantly false things like "R is not for large data" By limiting R (and/or packages) to two threads we will only get more of these. Our collective call. This whole thread is pretty sad, actually. Dirk -- dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time
> On Aug 26, 2023, at 11:01 AM, Dirk Eddelbuettel wrote:
>
> On 25 August 2023 at 18:45, Duncan Murdoch wrote:
> | The real problem is that there are two stubborn groups opposing each
> | other: the data.table developers and the CRAN maintainers. The former
> | think users should by default dedicate their whole machine to
> | data.table. The latter think users should opt in to do that.
>
> No, it feels more like it is CRAN versus the rest of the world.

In reality it's more people running R on their laptops vs the rest of the world. Although people with laptops are the vast majority, they also are the least impacted by the decision going either way. I think Jeff summed up the core reasoning pretty well. Harm is done by excessive use, not the other way around.

That said, I think this thread is really missing the key point: there is no central mechanism that would govern the use of CPU resources. OMP_THREAD_LIMIT is just one of many ways and even that is vastly insufficient for reasons discussed (e.g., recursive use of processes). It is not CRAN's responsibility to figure out for each package what it needs to behave sanely - it has no way of knowing what type of parallelism is used, under which circumstances and how to control it. Only the package author knows that (hopefully), which is why it's on them. So instead of complaining here, a better use of time would be to look at what's being used in packages and come up with a unified approach to monitoring core usage and a mechanism by which the packages could self-govern to respect the desired limits. If there was one canonical place, it would also be easy for users to opt in/out as they desire - and I'd be happy to help if any components of it need to be in core R.

> Take but one example, and as I may have mentioned elsewhere, my day job consists in providing software so that (to take one recent example) bioinformatics specialists can slice huge amounts of genomics data. When that happens on dedicated (expensive) hardware with dozens of cores, it would be wasteful to have an unconditional default of two threads. It would be the end of R among serious people, no more, no less. Can you imagine how the internet headlines would go: "R defaults to two threads".

If you run on such a machine then you or your admin certainly know how to set the desired limits. From experience the problem is exactly the opposite - it's far more common for users to not know how to not overload such a machine. As for internet headlines, they will always be saying blatantly false things like "R is not for large data" even though we have been using it to analyze terabytes of data per minute ...

Cheers,
Simon

> And it is not just data.table as even in the long thread over in its repo we have people chiming in using OpenMP in their code (as data.table does but which needs a different setter than the data.table thread count).
>
> It is the CRAN servers which (rightly !!) want to impose constraints for when packages are tested. Nobody objects to that.
>
> But some of us wonder if setting these defaults for all R users, all the time, unconditionally is really the right thing to do. Anyway, Uwe told me he will take it to an internal discussion, so let's hope sanity prevails.
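To make the idea of "one canonical place" a little more concrete, the following is a rough sketch of what a shared core budget could look like from a package's point of view. Nothing like this exists in R at the time of this thread: the function name `core_budget()`, the option `sketch.max_cores` and the environment variable `SKETCH_MAX_CORES` are all invented for illustration, and the default of 2 simply mirrors the limit discussed here.

```r
# Hypothetical sketch: a single lookup that packages could consult
# before starting threads or worker processes.
core_budget <- function(default = 2L) {
  # 1. An explicit R option wins (the user opting in within the session).
  value <- getOption("sketch.max_cores", default = NULL)

  # 2. Otherwise fall back to an environment variable (site/admin setting).
  if (is.null(value)) {
    env <- Sys.getenv("SKETCH_MAX_CORES", unset = "")
    if (nzchar(env)) value <- env
  }

  # 3. Missing or malformed values fall back to the conservative default.
  n <- suppressWarnings(as.integer(value))
  if (length(n) != 1L || is.na(n) || n < 1L) n <- as.integer(default)
  n
}

# A package would then size its own parallelism from the shared budget:
n_threads <- core_budget()
```

A user on a dedicated many-core machine would opt in once, for example with `options(sketch.max_cores = 32L)`, while a check farm would leave the default in place. A lookup like this does not by itself address nested parallelism (processes that each start threads), which is one reason the real design is described in this thread as not simple.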
Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time
On 25 August 2023 at 18:45, Duncan Murdoch wrote:
| The real problem is that there are two stubborn groups opposing each
| other: the data.table developers and the CRAN maintainers. The former
| think users should by default dedicate their whole machine to
| data.table. The latter think users should opt in to do that.

No, it feels more like it is CRAN versus the rest of the world.

Take but one example, and as I may have mentioned elsewhere, my day job consists in providing software so that (to take one recent example) bioinformatics specialists can slice huge amounts of genomics data. When that happens on dedicated (expensive) hardware with dozens of cores, it would be wasteful to have an unconditional default of two threads. It would be the end of R among serious people, no more, no less. Can you imagine how the internet headlines would go: "R defaults to two threads".

And it is not just data.table as even in the long thread over in its repo we have people chiming in using OpenMP in their code (as data.table does but which needs a different setter than the data.table thread count).

It is the CRAN servers which (rightly !!) want to impose constraints for when packages are tested. Nobody objects to that.

But some of us wonder if setting these defaults for all R users, all the time, unconditionally is really the right thing to do. Anyway, Uwe told me he will take it to an internal discussion, so let's hope sanity prevails.

Dirk

--
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time
To be fair, data.table defaults to using 1/2 the available cores; they do not take the entire machine by default.

Avi

Sent from my iPhone

> On Aug 25, 2023, at 6:46 PM, Duncan Murdoch wrote:
>
> On 25/08/2023 6:13 p.m., Toby Hocking wrote:
>> Thanks Dirk. I agree.
>> data.table is not in a situation to update very soon, so the easiest solution for the R community would be for CRAN to set OMP_THREAD_LIMIT to 2 on the Windows and Debian machines doing this test.
>> Otherwise the 1400+ packages with hard dependencies on data.table will each have to implement custom logic to limit threads to 2.
>
> That doesn't follow. data.table could update soon even if that wasn't their intention: just include bug fixes and set the default OMP_THREAD_LIMIT to 2 in data.table.
>
> The real problem is that there are two stubborn groups opposing each other: the data.table developers and the CRAN maintainers. The former think users should by default dedicate their whole machine to data.table. The latter think users should opt in to do that.
>
> Duncan Murdoch
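The default mentioned above can be checked on one's own machine. The sketch below is only an illustration: the function name `dt_thread_report()` is made up, it assumes data.table is installed (it returns NULL otherwise), and the numbers it reports depend on the machine and on any limits such as `OMP_THREAD_LIMIT` already in force.

```r
# Hypothetical helper: report how many threads data.table currently uses,
# and optionally what an explicit opt-in to all logical CPUs would give.
dt_thread_report <- function(opt_in = FALSE) {
  if (!requireNamespace("data.table", quietly = TRUE)) return(NULL)

  current <- data.table::getDTthreads()
  opted_in <- current

  if (opt_in) {
    # Explicit opt-in: ask for 100% of logical CPUs, then restore.
    old <- data.table::setDTthreads(percent = 100)
    opted_in <- data.table::getDTthreads()
    data.table::setDTthreads(old)
  }

  list(
    logical_cpus     = parallel::detectCores(),  # may be NA on some systems
    current_threads  = current,
    opted_in_threads = opted_in
  )
}

report <- dt_thread_report(opt_in = TRUE)
```

Comparing `current_threads` with `logical_cpus` shows what the package's default amounts to on a given box, and `opted_in_threads` shows what a user gets after deliberately asking for more.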
Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time
I've been lurking on this discussion and have a question. What does data.table do to pass CRAN tests? If this is a problem for packages that use data.table, then it certainly is a problem for data.table itself.

On Fri, Aug 25, 2023 at 3:46 PM Duncan Murdoch wrote:
> On 25/08/2023 6:13 p.m., Toby Hocking wrote:
> > Thanks Dirk. I agree.
> > data.table is not in a situation to update very soon, so the easiest solution for the R community would be for CRAN to set OMP_THREAD_LIMIT to 2 on the Windows and Debian machines doing this test.
> > Otherwise the 1400+ packages with hard dependencies on data.table will each have to implement custom logic to limit threads to 2.
>
> That doesn't follow. data.table could update soon even if that wasn't their intention: just include bug fixes and set the default OMP_THREAD_LIMIT to 2 in data.table.
>
> The real problem is that there are two stubborn groups opposing each other: the data.table developers and the CRAN maintainers. The former think users should by default dedicate their whole machine to data.table. The latter think users should opt in to do that.
>
> Duncan Murdoch
Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time
On 25/08/2023 6:13 p.m., Toby Hocking wrote:
> Thanks Dirk. I agree. data.table is not in a situation to update very soon, so the easiest solution for the R community would be for CRAN to set OMP_THREAD_LIMIT to 2 on the Windows and Debian machines doing this test. Otherwise the 1400+ packages with hard dependencies on data.table will each have to implement custom logic to limit threads to 2.

That doesn't follow. data.table could update soon even if that wasn't their intention: just include bug fixes and set the default OMP_THREAD_LIMIT to 2 in data.table.

The real problem is that there are two stubborn groups opposing each other: the data.table developers and the CRAN maintainers. The former think users should by default dedicate their whole machine to data.table. The latter think users should opt in to do that.

Duncan Murdoch

> Toby
>
> On Fri, Aug 25, 2023 at 6:46 AM Dirk Eddelbuettel wrote:
>> On 24 August 2023 at 07:42, Fred Viole wrote:
>> | Hi, I am receiving a NOTE upon submission regarding the re-building of vignettes for CPU time for the Debian check.
>> |
>> | I am unable to find any documented instances or solutions to this issue. The vignettes currently build in 1m 54.3s locally and in 56s on the Win check.
>> |
>> | https://win-builder.r-project.org/incoming_pretest/NNS_10.1_20230824_132459/Debian/00check.log
>>
>> Please see, inter alia, the long-running thread
>>
>>   "Trouble with long-running tests on CRAN debian server"
>>
>> started earlier this week (!!) on this list covering exactly this issue.
>>
>> We can only hope CRAN comes to understand our point that _it_ should set a clearly identifiable variable (the OpenMP thread count would do) so that package data.table can use this for its several hundred users.
>>
>> As things currently stand, CRAN expects several hundred packages (such as yours; I am guessing here that this comes from data.table, which I do not know for sure, but you do import it) to make the change, which is pretty close to the textbook definition of madness.
>>
>> Also see https://github.com/Rdatatable/data.table/issues/5658 with by now 24 comments. It is on the same issue.
>>
>> Uwe, Kurt: Please please please set OMP_THREAD_LIMIT to 2 on the Windows and Debian machines doing this test.
>>
>> Dirk
>>
>> --
>> dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time
Thanks Dirk. I agree. data.table is not in a situation to update very soon, so the easiest solution for the R community would be for CRAN to set OMP_THREAD_LIMIT to 2 on the Windows and Debian machines doing this test. Otherwise the 1400+ packages with hard dependencies on data.table will each have to implement custom logic to limit threads to 2.

Toby

On Fri, Aug 25, 2023 at 6:46 AM Dirk Eddelbuettel wrote:
> On 24 August 2023 at 07:42, Fred Viole wrote:
> | Hi, I am receiving a NOTE upon submission regarding the re-building of vignettes for CPU time for the Debian check.
> |
> | I am unable to find any documented instances or solutions to this issue. The vignettes currently build in 1m 54.3s locally and in 56s on the Win check.
> |
> | https://win-builder.r-project.org/incoming_pretest/NNS_10.1_20230824_132459/Debian/00check.log
>
> Please see, inter alia, the long-running thread
>
>   "Trouble with long-running tests on CRAN debian server"
>
> started earlier this week (!!) on this list covering exactly this issue.
>
> We can only hope CRAN comes to understand our point that _it_ should set a clearly identifiable variable (the OpenMP thread count would do) so that package data.table can use this for its several hundred users.
>
> As things currently stand, CRAN expects several hundred packages (such as yours; I am guessing here that this comes from data.table, which I do not know for sure, but you do import it) to make the change, which is pretty close to the textbook definition of madness.
>
> Also see https://github.com/Rdatatable/data.table/issues/5658 with by now 24 comments. It is on the same issue.
>
> Uwe, Kurt: Please please please set OMP_THREAD_LIMIT to 2 on the Windows and Debian machines doing this test.
>
> Dirk
>
> --
> dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time
On 24 August 2023 at 07:42, Fred Viole wrote:
| Hi, I am receiving a NOTE upon submission regarding the re-building of vignettes for CPU time for the Debian check.
|
| I am unable to find any documented instances or solutions to this issue. The vignettes currently build in 1m 54.3s locally and in 56s on the Win check.
|
| https://win-builder.r-project.org/incoming_pretest/NNS_10.1_20230824_132459/Debian/00check.log

Please see, inter alia, the long-running thread

  "Trouble with long-running tests on CRAN debian server"

started earlier this week (!!) on this list covering exactly this issue.

We can only hope CRAN comes to understand our point that _it_ should set a clearly identifiable variable (the OpenMP thread count would do) so that package data.table can use this for its several hundred users.

As things currently stand, CRAN expects several hundred packages (such as yours; I am guessing here that this comes from data.table, which I do not know for sure, but you do import it) to make the change, which is pretty close to the textbook definition of madness.

Also see https://github.com/Rdatatable/data.table/issues/5658 with by now 24 comments. It is on the same issue.

Uwe, Kurt: Please please please set OMP_THREAD_LIMIT to 2 on the Windows and Debian machines doing this test.

Dirk

--
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
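The cap requested above can be imitated locally by exporting the variable before running the check. This is only a minimal sketch, assuming a POSIX shell. OMP_THREAD_LIMIT constrains OpenMP code only, so other kinds of parallelism (forked worker processes, for example) are not covered by it. The tarball name in the commented-out command is an assumption derived from the log URL and may differ for your build.

```shell
# Cap OpenMP at two threads for this shell and every process started from it.
export OMP_THREAD_LIMIT=2

# Child processes inherit exported variables; read it back from a child shell to confirm.
child_value=$(sh -c 'printf "%s" "$OMP_THREAD_LIMIT"')
echo "OMP_THREAD_LIMIT seen by child process: $child_value"

# With the cap in place, run the check as usual (hypothetical tarball name):
# R CMD check --as-cran NNS_10.1.tar.gz
```

Because the variable is exported rather than set inside R, it is already in the environment when the R process starts.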
Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time
* checking re-building of vignette outputs ... [577s/63s] NOTE
Re-building vignettes had CPU time 9.2 times elapsed time

--> Do not use more than 2 cores

Best,
Uwe Ligges

On 24.08.2023 13:42, Fred Viole wrote:
> Hi, I am receiving a NOTE upon submission regarding the re-building of vignettes for CPU time for the Debian check.
>
> I am unable to find any documented instances or solutions to this issue. The vignettes currently build in 1m 54.3s locally and in 56s on the Win check.
>
> https://win-builder.r-project.org/incoming_pretest/NNS_10.1_20230824_132459/Debian/00check.log
>
> Thank you for your assistance,
> Fred
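The 9.2 in the NOTE follows from the two timings in brackets: CPU time, summed over all threads and processes, divided by elapsed wall-clock time. A small sketch of that arithmetic, assuming a POSIX shell with awk; the variable names are illustrative only.

```shell
# The timing line as it appears in the check log: [CPU seconds/elapsed seconds].
line='* checking re-building of vignette outputs ... [577s/63s] NOTE'

# Pull out the bracketed pair and divide CPU time by elapsed time.
ratio=$(printf '%s\n' "$line" | awk '
  match($0, /\[[0-9.]+s\/[0-9.]+s\]/) {
    t = substr($0, RSTART + 1, RLENGTH - 2)   # "577s/63s"
    split(t, p, "/")                          # p[1] = "577s", p[2] = "63s"
    printf "%.1f", (p[1] + 0) / (p[2] + 0)    # adding 0 keeps the numeric prefix
  }')

echo "CPU time / elapsed time = $ratio"   # prints 9.2
```

A single-threaded run gives a ratio close to 1, so a value of 9.2 means that, on average, about nine cores' worth of work was running at once, far beyond the two cores the check allows.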