Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

2023-08-29 Thread Uwe Ligges

Dear all,

in today's R Core meeting, both the CRAN team and R Core agree with 
Simon's suggestion below.


Let me repeat the key points:

- We will try to add some interface to R that allows for more unified 
control over the various ways of parallelisation. That should allow 
users to opt in to more than 2 cores and/or threads and/or processes. 
Details will follow as this is not simple.


- As long as users do not have simple ways of controlling how demanding 
code is (e.g., different ways of parallelisation are used, even in nested 
ways), CRAN will continue to protect users and enforce that packages do 
not use more than 2 cores by default.
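
For concreteness, the kinds of per-session knobs that exist today look roughly
like this (an illustrative sketch only, not the planned interface; which
setting matters depends on the package and on the OpenMP/BLAS runtime in use):

    Sys.setenv(OMP_THREAD_LIMIT = "2")  # caps OpenMP threads, if set before the runtime starts
    data.table::setDTthreads(2)         # data.table's own thread count
    options(mc.cores = 2)               # default worker count for the 'parallel' package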


Best,
Uwe Ligges



On 26.08.2023 02:05, Simon Urbanek wrote:

> On Aug 26, 2023, at 11:01 AM, Dirk Eddelbuettel wrote:
>
> On 25 August 2023 at 18:45, Duncan Murdoch wrote:
> | The real problem is that there are two stubborn groups opposing each
> | other:  the data.table developers and the CRAN maintainers.  The former
> | think users should by default dedicate their whole machine to
> | data.table.  The latter think users should opt in to do that.
>
> No, it feels more like it is CRAN versus the rest of the world.




In reality it's more people running R on their laptops vs the rest of the 
world. Although people with laptops are the vast majority, they also are the 
least impacted by the decision going either way. I think Jeff summed up the 
core reasoning pretty well. Harm is done by excessive use, not the other way 
around.

That said, I think this thread is really missing the key point: there is no 
central mechanism that would govern the use of CPU resources. OMP_THREAD_LIMIT 
is just one of many ways and even that is vastly insufficient for reasons 
discussed (e.g., recursive use of processes). It is not CRAN's responsibility to 
figure out for each package what it needs to behave sanely - it has no way of 
knowing what type of parallelism is used, under which circumstances and how to 
control it. Only the package author knows that (hopefully), which is why it's 
on them. So instead of complaining here, a better use of time would be to look at 
what's being used in packages and come up with a unified approach to monitoring 
core usage and a mechanism by which the packages could self-govern to respect 
the desired limits. If there was one canonical place, it would also be easy for 
users to opt in/out as they desire - and I'd be happy to help if any components 
of it need to be in core R.




> Take but one example, and as I may have mentioned elsewhere, my day job consists in
> providing software so that (to take one recent example) bioinformatics specialists can
> slice huge amounts of genomics data.  When that happens on dedicated (expensive)
> hardware with dozens of cores, it would be wasteful to have an unconditional default of
> two threads. It would be the end of R among serious people, no more, no less. Can you
> imagine how the internet headlines would go: "R defaults to two threads".



If you run on such a machine then you or your admin certainly know how to set the desired 
limits. From experience the problem is exactly the opposite - it's far more common for 
users to not know how to not overload such a machine. As for internet headlines, they 
will always be saying blatantly false things like "R is not for large data" 
even though we have been using it to analyze terabytes of data per minute ...

Cheers,
Simon




> And it is not just data.table as even in the long thread over in its repo we
> have people chiming in using OpenMP in their code (as data.table does but which
> needs a different setter than the data.table thread count).
>
> It is the CRAN servers which (rightly !!) want to impose constraints for when
> packages are tested.  Nobody objects to that.
>
> But some of us wonder if setting these defaults for all R users, all the time,
> unconditionally is really the right thing to do.  Anyway, Uwe told me he will
> take it to an internal discussion, so let's hope sanity prevails.








Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

2023-08-26 Thread Greg Hunt
Tim,
I think that things like data.table have a different set of problems
depending on the environment.  Working out the right degree of parallelism
for an IO workload is a hard question that depends on the characteristics of
the IO subsystem, the characteristics of the dataset and on what problem you
really have (really, how much it's worth spending to achieve an optimal
answer).  It would be interesting to see how well data.table would do with
several tens of threads on several tens of processors reading a file; I
suspect it might not be pretty (coordination overheads could be large
relative to the actual gains from IO parallelism), but it's not a subject
I've looked at.  It would not surprise me if the right answer was to cap the
number of threads, but that cap would probably still be higher than the usual
number of processors in the average physical or virtual box.  This stuff is
not easy and it's saturated with "it depends" answers.  The underlying
problem here is that to get optimal or optimal-enough behaviour, a 96-way or
more box will require different configuration of the software than an 8- or
16-way VM.


Greg

On Sat, 26 Aug 2023 at 18:15, Tim Taylor wrote:

> I’m definitely sympathetic to both sides but have come around to the view
> of Greg, Dirk et al. It seems sensible to have a default that benefits the
> majority of “normal” users and require explicit action in shared
> environments not vice-versa.
>
> That is not to say that data.table could not do better with its
> heuristics (e.g. respecting CGroups settings as raised by Henrik in
> https://github.com/Rdatatable/data.table/issues/5620) but the current
> defaults (50%) seem reasonable for, dare I say, most users.
>
> Tim


Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

2023-08-26 Thread Tim Taylor
I’m definitely sympathetic to both sides but have come around to the view of 
Greg, Dirk et al. It seems sensible to have a default that benefits the 
majority of “normal” users and requires explicit action in shared environments, 
not vice versa.

That is not to say that data.table could not do better with its heuristics 
(e.g. respecting CGroups settings as raised by Henrik in 
https://github.com/Rdatatable/data.table/issues/5620) but the current defaults 
(50%) seem reasonable for, dare I say, most users.

Tim

> On 26 Aug 2023, at 03:20, Greg Hunt  wrote:
> 
> The question should be, in how many cases is the current behaviour a
> problem?  In a shared environment, sure, you have to be more careful.  I'd
> say don't let the teenagers in there. The CRAN build server does need to do
> something to protect itself and I don't greatly mind the 2 thread limit; I
> implemented it by hand in my examples and didn't think about it
> afterwards.  On most 8, 16 or 32 way environments, dedicated or
> semi-dedicated to a particular workload, the defaults make some level of
> sense and they are probably most of the use cases.  Protecting high
> processor count environments from people who don't know what they are doing
> would seem to be a mismatch between the people and the environment, not so
> much a matter of software.


Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

2023-08-25 Thread Greg Hunt
The question should be, in how many cases is the current behaviour a
problem?  In a shared environment, sure, you have to be more careful.  I'd
say don't let the teenagers in there. The CRAN build server does need to do
something to protect itself and I don't greatly mind the 2 thread limit; I
implemented it by hand in my examples and didn't think about it
afterwards.  On most 8, 16 or 32 way environments, dedicated or
semi-dedicated to a particular workload, the defaults make some level of
sense and they are probably most of the use cases.  Protecting high
processor count environments from people who don't know what they are doing
would seem to be a mismatch between the people and the environment, not so
much a matter of software.

On Sat, 26 Aug 2023 at 11:49, Jeff Newmiller wrote:

> You have a really bizarre way of twisting what others are saying, Dirk. I
> have seen no-one here saying 'limit R to 2 threads' except for you, as a
> way to paint opposing views to be absurd.
>
> What _is_ being said is that users _need to be in control_, but _the
> default needs to do least harm_ until those users take responsibility for
> that control. Do not turn the throttle up until the user is prepared for
> the consequences. Trying to subvert that responsibility into packages by
> default is going to make more trouble than giving the people using those
> packages simple examples of how to take that control.
>
> A similar problem happens when users discover .Rprofile and insert all
> those pesky library statements into it, making their scripts
> irreproducible. If data.table made a warp10() function that activated this
> current default performance setting then the user would be clearly at fault
> for using it in an inappropriate environment like a shared HPC or the CRAN
> servers. Don't put a brick on the accelerator of a teenager's car before
> they even figure out where the brakes are.


Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

2023-08-25 Thread Dirk Eddelbuettel


On 25 August 2023 at 18:48, Jeff Newmiller wrote:
| You have a really bizarre way of twisting what others are saying, Dirk. I 
have seen no-one here saying 'limit R to 2 threads' except for you, as a way to 
paint opposing views to be absurd.

That's too cute.

Nobody needs to repeat it, and some of us know that "it is law"
as the "CRAN Repository Policy" (which each package upload
promises to adhere to) says
 
   If running a package uses multiple threads/cores it must never
   use more than two simultaneously: the check farm is a shared
   resource and will typically be running many checks
   simultaneously.

You may find reading the document informative. The source reference
(mirrored for convenience at GH) of that line is

https://github.com/r-devel/r-dev-web/blob/master/CRAN/Policy/CRAN_policies.texi#L244-L246

and the rendered page is at 
https://cran.r-project.org/web/packages/policies.html
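
In practice, packages usually comply by capping threads in examples, tests and
vignettes while a check is running. A minimal sketch (assuming data.table and/or
OpenMP are what spawns the extra threads; _R_CHECK_LIMIT_CORES_ is one signal
set by 'R CMD check --as-cran'):

    if (nzchar(Sys.getenv("_R_CHECK_LIMIT_CORES_"))) {
      # running under R CMD check: stay within the two-core policy
      data.table::setDTthreads(2)
      Sys.setenv(OMP_THREAD_LIMIT = "2")
    }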

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

2023-08-25 Thread Jeff Newmiller
You have a really bizarre way of twisting what others are saying, Dirk. I have 
seen no-one here saying 'limit R to 2 threads' except for you, as a way to 
paint opposing views to be absurd.

What _is_ being said is that users _need to be in control_, but _the default 
needs to do least harm_ until those users take responsibility for that control. 
Do not turn the throttle up until the user is prepared for the consequences. 
Trying to subvert that responsibility into packages by default is going to make 
more trouble than giving the people using those packages simple examples of how 
to take that control.

A similar problem happens when users discover .Rprofile and insert all those 
pesky library statements into it, making their scripts irreproducible. If 
data.table made a warp10() function that activated this current default 
performance setting then the user would be clearly at fault for using it in an 
inappropriate environment like a shared HPC or the CRAN servers. Don't put a 
brick on the accelerator of a teenager's car before they even figure out where 
the brakes are.
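
Such an opt-in could be as small as this (a hypothetical sketch; warp10() is
not an existing data.table function):

    warp10 <- function() {
      # the user explicitly asks for the whole machine;
      # in data.table, 0 requests all logical CPUs
      data.table::setDTthreads(0L)
      invisible(data.table::getDTthreads())
    }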

On August 25, 2023 6:17:04 PM PDT, Dirk Eddelbuettel  wrote:
>
>On 26 August 2023 at 12:05, Simon Urbanek wrote:
>| In reality it's more people running R on their laptops vs the rest of the 
>world.
>
>My point was that we also have 'single user on really Yuge workstation'. 
>
>Plus we all know that those users are often not sysadmins, and do not have
>our levels of accumulated systems knowledge.
>
>So we should give _more_ power by default, not less.
>
>| [...] they will always be saying blatantly false things like "R is not for 
>large data"
>
>By limiting R (and/or packages) to two threads we will only get more of
>these.  Our collective call.
>
>This whole thread is pretty sad, actually.
>
>Dirk
>

-- 
Sent from my phone. Please excuse my brevity.



Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

2023-08-25 Thread Dirk Eddelbuettel


On 26 August 2023 at 12:05, Simon Urbanek wrote:
| In reality it's more people running R on their laptops vs the rest of the 
world.

My point was that we also have 'single user on really Yuge workstation'. 

Plus we all know that those users are often not sysadmins, and do not have
our levels of accumulated systems knowledge.

So we should give _more_ power by default, not less.

| [...] they will always be saying blatantly false things like "R is not for 
large data"

By limiting R (and/or packages) to two threads we will only get more of
these.  Our collective call.

This whole thread is pretty sad, actually.

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

2023-08-25 Thread Simon Urbanek



> On Aug 26, 2023, at 11:01 AM, Dirk Eddelbuettel  wrote:
> 
> 
> On 25 August 2023 at 18:45, Duncan Murdoch wrote:
> | The real problem is that there are two stubborn groups opposing each 
> | other:  the data.table developers and the CRAN maintainers.  The former 
> | think users should by default dedicate their whole machine to 
> | data.table.  The latter think users should opt in to do that.
> 
> No, it feels more like it is CRAN versus the rest of the world.
> 


In reality it's more people running R on their laptops vs the rest of the 
world. Although people with laptops are the vast majority, they also are the 
least impacted by the decision going either way. I think Jeff summed up the 
core reasoning pretty well. Harm is done by excessive use, not the other way 
around.

That said, I think this thread is really missing the key point: there is no 
central mechanism that would govern the use of CPU resources. OMP_THREAD_LIMIT 
is just one of many ways and even that is vastly insufficient for reasons 
discussed (e.g., recursive use of processes). It is not CRAN's responsibility to 
figure out for each package what it needs to behave sanely - it has no way of 
knowing what type of parallelism is used, under which circumstances and how to 
control it. Only the package author knows that (hopefully), which is why it's 
on them. So instead of complaining here, a better use of time would be to look at 
what's being used in packages and come up with a unified approach to monitoring 
core usage and a mechanism by which the packages could self-govern to respect 
the desired limits. If there was one canonical place, it would also be easy for 
users to opt in/out as they desire - and I'd be happy to help if any components 
of it need to be in core R.
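
As a rough illustration of what such self-governing could look like inside a
package (purely a sketch: the helper name and the MYPKG_THREADS variable are
made up, and the exact set of signals consulted is the part that would need to
be agreed on):

    sane_thread_count <- function(max_default = 2L) {
      # explicit opt-in documented by the (hypothetical) package
      opt <- suppressWarnings(as.integer(Sys.getenv("MYPKG_THREADS", unset = NA)))
      if (!is.na(opt)) return(max(1L, opt))
      # otherwise respect common external limits when they are set
      limits <- suppressWarnings(as.integer(c(
        Sys.getenv("OMP_THREAD_LIMIT", unset = NA),
        Sys.getenv("OMP_NUM_THREADS",  unset = NA))))
      avail <- parallel::detectCores(logical = TRUE)
      if (is.na(avail)) avail <- 1L
      min(c(max_default, limits[!is.na(limits)], avail))
    }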



> Take but one example, and as I may have mentioned elsewhere, my day job 
> consists in providing software so that (to take one recent example) 
> bioinformatics specialists can slice huge amounts of genomics data.  When that 
> happens on dedicated (expensive) hardware with dozens of cores, it would be 
> wasteful to have an unconditional default of two threads. It would be the end 
> of R among serious people, no more, no less. Can you imagine how the internet 
> headlines would go: "R defaults to two threads". 
> 

If you run on such a machine then you or your admin certainly know how to set 
the desired limits. From experience the problem is exactly the opposite - it's 
far more common for users to not know how to not overload such a machine. As 
for internet headlines, they will always be saying blatantly false things like 
"R is not for large data" even though we have been using it to analyze 
terabytes of data per minute ...

Cheers,
Simon



> And it is not just data.table as even in the long thread over in its repo we 
> have people chiming in using OpenMP in their code (as data.table does but 
> which needs a different setter than the data.table thread count).
> 
> It is the CRAN servers which (rightly !!) want to impose constraints for when 
> packages are tested.  Nobody objects to that.
> 
> But some of us wonder if setting these defaults for all R users, all the 
> time, unconditionally is really the right thing to do.  Anyway, Uwe told me he 
> will take it to an internal discussion, so let's hope sanity prevails.
> 



Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

2023-08-25 Thread Dirk Eddelbuettel


On 25 August 2023 at 18:45, Duncan Murdoch wrote:
| The real problem is that there are two stubborn groups opposing each 
| other:  the data.table developers and the CRAN maintainers.  The former 
| think users should by default dedicate their whole machine to 
| data.table.  The latter think users should opt in to do that.

No, it feels more like it is CRAN versus the rest of the world.

Take but one example, and as I may have mentioned elsewhere, my day job
consists in providing software so that (to take one recent example)
bioinformatics specialists can slice huge amounts of genomics data.  When that
happens on dedicated (expensive) hardware with dozens of cores, it would be
wasteful to have an unconditional default of two threads. It would be the end
of R among serious people, no more, no less. Can you imagine how the internet
headlines would go: "R defaults to two threads". 

And it is not just data.table as even in the long thread over in its repo we
have people chiming in using OpenMP in their code (as data.table does but
which needs a different setter than the data.table thread count).

It is the CRAN servers which (rightly !!) want to impose constraints for when
packages are tested.  Nobody objects to that.

But some of us wonder if setting these defaults for all R users, all the
time, unconditionally is really the right thing to do.  Anyway, Uwe told me he
will take it to an internal discussion, so let's hope sanity prevails.

Dirk
-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

2023-08-25 Thread Avraham Adler
To be fair, data.table defaults to using 1/2 the available cores; they do not 
take the entire machine by default. 
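
A quick way to inspect and override that default in a session, using
data.table's own setters:

    library(data.table)
    getDTthreads(verbose = TRUE)  # reports detected CPUs and the thread count in use
    setDTthreads(2)               # an explicit, conservative choice for shared machines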

Avi

Sent from my iPhone

> On Aug 25, 2023, at 6:46 PM, Duncan Murdoch  wrote:
> 
> On 25/08/2023 6:13 p.m., Toby Hocking wrote:
>> Thanks Dirk. I agree.
>> data.table is not in a situation to update very soon, so the easiest
>> solution for the R community would be for CRAN to set OMP_THREAD_LIMIT
>> to 2 on the Windows and Debian machines doing this test.
>> Otherwise the 1400+ packages with hard dependencies on data.table will
>> each have to implement custom logic to limit threads to 2.
> 
> That doesn't follow.  data.table could update soon even if that wasn't their 
> intention:  just include bug fixes and set the default OMP_THREAD_LIMIT to 2 
> in data.table.
> 
> The real problem is that there are two stubborn groups opposing each other:  
> the data.table developers and the CRAN maintainers.  The former think users 
> should by default dedicate their whole machine to data.table.  The latter 
> think users should opt in to do that.
> 
> Duncan Murdoch


Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

2023-08-25 Thread Reed A. Cartwright
I've been lurking on this discussion and have a question.

What does data.table do to pass CRAN tests? If this is a problem for
packages that use data.table, then it certainly is a problem for data.table
itself.

On Fri, Aug 25, 2023 at 3:46 PM Duncan Murdoch wrote:

> On 25/08/2023 6:13 p.m., Toby Hocking wrote:
> > Thanks Dirk. I agree.
> > data.table is not in a situation to update very soon, so the easiest
> > solution for the R community would be for CRAN to set OMP_THREAD_LIMIT
> > to 2 on the Windows and Debian machines doing this test.
> > Otherwise the 1400+ packages with hard dependencies on data.table will
> > each have to implement custom logic to limit threads to 2.
>
> That doesn't follow.  data.table could update soon even if that wasn't
> their intention:  just include bug fixes and set the default
> OMP_THREAD_LIMIT to 2 in data.table.
>
> The real problem is that there are two stubborn groups opposing each
> other:  the data.table developers and the CRAN maintainers.  The former
> think users should by default dedicate their whole machine to
> data.table.  The latter think users should opt in to do that.
>
> Duncan Murdoch


Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

2023-08-25 Thread Duncan Murdoch

On 25/08/2023 6:13 p.m., Toby Hocking wrote:

> Thanks Dirk. I agree.
> data.table is not in a situation to update very soon, so the easiest
> solution for the R community would be for CRAN to set OMP_THREAD_LIMIT
> to 2 on the Windows and Debian machines doing this test.
> Otherwise the 1400+ packages with hard dependencies on data.table will
> each have to implement custom logic to limit threads to 2.


That doesn't follow.  data.table could update soon even if that wasn't 
their intention:  just include bug fixes and set the default 
OMP_THREAD_LIMIT to 2 in data.table.
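
The pattern would look roughly like this in a package's startup code (a sketch 
of the idea only, not what data.table ships; note that an environment variable 
set this late only helps if the OpenMP runtime has not been started yet):

    .onLoad <- function(libname, pkgname) {
      # stay conservative unless the user or admin already expressed a preference
      if (!nzchar(Sys.getenv("OMP_THREAD_LIMIT")))
        Sys.setenv(OMP_THREAD_LIMIT = "2")
    }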


The real problem is that there are two stubborn groups opposing each 
other:  the data.table developers and the CRAN maintainers.  The former 
think users should by default dedicate their whole machine to 
data.table.  The latter think users should opt in to do that.


Duncan Murdoch




Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

2023-08-25 Thread Toby Hocking
Thanks Dirk. I agree.
data.table is not in a situation to update very soon, so the easiest
solution for the R community would be for CRAN to set OMP_THREAD_LIMIT
to 2 on the Windows and Debian machines doing this test.
Otherwise the 1400+ packages with hard dependencies on data.table will
each have to implement custom logic to limit threads to 2.
Toby

On Fri, Aug 25, 2023 at 6:46 AM Dirk Eddelbuettel  wrote:
>
>
> On 24 August 2023 at 07:42, Fred Viole wrote:
> | Hi, I am receiving a NOTE upon submission regarding the re-building of
> | vignettes for CPU time for the Debian check.
> |
> | I am unable to find any documented instances or solutions to this issue.
> | The vignettes currently build in 1m 54.3s locally and in 56s on the Win
> | check.
> |
> | 
> https://win-builder.r-project.org/incoming_pretest/NNS_10.1_20230824_132459/Debian/00check.log
>
> Please see, inter alia, the long running thread
>
>"Trouble with long-running tests on CRAN debian server"
>
> started earlier this week (!!) on this list covering exactly this issue.
>
> We can only hope CRAN comes to understand our point that _it_ should set a
> clearly-identifiable variable (the OpenMP thread count would do) so that
> package data.table can set this for its several hundred users.
>
> As things currently stand, CRAN expects several hundred packages (such as
> yours; I am guessing this comes from data.table, which I do not know for sure,
> but you do import it) to make the change, which is pretty close to the
> textbook definition of madness.
>
> Also see https://github.com/Rdatatable/data.table/issues/5658 with by now 24
> comments.  It is on the same issue.
>
> Uwe, Kurt: Please please please set OMP_THREAD_LIMIT to 2 on the Windows and
> Debian machines doing this test.
>
> Dirk
>
> --
> dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>
> __
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel



Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

2023-08-25 Thread Dirk Eddelbuettel


On 24 August 2023 at 07:42, Fred Viole wrote:
| Hi, I am receiving a NOTE upon submission regarding the re-building of
| vignettes for CPU time for the Debian check.
| 
| I am unable to find any documented instances or solutions to this issue.
| The vignettes currently build in 1m 54.3s locally and in 56s on the Win
| check.
| 
| 
https://win-builder.r-project.org/incoming_pretest/NNS_10.1_20230824_132459/Debian/00check.log

Please see, inter alia, the long running thread

   "Trouble with long-running tests on CRAN debian server"

started earlier this week (!!) on this list covering exactly this issue.

We can only hope CRAN comes to understand our point that _it_ should set a
clearly-identifiable variable (the OpenMP thread count would do) so that
package data.table can set this for its several hundred users.

As things currently stand, CRAN expects several hundred packages (such as
yours; I am guessing this comes from data.table, which I do not know for sure,
but you do import it) to make the change, which is pretty close to the
textbook definition of madness.

Also see https://github.com/Rdatatable/data.table/issues/5658 with by now 24
comments.  It is on the same issue.

Uwe, Kurt: Please please please set OMP_THREAD_LIMIT to 2 on the Windows and
Debian machines doing this test.
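
On the check machines that would amount to a single setting made before any
package code runs, e.g.:

    # e.g. in ~/.Renviron on the check host:  OMP_THREAD_LIMIT=2
    # or, before any OpenMP-using package is loaded:
    Sys.setenv(OMP_THREAD_LIMIT = "2")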

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: [R-pkg-devel] Re-building vignettes had CPU time 9.2 times elapsed time

2023-08-25 Thread Uwe Ligges



* checking re-building of vignette outputs ... [577s/63s] NOTE
Re-building vignettes had CPU time 9.2 times elapsed time

--> Do not use more than 2 cores

Best,
Uwe Ligges


On 24.08.2023 13:42, Fred Viole wrote:

Hi, I am receiving a NOTE upon submission regarding the re-building of
vignettes for CPU time for the Debian check.

I am unable to find any documented instances or solutions to this issue.
The vignettes currently build in 1m 54.3s locally and in 56s on the Win
check.

https://win-builder.r-project.org/incoming_pretest/NNS_10.1_20230824_132459/Debian/00check.log


Thank you for your assistance,
Fred
