Adam,

We just pushed a new parallel job implementation by Andrew Morrow that might be worth trying in your environment. It's currently an experimental feature, enabled with --experimental=tm_v2 or SetOption('experimental','tm_v2').
-Bill

On Tue, Jul 9, 2019 at 11:35 AM Adam Gross via Scons-dev <scons-dev@scons.org> wrote:

> In VMware builds, the sheer number of tasks (33000 leaf tasks in the first
> iteration of ESX builds, for example) means that Parallel.start can take up
> to 20 minutes simply collecting and preparing tasks over the course of a
> build. I started a project to look at batch task handling in order to
> support remote caching (e.g. asking the cache for 1000 nodes at a time
> instead of 1) and realized that the approach that I want to take actually
> makes normal builds more efficient as well. In this e-mail, I’d like to
> explain my proposal so I can get your thoughts on it.
>
> --- Current Parallel.start performance problems ---
>
> Reference:
> https://github.com/SCons/scons/blob/master/src/engine/SCons/Job.py#L369
>
> In the current implementation, SCons collects just enough tasks to
> dispatch to the thread pool such that the number of active jobs is equal to
> the max number of jobs. It then waits for at least one job to be done,
> gathers all finished jobs, then repeats the process of collecting enough
> tasks to have jobs==self.maxjobs.
>
> Waiting on at least one job to be done misses an opportunity to keep
> calling taskmaster.next_task() and task.prepare() while jobs are active.
> These calls are not cheap for many reasons, including that it initiates
> scanning of source nodes.
>
> --- Proposal ---
>
> A first rough draft is contained in draft pull request
> https://github.com/SCons/scons/pull/3404 . In this form it is an
> alternative child class of Parallel; it could just replace it if people
> felt strongly.
>
> I would like to implement an alternative to the Parallel class that only
> waits for jobs to complete if there are no tasks left (i.e.
> taskmaster.next_task() returns None). It is optimized for keeping
> jobs==self.maxjobs but otherwise, will keep looking for more tasks. If
> there are no more tasks left, it waits for a job to complete and then
> rechecks whether there are any tasks left, just in case other tasks were
> unblocked by its completion.
>
> One very useful side effect is that this class will be collecting lists of
> tasks instead of operating on one at a time, so it serves as a useful
> building block towards remote caching. The current one-at-a-time cache
> retrieval approach wouldn’t work for remote caching due to network latency
> but this approach can.
>
> Please let me know what you think either over e-mail or on the
> aforementioned pull request.
>
> Thanks,
>
> Adam Gross
> _______________________________________________
> Scons-dev mailing list
> Scons-dev@scons.org
> https://pairlist2.pair.net/mailman/listinfo/scons-dev
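For anyone following the thread, the scheduling idea in the quoted proposal can be sketched roughly as below. This is only a toy model, not the code in the pull request or in SCons/Job.py: ToyTaskmaster, mark_done, run_eager and the execute callback are hypothetical names made up for illustration, and task.prepare(), cache lookups and error handling are left out. The only things taken from the proposal are the loop shape (keep calling next_task() while jobs are active, block only when it returns None) and the maxjobs limit.

```python
import queue
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class ToyTaskmaster:
    """Hypothetical stand-in for a taskmaster: next_task() returns a task
    whose dependencies are all done, or None if nothing is ready right now."""

    def __init__(self, deps):
        self.deps = {name: set(d) for name, d in deps.items()}
        self.done = set()
        self.handed_out = set()

    def next_task(self):
        for name, needed in self.deps.items():
            if name not in self.handed_out and needed <= self.done:
                self.handed_out.add(name)
                return name
        return None

    def mark_done(self, name):
        self.done.add(name)


def run_eager(taskmaster, maxjobs, execute):
    """Keep collecting ready tasks while jobs run; block only when
    next_task() returns None and at least one job is still active."""
    finished = queue.Queue()   # tasks whose job has completed
    ready = deque()            # collected tasks waiting for a free worker
    active = 0                 # jobs currently submitted and not yet collected
    completed = []             # completion order, returned for inspection

    def job(task):
        execute(task)          # failures are not handled in this sketch
        finished.put(task)

    def collect(block):
        nonlocal active
        try:
            task = finished.get(block=block)
        except queue.Empty:
            return
        while True:
            taskmaster.mark_done(task)
            completed.append(task)
            active -= 1
            try:
                task = finished.get_nowait()
            except queue.Empty:
                return

    with ThreadPoolExecutor(max_workers=maxjobs) as pool:
        while True:
            # Top up the pool so that active == maxjobs whenever possible.
            while ready and active < maxjobs:
                pool.submit(job, ready.popleft())
                active += 1
            collect(block=False)          # pick up finished jobs, no waiting
            task = taskmaster.next_task()
            if task is not None:
                ready.append(task)        # keep looking for more tasks
            elif active:
                collect(block=True)       # nothing ready: wait for one job
            elif not ready:
                break                     # no tasks, no jobs: build is done
    return completed


if __name__ == "__main__":
    graph = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
    print(run_eager(ToyTaskmaster(graph), maxjobs=2, execute=lambda t: None))
```

Because ready accumulates tasks ahead of the workers, it is also the natural place where a batch (for example a group of cache lookups) could be formed, which is the side effect the proposal mentions.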