In VMware builds, the sheer number of tasks (33000 leaf tasks in the first 
iteration of ESX builds, for example) means that Parallel.start can take up to 
20 minutes simply collecting and preparing tasks over the course of a build. I 
started a project to look at batch task handling in order to support remote 
caching (e.g. asking the cache for 1000 nodes at a time instead of 1) and 
realized that the approach I want to take also makes normal builds more 
efficient. In this e-mail, I'd like to explain my proposal so I 
can get your thoughts on it.

--- Current Parallel.start performance problems ---

Reference: 
https://github.com/SCons/scons/blob/master/src/engine/SCons/Job.py#L369

In the current implementation, SCons collects just enough tasks to dispatch to 
the thread pool such that the number of active jobs is equal to the max number 
of jobs. It then waits for at least one job to be done, gathers all finished 
jobs, then repeats the process of collecting enough tasks to have 
jobs==self.maxjobs.

Waiting on at least one job to be done misses an opportunity to keep calling 
taskmaster.next_task() and task.prepare() while jobs are active. These calls 
are not cheap; among other things, they initiate scanning of source nodes.
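
The current behavior can be modeled roughly like this. This is only a 
simplified sketch, not the actual Job.py code: the next_task callable and the 
two queues are illustrative stand-ins for the taskmaster and the worker thread 
pool.

```python
import queue

def start_current(next_task, request_queue, results_queue, maxjobs):
    """Simplified model of today's Parallel.start loop. The argument
    names are illustrative stand-ins, not the real Job.py interface."""
    jobs = 0
    while True:
        # Collect just enough tasks so that jobs == maxjobs.
        while jobs < maxjobs:
            task = next_task()
            if task is None:
                break
            task.prepare()           # not cheap: can scan source nodes
            request_queue.put(task)  # hand off to a worker thread
            jobs += 1
        if jobs == 0:
            break                    # nothing running, nothing left
        # Block until at least one job is done. No next_task()/prepare()
        # work happens during this wait -- the missed opportunity.
        results_queue.get()
        jobs -= 1
        # Drain any other already-finished jobs without blocking.
        try:
            while True:
                results_queue.get(block=False)
                jobs -= 1
        except queue.Empty:
            pass
```

Note that the loop goes back to collecting tasks only after the blocking 
results_queue.get() returns, so task collection and job execution never 
overlap while the pool is full.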

--- Proposal ---

A first rough draft is contained in draft pull request 
https://github.com/SCons/scons/pull/3404 . In this form it is an alternative 
child class of Parallel; it could simply replace Parallel outright if people 
feel strongly.

I would like to implement an alternative to the Parallel class that only waits 
for jobs to complete if there are no tasks left (i.e. taskmaster.next_task() 
returns None). It is still optimized to keep jobs==self.maxjobs, but otherwise 
it keeps looking for more tasks. If no tasks are left, it waits for a job to 
complete and then rechecks the taskmaster, in case the completed job unblocked 
other tasks.
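
The proposed loop might look something like the following. Again, this is 
only a sketch with illustrative stand-ins (the next_task callable and the two 
queues) rather than the real Job.py interface; the draft PR has the actual 
code.

```python
def start_proposed(next_task, request_queue, results_queue, maxjobs):
    """Sketch of the proposed loop: keep collecting and preparing tasks
    while jobs are active, and only block on the results queue once
    next_task() returns None. The argument names are illustrative."""
    jobs = 0
    pending = []           # prepared tasks waiting for a free job slot
    out_of_tasks = False
    while True:
        # Keep jobs == maxjobs by dispatching from the pending list.
        while jobs < maxjobs and pending:
            request_queue.put(pending.pop(0))
            jobs += 1
        # Keep collecting/preparing tasks instead of blocking as soon
        # as the pool is full.
        if not out_of_tasks:
            task = next_task()
            if task is None:
                out_of_tasks = True
            else:
                task.prepare()
                pending.append(task)
                continue
        if jobs == 0 and not pending:
            break          # nothing running, nothing queued, nothing left
        # No tasks available right now: wait for one completion, then
        # re-check the taskmaster in case it unblocked new tasks.
        if jobs:
            results_queue.get()
            jobs -= 1
            out_of_tasks = False
```

The key difference is that the blocking results_queue.get() is only reached 
once the taskmaster has nothing to offer, so the expensive 
next_task()/prepare() work overlaps with running jobs.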

One very useful side effect is that this class will be collecting lists of 
tasks instead of operating on one at a time, so it serves as a useful building 
block towards remote caching. The current one-at-a-time cache retrieval 
approach wouldn't work for remote caching because of per-request network 
latency, but this batched approach would.
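
To illustrate the point, once tasks are collected as a list, a batched lookup 
becomes natural. This is purely a sketch: cache_lookup_many and the signature 
attribute are hypothetical, not SCons or CacheDir APIs.

```python
def fetch_cached_batched(tasks, cache_lookup_many, batch_size=1000):
    """Illustration only: ask a remote cache about many nodes per round
    trip, paying network latency once per batch instead of once per
    node. cache_lookup_many is a hypothetical client call returning the
    subset of the given signatures that the cache holds."""
    hits, misses = [], []
    for i in range(0, len(tasks), batch_size):
        batch = tasks[i:i + batch_size]
        present = cache_lookup_many([t.signature for t in batch])
        for t in batch:
            (hits if t.signature in present else misses).append(t)
    return hits, misses
```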

Please let me know what you think either over e-mail or on the aforementioned 
pull request.

Thanks,
Adam Gross
_______________________________________________
Scons-dev mailing list
Scons-dev@scons.org
https://pairlist2.pair.net/mailman/listinfo/scons-dev
