The human way to do this might be a good model:

Start N processes, one for each core, for example, 8, nominally hoping for 100%
or less busy.  If the system is already 25% busy, it might be nice to leave
that load intact, for example by starting with 6.
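
A rough sketch of that starting point (Python, with psutil assumed as the
source of the CPU reading; the function name is only illustrative):

    import os
    import psutil  # assumed available for a system-wide CPU reading

    def initial_worker_count():
        cores = os.cpu_count() or 1
        busy_pct = psutil.cpu_percent(interval=1.0)  # sample current load
        idle_share = max(0.0, 100.0 - busy_pct) / 100.0
        # Claim only the idle share, leaving the existing load alone:
        # 8 cores at 25% busy -> 8 * 0.75 = 6 workers to start.
        return max(1, int(cores * idle_share))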

After a moment, review total idle time, and for, say, 30% idle, try 30 * N / 70
more; for N = 8 that is 3 more, to make nominally 96.25% busy.  Or get greedy,
with 4 for 105% busy.
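
In code, that second step is just the same arithmetic (idle_pct measured once
the first batch is running; names are illustrative):

    def extra_workers(running, idle_pct, greedy=False):
        busy_pct = 100.0 - idle_pct
        # Each worker accounts for roughly busy_pct / running of the CPU, so
        # about idle_pct * running / busy_pct more would fill the idle share.
        extra = idle_pct * running / busy_pct
        return int(extra) + (1 if greedy else 0)

    # 8 workers, 30% idle: 30 * 8 / 70 = 3.4 -> 3 more (~96.25% busy), or 4 (~105%).
    print(extra_workers(8, 30.0), extra_workers(8, 30.0, greedy=True))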

After another moment, if the target is not approximately reached, cap the
parallelism at the amount the idle time indicates, maybe plus 1 (rounded up);
for instance, for 20% idle, 8 * 80 / 70 = 10.  If there is essentially no
increase in CPU use, it might be good to lower the cap, for instance to below
8.  If the target is reached, these steps can be repeated as processing
progresses, so if CPU use drops, parallelism can be increased, and if it
rises, no more processes are spawned until the running count drops.

Thus, 100% CPU, if usable, is utilized on the nose at the start, middle, or
tail of the operation, wherever the need is lower.  At times when CPU need
expands to exceed 100%, use of other resources may drop, but that is all the
CPU you bought.  If the other resources cap the parallelism, there is no point
in more, and more reduces stability and increases any rerun time if there is
an interruption, because fewer processes reach completion before it.
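
Put together, the feedback loop might look roughly like this (a sketch only;
run_one_job and the work list are stand-ins, and psutil again supplies the
idle reading):

    import os
    import time
    import psutil
    from multiprocessing import Process

    def run_adaptive(work_items, run_one_job, check_every=5.0):
        busy = psutil.cpu_percent(interval=1.0)
        cap = max(1, int((os.cpu_count() or 1) * (100.0 - busy) / 100.0))
        pending, running = list(work_items), []

        while pending or running:
            running = [p for p in running if p.is_alive()]  # reap finished workers
            while pending and len(running) < cap:           # spawn up to the cap
                p = Process(target=run_one_job, args=(pending.pop(),))
                p.start()
                running.append(p)

            time.sleep(check_every)
            idle = 100.0 - psutil.cpu_percent(interval=1.0)
            if idle > 5.0 and running:
                # Idle time left: raise the cap by roughly idle * N / busy.
                cap = len(running) + max(1, int(idle * len(running) / max(1.0, 100.0 - idle)))
            elif idle < 1.0:
                # Saturated: hold the cap so nothing new starts until a worker exits.
                cap = min(cap, len(running))
            # A fuller version would also lower the cap when extra workers
            # produce no increase in CPU use (some other resource is the limit).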

For instance, on network TCP transfers, often a second process adds 5-10% and
a third adds nothing, but 3 might still be a nice level of parallelism: when a
process terminates or a packet is lost you drop to 2 and never lose that 10%,
unless 2 or 3 end simultaneously.  The price of the second process is that the
first drops from 90% to 50% speed, and a third takes them all to 33%, so
choosing between 2 for faster unit turnaround and 3 for better total bandwidth
use during job end/start or packet loss is a matter of taste and situation.
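
The same try-and-measure approach can pick the stream count: add a stream,
compare aggregate throughput, and stop when the gain is not worth the
per-stream slowdown (a sketch; measure_total_throughput is a stand-in for
however the transfer rate is actually read):

    def pick_stream_count(measure_total_throughput, max_streams=4, min_gain=0.05):
        # Keep adding parallel streams until the aggregate rate stops improving.
        best_rate = measure_total_throughput(1)
        streams = 1
        for n in range(2, max_streams + 1):
            rate = measure_total_throughput(n)
            if rate < best_rate * (1.0 + min_gain):
                break  # e.g. a third stream adds nothing over two: stop
            best_rate, streams = rate, n
        return streams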

If reliability is never an issue (no interruptions like network loss),
overloading some resource on a host with plenty of RAM does not hurt final run
time.  There may be some loss when going past the number of cores, even if
there is idle time, if cache hit rates are reduced by forcing more context
switches on each core.  Added cache latency can turn into critical process
latency if progress is somehow tied to event turnaround time, like a transfer
with insufficient buffering.

-- David