It's difficult to comment without knowing more about the number of 
workers, their relative speeds, the number of tasks, and their expected 
completion times.

As an extreme example, say you have 4 workers (all of the same speed), 
two 15-minute tasks, and sixteen 1-minute tasks.

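Depending on the order in which the tasks are handed out, this will take 
between 15 and 19 minutes. The worst case (19 minutes) occurs when the two 
long tasks are handed out last: the sixteen short tasks occupy all four 
workers for 4 minutes, and only then do the 15-minute tasks start. 
Optimally (task lengths in minutes):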
Worker 1: 15
Worker 2: 15
Worker 3: 1 1 1 1 1 1 1 1 
Worker 4: 1 1 1 1 1 1 1 1

Even with this optimal schedule, workers 3 and 4 finish after 8 minutes 
and then sit idle for the remaining 7 minutes until workers 1 and 2 finish.
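
To make the arithmetic above concrete, here is a small simulation sketch. 
It is plain Python rather than Julia and does not call pmap; `makespan` is 
a hypothetical helper that only models the assumed behaviour of a dynamic 
scheduler, i.e. the next task in the list goes to whichever worker becomes 
free first.

```python
import heapq

def makespan(durations, n_workers):
    """Total wall-clock time when each task, taken in list order,
    goes to the worker that becomes free first."""
    free_at = [0] * n_workers  # time at which each worker is next free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available worker
        heapq.heappush(free_at, start + d)  # busy until start + d
    return max(free_at)

long_first = [15, 15] + [1] * 16  # long tasks handed out first
long_last = [1] * 16 + [15, 15]   # long tasks handed out last

print(makespan(long_first, 4))  # 15
print(makespan(long_last, 4))   # 19
```

Under this model the idle tail comes from the task mix alone: even the 
best ordering leaves two workers with nothing to do after minute 8.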

I had a similar problem with a mix of fast and slow workers. Initially I 
split the work into roughly as many tasks as there were workers, which 
left an overhang similar to what you describe.
In my case finer granularity helped: I split the work into many small 
tasks, so that #tasks >> #workers.
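
A rough sketch of why granularity matters with workers of mixed speed. The 
numbers are made up for illustration (120 units of work, two workers twice 
as fast as the other two), and `chunked_makespan` is a hypothetical helper 
written in plain Python, not anything from Julia's pmap. It assumes the 
work divides evenly into chunks and that each chunk goes to the first 
worker that becomes free.

```python
import heapq

def chunked_makespan(total_work, n_chunks, speeds):
    """Finish time when total_work is split into n_chunks equal chunks
    and each chunk goes to the worker that becomes free first."""
    chunk = total_work / n_chunks
    # Heap of (time the worker is next free, worker index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_chunks):
        t, i = heapq.heappop(free_at)
        done = t + chunk / speeds[i]  # faster workers finish a chunk sooner
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish

speeds = [2, 2, 1, 1]  # assumed relative speeds, illustration only

print(chunked_makespan(120, 4, speeds))    # 30.0 (one chunk per worker)
print(chunked_makespan(120, 120, speeds))  # 20.0 (#tasks >> #workers)
```

With one chunk per worker the fast workers are done at 15 and wait for the 
slow ones until 30. With many small chunks the fast workers simply pick up 
more of them, and everyone finishes at about the same time.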



On Thursday, April 7, 2016 at 2:21:28 AM UTC+10, Thomas Covert wrote:

> The manual suggests that pmap(f, lst) will dynamically "feed" elements of 
> lst to the function f as each worker completes its previous assignment, and 
> in my read of the code for pmap, this is indeed what it does.
>
> However, I have found that, in practice, many of the workers that I spin 
> up for pmap tasks are idle for the last, say, half of the total time needed 
> to complete the task.  In my pmap usage, it is the case that the complexity 
> of the workload varies across elements of lst, so that some elements should 
> take a long time to compute (say, 15 minutes on a core of my machine) and 
> others a short time (less than 1 minute).  Knowing about this heterogeneity 
> and observing this pattern of idle workers after about half of the work is 
> done would normally lead me to think that pmap is scheduling workers ahead 
> of time, not dynamically.  Some workers will get "lucky" and have easier 
> than average workload, and others are unlucky and have harder workload.  At 
> the end of the calculation, only the unlucky workers are still working. 
>  However, this isn't what pmap is doing, so I'm kinda confused. 
>
> Am I crazy?  The documentation for pmap says that it is scheduling tasks 
> dynamically and I am pre-randomizing the order of work in lst so that 
> worker 1 doesn't get easier tasks, in expectation, than worker N.  Or is it 
> more likely that I've got a bug somewhere?
>
>
>
