Re: Slow start to cope with load

Ole Tange Mon, 19 Mar 2012 03:26:56 -0700

On Mon, Mar 19, 2012 at 10:20 AM, Matt Oates (Home) wrote:
>
> On 16 March 2012 00:32, Ole Tange wrote:
> > One of the problems with --load is that it only limits how many jobs
> > are started. So you may start way too many. This will give you a load
> > of 100:
> >
> >  seq 100 | nice parallel -j0 --load 2.00 burnP6
> >
> > and that is most likely not what you want.
>
> Am I wrong in thinking you can just do -j 100% so that you never spawn
> more than maxload processes, assuming one process gives a load of 1.0 on
> a single core? Can you not use -j 100% in conjunction with --load to
> prevent the overload on startup?


For CPU-hungry programs like 'burnP6' that would be true. But if the
program only uses 10% CPU (because it is waiting for network or disk
I/O), then we should be able to spawn more jobs, preferably by
automatically figuring out the "right" number.

> > While some programs run multiple threads (and thus can give a load > 1
> > each), that is the exception. So in general I think we can assume one
> > job will at most give a load of 1.
>
> It would be nice to explicitly state the likely load per process,
> though, especially if you are the one setting it. I frequently run HMM
> building with concurrent threading per process and just do the maths
> myself, and I am lucky that all the hosts have the same number of CPUs.
> Perhaps a flag like --is-threaded=4 or something to indicate the
> likely load per job?

I am not too happy about that. I would much prefer some automated way
of doing-the-right-thing.

> > Currently load is only computed every 10 seconds. So we could
> > recompute every 10 seconds:
> >
> >    number_of_concurrent_jobs = max_load - current_load + number_of_concurrent_jobs
>
> Looks good, though I have a couple of questions: If this is negative,
> are you going to kill processes rather than start them? What if it's
> always 0, even from the start? Are you just never going to run on this
> host?

As a user, I would be very surprised if GNU Parallel started to kill my
jobs, and I try to design GNU Parallel to adhere to POLA (the Principle
of Least Astonishment):
http://en.wikipedia.org/wiki/Principle_of_least_astonishment

So if it is < 1, it would mean: do not spawn any new jobs, but wait
for running jobs to complete.
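A minimal Python sketch of the proposed recompute rule combined with the no-kill policy. Assumptions (this is not GNU Parallel's actual code): the rule is applied whenever a fresh load reading arrives, the result is rounded down so that fractional headroom never adds a whole job, and the names `next_job_limit` and `jobs_to_start` are made up for illustration.

```python
import math


def next_job_limit(max_load: float, current_load: float, concurrent_jobs: int) -> int:
    """Proposed update, applied each time a fresh load reading arrives:

        number_of_concurrent_jobs = max_load - current_load + number_of_concurrent_jobs

    Rounded down (an assumption) so that, e.g., 0.9 of spare load does
    not add a job that could itself add a load of 1.
    """
    return math.floor(max_load - current_load + concurrent_jobs)


def jobs_to_start(limit: int, running: int) -> int:
    """Never negative: if the limit is below the number of running jobs
    (including a limit < 1), start nothing and wait for jobs to finish.
    Running jobs are never killed."""
    return max(0, limit - running)


if __name__ == "__main__":
    # The burnP6 example: 100 CPU-bound jobs already running, load ~100.
    limit = next_job_limit(max_load=2.0, current_load=100.0, concurrent_jobs=100)
    print(limit, jobs_to_start(limit, running=100))  # 2 0
    # A lightly loaded host: 4 I/O-bound jobs giving a load of 0.4.
    limit = next_job_limit(max_load=2.0, current_load=0.4, concurrent_jobs=4)
    print(limit, jobs_to_start(limit, running=4))  # 5 1
```

In the burnP6 case the limit drops to 2 at the first reading, but the 100 jobs already running are left to finish; the limit only throttles new starts.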

> > I believe it would be better than the current, but I am very open to
> > even better ideas.
>
> You are starting to get into the realm of needing to understand
> scheduling per host... Load might be reported for something with a
> different nice value than what you want to submit: for example, 100%
> load from something with a nice value < 0 while you want to put
> something in at +19. In your equation above I would just add something
> that looks at the difference between parallel's jobs that are running
> and those that are ready/waiting. If all our jobs are running even
> under high load, who cares? We have priority here, so keep up with the
> max load. If half of our jobs are waiting, then we might as well reduce
> spawning by half.

I did not understand this part.
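One possible reading of Matt's suggestion, sketched in Python: scale the number of new jobs by the share of our own jobs that are actually getting CPU time. How to count "running" versus "ready/waiting" jobs is not covered here; `running` and `waiting` are hypothetical inputs, and `scaled_spawn` is a made-up name.

```python
import math


def scaled_spawn(new_jobs: int, running: int, waiting: int) -> int:
    """Scale planned spawns by the fraction of our jobs that are running.

    running: our jobs currently on a CPU (hypothetical input)
    waiting: our jobs that are runnable but queued for a CPU (hypothetical input)
    """
    total = running + waiting
    if total == 0:
        return new_jobs  # none of our jobs exist yet, so there is nothing to scale by
    return math.floor(new_jobs * running / total)


if __name__ == "__main__":
    print(scaled_spawn(10, running=8, waiting=0))  # all running: keep 10
    print(scaled_spawn(10, running=4, waiting=4))  # half waiting: 5
```

If all of our jobs are running, spawning proceeds as the load-based rule says; if half of them are waiting, it is halved, matching "reduce spawning by half".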

> Best,
> Matt.

/Ole
