> The --is-threaded will only make sense for CPU limited jobs.

I agree, and these are usually the jobs whose developers have already
added multi-threading themselves.
> So explain in which situations these would not be equivalent:
>
>   -j 100% --is-threaded=4
>   -j 25%

The difference is: what if I don't know how many CPUs there are on each
machine given with -S, the set is heterogeneous, and the core counts are
not evenly divisible by four? I'd like to specify a percentage of total
CPU use, and then hint to parallel that my job is going to use 4 cores
if it schedules a single one. For example, 25% of a 6-core machine (1.5
cores) isn't enough to hold a single 4-core job without going over the
25% allocation I specified. I'm not suggesting that this is a worthwhile
feature, just probably an easier one to implement that has a valid use.

> If I understand you correctly you basically want to ignore the load
> average as reported by the server, but instead compute your own, where
> you ignore the jobs that are nicer than you are.

Not at all. I just think it makes more sense to take into account the
ratio of parallel-submitted jobs that are in the running/blocked state
to those in the ready/waiting state. What is the point of issuing more
jobs that are CPU bound and waiting? It adds load with no reward. If the
opposite is true, why not issue more jobs even when starting at high
load? I would use this as a weighting for your current equation, not as
the method of planning how many jobs to issue.

> If that is what you mean I see the following problems:
>
> * It is hard to explain what is going on (thus not adhering to the
>   Principle of Least Astonishment).
> * How do you determine what processes will be knocked off the
>   scheduling queue?

You don't; you just know it is happening if your running/ready ratio is
good at high load. This is not hard for parallel to work out, especially
for its child processes.

> * How do you tell whether the job you are running is limited by
>   disk I/O or CPU?

If it's in the running state it's not I/O limited at that instant, so
who cares?
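To make the bookkeeping concrete, here is a minimal sketch of the kind
of weighting I mean. This is a hypothetical illustration, not GNU
Parallel's actual code: the name io_limited and the threshold value are
mine, and on Linux the per-process state letters would come from field 3
of /proc/<pid>/stat ('R' for running or runnable, 'D' for blocked on
disk I/O).

```python
def io_limited(running, waiting, blocked, threshold=0.25):
    """Hypothetical check: does this set of jobs look I/O limited?

    running: processes currently on a CPU
    waiting: processes runnable but not scheduled
    blocked: processes blocked on I/O ('D' state on Linux)

    If few processes are runnable relative to those blocked on I/O,
    issuing more jobs just adds load with no reward.
    """
    if blocked == 0:
        return False  # nothing is stuck on I/O
    return (running + waiting) / blocked < threshold

# (1 running + 3 waiting) / 100 blocked = 0.04: I/O limited, back off.
print(io_limited(running=1, waiting=3, blocked=100))  # True
# Mostly runnable, almost nothing blocked: keep issuing jobs.
print(io_limited(running=8, waiting=2, blocked=1))    # False
```

The point is only that the kernel already exposes enough state for this
ratio to be a cheap weighting on top of the existing load equation.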
What matters more for balancing load is whether the whole job is I/O
limited, and that shows up when the ratio of running+waiting to blocked
processes is small: at (1 running + 3 waiting) / 100 blocked it's going
to be I/O limited. That's kind of the whole point of the kernel telling
you process states.

> * How do you tell if the running process is a (detached)
>   (grand*)child of a process started by GNU Parallel and that the
>   parent is just waiting for the child to complete?

If by detached you mean daemonized, with its parent pid re-parented to
1? AFAIK you wouldn't ever see something waiting on a daemon unless it
was done badly. Also, wouldn't that utterly break parallel anyway? If
daemonizing was done properly there is no way to get the stdout back,
since the process parallel had a pipe to will have exited. If you mean
forked rather than detached, then walking the process tree and taking an
aggregate of all the leaf processes per job is the way to go.

> It seems like an awful lot of complexity, but I might be wrong.

I agree completely, and I was pointing out the level of complexity you
would need to go to in order to cause the least surprise, given what
people actually do to load balance. My point is that an
over-simplification of the actual problem of load balancing is even more
dangerous if people rely on it to do something smart. You are already
causing surprise by farming out 100 jobs when the load starts out nearly
maxed. To do something that's magical you have to create the magic. If
anything, I would remove the load feature before making it more complex,
or just document the limitations of its use: the cases where it is very
useful and the cases where it is pathological. IMHO, by adding shallow
support for a batch-queueing use case, people are just going to be
increasingly annoyed when they shoot themselves in the foot, as Thomas
has.

Best, Matt.
