> The --is-threaded will only make sense for CPU limited jobs.

I agree, and these are usually the jobs whose developers have already
added multi-threading themselves.
> So explain in which situations these would not be equivalent:
>
>   -j 100% --is-threaded=4
>   -j 25%

The difference is: what if I don't know how many CPUs there are on each
machine given with -S, the set is heterogeneous, and the core counts are
not evenly divisible by four? I'd like to specify a percentage of total
CPU use, and then hint to parallel that my job is going to use 4 cores
if it schedules a single one. For example, 25% of a 6-core machine (1.5
cores) isn't enough to hold a single 4-core job without going over the
25% allocation I specified. I'm not suggesting that this is a worthwhile
feature, just probably an easier one to implement that has a valid use.

> If I understand you correctly you basically want to ignore the load
> average as reported by the server, but instead compute your own, where
> you ignore the jobs that are nicer than you are.

Not at all. I just think it makes more sense to take into account the
ratio of parallel-submitted jobs that are in the running/blocked state
to those in the ready/waiting state. What is the point of issuing more
jobs that are CPU bound and waiting? It adds load with no reward. If the
opposite is true, why not issue more jobs even when starting at high
load? I would use this as a weighting for your current equation, not as
the method of planning how many jobs to issue.

> If that is what you mean I see the following problems:
>
> * It is hard to explain what is going on (thus not adhering to the
>   Principle of Least Astonishment).
> * How do you determine what processes will be knocked off the
>   scheduling queue?

You don't; you just know it is happening if your running/ready ratio is
good at high load. This is not hard for parallel to work out, especially
for its child processes.

> * How do you tell whether the job you are running is limited by
>   disk I/O or CPU?

If it's in the running state it's not I/O limited at that instant, so
who cares?
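To make the bookkeeping concrete, here is a minimal sketch of the kind
of weighting I mean. This is a hypothetical illustration, not GNU
Parallel's actual code: the name io_limited and the threshold value are
mine, and on Linux the per-process state letters would come from field 3
of /proc/<pid>/stat ('R' for running or runnable, 'D' for blocked on
disk I/O).

```python
def io_limited(running, waiting, blocked, threshold=0.25):
    """Hypothetical check: does this set of jobs look I/O limited?

    running: processes currently on a CPU
    waiting: processes runnable but not scheduled
    blocked: processes blocked on I/O ('D' state on Linux)

    If few processes are runnable relative to those blocked on I/O,
    issuing more jobs just adds load with no reward.
    """
    if blocked == 0:
        return False  # nothing is stuck on I/O
    return (running + waiting) / blocked < threshold

# (1 running + 3 waiting) / 100 blocked = 0.04: I/O limited, back off.
print(io_limited(running=1, waiting=3, blocked=100))  # True
# Mostly runnable, almost nothing blocked: keep issuing jobs.
print(io_limited(running=8, waiting=2, blocked=1))    # False
```

The point is only that the kernel already exposes enough state for this
ratio to be a cheap weighting on top of the existing load equation.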
What matters more for balancing load is whether the whole job is I/O
limited, and that shows up when the ratio of running+waiting to blocked
processes is small: at (1 running + 3 waiting) / 100 blocked it's going
to be I/O limited. That's kind of the whole point of the kernel telling
you process states.

> * How do you tell if the running process is a (detached)
>   (grand*)child of a process started by GNU Parallel and that the
>   parent is just waiting for the child to complete?

If by detached you mean daemonized, with its parent pid re-parented to
1? AFAIK you wouldn't ever see something waiting on a daemon unless it
was done badly. Also, wouldn't that utterly break parallel anyway? If
daemonizing was done properly there is no way to get the stdout back,
since the process parallel had a pipe to will have exited. If you mean
forked rather than detached, then walking the process tree and taking an
aggregate of all the leaf processes per job is the way to go.

> It seems like an awful lot of complexity, but I might be wrong.

I agree completely, and I was pointing out the level of complexity you
would need to go to in order to cause the least surprise, given what
people actually do to load balance. My point is that an
over-simplification of the actual problem of load balancing is even more
dangerous if people rely on it to do something smart. You are already
causing surprise by farming out 100 jobs when the load starts out nearly
maxed. To do something that's magical you have to create the magic. If
anything, I would remove the load feature before making it more complex,
or just document the limitations of its use: the cases where it is very
useful and the cases where it is pathological. IMHO, by adding shallow
support for a batch-queueing use case, people are just going to be
increasingly annoyed when they shoot themselves in the foot, as Thomas
has.

Best, Matt.
