Re: [PATCHv5] Another squash on run-command: add an asynchronous parallel child processor

Junio C Hamano Wed, 23 Sep 2015 19:18:07 -0700

Stefan Beller <sbel...@google.com> writes:

> I agree the commit 7087c5f3 (SQUASH??? on top of run-command: add an
> asynchronous parallel child processor) makes sense; I arrived at the same
> patch after adding in the feedback.


I dunno.  As I said, while a small part of that commit is necessary
to be squashed into [6/14] (i.e. the "failed to start" bugfix and
redefinition of the return value of start_one()), the remainder of
the commit is primarily to illustrate possible future enhancements
that the basic structure of your code must support, so that we can
get the fundamentals right.  Each of the actual "possible future
enhancements" may or may not be a good idea and there is nothing
that back it up.  In such a vacuum, I'd prefer to leave them simple
and avoid starting performance tuning prematurely.

One thing that is sort-of backed up already by your "cap to 4"
experiment is that some sort of slow-start ramping up is far better
than letting thundering herd stampede, so I am OK if we kept that
SPAWN_CAP part of the commit.

But even then, we do not know if tying that cap to online_cpu() is a
good idea.  Neither of us have a good argument backed by data on it.

It is tempting to imagine, when you have N cores on an otherwise
idle box, the setting of SPAWN_CAP shouldn't make much difference to
how well the first child process makes its initial progress as long
as it does not exceed the number of idle cores N-1 you have at hand.

But that assumes that the task is CPU bound and you have infinite
memory bandwidth.  Once the task needs a lot of disk bandwidth to
make its initial progress, which certainly is the case for fetch,
the first child that is spawned together with (SPAWN_CAP-1) other
processes would be competing for the shared resource, and having
more online_cpus() would not help you.

If we are not doing analysis that takes into such factors (and it is
way too premature for us to be tuning), even "online_cpu() - 1" is
unnecessarily too complex than a hardcoded small number (say, "2",
or even "1").

The same thing can be said for the output_timeout selection.  "Do
not get stuck for too long until we have fully ramped up.  Do not
spin too frequently when there is no more room for a new child" was
something I came up out of thin air as an example of something we
might want to do, and I did write such a code in that commit, but
that was primarily done so that you can clearly see that a better
design would be to allow the caller, i.e. the scheduling loop,
specify output_timeout to buffer_stderr(), and to keep the latter a
"dumb" helper that can be controlled by a more smart caller (as
opposed to hiding such a logic in buffer_stderr() and have a "dumb"
driver call it).  The actual output_timeout computation logic is not
well thought out---it may even turn out to be that we are better off
if we lengthened the timeout before we have fully ramped up, to
encourage the first process to produce some output before we give
chance to other new processes to be spawned in the later round.

So for that change, while I think adding that parameter to
buffer_stderr() is something we would want to keep, I'd prefer to
keep the caller simpler by always passing a hardcoded 100 in the
initial version, before we start tuning.  And I do not think we want
to start tuning before building a solid foundation to tune.

In short, if I were amending that SQUASH??? commit, I'd probably be
making it do less, not more, than what it does, something along the
line of the attached.

 run-command.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/run-command.c b/run-command.c
index b6d8b39..829b6fe 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1108,20 +1108,19 @@ static void pp_collect_finished(struct 
parallel_processes *pp)
 }
 
 
-#define SPAWN_CAP (pp.max_processes + 1) /* spawn as many as possible */
-
 int run_processes_parallel(int n, void *data,
                           get_next_task_fn get_next_task,
                           start_failure_fn start_failure,
                           return_value_fn return_value)
 {
        struct parallel_processes pp;
-       pp_init(&pp, n, data, get_next_task, start_failure, return_value);
 
+       pp_init(&pp, n, data, get_next_task, start_failure, return_value);
        while (1) {
-               int no_more_task, cnt, output_timeout;
+               int no_more_task, cnt, output_timeout = 100;
+               int spawn_cap = 2;
 
-               for (cnt = SPAWN_CAP, no_more_task = 0;
+               for (cnt = spawn_cap, no_more_task = 0;
                     cnt && pp.nr_processes < pp.max_processes;
                     cnt--) {
                        if (!pp_start_one(&pp)) {
@@ -1132,12 +1131,6 @@ int run_processes_parallel(int n, void *data,
 
                if (no_more_task && !pp.nr_processes)
                        break;
-               if (!cnt)
-                       output_timeout = 50;
-               else if (pp.nr_processes < pp.max_processes)
-                       output_timeout = 100;
-               else
-                       output_timeout = 1000;
                pp_buffer_stderr(&pp, output_timeout);
 
                pp_output(&pp);
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCHv5] Another squash on run-command: add an asynchronous parallel child processor

Reply via email to