Re: [PATCHv3 06/13] run-command: add an asynchronous parallel child processor

Junio C Hamano Mon, 21 Sep 2015 18:09:06 -0700

Stefan Beller <sbel...@google.com> writes:

> +void default_start_failure(void *data,
> +                        struct child_process *cp,
> +                        struct strbuf *err)
> +{
> +     int i;
> +     struct strbuf sb = STRBUF_INIT;
> +
> +     for (i = 0; cp->argv[i]; i++)
> +             strbuf_addf(&sb, "%s ", cp->argv[i]);
> +     die_errno("Starting a child failed:\n%s", sb.buf);


Do we want that trailing SP after the last element of argv[]?
Same question applies to the one in "return-value".

> +static void run_processes_parallel_init(struct parallel_processes *pp,
> +                                     int n, void *data,
> +                                     get_next_task_fn get_next_task,
> +                                     start_failure_fn start_failure,
> +                                     return_value_fn return_value)
> +{
> +     int i;
> +
> +     if (n < 1)
> +             n = online_cpus();
> +
> +     pp->max_processes = n;
> +     pp->data = data;
> +     if (!get_next_task)
> +             die("BUG: you need to specify a get_next_task function");
> +     pp->get_next_task = get_next_task;
> +
> +     pp->start_failure = start_failure ? start_failure : 
> default_start_failure;
> +     pp->return_value = return_value ? return_value : default_return_value;

I would actually have expected that leaving these to NULL will just
skip pp->fn calls, instead of a "default implementation", but a pair
of very simple default implementation would not hrtut.

> +static void run_processes_parallel_cleanup(struct parallel_processes *pp)
> +{
> +     int i;

Have a blank between the decl block and the first stmt here (and
elsewhere, too---which you got correct in the function above)?

> +     for (i = 0; i < pp->max_processes; i++)
> +             strbuf_release(&pp->children[i].err);

> +static void run_processes_parallel_start_one(struct parallel_processes *pp)
> +{
> +     int i;
> +
> +     for (i = 0; i < pp->max_processes; i++)
> +             if (!pp->children[i].in_use)
> +                     break;
> +     if (i == pp->max_processes)
> +             die("BUG: bookkeeping is hard");

Mental note: the caller is responsible for not calling this when all
slots are taken.

> +     if (!pp->get_next_task(pp->data,
> +                            &pp->children[i].process,
> +                            &pp->children[i].err)) {
> +             pp->all_tasks_started = 1;
> +             return;
> +     }

Mental note: but it is OK to call this if get_next_task() previously
said "no more task".

The above two shows a slight discrepancy (nothing earth-breaking).

I have this suspicion that the all-tasks-started bit may turn out to
be a big mistake that we may later regret.  Don't we want to allow
pp->more_task() to say "no more task to run at this moment" implying
"but please do ask me later, because I may have found more to do by
the time you ask me again"?

That is one of the reasons why I do not think the "very top level is
a bulleted list" organization is a good idea in general.  A good
scheduling decision can seldom be made in isolation without taking
global picture into account.

> +static void run_processes_parallel_collect_finished(struct 
> parallel_processes *pp)
> +{
> +     int i = 0;
> +     pid_t pid;
> +     int wait_status, code;
> +     int n = pp->max_processes;
> +
> +     while (pp->nr_processes > 0) {
> +             pid = waitpid(-1, &wait_status, WNOHANG);
> +             if (pid == 0)
> +                     return;
> +
> +             if (pid < 0)
> +                     die_errno("wait");
> +
> +             for (i = 0; i < pp->max_processes; i++)
> +                     if (pp->children[i].in_use &&
> +                         pid == pp->children[i].process.pid)
> +                             break;
> +             if (i == pp->max_processes)
> +                     /*
> +                      * waitpid returned another process id
> +                      * which we are not waiting for.
> +                      */
> +                     return;

If we culled a child process that this machinery is not in charge
of, waitpid() in other places that wants to see that child will not
see it.  Perhaps such a situation might even warrant an error() or
BUG()?  Do we want a "NEEDSWORK: Is this a bug?" comment here at
least?

> +             if (strbuf_read_once(&pp->children[i].err,
> +                                  pp->children[i].process.err, 0) < 0 &&
> +                 errno != EAGAIN)
> +                     die_errno("strbuf_read_once");

Don't we want to read thru to the end here?  The reason read_once()
did not read thru to the end may not have anything to do with
NONBLOCK (e.g. xread_nonblock() caps len, and it does not loop).

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCHv3 06/13] run-command: add an asynchronous parallel child processor

Reply via email to