Re: [PATCH] run-command: detect finished children by closed pipe rather than waitpid

2015-11-11 Thread Stefan Beller
including the list and all others this time.

>> if (code < 0) {
>> pp->shutdown = 1;
>> -   kill_children(pp, SIGTERM);
>> +   kill_children(pp, -code);
>
>
> I'll see what this means for our kill emulation on Windows. Currently, we
> handle only SIGTERM.

So currently we only pass in SIGTERM from the callers, and I certainly
only intend
to use that signal. I just thought special casing the SIGTERM signal
would do no good
in terms of design here.

So maybe that was not the right thought and we do have to special case
SIGTERM here?

I worked with another large proprietary parallel program (spawning
child processes) before.
If you pressed CTRL-C, it would omit a message like:

Sending no further input to the children

If you press Ctrl-C again, it would print:

Sending SIGTERM to children, to shut them down gracefully

If you pressed once again, you'll get a

User seems to be angry, shutting down hard, sending SIGKILL
$ # that was fast now.

I guess we don't want to play around like that, but if the parent process gets
a SIGTERM, we relay that and other signals we ignore for now?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] run-command: detect finished children by closed pipe rather than waitpid

2015-11-11 Thread Stefan Beller
On Wed, Nov 11, 2015 at 12:48 PM, Johannes Sixt  wrote:
>>
>> So maybe that was not the right thought and we do have to special case
>> SIGTERM here?
>
>
> I wonder why task_finish() callback gets to choose a signal. The point here
> is, IIUC, when one child dies, the others must be halted, too. SIGTERM seems
> to be the only sensible choice.

SIGKILL would also do?

In case you know your children, you can also send a SIGUSR1 or SIGUSR2.

Or if you want to quit the top level program, but want to keep going with some
of the children (repacking, garbage collection, networking stuff), you
way want to
decouple them using SIGHUP ?

So I am not convinced SIGTERM is the only true choice here. And because I
have no idea which of the signals may be useful in the future, I decided to
go with all of them.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] run-command: detect finished children by closed pipe rather than waitpid

2015-11-11 Thread Johannes Sixt

Am 11.11.2015 um 21:37 schrieb Stefan Beller:

including the list and all others this time.


 if (code < 0) {
 pp->shutdown = 1;
-   kill_children(pp, SIGTERM);
+   kill_children(pp, -code);



I'll see what this means for our kill emulation on Windows. Currently, we
handle only SIGTERM.


So currently we only pass in SIGTERM from the callers, and I certainly
only intend
to use that signal. I just thought special casing the SIGTERM signal
would do no good
in terms of design here.

So maybe that was not the right thought and we do have to special case
SIGTERM here?


I wonder why task_finish() callback gets to choose a signal. The point 
here is, IIUC, when one child dies, the others must be halted, too. 
SIGTERM seems to be the only sensible choice.


-- Hannes

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] run-command: detect finished children by closed pipe rather than waitpid

2015-11-11 Thread Johannes Sixt

Am 11.11.2015 um 21:53 schrieb Stefan Beller:

On Wed, Nov 11, 2015 at 12:48 PM, Johannes Sixt  wrote:

I wonder why task_finish() callback gets to choose a signal. The point here
is, IIUC, when one child dies, the others must be halted, too. SIGTERM seems
to be the only sensible choice.


SIGKILL would also do?

In case you know your children, you can also send a SIGUSR1 or SIGUSR2.
...
So I am not convinced SIGTERM is the only true choice here. And because I
have no idea which of the signals may be useful in the future, I decided to
go with all of them.


Fair enough.

-- Hannes

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] run-command: detect finished children by closed pipe rather than waitpid

2015-11-07 Thread Johannes Sixt

Am 07.11.2015 um 00:48 schrieb Stefan Beller:

Detect if a child stopped working by checking if their stderr pipe
was closed instead of checking their state with waitpid.
As waitpid is not fully working in Windows, this is an approach which
allows for better cross platform operation. (It's less code, too)

Previously we did not close the read pipe of finished children, which we
do now.

The old way missed some messages on an early abort. We just killed the
children and did not bother to look what was left over. With this approach
we'd send a signal to the children and wait for them to close the pipe to
have all the messages (including possible "killed by signal 15" messages).

To have the test suite passing as before, we allow for real graceful
abortion now. In case the user wishes to abort parallel execution
the user needs to provide either the signal used to kill all children
or the children are let run until they finish normally.

Signed-off-by: Stefan Beller 
---
  Hi,

  this applis on top of origin/sb/submodule-parallel-fetch,
  making Windows folks possibly even more happy. It makes the code easier
  to read and has less races on cleaning up a terminated child.

  It follows the idea of Johannes patch, instead of encoding information in .err
  I removed the in_use flag and added a state, currently having 3 states.

  Thanks,
  Stefan

  Johannes schrieb:
  > First let me say that I find it very questionable that the callbacks
  > receive a struct child_process.

  I tried to get rid of the child_process struct in the callbacks, but that's
  not as easy as one may think.


Fair enough. I see you removed .err, .no_stdin and .stdout_to_stderr 
from the callback. Good.



pp->nr_processes--;
-   pp->children[i].in_use = 0;
+   pp->children[i].state = FREE;
pp->pfd[i].fd = -1;
child_process_deinit(>children[i].process);


This cleanup is implied by finish_command and can be removed.


child_process_init(>children[i].process);



@@ -1195,12 +1175,12 @@ int run_processes_parallel(int n,
i < spawn_cap && !pp->shutdown &&
pp->nr_processes < pp->max_processes;
i++) {
-   int code = pp_start_one(pp);
+   code = pp_start_one(pp);
if (!code)
continue;
if (code < 0) {
pp->shutdown = 1;
-   kill_children(pp, SIGTERM);
+   kill_children(pp, -code);


I'll see what this means for our kill emulation on Windows. Currently, 
we handle only SIGTERM.



}
break;
}


Thanks you very much!

-- Hannes

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] run-command: detect finished children by closed pipe rather than waitpid

2015-11-06 Thread Stefan Beller
Detect if a child stopped working by checking if their stderr pipe
was closed instead of checking their state with waitpid.
As waitpid is not fully working in Windows, this is an approach which
allows for better cross platform operation. (It's less code, too)

Previously we did not close the read pipe of finished children, which we
do now.

The old way missed some messages on an early abort. We just killed the
children and did not bother to look what was left over. With this approach
we'd send a signal to the children and wait for them to close the pipe to
have all the messages (including possible "killed by signal 15" messages).

To have the test suite passing as before, we allow for real graceful
abortion now. In case the user wishes to abort parallel execution
the user needs to provide either the signal used to kill all children
or the children are let run until they finish normally.

Signed-off-by: Stefan Beller 
---
 Hi,
 
 this applis on top of origin/sb/submodule-parallel-fetch,
 making Windows folks possibly even more happy. It makes the code easier
 to read and has less races on cleaning up a terminated child.
 
 It follows the idea of Johannes patch, instead of encoding information in .err
 I removed the in_use flag and added a state, currently having 3 states.
 
 Thanks,
 Stefan

 Johannes schrieb:
 > First let me say that I find it very questionable that the callbacks
 > receive a struct child_process.
 
 I tried to get rid of the child_process struct in the callbacks, but that's
 not as easy as one may think. The submodule fetch code looks like this:
 
 get_next_submodule(..) {
...
child_process_init(cp);
cp->dir = strbuf_detach(_path, NULL);
cp->env = local_repo_env;
cp->git_cmd = 1;
if (!spf->quiet)
strbuf_addf(err, "Fetching submodule %s%s\n",
spf->prefix, ce->name);
argv_array_init(>args);
argv_array_pushv(>args, spf->args.argv);
argv_array_push(>args, default_argv);
argv_array_push(>args, "--submodule-prefix");
argv_array_push(>args, submodule_prefix.buf);
 }
 
 So we need to have access to the args, and git_cmd, environment variables
 and the directory where to run the command in. (So quite a lot of the
 internals already)
 

 run-command.c  | 140 +++--
 run-command.h  |  12 +++--
 submodule.c|   3 --
 test-run-command.c |   3 --
 4 files changed, 69 insertions(+), 89 deletions(-)

diff --git a/run-command.c b/run-command.c
index 1fbd286..cf17baf 100644
--- a/run-command.c
+++ b/run-command.c
@@ -858,6 +858,12 @@ int capture_command(struct child_process *cmd, struct 
strbuf *buf, size_t hint)
return finish_command(cmd);
 }
 
+enum child_state {
+   FREE,
+   WORKING,
+   WAIT_CLEANUP,
+};
+
 static struct parallel_processes {
void *data;
 
@@ -869,7 +875,7 @@ static struct parallel_processes {
task_finished_fn task_finished;
 
struct {
-   unsigned in_use : 1;
+   enum child_state state;
struct child_process process;
struct strbuf err;
void *data;
@@ -923,7 +929,7 @@ static void kill_children(struct parallel_processes *pp, 
int signo)
int i, n = pp->max_processes;
 
for (i = 0; i < n; i++)
-   if (pp->children[i].in_use)
+   if (pp->children[i].state == WORKING)
kill(pp->children[i].process.pid, signo);
 }
 
@@ -967,7 +973,7 @@ static struct parallel_processes *pp_init(int n,
for (i = 0; i < n; i++) {
strbuf_init(>children[i].err, 0);
child_process_init(>children[i].process);
-   pp->pfd[i].events = POLLIN;
+   pp->pfd[i].events = POLLIN | POLLHUP;
pp->pfd[i].fd = -1;
}
sigchain_push_common(handle_children_on_signal);
@@ -1011,41 +1017,48 @@ static void set_nonblocking(int fd)
  *  0 if a new task was started.
  *  1 if no new jobs was started (get_next_task ran out of work, non critical
  *problem with starting a new command)
- * -1 no new job was started, user wishes to shutdown early.
+ * <0 no new job was started, user wishes to shutdown early. Use negative code
+ *to signal the children.
  */
 static int pp_start_one(struct parallel_processes *pp)
 {
-   int i;
+   int i, code;
 
for (i = 0; i < pp->max_processes; i++)
-   if (!pp->children[i].in_use)
+   if (pp->children[i].state == FREE)
break;
if (i == pp->max_processes)
die("BUG: bookkeeping is hard");
 
-   if (!pp->get_next_task(>children[i].data,
-  >children[i].process,
-  >children[i].err,
-  pp->data)) {
+   code = pp->get_next_task(>children[i].data,
+   

Re: [PATCH] run-command: detect finished children by closed pipe rather than waitpid

2015-11-06 Thread Torsten Bögershausen
On 07.11.15 00:48, Stefan Beller wrote:
>  
> +enum child_state {
> + FREE,
> + WORKING,
> + WAIT_CLEANUP,
> +};

This kind of  "generic names" feels are begging for 
name clash, some day, may be in the future.
How about something like this:
> + GIT_CP_FREE,
> + GIT_CP_WORKING,
> + GIT_CP_WAIT_CLEANUP,

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html