Re: [PATCHv2] run-command: detect finished children by closed pipe rather than waitpid

2015-11-23 Thread Stefan Beller
On Fri, Nov 20, 2015 at 10:58 PM, Torsten Bögershausen  wrote:
> On 2015-11-20 22.08, Stefan Beller wrote:
> The patch looks good at first glance, one minor remark below:
>>
>> diff --git a/run-command.c b/run-command.c
>
>> @@ -1071,70 +1089,31 @@ static void pp_output(struct parallel_processes *pp)
>>
>>  static int pp_collect_finished(struct parallel_processes *pp)
>>  {
>> - int i = 0;
>> - pid_t pid;
>> - int wait_status, code;
>> + int i, code;
>
> code is probably "return code"?
> woud "ret_value", "res" or "rc" make that more clear ?
>

The return value of invoked process, yes.
"ret_value", "res" sound too much like our own future return value
(by convention of naming its own return value ret_value or similar).

Instead of return code, I may settle with `exit_status`?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] run-command: detect finished children by closed pipe rather than waitpid

2015-11-23 Thread Stefan Beller
On Mon, Nov 23, 2015 at 10:44 AM, Stefan Beller  wrote:
> On Fri, Nov 20, 2015 at 10:58 PM, Torsten Bögershausen  wrote:
>> On 2015-11-20 22.08, Stefan Beller wrote:
>> The patch looks good at first glance, one minor remark below:
>>>
>>> diff --git a/run-command.c b/run-command.c
>>
>>> @@ -1071,70 +1089,31 @@ static void pp_output(struct parallel_processes *pp)
>>>
>>>  static int pp_collect_finished(struct parallel_processes *pp)
>>>  {
>>> - int i = 0;
>>> - pid_t pid;
>>> - int wait_status, code;
>>> + int i, code;
>>
>> code is probably "return code"?
>> woud "ret_value", "res" or "rc" make that more clear ?
>>
>

Although looking through the code, we have lots of functions
having a local `code` variable, so we may want to preserve consistency
across the different functions to have a `code`which contains the return
value of the process or function invoked.

We had the `code` already in pp_collect_finished, so I'd like to not
rename a variable (which was used for the same purpose) in this patch.

In pp_start_one we introduce a new variable `code` which contains
the return from the user callback function, so I would understand if
we were arguing there.

That said, I plan to resend with a reworded commit message later today.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2] run-command: detect finished children by closed pipe rather than waitpid

2015-11-20 Thread Stefan Beller
Detect if a child stopped working by checking if their stderr pipe
was closed instead of checking their state with waitpid.
As waitpid is not fully working in Windows, this is an approach which
allows for better cross platform operation. (It's less code, too)

Previously we did not close the read pipe of finished children, which we
do now.

The old way missed some messages on an early abort. We just killed the
children and did not bother to look what was left over. With this approach
we'd send a signal to the children and wait for them to close the pipe to
have all the messages (including possible "killed by signal 15" messages).

To have the test suite passing as before, we allow for real graceful
abortion now. In case the user wishes to abort parallel execution
the user needs to provide either the signal used to kill all children
or the children are let run until they finish normally.

Signed-off-by: Stefan Beller 
---

This applies on top of peff/sb/submodule-parallel-fetch.
It is a resend without modification from Nov. 11th.

 run-command.c  | 141 +++--
 run-command.h  |  12 +++--
 submodule.c|   3 --
 test-run-command.c |   3 --
 4 files changed, 69 insertions(+), 90 deletions(-)

diff --git a/run-command.c b/run-command.c
index 07424e9..db4d916 100644
--- a/run-command.c
+++ b/run-command.c
@@ -858,6 +858,12 @@ int capture_command(struct child_process *cmd, struct 
strbuf *buf, size_t hint)
return finish_command(cmd);
 }
 
+enum child_state {
+   GIT_CP_FREE,
+   GIT_CP_WORKING,
+   GIT_CP_WAIT_CLEANUP,
+};
+
 static struct parallel_processes {
void *data;
 
@@ -869,7 +875,7 @@ static struct parallel_processes {
task_finished_fn task_finished;
 
struct {
-   unsigned in_use : 1;
+   enum child_state state;
struct child_process process;
struct strbuf err;
void *data;
@@ -923,7 +929,7 @@ static void kill_children(struct parallel_processes *pp, 
int signo)
int i, n = pp->max_processes;
 
for (i = 0; i < n; i++)
-   if (pp->children[i].in_use)
+   if (pp->children[i].state == GIT_CP_WORKING)
kill(pp->children[i].process.pid, signo);
 }
 
@@ -967,7 +973,7 @@ static struct parallel_processes *pp_init(int n,
for (i = 0; i < n; i++) {
strbuf_init(>children[i].err, 0);
child_process_init(>children[i].process);
-   pp->pfd[i].events = POLLIN;
+   pp->pfd[i].events = POLLIN | POLLHUP;
pp->pfd[i].fd = -1;
}
sigchain_push_common(handle_children_on_signal);
@@ -1000,39 +1006,46 @@ static void pp_cleanup(struct parallel_processes *pp)
  *  0 if a new task was started.
  *  1 if no new jobs was started (get_next_task ran out of work, non critical
  *problem with starting a new command)
- * -1 no new job was started, user wishes to shutdown early.
+ * <0 no new job was started, user wishes to shutdown early. Use negative code
+ *to signal the children.
  */
 static int pp_start_one(struct parallel_processes *pp)
 {
-   int i;
+   int i, code;
 
for (i = 0; i < pp->max_processes; i++)
-   if (!pp->children[i].in_use)
+   if (pp->children[i].state == GIT_CP_FREE)
break;
if (i == pp->max_processes)
die("BUG: bookkeeping is hard");
 
-   if (!pp->get_next_task(>children[i].data,
-  >children[i].process,
-  >children[i].err,
-  pp->data)) {
+   code = pp->get_next_task(>children[i].data,
+>children[i].process,
+>children[i].err,
+pp->data);
+   if (!code) {
strbuf_addbuf(>buffered_output, >children[i].err);
strbuf_reset(>children[i].err);
return 1;
}
+   pp->children[i].process.err = -1;
+   pp->children[i].process.stdout_to_stderr = 1;
+   pp->children[i].process.no_stdin = 1;
 
if (start_command(>children[i].process)) {
-   int code = pp->start_failure(>children[i].process,
->children[i].err,
-pp->data,
->children[i].data);
+   code = pp->start_failure(>children[i].process,
+>children[i].err,
+pp->data,
+>children[i].data);
strbuf_addbuf(>buffered_output, >children[i].err);
strbuf_reset(>children[i].err);
-   return code ? -1 : 1;
+   if (code)
+   pp->shutdown = 

Re: [PATCHv2] run-command: detect finished children by closed pipe rather than waitpid

2015-11-20 Thread Jonathan Nieder
Stefan Beller wrote:

> Detect if a child stopped working by checking if their stderr pipe
> was closed instead of checking their state with waitpid.
> As waitpid is not fully working in Windows, this is an approach which
> allows for better cross platform operation. (It's less code, too)

Can you say more about what is broken about waitpid on Windows?

I ask because it's possible for a child to close stderr without
intending to be finished.  That might be okay here (though the commit
subject doesn't explain so, it is only affecting the workqueue
interface that runs commands in parallel and not the normal
run-command interface) but would need some documentation and could be
counterintuitive.

So if there's a simple way to get waitpid to work, that seems
preferrable.

Thanks,
Jonathan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] run-command: detect finished children by closed pipe rather than waitpid

2015-11-20 Thread Johannes Sixt

Am 20.11.2015 um 23:05 schrieb Jonathan Nieder:

Stefan Beller wrote:


Detect if a child stopped working by checking if their stderr pipe
was closed instead of checking their state with waitpid.
As waitpid is not fully working in Windows, this is an approach which
allows for better cross platform operation. (It's less code, too)


Can you say more about what is broken about waitpid on Windows?


waitpid(-1, ...) is not implemented on Windows.

Is it necessary to mention waitpid here at all? The most compelling 
reason to write the infrastructure in this new way is that it is much 
more in line with the usual "start_command, read-until-EOF, 
finish_command" sequence.



I ask because it's possible for a child to close stderr without
intending to be finished.  That might be okay here (though the commit
subject doesn't explain so, it is only affecting the workqueue
interface that runs commands in parallel and not the normal
run-command interface) but would need some documentation and could be
counterintuitive.


It could be spelled out more clearly. The children have both their stdin 
and stdout redirected to the same writable end. On the parent side, we 
have to deal only with "stderr" simply because our child_process 
facility has everything to redirect like ">&2" (the stdout_to_stderr 
flag), but nothing for "2>&1".


Yeah, it's still possible that the child closes both stdout and stderr 
long before it dies, but that would really be far, far outside the norm.



So if there's a simple way to get waitpid to work, that seems
preferrable.


It would be possible, but not simple, to make waitpid on Windows 
suitable for the original code, but that does not make the original code 
preferrable.


-- Hannes

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] run-command: detect finished children by closed pipe rather than waitpid

2015-11-20 Thread Torsten Bögershausen
On 2015-11-20 22.08, Stefan Beller wrote:
The patch looks good at first glance, one minor remark below:
> 
> diff --git a/run-command.c b/run-command.c

> @@ -1071,70 +1089,31 @@ static void pp_output(struct parallel_processes *pp)
>  
>  static int pp_collect_finished(struct parallel_processes *pp)
>  {
> - int i = 0;
> - pid_t pid;
> - int wait_status, code;
> + int i, code;

code is probably "return code"?
woud "ret_value", "res" or "rc" make that more clear ?


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html