Re: Client exit whilst running pre-receive hook : commit accepted but no post-receive hook ran
On 2016-08-03 3:54 PM, Junio C Hamano wrote: Jeff King writes: I agree it would be a good property to have. I think it's hard to do atomically, though. Certainly we can wait to tell the other side "your push has been recorded" until after the hook is run. But we would already have updated the refs locally at that point (and we must -- that is part of the interface to the post-receive hooks, that the refs are already in place). So would we roll-back the ref update then? Even that suffers from power failures, etc. So I'm not sure if making it truly atomic is all the feasible. As long as the requirement is that post- hook must see the updated ref in place, I do not think it is feasible to give "the hook always runs once" guarantee, without cooperation by other parts of the flow (think: pulling the power at an arbitrary point in the process). A receiving repository can implement it all in the userland, I would think, though: * A pre-receive hook records the intention to update a ref (from what old commit to what new commit), and does not return until that record is securely in a database; * A post-receive hook checks the entry in the database above (it _must_ find one), and atomically does its thing and marks the entry "done"; * A separate sweeper scans the database for entries not yet marked as "done", sees if the ref has been already updated, and atomically does its thing and marks the entry "done" (the same can be done as part of a post-receive for previously pushed commit that pre-receive recorded but did not manage to run post-receive before the power was pulled or the user did \C-c). As you originally described, the non-atomicity is not new; as long as we have necessary hooks in place on the git-core side for repositories that want a stronger guarantee, I do not think there is any more thing we need to do on this topic. If we can narrow the window in a non-intrusive way, that would be a good thing to do, though. I certainly understand not being able to make it atomic when faced with say "pulling the power at an arbitrary point in the process". That seems to me almost along the lines of disaster recovery contingency plans. But could we not guarantee that if there is no problem on the receiving end, that "IF a commit is received and the ref updated, THEN the post-receive hook is guaranteed to run". The not-so-uncommon situation where I saw this was where a user had second-thoughts and hit Ctrl-C in the middle of a push. The push went through --the ref was updated-- but the post-receive hooks were not run. Steve -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Client exit whilst running pre-receive hook : commit accepted but no post-receive hook ran
Jeff King writes: > I agree it would be a good property to have. I think it's hard to do > atomically, though. Certainly we can wait to tell the other side "your > push has been recorded" until after the hook is run. But we would > already have updated the refs locally at that point (and we must -- that > is part of the interface to the post-receive hooks, that the refs are > already in place). So would we roll-back the ref update then? Even that > suffers from power failures, etc. > > So I'm not sure if making it truly atomic is all the feasible. As long as the requirement is that post- hook must see the updated ref in place, I do not think it is feasible to give "the hook always runs once" guarantee, without cooperation by other parts of the flow (think: pulling the power at an arbitrary point in the process). A receiving repository can implement it all in the userland, I would think, though: * A pre-receive hook records the intention to update a ref (from what old commit to what new commit), and does not return until that record is securely in a database; * A post-receive hook checks the entry in the database above (it _must_ find one), and atomically does its thing and marks the entry "done"; * A separate sweeper scans the database for entries not yet marked as "done", sees if the ref has been already updated, and atomically does its thing and marks the entry "done" (the same can be done as part of a post-receive for previously pushed commit that pre-receive recorded but did not manage to run post-receive before the power was pulled or the user did \C-c). As you originally described, the non-atomicity is not new; as long as we have necessary hooks in place on the git-core side for repositories that want a stronger guarantee, I do not think there is any more thing we need to do on this topic. If we can narrow the window in a non-intrusive way, that would be a good thing to do, though. > However, > we could certainly make things more robust than they are now. And this change may to the "narrowing the window in a non-intrusive way" (I wonder if we also need to lift the post-update hook the same way, though). But that would still not guarantee "the hook always runs once". What we have is "the hook runs at most once". Thanks. > The simplest thing may be to just bump the post-receive hook before the > status report. That opens up the question of whether clients are > actually waiting already for the post-receive to finish. Looking at the > code in send-pack, it looks like the network is hooked up to the > sideband demuxer thread, which will read until EOF on the network. So we > are waiting either way for the post-receive to run. It doesn't really > matter if it happens before or after the report to the client. > > So I _think_ something like this would work: > > diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c > index 15c323a..91d01f0 100644 > --- a/builtin/receive-pack.c > +++ b/builtin/receive-pack.c > @@ -1767,9 +1767,9 @@ int cmd_receive_pack(int argc, const char **argv, const > char *prefix) > execute_commands(commands, unpack_status, &si); > if (pack_lockfile) > unlink_or_warn(pack_lockfile); > + run_receive_hook(commands, "post-receive", 1); > if (report_status) > report(commands, unpack_status); > - run_receive_hook(commands, "post-receive", 1); > run_update_post_hook(commands); > if (auto_gc) { > const char *argv_gc_auto[] = { > > but maybe there are other gotchas. > > -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Client exit whilst running pre-receive hook : commit accepted but no post-receive hook ran
On Tue, Aug 02, 2016 at 12:01:57PM -0400, Stephen Morton wrote: > Sounds tricky. I do think it's important, almost a 'data integrity' issue, > that IF a commit is received, THEN the post-receive hook must be run. Too > much mission-critical stuff is based on post-receive hooks. I agree it would be a good property to have. I think it's hard to do atomically, though. Certainly we can wait to tell the other side "your push has been recorded" until after the hook is run. But we would already have updated the refs locally at that point (and we must -- that is part of the interface to the post-receive hooks, that the refs are already in place). So would we roll-back the ref update then? Even that suffers from power failures, etc. So I'm not sure if making it truly atomic is all the feasible. However, we could certainly make things more robust than they are now. The simplest thing may be to just bump the post-receive hook before the status report. That opens up the question of whether clients are actually waiting already for the post-receive to finish. Looking at the code in send-pack, it looks like the network is hooked up to the sideband demuxer thread, which will read until EOF on the network. So we are waiting either way for the post-receive to run. It doesn't really matter if it happens before or after the report to the client. So I _think_ something like this would work: diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index 15c323a..91d01f0 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -1767,9 +1767,9 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix) execute_commands(commands, unpack_status, &si); if (pack_lockfile) unlink_or_warn(pack_lockfile); + run_receive_hook(commands, "post-receive", 1); if (report_status) report(commands, unpack_status); - run_receive_hook(commands, "post-receive", 1); run_update_post_hook(commands); if (auto_gc) { const char *argv_gc_auto[] = { but maybe there are other gotchas. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Client exit whilst running pre-receive hook : commit accepted but no post-receive hook ran
On 2016-07-25 6:22 PM, Jeff King wrote: On Mon, Jul 25, 2016 at 12:34:04PM +0200, Jan Smets wrote: I have always assumed the post-receive hook to be executed whenever a commit is "accepted" by the (gitolite) server. That does not seem to be true any more. Generally, yeah, I would expect that to be the case, too. Since 9658846 is appears that, when a client bails out, the pre-receive hook continues to run and the commit is written to the repository, but no post-receive hook is executed. No signal of any kind is received in the hook, not even a sig pipe when the post- hook is writing to stdout whilst the client has disconnected. I see. The problem is that cmd_receive_pack() does this: execute_commands(commands, unpack_status, &si); if (pack_lockfile) unlink_or_warn(pack_lockfile); if (report_status) report(commands, unpack_status); run_receive_hook(commands, "post-receive", 1); run_update_post_hook(commands); It reports the status to the client, and _then_ runs the post-receive hook. But if that reporting fails (either because of an error, or if we just get SIGPIPE because the client hung up), then we don't actually run the hooks. Leaving 9658846 out of it entirely, it is always going to be racy whether we notice that the client hung up during the pre-receive step. E.g.: - your pre-receive might not write any output, so the muxer has nothing to write to the other side, and we never notice that the connection closed until we write the status out in report() - if NO_PTHREADS is set, the sideband muxer runs in a sub-process, not a sub-thread. And thus we don't know of a hangup except by checking the result of finish_async(), which we never do. - the client could hang up just _after_ we've written the pre-receive output, but before report() is called, so there's nothing to notice until we're in report() So I think 9658846 just made that race a bit longer, because it means that a write error in the sideband muxer during the pre-receive hook means we return an error via finish_async() rather than unceremoniously calling exit() from a sub-thread. So we have a longer period during which we might actually finish off execute_commands() but not make it out of report(). And the real solution is to make cmd_receive_pack() more robust, and try harder to run the hooks even when the client hangs up or we have some other reporting error (because getting data back to the user is only one use of post-receive hooks; they are also used to queue jobs or do maintenance). But that's a bit tricky, as it requires report() to ignore SIGPIPE, and to stop using write_or_die() or any other functions that can exit (some of which happen at a lower level). Plus if a client does hangup, we don't want our hook to die with SIGPIPE either, so we'd want to proxy the data into /dev/null. -Peff Sounds tricky. I do think it's important, almost a 'data integrity' issue, that IF a commit is received, THEN the post-receive hook must be run. Too much mission-critical stuff is based on post-receive hooks. The alternatives, as I see them --either document that the post-receive hook cannot be fully trusted and that all such uses must change to asynchronous polling, or otherwise just say that nobody should hit Ctrl-C during a push (not even reflexively when their lizard-brain says "Woops, no!") and hope that network issues don't cause the same thing-- are simply not realistic. Stephen -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Client exit whilst running pre-receive hook : commit accepted but no post-receive hook ran
On Mon, Jul 25, 2016 at 12:34:04PM +0200, Jan Smets wrote: > I have always assumed the post-receive hook to be executed whenever a commit > is "accepted" by the (gitolite) server. That does not seem to be true any > more. Generally, yeah, I would expect that to be the case, too. > Since 9658846 is appears that, when a client bails out, the pre-receive hook > continues to run and the commit is written to the repository, but no > post-receive hook is executed. No signal of any kind is received in the > hook, not even a sig pipe when the post- hook is writing to stdout whilst > the client has disconnected. I see. The problem is that cmd_receive_pack() does this: execute_commands(commands, unpack_status, &si); if (pack_lockfile) unlink_or_warn(pack_lockfile); if (report_status) report(commands, unpack_status); run_receive_hook(commands, "post-receive", 1); run_update_post_hook(commands); It reports the status to the client, and _then_ runs the post-receive hook. But if that reporting fails (either because of an error, or if we just get SIGPIPE because the client hung up), then we don't actually run the hooks. Leaving 9658846 out of it entirely, it is always going to be racy whether we notice that the client hung up during the pre-receive step. E.g.: - your pre-receive might not write any output, so the muxer has nothing to write to the other side, and we never notice that the connection closed until we write the status out in report() - if NO_PTHREADS is set, the sideband muxer runs in a sub-process, not a sub-thread. And thus we don't know of a hangup except by checking the result of finish_async(), which we never do. - the client could hang up just _after_ we've written the pre-receive output, but before report() is called, so there's nothing to notice until we're in report() So I think 9658846 just made that race a bit longer, because it means that a write error in the sideband muxer during the pre-receive hook means we return an error via finish_async() rather than unceremoniously calling exit() from a sub-thread. So we have a longer period during which we might actually finish off execute_commands() but not make it out of report(). And the real solution is to make cmd_receive_pack() more robust, and try harder to run the hooks even when the client hangs up or we have some other reporting error (because getting data back to the user is only one use of post-receive hooks; they are also used to queue jobs or do maintenance). But that's a bit tricky, as it requires report() to ignore SIGPIPE, and to stop using write_or_die() or any other functions that can exit (some of which happen at a lower level). Plus if a client does hangup, we don't want our hook to die with SIGPIPE either, so we'd want to proxy the data into /dev/null. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Client exit whilst running pre-receive hook : commit accepted but no post-receive hook ran
Hi I have always assumed the post-receive hook to be executed whenever a commit is "accepted" by the (gitolite) server. That does not seem to be true any more. Since 9658846 is appears that, when a client bails out, the pre-receive hook continues to run and the commit is written to the repository, but no post-receive hook is executed. No signal of any kind is received in the hook, not even a sig pipe when the post- hook is writing to stdout whilst the client has disconnected. commit 9658846ce3d379b9ff8010a2ed326fcafc10eb82 Author: Jeff King Date: Wed Feb 24 02:40:16 2016 -0500 write_or_die: handle EPIPE in async threads diff --git a/write_or_die.c b/write_or_die.c ... static void check_pipe(int err) { if (err == EPIPE) { + if (in_async()) + async_exit(141); Please keep me in CC as I am not subscribed to the list. Thanks Jan The pre-receive hook from my quick testing => press Ctrl-C on the client when it is busy processing the 'sleep 5' In my testing I was committing/pushing 32MB+ binary files that take some time to process. #!/bin/bash trap 'echo TRAP >> /tmp/gittest/log' 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 IN=$(cat /dev/stdin) echo -n $(date) >> /tmp/gittest/log echo " : PRE START" >> /tmp/gittest/log for i in $(seq 1 10); do echo This is the pre-receive hook $i; sleep 0.1 done # give time for client to ctrl-c out sleep 5 echo -n $(date) >> /tmp/gittest/log echo " : PRE END" >> /tmp/gittest/log # This should result in a sigpipe? but it isn't. echo "Done !" echo "Done !" # no exit code -> accept commit -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html