Hi git Experts,
I believe we've encountered a bug in git. I built the latest, git 2.16.3, and
it still appears to be a problem. I'm not a git expert and my C is
ridiculously rusty but I think a co-worker and I figured it out... apologies if
we are incorrect in any assumptions (as I do not wish to waste anyone's time).
The location of the problem is in credential.c, run_credential_helper()... this
code:
...
fp = xfdopen(helper.in, "w");
credential_write(c, fp);
fclose(fp);
..
Which I think needs to become something like this:
fp = xfdopen(helper.in, "w");
sigchain_push(SIGPIPE, SIG_IGN);
credential_write(c, fp);
fclose(fp);
sigchain_pop(SIGPIPE);
The basics are that we wrote a credential helper in Go and, for the store
action, it simply exits 0. It is fast. This is similar to the example here:
https://git-scm.com/book/en/v2/Git-Tools-Credential-Storage#_a_custom_credential_cache
Which is, of course, in Ruby, not Go (so, not so fast). It exits if it is not
a get cred helper action without reading stdin. Anyhow, for our credential
helper the store operation completes very quickly. What we've found is that
occasionally the git command will be killed off just after running the
credential store operation. We can see that our credential helper is being
fired up (it has a debug mode) and it quickly exits. We can see that git dies
on the fclose(fp) by putting trace_printf() calls before and after that fclose
in the git source code (ie: the trace message before the fclose() prints, the
trace message after the fclose does not).
Our belief is that the write is buffered and written at the time of fclose()...
and that the credential helper tool has already exited "sometimes" (as it is
very fast, but even so this doesn't fail very often). For those times when our
cred helper has exited "quickly enough", a SIGPIPE can be generated... and, as
SIGPIPE is not ignored, git goes "kaboom!" and dies.
To catch this scenario we wrote a simple hack Perl tool to run a bunch of
serial git ls-remote commands like so:
#!/usr/cisco/bin/perl -w
$ENV{'GIT_TRACE'} = 2;
$ENV{'GIT_TRACE_CURL'} = 2;
$ENV{'GIT_TRACE_PERFORMANCE'} = 1;
$ENV{'GIT_TRACE_PACKET'} = 1;
for ( my $i = 1; $i<=100000; $i++) {
print("Run: $i\n");
my $output = `/ws/brady-sjc/git/git-2.16.3/bin/git ls-remote
https://hostname.company.com/git/path/repo.git HEAD 2>&1`;
if ( $? != 0 ) {
print("FAIL: output:\n$output\n");
exit(1);
}
}
exit(0);
The problem seemed to come up most commonly on older linux VM's... in this case
running 2.6.18-416.el5 #1 SMP. The tool will iterate for a while and then,
boom, it blows up (usually within 1000 iterations, sometimes a couple/few
thousand but usually well before that).
Anyhow... we did a few things to test our theory that it is, indeed, SIGPIPE
causing git to exit:
1) My co-worker modified our credential manager to read in the data sent by git
before exiting... that solved the problem as we accept the data written so the
child is still there and no SIGPIPE is generated... this is our current
workaround (so we are OK, but would be good to fix this in the git code we
think)
2) I modified the above code to use a signal handler in credential.c (instead
of SIG_IGN) and put a trace_printf() and an exit(1) inside the signal
handler... similar to the problem we're seeing it'll run a bunch successfully
until, boom, timing is hit such that the child exits quickly enough and causes
the SIGPIPE to occur at which point git is killed.... so using the cheesy Perl
test script it ran through a couple hundred iterations fine and then a SIGPIPE
was seen and it died in the signal handler I wrote... so clearly SIGPIPE is
being generated but only occasionally (it is timing based, so occurs only now
and then... and we almost never see it on some of our boxes)
3) I modified the git code (using our old cred helper which exits quickly) per
the above recommended credential.c changes (you folks can likely do a better
fix)... and re-run the Perl test script... no longer fails now that we are
ignoring SIGPIPE (ran about 20,000+ iterations).
Note that the build-in credential manager was not failing... it reads the cred
helper store data so it would not have the problem.
Let me know if you need any additional information... and thanks for your time
and consideration.
Erik
[email protected]