Werner Koch writes ("False assumptions about nPth (was: Bug#841143: Suspected race in gpg1 to gpg2 conversion or agent startup [and 1 more messages])"):
> Please point out a single threading bug in gpg-agent or any other part
> of GnuPG. But before you point me to your patches please learn about
> nPth (and its predecessor GNU Pth) and understand why we are not using
> Posix threads directly.

You are right that I was confused about pth. It would have been very
helpful if you had mentioned at some earlier point in this
conversation that npth is a non-preemptive threading library and that
that is why you thought there aren't threading bugs. I thought it was
a simple wrapper around pthreads with some signal handling support.

Use of a non-concurrent threading library is part of the kind of
"systematic and effective way to avoid threading bugs" which I was
hoping to find. Sorry for missing that.

I think that at least my patch
  [PATCH 4/4] gpg agent lockup fix: Interrupt main loop when active_connections_value==0
is very likely a fix to an actual race.

During debugging I several times had a gdb attached to a stuck
gpg-agent process. I found the process stuck in select, selecting
only on the inotify fd, with `shutdown_pending' having the value 1 and
`active_connections' having the value 0.

Because of difficulties collecting logging, and the fact that adding
logging (once I figured out how to do so) seemed to dramatically
reduce the failure probability, I can't be 100% sure of the history of
those stuck gpg-agents.

At the very least empirically that patch reduces the failure
probability of a run of the complete dgit test suite on my laptop from
about 100% (I guess that represents a failure probability of 0.1% per
gnupg run) to about 5-10%.

Thanks for your logging tips. Unfortunately, however, they came
rather late. Yesterday this problem got me completely blocked on dgit
development so I had to fight the bug alone. It took me many hours
which could probably have been significantly shortened with your help.

Next time someone reports a bug like this, it would be better if you
mentioned the reasons why you think it's not a bug (npth's special
properties, in this case). You could have linked to npth's
documentation. Earlier instructions for collecting debug logs would
have been helpful.
Speculation as to where the bug might or might not be, rather than
blanket denials, would have been welcome.

I'm afraid this has made me somewhat tetchy as you can probably tell.

Do you intend to rework my patch(es) and apply the ones that make
sense ? Do you intend to fix the remaining bug ?

Ian.

PS: npth is also not bug-free. For example, see #850686, just
reported.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.