Bug#493055: BRIStuff patches introduce deadlock
Hi, Faidon Liambotis wrote: Kevin Shanahan wrote: Well it looked promising, but no - the lockup still appears to be reproducable. The patch I used to test is attached. Let me know if I made any mistake in backporting it (it should apply cleanly as the last patch in the debian/patches/series file). Yes, you made a mistake in backporting it :) The right backported patch is this: http://svn.debian.org/viewsvn/pkg-voip/asterisk/trunk/debian/patches/zap-fix-deadlock?rev=6221view=auto The problem is where thw lock is getting held; I'm not sure if we should hold the lock there as well but it's definitely not what the original patch did. Did you have any luck with this? I'll make an upload tomorrow or the day after that, so I'd appreciate it you could answer me until then. We can always make an upload later, but it'd be nice to know if the bug will exist in lenny (hopefully not!) Thanks, Faidon -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#493055: BRIStuff patches introduce deadlock
On Sat, Sep 27, 2008 at 05:10:33AM +0300, Faidon Liambotis wrote: You seem to have a good overview of this particular bug. Do you think that the patch that Tzafrir mentioned[1] would fix it? If not, do you have a proposal for a fix? 1: http://repo.or.cz/w/asterisk-bristuff.git?a=commit;h=6e44531a8a112e36588b5dbced309b0521d6b64e Well it looked promising, but no - the lockup still appears to be reproducable. The patch I used to test is attached. Let me know if I made any mistake in backporting it (it should apply cleanly as the last patch in the debian/patches/series file). I'll see if I can get a useful backtrace when I have a bit more time where I can afford to have the phones offline. Regards, Kevin. -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#493055: BRIStuff patches introduce deadlock
On Sun, Sep 28, 2008 at 08:19:57PM +0930, Kevin Shanahan wrote: On Sat, Sep 27, 2008 at 05:10:33AM +0300, Faidon Liambotis wrote: You seem to have a good overview of this particular bug. Do you think that the patch that Tzafrir mentioned[1] would fix it? If not, do you have a proposal for a fix? 1: http://repo.or.cz/w/asterisk-bristuff.git?a=commit;h=6e44531a8a112e36588b5dbced309b0521d6b64e Well it looked promising, but no - the lockup still appears to be reproducable. The patch I used to test is attached. Let me know if I made any mistake in backporting it (it should apply cleanly as the last patch in the debian/patches/series file). I'll see if I can get a useful backtrace when I have a bit more time where I can afford to have the phones offline. I noticed that 1.4.21 - 1.4.22 cleared quite a few deadlocks (see the DEADLOCK_AVOIDANCE macro). Can you give a reproducable deadlock scenario? -- Tzafrir Cohen icq#16849755 jabber:[EMAIL PROTECTED] +972-50-7952406 mailto:[EMAIL PROTECTED] http://www.xorcom.com iax:[EMAIL PROTECTED]/tzafrir -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#493055: BRIStuff patches introduce deadlock
On Sun, Sep 28, 2008 at 02:00:17PM +0300, Tzafrir Cohen wrote: I noticed that 1.4.21 - 1.4.22 cleared quite a few deadlocks (see the DEADLOCK_AVOIDANCE macro). Remeber, my deadlock only occurs when the bristuff patches have been applied. Can you give a reproducable deadlock scenario? I don't know that I can give you an easy way to reproduce it yourself, but I can certainly reproduce this at will here. All I need to do is call into the PBX on a PRI number from my home phone. Regards, Kevin. -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#493055: BRIStuff patches introduce deadlock
On Sun, Sep 28, 2008 at 08:44:14PM +0930, Kevin Shanahan wrote: On Sun, Sep 28, 2008 at 02:00:17PM +0300, Tzafrir Cohen wrote: I noticed that 1.4.21 - 1.4.22 cleared quite a few deadlocks (see the DEADLOCK_AVOIDANCE macro). Remeber, my deadlock only occurs when the bristuff patches have been applied. Can you give a reproducable deadlock scenario? I don't know that I can give you an easy way to reproduce it yourself, but I can certainly reproduce this at will here. All I need to do is call into the PBX on a PRI number from my home phone. Can you please give a full trace (both 'pri debug span N' and 'core set verbose/debug' with verbose/debug logging enabled)? What version do you use exactly? -- Tzafrir Cohen icq#16849755 jabber:[EMAIL PROTECTED] +972-50-7952406 mailto:[EMAIL PROTECTED] http://www.xorcom.com iax:[EMAIL PROTECTED]/tzafrir -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#493055: BRIStuff patches introduce deadlock
On Sun, Sep 28, 2008 at 08:19:57PM +0930, Kevin Shanahan wrote: Well it looked promising, but no - the lockup still appears to be reproducable. The patch I used to test is attached. Let me know if I made any mistake in backporting it (it should apply cleanly as the last patch in the debian/patches/series file). Ugh, sorry - forgot to attach the patch. Locking fixes in chan_zap: lock chan before pvt Taken from http://repo.or.cz/w/asterisk-bristuff.git?a=commit;h=6e44531a8a112e36588b5dbced309b0521d6b64e diff -urNp a/channels/chan_zap.c b/channels/chan_zap.c --- a/channels/chan_zap.c 2008-09-28 19:33:09.0 +0930 +++ b/channels/chan_zap.c 2008-09-28 19:37:27.0 +0930 @@ -8851,7 +8851,7 @@ static void *pri_dchannel(void *vpri) int haveidles; int activeidles; int nextidle = -1; - struct ast_channel *c; + struct ast_channel *c = NULL; struct timeval tv, lowest, *next; struct timeval lastidle = { 0, 0 }; int doidling=0; @@ -9539,6 +9539,7 @@ static void *pri_dchannel(void *vpri) if (e-ring.redirectingreason = 0) pbx_builtin_setvar_helper(c, PRIREDIRECTREASON, redirectingreason2str(e-ring.redirectingreason)); + ast_muted_lock(c-lock); ast_mutex_lock(pri-pvts[chanpos]-lock); ast_mutex_lock(pri-lock); @@ -9637,6 +9638,8 @@ static void *pri_dchannel(void *vpri) if (crv) ast_mutex_unlock(crv-lock); ast_mutex_unlock(pri-pvts[chanpos]-lock); + if (c) + ast_mutex_unlock(c-lock); } else { if (e-ring.flexible) pri_hangup(pri-pri, e-ring.call, PRI_CAUSE_NORMAL_CIRCUIT_CONGESTION, -1);
Bug#493055: BRIStuff patches introduce deadlock
Kevin, hi, Kevin Shanahan wrote: Well it looked promising, but no - the lockup still appears to be reproducable. The patch I used to test is attached. Let me know if I made any mistake in backporting it (it should apply cleanly as the last patch in the debian/patches/series file). Yes, you made a mistake in backporting it :) The right backported patch is this: http://svn.debian.org/viewsvn/pkg-voip/asterisk/trunk/debian/patches/zap-fix-deadlock?rev=6221view=auto The problem is where thw lock is getting held; I'm not sure if we should hold the lock there as well but it's definitely not what the original patch did. Thanks a lot, Faidon -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#493055: BRIStuff patches introduce deadlock
Kevin, hi, Kevin Shanahan wrote: I had some problems with my Asterisk installation having the PRI channels lock up completely when certain types of calls were received. Eventually this was traced back to a deadlock caused by the bristuff patches being applied. Various information, logs and a stack trace of the lockup here: http://bugs.digium.com/view.php?id=13192 I'm choosing severity grave because this effectively will allow someone calling in with particlar caller ID options/flags to lock up your PRI span on demand (i.e. DoS). You seem to have a good overview of this particular bug. Do you think that the patch that Tzafrir mentioned[1] would fix it? If not, do you have a proposal for a fix? Thanks, Faidon 1: http://repo.or.cz/w/asterisk-bristuff.git?a=commit;h=6e44531a8a112e36588b5dbced309b0521d6b64e -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#493055: BRIStuff patches introduce deadlock
On Thu, Jul 31, 2008 at 01:14:46PM +0930, Kevin Shanahan wrote: Package: asterisk Version: 1.4.21.2~dfsg-1 Severity: grave I had some problems with my Asterisk installation having the PRI channels lock up completely when certain types of calls were received. Eventually this was traced back to a deadlock caused by the bristuff patches being applied. Various information, logs and a stack trace of the lockup here: http://bugs.digium.com/view.php?id=13192 I'm choosing severity grave because this effectively will allow someone calling in with particlar caller ID options/flags to lock up your PRI span on demand (i.e. DoS). The following patch has fixed at least one case of reproducable deadlocks for us. It still needs to be backported to chan_zap. It is currently at http://repo.or.cz/w/asterisk-bristuff.git?a=commit;h=6e44531a8a112e36588b5dbced309b0521d6b64e (Should be at the tag bristuff-0.4.0-RC4-xr1, but the mirroring seems to be lagging) -- Tzafrir Cohen icq#16849755 jabber:[EMAIL PROTECTED] +972-50-7952406 mailto:[EMAIL PROTECTED] http://www.xorcom.com iax:[EMAIL PROTECTED]/tzafrir commit ccd11da0599c190f5b678aed3f164579ca873c71 Author: Tzafrir Cohen [EMAIL PROTECTED] Date: Mon Sep 8 10:59:06 2008 + Locking fixes in chan_dahdi: lock chan before pvt diff --git a/channels/chan_dahdi.c b/channels/chan_dahdi.c index 35c4e4e..c12c112 100644 --- a/channels/chan_dahdi.c +++ b/channels/chan_dahdi.c @@ -9105,7 +9105,7 @@ static void *pri_dchannel(void *vpri) int haveidles; int activeidles; int nextidle = -1; - struct ast_channel *c; + struct ast_channel *c = NULL; struct timeval tv, lowest, *next; struct timeval lastidle = { 0, 0 }; int doidling=0; @@ -9846,6 +9846,7 @@ static void *pri_dchannel(void *vpri) snprintf(calledtonstr, sizeof(calledtonstr)-1, %d, e-ring.calledplan); pbx_builtin_setvar_helper(c, CALLEDTON, calledtonstr); +ast_mutex_lock(c-lock); ast_mutex_lock(pri-pvts[chanpos]-lock); ast_mutex_lock(pri-lock); @@ -9893,6 +9894,8 @@ static void *pri_dchannel(void *vpri) if (crv) ast_mutex_unlock(crv-lock); ast_mutex_unlock(pri-pvts[chanpos]-lock); + if (c) + ast_mutex_unlock(c-lock); } else { if (e-ring.flexible) pri_hangup(pri-pri, e-ring.call, PRI_CAUSE_NORMAL_CIRCUIT_CONGESTION);
Bug#493055: BRIStuff patches introduce deadlock
Package: asterisk Version: 1.4.21.2~dfsg-1 Severity: grave I had some problems with my Asterisk installation having the PRI channels lock up completely when certain types of calls were received. Eventually this was traced back to a deadlock caused by the bristuff patches being applied. Various information, logs and a stack trace of the lockup here: http://bugs.digium.com/view.php?id=13192 I'm choosing severity grave because this effectively will allow someone calling in with particlar caller ID options/flags to lock up your PRI span on demand (i.e. DoS). Regards, Kevin. -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#493055: BRIStuff patches introduce deadlock
severity 493055 important thanks Kevin Shanahan wrote: I had some problems with my Asterisk installation having the PRI channels lock up completely when certain types of calls were received. Eventually this was traced back to a deadlock caused by the bristuff patches being applied. Various information, logs and a stack trace of the lockup here: http://bugs.digium.com/view.php?id=13192 I'm choosing severity grave because this effectively will allow someone calling in with particlar caller ID options/flags to lock up your PRI span on demand (i.e. DoS). Well, this is definitely a very nasty bug but from a release prespective I'd like to point out two things: a) You are one of the _very_ few people to experience this deadlock; I've experienced it once or twice in one of my (many!) installations all this time and noone else has reported it so far. Perhaps it has something to do with what the network sends to Asterisk, I'm not so sure myself. b) Asterisk is a very complicated program with dozens of threads and locks between them. I don't think we (as in, Debian) can solve this and I highly doubt that upstream will, at least in time for lenny. In summary, while I acknowledge that this breaks your setup -and believe me, I'd like to have this fixed as much as you do- I don't think it's a release critical bug and that it should prevent Asterisk from being shipped with our next stable release. Thanks, Faidon -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]