Re: Soft lockup issue in Linux 4.1.9
On Sat, Oct 03, 2015 at 09:14:16PM +0200, Thomas D. wrote:
> Hi,
>
> Holger Hoffstätte wrote:
> > Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
> > will get another broken release.
>
> For me it looks like the request was too late, the patch is not included
> in 4.1.10. So don't forget to re-apply the patch when doing the upgrade.
>
> Greg, do you need a dedicated inclusion request for
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> in 4.1.x or is it already on your list?

Now applied, thanks.

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Soft lockup issue in Linux 4.1.9
Hi,

Holger Hoffstätte wrote:
> Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
> will get another broken release.

For me it looks like the request was too late, the patch is not included
in 4.1.10. So don't forget to re-apply the patch when doing the upgrade.

Greg, do you need a dedicated inclusion request for
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
in 4.1.x or is it already on your list?

-Thomas
Re: Soft lockup issue in Linux 4.1.9
On 01 Oct 2015 13:52, Eric Dumazet wrote:
> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte wrote:
>> On 10/01/15 13:29, Eric Dumazet wrote:
>>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>>> Author: Eric Dumazet
>>> Date: Thu Aug 13 15:44:51 2015 -0700
>>>
>>> inet: fix potential deadlock in reqsk_queue_unlink()
>>
>> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
>> we speak. Let's hope that this fixes the lockups.
>
> It definitely should help !
>
> David, since patch is not yet seen on
> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
> could you please add it to your queue ?

Seems to fix it for me as well. 3 systems have been running varying
types of production-like loads with it for 14+ hours without hanging.
Re: Soft lockup issue in Linux 4.1.9
On Friday, 2 October 2015, 09:17:16, Holger Hoffstätte wrote:
> On 10/02/15 08:52, Andre Tomt wrote:
> > On 01 Oct 2015 13:52, Eric Dumazet wrote:
> >> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte wrote:
> >>> On 10/01/15 13:29, Eric Dumazet wrote:
> >>>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> >>>> Author: Eric Dumazet
> >>>> Date: Thu Aug 13 15:44:51 2015 -0700
> >>>>
> >>>> inet: fix potential deadlock in reqsk_queue_unlink()
> >>>
> >>> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
> >>> we speak. Let's hope that this fixes the lockups.
> >>
> >> It definitely should help !
> >>
> >> David, since patch is not yet seen on
> >> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
> >> could you please add it to your queue ?
> >
> > Seems to fix it for me as well. 3 systems have been running varying
> > types of production-like loads with it for 14+ hours without hanging.
>
> Just got up, and yes - my systems survived the night as well, no issues.
>
> Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
> will get another broken release.

Fixes the problem here, too.

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Re: Soft lockup issue in Linux 4.1.9
On Thu, 1 Oct 2015, Eric Dumazet wrote:
> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte wrote:
> > On 10/01/15 13:29, Eric Dumazet wrote:
> >> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> >> Author: Eric Dumazet
> >> Date: Thu Aug 13 15:44:51 2015 -0700
> >>
> >> inet: fix potential deadlock in reqsk_queue_unlink()
> >>
> >> When replacing del_timer() with del_timer_sync(), I introduced
> >> a deadlock condition :
> >>
> >> reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()
> >>
> >> inet_csk_reqsk_queue_drop() can be called from many contexts,
> >> one being the timer handler itself (reqsk_timer_handler()).
> >>
> >> In this case, del_timer_sync() loops forever.
> >>
> >> Simple fix is to test if timer is pending.
> >>
> >> Fixes: 2235f2ac75fd ("inet: fix races with reqsk timers")
> >> Signed-off-by: Eric Dumazet
> >> Signed-off-by: David S. Miller
> >
> > Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
> > we speak. Let's hope that this fixes the lockups.
>
> It definitely should help !

What makes sure, that the timer cannot be readded while that timer
callback is running?

Thanks,

    tglx
Re: Soft lockup issue in Linux 4.1.9
On Fri, 2015-10-02 at 22:04 +0200, Thomas Gleixner wrote:

> What makes sure, that the timer cannot be readded while that timer
> callback is running?

What is exactly your question ?
Re: Soft lockup issue in Linux 4.1.9
On Fri, 2015-10-02 at 23:04 +0200, Thomas Gleixner wrote:
> On Fri, 2 Oct 2015, Eric Dumazet wrote:
> > On Fri, 2015-10-02 at 22:04 +0200, Thomas Gleixner wrote:
> >
> > > What makes sure, that the timer cannot be readded while that timer
> > > callback is running?
> >
> > What is exactly your question ?
>
> CPU0                          CPU1
>
> timer expires
>   callback
>                               add timer
>   timer_pending() == true
>   ===> del_timer_sync()
>
> I was just curious how this is prevented as I got lost in the
> networking code as usual :)

Sure ;)

I believe this can not happen for following reasons :

mod_timer_pinned() is used only when req is created, while timer cannot
possibly be running on the same req. The _pinned part is critical
because we set the req->refcnt _after_ starting the timer, to avoid
being visible and caught from rcu lookups in hash tables.

Then, timer might be modified only by mod_timer_pending() from
tcp_check_req() : This should not re-start timer if another cpu is in
the timer callback.

Thanks
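Editorial sketch: the argument above rests on mod_timer_pending() updating only a timer that is still pending and never re-activating one that has already been dequeued. The toy Python model below illustrates that distinction. It is not kernel code; the ToyTimer class and its method bodies are assumptions made for illustration, and only the function names mod_timer(), mod_timer_pending() and timer_pending() come from the thread.

```python
class ToyTimer:
    """Hypothetical single-threaded stand-in for a kernel timer (illustration only)."""

    def __init__(self):
        self.pending = False   # models timer_pending()
        self.expires = None

    def mod_timer(self, expires):
        """Arm or re-arm unconditionally; report whether it was already pending."""
        was_pending = self.pending
        self.pending = True
        self.expires = expires
        return was_pending

    def mod_timer_pending(self, expires):
        """Update the expiry only if the timer is still pending; never re-activate."""
        if not self.pending:
            return False
        self.expires = expires
        return True

    def expire(self):
        """Model expiry: the timer is dequeued before its callback starts running."""
        self.pending = False


def readd_while_callback_runs(use_pending_variant):
    """Replay the CPU0/CPU1 scenario and report timer_pending() as seen in the callback."""
    timer = ToyTimer()
    timer.mod_timer(10)          # armed once, when the request is created
    timer.expire()               # CPU0: timer expires, callback starts
    if use_pending_variant:      # CPU1: tries to touch the timer meanwhile
        timer.mod_timer_pending(20)
    else:
        timer.mod_timer(20)
    return timer.pending         # what the callback's pending test would observe
```

In this model, a concurrent plain mod_timer() would make the pending test in the callback true, which is the case asked about; with mod_timer_pending() the timer stays inactive, so the del_timer_sync() path is never taken from inside the callback.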
Re: Soft lockup issue in Linux 4.1.9
On Fri, 2 Oct 2015, Eric Dumazet wrote:
> On Fri, 2015-10-02 at 22:04 +0200, Thomas Gleixner wrote:
>
> > What makes sure, that the timer cannot be readded while that timer
> > callback is running?
>
> What is exactly your question ?

CPU0                          CPU1

timer expires
  callback
                              add timer
  timer_pending() == true
  ===> del_timer_sync()

I was just curious how this is prevented as I got lost in the
networking code as usual :)

Thanks,

    tglx
Re: Soft lockup issue in Linux 4.1.9
On 10/02/15 08:52, Andre Tomt wrote:
> On 01 Oct 2015 13:52, Eric Dumazet wrote:
>> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte wrote:
>>> On 10/01/15 13:29, Eric Dumazet wrote:
>>>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>>>> Author: Eric Dumazet
>>>> Date: Thu Aug 13 15:44:51 2015 -0700
>>>>
>>>> inet: fix potential deadlock in reqsk_queue_unlink()
>>>
>>> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
>>> we speak. Let's hope that this fixes the lockups.
>>
>> It definitely should help !
>>
>> David, since patch is not yet seen on
>> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
>> could you please add it to your queue ?
>
> Seems to fix it for me as well. 3 systems have been running varying
> types of production-like loads with it for 14+ hours without hanging.

Just got up, and yes - my systems survived the night as well, no issues.

Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
will get another broken release.

cheers
Holger
Re: Soft lockup issue in Linux 4.1.9
On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte wrote:
> On 10/01/15 13:29, Eric Dumazet wrote:
>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>> Author: Eric Dumazet
>> Date: Thu Aug 13 15:44:51 2015 -0700
>>
>> inet: fix potential deadlock in reqsk_queue_unlink()
>>
>> When replacing del_timer() with del_timer_sync(), I introduced
>> a deadlock condition :
>>
>> reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()
>>
>> inet_csk_reqsk_queue_drop() can be called from many contexts,
>> one being the timer handler itself (reqsk_timer_handler()).
>>
>> In this case, del_timer_sync() loops forever.
>>
>> Simple fix is to test if timer is pending.
>>
>> Fixes: 2235f2ac75fd ("inet: fix races with reqsk timers")
>> Signed-off-by: Eric Dumazet
>> Signed-off-by: David S. Miller
>
> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
> we speak. Let's hope that this fixes the lockups.

It definitely should help !

David, since patch is not yet seen on
http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
could you please add it to your queue ?

Thanks.
Re: Soft lockup issue in Linux 4.1.9
On 10/01/15 13:29, Eric Dumazet wrote:
> On Thu, Oct 1, 2015 at 3:59 AM, Holger Hoffstätte wrote:
>>
>> On Thu, 01 Oct 2015 06:41:46 +0200, Andre Tomt wrote:
>>
>>> On 01 Oct 2015 00:37, Holger Hoffstätte wrote:
>>>> On Wed, 30 Sep 2015 23:59:43 +0200, Olivier Bonvalet wrote:
>>>>> for information, I've just upgraded 6 servers from Linux 4.1.8 to Linux
>>>>> 4.1.9, and have some random soft lockup. If this can help :
>>>>
>>>> Congratulations! You're not the first one to get hit by this, but
>>>> you are probably the first one to get a meaningful stacktrace! \o/
>>>>
>>>>> [ 204.478380] Call Trace:
>>>>> [ 204.478381]
>>>>> [ 204.478385] [] ? try_to_del_timer_sync+0x43/0x4d
>>>>> [ 204.478386] [] ? del_timer+0x4d/0x4d
>>>>> [ 204.478388] [] ? del_timer_sync+0x20/0x3d
>>>>
>>>> Can you try to revert [PATCH 4.1 157/159] inet: fix races with reqsk
>>>> timers and see how that works for you? I'll do the same on my end.
>>>> So far the only thing I ever could glean was an rcu stall after
>>>> cpuidle_enter(), but never anything regarding the timer - though it
>>>> was definitely related to NIC activity after idle.
>>>
>>> I'm running with this patch reverted now as well. 2 hours no issues so
>>> far, but I can't conclude anything yet as I've seen it take up to 6+
>>> hours to explode here. As a result the bisect was going veeery slowly.
>>
>> Now 12+ hours going without problems, never got this far with the patch
>> included, as it would usually freeze during idle periods.
>>
>> As far as I'm concerned this is the culprit and should be reverted in
>> 4.1.x, unless Eric can suggest how to fix this. (cc'ed).
>>
>
> Looks an old and known problem...
>
> Following commit should be sent/added for 4.1 stable tree :
>
> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> Author: Eric Dumazet
> Date: Thu Aug 13 15:44:51 2015 -0700
>
> inet: fix potential deadlock in reqsk_queue_unlink()
>
> When replacing del_timer() with del_timer_sync(), I introduced
> a deadlock condition :
>
> reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()
>
> inet_csk_reqsk_queue_drop() can be called from many contexts,
> one being the timer handler itself (reqsk_timer_handler()).
>
> In this case, del_timer_sync() loops forever.
>
> Simple fix is to test if timer is pending.
>
> Fixes: 2235f2ac75fd ("inet: fix races with reqsk timers")
> Signed-off-by: Eric Dumazet
> Signed-off-by: David S. Miller

Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
we speak. Let's hope that this fixes the lockups.

Thanks for the quick reply!

Holger
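Editorial sketch: the commit message quoted above describes a self-deadlock, where a "delete and wait for the handler" call is made from inside the handler it would wait for. The single-threaded Python toy below models only that self-call case, under stated assumptions: ToyTimer, SelfDeadlock, unlink_buggy, unlink_fixed and run are names invented here, the model raises an exception where the kernel would spin, and it does not reproduce the actual kernel patch.

```python
class SelfDeadlock(RuntimeError):
    """Raised at the point where the real code would loop forever."""


class ToyTimer:
    """Hypothetical stand-in for a kernel timer (illustration only)."""

    def __init__(self, handler):
        self.handler = handler
        self.pending = False   # armed and not yet fired
        self.running = False   # handler currently executing

    def arm(self):
        self.pending = True

    def fire(self):
        """Expiry: the timer is dequeued first, then the handler runs."""
        if not self.pending:
            return
        self.pending = False
        self.running = True
        try:
            self.handler(self)
        finally:
            self.running = False

    def del_sync(self):
        """Model of del_timer_sync(): deactivate, then wait for a running handler.

        Waiting for our own handler can never finish, so the model raises.
        """
        if self.running:
            raise SelfDeadlock("del_sync() called from the timer's own handler")
        was_pending = self.pending
        self.pending = False
        return was_pending


def unlink_buggy(timer):
    # Always takes the synchronous delete path, even from the handler.
    return timer.del_sync()


def unlink_fixed(timer):
    # "Simple fix is to test if timer is pending": inside the handler the
    # timer is no longer pending, so the synchronous delete is skipped.
    return timer.pending and timer.del_sync()


def run(unlink):
    """Fire a timer whose handler calls the given unlink function."""
    timer = ToyTimer(handler=lambda t: unlink(t))
    timer.arm()
    try:
        timer.fire()
        return "ok"
    except SelfDeadlock:
        return "deadlock"
```

Calling run(unlink_buggy) hits the modelled deadlock, while run(unlink_fixed) completes. Outside the handler, both variants still delete an armed timer as before.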