Re: Soft lockup issue in Linux 4.1.9

2015-10-17 Thread Greg Kroah-Hartman
On Sat, Oct 03, 2015 at 09:14:16PM +0200, Thomas D. wrote:
> Hi,
> 
> Holger Hoffstätte wrote:
> > Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
> > will get another broken release.
> 
> For me it looks like the request was too late, the patch is not included
> in 4.1.10. So don't forget to re-apply the patch when doing the upgrade.
> 
> Greg, do you need a dedicated inclusion request for
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> in 4.1.x or is it already on your list?

Now applied, thanks.

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Soft lockup issue in Linux 4.1.9

2015-10-03 Thread Thomas D.
Hi,

Holger Hoffstätte wrote:
> Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
> will get another broken release.

For me it looks like the request was too late, the patch is not included
in 4.1.10. So don't forget to re-apply the patch when doing the upgrade.

Greg, do you need a dedicated inclusion request for
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
in 4.1.x or is it already on your list?


-Thomas





Re: Soft lockup issue in Linux 4.1.9

2015-10-02 Thread Andre Tomt

On 1 Oct 2015 13:52, Eric Dumazet wrote:
> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
>  wrote:
>> On 10/01/15 13:29, Eric Dumazet wrote:
>>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>>> Author: Eric Dumazet
>>> Date:   Thu Aug 13 15:44:51 2015 -0700
>>>
>>> inet: fix potential deadlock in reqsk_queue_unlink()
>>
>> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
>> we speak. Let's hope that this fixes the lockups.
>
> It definitely should help !
>
> David, since patch is not yet seen on
> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
> could you please add it to your queue ?

Seems to fix it for me as well. 3 systems have been running varying 
types of production-like loads with it for 14+ hours without hanging.



Re: Soft lockup issue in Linux 4.1.9

2015-10-02 Thread Wolfgang Walter
On Friday, 2 October 2015, 09:17:16, Holger Hoffstätte wrote:
> On 10/02/15 08:52, Andre Tomt wrote:
> > On 1 Oct 2015 13:52, Eric Dumazet wrote:
> >> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
> >> 
> >>  wrote:
> >>> On 10/01/15 13:29, Eric Dumazet wrote:
> >>>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> >>>> Author: Eric Dumazet
> >>>> Date:   Thu Aug 13 15:44:51 2015 -0700
> >>>>
> >>>> inet: fix potential deadlock in reqsk_queue_unlink()
> > 
> > 
> > 
> >>> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
> >>> we speak. Let's hope that this fixes the lockups.
> >> 
> >> It definitely should help !
> >> 
> >> David, since patch is not yet seen on
> >> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
> >> could you please add it to your queue ?
> > 
> > Seems to fix it for me as well. 3 systems have been running varying
> > types of production-like loads with it for 14+ hours without hanging.
> 
> Just got up, and yes - my systems survived the night as well, no issues.
> 
> Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
> will get another broken release.
> 

Fixes the problem here, too.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: Soft lockup issue in Linux 4.1.9

2015-10-02 Thread Thomas Gleixner
On Thu, 1 Oct 2015, Eric Dumazet wrote:
> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
>  wrote:
> > On 10/01/15 13:29, Eric Dumazet wrote:
> 
> >> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> >> Author: Eric Dumazet 
> >> Date:   Thu Aug 13 15:44:51 2015 -0700
> >>
> >> inet: fix potential deadlock in reqsk_queue_unlink()
> >>
> >> When replacing del_timer() with del_timer_sync(), I introduced
> >> a deadlock condition :
> >>
> >> reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()
> >>
> >> inet_csk_reqsk_queue_drop() can be called from many contexts,
> >> one being the timer handler itself (reqsk_timer_handler()).
> >>
> >> In this case, del_timer_sync() loops forever.
> >>
> >> Simple fix is to test if timer is pending.
> >>
> >> Fixes: 2235f2ac75fd ("inet: fix races with reqsk timers")
> >> Signed-off-by: Eric Dumazet 
> >> Signed-off-by: David S. Miller 
> >
> > Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
> > we speak. Let's hope that this fixes the lockups.
> >
> 
> It definitely should help !

What makes sure, that the timer cannot be readded while that timer
callback is running?

Thanks,

tglx



Re: Soft lockup issue in Linux 4.1.9

2015-10-02 Thread Eric Dumazet
On Fri, 2015-10-02 at 22:04 +0200, Thomas Gleixner wrote:

> What makes sure, that the timer cannot be readded while that timer
> callback is running?

What is exactly your question ?





Re: Soft lockup issue in Linux 4.1.9

2015-10-02 Thread Eric Dumazet
On Fri, 2015-10-02 at 23:04 +0200, Thomas Gleixner wrote:
> On Fri, 2 Oct 2015, Eric Dumazet wrote:
> > On Fri, 2015-10-02 at 22:04 +0200, Thomas Gleixner wrote:
> > 
> > > What makes sure, that the timer cannot be readded while that timer
> > > callback is running?
> > 
> > What is exactly your question ?
> 
> CPU0                          CPU1
> 
> timer expires
>   callback
>                               add timer
>   timer_pending() == true
>   ===> del_timer_sync()
> 
> I was just curious how this is prevented as I got lost in the
> networking code as usual :)

Sure ;)

I believe this can not happen for following reasons :

mod_timer_pinned() is used only when req is created, while timer cannot
possibly be running on the same req. The _pinned part is critical
because we set the req->refcnt _after_ starting the timer,
to avoid being visible and caught from rcu lookups in hash tables.

Then, timer might be modified only by mod_timer_pending() from
tcp_check_req() : This should not re-start timer if another cpu is in
the timer callback.

Thanks
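
The argument above rests on mod_timer_pending() refusing to re-arm a
timer that is no longer pending. A minimal userspace sketch of that
behaviour follows. Every toy_* name is hypothetical, the struct is not
the kernel's struct timer_list, and the return types are simplified, so
treat it as an illustration of the reasoning rather than kernel code.
It also assumes that a timer stops being pending once its callback has
started.

```c
#include <stdbool.h>

/* Toy stand-in for a kernel timer; hypothetical, for illustration only. */
struct toy_timer {
	bool pending;          /* queued and waiting to fire */
	unsigned long expires; /* expiry time of the last successful arm */
};

/* Models mod_timer(): arms the timer whether or not it was pending. */
static void toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
	t->expires = expires;
	t->pending = true;
}

/*
 * Models mod_timer_pending(): only touches a timer that is still
 * pending. Returns true if the expiry was updated.
 */
static bool toy_mod_timer_pending(struct toy_timer *t, unsigned long expires)
{
	if (!t->pending)
		return false; /* inactive timer is left alone */
	t->expires = expires;
	return true;
}

/* Models expiry: the timer is no longer pending once its callback starts. */
static void toy_timer_expire(struct toy_timer *t)
{
	t->pending = false;
}
```

In this model, a toy_mod_timer_pending() call that arrives after
toy_timer_expire() leaves the timer inactive, so a later pending check
inside the callback still sees "not pending".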





Re: Soft lockup issue in Linux 4.1.9

2015-10-02 Thread Thomas Gleixner
On Fri, 2 Oct 2015, Eric Dumazet wrote:
> On Fri, 2015-10-02 at 22:04 +0200, Thomas Gleixner wrote:
> 
> > What makes sure, that the timer cannot be readded while that timer
> > callback is running?
> 
> What is exactly your question ?

CPU0                            CPU1

timer expires
  callback
                                add timer
  timer_pending() == true
  ===> del_timer_sync()

I was just curious how this is prevented as I got lost in the
networking code as usual :)

Thanks,

tglx



Re: Soft lockup issue in Linux 4.1.9

2015-10-02 Thread Holger Hoffstätte
On 10/02/15 08:52, Andre Tomt wrote:
> On 1 Oct 2015 13:52, Eric Dumazet wrote:
>> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
>>  wrote:
>>> On 10/01/15 13:29, Eric Dumazet wrote:
>>
>>>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>>>> Author: Eric Dumazet
>>>> Date:   Thu Aug 13 15:44:51 2015 -0700
>>>>
>>>> inet: fix potential deadlock in reqsk_queue_unlink()
> 
>>> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
>>> we speak. Let's hope that this fixes the lockups.
>>>
>>
>> It definitely should help !
>>
>> David, since patch is not yet seen on
>> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
>> could you please add it to your queue ?
> 
> Seems to fix it for me as well. 3 systems have been running varying
> types of production-like loads with it for 14+ hours without hanging.

Just got up, and yes - my systems survived the night as well, no issues.

Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
will get another broken release.

cheers
Holger



Re: Soft lockup issue in Linux 4.1.9

2015-10-01 Thread Eric Dumazet
On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
 wrote:
> On 10/01/15 13:29, Eric Dumazet wrote:

>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
>> Author: Eric Dumazet 
>> Date:   Thu Aug 13 15:44:51 2015 -0700
>>
>> inet: fix potential deadlock in reqsk_queue_unlink()
>>
>> When replacing del_timer() with del_timer_sync(), I introduced
>> a deadlock condition :
>>
>> reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()
>>
>> inet_csk_reqsk_queue_drop() can be called from many contexts,
>> one being the timer handler itself (reqsk_timer_handler()).
>>
>> In this case, del_timer_sync() loops forever.
>>
>> Simple fix is to test if timer is pending.
>>
>> Fixes: 2235f2ac75fd ("inet: fix races with reqsk timers")
>> Signed-off-by: Eric Dumazet 
>> Signed-off-by: David S. Miller 
>
> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
> we speak. Let's hope that this fixes the lockups.
>

It definitely should help !

David, since patch is not yet seen on
http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
could you please add it to your queue ?

Thanks.
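
The deadlock described in the quoted commit message can be modelled in
userspace. The sketch below is a single-threaded toy, not the kernel
implementation: all toy_* names are hypothetical, and the -1 return
value stands in for the point where the real del_timer_sync() would
spin forever. It assumes that a timer is no longer pending once its
callback has started, which is what makes the "test if timer is
pending" guard effective.

```c
#include <stdbool.h>

/* Toy stand-in for a kernel timer; hypothetical, for illustration only. */
struct toy_timer {
	bool pending;          /* queued, callback not started yet */
	bool callback_running; /* callback is executing right now */
};

/* Models timer_pending(): true only while the timer is queued. */
static bool toy_timer_pending(const struct toy_timer *t)
{
	return t->pending;
}

/*
 * Models del_timer_sync(). Returns 1 if a pending timer was deactivated,
 * 0 if the timer was idle, and -1 where the real function would wait
 * forever: the caller is the very callback it is waiting for.
 */
static int toy_del_timer_sync(struct toy_timer *t, bool called_from_callback)
{
	if (t->callback_running && called_from_callback)
		return -1;
	if (t->pending) {
		t->pending = false;
		return 1;
	}
	return 0;
}

/* Unlink with an unconditional synchronous delete (the deadlocking form). */
static int toy_unlink_before_fix(struct toy_timer *t, bool from_callback)
{
	return toy_del_timer_sync(t, from_callback);
}

/* Unlink with the guard from the commit message: test pending first. */
static int toy_unlink_after_fix(struct toy_timer *t, bool from_callback)
{
	if (toy_timer_pending(t))
		return toy_del_timer_sync(t, from_callback);
	return 0;
}

/* Models expiry: the timer stops being pending, then its callback runs. */
static void toy_timer_expire(struct toy_timer *t)
{
	t->pending = false;
	t->callback_running = true;
}
```

Calling toy_unlink_before_fix() from the callback after
toy_timer_expire() hits the -1 case, while toy_unlink_after_fix()
returns 0 without ever reaching the synchronous delete.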


Re: Soft lockup issue in Linux 4.1.9

2015-10-01 Thread Holger Hoffstätte
On 10/01/15 13:29, Eric Dumazet wrote:
> On Thu, Oct 1, 2015 at 3:59 AM, Holger Hoffstätte
>  wrote:
>>
>> On Thu, 01 Oct 2015 06:41:46 +0200, Andre Tomt wrote:
>>
>>> On 1 Oct 2015 00:37, Holger Hoffstätte wrote:
>>>> On Wed, 30 Sep 2015 23:59:43 +0200, Olivier Bonvalet wrote:
>>>>
>>>>> for information, I've just upgraded 6 servers from Linux 4.1.8 to Linux
>>>>> 4.1.9, and have some random soft lockup. If this can help :
>>>>
>>>> Congratulations! You're not the first one to get hit by this, but
>>>> you are probably the first one to get a meaningful stacktrace! \o/
>>>>
>>>>> [  204.478380] Call Trace:
>>>>> [  204.478381]
>>>>> [  204.478385]  [] ? try_to_del_timer_sync+0x43/0x4d
>>>>> [  204.478386]  [] ? del_timer+0x4d/0x4d
>>>>> [  204.478388]  [] ? del_timer_sync+0x20/0x3d
>>>>
>>>> Can you try to revert
>>>>
>>>> [PATCH 4.1 157/159] inet: fix races with reqsk timers
>>>>
>>>> and see how that works for you? I'll do the same on my end. So far the
>>>> only thing I ever could glean was an rcu stall after cpuidle_enter(),
>>>> but never anything regarding the timer - though it was definitely
>>>> related to NIC activity after idle.
>>>
>>> I'm running with this patch reverted now as well. 2 hours no issues so
>>> far, but I can't conclude anything yet as I've seen it take up to 6+
>>> hours to explode here. As a result the bisect was going veeery slowly.
>>
>> Now 12+ hours going without problems, never got this far with the patch
>> included, as it would usually freeze during idle periods.
>>
>> As far as I'm concerned this is the culprit and should be reverted in
>> 4.1.x, unless Eric can suggest how to fix this. (cc'ed).
>>
> 
> Looks an old and known problem...
> 
> Following commit should be sent/added for 4.1 stable tree :
> 
> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> Author: Eric Dumazet 
> Date:   Thu Aug 13 15:44:51 2015 -0700
> 
> inet: fix potential deadlock in reqsk_queue_unlink()
> 
> When replacing del_timer() with del_timer_sync(), I introduced
> a deadlock condition :
> 
> reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()
> 
> inet_csk_reqsk_queue_drop() can be called from many contexts,
> one being the timer handler itself (reqsk_timer_handler()).
> 
> In this case, del_timer_sync() loops forever.
> 
> Simple fix is to test if timer is pending.
> 
> Fixes: 2235f2ac75fd ("inet: fix races with reqsk timers")
> Signed-off-by: Eric Dumazet 
> Signed-off-by: David S. Miller 

Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
we speak. Let's hope that this fixes the lockups.

Thanks for the quick reply!

Holger
