Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
On Sun, Apr 24, 2016, at 23:25, Eric Dumazet wrote: > #ifdef CONFIG_LOCKDEP > - WARN_ON(!lockdep_sock_is_held(sk)); > + WARN_ON_ONCE(!lockdep_sock_is_held(sk) && !debug_locks); > #endif Eric, could you resend this patch without the negation and also add my acked-by (I came finally around to test it). Thanks, Hannes
Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
On Sun, 2016-04-24 at 14:25 -0700, Eric Dumazet wrote: > On Sun, 2016-04-24 at 17:13 -0400, valdis.kletni...@vt.edu wrote: > > On Sun, 24 Apr 2016 14:00:17 -0700, Eric Dumazet said: > > > On Sun, 2016-04-24 at 15:56 -0400, valdis.kletni...@vt.edu wrote: > > > > On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said: > > > > > > > > > >>> + return !debug_locks || > > > > > >>> +lockdep_is_held(>sk_lock) || > > > > > > > > > Issue here is that once lockdep detected a problem (not necessarily in > > > > > net/ tree btw), your helper always 'detect' a problem, since lockdep > > > > > automatically disables itself. > > > > > > > > "D'Oh!" -- H. Simpson > > > > > > > > I thought this patch looked suspect, but couldn't put my finger on it. > > > > The > > > > reason why I got like 41,000 of them is because I built a kernel that > > > > has > > > > lockdep enabled, but I have an out-of-tree module that doesn't init > > > > something, > > > > so I get this: > > > > > > > > [ 48.898156] INFO: trying to register non-static key. > > > > [ 48.898157] the code is fine but needs lockdep annotation. > > > > [ 48.898157] turning off the locking correctness validator. > > > > > > > > After which point, even with this patch, every time through it's still > > > > going to > > > > explode. > > > > > > Which patch are you talking about ? > > > > The one that adds the !debug_locks check - once my out-of-kernel module > > hits something that turns off lockdep, it's *still* going to complain on > > pretty much all the same packets it complained about earlier. I thought > > it looked suspicious, but you clarified why... > > It does not make sense to me. If lockdep is disabled, then debug_locks > is 0. > > So no complain should happen from networking. > > I was about to send following patch, please check it solves the issue. ? > > (It certainly did for me, once I forced a lockdep splat loading a buggy > module) > > Thanks > > From: Eric Dumazet> > Valdis reported tons of stack dumps caused by WARN_ON() in > sock_owned_by_user() > > This test needs to be relaxed if/when lockdep disables itself. > > Note that other lockdep_sock_is_held() callers are all from > rcu_dereference_protected() sections which already are disabled > if/when lockdep has been disabled. > > Fixes: fafc4e1ea1a4 ("sock: tigthen lockdep checks for sock_owned_by_user") > Reported-by: Valdis Kletnieks > Signed-off-by: Eric Dumazet > --- > include/net/sock.h |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/net/sock.h b/include/net/sock.h > index 52448baf19d7..f492d01512ed 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -1409,7 +1409,7 @@ static inline void unlock_sock_fast(struct sock *sk, > bool slow) > static inline bool sock_owned_by_user(const struct sock *sk) > { > #ifdef CONFIG_LOCKDEP > - WARN_ON(!lockdep_sock_is_held(sk)); > + WARN_ON_ONCE(!lockdep_sock_is_held(sk) && !debug_locks); Silly me, I tested the opposite test of course : WARN_ON_ONCE(!lockdep_sock_is_held(sk) && debug_locks); > #endif > return sk->sk_lock.owned; > } >
Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
On Sun, 2016-04-24 at 17:13 -0400, valdis.kletni...@vt.edu wrote: > On Sun, 24 Apr 2016 14:00:17 -0700, Eric Dumazet said: > > On Sun, 2016-04-24 at 15:56 -0400, valdis.kletni...@vt.edu wrote: > > > On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said: > > > > > > > >>> + return !debug_locks || > > > > >>> + lockdep_is_held(>sk_lock) || > > > > > > > Issue here is that once lockdep detected a problem (not necessarily in > > > > net/ tree btw), your helper always 'detect' a problem, since lockdep > > > > automatically disables itself. > > > > > > "D'Oh!" -- H. Simpson > > > > > > I thought this patch looked suspect, but couldn't put my finger on it. The > > > reason why I got like 41,000 of them is because I built a kernel that has > > > lockdep enabled, but I have an out-of-tree module that doesn't init > > > something, > > > so I get this: > > > > > > [ 48.898156] INFO: trying to register non-static key. > > > [ 48.898157] the code is fine but needs lockdep annotation. > > > [ 48.898157] turning off the locking correctness validator. > > > > > > After which point, even with this patch, every time through it's still > > > going to > > > explode. > > > > Which patch are you talking about ? > > The one that adds the !debug_locks check - once my out-of-kernel module > hits something that turns off lockdep, it's *still* going to complain on > pretty much all the same packets it complained about earlier. I thought > it looked suspicious, but you clarified why... It does not make sense to me. If lockdep is disabled, then debug_locks is 0. So no complain should happen from networking. I was about to send following patch, please check it solves the issue. ? (It certainly did for me, once I forced a lockdep splat loading a buggy module) Thanks From: Eric DumazetValdis reported tons of stack dumps caused by WARN_ON() in sock_owned_by_user() This test needs to be relaxed if/when lockdep disables itself. Note that other lockdep_sock_is_held() callers are all from rcu_dereference_protected() sections which already are disabled if/when lockdep has been disabled. Fixes: fafc4e1ea1a4 ("sock: tigthen lockdep checks for sock_owned_by_user") Reported-by: Valdis Kletnieks Signed-off-by: Eric Dumazet --- include/net/sock.h |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/net/sock.h b/include/net/sock.h index 52448baf19d7..f492d01512ed 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1409,7 +1409,7 @@ static inline void unlock_sock_fast(struct sock *sk, bool slow) static inline bool sock_owned_by_user(const struct sock *sk) { #ifdef CONFIG_LOCKDEP - WARN_ON(!lockdep_sock_is_held(sk)); + WARN_ON_ONCE(!lockdep_sock_is_held(sk) && !debug_locks); #endif return sk->sk_lock.owned; }
Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
On Sun, 24 Apr 2016 14:00:17 -0700, Eric Dumazet said: > On Sun, 2016-04-24 at 15:56 -0400, valdis.kletni...@vt.edu wrote: > > On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said: > > > > > >>> + return !debug_locks || > > > >>> +lockdep_is_held(>sk_lock) || > > > > > Issue here is that once lockdep detected a problem (not necessarily in > > > net/ tree btw), your helper always 'detect' a problem, since lockdep > > > automatically disables itself. > > > > "D'Oh!" -- H. Simpson > > > > I thought this patch looked suspect, but couldn't put my finger on it. The > > reason why I got like 41,000 of them is because I built a kernel that has > > lockdep enabled, but I have an out-of-tree module that doesn't init > > something, > > so I get this: > > > > [ 48.898156] INFO: trying to register non-static key. > > [ 48.898157] the code is fine but needs lockdep annotation. > > [ 48.898157] turning off the locking correctness validator. > > > > After which point, even with this patch, every time through it's still > > going to > > explode. > > Which patch are you talking about ? The one that adds the !debug_locks check - once my out-of-kernel module hits something that turns off lockdep, it's *still* going to complain on pretty much all the same packets it complained about earlier. I thought it looked suspicious, but you clarified why... pgpMNDGLbxiui.pgp Description: PGP signature
Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
On Sun, 2016-04-24 at 15:56 -0400, valdis.kletni...@vt.edu wrote: > On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said: > > > >>> + return !debug_locks || > > >>> + lockdep_is_held(>sk_lock) || > > > Issue here is that once lockdep detected a problem (not necessarily in > > net/ tree btw), your helper always 'detect' a problem, since lockdep > > automatically disables itself. > > "D'Oh!" -- H. Simpson > > I thought this patch looked suspect, but couldn't put my finger on it. The > reason why I got like 41,000 of them is because I built a kernel that has > lockdep enabled, but I have an out-of-tree module that doesn't init something, > so I get this: > > [ 48.898156] INFO: trying to register non-static key. > [ 48.898157] the code is fine but needs lockdep annotation. > [ 48.898157] turning off the locking correctness validator. > > After which point, even with this patch, every time through it's still going > to > explode. Which patch are you talking about ?
Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
On Sun, 2016-04-24 at 14:54 -0400, David Miller wrote: > From: Hannes Frederic Sowa> Date: Sun, 24 Apr 2016 20:48:24 +0200 > > > Eric's patch is worth to apply anyway, but I am not sure if it solves > > the (fundamental) problem. I couldn't reproduce it with the exact next- > > tag provided in the initial mail. All other reports also only happend > > with linux-next and not net-next. > > Ok, Eric please submit it formally. > > Thanks! Yes, I will first test it, and will take Hannes suggestion as well.
Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said: > >>> + return !debug_locks || > >>> +lockdep_is_held(>sk_lock) || > Issue here is that once lockdep detected a problem (not necessarily in > net/ tree btw), your helper always 'detect' a problem, since lockdep > automatically disables itself. "D'Oh!" -- H. Simpson I thought this patch looked suspect, but couldn't put my finger on it. The reason why I got like 41,000 of them is because I built a kernel that has lockdep enabled, but I have an out-of-tree module that doesn't init something, so I get this: [ 48.898156] INFO: trying to register non-static key. [ 48.898157] the code is fine but needs lockdep annotation. [ 48.898157] turning off the locking correctness validator. After which point, even with this patch, every time through it's still going to explode. pgpPugvHosjHN.pgp Description: PGP signature
Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
On Sun, 2016-04-24 at 20:48 +0200, Hannes Frederic Sowa wrote: > On 24.04.2016 20:38, David Miller wrote: > > From: Hannes Frederic Sowa> > Date: Thu, 21 Apr 2016 15:49:37 +0200 > > > >> On 21.04.2016 15:31, Eric Dumazet wrote: > >>> On Thu, 2016-04-21 at 05:05 -0400, valdis.kletni...@vt.edu wrote: > On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said: > > Hi, > > > > On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote: > >> linux-next 20160420 is whining at an incredible rate - in 20 minutes of > >> uptime, I piled up some 41,000 hits from all over the place (cleaned up > >> to skip the CPU and PID so the list isn't quite so long): > > > > Thanks for the report. Can you give me some more details: > > > > Is this an nfs socket? Do you by accident know if this socket went > > through xs_reclassify_socket at any point? We do hold the appropriate > > locks at that point but I fear that the lockdep reinitialization > > confused lockdep. > > It wasn't an NFS socket, as NFS wasn't even active at the time. I'm > reasonably > sure that multiple sockets were in play, given that tcp_v6_rcv and > udpv6_queue_rcv_skb were both implicated. I strongly suspect that > pretty much > any IPv6 traffic could do it - the frequency dropped off quite a bit > when I > closed firefox, which is usually a heavy network hitter on my laptop. > >>> > >>> > >>> Looks like the following patch is needed, can you try it please ? > >>> > >>> Thanks ! > >>> > >>> diff --git a/include/net/sock.h b/include/net/sock.h > >>> index d997ec13a643..db8301c76d50 100644 > >>> --- a/include/net/sock.h > >>> +++ b/include/net/sock.h > >>> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const > >>> struct sock *csk) > >>> { > >>> struct sock *sk = (struct sock *)csk; > >>> > >>> - return lockdep_is_held(>sk_lock) || > >>> + return !debug_locks || > >>> +lockdep_is_held(>sk_lock) || > >>> lockdep_is_held(>sk_lock.slock); > >>> } > >>> #endif > >> > >> I would prefer to add debug_locks at the WARN_ON level, like > >> WARN_ON(debug_locks && !lockdep_sock_is_held(sk)), but I am not sure if > >> this fixes the initial splat. > > > > Can we finish this conversation out and come up with a final patch > > for this soon? > > Eric's patch is worth to apply anyway, but I am not sure if it solves > the (fundamental) problem. I couldn't reproduce it with the exact next- > tag provided in the initial mail. All other reports also only happend > with linux-next and not net-next. > > I hope I Valdis provides his config soon and I will continue my analysis > on this then. Should be easy to force a lockdep splat and check if the patch solves the issue. Issue here is that once lockdep detected a problem (not necessarily in net/ tree btw), your helper always 'detect' a problem, since lockdep automatically disables itself.
Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
From: Hannes Frederic SowaDate: Sun, 24 Apr 2016 20:48:24 +0200 > Eric's patch is worth to apply anyway, but I am not sure if it solves > the (fundamental) problem. I couldn't reproduce it with the exact next- > tag provided in the initial mail. All other reports also only happend > with linux-next and not net-next. Ok, Eric please submit it formally. Thanks!
Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
On 24.04.2016 20:38, David Miller wrote: > From: Hannes Frederic Sowa> Date: Thu, 21 Apr 2016 15:49:37 +0200 > >> On 21.04.2016 15:31, Eric Dumazet wrote: >>> On Thu, 2016-04-21 at 05:05 -0400, valdis.kletni...@vt.edu wrote: On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said: > Hi, > > On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote: >> linux-next 20160420 is whining at an incredible rate - in 20 minutes of >> uptime, I piled up some 41,000 hits from all over the place (cleaned up >> to skip the CPU and PID so the list isn't quite so long): > > Thanks for the report. Can you give me some more details: > > Is this an nfs socket? Do you by accident know if this socket went > through xs_reclassify_socket at any point? We do hold the appropriate > locks at that point but I fear that the lockdep reinitialization > confused lockdep. It wasn't an NFS socket, as NFS wasn't even active at the time. I'm reasonably sure that multiple sockets were in play, given that tcp_v6_rcv and udpv6_queue_rcv_skb were both implicated. I strongly suspect that pretty much any IPv6 traffic could do it - the frequency dropped off quite a bit when I closed firefox, which is usually a heavy network hitter on my laptop. >>> >>> >>> Looks like the following patch is needed, can you try it please ? >>> >>> Thanks ! >>> >>> diff --git a/include/net/sock.h b/include/net/sock.h >>> index d997ec13a643..db8301c76d50 100644 >>> --- a/include/net/sock.h >>> +++ b/include/net/sock.h >>> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct >>> sock *csk) >>> { >>> struct sock *sk = (struct sock *)csk; >>> >>> - return lockdep_is_held(>sk_lock) || >>> + return !debug_locks || >>> + lockdep_is_held(>sk_lock) || >>>lockdep_is_held(>sk_lock.slock); >>> } >>> #endif >> >> I would prefer to add debug_locks at the WARN_ON level, like >> WARN_ON(debug_locks && !lockdep_sock_is_held(sk)), but I am not sure if >> this fixes the initial splat. > > Can we finish this conversation out and come up with a final patch > for this soon? Eric's patch is worth to apply anyway, but I am not sure if it solves the (fundamental) problem. I couldn't reproduce it with the exact next- tag provided in the initial mail. All other reports also only happend with linux-next and not net-next. I hope I Valdis provides his config soon and I will continue my analysis on this then. Thanks, Hannes
Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
From: Hannes Frederic SowaDate: Thu, 21 Apr 2016 15:49:37 +0200 > On 21.04.2016 15:31, Eric Dumazet wrote: >> On Thu, 2016-04-21 at 05:05 -0400, valdis.kletni...@vt.edu wrote: >>> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said: Hi, On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote: > linux-next 20160420 is whining at an incredible rate - in 20 minutes of > uptime, I piled up some 41,000 hits from all over the place (cleaned up > to skip the CPU and PID so the list isn't quite so long): Thanks for the report. Can you give me some more details: Is this an nfs socket? Do you by accident know if this socket went through xs_reclassify_socket at any point? We do hold the appropriate locks at that point but I fear that the lockdep reinitialization confused lockdep. >>> >>> It wasn't an NFS socket, as NFS wasn't even active at the time. I'm >>> reasonably >>> sure that multiple sockets were in play, given that tcp_v6_rcv and >>> udpv6_queue_rcv_skb were both implicated. I strongly suspect that pretty >>> much >>> any IPv6 traffic could do it - the frequency dropped off quite a bit when I >>> closed firefox, which is usually a heavy network hitter on my laptop. >> >> >> Looks like the following patch is needed, can you try it please ? >> >> Thanks ! >> >> diff --git a/include/net/sock.h b/include/net/sock.h >> index d997ec13a643..db8301c76d50 100644 >> --- a/include/net/sock.h >> +++ b/include/net/sock.h >> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct >> sock *csk) >> { >> struct sock *sk = (struct sock *)csk; >> >> -return lockdep_is_held(>sk_lock) || >> +return !debug_locks || >> + lockdep_is_held(>sk_lock) || >> lockdep_is_held(>sk_lock.slock); >> } >> #endif > > I would prefer to add debug_locks at the WARN_ON level, like > WARN_ON(debug_locks && !lockdep_sock_is_held(sk)), but I am not sure if > this fixes the initial splat. Can we finish this conversation out and come up with a final patch for this soon? Thanks.
Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
On 21.04.2016 15:31, Eric Dumazet wrote: > On Thu, 2016-04-21 at 05:05 -0400, valdis.kletni...@vt.edu wrote: >> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said: >>> Hi, >>> >>> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote: linux-next 20160420 is whining at an incredible rate - in 20 minutes of uptime, I piled up some 41,000 hits from all over the place (cleaned up to skip the CPU and PID so the list isn't quite so long): >>> >>> Thanks for the report. Can you give me some more details: >>> >>> Is this an nfs socket? Do you by accident know if this socket went >>> through xs_reclassify_socket at any point? We do hold the appropriate >>> locks at that point but I fear that the lockdep reinitialization >>> confused lockdep. >> >> It wasn't an NFS socket, as NFS wasn't even active at the time. I'm >> reasonably >> sure that multiple sockets were in play, given that tcp_v6_rcv and >> udpv6_queue_rcv_skb were both implicated. I strongly suspect that pretty >> much >> any IPv6 traffic could do it - the frequency dropped off quite a bit when I >> closed firefox, which is usually a heavy network hitter on my laptop. > > > Looks like the following patch is needed, can you try it please ? > > Thanks ! > > diff --git a/include/net/sock.h b/include/net/sock.h > index d997ec13a643..db8301c76d50 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct > sock *csk) > { > struct sock *sk = (struct sock *)csk; > > - return lockdep_is_held(>sk_lock) || > + return !debug_locks || > +lockdep_is_held(>sk_lock) || > lockdep_is_held(>sk_lock.slock); > } > #endif I am a little bit lost because I cannot reproduce this bug. I thought maybe it has something to do with single cpu spin_locks which don't update the lockdep_maps but I couldn't reproduce it. If debug_locks get flipped you should see something in dmesg, too. Maybe you have this handy? Was there another lockdep splat before the networking ones? Also the config would be helpful. Thanks, Hannes
Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
On 21.04.2016 15:31, Eric Dumazet wrote: > On Thu, 2016-04-21 at 05:05 -0400, valdis.kletni...@vt.edu wrote: >> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said: >>> Hi, >>> >>> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote: linux-next 20160420 is whining at an incredible rate - in 20 minutes of uptime, I piled up some 41,000 hits from all over the place (cleaned up to skip the CPU and PID so the list isn't quite so long): >>> >>> Thanks for the report. Can you give me some more details: >>> >>> Is this an nfs socket? Do you by accident know if this socket went >>> through xs_reclassify_socket at any point? We do hold the appropriate >>> locks at that point but I fear that the lockdep reinitialization >>> confused lockdep. >> >> It wasn't an NFS socket, as NFS wasn't even active at the time. I'm >> reasonably >> sure that multiple sockets were in play, given that tcp_v6_rcv and >> udpv6_queue_rcv_skb were both implicated. I strongly suspect that pretty >> much >> any IPv6 traffic could do it - the frequency dropped off quite a bit when I >> closed firefox, which is usually a heavy network hitter on my laptop. > > > Looks like the following patch is needed, can you try it please ? > > Thanks ! > > diff --git a/include/net/sock.h b/include/net/sock.h > index d997ec13a643..db8301c76d50 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct > sock *csk) > { > struct sock *sk = (struct sock *)csk; > > - return lockdep_is_held(>sk_lock) || > + return !debug_locks || > +lockdep_is_held(>sk_lock) || > lockdep_is_held(>sk_lock.slock); > } > #endif I would prefer to add debug_locks at the WARN_ON level, like WARN_ON(debug_locks && !lockdep_sock_is_held(sk)), but I am not sure if this fixes the initial splat. Thanks Eric!
Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
On Thu, 2016-04-21 at 05:05 -0400, valdis.kletni...@vt.edu wrote: > On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said: > > Hi, > > > > On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote: > > > linux-next 20160420 is whining at an incredible rate - in 20 minutes of > > > uptime, I piled up some 41,000 hits from all over the place (cleaned up > > > to skip the CPU and PID so the list isn't quite so long): > > > > Thanks for the report. Can you give me some more details: > > > > Is this an nfs socket? Do you by accident know if this socket went > > through xs_reclassify_socket at any point? We do hold the appropriate > > locks at that point but I fear that the lockdep reinitialization > > confused lockdep. > > It wasn't an NFS socket, as NFS wasn't even active at the time. I'm > reasonably > sure that multiple sockets were in play, given that tcp_v6_rcv and > udpv6_queue_rcv_skb were both implicated. I strongly suspect that pretty much > any IPv6 traffic could do it - the frequency dropped off quite a bit when I > closed firefox, which is usually a heavy network hitter on my laptop. Looks like the following patch is needed, can you try it please ? Thanks ! diff --git a/include/net/sock.h b/include/net/sock.h index d997ec13a643..db8301c76d50 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk) { struct sock *sk = (struct sock *)csk; - return lockdep_is_held(>sk_lock) || + return !debug_locks || + lockdep_is_held(>sk_lock) || lockdep_is_held(>sk_lock.slock); } #endif
Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said: > Hi, > > On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote: > > linux-next 20160420 is whining at an incredible rate - in 20 minutes of > > uptime, I piled up some 41,000 hits from all over the place (cleaned up > > to skip the CPU and PID so the list isn't quite so long): > > Thanks for the report. Can you give me some more details: > > Is this an nfs socket? Do you by accident know if this socket went > through xs_reclassify_socket at any point? We do hold the appropriate > locks at that point but I fear that the lockdep reinitialization > confused lockdep. It wasn't an NFS socket, as NFS wasn't even active at the time. I'm reasonably sure that multiple sockets were in play, given that tcp_v6_rcv and udpv6_queue_rcv_skb were both implicated. I strongly suspect that pretty much any IPv6 traffic could do it - the frequency dropped off quite a bit when I closed firefox, which is usually a heavy network hitter on my laptop. pgp33P5xQXd_u.pgp Description: PGP signature
Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
Hi, On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote: > linux-next 20160420 is whining at an incredible rate - in 20 minutes of > uptime, I piled up some 41,000 hits from all over the place (cleaned up > to skip the CPU and PID so the list isn't quite so long): Thanks for the report. Can you give me some more details: Is this an nfs socket? Do you by accident know if this socket went through xs_reclassify_socket at any point? We do hold the appropriate locks at that point but I fear that the lockdep reinitialization confused lockdep. Thanks, Hannes
linux-next: zillions of lockdep whinges in include/net/sock.h:1408
linux-next 20160420 is whining at an incredible rate - in 20 minutes of uptime, I piled up some 41,000 hits from all over the place (cleaned up to skip the CPU and PID so the list isn't quite so long): % grep include/net/sock.h /var/log/messages | cut -f5- -d: | sed -e 's/PID: [0-9]* /PID: (elided) /' -e 's/CPU: [0-3]/CPU: +/' | sort | uniq -c | sort -nr 13468 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_v6_rcv+0xc20/0xcb0 9770 CPU: + PID: (elided) at include/net/sock.h:1408 udp_queue_rcv_skb+0x3ca/0x6d0 7706 CPU: + PID: (elided) at include/net/sock.h:1408 sock_owned_by_user+0x91/0xa0 2818 CPU: + PID: (elided) at include/net/sock.h:1408 udpv6_queue_rcv_skb+0x3b6/0x6d0 1981 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_write_timer+0xf2/0x110 1954 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_delack_timer+0x110/0x130 1912 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_keepalive_timer+0x136/0x2c0 882 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_close+0x226/0x4f0 804 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_tasklet_func+0x192/0x1e0 28 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_child_process+0x17a/0x350 2 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_v6_err+0x401/0x660 2 CPU: + PID: (elided) at include/net/sock.h:1408 tcp_v6_err+0x1fd/0x660 Seems to be from this commit, which is apparently over-stringent or isn't handling some case correctly: commit fafc4e1ea1a4c1eb13a30c9426fb799f5efacbc3 Author: Hannes Frederic SowaDate: Fri Apr 8 15:11:27 2016 +0200 sock: tigthen lockdep checks for sock_owned_by_user sock_owned_by_user should not be used without socket lock held. It seems to be a common practice to check .owned before lock reclassification, so provide a little help to abstract this check away. Cc: linux-c...@vger.kernel.org Cc: linux-blueto...@vger.kernel.org Cc: linux-...@vger.kernel.org Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller pgpzfUwGbQWVc.pgp Description: PGP signature