subject:"linux\-next\: zillions of lockdep whinges in include\/net\/sock.h\:1408"

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-25 Thread Hannes Frederic Sowa

On Sun, Apr 24, 2016, at 23:25, Eric Dumazet wrote:
>  #ifdef CONFIG_LOCKDEP
> -   WARN_ON(!lockdep_sock_is_held(sk));
> +   WARN_ON_ONCE(!lockdep_sock_is_held(sk) && !debug_locks);
>  #endif

Eric, could you resend this patch without the negation and also add my
acked-by (I came finally around to test it).

Thanks,
Hannes

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-24 Thread Eric Dumazet

On Sun, 2016-04-24 at 14:25 -0700, Eric Dumazet wrote:
> On Sun, 2016-04-24 at 17:13 -0400, valdis.kletni...@vt.edu wrote:
> > On Sun, 24 Apr 2016 14:00:17 -0700, Eric Dumazet said:
> > > On Sun, 2016-04-24 at 15:56 -0400, valdis.kletni...@vt.edu wrote:
> > > > On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said:
> > > >
> > > > > >>> + return !debug_locks ||
> > > > > >>> +lockdep_is_held(>sk_lock) ||
> > > >
> > > > > Issue here is that once lockdep detected a problem (not necessarily in
> > > > > net/ tree btw), your helper always 'detect' a problem, since lockdep
> > > > > automatically disables itself.
> > > >
> > > > "D'Oh!" -- H. Simpson
> > > >
> > > > I thought this patch looked suspect, but couldn't put my finger on it. 
> > > > The
> > > > reason why I got like 41,000 of them is because I built a kernel that 
> > > > has
> > > > lockdep enabled, but I have an out-of-tree module that doesn't init 
> > > > something,
> > > > so I get this:
> > > >
> > > > [   48.898156] INFO: trying to register non-static key.
> > > > [   48.898157] the code is fine but needs lockdep annotation.
> > > > [   48.898157] turning off the locking correctness validator.
> > > >
> > > > After which point, even with this patch, every time through it's still 
> > > > going to
> > > > explode.
> > >
> > > Which patch are you talking about ?
> > 
> > The one that adds the !debug_locks check - once my out-of-kernel module
> > hits something that turns off lockdep, it's *still* going to complain on
> > pretty much all the same packets it complained about earlier.  I thought
> > it looked suspicious, but you clarified why...
> 
> It does not make sense to me. If lockdep is disabled, then debug_locks
> is 0.
> 
> So no complain should happen from networking.
> 
> I was about to send following patch, please check it solves the issue. ?
> 
> (It certainly did for me, once I forced a lockdep splat loading a buggy
> module)
> 
> Thanks
> 
> From: Eric Dumazet 
> 
> Valdis reported tons of stack dumps caused by WARN_ON() in 
> sock_owned_by_user()
> 
> This test needs to be relaxed if/when lockdep disables itself.
> 
> Note that other lockdep_sock_is_held() callers are all from
> rcu_dereference_protected() sections which already are disabled
> if/when lockdep has been disabled.
> 
> Fixes: fafc4e1ea1a4 ("sock: tigthen lockdep checks for sock_owned_by_user")
> Reported-by: Valdis Kletnieks 
> Signed-off-by: Eric Dumazet 
> ---
>  include/net/sock.h |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 52448baf19d7..f492d01512ed 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1409,7 +1409,7 @@ static inline void unlock_sock_fast(struct sock *sk, 
> bool slow)
>  static inline bool sock_owned_by_user(const struct sock *sk)
>  {
>  #ifdef CONFIG_LOCKDEP
> - WARN_ON(!lockdep_sock_is_held(sk));
> + WARN_ON_ONCE(!lockdep_sock_is_held(sk) && !debug_locks);


Silly me, I tested the opposite test of course :

WARN_ON_ONCE(!lockdep_sock_is_held(sk) && debug_locks);

>  #endif
>   return sk->sk_lock.owned;
>  }
>

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-24 Thread Eric Dumazet

On Sun, 2016-04-24 at 17:13 -0400, valdis.kletni...@vt.edu wrote:
> On Sun, 24 Apr 2016 14:00:17 -0700, Eric Dumazet said:
> > On Sun, 2016-04-24 at 15:56 -0400, valdis.kletni...@vt.edu wrote:
> > > On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said:
> > >
> > > > >>> +   return !debug_locks ||
> > > > >>> +  lockdep_is_held(>sk_lock) ||
> > >
> > > > Issue here is that once lockdep detected a problem (not necessarily in
> > > > net/ tree btw), your helper always 'detect' a problem, since lockdep
> > > > automatically disables itself.
> > >
> > > "D'Oh!" -- H. Simpson
> > >
> > > I thought this patch looked suspect, but couldn't put my finger on it. The
> > > reason why I got like 41,000 of them is because I built a kernel that has
> > > lockdep enabled, but I have an out-of-tree module that doesn't init 
> > > something,
> > > so I get this:
> > >
> > > [   48.898156] INFO: trying to register non-static key.
> > > [   48.898157] the code is fine but needs lockdep annotation.
> > > [   48.898157] turning off the locking correctness validator.
> > >
> > > After which point, even with this patch, every time through it's still 
> > > going to
> > > explode.
> >
> > Which patch are you talking about ?
> 
> The one that adds the !debug_locks check - once my out-of-kernel module
> hits something that turns off lockdep, it's *still* going to complain on
> pretty much all the same packets it complained about earlier.  I thought
> it looked suspicious, but you clarified why...

It does not make sense to me. If lockdep is disabled, then debug_locks
is 0.

So no complain should happen from networking.

I was about to send following patch, please check it solves the issue. ?

(It certainly did for me, once I forced a lockdep splat loading a buggy
module)

Thanks

From: Eric Dumazet 

Valdis reported tons of stack dumps caused by WARN_ON() in sock_owned_by_user()

This test needs to be relaxed if/when lockdep disables itself.

Note that other lockdep_sock_is_held() callers are all from
rcu_dereference_protected() sections which already are disabled
if/when lockdep has been disabled.

Fixes: fafc4e1ea1a4 ("sock: tigthen lockdep checks for sock_owned_by_user")
Reported-by: Valdis Kletnieks 
Signed-off-by: Eric Dumazet 
---
 include/net/sock.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 52448baf19d7..f492d01512ed 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1409,7 +1409,7 @@ static inline void unlock_sock_fast(struct sock *sk, bool 
slow)
 static inline bool sock_owned_by_user(const struct sock *sk)
 {
 #ifdef CONFIG_LOCKDEP
-   WARN_ON(!lockdep_sock_is_held(sk));
+   WARN_ON_ONCE(!lockdep_sock_is_held(sk) && !debug_locks);
 #endif
return sk->sk_lock.owned;
 }

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-24 Thread Valdis . Kletnieks

On Sun, 24 Apr 2016 14:00:17 -0700, Eric Dumazet said:
> On Sun, 2016-04-24 at 15:56 -0400, valdis.kletni...@vt.edu wrote:
> > On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said:
> >
> > > >>> + return !debug_locks ||
> > > >>> +lockdep_is_held(>sk_lock) ||
> >
> > > Issue here is that once lockdep detected a problem (not necessarily in
> > > net/ tree btw), your helper always 'detect' a problem, since lockdep
> > > automatically disables itself.
> >
> > "D'Oh!" -- H. Simpson
> >
> > I thought this patch looked suspect, but couldn't put my finger on it. The
> > reason why I got like 41,000 of them is because I built a kernel that has
> > lockdep enabled, but I have an out-of-tree module that doesn't init 
> > something,
> > so I get this:
> >
> > [   48.898156] INFO: trying to register non-static key.
> > [   48.898157] the code is fine but needs lockdep annotation.
> > [   48.898157] turning off the locking correctness validator.
> >
> > After which point, even with this patch, every time through it's still 
> > going to
> > explode.
>
> Which patch are you talking about ?

The one that adds the !debug_locks check - once my out-of-kernel module
hits something that turns off lockdep, it's *still* going to complain on
pretty much all the same packets it complained about earlier.  I thought
it looked suspicious, but you clarified why...


pgpMNDGLbxiui.pgp
Description: PGP signature

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-24 Thread Eric Dumazet

On Sun, 2016-04-24 at 15:56 -0400, valdis.kletni...@vt.edu wrote:
> On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said:
> 
> > >>> +   return !debug_locks ||
> > >>> +  lockdep_is_held(>sk_lock) ||
> 
> > Issue here is that once lockdep detected a problem (not necessarily in
> > net/ tree btw), your helper always 'detect' a problem, since lockdep
> > automatically disables itself.
> 
> "D'Oh!" -- H. Simpson
> 
> I thought this patch looked suspect, but couldn't put my finger on it. The
> reason why I got like 41,000 of them is because I built a kernel that has
> lockdep enabled, but I have an out-of-tree module that doesn't init something,
> so I get this:
> 
> [   48.898156] INFO: trying to register non-static key.
> [   48.898157] the code is fine but needs lockdep annotation.
> [   48.898157] turning off the locking correctness validator.
> 
> After which point, even with this patch, every time through it's still going 
> to
> explode.

Which patch are you talking about ?

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-24 Thread Eric Dumazet

On Sun, 2016-04-24 at 14:54 -0400, David Miller wrote:
> From: Hannes Frederic Sowa 
> Date: Sun, 24 Apr 2016 20:48:24 +0200
> 
> > Eric's patch is worth to apply anyway, but I am not sure if it solves
> > the (fundamental) problem. I couldn't reproduce it with the exact next-
> > tag provided in the initial mail. All other reports also only happend
> > with linux-next and not net-next.
> 
> Ok, Eric please submit it formally.
> 
> Thanks!

Yes, I will first test it, and will take Hannes suggestion as well.

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-24 Thread Valdis . Kletnieks

On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said:

> >>> + return !debug_locks ||
> >>> +lockdep_is_held(>sk_lock) ||

> Issue here is that once lockdep detected a problem (not necessarily in
> net/ tree btw), your helper always 'detect' a problem, since lockdep
> automatically disables itself.

"D'Oh!" -- H. Simpson

I thought this patch looked suspect, but couldn't put my finger on it. The
reason why I got like 41,000 of them is because I built a kernel that has
lockdep enabled, but I have an out-of-tree module that doesn't init something,
so I get this:

[   48.898156] INFO: trying to register non-static key.
[   48.898157] the code is fine but needs lockdep annotation.
[   48.898157] turning off the locking correctness validator.

After which point, even with this patch, every time through it's still going to
explode.




pgpPugvHosjHN.pgp
Description: PGP signature

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-24 Thread Eric Dumazet

On Sun, 2016-04-24 at 20:48 +0200, Hannes Frederic Sowa wrote:
> On 24.04.2016 20:38, David Miller wrote:
> > From: Hannes Frederic Sowa 
> > Date: Thu, 21 Apr 2016 15:49:37 +0200
> > 
> >> On 21.04.2016 15:31, Eric Dumazet wrote:
> >>> On Thu, 2016-04-21 at 05:05 -0400, valdis.kletni...@vt.edu wrote:
>  On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
> > Hi,
> >
> > On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
> >> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
> >> uptime, I piled up some 41,000 hits from all over the place (cleaned up
> >> to skip the CPU and PID so the list isn't quite so long):
> >
> > Thanks for the report. Can you give me some more details:
> >
> > Is this an nfs socket? Do you by accident know if this socket went
> > through xs_reclassify_socket at any point? We do hold the appropriate
> > locks at that point but I fear that the lockdep reinitialization
> > confused lockdep.
> 
>  It wasn't an NFS socket, as NFS wasn't even active at the time.  I'm 
>  reasonably
>  sure that multiple sockets were in play, given that tcp_v6_rcv and
>  udpv6_queue_rcv_skb were both implicated.  I strongly suspect that 
>  pretty much
>  any IPv6 traffic could do it - the frequency dropped off quite a bit 
>  when I
>  closed firefox, which is usually a heavy network hitter on my laptop.
> >>>
> >>>
> >>> Looks like the following patch is needed, can you try it please ?
> >>>
> >>> Thanks !
> >>>
> >>> diff --git a/include/net/sock.h b/include/net/sock.h
> >>> index d997ec13a643..db8301c76d50 100644
> >>> --- a/include/net/sock.h
> >>> +++ b/include/net/sock.h
> >>> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const 
> >>> struct sock *csk)
> >>>  {
> >>>   struct sock *sk = (struct sock *)csk;
> >>>  
> >>> - return lockdep_is_held(>sk_lock) ||
> >>> + return !debug_locks ||
> >>> +lockdep_is_held(>sk_lock) ||
> >>>  lockdep_is_held(>sk_lock.slock);
> >>>  }
> >>>  #endif
> >>
> >> I would prefer to add debug_locks at the WARN_ON level, like
> >> WARN_ON(debug_locks && !lockdep_sock_is_held(sk)), but I am not sure if
> >> this fixes the initial splat.
> > 
> > Can we finish this conversation out and come up with a final patch
> > for this soon?
> 
> Eric's patch is worth to apply anyway, but I am not sure if it solves
> the (fundamental) problem. I couldn't reproduce it with the exact next-
> tag provided in the initial mail. All other reports also only happend
> with linux-next and not net-next.
> 
> I hope I Valdis provides his config soon and I will continue my analysis
> on this then.

Should be easy to force a lockdep splat and check if the patch solves
the issue.

Issue here is that once lockdep detected a problem (not necessarily in
net/ tree btw), your helper always 'detect' a problem, since lockdep
automatically disables itself.

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-24 Thread David Miller

From: Hannes Frederic Sowa 
Date: Sun, 24 Apr 2016 20:48:24 +0200

> Eric's patch is worth to apply anyway, but I am not sure if it solves
> the (fundamental) problem. I couldn't reproduce it with the exact next-
> tag provided in the initial mail. All other reports also only happend
> with linux-next and not net-next.

Ok, Eric please submit it formally.

Thanks!

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-24 Thread Hannes Frederic Sowa

On 24.04.2016 20:38, David Miller wrote:
> From: Hannes Frederic Sowa 
> Date: Thu, 21 Apr 2016 15:49:37 +0200
> 
>> On 21.04.2016 15:31, Eric Dumazet wrote:
>>> On Thu, 2016-04-21 at 05:05 -0400, valdis.kletni...@vt.edu wrote:
 On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
> Hi,
>
> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
>> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
>> uptime, I piled up some 41,000 hits from all over the place (cleaned up
>> to skip the CPU and PID so the list isn't quite so long):
>
> Thanks for the report. Can you give me some more details:
>
> Is this an nfs socket? Do you by accident know if this socket went
> through xs_reclassify_socket at any point? We do hold the appropriate
> locks at that point but I fear that the lockdep reinitialization
> confused lockdep.

 It wasn't an NFS socket, as NFS wasn't even active at the time.  I'm 
 reasonably
 sure that multiple sockets were in play, given that tcp_v6_rcv and
 udpv6_queue_rcv_skb were both implicated.  I strongly suspect that pretty 
 much
 any IPv6 traffic could do it - the frequency dropped off quite a bit when I
 closed firefox, which is usually a heavy network hitter on my laptop.
>>>
>>>
>>> Looks like the following patch is needed, can you try it please ?
>>>
>>> Thanks !
>>>
>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>> index d997ec13a643..db8301c76d50 100644
>>> --- a/include/net/sock.h
>>> +++ b/include/net/sock.h
>>> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct 
>>> sock *csk)
>>>  {
>>> struct sock *sk = (struct sock *)csk;
>>>  
>>> -   return lockdep_is_held(>sk_lock) ||
>>> +   return !debug_locks ||
>>> +  lockdep_is_held(>sk_lock) ||
>>>lockdep_is_held(>sk_lock.slock);
>>>  }
>>>  #endif
>>
>> I would prefer to add debug_locks at the WARN_ON level, like
>> WARN_ON(debug_locks && !lockdep_sock_is_held(sk)), but I am not sure if
>> this fixes the initial splat.
> 
> Can we finish this conversation out and come up with a final patch
> for this soon?

Eric's patch is worth to apply anyway, but I am not sure if it solves
the (fundamental) problem. I couldn't reproduce it with the exact next-
tag provided in the initial mail. All other reports also only happend
with linux-next and not net-next.

I hope I Valdis provides his config soon and I will continue my analysis
on this then.

Thanks,
Hannes

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-24 Thread David Miller

From: Hannes Frederic Sowa 
Date: Thu, 21 Apr 2016 15:49:37 +0200

> On 21.04.2016 15:31, Eric Dumazet wrote:
>> On Thu, 2016-04-21 at 05:05 -0400, valdis.kletni...@vt.edu wrote:
>>> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
 Hi,

 On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
> uptime, I piled up some 41,000 hits from all over the place (cleaned up
> to skip the CPU and PID so the list isn't quite so long):

 Thanks for the report. Can you give me some more details:

 Is this an nfs socket? Do you by accident know if this socket went
 through xs_reclassify_socket at any point? We do hold the appropriate
 locks at that point but I fear that the lockdep reinitialization
 confused lockdep.
>>>
>>> It wasn't an NFS socket, as NFS wasn't even active at the time.  I'm 
>>> reasonably
>>> sure that multiple sockets were in play, given that tcp_v6_rcv and
>>> udpv6_queue_rcv_skb were both implicated.  I strongly suspect that pretty 
>>> much
>>> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
>>> closed firefox, which is usually a heavy network hitter on my laptop.
>> 
>> 
>> Looks like the following patch is needed, can you try it please ?
>> 
>> Thanks !
>> 
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index d997ec13a643..db8301c76d50 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct 
>> sock *csk)
>>  {
>>  struct sock *sk = (struct sock *)csk;
>>  
>> -return lockdep_is_held(>sk_lock) ||
>> +return !debug_locks ||
>> +   lockdep_is_held(>sk_lock) ||
>> lockdep_is_held(>sk_lock.slock);
>>  }
>>  #endif
> 
> I would prefer to add debug_locks at the WARN_ON level, like
> WARN_ON(debug_locks && !lockdep_sock_is_held(sk)), but I am not sure if
> this fixes the initial splat.

Can we finish this conversation out and come up with a final patch
for this soon?

Thanks.

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-21 Thread Hannes Frederic Sowa

On 21.04.2016 15:31, Eric Dumazet wrote:
> On Thu, 2016-04-21 at 05:05 -0400, valdis.kletni...@vt.edu wrote:
>> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
>>> Hi,
>>>
>>> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
 linux-next 20160420 is whining at an incredible rate - in 20 minutes of
 uptime, I piled up some 41,000 hits from all over the place (cleaned up
 to skip the CPU and PID so the list isn't quite so long):
>>>
>>> Thanks for the report. Can you give me some more details:
>>>
>>> Is this an nfs socket? Do you by accident know if this socket went
>>> through xs_reclassify_socket at any point? We do hold the appropriate
>>> locks at that point but I fear that the lockdep reinitialization
>>> confused lockdep.
>>
>> It wasn't an NFS socket, as NFS wasn't even active at the time.  I'm 
>> reasonably
>> sure that multiple sockets were in play, given that tcp_v6_rcv and
>> udpv6_queue_rcv_skb were both implicated.  I strongly suspect that pretty 
>> much
>> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
>> closed firefox, which is usually a heavy network hitter on my laptop.
> 
> 
> Looks like the following patch is needed, can you try it please ?
> 
> Thanks !
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index d997ec13a643..db8301c76d50 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct 
> sock *csk)
>  {
>   struct sock *sk = (struct sock *)csk;
>  
> - return lockdep_is_held(>sk_lock) ||
> + return !debug_locks ||
> +lockdep_is_held(>sk_lock) ||
>  lockdep_is_held(>sk_lock.slock);
>  }
>  #endif

I am a little bit lost because I cannot reproduce this bug. I thought
maybe it has something to do with single cpu spin_locks which don't
update the lockdep_maps but I couldn't reproduce it. If debug_locks get
flipped you should see something in dmesg, too. Maybe you have this
handy? Was there another lockdep splat before the networking ones? Also
the config would be helpful.

Thanks,
Hannes

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-21 Thread Hannes Frederic Sowa

On 21.04.2016 15:31, Eric Dumazet wrote:
> On Thu, 2016-04-21 at 05:05 -0400, valdis.kletni...@vt.edu wrote:
>> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
>>> Hi,
>>>
>>> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
 linux-next 20160420 is whining at an incredible rate - in 20 minutes of
 uptime, I piled up some 41,000 hits from all over the place (cleaned up
 to skip the CPU and PID so the list isn't quite so long):
>>>
>>> Thanks for the report. Can you give me some more details:
>>>
>>> Is this an nfs socket? Do you by accident know if this socket went
>>> through xs_reclassify_socket at any point? We do hold the appropriate
>>> locks at that point but I fear that the lockdep reinitialization
>>> confused lockdep.
>>
>> It wasn't an NFS socket, as NFS wasn't even active at the time.  I'm 
>> reasonably
>> sure that multiple sockets were in play, given that tcp_v6_rcv and
>> udpv6_queue_rcv_skb were both implicated.  I strongly suspect that pretty 
>> much
>> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
>> closed firefox, which is usually a heavy network hitter on my laptop.
> 
> 
> Looks like the following patch is needed, can you try it please ?
> 
> Thanks !
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index d997ec13a643..db8301c76d50 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct 
> sock *csk)
>  {
>   struct sock *sk = (struct sock *)csk;
>  
> - return lockdep_is_held(>sk_lock) ||
> + return !debug_locks ||
> +lockdep_is_held(>sk_lock) ||
>  lockdep_is_held(>sk_lock.slock);
>  }
>  #endif

I would prefer to add debug_locks at the WARN_ON level, like
WARN_ON(debug_locks && !lockdep_sock_is_held(sk)), but I am not sure if
this fixes the initial splat.

Thanks Eric!

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-21 Thread Eric Dumazet

On Thu, 2016-04-21 at 05:05 -0400, valdis.kletni...@vt.edu wrote:
> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
> > Hi,
> >
> > On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
> > > linux-next 20160420 is whining at an incredible rate - in 20 minutes of
> > > uptime, I piled up some 41,000 hits from all over the place (cleaned up
> > > to skip the CPU and PID so the list isn't quite so long):
> >
> > Thanks for the report. Can you give me some more details:
> >
> > Is this an nfs socket? Do you by accident know if this socket went
> > through xs_reclassify_socket at any point? We do hold the appropriate
> > locks at that point but I fear that the lockdep reinitialization
> > confused lockdep.
> 
> It wasn't an NFS socket, as NFS wasn't even active at the time.  I'm 
> reasonably
> sure that multiple sockets were in play, given that tcp_v6_rcv and
> udpv6_queue_rcv_skb were both implicated.  I strongly suspect that pretty much
> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
> closed firefox, which is usually a heavy network hitter on my laptop.


Looks like the following patch is needed, can you try it please ?

Thanks !

diff --git a/include/net/sock.h b/include/net/sock.h
index d997ec13a643..db8301c76d50 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock 
*csk)
 {
struct sock *sk = (struct sock *)csk;
 
-   return lockdep_is_held(>sk_lock) ||
+   return !debug_locks ||
+  lockdep_is_held(>sk_lock) ||
   lockdep_is_held(>sk_lock.slock);
 }
 #endif

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-21 Thread Valdis . Kletnieks

On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
> Hi,
>
> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
> > linux-next 20160420 is whining at an incredible rate - in 20 minutes of
> > uptime, I piled up some 41,000 hits from all over the place (cleaned up
> > to skip the CPU and PID so the list isn't quite so long):
>
> Thanks for the report. Can you give me some more details:
>
> Is this an nfs socket? Do you by accident know if this socket went
> through xs_reclassify_socket at any point? We do hold the appropriate
> locks at that point but I fear that the lockdep reinitialization
> confused lockdep.

It wasn't an NFS socket, as NFS wasn't even active at the time.  I'm reasonably
sure that multiple sockets were in play, given that tcp_v6_rcv and
udpv6_queue_rcv_skb were both implicated.  I strongly suspect that pretty much
any IPv6 traffic could do it - the frequency dropped off quite a bit when I
closed firefox, which is usually a heavy network hitter on my laptop.

pgp33P5xQXd_u.pgp
Description: PGP signature

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-21 Thread Hannes Frederic Sowa

Hi,

On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
> uptime, I piled up some 41,000 hits from all over the place (cleaned up
> to skip the CPU and PID so the list isn't quite so long):

Thanks for the report. Can you give me some more details:

Is this an nfs socket? Do you by accident know if this socket went
through xs_reclassify_socket at any point? We do hold the appropriate
locks at that point but I fear that the lockdep reinitialization
confused lockdep.

Thanks,
Hannes

linux-next: zillions of lockdep whinges in include/net/sock.h:1408

2016-04-20 Thread Valdis Kletnieks

linux-next 20160420 is whining at an incredible rate - in 20 minutes of
uptime, I piled up some 41,000 hits from all over the place (cleaned up
to skip the CPU and PID so the list isn't quite so long):

% grep include/net/sock.h /var/log/messages | cut -f5- -d: |  sed -e 's/PID: 
[0-9]* /PID: (elided) /' -e 's/CPU: [0-3]/CPU: +/' | sort | uniq -c | sort -nr
  13468  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_v6_rcv+0xc20/0xcb0
   9770  CPU: + PID: (elided) at include/net/sock.h:1408 
udp_queue_rcv_skb+0x3ca/0x6d0
   7706  CPU: + PID: (elided) at include/net/sock.h:1408 
sock_owned_by_user+0x91/0xa0
   2818  CPU: + PID: (elided) at include/net/sock.h:1408 
udpv6_queue_rcv_skb+0x3b6/0x6d0
   1981  CPU: + PID: (elided) at include/net/sock.h:1408 
tcp_write_timer+0xf2/0x110
   1954  CPU: + PID: (elided) at include/net/sock.h:1408 
tcp_delack_timer+0x110/0x130
   1912  CPU: + PID: (elided) at include/net/sock.h:1408 
tcp_keepalive_timer+0x136/0x2c0
882  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_close+0x226/0x4f0
804  CPU: + PID: (elided) at include/net/sock.h:1408 
tcp_tasklet_func+0x192/0x1e0
 28  CPU: + PID: (elided) at include/net/sock.h:1408 
tcp_child_process+0x17a/0x350
  2  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_v6_err+0x401/0x660
  2  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_v6_err+0x1fd/0x660

Seems to be from this commit, which is apparently over-stringent or
isn't handling some case correctly:

commit fafc4e1ea1a4c1eb13a30c9426fb799f5efacbc3
Author: Hannes Frederic Sowa 
Date:   Fri Apr 8 15:11:27 2016 +0200

sock: tigthen lockdep checks for sock_owned_by_user

sock_owned_by_user should not be used without socket lock held. It seems
to be a common practice to check .owned before lock reclassification, so
provide a little help to abstract this check away.

Cc: linux-c...@vger.kernel.org
Cc: linux-blueto...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Signed-off-by: Hannes Frederic Sowa 
Signed-off-by: David S. Miller 



pgpzfUwGbQWVc.pgp
Description: PGP signature

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408

linux-next: zillions of lockdep whinges in include/net/sock.h:1408

17 matches

Site Navigation

Mail list logo

Footer information