On 6/22/06, Ian McDonald <[EMAIL PROTECTED]> wrote:
On 6/21/06, Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> On Wed, 2006-06-21 at 10:34 +1000, Herbert Xu wrote:
> > > As I read this it is not a recursive lock as sk_clone is occurring
> > > second and is actually creating a new socket so they are trying to
> > > lock on different sockets.
> > >
> > > Can someone tell me whether I am correct in my thinking or not? If I
> > > am then I will work out how to tell the lock validator not to worry
> > > about it.
> >
> > I agree, this looks bogus.  Ingo, could you please take a look?
>
> Fix is relatively easy:
>
>
> sk_clone creates a new socket, and thus can never deadlock, and in fact
> can be called with the original socket locked. This therefore is a
> legitimate nesting case; mark it as such.
>
> Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>
>
>
> ---
>  net/core/sock.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-2.6.17-rc6-mm2/net/core/sock.c
> ===================================================================
> --- linux-2.6.17-rc6-mm2.orig/net/core/sock.c
> +++ linux-2.6.17-rc6-mm2/net/core/sock.c
> @@ -846,7 +846,7 @@ struct sock *sk_clone(const struct sock
>                 /* SANITY */
>                 sk_node_init(&newsk->sk_node);
>                 sock_lock_init(newsk);
> -               bh_lock_sock(newsk);
> +               bh_lock_sock_nested(newsk);
>
>                 atomic_set(&newsk->sk_rmem_alloc, 0);
>                 atomic_set(&newsk->sk_wmem_alloc, 0);
>
>
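For reference, I assume bh_lock_sock_nested just takes the same slock
with a lockdep subclass, something like this (my sketch from reading
the source, not verbatim):

/* Sketch of include/net/sock.h: the _nested variant passes a lockdep
 * subclass, so taking the newly cloned socket's slock while the
 * parent's slock is held reads as deliberate one-level nesting
 * rather than recursion on a single lock class. */
#define bh_lock_sock(__sk)        spin_lock(&((__sk)->sk_lock.slock))
#define bh_lock_sock_nested(__sk) \
        spin_lock_nested(&((__sk)->sk_lock.slock), SINGLE_DEPTH_NESTING)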
When I apply this, the warning just shifts elsewhere. I'll investigate
further (probably tomorrow).

Now I get:

Jun 22 14:20:48 localhost kernel: [ 1276.424531] =============================================
Jun 22 14:20:48 localhost kernel: [ 1276.424541] [ INFO: possible recursive locking detected ]
Jun 22 14:20:48 localhost kernel: [ 1276.424546] ---------------------------------------------
Jun 22 14:20:48 localhost kernel: [ 1276.424553] idle/0 is trying to acquire lock:
Jun 22 14:20:48 localhost kernel: [ 1276.424559]  (&sk->sk_lock.slock#5/1){-+..}, at: [<c024594e>] sk_clone+0x5f/0x195
Jun 22 14:20:48 localhost kernel: [ 1276.424585]
Jun 22 14:20:48 localhost kernel: [ 1276.424587] but task is already holding lock:
Jun 22 14:20:48 localhost kernel: [ 1276.424592]  (&sk->sk_lock.slock#5/1){-+..}, at: [<c027cd87>] tcp_v4_rcv+0x42e/0x9b3
Jun 22 14:20:48 localhost kernel: [ 1276.424616]
Jun 22 14:20:48 localhost kernel: [ 1276.424618] other info that might help us debug this:
Jun 22 14:20:48 localhost kernel: [ 1276.424624] 2 locks held by idle/0:
Jun 22 14:20:48 localhost kernel: [ 1276.424628]  #0:  (&tp->rx_lock){-+..}, at: [<e0898915>] rtl8139_poll+0x42/0x41c [8139too]
Jun 22 14:20:48 localhost kernel: [ 1276.424666]  #1:  (&sk->sk_lock.slock#5/1){-+..}, at: [<c027cd87>] tcp_v4_rcv+0x42e/0x9b3
Jun 22 14:20:48 localhost kernel: [ 1276.424685]
Jun 22 14:20:48 localhost kernel: [ 1276.424686] stack backtrace:
Jun 22 14:20:48 localhost kernel: [ 1276.425002]  [<c0103a2a>] show_trace_log_lvl+0x53/0xff
Jun 22 14:20:48 localhost kernel: [ 1276.425038]  [<c0104078>] show_trace+0x16/0x19
Jun 22 14:20:48 localhost kernel: [ 1276.425068]  [<c010411e>] dump_stack+0x1a/0x1f
Jun 22 14:20:48 localhost kernel: [ 1276.425099]  [<c012d6cb>] __lock_acquire+0x8e6/0x902
Jun 22 14:20:48 localhost kernel: [ 1276.425311]  [<c012d879>] lock_acquire+0x4e/0x66
Jun 22 14:20:48 localhost kernel: [ 1276.425510]  [<c02989e1>] _spin_lock_nested+0x26/0x36
Jun 22 14:20:48 localhost kernel: [ 1276.425726]  [<c024594e>] sk_clone+0x5f/0x195
Jun 22 14:20:48 localhost kernel: [ 1276.427191]  [<c026d10f>] inet_csk_clone+0xf/0x67
Jun 22 14:20:48 localhost kernel: [ 1276.428879]  [<c027d3d0>] tcp_create_openreq_child+0x15/0x32b
Jun 22 14:20:48 localhost kernel: [ 1276.430598]  [<c027b383>] tcp_v4_syn_recv_sock+0x47/0x29c
Jun 22 14:20:48 localhost kernel: [ 1276.432313]  [<e0fcf440>] tcp_v6_syn_recv_sock+0x37/0x534 [ipv6]
Jun 22 14:20:48 localhost kernel: [ 1276.432482]  [<c027d886>] tcp_check_req+0x1a0/0x2db
Jun 22 14:20:48 localhost kernel: [ 1276.434198]  [<c027aecc>] tcp_v4_do_rcv+0x9f/0x2fe
Jun 22 14:20:48 localhost kernel: [ 1276.435911]  [<c027d28b>] tcp_v4_rcv+0x932/0x9b3
Jun 22 14:20:48 localhost kernel: [ 1276.437632]  [<c0265980>] ip_local_deliver+0x159/0x1f1
Jun 22 14:20:48 localhost kernel: [ 1276.439305]  [<c02657fa>] ip_rcv+0x3e9/0x416
Jun 22 14:20:48 localhost kernel: [ 1276.440977]  [<c024bba4>] netif_receive_skb+0x287/0x317
Jun 22 14:20:48 localhost kernel: [ 1276.442542]  [<e0898b67>] rtl8139_poll+0x294/0x41c [8139too]
Jun 22 14:20:48 localhost kernel: [ 1276.442590]  [<c024d585>] net_rx_action+0x8b/0x17c
Jun 22 14:20:48 localhost kernel: [ 1276.444160]  [<c011adf6>] __do_softirq+0x54/0xb3
Jun 22 14:20:48 localhost kernel: [ 1276.444335]  [<c011ae84>] do_softirq+0x2f/0x47
Jun 22 14:20:48 localhost kernel: [ 1276.444460]  [<c011b0a5>] irq_exit+0x39/0x46
Jun 22 14:20:48 localhost kernel: [ 1276.444585]  [<c0104f73>] do_IRQ+0x77/0x84
Jun 22 14:20:48 localhost kernel: [ 1276.444621]  [<c0103561>] common_interrupt+0x25/0x2c

OK. So this comes from the bh_lock_sock_nested in tcp_v4_rcv
(net/ipv4/tcp_ipv4.c), which I presume is clashing with the nested
annotation now in sk_clone.
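If I read the trace right, both acquisitions use the same class and the
same subclass, so the validator still sees one class taken twice at one
level. Roughly (hand-written sketch of the two call sites, with the
subclass inferred from the "slock#5/1" in the report, not actual
source):

/* tcp_v4_rcv() holds the listener's slock at subclass 1 ... */
int tcp_v4_rcv(struct sk_buff *skb)
{
        bh_lock_sock_nested(sk);        /* sk_lock.slock, subclass 1 */
        /* ... tcp_v4_do_rcv() -> tcp_check_req() ->
         *     tcp_v4_syn_recv_sock() -> sk_clone() ... */
}

/* ... and sk_clone() then takes the new socket's slock, also at
 * subclass 1, which is what triggers the recursion report. */
struct sock *sk_clone(const struct sock *sk, const gfp_t priority)
{
        bh_lock_sock_nested(newsk);     /* subclass 1 again */
}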

Can we not nest two levels deep, i.e. give the second acquisition its
own subclass?
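Something along these lines, say (purely hypothetical; the subclass
value 2 is made up here, and SINGLE_DEPTH_NESTING is 1):

/* Hypothetical second nesting level for the clone path, so it would
 * not collide with the subclass tcp_v4_rcv already holds. */
#define bh_lock_sock_nested2(__sk) \
        spin_lock_nested(&((__sk)->sk_lock.slock), 2)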

Is there extra documentation for the lock validator so that I can stop
asking stupid questions? If not, I'll just read more of the source
code.

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
