Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-15 Thread Xue, Ying

> For the case of responding to CONN_MANAGER messages, testing the socket
> owner is safe. But the solution seems unsafe when a TIPC_ERR_OVERLOAD
> message is sent.
> 
> When TIPC_ERR_OVERLOAD is sent out under BH context, the socket's owner flag is set.

Where? There are two locations where tipc_sk_respond() may be called under BH, in 
tipc_sk_enqueue() and filter_rcv(), but it is either-or: either we are under 
BH and the socket is not owned, or it is owned and we are not under BH.
I don't see at all what you are referring to.

[Ying] let's look at the following code again:
=====================================================================
static void tipc_sk_enqueue(struct sk_buff_head *inputq, struct sock *sk,
			    u32 dport)
{
	unsigned int lim;
	atomic_t *dcnt;
	struct sk_buff *skb;
	unsigned long time_limit = jiffies + 2;

	while (skb_queue_len(inputq)) {
		if (unlikely(time_after_eq(jiffies, time_limit)))
			return;

		skb = tipc_skb_dequeue(inputq, dport);
		if (unlikely(!skb))
			return;

		/* Add message directly to receive queue if possible */
		if (!sock_owned_by_user(sk)) {                  <-- (*1)
			filter_rcv(sk, skb);
			continue;
		}

		/* Try backlog, compensating for double-counted bytes */
		dcnt = &tipc_sk(sk)->dupl_rcvcnt;
		if (!sk->sk_backlog.len)
			atomic_set(dcnt, 0);
		lim = rcvbuf_limit(sk, skb) + atomic_read(dcnt);
		if (likely(!sk_add_backlog(sk, skb, lim)))
			continue;

		/* Overload => reject message back to sender */
		tipc_sk_respond(sk, skb, TIPC_ERR_OVERLOAD);    <-- (*2)
		break;
	}
}
=====================================================================

First, assume that tipc_sk_enqueue() is called under BH. At place (*1), skb 
messages are fed to filter_rcv() only as long as the socket's owner flag is not 
set. By the time execution reaches place (*2), the owner flag is definitely set 
(otherwise we would not have tried the backlog), yet we are still in BH context. 
That is why I say we still face a deadlock risk in this scenario when 
tipc_sk_respond(sk, skb, TIPC_ERR_OVERLOAD) is invoked.
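
To make the concern concrete, here is the changed tail of tipc_sk_respond() from 
the v3 patch, with the scenario above mapped onto its two branches (the 
annotations are illustrative only and are not part of the patch):
=====================================================================
	dnode = msg_destnode(buf_msg(skb));
	selector = msg_origport(buf_msg(skb));

	if (sock_owned_by_user(sk))
		/* Direct send. The patch assumes this branch is only taken
		 * from user context, but at place (*2) the owner flag was set
		 * by a concurrent user context while we are still in BH with
		 * sk_lock.slock held, so the TIPC_ERR_OVERLOAD reject is
		 * transmitted under BH via tipc_node_xmit_skb().
		 */
		tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector);
	else
		/* Owner not set: the message is deferred; tipc_sk_rcv()
		 * drains sk->sk_write_queue after releasing sk_lock.slock.
		 */
		skb_queue_tail(&sk->sk_write_queue, skb);
=====================================================================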

Regards,
Ying





Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-15 Thread Xue, Ying
I am concerned about whether it is really safe to test the socket owner flag 
instead of sk_lock.slock.
We have identified that the deadlock occurs because tipc_sk_respond() is called 
under the socket lock.
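
For reference, the two candidate tests look roughly like this (a sketch only; 
the exact definitions vary between kernel versions):

	/* What the patch tests: the owner flag, which is set and cleared
	 * only by lock_sock()/release_sock(), i.e. by user contexts.
	 */
	static inline bool sock_owned_by_user(const struct sock *sk)
	{
		return sk->sk_lock.owned;
	}

	/* The alternative mentioned above: peeking at the spinlock itself.
	 * This can be true briefly whenever any context, including another
	 * user context, happens to be inside sk_lock.slock.
	 */
	spin_is_locked(&sk->sk_lock.slock);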

For the case of responding to CONN_MANAGER messages, testing the socket owner 
is safe. But the solution seems unsafe when a TIPC_ERR_OVERLOAD message is sent.

When TIPC_ERR_OVERLOAD is sent out under BH context, the socket's owner flag is 
set. So the message will be sent out under BH through tipc_node_xmit_skb() 
instead of being inserted into sk->sk_write_queue.

Regards,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Tuesday, June 14, 2016 1:51 AM
To: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
jon.ma...@ericsson.com
Cc: ma...@donjonn.com; gbala...@gmail.com
Subject: [PATCH net v3 1/1] tipc: fix socket timer deadlock

We sometimes observe a 'deadly embrace' type deadlock occurring between 
mutually connected sockets on the same node. This happens when the one-hour 
peer supervision timers happen to expire simultaneously in both sockets.

The scenario is as follows:

CPU 1:                                  CPU 2:

tipc_sk_timeout(sk1)                    tipc_sk_timeout(sk2)
  lock(sk1.slock)                         lock(sk2.slock)
  msg_create(probe)                       msg_create(probe)
  unlock(sk1.slock)                       unlock(sk2.slock)
  tipc_node_xmit_skb()                    tipc_node_xmit_skb()
    tipc_node_xmit()                        tipc_node_xmit()
    tipc_sk_rcv(sk2)                        tipc_sk_rcv(sk1)
      lock(sk2.slock)                         lock(sk1.slock)
      filter_rcv()                            filter_rcv()
        tipc_sk_proto_rcv()                     tipc_sk_proto_rcv()
        msg_create(probe_rsp)                   msg_create(probe_rsp)
        tipc_sk_respond()                       tipc_sk_respond()
          tipc_node_xmit_skb()                    tipc_node_xmit_skb()
            tipc_node_xmit()                        tipc_node_xmit()
            tipc_sk_rcv(sk1)                        tipc_sk_rcv(sk2)
              lock(sk1.slock)                         lock(sk2.slock)
              ===> DEADLOCK                           ===> DEADLOCK

Further analysis reveals that there are at least three different locations in 
the socket code where tipc_sk_respond() is called within the context of the 
socket lock, with ensuing risk of similar deadlocks.

We solve this by ensuring that messages created by tipc_sk_respond() are sent 
directly only if the sk_lock.owned mutex is held. Otherwise they are queued up 
in the socket write queue and sent after the socket lock has been released.

v2: - Testing on mutex sk_lock.owned instead of sk_lock.slock in
  tipc_sk_respond(). This is safer, since sk_lock.slock may
  occasionally and briefly be held (by concurrent user contexts)
  even if we are in user context.
v3: - By lowering the socket timeout to 36 ms instead of 3,600,000 and
  setting up 1000 connections I could easily reproduce the deadlock
  and verify that my solution works.
- When killing one of the processes I sometimes got a kernel crash
  in the loop emptying the socket write queue. Realizing that there
  may be concurrent processes emptying the write queue, I had to add
  a test that the dequeuing actually returned a buffer. This solved
  the problem.
- I tried Ying's suggestion with unconditionally adding all
  CONN_MANAGER messages to the backlog queue, and it didn't work.
  This is because we will often add the message to the backlog when
  the socket is *not* owned, so there will be nothing triggering
  execution of backlog_rcv() within acceptable time. Apart from
  that, my solution solves the problem at all three locations where
  this deadlock may happen, as already stated above.

Reported-by: GUNA 
Signed-off-by: Jon Maloy 
---
 net/tipc/socket.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 88bfcd7..e8ed3a8 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -278,7 +278,11 @@ static void tipc_sk_respond(struct sock *sk, struct sk_buff *skb, int err)
 
 	dnode = msg_destnode(buf_msg(skb));
 	selector = msg_origport(buf_msg(skb));
-	tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector);
+
+	if (sock_owned_by_user(sk))
+		tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector);
+	else
+		skb_queue_tail(&sk->sk_write_queue, skb);
 }
 
 /**
@@ -1830,6 +1834,14 @@ void tipc_sk_rcv(struct net *net, struct sk_buff_head *inputq)
 			tipc_sk_enqueue(inputq, sk, dport);
 			spin_unlock_bh(&sk->sk_lock.slock);
 		}
+		/* Send pending response/rejected messages, if any */
+		while (!skb_queue_empty(&sk->sk_write_queue)) {
+			skb = skb_dequeue(&sk->sk_write_queue);
+			if (!skb)
+				break;
+			dnode = msg_destnode(buf_msg(skb));
+			tipc_node_xmit_skb(net, skb, dnode, dport);
+		}
 		sock_put(sk);
 		continue;
 	}

Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-15 Thread Xue, Ying
Erik's suggestion seems better.

Regards,
Ying

-Original Message-
From: Erik Hugne [mailto:erik.hu...@gmail.com] 
Sent: Tuesday, June 14, 2016 2:24 AM
To: Jon Maloy
Cc: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
gbala...@gmail.com
Subject: Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer 
deadlock

On Mon, Jun 13, 2016 at 01:51:15PM -0400, Jon Maloy wrote:
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -1830,6 +1834,14 @@ void tipc_sk_rcv(struct net *net, struct sk_buff_head *inputq)
> 			tipc_sk_enqueue(inputq, sk, dport);
> 			spin_unlock_bh(&sk->sk_lock.slock);
> 		}
> +		/* Send pending response/rejected messages, if any */
> +		while (!skb_queue_empty(&sk->sk_write_queue)) {
> +			skb = skb_dequeue(&sk->sk_write_queue);
> +			if (!skb)
> +				break;
> +			dnode = msg_destnode(buf_msg(skb));
> +			tipc_node_xmit_skb(net, skb, dnode, dport);
> +		}

Might I suggest this instead?

	while ((skb = skb_dequeue(&sk->sk_write_queue))) {
		dnode = msg_destnode(buf_msg(skb));
		tipc_node_xmit_skb(net, skb, dnode, dport);
	}

//E



Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-14 Thread Jon Maloy


> -Original Message-
> From: Xue, Ying [mailto:ying@windriver.com]
> Sent: Tuesday, 14 June, 2016 05:46
> To: Jon Maloy; tipc-discussion@lists.sourceforge.net; Parthasarathy 
> Bhuvaragan;
> Richard Alpe
> Cc: ma...@donjonn.com; gbala...@gmail.com
> Subject: RE: [PATCH net v3 1/1] tipc: fix socket timer deadlock
> 
> I am concerned about whether it is really safe to test the socket owner flag
> instead of sk_lock.slock.
> We have identified that the deadlock occurs because tipc_sk_respond() is called
> under the socket lock.
> 
> For the case of responding to CONN_MANAGER messages, testing the socket
> owner is safe. But the solution seems unsafe when a TIPC_ERR_OVERLOAD
> message is sent.
> 
> When TIPC_ERR_OVERLOAD is sent out under BH context, the socket's owner flag is set.

Where? There are two locations where tipc_sk_respond() may be called under BH, in 
tipc_sk_enqueue() and filter_rcv(), but it is either-or: either we are under 
BH and the socket is not owned, or it is owned and we are not under BH.
I don't see at all what you are referring to.

Regards
///jon

> So the message will be sent out under BH through tipc_node_xmit_skb() instead
> of being inserted into sk->sk_write_queue.
> 
> Regards,
> Ying
> 
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Tuesday, June 14, 2016 1:51 AM
> To: tipc-discussion@lists.sourceforge.net;
> parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com;
> jon.ma...@ericsson.com
> Cc: ma...@donjonn.com; gbala...@gmail.com
> Subject: [PATCH net v3 1/1] tipc: fix socket timer deadlock
> 
> We sometimes observe a 'deadly embrace' type deadlock occurring between
> mutually connected sockets on the same node. This happens when the one-hour
> peer supervision timers happen to expire simultaneously in both sockets.
> 
> The scenario is as follows:
> 
> CPU 1:                                  CPU 2:
> 
> tipc_sk_timeout(sk1)                    tipc_sk_timeout(sk2)
>   lock(sk1.slock)                         lock(sk2.slock)
>   msg_create(probe)                       msg_create(probe)
>   unlock(sk1.slock)                       unlock(sk2.slock)
>   tipc_node_xmit_skb()                    tipc_node_xmit_skb()
>     tipc_node_xmit()                        tipc_node_xmit()
>     tipc_sk_rcv(sk2)                        tipc_sk_rcv(sk1)
>       lock(sk2.slock)                         lock(sk1.slock)
>       filter_rcv()                            filter_rcv()
>         tipc_sk_proto_rcv()                     tipc_sk_proto_rcv()
>         msg_create(probe_rsp)                   msg_create(probe_rsp)
>         tipc_sk_respond()                       tipc_sk_respond()
>           tipc_node_xmit_skb()                    tipc_node_xmit_skb()
>             tipc_node_xmit()                        tipc_node_xmit()
>             tipc_sk_rcv(sk1)                        tipc_sk_rcv(sk2)
>               lock(sk1.slock)                         lock(sk2.slock)
>               ===> DEADLOCK                           ===> DEADLOCK
> 
> Further analysis reveals that there are at least three different locations in
> the socket code where tipc_sk_respond() is called within the context of the
> socket lock, with ensuing risk of similar deadlocks.
> 
> We solve this by ensuring that messages created by tipc_sk_respond() are sent
> directly only if the sk_lock.owned mutex is held. Otherwise they are queued up
> in the socket write queue and sent after the socket lock has been released.
> 
> v2: - Testing on mutex sk_lock.owned instead of sk_lock.slock in
>   tipc_sk_respond(). This is safer, since sk_lock.slock may
>   occasionally and briefly be held (by concurrent user contexts)
>   even if we are in user context.
> v3: - By lowering the socket timeout to 36 ms instead of 3,600,000 and
>   setting up 1000 connections I could easily reproduce the deadlock
>   and verify that my solution works.
> - When killing one of the processes I sometimes got a kernel crash
>   in the loop emptying the socket write queue. Realizing that there
>   may be concurrent processes emptying the write queue, I had to add
>   a test that the dequeuing actually returned a buffer. This solved
>   the problem.
> - I tried Ying's suggestion with unconditionally adding all
>   CONN_MANAGER messages to the backlog queue, and it didn't work.
>   This is because we will often add the message to the backlog when
>   the socket is *not* owned, so there will be nothing triggering
>   execution of backlog_rcv() within acceptable time. Apart from
>   that, my solution solves the problem at all three locations where
>   this deadlock may happen, as already stated above.
> 
> Reported-by: GUNA 
> Signed-off-by: Jon Maloy 
> ---
>  net/tipc/socket.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 88bfcd7..e8ed3a8 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -278,7 +278,11 @@ static void tipc_sk_respond(struct sock *sk, struct sk_buff *skb, int err)

Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-13 Thread Erik Hugne
On Jun 14, 2016 00:42, "Jon Maloy"  wrote:

> It definitely is. When checking for !skb_queue_empty() the only thing done
> is a comparison between the list->next pointer and the list pointer itself,
> i.e., the lock is not taken. Since this condition is almost never true, we
> can skip over the content of the if clause and continue the loop directly.
>

OK, I did not realize this. Previous comment withdrawn :)

//E


Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-13 Thread Jon Maloy


On 06/13/2016 03:20 PM, Erik Hugne wrote:
>
>
> On Jun 13, 2016 20:29, "Jon Maloy" wrote:
> > I already thought about that, and checked the implementation of
> > skb_dequeue(). As I suspected it unconditionally grabs the queue lock
> > before checking if there is anything to return. We don't want that on
> > the critical data path.
> >
>
> What I suggested is no different than your patch from a queue locking
> point of view.
>
It definitely is. When checking for !skb_queue_empty() the only thing 
done is a comparison between the list->next pointer and the list pointer 
itself, i.e., the lock is not taken. Since this condition is almost never 
true, we can skip over the content of the if clause and continue the 
loop directly.

If we do as you suggest, the lock will be grabbed and all interrupts 
will be disabled at least once even when the list is empty.
>
> Only the extra length check is removed.
>
> Did you perhaps intend to use __skb_dequeue()?
>
No, that cannot be used here, since we may have an interrupt feeding the 
queue while we are emptying it.
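
For reference, the three operations under discussion look roughly like this in 
the kernel (a sketch based on include/linux/skbuff.h and net/core/skbuff.c; 
details may differ slightly between versions):

	/* Emptiness check: a plain pointer comparison, no lock taken. */
	static inline int skb_queue_empty(const struct sk_buff_head *list)
	{
		return list->next == (const struct sk_buff *)list;
	}

	/* skb_dequeue() always takes the queue lock and disables interrupts
	 * before looking at the list, even when the list turns out to be
	 * empty.
	 */
	struct sk_buff *skb_dequeue(struct sk_buff_head *list)
	{
		unsigned long flags;
		struct sk_buff *result;

		spin_lock_irqsave(&list->lock, flags);
		result = __skb_dequeue(list);
		spin_unlock_irqrestore(&list->lock, flags);
		return result;
	}

	/* __skb_dequeue() is the unlocked variant; it cannot be used here
	 * because BH may still be appending to sk->sk_write_queue while we
	 * drain it.
	 */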

///jon

> //E
>



Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-13 Thread Erik Hugne
On Jun 13, 2016 20:29, "Jon Maloy"  wrote:
> I already thought about that, and checked the implementation of
> skb_dequeue(). As I suspected it unconditionally grabs the queue lock
> before checking if there is anything to return. We don't want that on
> the critical data path.
>

What I suggested is no different than your patch from a queue locking point
of view.
Only the extra length check is removed.

Did you perhaps intend to use __skb_dequeue()?

//E


Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-13 Thread Jon Maloy


On 06/13/2016 02:24 PM, Erik Hugne wrote:
> On Mon, Jun 13, 2016 at 01:51:15PM -0400, Jon Maloy wrote:
>> --- a/net/tipc/socket.c
>> +++ b/net/tipc/socket.c
>> @@ -1830,6 +1834,14 @@ void tipc_sk_rcv(struct net *net, struct sk_buff_head *inputq)
>> 			tipc_sk_enqueue(inputq, sk, dport);
>> 			spin_unlock_bh(&sk->sk_lock.slock);
>> 		}
>> +		/* Send pending response/rejected messages, if any */
>> +		while (!skb_queue_empty(&sk->sk_write_queue)) {
>> +			skb = skb_dequeue(&sk->sk_write_queue);
>> +			if (!skb)
>> +				break;
>> +			dnode = msg_destnode(buf_msg(skb));
>> +			tipc_node_xmit_skb(net, skb, dnode, dport);
>> +		}
> Might I suggest this instead?
> 	while ((skb = skb_dequeue(&sk->sk_write_queue))) {
I already thought about that, and checked the implementation of 
skb_dequeue(). As I suspected, it unconditionally grabs the queue lock 
before checking if there is anything to return. We don't want that on 
the critical data path.

///jon

>   dnode = msg_destnode(buf_msg(skb));
>   tipc_node_xmit_skb(net, skb, dnode, dport);
>   }
>
> //E
>




Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-13 Thread Erik Hugne
On Mon, Jun 13, 2016 at 01:51:15PM -0400, Jon Maloy wrote:
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -1830,6 +1834,14 @@ void tipc_sk_rcv(struct net *net, struct sk_buff_head *inputq)
> 			tipc_sk_enqueue(inputq, sk, dport);
> 			spin_unlock_bh(&sk->sk_lock.slock);
> 		}
> +		/* Send pending response/rejected messages, if any */
> +		while (!skb_queue_empty(&sk->sk_write_queue)) {
> +			skb = skb_dequeue(&sk->sk_write_queue);
> +			if (!skb)
> +				break;
> +			dnode = msg_destnode(buf_msg(skb));
> +			tipc_node_xmit_skb(net, skb, dnode, dport);
> +		}

Might I suggest this instead?

	while ((skb = skb_dequeue(&sk->sk_write_queue))) {
		dnode = msg_destnode(buf_msg(skb));
		tipc_node_xmit_skb(net, skb, dnode, dport);
	}

//E



[tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-13 Thread Jon Maloy
We sometimes observe a 'deadly embrace' type deadlock occurring
between mutually connected sockets on the same node. This happens
when the one-hour peer supervision timers happen to expire
simultaneously in both sockets.

The scenario is as follows:

CPU 1:                                  CPU 2:

tipc_sk_timeout(sk1)                    tipc_sk_timeout(sk2)
  lock(sk1.slock)                         lock(sk2.slock)
  msg_create(probe)                       msg_create(probe)
  unlock(sk1.slock)                       unlock(sk2.slock)
  tipc_node_xmit_skb()                    tipc_node_xmit_skb()
    tipc_node_xmit()                        tipc_node_xmit()
    tipc_sk_rcv(sk2)                        tipc_sk_rcv(sk1)
      lock(sk2.slock)                         lock(sk1.slock)
      filter_rcv()                            filter_rcv()
        tipc_sk_proto_rcv()                     tipc_sk_proto_rcv()
        msg_create(probe_rsp)                   msg_create(probe_rsp)
        tipc_sk_respond()                       tipc_sk_respond()
          tipc_node_xmit_skb()                    tipc_node_xmit_skb()
            tipc_node_xmit()                        tipc_node_xmit()
            tipc_sk_rcv(sk1)                        tipc_sk_rcv(sk2)
              lock(sk1.slock)                         lock(sk2.slock)
              ===> DEADLOCK                           ===> DEADLOCK

Further analysis reveals that there are at least three different
locations in the socket code where tipc_sk_respond() is called within
the context of the socket lock, with ensuing risk of similar deadlocks.

We solve this by ensuring that messages created by tipc_sk_respond()
are sent directly only if the sk_lock.owned mutex is held. Otherwise
they are queued up in the socket write queue and sent after the socket
lock has been released.

v2: - Testing on mutex sk_lock.owned instead of sk_lock.slock in
  tipc_sk_respond(). This is safer, since sk_lock.slock may
  occasionally and briefly be held (by concurrent user contexts)
  even if we are in user context.
v3: - By lowering the socket timeout to 36 ms instead of 3,600,000 and
  setting up 1000 connections I could easily reproduce the deadlock
  and verify that my solution works.
- When killing one of the processes I sometimes got a kernel crash
  in the loop emptying the socket write queue. Realizing that there
  may be concurrent processes emptying the write queue, I had to add
  a test that the dequeuing actually returned a buffer. This solved
  the problem.
- I tried Ying's suggestion with unconditionally adding all
  CONN_MANAGER messages to the backlog queue, and it didn't work.
  This is because we will often add the message to the backlog when
  the socket is *not* owned, so there will be nothing triggering
  execution of backlog_rcv() within acceptable time. Apart from
  that, my solution solves the problem at all three locations where
  this deadlock may happen, as already stated above.

Reported-by: GUNA 
Signed-off-by: Jon Maloy 
---
 net/tipc/socket.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 88bfcd7..e8ed3a8 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -278,7 +278,11 @@ static void tipc_sk_respond(struct sock *sk, struct sk_buff *skb, int err)
 
 	dnode = msg_destnode(buf_msg(skb));
 	selector = msg_origport(buf_msg(skb));
-	tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector);
+
+	if (sock_owned_by_user(sk))
+		tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector);
+	else
+		skb_queue_tail(&sk->sk_write_queue, skb);
 }
 
 /**
@@ -1830,6 +1834,14 @@ void tipc_sk_rcv(struct net *net, struct sk_buff_head *inputq)
 			tipc_sk_enqueue(inputq, sk, dport);
 			spin_unlock_bh(&sk->sk_lock.slock);
 		}
+		/* Send pending response/rejected messages, if any */
+		while (!skb_queue_empty(&sk->sk_write_queue)) {
+			skb = skb_dequeue(&sk->sk_write_queue);
+			if (!skb)
+				break;
+			dnode = msg_destnode(buf_msg(skb));
+			tipc_node_xmit_skb(net, skb, dnode, dport);
+		}
 		sock_put(sk);
 		continue;
 	}
-- 
1.9.1

