Re: [tipc-discussion] [PATCH net v1 1/2] tipc: fix socket flow control errors

2017-02-24 Thread Jon Maloy
Hi Partha,
Forget about my previous comment. On a closer look, I find that your patch is 
correct. (But you should still rephrase.)
Both patches acked by: me

///jon
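To see why counting 'sz' goes wrong, here is a minimal userspace model of the receive-side accounting that the patch changes. It assumes the block-based flow control of this era: tsk_inc() returns len / 1024 + 1 (FLOWCTL_BLK_SZ being 1024), and the sender charges one such increment per message. It also assumes that a read which leaves part of a message unconsumed returns to the user before the accounting step. BLK_SZ, blk_inc() and rcv_credit() are hypothetical names, not kernel symbols.

```c
#include <assert.h>

/* Assumption: one flow-control block is 1024 bytes and each message
 * costs len / 1024 + 1 blocks, as tsk_inc() computes for peers using
 * the block-based flow control. Illustrative names only. */
#define BLK_SZ 1024

static unsigned int blk_inc(int len)
{
	return (unsigned int)(len / BLK_SZ) + 1;
}

/* Blocks credited by the receiver for one message of data_sz bytes that
 * the user drains with successive reads of at most buf_len bytes.
 * Partial reads only advance the offset; the credit is added once, when
 * the final fragment is consumed. With use_full_size == 0 the credit is
 * based on the bytes left for that last read ('sz', old code); otherwise
 * it is based on the whole payload (msg_data_sz(), patched code). */
static unsigned int rcv_credit(int hlen, int data_sz, int buf_len,
			       int use_full_size)
{
	int offset = 0;

	assert(buf_len > 0);
	while (data_sz - offset > buf_len)
		offset += buf_len;	/* partial read: no accounting */

	return blk_inc(hlen + (use_full_size ? data_sz : data_sz - offset));
}
```

With an example 24-byte header, a 10000-byte message and 1000-byte reads, the sender charges blk_inc(10024) = 10 blocks while the old receiver credits only blk_inc(24 + 1000) = 2, leaking 8 blocks per message until the sender's window is exhausted. Because the credit is added once per message, basing it on msg_data_sz() matches the sender rather than over-counting.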


> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Friday, February 24, 2017 09:51 AM
> To: Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>;
> tipc-discussion@lists.sourceforge.net; Ying Xue <ying@windriver.com>;
> pbut...@sonusnet.com
> Subject: Re: [tipc-discussion] [PATCH net v1 1/2] tipc: fix socket flow control errors
> 
> 
> 
> > -----Original Message-----
> > From: Parthasarathy Bhuvaragan
> > Sent: Friday, February 24, 2017 08:51 AM
> > To: tipc-discussion@lists.sourceforge.net; Jon Maloy
> > <jon.ma...@ericsson.com>; Ying Xue <ying@windriver.com>;
> > pbut...@sonusnet.com
> > Subject: [PATCH net v1 1/2] tipc: fix socket flow control errors
> >
> > In this commit, we fix the following two errors:
> > 1. In tipc_send_stream(), fix the return value during congestion
> >    when the send is partially successful. Until now, we returned -1
> >    instead of the number of bytes sent so far.
> > 2. In tipc_recv_stream(), we updated rcv_unacked based not on the
> >    message size but on sz. Usually they are the same, but in cases
> >    where the socket receiver's buffer is smaller than the incoming
> >    message, these two parameters differ greatly.
> 
> I was confused by this at first, but I assume that by 'socket receive buffer'
> you mean the read buffer the user supplies to read(), not sk->sk_rcvbuf as I
> first thought. I suggest you rephrase this.
> Worse, I don't think this fix is correct. Now you will add the full
> msg_data_sz() at each iteration, which will skew the accounting in the other
> direction: the reader will now count *more* blocks than the sender,
> eventually making the sender's snt_unacked wrap around (it is a u16, so it
> ends up near 64k) and in practice disabling the flow control altogether. I
> believe the correction must still be based on 'sz', but compensate for the
> fact that tsk_inc() is non-linear, i.e., it adds an extra block at each
> iteration, which I think is the real problem here. I will also need to
> spend some thought on the impact on the legacy flow control, which must
> still work.
> 
> Regards
> ///jon
> 
> > This introduces slack in the accounting, leading to permanent
> >    congestion. In this commit, we always base the accounting on the
> >    incoming message.
> >
> > Signed-off-by: Parthasarathy Bhuvaragan
> > <parthasarathy.bhuvara...@ericsson.com>
> > ---
> >  net/tipc/socket.c | 9 -
> >  1 file changed, 4 insertions(+), 5 deletions(-)
> >
> > diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> > index 6b09a778cc71..79e628cd08a9 100644
> > --- a/net/tipc/socket.c
> > +++ b/net/tipc/socket.c
> > @@ -1080,7 +1080,7 @@ static int __tipc_sendstream(struct socket *sock, struct msghdr *m, size_t dlen)
> >  		}
> >  	} while (sent < dlen && !rc);
> >  
> > -	return rc ? rc : sent;
> > +	return sent ? sent : rc;
> >  }
> >  
> >  /**
> > @@ -1481,16 +1481,15 @@ static int tipc_recv_stream(struct socket *sock, struct msghdr *m,
> >  	if (unlikely(flags & MSG_PEEK))
> >  		goto exit;
> >  
> > -	tsk->rcv_unacked += tsk_inc(tsk, hlen + sz);
> > +	tsk->rcv_unacked += tsk_inc(tsk, hlen + msg_data_sz(msg));
> >  	if (unlikely(tsk->rcv_unacked >= (tsk->rcv_win / 4)))
> >  		tipc_sk_send_ack(tsk);
> >  	tsk_advance_rx_queue(sk);
> >  
> >  	/* Loop around if more data is required */
> > -	if ((sz_copied < buf_len) &&	/* didn't get all requested data */
> > +	if ((!err) && (sz_copied < buf_len) &&
> >  	    (!skb_queue_empty(&sk->sk_receive_queue) ||
> > -	    (sz_copied < target)) &&	/* and more is ready or required */
> > -	    (!err))			/* and haven't reached a FIN */
> > +	     (sz_copied < target)))
> >  		goto restart;
> >
> >  exit:
> > --
> > 2.1.4
> 
> 
> --
> Check out the vibrant tech community on one of the world's most engaging
> tech sites, SlashDot.org! http://sdm.link/slashdot
> ___
> tipc-discussion mailing list
> tipc-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/tipc-discussion



Re: [tipc-discussion] [PATCH net v1 1/2] tipc: fix socket flow control errors

2017-02-24 Thread Jon Maloy


> -----Original Message-----
> From: Parthasarathy Bhuvaragan
> Sent: Friday, February 24, 2017 08:51 AM
> To: tipc-discussion@lists.sourceforge.net; Jon Maloy
> <jon.ma...@ericsson.com>; Ying Xue <ying@windriver.com>;
> pbut...@sonusnet.com
> Subject: [PATCH net v1 1/2] tipc: fix socket flow control errors
> 
> In this commit, we fix the following two errors:
> 1. In tipc_send_stream(), fix the return value during congestion
>    when the send is partially successful. Until now, we returned -1
>    instead of the number of bytes sent so far.
> 2. In tipc_recv_stream(), we updated rcv_unacked based not on the
>    message size but on sz. Usually they are the same, but in cases
>    where the socket receiver's buffer is smaller than the incoming
>    message, these two parameters differ greatly.

I was confused by this at first, but I assume that by 'socket receive buffer' 
you mean the read buffer the user supplies to read(), not sk->sk_rcvbuf as I 
first thought. I suggest you rephrase this.
Worse, I don't think this fix is correct. Now you will add the full 
msg_data_sz() at each iteration, which will skew the accounting in the other 
direction: the reader will now count *more* blocks than the sender, eventually 
making the sender's snt_unacked wrap around (it is a u16, so it ends up near 
64k) and in practice disabling the flow control altogether. I believe the 
correction must still be based on 'sz', but compensate for the fact that 
tsk_inc() is non-linear, i.e., it adds an extra block at each iteration, which 
I think is the real problem here. I will also need to spend some thought on 
the impact on the legacy flow control, which must still work. 

Regards
///jon

> This introduces slack in the accounting, leading to permanent
>    congestion. In this commit, we always base the accounting on the
>    incoming message.
> 
> Signed-off-by: Parthasarathy Bhuvaragan
> <parthasarathy.bhuvara...@ericsson.com>
> ---
>  net/tipc/socket.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 6b09a778cc71..79e628cd08a9 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -1080,7 +1080,7 @@ static int __tipc_sendstream(struct socket *sock, struct msghdr *m, size_t dlen)
>  		}
>  	} while (sent < dlen && !rc);
>  
> -	return rc ? rc : sent;
> +	return sent ? sent : rc;
>  }
>  
>  /**
> @@ -1481,16 +1481,15 @@ static int tipc_recv_stream(struct socket *sock, struct msghdr *m,
>  	if (unlikely(flags & MSG_PEEK))
>  		goto exit;
>  
> -	tsk->rcv_unacked += tsk_inc(tsk, hlen + sz);
> +	tsk->rcv_unacked += tsk_inc(tsk, hlen + msg_data_sz(msg));
>  	if (unlikely(tsk->rcv_unacked >= (tsk->rcv_win / 4)))
>  		tipc_sk_send_ack(tsk);
>  	tsk_advance_rx_queue(sk);
>  
>  	/* Loop around if more data is required */
> -	if ((sz_copied < buf_len) &&	/* didn't get all requested data */
> +	if ((!err) && (sz_copied < buf_len) &&
>  	    (!skb_queue_empty(&sk->sk_receive_queue) ||
> -	    (sz_copied < target)) &&	/* and more is ready or required */
> -	    (!err))			/* and haven't reached a FIN */
> +	     (sz_copied < target)))
>  		goto restart;
> 
>  exit:
> --
> 2.1.4
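The first hunk only changes the final return statement. A minimal sketch of the resulting semantics, assuming a hypothetical transport (fake_link, fake_xmit()) that becomes congested after a fixed number of bytes: once any chunk has gone out, the caller gets the byte count, and the error is returned only when nothing was sent.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical transport that accepts 'room' more bytes, then reports
 * congestion. Not a TIPC interface. */
struct fake_link {
	size_t room;
};

static int fake_xmit(struct fake_link *l, size_t len)
{
	if (l->room < len)
		return -EAGAIN;
	l->room -= len;
	return 0;
}

/* Chunked send loop in the shape of __tipc_sendstream(). */
static long send_stream(struct fake_link *l, size_t dlen, size_t chunk)
{
	size_t sent = 0;
	int rc = 0;

	do {
		size_t send = dlen - sent < chunk ? dlen - sent : chunk;

		rc = fake_xmit(l, send);
		if (!rc)
			sent += send;
	} while (sent < dlen && !rc);

	/* Patched rule: report partial progress if there was any; the old
	 * 'rc ? rc : sent' discarded it whenever rc was set. */
	return sent ? (long)sent : rc;
}
```

This follows the usual stream-socket convention that a short write reports the bytes actually written, leaving the caller to retry the remainder.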




Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Jon Maloy


> -----Original Message-----
> From: Butler, Peter [mailto:pbut...@sonusnet.com]
> Sent: Wednesday, February 22, 2017 04:31 PM
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-
> discuss...@lists.sourceforge.net
> Cc: Butler, Peter <pbut...@sonusnet.com>
> Subject: RE: TIPC Oops in tipc_sk_recv
> 
> Hi Jon,
> 
> I think I found the problem, which ultimately may only exist on our end (see
> below for an explanation, and let me know if you agree).
> 
> The fellow that was maintaining our O/S previously (no longer with the
> company) had made some patches to the 4.4.0 kernel TIPC code, and indeed
> one of them is in the offending tipc_sk_rcv() function.
> 
> Specifically, note this segment of code from our kernel source tree:
> 
> 	/* Send pending response/rejected messages, if any */
> 	while (!skb_queue_empty(&sk->sk_write_queue)) {
> 		skb = skb_dequeue(&sk->sk_write_queue);
> 		dnode = msg_destnode(buf_msg(skb));
> 		tipc_node_xmit_skb(net, skb, dnode, dport);
> 	}

Yes, this is wrong. The socket write queue is only used for outgoing regular 
messages (Partha has later changed that), and should only be emptied by the 
sending thread. Running this code in interrupt context will give exactly the 
symptom you see, because the writing thread might already have freed or sent 
the buffer in question.
> 
> Whereas the latest and greatest official longterm 4.9.11 kernel has:
> 
> 	/* Send pending response/rejected messages, if any */
> 	while ((skb = __skb_dequeue(&xmitq))) {
> 		dnode = msg_destnode(buf_msg(skb));
> 		tipc_node_xmit_skb(net, skb, dnode, dport);
> 	}
> 
> The code path that triggers the oops (in our source code) is from:
> 
> dnode = msg_destnode(buf_msg(skb));
> 
> where msg_destnode() calls msg_word() which calls:
> 
> ntohl(m->hdr[pos]);
> 
> which is precisely where the oops occurred.
> 
> I'm not exactly sure where he got that code change - my guess is he posted a
> question on the tipc-discussion list and got a suggestion to try a code
> snippet, but in the end the actual changes (that were officially released at
> kernel.org) differed, as per above. 

I rather suspect he might have looked at the more recent code and tried to do 
the same, while misunderstanding the role of the write queue.

> Indeed, on Google I can see some threads discussing
> a 'deadly embrace' deadlock (for example
> http://www.spinics.net/lists/netdev/msg382379.html) between yourself
> and him.  Another possibility is that the offending source code in question
> was indeed released sometime after 4.4.0, but has since been modified/fixed,
> thus explaining the discrepancy.

The loop was introduced in conjunction with that discussion, but it should not 
be done in the way it is done above. Indeed, I cannot see that this can have 
solved the "deadly embrace" problem at all, unless he made other changes and 
added the rejected/returned messages to the write queue. That might work most 
of the time, but will still sooner or later interfere with a sending thread.

There are two ways you can solve this:
1: Introduce a stack-based queue for reject/return messages, as we do, and pass 
it along in the calls.
2: Put send messages on a stack-based queue, as Partha has done in the later 
versions. This assumes that the rejected messages are added to the write 
queue, as I am speculating above.
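Option 1 can be sketched in userspace as follows. This is a minimal model, assuming a hypothetical singly linked queue in place of struct sk_buff_head: rejected messages are collected on a queue that lives in the caller's stack frame and is passed down, so no other thread can see or free them, and the queue is then drained with a single test-and-dequeue step, the same shape as the 4.9 loop while ((skb = __skb_dequeue(&xmitq))). All names (msg, msg_queue, sk_enqueue(), sk_rcv()) are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sk_buff / struct sk_buff_head. */
struct msg {
	struct msg *next;
};

struct msg_queue {
	struct msg *head, *tail;
};

static void q_init(struct msg_queue *q)
{
	q->head = q->tail = NULL;
}

static void q_tail(struct msg_queue *q, struct msg *m)
{
	m->next = NULL;
	if (q->tail)
		q->tail->next = m;
	else
		q->head = m;
	q->tail = m;
}

/* Returns NULL when empty: test and dequeue happen in one step. */
static struct msg *q_dequeue(struct msg_queue *q)
{
	struct msg *m = q->head;

	if (m) {
		q->head = m->next;
		if (!q->head)
			q->tail = NULL;
	}
	return m;
}

/* Per-message receive step: rejected messages go on the caller's queue.
 * Accepted messages would be queued to the socket (omitted here). */
static void sk_enqueue(struct msg *m, int accept, struct msg_queue *xmitq)
{
	if (!accept)
		q_tail(xmitq, m);
}

/* Returns how many messages were handed back for transmission. */
static int sk_rcv(struct msg *msgs, const int *accept, int n)
{
	struct msg_queue xmitq;	/* on this stack frame: private to this call */
	struct msg *m;
	int returned = 0;

	q_init(&xmitq);
	for (int i = 0; i < n; i++)
		sk_enqueue(&msgs[i], accept[i], &xmitq);

	/* Same shape as: while ((skb = __skb_dequeue(&xmitq))) */
	while ((m = q_dequeue(&xmitq)))
		returned++;		/* stand-in for tipc_node_xmit_skb() */
	return returned;
}
```

Because the queue is never shared, there is no window between "is it empty?" and "take the head" in which a sending thread could remove or free the buffer, which is exactly the hazard of draining sk_write_queue from the receive path.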

BR
///jon

> 
> If either of these possibilities is what actually happened, then this may not
> be a bug you need to worry about.  Granted, the same msg_destnode() call still
> exists in the current (4.9.11 and 4.10) code, but the semantics of the
> encapsulating while loop are different, and maybe that alone eliminates the
> issue. Thoughts?
> 
> Peter
> 
> 
> 
> 
> 
> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: February-22-17 3:01 PM
> To: Butler, Peter <pbut...@sonusnet.com>; tipc-
> discuss...@lists.sourceforge.net
> Subject: RE: TIPC Oops in tipc_sk_recv
> 
> 
> 
> > -----Original Message-----
> > From: Butler, Peter [mailto:pbut...@sonusnet.com]
> > Sent: Wednesday, February 22, 2017 02:15 PM
> > To: Jon Maloy <jon.ma...@ericsson.com>; tipc-
> > discuss...@lists.sourceforge.net
> > Cc: Butler, Peter <pbut...@sonusnet.com>
> > Subject: RE: TIPC Oops in tipc_sk_recv
> >
> > For the "Source file is more recent than executable" message, could
> > this simply be due to the fact that I copied the kernel source to the
> > lab and then ran the gdb commands as shown?  As suc

[tipc-discussion] [PATCH net 1/1] tipc: move premature initilalization of stack variables

2017-02-23 Thread Jon Maloy
In the function tipc_rcv() we initialize a couple of stack variables
from the message header before that same header has been validated.
In rare cases when the arriving header is non-linear, the validation
function itself may linearize the buffer by calling pskb_may_pull(),
while the wrongly initialized stack fields are not updated accordingly.

We fix this in this commit.

Reported-by: Matthew Wong <mw...@sonusnet.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/node.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/tipc/node.c b/net/tipc/node.c
index e9295fa..4512e83 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1505,19 +1505,21 @@ void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b)
 {
 	struct sk_buff_head xmitq;
 	struct tipc_node *n;
-	struct tipc_msg *hdr = buf_msg(skb);
-	int usr = msg_user(hdr);
+	struct tipc_msg *hdr;
 	int bearer_id = b->identity;
 	struct tipc_link_entry *le;
-	u16 bc_ack = msg_bcast_ack(hdr);
 	u32 self = tipc_own_addr(net);
-	int rc = 0;
+	int usr, rc = 0;
+	u16 bc_ack;
 
 	__skb_queue_head_init(&xmitq);
 
-	/* Ensure message is well-formed */
+	/* Ensure message is well-formed before touching the header */
 	if (unlikely(!tipc_msg_validate(skb)))
 		goto discard;
+	hdr = buf_msg(skb);
+	usr = msg_user(hdr);
+	bc_ack = msg_bcast_ack(hdr);
 
 	/* Handle arrival of discovery or broadcast packet */
 	if (unlikely(msg_non_seq(hdr))) {
-- 
2.7.4
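The ordering rule this patch enforces can be modelled in userspace. In this sketch, pkt and pkt_may_pull() are hypothetical stand-ins for an sk_buff and pskb_may_pull(): making the header linear may allocate a new buffer, so a header pointer (and anything read through it) taken before validation can refer to memory that is no longer the packet's header. The header is therefore located and decoded only after validation succeeds. HDR_SZ and the 4-bit 'user' field are likewise hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HDR_SZ 8	/* hypothetical header size */

/* Hypothetical stand-in for an sk_buff: only 'data' is directly
 * addressable; the rest of the packet sits in a fragment. */
struct pkt {
	uint8_t *data;		/* linear area; may be reallocated */
	size_t linear_len;
	const uint8_t *frag;	/* bytes not yet pulled into 'data' */
	size_t frag_len;
};

/* Modelled on pskb_may_pull(): ensure 'need' bytes are linear. This may
 * replace p->data, invalidating any pointer taken into the old buffer. */
static int pkt_may_pull(struct pkt *p, size_t need)
{
	size_t take;
	uint8_t *n;

	if (p->linear_len >= need)
		return 1;
	if (p->linear_len + p->frag_len < need)
		return 0;		/* packet too short: malformed */
	take = need - p->linear_len;
	n = malloc(need);
	if (!n)
		return 0;
	memcpy(n, p->data, p->linear_len);
	memcpy(n + p->linear_len, p->frag, take);
	free(p->data);
	p->data = n;
	p->linear_len = need;
	p->frag += take;
	p->frag_len -= take;
	return 1;
}

/* Patched ordering: validate first, then locate and decode the header. */
static int rcv_user(struct pkt *p, int *usr)
{
	const uint8_t *hdr;

	if (!pkt_may_pull(p, HDR_SZ))
		return -1;		/* corresponds to 'goto discard' */
	hdr = p->data;			/* taken only after the pull */
	*usr = hdr[0] >> 4;		/* hypothetical 4-bit 'user' field */
	return 0;
}
```

Reading hdr before pkt_may_pull() in this model would leave it pointing at the freed two-byte buffer, the userspace analogue of the stale stack variables described in the commit message.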




Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Jon Maloy


> -----Original Message-----
> From: Butler, Peter [mailto:pbut...@sonusnet.com]
> Sent: Thursday, February 23, 2017 10:25 AM
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-
> discuss...@lists.sourceforge.net
> Cc: Butler, Peter <pbut...@sonusnet.com>
> Subject: RE: TIPC Oops in tipc_sk_recv
> 
> Hi Jon,
> 
> Thanks for the info.  The solution we are considering (to give the customer an
> emergency patch) is to backport the TIPC code from kernel 4.4.50 into our 4.4.0
> kernel source tree.  From what I can see, I should be able to do so with
> little effort.  I am assuming (?) that since 4.4.x is a longterm kernel
> release, the 4.4.50 TIPC code is considered stable and free of the original
> bug associated with this section of code in tipc_sk_rcv() - am I wrong to
> assume that? 

Unfortunately yes. The only safe solution to the deadlock problem is the one 
you find in later versions.
The patch fixing this particular problem hasn't been applied this far back, 
probably because it didn't apply cleanly.

> The section of code in question is entirely different in 4.4.50 than what
> we currently have:
> 
> 	if (likely(tsk)) {
> 		sk = &tsk->sk;
> 		if (likely(spin_trylock_bh(&sk->sk_lock.slock))) {
> 			tipc_sk_enqueue(inputq, sk, dport);
> 			spin_unlock_bh(&sk->sk_lock.slock);
> 		}
> 		sock_put(sk);
> 		continue;
> 	}
> 
> Does this mean that the 4.4.50 version (as shown above) is still susceptible
> to the original bug?  (Our original O/S maintainer patched this section because
> of the original bug that was causing an oops there - but obviously the patch
> he implemented was also buggy, as previously discussed.)
> 
> Ultimately we would rather upgrade our entire kernel (say, to 4.9.11 - the
> latest and greatest longterm release) but I see the TIPC design has changed
> significantly and I'm not sure if it would backport into our 4.4.0 kernel
> without significant effort; i.e. perhaps this change in design also depends on
> other API changes within other layers of the kernel.  If I am wrong in this and
> you think that the 4.9.11 TIPC code should be able to be backported to our
> 4.4.0 base then I will do so, 

It is absolutely doable. As a matter of fact, this is what Partha has been 
doing in one of our own product lines.
AFAIK, the only build issue you will encounter is a change to the iov handling 
in msg_build(), and that is easily fixed by reverting to the old method. 
(Correct me if I am wrong here, Partha.) But with new functionality (e.g., the 
new flow control) there are new issues which still haven't been ironed out 
completely. I think Partha is the one to give a better update here.

///jon

> as there are far more fixes in 4.9.11 than in 4.4.50.  The
> reason we can't upgrade the entire kernel to 4.4.50 or 4.9.11 in the short
> term is a bit of a long story (which I will spare you), but suffice it to say
> that it is only an option for a long-term fix for our customers and not for
> this short-term emergency fix, which we need released asap.
> 
> All this to say, the goal here is to move to the latest possible TIPC code,
> which will (relatively) seamlessly integrate with our 4.4.0 kernel and also be
> free of the aforementioned bug.  Let me know what you think.
> 
> Thanks,
> 
> Peter
> 
> 
> 
> 
> 
> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: February-23-17 8:22 AM
> To: Butler, Peter <pbut...@sonusnet.com>; tipc-
> discuss...@lists.sourceforge.net
> Subject: RE: TIPC Oops in tipc_sk_recv
> 
> 
> 
> > -----Original Message-----
> > From: Butler, Peter [mailto:pbut...@sonusnet.com]
> > Sent: Wednesday, February 22, 2017 04:31 PM
> > To: Jon Maloy <jon.ma...@ericsson.com>; tipc-
> > discuss...@lists.sourceforge.net
> > Cc: Butler, Peter <pbut...@sonusnet.com>
> > Subject: RE: TIPC Oops in tipc_sk_recv
> >
> > Hi Jon,
> >
> > I think I found the problem, which ultimately may only exist on our
> > end (see below for an explanation, and let me know if you agree).
> >
> > The fellow that was maintaining our O/S previously (no longer with the
> > company) had made some patches to the 4.4.0 kernel TIPC code, and
> > indeed one of them is in the offending tipc_sk_rcv() function.
> >
> > Specifically, note this segment of code from our kernel source tree:
> >
> > 	/* Send pending response/rejected messages, if any */
> > 	while (!skb_queue_empty(&sk->sk_write_queue)) {
> > 		skb = skb_dequeue(&sk->sk_write_queue);
> >

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Jon Maloy


> -----Original Message-----
> From: Butler, Peter [mailto:pbut...@sonusnet.com]
> Sent: Thursday, February 23, 2017 01:09 PM
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-
> discuss...@lists.sourceforge.net; Parthasarathy Bhuvaragan
> <parthasarathy.bhuvara...@ericsson.com>
> Cc: Butler, Peter <pbut...@sonusnet.com>
> Subject: RE: TIPC Oops in tipc_sk_recv
> 
> Partha - an update for you
> 
> I've ported all the TIPC code from 4.9.11 into our 4.4.0 kernel code base.  By
> this I mean I have completely removed all the existing TIPC files from:
> 
> include/uapi/linux/tipc*
> net/tipc/*
> 
> in our 4.4.0 kernel source tree, and replaced them with all the files from
> 4.9.11.
> 
> As Jon indeed forewarned me, there will be a hurdle or two to integrate this
> with the 4.4.0 kernel's internal API.  As it stands, this is where the
> compilation first fails.  I can certainly look into this myself but am told
> you are the expert. (I am far from a kernel expert myself.)
> 
>   LD  net/tipc/built-in.o
>   CC [M]  net/tipc/addr.o
>   CC [M]  net/tipc/bcast.o
>   CC [M]  net/tipc/bearer.o
>   CC [M]  net/tipc/core.o
>   CC [M]  net/tipc/link.o
>   CC [M]  net/tipc/discover.o
>   CC [M]  net/tipc/msg.o
>   CC [M]  net/tipc/name_distr.o
>   CC [M]  net/tipc/subscr.o
>   CC [M]  net/tipc/monitor.o
> net/tipc/monitor.c: In function '__tipc_nl_add_monitor_peer':

Unless you are running a cluster of more than 32 nodes and need the hierarchical 
neighbor monitoring feature, you can just comment out the contents of this 
function and the other monitor-related netlink functions.

///jon

> net/tipc/monitor.c:707:3: error: implicit declaration of function
> 'nla_put_u64_64bit' [-Werror=implicit-function-declaration]
> cc1: some warnings being treated as errors
> make[2]: *** [net/tipc/monitor.o] Error 1
> make[1]: *** [net/tipc] Error 2
> make: *** [net] Error 2
> 
> 
> 
> -----Original Message-----
> From: Butler, Peter
> Sent: February-23-17 10:56 AM
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-
> discuss...@lists.sourceforge.net; Parthasarathy Bhuvaragan
> <parthasarathy.bhuvara...@ericsson.com>
> Cc: Butler, Peter <pbut...@sonusnet.com>
> Subject: RE: TIPC Oops in tipc_sk_recv
> 
> Hi Partha,
> 
> I'll give you the short version here to save you the time of reading this
> entire thread.
> 
> Basically I need to port the latest and greatest TIPC code (i.e. from the
> latest longterm kernel release, namely 4.9.11) into a 4.4.0 kernel source
> base.  (I know that sounds ugly but it's for an emergency quick-fix and
> upgrading the entire kernel is not an option at this time...)
> 
> Jon has said this is entirely doable but that you are the expert, and that
> there will be at least one minor hurdle in doing so, namely in iov handling in
> msg_build().
> 
> Thanks,
> 
> Peter
> 
> 
> 
> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: February-23-17 10:45 AM
> To: Butler, Peter <pbut...@sonusnet.com>; tipc-
> discuss...@lists.sourceforge.net; Parthasarathy Bhuvaragan
> <parthasarathy.bhuvara...@ericsson.com>
> Subject: RE: TIPC Oops in tipc_sk_recv
> 
> 
> 
> > -----Original Message-----
> > From: Butler, Peter [mailto:pbut...@sonusnet.com]
> > Sent: Thursday, February 23, 2017 10:25 AM
> > To: Jon Maloy <jon.ma...@ericsson.com>; tipc-
> > discuss...@lists.sourceforge.net
> > Cc: Butler, Peter <pbut...@sonusnet.com>
> > Subject: RE: TIPC Oops in tipc_sk_recv
> >
> > Hi Jon,
> >
> > Thanks for the info.  The solution we are considering (to give the
> > customer an emergency patch) is backport the TIPC code from kernel
> > 4.4.50 into our 4.4.0 kernel source tree.  From what I can see, I
> > should be able to do so with little effort.  I am assuming (?) that
> > since 4.4.x is a longterm kernel release that the
> > 4.4.50 TIPC code is considered stable and devoid of the original bug
> > associated with this section of code in tipc_sk_rcv() - am I wrong to
> > assume that?
> 
> Unfortunately yes. The only safe solution to the deadlock problem is the one
> you find in later versions.
> The patch fixing this particular problem hasn't been applied this far back,
> probably because it didn't apply cleanly.
> 
> > The section of code in question is entirely different in 4.4.50 than
> > what we currently have:
> >
> > 	if (likely(tsk)) {
> > 		sk = &tsk->sk;
> > 		if (likely(spin_trylock_bh(&sk->sk_lock.slock))) {
> > tipc_sk_enqueue(inputq, sk, dpo

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Jon Maloy


> -----Original Message-----
> From: Butler, Peter [mailto:pbut...@sonusnet.com]
> Sent: Thursday, February 23, 2017 01:23 PM
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-
> discuss...@lists.sourceforge.net; Parthasarathy Bhuvaragan
> <parthasarathy.bhuvara...@ericsson.com>
> Cc: Butler, Peter <pbut...@sonusnet.com>
> Subject: RE: TIPC Oops in tipc_sk_recv
> 
> That might be a possibility - I know the customer is close to 32 nodes
> however, so it might not be.
> 
> I'm also looking at porting the required functionality from
> include/net/netlink.h and lib/nlattr.c directly into the TIPC monitor.c file
> (as opposed to changing any code directly in include/net and lib/).

I think you are moving into dangerous waters here, unless you only want the 
code to compile.
A simpler and safer option: change #define TIPC_DEF_MON_THRESHOLD in core.h 
from 32 to e.g. 100, and the hierarchical monitoring will be disabled. This is 
how we always ran until 4.7, so it is a safe bet.

//jon

> 
> 
> 
> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: February-23-17 1:19 PM
> To: Butler, Peter <pbut...@sonusnet.com>; tipc-
> discuss...@lists.sourceforge.net; Parthasarathy Bhuvaragan
> <parthasarathy.bhuvara...@ericsson.com>
> Subject: RE: TIPC Oops in tipc_sk_recv
> 
> 
> 
> > -----Original Message-----
> > From: Butler, Peter [mailto:pbut...@sonusnet.com]
> > Sent: Thursday, February 23, 2017 01:09 PM
> > To: Jon Maloy <jon.ma...@ericsson.com>; tipc-
> > discuss...@lists.sourceforge.net; Parthasarathy Bhuvaragan
> > <parthasarathy.bhuvara...@ericsson.com>
> > Cc: Butler, Peter <pbut...@sonusnet.com>
> > Subject: RE: TIPC Oops in tipc_sk_recv
> >
> > Partha - an update for you
> >
> > I've ported all the TIPC code from 4.9.11 into our 4.4.0 kernel code
> > base.  By this I mean I have completely removed all the existing TIPC
> > files in their entirety from:
> >
> > include/uapi/linux/tipc*
> > net/tipc/*
> >
> > in our 4.4.0 kernel source tree, and replaced these with all the files
> > from 4.9.11.
> >
> > As Jon indeed forewarned me, there will be a hurdle or two to
> > integrate this with the 4.4.0 kernel's internal API.  As it stands
> > this is where the compilation first fails.  I can certainly look into
> > this myself but am told you are the expert.
> > (I am far from a kernel expert myself.)
> >
> >   LD  net/tipc/built-in.o
> >   CC [M]  net/tipc/addr.o
> >   CC [M]  net/tipc/bcast.o
> >   CC [M]  net/tipc/bearer.o
> >   CC [M]  net/tipc/core.o
> >   CC [M]  net/tipc/link.o
> >   CC [M]  net/tipc/discover.o
> >   CC [M]  net/tipc/msg.o
> >   CC [M]  net/tipc/name_distr.o
> >   CC [M]  net/tipc/subscr.o
> >   CC [M]  net/tipc/monitor.o
> > net/tipc/monitor.c: In function '__tipc_nl_add_monitor_peer':
> 
> Unless you are running a cluster of more than 32 nodes and need the
> hierarchical neighbor monitoring feature, you can just comment out the
> contents of this function and the other monitor-related netlink functions.
> 
> ///jon
> 
> > net/tipc/monitor.c:707:3: error: implicit declaration of function
> > 'nla_put_u64_64bit' [-Werror=implicit-function-declaration]
> > cc1: some warnings being treated as errors
> > make[2]: *** [net/tipc/monitor.o] Error 1
> > make[1]: *** [net/tipc] Error 2
> > make: *** [net] Error 2
> >
> >
> >
> > -----Original Message-----
> > From: Butler, Peter
> > Sent: February-23-17 10:56 AM
> > To: Jon Maloy <jon.ma...@ericsson.com>; tipc-
> > discuss...@lists.sourceforge.net; Parthasarathy Bhuvaragan
> > <parthasarathy.bhuvara...@ericsson.com>
> > Cc: Butler, Peter <pbut...@sonusnet.com>
> > Subject: RE: TIPC Oops in tipc_sk_recv
> >
> > Hi Partha,
> >
> > I'll give you the short version here to save you the time of reading
> > this entire thread.
> >
> > Basically I need to port the latest and greatest TIPC code (i.e. from
> > the latest longterm kernel release, namely 4.9.11) into a 4.4.0 kernel
> > source base.  (I know that sounds ugly but it's for an emergency
> > quick-fix and upgrading the entire kernel is not an option at this
> > time...)
> >
> > Jon has said this is entirely doable but that you are the expert, and
> > that there will be at least one minor hurdle in doing so, namely in
> > iov handling in msg_build().
> >
> > Thanks,
> >
> > Peter
> >
> >
> >
> > -Origina

Re: [tipc-discussion] [net 0/5] solve two deadlock issues

2017-02-21 Thread Jon Maloy
Hi John.
Yes, you are right. But I would still prefer a condensed patch that doesn’t 
touch the refcounts when this goes into ‘net’. I am awaiting a comment from 
Ying.

///jon


From: John Thompson [mailto:thompa@gmail.com]
Sent: Tuesday, February 21, 2017 04:18 PM
To: Jon Maloy <jon.ma...@ericsson.com>
Cc: Ying Xue <ying@windriver.com>; Parthasarathy Bhuvaragan 
<parthasarathy.bhuvara...@ericsson.com>; tipc-discussion@lists.sourceforge.net
Subject: Re: [net 0/5] solve two deadlock issues

Patch #2 removes the tipc_subscrp_get() and _put() from 
tipc_subscrp_report_overlap().
This prevents the problem of the early returns.
JT


On Wed, Feb 22, 2017 at 1:42 AM, Jon Maloy <jon.ma...@ericsson.com> wrote:
I don't see that you remove the two premature 'return's in 
tipc_subscrp_report_overlap() in your series. These are also genuine bugs that 
must be fixed.

///jon


> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Tuesday, February 21, 2017 06:12 AM
> To: Ying Xue <ying@windriver.com>; Parthasarathy Bhuvaragan
> <parthasarathy.bhuvara...@ericsson.com>; thompa@gmail.com
> Cc: tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] [net 0/5] solve two deadlock issues
>
> Hi Ying,
> These are good design changes that definitely should go in asap. However, I
> feel deeply uncomfortable with such a big change going into 'net', especially
> since our previous, exceptionally large, contribution has now turned out to
> have problems. I wonder if we couldn't get away with something simpler
> for 'net'.
>
> Looking closer at your series, it seems to me that only patches #1, #4, and
> the lock removal part of #5 are really needed to solve the problem we have
> at hand now. Why not merge those into one patch and deliver it to 'net',
> while the reference count redesign parts go into net-next?
>
> Regards
> ///jon
>
>
> > -----Original Message-----
> > From: Ying Xue [mailto:ying@windriver.com]
> > Sent: Monday, February 20, 2017 06:39 AM
> > To: Jon Maloy <jon.ma...@ericsson.com>; Parthasarathy Bhuvaragan
> > <parthasarathy.bhuvara...@ericsson.com>; thompa@gmail.com
> > Cc: tipc-discussion@lists.sourceforge.net
> > Subject: [net 0/5] solve two deadlock issues
> >
> > Commit d094c4d5f5 ("tipc: add subscription refcount to avoid invalid
> > delete") accidentally introduced the following deadlock scenarios:
> >
> >    CPU1:                                CPU2:
> > ----------                          ----------
> > tipc_nametbl_publish
> > spin_lock_bh(&tn->nametbl_lock)
> > tipc_nametbl_insert_publ
> > tipc_nameseq_insert_publ
> > tipc_subscrp_report_overlap
> > tipc_subscrp_get
> > tipc_subscrp_send_event
> >                                     tipc_close_conn
> >                                     tipc_subscrb_release_cb
> >                                     tipc_subscrb_delete
> >                                     tipc_subscrp_put
> > tipc_subscrp_put
> > tipc_subscrp_kref_release
> > tipc_nametbl_unsubscribe
> > spin_lock_bh(&tn->nametbl_lock)
> > <>
> >
> >    CPU1:                                CPU2:
> > ----------                          ----------
> > tipc_nametbl_stop
> > spin_lock_bh(&tn->nametbl_lock)
> > tipc_purge_publications
> > tipc_nameseq_remove_publ
> > tipc_subscrp_report_overlap
> > tipc_subscrp_get
> > tipc_subscrp_send_event
> >                                     tipc_close_conn
> >                                     tipc_subscrb_release_cb
> >                                     tipc_subscrb_delete
> >                                     tipc_subscrp_put
> > tipc_subscrp_put
> > tipc_subscrp_kref_release
> > tipc_nametbl_unsubscribe
> > spin_lock_bh(&tn->nametbl_lock)
> > <>
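Both traces follow the same pattern: the reference taken and dropped inside tipc_subscrp_report_overlap() can leave CPU1 holding the last reference while it still holds nametbl_lock, so the release path tries to take that lock a second time. The toy model below of the first trace assumes the subscription starts with a single reference owned by the subscriber connection; the function names are illustrative, not kernel code. It shows why removing the get/put pair, which is how patch #2 of the series is described in this thread, leaves the final put to CPU2, which merely waits for the lock instead of deadlocking.

```c
#include <assert.h>
#include <stdbool.h>

enum cpu { CPU_NONE, CPU1, CPU2 };

/* Toy model of the first trace: CPU1 holds nametbl_lock throughout;
 * CPU2 drops the subscriber's reference while CPU1 sends the event.
 * Returns which CPU drops the last reference and therefore runs the
 * release path that takes nametbl_lock. */
static enum cpu last_putter(bool report_takes_ref)
{
	int refs = 1;			/* owned by the subscriber connection */
	enum cpu last = CPU_NONE;

	/* CPU1: spin_lock_bh(&tn->nametbl_lock); ... report_overlap() */
	if (report_takes_ref)
		refs++;			/* CPU1: tipc_subscrp_get() */

	/* CPU1 in tipc_subscrp_send_event(); CPU2 closes the connection */
	if (--refs == 0)
		last = CPU2;		/* CPU2: tipc_subscrb_delete() -> put */

	if (report_takes_ref && --refs == 0)
		last = CPU1;		/* CPU1: tipc_subscrp_put(), lock held */

	return last;
}

/* A non-recursive spinlock re-taken by its owner never returns. */
static bool self_deadlock(bool report_takes_ref)
{
	return last_putter(report_takes_ref) == CPU1;
}
```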
> >
> > The root cause of both deadlocks is that we have to hold the nametbl lock
> > when a subscription is freed in tipc_subscrp_kref_release(). In order to
> > eliminate the need to take the nametbl lock in
> > tipc_subscrp_kref_release(), the functions protected by the nametbl lock
> > in tipc_subscrp_kref_release() are moved to oth

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-22 Thread Jon Maloy
Hi Peter,
It is very hard to make any suggestions on how to reproduce this. What I can 
see is that it is a STREAM message being sent from a node-local socket, i.e., 
it doesn't go via any interface. The crash seems to happen when the receiving 
socket is owned by the user, so that we are instead adding the message to the 
backlog queue:

Reading symbols from net/tipc/tipc.ko...done.
(gdb) list *(tipc_sk_rcv+0x238)
0x13d78 is in tipc_sk_rcv (./arch/x86/include/asm/atomic.h:214).
209     static __always_inline int __atomic_add_unless(atomic_t *v, int a, int u)
210     {
211             int c, old;
212             c = atomic_read(v);
213             for (;;) {
214                     if (unlikely(c == (u)))
215                             break;
216                     old = atomic_cmpxchg((v), c, c + (a));
217                     if (likely(old == c))
218                             break;

This is about what I can get out of it at the moment. Maybe you should try a 
high-load test between two local sockets (try the benchmark demo from 
tipcutils) and see what you can achieve.
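The gdb listing above stops inside the kernel's generic add-unless loop: read the counter, give up if it equals `u`, otherwise try a compare-and-exchange and retry on contention. Below is a portable C11 sketch of the same loop for illustration only; `add_unless()` is our name, not a kernel function. The first read of `*v` is where an invalid `v` faults. A small faulting address, like the ...00d8 in the report, is consistent with `v` pointing to a field inside a structure whose base pointer is NULL, though the log alone does not say which structure that is.

```c
#include <assert.h>
#include <stdatomic.h>

/* Add a to *v unless *v == u. Returns the value seen before the attempt,
 * like __atomic_add_unless(): the caller compares the result with u. */
static int add_unless(atomic_int *v, int a, int u)
{
	int c = atomic_load(v);  /* an invalid v faults on this first read */

	while (c != u) {
		/* On failure, c is refreshed with the current value and we retry. */
		if (atomic_compare_exchange_weak(v, &c, c + a))
			break;
	}
	return c;
}
```

For example, "take a reference only if the count is non-zero" is `add_unless(v, 1, 0) != 0`.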

BR
///jon


> -Original Message-
> From: Butler, Peter [mailto:pbut...@sonusnet.com]
> Sent: Wednesday, February 22, 2017 10:40 AM
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-
> discuss...@lists.sourceforge.net
> Cc: Butler, Peter <pbut...@sonusnet.com>
> Subject: RE: TIPC Oops in tipc_sk_recv
> 
> If you have any suggestions as to procedures/tricks you think might trigger
> this bug I can certainly attempt to do so in the lab.  Obviously we can't
> attempt to reproduce it on the customer's (live) system.
> 
> 
> 
> -Original Message-
> From: Butler, Peter
> Sent: February-21-17 3:39 PM
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-
> discuss...@lists.sourceforge.net
> Cc: Butler, Peter <pbut...@sonusnet.com>
> Subject: RE: TIPC Oops in tipc_sk_recv
> 
> Unfortunately this occurred on a customer system so it is not readily
> reproducible.  We have not seen this occur in our lab.
> 
> For what it's worth, it occurred while the process was in
> TASK_UNINTERRUPTIBLE.  As such, the kernel could not actually kill off the
> associated process despite the Oops, and the process remained forever
> frozen in the 'D' state and the card had to be rebooted.
> 
> 
> 
> 
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: February-21-17 3:36 PM
> To: Butler, Peter <pbut...@sonusnet.com>; tipc-
> discuss...@lists.sourceforge.net
> Subject: RE: TIPC Oops in tipc_sk_recv
> 
> Hi Peter,
> I don't think this is any known bug. Is it repeatable?
> 
> ///jon
> 
> > -Original Message-
> > From: Butler, Peter [mailto:pbut...@sonusnet.com]
> > Sent: Tuesday, February 21, 2017 12:14 PM
> > To: tipc-discussion@lists.sourceforge.net
> > Cc: Butler, Peter <pbut...@sonusnet.com>
> > Subject: [tipc-discussion] TIPC Oops in tipc_sk_recv
> >
> > This was with kernel 4.4.0, however I don't see any fix specifically
> > related to this in any subsequent 4.4.x kernel...
> >
> > BUG: unable to handle kernel NULL pointer dereference at 00d8
> > IP: [] tipc_sk_rcv+0x238/0x4d0 [tipc]
> > PGD 34f4c0067 PUD 34ed95067 PMD 0
> > Oops:  [#1] SMP
> > Modules linked in: nf_log_ipv4 nf_log_common xt_LOG sctp libcrc32c
> > e1000e tipc udp_tunnel ip6_udp_tunnel iTCO_wdt 8021q garp xt_physdev
> > br_netfilter bridge stp llc nf_conntrack_ipv4 ipmiq_drv(O)
> > nf_defrag_ipv4
> > sio_mmc(O) ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6
> > xt_state nf_conntrack event_drv(O) ip6table_filter lockd ip6_tables
> > pt_timer_info(O) ddi(O) grace usb_storage ixgbe igb
> > iTCO_vendor_support i2c_algo_bit ptp i2c_i801 pps_core lpc_ich
> > i2c_core intel_ips mfd_core pcspkr ioatdma sunrpc dca tpm_tis mdio tpm
> > [last unloaded: iTCO_wdt]
> > CPU: 2 PID: 12144 Comm: dinamo Tainted: G           O    4.4.0 #23
> > Hardware name: PT AMC124/Base Board Product Name, BIOS
> > LGNAJFIP.PTI.0012.P15 01/15/2014
> > task: 880036ad8000 ti: 88003690 task.ti: 88003690
> > RIP: 0010:[]  [] tipc_sk_rcv+0x238/0x4d0 [tipc]
> > RSP: 0018:880036903bb8  EFLAGS: 00010292
> > RAX:  RBX: 88034def3970 RCX: 0001
> > RDX: 0101 RSI: 0292 RDI: 88034def3984
> > RBP: 880036903c28 R08: 0101 R09: 0004
> > R10: 0001 R11:  R12: 880036903d28
> > R13: bd1fd8b2 R14: 88034def3840 R15: 880036903d3c
> > FS:  7f1e86299740() GS:88035fc4() knlGS:

Re: [tipc-discussion] tipc multicast stuck (hit max window) due to invalid bc_ack value

2017-02-22 Thread Jon Maloy
Hi Matthew,
See below for my comment.

Also, although this is about a different problem, you should check whether you
have the following patch, and the one it refers to:
commit 06bd2b1ed04ca9f ("tipc: fix broadcast link synchronization problem")

 
> -Original Message-
> From: Wong, Matthew [mailto:mw...@sonusnet.com]
> Sent: Wednesday, February 22, 2017 12:41 PM
> To: tipc-discussion@lists.sourceforge.net
> Subject: [tipc-discussion] tipc multicast stuck (hit max window) due to
> invalid bc_ack value
> 
> 
> Hi all,
> 
> I'm currently working on the 4.4.0 kernel and am observing the following
> issues with tipc multicast.
> 
> 
> 1.  I have a system set up with 3 CPUs, each using tipc to multicast to
> processes running on each CPU. After sending around 50 messages (the max
> window size), the far end no longer received any messages. When
> looking at the tipc-conf -ls data, it said the broadcast-link started bundling

[...]

> 
> 4.  It seems that tipc_msg_validate() modified the skb message and the hdr.
> The modified data looks fine and has the correct expected bc_ack/ack values
> in the message. However, the bc_ack and ack values are currently read
> before tipc_msg_validate() is called, so we use those values, which may
> cause issues in my bc_ack update and comparison.

The only possible culprit here is the function pskb_may_pull(), which is called
from tipc_msg_validate() and does real work only in the rare case that the
header part of the packet buffer is non-linear. The function is a little hard
to follow, but as I understand it, it linearizes the buffer in such cases, so
header fields read before the validation will obviously be wrong.
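The ordering problem can be shown with a small userspace model: if the header starts out partly outside the linear area, any field read before the "may pull" step sees whatever was in the linear area, not the real header. Everything below is hypothetical (`struct fake_skb`, `fake_may_pull()`, `HDR_LEN` and the field layout are invented for the sketch and are not the kernel's `struct sk_buff` or TIPC header format). In the model, not-yet-pulled bytes simply read as zero. In the kernel, the corresponding fix is to read ack/bc_ack from the header only after tipc_msg_validate() has returned true, which is what Matthew describes in point 5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_LEN 8   /* hypothetical header size for this model */

/* Hypothetical stand-in for an skb whose header may start out split between
 * the linear head and a fragment. Not the kernel's struct sk_buff. */
struct fake_skb {
	uint8_t head[HDR_LEN];  /* only the first headlen bytes are valid */
	size_t headlen;
	const uint8_t *frag;    /* rest of the packet, not directly readable */
	size_t frag_len;
};

static void fake_skb_init(struct fake_skb *skb, const uint8_t *pkt,
			  size_t len, size_t headlen)
{
	assert(headlen <= HDR_LEN && headlen <= len);
	memset(skb->head, 0, sizeof(skb->head));
	memcpy(skb->head, pkt, headlen);
	skb->headlen = headlen;
	skb->frag = pkt + headlen;
	skb->frag_len = len - headlen;
}

/* Loosely modelled on pskb_may_pull(): make the first len bytes linear. */
static bool fake_may_pull(struct fake_skb *skb, size_t len)
{
	size_t need;

	if (len > HDR_LEN)
		return false;
	if (skb->headlen >= len)
		return true;
	need = len - skb->headlen;
	if (need > skb->frag_len)
		return false;
	memcpy(skb->head + skb->headlen, skb->frag, need);
	skb->headlen = len;
	skb->frag += need;
	skb->frag_len -= need;
	return true;
}

/* Hypothetical field layout: ack in bytes 0-1, bc_ack in bytes 2-3. */
static uint16_t hdr_ack(const uint8_t *h)    { return (uint16_t)(h[0] << 8 | h[1]); }
static uint16_t hdr_bc_ack(const uint8_t *h) { return (uint16_t)(h[2] << 8 | h[3]); }

/* Validate (and linearize) first, then read the header fields. */
static bool rcv_read_acks(struct fake_skb *skb, uint16_t *ack, uint16_t *bc_ack)
{
	if (!fake_may_pull(skb, HDR_LEN))
		return false;
	*ack = hdr_ack(skb->head);
	*bc_ack = hdr_bc_ack(skb->head);
	return true;
}
```

Reading `hdr_bc_ack(skb->head)` before `rcv_read_acks()` returns a bogus value whenever the header was split; after it, both fields match the wire bytes.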

> 
> 
> 
> 5.  If I move the bc_ack and ack reads after tipc_msg_validate(), I no
> longer see the tipc multicast stall. I have run it for half a day with
> multicast on 4 CPUs, and so far there has been no tipc multicast bundling
> triggered and no bogus bc_ack issue. All multicast messages have been sent
> and received properly.
> 
> 
> 
> 6.  Is this known behavior, and is it an issue? If so, is there a patch for
> it, and will 4.4.48 have the same issue? Is tipc_msg_validate() supposed to
> modify the hdr data, and should we use the bc_ack/ack values only after the
> modification is completed?

We have never seen this before, but your diagnosis is totally credible. I
will post a patch for this asap.
Nice job!

BR
///jon

> 
> 
> 
> Any comment is appreciated.
> 
> 
> 
> Regards,
> 
>Matthew
> 
>Sonus network.
> 
> --
> Check out the vibrant tech community on one of the world's most engaging
> tech sites, SlashDot.org! http://sdm.link/slashdot
> ___
> tipc-discussion mailing list
> tipc-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/tipc-discussion



Re: [tipc-discussion] [net-next v3 3/4] tipc: introduce replicast as transport option for multicast

2017-01-17 Thread Jon Maloy
Thanks Partha. Any viewpoints from Ying? Otherwise I'll send it in tomorrow.

///jon


> -Original Message-
> From: Parthasarathy Bhuvaragan
> Sent: Tuesday, 17 January, 2017 09:35
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net;
> Ying Xue <ying@windriver.com>
> Subject: Re: [tipc-discussion] [net-next v3 3/4] tipc: introduce replicast as
> transport option for multicast
> 
> On 01/16/2017 03:13 PM, Jon Maloy wrote:
> >
> >
> >> -Original Message-
> >> From: Parthasarathy Bhuvaragan
> >> Sent: Monday, 16 January, 2017 05:20
> >> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-
> discuss...@lists.sourceforge.net;
> >> Ying Xue <ying@windriver.com>
> >> Subject: Re: [tipc-discussion] [net-next v3 3/4] tipc: introduce replicast 
> >> as
> >> transport option for multicast
> >>
> >> On 01/13/2017 04:18 PM, Jon Maloy wrote:
> >>>
> >>>
> >>>> -Original Message-
> >>>> From: Parthasarathy Bhuvaragan
> >>>> Sent: Friday, 13 January, 2017 04:24
> >>>> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-
> >> discuss...@lists.sourceforge.net;
> >>>> Ying Xue <ying@windriver.com>
> >>>> Subject: Re: [tipc-discussion] [net-next v3 3/4] tipc: introduce 
> >>>> replicast as
> >>>> transport option for multicast
> >>>>
> >>>> On 01/04/2017 06:05 PM, Parthasarathy Bhuvaragan wrote:
> >>>>> Hi Jon,
> >>>>>
> >>>>> Added some minor comments inline in this patch, apart from that the
> >>>>> major concern is the following:
> >>>>>
> >>>>> All my tests that passed before this patch fail when sending
> >>>>> multicast to a receiver on the same node.
> >>>>>
> >>>>> With this patch, we increase the likelihood of receive buffer overflow
> >>>>> if the sender and receivers are running on the same host, as we bypass
> >>>>> the link layer completely. I confirmed this with some traces in
> >>>>> filter_rcv().
> >>>>>
> >>>>> If I add another multicast listener running on another node, this
> >>>>> pacifies the sender (it puts the sender to sleep at link congestion),
> >>>>> and the relatively slow link layer reduces the buffer overflow.
> >>>>>
> >>>>> We need to find a way to reduce the aggressiveness of the sender.
> >>>>> The location of services should be transparent to users, so we
> >>>>> should provide similar characteristics regardless of the service
> >>>>> location.
> >>>>>
> >>>> Jon, running the ptts server and client on a standalone node without
> >>>> your updates failed. So in that respect, I am OK with this patch.
> >>>>
> >>>> If the Ethernet bearer lacks broadcast ability, then neighbor discovery
> >>>> will not work. So do we intend to introduce support for adding Ethernet
> >>>> peers manually, as we do for UDP bearers? Otherwise we can never use
> >>>> replicast for non-UDP bearers.
> >>>
> >>> I believe all Ethernet implementations, even overlay networks, provide some
> >>> form of broadcast, or in lack thereof, an emulated broadcast.
> >>> So, discovery should work, but it will be very inefficient when we do link
> >>> broadcast, because tipc will think that genuine Ethernet broadcast is supported.
> >>> We actually need some way to find out what kind of "Ethernet" we are
> >>> attached to, e.g. VXLAN, so that the "bcast supported" flag can be set correctly.
> >>> I wonder if that is possible, or if it has to be configured.
> >>>
> >> I assumed that, but thanks for the clarification. I infer from your
> >> statement that it's the user who should configure this per socket in case
> >> tipc is running over some kind of overlay network. Tipc has no
> >> knowledge about the tunnel mechanisms used under the exposed bearers.
> >
> > No, I don't think that is a good option. I think it will be only in very
> > special cases that the user will want to enforce replicast (e.g., security,
> > or if he knows there will always be very few destinations), and those would
> > not be related to the deployment.
> >
> >>
> >>

Re: [tipc-discussion] [net-next 0/3] tipc: improve interaction socket-link

2017-01-16 Thread Jon Maloy


> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of David Miller
> Sent: Monday, 16 January, 2017 14:45
> To: Jon Maloy <jon.ma...@ericsson.com>
> Cc: net...@vger.kernel.org; v...@zeniv.linux.org.uk; Parthasarathy Bhuvaragan
> <parthasarathy.bhuvara...@ericsson.com>; Ying Xue
> <ying@windriver.com>; ma...@donjonn.com; tipc-
> discuss...@lists.sourceforge.net
> Subject: Re: [net-next 0/3] tipc: improve interaction socket-link
> 
> From: Jon Maloy <jon.ma...@ericsson.com>
> Date: Tue,  3 Jan 2017 10:26:45 -0500
> 
> > We fix a very real starvation problem that may occur when a link
> > encounters send buffer congestion. At the same time we make the
> > interaction between the socket and link layer simpler and more
> > consistent.
> 
> This doesn't apply to net-next, also the Date in your emails is 10 days
> in the past.  What's going on here?

I don't know. This series was sent in on Jan 3rd, and applied by you the same 
day. Maybe the mail server decided to send you a duplicate?

///jon




[tipc-discussion] [net-next 1/4] tipc: add function for checking broadcast support in bearer

2017-01-18 Thread Jon Maloy
As a preparation for the 'replicast' functionality we are going to
introduce in the next commits, we need the broadcast base structure to
store whether bearer broadcast is available at all from the currently
used bearer or bearers.

We do this by adding a new function tipc_bearer_bcast_support() to
the bearer layer, and letting the bearer selection function in
bcast.c use this to give a new boolean field, 'bcast_support', the
appropriate value.
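The bcast.c hunks in this patch compute `bcast_support` in two steps: while scanning bearers it is AND-ed over every bearer considered, and if a primary bearer reaching all destinations is found, that bearer's own capability overrides the AND. The userspace model below mirrors that shape under stated assumptions: `select_primary()`, `struct bc_selection`, `N_BEARERS` and `NO_BEARER` are hypothetical names, and skipping bearers with no destinations is assumed from context because that part of the loop is not visible in the diff.

```c
#include <assert.h>
#include <stdbool.h>

#define N_BEARERS 3        /* illustrative bearer count for this model */
#define NO_BEARER (-1)

struct bc_selection {
	int primary;         /* bearer reaching all destinations, or NO_BEARER */
	bool bcast_support;  /* whether bearer broadcast can be used */
};

static struct bc_selection select_primary(const int dests[N_BEARERS],
					  const bool bearer_bcast[N_BEARERS],
					  int all_dests, unsigned own_addr)
{
	struct bc_selection sel = { NO_BEARER, true };

	if (!all_dests)
		return sel;
	for (int i = 0; i < N_BEARERS; i++) {
		if (!dests[i])
			continue;                  /* assumed: unused bearers skipped */
		sel.bcast_support &= bearer_bcast[i];
		if (dests[i] < all_dests)
			continue;                  /* does not reach every peer */
		sel.primary = i;
		if ((i ^ own_addr) & 1)            /* spread primary choice over nodes */
			break;
	}
	/* A primary bearer alone decides whether broadcast is usable. */
	if (sel.primary != NO_BEARER)
		sel.bcast_support = bearer_bcast[sel.primary];
	return sel;
}
```

So one non-broadcast bearer only forces replicast when no single bearer reaches all peers.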

Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>
Acked-by: Ying Xue <ying@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/bcast.c | 12 +---
 net/tipc/bearer.c| 15 ++-
 net/tipc/bearer.h|  8 +++-
 net/tipc/udp_media.c |  8 
 4 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index c35fad3..3256276 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -1,7 +1,7 @@
 /*
  * net/tipc/bcast.c: TIPC broadcast code
  *
- * Copyright (c) 2004-2006, 2014-2015, Ericsson AB
+ * Copyright (c) 2004-2006, 2014-2016, Ericsson AB
  * Copyright (c) 2004, Intel Corporation.
  * Copyright (c) 2005, 2010-2011, Wind River Systems
  * All rights reserved.
@@ -54,12 +54,14 @@ const char tipc_bclink_name[] = "broadcast-link";
  * @inputq: data input queue; will only carry SOCK_WAKEUP messages
  * @dest: array keeping number of reachable destinations per bearer
  * @primary_bearer: a bearer having links to all broadcast destinations, if any
+ * @bcast_support: indicates if primary bearer, if any, supports broadcast
  */
 struct tipc_bc_base {
struct tipc_link *link;
struct sk_buff_head inputq;
int dests[MAX_BEARERS];
int primary_bearer;
+   bool bcast_support;
 };
 
 static struct tipc_bc_base *tipc_bc_base(struct net *net)
@@ -79,9 +81,10 @@ static void tipc_bcbase_select_primary(struct net *net)
 {
struct tipc_bc_base *bb = tipc_bc_base(net);
int all_dests =  tipc_link_bc_peers(bb->link);
-   int i, mtu;
+   int i, mtu, prim;
 
bb->primary_bearer = INVALID_BEARER_ID;
+   bb->bcast_support = true;
 
if (!all_dests)
return;
@@ -93,7 +96,7 @@ static void tipc_bcbase_select_primary(struct net *net)
mtu = tipc_bearer_mtu(net, i);
if (mtu < tipc_link_mtu(bb->link))
tipc_link_set_mtu(bb->link, mtu);
-
+   bb->bcast_support &= tipc_bearer_bcast_support(net, i);
if (bb->dests[i] < all_dests)
continue;
 
@@ -103,6 +106,9 @@ static void tipc_bcbase_select_primary(struct net *net)
if ((i ^ tipc_own_addr(net)) & 1)
break;
}
+   prim = bb->primary_bearer;
+   if (prim != INVALID_BEARER_ID)
+   bb->bcast_support = tipc_bearer_bcast_support(net, prim);
 }
 
 void tipc_bcast_inc_bearer_dst_cnt(struct net *net, int bearer_id)
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 52d7476..33a5bdf 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -431,7 +431,7 @@ int tipc_enable_l2_media(struct net *net, struct tipc_bearer *b,
memset(&b->bcast_addr, 0, sizeof(b->bcast_addr));
memcpy(b->bcast_addr.value, dev->broadcast, b->media->hwaddr_len);
b->bcast_addr.media_id = b->media->type_id;
-   b->bcast_addr.broadcast = 1;
+   b->bcast_addr.broadcast = TIPC_BROADCAST_SUPPORT;
b->mtu = dev->mtu;
b->media->raw2addr(b, &b->addr, (char *)dev->dev_addr);
rcu_assign_pointer(dev->tipc_ptr, b);
@@ -482,6 +482,19 @@ int tipc_l2_send_msg(struct net *net, struct sk_buff *skb,
return 0;
 }
 
+bool tipc_bearer_bcast_support(struct net *net, u32 bearer_id)
+{
+   bool supp = false;
+   struct tipc_bearer *b;
+
+   rcu_read_lock();
+   b = bearer_get(net, bearer_id);
+   if (b)
+   supp = (b->bcast_addr.broadcast == TIPC_BROADCAST_SUPPORT);
+   rcu_read_unlock();
+   return supp;
+}
+
 int tipc_bearer_mtu(struct net *net, u32 bearer_id)
 {
int mtu = 0;
diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
index 278ff7f..635c908 100644
--- a/net/tipc/bearer.h
+++ b/net/tipc/bearer.h
@@ -60,9 +60,14 @@
 #define TIPC_MEDIA_TYPE_IB 2
#define TIPC_MEDIA_TYPE_UDP 3
 
-/* minimum bearer MTU */
+/* Minimum bearer MTU */
#define TIPC_MIN_BEARER_MTU (MAX_H_SIZE + INT_H_SIZE)
 
+/* Identifiers for distinguishing between broadcast/multicast and replicast
+ */
+#define TIPC_BROADCAST_SUPPORT  1
+#define TIPC_REPLICAST_SUPPORT  2
+
 /**
  * struct tipc_media_addr - destination address used by TIPC bearers
  * @value: address info (format defined by media)
@@ -210,6 +215,7 @@ int tipc_bearer_setup(void);
 void tipc_bearer_cleanup(void);
 void tipc_bearer_stop(struct n

[tipc-discussion] [net-next 4/4] tipc: make replicast a user selectable option

2017-01-18 Thread Jon Maloy
If the bearer carrying multicast messages supports broadcast, those
messages will be sent to all cluster nodes, irrespective of whether
these nodes host any actual destination sockets or not. This is clearly
wasteful if the cluster is large and there are only a few real
destinations for the message being sent.

In this commit we extend the eligibility of the newly introduced
"replicast" transmit option. We now make it possible for a user to
select which method he wants to be used, either as a mandatory setting
via setsockopt(), or as a relative setting where we let the broadcast
layer decide which method to use based on the ratio between cluster
size and the message's actual number of destination nodes.

In the latter case, a sending socket must stick to a previously
selected method until it enters an idle period of at least 5 seconds.
This eliminates the risk of message reordering caused by method change,
i.e., when changes to cluster size or number of destinations would
otherwise mandate a new method to be used.
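The selection rule described above can be sketched in userspace as follows. This is a minimal model, not the patch's code (the patch's own selection function is truncated in this archive). `select_xmit_method()`, `struct mc_method` and the millisecond clock are assumptions for illustration. The threshold formula is the one in `tipc_bcbase_calc_bc_threshold()` below, the "replicast when dests <= threshold" rule follows the `@bc_threshold` comment, the `rcast_support` check follows the `@rcast_support` comment, and the 5 s stickiness follows the commit message. Timestamp wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define METHOD_STICKY_MS 5000UL   /* "idle period of at least 5 seconds" */

struct mc_method {
	bool rcast;             /* true: replicast, false: broadcast */
	bool mandatory;         /* fixed by the user via setsockopt() */
	unsigned long expires;  /* ms timestamp until which the method is kept */
};

static int calc_bc_threshold(int cluster_size, int rc_ratio)
{
	return 1 + (cluster_size * rc_ratio / 100);
}

static void select_xmit_method(bool bcast_support, bool rcast_support,
			       int bc_threshold, int dests,
			       unsigned long now_ms, struct mc_method *m)
{
	unsigned long exp = m->expires;

	if (!bcast_support) {           /* bearer cannot broadcast: replicate */
		m->rcast = true;
		return;
	}
	if (!rcast_support) {           /* some peer cannot handle replicast */
		m->rcast = false;
		return;
	}
	/* Every send pushes the expiry forward, so the method can only
	 * change after the socket has been idle for METHOD_STICKY_MS. */
	m->expires = now_ms + METHOD_STICKY_MS;
	if (m->mandatory || now_ms < exp)
		return;
	m->rcast = dests <= bc_threshold;
}
```

With a 10-node cluster and a 25% ratio the threshold is 3: a socket that starts with 2 destinations uses replicast and keeps it while it sends more than once per 5 s, even if the destination count grows.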

Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>
Acked-by: Ying Xue <ying@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 include/uapi/linux/tipc.h |  6 +++--
 net/tipc/bcast.c  | 62 ++-
 net/tipc/bcast.h  | 17 -
 net/tipc/link.c   |  4 +++
 net/tipc/node.h   |  4 ++-
 net/tipc/socket.c | 36 +--
 6 files changed, 112 insertions(+), 17 deletions(-)

diff --git a/include/uapi/linux/tipc.h b/include/uapi/linux/tipc.h
index bf049e8..5351b08 100644
--- a/include/uapi/linux/tipc.h
+++ b/include/uapi/linux/tipc.h
@@ -1,7 +1,7 @@
 /*
  * include/uapi/linux/tipc.h: Header for TIPC socket interface
  *
- * Copyright (c) 2003-2006, Ericsson AB
+ * Copyright (c) 2003-2006, 2015-2016 Ericsson AB
  * Copyright (c) 2005, 2010-2011, Wind River Systems
  * All rights reserved.
  *
@@ -220,7 +220,7 @@ struct sockaddr_tipc {
 #define TIPC_DESTNAME  3   /* destination name */
 
 /*
- * TIPC-specific socket option values
+ * TIPC-specific socket option names
  */
 
#define TIPC_IMPORTANCE    127 /* Default: TIPC_LOW_IMPORTANCE */
@@ -229,6 +229,8 @@ struct sockaddr_tipc {
 #define TIPC_CONN_TIMEOUT  130 /* Default: 8000 (ms)  */
 #define TIPC_NODE_RECVQ_DEPTH  131 /* Default: none (read only) */
 #define TIPC_SOCK_RECVQ_DEPTH  132 /* Default: none (read only) */
+#define TIPC_MCAST_BROADCAST  133 /* Default: TIPC selects. No arg */
+#define TIPC_MCAST_REPLICAST  134 /* Default: TIPC selects. No arg */
 
 /*
  * Maximum sizes of TIPC bearer-related names (including terminating NULL)
diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 672e6ef..7d99029 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -54,6 +54,9 @@ const char tipc_bclink_name[] = "broadcast-link";
  * @dest: array keeping number of reachable destinations per bearer
  * @primary_bearer: a bearer having links to all broadcast destinations, if any
  * @bcast_support: indicates if primary bearer, if any, supports broadcast
+ * @rcast_support: indicates if all peer nodes support replicast
+ * @rc_ratio: dest count as percentage of cluster size where send method changes
+ * @bc_threshold: calculated from rc_ratio; if dests > threshold use broadcast
  */
 struct tipc_bc_base {
struct tipc_link *link;
@@ -61,6 +64,9 @@ struct tipc_bc_base {
int dests[MAX_BEARERS];
int primary_bearer;
bool bcast_support;
+   bool rcast_support;
+   int rc_ratio;
+   int bc_threshold;
 };
 
 static struct tipc_bc_base *tipc_bc_base(struct net *net)
@@ -73,6 +79,19 @@ int tipc_bcast_get_mtu(struct net *net)
return tipc_link_mtu(tipc_bc_sndlink(net)) - INT_H_SIZE;
 }
 
+void tipc_bcast_disable_rcast(struct net *net)
+{
+   tipc_bc_base(net)->rcast_support = false;
+}
+
+static void tipc_bcbase_calc_bc_threshold(struct net *net)
+{
+   struct tipc_bc_base *bb = tipc_bc_base(net);
+   int cluster_size = tipc_link_bc_peers(tipc_bc_sndlink(net));
+
+   bb->bc_threshold = 1 + (cluster_size * bb->rc_ratio / 100);
+}
+
 /* tipc_bcbase_select_primary(): find a bearer with links to all destinations,
  *   if any, and make it primary bearer
  */
@@ -175,6 +194,31 @@ static void tipc_bcbase_xmit(struct net *net, struct sk_buff_head *xmitq)
__skb_queue_purge(&_xmitq);
 }
 
+static void tipc_bcast_select_xmit_method(struct net *net, int dests,
+ struct tipc_mc_method *method)
+{
+   struct tipc_bc_base *bb = tipc_bc_base(net);
+   unsigned long exp = method->expires;
+
+   /* Broadcast supported by used bearer/bearers? */
+   if (!bb->bcast_support) {
+   method->rcast = true;
+   return;
+   }
+   /* Any 

[tipc-discussion] [net-next 0/4] tipc: emulate multicast through replication

2017-01-18 Thread Jon Maloy
TIPC multicast messages are currently distributed via L2 broadcast
or IP multicast to all nodes in the cluster, irrespective of the
number of real destinations of the message.

In this series we introduce an option to transport messages via
replication ("replicast") across a selected number of unicast links,
instead of relying on the underlying media. This option is used when
true broadcast/multicast is not supported by the media, or when the
number of true destinations is much smaller than the cluster size.


Jon Maloy (4):
  tipc: add function for checking broadcast support in bearer
  tipc: add functionality to lookup multicast destination nodes
  tipc: introduce replicast as transport option for multicast
  tipc: make replicast a user selectable option

 include/uapi/linux/tipc.h |   6 +-
 net/tipc/bcast.c  | 200 +++---
 net/tipc/bcast.h  |  33 +++-
 net/tipc/bearer.c |  15 +++-
 net/tipc/bearer.h |   8 +-
 net/tipc/link.c   |  12 ++-
 net/tipc/msg.c|  17 
 net/tipc/msg.h|   9 +--
 net/tipc/name_table.c |  38 +++--
 net/tipc/name_table.h |   9 +++
 net/tipc/node.c   |  27 ---
 net/tipc/node.h   |   4 +-
 net/tipc/socket.c |  61 ++
 net/tipc/udp_media.c  |   8 +-
 14 files changed, 374 insertions(+), 73 deletions(-)

-- 
2.7.4




[tipc-discussion] [net-next 2/4] tipc: add functionality to lookup multicast destination nodes

2017-01-18 Thread Jon Maloy
As a further preparation for the upcoming 'replicast' functionality,
we add some necessary structs and functions for looking up and returning
a list of all nodes that host destinations for a given multicast message.
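The `tipc_nlist` helpers in the diff below keep the own node out of the list (only a `local` flag is set for it) and count remote nodes without duplicates. The lookup collects the nodes of every publication overlapping `[lower, upper]`. A self-contained userspace model of that behaviour follows. The fixed-size array, `struct publ_range` and all function names are hypothetical simplifications (the patch uses a linked list and the real name table), and the domain scope check done by tipc_in_scope() is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NLIST_MAX 16   /* hypothetical capacity; the patch uses a list */

struct nlist {
	uint32_t nodes[NLIST_MAX];  /* remote nodes, no duplicates */
	uint32_t self;
	uint16_t remote;
	bool local;
};

/* Hypothetical flattened name table entry: one publication per range. */
struct publ_range {
	uint32_t lower, upper, node;
};

static void nlist_init(struct nlist *nl, uint32_t self)
{
	memset(nl, 0, sizeof(*nl));
	nl->self = self;
}

static bool nlist_find(const struct nlist *nl, uint32_t node, int *pos)
{
	for (int i = 0; i < nl->remote; i++) {
		if (nl->nodes[i] == node) {
			if (pos)
				*pos = i;
			return true;
		}
	}
	return false;
}

static void nlist_add(struct nlist *nl, uint32_t node)
{
	if (node == nl->self) {
		nl->local = true;        /* own node is flagged, not listed */
		return;
	}
	if (nlist_find(nl, node, NULL) || nl->remote == NLIST_MAX)
		return;
	nl->nodes[nl->remote++] = node;
}

static void nlist_del(struct nlist *nl, uint32_t node)
{
	int pos;

	if (node == nl->self) {
		nl->local = false;
		return;
	}
	if (!nlist_find(nl, node, &pos))
		return;
	nl->nodes[pos] = nl->nodes[--nl->remote];  /* order not preserved */
}

/* Collect nodes of all publications overlapping [lower, upper]. */
static void lookup_dst_nodes(const struct publ_range *p, size_t n,
			     uint32_t lower, uint32_t upper, struct nlist *nl)
{
	for (size_t i = 0; i < n; i++)
		if (p[i].lower <= upper && p[i].upper >= lower)
			nlist_add(nl, p[i].node);
}
```

The `remote` count is what lets the sender compare the real number of destination nodes with the cluster size.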

Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>
Acked-by: Ying Xue <ying@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/bcast.c  | 33 +++--
 net/tipc/bcast.h  | 15 ++-
 net/tipc/name_table.c | 38 +-
 net/tipc/name_table.h |  9 +
 4 files changed, 87 insertions(+), 8 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 3256276..412d335 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -39,9 +39,8 @@
 #include "socket.h"
 #include "msg.h"
 #include "bcast.h"
-#include "name_distr.h"
 #include "link.h"
-#include "node.h"
+#include "name_table.h"
 
#define BCLINK_WIN_DEFAULT  50  /* bcast link window size (default) */
#define BCLINK_WIN_MIN      32  /* bcast minimum link window size */
@@ -434,3 +433,33 @@ void tipc_bcast_stop(struct net *net)
kfree(tn->bcbase);
kfree(tn->bcl);
 }
+
+void tipc_nlist_init(struct tipc_nlist *nl, u32 self)
+{
+   memset(nl, 0, sizeof(*nl));
+   INIT_LIST_HEAD(&nl->list);
+   nl->self = self;
+}
+
+void tipc_nlist_add(struct tipc_nlist *nl, u32 node)
+{
+   if (node == nl->self)
+   nl->local = true;
+   else if (u32_push(&nl->list, node))
+   nl->remote++;
+}
+
+void tipc_nlist_del(struct tipc_nlist *nl, u32 node)
+{
+   if (node == nl->self)
+   nl->local = false;
+   else if (u32_del(&nl->list, node))
+   nl->remote--;
+}
+
+void tipc_nlist_purge(struct tipc_nlist *nl)
+{
+   u32_list_purge(&nl->list);
+   nl->remote = 0;
+   nl->local = 0;
+}
diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
index 855d53c..18f3791 100644
--- a/net/tipc/bcast.h
+++ b/net/tipc/bcast.h
@@ -42,9 +42,22 @@
 struct tipc_node;
 struct tipc_msg;
 struct tipc_nl_msg;
-struct tipc_node_map;
+struct tipc_nlist;
+struct tipc_nitem;
 extern const char tipc_bclink_name[];
 
+struct tipc_nlist {
+   struct list_head list;
+   u32 self;
+   u16 remote;
+   bool local;
+};
+
+void tipc_nlist_init(struct tipc_nlist *nl, u32 self);
+void tipc_nlist_purge(struct tipc_nlist *nl);
+void tipc_nlist_add(struct tipc_nlist *nl, u32 node);
+void tipc_nlist_del(struct tipc_nlist *nl, u32 node);
+
 int tipc_bcast_init(struct net *net);
 void tipc_bcast_stop(struct net *net);
 void tipc_bcast_add_peer(struct net *net, struct tipc_link *l,
diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index 5a86df1..9be6592 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -645,6 +645,39 @@ int tipc_nametbl_mc_translate(struct net *net, u32 type, u32 lower, u32 upper,
return res;
 }
 
+/* tipc_nametbl_lookup_dst_nodes - find broadcast destination nodes
+ * - Creates list of nodes that overlap the given multicast address
+ * - Determines if any node local ports overlap
+ */
+void tipc_nametbl_lookup_dst_nodes(struct net *net, u32 type, u32 lower,
+  u32 upper, u32 domain,
+  struct tipc_nlist *nodes)
+{
+   struct sub_seq *sseq, *stop;
+   struct publication *publ;
+   struct name_info *info;
+   struct name_seq *seq;
+
+   rcu_read_lock();
+   seq = nametbl_find_seq(net, type);
+   if (!seq)
+   goto exit;
+
+   spin_lock_bh(&seq->lock);
+   sseq = seq->sseqs + nameseq_locate_subseq(seq, lower);
+   stop = seq->sseqs + seq->first_free;
+   for (; sseq->lower <= upper && sseq != stop; sseq++) {
+   info = sseq->info;
+   list_for_each_entry(publ, &info->zone_list, zone_list) {
+   if (tipc_in_scope(domain, publ->node))
+   tipc_nlist_add(nodes, publ->node);
+   }
+   }
+   spin_unlock_bh(&seq->lock);
+exit:
+   rcu_read_unlock();
+}
+
 /*
  * tipc_nametbl_publish - add name publication to network name tables
  */
@@ -1022,11 +1055,6 @@ int tipc_nl_name_table_dump(struct sk_buff *skb, struct netlink_callback *cb)
return skb->len;
 }
 
-struct u32_item {
-   struct list_head list;
-   u32 value;
-};
-
 bool u32_find(struct list_head *l, u32 value)
 {
struct u32_item *item;
diff --git a/net/tipc/name_table.h b/net/tipc/name_table.h
index c89bb3f..6ebdeb1 100644
--- a/net/tipc/name_table.h
+++ b/net/tipc/name_table.h
@@ -39,6 +39,7 @@
 
 struct tipc_subscription;
 struct tipc_plist;
+struct tipc_nlist;
 
 /*
  * TIPC name types reserved for internal TIPC use (both current and planned)
@

[tipc-discussion] [PATCH net-next 1/2] tipc: make bearer packet filtering generic

2016-08-16 Thread Jon Maloy
In commit 5b7066c3dd24 ("tipc: stricter filtering of packets in bearer
layer") we introduced a method of filtering out messages while a bearer
is being reset, to prevent links from being re-created and coming back
to working state while we are still in the process of shutting them down.

This solution works well, but it only works with L2 media, which is
insufficient given the increasing use of UDP as carrier media.

We now replace this solution with a more generic one, by introducing a
new flag "up" in the generic struct tipc_bearer. This field will be set
and reset at the same locations as with the previous solution, while
the packet filtering is moved to the generic code for the sending side.
On the receiving side, the filtering is still done in media specific
code, but now including the UDP bearer.
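The three passes that tipc_bearer_reset_all() gets in the diff below (take every bearer down, reset them all, bring them all back up) plus the generic send-side check can be modelled in userspace. In this sketch, C11 atomics stand in for the kernel's bit operations on `b->up`. `struct bearer`, `bearer_xmit()`, the counters and `N_BEARERS` are hypothetical. The send attempt inside the reset pass is there only to show that traffic generated mid-reset is filtered.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define N_BEARERS 3   /* illustrative array size for this model */

/* Hypothetical model of a bearer carrying a generic "up" flag. */
struct bearer {
	atomic_bool up;
	int sent;      /* packets handed to the media layer */
	int dropped;   /* packets filtered because the bearer was not up */
	int resets;
};

static void bearer_set_up(struct bearer *b, bool up)
{
	atomic_store_explicit(&b->up, up, memory_order_release);
}

/* Generic send-side filter: nothing reaches the media while down. */
static void bearer_xmit(struct bearer *b)
{
	if (!atomic_load_explicit(&b->up, memory_order_acquire)) {
		b->dropped++;
		return;
	}
	b->sent++;
}

/* Down all bearers, reset them all, then bring them all back up. */
static void bearer_reset_all(struct bearer *list[], size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (list[i])
			bearer_set_up(list[i], false);
	for (i = 0; i < n; i++)
		if (list[i]) {
			list[i]->resets++;
			bearer_xmit(list[i]);  /* traffic during reset is dropped */
		}
	for (i = 0; i < n; i++)
		if (list[i])
			bearer_set_up(list[i], true);
}
```

Taking all bearers down before resetting any of them means no link can be re-created over another bearer that has not been reset yet.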

Acked-by: Ying Xue <ying@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/bearer.c| 78 +++-
 net/tipc/bearer.h|  1 +
 net/tipc/udp_media.c |  2 +-
 3 files changed, 42 insertions(+), 39 deletions(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 65b1bbf..6fc4e3c 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -56,6 +56,13 @@ static struct tipc_media * const media_info_array[] = {
NULL
 };
 
+static struct tipc_bearer *bearer_get(struct net *net, int bearer_id)
+{
+   struct tipc_net *tn = tipc_net(net);
+
+   return rcu_dereference_rtnl(tn->bearer_list[bearer_id]);
+}
+
 static void bearer_disable(struct net *net, struct tipc_bearer *b);
 
 /**
@@ -323,6 +330,7 @@ restart:
b->domain = disc_domain;
b->net_plane = bearer_id + 'A';
b->priority = priority;
+   test_and_set_bit_lock(0, &b->up);
 
res = tipc_disc_create(net, b, &b->bcast_addr, );
if (res) {
@@ -360,15 +368,24 @@ static int tipc_reset_bearer(struct net *net, struct tipc_bearer *b)
  */
 void tipc_bearer_reset_all(struct net *net)
 {
-   struct tipc_net *tn = tipc_net(net);
struct tipc_bearer *b;
int i;
 
for (i = 0; i < MAX_BEARERS; i++) {
-   b = rcu_dereference_rtnl(tn->bearer_list[i]);
+   b = bearer_get(net, i);
+   if (b)
+   clear_bit_unlock(0, &b->up);
+   }
+   for (i = 0; i < MAX_BEARERS; i++) {
+   b = bearer_get(net, i);
if (b)
tipc_reset_bearer(net, b);
}
+   for (i = 0; i < MAX_BEARERS; i++) {
+   b = bearer_get(net, i);
+   if (b)
+   test_and_set_bit_lock(0, &b->up);
+   }
 }
 
 /**
@@ -382,8 +399,9 @@ static void bearer_disable(struct net *net, struct tipc_bearer *b)
int bearer_id = b->identity;
 
pr_info("Disabling bearer <%s>\n", b->name);
-   b->media->disable_media(b);
+   clear_bit_unlock(0, &b->up);
tipc_node_delete_links(net, bearer_id);
+   b->media->disable_media(b);
RCU_INIT_POINTER(b->media_ptr, NULL);
if (b->link_req)
tipc_disc_delete(b->link_req);
@@ -440,22 +458,16 @@ int tipc_l2_send_msg(struct net *net, struct sk_buff *skb,
 {
struct net_device *dev;
int delta;
-   void *tipc_ptr;
 
dev = (struct net_device *)rcu_dereference_rtnl(b->media_ptr);
if (!dev)
return 0;
 
-   /* Send RESET message even if bearer is detached from device */
-   tipc_ptr = rcu_dereference_rtnl(dev->tipc_ptr);
-   if (unlikely(!tipc_ptr && !msg_is_reset(buf_msg(skb))))
-   goto drop;
-
-   delta = dev->hard_header_len - skb_headroom(skb);
-   if ((delta > 0) &&
-   pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC))
-   goto drop;
-
+   delta = SKB_DATA_ALIGN(dev->hard_header_len - skb_headroom(skb));
+   if ((delta > 0) && pskb_expand_head(skb, delta, 0, GFP_ATOMIC)) {
+   kfree_skb(skb);
+   return 0;
+   }
skb_reset_network_header(skb);
skb->dev = dev;
skb->protocol = htons(ETH_P_TIPC);
@@ -463,9 +475,6 @@ int tipc_l2_send_msg(struct net *net, struct sk_buff *skb,
dev->dev_addr, skb->len);
dev_queue_xmit(skb);
return 0;
-drop:
-   kfree_skb(skb);
-   return 0;
 }
 
 int tipc_bearer_mtu(struct net *net, u32 bearer_id)
@@ -487,12 +496,12 @@ void tipc_bearer_xmit_skb(struct net *net, u32 bearer_id,
  struct sk_buff *skb,
  struct tipc_media_addr *dest)
 {
-   struct tipc_net *tn = tipc_net(net);
+   struct tipc_msg *hdr = buf_msg(skb);
struct tipc_bearer *b;
 
rcu_read_lock();
-   b = rcu_dereference_rtnl(tn->bearer_list[bearer_id]);
-   if

Re: [tipc-discussion] [PATCH 0/3] tipcutils: remove duplicated functionality

2016-08-15 Thread Jon Maloy
If Erik doesn't have access to the tipcutils repository, I suggest one of you 
guys apply it and check it in. 
Acked-by: jon

> -Original Message-
> From: Richard Alpe
> Sent: Monday, 01 August, 2016 08:39
> To: Erik Hugne <erik.hu...@gmail.com>; Jon Maloy <jon.ma...@ericsson.com>;
> Ying Xue <ying@windriver.com>; Parthasarathy Bhuvaragan
> <parthasarathy.bhuvara...@ericsson.com>
> Cc: tipc-discussion@lists.sourceforge.net
> Subject: Re: [PATCH 0/3] tipcutils: remove duplicated functionality
> 
> What happened to this patch-set?
> 
> /Richard
> 
> On 2016-04-09 08:21, Erik Hugne wrote:
> > It made sense to keep tipcconfig in the tipcutils package for a while
> > after Richard merged it to iproute2, but now it's becoming more of a
> > source of confusion, and any patches submitted to iproute2/tipc would
> > need to be applied here as well.
> >
> > This series removes tipcconfig from the utils package, and also does
> > some other small cleanups.
> >
> > Erik Hugne (3):
> >   tipcutils: fix .gitignore and remove binaries from git control
> >   tipcutils: fix build warnings
> >   tipcutils: purge old config tool
> >
> >  .gitignore |3 +
> >  Makefile.am|1 -
> >  configure.ac   |3 +-
> >  demos/c_api_demo/tipc_c_api_client |  Bin 44348 -> 0 bytes
> >  demos/c_api_demo/tipc_c_api_server |  Bin 44404 -> 0 bytes
> >  demos/iov_control/iov_control.c|4 +-
> >  man/Makefile.am|2 +-
> >  man/tipc-config.1  |  115 ---
> >  man/tipc-pipe.1|3 -
> >  ptts/tipc_ts_server.c  |9 -
> >  tipc-config/Makefile.am|1 -
> >  tipc-config/README |   34 -
> >  tipc-config/tipc-config.c  | 1506
> >  tipc-pipe/tipc-pipe.c  |2 +-
> >  14 files changed, 9 insertions(+), 1674 deletions(-)
> >  delete mode 100755 demos/c_api_demo/tipc_c_api_client
> >  delete mode 100755 demos/c_api_demo/tipc_c_api_server
> >  delete mode 100644 man/tipc-config.1
> >  delete mode 100644 tipc-config/Makefile.am
> >  delete mode 100644 tipc-config/README
> >  delete mode 100644 tipc-config/tipc-config.c
> >


--
___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion


Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-27 Thread Jon Maloy


> -Original Message-
> From: Butler, Peter [mailto:pbut...@sonusnet.com]
> Sent: Monday, February 27, 2017 10:15 AM
> To: Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>
> Cc: Jon Maloy <jon.ma...@ericsson.com>; tipc-
> discuss...@lists.sourceforge.net; Butler, Peter <pbut...@sonusnet.com>
> Subject: RE: TIPC Oops in tipc_sk_recv
> 
> Partha,
> 
> You said: "A word of caution: My users still face stability issues (connections
> are permanently congested) while running load over several connections
> (~1000+) and I am yet to find the root cause."

It is 1000+ connections with very high traffic load. We will keep you updated.
PS. I am also in CET now, and for the next two weeks.

///jon

> 
> When you say "connections are permanently congested while running load
> over several connections", do you mean 1000+ connections?  Or 1000+
> messages per second?
> 
> Our mesh only has ~30 nodes.
> 
> Peter
> 
> 
> 
> 
> 
> -Original Message-
> From: Parthasarathy Bhuvaragan
> [mailto:parthasarathy.bhuvara...@ericsson.com]
> Sent: February-27-17 7:37 AM
> To: Butler, Peter <pbut...@sonusnet.com>
> Cc: Jon Maloy <jon.ma...@ericsson.com>; tipc-
> discuss...@lists.sourceforge.net
> Subject: Re: TIPC Oops in tipc_sk_recv
> 
> On 02/24/2017 04:15 PM, Butler, Peter wrote:
> > Hi Partha,
> >
> >
> >
> > In our situation we do not need to support the delivery of RPMs in any
> > way.  Literally the only thing we are changing on the target systems
> > is the tipc.ko file.  That is, the original 4.4.0 kernel and all other
> > 4.4.0-specific kernel modules will be left untouched.
> >
> >
> >
> > I am doing this by actually removing the include/uapi/linux/tipc* and
> > net/tipc/* files from within our 4.4.0 kernel source tree, and
> > replacing them with the files from kernel 4.9.11.  (Note that kernel
> > 4.9.11 actually has a couple more TIPC-related files than the 4.4.0
> > kernel.) To accomplish this I had to make a few changes (as per the email
> > thread between Jon and myself) to get it to compile.
> >
> >
> >
> > Then, when I kick off a 'make' (no 'make clean' is performed) at the
> > top level of the kernel source tree the build process detects that
> > everything TIPC-related requires building, and a new tipc.ko is
> > generated.  This tipc.ko is literally taken and installed onto the
> > existing 4.4.0 systems without any other changes (e.g. no new bzImage
> > is installed - the original kernel file is left untouched).
> >
> >
> >
> > We're not concerned about maintainability for now, as we plan on doing
> > a full upgrade of the entire kernel at some point in the next few months.
> > The hybrid of a 4.4.0 kernel running a TIPC source from 4.9.11 is only
> > a stop-gap measure for an emergency fix needed asap.
> >
> >
> >
> > If you can foresee any issues with our short-term plan here let me
> > know.  As it stands I have the module built and running - but that of
> > course doesn't mean that run-time issues won't occur.
> >
> Hi Peter,
> 
> If you are taking the latest tipc, please patch yours with these two fixes for
> the socket.
> [PATCH net v1 1/2] tipc: fix socket flow control errors
> [PATCH net v1 2/2] tipc: Fix missing connection request handling
> 
> A word of caution: My users still face stability issues (connections are
> permanently congested) while running load over several connections
> (~1000+) and I am yet to find the root cause.
> 
> /Partha
> >
> >
> > /Peter
> >
> >
> >
> > *From:*Parthasarathy Bhuvaragan
> > [mailto:parthasarathy.bhuvara...@ericsson.com]
> > *Sent:* February-24-17 5:21 AM
> > *To:* Butler, Peter <pbut...@sonusnet.com>
> > *Cc:* Jon Maloy <jon.ma...@ericsson.com>;
> > tipc-discussion@lists.sourceforge.net
> > *Subject:* Re: TIPC Oops in tipc_sk_recv
> >
> >
> >
> > Hi Peter,
> >
> >
> >
> > The backporting strategy varies depending on:
> >
> > 1. Supporting upgrades of RPMs, e.g. whether you can deliver a new tipc RPM
> > and update it on an existing kernel.
> >
> > 2. Delivering / Upgrading the entire kernel. No individual rpm updates
> > are delivered.
> >
> >
> >
> > If it's option 2, then you may be allowed to update the tipc ABI, i.e.
> > include the commits which touch include/uapi/linux/tipc*.
> >
> > I have to support option 1, so I cannot include any commit which
> > touches files outside net/tipc/ without manual i

Re: [tipc-discussion] [PATCH net-next v2 4/7] tipc: introduce UDP replicast

2016-08-24 Thread Jon Maloy


On 08/23/2016 10:41 AM, Richard Alpe wrote:
> This patch introduces UDP replicast. A concept where we emulate
> multicast by sending multiple unicast messages to configured peers.
>
> The purpose of replicast is mainly to be able to use TIPC in cloud
> environments where IP multicast is disabled. Using replicas to unicast
> multicast messages is costly as we have to copy each skb and send the
> copies individually.
>
> Signed-off-by: Richard Alpe 
> ---
>   include/uapi/linux/tipc_netlink.h |   1 +
>   net/tipc/bearer.c |  44 +
>   net/tipc/bearer.h |   1 +
>   net/tipc/netlink.c|   5 ++
>   net/tipc/udp_media.c  | 126 ++
>   net/tipc/udp_media.h  |  44 +
>   6 files changed, 210 insertions(+), 11 deletions(-)
>   create mode 100644 net/tipc/udp_media.h
>
> diff --git a/include/uapi/linux/tipc_netlink.h b/include/uapi/linux/tipc_netlink.h
> index bcb65ef..b15664c 100644
> --- a/include/uapi/linux/tipc_netlink.h
> +++ b/include/uapi/linux/tipc_netlink.h
> @@ -60,6 +60,7 @@ enum {
>   TIPC_NL_MON_GET,
>   TIPC_NL_MON_PEER_GET,
>   TIPC_NL_PEER_REMOVE,
> + TIPC_NL_BEARER_ADD,
>   
>   __TIPC_NL_CMD_MAX,
>   TIPC_NL_CMD_MAX = __TIPC_NL_CMD_MAX - 1
> diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
> index 6fc4e3c..b82cb00 100644
> --- a/net/tipc/bearer.c
> +++ b/net/tipc/bearer.c
> @@ -42,6 +42,7 @@
>   #include "monitor.h"
>   #include "bcast.h"
>   #include "netlink.h"
> +#include "udp_media.h"
>   
>   #define MAX_ADDR_STR 60
>   
> @@ -897,6 +898,49 @@ int tipc_nl_bearer_enable(struct sk_buff *skb, struct genl_info *info)
>   return 0;
>   }
>   
> +int tipc_nl_bearer_add(struct sk_buff *skb, struct genl_info *info)
> +{
> + int err;
> + char *name;
> + struct tipc_bearer *b;
> + struct nlattr *attrs[TIPC_NLA_BEARER_MAX + 1];
> + struct net *net = sock_net(skb->sk);
> +
> + if (!info->attrs[TIPC_NLA_BEARER])
> + return -EINVAL;
> +
> + err = nla_parse_nested(attrs, TIPC_NLA_BEARER_MAX,
> +info->attrs[TIPC_NLA_BEARER],
> +tipc_nl_bearer_policy);
> + if (err)
> + return err;
> +
> + if (!attrs[TIPC_NLA_BEARER_NAME])
> + return -EINVAL;
> + name = nla_data(attrs[TIPC_NLA_BEARER_NAME]);
> +
> + rtnl_lock();
> + b = tipc_bearer_find(net, name);
> + if (!b) {
> + rtnl_unlock();
> + return -EINVAL;
> + }
> +
> +#ifdef CONFIG_TIPC_MEDIA_UDP
> + if (attrs[TIPC_NLA_BEARER_UDP_OPTS]) {
> + err = tipc_udp_nl_bearer_add(b,
> +  attrs[TIPC_NLA_BEARER_UDP_OPTS]);
> + if (err) {
> + rtnl_unlock();
> + return err;
> + }
> + }
> +#endif
> + rtnl_unlock();
> +
> + return 0;
> +}
> +
>   int tipc_nl_bearer_set(struct sk_buff *skb, struct genl_info *info)
>   {
>   int err;
> diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
> index 83a9abb..78892e2f 100644
> --- a/net/tipc/bearer.h
> +++ b/net/tipc/bearer.h
> @@ -181,6 +181,7 @@ int tipc_nl_bearer_enable(struct sk_buff *skb, struct genl_info *info);
>   int tipc_nl_bearer_dump(struct sk_buff *skb, struct netlink_callback *cb);
>   int tipc_nl_bearer_get(struct sk_buff *skb, struct genl_info *info);
>   int tipc_nl_bearer_set(struct sk_buff *skb, struct genl_info *info);
> +int tipc_nl_bearer_add(struct sk_buff *skb, struct genl_info *info);
>   
>   int tipc_nl_media_dump(struct sk_buff *skb, struct netlink_callback *cb);
>   int tipc_nl_media_get(struct sk_buff *skb, struct genl_info *info);
> diff --git a/net/tipc/netlink.c b/net/tipc/netlink.c
> index 2718de6..3122f21 100644
> --- a/net/tipc/netlink.c
> +++ b/net/tipc/netlink.c
> @@ -161,6 +161,11 @@ static const struct genl_ops tipc_genl_v2_ops[] = {
>   .policy = tipc_nl_policy,
>   },
>   {
> + .cmd= TIPC_NL_BEARER_ADD,
> + .doit   = tipc_nl_bearer_add,
> + .policy = tipc_nl_policy,
> + },
> + {
>   .cmd= TIPC_NL_BEARER_SET,
>   .doit   = tipc_nl_bearer_set,
>   .policy = tipc_nl_policy,
> diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
> index b8ec1a1..d4517f4 100644
> --- a/net/tipc/udp_media.c
> +++ b/net/tipc/udp_media.c
> @@ -49,6 +49,7 @@
>   #include "core.h"
>   #include "bearer.h"
>   #include "netlink.h"
> +#include "msg.h"
>   
>   /* IANA assigned UDP port */
>   #define UDP_PORT_DEFAULT6118
> @@ -70,6 +71,13 @@ struct udp_media_addr {
>   };
>   };
>   
> +/* struct udp_replicast - container for UDP remote addresses */
> +struct udp_replicast {
> + struct udp_media_addr addr;
> + struct rcu_head rcu;
> + struct list_head list;
> +};
> +
>   /**
>* struct udp_bearer 
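
The replicast idea in the quoted patch (emulating multicast by sending one unicast copy of each message to every configured peer) can be sketched in plain userspace C. This is a minimal illustration under stated assumptions, not the kernel code: `struct peer_addr`, `struct replicast_peer`, `replicast_xmit()` and `counting_send()` are hypothetical names, and a `memcpy()` into a local buffer stands in for the per-peer skb copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical peer address; stands in for struct udp_media_addr. */
struct peer_addr {
	uint32_t ipv4;  /* IPv4 address, host byte order */
	uint16_t port;  /* UDP port, e.g. 6118 */
};

/* Hypothetical list entry; stands in for struct udp_replicast. */
struct replicast_peer {
	struct peer_addr addr;
	struct replicast_peer *next;
};

/* Caller-supplied unicast send routine; returns 0 on success. */
typedef int (*unicast_send_fn)(const struct peer_addr *dst,
			       const unsigned char *buf, size_t len,
			       void *ctx);

#define REPLICAST_MAX_LEN 1500  /* assumed MTU-sized buffer */

/* Emulate one multicast transmission by sending a private copy of
 * msg to every configured peer. Returns the number of successful
 * unicast sends. */
static int replicast_xmit(const struct replicast_peer *peers,
			  const unsigned char *msg, size_t len,
			  unicast_send_fn send, void *ctx)
{
	unsigned char copy[REPLICAST_MAX_LEN];
	int sent = 0;

	assert(len <= sizeof(copy));
	for (const struct replicast_peer *p = peers; p != NULL; p = p->next) {
		/* One copy per peer: the cost the patch description mentions */
		memcpy(copy, msg, len);
		if (send(&p->addr, copy, len, ctx) == 0)
			sent++;
	}
	return sent;
}

/* Example send routine that only accumulates the bytes "sent". */
static int counting_send(const struct peer_addr *dst,
			 const unsigned char *buf, size_t len, void *ctx)
{
	size_t *total = ctx;

	(void)dst;
	(void)buf;
	*total += len;
	return 0;
}
```

With three configured peers, one call to `replicast_xmit()` performs three copies and three sends, so the transmit cost grows linearly with the number of peers, which is why the patch reserves replicast for environments where IP multicast is unavailable.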

Re: [tipc-discussion] BC rcv link acked stuck after receiving a named with a BC ACK of 0

2016-09-06 Thread Jon Maloy
Hi John,
See below.

///jon


From: John THompson [mailto:thompa@gmail.com]
Sent: Tuesday, 06 September, 2016 00:14
To: Jon Maloy <ma...@donjonn.com>
Cc: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net
Subject: Re: [tipc-discussion] BC rcv link acked stuck after receiving a named 
with a BC ACK of 0

Hi Jon,

The packet I see the error happening on is when receiving a usr 11
(NAME_DISTRIBUTOR) over the unicast link.
The reception of this packet is happening interleaved with processing
a packet (or packets) on the BC link that has brought the peer up.
The BC link packet processing has the tipc_bcast_lock and the unicast
pkt processing cannot get the bcast lock for a while.
When it can get the lock it processes the BC ack == 0 from the NAME_DISTRIBUTOR
packet and sets the acked field on the BC link to 0.

The debug / call trace below is me trying to show from the debug I captured 
what happens.
If I add debug for each pkt the problem doesn't reproduce.


tipc_rcv 1.1.5:vcs_mgmt-1.1.18:vcs_mgmt bc ack rcv 0 uc seq 3 ack 0 user 11 type 0
  + calls tipc_bcast_ack_rcv
tipc_rcv
  + tipc_bcast_ack_rcv
    + tipc_link_bc_ack_rcv broadcast-link-5-18 bc ack 53574 - can't ack as link not up 1 or peer not up 1

What kind of packet was this?

tipc_rcv
  + tipc_bcast_ack_rcv
    + tipc_link_bc_ack_rcv broadcast-link-5-18 bc ack 53574 - can't ack as link not up 0 or peer not up 1

And this?

===
Somewhere at this point bc_peer_is_up gets set
===
tipc_rcv
  + tipc_bcast_ack_rcv
    + tipc_link_bc_ack_rcv broadcast-link-5-18 bc ack - acked (53574) less than it was previously (53574)
tipc_rcv
  + tipc_bcast_ack_rcv
    + tipc_link_bc_ack_rcv broadcast-link-5-18 bc ack - acked (53574) less than it was previously (53574)

  + from tipc_rcv on unicast link
    + tipc_bcast_ack_rcv Going to set BC ACK outside window, new 0 old 53574 win 200
  - dump_stack
CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: P   O4.4.6-at1 #3
Call Trace:
[a3093a80] [806943b0] dump_stack+0x84/0xb0 (unreliable)
[a3093a90] [c1507314] tipc_link_bc_ack_rcv+0x244/0x250 [tipc]
[a3093ab0] [c1501b04] tipc_bcast_ack_rcv+0x74/0xd0 [tipc]
[a3093ae0] [c1511a08] tipc_rcv+0x468/0xa30 [tipc]
[a3093b80] [c150218c] tipc_bcast_stop+0xfc/0x7b0 [tipc]
[a3093b90] [8050d6a8] __netif_receive_skb_core+0x468/0xa10
[a3093c30] [80510b6c] netif_receive_skb_internal+0x3c/0xe0
[a3093c60] [8064b2b8] br_handle_frame_finish+0x1d8/0x4d0
[a3093cd0] [8064b7a0] br_handle_frame+0x1f0/0x330
[a3093d20] [8050d738] __netif_receive_skb_core+0x4f8/0xa10
[a3093dc0] [805119f0] process_backlog+0x90/0x140
[a3093df0] [8051103c] net_rx_action+0x15c/0x320
[a3093e50] [8002594c] __do_softirq+0x13c/0x250
[a3093eb0] [80025ab0] run_ksoftirqd+0x50/0x80
[a3093ec0] [800434c4] smpboot_thread_fn+0x1e4/0x1f0
[a3093ef0] [8003fb38] kthread+0xc8/0xe0
[a3093f40] [8000eed8] ret_from_kernel_thread+0x5c/0x64

I am going to send in a patch that adds checking for a valid BC ack (being 
within the window size) to
tipc_link_bc_ack_rcv.

Not sure that is a good idea. Even if #0 happens to be within a valid range it 
is still invalid, and may lead to an inadvertent release of packets which are 
not ready to be released yet.  I’ll try to take a closer look at this today.

///jon
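
To see why a plain window check is a weak guard, it helps to look at how 16-bit sequence numbers compare modulo 2^16. The sketch below is a userspace model, not TIPC code: `seq_less_eq()`, `seq_less()` and `seq_more()` imitate the kernel's less_eq()/less()/more() helpers, and `bc_ack_in_window()` is a hypothetical version of the window check proposed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Modular 16-bit sequence comparisons, modeled on TIPC's
 * less_eq()/less()/more(): a precedes b when b is less than half
 * the number space (32768) ahead of a. */
static bool seq_less_eq(uint16_t a, uint16_t b)
{
	return (uint16_t)(b - a) < 32768u;
}

static bool seq_less(uint16_t a, uint16_t b)
{
	return seq_less_eq(a, b) && a != b;
}

static bool seq_more(uint16_t a, uint16_t b)
{
	return !seq_less_eq(a, b);
}

/* Hypothetical window check: accept new_ack only if it is not
 * behind old_ack and at most win packets ahead of it. */
static bool bc_ack_in_window(uint16_t old_ack, uint16_t new_ack,
			     uint16_t win)
{
	uint16_t limit = (uint16_t)(old_ack + win);

	assert(win < 32768u);  /* a window must cover < half the space */
	return !seq_less(new_ack, old_ack) && !seq_more(new_ack, limit);
}
```

With old_ack = 53574 (the value in the trace above), an ack of 0 lies 11962 packets ahead and is rejected by this check. With old_ack = 65500, however, the same 0 is only 36 packets ahead and passes. A window test only bounds the value; it cannot tell whether that 0 was actually derived from the peer's broadcast state.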

Cheers,
JT

On Wed, Aug 31, 2016 at 10:57 PM, John THompson <thompa@gmail.com> wrote:
Hi Jon,

I have verified that the patch is included in my build.
2d18ac4ba7454a426047 (“tipc: extend broadcast link initialization criteria”)

I am trying to verify which packets are received when the problem occurs but I 
am having trouble getting the information out of my system at the moment.

I will keep trying.
Thanks,
JT


On Tue, Aug 30, 2016 at 6:20 PM, Jon Maloy <ma...@donjonn.com> wrote:


On 08/29/2016 06:48 PM, Jon Maloy wrote:
Hi John,
Sorry for my late answer; I was on vacation for a few days.
It seems I gave you the wrong commit reference in my previous mail. The one I 
really meant was
2d18ac4ba7454a426047 (“tipc: extend broadcast link initialization criteria”)

This one explains why the first packets sometimes get an invalid ack number, 
but also remedies it, and I simply cannot see how an invalid ack #0 can ever be 
accepted when this patch is applied.
I see no reason why this patch shouldn’t also be present in your code, but just 
to make sure, can you confirm this?

I am right now wondering if a retransmission is the problem:
1: we receive pkt #2 which contains ack #1, so we set bc_peer_is_up to true.
Since only LINK_PROTO/STATE messages can cause bc_peer_is_up to go true, the 
likely sequence is rather
1: We receive a STATE message with unicast ack #1. This message should also 
contain a valid, with high probability non-zero, bc_ack. bc_peer_is_up is set 
to true.
2: We receive unicast pkt#1 (BCAST init or NAMED) which contains the invalid 
unicast ack #0. This one is now accepted.

I believe thi

[tipc-discussion] [PATCH net-next 3/3] tipc: send broadcast nack directly upon sequence gap detection

2016-09-01 Thread Jon Maloy
Because of the risk of an excessive number of NACK messages and
retransmissions, receivers have until now abstained from sending
broadcast NACKs directly upon detection of a packet sequence number
gap. We have instead relied on such gaps being detected by link
protocol STATE message exchange, something that by necessity delays
such detection and subsequent retransmissions.

With the introduction of unicast NACK transmission and rate control
of retransmissions we can now remove this limitation. We now allow
receiving nodes to send NACKs immediately, while coordinating the
permission to do so among the nodes in order to avoid NACK storms.

Reviewed-by: Ying Xue <ying@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/link.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 58bb44d..b36e16c 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -242,8 +242,8 @@ static void tipc_link_build_proto_msg(struct tipc_link *l, int mtyp, bool probe,
  u16 rcvgap, int tolerance, int priority,
  struct sk_buff_head *xmitq);
 static void link_print(struct tipc_link *l, const char *str);
-static void tipc_link_build_nack_msg(struct tipc_link *l,
-struct sk_buff_head *xmitq);
+static int tipc_link_build_nack_msg(struct tipc_link *l,
+   struct sk_buff_head *xmitq);
 static void tipc_link_build_bc_init_msg(struct tipc_link *l,
struct sk_buff_head *xmitq);
 static bool tipc_link_release_pkts(struct tipc_link *l, u16 to);
@@ -1184,17 +1184,26 @@ void tipc_link_build_reset_msg(struct tipc_link *l, struct sk_buff_head *xmitq)
 }
 
 /* tipc_link_build_nack_msg: prepare link nack message for transmission
+ * Note that sending of broadcast NACK is coordinated among nodes, to
+ * reduce the risk of NACK storms towards the sender
  */
-static void tipc_link_build_nack_msg(struct tipc_link *l,
-struct sk_buff_head *xmitq)
+static int tipc_link_build_nack_msg(struct tipc_link *l,
+   struct sk_buff_head *xmitq)
 {
u32 def_cnt = ++l->stats.deferred_recv;
+   int match1, match2;
 
-   if (link_is_bc_rcvlink(l))
-   return;
+   if (link_is_bc_rcvlink(l)) {
+   match1 = def_cnt & 0xf;
+   match2 = tipc_own_addr(l->net) & 0xf;
+   if (match1 == match2)
+   return TIPC_LINK_SND_STATE;
+   return 0;
+   }
 
if ((skb_queue_len(&l->deferdq) == 1) || !(def_cnt % TIPC_NACK_INTV))
tipc_link_build_proto_msg(l, STATE_MSG, 0, 0, 0, 0, xmitq);
+   return 0;
 }
 
 /* tipc_link_rcv - process TIPC packets/messages arriving from off-node
@@ -1245,7 +1254,7 @@ int tipc_link_rcv(struct tipc_link *l, struct sk_buff *skb,
/* Defer delivery if sequence gap */
if (unlikely(seqno != rcv_nxt)) {
__tipc_skb_queue_sorted(defq, seqno, skb);
-   tipc_link_build_nack_msg(l, xmitq);
+   rc |= tipc_link_build_nack_msg(l, xmitq);
break;
}
 
-- 
2.7.4
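
The coordination rule that this patch adds to tipc_link_build_nack_msg() for broadcast receive links can be restated as a small userspace function. In this sketch `bc_nack_allowed()` and `hyp_tipc_addr()` are hypothetical names; the 8/12/12-bit zone.cluster.node address layout is assumed only to build example addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a <zone.cluster.node> address: 8/12/12 bits (assumed layout). */
static uint32_t hyp_tipc_addr(uint32_t zone, uint32_t cluster, uint32_t node)
{
	assert(zone < 256 && cluster < 4096 && node < 4096);
	return (zone << 24) | (cluster << 12) | node;
}

/* Broadcast receive link: after incrementing its deferred-packet
 * counter, a node may request a STATE message carrying a NACK only
 * when the counter's low 4 bits equal its own address's low 4 bits. */
static bool bc_nack_allowed(uint32_t def_cnt, uint32_t own_addr)
{
	return (def_cnt & 0xf) == (own_addr & 0xf);
}
```

In this model, node 1.1.5 is eligible at deferred counts 5, 21, 37 and so on, i.e. once every 16 deferred packets. Receivers whose addresses differ in the low four bits become eligible at different counter values, which is how the patch aims to spread NACKs out instead of having every receiver react to the same gap at once.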




[tipc-discussion] [PATCH net-next 1/3] tipc: transfer broadcast nacks in link state messages

2016-09-01 Thread Jon Maloy
When we send broadcasts in clusters of more than 70-80 nodes, we sometimes
see the broadcast link resetting because of an excessive number of
retransmissions. This is caused by a combination of two factors:

1) A 'NACK crunch', where loss of broadcast packets is discovered
   and NACK'ed by several nodes simultaneously, leading to multiple
   redundant broadcast retransmissions.

2) The fact that the NACKs as such are also sent as broadcast, leading
   to excessive load and packet loss on the transmitting switch/bridge.

This commit deals with the latter problem, by moving sending of
broadcast nacks from the dedicated BCAST_PROTOCOL/NACK message type
to regular unicast LINK_PROTOCOL/STATE messages. We allocate 10 unused
bits in word 8 of the said message for this purpose, and introduce a
new capability bit, TIPC_BCAST_STATE_NACK in order to keep the change
backwards compatible.

Reviewed-by: Ying Xue <ying@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/bcast.c |  8 ---
 net/tipc/bcast.h |  4 ++--
 net/tipc/link.c  | 64 
 net/tipc/link.h  |  6 +++---
 net/tipc/msg.h   | 10 +
 net/tipc/node.c  | 32 ++--
 net/tipc/node.h  | 11 ++
 7 files changed, 108 insertions(+), 27 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index ae469b3..753f774 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -269,18 +269,19 @@ void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, u32 acked)
  *
  * RCU is locked, no other locks set
  */
-void tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
-struct tipc_msg *hdr)
+int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
+   struct tipc_msg *hdr)
 {
struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq;
struct sk_buff_head xmitq;
+   int rc = 0;
 
__skb_queue_head_init(&xmitq);
 
tipc_bcast_lock(net);
if (msg_type(hdr) == STATE_MSG) {
tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), &xmitq);
-   tipc_link_bc_sync_rcv(l, hdr, &xmitq);
+   rc = tipc_link_bc_sync_rcv(l, hdr, &xmitq);
} else {
tipc_link_bc_init_rcv(l, hdr);
}
@@ -291,6 +292,7 @@ void tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
/* Any socket wakeup messages ? */
if (!skb_queue_empty(inputq))
tipc_sk_rcv(net, inputq);
+   return rc;
 }
 
 /* tipc_bcast_add_peer - add a peer node to broadcast link and bearer
diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
index d5e79b3..5ffe344 100644
--- a/net/tipc/bcast.h
+++ b/net/tipc/bcast.h
@@ -56,8 +56,8 @@ int  tipc_bcast_get_mtu(struct net *net);
 int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list);
 int tipc_bcast_rcv(struct net *net, struct tipc_link *l, struct sk_buff *skb);
 void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, u32 acked);
-void tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
-struct tipc_msg *hdr);
+int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
+   struct tipc_msg *hdr);
 int tipc_nl_add_bc_link(struct net *net, struct tipc_nl_msg *msg);
 int tipc_nl_bc_link_set(struct net *net, struct nlattr *attrs[]);
 int tipc_bclink_reset_stats(struct net *net);
diff --git a/net/tipc/link.c b/net/tipc/link.c
index 2c6e1b9..136316f 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -367,6 +367,18 @@ int tipc_link_bc_peers(struct tipc_link *l)
return l->ackers;
 }
 
+u16 link_bc_rcv_gap(struct tipc_link *l)
+{
+   struct sk_buff *skb = skb_peek(&l->deferdq);
+   u16 gap = 0;
+
+   if (more(l->snd_nxt, l->rcv_nxt))
+   gap = l->snd_nxt - l->rcv_nxt;
+   if (skb)
+   gap = buf_seqno(skb) - l->rcv_nxt;
+   return gap;
+}
+
 void tipc_link_set_mtu(struct tipc_link *l, int mtu)
 {
l->mtu = mtu;
@@ -1135,7 +1147,10 @@ int tipc_link_build_state_msg(struct tipc_link *l, struct sk_buff_head *xmitq)
if (((l->rcv_nxt ^ tipc_own_addr(l->net)) & 0xf) != 0xf)
return 0;
l->rcv_unacked = 0;
-   return TIPC_LINK_SND_BC_ACK;
+
+   /* Use snd_nxt to store peer's snd_nxt in broadcast rcv link */
+   l->snd_nxt = l->rcv_nxt;
+   return TIPC_LINK_SND_STATE;
}
 
/* Unicast ACK */
@@ -1236,7 +1251,7 @@ int tipc_link_rcv(struct tipc_link *l, struct sk_buff *skb,
rc |= tipc_link_input(l, skb, l->inputq);
if (unlikely(++l->rcv_unacked >= TIPC_MIN_LINK_WIN))
rc |= tipc_link_build_state_msg(l, xmitq);
-   if (unlikely(rc & ~TIPC_LINK_SND_BC_ACK))
+   if (unlikely(rc & ~TIPC_LINK_SND_STATE))
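
The commit message above describes moving the broadcast NACK into ten spare bits of word 8 of the unicast STATE message, gated by a new capability bit. The sketch below shows that kind of field packing in userspace C. The bit position (bits 0-9), the capability value, and every name (`struct hyp_msg`, `msg_get_bits()`, `msg_put_bits()`, `state_msg_add_bc_nack()`) are assumptions for illustration and are not taken from the patch; real TIPC headers are also kept in network byte order, which this model ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HYP_HDR_WORDS   11
#define HYP_GAP_WORD    8       /* word carrying the gap (per the text) */
#define HYP_GAP_POS     0       /* assumed bit offset */
#define HYP_GAP_MASK    0x3ffu  /* 10 bits: gap of 0..1023 packets */
#define HYP_CAP_BCAST_STATE_NACK (1u << 2)  /* hypothetical value */

/* Header words in host byte order (simplification). */
struct hyp_msg {
	uint32_t hdr[HYP_HDR_WORDS];
};

static uint32_t msg_get_bits(const struct hyp_msg *m, int w, int pos,
			     uint32_t mask)
{
	assert(w >= 0 && w < HYP_HDR_WORDS);
	return (m->hdr[w] >> pos) & mask;
}

static void msg_put_bits(struct hyp_msg *m, int w, int pos, uint32_t mask,
			 uint32_t val)
{
	assert(w >= 0 && w < HYP_HDR_WORDS);
	m->hdr[w] &= ~(mask << pos);       /* clear only this field */
	m->hdr[w] |= (val & mask) << pos;  /* other bits stay untouched */
}

/* Embed a broadcast NACK gap in a unicast STATE message if the peer
 * advertises the capability. Returns false when the caller must fall
 * back to the old BCAST_PROTOCOL/NACK message instead. */
static bool state_msg_add_bc_nack(struct hyp_msg *m, uint32_t peer_caps,
				  uint32_t gap)
{
	if (!(peer_caps & HYP_CAP_BCAST_STATE_NACK))
		return false;
	if (gap > HYP_GAP_MASK)
		gap = HYP_GAP_MASK;  /* sketch's choice: NACK first 1023 only */
	msg_put_bits(m, HYP_GAP_WORD, HYP_GAP_POS, HYP_GAP_MASK, gap);
	return true;
}
```

Because only the masked bits are rewritten, older fields sharing word 8 keep their values, and a peer without the capability never sees the new encoding. That is the backward-compatibility property the capability bit is meant to provide.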

[tipc-discussion] [PATCH net-next 2/3] tipc: rate limit broadcast retransmissions

2016-09-01 Thread Jon Maloy
As cluster sizes grow, so does the amount of identical or overlapping
broadcast NACKs generated by the packet receivers. This often leads to
'NACK crunches' resulting in huge numbers of redundant retransmissions
of the same packet ranges.

In this commit, we introduce rate control of broadcast retransmissions,
so that a retransmitted range cannot be retransmitted again until after
at least 10 ms. This reduces the frequency of duplicate, redundant
retransmissions by an order of magnitude, while having a significant
positive impact on overall throughput and scalability.

Reviewed-by: Ying Xue <ying@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/link.c | 52 +++-
 1 file changed, 47 insertions(+), 5 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 136316f..58bb44d 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -181,7 +181,10 @@ struct tipc_link {
u16 acked;
struct tipc_link *bc_rcvlink;
struct tipc_link *bc_sndlink;
-   int nack_state;
+   unsigned long prev_retr;
+   u16 prev_from;
+   u16 prev_to;
+   u8 nack_state;
bool bc_peer_is_up;
 
/* Statistics */
@@ -202,6 +205,8 @@ enum {
BC_NACK_SND_SUPPRESS,
 };
 
+#define TIPC_BC_RETR_LIMIT 10   /* [ms] */
+
 /*
  * Interval between NACKs when packets arrive out of order
  */
@@ -1590,11 +1595,48 @@ void tipc_link_bc_init_rcv(struct tipc_link *l, struct tipc_msg *hdr)
l->rcv_nxt = peers_snd_nxt;
 }
 
+/* link_bc_retr_eval() - check if the indicated range can be retransmitted now
+ * - Adjust permitted range if there is overlap with previous retransmission
+ */
+static bool link_bc_retr_eval(struct tipc_link *l, u16 *from, u16 *to)
+{
+   unsigned long elapsed = jiffies_to_msecs(jiffies - l->prev_retr);
+
+   if (less(*to, *from))
+   return false;
+
+   /* New retransmission request */
+   if ((elapsed > TIPC_BC_RETR_LIMIT) ||
+   less(*to, l->prev_from) || more(*from, l->prev_to)) {
+   l->prev_from = *from;
+   l->prev_to = *to;
+   l->prev_retr = jiffies;
+   return true;
+   }
+
+   /* Inside range of previous retransmit */
+   if (!less(*from, l->prev_from) && !more(*to, l->prev_to))
+   return false;
+
+   /* Fully or partially outside previous range => exclude overlap */
+   if (less(*from, l->prev_from)) {
+   *to = l->prev_from - 1;
+   l->prev_from = *from;
+   }
+   if (more(*to, l->prev_to)) {
+   *from = l->prev_to + 1;
+   l->prev_to = *to;
+   }
+   l->prev_retr = jiffies;
+   return true;
+}
+
 /* tipc_link_bc_sync_rcv - update rcv link according to peer's send state
  */
 int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
  struct sk_buff_head *xmitq)
 {
+   struct tipc_link *snd_l = l->bc_sndlink;
u16 peers_snd_nxt = msg_bc_snd_nxt(hdr);
u16 from = msg_bcast_ack(hdr) + 1;
u16 to = from + msg_bc_gap(hdr) - 1;
@@ -1613,14 +1655,14 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
if (!l->bc_peer_is_up)
return rc;
 
+   l->stats.recv_nacks++;
+
/* Ignore if peers_snd_nxt goes beyond receive window */
if (more(peers_snd_nxt, l->rcv_nxt + l->window))
return rc;
 
-   if (!less(to, from)) {
-   rc = tipc_link_retrans(l->bc_sndlink, from, to, xmitq);
-   l->stats.recv_nacks++;
-   }
+   if (link_bc_retr_eval(snd_l, &from, &to))
+   rc = tipc_link_retrans(snd_l, from, to, xmitq);
 
l->snd_nxt = peers_snd_nxt;
if (link_bc_rcv_gap(l))
-- 
2.7.4
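
The range bookkeeping in link_bc_retr_eval() can be exercised outside the kernel by replacing jiffies with an explicit millisecond clock. The port below is a sketch that follows the logic of the patch; `struct retr_state`, `bc_retr_eval()` and the `seq_*()` helpers (modeled on TIPC's less()/more()) are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BC_RETR_LIMIT_MS 10  /* mirrors TIPC_BC_RETR_LIMIT */

/* Modular 16-bit sequence comparisons (modeled on TIPC's helpers). */
static bool seq_less_eq(uint16_t a, uint16_t b)
{
	return (uint16_t)(b - a) < 32768u;
}

static bool seq_less(uint16_t a, uint16_t b)
{
	return seq_less_eq(a, b) && a != b;
}

static bool seq_more(uint16_t a, uint16_t b)
{
	return !seq_less_eq(a, b);
}

/* Per-sender record of the last retransmitted range. */
struct retr_state {
	unsigned long prev_ms;  /* time of last retransmission */
	uint16_t prev_from;
	uint16_t prev_to;
};

/* Decide whether [*from, *to] may be retransmitted at now_ms,
 * trimming it if it overlaps a range sent within the limit. */
static bool bc_retr_eval(struct retr_state *s, unsigned long now_ms,
			 uint16_t *from, uint16_t *to)
{
	unsigned long elapsed = now_ms - s->prev_ms;

	assert(from != NULL && to != NULL);
	if (seq_less(*to, *from))
		return false;  /* empty range */

	/* Limit expired, or no overlap with the previous range */
	if (elapsed > BC_RETR_LIMIT_MS ||
	    seq_less(*to, s->prev_from) || seq_more(*from, s->prev_to)) {
		s->prev_from = *from;
		s->prev_to = *to;
		s->prev_ms = now_ms;
		return true;
	}

	/* Entirely inside the previous range: suppress */
	if (!seq_less(*from, s->prev_from) && !seq_more(*to, s->prev_to))
		return false;

	/* Partial overlap: keep only the part outside the previous range */
	if (seq_less(*from, s->prev_from)) {
		*to = (uint16_t)(s->prev_from - 1);
		s->prev_from = *from;
	}
	if (seq_more(*to, s->prev_to)) {
		*from = (uint16_t)(s->prev_to + 1);
		s->prev_to = *to;
	}
	s->prev_ms = now_ms;
	return true;
}
```

With the 10 ms limit, a request that falls entirely inside a range retransmitted less than 10 ms earlier is suppressed, a partially overlapping request is trimmed to its non-overlapping part, and any request is honored in full once the limit has expired. As written, a request that extends past both ends of the previous range is trimmed to its lower non-overlapping part only.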




[tipc-discussion] [PATCH net-next 0/3] tipc: improve broadcast NACK mechanism

2016-09-01 Thread Jon Maloy
The broadcast protocol has turned out to not scale well beyond 70-80
nodes, while it is now possible to build TIPC clusters of at least ten
times that size. This commit series improves the NACK/retransmission
mechanism of the broadcast protocol to make it as scalable as the rest
of TIPC.

Jon Maloy (3):
  tipc: transfer broadcast nacks in link state messages
  tipc: rate limit broadcast retransmissions
  tipc: send broadcast nack directly upon sequence gap detection

 net/tipc/bcast.c |   8 ++--
 net/tipc/bcast.h |   4 +-
 net/tipc/link.c  | 131 ++-
 net/tipc/link.h  |   6 +--
 net/tipc/msg.h   |  10 +
 net/tipc/node.c  |  32 +-
 net/tipc/node.h  |  11 +++--
 7 files changed, 167 insertions(+), 35 deletions(-)

-- 
2.7.4




Re: [tipc-discussion] BC rcv link acked stuck after receiving a named with a BC ACK of 0

2016-09-07 Thread Jon Maloy


From: John THompson [mailto:thompa@gmail.com]
Sent: Tuesday, 06 September, 2016 20:48
To: Jon Maloy <jon.ma...@ericsson.com>
Cc: Jon Maloy <ma...@donjonn.com>; tipc-discussion@lists.sourceforge.net
Subject: Re: [tipc-discussion] BC rcv link acked stuck after receiving a named 
with a BC ACK of 0

Hi Jon,

My comments are below.

JT

On Wed, Sep 7, 2016 at 12:48 AM, Jon Maloy <jon.ma...@ericsson.com> wrote:

[…]

Not sure that is a good idea. Even if #0 happens to be within a valid range it 
is still invalid, and may lead to an inadvertent release of packets which are 
not ready to be released yet.  I’ll try to take a closer look at this today.
I thought that #0 was a valid value when the seqno wraps around?  There doesn't 
appear to be any special handling for the wrap around case.  I agree with your 
statement that it might lead to inadvertent release of packets that aren't 
ready to be released.  Therefore I won't send it in.

There is nothing wrong with the value 0 as such; it may be as valid as any 
other value. But any value that is not directly or indirectly based on the one 
received in the broadcast init message is by definition invalid, and 
may wreak havoc in the send queue.

///jon


JT




///jon

Cheers,
JT

On Wed, Aug 31, 2016 at 10:57 PM, John THompson <thompa@gmail.com> wrote:
Hi Jon,

I have verified that the patch is included in my build.
2d18ac4ba7454a426047 (“tipc: extend broadcast link initialization criteria”)

I am trying to verify which packets are received when the problem occurs but I 
am having trouble getting the information out of my system at the moment.

I will keep trying.
Thanks,
JT


On Tue, Aug 30, 2016 at 6:20 PM, Jon Maloy <ma...@donjonn.com> wrote:


On 08/29/2016 06:48 PM, Jon Maloy wrote:
Hi John,
Sorry for my late answer; I was on vacation for a few days.
It seems I gave you the wrong commit reference in my previous mail. The one I 
really meant was
2d18ac4ba7454a426047 (“tipc: extend broadcast link initialization criteria”)

This one explains why the first packets sometimes get an invalid ack number, 
but also remedies it, and I simply cannot see how an invalid ack #0 can ever be 
accepted when this patch is applied.
I see no reason why this patch shouldn’t also be present in your code, but just 
to make sure, can you confirm this?

I am right now wondering if a retransmission is the problem:
1: we receive pkt #2 which contains ack #1, so we set bc_peer_is_up to true.
Since only LINK_PROTO/STATE messages can cause bc_peer_is_up to go true, the 
likely sequence is rather
1: We receive a STATE message with unicast ack #1. This message should also 
contain a valid, with high probability non-zero, bc_ack. bc_peer_is_up is set 
to true.
2: We receive unicast pkt#1 (BCAST init or NAMED) which contains the invalid 
unicast ack #0. This one is now accepted.

I believe this may happen, because STATE messages, contrary to data packets, 
are sent as TC_PRIO_CONTROL, and may sometimes bypass data messages, but I 
cannot see it happening as often and consistently as you seem to be observing 
it. Another possibility is that bc_ack in the received STATE message also is an 
invalid zero, although I cannot see how this can happen either.

Regards
///jon
2: we receive pkt #1 retransmitted with ack #0. This now gets accepted, and we 
are in trouble.

I’ll try to figure out a solution to this, but it may be possible for you to 
verify this first.

BR
///jon



From: John THompson [mailto:thompa@gmail.com]
Sent: Wednesday, 24 August, 2016 16:22
To: Jon Maloy <jon.ma...@ericsson.com>
Cc: tipc-discussion@lists.sourceforge.net
Subject: Re: [tipc-discussion] BC rcv link acked stuck after receiving a named 
with a BC ACK of 0

Hi Jon,

To clarify my previous email regarding the behaviour observed,

What happens over time:
+ remove bc peer
...
some time until peer rejoins
...
+ add bc peer
+ tipc_link_bc_ack_rcv
   link is up = false, node is up = false
   (this gets called a number of times until both the link and node are up)

+ tipc_link_bc_ack_rcv
   l->acked set to valid ack
...
+ tipc_rcv - usr 5 or 11, bc_ack = 0
   + tipc_bcast_ack_rcv
 + tipc_link_bc_ack_rcv
   sets l->acked to 0

Regards,
JT
On Thu, Aug 25, 2016 at 8:06 AM, John THompson 
<thompa@gmail.com> wrote:
Hi Jon,

It is a similar problem in terms of what happens to the bc link.  I do have 
that patch applied.

I have added debug through the remove bc peer and various other functions and 
the setting of the acked field to 0 is occurring when processing a packet from 
named (msg user 11) or BCAST pr

Re: [tipc-discussion] [PATCH net-next 2/3] tipc: rate limit broadcast retransmissions

2016-08-29 Thread Jon Maloy


On 08/25/2016 05:41 AM, Xue, Ying wrote:
>
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Wednesday, August 17, 2016 2:09 AM
> To: tipc-discussion@lists.sourceforge.net; 
> parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
> jon.ma...@ericsson.com
> Cc: ma...@donjonn.com; gbala...@gmail.com
> Subject: [PATCH net-next 2/3] tipc: rate limit broadcast retransmissions
>
> As cluster sizes grow, so does the amount of identical or overlapping 
> broadcast NACKs generated by the packet receivers. This often leads to 'NACK 
> crunches' resulting in huge numbers of redundant retransmissions of the same 
> packet ranges.
>
> In this commit, we introduce rate control of broadcast retransmissions, so 
> that a retransmitted range cannot be retransmitted again until after at least 
> 10 ms. This reduces the frequency of duplicate retransmissions by an order of 
> magnitude, while having a significant positive impact on throughput and 
> scalability.
>
> Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
> ---
>   net/tipc/link.c | 52 +++-
>   1 file changed, 47 insertions(+), 5 deletions(-)
>
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index 136316f..58bb44d 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -181,7 +181,10 @@ struct tipc_link {
>   u16 acked;
>   struct tipc_link *bc_rcvlink;
>   struct tipc_link *bc_sndlink;
> - int nack_state;
> + unsigned long prev_retr;
> + u16 prev_from;
> + u16 prev_to;
> + u8 nack_state;
>   bool bc_peer_is_up;
>   
>   /* Statistics */
> @@ -202,6 +205,8 @@ enum {
>   BC_NACK_SND_SUPPRESS,
>   };
>   
> +#define TIPC_BC_RETR_LIMIT 10   /* [ms] */
>
> [Ying] I suggest we define the limit as a number of jiffies rather than as 10 ms. Then we 
> don't need to convert the elapsed time (jiffies - l->prev_retr) from jiffies to milliseconds.

A jiffy does not have a uniquely defined duration, as it depends on the 
kernel's configured tick rate (HZ).
I measured 10 ms to be a value that works better than e.g. 2 ms or 50 
ms, so I don't think it is a good idea to add any more randomness to this.
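As a rough illustration of the units question: the same 10 ms limit maps to a different number of jiffies depending on the kernel's tick rate (CONFIG_HZ). The helper below is hypothetical and only mimics the round-up behaviour of the kernel's msecs_to_jiffies(); it is a user-space sketch, not kernel code.

```c
#include <stdint.h>

/* Hypothetical helper: convert milliseconds to ticks at tick rate 'hz',
 * rounding up so that a non-zero interval never becomes zero ticks. */
static uint64_t ms_to_ticks(uint64_t ms, uint64_t hz)
{
	return (ms * hz + 999) / 1000;
}
```

For 10 ms this gives 1 tick at HZ=100, 3 ticks at HZ=250 (really 12 ms) and 10 ticks at HZ=1000, so a limit written directly in jiffies would change meaning with the kernel configuration; measuring the elapsed time with jiffies_to_msecs(), as the patch does, avoids that.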

>
> +
>   /*
>* Interval between NACKs when packets arrive out of order
>*/
> @@ -1590,11 +1595,48 @@ void tipc_link_bc_init_rcv(struct tipc_link *l, struct tipc_msg *hdr)
>   l->rcv_nxt = peers_snd_nxt;
>   }
>   
> +/* link_bc_retr_eval() - check if the indicated range can be retransmitted now
> + * - Adjust permitted range if there is overlap with previous retransmission
> + */
> +static bool link_bc_retr_eval(struct tipc_link *l, u16 *from, u16 *to)
> +{
> + unsigned long elapsed = jiffies_to_msecs(jiffies - l->prev_retr);
> +
> + if (less(*to, *from))
> + return false;
> +
> + /* New retransmission request */
> + if ((elapsed > TIPC_BC_RETR_LIMIT) ||
> + less(*to, l->prev_from) || more(*from, l->prev_to)) {
> + l->prev_from = *from;
> + l->prev_to = *to;
> + l->prev_retr = jiffies;
> + return true;
> + }
> +
>
> [Ying] In my understanding, the statement above should be changed as 
> below:
>
> if ((elapsed > TIPC_BC_RETR_LIMIT) && (less(*to, l->prev_from) || more(*from, l->prev_to)))
>
> Otherwise, I think the rate of retransmission may still be very high, 
> especially when packets frequently arrive out of order.
> As it is quite normal that the broadcast retransmission ranges requested by 
> different nodes are not the same, I suspect the rate is not well suppressed by 
> the above condition.
>
> Maybe we can use the method that is already effective in tipc_sk_enqueue() to 
> limit the rate, like:

The condition indicates that the two retransmission requests are 
completely disjoint, e.g., 10-12 and 13-15. I don't see any reason why 
we should wait with retransmitting 13-15 in this case, as it is obvious 
that somebody is missing it, and we haven't retransmitted it before.
But I do understand it can be problematic if we receive repeated 
disjoint nacks, e.g., 10-12, 13-15, 10-12, 13-15 etc. I will make a test 
and see if there is anything to gain from keeping a "history" of the 
retransmissions, so that we never repeat the same packets inside the 
same interval. I'll take a look.

///jon
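The difference between the condition in the patch and the variant suggested in review can be tried out in isolation. The sketch below is a simplified user-space model, not the kernel function: time is in plain milliseconds instead of jiffies, struct retr_hist and both retr_ok_*() functions are hypothetical, and ranges that overlap the previous one within the limit are simply refused, whereas the patch goes on to adjust them (that part is cut off in this archive). The 16-bit comparisons are modeled on TIPC's less()/more().

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t u16;

/* 16-bit serial-number comparisons, modeled on TIPC's less()/more(). */
static bool seq_less_eq(u16 left, u16 right)
{
	return (u16)(right - left) < 32768u;
}

static bool seq_less(u16 left, u16 right)
{
	return seq_less_eq(left, right) && left != right;
}

static bool seq_more(u16 left, u16 right)
{
	return !seq_less_eq(left, right);
}

#define RETR_LIMIT_MS 10 /* same value as TIPC_BC_RETR_LIMIT in the patch */

/* Hypothetical stand-in for the prev_retr/prev_from/prev_to link fields. */
struct retr_hist {
	unsigned long prev_ms;
	u16 prev_from;
	u16 prev_to;
};

static void retr_record(struct retr_hist *h, unsigned long now_ms,
			u16 from, u16 to)
{
	h->prev_ms = now_ms;
	h->prev_from = from;
	h->prev_to = to;
}

/* Posted condition: limit expired OR range disjoint from the previous one.
 * Simplification: overlapping ranges inside the limit are just refused. */
static bool retr_ok_posted(struct retr_hist *h, unsigned long now_ms,
			   u16 from, u16 to)
{
	bool disjoint = seq_less(to, h->prev_from) || seq_more(from, h->prev_to);

	if (seq_less(to, from))
		return false;
	if (now_ms - h->prev_ms > RETR_LIMIT_MS || disjoint) {
		retr_record(h, now_ms, from, to);
		return true;
	}
	return false;
}

/* Variant suggested in review: limit expired AND range disjoint. */
static bool retr_ok_suggested(struct retr_hist *h, unsigned long now_ms,
			      u16 from, u16 to)
{
	bool disjoint = seq_less(to, h->prev_from) || seq_more(from, h->prev_to);

	if (seq_less(to, from))
		return false;
	if (now_ms - h->prev_ms > RETR_LIMIT_MS && disjoint) {
		retr_record(h, now_ms, from, to);
		return true;
	}
	return false;
}
```

With the posted condition, 13-15 is accepted immediately after 10-12, but so is a repeated 10-12 one millisecond later, which is the alternating-NACK case mentioned above; the AND variant makes 13-15 wait until the 10 ms limit has expired.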

>
> tipc_sk_enqueue()
> {
> ..
>  while (skb_queue_len(inputq)) {
>  if (unlikely(time_after_eq(jiffies, time_limit)))
>  return;
> ..
> }
>
> And we don’t need to consider the retransmission range.
>
> Regards,
> Ying
>
> + /* Inside range of previous retransmi

Re: [tipc-discussion] BC rcv link acked stuck after receiving a named with a BC ACK of 0

2016-08-29 Thread Jon Maloy
Hi John,
Sorry for my late answer; I was on vacation for a few days.
It seems I gave you the wrong commit reference in my previous mail. The one I 
really meant was
2d18ac4ba7454a426047 (“tipc: extend broadcast link initialization criteria”)

This one explains why the first packets sometimes get an invalid ack number, 
but also remedies it, and I simply cannot see how an invalid ack #0 can ever be 
accepted when this patch is applied.
I see no reason why this patch shouldn’t also be present in your code, but just 
to make sure, can you confirm this?

I am right now wondering if a retransmission is the problem:
1: we receive pkt #2 which contains ack #1, so we set bc_peer_is_up to true.
2: we receive pkt #1 retransmitted with ack #0. This now gets accepted, and we 
are in trouble.

I’ll try to figure out a solution to this, but it may be possible for you to 
verify this first.

BR
///jon



From: John THompson [mailto:thompa@gmail.com]
Sent: Wednesday, 24 August, 2016 16:22
To: Jon Maloy <jon.ma...@ericsson.com>
Cc: tipc-discussion@lists.sourceforge.net
Subject: Re: [tipc-discussion] BC rcv link acked stuck after receiving a named 
with a BC ACK of 0

Hi Jon,

To clarify my previous email regarding the behaviour observed,

What happens over time:
+ remove bc peer
...
some time until peer rejoins
...
+ add bc peer
+ tipc_link_bc_ack_rcv
  link is up = false, node is up = false
  (this gets called a number of times until both the link and node are up)

+ tipc_link_bc_ack_rcv
  l->acked set to valid ack
...
+ tipc_rcv - usr 5 or 11, bc_ack = 0
  + tipc_bcast_ack_rcv
+ tipc_link_bc_ack_rcv
  sets l->acked to 0

Regards,
JT


On Thu, Aug 25, 2016 at 8:06 AM, John THompson 
<thompa@gmail.com> wrote:
Hi Jon,

It is a similar problem in terms of what happens to the bc link.  I do have 
that patch applied.

I have added debug through the remove bc peer and various other functions and 
the setting of the acked field to 0 is occurring when processing a packet from 
named (msg user 11) or BCAST protocol (msg user 5).

Thanks,
JT

On Wed, Aug 24, 2016 at 10:23 PM, Jon Maloy 
<jon.ma...@ericsson.com> wrote:
Hi John,
This sounds a lot like the problem I tried to fix in
a71eb720355c2 ("tipc: ensure correct broadcast send buffer release when peer is 
lost")
So, either that patch is not present in your kernel (if it is 4.7 it is 
supposed to be) or my solution somehow hasn't solved the problem.
Can you confirm that the patch is there?

BR
///jon

> -Original Message-
> From: John THompson [mailto:thompa@gmail.com]
> Sent: Tuesday, 23 August, 2016 20:21
> To: tipc-discussion@lists.sourceforge.net
> Subject: [tipc-discussion] BC rcv link acked stuck after receiving a named
> with a BC ACK of 0
>
> Hi,
>
> I am running TIPC 2.0 on Linux 4.7 on a cluster of Freescale QorIQ P2040
> and Marvell Armada-XP processors.  There are 10 nodes in all.
> When 2 of the nodes are removed, then rejoin the cluster we sometimes see
> behaviour where the TIPC BC link gets stuck and eventually the backlog gets
> full. The 2 nodes that are joining have already connected to each other.
>
> The problem occurs when the BC link sndnxt value is greater than 32k on one
> of the nodes (call it NODE1) and 2 nodes begin to join.
> When NODE1 detects the joining nodes, at some early point after they have
> joined, NODE1 receives a NAMED publication with a BC ack of 0.  NODE1
> immediately sets its BC acked to 0 and tries to ack packets off the
> transmq.  No packets get removed as the new ack value doesn't match any of
> the packets that need to be acked.
>
> The problem doesn't recover because tipc_link_bc_ack_rcv only accepts a
> new acked value that is more than the old acked value, and this comparison
> is done in 16-bit sequence-number arithmetic. When the values are more than
> 32k apart, 0 can indeed be greater than 40,000. So when new packets are
> processed, the new BC ack value is considered less than the stored one (0).
>
> This results in the BC transmq getting full and the backlog getting full,
> thereby preventing communication over the BC link between nodes.
>
> I am persisting in trying to work out why the NAMED publication has a BC
> ack of 0, which I think is the root cause of the problem.
>
> I think that tipc_link_bc_ack_rcv needs an extra check to ensure that an
> invalid BC ack value cannot be set.  I am defining invalid as being an
> acked value that is greater than the current BC acked value + the link
> window.
>
> Thanks,
> John
> --
> ___
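The wrap-around John describes comes from comparing 16-bit sequence numbers by their forward distance modulo 2^16. The sketch below models that comparison (after TIPC's less_eq()/more(), not copied from them) together with a hypothetical form of the window check he proposes; bc_ack_plausible() and its window argument are illustrative and not existing kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t u16;

/* 'left' precedes or equals 'right' if the forward distance from left
 * to right, modulo 2^16, is less than half the sequence space. */
static bool seq_less_eq(u16 left, u16 right)
{
	return (u16)(right - left) < 32768u;
}

static bool seq_more(u16 left, u16 right)
{
	return !seq_less_eq(left, right);
}

/* Hypothetical sanity check along the lines proposed above: accept a new
 * broadcast ack only if it moves forward from the stored one and does
 * not run past stored ack + window. */
static bool bc_ack_plausible(u16 acked, u16 new_ack, u16 window)
{
	return seq_more(new_ack, acked) &&
	       seq_less_eq(new_ack, (u16)(acked + window));
}
```

Once the stored ack has been reset to 0, genuine acks around 40,000 are no longer "more" than it, so they are ignored and the transmit queue is never released; the extra window test rejects the bogus 0 in the first place, while still accepting acks that legitimately wrap past 65535.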

Re: [tipc-discussion] [PATCH net-next v2 00/12] tipc: create socket FSM using sk_state only

2016-08-30 Thread Jon Maloy
The multicast failure is known. It happens because I lowered the socket 
receive buffer size in commit 10724cc7bb783 ("tipc: redesign 
connection-level flow control"), so the send stream now occasionally 
gets too big and a message is rejected. I don't consider this an error, 
as a multicast sender always has to expect and prepare his code for such 
events. I have not seen the problem in test #6, but it is easily 
imaginable that this is the same issue.

///jon



On 08/30/2016 06:02 AM, Xue, Ying wrote:
>
> Just a reminder: these issues might not be caused by your patches. So 
> first we need to identify whether your patch set causes these side 
> effects. If not, they were introduced by earlier commits merged into 
> mainline.
>
> Regards,
>
> Ying
>
> From: Parthasarathy Bhuvaragan 
> [mailto:parthasarathy.bhuvara...@ericsson.com]
> Sent: Tuesday, August 30, 2016 4:30 PM
> To: Xue, Ying; tipc-discussion@lists.sourceforge.net; 
> jon.ma...@ericsson.com; ma...@donjonn.com
> Subject: Re: [PATCH net-next v2 00/12] tipc: create socket FSM using 
> sk_state only
>
> Hi Ying,
>
> I tried to trigger this fault. I get a different error on the 
> multicast test, after about 2 hours.
>  node1 ~ # ./tipcTS
>  Received on 1 sockets in subtest 6, expected 2
>  TEST FAILED Received wrong number of multicast messages
>   errno = 11: Resource temporarily unavailable
>
> My client and server run on 2 QEMU guests with 4 CPUs. I will 
> continue to investigate this. Sorry for the late reply; I got stuck 
> with other issues recently and couldn't focus on this.
>
> regards partha
>
> On 08/17/2016 12:40 PM, Xue, Ying wrote:
>
> I have found the following issue after the series is applied to the 
> latest kernel:
>
> Test # 8
>
> TIPC TIPC_IMPORTANCE test...STARTED!
>
> TEST FAILED unexpected number of send() errors errno = 113: No route to host
>
> Test # 1
>
> Below is the procedure for reproducing the above error:
>
> Prepare two nodes. Run "tipcTS" on one of them; on the other node, use 
> the command below to run the tipcTC test case repeatedly:
>
> while [ true ]; do tipcTC 0; done
>
> After about 2 or 3 hours, the error above appears.
>
> Regards,
>
> Ying
>
> -Original Message-
> From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com]
> Sent: Monday, August 15, 2016 5:19 PM
> To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
> ma...@donjonn.com; Xue, Ying
> Subject: [PATCH net-next v2 00/12] tipc: create socket FSM using sk_state 
> only
>
> The following issues with the current socket layer hinder the socket 
> diagnostics implementation, which led to this patch series. The series does 
> not add any functional change.
>
> 1. tipc socket state is derived from multiple variables like
> sock->state, tsk->probing_state and tsk->connected. This style forces
> us to export multiple attributes to the user space, which has to be
> backward compatible.
>
> 2. Abuse of sock->state cannot be exported to user-space without
> requiring tipc specific hacks in the user-space.
> - For connection less (CL) sockets sock->state is overloaded to
>   tipc state SS_READY.
> - For connection oriented (CO) listening socket sock->state is
>   overloaded to tipc state SS_LISTEN.
>
> This series is split into three:
> 1. A bug fix in patch-1
> 2. Express all tipc states using a single variable. This is done in 
> patch#2-5.
> 3. Migrate the new tipc states to sk->sk_state. This is done in 
> patch#6-12.
>
> The figures below represent the FSM after this series:
>
> Unconnected Sockets:
> +------------------+    +--------------------+
> | TIPC_UNCONNECTED |--->| TIPC_DISCONNECTING |
> +------------------+    +--------------------+
>
> Stream Server Listening Socket:
> +------------------+    +-------------+
> | TIPC_UNCONNECTED |--->| TIPC_LISTEN |
> +------------------+    +-------------+
>                                |
> +--------------------+         |
> | TIPC_DISCONNECTING |<--------+
> +--------------------+
>
> Stream Server Data Socket:
> +-++--+
> |TIPC_UNCONNECTED |--> | TIPC_ESTABLISHED |<---+
> +-++--+|
> ^   ||  |
> |   |+--+
> |   v
> +--+  +-+
> |TIPC_DISCONNECTING|<-|TIPC_PROBING |
> +--+  +-+
>
> Stream Socket Client:
>

Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix random link resets while adding a second bearer

2016-08-31 Thread Jon Maloy
Reviewed-by: Jon Maloy


On 08/31/2016 07:35 AM, Parthasarathy Bhuvaragan wrote:
> In a dual bearer configuration, if the second tipc link becomes
> active while the first link still has pending nametable "bulk"
> updates, it randomly leads to reset of the second link.
>
> When a link is established, the function named_distribute()
> fills the skb based on the node mtu (which leaves room for TUNNEL_PROTOCOL)
> with a NAME_DISTRIBUTOR message for each PUBLICATION.
> However, named_distribute() allocates the buffer by
> increasing the node mtu by INT_H_SIZE (to insert NAME_DISTRIBUTOR).
> This consumes the space allocated for TUNNEL_PROTOCOL.
>
> When establishing the second link, the link shall tunnel all the
> messages in the first link queue, including the "bulk" update.
> As the size of the tunnelled NAME_DISTRIBUTOR messages exceeds
> the link mtu, the transmission fails (-EMSGSIZE).
>
> Thus, the synch point based on the message count of the tunnel
> packets is never reached, leading to link timeout.
>
> In this commit, we adjust the size of the name distributor messages so
> that they can be tunnelled.
>
> Signed-off-by: Parthasarathy Bhuvaragan 
> <parthasarathy.bhuvara...@ericsson.com>
> ---
> v3: removed the erroneous usage of roundup() and replaced with exact math
>  as reported by Billie Alsup.
> ---
>   net/tipc/name_distr.c | 8 +---
>   1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
> index 6b626a64b517..a04fe9be1c60 100644
> --- a/net/tipc/name_distr.c
> +++ b/net/tipc/name_distr.c
> @@ -62,6 +62,8 @@ static void publ_to_item(struct distr_item *i, struct publication *p)
>   
>   /**
>* named_prepare_buf - allocate & initialize a publication message
> + *
> + * The buffer returned is of size INT_H_SIZE + payload size
>*/
>   static struct sk_buff *named_prepare_buf(struct net *net, u32 type, u32 size,
>u32 dest)
> @@ -141,9 +143,9 @@ static void named_distribute(struct net *net, struct sk_buff_head *list,
>   struct publication *publ;
>   struct sk_buff *skb = NULL;
>   struct distr_item *item = NULL;
> - uint msg_dsz = (tipc_node_get_mtu(net, dnode, 0) / ITEM_SIZE) *
> - ITEM_SIZE;
> - uint msg_rem = msg_dsz;
> + u32 msg_dsz = ((tipc_node_get_mtu(net, dnode, 0) - INT_H_SIZE) /
> + ITEM_SIZE) * ITEM_SIZE;
> + u32 msg_rem = msg_dsz;
>   
>   list_for_each_entry(publ, pls, local_list) {
>   /* Prepare next buffer: */
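The size arithmetic behind the fix can be checked with plain numbers. The sketch assumes INT_H_SIZE is 40 bytes and that ITEM_SIZE (sizeof(struct distr_item)) is 20 bytes, i.e. five 32-bit fields; both values are stated here as assumptions, and the node MTU of 1460 used in the example is arbitrary.

```c
#include <stdint.h>

#define INT_H_SIZE 40 /* assumed TIPC internal header size, bytes */
#define ITEM_SIZE  20 /* assumed sizeof(struct distr_item): five u32 fields */

/* Payload size before the fix: the whole node MTU, rounded down to items. */
static uint32_t dsz_old(uint32_t mtu)
{
	return (mtu / ITEM_SIZE) * ITEM_SIZE;
}

/* Payload size after the fix: leave room for the NAME_DISTRIBUTOR header
 * added by named_prepare_buf(), then round down to whole items. */
static uint32_t dsz_new(uint32_t mtu)
{
	return ((mtu - INT_H_SIZE) / ITEM_SIZE) * ITEM_SIZE;
}

/* Size of the allocated message: internal header plus payload. */
static uint32_t msg_size(uint32_t dsz)
{
	return INT_H_SIZE + dsz;
}
```

With a node MTU of 1460, the old formula yields a 1460-byte payload and a 1500-byte message, 40 bytes more than the node MTU allows; the new formula yields 1420 + 40 = 1460, which fits.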


--
___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion


[tipc-discussion] [PATCH stable] tipc: apply skb linearization commit to stable 4.4.x

2016-09-14 Thread Jon Maloy
commit c7cad0d6f70cd upstream
(“tipc: move linearization of buffers to generic code”)

was applied to net-next in November 2015, and is present in kernel
versions from 4.5.x onwards.

We later discovered that this commit also fixes a serious bug, since
even L2 buffers may arrive non-linearized. Hence, in 4.4.x kernels we
often see debug printouts like this:

[880.688856] Dropping name table update (0) of {1651649891, 1819082752, 0} from <1.1.1> key=402710022
[880.688862] Dropping name table update (0) of {4029808599, 2711729614, 1639218685} from <1.1.1> key=18102394
[880.688865] Dropping name table update (0) of {134218495, 4278191616, 100669184} from <1.1.1> key=0

Those are symptoms of the binding table having received "corrupt" 
publications read linearly from non-linear buffers. The above listed
commit solves this problem, and should be applied even to 4.4 kernels.
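The failure mode is easier to see with a toy model: if a record straddles the boundary between the linear head and a fragment, reading it as if the buffer were contiguous picks up whatever bytes follow the head. The struct and gather() below are hypothetical and are not the sk_buff API; they only show what a fragment-aware copy has to do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a non-linear buffer: the payload is split
 * between a linear head and a single fragment. */
struct two_part_buf {
	const uint8_t *head;
	size_t head_len;
	const uint8_t *frag;
	size_t frag_len;
};

/* Copy 'len' payload bytes starting at 'off', following the data into
 * the fragment when it crosses the end of the head. Returns 0 on
 * success or -1 if the range is out of bounds. */
static int gather(const struct two_part_buf *b, size_t off, void *dst,
		  size_t len)
{
	uint8_t *out = dst;
	size_t from_head = 0;

	if (off + len > b->head_len + b->frag_len)
		return -1;
	if (off < b->head_len) {
		from_head = b->head_len - off;
		if (from_head > len)
			from_head = len;
		memcpy(out, b->head + off, from_head);
	}
	if (len > from_head)
		memcpy(out + from_head,
		       b->frag + (off + from_head - b->head_len),
		       len - from_head);
	return 0;
}
```

Reading the same bytes straight from the head pointer would return the head's tail followed by whatever memory happens to follow it, which is the kind of garbage that can surface as nonsensical {type, lower, upper} triples like those in the log above. In the kernel, helpers such as skb_linearize() or skb_copy_bits() serve this purpose.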

2.7.4




[tipc-discussion] [PATCH net-next 3/4] tipc: introduce replicast as transport option for multicast

2016-09-14 Thread Jon Maloy
TIPC multicast messages are currently carried over a reliable
'broadcast link', making use of the underlying media's ability to
transport packets as L2 broadcast or IP multicast to all nodes in
the cluster.

When the bearer in use lacks that ability, we can instead emulate
the broadcast service by replicating and sending the packets over as
many unicast links as needed to reach all identified destinations.
We now introduce a new TIPC link-level 'replicast' service that does
this.
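The core idea of "replicast" can be sketched in user-space C: send one copy per destination over unicast, and keep each destination on an "unsent" list until its link accepts the copy, so that a retry after congestion only reaches the nodes that missed out. This mirrors the sent/unsent split of struct tipc_nlist in patch 2/4, but all names here (dest_list, replicast(), demo_send()) are hypothetical, and the -1 return stands in for -ELINKCONG.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DESTS 16

/* Hypothetical destination list: nodes still to be served stay in
 * unsent[]; nodes whose link accepted a copy move to sent[]. */
struct dest_list {
	uint32_t unsent[MAX_DESTS];
	size_t n_unsent;
	uint32_t sent[MAX_DESTS];
	size_t n_sent;
};

/* Hypothetical per-link send: true if the unicast link took the copy,
 * false if it is congested. */
typedef bool (*unicast_send_fn)(uint32_t node, const void *msg, size_t len);

/* Replicate 'msg' to every node still in unsent[]. Returns 0 when all
 * copies were accepted, or -1 if at least one link was congested. */
static int replicast(struct dest_list *d, const void *msg, size_t len,
		     unicast_send_fn send)
{
	size_t i = 0;
	int rc = 0;

	while (i < d->n_unsent) {
		uint32_t node = d->unsent[i];

		if (send(node, msg, len)) {
			d->sent[d->n_sent++] = node;
			/* Remove by moving the last unsent entry into slot i. */
			d->unsent[i] = d->unsent[--d->n_unsent];
		} else {
			rc = -1; /* congested: node stays in unsent[] */
			i++;
		}
	}
	return rc;
}

/* Demo stand-in for the unicast links: node 2 can be made congested. */
static bool congested_node2;

static bool demo_send(uint32_t node, const void *msg, size_t len)
{
	(void)msg;
	(void)len;
	return !(congested_node2 && node == 2);
}
```

A first call with node 2 congested delivers to nodes 1 and 3 and leaves only node 2 unsent; calling again once the congestion clears sends just that one remaining copy.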

Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/bcast.c  | 105 ++
 net/tipc/bcast.h  |   5 +--
 net/tipc/link.c   |   8 -
 net/tipc/msg.c|  17 +
 net/tipc/msg.h|   2 ++
 net/tipc/node.c   |  27 +-
 net/tipc/socket.c |  29 ++-
 7 files changed, 149 insertions(+), 44 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 886df68..e7b4d6b 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -39,7 +39,7 @@
 #include "socket.h"
 #include "msg.h"
 #include "bcast.h"
-#include "name_distr.h"
+#include "name_table.h"
 #include "link.h"
 #include "node.h"
 
@@ -176,43 +176,102 @@ static void tipc_bcbase_xmit(struct net *net, struct sk_buff_head *xmitq)
__skb_queue_purge(&_xmitq);
 }
 
-/* tipc_bcast_xmit - deliver buffer chain to all nodes in cluster
- *and to identified node local sockets
+/* tipc_bcast_xmit - broadcast the buffer chain to all external nodes
  * @net: the applicable net namespace
- * @list: chain of buffers containing message
+ * @msg: chain of buffers containing message
  * Consumes the buffer chain, except when returning -ELINKCONG
  * Returns 0 if success, otherwise errno: -ELINKCONG,-EHOSTUNREACH,-EMSGSIZE
  */
-int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list)
+static int tipc_bcast_xmit(struct net *net, struct sk_buff_head *msg)
 {
struct tipc_link *l = tipc_bc_sndlink(net);
-   struct sk_buff_head xmitq, inputq, rcvq;
+   struct sk_buff_head xmitq;
int rc = 0;
 
-   __skb_queue_head_init(&rcvq);
__skb_queue_head_init(&xmitq);
-   skb_queue_head_init(&inputq);
-
-   /* Prepare message clone for local node */
-   if (unlikely(!tipc_msg_reassemble(list, &rcvq)))
-   return -EHOSTUNREACH;
-
tipc_bcast_lock(net);
if (tipc_link_bc_peers(l))
-   rc = tipc_link_xmit(l, list, &xmitq);
+   rc = tipc_link_xmit(l, msg, &xmitq);
tipc_bcast_unlock(net);
+   tipc_bcbase_xmit(net, &xmitq);
+   return rc;
+}
+
+/* tipc_rcast_xmit - replicate and send a message to given destination nodes
+ * @net: the applicable net namespace
+ * @msg: chain of buffers containing message
+ * @dests: list of destination nodes
+ * Returns -ELINKCONG if any link is congested, otherwise 0
+ */
+static int tipc_rcast_xmit(struct net *net, struct sk_buff_head *msg,
+  struct tipc_nlist *dests)
+{
+   struct sk_buff_head _msg;
+   struct tipc_nitem *n, *tmp;
+   int rc = 0;
+   u32 dst;
+
+   __skb_queue_head_init(&_msg);
+
+   list_for_each_entry_safe(n, tmp, &dests->unsent, list) {
+   dst = n->node;
+   if (!tipc_msg_pskb_copy(dst, msg, &_msg))
+   return -ENOMEM;
+
+   /* Already congestion? Ensure there will be only one wakeup */
+   TIPC_SKB_CB(skb_peek(&_msg))->wakeup_pending = rc;
+
+   /* Any other failure than -ELINKCONG is ignored */
+   if (tipc_node_xmit(net, &_msg, dst, dst) == -ELINKCONG)
+   rc = -ELINKCONG;
+   else
+   tipc_nlist_sent(dests, n);
+
+   /* Message copy list is non-empty if sending failed */
+   __skb_queue_purge(&_msg);
+   }
+   return rc;
+}
 
-   /* Don't send to local node if adding to link failed */
-   if (unlikely(rc)) {
-   __skb_queue_purge(&rcvq);
-   return rc;
+/* tipc_mcast_xmit - deliver message to indicated destination nodes
+ *   and to identified node local sockets
+ * @net: the applicable net namespace
+ * @msg: chain of buffers containing message
+ * @dests: destination nodes for message.
+ * Consumes buffer chain, except when returning -ELINKCONG
+ * Returns dest list with all items present in either 'sent' or 'unsent' list
+ * Returns 0 if success, otherwise errno
+ */
+int tipc_mcast_xmit(struct net *net, struct sk_buff_head *msg,
+   struct tipc_nlist *dests)
+{
+   struct tipc_bc_base *bb = tipc_bc_base(net);
+   struct sk_buff_head inputq;
+   int rc = 0;
+
+   skb_queue_head_init(&inputq);
+
+   /* Create message clone for local node if applicable */
+   if (dests->local && skb_queue_empty(>localq) &&
+   !tipc_msg_reasse

[tipc-discussion] [PATCH net-next 2/4] tipc: add functionality to lookup multicast destination nodes

2016-09-14 Thread Jon Maloy
As a further preparation for the upcoming 'replicast' functionality,
we add some necessary structs and functions for looking up and returning
a list of all nodes that host destinations for a given multicast message.
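Stripped of the name-table internals, the lookup reduces to: collect each publishing node once, and note separately whether the own node hosts a destination. A minimal sketch, assuming the overlapping publications have already been gathered into a flat array of node addresses; node_set, lookup_dst_nodes() and the array input are hypothetical, and the domain/scope check done with tipc_in_scope() is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 32

/* Hypothetical result type: unique remote nodes plus a 'local' flag. */
struct node_set {
	uint32_t nodes[MAX_NODES];
	size_t n;
	bool local;
};

/* Add 'node' unless it is already listed: one entry per node. */
static void node_set_add(struct node_set *s, uint32_t node)
{
	for (size_t i = 0; i < s->n; i++)
		if (s->nodes[i] == node)
			return;
	if (s->n < MAX_NODES)
		s->nodes[s->n++] = node;
}

/* Classify the node of each overlapping publication as local (own node)
 * or remote; remote nodes are de-duplicated. */
static void lookup_dst_nodes(const uint32_t *publ_nodes, size_t n_publ,
			     uint32_t self, struct node_set *out)
{
	for (size_t i = 0; i < n_publ; i++) {
		if (publ_nodes[i] == self)
			out->local = true;
		else
			node_set_add(out, publ_nodes[i]);
	}
}
```

Several publications on the same node thus produce a single destination entry, so the replicast path sends at most one copy per node.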

Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/name_table.c | 95 +++
 net/tipc/name_table.h | 24 +
 2 files changed, 119 insertions(+)

diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index e190460..cb9c7ea 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -645,6 +645,42 @@ exit:
return res;
 }
 
+/* tipc_nametbl_lookup_dst_nodes - find broadcast destination nodes
+ * - Creates list of nodes that overlap the given multicast address
+ * - Determines if any node local ports overlap
+ */
+void tipc_nametbl_lookup_dst_nodes(struct net *net, u32 type, u32 lower,
+  u32 upper, u32 domain,
+  struct tipc_nlist *nodes)
+{
+   struct sub_seq *sseq, *stop;
+   struct publication *publ;
+   struct name_info *info;
+   struct name_seq *seq;
+   u32 self = tipc_own_addr(net);
+
+   rcu_read_lock();
+   seq = nametbl_find_seq(net, type);
+   if (!seq)
+   goto exit;
+
+   spin_lock_bh(&seq->lock);
+   sseq = seq->sseqs + nameseq_locate_subseq(seq, lower);
+   stop = seq->sseqs + seq->first_free;
+   for (; sseq->lower <= upper && sseq != stop; sseq++) {
+   info = sseq->info;
+   list_for_each_entry(publ, &info->zone_list, zone_list) {
+   if (publ->node == self)
+   nodes->local = true;
+   else if (tipc_in_scope(domain, publ->node))
+   tipc_nlist_add(nodes, publ->node);
+   }
+   }
+   spin_unlock_bh(&seq->lock);
+exit:
+   rcu_read_unlock();
+}
+
 /*
  * tipc_nametbl_publish - add name publication to network name tables
  */
@@ -1059,3 +1095,62 @@ u32 tipc_plist_pop(struct tipc_plist *pl)
kfree(nl);
return port;
 }
+
+void tipc_nlist_init(struct tipc_nlist *nl)
+{
+   memset(nl, 0, sizeof(*nl));
+   INIT_LIST_HEAD(&nl->sent);
+   INIT_LIST_HEAD(&nl->unsent);
+   skb_queue_head_init(&nl->localq);
+}
+
+void tipc_nlist_add(struct tipc_nlist *nl, u32 node)
+{
+   struct tipc_nitem *n, *tmp;
+
+   list_for_each_entry_safe(n, tmp, &nl->unsent, list) {
+   if (n->node == node)
+   return;
+   }
+   n = kzalloc(sizeof(*n), GFP_KERNEL);
+   if (!n)
+   return;
+   n->node = node;
+   nl->remote++;
+   list_add(&n->list, &nl->unsent);
+}
+
+void tipc_nlist_sent(struct tipc_nlist *nl, struct tipc_nitem *n)
+{
+   list_del(&n->list);
+   list_add(&n->list, &nl->sent);
+   nl->remote--;
+}
+
+void tipc_nlist_del(struct tipc_nlist *nl, struct tipc_nitem *n)
+{
+   list_del(&n->list);
+   kfree(n);
+   nl->remote--;
+}
+
+void tipc_nlist_restore(struct tipc_nlist *nl)
+{
+   list_splice_tail_init(&nl->sent, &nl->unsent);
+   nl->remote = 0;
+   nl->local = 0;
+}
+
+void tipc_nlist_purge(struct tipc_nlist *nl)
+{
+   struct tipc_nitem *n, *tmp;
+
+   list_for_each_entry_safe(n, tmp, &nl->unsent, list) {
+   list_del(&n->list);
+   kfree(n);
+   }
+   list_for_each_entry_safe(n, tmp, &nl->sent, list) {
+   list_del(&n->list);
+   kfree(n);
+   }
+}
diff --git a/net/tipc/name_table.h b/net/tipc/name_table.h
index 1524a73..d1061b6 100644
--- a/net/tipc/name_table.h
+++ b/net/tipc/name_table.h
@@ -39,6 +39,7 @@
 
 struct tipc_subscription;
 struct tipc_plist;
+struct tipc_nlist;
 
 /*
  * TIPC name types reserved for internal TIPC use (both current and planned)
@@ -100,6 +101,9 @@ int tipc_nl_name_table_dump(struct sk_buff *skb, struct netlink_callback *cb);
 u32 tipc_nametbl_translate(struct net *net, u32 type, u32 instance, u32 *node);
 int tipc_nametbl_mc_translate(struct net *net, u32 type, u32 lower, u32 upper,
  u32 limit, struct tipc_plist *dports);
+void tipc_nametbl_lookup_dst_nodes(struct net *net, u32 type, u32 lower,
+  u32 upper, u32 domain,
+  struct tipc_nlist *nodes);
 struct publication *tipc_nametbl_publish(struct net *net, u32 type, u32 lower,
 u32 upper, u32 scope, u32 port_ref,
 u32 key);
@@ -130,4 +134,24 @@ static inline void tipc_plist_init(struct tipc_plist *pl)
 void tipc_plist_push(struct tipc_plist *pl, u32 port);
 u32 tipc_plist_pop(struct tipc_plist *pl);
 
+struct tipc_nitem {
+   struct list_head list;
+   u32 node;
+};
+
+struct tipc_nlist {
+   struct list_head unse

[tipc-discussion] [PATCH net-next 0/4] tipc: introduce multicast through replication

2016-09-14 Thread Jon Maloy
TIPC multicast messages are currently distributed via L2 broadcast
or IP multicast to all nodes in the cluster, irrespective of the 
number of real destinations of the message.

In this series we introduce an option to transport messages via
replication ("replicast") across a selected number of unicast links,
instead of relying on the underlying media. This option is used when
true broadcast/multicast is not supported by the media, or when the
number of true destinations is much smaller than the cluster size.
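The two criteria above can be written down as a small decision function. This is a hypothetical sketch: the function, the enum and especially the ratio threshold are illustrative, since the cover letter only says "much smaller" and does not state how the series quantifies it (patch 4/4 additionally makes replicast user-selectable).

```c
#include <stdbool.h>
#include <stdint.h>

enum mcast_method { MCAST_BROADCAST, MCAST_REPLICAST };

/* Hypothetical selection policy. 'ratio' is an illustrative knob:
 * replicast is chosen when dest_nodes * ratio < cluster_nodes. */
static enum mcast_method select_method(bool bearer_bcast_ok,
				       uint32_t dest_nodes,
				       uint32_t cluster_nodes,
				       uint32_t ratio)
{
	if (!bearer_bcast_ok)
		return MCAST_REPLICAST; /* media cannot broadcast/multicast */
	if ((uint64_t)dest_nodes * ratio < cluster_nodes)
		return MCAST_REPLICAST; /* few real destinations */
	return MCAST_BROADCAST;
}
```

For example, with a ratio of 4, a message for 2 nodes in a 50-node cluster would be replicated, while one for 9 nodes out of 10 would still use the broadcast link.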

Jon Maloy (4):
  tipc: add function for checking broadcast support in bearer
  tipc: add functionality to lookup multicast destination nodes
  tipc: introduce replicast as transport option for multicast
  tipc: make replicast a user selectable option

 include/uapi/linux/tipc.h |   6 +-
 net/tipc/bcast.c  | 166 ++
 net/tipc/bcast.h  |  19 +-
 net/tipc/bearer.c |  15 -
 net/tipc/bearer.h |   6 ++
 net/tipc/link.c   |  12 +++-
 net/tipc/msg.c|  17 +
 net/tipc/msg.h|   2 +
 net/tipc/name_table.c |  95 ++
 net/tipc/name_table.h |  24 +++
 net/tipc/node.c   |  27 +---
 net/tipc/node.h   |   4 +-
 net/tipc/socket.c |  52 ++-
 net/tipc/udp_media.c  |   8 +--
 14 files changed, 392 insertions(+), 61 deletions(-)

-- 
2.7.4




Re: [tipc-discussion] [PATCH net-next v3 00/12] tipc: create socket FSM using sk_state only

2016-09-15 Thread Jon Maloy


On 09/15/2016 08:29 AM, Parthasarathy Bhuvaragan wrote:
>
> On 09/14/2016 10:20 PM, Jon Maloy wrote:
>> Hi Partha,
>> It looks good. I am especially happy that you got rid of the "connected" 
>> field, which I have been requesting for several years.
>> I only have a couple of comments:
>>
>> 1: It disturbs me that TIPC_UNCONNECTED and TIPC_LISTENING sockets go to 
>> state TIPC_DISCONNECTING before they are deleted, without ever having been 
>> connected. This is counterintuitive and cannot be necessary.
> Changing the names of the new states should make the transitions more 
> logical as below:
> TIPC_UNCONNECTED => TIPC_OPEN
> TIPC_DISCONNECTING => TIPC_CLOSE
> So in this case, all open sockets need to be closed. It's valid for 
> both connectionless and connection-oriented sockets, as for connection-oriented 
> sockets there is no difference between DISCONNECTING and CLOSE.

Sounds better, although I think TIPC_CLOSING, or maybe TIPC_RELEASING 
would be even better.

>
> I need to get the names of the states right, as I will export them 
> to user space for socket diagnostics.
>
> BTW, why do we even support shutdown() on connectionless sockets? The 
> function handles states relevant only for connection oriented sockets.
>> 2: "connection less" should be "connectionless".  In several places.
> I will fix this. I need to fix the description in tipc_poll() too.

When you have done this and the above name changes, you can add an 
Acked-by: from me.

///jon

>> 3: We really must rename/retype the confusing "handle" field used in 
>> TIPC_SKB_CB()->handle. I believe it should be made a u32 and called "read" 
>> or "offset" or similar.  Maybe a separate patch before or after this series? 
>> 
> I will look into that, and post it separately.
>> Regards
>> ///jon
>>
>>
>>> -Original Message-
>>> From: Parthasarathy Bhuvaragan
>>> Sent: Tuesday, 13 September, 2016 06:52
>>> To:tipc-discussion@lists.sourceforge.net; Jon Maloy<jon.ma...@ericsson.com>;
>>> ma...@donjonn.com; Ying Xue<ying@windriver.com>
>>> Subject: [PATCH net-next v3 00/12] tipc: create socket FSM using sk_state 
>>> only
>>>
>>> The following issues with the current socket layer hinder the socket
>>> diagnostics implementation, which led to this patch series. The series
>>> does not add any functional change.
>>>
>>> 1. tipc socket state is derived from multiple variables like
>>> sock->state, tsk->probing_state and tsk->connected. This style forces
>>> us to export multiple attributes to the user space, which has to be
>>> backward compatible.
>>>
>>> 2. Abuse of sock->state cannot be exported to user-space without
>>> requiring tipc specific hacks in the user-space.
>>> - For connection less (CL) sockets sock->state is overloaded to
>>>   tipc state SS_READY.
>>> - For connection oriented (CO) listening socket sock->state is
>>>   overloaded to tipc state SS_LISTEN.
>>>
>>> This series is split into three parts:
>>> 1. A bug fix in patch-1
>>> 2. Express all tipc states using a single variable. This is done in patch#2-5.
>>> 3. Migrate the new tipc states to sk->sk_state. This is done in patch#6-12.
>>>
>>> The figures below represent the FSM after this series:
>>>
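>>> Unconnected Sockets:
>>> +------------------+    +--------------------+
>>> | TIPC_UNCONNECTED |--->| TIPC_DISCONNECTING |
>>> +------------------+    +--------------------+
>>>
>>> Stream Server Listening Socket:
>>> +------------------+    +-------------+
>>> | TIPC_UNCONNECTED |--->| TIPC_LISTEN |
>>> +------------------+    +-------------+
>>>                                |
>>> +--------------------+         |
>>> | TIPC_DISCONNECTING |<--------+
>>> +--------------------+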
>>>
>>> Stream Server Data Socket:
>>> +-++--+
>>> |TIPC_UNCONNECTED |--> | TIPC_ESTABLISHED |<---+
>>> +-++--+|
>>> ^   ||  |
>>> |   |+--+
>>> |   v
>>> +--+  +-+
>>> |TIPC_DISCONNECTING|<

Re: [tipc-discussion] [PATCH net v1 2/3] tipc: return early for non-blocking sockets at link congestion

2016-10-05 Thread Jon Maloy


> -Original Message-
> From: Parthasarathy Bhuvaragan
> Sent: Wednesday, 05 October, 2016 04:18
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net;
> ma...@donjonn.com; Ying Xue <ying@windriver.com>
> Subject: Re: [PATCH net v1 2/3] tipc: return early for non-blocking sockets 
> at link
> congestion
> 
> On 10/04/2016 09:26 PM, Jon Maloy wrote:
> >
> >
> >> -Original Message-
> >> From: Parthasarathy Bhuvaragan
> >> Sent: Tuesday, 04 October, 2016 08:29
> >> To: tipc-discussion@lists.sourceforge.net; Jon Maloy
> <jon.ma...@ericsson.com>;
> >> ma...@donjonn.com; Ying Xue <ying@windriver.com>
> >> Subject: [PATCH net v1 2/3] tipc: return early for non-blocking sockets at 
> >> link
> >> congestion
> >>
> >> Until now, in stream/mcast send() we pass the message to the link
> >> layer even when the link is congested and add the socket to the
> >> link's wakeup queue. This is unnecessary for non-blocking sockets.
> >> If a socket is set to non-blocking and sends multicast with zero
> >> back off time while receiving EAGAIN, we exhaust the memory.
> >>
> >> In this commit, we return immediately at stream/mcast send() for
> >> non-blocking sockets.
> >>
> >> Signed-off-by: Parthasarathy Bhuvaragan
> >> <parthasarathy.bhuvara...@ericsson.com>
> >> ---
> >>  net/tipc/socket.c | 6 ++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> >> index f9f5f3c3dab5..adf3e6ecf61e 100644
> >> --- a/net/tipc/socket.c
> >> +++ b/net/tipc/socket.c
> >> @@ -697,6 +697,9 @@ static int tipc_sendmcast(struct  socket *sock, struct tipc_name_seq *seq,
> >>   uint mtu;
> >>   int rc;
> >>
> >> + if (!timeo && tsk->link_cong)
> >> + return -ELINKCONG;
> >
> > Why is the !timeo test needed ?
> [partha] When a socket with a non-zero sndtimeo experiences congestion,
> the socket is put to sleep for the specified duration and is woken if
> the congestion ceases, after which we attempt to transmit the buffer. If
> congestion persists for the entire duration, we return ELINKCONG to the
> user. Hence I can return immediately only when the timeout is 0.

Are you saying that we can have non-blocking sockets with a non-zero timeout?
I guess my point was that if the socket is blocking, i.e., the timeout is != 0,
you will never get into this second call in the first place, since the socket
is already blocked if tsk->link_cong is true. Hence there is no need to test
the timeout value. I am unsure whether this holds with multiple threads, though.

> >
> > ///jon
> >
> >> +
> >>   msg_set_type(mhdr, TIPC_MCAST_MSG);
> >>   msg_set_lookup_scope(mhdr, TIPC_CLUSTER_SCOPE);
> >>   msg_set_destport(mhdr, 0);
> >> @@ -1072,6 +1075,9 @@ static int __tipc_send_stream(struct socket *sock, struct msghdr *m, size_t dsz)
> >>   }
> >>
> >>   timeo = sock_sndtimeo(sk, m->msg_flags & MSG_DONTWAIT);
> >> + if (!timeo && tsk->link_cong)
> >> + return -ELINKCONG;
> >> +
> >>   dnode = tsk_peer_node(tsk);
> >>   skb_queue_head_init();
> >>
> >> --
> >> 2.1.4
> >

--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion


Re: [tipc-discussion] [PATCH net v1 2/3] tipc: return early for non-blocking sockets at link congestion

2016-10-06 Thread Jon Maloy
> -Original Message-
> From: Parthasarathy Bhuvaragan
> Sent: Thursday, 06 October, 2016 04:35
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net;
> ma...@donjonn.com; Ying Xue <ying@windriver.com>
> Subject: Re: [PATCH net v1 2/3] tipc: return early for non-blocking sockets 
> at link
> congestion
> 
> 
> 
> On 10/05/2016 05:25 PM, Jon Maloy wrote:
> >
> >
> >> -Original Message-
> >> From: Parthasarathy Bhuvaragan
> >> Sent: Wednesday, 05 October, 2016 04:18
> >> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-
> discuss...@lists.sourceforge.net;
> >> ma...@donjonn.com; Ying Xue <ying@windriver.com>
> >> Subject: Re: [PATCH net v1 2/3] tipc: return early for non-blocking 
> >> sockets at
> link
> >> congestion
> >>
> >> On 10/04/2016 09:26 PM, Jon Maloy wrote:
> >>>
> >>>
> >>>> -Original Message-
> >>>> From: Parthasarathy Bhuvaragan
> >>>> Sent: Tuesday, 04 October, 2016 08:29
> >>>> To: tipc-discussion@lists.sourceforge.net; Jon Maloy
> >> <jon.ma...@ericsson.com>;
> >>>> ma...@donjonn.com; Ying Xue <ying@windriver.com>
> >>>> Subject: [PATCH net v1 2/3] tipc: return early for non-blocking sockets 
> >>>> at
> link
> >>>> congestion
> >>>>
> >>>> Until now, in stream/mcast send() we pass the message to the link
> >>>> layer even when the link is congested and add the socket to the
> >>>> link's wakeup queue. This is unnecessary for non-blocking sockets.
> >>>> If a socket is set to non-blocking and sends multicast with zero
> >>>> back off time while receiving EAGAIN, we exhaust the memory.
> >>>>
> >>>> In this commit, we return immediately at stream/mcast send() for
> >>>> non-blocking sockets.
> >>>>
> >>>> Signed-off-by: Parthasarathy Bhuvaragan
> >>>> <parthasarathy.bhuvara...@ericsson.com>
> >>>> ---
> >>>>  net/tipc/socket.c | 6 ++
> >>>>  1 file changed, 6 insertions(+)
> >>>>
> >>>> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> >>>> index f9f5f3c3dab5..adf3e6ecf61e 100644
> >>>> --- a/net/tipc/socket.c
> >>>> +++ b/net/tipc/socket.c
> >>>> @@ -697,6 +697,9 @@ static int tipc_sendmcast(struct  socket *sock, struct tipc_name_seq *seq,
> >>>>   uint mtu;
> >>>>   int rc;
> >>>>
> >>>> + if (!timeo && tsk->link_cong)
> >>>> + return -ELINKCONG;
> >>>
> >>> Why is the !timeo test needed ?
> >> [partha] When a socket with a non-zero sndtimeo experiences congestion,
> >> the socket is put to sleep for the specified duration and is woken if
> >> the congestion ceases, after which we attempt to transmit the buffer. If
> >> congestion persists for the entire duration, we return ELINKCONG to the
> >> user. Hence I can return immediately only when the timeout is 0.
> >
> > Are you saying that we can have non-blocking sockets with a non-zero
> > timeout?
> > I guess my point was that if the socket is blocking, i.e., the timeout
> > is != 0, you will never get into this second call in the first place,
> > since the socket is already blocked if tsk->link_cong is true. Hence
> > there is no need to test the timeout value.
> > I am unsure whether this holds with multiple threads, though.
> [partha] A zero timeout means that either the socket is non-blocking or
> this message has the MSG_DONTWAIT flag set (irrespective of whether the
> socket is blocking or non-blocking). From the timeout I infer that the
> user has no intention of waiting.
> In multi-threaded applications, each thread can call a blocking
> sendmsg(), call sendmsg() with the MSG_DONTWAIT flag, or the application
> can mix threads of both kinds. The fix should cover all of the above.

Ok. It seems I haven't fully understood this aspect of the API.
Acked-by: jon
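Partha's explanation can be sketched in plain C. The model below is hypothetical (struct sock_sketch is not struct tipc_sock): the effective timeout is zero when either the socket is non-blocking or the individual call carries MSG_DONTWAIT, similar in spirit to what sock_sndtimeo() computes from the flags, and only a zero timeout combined with a congested link triggers the early return. It assumes, as the patch description suggests, that link congestion reaches the user as EAGAIN.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Assumption: link congestion reaches user space as EAGAIN, as the
 * patch description ("while receiving EAGAIN") suggests. */
#define ELINKCONG_SKETCH EAGAIN

/* Hypothetical socket model; not the kernel's struct tipc_sock. */
struct sock_sketch {
	bool nonblock;   /* O_NONBLOCK set on the socket */
	long sndtimeo;   /* send timeout; assumed non-zero in this model */
	bool link_cong;  /* the link is currently congested */
};

/* Effective timeout for one call: zero if the socket is non-blocking
 * or this call passes MSG_DONTWAIT, else the socket's send timeout. */
static long effective_timeo(const struct sock_sketch *s, int msg_flags)
{
	bool dontwait = s->nonblock || (msg_flags & MSG_DONTWAIT);

	return dontwait ? 0 : s->sndtimeo;
}

/* 0: go ahead and build/queue the message (a caller with a non-zero
 * timeout would sleep on congestion instead of failing);
 * -ELINKCONG_SKETCH: the caller does not want to wait, so fail now. */
static int send_precheck(const struct sock_sketch *s, int msg_flags)
{
	if (!effective_timeo(s, msg_flags) && s->link_cong)
		return -ELINKCONG_SKETCH;
	return 0;
}
```

Because the check keys on the per-call timeout rather than on the socket alone, one thread can block on a congested socket while another thread passing MSG_DONTWAIT on the same socket gets the error immediately.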

> >
> >>>
> >>> ///jon
> >>>
> >>>> +
> >>>>   msg_set_type(mhdr, TIPC_MCAST_MSG);
> >>>>   msg_set_lookup_scope(mhdr, TIPC_CLUSTER_SCOPE);
> >>>>   msg_set_destport(mhdr, 0);
> >>>> @@ -1072,6 +1075,9 @@ static int __tipc_send_stream(struct socket *sock, struct msghdr *m, size_t dsz)
> >>>>   }
> >>>>
> >>>>   timeo = sock_sndtimeo(sk, m->msg_flags & MSG_DONTWAIT);
> >>>> + if (!timeo && tsk->link_cong)
> >>>> + return -ELINKCONG;
> >>>> +
> >>>>   dnode = tsk_peer_node(tsk);
> >>>>   skb_queue_head_init();
> >>>>
> >>>> --
> >>>> 2.1.4
> >>>



Re: [tipc-discussion] [PATCH net v1 3/3] tipc: wakeup sleeping users at disconnect

2016-10-04 Thread Jon Maloy
Acked-by: jon


> -Original Message-
> From: Parthasarathy Bhuvaragan
> Sent: Tuesday, 04 October, 2016 08:29
> To: tipc-discussion@lists.sourceforge.net; Jon Maloy <jon.ma...@ericsson.com>;
> ma...@donjonn.com; Ying Xue <ying@windriver.com>
> Subject: [PATCH net v1 3/3] tipc: wakeup sleeping users at disconnect
> 
> Until now, in filter_connect(), when we terminate a connection due to
> an error message from the peer, we set the socket state to DISCONNECTING.
> 
> The socket is notified about the broken connection with EPIPE when
> a user tries to send a message. However, if a socket was waiting in
> poll() while the connection was being terminated, we failed to wake up
> that socket.
> 
> In this commit, we wake up sleeping sockets at connection termination.
> 
> Signed-off-by: Parthasarathy Bhuvaragan
> <parthasarathy.bhuvara...@ericsson.com>
> ---
>  net/tipc/socket.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index adf3e6ecf61e..cd01deb1da9c 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -1599,6 +1599,7 @@ static bool filter_connect(struct tipc_sock *tsk, struct sk_buff *skb)
>   /* Let timer expire on it's own */
>   tipc_node_remove_conn(net, tsk_peer_node(tsk),
> tsk->portid);
> + sk->sk_state_change(sk);
>   }
>   return true;
> 
> --
> 2.1.4
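The effect of the added sk->sk_state_change(sk) call can be modeled with a hypothetical wait queue in plain C: sleepers register a flag, and the state-change callback wakes all of them so that they re-check the socket state and discover the broken connection. All names below are illustrative, not kernel APIs; the notify parameter exists only to contrast the behavior before and after the one-line fix.

```c
#include <stdbool.h>
#include <stddef.h>

enum conn_state_sketch { CONN_ESTABLISHED, CONN_DISCONNECTING };

#define MAX_WAITERS 8

/* Hypothetical socket with a wait queue: each waiter is a flag that
 * is set to true when that waiter has been woken up. */
struct wsock_sketch {
	enum conn_state_sketch state;
	bool *waiters[MAX_WAITERS];
	size_t nwaiters;
};

/* A poll()-style caller registers itself and goes to sleep
 * (silently not registered if the queue is full). */
static void wait_on(struct wsock_sketch *sk, bool *woken)
{
	*woken = false;
	if (sk->nwaiters < MAX_WAITERS)
		sk->waiters[sk->nwaiters++] = woken;
}

/* Analogue of sk->sk_state_change(): wake every sleeper so that it
 * re-evaluates the socket state. */
static void state_change(struct wsock_sketch *sk)
{
	for (size_t i = 0; i < sk->nwaiters; i++)
		*sk->waiters[i] = true;
	sk->nwaiters = 0;
}

/* Peer reported an error: move to DISCONNECTING. Without the final
 * state_change() call (notify == false), sleepers never notice. */
static void peer_error(struct wsock_sketch *sk, bool notify)
{
	sk->state = CONN_DISCONNECTING;
	if (notify)
		state_change(sk);
}
```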




Re: [tipc-discussion] [PATCH net v1 2/3] tipc: return early for non-blocking sockets at link congestion

2016-10-04 Thread Jon Maloy


> -Original Message-
> From: Parthasarathy Bhuvaragan
> Sent: Tuesday, 04 October, 2016 08:29
> To: tipc-discussion@lists.sourceforge.net; Jon Maloy <jon.ma...@ericsson.com>;
> ma...@donjonn.com; Ying Xue <ying@windriver.com>
> Subject: [PATCH net v1 2/3] tipc: return early for non-blocking sockets at 
> link
> congestion
> 
> Until now, in stream/mcast send() we pass the message to the link
> layer even when the link is congested and add the socket to the
> link's wakeup queue. This is unnecessary for non-blocking sockets.
> If a socket is set to non-blocking and sends multicast with zero
> back off time while receiving EAGAIN, we exhaust the memory.
> 
> In this commit, we return immediately at stream/mcast send() for
> non-blocking sockets.
> 
> Signed-off-by: Parthasarathy Bhuvaragan
> <parthasarathy.bhuvara...@ericsson.com>
> ---
>  net/tipc/socket.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index f9f5f3c3dab5..adf3e6ecf61e 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -697,6 +697,9 @@ static int tipc_sendmcast(struct  socket *sock, struct tipc_name_seq *seq,
>   uint mtu;
>   int rc;
> 
> + if (!timeo && tsk->link_cong)
> + return -ELINKCONG;

Why is the !timeo test needed ?

///jon

> +
>   msg_set_type(mhdr, TIPC_MCAST_MSG);
>   msg_set_lookup_scope(mhdr, TIPC_CLUSTER_SCOPE);
>   msg_set_destport(mhdr, 0);
> @@ -1072,6 +1075,9 @@ static int __tipc_send_stream(struct socket *sock, struct msghdr *m, size_t dsz)
>   }
> 
>   timeo = sock_sndtimeo(sk, m->msg_flags & MSG_DONTWAIT);
> + if (!timeo && tsk->link_cong)
> + return -ELINKCONG;
> +
>   dnode = tsk_peer_node(tsk);
>   skb_queue_head_init();
> 
> --
> 2.1.4




Re: [tipc-discussion] [PATCH net-next v4 00/14] tipc: create socket FSM using sk_state

2016-09-19 Thread Jon Maloy
This looks fine to me now.
Acked-by: Jon Maloy

> -Original Message-
> From: Parthasarathy Bhuvaragan
> Sent: Monday, 19 September, 2016 14:56
> To: tipc-discussion@lists.sourceforge.net; Jon Maloy <jon.ma...@ericsson.com>;
> ma...@donjonn.com; Ying Xue <ying@windriver.com>
> Subject: [PATCH net-next v4 00/14] tipc: create socket FSM using sk_state
> 
> The following issues with the current socket layer hinder the socket
> diagnostics implementation, which led to this patch series. The series
> does not add any functional change.
> 
> 1. tipc socket state is derived from multiple variables like
>    sock->state, tsk->probing_state and tsk->connected. This style forces
>    us to export multiple attributes to the user space, which has to be
>    backward compatible.
> 
> 2. Abuse of sock->state cannot be exported to user-space without
>    requiring tipc specific hacks in the user-space.
>    - For connection less (CL) sockets sock->state is overloaded to
>      tipc state SS_READY.
>    - For connection oriented (CO) listening socket sock->state is
>      overloaded to tipc state SS_LISTEN.
> 
> This series is split into four parts:
> 1. A bug fix in patch#1
> 2. Minor cleanups in patch#2-3
> 3. Express all tipc states using a single variable. This is done in patch#4-7.
> 4. Migrate the new tipc states to sk->sk_state. This is done in patch#8-14.
> 
> The figures below represent the FSM after this series:
> 
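> For connectionless sockets:
> +-----------+       +--------------+
> | TIPC_OPEN |------>| TIPC_CLOSING |
> +-----------+       +--------------+
> 
> Stream Server Listening Socket:
> +-----------+       +-------------+
> | TIPC_OPEN |------>| TIPC_LISTEN |
> +-----------+       +-------------+
>                            |
> +--------------+           |
> | TIPC_CLOSING |<----------+
> +--------------+
> 
> Stream Server Data Socket:
> +-----------+       +------------------+
> | TIPC_OPEN |------>| TIPC_ESTABLISHED |<---+
> +-----------+       +------------------+    |
>                       ^   |        |        |
>                       |   |        +--------+
>                       |   v
>                     +--------------+
>                     | TIPC_PROBING |
>                     +--------------+
>                            |
>                            |
>                            v
> +--------------+    +--------------------+
> | TIPC_CLOSING |<---| TIPC_DISCONNECTING |
> +--------------+    +--------------------+
> 
> Stream Socket Client:
> +-----------+       +-----------------+
> | TIPC_OPEN |------>| TIPC_CONNECTING |
> +-----------+       +-----------------+
>                             |
>                             |
>                             v
>                     +------------------+
>                     | TIPC_ESTABLISHED |<---+
>                     +------------------+    |
>                       ^   |        |        |
>                       |   |        +--------+
>                       |   v
>                     +--------------+
>                     | TIPC_PROBING |
>                     +--------------+
>                            |
>                            |
>                            v
> +--------------+    +--------------------+
> | TIPC_CLOSING |<---| TIPC_DISCONNECTING |
> +--------------+    +--------------------+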
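A minimal sketch of the single-variable approach, assuming a hypothetical enum and transition table: one state field replaces the combination of sock->state, tsk->probing_state and tsk->connected, and every transition is checked against an edge list. The edges encode one reading of the figures above; the real transitions are implemented in net/tipc/socket.c and are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical enumeration of the states named in the figures; the
 * kernel maps these onto sk->sk_state values, which are not shown. */
enum tipc_state_sketch {
	S_OPEN, S_LISTEN, S_CONNECTING, S_ESTABLISHED,
	S_PROBING, S_DISCONNECTING, S_CLOSING, S_NSTATES
};

/* Edges as read from the figures (one reading of the ASCII art). */
static const bool allowed[S_NSTATES][S_NSTATES] = {
	[S_OPEN]          = { [S_LISTEN] = true, [S_CONNECTING] = true,
			      [S_ESTABLISHED] = true, [S_CLOSING] = true },
	[S_LISTEN]        = { [S_CLOSING] = true },
	[S_CONNECTING]    = { [S_ESTABLISHED] = true },
	[S_ESTABLISHED]   = { [S_ESTABLISHED] = true, [S_PROBING] = true },
	[S_PROBING]       = { [S_ESTABLISHED] = true, [S_DISCONNECTING] = true },
	[S_DISCONNECTING] = { [S_CLOSING] = true },
};

/* Single-variable socket state. */
struct fsm_sock_sketch {
	enum tipc_state_sketch state;
};

/* Move to 'next' if the figures allow it; otherwise return false
 * and leave the state unchanged. */
static bool set_state(struct fsm_sock_sketch *s, enum tipc_state_sketch next)
{
	if (!allowed[s->state][next])
		return false;
	s->state = next;
	return true;
}
```

Keeping the whole state in one field is what makes it exportable to user space as a single diagnostic attribute.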
> 
> NOTE:
> This is just the base refactoring required for socket diagnostics.
> The implementation of TIPC socket diagnostics will be sent as a
> separate series.
> 
> ---
> I plan to submit this series after v4.8 release cycle.
> 
> v4: Addressed comments from Jon Maloy:
> - Added a new patch #2 to rename variable.
> Make state names more readable and consistent:
> - Renamed the state TIPC_UNCONNECTED to TIPC_OPEN.
> - Adjusted the scope for TIPC_DISCONNECTING in patch#11.
> - Added a new state TIPC_CLOSING in patch#12.
> 
> v3: - Addressed comments from Ying Xue <ying@windriver.com> in
>   patch #7, #11.
> - Rebased on the latest net-next, which contains fixes for broadcast NACK
>   that seem to make the ptts regression stable.
> - Ran the ptts suites for 6000 iterations over 5+ hours.
> 
> Parthasarathy Bhuvaragan (14):
>   tipc: set kern=0 in sk_alloc() during tipc_accept()
>   tipc: rename struct tipc_skb_cb member handle to bytes_read
>   tipc: rename tsk->remote to tsk->peer for consistent naming
>   tipc: remove tsk->connected for connectionless sockets
>   tipc: remove tsk->connected from tipc_sock
>   tipc: remove probing_intv from tipc_sock
>   tipc: remove socket sta

Re: [tipc-discussion] Strange TIPC error

2016-09-23 Thread Jon Maloy


> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Wednesday, 21 September, 2016 15:52
> To: 'Jon Maloy' <ma...@donjonn.com>; tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] Strange TIPC error
> 
> I do not think the network load was any higher than usual for this system
> (at most 700 Mbit/s of inbound video multicast on the same interface).

If it is multicast, the load on the switch/bridge may be considerable,
depending on the number of destinations.
Is it possible for you to obtain any statistics on dropped packets?

> Both nodes saw the same error and were running the same kernel.
> 
> After a reboot, the messages went away.
> 
> tipc-config was run right before this, by a script trying to set the
> network ID and TIPC addresses. This would have failed, as the links were
> already configured.
> (And no, we have not yet switched to the “ip tipc” command, as we still
> have to support some older kernels too.)

I can look through the patches posted after 4.4 and see whether any
relevant ones are missing.

///jon

> 
> From: Jon Maloy [mailto:ma...@donjonn.com]
> Sent: Wednesday, September 21, 2016 3:01 AM
> To: Rune Torgersen; tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] Strange TIPC error
> 
> Hi Rune,
> It means that a link is reset after having tried to retransmit the same
> packet more than 100 times. When we have seen this before, it has normally
> been an indication that the (very short) link monitoring messages still go
> through, and that there is something wrong with the packet being
> retransmitted. It cannot be the MTU, since the first packet is only 265
> bytes long, and as far as I can see the source address is also OK.
>
> Are you running the same kernel version on node 1.1.2? Do you have a high
> load on the network when this happens? If it is repeatable, maybe you could
> provide a Wireshark dump?
> 
> BR
> ///jon
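The reset rule Jon describes (a link is reset once the same packet has been retransmitted more than 100 times) can be sketched as follows. The struct and function names are hypothetical and the kernel's real bookkeeping may differ; the point is that the counter only grows while the peer keeps asking for the same head packet, which is why a single undeliverable packet can bring the link down even while the short link-monitoring messages still get through.

```c
#include <stdbool.h>
#include <stdint.h>

/* Threshold taken from the explanation above ("more than 100 times");
 * the kernel's actual constant and bookkeeping may differ. */
#define RETRANSMIT_LIMIT 100

/* Hypothetical per-link retransmission bookkeeping. */
struct link_sketch {
	uint16_t head_seqno;   /* oldest unacknowledged packet */
	unsigned int retrans;  /* times head_seqno has been retransmitted */
	bool reset;            /* link declared failed */
};

/* Called whenever the peer requests retransmission starting at 'from'.
 * Repeated requests for the same head packet count toward the limit;
 * progress (a new head packet) restarts the count. */
static void on_retransmit_request(struct link_sketch *l, uint16_t from)
{
	if (from != l->head_seqno) {
		l->head_seqno = from;
		l->retrans = 0;
	}
	if (++l->retrans > RETRANSMIT_LIMIT)
		l->reset = true;  /* "Retransmission failure on link ..." */
}
```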
> 
> 
> 
> From: Rune Torgersen <ru...@innovsys.com>
> To: tipc-discussion@lists.sourceforge.net
> Sent: Tuesday, 20 September 2016 17:44
> Subject: [tipc-discussion] Strange TIPC error
> 
> Does anybody know what this means?
> (Running Ubuntu 4.4.0-31 kernel)
> 
> [3706037.493994] Retransmission failure on link <1.1.1:eth0-1.1.2:eth0>
> [3706037.494000] Resetting link  Link <1.1.1:eth0-1.1.2:eth0> state e
> [3706037.494002] XMTQ: 27 [45216-45242], BKLGQ: 0, SNDNX: 45243, RCVNX: 36674
> [3706037.494004] Failed msg: usr 0, typ 2, len 265, err 0
> [3706037.494005] sqno 45216, prev: 1001001, src: 1001001
> [3706328.908044] Retransmission failure on link <1.1.1:eth0-1.1.2:eth0>
> [3706328.908048] Resetting link  Link <1.1.1:eth0-1.1.2:eth0> state e
> [3706328.908049] XMTQ: 44 [649-692], BKLGQ: 0, SNDNX: 693, RCVNX: 893
> [3706328.908050] Failed msg: usr 12, typ 1, len 1460, err 0
> [3706328.908051] sqno 649, prev: 1001001, src: 1001001
> 
> 


Re: [tipc-discussion] [PATCH net-next 1/3] tipc: transfer broadcast nacks in link state messages

2016-08-23 Thread Jon Maloy


> -Original Message-
> From: Xue, Ying [mailto:ying@windriver.com]
> Sent: Tuesday, 23 August, 2016 05:48
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net;
> Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>; Richard
> Alpe <richard.a...@ericsson.com>
> Cc: ma...@donjonn.com; gbala...@gmail.com
> Subject: RE: [PATCH net-next 1/3] tipc: transfer broadcast nacks in link state
> messages
> 
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Wednesday, August 17, 2016 2:09 AM
> To: tipc-discussion@lists.sourceforge.net;
> parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com;
> jon.ma...@ericsson.com
> Cc: ma...@donjonn.com; gbala...@gmail.com
> Subject: [PATCH net-next 1/3] tipc: transfer broadcast nacks in link state
> messages
> 
> When we send broadcasts in clusters of more than 70-80 nodes, we sometimes
> see the broadcast link resetting because of an excessive number of
> retransmissions.
> This is caused by a combination of two factors:
> 
> 1) A 'NACK crunch', where loss of broadcast packets is discovered
>    and NACK'ed by several nodes simultaneously, leading to multiple
>    redundant broadcast retransmissions.
> 
> 2) The fact that the NACKs themselves are also sent as broadcasts, leading
>    to excessive load and packet loss on the transmitting switch/bridge.
> 
> This commit deals with the latter problem by moving the sending of
> broadcast NACKs from the dedicated BCAST_PROTOCOL/NACK message type to
> regular unicast LINK_PROTOCOL/STATE messages. We allocate 10 unused bits
> in word 8 of the said message for this purpose, and introduce a new
> capability bit, TIPC_BCAST_STATE_NACK, in order to keep the change
> backwards compatible.
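A sketch of how a capability bit can keep such a change backwards compatible, under stated assumptions: the bit value CAP_BCAST_STATE_NACK_SKETCH is hypothetical (the kernel defines the real TIPC_BCAST_STATE_NACK constant), and the decision is reduced to a pure function. A peer that advertises the capability gets its NACK inside the unicast STATE message; an older peer still gets the dedicated BCAST_PROTOCOL/NACK message it understands.

```c
#include <stdint.h>

/* Hypothetical capability bit; the kernel defines the real
 * TIPC_BCAST_STATE_NACK value. */
#define CAP_BCAST_STATE_NACK_SKETCH (1u << 1)

enum nack_action {
	NACK_NONE,          /* no gap detected: nothing to report */
	NACK_IN_STATE_MSG,  /* new peer: gap carried in the STATE message */
	NACK_LEGACY_MSG,    /* old peer: dedicated BCAST_PROTOCOL/NACK */
};

/* Decide how a receiver reports a detected gap of 'gap' packets,
 * given the capability bits advertised by the peer. */
static enum nack_action choose_nack_action(uint16_t peer_caps, uint16_t gap)
{
	if (!gap)
		return NACK_NONE;
	if (peer_caps & CAP_BCAST_STATE_NACK_SKETCH)
		return NACK_IN_STATE_MSG;
	return NACK_LEGACY_MSG;
}
```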
> 
> [Ying] Using unicast to send NACK messages is much better than broadcast.
> 
> 
> /* tipc_link_bc_sync_rcv - update rcv link according to peer's send state
>   */
> -void tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
> -struct sk_buff_head *xmitq)
> +int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
> +   struct sk_buff_head *xmitq)
>  {
>   u16 peers_snd_nxt = msg_bc_snd_nxt(hdr);
> + u16 from = msg_bcast_ack(hdr) + 1;
> + u16 to = from + msg_bc_gap(hdr) - 1;
> + int rc = 0;
> 
>   if (!link_is_up(l))
> - return;
> + return rc;
> 
>   if (!msg_peer_node_is_up(hdr))
> - return;
> + return rc;
> 
>   /* Open when peer ackowledges our bcast init msg (pkt #1) */
>   if (msg_ack(hdr))
>   l->bc_peer_is_up = true;
> 
>   if (!l->bc_peer_is_up)
> - return;
> + return rc;
> 
>   /* Ignore if peers_snd_nxt goes beyond receive window */
>   if (more(peers_snd_nxt, l->rcv_nxt + l->window))
> - return;
> + return rc;
> +
> + if (!less(to, from)) {
> + rc = tipc_link_retrans(l->bc_sndlink, from, to, xmitq);
> + l->stats.recv_nacks++;
> + }
> +
> + l->snd_nxt = peers_snd_nxt;
> + if (link_bc_rcv_gap(l))
> + rc |= TIPC_LINK_SND_STATE;
> +
> + /* Return now if sender supports nack via STATE messages */
> + if (l->peer_caps & TIPC_BCAST_STATE_NACK)
> + return rc;
> +
> 
> [Ying] If the peer node doesn't support the new NACK method, it seems we
> not only retransmit the missing messages through tipc_link_retrans(), but
> also send a NACK message created with tipc_link_build_bc_proto_msg(). In
> the old mode, however, we don't need to retransmit the missing messages at
> this point. Do you think this difference in behavior is right?

In the old code, link_bc_sync_rcv() dealt with only one task: letting the
receiving endpoint discover gaps in the packet sequence and, if necessary,
send NACKs according to certain rules.

In the new code this function has two tasks: 1) the receiving endpoint still
discovers gaps, but now sends NACKs unconditionally (in STATE messages). 2)
The sending endpoint registers reported gaps in msg_bc_gap() (i.e., received
NACKs), and performs retransmission as requested.

Now, if the receiving node supports the new algorithm and the sending node
doesn't, the receiver will produce an 'old' NACK message and send it out, but
not a 'new' one. Neither will it retransmit anything, both because there
probably is nothing to retransmit (it is the receiver), and because
msg_bc_gap() can only be zero, so 'to' will be less than 'from', and nothing
will ever be retransmitted.
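The argument that a gap of zero can never cause a retransmission can be checked with a small sketch of the 16-bit wrap-around arithmetic. seq_less() is a stand-in written in the spirit of TIPC's less() helper, not its exact kernel definition; nack_range() applies the from/to formulas from the diff above (from = bcast_ack + 1, to = from + gap - 1, retransmit only if !less(to, from)).

```c
#include <stdbool.h>
#include <stdint.h>

/* 16-bit serial-number comparison: 'left' precedes 'right' if the
 * forward distance from left to right is non-zero and less than
 * half the sequence space. */
static bool seq_less(uint16_t left, uint16_t right)
{
	uint16_t dist = (uint16_t)(right - left);

	return dist != 0 && dist < 32768u;
}

/* Compute the retransmission range requested by a STATE message.
 * Returns true (and fills *from / *to) only if the range is non-empty. */
static bool nack_range(uint16_t bcast_ack, uint16_t gap,
		       uint16_t *from, uint16_t *to)
{
	*from = (uint16_t)(bcast_ack + 1);
	*to = (uint16_t)(*from + gap - 1);
	return !seq_less(*to, *from);
}
```

With gap == 0, 'to' is always exactly one behind 'from', so seq_less(to, from) holds and nothing is retransmitted; this stays true across the 65535 to 0 wrap as well.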

If it is the other way around, the sender suppor

[tipc-discussion] [PATCH net 1/1] tipc: fix broadcast link synchronization problem

2016-09-30 Thread Jon Maloy
In commit 2d18ac4ba745 ("tipc: extend broadcast link initialization
criteria") we tried to fix a problem with the initial synchronization
of broadcast link acknowledge values. Unfortunately that solution is
not sufficient to solve the issue.

We have seen it happen that LINK_PROTOCOL/STATE packets with a valid
non-zero unicast acknowledge number may bypass BCAST_PROTOCOL
initialization, NAME_DISTRIBUTOR and other STATE packets with invalid
broadcast acknowledge numbers, leading to premature opening of the
broadcast link. When the bypassed packets finally arrive, they are
inadvertently accepted, and the already correctly initialized
acknowledge number in the broadcast receive link is overwritten by
the invalid (zero) value of the said packets. After this the broadcast
link goes stale.

We now fix this by identifying packets where we know the acknowledge
value is or may be invalid, and then ignoring those values.

For BCAST_PROTOCOL and NAME_DISTRIBUTOR packets this identification is
easy; their broadcast acknowledge value is always zero, while false
positives are few and far between, and harmless.

For LINK_PROTOCOL/STATE packets we cannot accept false positives, so
we have to claim an unused bit in the header to indicate that the value
is invalid. This minor protocol change is fully backwards compatible.
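The header flag and the filtering rules can be sketched in userspace C. hdr_bits() and hdr_set_bits() are modeled on the kernel's msg_bits()/msg_set_bits() (header words kept in network byte order), and bc_ack_usable() condenses the rules described above into one hypothetical function; the pkt_kind values stand in for TIPC's real user and type codes.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical header: words stored in network byte order. */
struct hdr_sketch {
	uint32_t w[10];
};

static uint32_t hdr_bits(const struct hdr_sketch *m, int word, int pos,
			 uint32_t mask)
{
	return (ntohl(m->w[word]) >> pos) & mask;
}

static void hdr_set_bits(struct hdr_sketch *m, int word, int pos,
			 uint32_t mask, uint32_t val)
{
	uint32_t v = ntohl(m->w[word]);

	v &= ~(mask << pos);
	v |= (val & mask) << pos;
	m->w[word] = htonl(v);
}

/* Word 5, bit 14: the "broadcast ack invalid" flag from the patch. */
static bool bc_ack_invalid(const struct hdr_sketch *m)
{
	return hdr_bits(m, 5, 14, 0x1);
}

static void set_bc_ack_invalid(struct hdr_sketch *m, bool invalid)
{
	hdr_set_bits(m, 5, 14, 0x1, invalid);
}

/* Hypothetical packet classes standing in for TIPC user/type codes. */
enum pkt_kind { PKT_BCAST_PROTOCOL, PKT_NAME_DISTRIBUTOR, PKT_STATE, PKT_OTHER };

/* Should the broadcast ack carried by this packet be applied? */
static bool bc_ack_usable(enum pkt_kind kind, uint16_t bc_ack,
			  const struct hdr_sketch *m)
{
	if (kind == PKT_BCAST_PROTOCOL || kind == PKT_NAME_DISTRIBUTOR)
		return bc_ack != 0;         /* zero: sent before sync point */
	if (kind == PKT_STATE)
		return !bc_ack_invalid(m);  /* explicit flag, no guessing */
	return true;
}
```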

Reported-by: John Thompson <thompa@gmail.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/bcast.c | 17 -
 net/tipc/bcast.h |  3 ++-
 net/tipc/msg.h   | 10 ++
 net/tipc/node.c  |  2 +-
 4 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 753f774..62d2dfd 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -247,15 +247,22 @@ int tipc_bcast_rcv(struct net *net, struct tipc_link *l, struct sk_buff *skb)
  *
  * RCU is locked, no other locks set
  */
-void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, u32 acked)
+void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l,
+   struct tipc_msg *hdr)
 {
struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq;
+   int usr = msg_user(hdr);
+   u16 bc_ack = msg_bcast_ack(hdr);
struct sk_buff_head xmitq;
 
+   /* Ignore bc acks sent by peer before bcast synch point was received */
+   if (!bc_ack && ((usr == BCAST_PROTOCOL) || (usr == NAME_DISTRIBUTOR)))
+   return;
+
__skb_queue_head_init(&xmitq);
 
tipc_bcast_lock(net);
-   tipc_link_bc_ack_rcv(l, acked, &xmitq);
+   tipc_link_bc_ack_rcv(l, bc_ack, &xmitq);
tipc_bcast_unlock(net);
 
tipc_bcbase_xmit(net, &xmitq);
@@ -279,11 +286,11 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
__skb_queue_head_init(&xmitq);
 
tipc_bcast_lock(net);
-   if (msg_type(hdr) == STATE_MSG) {
+   if (msg_type(hdr) != STATE_MSG) {
+   tipc_link_bc_init_rcv(l, hdr);
+   } else if (!msg_bc_ack_invalid(hdr)) {
tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), &xmitq);
rc = tipc_link_bc_sync_rcv(l, hdr, &xmitq);
-   } else {
-   tipc_link_bc_init_rcv(l, hdr);
}
tipc_bcast_unlock(net);
 
diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
index 5ffe344..855d53c 100644
--- a/net/tipc/bcast.h
+++ b/net/tipc/bcast.h
@@ -55,7 +55,8 @@ void tipc_bcast_dec_bearer_dst_cnt(struct net *net, int bearer_id);
 int  tipc_bcast_get_mtu(struct net *net);
 int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list);
 int tipc_bcast_rcv(struct net *net, struct tipc_link *l, struct sk_buff *skb);
-void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, u32 acked);
+void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l,
+   struct tipc_msg *hdr);
 int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
struct tipc_msg *hdr);
 int tipc_nl_add_bc_link(struct net *net, struct tipc_nl_msg *msg);
diff --git a/net/tipc/msg.h b/net/tipc/msg.h
index c3832cd..ef37e49 100644
--- a/net/tipc/msg.h
+++ b/net/tipc/msg.h
@@ -714,6 +714,16 @@ static inline void msg_set_peer_stopping(struct tipc_msg *m, u32 s)
msg_set_bits(m, 5, 13, 0x1, s);
 }
 
+static inline bool msg_bc_ack_invalid(struct tipc_msg *m)
+{
+   return msg_bits(m, 5, 14, 0x1);
+}
+
+static inline void msg_set_bc_ack_invalid(struct tipc_msg *m, bool invalid)
+{
+   msg_set_bits(m, 5, 14, 0x1, invalid);
+}
+
 static inline char *msg_media_addr(struct tipc_msg *m)
 {
return (char *)&m->hdr[TIPC_MEDIA_INFO_OFFSET];
diff --git a/net/tipc/node.c b/net/tipc/node.c
index 7ef14e2..9d2f4c2 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1535,7 +1535,7 @@ void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b)
if (unlikely(usr == LINK_PROTOCOL))
tipc_node_bc_sync_rcv(n, hdr, bearer_id, );
else if (unlikely(tipc_link_acked(n->bc_entry.link) != bc_ack))
- 

Re: [tipc-discussion] [PATCH net 1/1] tipc: fix broadcast link synchronization problem

2016-09-30 Thread Jon Maloy
Hi John,
I think this one should solve the problem. Please try it when you are back from
vacation, and give feedback.

BR
///jon


> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Friday, 30 September, 2016 18:07
> To: tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan
> <parthasarathy.bhuvara...@ericsson.com>; Ying Xue
> <ying@windriver.com>; ri...@highwind.se; Jon Maloy
> <jon.ma...@ericsson.com>
> Cc: ma...@donjonn.com; thompa@gmail.com
> Subject: [PATCH net 1/1] tipc: fix broadcast link synchronization problem
> 
> In commit 2d18ac4ba745 ("tipc: extend broadcast link initialization
> criteria") we tried to fix a problem with the initial synchronization
> of broadcast link acknowledge values. Unfortunately that solution is
> not sufficient to solve the issue.
> 
> We have seen it happen that LINK_PROTOCOL/STATE packets with a valid
> non-zero unicast acknowledge number may bypass BCAST_PROTOCOL
> initialization, NAME_DISTRIBUTOR and other STATE packets with invalid
> broadcast acknowledge numbers, leading to premature opening of the
> broadcast link. When the bypassed packets finally arrive, they are
> inadvertently accepted, and the already correctly initialized
> acknowledge number in the broadcast receive link is overwritten by
> the invalid (zero) value of the said packets. After this the broadcast
> link goes stale.
> 
> We now fix this by identifying packets where we know the acknowledge
> value is or may be invalid, and then ignoring those values.
> 
> For BCAST_PROTOCOL and NAME_DISTRIBUTOR packets this identification is
> easy; their broadcast acknowledge value is always zero, while false
> positives are few and far between, and harmless.
> 
> For LINK_PROTOCOL/STATE packets we cannot accept false positives, so
> we have to claim an unused bit in the header to indicate that the value
> is invalid. This minor protocol change is fully backwards compatible.
> 
> Reported-by: John Thompson <thompa@gmail.com>
> Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
> ---
>  net/tipc/bcast.c | 17 -
>  net/tipc/bcast.h |  3 ++-
>  net/tipc/msg.h   | 10 ++
>  net/tipc/node.c  |  2 +-
>  4 files changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
> index 753f774..62d2dfd 100644
> --- a/net/tipc/bcast.c
> +++ b/net/tipc/bcast.c
> @@ -247,15 +247,22 @@ int tipc_bcast_rcv(struct net *net, struct tipc_link *l, struct sk_buff *skb)
>   *
>   * RCU is locked, no other locks set
>   */
> -void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, u32 acked)
> +void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l,
> + struct tipc_msg *hdr)
>  {
>   struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq;
> + int usr = msg_user(hdr);
> + u16 bc_ack = msg_bcast_ack(hdr);
>   struct sk_buff_head xmitq;
> 
> + /* Ignore bc acks sent by peer before bcast synch point was received */
> + if (!bc_ack && ((usr == BCAST_PROTOCOL) || (usr == NAME_DISTRIBUTOR)))
> + return;
> +
>   __skb_queue_head_init(&xmitq);
> 
>   tipc_bcast_lock(net);
> - tipc_link_bc_ack_rcv(l, acked, );
> + tipc_link_bc_ack_rcv(l, bc_ack, );
>   tipc_bcast_unlock(net);
> 
>   tipc_bcbase_xmit(net, );
> @@ -279,11 +286,11 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
>   __skb_queue_head_init();
> 
>   tipc_bcast_lock(net);
> - if (msg_type(hdr) == STATE_MSG) {
> + if (msg_type(hdr) != STATE_MSG) {
> + tipc_link_bc_init_rcv(l, hdr);
> + } else if (!msg_bc_ack_invalid(hdr)) {
>   tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), );
>   rc = tipc_link_bc_sync_rcv(l, hdr, );
> - } else {
> - tipc_link_bc_init_rcv(l, hdr);
>   }
>   tipc_bcast_unlock(net);
> 
> diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
> index 5ffe344..855d53c 100644
> --- a/net/tipc/bcast.h
> +++ b/net/tipc/bcast.h
> @@ -55,7 +55,8 @@ void tipc_bcast_dec_bearer_dst_cnt(struct net *net, int bearer_id);
>  int  tipc_bcast_get_mtu(struct net *net);
>  int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list);
>  int tipc_bcast_rcv(struct net *net, struct tipc_link *l, struct sk_buff *skb);
> -void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, u32 acked);
> +void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l,
> + struct tipc_msg *hdr);
>  int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
>   struct tipc_msg *hdr);
>  int tipc_nl_ad

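The v1 filtering rule described in the quoted commit message above can be read as a small predicate. The sketch below is a standalone illustration, not the patch itself: the numeric user codes and the 4-bit width of the user field are assumptions made for the example, and `v1_ignore_bc_ack()` is a hypothetical name. It shows both the accepted false positive (a genuine zero ack is dropped too) and why LINK_PROTOCOL/STATE packets cannot be covered by this rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Message user codes: assumed values, for illustration only. */
enum {
	BCAST_PROTOCOL   = 5,
	LINK_PROTOCOL    = 7,
	NAME_DISTRIBUTOR = 11,
};

/* v1 heuristic: a zero broadcast ack carried by a BCAST_PROTOCOL or
 * NAME_DISTRIBUTOR packet is presumed to predate the sync point and is
 * ignored. A genuine zero ack is dropped as well (the harmless false
 * positive). LINK_PROTOCOL packets are never filtered by this rule, which
 * is why v1 needs a dedicated header bit for STATE messages. */
static bool v1_ignore_bc_ack(int usr, uint16_t bc_ack)
{
	assert(usr >= 0 && usr <= 15);  /* assumes a 4-bit user field */
	return bc_ack == 0 &&
	       (usr == BCAST_PROTOCOL || usr == NAME_DISTRIBUTOR);
}
```

For example, `v1_ignore_bc_ack(NAME_DISTRIBUTOR, 0)` is true and `v1_ignore_bc_ack(NAME_DISTRIBUTOR, 42)` is false, while `v1_ignore_bc_ack(LINK_PROTOCOL, 0)` is false regardless of whether that zero is valid.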
[tipc-discussion] [PATCH net v2 1/1] tipc: fix broadcast link synchronization problem

2016-10-01 Thread Jon Maloy
In commit 2d18ac4ba745 ("tipc: extend broadcast link initialization
criteria") we tried to fix a problem with the initial synchronization
of broadcast link acknowledge values. Unfortunately that solution is
not sufficient to solve the issue.

We have seen it happen that LINK_PROTOCOL/STATE packets with a valid
non-zero unicast acknowledge number may bypass BCAST_PROTOCOL
initialization, NAME_DISTRIBUTOR and other STATE packets with invalid
broadcast acknowledge numbers, leading to premature opening of the
broadcast link. When the bypassed packets finally arrive, they are
inadvertently accepted, and the already correctly initialized
acknowledge number in the broadcast receive link is overwritten by
the invalid (zero) value of those packets. After this the broadcast
link goes stale.

We now fix this by marking the packets where we know the acknowledge
value is or may be invalid, and then ignoring their acks.

For this purpose, we claim an unused bit in the header to indicate that
the value is invalid. We set the bit to 1 in the initial BCAST_PROTOCOL
synchronization packet and all initial ("bulk") NAME_DISTRIBUTOR
packets, plus those LINK_PROTOCOL packets sent out before the broadcast
links are fully synchronized.

This minor protocol update is fully backwards compatible.

Reported-by: John Thompson <thompa@gmail.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/bcast.c  | 14 ++
 net/tipc/bcast.h  |  3 ++-
 net/tipc/link.c   |  2 ++
 net/tipc/msg.h| 17 +
 net/tipc/name_distr.c |  1 +
 net/tipc/node.c   |  2 +-
 6 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 753f774..aa1babb 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -247,11 +247,17 @@ int tipc_bcast_rcv(struct net *net, struct tipc_link *l, struct sk_buff *skb)
  *
  * RCU is locked, no other locks set
  */
-void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, u32 acked)
+void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l,
+   struct tipc_msg *hdr)
 {
struct sk_buff_head *inputq = _bc_base(net)->inputq;
+   u16 acked = msg_bcast_ack(hdr);
struct sk_buff_head xmitq;
 
+   /* Ignore bc acks sent by peer before bcast synch point was received */
+   if (msg_bc_ack_invalid(hdr))
+   return;
+
__skb_queue_head_init();
 
tipc_bcast_lock(net);
@@ -279,11 +285,11 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
__skb_queue_head_init();
 
tipc_bcast_lock(net);
-   if (msg_type(hdr) == STATE_MSG) {
+   if (msg_type(hdr) != STATE_MSG) {
+   tipc_link_bc_init_rcv(l, hdr);
+   } else if (!msg_bc_ack_invalid(hdr)) {
tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), );
rc = tipc_link_bc_sync_rcv(l, hdr, );
-   } else {
-   tipc_link_bc_init_rcv(l, hdr);
}
tipc_bcast_unlock(net);
 
diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
index 5ffe344..855d53c 100644
--- a/net/tipc/bcast.h
+++ b/net/tipc/bcast.h
@@ -55,7 +55,8 @@ void tipc_bcast_dec_bearer_dst_cnt(struct net *net, int bearer_id);
 int  tipc_bcast_get_mtu(struct net *net);
 int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list);
 int tipc_bcast_rcv(struct net *net, struct tipc_link *l, struct sk_buff *skb);
-void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, u32 acked);
+void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l,
+   struct tipc_msg *hdr);
 int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
struct tipc_msg *hdr);
 int tipc_nl_add_bc_link(struct net *net, struct tipc_nl_msg *msg);
diff --git a/net/tipc/link.c b/net/tipc/link.c
index b36e16c..1055164 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1312,6 +1312,7 @@ static void tipc_link_build_proto_msg(struct tipc_link *l, int mtyp, bool probe,
msg_set_next_sent(hdr, l->snd_nxt);
msg_set_ack(hdr, l->rcv_nxt - 1);
msg_set_bcast_ack(hdr, bcl->rcv_nxt - 1);
+   msg_set_bc_ack_invalid(hdr, !node_up);
msg_set_last_bcast(hdr, l->bc_sndlink->snd_nxt - 1);
msg_set_link_tolerance(hdr, tolerance);
msg_set_linkprio(hdr, priority);
@@ -1574,6 +1575,7 @@ static void tipc_link_build_bc_init_msg(struct tipc_link *l,
__skb_queue_head_init();
if (!tipc_link_build_bc_proto_msg(l->bc_rcvlink, false, 0, ))
return;
+   msg_set_bc_ack_invalid(buf_msg(skb_peek()), true);
tipc_link_xmit(l, , xmitq);
 }
 
diff --git a/net/tipc/msg.h b/net/tipc/msg.h
index c3832cd..50a7398 100644
--- a/net/tipc/msg.h
+++ b/net/tipc/msg.h
@@ -714,6 +714,23 @@ static inline void msg_set_peer_stopping(struct tipc_msg *m, u32 s)
msg_set_bits(m, 5, 13, 0x1, s);
 }
 
+static inline bool msg

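The marking scheme described in the commit message above reserves one header bit and tests it on receipt. The following self-contained userspace sketch illustrates the idea; it is not the TIPC implementation. `struct hdr_sketch`, `get_bits()`, `set_bits()`, `bc_sync_decide()` and the flag position (word 5, bit 14) are all hypothetical, and network byte order is ignored. `bc_sync_decide()` mirrors the three-way branch the patch introduces in `tipc_bcast_sync_rcv()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HDR_WORDS 10

/* Hypothetical header: an array of 32-bit words (byte order ignored). */
struct hdr_sketch {
	uint32_t w[HDR_WORDS];
};

/* Read a field 'mask' wide at bit 'pos' of word 'wi'; the shape mirrors
 * the msg_bits()/msg_set_bits() calls seen in the diff. */
static uint32_t get_bits(const struct hdr_sketch *h, int wi, int pos,
			 uint32_t mask)
{
	assert(wi >= 0 && wi < HDR_WORDS && pos >= 0 && pos < 32);
	return (h->w[wi] >> pos) & mask;
}

/* Write 'val' into the same field, leaving all other bits untouched. */
static void set_bits(struct hdr_sketch *h, int wi, int pos, uint32_t mask,
		     uint32_t val)
{
	assert(wi >= 0 && wi < HDR_WORDS && pos >= 0 && pos < 32);
	h->w[wi] &= ~(mask << pos);
	h->w[wi] |= (val & mask) << pos;
}

/* Illustrative flag position; word 5, bit 13 already holds peer_stopping. */
#define BC_ACK_INVALID_WORD 5
#define BC_ACK_INVALID_POS  14

static void set_bc_ack_invalid(struct hdr_sketch *h, bool invalid)
{
	set_bits(h, BC_ACK_INVALID_WORD, BC_ACK_INVALID_POS, 0x1, invalid);
}

static bool bc_ack_invalid(const struct hdr_sketch *h)
{
	return get_bits(h, BC_ACK_INVALID_WORD, BC_ACK_INVALID_POS, 0x1) != 0;
}

enum bc_sync_action { BC_INIT_RCV, BC_ACK_AND_SYNC, BC_IGNORE };

/* Mirrors the patched branch: non-STATE messages initialize the link,
 * STATE messages are used only if their ack is not marked invalid. */
static enum bc_sync_action bc_sync_decide(bool is_state_msg,
					  const struct hdr_sketch *h)
{
	if (!is_state_msg)
		return BC_INIT_RCV;
	if (!bc_ack_invalid(h))
		return BC_ACK_AND_SYNC;
	return BC_IGNORE;
}
```

For example, after `set_bc_ack_invalid(&h, true)`, `bc_sync_decide(true, &h)` yields `BC_IGNORE`, so a STATE message sent before synchronization can no longer overwrite an already initialized acknowledge number, while an initialization (non-STATE) message still yields `BC_INIT_RCV`.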
Re: [tipc-discussion] [PATCH net v2 1/1] tipc: fix broadcast link synchronization problem

2016-10-27 Thread Jon Maloy
Thank you John, I really appreciate your help and patience with this. I'll post 
the patch asap.
BR///jon


From: John Thompson <thompa@gmail.com>
To: Jon Maloy <ma...@donjonn.com>
Cc: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net; 
Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>; Ying Xue 
<ying@windriver.com>; ri...@highwind.se
Sent: Thursday, 27 October 2016 16:26
Subject: Re: [PATCH net v2 1/1] tipc: fix broadcast link synchronization problem
   
Hi Jon,
My overnight testing has shown no problems with the fix.  I think it is ok.  
JT
On Thu, Oct 27, 2016 at 4:30 PM, John Thompson <thompa@gmail.com> wrote:

Hi Jon,
I have had some help testing this, but unfortunately the testing has not been 
done yet. It is to be run tonight as a long overnight test to see 
if we can reproduce the problem.
Sorry about the delay in getting back to you, but the people assisting with the 
testing have been busy with other things.
I will be in touch tomorrow.
JT

On Tue, Oct 25, 2016 at 2:15 PM, Jon Maloy <ma...@donjonn.com> wrote:

  Hi John, Any news about this? I would like to post this patch if you can 
confirm that it is working.
  ///jon
  
 On 10/12/2016 06:55 PM, John Thompson wrote:
  
 Hi, 
  My initial testing has looked good with this patch.  I am going to be 
performing more in depth testing over the weekend and will be in touch next 
week about how it has held up. 
  Thanks, JT 
   
 On Sat, Oct 1, 2016 at 10:23 PM, Jon Maloy <jon.ma...@ericsson.com> wrote:
 
In commit 2d18ac4ba745 ("tipc: extend broadcast link initialization
 criteria") we tried to fix a problem with the initial synchronization
 of broadcast link acknowledge values. Unfortunately that solution is
 not sufficient to solve the issue.
 
 We have seen it happen that LINK_PROTOCOL/STATE packets with a valid
 non-zero unicast acknowledge number may bypass BCAST_PROTOCOL
 initialization, NAME_DISTRIBUTOR and other STATE packets with invalid
 broadcast acknowledge numbers, leading to premature opening of the
 broadcast link. When the bypassed packets finally arrive, they are
 inadvertently accepted, and the already correctly initialized
 acknowledge number in the broadcast receive link is overwritten by
 the invalid (zero) value of those packets. After this the broadcast
 link goes stale.
 
 We now fix this by marking the packets where we know the acknowledge
 value is or may be invalid, and then ignoring their acks.
 
 For this purpose, we claim an unused bit in the header to indicate that
 the value is invalid. We set the bit to 1 in the initial BCAST_PROTOCOL
 synchronization packet and all initial ("bulk") NAME_DISTRIBUTOR
 packets, plus those LINK_PROTOCOL packets sent out before the broadcast
 links are fully synchronized.
 
 This minor protocol update is fully backwards compatible.
 
 Reported-by: John Thompson <thompa@gmail.com>
 Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
 ---
  net/tipc/bcast.c      | 14 ++
  net/tipc/bcast.h      |  3 ++-
  net/tipc/link.c       |  2 ++
  net/tipc/msg.h        | 17 +
  net/tipc/name_distr.c |  1 +
  net/tipc/node.c       |  2 +-
  6 files changed, 33 insertions(+), 6 deletions(-)
 
 diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
 index 753f774..aa1babb 100644
 --- a/net/tipc/bcast.c
 +++ b/net/tipc/bcast.c
 @@ -247,11 +247,17 @@ int tipc_bcast_rcv(struct net *net, struct tipc_link *l, struct sk_buff *skb)
   *
   * RCU is locked, no other locks set
   */
 -void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, u32 acked)
 +void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l,
 +                       struct tipc_msg *hdr)
  {
         struct sk_buff_head *inputq = _bc_base(net)->inputq;
 +       u16 acked = msg_bcast_ack(hdr);
         struct sk_buff_head xmitq;
 
 +       /* Ignore bc acks sent by peer before bcast synch point was received */
 +       if (msg_bc_ack_invalid(hdr))
 +               return;
 +
         __skb_queue_head_init();
 
         tipc_bcast_lock(net);
 @@ -279,11 +285,11 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
         __skb_queue_head_init();
 
         tipc_bcast_lock(net);
 -       if (msg_type(hdr) == STATE_MSG) {
 +       if (msg_type(hdr) != STATE_MSG) {
 +               tipc_link_bc_init_rcv(l, hdr);
 +       } else if (!msg_bc_ack_invalid(hdr)) {
                 tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), );
                 rc = tipc_link_bc_sync_rcv(l, hdr, );
 -       } else {
 -               tipc_link_bc_init_rcv(l, hdr);
         }
         tipc_bcast_unlock(net);
 
 diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
 index 5ffe344..855d53c 100644
 --- a/net/tipc/bcast.h
 +++ b/net/tipc/bcast.h
 @@ -55,7 +55,8 @@ void tipc_bcast_dec_bearer_dst_cnt(struct net *net, int bearer_id);
  int  tipc_bcast_get_mtu(struct 

[tipc-discussion] [PATCH net 1/1] tipc: fix broadcast link synchronization problem

2016-10-27 Thread Jon Maloy
In commit 2d18ac4ba745 ("tipc: extend broadcast link initialization
criteria") we tried to fix a problem with the initial synchronization
of broadcast link acknowledge values. Unfortunately that solution is
not sufficient to solve the issue.

We have seen it happen that LINK_PROTOCOL/STATE packets with a valid
non-zero unicast acknowledge number may bypass BCAST_PROTOCOL
initialization, NAME_DISTRIBUTOR and other STATE packets with invalid
broadcast acknowledge numbers, leading to premature opening of the
broadcast link. When the bypassed packets finally arrive, they are
inadvertently accepted, and the already correctly initialized
acknowledge number in the broadcast receive link is overwritten by
the invalid (zero) value of those packets. After this the broadcast
link goes stale.

We now fix this by marking the packets where we know the acknowledge
value is or may be invalid, and then ignoring their acks.

For this purpose, we claim an unused bit in the header to indicate that
the value is invalid. We set the bit to 1 in the initial BCAST_PROTOCOL
synchronization packet and all initial ("bulk") NAME_DISTRIBUTOR
packets, plus those LINK_PROTOCOL packets sent out before the broadcast
links are fully synchronized.

This minor protocol update is fully backwards compatible.

Reported-by: John Thompson <thompa@gmail.com>
Tested-by: John Thompson <thompa....@gmail.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/bcast.c  | 14 ++
 net/tipc/bcast.h  |  3 ++-
 net/tipc/link.c   |  2 ++
 net/tipc/msg.h| 17 +
 net/tipc/name_distr.c |  1 +
 net/tipc/node.c   |  2 +-
 6 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 753f774..aa1babb 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -247,11 +247,17 @@ int tipc_bcast_rcv(struct net *net, struct tipc_link *l, struct sk_buff *skb)
  *
  * RCU is locked, no other locks set
  */
-void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, u32 acked)
+void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l,
+   struct tipc_msg *hdr)
 {
struct sk_buff_head *inputq = _bc_base(net)->inputq;
+   u16 acked = msg_bcast_ack(hdr);
struct sk_buff_head xmitq;
 
+   /* Ignore bc acks sent by peer before bcast synch point was received */
+   if (msg_bc_ack_invalid(hdr))
+   return;
+
__skb_queue_head_init();
 
tipc_bcast_lock(net);
@@ -279,11 +285,11 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
__skb_queue_head_init();
 
tipc_bcast_lock(net);
-   if (msg_type(hdr) == STATE_MSG) {
+   if (msg_type(hdr) != STATE_MSG) {
+   tipc_link_bc_init_rcv(l, hdr);
+   } else if (!msg_bc_ack_invalid(hdr)) {
tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), );
rc = tipc_link_bc_sync_rcv(l, hdr, );
-   } else {
-   tipc_link_bc_init_rcv(l, hdr);
}
tipc_bcast_unlock(net);
 
diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
index 5ffe344..855d53c 100644
--- a/net/tipc/bcast.h
+++ b/net/tipc/bcast.h
@@ -55,7 +55,8 @@ void tipc_bcast_dec_bearer_dst_cnt(struct net *net, int bearer_id);
 int  tipc_bcast_get_mtu(struct net *net);
 int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list);
 int tipc_bcast_rcv(struct net *net, struct tipc_link *l, struct sk_buff *skb);
-void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, u32 acked);
+void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l,
+   struct tipc_msg *hdr);
 int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
struct tipc_msg *hdr);
 int tipc_nl_add_bc_link(struct net *net, struct tipc_nl_msg *msg);
diff --git a/net/tipc/link.c b/net/tipc/link.c
index b36e16c..1055164 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1312,6 +1312,7 @@ static void tipc_link_build_proto_msg(struct tipc_link *l, int mtyp, bool probe,
msg_set_next_sent(hdr, l->snd_nxt);
msg_set_ack(hdr, l->rcv_nxt - 1);
msg_set_bcast_ack(hdr, bcl->rcv_nxt - 1);
+   msg_set_bc_ack_invalid(hdr, !node_up);
msg_set_last_bcast(hdr, l->bc_sndlink->snd_nxt - 1);
msg_set_link_tolerance(hdr, tolerance);
msg_set_linkprio(hdr, priority);
@@ -1574,6 +1575,7 @@ static void tipc_link_build_bc_init_msg(struct tipc_link *l,
__skb_queue_head_init();
if (!tipc_link_build_bc_proto_msg(l->bc_rcvlink, false, 0, ))
return;
+   msg_set_bc_ack_invalid(buf_msg(skb_peek()), true);
tipc_link_xmit(l, , xmitq);
 }
 
diff --git a/net/tipc/msg.h b/net/tipc/msg.h
index c3832cd..50a7398 100644
--- a/net/tipc/msg.h
+++ b/net/tipc/msg.h
@@ -714,6 +714,23 @@ static inline void msg_set_peer_stopping(struct tipc_msg *m, u32 s)
 

[tipc-discussion] Specifying port number on udp bearer

2016-10-19 Thread Jon Maloy
Hi Richard,
I have a question: is it possible to simultaneously monitor IPv4 
and IPv6 connectivity over the same interface by enabling two UDP bearers on it, 
one for each protocol? I tried this, and it failed because both bearers 
use the same (hard-coded?) port number.
Is it possible to enable two bearers with different port numbers via 
the tipc tool? If not, would it be easy to fix? (Maybe a new "port" option 
in the enable command?) Would that solve the problem, or is there more that 
needs to be done?

In the old UDP bearer code, abandoned more than 10 years ago, the UDP adapter 
code would just step the port number and retry until it 
succeeded...

Regards
///jon


___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

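The retry behaviour recalled above from the old UDP bearer code, stepping the port number until a bind succeeds, can be sketched in userspace with plain POSIX sockets. This is only an illustration of the retry idea under our own assumptions; it is not TIPC's UDP media code and not a tipc tool option, and the helper name `udp_bind_stepping()` and its parameters are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind an IPv4 UDP socket to 'addr', starting at
 * 'port' and stepping upwards while the port is busy. Returns the socket
 * (storing the port actually used in *bound), or -1 on failure. */
static int udp_bind_stepping(const char *addr, unsigned short port,
			     int max_tries, unsigned short *bound)
{
	assert(addr != NULL && bound != NULL);

	for (int i = 0; i < max_tries && (unsigned)port + i <= 65535; i++) {
		unsigned short p = (unsigned short)(port + i);
		struct sockaddr_in sin;
		int fd, err;

		memset(&sin, 0, sizeof(sin));
		sin.sin_family = AF_INET;
		sin.sin_port = htons(p);
		if (inet_pton(AF_INET, addr, &sin.sin_addr) != 1)
			return -1;          /* not a valid IPv4 address */

		fd = socket(AF_INET, SOCK_DGRAM, 0);
		if (fd < 0)
			return -1;
		if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) == 0) {
			*bound = p;
			return fd;
		}
		err = errno;                /* save it before close() */
		close(fd);
		if (err != EADDRINUSE)
			return -1;          /* a different failure: give up */
	}
	return -1;                          /* no free port in the range */
}
```

Called twice with the same address and starting port, the second call finds the first port busy (`bind()` fails with `EADDRINUSE`) and ends up on the next free one, so the two sockets get different ports.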

Re: [tipc-discussion] [PATCH net v2 1/1] tipc: fix broadcast link synchronization problem

2016-10-24 Thread Jon Maloy
Hi John,

Any news about this? I would like to post this patch if you can confirm 
that it is working.

///jon


On 10/12/2016 06:55 PM, John Thompson wrote:
> Hi,
>
> My initial testing has looked good with this patch. I am going to be
> performing more in-depth testing over the weekend and will be in touch
> next week about how it has held up.
>
> Thanks,
> JT
>
>
> On Sat, Oct 1, 2016 at 10:23 PM, Jon Maloy <jon.ma...@ericsson.com> wrote:
>
> In commit 2d18ac4ba745 ("tipc: extend broadcast link initialization
> criteria") we tried to fix a problem with the initial synchronization
> of broadcast link acknowledge values. Unfortunately that solution is
> not sufficient to solve the issue.
>
> We have seen it happen that LINK_PROTOCOL/STATE packets with a valid
> non-zero unicast acknowledge number may bypass BCAST_PROTOCOL
> initialization, NAME_DISTRIBUTOR and other STATE packets with invalid
> broadcast acknowledge numbers, leading to premature opening of the
> broadcast link. When the bypassed packets finally arrive, they are
> inadvertently accepted, and the already correctly initialized
> acknowledge number in the broadcast receive link is overwritten by
> the invalid (zero) value of those packets. After this the broadcast
> link goes stale.
>
> We now fix this by marking the packets where we know the acknowledge
> value is or may be invalid, and then ignoring their acks.
>
> For this purpose, we claim an unused bit in the header to indicate that
> the value is invalid. We set the bit to 1 in the initial BCAST_PROTOCOL
> synchronization packet and all initial ("bulk") NAME_DISTRIBUTOR
> packets, plus those LINK_PROTOCOL packets sent out before the broadcast
> links are fully synchronized.
>
> This minor protocol update is fully backwards compatible.
>
> Reported-by: John Thompson <thompa@gmail.com>
> Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
> ---
>  net/tipc/bcast.c  | 14 ++
>  net/tipc/bcast.h  |  3 ++-
>  net/tipc/link.c   |  2 ++
>  net/tipc/msg.h| 17 +
>  net/tipc/name_distr.c |  1 +
>  net/tipc/node.c   |  2 +-
>  6 files changed, 33 insertions(+), 6 deletions(-)
>
> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
> index 753f774..aa1babb 100644
> --- a/net/tipc/bcast.c
> +++ b/net/tipc/bcast.c
> @@ -247,11 +247,17 @@ int tipc_bcast_rcv(struct net *net, struct tipc_link *l, struct sk_buff *skb)
>   *
>   * RCU is locked, no other locks set
>   */
> -void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, u32 acked)
> +void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l,
> +   struct tipc_msg *hdr)
>  {
> struct sk_buff_head *inputq = _bc_base(net)->inputq;
> +   u16 acked = msg_bcast_ack(hdr);
> struct sk_buff_head xmitq;
>
> +   /* Ignore bc acks sent by peer before bcast synch point was received */
> +   if (msg_bc_ack_invalid(hdr))
> +   return;
> +
> __skb_queue_head_init();
>
> tipc_bcast_lock(net);
> @@ -279,11 +285,11 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
> __skb_queue_head_init();
>
> tipc_bcast_lock(net);
> -   if (msg_type(hdr) == STATE_MSG) {
> +   if (msg_type(hdr) != STATE_MSG) {
> +   tipc_link_bc_init_rcv(l, hdr);
> +   } else if (!msg_bc_ack_invalid(hdr)) {
> tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), );
> rc = tipc_link_bc_sync_rcv(l, hdr, );
> -   } else {
> -   tipc_link_bc_init_rcv(l, hdr);
> }
> tipc_bcast_unlock(net);
>
> diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
> index 5ffe344..855d53c 100644
> --- a/net/tipc/bcast.h
> +++ b/net/tipc/bcast.h
> @@ -55,7 +55,8 @@ void tipc_bcast_dec_bearer_dst_cnt(struct net *net, int bearer_id);
>  int  tipc_bcast_get_mtu(struct net *net);
>  int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list);
>  int tipc_bcast_rcv(struct net *net, struct tipc_link *l, struct sk_buff *skb);
> -void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, u32 acked);
> +void tipc_bcast_ack_rcv(stru

Re: [tipc-discussion] soft lockup for TIPC

2016-11-25 Thread Jon Maloy
Ying,
I looked into the patches you posted to remove the bearer lock around February 
2014, but could not find any obvious candidate addressing this problem. I 
suspect you just "eliminated" the potential issue through your series of 
patches. But you probably remember better than I do what was done, and which of 
the patches are needed to resolve the issue.

BR
///jon


> -Original Message-
> From: XIANG Haiming [mailto:haiming.xi...@alcatel-sbell.com.cn]
> Sent: Wednesday, 23 November, 2016 20:41
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net
> Subject: RE: soft lockup for TIPC
> 
> Hi Jon,
> 
> I am OK with installing our own patches.
> 
> Thank you for your help.
> 
> 
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: 2016年11月24日 1:03
> To: XIANG Haiming; tipc-discussion@lists.sourceforge.net
> Subject: RE: soft lockup for TIPC
> 
> Hi,
> I will take a look into this over the next few days. Are you OK with installing
> your own patches?
> 
> ///jon
> 
> 
> > -Original Message-
> > From: XIANG Haiming [mailto:haiming.xi...@alcatel-sbell.com.cn]
> > Sent: Sunday, 20 November, 2016 21:35
> > To: tipc-discussion@lists.sourceforge.net
> > Subject: [tipc-discussion] soft lockup for TIPC
> >
> > Hi all,
> >
> > The TIPC version we use is 2.0.0.
> > The OS info is as follows:
> >
> > Red Hat Enterprise Linux Server 7.2 (Maipo)
> > Kernel 3.10.0-327.18.2.el7.x86_64 on an x86_64
> >
> > We have encountered two soft lockup issues with TIPC; please help us
> > solve them.
> > Thank you
> >
> > One issue is as follows:
> >
> >
> > [85502.601198] BUG: soft lockup - CPU#0 stuck for 22s! [scm:2649]
> > [85502.603585] Modules linked in: iptable_filter ip6table_mangle xt_limit
> > iptable_mangle ip6table_filter ip6_tables igb_uio(OE) uio tipc(OE) 8021q 
> > garp
> stp
> > mrp llc bonding dm_mirror dm_region_hash dm_log dm_mod ppdev
> > crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper
> > ablk_helper cryptd pcspkr i6300esb virtio_balloon cirrus syscopyarea 
> > sysfillrect
> > sysimgblt ttm drm_kms_helper drm i2c_piix4 i2c_core parport_pc parport
> > binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4
> mbcache
> > jbd2 virtio_blk virtio_console crct10dif_pclmul crct10dif_common 
> > crc32c_intel
> > serio_raw ixgbevf virtio_pci virtio_ring virtio ata_generic pata_acpi 
> > ata_piix
> libata
> > floppy
> > [85502.618210] CPU: 0 PID: 2649 Comm: scm Tainted: G   OEL 
> > 
> > 3.10.0-327.18.2.el7.x86_64 #1
> > [85502.620482] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> > [85502.622361] task: 880405da5080 ti: 8800db68c000 task.ti:
> > 8800db68c000
> > [85502.624400] RIP: 0010:[]  []
> > _raw_spin_lock_bh+0x3d/0x50
> > [85502.626558] RSP: 0018:8800db68fa68  EFLAGS: 0202
> > [85502.628412] RAX: 56f6 RBX: 880406600800 RCX:
> > 712a
> > [85502.630419] RDX: 712c RSI: 712c RDI:
> > a03f8548
> > [85502.632423] RBP: 8800db68fa70 R08: 02c0 R09:
> > 0500
> > [85502.634421] R10: 88040f001500 R11: 8800db68fc58 R12:
> 81519be1
> > [85502.636413] R13: 8800db68fa08 R14:  R15:
> > 0240
> > [85502.638415] FS:  () GS:88041fc0(0063)
> > knlGS:d1495b40
> > [85502.640514] CS:  0010 DS: 002b ES: 002b CR0: 80050033
> > [85502.642399] CR2: d37c8208 CR3: 0003cfff7000 CR4:
> > 000406f0
> > [85502.644420] DR0:  DR1:  DR2:
> > 
> > [85502.646416] DR3:  DR6: 0ff0 DR7:
> > 0400
> > [85502.648393] Stack:
> > [85502.649916]  a03f8548 8800db68fa90 a03e2afb
> > 880036242c00
> > [85502.652019]  880406600800 8800db68fac0 a03e5c1b
> > 880036242c00
> > [85502.654108]  880036b5a484 88003698e800 8800db68fb30
> > 8800db68fba8
> > [85502.656179] Call Trace:
> > [85502.657742]  [] tipc_bearer_blocked+0x1b/0x30 [tipc]
> > [85502.659678]  [] link_send_buf_fast+0x5b/0xb0 [tipc]
> > [85502.661592]  [] tipc_link_send_sections_fast+0xe6/0x630
> > [tipc]
> > [85502.663594]  [] ? _raw_read_unlock_bh+0x16/0x20
> > [85502.665473]  [] ? tipc_nametbl_translate+0xc0/0x1f0
> [tipc]
> > 

[tipc-discussion] [PATCH net 1/1] tipc: fix compatibility bug in link monitoring

2016-11-23 Thread Jon Maloy
Commit 817298102b0b ("tipc: fix link priority propagation") introduced a
compatibility problem between TIPC versions newer than Linux 4.6 and
those older than Linux 4.4. In versions later than 4.4, link STATE
messages only contain a non-zero link priority value when the sender
wants the receiver to change its priority. This has the effect that the
receiver resets itself in order to apply the new priority. This works
well, and is consistent with that commit.

However, in versions older than 4.4 a valid link priority is present in
all sent link STATE messages, leading to cyclic link establishment and
reset on the 4.6+ node.

We fix this by adding a test that the received value should not only
be valid, but also differ from the current value in order to cause the
receiving link endpoint to reset.

Reported-by: Amar Nv <amar.nv...@gmail.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/link.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 1055164..ecc12411 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1492,8 +1492,9 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
if (in_range(peers_tol, TIPC_MIN_LINK_TOL, TIPC_MAX_LINK_TOL))
l->tolerance = peers_tol;
 
-   if (peers_prio && in_range(peers_prio, TIPC_MIN_LINK_PRI,
-  TIPC_MAX_LINK_PRI)) {
+   /* Update own prio if peer indicates a different value */
+   if ((peers_prio != l->priority) &&
+   in_range(peers_prio, 1, TIPC_MAX_LINK_PRI)) {
l->priority = peers_prio;
rc = tipc_link_fsm_evt(l, LINK_FAILURE_EVT);
}
-- 
2.7.4



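The effect of the changed condition can be illustrated by feeding a stream of STATE messages from a legacy peer, which carries a valid priority in every message, into the old and the new test. This is a simplified sketch: the priority limits and `in_range_simple()` are assumptions rather than the kernel's definitions, and a reset is merely counted, whereas the real link goes down and is re-established.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed limits for illustration; the kernel headers are authoritative. */
#define MIN_LINK_PRI 0
#define MAX_LINK_PRI 31

/* Plain inclusive range test; the kernel's in_range() may differ. */
static bool in_range_simple(unsigned v, unsigned lo, unsigned hi)
{
	return v >= lo && v <= hi;
}

/* Old test: any valid, non-zero peer priority triggers a reset. */
static bool old_reset(unsigned peers_prio, unsigned own_prio)
{
	(void)own_prio;
	return peers_prio &&
	       in_range_simple(peers_prio, MIN_LINK_PRI, MAX_LINK_PRI);
}

/* Patched test: only a valid priority that differs from our own does. */
static bool new_reset(unsigned peers_prio, unsigned own_prio)
{
	return peers_prio != own_prio &&
	       in_range_simple(peers_prio, 1, MAX_LINK_PRI);
}

/* Feed 'n' STATE messages that all carry 'peers_prio' and count how
 * often the receiving link endpoint would reset. */
static int count_resets(bool patched, unsigned peers_prio,
			unsigned own_prio, int n)
{
	int resets = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		bool reset = patched ? new_reset(peers_prio, own_prio)
				     : old_reset(peers_prio, own_prio);
		if (reset) {
			own_prio = peers_prio;  /* adopt the peer's value */
			resets++;
		}
	}
	return resets;
}
```

`count_resets(false, 10, 10, 5)` returns 5 (every message resets the link), while `count_resets(true, 10, 10, 5)` returns 0 and `count_resets(true, 12, 10, 5)` returns 1: with the patched test only a genuinely different priority triggers a reset.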

Re: [tipc-discussion] soft lockup for TIPC

2016-11-23 Thread Jon Maloy
Hi,
I will take a look into this over the next few days. Are you OK with installing 
your own patches?

///jon


> -Original Message-
> From: XIANG Haiming [mailto:haiming.xi...@alcatel-sbell.com.cn]
> Sent: Sunday, 20 November, 2016 21:35
> To: tipc-discussion@lists.sourceforge.net
> Subject: [tipc-discussion] soft lockup for TIPC
> 
> Hi all,
> 
> The TIPC version we use is 2.0.0.
> The OS info is as follows:
> 
> Red Hat Enterprise Linux Server 7.2 (Maipo)
> Kernel 3.10.0-327.18.2.el7.x86_64 on an x86_64
> 
> We have encountered two soft lockup issues with TIPC; please help us solve them.
> Thank you
> 
> One issue is as follows:
> 
> 
> [85502.601198] BUG: soft lockup - CPU#0 stuck for 22s! [scm:2649]
> [85502.603585] Modules linked in: iptable_filter ip6table_mangle xt_limit
> iptable_mangle ip6table_filter ip6_tables igb_uio(OE) uio tipc(OE) 8021q garp 
> stp
> mrp llc bonding dm_mirror dm_region_hash dm_log dm_mod ppdev
> crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper
> ablk_helper cryptd pcspkr i6300esb virtio_balloon cirrus syscopyarea 
> sysfillrect
> sysimgblt ttm drm_kms_helper drm i2c_piix4 i2c_core parport_pc parport
> binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache
> jbd2 virtio_blk virtio_console crct10dif_pclmul crct10dif_common crc32c_intel
> serio_raw ixgbevf virtio_pci virtio_ring virtio ata_generic pata_acpi 
> ata_piix libata
> floppy
> [85502.618210] CPU: 0 PID: 2649 Comm: scm Tainted: G   OEL 
> 
> 3.10.0-327.18.2.el7.x86_64 #1
> [85502.620482] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [85502.622361] task: 880405da5080 ti: 8800db68c000 task.ti:
> 8800db68c000
> [85502.624400] RIP: 0010:[]  []
> _raw_spin_lock_bh+0x3d/0x50
> [85502.626558] RSP: 0018:8800db68fa68  EFLAGS: 0202
> [85502.628412] RAX: 56f6 RBX: 880406600800 RCX:
> 712a
> [85502.630419] RDX: 712c RSI: 712c RDI:
> a03f8548
> [85502.632423] RBP: 8800db68fa70 R08: 02c0 R09:
> 0500
> [85502.634421] R10: 88040f001500 R11: 8800db68fc58 R12: 
> 81519be1
> [85502.636413] R13: 8800db68fa08 R14:  R15:
> 0240
> [85502.638415] FS:  () GS:88041fc0(0063)
> knlGS:d1495b40
> [85502.640514] CS:  0010 DS: 002b ES: 002b CR0: 80050033
> [85502.642399] CR2: d37c8208 CR3: 0003cfff7000 CR4:
> 000406f0
> [85502.644420] DR0:  DR1:  DR2:
> 
> [85502.646416] DR3:  DR6: 0ff0 DR7:
> 0400
> [85502.648393] Stack:
> [85502.649916]  a03f8548 8800db68fa90 a03e2afb
> 880036242c00
> [85502.652019]  880406600800 8800db68fac0 a03e5c1b
> 880036242c00
> [85502.654108]  880036b5a484 88003698e800 8800db68fb30
> 8800db68fba8
> [85502.656179] Call Trace:
> [85502.657742]  [] tipc_bearer_blocked+0x1b/0x30 [tipc]
> [85502.659678]  [] link_send_buf_fast+0x5b/0xb0 [tipc]
> [85502.661592]  [] tipc_link_send_sections_fast+0xe6/0x630
> [tipc]
> [85502.663594]  [] ? _raw_read_unlock_bh+0x16/0x20
> [85502.665473]  [] ? tipc_nametbl_translate+0xc0/0x1f0 
> [tipc]
> [85502.667441]  [] tipc_send2name+0x155/0x1d0 [tipc]
> [85502.669351]  [] send_msg+0x1eb/0x530 [tipc]
> [85502.671205]  [] sock_sendmsg+0xb0/0xf0
> [85502.673028]  [] ? futex_wait+0x193/0x280
> [85502.674859]  [] SYSC_sendto+0x121/0x1c0
> [85502.676676]  [] ? __do_page_fault+0x16d/0x450
> [85502.678540]  [] ? poll_select_copy_remaining+0x62/0x130
> [85502.680489]  [] SyS_sendto+0xe/0x10
> [85502.682277]  [] compat_sys_socketcall+0x173/0x2a0
> [85502.684193]  [] sysenter_dispatch+0x7/0x21
> [85502.686046] Code: b8 00 00 02 00 f0 0f c1 03 89 c2 c1 ea 10 66 39 c2 75 03 
> 5b 5d
> c3 83 e2 fe 0f b7 f2 b8 00 80 00 00 0f b7 0b 66 39 ca 74 ea f3 90 <83> e8 01 
> 75 f1 48
> 89 df 66 66 66 90 66 66 90 eb e0 66 90 66 66
> [85503.372189] BUG: soft lockup - CPU#2 stuck for 23s! [nodemgr:2560]
> [85503.374181] Modules linked in: iptable_filter ip6table_mangle xt_limit
> iptable_mangle ip6table_filter ip6_tables igb_uio(OE) uio tipc(OE) 8021q garp 
> stp
> mrp llc bonding dm_mirror dm_region_hash dm_log dm_mod ppdev
> crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper
> ablk_helper cryptd pcspkr i6300esb virtio_balloon cirrus syscopyarea 
> sysfillrect
> sysimgblt ttm drm_kms_helper drm i2c_piix4 i2c_core parport_pc parport
> binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache
> jbd2 virtio_blk virtio_console crct10dif_pclmul crct10dif_common crc32c_intel
> serio_raw ixgbevf virtio_pci virtio_ring virtio ata_generic pata_acpi 
> ata_piix libata
> floppy
> [85503.389354] CPU: 2 PID: 2560 Comm: nodemgr Tainted: G   OEL 
> 
> 3.10.0-327.18.2.el7.x86_64 #1
> [85503.391715] Hardware name: Red Hat 

[tipc-discussion] [PATCH net 1/1] tipc: resolve connection flow control compatibility problem

2016-11-24 Thread Jon Maloy
In commit 10724cc7bb78 ("tipc: redesign connection-level flow control")
we replaced the previous message based flow control with one based on
1k blocks. In order to ensure backwards compatibility the mechanism
falls back to using messages as the base unit when it senses that the
peer doesn't support the new algorithm. The default flow control window,
i.e., how many units can be sent before the sender blocks and waits
for an acknowledge (aka advertisement), is 512. This was tested against
the previous version, which uses an acknowledge frequency of one ack per
256 received messages, and found to work fine.

However, we missed the fact that versions older than Linux 3.15 use an
acknowledge frequency of 512, which is exactly the limit where a 4.6+
sender will stop and wait for an acknowledge. This would also work fine
if it weren't for the fact that if the first message sent by a 4.6+
server is an empty SYNACK, it is counted as a sent message, while it is
not counted as a received message on a legacy (pre-3.15) receiver.
This leads to the sender always being one step ahead of the receiver, a
scenario causing the sender to block after 512 sent messages, while the
receiver has only registered 511 read messages. Hence, the legacy
receiver is never triggered to send an acknowledge, and the sender
remains permanently blocked.

We solve this deadlock by simply allowing the sender to send one more
message before it blocks, i.e., by making a minimal change to the
condition used for determining connection congestion.

Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index db32777..41f0138 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -186,7 +186,7 @@ static struct tipc_sock *tipc_sk(const struct sock *sk)
 
 static bool tsk_conn_cong(struct tipc_sock *tsk)
 {
-   return tsk->snt_unacked >= tsk->snd_win;
+   return tsk->snt_unacked > tsk->snd_win;
 }
 
 /* tsk_blocks(): translate a buffer size in bytes to number of
-- 
2.7.4


--
___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion


Re: [tipc-discussion] TIPC link statistic

2016-11-28 Thread Jon Maloy


> -Original Message-
> From: Holger Brunck [mailto:holger.bru...@keymile.com]
> Sent: Monday, 28 November, 2016 07:41
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] TIPC link statistic
> 
> Hi Jon,
> 
> On 24/11/16 17:08, Jon Maloy wrote:
> >> On my embedded PPC board (kernel 4.4, Server):
> >>> tipc link stat show
> >> Link statistics:
> >>

...

> >
> > I'll try to find time for it during the coming week (unless somebody is
> > volunteering, of course).
> >
> 
> I saw your patch "tipc: fix link statistics counter errors". I assume it
> should tackle this issue? I gave it a try with kernel 4.9.0-rc7 on my kmeter1
> board, which is a 32 bit powerpc board. Unfortunately the counters are still
> wrong in the link statistics. Received packets don't appear at all, and
> packets transmitted to a remote node are accounted on the broadcast link.

I believe you are talking only about the broadcast link here? The figures for 
broadcast reception are missing by design, i.e., they have always been 
missing. We would need to scan across all broadcast reception links (by 
contrast, there is only one broadcast transmission link, which makes that side 
easy) and accumulate all values, as well as present the figures for the 
individual links. It is not a particularly big or difficult task, but it is 
certainly more than the small bug corrections I just delivered. I cannot 
prioritize this myself right now.

BR
///jon

> 
> Best regards
> Holger



Re: [tipc-discussion] TIPC link statistic

2016-11-28 Thread Jon Maloy


> -Original Message-
> From: Holger Brunck [mailto:holger.bru...@keymile.com]
> Sent: Monday, 28 November, 2016 09:22
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] TIPC link statistic
> 
> Hi Jon,
> 
> On 28/11/16 14:53, Jon Maloy wrote:
> >> I saw your patch "tipc: fix link statistics counter errors". I assume it
> >> should tackle this issue? I gave it a try with kernel 4.9.0-rc7 on my
> >> kmeter1 board, which is a 32 bit powerpc board. Unfortunately the
> >> counters are still wrong in the link statistics. Received packets don't
> >> appear at all, and packets transmitted to a remote node are accounted on
> >> the broadcast link.
> > I believe you are talking only about the broadcast link here? The figures
> > for broadcast reception are missing by design, i.e., they have always been
> > missing. We would need to scan across all broadcast reception links (by
> > contrast, there is only one broadcast transmission link, which makes that
> > side easy) and accumulate all values, as well as present the figures for
> > the individual links. It is not a particularly big or difficult task, but it
> > is certainly more than the small bug corrections I just delivered. I cannot
> > prioritize this myself right now.
> 
> No, I am not talking about the broadcast link in particular; it was only
> another thing I noticed.
> 
> I have a TIPC link between two ethernet ports and I send connectionless
> packets from a client to a server running on the other side of the link. What
> I still see is that the RX and TX counters are not increasing in the link
> statistics. After sending 300 packets with a size of 10kB I see:
> 
> Link <1.1.9:eth2-1.1.211:eth1>
>   ACTIVE  MTU:1500  Priority:10  Tolerance:1500 ms  Window:50 packets
>   RX packets:6 fragments:0/0 bundles:0/0
>   TX packets:4 fragments:0/0 bundles:0/0
>   TX profile sample:2 packets  average:60 octets
>   0-64:100% -256:0% -1024:0% -4096:0% -16384:0% -32768:0% -66000:0%
>   RX states:17978 probes:368 naks:0 defs:2 dups:2
>   TX states:17772 probes:17386 naks:2 acks:16 dups:0
>   Congestion link:0  Send queue max:0 avg:0
> 
> I just wanted to know whether this is a known bug or if there is something
> wrong in my setup.
> 
> Best regards
> Holger

The explanation is simple: the patch has not been applied to net-next yet, only 
to net. It normally takes a few days before David merges the fixes from net back 
into net-next. Since you have checked out net-next anyway, you could try to 
apply the patch yourself.

///jon




Re: [tipc-discussion] TIPC link statistic

2016-11-28 Thread Jon Maloy


> -Original Message-
> From: Holger Brunck [mailto:holger.bru...@keymile.com]
> Sent: Monday, 28 November, 2016 09:49
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] TIPC link statistic
> 
> On 28/11/16 15:40, Jon Maloy wrote:
> >
> >
> >> -Original Message-
> >> From: Holger Brunck [mailto:holger.bru...@keymile.com]
> >> Sent: Monday, 28 November, 2016 09:22
> >> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discuss...@lists.sourceforge.net
> >> Subject: Re: [tipc-discussion] TIPC link statistic
> >>
> >> Hi Jon,
> >>
> >> On 28/11/16 14:53, Jon Maloy wrote:
> >>>> I saw your patch "tipc: fix link statistics counter errors". I assume it
> >>>> should tackle this issue? I gave it a try with kernel 4.9.0-rc7 on my
> >>>> kmeter1 board, which is a 32 bit powerpc board. Unfortunately the
> >>>> counters are still wrong in the link statistics. Received packets don't
> >>>> appear at all, and packets transmitted to a remote node are accounted on
> >>>> the broadcast link.
> >>> I believe you are talking only about the broadcast link here? The figures
> >>> for broadcast reception are missing by design, i.e., they have always been
> >>> missing. We would need to scan across all broadcast reception links (by
> >>> contrast, there is only one broadcast transmission link, which makes that
> >>> side easy) and accumulate all values, as well as present the figures for
> >>> the individual links. It is not a particularly big or difficult task, but
> >>> it is certainly more than the small bug corrections I just delivered. I
> >>> cannot prioritize this myself right now.
> >>
> >> No, I am not talking about the broadcast link in particular; it was only
> >> another thing I noticed.
> >>
> >> I have a TIPC link between two ethernet ports and I send connectionless
> >> packets from a client to a server running on the other side of the link.
> >> What I still see is that the RX and TX counters are not increasing in the
> >> link statistics. After sending 300 packets with a size of 10kB I see:
> >>
> >> Link <1.1.9:eth2-1.1.211:eth1>
> >>   ACTIVE  MTU:1500  Priority:10  Tolerance:1500 ms  Window:50 packets
> >>   RX packets:6 fragments:0/0 bundles:0/0
> >>   TX packets:4 fragments:0/0 bundles:0/0
> >>   TX profile sample:2 packets  average:60 octets
> >>   0-64:100% -256:0% -1024:0% -4096:0% -16384:0% -32768:0% -66000:0%
> >>   RX states:17978 probes:368 naks:0 defs:2 dups:2
> >>   TX states:17772 probes:17386 naks:2 acks:16 dups:0
> >>   Congestion link:0  Send queue max:0 avg:0
> >>
> >> I just wanted to know whether this is a known bug or if there is something
> >> wrong in my setup.
> >>
> >> Best regards
> >> Holger
> >
> > The explanation is simple: the patch has not been applied to net-next yet,
> > only to net. It normally takes a few days before David merges the fixes from
> > net back into net-next. Since you have checked out net-next anyway, you could
> > try to apply the patch yourself.
> >
> 
> OK, maybe my first e-mail was not clear enough. I applied your patch on top of
> 4.9.0-rc7 and it does not make a difference; that's what I am trying to say. It
> is still broken on my side.
> 
> Best regards
> Holger

Then I have no more theories. The patch works fine in my x64 environment, and I 
see no reason why it shouldn't work on PowerPC as well, since there are no 
endianness operations involved. Is the output *exactly* the same before and 
after applying the patch?

///jon




Re: [tipc-discussion] TIPC link statistic

2016-11-28 Thread Jon Maloy


On 11/28/2016 10:32 AM, Holger Brunck wrote:
> On 28/11/16 15:55, Jon Maloy wrote:
>>>> The explanation is simple: the patch has not been applied to net-next yet,
>>>> only to net. It normally takes a few days before David merges the fixes
>>>> from net back into net-next. Since you have checked out net-next anyway,
>>>> you could try to apply the patch yourself.
>>> OK, maybe my first e-mail was not clear enough. I applied your patch on
>>> top of 4.9.0-rc7 and it does not make a difference; that's what I am trying
>>> to say. It is still broken on my side.
>>>
>>> Best regards
>>> Holger
>> Then I have no more theories. The patch works fine in my x64 environment,
>> and I see no reason why it shouldn't work on PowerPC as well, since there are
>> no endianness operations involved. Is the output *exactly* the same before
>> and after applying the patch?
>>
> Hm, weird. I currently have no setup where I can double-check this in an x86
> environment.
>
> The output differs depending on whether the patch is applied or not.
>
> Sending 150 packets with a size of 1000 bytes to the server leads to the
> following link statistics on the client side:
>
> With your patch:
>
> Link 
>Window:50 packets
>RX packets:0 fragments:0/0 bundles:0/0
>TX packets:1050 fragments:1050/150 bundles:0/0
>RX naks:0 defs:0 dups:0
>TX naks:0 acks:0 dups:0
>Congestion link:46  Send queue max:0 avg:0
>
> Link <1.1.4:eth2-1.1.211:eth1>
>ACTIVE  MTU:1500  Priority:10  Tolerance:1500 ms  Window:50 packets
>RX packets:5 fragments:0/0 bundles:0/0
>TX packets:4 fragments:0/0 bundles:0/0
>TX profile sample:2 packets  average:60 octets
>0-64:100% -256:0% -1024:0% -4096:0% -16384:0% -32768:0% -66000:0%
>RX states:544 probes:10 naks:0 defs:2 dups:2
>TX states:548 probes:470 naks:2 acks:66 dups:0
>Congestion link:0  Send queue max:0 avg:0
>
>
> Without your patch:
>
> Link 
>Window:50 packets
>RX packets:0 fragments:0/0 bundles:0/0
>TX packets:0 fragments:0/0 bundles:0/0
>RX naks:0 defs:0 dups:0
>TX naks:0 acks:0 dups:0
>Congestion link:49  Send queue max:0 avg:0
>
> Link <1.1.4:eth2-1.1.211:eth1>
>ACTIVE  MTU:1500  Priority:10  Tolerance:1500 ms  Window:50 packets
>RX packets:0 fragments:0/0 bundles:0/0
>TX packets:4 fragments:0/0 bundles:0/0
>TX profile sample:2 packets  average:60 octets
>0-64:100% -256:0% -1024:0% -4096:0% -16384:0% -32768:0% -66000:0%
>RX states:397 probes:7 naks:0 defs:2 dups:2
>TX states:400 probes:325 naks:2 acks:66 dups:0
>Congestion link:0  Send queue max:0 avg:0
>
> So in both cases no TX packets are accounted on the specific link, but with
> your patch something is counted on the broadcast link.
>
> Best regards
> Holger
>

Yes, this is really strange. It seems like the TX packets have been 
counted as fragments on the broadcast link. My best theory is that there 
must be some mismatch between the tool and the kernel. Do both "tipc" 
and "tipc-config" give this result? Have you tried to rebuild the tool(s)?

///jon




Re: [tipc-discussion] TIPC link statistic

2016-11-24 Thread Jon Maloy
> -Original Message-
> From: Holger Brunck [mailto:holger.bru...@keymile.com]
> Sent: Thursday, 24 November, 2016 09:48
> To: tipc-discussion@lists.sourceforge.net
> Subject: [tipc-discussion] TIPC link statistic
> 
> Hi all,
> I have a basic question about the link statistics printed by TIPC when I use
> "tipc-config -ls" or "tipc link stat show".
> 
> I have a link between a PPC board running kernel 4.4 and my local x86 PC via
> ethernet bearers. I have a simple server running on my PPC board listening on
> a TIPC socket which has published a port with TIPC_CLUSTER_SCOPE.
> 
> On my local PC I start a TIPC client sending 100 connectionless messages to
> this TIPC port. From printouts in my server I can see that the messages are
> arriving where they should.
> 
> What confuses me is the link statistics on both sides. My local PC (running
> kernel 3.10, client):
> 
> > tipc-config -ls
> Link 
>   Window:20 packets
>   RX packets:0 fragments:0/0 bundles:0/0
>   TX packets:100 fragments:0/0 bundles:0/0
>   RX naks:0 defs:0 dups:0
>   TX naks:0 acks:0 dups:0
>   Congestion link:6  Send queue max:20 avg:12
> 
> Link <1.1.3:p4p1-1.1.11:eth1>
>   ACTIVE  MTU:1500  Priority:10  Tolerance:1500 ms  Window:50 packets
>   RX packets:1 fragments:0/0 bundles:0/0
>   TX packets:0 fragments:0/0 bundles:0/0
>   TX profile sample:0 packets  average:0 octets
>   0-64:0% -256:0% -1024:0% -4096:0% -16384:0% -32768:0% -66000:0%
>   RX states:370 probes:182 naks:0 defs:0 dups:0
>   TX states:364 probes:181 naks:0 acks:0 dups:0
>   Congestion link:0  Send queue max:0 avg:0
> 
> The TX messages are accounted on the broadcast link. I would expect that they
> appear on the Link <1.1.3:p4p1-1.1.11:eth1>.
> 
> On my embedded PPC board (kernel 4.4, Server):
> > tipc link stat show
> Link statistics:
> 
> Link 
>   Window:50 packets
>   RX packets:1 fragments:0/0 bundles:0/0
>   TX packets:1 fragments:0/0 bundles:0/0
>   RX naks:0 defs:0 dups:0
>   TX naks:0 acks:0 dups:0
>   Congestion link:0  Send queue max:0 avg:0
> 
> Link <1.1.11:eth1-1.1.3:p4p1>
>   STANDBY  MTU:1500  Priority:10  Tolerance:1500 ms  Window:50 packets
>   RX packets:0 fragments:0/0 bundles:0/0
>   TX packets:2 fragments:0/0 bundles:0/0
>   TX profile sample:2 packets  average:60 octets
>   0-64:100% -256:0% -1024:0% -4096:0% -16384:0% -32768:0% -66000:0%
>   RX states:378 probes:188 naks:0 defs:0 dups:0
>   TX states:384 probes:189 naks:0 acks:7 dups:0
>   Congestion link:0  Send queue max:0 avg:0
> 
> On the server side the packets don't appear at all in the link statistics,
> but they were definitely received by the server.
> 
> Does anyone have an idea what's wrong here?

Yes, I noticed this a while ago, but I haven't had time to look into it. It is 
clearly a bug, probably introduced by me during my last update of the node/link 
layer, and I am sure it is easy to fix. As a matter of fact, I have realized 
this is becoming urgent even for my own company, as we are planning to use these 
statistics to measure media quality.

I'll try to find time for it during the coming week (unless somebody is 
volunteering, of course).

BR
///jon
> 
> Best regards
> Holger



[tipc-discussion] [PATCH net 1/1] tipc: eliminate obsolete socket locking policy description

2016-11-19 Thread Jon Maloy
The comment block in socket.c describing the locking policy is
obsolete, and does not reflect current reality. We remove it in this
commit.

Since the current locking policy is much simpler and follows a
mainstream approach, we see no need to add a new description.

Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/socket.c | 48 +---
 1 file changed, 1 insertion(+), 47 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 22d92f0..4916d8f 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1,7 +1,7 @@
 /*
  * net/tipc/socket.c: TIPC socket API
  *
- * Copyright (c) 2001-2007, 2012-2015, Ericsson AB
+ * Copyright (c) 2001-2007, 2012-2016, Ericsson AB
  * Copyright (c) 2004-2008, 2010-2013, Wind River Systems
  * All rights reserved.
  *
@@ -127,54 +127,8 @@ static const struct proto_ops packet_ops;
 static const struct proto_ops stream_ops;
 static const struct proto_ops msg_ops;
 static struct proto tipc_proto;
-
 static const struct rhashtable_params tsk_rht_params;
 
-/*
- * Revised TIPC socket locking policy:
- *
- * Most socket operations take the standard socket lock when they start
- * and hold it until they finish (or until they need to sleep).  Acquiring
- * this lock grants the owner exclusive access to the fields of the socket
- * data structures, with the exception of the backlog queue.  A few socket
- * operations can be done without taking the socket lock because they only
- * read socket information that never changes during the life of the socket.
- *
- * Socket operations may acquire the lock for the associated TIPC port if they
- * need to perform an operation on the port.  If any routine needs to acquire
- * both the socket lock and the port lock it must take the socket lock first
- * to avoid the risk of deadlock.
- *
- * The dispatcher handling incoming messages cannot grab the socket lock in
- * the standard fashion, since invoked it runs at the BH level and cannot block.
- * Instead, it checks to see if the socket lock is currently owned by someone,
- * and either handles the message itself or adds it to the socket's backlog
- * queue; in the latter case the queued message is processed once the process
- * owning the socket lock releases it.
- *
- * NOTE: Releasing the socket lock while an operation is sleeping overcomes
- * the problem of a blocked socket operation preventing any other operations
- * from occurring.  However, applications must be careful if they have
- * multiple threads trying to send (or receive) on the same socket, as these
- * operations might interfere with each other.  For example, doing a connect
- * and a receive at the same time might allow the receive to consume the
- * ACK message meant for the connect.  While additional work could be done
- * to try and overcome this, it doesn't seem to be worthwhile at the present.
- *
- * NOTE: Releasing the socket lock while an operation is sleeping also ensures
- * that another operation that must be performed in a non-blocking manner is
- * not delayed for very long because the lock has already been taken.
- *
- * NOTE: This code assumes that certain fields of a port/socket pair are
- * constant over its lifetime; such fields can be examined without taking
- * the socket lock and/or port lock, and do not need to be re-read even
- * after resuming processing after waiting.  These fields include:
- *   - socket type
- *   - pointer to socket sk structure (aka tipc_sock structure)
- *   - pointer to port structure
- *   - port reference
- */
-
 static u32 tsk_own_node(struct tipc_sock *tsk)
 {
return msg_prevnode(&tsk->phdr);
-- 
2.7.4




[tipc-discussion] [PATCH net-next 3/3] tipc: reduce risk of user starvation during link congestion

2016-11-21 Thread Jon Maloy
The socket code currently handles link congestion by either blocking
and trying to send again when the congestion has abated, or just
returning to the user with -EAGAIN and letting him retry later.

This mechanism is prone to starvation, because the wakeup algorithm is
non-atomic. Between the moment the link issues a wakeup signal and the
moment the socket wakes up and re-attempts sending, other senders may
have occupied the freed buffer space in the link. This in turn may
force a socket to make many send attempts before it is successful. In
extremely loaded systems we have observed latency times of several
seconds before a low-priority socket is able to send out a message.

In this commit, we simplify this mechanism and reduce the risk of the
described scenario happening. When a message is sent to a congested
link, the link now keeps the message in the wakeup item that it has to
create anyway, and adds it to its send queue as soon as enough space
has been freed up. Only when this is done do we issue a wakeup signal
to the socket, which can then immediately go on and send the next
message, if any.

The fact that a socket now can consider a message sent even when the
link returns a congestion code means that the sending socket code can
be simplified. Also, since this is a good opportunity to get rid of the
obsolete 'mtu change' condition in the three socket send functions, we
now choose to refactor those functions completely.

Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/bcast.c  |   2 +-
 net/tipc/link.c   |  90 --
 net/tipc/msg.h|   8 +-
 net/tipc/node.c   |   2 +-
 net/tipc/socket.c | 346 --
 5 files changed, 209 insertions(+), 239 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index aa1babb..1a56cab 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -174,7 +174,7 @@ static void tipc_bcbase_xmit(struct net *net, struct sk_buff_head *xmitq)
  *and to identified node local sockets
  * @net: the applicable net namespace
  * @list: chain of buffers containing message
- * Consumes the buffer chain, except when returning -ELINKCONG
+ * Consumes the buffer chain.
  * Returns 0 if success, otherwise errno: -ELINKCONG,-EHOSTUNREACH,-EMSGSIZE
  */
 int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list)
diff --git a/net/tipc/link.c b/net/tipc/link.c
index 1055164..db3dcab 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -774,60 +774,77 @@ int tipc_link_timeout(struct tipc_link *l, struct sk_buff_head *xmitq)
return rc;
 }
 
+static bool tipc_link_congested(struct tipc_link *l, int imp)
+{
+   int i;
+
+   if (skb_queue_empty(&l->backlogq))
+   return false;
+
+   /* Match msg importance against this and all higher backlog limits: */
+   for (i = imp; i <= TIPC_SYSTEM_IMPORTANCE; i++) {
+   if (unlikely(l->backlog[i].len >= l->backlog[i].limit))
+   return true;
+   }
+   return false;
+}
+
 /**
  * link_schedule_user - schedule a message sender for wakeup after congestion
- * @link: congested link
+ * @l: congested link
  * @list: message that was attempted sent
  * Create pseudo msg to send back to user when congestion abates
- * Does not consume buffer list
+ * Consumes buffer list
  */
-static int link_schedule_user(struct tipc_link *link, struct sk_buff_head *list)
+static int link_schedule_user(struct tipc_link *l, struct sk_buff_head *list)
 {
-   struct tipc_msg *msg = buf_msg(skb_peek(list));
-   int imp = msg_importance(msg);
-   u32 oport = msg_origport(msg);
-   u32 addr = tipc_own_addr(link->net);
+   struct tipc_msg *hdr = buf_msg(skb_peek(list));
+   int imp = msg_importance(hdr);
+   u32 oport = msg_origport(hdr);
+   u32 dnode = tipc_own_addr(l->net);
struct sk_buff *skb;
+   struct sk_buff_head *pkts;
 
/* This really cannot happen...  */
if (unlikely(imp > TIPC_CRITICAL_IMPORTANCE)) {
-   pr_warn("%s<%s>, send queue full", link_rst_msg, link->name);
+   pr_warn("%s<%s>, send queue full", link_rst_msg, l->name);
return -ENOBUFS;
}
-   /* Non-blocking sender: */
-   if (TIPC_SKB_CB(skb_peek(list))->wakeup_pending)
-   return -ELINKCONG;
 
/* Create and schedule wakeup pseudo message */
skb = tipc_msg_create(SOCK_WAKEUP, 0, INT_H_SIZE, 0,
- addr, addr, oport, 0, 0);
+ dnode, l->addr, oport, 0, 0);
if (!skb)
return -ENOBUFS;
-   TIPC_SKB_CB(skb)->chain_sz = skb_queue_len(list);
-   TIPC_SKB_CB(skb)->chain_imp = imp;
-   skb_queue_tail(&link->wakeupq, skb);
-   link->stats.link_congs++;
+   msg_set_dest_droppable(buf_msg(skb), true);
+

[tipc-discussion] [PATCH net-next 2/3] tipc: modify struct tipc_plist to be more versatile

2016-11-21 Thread Jon Maloy
During multicast reception we currently use a simple linked list with
push/pop semantics to store port numbers.

We now see a need for a more generic list for storing values of type
u32. We therefore make some modifications to this list, while replacing
the prefix 'tipc_plist_' with 'u32_'. We also add a couple of new
functions which will come into use in the following commits.

Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/name_table.c | 100 --
 net/tipc/name_table.h |  21 ---
 net/tipc/socket.c |   8 ++--
 3 files changed, 83 insertions(+), 46 deletions(-)

diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index e190460..5a86df1 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -608,7 +608,7 @@ u32 tipc_nametbl_translate(struct net *net, u32 type, u32 instance,
  * Returns non-zero if any off-node ports overlap
  */
 int tipc_nametbl_mc_translate(struct net *net, u32 type, u32 lower, u32 upper,
- u32 limit, struct tipc_plist *dports)
+ u32 limit, struct list_head *dports)
 {
struct name_seq *seq;
struct sub_seq *sseq;
@@ -633,7 +633,7 @@ int tipc_nametbl_mc_translate(struct net *net, u32 type, u32 lower, u32 upper,
info = sseq->info;
list_for_each_entry(publ, &info->node_list, node_list) {
if (publ->scope <= limit)
-   tipc_plist_push(dports, publ->ref);
+   u32_push(dports, publ->ref);
}
 
if (info->cluster_list_size != info->node_list_size)
@@ -1022,40 +1022,84 @@ int tipc_nl_name_table_dump(struct sk_buff *skb, struct netlink_callback *cb)
return skb->len;
 }
 
-void tipc_plist_push(struct tipc_plist *pl, u32 port)
+struct u32_item {
+   struct list_head list;
+   u32 value;
+};
+
+bool u32_find(struct list_head *l, u32 value)
 {
-   struct tipc_plist *nl;
+   struct u32_item *item;
 
-   if (likely(!pl->port)) {
-   pl->port = port;
-   return;
+   list_for_each_entry(item, l, list) {
+   if (item->value == value)
+   return true;
}
-   if (pl->port == port)
-   return;
-   list_for_each_entry(nl, &pl->list, list) {
-   if (nl->port == port)
-   return;
+   return false;
+}
+
+bool u32_push(struct list_head *l, u32 value)
+{
+   struct u32_item *item;
+
+   list_for_each_entry(item, l, list) {
+   if (item->value == value)
+   return false;
+   }
+   item = kmalloc(sizeof(*item), GFP_ATOMIC);
+   if (unlikely(!item))
+   return false;
+
+   item->value = value;
+   list_add(&item->list, l);
+   return true;
+}
+
+u32 u32_pop(struct list_head *l)
+{
+   struct u32_item *item;
+   u32 value = 0;
+
+   if (list_empty(l))
+   return 0;
+   item = list_first_entry(l, typeof(*item), list);
+   value = item->value;
+   list_del(&item->list);
+   kfree(item);
+   return value;
+}
+
+bool u32_del(struct list_head *l, u32 value)
+{
+   struct u32_item *item, *tmp;
+
+   list_for_each_entry_safe(item, tmp, l, list) {
+   if (item->value != value)
+   continue;
list_del(&item->list);
+   kfree(item);
+   return true;
}
-   nl = kmalloc(sizeof(*nl), GFP_ATOMIC);
-   if (nl) {
-   nl->port = port;
-   list_add(&nl->list, &pl->list);
+   return false;
+}
+
+void u32_list_purge(struct list_head *l)
+{
+   struct u32_item *item, *tmp;
+
+   list_for_each_entry_safe(item, tmp, l, list) {
list_del(&item->list);
+   kfree(item);
}
 }
 
-u32 tipc_plist_pop(struct tipc_plist *pl)
+int u32_list_len(struct list_head *l)
 {
-   struct tipc_plist *nl;
-   u32 port = 0;
+   struct u32_item *item;
+   int i = 0;
 
-   if (likely(list_empty(&pl->list))) {
-   port = pl->port;
-   pl->port = 0;
-   return port;
+   list_for_each_entry(item, l, list) {
+   i++;
}
-   nl = list_first_entry(&pl->list, typeof(*nl), list);
-   port = nl->port;
-   list_del(&nl->list);
-   kfree(nl);
-   return port;
+   return i;
 }
diff --git a/net/tipc/name_table.h b/net/tipc/name_table.h
index 1524a73..c89bb3f 100644
--- a/net/tipc/name_table.h
+++ b/net/tipc/name_table.h
@@ -99,7 +99,7 @@ int tipc_nl_name_table_dump(struct sk_buff *skb, struct netlink_callback *cb);
 
 u32 tipc_nametbl_translate(struct net *net, u32 type, u32 instance, u32 *node);
 int tipc_nametbl_mc_translate(struct net *net, u32 type, u32 lower, u32 upper,
-   

[tipc-discussion] [PATCH net-next 0/3] tipc: improve interaction socket-link

2016-11-21 Thread Jon Maloy
We fix a very real starvation problem that may occur when the link level
runs into send buffer congestion. At the same time we make the 
interaction between the socket and link layers simpler and more 
consistent.

Jon Maloy (3):
  tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg()
functions
  tipc: modify struct tipc_plist to be more versatile
  tipc: reduce risk of user starvation during link congestion

 net/tipc/bcast.c  |   2 +-
 net/tipc/link.c   |  90 +-
 net/tipc/msg.h|   8 +-
 net/tipc/name_table.c | 100 +++
 net/tipc/name_table.h |  21 +--
 net/tipc/node.c   |   2 +-
 net/tipc/socket.c | 448 ++
 7 files changed, 334 insertions(+), 337 deletions(-)

-- 
2.7.4




[tipc-discussion] [PATCH net-next 1/3] tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() functions

2016-11-21 Thread Jon Maloy
The functions tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() are very
similar. The latter function is also called from two locations, and
there will be more in the coming commits, each of which will need to
test a different condition.

Instead of making yet another duplicate of the function, we now
introduce a new macro tipc_wait_for_cond() where the wakeup condition
can be stated as an argument to the call. This macro replaces all
current and future uses of the two functions, which can now be
eliminated.

Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/socket.c | 108 +-
 1 file changed, 49 insertions(+), 59 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 4916d8f..f084430 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -110,7 +110,6 @@ static void tipc_write_space(struct sock *sk);
 static void tipc_sock_destruct(struct sock *sk);
 static int tipc_release(struct socket *sock);
 static int tipc_accept(struct socket *sock, struct socket *new_sock, int flags);
-static int tipc_wait_for_sndmsg(struct socket *sock, long *timeo_p);
 static void tipc_sk_timeout(unsigned long data);
 static int tipc_sk_publish(struct tipc_sock *tsk, uint scope,
   struct tipc_name_seq const *seq);
@@ -334,6 +333,49 @@ static int tipc_set_sk_state(struct sock *sk, int state)
return res;
 }
 
+static int tipc_sk_sock_err(struct socket *sock, long *timeout)
+{
+   struct sock *sk = sock->sk;
+   int err = sock_error(sk);
+   int typ = sock->type;
+
+   if (err)
+   return err;
+   if (typ == SOCK_STREAM || typ == SOCK_SEQPACKET) {
+   if (sk->sk_state == TIPC_DISCONNECTING)
+   return -EPIPE;
+   else if (!tipc_sk_connected(sk))
+   return -ENOTCONN;
+   } else if (sk->sk_shutdown & SEND_SHUTDOWN) {
+   return -EPIPE;
+   }
+   if (!*timeout)
+   return -EAGAIN;
+   if (signal_pending(current))
+   return sock_intr_errno(*timeout);
+
+   return 0;
+}
+
+#define tipc_wait_for_cond(sock_, timeout_, condition_)   \
+   ({  \
+   int rc_ = 0;\
+   int done_ = 0;  \
+   \
+   while (!(condition_) && !done_) {   \
+   struct sock *sk_ = sock->sk;\
+   DEFINE_WAIT_FUNC(wait_, woken_wake_function);   \
+   \
+   rc_ = tipc_sk_sock_err(sock_, timeout_);\
+   if (rc_)\
+   break;  \
+   prepare_to_wait(sk_sleep(sk_), &wait_, TASK_INTERRUPTIBLE);\
+   done_ = sk_wait_event(sk_, timeout_, (condition_), &wait_);\
+   remove_wait_queue(sk_sleep(sk), &wait_);\
+   }   \
+   rc_;\
+})
+
 /**
  * tipc_sk_create - create a TIPC socket
  * @net: network namespace (must be default network)
@@ -719,7 +761,7 @@ static int tipc_sendmcast(struct  socket *sock, struct tipc_name_seq *seq,
 
if (rc == -ELINKCONG) {
tsk->link_cong = 1;
-   rc = tipc_wait_for_sndmsg(sock, &timeo);
+   rc = tipc_wait_for_cond(sock, &timeo, !tsk->link_cong);
if (!rc)
continue;
}
@@ -828,31 +870,6 @@ static void tipc_sk_proto_rcv(struct tipc_sock *tsk, struct sk_buff *skb,
kfree_skb(skb);
 }
 
-static int tipc_wait_for_sndmsg(struct socket *sock, long *timeo_p)
-{
-   DEFINE_WAIT_FUNC(wait, woken_wake_function);
-   struct sock *sk = sock->sk;
-   struct tipc_sock *tsk = tipc_sk(sk);
-   int done;
-
-   do {
-   int err = sock_error(sk);
-   if (err)
-   return err;
-   if (sk->sk_shutdown & SEND_SHUTDOWN)
-   return -EPIPE;
-   if (!*timeo_p)
-   return -EAGAIN;
-   if (signal_pending(current))
-   return sock_intr_errno(*timeo_p);
-
-   add_wait_queue(sk_sleep(sk), &wait);
-   done = sk_wait_event(sk, timeo_p, !tsk->link_cong, &wait);
-   remove_wait_queue(sk_sleep(sk), &wait);
-   } while (!done);
-   return 0;
-}
-
 /**
  * tipc_sendmsg - send message in connectionless manner
  * @s

Re: [tipc-discussion] [PATCH net-next 3/3] tipc: reduce risk of user starvation during link congestion

2016-11-21 Thread Jon Maloy
I am having some new doubts about our current link congestion criteria. See 
below.

///jon


> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Monday, 21 November, 2016 09:57
> To: tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan
> <parthasarathy.bhuvara...@ericsson.com>; Ying Xue
> <ying....@windriver.com>; Jon Maloy <jon.ma...@ericsson.com>
> Cc: ma...@donjonn.com; thompa@gmail.com
> Subject: [PATCH net-next 3/3] tipc: reduce risk of user starvation during link
> congestion
> 
> The socket code currently handles link congestion by either blocking
> and trying to send again when the congestion has abated, or just
> returning to the user with -EAGAIN and letting him retry later.
> 
> This mechanism is prone to starvation, because the wakeup algorithm is
> non-atomic. During the time the link issues a wakeup signal, until the
> socket wakes up and re-attempts sending, other senders may have come
> in between and occupied the free buffer space in the link. This in turn
> may lead to a socket having to make many send attempts before it is
> successful. In extremely loaded systems we have observed latency times
> of several seconds before a low-priority socket is able to send out a
> message.
> 
> In this commit, we simplify this mechanism and reduce the risk of the
> described scenario happening. When a message is sent to a congested
> link, we now let it keep the message in the wakeup-item that it has to
> create anyway, and immediately add it to the link's send queue when
> enough space has been freed up. Only when this is done do we issue a
> wakeup signal to the socket, which can now immediately go on and send
> the next message, if any.
> 
> The fact that a socket now can consider a message sent even when the
> link returns a congestion code means that the sending socket code can
> be simplified. Also, since this is a good opportunity to get rid of the
> obsolete 'mtu change' condition in the three socket send functions, we
> now choose to refactor those functions completely.
> 
> Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
> ---
>  net/tipc/bcast.c  |   2 +-
>  net/tipc/link.c   |  90 --
>  net/tipc/msg.h    |   8 +-
>  net/tipc/node.c   |   2 +-
>  net/tipc/socket.c | 346 --
>  5 files changed, 209 insertions(+), 239 deletions(-)
> 
> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
> index aa1babb..1a56cab 100644
> --- a/net/tipc/bcast.c
> +++ b/net/tipc/bcast.c
> @@ -174,7 +174,7 @@ static void tipc_bcbase_xmit(struct net *net, struct sk_buff_head *xmitq)
>   *and to identified node local sockets
>   * @net: the applicable net namespace
>   * @list: chain of buffers containing message
> - * Consumes the buffer chain, except when returning -ELINKCONG
> + * Consumes the buffer chain.
>   * Returns 0 if success, otherwise errno: -ELINKCONG,-EHOSTUNREACH,-EMSGSIZE
>   */
>  int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list)
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index 1055164..db3dcab 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -774,60 +774,77 @@ int tipc_link_timeout(struct tipc_link *l, struct sk_buff_head *xmitq)
>   return rc;
>  }
> 
> +static bool tipc_link_congested(struct tipc_link *l, int imp)
> +{
> + int i;
> +
> + if (skb_queue_empty(&l->backlogq))
> + return false;
> +
> + /* Match msg importance against this and all higher backlog limits: */
> + for (i = imp; i <= TIPC_SYSTEM_IMPORTANCE; i++) {
> + if (unlikely(l->backlog[i].len >= l->backlog[i].limit))
> + return true;
> + }
> + return false;
> +}

After introducing this patch I ran into a problem with spontaneous and
premature disconnects of connections in the ptty test program.
After some troubleshooting I realized that with the new algorithm,
high-priority shutdown messages sometimes bypass messages which are reported
"delivered" but are still waiting in the wakeup pending queue. This was easy to
fix by adding a potentially blocking tipc_wait_for_cond() call in
__tipc_shutdown().

But I also realized that we may have (and may always have had) another
starvation scenario, which this patch does not address. Gergely's algorithm
(above) ensures that a socket cannot be starved by users of lower or equal
priority, but starvation can still be caused by higher-priority users. While a
lower-prio message is waiting in the wakeup queue, higher-prio users may drive
their own levels into congestion, thus potentially preventing the lower-prio
user from sending its message for a long time.
After the c
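The importance-matching rule in the quoted tipc_link_congested() can be modelled with plain arrays. In this hypothetical sketch the enum values and `struct backlog` are assumptions rather than TIPC definitions, and the empty-backlog shortcut is left out. It shows both sides of the starvation argument: a sender is never held back by levels below its own, but a full higher level blocks every lower-importance sender.

```c
#include <assert.h>
#include <stdbool.h>

/* Importance levels in ascending order; values assumed for this sketch. */
enum { IMP_LOW, IMP_MEDIUM, IMP_HIGH, IMP_CRITICAL, IMP_SYSTEM, IMP_LEVELS };

/* Hypothetical per-level backlog accounting. */
struct backlog {
	int len;    /* messages currently queued at this level */
	int limit;  /* congestion threshold for this level */
};

/* A sender of importance 'imp' is congested if its own level or any
 * higher level has reached its limit; lower levels are never checked. */
static bool link_congested(const struct backlog bl[IMP_LEVELS], int imp)
{
	for (int i = imp; i <= IMP_SYSTEM; i++) {
		if (bl[i].len >= bl[i].limit)
			return true;
	}
	return false;
}
```

As long as a higher level stays congested, a lower-importance sender in this model keeps seeing congestion, which is the remaining starvation case described above.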

Re: [tipc-discussion] soft lockup for TIPC

2016-11-21 Thread Jon Maloy
Hi Xiang,
Although the version you are using has the same number (I am planning to bump 
it to 3.0.0 soon) as the current one in the latest kernels (4.x), it is a very 
different species indeed. Almost all code has been rewritten, and in some cases 
more than once. I would strongly suggest that you upgrade to a more recent kernel 
if at all possible, as it is very hard for us to maintain TIPC in 3.x kernels. 

If that is impossible for you, we will have to look into what options we have.

Regards
///jon


> -Original Message-
> From: XIANG Haiming [mailto:haiming.xi...@alcatel-sbell.com.cn]
> Sent: Sunday, 20 November, 2016 21:35
> To: tipc-discussion@lists.sourceforge.net
> Subject: [tipc-discussion] soft lockup for TIPC
> 
> Hi all,
> 
> The version of TIPC we use is TIPC version 2.0.0.
> The OS info is as follows:
> 
> Red Hat Enterprise Linux Server 7.2 (Maipo)
> Kernel 3.10.0-327.18.2.el7.x86_64 on an x86_64
> 
> We have run into two soft lockup issues in TIPC; please help us solve them.
> Thank you
> 
> One issue is as follows:
> 
> 
> [85502.601198] BUG: soft lockup - CPU#0 stuck for 22s! [scm:2649]
> [85502.603585] Modules linked in: iptable_filter ip6table_mangle xt_limit
> iptable_mangle ip6table_filter ip6_tables igb_uio(OE) uio tipc(OE) 8021q garp 
> stp
> mrp llc bonding dm_mirror dm_region_hash dm_log dm_mod ppdev
> crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper
> ablk_helper cryptd pcspkr i6300esb virtio_balloon cirrus syscopyarea 
> sysfillrect
> sysimgblt ttm drm_kms_helper drm i2c_piix4 i2c_core parport_pc parport
> binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache
> jbd2 virtio_blk virtio_console crct10dif_pclmul crct10dif_common crc32c_intel
> serio_raw ixgbevf virtio_pci virtio_ring virtio ata_generic pata_acpi 
> ata_piix libata
> floppy
> [85502.618210] CPU: 0 PID: 2649 Comm: scm Tainted: G   OEL 3.10.0-327.18.2.el7.x86_64 #1
> [85502.620482] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [85502.622361] task: 880405da5080 ti: 8800db68c000 task.ti: 8800db68c000
> [85502.624400] RIP: 0010:[]  [] _raw_spin_lock_bh+0x3d/0x50
> [85502.626558] RSP: 0018:8800db68fa68  EFLAGS: 0202
> [85502.628412] RAX: 56f6 RBX: 880406600800 RCX: 712a
> [85502.630419] RDX: 712c RSI: 712c RDI: a03f8548
> [85502.632423] RBP: 8800db68fa70 R08: 02c0 R09: 0500
> [85502.634421] R10: 88040f001500 R11: 8800db68fc58 R12: 81519be1
> [85502.636413] R13: 8800db68fa08 R14:  R15: 0240
> [85502.638415] FS:  () GS:88041fc0(0063) knlGS:d1495b40
> [85502.640514] CS:  0010 DS: 002b ES: 002b CR0: 80050033
> [85502.642399] CR2: d37c8208 CR3: 0003cfff7000 CR4: 000406f0
> [85502.644420] DR0:  DR1:  DR2:
> [85502.646416] DR3:  DR6: 0ff0 DR7: 0400
> [85502.648393] Stack:
> [85502.649916]  a03f8548 8800db68fa90 a03e2afb 880036242c00
> [85502.652019]  880406600800 8800db68fac0 a03e5c1b 880036242c00
> [85502.654108]  880036b5a484 88003698e800 8800db68fb30 8800db68fba8
> [85502.656179] Call Trace:
> [85502.657742]  [] tipc_bearer_blocked+0x1b/0x30 [tipc]
> [85502.659678]  [] link_send_buf_fast+0x5b/0xb0 [tipc]
> [85502.661592]  [] tipc_link_send_sections_fast+0xe6/0x630 [tipc]
> [85502.663594]  [] ? _raw_read_unlock_bh+0x16/0x20
> [85502.665473]  [] ? tipc_nametbl_translate+0xc0/0x1f0 [tipc]
> [85502.667441]  [] tipc_send2name+0x155/0x1d0 [tipc]
> [85502.669351]  [] send_msg+0x1eb/0x530 [tipc]
> [85502.671205]  [] sock_sendmsg+0xb0/0xf0
> [85502.673028]  [] ? futex_wait+0x193/0x280
> [85502.674859]  [] SYSC_sendto+0x121/0x1c0
> [85502.676676]  [] ? __do_page_fault+0x16d/0x450
> [85502.678540]  [] ? poll_select_copy_remaining+0x62/0x130
> [85502.680489]  [] SyS_sendto+0xe/0x10
> [85502.682277]  [] compat_sys_socketcall+0x173/0x2a0
> [85502.684193]  [] sysenter_dispatch+0x7/0x21
> [85502.686046] Code: b8 00 00 02 00 f0 0f c1 03 89 c2 c1 ea 10 66 39 c2 75 03 5b 5d c3 83 e2 fe 0f b7 f2 b8 00 80 00 00 0f b7 0b 66 39 ca 74 ea f3 90 <83> e8 01 75 f1 48 89 df 66 66 66 90 66 66 90 eb e0 66 90 66 66
> [85503.372189] BUG: soft lockup - CPU#2 stuck for 23s! [nodemgr:2560]
> [85503.374181] Modules linked in: iptable_filter ip6table_mangle xt_limit
> iptable_mangle ip6table_filter ip6_tables igb_uio(OE) uio tipc(OE) 8021q garp 
> stp
> mrp llc bonding dm_mirror dm_region_hash dm_log dm_mod ppdev
> crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper
> ablk_helper cryptd pcspkr i6300esb virtio_balloon cirrus syscopyarea 
> sysfillrect
> sysimgblt ttm drm_kms_helper drm i2c_piix4 i2c_core parport_pc parport
> binfmt_misc nfsd auth_rpcgss 

Re: [tipc-discussion] TIPC compatibility with different kernels

2016-11-15 Thread Jon Maloy
Hi Amar,

The claim that TIPC is compatible between different kernel versions is 
true, and we always test for backwards compatibility when we introduce 
changes that may cause compatibility problems. However, your kernel 3.2.0 
on node 1.1.6 is *very* old, and I don't think any of us tests this far 
back before we release.

The changes between the versions are substantial (yes, I know I should 
have changed the module version; I will do that soon), and I think there is 
a risk we might have run into an unknown compatibility issue here.

One thing is puzzling me in the dump from node 1.1.1: why is the bearer 
disabled and enabled repeatedly? If you have done that manually, it is 
completely consistent with what you see on the other node. But I guess 
it can't be that simple?

Regards
///jon

On 11/15/2016 05:15 AM, Amar Nv wrote:
> Hello,
>
> I am trying to bring up a cluster of 2 nodes hosted on different
> environments.
>
>
>
> *NODE1 TIPC Node-addr (6)*
> root@host1:/lib/modules/3.2.0-29-generic/kernel/net/tipc# modinfo tipc.ko
> filename:   tipc.ko
> version:2.0.0
> license:Dual BSD/GPL
> description:TIPC: Transparent Inter Process Communication
> srcversion: A0CB14DDCCCBB1ABAE73386
> depends:
> intree: Y
> vermagic:   3.2.0-29-generic SMP mod_unload modversions
>
>
>
> *NODE2 TIPC Node-addr (1)*
> root@host2:/lib/modules/4.6.0-rc6/kernel/net/tipc# modinfo tipc.ko
> filename:   /lib/modules/4.6.0-rc6/kernel/net/tipc/tipc.ko
> version:2.0.0
> license:Dual BSD/GPL
> description:TIPC: Transparent Inter Process Communication
> srcversion: C58612B2A6F6FABF1EF75CE
> depends:udp_tunnel,ip6_udp_tunnel
> intree: Y
> vermagic:   4.6.0-rc6 SMP mod_unload modversions
>
>
>
>
> *Node 1 Kernel logs*
> Nov 15 15:40:09 7311-6 kernel: [598187.276986] TIPC: Established link <1.1.6:base-1.1.1:base> on network plane A
> Nov 15 15:40:09 7311-6 kernel: [598187.277100] TIPC: Resetting link <1.1.6:base-1.1.1:base>, requested by peer
> Nov 15 15:40:09 7311-6 kernel: [598187.277103] TIPC: Lost link <1.1.6:base-1.1.1:base> on network plane A
> Nov 15 15:40:09 7311-6 kernel: [598187.277106] TIPC: Lost contact with <1.1.1>
> Nov 15 15:40:09 7311-6 kernel: [598187.656616] TIPC: Established link <1.1.6:base-1.1.1:base> on network plane A
> Nov 15 15:40:09 7311-6 kernel: [598187.656687] TIPC: Resetting link <1.1.6:base-1.1.1:base>, requested by peer
> Nov 15 15:40:09 7311-6 kernel: [598187.656689] TIPC: Lost link <1.1.6:base-1.1.1:base> on network plane A
> Nov 15 15:40:09 7311-6 kernel: [598187.656691] TIPC: Lost contact with <1.1.1>
>
>
>
> *Node 2 Kernel logs*
> Nov 15 15:39:03 7470-1 kernel: [452484.786678] tipc: Activated (version 2.0.0)
> Nov 15 15:39:03 7470-1 kernel: [452484.786713] NET: Registered protocol family 30
> Nov 15 15:39:03 7470-1 kernel: [452484.786813] tipc: Started in single node mode
> Nov 15 15:39:03 7470-1 kernel: [452484.907687] Started in network mode
> Nov 15 15:39:03 7470-1 kernel: [452484.907693] Own node address <1.1.1>, network identity 1061
> Nov 15 15:39:03 7470-1 kernel: [452484.907759] Enabled bearer , discovery domain <1.1.0>, priority 10
> Nov 15 15:40:09 7470-1 kernel: [452551.143276] Disabling bearer 
> Nov 15 15:40:09 7470-1 kernel: [452551.254557] Left network mode
> Nov 15 15:40:10 7470-1 kernel: [452551.310490] NET: Unregistered protocol family 30
> Nov 15 15:40:10 7470-1 kernel: [452551.310502] tipc: Deactivated
> Nov 15 15:40:47 7470-1 kernel: [452588.331653] tipc: Activated (version 2.0.0)
> Nov 15 15:40:47 7470-1 kernel: [452588.331688] NET: Registered protocol family 30
> Nov 15 15:40:47 7470-1 kernel: [452588.331790] tipc: Started in single node mode
> Nov 15 15:40:47 7470-1 kernel: [452588.432707] Started in network mode
> Nov 15 15:40:47 7470-1 kernel: [452588.432713] Own node address <1.1.1>, network identity 1061
> Nov 15 15:40:47 7470-1 kernel: [452588.432781] Enabled bearer , discovery domain <1.1.0>, priority 10
> Nov 15 15:41:53 7470-1 kernel: [452654.591363] Disabling bearer 
> Nov 15 15:41:53 7470-1 kernel: [452654.713347] Left network mode
> Nov 15 15:41:53 7470-1 kernel: [452654.761326] NET: Unregistered protocol family 30
> Nov 15 15:41:53 7470-1 kernel: [452654.761340] tipc: Deactivated
>
>
> From the tcpdump capture, I see a "Link State" message of type RESET being
> sent from Node2.
> I can confirm that the TIPC address is the same on both nodes.
>
> Please advise:
> 1. Does the TIPC protocol work between the 2 kernels stated above?
> 2. How can we determine the reason for the RESET triggered by Node2? Any other
> debugging ideas?
>
> Thanks,
> Amar



[tipc-discussion] [PATCH net-next v2 2/4] tipc: add functionality to lookup multicast destination nodes

2016-10-27 Thread Jon Maloy
As a further preparation for the upcoming 'replicast' functionality,
we add some necessary structs and functions for looking up and returning
a list of all nodes that host destinations for a given multicast message.
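The bookkeeping rules of the destination node list (ignore duplicates, record the own node as a `local` flag instead of a list item, and only decrement the remote counter for nodes that are actually present, as tipc_nlist_del() does) can be illustrated outside the kernel. This is a hypothetical user-space sketch: all names are invented, a singly linked list with malloc() replaces list_head and kzalloc(), and the sent/unsent split and the local skb queue are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node item; a singly linked list replaces list_head. */
struct nitem {
	struct nitem *next;
	uint32_t node;
};

struct nlist {
	struct nitem *head;
	uint32_t self;   /* own node address */
	int remote;      /* number of distinct remote destination nodes */
	bool local;      /* true if the own node hosts a destination */
};

static void nlist_init(struct nlist *nl, uint32_t self)
{
	nl->head = NULL;
	nl->self = self;
	nl->remote = 0;
	nl->local = false;
}

/* Returns the link pointing at 'node', or the terminating NULL link. */
static struct nitem **nlist_find(struct nlist *nl, uint32_t node)
{
	struct nitem **pp = &nl->head;

	while (*pp && (*pp)->node != node)
		pp = &(*pp)->next;
	return pp;
}

static void nlist_add(struct nlist *nl, uint32_t node)
{
	if (node == nl->self) {
		nl->local = true;   /* delivered locally, never replicated */
		return;
	}
	struct nitem **pp = nlist_find(nl, node);
	if (*pp)
		return;             /* duplicate: keep the counter exact */
	struct nitem *n = malloc(sizeof(*n));
	if (!n)
		return;
	n->node = node;
	n->next = NULL;
	*pp = n;
	nl->remote++;
}

static void nlist_del(struct nlist *nl, uint32_t node)
{
	struct nitem **pp = nlist_find(nl, node);
	struct nitem *n = *pp;

	if (!n)
		return;             /* unknown node: counter stays unchanged */
	*pp = n->next;
	free(n);
	nl->remote--;
}

static void nlist_purge(struct nlist *nl)
{
	while (nl->head)
		nlist_del(nl, nl->head->node);
	nl->local = false;
}
```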

Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/bcast.c  | 81 +--
 net/tipc/bcast.h  | 24 ++-
 net/tipc/name_table.c | 33 +
 net/tipc/name_table.h |  4 +++
 4 files changed, 139 insertions(+), 3 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 886df68..2a73a03 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -39,9 +39,7 @@
 #include "socket.h"
 #include "msg.h"
 #include "bcast.h"
-#include "name_distr.h"
 #include "link.h"
-#include "node.h"
 
 #define BCLINK_WIN_DEFAULT  50  /* bcast link window size (default) */
 #define BCLINK_WIN_MIN  32  /* bcast minimum link window size */
@@ -428,3 +426,82 @@ void tipc_bcast_stop(struct net *net)
kfree(tn->bcbase);
kfree(tn->bcl);
 }
+
+void tipc_nlist_init(struct tipc_nlist *nl, u32 self)
+{
+   memset(nl, 0, sizeof(*nl));
+   INIT_LIST_HEAD(&nl->sent);
+   INIT_LIST_HEAD(&nl->unsent);
+   skb_queue_head_init(&nl->localq);
+   nl->self = self;
+}
+
+static struct tipc_nitem *tipc_nlist_find(struct tipc_nlist *nl, u32 node)
+{
+   struct tipc_nitem *n;
+
+   list_for_each_entry(n, &nl->unsent, list) {
+   if (n->node == node)
+   return n;
+   }
+   list_for_each_entry(n, &nl->sent, list) {
+   if (n->node == node)
+   return n;
+   }
+   return NULL;
+}
+
+void tipc_nlist_add(struct tipc_nlist *nl, u32 node)
+{
+   struct tipc_nitem *n;
+
+   if (node == nl->self) {
+   nl->local = true;
+   return;
+   }
+   n = tipc_nlist_find(nl, node);
+   if (n)
+   return;
+   n = kzalloc(sizeof(*n), GFP_KERNEL);
+   if (!n)
+   return;
+   INIT_LIST_HEAD(&n->list);
+   n->node = node;
+   nl->remote++;
+   list_add(&n->list, &nl->unsent);
+}
+
+void tipc_nlist_sent(struct tipc_nlist *nl, struct tipc_nitem *n)
+{
+   list_del(&n->list);
+   list_add(&n->list, &nl->sent);
+}
+
+void tipc_nlist_del(struct tipc_nlist *nl, u32 node)
+{
+   struct tipc_nitem *n = tipc_nlist_find(nl, node);
+
+   if (!n)
+   return;
+   list_del(&n->list);
+   kfree(n);
+   nl->remote--;
+}
+
+void tipc_nlist_restore(struct tipc_nlist *nl)
+{
+   list_splice_tail_init(&nl->sent, &nl->unsent);
+}
+
+void tipc_nlist_purge(struct tipc_nlist *nl)
+{
+   struct tipc_nitem *n, *tmp;
+
+   list_splice_tail_init(&nl->sent, &nl->unsent);
+   list_for_each_entry_safe(n, tmp, &nl->unsent, list) {
+   list_del(&n->list);
+   kfree(n);
+   }
+   nl->remote = 0;
+   nl->local = 0;
+}
diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
index 5ffe344..8f9a210 100644
--- a/net/tipc/bcast.h
+++ b/net/tipc/bcast.h
@@ -42,9 +42,31 @@
 struct tipc_node;
 struct tipc_msg;
 struct tipc_nl_msg;
-struct tipc_node_map;
+struct tipc_nlist;
+struct tipc_nitem;
 extern const char tipc_bclink_name[];
 
+struct tipc_nitem {
+   struct list_head list;
+   u32 node;
+};
+
+struct tipc_nlist {
+   struct list_head unsent;
+   struct list_head sent;
+   struct sk_buff_head localq;
+   u32 self;
+   int remote;
+   bool local;
+};
+
+void tipc_nlist_init(struct tipc_nlist *nl, u32 self);
+void tipc_nlist_restore(struct tipc_nlist *nl);
+void tipc_nlist_purge(struct tipc_nlist *nl);
+void tipc_nlist_add(struct tipc_nlist *nl, u32 node);
+void tipc_nlist_sent(struct tipc_nlist *nl, struct tipc_nitem *n);
+void tipc_nlist_del(struct tipc_nlist *nl, u32 node);
+
 int tipc_bcast_init(struct net *net);
 void tipc_bcast_stop(struct net *net);
 void tipc_bcast_add_peer(struct net *net, struct tipc_link *l,
diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index e190460..de58a0d 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -645,6 +645,39 @@ int tipc_nametbl_mc_translate(struct net *net, u32 type, u32 lower, u32 upper,
return res;
 }
 
+/* tipc_nametbl_lookup_dst_nodes - find broadcast destination nodes
+ * - Creates list of nodes that overlap the given multicast address
+ * - Determines if any node local ports overlap
+ */
+void tipc_nametbl_lookup_dst_nodes(struct net *net, u32 type, u32 lower,
+  u32 upper, u32 domain,
+  struct tipc_nlist *nodes)
+{
+   struct sub_seq *sseq, *stop;
+   struct publication *publ;
+   struct name_info *info;
+   struct name_seq *seq;
+
+   rcu_read_lock();
+   seq = nametbl_f

[tipc-discussion] [PATCH net-next v2 0/4] tipc: introduce multicast through replication

2016-10-27 Thread Jon Maloy
TIPC multicast messages are currently distributed via L2 broadcast
or IP multicast to all nodes in the cluster, irrespective of the 
number of real destinations of the message.

In this series we introduce an option to transport messages via
replication ("replicast") across a selected number of unicast links,
instead of relying on the underlying media. This option is used when
true broadcast/multicast is not supported by the media, or when the
number of true destinations is much smaller than the cluster size.
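The two cases in which replicast is preferred can be written as a small decision function. This is a hypothetical sketch: the function name and the percentage threshold `max_ratio_pct` are assumptions, since the cover letter only says "much smaller than the cluster size" without giving a number.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decision rule; max_ratio_pct is an assumed tuning knob,
 * not a value taken from the patch series. */
static bool use_replicast(bool media_supports_bcast, int num_dests,
			  int cluster_size, int max_ratio_pct)
{
	/* No L2 broadcast / IP multicast: replication is the only option. */
	if (!media_supports_bcast)
		return true;
	/* Few real destinations: unicast copies avoid loading every node. */
	return num_dests * 100 < cluster_size * max_ratio_pct;
}
```

Integer arithmetic keeps the comparison exact; with `max_ratio_pct = 25`, replicast would be chosen when fewer than a quarter of the cluster nodes are destinations.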

v2: -Fixed a counter bug when removing nodes from destination node list
- Moved definition of node destination list from to bcast.{h,c}

Jon Maloy (4):
  tipc: add function for checking broadcast support in bearer
  tipc: add functionality to lookup multicast destination nodes
  tipc: introduce replicast as transport option for multicast
  tipc: make replicast a user selectable option

 include/uapi/linux/tipc.h |   6 +-
 net/tipc/bcast.c  | 245 +-
 net/tipc/bcast.h  |  40 +++-
 net/tipc/bearer.c |  15 ++-
 net/tipc/bearer.h |   6 ++
 net/tipc/link.c   |  12 ++-
 net/tipc/msg.c    |  17 
 net/tipc/msg.h    |   2 +
 net/tipc/name_table.c |  33 +++
 net/tipc/name_table.h |   4 +
 net/tipc/node.c   |  27 +++--
 net/tipc/node.h   |   4 +-
 net/tipc/socket.c |  89 ++---
 net/tipc/udp_media.c  |   8 +-
 14 files changed, 424 insertions(+), 84 deletions(-)



[tipc-discussion] [PATCH net-next v2 3/4] tipc: introduce replicast as transport option for multicast

2016-10-27 Thread Jon Maloy
TIPC multicast messages are currently carried over a reliable
'broadcast link', making use of the underlying media's ability to
transport packets as L2 broadcast or IP multicast to all nodes in
the cluster.

When the used bearer is lacking that ability, we can instead emulate
the broadcast service by replicating and sending the packets over as
many unicast links as needed to reach all identified destinations.
We now introduce a new TIPC link-level 'replicast' service that does
this.
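The per-destination loop in tipc_rcast_xmit(), introduced by this patch, can be modelled in user space to show its retry semantics. In this hypothetical sketch, `struct dest` with a `sent` flag replaces the sent/unsent lists, `fake_xmit()` stands in for tipc_node_xmit(), and ELINKCONG is given an arbitrary local value. Destinations whose link is congested stay unsent, so a retry only covers them, and only the first congested link is asked to schedule a wakeup.

```c
#include <assert.h>
#include <stdbool.h>

#define ELINKCONG 1000  /* arbitrary value for this sketch only */

/* Hypothetical destination entry; 'sent' replaces the sent/unsent lists. */
struct dest {
	unsigned int node;
	bool sent;
};

/* Stand-in for tipc_node_xmit(); wakeup_pending tells a congested link
 * that a wakeup has already been scheduled for this sender. */
typedef int (*xmit_fn)(unsigned int node, bool wakeup_pending);

static int rcast_xmit(struct dest *d, int n, xmit_fn xmit)
{
	int rc = 0;

	for (int i = 0; i < n; i++) {
		if (d[i].sent)
			continue;  /* already served by an earlier attempt */
		/* Any failure other than congestion is ignored, as in the patch */
		if (xmit(d[i].node, rc != 0) == -ELINKCONG)
			rc = -ELINKCONG;
		else
			d[i].sent = true;
	}
	return rc;
}

/* Test double: nodes whose bit is set in congested_mask are congested. */
static unsigned int congested_mask;
static int wakeups;

static int fake_xmit(unsigned int node, bool wakeup_pending)
{
	if (congested_mask & (1u << node)) {
		if (!wakeup_pending)
			wakeups++;  /* only the first congested link schedules one */
		return -ELINKCONG;
	}
	return 0;
}
```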

Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/bcast.c  | 103 ++
 net/tipc/bcast.h  |   3 +-
 net/tipc/link.c   |   8 -
 net/tipc/msg.c    |  17 +
 net/tipc/msg.h    |   2 ++
 net/tipc/node.c   |  27 +-
 net/tipc/socket.c |  57 +-
 7 files changed, 152 insertions(+), 65 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 2a73a03..98f0deb 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -174,43 +174,102 @@ static void tipc_bcbase_xmit(struct net *net, struct sk_buff_head *xmitq)
__skb_queue_purge(&_xmitq);
 }
 
-/* tipc_bcast_xmit - deliver buffer chain to all nodes in cluster
- *and to identified node local sockets
+/* tipc_bcast_xmit - broadcast the buffer chain to all external nodes
  * @net: the applicable net namespace
- * @list: chain of buffers containing message
+ * @msg: chain of buffers containing message
  * Consumes the buffer chain, except when returning -ELINKCONG
  * Returns 0 if success, otherwise errno: -ELINKCONG,-EHOSTUNREACH,-EMSGSIZE
  */
-int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list)
+static int tipc_bcast_xmit(struct net *net, struct sk_buff_head *msg)
 {
struct tipc_link *l = tipc_bc_sndlink(net);
-   struct sk_buff_head xmitq, inputq, rcvq;
+   struct sk_buff_head xmitq;
int rc = 0;
 
-   __skb_queue_head_init(&rcvq);
 __skb_queue_head_init(&xmitq);
-   skb_queue_head_init(&inputq);
-
-   /* Prepare message clone for local node */
-   if (unlikely(!tipc_msg_reassemble(list, &rcvq)))
-   return -EHOSTUNREACH;
-
tipc_bcast_lock(net);
if (tipc_link_bc_peers(l))
-   rc = tipc_link_xmit(l, list, &xmitq);
+   rc = tipc_link_xmit(l, msg, &xmitq);
 tipc_bcast_unlock(net);
+   tipc_bcbase_xmit(net, &xmitq);
+   return rc;
+}
+
+/* tipc_rcast_xmit - replicate and send a message to given destination nodes
+ * @net: the applicable net namespace
+ * @msg: chain of buffers containing message
+ * @dests: list of destination nodes
+ * Returns -ELINKCONG if any link is congested, otherwise 0
+ */
+static int tipc_rcast_xmit(struct net *net, struct sk_buff_head *msg,
+  struct tipc_nlist *dests)
+{
+   struct sk_buff_head _msg;
+   struct tipc_nitem *n, *tmp;
+   int rc = 0;
+   u32 dst;
+
+   __skb_queue_head_init(&_msg);
 
-   /* Don't send to local node if adding to link failed */
-   if (unlikely(rc)) {
-   __skb_queue_purge(&rcvq);
-   return rc;
+   list_for_each_entry_safe(n, tmp, &dests->unsent, list) {
+   dst = n->node;
+   if (!tipc_msg_pskb_copy(dst, msg, &_msg))
+   return -ENOMEM;
+
+   /* Already congestion? Ensure there will be only one wakeup */
+   TIPC_SKB_CB(skb_peek(&_msg))->wakeup_pending = rc;
+
+   /* Any other failure than -ELINKCONG is ignored */
+   if (tipc_node_xmit(net, &_msg, dst, dst) == -ELINKCONG)
+   rc = -ELINKCONG;
+   else
+   tipc_nlist_sent(dests, n);
+
+   /* Message copy list is non-empty if sending failed */
+   __skb_queue_purge(&_msg);
}
+   return rc;
+}
 
-   /* Broadcast to all nodes, inluding local node */
-   tipc_bcbase_xmit(net, &xmitq);
-   tipc_sk_mcast_rcv(net, &rcvq, &inputq);
-   __skb_queue_purge(list);
-   return 0;
+/* tipc_mcast_xmit - deliver message to indicated destination nodes
+ *   and to identified node local sockets
+ * @net: the applicable net namespace
+ * @msg: chain of buffers containing message
+ * @dests: destination nodes for message.
+ * Consumes buffer chain, except when returning -ELINKCONG
+ * Returns dest list with all items present in either 'sent' or 'unsent' list
+ * Returns 0 if success, otherwise errno
+ */
+int tipc_mcast_xmit(struct net *net, struct sk_buff_head *msg,
+   struct tipc_nlist *dests)
+{
+   struct tipc_bc_base *bb = tipc_bc_base(net);
+   struct sk_buff_head inputq;
+   int rc = 0;
+
+   skb_queue_head_init(&inputq);
+
+   /* Create message clone for local node if applicable */
+   if (dests->local && skb_queue_empty(&dests->localq) &&
+   !tipc_msg_reassemble(msg, &dests->localq)) 

[tipc-discussion] [net-next v3 3/3] tipc: reduce risk of user starvation during link congestion

2016-12-12 Thread Jon Maloy
The socket code currently handles link congestion by either blocking
and trying to send again when the congestion has abated, or just
returning to the user with -EAGAIN and letting him retry later.

This mechanism is prone to starvation, because the wakeup algorithm is
non-atomic. During the time the link issues a wakeup signal, until the
socket wakes up and re-attempts sending, other senders may have come
in between and occupied the free buffer space in the link. This in turn
may lead to a socket having to make many send attempts before it is
successful. In extremely loaded systems we have observed latency times
of several seconds before a low-priority socket is able to send out a
message.

In this commit, we simplify this mechanism and reduce the risk of the
described scenario happening. When an attempt is made to send a message
via a congested link, we now let the message be added to the link's
backlog queue anyway, thus permitting an oversubscription of one message
per source socket. We still create a wakeup item and return an error code, hence
instructing the sender to block or stop sending. Only when enough space
has been freed up in the link's backlog queue do we issue a wakeup event
that allows the sender to continue with the next message, if any.
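The new congestion rule can be sketched with a single-level link model. Everything here is hypothetical: `struct link`, its `waiting` flags and both functions are invented for illustration, ELINKCONG is given an arbitrary value, and importance levels are ignored. It shows that a send on a congested link still queues the message (oversubscribing by one message per blocked socket) while returning -ELINKCONG, and that blocked senders are only woken as backlog space is freed.

```c
#include <assert.h>
#include <stdbool.h>

#define ELINKCONG 1000  /* arbitrary value for this sketch only */
#define MAX_SOCKS 4

/* Hypothetical single-level link: 'len' queued messages, 'limit' the
 * backlog limit, 'waiting' the sockets told to block. */
struct link {
	int len;
	int limit;
	bool waiting[MAX_SOCKS];
};

/* The message is queued even when the link is congested; the sender is
 * then registered for wakeup and is expected not to send again until woken. */
static int link_xmit(struct link *l, int sock)
{
	bool congested = l->len >= l->limit;

	l->len++;
	if (congested) {
		l->waiting[sock] = true;
		return -ELINKCONG;
	}
	return 0;
}

/* Acknowledged messages leave the backlog; wake at most as many blocked
 * senders as there is free room. Returns the number of senders woken. */
static int link_ack(struct link *l, int acked)
{
	int room, woken = 0;

	l->len -= acked;
	room = l->limit - l->len;
	for (int s = 0; s < MAX_SOCKS && woken < room; s++) {
		if (l->waiting[s]) {
			l->waiting[s] = false;
			woken++;
		}
	}
	return woken;
}
```

Waking only as many senders as there is room for follows the "as permitted by available space" wording of link_prepare_wakeup(), though the real function also accounts per importance level.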

The fact that a socket now can consider a message sent even when the
link returns a congestion code means that the sending socket code can
be simplified. Also, since this is a good opportunity to get rid of the
obsolete 'mtu change' condition in the three socket send functions, we
now choose to refactor those functions completely.

Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/bcast.c  |   2 +-
 net/tipc/link.c   |  66 +--
 net/tipc/msg.h|   2 -
 net/tipc/node.c   |   2 +-
 net/tipc/socket.c | 346 --
 5 files changed, 184 insertions(+), 234 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index aa1babb..1a56cab 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -174,7 +174,7 @@ static void tipc_bcbase_xmit(struct net *net, struct sk_buff_head *xmitq)
  *and to identified node local sockets
  * @net: the applicable net namespace
  * @list: chain of buffers containing message
- * Consumes the buffer chain, except when returning -ELINKCONG
+ * Consumes the buffer chain.
  * Returns 0 if success, otherwise errno: -ELINKCONG,-EHOSTUNREACH,-EMSGSIZE
  */
 int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list)
diff --git a/net/tipc/link.c b/net/tipc/link.c
index bda89bf..5f2b478 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -776,60 +776,55 @@ int tipc_link_timeout(struct tipc_link *l, struct sk_buff_head *xmitq)
 
 /**
  * link_schedule_user - schedule a message sender for wakeup after congestion
- * @link: congested link
- * @list: message that was attempted sent
+ * @l: congested link
+ * @hdr: header of message that is being sent
  * Create pseudo msg to send back to user when congestion abates
- * Does not consume buffer list
+ * Consumes buffer list
  */
-static int link_schedule_user(struct tipc_link *link, struct sk_buff_head *list)
+static int link_schedule_user(struct tipc_link *l, struct tipc_msg *hdr)
 {
-   struct tipc_msg *msg = buf_msg(skb_peek(list));
-   int imp = msg_importance(msg);
-   u32 oport = msg_origport(msg);
-   u32 addr = tipc_own_addr(link->net);
+   int imp = msg_importance(hdr);
+   u32 dnode = tipc_own_addr(l->net);
+   u32 dport = msg_origport(hdr);
struct sk_buff *skb;
 
/* This really cannot happen...  */
if (unlikely(imp > TIPC_CRITICAL_IMPORTANCE)) {
-   pr_warn("%s<%s>, send queue full", link_rst_msg, link->name);
+   pr_warn("%s<%s>, send queue full", link_rst_msg, l->name);
return -ENOBUFS;
}
-   /* Non-blocking sender: */
-   if (TIPC_SKB_CB(skb_peek(list))->wakeup_pending)
-   return -ELINKCONG;
 
/* Create and schedule wakeup pseudo message */
skb = tipc_msg_create(SOCK_WAKEUP, 0, INT_H_SIZE, 0,
- addr, addr, oport, 0, 0);
+ dnode, l->addr, dport, 0, 0);
if (!skb)
return -ENOBUFS;
-   TIPC_SKB_CB(skb)->chain_sz = skb_queue_len(list);
+   msg_set_dest_droppable(buf_msg(skb), true);
TIPC_SKB_CB(skb)->chain_imp = imp;
-   skb_queue_tail(&link->wakeupq, skb);
-   link->stats.link_congs++;
+   skb_queue_tail(&l->wakeupq, skb);
+   l->stats.link_congs++;
return -ELINKCONG;
 }
 
 /**
  * link_prepare_wakeup - prepare users for wakeup after congestion
- * @link: congested link
- * Move a number of waiting users, as permitted by available space in
- * the send queue, from link wait queue to node wait queue for wakeup
+ * @l: congested link
+ * Wake up a number of waiting users, as perm

[tipc-discussion] [net-next v3 1/3] tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() functions

2016-12-12 Thread Jon Maloy
The functions tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() are very
similar. The latter function is also called from two locations, and
there will be more in the coming commits, which will all need to test on
different conditions.

Instead of making yet another duplicate of the function, we now
introduce a new macro tipc_wait_for_cond() where the wakeup condition
can be stated as an argument to the call. This macro replaces all
current and future uses of the two functions, which can now be
eliminated.
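The idea behind tipc_wait_for_cond(), a single wait loop whose wakeup condition is passed in as an argument and re-evaluated on every iteration, can be illustrated in user space. This is a hedged sketch, not the kernel code: struct fake_sock, fake_sock_err() and fake_wait() are hypothetical stand-ins, sleeping is simulated by decrementing the timeout, and, like the kernel macro, it relies on the GCC/Clang statement-expression extension.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for socket state; not a TIPC structure. */
struct fake_sock {
	int err;       /* pending socket error (negative errno), 0 if none */
	int link_cong; /* nonzero while the link is congested */
	int ticks;     /* simulated waits until congestion clears */
};

/* Same order as tipc_sk_sock_err(): socket error first, then timeout.
 * The connection-state and signal checks are left out of this sketch. */
static int fake_sock_err(struct fake_sock *s, long *timeout)
{
	if (s->err)
		return s->err;
	if (!*timeout)
		return -EAGAIN;
	return 0;
}

/* Simulated sleep: uses up one unit of timeout and advances the "link". */
static void fake_wait(struct fake_sock *s, long *timeout)
{
	assert(*timeout > 0);
	(*timeout)--;
	if (s->ticks > 0 && --s->ticks == 0)
		s->link_cong = 0;
}

/* The condition is an argument and is re-evaluated on every iteration. */
#define wait_for_cond(s_, timeout_, condition_)		\
({								\
	int rc_ = 0;						\
								\
	while (!(condition_)) {					\
		rc_ = fake_sock_err((s_), (timeout_));		\
		if (rc_)					\
			break;					\
		fake_wait((s_), (timeout_));			\
	}							\
	rc_;							\
})
```

Each call site states its own condition (for example !s.link_cong), which is what lets one macro take over from both tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg(): it returns 0 once the condition holds, or the first error (such as -EAGAIN when the timeout runs out).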

Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/socket.c | 108 +-
 1 file changed, 49 insertions(+), 59 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 333c5da..8f3ab08 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -110,7 +110,6 @@ static void tipc_write_space(struct sock *sk);
 static void tipc_sock_destruct(struct sock *sk);
 static int tipc_release(struct socket *sock);
 static int tipc_accept(struct socket *sock, struct socket *new_sock, int flags);
-static int tipc_wait_for_sndmsg(struct socket *sock, long *timeo_p);
 static void tipc_sk_timeout(unsigned long data);
 static int tipc_sk_publish(struct tipc_sock *tsk, uint scope,
   struct tipc_name_seq const *seq);
@@ -334,6 +333,49 @@ static int tipc_set_sk_state(struct sock *sk, int state)
return res;
 }
 
+static int tipc_sk_sock_err(struct socket *sock, long *timeout)
+{
+   struct sock *sk = sock->sk;
+   int err = sock_error(sk);
+   int typ = sock->type;
+
+   if (err)
+   return err;
+   if (typ == SOCK_STREAM || typ == SOCK_SEQPACKET) {
+   if (sk->sk_state == TIPC_DISCONNECTING)
+   return -EPIPE;
+   else if (!tipc_sk_connected(sk))
+   return -ENOTCONN;
+   }
+   if (!*timeout)
+   return -EAGAIN;
+   if (signal_pending(current))
+   return sock_intr_errno(*timeout);
+
+   return 0;
+}
+
+#define tipc_wait_for_cond(sock_, timeout_, condition_)             \
+({                                                                  \
+   int rc_ = 0;                                                     \
+   int done_ = 0;                                                   \
+                                                                    \
+   while (!(condition_) && !done_) {                                \
+           struct sock *sk_ = sock->sk;                             \
+           DEFINE_WAIT_FUNC(wait_, woken_wake_function);            \
+                                                                    \
+           rc_ = tipc_sk_sock_err(sock_, timeout_);                 \
+           if (rc_)                                                 \
+                   break;                                           \
+           prepare_to_wait(sk_sleep(sk_), &wait_,                   \
+                           TASK_INTERRUPTIBLE);                     \
+           done_ = sk_wait_event(sk_, timeout_,                     \
+                                 (condition_), &wait_);             \
+           remove_wait_queue(sk_sleep(sk), &wait_);                 \
+   }                                                                \
+   rc_;                                                             \
+})
+
 /**
  * tipc_sk_create - create a TIPC socket
  * @net: network namespace (must be default network)
@@ -719,7 +761,7 @@ static int tipc_sendmcast(struct  socket *sock, struct tipc_name_seq *seq,
 
if (rc == -ELINKCONG) {
tsk->link_cong = 1;
-   rc = tipc_wait_for_sndmsg(sock, &timeo);
+   rc = tipc_wait_for_cond(sock, &timeo, !tsk->link_cong);
if (!rc)
continue;
}
@@ -828,31 +870,6 @@ static void tipc_sk_proto_rcv(struct tipc_sock *tsk, struct sk_buff *skb,
kfree_skb(skb);
 }
 
-static int tipc_wait_for_sndmsg(struct socket *sock, long *timeo_p)
-{
-   DEFINE_WAIT_FUNC(wait, woken_wake_function);
-   struct sock *sk = sock->sk;
-   struct tipc_sock *tsk = tipc_sk(sk);
-   int done;
-
-   do {
-   int err = sock_error(sk);
-   if (err)
-   return err;
-   if (sk->sk_shutdown & SEND_SHUTDOWN)
-   return -EPIPE;
-   if (!*timeo_p)
-   return -EAGAIN;
-   if (signal_pending(current))
-   return sock_intr_errno(*timeo_p);
-
-   add_wait_queue(sk_sleep(sk), &wait);
-   done = sk_wait_event(sk, timeo_p, !tsk->link_cong, &wait);
-   remove_wait_queue(sk_sleep(sk), &wait);
-   } while (!done);
-   return 0;
-}
-
 /**
  * tipc_sendmsg - se

[tipc-discussion] [net-next v3 0/3] tipc: improve interaction socket-link

2016-12-12 Thread Jon Maloy
We fix a very real starvation problem that may occur when a link
encounters send buffer congestion. At the same time we make the 
interaction between the socket and link layer simpler and more 
consistent.

v2: - Simplified link congestion check to only check against own
  importance limit. This reduces the risk of higher levels
  starving out lower levels.
v3: - Adding one sent message to the link backlog queue even if there is
  congestion, as suggested by Partha.
- Allowing link_wakeup() loop to continue adding messages to the
  backlog queue even if one or more levels are congested. This
  seems to have a positive effect on performance.
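The v2 change (checking only against the sender's own importance limit) amounts to a per-level test like the one sketched below. The struct names, the four-level array and the numbers are assumptions for illustration; only the len/limit comparison mirrors the l->backlog[imp] fields visible in the patches.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_LEVELS 4 /* assumption: low, medium, high, critical */

struct backlog {
	int len;   /* messages currently queued at this level */
	int limit; /* congestion threshold for this level */
};

/* Hypothetical link with one backlog per importance level. */
struct sim_link {
	struct backlog backlog[NUM_LEVELS];
};

/* Congested only if the sender's OWN level has reached its limit; a full
 * queue at another level does not block this sender. */
static bool link_congested(const struct sim_link *l, int imp)
{
	assert(imp >= 0 && imp < NUM_LEVELS);
	return l->backlog[imp].len >= l->backlog[imp].limit;
}
```

With this test a saturated high-importance level cannot make a low-importance sender block, which is the starvation risk the v2 note refers to.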

Jon Maloy (3):
  tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg()
functions
  tipc: modify struct tipc_plist to be more versatile
  tipc: reduce risk of user starvation during link congestion

 net/tipc/bcast.c  |   2 +-
 net/tipc/link.c   |  66 
 net/tipc/msg.h|   2 -
 net/tipc/name_table.c | 100 +++
 net/tipc/name_table.h |  21 +--
 net/tipc/node.c   |   2 +-
 net/tipc/socket.c | 448 ++
 7 files changed, 309 insertions(+), 332 deletions(-)

-- 
2.7.4


--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion


Re: [tipc-discussion] [PATCH 3.10 00/17] patches for xiang in 3.10

2016-12-15 Thread Jon Maloy
Hi Xiang,

You have to apply the patches one by one, in order 1-17.  Since your 
Linux version is not exactly the same as the one from which I generated 
the patches, you may also need to try with the switch "-3", e.g. "git am 
-3 1.eml".

This will try to do a 3-way merge in case the patch doesn't apply cleanly.

///jon



On 12/15/2016 03:46 AM, XIANG Haiming wrote:
> Hi Ying,
>
> I have installed Thunderbird to receive these patch files and saved them as
> *.eml files (1.eml to 17.eml; see the attached file).
>
> Then I used the command "git am *.eml" and got some errors (for details, see
> patch_result.txt).
>
> Please help me analyze these errors. Thank you.
>
>
>
>
> -Original Message-
> From: Ying Xue [mailto:ying....@windriver.com]
> Sent: 2016年12月12日 17:18
> To: XIANG Haiming; Jon Maloy; tipc-discussion@lists.sourceforge.net; 
> parthasarathy.bhuvara...@ericsson.com
> Cc: ma...@donjonn.com
> Subject: Re: [PATCH 3.10 00/17] patches for xiang in 3.10
>
> On 12/12/2016 03:53 PM, XIANG Haiming wrote:
>> Hi Ying,
>>
>> I tried to convert your email ([PATCH 3.10 04/17] tipc: don't use memcpy to
>> copy from user space) to the patch file 3.patch.
>>
>> I used the command "patch -p0 < /root/3.patch" and got the following error:
>>
>> patching file net/tipc/msg.c
>> Hunk #1 succeeded at 76 with fuzz 2.
>> Hunk #2 FAILED at 92.
>> 1 out of 2 hunks FAILED -- saving rejects to file net/tipc/msg.c.rej
>>
>> Do you know why the second change cannot be applied?
> The good news is that your patch format is recognized by the "patch"
> command, but the bad news is that you hit a conflict between your patch and
> your Linux kernel source code, which means that you may still lack some
> necessary TIPC patches. You can see which conflict you
> encountered by examining net/tipc/msg.c.rej. Meanwhile, I strongly suggest
> you don't use "patch" to apply patches to your Linux source code, because
> it's hard to revert a patch applied by the "patch" command.
>
> As I don't have your environment at hand, it's hard for me to help you.
>
>>
>> -Original Message-
>> From: Ying Xue [mailto:ying@windriver.com]
>> Sent: 2016年12月9日 18:01
>> To: XIANG Haiming; Jon Maloy; tipc-discussion@lists.sourceforge.net;
>> parthasarathy.bhuvara...@ericsson.com
>> Cc: ma...@donjonn.com
>> Subject: Re: [PATCH 3.10 00/17] patches for xiang in 3.10
>>
>> On 12/09/2016 03:30 PM, XIANG Haiming wrote:
>>> Hi Ying,
>>>
>>> I have two questions about these patches:
>>>
>>> 1. There is no " net/tipc/server.c" file in our Kernel
>>> 3.10.0-327.18.2.el7.x86_64
>> It's a bit strange. server.c has been in the tree since v3.10-rc4-858-gc5fa7b3.
>>
>>> 2. I saved the 17 patch emails as xxx.msg files (1.msg to 17.msg),
>>> then copied these .msg files to one directory, e.g. tipc-patch, and
>>> then ran "git am tipc-patch". But no file was changed.
>>>
>>> If I use the command "git am 1.msg", I get the error "Patch format detection
>>> failed."
>> Yes, this command is right. I often use it to apply email-format patches
>> to the kernel.
>>
>> You probably use the Outlook email client. I use Thunderbird, and this
>> way works fine for me.
>>
>>> If I use the command "git am 1.txt" (after saving the email as 1.txt), I get
>>> the error "Patch format detection failed."
>>>
>>> Maybe there is some misunderstanding; please tell me the correct way to
>>> apply the patches. Thank you.
>>>
>>>
>>>
>>> -Original Message-
>>> From: Ying Xue [mailto:ying@windriver.com]
>>> Sent: 2016年12月7日 19:28
>>> To: XIANG Haiming; Jon Maloy; tipc-discussion@lists.sourceforge.net;
>>> parthasarathy.bhuvara...@ericsson.com
>>> Cc: ma...@donjonn.com
>>> Subject: Re: [PATCH 3.10 00/17] patches for xiang in 3.10
>>>
>>> Hi Xiang,
>>>
>>> You can save all the patches in your email inbox as files, one by one. Then,
>>> from the top-level folder of your Linux kernel source, run the command below:
>>>
>>> git am <patch file>
>>>
>>> where <patch file> is the file you just saved from the email.
>>>
>>> Once all 17 patches are applied to the kernel source tree, you can
>>> compile your kernel as well as the TIPC module.
>>>
>>> Regards,
>>> Ying
>>>
>>> On 12/07/201

Re: [tipc-discussion] [net-next v3 2/3] tipc: modify struct tipc_plist to be more versatile

2016-12-13 Thread Jon Maloy


> -Original Message-
> From: Ying Xue [mailto:ying@windriver.com]
> Sent: Tuesday, 13 December, 2016 06:04
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net;
> Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>
> Cc: ma...@donjonn.com; thompa@gmail.com
> Subject: Re: [net-next v3 2/3] tipc: modify struct tipc_plist to be more versatile
> 
> On 12/13/2016 06:42 AM, Jon Maloy wrote:
> > During multicast reception we currently use a simple linked list with
> > push/pop semantics to store port numbers.
> >
> > We now see a need for a more generic list for storing values of type
> > u32. We therefore make some modifications to this list, while replacing
> > the prefix 'tipc_plist_' with 'u32_'.
> 
> It's a shame that we cannot use the interfaces defined in lib/plist.c;
> otherwise, we would not need to implement this ourselves.
> 
> I still prefer the "tipc_u32" prefix because the function names are a
> bit too generic without "tipc". Especially when we analyze stack traces,
> generic function names cause some inconvenience for us.

That is what I used first, but some code lines became awkwardly long, so I had 
to split if-clauses and function calls over two lines, something I generally 
try to avoid.
Also, remember that this is an internal function that is only called from other 
functions having the "tipc_" prefix, so you will never be in doubt where the 
problem is if you see a stack dump.

///jon

> 
> Regards,
> Ying
> 
> > We also add a couple of new
> > functions which will come into use in the next commits.
> >
> > Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
> > ---
> >  net/tipc/name_table.c | 100 ---
> ---
> >  net/tipc/name_table.h |  21 ---
> >  net/tipc/socket.c |   8 ++--
> >  3 files changed, 83 insertions(+), 46 deletions(-)
> >
> > diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
> > index e190460..5a86df1 100644
> > --- a/net/tipc/name_table.c
> > +++ b/net/tipc/name_table.c
> > @@ -608,7 +608,7 @@ u32 tipc_nametbl_translate(struct net *net, u32 type, u32 instance,
> >   * Returns non-zero if any off-node ports overlap
> >   */
> >  int tipc_nametbl_mc_translate(struct net *net, u32 type, u32 lower, u32 upper,
> > - u32 limit, struct tipc_plist *dports)
> > + u32 limit, struct list_head *dports)
> >  {
> > struct name_seq *seq;
> > struct sub_seq *sseq;
> > @@ -633,7 +633,7 @@ int tipc_nametbl_mc_translate(struct net *net, u32 type, u32 lower, u32 upper,
> > info = sseq->info;
> > list_for_each_entry(publ, &info->node_list, node_list) {
> > if (publ->scope <= limit)
> > -   tipc_plist_push(dports, publ->ref);
> > +   u32_push(dports, publ->ref);
> > }
> >
> > if (info->cluster_list_size != info->node_list_size)
> > @@ -1022,40 +1022,84 @@ int tipc_nl_name_table_dump(struct sk_buff *skb, struct netlink_callback *cb)
> > return skb->len;
> >  }
> >
> > -void tipc_plist_push(struct tipc_plist *pl, u32 port)
> > +struct u32_item {
> > +   struct list_head list;
> > +   u32 value;
> > +};
> > +
> > +bool u32_find(struct list_head *l, u32 value)
> >  {
> > -   struct tipc_plist *nl;
> > +   struct u32_item *item;
> >
> > -   if (likely(!pl->port)) {
> > -   pl->port = port;
> > -   return;
> > +   list_for_each_entry(item, l, list) {
> > +   if (item->value == value)
> > +   return true;
> > }
> > -   if (pl->port == port)
> > -   return;
> > -   list_for_each_entry(nl, &pl->list, list) {
> > -   if (nl->port == port)
> > -   return;
> > +   return false;
> > +}
> > +
> > +bool u32_push(struct list_head *l, u32 value)
> > +{
> > +   struct u32_item *item;
> > +
> > +   list_for_each_entry(item, l, list) {
> > +   if (item->value == value)
> > +   return false;
> > +   }
> > +   item = kmalloc(sizeof(*item), GFP_ATOMIC);
> > +   if (unlikely(!item))
> > +   return false;
> > +
> > +   item->value = value;
> > +   list_add(&item->list, l);
> > +   return true;
> > +}
> > +
> > +u32 u32_pop(struct l

Re: [tipc-discussion] [net-next v3 3/3] tipc: reduce risk of user starvation during link congestion

2016-12-13 Thread Jon Maloy


> -Original Message-
> From: Ying Xue [mailto:ying@windriver.com]
> Sent: Tuesday, 13 December, 2016 07:39
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net;
> Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>
> Cc: ma...@donjonn.com; thompa@gmail.com
> Subject: Re: [net-next v3 3/3] tipc: reduce risk of user starvation during link congestion
> 
> On 12/13/2016 06:42 AM, Jon Maloy wrote:
> >  void link_prepare_wakeup(struct tipc_link *l)
> >  {
> > -   int pnd[TIPC_SYSTEM_IMPORTANCE + 1] = {0,};
> > -   int imp, lim;
> > struct sk_buff *skb, *tmp;
> > +   int imp, i = 0;
> >
> > skb_queue_walk_safe(&l->wakeupq, skb, tmp) {
> > imp = TIPC_SKB_CB(skb)->chain_imp;
> > -   lim = l->backlog[imp].limit;
> > -   pnd[imp] += TIPC_SKB_CB(skb)->chain_sz;
> > -   if ((pnd[imp] + l->backlog[imp].len) >= lim)
> > +   if (l->backlog[imp].len < l->backlog[imp].limit) {
> > +   skb_unlink(skb, &l->wakeupq);
> > +   skb_queue_tail(l->inputq, skb);
> > +   } else if (i++ > 10) {
> 
> Regarding the wakeup skb number, perhaps we can make it smarter; for example,
> its value could be derived from the link window size and the available
> backlog queue space, or something else. As the value is an important factor
> for us, I think it is worth more consideration.

Sure, we can make it smarter, but I don't see why you consider this figure 
important. This is just an "emergency brake" in case the wakeup queue is very 
long.
If we have failed to find a user to wake up more than ten times it is very 
likely that all relevant levels are congested, and it is meaningless and a 
waste of CPU to continue. Note that the iterator is NOT stepped when we find a 
user to wake up.
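A user-space sketch of the loop under discussion, showing that the miss counter i only advances when an entry's importance level is still congested: the loop gives up after more than ten misses in total, while successful wakeups do not count against that budget. The array-based queue, the names and the four-level layout are hypothetical; only the `i++ > 10` test and the len/limit comparison are taken from the quoted patch.

```c
#include <assert.h>

#define NUM_LEVELS 4 /* assumption: low, medium, high, critical */

struct backlog {
	int len;   /* messages currently queued at this level */
	int limit; /* congestion threshold for this level */
};

/*
 * Hypothetical array-based wakeup queue: imp[k] is the importance level
 * of the k-th waiting sender; woken[k] is set to 1 when it is released
 * (in the patch: skb_unlink() from wakeupq, then skb_queue_tail() to inputq).
 * Returns the number of senders released.
 */
static int prepare_wakeup(const struct backlog *bl, const int *imp,
			  int *woken, int n)
{
	int i = 0, k, nwoken = 0;

	for (k = 0; k < n; k++) {
		assert(imp[k] >= 0 && imp[k] < NUM_LEVELS);
		if (bl[imp[k]].len < bl[imp[k]].limit) {
			woken[k] = 1;
			nwoken++;
		} else if (i++ > 10) {
			/* Emergency brake: only misses advance i. */
			break;
		}
	}
	return nwoken;
}
```

In this sketch a queue whose first twelve entries all sit on a congested level stops before reaching a sender that could have been woken, which is exactly the trade-off (bounded CPU time versus a possibly missed wakeup) being debated here.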

Regards
///jon


> 
> Regards,
> Ying
> 
> 
> > break;
> > -   skb_unlink(skb, &l->wakeupq);
> > -   skb_queue_tail(l->inputq, skb);
> > +   }
> > }
> >  }




[tipc-discussion] [net-next 2/3] tipc: modify struct tipc_plist to be more versatile

2017-01-14 Thread Jon Maloy
During multicast reception we currently use a simple linked list with
push/pop semantics to store port numbers.

We now see a need for a more generic list for storing values of type
u32. We therefore make some modifications to this list, while replacing
the prefix 'tipc_plist_' with 'u32_'. We also add a couple of new
functions which will come into use in the next commits.
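The push/pop semantics of the new u32 list can be mirrored in user space with a plain singly linked list. This is a sketch under stated assumptions, not the kernel implementation: struct u32_node replaces struct u32_item on a list_head, and malloc()/free() stand in for kmalloc(GFP_ATOMIC)/kfree(). As in the patch, u32_push() refuses duplicates and inserts at the head, and u32_pop() returns 0 for an empty list, so a stored value of 0 is indistinguishable from "empty".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space stand-in for struct u32_item linked on a list_head. */
struct u32_node {
	struct u32_node *next;
	uint32_t value;
};

static bool u32_find(struct u32_node *head, uint32_t value)
{
	for (; head; head = head->next)
		if (head->value == value)
			return true;
	return false;
}

/* Refuse duplicates, insert at the head (like list_add()). */
static bool u32_push(struct u32_node **head, uint32_t value)
{
	struct u32_node *item;

	if (u32_find(*head, value))
		return false;
	item = malloc(sizeof(*item));
	if (!item)
		return false;
	item->value = value;
	item->next = *head;
	*head = item;
	return true;
}

/* Remove and return the first entry; 0 doubles as "list empty". */
static uint32_t u32_pop(struct u32_node **head)
{
	struct u32_node *item = *head;
	uint32_t value;

	if (!item)
		return 0;
	value = item->value;
	*head = item->next;
	free(item);
	return value;
}

/* Free every entry, leaving an empty list. */
static void u32_list_purge(struct u32_node **head)
{
	while (*head)
		u32_pop(head);
}
```

Because push inserts at the head and pop removes from the head, the list behaves as a duplicate-free stack, matching the "push/pop semantics" the commit message describes.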

Acked-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>
Acked-by: Ying Xue <ying@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/name_table.c | 100 --
 net/tipc/name_table.h |  21 ---
 net/tipc/socket.c |   8 ++--
 3 files changed, 83 insertions(+), 46 deletions(-)

diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index e190460..5a86df1 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -608,7 +608,7 @@ u32 tipc_nametbl_translate(struct net *net, u32 type, u32 instance,
  * Returns non-zero if any off-node ports overlap
  */
 int tipc_nametbl_mc_translate(struct net *net, u32 type, u32 lower, u32 upper,
- u32 limit, struct tipc_plist *dports)
+ u32 limit, struct list_head *dports)
 {
struct name_seq *seq;
struct sub_seq *sseq;
@@ -633,7 +633,7 @@ int tipc_nametbl_mc_translate(struct net *net, u32 type, u32 lower, u32 upper,
info = sseq->info;
list_for_each_entry(publ, &info->node_list, node_list) {
if (publ->scope <= limit)
-   tipc_plist_push(dports, publ->ref);
+   u32_push(dports, publ->ref);
}
 
if (info->cluster_list_size != info->node_list_size)
@@ -1022,40 +1022,84 @@ int tipc_nl_name_table_dump(struct sk_buff *skb, struct netlink_callback *cb)
return skb->len;
 }
 
-void tipc_plist_push(struct tipc_plist *pl, u32 port)
+struct u32_item {
+   struct list_head list;
+   u32 value;
+};
+
+bool u32_find(struct list_head *l, u32 value)
 {
-   struct tipc_plist *nl;
+   struct u32_item *item;
 
-   if (likely(!pl->port)) {
-   pl->port = port;
-   return;
+   list_for_each_entry(item, l, list) {
+   if (item->value == value)
+   return true;
}
-   if (pl->port == port)
-   return;
-   list_for_each_entry(nl, &pl->list, list) {
-   if (nl->port == port)
-   return;
+   return false;
+}
+
+bool u32_push(struct list_head *l, u32 value)
+{
+   struct u32_item *item;
+
+   list_for_each_entry(item, l, list) {
+   if (item->value == value)
+   return false;
+   }
+   item = kmalloc(sizeof(*item), GFP_ATOMIC);
+   if (unlikely(!item))
+   return false;
+
+   item->value = value;
+   list_add(&item->list, l);
+   return true;
+}
+
+u32 u32_pop(struct list_head *l)
+{
+   struct u32_item *item;
+   u32 value = 0;
+
+   if (list_empty(l))
+   return 0;
+   item = list_first_entry(l, typeof(*item), list);
+   value = item->value;
+   list_del(&item->list);
+   kfree(item);
+   return value;
+}
+
+bool u32_del(struct list_head *l, u32 value)
+{
+   struct u32_item *item, *tmp;
+
+   list_for_each_entry_safe(item, tmp, l, list) {
+   if (item->value != value)
+   continue;
+   list_del(&item->list);
+   kfree(item);
+   return true;
}
-   nl = kmalloc(sizeof(*nl), GFP_ATOMIC);
-   if (nl) {
-   nl->port = port;
-   list_add(&nl->list, &pl->list);
+   return false;
+}
+
+void u32_list_purge(struct list_head *l)
+{
+   struct u32_item *item, *tmp;
+
+   list_for_each_entry_safe(item, tmp, l, list) {
+   list_del(&item->list);
+   kfree(item);
}
 }
 
-u32 tipc_plist_pop(struct tipc_plist *pl)
+int u32_list_len(struct list_head *l)
 {
-   struct tipc_plist *nl;
-   u32 port = 0;
+   struct u32_item *item;
+   int i = 0;
 
-   if (likely(list_empty(&pl->list))) {
-   port = pl->port;
-   pl->port = 0;
-   return port;
+   list_for_each_entry(item, l, list) {
+   i++;
}
-   nl = list_first_entry(&pl->list, typeof(*nl), list);
-   port = nl->port;
-   list_del(&nl->list);
-   kfree(nl);
-   return port;
+   return i;
 }
diff --git a/net/tipc/name_table.h b/net/tipc/name_table.h
index 1524a73..c89bb3f 100644
--- a/net/tipc/name_table.h
+++ b/net/tipc/name_table.h
@@ -99,7 +99,7 @@ int tipc_nl_name_table_dump(struct sk_buff *skb, struct netlink_callback *cb);
 
 u32 tipc_nametbl_translate(struct net *net, 

[tipc-discussion] [net-next 0/3] tipc: improve interaction socket-link

2017-01-14 Thread Jon Maloy
We fix a very real starvation problem that may occur when a link
encounters send buffer congestion. At the same time we make the 
interaction between the socket and link layer simpler and more 
consistent.

Jon Maloy (3):
  tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg()
functions
  tipc: modify struct tipc_plist to be more versatile
  tipc: reduce risk of user starvation during link congestion

 net/tipc/bcast.c  |   6 +-
 net/tipc/link.c   |  75 -
 net/tipc/msg.h|   2 -
 net/tipc/name_table.c | 100 +++
 net/tipc/name_table.h |  21 +--
 net/tipc/node.c   |  15 +-
 net/tipc/socket.c | 449 ++
 7 files changed, 319 insertions(+), 349 deletions(-)

-- 
2.7.4




[tipc-discussion] [net-next 3/3] tipc: reduce risk of user starvation during link congestion

2017-01-14 Thread Jon Maloy
The socket code currently handles link congestion by either blocking
and trying to send again when the congestion has abated, or just
returning to the user with -EAGAIN and letting him retry later.

This mechanism is prone to starvation, because the wakeup algorithm is
non-atomic. During the time the link issues a wakeup signal, until the
socket wakes up and re-attempts sending, other senders may have come
in between and occupied the free buffer space in the link. This in turn
may lead to a socket having to make many send attempts before it is
successful. In extremely loaded systems we have observed latency times
of several seconds before a low-priority socket is able to send out a
message.

In this commit, we simplify this mechanism and reduce the risk of the
described scenario happening. When an attempt is made to send a message
via a congested link, we now let it be added to the link's backlog queue
anyway, thus permitting an oversubscription of one message per source
socket. We still create a wakeup item and return an error code, hence
instructing the sender to block or stop sending. Only when enough space
has been freed up in the link's backlog queue do we issue a wakeup event
that allows the sender to continue with the next message, if any.
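The new send-side contract can be simulated as follows. It is a simplified sketch: struct sim_link, SIM_ELINKCONG and the single backlog counter are hypothetical (the real link keeps one backlog per importance level and schedules a SOCK_WAKEUP pseudo message). What it shows is the ordering described above: a message that meets a full backlog is still queued, a wakeup is scheduled, and the congestion code tells the sender to stop before its next message.

```c
#include <assert.h>

/* Hypothetical stand-in for -ELINKCONG; the real constant is TIPC-internal. */
#define SIM_ELINKCONG (-1000)

struct sim_link {
	int backlog_len;   /* messages queued for transmission */
	int backlog_limit; /* congestion threshold (one level only, for brevity) */
	int wakeups;       /* scheduled wakeup pseudo messages */
};

/* At congestion the message is still queued, a wakeup is scheduled
 * and the congestion code is returned. */
static int sim_link_xmit(struct sim_link *l)
{
	int rc = 0;

	if (l->backlog_len >= l->backlog_limit) {
		l->wakeups++;          /* cf. link_schedule_user() */
		rc = SIM_ELINKCONG;
	}
	l->backlog_len++;              /* accepted in both cases */
	return rc;
}

/* Sender loop: the congestion code no longer means "not sent", it means
 * "this one was queued, but block (or return -EAGAIN) before the next". */
static int sim_send_burst(struct sim_link *l, int n)
{
	int sent = 0;

	while (sent < n) {
		int rc = sim_link_xmit(l);

		sent++;
		if (rc == SIM_ELINKCONG)
			break;
	}
	return sent;
}
```

With a limit of 3, a burst of ten messages stops after the fourth, leaving the backlog one message over its limit: the oversubscription of one message per source socket mentioned above.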

The fact that a socket now can consider a message sent even when the
link returns a congestion code means that the sending socket code can
be simplified. Also, since this is a good opportunity to get rid of the
obsolete 'mtu change' condition in the three socket send functions, we
now choose to refactor those functions completely.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>
Acked-by: Ying Xue <ying@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/bcast.c  |   6 +-
 net/tipc/link.c   |  75 +---
 net/tipc/msg.h|   2 -
 net/tipc/node.c   |  15 +--
 net/tipc/socket.c | 347 --
 5 files changed, 194 insertions(+), 251 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index aa1babb..c35fad3 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -174,7 +174,7 @@ static void tipc_bcbase_xmit(struct net *net, struct sk_buff_head *xmitq)
  *and to identified node local sockets
  * @net: the applicable net namespace
  * @list: chain of buffers containing message
- * Consumes the buffer chain, except when returning -ELINKCONG
+ * Consumes the buffer chain.
  * Returns 0 if success, otherwise errno: -ELINKCONG,-EHOSTUNREACH,-EMSGSIZE
  */
 int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list)
@@ -197,7 +197,7 @@ int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list)
tipc_bcast_unlock(net);
 
/* Don't send to local node if adding to link failed */
-   if (unlikely(rc)) {
+   if (unlikely(rc && (rc != -ELINKCONG))) {
__skb_queue_purge(&rcvq);
return rc;
}
@@ -206,7 +206,7 @@ int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list)
tipc_bcbase_xmit(net, &xmitq);
tipc_sk_mcast_rcv(net, &rcvq, &inputq);
__skb_queue_purge(list);
-   return 0;
+   return rc;
 }
 
 /* tipc_bcast_rcv - receive a broadcast packet, and deliver to rcv link
diff --git a/net/tipc/link.c b/net/tipc/link.c
index bda89bf..b758ca8 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -776,60 +776,47 @@ int tipc_link_timeout(struct tipc_link *l, struct sk_buff_head *xmitq)
 
 /**
  * link_schedule_user - schedule a message sender for wakeup after congestion
- * @link: congested link
- * @list: message that was attempted sent
+ * @l: congested link
+ * @hdr: header of message that is being sent
  * Create pseudo msg to send back to user when congestion abates
- * Does not consume buffer list
  */
-static int link_schedule_user(struct tipc_link *link, struct sk_buff_head *list)
+static int link_schedule_user(struct tipc_link *l, struct tipc_msg *hdr)
 {
-   struct tipc_msg *msg = buf_msg(skb_peek(list));
-   int imp = msg_importance(msg);
-   u32 oport = msg_origport(msg);
-   u32 addr = tipc_own_addr(link->net);
+   u32 dnode = tipc_own_addr(l->net);
+   u32 dport = msg_origport(hdr);
struct sk_buff *skb;
 
-   /* This really cannot happen...  */
-   if (unlikely(imp > TIPC_CRITICAL_IMPORTANCE)) {
-   pr_warn("%s<%s>, send queue full", link_rst_msg, link->name);
-   return -ENOBUFS;
-   }
-   /* Non-blocking sender: */
-   if (TIPC_SKB_CB(skb_peek(list))->wakeup_pending)
-   return -ELINKCONG;
-
/* Create and schedule wakeup pseudo message */
skb = tipc_msg_create(SOCK_WAKEUP, 0, INT_H_SIZE, 0,
- addr, addr, oport, 0, 0);
+ dnode, l->addr, dport, 0, 0);
if (!skb)
return -ENOBUFS;
-   TIPC_SKB_CB(skb)->chai

Re: [tipc-discussion] [PATCH net v1 1/1] tipc: allocate user memory with GFP_KERNEL flag

2017-01-13 Thread Jon Maloy


> -Original Message-
> From: Parthasarathy Bhuvaragan
> Sent: Friday, 13 January, 2017 05:14
> To: tipc-discussion@lists.sourceforge.net; Jon Maloy <jon.ma...@ericsson.com>;
> Ying Xue <ying@windriver.com>; ru...@innovsys.com
> Subject: [PATCH net v1 1/1] tipc: allocate user memory with GFP_KERNEL flag
> 
> Until now, we have always allocated memory with the GFP_ATOMIC flag.
> When the system is under memory pressure and a user tries to send,
> the send may fail due to low memory. However, a user application
> can wait for free memory if we allocate it using the GFP_KERNEL flag.
> 
> In this commit, we allocate memory with GFP_KERNEL for all
> allocations in user context.

Acked-by: jon

> 
> Reported-by: Rune Torgersen <ru...@innovsys.com>
> Signed-off-by: Parthasarathy Bhuvaragan
> <parthasarathy.bhuvara...@ericsson.com>
> ---
>  net/tipc/discover.c   |  4 ++--
>  net/tipc/link.c   |  2 +-
>  net/tipc/msg.c| 16 
>  net/tipc/msg.h|  2 +-
>  net/tipc/name_distr.c |  2 +-
>  5 files changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/net/tipc/discover.c b/net/tipc/discover.c
> index 6b109a808d4c..02462d67d191 100644
> --- a/net/tipc/discover.c
> +++ b/net/tipc/discover.c
> @@ -169,7 +169,7 @@ void tipc_disc_rcv(struct net *net, struct sk_buff *skb,
> 
>   /* Send response, if necessary */
>   if (respond && (mtyp == DSC_REQ_MSG)) {
> - rskb = tipc_buf_acquire(MAX_H_SIZE);
> + rskb = tipc_buf_acquire(MAX_H_SIZE, GFP_ATOMIC);
>   if (!rskb)
>   return;
>   tipc_disc_init_msg(net, rskb, DSC_RESP_MSG, bearer);
> @@ -278,7 +278,7 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
>   req = kmalloc(sizeof(*req), GFP_ATOMIC);
>   if (!req)
>   return -ENOMEM;
> - req->buf = tipc_buf_acquire(MAX_H_SIZE);
> + req->buf = tipc_buf_acquire(MAX_H_SIZE, GFP_ATOMIC);
>   if (!req->buf) {
>   kfree(req);
>   return -ENOMEM;
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index b758ca8b2f79..b0f8646e0631 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -1384,7 +1384,7 @@ void tipc_link_tnl_prepare(struct tipc_link *l, struct tipc_link *tnl,
>   msg_set_seqno(hdr, seqno++);
>   pktlen = msg_size(hdr);
>   msg_set_size(&tnlhdr, pktlen + INT_H_SIZE);
> - tnlskb = tipc_buf_acquire(pktlen + INT_H_SIZE);
> + tnlskb = tipc_buf_acquire(pktlen + INT_H_SIZE, GFP_ATOMIC);
>   if (!tnlskb) {
>   pr_warn("%sunable to send packet\n", link_co_err);
>   return;
> diff --git a/net/tipc/msg.c b/net/tipc/msg.c
> index a22be502f1bd..ab02d0742476 100644
> --- a/net/tipc/msg.c
> +++ b/net/tipc/msg.c
> @@ -58,12 +58,12 @@ static unsigned int align(unsigned int i)
>   * NOTE: Headroom is reserved to allow prepending of a data link header.
>   *   There may also be unrequested tailroom present at the buffer's end.
>   */
> -struct sk_buff *tipc_buf_acquire(u32 size)
> +struct sk_buff *tipc_buf_acquire(u32 size, gfp_t gfp)
>  {
>   struct sk_buff *skb;
>   unsigned int buf_size = (BUF_HEADROOM + size + 3) & ~3u;
> 
> - skb = alloc_skb_fclone(buf_size, GFP_ATOMIC);
> + skb = alloc_skb_fclone(buf_size, gfp);
>   if (skb) {
>   skb_reserve(skb, BUF_HEADROOM);
>   skb_put(skb, size);
> @@ -95,7 +95,7 @@ struct sk_buff *tipc_msg_create(uint user, uint type,
>   struct tipc_msg *msg;
>   struct sk_buff *buf;
> 
> - buf = tipc_buf_acquire(hdr_sz + data_sz);
> + buf = tipc_buf_acquire(hdr_sz + data_sz, GFP_ATOMIC);
>   if (unlikely(!buf))
>   return NULL;
> 
> @@ -261,7 +261,7 @@ int tipc_msg_build(struct tipc_msg *mhdr, struct msghdr *m,
> 
>   /* No fragmentation needed? */
>   if (likely(msz <= pktmax)) {
> - skb = tipc_buf_acquire(msz);
> + skb = tipc_buf_acquire(msz, GFP_KERNEL);
>   if (unlikely(!skb))
>   return -ENOMEM;
>   skb_orphan(skb);
> @@ -282,7 +282,7 @@ int tipc_msg_build(struct tipc_msg *mhdr, struct msghdr *m,
>   msg_set_importance(&pkthdr, msg_importance(mhdr));
> 
>   /* Prepare first fragment */
> - skb = tipc_buf_acquire(pktmax);
> + skb = tipc_buf_acquire(pktmax, GFP_KERNEL);
>   if (!skb)
>   return -ENOMEM;
>   skb_orphan(skb);
> @@ -313,7 +313,7 @@ int tipc_msg_build(struct tipc_msg *mhdr, struct msghdr *m,
>  
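The buffer-size expression in tipc_buf_acquire(), (BUF_HEADROOM + size + 3) & ~3u, rounds headroom plus payload up to the next multiple of 4 bytes; the patch leaves that unchanged and only threads a gfp_t argument through to alloc_skb_fclone(). A user-space check of the rounding follows, with the headroom passed in as a parameter because the value of BUF_HEADROOM is not shown here; sim_buf_size() is a hypothetical name.

```c
#include <assert.h>

/*
 * Size computation as written in tipc_buf_acquire(): headroom plus payload,
 * rounded up to a multiple of 4 by adding 3 and clearing the two low bits.
 */
static unsigned int sim_buf_size(unsigned int headroom, unsigned int size)
{
	return (headroom + size + 3) & ~3u;
}
```

For example, a 64-byte headroom with a 5-byte payload gives 72, while a 1- to 4-byte payload gives 68.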

Re: [tipc-discussion] [net-next v3 3/4] tipc: introduce replicast as transport option for multicast

2017-01-13 Thread Jon Maloy


> -Original Message-
> From: Parthasarathy Bhuvaragan
> Sent: Friday, 13 January, 2017 04:24
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net;
> Ying Xue <ying@windriver.com>
> Subject: Re: [tipc-discussion] [net-next v3 3/4] tipc: introduce replicast as transport option for multicast
> 
> On 01/04/2017 06:05 PM, Parthasarathy Bhuvaragan wrote:
> > Hi Jon,
> >
> > Added some minor comments inline in this patch, apart from that the
> > major concern is the following:
> >
> > All my tests that passed before this patch fail when sending
> > multicast to a receiver on the same node.
> >
> > With this patch, we increase the likelihood of receive buffer overflow
> > if the sender & receivers are running on the same host, as we bypass the
> > link layer completely. I confirmed this with some traces in filter_rcv().
> >
> > If I add another multicast listener running on another node, this
> > pacifies the sender (puts the sender to sleep at link congestion) and the
> > relatively slow link layer reduces the buffer overflow.
> >
> > We need to find a way to reduce the aggressiveness of the sender.
> > We want the location of the services to be transparent to users, so
> > we should provide similar characteristics regardless of the service
> > location.
> >
> Jon, running the ptts server and client on a standalone node without your
> updates also failed. So in that respect, I am OK with this patch.
> 
> If the Ethernet bearer lacks broadcast ability, then neighbor discovery
> will not work. So do we intend to introduce support for adding Ethernet
> peers manually, as we do for UDP bearers? Otherwise we can never use
> replicast for non-UDP bearers.

I believe all Ethernet implementations, even overlay networks, provide some 
form of broadcast, or in lack thereof, an emulated broadcast.
So, discovery should work, but it will be very inefficient when we do link 
broadcast, because tipc will think that genuine Ethernet broadcast is supported.
We actually need some way to find out what kind of "Ethernet" we are attached 
to, e.g. VXLAN, so that the "bcast supported" flag can be set correctly.
I wonder if that is possible, or if it has to be configured.

///jon

> 
> /Partha
> 
> > /Partha
> >
> > On 01/02/2017 03:34 PM, Parthasarathy Bhuvaragan wrote:
> >> Hi jon,
> >>
> >> When I include this patch, ptts case 12 (multicast) fails when the
> >> client and server are running on the same node.
> >>
> >> /Partha
> >>
> >> On 12/22/2016 04:15 PM, Jon Maloy wrote:
> >>> TIPC multicast messages are currently carried over a reliable
> >>> 'broadcast link', making use of the underlying media's ability to
> >>> transport packets as L2 broadcast or IP multicast to all nodes in
> >>> the cluster.
> >>>
> >>> When the bearer in use lacks that ability, we can instead emulate
> >>> the broadcast service by replicating and sending the packets over as
> >>> many unicast links as needed to reach all identified destinations.
> >>> We now introduce a new TIPC link-level 'replicast' service that does
> >>> this.
> >>>
> >>> Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
> >>> ---
> >>>  net/tipc/bcast.c  | 105 ++
> >>>  net/tipc/bcast.h  |   3 +-
> >>>  net/tipc/link.c   |   8 -
> >>>  net/tipc/msg.c|  17 +
> >>>  net/tipc/msg.h|   9 +++--
> >>>  net/tipc/node.c   |  27 +-
> >>>  net/tipc/socket.c |  27 +-
> >>>  7 files changed, 149 insertions(+), 47 deletions(-)
> >>>
> >>> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
> >>> index 412d335..672e6ef 100644
> >>> --- a/net/tipc/bcast.c
> >>> +++ b/net/tipc/bcast.c
> >>> @@ -70,7 +70,7 @@ static struct tipc_bc_base *tipc_bc_base(struct net *net)
> >>>
> >>>  int tipc_bcast_get_mtu(struct net *net)
> >>>  {
> >>> - return tipc_link_mtu(tipc_bc_sndlink(net));
> >>> + return tipc_link_mtu(tipc_bc_sndlink(net)) - INT_H_SIZE;
> >>>  }
> >>>
> >>>  /* tipc_bcbase_select_primary(): find a bearer with links to all destinations,
> >>> @@ -175,42 +175,101 @@ static void tipc_bcbase_xmit(struct net *net, struct sk_buff_head *xmitq)
> >>>   __skb_queue_purge(&_xmitq);
> >>
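The replicast idea described in the patch above (replicate a multicast packet over as many unicast links as needed when the bearer cannot broadcast) can be sketched in user space as follows. All names here (`struct pkt`, `struct bearer_model`, `send_bcast()`, `send_unicast()`, `mcast_xmit()` and the `bcast_supported` flag) are hypothetical stand-ins, not the TIPC kernel API. The sketch only shows the choice between one broadcast transmission and one unicast copy per identified destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DESTS 16

/* Hypothetical packet: a destination node address and a payload. */
struct pkt {
	unsigned int dnode;
	char data[32];
};

/* Hypothetical bearer model that records what was put on the wire. */
struct bearer_model {
	int bcast_supported;             /* can the medium do L2 bcast / IP mcast? */
	int bcast_sent;                  /* true broadcast transmissions */
	struct pkt unicast_log[MAX_DESTS];
	size_t unicast_cnt;              /* unicast copies sent */
};

static void send_bcast(struct bearer_model *b, const struct pkt *p)
{
	(void)p;
	b->bcast_sent++;
}

static void send_unicast(struct bearer_model *b, const struct pkt *p)
{
	if (b->unicast_cnt < MAX_DESTS)
		b->unicast_log[b->unicast_cnt++] = *p;
}

/* Deliver one multicast packet to all identified destinations.
 * With broadcast support a single transmission reaches everyone;
 * otherwise the packet is replicated once per destination. */
static void mcast_xmit(struct bearer_model *b, const struct pkt *p,
		       const unsigned int *dests, size_t ndests)
{
	if (b->bcast_supported) {
		send_bcast(b, p);
		return;
	}
	for (size_t i = 0; i < ndests; i++) {
		struct pkt copy = *p;    /* replicate ... */
		copy.dnode = dests[i];   /* ... and address it to one peer */
		send_unicast(b, &copy);
	}
}
```

The cost of replicast grows linearly with the number of destinations, which is why it is only attractive when the bearer cannot broadcast.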

Re: [tipc-discussion] [PATCH net-next v2 0/3] tipc: improve interaction socket-link

2016-11-30 Thread Jon Maloy
Weird. Looks like a corrupted incoming buffer directly at startup, 
before any of my new code is active. Is this repeatable?

///jon


On 11/30/2016 08:52 AM, Parthasarathy Bhuvaragan wrote:
> Hi Jon,
>
> With your patches, I get the following crash when loading the tipc 
> module. Leaving home now, so I couldn't investigate further.
>
> [   58.201114] tipc: Started in single node mode
> [   58.212991] Started in network mode
> [   58.213796] Own node address <1.1.1>, network identity 4711
> [   58.238416] 8021q: adding VLAN 0 to HW filter on device data0
> [   58.252217] 8021q: adding VLAN 0 to HW filter on device data1
> [   58.270822] Enabled bearer , discovery domain <1.1.0>, priority 10
> [   58.571114] general protection fault:  [#1] SMP
> [   58.572031] Modules linked in: tipc ip6_udp_tunnel udp_tunnel 
> 9pnet_virtio 9p 9pnet virtio_net virtio_pci virtio_ring virtio
> [   58.572031] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc6+ #15
> [   58.572031] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [   58.572031] task: 81c0d540 task.stack: 81c0
> [   58.572031] RIP: 0010:[] [] skb_release_head_state+0x4d/0xa0
> [   58.572031] RSP: 0018:880037c03ba0  EFLAGS: 00010246
> [   58.572031] RAX: 0001 RBX: 880033fffa00 RCX: 00ff
> [   58.572031] RDX:  RSI: 880037c03bca RDI: 880033fffa00
> [   58.572031] RBP: 880037c03ba8 R08: a005f2c0 R09:
> [   58.572031] R10: 880035b0f0a0 R11: ea00 R12: 880033fffa00
> [   58.572031] R13: a0048fd4 R14: 81cfbec0 R15: 880033718000
> [   58.572031] FS:  () GS:880037c0() knlGS:
> [   58.572031] CS:  0010 DS:  ES:  CR0: 80050033
> [   58.572031] CR2: 00851bf0 CR3: 35b0 CR4: 06f0
> [   58.572031] Stack:
> [   58.572031]  880033fffa00 880037c03bc0 8162f2b2 880033fffa00
> [   58.572031]  880037c03be8 8162f327 880033fffa00
> [   58.572031]  880035b32540 880037c03c68 a0048fd4 0082
> [   58.572031] Call Trace:
> [   58.572031]   [   58.572031] [] skb_release_all+0x12/0x30
> [   58.572031]  [] kfree_skb+0x37/0xa0
> [   58.572031]  [] tipc_disc_rcv+0x84/0x1d0 [tipc]
> [   58.572031]  [] tipc_rcv+0x3ac/0x3c0 [tipc]
> [   58.572031]  [] ? find_busiest_group+0x117/0x940
> [   58.572031]  [] tipc_l2_rcv_msg+0x48/0x60 [tipc]
> [   58.572031]  [] __netif_receive_skb_core+0x2e5/0xa60
> [   58.572031]  [] ? __build_skb+0x2a/0xe0
> [   58.572031]  [] ? __build_skb+0x2a/0xe0
> [   58.572031]  [] __netif_receive_skb+0x1b/0x70
> [   58.572031]  [] netif_receive_skb_internal+0x2d/0x90
> [   58.572031]  [] napi_gro_receive+0x94/0x130
> [   58.572031]  [] virtnet_receive+0x1a5/0x8a0 [virtio_net]
> [   58.572031]  [] virtnet_poll+0x1d/0x80 [virtio_net]
> [   58.572031]  [] net_rx_action+0x20e/0x390
> [   58.572031]  [] __do_softirq+0x9b/0x2a2
> [   58.572031]  [] irq_exit+0x60/0x70
> [   58.572031]  [] do_IRQ+0x54/0xd0
> [   58.572031]  [] common_interrupt+0x7f/0x7f
> [   58.572031]   [   58.572031] [] ? default_idle+0x20/0xe0
> [   58.572031]  [] ? next_zone+0x29/0x30
> [   58.572031]  [] arch_cpu_idle+0xf/0x20
> [   58.572031]  [] default_idle_call+0x2c/0x30
> [   58.572031]  [] cpu_startup_entry+0x177/0x1e0
> [   58.572031]  [] rest_init+0x77/0x80
> [   58.572031]  [] start_kernel+0x40e/0x41b
> [   58.572031]  [] x86_64_start_reservations+0x2a/0x2c
> [   58.572031]  [] x86_64_start_kernel+0xea/0xed
> [   58.572031] Code: 00 00 48 8b 7b 68 48 85 ff 74 05 f0 ff 0f 74 36 48 8b 43 60 48 85 c0 74 14 65 8b 15 96 d3 9d 7e 81 e2 00 00 0f 00 75 30 48 89 df  d0 48 8b 7b 70 48 85 ff 74 05 f0 ff 0f 74 03 5b 5d c3 e8 bb
> [   58.572031] RIP  [] skb_release_head_state+0x4d/0xa0
> [   58.572031]  RSP 
> [   58.662814] ---[ end trace fa57695d3ce8757f ]---
> [   58.663875] Kernel panic - not syncing: Fatal exception in interrupt
> [   58.664872] Kernel Offset: disabled
> [   58.664872] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
>
> regards
> Partha
>
> On 11/29/2016 06:07 PM, Jon Maloy wrote:
>> Ying, Partha,
>> It would be very nice if I could get "acked" or "reviewed" on this so I 
>> can send it to David before net-next closes.
>>
>> ///jon
>>
>>
>>> -Original Message-
>>> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
>>> Sent: Tuesday, 29 November, 2016 12:04
>>> To: tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuva

[tipc-discussion] [PATCH net-next v2 0/3] tipc: improve interaction socket-link

2016-11-29 Thread Jon Maloy
We fix a very real starvation problem that may occur when the link
level runs into send buffer congestion. At the same time we make the 
interaction between the socket and link layer simpler and more 
consistent.

v2: - Simplified link congestion check to only check against own
  importance limit. This reduces the risk of higher levels
  starving out lower levels.
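A minimal sketch of a congestion check that looks only at the sender's own importance level, as the v2 note describes. The `backlog` array, the `limit` values and `link_congested()` are hypothetical; the four levels mirror TIPC's LOW, MEDIUM, HIGH and CRITICAL message importance. With such a check, a full backlog at one level does not block senders at the other levels.

```c
#include <assert.h>
#include <stdbool.h>

/* TIPC message importance levels: LOW, MEDIUM, HIGH, CRITICAL. */
enum { IMP_LOW, IMP_MEDIUM, IMP_HIGH, IMP_CRITICAL, IMP_LEVELS };

/* Hypothetical per-level backlog accounting. */
struct backlog_level {
	unsigned int len;    /* packets currently queued at this level */
	unsigned int limit;  /* max packets allowed at this level */
};

struct link_model {
	struct backlog_level backlog[IMP_LEVELS];
};

/* Congestion test against the sender's own importance level only. */
static bool link_congested(const struct link_model *l, int imp)
{
	return l->backlog[imp].len >= l->backlog[imp].limit;
}
```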

Jon Maloy (3):
  tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg()
functions
  tipc: modify struct tipc_plist to be more versatile
  tipc: reduce risk of user starvation during link congestion

 net/tipc/bcast.c  |   2 +-
 net/tipc/link.c   |  81 -
 net/tipc/msg.h|   8 +-
 net/tipc/name_table.c | 100 +++
 net/tipc/name_table.h |  21 +--
 net/tipc/node.c   |   2 +-
 net/tipc/socket.c | 450 ++
 7 files changed, 327 insertions(+), 337 deletions(-)

-- 
2.7.4


--
___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion


[tipc-discussion] [PATCH net-next v2 3/3] tipc: reduce risk of user starvation during link congestion

2016-11-29 Thread Jon Maloy
The socket code currently handles link congestion either by blocking
and trying to send again when the congestion has abated, or by just
returning -EAGAIN to the user and letting him retry later.

This mechanism is prone to starvation, because the wakeup algorithm is
non-atomic. Between the time the link issues a wakeup signal and the time
the socket wakes up and re-attempts sending, other senders may have come
in and occupied the free buffer space in the link. This in turn
may force a socket to make many send attempts before it is
successful. In extremely loaded systems we have observed latencies
of several seconds before a low-priority socket is able to send out a
message.

In this commit, we simplify this mechanism and reduce the risk of the
described scenario happening. When a message is sent to a congested
link, we now let the link keep the message in the wakeup item that it
has to create anyway, and immediately add it to the link's send queue
when enough space has been freed up. Only when this is done do we issue
a wakeup signal to the socket, which can then immediately go on and send
the next message, if any.

The fact that a socket now can consider a message sent even when the
link returns a congestion code means that the sending socket code can
be simplified. Also, since this is a good opportunity to get rid of the
obsolete 'mtu change' condition in the three socket send functions, we
now choose to refactor those functions completely.
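The new wakeup scheme can be modelled in user space roughly as follows. The structures and functions (`wakeup_item`, `link_model`, `link_xmit()`, `link_free_space()` and the `XMIT_*` codes) are hypothetical simplifications, not the TIPC code, and the packet chain is reduced to a packet count. A congested send parks the chain in the wakeup item; when space is freed, the link moves the chain onto its send queue before waking the socket, so no other sender can grab the freed space in between. Queueing new senders behind existing waiters is a choice made in this sketch to keep FIFO order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes (the kernel path uses -ELINKCONG, -ENOBUFS). */
enum { XMIT_OK = 0, XMIT_CONG = 1, XMIT_NOBUFS = 2 };

#define MAX_WAITERS 8

/* Wakeup item: the socket to wake plus its parked chain (as a count). */
struct wakeup_item {
	int sock_id;
	unsigned int npkts;
};

struct link_model {
	unsigned int queued;                 /* packets on the send queue */
	unsigned int limit;                  /* send queue capacity */
	struct wakeup_item waiters[MAX_WAITERS];
	size_t nwaiters;
	int woken[MAX_WAITERS];              /* first sockets woken, in order */
	size_t nwoken;                       /* total wakeups issued */
};

/* On congestion the link keeps the chain, so the caller may consider the
 * message sent and only has to wait before sending the next one. */
static int link_xmit(struct link_model *l, int sock_id, unsigned int npkts)
{
	if (l->nwaiters == 0 && l->queued + npkts <= l->limit) {
		l->queued += npkts;
		return XMIT_OK;
	}
	if (l->nwaiters == MAX_WAITERS)
		return XMIT_NOBUFS;          /* could not park the chain */
	l->waiters[l->nwaiters].sock_id = sock_id;
	l->waiters[l->nwaiters].npkts = npkts;
	l->nwaiters++;
	return XMIT_CONG;
}

/* Called when n packets leave the send queue: parked chains are enqueued
 * first, in FIFO order, and only then is the owning socket woken. */
static void link_free_space(struct link_model *l, unsigned int n)
{
	size_t served = 0;

	l->queued = n > l->queued ? 0 : l->queued - n;
	while (served < l->nwaiters &&
	       l->queued + l->waiters[served].npkts <= l->limit) {
		l->queued += l->waiters[served].npkts;
		if (l->nwoken < MAX_WAITERS)
			l->woken[l->nwoken] = l->waiters[served].sock_id;
		l->nwoken++;
		served++;
	}
	for (size_t j = served; j < l->nwaiters; j++)
		l->waiters[j - served] = l->waiters[j];
	l->nwaiters -= served;
}
```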

Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/bcast.c  |   2 +-
 net/tipc/link.c   |  81 +++--
 net/tipc/msg.h|   8 +-
 net/tipc/node.c   |   2 +-
 net/tipc/socket.c | 346 --
 5 files changed, 200 insertions(+), 239 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index aa1babb..1a56cab 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -174,7 +174,7 @@ static void tipc_bcbase_xmit(struct net *net, struct sk_buff_head *xmitq)
  *and to identified node local sockets
  * @net: the applicable net namespace
  * @list: chain of buffers containing message
- * Consumes the buffer chain, except when returning -ELINKCONG
+ * Consumes the buffer chain.
  * Returns 0 if success, otherwise errno: -ELINKCONG,-EHOSTUNREACH,-EMSGSIZE
  */
 int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list)
diff --git a/net/tipc/link.c b/net/tipc/link.c
index ecc12411..428437c 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -776,58 +776,60 @@ int tipc_link_timeout(struct tipc_link *l, struct sk_buff_head *xmitq)
 
 /**
  * link_schedule_user - schedule a message sender for wakeup after congestion
- * @link: congested link
+ * @l: congested link
  * @list: message that was attempted sent
  * Create pseudo msg to send back to user when congestion abates
- * Does not consume buffer list
+ * Consumes buffer list
  */
-static int link_schedule_user(struct tipc_link *link, struct sk_buff_head *list)
+static int link_schedule_user(struct tipc_link *l, struct sk_buff_head *list)
 {
-   struct tipc_msg *msg = buf_msg(skb_peek(list));
-   int imp = msg_importance(msg);
-   u32 oport = msg_origport(msg);
-   u32 addr = tipc_own_addr(link->net);
+   struct tipc_msg *hdr = buf_msg(skb_peek(list));
+   int imp = msg_importance(hdr);
+   u32 oport = msg_origport(hdr);
+   u32 dnode = tipc_own_addr(l->net);
struct sk_buff *skb;
+   struct sk_buff_head *pkts;
 
/* This really cannot happen...  */
if (unlikely(imp > TIPC_CRITICAL_IMPORTANCE)) {
-   pr_warn("%s<%s>, send queue full", link_rst_msg, link->name);
+   pr_warn("%s<%s>, send queue full", link_rst_msg, l->name);
return -ENOBUFS;
}
-   /* Non-blocking sender: */
-   if (TIPC_SKB_CB(skb_peek(list))->wakeup_pending)
-   return -ELINKCONG;
 
/* Create and schedule wakeup pseudo message */
skb = tipc_msg_create(SOCK_WAKEUP, 0, INT_H_SIZE, 0,
- addr, addr, oport, 0, 0);
+ dnode, l->addr, oport, 0, 0);
if (!skb)
return -ENOBUFS;
-   TIPC_SKB_CB(skb)->chain_sz = skb_queue_len(list);
-   TIPC_SKB_CB(skb)->chain_imp = imp;
-   skb_queue_tail(&link->wakeupq, skb);
-   link->stats.link_congs++;
+   msg_set_dest_droppable(buf_msg(skb), true);
+   skb_queue_tail(&l->wakeupq, skb);
+
+   /* Keep the packet chain until we can send it */
+   pkts = &TIPC_SKB_CB(skb)->pkts;
+   __skb_queue_head_init(pkts);
+   skb_queue_splice_init(list, pkts);
+   l->stats.link_congs++;
return -ELINKCONG;
 }
 
 /**
  * link_prepare_wakeup - prepare users for wakeup after congestion
- * @link: congested link
- * Move a number of waiting users, as permitted by available space in
- * the send queue, from 

[tipc-discussion] [PATCH net-next v2 1/3] tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() functions

2016-11-29 Thread Jon Maloy
The functions tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() are very
similar. The latter function is also called from two locations, and
there will be more in the coming commits, which will all need to test on
different conditions.

Instead of making yet another duplicates of the function, we now
introduce a new macro tipc_wait_for_cond() where the wakeup condition
can be stated as an argument to the call. This macro replaces all
current and future uses of the two functions, which can now be
eliminated.
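The benefit of a macro here is that the wakeup condition is re-evaluated as an expression on every iteration, which a function taking a plain value could not do. The following user-space sketch mirrors that shape with a statement expression (`({ ... })`, a GCC/Clang extension that the kernel also relies on). `struct fake_sock`, `check_err()` and `wait_one_tick()` are hypothetical stand-ins for the socket, `tipc_sk_sock_err()` and a real sleep/wakeup cycle.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical socket state standing in for struct sock / tipc_sock. */
struct fake_sock {
	int link_cong;       /* set while the link is congested */
	int shutdown;        /* set once the socket is shut down */
	long timeout;        /* remaining wait budget, in "ticks" */
	int ticks_to_clear;  /* congestion abates after this many ticks */
};

/* Stand-in for tipc_sk_sock_err(): checks done before each sleep. */
static int check_err(struct fake_sock *s)
{
	if (s->shutdown)
		return -EPIPE;
	if (!s->timeout)
		return -EAGAIN;
	return 0;
}

/* Stand-in for one sleep/wakeup cycle. */
static void wait_one_tick(struct fake_sock *s)
{
	s->timeout--;
	if (s->ticks_to_clear > 0 && --s->ticks_to_clear == 0)
		s->link_cong = 0;
}

/* Wait until condition_ holds; condition_ is re-evaluated every pass.
 * Evaluates to 0 on success or to a negative errno from check_err(). */
#define wait_for_cond(s_, condition_)            \
({                                               \
	int rc_ = 0;                             \
	while (!(condition_)) {                  \
		rc_ = check_err(s_);             \
		if (rc_)                         \
			break;                   \
		wait_one_tick(s_);               \
	}                                        \
	rc_;                                     \
})
```

A caller states its own condition inline, e.g. `wait_for_cond(&s, !s.link_cong)`, which is the same pattern as `tipc_wait_for_cond(sock, &timeo, !tsk->link_cong)` in the patch.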

Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/socket.c | 110 +-
 1 file changed, 51 insertions(+), 59 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 333c5da..30732a8 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -110,7 +110,6 @@ static void tipc_write_space(struct sock *sk);
 static void tipc_sock_destruct(struct sock *sk);
 static int tipc_release(struct socket *sock);
 static int tipc_accept(struct socket *sock, struct socket *new_sock, int flags);
-static int tipc_wait_for_sndmsg(struct socket *sock, long *timeo_p);
 static void tipc_sk_timeout(unsigned long data);
 static int tipc_sk_publish(struct tipc_sock *tsk, uint scope,
   struct tipc_name_seq const *seq);
@@ -334,6 +333,51 @@ static int tipc_set_sk_state(struct sock *sk, int state)
return res;
 }
 
+static int tipc_sk_sock_err(struct socket *sock, long *timeout)
+{
+   struct sock *sk = sock->sk;
+   int err = sock_error(sk);
+   int typ = sock->type;
+
+   if (err)
+   return err;
+   if (typ == SOCK_STREAM || typ == SOCK_SEQPACKET) {
+   if (sk->sk_state == TIPC_DISCONNECTING)
+   return -EPIPE;
+   else if (!tipc_sk_connected(sk))
+   return -ENOTCONN;
+   } else if (sk->sk_shutdown & SEND_SHUTDOWN) {
+   return -EPIPE;
+   }
+   if (!*timeout)
+   return -EAGAIN;
+   if (signal_pending(current))
+   return sock_intr_errno(*timeout);
+
+   return 0;
+}
+
+#define tipc_wait_for_cond(sock_, timeout_, condition_) \
+({ \
+   int rc_ = 0;\
+   int done_ = 0;  \
+   \
+   while (!(condition_) && !done_) {   \
+   struct sock *sk_ = sock->sk;\
+   DEFINE_WAIT_FUNC(wait_, woken_wake_function);   \
+   \
+   rc_ = tipc_sk_sock_err(sock_, timeout_);\
+   if (rc_)\
+   break;  \
+   prepare_to_wait(sk_sleep(sk_), &wait_,  \
+   TASK_INTERRUPTIBLE);\
+   done_ = sk_wait_event(sk_, timeout_,\
+ (condition_), &wait_);\
+   remove_wait_queue(sk_sleep(sk), &wait_);\
+   }   \
+   rc_;\
+})
+
 /**
  * tipc_sk_create - create a TIPC socket
  * @net: network namespace (must be default network)
@@ -719,7 +763,7 @@ static int tipc_sendmcast(struct  socket *sock, struct tipc_name_seq *seq,
 
if (rc == -ELINKCONG) {
tsk->link_cong = 1;
-   rc = tipc_wait_for_sndmsg(sock, &timeo);
+   rc = tipc_wait_for_cond(sock, &timeo, !tsk->link_cong);
if (!rc)
continue;
}
@@ -828,31 +872,6 @@ static void tipc_sk_proto_rcv(struct tipc_sock *tsk, struct sk_buff *skb,
kfree_skb(skb);
 }
 
-static int tipc_wait_for_sndmsg(struct socket *sock, long *timeo_p)
-{
-   DEFINE_WAIT_FUNC(wait, woken_wake_function);
-   struct sock *sk = sock->sk;
-   struct tipc_sock *tsk = tipc_sk(sk);
-   int done;
-
-   do {
-   int err = sock_error(sk);
-   if (err)
-   return err;
-   if (sk->sk_shutdown & SEND_SHUTDOWN)
-   return -EPIPE;
-   if (!*timeo_p)
-   return -EAGAIN;
-   if (signal_pending(current))
-   return sock_intr_errno(*timeo_p);
-
-   add_wait_queue(sk_sleep(sk), &wait);
-   done = sk_wait_event(sk, timeo_p, !tsk->link_cong, &wait);
-   remov

Re: [tipc-discussion] [PATCH net-next v2 0/3] tipc: improve interaction socket-link

2016-12-05 Thread Jon Maloy


> -Original Message-
> From: Parthasarathy Bhuvaragan
> Sent: Monday, 05 December, 2016 15:11
> To: Jon Maloy <ma...@donjonn.com>; Jon Maloy <jon.ma...@ericsson.com>;
> tipc-discussion@lists.sourceforge.net; Ying Xue <ying@windriver.com>
> Cc: thompa@gmail.com
> Subject: Re: [PATCH net-next v2 0/3] tipc: improve interaction socket-link
> 
> Hi Jon,
> 
> Sorry for the delay; I could not work due to a sick child.
> 
> The crash occurs due to the last commit:
> "tipc: reduce risk of user starvation during link congestion"
> 
> I examined the crash today; it is caused by an out-of-bounds write past
> skb->cb[48].
> The maximum size allowed for the control buffer is 48 bytes, whereas the new
> struct tipc_skb_cb is 64 bytes.

Weird. I did of course test this, and on my system sizeof(tipc_skb_cb) yields 
40, and everything works flawlessly.

> This overwrites the skb->destructor callback lying beyond 'skb->cb'.
> The size of struct sk_buff_head itself is 48 bytes.
> 
> crash> p *(struct sk_buff*)0x88003f007600
>:
>dev = 0x88003f985000,
>cb = "\000\00\000",
>_skb_refdst = 0,
>destructor = 0x1,  << insane function pointer >>
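The two size measurements in this exchange (40 bytes for `sizeof(tipc_skb_cb)` on Jon's system, 64 on Partha's) are consistent with a configuration-dependent member, although the thread does not confirm the cause. One known source of such variation is `spinlock_t`, which grows when lock debugging options are enabled, and `struct sk_buff_head` embeds one. The sketch below uses hypothetical stand-in structs (not the kernel definitions) to show how the same control-block layout can fit in a 48-byte `cb[]` in one configuration and overflow it in another, and how a compile-time check (`BUILD_BUG_ON()` in the kernel, C11 `_Static_assert` here) turns a silent overwrite into a build failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKB_CB_SIZE 48  /* size of the kernel's sk_buff::cb[] */

/* Hypothetical lock layouts: plain vs. with debug bookkeeping. */
struct lock_plain { unsigned int raw; };
struct lock_debug {
	unsigned int raw;
	unsigned int magic, owner_cpu;  /* spinlock debug fields */
	void *owner;
	void *dep_map[4];               /* lock-dependency tracking data */
};

/* sk_buff_head-like queue heads built on each lock type. */
struct qhead_plain { void *next, *prev; unsigned int qlen; struct lock_plain lock; };
struct qhead_debug { void *next, *prev; unsigned int qlen; struct lock_debug lock; };

/* A control block embedding a queue head plus a few small fields. */
struct cb_plain { struct qhead_plain pkts; unsigned int bytes; unsigned short imp; };
struct cb_debug { struct qhead_debug pkts; unsigned int bytes; unsigned short imp; };

/* Compile-time guard: refuse to build if the cb would not fit.
 * The same assertion on struct cb_debug would fail to compile. */
_Static_assert(sizeof(struct cb_plain) <= SKB_CB_SIZE,
	       "control block overflows skb->cb");

static bool cb_fits(size_t sz)
{
	return sz <= SKB_CB_SIZE;
}
```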
> 
> I think the simpler way is to place these packets ('pkts') into the backlogq,
> allow temporary over-committing, and keep the wakeup mechanism as it is.

You are right. The end result will be the same. I'll change it and recommit.

///jon

> 
> This way, we transmit the packets in tipc_link_advance_backlog() instead of
> doing it in link_prepare_wakeup(). It's misleading that link_prepare_wakeup()
> transmits packets.
> 
> /Partha
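Partha's alternative can be sketched as follows: a congested sender's packets go straight onto the backlog queue, even if that temporarily exceeds the limit, and the sender is told to wait. Transmission happens only when the backlog is advanced as the send window opens, and the socket is woken once the backlog is back under its limit. The names `backlog_model`, `backlog_enqueue()` and `advance_backlog()` are hypothetical. In TIPC the corresponding transmission would happen in `tipc_link_advance_backlog()`, which this sketch does not reproduce.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical link model: a send window and one backlog queue. */
struct backlog_model {
	unsigned int in_flight;   /* packets sent, not yet acked */
	unsigned int window;      /* max packets in flight */
	unsigned int backlog;     /* packets waiting to be sent */
	unsigned int limit;       /* backlog size that signals congestion */
	unsigned int wakeups;     /* wakeup signals issued to senders */
	bool sender_waiting;
};

/* Always accept the chain, over-committing the backlog if needed.
 * Returns true if the caller should wait for a wakeup. */
static bool backlog_enqueue(struct backlog_model *l, unsigned int npkts)
{
	bool congested = l->backlog >= l->limit;

	l->backlog += npkts;              /* never refuse: over-commit */
	if (congested)
		l->sender_waiting = true;
	return congested;
}

/* Move backlog packets into the window as acks free space; the
 * wakeup only signals the socket and transmits nothing itself. */
static void advance_backlog(struct backlog_model *l, unsigned int acked)
{
	l->in_flight = acked > l->in_flight ? 0 : l->in_flight - acked;
	while (l->backlog && l->in_flight < l->window) {
		l->backlog--;
		l->in_flight++;           /* "transmit" one packet */
	}
	if (l->sender_waiting && l->backlog < l->limit) {
		l->sender_waiting = false;
		l->wakeups++;
	}
}
```

Compared with parking the chain in the wakeup item, this keeps the control block small and leaves all transmission in one place.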
> 
> 

[tipc-discussion] [PATCH 3.10 02/17] tipc: avoid possible deadlock while enable and disable bearer

2016-12-06 Thread Jon Maloy
el_timer_sync+0x70/0x70
[  327.696018]  [] del_timer_sync+0x3d/0xd0
[  327.696018]  [] ? try_to_del_timer_sync+0x70/0x70
[  327.696018]  [] tipc_disc_delete+0x15/0x30 [tipc]
[  327.696018]  [] bearer_disable+0xef/0x120 [tipc]
[  327.696018]  [] tipc_disable_bearer+0x2f/0x60 [tipc]
[  327.696018]  [] tipc_cfg_do_cmd+0x2e2/0x550 [tipc]
[  327.696018]  [] ? security_capable+0x13/0x20
[  327.696018]  [] handle_cmd+0x49/0xe0 [tipc]
[  327.696018]  [] genl_family_rcv_msg+0x268/0x340
[  327.696018]  [] genl_rcv_msg+0x70/0xd0
[  327.696018]  [] ? genl_lock+0x20/0x20
[  327.696018]  [] netlink_rcv_skb+0x89/0xb0
[  327.696018]  [] ? genl_rcv+0x18/0x40
[  327.696018]  [] genl_rcv+0x27/0x40
[  327.696018]  [] netlink_unicast+0x15e/0x1b0
[  327.696018]  [] ? memcpy_fromiovec+0x6c/0x90
[  327.696018]  [] netlink_sendmsg+0x22f/0x400
[  327.696018]  [] __sock_sendmsg+0x66/0x80
[  327.696018]  [] sock_aio_write+0x107/0x120
[  327.696018]  [] ? release_sock+0x8c/0xa0
[  327.696018]  [] do_sync_write+0x7d/0xc0
[  327.696018]  [] ? rw_verify_area+0x54/0x100
[  327.696018]  [] vfs_write+0x186/0x190
[  327.696018]  [] SyS_write+0x60/0xb0
[  327.696018]  [] system_call_fastpath+0x16/0x1b

---

The problem is that tipc_disc_delete() cancels the disc_timeout() timer
(via del_timer_sync()) while b_ptr->lock is held, but disc_timeout() itself
takes b_ptr->lock to finish its work, so a deadlock occurs.

We should release b_ptr->lock before deleting the disc_timeout() timer.

Removing the link_timeout() timer runs into the same problem, but the patch

http://article.gmane.org/gmane.network.tipc.general/4380

fixes it, so there is no need to send a separate patch for the
link_timeout() deadlock warning.
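The fix follows a common pattern: detach the object under the lock, drop the lock, and only then call the function that may wait for a handler needing that same lock. Below is a minimal single-threaded sketch in which a `lock_held` flag stands in for `b_ptr->lock`, and `disc_delete()` asserts that the lock is free, mimicking the fact that `del_timer_sync()` waits for a running `disc_timeout()` that itself takes the lock. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct link_req { int dummy; };        /* hypothetical discovery request */

struct bearer_state {
	int lock_held;                 /* stand-in for b_ptr->lock */
	struct link_req *link_req;
	int disc_deleted;
};

static void model_lock(struct bearer_state *b)
{
	assert(!b->lock_held);
	b->lock_held = 1;
}

static void model_unlock(struct bearer_state *b)
{
	assert(b->lock_held);
	b->lock_held = 0;
}

/* Models tipc_disc_delete(): cancelling the timer synchronously would
 * deadlock if the caller still held the lock the handler needs. */
static void disc_delete(struct bearer_state *b, struct link_req *req)
{
	assert(!b->lock_held && "would deadlock in del_timer_sync()");
	free(req);
	b->disc_deleted++;
}

static void bearer_disable(struct bearer_state *b)
{
	struct link_req *temp_req;

	model_lock(b);
	/* ... links are deleted here, under the lock ... */
	temp_req = b->link_req;        /* detach while locked */
	b->link_req = NULL;
	model_unlock(b);

	if (temp_req)                  /* safe: lock no longer held */
		disc_delete(b, temp_req);
}
```

Clearing `link_req` under the lock also makes a second disable call harmless, since it finds nothing left to delete.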

Signed-off-by: Wang Weidong <wangweido...@huawei.com>
Signed-off-by: Ding Tianhong <dingtianh...@huawei.com>
Acked-by: Ying Xue <ying....@windriver.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/bearer.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index cb29ef7..609c30c 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -460,6 +460,7 @@ static void bearer_disable(struct tipc_bearer *b_ptr)
 {
struct tipc_link *l_ptr;
struct tipc_link *temp_l_ptr;
+   struct tipc_link_req *temp_req;
 
pr_info("Disabling bearer <%s>\n", b_ptr->name);
spin_lock_bh(&b_ptr->lock);
@@ -468,9 +469,13 @@ static void bearer_disable(struct tipc_bearer *b_ptr)
list_for_each_entry_safe(l_ptr, temp_l_ptr, &b_ptr->links, link_list) {
tipc_link_delete(l_ptr);
}
-   if (b_ptr->link_req)
-   tipc_disc_delete(b_ptr->link_req);
+   temp_req = b_ptr->link_req;
+   b_ptr->link_req = NULL;
spin_unlock_bh(&b_ptr->lock);
+
+   if (temp_req)
+   tipc_disc_delete(temp_req);
+
memset(b_ptr, 0, sizeof(struct tipc_bearer));
 }
 
-- 
2.7.4




[tipc-discussion] [PATCH 3.10 01/17] tipc: fix oops when creating server socket fails

2016-12-06 Thread Jon Maloy
From: Ying Xue <ying@windriver.com>

When creation of TIPC internal server socket fails,
we get an oops with the following dump:

BUG: unable to handle kernel NULL pointer dereference at 0020
IP: [] tipc_close_conn+0x59/0xb0 [tipc]
PGD 13719067 PUD 12008067 PMD 0
Oops:  [#1] SMP DEBUG_PAGEALLOC
Modules linked in: tipc(+)
CPU: 4 PID: 4340 Comm: insmod Not tainted 3.10.0+ #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
task: 88001436 ti: 88001374c000 task.ti: 88001374c000
RIP: 0010:[]  [] tipc_close_conn+0x59/0xb0 [tipc]
RSP: 0018:88001374dc98  EFLAGS: 00010292
RAX:  RBX: 880012ac09d8 RCX: 
RDX: 0046 RSI: 0001 RDI: 88001436
RBP: 88001374dcb8 R08: 0001 R09: 0001
R10:  R11:  R12: a0016fa0
R13: a0017010 R14: a0017010 R15: 880012ac09d8
FS:  () GS:88001660(0063) knlGS:f76668d0
CS:  0010 DS: 002b ES: 002b CR0: 8005003b
CR2: 0020 CR3: 12227000 CR4: 06e0
Stack:
88001374dcb8 a0016fa0  0001
88001374dcf8 a0012922 88001374dce8 ffea
a0017100  8800134241a8 a0017150
Call Trace:
[] tipc_server_stop+0xa2/0x1b0 [tipc]
[] tipc_subscr_stop+0x15/0x20 [tipc]
[] tipc_core_stop+0x1d/0x33 [tipc]
[] tipc_init+0xd4/0xf8 [tipc]
[] ? 0xa001efff
[] do_one_initcall+0x3f/0x150
[] ? __blocking_notifier_call_chain+0x7d/0xd0
[] load_module+0x11aa/0x19c0
[] ? show_initstate+0x50/0x50
[] ? retint_restore_args+0xe/0xe
[] SyS_init_module+0xd9/0x110
[] sysenter_dispatch+0x7/0x1f
Code: 6c 24 70 4c 89 ef e8 b7 04 8f e1 8b 73 04 4c 89 e7 e8 7c 9e 32 e1 41 83 ac 24 b8 00 00 00 01 4c 89 ef e8 eb 0a 8f e1 48 8b 43 08 <4c> 8b 68 20 4d 8d a5 48 03 00 00 4c 89 e7 e8 04 05 8f e1 4c 89
RIP  [] tipc_close_conn+0x59/0xb0 [tipc]
RSP 
CR2: 0020
---[ end trace b02321f40e4269a3 ]---

We have the following call chain:

tipc_core_start()
ret = tipc_subscr_start()
ret = tipc_server_start(){
  server->enabled = 1;
  ret = tipc_open_listening_sock()
  }

I.e., the server->enabled flag is unconditionally set to 1, regardless of
the return value of tipc_open_listening_sock().

This causes a crash when tipc_core_start() tries to clean up
resources after a failed initialization:

if (ret == failed)
tipc_subscr_stop()
tipc_server_stop(){
if (server->enabled)
tipc_close_conn(){
NULL reference of con->sock-sk
OOPS!
}
}

To avoid this, tipc_server_start() should only set server->enabled
to 1 in case of a successful socket creation. In case of failure, it
should release all allocated resources before returning.

Problem introduced in commit c5fa7b3cf3cb22e4ac60485fc2dc187fe012910f
("tipc: introduce new TIPC server infrastructure") in v3.11-rc1.
Note that it won't be seen often; it takes a module load under memory
constrained conditions in order to trigger the failure condition.
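The corrected ordering can be illustrated with a small sketch in which the `enabled` flag is set only after every step has succeeded, and each failure path undoes what was set up before it. A stop routine that trusts `enabled` then never touches a half-initialised server. The structure and helper names (`server_model`, `open_listening_sock()`, `server_start()`, `server_stop()` and the `fail_sock` test knob) are hypothetical and only model the shape of `tipc_server_start()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct conn { int *sock; };            /* hypothetical connection */

struct server_model {
	int enabled;
	int work_started;
	struct conn *listener;
	int fail_sock;                 /* test knob: force socket failure */
};

static int open_listening_sock(struct server_model *s)
{
	struct conn *con = calloc(1, sizeof(*con));

	if (!con)
		return -ENOMEM;
	if (s->fail_sock || !(con->sock = malloc(sizeof(int)))) {
		free(con);             /* undo the allocation on failure */
		return -EINVAL;
	}
	s->listener = con;
	return 0;
}

static int server_start(struct server_model *s)
{
	int ret;

	s->work_started = 1;           /* stands in for the workqueue setup */
	ret = open_listening_sock(s);
	if (ret < 0) {
		s->work_started = 0;   /* undo earlier steps */
		return ret;
	}
	s->enabled = 1;                /* only after full success */
	return 0;
}

static void server_stop(struct server_model *s)
{
	if (!s->enabled)
		return;
	free(s->listener->sock);       /* listener exists whenever enabled */
	free(s->listener);
	s->listener = NULL;
	s->work_started = 0;
	s->enabled = 0;
}
```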

Signed-off-by: Ying Xue <ying@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortma...@windriver.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
---
 net/tipc/server.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/tipc/server.c b/net/tipc/server.c
index 19da5abe..fd3fa57 100644
--- a/net/tipc/server.c
+++ b/net/tipc/server.c
@@ -355,8 +355,12 @@ static int tipc_open_listening_sock(struct tipc_server *s)
return PTR_ERR(con);
 
sock = tipc_create_listen_sock(con);
-   if (!sock)
+   if (!sock) {
+   idr_remove(&s->conn_idr, con->conid);
+   s->idr_in_use--;
+   kfree(con);
return -EINVAL;
+   }
 
tipc_register_callbacks(sock, con);
return 0;
@@ -563,9 +567,14 @@ int tipc_server_start(struct tipc_server *s)
kmem_cache_destroy(s->rcvbuf_cache);
return ret;
}
+   ret = tipc_open_listening_sock(s);
+   if (ret < 0) {
+   tipc_work_stop(s);
+   kmem_cache_destroy(s->rcvbuf_cache);
+   return ret;
+   }
s->enabled = 1;
-
-   return tipc_open_listening_sock(s);
+   return ret;
 }
 
 void tipc_server_stop(struct tipc_server *s)
-- 
2.7.4



[tipc-discussion] [PATCH 3.10 09/17] tipc: correct return value of recv_msg routine

2016-12-06 Thread Jon Maloy
From: Ying Xue <ying@windriver.com>

Currently, recv_msg() always returns zero on a packet delivery upcall
from net_device.

To make its behavior more compliant with the way this API should be
used, we change this to let it return NET_RX_SUCCESS (which is zero
anyway) when it is able to handle the packet, and NET_RX_DROP otherwise.
The latter does not imply any functional change, it only enables the
driver to keep more accurate statistics about the fate of delivered
packets.
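A reduced sketch of the new return convention follows. `NET_RX_SUCCESS` (0) and `NET_RX_DROP` (1) have the same values as in `<linux/netdevice.h>`, and the `PACKET_*` constants match `<linux/if_packet.h>`; they are redefined locally so the example compiles in user space. `struct rx_pkt`, the `bearer_up` argument and `struct rx_stats` are hypothetical stand-ins for the skb, the bearer pointer check and driver statistics.

```c
#include <assert.h>
#include <stdbool.h>

#define NET_RX_SUCCESS 0       /* same values as <linux/netdevice.h> */
#define NET_RX_DROP    1

#define PACKET_HOST      0     /* same values as <linux/if_packet.h> */
#define PACKET_BROADCAST 1
#define PACKET_OTHERHOST 3

struct rx_pkt { int pkt_type; bool own_netns; };  /* hypothetical skb */

struct rx_stats { unsigned long delivered, dropped; };

/* Mirrors the shape of recv_msg(): every path now reports its fate. */
static int recv_msg(const struct rx_pkt *p, bool bearer_up)
{
	if (!p->own_netns)
		return NET_RX_DROP;            /* wrong namespace */
	if (bearer_up && p->pkt_type <= PACKET_BROADCAST)
		return NET_RX_SUCCESS;         /* handed to TIPC */
	return NET_RX_DROP;                    /* no bearer or foreign type */
}

/* A caller can keep accurate counters from the return value. */
static void account(struct rx_stats *st, int rc)
{
	if (rc == NET_RX_SUCCESS)
		st->delivered++;
	else
		st->dropped++;
}
```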

Signed-off-by: Ying Xue <ying@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortma...@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
---
 net/tipc/eth_media.c | 6 +++---
 net/tipc/ib_media.c  | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/tipc/eth_media.c b/net/tipc/eth_media.c
index c36c938..f80d59f 100644
--- a/net/tipc/eth_media.c
+++ b/net/tipc/eth_media.c
@@ -132,18 +132,18 @@ static int recv_msg(struct sk_buff *buf, struct net_device *dev,
 
if (!net_eq(dev_net(dev), &init_net)) {
kfree_skb(buf);
-   return 0;
+   return NET_RX_DROP;
}
 
if (likely(eb_ptr->bearer)) {
if (likely(buf->pkt_type <= PACKET_BROADCAST)) {
buf->next = NULL;
tipc_recv_msg(buf, eb_ptr->bearer);
-   return 0;
+   return NET_RX_SUCCESS;
}
}
kfree_skb(buf);
-   return 0;
+   return NET_RX_DROP;
 }
 
 /**
diff --git a/net/tipc/ib_media.c b/net/tipc/ib_media.c
index 20b1aa4..c139892 100644
--- a/net/tipc/ib_media.c
+++ b/net/tipc/ib_media.c
@@ -125,18 +125,18 @@ static int recv_msg(struct sk_buff *buf, struct net_device *dev,
 
if (!net_eq(dev_net(dev), &init_net)) {
kfree_skb(buf);
-   return 0;
+   return NET_RX_DROP;
}
 
if (likely(ib_ptr->bearer)) {
if (likely(buf->pkt_type <= PACKET_BROADCAST)) {
buf->next = NULL;
tipc_recv_msg(buf, ib_ptr->bearer);
-   return 0;
+   return NET_RX_SUCCESS;
}
}
kfree_skb(buf);
-   return 0;
+   return NET_RX_DROP;
 }
 
 /**
-- 
2.7.4




[tipc-discussion] [PATCH 3.10 11/17] tipc: simplify the link lookup routine

2016-12-06 Thread Jon Maloy
From: Erik Hugne <erik.hu...@ericsson.com>

When checking statistics or changing parameters on a link, the
link_find_link function is used to locate the link with a given
name. The complex method of deconstructing the name into local
and remote address/interface is error-prone and may fail if the
interface names contain special characters. We change the lookup
method to iterate over the list of nodes and compare the link
names.
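The new lookup amounts to a nested linear search over nodes and their links, comparing the full link name with `strcmp()` instead of parsing it. Below is a self-contained sketch with hypothetical `node_model`/`link_model` arrays; the real code walks TIPC's node list, and `MAX_BEARERS` is an assumed bound. Like `link_find_link()`, it also returns the owning node through an out-parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BEARERS 2              /* assumed per-node link slots */

struct link_model { char name[60]; };

struct node_model {
	unsigned int addr;
	struct link_model *links[MAX_BEARERS];  /* NULL if slot unused */
};

/* Return the link whose name matches exactly, and its node via *node.
 * No parsing of the name is needed, so unusual interface names work. */
static struct link_model *find_link(struct node_model *nodes, size_t nnodes,
				    const char *name, struct node_model **node)
{
	for (size_t n = 0; n < nnodes; n++) {
		for (int i = 0; i < MAX_BEARERS; i++) {
			struct link_model *l = nodes[n].links[i];

			if (l && strcmp(l->name, name) == 0) {
				*node = &nodes[n];
				return l;
			}
		}
	}
	return NULL;
}
```

An interface name such as `my-if:0` contains both separators (`-` and `:`) that the removed `link_name_validate()` split on with `strchr()`, but an exact string comparison is unaffected by them.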

Signed-off-by: Erik Hugne <erik.hu...@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortma...@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
---
 net/tipc/link.c | 110 +++-
 1 file changed, 13 insertions(+), 97 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 223bbc8..e8153f6 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -75,20 +75,6 @@ static const char *link_unk_evt = "Unknown link event ";
  */
 #define START_CHANGEOVER 10u
 
-/**
- * struct tipc_link_name - deconstructed link name
- * @addr_local: network address of node at this end
- * @if_local: name of interface at this end
- * @addr_peer: network address of node at far end
- * @if_peer: name of interface at far end
- */
-struct tipc_link_name {
-   u32 addr_local;
-   char if_local[TIPC_MAX_IF_NAME];
-   u32 addr_peer;
-   char if_peer[TIPC_MAX_IF_NAME];
-};
-
 static void link_handle_out_of_seq_msg(struct tipc_link *l_ptr,
   struct sk_buff *buf);
 static void link_recv_proto_msg(struct tipc_link *l_ptr, struct sk_buff *buf);
@@ -160,72 +146,6 @@ int tipc_link_is_active(struct tipc_link *l_ptr)
 }
 
 /**
- * link_name_validate - validate & (optionally) deconstruct tipc_link name
- * @name: ptr to link name string
- * @name_parts: ptr to area for link name components (or NULL if not needed)
- *
- * Returns 1 if link name is valid, otherwise 0.
- */
-static int link_name_validate(const char *name,
-   struct tipc_link_name *name_parts)
-{
-   char name_copy[TIPC_MAX_LINK_NAME];
-   char *addr_local;
-   char *if_local;
-   char *addr_peer;
-   char *if_peer;
-   char dummy;
-   u32 z_local, c_local, n_local;
-   u32 z_peer, c_peer, n_peer;
-   u32 if_local_len;
-   u32 if_peer_len;
-
-   /* copy link name & ensure length is OK */
-   name_copy[TIPC_MAX_LINK_NAME - 1] = 0;
-   /* need above in case non-Posix strncpy() doesn't pad with nulls */
-   strncpy(name_copy, name, TIPC_MAX_LINK_NAME);
-   if (name_copy[TIPC_MAX_LINK_NAME - 1] != 0)
-   return 0;
-
-   /* ensure all component parts of link name are present */
-   addr_local = name_copy;
-   if_local = strchr(addr_local, ':');
-   if (if_local == NULL)
-   return 0;
-   *(if_local++) = 0;
-   addr_peer = strchr(if_local, '-');
-   if (addr_peer == NULL)
-   return 0;
-   *(addr_peer++) = 0;
-   if_local_len = addr_peer - if_local;
-   if_peer = strchr(addr_peer, ':');
-   if (if_peer == NULL)
-   return 0;
-   *(if_peer++) = 0;
-   if_peer_len = strlen(if_peer) + 1;
-
-   /* validate component parts of link name */
-   if ((sscanf(addr_local, "%u.%u.%u%c",
-   &z_local, &c_local, &n_local, &dummy) != 3) ||
-   (sscanf(addr_peer, "%u.%u.%u%c",
-   &z_peer, &c_peer, &n_peer, &dummy) != 3) ||
-   (z_local > 255) || (c_local > 4095) || (n_local > 4095) ||
-   (z_peer  > 255) || (c_peer  > 4095) || (n_peer  > 4095) ||
-   (if_local_len <= 1) || (if_local_len > TIPC_MAX_IF_NAME) ||
-   (if_peer_len  <= 1) || (if_peer_len  > TIPC_MAX_IF_NAME))
-   return 0;
-
-   /* return link name components, if necessary */
-   if (name_parts) {
-   name_parts->addr_local = tipc_addr(z_local, c_local, n_local);
-   strcpy(name_parts->if_local, if_local);
-   name_parts->addr_peer = tipc_addr(z_peer, c_peer, n_peer);
-   strcpy(name_parts->if_peer, if_peer);
-   }
-   return 1;
-}
-
-/**
  * link_timeout - handle expiration of link timer
  * @l_ptr: pointer to link
  *
@@ -2580,25 +2500,21 @@ void tipc_link_set_queue_limits(struct tipc_link *l_ptr, u32 window)
 static struct tipc_link *link_find_link(const char *name,
struct tipc_node **node)
 {
-   struct tipc_link_name link_name_parts;
-   struct tipc_bearer *b_ptr;
struct tipc_link *l_ptr;
+   struct tipc_node *n_ptr;
+   int i;
 
-   if (!link_name_validate(name, &link_name_parts))
-   return NULL;
-
-   b_ptr = tipc_bearer_find_interface(link_name_parts.if_local);
-   if (!b_ptr)
-   return NULL;
-
-   *node

[tipc-discussion] [PATCH 3.10 12/17] net: misc: Remove extern from function prototypes

2016-12-06 Thread Jon Maloy
From: Joe Perches <j...@perches.com>

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
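A small illustration of the point: at file scope, a function declaration has external linkage whether or not `extern` is written, so the two declarations below are equivalent and both are compatible with the single definition. The function `add_one()` is purely illustrative.

```c
/* Both declarations declare the same function with external linkage. */
extern int add_one(int x);
int add_one(int x);

/* One definition satisfies both declarations. */
int add_one(int x)
{
	return x + 1;
}
```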

Signed-off-by: Joe Perches <j...@perches.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/irda/irnet/irnet.h       |  15 ++---
 net/l2tp/l2tp_core.h         |  57 +---
 net/mac80211/rate.h          |  12 ++--
 net/netfilter/nf_internals.h |  28 
 net/rds/rds.h                |   2 +-
 net/rxrpc/ar-internal.h      | 150 ---
 net/tipc/core.h              |  28 
 net/wimax/wimax-internal.h   |  18 +++---
 net/wireless/core.h          |   6 +-
 net/wireless/sysfs.h         |   4 +-
 net/xfrm/xfrm_hash.h         |   4 +-
 11 files changed, 157 insertions(+), 167 deletions(-)

diff --git a/net/irda/irnet/irnet.h b/net/irda/irnet/irnet.h
index 564eb0b..8d65bb9 100644
--- a/net/irda/irnet/irnet.h
+++ b/net/irda/irnet/irnet.h
@@ -509,16 +509,11 @@ typedef struct irnet_ctrl_channel
  */
 
 /* -- IRDA PART -- */
-extern int
-   irda_irnet_create(irnet_socket *);  /* Initialise a IrNET socket */
-extern int
-   irda_irnet_connect(irnet_socket *); /* Try to connect over IrDA */
-extern void
-   irda_irnet_destroy(irnet_socket *); /* Teardown  a IrNET socket */
-extern int
-   irda_irnet_init(void);  /* Initialise IrDA part of IrNET */
-extern void
-   irda_irnet_cleanup(void);   /* Teardown IrDA part of IrNET */
+int irda_irnet_create(irnet_socket *); /* Initialise an IrNET socket */
+int irda_irnet_connect(irnet_socket *);/* Try to connect over IrDA */
+void irda_irnet_destroy(irnet_socket *);   /* Teardown an IrNET socket */
+int irda_irnet_init(void); /* Initialise IrDA part of IrNET */
+void irda_irnet_cleanup(void); /* Teardown IrDA part of IrNET */
 
 / VARIABLES /
 
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index 66a559b..337cc9d 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -235,29 +235,40 @@ out:
return tunnel;
 }
 
-extern struct sock *l2tp_tunnel_sock_lookup(struct l2tp_tunnel *tunnel);
-extern void l2tp_tunnel_sock_put(struct sock *sk);
-extern struct l2tp_session *l2tp_session_find(struct net *net, struct l2tp_tunnel *tunnel, u32 session_id);
-extern struct l2tp_session *l2tp_session_find_nth(struct l2tp_tunnel *tunnel, int nth);
-extern struct l2tp_session *l2tp_session_find_by_ifname(struct net *net, char *ifname);
-extern struct l2tp_tunnel *l2tp_tunnel_find(struct net *net, u32 tunnel_id);
-extern struct l2tp_tunnel *l2tp_tunnel_find_nth(struct net *net, int nth);
-
-extern int l2tp_tunnel_create(struct net *net, int fd, int version, u32 tunnel_id, u32 peer_tunnel_id, struct l2tp_tunnel_cfg *cfg, struct l2tp_tunnel **tunnelp);
-extern void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel);
-extern int l2tp_tunnel_delete(struct l2tp_tunnel *tunnel);
-extern struct l2tp_session *l2tp_session_create(int priv_size, struct l2tp_tunnel *tunnel, u32 session_id, u32 peer_session_id, struct l2tp_session_cfg *cfg);
-extern void __l2tp_session_unhash(struct l2tp_session *session);
-extern int l2tp_session_delete(struct l2tp_session *session);
-extern void l2tp_session_free(struct l2tp_session *session);
-extern void l2tp_recv_common(struct l2tp_session *session, struct sk_buff *skb, unsigned char *ptr, unsigned char *optr, u16 hdrflags, int length, int (*payload_hook)(struct sk_buff *skb));
-extern int l2tp_session_queue_purge(struct l2tp_session *session);
-extern int l2tp_udp_encap_recv(struct sock *sk, struct sk_buff *skb);
-
-extern int l2tp_xmit_skb(struct l2tp_session *session, struct sk_buff *skb, int hdr_len);
-
-extern int l2tp_nl_register_ops(enum l2tp_pwtype pw_type, const struct l2tp_nl_cmd_ops *ops);
-extern void l2tp_nl_unregister_ops(enum l2tp_pwtype pw_type);
+struct sock *l2tp_tunnel_sock_lookup(struct l2tp_tunnel *tunnel);
+void l2tp_tunnel_sock_put(struct sock *sk);
+struct l2tp_session *l2tp_session_find(struct net *net,
+  struct l2tp_tunnel *tunnel,
+  u32 session_id);
+struct l2tp_session *l2tp_session_find_nth(struct l2tp_tunnel *tunnel, int nth);
+struct l2tp_session *l2tp_session_find_by_ifname(struct net *net, char *ifname);
+struct l2tp_tunnel *l2tp_tunnel_find(struct net *net, u32 tunnel_id);
+struct l2tp_tunnel *l2tp_tunnel_find_nth(struct net *net, int nth);
+
+int l2tp_tunnel_create(struct net *net, int fd, int version, u32 tunnel_id,
+  u3

[tipc-discussion] [PATCH 3.10 17/17] tipc: remove interface state mirroring in bearer

2016-12-06 Thread Jon Maloy
From: Erik Hugne <erik.hu...@ericsson.com>

struct 'tipc_bearer' is a generic representation of the underlying
media type, and exists in a one-to-one relationship to each interface
TIPC is using. The struct contains a 'blocked' flag that mirrors the
operational and execution state of the represented interface, and is
updated through notification calls from the latter. The users of
tipc_bearer are checking this flag before each attempt to send a
packet via the interface.

This state mirroring serves no purpose in the current code base. TIPC
links will not discover a media failure any faster through this
mechanism, and in reality the flag only adds overhead at packet
sending and reception.

Furthermore, the fact that the flag needs to be protected by a spinlock
aggregated into tipc_bearer has turned out to cause a serious and
completely unnecessary deadlock problem.

         CPU0                                CPU1

Time 0: bearer_disable()                    link_timeout()
Time 1:   spin_lock_bh(&b_ptr->lock)          tipc_link_push_queue()
Time 2:   tipc_link_delete()                    tipc_bearer_blocked(b_ptr)
Time 3:     k_cancel_timer(&req->timer)           spin_lock_bh(&b_ptr->lock)
Time 4:       del_timer_sync(&req->timer)

I.e., del_timer_sync() on CPU0 never returns, because the timer handler
on CPU1 is waiting for the bearer lock.
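The same lock-versus-synchronous-wait pattern can be reproduced in user space. In this sketch (an analogy, not kernel code) a pthread mutex plays the role of the bearer lock, a thread plays the role of the link timer handler, and `pthread_join()` stands in for `del_timer_sync()`. The names `bearer_lock`, `timer_handler()` and `teardown()` are hypothetical. It shows the general rule: never wait synchronously for a handler while holding a lock that the handler needs. The patch itself removes the deadlock differently, by eliminating the 'blocked' flag and the tipc_bearer_blocked() call that takes the lock on CPU1.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical analogue of the bearer lock. */
static pthread_mutex_t bearer_lock = PTHREAD_MUTEX_INITIALIZER;

/* Analogue of the link timer handler: it needs the bearer lock. */
static void *timer_handler(void *arg)
{
	int *ran = arg;

	pthread_mutex_lock(&bearer_lock);
	*ran = 1;
	pthread_mutex_unlock(&bearer_lock);
	return NULL;
}

/*
 * Safe teardown: modify shared state under the lock, but release the lock
 * before waiting for the handler. Calling pthread_join() while still
 * holding bearer_lock would reproduce the deadlock in the diagram above.
 */
static int teardown(int *ran)
{
	pthread_t handler;

	assert(ran != NULL);
	if (pthread_create(&handler, NULL, timer_handler, ran) != 0)
		return -1;

	pthread_mutex_lock(&bearer_lock);
	/* ... update bearer state here ... */
	pthread_mutex_unlock(&bearer_lock);

	/* Analogue of del_timer_sync(): wait with the lock NOT held. */
	return pthread_join(handler, NULL);
}
```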

We eliminate the 'blocked' flag from struct tipc_bearer, along with all
tests on this flag. This not only resolves the deadlock, but also
simplifies and speeds up the data path execution of TIPC. It also fits
well into our ongoing effort to make the locking policy simpler and
more manageable.

An effect of this change is that we can get rid of functions such as
tipc_bearer_blocked(), tipc_continue() and tipc_block_bearer().
We replace the latter with a new function, tipc_reset_bearer(), which
resets all links associated with the bearer immediately after an
interface goes down.

A user might notice one slight change in link behaviour after this
patch. When an interface goes down (e.g. through a NETDEV_DOWN
event), all attached links will be reset immediately, instead of
leaving it to each link to detect the failure through a timer-driven
mechanism. We consider this an improvement, and see no obvious risks
with the new behavior.
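The new behaviour can be sketched as follows, assuming heavily simplified, hypothetical types (`struct bearer`, `struct link`, `enum dev_event`) and helper names; only the idea behind `tipc_reset_bearer()` comes from the patch. No per-bearer 'blocked' state is kept: the notifier simply resets every link on the bearer as soon as the interface goes down.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real TIPC structures. */
enum dev_event { DEV_UP, DEV_DOWN };

struct link {
	int up;                 /* 1 while the link is established */
	struct link *next;      /* next link on the same bearer */
};

struct bearer {
	struct link *links;     /* all links established over this bearer */
};

/* Reset every link on the bearer at once (cf. tipc_reset_bearer()). */
static int reset_bearer(struct bearer *b)
{
	int n = 0;

	assert(b != NULL);
	for (struct link *l = b->links; l != NULL; l = l->next) {
		if (l->up) {
			l->up = 0;      /* stand-in for a full link reset */
			n++;
		}
	}
	return n;               /* number of links that were reset */
}

/* Device notifier: no 'blocked' flag, just an immediate reset on DOWN. */
static void on_device_event(struct bearer *b, enum dev_event evt)
{
	switch (evt) {
	case DEV_DOWN:
		reset_bearer(b);
		break;
	default:
		break;          /* other events are ignored in this sketch */
	}
}
```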

Signed-off-by: Erik Hugne <erik.hu...@ericsson.com>
Reviewed-by: Ying Xue <ying@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortma...@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
---
 net/tipc/bcast.c     |  6 --
 net/tipc/bearer.c    | 33 +++
 net/tipc/bearer.h    |  6 +-
 net/tipc/discover.c  | 17 
 net/tipc/eth_media.c | 14 ++---
 net/tipc/ib_media.c  | 14 ++---
 net/tipc/link.c      | 55 +++-
 7 files changed, 28 insertions(+), 117 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 0d44025..4c2a80b 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -621,12 +621,6 @@ static int tipc_bcbearer_send(struct sk_buff *buf, struct tipc_bearer *unused1,
if (!p)
break; /* No more bearers to try */
 
-   if (tipc_bearer_blocked(p)) {
-   if (!s || tipc_bearer_blocked(s))
-   continue; /* Can't use either bearer */
-   b = s;
-   }
-
tipc_nmap_diff(&bcbearer->remains, &b->nodes,
   &bcbearer->remains_new);
if (bcbearer->remains_new.count == bcbearer->remains.count)
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 3f9707a..c2101c0 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -275,31 +275,6 @@ void tipc_bearer_remove_dest(struct tipc_bearer *b_ptr, u32 dest)
tipc_disc_remove_dest(b_ptr->link_req);
 }
 
-/*
- * Interrupt enabling new requests after bearer blocking:
- * See bearer_send().
- */
-void tipc_continue(struct tipc_bearer *b)
-{
-   spin_lock_bh(&b->lock);
-   b->blocked = 0;
-   spin_unlock_bh(&b->lock);
-}
-
-/*
- * tipc_bearer_blocked - determines if bearer is currently blocked
- */
-int tipc_bearer_blocked(struct tipc_bearer *b)
-{
-   int res;
-
-   spin_lock_bh(&b->lock);
-   res = b->blocked;
-   spin_unlock_bh(&b->lock);
-
-   return res;
-}
-
 /**
  * tipc_enable_bearer - enable bearer with the given name
  */
@@ -420,17 +395,16 @@ exit:
 }
 
 /**
- * tipc_block_bearer - Block the bearer, and reset all its links
+ * tipc_reset_bearer - Reset all links established over this bearer
  */
-int tipc_block_bearer(struct tipc_bearer *b_ptr)
+int tipc_reset_bearer(struct tipc_bearer *b_ptr)
 {
struct tipc_link *l_ptr;
struct tipc_link *temp_l_ptr;
 
read_lock_bh(&tipc_net_lock);
-   pr_info(&quo

[tipc-discussion] [PATCH 3.10 06/17] tipc: silence sparse warnings

2016-12-06 Thread Jon Maloy
From: Ying Xue <ying@windriver.com>

Eliminate the following sparse warnings:

net/tipc/link.c:1210:37: warning: cast removes address space of expression
net/tipc/link.c:1218:59: warning: incorrect type in argument 2 (different address spaces)
net/tipc/link.c:1218:59:    expected void const [noderef] *from
net/tipc/link.c:1218:59:    got unsigned char const [usertype] *[assigned] sect_crs
net/tipc/socket.c:341:49: warning: Using plain integer as NULL pointer
net/tipc/socket.c:1371:36: warning: Using plain integer as NULL pointer
net/tipc/socket.c:1694:57: warning: Using plain integer as NULL pointer

Signed-off-by: Ying Xue <ying@windriver.com>
Signed-off-by: Andreas Bofjäll <andreas.bofj...@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortma...@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
---
 net/tipc/link.c   | 4 ++--
 net/tipc/socket.c | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index b02a6dc..be73a1f 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1160,7 +1160,7 @@ static int link_send_sections_long(struct tipc_port *sender,
struct tipc_msg fragm_hdr;
struct sk_buff *buf, *buf_chain, *prev;
u32 fragm_crs, fragm_rest, hsz, sect_rest;
-   const unchar *sect_crs;
+   const unchar __user *sect_crs;
int curr_sect;
u32 fragm_no;
int res = 0;
@@ -1202,7 +1202,7 @@ again:
 
if (!sect_rest) {
sect_rest = msg_sect[++curr_sect].iov_len;
-   sect_crs = (const unchar *)msg_sect[curr_sect].iov_base;
+   sect_crs = msg_sect[curr_sect].iov_base;
}
 
if (sect_rest < fragm_rest)
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index d224382..3906527 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -338,7 +338,7 @@ static int release(struct socket *sock)
buf = __skb_dequeue(&sk->sk_receive_queue);
if (buf == NULL)
break;
-   if (TIPC_SKB_CB(buf)->handle != 0)
+   if (TIPC_SKB_CB(buf)->handle != NULL)
kfree_skb(buf);
else {
if ((sock->state == SS_CONNECTING) ||
@@ -1364,7 +1364,7 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
return TIPC_ERR_OVERLOAD;
 
/* Enqueue message */
-   TIPC_SKB_CB(buf)->handle = 0;
+   TIPC_SKB_CB(buf)->handle = NULL;
__skb_queue_tail(&sk->sk_receive_queue, buf);
skb_set_owner_r(buf, sk);
 
@@ -1687,7 +1687,7 @@ restart:
/* Disconnect and send a 'FIN+' or 'FIN-' message to peer */
buf = __skb_dequeue(&sk->sk_receive_queue);
if (buf) {
-   if (TIPC_SKB_CB(buf)->handle != 0) {
+   if (TIPC_SKB_CB(buf)->handle != NULL) {
kfree_skb(buf);
goto restart;
}
-- 
2.7.4


--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/xeonphi
___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion


[tipc-discussion] [PATCH 3.10 05/17] tipc: remove iovec length parameter from all sending functions

2016-12-06 Thread Jon Maloy
From: Ying Xue <ying@windriver.com>

tipc_msg_build() now copies message data from iovec to sk_buff
using memcpy_fromiovecend(), which doesn't need to be passed the
iovec length to perform the copying.

So we remove the parameter indicating iovec length in all
functions where TIPC messages are built and sent.
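To see why the length of the iovec array is not needed, here is a user-space sketch in the spirit of memcpy_fromiovecend(): the caller states an offset and a byte count, and the copy loop stops once that many bytes have been moved. The function name `copy_from_iovec()` is hypothetical, and the sketch assumes the iovec array holds at least `off + len` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy len bytes, starting at byte offset off within the data described by
 * iov, into dst. The number of iovec entries is never consulted: the byte
 * count alone tells the loop when to stop.
 */
static void copy_from_iovec(unsigned char *dst, const struct iovec *iov,
			    size_t off, size_t len)
{
	assert(dst != NULL && iov != NULL);
	if (len == 0)
		return;

	/* Skip entries that lie entirely before the starting offset. */
	while (off >= iov->iov_len) {
		off -= iov->iov_len;
		iov++;
	}

	/* Copy from successive entries until len bytes have been moved. */
	while (len > 0) {
		size_t chunk = iov->iov_len - off;

		if (chunk > len)
			chunk = len;
		if (chunk > 0)
			memcpy(dst, (const unsigned char *)iov->iov_base + off, chunk);
		dst += chunk;
		len -= chunk;
		off = 0;
		iov++;
	}
}
```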

Signed-off-by: Ying Xue <ying@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortma...@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
---
 net/tipc/link.c   | 25 +
 net/tipc/link.h   |  4 +---
 net/tipc/msg.c    |  7 +++---
 net/tipc/msg.h    |  3 +--
 net/tipc/port.c   | 66 +++
 net/tipc/port.h   | 16 +-
 net/tipc/socket.c |  6 +
 7 files changed, 49 insertions(+), 78 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 0cc3d90..b02a6dc 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -97,8 +97,7 @@ static int  link_recv_changeover_msg(struct tipc_link **l_ptr,
 static void link_set_supervision_props(struct tipc_link *l_ptr, u32 tolerance);
 static int  link_send_sections_long(struct tipc_port *sender,
struct iovec const *msg_sect,
-   u32 num_sect, unsigned int total_len,
-   u32 destnode);
+   unsigned int len, u32 destnode);
 static void link_state_event(struct tipc_link *l_ptr, u32 event);
 static void link_reset_statistics(struct tipc_link *l_ptr);
 static void link_print(struct tipc_link *l_ptr, const char *str);
@@ -1065,8 +1064,7 @@ static int link_send_buf_fast(struct tipc_link *l_ptr, struct sk_buff *buf,
  */
 int tipc_link_send_sections_fast(struct tipc_port *sender,
 struct iovec const *msg_sect,
-const u32 num_sect, unsigned int total_len,
-u32 destaddr)
+unsigned int len, u32 destaddr)
 {
struct tipc_msg *hdr = &sender->phdr;
struct tipc_link *l_ptr;
@@ -1080,8 +1078,7 @@ again:
 * Try building message using port's max_pkt hint.
 * (Must not hold any locks while building message.)
 */
-   res = tipc_msg_build(hdr, msg_sect, num_sect, total_len,
-sender->max_pkt, &buf);
+   res = tipc_msg_build(hdr, msg_sect, len, sender->max_pkt, &buf);
/* Exit if build request was invalid */
if (unlikely(res < 0))
return res;
@@ -1121,8 +1118,7 @@ exit:
if ((msg_hdr_sz(hdr) + res) <= sender->max_pkt)
goto again;
 
-   return link_send_sections_long(sender, msg_sect,
-  num_sect, total_len,
+   return link_send_sections_long(sender, msg_sect, len,
   destaddr);
}
tipc_node_unlock(node);
@@ -1133,8 +1129,8 @@ exit:
if (buf)
return tipc_reject_msg(buf, TIPC_ERR_NO_NODE);
if (res >= 0)
-   return tipc_port_reject_sections(sender, hdr, msg_sect, num_sect,
-total_len, TIPC_ERR_NO_NODE);
+   return tipc_port_reject_sections(sender, hdr, msg_sect,
+len, TIPC_ERR_NO_NODE);
return res;
 }
 
@@ -1154,13 +1150,12 @@ exit:
  */
 static int link_send_sections_long(struct tipc_port *sender,
   struct iovec const *msg_sect,
-  u32 num_sect, unsigned int total_len,
-  u32 destaddr)
+  unsigned int len, u32 destaddr)
 {
struct tipc_link *l_ptr;
struct tipc_node *node;
struct tipc_msg *hdr = &sender->phdr;
-   u32 dsz = total_len;
+   u32 dsz = len;
u32 max_pkt, fragm_sz, rest;
struct tipc_msg fragm_hdr;
struct sk_buff *buf, *buf_chain, *prev;
@@ -1283,8 +1278,8 @@ reject:
buf = buf_chain->next;
kfree_skb(buf_chain);
}
-   return tipc_port_reject_sections(sender, hdr, msg_sect, num_sect,
-total_len, TIPC_ERR_NO_NODE);
+   return tipc_port_reject_sections(sender, hdr, msg_sect,
+len, TIPC_ERR_NO_NODE);
}
 
/* Append chain of fragments to send queue & send them */
diff --git a/net/tipc/link.h b/net/tipc/link.h
index c048ed1..55cf855 100644
--- a/net/tipc/link.h
+++ b/net/tipc/link.h
@@ -227,9 +227,7 @@ int tipc_link_send_buf(struct tipc_link *l_ptr, struct sk_buff *b

[tipc-discussion] [PATCH 3.10 07/17] tipc: make bearer and media naming consistent

2016-12-06 Thread Jon Maloy
From: Ying Xue <ying@windriver.com>

TIPC 'bearer' exists as an abstract concept, while 'media'
is deemed a specific implementation of a bearer, such as Ethernet
or Infiniband media. When a component inside TIPC wants to control
a specific media, it only needs to access the generic bearer API
to achieve this. However, in the current media implementations,
the 'bearer' name is also extensively used in media specific
function and variable names.

This may create confusion, so we choose to replace the term 'bearer'
with 'media' in all function names, variable names, and prefixes
where that is what is really meant.

Note that this change is cosmetic only, and no runtime behaviour
changes are made here.
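The bearer/media split described above amounts to a table of function pointers: generic bearer code calls through the media's operations without knowing which media it is driving. Here is a minimal sketch with hypothetical names modelled on the renamed `enable_media`/`disable_media` members. The real `struct tipc_media` has more members and different semantics.

```c
#include <assert.h>
#include <stddef.h>

struct bearer;

/* Media-specific operations (hypothetical, modelled on struct tipc_media). */
struct media_ops {
	const char *name;
	int (*enable_media)(struct bearer *b);
	void (*disable_media)(struct bearer *b);
};

/* Generic bearer: holds a pointer to the media it runs over. */
struct bearer {
	const struct media_ops *media;
	int enabled;
};

/* A toy "Ethernet" media implementation. */
static int eth_enable_media(struct bearer *b)
{
	b->enabled = 1;
	return 0;
}

static void eth_disable_media(struct bearer *b)
{
	b->enabled = 0;
}

static const struct media_ops eth_ops = {
	.name = "eth",
	.enable_media = eth_enable_media,
	.disable_media = eth_disable_media,
};

/* Generic bearer code only ever dispatches through the ops table. */
static int bearer_enable(struct bearer *b)
{
	assert(b != NULL && b->media != NULL);
	return b->media->enable_media(b);
}

static void bearer_disable(struct bearer *b)
{
	assert(b != NULL && b->media != NULL);
	b->media->disable_media(b);
}
```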

Signed-off-by: Ying Xue <ying@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortma...@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
---
 net/tipc/bearer.c    |  4 ++--
 net/tipc/bearer.h    |  8 
 net/tipc/eth_media.c | 56 ++--
 net/tipc/ib_media.c  | 46 +-
 4 files changed, 57 insertions(+), 57 deletions(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 609c30c..09faa55 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -387,7 +387,7 @@ restart:
 
b_ptr = &tipc_bearers[bearer_id];
strcpy(b_ptr->name, name);
-   res = m_ptr->enable_bearer(b_ptr);
+   res = m_ptr->enable_media(b_ptr);
if (res) {
pr_warn("Bearer <%s> rejected, enable failure (%d)\n",
name, -res);
@@ -465,7 +465,7 @@ static void bearer_disable(struct tipc_bearer *b_ptr)
pr_info("Disabling bearer <%s>\n", b_ptr->name);
spin_lock_bh(&b_ptr->lock);
b_ptr->blocked = 1;
-   b_ptr->media->disable_bearer(b_ptr);
+   b_ptr->media->disable_media(b_ptr);
list_for_each_entry_safe(l_ptr, temp_l_ptr, &b_ptr->links, link_list) {
tipc_link_delete(l_ptr);
}
diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
index 09c869a..f800e63 100644
--- a/net/tipc/bearer.h
+++ b/net/tipc/bearer.h
@@ -75,8 +75,8 @@ struct tipc_bearer;
 /**
  * struct tipc_media - TIPC media information available to internal users
  * @send_msg: routine which handles buffer transmission
- * @enable_bearer: routine which enables a bearer
- * @disable_bearer: routine which disables a bearer
+ * @enable_media: routine which enables a media
+ * @disable_media: routine which disables a media
  * @addr2str: routine which converts media address to string
  * @addr2msg: routine which converts media address to protocol message area
  * @msg2addr: routine which converts media address from protocol message area
@@ -91,8 +91,8 @@ struct tipc_media {
int (*send_msg)(struct sk_buff *buf,
struct tipc_bearer *b_ptr,
struct tipc_media_addr *dest);
-   int (*enable_bearer)(struct tipc_bearer *b_ptr);
-   void (*disable_bearer)(struct tipc_bearer *b_ptr);
+   int (*enable_media)(struct tipc_bearer *b_ptr);
+   void (*disable_media)(struct tipc_bearer *b_ptr);
int (*addr2str)(struct tipc_media_addr *a, char *str_buf, int str_size);
int (*addr2msg)(struct tipc_media_addr *a, char *msg_area);
int (*msg2addr)(const struct tipc_bearer *b_ptr,
diff --git a/net/tipc/eth_media.c b/net/tipc/eth_media.c
index 40ea40c..e048d49 100644
--- a/net/tipc/eth_media.c
+++ b/net/tipc/eth_media.c
@@ -2,7 +2,7 @@
  * net/tipc/eth_media.c: Ethernet bearer support for TIPC
  *
  * Copyright (c) 2001-2007, Ericsson AB
- * Copyright (c) 2005-2008, 2011, Wind River Systems
+ * Copyright (c) 2005-2008, 2011-2013, Wind River Systems
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -37,19 +37,19 @@
 #include "core.h"
 #include "bearer.h"
 
-#define MAX_ETH_BEARERS  MAX_BEARERS
+#define MAX_ETH_MEDIA  MAX_BEARERS
 
 #define ETH_ADDR_OFFSET  4   /* message header offset of MAC address */
 
 /**
- * struct eth_bearer - Ethernet bearer data structure
+ * struct eth_media - Ethernet bearer data structure
  * @bearer: ptr to associated "generic" bearer structure
  * @dev: ptr to associated Ethernet network device
  * @tipc_packet_type: used in binding TIPC to Ethernet driver
  * @setup: work item used when enabling bearer
  * @cleanup: work item used when disabling bearer
  */
-struct eth_bearer {
+struct eth_media {
struct tipc_bearer *bearer;
struct net_device *dev;
struct packet_type tipc_packet_type;
@@ -58,7 +58,7 @@ struct eth_bearer {
 };
 
 static struct tipc_media eth_media_info;
-static struct eth_bearer eth_bearers[MAX_ETH_BEARERS];
+static struct eth_media eth_media_array[MAX_ETH_MEDIA];
 sta

[tipc-discussion] [PATCH 3.10 16/17] tipc: reassembly failures should cause link reset

2016-12-06 Thread Jon Maloy
From: Erik Hugne <erik.hu...@ericsson.com>

If appending a received fragment to the pending fragment chain
in a unicast link fails, the current code tries to force a retransmission
of the fragment by decrementing the 'next received sequence number'
field in the link. This is done under the assumption that the failure
is caused by an out-of-memory situation, an assumption that does
not hold true after the previous patch in this series.

A failure to append a fragment can now only be caused by a protocol
violation by the sending peer, and it must hence be assumed that it
is either malicious or buggy.  Either way, the correct behavior is now
to reset the link instead of trying to revert its sequence number.
So, this is what we do in this commit.

Signed-off-by: Erik Hugne <erik.hu...@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortma...@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
---
 net/tipc/link.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index a63646e..cf465d6 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1652,7 +1652,7 @@ deliver:
goto deliver;
}
if (ret == LINK_REASM_ERROR)
-   l_ptr->next_in_no--;
+   tipc_link_reset(l_ptr);
tipc_node_unlock(n_ptr);
continue;
case CHANGEOVER_PROTOCOL:
-- 
2.7.4




[tipc-discussion] [PATCH 3.10 10/17] tipc: correct return value of link_cmd_set_value routine

2016-12-06 Thread Jon Maloy
From: Ying Xue <ying@windriver.com>

link_cmd_set_value() takes commands for link, bearer and media related
configuration. Generally the function returns 0 when a command is
recognized, and -EINVAL when it is not. However, in the switch for link
related commands it returns 0 even when the command is unrecognized. This
will sometimes make it look as if a failed configuration command has been
successful, but has otherwise no negative effects.

We remove this anomaly by returning -EINVAL even for link commands. We also
rework all three switches to make them conform to common kernel coding
style.

Signed-off-by: Ying Xue <ying@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortma...@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
---
 net/tipc/link.c | 28 +++-
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index be73a1f..223bbc8 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -2641,6 +2641,7 @@ static int link_cmd_set_value(const char *name, u32 new_value, u16 cmd)
struct tipc_link *l_ptr;
struct tipc_bearer *b_ptr;
struct tipc_media *m_ptr;
+   int res = 0;
 
l_ptr = link_find_link(name, &node);
if (l_ptr) {
@@ -2663,9 +2664,12 @@ static int link_cmd_set_value(const char *name, u32 new_value, u16 cmd)
case TIPC_CMD_SET_LINK_WINDOW:
tipc_link_set_queue_limits(l_ptr, new_value);
break;
+   default:
+   res = -EINVAL;
+   break;
}
tipc_node_unlock(node);
-   return 0;
+   return res;
}
 
b_ptr = tipc_bearer_find(name);
@@ -2673,15 +2677,18 @@ static int link_cmd_set_value(const char *name, u32 new_value, u16 cmd)
switch (cmd) {
case TIPC_CMD_SET_LINK_TOL:
b_ptr->tolerance = new_value;
-   return 0;
+   break;
case TIPC_CMD_SET_LINK_PRI:
b_ptr->priority = new_value;
-   return 0;
+   break;
case TIPC_CMD_SET_LINK_WINDOW:
b_ptr->window = new_value;
-   return 0;
+   break;
+   default:
+   res = -EINVAL;
+   break;
}
-   return -EINVAL;
+   return res;
}
 
m_ptr = tipc_media_find(name);
@@ -2690,15 +2697,18 @@ static int link_cmd_set_value(const char *name, u32 new_value, u16 cmd)
switch (cmd) {
case TIPC_CMD_SET_LINK_TOL:
m_ptr->tolerance = new_value;
-   return 0;
+   break;
case TIPC_CMD_SET_LINK_PRI:
m_ptr->priority = new_value;
-   return 0;
+   break;
case TIPC_CMD_SET_LINK_WINDOW:
m_ptr->window = new_value;
-   return 0;
+   break;
+   default:
+   res = -EINVAL;
+   break;
}
-   return -EINVAL;
+   return res;
 }
 
 struct sk_buff *tipc_link_cmd_config(const void *req_tlv_area, int req_tlv_space,
-- 
2.7.4




[tipc-discussion] [PATCH 3.10 08/17] tipc: avoid unnecessary lookup for tipc bearer instance

2016-12-06 Thread Jon Maloy
From: Ying Xue <ying@windriver.com>

tipc_block_bearer() currently takes a bearer name (const char*)
as argument. This requires the function to make a lookup to find
the pointer to the corresponding bearer struct. In the current
code base this is not necessary, since the only two callers
(tipc_continue(), recv_notification()) already have validated
copies of this pointer, and hence can pass it directly in the
function call.

We change tipc_block_bearer() to directly take struct tipc_bearer*
as argument instead.

Signed-off-by: Ying Xue <ying@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortma...@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
---
 net/tipc/bearer.c    | 14 +++---
 net/tipc/bearer.h    |  2 +-
 net/tipc/eth_media.c |  6 +++---
 net/tipc/ib_media.c  |  6 +++---
 4 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 09faa55..3f9707a 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -420,23 +420,15 @@ exit:
 }
 
 /**
- * tipc_block_bearer - Block the bearer with the given name, and reset all its links
+ * tipc_block_bearer - Block the bearer, and reset all its links
  */
-int tipc_block_bearer(const char *name)
+int tipc_block_bearer(struct tipc_bearer *b_ptr)
 {
-   struct tipc_bearer *b_ptr = NULL;
struct tipc_link *l_ptr;
struct tipc_link *temp_l_ptr;
 
read_lock_bh(&tipc_net_lock);
-   b_ptr = tipc_bearer_find(name);
-   if (!b_ptr) {
-   pr_warn("Attempt to block unknown bearer <%s>\n", name);
-   read_unlock_bh(_net_lock);
-   return -EINVAL;
-   }
-
-   pr_info("Blocking bearer <%s>\n", name);
+   pr_info("Blocking bearer <%s>\n", b_ptr->name);
spin_lock_bh(&b_ptr->lock);
b_ptr->blocked = 1;
list_for_each_entry_safe(l_ptr, temp_l_ptr, &b_ptr->links, link_list) {
diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
index f800e63..e5e04be 100644
--- a/net/tipc/bearer.h
+++ b/net/tipc/bearer.h
@@ -163,7 +163,7 @@ int tipc_register_media(struct tipc_media *m_ptr);
 
 void tipc_recv_msg(struct sk_buff *buf, struct tipc_bearer *tb_ptr);
 
-int  tipc_block_bearer(const char *name);
+int  tipc_block_bearer(struct tipc_bearer *b_ptr);
 void tipc_continue(struct tipc_bearer *tb_ptr);
 
 int tipc_enable_bearer(const char *bearer_name, u32 disc_domain, u32 priority);
diff --git a/net/tipc/eth_media.c b/net/tipc/eth_media.c
index e048d49..c36c938 100644
--- a/net/tipc/eth_media.c
+++ b/net/tipc/eth_media.c
@@ -265,17 +265,17 @@ static int recv_notification(struct notifier_block *nb, unsigned long evt,
if (netif_carrier_ok(dev))
tipc_continue(eb_ptr->bearer);
else
-   tipc_block_bearer(eb_ptr->bearer->name);
+   tipc_block_bearer(eb_ptr->bearer);
break;
case NETDEV_UP:
tipc_continue(eb_ptr->bearer);
break;
case NETDEV_DOWN:
-   tipc_block_bearer(eb_ptr->bearer->name);
+   tipc_block_bearer(eb_ptr->bearer);
break;
case NETDEV_CHANGEMTU:
case NETDEV_CHANGEADDR:
-   tipc_block_bearer(eb_ptr->bearer->name);
+   tipc_block_bearer(eb_ptr->bearer);
tipc_continue(eb_ptr->bearer);
break;
case NETDEV_UNREGISTER:
diff --git a/net/tipc/ib_media.c b/net/tipc/ib_media.c
index 5545145..20b1aa4 100644
--- a/net/tipc/ib_media.c
+++ b/net/tipc/ib_media.c
@@ -258,17 +258,17 @@ static int recv_notification(struct notifier_block *nb, unsigned long evt,
if (netif_carrier_ok(dev))
tipc_continue(ib_ptr->bearer);
else
-   tipc_block_bearer(ib_ptr->bearer->name);
+   tipc_block_bearer(ib_ptr->bearer);
break;
case NETDEV_UP:
tipc_continue(ib_ptr->bearer);
break;
case NETDEV_DOWN:
-   tipc_block_bearer(ib_ptr->bearer->name);
+   tipc_block_bearer(ib_ptr->bearer);
break;
case NETDEV_CHANGEMTU:
case NETDEV_CHANGEADDR:
-   tipc_block_bearer(ib_ptr->bearer->name);
+   tipc_block_bearer(ib_ptr->bearer);
tipc_continue(ib_ptr->bearer);
break;
case NETDEV_UNREGISTER:
-- 
2.7.4


--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http:/

Re: [tipc-discussion] [PATCH net-next v2 1/3] tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() functions

2016-12-06 Thread Jon Maloy


On 12/06/2016 03:49 AM, Parthasarathy Bhuvaragan wrote:
> On 11/29/2016 06:03 PM, Jon Maloy wrote:
>> The functions tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() are very
>> similar. The latter function is also called from two locations, and
>> there will be more in the coming commits, which will all need to test on
>> different conditions.
>>
>> Instead of making yet another duplicates of the function, we now
>> introduce a new macro tipc_wait_for_cond() where the wakeup condition
>> can be stated as an argument to the call. This macro replaces all
>> current and future uses of the two functions, which can now be
>> eliminated.
>>
>> Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
>> ---
>>  net/tipc/socket.c | 110 +-
>>  1 file changed, 51 insertions(+), 59 deletions(-)
>>
>> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
>> index 333c5da..30732a8 100644
>> --- a/net/tipc/socket.c
>> +++ b/net/tipc/socket.c
>> @@ -110,7 +110,6 @@ static void tipc_write_space(struct sock *sk);
>>  static void tipc_sock_destruct(struct sock *sk);
>>  static int tipc_release(struct socket *sock);
>>  static int tipc_accept(struct socket *sock, struct socket *new_sock, int flags);
>> -static int tipc_wait_for_sndmsg(struct socket *sock, long *timeo_p);
>>  static void tipc_sk_timeout(unsigned long data);
>>  static int tipc_sk_publish(struct tipc_sock *tsk, uint scope,
>>  			   struct tipc_name_seq const *seq);
>> @@ -334,6 +333,51 @@ static int tipc_set_sk_state(struct sock *sk, int state)
>>  	return res;
>>  }
>>
>> +static int tipc_sk_sock_err(struct socket *sock, long *timeout)
>> +{
>> +	struct sock *sk = sock->sk;
>> +	int err = sock_error(sk);
>> +	int typ = sock->type;
>> +
>> +	if (err)
>> +		return err;
>> +	if (typ == SOCK_STREAM || typ == SOCK_SEQPACKET) {
>> +		if (sk->sk_state == TIPC_DISCONNECTING)
>> +			return -EPIPE;
>> +		else if (!tipc_sk_connected(sk))
>> +			return -ENOTCONN;
>> +	} else if (sk->sk_shutdown & SEND_SHUTDOWN) {
>> +		return -EPIPE;
> The else is for connectionless sockets, so returning EPIPE does not
> make sense.

This is how it is currently done in tipc_wait_for_sndmsg(), probably a
leftover from a time when it was used even for connections. Should I
just leave it out, or do you suggest some other code?

///jon

>
> /Partha
>> +	}
>> +	if (!*timeout)
>> +		return -EAGAIN;
>> +	if (signal_pending(current))
>> +		return sock_intr_errno(*timeout);
>> +
>> +	return 0;
>> +}
>> +
>> +#define tipc_wait_for_cond(sock_, timeout_, condition_)		\
>> +({								\
>> +	int rc_ = 0;						\
>> +	int done_ = 0;						\
>> +								\
>> +	while (!(condition_) && !done_) {			\
>> +		struct sock *sk_ = (sock_)->sk;			\
>> +		DEFINE_WAIT_FUNC(wait_, woken_wake_function);	\
>> +								\
>> +		rc_ = tipc_sk_sock_err(sock_, timeout_);	\
>> +		if (rc_)					\
>> +			break;					\
>> +		prepare_to_wait(sk_sleep(sk_), &wait_,		\
>> +				TASK_INTERRUPTIBLE);		\
>> +		done_ = sk_wait_event(sk_, timeout_,		\
>> +				      (condition_), &wait_);	\
>> +		remove_wait_queue(sk_sleep(sk_), &wait_);	\
>> +	}							\
>> +	rc_;							\
>> +})
>> +
>>  /**
>>   * tipc_sk_create - create a TIPC socket
>>   * @net: network namespace (must be default network)
>> @@ -719,7 +763,7 @@ static int tipc_sendmcast(struct  socket *sock, struct tipc_name_seq *seq,
>>
>>  		if (rc == -ELINKCONG) {
>>  			tsk->link_cong = 1;
>> -			rc = tipc_wait_for_sndmsg(sock, );
>> +			rc = tipc_wait_for_cond(sock, , !tsk->link_cong);
>>  			if (!rc)
>>  				continue;
>>  		}
>> @@ -828,31 +872,6 @@ static void tipc_sk_proto_rcv(struct tipc_sock *tsk, struct sk_buff *skb,
>>  	kfree_skb(skb);
>>  }
>>
>> -static int tipc_wait_for_sndmsg(struct socket *sock, long *timeo_p)
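The idea behind tipc_wait_for_cond(), one wait loop whose wakeup condition is supplied by the caller as an expression, can be illustrated outside the kernel. The following is a minimal single-threaded C sketch, not kernel code: struct sim_sock, sim_sock_err(), sim_wait_one_tick() and SIM_WAIT_FOR_COND() are hypothetical names, and "sleeping" is modeled as consuming one tick of a simulated timeout instead of a real wait queue.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a socket; NOT the kernel's struct sock. */
struct sim_sock {
	bool link_cong;              /* true while the simulated link is congested */
	long timeout;                /* remaining wait budget, in ticks */
	int pending_err;             /* error to report on the next check, 0 if none */
	int ticks_until_uncongested; /* congestion clears after this many ticks */
};

/* Similar in spirit to tipc_sk_sock_err(): report a socket error first,
 * then -EAGAIN once the wait budget is used up. */
static int sim_sock_err(struct sim_sock *s)
{
	if (s->pending_err)
		return s->pending_err;
	if (s->timeout <= 0)
		return -EAGAIN;
	return 0;
}

/* Simulated sleep: consume one tick; congestion clears after N ticks. */
static void sim_wait_one_tick(struct sim_sock *s)
{
	assert(s->timeout > 0); /* only called after sim_sock_err() returned 0 */
	s->timeout--;
	if (s->ticks_until_uncongested > 0 && --s->ticks_until_uncongested == 0)
		s->link_cong = false;
}

/* The wakeup condition is a macro argument, so it is re-evaluated on every
 * iteration and one macro can serve callers with different conditions. */
#define SIM_WAIT_FOR_COND(s_, rc_, condition_)		\
	do {						\
		(rc_) = 0;				\
		while (!(condition_)) {			\
			(rc_) = sim_sock_err(s_);	\
			if (rc_)			\
				break;			\
			sim_wait_one_tick(s_);		\
		}					\
	} while (0)
```

For example, with `struct sim_sock a = { .link_cong = true, .timeout = 5, .ticks_until_uncongested = 3 };`, the call `SIM_WAIT_FOR_COND(&a, rc, !a.link_cong);` leaves `rc == 0` after three ticks, while a budget of only two ticks ends with `rc == -EAGAIN`. Passing a different expression, such as a connection-state test, needs no new helper, which is the duplication the patch sets out to remove.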

[tipc-discussion] [net-next 3/3] tipc: reduce risk of user starvation during link congestion

2017-01-03 Thread Jon Maloy
The socket code currently handles link congestion either by blocking
and trying to send again when the congestion has abated, or by just
returning -EAGAIN to the user and letting the user retry later.

This mechanism is prone to starvation, because the wakeup algorithm is
non-atomic. In the interval between the link issuing a wakeup signal and
the socket waking up and re-attempting the send, other senders may have
come in and occupied the free buffer space in the link. This in turn
may force a socket to make many send attempts before it succeeds. In
extremely loaded systems we have observed latencies of several seconds
before a low-priority socket is able to send out a message.
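To make the race concrete, here is a deterministic C model of the old behaviour. struct old_link, old_xmit() and attempts_until_sent() are hypothetical names invented for illustration, -EAGAIN stands in for -ELINKCONG, and the scheduling is deliberately adversarial: for a chosen number of rounds, the slot freed by each wakeup is taken by a sender that is already running before the woken socket gets to retry.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical link model for the OLD scheme: a send attempted while the
 * backlog is full is rejected and NOT queued. */
struct old_link {
	int backlog; /* messages currently queued */
	int limit;   /* congestion threshold */
};

static int old_xmit(struct old_link *l)
{
	if (l->backlog >= l->limit)
		return -EAGAIN; /* stand-in for -ELINKCONG */
	l->backlog++;
	return 0;
}

/* Each round the link transmits one message and wakes the blocked socket.
 * During the first `racing_rounds` rounds a competing sender grabs the
 * freed slot before the woken socket is scheduled. Returns the number of
 * attempts the woken socket needed, or -1 if it never got through. */
static int attempts_until_sent(int limit, int racing_rounds, int rounds)
{
	struct old_link l = { .backlog = limit, .limit = limit };
	int attempts = 0;

	assert(limit > 0);
	for (int r = 0; r < rounds; r++) {
		l.backlog--;                /* one slot freed, wakeup issued */
		if (r < racing_rounds)
			(void)old_xmit(&l); /* competitor wins the race */
		attempts++;
		if (old_xmit(&l) == 0)
			return attempts;
	}
	return -1;
}
```

With no competition the woken socket succeeds on its first attempt; if it loses the race for three rounds, it needs four attempts; and if it loses every round, it never gets through. The model does not measure real latencies, it only shows why a non-atomic wakeup lets the attempt count grow without bound.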

In this commit, we simplify this mechanism and reduce the risk of the
described scenario happening. When a socket attempts to send a message
via a congested link, we now let the message be added to the link's
backlog queue anyway, thus permitting an oversubscription of one message
per source socket. We still create a wakeup item and return an error
code, instructing the sender to block or stop sending. Only when enough
space has been freed up in the link's backlog queue do we issue a wakeup
event that allows the sender to continue with the next message, if any.
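A minimal C model of the new rule is sketched below. struct sim_link, sim_link_xmit() and sim_link_drain() are hypothetical names, not TIPC functions, and -EAGAIN stands in for -ELINKCONG. A send on a congested link is still queued, a per-socket wakeup item is recorded, and congestion is reported. As a simplification, every pending sender is woken as soon as the backlog drops below a single limit, whereas TIPC keeps separate backlog limits per message importance level.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SIM_MAX_SOCKS 8

/* Hypothetical link model for the NEW scheme. */
struct sim_link {
	int backlog;                        /* messages queued */
	int limit;                          /* congestion threshold */
	bool wakeup_pending[SIM_MAX_SOCKS]; /* one wakeup item per socket */
};

/* Queue one message from socket `sock_id`. Below the limit this is a plain
 * success. Above it the message is STILL queued (oversubscription of one
 * message per socket), a wakeup item is recorded, and -EAGAIN tells the
 * caller to block. The caller must not send again until it is woken. */
static int sim_link_xmit(struct sim_link *l, int sock_id)
{
	assert(sock_id >= 0 && sock_id < SIM_MAX_SOCKS);
	assert(!l->wakeup_pending[sock_id]);
	l->backlog++;
	if (l->backlog <= l->limit)
		return 0;
	l->wakeup_pending[sock_id] = true;
	return -EAGAIN;
}

/* Transmit up to n queued messages. Once the backlog is below the limit,
 * wake every socket holding a wakeup item; returns how many were woken. */
static int sim_link_drain(struct sim_link *l, int n)
{
	int woken = 0;

	l->backlog = (n >= l->backlog) ? 0 : l->backlog - n;
	if (l->backlog >= l->limit)
		return 0;
	for (int i = 0; i < SIM_MAX_SOCKS; i++) {
		if (l->wakeup_pending[i]) {
			l->wakeup_pending[i] = false;
			woken++;
		}
	}
	return woken;
}
```

With a limit of 2, two sockets can each send two messages: the second pair returns -EAGAIN but is queued, so the backlog reaches 4. Draining one message wakes nobody; draining two more brings the backlog to 1 and wakes both sockets. Because the congested message is already queued when the error comes back, the sender has nothing to retry and only waits for its wakeup before moving on to its next message.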

Since a socket can now consider a message sent even when the link
returns a congestion code, the socket send code can be simplified.
This is also a good opportunity to get rid of the obsolete 'mtu change'
condition in the three socket send functions, so we choose to refactor
those functions completely.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>
Acked-by: Ying Xue <ying@windriver.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/bcast.c  |   6 +-
 net/tipc/link.c   |  75 +---
 net/tipc/msg.h    |   2 -
 net/tipc/node.c   |  15 +--
 net/tipc/socket.c | 347 --
 5 files changed, 194 insertions(+), 251 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index aa1babb..c35fad3 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -174,7 +174,7 @@ static void tipc_bcbase_xmit(struct net *net, struct sk_buff_head *xmitq)
  *and to identified node local sockets
  * @net: the applicable net namespace
  * @list: chain of buffers containing message
- * Consumes the buffer chain, except when returning -ELINKCONG
+ * Consumes the buffer chain.
  * Returns 0 if success, otherwise errno: -ELINKCONG,-EHOSTUNREACH,-EMSGSIZE
  */
 int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list)
@@ -197,7 +197,7 @@ int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list)
 	tipc_bcast_unlock(net);
 
 	/* Don't send to local node if adding to link failed */
-	if (unlikely(rc)) {
+	if (unlikely(rc && (rc != -ELINKCONG))) {
 		__skb_queue_purge();
 		return rc;
 	}
@@ -206,7 +206,7 @@ int tipc_bcast_xmit(struct net *net, struct sk_buff_head *list)
 	tipc_bcbase_xmit(net, );
 	tipc_sk_mcast_rcv(net, , );
 	__skb_queue_purge(list);
-	return 0;
+	return rc;
 }
 
 /* tipc_bcast_rcv - receive a broadcast packet, and deliver to rcv link
diff --git a/net/tipc/link.c b/net/tipc/link.c
index bda89bf..b758ca8 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -776,60 +776,47 @@ int tipc_link_timeout(struct tipc_link *l, struct sk_buff_head *xmitq)
 
 /**
  * link_schedule_user - schedule a message sender for wakeup after congestion
- * @link: congested link
- * @list: message that was attempted sent
+ * @l: congested link
+ * @hdr: header of message that is being sent
  * Create pseudo msg to send back to user when congestion abates
- * Does not consume buffer list
  */
-static int link_schedule_user(struct tipc_link *link, struct sk_buff_head *list)
+static int link_schedule_user(struct tipc_link *l, struct tipc_msg *hdr)
 {
-	struct tipc_msg *msg = buf_msg(skb_peek(list));
-	int imp = msg_importance(msg);
-	u32 oport = msg_origport(msg);
-	u32 addr = tipc_own_addr(link->net);
+	u32 dnode = tipc_own_addr(l->net);
+	u32 dport = msg_origport(hdr);
 	struct sk_buff *skb;
 
-	/* This really cannot happen...  */
-	if (unlikely(imp > TIPC_CRITICAL_IMPORTANCE)) {
-		pr_warn("%s<%s>, send queue full", link_rst_msg, link->name);
-		return -ENOBUFS;
-	}
-	/* Non-blocking sender: */
-	if (TIPC_SKB_CB(skb_peek(list))->wakeup_pending)
-		return -ELINKCONG;
-
 	/* Create and schedule wakeup pseudo message */
 	skb = tipc_msg_create(SOCK_WAKEUP, 0, INT_H_SIZE, 0,
-			      addr, addr, oport, 0, 0);
+			      dnode, l->addr, dport, 0, 0);
 	if (!skb)
 		return -ENOBUFS;
-	TIPC_SKB_CB(skb)->chai

[tipc-discussion] [net-next 0/3] tipc: improve interaction socket-link

2017-01-03 Thread Jon Maloy
We fix a very real starvation problem that may occur when a link
encounters send buffer congestion. At the same time, we make the
interaction between the socket and link layers simpler and more
consistent.

Jon Maloy (3):
  tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg()
    functions
  tipc: modify struct tipc_plist to be more versatile
  tipc: reduce risk of user starvation during link congestion

 net/tipc/bcast.c      |   6 +-
 net/tipc/link.c       |  75 -
 net/tipc/msg.h        |   2 -
 net/tipc/name_table.c | 100 +++
 net/tipc/name_table.h |  21 +--
 net/tipc/node.c       |  15 +-
 net/tipc/socket.c     | 449 ++
 7 files changed, 319 insertions(+), 349 deletions(-)

-- 
2.7.4


___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

