from:"Xue, Ying"

Re: [tipc-discussion] [net] tipc: fix the timer expires after interval 100ms

2022-03-18 Thread Xue, Ying

On 3/18/2022 11:58 AM, Hoang Le wrote:
> In the timer callback function tipc_sk_timeout(), we try to
> reschedule another timeout to retransmit a setup request if the
> destination link is congested. But we pass an incorrect timeout value
> (msecs_to_jiffies(100)) instead of (jiffies + msecs_to_jiffies(100)),
> so the timer expires immediately, which does not match the behavior
> originally described.
> 
> In this commit we correct the timeout value in sk_reset_timer()
> 
> Fixes: 6787927475e5 ("tipc: buffer overflow handling in listener socket")
> Signed-off-by: Hoang Le 

Acked-by: Ying Xue 

> ---
>  net/tipc/socket.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 7545321c3440..17f8c523e33b 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -2852,7 +2852,8 @@ static void tipc_sk_retry_connect(struct sock *sk, 
> struct sk_buff_head *list)
>  
>   /* Try again later if dest link is congested */
>   if (tsk->cong_link_cnt) {
> - sk_reset_timer(sk, &sk->sk_timer, msecs_to_jiffies(100));
> + sk_reset_timer(sk, &sk->sk_timer,
> +jiffies + msecs_to_jiffies(100));
>   return;
>   }
>   /* Prepare SYN for retransmit */
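
The bug fixed above can be illustrated with a minimal userspace sketch. This is not kernel code: `ticks_t` and the helper names are hypothetical stand-ins for jiffies and for the absolute expiry value that sk_reset_timer() expects. Real kernel code compares jiffies with wrap-safe helpers; the plain comparison here is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel tick counter (jiffies). */
typedef uint64_t ticks_t;

/* A timer armed with an absolute expiry fires once now >= expires. */
static bool timer_expired(ticks_t now, ticks_t expires)
{
	return now >= expires;
}

/* Buggy arming: the bare interval is passed as if it were an absolute time. */
static ticks_t arm_buggy(ticks_t now, ticks_t interval)
{
	(void)now;
	return interval;
}

/* Fixed arming: the current time plus the interval. */
static ticks_t arm_fixed(ticks_t now, ticks_t interval)
{
	return now + interval;
}
```

On any system that has been up for longer than the interval, the value returned by arm_buggy() is already in the past, so the timer fires at once.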


___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Re: [tipc-discussion] [PATCH v2] tipc: check for null after calling kmemdup

2021-11-17 Thread Xue, Ying

Acked-by: Ying Xue 

-Original Message-
From: Tadeusz Struk  
Sent: Tuesday, November 16, 2021 12:02 AM
To: da...@davemloft.net
Cc: Tadeusz Struk ; Jon Maloy ; 
Xue, Ying ; Jakub Kicinski ; 
net...@vger.kernel.org; tipc-discussion@lists.sourceforge.net; 
linux-ker...@vger.kernel.org; sta...@vger.kernel.org; Dmitry Vyukov 

Subject: [PATCH v2] tipc: check for null after calling kmemdup

kmemdup can return a null pointer so need to check for it, otherwise the null 
key will be dereferenced later in tipc_crypto_key_xmit as can be seen in the 
trace [1].

Cc: Jon Maloy 
Cc: Ying Xue 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: net...@vger.kernel.org
Cc: tipc-discussion@lists.sourceforge.net
Cc: linux-ker...@vger.kernel.org
Cc: sta...@vger.kernel.org # 5.15, 5.14, 5.10

[1] 
https://syzkaller.appspot.com/bug?id=bca180abb29567b189efdbdb34cbf7ba851c2a58

Reported-by: Dmitry Vyukov 
Signed-off-by: Tadeusz Struk 
---
Changed in v2:
- use tipc_aead_free() to free all crypto tfm instances
  that might have been allocated before the fail.
---
 net/tipc/crypto.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/tipc/crypto.c b/net/tipc/crypto.c
index dc60c32bb70d..d293614d5fc6 100644
--- a/net/tipc/crypto.c
+++ b/net/tipc/crypto.c
@@ -597,6 +597,10 @@ static int tipc_aead_init(struct tipc_aead **aead, struct tipc_aead_key *ukey,
tmp->cloned = NULL;
tmp->authsize = TIPC_AES_GCM_TAG_SIZE;
tmp->key = kmemdup(ukey, tipc_aead_key_size(ukey), GFP_KERNEL);
+   if (!tmp->key) {
+   tipc_aead_free(&tmp->rcu);
+   return -ENOMEM;
+   }
memcpy(&tmp->salt, ukey->key + keylen, TIPC_AES_GCM_SALT_SIZE);
atomic_set(&tmp->users, 0);
atomic64_set(&tmp->seqno, 0);
--
2.33.1
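
The pattern in the patch above (duplicate a buffer, check for NULL, release what was already allocated) can be sketched in userspace as follows. All names here (`dup_bytes`, `key_holder`, and so on) are hypothetical; `dup_bytes` only stands in for kmemdup(), and the cleanup is far simpler than what tipc_aead_free() does.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for a copied key. */
struct key_holder {
	unsigned char *key;
	size_t len;
};

/* Userspace analogue of kmemdup(): copy len bytes, or return NULL. */
static void *dup_bytes(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

/* Returns 0 on success, -ENOMEM if any allocation fails. */
static int key_holder_init(struct key_holder **out, const void *key, size_t len)
{
	struct key_holder *h = calloc(1, sizeof(*h));

	if (!h)
		return -ENOMEM;
	h->key = dup_bytes(key, len);
	if (!h->key) {
		/* Undo the earlier allocation before reporting failure. */
		free(h);
		return -ENOMEM;
	}
	h->len = len;
	*out = h;
	return 0;
}

static void key_holder_free(struct key_holder *h)
{
	if (h) {
		free(h->key);
		free(h);
	}
}
```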




Re: [tipc-discussion] [PATCH net] tipc: only accept encrypted MSG_CRYPTO msgs

2021-11-14 Thread Xue, Ying

Thanks Xin! The patch looks good to me.

Acked-by: Ying Xue 

-Original Message-
From: Xin Long  
Sent: Saturday, November 13, 2021 3:23 AM
To: tipc-discussion@lists.sourceforge.net
Subject: [tipc-discussion] [PATCH net] tipc: only accept encrypted MSG_CRYPTO 
msgs

MSG_CRYPTO msgs are always encrypted when sent to other nodes for key
deployment. But if the receiving nodes do not verify that such a msg was
actually encrypted, one could craft a malicious MSG_CRYPTO msg to deploy
its own key without knowing the other nodes' keys.

This patch adds that check by testing TIPC_SKB_CB(skb)->decrypted and
discarding the packet if it was never decrypted.

Fixes: 1ef6f7c9390f ("tipc: add automatic session key exchange")
Signed-off-by: Xin Long 
---
 net/tipc/link.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 1b7a487c8841..09ae8448f394 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1298,8 +1298,11 @@ static bool tipc_data_input(struct tipc_link *l, struct sk_buff *skb,
return false;
 #ifdef CONFIG_TIPC_CRYPTO
case MSG_CRYPTO:
-   tipc_crypto_msg_rcv(l->net, skb);
-   return true;
+   if (TIPC_SKB_CB(skb)->decrypted) {
+   tipc_crypto_msg_rcv(l->net, skb);
+   return true;
+   }
+   fallthrough;
 #endif
default:
pr_warn("Dropping received illegal msg type\n");
--
2.27.0
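
The check introduced above can be sketched in isolation. The types and names below are hypothetical and much simpler than the real sk_buff handling; the sketch only shows that a key-deployment message is accepted solely when the receive path marked it as decrypted.

```c
#include <stdbool.h>

/* Hypothetical message types; not the TIPC wire values. */
enum msg_type { MSG_DATA, MSG_CRYPTO, MSG_UNKNOWN };

struct pkt {
	enum msg_type type;
	bool decrypted;   /* set by the receive path after successful decryption */
};

/* Returns true if the packet is accepted, false if it must be dropped. */
static bool accept_pkt(const struct pkt *p)
{
	switch (p->type) {
	case MSG_DATA:
		return true;
	case MSG_CRYPTO:
		/* A key message that arrived in clear text is treated as illegal. */
		return p->decrypted;
	default:
		return false;
	}
}
```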




Re: [tipc-discussion] [PATCH RESEND][next] tipc: Fix fall-through warnings for Clang

2021-04-21 Thread Xue, Ying

This patch looks good to me.

-Original Message-
From: Gustavo A. R. Silva  
Sent: Friday, March 5, 2021 5:25 PM
To: Jon Maloy ; Xue, Ying ; David S. 
Miller ; Jakub Kicinski 
Cc: net...@vger.kernel.org; tipc-discussion@lists.sourceforge.net; 
linux-ker...@vger.kernel.org; Gustavo A. R. Silva ; 
linux-harden...@vger.kernel.org
Subject: [PATCH RESEND][next] tipc: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by 
explicitly adding a break statement instead of letting the code fall through to 
the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva 
---
 net/tipc/link.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 115109259430..bcc426e16725 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -649,6 +649,7 @@ int tipc_link_fsm_evt(struct tipc_link *l, int evt)
break;
case LINK_FAILOVER_BEGIN_EVT:
l->state = LINK_FAILINGOVER;
+   break;
case LINK_FAILURE_EVT:
case LINK_RESET_EVT:
case LINK_ESTABLISH_EVT:
--
2.27.0




Re: [tipc-discussion] [net-next 08/16] tipc: refactor tipc_sendmsg() and tipc_lookup_anycast()

2021-01-11 Thread Xue, Ying

-   seq = &dest->addr.nameseq;
-   if (dest->addrtype == TIPC_ADDR_MCAST)
-   return tipc_sendmcast(sock, seq, m, dlen, timeout);
-
-   if (dest->addrtype == TIPC_SERVICE_ADDR) {
-   type = dest->addr.name.name.type;
-   inst = dest->addr.name.name.instance;
-   dnode = dest->addr.name.domain;
-   dport = tipc_nametbl_lookup_anycast(net, type, inst, &dnode);
-   if (unlikely(!dport && !dnode))
+   /* Determine destination */
+   if (atype == TIPC_SERVICE_RANGE) {

[Ying] As I understand it, we should compare "atype" with 
TIPC_ADDR_MCAST rather than TIPC_SERVICE_RANGE. Please confirm.

+   return tipc_sendmcast(sock, &ua->sr, m, dlen, timeout);
+   } else if (atype == TIPC_SERVICE_ADDR) {
+   skaddr.node = ua->lookup_node;
+   ua->scope = skaddr.node ? TIPC_NODE_SCOPE : TIPC_CLUSTER_SCOPE;
+   if (!tipc_nametbl_lookup_anycast(net, ua, &skaddr))
return -EHOSTUNREACH;




Re: [tipc-discussion] [net v2] tipc: add stricter control of reserved service types

2020-10-26 Thread Xue, Ying

Hi Jon,

I have an idea for how to block the vulnerability described below. 
Although it's not very elegant, it's quite simple:

As you mentioned below, the TIPC_TOP_SRV service is only published from kernel 
space, in tipc_topsrv_create_listener(). So we could probably change the code 
as below:

tipc_bind()
{
...
if (addr->addr.nameseq.type < TIPC_RESERVED_TYPES) {
res = -EACCES;
goto exit;
}
__tipc_bind();
...
}

int __tipc_bind()
{
if (addr->scope >= 0)
tipc_sk_publish(tsk, addr->scope, &addr->addr.nameseq);
else
tipc_sk_withdraw(tsk, -addr->scope, &addr->addr.nameseq);
}

tipc_topsrv_create_listener()
{
...
- rc = kernel_bind(lsock, (struct sockaddr *)&saddr, sizeof(saddr));
+ rc = __tipc_bind(lsock, (struct sockaddr *)&saddr, sizeof(saddr));
...
}

Thanks,
Ying

-Original Message-
From: jma...@redhat.com  
Sent: Sunday, October 25, 2020 3:16 AM
To: tipc-discussion@lists.sourceforge.net
Cc: tung.q.ngu...@dektech.com.au; hoang.h...@dektech.com.au; 
tuong.t.l...@dektech.com.au; jma...@redhat.com; ma...@donjonn.com; 
x...@redhat.com; Xue, Ying ; 
parthasarathy.bhuvara...@gmail.com
Subject: [net v2] tipc: add stricter control of reserved service types

From: Jon Maloy 

TIPC reserves 64 service types for current and future internal use.
Therefore, the bind() function is meant to block regular user sockets from 
being bound to these values, while it should let through such bindings from 
internal users.

However, since we at the design moment saw no way to distinguish between 
regular and internal users the filter function ended up with allowing all 
bindings of the reserved types which were really in use ([0,1]), and block all 
the rest ([2,63]).

This is risky, since a regular user may bind to the service type representing 
the topology server (TIPC_TOP_SRV == 1) or the one used for indicating 
neighboring node status (TIPC_CFG_SRV == 0), and wreak havoc for users of those 
services, i.e., practically all users.

The reality is however that TIPC_CFG_SRV never is bound through the
bind() function, since it doesn't represent a regular socket, and TIPC_TOP_SRV 
can easily be singled out, since it is published from kernel mode and is the 
very first binding performed when the system is starting.

We now introduce a 'privileged' mode for sockets, marking which of those are 
entitled to bind to reserved service type values. The only such socket we have 
so far is the topology server's listener socket, which is identified the way 
described above. All other bindings to reserved service types are rejected.

It should be noted that, although this is a change of the API semantics, there 
is no risk we will break any currently working applications by doing this. Any 
application trying to bind to the values in question would be badly broken from 
the outset, so there is no chance we would find any such applications in 
real-world production systems.

Signed-off-by: Jon Maloy 
---
 net/tipc/socket.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index e795a8a2955b..a0a144ff84fd 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1,7 +1,8 @@
 /*
  * net/tipc/socket.c: TIPC socket API
  *
- * Copyright (c) 2001-2007, 2012-2017, Ericsson AB
+ * Copyright (c) 2020, Red Hat Inc
+ * Copyright (c) 2001-2007, 2012-2019, Ericsson AB
  * Copyright (c) 2004-2008, 2010-2013, Wind River Systems
  * All rights reserved.
  *
@@ -127,6 +128,8 @@ struct tipc_sock {
bool expect_ack;
bool nodelay;
bool group_is_open;
+   bool privileged;
+   bool kernel;
 };
 
 static int tipc_sk_backlog_rcv(struct sock *sk, struct sk_buff *skb);
@@ -479,6 +482,7 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
tsk->max_pkt = MAX_PKT_DEFAULT;
tsk->maxnagle = 0;
tsk->nagle_start = NAGLE_START_INIT;
+   tsk->kernel = !!kern;
INIT_LIST_HEAD(&tsk->publications);
INIT_LIST_HEAD(&tsk->cong_links);
msg = &tsk->phdr;
@@ -665,6 +669,7 @@ static int tipc_bind(struct socket *sock, struct sockaddr 
*uaddr,
struct sockaddr_tipc *addr = (struct sockaddr_tipc *)uaddr;
struct tipc_sock *tsk = tipc_sk(sk);
int res = -EINVAL;
+   u32 stype, dnode;
 
lock_sock(sk);
if (unlikely(!uaddr_len)) {
@@ -691,13 +696,15 @@ static int tipc_bind(struct socket *sock, struct sockaddr 
*uaddr,
goto exit;
}
 
-   if ((addr->addr.nameseq.type < TIPC_RESERVED_TYPES) &&
-   (addr->addr.nameseq.type != TIPC_TOP_SRV) &&
-   (addr->addr.nameseq.type != TIPC_CFG_SRV)) {
+   stype = addr->addr.nameseq.type;
+   if (stype == TIPC_TOP_SRV && tsk->kernel &&
+   !tipc_nametbl_translate(sock_net(sk), stype, stype, &dnode))
+

Re: [tipc-discussion] [PATCH] tipc: fix shutdown() of connection oriented socket

2020-09-08 Thread Xue, Ying

-Original Message-
From: Tetsuo Handa  
Sent: Saturday, September 5, 2020 2:15 PM
To: Jon Maloy ; Xue, Ying ; 
Parthasarathy Bhuvaragan 
Cc: David S. Miller ; Jakub Kicinski ; 
net...@vger.kernel.org; tipc-discussion@lists.sourceforge.net; Tetsuo Handa 

Subject: [PATCH] tipc: fix shutdown() of connection oriented socket

I confirmed that the problem fixed by commit 2a63866c8b51a3f7 ("tipc: fix
shutdown() of connectionless socket") also applies to stream socket.

--
#include <sys/socket.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char *argv[])
{
int fds[2] = { -1, -1 };
socketpair(PF_TIPC, SOCK_STREAM /* or SOCK_DGRAM */, 0, fds);
if (fork() == 0)
_exit(read(fds[0], NULL, 1));
shutdown(fds[0], SHUT_RDWR); /* This must make read() return. */
wait(NULL); /* To be woken up by _exit(). */
return 0;
}
--

Since shutdown(SHUT_RDWR) should affect all processes sharing that socket, 
unconditionally setting sk->sk_shutdown to SHUTDOWN_MASK will be the right 
behavior.

Signed-off-by: Tetsuo Handa 

Acked-by: Ying Xue 

---
 net/tipc/socket.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index ebd280e767bd..11b27ddc75ba 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2771,10 +2771,7 @@ static int tipc_shutdown(struct socket *sock, int how)
 
trace_tipc_sk_shutdown(sk, NULL, TIPC_DUMP_ALL, " ");
__tipc_shutdown(sock, TIPC_CONN_SHUTDOWN);
-   if (tipc_sk_type_connectionless(sk))
-   sk->sk_shutdown = SHUTDOWN_MASK;
-   else
-   sk->sk_shutdown = SEND_SHUTDOWN;
+   sk->sk_shutdown = SHUTDOWN_MASK;
 
if (sk->sk_state == TIPC_DISCONNECTING) {
/* Discard any unreceived messages */
--
2.18.4




Re: [tipc-discussion] [RFC PATCH] tipc: define TIPC version 3 address types

2020-05-12 Thread Xue, Ying

Hi Jon,

Sorry for the late response.

Before I reply to your comments below, I want to discuss a more general 
question:

Since you posted the idea of using UUIDs to resolve the service name conflict 
issue, I have been wondering whether we could implement the Raft consensus 
algorithm (https://raft.github.io/) inside the TIPC module. In my opinion, 
each approach has its own advantages and disadvantages:

UUID: 
Advantages:
 - Its generation algorithm is straightforward, and the Linux kernel already 
provides interfaces to generate UUIDs.
Disadvantages:
 - Protocol backwards compatibility
 - UUID generation algorithms cannot 100% guarantee that UUIDs are never 
repeated, particularly in a distributed environment, although the probability 
of a repeated UUID is very small.

Raft:
Advantages:
 - Can 100% guarantee that service names don't conflict with each other in a 
distributed environment
 - No protocol backwards compatibility issue
Disadvantages:
 - Compared to the changes brought by UUID, the changes might not be very big, 
but they are quite complex, particularly when we try to implement the algorithm 
in kernel space.

I would love to hear your opinion.

Thanks,
Ying

-Original Message-
From: Jon Maloy [mailto:jma...@redhat.com] 
Sent: Tuesday, May 5, 2020 8:15 AM
To: tipc-discussion@lists.sourceforge.net
Cc: tung.q.ngu...@dektech.com.au; hoang.h...@dektech.com.au; 
tuong.t.l...@dektech.com.au; ma...@donjonn.com; x...@redhat.com; Xue, Ying; 
parthasarathy.bhuvara...@gmail.com
Subject: Re: [RFC PATCH] tipc: define TIPC version 3 address types

Hi all,
I was pondering a little more about this possible feature.
First of all, I realized that the following test

bool tipc_msg_validate(struct sk_buff **_skb)
{
     [...]
     if (unlikely(msg_version(hdr) != TIPC_VERSION))
     return false;
     [...]
}
makes it very hard to update the version number in a
backwards compatible way. Even discovery messages
will be rejected by v2 nodes, and we don't get around
that unless we do discovery with v2 messages, or send
out a duplicate set (v2 +v3) of discovery messages.
And, we can actually achieve exactly what we want
with just setting another capability bit.
So, we set bit #12 to mean "TIPC_EXTENDED", and also to
mean "all previous capabilities are valid if this bit is set,
no need to test for them".
That way, we can zero out these bits and start reusing them
for new capabilities when we need to.
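
A minimal sketch of how such a capability test could look, assuming bit #12 is the proposed "TIPC_EXTENDED" flag. The macro and helper names are hypothetical, and so is the split into a "legacy" test and a test for a bit that has been reused.

```c
#include <stdbool.h>
#include <stdint.h>

/* Proposed "TIPC_EXTENDED" bit; the macro name is an assumption. */
#define CAP_EXTENDED (1u << 12)

/* Legacy capability test: an extended peer implies every previous capability. */
static bool peer_has_legacy_cap(uint32_t peer_caps, uint32_t cap)
{
	if (peer_caps & CAP_EXTENDED)
		return true;
	return (peer_caps & cap) != 0;
}

/* Test for a reused bit: only meaningful when the extended bit is set. */
static bool peer_has_extended_cap(uint32_t peer_caps, uint32_t cap)
{
	return (peer_caps & CAP_EXTENDED) && (peer_caps & cap);
}
```

With this split, a low bit sent by a legacy peer keeps its old meaning, while the same bit sent together with bit #12 can carry a new one.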

AF_TIPC3 now becomes AF_TIPCE, tipc_addr becomes
tipce_addr etc.

The binding table needs to be updated the following way:

union publ_item {
     struct {
          __be32 type;
          __be32 lower;
          __be32 upper;
     } legacy;
     struct {
          u8 type[16];
          u8 lower[16];
          u8 upper[16];
     } extended;
};

struct publication {
     u8 extended;
     u8 scope;    /* This can only take values [0:3] */
     u8 spare[2];
     union publ_item publ;
     u8 node[16];
     u32 port;
     u32 key;
     struct list_head binding_node;
     struct list_head binding_sock;
     struct list_head local_publ;
     struct list_head all_publ;
     struct rcu_head rcu;
};

struct distr_item {
     union publ_item;
     __be32 port;
     __be32 key;
};

The NAME_DISTR protocol must be extended with a field
indicating if it contains legacy publication(s) or extended
publication(s).
'Extended' nodes receive separate bulks for legacy and
extended publications, since it is hard to mix them in the
same message.
Legacy nodes only receive legacy publications, so in this
case the distributor just send a bulk for those.

The topology subscriber must be updated in a similar
manner, but we can assume that the same socket cannot
issue two types of subscriptions and receive two types
of events; it has to be one or the other. This should
simplify the task somewhat.

User message header format needs to be changed for
Service Address (Port Name) messages:
   - Type occupies words [8:B], i.e. bytes [32:47]
   - Instance occupies words [C:F], i.e. bytes [48:63]

This is where it gets tricky. The 'header size' field is only 4
bits and counts 32-bit words. This means that the current
max header size that can be indicated is 60 bytes.
A simple way might be to just extend the field with one of
the three unused bits [16:18] in word 1 as msb. That would
be backwards compatible since those bits are currently 0,
and no special tricks are needed.
Regarding TIPC_MCAST_MSG we need yet another 16 bytes,
[64:79], if we want to preserve the current semantics on
[lower,upper]. However, I am highly uncertain whether that feature
is ever used or needed. We may be fine just keeping
one 'instance' field, just as in NAMED messages.
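
The header size arithmetic above can be checked with a small sketch. The helper names are hypothetical, and the way the extra bit is combined with the existing 4-bit field is only one possible reading of the proposal.

```c
#include <stdint.h>

/* Largest header size in bytes for a size field of `bits` bits
 * that counts 32-bit words.
 */
static unsigned max_hdr_bytes(unsigned bits)
{
	return ((1u << bits) - 1u) * 4u;
}

/* Header size in bytes when one extra bit is used as msb on top of
 * the existing 4-bit size field (assumed layout).
 */
static unsigned hdr_bytes(unsigned size_field, unsigned msb)
{
	return (((msb & 1u) << 4) | (size_field & 0xfu)) * 4u;
}
```

With 4 bits the limit is 15 words (60 bytes), which cannot cover a 16-byte instance field ending at byte 63. With 5 bits the limit becomes 31 words (124 bytes), and a legacy header still decodes the same way because the new msb is 0.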

The group cast protocol could be left for later, once we understand
the consequences better than now, but semanti

Re: [tipc-discussion] [RFC PATCH 0/2] tipc: patches for Nagle algorithm issues

2020-05-11 Thread Xue, Ying

Acked-by: Ying Xue 

-Original Message-
From: Tuong Lien [mailto:tuong.t.l...@dektech.com.au] 
Sent: Monday, May 4, 2020 7:28 PM
To: jma...@redhat.com; ma...@donjonn.com; Xue, Ying; 
tipc-discussion@lists.sourceforge.net
Cc: tipc-...@dektech.com.au
Subject: [RFC PATCH 0/2] tipc: patches for Nagle algorithm issues

Hi Jon, all,

Here are the patches for the Nagle issues that I mentioned in the last
meeting, please take a look and give me feedback.

Thanks a lot!

BR/Tuong

Tuong Lien (2):
  tipc: fix large latency in smart Nagle streaming
  tipc: add test for Nagle algorithm effectiveness

 net/tipc/msg.c|   3 --
 net/tipc/msg.h|  14 ++--
 net/tipc/socket.c | 101 ++
 3 files changed, 91 insertions(+), 27 deletions(-)

-- 
2.13.7




Re: [tipc-discussion] [RFC PATCH 1/2] tipc: fix large latency in smart Nagle streaming

2020-05-08 Thread Xue, Ying

@@ -2011,7 +2021,7 @@ static int tipc_recvstream(struct socket *sock, struct 
msghdr *m,
 
/* Send connection flow control advertisement when applicable */
tsk->rcv_unacked += tsk_inc(tsk, hlen + dlen);
-   if (ack || tsk->rcv_unacked >= tsk->rcv_win / TIPC_ACK_RATE)
+   if (tsk->rcv_unacked >= tsk->rcv_win / TIPC_ACK_RATE)
tipc_sk_send_ack(tsk);
 
Besides tipc_recvstream(), we also need to make the same change in 
tipc_recvmsg().
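
The threshold rule that remains after this change can be sketched as follows. The struct, the helper, and the value of `ACK_RATE` are assumptions for illustration; in particular, resetting the counter stands in for what sending the acknowledgment is assumed to do.

```c
#include <stdbool.h>

/* Assumed divisor; stands in for TIPC_ACK_RATE. */
#define ACK_RATE 4u

/* Hypothetical receive-side flow control state. */
struct rx_state {
	unsigned rcv_unacked;   /* data consumed since the last ack */
	unsigned rcv_win;       /* advertised receive window */
};

/* Account for consumed data; return true when an ack should be sent. */
static bool rx_consume(struct rx_state *rx, unsigned inc)
{
	rx->rcv_unacked += inc;
	if (rx->rcv_unacked >= rx->rcv_win / ACK_RATE) {
		rx->rcv_unacked = 0;   /* assumed effect of sending the ack */
		return true;
	}
	return false;
}
```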




Re: [tipc-discussion] [PATCH RFC 0/4] tipc: add some improvements for broadcast

2020-04-07 Thread Xue, Ying

Good work Tuong!

Acked-by: Ying Xue 

-Original Message-
From: Tuong Lien [mailto:tuong.t.l...@dektech.com.au] 
Sent: Saturday, March 28, 2020 12:03 PM
To: jma...@redhat.com; ma...@donjonn.com; Xue, Ying; 
tipc-discussion@lists.sourceforge.net
Cc: tipc-...@dektech.com.au
Subject: [PATCH RFC 0/4] tipc: add some improvements for broadcast

Hi Jon, all,

Please find the full series here,
+ For the 1st patch: it's really the last one I sent before, so you have
ack-ed already.
+ For the other ones, please help take a look. Also, I will send another
patch for iproute2/tipc which is user-space part of the last one in this
series i.e. broadcast rcv stats dumping.

Thanks a lot!

Tuong Lien (4):
  tipc: introduce Gap ACK blocks for broadcast link
  tipc: add back link trace events
  tipc: enable broadcast retrans via unicast
  tipc: add support for broadcast rcv stats dumping

 net/tipc/bcast.c   |  22 ++-
 net/tipc/bcast.h   |   9 +-
 net/tipc/link.c| 500 +++--
 net/tipc/link.h|  11 +-
 net/tipc/msg.c |   9 +-
 net/tipc/msg.h |  16 +-
 net/tipc/netlink.c |   2 +-
 net/tipc/node.c|  75 ++--
 net/tipc/sysctl.c  |   9 +-
 net/tipc/trace.h   |  17 +-
 10 files changed, 424 insertions(+), 246 deletions(-)

-- 
2.13.7




Re: [tipc-discussion] [PATCH RFC 4/4] tipc: add support for broadcast rcv stats dumping

2020-04-07 Thread Xue, Ying



Basically, the iproute2/tipc tool first asks the kernel to dump everything 
and then filters for specific links according to the command options. Please 
see the other patch on iproute2/tipc for more details:

[iproute2-next] tipc: enable printing of broadcast rcv link stats

So, for this patch, we just introduce a flag that lets the user dump the 
broadcast-receiver link stats (in addition to the traditional link stats) when 
needed. That is the 'TIPC_NL_LINK_GET'/'TIPC_NLA_LINK_BROADCAST' flag 
mentioned in the commit message. In iproute2/tipc, it will be:

> + /* Set the flag to dump all bc links */
> + attrs = mnl_attr_nest_start(nlh, TIPC_NLA_LINK);
> + mnl_attr_put(nlh, TIPC_NLA_LINK_BROADCAST, 0, NULL);
> + mnl_attr_nest_end(nlh, attrs);

===
The reason I asked the question is that "type" is still hard-coded in the 
following function, which makes it difficult to understand what "type == 2" 
means:
 
/* Caller should hold node lock  */
 static int __tipc_nl_add_node_links(struct net *net, struct tipc_nl_msg *msg,
-   struct tipc_node *node, u32 *prev_link)
+   struct tipc_node *node, u32 *prev_link,
+   u32 type)
 {
u32 i;
int err;
@@ -2536,6 +2553,14 @@ static int __tipc_nl_add_node_links(struct net *net, 
struct tipc_nl_msg *msg,
if (err)
return err;
}
+
+   if (type == 2) {
+   *prev_link = 3;
+   err = tipc_nl_add_bc_link(net, msg, node->bc_entry.link);
+   if (err)
+   return err;
+   }
+
*prev_link = 0;
===
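
One way to replace the magic number questioned above is a pair of named constants. This is only a sketch: the names are hypothetical, the value 2 is taken from the diff, and the value 1 for the other case is an assumption.

```c
/* Hypothetical names for the two dump modes discussed in this thread. */
enum link_dump_type {
	LINK_DUMP_UC_AND_BC = 1,   /* assumed value: broadcast-link and unicast links */
	LINK_DUMP_BC_RCV    = 2,   /* from the diff: broadcast-receiver links */
};

/* Returns nonzero when broadcast-receiver links should be dumped as well. */
static int dump_includes_bc_rcv(unsigned type)
{
	return type == LINK_DUMP_BC_RCV;
}
```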

Thanks,
Ying

-----Original Message-
From: Xue, Ying  
Sent: Monday, April 6, 2020 1:45 PM
To: Tuong Tong Lien ; jma...@redhat.com; 
ma...@donjonn.com; tipc-discussion@lists.sourceforge.net
Cc: tipc-dek 
Subject: RE: [PATCH RFC 4/4] tipc: add support for broadcast rcv stats dumping

Just a minor comment:

Please define macros for the cases:
1. Dump broadcast-link & unicast links
2. Dump broadcast-receiver links

Thanks,
Ying

-Original Message-
From: Tuong Lien [mailto:tuong.t.l...@dektech.com.au] 
Sent: Saturday, March 28, 2020 12:03 PM
To: jma...@redhat.com; ma...@donjonn.com; Xue, Ying; 
tipc-discussion@lists.sourceforge.net
Cc: tipc-...@dektech.com.au
Subject: [PATCH RFC 4/4] tipc: add support for broadcast rcv stats dumping

This commit enables dumping the statistics of a broadcast-receiver link
like the traditional 'broadcast-link' one (which is for broadcast-
sender). The link dumping can be triggered via netlink (e.g. the
iproute2/tipc tool) by the link flag - 'TIPC_NLA_LINK_BROADCAST' as the
indicator.

The name of a broadcast-receiver link of a specific peer will be in the
format: 'broadcast-link:'.

For example:

Link 
  Window:50 packets
  RX packets:7841 fragments:2408/440 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:0 defs:124 dups:0
  TX naks:21 acks:0 retrans:0
  Congestion link:0  Send queue max:0 avg:0

In addition, the broadcast-receiver link statistics can be reset in the
usual way via netlink by specifying that link name in command.

Note: the 'tipc_link_name_ext()' is removed because the link name can
now be retrieved simply via the 'l->name'.

Signed-off-by: Tuong Lien 
---
 net/tipc/bcast.c   |  6 ++---
 net/tipc/bcast.h   |  5 +++--
 net/tipc/link.c| 65 +++---
 net/tipc/link.h|  3 +--
 net/tipc/msg.c |  9 
 net/tipc/msg.h |  2 +-
 net/tipc/netlink.c |  2 +-
 net/tipc/node.c| 63 +---
 net/tipc/trace.h   |  4 ++--
 9 files changed, 103 insertions(+), 56 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 50a16f8bebd9..383f87bc1061 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -563,10 +563,8 @@ void tipc_bcast_remove_peer(struct net *net, struct 
tipc_link *rcv_l)
tipc_sk_rcv(net, inputq);
 }
 
-int tipc_bclink_reset_stats(struct net *net)
+int tipc_bclink_reset_stats(struct net *net, struct tipc_link *l)
 {
-   struct tipc_link *l = tipc_bc_sndlink(net);
-
if (!l)
return -ENOPROTOOPT;
 
@@ -694,7 +692,7 @@ int tipc_bcast_init(struct net *net)
tn->bcbase = bb;
spin_lock_init(&tipc_net(net)->bclock);
 
-   if (!tipc_link_bc_create(net, 0, 0,
+   if (!tipc_link_bc_create(net, 0, 0, NULL,
 FB_MTU,
 BCLINK_WIN_DEFAULT,
 BCLINK_WIN_DEFAULT,
diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
index 97d3cf9d3e4d..4240c95188b1 100644
--- a/net/tipc/bcast.h
+++ b/net/tipc/bcast.h
@@ -96,9 +96,10 @@ void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l,
 int tipc_bca

Re: [tipc-discussion] [PATCH RFC 3/4] tipc: enable broadcast retrans via unicast

2020-04-06 Thread Xue, Ying

Hi Tuong,

Thank you for your clear explanations below.

The reason I asked about your test environment is that the more cores the CPU 
has, the more queues the NIC has, and the more nodes there are in one TIPC 
cluster, the more packets will arrive out of order and the more retransmission 
requests will happen. If we had such a test environment to validate broadcast 
throughput, it would help us clearly identify whether our current proposal is 
good or bad. If good, it would also tell us by what percentage broadcast 
throughput is improved. In addition, under an RT_PREEMPT kernel, packets are 
more easily reordered because hard-IRQ and soft-IRQ handling is threaded. As a 
consequence, retransmission requests become a normal occurrence. By contrast, 
if we only validate with two nodes in a virtual environment, it's a bit hard 
to measure whether the bcl flow control algorithms are good or bad.

Anyway, I am still happy to see this patch merged upstream. But it would be 
wonderful if the switch option could be eliminated one day.

Thanks,
Ying

-Original Message-
From: Tuong Tong Lien [mailto:tuong.t.l...@dektech.com.au] 
Sent: Monday, April 6, 2020 4:54 PM
To: Xue, Ying; jma...@redhat.com; ma...@donjonn.com; 
tipc-discussion@lists.sourceforge.net
Cc: tipc-dek
Subject: RE: [PATCH RFC 3/4] tipc: enable broadcast retrans via unicast

Hi Ying,

No problem, I'm using outlook too...
Please see my answers correspondingly:
1. When you mention "this proposal", do you mean the single patch or the whole 
series since these features are actually separated and not dependent 
together...?
Anyway, there have been some issues with the lab here, so I just tested this 
new feature on KVM/QEMU nodes using the virtio_net driver, with 4 vCPUs and 
only one TX/RX queue enabled. Also, the real-time kernel is not patched yet...
If you have a better environment, may I ask you to help verify this?

Anyhow, if I understood your concerns from the last meeting correctly, they 
were mainly about the amount of packet retransmissions, which could panic the 
NIC or kernel, so the approach would not really be scalable? If so, in theory 
it should not be a problem, since we already have the following mechanisms to 
control it:
- Link window (e.g. max 50 outstanding/retransmitted packets);
- Retransmission restricting timer on individual packets (e.g. if a new 
retransmission request comes within 10ms, it will be ignored...);
- The priority queue for packet retransmissions (which is unlikely to be 
congested);
Or do you have any other concerns? If so, please clarify.
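
The second mechanism in the list above, the per-packet retransmission restricting timer, can be sketched as follows. The struct and helper are hypothetical, and the 10 ms interval is only the example value from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Example interval from the discussion; not necessarily the kernel value. */
#define RETRANS_MIN_GAP_MS 10u

/* Hypothetical per-packet state. */
struct pkt_state {
	uint64_t next_retr_ms;   /* earliest time the next retransmission is allowed */
};

/* Returns true if the packet may be retransmitted now; a request that
 * arrives inside the restricted interval is ignored.
 */
static bool may_retransmit(struct pkt_state *p, uint64_t now_ms)
{
	if (now_ms < p->next_retr_ms)
		return false;
	p->next_retr_ms = now_ms + RETRANS_MIN_GAP_MS;
	return true;
}
```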

2. Yes, in the commit it has mentioned about the "bandwidth limit on broadcast" 
but it can be invisible to user. One obvious thing is probably through 
broadcast statistics (so there is a need for the other patch for the broadcast 
rcv link stats) that users can see the sender trying to make a lot of 
(re-)transmissions, but it doesn't really work, the receiver gets only a few...
Ok, I will make this clear by repeating some performance tests.

3. Hmm, this totally was my mistake... I removed it when merging/separating the 
patches for this series ☹. In a premature patch, it was:

@@ -2425,7 +2426,7 @@ int tipc_link_bc_ack_rcv(struct tipc_link *r, u16 acked, 
u16 gap,
return 0;

trace_tipc_link_bc_ack(r, r->acked, acked, &l->transmq);
-   tipc_link_advance_transmq(l, r, acked, gap, ga, xmitq, , );
+   tipc_link_advance_transmq(l, r, acked, gap, ga, retrq, , );

tipc_link_advance_backlog(l, xmitq);
if (unlikely(!skb_queue_empty(&l->wakeupq)))

Thanks a lot for your finding, I will update this to the series!

BR/Tuong

-Original Message-
From: Xue, Ying  
Sent: Monday, April 6, 2020 1:20 PM
To: Tuong Tong Lien ; jma...@redhat.com; 
ma...@donjonn.com; tipc-discussion@lists.sourceforge.net
Cc: tipc-dek 
Subject: RE: [PATCH RFC 3/4] tipc: enable broadcast retrans via unicast

Hi Tuong,

Sorry, I have to use the Outlook client to reply to your email, which will 
make the formatting a bit messy.

Please see my comments below:

==
[Ying] 1. Did you ever conduct comprehensive verification of this proposal? 
What kind of test environment did you use? For example, how many physical TIPC 
nodes were involved in your testing? Did the NICs used support the multiqueue 
feature? How many cores were there on each physical TIPC machine?

In addition, if possible, I suggest you enable an RT_PREEMPT kernel to measure 
what throughput results we would get.
==

In some environment, broadcast traffic is suppressed at high rate (i.e.
a kind of bandwidth limit setting). When it is applied, TIPC broadcast
can still run successfully. However, when it comes to a high load, some
packets will be dropped first and TIPC tries to retransmit them but the
packet retransmission is intentionally broadcast too, so making things
worse and not helpful at all.

This commit enables the

Re: [tipc-discussion] [PATCH RFC 1/4] tipc: introduce Gap ACK blocks for broadcast link

2020-04-06 Thread Xue, Ying



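       31                       16 15                        0
      +-------------+-------------+-------------+-------------+
      |  bgack_cnt  |  ugack_cnt  |            len            |
      +-------------+-------------+-------------+-------------+  -
      |            gap            |            ack            |   |
      +-------------+-------------+-------------+-------------+    > bc gacks
      :                           :                           :   |
      +-------------+-------------+-------------+-------------+  -
      |            gap            |            ack            |   |
      +-------------+-------------+-------------+-------------+    > uc gacks
      :                           :                           :   |
      +-------------+-------------+-------------+-------------+  -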

which is "automatically" backward-compatible.

===
[Ying] In my opinion, this patch will cause the following 
backward-compatibility issue:

1) On a TIPC node with the patch:
When sending a 'PROTOCOL/STATE_MSG' message, its 'Gap ACK blocks' data field 
contains only bcl gap ack blocks and no unicast link gap ack blocks.
2) On a TIPC node without the patch:
Upon receiving the message sent by the node in case 1), this node will assume 
that the 'Gap ACK blocks' data field holds unicast link gap ack blocks rather 
than broadcast link gap ack blocks.

[Tuong]: As you can see in the figure above, we have two different 
"b/ugack_cnt" fields which give the number of broadcast/unicast gap ack 
blocks in the message. The "ugack_cnt" is identical to the "gack_cnt" in 
the old version (without the patch), i.e. it still indicates the number of 
unicast gap ack blocks, whereas the "bgack_cnt" was a reserved field.
So, in your scenario, the sending side will send the message with 
"ugack_cnt" = 0, which is fully compatible with the old version: the 
receiving side will see no unicast gap ack blocks and simply ignore the 
broadcast gap ack blocks (which it does not know about). There is also a 
sanity check on the length in the old code that will ignore such a gap ack 
block report anyway. So, we have no problem at all. That is why I've declared 
it backward compatible automatically.
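
The field overlap described above can be sketched in user space. This is only an illustration, assuming the two header layouts shown in the struct diff of this thread; the struct names and the helper old_node_unicast_cnt() are hypothetical and plain fixed-width types stand in for the kernel's __be16/u8:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header as a pre-patch node understands it (field names from the diff). */
struct old_gack_hdr {
	uint16_t len;		/* __be16 in the kernel */
	uint8_t  gack_cnt;	/* unicast Gap ACK block count */
	uint8_t  reserved;
};

/* Header as a patched node builds it. */
struct new_gack_hdr {
	uint16_t len;
	uint8_t  ugack_cnt;	/* occupies the same byte as the old gack_cnt */
	uint8_t  bgack_cnt;	/* reuses the old reserved byte */
};

/* Hypothetical helper: unicast block count that a pre-patch receiver
 * would read from a header built by a patched sender.
 */
static unsigned int old_node_unicast_cnt(const struct new_gack_hdr *sent)
{
	struct old_gack_hdr seen;

	assert(sizeof(seen) == sizeof(*sent));
	memcpy(&seen, sent, sizeof(seen));
	return seen.gack_cnt;
}
```

With ugack_cnt = 0 and bgack_cnt = 3, the old-layout view reports zero unicast blocks, which matches the scenario discussed above.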

>>[Ying]: Thanks for your clarification. Yes, you are right. Now it's really 
>>compatible between old and new versions.

So I wonder whether the backward-compatibility issue would disappear and 
everything would become much easier if we used LINK_PROTOCOL to carry only 
unicast gap ack blocks and BCAST_PROTOCOL to convey broadcast gap ack blocks.
In other words, we would not need to enlarge the current gap ack block space 
or change the current code for unicast gap ack blocks. Instead, we would just 
need to add support for broadcast gap ack blocks through BCAST_PROTOCOL rather 
than LINK_PROTOCOL. 

[Tuong]: The BCAST_PROTOCOL is currently only used for broadcast initialization 
or synchronization when a new peer joins; the old mechanism of broadcast NACKs 
is deprecated. I suppose that using the LINK_PROTOCOL is much more convenient, 
since the traditional ack/gap reports for the broadcast link are also made via 
that message, so we don't need to create a new code flow to handle the gap/ack 
blocks.
Actually, the change in the current code for unicast gap ack blocks only 
optimizes the code, e.g. by removing some old functions; it has no impact on 
functionality.

>>[Ying]: Sorry, I forgot this commit: 
>>https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=02d11ca20091fcef904f05defda80c53e5b4e793.
It made broadcast NACKs be delivered through link state messages. 

Okay, both unicast link and bcl gap ack blocks can be transferred in the same 
link state message. 

>>[Ying]: By the way, I have another minor comment:

As "bgack_cnt" is defined after "ugack_cnt" in struct tipc_gap_ack_blks, 
please reverse their order in this struct description section.

/* struct tipc_gap_ack_blks
  * @len: actual length of the record
- * @gack_cnt: number of Gap ACK blocks in the record
+ * @bgack_cnt: number of Gap ACK blocks for broadcast in the record
+ * @ugack_cnt: number of Gap ACK blocks for unicast (following the broadcast
+ * ones)
+ * @start_index: starting index for "valid" broadcast Gap ACK blocks
  * @gacks: array of Gap ACK blocks
  */
 struct tipc_gap_ack_blks {
__be16 len;
-   u8 gack_cnt;
-   u8 reserved;
+   union {
+   u8 ugack_cnt;
+   u8 start_index;
+   };
+   u8 bgack_cnt;
struct tipc_gap_ack gacks[];
 };





Re: [tipc-discussion] [PATCH RFC 3/4] tipc: enable broadcast retrans via unicast

2020-04-06 Thread Xue, Ying

Hi Tuong,

Sorry, I have to use the Outlook client to reply to your email, which will make 
the formatting a bit messy. 

Please see my comments below:

==
[Ying] 1. Have you conducted comprehensive verification of this proposal? 
What kind of test environment did you use? For example, how many physical 
TIPC nodes were involved in your testing? Did the NICs used during your 
testing support the multiqueue feature? How many cores were there on each 
physical TIPC machine? 

In addition, if possible, I suggest you try enabling an RT_PREEMPT kernel 
to measure what throughput results we would get. 
==

In some environments, broadcast traffic is suppressed at high rates (i.e.
by some kind of bandwidth limit setting). With such a limit applied, TIPC
broadcast can still run successfully. However, under high load, some
packets will be dropped first and TIPC tries to retransmit them, but the
packet retransmission is intentionally broadcast too, which makes things
worse and does not help at all.

This commit enables broadcast retransmission via unicast, which only
retransmits packets to the specific peer that has actually reported a gap,
i.e. it does not broadcast to all nodes in the cluster. This prevents the
retransmissions from being suppressed, reduces the overhead that
duplicates cause on the other peers, and so improves the overall TIPC
broadcast performance.

Note: the functionality can be turned on/off via the sysctl file:

echo 1 > /proc/sys/net/tipc/bc_retruni
echo 0 > /proc/sys/net/tipc/bc_retruni

Default is '0', i.e. the broadcast retransmission still works as usual.
==
[Ying] 2. Actually I had a similar idea before, so I also think broadcast 
performance might be significantly improved by this proposal. But as TIPC 
developers, we should explicitly tell users under what conditions they should 
enable this option and under what conditions they should disable it; 
otherwise, users have no idea at all when to turn it on or off. 

So, please give more performance data obtained under different test conditions. 
If this patch gives broadcast performance a clear benefit under every test 
condition, ideally we should remove this option completely. Otherwise, at least 
we can tell users when to enable it.
==

Signed-off-by: Tuong Lien 

 int tipc_link_bc_ack_rcv(struct tipc_link *r, u16 acked, u16 gap,
 struct tipc_gap_ack_blks *ga,
-struct sk_buff_head *xmitq)
+struct sk_buff_head *xmitq,
+struct sk_buff_head *retrq)
 {
struct tipc_link *l = r->bc_sndlink;
bool unused = false;

==
3. [Ying] Sorry, I am a bit confused. A new "retrq" parameter is 
introduced, but I could not find where it is used in this function. 
Can you please explain how the new parameter works?
==

Thanks,
Ying

@@ -2460,7 +2461,8 @@ int tipc_link_bc_nack_rcv(struct tipc_link *l, struct 
sk_buff *skb,
return 0;
 
if (dnode == tipc_own_addr(l->net)) {
-   rc = tipc_link_bc_ack_rcv(l, acked, to - acked, NULL, xmitq);
+   rc = tipc_link_bc_ack_rcv(l, acked, to - acked, NULL, xmitq,
+ xmitq);
l->stats.recv_nacks++;
return rc;
}
diff --git a/net/tipc/link.h b/net/tipc/link.h
index 0a0fa7350722..4d0768cf91d5 100644
--- a/net/tipc/link.h
+++ b/net/tipc/link.h
@@ -147,7 +147,8 @@ u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, 
struct tipc_link *l,
  struct tipc_msg *hdr, bool uc);
 int tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, u16 gap,
 struct tipc_gap_ack_blks *ga,
-struct sk_buff_head *xmitq);
+struct sk_buff_head *xmitq,
+struct sk_buff_head *retrq);
 void tipc_link_build_bc_sync_msg(struct tipc_link *l,
 struct sk_buff_head *xmitq);
 void tipc_link_bc_init_rcv(struct tipc_link *l, struct tipc_msg *hdr);
diff --git a/net/tipc/node.c b/net/tipc/node.c
index eb6b62de81a7..917ad3920fac 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1771,7 +1771,7 @@ static void tipc_node_bc_sync_rcv(struct tipc_node *n, 
struct tipc_msg *hdr,
struct tipc_link *ucl;
int rc;
 
-   rc = tipc_bcast_sync_rcv(n->net, n->bc_entry.link, hdr);
+   rc = tipc_bcast_sync_rcv(n->net, n->bc_entry.link, hdr, xmitq);
 
if (rc & TIPC_LINK_DOWN_EVT) {
tipc_node_reset_links(n);
diff --git a/net/tipc/sysctl.c b/net/tipc/sysctl.c
index 58ab3d6dcdce..97a6264a2993 100644
--- a/net/tipc/sysctl.c
+++ b/net/tipc/sysctl.c
@@ -36,7 +36,7 @@
 #include "core.h"
 #include "trace.h"
 #include "crypto.h"
-
+#include "bcast.h"
 #include 
 
 static struct ctl_table_header *tipc_ctl_hdr;
@@ -75,6 +75,13 @@ static struct ctl_table tipc_table[] = {
.extra1 =

Re: [tipc-discussion] [PATCH RFC 1/4] tipc: introduce Gap ACK blocks for broadcast link

2020-04-06 Thread Xue, Ying

Hi Tuong,

Please see my comments inline:

As achieved through commit 9195948fbf34 ("tipc: improve TIPC throughput
by Gap ACK blocks"), we apply the same mechanism for the broadcast link
as well. The 'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will
consist of two parts built for both the broadcast and unicast types:

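       31                       16 15                        0
      +-------------+-------------+-------------+-------------+
      |  bgack_cnt  |  ugack_cnt  |            len            |
      +-------------+-------------+-------------+-------------+  -
      |            gap            |            ack            |   |
      +-------------+-------------+-------------+-------------+    > bc gacks
      :                           :                           :   |
      +-------------+-------------+-------------+-------------+  -
      |            gap            |            ack            |   |
      +-------------+-------------+-------------+-------------+    > uc gacks
      :                           :                           :   |
      +-------------+-------------+-------------+-------------+  -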

which is "automatically" backward-compatible.

===
[Ying] In my opinion, this patch will cause the following 
backward-compatibility issue:

1) On a TIPC node with the patch:
When sending a 'PROTOCOL/STATE_MSG' message, its 'Gap ACK blocks' data field 
contains only bcl gap ack blocks and no unicast link gap ack blocks.
2) On a TIPC node without the patch:
Upon receiving the message sent by the node in case 1), this node will assume 
that the 'Gap ACK blocks' data field holds unicast link gap ack blocks rather 
than broadcast link gap ack blocks.

So I wonder whether the backward-compatibility issue would disappear and 
everything would become much easier if we used LINK_PROTOCOL to carry only 
unicast gap ack blocks and BCAST_PROTOCOL to convey broadcast gap ack blocks.
In other words, we would not need to enlarge the current gap ack block space 
or change the current code for unicast gap ack blocks. Instead, we would just 
need to add support for broadcast gap ack blocks through BCAST_PROTOCOL rather 
than LINK_PROTOCOL. 
===

Thanks,
Ying

We also increase the max number of Gap ACK blocks to 128, allowing up to
64 blocks per type (total buffer size = 516 bytes).
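
The 516-byte figure can be checked with a small user-space sketch. It assumes a 4-byte record header and two 16-bit fields per block, as the layout figure in this message suggests; the struct names, the field order inside a block, and gap_ack_blks_size() are illustrative only and are not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GAP_ACK_BLKS 128	/* maximum stated in the commit message */

struct gap_ack {		/* one block: two 16-bit fields */
	uint16_t ack;
	uint16_t gap;
};

struct gap_ack_blks {		/* record header followed by the blocks */
	uint16_t len;
	uint8_t  ugack_cnt;
	uint8_t  bgack_cnt;
	struct gap_ack gacks[];
};

static_assert(sizeof(struct gap_ack) == 4, "one block is expected to be 4 bytes");

/* Size in bytes of a record carrying n blocks. */
static size_t gap_ack_blks_size(unsigned int n)
{
	return sizeof(struct gap_ack_blks) + n * sizeof(struct gap_ack);
}
```

Under these assumptions, 4 header bytes plus 128 blocks of 4 bytes each give the 516 bytes quoted above.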

Besides, the 'tipc_link_advance_transmq()' function is refactored which
is applicable for both the unicast and broadcast cases now, so some old
functions can be removed and the code is optimized.

With the patch, TIPC broadcast is more robust against packet loss,
disorder, latency, etc. in the underlying network, and its performance is
boosted significantly.
For example, an experiment with a 5% packet loss rate gives:

$ time tipc-pipe --mc --rdm --data_size 123 --data_num 150
real    0m 42.46s
user    0m 1.16s
sys     0m 17.67s

Without the patch:

$ time tipc-pipe --mc --rdm --data_size 123 --data_num 150
real    8m 27.94s
user    0m 0.55s
sys     0m 2.38s

Acked-by: Jon Maloy 
Signed-off-by: Tuong Lien 
---
 net/tipc/bcast.c |   9 +-
 net/tipc/link.c  | 438 +--
 net/tipc/link.h  |   7 +-
 net/tipc/msg.h   |  14 +-
 net/tipc/node.c  |  10 +-
 5 files changed, 293 insertions(+), 185 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 4c20be08b9c4..3ce690a96ee9 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -474,7 +474,7 @@ void tipc_bcast_ack_rcv(struct net *net, struct tipc_link 
*l,
__skb_queue_head_init();
 
tipc_bcast_lock(net);
-   tipc_link_bc_ack_rcv(l, acked, );
+   tipc_link_bc_ack_rcv(l, acked, 0, NULL, );
tipc_bcast_unlock(net);
 
tipc_bcbase_xmit(net, );
@@ -492,6 +492,7 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link 
*l,
struct tipc_msg *hdr)
 {
struct sk_buff_head *inputq = _bc_base(net)->inputq;
+   struct tipc_gap_ack_blks *ga;
struct sk_buff_head xmitq;
int rc = 0;
 
@@ -501,8 +502,10 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link 
*l,
if (msg_type(hdr) != STATE_MSG) {
tipc_link_bc_init_rcv(l, hdr);
} else if (!msg_bc_ack_invalid(hdr)) {
-   tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), );
-   rc = tipc_link_bc_sync_rcv(l, hdr, );
+   tipc_get_gap_ack_blks(, l, hdr, false);
+   rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr),
+ msg_bc_gap(hdr), ga, );
+   rc |= tipc_link_bc_sync_rcv(l, hdr, );
}
tipc_bcast_unlock(net);
 
diff --git a/net/tipc/link.c b/net/tipc/link.c
index 467c53a1fb5c..1b60ba665504 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -188,6 +188,8 @@ struct tipc_link {
/* Broadcast */
u16 ackers;
u16 acked;
+   u16 last_gap;
+   struct tipc_gap_ack_blks *last_ga;
struct tipc_link *bc_rcvlink;
struct tipc_link *bc_sndlink;
u8 nack_state;
@@ -249,11

Re: [tipc-discussion] [PATCH RFC 4/4] tipc: add support for broadcast rcv stats dumping

2020-04-06 Thread Xue, Ying

Just a minor comment:

Please define macros for the cases:
1. Dump broadcast-link & unicast links
2. Dump broadcast-receiver links

Thanks,
Ying

-Original Message-
From: Tuong Lien [mailto:tuong.t.l...@dektech.com.au] 
Sent: Saturday, March 28, 2020 12:03 PM
To: jma...@redhat.com; ma...@donjonn.com; Xue, Ying; 
tipc-discussion@lists.sourceforge.net
Cc: tipc-...@dektech.com.au
Subject: [PATCH RFC 4/4] tipc: add support for broadcast rcv stats dumping

This commit enables dumping the statistics of a broadcast-receiver link
like the traditional 'broadcast-link' one (which is for broadcast-
sender). The link dumping can be triggered via netlink (e.g. the
iproute2/tipc tool) by the link flag - 'TIPC_NLA_LINK_BROADCAST' as the
indicator.

The name of a broadcast-receiver link of a specific peer will be in the
format: 'broadcast-link:<peer-id>'.

For example:

Link 
  Window:50 packets
  RX packets:7841 fragments:2408/440 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:0 defs:124 dups:0
  TX naks:21 acks:0 retrans:0
  Congestion link:0  Send queue max:0 avg:0

In addition, the broadcast-receiver link statistics can be reset in the
usual way via netlink by specifying that link name in command.

Note: the 'tipc_link_name_ext()' is removed because the link name can
now be retrieved simply via the 'l->name'.

Signed-off-by: Tuong Lien 
---
 net/tipc/bcast.c   |  6 ++---
 net/tipc/bcast.h   |  5 +++--
 net/tipc/link.c| 65 +++---
 net/tipc/link.h|  3 +--
 net/tipc/msg.c |  9 
 net/tipc/msg.h |  2 +-
 net/tipc/netlink.c |  2 +-
 net/tipc/node.c| 63 +---
 net/tipc/trace.h   |  4 ++--
 9 files changed, 103 insertions(+), 56 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 50a16f8bebd9..383f87bc1061 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -563,10 +563,8 @@ void tipc_bcast_remove_peer(struct net *net, struct 
tipc_link *rcv_l)
tipc_sk_rcv(net, inputq);
 }
 
-int tipc_bclink_reset_stats(struct net *net)
+int tipc_bclink_reset_stats(struct net *net, struct tipc_link *l)
 {
-   struct tipc_link *l = tipc_bc_sndlink(net);
-
if (!l)
return -ENOPROTOOPT;
 
@@ -694,7 +692,7 @@ int tipc_bcast_init(struct net *net)
tn->bcbase = bb;
spin_lock_init(_net(net)->bclock);
 
-   if (!tipc_link_bc_create(net, 0, 0,
+   if (!tipc_link_bc_create(net, 0, 0, NULL,
 FB_MTU,
 BCLINK_WIN_DEFAULT,
 BCLINK_WIN_DEFAULT,
diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
index 97d3cf9d3e4d..4240c95188b1 100644
--- a/net/tipc/bcast.h
+++ b/net/tipc/bcast.h
@@ -96,9 +96,10 @@ void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l,
 int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l,
struct tipc_msg *hdr,
struct sk_buff_head *retrq);
-int tipc_nl_add_bc_link(struct net *net, struct tipc_nl_msg *msg);
+int tipc_nl_add_bc_link(struct net *net, struct tipc_nl_msg *msg,
+   struct tipc_link *bcl);
 int tipc_nl_bc_link_set(struct net *net, struct nlattr *attrs[]);
-int tipc_bclink_reset_stats(struct net *net);
+int tipc_bclink_reset_stats(struct net *net, struct tipc_link *l);
 
 u32 tipc_bcast_get_broadcast_mode(struct net *net);
 u32 tipc_bcast_get_broadcast_ratio(struct net *net);
diff --git a/net/tipc/link.c b/net/tipc/link.c
index 3071e46f029a..808d3a76c27f 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -539,7 +539,7 @@ bool tipc_link_create(struct net *net, char *if_name, int 
bearer_id,
  *
  * Returns true if link was created, otherwise false
  */
-bool tipc_link_bc_create(struct net *net, u32 ownnode, u32 peer,
+bool tipc_link_bc_create(struct net *net, u32 ownnode, u32 peer, u8 *peer_id,
 int mtu, u32 min_win, u32 max_win, u16 peer_caps,
 struct sk_buff_head *inputq,
 struct sk_buff_head *namedq,
@@ -554,7 +554,18 @@ bool tipc_link_bc_create(struct net *net, u32 ownnode, u32 
peer,
return false;
 
l = *link;
-   strcpy(l->name, tipc_bclink_name);
+   if (peer_id) {
+   char peer_str[NODE_ID_STR_LEN] = {0,};
+
+   tipc_nodeid2string(peer_str, peer_id);
+   if (strlen(peer_str) > 16)
+   sprintf(peer_str, "%x", peer);
+   /* Broadcast receiver link name: "broadcast-link:" */
+   snprintf(l->name, sizeof(l->name), "%s:%s", tipc_bclink_name,
+peer_str);
+   } else {
+   strcpy(l->name, tipc_bclink_name);
+   }
trace_tipc_link_reset(l, TIPC_DUMP_ALL, "bclink created!");
tipc_link_reset(l);
l->state = LINK_R

Re: [tipc-discussion] [net-next] tipc: Add a missing case of TIPC_DIRECT_MSG type

2020-03-31 Thread Xue, Ying

Acked-by: Ying Xue 

-Original Message-
From: Hoang Le [mailto:hoang.h...@dektech.com.au] 
Sent: Wednesday, March 25, 2020 3:43 PM
To: tipc-...@dektech.com.au; ma...@donjonn.com; 
tipc-discussion@lists.sourceforge.net
Subject: [tipc-discussion] [net-next] tipc: Add a missing case of 
TIPC_DIRECT_MSG type

In commit f73b12812a3d
("tipc: improve throughput between nodes in netns"), we missed a check
to handle the TIPC_DIRECT_MSG type, so the old sending mechanism is still
used for this message type. As a result, the throughput improvement is not
as significant as expected.

Besides that, when sending a large message of that type, we also use the
wrong receiving queue: it should be enqueued in the socket receive queue
instead of being handled as a multicast message.

Fix this by adding the missing case for TIPC_DIRECT_MSG.

Fixes: f73b12812a3d ("tipc: improve throughput between nodes in netns")
Reported-by: Tuong Lien 
Signed-off-by: Hoang Le 
---
 net/tipc/msg.h| 5 +
 net/tipc/node.c   | 3 ++-
 net/tipc/socket.c | 2 +-
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/tipc/msg.h b/net/tipc/msg.h
index 6d466ebdb64f..871feadbbc19 100644
--- a/net/tipc/msg.h
+++ b/net/tipc/msg.h
@@ -394,6 +394,11 @@ static inline u32 msg_connected(struct tipc_msg *m)
return msg_type(m) == TIPC_CONN_MSG;
 }
 
+static inline u32 msg_direct(struct tipc_msg *m)
+{
+   return msg_type(m) == TIPC_DIRECT_MSG;
+}
+
 static inline u32 msg_errcode(struct tipc_msg *m)
 {
return msg_bits(m, 1, 25, 0xf);
diff --git a/net/tipc/node.c b/net/tipc/node.c
index 0c88778c88b5..10292c942384 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1586,7 +1586,8 @@ static void tipc_lxc_xmit(struct net *peer_net, struct 
sk_buff_head *list)
case TIPC_MEDIUM_IMPORTANCE:
case TIPC_HIGH_IMPORTANCE:
case TIPC_CRITICAL_IMPORTANCE:
-   if (msg_connected(hdr) || msg_named(hdr)) {
+   if (msg_connected(hdr) || msg_named(hdr) ||
+   msg_direct(hdr)) {
tipc_loopback_trace(peer_net, list);
spin_lock_init(>lock);
tipc_sk_rcv(peer_net, list);
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 693e8902161e..87466607097f 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1461,7 +1461,7 @@ static int __tipc_sendmsg(struct socket *sock, struct 
msghdr *m, size_t dlen)
}
 
__skb_queue_head_init();
-   mtu = tipc_node_get_mtu(net, dnode, tsk->portid, false);
+   mtu = tipc_node_get_mtu(net, dnode, tsk->portid, true);
rc = tipc_msg_build(hdr, m, 0, dlen, mtu, );
if (unlikely(rc != dlen))
return rc;
-- 
2.20.1




Re: [tipc-discussion] [net-next] tipc: simplify trivial boolean return

2020-03-05 Thread Xue, Ying

Acked-by: Ying Xue 

-Original Message-
From: Hoang Le [mailto:hoang.h...@dektech.com.au] 
Sent: Friday, February 21, 2020 12:49 PM
To: jma...@redhat.com; ma...@donjonn.com; 
tipc-discussion@lists.sourceforge.net; Xue, Ying
Subject: [net-next] tipc: simplify trivial boolean return

Checking a condition and returning 'true' is useless when the function
returns 'true' at its end anyway.

Signed-off-by: Hoang Le 
---
 net/tipc/msg.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/tipc/msg.c b/net/tipc/msg.c
index 0d515d20b056..4d0e0bdd997b 100644
--- a/net/tipc/msg.c
+++ b/net/tipc/msg.c
@@ -736,9 +736,6 @@ bool tipc_msg_lookup_dest(struct net *net, struct sk_buff 
*skb, int *err)
msg_set_destport(msg, dport);
*err = TIPC_OK;
 
-   if (!skb_cloned(skb))
-   return true;
-
return true;
 }
 
-- 
2.20.1




Re: [tipc-discussion] [net] tipc: fix successful connect() but timed out

2020-02-05 Thread Xue, Ying

Acked-by: Ying Xue 

-Original Message-
From: Tuong Lien [mailto:tuong.t.l...@dektech.com.au] 
Sent: Wednesday, February 5, 2020 11:43 AM
To: jma...@redhat.com; ma...@donjonn.com; Xue, Ying; 
tipc-discussion@lists.sourceforge.net
Cc: tipc-...@dektech.com.au
Subject: [net] tipc: fix successful connect() but timed out

In commit 74cdc9035b82 ("tipc: fix wrong connect() return code"), we
fixed the issue of 'connect()' returning zero even though the connection
attempt had failed, by waiting until the connection is really
'ESTABLISHED'. However, that approach has one drawback in conjunction with
our 'lightweight' connection setup mechanism: the following scenario
can happen:

           (server)                        (client)

   +- accept()|                      |             wait_for_conn()
   |          |                      |connect() -------+
   |          |<-------[SYN]---------|                 > sleeping
   |          |                      *CONNECTING       |
   |--------->*ESTABLISHED           |                 |
              |--------[ACK]-------->*ESTABLISHED      > wakeup()
        send()|--------[DATA]------->|\                > wakeup()
        send()|--------[DATA]------->| |               > wakeup()
          .   .          .           . |-> recvq       .
          .   .          .           . |               .
        send()|--------[DATA]------->|/                > wakeup()
       close()|--------[FIN]-------->*DISCONNECTING    |
              *DISCONNECTING         |                 |
              |                      ~~~~~~~~~~~~~~~~~~> schedule()
                                                       | wait again
                                                       .
                                                       .
                                                       | ETIMEDOUT

Upon receipt of the server 'ACK', the client becomes 'ESTABLISHED'
and the 'wait_for_conn()' process is woken up but does not run yet.
Meanwhile, the server sends a number of data messages followed shortly by
a 'close()', without waiting for any response from the client, which
forces the client socket to become 'DISCONNECTING' immediately. When the
wait process is finally scheduled, it continues to wait until the timer
expires because of the unexpected socket state. The client 'connect()'
will finally get '-ETIMEDOUT' and be forced to release the socket while
messages still remain in its receive queue.

Obviously the issue would not happen if the server had some delay prior
to its 'close()' (or if the number of 'DATA' messages were large enough),
but any kind of delay would make the connection setup/shutdown "heavy".
We solve this by simply allowing 'connect()' to return zero in this
particular case. The socket is already 'DISCONNECTING', so any further
write will get '-EPIPE', but the socket is still able to read the
messages existing in its receive queue.

Note: This solution doesn't break the previous one, as it deals with a
different situation in which the socket state is 'DISCONNECTING' but
there is no error (i.e. sk->sk_err = 0).

Fixes: 74cdc9035b82 ("tipc: fix wrong connect() return code")
Signed-off-by: Tuong Lien 
---
 net/tipc/socket.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index f9b4fb92c0b1..693e8902161e 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2441,6 +2441,8 @@ static int tipc_wait_for_connect(struct socket *sock, 
long *timeo_p)
return -ETIMEDOUT;
if (signal_pending(current))
return sock_intr_errno(*timeo_p);
+   if (sk->sk_state == TIPC_DISCONNECTING)
+   break;

add_wait_queue(sk_sleep(sk), );
done = sk_wait_event(sk, timeo_p, tipc_sk_connected(sk),
-- 
2.13.7
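
The order of checks in the patched wait loop can be modelled in user space as follows. This is a simplified sketch, not the kernel code: struct sk_model, KEEP_WAITING and wait_for_connect_step() are hypothetical names, the socket error is stored as a positive errno, and -EINTR stands in for whatever sock_intr_errno() would return:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum sk_state { SK_CONNECTING, SK_ESTABLISHED, SK_DISCONNECTING };

/* Hypothetical stand-in for the socket fields the loop looks at. */
struct sk_model {
	enum sk_state state;
	int  err;		/* pending socket error (positive errno), 0 if none */
	long timeo;		/* remaining timeout, 0 once expired */
	bool signal_pending;
};

#define KEEP_WAITING 1		/* model-only value: caller would sleep again */

/* One pass through the checks, in the order the diff context shows them. */
static int wait_for_connect_step(const struct sk_model *sk)
{
	if (sk->err)
		return -sk->err;
	if (!sk->timeo)
		return -ETIMEDOUT;
	if (sk->signal_pending)
		return -EINTR;	/* the kernel uses sock_intr_errno() here */
	if (sk->state == SK_DISCONNECTING)
		return 0;	/* the check added by this patch */
	if (sk->state == SK_ESTABLISHED)
		return 0;	/* wait condition tipc_sk_connected() is met */
	return KEEP_WAITING;
}
```

In this model a socket that is already 'DISCONNECTING' with no error and time left yields 0 instead of waiting for the timeout, while a refused connection (error set) still returns the error first.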


Re: [tipc-discussion] [net v2] tipc: fix wrong connect() return code

2020-01-02 Thread Xue, Ying

Great! 

Acked-by: Ying Xue 

-Original Message-
From: Tuong Lien [mailto:tuong.t.l...@dektech.com.au] 
Sent: Monday, December 30, 2019 4:56 PM
To: tipc-discussion@lists.sourceforge.net; jma...@redhat.com; 
ma...@donjonn.com; Xue, Ying
Subject: [net v2] tipc: fix wrong connect() return code

The current 'tipc_wait_for_connect()' function loops and waits
for the condition 'sk->sk_state != TIPC_CONNECTING' to conclude that the
connection attempt is done. However, when the condition is met, it always
returns '0', even if the attempt actually failed (e.g.
refused because the server socket has closed...) and the socket state
was set to 'TIPC_DISCONNECTING'.
This results in a wrong return code for the user's 'connect()' call,
making the user believe that the connection is established and go ahead
with more actions, e.g. building & sending a message, only to finally
get an unexpected result (e.g. '-EPIPE').

This commit fixes the issue by instead setting the wait condition to
'tipc_sk_connected(sk)', so that the function returns '0' only when
the connection is really established. Otherwise, either the socket
error code, if any, or '-ETIMEDOUT'/'-EINTR' is returned
accordingly.

-
v2: changed after discussing with Ying
-

Signed-off-by: Tuong Lien 
---
 net/tipc/socket.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 6552f986774c..2f5679f84060 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2432,8 +2432,8 @@ static int tipc_wait_for_connect(struct socket *sock, 
long *timeo_p)
return sock_intr_errno(*timeo_p);

add_wait_queue(sk_sleep(sk), );
-   done = sk_wait_event(sk, timeo_p,
-sk->sk_state != TIPC_CONNECTING, );
+   done = sk_wait_event(sk, timeo_p, tipc_sk_connected(sk),
+);
remove_wait_queue(sk_sleep(sk), );
} while (!done);
return 0;
-- 
2.13.7


Re: [tipc-discussion] [net] tipc: fix wrong connect() return code

2019-12-25 Thread Xue, Ying

The change below is probably easier to understand:

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 6552f98..358cc55 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2435,7 +2435,7 @@ static int tipc_wait_for_connect(struct socket *sock, 
long *timeo_p)
done = sk_wait_event(sk, timeo_p,
 sk->sk_state != TIPC_CONNECTING, );
remove_wait_queue(sk_sleep(sk), );
-   } while (!done);
+   } while (!done || sk->sk_err);
return 0;
 }

-Original Message-
From: Tuong Lien [mailto:tuong.t.l...@dektech.com.au] 
Sent: Tuesday, December 24, 2019 4:06 PM
To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com; Xue, Ying
Subject: [net] tipc: fix wrong connect() return code

The current 'tipc_wait_for_connect()' function loops and waits
for the condition 'sk->sk_state != TIPC_CONNECTING' to conclude that the
connection attempt is done. However, when the condition is met, it always
returns '0', even if the attempt actually failed (e.g.
refused because the server socket has closed...) and the socket state
was set to 'TIPC_DISCONNECTING'.
This results in a wrong return code for the user's 'connect()' call,
making the user believe that the connection is established and go ahead
with more actions, e.g. building & sending a message, only to finally
get an unexpected result (e.g. '-EPIPE').

This commit fixes the issue by returning the corresponding error code,
if any, when the wait process is woken up.

Signed-off-by: Tuong Lien 
---
 net/tipc/socket.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 8b1daf3634b0..2e5faf89ef80 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2428,7 +2428,7 @@ static int tipc_wait_for_connect(struct socket *sock, 
long *timeo_p)
 {
DEFINE_WAIT_FUNC(wait, woken_wake_function);
struct sock *sk = sock->sk;
-   int done;
+   int done = 0;
 
do {
int err = sock_error(sk);
@@ -2438,12 +2438,14 @@ static int tipc_wait_for_connect(struct socket *sock, 
long *timeo_p)
return -ETIMEDOUT;
if (signal_pending(current))
return sock_intr_errno(*timeo_p);
+   if (done)
+   return 0;
 
add_wait_queue(sk_sleep(sk), );
done = sk_wait_event(sk, timeo_p,
 sk->sk_state != TIPC_CONNECTING, );
remove_wait_queue(sk_sleep(sk), );
-   } while (!done);
+   } while (1);
return 0;
 }
 
-- 
2.13.7




Re: [tipc-discussion] [net] tipc: fix link overflow issue at socket shutdown

2019-12-25 Thread Xue, Ying

Acked-by: Ying Xue 

-Original Message-
From: Tuong Lien [mailto:tuong.t.l...@dektech.com.au] 
Sent: Tuesday, December 24, 2019 6:09 PM
To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com; Xue, Ying
Subject: [net] tipc: fix link overflow issue at socket shutdown

When a socket is suddenly shut down or released, it rejects all the
unreceived messages in its receive queue. This applies to a connected
socket too, although only one 'FIN' message needs to be sent
back to its peer in this case.

If there are many messages in the queue and/or several connections
with such messages are shut down at the same time, the link layer can
easily overflow at the 'TIPC_SYSTEM_IMPORTANCE' backlog level
because of the message rejections. As a result, the link is taken
down. Moreover, as soon as the link is re-established, the socket
layer can continue to reject the messages and the same issue happens
again...

This commit refactors the '__tipc_shutdown()' function to send only one
'FIN' in the situation mentioned above. For the connectionless case, the
rejections are unavoidable, but usually there are none for such socket
messages because they are 'dest-droppable' by default.

In addition, the new code handles the other socket states explicitly
(e.g. 'TIPC_LISTEN') and treats each as a separate case to avoid misbehaving.
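
To illustrate why this matters for the link backlog, the number of messages a connected socket sends back at shutdown can be modelled roughly as below. This is a sketch under simplifying assumptions read off the diff that follows (every unread message is rejected, and only the head of the queue may be partially read); both function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Old behaviour: every queued message that was not partially read is
 * rejected back to the peer; with nothing to reject, one FIN is built.
 */
static unsigned int old_shutdown_msgs(unsigned int queued, bool head_partly_read)
{
	unsigned int rejects = queued;

	if (queued && head_partly_read)
		rejects--;		/* freed silently, not rejected */
	return rejects ? rejects : 1;
}

/* New behaviour for a connected socket: the rest of the queue is purged
 * and exactly one FIN goes back, whatever the queue length.
 */
static unsigned int new_shutdown_msgs(unsigned int queued, bool head_partly_read)
{
	(void)queued;
	(void)head_partly_read;
	return 1;
}
```

In this model a socket closed with 1000 unread messages queues 1000 rejections on the link before the patch and a single 'FIN' after it.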

--
v2: completely refactor the function;
cover the other socket states;
fix a memleak issue (- reported by 'Hoang Huu Le').
--

Signed-off-by: Tuong Lien 
---
 net/tipc/socket.c | 53 -
 1 file changed, 32 insertions(+), 21 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 41688da233ab..aa0ffd0dba50 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -287,12 +287,12 @@ static void tipc_sk_respond(struct sock *sk, struct 
sk_buff *skb, int err)
  *
  * Caller must hold socket lock
  */
-static void tsk_rej_rx_queue(struct sock *sk)
+static void tsk_rej_rx_queue(struct sock *sk, int error)
 {
struct sk_buff *skb;
 
while ((skb = __skb_dequeue(>sk_receive_queue)))
-   tipc_sk_respond(sk, skb, TIPC_ERR_NO_PORT);
+   tipc_sk_respond(sk, skb, error);
 }
 
 static bool tipc_sk_connected(struct sock *sk)
@@ -545,34 +545,45 @@ static void __tipc_shutdown(struct socket *sock, int 
error)
/* Remove pending SYN */
__skb_queue_purge(>sk_write_queue);
 
-   /* Reject all unreceived messages, except on an active connection
-* (which disconnects locally & sends a 'FIN+' to peer).
-*/
-   while ((skb = __skb_dequeue(>sk_receive_queue)) != NULL) {
-   if (TIPC_SKB_CB(skb)->bytes_read) {
-   kfree_skb(skb);
-   continue;
-   }
-   if (!tipc_sk_type_connectionless(sk) &&
-   sk->sk_state != TIPC_DISCONNECTING) {
-   tipc_set_sk_state(sk, TIPC_DISCONNECTING);
-   tipc_node_remove_conn(net, dnode, tsk->portid);
-   }
-   tipc_sk_respond(sk, skb, error);
+   /* Remove partial received buffer if any */
+   skb = skb_peek(&sk->sk_receive_queue);
+   if (skb && TIPC_SKB_CB(skb)->bytes_read) {
+   __skb_unlink(skb, &sk->sk_receive_queue);
+   kfree_skb(skb);
}
 
-   if (tipc_sk_type_connectionless(sk))
+   /* Reject all unreceived messages if connectionless */
+   if (tipc_sk_type_connectionless(sk)) {
+   tsk_rej_rx_queue(sk, error);
return;
+   }
 
-   if (sk->sk_state != TIPC_DISCONNECTING) {
+   switch (sk->sk_state) {
+   case TIPC_CONNECTING:
+   case TIPC_ESTABLISHED:
+   tipc_set_sk_state(sk, TIPC_DISCONNECTING);
+   tipc_node_remove_conn(net, dnode, tsk->portid);
+   /* Send a FIN+/- to its peer */
+   skb = __skb_dequeue(&sk->sk_receive_queue);
+   if (skb) {
+   __skb_queue_purge(&sk->sk_receive_queue);
+   tipc_sk_respond(sk, skb, error);
+   break;
+   }
skb = tipc_msg_create(TIPC_CRITICAL_IMPORTANCE,
  TIPC_CONN_MSG, SHORT_H_SIZE, 0, dnode,
  tsk_own_node(tsk), tsk_peer_port(tsk),
  tsk->portid, error);
if (skb)
tipc_node_xmit_skb(net, skb, dnode, tsk->portid);
-   tipc_node_remove_conn(net, dnode, tsk->portid);
-   tipc_set_sk_state(sk, TIPC_DISCONNECTING);
+   break;
+   case TIPC_LISTEN:
+   /* Reject all SYN messages */
+   tsk_rej_rx_queue(sk, error);
+   break;
+   default:
+

Re: [tipc-discussion] [net-next 1/3] tipc: eliminate gap indicator from ACK messages

2019-12-05 Thread Xue, Ying

Thanks Jon for your comments. I fully agree with you. Introducing a 
retransmission timer for every message might not be a good idea, but 
introducing one for a block of messages might be. Most of the time TIPC 
network conditions are quite good, and TIPC messages don’t need to cross the 
internet, so the RTT between two nodes should be fairly constant. But different 
TIPC nodes have different HW capabilities, which means the RTT values might 
differ considerably between TIPC unicast links. In particular, a node’s 
workload often changes over time, which also affects the RTT. In short, it 
looks like we will eventually need to measure RTT dynamically and detect 
message loss with a retransmission timer by comparing against the adjusted 
RTT, although you and I both dislike this approach. Using a retransmission 
timer should probably be the primary way to detect TIPC message loss. By 
contrast, a SACK or NACK retransmission mechanism might be a supplementary 
method, like fast retransmission in the SCTP or TCP world.
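
As a rough illustration of the "measure RTT dynamically and compare against an adjusted RTT" idea above, here is a minimal user-space sketch of a smoothed RTT estimator in the style of TCP's (RFC 6298). The names rtt_est, rtt_est_update() and rtt_est_rto() are hypothetical and nothing like this is claimed to exist in net/tipc; the sketch only shows the arithmetic.

```c
#include <stdint.h>

/* Hypothetical per-link RTT state; all values in microseconds. */
struct rtt_est {
	uint32_t srtt;   /* smoothed RTT */
	uint32_t rttvar; /* smoothed mean deviation of the RTT */
	int valid;       /* 0 until the first sample has been taken */
};

/* Fold one RTT sample into the estimate (RFC 6298: alpha = 1/8, beta = 1/4). */
static void rtt_est_update(struct rtt_est *e, uint32_t sample)
{
	uint32_t diff;

	if (!e->valid) {
		/* First measurement: SRTT = R, RTTVAR = R / 2 */
		e->srtt = sample;
		e->rttvar = sample / 2;
		e->valid = 1;
		return;
	}
	/* RTTVAR is updated from the old SRTT, before SRTT itself changes */
	diff = e->srtt > sample ? e->srtt - sample : sample - e->srtt;
	e->rttvar = e->rttvar - e->rttvar / 4 + diff / 4;
	e->srtt = e->srtt - e->srtt / 8 + sample / 8;
}

/* Retransmission timeout: SRTT + 4 * RTTVAR, clamped to a lower bound. */
static uint32_t rtt_est_rto(const struct rtt_est *e, uint32_t min_rto)
{
	uint32_t rto = e->srtt + 4 * e->rttvar;

	return rto < min_rto ? min_rto : rto;
}
```

A sender would call rtt_est_update() whenever an ACK for a timestamped message (or block of messages) arrives, and treat a message as lost once it has been outstanding for longer than rtt_est_rto().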

Lastly, I don’t object to this series. On the contrary, it is quite good 
because it brings a large throughput improvement. We can submit a request 
to merge the series upstream first. Please add my ack-by if you want.
After this series, we can do more experiments in different network 
environments to validate whether it’s worth introducing a retransmission timer.

Thanks,
Ying

From: Jon Maloy [mailto:jon.ma...@ericsson.com]
Sent: Thursday, December 5, 2019 4:14 AM
To: Jon Maloy; Xue, Ying; x...@redhat.com; Tuong Tong Lien; Tung Quang Nguyen; 
Hoang Huu Le; John Rutherford; tipc-discussion@lists.sourceforge.net; 
tipc-...@dektech.com.au
Subject: RE: [net-next 1/3] tipc: eliminate gap indicator from ACK messages

I tried different varieties of the solutions discussed below.
1) Ignoring the first 1, 2 or 3 NACKs, and relying on the time stamp for 
repeated retransmissions.
2) Ignoring the 1st, 2nd, 4th, 5th and all other NACKs which are not a multiple of 3

Some gave the same throughput as with the patches I posted, but none was 
definitely better, as far as I could see.
However, I didn’t do any long series, so there might still be some small 
percentage improvement I have missed.
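
For readers trying to picture variant 2) above, a minimal sketch of such a filter could look like the following. The struct and function names are hypothetical, and how NACKs are counted per gap is an assumption; this is not the code that was actually tested.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter of NACKs seen so far for the same gap. */
struct nack_state {
	uint16_t nack_cnt;
};

/* Variant 2: act only on every third NACK (the 3rd, 6th, 9th, ...). */
static bool nack_should_retransmit(struct nack_state *s)
{
	s->nack_cnt++;
	return s->nack_cnt % 3 == 0;
}
```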

///jon

From: Jon Maloy 
Sent: 4-Dec-19 13:07
To: Jon Maloy ; Xue, Ying ; 
x...@redhat.com; Tuong Tong Lien ; Tung Quang 
Nguyen ; Hoang Huu Le 
; John Rutherford ; 
tipc-discussion@lists.sourceforge.net; tipc-...@dektech.com.au
Subject: Re: [net-next 1/3] tipc: eliminate gap indicator from ACK messages

Hi Ying,
(cc-ing tipc-discussion, since your original mail seems to have been dropped 
somewhere, and I want everybody to be able to follow the discussion)

Actually we do have a kind of SACK mechanism already, but maybe too simple.
See below.

On Wednesday, December 4, 2019, 11:45:37 AM GMT-5, Xue, Ying 
[mailto:ying@windriver.com] wrote:

I don’t know why I received lots of complaints from Outlook indicating that my 
email below was not sent to you.
I am just trying to resend it. If you already received it, please ignore this one.

Thanks,
Ying

-Original Message-
From: Ying Xue [mailto:ying@windriver.com]
Sent: Thursday, December 5, 2019 12:39 AM
To: Jon Maloy; Jon Maloy
Cc: mohan.krishna.ghanta.krishnamur...@ericsson.com; 
parthasarathy.bhuvara...@gmail.com; tung.q.ngu...@dektech.com.au; 
hoang.h...@dektech.com.au; tuong.t.l...@dektech.com.au; 
gordan.mihalje...@dektech.com.au; tipc-discussion@lists.sourceforge.net
Subject: Re: [net-next 1/3] tipc: eliminate gap indicator from ACK messages

On 12/2/19 8:32 AM, Jon Maloy wrote:
> When we increase the link send window we sometimes observe the
> following scenario:
>
> 1) A packet #N arrives out of order far ahead of a sequence of older
>packets which are still under way. The packet is added to the
>deferred queue.
> 2) The missing packets arrive in sequence, and for each 16th of them
>an ACK is sent back to the receiver, as it should be.
> 3) When building those ACK messages, it is checked if there is a gap
>between the link's 'rcv_nxt' and the first packet in the deferred
>queue. This is always the case until packet number #N-1 arrives, and
>a 'gap' indicator is added, effectively turning them into NACK
>messages.
> 4) When those NACKs arrive at the sender, all the requested
>retransmissions are done, since it is a first-time request.
>
> This sometimes leads to a huge amount of redundant retransmissions,
> causing a drop in max

Re: [tipc-discussion] [iproute2] tipc: add new commands to set TIPC AEAD key

2019-11-20 Thread Xue, Ying

Hi Tuong,

This patch is fine for us. Sure, you can add my ack-by.

Thanks,
Ying

-Original Message-
From: Tuong Lien Tong [mailto:tuong.t.l...@dektech.com.au] 
Sent: Wednesday, November 20, 2019 11:37 AM
To: 'Jon Maloy'; Xue, Ying; tipc-discussion@lists.sourceforge.net; 
ma...@donjonn.com
Subject: RE: [iproute2] tipc: add new commands to set TIPC AEAD key

Hi Jon/Ying,

We still have this patch (i.e. for the 'tipc node set/flush key...' commands), 
may I put your ACK on it before sending to iproute2-next?
Many thanks!

BR/Tuong

-Original Message-
From: Jon Maloy  
Sent: Wednesday, October 16, 2019 9:51 PM
To: Ying Xue ; Tuong Tong Lien 
; tipc-discussion@lists.sourceforge.net; 
ma...@donjonn.com
Subject: RE: [iproute2] tipc: add new commands to set TIPC AEAD key



> -Original Message-
> From: Ying Xue 
> Sent: 16-Oct-19 08:30
> To: Tuong Tong Lien ; tipc-
> discuss...@lists.sourceforge.net; Jon Maloy ;
> ma...@donjonn.com
> Subject: Re: [iproute2] tipc: add new commands to set TIPC AEAD key
> 
> It looks like we will use "tipc node" command to configure static key to TIPC
> module, right?

The key is static in the sense that TIPC itself cannot change the key. But the 
protocol ensures that keys can be replaced without any traffic disturbances.

> 
> Do we plan to support dynamic key setting? If yes, what kinds of key exchange
> protocol would we use? For example, in IPSEC, it uses IKEv2 as its key
> exchange protocol.

At the moment we assume there is an external user land framework where node 
authentication is done and where keys are generated and distributed (via TLS) 
to the nodes.
When we want to replace a key (probably at fixed pre-defined intervals), the 
framework has to generate new keys and distribute/inject those to TIPC.

> 
> Will key be expired after a specific lifetime? For instance, in
> IPSEC/Raccoon2 or strongswan, they use rekey feature to provide this
> function to make security association safer.

We are considering this, so that the external framework can be kept simpler or 
even be eliminated. That would be the next step, once this series is applied.

Regards
///jon
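
The 'set key' description in the quoted patch below maps the optional NODEID argument to a key mode. A minimal sketch of that decision is shown here, assuming node identities can be compared as plain strings (real TIPC node ids are fixed-size binary values); the enum and function names are hypothetical and not taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the modes described in the patch text. */
enum key_mode {
	KEY_MODE_CLUSTER, /* no NODEID: one key for both TX and RX */
	KEY_MODE_TX,      /* NODEID is the own node: encryption key */
	KEY_MODE_RX,      /* NODEID is a peer: decryption key for that peer */
};

/* Pick the key mode from the (possibly empty) NODEID argument. */
static enum key_mode key_mode_for(const char *nodeid, const char *own_id)
{
	if (nodeid == NULL || nodeid[0] == '\0')
		return KEY_MODE_CLUSTER;
	return strcmp(nodeid, own_id) == 0 ? KEY_MODE_TX : KEY_MODE_RX;
}
```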


> 
> On 10/14/19 7:36 PM, Tuong Lien wrote:
> > Two new commands are added as part of 'tipc node' command:
> >
> >  $tipc node set key KEY [algname ALGNAME] [nodeid NODEID]
> >  $tipc node flush key
> >
> > which enable user to set and remove AEAD keys in kernel TIPC.
> >
> > For the 'set key' command, the given 'nodeid' parameter decides the
> > mode to be applied to the key, particularly:
> >
> > - If NODEID is empty, the key is a 'cluster' key which will be used
> > for all message encryption/decryption from/to the node (i.e. both TX & RX).
> > The same key needs to be set in the other nodes i.e. the 'cluster key'
> > mode.
> >
> > - If NODEID is own node, the key is used for message encryption (TX)
> > from the node. Whereas, if NODEID is a peer node, the key is for
> > message decryption (RX) from that peer node.
> > This is the 'per-node-key' mode, in which each node in the cluster has its
> > specific (TX) key.
> >
> > Signed-off-by: Tuong Lien 
> > ---
> >  include/uapi/linux/tipc.h |  21 ++
> >  include/uapi/linux/tipc_netlink.h |   4 ++
> >  tipc/misc.c   |  38 +++
> >  tipc/misc.h   |   1 +
> >  tipc/node.c   | 133
> +-
> >  5 files changed, 195 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/uapi/linux/tipc.h b/include/uapi/linux/tipc.h
> > index e16cb4e2..b118ce9b 100644
> > --- a/include/uapi/linux/tipc.h
> > +++ b/include/uapi/linux/tipc.h
> > @@ -232,6 +232,27 @@ struct tipc_sioc_nodeid_req {
> > char node_id[TIPC_NODEID_LEN];
> >  };
> >
> > +/*
> > + * TIPC Crypto, AEAD mode
> > + */
> > +#define TIPC_AEAD_MAX_ALG_NAME (32)
> > +#define TIPC_AEAD_MIN_KEYLEN   (16 + 4)
> > +#define TIPC_AEAD_MAX_KEYLEN   (32 + 4)
> > +
> > +struct tipc_aead_key {
> > +   char alg_name[TIPC_AEAD_MAX_ALG_NAME];
> > +   unsigned int keylen;/* in bytes */
> > +   char key[];
> > +};
> > +
> > +#define TIPC_AEAD_KEY_MAX_SIZE (sizeof(struct tipc_aead_key) + \
> > +   TIPC_AEAD_MAX_KEYLEN)
> > +
> > +static inline int tipc_aead_key_size(struct tipc_aead_key *key) {
> > +   return sizeof(*key) + key->keylen;
> > +}
> > +
> >  /* The macros and functions below are deprecated:
> >   */
> >
> > diff --git a/include/uapi/linux/tipc_netlink.h
> > b/include/

Re: [tipc-discussion] [net-next v3 1/1] tipc: introduce variable window congestion control

2019-11-19 Thread Xue, Ying

Hi Jon,

I don't know whether you remember that, many years ago, I implemented a prototype 
that introduced slow-start and congestion avoidance algorithms at the link 
layer :) 

In my experiments, there were a few meaningful findings:
- How the initial congestion window and the upper congestion window size were 
set was crucial for throughput.
- Different Ethernet speeds, different CPU capabilities and different message 
sizes all had a big impact on throughput. I did lots of experiments; sometimes 
the performance improvement was very obvious, but sometimes it was minimal, 
even when I measured throughput in a similar network environment. In 
particular, if I slightly changed some test conditions, the throughput 
improvement results were also quite different.

At that time, I wondered whether we needed to make the following changes:
1. Should we introduce a timer to measure RTT and identify whether network 
congestion happens or not? 
2. Should we change message delivery mode from message-oriented to 
byte-oriented?

Of course, in my experiments I didn't make such big changes. 

So I want to know:
-  How did you select the minimum window size and maximum window size?
-  Did you measure TIPC throughput performance on different Ethernets, 
including large message size and small message size tests? 
- Did you see phenomena similar to mine when you slightly changed test conditions?

In this proposal, during slow-start stage, the window increase is pretty slow:

+   if (qlen >= cwin && (l->snd_nxt - buf_seqno(skb_peek(txq)) == qlen)) {
+   add = l->cong_acks++ % 32 ? 0 : 1;
+   cwin = min_t(u16, cwin + add, l->max_win);
+   l->window = cwin;
+   }

But in the TCP slow-start algorithm, the congestion window increase during the 
slow-start stage is much more aggressive than the above. Once the congestion 
window exceeds the slow-start threshold, it enters the congestion avoidance 
stage, in which the congestion window increases slowly.

I am curious why the congestion window increase is so conservative compared 
to the TCP slow-start algorithm. What factors did you consider when you selected 
the algorithm?
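
To put a number on the difference, here is a small user-space sketch that counts how many ACKs are needed to grow a window when it is bumped by one on every N-th ACK, using the same `cong_acks++ % 32 ? 0 : 1` expression as the snippet quoted above. The function name and the window values 50 and 500 are only illustrative assumptions; N = 1 stands in for TCP slow start, where every ACK grows the window.

```c
#include <stdint.h>

/*
 * Number of ACKs needed to grow the window from 'start' to 'target' when
 * it is incremented by one on every 'interval'-th ACK. 'interval' must be
 * non-zero. Mirrors "add = l->cong_acks++ % 32 ? 0 : 1" for interval == 32.
 */
static uint32_t acks_to_reach(uint16_t start, uint16_t target, uint32_t interval)
{
	uint32_t cong_acks = 0;
	uint16_t cwin = start;

	while (cwin < target) {
		uint16_t add = cong_acks++ % interval ? 0 : 1;

		cwin += add;
	}
	return cong_acks;
}
```

Under these assumptions, growing from 50 to 500 takes 14369 ACKs with an interval of 32, versus 450 ACKs when every ACK increments the window.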

Thanks,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Tuesday, November 19, 2019 7:33 AM
To: Jon Maloy; Jon Maloy
Cc: mohan.krishna.ghanta.krishnamur...@ericsson.com; 
parthasarathy.bhuvara...@gmail.com; tung.q.ngu...@dektech.com.au; 
hoang.h...@dektech.com.au; tuong.t.l...@dektech.com.au; 
gordan.mihalje...@dektech.com.au; Xue, Ying; 
tipc-discussion@lists.sourceforge.net
Subject: [net-next v3 1/1] tipc: introduce variable window congestion control

We introduce a simple variable window congestion control for links.
The algorithm is inspired by the Reno algorithm, and can best be
described as working in permanent "congestion avoidance" mode, within
strict limits.

- We introduce hard lower and upper window limits per link, still
  different and configurable per bearer type.

- Next, we let a link start at the minimum window, and then slowly
  increment it for each 32 received non-duplicate ACK. This goes on
  until it either reaches the upper limit, or until it receives a
  NACK message.

- For each non-duplicate NACK received, we let the window decrease by
  intervals of 1/2 of the current window, but not below the minimum
  window.
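
The three rules above can be sketched in user space roughly as follows. This is a simplified model for illustration, not the patch's code: the struct and function names are hypothetical, and it increments on every 32nd ACK, which differs slightly in phase from the expression used in the patch.

```c
#include <stdint.h>

/* Hypothetical per-link window state. */
struct cwin {
	uint16_t window;    /* current send window */
	uint16_t min_win;   /* hard lower limit */
	uint16_t max_win;   /* hard upper limit */
	uint32_t cong_acks; /* non-duplicate ACKs counted so far */
};

/* A link starts at the minimum window. */
static void cwin_init(struct cwin *c, uint16_t min_win, uint16_t max_win)
{
	c->window = min_win;
	c->min_win = min_win;
	c->max_win = max_win;
	c->cong_acks = 0;
}

/* Non-duplicate ACK: grow by one on every 32nd ACK, up to max_win. */
static void cwin_on_ack(struct cwin *c)
{
	if (++c->cong_acks % 32 == 0 && c->window < c->max_win)
		c->window++;
}

/* Non-duplicate NACK: halve the window, but never go below min_win. */
static void cwin_on_nack(struct cwin *c)
{
	uint16_t half = c->window / 2;

	c->window = half > c->min_win ? half : c->min_win;
}
```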

The change does in reality have effect only on unicast ethernet
transport, as we have seen that there is no room whatsoever for
increasing the window max size for the UDP bearer.
For now, we also choose to keep the limits for the broadcast link
unchanged and equal.

This algorithm seems to give a ~25% throughput improvement for large
messages, while it has no effect on throughput for small messages.

Suggested-by: Xin Long 
Acked-by: Xin Long 
Signed-off-by: Jon Maloy 

---
v2: - Moved window increment in tipc_advance_backlogq() to before
  the transfer loop, as suggested by Tuong.
- Introduced logic for incrementing the window even for the
  broadcast send link, also suggested by Tuong.
v3: - Rebased to latest net-next
---
 net/tipc/bcast.c | 11 
 net/tipc/bearer.c| 11 
 net/tipc/bearer.h|  6 +++--
 net/tipc/eth_media.c |  6 -
 net/tipc/ib_media.c  |  5 +++-
 net/tipc/link.c  | 76 ++--
 net/tipc/link.h  |  9 ---
 net/tipc/node.c  | 13 +
 net/tipc/udp_media.c |  3 ++-
 9 files changed, 90 insertions(+), 50 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index f41096a..84da317 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -562,18 +562,18 @@ int tipc_bclink_reset_stats(struct net *net)
return 0;
 }
 
-static int tipc_bc_link_set_queue_limits(struct net *net, u32 limit)
+static int tipc_bc_link_set_queue_limits(struct net *net, u32 max_win)
 {
struct t

Re: [tipc-discussion] [PATCH RFC 0/5] TIPC encryption

2019-11-01 Thread Xue, Ying

Good job. 

This is a big and complex feature. For the many users who might not 
want to use this feature, please consider giving them a way to 
disable it completely by adding a new kernel config option such as TIPC_CRYPTO.

Thanks,
Ying

-Original Message-
From: Tuong Lien [mailto:tuong.t.l...@dektech.com.au] 
Sent: Monday, October 14, 2019 7:07 PM
To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com; Xue, Ying
Subject: [PATCH RFC 0/5] TIPC encryption

This series provides TIPC encryption feature, kernel part. There will be
another one in the 'iproute2/tipc' for user space to set key.

Tuong Lien (5):
  tipc: add reference counter to bearer
  tipc: enable creating a "preliminary" node
  tipc: add new AEAD key structure for user API
  tipc: introduce TIPC encryption & authentication
  tipc: add support for AEAD key setting via netlink

 include/uapi/linux/tipc.h |   21 +
 include/uapi/linux/tipc_netlink.h |4 +
 net/tipc/Makefile |2 +-
 net/tipc/bcast.c  |2 +-
 net/tipc/bearer.c |   52 +-
 net/tipc/bearer.h |6 +-
 net/tipc/core.c   |   10 +
 net/tipc/core.h   |4 +
 net/tipc/crypto.c | 1986 +
 net/tipc/crypto.h |  166 
 net/tipc/link.c   |   16 +-
 net/tipc/link.h   |1 +
 net/tipc/msg.c|   24 +-
 net/tipc/msg.h|   44 +-
 net/tipc/netlink.c|   16 +-
 net/tipc/node.c   |  314 +-
 net/tipc/node.h   |   10 +
 net/tipc/sysctl.c |9 +
 net/tipc/udp_media.c  |1 +
 19 files changed, 2604 insertions(+), 84 deletions(-)
 create mode 100644 net/tipc/crypto.c
 create mode 100644 net/tipc/crypto.h

-- 
2.13.7



Re: [tipc-discussion] [net-next] tipc: improve message bundling algorithm

2019-10-31 Thread Xue, Ying

Sorry for the late response to this commit. Great change!

Acked-by: Ying Xue 

-Original Message-
From: Tuong Lien [mailto:tuong.t.l...@dektech.com.au] 
Sent: Tuesday, October 15, 2019 12:59 PM
To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com; Xue, Ying
Subject: [net-next] tipc: improve message bundling algorithm

As mentioned in commit e95584a889e1 ("tipc: fix unlimited bundling of
small messages"), the current message bundling algorithm is inefficient:
it can generate bundles of only one payload message, which causes
unnecessary overhead for both the sender and receiver.

This commit re-designs the 'tipc_msg_make_bundle()' function (now named
as 'tipc_msg_try_bundle()'), so that when a message comes at the first
place, we will just check & keep a reference to it if the message is
suitable for bundling. The message buffer will be put into the link
backlog queue and processed as normal. Later on, when another one comes
we will make a bundle with the first message if possible and so on...
This way, a bundle if really needed will always consist of at least two
payload messages. Otherwise, we let the first buffer go its way without
any need of bundling, so reduce the overheads to zero.

Moreover, since we now have both messages in hand, we can even
optimize the 'tipc_msg_bundle()' function to bundle a very large
(size ~ MSS) message with a small one, which the current algorithm cannot do,
e.g. [1400-byte message] + [10-byte message] (MTU = 1500).
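
The size arithmetic behind the [1400-byte] + [10-byte] example can be sketched as below. This is only an illustration under stated assumptions: a 40-byte bundle header and 4-byte alignment of the first message, with hypothetical names; the real function checks more than size (message type, importance and so on).

```c
#include <stdbool.h>
#include <stdint.h>

#define BUNDLE_HDR_SIZE 40u /* assumed size of the bundle header */

/* Round a length up to the next multiple of 4. */
static uint32_t align4(uint32_t n)
{
	return (n + 3u) & ~3u;
}

/* True if two payload messages of these sizes fit in one bundle within mtu. */
static bool bundle_fits(uint32_t first, uint32_t second, uint32_t mtu)
{
	return BUNDLE_HDR_SIZE + align4(first) + second <= mtu;
}
```

With these assumptions, bundle_fits(1400, 10, 1500) holds because 40 + 1400 + 10 = 1450 does not exceed 1500.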

Signed-off-by: Tuong Lien 
---
 net/tipc/link.c |  60 +++---
 net/tipc/msg.c  | 153 +---
 net/tipc/msg.h  |   5 +-
 3 files changed, 114 insertions(+), 104 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 999eab592de8..3bd60bdbf56c 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -940,16 +940,17 @@ int tipc_link_xmit(struct tipc_link *l, struct 
sk_buff_head *list,
   struct sk_buff_head *xmitq)
 {
struct tipc_msg *hdr = buf_msg(skb_peek(list));
-   unsigned int maxwin = l->window;
-   int imp = msg_importance(hdr);
-   unsigned int mtu = l->mtu;
+   struct sk_buff_head *backlogq = &l->backlogq;
+   struct sk_buff_head *transmq = &l->transmq;
+   struct sk_buff *skb, *_skb;
+   u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
u16 ack = l->rcv_nxt - 1;
u16 seqno = l->snd_nxt;
-   u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
-   struct sk_buff_head *transmq = &l->transmq;
-   struct sk_buff_head *backlogq = &l->backlogq;
-   struct sk_buff *skb, *_skb, **tskb;
int pkt_cnt = skb_queue_len(list);
+   int imp = msg_importance(hdr);
+   unsigned int maxwin = l->window;
+   unsigned int mtu = l->mtu;
+   bool new_bundle;
int rc = 0;

if (unlikely(msg_size(hdr) > mtu)) {
@@ -975,20 +976,18 @@ int tipc_link_xmit(struct tipc_link *l, struct 
sk_buff_head *list,
}

/* Prepare each packet for sending, and add to relevant queue: */
-   while (skb_queue_len(list)) {
-   skb = skb_peek(list);
-   hdr = buf_msg(skb);
-   msg_set_seqno(hdr, seqno);
-   msg_set_ack(hdr, ack);
-   msg_set_bcast_ack(hdr, bc_ack);
-
+   while ((skb = __skb_dequeue(list))) {
if (likely(skb_queue_len(transmq) < maxwin)) {
+   hdr = buf_msg(skb);
+   msg_set_seqno(hdr, seqno);
+   msg_set_ack(hdr, ack);
+   msg_set_bcast_ack(hdr, bc_ack);
_skb = skb_clone(skb, GFP_ATOMIC);
if (!_skb) {
+   kfree_skb(skb);
__skb_queue_purge(list);
return -ENOBUFS;
}
-   __skb_dequeue(list);
__skb_queue_tail(transmq, skb);
/* next retransmit attempt */
if (link_is_bc_sndlink(l))
@@ -1000,22 +999,27 @@ int tipc_link_xmit(struct tipc_link *l, struct 
sk_buff_head *list,
seqno++;
continue;
}
-   tskb = &l->backlog[imp].target_bskb;
-   if (tipc_msg_bundle(*tskb, hdr, mtu)) {
-   kfree_skb(__skb_dequeue(list));
-   l->stats.sent_bundled++;
-   continue;
-   }
-   if (tipc_msg_make_bundle(tskb, hdr, mtu, l->addr)) {
-   kfree_skb(__skb_dequeue(list));
-   __skb_queue_tail(backlogq, *tskb);
-   l->backlog[imp].len++;
-   l->stats.sent_bundled++;
-   l->stats.sent_bundles++;
+   if (tipc_msg_try_bundle(l->back

Re: [tipc-discussion] [net-next v2 1/1] tipc: add smart nagle feature

2019-10-24 Thread Xue, Ying

Hi Jon,

We have the following comments:
- Please consider adding a TIPC_NODELAY option to tipc_setsockopt() so that the 
user can disable the nagle algorithm.
-  I don't understand why we don't transmit the accumulated contents of the 
write queue when a CONN_PROBE message is received from the peer. Can you please 
explain it?
- I am just curious what impact the nagle feature has on latency for 
SOCK_STREAM sockets. Did you measure latency with the nagle feature enabled?

Thanks,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Wednesday, October 23, 2019 3:53 PM
To: Jon Maloy; Jon Maloy
Cc: mohan.krishna.ghanta.krishnamur...@ericsson.com; 
parthasarathy.bhuvara...@gmail.com; tung.q.ngu...@dektech.com.au; 
hoang.h...@dektech.com.au; tuong.t.l...@dektech.com.au; 
gordan.mihalje...@dektech.com.au; Xue, Ying; 
tipc-discussion@lists.sourceforge.net
Subject: [net-next v2 1/1] tipc: add smart nagle feature

We introduce a Nagle-like algorithm for bundling small messages at the
socket level.

- A socket enters nagle mode when more than 4 messages have been sent
  out without receiving any data message from the peer.
- A socket leaves nagle mode whenever it receives a data message from
  the peer.

In nagle mode, small messages are accumulated in the socket write queue.
The last buffer in the queue is marked with a new 'ack_required' bit,
which forces the receiving peer to send a CONN_ACK message back to the
sender.

The accumulated contents of the write queue are transmitted when one of
the following events or conditions occurs.

- A CONN_ACK message is received from the peer.
- A data message is received from the peer.
- A SOCK_WAKEUP pseudo message is received from the link level.
- The write queue contains more than 64 1k blocks of data.
- The connection is being shut down.
- There is no CONN_ACK message to expect. I.e., there is currently
  no outstanding message where the 'ack_required' bit was set. As a
  consequence, the first message added after we enter nagle mode
  is always sent directly with this bit set.

This new feature gives a 50-100% improvement of throughput for small
(i.e., less than MTU size) messages, while it might add up to one RTT
to latency time when the socket is in nagle mode.
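
As a reading aid, the mode switching and flush conditions described above can be modelled in user space as follows. All names are hypothetical and the model ignores the actual queue handling; it only encodes the rules as stated in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define NAGLE_START_THRESHOLD 4u /* "more than 4 messages" */
#define NAGLE_MAX_BLOCKS 64u     /* "more than 64 1k blocks" */

/* Hypothetical per-socket nagle state. */
struct nagle_sk {
	uint32_t snd_without_rcv; /* messages sent since last data from peer */
	uint32_t queued_blocks;   /* 1k blocks waiting in the write queue */
	bool expect_ack;          /* an 'ack_required' message is outstanding */
};

/* Events that flush the write queue unconditionally. */
enum nagle_event {
	EV_NONE,
	EV_CONN_ACK,
	EV_DATA_RCV,
	EV_SOCK_WAKEUP,
	EV_SHUTDOWN,
};

static void nagle_on_send(struct nagle_sk *s)
{
	s->snd_without_rcv++;
}

/* Receiving a data message from the peer leaves nagle mode. */
static void nagle_on_data_rcv(struct nagle_sk *s)
{
	s->snd_without_rcv = 0;
}

static bool nagle_active(const struct nagle_sk *s)
{
	return s->snd_without_rcv > NAGLE_START_THRESHOLD;
}

/* Should the accumulated write queue be transmitted now? */
static bool nagle_must_flush(const struct nagle_sk *s, enum nagle_event ev)
{
	if (ev != EV_NONE)
		return true;
	if (s->queued_blocks > NAGLE_MAX_BLOCKS)
		return true;
	/* No CONN_ACK to wait for: send directly */
	return !s->expect_ack;
}
```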

Signed-off-by: Jon Maloy 

---
v2: Increased max nagle size for UDP to 14k. This improves
throughput for messages 750-1500 bytes with ~50%.
---
 net/tipc/msg.c| 53 
 net/tipc/msg.h| 12 
 net/tipc/node.h   |  7 +++--
 net/tipc/socket.c | 91 +--
 4 files changed, 145 insertions(+), 18 deletions(-)

diff --git a/net/tipc/msg.c b/net/tipc/msg.c
index 922d262..973795a 100644
--- a/net/tipc/msg.c
+++ b/net/tipc/msg.c
@@ -190,6 +190,59 @@ int tipc_buf_append(struct sk_buff **headbuf, struct 
sk_buff **buf)
return 0;
 }
 
+/**
+ * tipc_msg_append(): Append data to tail of an existing buffer queue
+ * @hdr: header to be used
+ * @m: the data to be appended
+ * @mss: max allowable size of buffer
+ * @dlen: size of data to be appended
+ * @txq: queue to append to
+ * Returns the number of 1k blocks appended or errno value
+ */
+int tipc_msg_append(struct tipc_msg *_hdr, struct msghdr *m, int dlen,
+   int mss, struct sk_buff_head *txq)
+{
+   struct sk_buff *skb, *prev;
+   int accounted, total, curr;
+   int mlen, cpy, rem = dlen;
+   struct tipc_msg *hdr;
+
+   skb = skb_peek_tail(txq);
+   accounted = skb ? msg_blocks(buf_msg(skb)) : 0;
+   total = accounted;
+
+   while (rem) {
+   if (!skb || skb->len >= mss) {
+   prev = skb;
+   skb = tipc_buf_acquire(mss, GFP_KERNEL);
+   if (unlikely(!skb))
+   return -ENOMEM;
+   skb_orphan(skb);
+   skb_trim(skb, MIN_H_SIZE);
+   hdr = buf_msg(skb);
+   skb_copy_to_linear_data(skb, _hdr, MIN_H_SIZE);
+   msg_set_hdr_sz(hdr, MIN_H_SIZE);
+   msg_set_size(hdr, MIN_H_SIZE);
+   __skb_queue_tail(txq, skb);
+   total += 1;
+   if (prev)
+   msg_set_ack_required(buf_msg(prev), 0);
+   msg_set_ack_required(hdr, 1);
+   }
+   hdr = buf_msg(skb);
+   curr = msg_blocks(hdr);
+   mlen = msg_size(hdr);
+   cpy = min_t(int, rem, mss - mlen);
+   if (cpy != copy_from_iter(skb->data + mlen, cpy, &m->msg_iter))
+   return -EFAULT;
+   msg_set_size(hdr, mlen + cpy);
+   skb_put(skb, cpy);
+   rem -= cpy;
+   total += msg_blocks(hdr) - curr;
+   }
+   return total - accounted;
+}
+
 /* tipc_msg_validate - validate

Re: [tipc-discussion] [net-next] tipc: improve throughput between nodes in netns

2019-10-14 Thread Xue, Ying

Hi Jon,

Please see my comment inline.

At netdev 0x13 in Prague last July there was presented a related proposal 
https://netdevconf.info/0x13/session.html?talk-AF_GRAFT.
I was there, and I cannot say there was any overwhelming approval of this 
proposal, but neither was it rejected out of hand.

[Ying]  The idea of the AF_GRAFT socket is exactly the same as this patch. If it 
can be accepted, it's definitely worth trying to submit this patch 
upstream. But after checking, the weird thing is that AF_GRAFT is not 
supported by the latest kernel, and I can't find any attempt by its author to 
submit the patch upstream.

First, I see TIPC as an IPC, not a network protocol, and anybody using TIPC 
inside a cluster has per definition been authenticated to start a node and 
connect to the cluster. Here, there is no change from current policies.
Once a node has been accepted in a cluster, possibly via encrypted discovery 
messages which have been passing all policies checks, and we are 100% certain 
it is legitimate and located in the same kernel (as we are trying to ensure in 
this patch), I cannot see any reason why we should not be allowed to short-cut 
the stack the way we do. Security checks have already been done.
Are we circumventing any other policies by doing this that must not be done? 

[Ying] If we treat TIPC as an IPC channel, bypassing its lower level interface is 
acceptable. Besides the AF_GRAFT socket, the AF_UNIX socket in fact provides an 
interconnection mechanism between different processes at socket level, and 
there are several options available for configuring policies on a 
socket, such as SO_ATTACH_FILTER, SO_ATTACH_BPF, SO_ATTACH_REUSEPORT_EBPF etc. 
If we bypass the TIPC bearer, the most inconvenient thing is that it's hard for us 
to monitor traffic between netns with tcpdump.  Of course, as Xin mentioned 
previously, we also could not use traditional tools to control/shape TIPC traffic 
across netns. 

Unless you strongly object I would suggest we send this to netdev as an RFC  
and observe the reactions. If David or Eric or any of the other heavyweights say 
flatly no there is nothing we can do. But it might be worth a try.

[Ying] No, I don't strongly object to this proposal. We can try to submit it to 
the net-next mailing list. 

Thanks,
Ying

> -Original Message-
> From: Xue, Ying 
> Sent: 11-Oct-19 07:58
> To: Jon Maloy ; Xin Long 
> Subject: RE: [net-next] tipc: improve throughput between nodes in netns
> 
> Exactly. I agree with Xin. The major purpose of namespace is mainly to provide
> an isolated environment. But as this patch almost completely bypasses security
> check points of networking stack, the traffics between namespaces will be out
> of control. So I don't think this is a good idea.
> 
> Thanks,
> Ying
> 
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Friday, October 11, 2019 2:14 AM
> To: Xin Long
> Cc: Xue, Ying
> Subject: RE: [net-next] tipc: improve throughput between nodes in netns
> 
> Hi Xin,
> I am not surprised by your answer. Apart from concerns about security, this is
> the same objection I have heard from others when presenting this idea, and I
> suspect that this would also be the reaction if we try to deliver this to 
> David.
> If we can achieve anything close to this by adding GSO to the veth interface I
> think that would be a safer approach.
> So, I suggest we put this one to rest for now, and I'll try to go ahead with 
> the
> GSO approach instead.
> 
> Sorry Hoang for making you waste your time.
> 
> BR
> ///jon
> 
> > -Original Message-
> > From: Xin Long 
> > Sent: 10-Oct-19 07:14
> > To: Jon Maloy 
> > Cc: Ying Xue 
> > Subject: Re: [net-next] tipc: improve throughput between nodes in
> > netns
> >
> >
> >
> > - Original Message -
> > > Ying and Xin,
> > > This is the "wormhole" functionality I have been suggesting since a
> > > while back.
> > > Basically, we send messages directly socket to socket between name
> > > spaces on the same host, not only between sockets within the same
> > > name
> > space.
> > > As you will understand this might have a huge positive impact on
> > > performance between e.g., docker containers or containers inside
> > Kubernetes pods.
> > >
> > > Please spend some time reviewing this, as it might be a
> > > controversial feature. It is imperative that we get security right here.
> > >
> > If I understand it right:
> >
> > With this patch, TIPC packets will skip all lower layers protocol
> > stack, like IP (udp media), ether link layer, which means all rules of
> > like tc, ovs, netfilter/br_netfilter will be skipped.
> >
> > I

Re: [tipc-discussion] [PATCH RFC 2/2] tipc: improve message bundling algorithm

2019-10-11 Thread Xue, Ying

I can see this is a good improvement, except that the following switch 
cases on the return values of tipc_msg_try_bundle() are not very friendly to the 
code reader. Although I do understand their real meanings, I have to spend time 
checking the context back and forth. At least we should replace the meaningless 
hard-coded case numbers, or try to change the return values of 
tipc_msg_try_bundle().

+   n = tipc_msg_try_bundle(&l->backlog[imp].target_bskb, skb,
+   mtu - INT_H_SIZE,
+   l->addr);
+   switch (n) {
+   case 0:
+   break;
+   case 1:
+   __skb_queue_tail(backlogq, skb);
l->backlog[imp].len++;
-   l->stats.sent_bundled++;
+   continue;
+   case 2:
l->stats.sent_bundles++;
+   l->stats.sent_bundled++;
+   default:
+   kfree_skb(skb);
+   l->stats.sent_bundled++;
continue;
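
One way the suggestion above could look is to give the return values names. The sketch below is a user-space illustration only: the enum constants, the stats struct and the assumption that the default case corresponds to a single value (3) are all hypothetical, inferred from the switch quoted above rather than taken from the patch.

```c
/* Hypothetical named results for tipc_msg_try_bundle(). */
enum bundle_result {
	BUNDLE_NONE = 0,      /* not bundled; caller queues the skb as usual */
	BUNDLE_CANDIDATE = 1, /* skb kept as the target for a later bundle */
	BUNDLE_NEW = 2,       /* new bundle built from the target plus this skb */
	BUNDLE_APPENDED = 3,  /* skb appended to an already existing bundle */
};

/* Simplified stand-in for the link counters touched by the switch. */
struct bundle_stats {
	unsigned int sent_bundles;
	unsigned int sent_bundled;
	unsigned int backlog_len;
};

/* Same accounting as the numeric switch, but with readable case labels. */
static void account_bundle(struct bundle_stats *st, enum bundle_result r)
{
	switch (r) {
	case BUNDLE_NONE:
		break;
	case BUNDLE_CANDIDATE:
		st->backlog_len++;
		break;
	case BUNDLE_NEW:
		st->sent_bundles++;
		st->sent_bundled += 2; /* the target and the new message */
		break;
	case BUNDLE_APPENDED:
		st->sent_bundled++;
		break;
	}
}
```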






Re: [tipc-discussion] [PATCH RFC 2/2] tipc: fix changeover issues due to large packet

2019-06-18 Thread Xue, Ying

Hi Tuong.

Thank you for your explanation. It makes sense to me. 
Please go ahead. 

Thanks,
Ying

-Original Message-
From: Tuong Lien Tong [mailto:tuong.t.l...@dektech.com.au] 
Sent: Monday, June 17, 2019 4:49 PM
To: Xue, Ying; tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com
Subject: RE: [PATCH RFC 2/2] tipc: fix changeover issues due to large packet

Hi Ying,

Thanks for your comments!
Regarding your last statement: yes, when making the patch I noticed that 
"tipc_msg_build()" and "tipc_msg_fragment()" do a similar task. I also tried to 
think of a way to combine them but didn't, for these reasons:
1- The "core" functions that copy the data are different, since 
"tipc_msg_build()" works with user data in the iov buffers, whereas the 
other works with skb data.
Also, the outputs are different: the first function sets the message type 
in the header, such as "FIRST_FRAGMENT", "FRAGMENT" or "LAST_FRAGMENT", but the 
other cannot because that would overwrite the tunnel message type... so I had to 
use other fields (fragm_no/nof_fragms) to determine this at the receiving 
side...
2- I don't want to touch the old code, which can be risky :(

BR/Tuong

-Original Message-
From: Ying Xue  
Sent: Sunday, June 16, 2019 1:42 PM
To: Tuong Lien ; 
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; ma...@donjonn.com
Subject: Re: [PATCH RFC 2/2] tipc: fix changeover issues due to large packet

> 2) The same scenario above can happen more easily in case the MTU of
> the links is set differently or when changing. In that case, as long as
> a large message in the failure link's transmq queue was built and
> fragmented with its link's MTU > the other link's one, the issue will
> happen (there is no need of a link synching in advance).
> 
> 3) The link synching procedure also faces with the same issue but since
> the link synching is only started upon receipt of a SYNCH_MSG, dropping
> the message will not result in a state deadlock, but it is not expected
> as design.
> 
> The 1) & 3) issues are resolved by the previous commit 81e4dd94b214

This is the same as previous commit. The commit ID might be invalid
after it's merged into upstream.

> ("tipc: optimize link synching mechanism") by generating only a dummy
> SYNCH_MSG (i.e. without data) at the link synching, so the size of a
> FAILOVER_MSG if any then will never exceed the link's MTU.

>  /**
> + * tipc_msg_fragment - build a fragment skb list for TIPC message
> + *
> + * @skb: TIPC message skb
> + * @hdr: internal msg header to be put on the top of the fragments
> + * @pktmax: max size of a fragment incl. the header
> + * @frags: returned fragment skb list
> + *
> + * Returns 0 if the fragmentation is successful, otherwise: -EINVAL
> + * or -ENOMEM
> + */
> +int tipc_msg_fragment(struct sk_buff *skb, const struct tipc_msg *hdr,
> +   int pktmax, struct sk_buff_head *frags)
> +{
> + int pktno, nof_fragms, dsz, dmax, eat;
> + struct tipc_msg *_hdr;
> + struct sk_buff *_skb;
> + u8 *data;
> +
> + /* Non-linear buffer? */
> + if (skb_linearize(skb))
> + return -ENOMEM;
> +
> + data = (u8 *)skb->data;
> + dsz = msg_size(buf_msg(skb));
> + dmax = pktmax - INT_H_SIZE;
> +
> + if (dsz <= dmax || !dmax)
> + return -EINVAL;
> +
> + nof_fragms = dsz / dmax + 1;
> +
> + for (pktno = 1; pktno <= nof_fragms; pktno++) {
> + if (pktno < nof_fragms)
> + eat = dmax;
> + else
> + eat = dsz % dmax;
> +
> + _skb = tipc_buf_acquire(INT_H_SIZE + eat, GFP_ATOMIC);
> + if (!_skb)
> + goto error;
> +
> + skb_orphan(_skb);
> + __skb_queue_tail(frags, _skb);
> +
> + skb_copy_to_linear_data(_skb, hdr, INT_H_SIZE);
> + skb_copy_to_linear_data_offset(_skb, INT_H_SIZE, data, eat);
> + data += eat;
> +
> + _hdr = buf_msg(_skb);
> + msg_set_fragm_no(_hdr, pktno);
> + msg_set_nof_fragms(_hdr, nof_fragms);
> + msg_set_size(_hdr, INT_H_SIZE + eat);
> + }
> + return 0;
> +

In fact we have similar code in tipc_msg_build() where we also fragment
packet if necessary. In order to eliminate redundant code, I suggest we
should extract the common code into a separate function and then
tipc_msg_build() and tipc_msg_fragment() call it.
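
The sizing arithmetic in the quoted loop is a natural candidate for such a shared helper. Below is a user-space sketch of it, with hypothetical function names. The *_quoted helpers mirror the quoted code as written; the *_ceil helpers are an alternative based on ceiling division, shown only for comparison and not taken from the patch.

```c
#include <assert.h>

/* Number of fragments as computed in the quoted loop. */
static int frag_count_quoted(int dsz, int dmax)
{
	assert(dmax > 0 && dsz > dmax);	/* the quoted code returns -EINVAL otherwise */
	return dsz / dmax + 1;
}

/* Payload bytes of fragment 'pktno' (1-based) as in the quoted loop. */
static int frag_payload_quoted(int dsz, int dmax, int pktno)
{
	int nof_fragms = frag_count_quoted(dsz, dmax);

	return pktno < nof_fragms ? dmax : dsz % dmax;
}

/* Alternative (assumption, not from the patch): ceiling division. */
static int frag_count_ceil(int dsz, int dmax)
{
	assert(dmax > 0 && dsz > dmax);
	return (dsz + dmax - 1) / dmax;
}

/* Payload of fragment 'pktno' (1-based) for the ceiling variant. */
static int frag_payload_ceil(int dsz, int dmax, int pktno)
{
	int nof_fragms = frag_count_ceil(dsz, dmax);

	return pktno < nof_fragms ? dmax : dsz - (pktno - 1) * dmax;
}
```

The two variants agree whenever dsz is not a multiple of dmax. When it is, the quoted arithmetic yields one more fragment whose payload is 0 bytes, so it may be worth checking whether that trailing header-only fragment is intended.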




Re: [tipc-discussion] [net-next 1/3] tipc: improve TIPC throughput by Gap ACK blocks

2019-03-26 Thread Xue, Ying

Hi Tuong,

Thank you for your explanation below. It's pretty clear to me.

Also, you can add my Acked-by tag to the series if you want.

Thanks,
Ying

-Original Message-
From: Tuong Lien Tong [mailto:tuong.t.l...@dektech.com.au] 
Sent: Monday, March 25, 2019 3:33 PM
To: Xue, Ying; tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com
Subject: RE: [net-next 1/3] tipc: improve TIPC throughput by Gap ACK blocks

Hi Ying!

Correct, the idea was inspired by SACK in the SCTP protocol, but simplified.
Also, I see your suggestions about the "duplicated packets", which are 
connected to the SCTP "delayed SACK" algorithm (i.e. in the case of no payload 
message loss). In TIPC, as I understand it so far, we already have such a delay 
on acknowledgements through the link's "rcv_unacked" (- Jon may correct me?), 
but we don’t implement a timer for the SACK delay timeout, i.e. the 200ms in 
the SCTP RFC. However, that duplicated TSN concept is based on DATA chunks 
_and_ "with no new DATA chunk(s)", which usually happens in case of SACK loss, 
when the sender has tried to retransmit the same DATA chunks after its RTO 
timer expired. Obviously an immediate SACK is needed in this situation. In 
TIPC, we might not face this situation because we do not have a retransmission 
timer at the sender side; duplicates occur mostly due to overactive NACK 
sending and should be covered by the 2nd patch of the series.
For me, in the case of packet loss, an immediate retransmission is important, 
otherwise performance can suffer. However, because we never know if the packet 
is really lost or just delayed, we have to apply the "1ms restriction" to 
reduce duplicates (- as you can also see in the 2nd patch). Fast retransmission 
was also tried; Jon and I had some discussions before. But the fact is, in 
TIPC, the sender is passive (due to no retransmission timer), and we could be 
in trouble if we tried to wait for the 2nd or 3rd indications. Instead, the 
NACK sending criteria have been changed by the 2nd patch to reduce duplicates 
while trying to keep the performance.
Actually, in SCTP the situation is a bit different, as they work with "chunks" 
and "multi-streaming" rather than individual packets like ours at the link 
layer, and many chunks can optionally be bundled in a single packet because of 
slow-start or Nagle's algorithm.
Anyway, if you have any ideas to improve TIPC performance further, I will try 
them and see what happens.
Thanks a lot!

BR/Tuong 

-Original Message-
From: Ying Xue  
Sent: Friday, March 22, 2019 7:52 PM
To: Tuong Lien ; 
tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; ma...@donjonn.com
Subject: Re: [net-next 1/3] tipc: improve TIPC throughput by Gap ACK blocks

Hi Tuong,

Great job! It's a very nice enhancement, and we should do the
improvement early.

On 3/20/19 11:28 AM, Tuong Lien wrote:
> During unicast link transmission, it's observed very often that because
> of one or a few lost/dis-ordered packets, the sending side will fastly
> reach the send window limit and must wait for the packets to be arrived
> at the receiving side or in the worst case, a retransmission must be
> done first. The sending side cannot release a lot of subsequent packets
> in its transmq even though all of them might have already been received
> by the receiving side.
> That is, one or two packets dis-ordered/lost and dozens of packets have
> to wait, this obviously reduces the overall throughput!
> 
> This commit introduces an algorithm to overcome this by using "Gap ACK
> blocks". Basically, a Gap ACK block will consist of <ack, gap> numbers
> that describe the link deferdq where packets have been received by the
> receiving side but with gaps, for example:
> 
>   link deferdq: [1 2 3 4  10 11  13 14 15   20]
> --> Gap ACK blocks:   <4, 5>,   <11, 1>,  <15, 4>, <20, 0>
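
The example above can be reproduced with a small user-space sketch. It assumes, based only on the numbers in the example, that each block holds the last sequence number of a contiguous run plus the count of missing packets before the next run. The struct and function names are hypothetical, and the actual wire format is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one <ack, gap> pair. */
struct gap_ack {
	uint16_t ack;	/* last seqno of a contiguous run */
	uint16_t gap;	/* packets missing before the next run */
};

/* Build blocks from 'seq', which must be sorted ascending without
 * duplicates. Returns the number of blocks written, at most 'max'. */
static size_t build_gap_acks(const uint16_t *seq, size_t n,
			     struct gap_ack *out, size_t max)
{
	size_t cnt = 0;

	assert(seq && out);
	for (size_t i = 0; i < n; i++) {
		int last = (i + 1 == n);

		/* still inside a contiguous run */
		if (!last && (uint16_t)(seq[i + 1] - seq[i]) == 1)
			continue;
		if (cnt == max)
			break;
		out[cnt].ack = seq[i];
		out[cnt].gap = last ? 0 : (uint16_t)(seq[i + 1] - seq[i] - 1);
		cnt++;
	}
	return cnt;
}
```

Fed the deferdq contents from the example, this yields the four pairs listed there.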

This idea is exactly the same as the SACK of SCTP:
https://tools.ietf.org/html/rfc4960#section-3.3.4

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 3    |Chunk  Flags   |      Chunk Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Cumulative TSN Ack                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Advertised Receiver Window Credit (a_rwnd)           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Number of Gap Ack Blocks = N  |  Number of

Re: [tipc-discussion] Deadlock warning

2018-10-16 Thread Xue, Ying

Hi Jon,

Okay, please let me look into the issue first, and will get back to you.

Regards,
Ying

From: Jon Maloy [mailto:jon.ma...@ericsson.com]
Sent: Wednesday, October 17, 2018 10:37 AM
To: Xue, Ying
Cc: tipc-discussion@lists.sourceforge.net
Subject: Deadlock warning

Hi Ying,
I sometimes get the following deadlock warning. As I understand it, the reason 
is that we are calling rhashtable_walk_enter() in a timer, i.e., in a SW 
interrupt, something that is not permitted. Do you agree with this 
interpretation? Since you have worked more with these hash tables than I have, 
can you see any easy solution to it? I would hate to introduce a work queue to 
solve this...

///jon



[346769.617370] 
[346769.618331] WARNING: inconsistent lock state
[346769.619187] 4.19.0-rc6+ #27 Tainted: GE
[346769.619651] 
[346769.619651] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[346769.619651] swapper/3/0 [HC0[0]:SC1[3]:HE1:SE0] takes:
[346769.619651] 9e3eefe1 (&(>lock)->rlock){+.?.}, at: 
rhashtable_walk_enter+0x36/0xb0
[346769.619651] {SOFTIRQ-ON-W} state was registered at:
[346769.619651]   _raw_spin_lock+0x29/0x60
[346769.619651]   rht_deferred_worker+0x556/0x810
[346769.619651]   process_one_work+0x1f5/0x540
[346769.619651]   worker_thread+0x64/0x3e0
[346769.619651]   kthread+0x112/0x150
[346769.619651]   ret_from_fork+0x3a/0x50
[346769.619651] irq event stamp: 8581694
[346769.619651] hardirqs last  enabled at (8581694): [] 
__local_bh_enable_ip+0x80   /0x100
[346769.619651] hardirqs last disabled at (8581693): [] 
__local_bh_enable_ip+0x47   /0x100
[346769.619651] softirqs last  enabled at (8581678): [] 
irq_enter+0x5e/0x60
[346769.619651] softirqs last disabled at (8581679): [] 
irq_exit+0xbb/0xc0
[346769.619651]
[346769.619651] other info that might help us debug this:
[346769.619651]  Possible unsafe locking scenario:
[346769.619651]
[346769.619651]CPU0
[346769.619651]
[346769.619651]   lock(&(>lock)->rlock);
[346769.619651]   
[346769.619651] lock(&(>lock)->rlock);
[346769.619651]
[346769.619651]  *** DEADLOCK ***
[346769.619651]
[346769.619651] 2 locks held by swapper/3/0:
[346769.619651]  #0: d9e59d74 ((>timer)){+.-.}, at: 
call_timer_fn+0x5/0x280
[346769.619651]  #1: 0bf452d8 (&(>lock)->rlock){+.-.}, at: 
tipc_disc_timeout+0xc8/0x540[tipc]
[346769.619651]
[346769.619651] stack backtrace:
[346769.619651] CPU: 3 PID: 0 Comm: swapper/3 Tainted: GE 
4.19.0-rc6+ #27
[346769.619651] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[346769.619651] Call Trace:
[346769.619651]  
[346769.619651]  dump_stack+0x78/0xb3
[346769.619651]  print_usage_bug+0x1ed/0x20b
[346769.619651]  mark_lock+0x5ca/0x640
[346769.619651]  __lock_acquire+0x41f/0x1b10
[346769.619651]  ? sched_clock_local+0x12/0x80
[346769.619651]  ? lock_acquire+0xb3/0x190
[346769.619651]  lock_acquire+0xb3/0x190
[346769.619651]  ? rhashtable_walk_enter+0x36/0xb0
[346769.651609]  _raw_spin_lock+0x29/0x60
[346769.651609]  ? rhashtable_walk_enter+0x36/0xb0
[346769.651609]  rhashtable_walk_enter+0x36/0xb0
[346769.651609]  tipc_sk_reinit+0xb0/0x410 [tipc]
[346769.651609]  ? mark_held_locks+0x6f/0x90
[346769.651609]  ? __local_bh_enable_ip+0x80/0x100
[346769.651609]  ? debug_show_all_locks+0x170/0x190
[346769.651609]  tipc_net_finalize+0xbf/0x180 [tipc]
[346769.651609]  tipc_disc_timeout+0x509/0x540 [tipc]


Re: [tipc-discussion] [net-next 1/1] tipc: extend link reset criteria for stale packet retransmission

2018-07-06 Thread Xue, Ying

Yeah, you definitely can as long as you want :)


> On Jul 6, 2018, at 8:12 PM, Jon Maloy  wrote:
> 
> Hi,
> Agree with your comment, but that would be some other patch.
> Can I put an "acked-by" on this ?
> 
> ///jon
> 
>> -Original Message-
>> From: Ying Xue 
>> Sent: Friday, 06 July, 2018 07:35
>> To: Jon Maloy ; Jon Maloy 
>> Cc: Mohan Krishna Ghanta Krishnamurthy 
>> ; Tung Quang Nguyen
>> ; Hoang Huu Le ; 
>> Canh Duc Luu ;
>> Gordan Mihaljevic ; 
>> parthasarathy.bhuvara...@gmail.com; tipc-
>> discuss...@lists.sourceforge.net
>> Subject: Re: [net-next 1/1] tipc: extend link reset criteria for stale 
>> packet retransmission
>> 
>>> On 07/06/2018 09:11 AM, Jon Maloy wrote:
>>> Currently a link is declared stale and reset if there has been 100
>>> repeated attempts to retransmit the same packet. However, in certain
>>> infrastrucures we see that packet duplicates and delays may cause
>> 
>> s/infrastrucures/infrastructures
>> 
>>> such retransmit attempts to occur at a high rate, so that the peer
>>> doesn't have a reasonable chance to acknowledge the reception before
>>> the 100-limit is hit. This may take much less than the stipulated link
>>> tolerance time, and despite that probe/probe replies otherwise go
>>> through as normal.
>>> 
>>> We now extend the criteria for link reset to also being time based.
>>> I.e., we don't reset the link until the link tolerance time is passed
>>> AND we have made more than 100 retransmissions attempts.
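
The combined criterion described above can be sketched in user space as follows. This is not the code from the patch: the struct, the helper names, the tick-based clock and the exact threshold comparison are assumptions made for illustration. Only the field names stale_cnt and stale_limit are borrowed from the diff below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced link state for illustration only. */
struct toy_link {
	uint16_t last_retransm;		/* seqno of last retransmitted packet */
	uint16_t stale_cnt;		/* identical retransmit attempts so far */
	unsigned long stale_limit;	/* tick after which a reset is allowed */
	unsigned long tolerance;	/* link tolerance, in ticks */
};

/* Wrap-safe "a is later than b" for an unsigned tick counter. */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Returns true when the link should be reset: the same packet has been
 * retransmitted more than 100 times AND the tolerance time has passed. */
static bool retransmit_should_reset(struct toy_link *l, uint16_t seqno,
				    unsigned long now)
{
	if (l->last_retransm != seqno) {
		/* new packet: restart both the counter and the deadline */
		l->last_retransm = seqno;
		l->stale_cnt = 0;
		l->stale_limit = now + l->tolerance;
		return false;
	}
	if (l->stale_cnt < UINT16_MAX)
		l->stale_cnt++;
	return l->stale_cnt > 100 && tick_after(now, l->stale_limit);
}
```

With only the counter, a burst of 100 duplicates arriving within a few ticks would already force a reset; with the deadline added, the same burst is tolerated until the tolerance time has elapsed.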
>>> 
>> 
>> Good idea.
>> 
>> But I have also observed another phenomenon before. In most cases,
>> many retransmission requests were caused by packets arriving out of
>> order on the receiving side, not because the missing packets were really lost.
>> 
>> For example, we often see this case: just after a node requests its peer
>> to retransmit a series of packets, the requested packets arrive.
>> It means the unnecessary retransmission requests wasted
>> a lot of network bandwidth. For the broadcast link especially, the situation
>> is much worse.
>> 
>> Therefore, we need to figure out a method to postpone retransmission
>> requests a bit.
>> 
>> Of course, I do not object to this patch; on the contrary, it is quite good.
>> 
>>> Signed-off-by: Jon Maloy 
>>> ---
>>> net/tipc/link.c | 43 ---
>>> 1 file changed, 24 insertions(+), 19 deletions(-)
>>> 
>>> diff --git a/net/tipc/link.c b/net/tipc/link.c
>>> index 4798a8b..ea1d9d0 100644
>>> --- a/net/tipc/link.c
>>> +++ b/net/tipc/link.c
>>> @@ -106,7 +106,8 @@ struct tipc_stats {
>>>  * @backlogq: queue for messages waiting to be sent
>>>  * @snt_nxt: next sequence number to use for outbound messages
>>>  * @last_retransmitted: sequence number of most recently retransmitted 
>>> message
>>> - * @stale_count: # of identical retransmit requests made by peer
>>> + * @stale_cnt: counter for number of identical retransmit attempts
>>> + * @stale_limit: time when repeated identical retransmits must force link 
>>> reset
>>>  * @ackers: # of peers that needs to ack each packet before it can be 
>>> released
>>>  * @acked: # last packet acked by a certain peer. Used for broadcast.
>>>  * @rcv_nxt: next sequence number to expect for inbound messages
>>> @@ -162,7 +163,8 @@ struct tipc_link {
>>>u16 snd_nxt;
>>>u16 last_retransm;
>>>u16 window;
>>> -u32 stale_count;
>>> +u16 stale_cnt;
>>> +unsigned long stale_limit;
>>> 
>>>/* Reception */
>>>u16 rcv_nxt;
>>> @@ -856,7 +858,7 @@ void tipc_link_reset(struct tipc_link *l)
>>>l->acked = 0;
>>>l->silent_intv_cnt = 0;
>>>l->rst_cnt = 0;
>>> -l->stale_count = 0;
>>> +l->stale_cnt = 0;
>>>l->bc_peer_is_up = false;
>>>    memset(&l->mon_state, 0, sizeof(l->mon_state));
>>>tipc_link_reset_stats(l);
>>> @@ -993,39 +995,41 @@ static void link_retransmit_failure(struct tipc_link 
>>> *l, struct sk_buff *skb)
>>>msg_seqno(hdr), msg_prevnode(hdr), msg_orignode(hdr));
>>> }
>>> 
>>> -int tipc_link_retrans(struct tipc_link *l, struct tipc_link *nacker,
>>> +/* tipc_link_retrans() - retransmit one or more packets
>>> + * @l: the link to transmit on
>>> + * @r: the receiving link ordering the retransmit. Same as l if unicast
>>> + * @from: retransmit from (inclusive) this sequence number
>>> + * @to: retransmit to (inclusive) this sequence number
>>> + * xmitq: queue for accumulating the retransmitted packets
>>> + */
>>> +int tipc_link_retrans(struct tipc_link *l, struct tipc_link *r,
>>>  u16 from, u16 to, struct sk_buff_head *xmitq)
>>> {
>>>    struct sk_buff *_skb, *skb = skb_peek(&l->transmq);
>>> -struct tipc_msg *hdr;
>>> -u16 ack = l->rcv_nxt - 1;
>>>u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
>>> +u16 ack = l->rcv_nxt - 1;
>>> +struct tipc_msg *hdr;
>>> 
>>>if (!skb)
>>>return 0;
>>> 
>>>/* Detect repeated retransmit failures on same packet */
>>> -if (nacker->last_retransm != buf_seqno(skb)) {
>>> -

Re: [tipc-discussion] [net 0/5] solve two deadlock issues

2017-02-22 Thread Xue, Ying

Hi Jon,

I understand your concern.

I have checked the possibility of merging patches #1, #4 and #5 into one. 
However, just merging the three patches is insufficient; at least #2 seems 
necessary too, otherwise another deadlock still exists due to the two premature 
'return's in tipc_subscrp_report_overlap(). Even if we merged them into one, it 
would defeat my initial purpose of dividing the series into such small patches. 
Although each patch makes a small change, it is often related to a policy 
adjustment of locking or holding a refcount. Moreover, as our locking policy 
associated with the topserver becomes complex, I want to use the comments in 
each patch header to record what policy has been adjusted. In the future, that 
information can guide us in checking whether our changes comply with the 
adjusted policy or not. 

In fact, the changes contained in the series are not big. But if we merged them 
into one, all the useful messages would be lost forever.

Additionally, the "net-next" tree has reached 4.10-rc8, and the "net" tree is 
at 4.10-rc7 now. I saw today that one developer submitted a patch to net-next 
and David accepted it. So if John's testing proves the series is okay tomorrow, 
I can probably send the series to net-next tomorrow. Even in the worst case, we 
cannot submit the series until net-next is open again. But I have checked, and 
nobody will maintain 4.10 as a stable version. So even if there is a long time 
gap, it should not cause a serious issue.

Regards,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Tuesday, February 21, 2017 7:12 PM
To: Xue, Ying; Parthasarathy Bhuvaragan; thompa@gmail.com
Cc: tipc-discussion@lists.sourceforge.net
Subject: RE: [net 0/5] solve two deadlock issues

Hi Ying,
These are good design changes, that definitely should go in asap. However, I 
feel deeply uncomfortable with such a big change going into 'net', especially 
since our previous, exceptionally large, contribution now has turned out to 
have problems. I wonder if we could not get away with something simpler for 
'net'.

Looking closer at your series, it seems to me that only patches ## 1, 4, and 
the lock removal part of #5 are really needed to solve the problem we have at 
hand now. Why not merge those into one patch and deliver this to 'net', while 
reference count redesign parts can go into net-next ?

Regards
///jon


> -Original Message-
> From: Ying Xue [mailto:ying@windriver.com]
> Sent: Monday, February 20, 2017 06:39 AM
> To: Jon Maloy <jon.ma...@ericsson.com>; Parthasarathy Bhuvaragan 
> <parthasarathy.bhuvara...@ericsson.com>; thompa@gmail.com
> Cc: tipc-discussion@lists.sourceforge.net
> Subject: [net 0/5] solve two deadlock issues
> 
> Commit d094c4d5f5 ("tipc: add subscription refcount to avoid invalid
> delete") accidentally introduced the following deadlock scenarios:
> 
>CPU1: CPU2:
> -- 
> tipc_nametbl_publish
> spin_lock_bh(>nametbl_lock)
> tipc_nametbl_insert_publ
> tipc_nameseq_insert_publ
> tipc_subscrp_report_overlap
> tipc_subscrp_get
> tipc_subscrp_send_event
>tipc_close_conn
>tipc_subscrb_release_cb
>tipc_subscrb_delete
>tipc_subscrp_put
> tipc_subscrp_put
> tipc_subscrp_kref_release
> tipc_nametbl_unsubscribe
> spin_lock_bh(>nametbl_lock)
> <>
> 
>CPU1:  CPU2:
> -- 
> tipc_nametbl_stop
> spin_lock_bh(>nametbl_lock)
> tipc_purge_publications
> tipc_nameseq_remove_publ
> tipc_subscrp_report_overlap
> tipc_subscrp_get
> tipc_subscrp_send_event
>tipc_close_conn
>tipc_subscrb_release_cb
>tipc_subscrb_delete
>tipc_subscrp_put
> tipc_subscrp_put
> tipc_subscrp_kref_release
> tipc_nametbl_unsubscribe
> spin_lock_bh(>nametbl_lock)
> <>
> 
> The root cause of two deadlocks is that we have to hold nametbl lock 
> when subscription is freed in tipc_subscrp_kref_release(). In order to 
> eliminate the need of taking nametbl lock in 
> tipc_subscrp_kref_release(), the functions protected by nametbl lock 
> in tipc_subscrp_kref_release() are moved to other places step by step in the 
> series.
> 
> Ying Xue (5):
>   tipc: advance the time of deleting subscription from
> subscriber->subscrp_list
>   tipc: adjust the policy of holding subscription kref
>   tipc: adjust policy that sub->timer holds subscription kref
>   tipc: advance the time of calling tipc_nametbl_unsubscribe
>   tipc

Re: [tipc-discussion] [tipc:bugs] #122 TIPC link down

2017-02-19 Thread Xue, Ying

It seems CPU0 was blocked forever after sending an NMI to all other CPUs 
through IPI. As a result, the rcu stall happened. But as the IPI delivered is 
an NMI, the other CPUs should unconditionally respond to it even if their local 
interrupts were disabled, which means that CPU0 should have received IPI 
responses from all other CPUs. As a consequence, CPU0 should not be blocked in 
theory. But what we observe contradicts this analysis, and I cannot understand 
why the other CPUs did not respond to the NMI for CPU0.

From: Erik Hugne [mailto:ehu...@users.sf.net]
Sent: Sunday, February 19, 2017 7:19 PM
To: [tipc:bugs]
Subject: [tipc:bugs] #122 TIPC link down


As Ying already said on the mailing list, the root cause of the lost link must 
be the rcu stall on cpu0
It seems to be in an idle state when the IPI is sent that detects the stall, 
which is weird..
You are running a paravirtualized setup, have you observed this more than once? 
is it always the same backtrace? Is the microcode loaded for the cpu?




Re: [tipc-discussion] [tipc:bugs] #122 TIPC link down

2017-02-17 Thread Xue, Ying

Regarding the rcu stall backtrace, I don't think the deadlock was caused by the 
TIPC module. Instead, I think the link between the active machine and the 
standby machine was lost because the standby machine was dead due to the rcu 
stall. Therefore, the key is to find the root cause of the rcu stall.

However, the following rcu stall backtrace doesn't provide very meaningful 
information for us, so I don't understand why the rcu stall occurred.

Regards,
Ying

From: Sumit Gemini [mailto:sumitgem...@users.sf.net]
Sent: Friday, February 17, 2017 4:51 PM
To: Ticket 122
Subject: [tipc:bugs] #122 TIPC link down



[bugs:#122] TIPC link down

Status: open
Group:
Labels: tipc rcu_bh_state
Created: Fri Feb 17, 2017 08:50 AM UTC by Sumit Gemini
Last Updated: Fri Feb 17, 2017 08:50 AM UTC
Owner: Erik Hugne

Hi All,

I have an HA pair, and I observed that the TIPC link lost event was not 
received by the standby machine. I got this problem

on ACTIVE machine :

Jan 6 16:45:00 ffm-sbc-2b kernel: [3341017.308014] TIPC: Resetting link 
<1.1.2:bond0-1.1.1:bond0>, peer not responding
Jan 6 16:45:00 ffm-sbc-2b kernel: [3341017.308021] TIPC: Lost link 
<1.1.2:bond0-1.1.1:bond0> on network plane A
Jan 6 16:45:00 ffm-sbc-2b kernel: [3341017.308026] TIPC: Lost contact with 
<1.1.1>
Jan 6 16:45:01 ffm-sbc-2b osaffmd[4898]: NO Node Down event for node id 2010f:
Jan 6 16:45:01 ffm-sbc-2b osaffmd[4898]: NO Done Locking applications on node 
id:2010f ret val:0
Jan 6 16:45:01 ffm-sbc-2b osafclmd[4963]: NO Node 131343 went down. Not sending 
track callback for agents on that node
Jan 6 16:45:01 ffm-sbc-2b osafclmd: Last message 'NO Node 131343 went ' 
repeated 5 times, suppressed by syslog-ng on ffm-sbc-2b.mydomain.com
Jan 6 16:45:01 ffm-sbc-2b osaffmd[4898]: NO Current role: ACTIVE
Jan 6 16:45:01 ffm-sbc-2b osaffmd[4898]: Rebooting OpenSAF NodeId = 131343 EE 
Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131599, 
SupervisionTime = 60
Jan 6 16:45:01 ffm-sbc-2b osafamfd[4986]: NO Node 'SC-1' left the cluster
Jan 6 16:45:01 ffm-sbc-2b osafimmd[4910]: WA IMMD lost contact with peer IMMD 
(NCSMDS_RED_DOWN)
Jan 6 16:45:01 ffm-sbc-2b osafimmnd[4922]: NO Global discard node received for 
nodeId:2010f pid:5047
Jan 6 16:45:01 ffm-sbc-2b osafimmnd[4922]: NO Implementer disconnected 121 <0, 
2010f(down)> (MsgQueueService131343)
Jan 6 16:45:01 ffm-sbc-2b osafimmnd[4922]: NO Implementer disconnected 120 <0, 
2010f(down)> (@safAmfService2010f)
Jan 6 16:45:01 ffm-sbc-2b osafimmnd[4922]: NO Implementer connected: 122 
(MsgQueueService131343) <592, 2020f>
Jan 6 16:45:01 ffm-sbc-2b osafimmnd[4922]: NO Implementer locally disconnected. 
Marking it as doomed 122 <592, 2020f> (MsgQueueService131343)
Jan 6 16:45:01 ffm-sbc-2b osafimmnd[4922]: NO Implementer disconnected 122 
<592, 2020f> (MsgQueueService131343)
Jan 6 16:45:01 ffm-sbc-2b opensaf_reboot: No lock is in progress going to 
process further...

On standby machine :

I observed an rcu_bh_state stall and a kernel stack dump when the TIPC link 
loss occurred on the ACTIVE machine, and after 6 sec we got the link lost 
message on the standby machine.

Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.520060] INFO: rcu_bh_state detected 
stall on CPU 0 (t=0 jiffies)
Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] sending NMI to all CPUs:
Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] NMI backtrace for cpu 0
Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] CPU 0
Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] Modules linked in: 
nf_conntrack_netlink af_packet xt_sharedlimit xt_hashlimit ip_set_hash_ipport 
ip_set_hash_ipportip xt_NOTRACK ip_set_bitmap_port xt_sctp nf_conntrack_ipv6 
nf_defrag_ipv6 xt_CT arpt_mangle ip_set_hash_ipnet xt_NFLOG nfnetlink_log 
ipt_ULOG xt_limit xt_hashcounter ip_set_hash_ipip xt_set ip_set_hash_ip deflate 
zlib_deflate ctr twofish_x86_64 twofish_common camellia serpent blowfish cast5 
des_generic cbc xcbc rmd160 sha512_generic sha256_generic sha1_generic md5 
crypto_null af_key iptable_mangle ip_set nfnetlink arptable_filter arp_tables 
iptable_raw iptable_nat tipc xt_tcpudp xt_state xt_pkttype bonding binfmt_misc 
iptable_filter ip6table_filter ip6_tables nf_nat_ftp nf_nat nf_conntrack_ftp 
nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables x_tables mperf edd ipmi_devintf 
ipmi_si ipmi_msghandler nf_conntrack_proto_sctp nf_conntrack sctp 8021q garp 
stp llc gb_sys usb_storage ioatdma ixgbe uas sg igb iTCO_wdt wmi i2c_i801 
pcspkr mdio iTCO_vendor_support button container dca ipv6 autofs4 usbhid megasr(P) 
ehci_hcd usbcore processor thermal_sys [last unloaded: ipt_PORTMAP]
Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042]
Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] Pid: 0, comm: swapper 
Tainted: P 3.1.10-gb17-default #1 Intel Corporation S2600CO/S2600CO
Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] RIP: 
0010:[] [] native_read_tsc+0x2/0xf
Jan 6 16:45:06 ffm-sbc-2a kernel:

Re: [tipc-discussion] Nametable soft lockup

2017-02-14 Thread Xue, Ying

Hi John,

Thanks for the report and good analysis!

You are absolutely right, and I will figure out how to fix the potential 
deadlock issue. The second issue you pointed out below also really exists. Good 
catch!

Regards,
Ying

-Original Message-
From: John Thompson [mailto:thompa@gmail.com] 
Sent: Thursday, February 09, 2017 6:37 AM
To: tipc-discussion@lists.sourceforge.net
Subject: [tipc-discussion] Nametable soft lockup

Hi,

I have been using the patches Partha had provided for the nametable soft 
lockup, and that I had tested.  This was seen when testing on a SMP system.

Unfortunately I have come across another nametable soft lockup:

<0>NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [AIS listener:1591]
<6>Modules linked in: tipc jitterentropy_rng echainiv drbg platform_driver(O) ipifwd(PO)
<6>CPU: 0 PID: 1591 Comm: AIS listener Tainted: P   O
<6>task: ae393600 ti: ae286000 task.ti: ae286000
<6>NIP: 806952bc LR: c160bfe0 CTR: 80695280
<6>REGS: ae287b40 TRAP: 0901   Tainted: P   O
<6>MSR: 00029002   CR: 48002484  XER:  <6>
<6>GPR00: c160a64c ae287bf0 ae393600 a20f18ac   ae064fbc 0030
<6>GPR08: 01001006 0001 0001 0006 80695280
<6>NIP [806952bc] _raw_spin_lock_bh+0x3c/0x70
<6>LR [c160bfe0] tipc_nametbl_unsubscribe+0x50/0x120 [tipc]
<6>Call Trace:
<6>[ae287c10] [c160a64c] tipc_named_reinit+0x33c/0x8a0 [tipc]
<6>[ae287c30] [c160ad44] tipc_subscrp_report_overlap+0xc4/0xe0 [tipc]
<6>[ae287c70] [c160b30c] tipc_topsrv_stop+0x45c/0x4f0 [tipc]
<6>[ae287ca0] [c160b838] tipc_nametbl_remove_publ+0x58/0x110 [tipc]
<6>[ae287cd0] [c160bcf8] tipc_nametbl_withdraw+0x68/0x140 [tipc]
<6>[ae287d00] [c1613cd4] tipc_nl_node_dump_link+0x1904/0x45d0 [tipc]
<6>[ae287d30] [c16148e8] tipc_nl_node_dump_link+0x2518/0x45d0 [tipc]
<6>[ae287d70] [804f5a40] sock_release+0x30/0xf0
<6>[ae287d80] [804f5b14] sock_close+0x14/0x30
<6>[ae287d90] [80105844] __fput+0x94/0x200
<6>[ae287db0] [8003dca4] task_work_run+0xd4/0x100
<6>[ae287dd0] [80023620] do_exit+0x280/0x980
<6>[ae287e10] [80024c48] do_group_exit+0x48/0xb0
<6>[ae287e30] [80030344] get_signal+0x244/0x4f0
<6>[ae287e80] [80007734] do_signal+0x34/0x1c0
<6>[ae287f30] [800079a8] do_notify_resume+0x68/0x80
<6>[ae287f40] [8000fa1c] do_user_signal+0x74/0xc4


I have gone through the code and I think I have found a place where there is a 
potential soft lockup.
The call chain is:
tipc_nametbl_stop()                        Grabs nametbl_lock
  tipc_purge_publications()
    tipc_nameseq_remove_publ()
      tipc_subscrp_report_overlap()
        tipc_subscrp_put()                 Calls kref_put when kref == 0 -- could
                                           have been put by a different CPU
          tipc_subscrp_kref_release()
            tipc_nametbl_unsubscribe()     << lockup occurs as it grabs the
                                           nametbl_lock again >>


Another possible issue is in tipc_subscrp_report_overlap(), there are 2 early 
returns after a tipc_subscrp_get() before the tipc_subscrp_put().
Could this end up with an incorrect kref?
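
A minimal user-space sketch of that second concern, assuming a plain integer counter in place of a kref (the toy_* and report_* names are hypothetical, not the TIPC functions): an early return between the get and the put leaves the count one too high, while a single exit path keeps it balanced.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_ref {
	int count;
};

void toy_get(struct toy_ref *r) { r->count++; }
void toy_put(struct toy_ref *r) { r->count--; }

/* Leaky shape: an early return skips the matching put. */
int report_leaky(struct toy_ref *r, bool match, bool wanted)
{
	toy_get(r);
	if (!match)
		return 0;	/* reference taken above is never dropped */
	if (!wanted)
		return 0;	/* same problem on this path */
	toy_put(r);
	return 1;
}

/* Balanced shape: every path leaves through the same put. */
int report_balanced(struct toy_ref *r, bool match, bool wanted)
{
	int sent = 0;

	toy_get(r);
	if (!match)
		goto out;
	if (!wanted)
		goto out;
	sent = 1;		/* stand-in for sending the event */
out:
	toy_put(r);
	return sent;
}
```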

JT
--
Check out the vibrant tech community on one of the world's most engaging tech 
sites, SlashDot.org! http://sdm.link/slashdot 
___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion


Re: [tipc-discussion] [PATCH net-next v7 0/6] topology server fixes for nametable soft lockup

2017-01-23 Thread Xue, Ying

Hi Partha,

As long as you can confirm that the series applies to the latest "net" tree, 
it's better to deliver it to "net".

By the way, if John's testing passes, I suggest adding John's "Tested-by" tag 
to the whole series.

Regards,
Ying

-Original Message-
From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] 
Sent: Friday, January 20, 2017 9:06 PM
To: Xue, Ying; tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
thompa@gmail.com
Subject: Re: [PATCH net-next v7 0/6] topology server fixes for nametable soft 
lockup

Hi Ying,

Sure, I will wait for John, as he seems to be able to trigger the race 
conditions I couldn't.

Moreover, I was thinking of posting this series to "net" instead of "net-next", 
as these are primarily bug fixes.
/Partha

On 01/20/2017 01:12 PM, Xue, Ying wrote:
> Thanks, it's very good now. You can add my Acked-by tag to the whole series.
>
> But if possible, please let John help us verify again.
>
> Regards,
> Ying
>
> -Original Message-
> From: Parthasarathy Bhuvaragan 
> [mailto:parthasarathy.bhuvara...@ericsson.com]
> Sent: Friday, January 20, 2017 3:52 AM
> To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
> Xue, Ying; thompa@gmail.com
> Subject: [PATCH net-next v7 0/6] topology server fixes for nametable 
> soft lockup
>
> In this series, we revert the commit 333f796235a527 ("tipc: fix a race 
> condition leading to subscriber refcnt bug") and provide an alternate 
> solution to fix the race conditions in commits 2-4.
>
> We have to do this as the above commit introduced a nametbl soft lockup at 
> module exit as described by patch#4.
>
> ---
> v7: Following updates in Patch #2:
>     - Fix incorrect deletion of all prior subscriptions until the specified
>       subscription.
>     - Protect exported tipc_subscrp_report_overlap() with subscription
>       refcount.
>     - Ensure that subscription can be freed correctly at subscription timer
>       expiry. The earlier patch#2 in v5/v6, had refcount bug which prevents
>       the above. This was introduced when we skipped get/put refcount in
>       tipc_subscrb_subscrp_delete(), but instead do get in
>       tipc_subscrp_subscribe() before starting the timer. Thus the
>       subscription_create() initialized the refcount and
>       tipc_subscrp_subscribe steps it to 2. At subscription timeout, we
>       perform put only once and we cannot compensate for this additional
>       refcount safely.
> v6: Address krefcount warning for John Thompson in Patch#3
> v5: Address Ying's comment in Patch #2 to remove del_timer_sync().
> v4: Address Ying's comment by introducing subscription refcount.
>
> Parthasarathy Bhuvaragan (6):
>   tipc: fix nametbl_lock soft lockup at node/link events
>   tipc: add subscription refcount to avoid invalid delete
>   tipc: fix connection refcount error
>   tipc: fix nametbl_lock soft lockup at module exit
>   tipc: ignore requests when the connection state is not CONNECTED
>   tipc: fix cleanup at module unload
>
>  net/tipc/node.c   |   9 +++-
>  net/tipc/server.c |  48 +
>  net/tipc/subscr.c | 124 ++
>  net/tipc/subscr.h |   1 +
>  4 files changed, 99 insertions(+), 83 deletions(-)
>
> --
> 2.1.4
>


Re: [tipc-discussion] [PATCH net-next v7 0/6] topology server fixes for nametable soft lockup

2017-01-20 Thread Xue, Ying

Thanks, it's very good now. You can add my Acked-by tag to the whole series.

But if possible, please let John help us verify again.

Regards,
Ying

-Original Message-
From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] 
Sent: Friday, January 20, 2017 3:52 AM
To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; Xue, Ying; 
thompa@gmail.com
Subject: [PATCH net-next v7 0/6] topology server fixes for nametable soft lockup

In this series, we revert the commit 333f796235a527 ("tipc: fix a race 
condition leading to subscriber refcnt bug") and provide an alternate solution 
to fix the race conditions in commits 2-4.

We have to do this as the above commit introduced a nametbl soft lockup at 
module exit as described by patch#4.

---
v7: Following updates in Patch #2:
    - Fix incorrect deletion of all prior subscriptions until the specified
      subscription.
    - Protect exported tipc_subscrp_report_overlap() with subscription
      refcount.
    - Ensure that subscription can be freed correctly at subscription timer
      expiry. The earlier patch#2 in v5/v6, had refcount bug which prevents
      the above. This was introduced when we skipped get/put refcount in
      tipc_subscrb_subscrp_delete(), but instead do get in
      tipc_subscrp_subscribe() before starting the timer. Thus the
      subscription_create() initialized the refcount and
      tipc_subscrp_subscribe steps it to 2. At subscription timeout, we
      perform put only once and we cannot compensate for this additional
      refcount safely.
v6: Address krefcount warning for John Thompson in Patch#3
v5: Address Ying's comment in Patch #2 to remove del_timer_sync().
v4: Address Ying's comment by introducing subscription refcount.

Parthasarathy Bhuvaragan (6):
  tipc: fix nametbl_lock soft lockup at node/link events
  tipc: add subscription refcount to avoid invalid delete
  tipc: fix connection refcount error
  tipc: fix nametbl_lock soft lockup at module exit
  tipc: ignore requests when the connection state is not CONNECTED
  tipc: fix cleanup at module unload

 net/tipc/node.c   |   9 +++-
 net/tipc/server.c |  48 +
 net/tipc/subscr.c | 124 ++
 net/tipc/subscr.h |   1 +
 4 files changed, 99 insertions(+), 83 deletions(-)

--
2.1.4



Re: [tipc-discussion] [PATCH net-next v4 0/6] topology server fixes for nametable soft lockup

2017-01-18 Thread Xue, Ying

Hi John,

Thank you for the testing.

I think your suggestion is reasonable, but we need to find out the exact 
scenario first. According to the warning below, after one thread has decreased 
an object's refcnt to zero, another thread tries to increment it, which means 
that we have a race condition. So we must solve the race condition first 
before we adopt your advice.

Regards,
Ying
From: John Thompson [mailto:thompa@gmail.com]
Sent: Wednesday, January 18, 2017 12:23 PM
To: Parthasarathy Bhuvaragan
Cc: tipc-discussion@lists.sourceforge.net; Jon Maloy; Xue, Ying
Subject: Re: [PATCH net-next v4 0/6] topology server fixes for nametable soft 
lockup

Hi Partha,

Thanks for the new patches.  I have tested them and had a kernel warning as per 
below.  This is running on a SMP system with 4 cores.
The warning reads as though one thread has gone to free the item while another 
thread has gotten a reference to it.
The suggestion is to use kref_get_unless_zero() instead of kref_get().
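
For readers unfamiliar with the distinction, here is a user-space sketch of the two semantics using C11 atomics. kref_get() and kref_get_unless_zero() are existing kernel helpers, but the toy_kref_* functions below are hypothetical stand-ins and not the kernel implementation. The conditional get only refuses to revive an object whose count has already reached zero; by itself it does not close the window between looking the object up and taking the reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_kref {
	atomic_int refcount;
};

/* Unconditional get: would "revive" an object whose count already hit 0. */
void toy_kref_get(struct toy_kref *k)
{
	atomic_fetch_add(&k->refcount, 1);
}

/* Conditional get: succeeds only while the count is still non-zero. */
bool toy_kref_get_unless_zero(struct toy_kref *k)
{
	int old = atomic_load(&k->refcount);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&k->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Returns true when the caller dropped the last reference. */
bool toy_kref_put(struct toy_kref *k)
{
	return atomic_fetch_sub(&k->refcount, 1) == 1;
}
```

A caller that gets false back has to treat the object as already gone and must not touch it.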

[ cut here ]
WARNING: at /home/johnt/views/main/linux/include/linux/kref.h:46
Modules linked in: tipc jitterentropy_rng echainiv drbg platform_driver(O)
CPU: 1 PID: 1527 Comm: intrTokenReset Tainted: P   O
task: a52c8600 ti: a4c98000 task.ti: a4c98000
NIP: c161809c LR: c1618120 CTR: 80695270
REGS: a4c99b30 TRAP: 0700   Tainted: P   O
MSR: 00029002 <CE,EE,ME>  CR: 28002482  XER: 

GPR00: c1618694 a4c99be0 a52c8600 a4c4e840 a2763aa0  a4ab623c 0030
GPR08: 01001005 0001 c162 0005 80695270 107d12f0 6c23e000 0009
GPR16:  808e 808e 00040100 00040006 807c9428 808f a2e39ac0
GPR24: 808dba00 01001005 a4ab623c a50fb300 0030 a50fb31c a50fb300 a4c4e840
NIP [c161809c] tipc_nl_publ_dump+0x93c/0xf10 [tipc]
LR [c1618120] tipc_nl_publ_dump+0x9c0/0xf10 [tipc]
Call Trace:
[a4c99be0] [800b9e2c] free_pages_prepare+0x18c/0x2a0 (unreliable)
[a4c99c00] [c1618694] tipc_conn_sendmsg+0x24/0x150 [tipc]
[a4c99c30] [c160ad5c] tipc_subscrp_report_overlap+0xbc/0xd0 [tipc]
[a4c99c70] [c160b31c] tipc_topsrv_stop+0x45c/0x4f0 [tipc]
[a4c99ca0] [c160b848] tipc_nametbl_remove_publ+0x58/0x110 [tipc]
[a4c99cd0] [c160bd08] tipc_nametbl_withdraw+0x68/0x140 [tipc]
[a4c99d00] [c1613ce4] tipc_nl_node_dump_link+0x1904/0x45d0 [tipc]
[a4c99d30] [c16148f8] tipc_nl_node_dump_link+0x2518/0x45d0 [tipc]
[a4c99d70] [804f5a40] sock_release+0x30/0xf0
[a4c99d80] [804f5b14] sock_close+0x14/0x30
[a4c99d90] [80105844] __fput+0x94/0x200
[a4c99db0] [8003dca4] task_work_run+0xd4/0x100
[a4c99dd0] [80023620] do_exit+0x280/0x980
[a4c99e10] [80024c48] do_group_exit+0x48/0xb0
[a4c99e30] [80030344] get_signal+0x244/0x4f0
[a4c99e80] [80007734] do_signal+0x34/0x1c0
[a4c99f30] [800079a8] do_notify_resume+0x68/0x80
[a4c99f40] [8000fa1c] do_user_signal+0x74/0xc4
--- interrupt: c00 at 0xf5b0cfc
LR = 0xf5b0ce8
Instruction dump:
4ba8 7c0004ac 7d201828 31290001 7d20192d 40a2fff4 7c0004ac 2f890001
4dbd0020 3d40c162 892ac11e 69290001 <0f09> 2f89 4dbe0020 3921
---[ end trace 544bc785f9258108 ]---

JT


On Thu, Jan 12, 2017 at 1:19 AM, Parthasarathy Bhuvaragan 
<parthasarathy.bhuvara...@ericsson.com> wrote:
In this series, we revert the commit 333f796235a527 ("tipc: fix a
race condition leading to subscriber refcnt bug") and provide an
alternate solution to fix the race conditions in commits 2-4.

We have to do this as the above commit introduced a nametbl soft
lockup at module exit as described by patch#4.

---
v3: introduce cleanup workqueue to fix nametbl soft lockup.

Parthasarathy Bhuvaragan (6):
  tipc: fix nametbl_lock soft lockup at node/link events
  tipc: add subscription refcount
  tipc: fix connection refcount error
  tipc: fix nametbl_lock soft lockup at module exit
  tipc: ignore requests when the connection state is not CONNECTED
  tipc: fix cleanup at module unload

 net/tipc/node.c   |   9 -
 net/tipc/server.c |  44 +
 net/tipc/subscr.c | 116 --
 net/tipc/subscr.h |   1 +
 4 files changed, 94 insertions(+), 76 deletions(-)

--
2.1.4


Re: [tipc-discussion] [net-next v3 3/4] tipc: introduce replicast as transport option for multicast

2017-01-17 Thread Xue, Ying

Jon, I thought I had already given an ack for the series, but I probably forgot 
to send it. Anyway, I have spent time looking at all the patches and found no 
issues during the review. This is very nice indeed. Thanks!

Sent from my iPhone

> On 17 Jan 2017, at 11:06 PM, Jon Maloy <jon.ma...@ericsson.com> wrote:
> 
> Thanks Partha. Any viewpoints from Ying? Otherwise I'll send it in tomorrow.
> 
> ///jon
> 
> 
>> -Original Message-
>> From: Parthasarathy Bhuvaragan
>> Sent: Tuesday, 17 January, 2017 09:35
>> To: Jon Maloy <jon.ma...@ericsson.com>; 
>> tipc-discussion@lists.sourceforge.net;
>> Ying Xue <ying@windriver.com>
>> Subject: Re: [tipc-discussion] [net-next v3 3/4] tipc: introduce replicast as
>> transport option for multicast
>> 
>>> On 01/16/2017 03:13 PM, Jon Maloy wrote:
>>> 
>>> 
>>>> -Original Message-
>>>> From: Parthasarathy Bhuvaragan
>>>> Sent: Monday, 16 January, 2017 05:20
>>>> To: Jon Maloy <jon.ma...@ericsson.com>;
>>>> tipc-discuss...@lists.sourceforge.net; Ying Xue <ying@windriver.com>
>>>> Subject: Re: [tipc-discussion] [net-next v3 3/4] tipc: introduce replicast
>>>> as transport option for multicast
>>>> 
>>>>> On 01/13/2017 04:18 PM, Jon Maloy wrote:
>>>>> 
>>>>> 
>>>>>> -Original Message-
>>>>>> From: Parthasarathy Bhuvaragan
>>>>>> Sent: Friday, 13 January, 2017 04:24
>>>>>> To: Jon Maloy <jon.ma...@ericsson.com>;
>>>>>> tipc-discuss...@lists.sourceforge.net; Ying Xue <ying@windriver.com>
>>>>>> Subject: Re: [tipc-discussion] [net-next v3 3/4] tipc: introduce
>>>>>> replicast as transport option for multicast
>>>>>> 
>>>>>>> On 01/04/2017 06:05 PM, Parthasarathy Bhuvaragan wrote:
>>>>>>> Hi Jon,
>>>>>>> 
>>>>>>> Added some minor comments inline in this patch, apart from that the
>>>>>>> major concern is the following:
>>>>>>> 
>>>>>>> All my tests which passed before this patch fail while sending
>>>>>>> multicast to a receiver on the own node.
>>>>>>> 
>>>>>>> With this patch, we increase the likelihood of receive buffer overflow
>>>>>>> if the sender & receivers are running on the same host, as we bypass the
>>>>>>> link layer completely. I confirmed this with some traces in
>>>>>>> filter_rcv().
>>>>>>> 
>>>>>>> If I add another multicast listener running on another node, this
>>>>>>> pacifies the sender (puts the sender to sleep at link congestion) and
>>>>>>> the relatively slow link layer reduces the buffer overflow.
>>>>>>> 
>>>>>>> We need to find a way to reduce the aggressiveness of the sender.
>>>>>>> We want the location of the services to be transparent to users, so
>>>>>>> we should provide similar characteristics regardless of the service
>>>>>>> location.
>>>>>>> 
>>>>>> Jon, running the ptts server and client on a standalone node without your
>>>>>> updates failed. So in that respect, I am OK with this patch.
>>>>>> 
>>>>>> If the ethernet bearer lacks broadcast ability, then neighbor discovery
>>>>>> will not work. So do we intend to introduce support for adding ethernet
>>>>>> peers manually as we do for udp bearers? Otherwise we can never use
>>>>>> replicast for non-udp bearers.
>>>>> 
>>>>> I believe all Ethernet implementations, even overlay networks, provide
>>>>> some form of broadcast, or in lack thereof, an emulated broadcast.
>>>>> So, discovery should work, but it will be very inefficient when we do link
>>>>> broadcast, because tipc will think that genuine Ethernet broadcast is
>>>>> supported.
>>>>> We actually need some way to find out what kind of "Ethernet" we are
>>>>> attached to, e.g. VXLAN, so that the "bcast supported" flag can be set
>>>>> correctly.
>>>>> I wonder if that is possible, or if it has to be configured.
>>>>> 
>>>> I as

Re: [tipc-discussion] [net-next v4 0/3] tipc: improve interaction socket-link

2017-01-02 Thread Xue, Ying

Very good job, thanks Jon!

Feel free to add the "Acked-by: Ying Xue <ying@windriver.com>" tag to 
the series.

Regards,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Thursday, December 22, 2016 10:51 PM
To: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; jon.ma...@ericsson.com
Cc: ma...@donjonn.com; thompa@gmail.com
Subject: [net-next v4 0/3] tipc: improve interaction socket-link

We fix a very real starvation problem that may occur when a link encounters 
send buffer congestion. At the same time we make the interaction between the 
socket and link layer simpler and more consistent.

v2: - Simplified link congestion check to only check against own
  importance limit. This reduces the risk of higher levels
  starving out lower levels.
v3: - Adding one sent message to the link backlog queue even if there is
  congestion, as suggested by Partha.
- Allowing link_wakeup() loop to continue adding messages to the
  backlog queue even if one or more levels are congested. This
  seems to have a positive effect on performance.
v4: - Added Partha's fixes, except for #4. I think having a multicast
  being blocked after unicast link congestion is an acceptable
  behavior when weighed against the risks of just purging the
  congestion list.
  
Jon Maloy (3):
  tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg()
functions
  tipc: modify struct tipc_plist to be more versatile
  tipc: reduce risk of user starvation during link congestion

 net/tipc/bcast.c  |   6 +-
 net/tipc/link.c   |  75 -
 net/tipc/msg.h|   2 -
 net/tipc/name_table.c | 100 +++
 net/tipc/name_table.h |  21 +--
 net/tipc/node.c   |  15 +-
 net/tipc/socket.c | 449 ++
 7 files changed, 319 insertions(+), 349 deletions(-)

--
2.7.4



Re: [tipc-discussion] soft lockup for TIPC

2016-11-29 Thread Xue, Ying

Hi Jon,

I am just back from travel today, so I will look at your question and the issue 
tomorrow.

Sorry for the late response.

Regards,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Friday, November 25, 2016 11:50 PM
To: XIANG Haiming; tipc-discussion@lists.sourceforge.net; Xue, Ying
Subject: RE: soft lockup for TIPC

Ying,
I looked into the patches you posted to remove the bearer lock around February 
2014, but could not find any obvious candidate addressing this problem. I 
suspect you just "eliminated" the potential issue through your series of 
patches. But you possibly remember better than me what was done, and which of 
the patches are needed to resolve the issue.

BR
///jon


> -Original Message-
> From: XIANG Haiming [mailto:haiming.xi...@alcatel-sbell.com.cn]
> Sent: Wednesday, 23 November, 2016 20:41
> To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net
> Subject: RE: soft lockup for TIPC
> 
> Hi Jon,
> 
> I am OK with installing own patches.
> 
> Thank you for your help.
> 
> 
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: 24 November, 2016 1:03
> To: XIANG Haiming; tipc-discussion@lists.sourceforge.net
> Subject: RE: soft lockup for TIPC
> 
> Hi,
> I will take a look into this during the next days. Are you ok with installing
> your own patches?
> 
> ///jon
> 
> 
> > -Original Message-
> > From: XIANG Haiming [mailto:haiming.xi...@alcatel-sbell.com.cn]
> > Sent: Sunday, 20 November, 2016 21:35
> > To: tipc-discussion@lists.sourceforge.net
> > Subject: [tipc-discussion] soft lockup for TIPC
> >
> > Hi all,
> >
> > The version for TIPC which we use is TIPC version 2.0.0.
> > The OS info is as follow:
> >
> > Red Hat Enterprise Linux Server 7.2 (Maipo)
> > Kernel 3.10.0-327.18.2.el7.x86_64 on an x86_64
> >
> > We have hit two soft lockup issues with TIPC; please help us solve them.
> > Thank you
> >
> > One issue is as follow:
> >
> >
> > [85502.601198] BUG: soft lockup - CPU#0 stuck for 22s! [scm:2649]
> > [85502.603585] Modules linked in: iptable_filter ip6table_mangle xt_limit
> > iptable_mangle ip6table_filter ip6_tables igb_uio(OE) uio tipc(OE) 8021q
> > garp stp mrp llc bonding dm_mirror dm_region_hash dm_log dm_mod ppdev
> > crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper
> > ablk_helper cryptd pcspkr i6300esb virtio_balloon cirrus syscopyarea
> > sysfillrect sysimgblt ttm drm_kms_helper drm i2c_piix4 i2c_core parport_pc
> > parport binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables
> > ext4 mbcache jbd2 virtio_blk virtio_console crct10dif_pclmul
> > crct10dif_common crc32c_intel serio_raw ixgbevf virtio_pci virtio_ring
> > virtio ata_generic pata_acpi ata_piix libata floppy
> > [85502.618210] CPU: 0 PID: 2649 Comm: scm Tainted: G   OEL
> > 3.10.0-327.18.2.el7.x86_64 #1
> > [85502.620482] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> > [85502.622361] task: 880405da5080 ti: 8800db68c000 task.ti:
> > 8800db68c000
> > [85502.624400] RIP: 0010:[]  []
> > _raw_spin_lock_bh+0x3d/0x50
> > [85502.626558] RSP: 0018:8800db68fa68  EFLAGS: 0202
> > [85502.628412] RAX: 56f6 RBX: 880406600800 RCX:
> > 712a
> > [85502.630419] RDX: 712c RSI: 712c RDI:
> > a03f8548
> > [85502.632423] RBP: 8800db68fa70 R08: 02c0 R09:
> > 0500
> > [85502.634421] R10: 88040f001500 R11: 8800db68fc58 R12:
> > 81519be1
> > 0240
> > [85502.638415] FS:  () GS:88041fc0(0063)
> > knlGS:d1495b40
> > [85502.640514] CS:  0010 DS: 002b ES: 002b CR0: 80050033
> > [85502.642399] CR2: d37c8208 CR3: 0003cfff7000 CR4:
> > 000406f0
> > [85502.644420] DR0:  DR1:  DR2:
> > 
> > [85502.646416] DR3:  DR6: 0ff0 DR7:
> > 0400
> > [85502.648393] Stack:
> > [85502.649916]  a03f8548 8800db68fa90 a03e2afb
> > 880036242c00
> > [85502.652019]  880406600800 8800db68fac0 a03e5c1b
> > 880036242c00
> > [85502.654108]  880036b5a484 88003698e800 8800db68fb30
> > 8800db68fba8
> > [85502.656179] Call Trace:

Re: [tipc-discussion] v4.7: soft lockup when releasing a socket

2016-11-15 Thread Xue, Ying

Hi John,

Regarding the stack trace you provided below, I get the two potential call 
chains:

tipc_nametbl_withdraw
  spin_lock_bh(>nametbl_lock);
  tipc_nametbl_remove_publ
    spin_lock_bh(>lock);
    tipc_nameseq_remove_publ
      tipc_subscrp_report_overlap
        tipc_subscrp_send_event
          tipc_conn_sendmsg
            spin_lock_bh(>outqueue_lock);
            list_add_tail(>list, >outqueue);


tipc_topsrv_stop
  tipc_server_stop
    tipc_close_conn
      kernel_sock_shutdown
      tipc_subscrb_delete
        spin_lock_bh(>lock);
        tipc_nametbl_unsubscribe(sub);
          spin_lock_bh(>nametbl_lock);

Although I suspect that a lock-ordering inversion is leading to the soft 
lockup, I am still unable to work out which lock is taken in the opposite order 
relative to nametbl_lock on the two paths above.
However, you only gave us the log printed on CPU#2, and the logs from the other 
cores are also important. So if possible, please share them with us.

By the way, I agree with you, and it seems that commit 333f796235a527 is 
related to the soft lockup.

Regards,
Ying

-Original Message-
From: John Thompson [mailto:thompa@gmail.com] 
Sent: Tuesday, November 15, 2016 8:01 AM
To: tipc-discussion@lists.sourceforge.net
Subject: [tipc-discussion] v4.7: soft lockup when releasing a socket

Hi,

I am seeing an occasional kernel soft lockup.  I have TIPC v4.7 and the kernel 
dump occurs when the system is going down for a reboot.

The kernel dump is:

<0>NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [exfx:1474]
<6>Modules linked in: tipc jitterentropy_rng echainiv drbg platform_driver(O) ipifwd(PO)
...
<6>
<6>GPR00: c15333e8 a4e0fb80 a4ee3600 a51748ac  ae475024 a537feec fffd
<6>GPR08: a2197408 0001 0001 0004 80691c00
<6>NIP [80691c40] _raw_spin_lock_bh+0x40/0x70
<6>LR [c1534f30] tipc_nametbl_unsubscribe+0x50/0x120 [tipc]
<6>Call Trace:
<6>[a4e0fba0] [c15333e8] tipc_named_reinit+0xf8/0x820 [tipc]
<6>[a4e0fbb0] [c15336a0] tipc_named_reinit+0x3b0/0x820 [tipc]
<6>[a4e0fbd0] [c1540bac] tipc_nl_publ_dump+0x50c/0xed0 [tipc]
<6>[a4e0fc00] [c154164c] tipc_conn_sendmsg+0xdc/0x170 [tipc]
<6>[a4e0fc30] [c1533c9c] tipc_subscrp_report_overlap+0xbc/0xd0 [tipc]
<6>[a4e0fc70] [c153425c] tipc_topsrv_stop+0x45c/0x4f0 [tipc]
<6>[a4e0fca0] [c1534788] tipc_nametbl_remove_publ+0x58/0x110 [tipc]
<6>[a4e0fcd0] [c1534c48] tipc_nametbl_withdraw+0x68/0x140 [tipc]
<6>[a4e0fd00] [c153cc24] tipc_nl_node_dump_link+0x1904/0x45d0 [tipc]
<6>[a4e0fd30] [c153d838] tipc_nl_node_dump_link+0x2518/0x45d0 [tipc]
<6>[a4e0fd70] [804f2870] sock_release+0x30/0xf0
<6>[a4e0fd80] [804f2944] sock_close+0x14/0x30
<6>[a4e0fd90] [80105844] __fput+0x94/0x200
<6>[a4e0fdb0] [8003dca4] task_work_run+0xd4/0x100
<6>[a4e0fdd0] [80023620] do_exit+0x280/0x980
<6>[a4e0fe10] [80024c48] do_group_exit+0x48/0xb0
<6>[a4e0fe30] [80030344] get_signal+0x244/0x4f0
<6>[a4e0fe80] [80007734] do_signal+0x34/0x1c0
<6>[a4e0ff30] [800079a8] do_notify_resume+0x68/0x80
<6>[a4e0ff40] [8000fa1c] do_user_signal+0x74/0xc4


From the stack dump it looks like tipc_named_reinit is trying to acquire
nametbl_lock.

From looking at the call chain I can see that tipc_conn_sendmsg can end up
calling conn_put, which will go on and call tipc_named_reinit via
tipc_sock_release.

As tipc_nametbl_withdraw (from the stack dump) has already acquired the
nametbl_lock, tipc_named_reinit cannot get it and so the process hangs.

The call to tipc_sock_release (added in commit 333f796235a527) seems to have
changed the behaviour such that it tries to do a lot more when shutting the
connection down.


If there is other information I can provide please let me know.

Regards,

John

Re: [tipc-discussion] [PATCH net-next v4 00/14] tipc: create socket FSM using sk_state

2016-10-08 Thread Xue, Ying

Good job!
The series is good for me.
Acked-by: Ying Xue <ying@windriver.com>

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Monday, September 19, 2016 11:49 PM
To: Parthasarathy Bhuvaragan; tipc-discussion@lists.sourceforge.net; 
ma...@donjonn.com; Xue, Ying
Subject: RE: [PATCH net-next v4 00/14] tipc: create socket FSM using sk_state

This looks fine to me now.
Acked-by: Jon Maloy

> -Original Message-
> From: Parthasarathy Bhuvaragan
> Sent: Monday, 19 September, 2016 14:56
> To: tipc-discussion@lists.sourceforge.net; Jon Maloy 
> <jon.ma...@ericsson.com>; ma...@donjonn.com; Ying Xue 
> <ying@windriver.com>
> Subject: [PATCH net-next v4 00/14] tipc: create socket FSM using 
> sk_state
> 
> The following issues with the current socket layer hinder the 
> implementation of socket diagnostics, which led to this patch series. The 
> series does not add any functional change.
> 
> 1. tipc socket state is derived from multiple variables like
>sock->state, tsk->probing_state and tsk->connected. This style forces
>us to export multiple attributes to the user space, which has to be
>backward compatible.
> 
> 2. Abuse of sock->state cannot be exported to user-space without
>requiring tipc specific hacks in the user-space.
>- For connection less (CL) sockets sock->state is overloaded to
>  tipc state SS_READY.
>- For connection oriented (CO) listening socket sock->state is
>  overloaded to tipc state SS_LISTEN.
> 
> This series is split into four:
> 1. A bug fix in patch#1
> 2. Minor cleanups in patch#2-3
> 3. Express all tipc states using a single variable. This is done in patch#4-7.
> 4. Migrate the new tipc states to sk->sk_state. This is done in patch#8-14.
> 
> The figures below represents the FSM after this series:
> 
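> For connectionless sockets:
> +-----------+   +--------------+
> | TIPC_OPEN |-->| TIPC_CLOSING |
> +-----------+   +--------------+
> 
> Stream Server Listening Socket:
> +-----------+   +-------------+
> | TIPC_OPEN |-->| TIPC_LISTEN |
> +-----------+   +-------------+
>                     |
> +--------------+    |
> | TIPC_CLOSING |<---+
> +--------------+
> 
> Stream Server Data Socket:
> +-----------+   +------------------+
> | TIPC_OPEN |-->| TIPC_ESTABLISHED |<---+
> +-----------+   +------------------+    |
>                     ^   |    |          |
>                     |   |    +----------+
>                     |   v
>                 +--------------+
>                 | TIPC_PROBING |
>                 +--------------+
>                         |
>                         |
>                         v
> +--------------+    +--------------------+
> | TIPC_CLOSING |<---| TIPC_DISCONNECTING |
> +--------------+    +--------------------+
> 
> Stream Socket Client:
> +-----------+   +-----------------+
> | TIPC_OPEN |-->| TIPC_CONNECTING |
> +-----------+   +-----------------+
>                         |
>                         |
>                         v
>                 +------------------+
>                 | TIPC_ESTABLISHED |<---+
>                 +------------------+    |
>                     ^   |    |          |
>                     |   |    +----------+
>                     |   v
>                 +--------------+
>                 | TIPC_PROBING |
>                 +--------------+
>                         |
>                         |
>                         v
> +--------------+    +--------------------+
> | TIPC_CLOSING |<---| TIPC_DISCONNECTING |
> +--------------+    +--------------------+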
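
One way to read the four figures together is as a single transition table. The sketch below is a user-space illustration only: the state names come from the cover letter, but the enum values, the table and toy_transition_ok() are hypothetical, and the set of arrows is an assumption based on one reading of the figures rather than on the patches themselves.

```c
#include <assert.h>
#include <stdbool.h>

/* State names as used in the cover letter; the numeric values here are
 * arbitrary and not the ones used by the kernel. */
enum toy_state {
	TIPC_OPEN,
	TIPC_LISTEN,
	TIPC_CONNECTING,
	TIPC_ESTABLISHED,
	TIPC_PROBING,
	TIPC_DISCONNECTING,
	TIPC_CLOSING,
	TOY_STATE_MAX
};

/* allowed[from][to] is true where the figures are read as having an arrow. */
static const bool allowed[TOY_STATE_MAX][TOY_STATE_MAX] = {
	[TIPC_OPEN][TIPC_CLOSING]            = true,	/* connectionless */
	[TIPC_OPEN][TIPC_LISTEN]             = true,	/* listening socket */
	[TIPC_LISTEN][TIPC_CLOSING]          = true,
	[TIPC_OPEN][TIPC_ESTABLISHED]        = true,	/* server data socket */
	[TIPC_OPEN][TIPC_CONNECTING]         = true,	/* client */
	[TIPC_CONNECTING][TIPC_ESTABLISHED]  = true,
	[TIPC_ESTABLISHED][TIPC_ESTABLISHED] = true,	/* self loop */
	[TIPC_ESTABLISHED][TIPC_PROBING]     = true,
	[TIPC_PROBING][TIPC_ESTABLISHED]     = true,
	[TIPC_PROBING][TIPC_DISCONNECTING]   = true,
	[TIPC_DISCONNECTING][TIPC_CLOSING]   = true,
};

bool toy_transition_ok(enum toy_state from, enum toy_state to)
{
	if ((unsigned int)from >= TOY_STATE_MAX ||
	    (unsigned int)to >= TOY_STATE_MAX)
		return false;
	return allowed[from][to];
}
```

Keeping the whole state in one variable is what makes such a table possible; with the state spread over sock->state, tsk->probing_state and tsk->connected, the same check would need to combine three fields.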
> 
> NOTE:
> This is just a base refactoring required for socket diagnostics.
> Implementation of TIPC socket diagnostics will be sent as a separate 
> series.
> 
> ---
> I plan to submit this series after v4.8 release cycle.
> 
> v4: Addressed comments from Jon Maloy:
> - Added a new patch #2 to rename variable.
> Make state names more readable and consistent:
> - Renamed the state TIPC_UNCONNECTED to TIPC_OPEN.
> - Adjusted the scope for TIPC_DISCONNECTING in patch#11.
> - Added a new state TIPC_CLOSING in patch#12.
> 
> v3: - Address comments from Ying Xue <ying@windriver.com> in
>   patch #7, #11.
> - Rebase on latest netnext which contains fixes for broadcast NACK
>   that seems to make ptts regression stable.
> - Ran ptts suites for 6000 iterations for 5+ hours.
> 
> Parthasarathy Bhuvarag

Re: [tipc-discussion] Tipc can it connect 2vm running in the same host...without using vnic

2016-10-08 Thread Xue, Ying

Hi Ragha,

The TIPC protocol is unaware of whether it runs on a VM or a native host, and 
it doesn't need to know. The only way to communicate with other nodes in the 
external world is through network interfaces, which may be backed by either a 
physical NIC or a virtual NIC. That approach does not seem to satisfy your 
requirement.

However, now that the UDP bearer has been introduced, it can probably help you 
reach your goal. In other words, a TIPC node can talk to other nodes over 
UDP.

Regards,
Ying

-Original Message-
From: keshava murthy [mailto:mailbox@gmail.com] 
Sent: Saturday, October 08, 2016 1:25 PM
To: tipc-discussion@lists.sourceforge.net
Subject: [tipc-discussion] Tipc can it connect 2vm running in the same 
host...without using vnic

Hi
Can Tipc connect 2vm running in the same host...without using vnic?

Regards
Ragha

Re: [tipc-discussion] [PATCH net-next 2/3] tipc: rate limit broadcast retransmissions

2016-08-31 Thread Xue, Ying



-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Wednesday, August 31, 2016 12:06 AM
To: Xue, Ying; Jon Maloy; tipc-discussion@lists.sourceforge.net; Parthasarathy 
Bhuvaragan; Richard Alpe
Cc: gbala...@gmail.com
Subject: RE: [PATCH net-next 2/3] tipc: rate limit broadcast retransmissions



> -Original Message-
> From: Xue, Ying [mailto:ying@windriver.com]
> Sent: Tuesday, 30 August, 2016 05:55
> To: Jon Maloy <ma...@donjonn.com>; Jon Maloy <jon.ma...@ericsson.com>; 
> tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan 
> <parthasarathy.bhuvara...@ericsson.com>; Richard Alpe 
> <richard.a...@ericsson.com>
> Cc: gbala...@gmail.com
> Subject: RE: [PATCH net-next 2/3] tipc: rate limit broadcast 
> retransmissions
> 
> > +/* link_bc_retr_eval() - check if the indicated range can be retransmitted now
> > + * - Adjust permitted range if there is overlap with previous retransmission
> > + */
> > +static bool link_bc_retr_eval(struct tipc_link *l, u16 *from, u16 *to)
> > +{
> > +   unsigned long elapsed = jiffies_to_msecs(jiffies - l->prev_retr);
> > +
> > +   if (less(*to, *from))
> > +   return false;
> > +
> > +   /* New retransmission request */
> > +   if ((elapsed > TIPC_BC_RETR_LIMIT) ||
> > +   less(*to, l->prev_from) || more(*from, l->prev_to)) {
> > +   l->prev_from = *from;
> > +   l->prev_to = *to;
> > +   l->prev_retr = jiffies;
> > +   return true;
> > +   }
> > +
> >
> > [Ying] In my understanding, the condition above should be changed as
> > below:
> >
> > if ((elapsed > TIPC_BC_RETR_LIMIT) && (less(*to, l->prev_from) ||
> >     more(*from, l->prev_to)))
> >
> > Otherwise, I think the rate of retransmission may still be very high,
> > especially when packets are frequently reordered.
> > As it is quite normal for the BCast retransmission ranges requested by
> > different nodes to differ, I guess the rate is not well suppressed by the
> > condition above.
> >
> > Maybe we can use the method that is already effective in
> > tipc_sk_enqueue() to limit the rate, like:
> 
> The condition indicates that the two retransmission requests are 
> completely disjoint, e.g., 10-12 and 13-15.
> 
> [Ying]Understood, and it makes sense.
> 
> I don't see any reason why
> we should wait with retransmitting 13-15 in this case, as it is 
> obvious that somebody is missing it, and we haven't retransmitted it before.
> 
> [Ying] In most of the cases I have observed with bcast, the requested 
> packets arrive shortly after the NACK asking the sender to retransmit them 
> has been submitted. Unfortunately the sender still retransmits the 
> requested packets through bcast later, which wastes network bandwidth. 
> When packets are received out of order, the situation gets even worse, 
> which may be one of the reasons why we see many duplicated bcast messages 
> on the wire. To avoid this, it would be better to postpone the 
> retransmission a little on the sender; alternatively, we can restrain the 
> retransmission request on the receiver side. In other words, when the 
> receiver detects a gap for the first time, it is better to wait a little 
> before submitting the NACK message.
> 

There is an implicit delay in the fact that the gap detection is made through 
looking at STATE messages, and not directly from the packet flow. Reception of 
STATE messages should typically happen after the indicated "last_sent" packet 
already has been received, and after the detection we want one single 
retransmission of the missing packets as soon as possible in order to not waste 
bandwidth.
To further speed up detection and retransmission, you will see that I in the 
next commit introduce detection and NACK-ing directly from the incoming packet 
stream, although it is delayed and coordinated among the receivers.

In brief, I only see positive effects of early gap detection and sending of 
NACKs, as long as we can make an intelligent rate limiting of the actual 
retransmissions.

[Ying] Thanks for the clear explanation, and it makes sense.
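
For readers following the thread, the range evaluation discussed above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct retr_state`, `retr_eval_sketch()`, `seq_less()`/`seq_more()` and the millisecond clock argument are hypothetical stand-ins for the `tipc_link` fields, `link_bc_retr_eval()`, `less()`/`more()` and `jiffies` used in the patch, and the wraparound semantics of the comparisons are assumed rather than taken from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RETR_LIMIT_MS 10UL	/* mirrors TIPC_BC_RETR_LIMIT */

/* Hypothetical stand-in for the three fields the patch adds to tipc_link */
struct retr_state {
	unsigned long prev_retr_ms;	/* time of last retransmission, in ms */
	uint16_t prev_from;
	uint16_t prev_to;
};

/* Wraparound-safe 16-bit sequence comparisons (assumed semantics of the
 * less()/more() helpers the patch calls; their definitions are not shown).
 */
static bool seq_less(uint16_t a, uint16_t b)
{
	return a != b && (uint16_t)(b - a) < 32768u;
}

static bool seq_more(uint16_t a, uint16_t b)
{
	return seq_less(b, a);
}

/* Return true if [*from, *to] may be retransmitted now, possibly trimmed */
static bool retr_eval_sketch(struct retr_state *s, unsigned long now_ms,
			     uint16_t *from, uint16_t *to)
{
	unsigned long elapsed = now_ms - s->prev_retr_ms;

	if (seq_less(*to, *from))
		return false;

	/* New request: limit expired, or range disjoint from the previous one */
	if (elapsed > SKETCH_RETR_LIMIT_MS ||
	    seq_less(*to, s->prev_from) || seq_more(*from, s->prev_to)) {
		s->prev_from = *from;
		s->prev_to = *to;
		s->prev_retr_ms = now_ms;
		return true;
	}

	/* Entirely inside the range that was just retransmitted */
	if (!seq_less(*from, s->prev_from) && !seq_more(*to, s->prev_to))
		return false;

	/* Partial overlap: send only the part not covered before */
	if (seq_less(*from, s->prev_from)) {
		*to = (uint16_t)(s->prev_from - 1);
		s->prev_from = *from;
	}
	if (seq_more(*to, s->prev_to)) {
		*from = (uint16_t)(s->prev_to + 1);
		s->prev_to = *to;
	}
	s->prev_retr_ms = now_ms;
	return true;
}

/* Small self-check that doubles as a usage example */
void retr_eval_demo(void)
{
	struct retr_state s = { .prev_retr_ms = 100, .prev_from = 10, .prev_to = 12 };
	uint16_t from = 13, to = 15;

	/* 13-15 is disjoint from 10-12: allowed at once */
	assert(retr_eval_sketch(&s, 101, &from, &to));
	/* Same range again within the limit: suppressed */
	assert(!retr_eval_sketch(&s, 105, &from, &to));
	/* 14-18 overlaps 13-15: trimmed to the uncovered part */
	from = 14;
	to = 18;
	assert(retr_eval_sketch(&s, 106, &from, &to));
	assert(from == 16 && to == 18);
}
```

With the `||` form, the disjoint request 13-15 goes out immediately, which is the behaviour Jon argues for; with the `&&` form suggested earlier it would have to wait for the limit to expire.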

> Moreover, I have a more aggressive idea: now that the NACK is delivered 
> through unicast, why not use unicast to retransmit bcast messages as well? 
> This is because packet loss rarely happens in practice.

I tested this by running the "bcast-blast" program between two peers in a 
200-node network, and I can assure there were plenty of missed packets. 
Moreover, it seems to me that when a packet is lost, it is most often because 
i

Re: [tipc-discussion] [PATCH net-next v2 00/12] tipc: create socket FSM using sk_state only

2016-08-30 Thread Xue, Ying

Just a reminder: these issues might not be caused by your patches. So first we 
need to identify whether your patch set leads to the side effects. If not, they 
were introduced by previous commits merged into mainline.

Regards,
Ying

From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com]
Sent: Tuesday, August 30, 2016 4:30 PM
To: Xue, Ying; tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com
Subject: Re: [PATCH net-next v2 00/12] tipc: create socket FSM using sk_state 
only

Hi Ying,

I tried to trigger this fault. I get a different error on the multicast test, 
after about 2 hours.
 node1 ~ # ./tipcTS
 Received on 1 sockets in subtest 6, expected 2
 TEST FAILED Received wrong number of multicast messages
  errno = 11: Resource temporarily unavailable

My client and server run on 2 qemu-guest's with 4 cpus. I will continue to 
investigate this. Sorry for the late reply, as I got stuck with other issues 
recently and couldn't focus on this.

regards partha

On 08/17/2016 12:40 PM, Xue, Ying wrote:

I have found the following issue after the series is applied to the latest 
kernel:

Test # 8
TIPC TIPC_IMPORTANCE test...STARTED!
TEST FAILED unexpected number of send() errors errno = 113: No route to host
Test # 1

Below is the procedure to reproduce the above error:

Prepare two nodes. One runs "tipcTS"; on the other node, we use the command 
below to run the tipcTC test case repeatedly:

while [ true ]; do tipcTC 0; done

After about 2 or 3 hours, the error above will appear.

Regards,
Ying

-Original Message-
From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com]
Sent: Monday, August 15, 2016 5:19 PM
To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com; Xue, Ying
Subject: [PATCH net-next v2 00/12] tipc: create socket FSM using sk_state only

The following issues with the current socket layer hinder socket diagnostics 
implementation, which led to this patch series. The series does not add any 
functional change.

1. tipc socket state is derived from multiple variables like
   sock->state, tsk->probing_state and tsk->connected. This style forces
   us to export multiple attributes to the user space, which has to be
   backward compatible.

2. Abuse of sock->state cannot be exported to user-space without
   requiring tipc specific hacks in the user-space.
   - For connection less (CL) sockets sock->state is overloaded to
     tipc state SS_READY.
   - For connection oriented (CO) listening socket sock->state is
     overloaded to tipc state SS_LISTENING.

This series is split into three:
1. A bug fix in patch-1
2. Express all tipc states using a single variable. This is done in patch#2-5.
3. Migrate the new tipc states to sk->sk_state. This is done in patch#6-12.

The figures below represent the FSM after this series:

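Unconnected Sockets:
+------------------+    +--------------------+
| TIPC_UNCONNECTED |--->| TIPC_DISCONNECTING |
+------------------+    +--------------------+

Stream Server Listening Socket:
+------------------+    +-------------+
| TIPC_UNCONNECTED |--->| TIPC_LISTEN |
+------------------+    +-------------+
                               |
+--------------------+         |
| TIPC_DISCONNECTING |<--------+
+--------------------+

Stream Server Data Socket:
+-----------------+    +------------------+
|TIPC_UNCONNECTED |--> | TIPC_ESTABLISHED |<---+
+-----------------+    +------------------+    |
                           ^   |    |          |
                           |   |    +----------+
                           |   v
+------------------+  +-------------+
|TIPC_DISCONNECTING|<-|TIPC_PROBING |
+------------------+  +-------------+

Stream Socket Client:
+-----------------+   +-----------------+
|TIPC_UNCONNECTED |-->| TIPC_CONNECTING |
+-----------------+   +-----------------+
                               |
                               |
                               v
                      +------------------+
                      | TIPC_ESTABLISHED |<---+
                      +------------------+    |
                           ^   |    |         |
                           |   |    +---------+
                           |   v
+------------------+  +-------------+
|TIPC_DISCONNECTING|<-|TIPC_PROBING |
+------------------+  +-------------+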

NOTE:
This is just a base refactoring required for socket diagnostics.
Implementation of TIPC socket diagnostics will be sent as a separate series.

v2: - Address comments from Ying Xue 
<ying@windriver.com&g

Re: [tipc-discussion] [PATCH net-next 2/3] tipc: rate limit broadcast retransmissions

2016-08-25 Thread Xue, Ying



-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Wednesday, August 17, 2016 2:09 AM
To: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
jon.ma...@ericsson.com
Cc: ma...@donjonn.com; gbala...@gmail.com
Subject: [PATCH net-next 2/3] tipc: rate limit broadcast retransmissions

As cluster sizes grow, so does the amount of identical or overlapping broadcast 
NACKs generated by the packet receivers. This often leads to 'NACK crunches' 
resulting in huge numbers of redundant retransmissions of the same packet 
ranges.

In this commit, we introduce rate control of broadcast retransmissions, so that 
a retransmitted range cannot be retransmitted again until after at least 10 ms. 
This reduces the frequency of duplicate retransmissions by an order of 
magnitude, while having a significant positive impact on throughput and 
scalability.

Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/link.c | 52 +++-
 1 file changed, 47 insertions(+), 5 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 136316f..58bb44d 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -181,7 +181,10 @@ struct tipc_link {
u16 acked;
struct tipc_link *bc_rcvlink;
struct tipc_link *bc_sndlink;
-   int nack_state;
+   unsigned long prev_retr;
+   u16 prev_from;
+   u16 prev_to;
+   u8 nack_state;
bool bc_peer_is_up;
 
/* Statistics */
@@ -202,6 +205,8 @@ enum {
BC_NACK_SND_SUPPRESS,
 };
 
+#define TIPC_BC_RETR_LIMIT 10   /* [ms] */

[Ying] I suggest we define the limit as a number of jiffies rather than as 
10 ms. Then we don't need to convert the time elapsed since l->prev_retr from 
jiffies to milliseconds.
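
To make the suggestion concrete, here is a minimal userspace sketch of comparing tick counters directly instead of converting the elapsed time. It is an illustration only: the `sketch_*` names and the tick rate are assumptions, and in the kernel the equivalents would be `jiffies`, `HZ`, `time_after()` and `msecs_to_jiffies()` from `<linux/jiffies.h>`.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins, for illustration only. In the kernel, jiffies, HZ,
 * time_after() and msecs_to_jiffies() come from <linux/jiffies.h>.
 */
#define SKETCH_HZ 250UL			/* assumed tick rate */
#define SKETCH_BC_RETR_LIMIT_MS 10UL	/* the 10 ms limit from the patch */

/* Wraparound-safe "a is later than b", same idea as time_after() */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Convert ms to ticks, rounding up so a short limit never becomes 0 */
static unsigned long sketch_msecs_to_ticks(unsigned long ms)
{
	return (ms * SKETCH_HZ + 999UL) / 1000UL;
}

/* True once more than the limit has passed since prev_retr. Both arguments
 * are tick counts, so the elapsed time is never converted back to ms.
 */
static bool sketch_retr_limit_passed(unsigned long now, unsigned long prev_retr)
{
	unsigned long limit = sketch_msecs_to_ticks(SKETCH_BC_RETR_LIMIT_MS);

	return sketch_time_after(now, prev_retr + limit);
}

/* Small self-check that doubles as a usage example */
void sketch_retr_limit_demo(void)
{
	unsigned long near_wrap = (unsigned long)-2;	/* two ticks before wrap */

	assert(sketch_msecs_to_ticks(10) == 3);		/* 2.5 ticks, rounded up */
	assert(!sketch_retr_limit_passed(103, 100));	/* exactly at the limit */
	assert(sketch_retr_limit_passed(104, 100));
	assert(sketch_retr_limit_passed(2, near_wrap));	/* counter has wrapped */
}
```

The limit is converted once, when it is defined, so the hot path only does one addition and one signed comparison, and it stays correct when the tick counter wraps.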

+
 /*
  * Interval between NACKs when packets arrive out of order
  */
@@ -1590,11 +1595,48 @@ void tipc_link_bc_init_rcv(struct tipc_link *l, struct tipc_msg *hdr)
l->rcv_nxt = peers_snd_nxt;
 }
 
+/* link_bc_retr_eval() - check if the indicated range can be retransmitted now
+ * - Adjust permitted range if there is overlap with previous retransmission
+ */
+static bool link_bc_retr_eval(struct tipc_link *l, u16 *from, u16 *to)
+{
+   unsigned long elapsed = jiffies_to_msecs(jiffies - l->prev_retr);
+
+   if (less(*to, *from))
+   return false;
+
+   /* New retransmission request */
+   if ((elapsed > TIPC_BC_RETR_LIMIT) ||
+   less(*to, l->prev_from) || more(*from, l->prev_to)) {
+   l->prev_from = *from;
+   l->prev_to = *to;
+   l->prev_retr = jiffies;
+   return true;
+   }
+

[Ying] In my understanding, the condition above should be changed as below:

if ((elapsed > TIPC_BC_RETR_LIMIT) && (less(*to, l->prev_from) ||
    more(*from, l->prev_to)))

Otherwise, I think the rate of retransmission may still be very high, especially 
when packets are frequently reordered.
As it is quite normal for the BCast retransmission ranges requested by different 
nodes to differ, I guess the rate is not well suppressed by the condition above.

Maybe we can use the method that is already effective in tipc_sk_enqueue() to 
limit the rate, like:

tipc_sk_enqueue()
{
..
while (skb_queue_len(inputq)) {
if (unlikely(time_after_eq(jiffies, time_limit)))
return;
..
}

And we don’t need to consider the retransmission range.

Regards,
Ying

+   /* Inside range of previous retransmit */
+   if (!less(*from, l->prev_from) && !more(*to, l->prev_to))
+   return false;
+
+   /* Fully or partially outside previous range => exclude overlap */
+   if (less(*from, l->prev_from)) {
+   *to = l->prev_from - 1;
+   l->prev_from = *from;
+   }
+   if (more(*to, l->prev_to)) {
+   *from = l->prev_to + 1;
+   l->prev_to = *to;
+   }
+   l->prev_retr = jiffies;
+   return true;
+}
+
 /* tipc_link_bc_sync_rcv - update rcv link according to peer's send state
  */
 int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
  struct sk_buff_head *xmitq)
 {
+   struct tipc_link *snd_l = l->bc_sndlink;
u16 peers_snd_nxt = msg_bc_snd_nxt(hdr);
u16 from = msg_bcast_ack(hdr) + 1;
u16 to = from + msg_bc_gap(hdr) - 1;
@@ -1613,14 +1655,14 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
if (!l->bc_peer_is_up)
return rc;
 
+   l->stats.recv_nacks++;
+
/* Ignore if peers_snd_nxt goes beyond receive window */
if (more(peers_snd_nxt, l->rcv_nxt + l->window))
return rc;
 
-   if (!less(to, from)) {
-   rc = tipc_link_retrans(l->bc_sndlink, from, to, xmitq);
-

Re: [tipc-discussion] [PATCH -next] tipc: use kfree_skb() instead of kfree()

2016-08-23 Thread Xue, Ying

Acked-by: Ying Xue <ying@windriver.com>

-Original Message-
From: Wei Yongjun [mailto:weiyj...@gmail.com] 
Sent: Wednesday, August 24, 2016 7:01 AM
To: Jon Maloy; Xue, Ying; David S. Miller
Cc: Wei Yongjun; net...@vger.kernel.org; tipc-discussion@lists.sourceforge.net
Subject: [PATCH -next] tipc: use kfree_skb() instead of kfree()

From: Wei Yongjun <weiyongj...@huawei.com>

Use kfree_skb() instead of kfree() to free sk_buff.

Fixes: 0d051bf93c06 ("tipc: make bearer packet filtering generic")
Signed-off-by: Wei Yongjun <weiyongj...@huawei.com>
---
 net/tipc/bearer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 6fc4e3c..28056fa 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -529,7 +529,7 @@ void tipc_bearer_xmit(struct net *net, u32 bearer_id,
if (likely(test_bit(0, &b->up) || msg_is_reset(buf_msg(skb))))
b->media->send_msg(net, skb, b, dst);
else
-   kfree(skb);
+   kfree_skb(skb);
}
rcu_read_unlock();
 }


Re: [tipc-discussion] [PATCH net-next v2 00/12] tipc: create socket FSM using sk_state only

2016-08-17 Thread Xue, Ying

In the final code, I found the following:

static bool filter_connect(struct tipc_sock *tsk, struct sk_buff *skb)
{
	struct sock *sk = &tsk->sk;
	struct net *net = sock_net(sk);
	struct tipc_msg *hdr = buf_msg(skb);

	if (unlikely(msg_mcast(hdr)))
		return false;

	switch (sk->sk_state) {
	case TIPC_CONNECTING:
		/* Accept only ACK or NACK message */
		if (unlikely(!msg_connected(hdr)))
			return false;

		if (unlikely(msg_errcode(hdr))) {
			tipc_set_state(sk, TIPC_DISCONNECTING);
			sk->sk_err = ECONNREFUSED;
			return true;
		}

		if (unlikely(!msg_isdata(hdr))) {
			tipc_set_state(sk, TIPC_DISCONNECTING);
			sk->sk_err = EINVAL;
			return true;
		}

		tipc_sk_finish_conn(tsk, msg_origport(hdr), msg_orignode(hdr));
		msg_set_importance(&tsk->phdr, msg_importance(hdr));

		/* If 'ACK+' message, add to socket receive queue */
		if (msg_data_sz(hdr))
			return true;

		/* If empty 'ACK-' message, wake up sleeping connect() */
		if (waitqueue_active(sk_sleep(sk)))
			wake_up_interruptible(sk_sleep(sk));

		/* 'ACK-' message is neither accepted nor rejected: */
		msg_set_dest_droppable(hdr, 1);
		return false;

	}

	switch (sk->sk_state) {   ===> But this second switch statement should be redundant.
	case TIPC_UNCONNECTED:

Regards,
Ying

-Original Message-
From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] 
Sent: Monday, August 15, 2016 5:19 PM
To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com; Xue, Ying
Subject: [PATCH net-next v2 00/12] tipc: create socket FSM using sk_state only

The following issues with the current socket layer hinder socket diagnostics 
implementation, which led to this patch series. The series does not add any 
functional change.

1. tipc socket state is derived from multiple variables like
   sock->state, tsk->probing_state and tsk->connected. This style forces
   us to export multiple attributes to the user space, which has to be
   backward compatible.

2. Abuse of sock->state cannot be exported to user-space without
   requiring tipc specific hacks in the user-space.
   - For connection less (CL) sockets sock->state is overloaded to
     tipc state SS_READY.
   - For connection oriented (CO) listening socket sock->state is
     overloaded to tipc state SS_LISTENING.

This series is split into three:
1. A bug fix in patch-1
2. Express all tipc states using a single variable. This is done in patch#2-5.
3. Migrate the new tipc states to sk->sk_state. This is done in patch#6-12.

The figures below represent the FSM after this series:

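Unconnected Sockets:
+------------------+    +--------------------+
| TIPC_UNCONNECTED |--->| TIPC_DISCONNECTING |
+------------------+    +--------------------+

Stream Server Listening Socket:
+------------------+    +-------------+
| TIPC_UNCONNECTED |--->| TIPC_LISTEN |
+------------------+    +-------------+
                               |
+--------------------+         |
| TIPC_DISCONNECTING |<--------+
+--------------------+

Stream Server Data Socket:
+-----------------+    +------------------+
|TIPC_UNCONNECTED |--> | TIPC_ESTABLISHED |<---+
+-----------------+    +------------------+    |
                           ^   |    |          |
                           |   |    +----------+
                           |   v
+------------------+  +-------------+
|TIPC_DISCONNECTING|<-|TIPC_PROBING |
+------------------+  +-------------+

Stream Socket Client:
+-----------------+   +-----------------+
|TIPC_UNCONNECTED |-->| TIPC_CONNECTING |
+-----------------+   +-----------------+
                               |
                               |
                               v
                      +------------------+
                      | TIPC_ESTABLISHED |<---+
                      +------------------+    |
                           ^   |    |         |
                           |   |    +---------+
                           |   v
+------------------+  +-------------+
|TIPC_DISCONNECTING|<-|TIPC_PROBING |
+------------------+  +-------------+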

NOTE:
This is just a base refactoring required for socket diagnostics.
Implementation of TIPC socket diagnostics will be s

Re: [tipc-discussion] [PATCH net-next v2 07/12] tipc: create TIPC_LISTEN as a new sk_state

2016-08-17 Thread Xue, Ying

The name tipc_set_state() is too general, so it's better to rename it to:

tipc_set_sk_state()
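
As an illustration of what such a helper does, below is a minimal userspace sketch of a setter that validates state transitions before updating the socket state. It is a sketch under assumptions, not the kernel code: the enum values, `struct sock_sketch` and `set_sk_state_sketch()` are hypothetical, and the permitted transitions are only those drawn as arrows in the FSM figures of the series' cover letter. The real helper may allow more, for example into the disconnecting state.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical state values; the uapi header defines the real ones */
enum sk_state_sketch {
	ST_UNCONNECTED = 1,
	ST_LISTEN,
	ST_CONNECTING,
	ST_ESTABLISHED,
	ST_PROBING,
	ST_DISCONNECTING,
};

/* Hypothetical stand-in for struct sock, keeping only the state field */
struct sock_sketch {
	int sk_state;
};

/* Set the state if the FSM figures show an arrow from the old state to the
 * new one. Returns 0 on success, -EINVAL otherwise.
 */
static int set_sk_state_sketch(struct sock_sketch *sk, int state)
{
	int old = sk->sk_state;
	int res = -EINVAL;

	switch (state) {
	case ST_LISTEN:
	case ST_CONNECTING:
		if (old == ST_UNCONNECTED)
			res = 0;
		break;
	case ST_ESTABLISHED:
		if (old == ST_PROBING || old == ST_ESTABLISHED ||
		    old == ST_CONNECTING || old == ST_UNCONNECTED)
			res = 0;
		break;
	case ST_PROBING:
		if (old == ST_ESTABLISHED)
			res = 0;
		break;
	case ST_DISCONNECTING:
		/* Only the arrows drawn in the figures (an assumption) */
		if (old == ST_UNCONNECTED || old == ST_LISTEN ||
		    old == ST_PROBING)
			res = 0;
		break;
	}

	if (res)
		return res;

	sk->sk_state = state;
	return 0;
}

/* Small self-check that doubles as a usage example */
void set_sk_state_demo(void)
{
	struct sock_sketch sk = { ST_UNCONNECTED };

	assert(set_sk_state_sketch(&sk, ST_CONNECTING) == 0);
	assert(set_sk_state_sketch(&sk, ST_LISTEN) == -EINVAL);
	assert(sk.sk_state == ST_CONNECTING);	/* rejected change is a no-op */
	assert(set_sk_state_sketch(&sk, ST_ESTABLISHED) == 0);
	assert(set_sk_state_sketch(&sk, ST_PROBING) == 0);
	assert(set_sk_state_sketch(&sk, ST_DISCONNECTING) == 0);
}
```

Keeping all transition checks in one function is what makes the name matter: the helper acts on sk->sk_state specifically, not on the generic sock->state.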

Regards,
Ying
-Original Message-
From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] 
Sent: Monday, August 15, 2016 5:19 PM
To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com; Xue, Ying
Subject: [PATCH net-next v2 07/12] tipc: create TIPC_LISTEN as a new sk_state

Until now, tipc maintains the socket state in sock->state variable.
This is used to maintain generic socket states, but in tipc we overload it and 
save tipc socket states like TIPC_LISTEN.
Other protocols like TCP, UDP store protocol specific states in sk->sk_state 
instead.

In this commit, we:
- declare a new tipc state TIPC_LISTEN, that replaces SS_LISTENING
- Create a new function tipc_set_state(), to update sk->sk_state.
- TIPC_LISTEN state is maintained in sk->sk_state.
- replace references to SS_LISTENING with TIPC_LISTEN.

There is no functional change in this commit.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>
---
 include/uapi/linux/tipc.h |  7 ++
 net/tipc/socket.c | 59 ---
 2 files changed, 47 insertions(+), 19 deletions(-)

diff --git a/include/uapi/linux/tipc.h b/include/uapi/linux/tipc.h
index bf049e8fe31b..8a107085f268 100644
--- a/include/uapi/linux/tipc.h
+++ b/include/uapi/linux/tipc.h
@@ -177,6 +177,13 @@ struct tipc_event {
 };
 
 /*
+ * Definitions for the TIPC protocol sk_state field.
+ */
+enum {
+   TIPC_LISTEN = 1,
+};
+
+/*
  * Socket API
  */
 
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 54cc9f6bed8f..00221b88838f 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -44,8 +44,6 @@
 #include "bcast.h"
 #include "netlink.h"
 
-#define SS_LISTENING   -1  /* socket is listening */
-
 #define CONN_TIMEOUT_DEFAULT   8000/* default connect timeout = 8s */
 #define CONN_PROBING_INTERVAL  msecs_to_jiffies(3600000)  /* [ms] => 1 h */
 #define TIPC_FWD_MSG   1
@@ -337,6 +335,32 @@ static bool tsk_peer_msg(struct tipc_sock *tsk, struct tipc_msg *msg)
return false;
 }
 
+/* tipc_set_state - set the sk_state of the socket
+ * @sk: socket
+ *
+ * Caller must hold socket lock
+ *
+ * Returns 0 on success, errno otherwise
+ */
+static int tipc_set_state(struct sock *sk, int state)
+{
+   int oldstate = sk->sk_socket->state;
+   int res = -EINVAL;
+
+   switch (state) {
+   case TIPC_LISTEN:
+   if (oldstate == SS_UNCONNECTED)
+   res = 0;
+   break;
+   }
+
+   if (res)
+   return res;
+
+   sk->sk_state = state;
+   return 0;
+}
+
 /**
  * tipc_sk_create - create a TIPC socket
  * @net: network namespace (must be default network)
@@ -666,15 +690,22 @@ static unsigned int tipc_poll(struct file *file, struct socket *sock,
 
switch ((int)sock->state) {
case SS_UNCONNECTED:
-   if (!tsk->link_cong)
-   mask |= POLLOUT;
+   switch (sk->sk_state) {
+   case TIPC_LISTEN:
+   if (!skb_queue_empty(&sk->sk_receive_queue))
+   mask |= (POLLIN | POLLRDNORM);
+   break;
+   default:
+   if (!tsk->link_cong)
+   mask |= POLLOUT;
+   break;
+   }
break;
case SS_CONNECTED:
if (!tsk->link_cong && !tsk_conn_cong(tsk))
mask |= POLLOUT;
/* fall thru' */
case SS_CONNECTING:
-   case SS_LISTENING:
if (!skb_queue_empty(&sk->sk_receive_queue))
mask |= (POLLIN | POLLRDNORM);
break;
@@ -922,7 +953,7 @@ static int __tipc_sendmsg(struct socket *sock, struct msghdr *m, size_t dsz)
return -EINVAL;
}
if (!is_connectionless) {
-   if (sock->state == SS_LISTENING)
+   if (sk->sk_state == TIPC_LISTEN)
return -EPIPE;
if (sock->state != SS_UNCONNECTED)
return -EISCONN;
@@ -1642,7 +1673,6 @@ static bool filter_connect(struct tipc_sock *tsk, struct sk_buff *skb)
msg_set_dest_droppable(hdr, 1);
return false;
 
-   case SS_LISTENING:
case SS_UNCONNECTED:
 
/* Accept only SYN message */
@@ -2017,15 +2047,9 @@ static int tipc_listen(struct socket *sock, int len)
int res;
 
lock_sock(sk);
-
-   if (sock->state != SS_UNCONNECTED)
-   res = -EINVAL;
-   else {
-   sock->state = SS_LISTENING;
-   res = 0;
-   }
-
+   res = tipc_set_state(sk, TIPC_LISTEN);
release_soc

Re: [tipc-discussion] [RFC PATCH v1 09/12] tipc: create TIPC_UNCONNECTED as a new sk_state

2016-08-15 Thread Xue, Ying

Thanks for the below explanations. Please send out the next version.

Regards,
Ying

-Original Message-
From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] 
Sent: Wednesday, August 03, 2016 10:57 PM
To: Xue, Ying; tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com; erik.hu...@gmail.com
Subject: Re: [RFC PATCH v1 09/12] tipc: create TIPC_UNCONNECTED as a new 
sk_state

On 08/02/2016 11:16 AM, Xue, Ying wrote:
> @@ -1684,17 +1687,22 @@ static bool filter_connect(struct tipc_sock *tsk, struct sk_buff *skb)
>   msg_set_dest_droppable(hdr, 1);
>   return false;
>  
> - case SS_UNCONNECTED:
> + case SS_DISCONNECTING:
> + break;
> + }
>  
> + switch (sk->sk_state) {
> + case TIPC_UNCONNECTED:
> + break;
>
> If the TIPC_UNCONNECTED case simply breaks, filter_connect() will return 
> false. As a result, a SYN msg would be rejected when the socket is in the 
> TIPC_UNCONNECTED state. If so, I don't understand how a client stream socket 
> can receive a SYN msg.
[partha] We process sock->state intentionally before sk->sk_state. The client 
stream socket has its sock->state set to SS_CONNECTING, hence will consume the 
SYN msg from server and filter_connect() would return true. The case for 
TIPC_UNCONNECTED exists just to prevent an error message "Unknown sk_state..".
> Regards,
> Ying
>
>


Re: [tipc-discussion] [RFC PATCH v1 00/12] tipc: create socket FSM using sk_state only

2016-08-03 Thread Xue, Ying

Hi Partha,

Thanks, it's really a good job!

Each patch in the series is very clean and easy to understand.

I have reviewed all of the patches. Apart from some minor comments, I have no 
other objections.

Please add my Reviewed-by tag if you want.

Regards,
Ying

-Original Message-
From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] 
Sent: Monday, July 25, 2016 8:25 PM
To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com; Xue, Ying; erik.hu...@gmail.com
Subject: [RFC PATCH v1 00/12] tipc: create socket FSM using sk_state only

The following issues with the current socket layer hinder socket diagnostics 
implementation, which led to this patch series.

1. tipc socket state is derived from multiple variables like
   sock->state, tsk->probing_state and tsk->connected. This style forces
   us to export multiple attributes to the user space, which has to be
   backward compatible.

2. Abuse of sock->state cannot be exported to user-space without
   requiring tipc specific hacks in the user-space.
   - For connection less (CL) sockets sock->state is overloaded to
     tipc state SS_READY.
   - For connection oriented (CO) listening socket sock->state is
     overloaded to tipc state SS_LISTENING.

This series is split into three:
1. A bug fix in patch-1
2. Express all tipc states using a single variable. This is done in patch#2-5.
3. Migrate the new tipc states to sk->sk_state. This is done in patch#6-12.

The figures below represent the FSM after this series:

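Unconnected Sockets:
+------------------+    +--------------------+
| TIPC_UNCONNECTED |--->| TIPC_DISCONNECTING |
+------------------+    +--------------------+

Stream Server Listening Socket:
+------------------+    +-------------+
| TIPC_UNCONNECTED |--->| TIPC_LISTEN |
+------------------+    +-------------+
                               |
+--------------------+         |
| TIPC_DISCONNECTING |<--------+
+--------------------+

Stream Server Data Socket:
+-----------------+    +------------------+
|TIPC_UNCONNECTED |--> | TIPC_ESTABLISHED |<---+
+-----------------+    +------------------+    |
                           ^   |    |          |
                           |   |    +----------+
                           |   v
+------------------+  +-------------+
|TIPC_DISCONNECTING|<-|TIPC_PROBING |
+------------------+  +-------------+

Stream Socket Client:
+-----------------+   +-----------------+
|TIPC_UNCONNECTED |-->| TIPC_CONNECTING |
+-----------------+   +-----------------+
                               |
                               |
                               v
                      +------------------+
                      | TIPC_ESTABLISHED |<---+
                      +------------------+    |
                           ^   |    |         |
                           |   |    +---------+
                           |   v
+------------------+  +-------------+
|TIPC_DISCONNECTING|<-|TIPC_PROBING |
+------------------+  +-------------+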

NOTE:
This is just a base refactoring required for socket diagnostics.
The patches for socket diagnostics will be sent when they are ready.

Parthasarathy Bhuvaragan (12):
  tipc: set kern=0 in sk_alloc() during tipc_accept()
  tipc: rename tsk->remote to tsk->peer for consistent naming
  tipc: remove tsk->connected for connection less sockets
  tipc: remove tsk->connected from tipc_sock
  tipc: remove probing_intv from tipc_sock
  tipc: remove socket state SS_READY
  tipc: create TIPC_LISTEN as a new sk_state
  tipc: create TIPC_PROBING/TIPC_ESTABLISHED as new sk_states
  tipc: create TIPC_UNCONNECTED as a new sk_state
  tipc: create TIPC_DISCONNECTING as a new sk_state
  tipc: create TIPC_CONNECTING as a new sk_state
  tipc: remove SS_CONNECTED sock state

 include/uapi/linux/tipc.h |  12 ++
 net/tipc/socket.c | 344 ++
 2 files changed, 207 insertions(+), 149 deletions(-)

--
2.1.4


Re: [tipc-discussion] [RFC PATCH v1 12/12] tipc: remove SS_CONNECTED sock state

2016-08-03 Thread Xue, Ying

Just one minor suggestion:

s/tipc_sk_state_connected/tipc_sk_connected/

Regards,
Ying

-Original Message-
From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] 
Sent: Monday, July 25, 2016 8:25 PM
To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com; Xue, Ying; erik.hu...@gmail.com
Subject: [RFC PATCH v1 12/12] tipc: remove SS_CONNECTED sock state

In this commit, we remove the state SS_CONNECTED and replace it with the 
function tipc_sk_state_connected() wherever possible.
A socket with sk_state TIPC_ESTABLISHED or TIPC_PROBING replaces the socket 
state SS_CONNECTED.

After these changes, the sock->state is no longer explicitly used by tipc. The 
FSM below is for various types of connection oriented sockets.

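Stream Server Listening Socket:
+------------------+    +-------------+
| TIPC_UNCONNECTED |--->| TIPC_LISTEN |
+------------------+    +-------------+
                               |
+--------------------+         |
| TIPC_DISCONNECTING |<--------+
+--------------------+

Stream Server Data Socket:
+-----------------+    +------------------+
|TIPC_UNCONNECTED |--> | TIPC_ESTABLISHED |<---+
+-----------------+    +------------------+    |
                           ^   |    |          |
                           |   |    +----------+
                           |   v
+------------------+  +-------------+
|TIPC_DISCONNECTING|<-|TIPC_PROBING |
+------------------+  +-------------+

Stream Socket Client:
+-----------------+   +-----------------+
|TIPC_UNCONNECTED |-->| TIPC_CONNECTING |
+-----------------+   +-----------------+
                               |
                               |
                               v
                      +------------------+
                      | TIPC_ESTABLISHED |<---+
                      +------------------+    |
                           ^   |    |         |
                           |   |    +---------+
                           |   v
+------------------+  +-------------+
|TIPC_DISCONNECTING|<-|TIPC_PROBING |
+------------------+  +-------------+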

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>
---
 net/tipc/socket.c | 90 +--
 1 file changed, 41 insertions(+), 49 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index d2598186fb65..6128c1646866 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -286,7 +286,7 @@ static void tsk_rej_rx_queue(struct sock *sk)
 
 static bool tipc_sk_state_connected(struct sock *sk)
 {
-   return sk->sk_socket->state == SS_CONNECTED;
+   return sk->sk_state == TIPC_ESTABLISHED || sk->sk_state == TIPC_PROBING;
 }
 
 /* tipc_sk_type_connectionless - check if the socket is datagram socket
@@ -514,7 +514,7 @@ static int tipc_release(struct socket *sock)
kfree_skb(skb);
else {
if ((sk->sk_state == TIPC_CONNECTING) ||
-   (sock->state == SS_CONNECTED)) {
+   tipc_sk_state_connected(sk)) {
tipc_set_state(sk, TIPC_DISCONNECTING);
tipc_node_remove_conn(net, dnode, tsk->portid);
}
@@ -628,7 +628,7 @@ static int tipc_getname(struct socket *sock, struct sockaddr *uaddr,
 
memset(addr, 0, sizeof(*addr));
if (peer) {
-   if ((sock->state != SS_CONNECTED) &&
+   if ((!tipc_sk_state_connected(sk)) &&
((peer != 2) || (sk->sk_state != TIPC_DISCONNECTING)))
return -ENOTCONN;
addr->addr.id.ref = tsk_peer_port(tsk);
@@ -704,28 +704,24 @@ static unsigned int tipc_poll(struct file *file, struct socket *sock,
return mask;
}
 
-   switch ((int)sock->state) {
-   case SS_CONNECTED:
+   switch (sk->sk_state) {
+   case TIPC_PROBING:
+   case TIPC_ESTABLISHED:
if (!tsk->link_cong && !tsk_conn_cong(tsk))
mask |= POLLOUT;
+   /* fall thru' */
+   case TIPC_LISTEN:
+   case TIPC_CONNECTING:
if (!skb_queue_empty(&sk->sk_receive_queue))
mask |= (POLLIN | POLLRDNORM);
break;
-   default:
-   switch (sk->sk_state) {
-   case TIPC_UNCONNECTED:
-   if (!tsk->link_cong)
-   mask |= POLLOUT;
-   break;
-   case TIPC_LISTEN:
-   case TIPC_CONNECTING:
-   if (!skb_queue_empty(&sk->sk_receive_queue))
-   mask |= (POLLIN | POLLRDNORM);
-

Re: [tipc-discussion] [RFC PATCH v1 11/12] tipc: create TIPC_CONNECTING as a new sk_state

2016-08-02 Thread Xue, Ying

Acked-by: Ying Xue <ying@windriver.com>

-Original Message-
From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] 
Sent: Monday, July 25, 2016 8:25 PM
To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com; Xue, Ying; erik.hu...@gmail.com
Subject: [RFC PATCH v1 11/12] tipc: create TIPC_CONNECTING as a new sk_state

In this commit, we create a new tipc socket state TIPC_CONNECTING by primarily 
replacing the SS_CONNECTING with TIPC_CONNECTING.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>
---
 include/uapi/linux/tipc.h |  1 +
 net/tipc/socket.c | 53 +++
 2 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/include/uapi/linux/tipc.h b/include/uapi/linux/tipc.h
index bbb1c1a61961..3d4844e42172 100644
--- a/include/uapi/linux/tipc.h
+++ b/include/uapi/linux/tipc.h
@@ -163,6 +163,7 @@ enum {
TIPC_ESTABLISHED,
TIPC_UNCONNECTED,
TIPC_DISCONNECTING,
+   TIPC_CONNECTING,
 };
 
 /*
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 216e66bd2eff..d2598186fb65 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -342,7 +342,6 @@ static bool tsk_peer_msg(struct tipc_sock *tsk, struct tipc_msg *msg)
  */
 static int tipc_set_state(struct sock *sk, int state)
 {
-   int oldstate = sk->sk_socket->state;
int oldsk_state = sk->sk_state;
int res = -EINVAL;
 
@@ -352,6 +351,7 @@ static int tipc_set_state(struct sock *sk, int state)
res = 0;
break;
case TIPC_LISTEN:
+   case TIPC_CONNECTING:
if (oldsk_state == TIPC_UNCONNECTED)
res = 0;
break;
@@ -362,7 +362,7 @@ static int tipc_set_state(struct sock *sk, int state)
case TIPC_ESTABLISHED:
if (oldsk_state == TIPC_PROBING ||
oldsk_state == TIPC_ESTABLISHED ||
-   oldstate == SS_CONNECTING ||
+   oldsk_state == TIPC_CONNECTING ||
oldsk_state == TIPC_UNCONNECTED)
res = 0;
break;
@@ -513,7 +513,7 @@ static int tipc_release(struct socket *sock)
if (TIPC_SKB_CB(skb)->handle != NULL)
kfree_skb(skb);
else {
-   if ((sock->state == SS_CONNECTING) ||
+   if ((sk->sk_state == TIPC_CONNECTING) ||
(sock->state == SS_CONNECTED)) {
tipc_set_state(sk, TIPC_DISCONNECTING);
tipc_node_remove_conn(net, dnode, tsk->portid); 
@@ -708,8 +708,6 @@ static unsigned int tipc_poll(struct file *file, struct 
socket *sock,
case SS_CONNECTED:
if (!tsk->link_cong && !tsk_conn_cong(tsk))
mask |= POLLOUT;
-   /* fall thru' */
-   case SS_CONNECTING:
if (!skb_queue_empty(>sk_receive_queue))
mask |= (POLLIN | POLLRDNORM);
break;
@@ -720,6 +718,7 @@ static unsigned int tipc_poll(struct file *file, struct 
socket *sock,
mask |= POLLOUT;
break;
case TIPC_LISTEN:
+   case TIPC_CONNECTING:
if (!skb_queue_empty(>sk_receive_queue))
mask |= (POLLIN | POLLRDNORM);
break;
@@ -1024,7 +1023,7 @@ new_mtu:
rc = tipc_node_xmit(net, , dnode, tsk->portid);
if (likely(!rc)) {
if (!is_connectionless)
-   sock->state = SS_CONNECTING;
+   tipc_set_state(sk, TIPC_CONNECTING);
return dsz;
}
if (rc == -ELINKCONG) {
@@ -1654,9 +1653,10 @@ static bool filter_connect(struct tipc_sock *tsk, struct 
sk_buff *skb)
  tsk->portid);
}
return true;
+   }
 
-   case SS_CONNECTING:
-
+   switch (sk->sk_state) {
+   case TIPC_CONNECTING:
/* Accept only ACK or NACK message */
if (unlikely(!msg_connected(hdr)))
return false;
@@ -1960,7 +1960,8 @@ static int tipc_wait_for_connect(struct socket *sock, 
long *timeo_p)
return sock_intr_errno(*timeo_p);
 
prepare_to_wait(sk_sleep(sk), , TASK_INTERRUPTIBLE);
-   done = sk_wait_event(sk, timeo_p, sock->state != SS_CONNECTING);
+   done = sk_wait_event(sk, timeo_p,
+sk->sk_state != TIPC_CONNECTING);
finish_wait(sk_sleep(sk), );
} while (!done);
return 0;
@@ -1983,7 +1984,7

Re: [tipc-discussion] [RFC PATCH v1 08/12] tipc: create TIPC_PROBING/TIPC_ESTABLISHED as new sk_states

2016-07-28 Thread Xue, Ying

Acked-by: Ying Xue <ying@windriver.com>

-Original Message-
From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] 
Sent: Monday, July 25, 2016 8:25 PM
To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com; Xue, Ying; erik.hu...@gmail.com
Subject: [RFC PATCH v1 08/12] tipc: create TIPC_PROBING/TIPC_ESTABLISHED as new 
sk_states

Until now, tipc maintains probing state for connected sockets in
tsk->probing_state variable.

In this commit, we express this information as socket states and thus remove 
the variable. The sk_state is set to TIPC_PROBING instead of setting 
probing_state to TIPC_CONN_PROBING. Similarly, sk_state is set to 
TIPC_ESTABLISHED instead of setting probing_state to TIPC_CONN_OK.

There is no functional change in this commit.
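
The transition rules described above can be illustrated with a small userspace sketch. This is not the kernel code: all demo_* names are hypothetical, the enum values only mirror the ones in the patch, and the tail of the function (store the new state and return 0) is an assumption, since the corresponding part of the hunk is not shown.

```c
#include <errno.h>

/* Demo copies of the new sk_state values; they mirror the enum in the patch. */
enum demo_sk_state {
	DEMO_LISTEN = 1,
	DEMO_PROBING,
	DEMO_ESTABLISHED,
};

/* Demo stand-in for the generic socket states that are still consulted. */
enum demo_sock_state {
	DEMO_SS_UNCONNECTED,
	DEMO_SS_CONNECTING,
	DEMO_SS_CONNECTED,
};

struct demo_sock {
	enum demo_sock_state state;	/* plays the role of sk->sk_socket->state */
	int sk_state;			/* plays the role of sk->sk_state */
};

/* Accept a transition only if the old state allows it, else return -EINVAL. */
static int demo_set_state(struct demo_sock *sk, int state)
{
	int oldstate = sk->state;
	int oldsk_state = sk->sk_state;
	int res = -EINVAL;

	switch (state) {
	case DEMO_PROBING:
		/* Probing may only start from an established connection. */
		if (oldsk_state == DEMO_ESTABLISHED)
			res = 0;
		break;
	case DEMO_ESTABLISHED:
		if (oldsk_state == DEMO_PROBING ||
		    oldsk_state == DEMO_ESTABLISHED ||
		    oldstate == DEMO_SS_CONNECTING ||
		    oldstate == DEMO_SS_UNCONNECTED)
			res = 0;
		break;
	}

	if (res)
		return res;

	/* Assumption: a valid transition simply stores the new state. */
	sk->sk_state = state;
	return 0;
}
```

With this sketch, a probe reply maps to demo_set_state(sk, DEMO_ESTABLISHED) and a probe timeout maps to demo_set_state(sk, DEMO_PROBING), which is the same substitution the patch makes for the old probing_state assignments.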

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>
---
 include/uapi/linux/tipc.h |  2 ++
 net/tipc/socket.c | 23 +--
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/tipc.h b/include/uapi/linux/tipc.h
index 6b8c433fc4ee..7f437163c2e2 100644
--- a/include/uapi/linux/tipc.h
+++ b/include/uapi/linux/tipc.h
@@ -159,6 +159,8 @@ struct tipc_event {
  */
 enum {
TIPC_LISTEN = 1,
+   TIPC_PROBING,
+   TIPC_ESTABLISHED,
 };
 
 /*
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 4c7d861d0c7a..b92ff1ec0d09 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -47,8 +47,6 @@
 #define CONN_TIMEOUT_DEFAULT   8000/* default connect timeout = 8s */
 #define CONN_PROBING_INTERVAL  msecs_to_jiffies(360)  /* [ms] => 1 h */
 #define TIPC_FWD_MSG   1
-#define TIPC_CONN_OK   0
-#define TIPC_CONN_PROBING  1
 #define TIPC_MAX_PORT  0x
 #define TIPC_MIN_PORT  1
 
@@ -345,6 +343,7 @@ static bool tsk_peer_msg(struct tipc_sock *tsk, struct tipc_msg *msg)
 static int tipc_set_state(struct sock *sk, int state)
 {
int oldstate = sk->sk_socket->state;
+   int oldsk_state = sk->sk_state;
int res = -EINVAL;
 
switch (state) {
@@ -352,6 +351,17 @@ static int tipc_set_state(struct sock *sk, int state)
if (oldstate == SS_UNCONNECTED)
res = 0;
break;
+   case TIPC_PROBING:
+   if (oldsk_state == TIPC_ESTABLISHED)
+   res = 0;
+   break;
+   case TIPC_ESTABLISHED:
+   if (oldsk_state == TIPC_PROBING ||
+   oldsk_state == TIPC_ESTABLISHED ||
+   oldstate == SS_CONNECTING ||
+   oldstate == SS_UNCONNECTED)
+   res = 0;
+   break;
}
 
if (res)
@@ -852,7 +862,8 @@ static void tipc_sk_proto_rcv(struct tipc_sock *tsk, struct 
sk_buff *skb,
if (!tsk_peer_msg(tsk, hdr))
goto exit;
 
-   tsk->probing_state = TIPC_CONN_OK;
+   if (tipc_set_state(sk, TIPC_ESTABLISHED))
+   goto exit;
 
if (mtyp == CONN_PROBE) {
msg_set_type(hdr, CONN_PROBE_REPLY);
@@ -1189,8 +1200,8 @@ static void tipc_sk_finish_conn(struct tipc_sock *tsk, 
u32 peer_port,
msg_set_lookup_scope(msg, 0);
msg_set_hdr_sz(msg, SHORT_H_SIZE);
 
-   tsk->probing_state = TIPC_CONN_OK;
sk_reset_timer(sk, >sk_timer, jiffies + CONN_PROBING_INTERVAL);
+   tipc_set_state(sk, TIPC_ESTABLISHED);
tipc_node_add_conn(net, peer_node, tsk->portid, peer_port);
tsk->max_pkt = tipc_node_get_mtu(net, peer_node, tsk->portid);
tsk->peer_caps = tipc_node_get_capabilities(net, peer_node);
@@ -2250,7 +2261,7 @@ static void tipc_sk_timeout(unsigned long data)
peer_port = tsk_peer_port(tsk);
peer_node = tsk_peer_node(tsk);
 
-   if (tsk->probing_state == TIPC_CONN_PROBING) {
+   if (sk->sk_state == TIPC_PROBING) {
if (!sock_owned_by_user(sk)) {
sk->sk_socket->state = SS_DISCONNECTING;
tipc_node_remove_conn(sock_net(sk), tsk_peer_node(tsk), 
@@ -2268,7 +2279,7 @@ static void tipc_sk_timeout(unsigned long data)
skb = tipc_msg_create(CONN_MANAGER, CONN_PROBE,
  INT_H_SIZE, 0, peer_node, own_node,
  peer_port, tsk->portid, TIPC_OK);
-   tsk->probing_state = TIPC_CONN_PROBING;
+   tipc_set_state(sk, TIPC_PROBING);
sk_reset_timer(sk, >sk_timer, jiffies + CONN_PROBING_INTERVAL);
bh_unlock_sock(sk);
 
--
2.1.4

Re: [tipc-discussion] [RFC PATCH v1 06/12] tipc: remove socket state SS_READY

2016-07-28 Thread Xue, Ying

Acked-by: Ying Xue <ying@windriver.com>

-Original Message-
From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] 
Sent: Monday, July 25, 2016 8:25 PM
To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com; Xue, Ying; erik.hu...@gmail.com
Subject: [RFC PATCH v1 06/12] tipc: remove socket state SS_READY

Until now, tipc socket state SS_READY declares that the socket is a connection 
less socket.

In this commit, we remove the state SS_READY and replace it with a condition 
which returns true for datagram / connection less sockets.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>
---
 net/tipc/socket.c | 49 +++--
 1 file changed, 31 insertions(+), 18 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index a718f8281dde..68c52b59409d 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -45,7 +45,6 @@
 #include "netlink.h"
 
 #define SS_LISTENING   -1  /* socket is listening */
-#define SS_READY   -2  /* socket is connectionless */
 
 #define CONN_TIMEOUT_DEFAULT   8000/* default connect timeout = 8s */
 #define CONN_PROBING_INTERVAL  msecs_to_jiffies(360)  /* [ms] => 1 h */
@@ -294,6 +293,16 @@ static bool tipc_sk_state_connected(struct sock *sk)
return sk->sk_socket->state == SS_CONNECTED;
 }
 
+/* tipc_sk_type_connectionless - check if the socket is datagram socket
+ * @sk: socket
+ *
+ * Returns true if connection less, false otherwise
+ */
+static bool tipc_sk_type_connectionless(struct sock *sk)
+{
+   return sk->sk_type == SOCK_RDM || sk->sk_type == SOCK_DGRAM;
+}
+
 /* tsk_peer_msg - verify if message was sent by connected port's peer
  *
  * Handles cases where the node's network address has changed from
@@ -345,7 +354,6 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
 {
struct tipc_net *tn;
const struct proto_ops *ops;
-   socket_state state;
struct sock *sk;
struct tipc_sock *tsk;
struct tipc_msg *msg;
@@ -357,16 +365,13 @@ static int tipc_sk_create(struct net *net, struct socket 
*sock,
switch (sock->type) {
case SOCK_STREAM:
ops = _ops;
-   state = SS_UNCONNECTED;
break;
case SOCK_SEQPACKET:
ops = _ops;
-   state = SS_UNCONNECTED;
break;
case SOCK_DGRAM:
case SOCK_RDM:
ops = _ops;
-   state = SS_READY;
break;
default:
return -EPROTOTYPE;
@@ -387,7 +392,7 @@ static int tipc_sk_create(struct net *net, struct socket 
*sock,
 
/* Finish initializing socket data structures */
sock->ops = ops;
-   sock->state = state;
+   sock->state = SS_UNCONNECTED;
sock_init_data(sock, sk);
if (tipc_sk_insert(tsk)) {
pr_warn("Socket create failed; port number exhausted\n");
@@ -407,7 +412,7 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
tsk->snd_win = tsk_adv_blocks(RCVBUF_MIN);
tsk->rcv_win = tsk->snd_win;
 
-   if (sock->state == SS_READY) {
+   if (tipc_sk_type_connectionless(sk)) {
tsk_set_unreturnable(tsk, true);
if (sock->type == SOCK_DGRAM)
tsk_set_unreliable(tsk, true);
@@ -651,12 +656,19 @@ static unsigned int tipc_poll(struct file *file, struct 
socket *sock,
 
sock_poll_wait(file, sk_sleep(sk), wait);
 
+   if (tipc_sk_type_connectionless(sk)) {
+   if (!tsk->link_cong && !tsk_conn_cong(tsk))
+   mask |= POLLOUT;
+   if (!skb_queue_empty(>sk_receive_queue))
+   mask |= (POLLIN | POLLRDNORM);
+   return mask;
+   }
+
switch ((int)sock->state) {
case SS_UNCONNECTED:
if (!tsk->link_cong)
mask |= POLLOUT;
break;
-   case SS_READY:
case SS_CONNECTED:
if (!tsk->link_cong && !tsk_conn_cong(tsk))
mask |= POLLOUT;
@@ -890,6 +902,7 @@ static int __tipc_sendmsg(struct socket *sock, struct 
msghdr *m, size_t dsz)
struct tipc_msg *mhdr = >phdr;
u32 dnode, dport;
struct sk_buff_head pktchain;
+   bool is_connectionless = tipc_sk_type_connectionless(sk);
struct sk_buff *skb;
struct tipc_name_seq *seq;
struct iov_iter save;
@@ -900,7 +913,7 @@ static int __tipc_sendmsg(struct socket *sock, struct 
msghdr *m, size_t dsz)
if (dsz > TIPC_MAX_USER_MSG_SIZE)
return -EMSGSIZE;
if (unlikely(!dest)) {
-   if (sock->state == SS_READY && tsk->peer.family == AF_TIPC)
+

Re: [tipc-discussion] [RFC PATCH v1 03/12] tipc: remove tsk->connected for connection less sockets

2016-07-28 Thread Xue, Ying

Acked-by: Ying Xue <ying@windriver.com>

-Original Message-
From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] 
Sent: Monday, July 25, 2016 8:25 PM
To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com; Xue, Ying; erik.hu...@gmail.com
Subject: [RFC PATCH v1 03/12] tipc: remove tsk->connected for connection less 
sockets

Until now, for connection less sockets the peer information during connect is 
stored in tsk->peer and a connection state is set in
tsk->connected. This is redundant.

In this commit, for connection less sockets we update:
- __tipc_sendmsg(), when the destination is NULL the peer existence
  is determined by tsk->peer.family, instead of tsk->connected.
- tipc_connect(), remove set/unset of tsk->connected.
Hence tsk->connected is no longer used for connection less sockets.

There is no functional change in this commit.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>
---
 net/tipc/socket.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 4b434c9ca4c6..c4bf810aa396 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -899,7 +899,7 @@ static int __tipc_sendmsg(struct socket *sock, struct 
msghdr *m, size_t dsz)
if (dsz > TIPC_MAX_USER_MSG_SIZE)
return -EMSGSIZE;
if (unlikely(!dest)) {
-   if (tsk->connected && sock->state == SS_READY)
+   if (sock->state == SS_READY && tsk->peer.family == AF_TIPC)
dest = >peer;
else
return -EDESTADDRREQ;
@@ -1930,12 +1930,10 @@ static int tipc_connect(struct socket *sock, struct 
sockaddr *dest,
if (sock->state == SS_READY) {
if (dst->family == AF_UNSPEC) {
memset(>peer, 0, sizeof(struct sockaddr_tipc));
-   tsk->connected = 0;
} else if (destlen != sizeof(struct sockaddr_tipc)) {
res = -EINVAL;
} else {
memcpy(>peer, dest, destlen);
-   tsk->connected = 1;
}
goto exit;
}
--
2.1.4

Re: [tipc-discussion] [RFC PATCH v1 02/12] tipc: rename tsk->remote to tsk->peer for consistent naming

2016-07-28 Thread Xue, Ying

Acked-by: Ying Xue <ying@windriver.com>

-Original Message-
From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] 
Sent: Monday, July 25, 2016 8:25 PM
To: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com; Xue, Ying; erik.hu...@gmail.com
Subject: [RFC PATCH v1 02/12] tipc: rename tsk->remote to tsk->peer for 
consistent naming

Until now, the peer information for connect is stored in tsk->remote but the 
rest of code uses the name peer for peer/remote.

In this commit, we rename tsk->remote to tsk->peer to align with naming 
convention followed in the rest of the code.

There is no functional change in this commit.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>
---
 net/tipc/socket.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 6b53251905e4..4b434c9ca4c6 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -65,7 +65,6 @@
  * @max_pkt: maximum packet size "hint" used when building messages sent by 
port
  * @portid: unique port identity in TIPC socket hash table
  * @phdr: preformatted message header used when sending messages
- * @port_list: adjacent ports in TIPC's global list of ports
  * @publications: list of publications for port
  * @pub_count: total # of publications port has made during its lifetime
  * @probing_state:
@@ -75,7 +74,7 @@
  * @link_cong: non-zero if owner must sleep because of link congestion
  * @sent_unacked: # messages sent by socket, and not yet acked by peer
  * @rcv_unacked: # messages read by user, but not yet acked back to peer
- * @remote: 'connected' peer for dgram/rdm
+ * @peer: 'connected' peer for dgram/rdm
  * @node: hash table node
  * @rcu: rcu struct for tipc_sock
  */
@@ -101,7 +100,7 @@ struct tipc_sock {
u16 peer_caps;
u16 rcv_unacked;
u16 rcv_win;
-   struct sockaddr_tipc remote;
+   struct sockaddr_tipc peer;
struct rhash_head node;
struct rcu_head rcu;
 };
@@ -901,7 +900,7 @@ static int __tipc_sendmsg(struct socket *sock, struct 
msghdr *m, size_t dsz)
return -EMSGSIZE;
if (unlikely(!dest)) {
if (tsk->connected && sock->state == SS_READY)
-   dest = >remote;
+   dest = >peer;
else
return -EDESTADDRREQ;
} else if (unlikely(m->msg_namelen < sizeof(*dest)) ||
@@ -1930,12 +1929,12 @@ static int tipc_connect(struct socket *sock, struct sockaddr *dest,
/* DGRAM/RDM connect(), just save the destaddr */
if (sock->state == SS_READY) {
if (dst->family == AF_UNSPEC) {
-   memset(>remote, 0, sizeof(struct sockaddr_tipc));
+   memset(>peer, 0, sizeof(struct sockaddr_tipc));
tsk->connected = 0;
} else if (destlen != sizeof(struct sockaddr_tipc)) {
res = -EINVAL;
} else {
-   memcpy(>remote, dest, destlen);
+   memcpy(>peer, dest, destlen);
tsk->connected = 1;
}
goto exit;
--
2.1.4

Re: [tipc-discussion] [PATCH net-next v5 0/2] tipc: bearer and link improvements

2016-07-26 Thread Xue, Ying

Thanks Jon!

Please add my acked-by tag.

Thanks,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Monday, July 25, 2016 10:43 PM
To: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
jon.ma...@ericsson.com
Cc: ma...@donjonn.com; gbala...@gmail.com
Subject: [PATCH net-next v5 0/2] tipc: bearer and link improvements

The first commit makes blocking of bearers available from the generic bearer 
layer. The second commit is a small improvement to the link congestion 
mechanism.

v2: - Removed the previous commit #2 ("tipc: reset all unicast links
  when broadcast link fails"), which should now go into 'net'.
- Reworked and reordered previous commit #1 ("tipc: make bearer
  packet filtering generic") to be based on the previous #2.
  I am aware that I am once again stirring up an old debate, but
  we have to come to a conclusion about this, since the current 
  solution is ugly and non-generic. We definitely need this when
  the broadcast link resets in a large cluster.
  
v3: - Made #1 rcu-safe by placing the block/unblock functions in the 
  media specific code, and added acces to those from the generic
  code via two new function pointers in the media interface.
  
v4: - Backed to v2, but made set/clear of the 'up' flag architecture
  safe according to comment from Ying.

v5: - Using test_bit() to read the value of the 'up' flag.

Jon Maloy (2):
  tipc: make bearer packet filtering generic
  tipc: ensure that link congestion and wakeup use same criteria

 net/tipc/bearer.c| 78 +++-
 net/tipc/bearer.h|  1 +
 net/tipc/link.c  | 18 ++--
 net/tipc/udp_media.c |  2 +-
 4 files changed, 52 insertions(+), 47 deletions(-)

--
2.7.4

Re: [tipc-discussion] [PATCH net-next v1 0/5] tipc: netlink updates for neighbour monitor

2016-07-25 Thread Xue, Ying

Hi Partha,

Yes, for this series I think we don't need to take any action other than 
waiting for David's response. 

However, I see that lots of people still submit patches to the net-next tree. 
Maybe we have a chance to get the series merged during this window.

Anyway, in the future it would be better to submit our patches before rc6.

Regards,
Ying

-Original Message-
From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] 
Sent: Monday, July 25, 2016 4:28 PM
To: Xue, Ying
Cc: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com
Subject: Re: [PATCH net-next v1 0/5] tipc: netlink updates for neighbour monitor

Hi Ying,

Thanks for the information. I will follow this while submitting to "net-next".

For this series, I might have to wait and see David's response, as it's tagged 
as "Under Review".

regards

Partha


On 07/23/2016 12:06 PM, Xue, Ying wrote:
> Hi Parthasarathy,
>
> I don't think this is a proper time to submit the series to the "net-next" 
> tree. As the kernel mainline version has reached 4.7-rc7, the "net-next" 
> tree has already entered the merge window. So I guess you will need to 
> resubmit the series once "net-next" is open again.
> In general, we don't encourage submitting patches to the net-next tree once 
> the kernel mainline version reaches rc6 or later, unless there are urgent 
> features/fixes that we really want merged into the mainline tree during this 
> window.
>
> Regards,
> Ying
>
> -Original Message-
> From: Parthasarathy Bhuvaragan 
> [mailto:parthasarathy.bhuvara...@ericsson.com]
> Sent: Friday, July 22, 2016 2:43 PM
> To: net...@vger.kernel.org
> Cc: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
> ma...@donjonn.com; Xue, Ying
> Subject: [PATCH net-next v1 0/5] tipc: netlink updates for neighbour 
> monitor
>
> This series contains the updates to configure and read the attributes for 
> neighbour monitor.
>
> Parthasarathy Bhuvaragan (5):
>   tipc: introduce constants for tipc address validation
>   tipc: make cluster size threshold for monitoring configurable
>   tipc: get monitor threshold for the cluster
>   tipc: add a function to get the bearer name
>   tipc: dump monitor attributes
>
>  include/uapi/linux/tipc.h |  30 ++-
>  include/uapi/linux/tipc_netlink.h |  37 +
>  net/tipc/addr.h   |   5 +-
>  net/tipc/bearer.c |  25 +-
>  net/tipc/bearer.h |   1 +
>  net/tipc/monitor.c| 152 +++
>  net/tipc/monitor.h|   9 +++
>  net/tipc/netlink.c|  27 ++-
>  net/tipc/netlink.h|   1 +
>  net/tipc/node.c   | 165 
> ++
>  net/tipc/node.h   |   5 ++
>  11 files changed, 445 insertions(+), 12 deletions(-)
>
> --
> 2.1.4
>

Re: [tipc-discussion] [PATCH net-next v1 0/5] tipc: netlink updates for neighbour monitor

2016-07-23 Thread Xue, Ying

Hi Parthasarathy,

I don't think this is a proper time to submit the series to the "net-next" 
tree. As the kernel mainline version has reached 4.7-rc7, the "net-next" tree 
has already entered the merge window. So I guess you will need to resubmit the 
series once "net-next" is open again.
In general, we don't encourage submitting patches to the net-next tree once 
the kernel mainline version reaches rc6 or later, unless there are urgent 
features/fixes that we really want merged into the mainline tree during this 
window.

Regards,
Ying

-Original Message-
From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] 
Sent: Friday, July 22, 2016 2:43 PM
To: net...@vger.kernel.org
Cc: tipc-discussion@lists.sourceforge.net; jon.ma...@ericsson.com; 
ma...@donjonn.com; Xue, Ying
Subject: [PATCH net-next v1 0/5] tipc: netlink updates for neighbour monitor

This series contains the updates to configure and read the attributes for 
neighbour monitor.

Parthasarathy Bhuvaragan (5):
  tipc: introduce constants for tipc address validation
  tipc: make cluster size threshold for monitoring configurable
  tipc: get monitor threshold for the cluster
  tipc: add a function to get the bearer name
  tipc: dump monitor attributes

 include/uapi/linux/tipc.h |  30 ++-
 include/uapi/linux/tipc_netlink.h |  37 +
 net/tipc/addr.h   |   5 +-
 net/tipc/bearer.c |  25 +-
 net/tipc/bearer.h |   1 +
 net/tipc/monitor.c| 152 +++
 net/tipc/monitor.h|   9 +++
 net/tipc/netlink.c|  27 ++-
 net/tipc/netlink.h|   1 +
 net/tipc/node.c   | 165 ++
 net/tipc/node.h   |   5 ++
 11 files changed, 445 insertions(+), 12 deletions(-)

--
2.1.4

Re: [tipc-discussion] [PATCH net-next v4 1/2] tipc: make bearer packet filtering generic

2016-07-23 Thread Xue, Ying

Hi Jon,

We need to use test_bit() to test whether "up" is set or not.

Regards,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Friday, July 22, 2016 11:00 PM
To: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
jon.ma...@ericsson.com
Cc: ma...@donjonn.com; gbala...@gmail.com
Subject: [PATCH net-next v4 1/2] tipc: make bearer packet filtering generic

In commit 5b7066c3dd24 ("tipc: stricter filtering of packets in bearer
layer") we introduced a method of filtering out messages while a bearer is 
being reset, to avoid that links may be re-created and come back in working 
state while we are still in the process of shutting them down.

This solution works well, but is limited to only work with L2 media, which is 
insufficient with the increasing use of UDP as carrier media.

We now replace this solution with a more generic one, by introducing a new flag 
"up" in the generic struct tipc_bearer. This field will be set and reset at the 
same locations as with the previous solution, while the packet filtering is 
moved to the generic code for the sending side.
On the receiving side, the filtering is still done in media specific code, but 
now including the UDP bearer.
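
To make the idea concrete, here is a minimal userspace sketch of an "up" flag that the generic send path checks, together with a three-pass reset (block all, reset all, unblock all). It uses C11 atomics rather than the kernel bit operations, and every sketch_* name is hypothetical; it only illustrates the pattern and is not the code in this patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_BEARERS 3

/* Hypothetical, heavily reduced bearer: only what the sketch needs. */
struct sketch_bearer {
	atomic_ulong up;	/* bit 0 set => bearer may carry traffic */
	int resets;		/* how many times the bearer was reset */
	int sent;		/* packets handed on to the media layer */
};

/* Generic send path: silently drop unless the bearer is marked up. */
static bool sketch_bearer_xmit(struct sketch_bearer *b)
{
	if (!b || !(atomic_load(&b->up) & 1UL))
		return false;
	b->sent++;
	return true;
}

/* Block every bearer first, then reset them all, then unblock them. */
static void sketch_bearer_reset_all(struct sketch_bearer *list[SKETCH_MAX_BEARERS])
{
	int i;

	for (i = 0; i < SKETCH_MAX_BEARERS; i++)
		if (list[i])
			atomic_fetch_and(&list[i]->up, ~1UL);

	for (i = 0; i < SKETCH_MAX_BEARERS; i++)
		if (list[i])
			list[i]->resets++;	/* stands in for the real reset */

	for (i = 0; i < SKETCH_MAX_BEARERS; i++)
		if (list[i])
			atomic_fetch_or(&list[i]->up, 1UL);
}
```

Splitting the reset into three passes means no bearer is unblocked while another one is still being reset, so a link cannot come back up through a sibling bearer in the middle of the operation.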

Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/bearer.c| 77 ++--
 net/tipc/bearer.h|  1 +
 net/tipc/udp_media.c |  2 +-
 3 files changed, 41 insertions(+), 39 deletions(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 65b1bbf..187e533 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -56,6 +56,13 @@ static struct tipc_media * const media_info_array[] = {
NULL
 };
 
+static struct tipc_bearer *bearer_get(struct net *net, int bearer_id)
+{
+   struct tipc_net *tn = tipc_net(net);
+
+   return rcu_dereference_rtnl(tn->bearer_list[bearer_id]);
+}
+
 static void bearer_disable(struct net *net, struct tipc_bearer *b);
 
 /**
@@ -323,6 +330,7 @@ restart:
b->domain = disc_domain;
b->net_plane = bearer_id + 'A';
b->priority = priority;
+   test_and_set_bit_lock(0, >up);
 
res = tipc_disc_create(net, b, >bcast_addr, );
if (res) {
@@ -360,15 +368,24 @@ static int tipc_reset_bearer(struct net *net, struct tipc_bearer *b)
  */
 void tipc_bearer_reset_all(struct net *net)
 {
-   struct tipc_net *tn = tipc_net(net);
struct tipc_bearer *b;
int i;
 
for (i = 0; i < MAX_BEARERS; i++) {
-   b = rcu_dereference_rtnl(tn->bearer_list[i]);
+   b = bearer_get(net, i);
+   if (b)
+   clear_bit_unlock(0, >up);
+   }
+   for (i = 0; i < MAX_BEARERS; i++) {
+   b = bearer_get(net, i);
if (b)
tipc_reset_bearer(net, b);
}
+   for (i = 0; i < MAX_BEARERS; i++) {
+   b = bearer_get(net, i);
+   if (b)
+   test_and_set_bit_lock(0, >up);
+   }
 }
 
 /**
@@ -382,8 +399,9 @@ static void bearer_disable(struct net *net, struct 
tipc_bearer *b)
int bearer_id = b->identity;
 
pr_info("Disabling bearer <%s>\n", b->name);
-   b->media->disable_media(b);
+   clear_bit_unlock(0, >up);
tipc_node_delete_links(net, bearer_id);
+   b->media->disable_media(b);
RCU_INIT_POINTER(b->media_ptr, NULL);
if (b->link_req)
tipc_disc_delete(b->link_req);
@@ -440,22 +458,16 @@ int tipc_l2_send_msg(struct net *net, struct sk_buff *skb,
 {
struct net_device *dev;
int delta;
-   void *tipc_ptr;
 
dev = (struct net_device *)rcu_dereference_rtnl(b->media_ptr);
if (!dev)
return 0;
 
-   /* Send RESET message even if bearer is detached from device */
-   tipc_ptr = rcu_dereference_rtnl(dev->tipc_ptr);
-   if (unlikely(!tipc_ptr && !msg_is_reset(buf_msg(skb))))
-   goto drop;
-
-   delta = dev->hard_header_len - skb_headroom(skb);
-   if ((delta > 0) &&
-   pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC))
-   goto drop;
-
+   delta = SKB_DATA_ALIGN(dev->hard_header_len - skb_headroom(skb));
+   if ((delta > 0) && pskb_expand_head(skb, delta, 0, GFP_ATOMIC)) {
+   kfree_skb(skb);
+   return 0;
+   }
skb_reset_network_header(skb);
skb->dev = dev;
skb->protocol = htons(ETH_P_TIPC);
@@ -463,9 +475,6 @@ int tipc_l2_send_msg(struct net *net, struct sk_buff *skb,
dev->dev_addr, skb->len);
dev_queue_xmit(skb);
return 0;
-drop:
-   kfree_skb(skb);
-   return 0;
 }
 
 int tipc_bearer_mtu(struct net *net, u32

Re: [tipc-discussion] [PATCH net-next v3 1/2] tipc: make bearer packet filtering generic

2016-07-22 Thread Xue, Ying

Hi Jon,

Please see my comments below:
===
What do you mean with "safer" here? If it means less risk for stale values I am 
all for it. But atomic operations have the unfortunate effect that they trash 
the cache (at least the atomic_t type). What we really want is operations that 
write through the cache when we alter the variable's value, but work like a 
normal read from the cache when we only want to check its value (on the send 
and receive path). Is this what you want to achieve?
===

To understand why an atomic bit/value is safer than longs, ints, chars or 
bools, we need to know the following three things:
1. Operations on an integer variable must be atomic. Almost all modern 
processors are 32-bit or 64-bit, so reading or writing an integer variable is 
atomic. But there is an exception: if the int variable is not aligned to 32 or 
64 bits, the operation will not be atomic. For an atomic variable, there is no 
such exception.

2. At compile time, we should prevent the compiler from optimizing the accesses 
to the integer variable. That's why an atomic variable is declared with the 
"volatile" keyword on some architectures. But on most architectures, for 
example x86, the atomic type is defined as a signed integer.

3. At run time, we should prevent the CPU from reordering the accesses to an 
integer variable and the variables that depend on it, which may result in code 
that suddenly breaks on some architectures. Normally, we have to use memory 
barrier primitives to prevent these bad things from happening. 

In our case, as I mentioned previously, reordering of the "up" variable does 
not seem to be a problem for us. So the latter two factors have no negative 
impact on us. As for the first factor, as long as the "up" variable is aligned 
to 32 or 64 bits, it is no problem either. But I still suggest we use atomic 
bitops instead of an integer or bool type, because the atomic type hides the 
detailed behavior of the different architectures from us. 

For more detailed information, please refer to:
Documentation/atomic_ops.txt
Documentation/memory-barriers.txt
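
As a rough userspace illustration of the suggestion above, the following sketch uses C11 <stdatomic.h> as a stand-in for the kernel's test_and_set_bit_lock(), clear_bit_unlock() and test_bit(). It is an analogy under stated assumptions, not kernel code: the struct and all demo_* helpers are hypothetical names invented for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bearer; only the flag word is modelled. */
struct demo_bearer {
	atomic_ulong up;	/* bit 0 set => bearer is up */
};

#define DEMO_UP_MASK	(1UL << 0)

/* Roughly like test_and_set_bit_lock(0, &b->up): atomically set the bit
 * with acquire ordering and report whether it was already set.
 */
static bool demo_bearer_set_up(struct demo_bearer *b)
{
	unsigned long old = atomic_fetch_or_explicit(&b->up, DEMO_UP_MASK,
						     memory_order_acquire);

	return (old & DEMO_UP_MASK) != 0;
}

/* Roughly like clear_bit_unlock(0, &b->up): atomically clear the bit
 * with release ordering.
 */
static void demo_bearer_set_down(struct demo_bearer *b)
{
	atomic_fetch_and_explicit(&b->up, ~DEMO_UP_MASK, memory_order_release);
}

/* Roughly like test_bit(0, &b->up): a plain atomic load, with no
 * read-modify-write on the send and receive paths.
 */
static bool demo_bearer_is_up(struct demo_bearer *b)
{
	return (atomic_load_explicit(&b->up, memory_order_relaxed) &
		DEMO_UP_MASK) != 0;
}
```

The point of the split is that only the rare state changes pay for an atomic read-modify-write, while the frequent check on the data path is an ordinary load.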

Regards,
Ying

BR
///jon

>
> In fact your mentioned phenomenon doesn't cause any trouble for us even if 
> b->up is set back to true after the bearer list and dev->tipc_ptr are set to 
> NULL.  As the entire tipc_node_bc_rcv() is covered by rcu_read_lock(), bearer 
> instance cannot be immediately  destroyed. Moreover, even if b->up is changed 
> back to true, it doesn't matter as the bearer is destroyed subsequently.
>
> Furthermore, even we don't need to explicitly add memory barriers around the 
> operations of "up" variable. Now when we detects whether a bearer is blocked 
> or not, we will check bearer_list[i] is NULL and "up" is true or not on xmit 
> path, and check dev->tipc_ptr is NULL and "up" is true on receive path. This 
> means as long as bearer_list[i] is NULL or "up" is false on xmit, and 
> dev->tipc_ptr is NULL or "up" is false on receive path, we cannot do further 
> normal actions on any of paths. In other words, whatever setting 
> bearer_list[i] to NULL or setting check dev->tipc_ptr to NULL occurs before 
> "up" is changed as true or false, it doesn't have any negative impact on our 
> desired behavior.
>
> Regards,
> Ying
>
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Thursday, July 21, 2016 5:36 AM
> To: Xue, Ying; Xue, Ying; Jon Maloy; 
> tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan; 
> Richard Alpe
> Subject: RE: [tipc-discussion] [PATCH net-next v3 1/2] tipc: make 
> bearer packet filtering generic
>
> Hi Ying,
> I think we first need to have a common understanding of which problem you see 
> that we need to solve.
>
> My first impression was that it was the "stale" state data that you saw as a 
> problem, although I have tried to explain why I think we can live with that. 
> This solution has been tested in my 400- name space environment, and works 
> well.
>
> But reading your mail below, I came to understand that it is the parallel 
> access as such you see as the problem, with risk of accessing pointers to 
> freed memory. This makes more sense to me, as I can see the following 
> scenario happen:
>
> CPU #1
> CPU #2:
> --
> ---
> tipc_l2_rcv_msg() 
> tipc_nl_bearer_disable()
> rcu_read_lock()   
>

Re: [tipc-discussion] [PATCH net-next v3 1/2] tipc: make bearer packet filtering generic

2016-07-21 Thread Xue, Ying

Hi Jon,

Thank you for the very clear explanation below.

I agree with you.  We should go back to your original solution. 

The reason I objected to the initial solution is that we cannot directly modify 
the b->up variable on the RCU read side if we take the position that the "up" 
flag is also protected by RCU.
On the contrary, if we consider b->up not to be protected by RCU, but guarantee 
that writing/reading it is always atomic on SMP, it is safe for us to change 
b->up even on the RCU read side.
Therefore, I suggest we define "up" as an atomic bit type, and use atomic bit 
operations on the "up" variable.
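
A minimal user-space sketch of that suggestion, assuming C11 atomics as a 
stand-in for the kernel's atomic bitops. The struct, the bit index and the 
helper names below are hypothetical and are not taken from the patch:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct tipc_bearer; only the flag word matters. */
struct bearer_sketch {
	atomic_ulong flags;
};

#define BEARER_UP_BIT 0UL	/* assumed bit index, chosen for this sketch */

/* Atomically set the "up" bit, in the spirit of set_bit(). */
static void bearer_set_up(struct bearer_sketch *b)
{
	atomic_fetch_or(&b->flags, 1UL << BEARER_UP_BIT);
}

/* Atomically clear the "up" bit, in the spirit of clear_bit(). */
static void bearer_clear_up(struct bearer_sketch *b)
{
	atomic_fetch_and(&b->flags, ~(1UL << BEARER_UP_BIT));
}

/* Atomically read the "up" bit, in the spirit of test_bit(). */
static bool bearer_is_up(struct bearer_sketch *b)
{
	return atomic_load(&b->flags) & (1UL << BEARER_UP_BIT);
}
```

Each access is a single atomic operation on the flag word, so a reader never 
observes a torn value regardless of the architecture.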

About your questions below, please see my answer:

I.e., the worst thing that can happen is that we might set the flag "up" back 
to true in a bearer instance that has been removed from both the bearer list 
and dev->tipc_ptr, and is pending for deletion. This is not a big problem, as I 
see it, and I don't see how your new proposal would solve this. The write 
access to to-be-freed memory would happen anyway. Is this the problem you are 
trying to solve, and how exactly does your proposal solve it?


A:  My proposal is not meant to solve the problem you describe; instead, it just 
makes it safer to operate on "up" on SMP.

In fact, the phenomenon you mention doesn't cause any trouble for us even if 
b->up is set back to true after the bearer list and dev->tipc_ptr are set to 
NULL. As the entire tipc_node_bc_rcv() is covered by rcu_read_lock(), the bearer 
instance cannot be destroyed immediately. Moreover, even if b->up is changed 
back to true, it doesn't matter, as the bearer is destroyed subsequently.

Furthermore, we don't even need to explicitly add memory barriers around the 
operations on the "up" variable. When we detect whether a bearer is blocked or 
not, we check whether bearer_list[i] is NULL and whether "up" is true on the 
xmit path, and whether dev->tipc_ptr is NULL and whether "up" is true on the 
receive path. This means that as long as bearer_list[i] is NULL or "up" is false 
on xmit, and dev->tipc_ptr is NULL or "up" is false on the receive path, we 
cannot take any further normal action on either path. In other words, whether 
setting bearer_list[i] to NULL or setting dev->tipc_ptr to NULL occurs before or 
after "up" is changed to true or false, it doesn't have any negative impact on 
our desired behavior.
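
The check described above can be sketched as follows. This is only an 
illustration: the struct and function names are hypothetical, and the snapshot 
stands for whatever values one path happens to observe at one instant:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of what one path observes. */
struct path_view {
	const void *bearer_ptr;	/* bearer_list[i] on xmit, dev->tipc_ptr on receive */
	bool up;		/* the "up" flag */
};

/* A path may proceed only when the pointer is set AND the flag is true. */
static bool path_may_proceed(const struct path_view *v)
{
	return v->bearer_ptr != NULL && v->up;
}
```

Once the pointer has been set to NULL, the result is false whatever value "up" 
holds, which is why the relative order of the two stores does not matter here.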

Regards,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Thursday, July 21, 2016 5:36 AM
To: Xue, Ying; Xue, Ying; Jon Maloy; tipc-discussion@lists.sourceforge.net; 
Parthasarathy Bhuvaragan; Richard Alpe
Subject: RE: [tipc-discussion] [PATCH net-next v3 1/2] tipc: make bearer packet 
filtering generic

Hi Ying,
I think we first need to have a common understanding of which problem you see 
that we need to solve.

My first impression was that it was the "stale" state data that you saw as a 
problem, although I have tried to explain why I think we can live with that. 
This solution has been tested in my 400- name space environment, and works well.

But reading your mail below, I came to understand that it is the parallel 
access as such you see as the problem, with risk of accessing pointers to freed 
memory. This makes more sense to me, as I can see the following scenario happen:

CPU #1                                                      CPU #2:
------                                                      -------
tipc_l2_rcv_msg()                                           tipc_nl_bearer_disable()
  rcu_read_lock()                                             rtnl_lock()
  tipc_rcv()                                                  bearer_disable(b)
  tipc_node_bc_rcv()
    tipc_bearer_reset_all()
      b->media->block()
        RCU_INIT_POINTER(dev->tipc_ptr, NULL);
      reset_bearer()
                                                                b->media->block()
                                                                  RCU_INIT_POINTER(dev->tipc_ptr, NULL);
      b->media->unblock()
        RCU_INIT_POINTER(dev->tipc_ptr, tn->blist[b_id]);
                                                                RCU_INIT_POINTER(tn->blist[b_id], NULL);
                                                                kfree_rcu(b()
                                                              rtnl_unlock()
  rcu_read_unloc

Re: [tipc-discussion] [PATCH net-next v3 1/2] tipc: make bearer packet filtering generic

2016-07-20 Thread Xue, Ying

Hi Jon,

I suggest that we introduce a flag in the bearer structure to reflect the 
bearer's status, as we do in v1. The only difference is that when we operate on 
the "blocked" flag, we should use test_and_set_bit_lock() and 
clear_bit_unlock() to modify it. Moreover, it's better to add 
smp_mb__after_atomic() after the "blocked" flag is changed.
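
A rough user-space analogue of this idea, assuming C11 atomic_flag with 
acquire/release ordering in place of test_and_set_bit_lock() and 
clear_bit_unlock(), and a sequentially consistent fence in place of 
smp_mb__after_atomic(). The struct and helper names are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the "blocked" flag in the bearer structure. */
struct bearer_flag_sketch {
	atomic_flag blocked;
};

/*
 * Set the flag with acquire ordering, then issue a full fence.
 * Returns true if this caller changed the flag from clear to set.
 */
static bool bearer_block(struct bearer_flag_sketch *b)
{
	bool was_set = atomic_flag_test_and_set_explicit(&b->blocked,
							 memory_order_acquire);

	atomic_thread_fence(memory_order_seq_cst);
	return !was_set;
}

/* Clear the flag with release ordering. */
static void bearer_unblock(struct bearer_flag_sketch *b)
{
	atomic_flag_clear_explicit(&b->blocked, memory_order_release);
}
```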

Regards,
Ying

-Original Message-
From: Xue, Ying [mailto:ying@windriver.com] 
Sent: Tuesday, July 19, 2016 10:10 AM
To: Jon Maloy; Jon Maloy; tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; richard.a...@ericsson.com
Subject: Re: [tipc-discussion] [PATCH net-next v3 1/2] tipc: make bearer packet 
filtering generic

Hi Jon,

Before the change, dev->tipc_ptr is set/reset through tipc_enable_l2_media() 
and tipc_disable_l2_media(), which are called by tipc_enable_bearer() and 
bearer_disable() respectively. bearer_disable(), in particular, might be 
invoked by four functions: tipc_enable_bearer(), tipc_l2_device_event(), 
tipc_bearer_stop() and tipc_nl_bearer_disable(). But whenever these four 
functions call bearer_disable(), the RTNL lock is already held.

However, after the change, at least we find the following path is unsafe for us:

tipc_l2_rcv_msg()
  rcu_read_lock()
  tipc_rcv()
  tipc_node_bc_rcv()
tipc_bearer_reset_all()
  b->media->block()
  RCU_INIT_POINTER(dev->tipc_ptr, NULL);

Regards,
Ying

-Original Message-
From: Jon Maloy [mailto:ma...@donjonn.com] 
Sent: Monday, July 18, 2016 7:24 PM
To: Xue, Ying; Jon Maloy; tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; richard.a...@ericsson.com
Subject: Re: [PATCH net-next v3 1/2] tipc: make bearer packet filtering generic



On 07/18/2016 05:41 AM, Xue, Ying wrote:
> Hi Jon,
>
> Although I know your purpose is right, unfortunately we cannot reset the bearer 
> the way we do in the patch. As you know, dev->tipc_ptr is protected 
> by RTNL on the write side and by RCU on the read side. According to RCU 
> locking semantics, we cannot change an RCU-protected object on the read side.

But we don't. This is the same solution we have now, except I made it 
accessible from generic code. Are you saying that the current upstream code 
also is wrong? I don't understand your comment.

///jon

> But so far, I cannot find a simpler method to do this than resetting the 
> dev->tipc_ptr pointer in workqueue context under the protection of the RTNL lock.
>
> Regards,
> Ying
>
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Saturday, July 16, 2016 4:44 AM
> To: tipc-discussion@lists.sourceforge.net; 
> parthasarathy.bhuvara...@ericsson.com; Xue, Ying; 
> richard.a...@ericsson.com; jon.ma...@ericsson.com
> Cc: ma...@donjonn.com
> Subject: [PATCH net-next v3 1/2] tipc: make bearer packet filtering 
> generic
>
> In commit 5b7066c3dd24 ("tipc: stricter filtering of packets in bearer
> layer") we introduced a method of filtering out messages while a bearer is 
> being reset, to avoid that links may be re-created and come back in working 
> state while we are still in the process of shutting them down.
>
> This solution works well, but is limited to only work with L2 media, which is 
> insufficient with the increasing use of UDP as carrier media.
> We also see the need to block/unblock bearers in a generic way from the 
> recently introduced function tipc_bearer_reset_all().
>
> We now replace this solution with a more generic one, by introducing two new 
> functions, "block()" and "unblock()" to the generic media interface.
> We then let each media type install its specific functions for this purpose 
> as pointers in the media interface. Finally, we let the UDP specific 
> send/receive functions also test for whether the bearer is blocked or not 
> before sending/delivering a packet.
>
> Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
> ---
>   net/tipc/bearer.c| 63 
> +++-
>   net/tipc/bearer.h|  5 -
>   net/tipc/eth_media.c |  2 ++
>   net/tipc/ib_media.c  |  2 ++
>   net/tipc/udp_media.c | 28 +++
>   5 files changed, 84 insertions(+), 16 deletions(-)
>
> diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index 
> 4131d5a..886f0c5 100644
> --- a/net/tipc/bearer.c
> +++ b/net/tipc/bearer.c
> @@ -56,6 +56,13 @@ static struct tipc_media * const media_info_array[] = {
>   NULL
>   };
>   
> +static struct tipc_bearer *bearer_get(struct net *net, int bearer_id) {
> + struct tipc_net *tn = tipc_net(net);
> +
> + return rcu_dereference_rtnl(tn->bearer_list[bearer_id]);
> +}
> +
>   static void bearer_disable(struct net *net, struct tipc_

Re: [tipc-discussion] [PATCH net-next v3 1/2] tipc: make bearer packet filtering generic

2016-07-18 Thread Xue, Ying

Hi Jon,

Before the change, dev->tipc_ptr is set/reset through tipc_enable_l2_media() 
and tipc_disable_l2_media(), which are called by tipc_enable_bearer() and 
bearer_disable() respectively. bearer_disable(), in particular, might be 
invoked by four functions: tipc_enable_bearer(), tipc_l2_device_event(), 
tipc_bearer_stop() and tipc_nl_bearer_disable(). But whenever these four 
functions call bearer_disable(), the RTNL lock is already held.

However, after the change, at least we find the following path is unsafe for us:

tipc_l2_rcv_msg()
  rcu_read_lock()
  tipc_rcv()
  tipc_node_bc_rcv()
tipc_bearer_reset_all()
  b->media->block()
  RCU_INIT_POINTER(dev->tipc_ptr, NULL);

Regards,
Ying

-Original Message-
From: Jon Maloy [mailto:ma...@donjonn.com] 
Sent: Monday, July 18, 2016 7:24 PM
To: Xue, Ying; Jon Maloy; tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; richard.a...@ericsson.com
Subject: Re: [PATCH net-next v3 1/2] tipc: make bearer packet filtering generic



On 07/18/2016 05:41 AM, Xue, Ying wrote:
> Hi Jon,
>
> Although I know your purpose is right, unfortunately we cannot reset the bearer 
> the way we do in the patch. As you know, dev->tipc_ptr is protected 
> by RTNL on the write side and by RCU on the read side. According to RCU 
> locking semantics, we cannot change an RCU-protected object on the read side.

But we don't. This is the same solution we have now, except I made it 
accessible from generic code. Are you saying that the current upstream code 
also is wrong? I don't understand your comment.

///jon

> But so far, I cannot find a simpler method to do this than resetting the 
> dev->tipc_ptr pointer in workqueue context under the protection of the RTNL lock.
>
> Regards,
> Ying
>
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Saturday, July 16, 2016 4:44 AM
> To: tipc-discussion@lists.sourceforge.net; 
> parthasarathy.bhuvara...@ericsson.com; Xue, Ying; 
> richard.a...@ericsson.com; jon.ma...@ericsson.com
> Cc: ma...@donjonn.com
> Subject: [PATCH net-next v3 1/2] tipc: make bearer packet filtering 
> generic
>
> In commit 5b7066c3dd24 ("tipc: stricter filtering of packets in bearer
> layer") we introduced a method of filtering out messages while a bearer is 
> being reset, to avoid that links may be re-created and come back in working 
> state while we are still in the process of shutting them down.
>
> This solution works well, but is limited to only work with L2 media, which is 
> insufficient with the increasing use of UDP as carrier media.
> We also see the need to block/unblock bearers in a generic way from the 
> recently introduced function tipc_bearer_reset_all().
>
> We now replace this solution with a more generic one, by introducing two new 
> functions, "block()" and "unblock()" to the generic media interface.
> We then let each media type install its specific functions for this purpose 
> as pointers in the media interface. Finally, we let the UDP specific 
> send/receive functions also test for whether the bearer is blocked or not 
> before sending/delivering a packet.
>
> Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
> ---
>   net/tipc/bearer.c| 63 
> +++-
>   net/tipc/bearer.h|  5 -
>   net/tipc/eth_media.c |  2 ++
>   net/tipc/ib_media.c  |  2 ++
>   net/tipc/udp_media.c | 28 +++
>   5 files changed, 84 insertions(+), 16 deletions(-)
>
> diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index 
> 4131d5a..886f0c5 100644
> --- a/net/tipc/bearer.c
> +++ b/net/tipc/bearer.c
> @@ -56,6 +56,13 @@ static struct tipc_media * const media_info_array[] = {
>   NULL
>   };
>   
> +static struct tipc_bearer *bearer_get(struct net *net, int bearer_id) {
> + struct tipc_net *tn = tipc_net(net);
> +
> + return rcu_dereference_rtnl(tn->bearer_list[bearer_id]);
> +}
> +
>   static void bearer_disable(struct net *net, struct tipc_bearer *b);
>   
>   /**
> @@ -339,15 +346,24 @@ static int tipc_reset_bearer(struct net *net, struct 
> tipc_bearer *b)
>*/
>   void tipc_bearer_reset_all(struct net *net)  {
> - struct tipc_net *tn = tipc_net(net);
>   struct tipc_bearer *b;
>   int i;
>   
>   for (i = 0; i < MAX_BEARERS; i++) {
> - b = rcu_dereference_rtnl(tn->bearer_list[i]);
> + b = bearer_get(net, i);
> + if (b)
> + b->media->block(b);
> + }
> + for (i = 0; i < MAX_BEARERS; i++) {
> + b = bearer_get(net, i);
>   if (b)
>   tipc_reset_bearer(net, b);
>   }
>

Re: [tipc-discussion] [PATCH net v3 0/3] tipc: three small fixes

2016-07-11 Thread Xue, Ying

Thanks, Jon!

The series looks fine to me.

Regards,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Friday, July 08, 2016 10:45 PM
To: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
jon.ma...@ericsson.com
Cc: ma...@donjonn.com
Subject: [PATCH net v3 0/3] tipc: three small fixes

Fixes for some broadcast link problems that may occur in large systems.

v2: Added a third commit to reset all unicast links when broadcast
send link fails.
v3: Removed redundant rcu_lock/unlock() in commit #3, as per feedback
from Ying.

Jon Maloy (3):
  tipc: extend broadcast link initialization criteria
  tipc: ensure correct broadcast send buffer release when peer is lost
  tipc: reset all unicast links when broadcast send link fails

 net/tipc/bearer.c | 15 +++
 net/tipc/bearer.h |  1 +
 net/tipc/link.c   |  9 -
 net/tipc/node.c   | 15 +++
 4 files changed, 35 insertions(+), 5 deletions(-)

-- 
1.9.1

--
Attend Shape: An AT Tech Expo July 15-16. Meet us at AT Park in San
Francisco, CA to explore cutting-edge tech and listen to tech luminaries
present their vision of the future. This family event has something for
everyone, including kids. Get more information and register today.
http://sdm.link/attshape
___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Re: [tipc-discussion] Driver crash

2016-07-07 Thread Xue, Ying

Hi,

I believe commit 333f796235a52727db7e0a13888045f3aa3d5335 ("tipc: fix a race 
condition leading to subscriber refcnt bug") has not been applied to your kernel. 
Please integrate it and test again.

Regards,
Ying

-Original Message-
From: Jon Maloy [mailto:ma...@donjonn.com] 
Sent: Thursday, July 07, 2016 7:18 AM
To: tipc-discussion@lists.sourceforge.net
Subject: Re: [tipc-discussion] Driver crash

Hi,
I think this type of bug is on Partha's table (or maybe Ying's), but he is on 
vacation until next week, and I don't have time right now.
I cannot immediately figure out any smarter way to do this, but why does the 
timeout need to be this short? Does it work with 2 ms, or 10 ms?

///jon


On 07/06/2016 05:34 PM, Rune Torgersen wrote:
> I think this is probably caused by my use of the topology server with a very 
> short timeout to get a list of active servers on a specific address range.
>
> If there is a better way of getting just that list, I'd like to rewrite my code 
> to use that.
>
>
> The code that I suspect is the following, with a timeout of 1ms.

Why do you have a timeout at all? Use 0, and the subscription will never time 
out.
>
> void WaitForServers(uint32 servertype, uint32 serverinstanceLow, 
> uint32 serverinstanceHigh, uint32 timeout_ms, std::set<uint32> & 
> listeningServers) {
>  struct sockaddr_tipc topsrv;
>  struct tipc_subscr subscription = {{servertype, serverinstanceLow, 
> serverinstanceHigh},
>  timeout_ms, TIPC_SUB_SERVICE, {}};
>  struct tipc_event event;
>
>  int sd = socket(AF_TIPC, SOCK_SEQPACKET, 0);
>  if (sd <= 0)
>  {
>  return;
>  }
>
>  memset(&topsrv, 0, sizeof(topsrv));
>  topsrv.family = AF_TIPC;
>  topsrv.addrtype = TIPC_ADDR_NAME;
>  topsrv.addr.name.name.type = TIPC_TOP_SRV;
>  topsrv.addr.name.name.instance = TIPC_TOP_SRV;
>
>  /* Connect to topology server: */
>
>  if (0 > connect(sd, (struct sockaddr*)&topsrv, sizeof(topsrv)))
>  {
>  close(sd);
>  return;
>  }
>
>  if (send(sd, &subscription, sizeof(subscription), 0) != 
> sizeof(subscription))
>  {
>  close(sd);
>  return;
>  }
>
>  bool timedOut = false;
>  /* Now wait for the subscription to fire: */
>  while ((!timedOut)
>  &&
>  (recv(sd, &event, sizeof(event), 0) == sizeof(event)))
>  {
>  if (event.event == TIPC_PUBLISHED)
>  {
>  listeningServers.insert(event.found_lower);
>  }
>  else if (event.event == TIPC_SUBSCR_TIMEOUT)
>  {
>  timedOut = true;
>  }
>  }
>
>  close(sd);
>
>  return;
> }
>
> -Original Message-
> From: Rune Torgersen
> Sent: Wednesday, July 06, 2016 4:02 PM
> To: tipc-discussion@lists.sourceforge.net
> Subject: Driver crash
>
> Running 4.4.0-28 (Ubuntu 16.04):
>
> [448650.106212] BUG: unable to handle kernel NULL pointer dereference 
> at 0556 [448650.106254] IP: [] 
> tipc_nametbl_unsubscribe+0x72/0x100 [tipc] [448650.106288] PGD 
> 150efa067 PUD 1059ba067 PMD 0 [448650.106308] Oops: 0002 [#1] SMP 
> [448650.106323] Modules linked in: tipc cfg80211 cls_fw sch_sfq 
> sch_htb xt_CLASSIFY xt_multiport iptable_mangle ip_tables x_tables 
> nls_iso8859_1 joydev input_leds hid_generic intel_rapl x86_pkg_temp_thermal 
> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul 
> crc32_pclmul usbhid hid ixgbe sb_edac vxlan edac_core ip6_udp_tunnel igb 
> udp_tunnel ptp pps_core mei_me lpc_ich mei mdio ioatdma shpchp dca 
> 8250_fintek fjes ipmi_ssif acpi_power_meter acpi_pad quota_v2 quota_tree 
> mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp 
> libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_si ipmi_devintf 
> ipmi_msghandler autofs4 raid10 raid456 libcrc32c async_raid6_recov 
> async_memcpy async_pq async_xor xor async_tx raid6_pq raid0 multipath linear 
> raid1 aesni_intel aes_x86_64 glue_helper [448650.106659]  lrw gf128mul 
> ablk_helper ast cryptd i2c_algo_bit ttm drm_kms_helper syscopyarea 
> sysfillrect sysimgblt fb_sys_fops drm ahci libahci wmi
> [448650.106736] CPU: 0 PID: 8073 Comm: kworker/u65:3 Tainted: GW  
>  4.4.0-28-generic #47-Ubuntu
> [448650.106767] Hardware name: Supermicro Super Server/X10DRL-i, BIOS 
> 1.1b 09/11/2015 [448650.106799] Workqueue: tipc_rcv tipc_recv_work 
> [tipc] [448650.106817] task: 880019265280 ti: 8808ba078000 
> task.ti: 8808ba078000 [448650.106840] RIP: 
> 0010:[]  [] 
> tipc_nametbl_unsubscribe+0x72/0x100 [tipc] [448650.106876] RSP: 
> 0018:88085fa03e10  EFLAGS: 00010246 [448650.106893] RAX: 
> 88085719af80 RBX: 88085719af00 RCX: 054e 
> [448650.106923] RDX: 001a RSI: 0067 RDI: 
> 88085b355518 [448650.106950] RBP: 88085fa03e30 R08: 
> 88085fa16d00 R09: 0001 [448650.106973] R10: 
> 0001 R11:  R12: 88085b3554e0 
> [448650.107001] R13:

Re: [tipc-discussion] [PATCH net-next 2/6] tipc: split UDP send function

2016-06-29 Thread Xue, Ying

Acked-by: Ying Xue <ying@windriver.com>

-Original Message-
From: Richard Alpe [mailto:richard.a...@ericsson.com] 
Sent: Tuesday, June 28, 2016 5:22 PM
To: tipc-discussion@lists.sourceforge.net
Subject: [tipc-discussion] [PATCH net-next 2/6] tipc: split UDP send function

Split the UDP send function into two. One callback that prepares the skb and 
one transmit function that sends the skb. This will come in handy in later 
patches, when we introduce UDP replicast.

Signed-off-by: Richard Alpe <richard.a...@ericsson.com>
---
 net/tipc/udp_media.c | 50 --
 1 file changed, 32 insertions(+), 18 deletions(-)

diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index 7edfb47..854faf7 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -140,28 +140,13 @@ static int tipc_udp_addr2msg(char *msg, struct tipc_media_addr *a)
 }
 
 /* tipc_send_msg - enqueue a send request */
-static int tipc_udp_send_msg(struct net *net, struct sk_buff *skb,
-struct tipc_bearer *b,
-struct tipc_media_addr *dest)
+static int tipc_udp_xmit(struct net *net, struct sk_buff *skb,
+struct udp_bearer *ub, struct udp_media_addr *src,
+struct udp_media_addr *dst)
 {
int ttl, err = 0;
-   struct udp_bearer *ub;
-   struct udp_media_addr *dst = (struct udp_media_addr *)&dest->value;
-   struct udp_media_addr *src = (struct udp_media_addr *)&b->addr.value;
struct rtable *rt;
 
-   if (skb_headroom(skb) < UDP_MIN_HEADROOM) {
-   err = pskb_expand_head(skb, UDP_MIN_HEADROOM, 0, GFP_ATOMIC);
-   if (err)
-   goto tx_error;
-   }
-
-   skb_set_inner_protocol(skb, htons(ETH_P_TIPC));
-   ub = rcu_dereference_rtnl(b->media_ptr);
-   if (!ub) {
-   err = -ENODEV;
-   goto tx_error;
-   }
if (dst->proto == htons(ETH_P_IP)) {
struct flowi4 fl = {
.daddr = dst->ipv4.s_addr,
@@ -207,6 +192,35 @@ tx_error:
return err;
 }
 
+static int tipc_udp_send_msg(struct net *net, struct sk_buff *skb,
+struct tipc_bearer *b,
+struct tipc_media_addr *dest)
+{
+   int err = 0;
+   struct udp_bearer *ub;
+   struct udp_media_addr *dst = (struct udp_media_addr *)&dest->value;
+   struct udp_media_addr *src = (struct udp_media_addr *)&b->addr.value;
+
+   if (skb_headroom(skb) < UDP_MIN_HEADROOM) {
+   err = pskb_expand_head(skb, UDP_MIN_HEADROOM, 0, GFP_ATOMIC);
+   if (err)
+   goto tx_error;
+   }
+
+   skb_set_inner_protocol(skb, htons(ETH_P_TIPC));
+   ub = rcu_dereference_rtnl(b->media_ptr);
+   if (!ub) {
+   err = -ENODEV;
+   goto tx_error;
+   }
+
+   return tipc_udp_xmit(net, skb, ub, src, dst);
+
+tx_error:
+   kfree_skb(skb);
+   return err;
+}
+
 /* tipc_udp_recv - read data from bearer socket */
 static int tipc_udp_recv(struct sock *sk, struct sk_buff *skb)
 {
--
2.1.4



Re: [tipc-discussion] [PATCH net-next 1/6] tipc: split UDP nl address parsing

2016-06-29 Thread Xue, Ying

Acked-by: Ying Xue <ying@windriver.com>

-Original Message-
From: Richard Alpe [mailto:richard.a...@ericsson.com] 
Sent: Tuesday, June 28, 2016 5:22 PM
To: tipc-discussion@lists.sourceforge.net
Subject: [tipc-discussion] [PATCH net-next 1/6] tipc: split UDP nl address 
parsing

Split the UDP netlink parse function so that it only parses one netlink 
attribute at a time. This makes the parse function more generic and allows 
future UDP API functions to use it for parsing.

Signed-off-by: Richard Alpe <richard.a...@ericsson.com>
---
 net/tipc/udp_media.c | 112 +--
 1 file changed, 55 insertions(+), 57 deletions(-)

diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index b016c01..7edfb47 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -258,68 +258,47 @@ static int enable_mcast(struct udp_bearer *ub, struct udp_media_addr *remote)
 }
 
 /**
- * parse_options - build local/remote addresses from configuration
- * @attrs: netlink config data
- * @ub:UDP bearer instance
- * @local: local bearer IP address/port
- * @remote:peer or multicast IP/port
+ * tipc_parse_udp_addr - build udp media address from netlink data
+ * @nlattr:netlink attribute containing sockaddr storage aligned address
+ * @addr:  tipc media address to fill with address, port and protocol type
+ * @scope_id:  IPv6 scope id pointer, not NULL indicates it's required
  */
-static int parse_options(struct nlattr *attrs[], struct udp_bearer *ub,
-struct udp_media_addr *local,
-struct udp_media_addr *remote)
+
+static int tipc_parse_udp_addr(struct nlattr *nla, struct udp_media_addr *addr,
+  u32 *scope_id)
 {
-   struct nlattr *opts[TIPC_NLA_UDP_MAX + 1];
-   struct sockaddr_storage sa_local, sa_remote;
+   struct sockaddr_storage sa;
 
-   if (!attrs[TIPC_NLA_BEARER_UDP_OPTS])
-   goto err;
-   if (nla_parse_nested(opts, TIPC_NLA_UDP_MAX,
-attrs[TIPC_NLA_BEARER_UDP_OPTS],
-tipc_nl_udp_policy))
-   goto err;
-   if (opts[TIPC_NLA_UDP_LOCAL] && opts[TIPC_NLA_UDP_REMOTE]) {
-   nla_memcpy(&sa_local, opts[TIPC_NLA_UDP_LOCAL],
-  sizeof(sa_local));
-   nla_memcpy(&sa_remote, opts[TIPC_NLA_UDP_REMOTE],
-  sizeof(sa_remote));
-   } else {
-err:
-   pr_err("Invalid UDP bearer configuration");
-   return -EINVAL;
-   }
-   if ((sa_local.ss_family & sa_remote.ss_family) == AF_INET) {
-   struct sockaddr_in *ip4;
-
-   ip4 = (struct sockaddr_in *)&sa_local;
-   local->proto = htons(ETH_P_IP);
-   local->port = ip4->sin_port;
-   local->ipv4.s_addr = ip4->sin_addr.s_addr;
-
-   ip4 = (struct sockaddr_in *)&sa_remote;
-   remote->proto = htons(ETH_P_IP);
-   remote->port = ip4->sin_port;
-   remote->ipv4.s_addr = ip4->sin_addr.s_addr;
+   nla_memcpy(&sa, nla, sizeof(sa));
+   if (sa.ss_family == AF_INET) {
+   struct sockaddr_in *ip4 = (struct sockaddr_in *)&sa;
+
+   addr->proto = htons(ETH_P_IP);
+   addr->port = ip4->sin_port;
+   addr->ipv4.s_addr = ip4->sin_addr.s_addr;
return 0;
 
 #if IS_ENABLED(CONFIG_IPV6)
-   } else if ((sa_local.ss_family & sa_remote.ss_family) == AF_INET6) {
-   int atype;
-   struct sockaddr_in6 *ip6;
-
-   ip6 = (struct sockaddr_in6 *)&sa_local;
-   atype = ipv6_addr_type(&ip6->sin6_addr);
-   if (__ipv6_addr_needs_scope_id(atype) && !ip6->sin6_scope_id)
-   return -EINVAL;
-
-   local->proto = htons(ETH_P_IPV6);
-   local->port = ip6->sin6_port;
-   memcpy(&local->ipv6, &ip6->sin6_addr, sizeof(struct in6_addr));
-   ub->ifindex = ip6->sin6_scope_id;
-
-   ip6 = (struct sockaddr_in6 *)&sa_remote;
-   remote->proto = htons(ETH_P_IPV6);
-   remote->port = ip6->sin6_port;
-   memcpy(&remote->ipv6, &ip6->sin6_addr, sizeof(struct in6_addr));
+   } else if (sa.ss_family == AF_INET6) {
+   struct sockaddr_in6 *ip6 = (struct sockaddr_in6 *)&sa;
+
+   addr->proto = htons(ETH_P_IPV6);
+   addr->port = ip6->sin6_port;
+   memcpy(&addr->ipv6, &ip6->sin6_addr, sizeof(struct in6_addr));
+
+   /* Scope ID is only interesting for local addresses */
+   if (scope_id) {
+   int atype;
+
+   atype = ipv6_addr_type(&ip6->sin6_addr);
+   if (__ipv6_addr_needs_scope_id(atype) &&
+

Re: [tipc-discussion] [PATCH net v5 1/1] tipc: fix socket timer deadlock

2016-06-16 Thread Xue, Ying

Only one suggestion:

Please declare filter_rcv() as:
static bool filter_rcv(struct sock *sk, struct sk_buff *iskb, struct sk_buff 
*oskb)

In other words, it's better to declare the third parameter as an skb pointer 
rather than struct sk_buff_head *xmitq, because filter_rcv() generates at most 
one skb buffer. As a result, far fewer changes will be needed in 
tipc_backlog_rcv().

Regards,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Wednesday, June 15, 2016 10:46 PM
To: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
jon.ma...@ericsson.com
Cc: ma...@donjonn.com; gbala...@gmail.com
Subject: [PATCH net v5 1/1] tipc: fix socket timer deadlock

We sometimes observe a 'deadly embrace' type deadlock occurring between 
mutually connected sockets on the same node. This happens when the one-hour 
peer supervision timers happen to expire simultaneously in both sockets.

The scenario is as follows:

CPU 1:                                  CPU 2:

tipc_sk_timeout(sk1)                    tipc_sk_timeout(sk2)
  lock(sk1.slock)                         lock(sk2.slock)
  msg_create(probe)                       msg_create(probe)
  unlock(sk1.slock)                       unlock(sk2.slock)
  tipc_node_xmit_skb()                    tipc_node_xmit_skb()
    tipc_node_xmit()                        tipc_node_xmit()
      tipc_sk_rcv(sk2)                        tipc_sk_rcv(sk1)
        lock(sk2.slock)                         lock(sk1.slock)
        filter_rcv()                            filter_rcv()
          tipc_sk_proto_rcv()                     tipc_sk_proto_rcv()
            msg_create(probe_rsp)                   msg_create(probe_rsp)
            tipc_sk_respond()                       tipc_sk_respond()
              tipc_node_xmit_skb()                    tipc_node_xmit_skb()
                tipc_node_xmit()                        tipc_node_xmit()
                  tipc_sk_rcv(sk1)                        tipc_sk_rcv(sk2)
                    lock(sk1.slock)                         lock(sk2.slock)
                    ===> DEADLOCK                           ===> DEADLOCK

Further analysis reveals that there are at least three different locations in 
the socket code where tipc_sk_respond() is called within the context of the 
socket lock, with ensuing risk of similar deadlocks.

We now solve this by passing a buffer queue along with all upcalls where 
sk_lock.slock may potentially be held. Response or rejected messages are 
accumulated into this queue instead of being sent out directly, and sent out 
once we know we are safely outside the sk_lock.slock context.
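
The pattern described above might be sketched in user space as follows. This 
is a simplified model under stated assumptions, not the patch itself: the 
types, the fixed-size queue and all function names are hypothetical, and a 
plain boolean stands in for sk_lock.slock:

```c
#include <stdbool.h>
#include <stddef.h>

#define XMITQ_MAX 8	/* arbitrary bound for this sketch */

struct msg { int id; };

/* Hypothetical stand-in for the buffer queue passed along with upcalls. */
struct xmitq {
	struct msg items[XMITQ_MAX];
	size_t len;
};

struct sock_sketch {
	bool locked;		/* stands in for sk_lock.slock */
	int sent;		/* number of messages sent */
	int sent_while_locked;	/* sends that happened under the lock */
	int last_sent_id;
};

/* Called with the lock held: queue the response instead of sending it. */
static void rcv_locked(struct sock_sketch *sk, int probe_id, struct xmitq *q)
{
	if (!sk->locked || q->len >= XMITQ_MAX)
		return;
	q->items[q->len++].id = probe_id;
}

/* Sending re-enters the peer socket, so it must not run under the lock. */
static void xmit(struct sock_sketch *sk, const struct msg *m)
{
	if (sk->locked)
		sk->sent_while_locked++;
	sk->last_sent_id = m->id;
	sk->sent++;
}

static void sk_rcv(struct sock_sketch *sk, int probe_id)
{
	struct xmitq q = { .len = 0 };
	size_t i;

	sk->locked = true;
	rcv_locked(sk, probe_id, &q);	/* responses accumulate in q */
	sk->locked = false;

	for (i = 0; i < q.len; i++)	/* flush only after unlocking */
		xmit(sk, &q.items[i]);
}
```

Because the flush happens after the lock is dropped, the response path never 
tries to take the peer's lock while still holding its own.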

v2: - Testing on mutex sk_lock.owned instead of sk_lock.slock in
  tipc_sk_respond(). This is safer, since sk_lock.slock may
  occasionally and briefly be held (by concurrent user contexts)
  even if we are in user context.

v3: - By lowering the socket timeout to 36 ms instead of 3,600,000 and
  setting up 1000 connections I could easily reproduce the deadlock
  and verify that my solution works.
- When killing one of the processes I sometimes got a kernel crash
  in the loop emptying the socket write queue. Realizing that there
  may be concurrent processes emptying the write queue, I had to add
  a test that the dequeuing actually returned a buffer. This solved
  the problem.
- I tried Ying's suggestion with unconditionally adding all
  CONN_MANAGER messages to the backlog queue, and it didn't work.
  This is because we will often add the message to the backlog when
  the socket is *not* owned, so there will be nothing triggering
  execution of backlog_rcv() within acceptable time. Apart from
  that, my solution solves the problem at all three locations where
  this deadlock may happen, as already stated above.

v4: - Introduced separate queue in struct tipc_sock for the purpose
  above, instead of using the socket send queue. The socket send
  queue was used for regular message sending until commit
  f214fc402967e ("tipc: Revert tipc: use existing sk_write_queue for
  outgoing packet chain") i.e. as recent as kernel 4.5, so using
  this queue would screw up older kernel versions.
- Made small cosmetic improvement to the dequeuing loop.

v5: - Was convinced by Ying that not even v4 above is 100% safe. Now
  (re-)introducing a solution that I believe is the only really safe
  one.

Reported-by: GUNA <gbala...@gmail.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/socket.c | 54 ++
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 88bfcd7..c49b8df 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -796,9 +796,11 @@ void tipc_sk_mcast_rcv(struct net *net, struct 
sk_buff_head *arrvq,
  * @tsk: receiving socket
  * @skb: pointer to message buffer.
  */
-static v

Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-15 Thread Xue, Ying


> Just for the case of responding  to CONN_MANAGER messages, testing 
> socket owner is safe. But the solution seems unsafe when 
> TIPC_ERR_OVERLOAD message is sent.
> 
> While TIPC_ERR_OVERLOAD is sent out under BH context, socket's owner is set.

Where? There are two locations where tipc_sk_respond() may be called under BH, in 
tipc_sk_enqueue() and filter_rcv(), but it is either-or: either we are under 
BH and the socket is not owned, or it is owned and we are not under BH.
I don't see at all what you are referring to.

[Ying] let's look at the following code again:
=
static void tipc_sk_enqueue(struct sk_buff_head *inputq, struct sock *sk,
			    u32 dport)
{
	unsigned int lim;
	atomic_t *dcnt;
	struct sk_buff *skb;
	unsigned long time_limit = jiffies + 2;

	while (skb_queue_len(inputq)) {
		if (unlikely(time_after_eq(jiffies, time_limit)))
			return;

		skb = tipc_skb_dequeue(inputq, dport);
		if (unlikely(!skb))
			return;

		/* Add message directly to receive queue if possible */
		if (!sock_owned_by_user(sk)) {			<-- (*1)
			filter_rcv(sk, skb);
			continue;
		}

		/* Try backlog, compensating for double-counted bytes */
		dcnt = &tipc_sk(sk)->dupl_rcvcnt;
		if (!sk->sk_backlog.len)
			atomic_set(dcnt, 0);
		lim = rcvbuf_limit(sk, skb) + atomic_read(dcnt);
		if (likely(!sk_add_backlog(sk, skb, lim)))
			continue;

		/* Overload => reject message back to sender */
		tipc_sk_respond(sk, skb, TIPC_ERR_OVERLOAD);	<-- (*2)
		break;
	}
}
=

First, assume that tipc_sk_enqueue() is called under BH. At (*1), every skb is 
fed to filter_rcv() as long as the socket's owner flag is not set. So if 
execution reaches (*2), the owner flag must be set, and we are still in BH 
context. That is why I say we still face a deadlock risk in this scenario when 
tipc_sk_respond(sk, skb, TIPC_ERR_OVERLOAD) is invoked.

Regards,
Ying




Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-15 Thread Xue, Ying

I am concerned about whether it is really safe to test the socket owner instead 
of sk_lock.slock.
We have identified that the deadlock occurs because tipc_sk_respond() is called 
under the socket lock.

For the case of responding to CONN_MANAGER messages, testing the socket owner 
is safe. But the solution seems unsafe when a TIPC_ERR_OVERLOAD message is sent.

When TIPC_ERR_OVERLOAD is sent out in BH context, the socket's owner flag is 
set. So the message will be sent out in BH context through tipc_node_xmit_skb() 
instead of being inserted into sk->sk_write_queue.

Regards,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Tuesday, June 14, 2016 1:51 AM
To: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
jon.ma...@ericsson.com
Cc: ma...@donjonn.com; gbala...@gmail.com
Subject: [PATCH net v3 1/1] tipc: fix socket timer deadlock

We sometimes observe a 'deadly embrace' type deadlock occurring between 
mutually connected sockets on the same node. This happens when the one-hour 
peer supervision timers happen to expire simultaneously in both sockets.

The scenario is as follows:

CPU 1:                                  CPU 2:

tipc_sk_timeout(sk1)                    tipc_sk_timeout(sk2)
  lock(sk1.slock)                         lock(sk2.slock)
  msg_create(probe)                       msg_create(probe)
  unlock(sk1.slock)                       unlock(sk2.slock)
  tipc_node_xmit_skb()                    tipc_node_xmit_skb()
    tipc_node_xmit()                        tipc_node_xmit()
      tipc_sk_rcv(sk2)                        tipc_sk_rcv(sk1)
        lock(sk2.slock)                         lock(sk1.slock)
        filter_rcv()                            filter_rcv()
          tipc_sk_proto_rcv()                     tipc_sk_proto_rcv()
            msg_create(probe_rsp)                   msg_create(probe_rsp)
            tipc_sk_respond()                       tipc_sk_respond()
              tipc_node_xmit_skb()                    tipc_node_xmit_skb()
                tipc_node_xmit()                        tipc_node_xmit()
                  tipc_sk_rcv(sk1)                        tipc_sk_rcv(sk2)
                    lock(sk1.slock)                         lock(sk2.slock)
                    ===> DEADLOCK                           ===> DEADLOCK

Further analysis reveals that there are at least three different locations in 
the socket code where tipc_sk_respond() is called within the context of the 
socket lock, with ensuing risk of similar deadlocks.

We solve this by ensuring that messages created by tipc_sk_respond() only are 
sent directly if sk_lock.owned mutex is held. Otherwise they are queued up in 
the socket write queue and sent after the socket lock has been released.

v2: - Testing on mutex sk_lock.owned instead of sk_lock.slock in
  tipc_sk_respond(). This is safer, since sk_lock.slock may
  occasionally and briefly be held (by concurrent user contexts)
  even if we are in user context.
v3: - By lowering the socket timeout to 36 ms instead of 3,600,000 and
  setting up 1000 connections I could easily reproduce the deadlock
  and verify that my solution works.
- When killing one of the processes I sometimes got a kernel crash
  in the loop emptying the socket write queue. Realizing that there
  may be concurrent processes emptying the write queue, I had to add
  a test that the dequeuing actually returned a buffer. This solved
  the problem.
- I tried Ying's suggestion with unconditionally adding all
  CONN_MANAGER messages to the backlog queue, and it didn't work.
  This is because we will often add the message to the backlog when
  the socket is *not* owned, so there will be nothing triggering
  execution of backlog_rcv() within acceptable time. Apart from
  that, my solution solves the problem at all three locations where
  this deadlock may happen, as already stated above.

Reported-by: GUNA <gbala...@gmail.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/socket.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 88bfcd7..e8ed3a8 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -278,7 +278,11 @@ static void tipc_sk_respond(struct sock *sk, struct 
sk_buff *skb, int err)
 
dnode = msg_destnode(buf_msg(skb));
selector = msg_origport(buf_msg(skb));
-   tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector);
+
+   if (sock_owned_by_user(sk))
+   tipc_node_xmit_skb(sock_net(sk), skb, dnode, selector);
+   else
+   skb_queue_tail(&sk->sk_write_queue, skb);
 }
 
 /**
@@ -1830,6 +1834,14 @@ void tipc_sk_rcv(struct net *net, struct sk_buff_head 
*inputq)
tipc_sk_enqueue(inputq, sk, dport);
spin_unlock_bh(&sk->sk_lock.slock);
}
+   /* Send pending respons

Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer deadlock

2016-06-15 Thread Xue, Ying

Erik's suggestion seems better.

Regards,
Ying

-Original Message-
From: Erik Hugne [mailto:erik.hu...@gmail.com] 
Sent: Tuesday, June 14, 2016 2:24 AM
To: Jon Maloy
Cc: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
gbala...@gmail.com
Subject: Re: [tipc-discussion] [PATCH net v3 1/1] tipc: fix socket timer 
deadlock

On Mon, Jun 13, 2016 at 01:51:15PM -0400, Jon Maloy wrote:
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -1830,6 +1834,14 @@ void tipc_sk_rcv(struct net *net, struct sk_buff_head 
> *inputq)
>   tipc_sk_enqueue(inputq, sk, dport);
>   spin_unlock_bh(&sk->sk_lock.slock);
>   }
> + /* Send pending response/rejected messages, if any */
> + while (!skb_queue_empty(&sk->sk_write_queue)) {
> + skb = skb_dequeue(&sk->sk_write_queue);
> + if (!skb)
> + break;
> + dnode = msg_destnode(buf_msg(skb));
> + tipc_node_xmit_skb(net, skb, dnode, dport);
> + }

Might I suggest this instead?
while ((skb = skb_dequeue(&sk->sk_write_queue))) {
	dnode = msg_destnode(buf_msg(skb));
	tipc_node_xmit_skb(net, skb, dnode, dport);
}

//E
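
As a rough userspace illustration of why the single-call loop is attractive: an
emptiness check followed by a dequeue is two separate steps, so a concurrent
consumer can empty the queue in between, which is why the longer loop needs its
extra NULL test. Dequeuing until NULL folds both into one locked operation. The
sketch below assumes a hypothetical mutex-protected list (struct msg_queue,
mq_enqueue(), mq_dequeue(), mq_flush()) standing in for sk_buff_head and
skb_dequeue(); none of these names are TIPC or kernel APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sk_buff and struct sk_buff_head. */
struct msg {
	struct msg *next;
	int dnode;
};

struct msg_queue {
	pthread_mutex_t lock;
	struct msg *head;
	struct msg *tail;
};

static void mq_init(struct msg_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = q->tail = NULL;
}

/* Append one message; like skb_queue_tail(), it takes the queue lock itself. */
static void mq_enqueue(struct msg_queue *q, struct msg *m)
{
	assert(m != NULL);
	m->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = m;
	else
		q->head = m;
	q->tail = m;
	pthread_mutex_unlock(&q->lock);
}

/* Remove and return the first message, or NULL if the queue is empty. */
static struct msg *mq_dequeue(struct msg_queue *q)
{
	struct msg *m;

	pthread_mutex_lock(&q->lock);
	m = q->head;
	if (m) {
		q->head = m->next;
		if (!q->head)
			q->tail = NULL;
	}
	pthread_mutex_unlock(&q->lock);
	return m;
}

/* Example transmit callback: adds the destination node to *ctx. */
static void xmit_sum(struct msg *m, void *ctx)
{
	*(int *)ctx += m->dnode;
}

/* Drain the queue; returns the number of messages handed to xmit(). */
static int mq_flush(struct msg_queue *q,
		    void (*xmit)(struct msg *, void *), void *ctx)
{
	struct msg *m;
	int sent = 0;

	/* One locked operation per iteration: no gap between check and dequeue. */
	while ((m = mq_dequeue(q))) {
		xmit(m, ctx);
		sent++;
	}
	return sent;
}
```

A caller would queue messages with mq_enqueue() while it must not transmit, and
later call mq_flush(&q, xmit_sum, &sum) (or any other callback) once it is safe
to send.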


Re: [tipc-discussion] [PATCH net-next v7 1/4] tipc: correct error in node fsm

2016-06-08 Thread Xue, Ying

Acked-by: Ying Xue <ying@windriver.com>

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: May 20, 2016 2:09
To: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
jon.ma...@ericsson.com
Cc: ma...@donjonn.com
Subject: [PATCH net-next v7 1/4] tipc: correct error in node fsm

commit 88e8ac7000dc ("tipc: reduce transmission rate of reset messages when 
link is down") revealed a flaw in the node FSM, as defined in the log of commit 
66996b6c47ed ("tipc: extend node FSM").

We see the following scenario:
1: Node B receives a RESET message from node A before its link endpoint
   is fully up, i.e., the node FSM is in state SELF_UP_PEER_COMING. This
   event will not change the node FSM state, but the (distinct) link FSM
   will move to state RESETTING.
2: As an effect of the previous event, the local endpoint on B will
   declare node A lost, and post the event SELF_DOWN to its node
   FSM. This moves the FSM state to SELF_DOWN_PEER_LEAVING, meaning
   that no messages will be accepted from A until it receives another
   RESET message that confirms that A's endpoint has been reset. This
   is wasteful, since we know this as a fact already from the first
   received RESET, but worse is that the link instance's FSM has not
   wasted this information, but instead moved on to state ESTABLISHING,
   meaning that it repeatedly sends out ACTIVATE messages to the reset
   peer A.
3: Node A will receive one of the ACTIVATE messages, move its link FSM
   to state ESTABLISHED, and start repeatedly sending out STATE messages
   to node B.
4: Node B will consistently drop these messages, since it can only
   accept a RESET according to its node FSM.
5: After four lost STATE messages node A will reset its link and start
   repeatedly sending out RESET messages to B.
6: Because of the reduced send rate for RESET messages, it is very
   likely that A will receive an ACTIVATE (which is sent out at a much
   higher frequency) before it gets the chance to send a RESET, and A
   may hence quickly move back to state ESTABLISHED and continue sending
   out STATE messages, which will again be dropped by B.
7: GOTO 5.
8: After having repeated the cycle 5-7 a number of times, node A will
   by chance get in between with sending a RESET, and the situation is
   resolved.

Unfortunately, we have seen that it may take a substantial amount of time 
before this vicious loop is broken, sometimes in the order of minutes.

We correct this by making a small correction to the node FSM: When a node in 
state SELF_UP_PEER_COMING receives a SELF_DOWN event, it now moves directly 
back to state SELF_DOWN_PEER_DOWN, instead of to SELF_DOWN_PEER_LEAVING as now. 
This is logically consistent, since we don't need to wait for RESET 
confirmation from an endpoint that we already know has been reset. It also 
means that node B in the scenario above will not be dropping incoming STATE 
messages, and the link can come up immediately.

Finally, a symmetry comparison reveals that the FSM has a similar error when 
receiving the event PEER_DOWN in state PEER_UP_SELF_COMING.
Instead of moving to PEER_DOWN_SELF_LEAVING, it should move directly to 
SELF_DOWN_PEER_DOWN. Although we have never seen any negative effect of this 
logical error, we choose to fix this one, too.
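
The two corrected transitions can be sketched as a pair of small transition
functions. This is only an illustration under stated assumptions: the state and
event names are taken from the text above, but the enums, the function names
node_fsm_before() and node_fsm_after(), and the choice to leave every other
state/event pair unchanged are hypothetical simplifications and do not mirror
the code in net/tipc/node.c.

```c
/* State names as used in the description above; the enum layout is
 * hypothetical.
 */
enum node_state {
	SELF_DOWN_PEER_DOWN,
	SELF_UP_PEER_COMING,
	PEER_UP_SELF_COMING,
	SELF_DOWN_PEER_LEAVING,
	PEER_DOWN_SELF_LEAVING,
};

enum node_evt {
	SELF_DOWN_EVT,
	PEER_DOWN_EVT,
};

/* Behaviour described as the flaw: wait for a confirmation that is not needed. */
static enum node_state node_fsm_before(enum node_state s, enum node_evt e)
{
	if (s == SELF_UP_PEER_COMING && e == SELF_DOWN_EVT)
		return SELF_DOWN_PEER_LEAVING;
	if (s == PEER_UP_SELF_COMING && e == PEER_DOWN_EVT)
		return PEER_DOWN_SELF_LEAVING;
	return s;	/* all other transitions are left out of this sketch */
}

/* Behaviour after the correction: both cases go straight to "both down". */
static enum node_state node_fsm_after(enum node_state s, enum node_evt e)
{
	if (s == SELF_UP_PEER_COMING && e == SELF_DOWN_EVT)
		return SELF_DOWN_PEER_DOWN;
	if (s == PEER_UP_SELF_COMING && e == PEER_DOWN_EVT)
		return SELF_DOWN_PEER_DOWN;
	return s;	/* all other transitions are left out of this sketch */
}
```

With the corrected function, node B in the scenario above ends up in
SELF_DOWN_PEER_DOWN and can accept the next message from A without first
waiting for another RESET.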

The node FSM looks as follows after those changes:

[Node FSM diagram garbled and truncated in the archive. Recoverable labels:
states NODE_FAILINGOVER, NODE_SYNCHING and SELF_UP_PEER_UP; events
SELF_DOWN_EVT, PEER_DOWN_EVT, FAILOVER_BEGIN_EVT, FAILOVER_END_EVT,
SYNCH_BEGIN_EVT and SYNCH_END_EVT.]

Re: [tipc-discussion] [RFC PATCH] tipc: fix timer handling when socket is owned

2016-06-04 Thread Xue, Ying

Hi Jon,

Frankly speaking, I prefer Erik's proposal to yours, as Erik's method is more 
common in the networking subsystem.

Regarding the backtrace of the issue, it seems that we should not deliver a 
message in BH without holding the "owner" flag. But your solution still does 
not hold the "owner" flag when sending a message in the socket timeout function.

Regards,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: June 2, 2016 21:13
To: Erik Hugne; Xue, Ying
Cc: Richard Alpe; Parthasarathy Bhuvaragan; 
tipc-discussion@lists.sourceforge.net
Subject: RE: [RFC PATCH] tipc: fix timer handling when socket is owned

From: Erik Hugne [mailto:erik.hu...@gmail.com] 
Sent: Thursday, 02 June, 2016 14:11
To: Ying Xue
Cc: Richard Alpe; Parthasarathy Bhuvaragan; Jon Maloy; 
tipc-discussion@lists.sourceforge.net
Subject: RE: [RFC PATCH] tipc: fix timer handling when socket is owned

On Jun 2, 2016 1:03 PM, "Xue, Ying" <ying@windriver.com> wrote:
>>
>> Acked-by: Ying Xue <ying@windriver.com>
>>
>> Jon, whether or not the patch fixes Guna's issue, I think the change is 
>> right, because there is an obvious error in that we deliver a message through 
>> tipc_node_xmit_skb() when the "owner" flag is set.
>> So I suggest that the patch be submitted upstream as soon as 
>> possible.
>>
>If you think the change is reasonable, could you please help me test it?
>My crappy AMD A10 laptop does not handle VMs well.
>I have one concern about it: the locking policy has changed slightly, since 
>tipc_node_xmit_skb() is now called with the socket spinlock held, and I don't 
>know if this introduces a new race.

Logically it should be ok, since it only means that the response message always 
ends up in the backlog queue. But it is still an unnecessary change. 
Personally I liked better the first proposal, which I think was simpler.

Why not
1) grab the lock and the mutex, if necessary after several attempts.
2) fetch all relevant socket fields to the stack.
3) release the lock and the mutex
4) create the message
5) send the message

This would give only one timeout function and minimal time spent inside the 
lock/mutex.
If you do so, you should also extend the wait period before retrying, e.g., to 
1 second. The risk of finding the socket busy again after a longer period is 
lower.
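
The five steps above can be sketched in userspace C roughly as follows. This is
a minimal model, not TIPC code: struct conn_sock, struct probe, conn_timeout()
and RETRY_DELAY_MS are hypothetical names, a single pthread mutex stands in for
"the lock and the mutex", and a busy lock is reported to the caller instead of
being retried inside the function.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical connection state; one mutex stands in for "lock and mutex". */
struct conn_sock {
	pthread_mutex_t lock;
	int connected;
	uint32_t own_port;
	uint32_t peer_node;
	uint32_t peer_port;
};

/* Everything the probe needs, copied out while the lock was held. */
struct probe {
	uint32_t src_port;
	uint32_t dst_node;
	uint32_t dst_port;
};

enum timeout_action {
	SEND_PROBE,	/* *out is filled in; caller sends it without the lock */
	RETRY_LATER,	/* socket busy; caller re-arms the timer */
	STOP,		/* connection is gone; nothing to do */
};

#define RETRY_DELAY_MS 1000	/* "e.g., 1 second", as suggested above */

static enum timeout_action conn_timeout(struct conn_sock *s, struct probe *out)
{
	enum timeout_action act = STOP;

	/* Step 1: try to take the lock; never block in the timer path. */
	if (pthread_mutex_trylock(&s->lock) != 0)
		return RETRY_LATER;

	/* Step 2: fetch the relevant fields to the caller's stack. */
	if (s->connected) {
		out->src_port = s->own_port;
		out->dst_node = s->peer_node;
		out->dst_port = s->peer_port;
		act = SEND_PROBE;
	}

	/* Step 3: release before any message is created or sent. */
	pthread_mutex_unlock(&s->lock);

	/* Steps 4-5 (create and send) are left to the caller, outside the lock. */
	return act;
}
```

On RETRY_LATER the caller would re-arm its timer with RETRY_DELAY_MS; on
SEND_PROBE it builds and transmits the probe from the copied fields only, so no
socket state is touched while sending.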

///jon

>//E

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-06-02 Thread Xue, Ying

Hi Guna,

Please see my comments below.

Regards,
Ying

-Original Message-
From: GUNA [mailto:gbala...@gmail.com] 
Sent: June 1, 2016 23:26
To: Xue, Ying
Cc: Jon Maloy; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik Hugne; 
Xue Ying (ying.x...@gmail.com)
Subject: Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 
4.4.0

If the issue is reproducible then I could try with Erik's patch even
though the root cause is unknown. Currently, we are not clear yet on
root cause of the issue. I could add the patch on production release,
only if the patch will fix the issue.

[Ying] Understood. I think Erik's patch should be merged upstream whether or 
not it fixes your issue. But in my opinion, it should be 
able to fix it.

 Otherwise, I may need to find
test stream.

As per Erik's proposal,
==
if (sock_owned_by_user(sk))
we can reschedule timer for a retry in a few jiffies
==

I tried to call sk_reset_timer(sk, &sk->sk_timer, (HZ / 20));
but this code, or something similar, is already in place in tipc_sk_timeout(),
as marked by "<<==" below:

if (tsk->probing_state == TIPC_CONN_PROBING) {
  if (!sock_owned_by_user(sk))
    ...
  else
    sk_reset_timer(sk, &sk->sk_timer, (HZ / 20));   <<==
} else {
   sk_reset_timer(sk, &sk->sk_timer, jiffies + tsk->probing_intv);  <<==
}

Please let me know If I need to add/modify any.

[Ying] Your change above is right and it should work. But I still suggest that 
you adopt Erik's patch ("tipc: fix timer handling when socket is owned"), as it 
is much better than the solution above.

thanks,
Guna

On Wed, Jun 1, 2016 at 7:31 AM, Xue, Ying <ying@windriver.com> wrote:
> Hi GUNA,
>
> Thanks for your confirmation, which is very important for us to look into 
> what happened in 4.4.0 version.
> Yes, my mentioned Erik's patch is just as Erik said: "tipc: fix timer 
> handling when socket is owned".
>
> I also agree with Erik's solution, as his change is the more common way to deal 
> with the case when the owner flag is not set in BH.
>
> But we still need to know what the root cause of the issue is.
>
> If possible, please apply Erik's patch on your side to check whether the 
> issue occurs or not.
>
> Regards,
> Ying
>
> -Original Message-
> From: GUNA [mailto:gbala...@gmail.com]
> Sent: May 31, 2016 23:34
> To: Xue, Ying
> Cc: Jon Maloy; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik Hugne; 
> Xue Ying (ying.x...@gmail.com)
> Subject: Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card 
> on 4.4.0
>
> Just want to clarify, system was upgraded only the kernel from 3.4.2
> to 4.4.0 + some tipc patches on Fedora distribution. That said, the
> patch, "net: do not block BH while processing socket backlog" is not
> part of the 4.4.0. So, the issue is not due to this commit.
>
> If the patch, "tipc: block BH in TCP callbacks" could resolve the
> issue then I could try applying the patch. However the issue is not
> reproducible. So, we may not get the result right away.
>
> Which Erik's patch you are talking about?
> Is this one, "tipc: fix timer handling when socket is owned" ?
>
>
> /// Guna
>
> On Tue, May 31, 2016 at 3:49 AM, Xue, Ying <ying@windriver.com> wrote:
>> Hi Jon,
>>
>> Today I spent time further analyzing the potential root cause of the soft 
>> lockup, based on the log provided by GUNA, but I did not find any 
>> valuable hints.
>>
>> To be honest, even if a CONN_MANAGER/CONN_PROBE message is sent through 
>> tipc_node_xmit_skb() without holding the "owner" flag in tipc_sk_timeout(), 
>> a deadlock should not happen in theory. Before tipc_sk_rcv() is called the 
>> second time, the destination port and source port of the 
>> CONN_MANAGER/CONN_PROBE message created in tipc_sk_timeout() have been 
>> reversed. As a result, the tsk found at (*1) is different from the tsk found 
>> at (*2), because we use different destination port numbers to look up the 
>> tsk instances.
>>
>> tipc_sk_timeout()
>>   create: CONN_MANAGER/CONN_PROBE msg (src port = tsk->portid, dst port = peer_port)
>>   tipc_node_xmit_skb()
>>     tipc_node_xmit()
>>       tipc_sk_rcv()
>>         tsk = tipc_sk_lookup(net, dport); // use dst port (peer_port) to look up tsk; call it tsk1 (*1)
>>         if (likely(spin_trylock_bh(&sk->sk_lock.slock)))
>>           tipc_sk_enqueue()
>>             filter_rcv()
>>               tipc_sk_proto_rcv()
>>                 tipc_sk_respond()
>>                   reverse ports: dport = tsk->portid; src port = peer_port
>>

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-30 Thread Xue, Ying

Hi Jon,

First of all, the socket lock is designed in a special and clever way. In 
process context it behaves like a mutex; in interrupt context it behaves like a 
spin lock. It can therefore safely protect the members of the sock struct in 
both contexts. When we are in interrupt/softirq mode and the "owned" flag is 
set, we know that sk_receive_queue is being operated on by a process, so we 
have to put the skb buffer into the backlog queue. When release_sock() is 
called by the process that set the "owned" flag, all skb buffers are moved from 
the backlog queue to sk_receive_queue. Regarding slock protection, there are 
three different race scenarios:
1. Process vs. process context, i.e., two processes running on different CPUs 
concurrently access a sock instance. In this case the socket lock is used as a 
mutex to protect the sock struct. For example, if one process finds that the 
"owned" flag is set, it sleeps until the flag is cleared by the other process.
2. Process vs. softirq context. In process context, once the "owned" flag is 
set, the slock spin lock is released immediately. In softirq context, the code 
first grabs slock and then checks whether the flag is set. If it is set, it 
either retries later or puts the skb into the backlog queue and exits.
3. Softirq vs. softirq context. Here slock has plain spin lock semantics. 
Moreover, it is not necessary to explicitly disable BH in this mode, since we 
already know we are in BH mode, and BH is disabled when the softirq handler is 
called. This also means that all functions called in BH are non-reentrant.
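
The process vs. softirq case (2) can be illustrated with a small single-threaded
model. This is a sketch under stated assumptions, not kernel code: struct
model_sock and the model_*() functions are hypothetical, the slock spin lock
itself is not modelled because nothing runs concurrently here, and packets are
plain integers.

```c
#include <assert.h>
#include <stdbool.h>

#define QLEN 8

/* Hypothetical single-threaded model of a socket. */
struct model_sock {
	bool owned;		/* stands in for the "owned" flag */
	int rcvq[QLEN];		/* stands in for sk_receive_queue */
	int rcv_len;
	int backlog[QLEN];	/* stands in for the backlog queue */
	int backlog_len;
};

/* Softirq side: deliver directly unless a process owns the socket.
 * Returns false when the target queue is full (the overload case).
 */
static bool model_rcv(struct model_sock *sk, int pkt)
{
	if (!sk->owned) {
		if (sk->rcv_len == QLEN)
			return false;
		sk->rcvq[sk->rcv_len++] = pkt;
		return true;
	}
	if (sk->backlog_len == QLEN)
		return false;
	sk->backlog[sk->backlog_len++] = pkt;
	return true;
}

/* Process side: mark the socket as owned. */
static void model_lock_sock(struct model_sock *sk)
{
	assert(!sk->owned);
	sk->owned = true;
}

/* Process side: move backlogged packets to the receive queue, then let go. */
static void model_release_sock(struct model_sock *sk)
{
	int i;

	for (i = 0; i < sk->backlog_len && sk->rcv_len < QLEN; i++)
		sk->rcvq[sk->rcv_len++] = sk->backlog[i];
	sk->backlog_len = 0;
	sk->owned = false;
}
```

A packet arriving between model_lock_sock() and model_release_sock() lands in
the backlog and only reaches the receive queue when the owner releases the
socket, which is the behaviour described in case 2.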

Given the principles above, first, we cannot use lock_sock() in 
tipc_sk_timeout(), as it is called in BH mode. If we did, we would definitely 
get a deadlock, since might_sleep() would be called in BH. Second, it is 
unnecessary to set the "owned" flag in tipc_sk_timeout(), as it runs in BH 
mode. In my opinion, the fundamental way to avoid the issue is to ensure that 
no function called in BH can be invoked recursively; otherwise it is hard to 
reliably prevent the issue from happening again in other, similar scenarios.

In fact, we have previously proposed several possible solutions for handling 
the concurrent scenario in the skb receive path. In particular, we need to stay 
in BH mode to directly forward back an skb received in BH, which very easily 
causes an skb to be routed back and forth, leading to a deadlock. 
Unfortunately, it seems that so far we have been unable to solve the issue 
perfectly. Therefore, we may have to radically change the current mechanism for 
receiving and forwarding skbs in BH mode.

Regards,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: May 30, 2016 5:32
To: GUNA; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik Hugne; Xue, 
Ying; Xue Ying (ying.x...@gmail.com)
Subject: RE: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 
4.4.0

Hi Guna,
I am looking at it, but don't have much time to spend on it right now. 
A further study of your dump makes me believe this is a case of a race between 
tipc_recv_stream(), which in user context is setting the "owned" flag but not 
grabbing slock, and tipc_sk_timeout(), which in interrupt mode is grabbing 
slock without checking the "owned" flag. This creates an obvious risk of a 
race, but I don't yet understand the exact implications or the solution to it. 
It may be sufficient to just replace the grabbing of slock with a call 
to lock_sock() in the timer function as I first suggested, but I rather believe 
we have to explicitly grab both slock and the flag and keep them for the 
duration of the timeout, before tipc_node_xmit_skb(). This will protect the 
timeout against interference from both user context (tipc_recv_stream()) and 
other software interrupts (tipc_sk_rcv()), and vice versa. Or maybe not even 
this is enough?

I will continue my analysis, but input from others would be appreciated.

///jon


> -Original Message-
> From: GUNA [mailto:gbala...@gmail.com]
> Sent: Saturday, 28 May, 2016 06:00
> To: Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik Hugne
> Subject: Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card 
> on 4.4.0
> 
> Any update on the issue? Any other thoughts or possible fix ?
> 
> The issue was seen on slot12 (1.1.12) node only. The other slots were up.
> 
> I got the full logs as listed here:
> 
> 
> May 19 05:03:01 [SEQ 248049] dcsx5testslot13 /USR/SBIN/CROND[11359]:
> (root) CMD (/opt/cpu_ss7gw/current/scripts/mgmt_apache_watchdog)
> May 19 05:03:21 [SEQ 249182] dcsx5testslot12 kernel:  [673637.606852]
> NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0]
> May 19 05:03:21 [SEQ 249183] dcsx5testslot12 kernel:  [673637.607791]
> NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [swapper/

Re: [tipc-discussion] tipc: tipc_recv_stream with kernel panic

2016-05-04 Thread Xue, Ying

Thank you for the testing and report!

To be honest, we have never encountered so many issues before, so I suspect 
that all the problems you encountered may have been introduced by the recent 
changes. 
If you have more detailed failure logs, please share them with us so that we 
can look into the root cause.

Thanks,
Ying

-Original Message-
From: GUNA [mailto:gbala...@gmail.com] 
Sent: May 4, 2016 0:41
To: Xue, Ying
Cc: Erik Hugne; tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Richard Alpe
Subject: Re: [tipc-discussion] tipc: tipc_recv_stream with kernel panic

Thanks Ying.

As you suggested, I will revert the "tipc: avoid packets leaking on socket 
receive queue" patch. Since the issue is not reproducible on demand, I may need 
to wait and see whether the issue shows up again with the new driver.

We do experience traffic throughput problems because TIPC connections are 
failing (not with this change). If anyone is aware of such failures, please let 
me know.

Background 
System is originally based on Kernel 3.4.2 on Fedora 16 and stable.
Recently, I updated the system with kernel 4.4.0. All stock kernel drivers are 
being used and no customization except ported some latest TIPC patches to fix 
some TIPC issues.

All 6 routing CPUs went down over the course of the weekend. The noted output 
is from one CPU; others are unknown, but assumed to have gone down with the 
same cause. Also note that in addition, one of the routing cards (no output 
available) went down again on Monday.

The new kernel 4.4.0 has been in use in the system since April 1st, and we have 
seen the issue 3 times so far. Each time, the traffic was mostly heartbeat-type 
traffic, not heavy traffic.
=

On Tue, May 3, 2016 at 6:26 AM, Xue, Ying <ying@windriver.com> wrote:
> I agree with Erik too.
>
>
>
> The oops should be caused by socket was freed early. But
>
>
>
> GUNA, can you reproduce the issue? If so, please try to revert the 
> commit
> f4195d1eac954a67adf112dd53404560cc55b942 (“tipc: avoid packets leaking 
> on socket receive queue”), and verify whether the issue occurs or not.
>
>
>
> I suspect the commit bring some unknown side effect, leading to the panic.
>
>
>
> Thanks,
>
> Ying
>
> From: Erik Hugne [mailto:erik.hu...@gmail.com]
> Sent: May 3, 2016 13:09
> To: GUNA
> Cc: tipc-discussion@lists.sourceforge.net;
> parthasarathy.bhuvara...@ericsson.com; Richard Alpe; Xue, Ying
> Subject: Re: [tipc-discussion] tipc: tipc_recv_stream with kernel 
> panic
>
>
>
> (On mobile)
>
> At first glance, it seems that the socket was freed, but there was a 
> pending wakeup signal for it. Which then causes the subsequent 
> spin_lock_bh() to deref freed mem.
>
> //E
>
> On May 3, 2016 02:43, "GUNA" <gbala...@gmail.com> wrote [...]
>>> [375832.498126] BUG: unable to handle kernel paging request at
>>> 01a400015ff4
>> [375832.505300] IP: []
>> queued_spin_lock_slowpath+0xe6/0x160
>> [375832.512394] PGD 0
>> [375832.514657] Oops: 0002 [#1] SMP
>> [375832.518306] Modules linked in: nf_log_ipv6 nf_log_ipv4 
>> nf_log_common xt_LOG sctp libcrc32c e1000e tipc udp_tunnel 
>> ip6_udp_tunnel 8021q garp iTCO_wdt xt_physdev br_netfilter bridge stp 
>> llc nf_conntrack_ipv4 nf_defrag_ipv4 ipmiq_drv(O) sio_mmc(O) 
>> ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state 
>> nf_conntrack lockd ip6table_filter event_drv(O) ip6_tables grace
>> pt_timer_info(O) ddi(O) usb_storage ixgbe igb i2c_i801 
>> iTCO_vendor_support i2c_algo_bit ioatdma intel_ips i2c_core pcspkr 
>> sunrpc ptp mdio dca pps_core lpc_ich tpm_tis mfd_core tpm [last
>> unloaded: iTCO_wdt]
>> [375832.573693] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G   O
>>  4.4.0 #14
>> [375832.581385] Hardware name: PT AMC124/Base Board Product Name, 
>> BIOS
>> LGNAJFIP.PTI.0012.P15 01/15/2014
>> [375832.591028] task: 880351a89b40 ti: 880351a9 task.ti:
>> 880351a9
>> [375832.599026] RIP: 0010:[]  []
>> queued_spin_lock_slowpath+0xe6/0x160
>> [375832.608964] RSP: 0018:88035fc83d58  EFLAGS: 00010002 
>> [375832.614825] RAX: 1447 RBX: 0292 RCX:
>> 88035fc95fc0
>> [375832.622743] RDX: 01a400015ff4 RSI: 0014 RDI:
>> 880351232f80
>> [375832.630567] RBP: 88035fc83d58 R08: 0101 R09:
>> 0004
>> [375832.638348] R10:  R11:  R12:
>> 01001002
>> [375832.645919] R13: 0001 R14:  R15:
>> 
>> [375832.653610] FS:  () GS:88035fc8()
>> knlGS:
>> [375832.662317] CS:  0010 DS:

Re: [tipc-discussion] tipc: tipc_recv_stream with kernel panic

2016-05-03 Thread Xue, Ying

I agree with Erik too.

The oops should be caused by the socket being freed too early. But

GUNA, can you reproduce the issue? If so, please try to revert the commit 
f4195d1eac954a67adf112dd53404560cc55b942 (“tipc: avoid packets leaking on 
socket receive queue”), and verify whether the issue still occurs.

I suspect the commit brings some unknown side effect, leading to the panic.

Thanks,
Ying
From: Erik Hugne [mailto:erik.hu...@gmail.com]
Sent: May 3, 2016 13:09
To: GUNA
Cc: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Richard Alpe; Xue, Ying
Subject: Re: [tipc-discussion] tipc: tipc_recv_stream with kernel panic


(On mobile)

At first glance, it seems that the socket was freed, but there was a pending 
wakeup signal for it. Which then causes the subsequent spin_lock_bh() to deref 
freed mem.

//E

On May 3, 2016 02:43, "GUNA" <gbala...@gmail.com> wrote:
[...]
>> [375832.498126] BUG: unable to handle kernel paging request at 
>> 01a400015ff4
> [375832.505300] IP: [] queued_spin_lock_slowpath+0xe6/0x160
> [375832.512394] PGD 0
> [375832.514657] Oops: 0002 [#1] SMP
> [375832.518306] Modules linked in: nf_log_ipv6 nf_log_ipv4
> nf_log_common xt_LOG sctp libcrc32c e1000e tipc udp_tunnel
> ip6_udp_tunnel 8021q garp iTCO_wdt xt_physdev br_netfilter bridge stp
> llc nf_conntrack_ipv4 nf_defrag_ipv4 ipmiq_drv(O) sio_mmc(O)
> ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
> nf_conntrack lockd ip6table_filter event_drv(O) ip6_tables grace
> pt_timer_info(O) ddi(O) usb_storage ixgbe igb i2c_i801
> iTCO_vendor_support i2c_algo_bit ioatdma intel_ips i2c_core pcspkr
> sunrpc ptp mdio dca pps_core lpc_ich tpm_tis mfd_core tpm [last
> unloaded: iTCO_wdt]
> [375832.573693] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G   O
>  4.4.0 #14
> [375832.581385] Hardware name: PT AMC124/Base Board Product Name, BIOS
> LGNAJFIP.PTI.0012.P15 01/15/2014
> [375832.591028] task: 880351a89b40 ti: 880351a9 task.ti:
> 880351a9
> [375832.599026] RIP: 0010:[]  []
> queued_spin_lock_slowpath+0xe6/0x160
> [375832.608964] RSP: 0018:88035fc83d58  EFLAGS: 00010002
> [375832.614825] RAX: 1447 RBX: 0292 RCX:
> 88035fc95fc0
> [375832.622743] RDX: 01a400015ff4 RSI: 0014 RDI:
> 880351232f80
> [375832.630567] RBP: 88035fc83d58 R08: 0101 R09:
> 0004
> [375832.638348] R10:  R11:  R12:
> 01001002
> [375832.645919] R13: 0001 R14:  R15:
> 
> [375832.653610] FS:  () GS:88035fc8()
> knlGS:
> [375832.662317] CS:  0010 DS:  ES:  CR0: 8005003b
> [375832.668483] CR2: 01a400015ff4 CR3: 01c0a000 CR4:
> 06e0
> [375832.676133] Stack:
> [375832.678344]  88035fc83d78 816de2c1 88034a8bba60
> 880351232f80
> [375832.686163]  88035fc83db8 810bc592 88035fc83dc8
> 880351758000
> [375832.694139]  01001002  b802f4bd
> a024e6f0
> [375832.702154] Call Trace:
> [375832.704844]  
> [375832.707018]  [] rqsav_raw_spin_lock_ie+0x31/0x40

rqsav_raw_spin_lock_ie??
Is this some proprietary spinlock implementation?

> [375832.713970]  [] __wake_up+0x32/0x70
> [375832.719444]  [] ? tipc_recv_stream+0x370/0x370 [tipc]
> [375832.726589]  [] sock_def_wakeup+0x30/0x40
> [375832.732566]  [] tipc_sk_timeout+0x148/0x180 [tipc]
> [375832.739388]  [] ? tipc_recv_stream+0x370/0x370 [tipc]
> [375832.746507]  [] call_timer_fn+0x44/0x110
> [375832.752378]  [] ? cascade+0x4a/0x80
> [375832.757848]  [] ? tipc_recv_stream+0x370/0x370 [tipc]
> [375832.764871]  [] run_timer_softirq+0x22c/0x280
> [375832.771175]  [] __do_softirq+0xc8/0x260
> [375832.776958]  [] irq_exit+0x83/0xb0
> [375832.782369]  [] do_IRQ+0x65/0xf0
> [375832.787607]  [] common_interrupt+0x7f/0x7f
> [375832.793709]  
> [375832.795803]  [] ? cpuidle_enter_state+0xad/0x200
> [375832.802765]  [] ? cpuidle_enter_state+0x91/0x200
> [375832.809338]  [] cpuidle_enter+0x17/0x20
> [375832.815155]  [] call_cpuidle+0x37/0x60
> [375832.821184]  [] ? cpuidle_select+0x13/0x20
> [375832.827249]  [] cpu_startup_entry+0x211/0x2d0
> [375832.833535]  [] start_secondary+0x103/0x130
> [375832.839759] Code: 87 47 02 c1 e0 10 85 c0 74 38 48 89 c2 c1 e8 12
> 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 c0 5f 01 00 48 03 14 c5
> 00 b2 d1 81 <48> 89 0a 8b 41 08 85 c0 75 0d f3 90 8b 41 08 85 c0 74 f7
> eb 02
> [375832.861151] RIP  [] queued_spin_lock_slowpath+0xe6/0x160
> [375832.868607]  RSP 
> [375832.872371] CR2: 01a40

Re: [tipc-discussion] [PATCH net-next v1] tipc: add the ability to get UDP options via netlink

2016-05-01 Thread Xue, Ying

Acked-by: Ying Xue <ying@windriver.com>

-Original Message-
From: Richard Alpe [mailto:richard.a...@ericsson.com] 
Sent: April 25, 2016 16:51
To: tipc-discussion@lists.sourceforge.net
Cc: jon.ma...@ericsson.com; Xue, Ying; Richard Alpe
Subject: [PATCH net-next v1] tipc: add the ability to get UDP options via 
netlink

Add UDP bearer options to netlink bearer get message. This enables us to print 
UDP options from user space.

The UDP bearer information is passed using either a sockaddr_in or
sockaddr_in6 struct. This means the user space receiver should first
store the retrieved data in a large enough struct
(sockaddr_storage) before casting to the proper type.
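
A minimal user-space sketch of the receive-side handling described above,
assuming the attribute payload has already been extracted from the netlink
message. The helper name udp_opt_port() is hypothetical and not part of this
patch; it only illustrates copying into a sockaddr_storage and then reading
through the type that matches the address family.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not part of the patch): copy a received attribute
 * payload into a sockaddr_storage, then read the port through the type
 * that matches the address family. Returns the port in network byte
 * order, or 0 if the payload is not a usable IPv4/IPv6 address.
 */
in_port_t udp_opt_port(const void *payload, size_t len)
{
	struct sockaddr_storage ss;

	assert(payload != NULL);
	if (len < sizeof(sa_family_t) || len > sizeof(ss))
		return 0;

	/* sockaddr_storage is large enough for either address family. */
	memset(&ss, 0, sizeof(ss));
	memcpy(&ss, payload, len);

	if (ss.ss_family == AF_INET && len >= sizeof(struct sockaddr_in))
		return ((struct sockaddr_in *)&ss)->sin_port;
	if (ss.ss_family == AF_INET6 && len >= sizeof(struct sockaddr_in6))
		return ((struct sockaddr_in6 *)&ss)->sin6_port;
	return 0;
}
```

A caller would pass the payload and length of TIPC_NLA_UDP_LOCAL or
TIPC_NLA_UDP_REMOTE and convert the result with ntohs() before printing.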

Signed-off-by: Richard Alpe <richard.a...@ericsson.com>
---
 net/tipc/bearer.c|  7 +++
 net/tipc/udp_media.c | 53 
 net/tipc/udp_media.h | 42 +
 3 files changed, 102 insertions(+)
 create mode 100644 net/tipc/udp_media.h

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index 6f11c62..3fdb23f 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -41,6 +41,7 @@
 #include "discover.h"
 #include "bcast.h"
 #include "netlink.h"
+#include "udp_media.h"
 
 #define MAX_ADDR_STR 60
 
@@ -667,6 +668,12 @@ static int __tipc_nl_add_bearer(struct tipc_nl_msg *msg,
goto prop_msg_full;
 
nla_nest_end(msg->skb, prop);
+
+   if (bearer->media->type_id == TIPC_MEDIA_TYPE_UDP) {
+   if (__tipc_nl_add_udp_bearer(msg, bearer))
+   goto attr_msg_full;
+   }
+
nla_nest_end(msg->skb, attrs);
genlmsg_end(msg->skb, hdr);
 
diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c index c9cf2be..514337e 
100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -257,6 +257,59 @@ static int enable_mcast(struct udp_bearer *ub, struct 
udp_media_addr *remote)
return err;
 }
 
+static int __tipc_nl_add_udp_addr(struct sk_buff *skb,
+ struct udp_media_addr *addr, int nla_t) {
+   if (addr->proto == htons(ETH_P_IP)) {
+   struct sockaddr_in ip4;
+
+   ip4.sin_family = AF_INET;
+   ip4.sin_port = addr->udp_port;
+   ip4.sin_addr.s_addr = addr->ipv4.s_addr;
+   if (nla_put(skb, nla_t, sizeof(ip4), &ip4))
+   return -EMSGSIZE;
+
+#if IS_ENABLED(CONFIG_IPV6)
+   } else if (addr->proto == htons(ETH_P_IPV6)) {
+   struct sockaddr_in6 ip6;
+
+   ip6.sin6_family = AF_INET6;
+   ip6.sin6_port  = addr->udp_port;
+   memcpy(&ip6.sin6_addr, &addr->ipv6, sizeof(struct in6_addr));
+   if (nla_put(skb, nla_t, sizeof(ip6), &ip6))
+   return -EMSGSIZE;
+#endif
+   }
+
+   return 0;
+}
+
+int __tipc_nl_add_udp_bearer(struct tipc_nl_msg *msg, struct 
+tipc_bearer *b) {
+   int res;
+   struct nlattr *nest;
+   struct udp_media_addr *src;
+   struct udp_media_addr *dst;
+
+   src = (struct udp_media_addr *)&b->addr.value;
+   dst = (struct udp_media_addr *)&b->bcast_addr.value;
+
+   nest = nla_nest_start(msg->skb, TIPC_NLA_BEARER_UDP_OPTS);
+   if (!nest)
+   goto msg_full;
+
+   if (__tipc_nl_add_udp_addr(msg->skb, src, TIPC_NLA_UDP_LOCAL))
+   goto msg_full;
+   if (__tipc_nl_add_udp_addr(msg->skb, dst, TIPC_NLA_UDP_REMOTE))
+   goto msg_full;
+
+   nla_nest_end(msg->skb, nest);
+   return 0;
+msg_full:
+   nla_nest_cancel(msg->skb, nest);
+   return -EMSGSIZE;
+}
+
 /**
  * parse_options - build local/remote addresses from configuration
  * @attrs: netlink config data
diff --git a/net/tipc/udp_media.h b/net/tipc/udp_media.h new file mode 100644 
index 000..cde44d8
--- /dev/null
+++ b/net/tipc/udp_media.h
@@ -0,0 +1,42 @@
+/*
+ * net/tipc/udp_media.h: Include file for UDP bearer media
+ *
+ * Copyright (c) 1996-2006, 2013-2016, Ericsson AB
+ * Copyright (c) 2005, 2010-2011, Wind River Systems
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *contributors may be used to endorse or promote products derived from
+ *this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under t

Re: [tipc-discussion] [PATCH net-next 0/3] tipc: redesign socket-level flow control

2016-05-01 Thread Xue, Ying

Acked-by: Ying Xue <ying@windriver.com>

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: April 21, 2016 23:38
To: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
jon.ma...@ericsson.com
Cc: ma...@donjonn.com
Subject: [PATCH net-next 0/3] tipc: redesign socket-level flow control

The socket-level flow control in TIPC has long been due for a major overhaul. 
This series fixes this.

Jon Maloy (3):
  tipc: re-enable compensation for socket receive buffer double counting
  tipc: propagate peer node capabilities to socket layer
  tipc: redesign connecton-level flow control

 net/tipc/core.c   |   8 ++-
 net/tipc/msg.h|  14 +-
 net/tipc/node.c   |  21 +++-
 net/tipc/node.h   |   6 ++-
 net/tipc/socket.c | 144 +++---
 net/tipc/socket.h |  17 +--
 6 files changed, 145 insertions(+), 65 deletions(-)

--
1.9.1

___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Re: [tipc-discussion] [PATCH net-next 3/3] tipc: redesign connecton-level flow control

2016-05-01 Thread Xue, Ying



> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: April 21, 2016 23:38
> To: tipc-discussion@lists.sourceforge.net;
> parthasarathy.bhuvara...@ericsson.com; Xue, Ying;
> richard.a...@ericsson.com; jon.ma...@ericsson.com
> Cc: ma...@donjonn.com
> Subject: [PATCH net-next 3/3] tipc: redesign connecton-level flow control

s/connecton/connection

> 
> There are two flow control mechanisms in TIPC; one at link level that handles
> network congestion, burst control, and retransmission, and one at
> connection level whose only remaining task is to prevent overflow in the
> receiving socket buffer. In TIPC, the latter task has to be solved end-to-end
> because messages can not be thown away once they have been accepted

s/thown/thrown

> and delivered upwards from the link layer, i.e, we can never permit the
> receive buffer to overflow.
> 
> Currently, this algorithm is message based. A counter in the receiving socket
> keeps track of number of consumed messages, and sends a dedicated
> acknowledge message back to the sender for every 256 consumed messages.
> A counter at the sending end keeps track of the sent, not yet acknowledged
> messages, and blocks the sender if this number ever reaches
> 512 unacknowledged messages. When the missing acknowledge arrives, the
> socket is then woken up for renewed transmission. This works well for
> keeping the message flow running, as it almost never happens that a sender
> socket is blocked this way.
> 
> A problem with the current mechanism is that it is potentially very memory
> consuming. Since we don't distinguish beween very small and very large

s/beween/between

> messages, we have to dimension the socket receive buffer according to a
> worst-case of both. I.e., the window size must be chosen large enough to
> sustain a reasonable throughput even for the smallest messages, while we
> must still consider a scenario where all messages are of maximum size. Hence,
> the current fixed window size of 512 messages and a maximum message size of
> 66k results in a receive buffer of 66 MB when truesize(66k) = 131k is taken
> into account. It is possible to do much better.
> 
> This commit introduces an algorithm where we instead use 1024-byte blocks
> as base unit. This unit, always rounded upwards from the actual message size,
> is used when we advertise windows as well as when we count and
> acknowledge transmitted data. The advertised window is is based on the

Please remove one redundant "is"

> configured receive buffer size in such a way that even the worst-case
> truesize/msgsize ratio always is covered. Since the smallest possible message
> size (from a flow control viewpoint) now is
> 1024 bytes, we can safely assume this ratio to be less than four, which is the
> value we are now using.
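
The block arithmetic described above can be sketched as follows. This is
only an illustration under stated assumptions: the macro and function names
are invented here and are not taken from the patch, and the ratio of four is
the bound quoted in the description.

```c
#include <assert.h>

/* Assumed names, chosen for illustration; not taken from the patch. */
#define FLOWCTL_BLOCK_SIZE	1024u	/* base unit in bytes */
#define WORST_CASE_RATIO	4u	/* assumed truesize/msgsize bound */

/* Number of blocks a message of msg_len bytes counts as, rounded up. */
unsigned int msg_blocks(unsigned int msg_len)
{
	assert(msg_len > 0);
	return (msg_len + FLOWCTL_BLOCK_SIZE - 1) / FLOWCTL_BLOCK_SIZE;
}

/* Window, in blocks, that a receive buffer of rcvbuf bytes can advertise. */
unsigned int adv_window_blocks(unsigned int rcvbuf)
{
	return rcvbuf / WORST_CASE_RATIO / FLOWCTL_BLOCK_SIZE;
}
```

Under these assumptions a 2 MB receive buffer advertises 512 blocks, and a
1025-byte message is counted as two blocks.
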
> 
> This way, we have been able to reduce the default receive buffer size from
> 66 MB to 2 MB with maintained performance.
> 
> In order to keep this solution backwards compatible, we introduce a new
> capabability bit in the discovery protocol, and use this throughout the

s/capabability/capability

> message sending/reception path to always select the right base unit.
> 
> Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
> ---
>  net/tipc/core.c   |   8 ++--
>  net/tipc/msg.h|  14 +-
>  net/tipc/node.h   |   5 +-
>  net/tipc/socket.c | 140 +++-
> --
>  net/tipc/socket.h |  17 +--
>  5 files changed, 122 insertions(+), 62 deletions(-)
> 
> diff --git a/net/tipc/core.c b/net/tipc/core.c index 03a8428..3c94d76 100644
> --- a/net/tipc/core.c
> +++ b/net/tipc/core.c
> @@ -111,11 +111,9 @@ static int __init tipc_init(void)
> 
>   pr_info("Activated (version " TIPC_MOD_VER ")\n");
> 
> - sysctl_tipc_rmem[0] = TIPC_CONN_OVERLOAD_LIMIT >> 4 <<
> -   TIPC_LOW_IMPORTANCE;
> - sysctl_tipc_rmem[1] = TIPC_CONN_OVERLOAD_LIMIT >> 4 <<
> -   TIPC_CRITICAL_IMPORTANCE;
> - sysctl_tipc_rmem[2] = TIPC_CONN_OVERLOAD_LIMIT;
> + sysctl_tipc_rmem[0] = RCVBUF_MIN;
> + sysctl_tipc_rmem[1] = RCVBUF_DEF;
> + sysctl_tipc_rmem[2] = RCVBUF_MAX;
> 
>   err = tipc_netlink_start();
>   if (err)
> diff --git a/net/tipc/msg.h b/net/tipc/msg.h index 58bf515..024da8a 100644
> --- a/net/tipc/msg.h
> +++ b/net/tipc/msg.h
> @@ -743,16 +743,26 @@ static inline void msg_set_msgcnt(struct tipc_msg
> *m, u16 n)
>   msg_set_bits(m, 9, 16, 0xffff, n);
>  }
> 
> -static inline u32 msg_bcast_tag(struct tipc_msg *m)
> +static inline u32 msg_conn_ack(struct tipc_msg *m)
>  {
>   return msg_bits(m, 9, 16, 0xffff)

Re: [tipc-discussion] [patch] tipc: remove an unnecessary NULL check

2016-04-27 Thread Xue, Ying

> From: Dan Carpenter [mailto:dan.carpen...@oracle.com]
> Sent: April 27, 2016 16:05
> To: Jon Maloy
> Cc: Xue, Ying; David S. Miller; net...@vger.kernel.org; tipc-
> discuss...@lists.sourceforge.net; kernel-janit...@vger.kernel.org
> Subject: [patch] tipc: remove an unnecessary NULL check
> 
> This is never called with a NULL "buf" and anyway, we dereference 's' on the
> lines before so it would Oops before we reach the check.
> 
> Signed-off-by: Dan Carpenter <dan.carpen...@oracle.com>

Acked-by: Ying Xue <ying@windriver.com>

> 
> diff --git a/net/tipc/subscr.c b/net/tipc/subscr.c index 79de588..0dd0224
> 100644
> --- a/net/tipc/subscr.c
> +++ b/net/tipc/subscr.c
> @@ -326,8 +326,7 @@ static void tipc_subscrb_rcv_cb(struct net *net, int
> conid,
>   return tipc_subscrp_cancel(s, subscriber);
>   }
> 
> - if (s)
> - tipc_subscrp_subscribe(net, s, subscriber, swap);
> + tipc_subscrp_subscribe(net, s, subscriber, swap);
>  }
> 
>  /* Handle one request to establish a new subscriber */

Re: [tipc-discussion] [PATCH net-next v3 1/1] tipc: add neighbor monitoring framework

2016-04-25 Thread Xue, Ying

Hi Jon,

Please see my comments inline. Sorry, I am using the Outlook client to reply to
this mail.

Regards,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: April 21, 2016 0:23
To: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
jon.ma...@ericsson.com
Cc: ma...@donjonn.com
Subject: [PATCH net-next v3 1/1] tipc: add neighbor monitoring framework

TIPC based clusters are by default set up with full-mesh link connectivity 
between all nodes. Those links are expected to provide a short failure 
detection time, by default set to 1500 ms. Because of this, the background load 
for neighbor monitoring in an N-node cluster increases with a factor N on each 
node, while the overall monitoring traffic through the network infrastructure 
inceases at a ~(N * (N - 1)) rate. Experience has shown that such clusters 
don't scale well beyond ~100 nodes unless we significantly increase failure 
discovery tolerance.

[Ying]s/ inceases/ increases

This commit introduces a framework and an algorithm that drastically reduces 
this background load, while basically maintaining the original failure 
detection times across the whole cluster. Using this algortithm, background 
load will now grow at a rate of ~(2 * sqrt(N)) per node, and at ~(2 * N * 
sqrt(N)) in 

[Ying]s/ algortithm/algorithm

traffic overhead. As an example, each node will now have to actively monitor 38 
neighbors in a 400-node cluster, instead of as before 399.

This "Overlapping Ring Supervision Algorithm" is completely distributed and 
employs no centralized or coordinated state. It goes as follows:

- Each node makes up a linearly ascending, circular list of all its N
  known neighbors, based on their TIPC node identity. This algorithm
  must be the same on all nodes.

- The node then selects the next M = sqrt(N) - 1 nodes downstream from
  itself in the list, and chooses to actively monitor those. This is
  called its "local monitoring domain".

- It creates a domain record describing the monitoring domain, and
  piggy-backs this in the data area of all neighbor monitoring messages
  (LINK_PROTOCOL/STATE) leaving that node. This means that all nodes in
  the cluster eventually (default within 400 ms) will learn about
  its monitoring domain.

- Whenever a node discovers a change in its local domain, e.g., a node
  has been added or has gone down, it creates and sends out a new
  version of its node record to inform all neighbors about the change.

- A node receiving a domain record from anybody outside its local domain
  matches this against its own list (which may not look the same), and
  chooses to not actively monitor those members of the received domain
  record that are also present in its own list. Instead, it relies on
  indications from the direct monitoring nodes if an indirectly
  monitored node has gone up or down. If a node is indicated lost, the
  receiving node temporarily activates its own direct monitoring towards
  that node in order to confirm, or not, that it is actually gone.

- Since each node is actively montoring sqrt(N) downstream neighbors,

[Ying] s/ montoring/ monitoring

  each node is also actively monitored by the same number of upstream
  neighbors. This means that all non-direct monitoring nodes normally
  will receive sqrt(N) indications that a node is gone.

- A major drawback with ring monitoring is how it handles failures that
  cause massive network partitionings. If both a lost node and all its
  direct monitoring neighbors are inside the lost partition, the nodes in
  the remaining partition will never receive indications about the loss.
  To overcome this, each node also chooses to actively monitor some
  nodes outside its local domain. Those nodes are called remote domain
  "heads", and are selected in such a way that no node in the cluster
  will be more than two direct monitoring hops away. Because of this,
  each node, apart from monitoring the member of its local domain, will
  also typically monitor sqrt(N) remote head nodes.

- As an optimization, local list status, domain status and domain
  records are marked with a generation number. This saves senders from
  unecessarily conveying  unaltered domain records, and receivers from

[Ying]s/ unecessarily/ unnecessarily

  performing unneeded re-adaptations of their node monitoring list, such
  as re-assigning domain heads.
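
The local-domain selection step in the list above can be sketched as below.
This is a simplified illustration, not the patch's implementation: the
function names are hypothetical, and it assumes the sorted list holds N
entries including the local node.

```c
#include <assert.h>
#include <stddef.h>

/* Integer square root: largest r with r * r <= n. */
unsigned int isqrt(unsigned int n)
{
	unsigned int r = 0;

	while ((unsigned long long)(r + 1) * (r + 1) <= n)
		r++;
	return r;
}

/*
 * Hypothetical helper: given the ascending list of node identities
 * (treated as circular) and the index of the local node, copy the next
 * M = sqrt(N) - 1 downstream identities into out[]. Returns M.
 */
size_t select_local_domain(const unsigned int *nodes, size_t n,
			   size_t self, unsigned int *out)
{
	size_t m, i;

	assert(nodes != NULL && out != NULL);
	assert(n > 0 && self < n);

	m = isqrt((unsigned int)n) - 1;
	for (i = 0; i < m; i++)
		out[i] = nodes[(self + 1 + i) % n];	/* wrap around the ring */
	return m;
}
```

Because every node sorts the same identities the same way, each node derives
its domain independently, and the domains of adjacent nodes overlap.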

v2: - Updated according to comments from Richard Alpe. His proposal
  for the peer_xxx() functions didn't work out well, but for the
  rest it was ok.
- Added a breakpoint (cluster size) where monitoring algo should
  switch from "full mesh" to "overlapping ring". Default value
  32, but I even tested to set it to 100 without any problems.
  This should be the variable Partha should make configurable,
  instead of just toggling as I suggested originally. By setting

Re: [tipc-discussion] [PATCH net-next 1/1] tipc: add neighbor monitoring framework

2016-04-13 Thread Xue, Ying

Hi Jon,

It's a very nice solution.

But I have a few questions about the algorithm:

1. Is it possible to set a minimum node count that determines whether we deploy
the new algorithm or not? For instance, if the number of nodes in a cluster is
less than a threshold, such as 16 or a smaller value, we still use the old
algorithm to detect link failures. Of course, I am not sure whether it's worth
doing this. Please advise.

2. Is it possible to set different tolerance values for nodes in the local
domain and nodes in remote domains? In other words, if both neighbors are in
the same local domain, the tolerance can be set to a short time. In contrast,
if one neighbor is in the local domain and the other is in a remote domain, the
tolerance of the link between them can be set a bit longer so that the link is
not unnecessarily broken. Maybe the idea is not reasonable. Alternatively, as
the number of nodes in a cluster grows, we could automatically increase the
default link tolerance to avoid links being broken unnecessarily, which is a
bit simpler than the former.

Regards,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: April 10, 2016 2:38
To: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
jon.ma...@ericsson.com
Cc: ma...@donjonn.com
Subject: [PATCH net-next 1/1] tipc: add neighbor monitoring framework

TIPC based clusters are by default set up with full-mesh link connectivity 
between all nodes. Those links are expected to provide a short failure 
detection time, by default set to 1500 ms. Because of this, the background load 
for neighbor monitoring in an N-node cluster increases with a factor N on each 
node, while the overall monitoring traffic through the network infrastructure 
increases at a ~(N * (N - 1)) rate. Experience has shown that such clusters 
don't scale well beyond ~100 nodes unless we significantly increase failure 
discovery tolerance.

This commit introduces a framework and an algorithm that drastically reduces 
this background load, while basically maintaining the original failure 
detection times across the whole cluster. Using this algorithm, background 
load will now grow at a rate of ~(2 * sqrt(N)) per node, and at ~(2 * N * 
sqrt(N)) in traffic overhead. As an example, each node will now have to 
actively monitor 38 neighbors in a 400-node cluster, instead of as before 399.
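
The per-node figures quoted above can be reproduced with a short calculation.
The model used here is an assumption chosen to match the 38-neighbor example:
sqrt(N) - 1 local domain members plus the same number of remote heads. The
function names are hypothetical.

```c
#include <assert.h>

/* Integer square root: largest r with r * r <= n. */
unsigned int int_sqrt(unsigned int n)
{
	unsigned int r = 0;

	while ((unsigned long long)(r + 1) * (r + 1) <= n)
		r++;
	return r;
}

/* Neighbors actively monitored per node with full-mesh supervision. */
unsigned int full_mesh_monitored(unsigned int n)
{
	assert(n > 0);
	return n - 1;
}

/*
 * Assumed model for ring supervision: sqrt(N) - 1 local domain members
 * plus sqrt(N) - 1 remote domain heads.
 */
unsigned int ring_monitored(unsigned int n)
{
	assert(n > 0);
	return 2 * (int_sqrt(n) - 1);
}
```

For a 400-node cluster this gives 399 actively monitored neighbors with full
mesh and 38 with the ring model.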

This "Overlapping Ring Supervision Algorithm" is completely distributed and 
employs no centralized state. It goes as follows:

- Each node makes up a linearly ascending, circular list of all its
  N known neighbors, based on their TIPC node identity. This algorithm
  must be the same on all nodes.

- The node then selects the next M = sqrt(N)-1 nodes downstream in the
  list, and chooses to actively monitor those. This is called its
  "local monitoring domain".

- It creates a domain record describing the monitoring domain, and
  piggy-backs this in the data area of all neighbor monitoring messages
  (LINK_PROTOCOL/STATE) leaving that node. This means that all nodes in
  the cluster eventually (default within 400 ms) will learn about
  its monitoring domain.

- Whenever a node discovers a change in its local domain, e.g., a node
  has been added or has gone down, it creates and sends out a new
  version of its node record to inform all neighbors about the change.

- A node receiving a domain record from anybody outside its local domain
  matches this against its own list (which may not look the same), and
  chooses to not actively monitor those members of the received domain
  record that are also present in its own list. Instead, it relies on
  indications from the direct monitoring nodes if an indirectly monitored
  node has gone up or down. If a node is indicated lost, the receiving
  node temporarily activates its own direct monitoring towards that node
  in order to confirm, or not, that it is actually gone.

- Since each node is actively monitoring sqrt(N) downstream neighbors,
  each node is also actively monitored by the same number of upstream
  neighbors. This means that all non-direct monitoring nodes normally
  will receive sqrt(N) indications that a node is gone.

- A major drawback with ring monitoring is how it handles failures that
  cause massive network partitionings. If both a lost node and all its
  direct monitoring neighbors are inside the lost partition, the nodes in
  the remaining partition will never receive indications about the loss.
  To overcome this, each node also chooses to actively monitor some
  nodes outside its local domain. Those nodes are called remote domain
  "heads", and are selected in such a way that no node in the cluster
  is more than one indirect monitoring hop away. Because of this, each
  node, apart from monitoring the member of its local domain, will also
  typically monitor sqrt(N) remote head nodes.

- As an optimizat

Re: [tipc-discussion] Tr : [PATCH net-next v2 4/5] tipc: ensure that first packets on link are sent in order

2016-04-13 Thread Xue, Ying

Hi Jon,

Thanks for your clear explanation. Nice change!

Regards,
Ying

From: Jon Maloy [mailto:jon.ma...@ericsson.com]
Sent: April 13, 2016 3:51
To: Xue, Ying; tipc-discussion@lists.sourceforge.net; Jon Maloy 
(ma...@donjonn.com); Richard Alpe; Parthasarathy Bhuvaragan; 
erik.hu...@gmail.com
Subject: RE: Tr : [PATCH net-next v2 4/5] tipc: ensure that first packets on 
link are sent in order

Hi Ying,
(Mail forwarded. For some reason some of your mails only show up in my private 
mailbox, and not in Ericsson’s)

It basically goes like this:

tipc_rcv()
{
    ….
    node_write_lock()
    tipc_node_up()
        build_bcast_sync_msg()    //pkt #1
    node_write_unlock()
    named_node_up()
        named_distribute()
            tipc_node_xmit()
                bearer_xmit()     //pkt #2
        named_distribute()
            tipc_node_xmit()
                bearer_xmit()     //pkt #3
    bearer_xmit()                 //pkt #1
}

We could of course have fixed this by passing the same queue along everywhere, 
but it is too intrusive for such a small problem.

///jon


From: Jon Maloy [mailto:ma...@donjonn.com]
Sent: Tuesday, 12 April, 2016 15:32
To: Jon Maloy
Subject: Tr : [PATCH net-next v2 4/5] tipc: ensure that first packets on link 
are sent in order



- Forwarded message -
From: "Xue, Ying" <ying@windriver.com>
To: Jon Maloy <jon.ma...@ericsson.com>;
"tipc-discussion@lists.sourceforge.net"
<tipc-discussion@lists.sourceforge.net>;
"parthasarathy.bhuvara...@ericsson.com"
<parthasarathy.bhuvara...@ericsson.com>;
"richard.a...@ericsson.com" <richard.a...@ericsson.com>
Cc: "ma...@donjonn.com" <ma...@donjonn.com>
Sent: Tuesday, April 12, 2016 6:47
Subject: RE: [PATCH net-next v2 4/5] tipc: ensure that first packets on link are
sent in order

Hi Jon,

Can you please explain a bit more what type of packets are disordered in link 
establishment stage?

Thanks,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com]
Sent: April 11, 2016 21:00
To: tipc-discussion@lists.sourceforge.net;
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com;
jon.ma...@ericsson.com
Cc: ma...@donjonn.com
Subject: [PATCH net-next v2 4/5] tipc: ensure that first packets on link are 
sent in order

In some link establishment scenarios we see that packet #2 may be sent out 
before packet #1, forcing the receiver to demand retransmission of the missing 
packet. This is harmless, but may cause confusion among people tracing the 
packet flow.

Since this is extremely easy to fix, we do so by adding an extra send call to 
the bearer immediately after the link has come up.

Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
net/tipc/node.c | 4 
1 file changed, 4 insertions(+)

diff --git a/net/tipc/node.c b/net/tipc/node.c index ace178f..b00e12c 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -581,8 +581,12 @@ static void __tipc_node_link_up(struct tipc_node *n, int 
bearer_id,  static void tipc_node_link_up(struct tipc_node *n, int bearer_id,
  struct sk_buff_head *xmitq)
{
+struct tipc_media_addr *maddr;
+
tipc_node_write_lock(n);
__tipc_node_link_up(n, bearer_id, xmitq);
+maddr = &n->links[bearer_id].maddr;
+tipc_bearer_xmit(n->net, bearer_id, xmitq, maddr);
tipc_node_write_unlock(n);
}

--
1.9.1


Re: [tipc-discussion] tipc tool

2016-04-13 Thread Xue, Ying

That sounds like great news!

Regards,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: April 13, 2016 1:03
To: tipc-discussion@lists.sourceforge.net; Richard Alpe; Xue, Ying; 
Parthasarathy Bhuvaragan; erik.hu...@gmail.com
Cc: Johan Mårtensson O; Henrik Persson
Subject: tipc tool

I just made an "apt-get install iproute2"  on Ubuntu 16.04 (Xenial), and could 
finally confirm that the tipc tool now is built by default for this package.
I had to issue a TR to make them do this.

/jon

Re: [tipc-discussion] [PATCH net-next v2 4/5] tipc: ensure that first packets on link are sent in order

2016-04-12 Thread Xue, Ying

Hi Jon,

Can you please explain a bit more what type of packets are disordered in link 
establishment stage?

Thanks,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: April 11, 2016 21:00
To: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
jon.ma...@ericsson.com
Cc: ma...@donjonn.com
Subject: [PATCH net-next v2 4/5] tipc: ensure that first packets on link are 
sent in order

In some link establishment scenarios we see that packet #2 may be sent out 
before packet #1, forcing the receiver to demand retransmission of the missing 
packet. This is harmless, but may cause confusion among people tracing the 
packet flow.

Since this is extremely easy to fix, we do so by adding an extra send call to 
the bearer immediately after the link has come up.

Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/node.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/tipc/node.c b/net/tipc/node.c index ace178f..b00e12c 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -581,8 +581,12 @@ static void __tipc_node_link_up(struct tipc_node *n, int 
bearer_id,  static void tipc_node_link_up(struct tipc_node *n, int bearer_id,
  struct sk_buff_head *xmitq)
 {
+   struct tipc_media_addr *maddr;
+
tipc_node_write_lock(n);
__tipc_node_link_up(n, bearer_id, xmitq);
+   maddr = &n->links[bearer_id].maddr;
+   tipc_bearer_xmit(n->net, bearer_id, xmitq, maddr);
tipc_node_write_unlock(n);
 }
 
--
1.9.1


Re: [tipc-discussion] [PATCH net-next v1 1/4] tipc: add net device to skb before UDP xmit

2016-03-03 Thread Xue, Ying

As I just saw, you had submitted the series to net-next. Anyway, this is a very 
good job!

Thanks,
Ying

-Original Message-
From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On 
Behalf Of Richard Alpe
Sent: March 3, 2016 21:21
To: net...@vger.kernel.org
Cc: tipc-discussion@lists.sourceforge.net; Richard Alpe
Subject: [PATCH net-next v1 1/4] tipc: add net device to skb before UDP xmit

Prior to this patch, enabling an IPv4 UDP bearer caused a null pointer 
dereference in iptunnel_xmit_stats(), when it tried to dereference the net 
device from the skb. To resolve this we now point the skb device to the net 
device resolved from the routing table.

Fixes: 039f50629b7f (ip_tunnel: Move stats update to iptunnel_xmit())
Signed-off-by: Richard Alpe 
Acked-by: Jon Maloy 
Reviewed-by: Erik Hugne 
---
 net/tipc/udp_media.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index d63a911..f22a5bb1 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -181,6 +181,8 @@ static int tipc_udp_send_msg(struct net *net, struct sk_buff *skb,
 			err = PTR_ERR(rt);
 			goto tx_error;
 		}
+
+		skb->dev = rt->dst.dev;
 		ttl = ip4_dst_hoplimit(&rt->dst);
 		udp_tunnel_xmit_skb(rt, ub->ubsock->sk, skb, src->ipv4.s_addr,
 				    dst->ipv4.s_addr, 0, ttl, 0, src->udp_port,
--
2.1.4


Re: [tipc-discussion] [PATCH v1 2/4] tipc: don't check link reset on non existing link

2016-03-03 Thread Xue, Ying

Acked-by: Ying Xue <ying@windriver.com>

-Original Message-
From: Richard Alpe [mailto:richard.a...@ericsson.com] 
Sent: March 3, 2016 1:11
To: tipc-discussion@lists.sourceforge.net
Cc: jon.ma...@ericsson.com; Xue, Ying; parthasarathy.bhuvara...@ericsson.com; 
Richard Alpe
Subject: [PATCH v1 2/4] tipc: don't check link reset on non existing link

Make sure we have a link before checking if it has been reset or not.

Prior to this patch tipc_link_is_reset() could be called with a non existing 
link, resulting in a null pointer dereference.

Signed-off-by: Richard Alpe <richard.a...@ericsson.com>
---
 net/tipc/node.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/node.c b/net/tipc/node.c
index cdb7950..590d597 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -843,7 +843,7 @@ void tipc_node_check_dest(struct net *net, u32 onode,
 	memcpy(&le->maddr, maddr, sizeof(*maddr));
 exit:
 	tipc_node_write_unlock(n);
-	if (reset && !tipc_link_is_reset(l))
+	if (reset && l && !tipc_link_is_reset(l))
 		tipc_node_link_down(n, b->identity, false);
 	tipc_node_put(n);
 }
--
2.1.4
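
The one-line fix above relies on the left-to-right, short-circuit evaluation of
&& in C. A minimal user-space sketch of the same guard follows. The names
toy_link, toy_link_is_reset() and should_take_link_down() are hypothetical
stand-ins for illustration, not TIPC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a link object; not the kernel's struct. */
struct toy_link {
	bool is_reset;
};

/* Dereferences its argument, so it must never be handed NULL. */
static bool toy_link_is_reset(const struct toy_link *l)
{
	assert(l != NULL);
	return l->is_reset;
}

/*
 * Mirrors the shape of the patched condition. && stops at the first false
 * operand, so toy_link_is_reset() is only reached when l is non-NULL.
 */
static bool should_take_link_down(bool reset, const struct toy_link *l)
{
	return reset && l && !toy_link_is_reset(l);
}
```

With the guard in place, should_take_link_down(true, NULL) simply evaluates to
false instead of dereferencing a null pointer.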


Re: [tipc-discussion] [PATCH v1 1/4] tipc: add net device to skb before UDP xmit

2016-03-03 Thread Xue, Ying

Acked-by: Ying Xue <ying@windriver.com>

-Original Message-
From: Richard Alpe [mailto:richard.a...@ericsson.com] 
Sent: March 3, 2016 1:11
To: tipc-discussion@lists.sourceforge.net
Cc: jon.ma...@ericsson.com; Xue, Ying; parthasarathy.bhuvara...@ericsson.com; 
Richard Alpe
Subject: [PATCH v1 1/4] tipc: add net device to skb before UDP xmit

Prior to this patch, enabling an IPv4 UDP bearer caused a null pointer 
dereference in iptunnel_xmit_stats(), when it tried to dereference the net 
device from the skb. To resolve this we now point the skb device to the net 
device resolved from the routing table.

Fixes: 039f50629b7f (ip_tunnel: Move stats update to iptunnel_xmit())
Signed-off-by: Richard Alpe <richard.a...@ericsson.com>
---
 net/tipc/udp_media.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index d63a911..f22a5bb1 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -181,6 +181,8 @@ static int tipc_udp_send_msg(struct net *net, struct sk_buff *skb,
 			err = PTR_ERR(rt);
 			goto tx_error;
 		}
+
+		skb->dev = rt->dst.dev;
 		ttl = ip4_dst_hoplimit(&rt->dst);
 		udp_tunnel_xmit_skb(rt, ub->ubsock->sk, skb, src->ipv4.s_addr,
 				    dst->ipv4.s_addr, 0, ttl, 0, src->udp_port,
--
2.1.4


Re: [tipc-discussion] [PATCH net-next v3 1/1] tipc: fix crash during node removal

2016-02-24 Thread Xue, Ying

Acked-by: Ying Xue <ying@windriver.com>

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: February 20, 2016 9:17
To: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
jon.ma...@ericsson.com; huzhiji...@gmail.com
Cc: ma...@donjonn.com
Subject: [PATCH net-next v3 1/1] tipc: fix crash during node removal

When the TIPC module is unloaded, we have identified a race condition that 
allows a node reference counter to go to zero and the node instance to be freed 
before the node timer has finished accessing it. This leads to occasional 
crashes, especially in multi-namespace environments.

The scenario goes as follows:

CPU0:(node_stop)                       CPU1:(node_timeout)  // ref == 2

1:                                          if(!mod_timer())
2: if (del_timer())
3:   tipc_node_put()                                        // ref -> 1
4: tipc_node_put()                                          // ref -> 0
5:   kfree_rcu(node);
6:                                               tipc_node_get(node)
7:                                               // BOOM!

We now clean up this functionality as follows:

1) We remove the node pointer from the node lookup table before we
   attempt deactivating the timer. This way, we reduce the risk that
   tipc_node_find() may obtain a valid pointer to an instance marked
   for deletion; a harmless but undesirable situation.

2) We use del_timer_sync() instead of del_timer() to safely deactivate
   the node timer without any risk that it might be reactivated by the
   timeout handler. There is no risk of deadlock here, since the two
   functions never touch the same spinlocks.

3) We remove a pointless tipc_node_get() + tipc_node_put() from the
   timeout handler.

v3: - changed to del_timer_sync()
    - don't test for return value of mod_timer() in tipc_node_timeout(),
      and don't touch kref.

Reported-by: Zhijiang Hu <huzhiji...@gmail.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/node.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/net/tipc/node.c b/net/tipc/node.c
index f310e931..1fdaed0 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -225,9 +225,10 @@ static unsigned int tipc_hashfn(u32 addr)
 
 static void tipc_node_kref_release(struct kref *kref)
 {
-	struct tipc_node *node = container_of(kref, struct tipc_node, kref);
+	struct tipc_node *n = container_of(kref, struct tipc_node, kref);
 
-	tipc_node_delete(node);
+	kfree(n->bc_entry.link);
+	kfree_rcu(n, rcu);
 }
 
 static void tipc_node_put(struct tipc_node *node)
@@ -395,8 +396,10 @@ static void tipc_node_delete(struct tipc_node *node)
 {
 	list_del_rcu(&node->list);
 	hlist_del_rcu(&node->hash);
-	kfree(node->bc_entry.link);
-	kfree_rcu(node, rcu);
+	tipc_node_put(node);
+
+	del_timer_sync(&node->timer);
+	tipc_node_put(node);
 }
 
 void tipc_node_stop(struct net *net)
@@ -405,11 +408,8 @@ void tipc_node_stop(struct net *net)
 	struct tipc_node *node, *t_node;
 
 	spin_lock_bh(&tn->node_list_lock);
-	list_for_each_entry_safe(node, t_node, &tn->node_list, list) {
-		if (del_timer(&node->timer))
-			tipc_node_put(node);
-		tipc_node_put(node);
-	}
+	list_for_each_entry_safe(node, t_node, &tn->node_list, list)
+		tipc_node_delete(node);
 	spin_unlock_bh(&tn->node_list_lock);
 }
 
@@ -530,9 +530,7 @@ static void tipc_node_timeout(unsigned long data)
 		if (rc & TIPC_LINK_DOWN_EVT)
 			tipc_node_link_down(n, bearer_id, false);
 	}
-	if (!mod_timer(&n->timer, jiffies + n->keepalive_intv))
-		tipc_node_get(n);
-	tipc_node_put(n);
+	mod_timer(&n->timer, jiffies + n->keepalive_intv);
 }
 
 /**
--
1.9.1
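
The seven-step interleaving in the commit message can be replayed
deterministically in user space. The sketch below is a single-threaded toy
model, not kernel code: toy_node, toy_mod_timer(), toy_del_timer() and the two
replay functions are hypothetical names, and the effect of del_timer_sync() is
only approximated by running the teardown after the handler has returned.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct tipc_node; all names here are hypothetical. */
struct toy_node {
	int ref;		/* models the kref count */
	bool timer_pending;	/* models whether the node timer is queued */
	bool freed;		/* set where the real code would kfree_rcu() */
};

static void toy_node_put(struct toy_node *n)
{
	assert(n->ref > 0);
	if (--n->ref == 0)
		n->freed = true;
}

/* Like mod_timer(): returns 1 if the timer was already pending, else 0. */
static int toy_mod_timer(struct toy_node *n)
{
	int was_pending = n->timer_pending;

	n->timer_pending = true;
	return was_pending;
}

/* Like del_timer(): returns 1 if it deactivated a pending timer, else 0. */
static int toy_del_timer(struct toy_node *n)
{
	int was_pending = n->timer_pending;

	n->timer_pending = false;
	return was_pending;
}

/*
 * Replays steps 1-6 of the old code in the order shown in the commit message.
 * Returns true if the node is already freed when the handler reaches the
 * tipc_node_get() it decided on at step 1.
 */
static bool replay_old_teardown(void)
{
	/* The handler is running, so the timer is currently detached. */
	struct toy_node n = { .ref = 2, .timer_pending = false, .freed = false };
	bool handler_will_get;

	handler_will_get = !toy_mod_timer(&n);	/* 1: CPU1 re-arms the timer */
	if (toy_del_timer(&n))			/* 2: CPU0 sees it pending */
		toy_node_put(&n);		/* 3: ref -> 1 */
	toy_node_put(&n);			/* 4-5: ref -> 0, node freed */
	return handler_will_get && n.freed;	/* 6: get on a freed node */
}

/*
 * Replays the patched ordering: the handler only re-arms the timer, and the
 * teardown is modelled as completing after the handler has returned.
 */
static bool replay_new_teardown(void)
{
	struct toy_node n = { .ref = 2, .timer_pending = false, .freed = false };

	toy_mod_timer(&n);	/* handler: re-arm only, kref untouched */
	toy_node_put(&n);	/* tipc_node_delete(): ref -> 1 */
	toy_del_timer(&n);	/* stands in for del_timer_sync() */
	toy_node_put(&n);	/* ref -> 0, freed with the timer inactive */
	return n.freed && !n.timer_pending;
}
```

In this model replay_old_teardown() reports the use-after-free window, while
replay_new_teardown() ends with the node freed and no timer left pending.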


Re: [tipc-discussion] [PATCH net-next v3 1/1] tipc: fix crash during node removal

2016-02-24 Thread Xue, Ying

Hi Jon,

Thanks for your clear explanation. Yes, I misunderstood the scenario. You are 
right, and your solution is much simpler and safer than before.

Please go ahead.

Thanks,
Ying

From: Jon Maloy [mailto:jon.ma...@ericsson.com]
Sent: February 23, 2016 21:17
To: Xue, Ying; jason
Cc: Jon Maloy; Richard Alpe; Parthasarathy Bhuvaragan; 
tipc-discussion@lists.sourceforge.net
Subject: RE: [PATCH net-next v3 1/1] tipc: fix crash during node removal

Ying,
del_timer() definitely returns 1 after the timer function has called 
mod_timer(), which is the case here. After mod_timer() has been called, the 
timer is active, no matter its previous state and return value. So, in the 
brief interval between mod_timer() and tipc_node_get(), this scenario can 
happen. Please explain why you don’t see a problem with this. Let me also 
remind you that we actually see a crash happening here.

I am aware of how this is done in the socket code, and it is the correct 
solution there, but not in our case, or at least it is unnecessarily complex. 
They have done it that way because they anticipate that sk_reset_timer() may 
be called after sk_stop_timer(), and hence restart a stopped timer.

This never happens in our case. The timer is started when the node is created, 
and continues running until it is deleted. Full stop. Between those two points 
in time kref is kept incremented, and the fact that the timer is “inactive” 
during most of the execution of the timer function changes nothing about that. 
The timer is once again, unconditionally, active when that function returns.
Why make things more complex than necessary?

Regards
///jon
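
The two del_timer() outcomes debated in this thread can be put side by side in
a small user-space model. This is only a sketch under stated assumptions: the
timer core is modelled as detaching the timer before calling its handler, and
toy_timer, timer_model_mod() and timer_model_del() are hypothetical names that
imitate the return-value conventions of mod_timer() and del_timer().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a timer's pending state; all names here are hypothetical. */
struct toy_timer {
	bool pending;
};

/* Models the timer core detaching an expired timer before running it. */
static void timer_model_detach(struct toy_timer *t)
{
	t->pending = false;
}

/* Like mod_timer(): returns 1 if the timer was already pending, else 0. */
static int timer_model_mod(struct toy_timer *t)
{
	int was_pending = t->pending;

	t->pending = true;
	return was_pending;
}

/* Like del_timer(): returns 1 if it deactivated a pending timer, else 0. */
static int timer_model_del(struct toy_timer *t)
{
	int was_pending = t->pending;

	t->pending = false;
	return was_pending;
}

/* del result when it lands inside the handler, before the re-arm. */
static int del_result_before_rearm(void)
{
	struct toy_timer t = { .pending = true };

	timer_model_detach(&t);
	return timer_model_del(&t);
}

/* del result when it lands inside the handler, after the re-arm. */
static int del_result_after_rearm(void)
{
	struct toy_timer t = { .pending = true };

	timer_model_detach(&t);
	timer_model_mod(&t);	/* the handler re-arms its own timer */
	return timer_model_del(&t);
}
```

Under this model the first case yields 0 and the second yields 1, so both
observations hold, but at different points in the handler's execution.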

From: Xue, Ying [mailto:ying@windriver.com]
Sent: Tuesday, 23 February, 2016 05:20
To: jason
Cc: Jon Maloy; Richard Alpe; Parthasarathy Bhuvaragan; 
tipc-discussion@lists.sourceforge.net; Jon Maloy
Subject: RE: [PATCH net-next v3 1/1] tipc: fix crash during node removal

Even if the node timeout function restarts the node timer with mod_timer(), I 
don’t see any problem here.
If you are concerned about how the node timer and its refcount are handled, 
please look at a similar example that demonstrates how to maintain the 
relationship between sk_timer and sk_refcnt. For instance, you can refer to 
both sk_reset_timer() and sk_stop_timer().

Regards,
Ying

From: jason [mailto:huzhiji...@gmail.com]
Sent: February 23, 2016 17:55
To: Xue, Ying
Cc: Jon Maloy; Richard Alpe; parthasarathy.bhuvara...@ericsson.com; 
tipc-discussion@lists.sourceforge.net; Jon Maloy
Subject: RE: [PATCH net-next v3 1/1] tipc: fix crash during node removal

On Feb 22, 2016 11:22 PM, "Xue, Ying" <ying@windriver.com> wrote:
>
> Hi Jon,
>
> I think the scenario described below is not true. This is because del_timer() 
> doesn't return 1 while node_timeout() is being called. Please take a look at 
> __run_timers() defined in kernel/time/timer.c. When a timer has expired, 
> __run_timers() will call the timeout function attached to the timer. But 
> before __run_timers() runs the timeout function, it first detaches the 
> expired timer,

Hi Ying,
But doesn't the timeout function here call mod_timer(), which attaches the 
timer again?

> and then performs its timeout function. That means del_timer() will definitely 
> return 0 while node_timeout() is under execution. So, the following scenario 
> cannot happen at all.
>
> Regards,
> Ying
>
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: February 20, 2016 9:17
> To: tipc-discussion@lists.sourceforge.net; 
> parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
> jon.ma...@ericsson.com; huzhiji...@gmail.com
> Cc: ma...@donjonn.com
> Subject: [PATCH net-next v3 1/1] tipc: fix crash during node removal
>
> When the TIPC module is unloaded, we have identified a race condition that 
> allows a node reference counter to go to zero and the node instance to be 
> freed before the node timer has finished accessing it. This leads to 
> occasional crashes, especially in multi-namespace environments.
>
> The scenario goes as follows:
>
> CPU0:(node_stop)                       CPU1:(node_timeout)  // ref == 2
>
> 1:                                          if(!mod_timer())
> 2: if (del_timer())
> 3:   tipc_node_put()                                        // ref -> 1
> 4: tipc_node_put()                                          // ref -> 0

Re: [tipc-discussion] [PATCH net-next v3 1/1] tipc: eliminate risk of finding to-be-deleted node instance

2016-02-22 Thread Xue, Ying

Acked-by: Ying Xue <ying@windriver.com>

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: February 20, 2016 7:52
To: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
jon.ma...@ericsson.com; huzhiji...@gmail.com
Cc: ma...@donjonn.com
Subject: [PATCH net-next v3 1/1] tipc: eliminate risk of finding to-be-deleted 
node instance

Although we have never seen it happen, we have identified the following 
problematic scenario when nodes are stopped and deleted:

CPU0:                            CPU1:

tipc_node_xxx()                                   //ref == 1
   tipc_node_put()                                //ref -> 0
                                 tipc_node_find() // node still in table
       tipc_node_delete()
         list_del_rcu(n. list)
                                 tipc_node_get()  //ref -> 1, bad
         kfree_rcu()

                                 tipc_node_put() //ref to 0 again.
                                 kfree_rcu()     // BOOM!

We fix this by introducing use of the conditional kref_get_unless_zero() 
instead of kref_get() in the function tipc_node_find(). This eliminates any 
risk of post-mortem access.

Reported-by: Zhijiang Hu <huzhiji...@gmail.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/node.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/net/tipc/node.c b/net/tipc/node.c
index 10a1e87..24cc8ec 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -245,23 +245,23 @@ static void tipc_node_get(struct tipc_node *node)
  */
 static struct tipc_node *tipc_node_find(struct net *net, u32 addr)
 {
-	struct tipc_net *tn = net_generic(net, tipc_net_id);
+	struct tipc_net *tn = tipc_net(net);
 	struct tipc_node *node;
+	unsigned int thash = tipc_hashfn(addr);
 
 	if (unlikely(!in_own_cluster_exact(net, addr)))
 		return NULL;
 
 	rcu_read_lock();
-	hlist_for_each_entry_rcu(node, &tn->node_htable[tipc_hashfn(addr)],
-				 hash) {
-		if (node->addr == addr) {
-			tipc_node_get(node);
-			rcu_read_unlock();
-			return node;
-		}
+	hlist_for_each_entry_rcu(node, &tn->node_htable[thash], hash) {
+		if (node->addr != addr)
+			continue;
+		if (!kref_get_unless_zero(&node->kref))
+			node = NULL;
+		break;
 	}
 	rcu_read_unlock();
-	return NULL;
+	return node;
 }
 
 static void tipc_node_read_lock(struct tipc_node *n)
--
1.9.1
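
The idea behind a conditional "get unless zero" can be sketched in user space
with C11 atomics. This is an illustration only and does not reproduce the
kernel's kref implementation: toy_ref and its helpers are hypothetical names.
The point is that a lookup may only take a reference while the count is still
non-zero, so an object whose last reference has been dropped cannot be revived.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a reference counter. */
struct toy_ref {
	atomic_int count;
};

static void toy_ref_init(struct toy_ref *r)
{
	atomic_init(&r->count, 1);
}

/* Conditional get: succeeds only while at least one reference remains. */
static bool toy_ref_get_unless_zero(struct toy_ref *r)
{
	int old = atomic_load(&r->count);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool toy_ref_put(struct toy_ref *r)
{
	int old = atomic_fetch_sub(&r->count, 1);

	assert(old > 0);	/* a put without a matching get is a bug */
	return old == 1;
}
```

A lookup routine written in this style would treat a false return from
toy_ref_get_unless_zero() as "not found", which corresponds to the
node = NULL branch in the patch.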


Re: [tipc-discussion] [PATCH net-next v3 1/1] tipc: fix crash during node removal

2016-02-22 Thread Xue, Ying

Hi Jon,

I think the scenario described below is not true. This is because del_timer() 
doesn't return 1 while node_timeout() is being called. Please take a look at 
__run_timers() defined in kernel/time/timer.c. When a timer has expired, 
__run_timers() will call the timeout function attached to the timer. But before 
__run_timers() runs the timeout function, it first detaches the expired timer, 
and then performs its timeout function. That means del_timer() will definitely 
return 0 while node_timeout() is under execution. So, the following scenario 
cannot happen at all.

Regards,
Ying

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: February 20, 2016 9:17
To: tipc-discussion@lists.sourceforge.net; 
parthasarathy.bhuvara...@ericsson.com; Xue, Ying; richard.a...@ericsson.com; 
jon.ma...@ericsson.com; huzhiji...@gmail.com
Cc: ma...@donjonn.com
Subject: [PATCH net-next v3 1/1] tipc: fix crash during node removal

When the TIPC module is unloaded, we have identified a race condition that 
allows a node reference counter to go to zero and the node instance to be freed 
before the node timer has finished accessing it. This leads to occasional 
crashes, especially in multi-namespace environments.

The scenario goes as follows:

CPU0:(node_stop)                       CPU1:(node_timeout)  // ref == 2

1:                                          if(!mod_timer())
2: if (del_timer())
3:   tipc_node_put()                                        // ref -> 1
4: tipc_node_put()                                          // ref -> 0
5:   kfree_rcu(node);
6:                                               tipc_node_get(node)
7:                                               // BOOM!

We now clean up this functionality as follows:

1) We remove the node pointer from the node lookup table before we
   attempt deactivating the timer. This way, we reduce the risk that
   tipc_node_find() may obtain a valid pointer to an instance marked
   for deletion; a harmless but undesirable situation.

2) We use del_timer_sync() instead of del_timer() to safely deactivate
   the node timer without any risk that it might be reactivated by the
   timeout handler. There is no risk of deadlock here, since the two
   functions never touch the same spinlocks.

3) We remove a pointless tipc_node_get() + tipc_node_put() from the
   timeout handler.

v3: - changed to del_timer_sync()
    - don't test for return value of mod_timer() in tipc_node_timeout(),
      and don't touch kref.

Reported-by: Zhijiang Hu <huzhiji...@gmail.com>
Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>
---
 net/tipc/node.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/net/tipc/node.c b/net/tipc/node.c
index f310e931..1fdaed0 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -225,9 +225,10 @@ static unsigned int tipc_hashfn(u32 addr)
 
 static void tipc_node_kref_release(struct kref *kref)
 {
-	struct tipc_node *node = container_of(kref, struct tipc_node, kref);
+	struct tipc_node *n = container_of(kref, struct tipc_node, kref);
 
-	tipc_node_delete(node);
+	kfree(n->bc_entry.link);
+	kfree_rcu(n, rcu);
 }
 
 static void tipc_node_put(struct tipc_node *node)
@@ -395,8 +396,10 @@ static void tipc_node_delete(struct tipc_node *node)
 {
 	list_del_rcu(&node->list);
 	hlist_del_rcu(&node->hash);
-	kfree(node->bc_entry.link);
-	kfree_rcu(node, rcu);
+	tipc_node_put(node);
+
+	del_timer_sync(&node->timer);
+	tipc_node_put(node);
 }
 
 void tipc_node_stop(struct net *net)
@@ -405,11 +408,8 @@ void tipc_node_stop(struct net *net)
 	struct tipc_node *node, *t_node;
 
 	spin_lock_bh(&tn->node_list_lock);
-	list_for_each_entry_safe(node, t_node, &tn->node_list, list) {
-		if (del_timer(&node->timer))
-			tipc_node_put(node);
-		tipc_node_put(node);
-	}
+	list_for_each_entry_safe(node, t_node, &tn->node_list, list)
+		tipc_node_delete(node);
 	spin_unlock_bh(&tn->node_list_lock);
 }
 
@@ -530,9 +530,7 @@ static void tipc_node_timeout(unsigned long data)
 		if (rc & TIPC_LINK_DOWN_EVT)
 			tipc_node_link_down(n, bearer_id, false);
 	}
-	if (!mod_timer(&n->timer, jiffies + n->keepalive_intv))
-		tipc_node_get(n);
-	tipc_node_put(n);
+	mod_timer(&n->timer, jiffies + n->keepalive_intv);
 }
 
 /**
--
1.9.1
