Re: [PATCHv2 bpf 1/2] bpf: Support sk lookup in netns with id 0

2018-11-30 Thread Nicolas Dichtel
Le 30/11/2018 à 01:29, Joe Stringer a écrit :
> David Ahern and Nicolas Dichtel report that the handling of the netns id
> 0 is incorrect for the BPF socket lookup helpers: rather than finding
> the netns with id 0, it is resolving to the current netns. This renders
> the netns_id 0 inaccessible.
> 
> To fix this, adjust the API for the netns to treat all negative s32
> values as a lookup in the current netns, while any values with a
> positive value in the signed 32-bit integer space would result in a
> lookup for a socket in the netns corresponding to that id. As before, if
> the netns with that ID does not exist, no socket will be found.
> Furthermore, if any bits are set in the upper 32-bits, then no socket
> will be found.
> 
> Signed-off-by: Joe Stringer 
Acked-by: Nicolas Dichtel 


[PATCH net] tun: forbid iface creation with rtnl ops

2018-11-29 Thread Nicolas Dichtel
It's not supported right now (the goal of the initial patch was to support
'ip link del' only).

Before the patch:
$ ip link add foo type tun
[  239.632660] BUG: unable to handle kernel NULL pointer dereference at 

[snip]
[  239.636410] RIP: 0010:register_netdevice+0x8e/0x3a0

This panic occurs because dev->netdev_ops is not set by tun_setup(). But to
have something usable, it will require more than just setting
netdev_ops.

Fixes: f019a7a594d9 ("tun: Implement ip link del tunXXX")
CC: Eric W. Biederman 
Signed-off-by: Nicolas Dichtel 
---
 drivers/net/tun.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e244f5d7512a..cf349e65a66b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2293,9 +2293,9 @@ static void tun_setup(struct net_device *dev)
 static int tun_validate(struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack)
 {
-   if (!data)
-   return 0;
-   return -EINVAL;
+   NL_SET_ERR_MSG(extack,
+  "tun/tap creation via rtnetlink is not supported.");
+   return -EOPNOTSUPP;
 }
 
 static size_t tun_get_size(const struct net_device *dev)
-- 
2.18.0



Re: [PATCH net-next] tun: implement carrier change

2018-11-29 Thread Nicolas Dichtel
Le 28/11/2018 à 22:48, Andrew Lunn a écrit :
> On Wed, Nov 28, 2018 at 07:12:56PM +0100, Nicolas Dichtel wrote:
>> The userspace may need to control the carrier state.
> 
> Hi Nicolas
Hi Andrew,

> 
> Could you explain your user case a bit more.
> 
> Are you running a routing daemon on top of the interface, and want it
> to reroute when the carrier goes down?
Sort of, we have a daemon that monitors the app and may re-route the traffic to
a secondary app if needed.


Regards,
Nicolas


[PATCH net-next] tun: implement carrier change

2018-11-28 Thread Nicolas Dichtel
The userspace may need to control the carrier state.

Signed-off-by: Nicolas Dichtel 
Signed-off-by: Didier Pallard 
---
 drivers/net/tun.c   | 27 ++-
 include/uapi/linux/if_tun.h |  1 +
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 56575f88d1fd..835c73f42ae7 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1254,6 +1254,21 @@ static int tun_xdp(struct net_device *dev, struct 
netdev_bpf *xdp)
}
 }
 
+static int tun_net_change_carrier(struct net_device *dev, bool new_carrier)
+{
+   if (new_carrier) {
+   struct tun_struct *tun = netdev_priv(dev);
+
+   if (!tun->numqueues)
+   return -EPERM;
+
+   netif_carrier_on(dev);
+   } else {
+   netif_carrier_off(dev);
+   }
+   return 0;
+}
+
 static const struct net_device_ops tun_netdev_ops = {
.ndo_uninit = tun_net_uninit,
.ndo_open   = tun_net_open,
@@ -1263,6 +1278,7 @@ static const struct net_device_ops tun_netdev_ops = {
.ndo_select_queue   = tun_select_queue,
.ndo_set_rx_headroom= tun_set_headroom,
.ndo_get_stats64= tun_net_get_stats64,
+   .ndo_change_carrier = tun_net_change_carrier,
 };
 
 static void __tun_xdp_flush_tfile(struct tun_file *tfile)
@@ -1345,6 +1361,7 @@ static const struct net_device_ops tap_netdev_ops = {
.ndo_get_stats64= tun_net_get_stats64,
.ndo_bpf= tun_xdp,
.ndo_xdp_xmit   = tun_xdp_xmit,
+   .ndo_change_carrier = tun_net_change_carrier,
 };
 
 static void tun_flow_init(struct tun_struct *tun)
@@ -3002,12 +3019,12 @@ static long __tun_chr_ioctl(struct file *file, unsigned 
int cmd,
struct net *net = sock_net(>sk);
struct tun_struct *tun;
void __user* argp = (void __user*)arg;
+   unsigned int ifindex, carrier;
struct ifreq ifr;
kuid_t owner;
kgid_t group;
int sndbuf;
int vnet_hdr_sz;
-   unsigned int ifindex;
int le;
int ret;
bool do_notify = false;
@@ -3291,6 +3308,14 @@ static long __tun_chr_ioctl(struct file *file, unsigned 
int cmd,
ret = tun_set_ebpf(tun, >filter_prog, argp);
break;
 
+   case TUNSETCARRIER:
+   ret = -EFAULT;
+   if (copy_from_user(, argp, sizeof(carrier)))
+   goto unlock;
+
+   ret = tun_net_change_carrier(tun->dev, (bool)carrier);
+   break;
+
default:
ret = -EINVAL;
break;
diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index ee432cd3018c..23a6753b37df 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -59,6 +59,7 @@
 #define TUNGETVNETBE _IOR('T', 223, int)
 #define TUNSETSTEERINGEBPF _IOR('T', 224, int)
 #define TUNSETFILTEREBPF _IOR('T', 225, int)
+#define TUNSETCARRIER _IOW('T', 226, int)
 
 /* TUNSETIFF ifr flags */
 #define IFF_TUN0x0001
-- 
2.18.0



Re: [PATCH bpf] bpf: Support sk lookup in netns with id 0

2018-11-27 Thread Nicolas Dichtel
Le 26/11/2018 à 23:08, David Ahern a écrit :
> On 11/26/18 2:27 PM, Joe Stringer wrote:
>> @@ -2405,6 +2407,9 @@ enum bpf_func_id {
>>  /* BPF_FUNC_perf_event_output for sk_buff input context. */
>>  #define BPF_F_CTXLEN_MASK   (0xfULL << 32)
>>  
>> +/* BPF_FUNC_sk_lookup_tcp and BPF_FUNC_sk_lookup_udp flags. */
>> +#define BPF_F_SK_CURRENT_NS 0x8000 /* For netns field */
>> +
> 
> I went down the nsid road because it will be needed for other use cases
> (e.g., device lookups), and we should have a general API for network
> namespaces. Given that, I think the _SK should be dropped from the name.
> 
Would it not be possible to have a s32 instead of an u32 for the coming APIs?
It would be better to match the current netlink and kernel APIs.


Regards,
Nicolas


[PATCH net-next v4 2/5] netns: introduce 'struct net_fill_args'

2018-11-26 Thread Nicolas Dichtel
This is a preparatory work. To avoid having to much arguments for the
function rtnl_net_fill(), a new structure is defined.

Signed-off-by: Nicolas Dichtel 
Reviewed-by: David Ahern 
---
 net/core/net_namespace.c | 48 
 1 file changed, 34 insertions(+), 14 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 52b9620e3457..f8a5966b086c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -738,20 +738,28 @@ static int rtnl_net_get_size(void)
   ;
 }
 
-static int rtnl_net_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
-int cmd, int nsid)
+struct net_fill_args {
+   u32 portid;
+   u32 seq;
+   int flags;
+   int cmd;
+   int nsid;
+};
+
+static int rtnl_net_fill(struct sk_buff *skb, struct net_fill_args *args)
 {
struct nlmsghdr *nlh;
struct rtgenmsg *rth;
 
-   nlh = nlmsg_put(skb, portid, seq, cmd, sizeof(*rth), flags);
+   nlh = nlmsg_put(skb, args->portid, args->seq, args->cmd, sizeof(*rth),
+   args->flags);
if (!nlh)
return -EMSGSIZE;
 
rth = nlmsg_data(nlh);
rth->rtgen_family = AF_UNSPEC;
 
-   if (nla_put_s32(skb, NETNSA_NSID, nsid))
+   if (nla_put_s32(skb, NETNSA_NSID, args->nsid))
goto nla_put_failure;
 
nlmsg_end(skb, nlh);
@@ -767,10 +775,15 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 {
struct net *net = sock_net(skb->sk);
struct nlattr *tb[NETNSA_MAX + 1];
+   struct net_fill_args fillargs = {
+   .portid = NETLINK_CB(skb).portid,
+   .seq = nlh->nlmsg_seq,
+   .cmd = RTM_NEWNSID,
+   };
struct nlattr *nla;
struct sk_buff *msg;
struct net *peer;
-   int err, id;
+   int err;
 
err = nlmsg_parse(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
  rtnl_net_policy, extack);
@@ -799,9 +812,8 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
goto out;
}
 
-   id = peernet2id(net, peer);
-   err = rtnl_net_fill(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
-   RTM_NEWNSID, id);
+   fillargs.nsid = peernet2id(net, peer);
+   err = rtnl_net_fill(msg, );
if (err < 0)
goto err_out;
 
@@ -817,7 +829,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 
 struct rtnl_net_dump_cb {
struct sk_buff *skb;
-   struct netlink_callback *cb;
+   struct net_fill_args fillargs;
int idx;
int s_idx;
 };
@@ -830,9 +842,8 @@ static int rtnl_net_dumpid_one(int id, void *peer, void 
*data)
if (net_cb->idx < net_cb->s_idx)
goto cont;
 
-   ret = rtnl_net_fill(net_cb->skb, NETLINK_CB(net_cb->cb->skb).portid,
-   net_cb->cb->nlh->nlmsg_seq, NLM_F_MULTI,
-   RTM_NEWNSID, id);
+   net_cb->fillargs.nsid = id;
+   ret = rtnl_net_fill(net_cb->skb, _cb->fillargs);
if (ret < 0)
return ret;
 
@@ -846,7 +857,12 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct 
netlink_callback *cb)
struct net *net = sock_net(skb->sk);
struct rtnl_net_dump_cb net_cb = {
.skb = skb,
-   .cb = cb,
+   .fillargs = {
+   .portid = NETLINK_CB(cb->skb).portid,
+   .seq = cb->nlh->nlmsg_seq,
+   .flags = NLM_F_MULTI,
+   .cmd = RTM_NEWNSID,
+   },
.idx = 0,
.s_idx = cb->args[0],
};
@@ -867,6 +883,10 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct 
netlink_callback *cb)
 
 static void rtnl_net_notifyid(struct net *net, int cmd, int id)
 {
+   struct net_fill_args fillargs = {
+   .cmd = cmd,
+   .nsid = id,
+   };
struct sk_buff *msg;
int err = -ENOMEM;
 
@@ -874,7 +894,7 @@ static void rtnl_net_notifyid(struct net *net, int cmd, int 
id)
if (!msg)
goto out;
 
-   err = rtnl_net_fill(msg, 0, 0, 0, cmd, id);
+   err = rtnl_net_fill(msg, );
if (err < 0)
goto err_out;
 
-- 
2.18.0



[PATCH net-next v4 0/5] Ease to interpret net-nsid

2018-11-26 Thread Nicolas Dichtel


The goal of this series is to ease the interpretation of nsid received in
netlink messages from other netns (when the user uses
NETLINK_F_LISTEN_ALL_NSID).

After this series, with a patched iproute2:

$ ip netns add foo
$ ip netns add bar
$ touch /var/run/netns/init_net
$ mount --bind /proc/1/ns/net /var/run/netns/init_net
$ ip netns set init_net 11
$ ip netns set foo 12
$ ip netns set bar 13
$ ip netns
init_net (id: 11)
bar (id: 13)
foo (id: 12)
$ ip -n foo netns set init_net 21
$ ip -n foo netns set foo 22
$ ip -n foo netns set bar 23
$ ip -n foo netns
init_net (id: 21)
bar (id: 23)
foo (id: 22)
$ ip -n bar netns set init_net 31
$ ip -n bar netns set foo 32
$ ip -n bar netns set bar 33
$ ip -n bar netns
init_net (id: 31)
bar (id: 33)
foo (id: 32)
$ ip netns list-id target-nsid 12
nsid 21 current-nsid 11 (iproute2 netns name: init_net)
nsid 22 current-nsid 12 (iproute2 netns name: foo)
nsid 23 current-nsid 13 (iproute2 netns name: bar)
$ ip -n bar netns list-id target-nsid 32 nsid 31
nsid 21 current-nsid 31 (iproute2 netns name: init_net)

v3 -> v4:
  - patch 5/5: fix imbalance lock in error path

v2 -> v3:
  - patch 5/5: account NETNSA_CURRENT_NSID in rtnl_net_get_size()

v1 -> v2:
  - patch 1/5: remove net from struct rtnl_net_dump_cb
  - patch 2/5: new in this version
  - patch 3/5: use a bool to know if rtnl_get_net_ns_capable() was called
  - patch 5/5: use struct net_fill_args

 include/uapi/linux/net_namespace.h |   2 +
 net/core/net_namespace.c   | 159 +++--
 2 files changed, 135 insertions(+), 26 deletions(-)

Comments are welcomed,
Regards,
Nicolas



[PATCH net-next v4 4/5] netns: enable to specify a nsid for a get request

2018-11-26 Thread Nicolas Dichtel
Combined with NETNSA_TARGET_NSID, it enables to "translate" a nsid from one
netns to a nsid of another netns.
This is useful when using NETLINK_F_LISTEN_ALL_NSID because it helps the
user to interpret a nsid received from an other netns.

Signed-off-by: Nicolas Dichtel 
Reviewed-by: David Ahern 
---
 net/core/net_namespace.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 885c54197e31..dd25fb22ad45 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -797,6 +797,11 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
} else if (tb[NETNSA_FD]) {
peer = get_net_ns_by_fd(nla_get_u32(tb[NETNSA_FD]));
nla = tb[NETNSA_FD];
+   } else if (tb[NETNSA_NSID]) {
+   peer = get_net_ns_by_id(net, nla_get_u32(tb[NETNSA_NSID]));
+   if (!peer)
+   peer = ERR_PTR(-ENOENT);
+   nla = tb[NETNSA_NSID];
} else {
NL_SET_ERR_MSG(extack, "Peer netns reference is missing");
return -EINVAL;
-- 
2.18.0



[PATCH net-next v4 1/5] netns: remove net arg from rtnl_net_fill()

2018-11-26 Thread Nicolas Dichtel
This argument is not used anymore.

Fixes: cab3c8ec8d57 ("netns: always provide the id to rtnl_net_fill()")
Signed-off-by: Nicolas Dichtel 
Reviewed-by: David Ahern 
---
 net/core/net_namespace.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index fefe72774aeb..52b9620e3457 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -739,7 +739,7 @@ static int rtnl_net_get_size(void)
 }
 
 static int rtnl_net_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
-int cmd, struct net *net, int nsid)
+int cmd, int nsid)
 {
struct nlmsghdr *nlh;
struct rtgenmsg *rth;
@@ -801,7 +801,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 
id = peernet2id(net, peer);
err = rtnl_net_fill(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
-   RTM_NEWNSID, net, id);
+   RTM_NEWNSID, id);
if (err < 0)
goto err_out;
 
@@ -816,7 +816,6 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 }
 
 struct rtnl_net_dump_cb {
-   struct net *net;
struct sk_buff *skb;
struct netlink_callback *cb;
int idx;
@@ -833,7 +832,7 @@ static int rtnl_net_dumpid_one(int id, void *peer, void 
*data)
 
ret = rtnl_net_fill(net_cb->skb, NETLINK_CB(net_cb->cb->skb).portid,
net_cb->cb->nlh->nlmsg_seq, NLM_F_MULTI,
-   RTM_NEWNSID, net_cb->net, id);
+   RTM_NEWNSID, id);
if (ret < 0)
return ret;
 
@@ -846,7 +845,6 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct 
netlink_callback *cb)
 {
struct net *net = sock_net(skb->sk);
struct rtnl_net_dump_cb net_cb = {
-   .net = net,
.skb = skb,
.cb = cb,
.idx = 0,
@@ -876,7 +874,7 @@ static void rtnl_net_notifyid(struct net *net, int cmd, int 
id)
if (!msg)
goto out;
 
-   err = rtnl_net_fill(msg, 0, 0, 0, cmd, net, id);
+   err = rtnl_net_fill(msg, 0, 0, 0, cmd, id);
if (err < 0)
goto err_out;
 
-- 
2.18.0



[PATCH net-next v4 3/5] netns: add support of NETNSA_TARGET_NSID

2018-11-26 Thread Nicolas Dichtel
Like it was done for link and address, add the ability to perform get/dump
in another netns by specifying a target nsid attribute.

Signed-off-by: Nicolas Dichtel 
Reviewed-by: David Ahern 
---
 include/uapi/linux/net_namespace.h |  1 +
 net/core/net_namespace.c   | 86 ++
 2 files changed, 76 insertions(+), 11 deletions(-)

diff --git a/include/uapi/linux/net_namespace.h 
b/include/uapi/linux/net_namespace.h
index 0187c74d8889..0ed9dd61d32a 100644
--- a/include/uapi/linux/net_namespace.h
+++ b/include/uapi/linux/net_namespace.h
@@ -16,6 +16,7 @@ enum {
NETNSA_NSID,
NETNSA_PID,
NETNSA_FD,
+   NETNSA_TARGET_NSID,
__NETNSA_MAX,
 };
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index f8a5966b086c..885c54197e31 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -669,6 +669,7 @@ static const struct nla_policy rtnl_net_policy[NETNSA_MAX + 
1] = {
[NETNSA_NSID]   = { .type = NLA_S32 },
[NETNSA_PID]= { .type = NLA_U32 },
[NETNSA_FD] = { .type = NLA_U32 },
+   [NETNSA_TARGET_NSID]= { .type = NLA_S32 },
 };
 
 static int rtnl_net_newid(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -780,9 +781,10 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
.seq = nlh->nlmsg_seq,
.cmd = RTM_NEWNSID,
};
+   struct net *peer, *target = net;
+   bool put_target = false;
struct nlattr *nla;
struct sk_buff *msg;
-   struct net *peer;
int err;
 
err = nlmsg_parse(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
@@ -806,13 +808,27 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
return PTR_ERR(peer);
}
 
+   if (tb[NETNSA_TARGET_NSID]) {
+   int id = nla_get_s32(tb[NETNSA_TARGET_NSID]);
+
+   target = rtnl_get_net_ns_capable(NETLINK_CB(skb).sk, id);
+   if (IS_ERR(target)) {
+   NL_SET_BAD_ATTR(extack, tb[NETNSA_TARGET_NSID]);
+   NL_SET_ERR_MSG(extack,
+  "Target netns reference is invalid");
+   err = PTR_ERR(target);
+   goto out;
+   }
+   put_target = true;
+   }
+
msg = nlmsg_new(rtnl_net_get_size(), GFP_KERNEL);
if (!msg) {
err = -ENOMEM;
goto out;
}
 
-   fillargs.nsid = peernet2id(net, peer);
+   fillargs.nsid = peernet2id(target, peer);
err = rtnl_net_fill(msg, );
if (err < 0)
goto err_out;
@@ -823,15 +839,19 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 err_out:
nlmsg_free(msg);
 out:
+   if (put_target)
+   put_net(target);
put_net(peer);
return err;
 }
 
 struct rtnl_net_dump_cb {
+   struct net *tgt_net;
struct sk_buff *skb;
struct net_fill_args fillargs;
int idx;
int s_idx;
+   bool put_tgt_net;
 };
 
 static int rtnl_net_dumpid_one(int id, void *peer, void *data)
@@ -852,10 +872,50 @@ static int rtnl_net_dumpid_one(int id, void *peer, void 
*data)
return 0;
 }
 
+static int rtnl_valid_dump_net_req(const struct nlmsghdr *nlh, struct sock *sk,
+  struct rtnl_net_dump_cb *net_cb,
+  struct netlink_callback *cb)
+{
+   struct netlink_ext_ack *extack = cb->extack;
+   struct nlattr *tb[NETNSA_MAX + 1];
+   int err, i;
+
+   err = nlmsg_parse_strict(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
+rtnl_net_policy, extack);
+   if (err < 0)
+   return err;
+
+   for (i = 0; i <= NETNSA_MAX; i++) {
+   if (!tb[i])
+   continue;
+
+   if (i == NETNSA_TARGET_NSID) {
+   struct net *net;
+
+   net = rtnl_get_net_ns_capable(sk, nla_get_s32(tb[i]));
+   if (IS_ERR(net)) {
+   NL_SET_BAD_ATTR(extack, tb[i]);
+   NL_SET_ERR_MSG(extack,
+  "Invalid target network 
namespace id");
+   return PTR_ERR(net);
+   }
+   net_cb->tgt_net = net;
+   net_cb->put_tgt_net = true;
+   } else {
+   NL_SET_BAD_ATTR(extack, tb[i]);
+   NL_SET_ERR_MSG(extack,
+  "Unsupported attribute in dump request");
+   return -EINVAL;
+   }
+   }
+
+   return 0;
+}
+
 static int rtnl_net_dumpid(struct sk_buff *skb, struct netlink_callback *cb)
 {
-   

[PATCH net-next v4 5/5] netns: enable to dump full nsid translation table

2018-11-26 Thread Nicolas Dichtel
Like the previous patch, the goal is to ease to convert nsids from one
netns to another netns.
A new attribute (NETNSA_CURRENT_NSID) is added to the kernel answer when
NETNSA_TARGET_NSID is provided, thus the user can easily convert nsids.

Signed-off-by: Nicolas Dichtel 
Reviewed-by: David Ahern 
---
 include/uapi/linux/net_namespace.h |  1 +
 net/core/net_namespace.c   | 32 --
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/net_namespace.h 
b/include/uapi/linux/net_namespace.h
index 0ed9dd61d32a..9f9956809565 100644
--- a/include/uapi/linux/net_namespace.h
+++ b/include/uapi/linux/net_namespace.h
@@ -17,6 +17,7 @@ enum {
NETNSA_PID,
NETNSA_FD,
NETNSA_TARGET_NSID,
+   NETNSA_CURRENT_NSID,
__NETNSA_MAX,
 };
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index dd25fb22ad45..05b23b285058 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -736,6 +736,7 @@ static int rtnl_net_get_size(void)
 {
return NLMSG_ALIGN(sizeof(struct rtgenmsg))
   + nla_total_size(sizeof(s32)) /* NETNSA_NSID */
+  + nla_total_size(sizeof(s32)) /* NETNSA_CURRENT_NSID */
   ;
 }
 
@@ -745,6 +746,8 @@ struct net_fill_args {
int flags;
int cmd;
int nsid;
+   bool add_ref;
+   int ref_nsid;
 };
 
 static int rtnl_net_fill(struct sk_buff *skb, struct net_fill_args *args)
@@ -763,6 +766,10 @@ static int rtnl_net_fill(struct sk_buff *skb, struct 
net_fill_args *args)
if (nla_put_s32(skb, NETNSA_NSID, args->nsid))
goto nla_put_failure;
 
+   if (args->add_ref &&
+   nla_put_s32(skb, NETNSA_CURRENT_NSID, args->ref_nsid))
+   goto nla_put_failure;
+
nlmsg_end(skb, nlh);
return 0;
 
@@ -782,7 +789,6 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
.cmd = RTM_NEWNSID,
};
struct net *peer, *target = net;
-   bool put_target = false;
struct nlattr *nla;
struct sk_buff *msg;
int err;
@@ -824,7 +830,8 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
err = PTR_ERR(target);
goto out;
}
-   put_target = true;
+   fillargs.add_ref = true;
+   fillargs.ref_nsid = peernet2id(net, peer);
}
 
msg = nlmsg_new(rtnl_net_get_size(), GFP_KERNEL);
@@ -844,7 +851,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 err_out:
nlmsg_free(msg);
 out:
-   if (put_target)
+   if (fillargs.add_ref)
put_net(target);
put_net(peer);
return err;
@@ -852,11 +859,11 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 
 struct rtnl_net_dump_cb {
struct net *tgt_net;
+   struct net *ref_net;
struct sk_buff *skb;
struct net_fill_args fillargs;
int idx;
int s_idx;
-   bool put_tgt_net;
 };
 
 static int rtnl_net_dumpid_one(int id, void *peer, void *data)
@@ -868,6 +875,8 @@ static int rtnl_net_dumpid_one(int id, void *peer, void 
*data)
goto cont;
 
net_cb->fillargs.nsid = id;
+   if (net_cb->fillargs.add_ref)
+   net_cb->fillargs.ref_nsid = __peernet2id(net_cb->ref_net, peer);
ret = rtnl_net_fill(net_cb->skb, _cb->fillargs);
if (ret < 0)
return ret;
@@ -904,8 +913,9 @@ static int rtnl_valid_dump_net_req(const struct nlmsghdr 
*nlh, struct sock *sk,
   "Invalid target network 
namespace id");
return PTR_ERR(net);
}
+   net_cb->fillargs.add_ref = true;
+   net_cb->ref_net = net_cb->tgt_net;
net_cb->tgt_net = net;
-   net_cb->put_tgt_net = true;
} else {
NL_SET_BAD_ATTR(extack, tb[i]);
NL_SET_ERR_MSG(extack,
@@ -940,12 +950,22 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct 
netlink_callback *cb)
}
 
spin_lock_bh(_cb.tgt_net->nsid_lock);
+   if (net_cb.fillargs.add_ref &&
+   !net_eq(net_cb.ref_net, net_cb.tgt_net) &&
+   !spin_trylock_bh(_cb.ref_net->nsid_lock)) {
+   spin_unlock_bh(_cb.tgt_net->nsid_lock);
+   err = -EAGAIN;
+   goto end;
+   }
idr_for_each(_cb.tgt_net->netns_ids, rtnl_net_dumpid_one, _cb);
+   if (net_cb.fillargs.add_ref &&
+   !net_eq(net_cb.ref_net, net_cb.tgt_net))
+   spin_unlock_bh(_cb.ref_net->nsid_lock);
spin_unlock_bh(_cb.tgt_net->nsid_lock);
 
cb->args[0] = ne

[PATCH net-next v3 4/5] netns: enable to specify a nsid for a get request

2018-11-22 Thread Nicolas Dichtel
Combined with NETNSA_TARGET_NSID, it enables to "translate" a nsid from one
netns to a nsid of another netns.
This is useful when using NETLINK_F_LISTEN_ALL_NSID because it helps the
user to interpret a nsid received from an other netns.

Signed-off-by: Nicolas Dichtel 
Reviewed-by: David Ahern 
---
 net/core/net_namespace.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 885c54197e31..dd25fb22ad45 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -797,6 +797,11 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
} else if (tb[NETNSA_FD]) {
peer = get_net_ns_by_fd(nla_get_u32(tb[NETNSA_FD]));
nla = tb[NETNSA_FD];
+   } else if (tb[NETNSA_NSID]) {
+   peer = get_net_ns_by_id(net, nla_get_u32(tb[NETNSA_NSID]));
+   if (!peer)
+   peer = ERR_PTR(-ENOENT);
+   nla = tb[NETNSA_NSID];
} else {
NL_SET_ERR_MSG(extack, "Peer netns reference is missing");
return -EINVAL;
-- 
2.18.0



[PATCH net-next v3 1/5] netns: remove net arg from rtnl_net_fill()

2018-11-22 Thread Nicolas Dichtel
This argument is not used anymore.

Fixes: cab3c8ec8d57 ("netns: always provide the id to rtnl_net_fill()")
Signed-off-by: Nicolas Dichtel 
Reviewed-by: David Ahern 
---
 net/core/net_namespace.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index fefe72774aeb..52b9620e3457 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -739,7 +739,7 @@ static int rtnl_net_get_size(void)
 }
 
 static int rtnl_net_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
-int cmd, struct net *net, int nsid)
+int cmd, int nsid)
 {
struct nlmsghdr *nlh;
struct rtgenmsg *rth;
@@ -801,7 +801,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 
id = peernet2id(net, peer);
err = rtnl_net_fill(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
-   RTM_NEWNSID, net, id);
+   RTM_NEWNSID, id);
if (err < 0)
goto err_out;
 
@@ -816,7 +816,6 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 }
 
 struct rtnl_net_dump_cb {
-   struct net *net;
struct sk_buff *skb;
struct netlink_callback *cb;
int idx;
@@ -833,7 +832,7 @@ static int rtnl_net_dumpid_one(int id, void *peer, void 
*data)
 
ret = rtnl_net_fill(net_cb->skb, NETLINK_CB(net_cb->cb->skb).portid,
net_cb->cb->nlh->nlmsg_seq, NLM_F_MULTI,
-   RTM_NEWNSID, net_cb->net, id);
+   RTM_NEWNSID, id);
if (ret < 0)
return ret;
 
@@ -846,7 +845,6 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct 
netlink_callback *cb)
 {
struct net *net = sock_net(skb->sk);
struct rtnl_net_dump_cb net_cb = {
-   .net = net,
.skb = skb,
.cb = cb,
.idx = 0,
@@ -876,7 +874,7 @@ static void rtnl_net_notifyid(struct net *net, int cmd, int 
id)
if (!msg)
goto out;
 
-   err = rtnl_net_fill(msg, 0, 0, 0, cmd, net, id);
+   err = rtnl_net_fill(msg, 0, 0, 0, cmd, id);
if (err < 0)
goto err_out;
 
-- 
2.18.0



[PATCH net-next v3 2/5] netns: introduce 'struct net_fill_args'

2018-11-22 Thread Nicolas Dichtel
This is a preparatory work. To avoid having to much arguments for the
function rtnl_net_fill(), a new structure is defined.

Signed-off-by: Nicolas Dichtel 
Reviewed-by: David Ahern 
---
 net/core/net_namespace.c | 48 
 1 file changed, 34 insertions(+), 14 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 52b9620e3457..f8a5966b086c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -738,20 +738,28 @@ static int rtnl_net_get_size(void)
   ;
 }
 
-static int rtnl_net_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
-int cmd, int nsid)
+struct net_fill_args {
+   u32 portid;
+   u32 seq;
+   int flags;
+   int cmd;
+   int nsid;
+};
+
+static int rtnl_net_fill(struct sk_buff *skb, struct net_fill_args *args)
 {
struct nlmsghdr *nlh;
struct rtgenmsg *rth;
 
-   nlh = nlmsg_put(skb, portid, seq, cmd, sizeof(*rth), flags);
+   nlh = nlmsg_put(skb, args->portid, args->seq, args->cmd, sizeof(*rth),
+   args->flags);
if (!nlh)
return -EMSGSIZE;
 
rth = nlmsg_data(nlh);
rth->rtgen_family = AF_UNSPEC;
 
-   if (nla_put_s32(skb, NETNSA_NSID, nsid))
+   if (nla_put_s32(skb, NETNSA_NSID, args->nsid))
goto nla_put_failure;
 
nlmsg_end(skb, nlh);
@@ -767,10 +775,15 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 {
struct net *net = sock_net(skb->sk);
struct nlattr *tb[NETNSA_MAX + 1];
+   struct net_fill_args fillargs = {
+   .portid = NETLINK_CB(skb).portid,
+   .seq = nlh->nlmsg_seq,
+   .cmd = RTM_NEWNSID,
+   };
struct nlattr *nla;
struct sk_buff *msg;
struct net *peer;
-   int err, id;
+   int err;
 
err = nlmsg_parse(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
  rtnl_net_policy, extack);
@@ -799,9 +812,8 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
goto out;
}
 
-   id = peernet2id(net, peer);
-   err = rtnl_net_fill(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
-   RTM_NEWNSID, id);
+   fillargs.nsid = peernet2id(net, peer);
+   err = rtnl_net_fill(msg, );
if (err < 0)
goto err_out;
 
@@ -817,7 +829,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 
 struct rtnl_net_dump_cb {
struct sk_buff *skb;
-   struct netlink_callback *cb;
+   struct net_fill_args fillargs;
int idx;
int s_idx;
 };
@@ -830,9 +842,8 @@ static int rtnl_net_dumpid_one(int id, void *peer, void 
*data)
if (net_cb->idx < net_cb->s_idx)
goto cont;
 
-   ret = rtnl_net_fill(net_cb->skb, NETLINK_CB(net_cb->cb->skb).portid,
-   net_cb->cb->nlh->nlmsg_seq, NLM_F_MULTI,
-   RTM_NEWNSID, id);
+   net_cb->fillargs.nsid = id;
+   ret = rtnl_net_fill(net_cb->skb, _cb->fillargs);
if (ret < 0)
return ret;
 
@@ -846,7 +857,12 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct 
netlink_callback *cb)
struct net *net = sock_net(skb->sk);
struct rtnl_net_dump_cb net_cb = {
.skb = skb,
-   .cb = cb,
+   .fillargs = {
+   .portid = NETLINK_CB(cb->skb).portid,
+   .seq = cb->nlh->nlmsg_seq,
+   .flags = NLM_F_MULTI,
+   .cmd = RTM_NEWNSID,
+   },
.idx = 0,
.s_idx = cb->args[0],
};
@@ -867,6 +883,10 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct 
netlink_callback *cb)
 
 static void rtnl_net_notifyid(struct net *net, int cmd, int id)
 {
+   struct net_fill_args fillargs = {
+   .cmd = cmd,
+   .nsid = id,
+   };
struct sk_buff *msg;
int err = -ENOMEM;
 
@@ -874,7 +894,7 @@ static void rtnl_net_notifyid(struct net *net, int cmd, int 
id)
if (!msg)
goto out;
 
-   err = rtnl_net_fill(msg, 0, 0, 0, cmd, id);
+   err = rtnl_net_fill(msg, );
if (err < 0)
goto err_out;
 
-- 
2.18.0



[PATCH net-next v3 5/5] netns: enable to dump full nsid translation table

2018-11-22 Thread Nicolas Dichtel
Like the previous patch, the goal is to ease to convert nsids from one
netns to another netns.
A new attribute (NETNSA_CURRENT_NSID) is added to the kernel answer when
NETNSA_TARGET_NSID is provided, thus the user can easily convert nsids.

Signed-off-by: Nicolas Dichtel 
---
 include/uapi/linux/net_namespace.h |  1 +
 net/core/net_namespace.c   | 31 --
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/net_namespace.h 
b/include/uapi/linux/net_namespace.h
index 0ed9dd61d32a..9f9956809565 100644
--- a/include/uapi/linux/net_namespace.h
+++ b/include/uapi/linux/net_namespace.h
@@ -17,6 +17,7 @@ enum {
NETNSA_PID,
NETNSA_FD,
NETNSA_TARGET_NSID,
+   NETNSA_CURRENT_NSID,
__NETNSA_MAX,
 };
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index dd25fb22ad45..2f25d7f2a43b 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -736,6 +736,7 @@ static int rtnl_net_get_size(void)
 {
return NLMSG_ALIGN(sizeof(struct rtgenmsg))
   + nla_total_size(sizeof(s32)) /* NETNSA_NSID */
+  + nla_total_size(sizeof(s32)) /* NETNSA_CURRENT_NSID */
   ;
 }
 
@@ -745,6 +746,8 @@ struct net_fill_args {
int flags;
int cmd;
int nsid;
+   bool add_ref;
+   int ref_nsid;
 };
 
 static int rtnl_net_fill(struct sk_buff *skb, struct net_fill_args *args)
@@ -763,6 +766,10 @@ static int rtnl_net_fill(struct sk_buff *skb, struct 
net_fill_args *args)
if (nla_put_s32(skb, NETNSA_NSID, args->nsid))
goto nla_put_failure;
 
+   if (args->add_ref &&
+   nla_put_s32(skb, NETNSA_CURRENT_NSID, args->ref_nsid))
+   goto nla_put_failure;
+
nlmsg_end(skb, nlh);
return 0;
 
@@ -782,7 +789,6 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
.cmd = RTM_NEWNSID,
};
struct net *peer, *target = net;
-   bool put_target = false;
struct nlattr *nla;
struct sk_buff *msg;
int err;
@@ -824,7 +830,8 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
err = PTR_ERR(target);
goto out;
}
-   put_target = true;
+   fillargs.add_ref = true;
+   fillargs.ref_nsid = peernet2id(net, peer);
}
 
msg = nlmsg_new(rtnl_net_get_size(), GFP_KERNEL);
@@ -844,7 +851,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 err_out:
nlmsg_free(msg);
 out:
-   if (put_target)
+   if (fillargs.add_ref)
put_net(target);
put_net(peer);
return err;
@@ -852,11 +859,11 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 
 struct rtnl_net_dump_cb {
struct net *tgt_net;
+   struct net *ref_net;
struct sk_buff *skb;
struct net_fill_args fillargs;
int idx;
int s_idx;
-   bool put_tgt_net;
 };
 
 static int rtnl_net_dumpid_one(int id, void *peer, void *data)
@@ -868,6 +875,8 @@ static int rtnl_net_dumpid_one(int id, void *peer, void 
*data)
goto cont;
 
net_cb->fillargs.nsid = id;
+   if (net_cb->fillargs.add_ref)
+   net_cb->fillargs.ref_nsid = __peernet2id(net_cb->ref_net, peer);
ret = rtnl_net_fill(net_cb->skb, _cb->fillargs);
if (ret < 0)
return ret;
@@ -904,8 +913,9 @@ static int rtnl_valid_dump_net_req(const struct nlmsghdr 
*nlh, struct sock *sk,
   "Invalid target network 
namespace id");
return PTR_ERR(net);
}
+   net_cb->fillargs.add_ref = true;
+   net_cb->ref_net = net_cb->tgt_net;
net_cb->tgt_net = net;
-   net_cb->put_tgt_net = true;
} else {
NL_SET_BAD_ATTR(extack, tb[i]);
NL_SET_ERR_MSG(extack,
@@ -940,12 +950,21 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct 
netlink_callback *cb)
}
 
spin_lock_bh(_cb.tgt_net->nsid_lock);
+   if (net_cb.fillargs.add_ref &&
+   !net_eq(net_cb.ref_net, net_cb.tgt_net) &&
+   !spin_trylock_bh(_cb.ref_net->nsid_lock)) {
+   err = -EAGAIN;
+   goto end;
+   }
idr_for_each(_cb.tgt_net->netns_ids, rtnl_net_dumpid_one, _cb);
+   if (net_cb.fillargs.add_ref &&
+   !net_eq(net_cb.ref_net, net_cb.tgt_net))
+   spin_unlock_bh(_cb.ref_net->nsid_lock);
spin_unlock_bh(_cb.tgt_net->nsid_lock);
 
cb->args[0] = net_cb.idx;
 end:
-   if (net_cb.put_tgt_net)
+   if (net_cb.fillargs.add_ref)
put_net(net_cb.tgt_net);
return err < 0 ? err : skb->len;
 }
-- 
2.18.0



[PATCH net-next v3 0/5] Ease to interpret net-nsid

2018-11-22 Thread Nicolas Dichtel


The goal of this series is to ease the interpretation of nsid received in
netlink messages from other netns (when the user uses
NETLINK_F_LISTEN_ALL_NSID).

After this series, with a patched iproute2:

$ ip netns add foo
$ ip netns add bar
$ touch /var/run/netns/init_net
$ mount --bind /proc/1/ns/net /var/run/netns/init_net
$ ip netns set init_net 11
$ ip netns set foo 12
$ ip netns set bar 13
$ ip netns
init_net (id: 11)
bar (id: 13)
foo (id: 12)
$ ip -n foo netns set init_net 21
$ ip -n foo netns set foo 22
$ ip -n foo netns set bar 23
$ ip -n foo netns
init_net (id: 21)
bar (id: 23)
foo (id: 22)
$ ip -n bar netns set init_net 31
$ ip -n bar netns set foo 32
$ ip -n bar netns set bar 33
$ ip -n bar netns
init_net (id: 31)
bar (id: 33)
foo (id: 32)
$ ip netns list-id target-nsid 12
nsid 21 current-nsid 11 (iproute2 netns name: init_net)
nsid 22 current-nsid 12 (iproute2 netns name: foo)
nsid 23 current-nsid 13 (iproute2 netns name: bar)
$ ip -n bar netns list-id target-nsid 32 nsid 31
nsid 21 current-nsid 31 (iproute2 netns name: init_net)

v2 -> v3:
  - patch 5/5: account NETNSA_CURRENT_NSID in rtnl_net_get_size()

v1 -> v2:
  - patch 1/5: remove net from struct rtnl_net_dump_cb
  - patch 2/5: new in this version
  - patch 3/5: use a bool to know if rtnl_get_net_ns_capable() was called
  - patch 5/5: use struct net_fill_args

 include/uapi/linux/net_namespace.h |   2 +
 net/core/net_namespace.c   | 158 +++--
 2 files changed, 134 insertions(+), 26 deletions(-)

Comments are welcomed,
Regards,
Nicolas



[PATCH net-next v3 3/5] netns: add support of NETNSA_TARGET_NSID

2018-11-22 Thread Nicolas Dichtel
Like it was done for link and address, add the ability to perform get/dump
in another netns by specifying a target nsid attribute.

Signed-off-by: Nicolas Dichtel 
Reviewed-by: David Ahern 
---
 include/uapi/linux/net_namespace.h |  1 +
 net/core/net_namespace.c   | 86 ++
 2 files changed, 76 insertions(+), 11 deletions(-)

diff --git a/include/uapi/linux/net_namespace.h 
b/include/uapi/linux/net_namespace.h
index 0187c74d8889..0ed9dd61d32a 100644
--- a/include/uapi/linux/net_namespace.h
+++ b/include/uapi/linux/net_namespace.h
@@ -16,6 +16,7 @@ enum {
NETNSA_NSID,
NETNSA_PID,
NETNSA_FD,
+   NETNSA_TARGET_NSID,
__NETNSA_MAX,
 };
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index f8a5966b086c..885c54197e31 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -669,6 +669,7 @@ static const struct nla_policy rtnl_net_policy[NETNSA_MAX + 
1] = {
[NETNSA_NSID]   = { .type = NLA_S32 },
[NETNSA_PID]= { .type = NLA_U32 },
[NETNSA_FD] = { .type = NLA_U32 },
+   [NETNSA_TARGET_NSID]= { .type = NLA_S32 },
 };
 
 static int rtnl_net_newid(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -780,9 +781,10 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
.seq = nlh->nlmsg_seq,
.cmd = RTM_NEWNSID,
};
+   struct net *peer, *target = net;
+   bool put_target = false;
struct nlattr *nla;
struct sk_buff *msg;
-   struct net *peer;
int err;
 
err = nlmsg_parse(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
@@ -806,13 +808,27 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
return PTR_ERR(peer);
}
 
+   if (tb[NETNSA_TARGET_NSID]) {
+   int id = nla_get_s32(tb[NETNSA_TARGET_NSID]);
+
+   target = rtnl_get_net_ns_capable(NETLINK_CB(skb).sk, id);
+   if (IS_ERR(target)) {
+   NL_SET_BAD_ATTR(extack, tb[NETNSA_TARGET_NSID]);
+   NL_SET_ERR_MSG(extack,
+  "Target netns reference is invalid");
+   err = PTR_ERR(target);
+   goto out;
+   }
+   put_target = true;
+   }
+
msg = nlmsg_new(rtnl_net_get_size(), GFP_KERNEL);
if (!msg) {
err = -ENOMEM;
goto out;
}
 
-   fillargs.nsid = peernet2id(net, peer);
+   fillargs.nsid = peernet2id(target, peer);
err = rtnl_net_fill(msg, );
if (err < 0)
goto err_out;
@@ -823,15 +839,19 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 err_out:
nlmsg_free(msg);
 out:
+   if (put_target)
+   put_net(target);
put_net(peer);
return err;
 }
 
 struct rtnl_net_dump_cb {
+   struct net *tgt_net;
struct sk_buff *skb;
struct net_fill_args fillargs;
int idx;
int s_idx;
+   bool put_tgt_net;
 };
 
 static int rtnl_net_dumpid_one(int id, void *peer, void *data)
@@ -852,10 +872,50 @@ static int rtnl_net_dumpid_one(int id, void *peer, void 
*data)
return 0;
 }
 
+static int rtnl_valid_dump_net_req(const struct nlmsghdr *nlh, struct sock *sk,
+  struct rtnl_net_dump_cb *net_cb,
+  struct netlink_callback *cb)
+{
+   struct netlink_ext_ack *extack = cb->extack;
+   struct nlattr *tb[NETNSA_MAX + 1];
+   int err, i;
+
+   err = nlmsg_parse_strict(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
+rtnl_net_policy, extack);
+   if (err < 0)
+   return err;
+
+   for (i = 0; i <= NETNSA_MAX; i++) {
+   if (!tb[i])
+   continue;
+
+   if (i == NETNSA_TARGET_NSID) {
+   struct net *net;
+
+   net = rtnl_get_net_ns_capable(sk, nla_get_s32(tb[i]));
+   if (IS_ERR(net)) {
+   NL_SET_BAD_ATTR(extack, tb[i]);
+   NL_SET_ERR_MSG(extack,
+  "Invalid target network 
namespace id");
+   return PTR_ERR(net);
+   }
+   net_cb->tgt_net = net;
+   net_cb->put_tgt_net = true;
+   } else {
+   NL_SET_BAD_ATTR(extack, tb[i]);
+   NL_SET_ERR_MSG(extack,
+  "Unsupported attribute in dump request");
+   return -EINVAL;
+   }
+   }
+
+   return 0;
+}
+
 static int rtnl_net_dumpid(struct sk_buff *skb, struct netlink_callback *cb)
 {
-   

patchwork bug?

2018-11-22 Thread Nicolas Dichtel
Not sure if it's the right place to post that.

When I try to list patches with filters, something like this:
http://patchwork.ozlabs.org/project/netdev/list/?series==2036=*==both=34

I can see only page 1. When I click on '2', the page 1 is still displayed and
the page numerotation is removed.


Regards,
Nicolas


Re: [PATCH net-next v2 5/5] netns: enable to dump full nsid translation table

2018-11-22 Thread Nicolas Dichtel
Le 22/11/2018 à 17:40, David Ahern a écrit :
> On 11/22/18 8:50 AM, Nicolas Dichtel wrote:
>> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
>> index dd25fb22ad45..25030e0317a2 100644
>> --- a/net/core/net_namespace.c
>> +++ b/net/core/net_namespace.c
>> @@ -745,6 +745,8 @@ struct net_fill_args {
>>  int flags;
>>  int cmd;
>>  int nsid;
>> +bool add_ref;
>> +int ref_nsid;
>>  };
>>  
>>  static int rtnl_net_fill(struct sk_buff *skb, struct net_fill_args *args)
>> @@ -763,6 +765,10 @@ static int rtnl_net_fill(struct sk_buff *skb, struct 
>> net_fill_args *args)
>>  if (nla_put_s32(skb, NETNSA_NSID, args->nsid))
>>  goto nla_put_failure;
>>  
>> +if (args->add_ref &&
>> +nla_put_s32(skb, NETNSA_CURRENT_NSID, args->ref_nsid))
>> +goto nla_put_failure;
>> +
> 
> you need to add NETNSA_CURRENT_NSID to rtnl_net_get_size.
> 
Good catch.
I thought to this and I forgot at the end :/


[PATCH net-next v2 0/5] Ease to interpret net-nsid

2018-11-22 Thread Nicolas Dichtel
The goal of this series is to ease the interpretation of nsid received in
netlink messages from other netns (when the user uses
NETLINK_F_LISTEN_ALL_NSID).

After this series, with a patched iproute2:

$ ip netns add foo
$ ip netns add bar
$ touch /var/run/netns/init_net
$ mount --bind /proc/1/ns/net /var/run/netns/init_net
$ ip netns set init_net 11
$ ip netns set foo 12
$ ip netns set bar 13
$ ip netns
init_net (id: 11)
bar (id: 13)
foo (id: 12)
$ ip -n foo netns set init_net 21
$ ip -n foo netns set foo 22
$ ip -n foo netns set bar 23
$ ip -n foo netns
init_net (id: 21)
bar (id: 23)
foo (id: 22)
$ ip -n bar netns set init_net 31
$ ip -n bar netns set foo 32
$ ip -n bar netns set bar 33
$ ip -n bar netns
init_net (id: 31)
bar (id: 33)
foo (id: 32)
$ ip netns list-id target-nsid 12
nsid 21 current-nsid 11 (iproute2 netns name: init_net)
nsid 22 current-nsid 12 (iproute2 netns name: foo)
nsid 23 current-nsid 13 (iproute2 netns name: bar)
$ ip -n bar netns list-id target-nsid 32 nsid 31
nsid 21 current-nsid 31 (iproute2 netns name: init_net)

v1 -> v2:
  - patch 1/5: remove net from struct rtnl_net_dump_cb
  - patch 2/5: new in this version
  - patch 3/5: use a bool to know if rtnl_get_net_ns_capable() was called
  - patch 5/5: use struct net_fill_args

 include/uapi/linux/net_namespace.h |   2 +
 net/core/net_namespace.c   | 157 +++--
 2 files changed, 133 insertions(+), 26 deletions(-)

Comments are welcomed,
Regards,
Nicolas



[PATCH net-next v2 3/5] netns: add support of NETNSA_TARGET_NSID

2018-11-22 Thread Nicolas Dichtel
Like it was done for link and address, add the ability to perform get/dump
in another netns by specifying a target nsid attribute.

Signed-off-by: Nicolas Dichtel 
---
 include/uapi/linux/net_namespace.h |  1 +
 net/core/net_namespace.c   | 86 ++
 2 files changed, 76 insertions(+), 11 deletions(-)

diff --git a/include/uapi/linux/net_namespace.h 
b/include/uapi/linux/net_namespace.h
index 0187c74d8889..0ed9dd61d32a 100644
--- a/include/uapi/linux/net_namespace.h
+++ b/include/uapi/linux/net_namespace.h
@@ -16,6 +16,7 @@ enum {
NETNSA_NSID,
NETNSA_PID,
NETNSA_FD,
+   NETNSA_TARGET_NSID,
__NETNSA_MAX,
 };
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index f8a5966b086c..885c54197e31 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -669,6 +669,7 @@ static const struct nla_policy rtnl_net_policy[NETNSA_MAX + 
1] = {
[NETNSA_NSID]   = { .type = NLA_S32 },
[NETNSA_PID]= { .type = NLA_U32 },
[NETNSA_FD] = { .type = NLA_U32 },
+   [NETNSA_TARGET_NSID]= { .type = NLA_S32 },
 };
 
 static int rtnl_net_newid(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -780,9 +781,10 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
.seq = nlh->nlmsg_seq,
.cmd = RTM_NEWNSID,
};
+   struct net *peer, *target = net;
+   bool put_target = false;
struct nlattr *nla;
struct sk_buff *msg;
-   struct net *peer;
int err;
 
err = nlmsg_parse(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
@@ -806,13 +808,27 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
return PTR_ERR(peer);
}
 
+   if (tb[NETNSA_TARGET_NSID]) {
+   int id = nla_get_s32(tb[NETNSA_TARGET_NSID]);
+
+   target = rtnl_get_net_ns_capable(NETLINK_CB(skb).sk, id);
+   if (IS_ERR(target)) {
+   NL_SET_BAD_ATTR(extack, tb[NETNSA_TARGET_NSID]);
+   NL_SET_ERR_MSG(extack,
+  "Target netns reference is invalid");
+   err = PTR_ERR(target);
+   goto out;
+   }
+   put_target = true;
+   }
+
msg = nlmsg_new(rtnl_net_get_size(), GFP_KERNEL);
if (!msg) {
err = -ENOMEM;
goto out;
}
 
-   fillargs.nsid = peernet2id(net, peer);
+   fillargs.nsid = peernet2id(target, peer);
err = rtnl_net_fill(msg, );
if (err < 0)
goto err_out;
@@ -823,15 +839,19 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 err_out:
nlmsg_free(msg);
 out:
+   if (put_target)
+   put_net(target);
put_net(peer);
return err;
 }
 
 struct rtnl_net_dump_cb {
+   struct net *tgt_net;
struct sk_buff *skb;
struct net_fill_args fillargs;
int idx;
int s_idx;
+   bool put_tgt_net;
 };
 
 static int rtnl_net_dumpid_one(int id, void *peer, void *data)
@@ -852,10 +872,50 @@ static int rtnl_net_dumpid_one(int id, void *peer, void 
*data)
return 0;
 }
 
+static int rtnl_valid_dump_net_req(const struct nlmsghdr *nlh, struct sock *sk,
+  struct rtnl_net_dump_cb *net_cb,
+  struct netlink_callback *cb)
+{
+   struct netlink_ext_ack *extack = cb->extack;
+   struct nlattr *tb[NETNSA_MAX + 1];
+   int err, i;
+
+   err = nlmsg_parse_strict(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
+rtnl_net_policy, extack);
+   if (err < 0)
+   return err;
+
+   for (i = 0; i <= NETNSA_MAX; i++) {
+   if (!tb[i])
+   continue;
+
+   if (i == NETNSA_TARGET_NSID) {
+   struct net *net;
+
+   net = rtnl_get_net_ns_capable(sk, nla_get_s32(tb[i]));
+   if (IS_ERR(net)) {
+   NL_SET_BAD_ATTR(extack, tb[i]);
+   NL_SET_ERR_MSG(extack,
+  "Invalid target network 
namespace id");
+   return PTR_ERR(net);
+   }
+   net_cb->tgt_net = net;
+   net_cb->put_tgt_net = true;
+   } else {
+   NL_SET_BAD_ATTR(extack, tb[i]);
+   NL_SET_ERR_MSG(extack,
+  "Unsupported attribute in dump request");
+   return -EINVAL;
+   }
+   }
+
+   return 0;
+}
+
 static int rtnl_net_dumpid(struct sk_buff *skb, struct netlink_callback *cb)
 {
-   struct net *net = sock_n

[PATCH net-next v2 1/5] netns: remove net arg from rtnl_net_fill()

2018-11-22 Thread Nicolas Dichtel
This argument is not used anymore.

Fixes: cab3c8ec8d57 ("netns: always provide the id to rtnl_net_fill()")
Signed-off-by: Nicolas Dichtel 
---
 net/core/net_namespace.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index fefe72774aeb..52b9620e3457 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -739,7 +739,7 @@ static int rtnl_net_get_size(void)
 }
 
 static int rtnl_net_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
-int cmd, struct net *net, int nsid)
+int cmd, int nsid)
 {
struct nlmsghdr *nlh;
struct rtgenmsg *rth;
@@ -801,7 +801,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 
id = peernet2id(net, peer);
err = rtnl_net_fill(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
-   RTM_NEWNSID, net, id);
+   RTM_NEWNSID, id);
if (err < 0)
goto err_out;
 
@@ -816,7 +816,6 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 }
 
 struct rtnl_net_dump_cb {
-   struct net *net;
struct sk_buff *skb;
struct netlink_callback *cb;
int idx;
@@ -833,7 +832,7 @@ static int rtnl_net_dumpid_one(int id, void *peer, void 
*data)
 
ret = rtnl_net_fill(net_cb->skb, NETLINK_CB(net_cb->cb->skb).portid,
net_cb->cb->nlh->nlmsg_seq, NLM_F_MULTI,
-   RTM_NEWNSID, net_cb->net, id);
+   RTM_NEWNSID, id);
if (ret < 0)
return ret;
 
@@ -846,7 +845,6 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct 
netlink_callback *cb)
 {
struct net *net = sock_net(skb->sk);
struct rtnl_net_dump_cb net_cb = {
-   .net = net,
.skb = skb,
.cb = cb,
.idx = 0,
@@ -876,7 +874,7 @@ static void rtnl_net_notifyid(struct net *net, int cmd, int 
id)
if (!msg)
goto out;
 
-   err = rtnl_net_fill(msg, 0, 0, 0, cmd, net, id);
+   err = rtnl_net_fill(msg, 0, 0, 0, cmd, id);
if (err < 0)
goto err_out;
 
-- 
2.18.0



[PATCH net-next v2 2/5] netns: introduce 'struct net_fill_args'

2018-11-22 Thread Nicolas Dichtel
This is a preparatory work. To avoid having to much arguments for the
function rtnl_net_fill(), a new structure is defined.

Signed-off-by: Nicolas Dichtel 
---
 net/core/net_namespace.c | 48 
 1 file changed, 34 insertions(+), 14 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 52b9620e3457..f8a5966b086c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -738,20 +738,28 @@ static int rtnl_net_get_size(void)
   ;
 }
 
-static int rtnl_net_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
-int cmd, int nsid)
+struct net_fill_args {
+   u32 portid;
+   u32 seq;
+   int flags;
+   int cmd;
+   int nsid;
+};
+
+static int rtnl_net_fill(struct sk_buff *skb, struct net_fill_args *args)
 {
struct nlmsghdr *nlh;
struct rtgenmsg *rth;
 
-   nlh = nlmsg_put(skb, portid, seq, cmd, sizeof(*rth), flags);
+   nlh = nlmsg_put(skb, args->portid, args->seq, args->cmd, sizeof(*rth),
+   args->flags);
if (!nlh)
return -EMSGSIZE;
 
rth = nlmsg_data(nlh);
rth->rtgen_family = AF_UNSPEC;
 
-   if (nla_put_s32(skb, NETNSA_NSID, nsid))
+   if (nla_put_s32(skb, NETNSA_NSID, args->nsid))
goto nla_put_failure;
 
nlmsg_end(skb, nlh);
@@ -767,10 +775,15 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 {
struct net *net = sock_net(skb->sk);
struct nlattr *tb[NETNSA_MAX + 1];
+   struct net_fill_args fillargs = {
+   .portid = NETLINK_CB(skb).portid,
+   .seq = nlh->nlmsg_seq,
+   .cmd = RTM_NEWNSID,
+   };
struct nlattr *nla;
struct sk_buff *msg;
struct net *peer;
-   int err, id;
+   int err;
 
err = nlmsg_parse(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
  rtnl_net_policy, extack);
@@ -799,9 +812,8 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
goto out;
}
 
-   id = peernet2id(net, peer);
-   err = rtnl_net_fill(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
-   RTM_NEWNSID, id);
+   fillargs.nsid = peernet2id(net, peer);
+   err = rtnl_net_fill(msg, );
if (err < 0)
goto err_out;
 
@@ -817,7 +829,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 
 struct rtnl_net_dump_cb {
struct sk_buff *skb;
-   struct netlink_callback *cb;
+   struct net_fill_args fillargs;
int idx;
int s_idx;
 };
@@ -830,9 +842,8 @@ static int rtnl_net_dumpid_one(int id, void *peer, void 
*data)
if (net_cb->idx < net_cb->s_idx)
goto cont;
 
-   ret = rtnl_net_fill(net_cb->skb, NETLINK_CB(net_cb->cb->skb).portid,
-   net_cb->cb->nlh->nlmsg_seq, NLM_F_MULTI,
-   RTM_NEWNSID, id);
+   net_cb->fillargs.nsid = id;
+   ret = rtnl_net_fill(net_cb->skb, _cb->fillargs);
if (ret < 0)
return ret;
 
@@ -846,7 +857,12 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct 
netlink_callback *cb)
struct net *net = sock_net(skb->sk);
struct rtnl_net_dump_cb net_cb = {
.skb = skb,
-   .cb = cb,
+   .fillargs = {
+   .portid = NETLINK_CB(cb->skb).portid,
+   .seq = cb->nlh->nlmsg_seq,
+   .flags = NLM_F_MULTI,
+   .cmd = RTM_NEWNSID,
+   },
.idx = 0,
.s_idx = cb->args[0],
};
@@ -867,6 +883,10 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct 
netlink_callback *cb)
 
 static void rtnl_net_notifyid(struct net *net, int cmd, int id)
 {
+   struct net_fill_args fillargs = {
+   .cmd = cmd,
+   .nsid = id,
+   };
struct sk_buff *msg;
int err = -ENOMEM;
 
@@ -874,7 +894,7 @@ static void rtnl_net_notifyid(struct net *net, int cmd, int 
id)
if (!msg)
goto out;
 
-   err = rtnl_net_fill(msg, 0, 0, 0, cmd, id);
+   err = rtnl_net_fill(msg, );
if (err < 0)
goto err_out;
 
-- 
2.18.0



[PATCH net-next v2 4/5] netns: enable to specify a nsid for a get request

2018-11-22 Thread Nicolas Dichtel
Combined with NETNSA_TARGET_NSID, it enables to "translate" a nsid from one
netns to a nsid of another netns.
This is useful when using NETLINK_F_LISTEN_ALL_NSID because it helps the
user to interpret a nsid received from an other netns.

Signed-off-by: Nicolas Dichtel 
---
 net/core/net_namespace.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 885c54197e31..dd25fb22ad45 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -797,6 +797,11 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
} else if (tb[NETNSA_FD]) {
peer = get_net_ns_by_fd(nla_get_u32(tb[NETNSA_FD]));
nla = tb[NETNSA_FD];
+   } else if (tb[NETNSA_NSID]) {
+   peer = get_net_ns_by_id(net, nla_get_u32(tb[NETNSA_NSID]));
+   if (!peer)
+   peer = ERR_PTR(-ENOENT);
+   nla = tb[NETNSA_NSID];
} else {
NL_SET_ERR_MSG(extack, "Peer netns reference is missing");
return -EINVAL;
-- 
2.18.0



[PATCH net-next v2 5/5] netns: enable to dump full nsid translation table

2018-11-22 Thread Nicolas Dichtel
Like the previous patch, the goal is to ease to convert nsids from one
netns to another netns.
A new attribute (NETNSA_CURRENT_NSID) is added to the kernel answer when
NETNSA_TARGET_NSID is provided, thus the user can easily convert nsids.

Signed-off-by: Nicolas Dichtel 
---
 include/uapi/linux/net_namespace.h |  1 +
 net/core/net_namespace.c   | 30 --
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/net_namespace.h 
b/include/uapi/linux/net_namespace.h
index 0ed9dd61d32a..9f9956809565 100644
--- a/include/uapi/linux/net_namespace.h
+++ b/include/uapi/linux/net_namespace.h
@@ -17,6 +17,7 @@ enum {
NETNSA_PID,
NETNSA_FD,
NETNSA_TARGET_NSID,
+   NETNSA_CURRENT_NSID,
__NETNSA_MAX,
 };
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index dd25fb22ad45..25030e0317a2 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -745,6 +745,8 @@ struct net_fill_args {
int flags;
int cmd;
int nsid;
+   bool add_ref;
+   int ref_nsid;
 };
 
 static int rtnl_net_fill(struct sk_buff *skb, struct net_fill_args *args)
@@ -763,6 +765,10 @@ static int rtnl_net_fill(struct sk_buff *skb, struct 
net_fill_args *args)
if (nla_put_s32(skb, NETNSA_NSID, args->nsid))
goto nla_put_failure;
 
+   if (args->add_ref &&
+   nla_put_s32(skb, NETNSA_CURRENT_NSID, args->ref_nsid))
+   goto nla_put_failure;
+
nlmsg_end(skb, nlh);
return 0;
 
@@ -782,7 +788,6 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
.cmd = RTM_NEWNSID,
};
struct net *peer, *target = net;
-   bool put_target = false;
struct nlattr *nla;
struct sk_buff *msg;
int err;
@@ -824,7 +829,8 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
err = PTR_ERR(target);
goto out;
}
-   put_target = true;
+   fillargs.add_ref = true;
+   fillargs.ref_nsid = peernet2id(net, peer);
}
 
msg = nlmsg_new(rtnl_net_get_size(), GFP_KERNEL);
@@ -844,7 +850,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 err_out:
nlmsg_free(msg);
 out:
-   if (put_target)
+   if (fillargs.add_ref)
put_net(target);
put_net(peer);
return err;
@@ -852,11 +858,11 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 
 struct rtnl_net_dump_cb {
struct net *tgt_net;
+   struct net *ref_net;
struct sk_buff *skb;
struct net_fill_args fillargs;
int idx;
int s_idx;
-   bool put_tgt_net;
 };
 
 static int rtnl_net_dumpid_one(int id, void *peer, void *data)
@@ -868,6 +874,8 @@ static int rtnl_net_dumpid_one(int id, void *peer, void 
*data)
goto cont;
 
net_cb->fillargs.nsid = id;
+   if (net_cb->fillargs.add_ref)
+   net_cb->fillargs.ref_nsid = __peernet2id(net_cb->ref_net, peer);
ret = rtnl_net_fill(net_cb->skb, _cb->fillargs);
if (ret < 0)
return ret;
@@ -904,8 +912,9 @@ static int rtnl_valid_dump_net_req(const struct nlmsghdr 
*nlh, struct sock *sk,
   "Invalid target network 
namespace id");
return PTR_ERR(net);
}
+   net_cb->fillargs.add_ref = true;
+   net_cb->ref_net = net_cb->tgt_net;
net_cb->tgt_net = net;
-   net_cb->put_tgt_net = true;
} else {
NL_SET_BAD_ATTR(extack, tb[i]);
NL_SET_ERR_MSG(extack,
@@ -940,12 +949,21 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct 
netlink_callback *cb)
}
 
spin_lock_bh(_cb.tgt_net->nsid_lock);
+   if (net_cb.fillargs.add_ref &&
+   !net_eq(net_cb.ref_net, net_cb.tgt_net) &&
+   !spin_trylock_bh(_cb.ref_net->nsid_lock)) {
+   err = -EAGAIN;
+   goto end;
+   }
idr_for_each(_cb.tgt_net->netns_ids, rtnl_net_dumpid_one, _cb);
+   if (net_cb.fillargs.add_ref &&
+   !net_eq(net_cb.ref_net, net_cb.tgt_net))
+   spin_unlock_bh(_cb.ref_net->nsid_lock);
spin_unlock_bh(_cb.tgt_net->nsid_lock);
 
cb->args[0] = net_cb.idx;
 end:
-   if (net_cb.put_tgt_net)
+   if (net_cb.fillargs.add_ref)
put_net(net_cb.tgt_net);
return err < 0 ? err : skb->len;
 }
-- 
2.18.0



Re: [PATCH net-next 2/4] netns: add support of NETNSA_TARGET_NSID

2018-11-22 Thread Nicolas Dichtel
Le 21/11/2018 à 22:07, David Ahern a écrit :
> On 11/21/18 1:58 PM, Nicolas Dichtel wrote:
>> The target-nsid is not stored in net_cb (not needed). ref_net is set only if
>> tgt_net comes from TARGET_NSID. I can add a comment.
> 
> ref_net is added by this patch and it is only used (unless I missed
> something) to know if the put_net is needed. ie/, drop ref_net in place
> of nsid
It is used by a following patch (4/4) to get NETNSA_CURRENT_NSID.


Re: [PATCH net-next 4/4] netns: enable to dump full nsid translation table

2018-11-21 Thread Nicolas Dichtel
Le 21/11/2018 à 19:09, David Ahern a écrit :
> On 11/21/18 4:01 AM, Nicolas Dichtel wrote:
>> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
>> index 92730905886c..fc568cd0b560 100644
>> --- a/net/core/net_namespace.c
>> +++ b/net/core/net_namespace.c
>> @@ -740,7 +740,7 @@ static int rtnl_net_get_size(void)
>>  }
>>  
>>  static int rtnl_net_fill(struct sk_buff *skb, u32 portid, u32 seq, int 
>> flags,
>> - int cmd, int nsid)
>> + int cmd, int nsid, bool add_ref, int ref_nsid)
>>  {
>>  struct nlmsghdr *nlh;
>>  struct rtgenmsg *rth;
>> @@ -755,6 +755,9 @@ static int rtnl_net_fill(struct sk_buff *skb, u32 
>> portid, u32 seq, int flags,
>>  if (nla_put_s32(skb, NETNSA_NSID, nsid))
>>  goto nla_put_failure;
>>  
>> +if (add_ref && nla_put_s32(skb, NETNSA_CURRENT_NSID, ref_nsid))
>> +goto nla_put_failure;
> 
> Why not use ref_nsid >= 0 and drop the add_ref argument?
Because ref_nsid can be -1 (NETNSA_NSID_NOT_ASSIGNED) and I didn't want to add
another magic, I think it's clearer with an explicit bool.


Re: [PATCH net-next 2/4] netns: add support of NETNSA_TARGET_NSID

2018-11-21 Thread Nicolas Dichtel
Le 21/11/2018 à 19:05, David Ahern a écrit :
> On 11/21/18 4:01 AM, Nicolas Dichtel wrote:
>>  static int rtnl_net_dumpid(struct sk_buff *skb, struct netlink_callback *cb)
>>  {
>> -struct net *net = sock_net(skb->sk);
>>  struct rtnl_net_dump_cb net_cb = {
>> -.net = net,
>> +.tgt_net = sock_net(skb->sk),
>>  .skb = skb,
>>  .cb = cb,
>>  .idx = 0,
>>  .s_idx = cb->args[0],
>>  };
>> +int err = 0;
>>  
>> -if (cb->strict_check &&
>> -nlmsg_attrlen(cb->nlh, sizeof(struct rtgenmsg))) {
>> -NL_SET_ERR_MSG(cb->extack, "Unknown data in network 
>> namespace id dump request");
>> -return -EINVAL;
>> +if (cb->strict_check) {
>> +err = rtnl_valid_dump_net_req(cb->nlh, skb->sk, _cb, cb);
>> +if (err < 0)
>> +goto end;
>>  }
>>  
>> -spin_lock_bh(>nsid_lock);
>> -idr_for_each(>netns_ids, rtnl_net_dumpid_one, _cb);
>> -spin_unlock_bh(>nsid_lock);
>> +spin_lock_bh(_cb.tgt_net->nsid_lock);
>> +idr_for_each(_cb.tgt_net->netns_ids, rtnl_net_dumpid_one, _cb);
>> +spin_unlock_bh(_cb.tgt_net->nsid_lock);
>>  
>>  cb->args[0] = net_cb.idx;
>> -return skb->len;
>> +end:
>> +if (net_cb.ref_net)
>> +put_net(net_cb.tgt_net);
> 
> That is going to lead to confusion -- you check that ref_net is set put
> do a put on tgt_net.  Other places using a target namespace use the nsid
> as the key to whether a put_net is needed.
The target-nsid is not stored in net_cb (not needed). ref_net is set only if
tgt_net comes from TARGET_NSID. I can add a comment.


Re: [PATCH bpf-next v2] filter: add BPF_ADJ_ROOM_DATA mode to bpf_skb_adjust_room()

2018-11-21 Thread Nicolas Dichtel
Le 20/11/2018 à 02:06, Daniel Borkmann a écrit :
> On 11/13/2018 05:35 PM, Nicolas Dichtel wrote:
>> This new mode enables to add or remove an l2 header in a programmatic way
>> with cls_bpf.
>> For example, it enables to play with mpls headers.
>>
>> Signed-off-by: Nicolas Dichtel 
>> Acked-by: Martin KaFai Lau 
> 
> (Sorry for late reply, swamped due to Plumbers.)
I thought so, no problem ;-)

> 
>> ---
>>
>> v2: use skb_set_network_header()
>>
>>  include/uapi/linux/bpf.h   |  3 ++
>>  net/core/filter.c  | 53 ++
>>  tools/include/uapi/linux/bpf.h |  3 ++
>>  3 files changed, 59 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 47d606d744cc..9762ef3a536c 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1467,6 +1467,8 @@ union bpf_attr {
>>   *
>>   *  * **BPF_ADJ_ROOM_NET**: Adjust room at the network layer
>>   *(room space is added or removed below the layer 3 header).
>> + *  * **BPF_ADJ_ROOM_DATA**: Adjust room at the beginning of the
>> + *packet (room space is added or removed below skb->data).
> 
> Nit: in this case it would probably make more sense to name it 
> BPF_ADJ_ROOM_MAC.
> BPF_ADJ_ROOM_DATA reads like we're adjusting payload data.
Yep, I first had in mind skb->data, but you're right.

> 
>>   *  All values for *flags* are reserved for future usage, and must
>>   *  be left at zero.
>> @@ -2412,6 +2414,7 @@ enum bpf_func_id {
>>  /* Mode for BPF_FUNC_skb_adjust_room helper. */
>>  enum bpf_adj_room_mode {
>>  BPF_ADJ_ROOM_NET,
>> +BPF_ADJ_ROOM_DATA,
>>  };
>>  
>>  /* Mode for BPF_FUNC_skb_load_bytes_relative helper. */
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 53d50fb75ea1..e56da3077dca 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -2884,6 +2884,57 @@ static int bpf_skb_adjust_net(struct sk_buff *skb, 
>> s32 len_diff)
>>  return ret;
>>  }
>>  
>> +static int bpf_skb_data_shrink(struct sk_buff *skb, u32 len)
>> +{
>> +unsigned short hhlen = skb->dev->header_ops ?
>> +   skb->dev->hard_header_len : 0;
>> +int ret;
>> +
>> +ret = skb_unclone(skb, GFP_ATOMIC);
>> +if (unlikely(ret < 0))
>> +return ret;
>> +
>> +__skb_pull(skb, len);
>> +skb_reset_mac_header(skb);
>> +skb_set_network_header(skb, hhlen);
>> +skb_reset_transport_header(skb);
>> +return 0;
>> +}
>> +
>> +static int bpf_skb_data_grow(struct sk_buff *skb, u32 len)
>> +{
>> +unsigned short hhlen = skb->dev->header_ops ?
>> +   skb->dev->hard_header_len : 0;
>> +int ret;
>> +
>> +ret = skb_cow(skb, len);
>> +if (unlikely(ret < 0))
>> +return ret;
>> +
>> +skb_push(skb, len);
>> +skb_reset_mac_header(skb);
> 
> Looks like this could leak uninitialized mem into the BPF prog as it's
> not zeroed out. Can't we just reuse bpf_skb_generic_push() and the
> bpf_skb_generic_pop()?
Hmm, at some point, it was not possible, but it was an intermediate version and
I don't see why now.

> 
> bpf_skb_vlan_push() and bpf_skb_vlan_pop() do a bpf_push_mac_rcsum() /
> bpf_pull_mac_rcsum() in order to not screw up things like csum complete.
> Wouldn't this be needed here as well?
> 
> How does this interact with MPLS GSO?
I overlooked both points.

> 
> If the packet gets pushed back into the stack, would AF_MPLS handle it?
> Probably good to add test cases to BPF kselftest suite with it.
Ok, I will have a look to this.

> 
> Don't we need to reject the packet in case of skb->encapsulation?
> 
> Looking at push_mpls() and pop_mpls() from OVS implementation, do we
> also need to deal with skb inner network header / proto?
> 
> Given all this would it rather make sense to add MPLS specific helpers
> in similar fashion to the vlan ones we have?
> 
> Is there any other use case aside from MPLS?
Yes, I was targeting a generic encap/decap at l2 level. Maybe more complex than
I thought.

> 
>> +return 0;
>> +}
>> +
>> +static int bpf_skb_adjust_data(struct sk_buff *skb, s32 len_diff)
>> +{
>> +u32 len_diff_abs = abs(len_diff);
>> +bool shrink = len_diff < 0;
>> +int ret;
>> +
>> +if (unlikely(len_diff_abs > 0xfffU))
>> +return -EFAULT;
>> +
>> +if (shrink && len_diff_abs >= skb_headlen(skb))
>> +return -EFAULT;
>> +
>> +ret = shrink ? bpf_skb_data_shrink(skb, len_diff_abs) :
>> +   bpf_skb_data_grow(skb, len_diff_abs);
> 
> Hmm, I think this has a bug in that the above two do not adjust
> skb->mac_len. Then things like cls_bpf_classify() will __skb_pull()
> based on the old mac_len, getting you to a wrong offset in the
> packet when this is for example pushed back into ingress, etc.
Nice catch!


Thank you,
Nicolas


[PATCH net-next 1/4] netns: remove net arg from rtnl_net_fill()

2018-11-21 Thread Nicolas Dichtel
This argument is not used anymore.

Fixes: cab3c8ec8d57 ("netns: always provide the id to rtnl_net_fill()")
Signed-off-by: Nicolas Dichtel 
---
 net/core/net_namespace.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index fefe72774aeb..3e6af99bbe53 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -739,7 +739,7 @@ static int rtnl_net_get_size(void)
 }
 
 static int rtnl_net_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
-int cmd, struct net *net, int nsid)
+int cmd, int nsid)
 {
struct nlmsghdr *nlh;
struct rtgenmsg *rth;
@@ -801,7 +801,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 
id = peernet2id(net, peer);
err = rtnl_net_fill(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
-   RTM_NEWNSID, net, id);
+   RTM_NEWNSID, id);
if (err < 0)
goto err_out;
 
@@ -833,7 +833,7 @@ static int rtnl_net_dumpid_one(int id, void *peer, void 
*data)
 
ret = rtnl_net_fill(net_cb->skb, NETLINK_CB(net_cb->cb->skb).portid,
net_cb->cb->nlh->nlmsg_seq, NLM_F_MULTI,
-   RTM_NEWNSID, net_cb->net, id);
+   RTM_NEWNSID, id);
if (ret < 0)
return ret;
 
@@ -876,7 +876,7 @@ static void rtnl_net_notifyid(struct net *net, int cmd, int 
id)
if (!msg)
goto out;
 
-   err = rtnl_net_fill(msg, 0, 0, 0, cmd, net, id);
+   err = rtnl_net_fill(msg, 0, 0, 0, cmd, id);
if (err < 0)
goto err_out;
 
-- 
2.18.0



[PATCH net-next 4/4] netns: enable to dump full nsid translation table

2018-11-21 Thread Nicolas Dichtel
Like the previous patch, the goal is to ease to convert nsids from one
netns to another netns.
A new attribute (NETNSA_CURRENT_NSID) is added to the kernel answer when
NETNSA_TARGET_NSID is provided, thus the user can easily convert nsids.

Signed-off-by: Nicolas Dichtel 
---
 include/uapi/linux/net_namespace.h |  1 +
 net/core/net_namespace.c   | 30 --
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/net_namespace.h 
b/include/uapi/linux/net_namespace.h
index 0ed9dd61d32a..9f9956809565 100644
--- a/include/uapi/linux/net_namespace.h
+++ b/include/uapi/linux/net_namespace.h
@@ -17,6 +17,7 @@ enum {
NETNSA_PID,
NETNSA_FD,
NETNSA_TARGET_NSID,
+   NETNSA_CURRENT_NSID,
__NETNSA_MAX,
 };
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 92730905886c..fc568cd0b560 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -740,7 +740,7 @@ static int rtnl_net_get_size(void)
 }
 
 static int rtnl_net_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
-int cmd, int nsid)
+int cmd, int nsid, bool add_ref, int ref_nsid)
 {
struct nlmsghdr *nlh;
struct rtgenmsg *rth;
@@ -755,6 +755,9 @@ static int rtnl_net_fill(struct sk_buff *skb, u32 portid, 
u32 seq, int flags,
if (nla_put_s32(skb, NETNSA_NSID, nsid))
goto nla_put_failure;
 
+   if (add_ref && nla_put_s32(skb, NETNSA_CURRENT_NSID, ref_nsid))
+   goto nla_put_failure;
+
nlmsg_end(skb, nlh);
return 0;
 
@@ -769,9 +772,10 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
struct net *net = sock_net(skb->sk);
struct nlattr *tb[NETNSA_MAX + 1];
struct net *peer, *target = net;
+   bool add_ref = false;
struct nlattr *nla;
struct sk_buff *msg;
-   int err, id;
+   int err, id, ref_id;
 
err = nlmsg_parse(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
  rtnl_net_policy, extack);
@@ -809,6 +813,8 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
err = PTR_ERR(target);
goto put_peer;
}
+   ref_id = peernet2id(net, peer);
+   add_ref = true;
} else {
get_net(target);
}
@@ -821,7 +827,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 
id = peernet2id(target, peer);
err = rtnl_net_fill(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
-   RTM_NEWNSID, id);
+   RTM_NEWNSID, id, add_ref, ref_id);
if (err < 0)
goto free_nlmsg;
 
@@ -849,14 +855,17 @@ struct rtnl_net_dump_cb {
 static int rtnl_net_dumpid_one(int id, void *peer, void *data)
 {
struct rtnl_net_dump_cb *net_cb = (struct rtnl_net_dump_cb *)data;
-   int ret;
+   int ref_id = 0, ret;
 
if (net_cb->idx < net_cb->s_idx)
goto cont;
 
+   if (net_cb->ref_net)
+   ref_id = __peernet2id(net_cb->ref_net, peer);
+
ret = rtnl_net_fill(net_cb->skb, NETLINK_CB(net_cb->cb->skb).portid,
net_cb->cb->nlh->nlmsg_seq, NLM_F_MULTI,
-   RTM_NEWNSID, id);
+   RTM_NEWNSID, id, net_cb->ref_net, ref_id);
if (ret < 0)
return ret;
 
@@ -923,7 +932,16 @@ static int rtnl_net_dumpid(struct sk_buff *skb, struct 
netlink_callback *cb)
}
 
spin_lock_bh(_cb.tgt_net->nsid_lock);
+   if (net_cb.ref_net &&
+   !net_eq(net_cb.ref_net, net_cb.tgt_net) &&
+   !spin_trylock_bh(_cb.ref_net->nsid_lock)) {
+   err = -EAGAIN;
+   goto end;
+   }
idr_for_each(_cb.tgt_net->netns_ids, rtnl_net_dumpid_one, _cb);
+   if (net_cb.ref_net &&
+   !net_eq(net_cb.ref_net, net_cb.tgt_net))
+   spin_unlock_bh(_cb.ref_net->nsid_lock);
spin_unlock_bh(_cb.tgt_net->nsid_lock);
 
cb->args[0] = net_cb.idx;
@@ -942,7 +960,7 @@ static void rtnl_net_notifyid(struct net *net, int cmd, int 
id)
if (!msg)
goto out;
 
-   err = rtnl_net_fill(msg, 0, 0, 0, cmd, id);
+   err = rtnl_net_fill(msg, 0, 0, 0, cmd, id, false, 0);
if (err < 0)
goto err_out;
 
-- 
2.18.0



[PATCH net-next 0/4] Ease to interpret net-nsid

2018-11-21 Thread Nicolas Dichtel


The goal of this series is to ease the interpretation of nsid received in
netlink messages from other netns (when the user uses
NETLINK_F_LISTEN_ALL_NSID).

After this series, with a patched iproute2:

$ ip netns add foo
$ ip netns add bar
$ touch /var/run/netns/init_net
$ mount --bind /proc/1/ns/net /var/run/netns/init_net
$ ip netns set init_net 11
$ ip netns set foo 12
$ ip netns set bar 13
$ ip netns
init_net (id: 11)
bar (id: 13)
foo (id: 12)
$ ip -n foo netns set init_net 21
$ ip -n foo netns set foo 22
$ ip -n foo netns set bar 23
$ ip -n foo netns
init_net (id: 21)
bar (id: 23)
foo (id: 22)
$ ip -n bar netns set init_net 31
$ ip -n bar netns set foo 32
$ ip -n bar netns set bar 33
$ ip -n bar netns
init_net (id: 31)
bar (id: 33)
foo (id: 32)
$ ip netns list-id target-nsid 12
nsid 21 current-nsid 11 (iproute2 netns name: init_net)
nsid 22 current-nsid 12 (iproute2 netns name: foo)
nsid 23 current-nsid 13 (iproute2 netns name: bar)
$ ip -n bar netns list-id target-nsid 32 nsid 31
nsid 21 current-nsid 31 (iproute2 netns name: init_net)

 include/uapi/linux/net_namespace.h |   2 +
 net/core/net_namespace.c   | 132 ++---
 2 files changed, 110 insertions(+), 24 deletions(-)

Comments are welcomed,
Regards,
Nicolas



[PATCH net-next 2/4] netns: add support of NETNSA_TARGET_NSID

2018-11-21 Thread Nicolas Dichtel
Like it was done for link and address, add the ability to perform get/dump
in another netns by specifying a target nsid attribute.

Signed-off-by: Nicolas Dichtel 
---
 include/uapi/linux/net_namespace.h |  1 +
 net/core/net_namespace.c   | 97 --
 2 files changed, 80 insertions(+), 18 deletions(-)

diff --git a/include/uapi/linux/net_namespace.h 
b/include/uapi/linux/net_namespace.h
index 0187c74d8889..0ed9dd61d32a 100644
--- a/include/uapi/linux/net_namespace.h
+++ b/include/uapi/linux/net_namespace.h
@@ -16,6 +16,7 @@ enum {
NETNSA_NSID,
NETNSA_PID,
NETNSA_FD,
+   NETNSA_TARGET_NSID,
__NETNSA_MAX,
 };
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 3e6af99bbe53..3d02a742155f 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -669,6 +669,7 @@ static const struct nla_policy rtnl_net_policy[NETNSA_MAX + 
1] = {
[NETNSA_NSID]   = { .type = NLA_S32 },
[NETNSA_PID]= { .type = NLA_U32 },
[NETNSA_FD] = { .type = NLA_U32 },
+   [NETNSA_TARGET_NSID]= { .type = NLA_S32 },
 };
 
 static int rtnl_net_newid(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -767,9 +768,9 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 {
struct net *net = sock_net(skb->sk);
struct nlattr *tb[NETNSA_MAX + 1];
+   struct net *peer, *target = net;
struct nlattr *nla;
struct sk_buff *msg;
-   struct net *peer;
int err, id;
 
err = nlmsg_parse(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
@@ -793,30 +794,47 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
return PTR_ERR(peer);
}
 
+   if (tb[NETNSA_TARGET_NSID]) {
+   id = nla_get_s32(tb[NETNSA_TARGET_NSID]);
+   target = rtnl_get_net_ns_capable(NETLINK_CB(skb).sk, id);
+   if (IS_ERR(target)) {
+   NL_SET_BAD_ATTR(extack, tb[NETNSA_TARGET_NSID]);
+   NL_SET_ERR_MSG(extack,
+  "Target netns reference is invalid");
+   err = PTR_ERR(target);
+   goto put_peer;
+   }
+   } else {
+   get_net(target);
+   }
+
msg = nlmsg_new(rtnl_net_get_size(), GFP_KERNEL);
if (!msg) {
err = -ENOMEM;
-   goto out;
+   goto put_target;
}
 
-   id = peernet2id(net, peer);
+   id = peernet2id(target, peer);
err = rtnl_net_fill(msg, NETLINK_CB(skb).portid, nlh->nlmsg_seq, 0,
RTM_NEWNSID, id);
if (err < 0)
-   goto err_out;
+   goto free_nlmsg;
 
err = rtnl_unicast(msg, net, NETLINK_CB(skb).portid);
-   goto out;
+   goto put_target;
 
-err_out:
+free_nlmsg:
nlmsg_free(msg);
-out:
+put_target:
+   put_net(target);
+put_peer:
put_net(peer);
return err;
 }
 
 struct rtnl_net_dump_cb {
-   struct net *net;
+   struct net *tgt_net;
+   struct net *ref_net;
struct sk_buff *skb;
struct netlink_callback *cb;
int idx;
@@ -842,29 +860,72 @@ static int rtnl_net_dumpid_one(int id, void *peer, void 
*data)
return 0;
 }
 
+static int rtnl_valid_dump_net_req(const struct nlmsghdr *nlh, struct sock *sk,
+  struct rtnl_net_dump_cb *net_cb,
+  struct netlink_callback *cb)
+{
+   struct netlink_ext_ack *extack = cb->extack;
+   struct nlattr *tb[NETNSA_MAX + 1];
+   int err, i;
+
+   err = nlmsg_parse_strict(nlh, sizeof(struct rtgenmsg), tb, NETNSA_MAX,
+rtnl_net_policy, extack);
+   if (err < 0)
+   return err;
+
+   for (i = 0; i <= NETNSA_MAX; i++) {
+   if (!tb[i])
+   continue;
+
+   if (i == NETNSA_TARGET_NSID) {
+   struct net *net;
+
+   net = rtnl_get_net_ns_capable(sk, nla_get_s32(tb[i]));
+   if (IS_ERR(net)) {
+   NL_SET_BAD_ATTR(extack, tb[i]);
+   NL_SET_ERR_MSG(extack,
+  "Invalid target network 
namespace id");
+   return PTR_ERR(net);
+   }
+   net_cb->ref_net = net_cb->tgt_net;
+   net_cb->tgt_net = net;
+   } else {
+   NL_SET_BAD_ATTR(extack, tb[i]);
+   NL_SET_ERR_MSG(extack,
+  "Unsupported attribute in dump request");
+   return -EINVAL;
+   }
+   }
+
+   return 0;
+}
+
 static int rtnl_ne

[PATCH net-next 3/4] netns: enable to specify a nsid for a get request

2018-11-21 Thread Nicolas Dichtel
Combined with NETNSA_TARGET_NSID, it enables to "translate" a nsid from one
netns to a nsid of another netns.
This is useful when using NETLINK_F_LISTEN_ALL_NSID because it helps the
user to interpret a nsid received from an other netns.

Signed-off-by: Nicolas Dichtel 
---
 net/core/net_namespace.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 3d02a742155f..92730905886c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -783,6 +783,11 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh,
} else if (tb[NETNSA_FD]) {
peer = get_net_ns_by_fd(nla_get_u32(tb[NETNSA_FD]));
nla = tb[NETNSA_FD];
+   } else if (tb[NETNSA_NSID]) {
+   peer = get_net_ns_by_id(net, nla_get_u32(tb[NETNSA_NSID]));
+   if (!peer)
+   peer = ERR_PTR(-ENOENT);
+   nla = tb[NETNSA_NSID];
} else {
NL_SET_ERR_MSG(extack, "Peer netns reference is missing");
return -EINVAL;
-- 
2.18.0



[PATCH iproute2] ipnetns: parse nsid as a signed integer

2018-11-21 Thread Nicolas Dichtel
Don't confuse the user, nsid is a signed integer, this kind of command
should return an error: 'ip netns set foo 0x'.

Also, a valid value is a positive value. To let the kernel chooses a value,
the keyword 'auto' must be used.

Signed-off-by: Nicolas Dichtel 
---
 ip/ipnetns.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index 0eac18cf2682..03879b496343 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -35,6 +35,7 @@ static int usage(void)
fprintf(stderr, "   ip [-all] netns exec [NAME] cmd ...\n");
fprintf(stderr, "   ip netns monitor\n");
fprintf(stderr, "   ip netns list-id\n");
+   fprintf(stderr, "NETNSID := auto | POSITIVE-INT\n");
exit(-1);
 }
 
@@ -739,8 +740,7 @@ static int netns_set(int argc, char **argv)
 {
char netns_path[PATH_MAX];
const char *name;
-   unsigned int nsid;
-   int netns;
+   int netns, nsid;
 
if (argc < 1) {
fprintf(stderr, "No netns name specified\n");
@@ -754,8 +754,10 @@ static int netns_set(int argc, char **argv)
/* If a negative nsid is specified the kernel will select the nsid. */
if (strcmp(argv[1], "auto") == 0)
nsid = -1;
-   else if (get_unsigned(, argv[1], 0))
+   else if (get_integer(, argv[1], 0))
invarg("Invalid \"netnsid\" value\n", argv[1]);
+   else if (nsid < 0)
+   invarg("\"netnsid\" value should be >= 0\n", argv[1]);
 
snprintf(netns_path, sizeof(netns_path), "%s/%s", NETNS_RUN_DIR, name);
netns = open(netns_path, O_RDONLY | O_CLOEXEC);
-- 
2.18.0



Re: netns_id in bpf_sk_lookup_{tcp,udp}

2018-11-20 Thread Nicolas Dichtel
Le 20/11/2018 à 16:46, David Ahern a écrit :
> On 11/20/18 2:05 AM, Nicolas Dichtel wrote:
>> Le 20/11/2018 à 00:46, David Ahern a écrit :
[snip]
>>> Seems like alloc_netid() should error out if reqid < -1 (-1 being the
>>> NETNSA_NSID_NOT_ASSIGNED flag) as opposed to blindly ignoring it.
>> alloc_netid() tries to allocate the specified nsid if this nsid is valid, ie 
>> >=
>> 0, else it allocates a new nsid (actually the lower available).
>> This is the expected behavior.
>>
>> For me, it's more an iproute2 problem, which parses an unsigned and silently
>> cast it to a signed value.
> 
> so your intention is that any < 0 value means auto generate not just -1.
Yes. If a valid value is not provided, the kernel tries to allocate one.

Nicolas


Re: netns_id in bpf_sk_lookup_{tcp,udp}

2018-11-20 Thread Nicolas Dichtel
Le 20/11/2018 à 00:46, David Ahern a écrit :
[snip]
> That revelation shows another hole:
> $ ip netns add foo
> $ ip netns set foo 0x
It also works with 0xf000 ...

> $ ip netns list
> foo (id: 0)
> 
> Seems like alloc_netid() should error out if reqid < -1 (-1 being the
> NETNSA_NSID_NOT_ASSIGNED flag) as opposed to blindly ignoring it.
alloc_netid() tries to allocate the specified nsid if this nsid is valid, ie >=
0, else it allocates a new nsid (actually the lower available).
This is the expected behavior.

For me, it's more an iproute2 problem, which parses an unsigned and silently
cast it to a signed value.

-8<

>From 79bac98bfd0acbf2526a3427d5aba96564844209 Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel 
Date: Tue, 20 Nov 2018 09:59:46 +0100
Subject: ipnetns: parse nsid as a signed integer

Don't confuse the user, nsid is a signed interger, this kind of command
should return an error: 'ip netns set foo 0xffff'.

Signed-off-by: Nicolas Dichtel 
---
 ip/ipnetns.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index 0eac18cf2682..54346ac987cf 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -739,8 +739,7 @@ static int netns_set(int argc, char **argv)
 {
char netns_path[PATH_MAX];
const char *name;
-   unsigned int nsid;
-   int netns;
+   int netns, nsid;

if (argc < 1) {
fprintf(stderr, "No netns name specified\n");
@@ -754,7 +753,7 @@ static int netns_set(int argc, char **argv)
/* If a negative nsid is specified the kernel will select the nsid. */
if (strcmp(argv[1], "auto") == 0)
nsid = -1;
-   else if (get_unsigned(, argv[1], 0))
+   else if (get_integer(, argv[1], 0))
invarg("Invalid \"netnsid\" value\n", argv[1]);

snprintf(netns_path, sizeof(netns_path), "%s/%s", NETNS_RUN_DIR, name);
-- 
2.13.2


Re: netns_id in bpf_sk_lookup_{tcp,udp}

2018-11-19 Thread Nicolas Dichtel
Le 19/11/2018 à 20:54, David Ahern a écrit :
> On 11/19/18 12:47 PM, Joe Stringer wrote:
>> On Mon, 19 Nov 2018 at 10:39, David Ahern  wrote:
>>>
>>> On 11/19/18 11:36 AM, Joe Stringer wrote:
 Hi David, thanks for pointing this out.

 This is more of an oversight through iterations, the runtime lookup
 will fail to find a socket if the netns value is greater than the
 range of a uint32 so I think it would actually make more sense to drop
 the parameter size to u32 rather than u64 so that this would be
 validated at load time rather than silently returning NULL because of
 a bad parameter.
>>>
>>> ok. I was wondering if it was a u64 to handle nsid of 0 which as I
>>> understand it is a legal nsid. If you drop to u32, how do you know when
>>> nsid has been set?
>>
>> I was operating under the assumption that 0 represents the root netns
>> id, and cannot be assigned to another non-root netns.
>>
>> Looking at __peernet2id_alloc(), it seems to me like it attempts to
>> find a netns and if it cannot find one, returns 0, which then leads to
>> a scroll over the idr starting from 0 to INT_MAX to find a legitimate
>> id for the netns, so I think this is a fair assumption?
The NET_ID_ZERO trick is used to manage nsid 0 in net_eq_idr() (idr_for_each()
stops when the callback returns != 0).

>>
> 
> Maybe Nicolas can give a definitive answer; as I recall he added the
> NSID option. I have not had time to walk the code. But I do recall
> seeing an id of 0. e.g, on my dev box:
> $ ip netns
> vms (id: 0)
> 
> And include/uapi/linux/net_namespace.h shows -1 as not assigned.
Yes, 0 is a valid value and can be assigned to any netns.
nsid are signed 32 bit values. Note that -1 (NETNSA_NSID_NOT_ASSIGNED) is used
by the kernel to express that the nsid is not assigned. It can also be used by
the user to let the kernel chooses a nsid.

$ ip netns add foo
$ ip netns add bar
$ ip netns
bar
foo
$ ip netns set foo 0
$ ip netns set bar auto
$ ip netns
bar (id: 1)
foo (id: 0)


Regards,
Nicolas


Re: [RFC v1 1/3] udp_tunnel: add config option to bind to a device

2018-11-14 Thread Nicolas Dichtel
Le 14/11/2018 à 10:31, Alexis Bauvin a écrit :
> UDP tunnel sockets are always opened unbound to a specific device. This
> patch allow the socket to be bound on a custom device, which
> incidentally makes UDP tunnels VRF-aware if binding to an l3mdev.
> 
> Signed-off-by: Alexis Bauvin 
> Reviewed-by: Amine Kherbouche 
> Tested-by: Amine Kherbouche 
What is the difference with the previous version?
Maybe a cover letter would help to track the history.


Regards,
Nicolas


[PATCH bpf-next v2] filter: add BPF_ADJ_ROOM_DATA mode to bpf_skb_adjust_room()

2018-11-13 Thread Nicolas Dichtel
This new mode enables to add or remove an l2 header in a programmatic way
with cls_bpf.
For example, it enables to play with mpls headers.

Signed-off-by: Nicolas Dichtel 
Acked-by: Martin KaFai Lau 
---

v2: use skb_set_network_header()

 include/uapi/linux/bpf.h   |  3 ++
 net/core/filter.c  | 53 ++
 tools/include/uapi/linux/bpf.h |  3 ++
 3 files changed, 59 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 47d606d744cc..9762ef3a536c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1467,6 +1467,8 @@ union bpf_attr {
  *
  * * **BPF_ADJ_ROOM_NET**: Adjust room at the network layer
  *   (room space is added or removed below the layer 3 header).
+ * * **BPF_ADJ_ROOM_DATA**: Adjust room at the beginning of the
+ *   packet (room space is added or removed below skb->data).
  *
  * All values for *flags* are reserved for future usage, and must
  * be left at zero.
@@ -2412,6 +2414,7 @@ enum bpf_func_id {
 /* Mode for BPF_FUNC_skb_adjust_room helper. */
 enum bpf_adj_room_mode {
BPF_ADJ_ROOM_NET,
+   BPF_ADJ_ROOM_DATA,
 };
 
 /* Mode for BPF_FUNC_skb_load_bytes_relative helper. */
diff --git a/net/core/filter.c b/net/core/filter.c
index 53d50fb75ea1..e56da3077dca 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2884,6 +2884,57 @@ static int bpf_skb_adjust_net(struct sk_buff *skb, s32 
len_diff)
return ret;
 }
 
+static int bpf_skb_data_shrink(struct sk_buff *skb, u32 len)
+{
+   unsigned short hhlen = skb->dev->header_ops ?
+  skb->dev->hard_header_len : 0;
+   int ret;
+
+   ret = skb_unclone(skb, GFP_ATOMIC);
+   if (unlikely(ret < 0))
+   return ret;
+
+   __skb_pull(skb, len);
+   skb_reset_mac_header(skb);
+   skb_set_network_header(skb, hhlen);
+   skb_reset_transport_header(skb);
+   return 0;
+}
+
+static int bpf_skb_data_grow(struct sk_buff *skb, u32 len)
+{
+   unsigned short hhlen = skb->dev->header_ops ?
+  skb->dev->hard_header_len : 0;
+   int ret;
+
+   ret = skb_cow(skb, len);
+   if (unlikely(ret < 0))
+   return ret;
+
+   skb_push(skb, len);
+   skb_reset_mac_header(skb);
+   return 0;
+}
+
+static int bpf_skb_adjust_data(struct sk_buff *skb, s32 len_diff)
+{
+   u32 len_diff_abs = abs(len_diff);
+   bool shrink = len_diff < 0;
+   int ret;
+
+   if (unlikely(len_diff_abs > 0xfffU))
+   return -EFAULT;
+
+   if (shrink && len_diff_abs >= skb_headlen(skb))
+   return -EFAULT;
+
+   ret = shrink ? bpf_skb_data_shrink(skb, len_diff_abs) :
+  bpf_skb_data_grow(skb, len_diff_abs);
+
+   bpf_compute_data_pointers(skb);
+   return ret;
+}
+
 BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
   u32, mode, u64, flags)
 {
@@ -2891,6 +2942,8 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, 
s32, len_diff,
return -EINVAL;
if (likely(mode == BPF_ADJ_ROOM_NET))
return bpf_skb_adjust_net(skb, len_diff);
+   if (likely(mode == BPF_ADJ_ROOM_DATA))
+   return bpf_skb_adjust_data(skb, len_diff);
 
return -ENOTSUPP;
 }
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 852dc17ab47a..47407fd5162b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1467,6 +1467,8 @@ union bpf_attr {
  *
  * * **BPF_ADJ_ROOM_NET**: Adjust room at the network layer
  *   (room space is added or removed below the layer 3 header).
+ * * **BPF_ADJ_ROOM_DATA**: Adjust room at the beginning of the
+ *   packet (room space is added or removed below skb->data).
  *
  * All values for *flags* are reserved for future usage, and must
  * be left at zero.
@@ -2408,6 +2410,7 @@ enum bpf_func_id {
 /* Mode for BPF_FUNC_skb_adjust_room helper. */
 enum bpf_adj_room_mode {
BPF_ADJ_ROOM_NET,
+   BPF_ADJ_ROOM_DATA,
 };
 
 /* Mode for BPF_FUNC_skb_load_bytes_relative helper. */
-- 
2.18.0



Re: [PATCH bpf-next] filter: add BPF_ADJ_ROOM_DATA mode to bpf_skb_adjust_room()

2018-11-10 Thread Nicolas Dichtel
Le 09/11/2018 à 19:51, Martin Lau a écrit :
> On Thu, Nov 08, 2018 at 04:11:37PM +0100, Nicolas Dichtel wrote:
[snip]
>> +static int bpf_skb_data_shrink(struct sk_buff *skb, u32 len)
>> +{
>> +unsigned short hhlen = skb->dev->header_ops ?
>> +   skb->dev->hard_header_len : 0;
>> +int ret;
>> +
>> +ret = skb_unclone(skb, GFP_ATOMIC);
>> +if (unlikely(ret < 0))
>> +return ret;
>> +
>> +__skb_pull(skb, len);
>> +skb_reset_mac_header(skb);
>> +skb_reset_network_header(skb);
>> +skb->network_header += hhlen;
>> +skb_reset_transport_header(skb);
> hmm...why transport_header does not need += hhlen here
> while network_header does?

network_header is mandatory because bpf_redirect(BPF_F_INGRESS) can be called
and network_header is expected to be correctly set in this case.
For transport_header, I choose to not set it, because the stack will set it
later (for example ip_rcv_core()).


Regards,
Nicolas


[PATCH bpf-next] filter: add BPF_ADJ_ROOM_DATA mode to bpf_skb_adjust_room()

2018-11-08 Thread Nicolas Dichtel
This new mode enables to add or remove an l2 header in a programmatic way
with cls_bpf.
For example, it enables to play with mpls headers.

Signed-off-by: Nicolas Dichtel 
---
 include/uapi/linux/bpf.h   |  3 ++
 net/core/filter.c  | 54 ++
 tools/include/uapi/linux/bpf.h |  3 ++
 3 files changed, 60 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 852dc17ab47a..47407fd5162b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1467,6 +1467,8 @@ union bpf_attr {
  *
  * * **BPF_ADJ_ROOM_NET**: Adjust room at the network layer
  *   (room space is added or removed below the layer 3 header).
+ * * **BPF_ADJ_ROOM_DATA**: Adjust room at the beginning of the
+ *   packet (room space is added or removed below skb->data).
  *
  * All values for *flags* are reserved for future usage, and must
  * be left at zero.
@@ -2408,6 +2410,7 @@ enum bpf_func_id {
 /* Mode for BPF_FUNC_skb_adjust_room helper. */
 enum bpf_adj_room_mode {
BPF_ADJ_ROOM_NET,
+   BPF_ADJ_ROOM_DATA,
 };
 
 /* Mode for BPF_FUNC_skb_load_bytes_relative helper. */
diff --git a/net/core/filter.c b/net/core/filter.c
index e521c5ebc7d1..e699849b269d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2884,6 +2884,58 @@ static int bpf_skb_adjust_net(struct sk_buff *skb, s32 
len_diff)
return ret;
 }
 
+static int bpf_skb_data_shrink(struct sk_buff *skb, u32 len)
+{
+   unsigned short hhlen = skb->dev->header_ops ?
+  skb->dev->hard_header_len : 0;
+   int ret;
+
+   ret = skb_unclone(skb, GFP_ATOMIC);
+   if (unlikely(ret < 0))
+   return ret;
+
+   __skb_pull(skb, len);
+   skb_reset_mac_header(skb);
+   skb_reset_network_header(skb);
+   skb->network_header += hhlen;
+   skb_reset_transport_header(skb);
+   return 0;
+}
+
+static int bpf_skb_data_grow(struct sk_buff *skb, u32 len)
+{
+   unsigned short hhlen = skb->dev->header_ops ?
+  skb->dev->hard_header_len : 0;
+   int ret;
+
+   ret = skb_cow(skb, len);
+   if (unlikely(ret < 0))
+   return ret;
+
+   skb_push(skb, len);
+   skb_reset_mac_header(skb);
+   return 0;
+}
+
+static int bpf_skb_adjust_data(struct sk_buff *skb, s32 len_diff)
+{
+   u32 len_diff_abs = abs(len_diff);
+   bool shrink = len_diff < 0;
+   int ret;
+
+   if (unlikely(len_diff_abs > 0xfffU))
+   return -EFAULT;
+
+   if (shrink && len_diff_abs >= skb_headlen(skb))
+   return -EFAULT;
+
+   ret = shrink ? bpf_skb_data_shrink(skb, len_diff_abs) :
+  bpf_skb_data_grow(skb, len_diff_abs);
+
+   bpf_compute_data_pointers(skb);
+   return ret;
+}
+
 BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
   u32, mode, u64, flags)
 {
@@ -2891,6 +2943,8 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, 
s32, len_diff,
return -EINVAL;
if (likely(mode == BPF_ADJ_ROOM_NET))
return bpf_skb_adjust_net(skb, len_diff);
+   if (likely(mode == BPF_ADJ_ROOM_DATA))
+   return bpf_skb_adjust_data(skb, len_diff);
 
return -ENOTSUPP;
 }
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 852dc17ab47a..47407fd5162b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1467,6 +1467,8 @@ union bpf_attr {
  *
  * * **BPF_ADJ_ROOM_NET**: Adjust room at the network layer
  *   (room space is added or removed below the layer 3 header).
+ * * **BPF_ADJ_ROOM_DATA**: Adjust room at the beginning of the
+ *   packet (room space is added or removed below skb->data).
  *
  * All values for *flags* are reserved for future usage, and must
  * be left at zero.
@@ -2408,6 +2410,7 @@ enum bpf_func_id {
 /* Mode for BPF_FUNC_skb_adjust_room helper. */
 enum bpf_adj_room_mode {
BPF_ADJ_ROOM_NET,
+   BPF_ADJ_ROOM_DATA,
 };
 
 /* Mode for BPF_FUNC_skb_load_bytes_relative helper. */
-- 
2.18.0



Re: [PATCH bpf] bpf: fix doc of bpf_skb_adjust_room() in uapi

2018-10-18 Thread Nicolas Dichtel
Le 18/10/2018 à 06:49, Alexei Starovoitov a écrit :
> On Wed, Oct 17, 2018 at 04:24:48PM +0200, Nicolas Dichtel wrote:
>> len_diff is signed.
>>
>> Fixes: fa15601ab31e ("bpf: add documentation for eBPF helpers (33-41)")
>> CC: Quentin Monnet 
>> Signed-off-by: Nicolas Dichtel 
>> ---
>>  include/uapi/linux/bpf.h   | 2 +-
>>  tools/include/uapi/linux/bpf.h | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 66917a4eba27..c4ffe91d5598 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1430,7 +1430,7 @@ union bpf_attr {
>>   *  Return
>>   *  0 on success, or a negative error in case of failure.
>>   *
>> - * int bpf_skb_adjust_room(struct sk_buff *skb, u32 len_diff, u32 mode, u64 
>> flags)
>> + * int bpf_skb_adjust_room(struct sk_buff *skb, s32 len_diff, u32 mode, u64 
>> flags)
> 
> Thanks. Applied to bpf-next, since we're very late into release cycle.
> 
Yep, I was also wondering if it was not too late for the bpf tree ;-)


Thank you,
Nicolas


[PATCH bpf] bpf: fix doc of bpf_skb_adjust_room() in uapi

2018-10-17 Thread Nicolas Dichtel
len_diff is signed.

Fixes: fa15601ab31e ("bpf: add documentation for eBPF helpers (33-41)")
CC: Quentin Monnet 
Signed-off-by: Nicolas Dichtel 
---
 include/uapi/linux/bpf.h   | 2 +-
 tools/include/uapi/linux/bpf.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 66917a4eba27..c4ffe91d5598 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1430,7 +1430,7 @@ union bpf_attr {
  * Return
  * 0 on success, or a negative error in case of failure.
  *
- * int bpf_skb_adjust_room(struct sk_buff *skb, u32 len_diff, u32 mode, u64 
flags)
+ * int bpf_skb_adjust_room(struct sk_buff *skb, s32 len_diff, u32 mode, u64 
flags)
  * Description
  * Grow or shrink the room for data in the packet associated to
  * *skb* by *len_diff*, and according to the selected *mode*.
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 66917a4eba27..c4ffe91d5598 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1430,7 +1430,7 @@ union bpf_attr {
  * Return
  * 0 on success, or a negative error in case of failure.
  *
- * int bpf_skb_adjust_room(struct sk_buff *skb, u32 len_diff, u32 mode, u64 
flags)
+ * int bpf_skb_adjust_room(struct sk_buff *skb, s32 len_diff, u32 mode, u64 
flags)
  * Description
  * Grow or shrink the room for data in the packet associated to
  * *skb* by *len_diff*, and according to the selected *mode*.
-- 
2.18.0



Re: [PATCH net 1/2] geneve, vxlan: Don't check skb_dst() twice

2018-10-15 Thread Nicolas Dichtel
Le 15/10/2018 à 13:08, Stefano Brivio a écrit :
> On Mon, 15 Oct 2018 12:19:41 +0200
> Nicolas Dichtel  wrote:
> 
>> Le 12/10/2018 à 23:53, Stefano Brivio a écrit :
>>> Commit f15ca723c1eb ("net: don't call update_pmtu unconditionally") avoids
>>> that we try updating PMTU for a non-existent destination, but didn't clean
>>> up cases where the check was already explicit. Drop those redundant checks. 
>>>  
>> Yes, I leave them to avoid calculating the new mtu value when not needed. We 
>> are
>> in the xmit path.
> 
> Before 2/2 of this series, though, we call skb_dst_update_pmtu() (and
> in turn dst->ops->update_pmtu()) for *every* packet with a dst, which
Not if dst is of type md_dst_ops.

> I'd dare saying is by far the most common case. Besides, 2/2 needs
> anyway to calculate the MTU to fix a bug.
> 
> So I think this is a vast improvement overall.
Fair point.


Re: [PATCH net 1/2] geneve, vxlan: Don't check skb_dst() twice

2018-10-15 Thread Nicolas Dichtel
Le 12/10/2018 à 23:53, Stefano Brivio a écrit :
> Commit f15ca723c1eb ("net: don't call update_pmtu unconditionally") avoids
> that we try updating PMTU for a non-existent destination, but didn't clean
> up cases where the check was already explicit. Drop those redundant checks.
Yes, I leave them to avoid calculating the new mtu value when not needed. We are
in the xmit path.
As skb_dst_update_pmtu() is inlined, we probably don't care, but gcc could still
decide to not inline it.


Regards,
Nicolas


Re: [PATCH net] ip6_tunnel: Don't update PMTU on tunnels with collect_md

2018-10-15 Thread Nicolas Dichtel
Le 12/10/2018 à 18:34, Stefano Brivio a écrit :
> On Fri, 12 Oct 2018 17:58:55 +0200
> Nicolas Dichtel  wrote:
[snip]
>> Could you explain in your commit log which problem your patch fixes?
> 
> Nothing really.
> 
> The change in f15ca723c1eb looked accidental and I thought it doesn't
> make sense to update the PMTU in that case, but I didn't figure out
> it's not actually done anyway.
> 
> Maybe it makes things a bit more readable, in that case I'd target it
> for net-next. What do you think?
> 
I don't think that this patch helps. The purpose of the skb_dst_update_pmtu()
helper is to hide those things. If one day, update_pmtu is defined for
md_dst_op, I bet that we won't remove this test.


Regards,
Nicolas


Re: [PATCH net] ip6_tunnel: Don't update PMTU on tunnels with collect_md

2018-10-12 Thread Nicolas Dichtel
Le 12/10/2018 à 14:32, Stefano Brivio a écrit :
> Commit 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6
> tunnels") introduced a check to avoid updating PMTU when
> collect_md mode is enabled.
> 
> Later, commit f15ca723c1eb ("net: don't call update_pmtu
> unconditionally") dropped this check, I guess inadvertently.
I removed it because update_pmtu() is not set for md_dst_op, thus I assume this
check was done for that purpose.

Could you explain in your commit log which problem your patch fixes?


Re: [PATCH net 1/2] selftests: pmtu: maximum MTU for vti4 is 2^16-1-20

2018-08-30 Thread Nicolas Dichtel
Le 30/08/2018 à 16:01, Sabrina Dubroca a écrit :
> Since commit 82612de1c98e ("ip_tunnel: restore binding to ifaces with a
> large mtu"), the maximum MTU for vti4 is based on IP_MAX_MTU instead of
> the mysterious constant 0xFFF8.  This makes this selftest fail.
> 
> Fixes: 82612de1c98e ("ip_tunnel: restore binding to ifaces with a large mtu")
> Signed-off-by: Sabrina Dubroca 
> Acked-by: Stefano Brivio 

Thanks for fixing this.

Acked-by: Nicolas Dichtel 


Re: [PATCH iproute2 v2 1/2] ip: display netns name instead of nsid

2018-06-06 Thread Nicolas Dichtel
Le 05/06/2018 à 18:52, Stephen Hemminger a écrit :
> On Tue,  5 Jun 2018 15:08:30 +0200
> Nicolas Dichtel  wrote:
> 
>>  
>> +char *get_name_from_nsid(int nsid)
>> +{
>> +struct nsid_cache *c;
>> +
>> +netns_nsid_socket_init();
>> +netns_map_init();
>> +
>> +c = netns_map_get_by_nsid(nsid);
>> +if (c)
>> +return c->name;
>> +
>> +return NULL;
>> +}
>> +
> 
> This is better, but now there is a different problem.
> When doing multiple interfaces, won't the initialization code be called twice?
> 
No, the init function is propected:

void netns_nsid_socket_init(void)
{
if (rtnsh.fd > -1 || !ipnetns_have_nsid())
return;
...

void netns_map_init(void)
{
...
if (initialized || !ipnetns_have_nsid())
return;
...


[PATCH iproute2 v2 1/2] ip: display netns name instead of nsid

2018-06-05 Thread Nicolas Dichtel
When iproute2 has a name for the nsid, let's display it. It's more
user friendly than a number.

Signed-off-by: Nicolas Dichtel 
---
 ip/ip_common.h |  1 +
 ip/ipaddress.c | 20 +++-
 ip/ipnetns.c   | 14 ++
 3 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/ip/ip_common.h b/ip/ip_common.h
index 49eb7d7bed40..794478c546cd 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -60,6 +60,7 @@ void netns_map_init(void);
 void netns_nsid_socket_init(void);
 int print_nsid(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg);
+char *get_name_from_nsid(int nsid);
 int do_ipaddr(int argc, char **argv);
 int do_ipaddrlabel(int argc, char **argv);
 int do_iproute(int argc, char **argv);
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index c7c7e7df4e81..e4a1b985e4e9 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -955,10 +955,16 @@ int print_linkinfo(const struct sockaddr_nl *who,
if (is_json_context()) {
print_int(PRINT_JSON, "link_netnsid", NULL, id);
} else {
-   if (id >= 0)
-   print_int(PRINT_FP, NULL,
- " link-netnsid %d", id);
-   else
+   if (id >= 0) {
+   char *name = get_name_from_nsid(id);
+
+   if (name)
+   print_string(PRINT_FP, NULL,
+" link-netns %s", name);
+   else
+   print_int(PRINT_FP, NULL,
+ " link-netnsid %d", id);
+   } else
print_string(PRINT_FP, NULL,
 " link-netnsid %s", "unknown");
}
@@ -966,8 +972,12 @@ int print_linkinfo(const struct sockaddr_nl *who,
 
if (tb[IFLA_NEW_NETNSID]) {
int id = rta_getattr_u32(tb[IFLA_NEW_NETNSID]);
+   char *name = get_name_from_nsid(id);
 
-   print_int(PRINT_FP, NULL, " new-nsid %d", id);
+   if (name)
+   print_string(PRINT_FP, NULL, " new-netns %s", name);
+   else
+   print_int(PRINT_FP, NULL, " new-netnsid %d", id);
}
if (tb[IFLA_NEW_IFINDEX]) {
int id = rta_getattr_u32(tb[IFLA_NEW_IFINDEX]);
diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index e06100f4ad2d..30af9319f39e 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -169,6 +169,20 @@ static struct nsid_cache *netns_map_get_by_nsid(int nsid)
return NULL;
 }
 
+char *get_name_from_nsid(int nsid)
+{
+   struct nsid_cache *c;
+
+   netns_nsid_socket_init();
+   netns_map_init();
+
+   c = netns_map_get_by_nsid(nsid);
+   if (c)
+   return c->name;
+
+   return NULL;
+}
+
 static int netns_map_add(int nsid, const char *name)
 {
struct nsid_cache *c;
-- 
2.15.1



[PATCH iproute2 v2 0/2] display netns name instead of nsid

2018-06-05 Thread Nicolas Dichtel


After these patches, the iproute2 name of netns is displayed instead of
the nsid. It's easier to read/understand.

v1 -> v2:
 - open netns socket and init netns map only when needed

 ip/ip_common.h |  3 +++
 ip/ipaddress.c | 20 +++-
 ip/iplink.c| 18 --
 ip/ipnetns.c   | 22 --
 4 files changed, 54 insertions(+), 9 deletions(-)
 
Comments are welcomed, 
Regards, 
Nicolas


[PATCH iproute2 v2 2/2] iplink: enable to specify a name for the link-netns

2018-06-05 Thread Nicolas Dichtel
The 'link-netnsid' argument needs a number. Add 'link-netns' when the user
wants to use the iproute2 netns name instead of the nsid.

Example:
ip link add ipip1 link-netns foo type ipip remote 10.16.0.121 local 10.16.0.249

Signed-off-by: Nicolas Dichtel 
---
 ip/ip_common.h |  2 ++
 ip/iplink.c| 18 --
 ip/ipnetns.c   |  8 ++--
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/ip/ip_common.h b/ip/ip_common.h
index 794478c546cd..4d3227cbc389 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -61,6 +61,8 @@ void netns_nsid_socket_init(void);
 int print_nsid(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg);
 char *get_name_from_nsid(int nsid);
+int get_netnsid_from_name(const char *name);
+int set_netnsid_from_name(const char *name, int nsid);
 int do_ipaddr(int argc, char **argv);
 int do_ipaddrlabel(int argc, char **argv);
 int do_iproute(int argc, char **argv);
diff --git a/ip/iplink.c b/ip/iplink.c
index 9ff5f692a1d4..e4d4da96aedb 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -85,7 +85,7 @@ void iplink_usage(void)
" [ broadcast LLADDR ]\n"
" [ mtu MTU ]\n"
" [ netns { PID | NAME } ]\n"
-   " [ link-netnsid ID ]\n"
+   " [ link-netns NAME | link-netnsid ID 
]\n"
" [ alias NAME ]\n"
" [ vf NUM [ mac LLADDR ]\n"
"  [ vlan VLANID [ qos VLAN-QOS 
] [ proto VLAN-PROTO ] ]\n"
@@ -865,10 +865,24 @@ int iplink_parse(int argc, char **argv, struct iplink_req 
*req, char **type)
 IFLA_INET6_ADDR_GEN_MODE, mode);
addattr_nest_end(>n, afs6);
addattr_nest_end(>n, afs);
+   } else if (matches(*argv, "link-netns") == 0) {
+   NEXT_ARG();
+   if (link_netnsid != -1)
+   duparg("link-netns/link-netnsid", *argv);
+   link_netnsid = get_netnsid_from_name(*argv);
+   /* No nsid? Try to assign one. */
+   if (link_netnsid < 0)
+   set_netnsid_from_name(*argv, -1);
+   link_netnsid = get_netnsid_from_name(*argv);
+   if (link_netnsid < 0)
+   invarg("Invalid \"link-netns\" value\n",
+  *argv);
+   addattr32(>n, sizeof(*req), IFLA_LINK_NETNSID,
+ link_netnsid);
} else if (matches(*argv, "link-netnsid") == 0) {
NEXT_ARG();
if (link_netnsid != -1)
-   duparg("link-netnsid", *argv);
+   duparg("link-netns/link-netnsid", *argv);
if (get_integer(_netnsid, *argv, 0))
invarg("Invalid \"link-netnsid\" value\n",
   *argv);
diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index 30af9319f39e..368be0cbc0a4 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -91,7 +91,7 @@ static int ipnetns_have_nsid(void)
return have_rtnl_getnsid;
 }
 
-static int get_netnsid_from_name(const char *name)
+int get_netnsid_from_name(const char *name)
 {
struct {
struct nlmsghdr n;
@@ -108,6 +108,8 @@ static int get_netnsid_from_name(const char *name)
struct rtgenmsg *rthdr;
int len, fd;
 
+   netns_nsid_socket_init();
+
fd = netns_get_fd(name);
if (fd < 0)
return fd;
@@ -705,7 +707,7 @@ out_delete:
return -1;
 }
 
-static int set_netnsid_from_name(const char *name, int nsid)
+int set_netnsid_from_name(const char *name, int nsid)
 {
struct {
struct nlmsghdr n;
@@ -719,6 +721,8 @@ static int set_netnsid_from_name(const char *name, int nsid)
};
int fd, err = 0;
 
+   netns_nsid_socket_init();
+
fd = netns_get_fd(name);
if (fd < 0)
return fd;
-- 
2.15.1



[PATCH iproute2 1/2] ip: display netns name instead of nsid

2018-06-04 Thread Nicolas Dichtel
When iproute2 has a name for the nsid, let's display it. It's more
user friendly than a number.

Signed-off-by: Nicolas Dichtel 
---
 ip/ip_common.h |  1 +
 ip/ipaddress.c | 23 ++-
 ip/ipnetns.c   | 10 ++
 3 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/ip/ip_common.h b/ip/ip_common.h
index 49eb7d7bed40..794478c546cd 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -60,6 +60,7 @@ void netns_map_init(void);
 void netns_nsid_socket_init(void);
 int print_nsid(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg);
+char *get_name_from_nsid(int nsid);
 int do_ipaddr(int argc, char **argv);
 int do_ipaddrlabel(int argc, char **argv);
 int do_iproute(int argc, char **argv);
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index c7c7e7df4e81..aee09c7ff6df 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -819,6 +819,9 @@ int print_linkinfo(const struct sockaddr_nl *who,
unsigned int m_flag = 0;
SPRINT_BUF(b1);
 
+   netns_nsid_socket_init();
+   netns_map_init();
+
if (n->nlmsg_type != RTM_NEWLINK && n->nlmsg_type != RTM_DELLINK)
return 0;
 
@@ -955,10 +958,16 @@ int print_linkinfo(const struct sockaddr_nl *who,
if (is_json_context()) {
print_int(PRINT_JSON, "link_netnsid", NULL, id);
} else {
-   if (id >= 0)
-   print_int(PRINT_FP, NULL,
- " link-netnsid %d", id);
-   else
+   if (id >= 0) {
+   char *name = get_name_from_nsid(id);
+
+   if (name)
+   print_string(PRINT_FP, NULL,
+" link-netns %s", name);
+   else
+   print_int(PRINT_FP, NULL,
+ " link-netnsid %d", id);
+   } else
print_string(PRINT_FP, NULL,
 " link-netnsid %s", "unknown");
}
@@ -966,8 +975,12 @@ int print_linkinfo(const struct sockaddr_nl *who,
 
if (tb[IFLA_NEW_NETNSID]) {
int id = rta_getattr_u32(tb[IFLA_NEW_NETNSID]);
+   char *name = get_name_from_nsid(id);
 
-   print_int(PRINT_FP, NULL, " new-nsid %d", id);
+   if (name)
+   print_string(PRINT_FP, NULL, " new-netns %s", name);
+   else
+   print_int(PRINT_FP, NULL, " new-netnsid %d", id);
}
if (tb[IFLA_NEW_IFINDEX]) {
int id = rta_getattr_u32(tb[IFLA_NEW_IFINDEX]);
diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index e06100f4ad2d..a4f5b02427e7 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -169,6 +169,16 @@ static struct nsid_cache *netns_map_get_by_nsid(int nsid)
return NULL;
 }
 
+char *get_name_from_nsid(int nsid)
+{
+   struct nsid_cache *c = netns_map_get_by_nsid(nsid);
+
+   if (c)
+   return c->name;
+
+   return NULL;
+}
+
 static int netns_map_add(int nsid, const char *name)
 {
struct nsid_cache *c;
-- 
2.15.1



[PATCH iproute2 0/2] display netns name instead of nsid

2018-06-04 Thread Nicolas Dichtel


[PATCH iproute2 0/2] display netns name instead of nsid
 
After these patches, the iproute2 name of netns is displayed instead of
the nsid. It's easier to read/understand.

 ip/ip_common.h |  3 +++
 ip/ipaddress.c | 23 ++-
 ip/iplink.c| 18 --
 ip/ipnetns.c   | 18 --
 4 files changed, 53 insertions(+), 9 deletions(-)

Comments are welcomed,
Regards,
Nicolas


[PATCH iproute2 2/2] iplink: enable to specify a name for the link-netns

2018-06-04 Thread Nicolas Dichtel
The 'link-netnsid' argument needs a number. Add 'link-netns' when the user
wants to use the iproute2 netns name instead of the nsid.

Example:
ip link add ipip1 link-netns foo type ipip remote 10.16.0.121 local 10.16.0.249

Signed-off-by: Nicolas Dichtel 
---
 ip/ip_common.h |  2 ++
 ip/iplink.c| 18 --
 ip/ipnetns.c   |  8 ++--
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/ip/ip_common.h b/ip/ip_common.h
index 794478c546cd..4d3227cbc389 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -61,6 +61,8 @@ void netns_nsid_socket_init(void);
 int print_nsid(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg);
 char *get_name_from_nsid(int nsid);
+int get_netnsid_from_name(const char *name);
+int set_netnsid_from_name(const char *name, int nsid);
 int do_ipaddr(int argc, char **argv);
 int do_ipaddrlabel(int argc, char **argv);
 int do_iproute(int argc, char **argv);
diff --git a/ip/iplink.c b/ip/iplink.c
index 9ff5f692a1d4..e4d4da96aedb 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -85,7 +85,7 @@ void iplink_usage(void)
" [ broadcast LLADDR ]\n"
" [ mtu MTU ]\n"
" [ netns { PID | NAME } ]\n"
-   " [ link-netnsid ID ]\n"
+   " [ link-netns NAME | link-netnsid ID 
]\n"
" [ alias NAME ]\n"
" [ vf NUM [ mac LLADDR ]\n"
"  [ vlan VLANID [ qos VLAN-QOS 
] [ proto VLAN-PROTO ] ]\n"
@@ -865,10 +865,24 @@ int iplink_parse(int argc, char **argv, struct iplink_req 
*req, char **type)
 IFLA_INET6_ADDR_GEN_MODE, mode);
addattr_nest_end(>n, afs6);
addattr_nest_end(>n, afs);
+   } else if (matches(*argv, "link-netns") == 0) {
+   NEXT_ARG();
+   if (link_netnsid != -1)
+   duparg("link-netns/link-netnsid", *argv);
+   link_netnsid = get_netnsid_from_name(*argv);
+   /* No nsid? Try to assign one. */
+   if (link_netnsid < 0)
+   set_netnsid_from_name(*argv, -1);
+   link_netnsid = get_netnsid_from_name(*argv);
+   if (link_netnsid < 0)
+   invarg("Invalid \"link-netns\" value\n",
+  *argv);
+   addattr32(>n, sizeof(*req), IFLA_LINK_NETNSID,
+ link_netnsid);
} else if (matches(*argv, "link-netnsid") == 0) {
NEXT_ARG();
if (link_netnsid != -1)
-   duparg("link-netnsid", *argv);
+   duparg("link-netns/link-netnsid", *argv);
if (get_integer(_netnsid, *argv, 0))
invarg("Invalid \"link-netnsid\" value\n",
   *argv);
diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index a4f5b02427e7..fb1daade366b 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -91,7 +91,7 @@ static int ipnetns_have_nsid(void)
return have_rtnl_getnsid;
 }
 
-static int get_netnsid_from_name(const char *name)
+int get_netnsid_from_name(const char *name)
 {
struct {
struct nlmsghdr n;
@@ -108,6 +108,8 @@ static int get_netnsid_from_name(const char *name)
struct rtgenmsg *rthdr;
int len, fd;
 
+   netns_nsid_socket_init();
+
fd = netns_get_fd(name);
if (fd < 0)
return fd;
@@ -701,7 +703,7 @@ out_delete:
return -1;
 }
 
-static int set_netnsid_from_name(const char *name, int nsid)
+int set_netnsid_from_name(const char *name, int nsid)
 {
struct {
struct nlmsghdr n;
@@ -715,6 +717,8 @@ static int set_netnsid_from_name(const char *name, int nsid)
};
int fd, err = 0;
 
+   netns_nsid_socket_init();
+
fd = netns_get_fd(name);
if (fd < 0)
return fd;
-- 
2.15.1



Re: [PATCH net v2 0/2] ip[6] tunnels: fix mtu calculations

2018-06-04 Thread Nicolas Dichtel
Le 01/06/2018 à 19:57, David Miller a écrit :
[snip]
> I think the 0xfff8 value might come from the requirement that ipv6
> fragments need to be a multiple of 8 bytes long.
> 
Oh, thanks for the explanation!


Re: [PATCH iproute2] ip: IFLA_NEW_NETNSID/IFLA_NEW_IFINDEX support

2018-06-01 Thread Nicolas Dichtel
Le 31/05/2018 à 17:51, Nicolas Dichtel a écrit :
> Le 31/05/2018 à 17:46, Stephen Hemminger a écrit :
>> On Thu, 31 May 2018 16:28:48 +0200
> [snip]
>> This makes sense. All of linkinfo that is present should be displayed.
>>
>> Both netns and ifindex are really unsigned values. Use __u32 and print_uint.
> Ok.
I replied a bit quickly, both are signed values:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/core/rtnetlink.c#n1621


Regards,
Nicolas


Re: [PATCH iproute2] ip: IFLA_NEW_NETNSID/IFLA_NEW_IFINDEX support

2018-05-31 Thread Nicolas Dichtel
Le 31/05/2018 à 17:46, Stephen Hemminger a écrit :
> On Thu, 31 May 2018 16:28:48 +0200
[snip]
> This makes sense. All of linkinfo that is present should be displayed.
> 
> Both netns and ifindex are really unsigned values. Use __u32 and print_uint.
Ok.

> Also why not convert numeric values to names?
The only case where the ifname can change is when a netns is deleted and the
interface is put back in init_net. But at this stage, we don't know the new 
name.

For the nsid, you're right, it will be better to display the netns name. If you
agree, I will do this in a following patch, thus all places using nsid can be
converted at the same time.


Regards,
Nicolas


Re: [PATCH iproute2] ip: IFLA_NEW_NETNSID/IFLA_NEW_IFINDEX support

2018-05-31 Thread Nicolas Dichtel
Oops, I use an old email for Stephen, sorry.


Regards,
Nicolas

Le 31/05/2018 à 16:28, Nicolas Dichtel a écrit :
> Parse and display those attributes.
> Example:
> ip l a type dummy
> ip netns add foo
> ip monitor link&
> ip l s dummy1 netns foo
> Deleted 6: dummy1:  mtu 1500 qdisc noop state DOWN group 
> default
> link/ether 66:af:3a:3f:a0:89 brd ff:ff:ff:ff:ff:ff new-nsid 0 new-ifindex 
> 6
> 
> Signed-off-by: Nicolas Dichtel 
> ---
>  ip/ipaddress.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> index 00da14c6f97c..c7c7e7df4e81 100644
> --- a/ip/ipaddress.c
> +++ b/ip/ipaddress.c
> @@ -964,6 +964,17 @@ int print_linkinfo(const struct sockaddr_nl *who,
>   }
>   }
>  
> + if (tb[IFLA_NEW_NETNSID]) {
> + int id = rta_getattr_u32(tb[IFLA_NEW_NETNSID]);
> +
> + print_int(PRINT_FP, NULL, " new-nsid %d", id);
> + }
> + if (tb[IFLA_NEW_IFINDEX]) {
> + int id = rta_getattr_u32(tb[IFLA_NEW_IFINDEX]);
> +
> + print_int(PRINT_FP, NULL, " new-ifindex %d", id);
> + }
> +
>   if (tb[IFLA_PROTO_DOWN]) {
>   if (rta_getattr_u8(tb[IFLA_PROTO_DOWN]))
>   print_bool(PRINT_ANY,
> 


[PATCH iproute2] ip: IFLA_NEW_NETNSID/IFLA_NEW_IFINDEX support

2018-05-31 Thread Nicolas Dichtel
Parse and display those attributes.
Example:
ip l a type dummy
ip netns add foo
ip monitor link&
ip l s dummy1 netns foo
Deleted 6: dummy1:  mtu 1500 qdisc noop state DOWN group 
default
link/ether 66:af:3a:3f:a0:89 brd ff:ff:ff:ff:ff:ff new-nsid 0 new-ifindex 6

Signed-off-by: Nicolas Dichtel 
---
 ip/ipaddress.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 00da14c6f97c..c7c7e7df4e81 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -964,6 +964,17 @@ int print_linkinfo(const struct sockaddr_nl *who,
}
}
 
+   if (tb[IFLA_NEW_NETNSID]) {
+   int id = rta_getattr_u32(tb[IFLA_NEW_NETNSID]);
+
+   print_int(PRINT_FP, NULL, " new-nsid %d", id);
+   }
+   if (tb[IFLA_NEW_IFINDEX]) {
+   int id = rta_getattr_u32(tb[IFLA_NEW_IFINDEX]);
+
+   print_int(PRINT_FP, NULL, " new-ifindex %d", id);
+   }
+
if (tb[IFLA_PROTO_DOWN]) {
if (rta_getattr_u8(tb[IFLA_PROTO_DOWN]))
print_bool(PRINT_ANY,
-- 
2.15.1



[PATCH net v2 2/2] ip6_tunnel: remove magic mtu value 0xFFF8

2018-05-31 Thread Nicolas Dichtel
I don't know where this value comes from (probably a copy and paste and
paste and paste ...).
Let's use standard values which are a bit greater.

Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Signed-off-by: Nicolas Dichtel 
---
 net/ipv6/ip6_tunnel.c | 11 ---
 net/ipv6/sit.c|  5 +++--
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index da66aaac51ce..00e138a44cbb 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1692,8 +1692,13 @@ int ip6_tnl_change_mtu(struct net_device *dev, int 
new_mtu)
if (new_mtu < ETH_MIN_MTU)
return -EINVAL;
}
-   if (new_mtu > 0xFFF8 - dev->hard_header_len)
-   return -EINVAL;
+   if (tnl->parms.proto == IPPROTO_IPV6 || tnl->parms.proto == 0) {
+   if (new_mtu > IP6_MAX_MTU - dev->hard_header_len)
+   return -EINVAL;
+   } else {
+   if (new_mtu > IP_MAX_MTU - dev->hard_header_len)
+   return -EINVAL;
+   }
dev->mtu = new_mtu;
return 0;
 }
@@ -1841,7 +1846,7 @@ ip6_tnl_dev_init_gen(struct net_device *dev)
if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
dev->mtu -= 8;
dev->min_mtu = ETH_MIN_MTU;
-   dev->max_mtu = 0xFFF8 - dev->hard_header_len;
+   dev->max_mtu = IP6_MAX_MTU - dev->hard_header_len;
 
return 0;
 
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 2afce37a7177..e9400ffa7875 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1371,7 +1371,7 @@ static void ipip6_tunnel_setup(struct net_device *dev)
dev->hard_header_len= LL_MAX_HEADER + t_hlen;
dev->mtu= ETH_DATA_LEN - t_hlen;
dev->min_mtu= IPV6_MIN_MTU;
-   dev->max_mtu= 0xFFF8 - t_hlen;
+   dev->max_mtu= IP6_MAX_MTU - t_hlen;
dev->flags  = IFF_NOARP;
netif_keep_dst(dev);
dev->addr_len   = 4;
@@ -1583,7 +1583,8 @@ static int ipip6_newlink(struct net *src_net, struct 
net_device *dev,
if (tb[IFLA_MTU]) {
u32 mtu = nla_get_u32(tb[IFLA_MTU]);
 
-   if (mtu >= IPV6_MIN_MTU && mtu <= 0xFFF8 - dev->hard_header_len)
+   if (mtu >= IPV6_MIN_MTU &&
+   mtu <= IP6_MAX_MTU - dev->hard_header_len)
dev->mtu = mtu;
}
 
-- 
2.15.1



[PATCH net v2 0/2] ip[6] tunnels: fix mtu calculations

2018-05-31 Thread Nicolas Dichtel


The first patch restores the possibility to bind an ip4 tunnel to an
interface whith a large mtu.
The second patch was spotted after the first fix. I also target it to net
because it fixes the max mtu value that can be used for ipv6 tunnels.

v2: remove the 0xfff8 in ip_tunnel_newlink()

 net/ipv4/ip_tunnel.c  |  8 
 net/ipv6/ip6_tunnel.c | 11 ---
 net/ipv6/sit.c|  5 +++--
 3 files changed, 15 insertions(+), 9 deletions(-)

Comments are welcomed,
Regards,
Nicolas



[PATCH net v2 1/2] ip_tunnel: restore binding to ifaces with a large mtu

2018-05-31 Thread Nicolas Dichtel
After commit f6cc9c054e77, the following conf is broken (note that the
default loopback mtu is 65536, ie IP_MAX_MTU + 1):

$ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev lo
add tunnel "gre0" failed: Invalid argument
$ ip l a type dummy
$ ip l s dummy1 up
$ ip l s dummy1 mtu 65535
$ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev dummy1
add tunnel "gre0" failed: Invalid argument

dev_set_mtu() doesn't allow to set a mtu which is too large.
First, let's cap the mtu returned by ip_tunnel_bind_dev(). Second, remove
the magic value 0xFFF8 and use IP_MAX_MTU instead.
0xFFF8 seems to be there for ages, I don't know why this value was used.

With a recent kernel, it's also possible to set a mtu > IP_MAX_MTU:
$ ip l s dummy1 mtu 66000
After that patch, it's also possible to bind an ip tunnel on that kind of
interface.

CC: Petr Machata 
CC: Ido Schimmel 
Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Fixes: f6cc9c054e77 ("ip_tunnel: Emit events for post-register MTU changes")
Signed-off-by: Nicolas Dichtel 
Reviewed-by: Ido Schimmel 
---
 net/ipv4/ip_tunnel.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 6b0e362cc99b..38d906baf1df 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -328,7 +328,7 @@ static int ip_tunnel_bind_dev(struct net_device *dev)
 
if (tdev) {
hlen = tdev->hard_header_len + tdev->needed_headroom;
-   mtu = tdev->mtu;
+   mtu = min(tdev->mtu, IP_MAX_MTU);
}
 
dev->needed_headroom = t_hlen + hlen;
@@ -362,7 +362,7 @@ static struct ip_tunnel *ip_tunnel_create(struct net *net,
nt = netdev_priv(dev);
t_hlen = nt->hlen + sizeof(struct iphdr);
dev->min_mtu = ETH_MIN_MTU;
-   dev->max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen;
+   dev->max_mtu = IP_MAX_MTU - dev->hard_header_len - t_hlen;
ip_tunnel_add(itn, nt);
return nt;
 
@@ -930,7 +930,7 @@ int __ip_tunnel_change_mtu(struct net_device *dev, int 
new_mtu, bool strict)
 {
struct ip_tunnel *tunnel = netdev_priv(dev);
int t_hlen = tunnel->hlen + sizeof(struct iphdr);
-   int max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen;
+   int max_mtu = IP_MAX_MTU - dev->hard_header_len - t_hlen;
 
if (new_mtu < ETH_MIN_MTU)
return -EINVAL;
@@ -1107,7 +1107,7 @@ int ip_tunnel_newlink(struct net_device *dev, struct 
nlattr *tb[],
 
mtu = ip_tunnel_bind_dev(dev);
if (tb[IFLA_MTU]) {
-   unsigned int max = 0xfff8 - dev->hard_header_len - nt->hlen;
+   unsigned int max = IP_MAX_MTU - dev->hard_header_len - nt->hlen;
 
mtu = clamp(dev->mtu, (unsigned int)ETH_MIN_MTU,
(unsigned int)(max - sizeof(struct iphdr)));
-- 
2.15.1



Re: [PATCH net 1/2] ip_tunnel: restore binding to ifaces with a large mtu

2018-05-31 Thread Nicolas Dichtel
Le 30/05/2018 à 22:29, Ido Schimmel a écrit :
[snip]
> There is another instance of this magic number in the file, but it's
> written in lower case so you might have missed it - see
> ip_tunnel_newlink(). Can you please take care of it in v2?
Good catch, thank you.
Will send a v2.


[PATCH net 1/2] ip_tunnel: restore binding to ifaces with a large mtu

2018-05-30 Thread Nicolas Dichtel
After commit f6cc9c054e77, the following conf is broken (note that the
default loopback mtu is 65536, ie IP_MAX_MTU + 1):

$ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev lo
add tunnel "gre0" failed: Invalid argument
$ ip l a type dummy
$ ip l s dummy1 up
$ ip l s dummy1 mtu 65535
$ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev dummy1
add tunnel "gre0" failed: Invalid argument

dev_set_mtu() doesn't allow to set a mtu which is too large.
First, let's cap the mtu returned by ip_tunnel_bind_dev(). Second, remove
the magic value 0xFFF8 and use IP_MAX_MTU instead.
0xFFF8 seems to be there for ages, I don't know why this value was used.

With a recent kernel, it's also possible to set a mtu > IP_MAX_MTU:
$ ip l s dummy1 mtu 66000
After that patch, it's also possible to bind an ip tunnel on that kind of
interface.

CC: Petr Machata 
CC: Ido Schimmel 
Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Fixes: f6cc9c054e77 ("ip_tunnel: Emit events for post-register MTU changes")
Signed-off-by: Nicolas Dichtel 
---
 net/ipv4/ip_tunnel.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 6b0e362cc99b..3b39c72a1029 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -328,7 +328,7 @@ static int ip_tunnel_bind_dev(struct net_device *dev)
 
if (tdev) {
hlen = tdev->hard_header_len + tdev->needed_headroom;
-   mtu = tdev->mtu;
+   mtu = min(tdev->mtu, IP_MAX_MTU);
}
 
dev->needed_headroom = t_hlen + hlen;
@@ -362,7 +362,7 @@ static struct ip_tunnel *ip_tunnel_create(struct net *net,
nt = netdev_priv(dev);
t_hlen = nt->hlen + sizeof(struct iphdr);
dev->min_mtu = ETH_MIN_MTU;
-   dev->max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen;
+   dev->max_mtu = IP_MAX_MTU - dev->hard_header_len - t_hlen;
ip_tunnel_add(itn, nt);
return nt;
 
@@ -930,7 +930,7 @@ int __ip_tunnel_change_mtu(struct net_device *dev, int 
new_mtu, bool strict)
 {
struct ip_tunnel *tunnel = netdev_priv(dev);
int t_hlen = tunnel->hlen + sizeof(struct iphdr);
-   int max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen;
+   int max_mtu = IP_MAX_MTU - dev->hard_header_len - t_hlen;
 
if (new_mtu < ETH_MIN_MTU)
return -EINVAL;
-- 
2.15.1



[PATCH net 2/2] ip6_tunnel: remove magic mtu value 0xFFF8

2018-05-30 Thread Nicolas Dichtel
I don't know where this value comes from (probably a copy and paste and
paste and paste ...).
Let's use standard values which are a bit greater.

Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Signed-off-by: Nicolas Dichtel 
---
 net/ipv6/ip6_tunnel.c | 11 ---
 net/ipv6/sit.c|  5 +++--
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index da66aaac51ce..00e138a44cbb 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1692,8 +1692,13 @@ int ip6_tnl_change_mtu(struct net_device *dev, int 
new_mtu)
if (new_mtu < ETH_MIN_MTU)
return -EINVAL;
}
-   if (new_mtu > 0xFFF8 - dev->hard_header_len)
-   return -EINVAL;
+   if (tnl->parms.proto == IPPROTO_IPV6 || tnl->parms.proto == 0) {
+   if (new_mtu > IP6_MAX_MTU - dev->hard_header_len)
+   return -EINVAL;
+   } else {
+   if (new_mtu > IP_MAX_MTU - dev->hard_header_len)
+   return -EINVAL;
+   }
dev->mtu = new_mtu;
return 0;
 }
@@ -1841,7 +1846,7 @@ ip6_tnl_dev_init_gen(struct net_device *dev)
if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
dev->mtu -= 8;
dev->min_mtu = ETH_MIN_MTU;
-   dev->max_mtu = 0xFFF8 - dev->hard_header_len;
+   dev->max_mtu = IP6_MAX_MTU - dev->hard_header_len;
 
return 0;
 
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 2afce37a7177..e9400ffa7875 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1371,7 +1371,7 @@ static void ipip6_tunnel_setup(struct net_device *dev)
dev->hard_header_len= LL_MAX_HEADER + t_hlen;
dev->mtu= ETH_DATA_LEN - t_hlen;
dev->min_mtu= IPV6_MIN_MTU;
-   dev->max_mtu= 0xFFF8 - t_hlen;
+   dev->max_mtu= IP6_MAX_MTU - t_hlen;
dev->flags  = IFF_NOARP;
netif_keep_dst(dev);
dev->addr_len   = 4;
@@ -1583,7 +1583,8 @@ static int ipip6_newlink(struct net *src_net, struct 
net_device *dev,
if (tb[IFLA_MTU]) {
u32 mtu = nla_get_u32(tb[IFLA_MTU]);
 
-   if (mtu >= IPV6_MIN_MTU && mtu <= 0xFFF8 - dev->hard_header_len)
+   if (mtu >= IPV6_MIN_MTU &&
+   mtu <= IP6_MAX_MTU - dev->hard_header_len)
dev->mtu = mtu;
}
 
-- 
2.15.1



[PATCH net 0/2] ip[6] tunnels: fix mtu calculations

2018-05-30 Thread Nicolas Dichtel


The first patch restores the possibility to bind an ip4 tunnel to an
interface whith a large mtu.
The second patch was spotted after the first fix. I also target it to net
because it fixes the max mtu value that can be used for ipv6 tunnels.

 net/ipv4/ip_tunnel.c  |  6 +++---
 net/ipv6/ip6_tunnel.c | 11 ---
 net/ipv6/sit.c|  5 +++--
 3 files changed, 14 insertions(+), 8 deletions(-)

Comments are welcomed,
Regards,
Nicolas


Re: [PATCH net-next] fib_rules: rename FRA_PROTOCOL to FRA_ORIGINATOR

2018-03-21 Thread Nicolas Dichtel
Le 21/03/2018 à 16:15, David Miller a écrit :
> From: David Ahern 
> Date: Wed, 21 Mar 2018 09:00:09 -0600
> 
>> The rule->proto value is not used as a selector. It is passed in, stored
>> on a rule and returned to userspace. It is book keeping only so an admin
>> has some idea of which program installed the rule.
> 
> This is how I understand this value to work as well.
> 
Me too, and it's the reason why I propose to rename it, to make this clear.
'originator' is a lot more explicit than 'proto'.


Re: [PATCH net-next] fib_rules: rename FRA_PROTOCOL to FRA_ORIGINATOR

2018-03-21 Thread Nicolas Dichtel
Le 20/03/2018 à 18:27, David Ahern a écrit :
> On 3/20/18 11:04 AM, Nicolas Dichtel wrote:
>> As the comment said, this attribute defines the originator of the rule,
>> it's not really a (network) protocol.
>> Let's rename it accordingly to avoid confusion (difference between
>> FRA_PROTOCOL and FRA_IP_PROTO was not obvious).
>>
>> CC: Donald Sharp <sha...@cumulusnetworks.com>
>> Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
>> ---
>>
>> FRA_PROTOCOL exists only in net-next for now, thus it's still possible to
>> rename it.
>>
>>  drivers/net/vrf.c  |  4 ++--
>>  include/net/fib_rules.h|  4 ++--
>>  include/uapi/linux/fib_rules.h |  2 +-
>>  net/core/fib_rules.c   | 14 +++---
>>  4 files changed, 12 insertions(+), 12 deletions(-)
>>
>>
> 
> This protocol is meant to be analogous to rtm_protocol. Changing the
> name to FRA_ORIGINATOR loses that connection.
I understand your concerns. But I think the connection still exists after the
patch because the values used for this field are RTPROT_*
rtm_protocol is here from ages and we cannot change that. Moreover, FRA_*
attributes are usually used as a selector or a target, which is not the case for
a route. Thus I think it's important to carrefully choose the name.


Regards,
Nicolas


[PATCH net-next] fib_rules: rename FRA_PROTOCOL to FRA_ORIGINATOR

2018-03-20 Thread Nicolas Dichtel
As the comment said, this attribute defines the originator of the rule,
it's not really a (network) protocol.
Let's rename it accordingly to avoid confusion (difference between
FRA_PROTOCOL and FRA_IP_PROTO was not obvious).

CC: Donald Sharp <sha...@cumulusnetworks.com>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---

FRA_PROTOCOL exists only in net-next for now, thus it's still possible to
rename it.

 drivers/net/vrf.c  |  4 ++--
 include/net/fib_rules.h|  4 ++--
 include/uapi/linux/fib_rules.h |  2 +-
 net/core/fib_rules.c   | 14 +++---
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index c6be49d3a9eb..8f559b74404d 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -1146,7 +1146,7 @@ static inline size_t vrf_fib_rule_nl_size(void)
sz  = NLMSG_ALIGN(sizeof(struct fib_rule_hdr));
sz += nla_total_size(sizeof(u8));   /* FRA_L3MDEV */
sz += nla_total_size(sizeof(u32));  /* FRA_PRIORITY */
-   sz += nla_total_size(sizeof(u8));   /* FRA_PROTOCOL */
+   sz += nla_total_size(sizeof(u8));   /* FRA_ORIGINATOR */
 
return sz;
 }
@@ -1177,7 +1177,7 @@ static int vrf_fib_rule(const struct net_device *dev, 
__u8 family, bool add_it)
frh->family = family;
frh->action = FR_ACT_TO_TBL;
 
-   if (nla_put_u8(skb, FRA_PROTOCOL, RTPROT_KERNEL))
+   if (nla_put_u8(skb, FRA_ORIGINATOR, RTPROT_KERNEL))
goto nla_put_failure;
 
if (nla_put_u8(skb, FRA_L3MDEV, 1))
diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index e5cfcfc7dd93..8ff89beea845 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -26,7 +26,7 @@ struct fib_rule {
u32 table;
u8  action;
u8  l3mdev;
-   u8  proto;
+   u8  originator;
u8  ip_proto;
u32 target;
__be64  tun_id;
@@ -113,7 +113,7 @@ struct fib_rule_notifier_info {
[FRA_GOTO]  = { .type = NLA_U32 }, \
[FRA_L3MDEV]= { .type = NLA_U8 }, \
[FRA_UID_RANGE] = { .len = sizeof(struct fib_rule_uid_range) }, \
-   [FRA_PROTOCOL]  = { .type = NLA_U8 }, \
+   [FRA_ORIGINATOR] = { .type = NLA_U8 }, \
[FRA_IP_PROTO]  = { .type = NLA_U8 }, \
[FRA_SPORT_RANGE] = { .len = sizeof(struct fib_rule_port_range) }, \
[FRA_DPORT_RANGE] = { .len = sizeof(struct fib_rule_port_range) }
diff --git a/include/uapi/linux/fib_rules.h b/include/uapi/linux/fib_rules.h
index 232df14e1287..6bb950a9144b 100644
--- a/include/uapi/linux/fib_rules.h
+++ b/include/uapi/linux/fib_rules.h
@@ -63,7 +63,7 @@ enum {
FRA_PAD,
FRA_L3MDEV, /* iif or oif is l3mdev goto its table */
FRA_UID_RANGE,  /* UID range */
-   FRA_PROTOCOL,   /* Originator of the rule */
+   FRA_ORIGINATOR, /* Originator of the rule */
FRA_IP_PROTO,   /* ip proto */
FRA_SPORT_RANGE, /* sport */
FRA_DPORT_RANGE, /* dport */
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index f6f04fc0f629..c81dc0600dc4 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -55,7 +55,7 @@ int fib_default_rule_add(struct fib_rules_ops *ops,
r->pref = pref;
r->table = table;
r->flags = flags;
-   r->proto = RTPROT_KERNEL;
+   r->originator = RTPROT_KERNEL;
r->fr_net = ops->fro_net;
r->uid_range = fib_kuid_range_unset;
 
@@ -505,8 +505,8 @@ int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr 
*nlh,
rule->pref = tb[FRA_PRIORITY] ? nla_get_u32(tb[FRA_PRIORITY])
  : fib_default_rule_pref(ops);
 
-   rule->proto = tb[FRA_PROTOCOL] ?
-   nla_get_u8(tb[FRA_PROTOCOL]) : RTPROT_UNSPEC;
+   rule->originator = tb[FRA_ORIGINATOR] ?
+   nla_get_u8(tb[FRA_ORIGINATOR]) : RTPROT_UNSPEC;
 
if (tb[FRA_IIFNAME]) {
struct net_device *dev;
@@ -736,8 +736,8 @@ int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr 
*nlh,
}
 
list_for_each_entry(rule, >rules_list, list) {
-   if (tb[FRA_PROTOCOL] &&
-   (rule->proto != nla_get_u8(tb[FRA_PROTOCOL])))
+   if (tb[FRA_ORIGINATOR] &&
+   (rule->originator != nla_get_u8(tb[FRA_ORIGINATOR])))
continue;
 
if (frh->action && (frh->action != rule->action))
@@ -870,7 +870,7 @@ static inline size_t fib_rule_nlmsg_size(struct 
fib_rules_ops *ops,
 + nla_total_size(4) /* FRA_FWMASK */
 + nla_total_size_64bit(8) /* FRA_TUN_ID */
 + nla_total_siz

[PATCH net] netlink: avoid a double skb free in genlmsg_mcast()

2018-03-14 Thread Nicolas Dichtel
nlmsg_multicast() consumes always the skb, thus the original skb must be
freed only when this function is called with a clone.

Fixes: cb9f7a9a5c96 ("netlink: ensure to loop over all netns in 
genlmsg_multicast_allns()")
Reported-by: Ben Hutchings <ben.hutchi...@codethink.co.uk>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 net/netlink/genetlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 6f02499ef007..b9ce82c9440f 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -1106,7 +1106,7 @@ static int genlmsg_mcast(struct sk_buff *skb, u32 portid, 
unsigned long group,
if (!err)
delivered = true;
else if (err != -ESRCH)
-   goto error;
+   return err;
return delivered ? 0 : -ESRCH;
  error:
kfree_skb(skb);
-- 
2.15.1



Re: [PATCH RFC net-next 5/7] net/ipv6: Add support for path selection using hash of 5-tuple

2018-02-13 Thread Nicolas Dichtel
Le 13/02/2018 à 22:02, David Ahern a écrit :
> On 2/13/18 1:59 PM, Nicolas Dichtel wrote:
>> Le 13/02/2018 à 21:35, David Ahern a écrit :
>>> On 2/13/18 1:31 PM, Nicolas Dichtel wrote:
>>>> Le 13/02/2018 à 01:06, David Ahern a écrit :
>>>>> Some operators prefer IPv6 path selection to use a standard 5-tuple
>>>>> hash rather than just an L3 hash with the flow the label. To that end
>>>>> add support to IPv6 for multipath hash policy similar to bf4e0a3db97eb
>>>>> ("net: ipv4: add support for ECMP hash policy choice"). The default
>>>>> is still L3 which covers source and destination addresses along with
>>>>> flow label and IPv6 protocol.
>>>>>
>>>>> Signed-off-by: David Ahern <dsah...@gmail.com>
>>>>> ---
>>>> Do you plan to add the nl NETCONF support for this new sysctl?
>>>>
>>>
>>> hmmm doesn't exist for the current IPv4 version, but in-kernel
>>> drivers are notified. I wasn't planning on it, but it can be added.
>>>
>> Right for ipv4, it was on my todo list ;-)
>>
> 
> Awesome, then ipv6 is a 1-liner after you add ipv4 support. :-)
> 
Sure :)


Re: [PATCH RFC net-next 5/7] net/ipv6: Add support for path selection using hash of 5-tuple

2018-02-13 Thread Nicolas Dichtel
Le 13/02/2018 à 21:35, David Ahern a écrit :
> On 2/13/18 1:31 PM, Nicolas Dichtel wrote:
>> Le 13/02/2018 à 01:06, David Ahern a écrit :
>>> Some operators prefer IPv6 path selection to use a standard 5-tuple
>>> hash rather than just an L3 hash with the flow the label. To that end
>>> add support to IPv6 for multipath hash policy similar to bf4e0a3db97eb
>>> ("net: ipv4: add support for ECMP hash policy choice"). The default
>>> is still L3 which covers source and destination addresses along with
>>> flow label and IPv6 protocol.
>>>
>>> Signed-off-by: David Ahern <dsah...@gmail.com>
>>> ---
>> Do you plan to add the nl NETCONF support for this new sysctl?
>>
> 
> hmmm doesn't exist for the current IPv4 version, but in-kernel
> drivers are notified. I wasn't planning on it, but it can be added.
> 
Right for ipv4, it was on my todo list ;-)


Re: [PATCH RFC net-next 5/7] net/ipv6: Add support for path selection using hash of 5-tuple

2018-02-13 Thread Nicolas Dichtel
Le 13/02/2018 à 01:06, David Ahern a écrit :
> Some operators prefer IPv6 path selection to use a standard 5-tuple
> hash rather than just an L3 hash with the flow the label. To that end
> add support to IPv6 for multipath hash policy similar to bf4e0a3db97eb
> ("net: ipv4: add support for ECMP hash policy choice"). The default
> is still L3 which covers source and destination addresses along with
> flow label and IPv6 protocol.
> 
> Signed-off-by: David Ahern 
> ---
Do you plan to add the nl NETCONF support for this new sysctl?


Regards,
Nicolas


[PATCH net] netlink: ensure to loop over all netns in genlmsg_multicast_allns()

2018-02-06 Thread Nicolas Dichtel
Nowadays, nlmsg_multicast() returns only 0 or -ESRCH but this was not the
case when commit 134e63756d5f was pushed.
However, there was no reason to stop the loop if a netns does not have
listeners.
Returns -ESRCH only if there was no listeners in all netns.

To avoid having the same problem in the future, I didn't take the
assumption that nlmsg_multicast() returns only 0 or -ESRCH.

Fixes: 134e63756d5f ("genetlink: make netns aware")
CC: Johannes Berg <johannes.b...@intel.com>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 net/netlink/genetlink.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index d444daf1ac04..6f02499ef007 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -1081,6 +1081,7 @@ static int genlmsg_mcast(struct sk_buff *skb, u32 portid, 
unsigned long group,
 {
struct sk_buff *tmp;
struct net *net, *prev = NULL;
+   bool delivered = false;
int err;
 
for_each_net_rcu(net) {
@@ -1092,14 +1093,21 @@ static int genlmsg_mcast(struct sk_buff *skb, u32 
portid, unsigned long group,
}
err = nlmsg_multicast(prev->genl_sock, tmp,
  portid, group, flags);
-   if (err)
+   if (!err)
+   delivered = true;
+   else if (err != -ESRCH)
goto error;
}
 
prev = net;
}
 
-   return nlmsg_multicast(prev->genl_sock, skb, portid, group, flags);
+   err = nlmsg_multicast(prev->genl_sock, skb, portid, group, flags);
+   if (!err)
+   delivered = true;
+   else if (err != -ESRCH)
+   goto error;
+   return delivered ? 0 : -ESRCH;
  error:
kfree_skb(skb);
return err;
-- 
2.15.1



Re: [PATCH net-next 0/3 V1] rtnetlink: enable IFLA_IF_NETNSID for RTM_{DEL,SET}LINK

2018-01-26 Thread Nicolas Dichtel
Le 26/01/2018 à 09:36, Jiri Benc a écrit :
> On Fri, 26 Jan 2018 00:34:51 +0100, Nicolas Dichtel wrote:
>> Why meaningful? The user knows that the answer is like if if was done in 
>> another
>> netns. It enables to have only one netlink socket instead of one per netns. 
>> But
>> the code using it will be the same.  
> 
> Because you can't use it to query the linked interface. You can't even
> use it as an opaque value to track interfaces (netnsid+ifindex) because
> netnsids are not unique across net name spaces. You can easily have two
> interfaces that have all the ifindex, ifname, netnsid (and basically
> everything else) identical but being completely different interfaces.
Yes, the user have to map those info correctly. And this complexifies the (user)
code a lot.

> That's really not helpful.
> 
>> I fear that with your approach, it will results to a lot of complexity in the
>> kernel.  
> 
> The complexity is (at least partly) already there. It's an inevitable
> result of the design decision to have relative identifiers.
Yes, you're right. My approach moves the complexity to the user, which make this
feature hard to use.

> 
> I agree that we should think about how to make this easy to implement.
> I like your idea of doing this somehow generically. Perhaps it's
> possible to do while keeping the netnsids valid in the caller's netns?
Yes. I agree that it will be a lot easier to use if the conversion is done in
the kernel. And having a generic mechanism will also help a lot to use it.

> 
>> What is really missing for me, is a way to get a fd from an nsid. The user
>> should be able to call RTM_GETNSID with an fd and a nsid and the kernel 
>> performs
>> the needed operations so that the fd points to the corresponding netns.  
> 
> That's what I was missing, too. I even looked into implementing it. But
> opening a fd on behalf of the process and returning it over netlink is a
> wrong thing to do. Netlink messages can get lost. Then you have a fd
> leak you can do nothing about.
Yes, I also looked at this ;-)

> 
> Given that we have netnsids used for so much stuff already (like
> NETLINK_LISTEN_ALL_NSID) you need to track them anyway. And if you need
> to track them, why bother with another identifier? It would be better
> if netnsid can be used universally for anything. Then there will be no
> need for the conversion.
I like this idea a lot. So the missing part is a setns() using the nsid ;-)


Regards,
Nicolas


Re: [PATCH net-next 0/3 V1] rtnetlink: enable IFLA_IF_NETNSID for RTM_{DEL,SET}LINK

2018-01-25 Thread Nicolas Dichtel
Le 25/01/2018 à 23:30, Jiri Benc a écrit :
> On Thu, 25 Jan 2018 15:20:59 +0100, Nicolas Dichtel wrote:
>> Hmm, I don't agree. For me, it would be the correct answer. If user has a 
>> socket
>> in ns_a and targets a RTM_GETLINK in ns_b, the answer he gets should be like 
>> if
>> it was done in ns_b.
> 
> But that information would be useless for the caller. Why return a
> value that has no meaning for the caller and can not be used? More so
> when the kernel is aware of what the correct meaningful value is?
Why meaningful? The user knows that the answer is like if if was done in another
netns. It enables to have only one netlink socket instead of one per netns. But
the code using it will be the same.
I fear that with your approach, it will results to a lot of complexity in the
kernel.

> 
>> This is already the case with messages received with NETLINK_LISTEN_ALL_NSID,
>> there is no reason to do something different.
> 
> NETLINK_LISTEN_ALL_NSID is tough due to way it is implemented. But yes,
> it should translate the netnsids to be valid in the socket's netns.
> That's the only sane way for the listener.
A listener that uses this option should know the details about each netns it
listens. Thus, he has no problem to interpret the answer.

What is really missing for me, is a way to get a fd from an nsid. The user
should be able to call RTM_GETNSID with an fd and a nsid and the kernel performs
the needed operations so that the fd points to the corresponding netns.

Nicolas


Re: [PATCH net] net: don't call update_pmtu unconditionally

2018-01-25 Thread Nicolas Dichtel
Le 25/01/2018 à 22:28, David Miller a écrit :
> From: Nicolas Dichtel <nicolas.dich...@6wind.com>
> Date: Thu, 25 Jan 2018 19:03:03 +0100
> 
>> Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to:
>> "BUG: unable to handle kernel NULL pointer dereference at   (null)"
>>
>> Let's add a helper to check if update_pmtu is available before calling it.
>>
>> Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path")
>> Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path")
>> CC: Roman Kapl <c...@rkapl.cz>
>> CC: Xin Long <lucien@gmail.com>
>> Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
> 
> Applied with the fixup suggested by David Ahern and queued up for -stable,
Thank you for taking care of that.


Regards,
Nicolas


[PATCH net] net: don't call update_pmtu unconditionally

2018-01-25 Thread Nicolas Dichtel
Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to:
"BUG: unable to handle kernel NULL pointer dereference at   (null)"

Let's add a helper to check if update_pmtu is available before calling it.

Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path")
Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path")
CC: Roman Kapl <c...@rkapl.cz>
CC: Xin Long <lucien@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 drivers/infiniband/ulp/ipoib/ipoib_cm.c | 3 +--
 drivers/net/geneve.c| 4 ++--
 drivers/net/vxlan.c | 6 ++
 include/net/dst.h   | 8 
 net/ipv4/ip_tunnel.c| 3 +--
 net/ipv4/ip_vti.c   | 2 +-
 net/ipv6/ip6_tunnel.c   | 6 ++
 net/ipv6/ip6_vti.c  | 2 +-
 net/ipv6/sit.c  | 4 ++--
 9 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 2c13123bfd69..71ea9e2c 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1456,8 +1456,7 @@ void ipoib_cm_skb_too_long(struct net_device *dev, struct 
sk_buff *skb,
struct ipoib_dev_priv *priv = ipoib_priv(dev);
int e = skb_queue_empty(>cm.skb_queue);
 
-   if (skb_dst(skb))
-   skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+   skb_dst_update_pmtu(skb, mtu);
 
skb_queue_tail(>cm.skb_queue, skb);
if (e)
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 0a48b3073d3d..64fda2e1040e 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -829,7 +829,7 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct 
net_device *dev,
int mtu = dst_mtu(>dst) - sizeof(struct iphdr) -
  GENEVE_BASE_HLEN - info->options_len - 14;
 
-   skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+   skb_dst_update_pmtu(skb, mtu);
}
 
sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
@@ -875,7 +875,7 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct 
net_device *dev,
int mtu = dst_mtu(dst) - sizeof(struct ipv6hdr) -
  GENEVE_BASE_HLEN - info->options_len - 14;
 
-   skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+   skb_dst_update_pmtu(skb, mtu);
}
 
sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 31f4b7911ef8..c3e34e3c82a7 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2158,8 +2158,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
if (skb_dst(skb)) {
int mtu = dst_mtu(ndst) - VXLAN_HEADROOM;
 
-   skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL,
-  skb, mtu);
+   skb_dst_update_pmtu(skb, mtu);
}
 
tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
@@ -2200,8 +2199,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
if (skb_dst(skb)) {
int mtu = dst_mtu(ndst) - VXLAN6_HEADROOM;
 
-   skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL,
-  skb, mtu);
+   skb_dst_update_pmtu(skb, mtu);
}
 
tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
diff --git a/include/net/dst.h b/include/net/dst.h
index b091fd536098..6216edb58393 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -521,4 +521,12 @@ static inline struct xfrm_state *dst_xfrm(const struct 
dst_entry *dst)
 }
 #endif
 
+static inline void skb_dst_update_pmtu(struct sk_buff *skb, u32 mtu)
+{
+   struct dst_entry *dst = skb_dst(skb);
+
+   if (dst && dst->ops->update_pmtu)
+   skb_dst(skb)->ops->update_pmtu(dst, NULL, skb, mtu);
+}
+
 #endif /* _NET_DST_H */
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 5ddb1cb52bd4..6d21068f9b55 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -520,8 +520,7 @@ static int tnl_update_pmtu(struct net_device *dev, struct 
sk_buff *skb,
else
mtu = skb_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;
 
-   if (skb_dst(skb))
-   skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu);
+   skb_dst_update_pmtu(skb, mtu);
 
if (skb->protocol == htons(ETH_P_IP)) {
if (!skb_is_gso(skb) &&
diff --git a/net/ipv4/ip_v

Re: [PATCH net-next 0/3 V1] rtnetlink: enable IFLA_IF_NETNSID for RTM_{DEL,SET}LINK

2018-01-25 Thread Nicolas Dichtel
Le 24/01/2018 à 17:35, Jiri Benc a écrit :
> On Wed, 24 Jan 2018 16:24:34 +0100, Nicolas Dichtel wrote:
>> I wonder if it would be possible to do something in the netlink framework, 
>> like
>> NETLINK_LISTEN_ALL_NSID.
>> Having some ancillary data at the netlink socket level and a function like
>> nlsock_net() (instead of sock_net()) to get the corresponding netns.
>> With that, it would be possible, in a generci way, to support this feature 
>> for
>> all netlink family.
> 
> I'm not sure it's worth the effort to do that in the framework. You'll
> need modifications all the way down to the code that generates the
> attributes anyway.
> 
> It's not enough to just specify that the operation should be done on a
> different netns and hide that from the handlers. Take for example the
> existing RTM_GETLINK. Let's say it's executed from within ns_a and
> targeted to ns_b (thus IFLA_IF_NETNSID = netnsid of ns_b). Now, if
> there's a veth interface in ns_b whose other end is in ns_c, there will
> be IFLA_LINK_NETNSID attribute in the response. But the value has to be
> netnsid of ns_c as seen from *ns_a*. If you just pretended to switch to
> ns_b before invoking rtnl_getlink, it would be netnsid of ns_c as seen
> from ns_b which would be wrong.
Hmm, I don't agree. For me, it would be the correct answer. If user has a socket
in ns_a and targets a RTM_GETLINK in ns_b, the answer he gets should be like if
it was done in ns_b.
This is already the case with messages received with NETLINK_LISTEN_ALL_NSID,
there is no reason to do something different.


Regards,
Nicolas


[PATCH net-next 2/2] dev: advertise the new ifindex when the netns iface changes

2018-01-25 Thread Nicolas Dichtel
The goal is to let the user follow an interface that moves to another
netns.

CC: Jiri Benc <jb...@redhat.com>
CC: Christian Brauner <christian.brau...@ubuntu.com>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 include/linux/rtnetlink.h|  5 +++--
 include/uapi/linux/if_link.h |  1 +
 net/core/dev.c   | 19 ---
 net/core/rtnetlink.c | 31 ---
 4 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 62d508b31f56..0514cc36ac34 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -19,10 +19,11 @@ extern int rtnl_put_cacheinfo(struct sk_buff *skb, struct 
dst_entry *dst,
 
 void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change, gfp_t 
flags);
 void rtmsg_ifinfo_newnet(int type, struct net_device *dev, unsigned int change,
-gfp_t flags, int *new_nsid);
+gfp_t flags, int *new_nsid, int new_ifindex);
 struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev,
   unsigned change, u32 event,
-  gfp_t flags, int *new_nsid);
+  gfp_t flags, int *new_nsid,
+  int new_ifindex);
 void rtmsg_ifinfo_send(struct sk_buff *skb, struct net_device *dev,
   gfp_t flags);
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8616131e2c61..6d9447700e18 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -163,6 +163,7 @@ enum {
IFLA_IF_NETNSID,
IFLA_CARRIER_UP_COUNT,
IFLA_CARRIER_DOWN_COUNT,
+   IFLA_NEW_IFINDEX,
__IFLA_MAX
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 59987eb6511a..858501b12869 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7360,7 +7360,7 @@ static void rollback_registered_many(struct list_head 
*head)
if (!dev->rtnl_link_ops ||
dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
skb = rtmsg_ifinfo_build_skb(RTM_DELLINK, dev, ~0U, 0,
-GFP_KERNEL, NULL);
+GFP_KERNEL, NULL, 0);
 
/*
 *  Flush the unicast and multicast chains
@@ -8473,7 +8473,7 @@ EXPORT_SYMBOL(unregister_netdev);
 
 int dev_change_net_namespace(struct net_device *dev, struct net *net, const 
char *pat)
 {
-   int err, new_nsid;
+   int err, new_nsid, new_ifindex;
 
ASSERT_RTNL();
 
@@ -8529,8 +8529,16 @@ int dev_change_net_namespace(struct net_device *dev, 
struct net *net, const char
call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
rcu_barrier();
call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
+
new_nsid = peernet2id_alloc(dev_net(dev), net);
-   rtmsg_ifinfo_newnet(RTM_DELLINK, dev, ~0U, GFP_KERNEL, _nsid);
+   /* If there is an ifindex conflict assign a new one */
+   if (__dev_get_by_index(net, dev->ifindex))
+   new_ifindex = dev_new_index(net);
+   else
+   new_ifindex = dev->ifindex;
+
+   rtmsg_ifinfo_newnet(RTM_DELLINK, dev, ~0U, GFP_KERNEL, _nsid,
+   new_ifindex);
 
/*
 *  Flush the unicast and multicast chains
@@ -8544,10 +8552,7 @@ int dev_change_net_namespace(struct net_device *dev, 
struct net *net, const char
 
/* Actually switch the network namespace */
dev_net_set(dev, net);
-
-   /* If there is an ifindex conflict assign a new one */
-   if (__dev_get_by_index(net, dev->ifindex))
-   dev->ifindex = dev_new_index(net);
+   dev->ifindex = new_ifindex;
 
/* Send a netdev-add uevent to the new namespace */
kobject_uevent(>dev.kobj, KOBJ_ADD);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 97874daa1336..cdade82bb363 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -988,6 +988,7 @@ static noinline size_t if_nlmsg_size(const struct 
net_device *dev,
   + rtnl_xdp_size() /* IFLA_XDP */
   + nla_total_size(4)  /* IFLA_EVENT */
   + nla_total_size(4)  /* IFLA_NEW_NETNSID */
+  + nla_total_size(4)  /* IFLA_NEW_IFINDEX */
   + nla_total_size(1)  /* IFLA_PROTO_DOWN */
   + nla_total_size(4)  /* IFLA_IF_NETNSID */
   + nla_total_size(4)  /* IFLA_CARRIER_UP_COUNT */
@@ -1511,7 +1512,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
struct net_device *dev, struct net *src_net,
int type, u32 pid, u32 seq, u32 change,
unsigned int flags, u32 ext_filter_mask,
-  

[PATCH net-next 0/2] Ease to follow an interface that moves to another netns

2018-01-25 Thread Nicolas Dichtel

The goal of this series is to ease the user to follow an interface that
moves to another netns.

After this series, with a patched iproute2:

$ ip netns
bar
foo
$ ip monitor link &
$ ip link set dummy0 netns foo
Deleted 5: dummy0:  mtu 1500 qdisc noop state DOWN group 
default
link/ether 6e:a7:82:35:96:46 brd ff:ff:ff:ff:ff:ff new-nsid 0 new-ifindex 6

=> new nsid: 0, new ifindex: 6 (was 5 in the previous netns)

$ ip link set eth1 netns bar
Deleted 3: eth1:  mtu 1500 qdisc noop state DOWN group 
default
link/ether 52:54:01:12:34:57 brd ff:ff:ff:ff:ff:ff new-nsid 1 new-ifindex 3

=> new nsid: 1, new ifindex: 3 (same ifindex)

$ ip netns
bar (id: 1)
foo (id: 0)

 include/linux/rtnetlink.h|  5 +++--
 include/uapi/linux/if_link.h |  1 +
 net/core/dev.c   | 22 --
 net/core/rtnetlink.c | 31 ---
 4 files changed, 36 insertions(+), 23 deletions(-)

Comments are welcomed,
Regards,
Nicolas


[PATCH net-next 1/2] dev: always advertise the new nsid when the netns iface changes

2018-01-25 Thread Nicolas Dichtel
The user should be able to follow any interface that moves to another
netns.  There is no reason to hide physical interfaces.

CC: Jiri Benc <jb...@redhat.com>
CC: Christian Brauner <christian.brau...@ubuntu.com>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 net/core/dev.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 4670ccabe23a..59987eb6511a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8529,10 +8529,7 @@ int dev_change_net_namespace(struct net_device *dev, 
struct net *net, const char
call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
rcu_barrier();
call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
-   if (dev->rtnl_link_ops && dev->rtnl_link_ops->get_link_net)
-   new_nsid = peernet2id_alloc(dev_net(dev), net);
-   else
-   new_nsid = peernet2id(dev_net(dev), net);
+   new_nsid = peernet2id_alloc(dev_net(dev), net);
rtmsg_ifinfo_newnet(RTM_DELLINK, dev, ~0U, GFP_KERNEL, _nsid);
 
/*
-- 
2.15.1



Re: [PATCH] net: make sure skb_dst is valid before using

2018-01-24 Thread Nicolas Dichtel
Le 24/01/2018 à 10:49, Roman Kapl a écrit :
> On 01/24/2018 10:16 AM, Xin Long wrote:
>> On Wed, Jan 24, 2018 at 6:42 AM, Roman Kapl  wrote:
>>> Tunnel devices often use skb_dst(skb)->ops, but ops are not implemented
>>> for metadata tunnel destinations. Use skb_valid_dst to check if skb_dst
>>> is a real (non-metadata) destination.
>>>
>>> Such packets can easily be crafted using tc tunnel_key actions or BPF
>>> and will crash the tunnels, as observed at least with sit, geneve and
>>> vxlan. Some of these tunnels are also intended to be used with metadata
>>> destinations, so this represents a loss of functionality.
>>>
>>> There might be more places with similar patterns, but sometimes it is
>>> hard to tell if metadata dst packets can reach them, so this patch is
>>> not exhaustive.
>> This patch is trying to fix a lot of places, and there may be more.
>> But all because of the lack of .update_pmtu or .neigh_lookup.
>> (dst_link_failure is safe, btw)
>>
>> Not sure if it will be better to add .update_pmtu, .neigh_lookup and
>> even .redirect for md_dst_ops, but just leave them empty ?
>> like fake_dst_ops in br_nf.
> That's what I was suggesting in the original mail. However, it might have 
> slight
> impact on performance. Also it was not done when metadata destinations were
> originally added, was there any good reason?
>> Let's see other's opinions.
> Exactly.
I would prefer a patch that test if the handler is available. It would prevent
to have this bug again in the future. But I don't have a strong opinion about 
this.

But it would be nice to send a patch quickly, before the 4.15 release, to fix
this crash.


Thank you,
Nicolas


Re: [PATCH net-next 0/3 V1] rtnetlink: enable IFLA_IF_NETNSID for RTM_{DEL,SET}LINK

2018-01-24 Thread Nicolas Dichtel
Hi,

Le 24/01/2018 à 15:26, Christian Brauner a écrit :
> Hi,
> 
> Based on the previous discussion this enables passing a IFLA_IF_NETNSID
> property along with RTM_SETLINK and RTM_DELLINK requests. The patch for
> RTM_NEWLINK will be sent out in a separate patch since there are more
> corner-cases to think about.
I wonder if it would be possible to do something in the netlink framework, like
NETLINK_LISTEN_ALL_NSID.
Having some ancillary data at the netlink socket level and a function like
nlsock_net() (instead of sock_net()) to get the corresponding netns.
With that, it would be possible, in a generci way, to support this feature for
all netlink family.


Regards,
Nicolas


Re: [PATCH net-next 1/1] rtnetlink: request RTM_GETLINK by pid or fd

2018-01-24 Thread Nicolas Dichtel
Le 23/01/2018 à 18:08, Jiri Benc a écrit :
> On Tue, 23 Jan 2018 17:37:11 +0100, Nicolas Dichtel wrote:
>> When a virtual interface moves to another netns, the netlink RTM_DELLINK 
>> message
>> contains the attribute IFLA_NEW_NETNSID, which identifies where the interface
>> moves. The nsid may be allocated if needed.
> 
> The problem is that ifindex may change and it's not announced. The only
> way is to track both ifindex and ifname, watch for the ifname to appear
> in the target netns and update the application's view of ifindex.
Yes, you're right.

> 
> It would be much better if the whole (ifindex, netnsid) pair was
> returned. I think it could be added.
Sure. Do you plan to send a patch?

> 
>> I don't know if it's acceptable to also allocate an nsid in case of a 
>> physical
>> interface.
> 
> I think we need that.
If you agree, I can send a patch to remove this limitation.


Regards,
Nicolas


Re: [PATCH net-next 1/1] rtnetlink: request RTM_GETLINK by pid or fd

2018-01-23 Thread Nicolas Dichtel
Le 23/01/2018 à 13:22, Jiri Benc a écrit :
> (Christian, I'm adding back the netdev list, there's no reason not to
> have the discussion in open.)
> 
> On Tue, 23 Jan 2018 12:42:19 +0100, Christian Brauner wrote:
>> Thanks for the comments and discussion. Sorry, for not going through the
>> list but I just have a quick question that doesn't deserve to spam the
>> whole netdev list. :) I have written another set of patches that would
>> add support for IFLA_IF_NETNSID to RTM_{N,S}ETLINK and friends which I
>> planned to send out after the pid and fd ones. Would you be interested
>> in those if I sent them out over the next week?
> 
> Yes, please CC me. I will have limited time next week (whole day
> meetings for the most of the week) but I'll try to review.
CC me also please.


Regards,
Nicolas


Re: [PATCH net-next 1/1] rtnetlink: request RTM_GETLINK by pid or fd

2018-01-23 Thread Nicolas Dichtel
Le 23/01/2018 à 11:26, Wolfgang Bumiller a écrit :
> On Tue, Jan 23, 2018 at 10:30:09AM +0100, Jiri Benc wrote:
>> On Mon, 22 Jan 2018 23:25:41 +0100, Christian Brauner wrote:
>>> This is not necessarily true in scenarios where I move a network device
>>> via RTM_NEWLINK + IFLA_NET_NS_PID into a network namespace I haven't
>>> created. Here is an example:
>>>
>>> nlmsghdr->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
>>> nlmsghdr->nlmsg_type = RTM_NEWLINK;
>>> /* move to network namespace of pid */
>>> nla_put_u32(nlmsg, IFLA_NET_NS_PID, pid)
>>> /* give interface new name */
>>> nla_put_string(nlmsg, IFLA_IFNAME, ifname)
>>>
>>> The only thing I have is the pid that identifies the network namespace.
>>
>> How do you know the interface did not get renamed in the new netns?
>>
>> This is racy and won't work reliably. You really need to know the
>> netnsid before moving the interface to the netns to be able to do
>> meaningful queries.
> 
> Even if you know the netnsid, do the mentioned watches work for
> nested/child namespaces if eg. a container creates new namespace before
> and/or after the watch was established and moves interfaces to these
> child namespaces, would you just see them disappear, or can you keep
> track of them later on as well?
nsid can be monitored (see ip monitor nsid).

> 
> Even if that works, from what the documentation tells me netlink is an
> unreliable protocol, so if my watcher's socket buffer is full, wouldn't
> I be losing important tracking information?
You can track socket error statistics. In case of error, you can start a dump to
ensure that you have the right view of the system.

> 
> I think one possible solution to tracking interfaces would be to have a
> unique identifier that never changes (even if it's just a simple
> uint64_t incremented whenever an interface is created). But since
> they're not local to the current namespace that may require a lot of
> extra permission checks (but I'm just speculating here...).
It's not possible to have unique identifiers. With CRIU, you need to be able to
reassign all existing ids.


Re: [PATCH net-next 1/1] rtnetlink: request RTM_GETLINK by pid or fd

2018-01-23 Thread Nicolas Dichtel
Le 22/01/2018 à 23:06, Jiri Benc a écrit :
[snip]
> Btw, we have one missing piece here: when an interface is moved to a
> name space that does not have netnsid attached, we want to find out
> where the interface was moved to. But there's currently no reliable way
> to do it. For veth, the other end can be used to get the netnsid (note
> that RTM_GETLINK will return the correct link netnsid even when the
> queried veth interface is in a different name space), for openvswitch,
> we can now use genetlink, etc., but using different ways for different
> interface types is not the best API ever and for standalone interfaces
> we have nothing. I'd like to see something added to uAPI to cover this
> in a generic way.
When a virtual interface moves to another netns, the netlink RTM_DELLINK message
contains the attribute IFLA_NEW_NETNSID, which identifies where the interface
moves. The nsid may be allocated if needed.
I don't know if it's acceptable to also allocate an nsid in case of a physical
interface.


Regards,
Nicolas


Re: [PATCH 2/2] Revert "xfrm: Fix stack-out-of-bounds read in xfrm_state_find."

2018-01-08 Thread Nicolas Dichtel
Le 05/01/2018 à 18:17, David Miller a écrit :
[snip]
> I will in my next batch of stable submissions.
> 
Thank you!


Re: [PATCH 2/2] Revert "xfrm: Fix stack-out-of-bounds read in xfrm_state_find."

2018-01-05 Thread Nicolas Dichtel
Le 23/12/2017 à 17:09, Steffen Klassert a écrit :
> On Sat, Dec 23, 2017 at 10:56:12AM -0500, David Miller wrote:
>> From: Steffen Klassert 
>> Date: Sat, 23 Dec 2017 10:22:17 +0100
>>
>>> On Thu, Nov 16, 2017 at 11:00:40AM +0100, Steffen Klassert wrote:
 This reverts commit c9f3f813d462c72dbe412cee6a5cbacf13c4ad5e.

 This commit breaks transport mode when the policy template
 has widlcard addresses configured, so revert it.

 Signed-off-by: Steffen Klassert 
>>>
>>> David, can you please queue this one up for v4.14-stable?
>>> Commit ID is 94802151894d482e82c324edf2c658f8e6b96508
>>>
>>> v4.14 is unusable for some people without this revert.
>>
>> Yes, but it adds back the stack out-of-bounds bug.
>>
>> If I queue up the revert, I would also need to queue up whatever
>> follow-on you used to fix the out-of-bounds bug properly.  Which
>> commit is that?
> 
> This is commit ddc47e4404b58f03e98345398fb12d38fe291512
> ("xfrm: Fix stack-out-of-bounds read on socket policy lookup.")
> 
> It is included in the pull request for the net tree that
> I sent yesterday. The patch looks save, but not so sure
> if it should go directly to stable. These bugs reported by
> the syzbot are usually quite subtile and I already broke
> something when I tried to fix the original stack out-of-bounds
> bug. So maybe we should wait until the v4.15 release before
> backporting...
> 
This patch is still missing in the 4.14 stable. Without it, some IPsec scenarii
are broken. Is there a plan to queue this patch for the 4.14 stable ?


Thank you,
Nicolas


Re: [PATCH net v2] netns, rtnetlink: fix struct net reference leak

2017-12-23 Thread Nicolas Dichtel
Le 22/12/2017 à 21:36, Craig Gallek a écrit :
> From: Craig Gallek <kr...@google.com>
> 
> netns ids were added in commit 0c7aecd4bde4 and defined as signed
> integers in both the kernel datastructures and the netlink interface.
> However, the semantics of the implementation assume that the ids
> are always greater than or equal to zero, except for an internal
> sentinal value NETNSA_NSID_NOT_ASSIGNED.
> 
> Several subsequent patches carried this pattern forward.  This patch
> updates all of the netlink input paths of this value to enforce the
> 'greater than or equal to zero' constraint.
> 
> This issue was discovered by syskaller.  It would set a negative
> value for a netns id and then repeatedly call the RTM_GETLINK.
> The put_net call in that path would not trigger for negative netns ids,
> caused a reference count leak, and eventually overflowed.  There are
> probably additional error paths that do not handle this situation
> correctly, but this was the only one I was able to trigger a real
> issue through.
> 
> Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids")
> Fixes: 317f4810e45e ("rtnl: allow to create device with IFLA_LINK_NETNSID 
> set")
> Fixes: 79e1ad148c84 ("rtnetlink: use netnsid to query interface")
> CC: Jiri Benc <jb...@redhat.com>
> CC: Nicolas Dichtel <nicolas.dich...@6wind.com>
> CC: Jason A. Donenfeld <ja...@zx2c4.com>
> Signed-off-by: Craig Gallek <kr...@google.com>
> ---
>  net/core/net_namespace.c |  2 ++
>  net/core/rtnetlink.c | 17 +
>  2 files changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 60a71be75aea..4b7ea33f5705 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -627,6 +627,8 @@ static int rtnl_net_newid(struct sk_buff *skb, struct 
> nlmsghdr *nlh,
>   return -EINVAL;
>   }
>   nsid = nla_get_s32(tb[NETNSA_NSID]);
> + if (nsid < 0)
> + return -EINVAL;
No, this breaks the current behavior.
Look at alloc_netid(). If reqid is < 0, the kernel allocates an nsid with no
constraint. If reqid is >= 0, it tries to alloc the specified nsid.

>  
>   if (tb[NETNSA_PID]) {
>   peer = get_net_ns_by_pid(nla_get_u32(tb[NETNSA_PID]));
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index dabba2a91fc8..a928b8f081b8 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1733,10 +1733,12 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, 
> struct netlink_callback *cb)
>   ifla_policy, NULL) >= 0) {
>   if (tb[IFLA_IF_NETNSID]) {
>   netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]);
> - tgt_net = get_target_net(skb, netnsid);
> - if (IS_ERR(tgt_net)) {
> - tgt_net = net;
> - netnsid = -1;
> + if (netnsid >= 0) {
> + tgt_net = get_target_net(skb, netnsid);
I would prefer to put this test in get_target_net.

> + if (IS_ERR(tgt_net)) {
> + tgt_net = net;
> + netnsid = -1;
Maybe using NETNSA_NSID_NOT_ASSIGNED is better? Same for the initialization of
this variable.

> + }
>   }
>   }
>  
> @@ -2792,6 +2794,11 @@ static int rtnl_newlink(struct sk_buff *skb, struct 
> nlmsghdr *nlh,
>   if (tb[IFLA_LINK_NETNSID]) {
>   int id = nla_get_s32(tb[IFLA_LINK_NETNSID]);
>  
> + if (id < 0) {
> + err =  -EINVAL;
> + goto out;
> + }
> +
This is not needed. get_net_ns_by_id() returns NULL if id is < 0.

>   link_net = get_net_ns_by_id(dest_net, id);
>   if (!link_net) {
>   err =  -EINVAL;
> @@ -2883,6 +2890,8 @@ static int rtnl_getlink(struct sk_buff *skb, struct 
> nlmsghdr *nlh,
>  
>   if (tb[IFLA_IF_NETNSID]) {
>   netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]);
> + if (netnsid < 0)
> + return -EINVAL;
>   tgt_net = get_target_net(skb, netnsid);
>   if (IS_ERR(tgt_net))
>   return PTR_ERR(tgt_net);
> 


Re: [PATCH net] rtnetlink: fix struct net reference leak

2017-12-22 Thread Nicolas Dichtel
Le 21/12/2017 à 23:18, Craig Gallek a écrit :
> From: Craig Gallek 
> 
> The below referenced commit extended the RTM_GETLINK interface to
> allow querying by netns id.  The netnsid property was previously
> defined as a signed integer, but this patch assumes that the user
> always passes a positive integer.  syzkaller discovered this problem
> by setting a negative netnsid and then calling the get-link path
> in a tight loop.  This surprisingly quickly overflows the reference
> count on the associated struct net, potentially destroying it.  When the
> default namespace is used, the machine crashes in strange and interesting
> ways.
> 
> Unfortunately, this is not easy to reproduce with just the ip tool
> as it enforces unsigned integer parsing despite the interface interpeting
> the NETNSID attribute as signed.
> 
> I'm not sure why this attribute is signed in the first place, but
> the first commit that introduced it (6621dd29eb9b) is in v4.15-rc4,
> so I assume it's too late to change.
A valid (assigned) nsid is always >= 0.

> 
> This patch removes the positive netns id assumption, but adds another
> assumption that the netns id 0 is always the 'self' identifying id (for
> which an additional struct net reference is not necessary).
We cannot make this assumption, this is wrong. nsids may be automatically
allocated by the kernel, and it starts by 0.
The current netns can be identify by NETNSA_NSID_NOT_ASSIGNED, ie -1.


Regards,
Nicolas


Re: [PATCH] net: Fix double free and memory corruption in get_net_ns_by_id()

2017-12-20 Thread Nicolas Dichtel
Le 19/12/2017 à 18:27, Eric W. Biederman a écrit :
> 
> (I can trivially verify that that idr_remove in cleanup_net happens
>  after the network namespace count has dropped to zero --EWB)
> 
> Function get_net_ns_by_id() does not check for net::count
> after it has found a peer in netns_ids idr.
> 
> It may dereference a peer, after its count has already been
> finaly decremented. This leads to double free and memory
> corruption:
> 
> put_net(peer)   rtnl_lock()
> atomic_dec_and_test(>count) [count=0] ...
> __put_net(peer) get_net_ns_by_id(net, id)
>   spin_lock(_list_lock)
>   list_add(>cleanup_list, _list)
>   spin_unlock(_list_lock)
> queue_work()  peer = 
> idr_find(>netns_ids, id)
>   |   get_net(peer) [count=1]
>   |   ...
>   |   (use after final put)
>   v   ...
>   cleanup_net()   ...
> spin_lock(_list_lock) ...
> list_replace_init(_list, ..)  ...
> spin_unlock(_list_lock)   ...
> ...   ...
> ...   put_net(peer)
> ... 
> atomic_dec_and_test(>count) [count=0]
> ...   
> spin_lock(_list_lock)
> ...   
> list_add(>cleanup_list, _list)
> ...   
> spin_unlock(_list_lock)
> ... queue_work()
> ...   rtnl_unlock()
> rtnl_lock()   ...
> for_each_net(tmp) {   ...
>   id = __peernet2id(tmp, peer)...
>   spin_lock_irq(>nsid_lock)  ...
>   idr_remove(>netns_ids, id) ...
>   ... ...
>   net_drop_ns()   ...
>   net_free(peer)...
> } ...
>   |
>   v
>   cleanup_net()
> ...
> (Second free of peer)
> 
> Also, put_net() on the right cpu may reorder with left's cpu
> list_replace_init(_list, ..), and then cleanup_list
> will be corrupted.
> 
> Since cleanup_net() is executed in worker thread, while
> put_net(peer) can happen everywhere, there should be
> enough time for concurrent get_net_ns_by_id() to pick
> the peer up, and the race does not seem to be unlikely.
> The patch fixes the problem in standard way.
> 
> (Also, there is possible problem in peernet2id_alloc(), which requires
> check for net::count under nsid_lock and maybe_get_net(peer), but
> in current stable kernel it's used under rtnl_lock() and it has to be
> safe. Openswitch begun to use peernet2id_alloc(), and possibly it should
> be fixed too. While this is not in stable kernel yet, so I'll send
> a separate message to netdev@ later).
> 
> Cc: Nicolas Dichtel <nicolas.dich...@6wind.com>
> Signed-off-by: Kirill Tkhai <ktk...@virtuozzo.com>
> Fixes: 0c7aecd4bde4 "netns: add rtnl cmd to add and get peer netns ids"
> Reviewed-by: Andrey Ryabinin <aryabi...@virtuozzo.com>
> Reviewed-by: "Eric W. Biederman" <ebied...@xmission.com>
> Signed-off-by: Eric W. Biederman <ebied...@xmission.com>
Good catch, thank you.

Acked-by: Nicolas Dichtel <nicolas.dich...@6wind.com>


Re: [PATCH net] sit: update frag_off info

2017-11-30 Thread Nicolas Dichtel
Le 30/11/2017 à 03:41, Hangbin Liu a écrit :
> After parsing the sit netlink change info, we forget to update frag_off in
> ipip6_tunnel_update(). Fix it by assigning frag_off with new value.
> 
> Fixes: f37234160233 ("sit: add support of link creation via rtnl")
I think it's older than this patch (before git ages), but it probably doesn't
matter.


> Reported-by: Jianlin Shi <ji...@redhat.com>
> Signed-off-by: Hangbin Liu <liuhang...@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dich...@6wind.com>


Re: [iproute PATCH] link_gre6: Detect invalid encaplimit values

2017-11-30 Thread Nicolas Dichtel
Le 28/11/2017 à 16:49, Phil Sutter a écrit :
> Looks like a typo: get_u8() returns 0 on success and -1 on error, so the
> error checking here was ineffective.
> 
> Fixes: a11b7b71a6eba ("link_gre6: really support encaplimit option")
> Signed-off-by: Phil Sutter <p...@nwl.cc>
Good catch!

Acked-by: Nicolas Dichtel <nicolas.dich...@6wind.com>


Re: [RFC] [PATCH] netns: Fix race in virtual interface bringup

2017-11-16 Thread Nicolas Dichtel
Le 15/11/2017 à 20:04, Dan Rue a écrit :
> Adding CC netdev
> 
> Can someone comment on the expected behavior of this test case?
> 
> Here's the isolated test:
> 
> ip netns del tst_net_ns0
> ip netns del tst_net_ns1
> ip netns add tst_net_ns0
> ip netns add tst_net_ns1
> ip netns exec tst_net_ns0 ip link add veth0 type veth peer name veth1
> ip netns exec tst_net_ns0 ip link set veth1 netns tst_net_ns1
> 
> ip netns exec tst_net_ns0 ifconfig veth0 inet6 add fd00::2/64
> ip netns exec tst_net_ns1 ifconfig veth1 inet6 add fd00::3/64
ip netns exec tst_net_ns0 sysctl -w net.ipv6.conf.all.accept_dad=0
ip netns exec tst_net_ns0 sysctl -w net.ipv6.conf.veth0.accept_dad=0
ip netns exec tst_net_ns1 sysctl -w net.ipv6.conf.all.accept_dad=0
ip netns exec tst_net_ns1 sysctl -w net.ipv6.conf.veth1.accept_dad=0

> ip netns exec tst_net_ns0 ifconfig veth0 up
> ip netns exec tst_net_ns1 ifconfig veth1 up
> 
> #sleep 2>
> ip netns exec tst_net_ns0 ping6 -q -c2 -I veth0 fd00::3
> 
> This is essentially what LTP is running. Sometimes, on some systems,
> ping6 fails with "connect: Cannot assign requested address". Adding a
> "sleep 2" always fixes it (but we'd obviously like to avoid a hard coded
> sleep in the test).
Probably due to DAD. By disabling DAD, you should be able to use immediately the
ipv6 addrs.


Regards,
Nicolas


Re: [PATCH net v2] ipv6: set all.accept_dad to 0 by default

2017-11-15 Thread Nicolas Dichtel
Le 15/11/2017 à 11:25, David Miller a écrit :
> From: Nicolas Dichtel <nicolas.dich...@6wind.com>
> Date: Wed, 15 Nov 2017 11:17:28 +0100
> 
>> I saw that you pushed this patch in net-next instead of net. Is it 
>> intentional?
>> I was expecting to see it in net, to fix the 4.14.
> 
> All changes are going into net-next as I prepare to send a pull request
> to Linus for the merge window.
> 
> This is always how things happen.
> 
Oh ok, thanks for the explanation.
It's the first time I submit a patch just before the pull request of net-next 
;-)

Nicolas


Re: [PATCH net v2] ipv6: set all.accept_dad to 0 by default

2017-11-15 Thread Nicolas Dichtel
Le 14/11/2017 à 14:21, Nicolas Dichtel a écrit :
> With commits 35e015e1f577 and a2d3f3e33853, the global 'accept_dad' flag
> is also taken into account (default value is 1). If either global or
> per-interface flag is non-zero, DAD will be enabled on a given interface.
> 
> This is not backward compatible: before those patches, the user could
> disable DAD just by setting the per-interface flag to 0. Now, the
> user instead needs to set both flags to 0 to actually disable DAD.
> 
> Restore the previous behaviour by setting the default for the global
> 'accept_dad' flag to 0. This way, DAD is still enabled by default,
> as per-interface flags are set to 1 on device creation, but setting
> them to 0 is enough to disable DAD on a given interface.
> 
> - Before 35e015e1f57a7 and a2d3f3e33853:
>   globalper-interfaceDAD enabled
> [default]   1 1  yes
> X 0  no
> X 1  yes
> 
> - After 35e015e1f577 and a2d3f3e33853:
>   globalper-interfaceDAD enabled
> [default]   1 1  yes
> 0 0  no
> 0 1  yes
> 1 0  yes
> 
> - After this fix:
>   globalper-interfaceDAD enabled
> 1 1  yes
> 0 0  no
> [default]   0 1  yes
> 1 0  yes
> 
> Fixes: 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers")
> Fixes: a2d3f3e33853 ("ipv6: fix net.ipv6.conf.all.accept_dad behaviour for 
> real")
> CC: Stefano Brivio <sbri...@redhat.com>
> CC: Matteo Croce <mcr...@redhat.com>
> CC: Erik Kline <e...@google.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
David,

I saw that you pushed this patch in net-next instead of net. Is it intentional?
I was expecting to see it in net, to fix the 4.14.


Thank you,
Nicolas


  1   2   3   4   5   6   >