from:"Nicolas Dichtel"

[PATCH] uapi: use wildcards to list files

2017-01-03 Thread Nicolas Dichtel

Regularly, when a new header is created in include/uapi/, the developer
forgets to add it in the corresponding Kbuild file. This error is usually
detected after the release is out.

In fact, all headers under include/uapi/ should be exported, so let's
use wildcards.

After this patch, the following files, which were not exported, are now
exported:
drm/vgem_drm.h
drm/armada_drm.h
drm/omap_drm.h
drm/etnaviv_drm.h
rdma/qedr-abi.h
linux/bcache.h
linux/kfd_ioctl.h
linux/cryptouser.h
linux/kcm.h
linux/kcov.h
linux/seg6_iptunnel.h
linux/stm.h
linux/seg6.h
linux/auto_dev-ioctl.h
linux/userio.h
linux/pr.h
linux/wil6210_uapi.h
linux/nilfs2_ondisk.h
linux/hash_info.h
linux/seg6_genl.h
linux/seg6_hmac.h
linux/batman_adv.h
linux/nsfs.h
linux/qrtr.h
linux/btrfs_tree.h
linux/coresight-stm.h
linux/dma-buf.h
linux/module.h
linux/lightnvm.h
linux/nilfs2_api.h

Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---

This patch is built against linus tree. I don't know if it should be
done against antoher tree.

Comments are welcomed,
Nicolas

 include/uapi/asm-generic/Kbuild|  36 +--
 include/uapi/drm/Kbuild|  22 +-
 include/uapi/linux/Kbuild  | 463 +
 include/uapi/linux/android/Kbuild  |   2 +-
 include/uapi/linux/byteorder/Kbuild|   3 +-
 include/uapi/linux/caif/Kbuild |   3 +-
 include/uapi/linux/can/Kbuild  |   6 +-
 include/uapi/linux/dvb/Kbuild  |   9 +-
 include/uapi/linux/hdlc/Kbuild |   2 +-
 include/uapi/linux/hsi/Kbuild  |   2 +-
 include/uapi/linux/iio/Kbuild  |   3 +-
 include/uapi/linux/isdn/Kbuild |   2 +-
 include/uapi/linux/mmc/Kbuild  |   2 +-
 include/uapi/linux/netfilter/Kbuild|  88 +-
 include/uapi/linux/netfilter/ipset/Kbuild  |   5 +-
 include/uapi/linux/netfilter_arp/Kbuild|   3 +-
 include/uapi/linux/netfilter_bridge/Kbuild |  18 +-
 include/uapi/linux/netfilter_ipv4/Kbuild   |  10 +-
 include/uapi/linux/netfilter_ipv6/Kbuild   |  13 +-
 include/uapi/linux/nfsd/Kbuild |   6 +-
 include/uapi/linux/raid/Kbuild |   3 +-
 include/uapi/linux/spi/Kbuild  |   2 +-
 include/uapi/linux/sunrpc/Kbuild   |   2 +-
 include/uapi/linux/tc_act/Kbuild   |  15 +-
 include/uapi/linux/tc_ematch/Kbuild|   5 +-
 include/uapi/linux/usb/Kbuild  |  12 +-
 include/uapi/linux/wimax/Kbuild|   2 +-
 include/uapi/misc/Kbuild   |   2 +-
 include/uapi/mtd/Kbuild|   6 +-
 include/uapi/rdma/Kbuild   |  17 +-
 include/uapi/rdma/hfi/Kbuild   |   2 +-
 include/uapi/scsi/Kbuild   |   5 +-
 include/uapi/scsi/fc/Kbuild|   5 +-
 include/uapi/sound/Kbuild  |  16 +-
 include/uapi/video/Kbuild  |   4 +-
 include/uapi/xen/Kbuild|   5 +-
 36 files changed, 47 insertions(+), 754 deletions(-)

diff --git a/include/uapi/asm-generic/Kbuild b/include/uapi/asm-generic/Kbuild
index b73de7bb7a62..8e52cdc3d941 100644
--- a/include/uapi/asm-generic/Kbuild
+++ b/include/uapi/asm-generic/Kbuild
@@ -1,36 +1,2 @@
 # UAPI Header export list
-header-y += auxvec.h
-header-y += bitsperlong.h
-header-y += errno-base.h
-header-y += errno.h
-header-y += fcntl.h
-header-y += int-l64.h
-header-y += int-ll64.h
-header-y += ioctl.h
-header-y += ioctls.h
-header-y += ipcbuf.h
-header-y += kvm_para.h
-header-y += mman-common.h
-header-y += mman.h
-header-y += msgbuf.h
-header-y += param.h
-header-y += poll.h
-header-y += posix_types.h
-header-y += resource.h
-header-y += sembuf.h
-header-y += setup.h
-header-y += shmbuf.h
-header-y += shmparam.h
-header-y += siginfo.h
-header-y += signal-defs.h
-header-y += signal.h
-header-y += socket.h
-header-y += sockios.h
-header-y += stat.h
-header-y += statfs.h
-header-y += swab.h
-header-y += termbits.h
-header-y += termios.h
-header-y += types.h
-header-y += ucontext.h
-header-y += unistd.h
+header-y += $(notdir $(wildcard $(srctree)/include/uapi/asm-generic/*.h))
diff --git a/include/uapi/drm/Kbuild b/include/uapi/drm/Kbuild
index 9355dd8eff3b..75f4cde6d9ba 100644
--- a/include/uapi/drm/Kbuild
+++ b/include/uapi/drm/Kbuild
@@ -1,22 +1,2 @@
 # UAPI Header export list
-header-y += drm.h
-header-y += drm_fourcc.h
-header-y += drm_mode.h
-header-y += drm_sarea.h
-header-y += amdgpu_drm.h
-header-y += exynos_drm.h
-header-y += i810_drm.h
-header-y += i915_drm.h
-header-y += mga_drm.h
-header-y += nouveau_drm.h
-header-y += qxl_drm.h
-header-y += r128_drm.h
-header-y += radeon_drm.h
-header-y += savage_drm.h
-header-y += sis_drm.h
-header-y += tegra_drm.h
-header-y += via_drm.h
-header-y += vmwgfx_drm.h
-header-y += msm_drm.h
-header-y += vc4_drm.h
-header-y += virtgpu_drm.h
+header-y += $(notdir $(wildcard $(srctree)/include/uapi/drm/*.h))
diff --git a/include/uapi/linux/Kbuild b/includ

Re: [PATCH ipsec-next 6/8] xfrm: policy: only use rcu in xfrm_sk_policy_lookup

2016-11-17 Thread Nicolas Dichtel

Le 16/11/2016 à 18:30, Florian Westphal a écrit :
[snip]
> This should fix it (if we succeed grabbing the refcount, then if (err && 
> !xfrm_pol_hold_rcu
> evaluates to false and we hit last else branch and set pol to ERR_PTR(0)...
Haha, right :) Thank you for the quick fix.
The patch solves my problem. You can add my 'tested-by' tag in the formal
submission.

Tested-by: Nicolas Dichtel <nicolas.dich...@6wind.com>

Re: [PATCH net v2] net: nsid cannot be allocated for a dead netns

2016-11-17 Thread Nicolas Dichtel

Le 17/11/2016 à 07:32, Cong Wang a écrit :
[snip]
> since Nicolas' patch doesn't even compile...
It's always surprising how agressive you are, really :(
Can you show me the compilation error of this patch (we are talking about the v2
patch here)?

Re: [Patch net] net: check dead netns for peernet2id_alloc()

2016-11-17 Thread Nicolas Dichtel

Le 16/11/2016 à 19:27, Cong Wang a écrit :
> Andrei reports we still allocate netns ID from idr after we destroy
> it in cleanup_net().
> 
> cleanup_net():
>   ...
>   idr_destroy(>netns_ids);
>   ...
>   list_for_each_entry_reverse(ops, _list, list)
> ops_exit_list(ops, _exit_list);
>   -> rollback_registered_many()
> -> rtmsg_ifinfo_build_skb()
>  -> rtnl_fill_ifinfo()
>-> peernet2id_alloc()
> 
> After that point we should not even access net->netns_ids, we
> should check the death of the current netns as early as we can in
> peernet2id_alloc().
> 
> For net-next we can consider to avoid sending rtmsg totally,
> it is a good optimization for netns teardown path.
> 
> Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids")
> Reported-by: Andrei Vagin <ava...@gmail.com>
> Cc: Nicolas Dichtel <nicolas.dich...@6wind.com>
> Signed-off-by: Cong Wang <xiyou.wangc...@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>

Re: [PATCH net v2] net: nsid cannot be allocated for a dead netns

2016-11-17 Thread Nicolas Dichtel

Le 17/11/2016 à 05:17, David Miller a écrit :
[snip]
> Indeed, this looks cleaner, Nicolas?
Yes, I agree. Cong already sent a v3 of my patch, let's apply it.

Re: [PATCH ipsec-next 6/8] xfrm: policy: only use rcu in xfrm_sk_policy_lookup

2016-11-16 Thread Nicolas Dichtel

Le 11/08/2016 à 15:17, Florian Westphal a écrit :
> Don't acquire the readlock anymore and rely on rcu alone.
> 
> In case writer on other CPU changed policy at the wrong moment (after we
> obtained sk policy pointer but before we could obtain the reference)
> just repeat the lookup.
> 
> Signed-off-by: Florian Westphal 
Since this patch, our IKEv1 Transport tests (using charon) fail to establish the
connection. If I revert it, the IKE negociation is ok again.
charon logs are enclosed.

I didn't had time to investigate now, but any idea is welcomed ;-)

Regards,
Nicolas
/var/log/syslog:Nov 16 16:28:47 router charon: 04[CFG] received stroke: add 
connection 'mytunnel-17'
/var/log/syslog:Nov 16 16:28:47 router charon: 04[CFG] added configuration 
'mytunnel-17'
/var/log/syslog:Nov 16 16:28:47 router charon: 08[CFG] received stroke: route 
'mytunnel-17'
/var/log/syslog:Nov 16 16:28:52 router charon: 13[KNL] creating acquire job for 
policy 10.125.0.1/32[gre] === 10.125.0.2/32[gre] with reqid {1}
/var/log/syslog:Nov 16 16:28:52 router charon: 13[IKE] initiating Aggressive 
Mode IKE_SA mytunnel-17[1] to 10.125.0.2
/var/log/syslog:Nov 16 16:28:52 router charon: 13[ENC] generating AGGRESSIVE 
request 0 [ SA KE No ID V V V V V ]
/var/log/syslog:Nov 16 16:28:52 router charon: 13[NET] sending packet: from 
10.125.0.1[500] to 10.125.0.2[500] (548 bytes)
/var/log/syslog:Nov 16 16:28:56 router charon: 07[IKE] sending retransmit 1 of 
request message ID 0, seq 1
/var/log/syslog:Nov 16 16:28:56 router charon: 07[NET] sending packet: from 
10.125.0.1[500] to 10.125.0.2[500] (548 bytes)
/var/log/syslog:Nov 16 16:29:04 router charon: 08[IKE] sending retransmit 2 of 
request message ID 0, seq 1
/var/log/syslog:Nov 16 16:29:04 router charon: 08[NET] sending packet: from 
10.125.0.1[500] to 10.125.0.2[500] (548 bytes)
/var/log/syslog:Nov 16 16:29:17 router charon: 09[IKE] sending retransmit 3 of 
request message ID 0, seq 1
/var/log/syslog:Nov 16 16:29:17 router charon: 09[NET] sending packet: from 
10.125.0.1[500] to 10.125.0.2[500] (548 bytes)
/var/log/syslog:Nov 16 16:29:40 router charon: 11[IKE] sending retransmit 4 of 
request message ID 0, seq 1
/var/log/syslog:Nov 16 16:29:40 router charon: 11[NET] sending packet: from 
10.125.0.1[500] to 10.125.0.2[500] (548 bytes)
/var/log/syslog:Nov 16 16:30:22 router charon: 05[IKE] sending retransmit 5 of 
request message ID 0, seq 1
/var/log/syslog:Nov 16 16:30:22 router charon: 05[NET] sending packet: from 
10.125.0.1[500] to 10.125.0.2[500] (548 bytes)
/var/log/syslog:Nov 16 16:31:37 router charon: 15[KNL] creating delete job for 
CHILD_SA ESP/0x/10.125.0.2
/var/log/syslog:Nov 16 16:31:37 router charon: 13[JOB] CHILD_SA 
ESP/0x/10.125.0.2 not found for delete
/var/log/syslog:Nov 16 16:31:38 router charon: 09[IKE] giving up after 5 
retransmits
/var/log/syslog:Nov 16 16:31:38 router charon: 09[IKE] establishing IKE_SA 
failed, peer not responding

[PATCH net v2] net: nsid cannot be allocated for a dead netns

2016-11-16 Thread Nicolas Dichtel

Andrei reports the following kmemleak error:
unreferenced object 0x91badb543950 (size 2096):
  comm "kworker/u4:0", pid 6, jiffies 4295152553 (age 28.418s)
  hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 cb 5f df ba 91 ff ff  .._.
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[] kmemleak_alloc+0x4a/0xa0
[] kmem_cache_alloc+0x128/0x280
[] idr_layer_alloc+0x2b/0x90
[] idr_get_empty_slot+0x34d/0x370
[] idr_alloc+0x5e/0x110
[] __peernet2id_alloc+0x6d/0x90
[] peernet2id_alloc+0x55/0xb0
[] rtnl_fill_ifinfo+0xaa6/0x10a0
[] rtmsg_ifinfo_build_skb+0x73/0xd0
[] rollback_registered_many+0x295/0x390
[] unregister_netdevice_many+0x25/0x80
[] default_device_exit_batch+0x145/0x170
[] ops_exit_list.isra.4+0x52/0x60
[] cleanup_net+0x1bf/0x2a0
[] process_one_work+0x1ff/0x660
[] worker_thread+0x4e/0x480

There is no reason to try to allocate an nsid for a netns which is dying.

Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids")
Reported-by: Andrei Vagin <ava...@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---

v2: fix compilation
add the 'Fixes' tag.

Andrei, can you test this new version?

Regards,
Nicolas

 net/core/net_namespace.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index f61c0e02a413..63f65387f4e1 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -159,6 +159,9 @@ static int alloc_netid(struct net *net, struct net *peer, 
int reqid)
max = reqid + 1;
}
 
+   if (!atomic_read(>count) || !atomic_read(>count))
+   return -EINVAL;
+
return idr_alloc(>netns_ids, peer, min, max, GFP_ATOMIC);
 }
 
-- 
2.8.1

Re: [PATCH net] net: nsid cannot be allocated for a dead netns

2016-11-16 Thread Nicolas Dichtel

Le 16/11/2016 à 10:10, kbuild test robot a écrit :
> Hi Nicolas,
> 
> [auto build test ERROR on net/master]
And I finally send the wrong version of the patch :(

Re: [PATCH net] net: nsid cannot be allocated for a dead netns

2016-11-16 Thread Nicolas Dichtel

Le 16/11/2016 à 09:41, Nicolas Dichtel a écrit :
> Andrei reports the following kmemleak error:
> unreferenced object 0x91badb543950 (size 2096):
>   comm "kworker/u4:0", pid 6, jiffies 4295152553 (age 28.418s)
>   hex dump (first 32 bytes):
> 00 00 00 00 00 00 00 00 00 cb 5f df ba 91 ff ff  .._.
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>   backtrace:
> [] kmemleak_alloc+0x4a/0xa0
> [] kmem_cache_alloc+0x128/0x280
> [] idr_layer_alloc+0x2b/0x90
> [] idr_get_empty_slot+0x34d/0x370
> [] idr_alloc+0x5e/0x110
> [] __peernet2id_alloc+0x6d/0x90
> [] peernet2id_alloc+0x55/0xb0
> [] rtnl_fill_ifinfo+0xaa6/0x10a0
> [] rtmsg_ifinfo_build_skb+0x73/0xd0
> [] rollback_registered_many+0x295/0x390
> [] unregister_netdevice_many+0x25/0x80
> [] default_device_exit_batch+0x145/0x170
> [] ops_exit_list.isra.4+0x52/0x60
> [] cleanup_net+0x1bf/0x2a0
> [] process_one_work+0x1ff/0x660
> [] worker_thread+0x4e/0x480
> 
> There is no reason to try to allocate an nsid for a netns which is dying.
> 
> Reported-by: Andrei Vagin <ava...@gmail.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
'Fixes' tag is missing :/

Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids")

[PATCH net] net: nsid cannot be allocated for a dead netns

2016-11-16 Thread Nicolas Dichtel

Andrei reports the following kmemleak error:
unreferenced object 0x91badb543950 (size 2096):
  comm "kworker/u4:0", pid 6, jiffies 4295152553 (age 28.418s)
  hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 cb 5f df ba 91 ff ff  .._.
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[] kmemleak_alloc+0x4a/0xa0
[] kmem_cache_alloc+0x128/0x280
[] idr_layer_alloc+0x2b/0x90
[] idr_get_empty_slot+0x34d/0x370
[] idr_alloc+0x5e/0x110
[] __peernet2id_alloc+0x6d/0x90
[] peernet2id_alloc+0x55/0xb0
[] rtnl_fill_ifinfo+0xaa6/0x10a0
[] rtmsg_ifinfo_build_skb+0x73/0xd0
[] rollback_registered_many+0x295/0x390
[] unregister_netdevice_many+0x25/0x80
[] default_device_exit_batch+0x145/0x170
[] ops_exit_list.isra.4+0x52/0x60
[] cleanup_net+0x1bf/0x2a0
[] process_one_work+0x1ff/0x660
[] worker_thread+0x4e/0x480

There is no reason to try to allocate an nsid for a netns which is dying.

Reported-by: Andrei Vagin <ava...@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---

Thank you for the report. Can you test this patch?

Regards,
Nicolas

 net/core/net_namespace.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index f61c0e02a413..f1340ed0d8df 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -159,6 +159,9 @@ static int alloc_netid(struct net *net, struct net *peer, 
int reqid)
max = reqid + 1;
}
 
+   if (!atomic_read(>count) || !_read(peer->count))
+   return -EINVAL;
+
return idr_alloc(>netns_ids, peer, min, max, GFP_ATOMIC);
 }
 
-- 
2.8.1

Re: [Patch net] net: saving irq context for peernet2id()

2016-10-20 Thread Nicolas Dichtel

Le 20/10/2016 à 08:52, Cong Wang a écrit :
> A kernel warning inside __local_bh_enable_ip() was reported by people
> running SELinux, this is caused due to some SELinux functions
> (indirectly) call peernet2id() with IRQ disabled in process context,
> when we re-enable BH with IRQ disabled kernel complains. Shut up this
> warning by saving IRQ context in peernet2id(), BH is still implicitly
> disabled.

Franckly, reverting the original patch seems better for me. The intention for
that patch was "it's not needed", see the commit log:

"We never read or change netns id in hardirq context,
the only place we read netns id in softirq context
is in vxlan_xmit(). So, it should be enough to just
disable BH."

Now, we see that "it's needed" and that the analysis was wrong. If a race is
introduced by this patch, it will be hard to detect and fix it.


Regards,
Nicolas

Re: [PATCH net] conntrack: perform a full scan in gc

2016-10-20 Thread Nicolas Dichtel

Le 18/10/2016 à 12:06, Nicolas Dichtel a écrit :
> Le 18/10/2016 à 10:47, Florian Westphal a écrit :
>> Nicolas Dichtel <nicolas.dich...@6wind.com> wrote:
>>> After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
>>> timed-out entries"), netlink conntrack deletion events may be sent with a
>>> huge delay (5 minutes).
>>>
>>> There is two ways to evict conntrack:
>>>  - during a conntrack lookup;
>>>  - during a conntrack dump.
>>> Let's do a full scan of conntrack entries after a period of inactivity
>>> (no conntrack lookup).
>>>
>>> CC: Florian Westphal <f...@strlen.de>
>>> Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
>>> ---
>>>
>>> Here is another proposal to try to fix the problem.
>>> Comments are welcomed,
>>> Nicolas
>>
>> Hmm, I don't think its good idea in practice.
>> If goal is to avoid starving arbitrary 'dead' ct for too long,
>> then simple ping will defeat the logic here, because...
>>
>>>  net/netfilter/nf_conntrack_core.c | 11 +--
>>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/netfilter/nf_conntrack_core.c 
>>> b/net/netfilter/nf_conntrack_core.c
>>> index ba6a1d421222..3dbb27bd9582 100644
>>> --- a/net/netfilter/nf_conntrack_core.c
>>> +++ b/net/netfilter/nf_conntrack_core.c
>>> @@ -87,6 +87,7 @@ static __read_mostly bool nf_conntrack_locks_all;
>>>  #define GC_MAX_BUCKETS 8192u
>>>  #define GC_INTERVAL(5 * HZ)
>>>  #define GC_MAX_EVICTS  256u
>>> +static bool gc_full_scan = true;
>>>  
>>>  static struct conntrack_gc_work conntrack_gc_work;
>>>  
>>> @@ -511,6 +512,7 @@ nf_conntrack_find(struct net *net, const struct 
>>> nf_conntrack_zone *zone,
>>> unsigned int bucket, hsize;
>>>  
>>>  begin:
>>> +   gc_full_scan = false;
>>
>> ... we do periodic lookup (but always in same slot), so no full scan is
>> triggered.
> Yes, I was wondering about that. My first idea was to have that bool per 
> bucket
> and force a scan of the bucket instead of the whole table.
FYI, I'am off for about two weeks, but we really need to find a way to fix that
before the release goes out.

Re: Useless debug warning "netlink: 16 bytes leftover after parsing attributes"

2016-10-19 Thread Nicolas Dichtel

Le 18/10/2016 à 07:06, Marcel Holtmann a écrit :
> Hi,
> 
> so lately I am seeing a bunch of these warnings:
> 
> netlink: 16 bytes leftover after parsing attributes..
> 
> While they give you the process name, they are still useless to track down 
> the message that causes them. I find them even more useless since an updated 
> userspace on an older kernel can trigger the nla_policy warning here. And 
> that updated userspace program is doing nothing wrong by including extra 
> attributes. So what purpose is this warning serving?
Usually, it means that a netlink message is malformed, for example that the
header of the family has the wrong size. An unknown attribute should not trigger
this kind of message.
Here is an example:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e5eca6d41f53d


Regards,
Nicolas

[PATCH net] conntrack: restart gc immediately if GC_MAX_EVICTS is reached

2016-10-19 Thread Nicolas Dichtel

When the maximum evictions number is reached, do not wait 5 seconds before
the next run.

CC: Florian Westphal <f...@strlen.de>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 net/netfilter/nf_conntrack_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index ba6a1d421222..df2f5a3901df 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -983,7 +983,7 @@ static void gc_worker(struct work_struct *work)
return;
 
ratio = scanned ? expired_count * 100 / scanned : 0;
-   if (ratio >= 90)
+   if (ratio >= 90 || expired_count == GC_MAX_EVICTS)
next_run = 0;
 
gc_work->last_bucket = i;
-- 
2.8.1

Re: [PATCH net] conntrack: perform a full scan in gc

2016-10-18 Thread Nicolas Dichtel

Le 18/10/2016 à 10:47, Florian Westphal a écrit :
> Nicolas Dichtel <nicolas.dich...@6wind.com> wrote:
>> After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
>> timed-out entries"), netlink conntrack deletion events may be sent with a
>> huge delay (5 minutes).
>>
>> There is two ways to evict conntrack:
>>  - during a conntrack lookup;
>>  - during a conntrack dump.
>> Let's do a full scan of conntrack entries after a period of inactivity
>> (no conntrack lookup).
>>
>> CC: Florian Westphal <f...@strlen.de>
>> Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
>> ---
>>
>> Here is another proposal to try to fix the problem.
>> Comments are welcomed,
>> Nicolas
> 
> Hmm, I don't think its good idea in practice.
> If goal is to avoid starving arbitrary 'dead' ct for too long,
> then simple ping will defeat the logic here, because...
> 
>>  net/netfilter/nf_conntrack_core.c | 11 +--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/netfilter/nf_conntrack_core.c 
>> b/net/netfilter/nf_conntrack_core.c
>> index ba6a1d421222..3dbb27bd9582 100644
>> --- a/net/netfilter/nf_conntrack_core.c
>> +++ b/net/netfilter/nf_conntrack_core.c
>> @@ -87,6 +87,7 @@ static __read_mostly bool nf_conntrack_locks_all;
>>  #define GC_MAX_BUCKETS  8192u
>>  #define GC_INTERVAL (5 * HZ)
>>  #define GC_MAX_EVICTS   256u
>> +static bool gc_full_scan = true;
>>  
>>  static struct conntrack_gc_work conntrack_gc_work;
>>  
>> @@ -511,6 +512,7 @@ nf_conntrack_find(struct net *net, const struct 
>> nf_conntrack_zone *zone,
>>  unsigned int bucket, hsize;
>>  
>>  begin:
>> +gc_full_scan = false;
> 
> ... we do periodic lookup (but always in same slot), so no full scan is
> triggered.
Yes, I was wondering about that. My first idea was to have that bool per bucket
and force a scan of the bucket instead of the whole table.

> 
> If you think its useful, consider sending patch that rescheds worker
> instantly in case budget expired, otherwise I will do this later this
> week.
Ok, I will send it, but it does not address the "inactivity" problem.

> 
> [ I am aware doing instant restart might be too late, but at least we
>   would then reap more entries once we stumble upon large number of
>   expired ones ].
>

[PATCH net] conntrack: perform a full scan in gc

2016-10-18 Thread Nicolas Dichtel

After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
timed-out entries"), netlink conntrack deletion events may be sent with a
huge delay (5 minutes).

There is two ways to evict conntrack:
 - during a conntrack lookup;
 - during a conntrack dump.
Let's do a full scan of conntrack entries after a period of inactivity
(no conntrack lookup).

CC: Florian Westphal <f...@strlen.de>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---

Here is another proposal to try to fix the problem.
Comments are welcomed,
Nicolas

 net/netfilter/nf_conntrack_core.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index ba6a1d421222..3dbb27bd9582 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -87,6 +87,7 @@ static __read_mostly bool nf_conntrack_locks_all;
 #define GC_MAX_BUCKETS 8192u
 #define GC_INTERVAL(5 * HZ)
 #define GC_MAX_EVICTS  256u
+static bool gc_full_scan = true;
 
 static struct conntrack_gc_work conntrack_gc_work;
 
@@ -511,6 +512,7 @@ nf_conntrack_find(struct net *net, const struct 
nf_conntrack_zone *zone,
unsigned int bucket, hsize;
 
 begin:
+   gc_full_scan = false;
nf_conntrack_get_ht(_hash, );
bucket = reciprocal_scale(hash, hsize);
 
@@ -942,7 +944,11 @@ static void gc_worker(struct work_struct *work)
 
gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
 
-   goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, 
GC_MAX_BUCKETS);
+   if (gc_full_scan)
+   goal = nf_conntrack_htable_size;
+   else
+   goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV,
+  GC_MAX_BUCKETS);
i = gc_work->last_bucket;
 
do {
@@ -977,7 +983,8 @@ static void gc_worker(struct work_struct *work)
rcu_read_unlock();
cond_resched_rcu_qs();
} while (++buckets < goal &&
-expired_count < GC_MAX_EVICTS);
+(gc_full_scan || expired_count < GC_MAX_EVICTS));
+   gc_full_scan = true;
 
if (gc_work->exiting)
return;
-- 
2.8.1

Re: [PATCH net 2/2] conntrack: enable to tune gc parameters

2016-10-14 Thread Nicolas Dichtel

Le 13/10/2016 à 22:43, Florian Westphal a écrit :
> Nicolas Dichtel <nicolas.dich...@6wind.com> wrote:
>> Le 10/10/2016 à 16:04, Florian Westphal a écrit :
>>> Nicolas Dichtel <nicolas.dich...@6wind.com> wrote:
>>>> After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
>>>> timed-out entries"), netlink conntrack deletion events may be sent with a
>>>> huge delay. It could be interesting to let the user tweak gc parameters
>>>> depending on its use case.
>>>
>>> Hmm, care to elaborate?
>>>
>>> I am not against doing this but I'd like to hear/read your use case.
>>>
>>> The expectation is that in almot all cases eviction will happen from
>>> packet path.  The gc worker is jusdt there for case where a busy system
>>> goes idle.
>> It was precisely that case. After a period of activity, the event is sent a 
>> long
>> time after the timeout. If the router does not manage a lot of flows, why not
>> trying to parse more entries instead of the default 1/64 of the table?
>> In fact, I don't understand why using GC_MAX_BUCKETS_DIV instead of using 
>> always
>> GC_MAX_BUCKETS whatever the size of the table is.
> 
> I wanted to make sure that we have a known upper bound on the number of
> buckets we process so that we do not block other pending kworker items
> for too long.
I don't understand. GC_MAX_BUCKETS is the upper bound and I agree that it is
needed. But why GC_MAX_BUCKETS_DIV (ie 1/64)?
In other words, why this line:
goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS);
instead of:
goal = GC_MAX_BUCKETS;
?

> 
> (Or cause too many useless scans)
> 
> Another idea worth trying might be to get rid of the max cap and
> instead break early in case too many jiffies expired.
> 
> I don't want to add sysctl knobs for this unless absolutely needed; its 
> already
> possible to 'force' eviction cycle by running 'conntrack -L'.
> 
Sure, but this is not a "real" solution, just a workaround.
We need to find a way to deliver conntrack deletion events in a reasonable
delay, whatever the traffic on the machine is.

[PATCH net] ipv6: correctly add local routes when lo goes up

2016-10-12 Thread Nicolas Dichtel

The goal of the patch is to fix this scenario:
 ip link add dummy1 type dummy
 ip link set dummy1 up
 ip link set lo down ; ip link set lo up

After that sequence, the local route to the link layer address of dummy1 is
not there anymore.

When the loopback is set down, all local routes are deleted by
addrconf_ifdown()/rt6_ifdown(). At this time, the rt6_info entry still
exists, because the corresponding idev has a reference on it. After the rcu
grace period, dst_rcu_free() is called, and thus ___dst_free(), which will
set obsolete to DST_OBSOLETE_DEAD.

In this case, init_loopback() is called before dst_rcu_free(), thus
obsolete is still sets to something <= 0. So, the function doesn't add the
route again. To avoid that race, let's check the rt6 refcnt instead.

Fixes: 25fb6ca4ed9c ("net IPv6 : Fix broken IPv6 routing table after loopback 
down-up")
Fixes: a881ae1f625c ("ipv6: don't call addrconf_dst_alloc again when enable lo")
Fixes: 33d99113b110 ("ipv6: reallocate addrconf router for ipv6 address when lo 
device up")
Reported-by: Francesco Santoro <francesco.sant...@6wind.com>
Reported-by: Samuel Gauthier <samuel.gauth...@6wind.com>
CC: Balakumaran Kannan <balakumaran.kan...@ap.sony.com>
CC: Maruthi Thotad <maruthi.tho...@ap.sony.com>
CC: Sabrina Dubroca <s...@queasysnail.net>
CC: Hannes Frederic Sowa <han...@stressinduktion.org>
CC: Weilong Chen <chenweil...@huawei.com>
CC: Gao feng <gaof...@cn.fujitsu.com>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 net/ipv6/addrconf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index d8983e15f859..9faafe58516a 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3018,7 +3018,7 @@ static void init_loopback(struct net_device *dev)
 * lo device down, release this obsolete dst and
 * reallocate a new router for ifa.
 */
-   if (sp_ifa->rt->dst.obsolete > 0) {
+   if (!atomic_read(_ifa->rt->rt6i_ref)) {
ip6_rt_put(sp_ifa->rt);
sp_ifa->rt = NULL;
} else {
-- 
2.8.1

Re: [PATCH net 2/2] conntrack: enable to tune gc parameters

2016-10-10 Thread Nicolas Dichtel

Le 10/10/2016 à 16:04, Florian Westphal a écrit :
> Nicolas Dichtel <nicolas.dich...@6wind.com> wrote:
>> After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
>> timed-out entries"), netlink conntrack deletion events may be sent with a
>> huge delay. It could be interesting to let the user tweak gc parameters
>> depending on its use case.
> 
> Hmm, care to elaborate?
> 
> I am not against doing this but I'd like to hear/read your use case.
> 
> The expectation is that in almot all cases eviction will happen from
> packet path.  The gc worker is jusdt there for case where a busy system
> goes idle.
It was precisely that case. After a period of activity, the event is sent a long
time after the timeout. If the router does not manage a lot of flows, why not
trying to parse more entries instead of the default 1/64 of the table?
In fact, I don't understand why using GC_MAX_BUCKETS_DIV instead of using always
GC_MAX_BUCKETS whatever the size of the table is.

> 
>> +nf_conntrack_gc_max_evicts - INTEGER
>> +The maximum number of entries to be evicted during a run of gc.
>> +This sysctl is only writeable in the initial net namespace.
> 
> Hmmm, do you have any advice on sizing this one?
In fact, no ;-)
I really hesitate to expose the four values or just a subset. My goal was also
to get feedback. I can remove this one.

> 
> I think a better change might be (instead of adding htis knob) to
> resched the gc worker for immediate re-executaion in case the entire
> "budget" was used.  What do you think?
Even if it's not directly related to my problem, I think it's a good idea.

> 
> 
> diff --git a/net/netfilter/nf_conntrack_core.c 
> b/net/netfilter/nf_conntrack_core.c
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -983,7 +983,7 @@ static void gc_worker(struct work_struct *work)
> return;
>  
> ratio = scanned ? expired_count * 100 / scanned : 0;
> -   if (ratio >= 90)
> +   if (ratio >= 90 || expired_count == GC_MAX_EVICTS)
> next_run = 0;
>

[PATCH net 0/2] conntrack update

2016-10-10 Thread Nicolas Dichtel

The first patch is a small documentation fix.
The second patch adds more flexibility to configure the conntrack gc. I target
it specifically to net and not net-next because the removal of the conntrack
timer has just land into net.

 Documentation/networking/nf_conntrack-sysctl.txt | 35 +++
 include/net/netfilter/nf_conntrack_core.h|  5 
 net/netfilter/nf_conntrack_core.c| 17 +--
 net/netfilter/nf_conntrack_standalone.c  | 36 
 4 files changed, 67 insertions(+), 26 deletions(-)

Comments are welcomed,
Regards,
Nicolas

[PATCH net 1/2] conntrack: remove obsolete sysctl (nf_conntrack_events_retry_timeout)

2016-10-10 Thread Nicolas Dichtel

This entry has been removed in commit 9500507c6138.

Fixes: 9500507c6138 ("netfilter: conntrack: remove timer from ecache extension")
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 Documentation/networking/nf_conntrack-sysctl.txt | 18 --
 1 file changed, 18 deletions(-)

diff --git a/Documentation/networking/nf_conntrack-sysctl.txt 
b/Documentation/networking/nf_conntrack-sysctl.txt
index 4fb51d32fccc..399e4e866a9c 100644
--- a/Documentation/networking/nf_conntrack-sysctl.txt
+++ b/Documentation/networking/nf_conntrack-sysctl.txt
@@ -33,24 +33,6 @@ nf_conntrack_events - BOOLEAN
If this option is enabled, the connection tracking code will
provide userspace with connection tracking events via ctnetlink.
 
-nf_conntrack_events_retry_timeout - INTEGER (seconds)
-   default 15
-
-   This option is only relevant when "reliable connection tracking
-   events" are used.  Normally, ctnetlink is "lossy", that is,
-   events are normally dropped when userspace listeners can't keep up.
-
-   Userspace can request "reliable event mode".  When this mode is
-   active, the conntrack will only be destroyed after the event was
-   delivered.  If event delivery fails, the kernel periodically
-   re-tries to send the event to userspace.
-
-   This is the maximum interval the kernel should use when re-trying
-   to deliver the destroy event.
-
-   A higher number means there will be fewer delivery retries and it
-   will take longer for a backlog to be processed.
-
 nf_conntrack_expect_max - INTEGER
Maximum size of expectation table.  Default value is
nf_conntrack_buckets / 256. Minimum is 1.
-- 
2.8.1

[PATCH net 2/2] conntrack: enable to tune gc parameters

2016-10-10 Thread Nicolas Dichtel

After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
timed-out entries"), netlink conntrack deletion events may be sent with a
huge delay. It could be interesting to let the user tweak gc parameters
depending on its use case.

CC: Florian Westphal <f...@strlen.de>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 Documentation/networking/nf_conntrack-sysctl.txt | 17 +++
 include/net/netfilter/nf_conntrack_core.h|  5 
 net/netfilter/nf_conntrack_core.c| 17 +--
 net/netfilter/nf_conntrack_standalone.c  | 36 
 4 files changed, 67 insertions(+), 8 deletions(-)

diff --git a/Documentation/networking/nf_conntrack-sysctl.txt 
b/Documentation/networking/nf_conntrack-sysctl.txt
index 399e4e866a9c..5b6ace93521d 100644
--- a/Documentation/networking/nf_conntrack-sysctl.txt
+++ b/Documentation/networking/nf_conntrack-sysctl.txt
@@ -37,6 +37,23 @@ nf_conntrack_expect_max - INTEGER
Maximum size of expectation table.  Default value is
nf_conntrack_buckets / 256. Minimum is 1.
 
+nf_conntrack_gc_interval - INTEGER
+   Maximum interval in second between two run of the conntrack gc. This
+   gc is in charge of removing stale entries. It also impacts the delay
+   before notifying the userland a conntrack deletion.
+   This sysctl is only writeable in the initial net namespace.
+
+nf_conntrack_gc_max_buckets - INTEGER
+nf_conntrack_gc_max_buckets_div - INTEGER
+   During a run, the conntrack gc processes at maximum
+   nf_conntrack_buckets/nf_conntrack_gc_max_buckets_div (and never more
+   than nf_conntrack_gc_max_buckets) entries.
+   These sysctl are only writeable in the initial net namespace.
+
+nf_conntrack_gc_max_evicts - INTEGER
+   The maximum number of entries to be evicted during a run of gc.
+   This sysctl is only writeable in the initial net namespace.
+
 nf_conntrack_frag6_high_thresh - INTEGER
default 262144
 
diff --git a/include/net/netfilter/nf_conntrack_core.h 
b/include/net/netfilter/nf_conntrack_core.h
index 62e17d1319ff..2a5ed368fb71 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -86,4 +86,9 @@ void nf_conntrack_lock(spinlock_t *lock);
 
 extern spinlock_t nf_conntrack_expect_lock;
 
+extern unsigned int nf_ct_gc_interval;
+extern unsigned int nf_ct_gc_max_buckets_div;
+extern unsigned int nf_ct_gc_max_buckets;
+extern unsigned int nf_ct_gc_max_evicts;
+
 #endif /* _NF_CONNTRACK_CORE_H */
diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index ba6a1d421222..435b431e3449 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -83,10 +83,10 @@ static __read_mostly spinlock_t nf_conntrack_locks_all_lock;
 static __read_mostly DEFINE_SPINLOCK(nf_conntrack_locks_all_lock);
 static __read_mostly bool nf_conntrack_locks_all;
 
-#define GC_MAX_BUCKETS_DIV 64u
-#define GC_MAX_BUCKETS 8192u
-#define GC_INTERVAL(5 * HZ)
-#define GC_MAX_EVICTS  256u
+unsigned int nf_ct_gc_interval = 5 * HZ;
+unsigned int nf_ct_gc_max_buckets = 8192;
+unsigned int nf_ct_gc_max_buckets_div = 64;
+unsigned int nf_ct_gc_max_evicts = 256;
 
 static struct conntrack_gc_work conntrack_gc_work;
 
@@ -936,13 +936,14 @@ static noinline int early_drop(struct net *net, unsigned 
int _hash)
 static void gc_worker(struct work_struct *work)
 {
unsigned int i, goal, buckets = 0, expired_count = 0;
-   unsigned long next_run = GC_INTERVAL;
+   unsigned long next_run = nf_ct_gc_interval;
unsigned int ratio, scanned = 0;
struct conntrack_gc_work *gc_work;
 
gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
 
-   goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, 
GC_MAX_BUCKETS);
+   goal = min(nf_conntrack_htable_size / nf_ct_gc_max_buckets_div,
+  nf_ct_gc_max_buckets);
i = gc_work->last_bucket;
 
do {
@@ -977,7 +978,7 @@ static void gc_worker(struct work_struct *work)
rcu_read_unlock();
cond_resched_rcu_qs();
} while (++buckets < goal &&
-expired_count < GC_MAX_EVICTS);
+expired_count < nf_ct_gc_max_evicts);
 
if (gc_work->exiting)
return;
@@ -1885,7 +1886,7 @@ int nf_conntrack_init_start(void)
nf_ct_untracked_status_or(IPS_CONFIRMED | IPS_UNTRACKED);
 
conntrack_gc_work_init(_gc_work);
-   schedule_delayed_work(_gc_work.dwork, GC_INTERVAL);
+   schedule_delayed_work(_gc_work.dwork, nf_ct_gc_interval);
 
return 0;
 
diff --git a/net/netfilter/nf_conntrack_standalone.c 
b/net/netfilter/nf_conntrack_standalone.c
index 5f446cd9f3fd..c5310fb35eca 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack

[PATCH net] vti6: flush x-netns xfrm cache when vti interface is removed

2016-09-30 Thread Nicolas Dichtel

This is the same fix than commit a5d0dc810abf ("vti: flush x-netns xfrm
cache when vti interface is removed")

This patch fixes a refcnt problem when a x-netns vti6 interface is removed:
unregister_netdevice: waiting for vti6_test to become free. Usage count = 1

Here is a script to reproduce the problem:

ip link set dev ntfp2 up
ip addr add dev ntfp2 2001::1/64
ip link add vti6_test type vti6 local 2001::1 remote 2001::2 key 1
ip netns add secure
ip link set vti6_test netns secure
ip netns exec secure ip link set vti6_test up
ip netns exec secure ip link s lo up
ip netns exec secure ip addr add dev vti6_test 2003::1/64
ip -6 xfrm policy add dir out tmpl src 2001::1 dst 2001::2 proto esp \
   mode tunnel mark 1
ip -6 xfrm policy add dir in tmpl src 2001::2 dst 2001::1 proto esp \
   mode tunnel mark 1
ip xfrm state add src 2001::1 dst 2001::2 proto esp spi 1 mode tunnel \
   enc des3_ede 0x112233445566778811223344556677881122334455667788 mark 
1
ip xfrm state add src 2001::2 dst 2001::1 proto esp spi 1 mode tunnel \
   enc des3_ede 0x112233445566778811223344556677881122334455667788 mark 
1
ip netns exec secure  ping6 -c 4 2003::2
ip netns del secure

CC: Lance Richardson <lrich...@redhat.com>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 net/ipv6/ip6_vti.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 5bd3afdcc771..cc7f7e9a8e8d 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -1138,6 +1138,33 @@ static struct xfrm6_protocol vti_ipcomp6_protocol 
__read_mostly = {
.priority   =   100,
 };
 
+static bool is_vti6_tunnel(const struct net_device *dev)
+{
+   return dev->netdev_ops == _netdev_ops;
+}
+
+static int vti6_device_event(struct notifier_block *unused,
+unsigned long event, void *ptr)
+{
+   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+   struct ip6_tnl *t = netdev_priv(dev);
+
+   if (!is_vti6_tunnel(dev))
+   return NOTIFY_DONE;
+
+   switch (event) {
+   case NETDEV_DOWN:
+   if (!net_eq(t->net, dev_net(dev)))
+   xfrm_garbage_collect(t->net);
+   break;
+   }
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block vti6_notifier_block __read_mostly = {
+   .notifier_call = vti6_device_event,
+};
+
 /**
  * vti6_tunnel_init - register protocol and reserve needed resources
  *
@@ -1148,6 +1175,8 @@ static int __init vti6_tunnel_init(void)
const char *msg;
int err;
 
+   register_netdevice_notifier(_notifier_block);
+
msg = "tunnel device";
err = register_pernet_device(_net_ops);
if (err < 0)
@@ -1180,6 +1209,7 @@ xfrm_proto_ah_failed:
 xfrm_proto_esp_failed:
unregister_pernet_device(_net_ops);
 pernet_dev_failed:
+   unregister_netdevice_notifier(_notifier_block);
pr_err("vti6 init: failed to register %s\n", msg);
return err;
 }
@@ -1194,6 +1224,7 @@ static void __exit vti6_tunnel_cleanup(void)
xfrm6_protocol_deregister(_ah6_protocol, IPPROTO_AH);
xfrm6_protocol_deregister(_esp6_protocol, IPPROTO_ESP);
unregister_pernet_device(_net_ops);
+   unregister_netdevice_notifier(_notifier_block);
 }
 
 module_init(vti6_tunnel_init);
-- 
2.8.1

[PATCH net v2] i40e: fix call of ndo_dflt_bridge_getlink()

2016-09-26 Thread Nicolas Dichtel

From: Huaibin Wang <huaibin.w...@6wind.com>

Order of arguments is wrong.
The wrong code has been introduced by commit 7d4f8d871ab1, but is compiled
only since commit 9df70b66418e.

Note that this may break netlink dumps.

Fixes: 9df70b66418e ("i40e: Remove incorrect #ifdef's")
Fixes: 7d4f8d871ab1 ("switchdev; add VLAN support for port's bridge_getlink")
CC: Carolyn Wyborny <carolyn.wybo...@intel.com>
Signed-off-by: Huaibin Wang <huaibin.w...@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---

v1 -> v2
 - remove CC
 - rebase on the last net tree

 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index d0b3a1bb82ca..2843f6cae97a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9012,7 +9012,7 @@ static int i40e_ndo_bridge_getlink(struct sk_buff *skb, 
u32 pid, u32 seq,
return 0;
 
return ndo_dflt_bridge_getlink(skb, pid, seq, dev, veb->bridge_mode,
-  nlflags, 0, 0, filter_mask, NULL);
+  0, 0, nlflags, filter_mask, NULL);
 }
 
 /* Hardware supports L4 tunnel length of 128B (=2^7) which includes
-- 
2.8.1

Re: [PATCH net] i40e: fix call of ndo_dflt_bridge_getlink()

2016-09-26 Thread Nicolas Dichtel

Le 23/09/2016 à 23:37, Jeff Kirsher a écrit :
[snip]
> Yes, it needs to go through my tree.   Please send it to intel-wired-lan@li
> sts.osuosl.org mailing list, that way I can track it through our patchwork
Ok, will do. I didn't send it to this list the first time because this list is
marked as "moderated for non-subscribers" in the MAINTAINERS file.

Re: [PATCH net] i40e: fix call of ndo_dflt_bridge_getlink()

2016-09-23 Thread Nicolas Dichtel

Le 19/09/2016 à 18:14, Nicolas Dichtel a écrit :
> From: Huaibin Wang <huaibin.w...@6wind.com>
> 
> Order of arguments is wrong.
> The wrong code has been introduced by commit 7d4f8d871ab1, but is compiled
> only since commit 9df70b66418e.
> 
> Note that this may break netlink dumps.
> 
> Fixes: 9df70b66418e ("i40e: Remove incorrect #ifdef's")
> Fixes: 7d4f8d871ab1 ("switchdev; add VLAN support for port's bridge_getlink")
> CC: Scott Feldman <sfel...@gmail.com>
> CC: Carolyn Wyborny <carolyn.wybo...@intel.com>
> CC: Catherine Sullivan <catherine.sulli...@intel.com>
> Signed-off-by: Huaibin Wang <huaibin.w...@6wind.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
Hi Jeff,

any news about this patch? David has marked it "awaiting upstream" on the
patchwork, so I understand it should go to your tree.


Thank you,
Nicolas

Re: [PATCH v3] iproute2: build nsid-name cache only for commands that need it

2016-09-20 Thread Nicolas Dichtel

Le 20/09/2016 à 08:01, Anton Aksola a écrit :
> The calling of netns_map_init() before command parsing introduced
> a performance issue with large number of namespaces.
> 
> As commands such as add, del and exec do not need to iterate through
> /var/run/netns it would be good not no build the cache before executing
> these commands.
> 
> Example:
> unpatched:
> time seq 1 1000 | xargs -n 1 ip netns add
> 
> real0m16.832s
> user0m1.350s
> sys0m15.029s
> 
> patched:
> time seq 1 1000 | xargs -n 1 ip netns add
> 
> real0m3.859s
> user0m0.132s
> sys0m3.205s
> 
> Signed-off-by: Anton Aksola <aa...@iki.fi>
Acked-by: Nicolas Dichtel <nicolas.dich...@6wind.com>

Re: [PATCH iproute2] ipmonitor: fix ip monitor can't work when NET_NS is not enabled

2016-09-20 Thread Nicolas Dichtel

Le 20/09/2016 à 11:09, Liping Zhang a écrit :
> From: Liping Zhang <liping.zh...@spreadtrum.com>
> 
> In ip monitor, netns_map_init will check getnsid is supported or not.
> But when /proc/self/ns/net does not exist, we just print out error
> messages and exit. So user cannot use ip monitor anymore when
> CONFIG_NET_NS is disabled:
>   # ip monitor
>   open("/proc/self/ns/net"): No such file or directory
> 
> If open "/proc/self/ns/net" failed, set have_rtnl_getnsid to false.
> 
> Fixes: d652ccbf8195 ("netns: allow to dump and monitor nsid")
> Signed-off-by: Liping Zhang <liping.zh...@spreadtrum.com>
Acked-by: Nicolas Dichtel <nicolas.dich...@6wind.com>

[PATCH net] i40e: fix call of ndo_dflt_bridge_getlink()

2016-09-19 Thread Nicolas Dichtel

From: Huaibin Wang <huaibin.w...@6wind.com>

Order of arguments is wrong.
The wrong code has been introduced by commit 7d4f8d871ab1, but is compiled
only since commit 9df70b66418e.

Note that this may break netlink dumps.

Fixes: 9df70b66418e ("i40e: Remove incorrect #ifdef's")
Fixes: 7d4f8d871ab1 ("switchdev; add VLAN support for port's bridge_getlink")
CC: Scott Feldman <sfel...@gmail.com>
CC: Carolyn Wyborny <carolyn.wybo...@intel.com>
CC: Catherine Sullivan <catherine.sulli...@intel.com>
Signed-off-by: Huaibin Wang <huaibin.w...@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index d0b3a1bb82ca..2843f6cae97a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9012,7 +9012,7 @@ static int i40e_ndo_bridge_getlink(struct sk_buff *skb, 
u32 pid, u32 seq,
return 0;
 
return ndo_dflt_bridge_getlink(skb, pid, seq, dev, veb->bridge_mode,
-  nlflags, 0, 0, filter_mask, NULL);
+  0, 0, nlflags, filter_mask, NULL);
 }
 
 /* Hardware supports L4 tunnel length of 128B (=2^7) which includes
-- 
2.8.1

[PATCH net] vti6: fix input path

2016-09-19 Thread Nicolas Dichtel

Since commit 1625f4529957, vti6 is broken, all input packets are dropped
(LINUX_MIB_XFRMINNOSTATES is incremented).

XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 is set by vti6_rcv() before calling
xfrm6_rcv()/xfrm6_rcv_spi(), thus we cannot set to NULL that value in
xfrm6_rcv_spi().

A new function xfrm6_rcv_tnl() that enables to pass a value to
xfrm6_rcv_spi() is added, so that xfrm6_rcv() is not touched (this function
is used in several handlers).

CC: Alexey Kodanev <alexey.koda...@oracle.com>
Fixes: 1625f4529957 ("net/xfrm_input: fix possible NULL deref of 
tunnel.ip6->parms.i_key")
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 include/net/xfrm.h  |  4 +++-
 net/ipv6/ip6_vti.c  |  4 +---
 net/ipv6/xfrm6_input.c  | 16 +++-
 net/ipv6/xfrm6_tunnel.c |  2 +-
 4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index adfebd6f243c..17934312eecb 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1540,8 +1540,10 @@ int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, 
unsigned short family);
 void xfrm4_local_error(struct sk_buff *skb, u32 mtu);
 int xfrm6_extract_header(struct sk_buff *skb);
 int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb);
-int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi);
+int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi,
+ struct ip6_tnl *t);
 int xfrm6_transport_finish(struct sk_buff *skb, int async);
+int xfrm6_rcv_tnl(struct sk_buff *skb, struct ip6_tnl *t);
 int xfrm6_rcv(struct sk_buff *skb);
 int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
 xfrm_address_t *saddr, u8 proto);
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d90a11f14040..9dfb676c08af 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -321,11 +321,9 @@ static int vti6_rcv(struct sk_buff *skb)
goto discard;
}
 
-   XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 = t;
-
rcu_read_unlock();
 
-   return xfrm6_rcv(skb);
+   return xfrm6_rcv_tnl(skb, t);
}
rcu_read_unlock();
return -EINVAL;
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 00a2d40677d6..b5789562aded 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -21,9 +21,10 @@ int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff 
*skb)
return xfrm6_extract_header(skb);
 }
 
-int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
+int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi,
+ struct ip6_tnl *t)
 {
-   XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 = NULL;
+   XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 = t;
XFRM_SPI_SKB_CB(skb)->family = AF_INET6;
XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct ipv6hdr, daddr);
return xfrm_input(skb, nexthdr, spi, 0);
@@ -49,13 +50,18 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
return -1;
 }
 
-int xfrm6_rcv(struct sk_buff *skb)
+int xfrm6_rcv_tnl(struct sk_buff *skb, struct ip6_tnl *t)
 {
return xfrm6_rcv_spi(skb, skb_network_header(skb)[IP6CB(skb)->nhoff],
-0);
+0, t);
 }
-EXPORT_SYMBOL(xfrm6_rcv);
+EXPORT_SYMBOL(xfrm6_rcv_tnl);
 
+int xfrm6_rcv(struct sk_buff *skb)
+{
+   return xfrm6_rcv_tnl(skb, NULL);
+}
+EXPORT_SYMBOL(xfrm6_rcv);
 int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
 xfrm_address_t *saddr, u8 proto)
 {
diff --git a/net/ipv6/xfrm6_tunnel.c b/net/ipv6/xfrm6_tunnel.c
index 5743044cd660..e1c0bbe7996c 100644
--- a/net/ipv6/xfrm6_tunnel.c
+++ b/net/ipv6/xfrm6_tunnel.c
@@ -236,7 +236,7 @@ static int xfrm6_tunnel_rcv(struct sk_buff *skb)
__be32 spi;
 
spi = xfrm6_tunnel_spi_lookup(net, (const xfrm_address_t *)>saddr);
-   return xfrm6_rcv_spi(skb, IPPROTO_IPV6, spi);
+   return xfrm6_rcv_spi(skb, IPPROTO_IPV6, spi, NULL);
 }
 
 static int xfrm6_tunnel_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
-- 
2.8.1

Re: [PATCH v2] iproute2: build nsid-name cache only for commands that need it

2016-09-16 Thread Nicolas Dichtel

Le 16/09/2016 à 15:18, Anton Aksola a écrit :
[snip]
> Nicolas,
> This seems to be caused by netns_add calling unshare(CLONE_NEWNET).
> If we initialize the socket for nsid after that it doesn't seem to work.
> 
> Unfortunately I'm not an expert in these details. Should we separate the
> socket and cache initialization to different functions and call the
> socket init in the beginning of do_netns() as before? What do you think?
Seems good.

> I made a quick patch and it seems to work in batch mode now.
What is the result of that sequence?

$ ip netns add bar
$ ip netns set bar 5678
$ ip -b test.batch
nsid 1234 (iproute2 netns name: foo)
nsid 5678 (iproute2 netns name: bar)
$

Re: [PATCH v2] iproute2: build nsid-name cache only for commands that need it

2016-09-16 Thread Nicolas Dichtel

Le 16/09/2016 à 11:23, Vadim Kochan a écrit :
[snip]
> Would it be useful to add test for this case into testsuite/ ?
Yes, it's a good idea.

Regards,
Nicolas

Re: [PATCH v2] iproute2: build nsid-name cache only for commands that need it

2016-09-16 Thread Nicolas Dichtel

Le 16/09/2016 à 09:22, Anton Aksola a écrit :
> The calling of netns_map_init() before command parsing introduced
> a performance issue with large number of namespaces.
> 
> As commands such as add, del and exec do not need to iterate through
> /var/run/netns it would be good not no build the cache before executing
> these commands.
> 
> Example:
> unpatched:
> time seq 1 1000 | xargs -n 1 ip netns add
> 
> real0m16.832s
> user0m1.350s
> sys0m15.029s
> 
> patched:
> time seq 1 1000 | xargs -n 1 ip netns add
> 
> real0m3.859s
> user0m0.132s
> sys0m3.205s
> 
> Signed-off-by: Anton Aksola 
> ---
There is still some differences:
$ cat test.batch
netns add foo
netns set foo 1234
netns list-id

Before your patch:
$ ip -b test.batch
nsid 1234 (iproute2 netns name: foo)

After your patch:
$ ip -b test.batch
nsid 1234

Re: [PATCH] iproute2: build nsid-name cache only for commands that need it

2016-09-15 Thread Nicolas Dichtel

Le 15/09/2016 à 10:23, Anton Aksola a écrit :
[snip]
> --- a/ip/ipnetns.c
> +++ b/ip/ipnetns.c
> @@ -775,8 +775,6 @@ static int netns_monitor(int argc, char **argv)
>  
>  int do_netns(int argc, char **argv)
>  {
> - netns_map_init();
> -
>   if (argc < 1)
>   return netns_list(0, NULL);
>  
> @@ -784,8 +782,10 @@ int do_netns(int argc, char **argv)
>   (matches(*argv, "lst") == 0))
>   return netns_list(argc-1, argv+1);
>  
> - if ((matches(*argv, "list-id") == 0))
> + if ((matches(*argv, "list-id") == 0)) {
> + netns_map_init();
>   return netns_list_id(argc-1, argv+1);
> + }
'ip netns' (ip netns list) also need it.

Re: [PATCH net-next] tcp: use an RB tree for ooo receive queue

2016-09-08 Thread Nicolas Dichtel

Le 08/09/2016 à 13:02, Ilpo Järvinen a écrit :
[snip]
> Acked-By: Ilpo Järvinen 
Also nitpicking: the end of the email address is missing ;-)

Re: [RFC 2/9] ipv6: sr: add code base for control plane support of SR-IPv6

2016-09-05 Thread Nicolas Dichtel

Le 01/09/2016 à 15:21, David Lebrun a écrit :
> On 08/29/2016 05:31 PM, Roopa Prabhu wrote:
>> This looks fine. But, i am just trying to see if this can be rtnetlink.
>> Have you considered it already ?.
>> We would like to keep the API consistent or abstracted to accommodate 
>> SR-MPLS in the
>> future too. so, any abstraction there will help.
>>
>> what is your control-plane software using this ?
>> quagga or any-other routing daemon ?
>>
>> Thanks,
>> Roopa
>>
> 
> rtnetlink is used for the routing part through the lwtunnels. In this
> specific patch, the API is intended to configure namespace-wide
> parameters such as a map of hmackeyid -> hashfn,secret, without
> connection to particular routes. I agree that the tunsrc parameter could
> be provided through rtnetlink, but the hmac data cannot (which by the
> way does not exist for SR-MPLS).
But it's closely related to routes. You can add a new family RTM_foo. It will be
easier to implement userspace part (only one netlink family is involved).


Regards,
Nicolas

Re: [Patch net-next 2/2] netns: avoid disabling irq for netns id

2016-09-04 Thread Nicolas Dichtel

Le 02/09/2016 à 19:24, Cong Wang a écrit :
> On Fri, Sep 2, 2016 at 9:39 AM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
>> On Fri, Sep 2, 2016 at 1:12 AM, Nicolas Dichtel
>> <nicolas.dich...@6wind.com> wrote:
>>> Le 02/09/2016 à 06:53, Cong Wang a écrit :
>>>> We never read or change netns id in hardirq context,
>>>> the only place we read netns id in softirq context
>>>> is in vxlan_xmit(). So, it should be enough to just
>>>> disable BH.
>>>
>>> Are you sure? Did you audit all part of the code?
>>
>> I did audit all the callers, and I didn't find any of them in IRQ context.
>>
>>> peernet2id() is called from netlink core system (do_one_broadcast()). Are 
>>> you
>>> sure that no driver call this function from an hard irq context?
>>
>> I audit all callers of netlink_broadcast(), and I don't see any of
>> them in hardirq context.
> 
> Note, you can rule out most of them by checking GFP_KERNEL,
> which indicates a process context. ;) For GFP_ATOMIC cases,
> I don't see any of them in hardirq context either, but I am definitely
> not familiar with drivers like infiniband.
> 
Yes, I was thinking to that one. But I'm also not familiar with it ;-)

Re: [Patch net-next 1/2] vxlan: call peernet2id() in fdb notification

2016-09-02 Thread Nicolas Dichtel

Le 02/09/2016 à 06:53, Cong Wang a écrit :
> netns id should be already allocated each time we change
> netns, that is, in dev_change_net_namespace() (more precisely
> in rtnl_fill_ifinfo()). It is safe to just call peernet2id() here.
> 
> Cc: Nicolas Dichtel <nicolas.dich...@6wind.com>
> Signed-off-by: Cong Wang <xiyou.wangc...@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dich...@6wind.com>

Re: [Patch net-next 2/2] netns: avoid disabling irq for netns id

2016-09-02 Thread Nicolas Dichtel

Le 02/09/2016 à 06:53, Cong Wang a écrit :
> We never read or change netns id in hardirq context,
> the only place we read netns id in softirq context
> is in vxlan_xmit(). So, it should be enough to just
> disable BH.

Are you sure? Did you audit all part of the code?
peernet2id() is called from netlink core system (do_one_broadcast()). Are you
sure that no driver call this function from an hard irq context?

I think that NETLINK_LISTEN_ALL_NSID is largely untested, so it will be hard to
detect a bug introduced in this feature.

Re: [RFC 1/9] ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)

2016-09-01 Thread Nicolas Dichtel

Le 01/09/2016 à 15:11, David Lebrun a écrit :
> On 08/31/2016 04:51 PM, Nicolas Dichtel wrote:
[snip]
>>> +config IPV6_SEG6
>>> +   bool "IPv6: Segment Routing support"
>>> +   depends on IPV6
>>> +   ---help---
>>> + Experimental support for IPv6 Segment Routing dataplane as defined
>>> + in IETF draft-ietf-6man-segment-routing-header-01. This option
>>> + enables the processing of SR-enabled packets allowing the kernel
>>> + to act as a segment endpoint (intermediate or egress).
>>> +
>>> + If unsure, say N.
>>> +
>> I don't think that the option is needed. At the end, every distributions will
>> turn it on.
>>
> 
> Are you sure ? This is a rather specific feature, used in specific
> environments. Not that I would mind removing the option if it makes sense.
Check default config file of a standard distributions (ubuntu, debian, ...), all
options are enabled (MIP6, SIT_6RD, PIMSM_v2, etc. ;-)).
Anyway, one option is enough. You add 3 or 4 options in your series.

[snip]

>>> +
>>> +   ip6_route_input(skb);
>> The destination address has now changed and the packet is routed again.
>> skb->nfct is not updated, it is intentional? I'm asking me if it's 
>> conceptually
>> right.
>>
> 
> I fail to see any usecase where conntrack would run on SR-enabled
> packets. Things such as NAT would just defeat the purpose.

I've added netfilter folks in CC, it would be good to have feedback from them.

For new people, here is the thread:
http://www.spinics.net/lists/netdev/msg392031.html


Regards,
Nicolas

Re: [RFC 2/9] ipv6: sr: add code base for control plane support of SR-IPv6

2016-08-31 Thread Nicolas Dichtel

Le 26/08/2016 à 17:52, David Lebrun a écrit :
[snip]
> +#define SEG6_VERSION_MAJOR   0
> +#define SEG6_VERSION_MINOR   30
nit: This kind of macros are not used anymore upstream. The git sha1 or the
linux version perfectly identifies the version.

Re: [RFC 2/9] ipv6: sr: add code base for control plane support of SR-IPv6

2016-08-31 Thread Nicolas Dichtel

Le 29/08/2016 à 17:31, Roopa Prabhu a écrit :
> On 8/26/16, 8:52 AM, David Lebrun wrote:
>> This patch adds the necessary hooks and structures to provide support
>> for SR-IPv6 control plane, essentially the Generic Netlink commands
>> that will be used for userspace control over the Segment Routing
>> kernel structures.
>>
>> The genetlink commands provide control over two different structures:
>> tunnel source and HMAC data. The tunnel source is the source address
>> that will be used by default when encapsulating packets into an
>> outer IPv6 header + SRH. If the tunnel source is set to :: then an
>> address of the outgoing interface will be selected as the source.
>>
>> The HMAC commands currently just return ENOTSUPP and will be implemented
>> in a future patch.
>>
>> Signed-off-by: David Lebrun 
>>
> This looks fine. But, i am just trying to see if this can be rtnetlink.
I agree with Roopa, why not using rtnl?

Re: [RFC 1/9] ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)

2016-08-31 Thread Nicolas Dichtel

Le 26/08/2016 à 17:52, David Lebrun a écrit :
> Implement minimal support for processing of SR-enabled packets
> as described in
> https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-01.
> 
> This patch implements the following operations:
> - Intermediate segment endpoint: incrementation of active segment and 
> rerouting.
> - Egress for SR-encapsulated packets: decapsulation of outer IPv6 header + SRH
>   and routing of inner packet.
> - Cleanup flag support for SR-inlined packets: removal of SRH if we are the
>   penultimate segment endpoint.
> 
> A per-interface sysctl seg6_enabled is provided, to accept/deny SR-enabled
> packets. Default is deny.
> 
> This patch does not provide support for HMAC-signed packets.
> 
> Signed-off-by: David Lebrun 
> ---
Thanks for proposing this feature. It would be great to have it upstream.

[snip]
> +config IPV6_SEG6
> + bool "IPv6: Segment Routing support"
> + depends on IPV6
> + ---help---
> +   Experimental support for IPv6 Segment Routing dataplane as defined
> +   in IETF draft-ietf-6man-segment-routing-header-01. This option
> +   enables the processing of SR-enabled packets allowing the kernel
> +   to act as a segment endpoint (intermediate or egress).
> +
> +   If unsure, say N.
> +
I don't think that the option is needed. At the end, every distributions will
turn it on.

[snip]
> +#ifdef CONFIG_IPV6_SEG6
> + {
> + .procname   = "seg6_enabled",
> + .data   = _devconf.seg6_enabled,
> + .maxlen = sizeof(int),
> + .mode   = 0644,
> + .proc_handler   = proc_dointvec,
> + },
> +#endif
Don't forget to document this option in Documentation/networking/ip-sysctl.txt.
Don't forget to explain how 'all' works ;-)
It would be nice to also add it in netconf subsystem (see 'git grep netconf
net/ipv6').

[snip]
> +#ifdef CONFIG_IPV6_SEG6
> +static int ipv6_srh_rcv(struct sk_buff *skb)
> +{
> + struct inet6_skb_parm *opt = IP6CB(skb);
> + struct in6_addr *addr = NULL, *last_addr = NULL, *active_addr = NULL;
> + struct ipv6_sr_hdr *hdr;
> + struct net *net = dev_net(skb->dev);
> + int cleanup = 0;
> + struct inet6_dev *idev;
> + int accept_seg6;
nit: better to follow the 'reverse christmas tree' scheme when declaring 
variables.

> +
> + hdr = (struct ipv6_sr_hdr *)skb_transport_header(skb);
> +
> + idev = __in6_dev_get(skb->dev);
> +
> + accept_seg6 = net->ipv6.devconf_all->seg6_enabled;
> + if (accept_seg6 > idev->cnf.seg6_enabled)
> + accept_seg6 = idev->cnf.seg6_enabled;
> +
> + if (!accept_seg6) {
> + kfree_skb(skb);
> + return -1;
> + }
> +
> +looped_back:
> + last_addr = hdr->segments;
> +
> + if (hdr->segments_left > 0) {
> + if (hdr->nexthdr != NEXTHDR_IPV6 && hdr->segments_left == 1 &&
> + sr_get_flags(hdr) & SR6_FLAG_CLEANUP)
> + cleanup = 1;
> + } else {
> + if (hdr->nexthdr == NEXTHDR_IPV6) {
> + int offset = (hdr->hdrlen + 1) << 3;
> +
> + if (!pskb_pull(skb, offset)) {
> + kfree_skb(skb);
> + return -1;
> + }
> + skb_postpull_rcsum(skb, skb_transport_header(skb),
> +offset);
> +
> + skb_reset_network_header(skb);
> + skb_reset_transport_header(skb);
> + skb->encapsulation = 0;
> +
> + __skb_tunnel_rx(skb, skb->dev, net);
> +
> + netif_rx(skb);
> + return -1;
> + }
> +
> + opt->srcrt = skb_network_header_len(skb);
> + opt->lastopt = opt->srcrt;
> + skb->transport_header += (hdr->hdrlen + 1) << 3;
> + opt->nhoff = (>nexthdr) - skb_network_header(skb);
> +
> + return 1;
> + }
> +
> + if (skb_cloned(skb)) {
> + if (pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) {
> + __IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
> + IPSTATS_MIB_OUTDISCARDS);
> + kfree_skb(skb);
> + return -1;
> + }
> + }
> +
> + if (skb->ip_summed == CHECKSUM_COMPLETE)
> + skb->ip_summed = CHECKSUM_NONE;
> +
> + hdr = (struct ipv6_sr_hdr *)skb_transport_header(skb);
> +
> + active_addr = hdr->segments + hdr->segments_left;
> + hdr->segments_left--;
> + addr = hdr->segments + hdr->segments_left;
> +
> + ipv6_hdr(skb)->daddr = *addr;
> +
> + skb_push(skb, sizeof(struct ipv6hdr));
> +
> + /* cleanup */
> +
> + if (cleanup) {
> + int srhlen = (hdr->hdrlen + 1) << 3;
> + int nh = hdr->nexthdr;
> +
> +

[PATCH net v2 1/2] ipv6: add missing netconf notif when 'all' is updated

2016-08-30 Thread Nicolas Dichtel

The 'default' value was not advertised.

Fixes: f3a1bfb11ccb ("rtnl/ipv6: use netconf msg to advertise forwarding 
status")
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---

v2: fix a typo in the variable old_dflt

 net/ipv6/addrconf.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index f418d2eaeddd..2a688171a188 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -778,7 +778,14 @@ static int addrconf_fixup_forwarding(struct ctl_table 
*table, int *p, int newf)
}
 
if (p == >ipv6.devconf_all->forwarding) {
+   int old_dflt = net->ipv6.devconf_dflt->forwarding;
+
net->ipv6.devconf_dflt->forwarding = newf;
+   if ((!newf) ^ (!old_dflt))
+   inet6_netconf_notify_devconf(net, NETCONFA_FORWARDING,
+NETCONFA_IFINDEX_DEFAULT,
+net->ipv6.devconf_dflt);
+
addrconf_forward_change(net, newf);
if ((!newf) ^ (!old))
inet6_netconf_notify_devconf(net, NETCONFA_FORWARDING,
-- 
2.8.1

[PATCH net v2 2/2] netconf: add a notif when settings are created

2016-08-30 Thread Nicolas Dichtel

All changes are notified, but the initial state was missing.

Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---

v2: no change

 net/ipv4/devinet.c  | 11 +++
 net/ipv6/addrconf.c |  9 -
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 415e117967c7..062a67ca9a21 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2232,7 +2232,7 @@ static struct devinet_sysctl_table {
 };
 
 static int __devinet_sysctl_register(struct net *net, char *dev_name,
-   struct ipv4_devconf *p)
+int ifindex, struct ipv4_devconf *p)
 {
int i;
struct devinet_sysctl_table *t;
@@ -2255,6 +2255,8 @@ static int __devinet_sysctl_register(struct net *net, 
char *dev_name,
goto free;
 
p->sysctl = t;
+
+   inet_netconf_notify_devconf(net, NETCONFA_ALL, ifindex, p);
return 0;
 
 free:
@@ -2286,7 +2288,7 @@ static int devinet_sysctl_register(struct in_device *idev)
if (err)
return err;
err = __devinet_sysctl_register(dev_net(idev->dev), idev->dev->name,
-   >cnf);
+   idev->dev->ifindex, >cnf);
if (err)
neigh_sysctl_unregister(idev->arp_parms);
return err;
@@ -2347,11 +2349,12 @@ static __net_init int devinet_init_net(struct net *net)
}
 
 #ifdef CONFIG_SYSCTL
-   err = __devinet_sysctl_register(net, "all", all);
+   err = __devinet_sysctl_register(net, "all", NETCONFA_IFINDEX_ALL, all);
if (err < 0)
goto err_reg_all;
 
-   err = __devinet_sysctl_register(net, "default", dflt);
+   err = __devinet_sysctl_register(net, "default",
+   NETCONFA_IFINDEX_DEFAULT, dflt);
if (err < 0)
goto err_reg_dflt;
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 2a688171a188..bdf368eff5ab 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -6032,7 +6032,7 @@ static const struct ctl_table addrconf_sysctl[] = {
 static int __addrconf_sysctl_register(struct net *net, char *dev_name,
struct inet6_dev *idev, struct ipv6_devconf *p)
 {
-   int i;
+   int i, ifindex;
struct ctl_table *table;
char path[sizeof("net/ipv6/conf/") + IFNAMSIZ];
 
@@ -6052,6 +6052,13 @@ static int __addrconf_sysctl_register(struct net *net, 
char *dev_name,
if (!p->sysctl_header)
goto free;
 
+   if (!strcmp(dev_name, "all"))
+   ifindex = NETCONFA_IFINDEX_ALL;
+   else if (!strcmp(dev_name, "default"))
+   ifindex = NETCONFA_IFINDEX_DEFAULT;
+   else
+   ifindex = idev->dev->ifindex;
+   inet6_netconf_notify_devconf(net, NETCONFA_ALL, ifindex, p);
return 0;
 
 free:
-- 
2.8.1

Re: [PATCH net 1/2] ipv6: add missing netconf notif when 'all' is updated

2016-08-30 Thread Nicolas Dichtel

Le 30/08/2016 à 04:51, YOSHIFUJI Hideaki a écrit :
> 
> 
> Nicolas Dichtel wrote:
[snip]
>> +int old_dftl = net->ipv6.devconf_dflt->forwarding;
> 
> dflt, not dftl?
Good catch!

Re: [PATCH net 1/2] ipv6: add missing netconf notif when 'all' is updated

2016-08-29 Thread Nicolas Dichtel

Le 29/08/2016 à 15:00, Sergei Shtylyov a écrit :
[snip]
>>  if (p == >ipv6.devconf_all->forwarding) {
>> +int old_dftl = net->ipv6.devconf_dflt->forwarding;
>> +
>>  net->ipv6.devconf_dflt->forwarding = newf;
>> +if ((!newf) ^ (!old_dftl))
> 
>IIUC, !'s are not necessary here (and more so the parens around them). And
> perhaps ^ can be changed to != for clarity...
Yes, but a lot of places in this code use that. So for consistency I use the 
same.

> 
>> +inet6_netconf_notify_devconf(net, NETCONFA_FORWARDING,
>> + NETCONFA_IFINDEX_DEFAULT,
>> + net->ipv6.devconf_dflt);
>> +
>>  addrconf_forward_change(net, newf);
>>  if ((!newf) ^ (!old))
Here is an example.

[PATCH net 2/2] netconf: add a notif when settings are created

2016-08-29 Thread Nicolas Dichtel

All changes are notified, but the initial state was missing.

Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 net/ipv4/devinet.c  | 11 +++
 net/ipv6/addrconf.c |  9 -
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 415e117967c7..062a67ca9a21 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2232,7 +2232,7 @@ static struct devinet_sysctl_table {
 };
 
 static int __devinet_sysctl_register(struct net *net, char *dev_name,
-   struct ipv4_devconf *p)
+int ifindex, struct ipv4_devconf *p)
 {
int i;
struct devinet_sysctl_table *t;
@@ -2255,6 +2255,8 @@ static int __devinet_sysctl_register(struct net *net, 
char *dev_name,
goto free;
 
p->sysctl = t;
+
+   inet_netconf_notify_devconf(net, NETCONFA_ALL, ifindex, p);
return 0;
 
 free:
@@ -2286,7 +2288,7 @@ static int devinet_sysctl_register(struct in_device *idev)
if (err)
return err;
err = __devinet_sysctl_register(dev_net(idev->dev), idev->dev->name,
-   >cnf);
+   idev->dev->ifindex, >cnf);
if (err)
neigh_sysctl_unregister(idev->arp_parms);
return err;
@@ -2347,11 +2349,12 @@ static __net_init int devinet_init_net(struct net *net)
}
 
 #ifdef CONFIG_SYSCTL
-   err = __devinet_sysctl_register(net, "all", all);
+   err = __devinet_sysctl_register(net, "all", NETCONFA_IFINDEX_ALL, all);
if (err < 0)
goto err_reg_all;
 
-   err = __devinet_sysctl_register(net, "default", dflt);
+   err = __devinet_sysctl_register(net, "default",
+   NETCONFA_IFINDEX_DEFAULT, dflt);
if (err < 0)
goto err_reg_dflt;
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 299f0656e87f..bb3874d2f248 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -6032,7 +6032,7 @@ static const struct ctl_table addrconf_sysctl[] = {
 static int __addrconf_sysctl_register(struct net *net, char *dev_name,
struct inet6_dev *idev, struct ipv6_devconf *p)
 {
-   int i;
+   int i, ifindex;
struct ctl_table *table;
char path[sizeof("net/ipv6/conf/") + IFNAMSIZ];
 
@@ -6052,6 +6052,13 @@ static int __addrconf_sysctl_register(struct net *net, 
char *dev_name,
if (!p->sysctl_header)
goto free;
 
+   if (!strcmp(dev_name, "all"))
+   ifindex = NETCONFA_IFINDEX_ALL;
+   else if (!strcmp(dev_name, "default"))
+   ifindex = NETCONFA_IFINDEX_DEFAULT;
+   else
+   ifindex = idev->dev->ifindex;
+   inet6_netconf_notify_devconf(net, NETCONFA_ALL, ifindex, p);
return 0;
 
 free:
-- 
2.8.1

[PATCH net 1/2] ipv6: add missing netconf notif when 'all' is updated

2016-08-29 Thread Nicolas Dichtel

The 'default' value was not advertised.

Fixes: f3a1bfb11ccb ("rtnl/ipv6: use netconf msg to advertise forwarding 
status")
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 net/ipv6/addrconf.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index f418d2eaeddd..299f0656e87f 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -778,7 +778,14 @@ static int addrconf_fixup_forwarding(struct ctl_table 
*table, int *p, int newf)
}
 
if (p == >ipv6.devconf_all->forwarding) {
+   int old_dftl = net->ipv6.devconf_dflt->forwarding;
+
net->ipv6.devconf_dflt->forwarding = newf;
+   if ((!newf) ^ (!old_dftl))
+   inet6_netconf_notify_devconf(net, NETCONFA_FORWARDING,
+NETCONFA_IFINDEX_DEFAULT,
+net->ipv6.devconf_dflt);
+
addrconf_forward_change(net, newf);
if ((!newf) ^ (!old))
inet6_netconf_notify_devconf(net, NETCONFA_FORWARDING,
-- 
2.8.1

Re: [iproute PATCH v4 0/5] Big C99 style initializer rework

2016-07-15 Thread Nicolas Dichtel

Le 13/07/2016 20:47, Phil Sutter a écrit :
> This is v4 of my C99-style initializer related patch series. 
It compiles successfully with my gcc 4.4.7.


Regards,
Nicolas

Re: [iproute PATCH 0/2] Netns performance improvements

2016-07-11 Thread Nicolas Dichtel

Le 08/07/2016 19:18, Rick Jones a écrit :
> On 07/08/2016 01:01 AM, Nicolas Dichtel wrote:
[snip]
>> Do you have a script or something else to easily reproduce this problem?
> 
> Do you mean for my much older, slightly different stuff done in HP Public 
> Cloud,
> or for what Phil (?) is doing presently?  I believe Phil posted something
> several messages back in the thread.
I was thinking to Phil's scenario.


Thank you,
Nicolas

Re: [iproute PATCH 0/2] Netns performance improvements

2016-07-08 Thread Nicolas Dichtel

Le 07/07/2016 18:16, Rick Jones a écrit :
> On 07/07/2016 08:48 AM, Phil Sutter wrote:
>> On Thu, Jul 07, 2016 at 02:59:48PM +0200, Nicolas Dichtel wrote:
>>> Le 07/07/2016 13:17, Phil Sutter a écrit :
>>> [snip]
>>>> The issue came up during OpenStack Neutron testing, see this ticket for
>>>> reference:
>>>>
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1310795
>>> Access to this ticket is not public :(
>>
>> *Sigh* OK, here are a few quotes:
>>
>> "OpenStack Neutron controller nodes, when undergoing testing, are
>> locking up specifically during creation and mounting of namespaces.
>> They appear to be blocking behind vfsmount_lock, and contention for the
>> namespace_sem"
>>
>> "During the scale testing, we have 300 routers, 600 dhcp namespaces
>> spread across four neutron network nodes. When then start as one set of
>> standard Openstack Rally benchmark test cycle against neutron. An
>> example scenario is creating 10x networks, list them, delete them and
>> repeat 10x times. The second set performs an L3 benchmark test between
>> two instances."
>>
> 
> Those 300 routers will each have at least one namespace along with the dhcp
> namespaces.  Depending on the nature of the routers (Distributed versus
> Centralized Virtual Routers - DVR vs CVR) and whether the routers are supposed
> to be "HA" there can be more than one namespace for a given router.
> 
> 300 routers is far from the upper limit/goal.  Back in HP Public Cloud, we 
> were
> running as many as 700 routers per network node (*), and more than four 
> network
> nodes. (back then it was just the one namespace per router and network). 
> Mileage
> will of course vary based on the "oomph" of one's network node(s).
Thank you for the details.

Do you have a script or something else to easily reproduce this problem?

Re: [PATCH net-next] ipv6: do not abuse GFP_ATOMIC in inet6_netconf_notify_devconf()

2016-07-08 Thread Nicolas Dichtel

Le 08/07/2016 05:46, Eric Dumazet a écrit :
> From: Eric Dumazet <eduma...@google.com>
> 
> All inet6_netconf_notify_devconf() callers are in process context,
> so we can use GFP_KERNEL allocations if we take care of not holding
> a rwlock while not needed in ip6mr (we hold RTNL there)
> 
> 
> Fixes: d67b8c616b48 ("netconf: advertise mc_forwarding status")
> Fixes: f3a1bfb11ccb ("rtnl/ipv6: use netconf msg to advertise forwarding 
> status")
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> Cc: Nicolas Dichtel <nicolas.dich...@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dich...@6wind.com>

Also nice, thanks.

Re: [PATCH net-next] ipv4: do not abuse GFP_ATOMIC in inet_netconf_notify_devconf()

2016-07-08 Thread Nicolas Dichtel

Le 08/07/2016 05:18, Eric Dumazet a écrit :
> From: Eric Dumazet <eduma...@google.com>
> 
> inet_forward_change() runs with RTNL held.
> We are allowed to sleep if required.
> 
> If we use __in_dev_get_rtnl() instead of __in_dev_get_rcu(),
> we no longer have to use GFP_ATOMIC allocations in
> inet_netconf_notify_devconf(), meaning we are less likely to miss
> notifications under memory pressure, and wont touch precious memory
> reserves either and risk dropping incoming packets.
> 
> inet_netconf_get_devconf() can also use GFP_KERNEL allocation.
> 
> Fixes: edc9e748934c ("rtnl/ipv4: use netconf msg to advertise forwarding 
> status")
> Fixes: 9e5511106f99 ("rtnl/ipv4: add support of RTM_GETNETCONF")
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> Cc: Nicolas Dichtel <nicolas.dich...@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dich...@6wind.com>

Nice catch, thank you!

Re: [iproute PATCH 0/2] Netns performance improvements

2016-07-07 Thread Nicolas Dichtel

Le 07/07/2016 13:17, Phil Sutter a écrit :
[snip]
> The issue came up during OpenStack Neutron testing, see this ticket for
> reference:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1310795
Access to this ticket is not public :(

Re: [PATCH] xfrm: fix crash in XFRM_MSG_GETSA netlink handler

2016-07-05 Thread Nicolas Dichtel

Le 05/07/2016 10:18, Vegard Nossum a écrit :
> If we hit any of the error conditions inside xfrm_dump_sa(), then
> xfrm_state_walk_init() never gets called. However, we still call
> xfrm_state_walk_done() from xfrm_dump_sa_done(), which will crash
> because the state walk was never initialized properly.
> 
> We can fix this by setting cb->args[0] only after we've processed the
> first element and checking this before calling xfrm_state_walk_done().
> 
> Fixes: d3623099d3 ("ipsec: add support of limited SA dump")
> Cc: Nicolas Dichtel <nicolas.dich...@6wind.com>
> Cc: Steffen Klassert <steffen.klass...@secunet.com>
> Signed-off-by: Vegard Nossum <vegard.nos...@oracle.com>
Acked-by: Nicolas Dichtel <nicolas.dich...@6wind.com>

Thank you for the fix!

Re: [iproute PATCH v3 0/6] Big C99 style initializer rework

2016-06-24 Thread Nicolas Dichtel

Le 23/06/2016 19:34, Phil Sutter a écrit :
> This is v3 of my C99-style initializer related patch series. The changes
> since v2 are:
Compile-tested with a gcc 4.4.7.


Regards,
Nicolas

Re: [iproute PATCH 1/3] Use C99 style initializers everywhere

2016-06-17 Thread Nicolas Dichtel

Le 17/06/2016 18:46, Daniel Borkmann a écrit :
> On 06/17/2016 06:34 PM, Stephen Hemminger wrote:
>> On Fri, 17 Jun 2016 16:09:20 +
>> Daniel Borkmann  wrote:
>>
>>> Please have a look at commit 8f80d450c3cb ("tc: fix compilation with old gcc
>>> (< 4.6)") ...
>>>
>>> Your changes effectively revert them again. Here, and some other parts of 
>>> the
>>> bpf frontend
>>> code bits.
>>
>> GCC 4.6 is 3 years old. So perhaps it is time to move on.
>> Maybe add a GCC version check in the makefile, to fail cleanly.
> 
> Well, you don't have to ask me but rather the patch submitters (Cc).
> 
> I haven't used RHEL in quite a while, but I could imagine it might
> be related to built it there perhaps.
Yes. For some specific arch, we have only old toolchains.

The rule was always to be backward compatible with old kernels. It implies to
also support the compilation with old toolchains ;-)

[PATCH net v2 4/4] ovs/geneve: fix rtnl notifications on iface deletion

2016-06-13 Thread Nicolas Dichtel

The function geneve_dev_create_fb() (only used by ovs) never calls
rtnl_configure_link(). The consequence is that dev->rtnl_link_state is
never set to RTNL_LINK_INITIALIZED.
During the deletion phase, the function rollback_registered_many() sends
a RTM_DELLINK only if dev->rtnl_link_state is set to RTNL_LINK_INITIALIZED.

Fixes: e305ac6cf5a1 ("geneve: Add support to collect tunnel metadata.")
CC: Pravin B Shelar <pshe...@nicira.com>
CC: Jesse Gross <je...@nicira.com>
CC: Thomas Graf <tg...@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 drivers/net/geneve.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 086c2dae4c3d..305a04e45a13 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1532,6 +1532,10 @@ struct net_device *geneve_dev_create_fb(struct net *net, 
const char *name,
if (err)
goto err;
 
+   err = rtnl_configure_link(dev, NULL);
+   if (err < 0)
+   goto err;
+
return dev;
 
  err:
-- 
2.4.2

[PATCH net v2 2/4] ovs/vxlan: fix rtnl notifications on iface deletion

2016-06-13 Thread Nicolas Dichtel

The function vxlan_dev_create() (only used by ovs) never calls
rtnl_configure_link(). The consequence is that dev->rtnl_link_state is
never set to RTNL_LINK_INITIALIZED.
During the deletion phase, the function rollback_registered_many() sends
a RTM_DELLINK only if dev->rtnl_link_state is set to RTNL_LINK_INITIALIZED.

Note that the function vxlan_dev_create() is moved after the rtnl stuff so
that vxlan_dellink() can be called in this function.

Fixes: dcc38c033b32 ("openvswitch: Re-add CONFIG_OPENVSWITCH_VXLAN")
CC: Thomas Graf <tg...@suug.ch>
CC: Pravin B Shelar <pshe...@nicira.com>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 drivers/net/vxlan.c | 58 +++--
 1 file changed, 34 insertions(+), 24 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index f999db2f97b4..b3b9db68f758 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2952,30 +2952,6 @@ static int vxlan_dev_configure(struct net *src_net, 
struct net_device *dev,
return 0;
 }
 
-struct net_device *vxlan_dev_create(struct net *net, const char *name,
-   u8 name_assign_type, struct vxlan_config 
*conf)
-{
-   struct nlattr *tb[IFLA_MAX+1];
-   struct net_device *dev;
-   int err;
-
-   memset(, 0, sizeof(tb));
-
-   dev = rtnl_create_link(net, name, name_assign_type,
-  _link_ops, tb);
-   if (IS_ERR(dev))
-   return dev;
-
-   err = vxlan_dev_configure(net, dev, conf);
-   if (err < 0) {
-   free_netdev(dev);
-   return ERR_PTR(err);
-   }
-
-   return dev;
-}
-EXPORT_SYMBOL_GPL(vxlan_dev_create);
-
 static int vxlan_newlink(struct net *src_net, struct net_device *dev,
 struct nlattr *tb[], struct nlattr *data[])
 {
@@ -3268,6 +3244,40 @@ static struct rtnl_link_ops vxlan_link_ops __read_mostly 
= {
.get_link_net   = vxlan_get_link_net,
 };
 
+struct net_device *vxlan_dev_create(struct net *net, const char *name,
+   u8 name_assign_type,
+   struct vxlan_config *conf)
+{
+   struct nlattr *tb[IFLA_MAX + 1];
+   struct net_device *dev;
+   int err;
+
+   memset(, 0, sizeof(tb));
+
+   dev = rtnl_create_link(net, name, name_assign_type,
+  _link_ops, tb);
+   if (IS_ERR(dev))
+   return dev;
+
+   err = vxlan_dev_configure(net, dev, conf);
+   if (err < 0) {
+   free_netdev(dev);
+   return ERR_PTR(err);
+   }
+
+   err = rtnl_configure_link(dev, NULL);
+   if (err < 0) {
+   LIST_HEAD(list_kill);
+
+   vxlan_dellink(dev, _kill);
+   unregister_netdevice_many(_kill);
+   return ERR_PTR(err);
+   }
+
+   return dev;
+}
+EXPORT_SYMBOL_GPL(vxlan_dev_create);
+
 static void vxlan_handle_lowerdev_unregister(struct vxlan_net *vn,
 struct net_device *dev)
 {
-- 
2.4.2

[PATCH net v2 3/4] ovs/gre: fix rtnl notifications on iface deletion

2016-06-13 Thread Nicolas Dichtel

The function gretap_fb_dev_create() (only used by ovs) never calls
rtnl_configure_link(). The consequence is that dev->rtnl_link_state is
never set to RTNL_LINK_INITIALIZED.
During the deletion phase, the function rollback_registered_many() sends
a RTM_DELLINK only if dev->rtnl_link_state is set to RTNL_LINK_INITIALIZED.

Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport")
CC: Thomas Graf <tg...@suug.ch>
CC: Pravin B Shelar <pshe...@nicira.com>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 net/ipv4/ip_gre.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 08deba679c8c..07c5cf1838d8 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1149,6 +1149,10 @@ struct net_device *gretap_fb_dev_create(struct net *net, 
const char *name,
if (err)
goto out;
 
+   err = rtnl_configure_link(dev, NULL);
+   if (err < 0)
+   goto out;
+
return dev;
 out:
ip_tunnel_dellink(dev, _kill);
-- 
2.4.2

[PATCH net v2 0/4] ovs: fix rtnl notifications on interface deletion

2016-06-13 Thread Nicolas Dichtel


There was no rtnl notifications for interfaces (gre, vxlan, geneve) created
by ovs. This problem is fixed by adjusting the creation path.

v1 -> v2:
 - add patch #1 and #4
 - rework error handling in patch #2

 drivers/net/geneve.c | 14 ++---
 drivers/net/vxlan.c  | 58 ++--
 net/ipv4/ip_gre.c| 14 ++---
 3 files changed, 56 insertions(+), 30 deletions(-)

Regards,
Nicolas

[PATCH net v2 1/4] ovs/gre,geneve: fix error path when creating an iface

2016-06-13 Thread Nicolas Dichtel

After ipgre_newlink()/geneve_configure() call, the netdev is registered.

Fixes: 7e059158d57b ("vxlan, gre, geneve: Set a large MTU on ovs-created tunnel 
devices")
CC: David Wragg <david@weave.works>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 drivers/net/geneve.c | 10 +++---
 net/ipv4/ip_gre.c| 10 +++---
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index cadefe4fdaa2..086c2dae4c3d 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1508,6 +1508,7 @@ struct net_device *geneve_dev_create_fb(struct net *net, 
const char *name,
 {
struct nlattr *tb[IFLA_MAX + 1];
struct net_device *dev;
+   LIST_HEAD(list_kill);
int err;
 
memset(tb, 0, sizeof(tb));
@@ -1519,8 +1520,10 @@ struct net_device *geneve_dev_create_fb(struct net *net, 
const char *name,
err = geneve_configure(net, dev, _remote_unspec,
   0, 0, 0, 0, htons(dst_port), true,
   GENEVE_F_UDP_ZERO_CSUM6_RX);
-   if (err)
-   goto err;
+   if (err) {
+   free_netdev(dev);
+   return ERR_PTR(err);
+   }
 
/* openvswitch users expect packet sizes to be unrestricted,
 * so set the largest MTU we can.
@@ -1532,7 +1535,8 @@ struct net_device *geneve_dev_create_fb(struct net *net, 
const char *name,
return dev;
 
  err:
-   free_netdev(dev);
+   geneve_dellink(dev, _kill);
+   unregister_netdevice_many(_kill);
return ERR_PTR(err);
 }
 EXPORT_SYMBOL_GPL(geneve_dev_create_fb);
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 4d2025f7ec57..08deba679c8c 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1121,6 +1121,7 @@ struct net_device *gretap_fb_dev_create(struct net *net, 
const char *name,
 {
struct nlattr *tb[IFLA_MAX + 1];
struct net_device *dev;
+   LIST_HEAD(list_kill);
struct ip_tunnel *t;
int err;
 
@@ -1136,8 +1137,10 @@ struct net_device *gretap_fb_dev_create(struct net *net, 
const char *name,
t->collect_md = true;
 
err = ipgre_newlink(net, dev, tb, NULL);
-   if (err < 0)
-   goto out;
+   if (err < 0) {
+   free_netdev(dev);
+   return ERR_PTR(err);
+   }
 
/* openvswitch users expect packet sizes to be unrestricted,
 * so set the largest MTU we can.
@@ -1148,7 +1151,8 @@ struct net_device *gretap_fb_dev_create(struct net *net, 
const char *name,
 
return dev;
 out:
-   free_netdev(dev);
+   ip_tunnel_dellink(dev, _kill);
+   unregister_netdevice_many(_kill);
return ERR_PTR(err);
 }
 EXPORT_SYMBOL_GPL(gretap_fb_dev_create);
-- 
2.4.2

Re: [PATCH net 1/2] ovs/vxlan: fix rtnl notifications on iface deletion

2016-06-13 Thread Nicolas Dichtel

Le 10/06/2016 19:50, pravin shelar a écrit :
[snip]
> Patch looks good except the error handling code. earlier call to
> vxlan_dev_configure() registers the net-device  and add the device to
> vxlan device list. it needs to be undone before freeing the device.
> 
Ok, will fix it (and also for GRE).


Thank you,
Nicolas

Re: [net v3] veth: advertise peer link once both links are tied together

2016-06-10 Thread Nicolas Dichtel

Le 08/06/2016 22:30, Lance Richardson a écrit :
[snip]
> I've been pondering how to fix this very problem off-and-on for a few months
> now, without having arrived at any solution that was particularly satisfying.
> 
> The main constraints I've been trying to meet are:
>- User-space should be informed of veth pairing for both peers.
>- RTM_NEWLINK messages should not refer to interfaces that haven't
>  been announced to user-space via previous RTM_NEWLINK messages.
>- The first (and only the first) RTM_NEWLINK message for a given
>  interface should have ifi_changes set to ~0U, subsequent RTM_NEWLINK
>  messages should have ifi_changes set to reflect the flags that
>  have changed.
> 
> This is the closest I've come to a satisfactory solution, it does meet
> the above constraints but still seems a little unnatural to me:
The patch looks good to me. Can you submit it formally?

[PATCH net 2/2] ovs/gre: fix rtnl notifications on iface deletion

2016-06-10 Thread Nicolas Dichtel

The function gretap_fb_dev_create() (only used by ovs) never calls
rtnl_configure_link(). The consequence is that dev->rtnl_link_state is
never set to RTNL_LINK_INITIALIZED.
During the deletion phase, the function rollback_registered_many() sends
a RTM_DELLINK only if dev->rtnl_link_state is set to RTNL_LINK_INITIALIZED.

Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport")
CC: Thomas Graf <tg...@suug.ch>
CC: Pravin B Shelar <pshe...@nicira.com>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 net/ipv4/ip_gre.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 4d2025f7ec57..c6a46ee87a3a 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1146,6 +1146,10 @@ struct net_device *gretap_fb_dev_create(struct net *net, 
const char *name,
if (err)
goto out;
 
+   err = rtnl_configure_link(dev, NULL);
+   if (err < 0)
+   goto out;
+
return dev;
 out:
free_netdev(dev);
-- 
2.4.2

[PATCH net 1/2] ovs/vxlan: fix rtnl notifications on iface deletion

2016-06-10 Thread Nicolas Dichtel

The function vxlan_dev_create() (only used by ovs) never calls
rtnl_configure_link(). The consequence is that dev->rtnl_link_state is
never set to RTNL_LINK_INITIALIZED.
During the deletion phase, the function rollback_registered_many() sends
a RTM_DELLINK only if dev->rtnl_link_state is set to RTNL_LINK_INITIALIZED.

Fixes: dcc38c033b32 ("openvswitch: Re-add CONFIG_OPENVSWITCH_VXLAN")
CC: Thomas Graf <tg...@suug.ch>
CC: Pravin B Shelar <pshe...@nicira.com>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 drivers/net/vxlan.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index f999db2f97b4..f7669d60b3f7 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2972,6 +2972,12 @@ struct net_device *vxlan_dev_create(struct net *net, 
const char *name,
return ERR_PTR(err);
}
 
+   err = rtnl_configure_link(dev, NULL);
+   if (err < 0) {
+   free_netdev(dev);
+   return ERR_PTR(err);
+   }
+
return dev;
 }
 EXPORT_SYMBOL_GPL(vxlan_dev_create);
-- 
2.4.2

Re: [PATCH net] net_sched: add missing paddattr description

2016-06-08 Thread Nicolas Dichtel

Le 08/06/2016 15:19, Eric Dumazet a écrit :
> From: Eric Dumazet <eduma...@google.com>
> 
> "make htmldocs" complains otherwise:
> 
> .//net/core/gen_stats.c:65: warning: No description found for parameter 
> 'padattr'
> .//net/core/gen_stats.c:101: warning: No description found for parameter 
> 'padattr'
> 
> 
> Fixes: 9854518ea04d ("sched: align nlattr properly when needed")
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> Reported-by: kbuild test robot <fengguang...@intel.com>
Acked-by: Nicolas Dichtel <nicolas.dich...@6wind.com>

Thanks Eric.

Re: [net v3] veth: advertise peer link once both links are tied together

2016-05-31 Thread Nicolas Dichtel

Le 31/05/2016 08:29, Vincent Bernat a écrit :
>  ❦ 30 mai 2016 18:27 CEST, Nicolas Dichtel <nicolas.dich...@6wind.com> :
> 
>>>> +
>>>> +  rtmsg_ifinfo(RTM_NEWLINK, peer, IFF_SLAVE, GFP_KERNEL);
>>>
>>> Maybe ~0U would be better than hijacking IFF_SLAVE?
>> IFF_SLAVE is wrong. It's a flag here, that will be put in the ifi_change 
>> field
>> not an attribute number.
> 
> There are some use of IFF_SLAVE (in bonding/bond_main.c). But, I'll
> update the patch nonetheless.
Sorry, I read it too quickly, IFF_SLAVE is a flag, not an attribute.
But this field indicates to the userland which flags has changed and this flag
does not change here ;-)

Re: [net v3] veth: advertise peer link once both links are tied together

2016-05-30 Thread Nicolas Dichtel

Le 30/05/2016 18:01, Vincent Bernat a écrit :
>  ❦ 30 mai 2016 17:58 CEST, Vincent Bernat  :
> 
>> +
>> +rtmsg_ifinfo(RTM_NEWLINK, peer, IFF_SLAVE, GFP_KERNEL);
> 
> Maybe ~0U would be better than hijacking IFF_SLAVE?
IFF_SLAVE is wrong. It's a flag here, that will be put in the ifi_change field
not an attribute number.

Re: [PATCH] veth: delay peer link configuration after interfaces are tied

2016-05-30 Thread Nicolas Dichtel

Le 30/05/2016 17:26, Vincent Bernat a écrit :
>  ❦ 30 mai 2016 17:19 CEST, Nicolas Dichtel <nicolas.dich...@6wind.com> :
> 
>>>>>   priv = netdev_priv(peer);
>>>>>   rcu_assign_pointer(priv->peer, dev);
>>>>> +
>>>>> + err = rtnl_configure_link(peer, ifmp);
>>>>> + if (err < 0)
>>>>> + goto err_configure_peer;
>>>
>>>> You should fix the error path. 'unregister_netdevice(dev)' is missing.
>>>
>>> I am sending another patch to fix that. I am quite unsure if I do the
>>> right thing here.
>>>
>> A less intrusive fix is to call 'rtmsg_ifinfo(RTM_NEWLINK, peer, ~0U,
>> GFP_KERNEL);' a the end of veth_newlink().
> 
> I did that at first. Maybe this would make more sense to do
> that. Otherwise, the first message contains an iflink value that we
> cannot resolve with just the received netlink messages (since the
> information is in the next netlink message). "ip monitor" seems to be
> able to get the info, but I suppose it does an additional
> lookup.
> 
Yes, it's a chicken and egg problem ;-)
I think that the first message with an iflink set to '0' is not a problem if a
second one update this value. Daemons that listen to those rtnl messages must
always update their caches. When the peer is put in another netns, its ifindex
may change again.

Re: [PATCH] veth: delay peer link configuration after interfaces are tied

2016-05-30 Thread Nicolas Dichtel

Le 30/05/2016 12:13, Vincent Bernat a écrit :
> When the peer link is created, its "iflink" information is not
> advertised through netlink. If a user is maintaining a cache from all
> updates, it will miss this information:
> 
> 2: veth0@NONE:  mtu 1500 qdisc noop state DOWN group 
> default
> link/ether ae:0e:08:af:fb:a0 brd ff:ff:ff:ff:ff:ff
> 3: veth1@veth0:  mtu 1500 qdisc noop state 
> DOWN group default
> link/ether 3a:31:f1:36:2e:e5 brd ff:ff:ff:ff:ff:ff
> 
> To avoid this situation, the peer link is only configured after both
> interfaces are tied together:
> 
> 3: veth0@veth1:  mtu 1500 qdisc noop state DOWN 
> group default
> link/ether ee:0d:80:46:36:fe brd ff:ff:ff:ff:ff:ff
> 4: veth1@veth0:  mtu 1500 qdisc noop state 
> DOWN group default
> link/ether ba:25:bc:7a:0d:c8 brd ff:ff:ff:ff:ff:ff
> 
> Signed-off-by: Vincent Bernat 
> ---
Please specify the version in the title (something like '[PATCH net v3]' here,
see --subject-prefix).

Also add the changelog after the '---' (see --annotate).

Re: [PATCH] veth: delay peer link configuration after interfaces are tied

2016-05-30 Thread Nicolas Dichtel

Le 30/05/2016 12:11, Vincent Bernat a écrit :
>  ❦ 30 mai 2016 11:23 CEST, Nicolas Dichtel <nicolas.dich...@6wind.com> :
> 
>>> @@ -466,6 +462,10 @@ static int veth_newlink(struct net *src_net, struct 
>>> net_device *dev,
>>>  
>>> priv = netdev_priv(peer);
>>> rcu_assign_pointer(priv->peer, dev);
>>> +
>>> +   err = rtnl_configure_link(peer, ifmp);
>>> +   if (err < 0)
>>> +   goto err_configure_peer;
> 
>> You should fix the error path. 'unregister_netdevice(dev)' is missing.
> 
> I am sending another patch to fix that. I am quite unsure if I do the
> right thing here.
> 
A less intrusive fix is to call 'rtmsg_ifinfo(RTM_NEWLINK, peer, ~0U,
GFP_KERNEL);' a the end of veth_newlink().

Re: [PATCH] veth: delay peer link configuration after interfaces are tied

2016-05-30 Thread Nicolas Dichtel

Le 29/05/2016 13:17, Vincent Bernat a écrit :
[snip]
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index f37a6e61d4ad..9726c4dbf659 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -432,10 +432,6 @@ static int veth_newlink(struct net *src_net, struct 
> net_device *dev,
>  
>   netif_carrier_off(peer);
>  
> - err = rtnl_configure_link(peer, ifmp);
> - if (err < 0)
> - goto err_configure_peer;
> -
>   /*
>* register dev last
>*
> @@ -466,6 +462,10 @@ static int veth_newlink(struct net *src_net, struct 
> net_device *dev,
>  
>   priv = netdev_priv(peer);
>   rcu_assign_pointer(priv->peer, dev);
> +
> + err = rtnl_configure_link(peer, ifmp);
> + if (err < 0)
> + goto err_configure_peer;
You should fix the error path. 'unregister_netdevice(dev)' is missing.

[PATCH net] uapi glibc compat: fix compilation when !__USE_MISC in glibc

2016-05-19 Thread Nicolas Dichtel

These structures are defined only if __USE_MISC is set in glibc net/if.h
headers, ie when _BSD_SOURCE or _SVID_SOURCE are defined.

CC: Jan Engelhardt <jeng...@inai.de>
CC: Josh Boyer <jwbo...@fedoraproject.org>
CC: Stephen Hemminger <shemm...@brocade.com>
CC: Waldemar Brodkorb <m...@waldemar-brodkorb.de>
CC: Gabriel Laskar <gabr...@lse.epita.fr>
CC: Mikko Rapeli <mikko.rap...@iki.fi>
Fixes: 4a91cb61bb99 ("uapi glibc compat: fix compile errors when glibc net/if.h 
included before linux/if.h")
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 include/uapi/linux/libc-compat.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/libc-compat.h b/include/uapi/linux/libc-compat.h
index d5e38c73377c..e4f048ee7043 100644
--- a/include/uapi/linux/libc-compat.h
+++ b/include/uapi/linux/libc-compat.h
@@ -52,7 +52,7 @@
 #if defined(__GLIBC__)
 
 /* Coordinate with glibc net/if.h header. */
-#if defined(_NET_IF_H)
+#if defined(_NET_IF_H) && defined(__USE_MISC)
 
 /* GLIBC headers included first so don't define anything
  * that would already be defined. */
-- 
2.8.1

[PATCH net-next] netlink: kill nla_put_u64()

2016-05-13 Thread Nicolas Dichtel

This function is not used anymore. nla_put_u64_64bit() should be used
instead.

Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---

I'm asking myself if it's better to rename nla_put_u64_64bit() to
nla_put_u64(). Because there is a lot of users, the change will be
huge, so I tend to say no.

 include/net/netlink.h | 14 ++
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index e589cb3dccee..254a0fc01800 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -98,7 +98,8 @@
  *   nla_put_u8(skb, type, value)  add u8 attribute to skb
  *   nla_put_u16(skb, type, value) add u16 attribute to skb
  *   nla_put_u32(skb, type, value) add u32 attribute to skb
- *   nla_put_u64(skb, type, value) add u64 attribute to skb
+ *   nla_put_u64_64bits(skb, type,
+ * value, padattr) add u64 attribute to skb
  *   nla_put_s8(skb, type, value)  add s8 attribute to skb
  *   nla_put_s16(skb, type, value) add s16 attribute to skb
  *   nla_put_s32(skb, type, value) add s32 attribute to skb
@@ -847,17 +848,6 @@ static inline int nla_put_le32(struct sk_buff *skb, int 
attrtype, __le32 value)
 }
 
 /**
- * nla_put_u64 - Add a u64 netlink attribute to a socket buffer
- * @skb: socket buffer to add attribute to
- * @attrtype: attribute type
- * @value: numeric value
- */
-static inline int nla_put_u64(struct sk_buff *skb, int attrtype, u64 value)
-{
-   return nla_put(skb, attrtype, sizeof(u64), );
-}
-
-/**
  * nla_put_u64_64bit - Add a u64 netlink attribute to a skb and align it
  * @skb: socket buffer to add attribute to
  * @attrtype: attribute type
-- 
2.8.1

[PATCH net-next] ipv6: fix 4in6 tunnel receive path

2016-05-10 Thread Nicolas Dichtel

Protocol for 4in6 tunnel is IPPROTO_IPIP. This was wrongly changed by
the last cleanup.

CC: Tom Herbert <t...@herbertland.com>
Fixes: 0d3c703a9d17 ("ipv6: Cleanup IPv6 tunnel receive path")
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 net/ipv6/ip6_tunnel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 50af7061ecdb..e79330f214bd 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -897,7 +897,7 @@ drop:
 
 static int ip4ip6_rcv(struct sk_buff *skb)
 {
-   return ipxip6_rcv(skb, IPPROTO_IP, _v4,
+   return ipxip6_rcv(skb, IPPROTO_IPIP, _v4,
  ip4ip6_dscp_ecn_decapsulate);
 }
 
-- 
2.8.1

Re: [Drbd-dev] [PATCH net-next v3] block/drbd: align properly u64 in nl messages

2016-05-10 Thread Nicolas Dichtel

Le 10/05/2016 11:40, Lars Ellenberg a écrit :
> On Tue, May 10, 2016 at 11:09:53AM +0200, Nicolas Dichtel wrote:
>> Le 09/05/2016 15:15, Lars Ellenberg a écrit :
>>> On Mon, May 09, 2016 at 11:40:20AM +0200, Nicolas Dichtel wrote:
>> [snip]
>>>> Maybe prefixing genl_magic_func.h and genl_magic_struct.h by 'drbd_'
>>>> could be interesting so that new module won't use it. What is your
>>>> opinion?
>>>
>>> This was supposed to not be DRBD specific.  But it might even still
>>> need some massaging before it was truly generic. And obviously,
>>> it does not meet the taste of genetlink folks, to say the least :(
>> Yes, this file is not generic and netlink APIs are never defined like this.
>> These tons of macro complexifies the code too much. It's overengineering for
>> what purpose?
> 
> If we introduce a new config option,
> we have to add it to the config scanner (one line),
> define min, max, default and scale (four short defines),
> and add it to the netlink definition here (one line).
> Done, rest of the code is generated,
> both on the kernel side,
> and on the drbd-utils side used to talk to the kernel.
> We found that to be very convenient.
Ok.

> 
>> Small examples:
>>  - the drbd netlink API is not exported via uapi (I wonder how apps using 
>> this
>>API get it)
> 
> There used to be a time where there was no "uapi".
> (I wonder how apps ever worked back then).
At that time, include/linux/ was exported ;-)

> 
>>  - v2 of the patch is nacked because adding a new attribute may break 
>> existing
> 
> No.
> 
> But because the "new" attributes you chose have not been new,
> but already used (though not yet merged back into mainline yet).
> (Which you did not realize, and had no obvious way of knowing.
>  Could have been fixed.).
Ok.

> 
> And because your patch introduced useless new members to the structs.
> (Could also have been fixed).
> 
> And because I did not see any use defining that many new "padding attributes"
> for no reason, where the obvious (to me) choice was to use 0, and you
> did not even try to explain why that would have been a bad choice.
Because some nl APIs were wrongly use 0 as a valid attribute we make the choice
of always adding a new attribute for padding to be sure to not break existing 
API.
And yes, in drdb it does not seem to be the case.

> Is this going somewhere?
I'm just trying to understand things.


Regards,
Nicolas

[PATCH net-next v2] ila: ipv6/ila: fix nlsize calculation for lwtunnel

2016-05-10 Thread Nicolas Dichtel

From: Tom Herbert <t...@herbertland.com>

The handler 'ila_fill_encap_info' adds two attributes: ILA_ATTR_LOCATOR
and ILA_ATTR_CSUM_MODE.

nla_total_size_64bit() must be use for ILA_ATTR_LOCATOR.

Also, do nla_put_u8 instead of nla_put_u64 for ILA_ATTR_CSUM_MODE.

Fixes: f13a82d87b21 ("ipv6: use nla_put_u64_64bit()")
Fixes: 90bfe662db13 ("ila: add checksum neutral ILA translations")
Reported-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
Signed-off-by: Tom Herbert <t...@herbertland.com>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---

Tom, I have taken the liberty of resending your patch, I hope it's ok
for you. Goal is to fix this before net-next closes.

v1 -> v2:
  update the patch after the merge of net (and thus update 'Fixes' tag)
  use nla_total_size_64bit() for ILA_ATTR_LOCATOR

 net/ipv6/ila/ila_lwt.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ila/ila_lwt.c b/net/ipv6/ila/ila_lwt.c
index 17038e1ede98..1dfb64166d7d 100644
--- a/net/ipv6/ila/ila_lwt.c
+++ b/net/ipv6/ila/ila_lwt.c
@@ -133,7 +133,7 @@ static int ila_fill_encap_info(struct sk_buff *skb,
if (nla_put_u64_64bit(skb, ILA_ATTR_LOCATOR, (__force 
u64)p->locator.v64,
  ILA_ATTR_PAD))
goto nla_put_failure;
-   if (nla_put_u64(skb, ILA_ATTR_CSUM_MODE, (__force u8)p->csum_mode))
+   if (nla_put_u8(skb, ILA_ATTR_CSUM_MODE, (__force u8)p->csum_mode))
goto nla_put_failure;
 
return 0;
@@ -144,7 +144,9 @@ nla_put_failure:
 
 static int ila_encap_nlsize(struct lwtunnel_state *lwtstate)
 {
-   return nla_total_size(sizeof(u64)); /* ILA_ATTR_LOCATOR */
+   return nla_total_size_64bit(sizeof(u64)) + /* ILA_ATTR_LOCATOR */
+  nla_total_size(sizeof(u8)) +/* ILA_ATTR_CSUM_MODE */
+  0;
 }
 
 static int ila_encap_cmp(struct lwtunnel_state *a, struct lwtunnel_state *b)
-- 
2.8.1

Re: [PATCH net-next v3] block/drbd: align properly u64 in nl messages

2016-05-10 Thread Nicolas Dichtel

Le 09/05/2016 15:15, Lars Ellenberg a écrit :
> On Mon, May 09, 2016 at 11:40:20AM +0200, Nicolas Dichtel wrote:
[snip]
>> Maybe prefixing genl_magic_func.h and genl_magic_struct.h by 'drbd_'
>> could be interesting so that new module won't use it. What is your
>> opinion?
> 
> This was supposed to not be DRBD specific.  But it might even still
> need some massaging before it was truly generic. And obviously,
> it does not meet the taste of genetlink folks, to say the least :(
Yes, this file is not generic and netlink APIs are never defined like this.
These tons of macro complexifies the code too much. It's overengineering for
what purpose?
Small examples:
 - the drbd netlink API is not exported via uapi (I wonder how apps using this
   API get it)
 - v2 of the patch is nacked because adding a new attribute may break existing
   apps (in networking code, a lot of new attributes are added in each version)
 - it's not possible to grep to show the definition of an attribute ('git grep
   -w T_bits_total' returns only 1 line)


Regards,
Nicolas

[PATCH net-next v3] block/drbd: align properly u64 in nl messages

2016-05-09 Thread Nicolas Dichtel

The attribute 0 is never used in drbd, so let's use it as pad attribute
in netlink messages. This minimizes the patch.

Note that this patch is only compile-tested.

Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
Signed-off-by: Lars Ellenberg <lars.ellenb...@linbit.com>
---

v2 -> v3:
  use 0 as padattr instead of adding new attributes

v1 -> v2:
 rework the patch to handle all cases

Maybe prefixing genl_magic_func.h and genl_magic_struct.h by 'drbd_'
could be interesting so that new module won't use it. What is your
opinion?

 drivers/block/drbd/drbd_nl.c  | 28 
 include/linux/genl_magic_struct.h |  7 ++-
 2 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 1fd1dccebb6b..0bac9c8246bc 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -3633,14 +3633,15 @@ static int nla_put_status_info(struct sk_buff *skb, 
struct drbd_device *device,
goto nla_put_failure;
if (nla_put_u32(skb, T_sib_reason, sib ? sib->sib_reason : 
SIB_GET_STATUS_REPLY) ||
nla_put_u32(skb, T_current_state, device->state.i) ||
-   nla_put_u64(skb, T_ed_uuid, device->ed_uuid) ||
-   nla_put_u64(skb, T_capacity, drbd_get_capacity(device->this_bdev)) 
||
-   nla_put_u64(skb, T_send_cnt, device->send_cnt) ||
-   nla_put_u64(skb, T_recv_cnt, device->recv_cnt) ||
-   nla_put_u64(skb, T_read_cnt, device->read_cnt) ||
-   nla_put_u64(skb, T_writ_cnt, device->writ_cnt) ||
-   nla_put_u64(skb, T_al_writ_cnt, device->al_writ_cnt) ||
-   nla_put_u64(skb, T_bm_writ_cnt, device->bm_writ_cnt) ||
+   nla_put_u64_0pad(skb, T_ed_uuid, device->ed_uuid) ||
+   nla_put_u64_0pad(skb, T_capacity,
+drbd_get_capacity(device->this_bdev)) ||
+   nla_put_u64_0pad(skb, T_send_cnt, device->send_cnt) ||
+   nla_put_u64_0pad(skb, T_recv_cnt, device->recv_cnt) ||
+   nla_put_u64_0pad(skb, T_read_cnt, device->read_cnt) ||
+   nla_put_u64_0pad(skb, T_writ_cnt, device->writ_cnt) ||
+   nla_put_u64_0pad(skb, T_al_writ_cnt, device->al_writ_cnt) ||
+   nla_put_u64_0pad(skb, T_bm_writ_cnt, device->bm_writ_cnt) ||
nla_put_u32(skb, T_ap_bio_cnt, atomic_read(>ap_bio_cnt)) ||
nla_put_u32(skb, T_ap_pending_cnt, 
atomic_read(>ap_pending_cnt)) ||
nla_put_u32(skb, T_rs_pending_cnt, 
atomic_read(>rs_pending_cnt)))
@@ -3657,13 +3658,16 @@ static int nla_put_status_info(struct sk_buff *skb, 
struct drbd_device *device,
goto nla_put_failure;
 
if (nla_put_u32(skb, T_disk_flags, device->ldev->md.flags) ||
-   nla_put_u64(skb, T_bits_total, drbd_bm_bits(device)) ||
-   nla_put_u64(skb, T_bits_oos, drbd_bm_total_weight(device)))
+   nla_put_u64_0pad(skb, T_bits_total, drbd_bm_bits(device)) ||
+   nla_put_u64_0pad(skb, T_bits_oos,
+drbd_bm_total_weight(device)))
goto nla_put_failure;
if (C_SYNC_SOURCE <= device->state.conn &&
C_PAUSED_SYNC_T >= device->state.conn) {
-   if (nla_put_u64(skb, T_bits_rs_total, device->rs_total) 
||
-   nla_put_u64(skb, T_bits_rs_failed, 
device->rs_failed))
+   if (nla_put_u64_0pad(skb, T_bits_rs_total,
+device->rs_total) ||
+   nla_put_u64_0pad(skb, T_bits_rs_failed,
+device->rs_failed))
goto nla_put_failure;
}
}
diff --git a/include/linux/genl_magic_struct.h 
b/include/linux/genl_magic_struct.h
index eecd19b37001..6270a56e5edc 100644
--- a/include/linux/genl_magic_struct.h
+++ b/include/linux/genl_magic_struct.h
@@ -62,6 +62,11 @@ extern void CONCAT_(GENL_MAGIC_FAMILY, 
_genl_unregister)(void);
 
 /* MAGIC helpers   {{{2 */
 
+static inline int nla_put_u64_0pad(struct sk_buff *skb, int attrtype, u64 
value)
+{
+   return nla_put_64bit(skb, attrtype, sizeof(u64), , 0);
+}
+
 /* possible field types */
 #define __flg_field(attr_nr, attr_flag, name) \
__field(attr_nr, attr_flag, name, NLA_U8, char, \
@@ -80,7 +85,7 @@ extern void CONCAT_(GENL_MAGIC_FAMILY, 
_genl_unregister)(void);
nla_get_u32, nla_put_u32, true)
 #define __u64_field(attr_nr, attr_flag, name)  \
__field(attr_nr, attr_flag, name, NLA_U64, __u64, \
-   nla_get_u64, nla_put_u64, false)
+   nla_get_u64, nla_put_u64_0pad, false)
 #define

Re: [PATCH net-next v2] block/drbd: use nla_put_u64_64bit()

2016-05-04 Thread Nicolas Dichtel

Le 04/05/2016 11:05, Lars Ellenberg a écrit :
[snip]
> We don't have an "alignment problem" there, btw.
> Last time I checked, we did work fine without this alignment magic,
> we already take care of that, yes, even on affected architectures.
The code adds several consecutive u64 attributes. The nl attribute header is 4
bytes, thus the full attribute length is 12 bytes. If the first u64 is aligned
on 8 (nla_data()), the next one is not aligned on 8: it starts 12 bytes (8 (u64)
+ 4 (nl attr hdr)) after the previous u64.

Re: [PATCH net-next v2] block/drbd: use nla_put_u64_64bit()

2016-05-03 Thread Nicolas Dichtel

Le 03/05/2016 12:06, Lars Ellenberg a écrit :
> On Tue, May 03, 2016 at 11:39:18AM +0200, Nicolas Dichtel wrote:
>> Two new handlers have been defined in genl_magic_ headers:
>>  - __field2: the corresponding nla_put() function (nla_put_flag()) takes
>>  only two args
>>  - __field4: the corresponding nla_put() function (nla_put_u64_64bit())
>>  takes four args
>>
>> __field2 allows us to define __unspec_field for padding attribute.
>> __field4 allows us to update the definition of __u64_field: the pad
>> attribute should now be specified.
> 
> Please just NOT use an additional "field",
> but always use 0 to pad.
> 
> Patch is much shorter as well, see below.
I don't think that the goal is to make the shortest patch...
But frankly, I don't care. The goal was to use the new interface in a proper
way, like every other subsystem.

> 
> Attribute type "0" is not used,
> and will never be of semantic value,
> but always be ignored in the DRBD netlink family.
> 
> Whereas using some arbitrary value will be wrong,
> and will needlessly break userland.
An application should always ignore unknown attribute, this is a golden rule.
Now if you know that this patch will break applications (which one exactly?), we
can use your proposal.


Regards,
Nicolas

[PATCH net-next v2] block/drbd: use nla_put_u64_64bit()

2016-05-03 Thread Nicolas Dichtel

Two new handlers have been defined in genl_magic_ headers:
 - __field2: the corresponding nla_put() function (nla_put_flag()) takes
 only two args
 - __field4: the corresponding nla_put() function (nla_put_u64_64bit())
 takes four args

__field2 allows us to define __unspec_field for padding attribute.
__field4 allows us to update the definition of __u64_field: the pad
attribute should now be specified.

Note that this patch is only compile-tested.

Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---

v1 -> v2:
 rework the patch to handle all cases

 drivers/block/drbd/drbd_nl.c  | 40 +
 include/linux/drbd_genl.h | 62 +--
 include/linux/genl_magic_func.h   | 41 ++
 include/linux/genl_magic_struct.h | 45 ++--
 4 files changed, 145 insertions(+), 43 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 1fd1dccebb6b..93d873532195 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -3633,14 +3633,23 @@ static int nla_put_status_info(struct sk_buff *skb, 
struct drbd_device *device,
goto nla_put_failure;
if (nla_put_u32(skb, T_sib_reason, sib ? sib->sib_reason : 
SIB_GET_STATUS_REPLY) ||
nla_put_u32(skb, T_current_state, device->state.i) ||
-   nla_put_u64(skb, T_ed_uuid, device->ed_uuid) ||
-   nla_put_u64(skb, T_capacity, drbd_get_capacity(device->this_bdev)) 
||
-   nla_put_u64(skb, T_send_cnt, device->send_cnt) ||
-   nla_put_u64(skb, T_recv_cnt, device->recv_cnt) ||
-   nla_put_u64(skb, T_read_cnt, device->read_cnt) ||
-   nla_put_u64(skb, T_writ_cnt, device->writ_cnt) ||
-   nla_put_u64(skb, T_al_writ_cnt, device->al_writ_cnt) ||
-   nla_put_u64(skb, T_bm_writ_cnt, device->bm_writ_cnt) ||
+   nla_put_u64_64bit(skb, T_ed_uuid, device->ed_uuid,
+ T_state_info_pad) ||
+   nla_put_u64_64bit(skb, T_capacity,
+ drbd_get_capacity(device->this_bdev),
+ T_state_info_pad) ||
+   nla_put_u64_64bit(skb, T_send_cnt, device->send_cnt,
+ T_state_info_pad) ||
+   nla_put_u64_64bit(skb, T_recv_cnt, device->recv_cnt,
+ T_state_info_pad) ||
+   nla_put_u64_64bit(skb, T_read_cnt, device->read_cnt,
+ T_state_info_pad) ||
+   nla_put_u64_64bit(skb, T_writ_cnt, device->writ_cnt,
+ T_state_info_pad) ||
+   nla_put_u64_64bit(skb, T_al_writ_cnt, device->al_writ_cnt,
+ T_state_info_pad) ||
+   nla_put_u64_64bit(skb, T_bm_writ_cnt, device->bm_writ_cnt,
+ T_state_info_pad) ||
nla_put_u32(skb, T_ap_bio_cnt, atomic_read(>ap_bio_cnt)) ||
nla_put_u32(skb, T_ap_pending_cnt, 
atomic_read(>ap_pending_cnt)) ||
nla_put_u32(skb, T_rs_pending_cnt, 
atomic_read(>rs_pending_cnt)))
@@ -3657,13 +3666,20 @@ static int nla_put_status_info(struct sk_buff *skb, 
struct drbd_device *device,
goto nla_put_failure;
 
if (nla_put_u32(skb, T_disk_flags, device->ldev->md.flags) ||
-   nla_put_u64(skb, T_bits_total, drbd_bm_bits(device)) ||
-   nla_put_u64(skb, T_bits_oos, drbd_bm_total_weight(device)))
+   nla_put_u64_64bit(skb, T_bits_total, drbd_bm_bits(device),
+ T_state_info_pad) ||
+   nla_put_u64_64bit(skb, T_bits_oos,
+ drbd_bm_total_weight(device),
+ T_state_info_pad))
goto nla_put_failure;
if (C_SYNC_SOURCE <= device->state.conn &&
C_PAUSED_SYNC_T >= device->state.conn) {
-   if (nla_put_u64(skb, T_bits_rs_total, device->rs_total) 
||
-   nla_put_u64(skb, T_bits_rs_failed, 
device->rs_failed))
+   if (nla_put_u64_64bit(skb, T_bits_rs_total,
+ device->rs_total,
+ T_state_info_pad) ||
+   nla_put_u64_64bit(skb, T_bits_rs_failed,
+ device->rs_failed,
+ T_state_info_pad))
goto nla_put_failure;
}
}
diff --git a/include/linux/drbd_genl.h b/include/linux/drbd_genl.h
index 2d0e5ad5de9d..f3b810089142 100644
--- a/include/linux/drbd_genl.h
+++ b/include/linux/drbd_genl.h
@@ -107,7 +107,7 @@ GENL_struct(

Re: [PATCH net-next] block/drbd: use nla_put_u64_64bit()

2016-05-03 Thread Nicolas Dichtel

Le 03/05/2016 10:50, Nicolas Dichtel a écrit :
> I had to define an intermediate function (nla_magic_put_flag()) because
> handlers in genl_magic_struct.h expect a function with three arguments.
> 
> Note that this patch is only compile-tested.
> 
> Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
Please, drop it. I will send another version to handle all cases.

[PATCH net-next] block/drbd: use nla_put_u64_64bit()

2016-05-03 Thread Nicolas Dichtel

I had to define an intermediate function (nla_magic_put_flag()) because
handlers in genl_magic_struct.h expect a function with three arguments.

Note that this patch is only compile-tested.

Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 drivers/block/drbd/drbd_nl.c  | 29 +
 include/linux/drbd_genl.h |  1 +
 include/linux/genl_magic_struct.h |  4 
 3 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 1fd1dccebb6b..22ec2ede4110 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -3633,14 +3633,15 @@ static int nla_put_status_info(struct sk_buff *skb, 
struct drbd_device *device,
goto nla_put_failure;
if (nla_put_u32(skb, T_sib_reason, sib ? sib->sib_reason : 
SIB_GET_STATUS_REPLY) ||
nla_put_u32(skb, T_current_state, device->state.i) ||
-   nla_put_u64(skb, T_ed_uuid, device->ed_uuid) ||
-   nla_put_u64(skb, T_capacity, drbd_get_capacity(device->this_bdev)) 
||
-   nla_put_u64(skb, T_send_cnt, device->send_cnt) ||
-   nla_put_u64(skb, T_recv_cnt, device->recv_cnt) ||
-   nla_put_u64(skb, T_read_cnt, device->read_cnt) ||
-   nla_put_u64(skb, T_writ_cnt, device->writ_cnt) ||
-   nla_put_u64(skb, T_al_writ_cnt, device->al_writ_cnt) ||
-   nla_put_u64(skb, T_bm_writ_cnt, device->bm_writ_cnt) ||
+   nla_put_u64_64bit(skb, T_ed_uuid, device->ed_uuid, T_pad) ||
+   nla_put_u64_64bit(skb, T_capacity,
+ drbd_get_capacity(device->this_bdev), T_pad) ||
+   nla_put_u64_64bit(skb, T_send_cnt, device->send_cnt, T_pad) ||
+   nla_put_u64_64bit(skb, T_recv_cnt, device->recv_cnt, T_pad) ||
+   nla_put_u64_64bit(skb, T_read_cnt, device->read_cnt, T_pad) ||
+   nla_put_u64_64bit(skb, T_writ_cnt, device->writ_cnt, T_pad) ||
+   nla_put_u64_64bit(skb, T_al_writ_cnt, device->al_writ_cnt, T_pad) ||
+   nla_put_u64_64bit(skb, T_bm_writ_cnt, device->bm_writ_cnt, T_pad) ||
nla_put_u32(skb, T_ap_bio_cnt, atomic_read(>ap_bio_cnt)) ||
nla_put_u32(skb, T_ap_pending_cnt, 
atomic_read(>ap_pending_cnt)) ||
nla_put_u32(skb, T_rs_pending_cnt, 
atomic_read(>rs_pending_cnt)))
@@ -3657,13 +3658,17 @@ static int nla_put_status_info(struct sk_buff *skb, 
struct drbd_device *device,
goto nla_put_failure;
 
if (nla_put_u32(skb, T_disk_flags, device->ldev->md.flags) ||
-   nla_put_u64(skb, T_bits_total, drbd_bm_bits(device)) ||
-   nla_put_u64(skb, T_bits_oos, drbd_bm_total_weight(device)))
+   nla_put_u64_64bit(skb, T_bits_total, drbd_bm_bits(device),
+ T_pad) ||
+   nla_put_u64_64bit(skb, T_bits_oos,
+ drbd_bm_total_weight(device), T_pad))
goto nla_put_failure;
if (C_SYNC_SOURCE <= device->state.conn &&
C_PAUSED_SYNC_T >= device->state.conn) {
-   if (nla_put_u64(skb, T_bits_rs_total, device->rs_total) 
||
-   nla_put_u64(skb, T_bits_rs_failed, 
device->rs_failed))
+   if (nla_put_u64_64bit(skb, T_bits_rs_total,
+ device->rs_total, T_pad) ||
+   nla_put_u64_64bit(skb, T_bits_rs_failed,
+ device->rs_failed, T_pad))
goto nla_put_failure;
}
}
diff --git a/include/linux/drbd_genl.h b/include/linux/drbd_genl.h
index 2d0e5ad5de9d..8d327d8fbbc2 100644
--- a/include/linux/drbd_genl.h
+++ b/include/linux/drbd_genl.h
@@ -227,6 +227,7 @@ GENL_struct(DRBD_NLA_STATE_INFO, 8, state_info,
__u32_field(21,  0, ap_bio_cnt)
__u32_field(22,  0, ap_pending_cnt)
__u32_field(23,  0, rs_pending_cnt)
+   __unspec_field(24,   0, pad)
 )
 
 GENL_struct(DRBD_NLA_START_OV_PARMS, 9, start_ov_parms,
diff --git a/include/linux/genl_magic_struct.h 
b/include/linux/genl_magic_struct.h
index eecd19b37001..fde46be8fc40 100644
--- a/include/linux/genl_magic_struct.h
+++ b/include/linux/genl_magic_struct.h
@@ -61,11 +61,15 @@ extern void CONCAT_(GENL_MAGIC_FAMILY, 
_genl_unregister)(void);
  */
 
 /* MAGIC helpers   {{{2 */
+#define nla_magic_put_flag(skb, attr, val) nla_put_flag(skb, attr)
 
 /* possible field types */
 #define __flg_field(attr_nr, attr_flag, name) \
__field(attr_nr, attr_flag, name, NLA_U8, char, \
nla_get_u8, nla_put_u8

[PATCH net v1] ipv6/ila: fix nlsize calculation for lwtunnel

2016-05-03 Thread Nicolas Dichtel

The handler 'ila_fill_encap_info' adds one attribute: ILA_ATTR_LOCATOR.

Fixes: 65d7ab8de582 ("net: Identifier Locator Addressing module")
CC: Tom Herbert <t...@herbertland.com>
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---

RFC -> v1:
  - rebase on last net tree

 net/ipv6/ila/ila_lwt.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv6/ila/ila_lwt.c b/net/ipv6/ila/ila_lwt.c
index 2ae3c4fd8aab..41f18de5dcc2 100644
--- a/net/ipv6/ila/ila_lwt.c
+++ b/net/ipv6/ila/ila_lwt.c
@@ -120,8 +120,7 @@ nla_put_failure:
 
 static int ila_encap_nlsize(struct lwtunnel_state *lwtstate)
 {
-   /* No encapsulation overhead */
-   return 0;
+   return nla_total_size(sizeof(u64)); /* ILA_ATTR_LOCATOR */
 }
 
 static int ila_encap_cmp(struct lwtunnel_state *a, struct lwtunnel_state *b)
-- 
2.8.1

Re: [RFC PATCH net] ipv6/ila: fix nlsize calculation for lwtunnel

2016-04-29 Thread Nicolas Dichtel

Le 28/04/2016 18:07, Tom Herbert a écrit :
> On Wed, Apr 27, 2016 at 12:20 PM, David Miller <da...@davemloft.net> wrote:
>> From: Nicolas Dichtel <nicolas.dich...@6wind.com>
>> Date: Fri, 22 Apr 2016 17:58:02 +0200
>>
>>> The handler 'ila_fill_encap_info' adds one attribute: ILA_ATTR_LOCATOR.
>>>
>>> Fixes: 65d7ab8de582 ("net: Identifier Locator Addressing module")
>>> CC: Tom Herbert <t...@herbertland.com>
>>> Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
>>> ---
>>>
>>> Tom, when I read the comment, I feel I'm misssing something, but what?
>>
>> Tom, seriously, please look at this.
>>
> Yes, it is an issue. Need to do add size for ILA_ATTR_LOCATOR and
> ILA_ATTR_CSUM_MODE. Also not we made ILA_ATTR_CSUM_MODE u64 in fill
> encap info. I will send a path shortly.
David, I saw that you didn't apply this patch. Note that Tom's patch is
for net-next while this one is for net.
I think that both patches are needed (and they will conflict after the next 
merge).


Regards,
Nicolas

Re: [PATCH net-next] drivers/net: add 6WIND SHULTI support

2016-04-29 Thread Nicolas Dichtel

Le 28/04/2016 17:54, David Miller a écrit :
[snip]
> You can say whatever you want, but the facilities you are adding to
> this driver enables proprietary userland SDK components.
> 
> And this is precisely what we are trying to avoid by having a clean,
> fully featured switch device model in the kernel.
> 
> It is against your interestes of upstreaming your driver to continue
> denying what your changes facilitate.
Ok, I will rework this patch to remove the controversial parts and custom APIs.


Thank you,
Nicolas

Re: [PATCH net-next] ila: ipv6/ila: fix nlsize calculation for lwtunnel

2016-04-29 Thread Nicolas Dichtel

Le 29/04/2016 02:12, Tom Herbert a écrit :
> The handler 'ila_fill_encap_info' adds two attributes: ILA_ATTR_LOCATOR
> and ILA_ATTR_CSUM_MODE.
> 
> Also, do nla_put_u8 instead of nla_put_u64 for ILA_ATTR_CSUM_MODE.
> 
> Fixes: 65d7ab8de582 ("net: Identifier Locator Addressing module")
> Reported-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
> Signed-off-by: Tom Herbert <t...@herbertland.com>
> ---
>  net/ipv6/ila/ila_lwt.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/net/ipv6/ila/ila_lwt.c b/net/ipv6/ila/ila_lwt.c
> index 4985e1a..7788090 100644
> --- a/net/ipv6/ila/ila_lwt.c
> +++ b/net/ipv6/ila/ila_lwt.c
> @@ -133,7 +133,7 @@ static int ila_fill_encap_info(struct sk_buff *skb,
>   if (nla_put_u64_64bit(skb, ILA_ATTR_LOCATOR, (__force 
> u64)p->locator.v64,
> ILA_ATTR_PAD))
>   goto nla_put_failure;
> - if (nla_put_u64(skb, ILA_ATTR_CSUM_MODE, (__force u8)p->csum_mode))
> + if (nla_put_u8(skb, ILA_ATTR_CSUM_MODE, (__force u8)p->csum_mode))
>   goto nla_put_failure;
>  
>   return 0;
> @@ -144,8 +144,12 @@ nla_put_failure:
>  
>  static int ila_encap_nlsize(struct lwtunnel_state *lwtstate)
>  {
> - /* No encapsulation overhead */
> - return 0;
> + return
> + /* ILA_ATTR_LOCATOR */
> + nla_total_size(sizeof(u64)) +
It should be nla_total_size_64bit() here.


Regards,
Nicolas

Re: [PATCH net-next] vxlan: fix ethernet address initialization

2016-04-28 Thread Nicolas Dichtel

Le 28/04/2016 12:04, Nicolas Dichtel a écrit :
> Since commit 0c867c9bf84c, when the user specifies an ethernet address with
> IFLA_ADDRESS, it's overridden by vxlan_ether_setup() (rtnl_link_ops->setup
> is called in rtnetlink.c before handling IFLA_ADDRESS).
> 
> To test it:
> ip link add name vxlan1 address de:ad:de:4c:0f:c2 type vxlan id 1 group 
> 239.0.0.10 dev eth0
> 
> CC: Jiri Benc <jb...@redhat.com>
> Fixes: 0c867c9bf84c ("vxlan: move Ethernet initialization to a separate 
> function")
> Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
David, please drop this patch. Jiri has sent a better fix.


Thank you,
Nicolas

Re: [PATCH net-next] vxlan: fix initialization with custom link parameters

2016-04-28 Thread Nicolas Dichtel

Le 28/04/2016 16:36, Jiri Benc a écrit :
> Commit 0c867c9bf84c ("vxlan: move Ethernet initialization to a separate
> function") changed initialization order and as an unintended result, when the
> user specifies additional link parameters (such as IFLA_ADDRESS) while
> creating vxlan interface, those are overwritten by vxlan_ether_setup later.
> 
> It's necessary to call ether_setup from withing the ->setup callback. That
> way, the correct parameters are set by rtnl_create_link later. This is done
> also for VXLAN-GPE, as we don't know the interface type yet at that point,
> and changed to the correct interface type later.
> 
> Fixes: 0c867c9bf84c ("vxlan: move Ethernet initialization to a separate 
> function")
> Reported-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
> Signed-off-by: Jiri Benc <jb...@redhat.com>
Tested-by: Nicolas Dichtel <nicolas.dich...@6wind.com>

Re: [PATCH net-next 00/17] net: snmp: update SNMP methods

2016-04-28 Thread Nicolas Dichtel

Le 28/04/2016 14:00, Eric Dumazet a écrit :
[snip]
> Hi Nicolas, thanks for testing.
> 
> Oh right, I shouldn't have changed the BH disabling of 64bit stats on
> 32bit arches, of course.
> 
> Can you double check this will fix the problem ? Thanks !
Thank you for the quick fix!

Tested-by: Nicolas Dichtel <nicolas.dich...@6wind.com>

Re: [PATCH net-next] drivers/net: add 6WIND SHULTI support

2016-04-28 Thread Nicolas Dichtel

Le 27/04/2016 19:07, Stephen Hemminger a écrit :
[snip]
> Also it has a bunch of device specific generic netlink which was
> a red flag for me.
> 
Ok, I will rework this part.

Re: [PATCH net-next] drivers/net: add 6WIND SHULTI support

2016-04-28 Thread Nicolas Dichtel

Le 27/04/2016 18:55, David Miller a écrit :
> From: Jiri Pirko 
[snip]
>> The difference is that it this tries to allow userspace crap to mirror
>> setting user does for bridge/ovs. Basically this looks to me like an
>> attempt to enable userspace SDKs and such.
> 
> +1
I don't think so because a userspace can receive all the bridge/ovs settings by
mean of netlink mirror, without the need for this driver at all.

[PATCH net-next] vxlan: fix ethernet address initialization

2016-04-28 Thread Nicolas Dichtel

Since commit 0c867c9bf84c, when the user specifies an ethernet address with
IFLA_ADDRESS, it's overridden by vxlan_ether_setup() (rtnl_link_ops->setup
is called in rtnetlink.c before handling IFLA_ADDRESS).

To test it:
ip link add name vxlan1 address de:ad:de:4c:0f:c2 type vxlan id 1 group 
239.0.0.10 dev eth0

CC: Jiri Benc <jb...@redhat.com>
Fixes: 0c867c9bf84c ("vxlan: move Ethernet initialization to a separate 
function")
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 drivers/net/vxlan.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 6fb93b57a724..d454c7e7d16e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2592,7 +2592,8 @@ static void vxlan_setup(struct net_device *dev)
 
 static void vxlan_ether_setup(struct net_device *dev)
 {
-   eth_hw_addr_random(dev);
+   if (!is_valid_ether_addr(dev->dev_addr))
+   eth_hw_addr_random(dev);
ether_setup(dev);
dev->priv_flags &= ~IFF_TX_SKB_SHARING;
dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
-- 
2.4.2

[PATCH net-next] taskstats: fix nl parsing in accounting/getdelays.c

2016-04-27 Thread Nicolas Dichtel

The type TASKSTATS_TYPE_NULL should always be ignored.

When jumping to the next attribute, only the length of the current
attribute should be added, not the length of all nested attributes.
This last bug was not visible before commit 80df554275c2, because the
kernel didn't put more than two nested attributes.

Fixes: a3baf649ca9c ("[PATCH] per-task-delay-accounting: documentation")
Fixes: 80df554275c2 ("taskstats: use the libnl API to align nlattr on 64-bit")
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 Documentation/accounting/getdelays.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/accounting/getdelays.c 
b/Documentation/accounting/getdelays.c
index 7785fb5eb93f..d3caa6748a46 100644
--- a/Documentation/accounting/getdelays.c
+++ b/Documentation/accounting/getdelays.c
@@ -512,7 +512,7 @@ int main(int argc, char *argv[])
break;
}
len2 += NLA_ALIGN(na->nla_len);
-   na = (struct nlattr *) ((char *) na + 
len2);
+   na = (struct nlattr *) ((char *) na + 
NLA_ALIGN(na->nla_len));
}
break;
 
-- 
2.8.1

[PATCH net-next v2] taskstats: fix nl parsing in accounting/getdelays.c

2016-04-27 Thread Nicolas Dichtel

The type TASKSTATS_TYPE_NULL should always be ignored.

When jumping to the next attribute, only the length of the current
attribute should be added, not the length of all nested attributes.
This last bug was not visible before commit 80df554275c2, because the
kernel didn't put more than two nested attributes.

Fixes: a3baf649ca9c ("[PATCH] per-task-delay-accounting: documentation")
Fixes: 80df554275c2 ("taskstats: use the libnl API to align nlattr on 64-bit")
Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
---
 Documentation/accounting/getdelays.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/accounting/getdelays.c 
b/Documentation/accounting/getdelays.c
index 7785fb5eb93f..b5ca536e56a8 100644
--- a/Documentation/accounting/getdelays.c
+++ b/Documentation/accounting/getdelays.c
@@ -505,6 +505,8 @@ int main(int argc, char *argv[])
if (!loop)
goto done;
break;
+   case TASKSTATS_TYPE_NULL:
+   break;
default:
fprintf(stderr, "Unknown nested"
" nla_type %d\n",
@@ -512,7 +514,8 @@ int main(int argc, char *argv[])
break;
}
len2 += NLA_ALIGN(na->nla_len);
-   na = (struct nlattr *) ((char *) na + 
len2);
+   na = (struct nlattr *)((char *)na +
+  
NLA_ALIGN(na->nla_len));
}
break;
 
-- 
2.8.1

Re: [PATCH net-next] taskstats: fix nl parsing in accounting/getdelays.c

2016-04-27 Thread Nicolas Dichtel

Le 27/04/2016 17:47, Nicolas Dichtel a écrit :
> The type TASKSTATS_TYPE_NULL should always be ignored.
> 
> When jumping to the next attribute, only the length of the current
> attribute should be added, not the length of all nested attributes.
> This last bug was not visible before commit 80df554275c2, because the
> kernel didn't put more than two nested attributes.
> 
> Fixes: a3baf649ca9c ("[PATCH] per-task-delay-accounting: documentation")
> Fixes: 80df554275c2 ("taskstats: use the libnl API to align nlattr on 64-bit")
> Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>
Please, drop this version. I fatfingered my rebase.

Re: [PATCH net-next 9/9] taskstats: use the libnl API to align nlattr on 64-bit

2016-04-27 Thread Nicolas Dichtel

Le 27/04/2016 14:29, Balbir Singh a écrit :
[snip]
> Please try
> 
> https://www.kernel.org/doc/Documentation/accounting/getdelays.c
A patch follows this mail to fix that.

> 
> iotop uses it as well. My concern is ABI breakage of user space.
My test is ok here, I didn't see a problem.
Code review (from here http://repo.or.cz/w/iotop.git) is also ok.


Regards,
Nicolas

< 1 2 3 4 5 6 >

201 - 300 of 516 matches

Mail list logo