[PATCH ipsec-next] xfrm: use a dedicated slab cache for struct xfrm_state

2018-05-03 Thread Mathias Krause
struct xfrm_state is rather large (768 bytes here) and therefore wastes
quite a lot of memory as it falls into the kmalloc-1024 slab cache,
leaving 256 bytes of unused memory per XFRM state object -- a net waste
of 25%.

Using a dedicated slab cache for struct xfrm_state reduces the level of
internal fragmentation to a minimum.

On my configuration SLUB chooses to create a slab cache covering 4
pages holding 21 objects, resulting in an average memory waste of ~13
bytes per object -- a net waste of only 1.6%.

In my tests this led to memory savings of roughly 2.3MB for 10k XFRM
states.

Signed-off-by: Mathias Krause 
---
 net/xfrm/xfrm_state.c |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index f9d2f2233f09..73db0ea8692a 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -42,6 +42,7 @@
 
 static unsigned int xfrm_state_hashmax __read_mostly = 1 * 1024 * 1024;
 static __read_mostly seqcount_t xfrm_state_hash_generation = 
SEQCNT_ZERO(xfrm_state_hash_generation);
+static struct kmem_cache *xfrm_state_cache __ro_after_init;
 
 static DECLARE_WORK(xfrm_state_gc_work, xfrm_state_gc_task);
 static HLIST_HEAD(xfrm_state_gc_list);
@@ -451,7 +452,7 @@ static void xfrm_state_gc_destroy(struct xfrm_state *x)
}
xfrm_dev_state_free(x);
security_xfrm_state_free(x);
-   kfree(x);
+   kmem_cache_free(xfrm_state_cache, x);
 }
 
 static void xfrm_state_gc_task(struct work_struct *work)
@@ -563,7 +564,7 @@ struct xfrm_state *xfrm_state_alloc(struct net *net)
 {
struct xfrm_state *x;
 
-   x = kzalloc(sizeof(struct xfrm_state), GFP_ATOMIC);
+   x = kmem_cache_alloc(xfrm_state_cache, GFP_ATOMIC | __GFP_ZERO);
 
if (x) {
write_pnet(&x->xs_net, net);
@@ -2307,6 +2308,10 @@ int __net_init xfrm_state_init(struct net *net)
 {
unsigned int sz;
 
+   if (net_eq(net, &init_net))
+   xfrm_state_cache = KMEM_CACHE(xfrm_state,
+ SLAB_HWCACHE_ALIGN | SLAB_PANIC);
+
INIT_LIST_HEAD(&net->xfrm.state_all);
 
sz = sizeof(struct hlist_head) * 8;
-- 
1.7.10.4



Re: [PATCH] net: ipv6: xfrm6_state: remove VLA usage

2018-03-09 Thread Mathias Krause
On 9 March 2018 at 13:21, Andreas Christoforou
 wrote:
> The kernel would like to have all stack VLA usage removed[1].
>
> Signed-off-by: Andreas Christoforou 
> ---
>  net/ipv6/xfrm6_state.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv6/xfrm6_state.c b/net/ipv6/xfrm6_state.c
> index b15075a..45c0d98 100644
> --- a/net/ipv6/xfrm6_state.c
> +++ b/net/ipv6/xfrm6_state.c
> @@ -62,7 +62,12 @@ __xfrm6_sort(void **dst, void **src, int n, int 
> (*cmp)(void *p), int maxclass)
>  {
> int i;
> int class[XFRM_MAX_DEPTH];
> -   int count[maxclass];
> +   int *count;
> +
> +   count = kcalloc(maxclass + 1, sizeof(*count), GFP_KERNEL);
> +
> +   if (!count)
> +   return -ENOMEM;
>
> memset(count, 0, sizeof(count));
>
> @@ -80,6 +85,7 @@ __xfrm6_sort(void **dst, void **src, int n, int (*cmp)(void 
> *p), int maxclass)
> src[i] = NULL;
> }
>
> +   kfree(count);
> return 0;
>  }

Instead of dynamically allocating and freeing memory here, shouldn't
we just get rid of the maxclass parameter and use XFRM_MAX_DEPTH as
size for the count[] array, too?

Cheers,
Mathias


Re: [PATCH net 0/4] xfrm_user info leaks

2017-08-26 Thread Mathias Krause
On 26 August 2017 at 17:58, Joe Perches  wrote:
> On Sat, 2017-08-26 at 17:08 +0200, Mathias Krause wrote:
>> Hi David, Steffen,
>>
>> the following series fixes a few info leaks due to missing padding byte
>> initialization in the xfrm_user netlink interface.
>
> Were these found by inspection or by some tool?
> If by tool, perhaps there are other _to_user cases?

I found the one in the offload API by manual inspection, looked around
a little and found the others. No tool involved.

I already looked at the xfrm_user API back in 2012 and fixed quite a
few info leaks but missed the ones in the netlink multicast
notification code :/


Regards,
Mathias


[PATCH net 3/4] xfrm_user: fix info leak in build_expire()

2017-08-26 Thread Mathias Krause
The memory reserved to dump the expired xfrm state includes padding
bytes in struct xfrm_user_expire added by the compiler for alignment. To
prevent the heap info leak, memset(0) the remainder of the struct.
Initializing the whole structure isn't needed as copy_to_user_state()
already takes care of clearing the padding bytes within the 'state'
member.

Signed-off-by: Mathias Krause 
---
 net/xfrm/xfrm_user.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index c33516ef52f2..2cbdc81610c6 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2578,6 +2578,8 @@ static int build_expire(struct sk_buff *skb, struct 
xfrm_state *x, const struct
ue = nlmsg_data(nlh);
copy_to_user_state(x, &ue->state);
ue->hard = (c->data.hard != 0) ? 1 : 0;
+   /* clear the padding bytes */
+   memset(&ue->hard + 1, 0, sizeof(*ue) - offsetofend(typeof(*ue), hard));
 
err = xfrm_mark_put(skb, &x->mark);
if (err)
-- 
1.7.10.4



[PATCH net 2/4] xfrm_user: fix info leak in xfrm_notify_sa()

2017-08-26 Thread Mathias Krause
The memory reserved to dump the ID of the xfrm state includes a padding
byte in struct xfrm_usersa_id added by the compiler for alignment. To
prevent the heap info leak, memset(0) the whole struct before filling
it.

Cc: Herbert Xu 
Fixes: 0603eac0d6b7 ("[IPSEC]: Add XFRMA_SA/XFRMA_POLICY for delete 
notification")
Signed-off-by: Mathias Krause 
---
 net/xfrm/xfrm_user.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 3259555ae7d7..c33516ef52f2 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2715,6 +2715,7 @@ static int xfrm_notify_sa(struct xfrm_state *x, const 
struct km_event *c)
struct nlattr *attr;
 
id = nlmsg_data(nlh);
+   memset(id, 0, sizeof(*id));
memcpy(&id->daddr, &x->id.daddr, sizeof(id->daddr));
id->spi = x->id.spi;
id->family = x->props.family;
-- 
1.7.10.4



[PATCH net 1/4] xfrm_user: fix info leak in copy_user_offload()

2017-08-26 Thread Mathias Krause
The memory reserved to dump the xfrm offload state includes padding
bytes of struct xfrm_user_offload added by the compiler for alignment.
Add an explicit memset(0) before filling the buffer to avoid the heap
info leak.

Cc: Steffen Klassert 
Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Mathias Krause 
---
 net/xfrm/xfrm_user.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 2be4c6af008a..3259555ae7d7 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -796,7 +796,7 @@ static int copy_user_offload(struct xfrm_state_offload 
*xso, struct sk_buff *skb
return -EMSGSIZE;
 
xuo = nla_data(attr);
-
+   memset(xuo, 0, sizeof(*xuo));
xuo->ifindex = xso->dev->ifindex;
xuo->flags = xso->flags;
 
-- 
1.7.10.4



[PATCH net 4/4] xfrm_user: fix info leak in build_aevent()

2017-08-26 Thread Mathias Krause
The memory reserved to dump the ID of the xfrm state includes a padding
byte in struct xfrm_usersa_id added by the compiler for alignment. To
prevent the heap info leak, memset(0) the sa_id before filling it.

Cc: Jamal Hadi Salim 
Fixes: d51d081d6504 ("[IPSEC]: Sync series - user")
Signed-off-by: Mathias Krause 
---
 net/xfrm/xfrm_user.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 2cbdc81610c6..9391ced05259 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1869,6 +1869,7 @@ static int build_aevent(struct sk_buff *skb, struct 
xfrm_state *x, const struct
return -EMSGSIZE;
 
id = nlmsg_data(nlh);
+   memset(&id->sa_id, 0, sizeof(id->sa_id));
memcpy(&id->sa_id.daddr, &x->id.daddr, sizeof(x->id.daddr));
id->sa_id.spi = x->id.spi;
id->sa_id.family = x->props.family;
-- 
1.7.10.4



[PATCH net 0/4] xfrm_user info leaks

2017-08-26 Thread Mathias Krause
Hi David, Steffen,

the following series fixes a few info leaks due to missing padding byte
initialization in the xfrm_user netlink interface.

Please apply!

Mathias Krause (4):
  xfrm_user: fix info leak in copy_user_offload()
  xfrm_user: fix info leak in xfrm_notify_sa()
  xfrm_user: fix info leak in build_expire()
  xfrm_user: fix info leak in build_aevent()

 net/xfrm/xfrm_user.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

-- 
1.7.10.4



[PATCH net] rtnl: stats - add missing netlink message size checks

2016-12-28 Thread Mathias Krause
We miss to check if the netlink message is actually big enough to contain
a struct if_stats_msg.

Add a check to prevent userland from sending us short messages that would
make us access memory beyond the end of the message.

Fixes: 10c9ead9f3c6 ("rtnetlink: add new RTM_GETSTATS message to dump...")
Signed-off-by: Mathias Krause 
Cc: Roopa Prabhu 
---
 net/core/rtnetlink.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 18b5aae99bec..75e3ea7bda08 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3898,6 +3898,9 @@ static int rtnl_stats_get(struct sk_buff *skb, struct 
nlmsghdr *nlh)
u32 filter_mask;
int err;
 
+   if (nlmsg_len(nlh) < sizeof(*ifsm))
+   return -EINVAL;
+
ifsm = nlmsg_data(nlh);
if (ifsm->ifindex > 0)
dev = __dev_get_by_index(net, ifsm->ifindex);
@@ -3947,6 +3950,9 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct 
netlink_callback *cb)
 
cb->seq = net->dev_base_seq;
 
+   if (nlmsg_len(cb->nlh) < sizeof(*ifsm))
+   return -EINVAL;
+
ifsm = nlmsg_data(cb->nlh);
filter_mask = ifsm->filter_mask;
if (!filter_mask)
-- 
1.7.10.4



[PATCH] rtnl: reset calcit fptr in rtnl_unregister()

2016-11-07 Thread Mathias Krause
To avoid having dangling function pointers left behind, reset calcit in
rtnl_unregister(), too.

This is no issue so far, as only the rtnl core registers a netlink
handler with a calcit hook which won't be unregistered, but may become
one if new code makes use of the calcit hook.

Fixes: c7ac8679bec9 ("rtnetlink: Compute and store minimum ifinfo...")
Cc: Jeff Kirsher 
Cc: Greg Rose 
Signed-off-by: Mathias Krause 
---
 net/core/rtnetlink.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 189cc78c77eb..d4c601604bf7 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -275,6 +275,7 @@ int rtnl_unregister(int protocol, int msgtype)
 
rtnl_msg_handlers[protocol][msgindex].doit = NULL;
rtnl_msg_handlers[protocol][msgindex].dumpit = NULL;
+   rtnl_msg_handlers[protocol][msgindex].calcit = NULL;
 
return 0;
 }
-- 
1.7.10.4



[PATCH] xfrm_user: propagate sec ctx allocation errors

2016-09-08 Thread Mathias Krause
When we fail to attach the security context in xfrm_state_construct()
we'll return 0 as error value which, in turn, will wrongly claim success
to userland when, in fact, we won't be adding / updating the XFRM state.

This is a regression introduced by commit fd21150a0fe1 ("[XFRM] netlink:
Inline attach_encap_tmpl(), attach_sec_ctx(), and attach_one_addr()").

Fix it by propagating the error returned by security_xfrm_state_alloc()
in this case.

Fixes: fd21150a0fe1 ("[XFRM] netlink: Inline attach_encap_tmpl()...")
Signed-off-by: Mathias Krause 
Cc: Thomas Graf 
---
 net/xfrm/xfrm_user.c |9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index d516845e16e3..b08440a8f123 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -581,9 +581,12 @@ static struct xfrm_state *xfrm_state_construct(struct net 
*net,
if (err)
goto error;
 
-   if (attrs[XFRMA_SEC_CTX] &&
-   security_xfrm_state_alloc(x, nla_data(attrs[XFRMA_SEC_CTX])))
-   goto error;
+   if (attrs[XFRMA_SEC_CTX]) {
+   err = security_xfrm_state_alloc(x,
+   nla_data(attrs[XFRMA_SEC_CTX]));
+   if (err)
+   goto error;
+   }
 
if ((err = xfrm_alloc_replay_state_esn(&x->replay_esn, &x->preplay_esn,
   attrs[XFRMA_REPLAY_ESN_VAL])))
-- 
1.7.10.4



[PATCH net] packet: fix heap info leak in PACKET_DIAG_MCLIST sock_diag interface

2016-04-10 Thread Mathias Krause
Because we miss to wipe the remainder of i->addr[] in packet_mc_add(),
pdiag_put_mclist() leaks uninitialized heap bytes via the
PACKET_DIAG_MCLIST netlink attribute.

Fix this by explicitly memset(0)ing the remaining bytes in i->addr[].

Fixes: eea68e2f1a00 ("packet: Report socket mclist info via diag module")
Signed-off-by: Mathias Krause 
Cc: Eric W. Biederman 
Cc: Pavel Emelyanov 
---
The bug itself precedes commit eea68e2f1a00 but the list wasn't exposed
to userland before the introduction of the packet_diag interface.
Therefore the "Fixes:" line on that commit.

 net/packet/af_packet.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 992396aa635c..86a408cf38d5 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3441,6 +3441,7 @@ static int packet_mc_add(struct sock *sk, struct 
packet_mreq_max *mreq)
i->ifindex = mreq->mr_ifindex;
i->alen = mreq->mr_alen;
memcpy(i->addr, mreq->mr_address, i->alen);
+   memset(i->addr + i->alen, 0, sizeof(i->addr) - i->alen);
i->count = 1;
i->next = po->mclist;
po->mclist = i;
-- 
1.7.10.4



Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()

2015-10-02 Thread Mathias Krause
t; +   POLLOUT | POLLWRNORM | POLLWRBAND);
> return 0;
>
>  out_unlock:
> @@ -1194,6 +1216,8 @@ restart:
>
> sock_hold(sk);
> unix_peer(newsk)= sk;
> +   if (sk->sk_type == SOCK_SEQPACKET)
> +   add_wait_queue(&unix_sk(sk)->peer_wait, 
> &unix_sk(newsk)->wait);
> newsk->sk_state = TCP_ESTABLISHED;
> newsk->sk_type  = sk->sk_type;
> init_peercred(newsk);
> @@ -1220,6 +1244,8 @@ restart:
>
> smp_mb__after_atomic(); /* sock_hold() does an atomic_inc() */
> unix_peer(sk)   = newsk;
> +   if (sk->sk_type == SOCK_SEQPACKET)
> +   add_wait_queue(&unix_sk(newsk)->peer_wait, 
> &unix_sk(sk)->wait);
>
> unix_state_unlock(sk);
>
> @@ -1254,6 +1280,10 @@ static int unix_socketpair(struct socket *socka, 
> struct socket *sockb)
> sock_hold(skb);
> unix_peer(ska) = skb;
> unix_peer(skb) = ska;
> +   if (ska->sk_type != SOCK_STREAM) {
> +   add_wait_queue(&unix_sk(ska)->peer_wait, &unix_sk(skb)->wait);
> +   add_wait_queue(&unix_sk(skb)->peer_wait, &unix_sk(ska)->wait);
> +   }
> init_peercred(ska);
> init_peercred(skb);
>
> @@ -1565,6 +1595,7 @@ restart:
> unix_state_lock(sk);
> if (unix_peer(sk) == other) {
> unix_peer(sk) = NULL;
> +   remove_wait_queue(&unix_sk(other)->peer_wait, 
> &u->wait);
>     unix_state_unlock(sk);
>
> unix_dgram_disconnected(sk, other);
> @@ -2441,7 +2472,6 @@ static unsigned int unix_dgram_poll(struct file *file, 
> struct socket *sock,
> other = unix_peer_get(sk);
> if (other) {
> if (unix_peer(other) != sk) {
> -   sock_poll_wait(file, &unix_sk(other)->peer_wait, 
> wait);
> if (unix_recvq_full(other))
> writable = 0;
> }
> --
> 1.8.2.rc2
>

My reproducer runs on this patch for more than 3 days now without
triggering anything anymore.

Tested-by: Mathias Krause 

Thanks Jason!

Regards,
Mathias
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket

2015-09-30 Thread Mathias Krause
On 30 September 2015 at 15:25, Rainer Weikusat
 wrote:
> Mathias Krause  writes:
>> On 30 September 2015 at 12:56, Rainer Weikusat 
>>  wrote:
>>> In case you want some information on this: This is a kernel warning I
>>> could trigger (more than once) on the single day I could so far spend
>>> looking into this (3.2.54 kernel):
>>>
>>> Sep 15 19:37:19 doppelsaurus kernel: WARNING: at lib/list_debug.c:53 
>>> list_del+0x9/0x30()
>>> Sep 15 19:37:19 doppelsaurus kernel: Hardware name: 500-330nam
>>> Sep 15 19:37:19 doppelsaurus kernel: list_del corruption. prev->next should 
>>> be 88022c38f078, but was dead00100100
>>> [snip]
>>
>> Is that with Jason's patch or a vanilla v3.2.54?
>
> That's a kernel warning which occurred repeatedly (among other "link
> pointer disorganization" warnings) when I tested the "program with
> unknown behaviour" you wrote with the kernel I'm currently supporting a
> while ago (as I already wrote in the original mail).

So I assume Jason's patch is not included in your kernel. Then those
messages are expected; expected even on kernels as old as v2.6.27.
Can you re-try with Jason's patch applied?

Thanks,
Mathias
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket

2015-09-30 Thread Mathias Krause
On 30 September 2015 at 12:56, Rainer Weikusat
 wrote:
> Mathias Krause  writes:
>> On 29 September 2015 at 21:09, Jason Baron  wrote:
>>> However, if we call connect on socket 's', to connect to a new socket 'o2', 
>>> we
>>> drop the reference on the original socket 'o'. Thus, we can now close socket
>>> 'o' without unregistering from epoll. Then, when we either close the ep
>>> or unregister 'o', we end up with this list corruption. Thus, this is not a
>>> race per se, but can be triggered sequentially.
>>
>> Sounds profound, but the reproducers calls connect only once per
>> socket. So there is no "connect to a new socket", no?
>> But w/e, see below.
>
> In case you want some information on this: This is a kernel warning I
> could trigger (more than once) on the single day I could so far spend
> looking into this (3.2.54 kernel):
>
> Sep 15 19:37:19 doppelsaurus kernel: WARNING: at lib/list_debug.c:53 
> list_del+0x9/0x30()
> Sep 15 19:37:19 doppelsaurus kernel: Hardware name: 500-330nam
> Sep 15 19:37:19 doppelsaurus kernel: list_del corruption. prev->next should 
> be 88022c38f078, but was dead00100100
> [snip]

Is that with Jason's patch or a vanilla v3.2.54?

Regards,
Mathias
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket

2015-09-29 Thread Mathias Krause
On 29 September 2015 at 21:09, Jason Baron  wrote:
> However, if we call connect on socket 's', to connect to a new socket 'o2', we
> drop the reference on the original socket 'o'. Thus, we can now close socket
> 'o' without unregistering from epoll. Then, when we either close the ep
> or unregister 'o', we end up with this list corruption. Thus, this is not a
> race per se, but can be triggered sequentially.

Sounds profound, but the reproducers calls connect only once per
socket. So there is no "connect to a new socket", no?
But w/e, see below.

> Linus explains the general case in the context the signalfd stuff here:
> https://lkml.org/lkml/2013/10/14/634

I also found that posting while looking for similar bug reports. Also
found that one: https://lkml.org/lkml/2014/5/15/532

> So this may be the case that we've been chasing here for a while...

That bug triggers since commit 3c73419c09 "af_unix: fix 'poll for
write'/ connected DGRAM sockets". That's v2.6.26-rc7, as noted in the
reproducer.

>
> In any case, we could fix with that same POLLFREE mechansim, the simplest
> would be something like:
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 03ee4d3..d499f81 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -392,6 +392,9 @@ static void unix_sock_destructor(struct sock *sk)
> pr_debug("UNIX %p is destroyed, %ld are still alive.\n", sk,
> atomic_long_read(&unix_nr_socks));
>  #endif
> +   /* make sure we remove from epoll */
> +   wake_up_poll(&u->peer_wait, POLLFREE);
> +   synchronize_sched();
>  }
>
>  static void unix_release_sock(struct sock *sk, int embrion)
>
> I'm not suggesting we apply that, but that fixed the supplied test case.
> We could enhance the above, to avoid the free'ing latency there by doing
> the SLAB_DESTROY_BY_RCU for unix sockets. But I'm not convinced
> that this wouldn't be still broken for select()/poll() as well. I think
> we can be in a select() call for socket 's', and if we remove socket
> 'o' from it in the meantime (by doing a connect() on s to somewhere else
> and a close on 'o'), I think we can still crash there. So POLLFREE would
> have to be extended. I tried to hit this with select() but could not,
> but I think if I tried harder I could.
>
> Instead of going further down that route, perhaps something like below
> might be better. The basic idea would be to do away with the 'other'
> poll call in unix_dgram_poll(), and instead revert back to a registering
> on a single wait queue. We add a new wait queue to unix sockets such
> that we can register it with a remote other on connect(). Then we can
> use the wakeup from the remote to wake up the registered unix socket.
> Probably better explained with the patch below. Note I didn't add to
> the remote for SOCK_STREAM, since the poll() routine there doesn't do
> the double wait queue registering:
>
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index 4a167b3..9698aff 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -62,6 +62,7 @@ struct unix_sock {
>  #define UNIX_GC_CANDIDATE  0
>  #define UNIX_GC_MAYBE_CYCLE1
> struct socket_wqpeer_wq;
> +   wait_queue_twait;
>  };
>  #define unix_sk(__sk) ((struct unix_sock *)__sk)
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 03ee4d3..9e0692a 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -420,6 +420,8 @@ static void unix_release_sock(struct sock *sk, int 
> embrion)
> skpair = unix_peer(sk);
>
> if (skpair != NULL) {
> +   if (sk->sk_type != SOCK_STREAM)
> +   remove_wait_queue(&unix_sk(skpair)->peer_wait, 
> &u->wait);
> if (sk->sk_type == SOCK_STREAM || sk->sk_type == 
> SOCK_SEQPACKET) {
> unix_state_lock(skpair);
> /* No more writes */
> @@ -636,6 +638,16 @@ static struct proto unix_proto = {
>   */
>  static struct lock_class_key af_unix_sk_receive_queue_lock_key;
>
> +static int peer_wake(wait_queue_t *wait, unsigned mode, int sync, void *key)
> +{
> +   struct unix_sock *u;
> +
> +   u = container_of(wait, struct unix_sock, wait);
> +   wake_up_interruptible_sync_poll(sk_sleep(&u->sk), key);
> +
> +   return 0;
> +}
> +
>  static struct sock *unix_create1(struct net *net, struct socket *sock, int 
> kern)
>  {
> struct sock *sk = NULL;
> @@ -664,6 +676,7 @@ static struct sock *unix_create1(struct net *net, struct 
> socket *sock, int kern)
> INIT_LIST_HEAD(&u->link);
> mutex_init(&u->readlock); /* single task reading lock */
> init_waitqueue_head(&u->peer_wait);
> +   init_waitqueue_func_entry(&u->wait, peer_wake);
> unix_insert_socket(unix_sockets_unbound(sk), sk);
>  out:
> if (sk == NULL)
> @@ -1030,7 +1043,10 @@ restart:
>  */
> if (unix_peer(sk)) {
> struct sock *old_peer = unix

Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket

2015-09-29 Thread Mathias Krause
On 14 September 2015 at 04:39, Eric Wong  wrote:
> +cc Jason Baron since he might be able to provide more insight into
> epoll.
>
> Mathias Krause  wrote:
>> Hi,
>>
>> this is an attempt to resurrect the thread initially started here:
>>
>>   http://thread.gmane.org/gmane.linux.network/353003
>>
>> As that patch fixed the issue for the mentioned reproducer, it did not
>> fix the bug for the production code Olivier is using. :(
>>
>> Changing the reproducer only slightly allows me to trigger the following
>> list debug splat (CONFIG_DEBUG_LIST=y) reliable within seconds -- even
>> with the above linked patch applied:
>>
>> [   50.264249] [ cut here ]
>> [   50.264249] WARNING: CPU: 0 PID: 214 at lib/list_debug.c:59 
>> __list_del_entry+0xa4/0xd0()
>> [   50.264249] list_del corruption. prev->next should be 88003c2c1bb8, 
>> but was 88003f07bbb8
>> [   50.264249] Modules linked in: ipv6 pcspkr serio_raw microcode virtio_net 
>> virtio_pci virtio_ring virtio sr_mod cdrom
>> [   50.264249] CPU: 0 PID: 214 Comm: epoll_bug Not tainted 4.2.0 #75
>> [   50.264249] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>> 1.7.5-20140531_083030-gandalf 04/01/2014
>> [   50.264249]  817e902e 88087d08 8155593c 
>> 0007
>> [   50.264249]  88087d58 88087d48 8105202a 
>> 0001
>> [   50.264249]  88003c2c1bb8 88003f07bb80 0286 
>> 88003f736640
>> [   50.264249] Call Trace:
>> [   50.264249]  [] dump_stack+0x4c/0x65
>> [   50.264249]  [] warn_slowpath_common+0x8a/0xc0
>> [   50.264249]  [] warn_slowpath_fmt+0x46/0x50
>> [   50.264249]  [] __list_del_entry+0xa4/0xd0
>> [   50.264249]  [] list_del+0x11/0x40
>> [   50.264249]  [] remove_wait_queue+0x29/0x40
>> [   50.264249]  [] ep_unregister_pollwait.isra.6+0x58/0x1a0
>> [   50.264249]  [] ? 
>> ep_unregister_pollwait.isra.6+0xa9/0x1a0
>> [   50.264249]  [] ep_remove+0x22/0x110
>> [   50.264249]  [] SyS_epoll_ctl+0x62b/0xf70
>> [   50.264249]  [] ? lockdep_sys_exit_thunk+0x12/0x14
>> [   50.264249]  [] entry_SYSCALL_64_fastpath+0x12/0x6f
>> [   50.264249] ---[ end trace d9af9b915df9667e ]---
>> [   50.572100] [ cut here ]
>> [   50.572100] WARNING: CPU: 1 PID: 212 at lib/list_debug.c:62 
>> __list_del_entry+0xc3/0xd0()
>> [   50.584263] list_del corruption. next->prev should be 88003f664c90, 
>> but was 88003f0cb5b8
>> [   50.584263] Modules linked in: ipv6 pcspkr serio_raw microcode virtio_net 
>> virtio_pci virtio_ring virtio sr_mod cdrom
>> [   50.584263] CPU: 1 PID: 212 Comm: epoll_bug Tainted: GW   
>> 4.2.0 #75
>> [   50.584263] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>> 1.7.5-20140531_083030-gandalf 04/01/2014
>> [   50.584263]  817e902e 88003d37fce8 8155593c 
>> 0007
>> [   50.584263]  88003d37fd38 88003d37fd28 8105202a 
>> 0001
>> [   50.584263]  88003f664c90 88003f0cb580 0282 
>> 88003f731740
>> [   50.584263] Call Trace:
>> [   50.584263]  [] dump_stack+0x4c/0x65
>> [   50.584263]  [] warn_slowpath_common+0x8a/0xc0
>> [   50.584263]  [] warn_slowpath_fmt+0x46/0x50
>> [   50.584263]  [] __list_del_entry+0xc3/0xd0
>> [   50.584263]  [] list_del+0x11/0x40
>> [   50.584263]  [] remove_wait_queue+0x29/0x40
>> [   50.584263]  [] ep_unregister_pollwait.isra.6+0x58/0x1a0
>> [   50.584263]  [] ? 
>> ep_unregister_pollwait.isra.6+0xa9/0x1a0
>> [   50.584263]  [] ep_remove+0x22/0x110
>> [   50.584263]  [] eventpoll_release_file+0x62/0xa0
>> [   50.584263]  [] __fput+0x1af/0x200
>> [   50.584263]  [] ? int_very_careful+0x5/0x3f
>> [   50.584263]  [] fput+0xe/0x10
>> [   50.584263]  [] task_work_run+0x8d/0xc0
>> [   50.584263]  [] do_notify_resume+0x4f/0x60
>> [   50.584263]  [] int_signal+0x12/0x17
>> [   50.584263] ---[ end trace d9af9b915df9667f ]---
>> [   50.584263] BUG: spinlock already unlocked on CPU#1, epoll_bug/212
>> [   50.584263]  lock: 0x88003f0cb580, .magic: dead4ead, .owner: 
>> /-1, .owner_cpu: -1
>> [   50.584263] CPU: 1 PID: 212 Comm: epoll_bug Tainted: GW   
>> 4.2.0 #75
>> [   50.584263] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>> 1.7.5-20140531_083030-gandalf 04/01/2014
>> [   50.584263]  88003f0cb580 88003d37fd38 8155593c 
>> 0007
>> [   50.584263]  f

Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket

2015-09-15 Thread Mathias Krause
On Tue, Sep 15, 2015 at 06:07:05PM +0100, Rainer Weikusat wrote:
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
>  -2233,10 +2233,14  static unsigned int 
> unix_dgram_poll(struct file *file, struct socket *sock,
>   writable = unix_writable(sk);
>   other = unix_peer_get(sk);
>   if (other) {
> - if (unix_peer(other) != sk) {
> + unix_state_lock(other);
> + if (!sock_flag(other, SOCK_DEAD) && unix_peer(other) != sk) {
> + unix_state_unlock(other);
>   sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
>   if (unix_recvq_full(other))
>   writable = 0;
> + } else {
> + unix_state_unlock(other);
>   }
>   sock_put(other);
>   }
> 
> That's obviously not going to help you when 'racing with
> unix_release_sock' as the socket might be released immediately after the
> unix_state_unlock, ie, before sock_poll_wait is called. Provided I
> understand this correctly, the problem is that the socket reference
> count may have become 1 by the time sock_put is called but the
> sock_poll_wait has created a new reference to it which isn't accounted
> for.
> 
> A simple way to fix that could be to do something like
> 
> unix_state_lock(other);
> if (!sock_flag(other, SOCK_DEAD)) sock_poll_wait(...)
> unix_state_unlock(other);

Sorry, but that does not fix the bug. I get the same GPF as before.

> This would imply that unix_release_sock either marked the socket as dead
> before the sock_poll_wait was executed or that the wake_up_interruptible
> call in there will run after ->peer_wait was used (and it will thus
> 'unpollwait' it again).

It must be something else as I also tried the following patch, which
moves the wake_up_interruptible_all() call into the locked code section:

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 03ee4d359f6a..58570a7680ce 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -360,10 +360,12 @@ static void unix_dgram_disconnected(struct sock *sk, 
struct sock *other)
 * we signal error. Messages are lost. Do not make this,
 * when peer was not connected to us.
 */
+   unix_state_lock(other);
if (!sock_flag(other, SOCK_DEAD) && unix_peer(other) == sk) {
other->sk_err = ECONNRESET;
other->sk_error_report(other);
}
+   unix_state_unlock(other);
}
 }
 
@@ -413,11 +415,13 @@ static void unix_release_sock(struct sock *sk, int 
embrion)
u->path.mnt = NULL;
state = sk->sk_state;
sk->sk_state = TCP_CLOSE;
-   unix_state_unlock(sk);
-
-   wake_up_interruptible_all(&u->peer_wait);
 
skpair = unix_peer(sk);
+   unix_peer(sk) = NULL;
+
+   wake_up_interruptible_all(&u->peer_wait);
+
+   unix_state_unlock(sk);
 
if (skpair != NULL) {
if (sk->sk_type == SOCK_STREAM || sk->sk_type == 
SOCK_SEQPACKET) {
@@ -431,7 +435,6 @@ static void unix_release_sock(struct sock *sk, int embrion)
sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP);
}
sock_put(skpair); /* It may now die */
-   unix_peer(sk) = NULL;
}
 
/* Try to flush out this socket. Throw out buffers at least */
@@ -2440,10 +2443,18 @@ static unsigned int unix_dgram_poll(struct file *file, 
struct socket *sock,
writable = unix_writable(sk);
other = unix_peer_get(sk);
if (other) {
-   if (unix_peer(other) != sk) {
+   unix_state_lock(other);
+
+   if (sock_flag(other, SOCK_DEAD) || (other->sk_shutdown & 
RCV_SHUTDOWN)) {
+   unix_state_unlock(other);
+   writable = 0;
+   } else if (unix_peer(other) != sk) {
sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
+   unix_state_unlock(other);
if (unix_recvq_full(other))
writable = 0;
+   } else {
+   unix_state_unlock(other);
}
sock_put(other);
}
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket

2015-09-13 Thread Mathias Krause
Hi,

this is an attempt to resurrect the thread initially started here:

  http://thread.gmane.org/gmane.linux.network/353003

As that patch fixed the issue for the mentioned reproducer, it did not
fix the bug for the production code Olivier is using. :(

Changing the reproducer only slightly allows me to trigger the following
list debug splat (CONFIG_DEBUG_LIST=y) reliable within seconds -- even
with the above linked patch applied:

[   50.264249] [ cut here ]
[   50.264249] WARNING: CPU: 0 PID: 214 at lib/list_debug.c:59 
__list_del_entry+0xa4/0xd0()
[   50.264249] list_del corruption. prev->next should be 88003c2c1bb8, but 
was 88003f07bbb8
[   50.264249] Modules linked in: ipv6 pcspkr serio_raw microcode virtio_net 
virtio_pci virtio_ring virtio sr_mod cdrom
[   50.264249] CPU: 0 PID: 214 Comm: epoll_bug Not tainted 4.2.0 #75
[   50.264249] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.7.5-20140531_083030-gandalf 04/01/2014
[   50.264249]  817e902e 88087d08 8155593c 
0007
[   50.264249]  88087d58 88087d48 8105202a 
0001
[   50.264249]  88003c2c1bb8 88003f07bb80 0286 
88003f736640
[   50.264249] Call Trace:
[   50.264249]  [] dump_stack+0x4c/0x65
[   50.264249]  [] warn_slowpath_common+0x8a/0xc0
[   50.264249]  [] warn_slowpath_fmt+0x46/0x50
[   50.264249]  [] __list_del_entry+0xa4/0xd0
[   50.264249]  [] list_del+0x11/0x40
[   50.264249]  [] remove_wait_queue+0x29/0x40
[   50.264249]  [] ep_unregister_pollwait.isra.6+0x58/0x1a0
[   50.264249]  [] ? ep_unregister_pollwait.isra.6+0xa9/0x1a0
[   50.264249]  [] ep_remove+0x22/0x110
[   50.264249]  [] SyS_epoll_ctl+0x62b/0xf70
[   50.264249]  [] ? lockdep_sys_exit_thunk+0x12/0x14
[   50.264249]  [] entry_SYSCALL_64_fastpath+0x12/0x6f
[   50.264249] ---[ end trace d9af9b915df9667e ]---
[   50.572100] [ cut here ]
[   50.572100] WARNING: CPU: 1 PID: 212 at lib/list_debug.c:62 
__list_del_entry+0xc3/0xd0()
[   50.584263] list_del corruption. next->prev should be 88003f664c90, but 
was 88003f0cb5b8
[   50.584263] Modules linked in: ipv6 pcspkr serio_raw microcode virtio_net 
virtio_pci virtio_ring virtio sr_mod cdrom
[   50.584263] CPU: 1 PID: 212 Comm: epoll_bug Tainted: GW   4.2.0 
#75
[   50.584263] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.7.5-20140531_083030-gandalf 04/01/2014
[   50.584263]  817e902e 88003d37fce8 8155593c 
0007
[   50.584263]  88003d37fd38 88003d37fd28 8105202a 
0001
[   50.584263]  88003f664c90 88003f0cb580 0282 
88003f731740
[   50.584263] Call Trace:
[   50.584263]  [] dump_stack+0x4c/0x65
[   50.584263]  [] warn_slowpath_common+0x8a/0xc0
[   50.584263]  [] warn_slowpath_fmt+0x46/0x50
[   50.584263]  [] __list_del_entry+0xc3/0xd0
[   50.584263]  [] list_del+0x11/0x40
[   50.584263]  [] remove_wait_queue+0x29/0x40
[   50.584263]  [] ep_unregister_pollwait.isra.6+0x58/0x1a0
[   50.584263]  [] ? ep_unregister_pollwait.isra.6+0xa9/0x1a0
[   50.584263]  [] ep_remove+0x22/0x110
[   50.584263]  [] eventpoll_release_file+0x62/0xa0
[   50.584263]  [] __fput+0x1af/0x200
[   50.584263]  [] ? int_very_careful+0x5/0x3f
[   50.584263]  [] fput+0xe/0x10
[   50.584263]  [] task_work_run+0x8d/0xc0
[   50.584263]  [] do_notify_resume+0x4f/0x60
[   50.584263]  [] int_signal+0x12/0x17
[   50.584263] ---[ end trace d9af9b915df9667f ]---
[   50.584263] BUG: spinlock already unlocked on CPU#1, epoll_bug/212
[   50.584263]  lock: 0x88003f0cb580, .magic: dead4ead, .owner: /-1, 
.owner_cpu: -1
[   50.584263] CPU: 1 PID: 212 Comm: epoll_bug Tainted: GW   4.2.0 
#75
[   50.584263] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.7.5-20140531_083030-gandalf 04/01/2014
[   50.584263]  88003f0cb580 88003d37fd38 8155593c 
0007
[   50.584263]   88003d37fd58 810a3375 
88003f0cb580
[   50.584263]  817b9cc8 88003d37fd78 810a33f6 
88003f0cb580
[   50.584263] Call Trace:
[   50.584263]  [] dump_stack+0x4c/0x65
[   50.584263]  [] spin_dump+0x85/0xe0
[   50.584263]  [] spin_bug+0x26/0x30
[   50.584263]  [] do_raw_spin_unlock+0x75/0xa0
[   50.584263]  [] _raw_spin_unlock_irqrestore+0x2c/0x50
[   50.584263]  [] remove_wait_queue+0x34/0x40
[   50.584263]  [] ep_unregister_pollwait.isra.6+0x58/0x1a0
[   50.584263]  [] ? ep_unregister_pollwait.isra.6+0xa9/0x1a0
[   50.584263]  [] ep_remove+0x22/0x110
[   50.584263]  [] eventpoll_release_file+0x62/0xa0
[   50.584263]  [] __fput+0x1af/0x200
[   50.584263]  [] ? int_very_careful+0x5/0x3f
[   50.584263]  [] fput+0xe/0x10
[   50.584263]  [] task_work_run+0x8d/0xc0
[   50.584263]  [] do_notify_resume+0x4f/0x60
[   50.584263]  [] int_signal+0x12/0x17
[...]

That 'spinlock already unlocked' message is also interesting. But even
b

[PATCH] xfrm6: Fix ICMPv6 and MH header checks in _decode_session6

2015-09-11 Thread Mathias Krause
From: Mathias Krause 

Ensure there's enough data left prior calling pskb_may_pull(). If
skb->data was already advanced, we'll call pskb_may_pull() with a
negative value converted to unsigned int -- leading to a huge
positive value. That won't matter in practice as pskb_may_pull()
will likely fail in this case, but it leads to underflow reports on
kernels handling such kind of over-/underflows, e.g. a PaX enabled
kernel instrumented with the size_overflow plugin.

Reported-by: satmd 
Reported-and-tested-by: Marcin Jurkowski 
Signed-off-by: Mathias Krause 
Cc: PaX Team 
---
 net/ipv6/xfrm6_policy.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index ed0583c1b9fc..c988c3f033cf 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -174,7 +174,8 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int 
reverse)
return;
 
case IPPROTO_ICMPV6:
-   if (!onlyproto && pskb_may_pull(skb, nh + offset + 2 - 
skb->data)) {
+   if (!onlyproto && (nh + offset + 2 < skb->data ||
+   pskb_may_pull(skb, nh + offset + 2 - skb->data))) {
u8 *icmp;
 
nh = skb_network_header(skb);
@@ -188,7 +189,8 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int 
reverse)
 #if IS_ENABLED(CONFIG_IPV6_MIP6)
case IPPROTO_MH:
offset += ipv6_optlen(exthdr);
-   if (!onlyproto && pskb_may_pull(skb, nh + offset + 3 - 
skb->data)) {
+   if (!onlyproto && (nh + offset + 3 < skb->data ||
+   pskb_may_pull(skb, nh + offset + 3 - skb->data))) {
struct ip6_mh *mh;
 
nh = skb_network_header(skb);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 net-next] net: #ifdefify sk_classid member of struct sock

2015-07-19 Thread Mathias Krause
The sk_classid member is only required when CONFIG_CGROUP_NET_CLASSID is
enabled. #ifdefify it to reduce the size of struct sock on 32 bit
systems, at least.

Signed-off-by: Mathias Krause 
---
v2:
- ensure we'll error out in nft_meta_get_init() if CONFIG_CGROUP_NET_CLASSID
  is not set

 include/net/sock.h   |2 ++
 net/netfilter/nft_meta.c |4 
 2 files changed, 6 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 05a8c1aea251..4353ef70bf48 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -429,7 +429,9 @@ struct sock {
void*sk_security;
 #endif
__u32   sk_mark;
+#ifdef CONFIG_CGROUP_NET_CLASSID
u32 sk_classid;
+#endif
struct cg_proto *sk_cgrp;
void(*sk_state_change)(struct sock *sk);
void(*sk_data_ready)(struct sock *sk);
diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index 52561e1c31e2..cb2f13ebb5a6 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -166,11 +166,13 @@ void nft_meta_get_eval(const struct nft_expr *expr,
goto err;
*dest = out->group;
break;
+#ifdef CONFIG_CGROUP_NET_CLASSID
case NFT_META_CGROUP:
if (skb->sk == NULL || !sk_fullsock(skb->sk))
goto err;
*dest = skb->sk->sk_classid;
break;
+#endif
default:
WARN_ON(1);
goto err;
@@ -246,7 +248,9 @@ int nft_meta_get_init(const struct nft_ctx *ctx,
case NFT_META_CPU:
case NFT_META_IIFGROUP:
case NFT_META_OIFGROUP:
+#ifdef CONFIG_CGROUP_NET_CLASSID
case NFT_META_CGROUP:
+#endif
len = sizeof(u32);
break;
case NFT_META_IIFNAME:
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] net: #ifdefify sk_classid member of struct sock

2015-07-19 Thread Mathias Krause
On 19 July 2015 at 20:42, David Miller  wrote:
> From: Mathias Krause 
> Date: Sun, 19 Jul 2015 20:17:41 +0200
>
>> The sk_classid member is only required when CONFIG_CGROUP_NET_CLASSID is
>> enabled. #ifdefify it to reduce the size of struct sock on 32 bit
>> systems, at least.
>>
>> Signed-off-by: Mathias Krause 
>
> Are you sure NFT_META_CGROUP cannot enter nft_meta_get_eval()?  If so it'll
> WARN_ON().

I just tried to bring the code in line as it already is for
NFT_META_RTCLASSID and NFT_META_SECMARK. But, obviously, I messed it
up by missing the nft_meta_get_init() function. Sorry for that!

The init function gets called prior to the eval function (see
nft_expr_init() and nf_tables_newrule()). If it returns an error, the
latter won't be called. So the fix is, to add the #ifdef to
nft_meta_get_init() as well.

Another direct caller of  nft_meta_get_eval() is
nft_meta_bridge_get_eval(). However, it also complies to the init /
eval rule by calling nft_meta_get_init() in its init function, so will
error out in the case, too.

I'll send a v2 soon.

Thanks for the review!

> I really don't like changes like this.

Sorry. I should have taken more attention.


Thanks,
Mathias
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next] net: #ifdefify sk_classid member of struct sock

2015-07-19 Thread Mathias Krause
The sk_classid member is only required when CONFIG_CGROUP_NET_CLASSID is
enabled. #ifdefify it to reduce the size of struct sock on 32 bit
systems, at least.

Signed-off-by: Mathias Krause 
---
 include/net/sock.h   |2 ++
 net/netfilter/nft_meta.c |2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 05a8c1aea251..4353ef70bf48 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -429,7 +429,9 @@ struct sock {
void*sk_security;
 #endif
__u32   sk_mark;
+#ifdef CONFIG_CGROUP_NET_CLASSID
u32 sk_classid;
+#endif
struct cg_proto *sk_cgrp;
void(*sk_state_change)(struct sock *sk);
void(*sk_data_ready)(struct sock *sk);
diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index 52561e1c31e2..005b374d1967 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -166,11 +166,13 @@ void nft_meta_get_eval(const struct nft_expr *expr,
goto err;
*dest = out->group;
break;
+#ifdef CONFIG_CGROUP_NET_CLASSID
case NFT_META_CGROUP:
if (skb->sk == NULL || !sk_fullsock(skb->sk))
goto err;
*dest = skb->sk->sk_classid;
break;
+#endif
default:
WARN_ON(1);
goto err;
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html