[PATCH ipsec-next] xfrm: use a dedicated slab cache for struct xfrm_state
struct xfrm_state is rather large (768 bytes here) and therefore wastes quite a lot of memory as it falls into the kmalloc-1024 slab cache, leaving 256 bytes of unused memory per XFRM state object -- a net waste of 25%. Using a dedicated slab cache for struct xfrm_state reduces the level of internal fragmentation to a minimum. On my configuration SLUB chooses to create a slab cache covering 4 pages holding 21 objects, resulting in an average memory waste of ~13 bytes per object -- a net waste of only 1.6%. In my tests this led to memory savings of roughly 2.3MB for 10k XFRM states. Signed-off-by: Mathias Krause --- net/xfrm/xfrm_state.c |9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index f9d2f2233f09..73db0ea8692a 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -42,6 +42,7 @@ static unsigned int xfrm_state_hashmax __read_mostly = 1 * 1024 * 1024; static __read_mostly seqcount_t xfrm_state_hash_generation = SEQCNT_ZERO(xfrm_state_hash_generation); +static struct kmem_cache *xfrm_state_cache __ro_after_init; static DECLARE_WORK(xfrm_state_gc_work, xfrm_state_gc_task); static HLIST_HEAD(xfrm_state_gc_list); @@ -451,7 +452,7 @@ static void xfrm_state_gc_destroy(struct xfrm_state *x) } xfrm_dev_state_free(x); security_xfrm_state_free(x); - kfree(x); + kmem_cache_free(xfrm_state_cache, x); } static void xfrm_state_gc_task(struct work_struct *work) @@ -563,7 +564,7 @@ struct xfrm_state *xfrm_state_alloc(struct net *net) { struct xfrm_state *x; - x = kzalloc(sizeof(struct xfrm_state), GFP_ATOMIC); + x = kmem_cache_alloc(xfrm_state_cache, GFP_ATOMIC | __GFP_ZERO); if (x) { write_pnet(&x->xs_net, net); @@ -2307,6 +2308,10 @@ int __net_init xfrm_state_init(struct net *net) { unsigned int sz; + if (net_eq(net, &init_net)) + xfrm_state_cache = KMEM_CACHE(xfrm_state, + SLAB_HWCACHE_ALIGN | SLAB_PANIC); + INIT_LIST_HEAD(&net->xfrm.state_all); sz = sizeof(struct hlist_head) * 8; -- 1.7.10.4
Re: [PATCH] net: ipv6: xfrm6_state: remove VLA usage
On 9 March 2018 at 13:21, Andreas Christoforou wrote: > The kernel would like to have all stack VLA usage removed[1]. > > Signed-off-by: Andreas Christoforou > --- > net/ipv6/xfrm6_state.c | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/net/ipv6/xfrm6_state.c b/net/ipv6/xfrm6_state.c > index b15075a..45c0d98 100644 > --- a/net/ipv6/xfrm6_state.c > +++ b/net/ipv6/xfrm6_state.c > @@ -62,7 +62,12 @@ __xfrm6_sort(void **dst, void **src, int n, int > (*cmp)(void *p), int maxclass) > { > int i; > int class[XFRM_MAX_DEPTH]; > - int count[maxclass]; > + int *count; > + > + count = kcalloc(maxclass + 1, sizeof(*count), GFP_KERNEL); > + > + if (!count) > + return -ENOMEM; > > memset(count, 0, sizeof(count)); > > @@ -80,6 +85,7 @@ __xfrm6_sort(void **dst, void **src, int n, int (*cmp)(void > *p), int maxclass) > src[i] = NULL; > } > > + kfree(count); > return 0; > } Instead of dynamically allocating and freeing memory here, shouldn't we just get rid of the maxclass parameter and use XFRM_MAX_DEPTH as size for the count[] array, too? Cheers, Mathias
Re: [PATCH net 0/4] xfrm_user info leaks
On 26 August 2017 at 17:58, Joe Perches wrote: > On Sat, 2017-08-26 at 17:08 +0200, Mathias Krause wrote: >> Hi David, Steffen, >> >> the following series fixes a few info leaks due to missing padding byte >> initialization in the xfrm_user netlink interface. > > Were these found by inspection or by some tool? > If by tool, perhaps there are other _to_user cases? I found the one in the offload API by manual inspection, looked around a little and found the others. No tool involved. I already looked at the xfrm_user API back in 2012 and fixed quite a few info leaks but missed the ones in the netlink multicast notification code :/ Regards, Mathias
[PATCH net 3/4] xfrm_user: fix info leak in build_expire()
The memory reserved to dump the expired xfrm state includes padding bytes in struct xfrm_user_expire added by the compiler for alignment. To prevent the heap info leak, memset(0) the remainder of the struct. Initializing the whole structure isn't needed as copy_to_user_state() already takes care of clearing the padding bytes within the 'state' member. Signed-off-by: Mathias Krause --- net/xfrm/xfrm_user.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index c33516ef52f2..2cbdc81610c6 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -2578,6 +2578,8 @@ static int build_expire(struct sk_buff *skb, struct xfrm_state *x, const struct ue = nlmsg_data(nlh); copy_to_user_state(x, &ue->state); ue->hard = (c->data.hard != 0) ? 1 : 0; + /* clear the padding bytes */ + memset(&ue->hard + 1, 0, sizeof(*ue) - offsetofend(typeof(*ue), hard)); err = xfrm_mark_put(skb, &x->mark); if (err) -- 1.7.10.4
[PATCH net 2/4] xfrm_user: fix info leak in xfrm_notify_sa()
The memory reserved to dump the ID of the xfrm state includes a padding byte in struct xfrm_usersa_id added by the compiler for alignment. To prevent the heap info leak, memset(0) the whole struct before filling it. Cc: Herbert Xu Fixes: 0603eac0d6b7 ("[IPSEC]: Add XFRMA_SA/XFRMA_POLICY for delete notification") Signed-off-by: Mathias Krause --- net/xfrm/xfrm_user.c |1 + 1 file changed, 1 insertion(+) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index 3259555ae7d7..c33516ef52f2 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -2715,6 +2715,7 @@ static int xfrm_notify_sa(struct xfrm_state *x, const struct km_event *c) struct nlattr *attr; id = nlmsg_data(nlh); + memset(id, 0, sizeof(*id)); memcpy(&id->daddr, &x->id.daddr, sizeof(id->daddr)); id->spi = x->id.spi; id->family = x->props.family; -- 1.7.10.4
[PATCH net 1/4] xfrm_user: fix info leak in copy_user_offload()
The memory reserved to dump the xfrm offload state includes padding bytes of struct xfrm_user_offload added by the compiler for alignment. Add an explicit memset(0) before filling the buffer to avoid the heap info leak. Cc: Steffen Klassert Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Mathias Krause --- net/xfrm/xfrm_user.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index 2be4c6af008a..3259555ae7d7 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -796,7 +796,7 @@ static int copy_user_offload(struct xfrm_state_offload *xso, struct sk_buff *skb return -EMSGSIZE; xuo = nla_data(attr); - + memset(xuo, 0, sizeof(*xuo)); xuo->ifindex = xso->dev->ifindex; xuo->flags = xso->flags; -- 1.7.10.4
[PATCH net 4/4] xfrm_user: fix info leak in build_aevent()
The memory reserved to dump the ID of the xfrm state includes a padding byte in struct xfrm_usersa_id added by the compiler for alignment. To prevent the heap info leak, memset(0) the sa_id before filling it. Cc: Jamal Hadi Salim Fixes: d51d081d6504 ("[IPSEC]: Sync series - user") Signed-off-by: Mathias Krause --- net/xfrm/xfrm_user.c |1 + 1 file changed, 1 insertion(+) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index 2cbdc81610c6..9391ced05259 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -1869,6 +1869,7 @@ static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, const struct return -EMSGSIZE; id = nlmsg_data(nlh); + memset(&id->sa_id, 0, sizeof(id->sa_id)); memcpy(&id->sa_id.daddr, &x->id.daddr, sizeof(x->id.daddr)); id->sa_id.spi = x->id.spi; id->sa_id.family = x->props.family; -- 1.7.10.4
[PATCH net 0/4] xfrm_user info leaks
Hi David, Steffen, the following series fixes a few info leaks due to missing padding byte initialization in the xfrm_user netlink interface. Please apply! Mathias Krause (4): xfrm_user: fix info leak in copy_user_offload() xfrm_user: fix info leak in xfrm_notify_sa() xfrm_user: fix info leak in build_expire() xfrm_user: fix info leak in build_aevent() net/xfrm/xfrm_user.c |6 +- 1 file changed, 5 insertions(+), 1 deletion(-) -- 1.7.10.4
[PATCH net] rtnl: stats - add missing netlink message size checks
We miss to check if the netlink message is actually big enough to contain a struct if_stats_msg. Add a check to prevent userland from sending us short messages that would make us access memory beyond the end of the message. Fixes: 10c9ead9f3c6 ("rtnetlink: add new RTM_GETSTATS message to dump...") Signed-off-by: Mathias Krause Cc: Roopa Prabhu --- net/core/rtnetlink.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 18b5aae99bec..75e3ea7bda08 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3898,6 +3898,9 @@ static int rtnl_stats_get(struct sk_buff *skb, struct nlmsghdr *nlh) u32 filter_mask; int err; + if (nlmsg_len(nlh) < sizeof(*ifsm)) + return -EINVAL; + ifsm = nlmsg_data(nlh); if (ifsm->ifindex > 0) dev = __dev_get_by_index(net, ifsm->ifindex); @@ -3947,6 +3950,9 @@ static int rtnl_stats_dump(struct sk_buff *skb, struct netlink_callback *cb) cb->seq = net->dev_base_seq; + if (nlmsg_len(cb->nlh) < sizeof(*ifsm)) + return -EINVAL; + ifsm = nlmsg_data(cb->nlh); filter_mask = ifsm->filter_mask; if (!filter_mask) -- 1.7.10.4
[PATCH] rtnl: reset calcit fptr in rtnl_unregister()
To avoid having dangling function pointers left behind, reset calcit in rtnl_unregister(), too. This is no issue so far, as only the rtnl core registers a netlink handler with a calcit hook which won't be unregistered, but may become one if new code makes use of the calcit hook. Fixes: c7ac8679bec9 ("rtnetlink: Compute and store minimum ifinfo...") Cc: Jeff Kirsher Cc: Greg Rose Signed-off-by: Mathias Krause --- net/core/rtnetlink.c |1 + 1 file changed, 1 insertion(+) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 189cc78c77eb..d4c601604bf7 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -275,6 +275,7 @@ int rtnl_unregister(int protocol, int msgtype) rtnl_msg_handlers[protocol][msgindex].doit = NULL; rtnl_msg_handlers[protocol][msgindex].dumpit = NULL; + rtnl_msg_handlers[protocol][msgindex].calcit = NULL; return 0; } -- 1.7.10.4
[PATCH] xfrm_user: propagate sec ctx allocation errors
When we fail to attach the security context in xfrm_state_construct() we'll return 0 as error value which, in turn, will wrongly claim success to userland when, in fact, we won't be adding / updating the XFRM state. This is a regression introduced by commit fd21150a0fe1 ("[XFRM] netlink: Inline attach_encap_tmpl(), attach_sec_ctx(), and attach_one_addr()"). Fix it by propagating the error returned by security_xfrm_state_alloc() in this case. Fixes: fd21150a0fe1 ("[XFRM] netlink: Inline attach_encap_tmpl()...") Signed-off-by: Mathias Krause Cc: Thomas Graf --- net/xfrm/xfrm_user.c |9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index d516845e16e3..b08440a8f123 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -581,9 +581,12 @@ static struct xfrm_state *xfrm_state_construct(struct net *net, if (err) goto error; - if (attrs[XFRMA_SEC_CTX] && - security_xfrm_state_alloc(x, nla_data(attrs[XFRMA_SEC_CTX]))) - goto error; + if (attrs[XFRMA_SEC_CTX]) { + err = security_xfrm_state_alloc(x, + nla_data(attrs[XFRMA_SEC_CTX])); + if (err) + goto error; + } if ((err = xfrm_alloc_replay_state_esn(&x->replay_esn, &x->preplay_esn, attrs[XFRMA_REPLAY_ESN_VAL]))) -- 1.7.10.4
[PATCH net] packet: fix heap info leak in PACKET_DIAG_MCLIST sock_diag interface
Because we miss to wipe the remainder of i->addr[] in packet_mc_add(), pdiag_put_mclist() leaks uninitialized heap bytes via the PACKET_DIAG_MCLIST netlink attribute. Fix this by explicitly memset(0)ing the remaining bytes in i->addr[]. Fixes: eea68e2f1a00 ("packet: Report socket mclist info via diag module") Signed-off-by: Mathias Krause Cc: Eric W. Biederman Cc: Pavel Emelyanov --- The bug itself precedes commit eea68e2f1a00 but the list wasn't exposed to userland before the introduction of the packet_diag interface. Therefore the "Fixes:" line on that commit. net/packet/af_packet.c |1 + 1 file changed, 1 insertion(+) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 992396aa635c..86a408cf38d5 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -3441,6 +3441,7 @@ static int packet_mc_add(struct sock *sk, struct packet_mreq_max *mreq) i->ifindex = mreq->mr_ifindex; i->alen = mreq->mr_alen; memcpy(i->addr, mreq->mr_address, i->alen); + memset(i->addr + i->alen, 0, sizeof(i->addr) - i->alen); i->count = 1; i->next = po->mclist; po->mclist = i; -- 1.7.10.4
Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()
t; + POLLOUT | POLLWRNORM | POLLWRBAND); > return 0; > > out_unlock: > @@ -1194,6 +1216,8 @@ restart: > > sock_hold(sk); > unix_peer(newsk)= sk; > + if (sk->sk_type == SOCK_SEQPACKET) > + add_wait_queue(&unix_sk(sk)->peer_wait, > &unix_sk(newsk)->wait); > newsk->sk_state = TCP_ESTABLISHED; > newsk->sk_type = sk->sk_type; > init_peercred(newsk); > @@ -1220,6 +1244,8 @@ restart: > > smp_mb__after_atomic(); /* sock_hold() does an atomic_inc() */ > unix_peer(sk) = newsk; > + if (sk->sk_type == SOCK_SEQPACKET) > + add_wait_queue(&unix_sk(newsk)->peer_wait, > &unix_sk(sk)->wait); > > unix_state_unlock(sk); > > @@ -1254,6 +1280,10 @@ static int unix_socketpair(struct socket *socka, > struct socket *sockb) > sock_hold(skb); > unix_peer(ska) = skb; > unix_peer(skb) = ska; > + if (ska->sk_type != SOCK_STREAM) { > + add_wait_queue(&unix_sk(ska)->peer_wait, &unix_sk(skb)->wait); > + add_wait_queue(&unix_sk(skb)->peer_wait, &unix_sk(ska)->wait); > + } > init_peercred(ska); > init_peercred(skb); > > @@ -1565,6 +1595,7 @@ restart: > unix_state_lock(sk); > if (unix_peer(sk) == other) { > unix_peer(sk) = NULL; > + remove_wait_queue(&unix_sk(other)->peer_wait, > &u->wait); > unix_state_unlock(sk); > > unix_dgram_disconnected(sk, other); > @@ -2441,7 +2472,6 @@ static unsigned int unix_dgram_poll(struct file *file, > struct socket *sock, > other = unix_peer_get(sk); > if (other) { > if (unix_peer(other) != sk) { > - sock_poll_wait(file, &unix_sk(other)->peer_wait, > wait); > if (unix_recvq_full(other)) > writable = 0; > } > -- > 1.8.2.rc2 > My reproducer runs on this patch for more than 3 days now without triggering anything anymore. Tested-by: Mathias Krause Thanks Jason! Regards, Mathias -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket
On 30 September 2015 at 15:25, Rainer Weikusat wrote: > Mathias Krause writes: >> On 30 September 2015 at 12:56, Rainer Weikusat >> wrote: >>> In case you want some information on this: This is a kernel warning I >>> could trigger (more than once) on the single day I could so far spend >>> looking into this (3.2.54 kernel): >>> >>> Sep 15 19:37:19 doppelsaurus kernel: WARNING: at lib/list_debug.c:53 >>> list_del+0x9/0x30() >>> Sep 15 19:37:19 doppelsaurus kernel: Hardware name: 500-330nam >>> Sep 15 19:37:19 doppelsaurus kernel: list_del corruption. prev->next should >>> be 88022c38f078, but was dead00100100 >>> [snip] >> >> Is that with Jason's patch or a vanilla v3.2.54? > > That's a kernel warning which occurred repeatedly (among other "link > pointer disorganization" warnings) when I tested the "program with > unknown behaviour" you wrote with the kernel I'm currently supporting a > while ago (as I already wrote in the original mail). So I assume Jason's patch is not included in your kernel. Then those messages are expected; expected even on kernels as old as v2.6.27. Can you re-try with Jason's patch applied? Thanks, Mathias -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket
On 30 September 2015 at 12:56, Rainer Weikusat wrote: > Mathias Krause writes: >> On 29 September 2015 at 21:09, Jason Baron wrote: >>> However, if we call connect on socket 's', to connect to a new socket 'o2', >>> we >>> drop the reference on the original socket 'o'. Thus, we can now close socket >>> 'o' without unregistering from epoll. Then, when we either close the ep >>> or unregister 'o', we end up with this list corruption. Thus, this is not a >>> race per se, but can be triggered sequentially. >> >> Sounds profound, but the reproducers calls connect only once per >> socket. So there is no "connect to a new socket", no? >> But w/e, see below. > > In case you want some information on this: This is a kernel warning I > could trigger (more than once) on the single day I could so far spend > looking into this (3.2.54 kernel): > > Sep 15 19:37:19 doppelsaurus kernel: WARNING: at lib/list_debug.c:53 > list_del+0x9/0x30() > Sep 15 19:37:19 doppelsaurus kernel: Hardware name: 500-330nam > Sep 15 19:37:19 doppelsaurus kernel: list_del corruption. prev->next should > be 88022c38f078, but was dead00100100 > [snip] Is that with Jason's patch or a vanilla v3.2.54? Regards, Mathias -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket
On 29 September 2015 at 21:09, Jason Baron wrote: > However, if we call connect on socket 's', to connect to a new socket 'o2', we > drop the reference on the original socket 'o'. Thus, we can now close socket > 'o' without unregistering from epoll. Then, when we either close the ep > or unregister 'o', we end up with this list corruption. Thus, this is not a > race per se, but can be triggered sequentially. Sounds profound, but the reproducers calls connect only once per socket. So there is no "connect to a new socket", no? But w/e, see below. > Linus explains the general case in the context the signalfd stuff here: > https://lkml.org/lkml/2013/10/14/634 I also found that posting while looking for similar bug reports. Also found that one: https://lkml.org/lkml/2014/5/15/532 > So this may be the case that we've been chasing here for a while... That bug triggers since commit 3c73419c09 "af_unix: fix 'poll for write'/ connected DGRAM sockets". That's v2.6.26-rc7, as noted in the reproducer. > > In any case, we could fix with that same POLLFREE mechansim, the simplest > would be something like: > > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c > index 03ee4d3..d499f81 100644 > --- a/net/unix/af_unix.c > +++ b/net/unix/af_unix.c > @@ -392,6 +392,9 @@ static void unix_sock_destructor(struct sock *sk) > pr_debug("UNIX %p is destroyed, %ld are still alive.\n", sk, > atomic_long_read(&unix_nr_socks)); > #endif > + /* make sure we remove from epoll */ > + wake_up_poll(&u->peer_wait, POLLFREE); > + synchronize_sched(); > } > > static void unix_release_sock(struct sock *sk, int embrion) > > I'm not suggesting we apply that, but that fixed the supplied test case. > We could enhance the above, to avoid the free'ing latency there by doing > the SLAB_DESTROY_BY_RCU for unix sockets. But I'm not convinced > that this wouldn't be still broken for select()/poll() as well. I think > we can be in a select() call for socket 's', and if we remove socket > 'o' from it in the meantime (by doing a connect() on s to somewhere else > and a close on 'o'), I think we can still crash there. So POLLFREE would > have to be extended. I tried to hit this with select() but could not, > but I think if I tried harder I could. > > Instead of going further down that route, perhaps something like below > might be better. The basic idea would be to do away with the 'other' > poll call in unix_dgram_poll(), and instead revert back to a registering > on a single wait queue. We add a new wait queue to unix sockets such > that we can register it with a remote other on connect(). Then we can > use the wakeup from the remote to wake up the registered unix socket. > Probably better explained with the patch below. Note I didn't add to > the remote for SOCK_STREAM, since the poll() routine there doesn't do > the double wait queue registering: > > diff --git a/include/net/af_unix.h b/include/net/af_unix.h > index 4a167b3..9698aff 100644 > --- a/include/net/af_unix.h > +++ b/include/net/af_unix.h > @@ -62,6 +62,7 @@ struct unix_sock { > #define UNIX_GC_CANDIDATE 0 > #define UNIX_GC_MAYBE_CYCLE1 > struct socket_wqpeer_wq; > + wait_queue_twait; > }; > #define unix_sk(__sk) ((struct unix_sock *)__sk) > > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c > index 03ee4d3..9e0692a 100644 > --- a/net/unix/af_unix.c > +++ b/net/unix/af_unix.c > @@ -420,6 +420,8 @@ static void unix_release_sock(struct sock *sk, int > embrion) > skpair = unix_peer(sk); > > if (skpair != NULL) { > + if (sk->sk_type != SOCK_STREAM) > + remove_wait_queue(&unix_sk(skpair)->peer_wait, > &u->wait); > if (sk->sk_type == SOCK_STREAM || sk->sk_type == > SOCK_SEQPACKET) { > unix_state_lock(skpair); > /* No more writes */ > @@ -636,6 +638,16 @@ static struct proto unix_proto = { > */ > static struct lock_class_key af_unix_sk_receive_queue_lock_key; > > +static int peer_wake(wait_queue_t *wait, unsigned mode, int sync, void *key) > +{ > + struct unix_sock *u; > + > + u = container_of(wait, struct unix_sock, wait); > + wake_up_interruptible_sync_poll(sk_sleep(&u->sk), key); > + > + return 0; > +} > + > static struct sock *unix_create1(struct net *net, struct socket *sock, int > kern) > { > struct sock *sk = NULL; > @@ -664,6 +676,7 @@ static struct sock *unix_create1(struct net *net, struct > socket *sock, int kern) > INIT_LIST_HEAD(&u->link); > mutex_init(&u->readlock); /* single task reading lock */ > init_waitqueue_head(&u->peer_wait); > + init_waitqueue_func_entry(&u->wait, peer_wake); > unix_insert_socket(unix_sockets_unbound(sk), sk); > out: > if (sk == NULL) > @@ -1030,7 +1043,10 @@ restart: > */ > if (unix_peer(sk)) { > struct sock *old_peer = unix
Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket
On 14 September 2015 at 04:39, Eric Wong wrote: > +cc Jason Baron since he might be able to provide more insight into > epoll. > > Mathias Krause wrote: >> Hi, >> >> this is an attempt to resurrect the thread initially started here: >> >> http://thread.gmane.org/gmane.linux.network/353003 >> >> As that patch fixed the issue for the mentioned reproducer, it did not >> fix the bug for the production code Olivier is using. :( >> >> Changing the reproducer only slightly allows me to trigger the following >> list debug splat (CONFIG_DEBUG_LIST=y) reliable within seconds -- even >> with the above linked patch applied: >> >> [ 50.264249] [ cut here ] >> [ 50.264249] WARNING: CPU: 0 PID: 214 at lib/list_debug.c:59 >> __list_del_entry+0xa4/0xd0() >> [ 50.264249] list_del corruption. prev->next should be 88003c2c1bb8, >> but was 88003f07bbb8 >> [ 50.264249] Modules linked in: ipv6 pcspkr serio_raw microcode virtio_net >> virtio_pci virtio_ring virtio sr_mod cdrom >> [ 50.264249] CPU: 0 PID: 214 Comm: epoll_bug Not tainted 4.2.0 #75 >> [ 50.264249] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >> 1.7.5-20140531_083030-gandalf 04/01/2014 >> [ 50.264249] 817e902e 88087d08 8155593c >> 0007 >> [ 50.264249] 88087d58 88087d48 8105202a >> 0001 >> [ 50.264249] 88003c2c1bb8 88003f07bb80 0286 >> 88003f736640 >> [ 50.264249] Call Trace: >> [ 50.264249] [] dump_stack+0x4c/0x65 >> [ 50.264249] [] warn_slowpath_common+0x8a/0xc0 >> [ 50.264249] [] warn_slowpath_fmt+0x46/0x50 >> [ 50.264249] [] __list_del_entry+0xa4/0xd0 >> [ 50.264249] [] list_del+0x11/0x40 >> [ 50.264249] [] remove_wait_queue+0x29/0x40 >> [ 50.264249] [] ep_unregister_pollwait.isra.6+0x58/0x1a0 >> [ 50.264249] [] ? >> ep_unregister_pollwait.isra.6+0xa9/0x1a0 >> [ 50.264249] [] ep_remove+0x22/0x110 >> [ 50.264249] [] SyS_epoll_ctl+0x62b/0xf70 >> [ 50.264249] [] ? lockdep_sys_exit_thunk+0x12/0x14 >> [ 50.264249] [] entry_SYSCALL_64_fastpath+0x12/0x6f >> [ 50.264249] ---[ end trace d9af9b915df9667e ]--- >> [ 50.572100] [ cut here ] >> [ 50.572100] WARNING: CPU: 1 PID: 212 at lib/list_debug.c:62 >> __list_del_entry+0xc3/0xd0() >> [ 50.584263] list_del corruption. next->prev should be 88003f664c90, >> but was 88003f0cb5b8 >> [ 50.584263] Modules linked in: ipv6 pcspkr serio_raw microcode virtio_net >> virtio_pci virtio_ring virtio sr_mod cdrom >> [ 50.584263] CPU: 1 PID: 212 Comm: epoll_bug Tainted: GW >> 4.2.0 #75 >> [ 50.584263] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >> 1.7.5-20140531_083030-gandalf 04/01/2014 >> [ 50.584263] 817e902e 88003d37fce8 8155593c >> 0007 >> [ 50.584263] 88003d37fd38 88003d37fd28 8105202a >> 0001 >> [ 50.584263] 88003f664c90 88003f0cb580 0282 >> 88003f731740 >> [ 50.584263] Call Trace: >> [ 50.584263] [] dump_stack+0x4c/0x65 >> [ 50.584263] [] warn_slowpath_common+0x8a/0xc0 >> [ 50.584263] [] warn_slowpath_fmt+0x46/0x50 >> [ 50.584263] [] __list_del_entry+0xc3/0xd0 >> [ 50.584263] [] list_del+0x11/0x40 >> [ 50.584263] [] remove_wait_queue+0x29/0x40 >> [ 50.584263] [] ep_unregister_pollwait.isra.6+0x58/0x1a0 >> [ 50.584263] [] ? >> ep_unregister_pollwait.isra.6+0xa9/0x1a0 >> [ 50.584263] [] ep_remove+0x22/0x110 >> [ 50.584263] [] eventpoll_release_file+0x62/0xa0 >> [ 50.584263] [] __fput+0x1af/0x200 >> [ 50.584263] [] ? int_very_careful+0x5/0x3f >> [ 50.584263] [] fput+0xe/0x10 >> [ 50.584263] [] task_work_run+0x8d/0xc0 >> [ 50.584263] [] do_notify_resume+0x4f/0x60 >> [ 50.584263] [] int_signal+0x12/0x17 >> [ 50.584263] ---[ end trace d9af9b915df9667f ]--- >> [ 50.584263] BUG: spinlock already unlocked on CPU#1, epoll_bug/212 >> [ 50.584263] lock: 0x88003f0cb580, .magic: dead4ead, .owner: >> /-1, .owner_cpu: -1 >> [ 50.584263] CPU: 1 PID: 212 Comm: epoll_bug Tainted: GW >> 4.2.0 #75 >> [ 50.584263] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >> 1.7.5-20140531_083030-gandalf 04/01/2014 >> [ 50.584263] 88003f0cb580 88003d37fd38 8155593c >> 0007 >> [ 50.584263] f
Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket
On Tue, Sep 15, 2015 at 06:07:05PM +0100, Rainer Weikusat wrote: > --- a/net/unix/af_unix.c > +++ b/net/unix/af_unix.c > -2233,10 +2233,14 static unsigned int > unix_dgram_poll(struct file *file, struct socket *sock, > writable = unix_writable(sk); > other = unix_peer_get(sk); > if (other) { > - if (unix_peer(other) != sk) { > + unix_state_lock(other); > + if (!sock_flag(other, SOCK_DEAD) && unix_peer(other) != sk) { > + unix_state_unlock(other); > sock_poll_wait(file, &unix_sk(other)->peer_wait, wait); > if (unix_recvq_full(other)) > writable = 0; > + } else { > + unix_state_unlock(other); > } > sock_put(other); > } > > That's obviously not going to help you when 'racing with > unix_release_sock' as the socket might be released immediately after the > unix_state_unlock, ie, before sock_poll_wait is called. Provided I > understand this correctly, the problem is that the socket reference > count may have become 1 by the time sock_put is called but the > sock_poll_wait has created a new reference to it which isn't accounted > for. > > A simple way to fix that could be to do something like > > unix_state_lock(other); > if (!sock_flag(other, SOCK_DEAD)) sock_poll_wait(...) > unix_state_unlock(other); Sorry, but that does not fix the bug. I get the same GPF as before. > This would imply that unix_release_sock either marked the socket as dead > before the sock_poll_wait was executed or that the wake_up_interruptible > call in there will run after ->peer_wait was used (and it will thus > 'unpollwait' it again). It must be something else as I also tried the following patch, which moves the wake_up_interruptible_all() call into the locked code section: diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 03ee4d359f6a..58570a7680ce 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -360,10 +360,12 @@ static void unix_dgram_disconnected(struct sock *sk, struct sock *other) * we signal error. Messages are lost. Do not make this, * when peer was not connected to us. */ + unix_state_lock(other); if (!sock_flag(other, SOCK_DEAD) && unix_peer(other) == sk) { other->sk_err = ECONNRESET; other->sk_error_report(other); } + unix_state_unlock(other); } } @@ -413,11 +415,13 @@ static void unix_release_sock(struct sock *sk, int embrion) u->path.mnt = NULL; state = sk->sk_state; sk->sk_state = TCP_CLOSE; - unix_state_unlock(sk); - - wake_up_interruptible_all(&u->peer_wait); skpair = unix_peer(sk); + unix_peer(sk) = NULL; + + wake_up_interruptible_all(&u->peer_wait); + + unix_state_unlock(sk); if (skpair != NULL) { if (sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) { @@ -431,7 +435,6 @@ static void unix_release_sock(struct sock *sk, int embrion) sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP); } sock_put(skpair); /* It may now die */ - unix_peer(sk) = NULL; } /* Try to flush out this socket. Throw out buffers at least */ @@ -2440,10 +2443,18 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock, writable = unix_writable(sk); other = unix_peer_get(sk); if (other) { - if (unix_peer(other) != sk) { + unix_state_lock(other); + + if (sock_flag(other, SOCK_DEAD) || (other->sk_shutdown & RCV_SHUTDOWN)) { + unix_state_unlock(other); + writable = 0; + } else if (unix_peer(other) != sk) { sock_poll_wait(file, &unix_sk(other)->peer_wait, wait); + unix_state_unlock(other); if (unix_recvq_full(other)) writable = 0; + } else { + unix_state_unlock(other); } sock_put(other); } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket
Hi, this is an attempt to resurrect the thread initially started here: http://thread.gmane.org/gmane.linux.network/353003 As that patch fixed the issue for the mentioned reproducer, it did not fix the bug for the production code Olivier is using. :( Changing the reproducer only slightly allows me to trigger the following list debug splat (CONFIG_DEBUG_LIST=y) reliable within seconds -- even with the above linked patch applied: [ 50.264249] [ cut here ] [ 50.264249] WARNING: CPU: 0 PID: 214 at lib/list_debug.c:59 __list_del_entry+0xa4/0xd0() [ 50.264249] list_del corruption. prev->next should be 88003c2c1bb8, but was 88003f07bbb8 [ 50.264249] Modules linked in: ipv6 pcspkr serio_raw microcode virtio_net virtio_pci virtio_ring virtio sr_mod cdrom [ 50.264249] CPU: 0 PID: 214 Comm: epoll_bug Not tainted 4.2.0 #75 [ 50.264249] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 50.264249] 817e902e 88087d08 8155593c 0007 [ 50.264249] 88087d58 88087d48 8105202a 0001 [ 50.264249] 88003c2c1bb8 88003f07bb80 0286 88003f736640 [ 50.264249] Call Trace: [ 50.264249] [] dump_stack+0x4c/0x65 [ 50.264249] [] warn_slowpath_common+0x8a/0xc0 [ 50.264249] [] warn_slowpath_fmt+0x46/0x50 [ 50.264249] [] __list_del_entry+0xa4/0xd0 [ 50.264249] [] list_del+0x11/0x40 [ 50.264249] [] remove_wait_queue+0x29/0x40 [ 50.264249] [] ep_unregister_pollwait.isra.6+0x58/0x1a0 [ 50.264249] [] ? ep_unregister_pollwait.isra.6+0xa9/0x1a0 [ 50.264249] [] ep_remove+0x22/0x110 [ 50.264249] [] SyS_epoll_ctl+0x62b/0xf70 [ 50.264249] [] ? lockdep_sys_exit_thunk+0x12/0x14 [ 50.264249] [] entry_SYSCALL_64_fastpath+0x12/0x6f [ 50.264249] ---[ end trace d9af9b915df9667e ]--- [ 50.572100] [ cut here ] [ 50.572100] WARNING: CPU: 1 PID: 212 at lib/list_debug.c:62 __list_del_entry+0xc3/0xd0() [ 50.584263] list_del corruption. next->prev should be 88003f664c90, but was 88003f0cb5b8 [ 50.584263] Modules linked in: ipv6 pcspkr serio_raw microcode virtio_net virtio_pci virtio_ring virtio sr_mod cdrom [ 50.584263] CPU: 1 PID: 212 Comm: epoll_bug Tainted: GW 4.2.0 #75 [ 50.584263] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 50.584263] 817e902e 88003d37fce8 8155593c 0007 [ 50.584263] 88003d37fd38 88003d37fd28 8105202a 0001 [ 50.584263] 88003f664c90 88003f0cb580 0282 88003f731740 [ 50.584263] Call Trace: [ 50.584263] [] dump_stack+0x4c/0x65 [ 50.584263] [] warn_slowpath_common+0x8a/0xc0 [ 50.584263] [] warn_slowpath_fmt+0x46/0x50 [ 50.584263] [] __list_del_entry+0xc3/0xd0 [ 50.584263] [] list_del+0x11/0x40 [ 50.584263] [] remove_wait_queue+0x29/0x40 [ 50.584263] [] ep_unregister_pollwait.isra.6+0x58/0x1a0 [ 50.584263] [] ? ep_unregister_pollwait.isra.6+0xa9/0x1a0 [ 50.584263] [] ep_remove+0x22/0x110 [ 50.584263] [] eventpoll_release_file+0x62/0xa0 [ 50.584263] [] __fput+0x1af/0x200 [ 50.584263] [] ? int_very_careful+0x5/0x3f [ 50.584263] [] fput+0xe/0x10 [ 50.584263] [] task_work_run+0x8d/0xc0 [ 50.584263] [] do_notify_resume+0x4f/0x60 [ 50.584263] [] int_signal+0x12/0x17 [ 50.584263] ---[ end trace d9af9b915df9667f ]--- [ 50.584263] BUG: spinlock already unlocked on CPU#1, epoll_bug/212 [ 50.584263] lock: 0x88003f0cb580, .magic: dead4ead, .owner: /-1, .owner_cpu: -1 [ 50.584263] CPU: 1 PID: 212 Comm: epoll_bug Tainted: GW 4.2.0 #75 [ 50.584263] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 50.584263] 88003f0cb580 88003d37fd38 8155593c 0007 [ 50.584263] 88003d37fd58 810a3375 88003f0cb580 [ 50.584263] 817b9cc8 88003d37fd78 810a33f6 88003f0cb580 [ 50.584263] Call Trace: [ 50.584263] [] dump_stack+0x4c/0x65 [ 50.584263] [] spin_dump+0x85/0xe0 [ 50.584263] [] spin_bug+0x26/0x30 [ 50.584263] [] do_raw_spin_unlock+0x75/0xa0 [ 50.584263] [] _raw_spin_unlock_irqrestore+0x2c/0x50 [ 50.584263] [] remove_wait_queue+0x34/0x40 [ 50.584263] [] ep_unregister_pollwait.isra.6+0x58/0x1a0 [ 50.584263] [] ? ep_unregister_pollwait.isra.6+0xa9/0x1a0 [ 50.584263] [] ep_remove+0x22/0x110 [ 50.584263] [] eventpoll_release_file+0x62/0xa0 [ 50.584263] [] __fput+0x1af/0x200 [ 50.584263] [] ? int_very_careful+0x5/0x3f [ 50.584263] [] fput+0xe/0x10 [ 50.584263] [] task_work_run+0x8d/0xc0 [ 50.584263] [] do_notify_resume+0x4f/0x60 [ 50.584263] [] int_signal+0x12/0x17 [...] That 'spinlock already unlocked' message is also interesting. But even b
[PATCH] xfrm6: Fix ICMPv6 and MH header checks in _decode_session6
From: Mathias Krause Ensure there's enough data left prior calling pskb_may_pull(). If skb->data was already advanced, we'll call pskb_may_pull() with a negative value converted to unsigned int -- leading to a huge positive value. That won't matter in practice as pskb_may_pull() will likely fail in this case, but it leads to underflow reports on kernels handling such kind of over-/underflows, e.g. a PaX enabled kernel instrumented with the size_overflow plugin. Reported-by: satmd Reported-and-tested-by: Marcin Jurkowski Signed-off-by: Mathias Krause Cc: PaX Team --- net/ipv6/xfrm6_policy.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c index ed0583c1b9fc..c988c3f033cf 100644 --- a/net/ipv6/xfrm6_policy.c +++ b/net/ipv6/xfrm6_policy.c @@ -174,7 +174,8 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int reverse) return; case IPPROTO_ICMPV6: - if (!onlyproto && pskb_may_pull(skb, nh + offset + 2 - skb->data)) { + if (!onlyproto && (nh + offset + 2 < skb->data || + pskb_may_pull(skb, nh + offset + 2 - skb->data))) { u8 *icmp; nh = skb_network_header(skb); @@ -188,7 +189,8 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int reverse) #if IS_ENABLED(CONFIG_IPV6_MIP6) case IPPROTO_MH: offset += ipv6_optlen(exthdr); - if (!onlyproto && pskb_may_pull(skb, nh + offset + 3 - skb->data)) { + if (!onlyproto && (nh + offset + 3 < skb->data || + pskb_may_pull(skb, nh + offset + 3 - skb->data))) { struct ip6_mh *mh; nh = skb_network_header(skb); -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 net-next] net: #ifdefify sk_classid member of struct sock
The sk_classid member is only required when CONFIG_CGROUP_NET_CLASSID is enabled. #ifdefify it to reduce the size of struct sock on 32 bit systems, at least. Signed-off-by: Mathias Krause --- v2: - ensure we'll error out in nft_meta_get_init() if CONFIG_CGROUP_NET_CLASSID is not set include/net/sock.h |2 ++ net/netfilter/nft_meta.c |4 2 files changed, 6 insertions(+) diff --git a/include/net/sock.h b/include/net/sock.h index 05a8c1aea251..4353ef70bf48 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -429,7 +429,9 @@ struct sock { void*sk_security; #endif __u32 sk_mark; +#ifdef CONFIG_CGROUP_NET_CLASSID u32 sk_classid; +#endif struct cg_proto *sk_cgrp; void(*sk_state_change)(struct sock *sk); void(*sk_data_ready)(struct sock *sk); diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c index 52561e1c31e2..cb2f13ebb5a6 100644 --- a/net/netfilter/nft_meta.c +++ b/net/netfilter/nft_meta.c @@ -166,11 +166,13 @@ void nft_meta_get_eval(const struct nft_expr *expr, goto err; *dest = out->group; break; +#ifdef CONFIG_CGROUP_NET_CLASSID case NFT_META_CGROUP: if (skb->sk == NULL || !sk_fullsock(skb->sk)) goto err; *dest = skb->sk->sk_classid; break; +#endif default: WARN_ON(1); goto err; @@ -246,7 +248,9 @@ int nft_meta_get_init(const struct nft_ctx *ctx, case NFT_META_CPU: case NFT_META_IIFGROUP: case NFT_META_OIFGROUP: +#ifdef CONFIG_CGROUP_NET_CLASSID case NFT_META_CGROUP: +#endif len = sizeof(u32); break; case NFT_META_IIFNAME: -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] net: #ifdefify sk_classid member of struct sock
On 19 July 2015 at 20:42, David Miller wrote: > From: Mathias Krause > Date: Sun, 19 Jul 2015 20:17:41 +0200 > >> The sk_classid member is only required when CONFIG_CGROUP_NET_CLASSID is >> enabled. #ifdefify it to reduce the size of struct sock on 32 bit >> systems, at least. >> >> Signed-off-by: Mathias Krause > > Are you sure NFT_META_CGROUP cannot enter nft_meta_get_eval()? If so it'll > WARN_ON(). I just tried to bring the code in line as it already is for NFT_META_RTCLASSID and NFT_META_SECMARK. But, obviously, I messed it up by missing the nft_meta_get_init() function. Sorry for that! The init function gets called prior to the eval function (see nft_expr_init() and nf_tables_newrule()). If it returns an error, the latter won't be called. So the fix is, to add the #ifdef to nft_meta_get_init() as well. Another direct caller of nft_meta_get_eval() is nft_meta_bridge_get_eval(). However, it also complies to the init / eval rule by calling nft_meta_get_init() in its init function, so will error out in the case, too. I'll send a v2 soon. Thanks for the review! > I really don't like changes like this. Sorry. I should have taken more attention. Thanks, Mathias -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next] net: #ifdefify sk_classid member of struct sock
The sk_classid member is only required when CONFIG_CGROUP_NET_CLASSID is enabled. #ifdefify it to reduce the size of struct sock on 32 bit systems, at least. Signed-off-by: Mathias Krause --- include/net/sock.h |2 ++ net/netfilter/nft_meta.c |2 ++ 2 files changed, 4 insertions(+) diff --git a/include/net/sock.h b/include/net/sock.h index 05a8c1aea251..4353ef70bf48 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -429,7 +429,9 @@ struct sock { void*sk_security; #endif __u32 sk_mark; +#ifdef CONFIG_CGROUP_NET_CLASSID u32 sk_classid; +#endif struct cg_proto *sk_cgrp; void(*sk_state_change)(struct sock *sk); void(*sk_data_ready)(struct sock *sk); diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c index 52561e1c31e2..005b374d1967 100644 --- a/net/netfilter/nft_meta.c +++ b/net/netfilter/nft_meta.c @@ -166,11 +166,13 @@ void nft_meta_get_eval(const struct nft_expr *expr, goto err; *dest = out->group; break; +#ifdef CONFIG_CGROUP_NET_CLASSID case NFT_META_CGROUP: if (skb->sk == NULL || !sk_fullsock(skb->sk)) goto err; *dest = skb->sk->sk_classid; break; +#endif default: WARN_ON(1); goto err; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html