Re: [PATCH net-next] netns: filter uevents correctly

2018-04-11 Thread Eric W. Biederman
Christian Brauner  writes:

> On Wed, Apr 11, 2018 at 11:40:14AM -0500, Eric W. Biederman wrote:
>> Christian Brauner  writes:
>> > Yeah, agreed.
>> > But I think the patch is not complete. To guarantee that no non-initial
>> > user namespace actually receives uevents we need to:
>> > 1. only send uevents to uevent sockets that are located in network
>> >    namespaces that are owned by init_user_ns
>> > 2. filter uevents that are sent to sockets in mc_list that have opened a
>> >    uevent socket that is owned by init_user_ns *from* a
>> >    non-init_user_ns
>> >
>> > We account for 1. by only recording uevent sockets in the global uevent
>> > socket list that are owned by init_user_ns.
>> > But to account for 2. we need to filter by the user namespace that owns
>> > the socket in mc_list. So in addition to that we also need to slightly
>> > change the filter logic in kobj_bcast_filter() I think:
>> >
>> > diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
>> > index 22a2c1a98b8f..064d7d29ace5 100644
>> > --- a/lib/kobject_uevent.c
>> > +++ b/lib/kobject_uevent.c
>> > @@ -251,7 +251,8 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
>> >                  return sock_ns != ns;
>> >          }
>> >  
>> > -        return 0;
>> > +        /* Check if socket was opened from non-initial user namespace. */
>> > +        return sk_user_ns(dsk) != &init_user_ns;
>> >  }
>> >  #endif
>> >  
>> >
>> > But correct me if I'm wrong.
>> 
>> You are worrying about NETLINK_LISTEN_ALL_NSID sockets.  That has
>> permissions and an explicit opt-in to receiving packets from multiple
>> network namespaces.
>
> I don't think that's what I'm talking about unless that is somehow the
> default for NETLINK_KOBJECT_UEVENT sockets. What I'm worried about is
> doing
>
> unshare -U --map-root
>
> then opening a NETLINK_KOBJECT_UEVENT socket and starting to listen to
> uevents. Imho, this should not be possible because I'm in a
> non-init_user_ns. But currently I'm able to - even with the patch to
> come - since the uevent socket in the kernel was created when init_net
> was created and hence is *owned* by the init_user_ns which means it is
> in the list of uevent sockets. Here's a demo of what I mean:
>
> https://asciinema.org/a/175632

Why do you care about this case?

Everyone is allowed to listen to the uevent netlink sockets with or
without user namespaces.  So there are no permission issues, and
this is not even a data information leak.

If you don't want programs in your user namespace to have access you
will be able to unshare the network namespace.
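
For readers who want to try the listening side discussed here, the following is a minimal C sketch, assuming a Linux system with <linux/netlink.h> available. It opens a NETLINK_KOBJECT_UEVENT socket bound to multicast group 1 (the group the kernel broadcasts uevents to) and adds a small helper for reading KEY=VALUE fields from a received message. The names open_uevent_socket() and uevent_get() are made up for this illustration and are not part of any kernel or udev API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper: open a NETLINK_KOBJECT_UEVENT socket subscribed to
 * the kernel's uevent multicast group (group 1).
 * Returns the file descriptor, or -1 with errno set.
 */
int open_uevent_socket(void)
{
    struct sockaddr_nl addr;
    int fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_KOBJECT_UEVENT);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_groups = 1; /* kernel uevent broadcast group */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: look up KEY in a raw kernel uevent message, which is
 * laid out as "action@devpath\0KEY=VALUE\0KEY=VALUE\0...".
 * Returns a pointer to the NUL-terminated value inside buf, or NULL if the
 * key is absent. A trailing string without a NUL terminator is ignored.
 */
const char *uevent_get(const char *buf, size_t len, const char *key)
{
    size_t klen, off = 0;

    assert(buf != NULL && key != NULL);
    klen = strlen(key);

    while (off < len) {
        const char *s = buf + off;
        const char *end = memchr(s, '\0', len - off);
        size_t slen;

        if (end == NULL)
            break; /* unterminated tail: stop scanning */
        slen = (size_t)(end - s);

        if (slen > klen && s[klen] == '=' && strncmp(s, key, klen) == 0)
            return s + klen + 1;
        off += slen + 1;
    }
    return NULL;
}
```

A caller would recv() into a buffer from the descriptor returned by open_uevent_socket() and pass the buffer and byte count to uevent_get(). The sketch does no filtering of its own; whether any messages arrive depends entirely on the kernel-side rules debated in this thread.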

Eric


Re: [PATCH net-next] netns: filter uevents correctly

2018-04-11 Thread Christian Brauner
On Wed, Apr 11, 2018 at 02:16:23PM -0500, Eric W. Biederman wrote:
> Christian Brauner  writes:
> 
> > On Wed, Apr 11, 2018 at 01:37:18PM -0500, Eric W. Biederman wrote:
> >> Christian Brauner  writes:
> >> 
> >> > On Wed, Apr 11, 2018 at 11:40:14AM -0500, Eric W. Biederman wrote:
> >> >> Christian Brauner  writes:
> >> >> > Yeah, agreed.
> >> >> > But I think the patch is not complete. To guarantee that no non-initial
> >> >> > user namespace actually receives uevents we need to:
> >> >> > 1. only send uevents to uevent sockets that are located in network
> >> >> >    namespaces that are owned by init_user_ns
> >> >> > 2. filter uevents that are sent to sockets in mc_list that have opened a
> >> >> >    uevent socket that is owned by init_user_ns *from* a
> >> >> >    non-init_user_ns
> >> >> >
> >> >> > We account for 1. by only recording uevent sockets in the global uevent
> >> >> > socket list that are owned by init_user_ns.
> >> >> > But to account for 2. we need to filter by the user namespace that owns
> >> >> > the socket in mc_list. So in addition to that we also need to slightly
> >> >> > change the filter logic in kobj_bcast_filter() I think:
> >> >> >
> >> >> > diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> >> >> > index 22a2c1a98b8f..064d7d29ace5 100644
> >> >> > --- a/lib/kobject_uevent.c
> >> >> > +++ b/lib/kobject_uevent.c
> >> >> > @@ -251,7 +251,8 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
> >> >> >                  return sock_ns != ns;
> >> >> >          }
> >> >> >  
> >> >> > -        return 0;
> >> >> > +        /* Check if socket was opened from non-initial user namespace. */
> >> >> > +        return sk_user_ns(dsk) != &init_user_ns;
> >> >> >  }
> >> >> >  #endif
> >> >> >  
> >> >> >
> >> >> > But correct me if I'm wrong.
> >> >> 
> >> >> You are worrying about NETLINK_LISTEN_ALL_NSID sockets.  That has
> >> >> permissions and an explicit opt-in to receiving packets from multiple
> >> >> network namespaces.
> >> >
> >> > I don't think that's what I'm talking about unless that is somehow the
> >> > default for NETLINK_KOBJECT_UEVENT sockets. What I'm worried about is
> >> > doing
> >> >
> >> > unshare -U --map-root
> >> >
> >> > then opening a NETLINK_KOBJECT_UEVENT socket and starting to listen to
> >> > uevents. Imho, this should not be possible because I'm in a
> >> > non-init_user_ns. But currently I'm able to - even with the patch to
> >> > come - since the uevent socket in the kernel was created when init_net
> >> > was created and hence is *owned* by the init_user_ns which means it is
> >> > in the list of uevent sockets. Here's a demo of what I mean:
> >> >
> >> > https://asciinema.org/a/175632
> >> 
> >> Why do you care about this case?
> >
> > It's not so much that I care about this case since any workload that
> > wants to run a separate udevd will have to unshare the network namespace
> > and the user namespace for it to make complete sense.
> > What I do care about is that the two of us are absolutely clear
> > about what semantics we are going to expose to userspace and it seems
> > that we were talking past each other wrt this "corner case".
> > For userspace, it needs to be very clear that the intention is to filter
> > by *owning user namespace of the network namespace a given task resides
> > in* and not by user namespace of the task per se. This is what this
> > corner case basically shows, I think.
> 
> If this is just a clarification of semantics then yes this is a
> productive question.  I almost agree with your definition above.
> 
> I would make the definition very simple.  Uevents will not be broadcast
> via netlink in a network namespace where net->user_ns != &init_user_ns,
> with the exception of uevents for network devices in that network
> namespace.

Well, for the sake of posterity :) I should add that I'd prefer we'd add
what I suggested above:

-   return 0;
+   /* Check if socket was opened from non-initial user namespace. */
+   return sk_user_ns(dsk) != &init_user_ns;
 }

to slam the door shut once and for all for all non-init_user_ns
namespaces because it *seems* like the cleanest solution: uevents are
owned by init_user_ns; period. Because it is the only user namespace
that can do anything interesting with them *by default*.
But what we have right now with my upcoming patch is at least
sufficient and safe.
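
To make the difference between the two rules concrete, here is a small userspace model in C. It is only a sketch: the structs are simplified stand-ins for the kernel types, `struct sock_model`, `filtered_by_netns_owner()` and `filtered_by_opener()` are hypothetical names, and the model covers only uevents for kobjects without a namespace tag. Following the filter convention in the diff, a nonzero (true) result means the uevent is dropped for that socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real definitions. */
struct user_namespace { int id; };
struct net { const struct user_namespace *user_ns; };

/* Model of a listening uevent socket. */
struct sock_model {
    const struct net *net;                       /* netns the socket lives in */
    const struct user_namespace *opener_user_ns; /* user ns it was opened from */
};

/* Stand-in for the kernel's global init_user_ns. */
static const struct user_namespace init_user_ns = { 0 };

/*
 * Rule described for the upcoming patch: drop the uevent when the network
 * namespace of the socket is not owned by init_user_ns.
 */
bool filtered_by_netns_owner(const struct sock_model *sk)
{
    assert(sk != NULL && sk->net != NULL);
    return sk->net->user_ns != &init_user_ns;
}

/*
 * Stricter rule from the suggested change: additionally drop the uevent
 * when the socket was opened from a non-initial user namespace.
 */
bool filtered_by_opener(const struct sock_model *sk)
{
    return filtered_by_netns_owner(sk) ||
           sk->opener_user_ns != &init_user_ns;
}
```

The corner case from this thread is a socket whose `net` is owned by init_user_ns but whose `opener_user_ns` is not (the `unshare -U --map-root` case): the first function lets the uevent through, the second drops it.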

Christian

> 
> The existing filtering by the sending uid and verifying that it is uid 0
> gives a little more room to filter if we want (as udev & friends will
> ignore the uevent), but I don't see the point.
> 
> Eric


Re: [PATCH net-next] netns: filter uevents correctly

2018-04-11 Thread Christian Brauner
On Wed, Apr 11, 2018 at 01:37:18PM -0500, Eric W. Biederman wrote:
> Christian Brauner  writes:
> 
> > On Wed, Apr 11, 2018 at 11:40:14AM -0500, Eric W. Biederman wrote:
> >> Christian Brauner  writes:
> >> > Yeah, agreed.
> >> > But I think the patch is not complete. To guarantee that no non-initial
> >> > user namespace actually receives uevents we need to:
> >> > 1. only send uevents to uevent sockets that are located in network
> >> >    namespaces that are owned by init_user_ns
> >> > 2. filter uevents that are sent to sockets in mc_list that have opened a
> >> >    uevent socket that is owned by init_user_ns *from* a
> >> >    non-init_user_ns
> >> >
> >> > We account for 1. by only recording uevent sockets in the global uevent
> >> > socket list that are owned by init_user_ns.
> >> > But to account for 2. we need to filter by the user namespace that owns
> >> > the socket in mc_list. So in addition to that we also need to slightly
> >> > change the filter logic in kobj_bcast_filter() I think:
> >> >
> >> > diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> >> > index 22a2c1a98b8f..064d7d29ace5 100644
> >> > --- a/lib/kobject_uevent.c
> >> > +++ b/lib/kobject_uevent.c
> >> > @@ -251,7 +251,8 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
> >> >                  return sock_ns != ns;
> >> >          }
> >> >  
> >> > -        return 0;
> >> > +        /* Check if socket was opened from non-initial user namespace. */
> >> > +        return sk_user_ns(dsk) != &init_user_ns;
> >> >  }
> >> >  #endif
> >> >  
> >> >
> >> > But correct me if I'm wrong.
> >> 
> >> You are worrying about NETLINK_LISTEN_ALL_NSID sockets.  That has
> >> permissions and an explicit opt-in to receiving packets from multiple
> >> network namespaces.
> >
> > I don't think that's what I'm talking about unless that is somehow the
> > default for NETLINK_KOBJECT_UEVENT sockets. What I'm worried about is
> > doing
> >
> > unshare -U --map-root
> >
> > then opening a NETLINK_KOBJECT_UEVENT socket and starting to listen to
> > uevents. Imho, this should not be possible because I'm in a
> > non-init_user_ns. But currently I'm able to - even with the patch to
> > come - since the uevent socket in the kernel was created when init_net
> > was created and hence is *owned* by the init_user_ns which means it is
> > in the list of uevent sockets. Here's a demo of what I mean:
> >
> > https://asciinema.org/a/175632
> 
> Why do you care about this case?

It's not so much that I care about this case since any workload that
wants to run a separate udevd will have to unshare the network namespace
and the user namespace for it to make complete sense.
What I do care about is that the two of us are absolutely clear
about what semantics we are going to expose to userspace and it seems
that we were talking past each other wrt this "corner case".
For userspace, it needs to be very clear that the intention is to filter
by *owning user namespace of the network namespace a given task resides
in* and not by user namespace of the task per se. This is what this
corner case basically shows, I think.

Christian

> 
> Everyone is allowed to listen to the uevent netlink sockets with or
> without user namespaces.  So there are no permission issues, and
> this is not even a data information leak.
> 
> If you don't want programs in your user namespace to have access you
> will be able to unshare the network namespace.
> 
> Eric


Re: [PATCH net-next] netns: filter uevents correctly

2018-04-11 Thread Eric W. Biederman
Christian Brauner  writes:

> On Wed, Apr 11, 2018 at 01:37:18PM -0500, Eric W. Biederman wrote:
>> Christian Brauner  writes:
>> 
>> > On Wed, Apr 11, 2018 at 11:40:14AM -0500, Eric W. Biederman wrote:
>> >> Christian Brauner  writes:
>> >> > Yeah, agreed.
>> >> > But I think the patch is not complete. To guarantee that no non-initial
>> >> > user namespace actually receives uevents we need to:
>> >> > 1. only send uevents to uevent sockets that are located in network
>> >> >    namespaces that are owned by init_user_ns
>> >> > 2. filter uevents that are sent to sockets in mc_list that have opened a
>> >> >    uevent socket that is owned by init_user_ns *from* a
>> >> >    non-init_user_ns
>> >> >
>> >> > We account for 1. by only recording uevent sockets in the global uevent
>> >> > socket list that are owned by init_user_ns.
>> >> > But to account for 2. we need to filter by the user namespace that owns
>> >> > the socket in mc_list. So in addition to that we also need to slightly
>> >> > change the filter logic in kobj_bcast_filter() I think:
>> >> >
>> >> > diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
>> >> > index 22a2c1a98b8f..064d7d29ace5 100644
>> >> > --- a/lib/kobject_uevent.c
>> >> > +++ b/lib/kobject_uevent.c
>> >> > @@ -251,7 +251,8 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
>> >> >                  return sock_ns != ns;
>> >> >          }
>> >> >  
>> >> > -        return 0;
>> >> > +        /* Check if socket was opened from non-initial user namespace. */
>> >> > +        return sk_user_ns(dsk) != &init_user_ns;
>> >> >  }
>> >> >  #endif
>> >> >  
>> >> >
>> >> > But correct me if I'm wrong.
>> >> 
>> >> You are worrying about NETLINK_LISTEN_ALL_NSID sockets.  That has
>> >> permissions and an explicit opt-in to receiving packets from multiple
>> >> network namespaces.
>> >
>> > I don't think that's what I'm talking about unless that is somehow the
>> > default for NETLINK_KOBJECT_UEVENT sockets. What I'm worried about is
>> > doing
>> >
>> > unshare -U --map-root
>> >
>> > then opening a NETLINK_KOBJECT_UEVENT socket and starting to listen to
>> > uevents. Imho, this should not be possible because I'm in a
>> > non-init_user_ns. But currently I'm able to - even with the patch to
>> > come - since the uevent socket in the kernel was created when init_net
>> > was created and hence is *owned* by the init_user_ns which means it is
>> > in the list of uevent sockets. Here's a demo of what I mean:
>> >
>> > https://asciinema.org/a/175632
>> 
>> Why do you care about this case?
>
> It's not so much that I care about this case since any workload that
> wants to run a separate udevd will have to unshare the network namespace
> and the user namespace for it to make complete sense.
> What I do care about is that the two of us are absolutely clear
> about what semantics we are going to expose to userspace and it seems
> that we were talking past each other wrt this "corner case".
> For userspace, it needs to be very clear that the intention is to filter
> by *owning user namespace of the network namespace a given task resides
> in* and not by user namespace of the task per se. This is what this
> corner case basically shows, I think.

If this is just a clarification of semantics then yes this is a
productive question.  I almost agree with your definition above.

I would make the definition very simple.  Uevents will not be broadcast
via netlink in a network namespace where net->user_ns != &init_user_ns,
with the exception of uevents for network devices in that network
namespace.
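
As an illustration of that definition, the rule can be written as a single predicate. This is a userspace sketch in C, not kernel code: the structs are simplified stand-ins, `uevent_broadcast_allowed()` is a hypothetical name, and a NULL `dev_net` is used here to mean "the kobject carries no network namespace tag".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real definitions. */
struct user_namespace { int id; };
struct net { const struct user_namespace *user_ns; };

/* Stand-in for the kernel's global init_user_ns. */
static const struct user_namespace init_user_ns = { 0 };

/*
 * Decide whether a uevent may be broadcast into the network namespace
 * `target`.
 *
 * dev_net: the network namespace a network device belongs to, or NULL for
 *          a kobject without a namespace tag (assumption of this model).
 */
bool uevent_broadcast_allowed(const struct net *target,
                              const struct net *dev_net)
{
    assert(target != NULL);

    /* Network devices: only visible in their own network namespace. */
    if (dev_net != NULL)
        return dev_net == target;

    /* Everything else: only into namespaces owned by init_user_ns. */
    return target->user_ns == &init_user_ns;
}
```

Note that the predicate never looks at the user namespace of the listening task, only at the owner of the network namespace, which is the distinction this exchange set out to clarify.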

The existing filtering by the sending uid and verifying that it is uid 0
gives a little more room to filter if we want (as udev & friends will
ignore the uevent), but I don't see the point.

Eric


Re: [PATCH net-next] netns: filter uevents correctly

2018-04-11 Thread Christian Brauner
On Wed, Apr 11, 2018 at 11:40:14AM -0500, Eric W. Biederman wrote:
> Christian Brauner  writes:
> 
> > On Tue, Apr 10, 2018 at 10:04:46AM -0500, Eric W. Biederman wrote:
> >> Christian Brauner  writes:
> >> 
> >> > On Mon, Apr 09, 2018 at 06:21:31PM -0500, Eric W. Biederman wrote:
> >> >> Christian Brauner  writes:
> >> >> 
> >> >> > On Thu, Apr 05, 2018 at 10:59:49PM -0500, Eric W. Biederman wrote:
> >> >> >> Christian Brauner  writes:
> >> >> >> 
> >> >> >> > On Thu, Apr 05, 2018 at 05:26:59PM +0300, Kirill Tkhai wrote:
> >> >> >> >> On 05.04.2018 17:07, Christian Brauner wrote:
> >> >> >> >> > On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
> >> >> >> >> >> On 04.04.2018 22:48, Christian Brauner wrote:
> >> >> >> >> >>> commit 07e98962fa77 ("kobject: Send hotplug events in all 
> >> >> >> >> >>> network namespaces")
> >> >> >> >> >>>
> >> >> >> >> >>> enabled sending hotplug events into all network namespaces 
> >> >> >> >> >>> back in 2010.
> >> >> >> >> >>> Over time the set of uevents that get sent into all network 
> >> >> >> >> >>> namespaces has
> >> >> >> >> >>> shrunk. We have now reached the point where hotplug events 
> >> >> >> >> >>> for all devices
> >> >> >> >> >>> that carry a namespace tag are filtered according to that 
> >> >> >> >> >>> namespace.
> >> >> >> >> >>>
> >> >> >> >> >>> Specifically, they are filtered whenever the namespace tag of 
> >> >> >> >> >>> the kobject
> >> >> >> >> >>> does not match the namespace tag of the netlink socket. One 
> >> >> >> >> >>> example are
> >> >> >> >> >>> network devices. Uevents for network devices only show up in 
> >> >> >> >> >>> the network
> >> >> >> >> >>> namespaces these devices are moved to or created in.
> >> >> >> >> >>>
> >> >> >> >> >>> However, any uevent for a kobject that does not have a 
> >> >> >> >> >>> namespace tag
> >> >> >> >> >>> associated with it will not be filtered and we will *try* to 
> >> >> >> >> >>> broadcast it
> >> >> >> >> >>> into all network namespaces.
> >> >> >> >> >>>
> >> >> >> >> >>> The original patchset was written in 2010 before user 
> >> >> >> >> >>> namespaces were a
> >> >> >> >> >>> thing. With the introduction of user namespaces sending out 
> >> >> >> >> >>> uevents became
> >> >> >> >> >>> partially isolated as they were filtered by user namespaces:
> >> >> >> >> >>>
> >> >> >> >> >>> net/netlink/af_netlink.c:do_one_broadcast()
> >> >> >> >> >>>
> >> >> >> >> >>> if (!net_eq(sock_net(sk), p->net)) {
> >> >> >> >> >>>         if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> >> >> >> >> >>>                 return;
> >> >> >> >> >>>
> >> >> >> >> >>>         if (!peernet_has_id(sock_net(sk), p->net))
> >> >> >> >> >>>                 return;
> >> >> >> >> >>>
> >> >> >> >> >>>         if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> >> >> >> >> >>>                              CAP_NET_BROADCAST))
> >> >> >> >> >>>                 return;
> >> >> >> >> >>> }
> >> >> >> >> >>>
> >> >> >> >> >>> The file_ns_capable() check will check whether the caller had
> >> >> >> >> >>> CAP_NET_BROADCAST at the time of opening the netlink socket 
> >> >> >> >> >>> in the user
> >> >> >> >> >>> namespace of interest. This check is fine in general but 
> >> >> >> >> >>> seems insufficient
> >> >> >> >> >>> to me when paired with uevents. The reason is that devices 
> >> >> >> >> >>> always belong to
> >> >> >> >> >>> the initial user namespace so uevents for kobjects that do 
> >> >> >> >> >>> not carry a
> >> >> >> >> >>> namespace tag should never be sent into another user 
> >> >> >> >> >>> namespace. This has
> >> >> >> >> >>> been the intention all along. But there's one case where this 
> >> >> >> >> >>> breaks,
> >> >> >> >> >>> namely if a new user namespace is created by root on the host 
> >> >> >> >> >>> and an
> >> >> >> >> >>> identity mapping is established between root on the host and 
> >> >> >> >> >>> root in the
> >> >> >> >> >>> new user namespace. Here's a reproducer:
> >> >> >> >> >>>
> >> >> >> >> >>>  sudo unshare -U --map-root
> >> >> >> >> >>>  udevadm monitor -k
> >> >> >> >> >>>  # Now change to initial user namespace and e.g. do
> >> >> >> >> >>>  modprobe kvm
> >> >> >> >> >>>  # or
> >> >> >> >> >>>  rmmod kvm
> >> >> >> >> >>>
> >> >> >> >> >>> will allow the non-initial user namespace to retrieve all 
> >> >> >> >> >>> uevents from the
> >> >> >> >> >>> host. This seems very anecdotal given that in the general 
> >> >> >> >> >>> case user
> >> >> >> >> >>> namespaces do not see any uevents and also can't really do 
> >> >> >> >> >>> anything useful
> >> >> >> >> >>> with them.
> >> >> >> >> >>>
> >> >> >> >> >>> Additionally, it is now possible to send uevents from 
> >> >> >> >> >>> userspace. As such we
> >> >> >> >> >>> can let a sufficiently privileged (CAP_SYS_ADMIN in the 
> >> >> >> >> >>> owning 

Re: [PATCH net-next] netns: filter uevents correctly

2018-04-11 Thread Eric W. Biederman
Christian Brauner  writes:

> On Tue, Apr 10, 2018 at 10:04:46AM -0500, Eric W. Biederman wrote:
>> Christian Brauner  writes:
>> 
>> > On Mon, Apr 09, 2018 at 06:21:31PM -0500, Eric W. Biederman wrote:
>> >> Christian Brauner  writes:
>> >> 
>> >> > On Thu, Apr 05, 2018 at 10:59:49PM -0500, Eric W. Biederman wrote:
>> >> >> Christian Brauner  writes:
>> >> >> 
>> >> >> > On Thu, Apr 05, 2018 at 05:26:59PM +0300, Kirill Tkhai wrote:
>> >> >> >> On 05.04.2018 17:07, Christian Brauner wrote:
>> >> >> >> > On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
>> >> >> >> >> On 04.04.2018 22:48, Christian Brauner wrote:
>> >> >> >> >>> commit 07e98962fa77 ("kobject: Send hotplug events in all 
>> >> >> >> >>> network namespaces")
>> >> >> >> >>>
>> >> >> >> >>> enabled sending hotplug events into all network namespaces back 
>> >> >> >> >>> in 2010.
>> >> >> >> >>> Over time the set of uevents that get sent into all network 
>> >> >> >> >>> namespaces has
>> >> >> >> >>> shrunk. We have now reached the point where hotplug events for 
>> >> >> >> >>> all devices
>> >> >> >> >>> that carry a namespace tag are filtered according to that 
>> >> >> >> >>> namespace.
>> >> >> >> >>>
>> >> >> >> >>> Specifically, they are filtered whenever the namespace tag of 
>> >> >> >> >>> the kobject
>> >> >> >> >>> does not match the namespace tag of the netlink socket. One 
>> >> >> >> >>> example are
>> >> >> >> >>> network devices. Uevents for network devices only show up in 
>> >> >> >> >>> the network
>> >> >> >> >>> namespaces these devices are moved to or created in.
>> >> >> >> >>>
>> >> >> >> >>> However, any uevent for a kobject that does not have a 
>> >> >> >> >>> namespace tag
>> >> >> >> >>> associated with it will not be filtered and we will *try* to 
>> >> >> >> >>> broadcast it
>> >> >> >> >>> into all network namespaces.
>> >> >> >> >>>
>> >> >> >> >>> The original patchset was written in 2010 before user 
>> >> >> >> >>> namespaces were a
>> >> >> >> >>> thing. With the introduction of user namespaces sending out 
>> >> >> >> >>> uevents became
>> >> >> >> >>> partially isolated as they were filtered by user namespaces:
>> >> >> >> >>>
>> >> >> >> >>> net/netlink/af_netlink.c:do_one_broadcast()
>> >> >> >> >>>
>> >> >> >> >>> if (!net_eq(sock_net(sk), p->net)) {
>> >> >> >> >>>         if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
>> >> >> >> >>>                 return;
>> >> >> >> >>>
>> >> >> >> >>>         if (!peernet_has_id(sock_net(sk), p->net))
>> >> >> >> >>>                 return;
>> >> >> >> >>>
>> >> >> >> >>>         if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
>> >> >> >> >>>                              CAP_NET_BROADCAST))
>> >> >> >> >>>                 return;
>> >> >> >> >>> }
>> >> >> >> >>>
>> >> >> >> >>> The file_ns_capable() check will check whether the caller had
>> >> >> >> >>> CAP_NET_BROADCAST at the time of opening the netlink socket in 
>> >> >> >> >>> the user
>> >> >> >> >>> namespace of interest. This check is fine in general but seems 
>> >> >> >> >>> insufficient
>> >> >> >> >>> to me when paired with uevents. The reason is that devices 
>> >> >> >> >>> always belong to
>> >> >> >> >>> the initial user namespace so uevents for kobjects that do not 
>> >> >> >> >>> carry a
>> >> >> >> >>> namespace tag should never be sent into another user namespace. 
>> >> >> >> >>> This has
>> >> >> >> >>> been the intention all along. But there's one case where this 
>> >> >> >> >>> breaks,
>> >> >> >> >>> namely if a new user namespace is created by root on the host 
>> >> >> >> >>> and an
>> >> >> >> >>> identity mapping is established between root on the host and 
>> >> >> >> >>> root in the
>> >> >> >> >>> new user namespace. Here's a reproducer:
>> >> >> >> >>>
>> >> >> >> >>>  sudo unshare -U --map-root
>> >> >> >> >>>  udevadm monitor -k
>> >> >> >> >>>  # Now change to initial user namespace and e.g. do
>> >> >> >> >>>  modprobe kvm
>> >> >> >> >>>  # or
>> >> >> >> >>>  rmmod kvm
>> >> >> >> >>>
>> >> >> >> >>> will allow the non-initial user namespace to retrieve all 
>> >> >> >> >>> uevents from the
>> >> >> >> >>> host. This seems very anecdotal given that in the general case 
>> >> >> >> >>> user
>> >> >> >> >>> namespaces do not see any uevents and also can't really do 
>> >> >> >> >>> anything useful
>> >> >> >> >>> with them.
>> >> >> >> >>>
>> >> >> >> >>> Additionally, it is now possible to send uevents from 
>> >> >> >> >>> userspace. As such we
>> >> >> >> >>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning 
>> >> >> >> >>> user
>> >> >> >> >>> namespace of the network namespace of the netlink socket) 
>> >> >> >> >>> userspace process
>> >> >> >> >>> make a decision what uevents should be sent.
>> >> >> >> >>>
>> >> >> >> >>> This makes me think that we should simply ensure that 

Re: [PATCH net-next] netns: filter uevents correctly

2018-04-11 Thread Christian Brauner
On Tue, Apr 10, 2018 at 10:04:46AM -0500, Eric W. Biederman wrote:
> Christian Brauner  writes:
> 
> > On Mon, Apr 09, 2018 at 06:21:31PM -0500, Eric W. Biederman wrote:
> >> Christian Brauner  writes:
> >> 
> >> > On Thu, Apr 05, 2018 at 10:59:49PM -0500, Eric W. Biederman wrote:
> >> >> Christian Brauner  writes:
> >> >> 
> >> >> > On Thu, Apr 05, 2018 at 05:26:59PM +0300, Kirill Tkhai wrote:
> >> >> >> On 05.04.2018 17:07, Christian Brauner wrote:
> >> >> >> > On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
> >> >> >> >> On 04.04.2018 22:48, Christian Brauner wrote:
> >> >> >> >>> commit 07e98962fa77 ("kobject: Send hotplug events in all 
> >> >> >> >>> network namespaces")
> >> >> >> >>>
> >> >> >> >>> enabled sending hotplug events into all network namespaces back 
> >> >> >> >>> in 2010.
> >> >> >> >>> Over time the set of uevents that get sent into all network 
> >> >> >> >>> namespaces has
> >> >> >> >>> shrunk. We have now reached the point where hotplug events for 
> >> >> >> >>> all devices
> >> >> >> >>> that carry a namespace tag are filtered according to that 
> >> >> >> >>> namespace.
> >> >> >> >>>
> >> >> >> >>> Specifically, they are filtered whenever the namespace tag of 
> >> >> >> >>> the kobject
> >> >> >> >>> does not match the namespace tag of the netlink socket. One 
> >> >> >> >>> example are
> >> >> >> >>> network devices. Uevents for network devices only show up in the 
> >> >> >> >>> network
> >> >> >> >>> namespaces these devices are moved to or created in.
> >> >> >> >>>
> >> >> >> >>> However, any uevent for a kobject that does not have a namespace 
> >> >> >> >>> tag
> >> >> >> >>> associated with it will not be filtered and we will *try* to 
> >> >> >> >>> broadcast it
> >> >> >> >>> into all network namespaces.
> >> >> >> >>>
> >> >> >> >>> The original patchset was written in 2010 before user namespaces 
> >> >> >> >>> were a
> >> >> >> >>> thing. With the introduction of user namespaces sending out 
> >> >> >> >>> uevents became
> >> >> >> >>> partially isolated as they were filtered by user namespaces:
> >> >> >> >>>
> >> >> >> >>> net/netlink/af_netlink.c:do_one_broadcast()
> >> >> >> >>>
> >> >> >> >>> if (!net_eq(sock_net(sk), p->net)) {
> >> >> >> >>>         if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> >> >> >> >>>                 return;
> >> >> >> >>>
> >> >> >> >>>         if (!peernet_has_id(sock_net(sk), p->net))
> >> >> >> >>>                 return;
> >> >> >> >>>
> >> >> >> >>>         if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> >> >> >> >>>                              CAP_NET_BROADCAST))
> >> >> >> >>>                 return;
> >> >> >> >>> }
> >> >> >> >>>
> >> >> >> >>> The file_ns_capable() check will check whether the caller had
> >> >> >> >>> CAP_NET_BROADCAST at the time of opening the netlink socket in 
> >> >> >> >>> the user
> >> >> >> >>> namespace of interest. This check is fine in general but seems 
> >> >> >> >>> insufficient
> >> >> >> >>> to me when paired with uevents. The reason is that devices 
> >> >> >> >>> always belong to
> >> >> >> >>> the initial user namespace so uevents for kobjects that do not 
> >> >> >> >>> carry a
> >> >> >> >>> namespace tag should never be sent into another user namespace. 
> >> >> >> >>> This has
> >> >> >> >>> been the intention all along. But there's one case where this 
> >> >> >> >>> breaks,
> >> >> >> >>> namely if a new user namespace is created by root on the host 
> >> >> >> >>> and an
> >> >> >> >>> identity mapping is established between root on the host and 
> >> >> >> >>> root in the
> >> >> >> >>> new user namespace. Here's a reproducer:
> >> >> >> >>>
> >> >> >> >>>  sudo unshare -U --map-root
> >> >> >> >>>  udevadm monitor -k
> >> >> >> >>>  # Now change to initial user namespace and e.g. do
> >> >> >> >>>  modprobe kvm
> >> >> >> >>>  # or
> >> >> >> >>>  rmmod kvm
> >> >> >> >>>
> >> >> >> >>> will allow the non-initial user namespace to retrieve all 
> >> >> >> >>> uevents from the
> >> >> >> >>> host. This seems very anecdotal given that in the general case 
> >> >> >> >>> user
> >> >> >> >>> namespaces do not see any uevents and also can't really do 
> >> >> >> >>> anything useful
> >> >> >> >>> with them.
> >> >> >> >>>
> >> >> >> >>> Additionally, it is now possible to send uevents from userspace. 
> >> >> >> >>> As such we
> >> >> >> >>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning 
> >> >> >> >>> user
> >> >> >> >>> namespace of the network namespace of the netlink socket) 
> >> >> >> >>> userspace process
> >> >> >> >>> make a decision what uevents should be sent.
> >> >> >> >>>
> >> >> >> >>> This makes me think that we should simply ensure that uevents 
> >> >> >> >>> for kobjects
> >> >> >> >>> that do not carry a namespace tag are *always* filtered by user 
> >> >> >> >>> namespace
> >> >> >> >>> in 

Re: [PATCH net-next] netns: filter uevents correctly

2018-04-10 Thread Eric W. Biederman
Christian Brauner  writes:

> On Mon, Apr 09, 2018 at 06:21:31PM -0500, Eric W. Biederman wrote:
>> Christian Brauner  writes:
>> 
>> > On Thu, Apr 05, 2018 at 10:59:49PM -0500, Eric W. Biederman wrote:
>> >> Christian Brauner  writes:
>> >> 
>> >> > On Thu, Apr 05, 2018 at 05:26:59PM +0300, Kirill Tkhai wrote:
>> >> >> On 05.04.2018 17:07, Christian Brauner wrote:
>> >> >> > On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
>> >> >> >> On 04.04.2018 22:48, Christian Brauner wrote:
>> >> >> >>> commit 07e98962fa77 ("kobject: Send hotplug events in all network 
>> >> >> >>> namespaces")
>> >> >> >>>
>> >> >> >>> enabled sending hotplug events into all network namespaces back in 
>> >> >> >>> 2010.
>> >> >> >>> Over time the set of uevents that get sent into all network 
>> >> >> >>> namespaces has
>> >> >> >>> shrunk. We have now reached the point where hotplug events for all 
>> >> >> >>> devices
>> >> >> >>> that carry a namespace tag are filtered according to that 
>> >> >> >>> namespace.
>> >> >> >>>
>> >> >> >>> Specifically, they are filtered whenever the namespace tag of the 
>> >> >> >>> kobject
>> >> >> >>> does not match the namespace tag of the netlink socket. One 
>> >> >> >>> example are
>> >> >> >>> network devices. Uevents for network devices only show up in the 
>> >> >> >>> network
>> >> >> >>> namespaces these devices are moved to or created in.
>> >> >> >>>
>> >> >> >>> However, any uevent for a kobject that does not have a namespace 
>> >> >> >>> tag
>> >> >> >>> associated with it will not be filtered and we will *try* to 
>> >> >> >>> broadcast it
>> >> >> >>> into all network namespaces.
>> >> >> >>>
>> >> >> >>> The original patchset was written in 2010 before user namespaces 
>> >> >> >>> were a
>> >> >> >>> thing. With the introduction of user namespaces sending out 
>> >> >> >>> uevents became
>> >> >> >>> partially isolated as they were filtered by user namespaces:
>> >> >> >>>
>> >> >> >>> net/netlink/af_netlink.c:do_one_broadcast()
>> >> >> >>>
>> >> >> >>> if (!net_eq(sock_net(sk), p->net)) {
>> >> >> >>>         if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
>> >> >> >>>                 return;
>> >> >> >>>
>> >> >> >>>         if (!peernet_has_id(sock_net(sk), p->net))
>> >> >> >>>                 return;
>> >> >> >>>
>> >> >> >>>         if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
>> >> >> >>>                              CAP_NET_BROADCAST))
>> >> >> >>>                 return;
>> >> >> >>> }
>> >> >> >>>
>> >> >> >>> The file_ns_capable() check will check whether the caller had
>> >> >> >>> CAP_NET_BROADCAST at the time of opening the netlink socket in the 
>> >> >> >>> user
>> >> >> >>> namespace of interest. This check is fine in general but seems 
>> >> >> >>> insufficient
>> >> >> >>> to me when paired with uevents. The reason is that devices always 
>> >> >> >>> belong to
>> >> >> >>> the initial user namespace so uevents for kobjects that do not 
>> >> >> >>> carry a
>> >> >> >>> namespace tag should never be sent into another user namespace. 
>> >> >> >>> This has
>> >> >> >>> been the intention all along. But there's one case where this 
>> >> >> >>> breaks,
>> >> >> >>> namely if a new user namespace is created by root on the host and 
>> >> >> >>> an
>> >> >> >>> identity mapping is established between root on the host and root 
>> >> >> >>> in the
>> >> >> >>> new user namespace. Here's a reproducer:
>> >> >> >>>
>> >> >> >>>  sudo unshare -U --map-root
>> >> >> >>>  udevadm monitor -k
>> >> >> >>>  # Now change to initial user namespace and e.g. do
>> >> >> >>>  modprobe kvm
>> >> >> >>>  # or
>> >> >> >>>  rmmod kvm
>> >> >> >>>
>> >> >> >>> will allow the non-initial user namespace to retrieve all uevents 
>> >> >> >>> from the
>> >> >> >>> host. This seems very anecdotal given that in the general case user
>> >> >> >>> namespaces do not see any uevents and also can't really do 
>> >> >> >>> anything useful
>> >> >> >>> with them.
>> >> >> >>>
>> >> >> >>> Additionally, it is now possible to send uevents from userspace. 
>> >> >> >>> As such we
>> >> >> >>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
>> >> >> >>> namespace of the network namespace of the netlink socket) 
>> >> >> >>> userspace process
>> >> >> >>> make a decision what uevents should be sent.
>> >> >> >>>
>> >> >> >>> This makes me think that we should simply ensure that uevents for 
>> >> >> >>> kobjects
>> >> >> >>> that do not carry a namespace tag are *always* filtered by user 
>> >> >> >>> namespace
>> >> >> >>> in kobj_bcast_filter(). Specifically:
>> >> >> >>> - If the owning user namespace of the uevent socket is not 
>> >> >> >>> init_user_ns the
>> >> >> >>>   event will always be filtered.
>> >> >> >>> - If the network namespace the uevent socket belongs to was 
>> >> >> >>> created in the
>> >> >> >>>   initial user namespace but was 

Re: [PATCH net-next] netns: filter uevents correctly

2018-04-10 Thread Christian Brauner
On Mon, Apr 09, 2018 at 06:21:31PM -0500, Eric W. Biederman wrote:
> Christian Brauner  writes:
> 
> > On Thu, Apr 05, 2018 at 10:59:49PM -0500, Eric W. Biederman wrote:
> >> Christian Brauner  writes:
> >> 
> >> > On Thu, Apr 05, 2018 at 05:26:59PM +0300, Kirill Tkhai wrote:
> >> >> On 05.04.2018 17:07, Christian Brauner wrote:
> >> >> > On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
> >> >> >> On 04.04.2018 22:48, Christian Brauner wrote:
> >> >> >>> commit 07e98962fa77 ("kobject: Send hotplug events in all network 
> >> >> >>> namespaces")
> >> >> >>>
> >> >> >>> enabled sending hotplug events into all network namespaces back in 
> >> >> >>> 2010.
> >> >> >>> Over time the set of uevents that get sent into all network 
> >> >> >>> namespaces has
> >> >> >>> shrunk. We have now reached the point where hotplug events for all 
> >> >> >>> devices
> >> >> >>> that carry a namespace tag are filtered according to that namespace.
> >> >> >>>
> >> >> >>> Specifically, they are filtered whenever the namespace tag of the 
> >> >> >>> kobject
> >> >> >>> does not match the namespace tag of the netlink socket. One example 
> >> >> >>> are
> >> >> >>> network devices. Uevents for network devices only show up in the 
> >> >> >>> network
> >> >> >>> namespaces these devices are moved to or created in.
> >> >> >>>
> >> >> >>> However, any uevent for a kobject that does not have a namespace tag
> >> >> >>> associated with it will not be filtered and we will *try* to 
> >> >> >>> broadcast it
> >> >> >>> into all network namespaces.
> >> >> >>>
> >> >> >>> The original patchset was written in 2010 before user namespaces 
> >> >> >>> were a
> >> >> >>> thing. With the introduction of user namespaces sending out uevents 
> >> >> >>> became
> >> >> >>> partially isolated as they were filtered by user namespaces:
> >> >> >>>
> >> >> >>> net/netlink/af_netlink.c:do_one_broadcast()
> >> >> >>>
> >> >> >>> if (!net_eq(sock_net(sk), p->net)) {
> >> >> >>> if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> >> >> >>> return;
> >> >> >>>
> >> >> >>> if (!peernet_has_id(sock_net(sk), p->net))
> >> >> >>> return;
> >> >> >>>
> >> >> >>> if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> >> >> >>>  CAP_NET_BROADCAST))
> >> >> >>> return;
> >> >> >>> }
> >> >> >>>
> >> >> >>> The file_ns_capable() check will check whether the caller had
> >> >> >>> CAP_NET_BROADCAST at the time of opening the netlink socket in the 
> >> >> >>> user
> >> >> >>> namespace of interest. This check is fine in general but seems 
> >> >> >>> insufficient
> >> >> >>> to me when paired with uevents. The reason is that devices always 
> >> >> >>> belong to
> >> >> >>> the initial user namespace so uevents for kobjects that do not 
> >> >> >>> carry a
> >> >> >>> namespace tag should never be sent into another user namespace. 
> >> >> >>> This has
> >> >> >>> been the intention all along. But there's one case where this 
> >> >> >>> breaks,
> >> >> >>> namely if a new user namespace is created by root on the host and an
> >> >> >>> identity mapping is established between root on the host and root 
> >> >> >>> in the
> >> >> >>> new user namespace. Here's a reproducer:
> >> >> >>>
> >> >> >>>  sudo unshare -U --map-root
> >> >> >>>  udevadm monitor -k
> >> >> >>>  # Now change to initial user namespace and e.g. do
> >> >> >>>  modprobe kvm
> >> >> >>>  # or
> >> >> >>>  rmmod kvm
> >> >> >>>
> >> >> >>> will allow the non-initial user namespace to retrieve all uevents 
> >> >> >>> from the
> >> >> >>> host. This seems very anecdotal given that in the general case user
> >> >> >>> namespaces do not see any uevents and also can't really do anything 
> >> >> >>> useful
> >> >> >>> with them.
> >> >> >>>
> >> >> >>> Additionally, it is now possible to send uevents from userspace. As 
> >> >> >>> such we
> >> >> >>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
> >> >> >>> namespace of the network namespace of the netlink socket) userspace 
> >> >> >>> process
> >> >> >>> make a decision what uevents should be sent.
> >> >> >>>
> >> >> >>> This makes me think that we should simply ensure that uevents for 
> >> >> >>> kobjects
> >> >> >>> that do not carry a namespace tag are *always* filtered by user 
> >> >> >>> namespace
> >> >> >>> in kobj_bcast_filter(). Specifically:
> >> >> >>> - If the owning user namespace of the uevent socket is not 
> >> >> >>> init_user_ns the
> >> >> >>>   event will always be filtered.
> >> >> >>> - If the network namespace the uevent socket belongs to was created 
> >> >> >>> in the
> >> >> >>>   initial user namespace but was opened from a non-initial user 
> >> >> >>> namespace
> >> >> >>>   the event will be filtered as well.
> >> >> >>> Put another way, uevents for kobjects not carrying a namespace tag 
> >> >> >>> are now
> 

Re: [PATCH net-next] netns: filter uevents correctly

2018-04-09 Thread Eric W. Biederman
Christian Brauner  writes:

> On Thu, Apr 05, 2018 at 10:59:49PM -0500, Eric W. Biederman wrote:
>> Christian Brauner  writes:
>> 
>> > On Thu, Apr 05, 2018 at 05:26:59PM +0300, Kirill Tkhai wrote:
>> >> On 05.04.2018 17:07, Christian Brauner wrote:
>> >> > On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
>> >> >> On 04.04.2018 22:48, Christian Brauner wrote:
>> >> >>> commit 07e98962fa77 ("kobject: Send hotplug events in all network 
>> >> >>> namespaces")
>> >> >>>
>> >> >>> enabled sending hotplug events into all network namespaces back in 
>> >> >>> 2010.
>> >> >>> Over time the set of uevents that get sent into all network 
>> >> >>> namespaces has
>> >> >>> shrunk. We have now reached the point where hotplug events for all 
>> >> >>> devices
>> >> >>> that carry a namespace tag are filtered according to that namespace.
>> >> >>>
>> >> >>> Specifically, they are filtered whenever the namespace tag of the 
>> >> >>> kobject
>> >> >>> does not match the namespace tag of the netlink socket. One example 
>> >> >>> are
>> >> >>> network devices. Uevents for network devices only show up in the 
>> >> >>> network
>> >> >>> namespaces these devices are moved to or created in.
>> >> >>>
>> >> >>> However, any uevent for a kobject that does not have a namespace tag
>> >> >>> associated with it will not be filtered and we will *try* to 
>> >> >>> broadcast it
>> >> >>> into all network namespaces.
>> >> >>>
>> >> >>> The original patchset was written in 2010 before user namespaces were 
>> >> >>> a
>> >> >>> thing. With the introduction of user namespaces sending out uevents 
>> >> >>> became
>> >> >>> partially isolated as they were filtered by user namespaces:
>> >> >>>
>> >> >>> net/netlink/af_netlink.c:do_one_broadcast()
>> >> >>>
>> >> >>> if (!net_eq(sock_net(sk), p->net)) {
>> >> >>> if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
>> >> >>> return;
>> >> >>>
>> >> >>> if (!peernet_has_id(sock_net(sk), p->net))
>> >> >>> return;
>> >> >>>
>> >> >>> if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
>> >> >>>  CAP_NET_BROADCAST))
>> >> >>> return;
>> >> >>> }
>> >> >>>
>> >> >>> The file_ns_capable() check will check whether the caller had
>> >> >>> CAP_NET_BROADCAST at the time of opening the netlink socket in the 
>> >> >>> user
>> >> >>> namespace of interest. This check is fine in general but seems 
>> >> >>> insufficient
>> >> >>> to me when paired with uevents. The reason is that devices always 
>> >> >>> belong to
>> >> >>> the initial user namespace so uevents for kobjects that do not carry a
>> >> >>> namespace tag should never be sent into another user namespace. This 
>> >> >>> has
>> >> >>> been the intention all along. But there's one case where this breaks,
>> >> >>> namely if a new user namespace is created by root on the host and an
>> >> >>> identity mapping is established between root on the host and root in 
>> >> >>> the
>> >> >>> new user namespace. Here's a reproducer:
>> >> >>>
>> >> >>>  sudo unshare -U --map-root
>> >> >>>  udevadm monitor -k
>> >> >>>  # Now change to initial user namespace and e.g. do
>> >> >>>  modprobe kvm
>> >> >>>  # or
>> >> >>>  rmmod kvm
>> >> >>>
>> >> >>> will allow the non-initial user namespace to retrieve all uevents 
>> >> >>> from the
>> >> >>> host. This seems very anecdotal given that in the general case user
>> >> >>> namespaces do not see any uevents and also can't really do anything 
>> >> >>> useful
>> >> >>> with them.
>> >> >>>
>> >> >>> Additionally, it is now possible to send uevents from userspace. As 
>> >> >>> such we
>> >> >>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
>> >> >>> namespace of the network namespace of the netlink socket) userspace 
>> >> >>> process
>> >> >>> make a decision what uevents should be sent.
>> >> >>>
>> >> >>> This makes me think that we should simply ensure that uevents for 
>> >> >>> kobjects
>> >> >>> that do not carry a namespace tag are *always* filtered by user 
>> >> >>> namespace
>> >> >>> in kobj_bcast_filter(). Specifically:
>> >> >>> - If the owning user namespace of the uevent socket is not 
>> >> >>> init_user_ns the
>> >> >>>   event will always be filtered.
>> >> >>> - If the network namespace the uevent socket belongs to was created 
>> >> >>> in the
>> >> >>>   initial user namespace but was opened from a non-initial user 
>> >> >>> namespace
>> >> >>>   the event will be filtered as well.
>> >> >>> Put another way, uevents for kobjects not carrying a namespace tag 
>> >> >>> are now
>> >> >>> always only sent to the initial user namespace. The regression 
>> >> >>> potential
>> >> >>> for this is near to non-existent since user namespaces can't really do
>> >> >>> anything with interesting devices.
>> >> >>>
>> >> >>> Signed-off-by: Christian Brauner 
>> 

Re: [PATCH net-next] netns: filter uevents correctly

2018-04-09 Thread Christian Brauner
On Thu, Apr 05, 2018 at 10:59:49PM -0500, Eric W. Biederman wrote:
> Christian Brauner  writes:
> 
> > On Thu, Apr 05, 2018 at 05:26:59PM +0300, Kirill Tkhai wrote:
> >> On 05.04.2018 17:07, Christian Brauner wrote:
> >> > On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
> >> >> On 04.04.2018 22:48, Christian Brauner wrote:
> >> >>> commit 07e98962fa77 ("kobject: Send hotplug events in all network 
> >> >>> namespaces")
> >> >>>
> >> >>> enabled sending hotplug events into all network namespaces back in 
> >> >>> 2010.
> >> >>> Over time the set of uevents that get sent into all network namespaces 
> >> >>> has
> >> >>> shrunk. We have now reached the point where hotplug events for all 
> >> >>> devices
> >> >>> that carry a namespace tag are filtered according to that namespace.
> >> >>>
> >> >>> Specifically, they are filtered whenever the namespace tag of the 
> >> >>> kobject
> >> >>> does not match the namespace tag of the netlink socket. One example are
> >> >>> network devices. Uevents for network devices only show up in the 
> >> >>> network
> >> >>> namespaces these devices are moved to or created in.
> >> >>>
> >> >>> However, any uevent for a kobject that does not have a namespace tag
> >> >>> associated with it will not be filtered and we will *try* to broadcast 
> >> >>> it
> >> >>> into all network namespaces.
> >> >>>
> >> >>> The original patchset was written in 2010 before user namespaces were a
> >> >>> thing. With the introduction of user namespaces sending out uevents 
> >> >>> became
> >> >>> partially isolated as they were filtered by user namespaces:
> >> >>>
> >> >>> net/netlink/af_netlink.c:do_one_broadcast()
> >> >>>
> >> >>> if (!net_eq(sock_net(sk), p->net)) {
> >> >>> if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> >> >>> return;
> >> >>>
> >> >>> if (!peernet_has_id(sock_net(sk), p->net))
> >> >>> return;
> >> >>>
> >> >>> if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> >> >>>  CAP_NET_BROADCAST))
> >> >>> return;
> >> >>> }
> >> >>>
> >> >>> The file_ns_capable() check will check whether the caller had
> >> >>> CAP_NET_BROADCAST at the time of opening the netlink socket in the user
> >> >>> namespace of interest. This check is fine in general but seems 
> >> >>> insufficient
> >> >>> to me when paired with uevents. The reason is that devices always 
> >> >>> belong to
> >> >>> the initial user namespace so uevents for kobjects that do not carry a
> >> >>> namespace tag should never be sent into another user namespace. This 
> >> >>> has
> >> >>> been the intention all along. But there's one case where this breaks,
> >> >>> namely if a new user namespace is created by root on the host and an
> >> >>> identity mapping is established between root on the host and root in 
> >> >>> the
> >> >>> new user namespace. Here's a reproducer:
> >> >>>
> >> >>>  sudo unshare -U --map-root
> >> >>>  udevadm monitor -k
> >> >>>  # Now change to initial user namespace and e.g. do
> >> >>>  modprobe kvm
> >> >>>  # or
> >> >>>  rmmod kvm
> >> >>>
> >> >>> will allow the non-initial user namespace to retrieve all uevents from 
> >> >>> the
> >> >>> host. This seems very anecdotal given that in the general case user
> >> >>> namespaces do not see any uevents and also can't really do anything 
> >> >>> useful
> >> >>> with them.
> >> >>>
> >> >>> Additionally, it is now possible to send uevents from userspace. As 
> >> >>> such we
> >> >>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
> >> >>> namespace of the network namespace of the netlink socket) userspace 
> >> >>> process
> >> >>> make a decision what uevents should be sent.
> >> >>>
> >> >>> This makes me think that we should simply ensure that uevents for 
> >> >>> kobjects
> >> >>> that do not carry a namespace tag are *always* filtered by user 
> >> >>> namespace
> >> >>> in kobj_bcast_filter(). Specifically:
> >> >>> - If the owning user namespace of the uevent socket is not 
> >> >>> init_user_ns the
> >> >>>   event will always be filtered.
> >> >>> - If the network namespace the uevent socket belongs to was created in 
> >> >>> the
> >> >>>   initial user namespace but was opened from a non-initial user 
> >> >>> namespace
> >> >>>   the event will be filtered as well.
> >> >>> Put another way, uevents for kobjects not carrying a namespace tag are 
> >> >>> now
> >> >>> always only sent to the initial user namespace. The regression 
> >> >>> potential
> >> >>> for this is near to non-existent since user namespaces can't really do
> >> >>> anything with interesting devices.
> >> >>>
> >> >>> Signed-off-by: Christian Brauner 
> >> >>> ---
> >> >>>  lib/kobject_uevent.c | 10 +-
> >> >>>  1 file changed, 9 insertions(+), 1 deletion(-)
> >> >>>
> >> >>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> >> >>> 

Re: [PATCH net-next] netns: filter uevents correctly

2018-04-06 Thread Eric W. Biederman
Christian Brauner  writes:

>> At a practical level there should be no receivers.  Plus performance
>> issues.  At least my memory is that any unprivileged user on the system
>> is allowed to listen to those events.
>
> Any unprivileged user is allowed to listen to uevents if they have
> net_broadcast in the user namespace the uevent socket was opened in;
> unless I'm misreading.

I believe you are.

This code in do_one_broadcast.

	if (!net_eq(sock_net(sk), p->net)) {
		if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
			return;

		if (!peernet_has_id(sock_net(sk), p->net))
			return;

		if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
				     CAP_NET_BROADCAST))
			return;
	}

Used to just be:
	if (!net_eq(sock_net(sk), p->net))
		return;

Which makes sense when you have a shared hash table and a shared mc_list
between network namespaces.

There is a non-container use of network namespaces where you just need
different contexts where IP addresses can overlap, etc.  In that
configuration, where a single program is managing multiple network
namespaces, being able to listen to rtnetlink events in all of them is an
advantage.

For that case a special socket option NETLINK_F_LISTEN_ALL_NSID was
added that allowed one socket to listen for events from multiple network
namespaces.

If we rework the code in af_netlink.c, that matters.  However, for just
understanding uevents you can assume there are no sockets with
NETLINK_F_LISTEN_ALL_NSID set.
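
To make the difference between the two versions of the check concrete,
here is a rough userspace model of the decision.  It is only a sketch:
struct listener and its fields are hypothetical stand-ins for the kernel
state (sock_net(), nlk->flags, peernet_has_id(), file_ns_capable()), and
the real do_one_broadcast() performs other checks that are left out here.

```c
#include <stdbool.h>

/* Hypothetical model of a listening netlink socket; not a kernel structure. */
struct listener {
	int  netns_id;          /* stands in for sock_net(sk) */
	bool listen_all_nsid;   /* stands in for NETLINK_F_LISTEN_ALL_NSID */
	bool peer_has_id;       /* stands in for peernet_has_id() succeeding */
	bool had_net_broadcast; /* stands in for file_ns_capable(..., CAP_NET_BROADCAST) */
};

/* Old check: deliver only inside the sender's own network namespace. */
static bool deliver_old(const struct listener *l, int sender_netns_id)
{
	return l->netns_id == sender_netns_id;
}

/*
 * Current check: a socket in another network namespace may still receive
 * the message, but only if it opted in and passes both follow-up checks.
 */
static bool deliver_new(const struct listener *l, int sender_netns_id)
{
	if (l->netns_id == sender_netns_id)
		return true;
	if (!l->listen_all_nsid)
		return false;
	if (!l->peer_has_id)
		return false;
	return l->had_net_broadcast;
}
```

In this model a socket without the opt-in flag behaves identically under
deliver_old() and deliver_new(), which is why the extra conditions can be
ignored when reasoning about ordinary uevent sockets.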

Eric



Re: [PATCH net-next] netns: filter uevents correctly

2018-04-06 Thread Christian Brauner
On Fri, Apr 06, 2018 at 09:45:41AM -0500, Eric W. Biederman wrote:
> Christian Brauner  writes:
> 
> > On Thu, Apr 05, 2018 at 10:59:49PM -0500, Eric W. Biederman wrote:
> >> Christian Brauner  writes:
> >> 
> >> > On Thu, Apr 05, 2018 at 05:26:59PM +0300, Kirill Tkhai wrote:
> >> >> On 05.04.2018 17:07, Christian Brauner wrote:
> >> >> > On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
> >> >> >> On 04.04.2018 22:48, Christian Brauner wrote:
> >> >> >>> commit 07e98962fa77 ("kobject: Send hotplug events in all network 
> >> >> >>> namespaces")
> >> >> >>>
> >> >> >>> enabled sending hotplug events into all network namespaces back in 
> >> >> >>> 2010.
> >> >> >>> Over time the set of uevents that get sent into all network 
> >> >> >>> namespaces has
> >> >> >>> shrunk. We have now reached the point where hotplug events for all 
> >> >> >>> devices
> >> >> >>> that carry a namespace tag are filtered according to that namespace.
> >> >> >>>
> >> >> >>> Specifically, they are filtered whenever the namespace tag of the 
> >> >> >>> kobject
> >> >> >>> does not match the namespace tag of the netlink socket. One example 
> >> >> >>> are
> >> >> >>> network devices. Uevents for network devices only show up in the 
> >> >> >>> network
> >> >> >>> namespaces these devices are moved to or created in.
> >> >> >>>
> >> >> >>> However, any uevent for a kobject that does not have a namespace tag
> >> >> >>> associated with it will not be filtered and we will *try* to 
> >> >> >>> broadcast it
> >> >> >>> into all network namespaces.
> >> >> >>>
> >> >> >>> The original patchset was written in 2010 before user namespaces 
> >> >> >>> were a
> >> >> >>> thing. With the introduction of user namespaces sending out uevents 
> >> >> >>> became
> >> >> >>> partially isolated as they were filtered by user namespaces:
> >> >> >>>
> >> >> >>> net/netlink/af_netlink.c:do_one_broadcast()
> >> >> >>>
> >> >> >>> if (!net_eq(sock_net(sk), p->net)) {
> >> >> >>> if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> >> >> >>> return;
> >> >> >>>
> >> >> >>> if (!peernet_has_id(sock_net(sk), p->net))
> >> >> >>> return;
> >> >> >>>
> >> >> >>> if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> >> >> >>>  CAP_NET_BROADCAST))
> >> >> >>> return;
> >> >> >>> }
> >> >> >>>
> >> >> >>> The file_ns_capable() check will check whether the caller had
> >> >> >>> CAP_NET_BROADCAST at the time of opening the netlink socket in the 
> >> >> >>> user
> >> >> >>> namespace of interest. This check is fine in general but seems 
> >> >> >>> insufficient
> >> >> >>> to me when paired with uevents. The reason is that devices always 
> >> >> >>> belong to
> >> >> >>> the initial user namespace so uevents for kobjects that do not 
> >> >> >>> carry a
> >> >> >>> namespace tag should never be sent into another user namespace. 
> >> >> >>> This has
> >> >> >>> been the intention all along. But there's one case where this 
> >> >> >>> breaks,
> >> >> >>> namely if a new user namespace is created by root on the host and an
> >> >> >>> identity mapping is established between root on the host and root 
> >> >> >>> in the
> >> >> >>> new user namespace. Here's a reproducer:
> >> >> >>>
> >> >> >>>  sudo unshare -U --map-root
> >> >> >>>  udevadm monitor -k
> >> >> >>>  # Now change to initial user namespace and e.g. do
> >> >> >>>  modprobe kvm
> >> >> >>>  # or
> >> >> >>>  rmmod kvm
> >> >> >>>
> >> >> >>> will allow the non-initial user namespace to retrieve all uevents 
> >> >> >>> from the
> >> >> >>> host. This seems very anecdotal given that in the general case user
> >> >> >>> namespaces do not see any uevents and also can't really do anything 
> >> >> >>> useful
> >> >> >>> with them.
> >> >> >>>
> >> >> >>> Additionally, it is now possible to send uevents from userspace. As 
> >> >> >>> such we
> >> >> >>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
> >> >> >>> namespace of the network namespace of the netlink socket) userspace 
> >> >> >>> process
> >> >> >>> make a decision what uevents should be sent.
> >> >> >>>
> >> >> >>> This makes me think that we should simply ensure that uevents for 
> >> >> >>> kobjects
> >> >> >>> that do not carry a namespace tag are *always* filtered by user 
> >> >> >>> namespace
> >> >> >>> in kobj_bcast_filter(). Specifically:
> >> >> >>> - If the owning user namespace of the uevent socket is not 
> >> >> >>> init_user_ns the
> >> >> >>>   event will always be filtered.
> >> >> >>> - If the network namespace the uevent socket belongs to was created 
> >> >> >>> in the
> >> >> >>>   initial user namespace but was opened from a non-initial user 
> >> >> >>> namespace
> >> >> >>>   the event will be filtered as well.
> >> >> >>> Put another way, uevents for kobjects not carrying a namespace tag 
> >> >> >>> are now
> 

Re: [PATCH net-next] netns: filter uevents correctly

2018-04-06 Thread Eric W. Biederman
Christian Brauner  writes:

> On Thu, Apr 05, 2018 at 10:59:49PM -0500, Eric W. Biederman wrote:
>> Christian Brauner  writes:
>> 
>> > On Thu, Apr 05, 2018 at 05:26:59PM +0300, Kirill Tkhai wrote:
>> >> On 05.04.2018 17:07, Christian Brauner wrote:
>> >> > On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
>> >> >> On 04.04.2018 22:48, Christian Brauner wrote:
>> >> >>> commit 07e98962fa77 ("kobject: Send hotplug events in all network 
>> >> >>> namespaces")
>> >> >>>
>> >> >>> enabled sending hotplug events into all network namespaces back in 
>> >> >>> 2010.
>> >> >>> Over time the set of uevents that get sent into all network 
>> >> >>> namespaces has
>> >> >>> shrunk. We have now reached the point where hotplug events for all 
>> >> >>> devices
>> >> >>> that carry a namespace tag are filtered according to that namespace.
>> >> >>>
>> >> >>> Specifically, they are filtered whenever the namespace tag of the 
>> >> >>> kobject
>> >> >>> does not match the namespace tag of the netlink socket. One example 
>> >> >>> are
>> >> >>> network devices. Uevents for network devices only show up in the 
>> >> >>> network
>> >> >>> namespaces these devices are moved to or created in.
>> >> >>>
>> >> >>> However, any uevent for a kobject that does not have a namespace tag
>> >> >>> associated with it will not be filtered and we will *try* to 
>> >> >>> broadcast it
>> >> >>> into all network namespaces.
>> >> >>>
>> >> >>> The original patchset was written in 2010 before user namespaces were 
>> >> >>> a
>> >> >>> thing. With the introduction of user namespaces sending out uevents 
>> >> >>> became
>> >> >>> partially isolated as they were filtered by user namespaces:
>> >> >>>
>> >> >>> net/netlink/af_netlink.c:do_one_broadcast()
>> >> >>>
>> >> >>> if (!net_eq(sock_net(sk), p->net)) {
>> >> >>> if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
>> >> >>> return;
>> >> >>>
>> >> >>> if (!peernet_has_id(sock_net(sk), p->net))
>> >> >>> return;
>> >> >>>
>> >> >>> if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
>> >> >>>  CAP_NET_BROADCAST))
>> >> >>> return;
>> >> >>> }
>> >> >>>
>> >> >>> The file_ns_capable() check will check whether the caller had
>> >> >>> CAP_NET_BROADCAST at the time of opening the netlink socket in the 
>> >> >>> user
>> >> >>> namespace of interest. This check is fine in general but seems 
>> >> >>> insufficient
>> >> >>> to me when paired with uevents. The reason is that devices always 
>> >> >>> belong to
>> >> >>> the initial user namespace so uevents for kobjects that do not carry a
>> >> >>> namespace tag should never be sent into another user namespace. This 
>> >> >>> has
>> >> >>> been the intention all along. But there's one case where this breaks,
>> >> >>> namely if a new user namespace is created by root on the host and an
>> >> >>> identity mapping is established between root on the host and root in 
>> >> >>> the
>> >> >>> new user namespace. Here's a reproducer:
>> >> >>>
>> >> >>>  sudo unshare -U --map-root
>> >> >>>  udevadm monitor -k
>> >> >>>  # Now change to initial user namespace and e.g. do
>> >> >>>  modprobe kvm
>> >> >>>  # or
>> >> >>>  rmmod kvm
>> >> >>>
>> >> >>> will allow the non-initial user namespace to retrieve all uevents 
>> >> >>> from the
>> >> >>> host. This seems very anecdotal given that in the general case user
>> >> >>> namespaces do not see any uevents and also can't really do anything 
>> >> >>> useful
>> >> >>> with them.
>> >> >>>
>> >> >>> Additionally, it is now possible to send uevents from userspace. As 
>> >> >>> such we
>> >> >>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
>> >> >>> namespace of the network namespace of the netlink socket) userspace 
>> >> >>> process
>> >> >>> make a decision what uevents should be sent.
>> >> >>>
>> >> >>> This makes me think that we should simply ensure that uevents for 
>> >> >>> kobjects
>> >> >>> that do not carry a namespace tag are *always* filtered by user 
>> >> >>> namespace
>> >> >>> in kobj_bcast_filter(). Specifically:
>> >> >>> - If the owning user namespace of the uevent socket is not 
>> >> >>> init_user_ns the
>> >> >>>   event will always be filtered.
>> >> >>> - If the network namespace the uevent socket belongs to was created 
>> >> >>> in the
>> >> >>>   initial user namespace but was opened from a non-initial user 
>> >> >>> namespace
>> >> >>>   the event will be filtered as well.
>> >> >>> Put another way, uevents for kobjects not carrying a namespace tag 
>> >> >>> are now
>> >> >>> always only sent to the initial user namespace. The regression 
>> >> >>> potential
>> >> >>> for this is near to non-existent since user namespaces can't really do
>> >> >>> anything with interesting devices.
>> >> >>>
>> >> >>> Signed-off-by: Christian Brauner 
>> 

Re: [PATCH net-next] netns: filter uevents correctly

2018-04-06 Thread Christian Brauner
On Thu, Apr 05, 2018 at 10:59:49PM -0500, Eric W. Biederman wrote:
> Christian Brauner  writes:
> 
> > On Thu, Apr 05, 2018 at 05:26:59PM +0300, Kirill Tkhai wrote:
> >> On 05.04.2018 17:07, Christian Brauner wrote:
> >> > On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
> >> >> On 04.04.2018 22:48, Christian Brauner wrote:
> >> >>> commit 07e98962fa77 ("kobject: Send hotplug events in all network 
> >> >>> namespaces")
> >> >>>
> >> >>> enabled sending hotplug events into all network namespaces back in 
> >> >>> 2010.
> >> >>> Over time the set of uevents that get sent into all network namespaces 
> >> >>> has
> >> >>> shrunk. We have now reached the point where hotplug events for all 
> >> >>> devices
> >> >>> that carry a namespace tag are filtered according to that namespace.
> >> >>>
> >> >>> Specifically, they are filtered whenever the namespace tag of the 
> >> >>> kobject
> >> >>> does not match the namespace tag of the netlink socket. One example are
> >> >>> network devices. Uevents for network devices only show up in the 
> >> >>> network
> >> >>> namespaces these devices are moved to or created in.
> >> >>>
> >> >>> However, any uevent for a kobject that does not have a namespace tag
> >> >>> associated with it will not be filtered and we will *try* to broadcast 
> >> >>> it
> >> >>> into all network namespaces.
> >> >>>
> >> >>> The original patchset was written in 2010 before user namespaces were a
> >> >>> thing. With the introduction of user namespaces sending out uevents 
> >> >>> became
> >> >>> partially isolated as they were filtered by user namespaces:
> >> >>>
> >> >>> net/netlink/af_netlink.c:do_one_broadcast()
> >> >>>
> >> >>> if (!net_eq(sock_net(sk), p->net)) {
> >> >>> if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> >> >>> return;
> >> >>>
> >> >>> if (!peernet_has_id(sock_net(sk), p->net))
> >> >>> return;
> >> >>>
> >> >>> if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> >> >>>  CAP_NET_BROADCAST))
> >> >>> return;
> >> >>> }
> >> >>>
> >> >>> The file_ns_capable() check will check whether the caller had
> >> >>> CAP_NET_BROADCAST at the time of opening the netlink socket in the user
> >> >>> namespace of interest. This check is fine in general but seems 
> >> >>> insufficient
> >> >>> to me when paired with uevents. The reason is that devices always 
> >> >>> belong to
> >> >>> the initial user namespace so uevents for kobjects that do not carry a
> >> >>> namespace tag should never be sent into another user namespace. This 
> >> >>> has
> >> >>> been the intention all along. But there's one case where this breaks,
> >> >>> namely if a new user namespace is created by root on the host and an
> >> >>> identity mapping is established between root on the host and root in 
> >> >>> the
> >> >>> new user namespace. Here's a reproducer:
> >> >>>
> >> >>>  sudo unshare -U --map-root
> >> >>>  udevadm monitor -k
> >> >>>  # Now change to initial user namespace and e.g. do
> >> >>>  modprobe kvm
> >> >>>  # or
> >> >>>  rmmod kvm
> >> >>>
> >> >>> will allow the non-initial user namespace to retrieve all uevents from 
> >> >>> the
> >> >>> host. This seems very anecdotal given that in the general case user
> >> >>> namespaces do not see any uevents and also can't really do anything 
> >> >>> useful
> >> >>> with them.
> >> >>>
> >> >>> Additionally, it is now possible to send uevents from userspace. As 
> >> >>> such we
> >> >>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
> >> >>> namespace of the network namespace of the netlink socket) userspace 
> >> >>> process
> >> >>> make a decision what uevents should be sent.
> >> >>>
> >> >>> This makes me think that we should simply ensure that uevents for 
> >> >>> kobjects
> >> >>> that do not carry a namespace tag are *always* filtered by user 
> >> >>> namespace
> >> >>> in kobj_bcast_filter(). Specifically:
> >> >>> - If the owning user namespace of the uevent socket is not 
> >> >>> init_user_ns the
> >> >>>   event will always be filtered.
> >> >>> - If the network namespace the uevent socket belongs to was created in 
> >> >>> the
> >> >>>   initial user namespace but was opened from a non-initial user 
> >> >>> namespace
> >> >>>   the event will be filtered as well.
> >> >>> Put another way, uevents for kobjects not carrying a namespace tag are 
> >> >>> now
> >> >>> always only sent to the initial user namespace. The regression 
> >> >>> potential
> >> >>> for this is near to non-existent since user namespaces can't really do
> >> >>> anything with interesting devices.
> >> >>>
> >> >>> Signed-off-by: Christian Brauner 
> >> >>> ---
> >> >>>  lib/kobject_uevent.c | 10 +-
> >> >>>  1 file changed, 9 insertions(+), 1 deletion(-)
> >> >>>
> >> >>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> >> >>> 

Re: [PATCH net-next] netns: filter uevents correctly

2018-04-05 Thread Eric W. Biederman
Christian Brauner  writes:

> On Thu, Apr 05, 2018 at 05:26:59PM +0300, Kirill Tkhai wrote:
>> On 05.04.2018 17:07, Christian Brauner wrote:
>> > On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
>> >> On 04.04.2018 22:48, Christian Brauner wrote:
>> >>> commit 07e98962fa77 ("kobject: Send hotplug events in all network 
>> >>> namespaces")
>> >>>
>> >>> enabled sending hotplug events into all network namespaces back in 2010.
>> >>> Over time the set of uevents that get sent into all network namespaces 
>> >>> has
>> >>> shrunk. We have now reached the point where hotplug events for all 
>> >>> devices
>> >>> that carry a namespace tag are filtered according to that namespace.
>> >>>
>> >>> Specifically, they are filtered whenever the namespace tag of the kobject
>> >>> does not match the namespace tag of the netlink socket. One example are
>> >>> network devices. Uevents for network devices only show up in the network
>> >>> namespaces these devices are moved to or created in.
>> >>>
>> >>> However, any uevent for a kobject that does not have a namespace tag
>> >>> associated with it will not be filtered and we will *try* to broadcast it
>> >>> into all network namespaces.
>> >>>
>> >>> The original patchset was written in 2010 before user namespaces were a
>> >>> thing. With the introduction of user namespaces sending out uevents 
>> >>> became
>> >>> partially isolated as they were filtered by user namespaces:
>> >>>
>> >>> net/netlink/af_netlink.c:do_one_broadcast()
>> >>>
>> >>> if (!net_eq(sock_net(sk), p->net)) {
>> >>> if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
>> >>> return;
>> >>>
>> >>> if (!peernet_has_id(sock_net(sk), p->net))
>> >>> return;
>> >>>
>> >>> if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
>> >>>  CAP_NET_BROADCAST))
>> >>> return;
>> >>> }
>> >>>
>> >>> The file_ns_capable() check will check whether the caller had
>> >>> CAP_NET_BROADCAST at the time of opening the netlink socket in the user
>> >>> namespace of interest. This check is fine in general but seems 
>> >>> insufficient
>> >>> to me when paired with uevents. The reason is that devices always belong 
>> >>> to
>> >>> the initial user namespace so uevents for kobjects that do not carry a
>> >>> namespace tag should never be sent into another user namespace. This has
>> >>> been the intention all along. But there's one case where this breaks,
>> >>> namely if a new user namespace is created by root on the host and an
>> >>> identity mapping is established between root on the host and root in the
>> >>> new user namespace. Here's a reproducer:
>> >>>
>> >>>  sudo unshare -U --map-root
>> >>>  udevadm monitor -k
>> >>>  # Now change to initial user namespace and e.g. do
>> >>>  modprobe kvm
>> >>>  # or
>> >>>  rmmod kvm
>> >>>
>> >>> will allow the non-initial user namespace to retrieve all uevents from 
>> >>> the
>> >>> host. This seems very anecdotal given that in the general case user
>> >>> namespaces do not see any uevents and also can't really do anything 
>> >>> useful
>> >>> with them.
>> >>>
>> >>> Additionally, it is now possible to send uevents from userspace. As such 
>> >>> we
>> >>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
>> >>> namespace of the network namespace of the netlink socket) userspace 
>> >>> process
>> >>> make a decision what uevents should be sent.
>> >>>
>> >>> This makes me think that we should simply ensure that uevents for 
>> >>> kobjects
>> >>> that do not carry a namespace tag are *always* filtered by user namespace
>> >>> in kobj_bcast_filter(). Specifically:
>> >>> - If the owning user namespace of the uevent socket is not init_user_ns 
>> >>> the
>> >>>   event will always be filtered.
>> >>> - If the network namespace the uevent socket belongs to was created in 
>> >>> the
>> >>>   initial user namespace but was opened from a non-initial user namespace
>> >>>   the event will be filtered as well.
>> >>> Put another way, uevents for kobjects not carrying a namespace tag are 
>> >>> now
>> >>> always only sent to the initial user namespace. The regression potential
>> >>> for this is near to non-existent since user namespaces can't really do
>> >>> anything with interesting devices.
>> >>>
>> >>> Signed-off-by: Christian Brauner 
>> >>> ---
>> >>>  lib/kobject_uevent.c | 10 +-
>> >>>  1 file changed, 9 insertions(+), 1 deletion(-)
>> >>>
>> >>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
>> >>> index 15ea216a67ce..cb98cddb6e3b 100644
>> >>> --- a/lib/kobject_uevent.c
>> >>> +++ b/lib/kobject_uevent.c
>> >>> @@ -251,7 +251,15 @@ static int kobj_bcast_filter(struct sock *dsk, 
>> >>> struct sk_buff *skb, void *data)
>> >>>  return sock_ns != ns;
>> >>>  }
>> >>>  
>> >>> -return 0;
>> >>> +/*
>> >>> + * 

Re: [PATCH net-next] netns: filter uevents correctly

2018-04-05 Thread Christian Brauner
On Thu, Apr 05, 2018 at 05:26:59PM +0300, Kirill Tkhai wrote:
> On 05.04.2018 17:07, Christian Brauner wrote:
> > On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
> >> On 04.04.2018 22:48, Christian Brauner wrote:
> >>> commit 07e98962fa77 ("kobject: Send hotplug events in all network 
> >>> namespaces")
> >>>
> >>> enabled sending hotplug events into all network namespaces back in 2010.
> >>> Over time the set of uevents that get sent into all network namespaces has
> >>> shrunk. We have now reached the point where hotplug events for all devices
> >>> that carry a namespace tag are filtered according to that namespace.
> >>>
> >>> Specifically, they are filtered whenever the namespace tag of the kobject
> >>> does not match the namespace tag of the netlink socket. One example are
> >>> network devices. Uevents for network devices only show up in the network
> >>> namespaces these devices are moved to or created in.
> >>>
> >>> However, any uevent for a kobject that does not have a namespace tag
> >>> associated with it will not be filtered and we will *try* to broadcast it
> >>> into all network namespaces.
> >>>
> >>> The original patchset was written in 2010 before user namespaces were a
> >>> thing. With the introduction of user namespaces sending out uevents became
> >>> partially isolated as they were filtered by user namespaces:
> >>>
> >>> net/netlink/af_netlink.c:do_one_broadcast()
> >>>
> >>> if (!net_eq(sock_net(sk), p->net)) {
> >>> if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> >>> return;
> >>>
> >>> if (!peernet_has_id(sock_net(sk), p->net))
> >>> return;
> >>>
> >>> if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> >>>  CAP_NET_BROADCAST))
> >>> return;
> >>> }
> >>>
> >>> The file_ns_capable() check will check whether the caller had
> >>> CAP_NET_BROADCAST at the time of opening the netlink socket in the user
> >>> namespace of interest. This check is fine in general but seems 
> >>> insufficient
> >>> to me when paired with uevents. The reason is that devices always belong 
> >>> to
> >>> the initial user namespace so uevents for kobjects that do not carry a
> >>> namespace tag should never be sent into another user namespace. This has
> >>> been the intention all along. But there's one case where this breaks,
> >>> namely if a new user namespace is created by root on the host and an
> >>> identity mapping is established between root on the host and root in the
> >>> new user namespace. Here's a reproducer:
> >>>
> >>>  sudo unshare -U --map-root
> >>>  udevadm monitor -k
> >>>  # Now change to initial user namespace and e.g. do
> >>>  modprobe kvm
> >>>  # or
> >>>  rmmod kvm
> >>>
> >>> will allow the non-initial user namespace to retrieve all uevents from the
> >>> host. This seems very anecdotal given that in the general case user
> >>> namespaces do not see any uevents and also can't really do anything useful
> >>> with them.
> >>>
> >>> Additionally, it is now possible to send uevents from userspace. As such 
> >>> we
> >>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
> >>> namespace of the network namespace of the netlink socket) userspace 
> >>> process
> >>> make a decision what uevents should be sent.
> >>>
> >>> This makes me think that we should simply ensure that uevents for kobjects
> >>> that do not carry a namespace tag are *always* filtered by user namespace
> >>> in kobj_bcast_filter(). Specifically:
> >>> - If the owning user namespace of the uevent socket is not init_user_ns 
> >>> the
> >>>   event will always be filtered.
> >>> - If the network namespace the uevent socket belongs to was created in the
> >>>   initial user namespace but was opened from a non-initial user namespace
> >>>   the event will be filtered as well.
> >>> Put another way, uevents for kobjects not carrying a namespace tag are now
> >>> always only sent to the initial user namespace. The regression potential
> >>> for this is near to non-existent since user namespaces can't really do
> >>> anything with interesting devices.
> >>>
> >>> Signed-off-by: Christian Brauner 
> >>> ---
> >>>  lib/kobject_uevent.c | 10 +-
> >>>  1 file changed, 9 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> >>> index 15ea216a67ce..cb98cddb6e3b 100644
> >>> --- a/lib/kobject_uevent.c
> >>> +++ b/lib/kobject_uevent.c
> >>> @@ -251,7 +251,15 @@ static int kobj_bcast_filter(struct sock *dsk, 
> >>> struct sk_buff *skb, void *data)
> >>>   return sock_ns != ns;
> >>>   }
> >>>  
> >>> - return 0;
> >>> + /*
> >>> +  * The kobject does not carry a namespace tag so filter by user
> >>> +  * namespace below.
> >>> +  */
> >>> + if (sock_net(dsk)->user_ns != &init_user_ns)
> >>> + return 1;
> >>> +
> >>> + /* Check if socket was opened from non-initial user namespace. */
> >>> + 

Re: [PATCH net-next] netns: filter uevents correctly

2018-04-05 Thread Kirill Tkhai
On 05.04.2018 17:07, Christian Brauner wrote:
> On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
>> On 04.04.2018 22:48, Christian Brauner wrote:
>>> commit 07e98962fa77 ("kobject: Send hotplug events in all network 
>>> namespaces")
>>>
>>> enabled sending hotplug events into all network namespaces back in 2010.
>>> Over time the set of uevents that get sent into all network namespaces has
>>> shrunk. We have now reached the point where hotplug events for all devices
>>> that carry a namespace tag are filtered according to that namespace.
>>>
>>> Specifically, they are filtered whenever the namespace tag of the kobject
>>> does not match the namespace tag of the netlink socket. One example are
>>> network devices. Uevents for network devices only show up in the network
>>> namespaces these devices are moved to or created in.
>>>
>>> However, any uevent for a kobject that does not have a namespace tag
>>> associated with it will not be filtered and we will *try* to broadcast it
>>> into all network namespaces.
>>>
>>> The original patchset was written in 2010 before user namespaces were a
>>> thing. With the introduction of user namespaces sending out uevents became
>>> partially isolated as they were filtered by user namespaces:
>>>
>>> net/netlink/af_netlink.c:do_one_broadcast()
>>>
>>> if (!net_eq(sock_net(sk), p->net)) {
>>> if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
>>> return;
>>>
>>> if (!peernet_has_id(sock_net(sk), p->net))
>>> return;
>>>
>>> if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
>>>  CAP_NET_BROADCAST))
>>> return;
>>> }
>>>
>>> The file_ns_capable() check will check whether the caller had
>>> CAP_NET_BROADCAST at the time of opening the netlink socket in the user
>>> namespace of interest. This check is fine in general but seems insufficient
>>> to me when paired with uevents. The reason is that devices always belong to
>>> the initial user namespace so uevents for kobjects that do not carry a
>>> namespace tag should never be sent into another user namespace. This has
>>> been the intention all along. But there's one case where this breaks,
>>> namely if a new user namespace is created by root on the host and an
>>> identity mapping is established between root on the host and root in the
>>> new user namespace. Here's a reproducer:
>>>
>>>  sudo unshare -U --map-root
>>>  udevadm monitor -k
>>>  # Now change to initial user namespace and e.g. do
>>>  modprobe kvm
>>>  # or
>>>  rmmod kvm
>>>
>>> will allow the non-initial user namespace to retrieve all uevents from the
>>> host. This seems very anecdotal given that in the general case user
>>> namespaces do not see any uevents and also can't really do anything useful
>>> with them.
>>>
>>> Additionally, it is now possible to send uevents from userspace. As such we
>>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
>>> namespace of the network namespace of the netlink socket) userspace process
>>> make a decision what uevents should be sent.
>>>
>>> This makes me think that we should simply ensure that uevents for kobjects
>>> that do not carry a namespace tag are *always* filtered by user namespace
>>> in kobj_bcast_filter(). Specifically:
>>> - If the owning user namespace of the uevent socket is not init_user_ns the
>>>   event will always be filtered.
>>> - If the network namespace the uevent socket belongs to was created in the
>>>   initial user namespace but was opened from a non-initial user namespace
>>>   the event will be filtered as well.
>>> Put another way, uevents for kobjects not carrying a namespace tag are now
>>> always only sent to the initial user namespace. The regression potential
>>> for this is near to non-existent since user namespaces can't really do
>>> anything with interesting devices.
>>>
>>> Signed-off-by: Christian Brauner 
>>> ---
>>>  lib/kobject_uevent.c | 10 +-
>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
>>> index 15ea216a67ce..cb98cddb6e3b 100644
>>> --- a/lib/kobject_uevent.c
>>> +++ b/lib/kobject_uevent.c
>>> @@ -251,7 +251,15 @@ static int kobj_bcast_filter(struct sock *dsk, struct 
>>> sk_buff *skb, void *data)
>>> return sock_ns != ns;
>>> }
>>>  
>>> -   return 0;
>>> +   /*
>>> +* The kobject does not carry a namespace tag so filter by user
>>> +* namespace below.
>>> +*/
>>> +   if (sock_net(dsk)->user_ns != &init_user_ns)
>>> +   return 1;
>>> +
>>> +   /* Check if socket was opened from non-initial user namespace. */
>>> +   return sk_user_ns(dsk) != &init_user_ns;
>>>  }
>>>  #endif
>>
>> So, this prohibits to listen events of all devices except network-related
>> in containers? If it's so, I don't think it's a good solution. Uevents is not
> 
> No, this is not correct: As it is right now *without my 

Re: [PATCH net-next] netns: filter uevents correctly

2018-04-05 Thread Christian Brauner
On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
> On 04.04.2018 22:48, Christian Brauner wrote:
> > commit 07e98962fa77 ("kobject: Send hotplug events in all network 
> > namespaces")
> > 
> > enabled sending hotplug events into all network namespaces back in 2010.
> > Over time the set of uevents that get sent into all network namespaces has
> > shrunk. We have now reached the point where hotplug events for all devices
> > that carry a namespace tag are filtered according to that namespace.
> > 
> > Specifically, they are filtered whenever the namespace tag of the kobject
> > does not match the namespace tag of the netlink socket. One example are
> > network devices. Uevents for network devices only show up in the network
> > namespaces these devices are moved to or created in.
> > 
> > However, any uevent for a kobject that does not have a namespace tag
> > associated with it will not be filtered and we will *try* to broadcast it
> > into all network namespaces.
> > 
> > The original patchset was written in 2010 before user namespaces were a
> > thing. With the introduction of user namespaces sending out uevents became
> > partially isolated as they were filtered by user namespaces:
> > 
> > net/netlink/af_netlink.c:do_one_broadcast()
> > 
> > if (!net_eq(sock_net(sk), p->net)) {
> > if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> > return;
> > 
> > if (!peernet_has_id(sock_net(sk), p->net))
> > return;
> > 
> > if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> >  CAP_NET_BROADCAST))
> > return;
> > }
> > 
> > The file_ns_capable() check will check whether the caller had
> > CAP_NET_BROADCAST at the time of opening the netlink socket in the user
> > namespace of interest. This check is fine in general but seems insufficient
> > to me when paired with uevents. The reason is that devices always belong to
> > the initial user namespace so uevents for kobjects that do not carry a
> > namespace tag should never be sent into another user namespace. This has
> > been the intention all along. But there's one case where this breaks,
> > namely if a new user namespace is created by root on the host and an
> > identity mapping is established between root on the host and root in the
> > new user namespace. Here's a reproducer:
> > 
> >  sudo unshare -U --map-root
> >  udevadm monitor -k
> >  # Now change to initial user namespace and e.g. do
> >  modprobe kvm
> >  # or
> >  rmmod kvm
> > 
> > will allow the non-initial user namespace to retrieve all uevents from the
> > host. This seems very anecdotal given that in the general case user
> > namespaces do not see any uevents and also can't really do anything useful
> > with them.
> > 
> > Additionally, it is now possible to send uevents from userspace. As such we
> > can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
> > namespace of the network namespace of the netlink socket) userspace process
> > make a decision what uevents should be sent.
> > 
> > This makes me think that we should simply ensure that uevents for kobjects
> > that do not carry a namespace tag are *always* filtered by user namespace
> > in kobj_bcast_filter(). Specifically:
> > - If the owning user namespace of the uevent socket is not init_user_ns the
> >   event will always be filtered.
> > - If the network namespace the uevent socket belongs to was created in the
> >   initial user namespace but was opened from a non-initial user namespace
> >   the event will be filtered as well.
> > Put another way, uevents for kobjects not carrying a namespace tag are now
> > always only sent to the initial user namespace. The regression potential
> > for this is near to non-existent since user namespaces can't really do
> > anything with interesting devices.
> > 
> > Signed-off-by: Christian Brauner 
> > ---
> >  lib/kobject_uevent.c | 10 +-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> > index 15ea216a67ce..cb98cddb6e3b 100644
> > --- a/lib/kobject_uevent.c
> > +++ b/lib/kobject_uevent.c
> > @@ -251,7 +251,15 @@ static int kobj_bcast_filter(struct sock *dsk, struct 
> > sk_buff *skb, void *data)
> > return sock_ns != ns;
> > }
> >  
> > -   return 0;
> > +   /*
> > +* The kobject does not carry a namespace tag so filter by user
> > +* namespace below.
> > +*/
> > +   if (sock_net(dsk)->user_ns != &init_user_ns)
> > +   return 1;
> > +
> > +   /* Check if socket was opened from non-initial user namespace. */
> > +   return sk_user_ns(dsk) != &init_user_ns;
> >  }
> >  #endif
> 
> So, this prohibits to listen events of all devices except network-related
> in containers? If it's so, I don't think it's a good solution. Uevents is not

No, this is not correct: As it is right now *without my patch* no
non-initial user namespace is 

Re: [PATCH net-next] netns: filter uevents correctly

2018-04-05 Thread Kirill Tkhai
On 04.04.2018 22:48, Christian Brauner wrote:
> commit 07e98962fa77 ("kobject: Send hotplug events in all network namespaces")
> 
> enabled sending hotplug events into all network namespaces back in 2010.
> Over time the set of uevents that get sent into all network namespaces has
> shrunk. We have now reached the point where hotplug events for all devices
> that carry a namespace tag are filtered according to that namespace.
> 
> Specifically, they are filtered whenever the namespace tag of the kobject
> does not match the namespace tag of the netlink socket. One example are
> network devices. Uevents for network devices only show up in the network
> namespaces these devices are moved to or created in.
> 
> However, any uevent for a kobject that does not have a namespace tag
> associated with it will not be filtered and we will *try* to broadcast it
> into all network namespaces.
> 
> The original patchset was written in 2010 before user namespaces were a
> thing. With the introduction of user namespaces sending out uevents became
> partially isolated as they were filtered by user namespaces:
> 
> net/netlink/af_netlink.c:do_one_broadcast()
> 
> if (!net_eq(sock_net(sk), p->net)) {
> if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> return;
> 
> if (!peernet_has_id(sock_net(sk), p->net))
> return;
> 
> if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
>  CAP_NET_BROADCAST))
> return;
> }
> 
> The file_ns_capable() check will check whether the caller had
> CAP_NET_BROADCAST at the time of opening the netlink socket in the user
> namespace of interest. This check is fine in general but seems insufficient
> to me when paired with uevents. The reason is that devices always belong to
> the initial user namespace so uevents for kobjects that do not carry a
> namespace tag should never be sent into another user namespace. This has
> been the intention all along. But there's one case where this breaks,
> namely if a new user namespace is created by root on the host and an
> identity mapping is established between root on the host and root in the
> new user namespace. Here's a reproducer:
> 
>  sudo unshare -U --map-root
>  udevadm monitor -k
>  # Now change to initial user namespace and e.g. do
>  modprobe kvm
>  # or
>  rmmod kvm
> 
> will allow the non-initial user namespace to retrieve all uevents from the
> host. This seems very anecdotal given that in the general case user
> namespaces do not see any uevents and also can't really do anything useful
> with them.
> 
> Additionally, it is now possible to send uevents from userspace. As such we
> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
> namespace of the network namespace of the netlink socket) userspace process
> make a decision what uevents should be sent.
> 
> This makes me think that we should simply ensure that uevents for kobjects
> that do not carry a namespace tag are *always* filtered by user namespace
> in kobj_bcast_filter(). Specifically:
> - If the owning user namespace of the uevent socket is not init_user_ns the
>   event will always be filtered.
> - If the network namespace the uevent socket belongs to was created in the
>   initial user namespace but was opened from a non-initial user namespace
>   the event will be filtered as well.
> Put another way, uevents for kobjects not carrying a namespace tag are now
> always only sent to the initial user namespace. The regression potential
> for this is near to non-existent since user namespaces can't really do
> anything with interesting devices.
> 
> Signed-off-by: Christian Brauner 
> ---
>  lib/kobject_uevent.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> index 15ea216a67ce..cb98cddb6e3b 100644
> --- a/lib/kobject_uevent.c
> +++ b/lib/kobject_uevent.c
> @@ -251,7 +251,15 @@ static int kobj_bcast_filter(struct sock *dsk, struct 
> sk_buff *skb, void *data)
>   return sock_ns != ns;
>   }
>  
> - return 0;
> + /*
> +  * The kobject does not carry a namespace tag so filter by user
> +  * namespace below.
> +  */
> + if (sock_net(dsk)->user_ns != &init_user_ns)
> + return 1;
> +
> + /* Check if socket was opened from non-initial user namespace. */
> + return sk_user_ns(dsk) != &init_user_ns;
>  }
>  #endif

So, this prohibits listening to events of all devices except network-related
ones in containers? If so, I don't think it's a good solution. Uevents are not
a net-devices-only interface; they are used for all devices in the system.
People may want to delegate block devices to a nested user_ns, for example.

It would be better to think about something like a generic
"device <-> user_ns" connection, and to filter events by this user_ns.

Thanks,
Kirill


[PATCH net-next] netns: filter uevents correctly

2018-04-04 Thread Christian Brauner
commit 07e98962fa77 ("kobject: Send hotplug events in all network namespaces")

enabled sending hotplug events into all network namespaces back in 2010.
Over time the set of uevents that get sent into all network namespaces has
shrunk. We have now reached the point where hotplug events for all devices
that carry a namespace tag are filtered according to that namespace.

Specifically, they are filtered whenever the namespace tag of the kobject
does not match the namespace tag of the netlink socket. Network devices are
one example: uevents for network devices only show up in the network
namespaces these devices are moved to or created in.

However, any uevent for a kobject that does not have a namespace tag
associated with it will not be filtered and we will *try* to broadcast it
into all network namespaces.

The original patchset was written in 2010 before user namespaces were a
thing. With the introduction of user namespaces sending out uevents became
partially isolated as they were filtered by user namespaces:

net/netlink/af_netlink.c:do_one_broadcast()

	if (!net_eq(sock_net(sk), p->net)) {
		if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
			return;

		if (!peernet_has_id(sock_net(sk), p->net))
			return;

		if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
				     CAP_NET_BROADCAST))
			return;
	}

The file_ns_capable() check verifies that whoever opened the netlink socket
had CAP_NET_BROADCAST in the user namespace of interest at the time of
opening. This check is fine in general but seems insufficient
to me when paired with uevents. The reason is that devices always belong to
the initial user namespace so uevents for kobjects that do not carry a
namespace tag should never be sent into another user namespace. This has
been the intention all along. But there's one case where this breaks,
namely if a new user namespace is created by root on the host and an
identity mapping is established between root on the host and root in the
new user namespace. Here's a reproducer:

 sudo unshare -U --map-root
 udevadm monitor -k
 # Now change to initial user namespace and e.g. do
 modprobe kvm
 # or
 rmmod kvm

This allows the non-initial user namespace to receive all uevents from the
host. This seems very anecdotal given that in the general case user
namespaces do not see any uevents and also can't really do anything useful
with them.
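
For readers who want to see what the listening side of the reproducer
amounts to, here is a minimal userspace sketch. It assumes that kernel
uevents arrive on netlink multicast group 1 (the group udev subscribes to
for kernel events) and that each message starts with an "ACTION@DEVPATH"
header followed by NUL-separated KEY=VALUE pairs. The helper names
open_uevent_socket(), uevent_devpath() and dump_uevents() are made up for
illustration; this is not udevadm's implementation.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Assumption: multicast group 1 carries events sent by the kernel. */
#define UEVENT_GROUP_KERNEL 1

/* Open a uevent socket subscribed to kernel events; returns fd or -1. */
int open_uevent_socket(void)
{
	struct sockaddr_nl addr;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_KOBJECT_UEVENT);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = UEVENT_GROUP_KERNEL;
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Return a pointer to DEVPATH inside an "ACTION@DEVPATH" header, or NULL
 * if the header is not NUL-terminated within len or has no action part.
 */
const char *uevent_devpath(const char *msg, size_t len)
{
	const char *end = memchr(msg, '\0', len);
	const char *at;

	if (!end)
		return NULL;
	at = memchr(msg, '@', (size_t)(end - msg));
	if (!at || at == msg)
		return NULL;
	return at + 1;
}

/* Print the header of up to `count` uevents received on fd. */
void dump_uevents(int fd, int count)
{
	char buf[8192];

	while (count-- > 0) {
		ssize_t n = recv(fd, buf, sizeof(buf) - 1, 0);

		if (n <= 0)
			break;
		buf[n] = '\0';
		if (uevent_devpath(buf, (size_t)n + 1))
			printf("%s\n", buf);
	}
}
```

Calling dump_uevents(open_uevent_socket(), 10) inside the unshared user
namespace and then loading a module from the host should show whether
host uevents are delivered there.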

Additionally, it is now possible to send uevents from userspace. As such we
can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
namespace of the network namespace of the netlink socket) userspace process
decide which uevents should be sent.

This makes me think that we should simply ensure that uevents for kobjects
that do not carry a namespace tag are *always* filtered by user namespace
in kobj_bcast_filter(). Specifically:
- If the owning user namespace of the uevent socket's network namespace is
  not init_user_ns, the event will always be filtered.
- If the network namespace the uevent socket belongs to was created in the
  initial user namespace but the socket was opened from a non-initial user
  namespace, the event will be filtered as well.
Put another way, uevents for kobjects not carrying a namespace tag are now
only ever sent to the initial user namespace. The regression potential
for this is close to non-existent since user namespaces can't really do
anything with interesting devices.
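
The intended decision can be modelled in plain userspace C. This is only
a sketch of the rule described above, not kernel code: struct user_ns,
struct net_ns, struct uevent_sock and uevent_filtered() are hypothetical
stand-ins, and the namespace tag is simplified to a pointer to a network
namespace (NULL meaning the kobject carries no tag).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not kernel API. */
struct user_ns { int id; };
struct net_ns { const struct user_ns *owner; };
struct uevent_sock {
	const struct net_ns *net;     /* netns the socket lives in */
	const struct user_ns *opener; /* userns the socket was opened from */
};

/* Models the initial user namespace. */
static const struct user_ns init_userns = { 0 };

/*
 * Return true if the event must be filtered (not delivered) for sk.
 * tag is the kobject's namespace tag, or NULL for an untagged kobject.
 */
bool uevent_filtered(const struct uevent_sock *sk, const struct net_ns *tag)
{
	/* Tagged kobject: deliver only into the matching namespace. */
	if (tag)
		return sk->net != tag;

	/* Untagged: the socket's netns must be owned by the initial userns. */
	if (sk->net->owner != &init_userns)
		return true;

	/* Untagged: the socket must also have been opened from it. */
	return sk->opener != &init_userns;
}
```

In this model the reproducer corresponds to a socket whose network
namespace is owned by the initial user namespace but whose opener is not,
which is the case the second check is meant to catch.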

Signed-off-by: Christian Brauner 
---
 lib/kobject_uevent.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 15ea216a67ce..cb98cddb6e3b 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -251,7 +251,15 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
 		return sock_ns != ns;
 	}
 
-	return 0;
+	/*
+	 * The kobject does not carry a namespace tag so filter by user
+	 * namespace below.
+	 */
+	if (sock_net(dsk)->user_ns != &init_user_ns)
+		return 1;
+
+	/* Check if socket was opened from non-initial user namespace. */
+	return sk_user_ns(dsk) != &init_user_ns;
 }
 #endif
 
-- 
2.15.1