Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-14 Thread Eric W. Biederman
Vivek Goyal  writes:

> On Wed, Mar 12, 2014 at 07:12:25PM -0700, Andy Lutomirski wrote:
>
>> I can think of at least three other ways to do this.
>> 
>> 1. Fix Docker to use user namespaces and use the uid of the requesting
>> process via SCM_CREDENTIALS.
>
> Using user namespaces sounds like the right way to do it (atleast
> conceptually). But I think hurdle here is that people are not convinced
> yet that user namespaces are secure and work well. IOW, some people
> don't seem to think that user namespaces are ready yet.

If the problem is user namespace immaturity, then patches or bug reports
need to be sent for user namespaces.

Containers with user namespaces (however immature they are) are much
more secure than running containers with uid == 0 processes inside
of them.  User namespaces do considerably reduce the attack surface of
what uid == 0 can do.

> I guess that's the reason people are looking for other ways to
> achieve their goal.

It seems strange to work around a feature that is 99% of the way to
solving their problem with more kernel patches.

Eric


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-14 Thread Eric W. Biederman
Vivek Goyal  writes:

> On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
>> On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal  wrote:
>> > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
>> >
>> > [..]
>> >> >> 2. Docker is a container system, so use the "container" (aka
>> >> >> namespace) APIs.  There are probably several clever things that could
>> >> >> be done with /proc//ns.
>> >> >
>> >> > pid is racy, if it weren't I would simply go straight
>> >> > to /proc//cgroups ...
>> >>
>> >> How about:
>> >>
>> >> open("/proc/self/ns/ipc", O_RDONLY);
>> >> send the result over SCM_RIGHTS?
>> >
>> > As I don't know I will ask. So what will server now do with this file
>> > descriptor of client's ipc namespace.
>> >
>> > IOW, what information/identifier does it contain which can be
>> > used to map to pre-configrued per container/per namespace policies.
>> 
>> Inode number, which will match that assigned to the container at runtime.
>> 
>
> But what would I do with this inode number. I am assuming this is
> generated dynamically when respective namespace was created. To me
> this is like assigning a pid dynamically and one does not create
> policies in user space based on pid. Similarly I will not be able
> to create policies based on an inode number which is generated
> dynamically.
>
> For it to be useful, it should map to something more static which
> user space understands.

But the mapping can be done in userspace.  stat all of the namespaces
you care about, get their inode numbers, and then do a lookup.
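
A minimal userspace sketch of that lookup (the container init pids and names
below are made up for illustration; error handling is abbreviated):

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

struct ns_map { ino_t ino; const char *container; };

/* Build an inode -> container map by stat'ing the ipc namespace of each
 * container's init process; pids[] and names[] would come from the
 * container manager and are hypothetical here. */
static int build_map(struct ns_map *map, const pid_t *pids,
                     const char **names, int n)
{
    char path[64];
    struct stat st;

    for (int i = 0; i < n; i++) {
        snprintf(path, sizeof(path), "/proc/%d/ns/ipc", (int)pids[i]);
        if (stat(path, &st) < 0)
            return -1;
        map[i].ino = st.st_ino;
        map[i].container = names[i];
    }
    return 0;
}

/* Map a namespace inode received from a peer back to a container name. */
static const char *lookup_container(const struct ns_map *map, int n, ino_t ino)
{
    for (int i = 0; i < n; i++)
        if (map[i].ino == ino)
            return map[i].container;
    return NULL;
}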

Hard-coding string-based names in the kernel the way cgroups does is
really pretty terrible: it seriously limits the flexibility of the
API and, so far, breaks nested containers.

Eric


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Andy Lutomirski
On Thu, Mar 13, 2014 at 1:06 PM, Vivek Goyal  wrote:
> On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
>> On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal  wrote:
>> > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
>> >
>> > [..]
>> >> >> 2. Docker is a container system, so use the "container" (aka
>> >> >> namespace) APIs.  There are probably several clever things that could
>> >> >> be done with /proc//ns.
>> >> >
>> >> > pid is racy, if it weren't I would simply go straight
>> >> > to /proc//cgroups ...
>> >>
>> >> How about:
>> >>
>> >> open("/proc/self/ns/ipc", O_RDONLY);
>> >> send the result over SCM_RIGHTS?
>> >
>> > As I don't know I will ask. So what will server now do with this file
>> > descriptor of client's ipc namespace.
>> >
>> > IOW, what information/identifier does it contain which can be
>> > used to map to pre-configrued per container/per namespace policies.
>>
>> Inode number, which will match that assigned to the container at runtime.
>>
>
> But what would I do with this inode number. I am assuming this is
> generated dynamically when respective namespace was created. To me
> this is like assigning a pid dynamically and one does not create
> policies in user space based on pid. Similarly I will not be able
> to create policies based on an inode number which is generated
> dynamically.
>
> For it to be useful, it should map to something more static which
> user space understands.

Like what?  I imagine that, at best, sssd will be hardcoding some
understanding of Docker's cgroup names.  As an alternative, it could
ask Docker for a uid or an inode number of something else -- it's
hardcoding an understanding of Docker anyway.  And Docker needs to
cooperate regardless, since otherwise it could change its cgroup
naming or stop using cgroups entirely.

--Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Thu, Mar 13, 2014 at 04:17:55PM -0400, Vivek Goyal wrote:
> On Thu, Mar 13, 2014 at 04:06:49PM -0400, Vivek Goyal wrote:
> > On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
> > > On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal  wrote:
> > > > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
> > > >
> > > > [..]
> > > >> >> 2. Docker is a container system, so use the "container" (aka
> > > >> >> namespace) APIs.  There are probably several clever things that 
> > > >> >> could
> > > >> >> be done with /proc//ns.
> > > >> >
> > > >> > pid is racy, if it weren't I would simply go straight
> > > >> > to /proc//cgroups ...
> > > >>
> > > >> How about:
> > > >>
> > > >> open("/proc/self/ns/ipc", O_RDONLY);
> > > >> send the result over SCM_RIGHTS?
> > > >
> > > > As I don't know I will ask. So what will server now do with this file
> > > > descriptor of client's ipc namespace.
> > > >
> > > > IOW, what information/identifier does it contain which can be
> > > > used to map to pre-configrued per container/per namespace policies.
> > > 
> > > Inode number, which will match that assigned to the container at runtime.
> > > 
> > 
> > But what would I do with this inode number. I am assuming this is
> > generated dynamically when respective namespace was created. To me
> > this is like assigning a pid dynamically and one does not create
> > policies in user space based on pid. Similarly I will not be able
> > to create policies based on an inode number which is generated
> > dynamically.
> > 
> > For it to be useful, it should map to something more static which
> > user space understands.
> 
> Or could we do following.
> 
> open("/proc/self/cgroup", O_RDONLY);
> send the result over SCM_RIGHTS

I guess that would not work. A client could simply create a file,
fake the cgroup information, and send that fd.

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Thu, Mar 13, 2014 at 04:06:49PM -0400, Vivek Goyal wrote:
> On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
> > On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal  wrote:
> > > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
> > >
> > > [..]
> > >> >> 2. Docker is a container system, so use the "container" (aka
> > >> >> namespace) APIs.  There are probably several clever things that could
> > >> >> be done with /proc//ns.
> > >> >
> > >> > pid is racy, if it weren't I would simply go straight
> > >> > to /proc//cgroups ...
> > >>
> > >> How about:
> > >>
> > >> open("/proc/self/ns/ipc", O_RDONLY);
> > >> send the result over SCM_RIGHTS?
> > >
> > > As I don't know I will ask. So what will server now do with this file
> > > descriptor of client's ipc namespace.
> > >
> > > IOW, what information/identifier does it contain which can be
> > > used to map to pre-configrued per container/per namespace policies.
> > 
> > Inode number, which will match that assigned to the container at runtime.
> > 
> 
> But what would I do with this inode number. I am assuming this is
> generated dynamically when respective namespace was created. To me
> this is like assigning a pid dynamically and one does not create
> policies in user space based on pid. Similarly I will not be able
> to create policies based on an inode number which is generated
> dynamically.
> 
> For it to be useful, it should map to something more static which
> user space understands.

Or could we do the following.

open("/proc/self/cgroup", O_RDONLY);
send the result over SCM_RIGHTS

But this requires client modification.
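
A rough sketch of what that modified client would do; send_fd() here is a
hypothetical stand-in for the usual sendmsg()/SCM_RIGHTS helper, and, as
noted elsewhere in the thread, the server cannot tell this fd apart from one
referring to a file the client fabricated:

#include <fcntl.h>
#include <unistd.h>

int send_fd(int sock, int fd);   /* hypothetical helper: sendmsg() + SCM_RIGHTS */

/* Client side: open our own cgroup listing and hand the fd to the server.
 * Nothing proves this fd really refers to the sender's /proc/<pid>/cgroup
 * rather than to an ordinary file with faked contents. */
int send_own_cgroup(int sock)
{
    int fd = open("/proc/self/cgroup", O_RDONLY);
    if (fd < 0)
        return -1;
    int ret = send_fd(sock, fd);
    close(fd);
    return ret;
}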

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
> On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal  wrote:
> > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
> >
> > [..]
> >> >> 2. Docker is a container system, so use the "container" (aka
> >> >> namespace) APIs.  There are probably several clever things that could
> >> >> be done with /proc//ns.
> >> >
> >> > pid is racy, if it weren't I would simply go straight
> >> > to /proc//cgroups ...
> >>
> >> How about:
> >>
> >> open("/proc/self/ns/ipc", O_RDONLY);
> >> send the result over SCM_RIGHTS?
> >
> > As I don't know I will ask. So what will server now do with this file
> > descriptor of client's ipc namespace.
> >
> > IOW, what information/identifier does it contain which can be
> > used to map to pre-configrued per container/per namespace policies.
> 
> Inode number, which will match that assigned to the container at runtime.
> 

But what would I do with this inode number? I am assuming it is
generated dynamically when the respective namespace is created. To me
this is like assigning a pid dynamically, and one does not create
policies in user space based on a pid. Similarly, I will not be able
to create policies based on an inode number which is generated
dynamically.

For it to be useful, it should map to something more static which
user space understands.

Thanks
Vivek 


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Andy Lutomirski
On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal  wrote:
> On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
>
> [..]
>> >> 2. Docker is a container system, so use the "container" (aka
>> >> namespace) APIs.  There are probably several clever things that could
>> >> be done with /proc//ns.
>> >
>> > pid is racy, if it weren't I would simply go straight
>> > to /proc//cgroups ...
>>
>> How about:
>>
>> open("/proc/self/ns/ipc", O_RDONLY);
>> send the result over SCM_RIGHTS?
>
> As I don't know I will ask. So what will server now do with this file
> descriptor of client's ipc namespace.
>
> IOW, what information/identifier does it contain which can be
> used to map to pre-configrued per container/per namespace policies.

Inode number, which will match that assigned to the container at runtime.

(I'm not sure this is a great idea -- there's no convention that "I
have an fd for a namespace" means "I'm a daemon in that namespace".)
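
For what it's worth, the receiving side of that scheme is a few lines of
standard fd-passing; a sketch (error handling abbreviated) that pulls the fd
out of SCM_RIGHTS and fstat()s it to learn the namespace inode:

#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Receive one fd over a unix socket and report the inode of whatever it
 * refers to; for a /proc/<pid>/ns/ipc fd that inode identifies the namespace. */
static int recv_ns_inode(int conn, ino_t *ino)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl.buf,
                          .msg_controllen = sizeof(ctrl.buf) };

    if (recvmsg(conn, &msg, 0) <= 0)
        return -1;

    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS) {
            int fd;
            struct stat st;

            memcpy(&fd, CMSG_DATA(c), sizeof(fd));
            if (fstat(fd, &st) < 0) {
                close(fd);
                return -1;
            }
            close(fd);
            *ino = st.st_ino;
            return 0;
        }
    }
    return -1;
}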

--Andy

>
> Thanks
> Vivek



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:

[..]
> >> 2. Docker is a container system, so use the "container" (aka
> >> namespace) APIs.  There are probably several clever things that could
> >> be done with /proc//ns.
> >
> > pid is racy, if it weren't I would simply go straight
> > to /proc//cgroups ...
> 
> How about:
> 
> open("/proc/self/ns/ipc", O_RDONLY);
> send the result over SCM_RIGHTS?

As I don't know, I will ask: what will the server now do with this file
descriptor of the client's ipc namespace?

IOW, what information/identifier does it contain which can be
used to map to pre-configured per-container/per-namespace policies?

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Tim Hockin
I don't buy that it is not practical.  Not convenient, maybe.  Not
clean, sure.  But it is practical - it uses mechanisms that exist on
all kernels today.  That is a win, to me.

On Thu, Mar 13, 2014 at 10:58 AM, Simo Sorce  wrote:
> On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
>>
>> So give each container its own unix socket.  Problem solved, no?
>
> Not really practical if you have hundreds of containers.
>
> Simo.
>


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Andy Lutomirski
On Thu, Mar 13, 2014 at 10:57 AM, Simo Sorce  wrote:
> On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
>> On Thu, Mar 13, 2014 at 10:51 AM, Simo Sorce  wrote:
>> > On Wed, 2014-03-12 at 19:12 -0700, Andy Lutomirski wrote:
>> >> On Wed, Mar 12, 2014 at 6:43 PM, Simo Sorce  wrote:
>> >> > On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
>> >> >> On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce  wrote:
>> >> >> > On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
>> >> >> >> On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce  
>> >> >> >> wrote:
>> >> >> >>
>> >> >> >> >
>> >> >> >> > Connection time is all we do and can care about.
>> >> >> >>
>> >> >> >> You have not answered why.
>> >> >> >
>> >> >> > We are going to disclose information to the peer based on policy that
>> >> >> > depends on the cgroup the peer is part of. All we care for is who 
>> >> >> > opened
>> >> >> > the connection, if the peer wants to pass on that information after 
>> >> >> > it
>> >> >> > has obtained it there is nothing we can do, so connection time is 
>> >> >> > all we
>> >> >> > really care about.
>> >> >>
>> >> >> Can you give a realistic example?
>> >> >>
>> >> >> I could say that I'd like to disclose information to processes based
>> >> >> on their rlimits at the time they connected, but I don't think that
>> >> >> would carry much weight.
>> >> >
>> >> > We want to be able to show different user's list from SSSD based on the
>> >> > docker container that is asking for it.
>> >> >
>> >> > This works by having libnsss_sss.so from the containerized application
>> >> > connect to an SSSD daemon running on the host or in another container.
>> >> >
>> >> > The only way to distinguish between containers "from the outside" is to
>> >> > lookup the cgroup of the requesting process. It has a unique container
>> >> > ID, and can therefore be mapped to the appropriate policy that will let
>> >> > us decide which 'user domain' to serve to the container.
>> >> >
>> >>
>> >> I can think of at least three other ways to do this.
>> >>
>> >> 1. Fix Docker to use user namespaces and use the uid of the requesting
>> >> process via SCM_CREDENTIALS.
>> >
>> > This is not practical, I have no control on what UIDs will be used
>> > within a container, and IIRC user namespaces have severe limitations
>> > that may make them unusable in some situations. Forcing the use of user
>> > namespaces on docker to satisfy my use case is not in my power.
>>
>> Except that Docker w/o userns is basically completely insecure unless
>> selinux or apparmor is in use, so this may not matter.
>>
>> >
>> >> 2. Docker is a container system, so use the "container" (aka
>> >> namespace) APIs.  There are probably several clever things that could
>> >> be done with /proc//ns.
>> >
>> > pid is racy, if it weren't I would simply go straight
>> > to /proc//cgroups ...
>>
>> How about:
>>
>> open("/proc/self/ns/ipc", O_RDONLY);
>> send the result over SCM_RIGHTS?
>
> This needs to work with existing clients, existing clients, don't do
> this.
>

Wait... you want completely unmodified clients in a container to talk
to a service that they don't even realize is outside the container and
for that server to magically behave differently because the container
is there?  And there's no per-container proxy involved?  And every
container is connecting to *the very same socket*?

I just can't imagine this working well regardless of what magic socket
options you add, especially if user namespaces aren't in use.

--Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Thu, Mar 13, 2014 at 01:51:17PM -0400, Simo Sorce wrote:

[..]
> > 1. Fix Docker to use user namespaces and use the uid of the requesting
> > process via SCM_CREDENTIALS.
> 
> This is not practical, I have no control on what UIDs will be used
> within a container,

I guess the uid-to-container mapping has to be managed by somebody, say systemd.
Then systemd should export an API to query the container a uid is
mapped into. So that should not be the real problem.

> and IIRC user namespaces have severe limitations
> that may make them unusable in some situations. Forcing the use of user
> namespaces on docker to satisfy my use case is not in my power.

I think that's the real practical problem: adoption of user namespaces.

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Andy Lutomirski
On Thu, Mar 13, 2014 at 10:58 AM, Simo Sorce  wrote:
> On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
>>
>> So give each container its own unix socket.  Problem solved, no?
>
> Not really practical if you have hundreds of containers.

I don't see the problem.  Sockets are cheap.
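
For illustration, a per-container listening socket is a handful of lines on
the server side; the directory layout below is hypothetical, the idea being
that each per-container directory is bind-mounted into exactly one container,
so the listening fd itself identifies the caller:

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Create one listening unix socket per container; whatever connects on this
 * fd is, by construction, that container. */
static int listen_for_container(const char *container_id)
{
    struct sockaddr_un sa = { .sun_family = AF_UNIX };
    int fd;

    snprintf(sa.sun_path, sizeof(sa.sun_path),
             "/run/sssd/containers/%s/nss.sock", container_id);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    unlink(sa.sun_path);
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 || listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;   /* remember fd -> container_id in the server */
}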

>
> Simo.
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Simo Sorce
On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
> 
> So give each container its own unix socket.  Problem solved, no?

Not really practical if you have hundreds of containers.

Simo.



Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Simo Sorce
On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
> On Thu, Mar 13, 2014 at 10:51 AM, Simo Sorce  wrote:
> > On Wed, 2014-03-12 at 19:12 -0700, Andy Lutomirski wrote:
> >> On Wed, Mar 12, 2014 at 6:43 PM, Simo Sorce  wrote:
> >> > On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
> >> >> On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce  wrote:
> >> >> > On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
> >> >> >> On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce  
> >> >> >> wrote:
> >> >> >>
> >> >> >> >
> >> >> >> > Connection time is all we do and can care about.
> >> >> >>
> >> >> >> You have not answered why.
> >> >> >
> >> >> > We are going to disclose information to the peer based on policy that
> >> >> > depends on the cgroup the peer is part of. All we care for is who 
> >> >> > opened
> >> >> > the connection, if the peer wants to pass on that information after it
> >> >> > has obtained it there is nothing we can do, so connection time is all 
> >> >> > we
> >> >> > really care about.
> >> >>
> >> >> Can you give a realistic example?
> >> >>
> >> >> I could say that I'd like to disclose information to processes based
> >> >> on their rlimits at the time they connected, but I don't think that
> >> >> would carry much weight.
> >> >
> >> > We want to be able to show different user's list from SSSD based on the
> >> > docker container that is asking for it.
> >> >
> >> > This works by having libnsss_sss.so from the containerized application
> >> > connect to an SSSD daemon running on the host or in another container.
> >> >
> >> > The only way to distinguish between containers "from the outside" is to
> >> > lookup the cgroup of the requesting process. It has a unique container
> >> > ID, and can therefore be mapped to the appropriate policy that will let
> >> > us decide which 'user domain' to serve to the container.
> >> >
> >>
> >> I can think of at least three other ways to do this.
> >>
> >> 1. Fix Docker to use user namespaces and use the uid of the requesting
> >> process via SCM_CREDENTIALS.
> >
> > This is not practical, I have no control on what UIDs will be used
> > within a container, and IIRC user namespaces have severe limitations
> > that may make them unusable in some situations. Forcing the use of user
> > namespaces on docker to satisfy my use case is not in my power.
> 
> Except that Docker w/o userns is basically completely insecure unless
> selinux or apparmor is in use, so this may not matter.
> 
> >
> >> 2. Docker is a container system, so use the "container" (aka
> >> namespace) APIs.  There are probably several clever things that could
> >> be done with /proc//ns.
> >
> > pid is racy, if it weren't I would simply go straight
> > to /proc//cgroups ...
> 
> How about:
> 
> open("/proc/self/ns/ipc", O_RDONLY);
> send the result over SCM_RIGHTS?

This needs to work with existing clients, and existing clients don't do
this.

> >> 3. Given that Docker uses network namespaces, I assume that the socket
> >> connection between the two sssd instances either comes from Docker
> >> itself or uses socket inodes.  In either case, the same mechanism
> >> should be usable for authentication.
> >
> > It is a unix socket, ie bind mounted on the container filesystem, not
> > sure network namespaces really come into the picture, and I do not know
> > of a racefree way of knowing what is the namespace of the peer at
> > connect time.
> > Is there a SO_PEER_NAMESPACE option ?
> 
> So give each container its own unix socket.  Problem solved, no?
> 
> --Andy





Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Simo Sorce
On Thu, 2014-03-13 at 10:25 -0700, Andy Lutomirski wrote:
> On Thu, Mar 13, 2014 at 9:33 AM, Simo Sorce  wrote:
> > On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
> >> On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:
> >>
> >> [..]
> >> > > > This might not be quite as awful as I thought.  At least you're
> >> > > > looking up the cgroup at connection time instead of at send time.
> >> > > >
> >> > > > OTOH, this is still racy -- the socket could easily outlive the 
> >> > > > cgroup
> >> > > > that created it.
> >> > >
> >> > > That's a good point. What guarantees that previous cgroup was not
> >> > > reassigned to a different container.
> >> > >
> >> > > What if a process A opens the connection with sssd. Process A passes 
> >> > > the
> >> > > file descriptor to a different process B in a differnt container.
> >> >
> >> > Stop right here.
> >> > If the process passes the fd it is not my problem anymore.
> >> > The process can as well just 'proxy' all the information to another
> >> > process.
> >> >
> >> > We just care to properly identify the 'original' container, we are not
> >> > in the business of detecting malicious behavior. That's something other
> >> > mechanism need to protect against (SELinux or other LSMs, normal
> >> > permissions, capabilities, etc...).
> >> >
> >> > > Process A exits. Container gets removed from system and new one gets
> >> > > launched which uses same cgroup as old one. Now process B sends a new
> >> > > request and SSSD will serve it based on policy of newly launched
> >> > > container.
> >> > >
> >> > > This sounds very similar to pid race where socket/connection will 
> >> > > outlive
> >> > > the pid.
> >> >
> >> > Nope, completely different.
> >> >
> >>
> >> I think you missed my point. Passing file descriptor is not the problem.
> >> Problem is reuse of same cgroup name for a different container while
> >> socket lives on. And it is same race as reuse of a pid for a different
> >> process.
> >
> > The cgroup name should not be reused of course, if userspace does that,
> > it is userspace's issue. cgroup names are not a constrained namespace
> > like pids which force the kernel to reuse them for processes of a
> > different nature.
> >
> 
> You're proposing a feature that will enshrine cgroups into the API use
> by non-cgroup-controlling applications.  I don't think that anyone
> thinks that cgroups are pretty, so this is an unfortunate thing to
> have to do.
> 
> I've suggested three different ways that your goal could be achieved
> without using cgroups at all.  You haven't really addressed any of
> them.

I replied now, none of them strike me as practical or something that can
be enforced.

> In order for something like this to go into the kernel, I would expect
> a real use case and a justification for why this is the right way to
> do it.

I think my justification is quite real, the fact you do not like it does
not really make it any less real.

I am open to suggestions on alternative methods of course; I do not care
which way as long as it is practical and does not cause unreasonable
restrictions on the containerization. As far as I can see, all of the
container stuff uses cgroups already for various reasons, so using
cgroups seems natural.

> "Docker containers can be identified by cgroup path" is completely
> unconvincing to me.

Provide an alternative: so far there is a cgroup with a unique name
associated with every container, and I haven't found any other way to derive
that information in a race-free way.

Simo.




Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Tim Hockin
In some sense a cgroup is a pgrp that mere mortals can't escape.  Why
not just do something like that?  root can set this "container id" or
"job id" on your process when it first starts (e.g. docker sets it on
your container process) or even make a cgroup that sets this for all
processes in that cgroup.

ints are better than strings anyway.

On Thu, Mar 13, 2014 at 10:25 AM, Andy Lutomirski  wrote:
> On Thu, Mar 13, 2014 at 9:33 AM, Simo Sorce  wrote:
>> On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
>>> On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:
>>>
>>> [..]
>>> > > > This might not be quite as awful as I thought.  At least you're
>>> > > > looking up the cgroup at connection time instead of at send time.
>>> > > >
>>> > > > OTOH, this is still racy -- the socket could easily outlive the cgroup
>>> > > > that created it.
>>> > >
>>> > > That's a good point. What guarantees that previous cgroup was not
>>> > > reassigned to a different container.
>>> > >
>>> > > What if a process A opens the connection with sssd. Process A passes the
>>> > > file descriptor to a different process B in a differnt container.
>>> >
>>> > Stop right here.
>>> > If the process passes the fd it is not my problem anymore.
>>> > The process can as well just 'proxy' all the information to another
>>> > process.
>>> >
>>> > We just care to properly identify the 'original' container, we are not
>>> > in the business of detecting malicious behavior. That's something other
>>> > mechanism need to protect against (SELinux or other LSMs, normal
>>> > permissions, capabilities, etc...).
>>> >
>>> > > Process A exits. Container gets removed from system and new one gets
>>> > > launched which uses same cgroup as old one. Now process B sends a new
>>> > > request and SSSD will serve it based on policy of newly launched
>>> > > container.
>>> > >
>>> > > This sounds very similar to pid race where socket/connection will 
>>> > > outlive
>>> > > the pid.
>>> >
>>> > Nope, completely different.
>>> >
>>>
>>> I think you missed my point. Passing file descriptor is not the problem.
>>> Problem is reuse of same cgroup name for a different container while
>>> socket lives on. And it is same race as reuse of a pid for a different
>>> process.
>>
>> The cgroup name should not be reused of course, if userspace does that,
>> it is userspace's issue. cgroup names are not a constrained namespace
>> like pids which force the kernel to reuse them for processes of a
>> different nature.
>>
>
> You're proposing a feature that will enshrine cgroups into the API use
> by non-cgroup-controlling applications.  I don't think that anyone
> thinks that cgroups are pretty, so this is an unfortunate thing to
> have to do.
>
> I've suggested three different ways that your goal could be achieved
> without using cgroups at all.  You haven't really addressed any of
> them.
>
> In order for something like this to go into the kernel, I would expect
> a real use case and a justification for why this is the right way to
> do it.
>
> "Docker containers can be identified by cgroup path" is completely
> unconvincing to me.
>
> --Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Andy Lutomirski
On Thu, Mar 13, 2014 at 10:51 AM, Simo Sorce  wrote:
> On Wed, 2014-03-12 at 19:12 -0700, Andy Lutomirski wrote:
>> On Wed, Mar 12, 2014 at 6:43 PM, Simo Sorce  wrote:
>> > On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
>> >> On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce  wrote:
>> >> > On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
>> >> >> On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce  wrote:
>> >> >>
>> >> >> >
>> >> >> > Connection time is all we do and can care about.
>> >> >>
>> >> >> You have not answered why.
>> >> >
>> >> > We are going to disclose information to the peer based on policy that
>> >> > depends on the cgroup the peer is part of. All we care for is who opened
>> >> > the connection, if the peer wants to pass on that information after it
>> >> > has obtained it there is nothing we can do, so connection time is all we
>> >> > really care about.
>> >>
>> >> Can you give a realistic example?
>> >>
>> >> I could say that I'd like to disclose information to processes based
>> >> on their rlimits at the time they connected, but I don't think that
>> >> would carry much weight.
>> >
>> > We want to be able to show different user's list from SSSD based on the
>> > docker container that is asking for it.
>> >
>> > This works by having libnsss_sss.so from the containerized application
>> > connect to an SSSD daemon running on the host or in another container.
>> >
>> > The only way to distinguish between containers "from the outside" is to
>> > lookup the cgroup of the requesting process. It has a unique container
>> > ID, and can therefore be mapped to the appropriate policy that will let
>> > us decide which 'user domain' to serve to the container.
>> >
>>
>> I can think of at least three other ways to do this.
>>
>> 1. Fix Docker to use user namespaces and use the uid of the requesting
>> process via SCM_CREDENTIALS.
>
> This is not practical, I have no control on what UIDs will be used
> within a container, and IIRC user namespaces have severe limitations
> that may make them unusable in some situations. Forcing the use of user
> namespaces on docker to satisfy my use case is not in my power.

Except that Docker w/o userns is basically completely insecure unless
selinux or apparmor is in use, so this may not matter.

>
>> 2. Docker is a container system, so use the "container" (aka
>> namespace) APIs.  There are probably several clever things that could
>> be done with /proc//ns.
>
> pid is racy, if it weren't I would simply go straight
> to /proc//cgroups ...

How about:

open("/proc/self/ns/ipc", O_RDONLY);
send the result over SCM_RIGHTS?
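
Concretely, the client-side step might look like the sketch below (the
one-byte payload exists only because SCM_RIGHTS has to ride on some data;
error handling abbreviated):

#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Open our own ipc namespace and pass the fd to the server over the
 * already-connected unix socket. */
static int send_ipc_ns(int sock)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl.buf,
                          .msg_controllen = sizeof(ctrl.buf) };
    struct cmsghdr *c;
    int nsfd, ret;

    nsfd = open("/proc/self/ns/ipc", O_RDONLY);
    if (nsfd < 0)
        return -1;

    memset(ctrl.buf, 0, sizeof(ctrl.buf));
    c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &nsfd, sizeof(int));

    ret = sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
    close(nsfd);
    return ret;
}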

>
>> 3. Given that Docker uses network namespaces, I assume that the socket
>> connection between the two sssd instances either comes from Docker
>> itself or uses socket inodes.  In either case, the same mechanism
>> should be usable for authentication.
>
> It is a unix socket, ie bind mounted on the container filesystem, not
> sure network namespaces really come into the picture, and I do not know
> of a racefree way of knowing what is the namespace of the peer at
> connect time.
> Is there a SO_PEER_NAMESPACE option ?

So give each container its own unix socket.  Problem solved, no?

--Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Simo Sorce
On Wed, 2014-03-12 at 19:12 -0700, Andy Lutomirski wrote:
> On Wed, Mar 12, 2014 at 6:43 PM, Simo Sorce  wrote:
> > On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
> >> On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce  wrote:
> >> > On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
> >> >> On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce  wrote:
> >> >>
> >> >> >
> >> >> > Connection time is all we do and can care about.
> >> >>
> >> >> You have not answered why.
> >> >
> >> > We are going to disclose information to the peer based on policy that
> >> > depends on the cgroup the peer is part of. All we care for is who opened
> >> > the connection, if the peer wants to pass on that information after it
> >> > has obtained it there is nothing we can do, so connection time is all we
> >> > really care about.
> >>
> >> Can you give a realistic example?
> >>
> >> I could say that I'd like to disclose information to processes based
> >> on their rlimits at the time they connected, but I don't think that
> >> would carry much weight.
> >
> > We want to be able to show different user's list from SSSD based on the
> > docker container that is asking for it.
> >
> > This works by having libnsss_sss.so from the containerized application
> > connect to an SSSD daemon running on the host or in another container.
> >
> > The only way to distinguish between containers "from the outside" is to
> > lookup the cgroup of the requesting process. It has a unique container
> > ID, and can therefore be mapped to the appropriate policy that will let
> > us decide which 'user domain' to serve to the container.
> >
> 
> I can think of at least three other ways to do this.
> 
> 1. Fix Docker to use user namespaces and use the uid of the requesting
> process via SCM_CREDENTIALS.

This is not practical; I have no control over what UIDs will be used
within a container, and IIRC user namespaces have severe limitations
that may make them unusable in some situations. Forcing the use of user
namespaces on docker to satisfy my use case is not in my power.

> 2. Docker is a container system, so use the "container" (aka
> namespace) APIs.  There are probably several clever things that could
> be done with /proc//ns.

pid is racy; if it weren't, I would simply go straight
to /proc/<pid>/cgroups ...

> 3. Given that Docker uses network namespaces, I assume that the socket
> connection between the two sssd instances either comes from Docker
> itself or uses socket inodes.  In either case, the same mechanism
> should be usable for authentication.

It is a unix socket, i.e. bind-mounted on the container filesystem; I am not
sure network namespaces really come into the picture, and I do not know
of a race-free way of knowing what the namespace of the peer is at
connect time.
Is there a SO_PEER_NAMESPACE option?

Simo.




Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Andy Lutomirski
On Thu, Mar 13, 2014 at 9:33 AM, Simo Sorce  wrote:
> On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
>> On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:
>>
>> [..]
>> > > > This might not be quite as awful as I thought.  At least you're
>> > > > looking up the cgroup at connection time instead of at send time.
>> > > >
>> > > > OTOH, this is still racy -- the socket could easily outlive the cgroup
>> > > > that created it.
>> > >
>> > > That's a good point. What guarantees that previous cgroup was not
>> > > reassigned to a different container.
>> > >
>> > > What if a process A opens the connection with sssd. Process A passes the
>> > > file descriptor to a different process B in a differnt container.
>> >
>> > Stop right here.
>> > If the process passes the fd it is not my problem anymore.
>> > The process can as well just 'proxy' all the information to another
>> > process.
>> >
>> > We just care to properly identify the 'original' container, we are not
>> > in the business of detecting malicious behavior. That's something other
>> > mechanism need to protect against (SELinux or other LSMs, normal
>> > permissions, capabilities, etc...).
>> >
>> > > Process A exits. Container gets removed from system and new one gets
>> > > launched which uses same cgroup as old one. Now process B sends a new
>> > > request and SSSD will serve it based on policy of newly launched
>> > > container.
>> > >
>> > > This sounds very similar to pid race where socket/connection will outlive
>> > > the pid.
>> >
>> > Nope, completely different.
>> >
>>
>> I think you missed my point. Passing file descriptor is not the problem.
>> Problem is reuse of same cgroup name for a different container while
>> socket lives on. And it is same race as reuse of a pid for a different
>> process.
>
> The cgroup name should not be reused of course, if userspace does that,
> it is userspace's issue. cgroup names are not a constrained namespace
> like pids which force the kernel to reuse them for processes of a
> different nature.
>

You're proposing a feature that will enshrine cgroups into the API used
by non-cgroup-controlling applications.  I don't think that anyone
thinks that cgroups are pretty, so this is an unfortunate thing to
have to do.

I've suggested three different ways that your goal could be achieved
without using cgroups at all.  You haven't really addressed any of
them.

In order for something like this to go into the kernel, I would expect
a real use case and a justification for why this is the right way to
do it.

"Docker containers can be identified by cgroup path" is completely
unconvincing to me.

--Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Simo Sorce
On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
> On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:
> 
> [..]
> > > > This might not be quite as awful as I thought.  At least you're
> > > > looking up the cgroup at connection time instead of at send time.
> > > > 
> > > > OTOH, this is still racy -- the socket could easily outlive the cgroup
> > > > that created it.
> > > 
> > > That's a good point. What guarantees that previous cgroup was not
> > > reassigned to a different container.
> > > 
> > > What if a process A opens the connection with sssd. Process A passes the
> > > file descriptor to a different process B in a differnt container.
> > 
> > Stop right here.
> > If the process passes the fd it is not my problem anymore.
> > The process can as well just 'proxy' all the information to another
> > process.
> > 
> > We just care to properly identify the 'original' container, we are not
> > in the business of detecting malicious behavior. That's something other
> > mechanism need to protect against (SELinux or other LSMs, normal
> > permissions, capabilities, etc...).
> > 
> > > Process A exits. Container gets removed from system and new one gets
> > > launched which uses same cgroup as old one. Now process B sends a new
> > > request and SSSD will serve it based on policy of newly launched
> > > container.
> > > 
> > > This sounds very similar to pid race where socket/connection will outlive
> > > the pid.
> > 
> > Nope, completely different.
> > 
> 
> I think you missed my point. Passing file descriptor is not the problem.
> Problem is reuse of same cgroup name for a different container while
> socket lives on. And it is same race as reuse of a pid for a different
> process.

The cgroup name should not be reused, of course; if userspace does that,
it is userspace's issue. cgroup names are not a constrained namespace
like pids, which the kernel is forced to reuse for processes of a
different nature.

Simo.



Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:

[..]
> > > This might not be quite as awful as I thought.  At least you're
> > > looking up the cgroup at connection time instead of at send time.
> > > 
> > > OTOH, this is still racy -- the socket could easily outlive the cgroup
> > > that created it.
> > 
> > That's a good point. What guarantees that previous cgroup was not
> > reassigned to a different container.
> > 
> > What if a process A opens the connection with sssd. Process A passes the
> > file descriptor to a different process B in a differnt container.
> 
> Stop right here.
> If the process passes the fd it is not my problem anymore.
> The process can as well just 'proxy' all the information to another
> process.
> 
> We just care to properly identify the 'original' container, we are not
> in the business of detecting malicious behavior. That's something other
> mechanism need to protect against (SELinux or other LSMs, normal
> permissions, capabilities, etc...).
> 
> > Process A exits. Container gets removed from system and new one gets
> > launched which uses same cgroup as old one. Now process B sends a new
> > request and SSSD will serve it based on policy of newly launched
> > container.
> > 
> > This sounds very similar to pid race where socket/connection will outlive
> > the pid.
> 
> Nope, completely different.
> 

I think you missed my point. Passing the file descriptor is not the problem.
The problem is reuse of the same cgroup name for a different container while
the socket lives on. And it is the same race as reuse of a pid for a different
process.

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Simo Sorce
On Thu, 2014-03-13 at 10:14 -0400, Vivek Goyal wrote:
> On Wed, Mar 12, 2014 at 02:12:33PM -0700, Andy Lutomirski wrote:
> > On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski  
> > wrote:
> > > On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> > >> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
> > >> cgroup of first mounted hierarchy of the task. For the case of client,
> > >> it represents the cgroup of client at the time of opening the connection.
> > >> After that client cgroup might change.
> > >
> > > Even if people decide that sending cgroups over a unix socket is a good
> > > idea, this API has my NAK in the strongest possible sense, for whatever
> > > my NAK is worth.
> > >
> > > IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
> > > *never* imply the use of a credential.  A program should always have to
> > > *explicitly* request use of a credential.  What you want is SCM_CGROUP.
> > >
> > > (I've found privilege escalations before based on this observation, and
> > > I suspect I'll find them again.)
> > >
> > >
> > > Note that I think that you really want SCM_SOMETHING_ELSE and not
> > > SCM_CGROUP, but I don't know what the use case is yet.
> > 
> > This might not be quite as awful as I thought.  At least you're
> > looking up the cgroup at connection time instead of at send time.
> > 
> > OTOH, this is still racy -- the socket could easily outlive the cgroup
> > that created it.
> 
> That's a good point. What guarantees that previous cgroup was not
> reassigned to a different container.
> 
> What if a process A opens the connection with sssd. Process A passes the
> file descriptor to a different process B in a differnt container.

Stop right here.
If the process passes the fd, it is not my problem anymore.
The process could just as well 'proxy' all the information to another
process.

We just care to properly identify the 'original' container; we are not
in the business of detecting malicious behavior. That's something other
mechanisms need to protect against (SELinux or other LSMs, normal
permissions, capabilities, etc.).

> Process A exits. Container gets removed from system and new one gets
> launched which uses same cgroup as old one. Now process B sends a new
> request and SSSD will serve it based on policy of newly launched
> container.
> 
> This sounds very similar to pid race where socket/connection will outlive
> the pid.

Nope, completely different.

Simo.



Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Wed, Mar 12, 2014 at 07:12:25PM -0700, Andy Lutomirski wrote:

[..]
> >> Can you give a realistic example?
> >>
> >> I could say that I'd like to disclose information to processes based
> >> on their rlimits at the time they connected, but I don't think that
> >> would carry much weight.
> >
> > We want to be able to show different user's list from SSSD based on the
> > docker container that is asking for it.
> >
> > This works by having libnsss_sss.so from the containerized application
> > connect to an SSSD daemon running on the host or in another container.
> >
> > The only way to distinguish between containers "from the outside" is to
> > lookup the cgroup of the requesting process. It has a unique container
> > ID, and can therefore be mapped to the appropriate policy that will let
> > us decide which 'user domain' to serve to the container.
> >
> 
> I can think of at least three other ways to do this.
> 
> 1. Fix Docker to use user namespaces and use the uid of the requesting
> process via SCM_CREDENTIALS.

Using user namespaces sounds like the right way to do it (at least
conceptually). But I think the hurdle here is that people are not convinced
yet that user namespaces are secure and work well. IOW, some people
don't seem to think that user namespaces are ready yet.

I guess that's the reason people are looking for other ways to
achieve their goal.

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Wed, Mar 12, 2014 at 02:12:33PM -0700, Andy Lutomirski wrote:
> On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski  wrote:
> > On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> >> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
> >> cgroup of first mounted hierarchy of the task. For the case of client,
> >> it represents the cgroup of client at the time of opening the connection.
> >> After that client cgroup might change.
> >
> > Even if people decide that sending cgroups over a unix socket is a good
> > idea, this API has my NAK in the strongest possible sense, for whatever
> > my NAK is worth.
> >
> > IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
> > *never* imply the use of a credential.  A program should always have to
> > *explicitly* request use of a credential.  What you want is SCM_CGROUP.
> >
> > (I've found privilege escalations before based on this observation, and
> > I suspect I'll find them again.)
> >
> >
> > Note that I think that you really want SCM_SOMETHING_ELSE and not
> > SCM_CGROUP, but I don't know what the use case is yet.
> 
> This might not be quite as awful as I thought.  At least you're
> looking up the cgroup at connection time instead of at send time.
> 
> OTOH, this is still racy -- the socket could easily outlive the cgroup
> that created it.

That's a good point. What guarantees that the previous cgroup was not
reassigned to a different container?

What if a process A opens the connection with sssd, then passes the
file descriptor to a different process B in a different container?
Process A exits. The container gets removed from the system and a new one gets
launched which uses the same cgroup as the old one. Now process B sends a new
request, and SSSD will serve it based on the policy of the newly launched
container.

This sounds very similar to the pid race, where the socket/connection will
outlive the pid.

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Wed, Mar 12, 2014 at 01:58:57PM -0700, Cong Wang wrote:
> On Wed, Mar 12, 2014 at 1:46 PM, Vivek Goyal  wrote:
> > @@ -1098,6 +1135,16 @@ static int unix_stream_connect(struct socket *sock, 
> > struct sockaddr *uaddr,
> > if (newsk == NULL)
> > goto out;
> >
> > +   err = init_peercgroup(newsk);
> > +   if (err)
> > +   goto out;
> > +
> > +   err = alloc_cgroup_path(sk);
> > +   if (err)
> > +   goto out;
> > +
> > +   err = -ENOMEM;
> > +
> 
> Don't we need to free the cgroup_path on error path
> in this function?

The previously allocated cgroup_path is now in newsk->cgroup_path, and I was
relying on __sk_free() freeing that memory if an error happens.

unix_release_sock(sk)
  sock_put()
    sk_free()
      __sk_free()
        kfree(sk->cgroup_path)

Do you see a problem with that?

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Andy Lutomirski
On Wed, Mar 12, 2014 at 6:43 PM, Simo Sorce  wrote:
> On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
>> On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce  wrote:
>> > On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
>> >> On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce  wrote:
>> >>
>> >> >
>> >> > Connection time is all we do and can care about.
>> >>
>> >> You have not answered why.
>> >
>> > We are going to disclose information to the peer based on policy that
>> > depends on the cgroup the peer is part of. All we care for is who opened
>> > the connection, if the peer wants to pass on that information after it
>> > has obtained it there is nothing we can do, so connection time is all we
>> > really care about.
>>
>> Can you give a realistic example?
>>
>> I could say that I'd like to disclose information to processes based
>> on their rlimits at the time they connected, but I don't think that
>> would carry much weight.
>
> We want to be able to show different user's list from SSSD based on the
> docker container that is asking for it.
>
> This works by having libnsss_sss.so from the containerized application
> connect to an SSSD daemon running on the host or in another container.
>
> The only way to distinguish between containers "from the outside" is to
> lookup the cgroup of the requesting process. It has a unique container
> ID, and can therefore be mapped to the appropriate policy that will let
> us decide which 'user domain' to serve to the container.
>

I can think of at least three other ways to do this.

1. Fix Docker to use user namespaces and use the uid of the requesting
process via SCM_CREDENTIALS.

2. Docker is a container system, so use the "container" (aka
namespace) APIs.  There are probably several clever things that could
be done with /proc/<pid>/ns.

3. Given that Docker uses network namespaces, I assume that the socket
connection between the two sssd instances either comes from Docker
itself or uses socket inodes.  In either case, the same mechanism
should be usable for authentication.
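
For option 1 above, the server side is standard unix-socket machinery; a
minimal sketch using the closely related SO_PEERCRED (credentials captured at
connect() time) -- per-message SCM_CREDENTIALS via SO_PASSCRED works
similarly:

#define _GNU_SOURCE
#include <sys/socket.h>
#include <sys/types.h>

/* Read the connecting peer's credentials as captured at connect() time.
 * With user namespaces in play, cr.uid arrives already translated into the
 * server's user namespace (or the overflow uid if it cannot be mapped). */
static int peer_uid(int conn, uid_t *uid)
{
    struct ucred cr;
    socklen_t len = sizeof(cr);

    if (getsockopt(conn, SOL_SOCKET, SO_PEERCRED, &cr, &len) < 0)
        return -1;
    *uid = cr.uid;   /* map this uid (range) back to a container */
    return 0;
}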

On an unrelated note, since you seem to have found a way to get unix
sockets to connect the inside and outside of a Docker container, it
would be awesome if Docker could use the same mechanism to pass TCP
sockets around rather than playing awful games with virtual networks.

--Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Simo Sorce
On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
> On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce  wrote:
> > On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
> >> On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce  wrote:
> >>
> >> >
> >> > Connection time is all we do and can care about.
> >>
> >> You have not answered why.
> >
> > We are going to disclose information to the peer based on policy that
> > depends on the cgroup the peer is part of. All we care about is who
> > opened the connection; if the peer wants to pass on that information
> > after it has obtained it, there is nothing we can do, so connection
> > time is all we really care about.
> 
> Can you give a realistic example?
> 
> I could say that I'd like to disclose information to processes based
> on their rlimits at the time they connected, but I don't think that
> would carry much weight.

We want to be able to show a different user list from SSSD based on the
docker container that is asking for it.

This works by having libnss_sss.so from the containerized application
connect to an SSSD daemon running on the host or in another container.

The only way to distinguish between containers "from the outside" is to
look up the cgroup of the requesting process. It has a unique container
ID, and can therefore be mapped to the appropriate policy that will let
us decide which 'user domain' to serve to the container.
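
A rough sketch of what that lookup looks like from the server side
(hypothetical helper, no error handling; the peer pid obtained via
SO_PEERCRED is subject to the usual pid-reuse race):

#define _GNU_SOURCE             /* struct ucred */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: map a connected AF_UNIX client to the cgroup
 * recorded in /proc/<pid>/cgroup.  The pid can be reused, or the client
 * can move cgroups after connecting, so this is inherently racy. */
static int peer_cgroup(int fd, char *out, size_t outlen)
{
        struct ucred cred;
        socklen_t len = sizeof(cred);
        char path[64];
        FILE *f;

        if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) < 0)
                return -1;

        snprintf(path, sizeof(path), "/proc/%d/cgroup", (int)cred.pid);
        f = fopen(path, "r");
        if (!f)
                return -1;

        /* Lines look like "N:subsystems:/path"; the first one is enough
         * to recover the container id embedded in the path. */
        if (!fgets(out, outlen, f)) {
                fclose(f);
                return -1;
        }
        fclose(f);
        out[strcspn(out, "\n")] = '\0';
        return 0;
}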

Simo.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Andy Lutomirski
On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce  wrote:
> On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
>> On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce  wrote:
>>
>> >
>> > Connection time is all we do and can care about.
>>
>> You have not answered why.
>
> We are going to disclose information to the peer based on policy that
> depends on the cgroup the peer is part of. All we care about is who
> opened the connection; if the peer wants to pass on that information
> after it has obtained it, there is nothing we can do, so connection
> time is all we really care about.

Can you give a realistic example?

I could say that I'd like to disclose information to processes based
on their rlimits at the time they connected, but I don't think that
would carry much weight.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Simo Sorce
On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
> On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce  wrote:
> > On Wed, 2014-03-12 at 14:12 -0700, Andy Lutomirski wrote:
> >> On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski  
> >> wrote:
> >> > On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> >> >> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
> >> >> cgroup of first mounted hierarchy of the task. For the case of client,
> >> >> it represents the cgroup of client at the time of opening the 
> >> >> connection.
> >> >> After that client cgroup might change.
> >> >
> >> > Even if people decide that sending cgroups over a unix socket is a good
> >> > idea, this API has my NAK in the strongest possible sense, for whatever
> >> > my NAK is worth.
> >> >
> >> > IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
> >> > *never* imply the use of a credential.  A program should always have to
> >> > *explicitly* request use of a credential.  What you want is SCM_CGROUP.
> >> >
> >> > (I've found privilege escalations before based on this observation, and
> >> > I suspect I'll find them again.)
> >> >
> >> >
> >> > Note that I think that you really want SCM_SOMETHING_ELSE and not
> >> > SCM_CGROUP, but I don't know what the use case is yet.
> >>
> >> This might not be quite as awful as I thought.  At least you're
> >> looking up the cgroup at connection time instead of at send time.
> >>
> >> OTOH, this is still racy -- the socket could easily outlive the cgroup
> >> that created it.
> >
> > I think you do not understand how this whole problem space works.
> >
> > The problem is exactly the same as with SO_PEERCRED, so we are taking
> > the same proven solution.
> 
> You mean the same proven crappy solution?
> 
> >
> > Connection time is all we do and can care about.
> 
> You have not answered why.

We are going to disclose information to the peer based on policy that
depends on the cgroup the peer is part of. All we care about is who
opened the connection; if the peer wants to pass on that information
after it has obtained it, there is nothing we can do, so connection
time is all we really care about.

Simo.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Andy Lutomirski
On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce  wrote:
> On Wed, 2014-03-12 at 14:12 -0700, Andy Lutomirski wrote:
>> On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski  wrote:
>> > On 03/12/2014 01:46 PM, Vivek Goyal wrote:
>> >> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
>> >> cgroup of first mounted hierarchy of the task. For the case of client,
>> >> it represents the cgroup of client at the time of opening the connection.
>> >> After that client cgroup might change.
>> >
>> > Even if people decide that sending cgroups over a unix socket is a good
>> > idea, this API has my NAK in the strongest possible sense, for whatever
>> > my NAK is worth.
>> >
>> > IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
>> > *never* imply the use of a credential.  A program should always have to
>> > *explicitly* request use of a credential.  What you want is SCM_CGROUP.
>> >
>> > (I've found privilege escalations before based on this observation, and
>> > I suspect I'll find them again.)
>> >
>> >
>> > Note that I think that you really want SCM_SOMETHING_ELSE and not
>> > SCM_CGROUP, but I don't know what the use case is yet.
>>
>> This might not be quite as awful as I thought.  At least you're
>> looking up the cgroup at connection time instead of at send time.
>>
>> OTOH, this is still racy -- the socket could easily outlive the cgroup
>> that created it.
>
> I think you do not understand how this whole problem space works.
>
> The problem is exactly the same as with SO_PEERCRED, so we are taking
> the same proven solution.

You mean the same proven crappy solution?

>
> Connection time is all we do and can care about.

You have not answered why.

>
> Simo.
>
>



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Simo Sorce
On Wed, 2014-03-12 at 14:12 -0700, Andy Lutomirski wrote:
> On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski  wrote:
> > On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> >> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
> >> cgroup of first mounted hierarchy of the task. For the case of client,
> >> it represents the cgroup of client at the time of opening the connection.
> >> After that client cgroup might change.
> >
> > Even if people decide that sending cgroups over a unix socket is a good
> > idea, this API has my NAK in the strongest possible sense, for whatever
> > my NAK is worth.
> >
> > IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
> > *never* imply the use of a credential.  A program should always have to
> > *explicitly* request use of a credential.  What you want is SCM_CGROUP.
> >
> > (I've found privilege escalations before based on this observation, and
> > I suspect I'll find them again.)
> >
> >
> > Note that I think that you really want SCM_SOMETHING_ELSE and not
> > SCM_CGROUP, but I don't know what the use case is yet.
> 
> This might not be quite as awful as I thought.  At least you're
> looking up the cgroup at connection time instead of at send time.
> 
> OTOH, this is still racy -- the socket could easily outlive the cgroup
> that created it.

I think you do not understand how this whole problem space works.

The problem is exactly the same as with SO_PEERCRED, so we are taking
the same proven solution.

Connection time is all we do and can care about.

Simo.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Andy Lutomirski
On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski  wrote:
> On 03/12/2014 01:46 PM, Vivek Goyal wrote:
>> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
>> cgroup of first mounted hierarchy of the task. For the case of client,
>> it represents the cgroup of client at the time of opening the connection.
>> After that client cgroup might change.
>
> Even if people decide that sending cgroups over a unix socket is a good
> idea, this API has my NAK in the strongest possible sense, for whatever
> my NAK is worth.
>
> IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
> *never* imply the use of a credential.  A program should always have to
> *explicitly* request use of a credential.  What you want is SCM_CGROUP.
>
> (I've found privilege escalations before based on this observation, and
> I suspect I'll find them again.)
>
>
> Note that I think that you really want SCM_SOMETHING_ELSE and not
> SCM_CGROUP, but I don't know what the use case is yet.

This might not be quite as awful as I thought.  At least you're
looking up the cgroup at connection time instead of at send time.

OTOH, this is still racy -- the socket could easily outlive the cgroup
that created it.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Andy Lutomirski
On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
> cgroup of first mounted hierarchy of the task. For the case of client,
> it represents the cgroup of client at the time of opening the connection.
> After that client cgroup might change.

Even if people decide that sending cgroups over a unix socket is a good
idea, this API has my NAK in the strongest possible sense, for whatever
my NAK is worth.

IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
*never* imply the use of a credential.  A program should always have to
*explicitly* request use of a credential.  What you want is SCM_CGROUP.

(I've found privilege escalations before based on this observation, and
I suspect I'll find them again.)


Note that I think that you really want SCM_SOMETHING_ELSE and not
SCM_CGROUP, but I don't know what the use case is yet.
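
To make the contrast with SO_PEERCRED concrete, the explicit, per-message
model looks roughly like this on the receiving side (sketch only, helper
name made up; SO_PASSCRED has to be enabled before the kernel attaches
the credentials):

#define _GNU_SOURCE             /* struct ucred, SCM_CREDENTIALS */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical receiver: opt in to per-message credentials and pull the
 * sender's ucred out of the ancillary data of one message. */
static int recv_with_creds(int fd, void *buf, size_t len, struct ucred *out)
{
        struct msghdr msg = {0};
        struct iovec iov = { .iov_base = buf, .iov_len = len };
        union {
                char cbuf[CMSG_SPACE(sizeof(struct ucred))];
                struct cmsghdr align;
        } u;
        struct cmsghdr *cmsg;
        int one = 1;

        setsockopt(fd, SOL_SOCKET, SO_PASSCRED, &one, sizeof(one));

        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = u.cbuf;
        msg.msg_controllen = sizeof(u.cbuf);

        if (recvmsg(fd, &msg, 0) < 0)
                return -1;

        for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
                if (cmsg->cmsg_level == SOL_SOCKET &&
                    cmsg->cmsg_type == SCM_CREDENTIALS) {
                        memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
                        return 0;
                }
        }
        return -1;
}

A hypothetical SCM_CGROUP would slot into the same loop as just another
cmsg_type.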

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Cong Wang
On Wed, Mar 12, 2014 at 1:46 PM, Vivek Goyal  wrote:
> @@ -1098,6 +1135,16 @@ static int unix_stream_connect(struct socket *sock, 
> struct sockaddr *uaddr,
> if (newsk == NULL)
> goto out;
>
> +   err = init_peercgroup(newsk);
> +   if (err)
> +   goto out;
> +
> +   err = alloc_cgroup_path(sk);
> +   if (err)
> +   goto out;
> +
> +   err = -ENOMEM;
> +

Don't we need to free the cgroup_path on the error path in this function?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Vivek Goyal
Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
cgroup of the first mounted hierarchy of the task. For the client, it
represents the cgroup of the client at the time of opening the connection;
after that the client's cgroup might change.

Signed-off-by: Vivek Goyal 
---
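Not part of the patch, just a hedged sketch of how a server might consume
the new option; the option number is the one added to the generic headers
below, the buffer size is an arbitrary guess, and the exact return
semantics (termination, behaviour on listening sockets) depend on the
final implementation:

#include <stdio.h>
#include <sys/socket.h>

#ifndef SO_PEERCGROUP
#define SO_PEERCGROUP 49        /* value proposed by this series */
#endif

/* Print the cgroup path recorded for the peer at connect() time. */
static int print_peer_cgroup(int fd)
{
        char buf[4096];          /* arbitrary; assumed large enough */
        socklen_t len = sizeof(buf);

        if (getsockopt(fd, SOL_SOCKET, SO_PEERCGROUP, buf, &len) < 0)
                return -1;

        printf("peer cgroup: %.*s\n", (int)len, buf);
        return 0;
}
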
 arch/alpha/include/uapi/asm/socket.h   |  1 +
 arch/avr32/include/uapi/asm/socket.h   |  1 +
 arch/cris/include/uapi/asm/socket.h|  2 ++
 arch/frv/include/uapi/asm/socket.h |  1 +
 arch/ia64/include/uapi/asm/socket.h|  2 ++
 arch/m32r/include/uapi/asm/socket.h|  1 +
 arch/mips/include/uapi/asm/socket.h|  1 +
 arch/mn10300/include/uapi/asm/socket.h |  1 +
 arch/parisc/include/uapi/asm/socket.h  |  1 +
 arch/powerpc/include/uapi/asm/socket.h |  1 +
 arch/s390/include/uapi/asm/socket.h|  1 +
 arch/sparc/include/uapi/asm/socket.h   |  2 ++
 arch/xtensa/include/uapi/asm/socket.h  |  1 +
 include/net/sock.h |  1 +
 include/uapi/asm-generic/socket.h  |  2 ++
 net/core/sock.c| 19 ++
 net/unix/af_unix.c | 48 ++
 17 files changed, 86 insertions(+)

diff --git a/arch/alpha/include/uapi/asm/socket.h 
b/arch/alpha/include/uapi/asm/socket.h
index 3de1394..7178353 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -87,4 +87,5 @@
 
 #define SO_BPF_EXTENSIONS  48
 
+#define SO_PEERCGROUP  49
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/avr32/include/uapi/asm/socket.h 
b/arch/avr32/include/uapi/asm/socket.h
index 6e6cd15..486212b 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -80,4 +80,5 @@
 
 #define SO_BPF_EXTENSIONS  48
 
+#define SO_PEERCGROUP  49
 #endif /* _UAPI__ASM_AVR32_SOCKET_H */
diff --git a/arch/cris/include/uapi/asm/socket.h 
b/arch/cris/include/uapi/asm/socket.h
index ed94e5e..89a09e3 100644
--- a/arch/cris/include/uapi/asm/socket.h
+++ b/arch/cris/include/uapi/asm/socket.h
@@ -82,6 +82,8 @@
 
 #define SO_BPF_EXTENSIONS  48
 
+#define SO_PEERCGROUP  49
+
 #endif /* _ASM_SOCKET_H */
 
 
diff --git a/arch/frv/include/uapi/asm/socket.h 
b/arch/frv/include/uapi/asm/socket.h
index ca2c6e6..c4d90bc 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -80,5 +80,6 @@
 
 #define SO_BPF_EXTENSIONS  48
 
+#define SO_PEERCGROUP  49
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h 
b/arch/ia64/include/uapi/asm/socket.h
index a1b49ba..62c196d 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -89,4 +89,6 @@
 
 #define SO_BPF_EXTENSIONS  48
 
+#define SO_PEERCGROUP  49
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h 
b/arch/m32r/include/uapi/asm/socket.h
index 6c9a24b..6e04a7d 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -80,4 +80,5 @@
 
 #define SO_BPF_EXTENSIONS  48
 
+#define SO_PEERCGROUP  49
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h 
b/arch/mips/include/uapi/asm/socket.h
index a14baa2..cfbd84b 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -98,4 +98,5 @@
 
 #define SO_BPF_EXTENSIONS  48
 
+#define SO_PEERCGROUP  49
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h 
b/arch/mn10300/include/uapi/asm/socket.h
index 6aa3ce1..73467fe 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -80,4 +80,5 @@
 
 #define SO_BPF_EXTENSIONS  48
 
+#define SO_PEERCGROUP  49
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h 
b/arch/parisc/include/uapi/asm/socket.h
index fe35cea..24d8913 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -79,4 +79,5 @@
 
 #define SO_BPF_EXTENSIONS  0x4029
 
+#define SO_PEERCGROUP   0x402a
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/powerpc/include/uapi/asm/socket.h 
b/arch/powerpc/include/uapi/asm/socket.h
index a9c3e2e..50106be 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -87,4 +87,5 @@
 
 #define SO_BPF_EXTENSIONS  48
 
+#define SO_PEERCGROUP  49
 #endif /* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h 
b/arch/s390/include/uapi/asm/socket.h
index e031332..4ae2f3c 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -86,4 +86,5 @@
 
 #define SO_BPF_EXTENSIONS  48
 
+#define SO_PEERCGROUP  49
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h 
b/arch/sparc/include/uapi/asm/socket.h
index 54d9608..1056168 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/i
