Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Carl-Daniel Hailfinger
On 23.06.2007 19:19, Eric W. Biederman wrote:
> Patrick McHardy <[EMAIL PROTECTED]> writes:
> 
>> Eric W. Biederman wrote:
> 
>>> Depending upon the data structure it will either be modified to hold
>>> a per entry network namespace pointer or it there will be a separate
>>> copy per network namespace.  For large global data structures like
>>> the ipv4 routing cache hash table adding an additional pointer to the
>>> entries appears the more reasonable solution.
>>
>> So the routing cache is shared between all namespaces?
> 
> Yes.  Each namespaces has it's own view so semantically it's not
> shared.  But the initial fan out of the hash table 2M or something
> isn't something we want to replicate on a per namespace basis even
> assuming the huge page allocations could happen.
> 
> So we just tag the entries and add the network namespace as one more
> part of the key when doing hash table look ups.

Can one namespace DoS other namespaces' access to the routing cache?
Two scenarios come to mind:
* provoking hash collisions
* lock contention (sorry, haven't checked whether/how we do locking)

Regards,
Carl-Daniel
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Ben Greear

Eric W. Biederman wrote:

Ben Greear <[EMAIL PROTECTED]> writes:

  

Will we be able to have a single application be in multiple name-spaces?



A single application certainly.   But then an application can be composed
of multiple processes which can be composed of multiple threads.

In my current patches a single task_struct belongs to a single network
namespace.  That namespace is used when creating sockets.  The sockets
themselves have a namespace tag and that is used when transmitting
packets, or otherwise operating on the socket.

So if you pass a socket from one process to another you can have
sockets that belong to different network namespaces in a single task.
  
Any chance it could allow one to use a single threaded, single process 
and do something like

int fd1 = socket(, namespace1);
int fd2 = socket(, namespace2);

Or, maybe a sockopt or similar call to move a socket into a particular 
namespace?


I can certainly see it being useful to allow a default name-space per 
process, but it would be nice
to also allow explicit assignment of a socket to a name-space for 
applications that want to span

a large number of name-spaces.

Thanks,
Ben

--
Ben Greear <[EMAIL PROTECTED]> 
Candela Technologies Inc  http://www.candelatech.com



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Ben Greear

Stephen Hemminger wrote:
  

Will we be able to have a single application be in multiple name-spaces?



That would break the whole point of namespaces...
  
I was hoping that I could open a socket in one name-space and another in 
another name
space, and send traffic between them, within a single application.  This 
is basically what I can do now with my
send-to-self patch and (for more clever virtual-routing schemes + NAT, 
with a conn-track
patch that Patrick cooked up for me).  It seems these patches I use are 
not acceptable

for merge, so I was hoping name-spaces might work instead.

Thanks,
Ben

--
Ben Greear <[EMAIL PROTECTED]> 
Candela Technologies Inc  http://www.candelatech.com



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Eric W. Biederman
Carl-Daniel Hailfinger <[EMAIL PROTECTED]> writes:

> Can one namespace DoS other namespaces' access to the routing cache?
> Two scenarios come to mind:
> * provoking hash collisions
> * lock contention (sorry, haven't checked whether/how we do locking)

My initial expectation is that the protections we have to prevent one user from
performing a DoS on another user generally cover the cases between namespaces as
well.

Further in general global caches and global resource management is more 
efficient
then per namespace management.

Eric
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Eric W. Biederman
Ben Greear <[EMAIL PROTECTED]> writes:

> Any chance it could allow one to use a single threaded, single process and do
> something like
> int fd1 = socket(, namespace1);
> int fd2 = socket(, namespace2);
>
> Or, maybe a sockopt or similar call to move a socket into a particular
> namespace?
>
> I can certainly see it being useful to allow a default name-space per process,
> but it would be nice
> to also allow explicit assignment of a socket to a name-space for applications
> that want to span
> a large number of name-spaces.

That isn't the primary use case so I have not considered it much.
A setsockopt call might be possible.

It is also possible to have a bunch of children opening sockets for you
and passing to the process that wants to do the work. If you have a
sufficiently slow socket creation rate that will not be a problem just
a little cumbersome.

If you can open all of your sockets upfront it is possible to do
something where you open your sockets then unshare your network
namespace and repeat.

I am committed to making general infrastructure not something that is
targeted in a brittle way at only one scenario.

So it may be that we can cover your scenario.  However it is just
enough off of the beaten path that I'm not going to worry about it
the first time through.  It looks like it is a very small step from
where I am at to where you want to be.  So you may be able to cook
up something that will satisfy your requirements relatively easily.

Eric
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Ben Greear

Eric W. Biederman wrote:

So it may be that we can cover your scenario.  However it is just
enough off of the beaten path that I'm not going to worry about it
the first time through.  It looks like it is a very small step from
where I am at to where you want to be.  So you may be able to cook
up something that will satisfy your requirements relatively easily.
  

That sounds fair to me.

I will assume that as long as you can migrate sockets with the methods you
described, it should not be that difficult to do the same with a sockopt
or similar.

I'll revisit this when your patches are in mainline.

Thanks,
Ben

--
Ben Greear <[EMAIL PROTECTED]> 
Candela Technologies Inc  http://www.candelatech.com



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 23 Jun 2007 11:19:34 -0600

> Further and fundamentally all a global achieves is removing the need
> for the noise patches where you pass the pointer into the various
> functions.  For long term maintenance it doesn't help anything.

I don't accept that we have to add another function argument
to a bunch of core routines just to support this crap,
especially since you give no way to turn it off and get
that function argument slot back.

To be honest I think this form of virtualization is a complete
waste of time, even the openvz approach.

We're protecting the kernel from itself, and that's an endless
uphill battle that you will never win.  Let's do this kind of
stuff properly with a real minimal hypervisor, hopefully with
appropriate hardware level support and good virtualized device
interfaces, instead of this namespace stuff.

At least the hypervisor approach you have some chance to fully
harden in some verifyable and truly protected way, with
namespaces it's just a pipe dream and everyone who works on
these namespace approaches knows that very well.

The only positive thing that came out of this work is the
great auditing that the openvz folks have done and the bugs
they have found, but it basically ends right there.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Benny Amorsen
> "DM" == David Miller <[EMAIL PROTECTED]> writes:

DM> To be honest I think this form of virtualization is a complete
DM> waste of time, even the openvz approach.

You are only considering the security values of OpenVZ. Where I work,
OpenVZ and Linux-vserver are used for their ability to cleanly
separate processes. Security-wise, we could get the same effect just
by running the processes as separate users, but management-wise it is
so much easier to give them a completely separate environment.
OpenVZ's network virtualization enables us to do things which are
completely impossible with both the vanilla kernel and Xen -- e.g.
hundreds of virtual routers, with their own routing daemons.

Policy routing just doesn't cut it; it's cumbersome to set up, limited
to 256 tables, and routing daemons generally can't handle it well, if
at all.


/Benny



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Eric W. Biederman
David Miller <[EMAIL PROTECTED]> writes:

> From: [EMAIL PROTECTED] (Eric W. Biederman)
> Date: Sat, 23 Jun 2007 11:19:34 -0600
>
>> Further and fundamentally all a global achieves is removing the need
>> for the noise patches where you pass the pointer into the various
>> functions.  For long term maintenance it doesn't help anything.
>
> I don't accept that we have to add another function argument
> to a bunch of core routines just to support this crap,
> especially since you give no way to turn it off and get
> that function argument slot back.
>
> To be honest I think this form of virtualization is a complete
> waste of time, even the openvz approach.
>
> We're protecting the kernel from itself, and that's an endless
> uphill battle that you will never win.  Let's do this kind of
> stuff properly with a real minimal hypervisor, hopefully with
> appropriate hardware level support and good virtualized device
> interfaces, instead of this namespace stuff.
>
> At least the hypervisor approach you have some chance to fully
> harden in some verifyable and truly protected way, with
> namespaces it's just a pipe dream and everyone who works on
> these namespace approaches knows that very well.
>
> The only positive thing that came out of this work is the
> great auditing that the openvz folks have done and the bugs
> they have found, but it basically ends right there.

Dave thank you for your candor, it looks like I have finally made
the pieces small enough that we can discuss them.

If you want the argument to compile out.  That is not a problem at all.
I dropped that part from my patch because it makes infrastructure more
complicated and there appeared to be no gain.  However having a type
that you can pass that the compiler can optimize away is not a
problem.  Basically you just make the argument:

typedef struct {} you_can_compile_me_out;  /* when you don't want it. */
typedef void * you_can_compile_me_out; /* when you do want it. */

And gcc will generate no code to pass the argument when you compile
it out.


As far as the hardening goes.  There is definitely a point there,
short of a kernel proof subsystem that sounds correct to me.

There are some other factors that make a different tradeoff interesting.
First hypervisors do not allow global optimizations (because of the
better isolation) so have an inherent performance disadvantage.
Something like a 10x scaling penalty from the figures I have seen.

Even more interesting for me is the possibility of unmodified
application migration.  Where the limiting factor is that
you cannot reliably restore an application because the global
identifiers are not available.

So yes monolithic kernels may have grown so complex that they cannot
be verified and thus you cannot actually keep untrusted users from
doing bad things to each other with any degree of certainty.

However the interesting cases for me are cases where the users are not
aggressively hostile with each other but being stuck with one set of
global identifiers are a problem.

Eric
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Jeff Garzik

David Miller wrote:

I don't accept that we have to add another function argument
to a bunch of core routines just to support this crap,
especially since you give no way to turn it off and get
that function argument slot back.

To be honest I think this form of virtualization is a complete
waste of time, even the openvz approach.

We're protecting the kernel from itself, and that's an endless
uphill battle that you will never win.  Let's do this kind of
stuff properly with a real minimal hypervisor, hopefully with
appropriate hardware level support and good virtualized device
interfaces, instead of this namespace stuff.


Strongly seconded.  This containerized virtualization approach just 
bloats up the kernel for something that is inherently fragile and IMO 
less secure -- protecting the kernel from itself.


Plenty of other virt approaches don't stir the code like this, while 
simultaneously providing fewer, more-clean entry points for the 
virtualization to occur.


And that's speaking WITHOUT my vendor hat on...

Jeff


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Eric W. Biederman
Jeff Garzik <[EMAIL PROTECTED]> writes:

> David Miller wrote:
>> I don't accept that we have to add another function argument
>> to a bunch of core routines just to support this crap,
>> especially since you give no way to turn it off and get
>> that function argument slot back.
>>
>> To be honest I think this form of virtualization is a complete
>> waste of time, even the openvz approach.
>>
>> We're protecting the kernel from itself, and that's an endless
>> uphill battle that you will never win.  Let's do this kind of
>> stuff properly with a real minimal hypervisor, hopefully with
>> appropriate hardware level support and good virtualized device
>> interfaces, instead of this namespace stuff.
>
> Strongly seconded.  This containerized virtualization approach just bloats up
> the kernel for something that is inherently fragile and IMO less secure --
> protecting the kernel from itself.
>
> Plenty of other virt approaches don't stir the code like this, while
> simultaneously providing fewer, more-clean entry points for the virtualization
> to occur.

Wrong.  I really don't want to get into a my virtualization approach is better
then yours.  But this is flat out wrong.

99% of the changes I'm talking about introducing are just:
- variable 
+ ptr->variable

There are more pieces mostly with when we initialize those variables but
that is the essence of the change.

And as opposed to other virtualization approaches so far no one has been
able to measure the overhead.  I suspect there will be a few more cache
line misses somewhere but they haven't shown up yet.

If the only use was strong isolation which Dave complains about I would
concur that the namespace approach is inappropriate.  However there are
a lot other uses.

Eric
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Jeff Garzik

Eric W. Biederman wrote:

Jeff Garzik <[EMAIL PROTECTED]> writes:


David Miller wrote:

I don't accept that we have to add another function argument
to a bunch of core routines just to support this crap,
especially since you give no way to turn it off and get
that function argument slot back.

To be honest I think this form of virtualization is a complete
waste of time, even the openvz approach.

We're protecting the kernel from itself, and that's an endless
uphill battle that you will never win.  Let's do this kind of
stuff properly with a real minimal hypervisor, hopefully with
appropriate hardware level support and good virtualized device
interfaces, instead of this namespace stuff.

Strongly seconded.  This containerized virtualization approach just bloats up
the kernel for something that is inherently fragile and IMO less secure --
protecting the kernel from itself.

Plenty of other virt approaches don't stir the code like this, while
simultaneously providing fewer, more-clean entry points for the virtualization
to occur.


Wrong.  I really don't want to get into a my virtualization approach is better
then yours.  But this is flat out wrong.



99% of the changes I'm talking about introducing are just:
- variable 
+ ptr->variable


There are more pieces mostly with when we initialize those variables but
that is the essence of the change.


You completely dodged the main objection.  Which is OK if you are 
selling something to marketing departments, but not OK


Containers introduce chroot-jail-like features that give one a false 
sense of security, while still requiring one to "poke holes" in the 
illusion to get hardware-specific tasks accomplished.


The capable/not-capable model (i.e. superuser / normal user) is _still_ 
being secured locally, even after decades of work and whitepapers and 
audits.


You are drinking Deep Kool-Aid if you think adding containers to the 
myriad kernel subsystems does anything besides increasing fragility, and 
decreasing security.  You are securing in-kernel subsystems against 
other in-kernel subsystems.  superuser/user model made that difficult 
enough... now containers add exponential audit complexity to that.  Who 
is to say that a local root does not also pierce the container model?




And as opposed to other virtualization approaches so far no one has been
able to measure the overhead.  I suspect there will be a few more cache
line misses somewhere but they haven't shown up yet.

If the only use was strong isolation which Dave complains about I would
concur that the namespace approach is inappropriate.  However there are
a lot other uses.


Sure there are uses.  There are uses to putting the X server into the 
kernel, too.  At some point complexity and featuritis has to take a back 
seat to basic sanity.


Jeff


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread David Miller
From: Benny Amorsen <[EMAIL PROTECTED]>
Date: 23 Jun 2007 23:22:38 +0200

> Policy routing just doesn't cut it; it's cumbersome to set up, limited
> to 256 tables

False.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 23 Jun 2007 15:41:16 -0600

> If you want the argument to compile out.  That is not a problem at all.
> I dropped that part from my patch because it makes infrastructure more
> complicated and there appeared to be no gain.  However having a type
> that you can pass that the compiler can optimize away is not a
> problem.  Basically you just make the argument:
> 
> typedef struct {} you_can_compile_me_out;  /* when you don't want it. */
> typedef void * you_can_compile_me_out; /* when you do want it. */
> 
> And gcc will generate no code to pass the argument when you compile
> it out.

I don't want to have to see or be aware of the types or the
fact that we support namespaces when I work on the networking
code.

This is why I like the security layer in the kernel we have,
I can disable it and it's completely not there.  And I can
be completely ignorant of it's existence when I work on the
networking stack.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sat, 23 Jun 2007 16:56:49 -0600

> If the only use was strong isolation which Dave complains about I would
> concur that the namespace approach is inappropriate.  However there are
> a lot other uses.

By your very admission the only appropriate use case is when users
are not "hostile" and can be trusted to some extent.

And that by definition makes it not appropriate for a general purpose
operating system like Linux.

Containers are I believe a step backwards, and we're better than that.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-24 Thread Benny Amorsen
> "DM" == David Miller <[EMAIL PROTECTED]> writes:

DM> Containers are I believe a step backwards, and we're better than
DM> that.

Are there any alternative proposals?

I guess it would be a start if you could run processes with a
different policy table as default. Ideally traffic from those
processes would automatically enter a particular policy table without
having to manually set up "ip rule" to catch their traffic. ip route
would by default show that policy table; quagga would by default
manage that table.

I think that would provide me with the functionality I need, and it
would not pretend to provide isolation.


/Benny


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-24 Thread Eric W. Biederman
David Miller <[EMAIL PROTECTED]> writes:

> From: [EMAIL PROTECTED] (Eric W. Biederman)
> Date: Sat, 23 Jun 2007 16:56:49 -0600
>
>> If the only use was strong isolation which Dave complains about I would
>> concur that the namespace approach is inappropriate.  However there are
>> a lot other uses.
>
> By your very admission the only appropriate use case is when users
> are not "hostile" and can be trusted to some extent.

Yes.  Like all of Linux.  Totally hostile antagonistic users that
have millions or billions of dollars to spend better figuring 
out how to mess each other up and do bad things to each other should
not be running code on the same machine.

> And that by definition makes it not appropriate for a general purpose
> operating system like Linux.

Not at all.  The security should be at least as good as between different
users today, and most likely better.

> Containers are I believe a step backwards, and we're better than that.

I heartily disagree.

I do agree that if someone can write a minimal hypervisor it is possible
to provide stronger guarantees of separation on the same hardware then
it is with linux.  But that is because a hypervisor should be such a small
code base one man should be able to prove and audit the thing.  Which allows
several independent people to reproduce that effort.

So my real admission is that hypervisor should be able to do much
stronger isolation.

Namespaces come in when you want to do silly little things like share the
sysadmin responsibility, or not modify applications that make silly assumptions
or don't want to take the huge resource consumption hit hypervisors require.

Eric
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-24 Thread Eric W. Biederman
David Miller <[EMAIL PROTECTED]> writes:

> From: [EMAIL PROTECTED] (Eric W. Biederman)
> Date: Sat, 23 Jun 2007 15:41:16 -0600
>
>> If you want the argument to compile out.  That is not a problem at all.
>> I dropped that part from my patch because it makes infrastructure more
>> complicated and there appeared to be no gain.  However having a type
>> that you can pass that the compiler can optimize away is not a
>> problem.  Basically you just make the argument:
>> 
>> typedef struct {} you_can_compile_me_out;  /* when you don't want it. */
>> typedef void * you_can_compile_me_out; /* when you do want it. */
>> 
>> And gcc will generate no code to pass the argument when you compile
>> it out.
>
> I don't want to have to see or be aware of the types or the
> fact that we support namespaces when I work on the networking
> code.
>
> This is why I like the security layer in the kernel we have,
> I can disable it and it's completely not there.  And I can
> be completely ignorant of it's existence when I work on the
> networking stack.

That is a mostly reasonable objection.  I certainly see minimal
impact to people working on the code long term is an important goal.

My impression of the current linux security subsystem is a non-flexible
brittle iptables that works between processes.  I'm rather jaded in
that regard because I have fixed several things and broken the security
subsystem, and people yell.  So I'm not certain if it is good for anything
or if we can actually ignore it right now.  However for most of networking
we can ignore it.

What makes namespaces interesting to me is that they enable us to do things
that we can't do with linux today.  Sorry you seem to have a security
mono-focus so this bears repeating.



I'm not trying to hide network namespaces because I think that is
potentially a much more serious maintenance issue.

During the merge and enabling of this we have to audit everything or
not be responsible programmers.  So that is going to be slightly
intrusive.

I am convinced I can keep network namespaces something that is so
trivial and obvious to get right you won't have to pay attention
to them.

Mostly it isn't the core paths in the network that are a challenge to
work with but the little out of the way paths that don't take a
network device or a socket, and things like the initialization code
which is almost never touched. 


I don't actually thing the core fast paths of the network stack even
make namespaces visible.  I will go back an reexamine that.


You are asking for something much more challenging though.  You are
asking for absolutely no foot-print.  Something that is a completely a
side-car.  Given the thousands of global variables in the networking
code I don't know if that is possible to implement.

Eric

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-24 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sun, 24 Jun 2007 06:58:54 -0600

> I am convinced I can keep network namespaces something that is so
> trivial and obvious to get right you won't have to pay attention
> to them.

Ok then, I'll hold you to this when you post the rest of your
implementation :-)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-25 Thread Serge E. Hallyn
Quoting David Miller ([EMAIL PROTECTED]):
> From: [EMAIL PROTECTED] (Eric W. Biederman)
> Date: Sat, 23 Jun 2007 11:19:34 -0600
> 
> > Further and fundamentally all a global achieves is removing the need
> > for the noise patches where you pass the pointer into the various
> > functions.  For long term maintenance it doesn't help anything.
> 
> I don't accept that we have to add another function argument
> to a bunch of core routines just to support this crap,
> especially since you give no way to turn it off and get
> that function argument slot back.
> 
> To be honest I think this form of virtualization is a complete
> waste of time, even the openvz approach.
> 
> We're protecting the kernel from itself, and that's an endless
> uphill battle that you will never win.  Let's do this kind of

Hi David,

just to be clear this isn't so much about security.  Security can be
provided using selinux, just as with the userid namespace.  But like
with the userid namespace, this provides usability for the virtual
servers, plus some support for restarting checkpointed applications.

That doesn't attempt to justify the extra argument - if you don't
like it, you don't like it  :)  Just wanted to clarify.

thanks,
-serge

> stuff properly with a real minimal hypervisor, hopefully with
> appropriate hardware level support and good virtualized device
> interfaces, instead of this namespace stuff.
> 
> At least the hypervisor approach you have some chance to fully
> harden in some verifyable and truly protected way, with
> namespaces it's just a pipe dream and everyone who works on
> these namespace approaches knows that very well.
> 
> The only positive thing that came out of this work is the
> great auditing that the openvz folks have done and the bugs
> they have found, but it basically ends right there.
> ___
> Containers mailing list
> [EMAIL PROTECTED]
> https://lists.linux-foundation.org/mailman/listinfo/containers
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-25 Thread Serge E. Hallyn
Quoting Jeff Garzik ([EMAIL PROTECTED]):
> Eric W. Biederman wrote:
> >Jeff Garzik <[EMAIL PROTECTED]> writes:
> >
> >>David Miller wrote:
> >>>I don't accept that we have to add another function argument
> >>>to a bunch of core routines just to support this crap,
> >>>especially since you give no way to turn it off and get
> >>>that function argument slot back.
> >>>
> >>>To be honest I think this form of virtualization is a complete
> >>>waste of time, even the openvz approach.
> >>>
> >>>We're protecting the kernel from itself, and that's an endless
> >>>uphill battle that you will never win.  Let's do this kind of
> >>>stuff properly with a real minimal hypervisor, hopefully with
> >>>appropriate hardware level support and good virtualized device
> >>>interfaces, instead of this namespace stuff.
> >>Strongly seconded.  This containerized virtualization approach just 
> >>bloats up
> >>the kernel for something that is inherently fragile and IMO less secure --
> >>protecting the kernel from itself.
> >>
> >>Plenty of other virt approaches don't stir the code like this, while
> >>simultaneously providing fewer, more-clean entry points for the 
> >>virtualization
> >>to occur.
> >
> >Wrong.  I really don't want to get into a my virtualization approach is 
> >better
> >then yours.  But this is flat out wrong.
> 
> >99% of the changes I'm talking about introducing are just:
> >- variable 
> >+ ptr->variable
> >
> >There are more pieces mostly with when we initialize those variables but
> >that is the essence of the change.
> 
> You completely dodged the main objection.  Which is OK if you are 
> selling something to marketing departments, but not OK
> 
> Containers introduce chroot-jail-like features that give one a false 
> sense of security, while still requiring one to "poke holes" in the 
> illusion to get hardware-specific tasks accomplished.
> 
> The capable/not-capable model (i.e. superuser / normal user) is _still_ 
> being secured locally, even after decades of work and whitepapers and 
> audits.
> 
> You are drinking Deep Kool-Aid if you think adding containers to the 
> myriad kernel subsystems does anything besides increasing fragility, and 
> decreasing security.  You are securing in-kernel subsystems against 
> other in-kernel subsystems.

No we're not.  As the name 'network namespaces' implies, we are
introducing namespaces for network-related variables.  That's it.

We are not trying to protect in-kernel subsystems from each other.  In
fact we're not even trying to protect userspace process from each other.
Though that will in part come free when user processes can't access each
other's data because they are in different namespaces.  But using an
LSM like selinux or a custom one to tag and enforce isolation would
still be encouraged.

> superuser/user model made that difficult 
> enough... now containers add exponential audit complexity to that.  Who 
> is to say that a local root does not also pierce the container model?

At the moment it does.

> >And as opposed to other virtualization approaches so far no one has been
> >able to measure the overhead.  I suspect there will be a few more cache
> >line misses somewhere but they haven't shown up yet.
> >
> >If the only use was strong isolation which Dave complains about I would
> >concur that the namespace approach is inappropriate.  However there are
> >a lot other uses.
> 
> Sure there are uses.  There are uses to putting the X server into the 
> kernel, too.  At some point complexity and featuritis has to take a back 
> seat to basic sanity.

Generally true, yes.

-serge
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-26 Thread Eric W. Biederman
David Miller <[EMAIL PROTECTED]> writes:

> From: [EMAIL PROTECTED] (Eric W. Biederman)
> Date: Sun, 24 Jun 2007 06:58:54 -0600
>
>> I am convinced I can keep network namespaces something that is so
>> trivial and obvious to get right you won't have to pay attention
>> to them.
>
> Ok then, I'll hold you to this when you post the rest of your
> implementation :-)

Sounds fair to me.  I definitely don't want a network stack
that is noticeably harder to maintain.

Eric
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-28 Thread Kirill Korotaev
Eric W. Biederman wrote:
> Patrick McHardy <[EMAIL PROTECTED]> writes:
> 
> 
>>Eric W. Biederman wrote:
>>
>>>-- The basic design
>>>
>>>There will be a network namespace structure that holds the global
>>>variables for a network namespace, making those global variables
>>>per network namespace.
>>>
>>>One of those per network namespace global variables will be the
>>>loopback device.  Which means the network namespace a packet resides
>>>in can be found simply by examining the network device or the socket
>>>the packet is traversing.
>>>
>>>Either a pointer to this global structure will be passed into
>>>the functions that need to reference per network namespace variables
>>>or a structure that is already passed in (such as the network device)
>>>will be modified to contain a pointer to the network namespace
>>>structure.
>>
>>
>>I believe OpenVZ stores the current namespace somewhere global,
>>which avoids passing the namespace around. Couldn't you do this
>>as well?
> 
> 
> It sucks.  Especially in the corner cases.   Think macvlan
> with the real network device in one namespace and the ``vlan''
> device in another device.

sorry, what's the problem here? I don't see a single problem here
related to the global current context.
 
> The implementation of a global is also pretty a little questionable.
> Last I looked it didn't work on the transmit path at all and
> interesting on the receive path.
> 
> Further and fundamentally all a global achieves is removing the need
> for the noise patches where you pass the pointer into the various
> functions.  For long term maintenance it doesn't help anything.

this is not 100% true.

Having global context also helps:
1. to account for time spent in network processing or some other activity
   (like IRQ handling) to appropriate namespace.

2. it is really usefull and helpfull for functions like printk() with 
virtualized
   syslog buffer. Otherwise you would need to find a context in every place 
where
   namespace private message should be printed.

   Actually it is the same natural as the 'current' which is global,
   but could be passed from entry.S instead to every function which needs it.

   OVZ experience just tells me that this helps to avoid all this noise
   and not harder to work with then with 'current'. But it's up to you.

Thanks,
Kirill
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Patrick McHardy
Eric W. Biederman wrote:
> -- The basic design
> 
> There will be a network namespace structure that holds the global
> variables for a network namespace, making those global variables
> per network namespace.
> 
> One of those per network namespace global variables will be the
> loopback device.  Which means the network namespace a packet resides
> in can be found simply by examining the network device or the socket
> the packet is traversing.
> 
> Either a pointer to this global structure will be passed into
> the functions that need to reference per network namespace variables
> or a structure that is already passed in (such as the network device)
> will be modified to contain a pointer to the network namespace
> structure.


I believe OpenVZ stores the current namespace somewhere global,
which avoids passing the namespace around. Couldn't you do this
as well?

> Depending upon the data structure it will either be modified to hold
> a per entry network namespace pointer or it there will be a separate
> copy per network namespace.  For large global data structures like
> the ipv4 routing cache hash table adding an additional pointer to the
> entries appears the more reasonable solution.


So the routing cache is shared between all namespaces?

> --- Performance
> 
> In initial measurements the only performance overhead we have been
> able to measure is getting the packet to the network namespace.
> Going through ethernet bridging or routing seems to trigger copies
> of the packet that slow things down.  When packets go directly to
> the network namespace no performance penalty has yet been measured.


It would be interesting to find out whats triggering these copies.
Do you have NAT enabled?
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Ben Greear

Patrick McHardy wrote:

Eric W. Biederman wrote:
  

-- The basic design

There will be a network namespace structure that holds the global
variables for a network namespace, making those global variables
per network namespace.

One of those per network namespace global variables will be the
loopback device.  Which means the network namespace a packet resides
in can be found simply by examining the network device or the socket
the packet is traversing.

Either a pointer to this global structure will be passed into
the functions that need to reference per network namespace variables
or a structure that is already passed in (such as the network device)
will be modified to contain a pointer to the network namespace
structure.




I believe OpenVZ stores the current namespace somewhere global,
which avoids passing the namespace around. Couldn't you do this
as well?
  

Will we be able to have a single application be in multiple name-spaces?

Thanks,
Ben

--
Ben Greear <[EMAIL PROTECTED]> 
Candela Technologies Inc  http://www.candelatech.com



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Eric W. Biederman
Patrick McHardy <[EMAIL PROTECTED]> writes:

> Eric W. Biederman wrote:
>> -- The basic design
>> 
>> There will be a network namespace structure that holds the global
>> variables for a network namespace, making those global variables
>> per network namespace.
>> 
>> One of those per network namespace global variables will be the
>> loopback device.  Which means the network namespace a packet resides
>> in can be found simply by examining the network device or the socket
>> the packet is traversing.
>> 
>> Either a pointer to this global structure will be passed into
>> the functions that need to reference per network namespace variables
>> or a structure that is already passed in (such as the network device)
>> will be modified to contain a pointer to the network namespace
>> structure.
>
>
> I believe OpenVZ stores the current namespace somewhere global,
> which avoids passing the namespace around. Couldn't you do this
> as well?

It sucks.  Especially in the corner cases.   Think macvlan
with the real network device in one namespace and the ``vlan''
device in another device.

The implementation of a global is also pretty a little questionable.
Last I looked it didn't work on the transmit path at all and
interesting on the receive path.

Further and fundamentally all a global achieves is removing the need
for the noise patches where you pass the pointer into the various
functions.  For long term maintenance it doesn't help anything.

All of the other changes such as messing with the
initialization/cleanup and changing access to access the per network
namespace data structure, and modifying the code partly along the way
to reject working in other non-default network namespaces that are
truly intrusive we both still have to make.

So except as an implementation detail how we pass the per network
namespace pointer is uninteresting.

Currently I am trying for the least clever most straight forward
implementation I can find, that doesn't give us a regression 
in network stack performance.

So yes if we want to do passing through a magic per cpu global on
the packet receive path now is the time to decide to do that.
Currently I don't see the advantage in doing that so I'm not
suggesting it.

In general if there are any specific objections people have written
complicated code that allows us to avoid those objections, so it
should just be a matter of dusting those patches off.  I would much
rather go with something stupid and simple if people are willing to
merge that however.

>> Depending upon the data structure it will either be modified to hold
>> a per entry network namespace pointer or it there will be a separate
>> copy per network namespace.  For large global data structures like
>> the ipv4 routing cache hash table adding an additional pointer to the
>> entries appears the more reasonable solution.
>
>
> So the routing cache is shared between all namespaces?

Yes.  Each namespaces has it's own view so semantically it's not
shared.  But the initial fan out of the hash table 2M or something
isn't something we want to replicate on a per namespace basis even
assuming the huge page allocations could happen.

So we just tag the entries and add the network namespace as one more
part of the key when doing hash table look ups.

>> --- Performance
>> 
>> In initial measurements the only performance overhead we have been
>> able to measure is getting the packet to the network namespace.
>> Going through ethernet bridging or routing seems to trigger copies
>> of the packet that slow things down.  When packets go directly to
>> the network namespace no performance penalty has yet been measured.
>
>
> It would be interesting to find out whats triggering these copies.
> Do you have NAT enabled?

I would have to go back and look.  There was a skb_cow call someplace
in the routing path.  Something else with ipfilter, ethernet bridging.
So yes it is probably interesting to dig into.  

So the thread where we dug into this last time to the point of
identifying the problem is here:
https://lists.linux-foundation.org/pipermail/containers/2007-March/004309.html
The problem in the bridging was here:
https://lists.linux-foundation.org/pipermail/containers/2007-March/004336.html
I can't find a good pointer to the bit of discussion that described
the routing.  I just remember it was an skb_cow somewhere in the
routing output path, I believe at the point where we write in the new
destination IP.  I haven't a clue why the copy was triggering.

Design wise the interesting bit was it nothing was measurable when the
network device was in the network namespace.  So adding an extra
pointer parameter to functions and dereferencing the pointer has not
measurably affected performance at this point.

Eric





-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Eric W. Biederman
Stephen Hemminger <[EMAIL PROTECTED]> writes:

> On Sat, 23 Jun 2007 08:20:40 -0700
> Ben Greear <[EMAIL PROTECTED]> wrote:
>
>> Patrick McHardy wrote:
>> > Eric W. Biederman wrote:
>> >   
>> >> -- The basic design
>> >>
>> >> There will be a network namespace structure that holds the global
>> >> variables for a network namespace, making those global variables
>> >> per network namespace.
>> >>
>> >> One of those per network namespace global variables will be the
>> >> loopback device.  Which means the network namespace a packet resides
>> >> in can be found simply by examining the network device or the socket
>> >> the packet is traversing.
>> >>
>> >> Either a pointer to this global structure will be passed into
>> >> the functions that need to reference per network namespace variables
>> >> or a structure that is already passed in (such as the network device)
>> >> will be modified to contain a pointer to the network namespace
>> >> structure.
>> >> 
>> >
>> >
>> > I believe OpenVZ stores the current namespace somewhere global,
>> > which avoids passing the namespace around. Couldn't you do this
>> > as well?
>
> Maybe the current namespace should be attached to something else
> like sysfs root? Having multiple namespace indirection possiblities
> leads to interesting cases where current namespace is not correctly
> associated with current sysfs tree or current proc tree, ...

Yes.  There are some oddities there.  

In my current tree there is code that makes proc and sysfs match the inspecting
process.

I haven't quite solved the inspection problem where we want to look at
the namespace of a different process.  But as long as we have clean
code to do the basics that isn't a big leap when we come to it.

I'm not really seeing any problems along this line at this point.

The big problem at this point is code review and merging, and in
particular breaking this work up into small enough pieces that they
can be digested, successfully code reviewed and merged.

Eric
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Eric W. Biederman
Ben Greear <[EMAIL PROTECTED]> writes:

> Patrick McHardy wrote:
>> Eric W. Biederman wrote:
>>
>>> -- The basic design
>>>
>>> There will be a network namespace structure that holds the global
>>> variables for a network namespace, making those global variables
>>> per network namespace.
>>>
>>> One of those per network namespace global variables will be the
>>> loopback device.  Which means the network namespace a packet resides
>>> in can be found simply by examining the network device or the socket
>>> the packet is traversing.
>>>
>>> Either a pointer to this global structure will be passed into
>>> the functions that need to reference per network namespace variables
>>> or a structure that is already passed in (such as the network device)
>>> will be modified to contain a pointer to the network namespace
>>> structure.
>>>
>>
>>
>> I believe OpenVZ stores the current namespace somewhere global,
>> which avoids passing the namespace around. Couldn't you do this
>> as well?
>>
> Will we be able to have a single application be in multiple name-spaces?

A single application certainly.   But then an application can be composed
of multiple processes which can be composed of multiple threads.

In my current patches a single task_struct belongs to a single network
namespace.  That namespace is used when creating sockets.  The sockets
themselves have a namespace tag and that is used when transmitting
packets, or otherwise operating on the socket.

So if you pass a socket from one process to another you can have
sockets that belong to different network namespaces in a single task.

Eric
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Stephen Hemminger
On Sat, 23 Jun 2007 08:20:40 -0700
Ben Greear <[EMAIL PROTECTED]> wrote:

> Patrick McHardy wrote:
> > Eric W. Biederman wrote:
> >   
> >> -- The basic design
> >>
> >> There will be a network namespace structure that holds the global
> >> variables for a network namespace, making those global variables
> >> per network namespace.
> >>
> >> One of those per network namespace global variables will be the
> >> loopback device.  Which means the network namespace a packet resides
> >> in can be found simply by examining the network device or the socket
> >> the packet is traversing.
> >>
> >> Either a pointer to this global structure will be passed into
> >> the functions that need to reference per network namespace variables
> >> or a structure that is already passed in (such as the network device)
> >> will be modified to contain a pointer to the network namespace
> >> structure.
> >> 
> >
> >
> > I believe OpenVZ stores the current namespace somewhere global,
> > which avoids passing the namespace around. Couldn't you do this
> > as well?

Maybe the current namespace should be attached to something else
like sysfs root? Having multiple namespace indirection possiblities
leads to interesting cases where current namespace is not correctly
associated with current sysfs tree or current proc tree, ...


> Will we be able to have a single application be in multiple name-spaces?

That would break the whole point of namespaces...

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Patrick McHardy
Eric W. Biederman wrote:
> Patrick McHardy <[EMAIL PROTECTED]> writes:
> 
>>I believe OpenVZ stores the current namespace somewhere global,
>>which avoids passing the namespace around. Couldn't you do this
>>as well?
> 
> 
> It sucks.  Especially in the corner cases.   Think macvlan
> with the real network device in one namespace and the ``vlan''
> device in another device.
> 
> The implementation of a global is also pretty a little questionable.
> Last I looked it didn't work on the transmit path at all and
> interesting on the receive path.
> 
> Further and fundamentally all a global achieves is removing the need
> for the noise patches where you pass the pointer into the various
> functions.  For long term maintenance it doesn't help anything.
> 
> All of the other changes such as messing with the
> initialization/cleanup and changing access to access the per network
> namespace data structure, and modifying the code partly along the way
> to reject working in other non-default network namespaces that are
> truly intrusive we both still have to make.
> 
> So except as an implementation detail how we pass the per network
> namespace pointer is uninteresting.
> 
> Currently I am trying for the least clever most straight forward
> implementation I can find, that doesn't give us a regression 
> in network stack performance.
> 
> So yes if we want to do passing through a magic per cpu global on
> the packet receive path now is the time to decide to do that.
> Currently I don't see the advantage in doing that so I'm not
> suggesting it.


I think your approach is fine and is probably a lot easier
to review than using something global.

>>>Depending upon the data structure it will either be modified to hold
>>>a per entry network namespace pointer or it there will be a separate
>>>copy per network namespace.  For large global data structures like
>>>the ipv4 routing cache hash table adding an additional pointer to the
>>>entries appears the more reasonable solution.
>>
>>
>>So the routing cache is shared between all namespaces?
> 
> 
> Yes.  Each namespaces has it's own view so semantically it's not
> shared.  But the initial fan out of the hash table 2M or something
> isn't something we want to replicate on a per namespace basis even
> assuming the huge page allocations could happen.
> 
> So we just tag the entries and add the network namespace as one more
> part of the key when doing hash table look ups.


I can wait for the patches, but I would be interested in how
GC is performed and whether limits can be configured per
namespace.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Eric W. Biederman
Patrick McHardy <[EMAIL PROTECTED]> writes:

Depending upon the data structure it will either be modified to hold
a per entry network namespace pointer or it there will be a separate
copy per network namespace.  For large global data structures like
the ipv4 routing cache hash table adding an additional pointer to the
entries appears the more reasonable solution.
>>>
>>>
>>>So the routing cache is shared between all namespaces?
>> 
>> 
>> Yes.  Each namespaces has it's own view so semantically it's not
>> shared.  But the initial fan out of the hash table 2M or something
>> isn't something we want to replicate on a per namespace basis even
>> assuming the huge page allocations could happen.
>> 
>> So we just tag the entries and add the network namespace as one more
>> part of the key when doing hash table look ups.
>
>
> I can wait for the patches, but I would be interested in how
> GC is performed and whether limits can be configured per
> namespace.

Currently I believe the gc code is unmodified in my patches.

Currently I have been focusing on the normal semantics and just
making something work in a mergeable fashion.

Limits and the like are comparatively easy to add in after the
rest is working so I haven't been focusing on that.

Eric
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Devel] Re: [RFD] L2 Network namespace infrastructure

2007-06-27 Thread Kirill Korotaev
Ben Greear wrote:
> Patrick McHardy wrote:
> 
>>Eric W. Biederman wrote:
>>  
>>
>>>-- The basic design
>>>
>>>There will be a network namespace structure that holds the global
>>>variables for a network namespace, making those global variables
>>>per network namespace.
>>>
>>>One of those per network namespace global variables will be the
>>>loopback device.  Which means the network namespace a packet resides
>>>in can be found simply by examining the network device or the socket
>>>the packet is traversing.
>>>
>>>Either a pointer to this global structure will be passed into
>>>the functions that need to reference per network namespace variables
>>>or a structure that is already passed in (such as the network device)
>>>will be modified to contain a pointer to the network namespace
>>>structure.
>>>
>>
>>
>>I believe OpenVZ stores the current namespace somewhere global,
>>which avoids passing the namespace around. Couldn't you do this
>>as well?
>>  
> 
> Will we be able to have a single application be in multiple name-spaces?

Application itself can't be in multiple namespaces in both approaches.
But the objects which belong to the application (e.g. sockets opened by an app)
can belong to different namespaces.

Thanks,
Kirill

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Devel] Re: [RFD] L2 Network namespace infrastructure

2007-06-27 Thread Kirill Korotaev
Patrick McHardy wrote:
> Eric W. Biederman wrote:
> 
>>-- The basic design
>>
>>There will be a network namespace structure that holds the global
>>variables for a network namespace, making those global variables
>>per network namespace.
>>
>>One of those per network namespace global variables will be the
>>loopback device.  Which means the network namespace a packet resides
>>in can be found simply by examining the network device or the socket
>>the packet is traversing.
>>
>>Either a pointer to this global structure will be passed into
>>the functions that need to reference per network namespace variables
>>or a structure that is already passed in (such as the network device)
>>will be modified to contain a pointer to the network namespace
>>structure.
> 
> 
> 
> I believe OpenVZ stores the current namespace somewhere global,
> which avoids passing the namespace around. Couldn't you do this
> as well?

yes, we store a global namespace context on current
(can be stored in per-cpu as well).

do you prefer this way?

Thanks,
Kirill

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Devel] Re: [RFD] L2 Network namespace infrastructure

2007-06-27 Thread Patrick McHardy

Kirill Korotaev wrote:

Patrick McHardy wrote:
  



I believe OpenVZ stores the current namespace somewhere global,
which avoids passing the namespace around. Couldn't you do this
as well?



yes, we store a global namespace context on current
(can be stored in per-cpu as well).

do you prefer this way?


I'm not sure, it would avoid passing it around everywhere, but I'd
expect it to be harder to review because you can't simply follow it
as it gets passed around.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Devel] Re: [RFD] L2 Network namespace infrastructure

2007-06-27 Thread Ben Greear

Kirill Korotaev wrote:

Patrick McHardy wrote:
  

I believe OpenVZ stores the current namespace somewhere global,
which avoids passing the namespace around. Couldn't you do this
as well?



yes, we store a global namespace context on current
(can be stored in per-cpu as well).

do you prefer this way?
  
For what it's worth, I don't prefer this way as I can see wanting to 
have one application

use several namespaces at once

Thanks,
Ben


Thanks,
Kirill

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  



--
Ben Greear <[EMAIL PROTECTED]> 
Candela Technologies Inc  http://www.candelatech.com



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Devel] Re: [RFD] L2 Network namespace infrastructure

2007-06-27 Thread Kirill Korotaev
Jeff Garzik wrote:
> Eric W. Biederman wrote:
> 
>>Jeff Garzik <[EMAIL PROTECTED]> writes:
>>
>>
>>>David Miller wrote:
>>>
I don't accept that we have to add another function argument
to a bunch of core routines just to support this crap,
especially since you give no way to turn it off and get
that function argument slot back.

To be honest I think this form of virtualization is a complete
waste of time, even the openvz approach.

We're protecting the kernel from itself, and that's an endless
uphill battle that you will never win.  Let's do this kind of
stuff properly with a real minimal hypervisor, hopefully with
appropriate hardware level support and good virtualized device
interfaces, instead of this namespace stuff.
>>>
>>>Strongly seconded.  This containerized virtualization approach just bloats up
>>>the kernel for something that is inherently fragile and IMO less secure --
>>>protecting the kernel from itself.
>>>
>>>Plenty of other virt approaches don't stir the code like this, while
>>>simultaneously providing fewer, more-clean entry points for the 
>>>virtualization
>>>to occur.
>>
>>Wrong.  I really don't want to get into a my virtualization approach is better
>>then yours.  But this is flat out wrong.
> 
> 
>>99% of the changes I'm talking about introducing are just:
>>- variable 
>>+ ptr->variable
>>
>>There are more pieces mostly with when we initialize those variables but
>>that is the essence of the change.
> 
> 
> You completely dodged the main objection.  Which is OK if you are 
> selling something to marketing departments, but not OK

It is a pure illusion that one kind of virtualization is better then the other 
one.
Look, maybe *hardware* virtualization and a small size of the hypervisor
make you feel safer, however you totally forget about all the emulation drivers 
etc.
which can have bugs and security implications as well and maybe triggerable
from inside VM.

> Containers introduce chroot-jail-like features that give one a false 
> sense of security, while still requiring one to "poke holes" in the 
> illusion to get hardware-specific tasks accomplished.

The concepts of users in any OS (and in Linux in particular) give people
a false sense of security as well!
I know a bunch of examples of how one user can crash/DoS/abuse another one
e.g. on RHEL5 or mainstream.
But no one have problems with that and no one thinks that multiuserness
is a step-backward or maybe someone does?!

Yes, there are always bugs and sometimes security implications,
but people leave fine with it when it is fixed.

> The capable/not-capable model (i.e. superuser / normal user) is _still_ 
> being secured locally, even after decades of work and whitepapers and 
> audits.

> You are drinking Deep Kool-Aid if you think adding containers to the 
> myriad kernel subsystems does anything besides increasing fragility, and 
> decreasing security.  You are securing in-kernel subsystems against 
> other in-kernel subsystems.  superuser/user model made that difficult 
> enough... now containers add exponential audit complexity to that.  Who 
> is to say that a local root does not also pierce the container model?

containers do the only thing:
make sure that objects from one context are not visible to another one.

If containers are not used - everything returns to the case as it is now, i.e.
everything is global and globally visible and no auditing overhead at all.
So if you are not interested in containers - the code auditing
won't be noticably harder for you.

>>And as opposed to other virtualization approaches so far no one has been
>>able to measure the overhead.  I suspect there will be a few more cache
>>line misses somewhere but they haven't shown up yet.
>>
>>If the only use was strong isolation which Dave complains about I would
>>concur that the namespace approach is inappropriate.  However there are
>>a lot other uses.
> 
> 
> Sure there are uses.  There are uses to putting the X server into the 
> kernel, too.  At some point complexity and featuritis has to take a back 
> seat to basic sanity.

I agree about sanity, however I totally disagree about complexity
you talk about.
Bugs we face/fix in Linux kernel which are found with the help of
that kind of virtualization makes me believe that Linux kernel
only wins from it.

Thanks,
Kirill

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Devel] Re: [RFD] L2 Network namespace infrastructure

2007-06-28 Thread Kirill Korotaev
Ben Greear wrote:
> Kirill Korotaev wrote:
> 
>>Patrick McHardy wrote:
>>  
>>
>>>I believe OpenVZ stores the current namespace somewhere global,
>>>which avoids passing the namespace around. Couldn't you do this
>>>as well?
>>>
>>
>>yes, we store a global namespace context on current
>>(can be stored in per-cpu as well).
>>
>>do you prefer this way?
>>  
> 
> For what it's worth, I don't prefer this way as I can see wanting to 
> have one application
> use several namespaces at once

As I wrote to you in another email, it's not a problem,
since applications itself do not own network namespace.

It is objects like sockets which are binded to net namespace and thus
you can have sockets from different namespaces in one application
regardless of mechanism of context handling we talk about.

Thanks,
Kirill

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html