Re: strict isolation of net interfaces

2006-07-04 Thread Daniel Lezcano

Sam Vilain wrote:

> Daniel Lezcano wrote:
>
>> If it is ok for you, we can collaborate to merge the two solutions in
>> one. I will focus on layer 3 isolation and you on the layer 2.
>
> So, you're writing a LSM module or adapting the BSD Jail LSM, right? :)
>
> Sam.

No. I am adapting a prototype of a network application container that we did.

  -- Daniel
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: strict isolation of net interfaces

2006-07-04 Thread Sam Vilain
Daniel Lezcano wrote:
> 
> If it is ok for you, we can collaborate to merge the two solutions in
> one. I will focus on layer 3 isolation and you on the layer 2.

So, you're writing a LSM module or adapting the BSD Jail LSM, right? :)

Sam.


Re: strict isolation of net interfaces

2006-07-04 Thread Daniel Lezcano

Andrey Savochkin wrote:

> I still can't completely understand your direction of thoughts.
> Could you elaborate on IP address assignment in your diagram, please?  For
> example, guest0 wants 127.0.0.1 and 192.168.0.1 addresses on its lo
> interface, and 10.1.1.1 on its eth0 interface.
> Does this diagram assume any local IP addresses on v* interfaces in the
> "host"?
>
> And the second question.
> Are vlo0, veth0, etc. devices supposed to have hard_xmit routines?



Andrey,

some people are interested in full network isolation/virtualization, like 
you did with the layer 2 isolation, and other people are interested in a 
lighter network isolation done at layer 3. The latter is intended to 
implement an "application container", aka a "lightweight container".


In the case of layer 3 isolation, the network interface is not totally 
isolated, and the debate here is to find an intuitive way to manage the 
network devices.


IMHO, all the discussion we have had has convinced me of the need to be 
able to choose between layer 2 and layer 3 isolation.


If it is ok for you, we can collaborate to merge the two solutions in 
one. I will focus on layer 3 isolation and you on the layer 2.


Regards

  - Daniel


Re: strict isolation of net interfaces

2006-07-03 Thread Sam Vilain
Andrey Savochkin wrote:
>> Why special case loopback?
>>
>> Why not:
>>
>> host  |  guest 0  |  guest 1  |  guest2
>> --+---+---+--
>>   |   |   |   |
>>   |-> lo  |   |   |
>>   |   |   |   |
>>   |-> vlo0  <-+-> lo  |   |
>>   |   |   |   |
>>   |-> vlo1  <-+---+---+-> lo
>>   |   |   |   |
>>   |-> vlo2   <+---+-> lo  |
>>   |   |   |   |
>>   |-> eth0|   |   |
>>   |   |   |   |
>>   |-> veth0  <+-> eth0|   |
>>   |   |   |   |
>>   |-> veth1  <+---+---+-> eth0
>>   |   |   |   |
>>   |-> veth2   <---+---+-> eth0|
>> 
>
> I still can't completely understand your direction of thoughts.
> Could you elaborate on IP address assignment in your diagram, please?  For
> example, guest0 wants 127.0.0.1 and 192.168.0.1 addresses on its lo
> interface, and 10.1.1.1 on its eth0 interface.
> Does this diagram assume any local IP addresses on v* interfaces in the
> "host"?
>   

Well, Eric already pointed out some pretty good reasons why this thread
should die.

The idea is that each "lo" interface would have the same set of
addresses, which would make routing on the host confusing. Yet another
reason to kill this idea. Let's just make better tools instead.

Sam.

> And the second question.
> Are vlo0, veth0, etc. devices supposed to have hard_xmit routines?
>   



Re: strict isolation of net interfaces

2006-07-03 Thread Andrey Savochkin
Sam, Serge, Cedric,

On Fri, Jun 30, 2006 at 02:49:05PM +1200, Sam Vilain wrote:
> Serge E. Hallyn wrote:
> > The last one in your diagram confuses me - why foo0:1?  I would
> > have thought it'd be
> >
> > host  |  guest 0  |  guest 1  |  guest2
> > --+---+---+--
> >   |   |   |   |
> >   |-> l0  <---+-> lo0 ... | lo0   | lo0
> >   |   |   |   |
> >   |-> eth0|   |   |
> >   |   |   |   |
> >   |-> veth0  <+-> eth0|   |
> >   |   |   |   |
> >   |-> veth1  <+---+---+-> eth0
> >   |   |   |   |
> >   |-> veth2   <---+---+-> eth0|
> >
> > [...]
> >
> > So conceptually using a full virtual net device per container
> > certainly seems cleaner to me, and it seems like it should be
> > simpler by way of statistics gathering etc, but are there actually
> > any real gains?  Or is the support for multiple IPs per device
> > actually enough?
> >   
> 
> Why special case loopback?
> 
> Why not:
> 
> host  |  guest 0  |  guest 1  |  guest2
> --+---+---+--
>   |   |   |   |
>   |-> lo  |   |   |
>   |   |   |   |
>   |-> vlo0  <-+-> lo  |   |
>   |   |   |   |
>   |-> vlo1  <-+---+---+-> lo
>   |   |   |   |
>   |-> vlo2   <+---+-> lo  |
>   |   |   |   |
>   |-> eth0|   |   |
>   |   |   |   |
>   |-> veth0  <+-> eth0|   |
>   |   |   |   |
>   |-> veth1  <+---+---+-> eth0
>   |   |   |   |
>   |-> veth2   <---+---+-> eth0|

I still can't completely understand your direction of thoughts.
Could you elaborate on IP address assignment in your diagram, please?  For
example, guest0 wants 127.0.0.1 and 192.168.0.1 addresses on its lo
interface, and 10.1.1.1 on its eth0 interface.
Does this diagram assume any local IP addresses on v* interfaces in the
"host"?

And the second question.
Are vlo0, veth0, etc. devices supposed to have hard_xmit routines?

Best regards

Andrey


Re: strict isolation of net interfaces

2006-07-03 Thread Herbert Poetzl
On Fri, Jun 30, 2006 at 10:56:13AM +0200, Cedric Le Goater wrote:
> Serge E. Hallyn wrote:
> > 
> > The last one in your diagram confuses me - why foo0:1?  I would
> > have thought it'd be
> 
> just thinking aloud. I thought that any kind/type of interface could be
> mapped from host to guest.
> 
> > host  |  guest 0  |  guest 1  |  guest2
> > --+---+---+--
> >   |   |   |   |
> >   |-> l0  <---+-> lo0 ... | lo0   | lo0
> >   |   |   |   |
> >   |-> eth0|   |   |
> >   |   |   |   |
> >   |-> veth0  <+-> eth0|   |
> >   |   |   |   |
> >   |-> veth1  <+---+---+-> eth0
> >   |   |   |   |
> >   |-> veth2   <---+---+-> eth0|
> > 
> > I think we should avoid using device aliases, as trying to do
> > something like giving eth0:1 to guest1 and eth0:2 to guest2
> > while hiding eth0:1 from guest2 requires some uglier code (as
> > I recall) than working with full devices.  In other words,
> > if a namespace can see eth0, and eth0:2 exists, it should always
> > see eth0:2.
> > 
> > So conceptually using a full virtual net device per container
> > certainly seems cleaner to me, and it seems like it should be
> > simpler by way of statistics gathering etc, but are there actually
> > any real gains?  Or is the support for multiple IPs per device
> > actually enough?
> > 
> > Herbert, is this basically how ngnet is supposed to work?

hard to tell, we have at least three ngnet prototypes
and basically all variants are covered there, from
separate interfaces which map to real ones to perfect
isolation of addresses assigned to global interfaces

IMHO the 'virtual' interface per guest is fine, as
the overhead and consumed resources are non critical
and it will definitely simplify handling for the
guest side

I'd really appreciate it if we could find a solution which
allows both isolation and virtualization, and if the
bridge scenario is as fast as a direct mapping, I'm
perfectly fine with a big bridge + ebtables to handle
security issues

best,
Herbert



Re: strict isolation of net interfaces

2006-06-30 Thread Eric W. Biederman
Daniel Lezcano <[EMAIL PROTECTED]> writes:

> Serge E. Hallyn wrote:
>> Quoting Cedric Le Goater ([EMAIL PROTECTED]):
>>
>>>we could work on virtualizing the net interfaces in the host, map them to
>>>eth0 or something in the guest and let the guest handle upper network layers 
>>>?
>>>
>>>lo0 would just be exposed relying on skbuff tagging to discriminate traffic
>>>between guests.
>> This seems to me the preferable way.  We create a full virtual net
>> device for each new container, and fully virtualize the device
>> namespace.
>

Answers with respect to how I see layer 2 isolation,
with network devices and sockets as well as the associated routing
information given per namespace.

> I have a few questions about all the network isolation stuff:
>
>   * What level of isolation is wanted for the network ? network devices ?
> IPv4/IPv6 ? TCP/UDP ?
>
>   * How are the incoming packets from the network handled ? I mean what will
> be the mechanism to dispatch the packet to the right virtual device ?

Wrong question.  A better question is how you know which namespace
a packet is in.
Answer:  By looking at which device or socket it just came from.

How do you get a packet into a non-default namespace?
Either you move a real network interface into that namespace.
Or you use a tunnel device that shows up as two network interfaces in
two different namespaces.

Then you route, or bridge packets between the two.  Trivial.
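Eric's dispatch rule above can be sketched as a toy model (Python, purely illustrative and not from any actual implementation; the device and namespace names are hypothetical): a packet's namespace is determined solely by the device it arrived on, so no per-packet translation table is needed.

```python
# Toy model: each device belongs to exactly one namespace, and a
# packet's namespace is determined by its ingress device alone.

# One end of a "veth"-style tunnel lives in the host namespace,
# the other end in a guest namespace (hypothetical names).
device_namespace = {
    "eth0": "host",           # real NIC, kept in the host namespace
    "veth0": "host",          # host end of the tunnel
    "eth0@guest0": "guest0",  # guest end, seen as "eth0" inside guest0
}

def namespace_of_packet(ingress_device: str) -> str:
    """The only lookup needed: which namespace owns the ingress device?"""
    return device_namespace[ingress_device]

# A packet entering on the real NIC is in the host namespace; the host
# then routes or bridges it into the tunnel, and it re-enters the stack
# in the guest namespace on the guest end of the tunnel.
assert namespace_of_packet("eth0") == "host"
assert namespace_of_packet("eth0@guest0") == "guest0"
```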

>   * How to handle the SO_BINDTODEVICE socket option ?

Just like we do now.
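"Just like we do now" means the ordinary setsockopt() path is unchanged; a guest simply names the device it sees in its own namespace. A minimal sketch of how an application uses SO_BINDTODEVICE today (Python, illustrative; "eth0" is just an example name, and the call requires CAP_NET_RAW, so unprivileged runs get EPERM):

```python
import errno
import socket

# SO_BINDTODEVICE is Linux-only; fall back to its Linux value (25)
# on Pythons that do not export the constant.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    # Tie the socket to one interface; the name is NUL-terminated.
    s.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE, b"eth0\0")
    result = "bound"
except OSError as e:
    # Without CAP_NET_RAW the kernel refuses with EPERM; a missing
    # device gives ENODEV.  Either way the API surface is unchanged.
    result = errno.errorcode.get(e.errno, "unknown")
finally:
    s.close()

print(result)  # "bound", "EPERM", or "ENODEV" depending on privileges
```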

>   * Has the virtual device a different MAC address ? 

All network devices are abstractions of the hardware, so they are all
sort of virtual.  My implementation of a tunnel device has a MAC
address so I can use it with Ethernet bridging, but that isn't a hard
requirement.  And yes, the MAC address is different, because you can't
do layer 2 switching if everyone has the same MAC address.
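One common way virtual interfaces get distinct addresses is to generate a locally administered MAC (a generic scheme, sketched here in Python; this is not Eric's actual tunnel code):

```python
import random

def random_local_mac() -> str:
    """Generate a locally administered, unicast MAC address.

    Virtual/tunnel interfaces commonly pick such addresses so each
    guest-facing device is distinct and layer 2 switching/bridging
    can tell them apart.
    """
    # First octet: set the locally-administered bit (0x02),
    # clear the multicast bit (0x01).
    first = (random.randrange(256) | 0x02) & 0xFE
    rest = (random.randrange(256) for _ in range(5))
    return ":".join(f"{b:02x}" for b in (first, *rest))

mac = random_local_mac()
first_octet = int(mac.split(":")[0], 16)
assert first_octet & 0x02       # locally administered
assert not first_octet & 0x01   # unicast, not multicast
```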

But there is no special ``virtual'' device.

> How to manage it with the real MAC address on the system ? 
Manage?

> How to manage ARP, ICMP, multicasting and IP ?

Like you always do.  It would be a terrible implementation if
we had to change that logic.  There is a little bit of that
where we need to detect which network namespace we are going to, because
the answers can differ, but that is pretty straightforward.

> It seems to me, IMHO, that this will require a lot of translation and table
> lookups. It will probably add a very significant overhead.

Then look at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-ns.git#proof-of-concept
or the OpenVZ implementation.  

It isn't serious overhead.

>* How to handle NFS access mounted outside of the container ?

The socket should remember its network namespace.
It works fine.

>* How to handle ICMP_REDIRECT ?

Just like we always do?

Eric




Re: strict isolation of net interfaces

2006-06-30 Thread Eric W. Biederman
Daniel Lezcano <[EMAIL PROTECTED]> writes:

> Eric W. Biederman wrote:
>> Daniel Lezcano <[EMAIL PROTECTED]> writes:
>>
>>> Serge E. Hallyn wrote:
>>>> Quoting Cedric Le Goater ([EMAIL PROTECTED]):
>>>>> we could work on virtualizing the net interfaces in the host, map them to
>>>>> eth0 or something in the guest and let the guest handle upper network
>>>>> layers ?
>>>>>
>>>>> lo0 would just be exposed relying on skbuff tagging to discriminate traffic
>>>>> between guests.
>>>> This seems to me the preferable way.  We create a full virtual net
>>>> device for each new container, and fully virtualize the device
>>>> namespace.
>>> I have a few questions about all the network isolation stuff:
>
> It seems these questions are not important.

I'm just trying to get us back to a productive topic.

>> So far I have seen two viable possibilities on the table,
>> neither of them involve multiple names for a network device.
>> layer 3 (filtering the allowed ip addresses at bind time roughly the current
>> vserver).
>>   - implementable as a security hook.
>>   - Benefit no measurable performance impact.
>>   - Downside not many things we can do.
>
> What things ? Can you develop please ? Can you give some examples ?

DHCP, tcpdump,..  Probably a bad way of phrasing it.  But there
is a lot more that we can do using a pure layer 2 approach.

>> layer 2 (What appears to applications a separate instance of the network
>> stack).
>>   - Implementable as a namespace.
>
> what about accessing a NFS mounted outside the container ?

As I replied earlier, it isn't a problem.  If you get to it through the
filesystem namespace, it uses the network namespace it was mounted with
for its connection.

>>   - Each network namespace would have dedicated network devices.
>>   - Benefit extremely flexible.
>
> For what ? For who ? Do you have examples ?

See above.

>>   - Downside since at least the slow path must examine the packet
>> it has the possibility of slowing down the networking stack.
>
> What is/are the slow path(s) you identified ?

Grr.  I put that badly.  Basically, at least on the slow path, you need to
look at a per network namespace data structure.  The extra pointer
indirection could slow things down.  The point is that we may be
able to have a fast path that is exactly the same as the rest
of the network stack.

If the obvious approach does not work, my gut feeling is that the
network stack fast path will still give us an implementation without
overhead.

>> For me the important characteristics.
>> - Allows for application migration, when we take our ip address with us.
>>   In particular it allows for importation of address assignments
>>   made on other machines.
>
> Ok for the two methods no ?

So far.

>> - No measurable impact on the existing networking when the code
>>   is compiled in.
>
> You contradict ...

How so?  As far as I can tell this is a basic requirement to get
merged.

>> - Clean predictable semantics.
>
> What does that mean ? Can you explain, please ?

>> This whole debate on network devices showing up in multiple network namespaces
>> is just silly.
>
> The debate is not about how network devices show up. The debate is whether we
> can have network isolation ___usable for everybody___, not only for the beauty
> of having namespaces and for something like a system container.

This subthread about devices showing up in multiple namespaces seemed
to be very much exactly about how network devices show up.

> I am not against the network device virtualization or against the namespaces.
> I am just asking if the namespace is the solution for all the network
> isolation. Should we nest layer 2 and layer 3 virtualization into namespaces
> or separate them in order to have the flexibility to choose
> isolation/performance.

I believe I addressed Herbert Poetzl's concerns earlier.  To me the question
is whether we can implement an acceptable layer 2 solution that distributions
and other people who do not need isolation would have no problem compiling in
by default.

The joy of namespaces is that if you don't want it you don't have to use it.
Layer 2 can do everything and is likely usable by everyone iff the performance
is acceptable.

>> The only reason for wanting that appears to be better management.
>> We have deeper issues like can we do a reasonable implementation without a
>> network device showing up in multiple namespaces.
>
> Again, I am not against having the network device virtualization. It is a good
> idea.
>
>> I think the reason the debate exists at all is that it is a very approachable
>> topic, as opposed to the fundamentals here.
>> If we can get layer 2 level isolation working without measurable overhead
>> with one namespace per device it may be worth revisiting things.  Until
>> then it is a side issue at best.
>
> I agree, so where are the answers to the questions I asked in my previous
> email ? You said you did some implementation of network isolation with and
> without namespaces, so you should be able to answer...

Re: strict isolation of net interfaces

2006-06-30 Thread Eric W. Biederman
"Serge E. Hallyn" <[EMAIL PROTECTED]> writes:

> Quoting Eric W. Biederman ([EMAIL PROTECTED]):
> This whole debate on network devices showing up in multiple network namespaces
> is just silly.  The only reason for wanting that appears to be better
> management.
>
> A damned good reason.  

Better management is a good reason.  But constructing the management in 
a way that hampers the implementation and confuses existing applications is
a problem.

Things are much easier if namespaces are completely independent.

Among other things the semantics are clear and obvious.

> Clearly we want the parent namespace to be able
> to control what the child can do.  So whatever interface a child gets,
> the parent should be able to somehow address.  Simple iptables rules
> controlling traffic between its own netdevice and the one it hands its
> children seem a good option.

That or we setup the child and then drop CAP_NET_ADMIN.

>> We have deeper issues like can we do a reasonable implementation without a
>> network device showing up in multiple namespaces.
>
> Isn't that the same issue?

I guess I was thinking from the performance and cleanliness point of
view.

>> If we can get layer 2 level isolation working without measurable overhead
>> with one namespace per device it may be worth revisiting things.  Until
>> then it is a side issue at best.
>
> Ok, and in the meantime we can all use the network part of the bsdjail
> lsm?  :)

If necessary.  But mostly we concentrate on the fundamentals and figure
out what it takes to get the layer 2 stuff working.

Eric



Re: strict isolation of net interfaces

2006-06-30 Thread Serge E. Hallyn
Quoting Eric W. Biederman ([EMAIL PROTECTED]):
> This whole debate on network devices showing up in multiple network namespaces
> is just silly.  The only reason for wanting that appears to be better
> management.

A damned good reason.  Clearly we want the parent namespace to be able
to control what the child can do.  So whatever interface a child gets,
the parent should be able to somehow address.  Simple iptables rules
controlling traffic between its own netdevice and the one it hands its
children seem a good option.
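The iptables rules Serge suggests might look roughly like the following (Python used only to compose the illustrative rule text; the device name "veth0" and the delegated subnet are hypothetical, and actually installing such rules requires root):

```python
# Illustrative only: rules a host might install to police traffic
# between its end of a guest's device pair and the guest itself.
host_end = "veth0"          # host side of the pair handed to guest0
guest_net = "10.1.1.0/24"   # addresses delegated to guest0

rules = [
    # allow the guest to talk only from its delegated addresses
    f"iptables -A FORWARD -i {host_end} -s {guest_net} -j ACCEPT",
    # drop anything else the guest tries to source
    f"iptables -A FORWARD -i {host_end} -j DROP",
]

for rule in rules:
    print(rule)
```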

> We have deeper issues like can we do a reasonable implementation without a
> network device showing up in multiple namespaces.

Isn't that the same issue?

> If we can get layer 2 level isolation working without measurable overhead
> with one namespace per device it may be worth revisiting things.  Until
> then it is a side issue at best.

Ok, and in the meantime we can all use the network part of the bsdjail
lsm?  :)

-serge


Re: strict isolation of net interfaces

2006-06-30 Thread Daniel Lezcano

Eric W. Biederman wrote:

> Daniel Lezcano <[EMAIL PROTECTED]> writes:
>
>> Serge E. Hallyn wrote:
>>> Quoting Cedric Le Goater ([EMAIL PROTECTED]):
>>>> we could work on virtualizing the net interfaces in the host, map them to
>>>> eth0 or something in the guest and let the guest handle upper network
>>>> layers ?
>>>>
>>>> lo0 would just be exposed relying on skbuff tagging to discriminate traffic
>>>> between guests.
>>> This seems to me the preferable way.  We create a full virtual net
>>> device for each new container, and fully virtualize the device
>>> namespace.
>>
>> I have a few questions about all the network isolation stuff:

It seems these questions are not important.

> So far I have seen two viable possibilities on the table,
> neither of them involve multiple names for a network device.
>
> layer 3 (filtering the allowed IP addresses at bind time, roughly the current
> vserver).
>   - implementable as a security hook.
>   - Benefit no measurable performance impact.
>   - Downside not many things we can do.

What things ? Can you develop please ? Can you give some examples ?

> layer 2 (what appears to applications as a separate instance of the network
> stack).
>   - Implementable as a namespace.

what about accessing a NFS mounted outside the container ?

>   - Each network namespace would have dedicated network devices.
>   - Benefit extremely flexible.

For what ? For who ? Do you have examples ?

>   - Downside since at least the slow path must examine the packet
>     it has the possibility of slowing down the networking stack.

What is/are the slow path(s) you identified ?

> For me the important characteristics.
> - Allows for application migration, when we take our ip address with us.
>   In particular it allows for importation of address assignments
>   made on other machines.

Ok for the two methods no ?

> - No measurable impact on the existing networking when the code
>   is compiled in.

You contradict ...

> - Clean predictable semantics.

What does that mean ? Can you explain, please ?

> This whole debate on network devices showing up in multiple network namespaces
> is just silly.

The debate is not about how network devices show up. The debate is whether we
can have network isolation ___usable for everybody___, not only for the beauty
of having namespaces and for something like a system container.

I am not against the network device virtualization or against the
namespaces. I am just asking if the namespace is the solution for all
the network isolation. Should we nest layer 2 and layer 3 virtualization
into namespaces or separate them in order to have the flexibility to
choose isolation/performance.

> The only reason for wanting that appears to be better management.
> We have deeper issues like can we do a reasonable implementation without a
> network device showing up in multiple namespaces.

Again, I am not against having the network device virtualization. It is
a good idea.

> I think the reason the debate exists at all is that it is a very approachable
> topic, as opposed to the fundamentals here.
>
> If we can get layer 2 level isolation working without measurable overhead
> with one namespace per device it may be worth revisiting things.  Until
> then it is a side issue at best.

I agree, so where are the answers to the questions I asked in my
previous email ? You said you did some implementation of network
isolation with and without namespaces, so you should be able to answer...

  -- Daniel


Re: strict isolation of net interfaces

2006-06-30 Thread Eric W. Biederman
Daniel Lezcano <[EMAIL PROTECTED]> writes:

> Serge E. Hallyn wrote:
>> Quoting Cedric Le Goater ([EMAIL PROTECTED]):
>>
>>>we could work on virtualizing the net interfaces in the host, map them to
>>>eth0 or something in the guest and let the guest handle upper network
>>>layers ?
>>>
>>>lo0 would just be exposed relying on skbuff tagging to discriminate traffic
>>>between guests.
>> This seems to me the preferable way.  We create a full virtual net
>> device for each new container, and fully virtualize the device
>> namespace.
>
> I have a few questions about all the network isolation stuff:

So far I have seen two viable possibilities on the table,
neither of them involve multiple names for a network device.

layer 3 (filtering the allowed IP addresses at bind time, roughly the current
vserver).
  - implementable as a security hook.
  - Benefit no measurable performance impact.
  - Downside not many things we can do.
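The layer 3 option filters at bind() time, vserver-style. A toy model of such a check (Python, purely illustrative; the container names, addresses, and function are hypothetical, and a real security hook would also handle the wildcard address and IPv6):

```python
import ipaddress

# Hypothetical per-container assignment, in the spirit of vserver's
# layer 3 isolation: a guest may bind only to addresses it was given.
CONTAINER_IPS = {
    "guest0": {"10.1.1.1", "127.0.0.1"},
    "guest1": {"10.1.1.2"},
}

def bind_allowed(container: str, requested: str) -> bool:
    """Toy bind-time filter, the kind of check a security hook could make.

    A real hook would also rewrite or restrict binds to 0.0.0.0 so the
    wildcard only covers the container's own set; omitted here.
    """
    allowed = {ipaddress.ip_address(a) for a in CONTAINER_IPS[container]}
    return ipaddress.ip_address(requested) in allowed

assert bind_allowed("guest0", "10.1.1.1")
assert not bind_allowed("guest1", "10.1.1.1")
```

Because the check happens once at bind time, the packet paths themselves are untouched, which is why this option has no measurable per-packet cost.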

layer 2 (what appears to applications as a separate instance of the network
stack).
  - Implementable as a namespace.
  - Each network namespace would have dedicated network devices.
  - Benefit extremely flexible.
  - Downside since at least the slow path must examine the packet
it has the possibility of slowing down the networking stack.


For me the important characteristics.
- Allows for application migration, when we take our ip address with us.
  In particular it allows for importation of address assignments
  made on other machines.

- No measurable impact on the existing networking when the code
  is compiled in.

- Clean predictable semantics.


This whole debate on network devices showing up in multiple network namespaces
is just silly.  The only reason for wanting that appears to be better
management.
We have deeper issues like can we do a reasonable implementation without a
network device showing up in multiple namespaces.

I think the reason the debate exists at all is that it is a very approachable
topic, as opposed to the fundamentals here.

If we can get layer 2 level isolation working without measurable overhead
with one namespace per device it may be worth revisiting things.  Until
then it is a side issue at best.

Eric


Re: strict isolation of net interfaces

2006-06-30 Thread Daniel Lezcano

Serge E. Hallyn wrote:

> Quoting Cedric Le Goater ([EMAIL PROTECTED]):
>> we could work on virtualizing the net interfaces in the host, map them to
>> eth0 or something in the guest and let the guest handle upper network
>> layers ?
>>
>> lo0 would just be exposed relying on skbuff tagging to discriminate traffic
>> between guests.
>
> This seems to me the preferable way.  We create a full virtual net
> device for each new container, and fully virtualize the device
> namespace.

I have a few questions about all the network isolation stuff:

  * What level of isolation is wanted for the network ? network devices ?
IPv4/IPv6 ? TCP/UDP ?

  * How are the incoming packets from the network handled ? I mean what
will be the mechanism to dispatch the packet to the right virtual device ?

  * How to handle the SO_BINDTODEVICE socket option ?

  * Does the virtual device have a different MAC address ? How to manage it
with the real MAC address on the system ? How to manage ARP, ICMP,
multicasting and IP ?

It seems to me, IMHO, that this will require a lot of translation and
table lookups. It will probably add a very significant overhead.

  * How to handle NFS access mounted outside of the container ?

  * How to handle ICMP_REDIRECT ?

Regards


Re: strict isolation of net interfaces

2006-06-30 Thread Cedric Le Goater
Serge E. Hallyn wrote:
> 
> The last one in your diagram confuses me - why foo0:1?  I would
> have thought it'd be

just thinking aloud. I thought that any kind/type of interface could be
mapped from host to guest.

> host  |  guest 0  |  guest 1  |  guest2
> --+---+---+--
>   |   |   |   |
>   |-> l0  <---+-> lo0 ... | lo0   | lo0
>   |   |   |   |
>   |-> eth0|   |   |
>   |   |   |   |
>   |-> veth0  <+-> eth0|   |
>   |   |   |   |
>   |-> veth1  <+---+---+-> eth0
>   |   |   |   |
>   |-> veth2   <---+---+-> eth0|
> 
> I think we should avoid using device aliases, as trying to do
> something like giving eth0:1 to guest1 and eth0:2 to guest2
> while hiding eth0:1 from guest2 requires some uglier code (as
> I recall) than working with full devices.  In other words,
> if a namespace can see eth0, and eth0:2 exists, it should always
> see eth0:2.
> 
> So conceptually using a full virtual net device per container
> certainly seems cleaner to me, and it seems like it should be
> simpler by way of statistics gathering etc, but are there actually
> any real gains?  Or is the support for multiple IPs per device
> actually enough?
> 
> Herbert, is this basically how ngnet is supposed to work?


Re: strict isolation of net interfaces

2006-06-29 Thread Sam Vilain
Serge E. Hallyn wrote:
> The last one in your diagram confuses me - why foo0:1?  I would
> have thought it'd be
>
> host  |  guest 0  |  guest 1  |  guest2
> --+---+---+--
>   |   |   |   |
>   |-> l0  <---+-> lo0 ... | lo0   | lo0
>   |   |   |   |
>   |-> eth0|   |   |
>   |   |   |   |
>   |-> veth0  <+-> eth0|   |
>   |   |   |   |
>   |-> veth1  <+---+---+-> eth0
>   |   |   |   |
>   |-> veth2   <---+---+-> eth0|
>
> [...]
>
> So conceptually using a full virtual net device per container
> certainly seems cleaner to me, and it seems like it should be
> simpler by way of statistics gathering etc, but are there actually
> any real gains?  Or is the support for multiple IPs per device
> actually enough?
>   

Why special case loopback?

Why not:

host  |  guest 0  |  guest 1  |  guest2
--+---+---+--
  |   |   |   |
  |-> lo  |   |   |
  |   |   |   |
  |-> vlo0  <-+-> lo  |   |
  |   |   |   |
  |-> vlo1  <-+---+---+-> lo
  |   |   |   |
  |-> vlo2   <+---+-> lo  |
  |   |   |   |
  |-> eth0|   |   |
  |   |   |   |
  |-> veth0  <+-> eth0|   |
  |   |   |   |
  |-> veth1  <+---+---+-> eth0
  |   |   |   |
  |-> veth2   <---+---+-> eth0|


Sam.


Re: strict isolation of net interfaces

2006-06-29 Thread Serge E. Hallyn
Quoting Cedric Le Goater ([EMAIL PROTECTED]):
> Sam Vilain wrote:
> > jamal wrote:
> >>> note: personally I'm absolutely not against virtualizing
> >>> the device names so that each guest can have a separate
> >>> name space for devices, but there should be a way to
> >>> 'see' _and_ 'identify' the interfaces from outside
> >>> (i.e. host or spectator context)
> >>>
> >>> 
> >> Makes sense for the host side to have naming convention tied
> >> to the guest. Example as a prefix: guest0-eth0. Would it not
> >> be interesting to have the host also manage these interfaces
> >> via standard tools like ip or ifconfig etc? i.e if i admin up
> >> guest0-eth0, then the user in guest0 will see its eth0 going
> >> up.
> > 
> > That particular convention only works if you have network namespaces and
> > UTS namespaces tightly bound.  We plan to have them separate - so for
> > that to work, each network namespace could have an arbitrary "prefix"
> > that determines what the interface name will look like from the outside
> > when combined.  We'd have to be careful about length limits.
> > 
> > And guest0-eth0 doesn't necessarily make sense; it's not really an
> > ethernet interface, more like a tun or something.
> > 
> > So, an equally good convention might be to use sequential prefixes on
> > the host, like "tun", "dummy", or a new prefix - then a property of that
> > is what the name of the interface is perceived to be to those who are in
> > the corresponding network namespace.
> > 
> > Then the pragmatic question becomes how to correlate what you see from
> > `ip addr list' to guests.
> 
> 
> we could work on virtualizing the net interfaces in the host, map them to
> eth0 or something in the guest and let the guest handle upper network layers ?
> 
> lo0 would just be exposed relying on skbuff tagging to discriminate traffic
> between guests.

This seems to me the preferable way.  We create a full virtual net
device for each new container, and fully virtualize the device
namespace.

> host  |  guest 0  |  guest 1  |  guest2
> --+---+---+--
>   |   |   |   |
>   |-> l0  <---+-> lo0 ... | lo0   | lo0
>   |   |   |   |
>   |-> bar0   <+-> eth0|   |
>   |   |   |   |
>   |-> foo0   <+---+---+-> eth0
>   |   |   |   |
>   `-> foo0:1  <---+---+-> eth0|
>   |   |   |
> 
> 
> is that clear ? stupid ? reinventing the wheel ?

The last one in your diagram confuses me - why foo0:1?  I would
have thought it'd be

host  |  guest 0  |  guest 1  |  guest2
--+---+---+--
  |   |   |   |
  |-> l0  <---+-> lo0 ... | lo0   | lo0
  |   |   |   |
  |-> eth0|   |   |
  |   |   |   |
  |-> veth0  <+-> eth0|   |
  |   |   |   |
  |-> veth1  <+---+---+-> eth0
  |   |   |   |
  |-> veth2   <---+---+-> eth0|

I think we should avoid using device aliases, as trying to do
something like giving eth0:1 to guest1 and eth0:2 to guest2
while hiding eth0:1 from guest2 requires some uglier code (as
I recall) than working with full devices.  In other words,
if a namespace can see eth0, and eth0:2 exists, it should always
see eth0:2.

So conceptually using a full virtual net device per container
certainly seems cleaner to me, and it seems like it should be
simpler by way of statistics gathering etc, but are there actually
any real gains?  Or is the support for multiple IPs per device
actually enough?

Herbert, is this basically how ngnet is supposed to work?

-serge


strict isolation of net interfaces

2006-06-29 Thread Cedric Le Goater
Sam Vilain wrote:
> jamal wrote:
>>> note: personally I'm absolutely not against virtualizing
>>> the device names so that each guest can have a separate
>>> name space for devices, but there should be a way to
>>> 'see' _and_ 'identify' the interfaces from outside
>>> (i.e. host or spectator context)
>>>
>>> 
>> Makes sense for the host side to have naming convention tied
>> to the guest. Example as a prefix: guest0-eth0. Would it not
>> be interesting to have the host also manage these interfaces
>> via standard tools like ip or ifconfig etc? i.e if i admin up
>> guest0-eth0, then the user in guest0 will see its eth0 going
>> up.
> 
> That particular convention only works if you have network namespaces and
> UTS namespaces tightly bound.  We plan to have them separate - so for
> that to work, each network namespace could have an arbitrary "prefix"
> that determines what the interface name will look like from the outside
> when combined.  We'd have to be careful about length limits.
> 
> And guest0-eth0 doesn't necessarily make sense; it's not really an
> ethernet interface, more like a tun or something.
> 
> So, an equally good convention might be to use sequential prefixes on
> the host, like "tun", "dummy", or a new prefix - then a property of that
> is what the name of the interface is perceived to be to those who are in
> the corresponding network namespace.
> 
> Then the pragmatic question becomes how to correlate what you see from
> `ip addr list' to guests.


we could work on virtualizing the net interfaces in the host, map them to
eth0 or something in the guest and let the guest handle upper network layers ?

lo0 would just be exposed relying on skbuff tagging to discriminate traffic
between guests.



host  |  guest 0  |  guest 1  |  guest2
--+---+---+--
  |   |   |   |
  |-> l0  <---+-> lo0 ... | lo0   | lo0
  |   |   |   |
  |-> bar0   <+-> eth0|   |
  |   |   |   |
  |-> foo0   <+---+---+-> eth0
  |   |   |   |
  `-> foo0:1  <---+---+-> eth0|
  |   |   |


is that clear ? stupid ? reinventing the wheel ?

thanks,

C.