Quoting Daniel Lezcano ([EMAIL PROTECTED]): > Serge E. Hallyn wrote: >> Quoting Daniel Lezcano ([EMAIL PROTECTED]): >>> Serge E. Hallyn wrote: >>>> Quoting [EMAIL PROTECTED] ([EMAIL PROTECTED]): >>>>> From: Daniel Lezcano <[EMAIL PROTECTED]> >>>>> >>>>> For the moment, I only made this patch for the RFC. It shows how simple >>>>> it is >>>>> to hook different socket syscalls. This patch denies bind to any >>>>> addresses >>>>> which are not in the container IPV4 address list, except for the >>>>> INADDR_ANY. >>>>> >>>>> Signed-off-by: Daniel Lezcano <[EMAIL PROTECTED]> >>>>> >>>>> --- >>>>> kernel/container_network.c | 66 >>>>> +++++++++++++++++++++++---------------------- >>>>> 1 file changed, 35 insertions(+), 31 deletions(-) >>>>> >>>>> Index: 2.6-mm/kernel/container_network.c >>>>> =================================================================== >>>>> --- 2.6-mm.orig/kernel/container_network.c >>>>> +++ 2.6-mm/kernel/container_network.c >>>>> @@ -12,6 +12,9 @@ >>>>> #include <linux/list.h> >>>>> #include <linux/spinlock.h> >>>>> #include <linux/security.h> >>>>> +#include <linux/in.h> >>>>> +#include <linux/net.h> >>>>> +#include <linux/socket.h> >>>>> >>>>> struct network { >>>>> struct container_subsys_state css; >>>>> @@ -53,24 +56,14 @@ >>>>> >>>>> static int network_socket_create(int family, int type, int protocol, >>>>> int kern) >>>>> { >>>>> - struct network *network; >>>>> - >>>>> - network = task_network(current); >>>>> - if (!network || network == &top_network) >>>>> - return 0; >>>>> - >>>>> + /* nothing to do right now */ >>>>> return 0; >>>>> } >>>>> >>>>> static int network_socket_post_create(struct socket *sock, int family, >>>>> int type, int protocol, int kern) >>>>> { >>>>> - struct network *network; >>>>> - >>>>> - network = task_network(current); >>>>> - if (!network || network == &top_network) >>>>> - return 0; >>>>> - >>>>> + /* nothing to do right now */ >>>>> return 0; >>>>> } >>>>> >>>>> @@ -79,47 +72,58 @@ >>>> Please so send -p diffs. I'll assume this is network_socket_bind() >>>> given your patch description :) >>>>> int addrlen) >>>>> { >>>>> struct network *network; >>>>> + struct list_head *l; >>>>> + rwlock_t *lock; >>>>> + struct ipv4_list *entry; >>>>> + __be32 addr; >>>>> + int ret = -EPERM; >>>>> >>>>> + /* Do nothing for the root container */ >>>>> network = task_network(current); >>>>> if (!network || network == &top_network) >>>>> return 0; >>>>> >>>>> - return 0; >>>>> + /* Check we have to do some filtering */ >>>>> + if (sock->ops->family != AF_INET) >>>>> + return 0; >>>>> + >>>>> + l = &network->ipv4_list; >>>>> + lock = &network->ipv4_list_lock; >>>>> + addr = ((struct sockaddr_in *)address)->sin_addr.s_addr; >>>>> + >>>>> + if (addr == INADDR_ANY) >>>> In bsdjail, if addr == INADDR_ANY, I set addr = jailaddr. Do you think >>>> you want to do that? >>> Good question. This is one think I would like to define. If we do that we >>> can not connect via 127.0.0.1. and|or a container can have more than one >>> IP address, no ? >> Yes. >>> IMHO, we should have the loopback address available for all containers >>> and that means 127.0.0.1 is an IP address which is not isolated. >> For real network namespaces yes. For this version, I would have thought >> the goal would be to provide a minimal, useful, but very fast >> container-paddress binding. >> I guess I'll have to see the rest of your implementation, but I have the >> feeling that to not have this limitation you'll affect performance a >> bit. And since we are also working on full network namespaces, >> providing maximal functionality with worse performance would be a poor >> tradeoff here. >> But let's see the rest of your implementation. >> Did you mention somewhere that Eric still prefers using netfilter rather >> than LSM? > > Paul told me about a ip isolation based on the netfilter and a specific > iptable module: > > ------------------------------------------------------------------------ > " > On 9/6/07, Daniel Lezcano <[EMAIL PROTECTED]> wrote: > > > > > > I am really not opposed to iptables, I was thinking that if we want to > > > have bind filtering, security provides the framework for that and > adding > > > new hooks for the iptable will just add a hook duplication because they > > > are the same. > > > > > > So the result is: > > > > > > 1 - create a container => network.ipv4 (allowed addresses) > > > 2 - echo add 192.168.20.10 > network.ipv4 > > > > > > The application running inside the container can not use another > address > > > than the one assigned to it. > > > > > > This features is needed for some IP jailing like linux-vserver or for > > > security. The association container + IP isolation is really a good > feature. > > > > >> > > For instance, I personally am much more interested in being able to > >> > > control ports rather than IP addresses (although that could be > >> > > interesting too). > > > > > > What do want to do ? Can you describe the features you want ? > > > Is it a bind filtering for port ? If this is the case, then I can add > > > two new files: > > > network.tcp.ports > > > network.udp.ports > > > and extend the hooks to check the port too. > > > > > I think that (at least today :-) ) my ideal interface would be a list > of tuples of the form: > > local port range/remote ip address/remote ip mask/remote port range > > because I don't really care about multiple local addresses, but I do > care about binding to local ports and connecting to remote addresses > and ports. > > But other people (e.g. Eric) have completely different requirements. > Creating an API and mechanism that satisfies everyone is going to > result in you reimplementing a significant chunk of the iptables > functionality. > > > > > >> > > And someone else might have completely different > >> > > needs (e.g. people mentioned IPv6). Rather than you having to > >> > > implement all of these things, just giving a tag that can be tied to > >> > > iptables means that people can define these rules themselves in > >> > > userspace. > > > > > > I understand. But I don't see how we can handle bind filtering (ip | > port). > > > > > 1) Completely separately from containers, we create a new iptable > called something like "socket", with predefined chains BIND, ACCEPT > and CONNECT. When ever someone tries to do a bind(), accept() or > connect() we create a fake packet with the appropriate local and > remote addresses and ports, and feed it through the appropriate chain. > If it gets though OK we allow the operation to proceed, else we fail > with EPERM. > > 2) We create a new container (control group) subsystem, e.g. called > "network_id" that does two things: > > - creates a simple state object with a single uniquely-generated > integer network_id for each control group > > - provides a new iptable match module ("control_group"?) that matches > if the current task's network_id is within a given range. > > Then the user can create pretty much arbitrary rules with the existing > iptables tools and primitives. No complex new user APIs needed. > > Paul" > ------------------------------------------------------------------------ > >> So what... if a packet comes in with a certain destination >> address you can tag it with a container, and once a connection starts >> you can use connection tracking to continue tagging it with that >> container. You tag an outgoing packet with the container as soon as >> it's dumped in the socket, and rules enforce that the source address be >> valid for that container. Are you saying the netfilter hooks are in >> the wrong places for that? > > No, for that netfilters hooks are in the right place.
Ok, then that approach definately has its upsides. The only downside I see right now is what to do about a sendto() on a udp socket that hasn't been bound. -serge _______________________________________________ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers _______________________________________________ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel