[NET-2.6.24][VETH][patch 0/1] fix kernel Oops for veth
When I tried the veth driver, I ran into a kernel oops.

qemu login: Oops: [#1]
Modules linked in:
CPU:0
EIP:0060:[c0265c9e]Not tainted VLI
EFLAGS: 0202 (2.6.23-rc6-g754f885d-dirty #33)
EIP is at __linkwatch_run_queue+0x6a/0x175
eax: c7fc9550 ebx: 6b6b6b6b ecx: c3360c80 edx: 0246
esi: 0001 edi: 6b6b6b6b ebp: c7fd9f7c esp: c7fd9f5c
ds: 007b es: 007b fs: gs: ss: 0068
Process events/0 (pid: 5, ti=c7fd8000 task=c7fc9550 task.ti=c7fd8000)
Stack: c7fee5a8 c0387680 c7fd9f74 c02e1aaa 4f732564 c0387684 c7fee5a8 c0387680
       c7fd9f84 c0265dc9 c7fd9fac c011fb3c c7fd9f94 c02e277e c7fd9fac c02e1166
       c0265da9 c7fee5a8 c0120203 c7fd9fc8 c7fd9fd0 c01202ba c7fc9550
Call Trace:
 [c0102c69] show_trace_log_lvl+0x1a/0x2f
 [c0102d1b] show_stack_log_lvl+0x9d/0xa5
 [c0102ee1] show_registers+0x1be/0x28f
 [c010309a] die+0xe8/0x208
 [c010d5a1] do_page_fault+0x4ba/0x595
 [c02e2842] error_code+0x6a/0x70
 [c0265dc9] linkwatch_event+0x20/0x27
 [c011fb3c] run_workqueue+0x7c/0x102
 [c01202ba] worker_thread+0xb7/0xc5
 [c012270c] kthread+0x39/0x61
 [c0102913] kernel_thread_helper+0x7/0x10
 ===
Code: b8 60 76 38 c0 e8 e3 ca 07 00 b8 60 76 38 c0 8b 1d 78 a7 3d c0 c7 05 78 a7 3d c0 00 00 00 00 e8 df ca 07 00 e9 ed 00 00 00 85 f6 8b bb f4 01 00 00 74 17 89 d8 e8 73 fe ff ff 85 c0 75 0c 89 d8
EIP: [c0265c9e] __linkwatch_run_queue+0x6a/0x175 SS:ESP 0068:c7fd9f5c
Slab corruption: size-2048 start=c473eac8, len=2048
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [c025be72](free_netdev+0x1f/0x41)
200: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b c0 e2 73 c4
Prev obj: start=c473e2b0, len=2048
Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
Last user: [c025bed0](alloc_netdev_mq+0x3c/0xa1)
000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
010: 76 65 74 68 30 00 00 00 00 00 00 00 00 00 00 00
Next obj: start=c473f2e0, len=2048
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [c0260e69](neigh_sysctl_unregister+0x2b/0x2e)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

That happens when trying to add a veth device using the ip command:

  ip link add veth0

which fails.

It appears that netif_carrier_off is called from the setup function, and the setup function runs before register_netdevice. register_netdevice does a lot of initialization of the netdev, so when netif_carrier_off is called before it, it uses, and triggers an event for, an uninitialized netdev.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
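The ordering problem described above can be sketched in userspace. This is only a toy model under stated assumptions: struct toy_netdev, toy_pending_event and every toy_* function are hypothetical stand-ins, not kernel code, and the "registration fails, device is freed, event stays queued" sequence is one plausible reading of the slab poison (6b) in the trace rather than a confirmed diagnosis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for struct net_device; not the kernel structure. */
struct toy_netdev {
	bool registered;
	bool carrier_ok;
};

/* Toy link-watch queue: remembers at most one device with a pending event. */
static struct toy_netdev *toy_pending_event;

/* Models netif_carrier_off(): marks the carrier down and queues an event. */
static void toy_carrier_off(struct toy_netdev *dev)
{
	dev->carrier_ok = false;
	toy_pending_event = dev;
}

/* Models register_netdevice(); 'fail' forces the error path. */
static int toy_register(struct toy_netdev *dev, bool fail)
{
	if (fail)
		return -1;
	dev->registered = true;
	return 0;
}

/*
 * Creates a toy device and returns true if, after a failed registration,
 * the queue still points at the device that is about to be freed.
 * carrier_off_in_setup == true models the buggy order (carrier off first),
 * false models the fixed order (carrier off only after registration).
 */
static bool toy_newlink_leaves_dangling_event(bool carrier_off_in_setup,
					      bool fail_register)
{
	struct toy_netdev *dev = calloc(1, sizeof(*dev));
	bool dangling;

	assert(dev != NULL);
	toy_pending_event = NULL;
	dev->carrier_ok = true;

	if (carrier_off_in_setup)
		toy_carrier_off(dev);		/* event queued too early */

	if (toy_register(dev, fail_register) != 0) {
		dangling = (toy_pending_event == dev);
		free(dev);
		return dangling;
	}

	if (!carrier_off_in_setup)
		toy_carrier_off(dev);		/* safe: device is registered */

	toy_pending_event = NULL;		/* pretend the event was handled */
	free(dev);
	return false;
}
```

With the carrier turned off only after a successful registration, the failure path never leaves an event behind for a freed device.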
[net-2.6.24][patch 0/2] Dynamically allocate the loopback
This patchset allows the loopback to be dynamically allocated like a usual network device. The global static variable loopback_dev has been replaced by a netdev pointer, and the init function does the usual allocation, initialization and registration of the loopback.

The patchset is split into two parts. The first is a big but trivial patch which replaces the usage of the static variable loopback_dev with the usage of a pointer. The second patch is the interesting part, where the loopback is dynamically allocated.
[net-2.6.24][patch 2/2] Dynamically allocate the loopback device
From: Daniel Lezcano [EMAIL PROTECTED] Doing this makes loopback.c a better example of how to do a simple network device, and it removes the special case single static allocation of a struct net_device, hopefully making maintenance easier. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] Acked-By: Kirill Korotaev [EMAIL PROTECTED] Acked-by: Benjamin Thery [EMAIL PROTECTED] --- drivers/net/loopback.c | 69 ++--- 1 file changed, 43 insertions(+), 26 deletions(-) Index: net-2.6.24/drivers/net/loopback.c === --- net-2.6.24.orig/drivers/net/loopback.c +++ net-2.6.24/drivers/net/loopback.c @@ -202,44 +202,61 @@ static const struct ethtool_ops loopback * The loopback device is special. There is only one instance and * it is statically allocated. Don't do this for other devices. */ -struct net_device __loopback_dev = { - .name = lo, - .get_stats = get_stats, - .mtu= (16 * 1024) + 20 + 20 + 12, - .hard_start_xmit= loopback_xmit, - .hard_header= eth_header, - .hard_header_cache = eth_header_cache, - .header_cache_update= eth_header_cache_update, - .hard_header_len= ETH_HLEN, /* 14 */ - .addr_len = ETH_ALEN, /* 6*/ - .tx_queue_len = 0, - .type = ARPHRD_LOOPBACK, /* 0x0001*/ - .rebuild_header = eth_rebuild_header, - .flags = IFF_LOOPBACK, - .features = NETIF_F_SG | NETIF_F_FRAGLIST +static void loopback_setup(struct net_device *dev) +{ + dev-get_stats = get_stats; + dev-mtu= (16 * 1024) + 20 + 20 + 12; + dev-hard_start_xmit= loopback_xmit; + dev-hard_header= eth_header; + dev-hard_header_cache = eth_header_cache; + dev-header_cache_update = eth_header_cache_update; + dev-hard_header_len= ETH_HLEN; /* 14 */ + dev-addr_len = ETH_ALEN; /* 6*/ + dev-tx_queue_len = 0; + dev-type = ARPHRD_LOOPBACK; /* 0x0001*/ + dev-rebuild_header = eth_rebuild_header; + dev-flags = IFF_LOOPBACK; + dev-features = NETIF_F_SG | NETIF_F_FRAGLIST #ifdef LOOPBACK_TSO - | NETIF_F_TSO + | NETIF_F_TSO #endif - | NETIF_F_NO_CSUM | NETIF_F_HIGHDMA - | 
NETIF_F_LLTX - | NETIF_F_NETNS_LOCAL, - .ethtool_ops= loopback_ethtool_ops, - .nd_net = init_net, -}; - -struct net_device *loopback_dev = __loopback_dev; + | NETIF_F_NO_CSUM + | NETIF_F_HIGHDMA + | NETIF_F_LLTX + | NETIF_F_NETNS_LOCAL, + dev-ethtool_ops= loopback_ethtool_ops; +} /* Setup and register the loopback device. */ static int __init loopback_init(void) { - int err = register_netdev(loopback_dev); + struct net_device *dev; + int err; + + err = -ENOMEM; + dev = alloc_netdev(0, lo, loopback_setup); + if (!dev) + goto out; + err = register_netdev(dev); + if (err) + goto out_free_netdev; + + err = 0; + loopback_dev = dev; + +out: if (err) panic(loopback: Failed to register netdevice: %d\n, err); + return err; +out_free_netdev: + free_netdev(dev); + goto out; return err; }; -module_init(loopback_init); +fs_initcall(loopback_init); +struct net_device *loopback_dev; EXPORT_SYMBOL(loopback_dev); -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
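For readers who want the shape of the new init path without the diff noise, here is a minimal userspace sketch of the allocate / set up / register / free-on-error flow. Everything prefixed toy_ or TOY_ is a hypothetical stand-in and not the kernel API; unlike the posted patch, which calls panic() on failure, the sketch simply returns the error so that it can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define TOY_IFF_LOOPBACK 0x8	/* illustrative flag value */

/* Toy stand-in for struct net_device. */
struct toy_netdev {
	char name[16];
	unsigned int mtu;
	unsigned int flags;
};

/* Replaces the old static instance: a pointer filled in at init time. */
static struct toy_netdev *toy_loopback_dev;

/* Setup callback: the field assignments that used to be a static initializer. */
static void toy_loopback_setup(struct toy_netdev *dev)
{
	dev->mtu = (16 * 1024) + 20 + 20 + 12;
	dev->flags = TOY_IFF_LOOPBACK;
}

/* Models alloc_netdev(): allocate, name, then run the setup callback. */
static struct toy_netdev *toy_alloc_netdev(const char *name,
					   void (*setup)(struct toy_netdev *))
{
	struct toy_netdev *dev;

	assert(setup != NULL);
	dev = calloc(1, sizeof(*dev));
	if (!dev)
		return NULL;
	strncpy(dev->name, name, sizeof(dev->name) - 1);
	setup(dev);
	return dev;
}

/* Models register_netdev(); force_err lets a caller inject a failure. */
static int toy_register_netdev(struct toy_netdev *dev, int force_err)
{
	(void)dev;
	return force_err;
}

/* Allocate, register, and publish the loopback; free it on error. */
static int toy_loopback_init(int force_err)
{
	struct toy_netdev *dev;
	int err;

	dev = toy_alloc_netdev("lo", toy_loopback_setup);
	if (!dev)
		return -ENOMEM;

	err = toy_register_netdev(dev, force_err);
	if (err)
		goto out_free_netdev;

	toy_loopback_dev = dev;
	return 0;

out_free_netdev:
	free(dev);
	return err;
}
```

The global pointer is only assigned after registration succeeds, so other code never sees a half-initialized loopback.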
[net-2.6.24][patch 1/2] Dynamically allocate the loopback device - mindless changes
From: Daniel Lezcano [EMAIL PROTECTED]

This patch replaces all occurrences of the static variable loopback_dev with a pointer loopback_dev. It provides the mindless, trivial, uninteresting part of the changes for the dynamic allocation of the loopback.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
Acked-By: Kirill Korotaev [EMAIL PROTECTED]
Acked-by: Benjamin Thery [EMAIL PROTECTED]
---
 drivers/net/loopback.c           |    6 --
 include/linux/netdevice.h        |    2 +-
 net/core/dst.c                   |    8
 net/decnet/dn_dev.c              |    4 ++--
 net/decnet/dn_route.c            |   14 +++---
 net/ipv4/devinet.c               |    6 +++---
 net/ipv4/ipconfig.c              |    6 +++---
 net/ipv4/ipvs/ip_vs_core.c       |    2 +-
 net/ipv4/route.c                 |   18 +-
 net/ipv4/xfrm4_policy.c          |    2 +-
 net/ipv6/addrconf.c              |   15 +--
 net/ipv6/ip6_input.c             |    2 +-
 net/ipv6/netfilter/ip6t_REJECT.c |    2 +-
 net/ipv6/route.c                 |   15 ++-
 net/ipv6/xfrm6_policy.c          |    2 +-
 net/xfrm/xfrm_policy.c           |    4 ++--
 16 files changed, 55 insertions(+), 53 deletions(-)

Index: net-2.6.24/drivers/net/loopback.c
===
--- net-2.6.24.orig/drivers/net/loopback.c
+++ net-2.6.24/drivers/net/loopback.c
@@ -202,7 +202,7 @@ static const struct ethtool_ops loopback
  * The loopback device is special. There is only one instance and
  * it is statically allocated. Don't do this for other devices.
  */
-struct net_device loopback_dev = {
+struct net_device __loopback_dev = {
 	.name			= "lo",
 	.get_stats		= get_stats,
 	.mtu			= (16 * 1024) + 20 + 20 + 12,
@@ -227,10 +227,12 @@ struct net_device loopback_dev = {
 	.nd_net			= &init_net,
 };
 
+struct net_device *loopback_dev = &__loopback_dev;
+
 /* Setup and register the loopback device.
*/ static int __init loopback_init(void) { - int err = register_netdev(loopback_dev); + int err = register_netdev(loopback_dev); if (err) panic(loopback: Failed to register netdevice: %d\n, err); Index: net-2.6.24/include/linux/netdevice.h === --- net-2.6.24.orig/include/linux/netdevice.h +++ net-2.6.24/include/linux/netdevice.h @@ -742,7 +742,7 @@ struct packet_type { #include linux/interrupt.h #include linux/notifier.h -extern struct net_device loopback_dev; /* The loopback */ +extern struct net_device *loopback_dev; /* The loopback */ extern rwlock_tdev_base_lock; /* Device list lock */ Index: net-2.6.24/net/core/dst.c === --- net-2.6.24.orig/net/core/dst.c +++ net-2.6.24/net/core/dst.c @@ -278,13 +278,13 @@ static inline void dst_ifdown(struct dst if (!unregister) { dst-input = dst-output = dst_discard; } else { - dst-dev = loopback_dev; - dev_hold(loopback_dev); + dst-dev = loopback_dev; + dev_hold(dst-dev); dev_put(dev); if (dst-neighbour dst-neighbour-dev == dev) { - dst-neighbour-dev = loopback_dev; + dst-neighbour-dev = loopback_dev; dev_put(dev); - dev_hold(loopback_dev); + dev_hold(dst-neighbour-dev); } } } Index: net-2.6.24/net/decnet/dn_dev.c === --- net-2.6.24.orig/net/decnet/dn_dev.c +++ net-2.6.24/net/decnet/dn_dev.c @@ -869,10 +869,10 @@ last_chance: rv = dn_dev_get_first(dev, addr); read_unlock(dev_base_lock); dev_put(dev); - if (rv == 0 || dev == loopback_dev) + if (rv == 0 || dev == loopback_dev) return rv; } - dev = loopback_dev; + dev = loopback_dev; dev_hold(dev); goto last_chance; } Index: net-2.6.24/net/decnet/dn_route.c === --- net-2.6.24.orig/net/decnet/dn_route.c +++ net-2.6.24/net/decnet/dn_route.c @@ -887,7 +887,7 @@ static int dn_route_output_slow(struct d .scope = RT_SCOPE_UNIVERSE, } }, .mark = oldflp-mark, - .iif = loopback_dev.ifindex, + .iif = loopback_dev-ifindex, .oif = oldflp-oif }; struct dn_route *rt = NULL; struct net_device *dev_out = NULL, *dev;
[net-2.6.24][NETNS][patch 0/1] fix allnoconfig compilation error
Fixes a compilation issue when allnoconfig is used: init_net is unresolved.
[net-2.6.24][NETNS][patch 1/1] fix allnoconfig compilation error
From: Daniel Lezcano [EMAIL PROTECTED]

When CONFIG_NET=no, init_net is unresolved because net_namespace.c is not compiled while the header still pulls in the init_net declaration. This problem is very similar to the one seen with the ipc namespace, where the kernel can be compiled without SYSV ipc.

This patch fixes that by defining a macro which simply removes the init_net initialization from the nsproxy namespace aggregator.

Compiled and booted on qemu-i386 with CONFIG_NET=no and CONFIG_NET=yes.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
Acked-by: Eric W. Biederman [EMAIL PROTECTED]
---
 include/linux/init_task.h   |    2 +-
 include/net/net_namespace.h |    7 +++
 2 files changed, 8 insertions(+), 1 deletion(-)

Index: net-2.6.24/include/linux/init_task.h
===
--- net-2.6.24.orig/include/linux/init_task.h
+++ net-2.6.24/include/linux/init_task.h
@@ -79,7 +79,7 @@ extern struct nsproxy init_nsproxy;
 	.nslock		= __SPIN_LOCK_UNLOCKED(nsproxy.nslock),	\
 	.uts_ns		= &init_uts_ns,				\
 	.mnt_ns		= NULL,					\
-	.net_ns		= &init_net,				\
+	INIT_NET_NS(net_ns)					\
 	INIT_IPC_NS(ipc_ns)					\
 	.user_ns	= &init_user_ns,			\
 }
Index: net-2.6.24/include/net/net_namespace.h
===
--- net-2.6.24.orig/include/net/net_namespace.h
+++ net-2.6.24/include/net/net_namespace.h
@@ -28,7 +28,14 @@ struct net {
 	struct hlist_head	*dev_index_head;
 };
 
+#ifdef CONFIG_NET
+/* Init's network namespace */
 extern struct net init_net;
+#define INIT_NET_NS(net_ns) .net_ns = &init_net,
+#else
+#define INIT_NET_NS(net_ns)
+#endif
+
 extern struct list_head net_namespace_list;
 
 extern void __put_net(struct net *net);
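The macro trick used by this patch can be tried outside the kernel. The following is a small userspace sketch, assuming hypothetical names (TOY_CONFIG_NET, struct toy_nsproxy, TOY_INIT_NET_NS and friends) that merely mirror the shape of INIT_NET_NS: the macro expands to a designated initializer plus trailing comma when the option is set, and to nothing otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Comment this out to model a build without networking support. */
#define TOY_CONFIG_NET 1

struct toy_uts { int unused; };
struct toy_net { int unused; };

struct toy_nsproxy {
	struct toy_uts *uts_ns;
	struct toy_net *net_ns;
};

static struct toy_uts toy_init_uts_ns;

#ifdef TOY_CONFIG_NET
/* The object only exists when the option is enabled. */
static struct toy_net toy_init_net;
#define TOY_INIT_NET_NS(field)	.field = &toy_init_net,
#else
/* Expands to nothing, so the initializer never names the missing object. */
#define TOY_INIT_NET_NS(field)
#endif

static struct toy_nsproxy toy_init_nsproxy = {
	.uts_ns = &toy_init_uts_ns,
	TOY_INIT_NET_NS(net_ns)
};

/* Returns 1 when the net_ns slot was filled in by the macro, 0 otherwise. */
static int toy_net_ns_is_set(void)
{
	return toy_init_nsproxy.net_ns != NULL;
}
```

When the option is off, the field is left out of the initializer and therefore stays NULL, which is the same effect the patch relies on for the nsproxy aggregator.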
[net-2.6.24][NETNS][patch 0/3] fixes for the core network namespace
The following patches fix some compilation errors and boot problems related to the network namespace patchset. They apply to net-2.6.24.
[net-2.6.24][NETNS][patch 3/3] fix bad macro definition
From: Daniel Lezcano [EMAIL PROTECTED]

The macro definition is bad. When next_net_device is called with a parameter named dev, the resulting code is:

  struct net_device *dev = dev

which leads to unexpected behavior. In particular, when llc_core is compiled in, the kernel panics at boot time.

This patch replaces the macro definitions with static inline functions, as they were defined before.

Signed-off-by: Benjamin Thery [EMAIL PROTECTED]
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/linux/netdevice.h |   35 +--
 1 file changed, 17 insertions(+), 18 deletions(-)

Index: net-2.6.24/include/linux/netdevice.h
===
--- net-2.6.24.orig/include/linux/netdevice.h
+++ net-2.6.24/include/linux/netdevice.h
@@ -41,7 +41,8 @@
 #include <linux/dmaengine.h>
 #include <linux/workqueue.h>
 
-struct net;
+#include <net/net_namespace.h>
+
 struct vlan_group;
 struct ethtool_ops;
 struct netpoll_info;
@@ -739,23 +740,21 @@
 		list_for_each_entry_continue(d, &(net)->dev_base_head, dev_list)
 #define net_device_entry(lh)	list_entry(lh, struct net_device, dev_list)
 
-#define next_net_device(d)						\
-({									\
-	struct net_device *dev = d;					\
-	struct list_head *lh;						\
-	struct net *net;						\
-									\
-	net = dev->nd_net;						\
-	lh = dev->dev_list.next;					\
-	lh == &net->dev_base_head ? NULL : net_device_entry(lh);	\
-})
-
-#define first_net_device(N)						\
-({									\
-	struct net *NET = (N);						\
-	list_empty(&NET->dev_base_head) ? NULL :			\
-		net_device_entry(NET->dev_base_head.next);		\
-})
+static inline struct net_device *next_net_device(struct net_device *dev)
+{
+	struct list_head *lh;
+	struct net *net;
+
+	net = dev->nd_net;
+	lh = dev->dev_list.next;
+	return lh == &net->dev_base_head ? NULL : net_device_entry(lh);
+}
+
+static inline struct net_device *first_net_device(struct net *net)
+{
+	return list_empty(&net->dev_base_head) ? NULL :
+		net_device_entry(net->dev_base_head.next);
+}
 
 extern int netdev_boot_setup_check(struct net_device *dev);
 extern unsigned long netdev_boot_base(const char *prefix, int unit);
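To make the shadowing problem concrete, here is a userspace sketch of the function-based iteration helpers. The types and names (struct toy_list, struct toy_net, struct toy_netdev, toy_next_net_device and so on) are hypothetical simplifications of the kernel ones, with a singly linked circular list standing in for list_head. The buggy macro shape is only described in a comment, because actually running it would read an uninitialized variable.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular list node; stands in for the kernel's list_head. */
struct toy_list { struct toy_list *next; };

struct toy_net { struct toy_list dev_base_head; };

struct toy_netdev {
	struct toy_list dev_list;
	struct toy_net *nd_net;
	int ifindex;
};

/* Recover the device from its embedded list node. */
#define toy_entry(lh) \
	((struct toy_netdev *)((char *)(lh) - offsetof(struct toy_netdev, dev_list)))

/*
 * Buggy shape, shown for reference only:
 *
 *   #define next_dev(d) ({ struct toy_netdev *dev = d; ... })
 *
 * A caller writing next_dev(dev) gets "struct toy_netdev *dev = dev;":
 * the inner dev shadows the caller's variable and is initialized from
 * itself. A function parameter has its own scope, so the problem cannot
 * occur with the inline functions below.
 */
static inline struct toy_netdev *toy_next_net_device(struct toy_netdev *dev)
{
	struct toy_list *lh = dev->dev_list.next;
	struct toy_net *net = dev->nd_net;

	return lh == &net->dev_base_head ? NULL : toy_entry(lh);
}

static inline struct toy_netdev *toy_first_net_device(struct toy_net *net)
{
	struct toy_list *lh = net->dev_base_head.next;

	return lh == &net->dev_base_head ? NULL : toy_entry(lh);
}

/* Helpers for building a test list. */
static inline void toy_net_init(struct toy_net *net)
{
	net->dev_base_head.next = &net->dev_base_head;
}

static inline void toy_add_front(struct toy_net *net, struct toy_netdev *dev)
{
	dev->nd_net = net;
	dev->dev_list.next = net->dev_base_head.next;
	net->dev_base_head.next = &dev->dev_list;
}
```

Note that the caller can safely name its own variable dev when walking the list, which is exactly the case that broke with the macro version.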
[net-2.6.24][NETNS][patch 2/3] fix loopback network namespace initialization
From: Daniel Lezcano [EMAIL PROTECTED]

The core network namespace patchset sent by Eric Biederman does not create the loopback dynamically, so there is no call to alloc_netdev_mq, which is what fills in the network namespace field of the netdevice.

This patch assigns the loopback to the init network namespace.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 drivers/net/loopback.c |    1 +
 1 file changed, 1 insertion(+)

Index: net-2.6.24/drivers/net/loopback.c
===
--- net-2.6.24.orig/drivers/net/loopback.c
+++ net-2.6.24/drivers/net/loopback.c
@@ -225,6 +225,7 @@
 				  | NETIF_F_LLTX
 				  | NETIF_F_NETNS_LOCAL,
 	.ethtool_ops		= &loopback_ethtool_ops,
+	.nd_net			= &init_net,
 };
 
 /* Setup and register the loopback device. */
[net-2.6.24][NETNS][patch 1/3] fix export symbols
From: Daniel Lezcano [EMAIL PROTECTED]

Add the appropriate EXPORT_SYMBOL_GPL entries for proc_net_create, proc_net_fops_create and proc_net_remove to fix errors when compiling allmodconfig.

Signed-off-by: Mark Nelson [EMAIL PROTECTED]
Acked-by: Benjamin Thery [EMAIL PROTECTED]
---
 fs/proc/proc_net.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: net-2.6.24/fs/proc/proc_net.c
===
--- net-2.6.24.orig/fs/proc/proc_net.c
+++ net-2.6.24/fs/proc/proc_net.c
@@ -31,6 +31,7 @@
 {
 	return create_proc_info_entry(name,mode, net->proc_net, get_info);
 }
+EXPORT_SYMBOL_GPL(proc_net_create);
 
 struct proc_dir_entry *proc_net_fops_create(struct net *net,
 	const char *name, mode_t mode, const struct file_operations *fops)
@@ -42,12 +43,13 @@
 		res->proc_fops = fops;
 	return res;
 }
+EXPORT_SYMBOL_GPL(proc_net_fops_create);
 
 void proc_net_remove(struct net *net, const char *name)
 {
 	remove_proc_entry(name, net->proc_net);
 }
-
+EXPORT_SYMBOL_GPL(proc_net_remove);
 
 static struct proc_dir_entry *proc_net_shadow;
[net-2.6.24][XFRM][patch 0/1] fix allmodconfig
Fixes missing export symbols.
[net-2.6.24][XFRM][patch 1/1] fix xfrm audit export symbol for allmodconfig
From: Daniel Lezcano [EMAIL PROTECTED]

This patch adds the missing symbol exports for:

  xfrm_audit_policy_add
  xfrm_audit_policy_delete
  xfrm_audit_state_add
  xfrm_audit_state_delete

That allows xfrm_user and af_key to be compiled as modules.

I didn't use EXPORT_SYMBOL_GPL, to be consistent with the rest of the code.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 net/xfrm/xfrm_policy.c |    2 ++
 net/xfrm/xfrm_state.c  |    3 +++
 2 files changed, 5 insertions(+)

Index: net-2.6.24/net/xfrm/xfrm_policy.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_policy.c
+++ net-2.6.24/net/xfrm/xfrm_policy.c
@@ -2341,6 +2341,7 @@
 	xfrm_audit_common_policyinfo(xp, audit_buf);
 	audit_log_end(audit_buf);
 }
+EXPORT_SYMBOL(xfrm_audit_policy_add);
 
 void
 xfrm_audit_policy_delete(struct xfrm_policy *xp, int result, u32 auid, u32 sid)
@@ -2357,6 +2358,7 @@
 	xfrm_audit_common_policyinfo(xp, audit_buf);
 	audit_log_end(audit_buf);
 }
+EXPORT_SYMBOL(xfrm_audit_policy_delete);
 #endif
 
 #ifdef CONFIG_XFRM_MIGRATE
Index: net-2.6.24/net/xfrm/xfrm_state.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_state.c
+++ net-2.6.24/net/xfrm/xfrm_state.c
@@ -1865,6 +1865,7 @@
 		   (unsigned long)x->id.spi, (unsigned long)x->id.spi);
 	audit_log_end(audit_buf);
 }
+EXPORT_SYMBOL(xfrm_audit_state_add);
 
 void
 xfrm_audit_state_delete(struct xfrm_state *x, int result, u32 auid, u32 sid)
@@ -1883,4 +1884,6 @@
 		   (unsigned long)x->id.spi, (unsigned long)x->id.spi);
 	audit_log_end(audit_buf);
 }
+EXPORT_SYMBOL(xfrm_audit_state_delete);
+
 #endif /* CONFIG_AUDITSYSCALL */
[patch 0/1] [PATCH] Fix Kconfigs for net-2.6.24
Fixes for 3 typos in Kconfig files.
[patch 1/1] Fix some Kconfigs on net-2.6.24
From: Daniel Lezcano [EMAIL PROTECTED]

Three fixes for Kconfigs.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 drivers/input/misc/Kconfig |    2 +-
 drivers/leds/Kconfig       |    2 +-
 drivers/telephony/Kconfig  |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

Index: net-2.6.24/drivers/input/misc/Kconfig
===
--- net-2.6.24.orig/drivers/input/misc/Kconfig
+++ net-2.6.24/drivers/input/misc/Kconfig
@@ -152,7 +152,7 @@
 
 config INPUT_YEALINK
 	tristate "Yealink usb-p1k voip phone"
-	depends EXPERIMENTAL
+	depends on EXPERIMENTAL
 	depends on USB_ARCH_HAS_HCD
 	select USB
 	help
Index: net-2.6.24/drivers/leds/Kconfig
===
--- net-2.6.24.orig/drivers/leds/Kconfig
+++ net-2.6.24/drivers/leds/Kconfig
@@ -83,7 +83,7 @@
 
 config LEDS_H1940
 	tristate "LED Support for iPAQ H1940 device"
-	depends LEDS_CLASS && ARCH_H1940
+	depends on LEDS_CLASS && ARCH_H1940
 	help
 	  This option enables support for the LEDs on the h1940.
 
Index: net-2.6.24/drivers/telephony/Kconfig
===
--- net-2.6.24.orig/drivers/telephony/Kconfig
+++ net-2.6.24/drivers/telephony/Kconfig
@@ -19,7 +19,7 @@
 
 config PHONE_IXJ
 	tristate "QuickNet Internet LineJack/PhoneJack support"
-	depends ISA || PCI
+	depends on ISA || PCI
 	---help---
 	  Say M if you have a telephony card manufactured by Quicknet
 	  Technologies, Inc. These include the Internet PhoneJACK and
[patch 1/1][RFC] add a private field to the sock structure
From: Daniel Lezcano [EMAIL PROTECTED]

Store private information for a socket.

This patch adds a field to the common socket structure. The field is an anonymous pointer which allows some information about the socket to be stored.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/net/inet_timewait_sock.h |    1 +
 include/net/sock.h               |    3 +++
 net/ipv4/inet_timewait_sock.c    |    1 +
 3 files changed, 5 insertions(+)

Index: net-2.6.24-bf/include/net/sock.h
===
--- net-2.6.24-bf.orig/include/net/sock.h
+++ net-2.6.24-bf/include/net/sock.h
@@ -106,6 +106,7 @@
  *	@skc_refcnt: reference count
  *	@skc_hash: hash value used with various protocol lookup tables
  *	@skc_prot: protocol handlers inside a network family
+ *	@skc_private: field used to store private data
  *
  *	This is the minimal network layer representation of sockets, the header
  *	for struct sock and struct inet_timewait_sock.
@@ -120,6 +121,7 @@
 	atomic_t		skc_refcnt;
 	unsigned int		skc_hash;
 	struct proto		*skc_prot;
+	void			*skc_private;
 };
 
 /**
@@ -196,6 +198,7 @@
 #define sk_refcnt		__sk_common.skc_refcnt
 #define sk_hash			__sk_common.skc_hash
 #define sk_prot			__sk_common.skc_prot
+#define sk_private		__sk_common.skc_private
 	unsigned char		sk_shutdown : 2,
 				sk_no_check : 2,
 				sk_userlocks : 4;
Index: net-2.6.24-bf/net/ipv4/inet_timewait_sock.c
===
--- net-2.6.24-bf.orig/net/ipv4/inet_timewait_sock.c
+++ net-2.6.24-bf/net/ipv4/inet_timewait_sock.c
@@ -108,6 +108,7 @@
 		tw->tw_hash	    = sk->sk_hash;
 		tw->tw_ipv6only	    = 0;
 		tw->tw_prot	    = sk->sk_prot_creator;
+		tw->tw_private	    = sk->sk_private;
 		atomic_set(&tw->tw_refcnt, 1);
 		inet_twsk_dead_node_init(tw);
 		__module_get(tw->tw_prot->owner);
Index: net-2.6.24-bf/include/net/inet_timewait_sock.h
===
--- net-2.6.24-bf.orig/include/net/inet_timewait_sock.h
+++ net-2.6.24-bf/include/net/inet_timewait_sock.h
@@ -115,6 +115,7 @@
 #define tw_refcnt		__tw_common.skc_refcnt
 #define tw_hash			__tw_common.skc_hash
 #define tw_prot			__tw_common.skc_prot
+#define tw_private		__tw_common.skc_private
 	volatile unsigned char	tw_substate;
 	/* 3 bits hole, try to pack */
 	unsigned char		tw_rcv_wscale;
[patch 0/1][RFC] add a private field to the sock structure
When a socket is created, it is sometimes useful to store some specific information for it. This information can be, for example:

 * a creation time
 * a pid
 * a uid/gid
 * a container identifier
 * a pointer to a more specific structure
 * ...

The following patch is a proposal to add a private anonymous pointer field to the common part of the sock structure.
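As an illustration of how such a field might be used, here is a userspace sketch. The structures are hypothetical, heavily reduced stand-ins for the kernel ones (struct toy_sock_common, struct toy_sock, struct toy_tw_sock), and struct toy_sock_info is an invented payload built from two of the examples listed above. The accessor macros follow the naming pattern of the patch, and toy_twsk_inherit mimics the way the timewait socket copies the pointer from its parent socket.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for the common part shared by both socket kinds. */
struct toy_sock_common {
	unsigned int skc_hash;
	void *skc_private;	/* anonymous pointer for per-socket data */
};

struct toy_sock { struct toy_sock_common sk_common; };
struct toy_tw_sock { struct toy_sock_common tw_common; };

/* Accessor macros in the style used by the patch. */
#define sk_hash		sk_common.skc_hash
#define sk_private	sk_common.skc_private
#define tw_hash		tw_common.skc_hash
#define tw_private	tw_common.skc_private

/* Hypothetical payload: what a user of the field might hang off it. */
struct toy_sock_info {
	long created_at;
	int container_id;
};

/* Mimics the timewait setup: the private pointer follows the socket. */
static void toy_twsk_inherit(struct toy_tw_sock *tw, const struct toy_sock *sk)
{
	tw->tw_hash = sk->sk_hash;
	tw->tw_private = sk->sk_private;
}
```

The sketch does not address ownership: since the pointer is copied rather than duplicated, whoever attaches data to it has to decide when it is safe to free.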
[PATCH 1/1] Dynamically allocate the loopback device
From: Daniel Lezcano [EMAIL PROTECTED] Doing this makes loopback.c a better example of how to do a simple network device, and it removes the special case single static allocation of a struct net_device, hopefully making maintenance easier. Applies against net-2.6.24 Tested on i386, x86_64 Compiled on ia64, sparc Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] Acked-By: Kirill Korotaev [EMAIL PROTECTED] Acked-by: Benjamin Thery [EMAIL PROTECTED] --- drivers/net/loopback.c | 63 +++--- include/linux/netdevice.h|2 +- net/core/dst.c |8 ++-- net/decnet/dn_dev.c |4 +- net/decnet/dn_route.c| 14 net/ipv4/devinet.c |6 ++-- net/ipv4/ipconfig.c |6 ++-- net/ipv4/ipvs/ip_vs_core.c |2 +- net/ipv4/route.c | 18 +- net/ipv4/xfrm4_policy.c |2 +- net/ipv6/addrconf.c | 15 +--- net/ipv6/ip6_input.c |2 +- net/ipv6/netfilter/ip6t_REJECT.c |2 +- net/ipv6/route.c | 15 +++- net/ipv6/xfrm6_policy.c |2 +- net/xfrm/xfrm_policy.c |4 +- 16 files changed, 89 insertions(+), 76 deletions(-) diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c index 5106c23..3642aff 100644 --- a/drivers/net/loopback.c +++ b/drivers/net/loopback.c @@ -199,44 +199,57 @@ static const struct ethtool_ops loopback_ethtool_ops = { .get_rx_csum= always_on, }; -/* - * The loopback device is special. There is only one instance and - * it is statically allocated. Don't do this for other devices. 
- */ -struct net_device loopback_dev = { - .name = lo, - .get_stats = get_stats, - .mtu= (16 * 1024) + 20 + 20 + 12, - .hard_start_xmit= loopback_xmit, - .hard_header= eth_header, - .hard_header_cache = eth_header_cache, - .header_cache_update= eth_header_cache_update, - .hard_header_len= ETH_HLEN, /* 14 */ - .addr_len = ETH_ALEN, /* 6*/ - .tx_queue_len = 0, - .type = ARPHRD_LOOPBACK, /* 0x0001*/ - .rebuild_header = eth_rebuild_header, - .flags = IFF_LOOPBACK, - .features = NETIF_F_SG | NETIF_F_FRAGLIST +static void loopback_setup(struct net_device *dev) +{ + dev-get_stats = get_stats; + dev-mtu= (16 * 1024) + 20 + 20 + 12; + dev-hard_start_xmit= loopback_xmit; + dev-hard_header= eth_header; + dev-hard_header_cache = eth_header_cache; + dev-header_cache_update = eth_header_cache_update; + dev-hard_header_len= ETH_HLEN; /* 14 */ + dev-addr_len = ETH_ALEN; /* 6*/ + dev-tx_queue_len = 0; + dev-type = ARPHRD_LOOPBACK; /* 0x0001*/ + dev-rebuild_header = eth_rebuild_header; + dev-flags = IFF_LOOPBACK; + dev-features = NETIF_F_SG | NETIF_F_FRAGLIST #ifdef LOOPBACK_TSO | NETIF_F_TSO #endif | NETIF_F_NO_CSUM | NETIF_F_HIGHDMA - | NETIF_F_LLTX, - .ethtool_ops= loopback_ethtool_ops, -}; + | NETIF_F_LLTX; + dev-ethtool_ops= loopback_ethtool_ops; +} /* Setup and register the loopback device. 
*/ static int __init loopback_init(void) { - int err = register_netdev(loopback_dev); + struct net_device *dev; + int err; + + err = -ENOMEM; + dev = alloc_netdev(0, lo, loopback_setup); + if (!dev) + goto out; + + err = register_netdev(dev); + if (err) + goto out_free_netdev; + err = 0; + loopback_dev = dev; + +out: if (err) panic(loopback: Failed to register netdevice: %d\n, err); - return err; +out_free_netdev: + free_netdev(dev); + goto out; }; -module_init(loopback_init); +fs_initcall(loopback_init); +struct net_device *loopback_dev; EXPORT_SYMBOL(loopback_dev); diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 8d12f02..7cd0641 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -680,7 +680,7 @@ struct packet_type { #include linux/interrupt.h #include linux/notifier.h -extern struct net_device loopback_dev; /* The loopback */ +extern struct net_device *loopback_dev; /* The loopback */ extern struct list_headdev_base_head; /* All devices */ extern rwlock_tdev_base_lock;
[patch 01/12] net namespace : initialize init process to level 2
From: Daniel Lezcano [EMAIL PROTECTED]

Initialize init's network namespace to level 2.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 net/core/net_namespace.c |    1 +
 1 file changed, 1 insertion(+)

Index: 2.6.20-rc4-mm1/net/core/net_namespace.c
===
--- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c
+++ 2.6.20-rc4-mm1/net/core/net_namespace.c
@@ -21,6 +21,7 @@
 	.dev_tail_p	= &init_net_ns.dev_base_p,
 	.loopback_dev_p	= NULL,
 	.pcpu_lstats_p	= NULL,
+	.level		= NET_NS_LEVEL2,
 };
 
 #ifdef CONFIG_NET_NS
[patch 00/12] net namespace : L3 namespace - introduction
This patchset provides network isolation similar to what Linux-Vserver provides. It is based on the L2 namespaces and relies on the mechanisms they provide.

The L3 namespace does not aim to bring full virtualization of the network; it provides an IP isolation which can be reused for Linux-Vserver, jailed applications or application containers.

L3 namespaces are always children of an L2 namespace and they cannot create more network namespaces; furthermore, they lose their NET_ADMIN capability. They share their parent's network resources.

From the parent namespace, IP addresses are created and assigned to the different L3 children. From this point, an L3 namespace can use its assigned IP address and all computed broadcast addresses.

Because the L3 namespace relies on the L2 virtualization mechanisms, it is possible to have several L3 namespaces listening on INADDR_ANY:port without conflict, which allows several servers to run without modifying the network configuration.

The loopback is a device shared between all L3 namespaces. To ensure isolation of the 127.0.0.1 address, the sender stores its namespace in the packet, so when the packet arrives the destination namespace is already set, because source == destination. This way, it is easy to disable the loopback isolation and let an application talk to applications outside of the namespace via 127.0.0.1 when we consider them trusted (like portmap).

The ifconfig / ip commands will only show the IP addresses assigned to the L3 namespace.

When an L3 namespace dies, the assigned IP address is released to its parent.

At the IP level, when a packet arrives, the destination L3 network namespace is retrieved from the destination address. At bind time, the address is checked against the assigned IP address.
[patch 02/12] net namespace : store L2 parent namespace
From: Daniel Lezcano [EMAIL PROTECTED]

All L3 namespaces are the final nodes of the L2 namespace tree. Because they share some resources coming from the L2 namespace, the L2 parent namespace should be stored in the L3 child when it is created.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/linux/net_namespace.h |    1 +
 net/core/net_namespace.c      |   11 +++
 2 files changed, 12 insertions(+)

Index: 2.6.20-rc4-mm1/include/linux/net_namespace.h
===
--- 2.6.20-rc4-mm1.orig/include/linux/net_namespace.h
+++ 2.6.20-rc4-mm1/include/linux/net_namespace.h
@@ -27,6 +27,7 @@
 #define NET_NS_LEVEL2	1
 #define NET_NS_LEVEL3	2
 	unsigned int		level;
+	struct net_namespace	*parent;
 };
 
 extern struct net_namespace init_net_ns;
Index: 2.6.20-rc4-mm1/net/core/net_namespace.c
===
--- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c
+++ 2.6.20-rc4-mm1/net/core/net_namespace.c
@@ -22,6 +22,7 @@
 	.loopback_dev_p	= NULL,
 	.pcpu_lstats_p	= NULL,
 	.level		= NET_NS_LEVEL2,
+	.parent		= NULL,
 };
 
 #ifdef CONFIG_NET_NS
@@ -62,6 +63,12 @@
 		if (ip_fib_struct_init())
 			goto out_fib4;
 	}
+
+	if (level == NET_NS_LEVEL3) {
+		get_net_ns(old_ns);
+		ns->parent = old_ns;
+	}
+
 	ns->level = level;
 	if (loopback_init())
 		goto out_loopback;
@@ -126,8 +133,12 @@
 			ns, atomic_read(&ns->kref.refcount));
 		return;
 	}
+
 	if (ns->level == NET_NS_LEVEL2)
 		ip_fib_struct_cleanup(ns);
+	if (ns->level == NET_NS_LEVEL3)
+		put_net_ns(ns->parent);
+
 	printk(KERN_DEBUG "NET_NS: net namespace %p destroyed\n", ns);
 	kfree(ns);
 }
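The lifetime rule introduced here (an L3 child pins its L2 parent for as long as the child exists) can be sketched in userspace as follows. struct toy_ns, toy_ns_create and toy_ns_put are hypothetical names, and a plain integer replaces the kref, so this is only a model of the get/put pairing and not of the kernel's locking or cleanup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { TOY_LEVEL2 = 1, TOY_LEVEL3 = 2 };

/* Toy namespace with a plain counter instead of a kref. */
struct toy_ns {
	int refcount;
	unsigned int level;
	struct toy_ns *parent;
};

/* Create a namespace; an L3 namespace takes a reference on its L2 parent. */
static struct toy_ns *toy_ns_create(unsigned int level, struct toy_ns *old_ns)
{
	struct toy_ns *ns = calloc(1, sizeof(*ns));

	assert(ns != NULL);
	ns->refcount = 1;
	ns->level = level;
	if (level == TOY_LEVEL3) {
		assert(old_ns != NULL);
		old_ns->refcount++;		/* models get_net_ns(old_ns) */
		ns->parent = old_ns;
	}
	return ns;
}

/* Drop a reference; on the last one, release the parent and free. */
static void toy_ns_put(struct toy_ns *ns)
{
	if (--ns->refcount > 0)
		return;
	if (ns->level == TOY_LEVEL3)
		toy_ns_put(ns->parent);		/* models put_net_ns(ns->parent) */
	free(ns);
}
```

Because the child holds a reference, the parent cannot be freed while any L3 child still points at it, whatever order the callers drop their own references in.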
[patch 12/12] net namespace : Add broadcasting
From: Daniel Lezcano [EMAIL PROTECTED]

Broadcast packets should be delivered to the L2 namespace and to all of its L3 children.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

---
 include/linux/net_namespace.h |   11 +++
 net/core/net_namespace.c      |   27 +++
 net/ipv4/udp.c                |    3 ++-
 3 files changed, 40 insertions(+), 1 deletion(-)

Index: 2.6.20-rc4-mm1/include/linux/net_namespace.h
===================================================================
--- 2.6.20-rc4-mm1.orig/include/linux/net_namespace.h
+++ 2.6.20-rc4-mm1/include/linux/net_namespace.h
@@ -9,6 +9,7 @@
 
 struct in_ifaddr;
 struct sk_buff;
+struct sock;
 
 struct net_namespace {
 	struct kref		kref;
@@ -109,6 +110,9 @@
 
 extern void net_ns_tag_sk_buff(struct sk_buff *skb);
 
+extern int net_ns_sock_is_visible(const struct sock *sk,
+				  const struct net_namespace *net_ns);
+
 #define SELECT_SRC_ADDR net_ns_select_source_address
 
 #else /* CONFIG_NET_NS */
@@ -192,6 +196,13 @@
 {
 	;
 }
+
+static inline int net_ns_sock_is_visible(const struct sock *sk,
+					 const struct net_namespace *net_ns)
+{
+	return 1;
+}
+
 #define SELECT_SRC_ADDR inet_select_addr
 
 #endif /* !CONFIG_NET_NS */
Index: 2.6.20-rc4-mm1/net/core/net_namespace.c
===================================================================
--- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c
+++ 2.6.20-rc4-mm1/net/core/net_namespace.c
@@ -17,6 +17,7 @@
 #include <linux/ip.h>
 
 #include <net/ip_fib.h>
+#include <net/sock.h>
 
 struct net_namespace init_net_ns = {
 	.kref = {
@@ -464,4 +465,30 @@
 	struct net_namespace *net_ns = current_net_ns;
 	skb->net_ns = net_ns;
 }
+
+/*
+ * This function checks if the socket is visible from the specified
+ * namespace. This is needed to ensure the broadcast and the multicast
+ * for multiple network namespace l2 and l3 to have the packets to be
+ * delivered. If we have a l3 namespace and its parent (l2 namespace)
+ * listening on a broadcast address, we should deliver the packet to
+ * both. That is done by the udp_v4_mcast_next function. But we should
+ * find a common point between sockets which are relatives to a
+ * namespace. The common point is they have the same parent in case
+ * of l3 network namespace.
+ * @sk : the socket to be checked
+ * @net_ns : the receiving network namespace
+ * Returns: 1 if the socket is visible by the namespace, 0 otherwise.
+ */
+int net_ns_sock_is_visible(const struct sock *sk,
+			   const struct net_namespace *net_ns)
+{
+	if (net_ns->level == NET_NS_LEVEL3)
+		net_ns = net_ns->parent;
+
+	if (sk->sk_net_ns->level == NET_NS_LEVEL3)
+		return sk->sk_net_ns->parent == net_ns;
+	else
+		return sk->sk_net_ns == net_ns;
+}
 #endif /* CONFIG_NET_NS */
Index: 2.6.20-rc4-mm1/net/ipv4/udp.c
===================================================================
--- 2.6.20-rc4-mm1.orig/net/ipv4/udp.c
+++ 2.6.20-rc4-mm1/net/ipv4/udp.c
@@ -309,9 +309,10 @@
 		    (inet->dport != rmt_port && inet->dport)		||
 		    (inet->rcv_saddr && inet->rcv_saddr != loc_addr)	||
 		    ipv6_only_sock(s)					||
-		    !net_ns_match(sk->sk_net_ns, ns)			||
 		    (s->sk_bound_dev_if && s->sk_bound_dev_if != dif))
 			continue;
+		if (!net_ns_sock_is_visible(sk, ns))
+			continue;
 		if (!ip_mc_sf_allow(s, loc_addr, rmt_addr, dif))
 			continue;
 		goto found;
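The visibility rule used for broadcast delivery boils down to "both sides map to the same L2 namespace". A minimal user-space sketch of that rule follows; `toy_ns`, `toy_l2_of` and `toy_sock_is_visible` are hypothetical names, and the socket is represented only by the namespace it belongs to.

```c
#include <stddef.h>

enum toy_level { TOY_LEVEL2 = 1, TOY_LEVEL3 = 2 };

/* Hypothetical stand-in for struct net_namespace. */
struct toy_ns {
	enum toy_level level;
	const struct toy_ns *parent;   /* NULL for an L2 namespace */
};

/* Map any namespace to the L2 namespace it lives in. */
const struct toy_ns *toy_l2_of(const struct toy_ns *ns)
{
	return ns->level == TOY_LEVEL3 ? ns->parent : ns;
}

/*
 * A socket owned by sock_ns is visible from rx_ns when both resolve to
 * the same L2 namespace: siblings, parent and child, or the same
 * namespace.
 */
int toy_sock_is_visible(const struct toy_ns *sock_ns,
			const struct toy_ns *rx_ns)
{
	return toy_l2_of(sock_ns) == toy_l2_of(rx_ns);
}
```

Normalising both sides to their L2 namespace first gives the same answers as the two-branch test in the patch, under the assumption that an L3 namespace always has a non-NULL parent.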
[patch 08/12] net namespace : find namespace by addr
From: Daniel Lezcano [EMAIL PROTECTED]

Switch to the L3 namespace using the destination address.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

---
 include/linux/net_namespace.h |    7 +++
 net/core/net_namespace.c      |   35 +++
 net/ipv4/ip_input.c           |   16 +++-
 3 files changed, 57 insertions(+), 1 deletion(-)

Index: 2.6.20-rc4-mm1/net/ipv4/ip_input.c
===================================================================
--- 2.6.20-rc4-mm1.orig/net/ipv4/ip_input.c
+++ 2.6.20-rc4-mm1/net/ipv4/ip_input.c
@@ -374,6 +374,9 @@
 {
 	struct iphdr *iph;
 	u32 len;
+	int err;
+	struct net_namespace *net_ns = current_net_ns;
+	struct net_namespace *dst_net_ns = NULL;
 
 	/* When the interface is in promisc. mode, drop all the crap
 	 * that it receives, do not try to analyse it.
@@ -393,6 +396,9 @@
 
 	iph = skb->nh.iph;
 
+	dst_net_ns = net_ns_find_from_dest_addr(iph->daddr);
+	if (dst_net_ns && !net_ns_match(net_ns, dst_net_ns))
+		push_net_ns(dst_net_ns);
 	/*
 	 *	RFC1122: 3.1.2.2 MUST silently discard any IP frame that fails the checksum.
 	 *
@@ -431,10 +437,18 @@
 	/* Remove any debris in the socket control block */
 	memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
 
-	return NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, dev, NULL,
+	err = NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, dev, NULL,
 		       ip_rcv_finish);
+	if (dst_net_ns && !net_ns_match(net_ns, dst_net_ns))
+		pop_net_ns(net_ns);
+
+	return err;
+
 
 inhdr_error:
+	if (dst_net_ns && !net_ns_match(net_ns, dst_net_ns))
+		pop_net_ns(net_ns);
+
 	IP_INC_STATS_BH(IPSTATS_MIB_INHDRERRORS);
 drop:
 	kfree_skb(skb);
Index: 2.6.20-rc4-mm1/include/linux/net_namespace.h
===================================================================
--- 2.6.20-rc4-mm1.orig/include/linux/net_namespace.h
+++ 2.6.20-rc4-mm1/include/linux/net_namespace.h
@@ -99,6 +99,8 @@
 extern __be32 net_ns_select_source_address(const struct net_device *dev,
 					   u32 dst, int scope);
 
+extern struct net_namespace *net_ns_find_from_dest_addr(u32 daddr);
+
 #define SELECT_SRC_ADDR net_ns_select_source_address
 
 #else /* CONFIG_NET_NS */
@@ -167,6 +169,11 @@
 	return 0;
 }
 
+static inline struct net_namespace *net_ns_find_from_dest_addr(u32 daddr)
+{
+	return NULL;
+}
+
 #define SELECT_SRC_ADDR inet_select_addr
 
 #endif /* !CONFIG_NET_NS */
Index: 2.6.20-rc4-mm1/net/core/net_namespace.c
===================================================================
--- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c
+++ 2.6.20-rc4-mm1/net/core/net_namespace.c
@@ -385,4 +385,39 @@
 out:
 	return addr;
 }
+
+/*
+ * This function finds the network namespace destination deduced from
+ * the destination address. The network namespace is retrieved from
+ * the ifaddr owned by a network namespace
+ * @daddr : destination
+ * Returns : the network namespace destination or NULL if not found
+ */
+struct net_namespace *net_ns_find_from_dest_addr(u32 daddr)
+{
+	struct net_namespace *net_ns = NULL;
+	struct net_device *dev;
+	struct in_device *in_dev;
+
+	if (LOOPBACK(daddr))
+		return current_net_ns;
+
+	read_lock(&dev_base_lock);
+	rcu_read_lock();
+	for (dev = dev_base; dev; dev = dev->next) {
+		if ((in_dev = __in_dev_get_rcu(dev)) == NULL)
+			continue;
+		for_ifa(in_dev) {
+			if (ifa->ifa_local == daddr) {
+				net_ns = ifa->ifa_net_ns;
+				goto out_unlock_both;
+			}
+		} endfor_ifa(in_dev);
+	}
+out_unlock_both:
+	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
+
+	return net_ns;
+}
 #endif /* CONFIG_NET_NS */
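The lookup added here is a linear search: walk every address and return the namespace that owns the one equal to the destination. A stripped-down user-space sketch, assuming a flat array in place of the device and ifaddr lists and with hypothetical `toy_*` names, could look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct net_namespace and struct in_ifaddr. */
struct toy_ns {
	int id;
};

struct toy_ifaddr {
	uint32_t local;          /* address in host byte order */
	struct toy_ns *owner;    /* namespace the address is assigned to */
};

/*
 * Return the namespace owning the address equal to daddr, or NULL when
 * no configured address matches. Locking is left out of this sketch.
 */
struct toy_ns *toy_find_ns_by_daddr(const struct toy_ifaddr *ifas, size_t n,
				    uint32_t daddr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (ifas[i].local == daddr)
			return ifas[i].owner;
	}
	return NULL;
}
```

The cost grows with the number of configured addresses, since the search runs for every received packet; an index keyed by address would be one way to avoid that, though nothing in the patch suggests one is planned.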
[patch 09/12] net namespace : make loopback address always visible
From: Daniel Lezcano [EMAIL PROTECTED]

Add a specific condition when listing inet interfaces so that the loopback address is always visible.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

---
 include/linux/net_namespace.h |    9 +
 net/core/net_namespace.c      |   22 ++
 net/ipv4/devinet.c            |   12 +---
 3 files changed, 36 insertions(+), 7 deletions(-)

Index: 2.6.20-rc4-mm1/net/ipv4/devinet.c
===================================================================
--- 2.6.20-rc4-mm1.orig/net/ipv4/devinet.c
+++ 2.6.20-rc4-mm1/net/ipv4/devinet.c
@@ -695,8 +695,7 @@
 			for (ifap = &in_dev->ifa_list; (ifa = *ifap) != NULL;
 			     ifap = &ifa->ifa_next) {
 				if (!strcmp(ifr.ifr_name, ifa->ifa_label) &&
-				    net_ns_match(ifa->ifa_net_ns,
-						 current_net_ns) &&
+				    net_ns_ifa_is_visible(ifa) &&
 				    sin_orig.sin_addr.s_addr ==
 							ifa->ifa_address) {
 					break; /* found */
@@ -710,13 +709,12 @@
 			for (ifap = &in_dev->ifa_list; (ifa = *ifap) != NULL;
 			     ifap = &ifa->ifa_next)
 				if (!strcmp(ifr.ifr_name, ifa->ifa_label) &&
-				    net_ns_match(ifa->ifa_net_ns,
-						 current_net_ns))
+				    net_ns_ifa_is_visible(ifa))
 					break;
 		}
 	}
 
-	if (ifa && !net_ns_match(ifa->ifa_net_ns, current_net_ns))
+	if (ifa && !net_ns_ifa_is_visible(ifa))
 		goto done;
 
 	ret = -EADDRNOTAVAIL;
@@ -868,7 +866,7 @@
 		goto out;
 
 	for (; ifa; ifa = ifa->ifa_next) {
-		if (!net_ns_match(ifa->ifa_net_ns, current_net_ns))
+		if (!net_ns_ifa_is_visible(ifa))
 			continue;
 		if (!buf) {
 			done += sizeof(ifr);
@@ -1216,7 +1214,7 @@
 
 		for (ifa = in_dev->ifa_list, ip_idx = 0; ifa;
 		     ifa = ifa->ifa_next, ip_idx++) {
-			if (!net_ns_match(ifa->ifa_net_ns, current_net_ns))
+			if (!net_ns_ifa_is_visible(ifa))
 				continue;
 			if (ip_idx < s_ip_idx)
 				continue;
Index: 2.6.20-rc4-mm1/include/linux/net_namespace.h
===================================================================
--- 2.6.20-rc4-mm1.orig/include/linux/net_namespace.h
+++ 2.6.20-rc4-mm1/include/linux/net_namespace.h
@@ -7,6 +7,8 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 
+struct in_ifaddr;
+
 struct net_namespace {
 	struct kref		kref;
 	struct net_device	*dev_base_p, **dev_tail_p;
@@ -101,6 +103,8 @@
 
 extern struct net_namespace *net_ns_find_from_dest_addr(u32 daddr);
 
+extern int net_ns_ifa_is_visible(const struct in_ifaddr *ifa);
+
 #define SELECT_SRC_ADDR net_ns_select_source_address
 
 #else /* CONFIG_NET_NS */
@@ -174,6 +178,11 @@
 	return NULL;
 }
 
+static inline int net_ns_ifa_is_visible(const struct in_ifaddr *ifa)
+{
+	return 1;
+}
+
 #define SELECT_SRC_ADDR inet_select_addr
 
 #endif /* !CONFIG_NET_NS */
Index: 2.6.20-rc4-mm1/net/core/net_namespace.c
===================================================================
--- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c
+++ 2.6.20-rc4-mm1/net/core/net_namespace.c
@@ -420,4 +420,26 @@
 
 	return net_ns;
 }
+
+/*
+ * This function checks if the ifaddr is visible from the
+ * current network namespace. This is true if the ifaddr is
+ * the loopback address or if the ifaddr is owned by the network
+ * namespace.
+ * @ifa : the ifaddr
+ * Returns : 1 if visible, 0 otherwise
+ */
+int net_ns_ifa_is_visible(const struct in_ifaddr *ifa)
+{
+	struct net_namespace *net_ns = current_net_ns;
+
+	if (LOOPBACK(ifa->ifa_local))
+		return 1;
+
+	if (net_ns_match(ifa->ifa_net_ns, net_ns))
+		return 1;
+
+	return 0;
+}
+
 #endif /* CONFIG_NET_NS */
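The visibility test added by this patch has two cases: loopback addresses are shown to everyone, and any other address is shown only to its owner. The sketch below restates that in user-space C; the `toy_*` names are hypothetical, the current namespace is passed explicitly instead of being read from per-CPU state, and addresses are assumed to be in host byte order.

```c
#include <stdint.h>

/* Hypothetical stand-ins for struct net_namespace and struct in_ifaddr. */
struct toy_ns {
	int id;
};

struct toy_ifaddr {
	uint32_t local;               /* address in host byte order */
	const struct toy_ns *owner;   /* namespace the address belongs to */
};

/* True for any address in 127.0.0.0/8. */
int toy_is_loopback(uint32_t addr)
{
	return (addr & 0xff000000u) == 0x7f000000u;
}

/* Loopback is always listed; other addresses only for their owner. */
int toy_ifa_is_visible(const struct toy_ifaddr *ifa,
		       const struct toy_ns *current)
{
	if (toy_is_loopback(ifa->local))
		return 1;
	return ifa->owner == current;
}
```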
[patch 03/12] net namespace : share network resources L2 with L3
From: Daniel Lezcano [EMAIL PROTECTED] L3 namespace will use routes and devices belonging to its parent, so the old network namespace structure is copied when allocating a new one. By this way, hash value, dev list, routes are accessible from the L3 namespaces. In case of L2 namespace, these values are overwritten by the newly allocated values. Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] --- include/linux/net_namespace.h | 14 ++ net/core/dev.c|4 ++-- net/core/net_namespace.c | 33 ++--- 3 files changed, 34 insertions(+), 17 deletions(-) Index: 2.6.20-rc4-mm1/net/core/net_namespace.c === --- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c +++ 2.6.20-rc4-mm1/net/core/net_namespace.c @@ -37,7 +37,7 @@ * Return ERR_PTR on error, new ns otherwise */ static struct net_namespace *clone_net_ns(unsigned int level, - struct net_namespace *old_ns) + struct net_namespace *old_ns) { struct net_namespace *ns; @@ -45,23 +45,26 @@ if (current_net_ns-level == NET_NS_LEVEL3) return ERR_PTR(-EPERM); - ns = kzalloc(sizeof(struct net_namespace), GFP_KERNEL); + ns = kmemdup(old_ns, sizeof(struct net_namespace), GFP_KERNEL); if (!ns) return NULL; kref_init(ns-kref); - ns-dev_base_p = NULL; - ns-dev_tail_p = ns-dev_base_p; - ns-hash = net_random(); - if ((push_net_ns(ns)) != old_ns) + BUG(); if (level == NET_NS_LEVEL2) { + ns-dev_base_p = NULL; + ns-dev_tail_p = ns-dev_base_p; + ns-hash = net_random(); + #ifdef CONFIG_IP_MULTIPLE_TABLES INIT_LIST_HEAD(ns-fib_rules_ops_list); #endif if (ip_fib_struct_init()) goto out_fib4; + if (loopback_init()) + goto out_loopback; } if (level == NET_NS_LEVEL3) { @@ -70,8 +73,6 @@ } ns-level = level; - if (loopback_init()) - goto out_loopback; pop_net_ns(old_ns); printk(KERN_DEBUG NET_NS: created new netcontext %p, level %u, for %s (pid=%d)\n, ns, (ns-level == NET_NS_LEVEL2) ? 
@@ -127,15 +128,17 @@ struct net_namespace *ns; ns = container_of(kref, struct net_namespace, kref); - unregister_netdev(ns-loopback_dev_p); - if (ns-dev_base_p != NULL) { - printk(NET_NS: BUG: namespace %p has devices! ref %d\n, - ns, atomic_read(ns-kref.refcount)); - return; - } - if (ns-level == NET_NS_LEVEL2) + if (ns-level == NET_NS_LEVEL2) { ip_fib_struct_cleanup(ns); + unregister_netdev(ns-loopback_dev_p); + if (ns-dev_base_p != NULL) { + printk(NET_NS: BUG: namespace %p has devices! ref %d\n, + ns, atomic_read(ns-kref.refcount)); + return; + } + } + if (ns-level == NET_NS_LEVEL3) put_net_ns(ns-parent); Index: 2.6.20-rc4-mm1/include/linux/net_namespace.h === --- 2.6.20-rc4-mm1.orig/include/linux/net_namespace.h +++ 2.6.20-rc4-mm1/include/linux/net_namespace.h @@ -56,6 +56,15 @@ DECLARE_PER_CPU(struct net_namespace *, exec_net_ns); #define current_net_ns (__get_cpu_var(exec_net_ns)) +static inline struct net_namespace *net_ns_l2(void) +{ + struct net_namespace *net_ns = current_net_ns; + + if (net_ns-level == NET_NS_LEVEL3) + return net_ns-parent; + return net_ns; +} + static inline void init_current_net_ns(int cpu) { get_net_ns(init_net_ns); @@ -110,6 +119,11 @@ #define current_net_ns NULL +static inline struct net_namespace *net_ns_l2(void) +{ + return NULL; +} + static inline void init_current_net_ns(int cpu) { } Index: 2.6.20-rc4-mm1/net/core/dev.c === --- 2.6.20-rc4-mm1.orig/net/core/dev.c +++ 2.6.20-rc4-mm1/net/core/dev.c @@ -485,7 +485,7 @@ struct net_device *__dev_get_by_name(const char *name) { struct hlist_node *p; - struct net_namespace *ns = current_net_ns; + struct net_namespace *ns = net_ns_l2(); hlist_for_each(p, dev_name_hash(name, ns)) { struct net_device *dev @@ -768,7 +768,7 @@ if (!err) { hlist_del(dev-name_hlist); hlist_add_head(dev-name_hlist, dev_name_hash(dev-name, - current_net_ns)); + net_ns_l2()));
[patch 06/12] net namespace : check bind address
From: Daniel Lezcano [EMAIL PROTECTED]

Check that the bind address is allowed. It must match an ifaddr assigned to the namespace or one of the addresses derived from it.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]

---
 include/linux/net_namespace.h |    7 +
 net/core/net_namespace.c      |   54 ++
 net/ipv4/af_inet.c            |    2 +
 net/ipv4/raw.c                |    3 ++
 4 files changed, 66 insertions(+)

Index: 2.6.20-rc4-mm1/net/ipv4/af_inet.c
===================================================================
--- 2.6.20-rc4-mm1.orig/net/ipv4/af_inet.c
+++ 2.6.20-rc4-mm1/net/ipv4/af_inet.c
@@ -433,6 +433,8 @@
 	 *  is temporarily down)
 	 */
 	err = -EADDRNOTAVAIL;
+	if (net_ns_check_bind(chk_addr_ret, addr->sin_addr.s_addr))
+		goto out;
 	if (!sysctl_ip_nonlocal_bind &&
 	    !inet->freebind &&
 	    addr->sin_addr.s_addr != INADDR_ANY &&
Index: 2.6.20-rc4-mm1/net/ipv4/raw.c
===================================================================
--- 2.6.20-rc4-mm1.orig/net/ipv4/raw.c
+++ 2.6.20-rc4-mm1/net/ipv4/raw.c
@@ -559,7 +559,10 @@
 	if (sk->sk_state != TCP_CLOSE || addr_len < sizeof(struct sockaddr_in))
 		goto out;
 	chk_addr_ret = inet_addr_type(addr->sin_addr.s_addr);
+	ret = -EADDRNOTAVAIL;
+	if (net_ns_check_bind(chk_addr_ret, addr->sin_addr.s_addr))
+		goto out;
 	if (addr->sin_addr.s_addr && chk_addr_ret != RTN_LOCAL &&
 	    chk_addr_ret != RTN_MULTICAST && chk_addr_ret != RTN_BROADCAST)
 		goto out;
Index: 2.6.20-rc4-mm1/include/linux/net_namespace.h
===================================================================
--- 2.6.20-rc4-mm1.orig/include/linux/net_namespace.h
+++ 2.6.20-rc4-mm1/include/linux/net_namespace.h
@@ -93,6 +93,8 @@
 
 extern int net_ns_ioctl(unsigned int cmd, void __user *arg);
 
+extern int net_ns_check_bind(int addr_type, u32 addr);
+
 #else /* CONFIG_NET_NS */
 
 #define INIT_NET_NS(net_ns)
@@ -148,6 +150,11 @@
 	return -ENOSYS;
 }
 
+static inline int net_ns_check_bind(int addr_type, u32 addr)
+{
+	return 0;
+}
+
 #endif /* !CONFIG_NET_NS */
 
 #endif /* _LINUX_NET_NAMESPACE_H */
Index: 2.6.20-rc4-mm1/net/core/net_namespace.c
===================================================================
--- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c
+++ 2.6.20-rc4-mm1/net/core/net_namespace.c
@@ -263,4 +263,58 @@
 	return err;
 }
 
+/*
+ * This function check if the specified bind address is allowed.
+ * The bind is allowed if the address is:
+ *  - 127.0.0.1
+ *  - INADDR_ANY
+ *  - INADDR_BROADCAST
+ *  - a multicast address
+ *  - the specified address match an ifaddr owned by the current
+ *    network namespace. That implies the local address and the
+ *    computed address from the netmask
+ * @addr_type : an addr type
+ * @addr : the requested bind address
+ * Returns: -EPERM on failure, 0 on success
+ */
+int net_ns_check_bind(int addr_type, u32 addr)
+{
+	int ret = -EPERM;
+	struct net_device *dev;
+	struct in_device *in_dev;
+	struct net_namespace *net_ns = current_net_ns;
+
+	if (LOOPBACK(addr) ||
+	    MULTICAST(addr) ||
+	    INADDR_ANY == addr ||
+	    INADDR_BROADCAST == addr)
+		return 0;
+
+	read_lock(&dev_base_lock);
+	rcu_read_lock();
+	for (dev = dev_base; dev; dev = dev->next) {
+		in_dev = __in_dev_get_rcu(dev);
+		if (!in_dev)
+			continue;
+
+		for_ifa(in_dev) {
+			if (ifa->ifa_net_ns != net_ns)
+				continue;
+			if (addr == ifa->ifa_local ||
+			    addr == ifa->ifa_broadcast ||
+			    addr == (ifa->ifa_local & ifa->ifa_mask) ||
+			    addr == ((ifa->ifa_address & ifa->ifa_mask)|
+				     ~ifa->ifa_mask)) {
+				ret = 0;
+				goto out;
+			}
+		} endfor_ifa(in_dev);
+	}
+out:
+	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
+
+	return ret;
+}
+
 #endif /* CONFIG_NET_NS */
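The "computed addresses" in this check are the network address (local AND mask) and the directed broadcast (network OR NOT mask). The sketch below works through that arithmetic in user-space C. It is an illustration under assumptions: the `toy_*` names are hypothetical, addresses are in host byte order, the current namespace is passed explicitly, and the local address is used for both derived values, whereas the patch uses `ifa_local` for one and `ifa_address` for the other.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct net_namespace and struct in_ifaddr. */
struct toy_ns {
	int id;
};

struct toy_ifaddr {
	uint32_t local;               /* e.g. 192.168.1.10 */
	uint32_t mask;                /* e.g. 255.255.255.0 */
	uint32_t broadcast;           /* configured broadcast address */
	const struct toy_ns *owner;
};

static int is_loopback(uint32_t a)
{
	return (a & 0xff000000u) == 0x7f000000u;   /* 127.0.0.0/8 */
}

static int is_multicast(uint32_t a)
{
	return (a & 0xf0000000u) == 0xe0000000u;   /* 224.0.0.0/4 */
}

/* Returns 0 when the bind is allowed, -EPERM otherwise. */
int toy_check_bind(uint32_t addr, const struct toy_ifaddr *ifas, size_t n,
		   const struct toy_ns *current)
{
	size_t i;

	/* Addresses every namespace may bind to. */
	if (is_loopback(addr) || is_multicast(addr) ||
	    addr == 0x00000000u || addr == 0xffffffffu)
		return 0;

	for (i = 0; i < n; i++) {
		uint32_t network = ifas[i].local & ifas[i].mask;
		uint32_t directed = network | ~ifas[i].mask;

		if (ifas[i].owner != current)
			continue;
		if (addr == ifas[i].local || addr == ifas[i].broadcast ||
		    addr == network || addr == directed)
			return 0;
	}
	return -EPERM;
}
```

For 192.168.1.10 with mask 255.255.255.0 this accepts 192.168.1.10, 192.168.1.0 and 192.168.1.255 for the owning namespace, and rejects 192.168.1.11 as well as any address owned by another namespace.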
[patch 07/12] net namespace: set source address
From: Daniel Lezcano [EMAIL PROTECTED] When no source address is specified, search from the dev list the ifaddr allowed to be used as source address. Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] --- include/linux/net_namespace.h | 14 net/core/net_namespace.c | 68 ++ net/ipv4/route.c | 28 +++-- 3 files changed, 100 insertions(+), 10 deletions(-) Index: 2.6.20-rc4-mm1/net/ipv4/route.c === --- 2.6.20-rc4-mm1.orig/net/ipv4/route.c +++ 2.6.20-rc4-mm1/net/ipv4/route.c @@ -2475,17 +2475,17 @@ if (LOCAL_MCAST(oldflp-fl4_dst) || oldflp-fl4_dst == htonl(0x)) { if (!fl.fl4_src) - fl.fl4_src = inet_select_addr(dev_out, 0, - RT_SCOPE_LINK); + fl.fl4_src = SELECT_SRC_ADDR(dev_out, 0, +RT_SCOPE_LINK); goto make_route; } if (!fl.fl4_src) { if (MULTICAST(oldflp-fl4_dst)) - fl.fl4_src = inet_select_addr(dev_out, 0, - fl.fl4_scope); + fl.fl4_src = SELECT_SRC_ADDR(dev_out, 0, +fl.fl4_scope); else if (!oldflp-fl4_dst) - fl.fl4_src = inet_select_addr(dev_out, 0, - RT_SCOPE_HOST); + fl.fl4_src = SELECT_SRC_ADDR(dev_out, 0, +RT_SCOPE_HOST); } } @@ -2525,8 +2525,8 @@ */ if (fl.fl4_src == 0) - fl.fl4_src = inet_select_addr(dev_out, 0, - RT_SCOPE_LINK); + fl.fl4_src = SELECT_SRC_ADDR(dev_out, 0, +RT_SCOPE_LINK); res.type = RTN_UNICAST; goto make_route; } @@ -2539,7 +2539,13 @@ if (res.type == RTN_LOCAL) { if (!fl.fl4_src) +#ifdef CONFIG_NET_NS + fl.fl4_src = net_ns_select_source_address(dev_out, + fl.fl4_dst, + RT_SCOPE_LINK); +#else fl.fl4_src = fl.fl4_dst; +#endif if (dev_out) dev_put(dev_out); dev_out = loopback_dev; @@ -2561,8 +2567,10 @@ fib_select_default(fl, res); if (!fl.fl4_src) - fl.fl4_src = FIB_RES_PREFSRC(res); - + fl.fl4_src = res.fi-fib_prefsrc ? 
: + SELECT_SRC_ADDR(FIB_RES_DEV(res), + FIB_RES_GW(res), + res.scope); if (dev_out) dev_put(dev_out); dev_out = FIB_RES_DEV(res); Index: 2.6.20-rc4-mm1/include/linux/net_namespace.h === --- 2.6.20-rc4-mm1.orig/include/linux/net_namespace.h +++ 2.6.20-rc4-mm1/include/linux/net_namespace.h @@ -5,6 +5,7 @@ #include linux/kref.h #include linux/nsproxy.h #include linux/errno.h +#include linux/types.h struct net_namespace { struct kref kref; @@ -95,6 +96,11 @@ extern int net_ns_check_bind(int addr_type, u32 addr); +extern __be32 net_ns_select_source_address(const struct net_device *dev, + u32 dst, int scope); + +#define SELECT_SRC_ADDR net_ns_select_source_address + #else /* CONFIG_NET_NS */ #define INIT_NET_NS(net_ns) @@ -155,6 +161,14 @@ return 0; } +static inline __be32 net_ns_select_source_address(struct net_device *dev, + u32 dst, int scope) +{ + return 0; +} + +#define SELECT_SRC_ADDR inet_select_addr + #endif /* !CONFIG_NET_NS */ #endif /* _LINUX_NET_NAMESPACE_H */ Index: 2.6.20-rc4-mm1/net/core/net_namespace.c === --- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c +++ 2.6.20-rc4-mm1/net/core/net_namespace.c @@ -317,4 +317,72 @@ return ret; } +/* + * This function choose the source address from the network device, + * destination and the scope. The function will browse the ifaddr + * owned by network namespace and choose the most adapted for the + * dst address and dev. + * @dev : the network device where the
[patch 10/12] net namespace : add the loopback isolation
From: Daniel Lezcano [EMAIL PROTECTED] When a packet is outgoing, the namespace source is stored into the skbuff. Because it is the loopback address, the source == destination, so when the packet is incoming, it has already the namespace destination set into the packet. Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] --- include/linux/net_namespace.h | 13 +++-- include/linux/skbuff.h|5 - net/core/net_namespace.c | 32 +++- net/ipv4/ip_input.c |2 +- net/ipv4/ip_output.c |1 + 5 files changed, 44 insertions(+), 9 deletions(-) Index: 2.6.20-rc4-mm1/include/linux/skbuff.h === --- 2.6.20-rc4-mm1.orig/include/linux/skbuff.h +++ 2.6.20-rc4-mm1/include/linux/skbuff.h @@ -225,6 +225,7 @@ * @dma_cookie: a cookie to one of several possible DMA operations * done by skb DMA functions * @secmark: security marking + * @net_ns: namespace destination */ struct sk_buff { @@ -309,7 +310,9 @@ #ifdef CONFIG_NETWORK_SECMARK __u32 secmark; #endif - +#ifdef CONFIG_NET_NS + struct net_namespace*net_ns; +#endif __u32 mark; /* These elements must be at the end, see alloc_skb() for details. 
*/ Index: 2.6.20-rc4-mm1/net/ipv4/ip_input.c === --- 2.6.20-rc4-mm1.orig/net/ipv4/ip_input.c +++ 2.6.20-rc4-mm1/net/ipv4/ip_input.c @@ -396,7 +396,7 @@ iph = skb-nh.iph; - dst_net_ns = net_ns_find_from_dest_addr(iph-daddr); + dst_net_ns = net_ns_find_from_dest_addr(skb); if (dst_net_ns !net_ns_match(net_ns, dst_net_ns)) push_net_ns(dst_net_ns); /* Index: 2.6.20-rc4-mm1/net/ipv4/ip_output.c === --- 2.6.20-rc4-mm1.orig/net/ipv4/ip_output.c +++ 2.6.20-rc4-mm1/net/ipv4/ip_output.c @@ -272,6 +272,7 @@ IP_INC_STATS(IPSTATS_MIB_OUTREQUESTS); + net_ns_tag_sk_buff(skb); skb-dev = dev; skb-protocol = htons(ETH_P_IP); Index: 2.6.20-rc4-mm1/include/linux/net_namespace.h === --- 2.6.20-rc4-mm1.orig/include/linux/net_namespace.h +++ 2.6.20-rc4-mm1/include/linux/net_namespace.h @@ -8,6 +8,7 @@ #include linux/types.h struct in_ifaddr; +struct sk_buff; struct net_namespace { struct kref kref; @@ -101,10 +102,13 @@ extern __be32 net_ns_select_source_address(const struct net_device *dev, u32 dst, int scope); -extern struct net_namespace *net_ns_find_from_dest_addr(u32 daddr); +extern struct net_namespace +*net_ns_find_from_dest_addr(const struct sk_buff *skb); extern int net_ns_ifa_is_visible(const struct in_ifaddr *ifa); +extern void net_ns_tag_sk_buff(struct sk_buff *skb); + #define SELECT_SRC_ADDR net_ns_select_source_address #else /* CONFIG_NET_NS */ @@ -173,7 +177,8 @@ return 0; } -static inline struct net_namespace *net_ns_find_from_dest_addr(u32 daddr) +static inline struct net_namespace +*net_ns_find_from_dest_addr(const struct sk_buff *skb) { return NULL; } @@ -183,6 +188,10 @@ return 1; } +static inline void net_ns_tag_sk_buff(struct sk_buff *skb) +{ + ; +} #define SELECT_SRC_ADDR inet_select_addr #endif /* !CONFIG_NET_NS */ Index: 2.6.20-rc4-mm1/net/core/net_namespace.c === --- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c +++ 2.6.20-rc4-mm1/net/core/net_namespace.c @@ -13,6 +13,9 @@ #include linux/in.h #include linux/netdevice.h #include linux/inetdevice.h +#include 
linux/skbuff.h +#include linux/ip.h + #include net/ip_fib.h struct net_namespace init_net_ns = { @@ -389,18 +392,25 @@ /* * This function finds the network namespace destination deduced from * the destination address. The network namespace is retrieved from - * the ifaddr owned by a network namespace - * @daddr : destination + * the ifaddr owned by a network namespace. If the packet is for the + * loopback address so we assume the destination address is already filled + * by the sender which is the same as the receiver. + * @skb : the packet to be delivered * Returns : the network namespace destination or NULL if not found */ -struct net_namespace *net_ns_find_from_dest_addr(u32 daddr) +struct net_namespace *net_ns_find_from_dest_addr(const struct sk_buff *skb) { struct net_namespace *net_ns = NULL; struct net_device *dev; struct in_device *in_dev; + struct iphdr *iph; + __be32 daddr; + + iph = skb-nh.iph; + daddr = iph-daddr; - if (LOOPBACK(daddr)) - return current_net_ns; + if (LOOPBACK(daddr)) + return skb-net_ns; read_lock(dev_base_lock);
[patch 05/12] net namespace : ioctl to push ifa to net namespace l3
From: Daniel Lezcano [EMAIL PROTECTED] New ioctl to push ifaddr to a container. Actually, the push is done from the current namespace, so the right word is pull. That will be changed to move ifaddr from l2 network namespace to l3. Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] --- include/linux/net_namespace.h |7 ++ include/linux/sockios.h |4 + net/core/net_namespace.c | 118 +- net/ipv4/af_inet.c|4 + 4 files changed, 132 insertions(+), 1 deletion(-) Index: 2.6.20-rc4-mm1/include/linux/sockios.h === --- 2.6.20-rc4-mm1.orig/include/linux/sockios.h +++ 2.6.20-rc4-mm1/include/linux/sockios.h @@ -122,6 +122,10 @@ #define SIOCBRADDIF0x89a2 /* add interface to bridge */ #define SIOCBRDELIF0x89a3 /* remove interface from bridge */ +/* Container calls */ +#define SIOCNETNSPUSHIF 0x89b0 /* add ifaddr to namespace */ +#define SIOCNETNSPULLIF 0x89b1 /* remove ifaddr to namespace */ + /* Device private ioctl calls */ /* Index: 2.6.20-rc4-mm1/net/ipv4/af_inet.c === --- 2.6.20-rc4-mm1.orig/net/ipv4/af_inet.c +++ 2.6.20-rc4-mm1/net/ipv4/af_inet.c @@ -789,6 +789,10 @@ case SIOCSIFFLAGS: err = devinet_ioctl(cmd, (void __user *)arg); break; + case SIOCNETNSPUSHIF: + case SIOCNETNSPULLIF: + err = net_ns_ioctl(cmd, (void __user *)arg); + break; default: if (sk-sk_prot-ioctl) err = sk-sk_prot-ioctl(sk, cmd, arg); Index: 2.6.20-rc4-mm1/include/linux/net_namespace.h === --- 2.6.20-rc4-mm1.orig/include/linux/net_namespace.h +++ 2.6.20-rc4-mm1/include/linux/net_namespace.h @@ -91,6 +91,8 @@ #define net_ns_hash(ns)((ns)-hash) +extern int net_ns_ioctl(unsigned int cmd, void __user *arg); + #else /* CONFIG_NET_NS */ #define INIT_NET_NS(net_ns) @@ -141,6 +143,11 @@ #define net_ns_hash(ns)(0) +static inline int net_ns_ioctl(unsigned int cmd, void __user *arg) +{ + return -ENOSYS; +} + #endif /* !CONFIG_NET_NS */ #endif /* _LINUX_NET_NAMESPACE_H */ Index: 2.6.20-rc4-mm1/net/core/net_namespace.c === --- 2.6.20-rc4-mm1.orig/net/core/net_namespace.c +++ 2.6.20-rc4-mm1/net/core/net_namespace.c @@ 
-10,7 +10,9 @@ #include linux/nsproxy.h #include linux/net_namespace.h #include linux/net.h +#include linux/in.h #include linux/netdevice.h +#include linux/inetdevice.h #include net/ip_fib.h struct net_namespace init_net_ns = { @@ -123,6 +125,33 @@ return err; } +/* + * The function will move the ifaddr to the l2 network namespace + * parent. + * @net_ns: the related network namespace + */ +static void release_ifa_to_parent(const struct net_namespace* net_ns) +{ + struct net_device *dev; + struct in_device *in_dev; + + read_lock(dev_base_lock); + rcu_read_lock(); + for (dev = dev_base; dev; dev = dev-next) { + in_dev = __in_dev_get_rcu(dev); + if (!in_dev) + continue; + + for_ifa(in_dev) { + if (ifa-ifa_net_ns != net_ns) + continue; + ifa-ifa_net_ns = net_ns-parent; + } endfor_ifa(in_dev); + } + read_unlock(dev_base_lock); + rcu_read_unlock(); +} + void free_net_ns(struct kref *kref) { struct net_namespace *ns; @@ -139,12 +168,99 @@ } } - if (ns-level == NET_NS_LEVEL3) + if (ns-level == NET_NS_LEVEL3) { + release_ifa_to_parent(ns); put_net_ns(ns-parent); + } printk(KERN_DEBUG NET_NS: net namespace %p destroyed\n, ns); kfree(ns); } EXPORT_SYMBOL_GPL(free_net_ns); +/* + * This function allows to assign an IP address from a l2 network + * namespace to one of his l3 child or to release from an l3 network + * namespace to his l2 network namespace parent. + * @cmd: a push / pull command + * @arg: an userspace buffer containing an ifreq structure + * Returns: + * - EPERM : if caller has no CAP_NET_ADMIN capabilities or the + * current level of network namespace is not layer 2 + * - EFAULT : if arg is an invalid buffer + * - EADDRNOTAVAIL : if the specified ifaddr does not exists + * - EINVAL : if cmd is unknown + * - zero on success + */ +int net_ns_ioctl(unsigned int cmd, void __user *arg) +{ + struct ifreq ifr; + struct sockaddr_in *sin = (struct sockaddr_in *)ifr.ifr_addr; + struct net_namespace *net_ns = current_net_ns; +
[patch 11/12] net namespace : debugfs - add net_ns debugfs
From: Daniel Lezcano [EMAIL PROTECTED] For debug purpose only, this is not intended to be included. Add /sys/kernel/debug/net_ns. Creation of network namespace: echo level /sys/kernel/debug/net_ns/start Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] --- fs/debugfs/Makefile |2 fs/debugfs/net_ns.c | 335 net/Kconfig |4 3 files changed, 340 insertions(+), 1 deletion(-) Index: 2.6.20-rc4-mm1/fs/debugfs/Makefile === --- 2.6.20-rc4-mm1.orig/fs/debugfs/Makefile +++ 2.6.20-rc4-mm1/fs/debugfs/Makefile @@ -1,4 +1,4 @@ debugfs-objs := inode.o file.o obj-$(CONFIG_DEBUG_FS) += debugfs.o - +obj-$(CONFIG_NET_NS_DEBUG) += net_ns.o Index: 2.6.20-rc4-mm1/fs/debugfs/net_ns.c === --- /dev/null +++ 2.6.20-rc4-mm1/fs/debugfs/net_ns.c @@ -0,0 +1,335 @@ +/* + * net_ns.c - adds a net_ns/ directory to debug NET namespaces + * + * Author: Daniel Lezcano [EMAIL PROTECTED] + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation, version 2 of the + * License. 
+ */ + +#include linux/module.h +#include linux/kernel.h +#include linux/pagemap.h +#include linux/debugfs.h +#include linux/sched.h +#include linux/netdevice.h +#include linux/inetdevice.h +#include linux/syscalls.h +#include linux/net_namespace.h +#include linux/rtnetlink.h + +static struct dentry *net_ns_dentry; +static struct dentry *net_ns_dentry_dev; +static struct dentry *net_ns_dentry_start; +static struct dentry *net_ns_dentry_info; + +static ssize_t net_ns_dev_read_file(struct file *file, char __user *user_buf, + size_t count, loff_t *ppos) +{ + return 0; +} + +static ssize_t net_ns_dev_write_file(struct file *file, +const char __user *user_buf, +size_t count, loff_t *ppos) +{ + return 0; +} + +static int net_ns_dev_open_file(struct inode *inode, struct file *file) +{ + return 0; +} + +static int net_ns_start_open_file(struct inode *inode, struct file *file) +{ + return 0; +} + +static ssize_t net_ns_start_read_file(struct file *file, char __user *user_buf, + size_t count, loff_t *ppos) +{ + return 0; +} + +static ssize_t net_ns_start_write_file(struct file *file, + const char __user *user_buf, + size_t count, loff_t *ppos) +{ + int err; + size_t len; + const char __user *p; + char c; + unsigned long flags; + struct net_namespace *net, *new_net; + struct nsproxy *new_nsproxy = NULL, *old_nsproxy = NULL; + + if (current_net_ns != init_net_ns) + return -EBUSY; + + len = 0; + p = user_buf; + while (len count) { + if (get_user(c, p++)) + return -EFAULT; + if (c == 0 || c == '\n') + break; + len++; + } + + if (len 1) + return -EINVAL; + + if (copy_from_user(c, user_buf, sizeof(c))) + return -EFAULT; + + if (c != '2' c != '3') + return -EINVAL; + + flags = (c=='2'?CLONE_NEWNET2:CLONE_NEWNET3); + err = unshare_net_ns(flags, new_net); + if (err) + return err; + + old_nsproxy = current-nsproxy; + new_nsproxy = dup_namespaces(old_nsproxy); + + if (!new_nsproxy) { + put_net_ns(new_net); + task_unlock(current); + return -ENOMEM; + } + + task_lock(current); + + if 
(new_nsproxy) { + current-nsproxy = new_nsproxy; + new_nsproxy = old_nsproxy; + } + + net = current-nsproxy-net_ns; + current-nsproxy-net_ns = new_net; + pop_net_ns(new_net); + new_net = net; + + task_unlock(current); + + put_nsproxy(new_nsproxy); + put_net_ns(new_net); + + return count; +} + +static int net_ns_info_open_file(struct inode *inode, struct file *file) +{ + return 0; +} + +static ssize_t net_ns_info_read_file(struct file *file, char __user *user_buf, +size_t count, loff_t *ppos) +{ + const unsigned int length = 256; + size_t len; + char buff[length]; + char *level; + struct net_namespace *net_ns = current_net_ns; + struct nsproxy *ns = current-nsproxy; + + if (*ppos 0) + return -EINVAL; + if (*ppos = count) + return 0; + if (!count) + return 0; + + switch (net_ns-level) { + case NET_NS_LEVEL2: + level = layer 2; +
[RFC] [patch 4/6] [Network namespace] Network inet devices isolation
Network isolation relies on the fact that an application cannot use
IP addresses that do not belong to the container in which it is running.

This patch isolates the inet device level by adding a network namespace
pointer to struct in_ifaddr. When an IP address is set inside a network
namespace, the in_ifaddr structure is filled with the current namespace
pointer. The loopback address is a special case: it belongs to all
namespaces, which is expressed by setting its network namespace pointer
to NULL.

This patch isolates the ifconfig and ip addr commands, so that an IP
address set in one network namespace is not visible from the other
network namespaces.

Replace-Subject: [Network namespace] Network inet devices isolation
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/linux/inetdevice.h |    1 +
 net/ipv4/devinet.c         |   28 +++-
 2 files changed, 28 insertions(+), 1 deletion(-)

Index: 2.6-mm/include/linux/inetdevice.h
===================================================================
--- 2.6-mm.orig/include/linux/inetdevice.h
+++ 2.6-mm/include/linux/inetdevice.h
@@ -99,6 +99,7 @@
 	unsigned char		ifa_flags;
 	unsigned char		ifa_prefixlen;
 	char			ifa_label[IFNAMSIZ];
+	struct net_namespace	*ifa_net_ns;
 };
 extern int register_inetaddr_notifier(struct notifier_block *nb);
Index: 2.6-mm/net/ipv4/devinet.c
===================================================================
--- 2.6-mm.orig/net/ipv4/devinet.c
+++ 2.6-mm/net/ipv4/devinet.c
@@ -54,6 +54,7 @@
 #include <linux/notifier.h>
 #include <linux/inetdevice.h>
 #include <linux/igmp.h>
+#include <linux/net_ns.h>
 #ifdef CONFIG_SYSCTL
 #include <linux/sysctl.h>
 #endif
@@ -257,6 +258,7 @@
 			if (!(ifa->ifa_flags & IFA_F_SECONDARY) ||
 			    ifa1->ifa_mask != ifa->ifa_mask ||
+			    ifa->ifa_net_ns != net_ns() ||
 			    !inet_ifa_match(ifa1->ifa_address, ifa)) {
 				ifap1 = &ifa->ifa_next;
 				prev_prom = ifa;
@@ -317,6 +319,8 @@
 	if (destroy) {
 		inet_free_ifa(ifa1);
+		put_net_ns(ifa1->ifa_net_ns);
+
 		if (!in_dev->ifa_list)
 			inetdev_destroy(in_dev);
 	}
@@ -343,6 +347,7 @@
 		    ifa->ifa_scope <= ifa1->ifa_scope)
 			last_primary = &ifa1->ifa_next;
 		if (ifa1->ifa_mask == ifa->ifa_mask &&
+		    ifa1->ifa_net_ns == ifa->ifa_net_ns &&
 		    inet_ifa_match(ifa1->ifa_address, ifa)) {
 			if (ifa1->ifa_local == ifa->ifa_local) {
 				inet_free_ifa(ifa);
@@ -437,6 +442,8 @@
 	for (ifap = &in_dev->ifa_list; (ifa = *ifap) != NULL;
 	     ifap = &ifa->ifa_next) {
+		if (ifa->ifa_net_ns != net_ns())
+			continue;
 		if ((rta[IFA_LOCAL - 1] &&
 		     memcmp(RTA_DATA(rta[IFA_LOCAL - 1]),
 			    &ifa->ifa_local, 4)) ||
@@ -497,6 +504,9 @@
 	ifa->ifa_scope = ifm->ifa_scope;
 	in_dev_hold(in_dev);
 	ifa->ifa_dev = in_dev;
+	ifa->ifa_net_ns = net_ns();
+	get_net_ns(net_ns());
+
 	if (rta[IFA_LABEL - 1])
 		rtattr_strlcpy(ifa->ifa_label, rta[IFA_LABEL - 1], IFNAMSIZ);
 	else
@@ -631,10 +641,15 @@
 			for (ifap = &in_dev->ifa_list; (ifa = *ifap) != NULL;
 			     ifap = &ifa->ifa_next)
 				if (!strcmp(ifr.ifr_name, ifa->ifa_label))
-					break;
+					if (!ifa->ifa_net_ns ||
+					    ifa->ifa_net_ns == net_ns())
+						break;
 		}
 	}
+	if (ifa && ifa->ifa_net_ns && ifa->ifa_net_ns != net_ns())
+		goto done;
+
 	ret = -EADDRNOTAVAIL;
 	if (!ifa && cmd != SIOCSIFADDR && cmd != SIOCSIFFLAGS)
 		goto done;
@@ -678,6 +693,12 @@
 			ret = -ENOBUFS;
 			if ((ifa = inet_alloc_ifa()) == NULL)
 				break;
+			if (!LOOPBACK(sin->sin_addr.s_addr)) {
+				ifa->ifa_net_ns = net_ns();
+				get_net_ns(net_ns());
+			} else
+				ifa->ifa_net_ns = NULL;
+
 			if (colon)
 				memcpy(ifa->ifa_label, ifr.ifr_name, IFNAMSIZ);
 			else
@@ -782,6 +803,8 @@
 		goto out;
 	for (; ifa; ifa = ifa->ifa_next) {
+		if (ifa->ifa_net_ns && ifa->ifa_net_ns != net_ns())
+			continue;
 		if (!buf) {
 			done += sizeof(ifr);
 			continue;
@@ -1012,6
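The visibility rule used in the ioctl and dump hunks of this patch can be tried outside the kernel. The following is only a userspace sketch under stated assumptions: struct net_namespace_model, struct in_ifaddr_model, ifa_visible() and count_visible() are hypothetical stand-ins invented for illustration, not kernel code. It shows that an address is visible when its namespace pointer is NULL (the loopback case) or equal to the current namespace.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct net_namespace_model {
	int id;
};

struct in_ifaddr_model {
	const char *label;
	/* NULL means "loopback, shared by every namespace". */
	const struct net_namespace_model *ifa_net_ns;
};

/* Inverse of the patch's test "ifa->ifa_net_ns && ifa->ifa_net_ns != net_ns()". */
bool ifa_visible(const struct in_ifaddr_model *ifa,
		 const struct net_namespace_model *current_ns)
{
	return ifa->ifa_net_ns == NULL || ifa->ifa_net_ns == current_ns;
}

/* Count the addresses a dump loop would report for current_ns. */
size_t count_visible(const struct in_ifaddr_model *addrs, size_t n,
		     const struct net_namespace_model *current_ns)
{
	size_t i, visible = 0;

	for (i = 0; i < n; i++)
		if (ifa_visible(&addrs[i], current_ns))
			visible++;
	return visible;
}
```

With one loopback entry and one address per namespace, each namespace sees its own address plus the loopback, and never the address of the other one.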
[RFC] [patch 2/6] [Network namespace] Network device sharing by view
Adds a device list view to the network namespace. The view is emptied
when the namespace is unshared. It is filled and emptied through a set of
functions which can be called by an external module.

Replace-Subject: [Network namespace] Network device sharing by view
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/linux/net_ns.h     |    2
 include/linux/net_ns_dev.h |   32 +++
 init/version.c             |    4
 net/core/Makefile          |    2
 net/core/net_ns_dev.c      |  205 +
 net/net_ns.c               |    6 +
 6 files changed, 250 insertions(+), 1 deletion(-)

Index: 2.6-mm/include/linux/net_ns_dev.h
===================================================================
--- /dev/null
+++ 2.6-mm/include/linux/net_ns_dev.h
@@ -0,0 +1,32 @@
+#ifndef _LINUX_NET_NS_DEV_H
+#define _LINUX_NET_NS_DEV_H
+
+struct net_device;
+
+struct net_ns_dev {
+	struct list_head list;
+	struct net_device *dev;
+};
+
+struct net_ns_dev_list {
+	struct list_head list;
+	rwlock_t lock;
+};
+
+extern int net_ns_dev_unregister(struct net_device *dev,
+				 struct net_ns_dev_list *devlist);
+
+extern int net_ns_dev_register(struct net_device *dev,
+			       struct net_ns_dev_list *devlist);
+
+extern struct net_device *net_ns_dev_find_by_name(const char *devname,
+					struct net_ns_dev_list *devlist);
+extern int net_ns_dev_remove(const char *devname,
+			     struct net_ns_dev_list *devlist);
+
+extern int net_ns_dev_add(const char *devname,
+			  struct net_ns_dev_list *devlist);
+
+extern int free_net_ns_dev(struct net_ns_dev_list *devlist);
+
+#endif
Index: 2.6-mm/include/linux/net_ns.h
===================================================================
--- 2.6-mm.orig/include/linux/net_ns.h
+++ 2.6-mm/include/linux/net_ns.h
@@ -4,9 +4,11 @@
 #include <linux/kref.h>
 #include <linux/sched.h>
 #include <linux/nsproxy.h>
+#include <linux/net_ns_dev.h>
 
 struct net_namespace {
 	struct kref kref;
+	struct net_ns_dev_list dev_list;
 };
 
 extern struct net_namespace init_net_ns;
Index: 2.6-mm/net/core/net_ns_dev.c
===================================================================
--- /dev/null
+++ 2.6-mm/net/core/net_ns_dev.c
@@ -0,0 +1,205 @@
+/*
+ *  net_ns_dev.c - adds namespace network device view
+ *
+ *  Copyright (C) 2006 IBM
+ *
+ *  Author: Daniel Lezcano [EMAIL PROTECTED]
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License as
+ *  published by the Free Software Foundation, version 2 of the
+ *  License.
+ */
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/netdevice.h>
+#include <linux/net_ns_dev.h>
+
+int free_net_ns_dev(struct net_ns_dev_list *devlist)
+{
+	struct list_head *l, *next;
+	struct net_ns_dev *db;
+	struct net_device *dev;
+
+	write_lock(&devlist->lock);
+	list_for_each_safe(l, next, &devlist->list) {
+		db = list_entry(l, struct net_ns_dev, list);
+		dev = db->dev;
+		list_del(&db->list);
+		dev_put(dev);
+		kfree(db);
+	}
+	write_unlock(&devlist->lock);
+
+	return 0;
+}
+
+/*
+ * Remove a device from the namespace network devices list
+ * when registered from a namespace
+ * @dev : network device
+ * @dev_list: network namespace devices
+ * Return ENODEV if the device does not exist,
+ */
+int net_ns_dev_unregister(struct net_device *dev,
+			  struct net_ns_dev_list *devlist)
+{
+	struct net_ns_dev *db;
+	struct list_head *l;
+	int ret = -ENODEV;
+
+	write_lock(&devlist->lock);
+	list_for_each(l, &devlist->list) {
+		db = list_entry(l, struct net_ns_dev, list);
+		if (dev != db->dev)
+			continue;
+
+		list_del(&db->list);
+		dev_put(dev);
+		kfree(db);
+		ret = 0;
+		break;
+	}
+	write_unlock(&devlist->lock);
+	return ret;
+}
+
+EXPORT_SYMBOL_GPL(net_ns_dev_unregister);
+
+/*
+ * Add a device to the namespace network devices list
+ * when registered from a namespace
+ * @dev : network device
+ * @dev_list: network namespace devices
+ * Return ENOMEM if allocation fails, 0 on success
+ */
+int net_ns_dev_register(struct net_device *dev,
+			struct net_ns_dev_list *devlist)
+{
+	struct net_ns_dev *db;
+
+	db = kmalloc(sizeof(*db), GFP_KERNEL);
+	if (!db)
+		return -ENOMEM;
+
+	write_lock(&devlist->lock);
+	dev_hold(dev);
+	db->dev = dev;
+	list_add_tail(&db->list, &devlist->list);
+	write_unlock(&devlist->lock);
+
+	return 0;
+}
+
+EXPORT_SYMBOL_GPL(net_ns_dev_register);
+
+/*
+ * Add a device to the namespace network devices list
+ *
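For readers who want to experiment with the register/unregister/lookup semantics of the device view without building a kernel, here is a minimal userspace sketch. Everything in it is an assumption made for illustration: the *_model types and the view_*() functions are hypothetical names, a plain singly linked list replaces list_head, an integer counter stands in for dev_hold()/dev_put(), and the rwlock is left out.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct net_device. */
struct net_device_model {
	char name[16];
	int refcnt;
};

struct view_entry {
	struct view_entry *next;
	struct net_device_model *dev;
};

/* Hypothetical stand-in for struct net_ns_dev_list (no locking here). */
struct dev_view {
	struct view_entry *head;
};

/* Add dev to the view; returns -ENOMEM if the entry cannot be allocated. */
int view_register(struct dev_view *view, struct net_device_model *dev)
{
	struct view_entry *e = malloc(sizeof(*e));

	if (!e)
		return -ENOMEM;
	dev->refcnt++;			/* plays the role of dev_hold() */
	e->dev = dev;
	e->next = view->head;		/* head insertion; the patch appends at the tail */
	view->head = e;
	return 0;
}

/* Remove dev from the view; returns -ENODEV if it is not in the view. */
int view_unregister(struct dev_view *view, struct net_device_model *dev)
{
	struct view_entry **pp;

	for (pp = &view->head; *pp; pp = &(*pp)->next) {
		struct view_entry *e = *pp;

		if (e->dev != dev)
			continue;
		*pp = e->next;
		dev->refcnt--;		/* plays the role of dev_put() */
		free(e);
		return 0;
	}
	return -ENODEV;
}

/* Look a device up by name; only devices present in the view are found. */
struct net_device_model *view_find_by_name(const struct dev_view *view,
					   const char *name)
{
	const struct view_entry *e;

	for (e = view->head; e; e = e->next)
		if (strncmp(e->dev->name, name, sizeof(e->dev->name)) == 0)
			return e->dev;
	return NULL;
}
```

A device that was never registered in a view cannot be found through it, which is the isolation property the view is meant to give.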
[RFC] [patch 3/6] [Network namespace] Network devices isolation
The dev list view is filled and used from here. The dev_base_list has
been replaced by the dev list view, and a device can be accessed only if
the view has it in its list. All calls from userspace (ioctls, netlink
and procfs) use the network device view instead of the global network
device list.

Replace-Subject: [Network namespace] Network devices isolation
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 net/core/dev.c       |  147 ++-
 net/core/rtnetlink.c |   21 +--
 2 files changed, 126 insertions(+), 42 deletions(-)

Index: 2.6-mm/net/core/dev.c
===================================================================
--- 2.6-mm.orig/net/core/dev.c
+++ 2.6-mm/net/core/dev.c
@@ -115,6 +115,7 @@
 #include <net/iw_handler.h>
 #include <asm/current.h>
 #include <linux/audit.h>
+#include <linux/net_ns.h>
 #include <linux/dmaengine.h>
 /*
@@ -474,13 +475,16 @@
 struct net_device *__dev_get_by_name(const char *name)
 {
-	struct hlist_node *p;
+	struct net_ns_dev_list *dev_list = &(net_ns()->dev_list);
+	struct list_head *l, *list = &dev_list->list;
+	struct net_ns_dev *db;
+	struct net_device *dev;
-	hlist_for_each(p, dev_name_hash(name)) {
-		struct net_device *dev
-			= hlist_entry(p, struct net_device, name_hlist);
+	list_for_each(l, list) {
+		db = list_entry(l, struct net_ns_dev, list);
+		dev = db->dev;
 		if (!strncmp(dev->name, name, IFNAMSIZ))
-			return dev;
+			return dev;
 	}
 	return NULL;
 }
@@ -498,13 +502,14 @@
 struct net_device *dev_get_by_name(const char *name)
 {
+	struct net_ns_dev_list *dev_list = &(net_ns()->dev_list);
 	struct net_device *dev;
-	read_lock(&dev_base_lock);
+	read_lock(&dev_list->lock);
 	dev = __dev_get_by_name(name);
 	if (dev)
 		dev_hold(dev);
-	read_unlock(&dev_base_lock);
+	read_unlock(&dev_list->lock);
 	return dev;
 }
@@ -521,11 +526,14 @@
 struct net_device *__dev_get_by_index(int ifindex)
 {
-	struct hlist_node *p;
+	struct net_ns_dev_list *dev_list = &(net_ns()->dev_list);
+	struct list_head *l, *list = &dev_list->list;
+	struct net_ns_dev *db;
+	struct net_device *dev;
-	hlist_for_each(p, dev_index_hash(ifindex)) {
-		struct net_device *dev
-			= hlist_entry(p, struct net_device, index_hlist);
+	list_for_each(l, list) {
+		db = list_entry(l, struct net_ns_dev, list);
+		dev = db->dev;
 		if (dev->ifindex == ifindex)
 			return dev;
 	}
@@ -545,13 +553,14 @@
 struct net_device *dev_get_by_index(int ifindex)
 {
+	struct net_ns_dev_list *dev_list = &(net_ns()->dev_list);
 	struct net_device *dev;
-	read_lock(&dev_base_lock);
+	read_lock(&dev_list->lock);
 	dev = __dev_get_by_index(ifindex);
 	if (dev)
 		dev_hold(dev);
-	read_unlock(&dev_base_lock);
+	read_unlock(&dev_list->lock);
 	return dev;
 }
@@ -571,14 +580,24 @@
 struct net_device *dev_getbyhwaddr(unsigned short type, char *ha)
 {
-	struct net_device *dev;
+	struct net_ns_dev_list *dev_list = &(net_ns()->dev_list);
+	struct list_head *l, *list = &dev_list->list;
+	struct net_ns_dev *db;
+	struct net_device *dev = NULL;
 	ASSERT_RTNL();
-	for (dev = dev_base; dev; dev = dev->next)
+	read_lock(&dev_list->lock);
+	list_for_each(l, list) {
+		db = list_entry(l, struct net_ns_dev, list);
+		dev = db->dev;
 		if (dev->type == type &&
 		    !memcmp(dev->dev_addr, ha, dev->addr_len))
-			break;
+			goto out;
+	}
+	dev = NULL;
+out:
+	read_unlock(&dev_list->lock);
 	return dev;
 }
@@ -586,15 +605,25 @@
 struct net_device *dev_getfirstbyhwtype(unsigned short type)
 {
+	struct net_ns_dev_list *dev_list = &(net_ns()->dev_list);
+	struct list_head *l, *list = &dev_list->list;
+	struct net_ns_dev *db;
 	struct net_device *dev;
 	rtnl_lock();
-	for (dev = dev_base; dev; dev = dev->next) {
+
+	read_lock(&dev_list->lock);
+	list_for_each(l, list) {
+		db = list_entry(l, struct net_ns_dev, list);
+		dev = db->dev;
 		if (dev->type == type) {
 			dev_hold(dev);
-			break;
+			goto out;
 		}
 	}
+	dev = NULL;
+out:
+	read_unlock(&dev_list->lock);
 	rtnl_unlock();
 	return dev;
 }
@@ -614,16 +643,23 @@
 struct net_device * dev_get_by_flags(unsigned short if_flags, unsigned short mask)
 {
+	struct net_ns_dev_list *dev_list = &(net_ns()->dev_list);
+	struct list_head *l, *list =
[RFC] [patch 6/6] [Network namespace] Network namespace debugfs
This patch is for testing purposes only. It makes it possible to read
which network devices are accessible and to add a network device to the
view. This RFC hack is purely for discussing the best way to do that.

After unsharing with the CLONE_NEWNET flag:
-------------------------------------------
To see which devices are accessible:
	cat /sys/kernel/debug/net_ns/dev
To add a device:
	echo eth1 > /sys/kernel/debug/net_ns/dev

This functionality is intended to be implemented in a higher-level
container configuration.

Replace-Subject: [Network namespace] Network namespace debugfs
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 fs/debugfs/Makefile |    2
 fs/debugfs/net_ns.c |  141
 net/Kconfig         |    4 +
 3 files changed, 146 insertions(+), 1 deletion(-)

Index: 2.6-mm/fs/debugfs/Makefile
===================================================================
--- 2.6-mm.orig/fs/debugfs/Makefile
+++ 2.6-mm/fs/debugfs/Makefile
@@ -1,4 +1,4 @@
 debugfs-objs	:= inode.o file.o
 
 obj-$(CONFIG_DEBUG_FS)	+= debugfs.o
-
+obj-$(CONFIG_NET_NS_DEBUG) += net_ns.o
Index: 2.6-mm/fs/debugfs/net_ns.c
===================================================================
--- /dev/null
+++ 2.6-mm/fs/debugfs/net_ns.c
@@ -0,0 +1,141 @@
+/*
+ *  net_ns.c - adds a net_ns/ directory to debug NET namespaces
+ *
+ *  Copyright (C) 2006 IBM
+ *
+ *  Author: Daniel Lezcano [EMAIL PROTECTED]
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License as
+ *  published by the Free Software Foundation, version 2 of the
+ *  License.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/pagemap.h>
+#include <linux/debugfs.h>
+#include <linux/sched.h>
+#include <linux/netdevice.h>
+#include <linux/net_ns.h>
+
+static struct dentry *net_ns_dentry;
+static struct dentry *net_ns_dentry_dev;
+
+static ssize_t net_ns_dev_read_file(struct file *file, char __user *user_buf,
+				    size_t count, loff_t *ppos)
+{
+	size_t len;
+	char *buf;
+	struct net_ns_dev_list *devlist = &(net_ns()->dev_list);
+	struct net_ns_dev *db;
+	struct net_device *dev;
+	struct list_head *l;
+
+	if (*ppos < 0)
+		return -EINVAL;
+	if (*ppos >= count)
+		return 0;
+
+	/* It's for debug, everything should fit */
+	buf = kmalloc(4096, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+	buf[0] = '\0';
+
+	read_lock(&devlist->lock);
+	list_for_each(l, &devlist->list) {
+		db = list_entry(l, struct net_ns_dev, list);
+		dev = db->dev;
+		strcat(buf,dev->name);
+		strcat(buf,"\n");
+	}
+	read_unlock(&devlist->lock);
+
+	len = strlen(buf);
+
+	if (len > count)
+		len = count;
+
+	if (copy_to_user(user_buf, buf, len)) {
+		kfree(buf);
+		return -EFAULT;
+	}
+
+	*ppos += count;
+	kfree(buf);
+
+	return count;
+}
+
+static ssize_t net_ns_dev_write_file(struct file *file,
+				     const char __user *user_buf,
+				     size_t count, loff_t *ppos)
+{
+	int ret;
+	size_t len;
+	const char __user *p;
+	char c;
+	char devname[IFNAMSIZ];
+	struct net_ns_dev_list *dev_list = &(net_ns()->dev_list);
+
+	len = 0;
+	p = user_buf;
+	while (len < count) {
+		if (get_user(c, p++))
+			return -EFAULT;
+		if (c == 0 || c == '\n')
+			break;
+		len++;
+	}
+
+	if (len >= IFNAMSIZ)
+		return -EINVAL;
+
+	if (copy_from_user(devname, user_buf, len))
+		return -EFAULT;
+
+	devname[len] = '\0';
+
+	ret = net_ns_dev_add(devname, dev_list);
+	if (ret)
+		return ret;
+
+	*ppos += count;
+	return count;
+}
+
+static int net_ns_dev_open_file(struct inode *inode, struct file *file)
+{
+	return 0;
+}
+
+static struct file_operations net_ns_dev_fops = {
+	.read = net_ns_dev_read_file,
+	.write = net_ns_dev_write_file,
+	.open = net_ns_dev_open_file,
+};
+
+static int __init net_ns_init(void)
+{
+	net_ns_dentry = debugfs_create_dir("net_ns", NULL);
+
+	net_ns_dentry_dev = debugfs_create_file("dev", 0666,
+						net_ns_dentry,
+						NULL,
+						&net_ns_dev_fops);
+	return 0;
+}
+
+static void __exit net_ns_exit(void)
+{
+	debugfs_remove(net_ns_dentry_dev);
+	debugfs_remove(net_ns_dentry);
+}
+
+module_init(net_ns_init);
+module_exit(net_ns_exit);
+
+MODULE_DESCRIPTION("NET namespace debugfs");
+MODULE_AUTHOR("Daniel Lezcano [EMAIL
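The write handler of this debugfs file takes the first line of the user buffer as a device name and rejects names that do not fit in IFNAMSIZ bytes including the terminating NUL. The same parsing step can be sketched in userspace as follows. This is an illustration only: parse_devname() and IFNAMSIZ_MODEL are hypothetical names, and an ordinary memory buffer replaces the get_user()/copy_from_user() accesses.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Same value as the kernel's IFNAMSIZ; redefined here to stay standalone. */
#define IFNAMSIZ_MODEL 16

/*
 * Copy the first line of buf (everything before the first '\n' or NUL,
 * at most count bytes) into devname and NUL-terminate it.
 * Returns the name length, or -EINVAL if the name would not fit.
 */
int parse_devname(const char *buf, size_t count, char devname[IFNAMSIZ_MODEL])
{
	size_t len = 0;

	while (len < count && buf[len] != '\0' && buf[len] != '\n')
		len++;

	if (len >= IFNAMSIZ_MODEL)
		return -EINVAL;

	memcpy(devname, buf, len);
	devname[len] = '\0';
	return (int)len;
}
```

Feeding it the bytes written by "echo eth1" (the name followed by a newline) yields the name "eth1" with the newline stripped.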
[RFC] [patch 5/6] [Network namespace] ipv4 isolation
This patch partially isolates ipv4 by adding a network namespace
pointer to struct sock, the bind bucket and the skbuff.

When a socket is created, the pointer to the current network namespace
is stored in its struct sock, so the socket belongs to that namespace.
This makes it possible to identify the sockets related to a namespace
for lookup and procfs.

The lookup is extended with a network namespace pointer in order to
distinguish listening sockets bound to the same port. This allows
several applications to be bound to INADDR_ANY:port in different
network namespaces without conflicting. The bind is checked against
both port and network namespace.

When an outgoing packet has a loopback destination address, the skbuff
is filled with the network namespace, so loopback packets never leave
the namespace. This approach facilitates the migration of loopback,
because identification is done by network namespace and not by address.

The loopback has been benchmarked with tbench and the overhead is
roughly 1.5 %.

Replace-Subject: [Network namespace] ipv4 isolation
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/linux/skbuff.h           |    2 ++
 include/net/inet_hashtables.h    |   34 --
 include/net/inet_timewait_sock.h |    1 +
 include/net/sock.h               |    4
 net/dccp/ipv4.c                  |    7 ---
 net/ipv4/af_inet.c               |    2 ++
 net/ipv4/inet_connection_sock.c  |    3 ++-
 net/ipv4/inet_diag.c             |    3 ++-
 net/ipv4/inet_hashtables.c       |    6 +-
 net/ipv4/inet_timewait_sock.c    |    1 +
 net/ipv4/ip_output.c             |    4
 net/ipv4/tcp_ipv4.c              |   25 -
 net/ipv4/udp.c                   |    7 +--
 13 files changed, 72 insertions(+), 27 deletions(-)

Index: 2.6-mm/include/linux/skbuff.h
===================================================================
--- 2.6-mm.orig/include/linux/skbuff.h
+++ 2.6-mm/include/linux/skbuff.h
@@ -27,6 +27,7 @@
 #include <linux/poll.h>
 #include <linux/net.h>
 #include <linux/textsearch.h>
+#include <linux/net_ns.h>
 #include <net/checksum.h>
 #include <linux/dmaengine.h>
@@ -301,6 +302,7 @@
 				*data,
 				*tail,
 				*end;
+	struct net_namespace	*net_ns;
 };
 #ifdef __KERNEL__
Index: 2.6-mm/include/net/inet_hashtables.h
===================================================================
--- 2.6-mm.orig/include/net/inet_hashtables.h
+++ 2.6-mm/include/net/inet_hashtables.h
@@ -23,6 +23,8 @@
 #include <linux/spinlock.h>
 #include <linux/types.h>
 #include <linux/wait.h>
+#include <linux/in.h>
+#include <linux/net_ns.h>
 #include <net/inet_connection_sock.h>
 #include <net/inet_sock.h>
@@ -78,6 +80,7 @@
 	signed short		fastreuse;
 	struct hlist_node	node;
 	struct hlist_head	owners;
+	struct net_namespace	*net_ns;
 };
 #define inet_bind_bucket_for_each(tb, node, head) \
@@ -274,13 +277,15 @@
 extern struct sock *__inet_lookup_listener(const struct hlist_head *head,
 					   const u32 daddr,
 					   const unsigned short hnum,
-					   const int dif);
+					   const int dif,
+					   const struct net_namespace *net_ns);
 /* Optimize the common listener case. */
 static inline struct sock *
 		inet_lookup_listener(struct inet_hashinfo *hashinfo,
 				     const u32 daddr,
-				     const unsigned short hnum, const int dif)
+				     const unsigned short hnum, const int dif,
+				     const struct net_namespace *net_ns)
 {
 	struct sock *sk = NULL;
 	const struct hlist_head *head;
@@ -294,8 +299,9 @@
 		    (!inet->rcv_saddr || inet->rcv_saddr == daddr) &&
 		    (sk->sk_family == PF_INET || !ipv6_only_sock(sk)) &&
 		    !sk->sk_bound_dev_if)
-			goto sherry_cache;
-		sk = __inet_lookup_listener(head, daddr, hnum, dif);
+			if (sk->sk_net_ns == net_ns && LOOPBACK(daddr))
+				goto sherry_cache;
+		sk = __inet_lookup_listener(head, daddr, hnum, dif, net_ns);
 	}
 	if (sk) {
 sherry_cache:
@@ -358,7 +364,8 @@
 	__inet_lookup_established(struct inet_hashinfo *hashinfo,
 				  const u32 saddr, const u16 sport,
 				  const u32 daddr, const u16 hnum,
-				  const int dif)
+				  const int dif,
+				  const struct net_namespace *net_ns)
 {
 	INET_ADDR_COOKIE(acookie, saddr, daddr)
 	const __u32 ports = INET_COMBINED_PORTS(sport, hnum);
@@ -373,12
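The statement "the bind is checked against port and network namespace" can be illustrated with a small userspace sketch. It is not the kernel's bind hash: struct bind_bucket_model, struct net_namespace_model and bind_conflicts() are hypothetical names, and a flat array replaces the hash table. The only point shown is that two binds conflict when both the port and the namespace pointer match.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_namespace. */
struct net_namespace_model {
	int id;
};

/* Hypothetical stand-in for inet_bind_bucket with the added net_ns field. */
struct bind_bucket_model {
	unsigned short port;
	const struct net_namespace_model *net_ns;
};

/* A new bind conflicts only with a bucket of the same port AND namespace. */
bool bind_conflicts(const struct bind_bucket_model *buckets, size_t n,
		    unsigned short port,
		    const struct net_namespace_model *ns)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (buckets[i].port == port && buckets[i].net_ns == ns)
			return true;
	return false;
}
```

With port 80 already bound in one namespace, a second bind to port 80 is refused in that namespace but accepted in another one, which is what lets several containers run the same server.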
[RFC] [patch 1/6] [Network namespace] Network namespace structure
This patch adds the network namespace to the nsproxy, along with a set
of functions to unshare it. The network namespace structure will be
filled later with the network resources identified as needed for more
isolation.

Replace-Subject: [Network namespace] Network namespace structure
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/linux/init_task.h |    2
 include/linux/net_ns.h    |   59
 include/linux/nsproxy.h   |    2
 include/linux/sched.h     |    1
 init/version.c            |    8 +++
 kernel/fork.c             |   24 +--
 kernel/nsproxy.c          |   38 +++---
 net/Kconfig               |    9
 net/Makefile              |    1
 net/net_ns.c              |   96 ++
 10 files changed, 222 insertions(+), 18 deletions(-)

Index: 2.6-mm/include/linux/net_ns.h
===================================================================
--- /dev/null
+++ 2.6-mm/include/linux/net_ns.h
@@ -0,0 +1,59 @@
+#ifndef _LINUX_NET_NS_H
+#define _LINUX_NET_NS_H
+
+#include <linux/kref.h>
+#include <linux/sched.h>
+#include <linux/nsproxy.h>
+
+struct net_namespace {
+	struct kref kref;
+};
+
+extern struct net_namespace init_net_ns;
+
+#ifdef CONFIG_NET_NS
+
+extern int unshare_network(unsigned long unshare_flags,
+			   struct net_namespace **new_net);
+
+extern int copy_network(int flags, struct task_struct *tsk);
+
+static inline void get_net_ns(struct net_namespace *ns)
+{
+	kref_get(&ns->kref);
+}
+
+void free_net_ns(struct kref *kref);
+
+static inline void put_net_ns(struct net_namespace *ns)
+{
+	kref_put(&ns->kref, free_net_ns);
+}
+
+static inline void exit_network(struct task_struct *p)
+{
+	struct net_namespace *net_ns = p->nsproxy->net_ns;
+	if (net_ns)
+		put_net_ns(net_ns);
+}
+#else /* !CONFIG_NET_NS */
+static inline int unshare_network(unsigned long unshare_flags,
+				  struct net_namespace **new_net)
+{
+	return -EINVAL;
+}
+static inline int copy_network(int flags, struct task_struct *tsk)
+{
+	return 0;
+}
+static inline void get_net_ns(struct net_namespace *ns) {}
+static inline void put_net_ns(struct net_namespace *ns) {}
+static inline void exit_network(struct task_struct *p) {}
+#endif /* CONFIG_NET_NS */
+
+static inline struct net_namespace *net_ns(void)
+{
+	return current->nsproxy->net_ns;
+}
+
+#endif
Index: 2.6-mm/net/net_ns.c
===================================================================
--- /dev/null
+++ 2.6-mm/net/net_ns.c
@@ -0,0 +1,96 @@
+/*
+ *  net_ns.c - adds support for network namespace
+ *
+ *  Copyright (C) 2006 IBM
+ *
+ *  Author: Daniel Lezcano [EMAIL PROTECTED]
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License as
+ *  published by the Free Software Foundation, version 2 of the
+ *  License.
+ */
+
+#include <linux/net_ns.h>
+#include <linux/module.h>
+
+/*
+ * Clone a new ns copying an original, setting refcount to 1
+ * Cloned process will have
+ * @old_ns: namespace to clone
+ * Return NULL on error (failure to kmalloc), new ns otherwise
+ */
+struct net_namespace *clone_net_ns(struct net_namespace *old_ns)
+{
+	struct net_namespace *new_ns;
+
+	new_ns = kmalloc(sizeof(*new_ns), GFP_KERNEL);
+	if (!new_ns)
+		return NULL;
+	kref_init(&new_ns->kref);
+	return new_ns;
+}
+
+/*
+ * unshare the current process' network namespace.
+ * called only in sys_unshare()
+ */
+int unshare_network(unsigned long unshare_flags,
+		    struct net_namespace **new_net)
+{
+	if (!(unshare_flags & CLONE_NEWNET))
+		return 0;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	*new_net = clone_net_ns(current->nsproxy->net_ns);
+	if (!*new_net)
+		return -ENOMEM;
+
+	return 0;
+}
+
+/*
+ * Copy task tsk's network namespace, or clone it if flags specifies
+ * CLONE_NEWNET.  In latter case, changes to the network resources of
+ * this process won't be seen by parent, and vice versa.
+ */
+int copy_network(int flags, struct task_struct *tsk)
+{
+	struct net_namespace *old_ns = tsk->nsproxy->net_ns;
+	struct net_namespace *new_ns;
+	int err = 0;
+
+	if (!old_ns)
+		return 0;
+
+	get_net_ns(old_ns);
+
+	if (!(flags & CLONE_NEWNET))
+		return 0;
+
+	if (!capable(CAP_SYS_ADMIN)) {
+		err = -EPERM;
+		goto out;
+	}
+
+	new_ns = clone_net_ns(old_ns);
+	if (!new_ns) {
+		err = -ENOMEM;
+		goto out;
+	}
+	tsk->nsproxy->net_ns = new_ns;
+
+out:
+	put_net_ns(old_ns);
+	return err;
+}
+
+void free_net_ns(struct kref *kref)
+{
+	struct net_namespace *ns;
+
+	ns = container_of(kref, struct net_namespace, kref);
+	kfree(ns);
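The lifetime rule behind get_net_ns()/put_net_ns() is ordinary reference counting: a cloned namespace starts with one reference and is freed when the last reference is dropped. The following userspace sketch models that rule. It is an illustration under stated assumptions, not the kernel's kref: struct net_ns_model, ns_clone(), ns_get() and ns_put() are hypothetical names, and the plain int counter is not atomic, so it is only valid single-threaded.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct net_namespace with its kref. */
struct net_ns_model {
	int refcount;	/* not atomic: single-threaded illustration only */
};

/* Allocate a namespace holding one reference, as kref_init() would set up. */
struct net_ns_model *ns_clone(void)
{
	struct net_ns_model *ns = malloc(sizeof(*ns));

	if (ns)
		ns->refcount = 1;
	return ns;
}

/* Take an extra reference. */
void ns_get(struct net_ns_model *ns)
{
	ns->refcount++;
}

/*
 * Drop one reference. Returns 1 if this was the last one and the
 * namespace was freed, 0 otherwise; ns must not be used after a 1.
 */
int ns_put(struct net_ns_model *ns)
{
	if (--ns->refcount > 0)
		return 0;
	free(ns);
	return 1;
}
```

Every ns_get() has to be balanced by exactly one ns_put(); an unbalanced get keeps the namespace alive forever, and an unbalanced put frees it while it is still in use.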
[RFC] [patch 0/6] [Network namespace] introduction
The following patches create a private network namespace for use
within containers. This is intended for use with system containers
like vserver, but might also be useful for restricting individual
applications' access to the network stack.

These patches isolate traffic inside the network namespace. The
network resources, and the incoming and outgoing packets, are
identified as related to a namespace. The patches hide the network
resources not contained in the current namespace, but still allow
administration of the network with normal commands like ifconfig.

It applies to the kernel version 2.6.17-rc6-mm1.

It provides the following:
--------------------------
 - when an application unshares its network namespace, it loses
   its view of all network devices by default. The administrator can
   choose to make any device visible again. The container then gains
   a view of the device, but without the IP address configured on
   it. It is up to the container administrator to use the ifconfig or
   ip command to set up a new IP address. This IP address is only
   visible inside the container.

 - the loopback is isolated inside the container and it is not
   possible to communicate between containers via the loopback.

 - several containers can have an application bind to the same
   address:port without conflicting.

What is it for?
---------------
 - security: an application can be confined inside a container
   without interacting with the network used by another container

 - consolidation: several instances of the same application can be
   run in different containers, because the network namespace allows
   each of them to bind to the same addr:port

What could be done?
-------------------
 - because the network resources are related to a namespace, it is
   easy to identify them. That facilitates the implementation of
   network migration

How to use?
-----------
 - do unshare with the CLONE_NEWNET flag as root
 - do echo eth0 > /sys/kernel/debug/net_ns/dev
 - use the ifconfig or ip command to set a new ip address

What is missing?
----------------
 - The routes are not yet isolated, which implies:

   - binding to another container's address is allowed

   - an outgoing packet which has an unset source address can
     potentially get another container's address

   - an incoming packet can be routed to the wrong container if there
     are several containers listening on the same addr:port