Re: [PATCH] e1000: Work around 82571 completion timeout on pSeries HW
On Thu, May 17, 2007 at 09:58:03AM -0500, Wen Xiong wrote:

It really shouldn't be there at all, because something in either the Intel or pSeries hardware is totally buggy and we should disable features in the buggy one completely.

Hi, there is no hardware issue on either the Intel or the PPC side. The patch works around a loophole in an early version of the PCI-SIG spec. Later revisions of the spec have corrected it. We can just implement it for PPC only.

Other vendors may have the same issue. In this case we should add a blacklist for implementations of the old spec. There should be a way to find specific bridges in the OF firmware tree on powerpc, and similar things on other platforms as well.

- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix race in AF_UNIX
Eric, thanks for looking at this. There are races involving the garbage collector that can throw away perfectly good packets with AF_UNIX sockets in them. The problems arise when a socket goes from installed to in-flight, or vice versa, during garbage collection. Since GC is done with a spinlock held, this only shows up on SMP.

Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]

I'm going to hold off on this one for now. Holding all of the read locks kind of defeats the purpose of using the per-socket lock. Can't you just lock purely around the receive queue operation? That's already protected by the receive queue spinlock.

The race however happens _between_ pushing the root set and marking of the in-flight but reachable sockets. If in that window any of the AF_UNIX sockets goes from in-flight to installed into a file descriptor, the garbage collector can miss it. If we want to protect against this using unix_sk(s)->readlock, then we have to hold all of them for the duration of the marking.

Al, Alan, you have more experience with this piece of code. Do you have better ideas about how to fix this?

I haven't looked at the code closely enough to be confident of changing something in this area. However, the classic solution to this kind of GC problem is to mark things that are manipulated during garbage collection as dirty (not orphaned). It should be possible to fix this problem by simply changing gc_tree when we perform a problematic manipulation of a passed socket, such as installing a passed socket into the file descriptors of a process. Essentially the idea is moving the current code in the direction of an incremental GC algorithm.

If I understand the race properly, what happens is that we dequeue a socket (whose queued packets are themselves carrying sockets) before the garbage collector gets to it. Therefore the garbage collector never processes that socket.
So it sounds like we just need to call maybe_unmark_and_push, or possibly just wait for the garbage collector to complete, when we do that and the packet we have pulled out.

Right. But the devil is in the details, and (as you correctly point out later) to implement this, the whole locking scheme needs to be overhauled. Problems:

- Using the queue lock to make the dequeue and the fd detach atomic wrt the GC is difficult, if not impossible: they are far from each other, with various magic in between. It would need thorough understanding of these functions and _big_ changes to implement.

- Sleeping on u->readlock in GC is currently not possible, since that could deadlock with unix_dgram_recvmsg(). That function could probably be modified to release u->readlock while waiting for data, similarly to unix_stream_recvmsg(), at the cost of some added complexity.

- Sleeping on u->readlock is also impossible because GC is holding unix_table_lock for the whole operation. We could release unix_table_lock, but then we would have to cope with sockets coming and going, making the current socket iterator unworkable.

So theoretically it's quite simple, but it needs big changes. And this wouldn't even solve all the problems with the GC, like being a possible DoS vector.

Miklos
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
On Sat, Jun 23, 2007, David Miller wrote: From: David Stevens [EMAIL PROTECTED] Auto-configured addresses are used by the kernel. It has to have those addresses. But the kernel doesn't do DNS look-ups, or write resolv.conf; that's the difference, for me. I totally agree with David, this stuff definitely does not belong in the kernel. It is my understanding that you think that IP stack configuration belongs in the kernel whereas DNS does not, right? Then I have a question: does RS-RA management belong in the kernel or not? -- Pierre Ynard WTS #51 - No phone Une âme dans un corps, c'est comme un dessin sur une feuille de papier.
[NETLINK]: attr: add nested compat attribute type
Add support for the nested compat attribute type to netlink. Thomas, I forgot to CC you on the related rtnetlink/iproute patches, please have a look on netdev.

[NETLINK]: attr: add nested compat attribute type

Add a nested compat attribute type that can be used to convert attributes that contain a structure to nested attributes in a backwards compatible way.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit fb99bf7aa7d9dd2af24d75d3a574ef17c4bae079
tree 4373b1c05a23d2544e7f53a9769ccc973ab913f7
parent 82a7e0e31d94515507be3ed8ac8e7866ab9ab928
author Patrick McHardy [EMAIL PROTECTED] Sat, 23 Jun 2007 11:24:26 +0200
committer Patrick McHardy [EMAIL PROTECTED] Sat, 23 Jun 2007 11:24:26 +0200

 include/net/netlink.h |   84 +
 net/netlink/attr.c    |   11 ++
 2 files changed, 95 insertions(+), 0 deletions(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index 7b510a9..d7b824b 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -118,6 +118,9 @@
  * Nested Attributes Construction:
  *   nla_nest_start(skb, type)        start a nested attribute
  *   nla_nest_end(skb, nla)           finalize a nested attribute
+ *   nla_nest_compat_start(skb, type, start a nested compat attribute
+ *                         len, data)
+ *   nla_nest_compat_end(skb, type)   finalize a nested compat attribute
  *   nla_nest_cancel(skb, nla)        cancel nested attribute construction
  *
  * Attribute Length Calculations:
@@ -152,6 +155,7 @@
  *   nla_find_nested()                find attribute in nested attributes
  *   nla_parse()                      parse and validate stream of attrs
  *   nla_parse_nested()               parse nested attributes
+ *   nla_parse_nested_compat()        parse nested compat attributes
  *   nla_for_each_attr()              loop over all attributes
  *   nla_for_each_nested()            loop over the nested attributes
  *=
 */
@@ -170,6 +174,7 @@ enum {
 	NLA_FLAG,
 	NLA_MSECS,
 	NLA_NESTED,
+	NLA_NESTED_COMPAT,
 	NLA_NUL_STRING,
 	NLA_BINARY,
 	__NLA_TYPE_MAX,
@@ -190,6 +195,7 @@ enum {
  *	NLA_NUL_STRING       Maximum length of string (excluding NUL)
  *	NLA_FLAG             Unused
  *	NLA_BINARY           Maximum length of attribute payload
+ *	NLA_NESTED_COMPAT    Exact length of structure payload
  *	All other            Exact length of attribute payload
  *
  * Example:
@@ -733,6 +739,39 @@ static inline int nla_parse_nested(struct nlattr *tb[], int maxtype,
 {
 	return nla_parse(tb, maxtype, nla_data(nla), nla_len(nla), policy);
 }
+
+/**
+ * nla_parse_nested_compat - parse nested compat attributes
+ * @tb: destination array with maxtype+1 elements
+ * @maxtype: maximum attribute type to be expected
+ * @nla: attribute containing the nested attributes
+ * @data: pointer to point to contained structure
+ * @len: length of contained structure
+ * @policy: validation policy
+ *
+ * Parse a nested compat attribute. The compat attribute contains a structure
+ * and optionally a set of nested attributes. On success the data pointer
+ * points to the nested data and tb contains the parsed attributes
+ * (see nla_parse).
+ */
+static inline int __nla_parse_nested_compat(struct nlattr *tb[], int maxtype,
+					    struct nlattr *nla,
+					    const struct nla_policy *policy,
+					    int len)
+{
+	if (nla_len(nla) < len)
+		return -1;
+	if (nla_len(nla) >= NLA_ALIGN(len) + sizeof(struct nlattr))
+		return nla_parse_nested(tb, maxtype,
+					nla_data(nla) + NLA_ALIGN(len),
+					policy);
+	memset(tb, 0, sizeof(struct nlattr *) * (maxtype + 1));
+	return 0;
+}
+
+#define nla_parse_nested_compat(tb, maxtype, nla, policy, data, len) \
+({	data = nla_len(nla) >= len ? nla_data(nla) : NULL; \
+	__nla_parse_nested_compat(tb, maxtype, nla, policy, len); })

 /**
  * nla_put_u8 - Add a u8 netlink attribute to a socket buffer
  * @skb: socket buffer to add attribute to
@@ -965,6 +1004,51 @@ static inline int nla_nest_end(struct sk_buff *skb, struct nlattr *start)
 }

 /**
+ * nla_nest_compat_start - Start a new level of nested compat attributes
+ * @skb: socket buffer to add attributes to
+ * @attrtype: attribute type of container
+ * @attrlen: length of structure
+ * @data: pointer to structure
+ *
+ * Start a nested compat attribute that contains both a structure and
+ * a set of nested attributes.
+ *
+ * Returns the container attribute
+ */
+static inline struct nlattr *nla_nest_compat_start(struct sk_buff *skb,
+						   int attrtype, int attrlen,
+
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
On Saturday 23 June 2007 02:09:18 C. Scott Ananian wrote:

diff -ruHpN -X dontdiff linux-2.6.22-rc5-orig/include/net/ip6_rdnss.h linux-2.6.22-rc5/include/net/ip6_rdnss.h
--- linux-2.6.22-rc5-orig/include/net/ip6_rdnss.h	1969-12-31 19:00:00.0 -0500
+++ linux-2.6.22-rc5/include/net/ip6_rdnss.h	2007-06-21 18:16:33.0 -0400
@@ -0,0 +1,58 @@
+#ifndef _NET_IP6_RDNSS_H
+#define _NET_IP6_RDNSS_H
+
+#ifdef __KERNEL__
+
+#include <linux/in6.h>
+
+struct nd_opt_rdnss {
+	__u8	type;
+	__u8	length;
+#if defined(__BIG_ENDIAN_BITFIELD)
+	__u8	priority:4,
+		open:1,
+		reserved1:3;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	__u8	reserved1:3,
+		open:1,
+		priority:4;
+#else
+# error not little or big endian
+#endif

That is not endianness-safe. Don't use foo:x at all for stuff where a specific endianness is needed. The compiler doesn't make any guarantees about it.

This was copied directly from include/net/ip6_route.h. I believe that it does in fact work, and I (for one) find this much more readable than the alternative. If it is in fact broken, then include/net/ip6_route.h (and the 35 other files which use this #ifdef in this manner) should be fixed.

Yeah, it might work. But I think the compiler doesn't guarantee you anything about it.

-- 
Greetings Michael.
Re: [RFD] L2 Network namespace infrastructure
Eric W. Biederman wrote:

-- The basic design

There will be a network namespace structure that holds the global variables for a network namespace, making those global variables per network namespace. One of those per-network-namespace global variables will be the loopback device. Which means the network namespace a packet resides in can be found simply by examining the network device or the socket the packet is traversing. Either a pointer to this global structure will be passed into the functions that need to reference per-network-namespace variables, or a structure that is already passed in (such as the network device) will be modified to contain a pointer to the network namespace structure.

I believe OpenVZ stores the current namespace somewhere global, which avoids passing the namespace around. Couldn't you do this as well?

Depending upon the data structure, it will either be modified to hold a per-entry network namespace pointer or there will be a separate copy per network namespace. For large global data structures like the ipv4 routing cache hash table, adding an additional pointer to the entries appears the more reasonable solution.

So the routing cache is shared between all namespaces?

--- Performance

In initial measurements the only performance overhead we have been able to measure is getting the packet to the network namespace. Going through ethernet bridging or routing seems to trigger copies of the packet that slow things down. When packets go directly to the network namespace no performance penalty has yet been measured.

It would be interesting to find out what's triggering these copies. Do you have NAT enabled?
[SKBUFF]: Fix incorrect config #ifdef around skb_copy_secmark
[SKBUFF]: Fix incorrect config #ifdef around skb_copy_secmark

secmark doesn't depend on CONFIG_NET_SCHED.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7c6a34e..8d43ae6 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -434,8 +434,8 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 	n->tc_verd = CLR_TC_MUNGED(n->tc_verd);
 	C(iif);
 #endif
-	skb_copy_secmark(n, skb);
 #endif
+	skb_copy_secmark(n, skb);
 	C(truesize);
 	atomic_set(n->users, 1);
 	C(head);
Re: [patch 5/7] CAN: Add virtual CAN netdevice driver
Patrick McHardy wrote:

Urs Thuermann wrote:

Patrick McHardy [EMAIL PROTECTED] writes: Is there a reason why you're still doing the allocate n devices on init thing instead of using the rtnl_link API?

Sorry, it's simply a matter of time. We have been extremely busy with other projects and two presentations (mgmt, customers, and press) the last two weeks and have worked on the other changes this week. I'm sorry I haven't yet been able to look at your rtnl_link code closely enough, but it's definitely on my todo list. Starting on Sunday I'll be on a business trip to .jp for a week, and I hope I get to it in that week, otherwise on return.

Sorry, but busy is no reason for merging code that has deprecated (at least by me :)) behaviour. Please change this before submitting for inclusion.

Dear Patrick,

I was just looking through the mailings regarding your suggested changes (e.g. in VLAN, DUMMY and IFB) and none of them currently went into the kernel, and the discussion on some topics (especially in the VLAN case) is still running. I just got an impression of what you intend to have, and it looks reasonable and good to me. But anyhow it's in process, and therefore I don't want to be the first adopter, as you might comprehend. It is no question that we would update to your approach once it is part of the kernel, finalized in discussion, and somewhat stable. But it doesn't look adequate to me to push us to support your brand new approach as some kind of gate for an inclusion into the mainstream kernel :-(

So for me it looks like we should get the feedback from Jamal on whether our usage of skb->iif fits the intention of skb->iif, and whether we should set the incoming interface index ourselves or let netif_receive_skb() do this job. After that discussion I currently cannot see any reason why the PF_CAN support should not go into the mainstream kernel.
I get positive community feedback daily about this implementation for the Linux kernel and its elegant manner of usage for application programmers. On our TODO list there is the netlink support as well as the usage of hrtimers in our broadcast manager - but both have no vital influence on the new protocol family PF_CAN, and therefore they should not slow down the inclusion process. Be sure that we'll support netlink immediately when it hits the road for other drivers also.

Best regards,
Oliver
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
Hello,

Le samedi 23 juin 2007, David Stevens a écrit : Why not make the application that writes resolv.conf also listen on a raw ICMPv6 socket? I don't believe you'd need any kernel changes, then, and it seems pretty simple and straightforward.

Unfortunately, ICMPv6 raw sockets will not work quite properly here, without modifications. At the moment, such a socket will queue just about any Router Advertisement that is received by the host. Now, assuming the userland daemon did sanity-check the message (properly formatted, source and destination addresses are sane, etc.), it needs to know whether the IPv6 kernel stack has accepted it or not. It could be that the interface the RA was received on had autoconf disabled at the time the packet showed up, or it could be that the system is currently configured as a router, or it could be that we have a SeND-patched kernel and the RA did not pass authentication checks.

And then, what happens if IPv6 networking has been initialized before init got the chance to start the daemon, for instance root over NFS/IPv6? The RA is lost. Similarly, the daemon has no way to know when information gathered from an RA becomes invalid. Of course, it can duplicate the lifetime timers in userland, but only the kernel knows if the link has been reset to off and on earlier than lifetime expiration.

Whether parsing RDNSS-in-RA belongs in the kernel is irrelevant to me, as the kernel does not provide any interface for userland to do it properly at the moment.

-- 
Rémi Denis-Courmont http://www.remlab.net/
Re: [patch 5/7] CAN: Add virtual CAN netdevice driver
Oliver Hartkopp wrote: Patrick McHardy wrote: Sorry, it's simply a matter of time. We have been extremely busy with other projects and two presentations (mgmt, customers, and press) the last two weeks and have worked on the other changes this week. I'm sorry I haven't yet been able to look at your rtnl_link code close enough, but it's definitely on my todo list. Starting on Sunday I'll be on a business trip to .jp for a week, and I hope I get to it in that week, otherwise on return. Sorry, but busy is no reason for merging code that has deprecated (at least by me :)) behaviour. Please change this before submitting for inclusion. i was just looking through the mailings regarding your suggested changes (e.g. in VLAN, DUMMY and IFB) an none of them currently went into the kernel and the discussion on some topics (especially in the VLAN case) is just running. They are all in the net-2.6.23 tree.
Re: [RFD] L2 Network namespace infrastructure
Patrick McHardy wrote:

Eric W. Biederman wrote: -- The basic design There will be a network namespace structure that holds the global variables for a network namespace, making those global variables per network namespace. One of those per network namespace global variables will be the loopback device. Which means the network namespace a packet resides in can be found simply by examining the network device or the socket the packet is traversing. Either a pointer to this global structure will be passed into the functions that need to reference per network namespace variables or a structure that is already passed in (such as the network device) will be modified to contain a pointer to the network namespace structure.

I believe OpenVZ stores the current namespace somewhere global, which avoids passing the namespace around. Couldn't you do this as well?

Will we be able to have a single application be in multiple namespaces?

Thanks,
Ben

-- 
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc http://www.candelatech.com
Re: [SKBUFF]: Fix incorrect config #ifdef around skb_copy_secmark
Thanks.

Acked-by: James Morris [EMAIL PROTECTED]

-- 
James Morris [EMAIL PROTECTED]
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
Le samedi 23 juin 2007, David Stevens a écrit : No, in fact! I didn't hear anyone suggesting that all of neighbor discovery be pushed out of the kernel. All I suggested is that you read a raw ICMPv6 socket for RA's that have the RDNS header and the app _process_the_RDNS_header. The kernel should still continue to do everything it needs to with the kernel data in the RA. Then you just need a hash table (or maybe just a list -- there shouldn't be a lot of them) and a timer to delete them when the RDNS expiration hits. Easy, right?

The exact thing I pointed out does not work. I *DID* write RA parsing in userland in the past.

You might have to change the icmp6_filter, if RA's are not already copied to raw sockets (I don't know either way offhand), but that's a trivial kernel patch; otherwise, I don't believe you have to do anything but read the socket and process the RDNS header on RAs you receive.

To reiterate: How do I authenticate SeND RA? How do I deal with the link going down before the expiration? How do I know this interface is doing autoconf at all?

-- 
Rémi Denis-Courmont http://www.remlab.net/
Re: [PATCH] Ethernet driver for EISA only SNI RM200/RM400 machines
On Fri, 22 Jun 2007 21:53:58 +0200 [EMAIL PROTECTED] (Thomas Bogendoerfer) wrote:

Hi, this is a new ethernet driver, which uses the code taken out of lasi_82596 (done by the other patch I just sent). Thomas.

Ethernet driver for EISA only SNI RM200/RM400 machines

...

+static char sni_82596_string[] = "snirm_82596";

const?

+
+#define DMA_ALLOC                      dma_alloc_coherent
+#define DMA_FREE                       dma_free_coherent
+#define DMA_WBACK(priv, addr, len)     do { } while (0)
+#define DMA_INV(priv, addr, len)       do { } while (0)
+#define DMA_WBACK_INV(priv, addr, len) do { } while (0)
+
+#define SYSBUS 0x4400
+
+/* big endian CPU, 82596 little endian */
+#define SWAP32(x) cpu_to_le32((u32)(x))
+#define SWAP16(x) cpu_to_le16((u16)(x))
+
+#define OPT_MPU_16BIT 0x01
+
+static inline void CA(struct net_device *dev);
+static inline void MPU_PORT(struct net_device *dev, int c, dma_addr_t x);

These two functions' implementations could be moved to before the #include, so we wouldn't need to forward-declare them?

+#include "lib82596.c"

ugh. Is this really unavoidable?

+MODULE_AUTHOR("Thomas Bogendoerfer");
+MODULE_DESCRIPTION("i82596 driver");
+MODULE_LICENSE("GPL");
+module_param(i596_debug, int, 0);
+MODULE_PARM_DESC(i596_debug, "82596 debug mask");
+
+static inline void CA(struct net_device *dev)
+{
+	struct i596_private *lp = netdev_priv(dev);
+
+	writel(0, lp->ca);
+}
+
+static inline void MPU_PORT(struct net_device *dev, int c, dma_addr_t x)
+{
+	struct i596_private *lp = netdev_priv(dev);
+
+	u32 v = (u32) (c) | (u32) (x);
+
+	if (lp->options & OPT_MPU_16BIT) {
+		writew(v & 0xffff, lp->mpu_port);
+		wmb(); udelay(1);	/* order writes to MPU port */

Nope, please put these on separate lines. No exceptions..

+		writew(v >> 16, lp->mpu_port);
+	} else {
+		writel(v, lp->mpu_port);
+		wmb(); udelay(1);	/* order writes to MPU port */
+		writel(v, lp->mpu_port);
+	}
+}

Three callsites: this looks too large to inline. I see no reason why this and CA() have upper-case names?
+
+static int __devinit sni_82596_probe(struct platform_device *dev)
+{
+	struct net_device *netdevice;
+	struct i596_private *lp;
+	struct resource *res, *ca, *idprom, *options;
+	int retval = -ENODEV;
+	static int init;
+	void __iomem *mpu_addr = NULL;
+	void __iomem *ca_addr = NULL;
+	u8 __iomem *eth_addr = NULL;
+
+	if (init == 0) {
+		printk(KERN_INFO SNI_82596_DRIVER_VERSION "\n");
+		init++;
+	}

Might as well do this message in the module_init() function? There's a per-probed-device message later on anyway.

The patchset tries to add rather a lot of new trailing whitespace btw.

+	res = platform_get_resource(dev, IORESOURCE_MEM, 0);
+	if (!res)
+		goto probe_failed;
+	mpu_addr = ioremap_nocache(res->start, 4);
+	if (!mpu_addr) {
+		retval = -ENOMEM;
+		goto probe_failed;
+	}
+	ca = platform_get_resource(dev, IORESOURCE_MEM, 1);
+	if (!ca)
+		goto probe_failed;
+	ca_addr = ioremap_nocache(ca->start, 4);
+	if (!ca_addr) {
+		retval = -ENOMEM;
+		goto probe_failed;
+	}
+	idprom = platform_get_resource(dev, IORESOURCE_MEM, 2);
+	if (!idprom)
+		goto probe_failed;
+	eth_addr = ioremap_nocache(idprom->start, 0x10);
+	if (!eth_addr) {
+		retval = -ENOMEM;
+		goto probe_failed;
+	}
+	options = platform_get_resource(dev, 0, 0);
+	if (!options)
+		goto probe_failed;
+
+	printk(KERN_INFO "Found i82596 at 0x%x\n", res->start);
+
+	netdevice = alloc_etherdev(sizeof(struct i596_private));
+	if (!netdevice) {
+		retval = -ENOMEM;
+		goto probe_failed;
+	}
+	SET_NETDEV_DEV(netdevice, &dev->dev);
+	platform_set_drvdata(dev, netdevice);
+
+	netdevice->base_addr = res->start;
+	netdevice->irq = platform_get_irq(dev, 0);
+
+	/* someone seams to like messed up stuff */
+	netdevice->dev_addr[0] = readb(eth_addr + 0x0b);
+	netdevice->dev_addr[1] = readb(eth_addr + 0x0a);
+	netdevice->dev_addr[2] = readb(eth_addr + 0x09);
+	netdevice->dev_addr[3] = readb(eth_addr + 0x08);
+	netdevice->dev_addr[4] = readb(eth_addr + 0x07);
+	netdevice->dev_addr[5] = readb(eth_addr + 0x06);
+	iounmap(eth_addr);
+
+	if (!netdevice->irq) {
+		printk(KERN_ERR "%s: IRQ not found for i82596 at 0x%lx\n",
+		       __FILE__, netdevice->base_addr);
+		goto probe_failed;
+	}
+
+	lp = netdev_priv(netdevice);
+	lp->options = options->flags & IORESOURCE_BITS;
+	lp->ca = ca_addr;
+	lp->mpu_port = mpu_addr;
+
+	retval =
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
[EMAIL PROTECTED] wrote on 06/23/2007 07:47:06 AM:

Rémi and Simon give my responses very eloquently. Although you could have yet-another-network-daemon redundantly process RA messages, the kernel is doing it already and it makes sense to just push this

It would be two pieces looking at the same packet, but it isn't redundant processing. The kernel would ignore the RDNS header, and the app would ignore everything else; everything would be processed once.

Although parsing RA messages and processing expiry in userland looks barely-possible now,

barely possible?? See below.

SeND support is really necessary for long-term IPv6 security, and duplicating SeND functionality in userland would be a nightmare. Further, the neighbor discovery protocol involves Router Solicitation messages which elicit the Router Advertisement reply, and we really don't want userland sending redundant Router Solicitation messages around, just because the kernel doesn't want to tell it what Router Advertisements it received. I considered storing the *complete* Router Advertisement messages received and pushing them unparsed to userland, just to get around the bogus DNS in the kernel politics (hint: it's not a resolver in the kernel, it's just nameserver addresses being stored). Does anyone really suggest that this would be a better solution?

No, in fact! I didn't hear anyone suggesting that all of neighbor discovery be pushed out of the kernel. All I suggested is that you read a raw ICMPv6 socket for RA's that have the RDNS header and the app _process_the_RDNS_header. The kernel should still continue to do everything it needs to with the kernel data in the RA. Then you just need a hash table (or maybe just a list -- there shouldn't be a lot of them) and a timer to delete them when the RDNS expiration hits. Easy, right?
You might have to change the icmp6_filter, if RA's are not already copied to raw sockets (I don't know either way offhand), but that's a trivial kernel patch; otherwise, I don't believe you have to do anything but read the socket and process the RDNS header on RAs you receive.

+-DLS
Re: [PATCH] fix race in AF_UNIX
Miklos Szeredi [EMAIL PROTECTED] writes:

Right. But the devil is in the details, and (as you correctly point out later) to implement this, the whole locking scheme needs to be overhauled. Problems:

- Using the queue lock to make the dequeue and the fd detach atomic wrt the GC is difficult, if not impossible: they are far from each other, with various magic in between. It would need thorough understanding of these functions and _big_ changes to implement.

- Sleeping on u->readlock in GC is currently not possible, since that could deadlock with unix_dgram_recvmsg(). That function could probably be modified to release u->readlock while waiting for data, similarly to unix_stream_recvmsg(), at the cost of some added complexity.

- Sleeping on u->readlock is also impossible because GC is holding unix_table_lock for the whole operation. We could release unix_table_lock, but then we would have to cope with sockets coming and going, making the current socket iterator unworkable.

So theoretically it's quite simple, but it needs big changes. And this wouldn't even solve all the problems with the GC, like being a possible DoS vector.

Making the GC fully incremental will solve the DoS vector problem as well. Basically you do a fixed amount of reclaim in the new socket allocation code. It appears clear that since we can't stop the world and garbage collect, we need an incremental collector.

Eric
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
On 23/06/07 15:47, C. Scott Ananian wrote: Advertisements it received. I considered storing the *complete* Router Advertisement messages received and pushing them unparsed to userland, just to get around the bogus DNS in the kernel politics (hint: it's not a resolver in the kernel, it's just nameserver addresses being stored). Does anyone really suggest that this would be a better solution?

Yes, but I don't think it should be completely unparsed - it should be possible to retrieve the data for a specific attribute type with expiration information and with notification of changes. The kernel has to read RAs anyway, why shouldn't it store them in a way that userspace can access them on demand?

A /proc file which is in resolv.conf format is definitely *wrong*, and while I'd argue for DNS being special enough to export its attributes, is it really too much to have the kernel provide everything from the last valid message in a partially parsed format? Applications would then parse the data section for RA attributes they understand.

-- 
Simon Arlott
Re: [patch 5/7] CAN: Add virtual CAN netdevice driver
Patrick McHardy wrote:

Oliver Hartkopp wrote: I was just looking through the mailings regarding your suggested changes (e.g. in VLAN, DUMMY and IFB) and none of them currently went into the kernel (..)

They are all in the net-2.6.23 tree.

Ah, ok - that wasn't on my radar, as I missed the mail from Dave to you on June 13th ...

@Dave: Please consider scheduling the PF_CAN stuff for inclusion into 2.6.23 also. Thx.

Btw. for next week, we'll ...

1. ... wait for Jamal's feedback about skb->iif usage
2. ... move the vcan driver to the new netlink API

So that we can finally go for net-2.6.23 at the end of next week, if there are no new issues from other reviewers until then.

@Patrick: The changes in dummy.c and ifb.c for the netlink support do not look very complicated (not even for me ;-)) When these changes are implemented, how do I create/remove my interfaces? Is there any userspace tool like 'tc' for that? Thx

regards,
Oliver
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
Rémi and Simon give my responses very eloquently. Although you could have yet-another-network-daemon redundantly process RA messages, the kernel is doing it already and it makes sense to just push this information to userland using /proc and/or netlink. Although parsing RA messages and processing expiry in userland looks barely possible now, SeND support is really necessary for long-term IPv6 security, and duplicating SeND functionality in userland would be a nightmare. Further, the neighbor discovery protocol involves Router Solicitation messages which elicit the Router Advertisement reply, and we really don't want userland sending redundant Router Solicitation messages around, just because the kernel doesn't want to tell it what Router Advertisements it received. I considered storing the *complete* Router Advertisement messages received and pushing them unparsed to userland, just to get around the bogus DNS in the kernel politics (hint: it's not a resolver in the kernel, it's just nameserver addresses being stored). Does anyone really suggest that this would be a better solution? The goal is to push the userland component into glibc, likely through an NSS resolver plugin. Current glibc doesn't do any processing to determine when /etc/resolv.conf has changed, which is a problem for long-running applications. Exporting RDNSS-in-RA via netlink messages (or by poll() on a /proc file as is done for /proc/pid/mounts, which was suggested on linux-kernel) is an elegant solution that (as Rémi noted) cleanly handles interface up/down/reconfig, route expiration, and (eventually) the cryptographic neighbor discovery protocol without weaving a web of hairs from the kernel to the resolver. --scott -- ( http://cscott.net/ )
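The poll()-on-a-/proc-file mechanism Scott refers to already exists for /proc/pid/mounts, so a resolver library could reuse exactly that protocol. A minimal sketch (in Python for brevity, Linux only) of the watcher side; since no RDNSS proc file exists today, /proc/self/mounts stands in for it here:

```python
import select

# /proc/self/mounts already implements the poll-for-changes protocol an
# RDNSS /proc file could reuse: a change is signalled as POLLERR|POLLPRI.
fd = open("/proc/self/mounts", "rb")
poller = select.poll()
poller.register(fd.fileno(), select.POLLPRI | select.POLLERR)

# Timeout 0 turns this into a non-blocking "did anything change?" probe.
# A real resolver daemon would block here, then seek(0) and reread the
# whole file each time it wakes up.
events = poller.poll(0)
print("pending change events:", len(events))
fd.close()
```

This is what makes the approach attractive for long-running applications: the daemon sleeps in poll() instead of stat()-ing /etc/resolv.conf on a timer.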
Re: [patch 5/7] CAN: Add virtual CAN netdevice driver
Oliver Hartkopp wrote: @Patrick: The changes in dummy.c and ifb.c for the netlink support do not look very complicated (not even for me ;-)) I have a patch to make it even simpler; it basically needs only the rtnl_link_ops structures initialized with one or two members for devices like dummy and ifb. Will push once we're through the patches I sent recently; until then please use the current interface. When these changes are implemented, how do I create/remove my interfaces? Is there any userspace tool like 'tc' for that? It's 'ip'. I think I've CCed you or one of your colleagues on the patches, otherwise please check the list. For a device like yours it only needs the patch implementing general RTM_NEWLINK support, unless you want to make the loopback parameter configurable, in which case you would need to add something like iplink_vlan that parses the parameter. BTW, in case the loopback device is required for normal operation it might make sense to create *one* device by default, but four identical devices seems a bit extreme.
Re: [RFD] L2 Network namespace infrastructure
Patrick McHardy [EMAIL PROTECTED] writes: Eric W. Biederman wrote: -- The basic design There will be a network namespace structure that holds the global variables for a network namespace, making those global variables per network namespace. One of those per network namespace global variables will be the loopback device. Which means the network namespace a packet resides in can be found simply by examining the network device or the socket the packet is traversing. Either a pointer to this global structure will be passed into the functions that need to reference per network namespace variables, or a structure that is already passed in (such as the network device) will be modified to contain a pointer to the network namespace structure. I believe OpenVZ stores the current namespace somewhere global, which avoids passing the namespace around. Couldn't you do this as well? It sucks. Especially in the corner cases. Think macvlan with the real network device in one namespace and the ``vlan'' device in another namespace. The implementation of a global is also a little questionable. Last I looked it didn't work on the transmit path at all and was interesting on the receive path. Further and fundamentally, all a global achieves is removing the need for the noise patches where you pass the pointer into the various functions. For long term maintenance it doesn't help anything. All of the other changes, such as messing with the initialization/cleanup, changing code to access the per network namespace data structure, and modifying the code along the way to reject working in non-default network namespaces (the truly intrusive parts), we both still have to make. So except as an implementation detail, how we pass the per network namespace pointer is uninteresting. Currently I am trying for the least clever, most straightforward implementation I can find that doesn't give us a regression in network stack performance.
So yes, if we want to pass things through a magic per-cpu global on the packet receive path, now is the time to decide to do that. Currently I don't see the advantage in doing that so I'm not suggesting it. In general, where people have had specific objections, others have written complicated code that avoids those objections, so it should just be a matter of dusting those patches off. I would much rather go with something stupid and simple if people are willing to merge that, however. Depending upon the data structure, it will either be modified to hold a per entry network namespace pointer or there will be a separate copy per network namespace. For large global data structures like the ipv4 routing cache hash table, adding an additional pointer to the entries appears the more reasonable solution. So the routing cache is shared between all namespaces? Yes. Each namespace has its own view, so semantically it's not shared. But the initial fan-out of the hash table (2M entries or something) isn't something we want to replicate on a per namespace basis, even assuming the huge page allocations could happen. So we just tag the entries and add the network namespace as one more part of the key when doing hash table lookups. --- Performance In initial measurements the only performance overhead we have been able to measure is getting the packet to the network namespace. Going through ethernet bridging or routing seems to trigger copies of the packet that slow things down. When packets go directly to the network namespace no performance penalty has yet been measured. It would be interesting to find out what's triggering these copies. Do you have NAT enabled? I would have to go back and look. There was a skb_cow call someplace in the routing path. Something else with ipfilter, ethernet bridging. So yes, it is probably interesting to dig into.
So the thread where we dug into this last time to the point of identifying the problem is here: https://lists.linux-foundation.org/pipermail/containers/2007-March/004309.html The problem in the bridging was here: https://lists.linux-foundation.org/pipermail/containers/2007-March/004336.html I can't find a good pointer to the bit of discussion that described the routing. I just remember it was an skb_cow somewhere in the routing output path, I believe at the point where we write in the new destination IP. I haven't a clue why the copy was triggering. Design-wise the interesting bit was that nothing was measurable when the network device was in the network namespace. So adding an extra pointer parameter to functions and dereferencing the pointer has not measurably affected performance at this point. Eric
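The entry-tagging scheme Eric describes can be illustrated with a toy model (plain Python, nothing like the actual kernel data structures): one shared hash table whose lookup key folds in the namespace, so each namespace only ever sees its own entries even though the buckets are shared:

```python
# Toy model of a shared, namespace-tagged route cache.  id(ns) stands in
# for the namespace pointer that the kernel would mix into the hash.
class Namespace:
    pass

class SharedRouteCache:
    def __init__(self, buckets=256):
        self.buckets = [[] for _ in range(buckets)]

    def _bucket(self, ns, dst):
        # The namespace is one more part of the hash input ...
        return hash((id(ns), dst)) % len(self.buckets)

    def insert(self, ns, dst, route):
        self.buckets[self._bucket(ns, dst)].append((ns, dst, route))

    def lookup(self, ns, dst):
        for ens, edst, route in self.buckets[self._bucket(ns, dst)]:
            if ens is ns and edst == dst:  # ... and of the match key.
                return route
        return None

ns_a, ns_b = Namespace(), Namespace()
cache = SharedRouteCache()
cache.insert(ns_a, "10.0.0.1", "via eth0")
print(cache.lookup(ns_a, "10.0.0.1"))  # via eth0
print(cache.lookup(ns_b, "10.0.0.1"))  # None: ns_b has its own view
```

The point of the design is visible here: the large bucket array exists once, while isolation comes entirely from tagging and keying, not from duplicating the table per namespace.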
Re: [patch 5/7] CAN: Add virtual CAN netdevice driver
Oliver Hartkopp wrote: Patrick McHardy wrote: BTW, in case the loopback device is required for normal operation it might make sense to create *one* device by default, but four identical devices seems a bit extreme. As I wrote before, CAN addressing consists of CAN-Identifiers and the used interface. The use of four vcans is definitely a usual case! It should create as many devices as necessary to operate (similar to the loopback device) by default. Optional interfaces that are used for addressing reasons should be manually added by the user as needed. And it should not use module parameters for that, please.
Re: [PATCH] e1000: Work around 82571 completion timeout on Pseries HW
Christoph Hellwig wrote: On Thu, May 17, 2007 at 09:58:03AM -0500, Wen Xiong wrote: It really shouldn't be there at all because something in either the intel or pseries hardware is totally buggy and we should disable features in the buggy one completely. Hi, there is no hardware issue on either Intel or PPC. The patch is to work around a loophole in an early version of the PCI-SIG spec. The later PCI-SIG spec has corrected it. We can just implement it for PPC only. Other vendors may have the same issue. In this case we should add a blacklist for implementations of the old spec. There should be a way to find specific bridges in the OF firmware tree on powerpc, and similar things on other platforms as well. Yes, this is almost what we did. IBM is currently testing my patches that implement a generic PCI quirk that will be enabled only for selected root complex IDs that require the (1.0a spec) device to disable the completion timeouts. They are currently validating this test on the affected hardware. I expect to get the results within a week and then I will post the patch. Since this is one of the few holes in between the two specs (where manual intervention is needed) I think that a single quirk is a fairly sane approach. Cheers, Auke
Re: [RFD] L2 Network namespace infrastructure
Stephen Hemminger [EMAIL PROTECTED] writes: On Sat, 23 Jun 2007 08:20:40 -0700 Ben Greear [EMAIL PROTECTED] wrote: Patrick McHardy wrote: Eric W. Biederman wrote: -- The basic design There will be a network namespace structure that holds the global variables for a network namespace, making those global variables per network namespace. One of those per network namespace global variables will be the loopback device. Which means the network namespace a packet resides in can be found simply by examining the network device or the socket the packet is traversing. Either a pointer to this global structure will be passed into the functions that need to reference per network namespace variables or a structure that is already passed in (such as the network device) will be modified to contain a pointer to the network namespace structure. I believe OpenVZ stores the current namespace somewhere global, which avoids passing the namespace around. Couldn't you do this as well? Maybe the current namespace should be attached to something else like sysfs root? Having multiple namespace indirection possibilities leads to interesting cases where the current namespace is not correctly associated with the current sysfs tree or current proc tree, ... Yes. There are some oddities there. In my current tree there is code that makes proc and sysfs match the inspecting process. I haven't quite solved the inspection problem where we want to look at the namespace of a different process. But as long as we have clean code to do the basics, that isn't a big leap when we come to it. I'm not really seeing any problems along this line at this point. The big problem at this point is code review and merging, and in particular breaking this work up into small enough pieces that they can be digested, successfully code reviewed and merged.
Eric
Re: [PATCH] ps3: gigabit ethernet driver for PS3, take2
MOKUNO Masakazu wrote: --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2920,6 +2920,12 @@ M: [EMAIL PROTECTED] L: [EMAIL PROTECTED] S: Maintained +PS3 NETWORK SUPPORT +P: Masakazu Mokuno +M: [EMAIL PROTECTED] +L: netdev@vger.kernel.org I think you should put [EMAIL PROTECTED] for the mail list. Users will get better support and I will be able to keep track of the inquiries. All PS3 developers monitor [EMAIL PROTECTED], but few if any monitor [EMAIL PROTECTED] -Geoff
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
Rémi Denis-Courmont [EMAIL PROTECTED] wrote on 06/23/2007 09:51:55 AM: How do I authenticate SeND RA? How do I deal with the link going down before the expiration? How do I know this interface is doing autoconf at all? The kernel should do the authentication, as it will for other RAs, and should not deliver (IMAO) unauthenticated packets. If it is, I would consider that a bug (for all cases, not just this), and that would be a good thing to fix. :-) An interface going down doesn't directly invalidate a DNS server address, though it may not be the best choice when reached through another interface. Since it is a list, I think doing nothing for this case wouldn't be terrible. This is no worse than the existing resolver code. But if you really need it, you can monitor netlink, or poll the interface flags on whatever interval you require for detection. As for autoconf, that's available from sysctl, and I assume from /proc somewhere, too. That usually doesn't change, but if you want to account for runtime configuration changes, you can always monitor netlink and reread when new addresses appear, too. There certainly may be complications I haven't thought of, since I haven't implemented it. But I still don't see a good case for using the kernel as a DNS database. +-DLS
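David's "monitor netlink" suggestion in its most minimal form: subscribe an rtnetlink socket to the IPv6 address notification group, then block in recv() for RTM_NEWADDR/RTM_DELADDR messages. A sketch in Python (Linux only; the group constant is hand-copied from <linux/rtnetlink.h>):

```python
import socket

RTMGRP_IPV6_IFADDR = 0x100  # from <linux/rtnetlink.h>

s = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
# Binding with a group bitmap subscribes this socket to those broadcast
# groups; pid 0 asks the kernel to assign the socket a unique port id.
s.bind((0, RTMGRP_IPV6_IFADDR))
pid, groups = s.getsockname()
print("listening for IPv6 address changes, groups bitmap:", hex(groups))
# A real daemon would now loop on s.recv(65536) and parse the
# nlmsghdr/ifaddrmsg structures out of each notification.
s.close()
```

No privileges are needed for listening (this is the same mechanism `ip monitor` uses), which makes it a reasonable trigger for rereading autoconf state from an unprivileged resolver daemon.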
Re: [RFD] L2 Network namespace infrastructure
Ben Greear [EMAIL PROTECTED] writes: Patrick McHardy wrote: Eric W. Biederman wrote: -- The basic design There will be a network namespace structure that holds the global variables for a network namespace, making those global variables per network namespace. One of those per network namespace global variables will be the loopback device. Which means the network namespace a packet resides in can be found simply by examining the network device or the socket the packet is traversing. Either a pointer to this global structure will be passed into the functions that need to reference per network namespace variables or a structure that is already passed in (such as the network device) will be modified to contain a pointer to the network namespace structure. I believe OpenVZ stores the current namespace somewhere global, which avoids passing the namespace around. Couldn't you do this as well? Will we be able to have a single application be in multiple name-spaces? A single application certainly. But then an application can be composed of multiple processes which can be composed of multiple threads. In my current patches a single task_struct belongs to a single network namespace. That namespace is used when creating sockets. The sockets themselves have a namespace tag and that is used when transmitting packets, or otherwise operating on the socket. So if you pass a socket from one process to another you can have sockets that belong to different network namespaces in a single task. Eric
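The socket passing Eric mentions is SCM_RIGHTS descriptor passing over AF_UNIX. A minimal sketch (single process for brevity, Python 3.9+ for send_fds/recv_fds); under the proposed design the passed socket would keep the namespace tag it was created with, regardless of which task ends up holding it:

```python
import socket

a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
passed, peer = socket.socketpair()   # the socket being handed over

# While the message sits in b's receive queue, `passed` is "in flight":
# it exists only inside the SCM_RIGHTS control message.
socket.send_fds(a, [b"hi"], [passed.fileno()])

msg, fds, flags, addr = socket.recv_fds(b, 16, 1)
# Wrapping the received descriptor "installs" it in the receiving
# process's descriptor table as an ordinary socket again.
received = socket.socket(fileno=fds[0])
print(msg, len(fds))
```

This is exactly how one task ends up operating on sockets created in a different network namespace: the tag travels with the socket, not with the task.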
Re: [RFD] L2 Network namespace infrastructure
On Sat, 23 Jun 2007 08:20:40 -0700 Ben Greear [EMAIL PROTECTED] wrote: Patrick McHardy wrote: Eric W. Biederman wrote: -- The basic design There will be a network namespace structure that holds the global variables for a network namespace, making those global variables per network namespace. One of those per network namespace global variables will be the loopback device. Which means the network namespace a packet resides in can be found simply by examining the network device or the socket the packet is traversing. Either a pointer to this global structure will be passed into the functions that need to reference per network namespace variables or a structure that is already passed in (such as the network device) will be modified to contain a pointer to the network namespace structure. I believe OpenVZ stores the current namespace somewhere global, which avoids passing the namespace around. Couldn't you do this as well? Maybe the current namespace should be attached to something else like sysfs root? Having multiple namespace indirection possibilities leads to interesting cases where the current namespace is not correctly associated with the current sysfs tree or current proc tree, ... Will we be able to have a single application be in multiple name-spaces? That would break the whole point of namespaces...
[PATCH][Resend] TIPC: Fix infinite loop in netlink handler
From: Florian Westphal [EMAIL PROTECTED] The tipc netlink config handler uses the nlmsg_pid from the request header as destination for its reply. If the application initialized nlmsg_pid to 0, the reply is looped back to the kernel, causing a hangup. Fix: use the nlmsg_pid of the skb that triggered the request. Signed-off-by: Florian Westphal [EMAIL PROTECTED] --- I already sent this to netdev@ on the 19th, but the patch itself was neither ACKed nor NACKed. This is a crash that can be triggered trivially -- please fix this bug. net/tipc/netlink.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/tipc/netlink.c b/net/tipc/netlink.c index 4cdafa2..6a7f7b4 100644 --- a/net/tipc/netlink.c +++ b/net/tipc/netlink.c @@ -60,7 +60,7 @@ static int handle_cmd(struct sk_buff *skb, struct genl_info *info) rep_nlh = nlmsg_hdr(rep_buf); memcpy(rep_nlh, req_nlh, hdr_space); rep_nlh->nlmsg_len = rep_buf->len; - genlmsg_unicast(rep_buf, req_nlh->nlmsg_pid); + genlmsg_unicast(rep_buf, NETLINK_CB(skb).pid); } return 0;
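For reference, the userspace convention the fix accommodates: netlink requests are normally sent with nlmsg_pid = 0 in the header, and the kernel must address its reply using the sending socket (NETLINK_CB(skb).pid), not the header field. A sketch against rtnetlink rather than TIPC (constants hand-copied from the uapi headers; Linux only), showing that a reply arrives even though the header pid is 0:

```python
import socket
import struct

RTM_GETLINK = 18          # <linux/rtnetlink.h>
NLM_F_REQUEST = 0x1
NLM_F_DUMP = 0x300        # NLM_F_ROOT | NLM_F_MATCH

s = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
s.bind((0, 0))

# struct nlmsghdr (16 bytes) with nlmsg_pid = 0, as applications
# conventionally send it -- the kernel must not use this field as the
# reply destination, or the reply would loop back to the kernel itself.
nlh = struct.pack("=LHHLL", 32, RTM_GETLINK,
                  NLM_F_REQUEST | NLM_F_DUMP, 1, 0)
# struct ifinfomsg (16 bytes), family AF_UNSPEC: dump all interfaces.
ifi = struct.pack("=BxHiII", socket.AF_UNSPEC, 0, 0, 0, 0)
s.send(nlh + ifi)

reply = s.recv(65536)
print("reply of", len(reply), "bytes despite nlmsg_pid == 0")
s.close()
```

Exactly this nlmsg_pid = 0 convention is what triggered the TIPC bug: a handler that trusts the header pid ends up unicasting its reply to port 0, i.e. back to the kernel.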
Re: [RFD] L2 Network namespace infrastructure
Eric W. Biederman wrote: Patrick McHardy [EMAIL PROTECTED] writes: I believe OpenVZ stores the current namespace somewhere global, which avoids passing the namespace around. Couldn't you do this as well? It sucks. Especially in the corner cases. Think macvlan with the real network device in one namespace and the ``vlan'' device in another namespace. The implementation of a global is also a little questionable. Last I looked it didn't work on the transmit path at all and was interesting on the receive path. Further and fundamentally, all a global achieves is removing the need for the noise patches where you pass the pointer into the various functions. For long term maintenance it doesn't help anything. All of the other changes, such as messing with the initialization/cleanup, changing code to access the per network namespace data structure, and modifying the code along the way to reject working in non-default network namespaces (the truly intrusive parts), we both still have to make. So except as an implementation detail, how we pass the per network namespace pointer is uninteresting. Currently I am trying for the least clever, most straightforward implementation I can find that doesn't give us a regression in network stack performance. So yes, if we want to pass things through a magic per-cpu global on the packet receive path, now is the time to decide to do that. Currently I don't see the advantage in doing that so I'm not suggesting it. I think your approach is fine and is probably a lot easier to review than using something global. Depending upon the data structure it will either be modified to hold a per entry network namespace pointer or there will be a separate copy per network namespace. For large global data structures like the ipv4 routing cache hash table adding an additional pointer to the entries appears the more reasonable solution. So the routing cache is shared between all namespaces? Yes.
Each namespace has its own view, so semantically it's not shared. But the initial fan-out of the hash table (2M entries or something) isn't something we want to replicate on a per namespace basis, even assuming the huge page allocations could happen. So we just tag the entries and add the network namespace as one more part of the key when doing hash table lookups. I can wait for the patches, but I would be interested in how GC is performed and whether limits can be configured per namespace.
Re: [PATCH v2.6.22-rc5] cxgb2: handle possible NULL pointer dereferencing, take 2
On Thu, 21 Jun 2007 18:48:30 +0530 pradeep singh [EMAIL PROTECTED] wrote: Hi, My mistake. Resending after reformatting the patch by hand. Looks like gmail messes up plain text patches. That's still mangled so I typed it in again. Please always include a full changelog with each version of a patch. I do not know what this patch does - please provide a changelog. In this case it should tell us whether and how this null pointer deref is actually occurring and if so, why. As well as a full description of the problem which it solves, a changelog should also describe _how_ it solved it, but that is sufficiently obvious in this case. Thanks.
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
Rémi Denis-Courmont [EMAIL PROTECTED] wrote on 06/23/2007 11:13:01 AM: An implementation might perform additional validity checks on the ICMPv6 message content and discard malformed packets. However, a portable application must not assume that such validity checks have been performed. This doesn't say that unauthenticated packets must be delivered, and I don't think the portability of an RDNS daemon is an issue. But even if you really wanted to run the same code on a non-Linux machine, it just means that your daemon code would have to do its own authentication. Reading /proc or netlink with packet formats you've defined to get this information is not more portable to non-Linux machines, right? I don't see any issue here. If an application is relying on the ability to see forged packets for portability reasons, it's probably not an application you want running on your machine. :-) That would encourage people into running open recursive DNS servers which is widely known and documented as a bad practice. Definitely a very bad idea. I don't understand your point here. I'm talking about client behaviour, and if the client's query to a server on a downed interface fails, I don't see how that's different from removing the server from the list, which is what you want to do. Nobody should feel encouraged to do anything different on the server side -- at least not by me! But if you really need it, you can monitor netlink, or poll the interface flags on whatever interval you require for detection. As for autoconf, that's available from sysctl, I assume from /proc somewhere, too. That usually doesn't change, but if you want to account for runtime configuration changes, you can always monitor netlink and reread when new addresses appear, too. There are a bunch of parameters that determine whether an interface accepts RAs or not. I doubt it's wise to try to reimplement that in userspace, particularly if it is subject to change.
I'm not suggesting re-implementing anything; I'm saying you can read the current state at application level, if you need it. If you think it's difficult to get the correct information from existing APIs, then improving those APIs is always worthwhile. I don't believe it's excessively difficult to determine if autoconf is in use, though. My point is raw IPv6 sockets are not usable for the time being, and I do not see any way to fix that without modifying the kernel. I disagree about raw sockets being usable, but "without modifying the kernel" isn't a constraint. Modifying the kernel != put DNS server info in the kernel; if there's a bug, or some minor tweaking that'd help the feature along, I'd support that. The important point for me is that the basic mechanisms are already in place, and I think it'd be best to use those rather than creating a new interface for all of this. The userspace DNS configuration daemon might need to be started later than the kernel autoconf - another issue that needs help from the kernel. Easily done; the init scripts are what bring the interfaces up in the first place, so start the daemon before those run. Adding an entry in inittab so it'll be automatically restarted if it dies is also a reasonable thing. RAs are resent periodically, and they can be lost anyway, so not the end of the world if you miss one, either. +-DLS
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
On Saturday 23 June 2007, David Stevens wrote: This doesn't say that unauthenticated packets must be delivered, and I don't think the portability of an RDNS daemon is an issue. But even if you really wanted to run the same code on a non-Linux machine, it just means that your daemon code would have to do its own authentication. Reading /proc or netlink with packet formats you've defined to get this information is not more portable to non-Linux machines, right? I don't see any issue here. If an application is relying on the ability to see forged packets for portability reasons, it's probably not an application you want running on your machine. :-) It so happens that the very userland applications that are currently using raw ICMPv6 sockets to see RAs *DO* want to see them all. As far as I know, they are all monitoring software (radvdump from radvd, rdisc6 from ndisc6, and probably scapy as well) where you do want to see problematic packets. All in all, this would break well-behaved, standards-abiding userland applications... The userspace DNS configuration daemon might need to be started later than the kernel autoconf - another issue that needs help from the kernel. Easily done; the init scripts are what bring the interfaces up in the first place, so start the daemon before those run. Adding an entry in inittab so it'll be automatically restarted if it dies is also a reasonable thing. RAs are resent periodically, and they can be lost anyway, so not the end of the world if you miss one, either. What about NFS root? The network interface will already be up before even the real init gets started, let alone the userland RDNSS daemon. Resent periodically... at a default rate of one every 10 minutes! I surely hope your desktop boots up faster than that. Besides, some links do not have unsolicited advertisements at all (I have seen such a PPPoA link for instance).
An ugly kludge would be to send an RS from userland, but that's not so great considering routers are rate-limiting their RAs. The only way is for the kernel to remember something about the last processed RA. That disqualifies raw ICMPv6 sockets. -- Rémi Denis-Courmont http://www.remlab.net/
Re: [RFD] L2 Network namespace infrastructure
Patrick McHardy [EMAIL PROTECTED] writes: Depending upon the data structure it will either be modified to hold a per entry network namespace pointer or there will be a separate copy per network namespace. For large global data structures like the ipv4 routing cache hash table adding an additional pointer to the entries appears the more reasonable solution. So the routing cache is shared between all namespaces? Yes. Each namespace has its own view, so semantically it's not shared. But the initial fan-out of the hash table (2M entries or something) isn't something we want to replicate on a per namespace basis, even assuming the huge page allocations could happen. So we just tag the entries and add the network namespace as one more part of the key when doing hash table lookups. I can wait for the patches, but I would be interested in how GC is performed and whether limits can be configured per namespace. Currently I believe the gc code is unmodified in my patches. I have been focusing on the normal semantics and just making something work in a mergeable fashion. Limits and the like are comparatively easy to add in after the rest is working, so I haven't been focusing on that. Eric
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
On Saturday 23 June 2007, David Stevens wrote: The kernel should do the authentication, as it will for other RAs, and should not deliver (IMAO) unauthenticated packets. If it is, I would consider that a bug (for all cases, not just this), and that would be a good thing to fix. :-) I am all for an interface whereby the kernel queues all accepted RAs for userland to process additional parameters... but that's totally NOT how ICMPv6 raw sockets currently work, and it would be a very significant departure from the Advanced IPv6 Socket API (RFC 3542, in particular §3.3): An implementation might perform additional validity checks on the ICMPv6 message content and discard malformed packets. However, a portable application must not assume that such validity checks have been performed. Being malformed does not include failing authentication, or the local host not using autoconf. I am all for a setsockopt() that limits delivery to accepted RAs, but it does not currently exist. An interface going down doesn't directly invalidate a DNS server address, though it may not be the best choice when reached through another interface. Since it is a list, I think doing nothing for this case wouldn't be terrible. This is no worse than the existing resolver code. That would encourage people into running open recursive DNS servers, which is widely known and documented as a bad practice. Definitely a very bad idea. But if you really need it, you can monitor netlink, or poll the interface flags on whatever interval you require for detection. As for autoconf, that's available from sysctl, I assume from /proc somewhere, too. That usually doesn't change, but if you want to account for runtime configuration changes, you can always monitor netlink and reread when new addresses appear, too. There are a bunch of parameters that determine whether an interface accepts RAs or not. I doubt it's wise to try to reimplement that in userspace, particularly if it is subject to change.
There certainly may be complications I haven't thought of, since I haven't implemented it. But I still don't see a good case for using the kernel as a DNS database. I never said the kernel needed to parse DNS messages by itself. My point is that raw IPv6 sockets are not usable for the time being, and I do not see any way to fix that without modifying the kernel. The userspace DNS configuration daemon might need to be started later than the kernel autoconf - another issue that needs help from the kernel. -- Rémi Denis-Courmont http://www.remlab.net/ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 5/7] CAN: Add virtual CAN netdevice driver
Patrick McHardy wrote: BTW, in case the loopback device is required for normal operation it might make sense to create *one* device by default, but four identical devices seems a bit extreme. As I wrote before, CAN addressing consists of CAN identifiers and the interface used. The use of four vcans is definitely a common case! Oliver
Re: [RFD] L2 Network namespace infrastructure
On 23.06.2007 19:19, Eric W. Biederman wrote: Patrick McHardy [EMAIL PROTECTED] writes: Eric W. Biederman wrote: Depending upon the data structure, it will either be modified to hold a per-entry network namespace pointer or there will be a separate copy per network namespace. For large global data structures like the ipv4 routing cache hash table, adding an additional pointer to the entries appears the more reasonable solution. So the routing cache is shared between all namespaces? Yes. Each namespace has its own view, so semantically it's not shared. But the initial fan-out of the hash table (2M entries or something) isn't something we want to replicate on a per-namespace basis, even assuming the huge page allocations could happen. So we just tag the entries and add the network namespace as one more part of the key when doing hash table lookups. Can one namespace DoS other namespaces' access to the routing cache? Two scenarios come to mind: * provoking hash collisions * lock contention (sorry, haven't checked whether/how we do locking) Regards, Carl-Daniel
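Eric's description above -- one shared hash table, with the namespace becoming part of the lookup key -- can be modeled in a few lines. This is a userspace sketch, not the kernel's actual routing-cache code; `struct net_ns`, `rt_entry`, and the hash function are illustrative names of mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a shared, namespace-tagged route cache: entries carry a
 * namespace pointer, and lookups match on (namespace, destination)
 * instead of destination alone. */
struct net_ns { int id; };

struct rt_entry {
    struct net_ns *ns;        /* owning namespace, part of the key */
    uint32_t daddr;           /* destination address */
    struct rt_entry *next;    /* hash chain */
};

#define HASH_SZ 256

static struct rt_entry *hash_tbl[HASH_SZ];

static unsigned rt_hash(struct net_ns *ns, uint32_t daddr)
{
    /* Mix the namespace pointer into the hash so entries from
     * different namespaces spread across the one shared table. */
    return (unsigned)(((uintptr_t)ns ^ daddr) % HASH_SZ);
}

static void rt_insert(struct rt_entry *e)
{
    unsigned h = rt_hash(e->ns, e->daddr);
    e->next = hash_tbl[h];
    hash_tbl[h] = e;
}

static struct rt_entry *rt_lookup(struct net_ns *ns, uint32_t daddr)
{
    for (struct rt_entry *e = hash_tbl[rt_hash(ns, daddr)]; e; e = e->next)
        if (e->ns == ns && e->daddr == daddr)   /* key includes ns */
            return e;
    return NULL;
}
```

Carl-Daniel's hash-collision concern maps directly onto `rt_hash` here: if the namespace tag is not mixed into the hash well, one namespace can concentrate another namespace's entries onto long chains.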
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
On Sat, Jun 23, 2007, David Stevens wrote: There certainly may be complications I haven't thought of, since I haven't implemented it. But I still don't see a good case for using the kernel as a DNS database. Excuse me for being a bit confused by the approach that you suggest, as so far it doesn't look very good to me either: I would be glad if you could clarify some points for the sake of the discussion. The kernel should do the authentication, as it will for other RAs, and should not deliver (IMAO) unauthenticated packets. If it is, I would consider that a bug (for all cases, not just this), and that would be a good thing to fix. :-) You were talking about a raw ICMPv6 socket, right? Though, isn't the point of a raw socket to be raw? Would it be a not-so-raw raw socket, dropping a few unwanted packets? An interface going down doesn't directly invalidate a DNS server address, though it may not be the best through another interface. Since it is a list, I think doing nothing for this case wouldn't be terrible. This is no worse than the existing resolver code. But if you really need it, you can monitor netlink, or poll the interface flags on whatever interval you require for detection. As for autoconf, that's available from sysctl, I assume from /proc somewhere, too. That usually doesn't change, but if you want to account for runtime configuration changes, you can always monitor netlink and reread when new addresses appear, too. If I understand correctly, you suggest that in order to do things properly, the application should keep track of a lot of kernel-related stuff? I mean, the daemon, as the simple piece of code that you seem to have in mind, should only care about processing RA options that it receives: network/RA/configuration/availability concerns are precisely the role of the kernel, which it is already fulfilling, isn't it? It just looks naturally workable in the case where the kernel processes these options first, and then hands them to the daemon. 
Also, I think that RAs can be considered a part of IPv6, right? Unlike DHCP, which is indeed an application protocol, I can't see why parts of a network protocol should be managed by a (non-networking) userland application. Saying that it can only be used at the application layer doesn't look like a very good case for having networking packets handled by userland instead of the kernel, and seems rather selfish of the OS. Am I expecting too much as a user? I had the understanding that it was a better design to clearly handle autoconfiguration in one place, and not to scatter it between kernel and userland. For some reason, it is done in the kernel: do you mean that now the kernel should only support partial, half-way handling of RAs? It may seem a bit awkward as a solution. To me, it looks much more consistent that since the kernel already parses the RA options that it needs, it should be in charge of wholly processing the RA and of extracting and exporting all its options. That would indeed be practical, less error-prone and maybe more efficient than duplicating all the work in userland. Couldn't it be? After all, the fact that RDNSS is accepted as an RA option is an argument to say that it belongs in the kernel, not as DNS, but as an RA option. As you are saying to Rémi, your intent is to fix or enhance the existing, generic means of the kernel to provide accurate access to these RA options, right? Isn't it just what we all want? -- Pierre Ynard WTS #51 - No phone A soul in a body is like a drawing on a sheet of paper.
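Whichever side of the kernel/userland split ends up doing it, the actual RDNSS parsing under discussion is small. A sketch of the option wire format from RFC 5006 (the specification of that era): ND option type 25, a length in 8-octet units, a lifetime, then one or more 16-byte IPv6 addresses. The struct and function names here are mine, not from any proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RDNSS option layout per RFC 5006: an 8-byte header followed by
 * (len - 1) / 2 IPv6 addresses, where len counts 8-octet units. */
struct rdnss_opt {
    uint8_t  type;        /* 25 for RDNSS */
    uint8_t  len;         /* in units of 8 octets: 3, 5, 7, ... */
    uint16_t reserved;
    uint32_t lifetime;    /* seconds the listed servers stay valid */
    /* followed by the IPv6 addresses, 16 bytes each */
};

/* Return the number of DNS server addresses, or -1 if malformed. */
static int rdnss_addr_count(const uint8_t *buf, size_t buflen)
{
    if (buflen < 8 || buf[0] != 25)
        return -1;
    uint8_t len = buf[1];
    /* len must be odd and at least 3: header (1 unit) plus a whole
     * number of 16-byte addresses (2 units each). */
    if (len < 3 || (len % 2) == 0 || (size_t)len * 8 > buflen)
        return -1;
    return (len - 1) / 2;
}
```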
Re: [RFD] L2 Network namespace infrastructure
Eric W. Biederman wrote: Ben Greear [EMAIL PROTECTED] writes: Will we be able to have a single application be in multiple name-spaces? A single application certainly. But then an application can be composed of multiple processes which can be composed of multiple threads. In my current patches a single task_struct belongs to a single network namespace. That namespace is used when creating sockets. The sockets themselves have a namespace tag and that is used when transmitting packets, or otherwise operating on the socket. So if you pass a socket from one process to another you can have sockets that belong to different network namespaces in a single task. Any chance it could allow one to use a single-threaded, single process and do something like int fd1 = socket(, namespace1); int fd2 = socket(, namespace2); Or, maybe a sockopt or similar call to move a socket into a particular namespace? I can certainly see it being useful to allow a default name-space per process, but it would be nice to also allow explicit assignment of a socket to a name-space for applications that want to span a large number of name-spaces. Thanks, Ben -- Ben Greear [EMAIL PROTECTED] Candela Technologies Inc http://www.candelatech.com
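Eric's semantics -- a task has a default namespace, a socket is tagged at creation with its creator's namespace, and the tag travels with the socket when passed between tasks -- can be modeled in userspace. This sketch uses illustrative names of my own; `sock_set_ns()` stands in for the hypothetical sockopt Ben is asking about, not any real API.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the proposed semantics: tasks carry a default
 * network namespace, sockets carry a namespace tag. */
struct net_ns { int id; };
struct task  { struct net_ns *net_ns; };
struct sock  { struct net_ns *net_ns; };

/* A socket inherits the creating task's namespace. */
static void sock_create(const struct task *t, struct sock *sk)
{
    sk->net_ns = t->net_ns;
}

/* Hypothetical sockopt-style call to retag a socket, per Ben's
 * suggestion; nothing like this exists in the patches discussed. */
static void sock_set_ns(struct sock *sk, struct net_ns *ns)
{
    sk->net_ns = ns;
}
```

Because the tag lives on the socket rather than the task, passing a socket to another task (e.g. over a Unix socket) leaves its namespace unchanged, so one task can indeed hold sockets from several namespaces.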
Re: [RFD] L2 Network namespace infrastructure
Stephen Hemminger wrote: Will we be able to have a single application be in multiple name-spaces? That would break the whole point of namespaces... I was hoping that I could open a socket in one name-space and another in another name-space, and send traffic between them, within a single application. This is basically what I can do now with my send-to-self patch (and, for more clever virtual-routing schemes plus NAT, with a conntrack patch that Patrick cooked up for me). It seems these patches I use are not acceptable for merge, so I was hoping name-spaces might work instead. Thanks, Ben -- Ben Greear [EMAIL PROTECTED] Candela Technologies Inc http://www.candelatech.com
Re: [RFD] L2 Network namespace infrastructure
Carl-Daniel Hailfinger [EMAIL PROTECTED] writes: Can one namespace DoS other namespaces' access to the routing cache? Two scenarios come to mind: * provoking hash collisions * lock contention (sorry, haven't checked whether/how we do locking) My initial expectation is that the protections we have to prevent one user from performing a DoS on another user generally cover the cases between namespaces as well. Further, in general, global caches and global resource management are more efficient than per-namespace management. Eric
Re: [RFD] L2 Network namespace infrastructure
Ben Greear [EMAIL PROTECTED] writes: Any chance it could allow one to use a single-threaded, single process and do something like int fd1 = socket(, namespace1); int fd2 = socket(, namespace2); Or, maybe a sockopt or similar call to move a socket into a particular namespace? I can certainly see it being useful to allow a default name-space per process, but it would be nice to also allow explicit assignment of a socket to a name-space for applications that want to span a large number of name-spaces. That isn't the primary use case so I have not considered it much. A setsockopt call might be possible. It is also possible to have a bunch of children opening sockets for you and passing them to the process that wants to do the work. If you have a sufficiently slow socket creation rate that will not be a problem, just a little cumbersome. If you can open all of your sockets up front, it is possible to do something where you open your sockets, then unshare your network namespace, and repeat. I am committed to making general infrastructure, not something that is targeted in a brittle way at only one scenario. So it may be that we can cover your scenario. However it is just enough off of the beaten path that I'm not going to worry about it the first time through. It looks like it is a very small step from where I am at to where you want to be. So you may be able to cook up something that will satisfy your requirements relatively easily. Eric
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
From: Michael Buesch [EMAIL PROTECTED] Date: Sat, 23 Jun 2007 11:07:14 +0200 Yeah, it might work. But I think the compiler doesn't guarantee you anything about it. The compiler actually does guarantee these things, and that's why we have the endian bitfield macros. You're overreacting, we've been using this stuff for more than 10 years in the basic IPv4 header structure, so stop this nonsense.
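The "stuff" David refers to is the bitfield idiom in struct iphdr: the `version` and `ihl` members are declared in opposite order depending on the kernel's endian bitfield macros, so the same 4-bit fields always land in the right halves of the first octet. A userspace demonstration, using the GCC byte-order predefines instead of the kernel's `__LITTLE_ENDIAN_BITFIELD`/`__BIG_ENDIAN_BITFIELD` macros (the struct name here is mine; bitfield ordering within a byte is an ABI property, which is exactly the guarantee being discussed):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* First octet of an IPv4 header: high nibble = version, low nibble
 * = IHL on the wire.  Member order flips with host endianness, as
 * struct iphdr does in the kernel. */
struct ip_first_octet {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    uint8_t ihl:4, version:4;
#else
    uint8_t version:4, ihl:4;
#endif
};

static void parse_first_octet(uint8_t raw, struct ip_first_octet *o)
{
    /* Overlay the bitfields directly onto the wire byte. */
    memcpy(o, &raw, 1);
}
```

For the canonical first byte 0x45 (IPv4, 20-byte header), this yields version 4 and IHL 5 on either endianness, which is the point: the macros make the declaration portable so no runtime swapping is needed.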
Re: [RFD] L2 Network namespace infrastructure
Eric W. Biederman wrote: So it may be that we can cover your scenario. However it is just enough off of the beaten path that I'm not going to worry about it the first time through. It looks like it is a very small step from where I am at to where you want to be. So you may be able to cook up something that will satisfy your requirements relatively easily. That sounds fair to me. I will assume that as long as you can migrate sockets with the methods you described, it should not be that difficult to do the same with a sockopt or similar. I'll revisit this when your patches are in mainline. Thanks, Ben -- Ben Greear [EMAIL PROTECTED] Candela Technologies Inc http://www.candelatech.com
Re: [PATCH][Resend] TIPC: Fix infinite loop in netlink handler
From: Florian Westphal [EMAIL PROTECTED] Date: Sat, 23 Jun 2007 20:25:46 +0200 From: Florian Westphal [EMAIL PROTECTED] The tipc netlink config handler uses the nlmsg_pid from the request header as the destination for its reply. If the application initialized nlmsg_pid to 0, the reply is looped back to the kernel, causing a hang. Fix: use nlmsg_pid of the skb that triggered the request. Signed-off-by: Florian Westphal [EMAIL PROTECTED] I have this patch already, I'm just backlogged :-) Please be patient.
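The bug and the fix can be shown with simplified stand-in structures (the real code reads the sender pid from the skb's netlink control block, NETLINK_CB(skb).pid, rather than from the request header; the types and field names below are mock-ups, not kernel definitions):

```c
#include <assert.h>
#include <stdint.h>

/* pid 0 addresses the kernel itself on a netlink socket, so a reply
 * sent to nlmsg_pid == 0 loops straight back into the kernel. */
struct nlmsghdr_model { uint32_t nlmsg_pid; };

struct skb_model {
    struct nlmsghdr_model hdr;  /* request header, app-controlled */
    uint32_t cb_pid;            /* like NETLINK_CB(skb).pid, set by
                                 * the kernel when the request arrives */
};

/* Buggy: trusts the header, which apps often leave zeroed. */
static uint32_t reply_dst_buggy(const struct skb_model *skb)
{
    return skb->hdr.nlmsg_pid;
}

/* Fixed: uses the pid the kernel recorded for the sending socket. */
static uint32_t reply_dst_fixed(const struct skb_model *skb)
{
    return skb->cb_pid;
}
```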
Re: [RFD] L2 Network namespace infrastructure
DM == David Miller [EMAIL PROTECTED] writes: DM To be honest I think this form of virtualization is a complete DM waste of time, even the openvz approach. You are only considering the security value of OpenVZ. Where I work, OpenVZ and Linux-vserver are used for their ability to cleanly separate processes. Security-wise, we could get the same effect just by running the processes as separate users, but management-wise it is so much easier to give them a completely separate environment. OpenVZ's network virtualization enables us to do things which are completely impossible with both the vanilla kernel and Xen -- e.g. hundreds of virtual routers, with their own routing daemons. Policy routing just doesn't cut it; it's cumbersome to set up, limited to 256 tables, and routing daemons generally can't handle it well, if at all. /Benny
[PATCH] NET: Multiple queue hardware support
Please consider these patches for 2.6.23 inclusion. These patches are built against Patrick McHardy's recently submitted RTNETLINK nested compat attribute patches. They're needed to preserve ABI between sch_{rr|prio} and iproute2. Updates since the last submission: 1. Added checks for netif_subqueue_stopped() to net/core/netpoll.c, net/core/pktgen.c, and to software device hard_start_xmit in dev_queue_xmit(). 2. Removed TCA_PRIO_TEST and added TCA_PRIO_MQ for sch_prio and sch_rr. 3. Fixed dependency issues in net/sched/Kconfig with NET_SCH_RR. 4. Implemented the new nested compat attribute API for MQ in NET_SCH_PRIO and NET_SCH_RR. 5. Allow sch_rr and sch_prio to turn multiqueue hardware support on and off at load time. This patchset is an updated version of previous multiqueue network device support patches. The general approach of introducing a new API for multiqueue network devices to register with the stack has remained. The changes include adding a round-robin qdisc, heavily based on sch_prio, which will allow queueing to hardware with no OS-enforced queuing policy. sch_prio still has the multiqueue code in it, but has a Kconfig option to compile it out of the qdisc. This allows people with hardware containing scheduling policies to use sch_rr (round-robin), and others without scheduling policies in hardware to continue using sch_prio if they wish to have some notion of scheduling priority. The patches being sent are split into Documentation, Qdisc changes, and core stack changes. The requested e1000 changes are still being resolved, and will be sent at a later date. The patches to iproute2 for tc will be sent separately, to support sch_rr. -- PJ Waskiewicz [EMAIL PROTECTED]
[PATCH 2/3] NET: [CORE] Stack changes to add multiqueue hardware support API
Updated: Added checks for netif_subqueue_stopped() to netpoll, pktgen, and software device dev_queue_xmit(). This will ensure external events to these subsystems will be handled correctly if a subqueue is shut down. Add the multiqueue hardware device support API to the core network stack. Allow drivers to allocate multiple queues and manage them at the netdev level if they choose to do so. Added a new field to sk_buff, namely queue_mapping, for drivers to know which tx_ring to select based on OS classification of the flow. Signed-off-by: Peter P Waskiewicz Jr [EMAIL PROTECTED] --- include/linux/etherdevice.h |3 +- include/linux/netdevice.h | 62 ++- include/linux/skbuff.h |4 ++- net/core/dev.c | 27 +-- net/core/netpoll.c |8 +++--- net/core/pktgen.c | 10 +-- net/core/skbuff.c |3 ++ net/ethernet/eth.c |9 +++--- 8 files changed, 104 insertions(+), 22 deletions(-) diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h index f48eb89..b3fbb54 100644 --- a/include/linux/etherdevice.h +++ b/include/linux/etherdevice.h @@ -39,7 +39,8 @@ extern void eth_header_cache_update(struct hh_cache *hh, struct net_device *dev extern int eth_header_cache(struct neighbour *neigh, struct hh_cache *hh); -extern struct net_device *alloc_etherdev(int sizeof_priv); +extern struct net_device *alloc_etherdev_mq(int sizeof_priv, int queue_count); +#define alloc_etherdev(sizeof_priv) alloc_etherdev_mq(sizeof_priv, 1) /** * is_zero_ether_addr - Determine if give Ethernet address is all zeros. diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index e7913ee..6509eb4 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -108,6 +108,14 @@ struct wireless_dev; #define MAX_HEADER (LL_MAX_HEADER + 48) #endif +struct net_device_subqueue +{ + /* Give a control state for each queue. This struct may contain +* per-queue locks in the future. +*/ + unsigned long state; +}; + /* * Network device statistics. 
Akin to the 2.0 ether stats but * with byte counters. @@ -325,6 +333,7 @@ struct net_device #define NETIF_F_VLAN_CHALLENGED1024/* Device cannot handle VLAN packets */ #define NETIF_F_GSO2048/* Enable software GSO. */ #define NETIF_F_LLTX 4096/* LockLess TX */ +#define NETIF_F_MULTI_QUEUE16384 /* Has multiple TX/RX queues */ /* Segmentation offload features */ #define NETIF_F_GSO_SHIFT 16 @@ -543,6 +552,10 @@ struct net_device /* rtnetlink link ops */ const struct rtnl_link_ops *rtnl_link_ops; + + /* The TX queue control structures */ + int egress_subqueue_count; + struct net_device_subqueue egress_subqueue[0]; }; #define to_net_dev(d) container_of(d, struct net_device, dev) @@ -705,6 +718,48 @@ static inline int netif_running(const struct net_device *dev) return test_bit(__LINK_STATE_START, dev-state); } +/* + * Routines to manage the subqueues on a device. We only need start + * stop, and a check if it's stopped. All other device management is + * done at the overall netdevice level. + * Also test the device if we're multiqueue. 
+ */ +static inline void netif_start_subqueue(struct net_device *dev, u16 queue_index) +{ + clear_bit(__LINK_STATE_XOFF, dev-egress_subqueue[queue_index].state); +} + +static inline void netif_stop_subqueue(struct net_device *dev, u16 queue_index) +{ +#ifdef CONFIG_NETPOLL_TRAP + if (netpoll_trap()) + return; +#endif + set_bit(__LINK_STATE_XOFF, dev-egress_subqueue[queue_index].state); +} + +static inline int netif_subqueue_stopped(const struct net_device *dev, + u16 queue_index) +{ + return test_bit(__LINK_STATE_XOFF, + dev-egress_subqueue[queue_index].state); +} + +static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index) +{ +#ifdef CONFIG_NETPOLL_TRAP + if (netpoll_trap()) + return; +#endif + if (test_and_clear_bit(__LINK_STATE_XOFF, + dev-egress_subqueue[queue_index].state)) + __netif_schedule(dev); +} + +static inline int netif_is_multiqueue(const struct net_device *dev) +{ + return (!!(NETIF_F_MULTI_QUEUE dev-features)); +} /* Use this variant when it is known for sure that it * is executing from interrupt context. @@ -995,8 +1050,11 @@ static inline void netif_tx_disable(struct net_device *dev) extern voidether_setup(struct net_device *dev); /* Support for loadable net-drivers */ -extern struct net_device *alloc_netdev(int sizeof_priv, const char *name, - void (*setup)(struct
[PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue
Updated: This patch applies on top of Patrick McHardy's RTNETLINK nested compat attribute patches. These are required to preserve ABI for iproute2 when working with the multiqueue qdiscs. Add the new sch_rr qdisc for multiqueue network device support. Allow sch_prio and sch_rr to be compiled with or without multiqueue hardware support. sch_rr is part of sch_prio, and is referenced from MODULE_ALIAS. This was done since sch_prio and sch_rr only differ in their dequeue routine. Signed-off-by: Peter P Waskiewicz Jr [EMAIL PROTECTED] --- include/linux/pkt_sched.h |4 +- net/sched/Kconfig | 30 + net/sched/sch_generic.c |3 + net/sched/sch_prio.c | 106 - 4 files changed, 129 insertions(+), 14 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 09808b7..ec3a9a5 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -103,8 +103,8 @@ struct tc_prio_qopt enum { - TCA_PRIO_UNPSEC, - TCA_PRIO_TEST, + TCA_PRIO_UNSPEC, + TCA_PRIO_MQ, __TCA_PRIO_MAX }; diff --git a/net/sched/Kconfig b/net/sched/Kconfig index 475df84..7f14fa6 100644 --- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -102,8 +102,16 @@ config NET_SCH_ATM To compile this code as a module, choose M here: the module will be called sch_atm. +config NET_SCH_BANDS +bool Multi Band Queueing (PRIO and RR) +---help--- + Say Y here if you want to use n-band multiqueue packet + schedulers. These include a priority-based scheduler and + a round-robin scheduler. + config NET_SCH_PRIO tristate Multi Band Priority Queueing (PRIO) + depends on NET_SCH_BANDS ---help--- Say Y here if you want to use an n-band priority queue packet scheduler. @@ -111,6 +119,28 @@ config NET_SCH_PRIO To compile this code as a module, choose M here: the module will be called sch_prio. +config NET_SCH_RR + tristate Multi Band Round Robin Queuing (RR) + depends on NET_SCH_BANDS + select NET_SCH_PRIO + ---help--- + Say Y here if you want to use an n-band round robin packet + scheduler. 
+ + The module uses sch_prio for its framework and is aliased as + sch_rr, so it will load sch_prio, although it is referred + to using sch_rr. + +config NET_SCH_BANDS_MQ + bool Multiple hardware queue support + depends on NET_SCH_BANDS + ---help--- + Say Y here if you want to allow the PRIO and RR qdiscs to assign + flows to multiple hardware queues on an ethernet device. This + will still work on devices with 1 queue. + + Most people will say N here. + config NET_SCH_RED tristate Random Early Detection (RED) ---help--- diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 9461e8a..203d5c4 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -168,7 +168,8 @@ static inline int qdisc_restart(struct net_device *dev) spin_unlock(dev-queue_lock); ret = NETDEV_TX_BUSY; - if (!netif_queue_stopped(dev)) + if (!netif_queue_stopped(dev) + !netif_subqueue_stopped(dev, skb-queue_mapping)) /* churn baby churn .. */ ret = dev_hard_start_xmit(skb, dev); diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c index 40a13e8..8a716f0 100644 --- a/net/sched/sch_prio.c +++ b/net/sched/sch_prio.c @@ -40,9 +40,11 @@ struct prio_sched_data { int bands; + int curband; /* for round-robin */ struct tcf_proto *filter_list; u8 prio2band[TC_PRIO_MAX+1]; struct Qdisc *queues[TCQ_PRIO_BANDS]; + unsigned char mq; }; @@ -70,14 +72,28 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr) #endif if (TC_H_MAJ(band)) band = 0; + if (q-mq) + skb-queue_mapping = + q-prio2band[bandTC_PRIO_MAX]; + else + skb-queue_mapping = 0; return q-queues[q-prio2band[bandTC_PRIO_MAX]]; } band = res.classid; } band = TC_H_MIN(band) - 1; - if (band = q-bands) + if (band = q-bands) { + if (q-mq) + skb-queue_mapping = q-prio2band[0]; + else + skb-queue_mapping = 0; return q-queues[q-prio2band[0]]; + } + if (q-mq) + skb-queue_mapping = band; + else + skb-queue_mapping = 0; return q-queues[band]; } @@ -144,17 +160,57 @@ prio_dequeue(struct Qdisc* sch) struct Qdisc *qdisc; for 
(prio = 0; prio q-bands; prio++) { -
[PATCH 1/3] NET: [DOC] Multiqueue hardware support documentation
Add a brief howto to Documentation/networking for multiqueue. It explains how to use the multiqueue API in a driver to support multiqueue paths from the stack, as well as the qdiscs to use for feeding a multiqueue device. Signed-off-by: Peter P Waskiewicz Jr [EMAIL PROTECTED] --- Documentation/networking/multiqueue.txt | 106 +++ 1 files changed, 106 insertions(+), 0 deletions(-) diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt new file mode 100644 index 000..b7ede56 --- /dev/null +++ b/Documentation/networking/multiqueue.txt @@ -0,0 +1,106 @@ + + HOWTO for multiqueue network device support + === + +Section 1: Base driver requirements for implementing multiqueue support +Section 2: Qdisc support for multiqueue devices +Section 3: Brief howto using PRIO or RR for multiqueue devices + + +Intro: Kernel support for multiqueue devices +- + +Kernel support for multiqueue devices is only an API that is presented to the +netdevice layer for base drivers to implement. This feature is part of the +core networking stack, and all network devices will be running on the +multiqueue-aware stack. If a base driver only has one queue, then these +changes are transparent to that driver. + + +Section 1: Base driver requirements for implementing multiqueue support +--- + +Base drivers are required to use the new alloc_etherdev_mq() or +alloc_netdev_mq() functions to allocate the subqueues for the device. The +underlying kernel API will take care of the allocation and deallocation of +the subqueue memory, as well as netdev configuration of where the queues +exist in memory. + +The base driver will also need to manage the queues as it does the global +netdev->queue_lock today. Therefore base drivers should use the +netif_{start|stop|wake}_subqueue() functions to manage each queue while the +device is still operational. netdev->queue_lock is still used when the device +comes online or when it's completely shut down (unregister_netdev(), etc.). 
+ +Finally, the base driver should indicate that it is a multiqueue device. The +feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features +bitmap on device initialization. Below is an example from e1000: + +#ifdef CONFIG_E1000_MQ + if ( (adapter->hw.mac.type == e1000_82571) || +(adapter->hw.mac.type == e1000_82572) || +(adapter->hw.mac.type == e1000_80003es2lan)) + netdev->features |= NETIF_F_MULTI_QUEUE; +#endif + + +Section 2: Qdisc support for multiqueue devices +--- + +Currently two qdiscs support multiqueue devices: a new round-robin qdisc, +sch_rr, and sch_prio. The qdisc is responsible for classifying the skb's to +bands and queues, and will store the queue mapping into skb->queue_mapping. +Use this field in the base driver to determine which queue to send the skb +to. + +sch_rr has been added for hardware that doesn't want scheduling policies from +software, so it's a straight round-robin qdisc. It uses the same syntax and +classification priomap that sch_prio uses, so it should be intuitive to +configure for people who've used sch_prio. + +The PRIO qdisc naturally plugs into a multiqueue device. If PRIO has been +built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of +bands requested is equal to the number of queues on the hardware. If they +are equal, it sets up a one-to-one mapping between the queues and bands. If +they're not equal, it will not load the qdisc. This is the same behavior +for RR. Once the association is made, any skb that is classified will have +skb->queue_mapping set, which will allow the driver to properly queue skb's +to multiple queues. + + +Section 3: Brief howto using PRIO and RR for multiqueue devices +--- + +The userspace command 'tc', part of the iproute2 package, is used to configure +qdiscs. 
To add the PRIO qdisc to your network device, assuming the device is +called eth0, run the following command: + +# tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue + +This will create 4 bands, 0 being highest priority, and associate those bands +to the queues on your NIC. Assuming eth0 has 4 Tx queues, the band mapping +would look like: + +band 0 = queue 0 +band 1 = queue 1 +band 2 = queue 2 +band 3 = queue 3 + +Traffic will begin flowing through each queue if your TOS values are assigning +traffic across the various bands. For example, ssh traffic will always try to +go out band 0 based on TOS-to-Linux-priority conversion (realtime traffic), +so it will be sent out queue 0. ICMP traffic (pings) falls into the normal +traffic classification, which is band 1. Therefore pings will be sent out
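The driver-side contract the howto describes -- the qdisc stores a ring index in skb->queue_mapping, the driver transmits on that ring, and per-queue flow control replaces the single global queue state -- can be sketched as a small userspace model. All kernel types are mocked here; the `_m` suffix marks my stand-ins for the real netif_*_subqueue() helpers in the patchset.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TX_QUEUES 4

struct sk_buff_model { uint16_t queue_mapping; };  /* set by the qdisc */

struct netdev_model {
    unsigned long subq_stopped;   /* one XOFF bit per TX queue */
    int tx_count[NUM_TX_QUEUES];  /* packets sent per ring */
};

static void netif_stop_subqueue_m(struct netdev_model *d, uint16_t q)
{ d->subq_stopped |= 1UL << q; }

static void netif_wake_subqueue_m(struct netdev_model *d, uint16_t q)
{ d->subq_stopped &= ~(1UL << q); }

static int netif_subqueue_stopped_m(const struct netdev_model *d, uint16_t q)
{ return (int)((d->subq_stopped >> q) & 1); }

/* Returns 0 on success, -1 (NETDEV_TX_BUSY-like) if the chosen ring
 * is stopped.  Stopping one ring does not block the others. */
static int xmit_m(struct netdev_model *d, const struct sk_buff_model *skb)
{
    uint16_t q = skb->queue_mapping;
    if (netif_subqueue_stopped_m(d, q))
        return -1;
    d->tx_count[q]++;
    return 0;
}
```

This mirrors the qdisc_restart() change in patch 3/3: the stack now checks both the global queue state and the subqueue named by skb->queue_mapping before calling the driver.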
[PATCH] iproute2: sch_rr support in tc
Updated: This patch applies on top of Patrick McHardy's RTNETLINK patches to add nested compat attributes. This is needed to maintain ABI for sch_{rr|prio} in the kernel with respect to tc.

A new option, namely multiqueue, was added to sch_prio and sch_rr. This will allow a user to turn multiqueue support on for sch_prio or sch_rr at load time. Also, "tc qdisc ls" will display whether or not multiqueue is enabled on that qdisc.

This patch is to support the new sch_rr (round-robin) qdisc being proposed in NET for multiqueue network device support in the Linux network stack. It uses q_prio.c as the template, since the qdiscs are nearly identical outside of the ->dequeue() routine.

Signed-off-by: Peter P Waskiewicz Jr [EMAIL PROTECTED]
---
 include/linux/pkt_sched.h |    2 -
 tc/Makefile               |    1
 tc/q_prio.c               |   15 -
 tc/q_rr.c                 |  126 +
 4 files changed, 138 insertions(+), 6 deletions(-)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index fa0ec53..ec3a9a5 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -104,7 +104,7 @@ struct tc_prio_qopt
 enum {
 	TCA_PRIO_UNSPEC,
-	TCA_PRIO_TEST,
+	TCA_PRIO_MQ,
 	__TCA_PRIO_MAX
 };
diff --git a/tc/Makefile b/tc/Makefile
index 9d618ff..62e2697 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -9,6 +9,7 @@ TCMODULES += q_fifo.o
 TCMODULES += q_sfq.o
 TCMODULES += q_red.o
 TCMODULES += q_prio.o
+TCMODULES += q_rr.o
 TCMODULES += q_tbf.o
 TCMODULES += q_cbq.o
 TCMODULES += f_rsvp.o
diff --git a/tc/q_prio.c b/tc/q_prio.c
index 4934416..b34bc05 100644
--- a/tc/q_prio.c
+++ b/tc/q_prio.c
@@ -29,7 +29,7 @@
 
 static void explain(void)
 {
-	fprintf(stderr, "Usage: ... prio bands NUMBER priomap P1 P2...\n");
+	fprintf(stderr, "Usage: ... prio bands NUMBER priomap P1 P2... [multiqueue]\n");
 }
 
 #define usage() return(-1)
@@ -41,6 +41,7 @@ static int prio_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct n
 	int idx = 0;
 	struct tc_prio_qopt opt={3,{ 1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1 }};
 	struct rtattr *nest;
+	unsigned char mq = 0;
 
 	while (argc > 0) {
 		if (strcmp(*argv, "bands") == 0) {
@@ -58,6 +59,8 @@ static int prio_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct n
 				return -1;
 			}
 			pmap_mode = 1;
+		} else if (strcmp(*argv, "multiqueue") == 0) {
+			mq = 1;
 		} else if (strcmp(*argv, "help") == 0) {
 			explain();
 			return -1;
@@ -92,7 +95,7 @@ static int prio_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct n
 	} */
 	nest = addattr_nest_compat(n, 1024, TCA_OPTIONS, &opt, sizeof(opt));
-	addattr32(n, 1024, TCA_PRIO_TEST, 123);
+	addattr32(n, 1024, TCA_PRIO_MQ, mq);
 	addattr_nest_compat_end(n, nest);
 	return 0;
 }
@@ -106,15 +109,17 @@ int prio_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
 	if (opt == NULL)
 		return 0;
 
-	if (parse_rtattr_nested_compat(tb, TCA_PRIO_MAX, opt, (void *)qopt, sizeof(*qopt)))
+	if (parse_rtattr_nested_compat(tb, TCA_PRIO_MAX, opt, qopt, sizeof(*qopt)))
 		return -1;
 
 	fprintf(f, "bands %u priomap ", qopt->bands);
 	for (i=0; i<=TC_PRIO_MAX; i++)
 		fprintf(f, " %d", qopt->priomap[i]);
-	if (tb[TCA_PRIO_TEST])
-		fprintf(f, "TCA_PRIO_TEST: %u ", *(__u32 *)RTA_DATA(tb[TCA_PRIO_TEST]));
+	if (tb[TCA_PRIO_MQ])
+		fprintf(f, "multiqueue: %s ",
+			*(unsigned char *)RTA_DATA(tb[TCA_PRIO_MQ]) ? "on" : "off");
+
 	return 0;
 }
diff --git a/tc/q_rr.c b/tc/q_rr.c
new file mode 100644
index 000..f74f4d5
--- /dev/null
+++ b/tc/q_rr.c
@@ -0,0 +1,126 @@
+/*
+ * q_rr.c	RR.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors:	PJ Waskiewicz, [EMAIL PROTECTED]
+ * Original Authors:	Alexey Kuznetsov, [EMAIL PROTECTED] (from PRIO)
+ *
+ * Changes:
+ *
+ * Ole Husgaard [EMAIL PROTECTED]: 990513: prio2band map was always reset.
+ * J Hadi Salim [EMAIL PROTECTED]: 990609: priomap fix.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+
+#include "utils.h"
+#include "tc_util.h"
+
+static void explain(void)
+{
+	fprintf(stderr, "Usage: ... rr bands NUMBER priomap P1 P2... [multiqueue]\n");
+}
+
+#define usage() return(-1)
+
+static int
Re: [RFD] L2 Network namespace infrastructure
David Miller [EMAIL PROTECTED] writes: From: [EMAIL PROTECTED] (Eric W. Biederman) Date: Sat, 23 Jun 2007 11:19:34 -0600 Further, and fundamentally, all a global achieves is removing the need for the noise patches where you pass the pointer into the various functions. For long-term maintenance it doesn't help anything. I don't accept that we have to add another function argument to a bunch of core routines just to support this crap, especially since you give no way to turn it off and get that function argument slot back. To be honest I think this form of virtualization is a complete waste of time, even the openvz approach. We're protecting the kernel from itself, and that's an endless uphill battle that you will never win. Let's do this kind of stuff properly with a real minimal hypervisor, hopefully with appropriate hardware-level support and good virtualized device interfaces, instead of this namespace stuff. At least with the hypervisor approach you have some chance to fully harden it in some verifiable and truly protected way; with namespaces it's just a pipe dream, and everyone who works on these namespace approaches knows that very well. The only positive thing that came out of this work is the great auditing that the openvz folks have done and the bugs they have found, but it basically ends right there. Dave, thank you for your candor; it looks like I have finally made the pieces small enough that we can discuss them. If you want the argument to compile out, that is not a problem at all. I dropped that part from my patch because it made the infrastructure more complicated and there appeared to be no gain. However, having a type that you can pass that the compiler can optimize away is not a problem. Basically you just make the argument: typedef struct {} you_can_compile_me_out; /* when you don't want it. */ typedef void * you_can_compile_me_out; /* when you do want it. */ And gcc will generate no code to pass the argument when you compile it out.
As far as the hardening goes, there is definitely a point there; short of a kernel proof subsystem, that sounds correct to me. There are some other factors that make a different tradeoff interesting. First, hypervisors do not allow global optimizations (because of the better isolation), so they have an inherent performance disadvantage: something like a 10x scaling penalty, from the figures I have seen. Even more interesting for me is the possibility of unmodified application migration, where the limiting factor is that you cannot reliably restore an application because the global identifiers are not available. So yes, monolithic kernels may have grown so complex that they cannot be verified, and thus you cannot actually keep untrusted users from doing bad things to each other with any degree of certainty. However, the interesting cases for me are cases where the users are not aggressively hostile with each other but being stuck with one set of global identifiers is a problem. Eric - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD] L2 Network namespace infrastructure
David Miller wrote: I don't accept that we have to add another function argument to a bunch of core routines just to support this crap, especially since you give no way to turn it off and get that function argument slot back. To be honest I think this form of virtualization is a complete waste of time, even the openvz approach. We're protecting the kernel from itself, and that's an endless uphill battle that you will never win. Let's do this kind of stuff properly with a real minimal hypervisor, hopefully with appropriate hardware level support and good virtualized device interfaces, instead of this namespace stuff. Strongly seconded. This containerized virtualization approach just bloats up the kernel for something that is inherently fragile and IMO less secure -- protecting the kernel from itself. Plenty of other virt approaches don't stir the code like this, while simultaneously providing fewer, more-clean entry points for the virtualization to occur. And that's speaking WITHOUT my vendor hat on... Jeff
Re: [RFD] L2 Network namespace infrastructure
Jeff Garzik [EMAIL PROTECTED] writes: David Miller wrote: I don't accept that we have to add another function argument to a bunch of core routines just to support this crap, especially since you give no way to turn it off and get that function argument slot back. To be honest I think this form of virtualization is a complete waste of time, even the openvz approach. We're protecting the kernel from itself, and that's an endless uphill battle that you will never win. Let's do this kind of stuff properly with a real minimal hypervisor, hopefully with appropriate hardware level support and good virtualized device interfaces, instead of this namespace stuff. Strongly seconded. This containerized virtualization approach just bloats up the kernel for something that is inherently fragile and IMO less secure -- protecting the kernel from itself. Plenty of other virt approaches don't stir the code like this, while simultaneously providing fewer, more-clean entry points for the virtualization to occur. Wrong. I really don't want to get into a "my virtualization approach is better than yours" argument. But this is flat-out wrong. 99% of the changes I'm talking about introducing are just: - variable + ptr->variable There are more pieces, mostly around when we initialize those variables, but that is the essence of the change. And as opposed to other virtualization approaches, so far no one has been able to measure the overhead. I suspect there will be a few more cache line misses somewhere but they haven't shown up yet. If the only use was the strong isolation which Dave complains about I would concur that the namespace approach is inappropriate. However there are a lot of other uses. Eric
Re: [RFD] L2 Network namespace infrastructure
Eric W. Biederman wrote: Jeff Garzik [EMAIL PROTECTED] writes: David Miller wrote: I don't accept that we have to add another function argument to a bunch of core routines just to support this crap, especially since you give no way to turn it off and get that function argument slot back. To be honest I think this form of virtualization is a complete waste of time, even the openvz approach. We're protecting the kernel from itself, and that's an endless uphill battle that you will never win. Let's do this kind of stuff properly with a real minimal hypervisor, hopefully with appropriate hardware level support and good virtualized device interfaces, instead of this namespace stuff. Strongly seconded. This containerized virtualization approach just bloats up the kernel for something that is inherently fragile and IMO less secure -- protecting the kernel from itself. Plenty of other virt approaches don't stir the code like this, while simultaneously providing fewer, more-clean entry points for the virtualization to occur. Wrong. I really don't want to get into a "my virtualization approach is better than yours" argument. But this is flat-out wrong. 99% of the changes I'm talking about introducing are just: - variable + ptr->variable There are more pieces, mostly around when we initialize those variables, but that is the essence of the change. You completely dodged the main objection. Which is OK if you are selling something to marketing departments, but not OK. Containers introduce chroot-jail-like features that give one a false sense of security, while still requiring one to poke holes in the illusion to get hardware-specific tasks accomplished. The capable/not-capable model (i.e. superuser / normal user) is _still_ being secured locally, even after decades of work and whitepapers and audits. You are drinking Deep Kool-Aid if you think adding containers to the myriad kernel subsystems does anything besides increasing fragility and decreasing security.
You are securing in-kernel subsystems against other in-kernel subsystems. The superuser/user model made that difficult enough... now containers add exponential audit complexity to that. Who is to say that a local root exploit does not also pierce the container model? And as opposed to other virtualization approaches so far no one has been able to measure the overhead. I suspect there will be a few more cache line misses somewhere but they haven't shown up yet. If the only use was strong isolation which Dave complains about I would concur that the namespace approach is inappropriate. However there are a lot of other uses. Sure there are uses. There are uses to putting the X server into the kernel, too. At some point complexity and featuritis has to take a back seat to basic sanity. Jeff
Re: [RFD] L2 Network namespace infrastructure
From: Benny Amorsen [EMAIL PROTECTED] Date: 23 Jun 2007 23:22:38 +0200 Policy routing just doesn't cut it; it's cumbersome to set up, limited to 256 tables False.
Re: [NET 00/02]: MACVLAN driver
(Apologies for not maintaining the thread ID; I'm not subscribed.) We don't have any clean interfaces by which to do this MAC programming, and we do need something for it soon. Yep, that's been on my long-term wish list for a while, as well. Overall I would like to see a more flexible way of allowing the net stack to learn each NIC's RX filter capabilities, and to exploit them. Plenty of NICs, even 100Mbps ones, support RX filter management that allows scanning for $hw_limit unicast addresses before having to put the hardware into promiscuous mode. A thought I had when I discovered this ability in the Natsemi/NS83815 chip was to use these RX filters for perfect multicast DA matching until they ran out, and then revert to the normal multicast DA matching mechanisms. Another alternative use I thought of was to use these filters to filter out different Ethernet protocol types, e.g. if an interface is only going to be processing IPv4 packets, program these filters to only accept frames with type 0x0800 for IP and 0x0806 for ARP, reverting to non-filtering if there are too many protocol types, as per the way the interfaces operate today. I think it could be useful to expose the ability to have the NIC ignore broadcast packets, or more generally, expose the three categories of address recognition that NICs seem to allow to be enabled / disabled - unicast, multicast and broadcast. If an interface then didn't need to have broadcast reception enabled, e.g. an IPv6-only interface (or Appletalk), then it wouldn't be, preventing the host from having to process broadcasts it's going to ignore anyway. A future common scenario where this ability might be useful would be LANs with a mix of IPv4-only, IPv4/IPv6 and IPv6-only nodes.
The ability to enable/disable unicast, multicast and broadcast address recognition individually on a NIC seems to be widespread - I've found that the original early to mid 90s Ne2K chip, the NS8390D, the Netgear FA311/FA312 chip, the NS83815 and the SMC Epic/100 chip all have specific individual register values for those three types of addresses. Regards, Mark.
Re: [RFD] L2 Network namespace infrastructure
From: [EMAIL PROTECTED] (Eric W. Biederman) Date: Sat, 23 Jun 2007 15:41:16 -0600 If you want the argument to compile out, that is not a problem at all. I dropped that part from my patch because it made the infrastructure more complicated and there appeared to be no gain. However, having a type that you can pass that the compiler can optimize away is not a problem. Basically you just make the argument: typedef struct {} you_can_compile_me_out; /* when you don't want it. */ typedef void * you_can_compile_me_out; /* when you do want it. */ And gcc will generate no code to pass the argument when you compile it out. I don't want to have to see or be aware of the types or the fact that we support namespaces when I work on the networking code. This is why I like the security layer in the kernel we have: I can disable it and it's completely not there, and I can be completely ignorant of its existence when I work on the networking stack.
Re: [RFD] L2 Network namespace infrastructure
From: [EMAIL PROTECTED] (Eric W. Biederman) Date: Sat, 23 Jun 2007 16:56:49 -0600 If the only use was strong isolation which Dave complains about I would concur that the namespace approach is inappropriate. However there are a lot of other uses. By your very admission the only appropriate use case is when users are not hostile and can be trusted to some extent. And that, by definition, makes it not appropriate for a general-purpose operating system like Linux. Containers are, I believe, a step backwards, and we're better than that.