Re: Network namespaces a path to mergable code.
Sam Vilain <[EMAIL PROTECTED]> writes:

> Eric W. Biederman wrote:
>> In general it is possible to get file descriptors opened by someone
>> else because unix domain sockets allow file descriptor passing. Similarly
>> I think there are cases in both unshare and fork that allows you to sockets
>> open before you entered a namespace.
>
> This is an interesting point; it is known to be possible to do this on a
> traditional system, because with a Unix Domain socket, the other end is
> always in the same Unix Domain.
>
> However what we're doing is saying that, well, the other end of the
> socket might not be in the same Unix Domain. In fact, we've already
> smashed to pieces this monolithic concept of a Unix Domain, to the point
> where the other end might be in a different network domain, but is in
> the same filesystem domain, for instance. Does it get to pass file
> descriptors through?

Despite what it might look like, unix domain sockets do not live in the
filesystem. They store a cookie in the filesystem that roughly corresponds
to the port number of an AF_INET socket. When you open a socket the lookup
is done by the cookie retrieved from the filesystem.

So except for their cookies, unix domain sockets are always in the network
stack. Which means it is a royal pain to create a unix domain socket
between namespaces. Which is the generally desired behavior.

> We would appear to be stretching the definition of "Unix Domain"
> somewhat if we allow these sockets to exist between network namespaces.
> Maybe it doesn't matter; this is just a VFS namespace feature/caveat.

Unless I am mistaken this is something that can only be created (given my
described semantics) when you create the container. So if you want it you
got it, but you can't create it if you never had it.

Eric
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TOE, etc.
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 28 Jun 2006 15:35:54 +1000

> With their RDMA NIC, we'll have TCP/SCTP connections that bypass
> netfilter, tc, IPsec, AF_PACKET/tcpdump and the rest of our stack
> while at the same time it is using the same IP address as us and
> deciding what packets we will or won't see.

That's true. I don't think we should really add any more help for these
kinds of things then.
Re: [patch 2/6] [Network namespace] Network device sharing by view
Eric W. Biederman wrote:
> Have a few more network interfaces for a layer 2 solution
> is fundamental. Believing without proof and after arguments
> to the contrary that you have not contradicted that a layer 2
> solution is inherently slower is non-productive. Arguing
> that a layer 2 only solution most prove itself on guest to guest
> communication is also non-productive.

Yes, it does break what some people consider to be a sanity condition
when you don't have loopback anymore within a guest. I once experimented
with using 127.* addresses for per-guest loopback devices with vserver to
fix this, but that couldn't work without fixing glibc to not make
assumptions deep in the bowels of the resolver. I logged a fault with
gnu.org and you can guess where it went :-).

I don't think it's just the performance issue, though. Consider also that
if you only have one set of interfaces to manage, the overall
configuration of the network stack is simpler. `ip addr list' on the host
shows all the addresses on the system, you only have one routing table to
manage, one set of iptables, etc.

That being said, perhaps if each guest got its own interface, and from
some suitably privileged context you could see them all, it would be
nicer and maybe just as fast. Perhaps then *devices* could get their own
routing namespaces, and routing namespaces could get iptables namespaces,
or something like that, to give the most options.

> With a guest with 4 IPs
> 10.0.0.1 192.168.0.1 172.16.0.1 127.0.0.1
> How do you make INADDR_ANY work with just filtering at bind time?

It used to just bind to the first one. Don't know if it still does.

Sam.
Re: Network namespaces a path to mergable code.
Eric W. Biederman wrote:
> In general it is possible to get file descriptors opened by someone
> else because unix domain sockets allow file descriptor passing. Similarly
> I think there are cases in both unshare and fork that allows you to sockets
> open before you entered a namespace.

This is an interesting point; it is known to be possible to do this on a
traditional system, because with a Unix Domain socket, the other end is
always in the same Unix Domain.

However what we're doing is saying that, well, the other end of the
socket might not be in the same Unix Domain. In fact, we've already
smashed to pieces this monolithic concept of a Unix Domain, to the point
where the other end might be in a different network domain, but is in
the same filesystem domain, for instance. Does it get to pass file
descriptors through?

We would appear to be stretching the definition of "Unix Domain"
somewhat if we allow these sockets to exist between network namespaces.
Maybe it doesn't matter; this is just a VFS namespace feature/caveat.

Sam.
Re: Network namespaces a path to mergable code.
On Tue, Jun 27, 2006 at 10:33:48PM -0600, Eric W. Biederman wrote:
>
> Something to examine here is that if both network devices and sockets
> are tagged does that still allow implicit network namespace passing.

I think avoiding implicit network namespace passing gives more
power/flexibility, and it would make it clearer which container/namespace
a given network resource belongs to.

From our experience with an implementation of network containers
[Virtual Routing for ipv4/ipv6, with complete isolation between
containers, where IP addresses can overlap...], there are problem domains
in which you cannot afford to duplicate a process/daemon in each
container [a big process for instance, scalability w.r.t. the number of
containers, etc.].

By having a proper namespace tag per socket, this can be solved by
allowing a process running in the host context to create sockets in that
namespace and then move them to the target guest namespaces [via a
special setsockopt for instance, or a unix domain socket as you said].

Regards
[Patch 1/1] AF_UNIX Datagram getpeersec (minor fix)
Hi,

Minor fix (un-export selinux_get_sock_sid()).

thanks,
Catherine

--
From: [EMAIL PROTECTED]

This patch implements an API whereby an application can determine the
label of its peer's Unix datagram sockets via the auxiliary data mechanism
of recvmsg.

Patch purpose:

This patch enables a security-aware application to retrieve the security
context of the peer of a Unix datagram socket. The application can then
use this security context to determine the security context for
processing on behalf of the peer who sent the packet.

Patch design and implementation:

The design and implementation is very similar to the UDP case for INET
sockets. Basically we build upon the existing Unix domain socket API for
retrieving user credentials. Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message). To retrieve the
security context, the application first indicates to the kernel such
desire by setting the SO_PASSSEC option via setsockopt. Then the
application retrieves the security context using the auxiliary data
mechanism.

An example server application for Unix datagram socket should look like
this:

    toggle = 1;
    toggle_len = sizeof(toggle);

    setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, toggle_len);
    recvmsg(sockfd, &msg_hdr, 0);
    if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
        cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
        if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
            cmsg_hdr->cmsg_level == SOL_SOCKET &&
            cmsg_hdr->cmsg_type == SCM_SECURITY) {
            memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
        }
    }

sock_setsockopt is enhanced with a new socket option SO_PASSSEC to allow
a server socket to receive security context of the peer.

Testing:

We have tested the patch by setting up Unix datagram client and server
applications. We verified that the server can retrieve the security
context using the auxiliary data mechanism of recvmsg.
---
 include/asm-alpha/socket.h   |    1 +
 include/asm-arm/socket.h     |    1 +
 include/asm-arm26/socket.h   |    1 +
 include/asm-cris/socket.h    |    1 +
 include/asm-frv/socket.h     |    1 +
 include/asm-h8300/socket.h   |    1 +
 include/asm-i386/socket.h    |    1 +
 include/asm-ia64/socket.h    |    1 +
 include/asm-m32r/socket.h    |    1 +
 include/asm-m68k/socket.h    |    1 +
 include/asm-mips/socket.h    |    1 +
 include/asm-parisc/socket.h  |    1 +
 include/asm-powerpc/socket.h |    1 +
 include/asm-s390/socket.h    |    1 +
 include/asm-sh/socket.h      |    1 +
 include/asm-sparc/socket.h   |    1 +
 include/asm-sparc64/socket.h |    1 +
 include/asm-v850/socket.h    |    1 +
 include/asm-x86_64/socket.h  |    1 +
 include/asm-xtensa/socket.h  |    1 +
 include/linux/net.h          |    1 +
 include/net/af_unix.h        |    6 ++
 include/net/scm.h            |   17 +
 net/core/sock.c              |   11 +++
 net/unix/af_unix.c           |   27 +++
 security/selinux/hooks.c     |   11 ---
 26 files changed, 90 insertions(+), 3 deletions(-)

diff -puN include/asm-alpha/socket.h~lsm-secpeer-unix include/asm-alpha/socket.h
--- linux-2.6.17-rc6-mm2-JM/include/asm-alpha/socket.h~lsm-secpeer-unix 2006-06-27 18:14:52.0 -0400
+++ linux-2.6.17-rc6-mm2-JM-cxzhang/include/asm-alpha/socket.h 2006-06-27 18:16:31.0 -0400
@@ -51,6 +51,7 @@
 #define SCM_TIMESTAMP SO_TIMESTAMP
 #define SO_PEERSEC 30
+#define SO_PASSSEC 34
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION 19
diff -puN include/asm-arm/socket.h~lsm-secpeer-unix include/asm-arm/socket.h
--- linux-2.6.17-rc6-mm2-JM/include/asm-arm/socket.h~lsm-secpeer-unix 2006-06-27 18:15:10.0 -0400
+++ linux-2.6.17-rc6-mm2-JM-cxzhang/include/asm-arm/socket.h 2006-06-27 18:16:31.0 -0400
@@ -48,5 +48,6 @@
 #define SO_ACCEPTCONN 30
 #define SO_PEERSEC 31
+#define SO_PASSSEC 34
 #endif /* _ASM_SOCKET_H */
diff -puN include/asm-arm26/socket.h~lsm-secpeer-unix include/asm-arm26/socket.h
--- linux-2.6.17-rc6-mm2-JM/include/asm-arm26/socket.h~lsm-secpeer-unix 2006-06-27 18:15:10.0 -0400
+++ linux-2.6.17-rc6-mm2-JM-cxzhang/include/asm-arm26/socket.h 2006-06-27 18:16:31.0 -0400
@@ -48,5 +48,6 @@
 #define SO_ACCEPTCONN 30
 #define SO_PEERSEC 31
+#define SO_PASSSEC 34
 #endif /* _ASM_SOCKET_H */
diff -puN include/asm-cris/socket.h~lsm-secpeer-unix include/asm-cris/socket.h
--- linux-2.6.17-rc6-mm2-JM/include/asm-cris/socket.h~lsm-secpeer-unix 2006-06-27 18:15:10.0 -0400
+++ linux-2.6.17-rc6-mm2-JM-cxzhang/include/asm-cris/socket.h 2006-06-27 18:16:31.0 -0400
@@ -50,6 +50,7 @@
 #define SO_ACCEPTCONN 30
 #define SO_PEERSEC 31
+#define
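The receive path described in the patch text above can be sketched as a small user-space fragment. This is only an illustration under stated assumptions: `enable_passsec` and `copy_peer_context` are hypothetical helper names, the fallback values 34 and 0x03 are used only when the libc headers do not define `SO_PASSSEC` and `SCM_SECURITY` (34 is the value the patch assigns on the architectures shown), and the length and format of the context string depend on the security module in use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Assumption: fallbacks for libc headers that predate these constants. */
#ifndef SO_PASSSEC
#define SO_PASSSEC 34
#endif
#ifndef SCM_SECURITY
#define SCM_SECURITY 0x03
#endif

/* Ask the kernel to attach the sender's security context to each
 * received datagram. Returns 0 on success, -1 on error. */
int enable_passsec(int sockfd)
{
	int toggle = 1;

	/* The last argument is the option length itself, not a pointer. */
	return setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC,
			  &toggle, sizeof(toggle));
}

/* Hypothetical helper: copy the first SCM_SECURITY payload found in the
 * control data of msg into out and NUL-terminate it (truncating if the
 * buffer is too small). Returns the number of bytes copied, or -1 if
 * no such control message is present. */
int copy_peer_context(struct msghdr *msg, char *out, size_t outlen)
{
	struct cmsghdr *cmsg;

	assert(msg != NULL && out != NULL && outlen > 0);

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
	     cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_SECURITY) {
			size_t len = cmsg->cmsg_len - CMSG_LEN(0);

			if (len >= outlen)
				len = outlen - 1;
			memcpy(out, CMSG_DATA(cmsg), len);
			out[len] = '\0';
			return (int)len;
		}
	}
	return -1;
}
```

A server would call `enable_passsec()` once after creating the socket, then pass the `struct msghdr` filled in by each `recvmsg()` call to `copy_peer_context()`. Walking the control messages with `CMSG_NXTHDR` rather than inspecting only the first one keeps the lookup working when other ancillary data, such as credentials, is delivered alongside.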
Re: TOE, etc.
On Tue, Jun 27, 2006 at 09:43:23PM -0700, David Miller wrote:
>
> Socket state, and that is one thing I don't see them doing yet.

I wonder what happens when the Linux TCP stack attempts to open a
connection to a remote host when that connection is already open in
the RDMA NIC?

For that matter what happens if a Linux application decides to listen
on a TCP port already listened on by the RDMA NIC?

The only saving grace is that they're only doing RDMA rather than
arbitrary TCP. However, exactly the same infrastructure can be used
to do arbitrary TCP should they wish to.

> But we have to realize they've already been given %95 of the
> interfaces they need to speak IP using our routes and our neighbour
> entries.
>
> Right?

Yes, however I think the same argument could be applied to TOE.

With their RDMA NIC, we'll have TCP/SCTP connections that bypass
netfilter, tc, IPsec, AF_PACKET/tcpdump and the rest of our stack
while at the same time it is using the same IP address as us and
deciding what packets we will or won't see.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: tg3 driver and interrupt coalescence questions
Hi Cris,

I'm looking to decrease the interrupt load on the system. During the
test I mentioned above I had some interesting and confusing results. The
changes from the default settings to the settings I posted resulted in a
100% performance increase (counted by the number of VoIP audio streams
the tested server could support). With default settings one of the two
CPUs in the system maxed out at 99% cpu usage handling interrupts, while
the second CPU was not maxed out, but we started to drop packets and the
VoIP call setups started showing retransmits (which is the measurement
for failure in this test) at about 300 streams. With the new settings we
were able to hit 600 streams. So I definitely recognized a significant
improvement.

However I'd still like to get more improvement. At 600 streams and 20ms
packets we are looking at 30,000 pps. The % of cpu (1 CPU as apparently
the interrupts can't be shared across multiple CPUs) used for interrupt
handling at this 600 stream limit was 88.0%.

Interrupts can be balanced across multiple CPUs or not. It depends on
the following:
1. whether that option was enabled in the kernel at compile time;
2. whether a user-space interrupt balancing service is enabled
   ("irqbalance" on redhat, nothing such on debian);
3. whether cpu affinity is set for the irq.

Normally, irq affinity for a nic interrupt is considered good, but if a
CPU is overloaded you may try irq balancing.

Now what was interesting was on the test generation side (same hardware
exactly) of things, I was using the SIPP software to generate the VoIP
streams, and each blade in the blade server was only able to generate
~200 streams, with default settings in ethtool, one of the CPUs would
hit max usage for interrupt handling at that point. So I modified the
ethtool settings to match those I listed above and there was no
discernible difference. It was identical performance to the default
settings.

Generating RTP streams can burn CPU cycles, as can sending them out to
the network, so balancing the load among the CPUs with irq balancing
may improve something.

--
Sincerely,
--
Robert Iakobashvili, coroberti at gmail dot com
Navigare necesse est, vivere non est necesse.
--
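As a quick check of the packet rate quoted in the message above, assuming one RTP packet per stream every 20 ms and counting a single direction only:

```latex
\[
600\ \text{streams} \times \frac{1000\ \text{ms/s}}{20\ \text{ms/packet}}
  = 600 \times 50\ \text{packets/s}
  = 30{,}000\ \text{pps}
\]
```

If the server both receives and sends each stream, the number of packets crossing the NIC per second would be roughly double that figure.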
Re: [PATCH]NET: Add ECN support for TSO
On Tue, Jun 27, 2006 at 09:54:39PM -0700, Michael Chan wrote:
>
> Assuming that we'll later have GSO_TCPV6, isn't it better to check for
> TCPV4 explicitly now? Or just change it later when necessary.

Good point, I suppose you never know whether a V6 TSO-capable card is
going to handle ECN correctly in both cases.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH]NET: Add ECN support for TSO
On Wed, 2006-06-28 at 14:42 +1000, Herbert Xu wrote:
> On Tue, Jun 27, 2006 at 09:37:01PM -0700, Michael Chan wrote:
> > @@ -56,6 +55,9 @@ static inline void TCP_ECN_send(struct s
> >  		if (tp->ecn_flags&TCP_ECN_QUEUE_CWR) {
> >  			tp->ecn_flags &= ~TCP_ECN_QUEUE_CWR;
> >  			skb->h.th->cwr = 1;
> > +			if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
> > +				skb_shinfo(skb)->gso_type |=
> > +					SKB_GSO_TCPV4_ECN;
>
> As a byte-pincher I must suggest that you turn this check into something
> like
>
> 	if (skb_shinfo(skb)->gso_type)
>
> or even
>
> 	if (skb_shinfo(skb)->gso_size)

Assuming that we'll later have GSO_TCPV6, isn't it better to check for
TCPV4 explicitly now? Or just change it later when necessary.
Re: TOE, etc.
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 28 Jun 2006 14:29:59 +1000

> On Wed, Jun 28, 2006 at 12:18:25AM -0400, Jeff Garzik wrote:
> >
> > A PCI device that presents itself as a SCSI controller, but under the
> > hood is really iSCSI-over-TCP smells like TOE. Running a virtualized
> > Linux guest on top of a proprietary stack [which provides networking
> > services to guests] also smells like TOE. :)
>
> Agreed. However, when they start adding hooks to the ARP table, the
> routing table, and PMTU management, it begs the question what more is
> there to add for TOE (well, user-space driven TOE at least)?

Socket state, and that is one thing I don't see them doing yet.

> Put it another way, I think the dividing line between TOE and iSCSI or
> virtualisation is exactly the interface between them and the Linux kernel.
> If the interface is an existing one such as SCSI or standard IP then it's
> OK. However, when it starts poking in the guts of the Linux stack I'd say
> that it has crossed the line.

Yeah, it's starting to smell really bad.

But we have to realize they've already been given 95% of the
interfaces they need to speak IP using our routes and our neighbour
entries.

Right?
Re: [PATCH]NET: Add ECN support for TSO
On Tue, Jun 27, 2006 at 09:37:01PM -0700, Michael Chan wrote:
>
> Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

Looks good to me too!

> @@ -56,6 +55,9 @@ static inline void TCP_ECN_send(struct s
>  		if (tp->ecn_flags&TCP_ECN_QUEUE_CWR) {
>  			tp->ecn_flags &= ~TCP_ECN_QUEUE_CWR;
>  			skb->h.th->cwr = 1;
> +			if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
> +				skb_shinfo(skb)->gso_type |=
> +					SKB_GSO_TCPV4_ECN;

As a byte-pincher I must suggest that you turn this check into something
like

	if (skb_shinfo(skb)->gso_type)

or even

	if (skb_shinfo(skb)->gso_size)

:)

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: TOE, etc. (was Re: [PATCH Round 3 0/2][RFC] Network Event Notifier Mechanism)
Herbert Xu wrote:
> On Wed, Jun 28, 2006 at 12:18:25AM -0400, Jeff Garzik wrote:
>> A PCI device that presents itself as a SCSI controller, but under the
>> hood is really iSCSI-over-TCP smells like TOE. Running a virtualized
>> Linux guest on top of a proprietary stack [which provides networking
>> services to guests] also smells like TOE. :)
>
> Agreed. However, when they start adding hooks to the ARP table, the
> routing table, and PMTU management, it begs the question what more is
> there to add for TOE (well, user-space driven TOE at least)?

Well, you've always been able to implement a userspace (or otherwise
completely-virtualized) network stack. tuntap and the packet socket
enable that, if nothing else. But, like you characterize below, those
are existing, well-defined, easily contained interfaces.

> Put it another way, I think the dividing line between TOE and iSCSI or
> virtualisation is exactly the interface between them and the Linux kernel.
> If the interface is an existing one such as SCSI or standard IP then it's
> OK. However, when it starts poking in the guts of the Linux stack I'd say
> that it has crossed the line.

Strongly agreed.

Jeff
Re: [PATCH]NET: Add ECN support for TSO
On Wed, 2006-06-28 at 13:48 +1000, Herbert Xu wrote:
> I think you're mixing up GSO the mechanism with GSO the flag. The GSO
> flag simply tells the TCP stack whether TSO should be used or not, even
> if the hardware does not support TSO at all. The GSO mechanism on the
> other hand is ALWAYS present. So regardless of the presence of the GSO
> flag, you can always rely on the GSO mechanism to pick up the pieces (or
> rather generate the pieces as the case may be :)

Thanks, that was my confusion. Here's the revised patch:

[NET]: Add ECN support for TSO

In the current TSO implementation, NETIF_F_TSO and ECN cannot be
turned on together in a TCP connection. The problem is that most
hardware that supports TSO does not handle CWR correctly if it is set
in the TSO packet. Correct handling requires CWR to be set in the
first packet only if it is set in the TSO header.

This patch adds the ability to turn on NETIF_F_TSO and ECN using
GSO if necessary to handle TSO packets with CWR set. Hardware
that handles CWR correctly can turn on NETIF_F_TSO_ECN in the
dev->features flag.

All TSO packets with CWR set will have the SKB_GSO_TCPV4_ECN set. If
the output device does not have the NETIF_F_TSO_ECN feature set, GSO
will split the packet up correctly with CWR only set in the first
segment.

With help from Herbert Xu <[EMAIL PROTECTED]>.

Since ECN can always be enabled with TSO, the SOCK_NO_LARGESEND sock
flag is completely removed.

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 84b0f0d..a42a9f4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -316,6 +316,7 @@ struct net_device
 #define NETIF_F_TSO		(SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT)
 #define NETIF_F_UFO		(SKB_GSO_UDPV4 << NETIF_F_GSO_SHIFT)
 #define NETIF_F_GSO_ROBUST	(SKB_GSO_DODGY << NETIF_F_GSO_SHIFT)
+#define NETIF_F_TSO_ECN		(SKB_GSO_TCPV4_ECN << NETIF_F_GSO_SHIFT)
 #define NETIF_F_GEN_CSUM	(NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
 #define NETIF_F_ALL_CSUM	(NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 5fb72da..e74c294 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -175,6 +175,9 @@ enum {
 	/* This indicates the skb is from an untrusted source. */
 	SKB_GSO_DODGY = 1 << 2,
+
+	/* This indicates the tcp segment has CWR set. */
+	SKB_GSO_TCPV4_ECN = 1 << 3,
 };
 /**
diff --git a/include/net/sock.h b/include/net/sock.h
index 2d8d6ad..7136bae 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -383,7 +383,6 @@ enum sock_flags {
 	SOCK_USE_WRITE_QUEUE, /* whether to call sk->sk_write_space in sock_wfree */
 	SOCK_DBG, /* %SO_DEBUG setting */
 	SOCK_RCVTSTAMP, /* %SO_TIMESTAMP setting */
-	SOCK_NO_LARGESEND, /* whether to sent large segments or not */
 	SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
 	SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
 };
@@ -1033,7 +1032,7 @@ static inline void sk_setup_caps(struct
 	if (sk->sk_route_caps & NETIF_F_GSO)
 		sk->sk_route_caps |= NETIF_F_TSO;
 	if (sk->sk_route_caps & NETIF_F_TSO) {
-		if (sock_flag(sk, SOCK_NO_LARGESEND) || dst->header_len)
+		if (dst->header_len)
 			sk->sk_route_caps &= ~NETIF_F_TSO;
 		else
 			sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index c6b8439..7bb366f 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -31,10 +31,9 @@ static inline void TCP_ECN_send_syn(stru
 				    struct sk_buff *skb)
 {
 	tp->ecn_flags = 0;
-	if (sysctl_tcp_ecn && !(sk->sk_route_caps & NETIF_F_TSO)) {
+	if (sysctl_tcp_ecn) {
 		TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_ECE|TCPCB_FLAG_CWR;
 		tp->ecn_flags = TCP_ECN_OK;
-		sock_set_flag(sk, SOCK_NO_LARGESEND);
 	}
 }
@@ -56,6 +55,9 @@ static inline void TCP_ECN_send(struct s
 		if (tp->ecn_flags&TCP_ECN_QUEUE_CWR) {
 			tp->ecn_flags &= ~TCP_ECN_QUEUE_CWR;
 			skb->h.th->cwr = 1;
+			if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
+				skb_shinfo(skb)->gso_type |=
+					SKB_GSO_TCPV4_ECN;
 		}
 	} else {
 		/* ACK or retransmitted segment: clear ECT|CE */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 94fe5b1..7fa0b4a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4178,8 +4178,6 @@ static int tcp_rcv_synsent_state_process
 	 */
 	TCP_ECN_rcv_synack(tp, th);
-
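The behaviour the patch description asks for (tag a TSO packet that carries CWR, fall back to software GSO when the device does not advertise ECN-capable TSO, and keep CWR on the first resulting segment only) can be modelled in a few lines of plain C. This is a stand-alone sketch, not kernel code: every `sketch_`/`SKETCH_` name and flag value is hypothetical, and the feature comparison ignores the `NETIF_F_GSO_SHIFT` shift the kernel applies between `gso_type` bits and device feature bits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch; not the kernel's definitions. */
enum {
	SKETCH_GSO_TCPV4     = 1 << 0,
	SKETCH_GSO_TCPV4_ECN = 1 << 3,
};

struct sketch_segment {
	size_t len;	/* payload bytes carried by this segment */
	int cwr;	/* 1 if the TCP CWR flag is set on this segment */
};

/* Same shape as the check added to TCP_ECN_send(): a TCPv4 TSO packet
 * that carries CWR is tagged so that later stages can tell. */
unsigned int sketch_mark_cwr(unsigned int gso_type)
{
	if (gso_type & SKETCH_GSO_TCPV4)
		gso_type |= SKETCH_GSO_TCPV4_ECN;
	return gso_type;
}

/* The device may take the packet unsplit only if it advertises every
 * GSO bit that is set on the packet. */
int sketch_needs_software_gso(unsigned int gso_type,
			      unsigned int dev_gso_features)
{
	return (gso_type & ~dev_gso_features) != 0;
}

/* Split total_len payload bytes into segments of at most mss bytes.
 * CWR, if requested, is set on the first segment only. Returns the
 * number of segments written, or 0 if mss is 0 or out is too small. */
size_t sketch_split(size_t total_len, size_t mss, int cwr,
		    struct sketch_segment *out, size_t max_out)
{
	size_t n = 0;

	assert(out != NULL);
	if (mss == 0)
		return 0;

	while (total_len > 0) {
		size_t len = total_len < mss ? total_len : mss;

		if (n == max_out)
			return 0;
		out[n].len = len;
		out[n].cwr = (cwr && n == 0);
		total_len -= len;
		n++;
	}
	return n;
}
```

With a 3000-byte payload and an MSS of 1448, `sketch_split()` yields segments of 1448, 1448 and 104 bytes with CWR on the first only, which is the split hardware without correct CWR handling would otherwise get wrong.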
Re: Network namespaces a path to mergable code.
Sam Vilain <[EMAIL PROTECTED]> writes:

> It sounds then like it would be a good start to have general socket
> namespaces, if it would merge more easily - perhaps then network device
> namespaces would fall into place more easily.

I guess I really see both sockets and devices as the fundamental entities
of a network namespace.

Sockets need to be tagged because in the general case there is no
guarantee that a socket that you are using was created in the network
namespace of your current process.

In general it is possible to get file descriptors opened by someone
else because unix domain sockets allow file descriptor passing. Similarly
I think there are cases in both unshare and fork that allow you to have
sockets open from before you entered a namespace.

Since you can't create a new socket in a different network namespace
I can't see any real problems with allowing them to be used, but they are
something to be careful about in container creation code.

Something to examine here is that if both network devices and sockets
are tagged does that still allow implicit network namespace passing.

Eric
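For readers unfamiliar with the file descriptor passing mentioned above, here is a minimal user-space sketch using the standard `SCM_RIGHTS` control message over a connected unix domain socket. The helper names `send_fd` and `recv_fd` are hypothetical, and error handling is reduced to a return code.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send the open descriptor fd over the connected unix domain socket
 * sock. Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
	char byte = 'x';	/* one byte of ordinary data accompanies the fd */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces suitable alignment */
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(fd >= 0);
	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor sent with send_fd(). Returns the new
 * descriptor in this process, or -1 if none arrived. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
	     cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_RIGHTS &&
		    cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
			memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
			break;
		}
	}
	return fd;
}
```

The receiver ends up with a descriptor that refers to the same open file (or socket) as the sender's, whatever context the sender opened it in. That is why a socket created in one namespace can end up in use by a process in another, and why the thread argues for tagging each socket with its namespace.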
Re: TOE, etc. (was Re: [PATCH Round 3 0/2][RFC] Network Event Notifier Mechanism)
On Wed, Jun 28, 2006 at 12:18:25AM -0400, Jeff Garzik wrote:
>
> A PCI device that presents itself as a SCSI controller, but under the
> hood is really iSCSI-over-TCP smells like TOE. Running a virtualized
> Linux guest on top of a proprietary stack [which provides networking
> services to guests] also smells like TOE. :)

Agreed. However, when they start adding hooks to the ARP table, the
routing table, and PMTU management, it begs the question what more is
there to add for TOE (well, user-space driven TOE at least)?

> Unfortunately I don't have more details, so you just get a generalized
> rant :)

OK, the patch under discussion here adds hooks to all the stuff in the
previous paragraph for the purpose of RDMA over TCP (well I must say
that the exact RDMA application/hardware has never been clearly given
but this is what I can gather from the previous posts).

Put it another way, I think the dividing line between TOE and iSCSI or
virtualisation is exactly the interface between them and the Linux kernel.
If the interface is an existing one such as SCSI or standard IP then it's
OK. However, when it starts poking in the guts of the Linux stack I'd say
that it has crossed the line.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: Network namespaces a path to mergable code.
Andrey Savochkin <[EMAIL PROTECTED]> writes:

> Eric,
>
> On Tue, Jun 27, 2006 at 11:20:40AM -0600, Eric W. Biederman wrote:
>>
>> Thinking about this I am going to suggest a slightly different direction
>> for get a patchset we can merge.
>>
>> First we concentrate on the fundamentals.
>> - How we mark a device as belonging to a specific network namespace.
>> - How we mark a socket as belonging to a specific network namespace.
>
> I agree with the direction of your thoughts.
> I was trying to do a similar thing, define clear steps in network
> namespace merging.
>
> My first patchset covers devices but not sockets.
> The only difference from what you're suggesting is ipv4 routing.
> For me, it is not less important than devices and sockets. May be even
> more important, since routing exposes design deficiencies less obvious at
> socket level.

I agree we need to do it. I mostly want a base that allows us to
not need to convert the whole network stack at once and still be able
to merge code all the way to the stable kernel.

The routing code is important for understanding design choices. It
isn't important for merging, if that makes sense.

For everyone looking at routing choices, the IPv6 routing table is
interesting because it does not use a hash table, and seems quite
possibly to be an equally fast structure that scales better.
There is something to think about there.

>> As part of the fundamentals we add a patch to the generic socket code
>> that by default will disable it for protocol families that do not indicate
>> support for handling network namespaces, on a non-default network namespace.
>
> Fine
>
> Can you summarize you objections against my way of handling devices, please?
> And what was the typo you referred to in your letter to Kirill Korotaev?

I have no fundamental objections to the content I have seen so far.

Please read the first email Kirill responded to. I quoted a couple
of sections of code and described the bugs I saw with the patch.

All minor things. The typo I was referring to was a section where the
original iteration was on an ifp variable and you called it dev without
changing the rest of the code in that section.

The only big issue was that the patch was too big, and should be split
into a patchset for better review. One patch for the new functions, and
then an additional patch for each driver/subsystem hunk describing why
that chunk needed to be changed.

I'm still curious why many of those chunks can't use existing helper
functions, to be cleaned up.

Eric
TOE, etc. (was Re: [PATCH Round 3 0/2][RFC] Network Event Notifier Mechanism)
Herbert Xu wrote:
> On Tue, Jun 27, 2006 at 11:24:25PM -0400, Jeff Garzik wrote:
>> I don't see how that position has changed?
>> http://linux-net.osdl.org/index.php/TOE
>
> Well I must say that RDMA over TCP smells very much like TOE. They've
> got an ARP table, a routing table, and presumably a TCP stack.

A PCI device that presents itself as a SCSI controller, but under the
hood is really iSCSI-over-TCP smells like TOE. Running a virtualized
Linux guest on top of a proprietary stack [which provides networking
services to guests] also smells like TOE. :)

If a TOE vendor wants to do TOE in a way that is transparent to the
kernel, more power to them. Such non-Linux TCP stack solutions still
suffer many of the problems listed at the web page above, but at least
they impose no burden on kernel maintenance.

i.e. we really _do not_ want to get into the habit of co-managing arp
tables, routing tables, filtering rules, and dozens of other such
resources with multiple remote, independent TCP stacks. We have enough
complexity as it is today, coordinating between the random variations of
SMP, uniprocessor, and NUMA machines out there. Not to mention
competing with under-the-hood firmware actions (ASF) on NICs.

As an aside, RDMA over TCP just seems silly. TCP was _not_ meant to do
the things that RDMA users want. The infiniband/RDMA programming model
is an ultra-low-latency polling model where one or two apps are allowed
to completely consume the machine, either busy-waiting or processing
messages.

Unfortunately I don't have more details, so you just get a generalized
rant :)

Jeff
Re: [patch 2/6] [Network namespace] Network device sharing by view
Herbert Poetzl <[EMAIL PROTECTED]> writes:

> On Tue, Jun 27, 2006 at 10:29:39AM -0600, Eric W. Biederman wrote:
>> Herbert Poetzl <[EMAIL PROTECTED]> writes:
>
>> I watched the linux-vserver irc channel for a while and almost
>> every network problem was caused by the change in semantics
>> vserver provides.
>
> the problem here is not the change in semantics compared
> to a real linux system (as there basically is none) but
> compared to _other_ technologies like UML or QEMU, which
> add the need for bridging and additional interfaces, while
> Linux-VServer only focuses on the IP layer ...

Not being able to bind to INADDR_ANY is a huge semantic change. Unless things have changed recently you get that change when you have two IP addresses in Linux-Vserver.

Talking to the outside world through the loopback interface is a noticeable semantic change.

Having to be careful of who uses INADDR_ANY on the host when you have guests is essentially a semantic change.

Being able to talk to the outside world with a server bound only to the loopback IP is a weird semantic change.

And I suspect I missed something; it is weird and peculiar, and I don't care to remember all of the exceptions.

Having a few more network interfaces for a layer 2 solution is fundamental. Believing, without proof and after arguments to the contrary that you have not contradicted, that a layer 2 solution is inherently slower is non-productive. Arguing that a layer 2 only solution must prove itself on guest to guest communication is also non-productive.

So just to sink one additional nail in the coffin of the silly guest to guest communication issue. For any two guests where fast communication between them is really important I can run an additional interface pair that requires no routing or bridging. Given that the implementation of the tunnel device is essentially the same as the loopback interface and that I make only one trip through the network stack there will be no performance overhead.
Similarly for any critical guest communication to the outside world I can give the guest a real network adapter.

That said I don't think those things will be necessary and that if they are it is an optimization opportunity to make various bits of the network stack faster.

Bridging or routing between guests is an exercise in simplicity and control, not a requirement.

>> In this case when you allow a guest more than one IP your hack
>> while easy to maintain becomes much more complex.
>
> why? a set of IPs is quite similar to a single IP (which
> is actually a subset), so no real change there, only
> IP_ANY means something different for a guest ...

Which simply filtering at bind time makes impossible.

With a guest with 4 IPs
10.0.0.1 192.168.0.1 172.16.0.1 127.0.0.1
how do you make INADDR_ANY work with just filtering at bind time?

The host has at least the additional IPs.
10.0.0.2 192.168.0.2 172.16.0.2 127.0.0.1

Herbert, I suspect we are talking about completely different implementations, otherwise I can't possibly see how we have such different perceptions of their capabilities.

I am talking precisely about filtering the IP addresses that a guest can use at connect or bind time. Which as I recall is what vserver implements. If you are thinking of your ngnet implementation that would explain things.

>> Especially as you address each case people care about one at a time.
>
> hmm?

Multiple IPs, IPv6, additional protocols, firewalls, etc.

>> In one shot this goes the entire way. Given how many people miss that
>> you do the work at layer 2 than at layer 3 I would not call this the
>> straight forward approach. The straight forward implementation yes,
>> but not the straight forward approach.
>
> seems I lost you here ...

>> > for example, you won't have multiple routing tables
>> > in a kernel where this feature is disabled, no?
>> > so why should it affect a guest, or require modified
>> > apps inside a guest when we would decide to provide
>> > only a single routing table?
>> >
>> >> From my POV, fully virtualized namespaces are the future.
>> >
>> > the future is already there, it's called Xen or UML, or QEMU :)
>>
>> Yep. And now we need it to run fast.
>
> hmm, maybe you should try to optimize linux for Xen then,
> as I'm sure it will provide the optimal virtualization
> and has all the features folks are looking for (regarding
> virtualization)
>
> I thought we are trying to figure a light-weight subset
> of isolation and virtualization technologies and methods
> which make sense to have in mainline ...

And you presume doing things at layer 2 is more expensive than layer 3.

From what I have seen of layer 3 solutions it is a bloody maintenance nightmare, and an inflexible mess.

>> >> It is what makes virtualization solution usable (w/o apps
>> >> modifications), provides all the features and doesn't require much
>> >> efforts from people to be used.
>> >
>> > and what if they want to use virtualization inside
>> > their guests? where
Re: [PATCH]NET: Add ECN support for TSO
On Tue, Jun 27, 2006 at 08:40:34PM -0700, Michael Chan wrote:
>
> We need to turn off NETIF_F_TSO for a connection that has negotiated to
> turn on ECN if the output device cannot handle TSO and ECN. In other
> words, if the output device does not have either GSO or TSO_ECN feature
> set.

I think you're mixing up GSO the mechanism with GSO the flag. The GSO flag simply tells the TCP stack whether TSO should be used or not, even if the hardware does not support TSO at all. The GSO mechanism on the other hand is ALWAYS present. So regardless of the presence of the GSO flag, you can always rely on the GSO mechanism to pick up the pieces (or rather generate the pieces as the case may be :)

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
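A minimal user-space sketch of the flag-versus-mechanism distinction described above, not kernel code. It assumes a simplified model in which a device can take a large send as-is only if it advertises every offload bit the packet asks for, and the stack segments in software otherwise. The constant names follow the ECN patch under discussion; the bit values, the shift of 16, and the `model_*` helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout: names follow the patch, values are assumptions. */
enum {
    SKB_GSO_TCPV4     = 1 << 0,
    SKB_GSO_TCPV4_ECN = 1 << 3,
};

#define NETIF_F_GSO_SHIFT 16
#define NETIF_F_TSO       ((uint32_t)SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT)
#define NETIF_F_TSO_ECN   ((uint32_t)SKB_GSO_TCPV4_ECN << NETIF_F_GSO_SHIFT)

/*
 * Hypothetical helper: true when the device advertises every offload
 * bit that the packet's gso_type requests.
 */
static bool model_gso_ok(uint32_t gso_type, uint32_t dev_features)
{
    /* gso_type must fit below the shift in this model */
    assert((gso_type >> NETIF_F_GSO_SHIFT) == 0);

    uint32_t wanted = gso_type << NETIF_F_GSO_SHIFT;
    return (dev_features & wanted) == wanted;
}

/*
 * Hypothetical helper: the stack has to segment in software exactly
 * when the device cannot take the packet as-is.
 */
static bool model_needs_software_gso(uint32_t gso_type, uint32_t dev_features)
{
    return !model_gso_ok(gso_type, dev_features);
}
```

Under this model a device with no TSO bits at all still gets correct traffic: `model_needs_software_gso()` simply reports that segmentation has to happen in software before the packet reaches the driver.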
Re: [PATCH]NET: Add ECN support for TSO
On Wed, 2006-06-28 at 13:10 +1000, Herbert Xu wrote:
> On Tue, Jun 27, 2006 at 08:06:47PM -0700, Michael Chan wrote:
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 2d8d6ad..2c75172 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -1033,7 +1033,8 @@ static inline void sk_setup_caps(struct
> >          if (sk->sk_route_caps & NETIF_F_GSO)
> >                  sk->sk_route_caps |= NETIF_F_TSO;
> >          if (sk->sk_route_caps & NETIF_F_TSO) {
> > -                if (sock_flag(sk, SOCK_NO_LARGESEND) || dst->header_len)
> > +                if ((sock_flag(sk, SOCK_NO_LARGESEND) &&
> > +                     !tso_ecn_capable(sk->sk_route_caps)) || dst->header_len)
> >                          sk->sk_route_caps &= ~NETIF_F_TSO;
>
> Why turn it off? With GSO in place the stack will handle it just fine
> (even your description says so :) We should instead remove all code
> that turns off TSO/ECN when the other is present.
>

We need to turn off NETIF_F_TSO for a connection that has negotiated to turn on ECN if the output device cannot handle TSO and ECN. In other words, if the output device does not have either GSO or TSO_ECN feature set.
Re: [patch 2/6] [Network namespace] Network device sharing by view
Alexey Kuznetsov <[EMAIL PROTECTED]> writes:

> Hello!
>
>> It may look weird, but do application really *need* to see eth0 rather
>> than eth858354?
>
> Applications do not care, humans do. :-)
>
> What's about applications they just need to see exactly the same device
> after migration. Not only name, but f.e. also its ifindex. If you do not
> create a separate namespace for netdevices, you will inevitably end up
> with some strange hack sort of VPIDs to translate (or to partition) ifindices
> or to tell that "ping -I eth858354 xxx" is too complicated application
> to survive migration.

Actually there are applications with peculiar licensing practices that do look at devices like eth0 to verify you have the appropriate MAC, and do really weird things if you don't have an eth0.

Plus there are other cases where it can be simpler to hard code things if it is allowable. (The human factor) Otherwise your configuration must be done through hotplug scripts.

But yes there are misguided applications that care.

Eric
Re: [PATCH Round 3 0/2][RFC] Network Event Notifier Mechanism
On Tue, Jun 27, 2006 at 11:24:25PM -0400, Jeff Garzik wrote:
>
> I don't see how that position has changed?
>
> http://linux-net.osdl.org/index.php/TOE

Well I must say that RDMA over TCP smells very much like TOE. They've got an ARP table, a routing table, and presumably a TCP stack.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH Round 3 0/2][RFC] Network Event Notifier Mechanism
Herbert Xu wrote:
> On Wed, Jun 28, 2006 at 12:54:10PM +1000, Herbert Xu wrote:
>> Please give more specific reasons for needing these events because it
>> is certainly far from obvious from reading those documents.
>
> Never mind, I've found your earlier messages on the list which explains
> your reasons more clearly. It would be nice if you could include those
> explanations in your patch description.
>
> BTW, does this mean that we're now comfortable with full TOE?

I don't see how that position has changed?

http://linux-net.osdl.org/index.php/TOE

Jeff
Re: [PATCH]NET: Add ECN support for TSO
On Tue, Jun 27, 2006 at 08:06:47PM -0700, Michael Chan wrote:
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 2d8d6ad..2c75172 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1033,7 +1033,8 @@ static inline void sk_setup_caps(struct
>          if (sk->sk_route_caps & NETIF_F_GSO)
>                  sk->sk_route_caps |= NETIF_F_TSO;
>          if (sk->sk_route_caps & NETIF_F_TSO) {
> -                if (sock_flag(sk, SOCK_NO_LARGESEND) || dst->header_len)
> +                if ((sock_flag(sk, SOCK_NO_LARGESEND) &&
> +                     !tso_ecn_capable(sk->sk_route_caps)) || dst->header_len)
>                          sk->sk_route_caps &= ~NETIF_F_TSO;

Why turn it off? With GSO in place the stack will handle it just fine (even your description says so :) We should instead remove all code that turns off TSO/ECN when the other is present.

Otherwise the patch looks good.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH]bnx2: Add NETIF_F_TSO_ECN
Add NETIF_F_TSO_ECN feature for all bnx2 hardware.

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 7635736..e89d5df 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -5128,6 +5128,16 @@ bnx2_set_rx_csum(struct net_device *dev,
         return 0;
 }
 
+static int
+bnx2_set_tso(struct net_device *dev, u32 data)
+{
+        if (data)
+                dev->features |= NETIF_F_TSO | NETIF_F_TSO_ECN;
+        else
+                dev->features &= ~(NETIF_F_TSO | NETIF_F_TSO_ECN);
+        return 0;
+}
+
 #define BNX2_NUM_STATS 46
 
 static struct {
@@ -5445,7 +5455,7 @@ static struct ethtool_ops bnx2_ethtool_o
         .set_sg                 = ethtool_op_set_sg,
 #ifdef BCM_TSO
         .get_tso                = ethtool_op_get_tso,
-        .set_tso                = ethtool_op_set_tso,
+        .set_tso                = bnx2_set_tso,
 #endif
         .self_test_count        = bnx2_self_test_count,
         .self_test              = bnx2_self_test,
@@ -5926,7 +5936,7 @@ bnx2_init_one(struct pci_dev *pdev, cons
         dev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
 #endif
 #ifdef BCM_TSO
-        dev->features |= NETIF_F_TSO;
+        dev->features |= NETIF_F_TSO | NETIF_F_TSO_ECN;
 #endif
 
         netif_carrier_off(bp->dev);
[PATCH]NET: Add ECN support for TSO
In the current TSO implementation, NETIF_F_TSO and ECN cannot be turned on together in a TCP connection. The problem is that most hardware that supports TSO does not handle CWR correctly if it is set in the TSO packet. Correct handling requires CWR to be set in the first packet only if it is set in the TSO header.

This patch adds the ability to turn on NETIF_F_TSO and ECN using GSO if necessary to handle TSO packets with CWR set. Hardware that handles CWR correctly can turn on NETIF_F_TSO_ECN in the dev->features flag.

All TSO packets with CWR set will have the SKB_GSO_TCPV4_ECN set. If the output device does not have the NETIF_F_TSO_ECN feature set, GSO will split the packet up correctly with CWR only set in the first segment.

It is further assumed that all hardware will handle ECE properly by replicating the ECE flag in all segments. If that is not the case, a simple extension of the logic will be required.

Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index efd1e2a..f393de2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -316,6 +316,7 @@ struct net_device
 #define NETIF_F_TSO             (SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT)
 #define NETIF_F_UFO             (SKB_GSO_UDPV4 << NETIF_F_GSO_SHIFT)
 #define NETIF_F_GSO_ROBUST      (SKB_GSO_DODGY << NETIF_F_GSO_SHIFT)
+#define NETIF_F_TSO_ECN         (SKB_GSO_TCPV4_ECN << NETIF_F_GSO_SHIFT)
 
 #define NETIF_F_GEN_CSUM        (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
 #define NETIF_F_ALL_CSUM        (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM)
@@ -1002,6 +1003,11 @@ static inline int netif_needs_gso(struct
         return !skb_gso_ok(skb, dev->features);
 }
 
+static inline int tso_ecn_capable(unsigned long features)
+{
+        return ((features & NETIF_F_GSO) || (features & NETIF_F_TSO_ECN));
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _LINUX_DEV_H */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 5fb72da..e74c294 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -175,6 +175,9 @@ enum {
 
         /* This indicates the skb is from an untrusted source. */
         SKB_GSO_DODGY = 1 << 2,
+
+        /* This indicates the tcp segment has CWR set. */
+        SKB_GSO_TCPV4_ECN = 1 << 3,
 };
 
 /**
diff --git a/include/net/sock.h b/include/net/sock.h
index 2d8d6ad..2c75172 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1033,7 +1033,8 @@ static inline void sk_setup_caps(struct
         if (sk->sk_route_caps & NETIF_F_GSO)
                 sk->sk_route_caps |= NETIF_F_TSO;
         if (sk->sk_route_caps & NETIF_F_TSO) {
-                if (sock_flag(sk, SOCK_NO_LARGESEND) || dst->header_len)
+                if ((sock_flag(sk, SOCK_NO_LARGESEND) &&
+                     !tso_ecn_capable(sk->sk_route_caps)) || dst->header_len)
                         sk->sk_route_caps &= ~NETIF_F_TSO;
                 else
                         sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index c6b8439..871dca2 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -31,7 +31,8 @@ static inline void TCP_ECN_send_syn(stru
                                     struct sk_buff *skb)
 {
         tp->ecn_flags = 0;
-        if (sysctl_tcp_ecn && !(sk->sk_route_caps & NETIF_F_TSO)) {
+        if (sysctl_tcp_ecn && (!(sk->sk_route_caps & NETIF_F_TSO) ||
+            tso_ecn_capable(sk->sk_route_caps))) {
                 TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_ECE|TCPCB_FLAG_CWR;
                 tp->ecn_flags = TCP_ECN_OK;
                 sock_set_flag(sk, SOCK_NO_LARGESEND);
@@ -56,6 +57,9 @@ static inline void TCP_ECN_send(struct s
                 if (tp->ecn_flags&TCP_ECN_QUEUE_CWR) {
                         tp->ecn_flags &= ~TCP_ECN_QUEUE_CWR;
                         skb->h.th->cwr = 1;
+                        if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
+                                skb_shinfo(skb)->gso_type |=
+                                        SKB_GSO_TCPV4_ECN;
                 }
         } else {
                 /* ACK or retransmitted segment: clear ECT|CE */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index bdd71db..c4a4dba 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2044,7 +2044,8 @@ struct sk_buff * tcp_make_synack(struct
         memset(th, 0, sizeof(struct tcphdr));
         th->syn = 1;
         th->ack = 1;
-        if (dst->dev->features&NETIF_F_TSO)
+        if ((dst->dev->features & NETIF_F_TSO) &&
+            !tso_ecn_capable(dst->dev->features))
                 ireq->ecn_ok = 0;
         TCP_ECN_make_synack(req, th);
         th->source = inet_sk(sk)->sport;
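The segmentation rule stated in the patch description (CWR only in the first segment, ECE replicated in every segment) can be sketched in user space as follows. This is a simplified model rather than the kernel's GSO path; `struct model_tcp_flags` and `model_segment_flags()` are hypothetical names, and only the two flags discussed here are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the TCP flags that matter here (not struct tcphdr). */
struct model_tcp_flags {
    bool cwr;
    bool ece;
};

/*
 * Hypothetical model of software segmentation: split one large send of
 * payload_len bytes into MSS-sized segments and fill in the per-segment
 * flags. Returns the number of segments written, or 0 if there is no
 * payload or out[] is too small.
 */
static size_t model_segment_flags(struct model_tcp_flags super,
                                  size_t payload_len, size_t mss,
                                  struct model_tcp_flags *out, size_t out_cap)
{
    assert(mss > 0);

    size_t nsegs = (payload_len + mss - 1) / mss;  /* round up */
    if (nsegs == 0 || nsegs > out_cap)
        return 0;

    for (size_t i = 0; i < nsegs; i++) {
        out[i].cwr = super.cwr && i == 0;  /* CWR only on the first segment */
        out[i].ece = super.ece;            /* ECE replicated in all segments */
    }
    return nsegs;
}
```

Hardware that gets this wrong typically copies the header of the large send verbatim into every segment, which in this model would correspond to setting `out[i].cwr = super.cwr` for all `i`.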
Re: [PATCH Round 3 0/2][RFC] Network Event Notifier Mechanism
On Wed, Jun 28, 2006 at 12:54:10PM +1000, Herbert Xu wrote:
>
> Please give more specific reasons for needing these events because it
> is certainly far from obvious from reading those documents.

Never mind, I've found your earlier messages on the list which explains your reasons more clearly. It would be nice if you could include those explanations in your patch description.

BTW, does this mean that we're now comfortable with full TOE?

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH Round 3 0/2][RFC] Network Event Notifier Mechanism
Steve Wise <[EMAIL PROTECTED]> wrote:
>
> The reason these devices need update events is because they typically
> cache this information in hardware and need to be notified when this
> information has been updated. For information on RDMA protocols, see:
> http://www.ietf.org/html.charters/rddp-charter.html.

Please give more specific reasons for needing these events because it is certainly far from obvious from reading those documents.

Without reasons these invasive changes may turn out to be completely inappropriate.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [Patch 1/1] AF_UNIX Datagram getpeersec (with latest updates)
Got it. Will send a new patch soon.

Catherine

James Morris <[EMAIL PROTECTED]> wrote on 06/27/2006 10:13:48 PM:

> On Tue, 27 Jun 2006, Xiaolan Zhang wrote:
>
> > > Just one more thing, we don't need to export this function now.
> >
> > You mean moving it to security/selinux/hooks.c and making it static?
>
> Yep.
>
> > I think conceptually this is where it should reside -- auditing system
> > might need it in the future, for example.
>
> We can export it then.
>
>
> - James
> --
> James Morris
> <[EMAIL PROTECTED]>
Re: [Patch 1/1] AF_UNIX Datagram getpeersec (with latest updates)
On Tue, 27 Jun 2006, James Morris wrote:

> > I think conceptually this is where it should reside -- auditing system
> > might need it in the future, for example.
>
> We can export it then.

To clarify, we can export it if the audit system needs it, in the future.

- James
--
James Morris <[EMAIL PROTECTED]>
Re: [Patch 1/1] AF_UNIX Datagram getpeersec (with latest updates)
On Tue, 27 Jun 2006, Xiaolan Zhang wrote:

> > Just one more thing, we don't need to export this function now.
>
> You mean moving it to security/selinux/hooks.c and making it static?

Yep.

> I think conceptually this is where it should reside -- auditing system
> might need it in the future, for example.

We can export it then.

- James
--
James Morris <[EMAIL PROTECTED]>
Re: [NET]: Added GSO header verification
On Wed, 2006-06-28 at 08:31 +1000, Herbert Xu wrote:

> [NET]: Fix logical error in skb_gso_ok
>
> The test in skb_gso_ok is backwards. Noticed by Michael Chan
> <[EMAIL PROTECTED]>.
>
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Acked-by: Michael Chan <[EMAIL PROTECTED]>
Re: [RFC] cirrus ep93xx ethernet driver
On Mon, Jun 26, 2006 at 04:59:24AM +0200, Lennert Buytenhek wrote:

> The cirrus ep93xx is an ARM SoC that includes an ethernet MAC --
> this patch adds a driver for that ethernet MAC.

Attached is a new version that optimises interrupt handling somewhat. Since we clear RX status as the first thing we do in the poll handler, we might as well read the read-to-clear version of the interrupt status register in the interrupt handler and avoid the explicit clear in the poll handler. This shaves close to a second off a 128M sendfile() test.

At ~40 seconds for a 128M sendfile (~3.2MB/sec), the network performance of this CPU isn't impressive by any means, but given that the CPU only runs at 200MHz and that the MAC doesn't do checksum offloading and insists on 4-byte buffer alignment, we can't really do a whole lot better. The performance with this driver is still a good deal better than with the vendor driver, though -- for this particular test (128M sendfile), the vendor driver needs 1m21s.

Apart from the fact that it still uses numeric chip register addresses, I'm quite happy with the driver as it is; it survives heavy beating and is pretty stable.

Index: linux-2.6.17-git10/drivers/net/arm/Kconfig
===
--- linux-2.6.17-git10.orig/drivers/net/arm/Kconfig
+++ linux-2.6.17-git10/drivers/net/arm/Kconfig
@@ -39,3 +39,10 @@ config ARM_AT91_ETHER
         help
           If you wish to compile a kernel for the AT91RM9200 and enable
           ethernet support, then you should always answer Y to this.
+
+config EP93XX_ETH
+        tristate "EP93xx Ethernet support"
+        depends on NET_ETHERNET && ARM && ARCH_EP93XX
+        help
+          This is a driver for the ethernet hardware included in EP93xx CPUs.
+          Say Y if you are building a kernel for EP93xx based devices.
Index: linux-2.6.17-git10/drivers/net/arm/Makefile
===
--- linux-2.6.17-git10.orig/drivers/net/arm/Makefile
+++ linux-2.6.17-git10/drivers/net/arm/Makefile
@@ -8,3 +8,4 @@ obj-$(CONFIG_ARM_ETHERH)        += etherh.o
 obj-$(CONFIG_ARM_ETHER3)        += ether3.o
 obj-$(CONFIG_ARM_ETHER1)        += ether1.o
 obj-$(CONFIG_ARM_AT91_ETHER)    += at91_ether.o
+obj-$(CONFIG_EP93XX_ETH)        += ep93xx_eth.o
Index: linux-2.6.17-git10/drivers/net/arm/ep93xx_eth.c
===
--- /dev/null
+++ linux-2.6.17-git10/drivers/net/arm/ep93xx_eth.c
@@ -0,0 +1,668 @@
+/*
+ * EP93xx ethernet network device driver
+ * Copyright (C) 2006 Lennert Buytenhek <[EMAIL PROTECTED]>
+ * Dedicated to Marija Kulikova.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "ep93xx_eth.h"
+
+#define DRV_MODULE_VERSION "0.1"
+
+#define RX_QUEUE_ENTRIES 64
+#define TX_QUEUE_ENTRIES 8
+
+struct ep93xx_descs
+{
+        struct ep93xx_rdesc     rdesc[RX_QUEUE_ENTRIES];
+        struct ep93xx_tdesc     tdesc[TX_QUEUE_ENTRIES];
+        struct ep93xx_rstat     rstat[RX_QUEUE_ENTRIES];
+        struct ep93xx_tstat     tstat[TX_QUEUE_ENTRIES];
+};
+
+struct ep93xx_priv
+{
+        struct resource         *res;
+        void                    *base_addr;
+        int                     irq;
+
+        struct ep93xx_descs     *descs;
+        dma_addr_t              descs_dma_addr;
+
+        void                    *rx_buf[RX_QUEUE_ENTRIES];
+        void                    *tx_buf[TX_QUEUE_ENTRIES];
+
+        int                     rx_pointer;
+        int                     tx_clean_pointer;
+        int                     tx_pointer;
+        int                     tx_pending;
+
+        struct net_device_stats stats;
+};
+
+#define rdb(ep, off)            __raw_readb((ep)->base_addr + (off))
+#define rdw(ep, off)            __raw_readw((ep)->base_addr + (off))
+#define rdl(ep, off)            __raw_readl((ep)->base_addr + (off))
+#define wrb(ep, off, val)       __raw_writeb((val), (ep)->base_addr + (off))
+#define wrw(ep, off, val)       __raw_writew((val), (ep)->base_addr + (off))
+#define wrl(ep, off, val)       __raw_writel((val), (ep)->base_addr + (off))
+
+static int ep93xx_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+        struct ep93xx_priv *ep = netdev_priv(dev);
+        int entry;
+
+        if (unlikely(skb->len > PAGE_SIZE)) {
+                ep->stats.tx_dropped++;
+                dev_kfree_skb(skb);
+                return 0;
+        }
+
+        entry = ep->tx_pointer;
+        ep->tx_pointer = (ep->tx_pointer + 1) % TX_QUEUE_ENTRIES;
+
+        ep->descs->tdesc[entry].tdesc1 =
+                TDESC1_EOF | (entry << 16) | (skb->len & 0xfff);
+        s
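The transmit path quoted above advances a ring index modulo the queue size and packs the descriptor word from an end-of-frame flag, the ring entry, and a 12-bit length. A small stand-alone sketch of just that arithmetic, assuming `MODEL_TDESC1_EOF` is bit 31 (the real `TDESC1_EOF` is defined in ep93xx_eth.h, which is not shown here) and using hypothetical `model_*` helper names:

```c
#include <assert.h>
#include <stdint.h>

#define TX_QUEUE_ENTRIES 8            /* value from the patch */
#define MODEL_TDESC1_EOF 0x80000000u  /* assumed value, for illustration only */

/* Advance the TX ring index, wrapping at the end of the queue. */
static unsigned int model_next_tx(unsigned int tx_pointer)
{
    assert(tx_pointer < TX_QUEUE_ENTRIES);
    return (tx_pointer + 1) % TX_QUEUE_ENTRIES;
}

/* Pack the descriptor word: EOF flag, ring entry in bits 16 and up, 12-bit length. */
static uint32_t model_tdesc1(unsigned int entry, uint32_t len)
{
    return MODEL_TDESC1_EOF | ((uint32_t)entry << 16) | (len & 0xfff);
}
```

Because the length field is masked to 12 bits, only lengths below 4096 are represented exactly in the descriptor word.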
Re: [Patch 1/1] AF_UNIX Datagram getpeersec (with latest updates)
James Morris <[EMAIL PROTECTED]> wrote on 06/27/2006 09:33:17 PM:

> On Tue, 27 Jun 2006, Catherine Zhang wrote:
>
> > diff -puN security/selinux/exports.c~lsm-secpeer-unix security/selinux/exports.c
> > --- linux-2.6.17-rc6-mm2-JM/security/selinux/exports.c~lsm-secpeer-unix 2006-06-27 18:15:10.914669944 -0400
> > +++ linux-2.6.17-rc6-mm2-JM-cxzhang/security/selinux/exports.c 2006-06-27 18:16:31.502418744 -0400
> > @@ -17,6 +17,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >
> >  #include "security.h"
> >  #include "objsec.h"
> > @@ -72,6 +73,16 @@ void selinux_get_task_sid(struct task_st
> >          *sid = 0;
> >  }
> >
> > +void selinux_get_sock_sid(struct socket *sock, u32 *sid)
> > +{
> > +        if (selinux_enabled) {
> > +                const struct inode *inode = SOCK_INODE(sock);
> > +                selinux_get_inode_sid(inode, sid);
> > +                return;
> > +        }
> > +        *sid = 0;
> > +}
> > +
>
> Just one more thing, we don't need to export this function now.

You mean moving it to security/selinux/hooks.c and making it static? I think conceptually this is where it should reside -- auditing system might need it in the future, for example.

thanks,
Catherine
Re: [Patch 1/1] AF_UNIX Datagram getpeersec (with latest updates)
On Tue, 27 Jun 2006, Catherine Zhang wrote:

> diff -puN security/selinux/exports.c~lsm-secpeer-unix security/selinux/exports.c
> --- linux-2.6.17-rc6-mm2-JM/security/selinux/exports.c~lsm-secpeer-unix 2006-06-27 18:15:10.914669944 -0400
> +++ linux-2.6.17-rc6-mm2-JM-cxzhang/security/selinux/exports.c 2006-06-27 18:16:31.502418744 -0400
> @@ -17,6 +17,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include "security.h"
>  #include "objsec.h"
> @@ -72,6 +73,16 @@ void selinux_get_task_sid(struct task_st
>          *sid = 0;
>  }
>
> +void selinux_get_sock_sid(struct socket *sock, u32 *sid)
> +{
> +        if (selinux_enabled) {
> +                const struct inode *inode = SOCK_INODE(sock);
> +                selinux_get_inode_sid(inode, sid);
> +                return;
> +        }
> +        *sid = 0;
> +}
> +

Just one more thing, we don't need to export this function now.

- James
--
James Morris <[EMAIL PROTECTED]>
[Patch 1/1] AF_UNIX Datagram getpeersec (with latest updates)
Hi,

This patch combines all previous updates. Many thanks to James, Dave, and Stephen for their modifications and comments!

cheers,
Catherine

--

From: [EMAIL PROTECTED]

This patch implements an API whereby an application can determine the label of its peer's Unix datagram sockets via the auxiliary data mechanism of recvmsg.

Patch purpose:

This patch enables a security-aware application to retrieve the security context of the peer of a Unix datagram socket. The application can then use this security context to determine the security context for processing on behalf of the peer who sent the packet.

Patch design and implementation:

The design and implementation is very similar to the UDP case for INET sockets. Basically we build upon the existing Unix domain socket API for retrieving user credentials. Linux offers the API for obtaining user credentials via ancillary messages (i.e., out of band/control messages that are bundled together with a normal message). To retrieve the security context, the application first indicates to the kernel such desire by setting the SO_PASSSEC option via setsockopt. Then the application retrieves the security context using the auxiliary data mechanism.

An example server application for Unix datagram socket should look like this:

    toggle = 1;
    toggle_len = sizeof(toggle);

    /* optlen is passed by value to setsockopt */
    setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, toggle_len);
    recvmsg(sockfd, &msg_hdr, 0);
    if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
        cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
        if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
            cmsg_hdr->cmsg_level == SOL_SOCKET &&
            cmsg_hdr->cmsg_type == SCM_SECURITY) {
            /* copy only the bytes actually carried in the message */
            memcpy(&scontext, CMSG_DATA(cmsg_hdr),
                   cmsg_hdr->cmsg_len - CMSG_LEN(0));
        }
    }

sock_setsockopt is enhanced with a new socket option SO_PASSSEC to allow a server socket to receive security context of the peer.

Testing:

We have tested the patch by setting up Unix datagram client and server applications.
We verified that the server can retrieve the security context using the auxiliary data mechanism of recvmsg. --- include/asm-alpha/socket.h |1 + include/asm-arm/socket.h |1 + include/asm-arm26/socket.h |1 + include/asm-cris/socket.h|1 + include/asm-frv/socket.h |1 + include/asm-h8300/socket.h |1 + include/asm-i386/socket.h|1 + include/asm-ia64/socket.h|1 + include/asm-m32r/socket.h|1 + include/asm-m68k/socket.h|1 + include/asm-mips/socket.h|1 + include/asm-parisc/socket.h |1 + include/asm-powerpc/socket.h |1 + include/asm-s390/socket.h|1 + include/asm-sh/socket.h |1 + include/asm-sparc/socket.h |1 + include/asm-sparc64/socket.h |1 + include/asm-v850/socket.h|1 + include/asm-x86_64/socket.h |1 + include/asm-xtensa/socket.h |1 + include/linux/net.h |1 + include/linux/selinux.h | 15 +++ include/net/af_unix.h|6 ++ include/net/scm.h| 17 + net/core/sock.c | 11 +++ net/unix/af_unix.c | 27 +++ security/selinux/exports.c | 11 +++ security/selinux/hooks.c |8 +++- 28 files changed, 115 insertions(+), 1 deletion(-) diff -puN include/asm-alpha/socket.h~lsm-secpeer-unix include/asm-alpha/socket.h --- linux-2.6.17-rc6-mm2-JM/include/asm-alpha/socket.h~lsm-secpeer-unix 2006-06-27 18:14:52.586456256 -0400 +++ linux-2.6.17-rc6-mm2-JM-cxzhang/include/asm-alpha/socket.h 2006-06-27 18:16:31.488420872 -0400 @@ -51,6 +51,7 @@ #define SCM_TIMESTAMP SO_TIMESTAMP #define SO_PEERSEC 30 +#define SO_PASSSEC 34 /* Security levels - as per NRL IPv6 - don't actually do anything */ #define SO_SECURITY_AUTHENTICATION 19 diff -puN include/asm-arm/socket.h~lsm-secpeer-unix include/asm-arm/socket.h --- linux-2.6.17-rc6-mm2-JM/include/asm-arm/socket.h~lsm-secpeer-unix 2006-06-27 18:15:10.052800968 -0400 +++ linux-2.6.17-rc6-mm2-JM-cxzhang/include/asm-arm/socket.h2006-06-27 18:16:31.489420720 -0400 @@ -48,5 +48,6 @@ #define SO_ACCEPTCONN 30 #define SO_PEERSEC 31 +#define SO_PASSSEC 34 #endif /* _ASM_SOCKET_H */ diff -puN include/asm-arm26/socket.h~lsm-secpeer-unix include/asm-arm26/socket.h --- 
linux-2.6.17-rc6-mm2-JM/include/asm-arm26/socket.h~lsm-secpeer-unix 2006-06-27 18:15:10.095794432 -0400 +++ linux-2.6.17-rc6-mm2-JM-cxzhang/include/asm-arm26/socket.h 2006-06-27 18:16:31.489420720 -0400 @@ -48,5 +48,6 @@ #define SO_ACCEPTCONN 30 #define SO_PEERSEC 31 +#define SO_PASSSEC 34 #endif /* _ASM_SOCKET_H */ diff -puN include/asm-cris/socket.h~lsm-secpeer-unix include/asm-cris/socket.h --- linux-2.6.17-rc6-mm2-JM/include/asm-cris/socket.h~lsm-secpeer-unix 2006-06-27 18:15:10.132788808 -0400 +++ linux-2.6.17-rc6-mm2-J
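For readers who want to try the interface described in this patch, here is a minimal userspace sketch. It is not code from the patch. It assumes a Linux system; the helper names enable_passsec() and copy_peer_context() are made up for illustration, and the fallback values for SO_PASSSEC and SCM_SECURITY are assumptions (34 matches the asm-*/socket.h hunks above, 0x03 is assumed) that should be checked against your own headers. Unlike the example in the description, it walks every control message and derives the context length from cmsg_len instead of copying a fixed sizeof(scontext).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Fallbacks for userspace headers that lack these names.  The values are
 * assumptions; check the headers of your own architecture. */
#ifndef SO_PASSSEC
#define SO_PASSSEC 34
#endif
#ifndef SCM_SECURITY
#define SCM_SECURITY 0x03
#endif

/* Ask the kernel to attach the peer's security context to received
 * datagrams.  Note that setsockopt() takes the option length by value. */
int enable_passsec(int sockfd)
{
	int toggle = 1;

	return setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC,
			  &toggle, sizeof(toggle));
}

/* Copy the SCM_SECURITY payload, if any, out of a msghdr that recvmsg()
 * has filled in.  Returns the context length, or -1 if no context is
 * present or it does not fit in buf (one byte is kept for the NUL). */
long copy_peer_context(struct msghdr *msg, char *buf, size_t buflen)
{
	struct cmsghdr *cmsg;

	assert(buf != NULL && buflen > 0);

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
	     cmsg = CMSG_NXTHDR(msg, cmsg)) {
		size_t len;

		if (cmsg->cmsg_level != SOL_SOCKET ||
		    cmsg->cmsg_type != SCM_SECURITY)
			continue;
		if (cmsg->cmsg_len < CMSG_LEN(0))
			return -1;	/* malformed header */
		len = cmsg->cmsg_len - CMSG_LEN(0);
		if (len >= buflen)
			return -1;	/* caller's buffer is too small */
		memcpy(buf, CMSG_DATA(cmsg), len);
		buf[len] = '\0';
		return (long)len;
	}
	return -1;
}
```

A server would call enable_passsec() once after creating the socket, then call copy_peer_context() on the msghdr after each recvmsg(). Whether a context is actually delivered depends on the security module in use.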
Please pull 'upstream' branch of wireless-2.6 (revised)
On Mon, Jun 26, 2006 at 05:25:52PM -0400, John W. Linville wrote: > Michael Buesch: > bcm43xx: suspend MAC while executing long pwork The above patch ruffled some feathers on netdev. In the interest of moving things along, I have pulled that patch out of wireless-2.6. I expect it will be back soon, probably with some additional changes to satisfy concerns raised on the mailing list. NOTE: While I was mucking around, I pulled a bunch of patches from the master branch out into driver-specific branches for adm8211, prism54usb, tiacx, and zd1211rw. Then I rebuilt the master branch by pulling from the driver branches. This is intended to ease the merging of individual drivers upstream (e.g. zd1211rw and maybe tiacx in the near future). Those working off my upstream branch or off Linus' tree should be unaffected. Anyone who works off my master branch may need to rebase, especially if they want me to be able to pull from them (due to dirty history). I apologize for the hassle and appreciate your cooperation! Thanks, John --- The following changes since commit fcc18e83e1f6fd9fa6b333735bf0fcd530655511: Malcolm Parsons: uclinux: use PER_LINUX_32BIT in binfmt_flat are found in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream Daniel Drake: bcm43xx: use softmac-suggested TX rate bcm43xx: enable shared key authentication Eric Sesterhenn: skb used after passing to netif_rx in net/ieee80211/ieee80211_rx.c Faidon Liambotis: Add two PLX device IDs Hong Liu: ieee80211: fix not allocating IV+ICV space when usingencryption in ieee80211_tx_frame Horms: CONFIG_WIRELESS_EXT is neccessary after all John W. 
Linville: softmac: fix build-break from 881ee6999d66c8fc903b429b73bbe6045b38c549 Joseph Jezak: SoftMAC: Prevent multiple authentication attempts on the same network SoftMAC: Add network to ieee80211softmac_call_events when associate times out Larry Finger: Convert bcm43xx-softmac to use the ieee80211_is_valid_channel routine 2.6.17 missing a call to ieee80211softmac_capabilities from ieee80211softmac_assoc_req Michael Buesch: bcm43xx: workaround init_board vs. IRQ race drivers/net/wireless/bcm43xx/bcm43xx_main.c| 31 - drivers/net/wireless/bcm43xx/bcm43xx_main.h| 24 drivers/net/wireless/bcm43xx/bcm43xx_radio.c |7 + drivers/net/wireless/bcm43xx/bcm43xx_wx.c |2 + drivers/net/wireless/bcm43xx/bcm43xx_xmit.c|5 +++ drivers/net/wireless/hostap/hostap_plx.c |2 + include/net/ieee80211softmac.h |1 + net/ieee80211/ieee80211_rx.c |4 ++- net/ieee80211/ieee80211_tx.c | 15 +++--- net/ieee80211/softmac/ieee80211softmac_assoc.c | 31 - net/ieee80211/softmac/ieee80211softmac_auth.c |4 +-- net/ieee80211/softmac/ieee80211softmac_io.c|3 ++ net/ieee80211/softmac/ieee80211softmac_wx.c| 36 +++- 13 files changed, 105 insertions(+), 60 deletions(-) diff --git a/drivers/net/wireless/bcm43xx/bcm43xx_main.c b/drivers/net/wireless/bcm43xx/bcm43xx_main.c index 085d785..1cd47c5 100644 --- a/drivers/net/wireless/bcm43xx/bcm43xx_main.c +++ b/drivers/net/wireless/bcm43xx/bcm43xx_main.c @@ -1885,6 +1885,15 @@ static irqreturn_t bcm43xx_interrupt_han spin_lock(&bcm->irq_lock); + /* Only accept IRQs, if we are initialized properly. +* This avoids an RX race while initializing. +* We should probably not enable IRQs before we are initialized +* completely, but some careful work is needed to fix this. I think it +* is best to stay with this cheap workaround for now... . 
+*/ + if (unlikely(bcm43xx_status(bcm) != BCM43xx_STAT_INITIALIZED)) + goto out; + reason = bcm43xx_read32(bcm, BCM43xx_MMIO_GEN_IRQ_REASON); if (reason == 0x) { /* irq not for us (shared irq) */ @@ -1906,19 +1915,11 @@ static irqreturn_t bcm43xx_interrupt_han bcm43xx_interrupt_ack(bcm, reason); - /* Only accept IRQs, if we are initialized properly. -* This avoids an RX race while initializing. -* We should probably not enable IRQs before we are initialized -* completely, but some careful work is needed to fix this. I think it -* is best to stay with this cheap workaround for now... . -*/ - if (likely(bcm43xx_status(bcm) == BCM43xx_STAT_INITIALIZED)) { - /* disable all IRQs. They are enabled again in the bottom half. */ - bcm->irq_savedstate = bcm43xx_interrupt_disable(bcm, BCM43xx_IRQ_ALL); - /* save the reason code and call our bottom half. */ - bcm->irq_reason = reason; - tasklet_schedule(&bcm->isr_tasklet); - } + /* disable all IRQs. They are en
Re: [patch 2/6] [Network namespace] Network device sharing by view
Hello!

> It may look weird, but do applications really *need* to see eth0 rather
> than eth858354?

Applications do not care, humans do. :-)

As for applications, they just need to see exactly the same device after migration: not only its name but, for example, also its ifindex. If you do not create a separate namespace for netdevices, you will inevitably end up with some strange hack, a sort of VPIDs, to translate (or to partition) ifindices, or with having to say that "ping -I eth858354 xxx" is too complicated an application to survive migration.

Alexey
Re: [Repost PATCH 6/6] PMC MSP85x0 gigabit ethernet driver
Kiran Thota <[EMAIL PROTECTED]> :
[...]
> +/*
> + * Allocate the SKBs for the Rx ring. Also used
> + * for refilling the queue
> + */
> +
> +static int msp85x0_ge_rx_task(struct net_device *netdev,
> + msp85x0_ge_port_info *msp85x0_ge_eth)
> +{
> + struct device *device =
> &msp85x0_ge_device[msp85x0_ge_eth->port_num]->dev;
> + volatile msp85x0_ge_rx_desc *rx_desc;
> + struct sk_buff *skb;
> + int rx_used_desc;
> + int count = 0;
> + oom_flag=0;

Global variable.

[...]
> + if((rx_used_desc + 1) == MSP85x0_GE_RX_QUEUE)
> + msp85x0_ge_eth->rx_used_desc_q =0;
> + else
> + msp85x0_ge_eth->rx_used_desc_q = (rx_used_desc + 1);

Consider grepping drivers/net for NEXT_TX or RING_NEXT.

[...]
> +static void msp85x0_port_init(struct net_device *netdev,
> + msp85x0_ge_port_info * msp85x0_ge_eth)
> +{
> + unsigned long reg_data;
> + unsigned int port_num;
> +
> + port_num = msp85x0_ge_eth->port_num;
> + for (port_num = 0; port_num < NO_PORTS; port_num++)

There is something strange with port_num here: it is assigned from msp85x0_ge_eth->port_num, then immediately overwritten by the for loop.

[...]
> +static int start_tx_and_rx_activity(struct net_device *netdev)
> +{

The returned value is not used.

[...]
> +static int trtg_block_enable(struct net_device *netdev)
> +{

The returned value is not used.

[...]
> +static int enable_tx_and_rx_interrupts(struct net_device *netdev)
> +{

The returned value is not used.

[...]
> +static int xdma_config(struct net_device *netdev)
> +{

The indentation of this function is mostly broken.

[...]
> +static int msp85x0_ge_port_start(struct net_device *netdev)
> +{

The returned value is not used.

[...]
> +static int msp85x0_eth_setup_tx_rx_fifo(struct net_device *dev)
> +{

The returned value is not used.

[...]
> +static int msp85x0_ge_eth_open(struct net_device *netdev)
> +{
[...]
> + /* Fill the Rx ring with the SKBs */
> + msp85x0_ge_port_start(netdev);
[...]
> + if (!(phy_reg & 0x0400)) {
> + netif_carrier_off(netdev);
> + netif_stop_queue(netdev);
> + return MSP85x0_ERROR;

skb leak

[...]
> +int msp85x0_ge_start_xmit(struct sk_buff *skb, struct net_device *netdev)
> +{

static

This function ought to use NETDEV_TX_OK/NETDEV_TX_BUSY (should not happen).

[...]
> +static int msp85x0_ge_free_tx_queue(struct net_device *netdev)
> +{
> + msp85x0_ge_port_info *msp85x0_ge_eth = netdev_priv(netdev);
> + int pkts,port_num = msp85x0_ge_eth->port_num;
> + int tx_desc_used;
> + struct sk_buff *skb;
> +
> + /* Take the lock */
> + pkts=get_tx_pkt_count(port_num);
> + while(pkts)
> + {
> + pkts--;
> + tx_desc_used = msp85x0_ge_eth->tx_used_desc_q;
> +
> + /* return right away */
> + if (tx_desc_used == msp85x0_ge_eth->tx_curr_desc_q)
> + break;
> +
> + skb = msp85x0_ge_eth->tx_skb[tx_desc_used];
> + dev_kfree_skb_irq(skb);

msp85x0_ge_free_tx_queue() is issued in msp85x0_ge_start_xmit(), thus not in irq context.

[...]
> +static int msp85x0_ge_receive_queue(struct net_device *netdev)
> +{

Indentation needs to be fixed in this function.

[...]
> + if (packet.cmd_sts & (MSP85x0_GE_RX_PERR |
> MSP85x0_GE_RX_OVERFLOW_ERROR | MSP85x0_GE_RX_TRUNC | MSP85x0_GE_RX_CRC_ERROR))
> + {
> + if(packet.cmd_sts & MSP85x0_GE_RX_OVERFLOW_ERROR)
> + stats->rx_over_errors++;
> + else if(packet.cmd_sts & MSP85x0_GE_RX_TRUNC)
> + stats->rx_frame_errors++;
> + else
> + stats->rx_errors++;
> + dev_kfree_skb_any(skb);

It's called in ->poll(), outside of in_irq().

dev->last_rx should be updated after netif_receive_skb().

[...]
> +static int msp85x0_ge_poll(struct net_device *netdev, int *budget)
> +{
[...]
> + spin_lock_irqsave(&msp85x0_ge_eth->lock,flags);

Afaik poll takes place with irq enabled: no need to save/restore.

[...]
> +/* Don't Re-Initialize the port, Just start from where it stops */
> +static int msp85x0_ge_eth_reopen(struct net_device *netdev)
> +{
> + msp85x0_ge_port_info *msp85x0_ge_eth = netdev_priv(netdev);
> + unsigned int reg_data,irq;
> + int retval;
> +
> +irq = MSP85x0_ETH_PORT_IRQ;
> +
> + retval = request_irq(irq, INTERRUPT_HANDLER,
> + SA_INTERRUPT | SA_SAMPLE_RANDOM | SA_SHIRQ, netdev->name,
> netdev);

/me scratches head...

msp85x0_ge_change_mtu() does _not_ call free_irq() and it issues msp85x0_ge_eth_reopen(), which requests the irq again.

I noticed this comment in msp85x0_ge_eth_stop():

/* This to work around to solve the msp85x0 shutdown and bringup sequence */

Can you elaborate ?

Random remarks:
- drivers/net/msp85x0_ge.h includes a lot of
#define MSP85x0_GE_MSTATX_SOMETHING
Your customers would surely appreciate extended
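As an illustration of the ring-index remark in this review (the NEXT_TX / RING_NEXT suggestion), the open-coded "index + 1, wrap to 0" test can be folded into one small helper. This is only a sketch: DEMO_RX_RING_SIZE, DEMO_RING_NEXT and demo_ring_next() are hypothetical names, not the macros of any existing driver, and the ring size of 64 is made up.

```c
#include <assert.h>

/* Hypothetical ring size; a driver would use its own constant here. */
#define DEMO_RX_RING_SIZE 64u

/* Advance a ring index by one slot, wrapping to 0 at the end of the ring. */
#define DEMO_RING_NEXT(idx, size) (((idx) + 1u) % (size))

/* Same operation as a function, for callers that prefer type checking. */
unsigned int demo_ring_next(unsigned int idx, unsigned int size)
{
	assert(size > 0 && idx < size);
	return DEMO_RING_NEXT(idx, size);
}
```

With such a helper the if/else pair quoted in the review collapses to a single assignment of the form "q = DEMO_RING_NEXT(q, size);". When the ring size is a power of two, a mask can replace the modulo.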
Re: [patch 2/6] [Network namespace] Network device sharing by view
On Wed, 2006-06-28 at 00:52 +0200, Herbert Poetzl wrote:
> seriously, what I think Eric meant was that it
> might be nice (especially for migration purposes)
> to keep the device namespace completely virtualized
> and not just isolated ...

It might be nice, but it is probably unneeded for an initial implementation. In practice, a cluster doing checkpoint/restart/migration will already have a system in place for assigning unique IPs or other identifiers to each container. It could just as easily make sure to assign unique network device names to containers.

The issues really only come into play when you have an unstructured set of machines and you want to migrate between them without having prepared them with any kind of unique net device names beforehand.

It may look weird, but do applications really *need* to see eth0 rather than eth858354?

-- Dave
Re: [patch 2/6] [Network namespace] Network device sharing by view
On Tue, Jun 27, 2006 at 10:29:39AM -0600, Eric W. Biederman wrote: > Herbert Poetzl <[EMAIL PROTECTED]> writes: > > > On Tue, Jun 27, 2006 at 01:54:51PM +0400, Kirill Korotaev wrote: > >> >>My point is that if you make namespace tagging at routing time, and > >> >>your packets are being routed only once, you lose the ability > >> >>to have separate routing tables in each namespace. > >> > > >> > > >> >Right. What is the advantage of having separate the routing tables ? > > > >> it is impossible to have bridged networking, tun/tap and many other > >> features without it. I even doubt that it is possible to introduce > >> private netfilter rules w/o virtualization of routing. > > > > why? iptables work quite fine on a typical linux > > system when you 'delegate' certain functionality > > to certain chains (i.e. doesn't require access to > > _all_ of them) > > > >> The question is do we want to have fully featured namespaces which > >> allow to create isolated virtual environments with semantics and > >> behaviour of standalone linux box or do we want to introduce some > >> hacks with new rules/restrictions to meet ones goals only? > > > > well, soemtimes 'hacks' are not only simpler but also > > a much better solution for a given problem than the > > straight forward approach ... > > Well I would like to see a hack that qualifies. > I watched the linux-vserver irc channel for a while and almost > every network problem was caused by the change in semantics > vserver provides. the problem here is not the change in semantics compared to a real linux system (as there basically is none) but compared to _other_ technologies like UML or QEMU, which add the need for bridging and additional interfaces, while Linux-VServer only focuses on the IP layer ... > In this case when you allow a guest more than one IP your hack > while easy to maintain becomes much more complex. why? 
a set of IPs is quite similar to a single IP (which is actually a subset), so no real change there, only IP_ANY means something different for a guest ... > Especially as you address each case people care about one at a time. hmm? > In one shot this goes the entire way. Given how many people miss that > you do the work at layer 2 than at layer 3 I would not call this the > straight forward approach. The straight forward implementation yes, > but not the straight forward approach. seems I lost you here ... > > for example, you won't have multiple routing tables > > in a kernel where this feature is disabled, no? > > so why should it affect a guest, or require modified > > apps inside a guest when we would decide to provide > > only a single routing table? > > > >> From my POV, fully virtualized namespaces are the future. > > > > the future is already there, it's called Xen or UML, or QEMU :) > > Yep. And now we need it to run fast. hmm, maybe you should try to optimize linux for Xen then, as I'm sure it will provide the optimal virtualization and has all the features folks are looking for (regarding virtualization) I thought we are trying to figure a light-weight subset of isolation and virtualization technologies and methods which make sense to have in mainline ... > >> It is what makes virtualization solution usable (w/o apps > >> modifications), provides all the features and doesn't require much > >> efforts from people to be used. > > > > and what if they want to use virtualization inside > > their guests? where do you draw the line? > > The implementation doesn't have any problems with guests inside > of guests. > > The only reason to restrict guests inside of guests is because > the we aren't certain which permissions make sense. 
well, we have not even touched the permission issues yet best, Herbert > Eric
[RFC 8/8] NetLabel: tie NetLabel into the Kconfig system
Modify the net/Kconfig file to enable selecting the NetLabel Kconfig options. --- net/Kconfig |2 ++ 1 files changed, 2 insertions(+) Index: linux-2.6.17.i686-quilt/net/Kconfig === --- linux-2.6.17.i686-quilt.orig/net/Kconfig +++ linux-2.6.17.i686-quilt/net/Kconfig @@ -228,6 +228,8 @@ source "net/tux/Kconfig" config WIRELESS_EXT bool +source "net/netlabel/Kconfig" + endif # if NET endmenu # Networking -- paul moore linux security @ hp
[RFC 7/8] NetLabel: unlabeled packet handling
Add unlabeled packet support to the NetLabel subsystem. NetLabel does not do any processing on unlabeled packets, but it must support passing unlabeled packets on both the inbound and outbound sides. --- net/netlabel/netlabel_unlabeled.c | 258 ++ 1 files changed, 258 insertions(+) Index: linux-2.6.17.i686-quilt/net/netlabel/netlabel_unlabeled.c === --- /dev/null +++ linux-2.6.17.i686-quilt/net/netlabel/netlabel_unlabeled.c @@ -0,0 +1,258 @@ +/* + * NetLabel Unlabeled Support + * + * This file defines functions for dealing with unlabeled packets for the + * NetLabel system. The NetLabel system manages static and dynamic label + * mappings for network protocols such as CIPSO and RIPSO. + * + * Author: Paul Moore <[EMAIL PROTECTED]> + * + */ + +/* + * (c) Copyright Hewlett-Packard Development Company, L.P., 2006 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See + * the GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include "netlabel_user.h" +#include "netlabel_domainhash.h" +#include "netlabel_unlabeled.h" + +/* Accept unlabeled packets flag */ +static atomic_t netlabel_unlabel_accept_flg = ATOMIC_INIT(0); + +/* NetLabel Generic NETLINK CIPSOv4 family */ +static struct genl_family netlbl_unlabel_gnl_family = { + .id = GENL_ID_GENERATE, + .hdrsize = 0, + .name = NETLBL_NLTYPE_UNLABELED_NAME, + .version = NETLBL_PROTO_VERSION, + .maxattr = 0, +}; + + +/* + * Local Prototypes + */ + +static void netlbl_unlabel_send_ack(const struct genl_info *info, + const u32 ret_code); + + +/* + * NetLabel Command Handlers + */ + +/** + * netlbl_unlabel_accept - Handle an ACCEPT message + * @skb: the NETLINK buffer + * @info: the Generic NETLINK info block + * + * Description: + * Process a user generated ACCEPT message and set the accept flag accordingly. + * Returns zero on success, negative values on failure. 
+ * + */ +static int netlbl_unlabel_accept(struct sk_buff *skb, struct genl_info *info) +{ + int ret_val; + unsigned char *msg = netlbl_netlink_payload_data(skb); + u32 value; + + ret_val = netlbl_netlink_cap_check(skb, CAP_NET_ADMIN); + if (ret_val != 0) + return ret_val; + + if (netlbl_netlink_payload_len(skb) == 4) { + value = netlbl_get_u32(msg); + if (value == 1 || value == 0) { + atomic_set(&netlabel_unlabel_accept_flg, value); + netlbl_unlabel_send_ack(info, NETLBL_E_OK); + return 0; + } + } + + netlbl_unlabel_send_ack(info, EINVAL); + return -EINVAL; +} + + +/* + * NetLabel Generic NETLINK Command Definitions + */ + +static struct genl_ops netlbl_unlabel_genl_c_accept = { + .cmd = NLBL_UNLABEL_C_ACCEPT, + .flags = 0, + .doit = netlbl_unlabel_accept, + .dumpit = NULL, +}; + +/* + * NetLabel Generic NETLINK Protocol Functions + */ + +/** + * netlbl_unlabel_send_ack - Send an ACK message + * @info: the generic NETLINK information + * @ret_code: return code to use + * + * Description: + * This function sends an ACK message to the sender of the NETLINK message + * specified by @info. + * + */ +static void netlbl_unlabel_send_ack(const struct genl_info *info, + const u32 ret_code) +{ + size_t msg_size; + size_t data_size; + struct sk_buff *skb; + unsigned char *data; + + data_size = GENL_HDRLEN + 8; + msg_size = NLMSG_SPACE(data_size); + + skb = alloc_skb(msg_size, GFP_KERNEL); + if (skb == NULL) + return; + + data = netlbl_netlink_hdr_put(skb, + info->snd_pid, + 0, + 0, + netlbl_unlabel_gnl_family.id, + NLBL_UNLABEL_C_ACK, + data_size); + if (data == NULL) + goto send_ack_failure; + + netlbl_putinc_u32(&data, info->snd_seq); + netlbl_putinc_u32(&data,
[RFC 0/8] NetLabel: updated to use generic netlink
An updated patch set with some small changes as well as one big one: NetLabel now uses the generic netlink interface for its kernel-userland communication as opposed to its own dedicated netlink type. Needless to say this requires an updated userland configuration tool, so for those of you running this patch please grab version 0.14 (or later) of the netlabel_tools which can be found here:

 * http://free.linux.hp.com/~pmoore/projects/linux_cipso

Thanks.

--
paul moore
linux security @ hp
[RFC 5/8] NetLabel: SELinux support
Add NetLabel support to the SELinux LSM and modify the socket_post_create() LSM hook to return an error code. The most significant part of this patch is the addition of NetLabel hooks into the following SELinux LSM hooks:

 * selinux_file_permission()
 * selinux_socket_sendmsg()
 * selinux_socket_post_create()
 * selinux_socket_post_accept() [NEW]
 * selinux_socket_sock_rcv_skb()
 * selinux_socket_getpeersec_stream()
 * selinux_socket_getpeersec_dgram()

The basic reasoning behind this patch is that outgoing packets are "NetLabel'd" by labeling their socket, and the NetLabel security attributes of incoming packets are checked via the additional hook in selinux_socket_sock_rcv_skb(). NetLabel itself is only a labeling mechanism, similar to filesystem extended attributes; it is up to the SELinux enforcement mechanism to perform the actual access checks.

In addition to the changes outlined above this patch also includes some changes to the extended bitmap (ebitmap) and multi-level security (mls) code to import and export SELinux TE/MLS attributes into and out of NetLabel.
--- include/linux/security.h| 25 - net/socket.c| 13 security/dummy.c|6 security/selinux/hooks.c| 59 ++ security/selinux/include/objsec.h | 11 security/selinux/include/selinux_netlabel.h | 94 security/selinux/ss/Makefile|1 security/selinux/ss/ebitmap.c | 155 +++ security/selinux/ss/ebitmap.h |6 security/selinux/ss/mls.c | 160 +++ security/selinux/ss/mls.h | 25 + security/selinux/ss/selinux_netlabel.c | 574 security/selinux/ss/services.c | 12 security/selinux/ss/services.h |2 14 files changed, 1113 insertions(+), 30 deletions(-) Index: linux-2.6.17.i686-quilt/include/linux/security.h === --- linux-2.6.17.i686-quilt.orig/include/linux/security.h +++ linux-2.6.17.i686-quilt/include/linux/security.h @@ -1267,8 +1267,8 @@ struct security_operations { int (*unix_may_send) (struct socket * sock, struct socket * other); int (*socket_create) (int family, int type, int protocol, int kern); - void (*socket_post_create) (struct socket * sock, int family, - int type, int protocol, int kern); + int (*socket_post_create) (struct socket * sock, int family, + int type, int protocol, int kern); int (*socket_bind) (struct socket * sock, struct sockaddr * address, int addrlen); int (*socket_connect) (struct socket * sock, @@ -2677,13 +2677,13 @@ static inline int security_socket_create return security_ops->socket_create(family, type, protocol, kern); } -static inline void security_socket_post_create(struct socket * sock, - int family, - int type, - int protocol, int kern) +static inline int security_socket_post_create(struct socket * sock, + int family, + int type, + int protocol, int kern) { - security_ops->socket_post_create(sock, family, type, -protocol, kern); + return security_ops->socket_post_create(sock, family, type, + protocol, kern); } static inline int security_socket_bind(struct socket * sock, @@ -2809,11 +2809,12 @@ static inline int security_socket_create return 0; } -static inline void security_socket_post_create(struct socket * sock, - int family, - int type, - int 
protocol, int kern) +static inline int security_socket_post_create(struct socket * sock, + int family, + int type, + int protocol, int kern) { + return 0; } static inline int security_socket_bind(struct socket * sock, Index: linux-2.6.17.i686-quilt/net/socket.c === --- linux-2.6.17.i686-quilt.orig/net/socket.c +++ linux-2.6.17.i686-quilt/net/socket.c @@ -976,11 +976,18 @@ int sock_create_lite(int family, int typ goto out; } - security_socket_post_create(sock, family, type, protocol, 1); sock->type = type; + err = security_socket_post_create(sock, family, type, protocol, 1); + if (err) + goto out_rel
[RFC 1/8] NetLabel: documentation
Documentation for the NetLabel system, this includes a basic overview of how NetLabel works and how LSM developers can integrate it into their favorite LSM. Also, due to the difficulty of finding expired IETF drafts, I am including the IETF CIPSO draft that is the basis of the NetLabel CIPSO implementation. --- CREDITS |7 Documentation/00-INDEX|2 Documentation/netlabel/00-INDEX | 10 Documentation/netlabel/cipso_ipv4.txt | 48 Documentation/netlabel/draft-ietf-cipso-ipsecurity-01.txt | 791 ++ Documentation/netlabel/introduction.txt | 53 Documentation/netlabel/lsm_interface.txt | 47 7 files changed, 958 insertions(+) Index: linux-2.6.17.i686-quilt/CREDITS === --- linux-2.6.17.i686-quilt.orig/CREDITS +++ linux-2.6.17.i686-quilt/CREDITS @@ -2383,6 +2383,13 @@ N: Thomas Molina E: [EMAIL PROTECTED] D: bug fixes, documentation, minor hackery +N: Paul Moore +E: [EMAIL PROTECTED] +D: NetLabel author +S: Hewlett-Packard +S: 110 Spit Brook Road +S: Nashua, NH 03062 + N: James Morris E: [EMAIL PROTECTED] W: http://namei.org/ Index: linux-2.6.17.i686-quilt/Documentation/00-INDEX === --- linux-2.6.17.i686-quilt.orig/Documentation/00-INDEX +++ linux-2.6.17.i686-quilt/Documentation/00-INDEX @@ -184,6 +184,8 @@ mtrr.txt - how to use PPro Memory Type Range Registers to increase performance. nbd.txt - info on a TCP implementation of a network block device. +netlabel/ + - directory with information on the NetLabel subsystem. networking/ - directory with info on various aspects of networking with Linux. nfsroot.txt Index: linux-2.6.17.i686-quilt/Documentation/netlabel/00-INDEX === --- /dev/null +++ linux-2.6.17.i686-quilt/Documentation/netlabel/00-INDEX @@ -0,0 +1,10 @@ +00-INDEX + - this file. +cipso_ipv4.txt + - documentation on the IPv4 CIPSO protocol engine. +draft-ietf-cipso-ipsecurity-01.txt + - IETF draft of the CIPSO protocol, dated 16 July 1992. +introduction.txt + - NetLabel introduction, READ THIS FIRST. 
+lsm_interface.txt + - documentation on the NetLabel kernel security module API. Index: linux-2.6.17.i686-quilt/Documentation/netlabel/cipso_ipv4.txt === --- /dev/null +++ linux-2.6.17.i686-quilt/Documentation/netlabel/cipso_ipv4.txt @@ -0,0 +1,48 @@ +NetLabel CIPSO/IPv4 Protocol Engine +== +Paul Moore, [EMAIL PROTECTED] + +May 17, 2006 + + * Overview + +The NetLabel CIPSO/IPv4 protocol engine is based on the IETF Commercial IP +Security Option (CIPSO) draft from July 16, 1992. A copy of this draft can be +found in this directory, consult '00-INDEX' for the filename. While the IETF +draft never made it to an RFC standard it has become a de-facto standard for +labeled networking and is used in many trusted operating systems. + + * Outbound Packet Processing + +The CIPSO/IPv4 protocol engine applies the CIPSO IP option to packets by +adding the CIPSO label to the socket. This causes all packets leaving the +system through the socket to have the CIPSO IP option applied. The socket's +CIPSO label can be changed at any point in time, however, it is recommended +that it is set upon the socket's creation. The LSM can set the socket's CIPSO +label by using the NetLabel security module API; if the NetLabel "domain" is +configured to use CIPSO for packet labeling then a CIPSO IP option will be +generated and attached to the socket. + + * Inbound Packet Processing + +The CIPSO/IPv4 protocol engine validates every CIPSO IP option it finds at the +IP layer without any special handling required by the LSM. However, in order +to decode and translate the CIPSO label on the packet the LSM must use the +NetLabel security module API to extract the security attributes of the packet. +This is typically done at the socket layer using the 'socket_sock_rcv_skb()' +LSM hook. + + * Label Translation + +The CIPSO/IPv4 protocol engine contains a mechanism to translate CIPSO security +attributes such as sensitivity level and category to values which are +appropriate for the host. 
These mappings are defined as part of a CIPSO +Domain Of Interpretation (DOI) definition and are configured through the +NetLabel user space communication layer. Each DOI definition can have a +different security attribute mapping table. + + * Label Translation Cache + +The NetLabel system provides a framework for caching security attribute +mappings from the network labels to the corresponding LSM identifiers. The +CIPSO/IPv4 protocol engine supports this ca
[RFC 3/8] NetLabel: CIPSOv4 engine
Add support for the Commercial IP Security Option (CIPSO) to the IPv4 network stack. CIPSO has become a de-facto standard for trusted/labeled networking amongst existing Trusted Operating Systems such as Trusted Solaris, HP-UX CMW, etc. This implementation is designed to be used with the NetLabel subsystem to provide explicit packet labeling to LSM developers. The CIPSO/IPv4 packet labeling works by the LSM calling a NetLabel API function which attaches a CIPSO label (IPv4 option) to a given socket; this in turn attaches the CIPSO label to every packet leaving the socket without any extra processing on the outbound side. On the inbound side the individual packet's sk_buff is examined through a call to a NetLabel API function to determine if a CIPSO/IPv4 label is present and if so the security attributes of the CIPSO label are returned to the caller of the NetLabel API function. --- net/ipv4/cipso_ipv4.c | 1749 ++ 1 files changed, 1749 insertions(+) Index: linux-2.6.17.i686-quilt/net/ipv4/cipso_ipv4.c === --- /dev/null +++ linux-2.6.17.i686-quilt/net/ipv4/cipso_ipv4.c @@ -0,0 +1,1749 @@ +/* + * CIPSO - Commercial IP Security Option + * + * This is an implementation of the CIPSO 2.2 protocol as specified in + * draft-ietf-cipso-ipsecurity-01.txt with additional tag types as found in + * FIPS-188, copies of both documents can be found in the Documentation + * directory. While CIPSO never became a full IETF RFC standard many vendors + * have chosen to adopt the protocol and over the years it has become a + * de-facto standard for labeled networking. + * + * Author: Paul Moore <[EMAIL PROTECTED]> + * + */ + +/* + * (c) Copyright Hewlett-Packard Development Company, L.P., 2006 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See + * the GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct cipso_v4_domhsh_entry { + char *domain; + u32 valid; + struct list_head list; + struct rcu_head rcu; +}; + +/* List of available DOI definitions */ +/* XXX - Updates should be minimal so having a single lock for the + cipso_v4_doi_list and the cipso_v4_doi_list->dom_list should be + okay. */ +/* XXX - This currently assumes a minimal number of different DOIs in use, + if in practice there are a lot of different DOIs this list should + probably be turned into a hash table or something similar so we + can do quick lookups. 
*/ +DEFINE_SPINLOCK(cipso_v4_doi_list_lock); +static struct list_head cipso_v4_doi_list = LIST_HEAD_INIT(cipso_v4_doi_list); + +/* Label mapping cache */ +#define CIPSO_V4_CACHE_BUCKETBITS 7 +#define CIPSO_V4_CACHE_BUCKETS(1 << CIPSO_V4_CACHE_BUCKETBITS) +#define CIPSO_V4_CACHE_BUCKETSIZE 10 +#define CIPSO_V4_CACHE_REORDERLIMIT 10 +/* PM - the number of cache buckets should probably be a compile time option */ +struct cipso_v4_map_cache_bkt { + spinlock_t lock; + u32 size; + struct list_head list; +}; +struct cipso_v4_map_cache_entry { + u32 hash; + unsigned char *key; + u32 key_len; + + struct netlbl_lsm_cache lsm_data; + + u32 activity; + struct list_head list; +}; +static u32 cipso_v4_cache_size = 0; +static struct cipso_v4_map_cache_bkt *cipso_v4_cache = NULL; +#define CIPSO_V4_CACHE_ENABLED (cipso_v4_cache_size > 0) + +/* + * Helper Functions + */ + +/** + * cipso_v4_bitmap_walk - Walk a bitmap looking for a bit + * @bitmap: the bitmap + * @bitmap_len: length in bits + * @offset: starting offset + * @state: if non-zero, look for a set (1) bit else look for a cleared (0) bit + * + * Description: + * Starting at @offset, walk the bitmap from left to right until either the + * desired bit is found or we reach the end. Return the bit offset, -1 if + * not found, or -2 if error. + */ +static int cipso_v4_bitmap_walk(const unsigned char *bitmap, + const u32 bitmap_len, + const u32 offset, + const u8 state) +{ + u32 bit_spot; + u32 byte_offset; + unsigned char bitmask; + unsigned char byte; + + /
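The kernel-doc for cipso_v4_bitmap_walk() above describes a left-to-right scan for the first set or cleared bit, but the function body is cut off in this copy. A minimal userspace sketch of that description follows. It assumes "left to right" means most significant bit first within each byte, and it assumes a NULL bitmap is the error case that yields -2; the name sketch_bitmap_walk is hypothetical and this is not the code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace sketch of the walk described in the kernel-doc above.
 * Assumptions (not taken from the patch): bit 0 is the most significant
 * bit of bitmap[0], and a NULL bitmap is reported as the -2 error case.
 *
 * Returns the offset of the first matching bit at or after @offset,
 * -1 if no such bit exists before @bitmap_len, or -2 on error.
 */
static int sketch_bitmap_walk(const unsigned char *bitmap,
			      uint32_t bitmap_len,
			      uint32_t offset,
			      uint8_t state)
{
	uint32_t bit;

	if (bitmap == NULL)
		return -2;

	for (bit = offset; bit < bitmap_len; bit++) {
		/* select the bit inside its byte, leftmost bit first */
		unsigned char mask = (unsigned char)(0x80u >> (bit % 8));
		int is_set = (bitmap[bit / 8] & mask) != 0;

		/* state != 0 looks for a set bit, state == 0 for a cleared bit */
		if (is_set == (state != 0))
			return (int)bit;
	}
	return -1;
}
```

With the two-byte bitmap { 0x00, 0x10 } and a length of 16 bits, a search for a set bit starting at offset 0 stops at bit 11, and a search starting at offset 12 finds nothing.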
[RFC 6/8] NetLabel: CIPSOv4 integration
Add CIPSO/IPv4 support and management to the NetLabel subsystem. These changes integrate the CIPSO/IPv4 configuration into the existing NetLabel code and enable the use of CIPSO/IPv4 within the overall NetLabel framework. --- net/netlabel/netlabel_cipso_v4.c | 634 +++ 1 files changed, 634 insertions(+) Index: linux-2.6.17.i686-quilt/net/netlabel/netlabel_cipso_v4.c === --- /dev/null +++ linux-2.6.17.i686-quilt/net/netlabel/netlabel_cipso_v4.c @@ -0,0 +1,634 @@ +/* + * NetLabel CIPSO/IPv4 Support + * + * This file defines the CIPSO/IPv4 functions for the NetLabel system. The + * NetLabel system manages static and dynamic label mappings for network + * protocols such as CIPSO and RIPSO. + * + * Author: Paul Moore <[EMAIL PROTECTED]> + * + */ + +/* + * (c) Copyright Hewlett-Packard Development Company, L.P., 2006 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See + * the GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "netlabel_user.h" +#include "netlabel_cipso_v4.h" + +/* NetLabel Generic NETLINK CIPSOv4 family */ +static struct genl_family netlbl_cipsov4_gnl_family = { + .id = GENL_ID_GENERATE, + .hdrsize = 0, + .name = NETLBL_NLTYPE_CIPSOV4_NAME, + .version = NETLBL_PROTO_VERSION, + .maxattr = 0, +}; + + +/* + * Local Prototypes + */ + +static void netlbl_cipsov4_send_ack(const struct genl_info *info, + const u32 ret_code); + + +/* + * Helper Functions + */ + +/** + * netlbl_cipsov4_doi_free - Frees a CIPSO V4 DOI definition + * @entry: the entry's RCU field + * + * Description: + * This function is designed to be used as a callback to the call_rcu() + * function so that the memory allocated to the DOI definition can be released + * safely. + * + */ +static void netlbl_cipsov4_doi_free(struct rcu_head *entry) +{ + struct cipso_v4_doi *ptr; + + ptr = container_of(entry, struct cipso_v4_doi, rcu); + switch (ptr->type) { + case CIPSO_V4_MAP_STD: + if (ptr->map.std->lvl.cipso_size > 0) + kfree(ptr->map.std->lvl.cipso); + if (ptr->map.std->lvl.local_size > 0) + kfree(ptr->map.std->lvl.local); + if (ptr->map.std->cat.cipso_size > 0) + kfree(ptr->map.std->cat.cipso); + if (ptr->map.std->cat.local_size > 0) + kfree(ptr->map.std->cat.local); + break; + } + kfree(ptr); +} + + +/* + * NetLabel Command Handlers + */ + +/** + * netlbl_cipsov4_add_std - Adds a CIPSO V4 DOI definition + * @doi: the DOI value + * @msg: the ADD message data + * @msg_size: the size of the ADD message buffer + * + * Description: + * Create a new CIPSO_V4_MAP_STD DOI definition based on the given ADD message + * and add it to the CIPSO V4 engine. 
Return zero on success and non-zero on + * error. + * + */ +static int netlbl_cipsov4_add_std(const u32 doi, + const unsigned char *msg, + const u32 msg_size) +{ + int ret_val = -EPERM; + unsigned char *msg_ptr = (unsigned char *)msg; + u32 msg_len = msg_size; + u32 num_tags; + u32 num_lvls; + u32 num_cats; + struct cipso_v4_doi *doi_def = NULL; + u32 iter; + u32 tmp_val_a; + u32 tmp_val_b; + + if (msg_len < 4) + goto add_std_failure; + num_tags = netlbl_getinc_u32(&msg_ptr); + msg_len -= 4; + if (num_tags == 0 || num_tags > CIPSO_V4_TAG_MAXCNT) + goto add_std_failure; + + doi_def = kmalloc(sizeof(*doi_def), GFP_KERNEL); + if (doi_def == NULL) { + ret_val = -ENOMEM; + goto add_std_failure; + } + doi_def->map.std = kzalloc(sizeof(*doi_def->map.std), + GFP_KERNEL); + if (doi_def->map.std == NULL) { + ret_val = -ENOMEM; + goto add_std_failure; + } + doi_def->type = CIPSO_V4_MAP_STD; + + if (msg_len < num_tags) + goto add_std_failure; + msg_len -= num_tags; + for (iter = 0; iter < n
[RFC 2/8] NetLabel: core network changes
Changes to the core network stack to support the NetLabel subsystem.  This
includes changes to the IPv4 option handling to support CIPSO labels, and a
new NetLabel hook in inet_accept() to handle NetLabel attributes across
accept()s done by in-kernel daemons.
---
 include/linux/ip.h       |    1
 include/net/cipso_ipv4.h |  251
 include/net/inet_sock.h  |    2
 include/net/netlabel.h   |  488 +++
 net/ipv4/Makefile        |    1
 net/ipv4/af_inet.c       |    3
 net/ipv4/ah4.c           |    2
 net/ipv4/ip_options.c    |   19 +
 8 files changed, 765 insertions(+), 2 deletions(-)

Index: linux-2.6.17.i686-quilt/include/linux/ip.h
===================================================================
--- linux-2.6.17.i686-quilt.orig/include/linux/ip.h
+++ linux-2.6.17.i686-quilt/include/linux/ip.h
@@ -57,6 +57,7 @@
 #define IPOPT_SEC	(2 |IPOPT_CONTROL|IPOPT_COPY)
 #define IPOPT_LSRR	(3 |IPOPT_CONTROL|IPOPT_COPY)
 #define IPOPT_TIMESTAMP	(4 |IPOPT_MEASUREMENT)
+#define IPOPT_CIPSO	(6 |IPOPT_CONTROL|IPOPT_COPY)
 #define IPOPT_RR	(7 |IPOPT_CONTROL)
 #define IPOPT_SID	(8 |IPOPT_CONTROL|IPOPT_COPY)
 #define IPOPT_SSRR	(9 |IPOPT_CONTROL|IPOPT_COPY)
Index: linux-2.6.17.i686-quilt/include/net/cipso_ipv4.h
===================================================================
--- /dev/null
+++ linux-2.6.17.i686-quilt/include/net/cipso_ipv4.h
@@ -0,0 +1,251 @@
+/*
+ * CIPSO - Commercial IP Security Option
+ *
+ * This is an implementation of the CIPSO 2.2 protocol as specified in
+ * draft-ietf-cipso-ipsecurity-01.txt with additional tag types as found in
+ * FIPS-188, copies of both documents can be found in the Documentation
+ * directory.  While CIPSO never became a full IETF RFC standard many vendors
+ * have chosen to adopt the protocol and over the years it has become a
+ * de-facto standard for labeled networking.
+ * + * Author: Paul Moore <[EMAIL PROTECTED]> + * + */ + +/* + * (c) Copyright Hewlett-Packard Development Company, L.P., 2006 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See + * the GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + */ + +#ifndef _CIPSO_IPV4_H +#define _CIPSO_IPV4_H + +#include +#include +#include +#include + +/* known doi values */ +#define CIPSO_V4_DOI_UNKNOWN 0x + +/* tag types */ +#define CIPSO_V4_TAG_INVALID 0 +#define CIPSO_V4_TAG_RBITMAP 1 +#define CIPSO_V4_TAG_ENUM 2 +#define CIPSO_V4_TAG_RANGE5 +#define CIPSO_V4_TAG_PBITMAP 6 +#define CIPSO_V4_TAG_FREEFORM 7 + +/* doi mapping types */ +#define CIPSO_V4_MAP_UNKNOWN 0 +#define CIPSO_V4_MAP_STD 1 +#define CIPSO_V4_MAP_PASS 2 + +/* limits */ +#define CIPSO_V4_MAX_REM_LVLS 256 +#define CIPSO_V4_INV_LVL 0x8000 +#define CIPSO_V4_MAX_LOC_LVLS (CIPSO_V4_INV_LVL - 1) +#define CIPSO_V4_MAX_REM_CATS 65536 +#define CIPSO_V4_INV_CAT 0x8000 +#define CIPSO_V4_MAX_LOC_CATS (CIPSO_V4_INV_CAT - 1) + +/* + * CIPSO DOI definitions + */ + +/* DOI definition struct */ +#define CIPSO_V4_TAG_MAXCNT 5 +struct cipso_v4_doi { + u32 doi; + u32 type; + union { + struct cipso_v4_std_map_tbl *std; + } map; + u8 tags[CIPSO_V4_TAG_MAXCNT]; + + u32 valid; + struct list_head list; + struct rcu_head rcu; + struct list_head dom_list; +}; + +/* Standard CIPSO mapping table */ +/* NOTE: the highest order bit 
(i.e. 0x8000) is an 'invalid' flag, if the + * bit is set then consider that value as unspecified, meaning the + * mapping for that particular level/category is invalid */ +struct cipso_v4_std_map_tbl { + struct { + u32 *cipso; + u32 *local; + u32 cipso_size; + u32 local_size; + } lvl; + struct { + u32 *cipso; + u32 *local; + u32 cipso_size; + u32 local_size; + } cat; +}; + +/* + * Helper Functions + */ + +#define CIPSO_V4_OPTEXIST(x) (IPCB(x)->opt.cipso != 0) +#define CIPSO_V4_OPTPTR(x) ((x)->nh.raw + IPCB(x)->opt.cipso) + +/* + * DOI List Functions + */ + +#ifdef CONFIG_NETLABEL +int cipso_v4_doi_add(struct cipso_v4_doi *doi_def); +int
Re: [PATCH 00/21] e1000: driver update to 7.1.9-k2
Jeff,

after comments I've made some adjustments. I'll list them below against
the old summary. The changes are available from our git-server:

Please pull from:

	git://lost.foo-projects.org/~ahkok/git/netdev-2.6 upstream

These patches are against
	netdev-2.6#upstream 612eff0e3715a6faff5ba1b74873b99e036c59fe
	(Brian Haley <[EMAIL PROTECTED]> / [PATCH] s2io: netpoll support)

Summary of patches:

[01]: fix loopback ethtool test
[02]: rework driver hardware reset locking
[03]: Make PHY powerup/down a function
[04]: fix CONFIG_PM blocks
[05]: small performance tweak by removing double code
[06]: add smart power down code
[07]: change printk into DPRINTK
[08]: recycle skb
[09]: rework module param code with uninitialized values
[10]: force register write flushes to circumvent broken platforms
      Unmodified. See comments here:
      http://marc.theaimsgroup.com/?l=linux-netdev&m=115142459725123&w=2 [1]
[11]: disable CRC stripping workaround
      Removed all references to SECRC (crc stripping) instead of
      leaving it commented.
[12]: fix adapter led blinking inconsistency
[13]: add E1000_BIG_ENDIAN symbol
      Dropped this patch entirely
[14]: M88 PHY workaround
[15]: check return value of _get_speed_and_duplex
[16]: disable ERT
[17]: add ich8lan core functions
[18]: integrate ich8 support into driver
[19]: allow user to disable ich8 lock loss workaround
[20]: add ich8lan device ID's
[21]: increase version to 7.1.9-k2

[1] I can drop #11 in case someone throws a fit ;)

- as everyone I'd really like to see patches 17->20 queued for 2.6.18
  for obvious reasons - this is the most important section of these
  patches!

Cheers,

Auke

---
 drivers/net/e1000/e1000.h         |   10
 drivers/net/e1000/e1000_ethtool.c |  143 +--
 drivers/net/e1000/e1000_hw.c      | 1770 +++---
 drivers/net/e1000/e1000_hw.h      |  398
 drivers/net/e1000/e1000_main.c    |  384 +---
 drivers/net/e1000/e1000_osdep.h   |   13
 drivers/net/e1000/e1000_param.c   |  213 ++--
 7 files changed, 2530 insertions(+), 401 deletions(-)
Re: [patch 2/6] [Network namespace] Network device sharing by view
On Tue, Jun 27, 2006 at 09:07:38AM -0700, Ben Greear wrote:
> Ben Greear wrote:
> >Herbert Poetzl wrote:
> >
> >>On Mon, Jun 26, 2006 at 03:13:17PM -0700, Ben Greear wrote:
> >
> >>yes, that sounds good to me, any numbers how that
> >>affects networking in general (performance wise and
> >>memory wise, i.e. caches and hashes) ...
> >
> >I'll run some tests later today. Based on my previous tests,
> >I don't remember any significant overhead.
>
> Here's a quick benchmark using my redirect devices (RDD). Each RDD
> comes in a pair...when you tx on one, the pkt is rx'd on the peer.
> The idea is that it is exactly like two physical ethernet interfaces
> connected by a cross-over cable.
>
> My test system is a 64-bit dual-core Intel system, 3.013 Ghz processor
> with 1GB RAM. Fairly standard stuff..it's one of the Shuttle XPC
> systems. Kernel is 2.6.16.16 (64-bit).
>
>
> Test setup is: rdd1 -- rdd2 [bridge] rdd3 -- rdd4
>
> I am using my proprietary module for the bridge logic...and the
> default bridge should be at least this fast. I am injecting 1514 byte
> packets on rdd1 and rdd4 with pktgen (bi-directional flow). My pktgen
> is also receiving the pkts and gathering stats.
>
> This setup sustains 1.7Gbps of generated and received traffic between
> rdd1 and rdd4.
>
> Running only the [bridge] between two 10/100/1000 ports on an Intel
> PCI-E NIC will sustain about 870Mbps (bi-directional) on this system,
> so the virtual devices are quite efficient, as suspected.
>
> I have not yet had time to benchmark the mac-vlans...hopefully later
> today.

hmm, maybe you could also benchmark loopback connections
(and their throughput) on your system?

my (not so fancy) PIII, 32bit, 2.6.17.1 seems to do
roughly 2Gbs on the loopback device (tested with dd
and netcat)

best,
Herbert

> Thanks,
> Ben
>
> --
> Ben Greear <[EMAIL PROTECTED]>
> Candela Technologies Inc  http://www.candelatech.com
Re: [patch 2/6] [Network namespace] Network device sharing by view
On Tue, Jun 27, 2006 at 10:19:23AM -0700, Ben Greear wrote:
> Eric W. Biederman wrote:
> >Herbert Poetzl <[EMAIL PROTECTED]> writes:
> >
> >
> >>On Tue, Jun 27, 2006 at 05:52:52AM -0600, Eric W. Biederman wrote:
> >>
> >>>Inside the containers I want all network devices named eth0!
> >>
> >>huh? even if there are two of them? also tun?
> >>
> >>I think you meant, you want to be able to have eth0 in
> >>_more_ than one guest where eth0 in a guest can also
> >>be/use/relate to eth1 on the host, right?
> >
> >
> >Right I want to have an eth0 in each guest where eth0 is
> >it's own network device and need have no relationship to
> >eth0 on the host.
>
> How does that help anything? Do you envision programs
> that make special decisions on whether the interface is
> called eth0 v/s eth151?

well, those poor folks who do not have ethernet
devices for networking :)

seriously, what I think Eric meant was that it
might be nice (especially for migration purposes)
to keep the device namespace completely virtualized
and not just isolated ...

I'm fine with that, as long as it does not add
overhead or complicate handling, and as far as I
can tell, it should not do that ...

best,
Herbert

> Ben
>
>
> --
> Ben Greear <[EMAIL PROTECTED]>
> Candela Technologies Inc  http://www.candelatech.com
Re: [PATCH REPOST 0/2][RFC] Network Event Notifier Mechanism
On Tue, Jun 27, 2006 at 09:31:57AM -0500, Steve Wise wrote:
>
> > I'd like to know more about what the RDMA device is going to do with this
> > information. I thought RDMA was for receiving packets? Most of the info
> > here pertains to transmission.
>
> RDMA Ethernet devices adhere to a set of protocols defined by the IETF.
> See the RDDP WG (http://www.ietf.org/html.charters/rddp-charter.html)
> for the Internet Drafts that define the protocols.

Would it be possible for you to give us a quick summary of the relevant
points?

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: Network namespaces a path to mergable code.
Andrey Savochkin wrote:
> On Tue, Jun 27, 2006 at 11:20:40AM -0600, Eric W. Biederman wrote:
>
>> Thinking about this I am going to suggest a slightly different direction
>> for get a patchset we can merge.
>>
>> First we concentrate on the fundamentals.
>> - How we mark a device as belonging to a specific network namespace.
>> - How we mark a socket as belonging to a specific network namespace.
>>
>
> I agree with the direction of your thoughts.
> I was trying to do a similar thing, define clear steps in network
> namespace merging.
>
> My first patchset covers devices but not sockets.
> The only difference from what you're suggesting is ipv4 routing.
> For me, it is not less important than devices and sockets. May be even
> more important, since routing exposes design deficiencies less obvious at
> socket level.
>

It sounds then like it would be a good start to have general socket
namespaces, if it would merge more easily - perhaps then network device
namespaces would fall into place more easily.

AIUI socket namespaces are also necessary for situations where you want
containers to share IP addresses. AIUI, PlanetLab do something like
this with a module atop of VServer already (but read
http://openvz.org/pipermail/devel/2006-June/000666.html for a proper
explanation from Mark Huang)

>> As part of the fundamentals we add a patch to the generic socket code
>> that by default will disable it for protocol families that do not indicate
>> support for handling network namespaces, on a non-default network namespace.
>>
>
> Fine
>
> Can you summarize you objections against my way of handling devices, please?
>

There were many objections, the major one being the patch was too large
for certainty of adequate review. Quoting what I perceived as a summary
from Eric:

> When I went through this, my patchset just added an explicit
> continue if the devices was not in the appropriate namespace.
> I actually prefer the multiple list implementation but at
> the same time I think it is harder to get a clean implementation
> out of it.

You offered to re-do the patch without separate lists - I suggest that
this go ahead. No-one should really care; splitting it out into
separate lists can then be considered a performance optimization for
later.

> And what was the typo you referred to in your letter to Kirill Korotaev?
>

I think this is the comment he refers to:

> These hunks should use for_each_netdev(ifp);

Both quotes are from http://lkml.org/lkml/2006/6/26/147

Though, in Kirill's defense, it seems a bit strange to expect him to
raise a fault that was just raised by Eric, in a reply to the message
where he raised it.

Sam.
Re: [NET]: Added GSO header verification
On Tue, Jun 27, 2006 at 01:46:35PM -0700, Michael Chan wrote:
> On Tue, 2006-06-27 at 22:07 +1000, Herbert Xu wrote:
>
> > [NET]: Added GSO header verification
> >
> > @@ -2166,10 +2166,14 @@ struct sk_buff *tcp_tso_segment(struct s
> > 	if (!pskb_may_pull(skb, thlen))
> > 		goto out;
> >
> > +	segs = NULL;
> > +	if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST))
> > +		goto out;
> > +
>
> This logic doesn't look right to me. Perhaps it's backwards and should
> be:
>
> 	if (!skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST))

Oops, you're absolutely right. Here is the fix.

[NET]: Fix logical error in skb_gso_ok

The test in skb_gso_ok is backwards. Noticed by Michael Chan
<[EMAIL PROTECTED]>.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 84b0f0d..efd1e2a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -994,12 +994,12 @@ static inline int skb_gso_ok(struct sk_b
 {
 	int feature = skb_shinfo(skb)->gso_size ?
 		      skb_shinfo(skb)->gso_type << NETIF_F_GSO_SHIFT : 0;
-	return (features & feature) != feature;
+	return (features & feature) == feature;
 }

 static inline int netif_needs_gso(struct net_device *dev, struct sk_buff *skb)
 {
-	return skb_gso_ok(skb, dev->features);
+	return !skb_gso_ok(skb, dev->features);
 }

 #endif /* __KERNEL__ */
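The corrected test above is a subset check: the packet is acceptable only when every feature bit it requires is present in the offered feature mask. A minimal userspace sketch of that logic follows. All names and constant values here (SKETCH_GSO_SHIFT, SKETCH_GSO_TCPV4, SKETCH_GSO_DODGY, sketch_gso_ok, sketch_needs_gso) are hypothetical stand-ins chosen for illustration, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel constants; values are illustrative. */
#define SKETCH_GSO_SHIFT 16u
#define SKETCH_GSO_TCPV4 (1u << 0)
#define SKETCH_GSO_DODGY (1u << 2)

/*
 * Mirrors the fixed skb_gso_ok(): a packet with gso_size == 0 requires
 * nothing; otherwise every bit of its shifted gso_type must be set in
 * the feature mask.
 */
static bool sketch_gso_ok(uint32_t gso_size, uint32_t gso_type,
			  uint32_t features)
{
	uint32_t required = gso_size ? (gso_type << SKETCH_GSO_SHIFT) : 0;

	return (features & required) == required;
}

/* Mirrors the fixed netif_needs_gso(): software GSO is needed exactly
 * when the device cannot take the packet as it is. */
static bool sketch_needs_gso(uint32_t gso_size, uint32_t gso_type,
			     uint32_t dev_features)
{
	return !sketch_gso_ok(gso_size, gso_type, dev_features);
}
```

With the original "!=" comparison both helpers returned the opposite answer, which is why the fix flips the comparison in one and adds the negation in the other rather than touching the caller in tcp_tso_segment().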
[PATCH Round 3 0/2][RFC] Network Event Notifier Mechanism
Round 3 Changes:

- changed netlink msg for neighbour change to (RTM_NEIGHUPD)
- added netlink msg for PMTU change events (RTM_ROUTEUPD)
- added netlink messages for redirect (RTM_DELROUTE + RTM_NEWROUTE)
- tested neighbour change events via netlink for ipv4 and ipv6.
- tested redirect change events via netlink for ipv4.

Round 2 Changes:

- cleaned up event structures per review feedback.
- began integration with netlink (see neighbour changes in patch 2).
- added IPv6 support.

TODO:

- review feedback changes, if any
- more testing
- retest with RDMA NIC

--

This patch implements a mechanism that allows interested clients to
register for notification of certain network events. The intended use
is to allow RDMA devices (linux/drivers/infiniband) to be notified of
neighbour updates, ICMP redirects, path MTU changes, and route changes.

The reason these devices need update events is because they typically
cache this information in hardware and need to be notified when this
information has been updated. For information on RDMA protocols, see:
http://www.ietf.org/html.charters/rddp-charter.html.

The key events of interest are:

- neighbour mac address change
- routing redirect (the next hop neighbour changes for a dst_entry)
- path mtu change (the path mtu for a dst_entry changes).
- route add/deletes

NOTE: These new netevents are also passed up to user space via netlink.

We would like to get this or similar functionality included in 2.6.19
and request comments.

This patchset consists of 2 patches:

1) New files implementing the Network Event Notifier
2) Core network changes to generate network event notifications

Signed-off-by: Tom Tucker <[EMAIL PROTECTED]>
Signed-off-by: Steve Wise <[EMAIL PROTECTED]>
[PATCH Round 3 1/2] Network Event Notifier Mechanism.
This patch uses notifier blocks to implement a network event
notifier mechanism.

Clients register their callback function by calling
register_netevent_notifier() like this:

	static struct notifier_block nb = {
		.notifier_call = my_callback_func
	};

	...

	register_netevent_notifier(&nb);
---

 include/net/netevent.h |   49 +++
 net/core/netevent.c    |   68
 2 files changed, 117 insertions(+), 0 deletions(-)

diff --git a/include/net/netevent.h b/include/net/netevent.h
new file mode 100644
index 000..22214c8
--- /dev/null
+++ b/include/net/netevent.h
@@ -0,0 +1,49 @@
+#ifndef _NET_EVENT_H
+#define _NET_EVENT_H
+
+/*
+ *	Generic netevent notifiers
+ *
+ *	Authors:
+ *	Tom Tucker <[EMAIL PROTECTED]>
+ *
+ *	Changes:
+ */
+
+#ifdef __KERNEL__
+
+#include
+
+/*
+ * Generic route info structure.
+ *
+ * Family	Data ptr type
+ *
+ * AF_INET	- struct fib_info *
+ * AF_INET6	- struct rt6_info *
+ * AF_DECnet	- struct dn_route *
+ */
+struct netevent_route_info {
+	u16 family;
+	void *data;
+};
+
+struct netevent_redirect {
+	struct dst_entry *old;
+	struct dst_entry *new;
+};
+
+enum netevent_notif_type {
+	NETEVENT_NEIGH_UPDATE = 1, /* arg is struct neighbour ptr */
+	NETEVENT_ROUTE_ADD,	   /* arg is struct netevent_route_info ptr */
+	NETEVENT_ROUTE_DEL,	   /* arg is struct netevent_route_info ptr */
+	NETEVENT_PMTU_UPDATE,	   /* arg is struct dst_entry ptr */
+	NETEVENT_REDIRECT,	   /* arg is struct netevent_redirect ptr */
+};
+
+extern int register_netevent_notifier(struct notifier_block *nb);
+extern int unregister_netevent_notifier(struct notifier_block *nb);
+extern int call_netevent_notifiers(unsigned long val, void *v);
+
+#endif
+#endif
diff --git a/net/core/netevent.c b/net/core/netevent.c
new file mode 100644
index 000..e995751
--- /dev/null
+++ b/net/core/netevent.c
@@ -0,0 +1,68 @@
+/*
+ *	Network event notifiers
+ *
+ *	Authors:
+ *	Tom Tucker <[EMAIL PROTECTED]>
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ *
+ *	Fixes:
+ */
+
+#include
+#include
+
+static ATOMIC_NOTIFIER_HEAD(netevent_notif_chain);
+
+/**
+ *	register_netevent_notifier - register a netevent notifier block
+ *	@nb: notifier
+ *
+ *	Register a notifier to be called when a netevent occurs.
+ *	The notifier passed is linked into the kernel structures and must
+ *	not be reused until it has been unregistered. A negative errno code
+ *	is returned on a failure.
+ */
+int register_netevent_notifier(struct notifier_block *nb)
+{
+	int err;
+
+	err = atomic_notifier_chain_register(&netevent_notif_chain, nb);
+	return err;
+}
+
+/**
+ *	unregister_netevent_notifier - unregister a netevent notifier block
+ *	@nb: notifier
+ *
+ *	Unregister a notifier previously registered by
+ *	register_netevent_notifier(). The notifier is unlinked from the
+ *	kernel structures and may then be reused. A negative errno code
+ *	is returned on a failure.
+ */
+
+int unregister_netevent_notifier(struct notifier_block *nb)
+{
+	return atomic_notifier_chain_unregister(&netevent_notif_chain, nb);
+}
+
+/**
+ *	call_netevent_notifiers - call all netevent notifier blocks
+ *	@val: value passed unmodified to notifier function
+ *	@v:   pointer passed unmodified to notifier function
+ *
+ *	Call all netevent notifier blocks. Parameters and return value
+ *	are as for notifier_call_chain().
+ */
+
+int call_netevent_notifiers(unsigned long val, void *v)
+{
+	return atomic_notifier_call_chain(&netevent_notif_chain, val, v);
+}
+
+EXPORT_SYMBOL_GPL(register_netevent_notifier);
+EXPORT_SYMBOL_GPL(unregister_netevent_notifier);
+EXPORT_SYMBOL_GPL(call_netevent_notifiers);
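To make the register / call / unregister flow of the patch above easier to follow, here is a small userspace model of a notifier chain. It is only a sketch: every name in it (struct sketch_notifier, sketch_register, sketch_unregister, sketch_call_chain, sketch_count_pmtu, the SKETCH_* constants) is hypothetical, it has none of the locking or RCU protection of the kernel's atomic notifier chains, and the -1 return for an unknown block is an assumption standing in for a negative errno.

```c
#include <stddef.h>

/* Hypothetical event numbers, loosely following enum netevent_notif_type. */
enum sketch_event {
	SKETCH_NEIGH_UPDATE = 1,
	SKETCH_PMTU_UPDATE  = 4,
};

#define SKETCH_NOTIFY_DONE 0
#define SKETCH_NOTIFY_OK   1

/* Userspace model of a notifier block: a callback plus a list link. */
struct sketch_notifier {
	int (*call)(struct sketch_notifier *nb, unsigned long event, void *arg);
	struct sketch_notifier *next;
};

static struct sketch_notifier *sketch_chain;

/* Link a block at the head of the chain; it must stay valid until removed. */
static int sketch_register(struct sketch_notifier *nb)
{
	nb->next = sketch_chain;
	sketch_chain = nb;
	return 0;
}

/* Unlink a block; returns -1 (assumed error value) if it was not registered. */
static int sketch_unregister(struct sketch_notifier *nb)
{
	struct sketch_notifier **p;

	for (p = &sketch_chain; *p != NULL; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			nb->next = NULL;
			return 0;
		}
	}
	return -1;
}

/* Pass the event and its argument, unmodified, to every registered block. */
static int sketch_call_chain(unsigned long event, void *arg)
{
	struct sketch_notifier *nb;
	int ret = SKETCH_NOTIFY_DONE;

	for (nb = sketch_chain; nb != NULL; nb = nb->next)
		ret = nb->call(nb, event, arg);
	return ret;
}

/* Example client: counts path MTU events and ignores everything else. */
static int sketch_pmtu_seen;

static int sketch_count_pmtu(struct sketch_notifier *nb, unsigned long event,
			     void *arg)
{
	(void)nb;
	(void)arg;
	if (event != SKETCH_PMTU_UPDATE)
		return SKETCH_NOTIFY_DONE;
	sketch_pmtu_seen++;
	return SKETCH_NOTIFY_OK;
}
```

A client in this model fills in the call pointer, registers the block once, and filters on the event value inside its callback, much as an RDMA driver would filter on the NETEVENT_* type before touching the argument pointer.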
[PATCH Round 3 2/2] Core network changes to support network event notification.
This patch adds netevent and netlink calls for neighbour change, route add/del, pmtu change, and routing redirect events. Netlink Details: Neighbour change events are broadcast as a new ndmsg type RTM_NEIGHUPD. Path mtu change events are broadcast as a new rtmsg type RTM_ROUTEUPD. Routing redirect events are broadcast as a pair of rtmsgs, RTM_DELROUTE and RTM_NEWROUTE. --- include/linux/rtnetlink.h |4 ++ net/core/Makefile |2 + net/core/neighbour.c | 37 --- net/ipv4/fib_semantics.c |9 + net/ipv4/route.c | 86 ++-- net/ipv6/route.c | 87 + 6 files changed, 213 insertions(+), 12 deletions(-) diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index facd9ee..340ca4f 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -35,6 +35,8 @@ #define RTM_NEWROUTE RTM_NEWROUTE #define RTM_DELROUTE RTM_DELROUTE RTM_GETROUTE, #define RTM_GETROUTE RTM_GETROUTE + RTM_ROUTEUPD, +#define RTM_ROUTEUPD RTM_ROUTEUPD RTM_NEWNEIGH= 28, #define RTM_NEWNEIGH RTM_NEWNEIGH @@ -42,6 +44,8 @@ #define RTM_NEWNEIGH RTM_NEWNEIGH #define RTM_DELNEIGH RTM_DELNEIGH RTM_GETNEIGH, #define RTM_GETNEIGH RTM_GETNEIGH + RTM_NEIGHUPD, +#define RTM_NEIGHUPD RTM_NEIGHUPD RTM_NEWRULE = 32, #define RTM_NEWRULERTM_NEWRULE diff --git a/net/core/Makefile b/net/core/Makefile index e9bd246..2645ba4 100644 --- a/net/core/Makefile +++ b/net/core/Makefile @@ -7,7 +7,7 @@ obj-y := sock.o request_sock.o skbuff.o obj-$(CONFIG_SYSCTL) += sysctl_net_core.o -obj-y += dev.o ethtool.o dev_mcast.o dst.o \ +obj-y += dev.o ethtool.o dev_mcast.o dst.o netevent.o \ neighbour.o rtnetlink.o utils.o link_watch.o filter.o obj-$(CONFIG_XFRM) += flow.o diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 50a8c73..bf70981 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -30,9 +30,11 @@ #include #include #include #include +#include #include #include #include +#include #define NEIGH_DEBUG 1 @@ -59,6 +61,7 @@ static void neigh_app_notify(struct neig #endif static int 
pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev); void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev); +static void rtm_neigh_change(struct neighbour *n); static struct neigh_table *neigh_tables; #ifdef CONFIG_PROC_FS @@ -755,6 +758,7 @@ #endif neigh->nud_state = NUD_STALE; neigh->updated = jiffies; neigh_suspect(neigh); + notify = 1; } } else if (state & NUD_DELAY) { if (time_before_eq(now, @@ -763,6 +767,7 @@ #endif neigh->nud_state = NUD_REACHABLE; neigh->updated = jiffies; neigh_connect(neigh); + notify = 1; next = neigh->confirmed + neigh->parms->reachable_time; } else { NEIGH_PRINTK2("neigh %p is probed.\n", neigh); @@ -820,6 +825,8 @@ #endif out: write_unlock(&neigh->lock); } + if (notify) + rtm_neigh_change(neigh); #ifdef CONFIG_ARPD if (notify && neigh->parms->app_probes) @@ -927,9 +934,7 @@ int neigh_update(struct neighbour *neigh { u8 old; int err; -#ifdef CONFIG_ARPD int notify = 0; -#endif struct net_device *dev; int update_isrouter = 0; @@ -949,9 +954,7 @@ #endif neigh_suspect(neigh); neigh->nud_state = new; err = 0; -#ifdef CONFIG_ARPD notify = old & NUD_VALID; -#endif goto out; } @@ -1023,9 +1026,7 @@ #endif if (!(new & NUD_CONNECTED)) neigh->confirmed = jiffies - (neigh->parms->base_reachable_time << 1); -#ifdef CONFIG_ARPD notify = 1; -#endif } if (new == old) goto out; @@ -1056,7 +1057,11 @@ out: (neigh->flags | NTF_ROUTER) : (neigh->flags & ~NTF_ROUTER); } + write_unlock_bh(&neigh->lock); + + if (notify) + rtm_neigh_change(neigh); #ifdef CONFIG_ARPD if (notify && neigh->parms->app_probes) neigh_app_notify(neigh); @@ -2370,9 +2375,27 @@ static void neigh_app_notify(struct neig NETLINK_CB(skb).dst_group = RTNLGRP_NEIGH; netlink_broadcast(rtnl, skb, 0, RTNLGRP_NEIGH, GFP_ATOMIC); } - #endif /* CONFIG_ARPD */ +static void rtm_neigh_change(struct neighbour *n) +{ + struct nlmsghdr *nlh; + int size = NLMSG_SPACE(sizeof(struct ndmsg) + 256); +
Re: [NET]: Added GSO header verification
On Tue, 2006-06-27 at 22:07 +1000, Herbert Xu wrote:

> [NET]: Added GSO header verification
>
> @@ -2166,10 +2166,14 @@ struct sk_buff *tcp_tso_segment(struct s
> 	if (!pskb_may_pull(skb, thlen))
> 		goto out;
>
> +	segs = NULL;
> +	if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST))
> +		goto out;
> +

This logic doesn't look right to me. Perhaps it's backwards and should
be:

	if (!skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST))
Re: Please pull 'upstream' branch of wireless-2.6
Michael Buesch wrote:
> On Tuesday 27 June 2006 22:06, Larry Finger wrote:
>> John,
>>
>> I would like to find a diplomatic solution to this impasse between Michael and Jeff, which is why
>> I'm writing to you privately. Michael is correct in that the loop in question will not usually delay
>
> private?

I meant it to be private, but screwed up.

>> long; however, on some hardware it takes longer than on his. On mine, I have seen delays as long as
>> 550 usec.
>
> What's the chip?

bcm43xx: Chip ID 0x4306, rev 0x2
bcm43xx: Number of cores: 6
bcm43xx: Core 0: ID 0x800, rev 0x2, vendor 0x4243, enabled
bcm43xx: Core 1: ID 0x812, rev 0x4, vendor 0x4243, disabled
bcm43xx: Core 2: ID 0x80d, rev 0x1, vendor 0x4243, enabled
bcm43xx: Core 3: ID 0x807, rev 0x1, vendor 0x4243, disabled
bcm43xx: Core 4: ID 0x804, rev 0x7, vendor 0x4243, enabled
bcm43xx: Core 5: ID 0x812, rev 0x4, vendor 0x4243, disabled
bcm43xx: Ignoring additional 802.11 core.
bcm43xx: Detected PHY: Version: 1, Type 2, Revision 1
bcm43xx: Detected Radio: ID: 2205017f (Manuf: 17f Ver: 2050 Rev: 2)

>> In any case, I think that the following code fragment would work and pass Jeff's criticism:
>>
>> for (i=5000; i; i--) {
>> ..
>> usleep(1);
>
> usleep? Can't find that in my kernel tree. In fact, I think the lowest
> possible sleep time depends on HZ and is 1msec on 1000HZ.

I meant udelay, of course.

> Additionally, we are holding a spinlock at this time, so it is not as easy
> as simply replacing udelay() by some sleeping function.

I know that.

>> This would make the worst-case delay be 5 msec, but would provide a cushion of 10X the longest I
>> have seen and should be safe.
>>
>> Do you have any suggestions on what should be done next?
>
> Leave it as is and find out why it takes so long for your strange card. ;)

I once offered you my second, duplicate card for testing, but never heard back. Do you have any ideas regarding diagnostics to see why it takes so long? Remember, this card used to time-out on the 1 second delay before the periodic work was restructured.
Larry
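The bounded polling loop discussed in this thread can be sketched in user space as follows. This is only an illustration under stated assumptions: `poll_ready`, `fake_read`, `fake_delay`, `struct fake_dev` and `IRQ_READY_BIT` are hypothetical stand-ins, since the driver's register read and `udelay()` are not available outside the kernel. The point it shows is that the worst-case busy-wait is the loop limit times the per-iteration delay, so 5000 polls at 1 usec give the 5 msec bound mentioned above.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the hardware access and the delay
 * primitive; in the driver these would be a register read and udelay(). */
typedef uint32_t (*read_reg_fn)(void *ctx);
typedef void (*delay_us_fn)(unsigned int usecs);

#define IRQ_READY_BIT 0x1u /* assumed bit value, for illustration only */

/* Poll until the ready bit is set or max_polls is exhausted.
 * Returns the number of delays taken before success, or -1 on timeout.
 * Worst-case busy-wait time is max_polls * step_us microseconds. */
static int poll_ready(read_reg_fn read_reg, void *ctx, delay_us_fn delay,
                      unsigned int max_polls, unsigned int step_us)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        if (read_reg(ctx) & IRQ_READY_BIT)
            return (int)i;
        delay(step_us);
    }
    return -1;
}

/* Simulated device: reports ready after a given number of reads. */
struct fake_dev {
    unsigned int reads_until_ready;
};

static uint32_t fake_read(void *ctx)
{
    struct fake_dev *d = ctx;

    if (d->reads_until_ready == 0)
        return IRQ_READY_BIT;
    d->reads_until_ready--;
    return 0;
}

/* Simulated delay: does not actually wait. */
static void fake_delay(unsigned int usecs)
{
    (void)usecs;
}
```

With a simulated device that needs 550 polls, `poll_ready(fake_read, &dev, fake_delay, 5000, 1)` succeeds well inside the limit; a device that never becomes ready makes it return -1 after 5000 iterations.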
Re: [PATCH Round 2 0/2][RFC] Network Event Notifier Mechanism
From: Steve Wise <[EMAIL PROTECTED]> Date: Tue, 27 Jun 2006 15:33:19 -0500 > From my experimentation with netlink, RTM_NEWROUTE and RTM_DELROUTE > messages do not get sent up for redirect events. I have, in fact, added > this with the new patch I'll send out soon. So either way I need to > change the IPv[46] code to generate a notification for redirects. With > the single NETEVENT_REDIRECT call, the RDMA driver can, in one sweep, > update all the connections. It seems more efficient. At the place > where I've hooked redirect, both the old route and the new route are > already created. Ok, let's see what it looks like.
Re: [PATCH] Export accept queue len of a TCP listening socket via rx_queue
From: Sridhar Samudrala <[EMAIL PROTECTED]> Date: Thu, 22 Jun 2006 10:38:17 -0700 > On Thu, 2006-06-22 at 10:50 +1000, Herbert Xu wrote: > > Sridhar Samudrala <[EMAIL PROTECTED]> wrote: > > >> > > >> What about using the same fields (rqueue/wqueue) as you did for /proc? > > > > > > I meant extending tcp_info structure to add new fields. I think the user > > > space also uses this structure. > > > > What about putting it into inet_idiag_msg.idiag_[rw]queue instead? > > OK. I was under the mistaken assumption that [rw]queue fields are exported > via tcp_info. This makes it pretty simple to support netlink users also. > Here is the updated patch. This looks fine. Applied, thanks a lot Sridhar.
Re: [PKT_SCHED]: PSCHED_TADD() and PSCHED_TADD2() can result,tv_usec >= 1000000
From: Shuya MAEDA <[EMAIL PROTECTED]> Date: Wed, 21 Jun 2006 09:16:03 +0900 > Thank you for the comment. > I made the patch that used the loop instead of the divide and modulus. > Are there any comments? Your email client has corrupted the patch, turning tab characters into spaces, and also turning lines containing only spaces into empty lines. Therefore, I cannot apply your patch; please send your patch properly so that I may apply it. Thank you.
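For readers following the subject line, the normalisation being discussed (keeping tv_usec below 1000000 with a loop rather than a divide and modulus) might look roughly like the sketch below. It is not the submitted patch, which is not shown in this message; `tv_add_usec` is a hypothetical helper and it assumes a non-negative delta.

```c
#include <sys/time.h>

#define USEC_PER_SEC 1000000L

/* Add `delta` microseconds (assumed >= 0) to *tv, then carry whole
 * seconds out of tv_usec so that 0 <= tv_usec < 1000000 afterwards.
 * Hypothetical helper for illustration, not the PSCHED_TADD macro. */
static void tv_add_usec(struct timeval *tv, long delta)
{
    tv->tv_usec += delta;
    while (tv->tv_usec >= USEC_PER_SEC) {
        tv->tv_usec -= USEC_PER_SEC;
        tv->tv_sec++;
    }
}
```

The loop runs once per whole second carried, so it is cheap when the added delta is small, which is presumably why it was preferred over a division here.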
Re: [NET]: Make illegal_highdma more anal
From: Herbert Xu <[EMAIL PROTECTED]> Date: Wed, 21 Jun 2006 09:49:38 +1000 > [NET]: Make illegal_highdma more anal > > Rather than having illegal_highdma as a macro when HIGHMEM is off, we > can turn it into an inline function that returns zero. This will catch > callers that give it bad arguments. > > Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> Looks sane, applied.
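The difference between the two forms can be illustrated with a small stand-alone sketch. The struct and function names below are hypothetical stubs, not the kernel's definitions; the only claim is the general C behaviour that a macro ignoring its arguments accepts anything, while an inline function gets its arguments type-checked.

```c
/* Hypothetical stand-ins for the kernel types. */
struct net_device_stub { int features; };
struct sk_buff_stub { int nr_frags; };

/* Macro form: the arguments are discarded without being examined, so a
 * call such as ILLEGAL_HIGHDMA_MACRO("oops", 42) compiles silently. */
#define ILLEGAL_HIGHDMA_MACRO(dev, skb) (0)

/* Inline form: same constant result, but the compiler now checks that
 * callers pass pointers of the expected types. */
static inline int illegal_highdma_stub(struct net_device_stub *dev,
                                       struct sk_buff_stub *skb)
{
    (void)dev;
    (void)skb;
    return 0;
}
```

Passing a string or an integer to `illegal_highdma_stub` would draw a compiler diagnostic, which is the kind of bad-argument detection the changelog refers to.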
Re: [PATCH Round 2 0/2][RFC] Network Event Notifier Mechanism
On Tue, 2006-06-27 at 13:21 -0700, David Miller wrote: > From: Steve Wise <[EMAIL PROTECTED]> > Date: Tue, 27 Jun 2006 15:19:08 -0500 > > > For an RDMA NIC, all this logic is in HW, which is why we need the event > > notification; to tell the HW to change its next hop information. > > Back to the route change notification, I still think you can > get what you need by just looking for the route delete. > > You can match if any RDMA connection is using the deleted > route, mark it "update pending" or something like that, > and when the you get the "new route" event you can walk the > "pending" list and try to relookup the route for those > connections. From my experimentation with netlink, RTM_NEWROUTE and RTM_DELROUTE messages do not get sent up for redirect events. I have, in fact, added this with the new patch I'll send out soon. So either way I need to change the IPv[46] code to generate a notification for redirects. With the single NETEVENT_REDIRECT call, the RDMA driver can, in one sweep, update all the connections. It seems more efficient. At the place where I've hooked redirect, both the old route and the new route are already created. Steve.
RE: [PATCH Round 2 0/2][RFC] Network Event Notifier Mechanism
[EMAIL PROTECTED] wrote: > From: Steve Wise <[EMAIL PROTECTED]> > Date: Tue, 27 Jun 2006 10:02:19 -0500 > >> For the RDMA kernel subsystem, however, we still need a specific >> event. We need both the old and new dst_entry struct ptrs to figure >> out which active connections were using the old dst_entry and should >> be updated to use the new dst_entry. > > This change isn't truly atomic from a kernel standpoint either. > > The new dst won't be selected by the socket until later, when > the socket tries to send something, notices the old dst is > obsolete, and looks up a new one. > > Your code could do the same thing. The request to "send something" is posted directly from user mode to a mapped memory ring that is reaped by the hardware. Having the hardware fault, report that fault, and wait for the host to update it with the new mapping is somewhat clumsy. It also won't work at all for existing hardware. The best you could do is to have the driver invalidate the old entry, then *presume* that the hardware will want the replacement and look that up, and then forward that answer to the hardware.
Re: [NET]: Added GSO header verification
From: Herbert Xu <[EMAIL PROTECTED]> Date: Tue, 27 Jun 2006 22:07:14 +1000 > This feature is only needed by Xen but most of the code here is useful > for other things like TCPv4 ECN support. > > [NET]: Added GSO header verification Looks sane, applied. Thanks Herbert.
Re: [PATCHSET] Towards accurate incoming interface information
From: Thomas Graf <[EMAIL PROTECTED]> Date: Tue, 27 Jun 2006 17:07:27 +0200 > * Thomas Graf <[EMAIL PROTECTED]> 2006-06-26 16:54 > > This patchset transforms skb->input_dev based on a device > > reference to skb->iif based on an interface index moving > > towards accurate iif information for routing and classification > > through the following changesets: > > Hold on with this, I haven't noticed this ifb device > go in and thus missed to update it. I'll post an > updated patch shortly Ok.
Re: [patch 1/1] netlink: encapsulate eff_cap usage within security framework
From: Stephen Smalley <[EMAIL PROTECTED]> Date: Mon, 26 Jun 2006 13:19:05 -0400 > This patch encapsulates the usage of eff_cap (in netlink_skb_params) within > the security framework by extending security_netlink_recv to include a > required > capability parameter and converting all direct usage of eff_caps outside > of the lsm modules to use the interface. It also updates the SELinux > implementation of the security_netlink_send and security_netlink_recv > hooks to take advantage of the sid in the netlink_skb_params struct. > This also enables SELinux to perform auditing of netlink capability checks. > Please apply, for 2.6.18 if possible. > > Signed-off-by: Darrel Goeddel <[EMAIL PROTECTED]> > Signed-off-by: Stephen Smalley <[EMAIL PROTECTED]> > Acked-by: James Morris <[EMAIL PROTECTED]> Applied, thanks a lot.
Re: [PATCH Round 2 0/2][RFC] Network Event Notifier Mechanism
From: Steve Wise <[EMAIL PROTECTED]> Date: Tue, 27 Jun 2006 10:02:19 -0500 > For the RDMA kernel subsystem, however, we still need a specific event. > We need both the old and new dst_entry struct ptrs to figure out which > active connections were using the old dst_entry and should be updated to > use the new dst_entry. This change isn't truly atomic from a kernel standpoint either. The new dst won't be selected by the socket until later, when the socket tries to send something, notices the old dst is obsolete, and looks up a new one. Your code could do the same thing.
Re: Please pull 'upstream' branch of wireless-2.6
On Tuesday 27 June 2006 22:06, Larry Finger wrote: > John, > > I would like to find a diplomatic solution to this impasse between Michael > and Jeff, which is why > I'm writing to you privately. Michael is correct in that the loop in question > will not usually delay private? > long; however, on some hardware it takes longer than on his. On mine, I have > seen delays as long as > 550 usec. What's the chip? > In any case, I think that the following code fragment would work and pass > Jeff's criticism: > > for (i=5000; i; i--) { > .. > usleep(1); usleep? Can't find that in my kernel tree. In fact, I think the lowest possible sleep time depends on HZ and is 1msec on 1000HZ. Additionally, we are holding a spinlock at this time, so it is not as easy as simply replacing udelay() by some sleeping function. > This would make the worst-case delay be 5 msec, but would provide a cushion > of 10X the longest I > have seen and should be safe. > > Do you have any suggestions on what should be done next? Leave it as is and find out why it takes so long for your strange card. ;) -- Greetings Michael. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH Round 2 0/2][RFC] Network Event Notifier Mechanism
On Tue, 2006-06-27 at 13:14 -0700, David Miller wrote: > This change isn't truly atomic from a kernel standpoint either. > > The new dst won't be selected by the socket until later, > when the socket tries to send something, notices the old dst > is obsolete, and looks up a new one. > > Your code could do the same thing. > For an RDMA NIC, all this logic is in HW, which is why we need the event notification; to tell the HW to change its next hop information.
Re: [PATCH Round 2 0/2][RFC] Network Event Notifier Mechanism
From: Steve Wise <[EMAIL PROTECTED]> Date: Tue, 27 Jun 2006 15:19:08 -0500 > For an RDMA NIC, all this logic is in HW, which is why we need the event > notification; to tell the HW to change its next hop information. Back to the route change notification, I still think you can get what you need by just looking for the route delete. You can match if any RDMA connection is using the deleted route, mark it "update pending" or something like that, and when you get the "new route" event you can walk the "pending" list and try to relookup the route for those connections.
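The suggestion above (flag connections on route delete, then retry the lookup on the "new route" event) can be sketched in plain C as follows. Everything here is hypothetical: `struct conn`, the integer route ids, `on_route_delete`, `on_route_new` and `fixed_lookup` are illustrative names, not kernel or RDMA driver interfaces, and the sketch walks the whole connection list instead of keeping a separate pending list.

```c
#include <stddef.h>

/* Hypothetical connection record; the real driver state is not shown
 * in this thread. */
struct conn {
    int route_id;       /* route currently used by this connection */
    int pending;        /* nonzero once that route has been deleted */
    struct conn *next;
};

/* Hypothetical route lookup: returns a new route id, or -1 if none. */
typedef int (*route_lookup_fn)(const struct conn *c, void *ctx);

/* "Route deleted" event: flag every connection that used the route.
 * Returns how many connections were marked. */
static int on_route_delete(struct conn *head, int deleted_route)
{
    int marked = 0;

    for (struct conn *c = head; c != NULL; c = c->next) {
        if (c->route_id == deleted_route && !c->pending) {
            c->pending = 1;
            marked++;
        }
    }
    return marked;
}

/* "New route" event: retry the lookup for each flagged connection.
 * Connections whose lookup fails stay pending. Returns the number
 * of connections that were updated. */
static int on_route_new(struct conn *head, route_lookup_fn lookup, void *ctx)
{
    int updated = 0;

    for (struct conn *c = head; c != NULL; c = c->next) {
        if (!c->pending)
            continue;
        int r = lookup(c, ctx);
        if (r >= 0) {
            c->route_id = r;
            c->pending = 0;
            updated++;
        }
    }
    return updated;
}

/* Trivial lookup for demonstration: always answers with *ctx. */
static int fixed_lookup(const struct conn *c, void *ctx)
{
    (void)c;
    return *(int *)ctx;
}
```

Whether this two-step scheme is sufficient depends on the delete and new-route events actually being generated for the case at hand, which is the point still under discussion in the thread.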
Re: Please pull 'upstream' branch of wireless-2.6
Michael Buesch wrote:
> On Tuesday 27 June 2006 21:33, John W. Linville wrote:
>> On Tue, Jun 27, 2006 at 06:31:01PM +0200, Michael Buesch wrote:
>>> On Tuesday 27 June 2006 18:12, Jeff Garzik wrote:
>>>> Michael Buesch wrote:
>>>>> So, I will submit a patch to lower the udelay(10) to udelay(1)
>>>>> and we can close the discussion? ;)
>>>>
>>>> No, that totally avoids my point. Your "otherwise idle machine" test is
>>>> probably nowhere near worst case in the field, for loops that can
>>>> potentially lock the CPU for a long time upon hardware fault. And then
>>>> there are the huge delays in specific functions that I pointed out...
>>>
>>> wtf are you requesting from me?
>>> 1) I proved you that the loop does only spin _once_ or even _less_.
>>> 2) If the hardware is faulty, the user must replace it.
>>>    Because, if the hardware is faulty, it can crash the whole
>>>    machine anyway, obviously.
>>>
>>> 3) There is no "huge delay". I proved it with my logs.
>>>    -> No CPU hog => Nothing to fix.
>>
>> Michael,
>>
>> I think Jeff's concern is that by using udelay you are busy-waiting.
>> And, the for loop limit of 10 means you could freeze the kernel
>> for up to a whole second. Granted that this won't happen very often
>
> s/very often/ever/
> It won't happen, as long as the driver is not buggy, or the device
> is hardware broken. So, if it happens, something has to be fixed.
> In fact, it did happen _never_ for me. If it triggers, the device
> does not work _at all_ anyway.
>
>> and in the grand scheme of things a second isn't all _that_ long,
>> but still it would be better to avoid a delay like that -- a second
>> could be the time it takes to avoid a meltdown at the nuclear power
>> plant. :-)
>>
>> Could you not use msleep instead of udelay (and scale the for loop
>> appropriately)? What would be the problem with that? It would get
>> rid of the busy waiting.
>
> Becauses it horribly _increases_ the delay.
> We "spin" for _at most_ 10 usecs here. Please always remember that.
> We are talking about a 10 usec delay here.
> And I already sent a patch to even reduce this to under 10 usec.
>
>> To be fair, this code was already in the driver and was only being
>> moved by this patch. Still, what better time to fix it than now? :-)
>
> If it ain't broken, don't fix it.
>
>> I'll go ahead and reshuffle wireless-2.6 to drop this patch. A new
>> patch that passes muster w/ Jeff will be most welcome! :-)
>
> A new patch won't appear, as there is no problem with this delay.
> Please don't drop anything and apply the following patch on top of it:

John,

I would like to find a diplomatic solution to this impasse between Michael and Jeff, which is why I'm writing to you privately. Michael is correct in that the loop in question will not usually delay long; however, on some hardware it takes longer than on his. On mine, I have seen delays as long as 550 usec. In any case, I think that the following code fragment would work and pass Jeff's criticism:

for (i=5000; i; i--) {
..
usleep(1);
}

This would make the worst-case delay be 5 msec, but would provide a cushion of 10X the longest I have seen and should be safe.

Do you have any suggestions on what should be done next?

Larry
Re: Please pull 'upstream' branch of wireless-2.6
On Tuesday 27 June 2006 21:33, John W. Linville wrote: > On Tue, Jun 27, 2006 at 06:31:01PM +0200, Michael Buesch wrote: > > On Tuesday 27 June 2006 18:12, Jeff Garzik wrote: > > > Michael Buesch wrote: > > > > So, I will submit a patch to lower the udelay(10) to udelay(1) > > > > and we can close the discussion? ;) > > > > > > No, that totally avoids my point. Your "otherwise idle machine" test is > > > probably nowhere near worst case in the field, for loops that can > > > potentially lock the CPU for a long time upon hardware fault. And then > > > there are the huge delays in specific functions that I pointed out... > > > > wtf are you requesting from me? > > 1) I proved you that the loop does only spin _once_ or even _less_. > > 2) If the hardware is faulty, the user must replace it. > >Because, if the hardware is faulty, it can crash the whole > >machine anyway, obviously. > > > > 3) There is no "huge delay". I proved it with my logs. > >-> No CPU hog => Nothing to fix. > > Michael, > > I think Jeff's concern is that by using udelay you are busy-waiting. > And, the for loop limit of 10 means you could freeze the kernel > for up to a whole second. Granted that this won't happen very often s/very often/ever/ It won't happen, as long as the driver is not buggy, or the device is hardware broken. So, if it happens, something has to be fixed. In fact, it did happen _never_ for me. If it triggers, the device does not work _at all_ anyway. > and in the grand scheme of things a second isn't all _that_ long, > but still it would be better to avoid a delay like that -- a second > could be the time it takes to avoid a meltdown at the nuclear power > plant. :-) > > Could you not use msleep instead of udelay (and scale the for loop > appropriately)? What would be the problem with that? It would get > rid of the busy waiting. Becauses it horribly _increases_ the delay. We "spin" for _at most_ 10 usecs here. Please always remember that. 
We are talking about a 10 usec delay here. And I already sent a patch to even reduce this to under 10 usec. > To be fair, this code was already in the driver and was only being > moved by this patch. Still, what better time to fix it than now? :-) If it ain't broken, don't fix it. > I'll go ahead and reshuffle wireless-2.6 to drop this patch. A new > patch that passes muster w/ Jeff will be most welcome! :-) A new patch won't appear, as there is no problem with this delay. Please don't drop anything and apply the following patch on top of it: -- Microoptimization: This reduces the udelay in bcm43xx_mac_suspend. Signed-off-by: Michael Buesch <[EMAIL PROTECTED]> Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c === --- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c 2006-06-27 17:47:24.0 +0200 +++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c2006-06-27 17:53:29.0 +0200 @@ -2328,7 +2328,7 @@ tmp = bcm43xx_read32(bcm, BCM43xx_MMIO_GEN_IRQ_REASON); if (tmp & BCM43xx_IRQ_READY) goto out; - udelay(10); + udelay(1); } printkl(KERN_ERR PFX "MAC suspend failed\n"); } -- Greetings Michael. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [redhat-lspp] Re: [RFC 3/7] NetLabel: CIPSOv4 engine
On Mon, Jun 26, 2006 at 08:33:57PM -0400, James Morris wrote: > On Mon, 26 Jun 2006, Joe Nall wrote: > > For all of the EAL4 LSPP Linux evaluation work is being done by Red > > Hat/IBM/HP/atsec and others to be useful to integrators, there has to be > > basic > > (e.g. CIPSO) multilevel network interoperability with existing multilevel > > systems and good (e.g IPSec) multilevel networking between SELinux systems. > > Just to be clear, my understanding is that the native xfrm labeling is > suitable for LSPP evaluation, as distinct from CIPSO being desired by > system integrators from an interoperability point of view. It's not quite that distinct, the two solutions overlap in some areas but neither can replace the other. CIPSO would also be suitable for LSPP evaluation since it is capable of exporting and importing labeled data. It requires a trusted network since it doesn't encrypt or authenticate, so the evaluation would need to restrict the environment accordingly. The native IPSEC/xfrm approach is useful for more hostile environments where you can't fully trust the network, but it's not interoperable with existing deployed systems so it's not a replacement for CIPSO. From an evaluation point of view, either CIPSO or IPSEC/xfrm would be able to meet LSPP requirements but with different restrictions on the environment. -Klaus
Re: Please pull 'upstream' branch of wireless-2.6
On Tue, Jun 27, 2006 at 06:31:01PM +0200, Michael Buesch wrote: > On Tuesday 27 June 2006 18:12, Jeff Garzik wrote: > > Michael Buesch wrote: > > > So, I will submit a patch to lower the udelay(10) to udelay(1) > > > and we can close the discussion? ;) > > > > No, that totally avoids my point. Your "otherwise idle machine" test is > > probably nowhere near worst case in the field, for loops that can > > potentially lock the CPU for a long time upon hardware fault. And then > > there are the huge delays in specific functions that I pointed out... > > wtf are you requesting from me? > 1) I proved you that the loop does only spin _once_ or even _less_. > 2) If the hardware is faulty, the user must replace it. >Because, if the hardware is faulty, it can crash the whole >machine anyway, obviously. > > 3) There is no "huge delay". I proved it with my logs. >-> No CPU hog => Nothing to fix. Michael, I think Jeff's concern is that by using udelay you are busy-waiting. And, the for loop limit of 10 means you could freeze the kernel for up to a whole second. Granted that this won't happen very often and in the grand scheme of things a second isn't all _that_ long, but still it would be better to avoid a delay like that -- a second could be the time it takes to avoid a meltdown at the nuclear power plant. :-) Could you not use msleep instead of udelay (and scale the for loop appropriately)? What would be the problem with that? It would get rid of the busy waiting. To be fair, this code was already in the driver and was only being moved by this patch. Still, what better time to fix it than now? :-) I'll go ahead and reshuffle wireless-2.6 to drop this patch. A new patch that passes muster w/ Jeff will be most welcome! :-) Thanks, John -- John W. Linville [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] bcm43xx: opencoded locking
As many people don't seem to like the locking "obfuscation" in the bcm43xx driver, this patch removes it. Signed-off-by: Michael Buesch <[EMAIL PROTECTED]> Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx.h === --- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx.h2006-06-27 17:47:24.0 +0200 +++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx.h 2006-06-27 20:44:27.0 +0200 @@ -647,6 +647,19 @@ #define bcm43xx_status(bcm)atomic_read(&(bcm)->init_status) #define bcm43xx_set_status(bcm, stat) atomic_set(&(bcm)->init_status, (stat)) +/**** THEORY OF LOCKING *** + * + * We have two different locks in the bcm43xx driver. + * => bcm->mutex:General sleeping mutex. Protects struct bcm43xx_private + * and the device registers. This mutex does _not_ protect + * against concurrency from the IRQ handler. + * => bcm->irq_lock: IRQ spinlock. Protects against IRQ handler concurrency. + * + * Please note that, if you only take the irq_lock, you are not protected + * against concurrency from the periodic work handlers. + * Most times you want to take _both_ locks. + */ + struct bcm43xx_private { struct ieee80211_device *ieee; struct ieee80211softmac_device *softmac; @@ -657,7 +670,6 @@ void __iomem *mmio_addr; - /* Locking, see "theory of locking" text below. */ spinlock_t irq_lock; struct mutex mutex; @@ -689,6 +701,7 @@ struct bcm43xx_sprominfo sprom; #define BCM43xx_NR_LEDS4 struct bcm43xx_led leds[BCM43xx_NR_LEDS]; + spinlock_t leds_lock; /* The currently active core. */ struct bcm43xx_coreinfo *current_core; @@ -759,55 +772,6 @@ }; -/**** THEORY OF LOCKING *** - * - * We have two different locks in the bcm43xx driver. - * => bcm->mutex:General sleeping mutex. Protects struct bcm43xx_private - * and the device registers. - * => bcm->irq_lock: IRQ spinlock. Protects against IRQ handler concurrency. - * - * We have three types of helper function pairs to utilize these locks. - * (Always use the helper functions.) 
- * 1) bcm43xx_{un}lock_noirq(): - * Takes bcm->mutex. Does _not_ protect against IRQ concurrency, - * so it is almost always unsafe, if device IRQs are enabled. - * So only use this, if device IRQs are masked. - * Locking may sleep. - * You can sleep within the critical section. - * 2) bcm43xx_{un}lock_irqonly(): - * Takes bcm->irq_lock. Does _not_ protect against - * bcm43xx_lock_noirq() critical sections. - * Does only protect against the IRQ handler path and other - * irqonly() critical sections. - * Locking does not sleep. - * You must not sleep within the critical section. - * 3) bcm43xx_{un}lock_irqsafe(): - * This is the cummulative lock and takes both, mutex and irq_lock. - * Protects against noirq() and irqonly() critical sections (and - * the IRQ handler path). - * Locking may sleep. - * You must not sleep within the critical section. - */ - -/* Lock type 1 */ -#define bcm43xx_lock_noirq(bcm)mutex_lock(&(bcm)->mutex) -#define bcm43xx_unlock_noirq(bcm) mutex_unlock(&(bcm)->mutex) -/* Lock type 2 */ -#define bcm43xx_lock_irqonly(bcm, flags) \ - spin_lock_irqsave(&(bcm)->irq_lock, flags) -#define bcm43xx_unlock_irqonly(bcm, flags) \ - spin_unlock_irqrestore(&(bcm)->irq_lock, flags) -/* Lock type 3 */ -#define bcm43xx_lock_irqsafe(bcm, flags) do { \ - bcm43xx_lock_noirq(bcm);\ - bcm43xx_lock_irqonly(bcm, flags); \ - } while (0) -#define bcm43xx_unlock_irqsafe(bcm, flags) do {\ - bcm43xx_unlock_irqonly(bcm, flags); \ - bcm43xx_unlock_noirq(bcm); \ - } while (0) - - static inline struct bcm43xx_private * bcm43xx_priv(struct net_device *dev) { Index: wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_debugfs.c === --- wireless-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_debugfs.c 2006-06-24 22:13:44.0 +0200 +++ wireless-2.6/drivers/net/wireless/bcm43xx/bcm43xx_debugfs.c 2006-06-27 20:44:27.0 +0200 @@ -77,7 +77,8 @@ down(&big_buffer_sem); - bcm43xx_lock_irqsafe(bcm, flags); + mutex_lock(&bcm->mutex); + spin_lock_irqsave(&bcm->irq_lock, flags); if 
(bcm43xx_status(bcm) != BCM43xx_STAT_INITIALIZED) { fappend("Board not initialized.\n"); goto out; @@ -121,7 +122,8 @@ fappend("\n"); out: - bcm43xx_unlock_irqsafe(bcm, flags); + spin_unlock_irqrestore(&bcm->irq_lock, flags); + mutex_unlock(&bcm->mutex); res = simple_read_from_buffer(userbuf, count, ppos, buf, pos); up(&big_buffer_sem); return res; @@ -1
Re: tg3 driver and interrupt coalescence questions
Rick Jones wrote: > > Are you looking to increase or decrease the settings? I would think > (initially at least) that for VOIP one might not want to increase them. > > rick jones I'm looking to decrease the interrupt load on the system. During the test I mentioned above I had some interesting and confusing results. The changes from the default settings to the settings I posted resulted in a 100% performance increase (counted by the number of VoIP audio streams the tested server could support). With default settings one of the two CPUs in the system maxed out at 99% cpu usage handling interrupts, while the second CPU was not maxed out, but we started to drop packets and the VoIP call setups started showing retransmits (which is the measurement for failure in this test) at about 300 streams. With the new settings we were able to hit 600 streams. So I definitely recognized a significant improvement. However I'd still like to get more improvement. At 600 streams and 20ms packets we are looking at 30,000 pps. The % of cpu (1 CPU as apparently the interrupts can't be shared across multiple CPUs) used for interrupt handling at this 600 stream limit was 88.0%. Now what was interesting was on the test generation side (same hardware exactly) of things, I was using the SIPP software to generate the VoIP streams, and each blade in the blade server was only able to generate ~200 streams, with default settings in ethtool, one of the CPUs would hit max usage for interrupt handling at that point. So I modified the ethtool settings to match those I listed above and there was no discernible difference. It was identical performance to the default settings. Michael's response clarified for me what the actual parameters in the -C section of ethtool do, thanks Michael. However I'd be greatly appreciative of any recommendations anyone might have for interrupt mitigation settings for 100% UDP RTP traffic of 20ms packets (50 pps per stream).
-Chris
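The packet-rate figures in this message follow from simple arithmetic, sketched below. The helper names are hypothetical, and `irq_rate` is only a rough model that assumes the NIC raises one interrupt per fixed number of frames and ignores any coalescing timer, so it should not be read as a description of what the tg3 hardware actually does.

```c
/* Packets per second for one stream, given the packetisation interval
 * in milliseconds (20 ms packets give 50 pps). */
static unsigned int pps_per_stream(unsigned int packet_ms)
{
    return 1000u / packet_ms;
}

/* Aggregate packets per second, in one direction, for all streams. */
static unsigned int total_pps(unsigned int streams, unsigned int packet_ms)
{
    return streams * pps_per_stream(packet_ms);
}

/* Rough interrupts per second if one interrupt is raised for every
 * frames_per_irq frames (rounded up). Simplifying assumption only. */
static unsigned int irq_rate(unsigned int pps, unsigned int frames_per_irq)
{
    return (pps + frames_per_irq - 1) / frames_per_irq;
}
```

For example, `total_pps(600, 20)` gives the 30,000 pps quoted above, and under the stated assumption batching 10 frames per interrupt would cut that to about 3,000 interrupts per second, at the price of added per-packet latency.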
Re: tg3 driver and interrupt coalescence questions
Chris A. Icide wrote:
> I've been digging around trying to get some information on the current
> status of interrupt mitigation features for a Broadcom 5704 interface.
> Specifically I'm sending and receiving lots of VoIP packets (50 pps
> per stream, many streams). What I can't seem to determine is this:
>
> What version of the linux kernel & tg3 drivers are required to support
> both rx and tx mitigation?
>
> What do the ethtool coalescence settings actually do (I've not been

Delay interrupts and increase individual packet latency with the intention being decreasing CPU utilization and allowing a higher aggregate packet per second limit. IE bandwidth vs latency tradeoffs.

> able to find actual descriptions of the different parameters in the -C
> section)
>
> Is there anything special that needs to be done when compiling a kernel
> to enable this feature for both the kernel and the tg3 driver.

Are you looking to increase or decrease the settings? I would think (initially at least) that for VOIP one might not want to increase them.

rick jones
Re: [PATCH]: e1000: Janitor: Use #defined values for literals
Linas Vepstas wrote:
> On Fri, Jun 23, 2006 at 01:07:21PM -0700, Auke Kok wrote:
>> Linas Vepstas wrote:
>>> Minor janitorial patch: use #defines for literal values.
>>> + pci_enable_wake(pdev, PCI_D3hot, 0);
>>> + pci_enable_wake(pdev, PCI_D3cold, 0);
>>
>> I Acked this but that's silly - the patches sent yesterday already change
>> the code above and this patch is no longer needed (thanks Jesse for
>> spotting this).
>>
>> This patch would conflict with them so please don't apply.
>
> Maybe there's a backlog in the queue, but I note this is not yet in 2.6.17-mm3

It's part of the submission for 2.6.18 I sent to jgarzik on friday, which cleans up this section in the way.

Auke
Re: Network namespaces a path to mergable code.
Eric,

On Tue, Jun 27, 2006 at 11:20:40AM -0600, Eric W. Biederman wrote:
>
> Thinking about this I am going to suggest a slightly different direction
> to get a patchset we can merge.
>
> First we concentrate on the fundamentals.
> - How we mark a device as belonging to a specific network namespace.
> - How we mark a socket as belonging to a specific network namespace.

I agree with the direction of your thoughts. I was trying to do a similar thing: define clear steps in network namespace merging.

My first patchset covers devices but not sockets. The only difference from what you're suggesting is ipv4 routing. For me, it is no less important than devices and sockets. Maybe even more important, since routing exposes design deficiencies less obvious at socket level.

>
> As part of the fundamentals we add a patch to the generic socket code
> that by default will disable it for protocol families that do not indicate
> support for handling network namespaces, on a non-default network namespace.

Fine

Can you summarize your objections against my way of handling devices, please? And what was the typo you referred to in your letter to Kirill Korotaev?

Regards

Andrey
Re: tg3 driver and interrupt coalescence questions
On Tue, 2006-06-27 at 10:16 -0700, Chris A. Icide wrote:

> What version of the linux kernel & tg3 drivers are required to support both
> rx and tx mitigation?

ethtool -C for tg3 was added around July of 2005. The version with this change added was 3.33.

> What do the ethtool coalescence settings actually do (I've not been able to
> find actual descriptions of the different parameters in the -C section)

They set the delay between the tx and rx events and the generation of interrupts for those events. These are the only parameters that are relevant for tg3:

rx-frames[-irq]
rx-usecs[-irq]
tx-frames[-irq]
tx-usecs[-irq]

The frames parameters specify how many packets are received/transmitted before generating an interrupt. The usecs parameters specify how many microseconds after at least 1 packet is received/transmitted before generating an interrupt. The [-irq] parameters are the corresponding delays in updating the status when the interrupt is disabled.

> Is there anything special that needs to be done when compiling a kernel to
> enable this feature for both the kernel and the tg3 driver.

No.

> 05:01.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704S
> Gigabit Ethernet (rev 10)
> Subsystem: IBM: Unknown device 02e8
> Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 201
> Memory at dcfe (64-bit, non-prefetchable) [size=64K]
> Capabilities: [40] PCI-X non-bridge device.
> Capabilities: [48] Power Management version 2
> Capabilities: [50] Vital Product Data
> Capabilities: [58] Message Signalled Interrupts: 64bit+
> Queue=0/3 Enable-
>
> Linux version 2.6.9-34.ELsmp ([EMAIL PROTECTED]) (gcc version
> 3.4.5 20051201 (Red Hat 3.4.5-2)) #1 SMP Thu Mar 9 06:23:23 GMT 2006
>
> [EMAIL PROTECTED] ~]# ethtool -c eth1
> Coalesce parameters for eth1:
> Adaptive RX: off TX: off
> stats-block-usecs: 100
> sample-interval: 0
> pkt-rate-low: 0
> pkt-rate-high: 0
>
> rx-usecs: 500
> rx-frames: 30
> rx-usecs-irq: 500
> rx-frames-irq: 20
>

This means that the first interrupt will be generated after 30 packets are received or 500 microseconds after the nth packet is received (1 <= n < 30). When irq is disabled, it takes 20 packets instead of 30 before updating status.

> tx-usecs: 400
> tx-frames: 53
> tx-usecs-irq: 490
> tx-frames-irq: 5

The first tx interrupt will be generated after 53 packets are transmitted or 400 microseconds after the nth packet is transmitted (1 <= n < 53). When irq is disabled, it takes 5 packets or 490 microsecs before updating status.

If the condition for generating a tx or rx interrupt is met, you get all the accumulated tx and rx status during the interrupt.

Hope this helps.
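Editorial sketch: the frames/usecs rule described above can be modelled in a few lines. This is a toy model and not the tg3 hardware or driver logic. It assumes the timer starts when the first packet of a batch arrives (one reading of "after at least 1 packet is received"), it ignores the [-irq] variants, and the function name count_interrupts is made up for this illustration.

```python
def count_interrupts(arrivals_us, frames, usecs):
    """Count interrupts for sorted packet arrival times given in microseconds.

    Assumed model: an interrupt fires once `frames` packets are pending, or
    `usecs` microseconds after the first pending packet arrived, whichever
    comes first. `frames` must be at least 1.
    """
    interrupts = 0
    pending = 0
    deadline = None
    for t in arrivals_us:
        # The timer expired before this packet arrived: flush the old batch.
        if pending and t >= deadline:
            interrupts += 1
            pending = 0
        if pending == 0:
            deadline = t + usecs  # first packet of a new batch starts the timer
        pending += 1
        if pending >= frames:  # frame-count threshold reached
            interrupts += 1
            pending = 0
    if pending:
        interrupts += 1  # the timer eventually fires for the final batch
    return interrupts


# One second of evenly spaced traffic at 30,000 pps (about 33 us apart).
one_second = [i * 1_000_000 / 30000 for i in range(30000)]
print("rx-frames=1:                ", count_interrupts(one_second, 1, 500))
print("rx-frames=30, rx-usecs=500: ", count_interrupts(one_second, 30, 500))
```

Under this model the 500 us timer expires before 30 frames ever accumulate at 30,000 pps, so the usecs setting, not the frames setting, determines the interrupt rate. Real traffic is bursty and real hardware may behave differently, so treat the numbers as a rough guide only.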
Re: [PATCH]: e1000: Janitor: Use #defined values for literals
On Fri, Jun 23, 2006 at 01:07:21PM -0700, Auke Kok wrote:
> Linas Vepstas wrote:
> >Minor janitorial patch: use #defines for literal values.
> >+pci_enable_wake(pdev, PCI_D3hot, 0);
> >+pci_enable_wake(pdev, PCI_D3cold, 0);
>
> I Acked this but that's silly - the patches sent yesterday already change
> the code above and this patch is no longer needed (thanks Jesse for
> spotting this).
>
> This patch would conflict with them so please don't apply.

Maybe there's a backlog in the queue, but I note this is not yet in 2.6.17-mm3

--linas
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
Russell Stuart wrote:
> Without seeing your actual proposal it is difficult to
> judge whether this is a reasonable trade-off or not.
> Hopefully we will see your code soon. Do you have any
> idea when?

Probably not today, I'll try to get it into shape by tomorrow.
tg3 driver and interrupt coalescence questions
I've been digging around trying to get some information on the current status of interrupt mitigation features for a Broadcom 5704 interface. Specifically I'm sending and receiving lots of VoIP packets (50 pps per stream, many streams).

What I can't seem to determine is this:

What version of the linux kernel & tg3 drivers are required to support both rx and tx mitigation?

What do the ethtool coalescence settings actually do (I've not been able to find actual descriptions of the different parameters in the -C section)

Is there anything special that needs to be done when compiling a kernel to enable this feature for both the kernel and the tg3 driver.

Just a warning, I'm not a C coder, so I've not had much luck digging around the code and looking for answers.

I've currently got a blade server with 10 blades. I'm using 9 blades to generate this small packet, high rate traffic to the 10th blade and trying to improve the ability of a blade to handle VoIP traffic. I made some guesses at settings for the -C options in ethtool on both the test blade and the traffic generators. Interestingly it seems to have had a very good effect on the test blade (%cpu for interrupt down from 99.9% to ~20%), but the same settings on the traffic generation servers seem to have had no effect. Hardware is identical, kernel is identical.

Any help is GREATLY appreciated.

-Chris

05:01.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet (rev 10)
        Subsystem: IBM: Unknown device 02e8
        Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 201
        Memory at dcfe (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40] PCI-X non-bridge device.
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-

Linux version 2.6.9-34.ELsmp ([EMAIL PROTECTED]) (gcc version 3.4.5 20051201 (Red Hat 3.4.5-2)) #1 SMP Thu Mar 9 06:23:23 GMT 2006

[EMAIL PROTECTED] ~]# ethtool -c eth1
Coalesce parameters for eth1:
Adaptive RX: off  TX: off
stats-block-usecs: 100
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 500
rx-frames: 30
rx-usecs-irq: 500
rx-frames-irq: 20

tx-usecs: 400
tx-frames: 53
tx-usecs-irq: 490
tx-frames-irq: 5

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

[EMAIL PROTECTED] ~]# ethtool -i eth1
driver: tg3
version: 3.43-rh
firmware-version:
bus-info: :05:01.1

[EMAIL PROTECTED] ~]# ethtool eth1
Settings for eth1:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Half 1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x00ff (255)
        Link detected: yes
Re: [patch 2/6] [Network namespace] Network device sharing by view
Eric W. Biederman wrote:
> Herbert Poetzl <[EMAIL PROTECTED]> writes:
>> On Tue, Jun 27, 2006 at 05:52:52AM -0600, Eric W. Biederman wrote:
>>> Inside the containers I want all network devices named eth0!
>>
>> huh? even if there are two of them? also tun?
>>
>> I think you meant, you want to be able to have eth0 in
>> _more_ than one guest where eth0 in a guest can also
>> be/use/relate to eth1 on the host, right?
>
> Right I want to have an eth0 in each guest where eth0 is its own
> network device and need have no relationship to eth0 on the host.

How does that help anything? Do you envision programs that make special decisions on whether the interface is called eth0 vs. eth151?

Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc http://www.candelatech.com
[RFC] Network namespaces a path to mergable code.
Thinking about this I am going to suggest a slightly different direction to get a patchset we can merge.

First we concentrate on the fundamentals.
- How we mark a device as belonging to a specific network namespace.
- How we mark a socket as belonging to a specific network namespace.

As part of the fundamentals we add a patch to the generic socket code that by default will disable it for protocol families that do not indicate support for handling network namespaces, on a non-default network namespace.

I think that gives us a path that will allow us to convert the network stack one protocol family at a time instead of in one big lump.

Stubbing off the sysfs and sysctl interfaces in the first round for the non-default namespaces as you have done should be good enough.

The reason for the suggestion is that most of the work for the protocol stacks (ipv4, ipv6, af_packet, af_unix) is largely noise: simple replacement without real design work happening. Mostly it is just tweaking the code to remove global variables, and doing a couple of lookups.

Eric
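Editorial sketch: the "disable by default" idea in the RFC above can be pictured with a small toy model. Nothing here is kernel code or an existing API. Every class, attribute and function name is hypothetical, and the choice of EAFNOSUPPORT as the error is an assumption made only for illustration.

```python
import errno


class NetNamespace:
    """Hypothetical stand-in for a network namespace."""
    def __init__(self, name, is_default=False):
        self.name = name
        self.is_default = is_default


class ProtoFamily:
    """Hypothetical protocol family; netns_aware marks converted families."""
    def __init__(self, name, netns_aware=False):
        self.name = name
        self.netns_aware = netns_aware


class Socket:
    def __init__(self, family, netns):
        self.family = family
        self.netns = netns  # every socket is tagged with its namespace


def create_socket(family, netns):
    """Generic socket creation with a default-deny namespace check."""
    if not netns.is_default and not family.netns_aware:
        # Unconverted families stay usable only in the default namespace.
        raise OSError(errno.EAFNOSUPPORT,
                      f"{family.name} is not available in namespace {netns.name}")
    return Socket(family, netns)


host = NetNamespace("host", is_default=True)
guest = NetNamespace("guest1")
af_unix = ProtoFamily("af_unix", netns_aware=True)  # assume already converted
af_ipx = ProtoFamily("af_ipx")                      # assume not yet converted

create_socket(af_ipx, host)    # fine: the default namespace is unrestricted
create_socket(af_unix, guest)  # fine: the family declares namespace support
try:
    create_socket(af_ipx, guest)
except OSError as exc:
    print("refused:", exc)
```

The point of the design is that an unconverted family fails loudly in a guest instead of silently sharing global state, which is what lets the stack be converted one family at a time.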
Re: [patch 2/6] [Network namespace] Network device sharing by view
On Tue, Jun 27, 2006 at 06:02:42PM +0200, Herbert Poetzl wrote:
> - loopback traffic inside a guest is insignificantly
>   slower than on a normal system
>
> - loopback traffic on the host is insignificantly
>   slower than on a normal system
>
> - inter guest traffic is faster than on-wire traffic,
>   and should be within a small tolerance of the
>   loopback case (as it really isn't different)

I do not follow. What are you people arguing about? Intra-guest, guest-guest and host-guest paths have _no_ differences from host-host loopback. Only the device is different:

* virtual loopback for intra-guest
* virtual interface for guest-guest and host-guest

But the work is exactly the same; only the place where packets are looped back is different. How could this be an issue to break a lance over? :-)

Alexey

PS. The only thing which I can imagine is the "optimized" out ip_route_input() in the case of loopback. But this optimization was an obvious design mistake (mine, sorry) and apparently will die together with the removal of the current deficiencies of the routing cache. Actually, it is one of the deficiencies.
Re: [RFC][patch 1/4] Network namespaces: cleanup of dev_base list use
Kirill Korotaev <[EMAIL PROTECTED]> writes:

> This doesn't support anything. e.g. I caught quite a lot of bugs after Ingo
> Molnar, but this doesn't make his code "poor". People are people.
> Anyway, I would be happy to see the typo.

Look up thread. You replied to the message where I commented on it.

There are two ways to argue this.
- It is the linux kernel development style to do small, simple, obviously correct patches that copy the maintainer of the code you are changing.
- Explain why that is the style. The basic idea is that a simple patch that is well described is trivial to check and trivial to verify.

Eric
Re: Please pull 'upstream' branch of wireless-2.6
> No, that totally avoids my point. Your "otherwise idle machine" test is
> probably nowhere near worst case in the field, for loops that can
> potentially lock the CPU for a long time upon hardware fault. And then
> there are the huge delays in specific functions that I pointed out...
>
> Jeff

The problem is that these are the delays used in the original driver that we've been writing the specs from. We don't know what they're for or why they're so long. We don't know if reducing the delay will cause issues on some hardware and work fine on others. Without the actual specs from Broadcom, it's hard to say what's excessive and what's not and whether changing it will break the driver.

-Joe
Re: [patch 2/6] [Network namespace] Network device sharing by view
Herbert Poetzl <[EMAIL PROTECTED]> writes:

> On Tue, Jun 27, 2006 at 05:52:52AM -0600, Eric W. Biederman wrote:
>>
>> Inside the containers I want all network devices named eth0!
>
> huh? even if there are two of them? also tun?
>
> I think you meant, you want to be able to have eth0 in
> _more_ than one guest where eth0 in a guest can also
> be/use/relate to eth1 on the host, right?

Right, I want to have an eth0 in each guest where eth0 is its own network device and need have no relationship to eth0 on the host.

>> We need a clean abstraction that optimizes well.
>>
>> However local communication between containers is not what we
>> should benchmark. That can always be improved later. So long as
>> the performance is reasonable. What needs to be benchmarked is the
>> overhead of namespaces when connected to physical networking devices
>> and on their own local loopback, and comparing that to a kernel
>> without namespace support.
>
> well, for me (obviously advocating the lightweight case)
> it seems important that the following conditions are met:
>
> - loopback traffic inside a guest is insignificantly
>   slower than on a normal system
>
> - loopback traffic on the host is insignificantly
>   slower than on a normal system
>
> - inter guest traffic is faster than on-wire traffic,
>   and should be within a small tolerance of the
>   loopback case (as it really isn't different)
>
> - network (on-wire) traffic should be as fast as without
>   the namespace (i.e. within 1% or so, better not really
>   measurable)
>
> - all this should be true in a setup with a significant
>   number of guests, when only one guest is active, but
>   all other guests are ready/configured
>
> - all this should scale well with a few hundred guests

Ultimately I agree. However, only host performance should be a merge blocker, allowing us to go back later and reclaim the few percentage points we lost.

>> If we don't hurt that core case we have an implementation we can
>> merge. There are a lot of optimization opportunities for local
>> communications and we can do that after we have a correct and accepted
>> implementation. Anything else is optimizing too soon, and will
>> just be muddying the waters.
>
> what I fear is that once something is in, the kernel will
> just become slower (as it already did in some areas) and
> nobody will care/be-able to fix that later on ...

If nobody cares it doesn't matter. If no one can fix it that is a problem. Which is why we need high standards and clean code, not early optimizations.

But on that front each step of the way must be justified on its own merits. Not because it will give us some holy grail.

The way to keep the inter guest performance from degrading is to measure it and complain. But the linux network stack is too big to get in one pass.

Eric
Re: [Patch 1/1] AF_UNIX Datagram getpeersec [Updated #2]
Some more fixes:

> diff -purN -X dontdiff linux-2.6.o/net/unix/af_unix.c linux-2.6.w/net/unix/af_unix.c
> --- linux-2.6.o/net/unix/af_unix.c 2006-06-21 00:02:30.0 -0400
> +++ linux-2.6.w/net/unix/af_unix.c 2006-06-27 09:30:12.0 -0400
> @@ -128,6 +128,28 @@ static atomic_t unix_nr_socks = ATOMIC_I
>
>  #define UNIX_ABSTRACT(sk) (unix_sk(sk)->addr->hash != UNIX_HASH_SIZE)
>
> +#ifdef CONFIG_SECURITY_NETWORK
> +static void unix_get_peersec_dgram(struct sk_buff *skb)
> +{

add int err;

> + err = security_socket_getpeersec_dgram(skb, UNIXSECDATA(skb),
> +   UNIXSECLEN(skb));
> + if (err)
> +  *(UNIXSEC(skb)) = NULL;

change to *(UNIXSECDATA(skb)) = NULL;

> +}
> +
> +static inline void unix_set_secdata(struct scm_cookie *scm, struct sk_buff *skb)
> +{
> + scm->secdata = *UNIXSECDATA(skb);
> + scm->seclen = UNIXSECLEN(skb);

change to scm->seclen = *UNIXSECLEN(skb);

> +}
> +#else
> +static void unix_get_peersec_dgram(struct sk_buff *skb)
> +{ }
> +
> +static inline void unix_set_secdata(struct scm_cookie *scm, struct sk_buff *skb)
> +{ }
> +#endif /* CONFIG_SECURITY_NETWORKING */
> +
>  /*
>   * SMP locking strategy:
>   *    hash table is protected with spinlock unix_table_lock
> @@ -1291,6 +1313,8 @@ static int unix_dgram_sendmsg(struct kio
>   if (siocb->scm->fp)
>    unix_attach_fds(siocb->scm, skb);
>
> + unix_get_peersec_dgram(skb);
> +
>   skb->h.raw = skb->data;
>   err = memcpy_fromiovec(skb_put(skb,len), msg->msg_iov, len);
>   if (err)
> @@ -1570,6 +1594,7 @@ static int unix_dgram_recvmsg(struct kio
>    memset(&tmp_scm, 0, sizeof(tmp_scm));
>   }
>   siocb->scm->creds = *UNIXCREDS(skb);
> + unix_set_secdata(siocb->scm, skb);
>
>   if (!(flags & MSG_PEEK))
>   {
Re: [patch 2/6] [Network namespace] Network device sharing by view
Herbert Poetzl <[EMAIL PROTECTED]> writes:

> On Tue, Jun 27, 2006 at 01:09:11PM +0400, Andrey Savochkin wrote:
>>
>> I'd like to caution about over-optimizing communications between
>> different network namespaces. Many optimizations of local traffic
>> (such as high MTU) don't look so appealing when you start to think
>> about live migration of namespaces.
>
> I think the 'optimization' (or to be precise: desire
> not to sacrifice local/loopback traffic for some use
> case as you describe it) does not interfere with live
> migration at all, we still will have 'local' and 'remote'
> traffic, and personally I doubt that the live migration
> is a feature for the masses ...

Several things.
- The linux loopback device is not strongly optimized, it is a compatibility layer.
- Traffic between guests is an implementation detail. There is nothing fundamental in our semantics that says the traffic has to be slow for any workload (except for the limits imposed by using actual on the wire protocols). The lo shares the same problem.

Worrying about this case now, when it has clearly been shown that there are several possible ways to optimize this and get back any lost local performance, is optimizing way too early.

Criticize the per namespace performance all you want. That is pretty much a merge blocker. Unless we do worse than a 1-5% penalty the communication across namespaces is really a non-issue. Even with your large communications flows between guests 1-5% is nothing.

Eric
Re: Please pull 'upstream' branch of wireless-2.6
On Tuesday 27 June 2006 18:12, Jeff Garzik wrote:
> Michael Buesch wrote:
> > So, I will submit a patch to lower the udelay(10) to udelay(1)
> > and we can close the discussion? ;)
>
> No, that totally avoids my point. Your "otherwise idle machine" test is
> probably nowhere near worst case in the field, for loops that can
> potentially lock the CPU for a long time upon hardware fault. And then
> there are the huge delays in specific functions that I pointed out...

wtf are you requesting from me?
1) I proved to you that the loop does only spin _once_ or even _less_.
2) If the hardware is faulty, the user must replace it. Because, if the hardware is faulty, it can crash the whole machine anyway, obviously.
3) There is no "huge delay". I proved it with my logs.

-> No CPU hog => Nothing to fix.

--
Greetings Michael.
Re: [patch 2/6] [Network namespace] Network device sharing by view
Herbert Poetzl <[EMAIL PROTECTED]> writes:

> On Tue, Jun 27, 2006 at 01:54:51PM +0400, Kirill Korotaev wrote:
>> >>My point is that if you make namespace tagging at routing time, and
>> >>your packets are being routed only once, you lose the ability
>> >>to have separate routing tables in each namespace.
>> >
>> >
>> >Right. What is the advantage of having separate the routing tables ?
>
>> it is impossible to have bridged networking, tun/tap and many other
>> features without it. I even doubt that it is possible to introduce
>> private netfilter rules w/o virtualization of routing.
>
> why? iptables work quite fine on a typical linux
> system when you 'delegate' certain functionality
> to certain chains (i.e. doesn't require access to
> _all_ of them)
>
>> The question is do we want to have fully featured namespaces which
>> allow to create isolated virtual environments with semantics and
>> behaviour of standalone linux box or do we want to introduce some
>> hacks with new rules/restrictions to meet ones goals only?
>
> well, sometimes 'hacks' are not only simpler but also
> a much better solution for a given problem than the
> straight forward approach ...

Well I would like to see a hack that qualifies.

I watched the linux-vserver irc channel for a while and almost every network problem was caused by the change in semantics vserver provides.

In this case when you allow a guest more than one IP your hack, while easy to maintain, becomes much more complex. Especially as you address each case people care about one at a time.

In one shot this goes the entire way. Given how many people miss that you do the work at layer 2 rather than at layer 3 I would not call this the straight forward approach. The straight forward implementation yes, but not the straight forward approach.

> for example, you won't have multiple routing tables
> in a kernel where this feature is disabled, no?
> so why should it affect a guest, or require modified
> apps inside a guest when we would decide to provide
> only a single routing table?
>
>> From my POV, fully virtualized namespaces are the future.
>
> the future is already there, it's called Xen or UML, or QEMU :)

Yep. And now we need it to run fast.

>> It is what makes virtualization solution usable (w/o apps
>> modifications), provides all the features and doesn't require much
>> efforts from people to be used.
>
> and what if they want to use virtualization inside
> their guests? where do you draw the line?

The implementation doesn't have any problems with guests inside of guests.

The only reason to restrict guests inside of guests is because we aren't certain which permissions make sense.

Eric
Re: Please pull 'upstream' branch of wireless-2.6
On Tuesday 27 June 2006 18:10, Jeff Garzik wrote:
> Michael Buesch wrote:
> > On Tuesday 27 June 2006 16:11, Jeff Garzik wrote:
> >> Overall, bcm43xx is _really really bad_ about this sort of thing. Just
> >> grepping for udelay in bcm43xx_radio.c shows some of the worst
> >> offenders. bcm43xx_radio_init2060() and bcm43xx_radio_selectchannel()
> >> both look like candidates for using msleep() rather than udelay().
> >
> > This is _all_ at initialization time.
> > select_channel How often do you select a channel?
>
> That question is irrelevant, because you have no idea what -else- is
> going on in the system, at the point when bcm43xx chooses to spin the
> CPU heavily.
>
> Initialization time means you are definitely not in a hot path, and can
> therefore sleep.

Ok, again: If you are running a preemptible kernel (I am doing a patch for the non-preemptible case), everything is _already_ fine. We are not spinning long times with locks held or IRQs disabled. I already fixed that.

And no, I don't really care for initialization time. I am not going to potentially break the driver to remove 1ms of wasted CPU on ifconfig up. In fact, initialization is and always was done lockless. So we should be fine there, too, actually.

We don't know why these delays are there at all. And we never will. But as these are all some measuring and calibration routines, they surely have some purpose. We don't know if longer delays in some places may have ill effects. Making the whole thing preemptible (as I am doing / have done) surely has its potential to break the driver. I prefer correct operation over an unnoticeable 1ms CPU hog.

> > I recently reworked the periodically executed workhandlers,
> > so that they are preemptible.
>
> Major classes of users run their kernels without preempt. Please don't
> depend on that to avoid bad behavior.

I am doing a patch atm. I will add voluntary preemption points, if the kernel is not preemptible.

--
Greetings Michael.
Re: [Patch 1/1] AF_UNIX Datagram getpeersec [Updated #2]
Hi,

Thanks for the updates. I am testing the code now. Some minor fixes (so far):

changed all #ifdef CONFIG_SECURITY_NETWORKING to #ifdef CONFIG_SECURITY_NETWORK

cheers,
Catherine

James Morris <[EMAIL PROTECTED]> wrote on 06/27/2006 09:57:15 AM:

> On Tue, 27 Jun 2006, Stephen Smalley wrote:
>
> > What about saving the u32 seclen with the secdata, and using it later
> > rather than recomputing strlen(secdata)? That also avoids encoding an
> > assumption in the af_unix code about the content of the data (i.e.
> > NUL-terminated string), leaving that to the security module.
>
> Ok, this and other issues are addressed in the patch below, which is now
> back to a single patch.
>
> I also #ifdef'd the security fields in struct unix_skb_parms.
>
> Please review and test.
>
> ---
>
>  include/asm-alpha/socket.h   |    1 +
>  include/asm-arm/socket.h     |    1 +
>  include/asm-arm26/socket.h   |    1 +
>  include/asm-cris/socket.h    |    1 +
>  include/asm-frv/socket.h     |    1 +
>  include/asm-h8300/socket.h   |    1 +
>  include/asm-i386/socket.h    |    1 +
>  include/asm-ia64/socket.h    |    1 +
>  include/asm-m32r/socket.h    |    1 +
>  include/asm-m68k/socket.h    |    1 +
>  include/asm-mips/socket.h    |    1 +
>  include/asm-parisc/socket.h  |    1 +
>  include/asm-powerpc/socket.h |    1 +
>  include/asm-s390/socket.h    |    1 +
>  include/asm-sh/socket.h      |    1 +
>  include/asm-sparc/socket.h   |    1 +
>  include/asm-sparc64/socket.h |    1 +
>  include/asm-v850/socket.h    |    1 +
>  include/asm-x86_64/socket.h  |    1 +
>  include/asm-xtensa/socket.h  |    1 +
>  include/linux/net.h          |    1 +
>  include/linux/selinux.h      |   15 +++
>  include/net/af_unix.h        |    7 +++
>  include/net/scm.h            |   17 +
>  net/core/sock.c              |   11 +++
>  net/unix/af_unix.c           |   25 +
>  security/selinux/exports.c   |   11 +++
>  security/selinux/hooks.c     |    8 +++-
>  28 files changed, 114 insertions(+), 1 deletion(-)
>
> diff -purN -X dontdiff linux-2.6.o/include/asm-alpha/socket.h linux-2.6.w/include/asm-alpha/socket.h
> --- linux-2.6.o/include/asm-alpha/socket.h 2006-06-21 00:02:08.0 -0400
> +++ linux-2.6.w/include/asm-alpha/socket.h 2006-06-27 02:08:49.0 -0400
> @@ -51,6 +51,7 @@
>  #define SCM_TIMESTAMP SO_TIMESTAMP
>
>  #define SO_PEERSEC 30
> +#define SO_PASSSEC 34
>
>  /* Security levels - as per NRL IPv6 - don't actually do anything */
>  #define SO_SECURITY_AUTHENTICATION 19
> diff -purN -X dontdiff linux-2.6.o/include/asm-arm/socket.h linux-2.6.w/include/asm-arm/socket.h
> --- linux-2.6.o/include/asm-arm/socket.h 2006-06-21 00:02:10.0 -0400
> +++ linux-2.6.w/include/asm-arm/socket.h 2006-06-27 02:08:49.0 -0400
> @@ -48,5 +48,6 @@
>  #define SO_ACCEPTCONN 30
>
>  #define SO_PEERSEC 31
> +#define SO_PASSSEC 34
>
>  #endif /* _ASM_SOCKET_H */
> diff -purN -X dontdiff linux-2.6.o/include/asm-arm26/socket.h linux-2.6.w/include/asm-arm26/socket.h
> --- linux-2.6.o/include/asm-arm26/socket.h 2006-06-21 00:02:10.0 -0400
> +++ linux-2.6.w/include/asm-arm26/socket.h 2006-06-27 02:08:49.0 -0400
> @@ -48,5 +48,6 @@
>  #define SO_ACCEPTCONN 30
>
>  #define SO_PEERSEC 31
> +#define SO_PASSSEC 34
>
>  #endif /* _ASM_SOCKET_H */
> diff -purN -X dontdiff linux-2.6.o/include/asm-cris/socket.h linux-2.6.w/include/asm-cris/socket.h
> --- linux-2.6.o/include/asm-cris/socket.h 2006-06-21 00:02:11.0 -0400
> +++ linux-2.6.w/include/asm-cris/socket.h 2006-06-27 02:08:49.0 -0400
> @@ -50,6 +50,7 @@
>  #define SO_ACCEPTCONN 30
>
>  #define SO_PEERSEC 31
> +#define SO_PASSSEC 34
>
>  #endif /* _ASM_SOCKET_H */
>
> diff -purN -X dontdiff linux-2.6.o/include/asm-frv/socket.h linux-2.6.w/include/asm-frv/socket.h
> --- linux-2.6.o/include/asm-frv/socket.h 2006-06-21 00:02:11.0 -0400
> +++ linux-2.6.w/include/asm-frv/socket.h 2006-06-27 02:08:49.0 -0400
> @@ -48,6 +48,7 @@
>  #define SO_ACCEPTCONN 30
>
>  #define SO_PEERSEC 31
> +#define SO_PASSSEC 34
>
>  #endif /* _ASM_SOCKET_H */
>
> diff -purN -X dontdiff linux-2.6.o/include/asm-h8300/socket.h linux-2.6.w/include/asm-h8300/socket.h
> --- linux-2.6.o/include/asm-h8300/socket.h 2006-06-21 00:02:11.0 -0400
> +++ linux-2.6.w/include/asm-h8300/socket.h 2006-06-27 02:08:49.0 -0400
> @@ -48,5 +48,6 @@
>  #define SO_ACCEPTCONN 30
>
>  #define SO_PEERSEC 31
> +#define SO_PASSSEC 34
>
>  #endif /* _ASM_SOCKET_H */
> diff -purN -X dontdiff linux-2.6.o/include/asm-i386/socket.h linux-2.6.w/include/asm-i386/socket.h
> --- linux-2.6.o/include/asm-i386/socket.h 2006-06-21 00:02:12.0 -0400
> +++ linux-2.6.w/include/asm-i386/socket.h 2006-06-27 02:08:
Re: [PATCH 17/21] e1000: add ich8lan core functions
Jeff Garzik wrote:
> Kok, Auke wrote:
>> This implements the core new functions needed for ich8's internal NIC. This
>> includes:
>>
>> * ich8 specific read/write code
>> * flash/nvm access code
>> * software semaphore flag functions
>> * 10/100 PHY (fe - no gigabit speed) support for low-end versions
>> * A workaround for a powerdown sequence problem discovered that affects
>>   a small number of motherboards.
>>
>> Signed-off-by: Jesse Brandeburg <[EMAIL PROTECTED]>
>> Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
>> ---
>>
>>  drivers/net/e1000/e1000_hw.c    | 1000 +++
>>  drivers/net/e1000/e1000_hw.h    |  386 +++
>>  drivers/net/e1000/e1000_osdep.h |   13 +
>>  3 files changed, 1392 insertions(+), 7 deletions(-)
>
> If it takes this much code to support ICH8, it seems like a e1000-ich8.c would
> be warranted...

that's work in progress - Jeb Cramer has been working on this for a while now but unfortunately it's not ready, and getting ich8 supported in a way that we know doesn't introduce new bugs is more important. This patch adds tested and validated support for these chipsets that has been hammered by our test team.

We are planning (working) on cleaning it all up (including whitespace!) - but getting the ich8 support out is more important - people can buy the hardware today.

Cheers,

Auke