Re: Multiqueue and virtualization WAS(Re: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue
From: jamal [EMAIL PROTECTED] Date: Fri, 06 Jul 2007 10:39:15 -0400 If the issue is usability of listing 1024 netdevices, i can think of many ways to resolve it. I would agree with this if there were a reason for it, it's totally unnecessary complication as far as I can see. These virtual devices are an ethernet with the subnet details exposed to the driver, nothing more. I see zero benefit to having a netdev for each guest or node we can speak to whatsoever. It's a very heavy abstraction to use for something that is so bloody simple. My demux on -hard_start_xmit() is _5 DAMN LINES OF CODE_, you want to replace that with a full netdev because of some minor difficulties in figuring out to record the queuing state. It's beyond unreasonable. Netdevs are like salt, if you put too much in your food it tastes awful. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] net: netdevice mtu assumptions documentation
From: Stephen Hemminger [EMAIL PROTECTED] Date: Fri, 6 Jul 2007 11:31:17 -0700 Document the expectations about device MTU handling. The documentation about oversize packet handling is probably too loose. IMHO devices should drop oversize packets for robustness, but many devices allow it now. For example, if you set mtu to 1200 bytes, most ether devices will allow a 1500 byte frame in. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] Applied, thanks Stephen. I agree that drivers should drop oversized frames, otherwise setting up PMTU test scenerios is more difficult than it really needs to be. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-2.6.23 take 2] Per-datagram TTL and TOS via sendmsg()
[This time with automatic carriage return off :-$] This patch adds support for specifying IPv4 Time-To-Live (IP_TTL) and/or Type-Of-Service (IP_TOS) values on a per datagram basis through sendmsg() ancilliary data. Until then, it only worked for IPv6 sockets (using IPV6_HOPLIMIT and IPV6_TCLASS). Signed-off-by: Rémi Denis-Courmont [EMAIL PROTECTED] diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index 62daf21..7a6dc33 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -140,6 +140,8 @@ struct inet_sock { int length; /* Total length of all frames */ __be32 addr; struct flowifl; + __s16 ttl; + __s16 tos; } cork; }; diff --git a/include/net/ip.h b/include/net/ip.h index abf2820..dcfdb41 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -54,6 +54,8 @@ struct ipcm_cookie __be32 addr; int oif; struct ip_options *opt; + __s16 ttl; + __s16 tos; }; #define IPCB(skb) ((struct inet_skb_parm*)((skb)-cb)) diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 02a899b..e10852d 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -392,8 +392,9 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb) icmp_param-data.icmph.checksum = 0; icmp_out_count(icmp_param-data.icmph.type); - inet-tos = ip_hdr(skb)-tos; daddr = ipc.addr = rt-rt_src; + ipc.tos = ip_hdr(skb)-tos; + ipc.ttl = MULTICAST(daddr) ? inet-mc_ttl : inet-uc_ttl; ipc.opt = NULL; if (icmp_param-replyopts.optlen) { ipc.opt = icmp_param-replyopts; @@ -438,7 +439,6 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info) struct rtable *rt = (struct rtable *)skb_in-dst; struct ipcm_cookie ipc; __be32 saddr; - u8 tos; if (!rt) goto out; @@ -526,9 +526,9 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info) saddr = 0; } - tos = icmp_pointers[type].error ? ((iph-tos IPTOS_TOS_MASK) | - IPTOS_PREC_INTERNETCONTROL) : - iph-tos; + ipc.tos = icmp_pointers[type].error ? ((iph-tos IPTOS_TOS_MASK) | + IPTOS_PREC_INTERNETCONTROL) : + iph-tos; if (ip_options_echo(icmp_param.replyopts, skb_in)) goto out_unlock; @@ -545,7 +545,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info) icmp_param.skb= skb_in; icmp_param.offset = skb_network_offset(skb_in); icmp_out_count(icmp_param.data.icmph.type); - inet_sk(icmp_socket-sk)-tos = tos; + ipc.ttl = -1; ipc.addr = iph-saddr; ipc.opt = icmp_param.replyopts; @@ -557,7 +557,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info) icmp_param.replyopts.faddr : iph-saddr, .saddr = saddr, - .tos = RT_TOS(tos) + .tos = RT_TOS(ipc.tos) } }, .proto = IPPROTO_ICMP, diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 34ea454..67ce657 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -806,6 +806,8 @@ int ip_append_data(struct sock *sk, dst_mtu(rt-u.dst.path); inet-cork.rt = rt; inet-cork.length = 0; + inet-cork.ttl = ipc-ttl; + inet-cork.tos = ipc-tos; sk-sk_sndmsg_page = NULL; sk-sk_sndmsg_off = 0; if ((exthdrlen = rt-u.dst.header_len) != 0) { @@ -1233,7 +1235,9 @@ int ip_push_pending_frames(struct sock *sk) if (inet-cork.flags IPCORK_OPT) opt = inet-cork.opt; - if (rt-rt_type == RTN_MULTICAST) + if (inet-cork.ttl != -1) + ttl = inet-cork.ttl; + else if (rt-rt_type == RTN_MULTICAST) ttl = inet-mc_ttl; else ttl = ip_select_ttl(inet, rt-u.dst); @@ -1245,7 +1249,7 @@ int ip_push_pending_frames(struct sock *sk) iph-ihl += opt-optlen2; ip_options_build(skb, opt, inet-cork.addr, rt, 0); } - iph-tos = inet-tos; + iph-tos = (inet-cork.tos != -1) ? inet-cork.tos : inet-tos; iph-tot_len = htons(skb-len); iph-frag_off = df; ip_select_ident(iph, rt-u.dst, sk); @@ -1343,6
[PATCH net-2.6.23 take 3] Per-datagram TTL and TOS via sendmsg()
[Hmm, stupid me. Right this time. Sorry for the line noise.] This patch adds support for specifying IPv4 Time-To-Live (IP_TTL) and/or Type-Of-Service (IP_TOS) values on a per datagram basis through sendmsg() ancilliary data. Until then, it only worked for IPv6 sockets (using IPV6_HOPLIMIT and IPV6_TCLASS). Signed-off-by: Rémi Denis-Courmont [EMAIL PROTECTED] diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index 62daf21..7a6dc33 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -140,6 +140,8 @@ struct inet_sock { int length; /* Total length of all frames */ __be32 addr; struct flowifl; + __s16 ttl; + __s16 tos; } cork; }; diff --git a/include/net/ip.h b/include/net/ip.h index abf2820..dcfdb41 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -54,6 +54,8 @@ struct ipcm_cookie __be32 addr; int oif; struct ip_options *opt; + __s16 ttl; + __s16 tos; }; #define IPCB(skb) ((struct inet_skb_parm*)((skb)-cb)) diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 02a899b..e10852d 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -392,8 +392,9 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb) icmp_param-data.icmph.checksum = 0; icmp_out_count(icmp_param-data.icmph.type); - inet-tos = ip_hdr(skb)-tos; daddr = ipc.addr = rt-rt_src; + ipc.tos = ip_hdr(skb)-tos; + ipc.ttl = MULTICAST(daddr) ? inet-mc_ttl : inet-uc_ttl; ipc.opt = NULL; if (icmp_param-replyopts.optlen) { ipc.opt = icmp_param-replyopts; @@ -438,7 +439,6 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info) struct rtable *rt = (struct rtable *)skb_in-dst; struct ipcm_cookie ipc; __be32 saddr; - u8 tos; if (!rt) goto out; @@ -526,9 +526,9 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info) saddr = 0; } - tos = icmp_pointers[type].error ? ((iph-tos IPTOS_TOS_MASK) | - IPTOS_PREC_INTERNETCONTROL) : - iph-tos; + ipc.tos = icmp_pointers[type].error ? ((iph-tos IPTOS_TOS_MASK) | + IPTOS_PREC_INTERNETCONTROL) : + iph-tos; if (ip_options_echo(icmp_param.replyopts, skb_in)) goto out_unlock; @@ -545,7 +545,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info) icmp_param.skb= skb_in; icmp_param.offset = skb_network_offset(skb_in); icmp_out_count(icmp_param.data.icmph.type); - inet_sk(icmp_socket-sk)-tos = tos; + ipc.ttl = -1; ipc.addr = iph-saddr; ipc.opt = icmp_param.replyopts; @@ -557,7 +557,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info) icmp_param.replyopts.faddr : iph-saddr, .saddr = saddr, - .tos = RT_TOS(tos) + .tos = RT_TOS(ipc.tos) } }, .proto = IPPROTO_ICMP, diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 34ea454..67ce657 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -806,6 +806,8 @@ int ip_append_data(struct sock *sk, dst_mtu(rt-u.dst.path); inet-cork.rt = rt; inet-cork.length = 0; + inet-cork.ttl = ipc-ttl; + inet-cork.tos = ipc-tos; sk-sk_sndmsg_page = NULL; sk-sk_sndmsg_off = 0; if ((exthdrlen = rt-u.dst.header_len) != 0) { @@ -1233,7 +1235,9 @@ int ip_push_pending_frames(struct sock *sk) if (inet-cork.flags IPCORK_OPT) opt = inet-cork.opt; - if (rt-rt_type == RTN_MULTICAST) + if (inet-cork.ttl != -1) + ttl = inet-cork.ttl; + else if (rt-rt_type == RTN_MULTICAST) ttl = inet-mc_ttl; else ttl = ip_select_ttl(inet, rt-u.dst); @@ -1245,7 +1249,7 @@ int ip_push_pending_frames(struct sock *sk) iph-ihl += opt-optlen2; ip_options_build(skb, opt, inet-cork.addr, rt, 0); } - iph-tos = inet-tos; + iph-tos = (inet-cork.tos != -1) ? inet-cork.tos : inet-tos; iph-tot_len = htons(skb-len); iph-frag_off = df; ip_select_ident(iph, rt-u.dst, sk);
[PATCH] CXGB3: Replace kcalloc(1,...) with kmalloc() in cxgb3_offload.c.
Signed-off-by: Robert P. J. Day [EMAIL PROTECTED] --- diff --git a/drivers/net/cxgb3/cxgb3_offload.c b/drivers/net/cxgb3/cxgb3_offload.c index ebcf35e..999bc41 100644 --- a/drivers/net/cxgb3/cxgb3_offload.c +++ b/drivers/net/cxgb3/cxgb3_offload.c @@ -1105,7 +1105,7 @@ int cxgb3_offload_activate(struct adapter *adapter) struct mtutab mtutab; unsigned int l2t_capacity; - t = kcalloc(1, sizeof(*t), GFP_KERNEL); + t = kcalloc(sizeof(*t), GFP_KERNEL); if (!t) return -ENOMEM; -- Robert P. J. Day Linux Consulting, Training and Annoying Kernel Pedantry Waterloo, Ontario, CANADA http://fsdev.net/wiki/index.php?title=Main_Page - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Splitting e1000 (Was: Re: e1000: backport ich9 support from 7.5.5 ?)
Stephen Hemminger wrote: I think rather than having a meta-discussion, which always seems to degenerate into finger pointing. Let's look at the code. The problem with popular drivers (as I too well know) is that user's seem to find every wart. There are lots of old drivers that never seem to get the same scrutiny, and if they did would be tarred and feathered. Auke has already posted an early snapshot, archived here: http://marc.info/?l=linux-netdevm=118315951209350w=3 Jeff and I have already commented on the code. -- James Chapman Katalix Systems Ltd http://www.katalix.com Catalysts for your Embedded Linux software development - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] CXGB3: Replace kcalloc(1,...) with kmalloc() in cxgb3_offload.c.
On Sun, 2007-07-08 at 05:39 -0400, Robert P. J. Day wrote: - t = kcalloc(1, sizeof(*t), GFP_KERNEL); + t = kcalloc(sizeof(*t), GFP_KERNEL); ^ I don't think that's going to work, and shouldn't it probably use kzalloc instead of kmalloc (as you wrote in subject) johannes signature.asc Description: This is a digitally signed message part
[PATCH] ieee1394: first minimal NUMA awareness
Association of a host device with a node on NUMA machines optimizes allocations of skbs given from the networking stack to eth1394. Signed-off-by: Stefan Richter [EMAIL PROTECTED] --- Note, in Linux 2.6.20 and earlier and 2.6.23 and later, eth1394 calls or will call SET_NETDEV_DEV(eth1394_netdev, h-device). drivers/ieee1394/hosts.c |1 + 1 file changed, 1 insertion(+) Index: linux-2.6.22-rc7/drivers/ieee1394/hosts.c === --- linux-2.6.22-rc7.orig/drivers/ieee1394/hosts.c +++ linux-2.6.22-rc7/drivers/ieee1394/hosts.c @@ -154,6 +154,7 @@ struct hpsb_host *hpsb_alloc_host(struct memcpy(h-device, nodemgr_dev_template_host, sizeof(h-device)); h-device.parent = dev; + set_dev_node(h-device, dev_to_node(dev)); snprintf(h-device.bus_id, BUS_ID_SIZE, fw-host%d, h-id); h-host_dev.parent = h-device; -- Stefan Richter -=-=-=== -=== -=--- http://arcgraph.de/sr/ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [NET]: Another unnecessary net/tcp.h inclusion in net/dn.h
No longer needed. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] --- include/net/dn.h |1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/include/net/dn.h b/include/net/dn.h index ac4ce90..6277783 100644 --- a/include/net/dn.h +++ b/include/net/dn.h @@ -3,7 +3,6 @@ #include linux/dn.h #include net/sock.h -#include net/tcp.h #include asm/byteorder.h #define dn_ntohs(x) le16_to_cpu(x) -- 1.5.0.6
Re: [RFC 2/2] shrink size of scatterlist on common i386/x86-64
On Fri, Jul 06, 2007 at 10:14:56AM -0700, Williams, Mitch A wrote: David Miller wrote: Okay, but then using SG lists makes skbuff's much bigger. fraglistscatterlistper skbuff 32 bit 8 20 +12 * 18 = +216! 64 bit 16 32 +16 * 18 = +288 So never mind... I know, this is why nobody ever really tries to tackle this. I'll do a fraglist to scatter list set of routines, but not sure if it's worth it. It's better to add dma_map_skb() et al. interfaces to be honest. Also even with the scatterlist idea, we'd still need to do two map calls, one for skb-data and one for the page vector. FWIW, I tried this about a year ago to try to improve e1000 performance on pSeries. I was hoping to simplify the driver transmit code and make IOMMU mapping easier. This was on 2.6.16 or thereabouts. Net result: zilch. No performance increase, no noticeable CPU utilization benefits. Nothing. So I dropped it. Do you have pointers to the patches perchance? Slightly off topic: The real problem that I saw on pSeries is lock contention for the IOMMU. It's architected with a single table per slot, which is great in that two boards in separate slots won't have lock contention. However, this all goes out the window when you drop a quad-port gigabit adapter in there. The time spent waiting for the IOMMU table lock goes up exponentially as you activate each additional port. In my opinion, IOMMU table locking is the major issue with this type of architecture. Since both Intel and AMD are touting IOMMUs for virtual- ization support, this is an issue that's going to need a lot of scrutiny. Agreed. Cheers, Muli - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/2] shrink size of scatterlist on common i386/x86-64
On Fri, Jul 06, 2007 at 12:20:19PM -0700, David Miller wrote: From: Williams, Mitch A [EMAIL PROTECTED] Date: Fri, 6 Jul 2007 10:14:56 -0700 In my opinion, IOMMU table locking is the major issue with this type of architecture. Since both Intel and AMD are touting IOMMUs for virtual- ization support, this is an issue that's going to need a lot of scrutiny. For the allocation of IOMMU entries themselves you can play tricks using atomic operations on 64-bit words of the allocator bitmap to avoid locking that. Hmm, any pointers? You can use per-cpu salts to determine where to start the search and avoid hitting the same cachelines as other cpus working on the same table. But you'll need to lock in order to flush the IOMMU tlb I'm afraid. The way to mitigate that is to only flush the IOMMU tlb once per allocator generation. That works, but isn't optimal when you have an isolation-capable IOMMU and you want the full isolation properties of the IOMMU. If you only flush the IOTLB when the allocator wraps around, a stale entry in the IOTLB can allow a DMA to go through for an IO entry that has already been unmapped. One way to mitigate that and still retain full isolation is to make sure no one else gets to use the frames that are the targets of the DMA until the translation has been flushed out of the IOTLB, but that requires pretty deep surgery. Cheers, Muli - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Splitting e1000 (Was: Re: e1000: backport ich9 support from 7.5.5 ?)
Stephen Hemminger wrote: I would really like to continue with my original plan that I posted that follows Christoph's idea. I hope you can all agree with that so we can get on with this. I think rather than having a meta-discussion, which always seems to degenerate into finger pointing. Let's look at the code. The problem with popular drivers (as I too well know) is that user's seem to find every wart. There are lots of old drivers that never seem to get the same scrutiny, and if they did would be tarred and feathered. I'd second this; also lets be honest and fair about things and use a similar standard for all drivers; are we going to ask all driver submitters to remove NAPI, TSO and other stuff? I would hope not. Are all new drivers that have even a single bitfield going to be rejected? That's be a new rule but ok, if it applies to all drivers it's fair. So the question in my opinion should be is this driver good enough for merging, and if not, what specifically is wrong enough to hold of merging not what would the perfect ideal we-have-all-the-time-in-the-world driver be and certainly not how is this going to work in an enterprise distro since the later is the problem for the enterprise disro that they get paid to solve, not for kernel.org kernels. What is there now is a driver that works, is relatively clean (and yes if you look deep enough you can ALWAYS nitpick on any code, even Linus' or Jeff's best code) and provides good performance with all the features a modern ethernet driver is supposed to have. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Splitting e1000 (Was: Re: e1000: backport ich9 support from 7.5.5 ?)
Kok, Auke wrote: This is not acceptable and hardly fair to expect from us. It also exposes users to endless delays and uncertainties as to a final resolution. Not to mention that writing a driver from scratch for (just) ich9 will take significant time, is silly since it's almost identical to ich8 etc.. I don't think that anyone besides you and maybe one or two others are interested in doing this rewrite from scratch. The rest of us would rather see something much more similar to my original suggestion which relieves the ich9-wishes for everyone and provides a good starting point to go forward. Not to mention that a lot of that code is already there, and at least cleaned up quite a bit already. Let's review the history here: * For -years-, Intel has been told to stream structural cleanups into e1000 (and ixgb), along with feature enablement. None of these cleanups happened, and the result was today's bloated driver. * Intel posts a new e1000 driver, which is just as complex, if not more so. * Intel proposes a Linux user transition plan that amounts to an abrupt NIC driver switch for users on a widely used NIC platform. * Intel immediately dismisses all alternate paths to e1000 change, and all major e1000new structural comments. Thus in effect you are presenting the Linux community with one option (and only one) -- your e1000new as it exists today -- and demanding that that option be accepted. More importantly, Intel is asking the Linux community to TRUST that Intel will be a good driver MAINTAINER, despite lack of encouraging evidence in that regard. Will we continue to see INATTENTION to basic driver maintenance, with 100% of resources being devoted new features, and 0% devoted to thinking about how best to create a maintainable driver for those features long term? With e1000new, I see a driver that's internally modular, but no more maintainable. I also see a driver that runs directly counter to what we normally do in this situation -- split it up into multiple drivers. When you find yourself reinventing a net driver API, that's a big hint your engineering is ass-backwards. It's not acceptable and hardly fair to expect the Linux community to blindly swallow the single option you've presented us. We're talking about one of the most widely used NIC drivers here. Proposing to flip a switch and dump all users on the new driver in a couple months time does not serve Linux users at all. Introducing the new driver more slowly -- starting with ICH9 and any other PCI IDs not covered by e1000 -- gives the driver a chance to prove itself in the field and be introduced more slowly, while at the same time not dramatically disturbing the existing, working e1000 installed base. Starting with a new driver for the new hardware is the right way to go. Get that driver into the field, get it right, and THEN use that field experience to see what PCI IDs you want to transition from the older driver. That path is minimal disruption for Linux users, and maximizes the changes of Linux users running a working, proven driver. And coincidentally, that path also gives you the freedom to do heavy development on the new driver, with minimal disruption of the majority of your userbase. So in sum, (a) Intel does not appear to have thought through the driver transition, (b) Intel has not proven itself to be a maintainer, and (c) e1000new needs to be restructured. Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Splitting e1000 (Was: Re: e1000: backport ich9 support from 7.5.5 ?)
Arjan van de Ven wrote: I'd second this; also lets be honest and fair about things and use a similar standard for all drivers; are we going to ask all driver submitters to remove NAPI, TSO and other stuff? I would hope not. Are That was merely a suggestion. My general meaning was small driver is easier to get into the kernel and into the field. all new drivers that have even a single bitfield going to be rejected? That's be a new rule but ok, if it applies to all drivers it's fair. That's one of a gazillion checklist items that a new driver has to pass through. You know how it works for new drivers. it looks better than our last try and it looks better than that ugly driver over there does not grant you a free pass on review. I'm not asking for perfection, just something other than * e1000 gets feedback * Intel disappears for months * Intel reappears with e1000 rewrite * Intel fights tooth and nail when the driver is not accepted verboten And specifically I worry that three years down the road we will reach the same point again, when e1000new is even more complex. Will Intel shrug its shoulders and decide its time for another rewrite? So the question in my opinion should be is this driver good enough for merging, and if not, what specifically is wrong enough to hold of merging not what would the perfect ideal we-have-all-the-time-in-the-world driver be and certainly not how is this going to work in an enterprise distro since the later is the problem for the enterprise disro that they get paid to solve, not for kernel.org kernels. What is there now is a driver that works, is relatively clean (and yes if you look deep enough you can ALWAYS nitpick on any code, even Linus' or Jeff's best code) and provides good performance with all the features a modern ethernet driver is supposed to have. Is this is the attitude, what's the point of even posting the driver for review? Intel posted e1000new on June 29. Feedback was then posted. Ten days later, without a single revision, Intel declares its own driver clean and working and good enough for merging as-is. Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Splitting e1000 (Was: Re: e1000: backport ich9 support from 7.5.5 ?)
Stephen Hemminger wrote: I think rather than having a meta-discussion, which always seems to degenerate into finger pointing. Let's look at the code. The problem with popular drivers (as I too well know) is that user's seem to find every wart. There are lots of old drivers that never seem to get the same scrutiny, and if they did would be tarred and feathered. it's not as scruffy as that driver in the corner is no way to maintain a kernel. Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFR: New e1000 driver (e1000new), was: Re: e1000: backport ich9 support from 7.5.5 ?
Arjan van de Ven wrote: Kok, Auke wrote: Jeff Garzik wrote: Andrew Morton wrote: On Fri, 29 Jun 2007 14:39:20 -0700 Kok, Auke [EMAIL PROTECTED] wrote: That's why we want to introduce a second e1000 driver (named differently, pick any name) that contains the new code base, side-by-side into the kernel with the current e1000. Sounds like a reasonable approach to me (it has plenty of precedent). But I forget what all the other issues were, so ignore me. Given past history with duplicate drivers and the problems that they cause -- I know, I've caused some of those problems :( -- I strongly recommend against when it can be avoided. Leaving e1000 with current hardware, and a new e1001 for newer hardware should be easier to manage for all involved, without the headaches that duplicate drivers cause. Jeff, ok first you hate the old e1000 and now you don't want to get rid of it ;) No -- e1000 is big and bloated but also stable and working and a key popular NIC driver for a lot of people. That means the transition has a wide impact, and thus should be considered a lot more carefully than (say) rewriting 8139cp. I reject the notion that a flag day switchover for a huge mass of e1000 users is the correct path. I do not think that best serves Linux users. It is better to get a new driver out in the field and working for a small population, let time pass, then consider if we really want to transition the entire e1000 population to the new driver. That leaves the mass of e1000 users with a working driver while the new driver proves itself. Since a small population of users is required to use the new driver, by virtue of new/old PCI ID split, you are guaranteed a population of test users immediately for the new driver. When the new driver proves itself stable, you will have plenty of knowledge from which to decide whether to transition 8257x users, all e1000 users, or whatever. I appreciate the pain a temporary dual driver situation gives; it comes down to a few things that I can think of right now, if you see more please add to the list. 1) users who find a bug in the new one silently use the old one rather than reporting the bug; and only scream when the old one eventually goes away (see ALSA/OSS duplication) 2) users who enable both in KConfig may get a random one 3) distros really prefer only 1 driver per PCI ID for their infrastructure tools 4) there will be resistance against deleting the old one meaning it might not happen You are missing the largest source of pain and headache: Users will use the default driver, which means no field testing at all until flag day, with obvious results. Furthermore, Linux kernel history demonstrates that temporary dual driver situations are rarely temporary. Thus, selling it as such in the face of all contrary experience is pure hyperbole. Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFR: New e1000 driver (e1000new), was: Re: e1000: backport ich9 support from 7.5.5 ?
Roland Dreier wrote: one possibility would be to merge e1000new with support only for chips not supported by e1000, and semi-freeze e1000 (fixes only, new device support goes into e1000new). indeed. Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFR: New e1000 driver (e1000new), was: Re: e1000: backport ich9 support from 7.5.5 ?
James Chapman wrote: I envisage something like this:- e1000_core.c- common code used by all drivers of the e1000 family. Exports functions used by actual drivers. Loadable as a separate module when built as a module. e1000_82541.c- driver for 82541 e1000_82542.c- driver for 82542 e1000_x.c- driver for x Please ignore the statement I made above. Having read the code and chip docs again, I realize I jumped to the wrong conclusions when I saw separate source files for each of the 7 chip variants in the new driver. Just wanted to nip it in the bud before it causes any wasted time discussing it. :) -- James Chapman Katalix Systems Ltd http://www.katalix.com Catalysts for your Embedded Linux software development - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Splitting e1000 (Was: Re: e1000: backport ich9 support from 7.5.5 ?)
On 7/8/07, Jeff Garzik [EMAIL PROTECTED] wrote: * e1000 gets feedback * Intel disappears for months * Intel reappears with e1000 rewrite * you ask them for another complete (simpler) rewrite * Intel fights tooth and nail when the driver is not accepted verboten I don't think it must be as-is (i.e. replacing e1000 for all HW) but don't throw away all the work they've done in architecting the driver to cleanly handle multiple chip generations. How about: 1) Considering e1000new's current design, but for ICH9 only 2) test test test 3) Sometime in the future, considering incrementally moving previous PCIe generations' support from e1000 to e1000new (like I initially wanted, since that at least means there would be some technical reason for where the split occurs :-) Regards -- Andy - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Splitting e1000 (Was: Re: e1000: backport ich9 support from 7.5.5 ?)
Jeff Garzik wrote: Arjan van de Ven wrote: I'd second this; also lets be honest and fair about things and use a similar standard for all drivers; are we going to ask all driver submitters to remove NAPI, TSO and other stuff? I would hope not. Are That was merely a suggestion. My general meaning was small driver is easier to get into the kernel and into the field. it didn't quite come across as a suggestion, but if it was only a suggestion then ok; it's not really a practical option. all new drivers that have even a single bitfield going to be rejected? That's be a new rule but ok, if it applies to all drivers it's fair. That's one of a gazillion checklist items that a new driver has to pass through. You know how it works for new drivers. it looks better than our last try and it looks better than that ugly driver over there does not grant you a free pass on review. I know how it works for new drivers; they need to be good enough to maintain forward and form a burden on the stack/maintainers. Any remaining items need to be small enough to not be a 'risk' for users. What is there now is a driver that works, is relatively clean (and yes if you look deep enough you can ALWAYS nitpick on any code, even Linus' or Jeff's best code) and provides good performance with all the features a modern ethernet driver is supposed to have. Is this is the attitude, what's the point of even posting the driver for review? Intel posted e1000new on June 29. Feedback was then posted. and feedback is being incorperated. Ten days later, without a single revision, Intel declares its own driver clean and working and good enough for merging as-is. No. Arjan looked at driver and sees a rather clean driver compared to some of the other recently merged drivers, sees a list of relatively simple todos (some of which obviously aren't merge stoppers, others are) and hopes that this driver will be treated objectively without the appearance of having different levels of bars depending on what your email address ends with. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFR: New e1000 driver (e1000new), was: Re: e1000: backport ich9 support from 7.5.5 ?
I reject the notion that a flag day switchover for a huge mass of e1000 users is the correct path. I do not think that best serves Linux users. so the options we have is do it one pci id a time; and the suggestion by others like Christoph has been to make the split at the PCI Express switchover. Do you agree with that approach? I appreciate the pain a temporary dual driver situation gives; it comes down to a few things that I can think of right now, if you see more please add to the list. 1) users who find a bug in the new one silently use the old one rather than reporting the bug; and only scream when the old one eventually goes away (see ALSA/OSS duplication) 2) users who enable both in KConfig may get a random one 3) distros really prefer only 1 driver per PCI ID for their infrastructure tools 4) there will be resistance against deleting the old one meaning it might not happen You are missing the largest source of pain and headache: Users will use the default driver, which means no field testing at all until flag day, with obvious results. which is why the proposal that was below the text you quoted talks about working with the distro kernel maintainers and get the new driver default for their development releases etc etc that adds a significant level of granularity. For example I'd not be surprised if Fedora and Ubuntu test releases have orders of magnitude more users than kernel.org self compiloe kernels. Furthermore, Linux kernel history demonstrates that temporary dual driver situations are rarely temporary. Thus, selling it as such in the face of all contrary experience is pure hyperbole. history also demonstrates it can be done (just look at Adrian Bunk and the OSS drivers) as long as someone is determined to do it. It's not easy, especially cases where the two drivers have different maintainers who disagree, or represent different interfaces/functionality; but it's very not impossible either. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 8724] New: Unaligned acess in udp_recvmsg() on EV56
On Sun, 8 Jul 2007 14:30:17 -0700 (PDT) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=8724 Summary: Unaligned acess in udp_recvmsg() on EV56 Product: Platform Specific/Hardware Version: 2.5 KernelVersion: 2.6.22-rc7-git7 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Alpha AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Most recent kernel where this bug did not occur: Occurs in all 2.6.2[12] at least Distribution: Debian testing/lenny Hardware Environment: Digital PWS 433au (EV56) Software Environment: Linux utopia 2.6.22-rc7-git7 #1 Thu Jul 8 10:34:17 CDT 2027 alpha GNU/Linux Gnu C 4.2.1 Gnu make 3.81 binutils (GNU Binutils for Debian) 2.17.50.20070426 util-linux 2.12r mount 2.12r module-init-tools 3.3-pre11 e2fsprogs 1.40-WIP xfsprogs 2.8.18 Linux C Library libc.2.5 Dynamic linker (ldd) 2.5 Procps 3.2.7 Net-tools 1.60 Kbd85: Sh-utils 5.97 udev 105 Modules Loaded ipt_TOS xt_multiport xt_tcpudp xt_state ip6table_mangle ip6table_filter ip6_tables ipv6 iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_mangle iptable_filter ip_tables x_tables dm_mod serio_raw mxser_new ide_generic via_rhine mii generic ide_core tulip bitrev crc32 sg sr_mod cdrom raid1 md_mod loop Problem Description: kernel unaligned acc: 2248 (pc=fc583de4,va=fc00071c382a) fc583bc0 T udp_recvmsg fc583ed0 T udp_destroy_sock kernel unaligned acc: 1231 (pc=fc585190,va=fc000795e02e) fc585120 T __udp4_lib_rcv fc585bf0 T udp_rcv This problem does NOT seem to affect my Tsunami/Shark (EV68AL) box running the same kernel version on the same LAN. Not sure if it's CPU generation related (EV5 vs EV6), or if it's the NIC (via_rhine on the EV5 vs e100 on the EV6). Steps to reproduce: According to tshark, the only UDP packets are DHCP packets: 1815064200.131003 10.5.128.1 - 255.255.255.255 DHCP DHCP Offer- Transaction ID 0x27c6 That output isn't terribly illuminating. We'd need to work out which code corresponds with pc=fc583de4 and pc=fc585190. I don't think there's necessarily a bug here: that's just the kernel telling us that there are unaligned accesses which got successfully fixed up, so we're being perhaps a bit inefficient. Yes? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Cbe-oss-dev] [PATCH] ps3: gigabit ethernet driver for PS3, take3
Hi, On Fri, 6 Jul 2007 13:02:41 -0500, [EMAIL PROTECTED] (Linas Vepstas) mentioned: of the spidernet device driver. Please note that the old spidernet had absolutely disasterous performance for transmit; it also had a variety of crazy hangs and lockups under high-stress conditions; or NFS operation, or certain back-to-back tcp usage scenarios. A few dozen bugfixes went in since the time that the gelic snapshot was taken. I think we know the problems of old spidernet issues, we uses QS20 also in our lab. Current gelic has fixed them all separated from your work and it have no remaining issues or performance problem. In our measurement, PS3 network performance is better than IBM QS20 right now. I totally understand difficultness of your effort that fixing spidernet without decent documentation which we have, but the changes we made was significantly large as you see if you diff gelic driver with current spidernet driver (totally different), so the conclusion of discussion at 3C common-linux wg (Sony, IBM and Toshiba) was to include main line differently first then consider unifying it later. Otherwise we won't able to see PS3 network stack working for long time. Akira -- Akira Tsukamoto Sony Computer Entertainment Inc. Computer Development Div. Distributed OS Development Dept. Japan - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [2.6.22] Fix a potential NULL pointer dereference in free_shared_mem() in drivers/net/s2io.c
Micah Gruber wrote: This patch fixes a potential null dereference bug where we dereference nic before a null check. This patch simply moves the dereferencing after the null check. Signed-off-by: Micah Gruber [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] any chance you can resend in an email format other than format=flowed? Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [2.6.22] Fix a potential NULL pointer dereference in free_shared_mem() in drivers/net/s2io.c
This patch fixes a potential null dereference bug where we dereference nic before a null check. This patch simply moves the dereferencing after the null check. Signed-off-by: Micah Gruber [EMAIL PROTECTED] --- a/drivers/net/s2io.c +++ b/drivers/net/s2io.c @@ -789,12 +789,14 @@ struct mac_info *mac_control; struct config_param *config; int lst_size, lst_per_page; - struct net_device *dev = nic-dev; + struct net_device *dev; int page_num = 0; if (!nic) return; + dev = nic-dev; + mac_control = nic-mac_control; config = nic-config; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [2.6.22] Remove unneeded pointer idev from addrconf_cleanup() in net/ipv6/addrconf.c
This trivial patch removes the unneeded pointer idev returned from __in6_dev_get(), which is never used. The check for NULL can be simply done by if (__in6_dev_get(dev) == NULL). Signed-off-by: Micah Gruber [EMAIL PROTECTED] --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -4240,7 +4240,6 @@ void __exit addrconf_cleanup(void) { struct net_device *dev; - struct inet6_dev *idev; struct inet6_ifaddr *ifa; int i; @@ -4258,7 +4257,7 @@ */ for_each_netdev(dev) { - if ((idev = __in6_dev_get(dev)) == NULL) + if (__in6_dev_get(dev) == NULL) continue; addrconf_ifdown(dev, 1); } - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html