Re: RED + ECN not working
On 09-01-2007 17:08, [EMAIL PROTECTED] wrote:
> Hello,
>
> I have been trying to get the RED qdisc and ECN to work for the past few weeks and all my experiments have failed. Here is the setup I am using.
>
>   Src -- R1 -- R2 -- Dst
>
> Between Src and R1 is a 100Mbps link and between R1 and R2 a 10Mbps link. I set up the qdisc at R1 as follows:
>
>   tc qdisc add dev eth3 root handle 1: prio
>   tc qdisc add dev eth3 parent 1:1 handle 10: sfq
>   tc qdisc add dev eth3 parent 1:2 handle 20: sfq
>   tc qdisc add dev eth3 parent 1:3 handle 30: red limit 1 min 3000 max 5000 avpkt 1000 burst 5 probability 0.5 bandwidth 256kbit ecn
>
> I also inserted printk statements inside the code to print the calculated queue average (RED param), the backlog (Qdisc param) and the queue length (sk_buff_head param). I also inserted print statements for each action of RED, i.e. DONT_MARK, PROB_MARK, HARD_MARK and DROP.
>
> For the purpose of my experiments, I transferred a 25 MB file. I also did 100 simultaneous TCP transfers for 2 mins using iperf. In all cases none of the packets were either marked or dropped by the RED code. This was verified by the print statements in the logs. For all runs, qavg and backlog were 0 and qlen was 1.
>
> I have even tried classless RED and got the same results.
>
> Does anyone know whether RED+ECN works? Are there any tests and setups that someone has used and got successful results with? I would really appreciate any input or suggestions on this.

Hi,

Did you try the lartc.org list? More details would probably help, such as:
- where eth3 is
- tc -s -d qdisc show dev eth3 (before and after the transfer)
- kernel and iproute versions

Regards,
Jarek P.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
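For reference, the decision that those print statements trace can be sketched roughly as follows. This is a simplified userspace illustration and not the sch_red code: the name red_classify and the enum are hypothetical, and a real RED marks only a random fraction of the packets that fall between the two thresholds. With qavg reported as 0 and min set to 3000, a check of this shape would always return DONT_MARK, which would be consistent with logs that show no marks or drops.

```c
/* Outcomes named after the RED actions mentioned in the message above. */
enum red_action { DONT_MARK, PROB_MARK, HARD_MARK };

/*
 * Classify an average queue length against the RED thresholds.
 * Hypothetical helper for illustration only: in the PROB_MARK region a
 * real implementation still decides per packet, with some probability,
 * whether to mark.
 */
enum red_action red_classify(unsigned long qavg,
                             unsigned long qth_min,
                             unsigned long qth_max)
{
    if (qavg < qth_min)
        return DONT_MARK;   /* below min: never mark or drop */
    if (qavg >= qth_max)
        return HARD_MARK;   /* at or above max: always mark or drop */
    return PROB_MARK;       /* in between: mark probabilistically */
}
```

For example, red_classify(0, 3000, 5000) yields DONT_MARK, so the first thing to establish is why the average never rises above zero.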
Two Dual Core processors and NICS (not handling interrupts on one CPU / assigning a CPU to a NIC)
Hello,

I have a machine with 2 dual core CPUs. This machine runs Fedora Core 6. I have two Intel e1000 Gigabit network cards in this machine; I use bonding so that the machine assigns the same IP address to both NICs.

It seems to me that bonding is configured OK, because when running:

  cat /proc/net/bonding/bond0

I get:

  Ethernet Channel Bonding Driver: v3.0.3 (March 23, 2006)

  Bonding Mode: load balancing (round-robin)
  MII Status: up
  MII Polling Interval (ms): 100
  Up Delay (ms): 0
  Down Delay (ms): 0

  Slave Interface: eth0
  MII Status: up
  Link Failure Count: 1
  Permanent HW addr: .

  Slave Interface: eth1
  MII Status: up
  Link Failure Count: 1
  Permanent HW addr:

(And the Permanent HW addr is different in these two entries.)

I send a large number of packets to this machine (more than 20,000 per second). cat /proc/interrupts shows something like this:

  CPU0 CPU1 CPU2 CPU3
  50:3359337 0 0 0 PCI-MSI eth0
  58: 493396136 0 0 PCI-MSI eth1

As far as I understand, CPU0 and CPU1 belong to the first CPU, so this means that the second CPU (which has CPU2 and CPU3) does not handle interrupts for the arriving packets.

Can I somehow change it so that the second CPU will also handle the network interrupts for packets received on the NIC? Can I assign one CPU to eth0 and the second CPU to eth1?

Regards,
Mark
Re: watchdog timeout panic in e1000 driver
Hi,

During the holiday season, I posted a patch that fixes this problem without using spinlocks or disabling interrupts.
http://marc.theaimsgroup.com/?l=linux-netdev&m=116649413613845&w=2

With this patch applied, I confirmed that the system doesn't panic. I think this patch can fix the problem.

Does this patch have any problems? I welcome any comments.
--
Kenzo Iwami ([EMAIL PROTECTED])
[PATCH 1/2] [IrDA] irda-usb TX path optimization (was Re: IrDA spams logfiles - since 2.6.19)
Hi Dave,

Since we stopped using dev_alloc_skb on the IrDA TX frame, we constantly run into the case of the skb headroom being 0, and thus we call skb_cow for every IrDA TX frame. This patch uses a local buffer and copies the skb into it with memcpy, saving us a kmalloc for each of those IrDA TX frames.

Signed-off-by: Samuel Ortiz [EMAIL PROTECTED]
---
 drivers/net/irda/irda-usb.c |   43 ---
 drivers/net/irda/irda-usb.h |    1 +
 2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/drivers/net/irda/irda-usb.c b/drivers/net/irda/irda-usb.c
index 3ca1082..8381c04 100644
--- a/drivers/net/irda/irda-usb.c
+++ b/drivers/net/irda/irda-usb.c
@@ -441,25 +441,13 @@ static int irda_usb_hard_xmit(struct sk_buff *skb, struct net_device *netdev)
 		goto drop;
 	}
 
-	/* Make sure there is room for IrDA-USB header. The actual
-	 * allocation will be done lower in skb_push().
-	 * Also, we don't use directly skb_cow(), because it require
-	 * headroom >= 16, which force unnecessary copies - Jean II */
-	if (skb_headroom(skb) < self->header_length) {
-		IRDA_DEBUG(0, "%s(), Insuficient skb headroom.\n", __FUNCTION__);
-		if (skb_cow(skb, self->header_length)) {
-			IRDA_WARNING("%s(), failed skb_cow() !!!\n", __FUNCTION__);
-			goto drop;
-		}
-	}
+	memcpy(self->tx_buff + self->header_length, skb->data, skb->len);
 
 	/* Change setting for next frame */
-
 	if (self->capability & IUC_STIR421X) {
 		__u8 turnaround_time;
-		__u8* frame;
+		__u8* frame = self->tx_buff;
 		turnaround_time = get_turnaround_time( skb );
-		frame= skb_push(skb, self->header_length);
 		irda_usb_build_header(self, frame, 0);
 		frame[2] = turnaround_time;
 		if ((skb->len != 0) &&
@@ -472,17 +460,17 @@ static int irda_usb_hard_xmit(struct sk_buff *skb, struct net_device *netdev)
 			frame[1] = 0;
 		}
 	} else {
-		irda_usb_build_header(self, skb_push(skb, self->header_length), 0);
+		irda_usb_build_header(self, self->tx_buff, 0);
 	}
 
 	/* FIXME: Make macro out of this one */
 	((struct irda_skb_cb *)skb->cb)->context = self;
 
-        usb_fill_bulk_urb(urb, self->usbdev,
+	usb_fill_bulk_urb(urb, self->usbdev,
 		      usb_sndbulkpipe(self->usbdev, self->bulk_out_ep),
-		      skb->data, IRDA_SKB_MAX_MTU,
+		      self->tx_buff, skb->len + self->header_length,
 		      write_bulk_callback, skb);
-	urb->transfer_buffer_length = skb->len;
+
 	/* This flag (URB_ZERO_PACKET) indicates that what we send is not
 	 * a continuous stream of data but separate packets.
 	 * In this case, the USB layer will insert an empty USB frame (TD)
@@ -1455,6 +1443,9 @@ static inline void irda_usb_close(struct irda_usb_cb *self)
 	/* Remove the speed buffer */
 	kfree(self->speed_buff);
 	self->speed_buff = NULL;
+
+	kfree(self->tx_buff);
+	self->tx_buff = NULL;
 }
 
 /** USB CONFIG SUBROUTINES **/
@@ -1753,9 +1744,14 @@ static int irda_usb_probe(struct usb_interface *intf,
 	memset(self->speed_buff, 0, IRDA_USB_SPEED_MTU);
 
+	self->tx_buff = kzalloc(IRDA_SKB_MAX_MTU + self->header_length,
+				GFP_KERNEL);
+	if (self->tx_buff == NULL)
+		goto err_out_4;
+
 	ret = irda_usb_open(self);
 	if (ret)
-		goto err_out_4;
+		goto err_out_5;
 
 	IRDA_MESSAGE("IrDA: Registered device %s\n", net->name);
 	usb_set_intfdata(intf, self);
@@ -1766,14 +1762,14 @@ static int irda_usb_probe(struct usb_interface *intf,
 		self->needspatch = (ret < 0);
 		if (self->needspatch) {
 			IRDA_ERROR("STIR421X: Couldn't upload patch\n");
-			goto err_out_5;
+			goto err_out_6;
 		}
 
 		/* replace IrDA class descriptor with what patched device is now reporting */
 		irda_desc = irda_usb_find_class_desc (self->usbintf);
 		if (irda_desc == NULL) {
 			ret = -ENODEV;
-			goto err_out_5;
+			goto err_out_6;
 		}
 		if (self->irda_desc)
 			kfree (self->irda_desc);
@@ -1782,9 +1778,10 @@ static int irda_usb_probe(struct usb_interface *intf,
 	}
 
 	return 0;
-
-err_out_5:
+err_out_6:
 	unregister_netdev(self->netdev);
+err_out_5:
+	kfree(self->tx_buff);
 err_out_4:
 	kfree(self->speed_buff);
 err_out_3:
diff --git a/drivers/net/irda/irda-usb.h b/drivers/net/irda/irda-usb.h
index 6b2271f..e846c38 100644
--- a/drivers/net/irda/irda-usb.h
+++
[PATCH 2/2] [IrDA] Removed incorrect IRDA_ASSERT()
With USB2.0 bulk out MTU can be 512 bytes, so checking it only for 64 bytes is incorrect.

Signed-off-by: Samuel Ortiz [EMAIL PROTECTED]
---
 drivers/net/irda/irda-usb.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/net/irda/irda-usb.c b/drivers/net/irda/irda-usb.c
index 8381c04..a66aacf 100644
--- a/drivers/net/irda/irda-usb.c
+++ b/drivers/net/irda/irda-usb.c
@@ -1515,8 +1515,6 @@ static inline int irda_usb_parse_endpoints(struct irda_usb_cb *self, struct usb_
 	IRDA_DEBUG(0, "%s(), And our endpoints are : in=%02X, out=%02X (%d), int=%02X\n",
 		__FUNCTION__, self->bulk_in_ep, self->bulk_out_ep, self->bulk_out_mtu, self->bulk_int_ep);
-	/* Should be 8, 16, 32 or 64 bytes */
-	IRDA_ASSERT(self->bulk_out_mtu == 64, ;);
 
 	return((self->bulk_in_ep != 0) && (self->bulk_out_ep != 0));
 }
--
1.4.4.4
Re: rare bad TCP checksum with 2.6.19?
Michael Tokarev [EMAIL PROTECTED] wrote:
> Note there's no funny/interesting hardware involved, like network cards with tcp checksumming offload capabilities (this is plain dumb 8139 card).

The 8139 card might be dumb, but the driver isn't :) It emulates checksum offload in software, meaning that tcpdump will show bogus checksums.

So please disable hardware checksum offload with ethtool -K and then try again.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: Two Dual Core processors and NICS (not handling interrupts on one CPU / assigning a CPU to a NIC)
Hi Mark,

On 1/15/07, Mark Ryden [EMAIL PROTECTED] wrote:
> I have a machine with 2 dual core CPUs. This machine runs Fedora Core 6. I have two Intel e1000 Gigabit network cards in this machine; I use bonding so that the machine assigns the same IP address to both NICs.
>
> cat /proc/interrupts shows something like this:
>
>   CPU0 CPU1 CPU2 CPU3
>   50:3359337 0 0 0 PCI-MSI eth0
>   58: 493396136 0 0 PCI-MSI eth1
>
> As far as I understand, CPU0 and CPU1 belong to the first CPU, so this means that the second CPU (which has CPU2 and CPU3) does not handle interrupts for the arriving packets.
>
> Can I somehow change it so that the second CPU will also handle the network interrupts for packets received on the NIC? Can I assign one CPU to eth0 and the second CPU to eth1?

How will it help you?

You can set the smp-affinity mask for each IRQ under /proc/irq/<irq-number>/; google for 'linux smp-affinity'.

The subject is discussed in more detail at http://linux-net.osdl.org/index.php/TODO#TCP and in the thread http://marc.theaimsgroup.com/?t=11669529061&r=1&w=2 (read from the bottom).

--
Sincerely,
Robert Iakobashvili, coroberti %x40 gmail %x2e com
...
Navigare necesse est, vivere non est necesse
...
http://sourceforge.net/projects/curl-loader
A powerful open-source HTTP/S, FTP/S traffic generating, loading and testing tool.
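As a rough illustration of the smp-affinity advice above, the mask is a hexadecimal bitmap with one bit per logical CPU, written to the smp_affinity file of the IRQ. The sketch below is only an assumption of how one might do this from C; the function names cpu_to_affinity_mask and set_irq_affinity are made up for this example, and writing the file requires root.

```c
#include <stdio.h>

/*
 * Format the hexadecimal bitmask that selects a single CPU,
 * e.g. CPU 2 -> "4". Returns the number of characters written,
 * or -1 if the CPU index does not fit in a 32-bit mask.
 */
int cpu_to_affinity_mask(unsigned int cpu, char *buf, size_t len)
{
    if (cpu >= 32)
        return -1;
    return snprintf(buf, len, "%x", 1u << cpu);
}

/*
 * Write the single-CPU mask to /proc/irq/<irq>/smp_affinity.
 * Returns 0 on success, -1 on any failure (bad CPU, no permission,
 * no such IRQ).
 */
int set_irq_affinity(unsigned int irq, unsigned int cpu)
{
    char path[64], mask[16];
    FILE *f;

    if (cpu_to_affinity_mask(cpu, mask, sizeof(mask)) < 0)
        return -1;
    snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
    f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "%s\n", mask);
    return fclose(f) == 0 ? 0 : -1;
}
```

With the IRQ numbers from the listing above, set_irq_affinity(58, 2) would ask the kernel to deliver the eth1 interrupt to CPU2. Whether that actually improves throughput is a separate question, as the reply points out.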
Re: [IPROUTE 04/05]: Replace usec by time in function names
On 10-01-2007 11:01, Patrick McHardy wrote:
> [IPROUTE]: Replace usec by time in function names
>
> Rename functions containing usec since they don't necessarily return usec units anymore.
>
> Signed-off-by: Patrick McHardy [EMAIL PROTECTED]
>
> ---
...
> diff --git a/tc/q_cbq.c b/tc/q_cbq.c
> index a56..913b26a 100644
> --- a/tc/q_cbq.c
> +++ b/tc/q_cbq.c
> @@ -500,17 +500,17 @@ static int cbq_print_opt(struct qdisc_ut
>  	if (lss && show_details) {
>  		fprintf(f, "\nlevel %u ewma %u avpkt %ub ", lss->level, lss->ewma_log, lss->avpkt);
>  		if (lss->maxidle) {
> -			fprintf(f, "maxidle %luus ", tc_core_tick2usec(lss->maxidle>>lss->ewma_log));
> +			fprintf(f, "maxidle %luus ", tc_core_tick2time(lss->maxidle>>lss->ewma_log));

If the unit is not necessarily usec, "%luus" could be misleading here and later.

...
> diff --git a/tc/q_netem.c b/tc/q_netem.c
> index cfd1799..24fb95e 100644
> --- a/tc/q_netem.c
> +++ b/tc/q_netem.c
> @@ -108,15 +108,15 @@ static int get_ticks(__u32 *ticks, const
>  {
>  	unsigned t;
>
> -	if(get_usecs(&t, str))
> +	if(get_time(&t, str))
>  		return -1;
>
> -	if (tc_core_usec2big(t)) {
> +	if (tc_core_time2big(t)) {
>  		fprintf(stderr, "Illegal %d usecs (too large)\n", t);

Like above, but for "usecs".

...
> diff --git a/tc/tc_core.c b/tc/tc_core.c
> index 07dc4ba..e27254e 100644
> --- a/tc/tc_core.c
> +++ b/tc/tc_core.c
> @@ -27,21 +27,21 @@ static __u32 t2us=1;
>  static __u32 us2t=1;
>  static double tick_in_usec = 1;
>
> -int tc_core_usec2big(long usec)
> +int tc_core_time2big(long time)
>  {
> -	__u64 t = usec;
> +	__u64 t = time;
>
>  	t *= tick_in_usec;
>  	return (t >> 32) != 0;
>  }
>
> -long tc_core_usec2tick(long usec)
> +long tc_core_time2tick(long time)
>  {
> -	return usec*tick_in_usec;
> +	return time*tick_in_usec;
>  }
>
> -long tc_core_tick2usec(long tick)
> +long tc_core_tick2time(long tick)
>  {
>  	return tick/tick_in_usec;
>  }

Similarly here (tick_in_time)?

Regards,
Jarek P.
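To make the naming point concrete, here is a standalone sketch of the three conversions with the scale factor renamed the way the review suggests. It is not the iproute2 source: the names without the tc_core_ prefix are made up for this example, and the value 1.0 is only a placeholder for whatever ratio the real tool determines.

```c
#include <stdint.h>

/* Ticks per time unit. Placeholder value; the unit is deliberately unnamed. */
double tick_in_time = 1.0;

/* Nonzero if `time`, converted to ticks, would not fit in 32 bits. */
int time2big(long time)
{
    uint64_t t = (uint64_t)time;

    t = (uint64_t)(t * tick_in_time);
    return (t >> 32) != 0;
}

/* Convert a time value to scheduler ticks. */
long time2tick(long time)
{
    return (long)(time * tick_in_time);
}

/* Convert scheduler ticks back to a time value. */
long tick2time(long tick)
{
    return (long)(tick / tick_in_time);
}
```

Nothing in these bodies depends on the unit being microseconds, which is why format strings and messages that still say "us" or "usecs" stand out once the functions are renamed.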
Re: [PATCH] Remove CONFIG_NET_WIRELESS
On Sat, 2007-01-13 at 18:17 +0100, Maarten Lankhorst wrote:
> Remove CONFIG_NET_WIRELESS
>
> Nothing uses this, and it breaks the kernel build if a wireless device is used with an unsupported type of bus. Verified this with a grep.

I don't really care about the symbol and I'm in favour of removing it if it is useless, but I don't understand the rationale. How does enabling this cause anything to fail?

johannes
Re: [IPROUTE 02/05]: Introduce tc_calc_xmitsize and use where appropriate
On 10-01-2007 11:01, Patrick McHardy wrote:
> [IPROUTE]: Introduce tc_calc_xmitsize and use where appropriate
>
> Add tc_calc_xmitsize() as complement to tc_calc_xmittime(), which calculates the size that can be transmitted at a given rate during a given time.
>
> Replace all expressions of the form
>
>   size = rate*tc_core_tick2usec(time))/100
>
> by tc_calc_xmitsize() calls.
>
> Signed-off-by: Patrick McHardy [EMAIL PROTECTED]
>
> ---
...
> +unsigned tc_calc_xmitsize(unsigned rate, unsigned ticks)
> +{
> +	return ((double)rate*tc_core_tick2usec(ticks))/100;
> +}
> +

Actually, besides replacing the expression, this function also changes its type to unsigned.

Regards,
Jarek P.
Re: [PATCH] Remove CONFIG_NET_WIRELESS
Johannes Berg wrote:
> On Sat, 2007-01-13 at 18:17 +0100, Maarten Lankhorst wrote:
>> Remove CONFIG_NET_WIRELESS
>>
>> Nothing uses this, and it breaks the kernel build if a wireless device is used with an unsupported type of bus. Verified this with a grep.
>
> I don't really care about the symbol and I'm in favour of removing it if it is useless, but I don't understand the rationale. How does enabling this cause anything to fail?
>
> johannes

Enabling this doesn't cause anything to fail. However, my wireless router doesn't have a PCI bus but a native SSB instead, so CONFIG_NET_WIRELESS isn't selected. This in turn causes wext-common.o not to be built, so I get missing symbols and a build breakage. That's why I made wext-common.o depend on CONFIG_WIRELESS_EXT instead of CONFIG_NET_WIRELESS. Since nothing else uses CONFIG_NET_WIRELESS, I decided to kill that symbol.

maarten
Re: [PATCH] Remove CONFIG_NET_WIRELESS
On Mon, 2007-01-15 at 13:55 +0100, Maarten Lankhorst wrote:
> Johannes Berg wrote:
>> On Sat, 2007-01-13 at 18:17 +0100, Maarten Lankhorst wrote:
>>> Remove CONFIG_NET_WIRELESS
>>>
>>> Nothing uses this, and it breaks the kernel build if a wireless device is used with an unsupported type of bus. Verified this with a grep.
>>
>> I don't really care about the symbol and I'm in favour of removing it if it is useless, but I don't understand the rationale. How does enabling this cause anything to fail?
>>
>> johannes
>
> Enabling this doesn't cause anything to fail. However, my wireless router doesn't have a PCI bus but a native SSB instead, so CONFIG_NET_WIRELESS isn't selected. This in turn causes wext-common.o not to be built, so I get missing symbols and a build breakage. That's why I made wext-common.o depend on CONFIG_WIRELESS_EXT instead of CONFIG_NET_WIRELESS. Since nothing else uses CONFIG_NET_WIRELESS, I decided to kill that symbol.

Ok, that makes sense to me. Let's put this in, but with this better description rather than the original one.

johannes
Re: rare bad TCP checksum with 2.6.19?
Herbert Xu wrote:
> Michael Tokarev [EMAIL PROTECTED] wrote:
>> Note there's no funny/interesting hardware involved, like network cards with tcp checksumming offload capabilities (this is plain dumb 8139 card).
>
> The 8139 card might be dumb, but the driver isn't :) It emulates checksum offload in software, meaning that tcpdump will show bogus checksums.
>
> So please disable hardware checksum offload with ethtool -K and then try again.

  # ethtool -k eth0
  Offload parameters for eth0:
  Cannot get device rx csum settings: Operation not supported
  Cannot get device tx csum settings: Operation not supported
  Cannot get device scatter-gather settings: Operation not supported
  Cannot get device tcp segmentation offload settings: Operation not supported
  no offload info available
  # ethtool -K eth0 rx off tx off tso off
  Cannot set device rx csum settings: Operation not supported

So I guess the problem is not related to hw checksumming offloading.

Meanwhile, I tried many times to reproduce the problem, with little success. With different sizes, options, et al., I can't force the sending side to send some data within a FIN packet. I.e., most of the time the thing just works, because no data goes with the FIN packet. But once every 50..100 tries I see a single FIN-with-data packet, and that one ALWAYS has a bad checksum.

I was never able to reproduce the problem on a LAN, only when going from a distant host. And even with that distant host, it's very difficult to reproduce.

At least one network (also distant) triggers this problem on every 2nd try or so (the one I experimented with yesterday). But I have no access to that network. I kindly asked for help yesterday, but I can't abuse their willingness to help any more.

And another thing I noticed. Right now I'm experimenting with another machine, running 2.6.17(.13). It also shows similar behavior with bad csums, but MUCH more rarely than this 2.6.19. Like this:

  16:29:32.490976 IP (tos 0x60, ttl 48, id 14110, offset 0, flags [DF], length: 80) 69.42.67.34.2612 > 81.13.94.6.1234: . [bad tcp cksum f4b4 (->c1cc)!] ack 93407 win 9821 <nop,nop,timestamp 1046528199 5497679,nop,nop,sack sack 3 {104991:109335}{110783:112231}{104991:109335} >
  16:29:32.525988 IP (tos 0x60, ttl 48, id 14112, offset 0, flags [DF], length: 80) 69.42.67.34.2612 > 81.13.94.6.1234: . [bad tcp cksum 3fb1 (->1819)!] ack 93407 win 9821 <nop,nop,timestamp 1046528202 5497679,nop,nop,sack sack 3 {110783:113679}{122367:123815}{110783:113679} >
  16:29:32.561407 IP (tos 0x60, ttl 48, id 14116, offset 0, flags [DF], length: 80) 69.42.67.34.2612 > 81.13.94.6.1234: . [bad tcp cksum 87c0 (->2610)!] ack 93407 win 9821 <nop,nop,timestamp 1046528205 5497679,nop,nop,sack sack 3 {122367:127103}{128551:129572}{122367:127103} >

Here, 69.42.67.34 is 2.6.17 from which I'm requesting data, and 81.13.94.6 is the sender. So far this behavior has shown up with sack packets only, but I've seen it in the other direction too (also with sack), at least once.

Any idea how to force sending FIN-with-data?

Thanks!

/mjt
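For readers following the "bad tcp cksum" lines above: the value tcpdump prints is compared against a one's-complement sum that it recomputes over the segment. The following is a minimal sketch of that sum in the style of RFC 1071. The function name inet_checksum is made up here, and the TCP pseudo-header that a real verification must also include is left out, so treat it as an illustration of the arithmetic only.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement checksum over `len` bytes, as used by IP, TCP and UDP.
 * For TCP the caller would also have to feed in the pseudo-header
 * (addresses, protocol, length); that part is omitted in this sketch.
 * Intended for packet-sized buffers, so the 32-bit accumulator cannot
 * overflow.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    /* Add the data as big-endian 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* Pad a trailing odd byte with zero. */
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A receiver running the same sum over data that already carries a correct checksum gets 0. If the checksum is computed before some later change to the segment, the two values would disagree, which is one way a report like the one above could arise.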
Re: [PATCH] Remove CONFIG_NET_WIRELESS
On Mon, 15 Jan 2007 13:31:06 +, Johannes Berg wrote:
> On Mon, 2007-01-15 at 13:55 +0100, Maarten Lankhorst wrote:
>> Enabling this doesn't cause anything to fail. However, my wireless router doesn't have a PCI bus but a native SSB instead, so CONFIG_NET_WIRELESS isn't selected. This in turn causes wext-common.o not to be built, so I get missing symbols and a build breakage. That's why I made wext-common.o depend on CONFIG_WIRELESS_EXT instead of CONFIG_NET_WIRELESS. Since nothing else uses CONFIG_NET_WIRELESS, I decided to kill that symbol.
>
> Ok, that makes sense to me. Let's put this in, but with this better description rather than the original one.

The original mail with the patch apparently didn't get to netdev (I haven't received it and it's not in the netdev archive). Maarten, could you resend it please?

Thanks,

Jiri

--
Jiri Benc
SUSE Labs
network failures w. r8169 (RTL8111/RTL8168B)
Hello.

I am trying to get an RTL8111 (RealTek ethernet controller) running with the r8169 kernel module. I am using kernel 2.6.19.2 on a LinuxFromScratch system; the motherboard on which said RTL8111 sits is an Asus P5B.

lspci says (regarding the ethernet chip):

  03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)

During the use of network connections, we experience network transfer stops, during which a transfer seems to stall completely for many seconds, after which the transfer runs as if nothing happened.

This is reproducible every time with svn co http://svnserver/svn/tree (hangs VERY LONG) and with LWP::Parallel::UserAgent. With the latter, I fired 100 runs of 100 requests, with 7 clients trying parallel requests. Of these 100 runs, at least one, sometimes 2, stall for about 90 secs, after which the run continues and ends successfully, although more than 90 secs for 100 requests can't really be called successful.

Both the subversion checkout and the performance testing via LWP::Parallel::UserAgent run as expected (without stalling anywhere) on our other machines, which do not have an RTL8111. They also run as expected with kernels 2.6.18.x and the Realtek driver r1000. With kernel 2.6.19.x, the r1000 driver is unusable as it has enormous packet loss (used version: r1000_v1.05.tgz). The r8169 SEEMS to have no packet loss (ping, ping -f), but the phenomenon described above seems to indicate otherwise.

Has someone experienced similar effects with r8169.ko and RTL8111? (I searched the archives but didn't quite find anything like this.)

I also tried kernel 2.6.20-rc5 to see if the problem had gone away, but unfortunately the scenario remains the same.

Greets,
Jens

--
[EMAIL PROTECTED]
23.56...drifting
Re: [IPROUTE 02/05]: Introduce tc_calc_xmitsize and use where appropriate
Jarek Poplawski wrote:
> On 10-01-2007 11:01, Patrick McHardy wrote:
>> [IPROUTE]: Introduce tc_calc_xmitsize and use where appropriate
>>
>> Add tc_calc_xmitsize() as complement to tc_calc_xmittime(), which calculates the size that can be transmitted at a given rate during a given time.
>>
>> Replace all expressions of the form
>>
>>   size = rate*tc_core_tick2usec(time))/100
>>
>> by tc_calc_xmitsize() calls.
>>
>> Signed-off-by: Patrick McHardy [EMAIL PROTECTED]
>>
>> ---
> ...
>> +unsigned tc_calc_xmitsize(unsigned rate, unsigned ticks)
>> +{
>> +	return ((double)rate*tc_core_tick2usec(ticks))/100;
>> +}
>> +
>
> Actually, besides replacing the expression, this function also changes its type to unsigned.

It doesn't change it; all the expressions I replaced were directly assigned to an unsigned int variable.
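The type question can be illustrated with a small standalone sketch. It is not the iproute2 function: calc_xmitsize is a made-up name, it takes microseconds directly instead of ticks, and it assumes the divisor is one million (microseconds per second), whereas the divisor printed in this archive copy of the patch reads 100.

```c
/*
 * Bytes that can be sent at `rate` (bytes per second) during `usecs`
 * microseconds. The intermediate product is computed in double so it
 * cannot overflow 32 bits; the result is then narrowed to unsigned,
 * which is the same narrowing that happens when the open-coded
 * expression is assigned to an unsigned variable.
 * Assumption: microsecond units, hence the divisor of 1000000.
 */
unsigned calc_xmitsize(unsigned rate, unsigned usecs)
{
    return (unsigned)(((double)rate * usecs) / 1000000);
}
```

For instance, calc_xmitsize(1000000, 1000) gives 1000: at one million bytes per second, one millisecond is enough for a thousand bytes.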
3c59x.c patch to 2.6.18 fixing Wake on Lan (WOL)
Hello all.

The 3c59x.c in kernel 2.6.18 (and, as far as I can see, later ones too) attempts to enable PME from the already awake D0 state. The PME config space for this chip on Dell Optiplexes has a zero in the capabilities for this bit: no 'wake from D0'. The pci_enable_wake in 2.6.18 tests the capabilities before enabling PME, so the pci_enable_wake call fails; its result is not tested, so no error is reported.

This patch changes the wake request from 0 to PCI_D3hot.

This fix causes wake on LAN (WOL) to work properly on older Dell Optiplex models.

Kindly overlook newbie mistakes.

Thank you.

Harry Coin
Bettendorf, Iowa

--- drivers-orig/3c59x.c	2007-01-15 00:03:52.0 -0600
+++ drivers-fixed/3c59x.c	2007-01-15 00:46:37.0 -0600
@@ -3090,8 +3090,8 @@
 /* Set Wake-On-LAN mode and put the board into D3 (power-down) state. */
 static void acpi_set_WOL(struct net_device *dev)
 {
-	struct vortex_private *vp = netdev_priv(dev);
-	void __iomem *ioaddr = vp->ioaddr;
+	struct vortex_private *vp = netdev_priv(dev);
+	void __iomem *ioaddr = vp->ioaddr;
 
 	if (vp->enable_wol) {
 		/* Power up on: 1==Downloaded Filter, 2==Magic Packets, 4==Link Status. */
@@ -3101,7 +3101,7 @@
 		iowrite16(SetRxFilter|RxStation|RxMulticast|RxBroadcast, ioaddr + EL3_CMD);
 		iowrite16(RxEnable, ioaddr + EL3_CMD);
 
-		pci_enable_wake(VORTEX_PCI(vp), 0, 1);
+		pci_enable_wake(VORTEX_PCI(vp),PCI_D3hot,1);
 
 		/* Change the power state to D3; RxEnable doesn't take effect. */
 		pci_set_power_state(VORTEX_PCI(vp), PCI_D3hot);
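The capability test described above can be pictured with a small userspace sketch. It is not the kernel's pci_enable_wake: pme_supported is a hypothetical helper, and only the bit layout of the PME_Support field in the PCI Power Management Capabilities register is taken as given (one bit per state, starting with D0 at 0x0800).

```c
#include <stdint.h>

/* PME_Support bits of the Power Management Capabilities (PMC) register. */
#define PM_CAP_PME_D0      0x0800
#define PM_CAP_PME_D1      0x1000
#define PM_CAP_PME_D2      0x2000
#define PM_CAP_PME_D3HOT   0x4000
#define PM_CAP_PME_D3COLD  0x8000

/*
 * Return 1 if the device advertises PME generation from `state`,
 * with states numbered D0 = 0, D1 = 1, D2 = 2, D3hot = 3, D3cold = 4.
 */
int pme_supported(uint16_t pmc, unsigned state)
{
    if (state > 4)
        return 0;
    return (pmc & (PM_CAP_PME_D0 << state)) != 0;
}
```

Taking a made-up register value of 0xC000 (PME from D3hot and D3cold only), pme_supported(0xC000, 0) is 0 while pme_supported(0xC000, 3) is 1. That mirrors the situation in the report: a request for state 0 is refused, and the same request for PCI_D3hot is accepted.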
Re: network failures w. r8169 (RTL8111/RTL8168B)
On Mon, Jan 15, 2007 at 03:38:57PM +0100, Jens Stroebel wrote:
> During the use of network connections, we experience network transfer stops, during which a transfer seems to stall completely for many seconds, after which the transfer runs as if nothing happened.

Addition: Trying to debug the scenario a little, I used tcpdump to find out what gets lost and where. This didn't work, as running tcpdump on either the server or the client made the symptom go away (..?)

Greets,
Jens

--
[EMAIL PROTECTED]
23.56...drifting
Re: watchdog timeout panic in e1000 driver
Kenzo Iwami wrote:
> With this patch applied, I confirmed that the system doesn't panic. I think this patch can fix the problem.
>
> Does this patch have any problems?

Kenzo,

thanks for staying patient while most of us were out or busy.

Apart from acknowledging that you might have fixed a problem with your patch, we're very reluctant to merge such a huge change into our driver, since it touches many more cases than the one that seems to be giving you problems.

I've thought up a much more elegant solution that prevents the driver from asserting the swfw semaphore during normal operation, by checking the MAC LU (link up) register in the watchdog. This allows the watchdog task to bypass all PHY checking when all link statuses are OK, and thus removes the big problem that you are seeing.

Attached is a version that should apply against most current trees. Please give it a try and let us know if this also fixes the problem for you. I will most likely push this patch to the netdev tree in any case.

Cheers,

Auke

---
From: Auke Kok [EMAIL PROTECTED]

e1000: Don't do PHY reads in watchdog unless link status is down

Every 2 seconds the watchdog runs code that performs several PHY reads locked with the swfw semaphore, causing the semaphore to be unavailable for a short time. This is completely unneeded when the MAC detects PHY link up (LU).

Signed-off-by: Auke Kok [EMAIL PROTECTED]
---
 drivers/net/e1000/e1000_main.c |    5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 34d8e5d..9660925 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -2556,6 +2556,10 @@ e1000_watchdog(unsigned long data)
 	uint32_t link, tctl;
 	int32_t ret_val;
 
+	if ((netif_carrier_ok(netdev)) &&
+	    (E1000_READ_REG(&adapter->hw, STATUS) & E1000_STATUS_LU))
+		goto link_up;
+
 	ret_val = e1000_check_for_link(&adapter->hw);
 	if ((ret_val == E1000_ERR_PHY) &&
 	    (adapter->hw.phy_type == e1000_phy_igp_3) &&
@@ -2684,6 +2688,7 @@ e1000_watchdog(unsigned long data)
 		e1000_smartspeed(adapter);
 	}
 
+link_up:
 	e1000_update_stats(adapter);
 
 	adapter->hw.tx_packet_delta = adapter->stats.tpt - adapter->tpt_old;
[PATCH]: 8139cp: Don't blindly enable interrupts in cp_start_xmit
(trying again, this time to the correct maintainer)

All,

Similar to this commit:

http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d15e9c4d9a75702b30e00cdf95c71c88e3f3f51e

It's not safe in cp_start_xmit to blindly call spin_lock_irq and then spin_unlock_irq, since it may very well be the case that cp_start_xmit was called with interrupts already disabled (I came across this bug in the context of netdump in RedHat kernels, but the same issue holds, for example, in netconsole). Therefore, replace all instances of spin_lock_irq and spin_unlock_irq with spin_lock_irqsave and spin_unlock_irqrestore, respectively, in cp_start_xmit(). I tested this against a fully-virtualized Xen guest, which happens to use the 8139cp driver to talk to the emulated hardware. I don't have a real piece of 8139cp hardware to test on, so someone else will have to do that.

Signed-off-by: Chris Lalancette [EMAIL PROTECTED]

diff --git a/drivers/net/8139cp.c b/drivers/net/8139cp.c
index e2cb19b..6f93a76 100644
--- a/drivers/net/8139cp.c
+++ b/drivers/net/8139cp.c
@@ -765,17 +765,18 @@ static int cp_start_xmit (struct sk_buff *skb, struct net_device *dev)
 	struct cp_private *cp = netdev_priv(dev);
 	unsigned entry;
 	u32 eor, flags;
+	unsigned long intr_flags;
 #if CP_VLAN_TAG_USED
 	u32 vlan_tag = 0;
 #endif
 	int mss = 0;
 
-	spin_lock_irq(&cp->lock);
+	spin_lock_irqsave(&cp->lock, intr_flags);
 
 	/* This is a hard error, log it. */
 	if (TX_BUFFS_AVAIL(cp) <= (skb_shinfo(skb)->nr_frags + 1)) {
 		netif_stop_queue(dev);
-		spin_unlock_irq(&cp->lock);
+		spin_unlock_irqrestore(&cp->lock, intr_flags);
 		printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
 		       dev->name);
 		return 1;
@@ -908,7 +909,7 @@ static int cp_start_xmit (struct sk_buff *skb, struct net_device *dev)
 	if (TX_BUFFS_AVAIL(cp) <= (MAX_SKB_FRAGS + 1))
 		netif_stop_queue(dev);
 
-	spin_unlock_irq(&cp->lock);
+	spin_unlock_irqrestore(&cp->lock, intr_flags);
 
 	cpw8(TxPoll, NormalTxPoll);
 	dev->trans_start = jiffies;
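The hazard described above can be modelled in userspace with a single flag standing in for the CPU's interrupt-enable state. Everything in this sketch is a simulation with made-up names (sim_lock_irq and so on). It involves no real locking and only shows why the _irq pair is unsafe when the caller has already disabled interrupts.

```c
#include <stdbool.h>

/* Simulated interrupt-enable state of the CPU. */
bool irqs_enabled = true;

/* Model of spin_lock_irq / spin_unlock_irq: unlock re-enables unconditionally. */
void sim_lock_irq(void)   { irqs_enabled = false; }
void sim_unlock_irq(void) { irqs_enabled = true; }

/* Model of spin_lock_irqsave / spin_unlock_irqrestore: remember and restore. */
void sim_lock_irqsave(bool *flags)
{
    *flags = irqs_enabled;
    irqs_enabled = false;
}

void sim_unlock_irqrestore(bool flags)
{
    irqs_enabled = flags;
}

/* Transmit path written with the plain _irq variants. */
void xmit_with_irq(void)
{
    sim_lock_irq();
    /* ... queue the packet ... */
    sim_unlock_irq();
}

/* Transmit path written with the save/restore variants. */
void xmit_with_irqsave(void)
{
    bool flags;

    sim_lock_irqsave(&flags);
    /* ... queue the packet ... */
    sim_unlock_irqrestore(flags);
}
```

If a caller sets irqs_enabled to false and then calls xmit_with_irq(), the flag comes back true, so interrupts were turned on behind the caller's back. With xmit_with_irqsave() it stays false, which is the behaviour the patch is after.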
[PATCH] ixgb: Don't stop queue unnecessarily
From: Auke Kok [EMAIL PROTECTED]

ixgb: Don't stop queue unnecessarily

We don't need to stop the queue twice in ixgb_xmit_frame.

Signed-off-by: Auke Kok [EMAIL PROTECTED]

diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c
index 51bd7e8..83f4d67 100644
--- a/drivers/net/ixgb/ixgb_main.c
+++ b/drivers/net/ixgb/ixgb_main.c
@@ -1473,7 +1473,6 @@ ixgb_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 	if (unlikely(ixgb_maybe_stop_tx(netdev, &adapter->tx_ring, DESC_NEEDED))) {
-		netif_stop_queue(netdev);
 		spin_unlock_irqrestore(&adapter->tx_lock, flags);
 		return NETDEV_TX_BUSY;
 	}
Re: Two Dual Core processors and NICS (not handling interrupts on one CPU / assigning a CPU to a NIC)
Mark Ryden wrote:

Hello, I have a machine with 2 dual core CPUs. This machine runs Fedora Core 6. I have two Intel e1000 Gigabit network cards on this machine; I use bonding so that the machine assigns the same IP address to both NICs. It seems to me that bonding is configured OK, because when running:

cat /proc/net/bonding/bond0

I get:

Ethernet Channel Bonding Driver: v3.0.3 (March 23, 2006)
Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 1
Permanent HW addr: .

Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Permanent HW addr:

(And the Permanent HW addr is different in these two entries.)

I send a large number of packets to this machine (more than 20,000 per second). cat /proc/interrupts shows something like this:

CPU0 CPU1 CPU2 CPU3
50:3359337 0 0 0 PCI-MSI eth0
58: 493396136 0 0 PCI-MSI eth1

CPU0 and CPU1 belong to the first CPU as far as I understand, so this means that the second CPU (which has CPU2 and CPU3) does not handle interrupts for the arriving packets. Can I somehow change it so the second CPU will also handle network interrupts for packets received on the NIC? Can I assign one CPU to eth0 and the second CPU to eth1?

You will most likely get better performance from the shared cache on the Core 2 Duo by keeping it the way it is right now: packets that need to traverse the bridge keep the CPUs happy because, after receive, the sending side already has the data in its cache. Moving one of the NICs over to cpu2/cpu3 would cause a cascade of cache misses for every packet that passes across the two NICs in the bridge.

Cheers,
Auke
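For readers who still want to experiment with the assignment asked about above, Linux exposes a per-IRQ CPU bitmask in /proc/irq/<irq>/smp_affinity. The sketch below is an illustration under assumptions: the helper names `cpu_bit` and `set_irq_affinity` are made up for this example, writing the file needs root, and a running IRQ balancing daemon may overwrite the setting. Whether the move helps is a separate question, as the reply above points out.

```c
#include <assert.h>
#include <stdio.h>

/* Bit for one logical CPU in an affinity mask (this sketch handles CPUs 0..31). */
static unsigned long cpu_bit(unsigned cpu)
{
	assert(cpu < 32);
	return 1UL << cpu;
}

/*
 * Write a hexadecimal CPU mask to /proc/irq/<irq>/smp_affinity.
 * Returns 0 on success and -1 on failure (for example when not root,
 * or when the IRQ does not exist).
 */
static int set_irq_affinity(unsigned irq, unsigned long mask)
{
	char path[64];
	FILE *f;
	int ok;

	snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = fprintf(f, "%lx\n", mask) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

With the IRQ numbers from the /proc/interrupts output quoted above, `set_irq_affinity(58, cpu_bit(2) | cpu_bit(3))` would ask the kernel to deliver the eth1 interrupt on the second package only, leaving IRQ 50 (eth0) where it is.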
sky2: transmit timed out...
Stephen, After some days of uptime, I've been seeing 'transmit timed out' messages [1]. Let me know if there is any useful debugging you'd like. --- [1] sky2 v1.10 addr 0xdfb0 irq 16 Yukon-EC (0xb6) rev 1 sky2 eth1: addr 00:03:2d:05:9c:27 sky2 lan0: enabling interface sky2 lan0: Link is up at 1000 Mbps, full duplex, flow control both [snip] NETDEV WATCHDOG: lan0: transmit timed out sky2 lan0: tx timeout sky2 lan0: transmit ring 464 .. 441 report=466 done=466 sky2 status report lost? NETDEV WATCHDOG: lan0: transmit timed out sky2 lan0: tx timeout sky2 lan0: transmit ring 466 .. 441 report=466 done=466 sky2 hardware hung? flushing NETDEV WATCHDOG: lan0: transmit timed out sky2 lan0: tx timeout sky2 lan0: transmit ring 441 .. 418 report=466 done=466 sky2 status report lost? NETDEV WATCHDOG: lan0: transmit timed out sky2 lan0: tx timeout sky2 lan0: transmit ring 466 .. 443 report=466 done=466 sky2 hardware hung? flushing NETDEV WATCHDOG: lan0: transmit timed out sky2 lan0: tx timeout sky2 lan0: transmit ring 443 .. 420 report=466 done=466 sky2 status report lost? NETDEV WATCHDOG: lan0: transmit timed out sky2 lan0: tx timeout sky2 lan0: transmit ring 466 .. 443 report=466 done=466 sky2 hardware hung? flushing NETDEV WATCHDOG: lan0: transmit timed out sky2 lan0: tx timeout sky2 lan0: transmit ring 443 .. 420 report=466 done=466 sky2 status report lost? NETDEV WATCHDOG: lan0: transmit timed out sky2 lan0: tx timeout sky2 lan0: transmit ring 466 .. 443 report=466 done=466 sky2 hardware hung? flushing NETDEV WATCHDOG: lan0: transmit timed out sky2 lan0: tx timeout sky2 lan0: transmit ring 443 .. 420 report=466 done=466 sky2 status report lost? NETDEV WATCHDOG: lan0: transmit timed out sky2 lan0: tx timeout sky2 lan0: transmit ring 466 .. 443 report=466 done=466 sky2 hardware hung? flushing NETDEV WATCHDOG: lan0: transmit timed out sky2 lan0: tx timeout sky2 lan0: transmit ring 443 .. 420 report=466 done=466 sky2 status report lost? 
NETDEV WATCHDOG: lan0: transmit timed out sky2 lan0: tx timeout sky2 lan0: transmit ring 466 .. 443 report=466 done=466 sky2 hardware hung? flushing NETDEV WATCHDOG: lan0: transmit timed out sky2 lan0: tx timeout sky2 lan0: transmit ring 443 .. 420 report=466 done=466 sky2 status report lost? NETDEV WATCHDOG: lan0: transmit timed out sky2 lan0: tx timeout sky2 lan0: transmit ring 466 .. 443 report=466 done=466 sky2 hardware hung? flushing NETDEV WATCHDOG: lan0: transmit timed out sky2 lan0: tx timeout sky2 lan0: transmit ring 443 .. 420 report=466 done=466 sky2 status report lost? NETDEV WATCHDOG: lan0: transmit timed out sky2 lan0: tx timeout sky2 lan0: transmit ring 466 .. 443 report=466 done=466 sky2 hardware hung? flushing -- Daniel J Blueman - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 3c59x.c patch to 2.6.18 fixing Wake on Lan (WOL)
At 11:00 AM 1/15/2007 -0500, Dan Williams wrote:

On Mon, 2007-01-15 at 09:12 -0600, Harry Coin wrote:

Hello all. The 3c59x.c in kernel 2.6.18 (and, as I see, later ones too) attempts to enable PME from the already awake D0 state. The PME config space on Dell Optiplexes for this chip has a zero in the capabilities for this bit: no 'wake from D0'. The pci_enable_wake in 2.6.18 tests the capabilities before enabling PME, so the pci_enable_wake call fails; its result is not tested, so no error is reported. The fix changes the wake request state from 0 to PCI_D3hot. This fix causes wake on LAN (WOL) to work properly on older Dell Optiplex models. Kindly overlook newbie mistakes. Thank you.

You'll want to include a line like:

Signed-off-by: Harry Coin your email here

which signifies that you are legally able to contribute the attached patch under the GPL license. Do this right before the start of the patch (where you put your signature in the previous mail).

Dan

Thank you. I've added it to a repeat of the original posting copied below.

--- drivers-orig/3c59x.c2007-01-15 00:03:52.0 -0600
+++ drivers-fixed/3c59x.c 2007-01-15 00:46:37.0 -0600
@@ -3090,8 +3090,8 @@
 /* Set Wake-On-LAN mode and put the board into D3 (power-down) state. */
 static void acpi_set_WOL(struct net_device *dev)
 {
-	struct vortex_private *vp = netdev_priv(dev);
-	void __iomem *ioaddr = vp->ioaddr;
+	struct vortex_private *vp = netdev_priv(dev);
+	void __iomem *ioaddr = vp->ioaddr;

 	if (vp->enable_wol) {
 		/* Power up on: 1==Downloaded Filter, 2==Magic Packets, 4==Link Status. */
@@ -3101,7 +3101,7 @@
 		iowrite16(SetRxFilter|RxStation|RxMulticast|RxBroadcast, ioaddr + EL3_CMD);
 		iowrite16(RxEnable, ioaddr + EL3_CMD);

-		pci_enable_wake(VORTEX_PCI(vp), 0, 1);
+		pci_enable_wake(VORTEX_PCI(vp),PCI_D3hot,1);

 		/* Change the power state to D3; RxEnable doesn't take effect. */
 		pci_set_power_state(VORTEX_PCI(vp), PCI_D3hot);

Harry Coin
Bettendorf, Iowa

Signed-off-by: Harry Coin [EMAIL PROTECTED]
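The capability test described in the message can be sketched in user space. This is a simplified model, not the kernel's pci_enable_wake(): it assumes the standard layout of the PCI Power Management Capabilities register, where bits 11 to 15 (PME_Support) say from which power states the device can assert PME#. The names `can_wake_from` and `enum pm_state` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PME_Support field: one bit each for D0, D1, D2, D3hot, D3cold. */
#define PM_CAP_PME_SHIFT 11
#define PM_CAP_PME_MASK  0xF800u

enum pm_state { STATE_D0 = 0, STATE_D1, STATE_D2, STATE_D3HOT, STATE_D3COLD };

/* True if the capabilities word advertises PME# from the given state. */
static bool can_wake_from(uint16_t pmc, enum pm_state state)
{
	unsigned support;

	assert((unsigned)state <= STATE_D3COLD);
	support = (pmc & PM_CAP_PME_MASK) >> PM_CAP_PME_SHIFT;
	return (support & (1u << state)) != 0;
}
```

For a device that advertises only D3hot and D3cold (an example value of 0xC000 in the PME_Support bits, chosen for illustration), a wake request for state 0 is refused while a request for D3hot is accepted, which matches the behaviour the patch works around.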
e100.c patch to 2.6.18 fixing Wake on Lan (WOL)
Hello from Iowa. Below please find a fix to the Wake On Lan function in the e100.c (intel 10/100) driver. With the original driver distributed in kernel 2.6.18 in debian etch, wake on lan did not work. This was tested on 14 dell optiplexes with built-in ethernet chips in a totally diskless environment (initramfs / pxelinux). All operations were normal save wake on lan. When WOL has been enabled with ethtools, the old driver assumes wrongly that e100_configure will be called at least once with !netif_running. Only in that instance will it set the chip to notice 'magic' wol packets if the ethtools -s wol g has been called prior. The old e100_down routine never does call e100_configure so that the driver never does turn off the 'disable WOL magic packet' bit. Neither does the .shutdown routine. This fix tries to only enable the WOL recognition only when e100_down is called for the last time before module unload or system shutdown, while leaving ifconfig down untouched.(testing for being run in the context of dev-stop). Notice that the hw_reset routine is called in the old e100_down, and that silently causes WOL to be reset. In some attempt to avoid this debian (and I don't know which other sysvinit tools) added a NETDOWN define to the /etc/init.d/halt script, which when changed from the default=yes to 'no' avoids the -i option to 'halt' leaving the e100 configured. With the below fix the default in /etc/init.d/halt is required, the define change is not necessary, in fact it is important that halt call down for wol to work. (In the case of the old e100 driver it didn't matter either way, as e100_configure was never called once the driver was stopped). Notice that the binary /sbin/halt in debain etch has a bug and in fact never does call ifdown, whether -i is or isn't specified. Compiling from the source by hand does work. I have submitted a bug report for this. 
A further e100 fix I didn't add was for .shutdown to check whether the driver was down and to call e100_down if it was still up. That added fix would make sure WOL would work no matter if the halt script did or didn't down the driver before system shutdown.I'm not sure what the implications of my fix are in the context of sleep /resume. I have also submitted the above to the e1000 group at intel privately as they are the 'maintainers', but this appears to be the only apropos open group I thought to note he here as well. Thanks Harry Coin N4 Communications Bettendorf, Iowa Signed-off-by: Harry Coin [EMAIL PROTECTED] --- drivers-orig/e100.c 2007-01-15 00:01:48.0 -0600 +++ drivers-fixed/e100.c2007-01-14 23:32:08.0 -0600 @@ -2088,10 +2088,26 @@ static void e100_down(struct nic *nic) { - /* wait here for poll to complete */ - netif_poll_disable(nic-netdev); - netif_stop_queue(nic-netdev); - e100_hw_reset(nic); +if ((!netif_running(nic-netdev)) (nic-flags wol_magic)) { + /* if this is a device close, and not an ifdown, and wol is enabled, */ + /* then turn off the bit disabling wol magic packet recognition on */ + /* the chip. Previously, WOL magic packet recognition was never */ + /* enabled as e100_down never called e100_configure when */ + /* nif_running was false. So: */ + /* This makes the e100 not only work with WOL, but */ + /* also avoids having to edit the default NETDOWN variable */ + /* in /etc/init.d/halt from the default 'yes' to 'no'. 
*/ + e100_exec_cb(nic, NULL, e100_configure); + /* wait here for poll to complete */ + netif_poll_disable(nic-netdev); + netif_stop_queue(nic-netdev); + e100_disable_irq(nic); +} else { + /* wait here for poll to complete */ + netif_poll_disable(nic-netdev); + netif_stop_queue(nic-netdev); + e100_hw_reset(nic); + } free_irq(nic-pdev-irq, nic-netdev); del_timer_sync(nic-watchdog); netif_carrier_off(nic-netdev); @@ -2099,6 +2115,7 @@ e100_rx_clean_list(nic); } + static void e100_tx_timeout(struct net_device *netdev) { struct nic *nic = netdev_priv(netdev); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e100.c patch to 2.6.18 fixing Wake on Lan (WOL)
Harry Coin wrote: Hello from Iowa. Below please find a fix to the Wake On Lan function in the e100.c (intel 10/100) driver. With the original driver distributed in kernel 2.6.18 in debian etch, wake on lan did not work. This was tested on 14 dell optiplexes with built-in ethernet chips in a totally diskless environment (initramfs / pxelinux). All operations were normal save wake on lan. Oi, I've done quite a bit of work especially on that since 2.6.18 and as far as I could see those changes fixed WoL, suspend/resume and netconsole, as was confirmed by Andrew Morton even. Have you tried the version in 2.6.19? When WOL has been enabled with ethtools, the old driver assumes wrongly that e100_configure will be called at least once with !netif_running. Only in that instance will it set the chip to notice 'magic' wol packets if the ethtools -s wol g has been called prior. The old e100_down routine never does call e100_configure so that the driver never does turn off the 'disable WOL magic packet' bit. Neither does the .shutdown routine. This fix tries to only enable the WOL recognition only when e100_down is called for the last time before module unload or system shutdown, while leaving ifconfig down untouched.(testing for being run in the context of dev-stop). that's exactly what my patches should fix as far as I can remember I have also submitted the above to the e1000 group at intel privately as they are the 'maintainers', but this appears to be the only apropos open group I thought to note he here as well. I have not seen this patch before, care to Cc me to that? We also publically discuss e1000/e100 and ixgb issues on [EMAIL PROTECTED] Feel free to Cc that list. Cheers, Auke - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: sky2: transmit timed out...
Please reproduce problem with this patch, then do: cat /proc/sys/net/sky2/lan0 This patch (which shouldn't go into the mainline driver), adds a debug interface to sky2 driver to dump the receive and transmit rings. The file /proc/net/sky2/ethX will show the status of transmits in process, status responses not handled, and receives pending. --- drivers/net/sky2.c | 158 +++-- drivers/net/sky2.h |4 + 2 files changed, 157 insertions(+), 5 deletions(-) --- sky2-2.6.orig/drivers/net/sky2.c2007-01-11 10:05:09.0 -0800 +++ sky2-2.6/drivers/net/sky2.c 2007-01-11 10:23:01.0 -0800 @@ -38,6 +38,7 @@ #include linux/workqueue.h #include linux/if_vlan.h #include linux/prefetch.h +#include linux/proc_fs.h #include linux/mii.h #include asm/irq.h @@ -866,10 +867,11 @@ /* Build description to hardware for one possibly fragmented skb */ static void sky2_rx_submit(struct sky2_port *sky2, - const struct rx_ring_info *re) + struct rx_ring_info *re) { int i; + re-idx = sky2-rx_put; sky2_rx_add(sky2, OP_PACKET, re-data_addr, sky2-rx_data_size); for (i = 0; i skb_shinfo(re-skb)-nr_frags; i++) @@ -1462,6 +1464,7 @@ } le-ctrl |= EOP; + re-idx = le - sky2-tx_le; /* debug */ if (tx_avail(sky2) = MAX_SKB_TX_LE) netif_stop_queue(dev); @@ -3296,6 +3299,139 @@ .get_perm_addr = ethtool_op_get_perm_addr, }; + +static struct proc_dir_entry *sky2_proc; + +static int sky2_seq_show(struct seq_file *seq, void *v) +{ + struct net_device *dev = seq-private; + const struct sky2_port *sky2 = netdev_priv(dev); + const struct sky2_hw *hw = sky2-hw; + unsigned port = sky2-port; + unsigned idx, ridx, rend, last; + + last = sky2_read16(hw, STAT_PUT_IDX); + + if (hw-st_idx == last) + seq_puts(seq, Status ring (empty)\n); + else { + seq_puts(seq, Status ring\n); + for (idx = hw-st_idx; idx != last; +idx = RING_NEXT(idx, STATUS_RING_SIZE)) { + const struct sky2_status_le *le = hw-st_le + idx; + seq_printf(seq, [%d] %#x %d %#x\n, + idx, le-opcode, le-length, le-status); + } + } + + if (sky2-tx_cons == sky2-tx_prod) + 
seq_puts(seq, \nTx ring (empty)\n); + else { + seq_puts(seq, \nTx ring\n); + idx = sky2-tx_cons; + while (idx != sky2-tx_prod) { + const struct tx_ring_info *re = sky2-tx_ring + idx; + + seq_printf(seq, [%d] %p\n, idx, re-skb); + do { + const struct sky2_tx_le *le = sky2-tx_le + idx; + seq_printf(seq, \t%#x %d, le-opcode, le-addr); + idx = RING_NEXT(idx, TX_RING_SIZE); + } while (idx != re-idx || idx != sky2-tx_prod); + seq_putc(seq, '\n'); + } + } + + seq_printf(seq, \nRx pending hw get=%d put=%d last=%d\n, + sky2_read16(hw, Y2_QADDR(rxqaddr[port], PREF_UNIT_GET_IDX)), + last = sky2_read16(hw, Y2_QADDR(rxqaddr[port], PREF_UNIT_PUT_IDX)), + sky2_read16(hw, Y2_QADDR(rxqaddr[port], PREF_UNIT_LAST_IDX))); + + ridx = sky2-rx_next; + do { + const struct rx_ring_info *re = sky2-rx_ring + ridx; + seq_printf(seq, [%d] %p |, ridx, re-skb); + + idx = re-idx; + ridx = (ridx + 1) % sky2-rx_pending; + + if (ridx == sky2-rx_next) + rend = last; + else + rend = sky2-rx_ring[ridx].idx; + + do { + const struct sky2_rx_le *le = sky2-rx_le + idx; + + switch (le-opcode ~HW_OWNER) { + case OP_PACKET: + case OP_BUFFER: + seq_printf(seq, %#x(%d), le-addr, le-length); + break; + case OP_ADDR64: + seq_printf(seq, %#x:, le-addr); + break; + default: + seq_printf(seq, {%x} %#x(%d), + le-opcode, le-addr, le-length); + } + + } while ((idx = RING_NEXT(idx, RX_LE_SIZE)) != rend); + + seq_puts(seq, \n); + } while (ridx != sky2-rx_next); + + return 0; +} + +static int sky2_proc_open(struct inode *inode, struct file *file) +{ + return single_open(file, sky2_seq_show, PDE(inode)-data); +} + +static const struct file_operations sky2_proc_fops = { +
Re: [patch 0/6] sky2 driver update (v1.11)
On Sat, 13 Jan 2007 14:03:29 +0100 Tino Keitel [EMAIL PROTECTED] wrote:

On Tue, Jan 02, 2007 at 20:10:15 +0100, Tino Keitel wrote:

[...] Btw., I just built 2.6.20-rc3 with patches 4 and 5 and wake on LAN now works. Thanks for your work.

Hi, I had some failures during resume from suspend with 2.6.20-rc3 and -rc4. I enabled pm_trace and it looks like the sky2 driver is the culprit:

hash matches drivers/base/power/resume.c:56
hash matches device :01:00.0

$ lspci | grep 01:00.0
01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 22)

I removed the patches and had no resume failure so far.

Regards,
Tino

What kind of failures, did the system just not come up? Did you have WOL enabled or not? The new code checks for pci_ errors on resume and it could be that the errors were always there.

--
Stephen Hemminger [EMAIL PROTECTED]
Re: rare bad TCP checksum with 2.6.19?
Michael Tokarev wrote:

Any idea how to force sending FIN-with-data?

int flag_on = 1;
setsockopt(fd, SOL_TCP, TCP_CORK, &flag_on, sizeof(flag_on));
send(fd, data, datalen, 0);
close(fd);

Eric Dumazet
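The cork, send, close sequence above can be tried over loopback with a small self-contained harness. This is a sketch under assumptions: TCP_CORK is Linux-specific, the helper names are invented for the example, and the code only checks that the peer receives the data before end-of-stream. Whether the kernel actually puts the data into the FIN segment has to be confirmed with a packet capture; nothing here asserts it.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Cork the socket, queue the data, then close so the flush happens at close. */
static int send_corked_and_close(int fd, const void *data, size_t len)
{
	int on = 1;
	int ok;

	assert(data != NULL);
	ok = setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on)) == 0 &&
	     send(fd, data, len, 0) == (ssize_t)len;
	if (close(fd) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Loopback harness: returns 1 if the peer read exactly msg before EOF. */
static int loopback_roundtrip_ok(const char *msg)
{
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	char buf[256];
	size_t want = strlen(msg), got = 0;
	ssize_t n;
	int lfd, cfd, sfd;

	if (want > sizeof(buf))
		return 0;
	lfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0)
		return 0;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0) {
		close(lfd);
		return 0;
	}
	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (cfd < 0) {
		close(lfd);
		return 0;
	}
	if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(cfd);
		close(lfd);
		return 0;
	}
	sfd = accept(lfd, NULL, NULL);
	close(lfd);
	if (sfd < 0) {
		close(cfd);
		return 0;
	}
	if (send_corked_and_close(cfd, msg, want) < 0) {
		close(sfd);
		return 0;
	}
	/* Read until the peer's FIN (recv returns 0) or the buffer is full. */
	while (got < sizeof(buf) &&
	       (n = recv(sfd, buf + got, sizeof(buf) - got, 0)) > 0)
		got += (size_t)n;
	close(sfd);
	return got == want && memcmp(buf, msg, want) == 0;
}
```

Running a capture on the loopback interface while calling `loopback_roundtrip_ok("hello")` is one way to look at the flags on the final data segment.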
Re: [patch 0/6] sky2 driver update (v1.11)
On Mon, Jan 15, 2007 at 10:21:49 -0800, Stephen Hemminger wrote:

On Sat, 13 Jan 2007 14:03:29 +0100 Tino Keitel [EMAIL PROTECTED] wrote:

On Tue, Jan 02, 2007 at 20:10:15 +0100, Tino Keitel wrote:

[...] Btw., I just built 2.6.20-rc3 with patches 4 and 5 and wake on LAN now works. Thanks for your work.

Hi, I had some failures during resume from suspend with 2.6.20-rc3 and -rc4. I enabled pm_trace and it looks like the sky2 driver is the culprit:

hash matches drivers/base/power/resume.c:56
hash matches device :01:00.0

$ lspci | grep 01:00.0
01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 22)

I removed the patches and had no resume failure so far.

Regards,
Tino

What kind of failures, did the system just not come up?

Yes, the screen stayed dark and the machine was dead. However, it was hardly reproducible. I set up a suspend/resume loop for an hour without failures. Then, when I just wanted to suspend for a while, resume failed.

Did you have WOL enabled or not?

I had WOL enabled.

Regards,
Tino
Re: e100.c patch to 2.6.18 fixing Wake on Lan (WOL)
At 10:19 AM 1/15/2007 -0800, Auke Kok wrote:

Have you tried the version in 2.6.19?

I even tried copying and pasting the e100_down and the latest PM stuff from the newest e100.c version on sourceforge. I admit to being defeated as to how to join a sourceforge group. Too many hours writing Microsoft drivers maybe?

It comes down to this:

1) The e100_configure command is the only place that turns off the WOL disable bit.
2) That bit is only turned off if e100_configure is called after netif_running is false and wol is set.
3) e100_configure is not called at any point after dev->stop (the first moment netif_running is false) through the end of .shutdown.

Therefore WOL disable is always turned on, no matter the request by ethtool.

I sense there is a belief that if pci_enable_wake has been called properly, then all's well. But on this board, there is a configuration bit that also has to be disabled, a bit that is silently reset during a hw_reset, and hw_reset __is__ called in e100_down. Hence, the fix I submitted.

I know it isn't perfect because I'm not intimately familiar with the dynamics of this chip. But I do know this: 14 Dell Optiplex systems failed to WOL with the stock 2.6.18 distributed with debian etch. After my patch is applied to e100.c, and no other changes from anything default in 2.6.18 and debian etch, it works perfectly every time. I should have added that ACPI and lapic are in use, but that's the usual case.

Cheers,
Harry Coin
N4 Communications
Bettendorf, Iowa
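The three points above can be restated as a small truth-table model. This is not driver code: it is a user-space sketch built only from the description in this message, and the names `magic_packet_disable`, `wol_armed_after_close`, and the two path constants are hypothetical. It assumes, as the message states, that a hardware reset leaves magic-packet recognition disabled and that only a configure step run with the interface not running and WOL requested clears that bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Rule from points 1) and 2): returns 1 when WOL stays disabled, 0 when armed. */
static int magic_packet_disable(bool netif_running, bool wol_magic)
{
	return (netif_running || !wol_magic) ? 1 : 0;
}

enum shutdown_path {
	PATH_STOCK,	/* down path resets the hardware, never reconfigures */
	PATH_PATCHED	/* down path reconfigures when WOL was requested */
};

/* Would the chip listen for magic packets after the final close? */
static bool wol_armed_after_close(enum shutdown_path path, bool wol_magic)
{
	int disable = 1;	/* state after a hardware reset, per the message */

	assert(path == PATH_STOCK || path == PATH_PATCHED);
	if (path == PATH_PATCHED && wol_magic)
		disable = magic_packet_disable(false, wol_magic);
	return disable == 0;
}
```

In this model the stock path never arms WOL regardless of the ethtool setting, and the patched path arms it only when WOL was requested, which is the behaviour the message reports on the 14 test machines.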
Re: [PATCH] Remove CONFIG_NET_WIRELESS
Jiri Benc wrote:

On Mon, 15 Jan 2007 13:31:06 +, Johannes Berg wrote:

On Mon, 2007-01-15 at 13:55 +0100, Maarten Lankhorst wrote:

Enabling this doesn't cause anything to fail, but my wireless router doesn't have a PCI bus; it has a native SSB instead, so CONFIG_NET_WIRELESS isn't selected. This in turn causes wext-common.o to not be built, so I get missing symbols and a build breakage. That's why I made wext-common.o depend on CONFIG_WIRELESS_EXT instead of CONFIG_NET_WIRELESS. Since nothing else uses CONFIG_NET_WIRELESS I decided to kill that symbol.

Ok, that makes sense to me. Let's put this in but with this better description rather than the original one.

The original mail with the patch apparently didn't get to netdev (I haven't received it and it's not in the netdev archive). Maarten, could you resend it please?

Thanks,
Jiri

Sorry, must have missed sending it to netdev; original message follows.

Remove CONFIG_NET_WIRELESS

Nothing uses this, and it breaks the kernel build if a wireless device is used with an unsupported type of bus. Verified this with a grep.

Signed-off-by: Maarten Lankhorst [EMAIL PROTECTED]

diff --git a/drivers/net/wireless/Kconfig b/drivers/net/wireless/Kconfig
index 03dbe60..b9620c6 100644
--- a/drivers/net/wireless/Kconfig
+++ b/drivers/net/wireless/Kconfig
@@ -544,11 +544,5 @@ source drivers/net/wireless/zd1211rw/Kc
 source drivers/net/wireless/d80211/Kconfig

-# yes, this works even when no drivers are selected
-config NET_WIRELESS
-	bool
-	depends on NET_RADIO && (ISA || PCI || PPC_PMAC || PCMCIA)
-	default y
-
 endmenu
diff --git a/net/wireless/Makefile b/net/wireless/Makefile
index f285440..44ae23a 100644
--- a/net/wireless/Makefile
+++ b/net/wireless/Makefile
@@ -12,5 +12,5 @@ obj-ny :=

 # this needs to be compiled in...
 obj-$(CONFIG_CFG80211_WEXT_COMPAT) += wext-compat.o
-obj-$(CONFIG_CFG80211_WEXT_COMPAT)$(CONFIG_NET_WIRELESS) += wext-common.o
+obj-$(CONFIG_CFG80211_WEXT_COMPAT)$(CONFIG_WIRELESS_EXT) += wext-common.o
 obj-y += $(obj-yy) $(obj-yn) $(obj-ny)
Re: e100.c patch to 2.6.18 fixing Wake on Lan (WOL)
Harry Coin wrote: At 10:19 AM 1/15/2007 -0800, Auke Kok wrote: Have you tried the version in 2.6.19? I even tried copying and pasting the e100_down and the latest PM stuff from the newest e100.c version on sourceforge. I admit to being defeated as to how to join a sourceforge group. Too many hours writing Microsoft drivers maybe? the list is open to posting, so that's fairly easy. It comes down to this: 1) The e100_configure command is the only place that turns off the WOL disable bit. 2) That bit is only turned off if e100_configure is called after netif_running is false and wol is set. 3) e100_configure is not called at any point after dev-stop (the first moment netif_running is false) through the end of .shutdown. Therefore WOL disable is always turned on, no matter the request by ethtools. I sense there is a sense that if pci_enable_wake has been called properly, then all's well. But on this board, there is a configuration bit that also has to be disabled, a but that is silently reset during a hw_reset, and hw_reset __is__ called in e100_down. Hence, the fix I submitted. I know it isn't perfect because I'm not intimately familiar with the dynamics of this chip. But I do know this: 14 Dell Optiplex systems failed to WOL with the stock 2.6.18 distributed with debian etch. After my patch is applied to e100.c, and no other changes from anything default in 2.6.18 and debian etch, it works perfectly every time. I should have added that ACPI and lapic are in use, but that's the usual case. okay, I don't necesary meant that your patch is incorrect, however we need to make sure that your patch doesn't break 2.6.19, because that code is already upstream. on top of that, both patches might be needed, and I suspect that is the case to keep suspend and netconsole to keep working, so I would still like to ask you to test 2.6.19, with and without your patch. I'll do the same here and push your patch to Garzik if (after testing) we're both OK with it. 
Thanks,
Auke
Re: [PATCH]: 8139cp: Don't blindly enable interrupts in cp_start_xmit
Chris Lalancette [EMAIL PROTECTED] : [...] Similar to this commit: http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d15e9c4d9a75702b30e00cdf95c71c88e3f3f51e It's not safe in cp_start_xmit to blindly call spin_lock_irq and then spin_unlock_irq, since it may very well be the case that cp_start_xmit was called with interrupts already disabled (I came across this bug in the context of netdump in RedHat kernels, but the same issue holds, for example, in netconsole). Therefore, replace all instances of spin_lock_irq and spin_unlock_irq with spin_lock_irqsave and spin_unlock_irqrestore, respectively, in cp_start_xmit(). I tested this against a fully-virtualized Xen guest, which happens to use the 8139cp driver to talk to the emulated hardware. I don't have a real piece of 8139cp hardware to test on, so someone else will have to do that. (message reformated to fit in 80 columns, please fix your mailer) As I understand http://lkml.org/lkml/2006/12/12/239, something like the patch below should had been sent instead. Herbert, ack/nak ? 
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 823215d..ff95641 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -55,7 +55,6 @@ static void queue_process(struct work_struct *work)
 	struct netpoll_info *npinfo =
 		container_of(work, struct netpoll_info, tx_work.work);
 	struct sk_buff *skb;
-	unsigned long flags;

 	while ((skb = skb_dequeue(&npinfo->txq))) {
 		struct net_device *dev = skb->dev;
@@ -65,19 +64,16 @@ static void queue_process(struct work_struct *work)
 			continue;
 		}

-		local_irq_save(flags);
 		netif_tx_lock(dev);
 		if (netif_queue_stopped(dev) ||
 		    dev->hard_start_xmit(skb, dev) != NETDEV_TX_OK) {
 			skb_queue_head(&npinfo->txq, skb);
 			netif_tx_unlock(dev);
-			local_irq_restore(flags);

 			schedule_delayed_work(&npinfo->tx_work, HZ/10);
 			return;
 		}
 		netif_tx_unlock(dev);
-		local_irq_restore(flags);
 	}
 }
Re: rare bad TCP checksum with 2.6.19?
On Mon, Jan 15, 2007 at 04:34:41PM +0300, Michael Tokarev wrote:

# ethtool -k eth0
Offload parameters for eth0:
Cannot get device rx csum settings: Operation not supported
Cannot get device tx csum settings: Operation not supported
Cannot get device scatter-gather settings: Operation not supported
Cannot get device tcp segmentation offload settings: Operation not supported
no offload info available
# ethtool -K eth0 rx off tx off tso off
Cannot set device rx csum settings: Operation not supported

So I guess the problem is not related to hw checksumming offloading.

Nope, it just means that 8139too doesn't provide ethtool handlers to disable checksum offloading. So I suggest that you try doing the tcpdump on the receive side as that should show the real checksum.

BTW, the reason tcpdump only shows some packets with bogus checksums is because it cuts packets off at 100 bytes by default so for most packets it can't verify the checksum at all. If you run it with -s 1600 you should see bogus checksums on every packet with payload.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
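To see why a truncated capture cannot be verified, it helps to look at how the checksum is computed. The sketch below implements the generic 16-bit one's-complement Internet checksum in the style of RFC 1071; the function name `inet_checksum` is chosen for this example. A real TCP checksum additionally covers a pseudo-header with the addresses, protocol, and segment length, which is left out here for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of 16-bit big-endian words, then complemented. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	assert(data != NULL || len == 0);
	while (len > 1) {
		sum += ((uint32_t)data[0] << 8) | data[1];
		data += 2;
		len -= 2;
	}
	if (len == 1)			/* odd trailing byte is padded with zero */
		sum += (uint32_t)data[0] << 8;
	while (sum >> 16)		/* fold carries back into the low 16 bits */
		sum = (sum & 0xFFFFu) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Every byte of the segment contributes to the sum, so a capture that stops short of the full payload yields a different value from the one the sender computed, and a tool can only skip verification or report a mismatch that is not real.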
Re: rare bad TCP checksum with 2.6.19?
Herbert Xu wrote: On Mon, Jan 15, 2007 at 04:34:41PM +0300, Michael Tokarev wrote: [] So I guess the problem is not related to hw checksumming offloading. Nope, it just means that 8139too doesn't provide ethtool handlers to disable checksum offloading. So I suggest that you try doing the tcpdump on the receive side as that should show the real checksum. I'm doing the capture on an intermediate host - the whole day today ;) BTW, the reason tcpdump only shows some packets with bogus checksums is because it cuts packets off at 100 bytes by default so for most packets it can't verify the checksum at all. If you run it with -s 1600 you should see bogus checksums on every packet with payload. And I'm capturing with -s 2000. By the way, tcpdump just does not verify the cheksum of truncated (due to capture size) packets. At least not the version I'm using (which is 3.9.5). Herbert, the problem IS real, it's not due to some bad behavior due to improper capturing or something like that. Yes it's difficult to come to it, but it is real. I've saved quite alot of packets today, but it's all quite.. useless as the thing is difficult to hit. Here's some traces made with the following filter: proto TCP and tcp[tcpflags] (tcp-fin|tcp-push) == (tcp-fin|tcp-push) (I've choosen FIN+PUSH because this combination is where the problem is seen most - to be fair, it looks like I haven't seen it with other flags). In there, some packets are ok, but some are not. So - again, it seems like - I was wrong about 100% hit ratio -- ie, that the bad checksum is ALWAYS the case with packets where some data goes in FIN packets -- this is incorrect, because the trace shows quite a few examples of right behavior. The trace is here: http://www.corpit.ru/mjt/bad-tcp-cksum-dmp.bin (it contains some data which it sholdn't - but I hope there's nothing confidential in there ;) So, after the whole day digging around, I still don't have any more-or-less clean way to reproduce it. 
But I've noticed another thing as well: many different machines here, with different kernels, behave the same way. So it can't be a hardware problem for example. And only in VERY rare cases does the thing cause noticeable transfer slowdowns or stalls. But some networks trigger those rare cases more often than others (so the only more or less sane conclusion I can come up with is that it's somehow timing-related). Thanks! /mjt
Re: [PATCH]: 8139cp: Don't blindly enable interrupts in cp_start_xmit
On Mon, Jan 15, 2007 at 08:56:35PM +0100, Francois Romieu wrote: As I understand http://lkml.org/lkml/2006/12/12/239, something like the patch below should have been sent instead. Herbert, ack/nak ? Sorry, what I said in that thread is in error. Netpoll may unfortunately call the transmit routine with IRQs off. So the drivers can't currently use spin_lock_irq and must save the current flags instead. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
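The difference between the two locking styles can be shown with a toy user-space model. This is only a sketch, not kernel code: a single boolean stands in for the CPU's interrupt-enable state, and every `model_*` and `xmit_*` name is hypothetical. The `_irq` style re-enables interrupts unconditionally on unlock, while the `_irqsave` style puts back whatever state the caller had.

```c
#include <stdbool.h>

/* Toy model: one flag stands in for the CPU's IRQ-enable state. */
static bool irqs_enabled = true;

/* Models spin_lock_irq()/spin_unlock_irq(): disable, then blindly enable. */
static void model_lock_irq(void)   { irqs_enabled = false; }
static void model_unlock_irq(void) { irqs_enabled = true; }

/* Models spin_lock_irqsave(): remember the previous state, then disable. */
static bool model_lock_irqsave(void)
{
    bool flags = irqs_enabled;
    irqs_enabled = false;
    return flags;
}

/* Models spin_unlock_irqrestore(): restore the remembered state. */
static void model_unlock_irqrestore(bool flags) { irqs_enabled = flags; }

/* Hypothetical transmit routine using the "blind" variant. */
static void xmit_with_lock_irq(void)
{
    model_lock_irq();
    /* ... queue the packet ... */
    model_unlock_irq();
}

/* Hypothetical transmit routine that preserves the caller's state. */
static void xmit_with_irqsave(void)
{
    bool flags = model_lock_irqsave();
    /* ... queue the packet ... */
    model_unlock_irqrestore(flags);
}

/* Call xmit the way a netpoll-like caller would (IRQs already off)
 * and report the IRQ state the caller sees afterwards. */
static bool irq_state_after(void (*xmit)(void))
{
    irqs_enabled = false;
    xmit();
    return irqs_enabled;
}
```

In this model `irq_state_after(xmit_with_lock_irq)` comes back with interrupts enabled even though the caller had them off, which is the situation described for netpoll; the `irqsave` variant leaves them off.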
Re: rare bad TCP checksum with 2.6.19?
On Tue, Jan 16, 2007 at 12:46:08AM +0300, Michael Tokarev wrote: I'm doing the capture on an intermediate host - the whole day today ;) Cool, I was just trying to make sure :) The trace is here: http://www.corpit.ru/mjt/bad-tcp-cksum-dmp.bin I'll take a look. Are you using anything extra like netfilter? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: rare bad TCP checksum with 2.6.19?
Michael Tokarev wrote: Eric Dumazet wrote: Michael Tokarev wrote: Any idea how to force sending FIN-with-data? int flag_on = 1; setsockopt(fd, SOL_TCP, TCP_CORK, &flag_on, sizeof(int)); send(fd, data, datalen, 0); close(fd); That produces two packets - one (or more - depending on the size) data packet and one FIN packet w/o any data. This is the first thing I've tried. This may be because I forgot the shutdown() ? int flag_on = 1; setsockopt(fd, SOL_TCP, TCP_CORK, &flag_on, sizeof(int)); send(fd, data, datalen, 0); shutdown(fd, 1); close(fd); At least this is working on my machines (with and without shutdown()) Eric
Re: [PATCH 2/4] atl1: Header files for Attansic L1 driver
Christoph Hellwig wrote: On Wed, Jan 10, 2007 at 06:41:37PM -0600, Jay Cliburn wrote: +struct csum_param { + unsigned buf_len:14; + unsigned dma_int:1; + unsigned pkt_int:1; + u16 valan_tag; + unsigned eop:1; + /* command */ + unsigned coalese:1; + unsigned ins_vlag:1; + unsigned custom_chksum:1; + unsigned segment:1; + unsigned ip_chksum:1; + unsigned tcp_chksum:1; + unsigned udp_chksum:1; + /* packet state */ + unsigned vlan_tagged:1; + unsigned eth_type:1; + unsigned iphl:4; + unsigned:2; + unsigned payload_offset:8; + unsigned xsum_offset:8; +} _ATL1_ATTRIB_PACK_; Bitfields should not be used for hardware datastructures ever. Please convert this to explicit masking and shifting. +/* formerly ATL1_WRITE_REG */ +static inline void atl1_write32(const struct atl1_hw *hw, int reg, u32 val) +{ +writel(val, hw->hw_addr + reg); +} + +/* formerly ATL1_READ_REG */ +static inline u32 atl1_read32(const struct atl1_hw *hw, int reg) +{ +return readl(hw->hw_addr + reg); +} Just kill all these wrappers. Also you probably want to convert to pci_iomap + ioread*/iowrite*. Christoph et al., I've incorporated all your comments except the two shown above. I killed the indicated atl1_write*/atl1_read* wrappers, but I'm not yet familiar enough with pci_iomap/iowrite*/ioread* to make that particular conversion, and I'm having trouble getting the bitfield struct converted to shift/mask semantics (No matter how hard I try, I keep breaking the transmit side of the adapter). I'd like to plead for relief on these two items and submit a new version of the driver containing all your other comments. I need help from a more experienced netdev hacker, and in my mind, the best way to do that is to get the driver in the kernel so more people can use it and contribute changes and make improvements. I welcome any comments on the rationality of this approach. 
Jay
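For reference, the "explicit masking and shifting" style requested in the review could look roughly like the sketch below. It covers only the first few fields and is written for user space. The bit positions (buf_len in bits 0-13, dma_int in bit 14, pkt_int in bit 15, the VLAN tag in bits 16-31) are an assumption made for illustration, not the documented Attansic L1 descriptor layout, and all `TXD_*` and `txd_*` names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout of the first 32-bit descriptor word (illustration only). */
#define TXD_BUFLEN_SHIFT  0
#define TXD_BUFLEN_MASK   0x3fffu   /* 14 bits */
#define TXD_DMAINT_SHIFT  14
#define TXD_DMAINT_MASK   0x1u
#define TXD_PKTINT_SHIFT  15
#define TXD_PKTINT_MASK   0x1u
#define TXD_VLAN_SHIFT    16
#define TXD_VLAN_MASK     0xffffu

/* Clear the field, then store val (truncated to the field width). */
static inline uint32_t txd_set(uint32_t word, uint32_t mask,
                               unsigned int shift, uint32_t val)
{
    word &= ~(mask << shift);
    word |= (val & mask) << shift;
    return word;
}

/* Extract a field from the descriptor word. */
static inline uint32_t txd_get(uint32_t word, uint32_t mask,
                               unsigned int shift)
{
    return (word >> shift) & mask;
}
```

Unlike a C bitfield, whose bit order is left to the compiler and ABI, the position of every field is spelled out here. In a driver the finished word would typically go through cpu_to_le32() (or the big-endian equivalent) before being written to the descriptor ring.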
Re: [PATCH 2/4] atl1: Header files for Attansic L1 driver
Jay Cliburn [EMAIL PROTECTED] : [...] I welcome any comments on the rationality of this approach. An URL for the current version of the patch would be welcome too :o) -- Ueimor
Re: [PATCH 2/4] atl1: Header files for Attansic L1 driver
Francois Romieu wrote: Jay Cliburn [EMAIL PROTECTED] : [...] I welcome any comments on the rationality of this approach. An URL for the current version of the patch would be welcome too :o) Sorry. Forgot to do that. The current version may be found here: ftp://hogchain.net/pub/linux/m2v/attansic/kernel_driver/atl1-2.0.4/atl1-2.0.4-linux-2.6.20.rc5.patch.bz2 Jay
Re: [PATCH 1/4] [SCTP]: Set correct error cause value for missing parameters
From: Sridhar Samudrala [EMAIL PROTECTED] Date: Thu, 11 Jan 2007 11:41:25 -0800 [SCTP]: Set correct error cause value for missing parameters sctp_process_missing_param() needs to use the SCTP_ERROR_MISS_PARAM error cause value. Signed-off-by: Vlad Yasevich [EMAIL PROTECTED] Signed-off-by: Sridhar Samudrala [EMAIL PROTECTED] Applied, thank you.
Re: [PATCH 2/4] [SCTP]: Verify some mandatory parameters.
From: Sridhar Samudrala [EMAIL PROTECTED] Date: Thu, 11 Jan 2007 11:41:27 -0800 [SCTP]: Verify some mandatory parameters. Verify init_tag and a_rwnd mandatory parameters in INIT and INIT-ACK chunks. Signed-off-by: Vlad Yasevich [EMAIL PROTECTED] Signed-off-by: Sridhar Samudrala [EMAIL PROTECTED] Applied.
Re: [PATCH 3/4] [SCTP]: Correctly handle unexpected INIT-ACK chunk.
From: Sridhar Samudrala [EMAIL PROTECTED] Date: Thu, 11 Jan 2007 11:41:29 -0800 [SCTP]: Correctly handle unexpected INIT-ACK chunk. Consider the chunk as Out-of-the-Blue if we don't have an endpoint. Otherwise discard it as before. Signed-off-by: Vlad Yasevich [EMAIL PROTECTED] Signed-off-by: Sridhar Samudrala [EMAIL PROTECTED] Applied, thanks.
Re: [PATCH 4/4] [SCTP]: Fix SACK sequence during shutdown
From: Sridhar Samudrala [EMAIL PROTECTED] Date: Thu, 11 Jan 2007 11:41:32 -0800 [SCTP]: Fix SACK sequence during shutdown Currently, when an association enters SHUTDOWN state, the implementation will SACK any DATA first and then transmit the SHUTDOWN chunk. This is against the order required by the 2960bis spec. SHUTDOWN must always be first, followed by SACK. This change forces this order and also enables bundling. Signed-off-by: Vlad Yasevich [EMAIL PROTECTED] Signed-off-by: Sridhar Samudrala [EMAIL PROTECTED] Also applied, thanks a lot.
Re: rare bad TCP checksum with 2.6.19?
On Tue, Jan 16, 2007 at 12:46:08AM +0300, Michael Tokarev wrote: The trace is here: http://www.corpit.ru/mjt/bad-tcp-cksum-dmp.bin I'm sorry but this dump does NOT look like it was taken from an intermediate box. I verified two bad checksums (chosen randomly) and they were both correct but partial checksums. This means that this dump was most likely taken from the sending host. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
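Some background on what a "correct but partial" checksum means: with transmit checksum offload, the stack typically fills the TCP checksum field with only the ones-complement sum of the pseudo-header and leaves it to the NIC to add the TCP header and payload. A sniffer on the sending host sees the packet before that last step and reports it as bad. The sketch below is plain user-space C, not the kernel's implementation, and all function names are hypothetical. It computes both values for IPv4, so a captured checksum can be compared against either.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running sum, treating them as big-endian
 * 16-bit words; an odd trailing byte is padded with zero. */
static uint32_t csum_add(uint32_t acc, const uint8_t *p, size_t len)
{
    while (len > 1) {
        acc += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)
        acc += (uint32_t)p[0] << 8;
    return acc;
}

/* Fold the carries back in (ones-complement addition). */
static uint16_t csum_fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)acc;
}

/* "Partial" value: sum of the IPv4 pseudo-header only
 * (source, destination, protocol 6, TCP length), not inverted.
 * Addresses are given in host order, e.g. 0xc0a80101 for 192.168.1.1. */
static uint16_t tcp_pseudo_sum(uint32_t saddr, uint32_t daddr,
                               uint16_t tcp_len)
{
    uint32_t acc = 0;

    acc += saddr >> 16;
    acc += saddr & 0xffff;
    acc += daddr >> 16;
    acc += daddr & 0xffff;
    acc += 6;          /* IPPROTO_TCP */
    acc += tcp_len;    /* TCP header + payload, in bytes */
    return csum_fold(acc);
}

/* Full checksum as it should appear on the wire. The checksum field
 * inside seg must be zero when this is called. */
static uint16_t tcp_full_csum(uint32_t saddr, uint32_t daddr,
                              const uint8_t *seg, uint16_t tcp_len)
{
    uint32_t acc = tcp_pseudo_sum(saddr, daddr, tcp_len);

    acc = csum_add(acc, seg, tcp_len);
    return (uint16_t)~csum_fold(acc);
}
```

If the value in a captured packet equals `tcp_pseudo_sum()` for its addresses and length rather than `tcp_full_csum()`, the capture was most likely taken before the hardware finished the checksum. Note that the addresses must be the ones in effect when the stack built the packet, which matters if NAT sits between the sender and the capture point.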
Re: [PATCH 1/2] [IrDA] irda-usb TX path optimization
From: Samuel Ortiz [EMAIL PROTECTED] Date: Mon, 15 Jan 2007 11:15:11 +0200 Since we stop using dev_alloc_skb on the IrDA TX frame, we constantly run into the case of the skb headroom being 0, and thus we call skb_cow for every IrDA TX frame. This patch uses a local buffer and memcpy the skb to it, saving us a kmalloc for each of those IrDA TX frames. Signed-off-by: Samuel Ortiz [EMAIL PROTECTED] Applied, thanks. Technically this is a bug fix too because once an SKB hits the transmit function it should essentially be immutable, ie. you shouldn't be writing to it. tcpdump sniffers could be looking at the SKB, as one example.
Re: rare bad TCP checksum with 2.6.19?
On Tue, Jan 16, 2007 at 02:27:39PM +1100, Herbert Xu wrote: I'm sorry but this dump does NOT look like it was taken from an intermediate box. I verified two bad checksums (chosen randomly) and they were both correct but partial checksums. This means that this dump was most likely taken from the sending host. I did see one strange bit:
02:39:51.758803 IP (tos 0x0, ttl 63, id 41084, offset 0, flags [DF], length: 102) 192.168.1.1.25 > 81.13.94.6.21350: FP [bad tcp cksum 81b0 (->9ee8)!] 4271854025:4271854075(50) ack 3772789166 win 272 <nop,nop,timestamp 145420525 6279830>
0x0000: 4500 0066 a07c 4000 3f06 2a59 c0a8 0101 E..f.|@.?.*Y
0x0010: 510d 5e06 0019 5366 fe9f 51c9 e0e0 31ae Q.^...Sf..Q...1.
0x0020: 8019 0110 81b0 0101 080a 08aa f0ed
0x0030: 005f d296 3235 3020 322e 302e 3020 4f6b ._..250.2.0.0.Ok
0x0040: 3a20 7175 6575 6564 2061 7320 3631 3345 :.queued.as.613E
0x0050: 4137 4637 440d 0a32 3231 2032 2e30 2e30 A7F7D..221.2.0.0
0x0060: 2042 7965 0d0a .Bye..
Most of the bad checksums are from 81.13.94.6, which I presume is the host you were dumping on. However, this packet is destined for it instead and yet it too has a partial (but correct) checksum. So the question is where in your network is 192.168.1.1 and how is your network setup in terms of NAT? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 2/2] [IrDA] Removed incorrect IRDA_ASSERT()
From: Samuel Ortiz [EMAIL PROTECTED] Date: Mon, 15 Jan 2007 11:15:42 +0200 With USB2.0 bulk out MTU can be 512 bytes, so checking it only for 64 bytes is incorrect. Signed-off-by: Samuel Ortiz [EMAIL PROTECTED] Applied, thanks a lot.
Re: [PATCH 1/2] [IrDA] irda-usb TX path optimization
David Miller [EMAIL PROTECTED] wrote: Technically this is a bug fix too because once an SKB hits the transmit function it should essentially be immutable, ie. you shouldn't be writing to it. tcpdump sniffers could be looking at the SKB, as one example. We do have a way around that with skb_header_cloned. In fact it looks like VLAN should use it as otherwise TCP packets will get copied unnecessarily. This is still not optimal for AF_PACKET users since they will still cause things like VLANs to do the copy even when it isn't necessary because it doesn't touch any part of the packet that AF_PACKET actually looks at. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH] [IPV6] fixed the size of the netlink message notified by inet6_rt_notify().
Hi, I think the return value of rt6_nlmsg_size() should include the space needed for RTA_METRICS. Regards, --- net/ipv6/route.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 8c3d568..5f0043c 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -2017,6 +2017,7 @@ static inline size_t rt6_nlmsg_size(void + nla_total_size(4) /* RTA_IIF */ + nla_total_size(4) /* RTA_OIF */ + nla_total_size(4) /* RTA_PRIORITY */ + + RTAX_MAX * nla_total_size(4) /* RTA_METRICS */ + nla_total_size(sizeof(struct rta_cacheinfo)); } -- 1.4.4 -- Noriaki TAKAMIYA
Re: [PATCH] [IPV6] fixed the size of the netlink message notified by inet6_rt_notify().
Hi, I'm sorry to re-send... I think the return value of rt6_nlmsg_size() should include the space needed for RTA_METRICS. Regards, Signed-off-by: Noriaki TAKAMIYA [EMAIL PROTECTED] --- net/ipv6/route.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 8c3d568..5f0043c 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -2017,6 +2017,7 @@ static inline size_t rt6_nlmsg_size(void + nla_total_size(4) /* RTA_IIF */ + nla_total_size(4) /* RTA_OIF */ + nla_total_size(4) /* RTA_PRIORITY */ + + RTAX_MAX * nla_total_size(4) /* RTA_METRICS */ + nla_total_size(sizeof(struct rta_cacheinfo)); } -- 1.4.4 -- Noriaki TAKAMIYA
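To see how much space the added term reserves, the size arithmetic can be redone in user space. The sketch below mirrors the usual definition of nla_total_size() (attribute header plus payload, rounded up to 4 bytes); the `model_*` names are hypothetical, and the number of metrics is passed in as a parameter because the value of RTAX_MAX depends on the kernel version.

```c
#include <stddef.h>

#define MODEL_NLA_ALIGNTO 4u
#define MODEL_NLA_HDRLEN  4u   /* struct nlattr: u16 length + u16 type */

/* Round len up to the netlink attribute alignment. */
static size_t model_nla_align(size_t len)
{
    return (len + MODEL_NLA_ALIGNTO - 1) & ~(size_t)(MODEL_NLA_ALIGNTO - 1);
}

/* Space taken by one attribute with the given payload size. */
static size_t model_nla_total_size(size_t payload)
{
    return model_nla_align(MODEL_NLA_HDRLEN + payload);
}

/* Space reserved by the patch for nr_metrics u32 metric attributes. */
static size_t model_metrics_size(size_t nr_metrics)
{
    return nr_metrics * model_nla_total_size(4);
}
```

Each 4-byte metric therefore costs 8 bytes in the message. Without the extra term, a route carrying metrics could need more room than rt6_nlmsg_size() estimated.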
[PATCH -mm 0/10][RFC] aio: make struct kiocb private
This series is an attempt to generalize the async I/O paths to be implementation agnostic. It completely eliminates knowledge of the kiocb structure in the generic code and makes it private within the current aio code. Things get noticeably cleaner without that layering violation. The new interface takes a file_endio_t function pointer, and a private data pointer, which would normally be aio_complete and a kiocb pointer, respectively. If the aio submission function gets back EIOCBQUEUED, that is a guarantee that the endio function will be called, or *already has been called*. If the file_endio_t pointer provided to aio_[read|write] is NULL, the FS must block on I/O completion, then return either the number of bytes read, or an error. I had to touch more areas than I had originally expected, so there are changes in a corner of the socket code, and a slight behavior change in the direct-io completion path which affects XFS and OCFS2. I would appreciate further review there, so I copied some extra people I hope can help. This patch is against 2.6.20-rc4-mm1. It has been compile-tested at each stage. It needs some runtime testing yet, but I prefer to get it out for commentary and test later. These patches are for RFC only and have not yet been signed off. 
NATE --- Documentation/filesystems/Locking | 11 + Documentation/filesystems/vfs.txt | 11 + arch/s390/hypfs/inode.c | 16 +- drivers/net/pppoe.c |8 - drivers/net/tun.c | 13 +- drivers/usb/gadget/inode.c| 239 +- fs/aio.c | 74 ++- fs/bad_inode.c| 10 - fs/block_dev.c| 109 +++-- fs/cifs/cifsfs.c | 10 - fs/compat.c | 56 fs/direct-io.c| 92 -- fs/ecryptfs/file.c| 16 +- fs/ext2/inode.c | 12 - fs/ext3/file.c|9 - fs/ext3/inode.c | 11 - fs/ext4/file.c|9 - fs/ext4/inode.c | 11 - fs/fat/inode.c| 12 - fs/fuse/dev.c | 13 +- fs/gfs2/ops_address.c | 14 +- fs/hfs/inode.c| 13 -- fs/hfsplus/inode.c| 13 -- fs/jfs/inode.c| 12 - fs/nfs/direct.c | 92 +++--- fs/nfs/file.c | 62 + fs/ntfs/file.c| 71 ++- fs/ocfs2/aops.c | 24 +-- fs/ocfs2/aops.h |8 - fs/ocfs2/file.c | 44 +++--- fs/ocfs2/inode.h |2 fs/pipe.c | 12 - fs/read_write.c | 225 --- fs/read_write.h |8 - fs/reiserfs/inode.c | 13 -- fs/smbfs/file.c | 28 ++-- fs/udf/file.c | 13 +- fs/xfs/linux-2.6/xfs_aops.c | 44 +++--- fs/xfs/linux-2.6/xfs_file.c | 58 + fs/xfs/linux-2.6/xfs_lrw.c| 29 ++-- fs/xfs/linux-2.6/xfs_lrw.h| 10 - fs/xfs/linux-2.6/xfs_vnode.h | 20 +-- include/linux/aio.h | 11 - include/linux/fs.h| 114 +- include/linux/net.h | 18 +- include/linux/nfs_fs.h| 12 - include/net/bluetooth/bluetooth.h |2 include/net/inet_common.h |3 include/net/scm.h |2 include/net/sock.h| 45 +-- include/net/tcp.h |6 include/net/udp.h |3 mm/filemap.c | 109 - net/appletalk/ddp.c |5 net/atm/common.c |6 net/atm/common.h |7 - net/ax25/af_ax25.c|7 - net/bluetooth/af_bluetooth.c |4 net/bluetooth/hci_sock.c |7 - net/bluetooth/l2cap.c |2 net/bluetooth/rfcomm/sock.c |8 - net/bluetooth/sco.c |3 net/core/sock.c | 12 - net/dccp/dccp.h |8 - net/dccp/probe.c |3 net/dccp/proto.c |7 - net/decnet/af_decnet.c|7 - net/econet/af_econet.c|7 - net/ipv4/af_inet.c|5 net/ipv4/raw.c|8 - net/ipv4/tcp.c|7 - net/ipv4/tcp_probe.c |3 net/ipv4/udp.c|9 - net/ipv4/udp_impl.h |2 net/ipv6/raw.c|6 net/ipv6/udp.c| 10 - net/ipv6/udp_impl.h |6 net/ipx/af_ipx.c |7 - net/irda/af_irda.c| 29 ++-- 
net/key/af_key.c
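The completion contract in the cover letter can be modelled in user space as follows. This is a sketch under stated assumptions, not code from the series: the file_endio_t typedef is copied from patch 4/10, while `model_read`, `model_endio`, `struct model_result` and the locally defined `MODEL_EIOCBQUEUED` (the errno is kernel-internal and not exported to user space) are hypothetical. The "I/O" here is a memcpy that finishes immediately, so the callback always runs before the submit call returns, which is one of the two cases the contract allows.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Kernel-internal errno value, defined locally for the model. */
#define MODEL_EIOCBQUEUED 529

typedef void (file_endio_t)(void *endio_data, ssize_t count, int err);

/* What a caller might record from its completion callback. */
struct model_result {
    int calls;
    ssize_t count;
    int err;
};

/* Example completion callback: stores what it was told. */
static void model_endio(void *endio_data, ssize_t count, int err)
{
    struct model_result *res = endio_data;

    res->calls++;
    res->count = count;
    res->err = err;
}

/* Read up to len bytes from src.
 * endio == NULL: block until done, return the byte count.
 * endio != NULL: report through endio, return -MODEL_EIOCBQUEUED. */
static ssize_t model_read(const char *src, size_t src_len,
                          char *buf, size_t len,
                          file_endio_t *endio, void *endio_data)
{
    size_t n = len < src_len ? len : src_len;

    memcpy(buf, src, n);
    if (!endio)
        return (ssize_t)n;

    endio(endio_data, (ssize_t)n, 0);
    return -MODEL_EIOCBQUEUED;
}
```

A caller that passes a callback must treat -MODEL_EIOCBQUEUED as "the result arrives through endio", and must be prepared for endio to have run already by the time the submit call returns.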
[PATCH -mm 4/10][RFC] aio: convert aio_complete to file_endio_t
Define a new function typedef for I/O completion at the file/iovec level -- typedef void (file_endio_t)(void *endio_data, ssize_t count, int err); and convert aio_complete and all its callers to this new prototype. --- drivers/usb/gadget/inode.c | 24 +++--- fs/aio.c | 59 - fs/block_dev.c |8 +- fs/direct-io.c | 18 + fs/nfs/direct.c|9 ++ include/linux/aio.h| 11 +++- include/linux/fs.h |2 + 7 files changed, 61 insertions(+), 70 deletions(-) --- diff -urpN -X dontdiff a/drivers/usb/gadget/inode.c b/drivers/usb/gadget/inode.c --- a/drivers/usb/gadget/inode.c2007-01-12 14:42:29.0 -0800 +++ b/drivers/usb/gadget/inode.c2007-01-12 14:25:34.0 -0800 @@ -559,35 +559,32 @@ static int ep_aio_cancel(struct kiocb *i return value; } -static ssize_t ep_aio_read_retry(struct kiocb *iocb) +static int ep_aio_read_retry(struct kiocb *iocb) { struct kiocb_priv *priv = iocb-private; - ssize_t len, total; - int i; + ssize_t total; + int i, err = 0; /* we retry to get the right mm context for this: */ /* copy stuff into user buffers */ total = priv-actual; - len = 0; for (i=0; i priv-nr_segs; i++) { ssize_t this = min((ssize_t)(priv-iv[i].iov_len), total); if (copy_to_user(priv-iv[i].iov_base, priv-buf, this)) { - if (len == 0) - len = -EFAULT; + err = -EFAULT; break; } total -= this; - len += this; if (total == 0) break; } kfree(priv-buf); kfree(priv); aio_put_req(iocb); - return len; + return err; } static void ep_aio_complete(struct usb_ep *ep, struct usb_request *req) @@ -610,9 +607,7 @@ static void ep_aio_complete(struct usb_e if (unlikely(kiocbIsCancelled(iocb))) aio_put_req(iocb); else - aio_complete(iocb, - req-actual ? 
req-actual : req-status, - req-status); + aio_complete(iocb, req-actual, req-status); } else { /* retry() won't report both; so we hide some faults */ if (unlikely(0 != req-status)) @@ -702,16 +697,17 @@ ep_aio_read(struct kiocb *iocb, const st { struct ep_data *epdata = iocb-ki_filp-private_data; char*buf; + size_t len = iov_length(iov, nr_segs); if (unlikely(epdata-desc.bEndpointAddress USB_DIR_IN)) return -EINVAL; - buf = kmalloc(iocb-ki_left, GFP_KERNEL); + buf = kmalloc(len, GFP_KERNEL); if (unlikely(!buf)) return -ENOMEM; iocb-ki_retry = ep_aio_read_retry; - return ep_aio_rwtail(iocb, buf, iocb-ki_left, epdata, iov, nr_segs); + return ep_aio_rwtail(iocb, buf, len, epdata, iov, nr_segs); } static ssize_t @@ -726,7 +722,7 @@ ep_aio_write(struct kiocb *iocb, const s if (unlikely(!(epdata-desc.bEndpointAddress USB_DIR_IN))) return -EINVAL; - buf = kmalloc(iocb-ki_left, GFP_KERNEL); + buf = kmalloc(iov_length(iov, nr_segs), GFP_KERNEL); if (unlikely(!buf)) return -ENOMEM; diff -urpN -X dontdiff a/fs/aio.c b/fs/aio.c --- a/fs/aio.c 2007-01-12 14:42:29.0 -0800 +++ b/fs/aio.c 2007-01-12 14:29:20.0 -0800 @@ -658,16 +658,16 @@ static inline int __queue_kicked_iocb(st * simplifies the coding of individual aio operations as * it avoids various potential races. 
*/ -static ssize_t aio_run_iocb(struct kiocb *iocb) +static void aio_run_iocb(struct kiocb *iocb) { struct kioctx *ctx = iocb-ki_ctx; - ssize_t (*retry)(struct kiocb *); + int (*retry)(struct kiocb *); wait_queue_t *io_wait = current-io_wait; - ssize_t ret; + int err; if (!(retry = iocb-ki_retry)) { printk(aio_run_iocb: iocb-ki_retry = NULL\n); - return 0; + return; } /* @@ -702,8 +702,8 @@ static ssize_t aio_run_iocb(struct kiocb /* Quit retrying if the i/o has been cancelled */ if (kiocbIsCancelled(iocb)) { - ret = -EINTR; - aio_complete(iocb, ret, 0); + err = -EINTR; + aio_complete(iocb, iocb-ki_nbytes - iocb-ki_left, err); /* must not access the iocb after this */ goto out; } @@ -720,17 +720,17 @@ static ssize_t aio_run_iocb(struct kiocb */
[PATCH -mm 5/10][RFC] aio: make blk_directIO use file_endio_t
Convert the internals of blkdev_direct_IO to use a generic endio function, instead of directly calling aio_complete. This may also fix some bugs/races in this code, for instance it checks bio-bi_size instead of assuming it's zero, and it atomically accumulates the bytes_done counter (assuming that the bio completion handler can't race with itself *might* be valid here, but the direct-io code makes no such assumption). I'm also pretty sure that the address_space-directIO functions aren't supposed to mess with the iocb-ki_pos or -ki_left. --- diff -urpN -X dontdiff a/fs/block_dev.c b/fs/block_dev.c --- a/fs/block_dev.c2007-01-12 20:26:25.0 -0800 +++ b/fs/block_dev.c2007-01-12 20:23:55.0 -0800 @@ -131,10 +131,32 @@ blkdev_get_block(struct inode *inode, se return 0; } -static int blk_end_aio(struct bio *bio, unsigned int bytes_done, int error) +struct bdev_aio { + atomic_tiocount;/* refcount */ + atomic_tbytes_done; /* byte counter */ + int err;/* error handling */ + file_endio_t*endio; /* end I/O notify fn */ + void*endio_data;/* notify fn private data */ +}; + +static void blk_io_put(struct bdev_aio *io) +{ + if (!atomic_dec_and_test(io-iocount)) + return; + + if (!io-endio) + return complete((struct completion*)io-endio_data); + + io-endio(io-endio_data, atomic_read(io-bytes_done), io-err); + kfree(io); +} + +static int blk_bio_endio(struct bio *bio, unsigned int bytes_done, int error) { - struct kiocb *iocb = bio-bi_private; - atomic_t *bio_count = iocb-ki_bio_count; + struct bdev_aio *io = bio-bi_private; + + if (bio-bi_size) + return 1; if (bio_data_dir(bio) == READ) bio_check_pages_dirty(bio); @@ -143,16 +165,21 @@ static int blk_end_aio(struct bio *bio, bio_put(bio); } - /* iocb-ki_nbytes stores error code from LLDD */ - if (error) - iocb-ki_nbytes = -EIO; - - if (atomic_dec_and_test(bio_count)) - aio_complete(iocb, iocb-ki_left, iocb-ki_nbytes); + if (error) + io-err = error; + atomic_add(bytes_done, io-bytes_done); + blk_io_put(io); return 0; } +static void 
blk_io_init(struct bdev_aio *io) +{ + atomic_set(io-iocount, 1); + atomic_set(io-bytes_done, 0); + io-err = 0; +} + #define VEC_SIZE 16 struct pvec { unsigned short nr; @@ -208,24 +235,33 @@ blkdev_direct_IO(int rw, struct kiocb *i unsigned long addr; /* user iovec address */ size_t count; /* user iovec len */ - size_t nbytes = iocb-ki_nbytes = iocb-ki_left; /* total xfer size */ + size_t nbytes; /* total xfer size */ loff_t size;/* size of block device */ struct bio *bio; - atomic_t *bio_count = iocb-ki_bio_count; + struct bdev_aio stack_io, *io; + file_endio_t *endio = aio_complete; + void *endio_data = iocb; struct page *page; struct pvec pvec; pvec.nr = 0; pvec.idx = 0; + io = stack_io; + if (endio) { + io = kmalloc(sizeof(struct bdev_aio), GFP_KERNEL); + if (!io) + return -ENOMEM; + } + blk_io_init(io); + if (pos blocksize_mask) return -EINVAL; + nbytes = iov_length(iov, nr_segs); size = i_size_read(inode); - if (pos + nbytes size) { + if (pos + nbytes size) nbytes = size - pos; - iocb-ki_left = nbytes; - } /* * check first non-zero iov alignment, the remaining @@ -237,7 +273,6 @@ blkdev_direct_IO(int rw, struct kiocb *i if (addr blocksize_mask || count blocksize_mask) return -EINVAL; } while (!count ++seg nr_segs); - atomic_set(bio_count, 1); while (nbytes) { /* roughly estimate number of bio vec needed */ @@ -248,8 +283,8 @@ blkdev_direct_IO(int rw, struct kiocb *i /* bio_alloc should not fail with GFP_KERNEL flag */ bio = bio_alloc(GFP_KERNEL, nvec); bio-bi_bdev = I_BDEV(inode); - bio-bi_end_io = blk_end_aio; - bio-bi_private = iocb; + bio-bi_end_io = blk_bio_endio; + bio-bi_private = io; bio-bi_sector = pos blkbits; same_bio: cur_off = addr ~PAGE_MASK; @@ -289,18 +324,27 @@ same_bio: /* bio is ready, submit it */ if (rw == READ) bio_set_pages_dirty(bio); - atomic_inc(bio_count); + atomic_inc(io-iocount); submit_bio(rw, bio); } completion: -
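The reference-counting scheme in this patch (the submitter holds one reference, each bio in flight holds another, and whoever drops the last one fires the completion) can be tried out in user space with C11 atomics. The sketch below is a model only: the `model_*` names are hypothetical, it covers just the asynchronous case where an endio callback is present, and the caller owns the structure's memory.

```c
#include <stdatomic.h>
#include <sys/types.h>

typedef void (file_endio_t)(void *endio_data, ssize_t count, int err);

struct model_io {
    atomic_int iocount;       /* references: submitter + bios in flight */
    atomic_long bytes_done;   /* bytes completed so far */
    int err;                  /* last error reported, 0 if none */
    file_endio_t *endio;      /* completion callback */
    void *endio_data;         /* callback private data */
};

/* What the example callback records. */
struct model_done {
    int calls;
    ssize_t count;
    int err;
};

static void model_record(void *endio_data, ssize_t count, int err)
{
    struct model_done *done = endio_data;

    done->calls++;
    done->count = count;
    done->err = err;
}

/* Start with one reference, owned by the submitter. */
static void model_io_init(struct model_io *io, file_endio_t *endio, void *data)
{
    atomic_init(&io->iocount, 1);
    atomic_init(&io->bytes_done, 0);
    io->err = 0;
    io->endio = endio;
    io->endio_data = data;
}

/* Taken once per submitted bio. */
static void model_io_get(struct model_io *io)
{
    atomic_fetch_add(&io->iocount, 1);
}

/* Drop a reference; the last one out calls the completion. */
static void model_io_put(struct model_io *io)
{
    /* atomic_fetch_sub returns the old value: 1 means we were last. */
    if (atomic_fetch_sub(&io->iocount, 1) != 1)
        return;
    io->endio(io->endio_data, (ssize_t)atomic_load(&io->bytes_done), io->err);
}

/* Per-bio completion: account the bytes, then drop the bio's reference. */
static void model_bio_done(struct model_io *io, long bytes, int error)
{
    if (error)
        io->err = error;
    atomic_fetch_add(&io->bytes_done, bytes);
    model_io_put(io);
}
```

Because the submitter keeps its own reference until every bio has been queued, the callback cannot fire while submission is still in progress, even if the first bio completes at once.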
[PATCH -mm 6/10][RFC] aio: make nfs_directIO use file_endio_t
This converts the iternals of nfs's directIO support to use a generic endio function, instead of directly calling aio_complete. It's pretty easy because it already has a pretty abstracted completion path. --- diff -urpN -X dontdiff a/fs/nfs/direct.c b/fs/nfs/direct.c --- a/fs/nfs/direct.c 2007-01-12 14:53:48.0 -0800 +++ b/fs/nfs/direct.c 2007-01-12 15:02:30.0 -0800 @@ -68,7 +68,6 @@ struct nfs_direct_req { /* I/O parameters */ struct nfs_open_context *ctx; /* file open context info */ - struct kiocb * iocb; /* controlling i/o request */ struct inode * inode; /* target file of i/o */ /* completion state */ @@ -77,6 +76,8 @@ struct nfs_direct_req { ssize_t count, /* bytes actually processed */ error; /* any reported error */ struct completion completion; /* wait for i/o completion */ + file_endio_t*endio; /* async completion function */ + void*endio_data;/* private completion data */ /* commit state */ struct list_headrewrite_list; /* saved nfs_write_data structs */ @@ -151,7 +152,7 @@ static inline struct nfs_direct_req *nfs kref_get(dreq-kref); init_completion(dreq-completion); INIT_LIST_HEAD(dreq-rewrite_list); - dreq-iocb = NULL; + dreq-endio = NULL; dreq-ctx = NULL; spin_lock_init(dreq-lock); atomic_set(dreq-io_count, 0); @@ -179,7 +180,7 @@ static ssize_t nfs_direct_wait(struct nf ssize_t result = -EIOCBQUEUED; /* Async requests don't wait here */ - if (dreq-iocb) + if (!dreq-endio) goto out; result = wait_for_completion_interruptible(dreq-completion); @@ -194,14 +195,10 @@ out: return (ssize_t) result; } -/* - * Synchronous I/O uses a stack-allocated iocb. Thus we can't trust - * the iocb is still valid here if this is a synchronous request. - */ static void nfs_direct_complete(struct nfs_direct_req *dreq) { - if (dreq-iocb) - aio_complete(dreq-iocb, dreq-count, dreq-error); + if (dreq-endio) + dreq-endio(dreq-endio_data, dreq-count, dreq-error); complete_all(dreq-completion); @@ -332,11 +329,13 @@ static ssize_t nfs_direct_read_schedule( return result 0 ? 
(ssize_t) result : -EFAULT; } -static ssize_t nfs_direct_read(struct kiocb *iocb, unsigned long user_addr, size_t count, loff_t pos) +static ssize_t nfs_direct_read(struct file *file, unsigned long user_addr, + size_t count, loff_t pos, + file_endio_t *endio, void *endio_data) { ssize_t result = 0; sigset_t oldset; - struct inode *inode = iocb-ki_filp-f_mapping-host; + struct inode *inode = file-f_mapping-host; struct rpc_clnt *clnt = NFS_CLIENT(inode); struct nfs_direct_req *dreq; @@ -345,9 +344,9 @@ static ssize_t nfs_direct_read(struct ki return -ENOMEM; dreq-inode = inode; - dreq-ctx = get_nfs_open_context((struct nfs_open_context *)iocb-ki_filp-private_data); - if (!is_sync_kiocb(iocb)) - dreq-iocb = iocb; + dreq-ctx = get_nfs_open_context((struct nfs_open_context *)file-private_data); + dreq-endio = endio; + dreq-endio_data = endio_data; nfs_add_stats(inode, NFSIOS_DIRECTREADBYTES, count); rpc_clnt_sigmask(clnt, oldset); @@ -663,11 +662,13 @@ static ssize_t nfs_direct_write_schedule return result 0 ? 
(ssize_t) result : -EFAULT; } -static ssize_t nfs_direct_write(struct kiocb *iocb, unsigned long user_addr, size_t count, loff_t pos) +static ssize_t nfs_direct_write(struct file *file, unsigned long user_addr, + size_t count, loff_t pos, + file_endio_t *endio, void *endio_data) { ssize_t result = 0; sigset_t oldset; - struct inode *inode = iocb-ki_filp-f_mapping-host; + struct inode *inode = file-f_mapping-host; struct rpc_clnt *clnt = NFS_CLIENT(inode); struct nfs_direct_req *dreq; size_t wsize = NFS_SERVER(inode)-wsize; @@ -682,9 +683,9 @@ static ssize_t nfs_direct_write(struct k sync = FLUSH_STABLE; dreq-inode = inode; - dreq-ctx = get_nfs_open_context((struct nfs_open_context *)iocb-ki_filp-private_data); - if (!is_sync_kiocb(iocb)) - dreq-iocb = iocb; + dreq-ctx = get_nfs_open_context((struct nfs_open_context *)file-private_data); + dreq-endio = endio; + dreq-endio_data = endio_data; nfs_add_stats(inode, NFSIOS_DIRECTWRITTENBYTES, count); @@ -701,10 +702,12 @@ static ssize_t nfs_direct_write(struct k /** * nfs_file_direct_read - file direct read operation for NFS
[PATCH -mm 3/10][RFC] aio: use iov_length instead of ki_left
Convert code using iocb->ki_left to use the more generic iov_length() call.

---

diff -urpN -X dontdiff a/fs/ocfs2/file.c b/fs/ocfs2/file.c
--- a/fs/ocfs2/file.c 2007-01-10 11:50:26.0 -0800
+++ b/fs/ocfs2/file.c 2007-01-10 12:42:09.0 -0800
@@ -1157,7 +1157,7 @@ static ssize_t ocfs2_file_aio_write(stru
 		   filp->f_path.dentry->d_name.name);
 
 	/* happy write of zero bytes */
-	if (iocb->ki_left == 0)
+	if (iov_length(iov, nr_segs) == 0)
 		return 0;
 
 	mutex_lock(&inode->i_mutex);
@@ -1177,7 +1177,7 @@ static ssize_t ocfs2_file_aio_write(stru
 	}
 
 	ret = ocfs2_prepare_inode_for_write(filp->f_path.dentry, &iocb->ki_pos,
-					    iocb->ki_left, appending);
+					    iov_length(iov, nr_segs), appending);
 	if (ret < 0) {
 		mlog_errno(ret);
 		goto out;
diff -urpN -X dontdiff a/fs/smbfs/file.c b/fs/smbfs/file.c
--- a/fs/smbfs/file.c 2007-01-10 11:50:28.0 -0800
+++ b/fs/smbfs/file.c 2007-01-10 12:42:09.0 -0800
@@ -222,7 +222,7 @@ smb_file_aio_read(struct kiocb *iocb, co
 	ssize_t status;
 
 	VERBOSE(file %s/%s, [EMAIL PROTECTED], DENTRY_PATH(dentry),
-		(unsigned long) iocb->ki_left, (unsigned long) pos);
+		(unsigned long) iov_length(iov, nr_segs), (unsigned long) pos);
 
 	status = smb_revalidate_inode(dentry);
 	if (status) {
@@ -328,7 +328,7 @@ smb_file_aio_write(struct kiocb *iocb, c
 
 	VERBOSE(file %s/%s, [EMAIL PROTECTED],
 		DENTRY_PATH(dentry),
-		(unsigned long) iocb->ki_left, (unsigned long) pos);
+		(unsigned long) iov_length(iov, nr_segs), (unsigned long) pos);
 
 	result = smb_revalidate_inode(dentry);
 	if (result) {
@@ -341,7 +341,7 @@ smb_file_aio_write(struct kiocb *iocb, c
 	if (result)
 		goto out;
 
-	if (iocb->ki_left > 0) {
+	if (iov_length(iov, nr_segs) > 0) {
 		result = generic_file_aio_write(iocb, iov, nr_segs, pos);
 		VERBOSE("pos=%ld, size=%ld, mtime=%ld, atime=%ld\n",
 			(long) file->f_pos, (long) dentry->d_inode->i_size,
diff -urpN -X dontdiff a/fs/udf/file.c b/fs/udf/file.c
--- a/fs/udf/file.c 2007-01-10 11:53:02.0 -0800
+++ b/fs/udf/file.c 2007-01-10 12:42:09.0 -0800
@@ -109,7 +109,7 @@ static ssize_t udf_file_aio_write(struct
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_path.dentry->d_inode;
 	int err, pos;
-	size_t count = iocb->ki_left;
+	size_t count = iov_length(iov, nr_segs);
 
 	if (UDF_I_ALLOCTYPE(inode) == ICBTAG_FLAG_AD_IN_ICB) {
diff -urpN -X dontdiff a/net/socket.c b/net/socket.c
--- a/net/socket.c 2007-01-10 12:40:54.0 -0800
+++ b/net/socket.c 2007-01-10 12:42:09.0 -0800
@@ -632,7 +632,7 @@ static ssize_t sock_aio_read(struct kioc
 	if (pos != 0)
 		return -ESPIPE;
 
-	if (iocb->ki_left == 0)	/* Match SYS5 behaviour */
+	if (iov_length(iov, nr_segs) == 0)	/* Match SYS5 behaviour */
 		return 0;
 
 	for (i = 0; i < nr_segs; i++)
@@ -660,7 +660,7 @@ static ssize_t sock_aio_write(struct kio
 	if (pos != 0)
 		return -ESPIPE;
 
-	if (iocb->ki_left == 0)	/* Match SYS5 behaviour */
+	if (iov_length(iov, nr_segs) == 0)	/* Match SYS5 behaviour */
 		return 0;
 
 	for (i = 0; i < nr_segs; i++)
[PATCH -mm 9/10][RFC] aio: usb gadget remove aio file ops
This removes the aio implementation from the usb gadget file system. Aside from making very creative (!) use of the aio retry path, it can't be of any use performance-wise because it always kmalloc()s a bounce buffer for the *whole* I/O size. Perhaps the only reason to keep it around is the ability to cancel I/O requests, which only applies when using the user space async I/O interface. I highly doubt that is enough incentive to justify the extra complexity here or in user-space, so I think it's a safe bet to remove this. If that feature is still desired, it would be possible to implement a sync interface that does an interruptible sleep.

I can be convinced otherwise, but the alternatives are difficult. See for example the recent LKML thread "fuse, get_user_pages, flush_anon_page, aliasing caches and all that again" for why it's waaay easier to kmalloc a bounce buffer here, and (ab)use the retry interface.

---

diff -urpN -X dontdiff a/drivers/usb/gadget/inode.c b/drivers/usb/gadget/inode.c
--- a/drivers/usb/gadget/inode.c 2007-01-10 13:23:46.0 -0800
+++ b/drivers/usb/gadget/inode.c 2007-01-10 16:56:09.0 -0800
@@ -527,218 +527,6 @@ static int ep_ioctl (struct inode *inode
 
 /*--*/
 
-/* ASYNCHRONOUS ENDPOINT I/O OPERATIONS (bulk/intr/iso) */
-
-struct kiocb_priv {
-	struct usb_request	*req;
-	struct ep_data		*epdata;
-	void			*buf;
-	const struct iovec	*iv;
-	unsigned long		nr_segs;
-	unsigned		actual;
-};
-
-static int ep_aio_cancel(struct kiocb *iocb, struct io_event *e)
-{
-	struct kiocb_priv	*priv = iocb->private;
-	struct ep_data		*epdata;
-	int			value;
-
-	local_irq_disable();
-	epdata = priv->epdata;
-	// spin_lock(&epdata->dev->lock);
-	kiocbSetCancelled(iocb);
-	if (likely(epdata && epdata->ep && priv->req))
-		value = usb_ep_dequeue (epdata->ep, priv->req);
-	else
-		value = -EINVAL;
-	// spin_unlock(&epdata->dev->lock);
-	local_irq_enable();
-
-	aio_put_req(iocb);
-	return value;
-}
-
-static int ep_aio_read_retry(struct kiocb *iocb)
-{
-	struct kiocb_priv	*priv = iocb->private;
-	ssize_t			total;
-	int			i, err = 0;
-
-	/* we "retry" to get the right mm context for this: */
-
-	/* copy stuff into user buffers */
-	total = priv->actual;
-	for (i=0; i < priv->nr_segs; i++) {
-		ssize_t this = min((ssize_t)(priv->iv[i].iov_len), total);
-
-		if (copy_to_user(priv->iv[i].iov_base, priv->buf, this)) {
-			err = -EFAULT;
-			break;
-		}
-
-		total -= this;
-		if (total == 0)
-			break;
-	}
-	kfree(priv->buf);
-	kfree(priv);
-	aio_put_req(iocb);
-	return err;
-}
-
-static void ep_aio_complete(struct usb_ep *ep, struct usb_request *req)
-{
-	struct kiocb		*iocb = req->context;
-	struct kiocb_priv	*priv = iocb->private;
-	struct ep_data		*epdata = priv->epdata;
-
-	/* lock against disconnect (and ideally, cancel) */
-	spin_lock(&epdata->dev->lock);
-	priv->req = NULL;
-	priv->epdata = NULL;
-	if (priv->iv == NULL
-			|| unlikely(req->actual == 0)
-			|| unlikely(kiocbIsCancelled(iocb))) {
-		kfree(req->buf);
-		kfree(priv);
-		iocb->private = NULL;
-		/* aio_complete() reports bytes-transferred _and_ faults */
-		if (unlikely(kiocbIsCancelled(iocb)))
-			aio_put_req(iocb);
-		else
-			aio_complete(iocb, req->actual, req->status);
-	} else {
-		/* retry() won't report both; so we hide some faults */
-		if (unlikely(0 != req->status))
-			DBG(epdata->dev, "%s fault %d len %d\n",
-				ep->name, req->status, req->actual);
-
-		priv->buf = req->buf;
-		priv->actual = req->actual;
-		kick_iocb(iocb);
-	}
-	spin_unlock(&epdata->dev->lock);
-
-	usb_ep_free_request(ep, req);
-	put_ep(epdata);
-}
-
-static ssize_t
-ep_aio_rwtail(
-	struct kiocb	*iocb,
-	char		*buf,
-	size_t		len,
-	struct ep_data	*epdata,
-	const struct iovec *iv,
-	unsigned long	nr_segs
-)
-{
-	struct kiocb_priv	*priv;
-	struct usb_request	*req;
-	ssize_t			value;
-
-	priv = kmalloc(sizeof *priv, GFP_KERNEL);
-	if (!priv) {
-		value = -ENOMEM;
-fail:
-		kfree(buf);
-		return value;
-	}
-	iocb->private = priv;
-
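The bounce-buffer approach described above, where one contiguous kernel buffer is later scattered into the caller's iovec segments, can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: scatter_bounce() is a hypothetical name, memcpy() stands in for copy_to_user(), and nothing here is taken from the gadget driver beyond the general shape of the loop. Note that the sketch advances the source offset as each segment fills, which the removed loop as quoted above does not appear to do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical user-space analogue of the scatter step: copy up to `total`
 * bytes from one contiguous bounce buffer into the segments described by
 * iov[0..nr_segs). Returns the number of bytes actually copied.
 */
static size_t scatter_bounce(const struct iovec *iov, unsigned long nr_segs,
			     const char *bounce, size_t total)
{
	size_t copied = 0;
	unsigned long i;

	assert(iov != NULL || nr_segs == 0);

	for (i = 0; i < nr_segs && copied < total; i++) {
		size_t chunk = iov[i].iov_len;

		/* Never copy more than what is left in the bounce buffer. */
		if (chunk > total - copied)
			chunk = total - copied;

		/* Advance through the bounce buffer as segments fill up. */
		memcpy(iov[i].iov_base, bounce + copied, chunk);
		copied += chunk;
	}
	return copied;
}
```

The cost the patch description objects to is visible here: the whole transfer has to exist in the bounce buffer before any of it reaches the caller's segments.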
[PATCH -mm 7/10][RFC] aio: make __blockdev_direct_IO use file_endio_t
This converts the internals of __blockdev_direct_IO in fs/direct-io.c to use a generic endio function, instead of directly calling aio_complete. It also changes the semantics of dio_iodone to be more friendly to its only users, xfs and ocfs2. This allows the caller to know how to release locks and tear down data structures on error.

It also converts the _own_locking and _no_locking variants of blockdev_direct_IO to use a generic endio function.

---

 fs/direct-io.c              | 74
 fs/gfs2/ops_address.c       |  6
 fs/ocfs2/aops.c             | 15
 fs/ocfs2/aops.h             |  8
 fs/ocfs2/file.c             | 18
 fs/ocfs2/inode.h            |  2
 fs/xfs/linux-2.6/xfs_aops.c | 33
 include/linux/fs.h          | 57
 8 files changed, 104 insertions(+), 109 deletions(-)

---

diff -urpN -X dontdiff a/fs/direct-io.c b/fs/direct-io.c
--- a/fs/direct-io.c 2007-01-12 14:53:48.0 -0800
+++ b/fs/direct-io.c 2007-01-12 15:06:44.0 -0800
@@ -67,7 +67,7 @@ struct dio {
 	struct bio *bio;		/* bio under assembly */
 	struct inode *inode;
 	int rw;
-	loff_t i_size;			/* i_size when submitted */
+	unsigned max_to_read;		/* (i_size when submitted) - offset */
 	int lock_type;			/* doesn't change */
 	unsigned blkbits;		/* doesn't change */
 	unsigned blkfactor;		/* When we're using an alignment which
@@ -89,6 +89,7 @@ struct dio {
 	int reap_counter;		/* rate limit reaping */
 	get_block_t *get_block;		/* block mapping function */
 	dio_iodone_t *end_io;		/* IO completion function */
+	void *destructor_data;		/* private data for completion fn */
 	sector_t final_block_in_bio;	/* current final block in bio + 1 */
 	sector_t next_block_for_io;	/* next block to be put under IO,
 					   in dio_blocks units */
@@ -127,7 +128,8 @@ struct dio {
 	struct task_struct *waiter;	/* waiting task (NULL if none) */
 
 	/* AIO related stuff */
-	struct kiocb *iocb;		/* kiocb */
+	file_endio_t *file_endio;	/* aio completion function */
+	void *endio_data;		/* private data for aio completion */
 	int is_async;			/* is IO async ? */
 	int io_error;			/* IO error in completion path */
 	ssize_t result;			/* IO result */
@@ -222,7 +224,7 @@ static struct page *dio_get_page(struct
  * filesystems can use it to hold additional state between get_block calls and
  * dio_complete.
  */
-static int dio_complete(struct dio *dio, loff_t offset, int ret)
+static int dio_complete(struct dio *dio, int ret)
 {
 	/*
 	 * AIO submission can race with bio completion to get here while
@@ -232,25 +234,21 @@ static int dio_complete(struct dio *dio,
 	 */
 	if (ret == -EIOCBQUEUED)
 		ret = 0;
+	if (ret == 0)
+		ret = dio->page_errors;
+	if (ret == 0)
+		ret = dio->io_error;
 
 	if (dio->result) {
 		/* Check for short read case */
-		if ((dio->rw == READ) && ((offset + dio->result) > dio->i_size))
-			dio->result = dio->i_size - offset;
+		if ((dio->rw == READ) && (dio->result > dio->max_to_read))
+			dio->result = dio->max_to_read;
 	}
 
-	if (dio->end_io && dio->result)
-		dio->end_io(dio->iocb, offset, dio->result,
-			    dio->map_bh.b_private);
 	if (dio->lock_type == DIO_LOCKING)
 		/* lockdep: non-owner release */
 		up_read_non_owner(&dio->inode->i_alloc_sem);
 
-	if (ret == 0)
-		ret = dio->page_errors;
-	if (ret == 0)
-		ret = dio->io_error;
-
 	return ret;
 }
 
@@ -277,8 +275,11 @@ static int dio_bio_end_aio(struct bio *b
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 
 	if (remaining == 0) {
-		int err = dio_complete(dio, dio->iocb->ki_pos, 0);
-		aio_complete(dio->iocb, dio->result, err);
+		int err = dio_complete(dio, 0);
+		if (dio->end_io)
+			dio->end_io(dio->destructor_data, dio->result,
+				    dio->map_bh.b_private);
+		dio->file_endio(dio->endio_data, dio->result, err);
 		kfree(dio);
 	}
 
@@ -944,10 +945,11 @@ out:
  * Releases both i_mutex and i_alloc_sem
  */
 static ssize_t
-direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode,
+direct_io_worker(int rw, struct file *file, struct inode *inode,
 	const struct iovec *iov, loff_t offset, unsigned long nr_segs,
 	unsigned blkbits, get_block_t get_block,
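The "generic endio function" idea in this patch, a completion callback plus an opaque data pointer instead of a hard-wired aio_complete() call, can be sketched in user space as below. Everything here is a hypothetical stand-in: endio_fn approximates file_endio_t (whose real typedef is not shown in the excerpt), and struct io_req, req_complete() and req_finish() only loosely mirror struct dio, dio_complete() and the tail of dio_bio_end_aio().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for file_endio_t; the real typedef is not in the excerpt. */
typedef void (endio_fn)(void *data, long bytes, long err);

struct io_req {
	long result;		/* bytes transferred */
	long max_to_read;	/* clamp for short reads */
	int is_read;		/* nonzero for a read request */
	long io_error;		/* error seen in the completion path */
	endio_fn *endio;	/* generic completion callback */
	void *endio_data;	/* opaque cookie handed back to the callback */
};

/* Loosely mirrors the reordered dio_complete(): pick up the error, then clamp. */
static long req_complete(struct io_req *req, long ret)
{
	if (ret == 0)
		ret = req->io_error;
	if (req->is_read && req->result > req->max_to_read)
		req->result = req->max_to_read;
	return ret;
}

/* Completion path: it never needs to know what endio_data points to. */
static void req_finish(struct io_req *req)
{
	long err = req_complete(req, 0);

	assert(req->endio != NULL);
	req->endio(req->endio_data, req->result, err);
}

/* Example consumer that just records what it was told. */
struct seen {
	long bytes;
	long err;
	int calls;
};

static void record_endio(void *data, long bytes, long err)
{
	struct seen *s = data;

	s->bytes = bytes;
	s->err = err;
	s->calls++;
}
```

With this shape, an aio caller could pass its own completion routine and request pointer while a synchronous caller passes something else entirely, which seems to be the decoupling the series is after.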
Re: [PATCH -mm 3/10][RFC] aio: use iov_length instead of ki_left
On Mon, Jan 15, 2007 at 05:54:50PM -0800, Nate Diller wrote:
> Convert code using iocb->ki_left to use the more generic iov_length() call.

No way. We need to reduce the number of iovec traversals, not add more of them.
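The objection is about cost: an iov_length()-style helper has to walk every segment on each call, whereas a cached count such as ki_left is a single field read. A minimal user-space sketch of the difference follows. iov_total() is written here as an analogue of the kernel helper, while struct request, request_init() and the segments_visited counter are hypothetical and exist only to make the traversal count observable.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

static unsigned long segments_visited;	/* instrumentation for this sketch only */

/* User-space analogue of iov_length(): sums the length of every segment. */
static size_t iov_total(const struct iovec *iov, unsigned long nr_segs)
{
	size_t total = 0;
	unsigned long seg;

	for (seg = 0; seg < nr_segs; seg++) {
		total += iov[seg].iov_len;
		segments_visited++;
	}
	return total;
}

/* Hypothetical request that caches the byte count, much as ki_left does. */
struct request {
	const struct iovec *iov;
	unsigned long nr_segs;
	size_t left;		/* computed once at submission */
};

static void request_init(struct request *req, const struct iovec *iov,
			 unsigned long nr_segs)
{
	assert(req != NULL);
	req->iov = iov;
	req->nr_segs = nr_segs;
	req->left = iov_total(iov, nr_segs);	/* the only traversal */
}
```

Every later check of req->left is O(1), while each extra iov_total() call repeats the O(nr_segs) walk, which is the overhead the reply objects to.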
[PATCH -mm 8/10][RFC] aio: make direct_IO aops use file_endio_t
This converts the _locking variant of blockdev_direct_IO to use a generic endio function, and updates all the FS callsites.

---

 Documentation/filesystems/Locking |  5
 Documentation/filesystems/vfs.txt |  5
 fs/block_dev.c                    |  9
 fs/ext2/inode.c                   | 12
 fs/ext3/inode.c                   | 11
 fs/ext4/inode.c                   | 11
 fs/fat/inode.c                    | 12
 fs/gfs2/ops_address.c             |  8
 fs/hfs/inode.c                    | 13
 fs/hfsplus/inode.c                | 13
 fs/jfs/inode.c                    | 12
 fs/nfs/direct.c                   |  8
 fs/ocfs2/aops.c                   |  9
 fs/reiserfs/inode.c               | 13
 fs/xfs/linux-2.6/xfs_aops.c       | 11
 fs/xfs/linux-2.6/xfs_lrw.c        |  4
 include/linux/fs.h                | 28
 include/linux/nfs_fs.h            |  4
 mm/filemap.c                      | 34
 19 files changed, 108 insertions(+), 114 deletions(-)

---

diff -urpN -X dontdiff a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
--- a/Documentation/filesystems/Locking 2007-01-12 20:26:06.0 -0800
+++ b/Documentation/filesystems/Locking 2007-01-12 20:42:37.0 -0800
@@ -169,8 +169,9 @@ prototypes:
 	sector_t (*bmap)(struct address_space *, sector_t);
 	int (*invalidatepage) (struct page *, unsigned long);
 	int (*releasepage) (struct page *, int);
-	int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
-			loff_t offset, unsigned long nr_segs);
+	int (*direct_IO)(int, struct file *, const struct iovec *iov,
+			loff_t offset, unsigned long nr_segs,
+			file_endio_t *endio, void *endio_data);
 	int (*launder_page) (struct page *);
 
 locking rules:
diff -urpN -X dontdiff a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
--- a/Documentation/filesystems/vfs.txt 2007-01-12 20:26:06.0 -0800
+++ b/Documentation/filesystems/vfs.txt 2007-01-12 20:42:37.0 -0800
@@ -537,8 +537,9 @@ struct address_space_operations {
 	sector_t (*bmap)(struct address_space *, sector_t);
 	int (*invalidatepage) (struct page *, unsigned long);
 	int (*releasepage) (struct page *, int);
-	ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
-			loff_t offset, unsigned long nr_segs);
+	ssize_t (*direct_IO)(int, struct file *, const struct iovec *iov,
+			loff_t offset, unsigned long nr_segs,
+			file_endio_t *endio, void *endio_data);
 	struct page* (*get_xip_page)(struct address_space *, sector_t,
 			int);
 	/* migrate the contents of a page to the specified target */
diff -urpN -X dontdiff a/fs/block_dev.c b/fs/block_dev.c
--- a/fs/block_dev.c 2007-01-12 20:29:02.0 -0800
+++ b/fs/block_dev.c 2007-01-12 20:42:37.0 -0800
@@ -222,10 +222,11 @@ static void blk_unget_page(struct page *
 }
 
 static ssize_t
-blkdev_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
-		 loff_t pos, unsigned long nr_segs)
+blkdev_direct_IO(int rw, struct file *file, const struct iovec *iov,
+		 loff_t pos, unsigned long nr_segs, file_endio_t *endio,
+		 void *endio_data)
 {
-	struct inode *inode = iocb->ki_filp->f_mapping->host;
+	struct inode *inode = file->f_mapping->host;
 	unsigned blkbits = blksize_bits(bdev_hardsect_size(I_BDEV(inode)));
 	unsigned blocksize_mask = (1 << blkbits) - 1;
 	unsigned long seg = 0;	/* iov segment iterator */
@@ -239,8 +240,6 @@ blkdev_direct_IO(int rw, struct kiocb *i
 	loff_t size;		/* size of block device */
 	struct bio *bio;
 	struct bdev_aio stack_io, *io;
-	file_endio_t *endio = aio_complete;
-	void *endio_data = iocb;
 	struct page *page;
 	struct pvec pvec;
 
diff -urpN -X dontdiff a/fs/ext2/inode.c b/fs/ext2/inode.c
--- a/fs/ext2/inode.c 2007-01-12 20:26:06.0 -0800
+++ b/fs/ext2/inode.c 2007-01-12 20:42:37.0 -0800
@@ -752,14 +752,12 @@ static sector_t ext2_bmap(struct address
 }
 
 static ssize_t
-ext2_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
-			loff_t offset, unsigned long nr_segs)
+ext2_direct_IO(int rw, struct file *file, const struct iovec *iov,
+	       loff_t offset, unsigned long nr_segs, file_endio_t *endio,
+	       void *endio_data)
 {
-	struct file *file = iocb->ki_filp;
-	struct inode *inode = file->f_mapping->host;
-
-	return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
-
[PATCH -mm 2/10][RFC] aio: net use struct socket for io
Remove unused arg from socket operations

The sendmsg and recvmsg socket operations take a kiocb pointer, but none of the functions actually use it. There's really no need for it, even theoretically, and it's quite ugly having it there at all. Also, removing it will pave the way for a more generic completion path in the file_operations.

---

 drivers/net/pppoe.c               |  8
 include/linux/net.h               | 18
 include/net/bluetooth/bluetooth.h |  2
 include/net/inet_common.h         |  3
 include/net/sock.h                | 19
 include/net/tcp.h                 |  6
 include/net/udp.h                 |  3
 net/appletalk/ddp.c               |  5
 net/atm/common.c                  |  6
 net/atm/common.h                  |  7
 net/ax25/af_ax25.c                |  7
 net/bluetooth/af_bluetooth.c      |  4
 net/bluetooth/hci_sock.c          |  7
 net/bluetooth/l2cap.c             |  2
 net/bluetooth/rfcomm/sock.c       |  8
 net/bluetooth/sco.c               |  3
 net/core/sock.c                   | 12
 net/dccp/dccp.h                   |  8
 net/dccp/probe.c                  |  3
 net/dccp/proto.c                  |  7
 net/decnet/af_decnet.c            |  7
 net/econet/af_econet.c            |  7
 net/ipv4/af_inet.c                |  5
 net/ipv4/raw.c                    |  8
 net/ipv4/tcp.c                    |  7
 net/ipv4/tcp_probe.c              |  3
 net/ipv4/udp.c                    |  9
 net/ipv4/udp_impl.h               |  2
 net/ipv6/raw.c                    |  6
 net/ipv6/udp.c                    | 10
 net/ipv6/udp_impl.h               |  6
 net/ipx/af_ipx.c                  |  7
 net/irda/af_irda.c                | 29
 net/key/af_key.c                  |  6
 net/llc/af_llc.c                  |  7
 net/netlink/af_netlink.c          |  6
 net/netrom/af_netrom.c            |  7
 net/packet/af_packet.c            | 11
 net/rose/af_rose.c                |  7
 net/sctp/socket.c                 |  9
 net/socket.c                      | 32
 net/tipc/socket.c                 | 28
 net/unix/af_unix.c                | 39
 net/wanrouter/af_wanpipe.c        |  7
 net/x25/af_x25.c                  |  6
 45 files changed, 166 insertions(+), 243 deletions(-)

---

diff -urpN -X dontdiff a/drivers/net/pppoe.c b/drivers/net/pppoe.c
--- a/drivers/net/pppoe.c 2007-01-12 11:18:47.244855016 -0800
+++ b/drivers/net/pppoe.c 2007-01-12 11:29:21.179177108 -0800
@@ -746,8 +746,8 @@ static int pppoe_ioctl(struct socket *so
 }
 
 
-static int pppoe_sendmsg(struct kiocb *iocb, struct socket *sock,
-		  struct msghdr *m, size_t total_len)
+static int pppoe_sendmsg(struct socket *sock, struct msghdr *m,
+			 size_t total_len)
 {
 	struct sk_buff *skb = NULL;
 	struct sock *sk = sock->sk;
@@ -912,8 +912,8 @@ static struct ppp_channel_ops pppoe_chan
 	.start_xmit = pppoe_xmit,
 };
 
-static int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock,
-		  struct msghdr *m, size_t total_len, int flags)
+static int pppoe_recvmsg(struct socket *sock, struct msghdr *m,
+			 size_t total_len, int flags)
 {
 	struct sock *sk = sock->sk;
 	struct sk_buff *skb = NULL;
diff -urpN -X dontdiff a/include/linux/net.h b/include/linux/net.h
--- a/include/linux/net.h 2007-01-12 11:18:56.683629587 -0800
+++ b/include/linux/net.h 2007-01-12 11:29:21.185175058 -0800
@@ -118,7 +118,6 @@ struct socket {
 
 struct vm_area_struct;
 struct page;
-struct kiocb;
 struct sockaddr;
 struct msghdr;
 struct module;
@@ -156,11 +155,10 @@ struct proto_ops {
 				      int optname, char __user *optval, int optlen);
 	int		(*compat_getsockopt)(struct socket *sock, int level,
 				      int optname, char __user *optval, int __user *optlen);
-	int		(*sendmsg)   (struct kiocb *iocb, struct socket *sock,
-				      struct msghdr *m, size_t total_len);
-	int		(*recvmsg)   (struct kiocb *iocb, struct socket *sock,
-				      struct msghdr *m, size_t total_len,
-				      int flags);
+	int		(*sendmsg)   (struct socket *sock, struct msghdr *m,
+				      size_t total_len);
+	int		(*recvmsg)   (struct socket *sock, struct msghdr *m,
+				      size_t total_len, int flags);
 	int