Re: [PATCH 2/4] forcedeth: fix power management support
Ayaz Abdulla [EMAIL PROTECTED] writes: This patch fixes the power management functions. It includes lowering the phy speed to conserve power. Shouldn't there be some way to disable this? AFAIK a few old switches have trouble with this. I assume a new ethtool option would be appropiate since we expect other drivers to gain this capability too. Could some of this code be put into the generic MII layer? -Andi - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: r8169: Link always down
Francois Romieu wrote: [...] As an option you can retrieve the url below which contains all the patches above in a single patch-file: http://www.fr.zoreil.com/people/francois/misc/20070522-2.6.22-rc2-r8169.patch Thanks for providing the single patch file. I was able to compile the patched kernel (version 2.6.22-rc3) with the EEPROM option enabled. However, the NIC still does not work (r8169: eth0: link down). Please let me know what else I could do to help you. Clemens - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH-2.4] Fix divide by 0 in vegas_cong_avoid()
Hi, I had a divide by zero on kernel 2.4.33 running with Vegas enabled. The KDB back trace is: kdb bt Stack traceback for pid 0 0x403a600000 10 R 0x403a6370 *swapper EBPEIPFunction (args) 0x403a7d48 0x4026ae51 vegas_cong_avoid+0x111 (0x5f3bb638, 0x73c92cbb, 0x , 0x73c92cbb, 0xf28d0275) kernel .text 0x4010 0x4026ad40 0x4026aef0 0x403a7d8c 0x4026bb67 tcp_ack+0x307 (0x5f3bb560, 0x581985c0, 0x18e, 0x4023e765, 0x5c369044) kernel .text 0x4010 0x4026b860 0x4026be20 0x403a7ddc 0x4026e771 tcp_rcv_established+0x461 (0x5f3bb560, 0x581985c0, 0x5c369 044, 0x2c, 0x5f3bb638) kernel .text 0x4010 0x4026e310 0x4026ed80 0x403a7e00 0x40277c2f tcp_v4_do_rcv+0x14f (0x5f3bb560, 0x581985c0, 0x0, 0x403a7e 50, 0x4024bb42) kernel .text 0x4010 0x40277ae0 0x40277c40 Rest was cut for readability What happens is that vegas_rtt_calc() gets rtt as -1, so when it adds 1 the rtt is set to zero. It seems that the -1 came from tcp_clean_rtx_queue() so I made this small patch to fix the problem. I think it is also relevant to 2.6. Don't perform congestion avoidance on packets that we didn't calculate there RTT, as this may result in a divide by zero later on. Signed-off-by: Lior Dotan [EMAIL PROTECTED] --- diff -up net/ipv4/tcp_input.c.orig net/ipv4/tcp_input.c --- net/ipv4/tcp_input.c.orig 2007-05-29 11:43:37.0 +0300 +++ net/ipv4/tcp_input.c2007-05-29 11:20:21.0 +0300 @@ -2425,6 +2424,7 @@ static int tcp_clean_rtx_queue(struct so tp-retrans_out--; acked |= FLAG_RETRANS_DATA_ACKED; seq_rtt = -1; + acked = ~FLAG_DATA_ACKED; } else if (seq_rtt 0) seq_rtt = now - scb-when; if(sacked TCPCB_SACKED_ACKED) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [SKY2]: Fix VLAN unregistration
Ben Greear wrote: Patrick McHardy wrote: Fix sky2 disabling VLAN completely when the first vid is unregistered. For some reason the VLAN code insists on the driver providing a vlan_rx_kill_vid function even if only NETIF_F_HW_VLAN_RX and not NETIF_F_HW_VLAN_FILTER is set, so this patch keeps an empty function. This seems to be a bug though, vlan_rx_add_vid is only required with NETIF_F_HW_VLAN_FILTER. Ben? I believe DaveM did most of the vlan hw-accel work. It would be easy enough to check for a null function before calling the vlan_rx_kill_vid function and relax the checks based on the flags, but there may be valid reasons to keep this as is. Most drivers are using vlan_rx_kill_vid to remove a vid from the group under their local local lock, which seems unnecessary since the VLAN code calls synchronize_net before freeing it and also does the removal itself. A few others have empty functions and 8139cp seems to have the same bug as sky2. In all cases it doesn't seem to be needed for drivers without NETIF_F_HW_VLAN_FILTER. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] merge dst_discard in out into one, removed a duplicate function
On 5/28/07, Jan Engelhardt [EMAIL PROTECTED] wrote: Uhm, just replace every invocation of dst_discard_in/_out() directly by dst_discard ... don't add macros for that. merge dst_discard in out into one, this removed a duplicate function. Signed-off-by: Denis Cheng [EMAIL PROTECTED] --- Is there anybody also found this duplicate? I found this duplcate had existed from linux-2.2 linux-2.4 to the latest kernel tree, you could check it in the lxr site conveniently: http://lxr.linux.no/source/net/core/dst.c?v=2.2.26 http://lxr.linux.no/source/net/core/dst.c?v=2.4.18 http://lxr.linux.no/source/net/core/dst.c With little difference, the 2.2 and 2.4 version use two different function name: dst_discard and dst_blackhole, but the definition of them are the same, so they are same substantially. Additionally, the two same function are two local(static) functions in a single file dst.c, I don't know why there should exist two same functions. I wonder what the consideration of this design is, because I'm just a kernel newbie; this problem has been confusing me for many days, so if you know why, please drop me some directions. diff --git a/net/core/dst.c b/net/core/dst.c index 764bccb..c6a0587 100644 --- a/net/core/dst.c +++ b/net/core/dst.c @@ -111,13 +111,7 @@ out: spin_unlock(dst_lock); } -static int dst_discard_in(struct sk_buff *skb) -{ - kfree_skb(skb); - return 0; -} - -static int dst_discard_out(struct sk_buff *skb) +static int dst_discard(struct sk_buff *skb) { kfree_skb(skb); return 0; @@ -138,8 +132,7 @@ void * dst_alloc(struct dst_ops * ops) dst-ops = ops; dst-lastuse = jiffies; dst-path = dst; - dst-input = dst_discard_in; - dst-output = dst_discard_out; + dst-input = dst-output = dst_discard; #if RT_CACHE_DEBUG = 2 atomic_inc(dst_total); #endif @@ -153,8 +146,7 @@ static void ___dst_free(struct dst_entry * dst) protocol module is unloaded. */ if (dst-dev == NULL || !(dst-dev-flagsIFF_UP)) { - dst-input = dst_discard_in; - dst-output = dst_discard_out; + dst-input = dst-output = dst_discard; } dst-obsolete = 2; } @@ -242,8 +234,7 @@ static inline void dst_ifdown(struct dst_entry *dst, struct net_device *dev, return; if (!unregister) { - dst-input = dst_discard_in; - dst-output = dst_discard_out; + dst-input = dst-output = dst_discard; } else { dst-dev = loopback_dev; dev_hold(loopback_dev); -- Denis Cheng Linux Application Developer - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] merge dst_discard in out into one, removed a duplicate function
rae l wrote: merge dst_discard in out into one, this removed a duplicate function. Your patch is whitespace damaged, please fix your mailer and resend *once*. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2/3] 2.6.22-rc3: known regressions with patches
Hi all, Here is a list of some known regressions in 2.6.22-rc3 with patches available. Feel free to add new regressions/remove fixed etc. http://kernelnewbies.org/known_regressions Networking Subject: Oops with prism54 in 2.6.22-rc3 References : http://lkml.org/lkml/2007/5/26/54 Submitter : Maximilian Engelhardt [EMAIL PROTECTED] Handled-By : Björn Steinbrink [EMAIL PROTECTED] Patch : http://lkml.org/lkml/2007/5/27/182 Status : patch available Subject: OOPS triggered by ip(8) deconfiguring a network interface References : http://bugzilla.kernel.org/show_bug.cgi?id=8491 Submitter : Ben Collins [EMAIL PROTECTED] Patch : http://marc.info/?l=linux-netdevm=117915849224816w=2 commit 5632c5152aa621885d87ea0b8fdd5a6bb9f69c6f Status : bug probably fixed SATA/PATA Subject: pata_via appears to incorrectly detects 40-pin cable References : http://lkml.org/lkml/2007/5/17/273 Submitter : Francis Russell [EMAIL PROTECTED] Status : Not really a regression. Alan seems to have a general fix. (Tejun Heo) Subject: libata reset-seq merge broke sata_sil on sh References : http://lkml.org/lkml/2007/5/10/63 Submitter : Paul Mundt [EMAIL PROTECTED] Handled-By : Tejun Heo [EMAIL PROTECTED] Caused-By : commit 4750def52cb2c21732dda9aa1d43a07db37b0186 Patch : http://lkml.org/lkml/2007/5/19/161 Status : patch available Sparc64 Subject: arch/sparc64/time.c doesn't compile on Ultra 1 (no PCI) References : http://bugzilla.kernel.org/show_bug.cgi?id=8540 Submitter : Horst H. von Brand [EMAIL PROTECTED] Status : patch available Regards, Michal -- Najbardziej brakowało mi twojego milczenia. -- Andrzej Sapkowski Coś więcej - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2/4] 2.6.22-rc3: known regressions
Hi all, Here is a list of some known regressions in 2.6.22-rc3. Feel free to add new regressions/remove fixed etc. http://kernelnewbies.org/known_regressions Networking Subject: Network card not usable - sky2 References : http://bugzilla.kernel.org/show_bug.cgi?id=8539 Submitter : Ruben [EMAIL PROTECTED] Status : Unknown PCI Subject: Oops on 2.6.22-rc2 when unloading the cciss driver References : http://lkml.org/lkml/2007/5/24/172 Submitter : Mike Miller (OS Dev) [EMAIL PROTECTED] Status : Unknown PCMCIA Subject: libata and legacy ide pcmcia failure References : http://lkml.org/lkml/2007/5/17/305 Submitter : Robert de Rooy [EMAIL PROTECTED] Status : Unknown SATA/PATA Subject: 22-rc3 broke the CDROM in Dell notebook References : http://lkml.org/lkml/2007/5/27/63 Submitter : Gregor Jasny [EMAIL PROTECTED] Handled-By : Tejun Heo [EMAIL PROTECTED] Jeff Garzik [EMAIL PROTECTED] Caused-By : Tejun Heo [EMAIL PROTECTED] commit d4b2bab4f26345ea1803feb23ea92fbe3f6b77bc Status : problem is being debugged Sparc64 Subject: 2.6.22-rc broke X on Ultra5 References : http://lkml.org/lkml/2007/5/22/78 Submitter : Mikael Pettersson [EMAIL PROTECTED] Status : Unknown Regards, Michal -- Najbardziej brakowało mi twojego milczenia. -- Andrzej Sapkowski Coś więcej - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2/3] 2.6.22-rc3: known regressions with patches
On Tue, 2007-05-29 at 14:53 +0200, Michal Piotrowski wrote: Subject: OOPS triggered by ip(8) deconfiguring a network interface References : http://bugzilla.kernel.org/show_bug.cgi?id=8491 Submitter : Ben Collins [EMAIL PROTECTED] Patch : http://marc.info/?l=linux-netdevm=117915849224816w=2 commit 5632c5152aa621885d87ea0b8fdd5a6bb9f69c6f Status : bug probably fixed Seems to be fixed, and just marked it resolved on bugzilla. -- Ubuntu : http://www.ubuntu.com/ Linux1394: http://wiki.linux1394.org/ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH-2.4] Fix divide by 0 in vegas_cong_avoid()
On Tue, 29 May 2007 12:18:19 +0300 Lior Dotan [EMAIL PROTECTED] wrote: Hi, I had a divide by zero on kernel 2.4.33 running with Vegas enabled. The KDB back trace is: kdb bt Stack traceback for pid 0 0x403a600000 10 R 0x403a6370 *swapper EBPEIPFunction (args) 0x403a7d48 0x4026ae51 vegas_cong_avoid+0x111 (0x5f3bb638, 0x73c92cbb, 0x , 0x73c92cbb, 0xf28d0275) kernel .text 0x4010 0x4026ad40 0x4026aef0 0x403a7d8c 0x4026bb67 tcp_ack+0x307 (0x5f3bb560, 0x581985c0, 0x18e, 0x4023e765, 0x5c369044) kernel .text 0x4010 0x4026b860 0x4026be20 0x403a7ddc 0x4026e771 tcp_rcv_established+0x461 (0x5f3bb560, 0x581985c0, 0x5c369 044, 0x2c, 0x5f3bb638) kernel .text 0x4010 0x4026e310 0x4026ed80 0x403a7e00 0x40277c2f tcp_v4_do_rcv+0x14f (0x5f3bb560, 0x581985c0, 0x0, 0x403a7e 50, 0x4024bb42) kernel .text 0x4010 0x40277ae0 0x40277c40 Rest was cut for readability What happens is that vegas_rtt_calc() gets rtt as -1, so when it adds 1 the rtt is set to zero. It seems that the -1 came from tcp_clean_rtx_queue() so I made this small patch to fix the problem. I think it is also relevant to 2.6. Seems like fixing the -1 would be better. Perhaps NTP reset the clock? Don't perform congestion avoidance on packets that we didn't calculate there RTT, as this may result in a divide by zero later on. Signed-off-by: Lior Dotan [EMAIL PROTECTED] --- diff -up net/ipv4/tcp_input.c.orig net/ipv4/tcp_input.c --- net/ipv4/tcp_input.c.orig 2007-05-29 11:43:37.0 +0300 +++ net/ipv4/tcp_input.c2007-05-29 11:20:21.0 +0300 @@ -2425,6 +2424,7 @@ static int tcp_clean_rtx_queue(struct so tp-retrans_out--; acked |= FLAG_RETRANS_DATA_ACKED; seq_rtt = -1; + acked = ~FLAG_DATA_ACKED; } else if (seq_rtt 0) seq_rtt = now - scb-when; if(sacked TCPCB_SACKED_ACKED) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2/4] 2.6.22-rc3: known regressions
On Tue, 29 May 2007 14:56:20 +0200 Michal Piotrowski [EMAIL PROTECTED] wrote: Hi all, Here is a list of some known regressions in 2.6.22-rc3. Feel free to add new regressions/remove fixed etc. http://kernelnewbies.org/known_regressions Networking Subject: Network card not usable - sky2 References : http://bugzilla.kernel.org/show_bug.cgi?id=8539 Submitter : Ruben [EMAIL PROTECTED] Status : Unknown Not a regression. Read the bug report: Most recent kernel where this bug did *NOT* occur: It's been happening for a long time (I'd say always). There is a regression since 2.6.22-rc1 where the b44 driver has receive performance/interrupt lockup issues. -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] defxx: Fix the handling of ioremap() failures
If ioremap_nocache() is unfortunate enough to fail, the error code is not set correctly leading to a false success from dfx_register(). This change fixes the problem. Signed-off-by: Maciej W. Rozycki [EMAIL PROTECTED] --- Please apply. Maciej patch-mips-2.6.21-20070502-defxx-ioremap-0 diff -up --recursive --new-file linux-mips-2.6.21-20070502.macro/drivers/net/defxx.c linux-mips-2.6.21-20070502/drivers/net/defxx.c --- linux-mips-2.6.21-20070502.macro/drivers/net/defxx.c2007-02-21 05:56:25.0 + +++ linux-mips-2.6.21-20070502/drivers/net/defxx.c 2007-05-16 21:26:43.0 + @@ -566,6 +566,7 @@ static int __devinit dfx_register(struct bp-base.mem = ioremap_nocache(bar_start, bar_len); if (!bp-base.mem) { printk(KERN_ERR %s: Cannot map MMIO\n, print_name); + err = -ENOMEM; goto err_out_region; } } else { - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: b44: regression in 2.6.22 (resend)
On Mon, 2007-05-28 at 13:55 -0700, Maximilian Engelhardt wrote: On Monday 28 May 2007, Thomas Gleixner wrote: On Mon, 2007-05-28 at 19:44 +0200, Maximilian Engelhardt wrote: Can you please keep CONFIG_HIGH_RES_TIMERS and CONFIG_NOHZ and try the following combinations on the kernel command line: 1) highres=off nohz=off (should be the same as your working config) 2) highres=off 3) nohz=off I tested this with my 2.6.22-rc3 kernel, here are the results: without any special boot parameters: problem does appear highres=off nohz=off: problem does not appear highres=off: problem does not appear nohz=off: problem does appear Is there any other strange behavior of the high res enabled kernel than the b44 problem ? I didn't notice anything. I additionally built my 2.6.22-rc2-mm1 kernel without High Resolution Timer, but the high ping problem is still there. Hmm, that's mysterious. Wild guess is that highres exposes the hidden feature in a different way than rc2-mm1 does. I think the bug in 2.6.21/22-rc3 is a different one that the one in 2.6.22-rc2-mm1, but that's also only a wild guess :) I'll explain this a bit: In 2.6.21/22-rc3 is the same b44 driver that has been in the stock kernels for some time. With this driver and High Resolution Timer turned on I get problems using iperf. The problems are that the systems becomes really slow and unresponsive. Michael Buesch thought this could be an IRQ storm which sounds logical to me. This bug did never happen to me before I startet the iperf test. Can you please check to see if you notice anything out of the ordinary using netperf in place of iperf in your high res timer on/off testbed? Thanks, Gary - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: b44: regression in 2.6.22 (resend)
On Mon, 2007-05-28 at 16:55 +0200, Michael Buesch wrote: On Monday 28 May 2007 16:12:12 Maximilian Engelhardt wrote: On Monday 28 May 2007, Michael Buesch wrote: Can you also test the following patch? I think there's a bug in b44 that is doesn't properly discard shared IRQs, so it might possibly generate a NAPI storm, dunno. Worth a try. Index: linux-2.6.22-rc3/drivers/net/b44.c === --- linux-2.6.22-rc3.orig/drivers/net/b44.c 2007-05-27 23:01:44.0 +0200 +++ linux-2.6.22-rc3/drivers/net/b44.c 2007-05-28 12:48:27.0 +0200 @@ -911,6 +911,8 @@ static irqreturn_t b44_interrupt(int irq spin_lock(bp-lock); istat = br32(bp, B44_ISTAT); + if (istat == 0x) + goto out; /* Shared IRQ not for us */ imask = br32(bp, B44_IMASK); /* The interrupt mask register controls which interrupt bits @@ -942,6 +944,7 @@ irq_ack: bw32(bp, B44_ISTAT, istat); br32(bp, B44_ISTAT); } +out: spin_unlock(bp-lock); return IRQ_RETVAL(handled); } I did try this patch on a affected kernel, but I didn't notice any big difference. Perhaps the kernel is a bit less slow during the test, but It's hard to tell. Ok, but anyway. I think this is a bug and needs to be fixed this way. Gary? Thanks Michael. No, I don't think this is a bug and it does not need to be fixed. Thanks, Gary - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)
Ok, I finally got some time to code this out and study it and Ihave some questions. Milton Miller wrote: We add two flags to struct rx: one says this packet is EL, and one says it is or was size 0. We create a function, find_mark_el(struct nic, is_running) that is called after initial alloc and/or after refilling skb list. In find_mark_el, we start with rx_to_use (lets rename this rx_to_fill), and go back two entries, call this rx_to_mark. If is_running and rx_to_mark-prev-el then just return, leave EL alone. So if we start clean and then one packet completes, we can not clear the old marked entry? We must then add a per nic pointer to the current marked entry so that when the next packet completes, we can avoid scanning for it. Why do we need to check this again? Otherwise, set EL and size to 0, set the two flags in the rx struct, sync the RFD, then search for the current EL (we could save the pointer, but its always the odl rx_to_use (fill) or its -prev). Clear RFD-EL, sync, clear rx-el. Set size non-zero, sync, Leave the was_0 flag set if is_running (clear it only if we know reciever is stopped). At this point, if the receiver was stopped we can restart it,. We should do so in the caller. (We always restart at rx_to_clean). Restart should ack the RNR status before issuing the ru_start command to avoid a spurious RNR interrupt. In the receive loop, if RFD-C is not set, rx-was_0 is set and el is not set, then we need to check rx-next-RFD-C bit (after pci_sync_for_cpu). If the next packet C bit is set then we consider this skb as used, and just complete it with no data to the stack. If the C-bit is not set, we can read the status to see if the RU is running (we cleared the EL bit before it read it but it has not found another packet yet) or not (it read the el-bit before we cleared it). This read lets us avoid going through an enable interrupts, wait for the RNR interrupt cycle. Because find_mark_el will only advance EL if the receiver was stopped or we had more than 1 packet added, we can guarantee that if packet N has was_0 set, then packet N+1 will not have it set, so we have bounded lookahead. Is this meant to be 1 packet added at any given call or 1 packet added since the last time we cleared? This adds two flags to struct rx, but that is allocated as a single array and the array size is filled out as forward and back pointers. Rx clean only has to look at was_0 on the last packet when it decides C is not set, and only if both are set does it peek at the next packet. In other words, we shouldn't worry about the added flags making it a non-power-of-2 size. By setting size != 0 after we have modified all other fields, the device will only write to this packet if we are done writing. If we loose the race and only partially complete, it will just go on to the next packet and we find it. If we totally loose, then the receiver will go RNR and we can reclaim all the buffer space we have allocated. Comments? Questions? Say we have an 8 buffer receive pool (0-7) at start. rx[6] is marked. hardware consumes rx[0] software sees one rx complete without an s-bit set. software notes that rx[6] is marked software allocates new buffer for rx[0] software runs find_mark_el with rx_to_use == rx[1], rx_to_mark == rx[7], is_running == true, marked_rx == rx[6] find_mark_el finds rx_to_mark-prev-el set and is_running true, and returns. hardware consumes rx[1] software sees one rx complete without an s-bit set. software notes that rx[6] is still marked software allocates new buffer for rx[1] software runs find_mark_el with rx_to_use == rx[2], rx_to_mark == rx[0], is_running == true, marked_rx == rx[6] find_mark_el does not find rx_to_mark-prev-el set so continues find_mark_el sets rx[0]-rfd-el rx[0]-rfd-size = 0, rx[0]-el, rx[0]-size0; find_mark_el clears rx[6]-rfd-el, syncs, sets rx[6]-rfd-size, syncs, clears rx[6]-rfd-el but leaves rx[6]-rfd-size0 set. Is this the correct flow? -Ack - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/9]: tcp-2.6 patchset
On Mon, 28 May 2007 13:27:03 +0300 (EEST) Ilpo Järvinen [EMAIL PROTECTED] wrote: On Sun, 27 May 2007, Ilpo Järvinen wrote: On Sun, 27 May 2007, Baruch Even wrote: * Ilpo J?rvinen [EMAIL PROTECTED] [070527 14:16]: Thus, my original question basically culminates in this: should cc modules be passed number of packets acked or number of skbs acked? ...The latter makes no sense to me unless the value is intented to be interpreted as number of timestamps acked or something along those lines. ...I briefly tried looking up for documentation for cc module interface but didn't find anything useful about this, and thus asked in the first place... At least the htcp module that I wrote assumes that the number is actual number of tcp packets so GSO should be considered. Thanks for the info! It is what I suspected... ...I'll write a patch for it tomorrow against net-2.6... Dave, beware that it will partially overlap with the changes made in the patch 8, so you might choose to put the patch 8 on hold until this issue is first resolved... The consequences of this bug are not too large but it does make all congestion control algorithms a lot less aggressive. On my machines GSO is disabled by default (e1000 at 100mbps Tigon3 @ 1Gbps). Agreed, that's my impression too. However, some algorithms do things like 0 checks for it, so it might disturb their dynamics even more than in the too small value cases... Hmm, there seems to be another case that I'm not too sure of... Please check the alternative I choose for SYN handling below... ...hmm... While exploring this SYN thingie, I noticed that commit 164891aadf1721fca4dce473bb0e0998181537c6 drops !FLAG_RETRANS_DATA_ACKED check from rtt_sample call (when combining it with pkts_acked call). I hope that's intentional?!? ...the commit message didn't say anything about it nor was anything in cc modules changed to accomodate that. [PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules) The code used to ignore GSO completely, passing either way too small or zero pkts_acked when GSO skb or part of it got ACKed. In addition, there is no need to calculate the value in the loop but simple arithmetics after the loop is sufficient. Yes, thanks for fixing this. Wonder how it affects measurements. It is not very clear how SYN segments should be handled, so I choose to follow the previous implementation in this respect. Since we don't invoke congestion control modules until after the SYN handshake this is not a problem. -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [SKY2]: Fix VLAN unregistration
On Sun, 27 May 2007 20:44:04 +0200 Patrick McHardy [EMAIL PROTECTED] wrote: Fix sky2 disabling VLAN completely when the first vid is unregistered. For some reason the VLAN code insists on the driver providing a vlan_rx_kill_vid function even if only NETIF_F_HW_VLAN_RX and not NETIF_F_HW_VLAN_FILTER is set, so this patch keeps an empty function. This seems to be a bug though, vlan_rx_add_vid is only required with NETIF_F_HW_VLAN_FILTER. Ben? I put it in the next sky2 patch set. Still waiting for Jeff to take the last one. -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH-2.4] Fix divide by 0 in vegas_cong_avoid()
NTP was not running. I'm not sure what do you mean by fixing the -1. The trace shows that vegas_cong_avoid() is called with -1, and the only way it can happen is from tcp_clean_rtx_queue() and the patch should eliminate this. Another way of solving this is by checking vegas_rtt_calc() and see if it gets -1 and handle it there. Another thing that I don't understand is that some places like tcp_ack() declare seq_rtt as signed and some vegas_cong_avoid() declare it as unsigned. Shouldn't it be declared always as signed? On 5/29/07, Stephen Hemminger [EMAIL PROTECTED] wrote: On Tue, 29 May 2007 12:18:19 +0300 Lior Dotan [EMAIL PROTECTED] wrote: Hi, I had a divide by zero on kernel 2.4.33 running with Vegas enabled. The KDB back trace is: kdb bt Stack traceback for pid 0 0x403a600000 10 R 0x403a6370 *swapper EBPEIPFunction (args) 0x403a7d48 0x4026ae51 vegas_cong_avoid+0x111 (0x5f3bb638, 0x73c92cbb, 0x , 0x73c92cbb, 0xf28d0275) kernel .text 0x4010 0x4026ad40 0x4026aef0 0x403a7d8c 0x4026bb67 tcp_ack+0x307 (0x5f3bb560, 0x581985c0, 0x18e, 0x4023e765, 0x5c369044) kernel .text 0x4010 0x4026b860 0x4026be20 Seems like fixing the -1 would be better. Perhaps NTP reset the clock? -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: b44: regression in 2.6.22 (resend)
On Monday 28 May 2007, Thomas Gleixner wrote: On Mon, 2007-05-28 at 22:55 +0200, Maximilian Engelhardt wrote: I additionally built my 2.6.22-rc2-mm1 kernel without High Resolution Timer, but the high ping problem is still there. Hmm, that's mysterious. Wild guess is that highres exposes the hidden feature in a different way than rc2-mm1 does. I think the bug in 2.6.21/22-rc3 is a different one that the one in 2.6.22-rc2-mm1, but that's also only a wild guess :) I'll explain this a bit: In 2.6.21/22-rc3 is the same b44 driver that has been in the stock kernels for some time. With this driver and High Resolution Timer turned on I get problems using iperf. The problems are that the systems becomes really slow and unresponsive. Michael Buesch thought this could be an IRQ storm which sounds logical to me. This bug did never happen to me before I startet the iperf test. Can you please apply http://www.tglx.de/projects/hrtimers/2.6.22-rc3/patch-2.6.22-rc3-hrt1.patch on top of rc3 and check, whether it has any effect on your problem. The patch didn't change anything. The other issue happens only with 2.6.22-rc2-mm1 which includes the b44 ssb spilt. It's independed wether High Resolution Timer is turned on or off I always get very varying and high ping times. The iperf-test doesn't show the problems from 2.6.21/22-rc3. Neither with nor without highres ? Yes, it doesn't matter if highres is turned on or off. iperf never showed the problem from 2.6.21/22-rc3. Maxi signature.asc Description: This is a digitally signed message part.
Re: [PATCH][NET_SCHED] Update htb rate when stats are polled.
On 5/26/07, Patrick McHardy [EMAIL PROTECTED] wrote: Patrick McHardy wrote: Ranjit Manomohan wrote: Currently the HTB rate for a class is update very slowly (once every 16 seconds). This patch updates the rate whenever the stats are requested from user space. This enables more accurate rate monitoring. +/* Update packet/byte rate for a class. */ [..] We have a generic rate estimator, I think we should convert HTB over to use it and then maybe add this feature to the generic estimator. You can use this patch as a base. It needs a bit more work (for example the CONFIG_NET_SCHED_ESTIMATOR ifdefs are unnecessary, but I've added them to remind me to clean this up in all schedulers), but it works well enough for testing. Actually .. do you still need your changes with this patch? You can now replace the default estimator by one with a shorter interval. I applied the patch and tried this out and it works fine with short intervals. My changes are no longer necessary. Thanks! -Ranjit [NET_SCHED]: sch_htb: use generic estimator Remove the internal rate estimator and use the generic one. Signed-off-by: Patrick McHardy [EMAIL PROTECTED] --- commit add81ec9b23f1fbe973093d5999a3d70e9d4c48b tree d5d119cceed54f14c39cbe6bc75c270a3e89f316 parent b208cb31ea86bc6296e5b08e53bc765c81306286 author Patrick McHardy [EMAIL PROTECTED] Sat, 26 May 2007 14:52:02 +0200 committer Patrick McHardy [EMAIL PROTECTED] Sat, 26 May 2007 14:52:02 +0200 net/sched/sch_htb.c | 91 +-- 1 files changed, 30 insertions(+), 61 deletions(-) diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index 035788c..b29ff8f 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -69,8 +69,6 @@ */ #define HTB_HSIZE 16 /* classid hash size */ -#define HTB_EWMAC 2/* rate average over HTB_EWMAC*HTB_HSIZE sec */ -#define HTB_RATECM 1 /* whether to use rate computer */ #define HTB_HYSTERESIS 1 /* whether to use mode hysteresis for speedup */ #define HTB_VER 0x30011/* major must be matched with number suplied by TC as version */ @@ -95,12 +93,6 @@ struct htb_class { struct tc_htb_xstats xstats;/* our special stats */ int refcnt; /* usage count of this class */ -#ifdef HTB_RATECM - /* rate measurement counters */ - unsigned long rate_bytes, sum_bytes; - unsigned long rate_packets, sum_packets; -#endif - /* topology */ int level; /* our level (see above) */ struct htb_class *parent; /* parent class */ @@ -194,10 +186,6 @@ struct htb_sched { int rate2quantum; /* quant = rate / rate2quantum */ psched_time_t now; /* cached dequeue time */ struct qdisc_watchdog watchdog; -#ifdef HTB_RATECM - struct timer_list rttim;/* rate computer timer */ - int recmp_bucket; /* which hash bucket to recompute next */ -#endif /* non shaped skbs; let them go directly thru */ struct sk_buff_head direct_queue; @@ -677,34 +665,6 @@ static int htb_requeue(struct sk_buff *skb, struct Qdisc *sch) return NET_XMIT_SUCCESS; } -#ifdef HTB_RATECM -#define RT_GEN(D,R) R+=D-(R/HTB_EWMAC);D=0 -static void htb_rate_timer(unsigned long arg) -{ - struct Qdisc *sch = (struct Qdisc *)arg; - struct htb_sched *q = qdisc_priv(sch); - struct hlist_node *p; - struct htb_class *cl; - - - /* lock queue so that we can muck with it */ - spin_lock_bh(sch-dev-queue_lock); - - q-rttim.expires = jiffies + HZ; - add_timer(q-rttim); - - /* scan and recompute one bucket at time */ - if (++q-recmp_bucket = HTB_HSIZE) - q-recmp_bucket = 0; - - hlist_for_each_entry(cl,p, q-hash + q-recmp_bucket, hlist) { - RT_GEN(cl-sum_bytes, cl-rate_bytes); - RT_GEN(cl-sum_packets, cl-rate_packets); - } - spin_unlock_bh(sch-dev-queue_lock); -} -#endif - /** * htb_charge_class - charges amount bytes to leaf and ancestors * @@ -750,11 +710,6 @@ static void htb_charge_class(struct htb_sched *q, struct htb_class *cl, if (cl-cmode != HTB_CAN_SEND) htb_add_to_wait_tree(q, cl, diff); } -#ifdef HTB_RATECM - /* update rate counters */ - cl-sum_bytes += bytes; - cl-sum_packets++; -#endif /* update byte stats except for leaves which are already updated */ if (cl-level) { @@ -1095,13 +1050,6 @@ static int htb_init(struct Qdisc *sch, struct rtattr *opt) if (q-direct_qlen 2) /* some devices have zero tx_queue_len */ q-direct_qlen = 2; -#ifdef HTB_RATECM - init_timer(q-rttim); - q-rttim.function = htb_rate_timer; - q-rttim.data = (unsigned long)sch; - q-rttim.expires = jiffies + HZ; - add_timer(q-rttim); -#endif
Please pull 'upstream-fixes' branch of wireless-2.6
The following changes since commit c420bc9f09a0926b708c3edb27eacba434a4f4ba: Linus Torvalds (1): Linux 2.6.22-rc3 are found in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream-fixes Akinobu Mita (2): ieee80211: fix incomplete error message softmac: alloc_ieee80211() NULL check Björn Steinbrink (1): prism54: fix monitor mode oops Brandon Craig Rhodes (1): hostap: Allocate enough tailroom for TKIP drivers/net/wireless/hostap/hostap_80211_tx.c | 13 - drivers/net/wireless/prism54/islpci_eth.c |5 +++-- net/ieee80211/ieee80211_module.c|2 +- net/ieee80211/softmac/ieee80211softmac_module.c |5 - 4 files changed, 16 insertions(+), 9 deletions(-) diff --git a/drivers/net/wireless/hostap/hostap_80211_tx.c b/drivers/net/wireless/hostap/hostap_80211_tx.c index 246fac0..3df3c60 100644 --- a/drivers/net/wireless/hostap/hostap_80211_tx.c +++ b/drivers/net/wireless/hostap/hostap_80211_tx.c @@ -311,7 +311,7 @@ static struct sk_buff * hostap_tx_encrypt(struct sk_buff *skb, local_info_t *local; struct ieee80211_hdr_4addr *hdr; u16 fc; - int hdr_len, res; + int prefix_len, postfix_len, hdr_len, res; iface = netdev_priv(skb-dev); local = iface-local; @@ -337,10 +337,13 @@ static struct sk_buff * hostap_tx_encrypt(struct sk_buff *skb, if (skb == NULL) return NULL; - if ((skb_headroom(skb) crypt-ops-extra_mpdu_prefix_len || -skb_tailroom(skb) crypt-ops-extra_mpdu_postfix_len) - pskb_expand_head(skb, crypt-ops-extra_mpdu_prefix_len, -crypt-ops-extra_mpdu_postfix_len, GFP_ATOMIC)) { + prefix_len = crypt-ops-extra_mpdu_prefix_len + + crypt-ops-extra_msdu_prefix_len; + postfix_len = crypt-ops-extra_mpdu_postfix_len + + crypt-ops-extra_msdu_postfix_len; + if ((skb_headroom(skb) prefix_len || +skb_tailroom(skb) postfix_len) + pskb_expand_head(skb, prefix_len, postfix_len, GFP_ATOMIC)) { kfree_skb(skb); return NULL; } diff --git a/drivers/net/wireless/prism54/islpci_eth.c b/drivers/net/wireless/prism54/islpci_eth.c index dd070cc..f49eb06 100644 --- a/drivers/net/wireless/prism54/islpci_eth.c +++ b/drivers/net/wireless/prism54/islpci_eth.c @@ -378,9 +378,10 @@ islpci_eth_receive(islpci_private *priv) display_buffer((char *) skb-data, skb-len); #endif /* take care of monitor mode and spy monitoring. */ - if (unlikely(priv-iw_mode == IW_MODE_MONITOR)) + if (unlikely(priv-iw_mode == IW_MODE_MONITOR)) { + skb-dev = ndev; discard = islpci_monitor_rx(priv, skb); - else { + } else { if (unlikely(skb-data[2 * ETH_ALEN] == 0)) { /* The packet has a rx_annex. Read it for spy monitoring, Then * remove it, while keeping the 2 leading MAC addr. diff --git a/net/ieee80211/ieee80211_module.c b/net/ieee80211/ieee80211_module.c index 7ec6610..17ad278 100644 --- a/net/ieee80211/ieee80211_module.c +++ b/net/ieee80211/ieee80211_module.c @@ -140,7 +140,7 @@ struct net_device *alloc_ieee80211(int sizeof_priv) dev = alloc_etherdev(sizeof(struct ieee80211_device) + sizeof_priv); if (!dev) { - IEEE80211_ERROR(Unable to network device.\n); + IEEE80211_ERROR(Unable to allocate network device.\n); goto failed; } ieee = netdev_priv(dev); diff --git a/net/ieee80211/softmac/ieee80211softmac_module.c b/net/ieee80211/softmac/ieee80211softmac_module.c index e9cdc66..c308756 100644 --- a/net/ieee80211/softmac/ieee80211softmac_module.c +++ b/net/ieee80211/softmac/ieee80211softmac_module.c @@ -33,7 +33,10 @@ struct net_device *alloc_ieee80211softmac(int sizeof_priv) struct ieee80211softmac_device *softmac; struct net_device *dev; - dev = alloc_ieee80211(sizeof(struct ieee80211softmac_device) + sizeof_priv); + dev = alloc_ieee80211(sizeof(*softmac) + sizeof_priv); + if (!dev) + return NULL; + softmac = ieee80211_priv(dev); softmac-dev = dev; softmac-ieee = netdev_priv(dev); -- John W. Linville [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Please pull 'upstream' branch of wireless-2.6
The following changes since commit d7ea3be56adc95b17351221fd95e78115f3b01f4: Brandon Craig Rhodes (1): hostap: Allocate enough tailroom for TKIP are found in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream Akinobu Mita (1): softmac: use list_for_each_entry Daniel Drake (4): zd1211rw: Add ID for ZyXEL G-200v2 zd1211rw: Extend RF layer zd1211rw: Add UW2453 RF support zd1211rw: Make CCK gain patching conditional on RF type Jouni Malinen (1): hostap: Remove driver version number Larry Finger (1): bcm43xx: Fix deviation from specifications in set_baseband_attenuation Matthias Kaehlcke (1): hostap: Use list_for_each_entry Pavel Roskin (1): hostap: Suppress broadcast if no stations are associated drivers/net/wireless/bcm43xx/bcm43xx_phy.c |2 +- drivers/net/wireless/hostap/hostap_ap.c | 34 +- drivers/net/wireless/hostap/hostap_config.h |2 - drivers/net/wireless/hostap/hostap_cs.c |4 - drivers/net/wireless/hostap/hostap_ioctl.c |2 - drivers/net/wireless/hostap/hostap_main.c |1 - drivers/net/wireless/hostap/hostap_pci.c|5 - drivers/net/wireless/hostap/hostap_plx.c|5 - drivers/net/wireless/zd1211rw/Makefile |2 +- drivers/net/wireless/zd1211rw/zd_chip.c |5 +- drivers/net/wireless/zd1211rw/zd_chip.h |3 + drivers/net/wireless/zd1211rw/zd_rf.c | 21 +- drivers/net/wireless/zd1211rw/zd_rf.h | 28 ++ drivers/net/wireless/zd1211rw/zd_rf_al2230.c|1 + drivers/net/wireless/zd1211rw/zd_rf_al7230b.c |1 + drivers/net/wireless/zd1211rw/zd_rf_uw2453.c| 534 +++ drivers/net/wireless/zd1211rw/zd_usb.c |1 + net/ieee80211/softmac/ieee80211softmac_module.c | 32 +- 18 files changed, 611 insertions(+), 72 deletions(-) create mode 100644 drivers/net/wireless/zd1211rw/zd_rf_uw2453.c diff --git a/drivers/net/wireless/bcm43xx/bcm43xx_phy.c b/drivers/net/wireless/bcm43xx/bcm43xx_phy.c index b37f1e3..d779199 100644 --- a/drivers/net/wireless/bcm43xx/bcm43xx_phy.c +++ b/drivers/net/wireless/bcm43xx/bcm43xx_phy.c @@ -1638,7 +1638,7 @@ void bcm43xx_phy_set_baseband_attenuation(struct bcm43xx_private *bcm, return; } - if (phy-analog 1) { + if (phy-analog == 1) { value = bcm43xx_phy_read(bcm, 0x0060) ~0x003C; value |= (baseband_attenuation 2) 0x003C; } else { diff --git a/drivers/net/wireless/hostap/hostap_ap.c b/drivers/net/wireless/hostap/hostap_ap.c index 5b3abd5..9090052 100644 --- a/drivers/net/wireless/hostap/hostap_ap.c +++ b/drivers/net/wireless/hostap/hostap_ap.c @@ -326,7 +326,6 @@ static int ap_control_proc_read(char *page, char **start, off_t off, char *p = page; struct ap_data *ap = (struct ap_data *) data; char *policy_txt; - struct list_head *ptr; struct mac_entry *entry; if (off != 0) { @@ -352,14 +351,12 @@ static int ap_control_proc_read(char *page, char **start, off_t off, p += sprintf(p, MAC entries: %u\n, ap-mac_restrictions.entries); p += sprintf(p, MAC list:\n); spin_lock_bh(ap-mac_restrictions.lock); - for (ptr = ap-mac_restrictions.mac_list.next; -ptr != ap-mac_restrictions.mac_list; ptr = ptr-next) { + list_for_each_entry(entry, ap-mac_restrictions.mac_list, list) { if (p - page PAGE_SIZE - 80) { p += sprintf(p, All entries did not fit one page.\n); break; } - entry = list_entry(ptr, struct mac_entry, list); p += sprintf(p, MACSTR \n, MAC2STR(entry-addr)); } spin_unlock_bh(ap-mac_restrictions.lock); @@ -413,7 +410,6 @@ int ap_control_del_mac(struct mac_restrictions *mac_restrictions, u8 *mac) static int ap_control_mac_deny(struct mac_restrictions *mac_restrictions, u8 *mac) { - struct list_head *ptr; struct mac_entry *entry; int found = 0; @@ -421,10 +417,7 @@ static int ap_control_mac_deny(struct mac_restrictions *mac_restrictions, return 0; spin_lock_bh(mac_restrictions-lock); - for (ptr = mac_restrictions-mac_list.next; -ptr != mac_restrictions-mac_list; ptr = ptr-next) { - entry = list_entry(ptr, struct mac_entry, list); - + list_for_each_entry(entry, mac_restrictions-mac_list, list) { if (memcmp(entry-addr, mac, ETH_ALEN) == 0) { found = 1; break; @@ -519,7 +512,7 @@ static int prism2_ap_proc_read(char *page, char **start, off_t off, { char *p = page; struct ap_data *ap = (struct ap_data *) data; - struct list_head *ptr; + struct sta_info *sta; int i;
Re: [PATCH][IPSEC] fix panic when using inter address familiy IPsec on loopback
From: Kazunori MIYAZAWA [EMAIL PROTECTED] Date: Fri, 20 Apr 2007 19:02:51 +0900 I send patches to fix panic when using inter address family IPsec on loopback device. These patches can be applied to current linux-2.6 and should also be net-2.6. Best regards, Signed-off-by: Kazunori MIYAZAWA [EMAIL PROTECTED] Patch applied, thank you. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/9]: tcp-2.6 patchset
On Tue, 29 May 2007, Stephen Hemminger wrote: On Mon, 28 May 2007 13:27:03 +0300 (EEST) Ilpo Järvinen [EMAIL PROTECTED] wrote: On Sun, 27 May 2007, Ilpo Järvinen wrote: [PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules) Yes, thanks for fixing this. Wonder how it affects measurements. ...It's a bit hard to tell since dynamics change so dramatically in 0 check cases, the resulting behavior in too small value cases may be easier to predict... It's possible that this could explain some anomalities you've been seeing in your measurements. It is not very clear how SYN segments should be handled, so I choose to follow the previous implementation in this respect. Since we don't invoke congestion control modules until after the SYN handshake this is not a problem. Just curious, do you mean that cc modules cannot measure, e.g., initial RTT through this mechanism (though they could do that in init() cb then I suppose)... Or do you mean that they are called already for the ACK that completes the SYN handshake and therefore its skb is being cleaned from the queue right now (this is the case I above refer to)? In the first case the decrementer code is NOP. If the latter, then it is just interface specification question, i.e., if SYNs are treated as zero or one in num_acked for the pkts_acked callback (I have no opinion on this but was just trying to make sure cc modules get what they expect :-)). -- i.
Re: [PATCH 0/9]: tcp-2.6 patchset
On Tue, 29 May 2007 23:07:00 +0300 (EEST) Ilpo Järvinen [EMAIL PROTECTED] wrote: On Tue, 29 May 2007, Stephen Hemminger wrote: On Mon, 28 May 2007 13:27:03 +0300 (EEST) Ilpo Järvinen [EMAIL PROTECTED] wrote: On Sun, 27 May 2007, Ilpo Järvinen wrote: [PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules) Yes, thanks for fixing this. Wonder how it affects measurements. ...It's a bit hard to tell since dynamics change so dramatically in 0 check cases, the resulting behavior in too small value cases may be easier to predict... It's possible that this could explain some anomalities you've been seeing in your measurements. It is not very clear how SYN segments should be handled, so I choose to follow the previous implementation in this respect. Since we don't invoke congestion control modules until after the SYN handshake this is not a problem. Just curious, do you mean that cc modules cannot measure, e.g., initial RTT through this mechanism (though they could do that in init() cb then I suppose)... Or do you mean that they are called already for the ACK that completes the SYN handshake and therefore its skb is being cleaned from the queue right now (this is the case I above refer to)? In the first case the decrementer code is NOP. If the latter, then it is just interface specification question, i.e., if SYNs are treated as zero or one in num_acked for the pkts_acked callback (I have no opinion on this but was just trying to make sure cc modules get what they expect :-)). We don't switch a socket out of Reno until after the initial handshake. It evolved that way, and has a couple of advantages: * uncomplete connections don't up module refcount so it is easier to debug * in future, we can choose congestion control to use based on route metric. As an interface, it makes sense to keep the API with the SYN counting as a packet. -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH-2.4] Fix divide by 0 in vegas_cong_avoid()
On Tue, 29 May 2007 12:18:19 +0300 Lior Dotan [EMAIL PROTECTED] wrote: Hi, I had a divide by zero on kernel 2.4.33 running with Vegas enabled. The KDB back trace is: kdb bt Stack traceback for pid 0 0x403a600000 10 R 0x403a6370 *swapper EBPEIPFunction (args) 0x403a7d48 0x4026ae51 vegas_cong_avoid+0x111 (0x5f3bb638, 0x73c92cbb, 0x , 0x73c92cbb, 0xf28d0275) kernel .text 0x4010 0x4026ad40 0x4026aef0 0x403a7d8c 0x4026bb67 tcp_ack+0x307 (0x5f3bb560, 0x581985c0, 0x18e, 0x4023e765, 0x5c369044) kernel .text 0x4010 0x4026b860 0x4026be20 0x403a7ddc 0x4026e771 tcp_rcv_established+0x461 (0x5f3bb560, 0x581985c0, 0x5c369 044, 0x2c, 0x5f3bb638) kernel .text 0x4010 0x4026e310 0x4026ed80 0x403a7e00 0x40277c2f tcp_v4_do_rcv+0x14f (0x5f3bb560, 0x581985c0, 0x0, 0x403a7e 50, 0x4024bb42) kernel .text 0x4010 0x40277ae0 0x40277c40 Rest was cut for readability What happens is that vegas_rtt_calc() gets rtt as -1, so when it adds 1 the rtt is set to zero. It seems that the -1 came from tcp_clean_rtx_queue() so I made this small patch to fix the problem. I think it is also relevant to 2.6. Don't perform congestion avoidance on packets that we didn't calculate there RTT, as this may result in a divide by zero later on. Signed-off-by: Lior Dotan [EMAIL PROTECTED] --- diff -up net/ipv4/tcp_input.c.orig net/ipv4/tcp_input.c --- net/ipv4/tcp_input.c.orig 2007-05-29 11:43:37.0 +0300 +++ net/ipv4/tcp_input.c2007-05-29 11:20:21.0 +0300 @@ -2425,6 +2424,7 @@ static int tcp_clean_rtx_queue(struct so tp-retrans_out--; acked |= FLAG_RETRANS_DATA_ACKED; seq_rtt = -1; + acked = ~FLAG_DATA_ACKED; } else if (seq_rtt 0) seq_rtt = now - scb-when; if(sacked TCPCB_SACKED_ACKED) - The proposed fix for TSO and packets acked callback will fix this as well. Your fix makes sense for kernels 2.6.22, but with 2.6.22 this would fix it: http://article.gmane.org/gmane.linux.network/63101 -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: b44: regression in 2.6.22 (resend)
On Tuesday 29 May 2007 16:14:35 Gary Zambrano wrote: On Mon, 2007-05-28 at 16:55 +0200, Michael Buesch wrote: On Monday 28 May 2007 16:12:12 Maximilian Engelhardt wrote: On Monday 28 May 2007, Michael Buesch wrote: Can you also test the following patch? I think there's a bug in b44 that is doesn't properly discard shared IRQs, so it might possibly generate a NAPI storm, dunno. Worth a try. Index: linux-2.6.22-rc3/drivers/net/b44.c === --- linux-2.6.22-rc3.orig/drivers/net/b44.c 2007-05-27 23:01:44.0 +0200 +++ linux-2.6.22-rc3/drivers/net/b44.c2007-05-28 12:48:27.0 +0200 @@ -911,6 +911,8 @@ static irqreturn_t b44_interrupt(int irq spin_lock(bp-lock); istat = br32(bp, B44_ISTAT); + if (istat == 0x) + goto out; /* Shared IRQ not for us */ imask = br32(bp, B44_IMASK); /* The interrupt mask register controls which interrupt bits @@ -942,6 +944,7 @@ irq_ack: bw32(bp, B44_ISTAT, istat); br32(bp, B44_ISTAT); } +out: spin_unlock(bp-lock); return IRQ_RETVAL(handled); } I did try this patch on a affected kernel, but I didn't notice any big difference. Perhaps the kernel is a bit less slow during the test, but It's hard to tell. Ok, but anyway. I think this is a bug and needs to be fixed this way. Gary? Thanks Michael. No, I don't think this is a bug and it does not need to be fixed. Are you sure? I'm not so sure, because 1) On bcm43xx the reverse engineers told us that the card returns 0x for no-irq-pending. Since b44 and bcm43xx are very similiar in IRQ and DMA I just thought it would be the case there, too. Just guessing. 2) PCMCIA cards usually return all-ones if you try to read a register of a card that's been removed. So it's good practice to check for this and bail out early in the IRQ path. Do PCMCIA cards (PC-card, not neccessarily a real 16bit PCMCIA card) for b44 exist? -- Greetings Michael. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/9]: tcp-2.6 patchset
On Tue, 29 May 2007, Stephen Hemminger wrote: On Tue, 29 May 2007 23:07:00 +0300 (EEST) Ilpo Järvinen [EMAIL PROTECTED] wrote: On Tue, 29 May 2007, Stephen Hemminger wrote: Since we don't invoke congestion control modules until after the SYN handshake this is not a problem. Just curious, do you mean that cc modules cannot measure, e.g., initial RTT through this mechanism (though they could do that in init() cb then I suppose)... Or do you mean that they are called already for the ACK that completes the SYN handshake and therefore its skb is being cleaned from the queue right now (this is the case I above refer to)? In the first case the decrementer code is NOP. If the latter, then it is just interface specification question, i.e., if SYNs are treated as zero or one in num_acked for the pkts_acked callback (I have no opinion on this but was just trying to make sure cc modules get what they expect :-)). We don't switch a socket out of Reno until after the initial handshake. ...It's still not very clear to me what exactly your after means (both here and in your earlier description), i.e., whether clean_rtx_queue call that cleans SYN skb from the queue happens before or after the switch out of reno or not... If I understand the code correctly, this specific clean_rtx_queue call happens after your after but I could be misunderstanding the current 3-way handshake code. :-) As an interface, it makes sense to keep the API with the SYN counting as a packet. Ok, this one answers the remaining question concerning the patch, here is it without the decrementer for SYN case (which IMHO wasn't very beautiful looking anyway :-)). Dave, please consider this to net-2.6. It could be a stable candidate as well, haven't tested yet if it applies cleanly to stable: [PATCH] [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules) The code used to ignore GSO completely, passing either way too small or zero pkts_acked when GSO skb or part of it got ACKed. In addition, there is no need to calculate the value in the loop but simple arithmetics after the loop is sufficient. Pkts_acked callback of congestion control modules include SYN as a packet for now on. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] --- net/ipv4/tcp_input.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 38cb25b..74683d8 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2407,8 +2407,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p) struct sk_buff *skb; __u32 now = tcp_time_stamp; int acked = 0; + int prior_packets = tp-packets_out; __s32 seq_rtt = -1; - u32 pkts_acked = 0; ktime_t last_ackt = ktime_set(0,0); while ((skb = tcp_write_queue_head(sk)) @@ -2437,7 +2437,6 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p) */ if (!(scb-flags TCPCB_FLAG_SYN)) { acked |= FLAG_DATA_ACKED; - ++pkts_acked; } else { acked |= FLAG_SYN_ACKED; tp-retrans_stamp = 0; @@ -2481,6 +2480,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, __s32 *seq_rtt_p) } if (ackedFLAG_ACKED) { + u32 pkts_acked = prior_packets - tp-packets_out; const struct tcp_congestion_ops *ca_ops = inet_csk(sk)-icsk_ca_ops; -- 1.5.0.6
Definition and usage of NETIF_F_HW_SUM?
The flag NETIF_F_HW_SUM is being misused. The definition says that the device is capable of checksumming any packet. When in fact from usage it seems to imply that the device is capable of doing IPV6 as well as IPV4. Some devices like 8139too do fake checksum offloading because they always have to copy the packet. Some devices like via-rhine, don't really checksum but if they see CHECKSUM_PARTIAL then they copy. This is bogus, they should just let higher layer do checksum/copy. Devices like e1000, and bnx2 are broken because they assume only TCP/UDP and IPV4/IPV6. The definition of the flag says other protocols should work, but they probably send the hardware into a state of confusion. A few devices take a offset, starting point, and insertion point. This looks like the correct model. But no upper layer protocols other than IPV4/IPV6 can do checksum offload at present, so it seems moot. IMHO the correct solution would be to get rid if NETIF_F_HW_SUM and make a new flag NETIF_F_IPV6_SUM. Devices that can checksum both could do NETIF_F_IPV4_SUM|NETI_F_IPV6_SUM. -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] b44: netpoll locking fix
The irq handling thread (b44_interrupt) uses the same lock as the NAPI thread. This change should prevent a deadlock if something interrupts the b44 NAPI thread and tries to printk through netconsole. Signed-off-by: Francois Romieu [EMAIL PROTECTED] --- drivers/net/b44.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/b44.c b/drivers/net/b44.c index 879a2ff..64d75e4 100644 --- a/drivers/net/b44.c +++ b/drivers/net/b44.c @@ -853,16 +853,18 @@ static int b44_rx(struct b44 *bp, int budget) static int b44_poll(struct net_device *netdev, int *budget) { struct b44 *bp = netdev_priv(netdev); + unsigned long flags; int done; - spin_lock_irq(bp-lock); + spin_lock_irqsave(bp-lock, flags); if (bp-istat (ISTAT_TX | ISTAT_TO)) { /* spin_lock(bp-tx_lock); */ b44_tx(bp); /* spin_unlock(bp-tx_lock); */ } - spin_unlock_irq(bp-lock); + + spin_unlock_irqrestore(bp-lock, flags); done = 1; if (bp-istat ISTAT_RX) { @@ -882,8 +884,6 @@ static int b44_poll(struct net_device *netdev, int *budget) } if (bp-istat ISTAT_ERRORS) { - unsigned long flags; - spin_lock_irqsave(bp-lock, flags); b44_halt(bp); b44_init_rings(bp); -- 1.5.2 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: b44: regression in 2.6.22 (resend)
I am busy bisecting the real cause. Unfortunately, oprofile doesn't work on the laptop, and build time sucks... This how I think the IRQ should work: --- a/drivers/net/b44.c 2007-05-29 09:47:53.0 -0700 +++ b/drivers/net/b44.c 2007-05-29 09:49:50.0 -0700 @@ -908,9 +908,11 @@ static irqreturn_t b44_interrupt(int irq u32 istat, imask; int handled = 0; - spin_lock(bp-lock); - istat = br32(bp, B44_ISTAT); + if (istat == 0 || istat == ~0) + return IRQ_NONE; + + spin_lock(bp-lock); imask = br32(bp, B44_IMASK); /* The interrupt mask register controls which interrupt bits - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Definition and usage of NETIF_F_HW_SUM?
On Tue, May 29, 2007 at 01:58:13PM -0700, Stephen Hemminger wrote: The flag NETIF_F_HW_SUM is being misused. The definition says that the device is capable of checksumming any packet. When in fact from usage it seems to imply that the device is capable of doing IPV6 as well as IPV4. That would be a problem. Some devices like 8139too do fake checksum offloading because they always have to copy the packet. Some devices like via-rhine, don't really checksum but if they see CHECKSUM_PARTIAL then they copy. This is bogus, they should just let higher layer do checksum/copy. Actually this is OK because if they have to copy it then it's cheaper to checksum it there. Both of these should be able to support all protocols. Devices like e1000, and bnx2 are broken because they assume only TCP/UDP and IPV4/IPV6. The definition of the flag says other protocols should work, but they probably send the hardware into a state of confusion. I just checked e1000 and it's correct as it does use the csum_offset when doing TX offload. However, you're definitely right that bnx2 seems to be broken. A few devices take a offset, starting point, and insertion point. This looks like the correct model. But no upper layer protocols other than IPV4/IPV6 can do checksum offload at present, so it seems moot. I could easily whip up a patch to get GRE to use it for a start :) IMHO the correct solution would be to get rid if NETIF_F_HW_SUM and make a new flag NETIF_F_IPV6_SUM. Devices that can checksum both could do NETIF_F_IPV4_SUM|NETI_F_IPV6_SUM. We should definitely keep NETIF_F_HW_SUM for sane hardware such as the e1000. Unfortunately we may just have to invent IPV6_SUM for the broken ones. Ccing Michael to see if the bnx2 chip can actually do offset-based checksum offload. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] b44: netpoll locking fix
On Tue, 29 May 2007 23:14:58 +0200 Francois Romieu [EMAIL PROTECTED] wrote: The irq handling thread (b44_interrupt) uses the same lock as the NAPI thread. This change should prevent a deadlock if something interrupts the b44 NAPI thread and tries to printk through netconsole. Signed-off-by: Francois Romieu [EMAIL PROTECTED] Better to just get rid of using the lock as a transmit lock and use netif_tx_lock instead. --- a/drivers/net/b44.c 2007-05-29 09:51:43.0 -0700 +++ b/drivers/net/b44.c 2007-05-29 14:26:03.0 -0700 @@ -607,6 +607,7 @@ static void b44_tx(struct b44 *bp) { u32 cur, cons; + netif_tx_lock(bp-dev); cur = br32(bp, B44_DMATX_STAT) DMATX_STAT_CDMASK; cur /= sizeof(struct dma_desc); @@ -622,7 +623,7 @@ static void b44_tx(struct b44 *bp) skb-len, PCI_DMA_TODEVICE); rp-skb = NULL; - dev_kfree_skb_irq(skb); + dev_kfree_skb_any(skb); } bp-tx_cons = cons; @@ -631,6 +632,7 @@ static void b44_tx(struct b44 *bp) netif_wake_queue(bp-dev); bw32(bp, B44_GPTIMER, 0); + netif_tx_unlock(bp-dev); } /* Works like this. This chip writes a 'struct rx_header 30 bytes @@ -855,14 +857,8 @@ static int b44_poll(struct net_device *n struct b44 *bp = netdev_priv(netdev); int done; - spin_lock_irq(bp-lock); - - if (bp-istat (ISTAT_TX | ISTAT_TO)) { - /* spin_lock(bp-tx_lock); */ + if (bp-istat (ISTAT_TX | ISTAT_TO)) b44_tx(bp); - /* spin_unlock(bp-tx_lock); */ - } - spin_unlock_irq(bp-lock); done = 1; if (bp-istat ISTAT_RX) { @@ -970,21 +966,19 @@ static int b44_start_xmit(struct sk_buff { struct b44 *bp = netdev_priv(dev); struct sk_buff *bounce_skb; - int rc = NETDEV_TX_OK; dma_addr_t mapping; u32 len, entry, ctrl; - len = skb-len; - spin_lock_irq(bp-lock); - - /* This is a hard error, log it. */ if (unlikely(TX_BUFFS_AVAIL(bp) 1)) { - netif_stop_queue(dev); - printk(KERN_ERR PFX %s: BUG! Tx Ring full when queue awake!\n, - dev-name); - goto err_out; + if (!netif_queue_stopped(dev)) { + netif_stop_queue(dev); + printk(KERN_ERR PFX %s: BUG! Tx Ring full when queue awake!\n, + dev-name); + } + return NETDEV_TX_BUSY; } + len = skb-len; mapping = pci_map_single(bp-pdev, skb-data, len, PCI_DMA_TODEVICE); if (dma_mapping_error(mapping) || mapping + len DMA_30BIT_MASK) { /* Chip can't handle DMA to/from 1GB, use bounce buffer */ @@ -1044,16 +1038,14 @@ static int b44_start_xmit(struct sk_buff if (TX_BUFFS_AVAIL(bp) 1) netif_stop_queue(dev); - dev-trans_start = jiffies; - -out_unlock: - spin_unlock_irq(bp-lock); + mmiowb(); - return rc; + dev-trans_start = jiffies; + return NETDEV_TX_OK; err_out: - rc = NETDEV_TX_BUSY; - goto out_unlock; + dev_kfree_skb(skb); + return NETDEV_TX_OK; } static int b44_change_mtu(struct net_device *dev, int new_mtu) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Definition and usage of NETIF_F_HW_SUM?
On Wed, 30 May 2007 07:36:18 +1000 Herbert Xu [EMAIL PROTECTED] wrote: On Tue, May 29, 2007 at 01:58:13PM -0700, Stephen Hemminger wrote: The flag NETIF_F_HW_SUM is being misused. The definition says that the device is capable of checksumming any packet. When in fact from usage it seems to imply that the device is capable of doing IPV6 as well as IPV4. That would be a problem. Some devices like 8139too do fake checksum offloading because they always have to copy the packet. Some devices like via-rhine, don't really checksum but if they see CHECKSUM_PARTIAL then they copy. This is bogus, they should just let higher layer do checksum/copy. Actually this is OK because if they have to copy it then it's cheaper to checksum it there. Both of these should be able to support all protocols. Devices like e1000, and bnx2 are broken because they assume only TCP/UDP and IPV4/IPV6. The definition of the flag says other protocols should work, but they probably send the hardware into a state of confusion. I just checked e1000 and it's correct as it does use the csum_offset when doing TX offload. However, you're definitely right that bnx2 seems to be broken. A few devices take a offset, starting point, and insertion point. This looks like the correct model. But no upper layer protocols other than IPV4/IPV6 can do checksum offload at present, so it seems moot. I could easily whip up a patch to get GRE to use it for a start :) IMHO the correct solution would be to get rid if NETIF_F_HW_SUM and make a new flag NETIF_F_IPV6_SUM. Devices that can checksum both could do NETIF_F_IPV4_SUM|NETI_F_IPV6_SUM. We should definitely keep NETIF_F_HW_SUM for sane hardware such as the e1000. Unfortunately we may just have to invent IPV6_SUM for the broken ones. The Marvell 88e8071 does IPV4 and IPV6 checksum, earlier chips could do arbitrary checksum. Looks like when they added the TSO6 logic, they made transmit state machine more protocol dependent. -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH-2.4] Fix divide by 0 in vegas_cong_avoid()
On Tue, 29 May 2007 20:23:45 +0200 Lior Dotan [EMAIL PROTECTED] wrote: NTP was not running. I'm not sure what do you mean by fixing the -1. The trace shows that vegas_cong_avoid() is called with -1, and the only way it can happen is from tcp_clean_rtx_queue() and the patch should eliminate this. Another way of solving this is by checking vegas_rtt_calc() and see if it gets -1 and handle it there. Another thing that I don't understand is that some places like tcp_ack() declare seq_rtt as signed and some vegas_cong_avoid() declare it as unsigned. Shouldn't it be declared always as signed? On 5/29/07, Stephen Hemminger [EMAIL PROTECTED] wrote: On Tue, 29 May 2007 12:18:19 +0300 Lior Dotan [EMAIL PROTECTED] wrote: Hi, I had a divide by zero on kernel 2.4.33 running with Vegas enabled. I don't think anyone has backported the other vegas fixes from 2.6.21 to 2.4. The KDB back trace is: kdb bt Stack traceback for pid 0 0x403a600000 10 R 0x403a6370 *swapper EBPEIPFunction (args) 0x403a7d48 0x4026ae51 vegas_cong_avoid+0x111 (0x5f3bb638, 0x73c92cbb, 0x , 0x73c92cbb, 0xf28d0275) kernel .text 0x4010 0x4026ad40 0x4026aef0 0x403a7d8c 0x4026bb67 tcp_ack+0x307 (0x5f3bb560, 0x581985c0, 0x18e, 0x4023e765, 0x5c369044) kernel .text 0x4010 0x4026b860 0x4026be20 Seems like fixing the -1 would be better. Perhaps NTP reset the clock? -- Stephen Hemminger [EMAIL PROTECTED] For 2.6.22, rtt_calc doesn't exist, instead pkts_acked is called. The following should be added to avoid getting bogus timestamp values from retransmits. --- a/net/ipv4/tcp_vegas.c 2007-05-02 12:26:35.0 -0700 +++ b/net/ipv4/tcp_vegas.c 2007-05-29 14:06:26.0 -0700 @@ -117,6 +117,10 @@ void tcp_vegas_pkts_acked(struct sock *s struct vegas *vegas = inet_csk_ca(sk); u32 vrtt; + /* Ignore retransmits */ + if (unlikely(cnt == 0)) + return; + /* Never allow zero rtt or baseRTT */ vrtt = ktime_to_us(net_timedelta(last)) + 1; -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] b44: netpoll locking fix
Stephen Hemminger [EMAIL PROTECTED] : ???...] Better to just get rid of using the lock as a transmit lock and use netif_tx_lock instead. --- a/drivers/net/b44.c 2007-05-29 09:51:43.0 -0700 +++ b/drivers/net/b44.c 2007-05-29 14:26:03.0 -0700 @@ -607,6 +607,7 @@ static void b44_tx(struct b44 *bp) { u32 cur, cons; + netif_tx_lock(bp-dev); cur = br32(bp, B44_DMATX_STAT) DMATX_STAT_CDMASK; cur /= sizeof(struct dma_desc); (damn, you are quick) I am not completely convinced. 1. netpoll_send_skb (calls netif_tx_trylock(dev)) - netpoll_poll(np) - poll_napi(np) - np-dev-poll(np-dev, budget) ( == b44_poll) - b44_tx - netif_tx_lock(bp-dev) *deadlock* 2. Hunk #5 clashes with John Linville's wireless-dev#master 3. Moderately appropriate for 2.6.22-rc (imho, fwiw, etc.) -- Ueimor - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: b44: regression in 2.6.22 (resend)
On Tue, 2007-05-29 at 22:45 +0200, Michael Buesch wrote: On Tuesday 29 May 2007 16:14:35 Gary Zambrano wrote: On Mon, 2007-05-28 at 16:55 +0200, Michael Buesch wrote: On Monday 28 May 2007 16:12:12 Maximilian Engelhardt wrote: On Monday 28 May 2007, Michael Buesch wrote: Can you also test the following patch? I think there's a bug in b44 that is doesn't properly discard shared IRQs, so it might possibly generate a NAPI storm, dunno. Worth a try. Index: linux-2.6.22-rc3/drivers/net/b44.c === --- linux-2.6.22-rc3.orig/drivers/net/b44.c 2007-05-27 23:01:44.0 +0200 +++ linux-2.6.22-rc3/drivers/net/b44.c 2007-05-28 12:48:27.0 +0200 @@ -911,6 +911,8 @@ static irqreturn_t b44_interrupt(int irq spin_lock(bp-lock); istat = br32(bp, B44_ISTAT); + if (istat == 0x) + goto out; /* Shared IRQ not for us */ imask = br32(bp, B44_IMASK); /* The interrupt mask register controls which interrupt bits @@ -942,6 +944,7 @@ irq_ack: bw32(bp, B44_ISTAT, istat); br32(bp, B44_ISTAT); } +out: spin_unlock(bp-lock); return IRQ_RETVAL(handled); } I did try this patch on a affected kernel, but I didn't notice any big difference. Perhaps the kernel is a bit less slow during the test, but It's hard to tell. Ok, but anyway. I think this is a bug and needs to be fixed this way. Gary? Thanks Michael. No, I don't think this is a bug and it does not need to be fixed. Are you sure? I'm not so sure, because 1) On bcm43xx the reverse engineers told us that the card returns 0x for no-irq-pending. Since b44 and bcm43xx are very similiar in IRQ and DMA I just thought it would be the case there, too. Just guessing. The b44 interrupt status reg returns a value of 0 if no interrupts are pending. The b44 uses a mask to determine which bits (events) can generate device interrupts on the system. If the masked interrupt status register bits are not asserted, then the b44 will return to the system with handled = 0. So, I think the way the b44 interrupt code is written should be ok and not a bug. 2) PCMCIA cards usually return all-ones if you try to read a register of a card that's been removed. So it's good practice to check for this and bail out early in the IRQ path. Do PCMCIA cards (PC-card, not neccessarily a real 16bit PCMCIA card) for b44 exist? I do not know of any pccard application of the b44. As far as I know b44s live on motherboards and in the wireless soc. Thanks, Gary - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/7 RESEND] cxgb3 - Update FW to 4.1
Jeff Garzik wrote: [EMAIL PROTECTED] wrote: From: Divy Le Ray [EMAIL PROTECTED] Bump FW version to 4.1. Modify chip tuning in consequence. Signed-off-by: Divy Le Ray [EMAIL PROTECTED] --- drivers/net/cxgb3/regs.h|4 drivers/net/cxgb3/t3_hw.c | 40 ++-- drivers/net/cxgb3/version.h |2 +- 3 files changed, 27 insertions(+), 19 deletions(-) You think requiring a FIRMWARE UPGRADE is something that should be done in the middle of a kernel bugfix series? What Q/A planet are you from? I made a bad jugdement. I'll repost a patch series for 2.6.22 fixes against Linus'tree, and a patch series for driver improvements against netdev-2.6#upstream, if it is acceptable. This is a completely ridiculous burden you are placing on the individual user. A good driver knows that firmware NOT SHIPPED WITH THE KERNEL does not necessarily move in lock-step with the kernel driver. We're actively working on RDMA improvements, leading to these FW updates. Some of the changes are unfortunately not backward compatible, hence the exact matching the driver enforces. The bundle driver/FW is getting finalized. New adapters will be shipped with the right image once it is available. In the mean time, this is indeed a burden for the current users to get the corresponding FW. Divy - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: b44: regression in 2.6.22 (resend)
Gary Zambrano wrote: The b44 interrupt status reg returns a value of 0 if no interrupts are pending. The b44 uses a mask to determine which bits (events) can generate device interrupts on the system. If the masked interrupt status register bits are not asserted, then the b44 will return to the system with handled = 0. So, I think the way the b44 interrupt code is written should be ok and not a bug. This is normal. We check for 0x because that is often how a fault is indicated, when the memory location is read during or immediately after hotplug (or if the PCI bus is truly faulty). So for most hardware, you see tmp = read(irq status) if (!tmp) return irq-none /* no irq events raised */ if (tmp == 0x) return irq-none /* hot unplug or h/w fault */ and the method that determines no interrupt handling is needed. Regards, Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] b44: netpoll locking fix
On Wed, 30 May 2007 00:20:41 +0200 Francois Romieu [EMAIL PROTECTED] wrote: Stephen Hemminger [EMAIL PROTECTED] : ⅜...] Better to just get rid of using the lock as a transmit lock and use netif_tx_lock instead. --- a/drivers/net/b44.c 2007-05-29 09:51:43.0 -0700 +++ b/drivers/net/b44.c 2007-05-29 14:26:03.0 -0700 @@ -607,6 +607,7 @@ static void b44_tx(struct b44 *bp) { u32 cur, cons; + netif_tx_lock(bp-dev); cur = br32(bp, B44_DMATX_STAT) DMATX_STAT_CDMASK; cur /= sizeof(struct dma_desc); (damn, you are quick) I am not completely convinced. 1. netpoll_send_skb (calls netif_tx_trylock(dev)) - netpoll_poll(np) - poll_napi(np) - np-dev-poll(np-dev, budget) ( == b44_poll) - b44_tx - netif_tx_lock(bp-dev) *deadlock* Netpoll needs to be fixed. (or scrapped), as is it will break drivers trying to use tx_lock in poll routine. I know sky2 would get borked. Something like this: --- a/net/core/netpoll.c2007-05-08 14:19:32.0 -0700 +++ b/net/core/netpoll.c2007-05-29 15:28:22.0 -0700 @@ -250,22 +250,23 @@ static void netpoll_send_skb(struct netp unsigned long flags; local_irq_save(flags); - if (netif_tx_trylock(dev)) { - /* try until next clock tick */ - for (tries = jiffies_to_usecs(1)/USEC_PER_POLL; - tries 0; --tries) { + /* try until next clock tick */ + for (tries = jiffies_to_usecs(1)/USEC_PER_POLL; +tries 0; --tries) { + if (netif_tx_trylock(dev)) { if (!netif_queue_stopped(dev)) status = dev-hard_start_xmit(skb, dev); + netif_tx_unlock(dev); if (status == NETDEV_TX_OK) break; - /* tickle device maybe there is some cleanup */ - netpoll_poll(np); - - udelay(USEC_PER_POLL); } - netif_tx_unlock(dev); + + /* tickle device maybe there is some cleanup */ + netpoll_poll(np); + + udelay(USEC_PER_POLL); } local_irq_restore(flags); } - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: b44: regression in 2.6.22 (resend)
On Tue, 2007-05-29 at 18:39 -0400, Jeff Garzik wrote: We check for 0x because that is often how a fault is indicated, when the memory location is read during or immediately after hotplug (or if the PCI bus is truly faulty). So for most hardware, you see tmp = read(irq status) if (!tmp) return irq-none /* no irq events raised */ if (tmp == 0x) return irq-none /* hot unplug or h/w fault */ and the method that determines no interrupt handling is needed. I guess you are right, but then shouldn't the driver be checking for faults in other parts of the code too? What if a fault/hotplug occurs immediately after an interrupt, but before a tx? Thanks, Gary - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Definition and usage of NETIF_F_HW_SUM?
On Wed, 2007-05-30 at 07:36 +1000, Herbert Xu wrote: I just checked e1000 and it's correct as it does use the csum_offset when doing TX offload. However, you're definitely right that bnx2 seems to be broken. A few devices take a offset, starting point, and insertion point. This looks like the correct model. But no upper layer protocols other than IPV4/IPV6 can do checksum offload at present, so it seems moot. I could easily whip up a patch to get GRE to use it for a start :) IMHO the correct solution would be to get rid if NETIF_F_HW_SUM and make a new flag NETIF_F_IPV6_SUM. Devices that can checksum both could do NETIF_F_IPV4_SUM|NETI_F_IPV6_SUM. We should definitely keep NETIF_F_HW_SUM for sane hardware such as the e1000. Unfortunately we may just have to invent IPV6_SUM for the broken ones. Ccing Michael to see if the bnx2 chip can actually do offset-based checksum offload. bnx2 and tg3 cannot do offset-based checksumming because the hardware doesn't have room in the buffer descriptors to specify the offsets. So regrettably, the NETIF_F_HW_SUM flag has been misused in these drivers. A new NETIF_F_IPV6_SUM flag will be very useful for us. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] b44: netpoll locking fix
On Wed, May 30, 2007 at 12:20:41AM +0200, Francois Romieu wrote: 2. Hunk #5 clashes with John Linville's wireless-dev#master Let's not worry too much about this fact -- I'll (make Michael) fix-up things in my tree as appropriate. :-) John -- John W. Linville [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Definition and usage of NETIF_F_HW_SUM?
On Tue, 29 May 2007 17:10:52 -0700 Michael Chan [EMAIL PROTECTED] wrote: On Wed, 2007-05-30 at 07:36 +1000, Herbert Xu wrote: I just checked e1000 and it's correct as it does use the csum_offset when doing TX offload. However, you're definitely right that bnx2 seems to be broken. A few devices take a offset, starting point, and insertion point. This looks like the correct model. But no upper layer protocols other than IPV4/IPV6 can do checksum offload at present, so it seems moot. I could easily whip up a patch to get GRE to use it for a start :) IMHO the correct solution would be to get rid if NETIF_F_HW_SUM and make a new flag NETIF_F_IPV6_SUM. Devices that can checksum both could do NETIF_F_IPV4_SUM|NETI_F_IPV6_SUM. We should definitely keep NETIF_F_HW_SUM for sane hardware such as the e1000. Unfortunately we may just have to invent IPV6_SUM for the broken ones. Ccing Michael to see if the bnx2 chip can actually do offset-based checksum offload. bnx2 and tg3 cannot do offset-based checksumming because the hardware doesn't have room in the buffer descriptors to specify the offsets. So regrettably, the NETIF_F_HW_SUM flag has been misused in these drivers. A new NETIF_F_IPV6_SUM flag will be very useful for us. Look furthur many drivers are just plain broken and use F_HW_SUM and can't even do IPV6 properly. I'll fix The worst code award goes to: qla3xxx.c which is broken on IPV6 and goes to trouble of computing all the offsets and they are already there in skb... -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/7] cxgb3 - fix netpoll hanlder
Jeff Garzik wrote: +t3_intr_handler(adapter, qs-rspq.polling) (0, +(adapter-flags USING_MSIX) ? +(void *)qs : (void *)adapter); Remove needless casts to void* The two branches of ?: need to have the same type; without the casts they'd be struct sge_qset and struct adapter. Seems a bit cruddy to have two types passed to one function depending on the MSI state, but maybe that's OK. J - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Problem with atl1 and msi in kernel 2.6.22-rc3
I have problems with a M2V motherboard and atl1 driver with msi and kernel 2.6.22-rc3. With kernel 2.6.21 I had no problems. I must add in drivers/pci/quirks.c (line 1723): DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_RS480, quirk_disable_msi); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, 0xe238, quirk_disable_msi); to disable msi in the atl1. Thanks. Jose Alberto - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] spidernet: checksum and ethtool
It doesn't look like spidernet hardware can really checksum all protocols, the code looks like it does IPV4 only. If so, it should use NETIF_F_IP_CSUM instead of NETIF_F_HW_CSUM. The driver doesn't need it's own get/set for ethtool tx csum, and it should use the standard ethtool_op_get_link. NOT TESTED (no CELL hardware). --- drivers/net/spider_net.c |4 ++-- drivers/net/spider_net_ethtool.c | 21 +++-- 2 files changed, 5 insertions(+), 20 deletions(-) --- a/drivers/net/spider_net.c 2007-05-29 16:44:35.0 -0700 +++ b/drivers/net/spider_net.c 2007-05-29 17:19:52.0 -0700 @@ -718,7 +718,7 @@ spider_net_prepare_tx_descr(struct spide SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_NOCS; spin_unlock_irqrestore(chain-lock, flags); - if (skb-protocol == htons(ETH_P_IP) skb-ip_summed == CHECKSUM_PARTIAL) + if (skb-ip_summed == CHECKSUM_PARTIAL) switch (ip_hdr(skb)-protocol) { case IPPROTO_TCP: hwdescr-dmac_cmd_status |= SPIDER_NET_DMAC_TCP; @@ -2225,7 +2225,7 @@ spider_net_setup_netdev(struct spider_ne spider_net_setup_netdev_ops(netdev); - netdev-features = NETIF_F_HW_CSUM | NETIF_F_LLTX; + netdev-features = NETIF_F_IP_CSUM | NETIF_F_LLTX; /* some time: NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX | * NETIF_F_HW_VLAN_FILTER */ --- a/drivers/net/spider_net_ethtool.c 2007-05-29 17:19:52.0 -0700 +++ b/drivers/net/spider_net_ethtool.c 2007-05-29 17:24:08.0 -0700 @@ -134,22 +134,6 @@ spider_net_ethtool_set_rx_csum(struct ne return 0; } -static uint32_t -spider_net_ethtool_get_tx_csum(struct net_device *netdev) -{ -return (netdev-features NETIF_F_HW_CSUM) != 0; -} - -static int -spider_net_ethtool_set_tx_csum(struct net_device *netdev, uint32_t data) -{ -if (data) -netdev-features |= NETIF_F_HW_CSUM; -else -netdev-features = ~NETIF_F_HW_CSUM; - -return 0; -} static void spider_net_ethtool_get_ringparam(struct net_device *netdev, @@ -200,11 +184,12 @@ const struct ethtool_ops spider_net_etht .get_wol= spider_net_ethtool_get_wol, .get_msglevel = spider_net_ethtool_get_msglevel, .set_msglevel = spider_net_ethtool_set_msglevel, + .get_link = ethtool_op_get_link, .nway_reset = spider_net_ethtool_nway_reset, .get_rx_csum= spider_net_ethtool_get_rx_csum, .set_rx_csum= spider_net_ethtool_set_rx_csum, - .get_tx_csum= spider_net_ethtool_get_tx_csum, - .set_tx_csum= spider_net_ethtool_set_tx_csum, + .get_tx_csum= ethtool_op_get_tx_csum, + .set_tx_csum= ethtool_op_set_tx_csum, .get_ringparam = spider_net_ethtool_get_ringparam, .get_strings= spider_net_get_strings, .get_stats_count= spider_net_get_stats_count, - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
bond_compute_features() does not handle devices with different CSUM
kernel version = 2.6.20.1 file drivers/net/bonding/bonding_main.c function bond_compute_features() --- Given a system with two different NIC. One driver sets dev-features |= NETIF_F_HW_CSUM the other driver sets dev-features |= NETIF_F_IP_CSUM when enslaving the 2 device above, bond_compute_features() does not set the intersection of the 2 CSUM features (should be NETIF_F_IP_CSUM) --- should the bond features in the case above include NETIF_F_IP_CSUM - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/7] cxgb3 - fix netpoll hanlder
Jeremy Fitzhardinge wrote: Jeff Garzik wrote: +t3_intr_handler(adapter, qs-rspq.polling) (0, +(adapter-flags USING_MSIX) ? +(void *)qs : (void *)adapter); Remove needless casts to void* The two branches of ?: need to have the same type; without the casts they'd be struct sge_qset and struct adapter. Seems a bit cruddy to have two types passed to one function depending on the MSI state, but maybe that's OK. Look at the function argument... Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] netpoll: tx lock deadlock fix
If sky2 device poll routine is called from netpoll_send_skb, it would deadlock. The netpoll_send_skb held the netif_tx_lock, and the poll routine could acquire it to clean up skb's. Other drivers might use same locking model. The driver is correct, netpoll should not introduce more locking problems than it causes already. So change the code to drop lock before calling poll handler. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- a/net/core/netpoll.c2007-05-08 14:19:32.0 -0700 +++ b/net/core/netpoll.c2007-05-29 15:28:22.0 -0700 @@ -250,22 +250,23 @@ static void netpoll_send_skb(struct netp unsigned long flags; local_irq_save(flags); - if (netif_tx_trylock(dev)) { - /* try until next clock tick */ - for (tries = jiffies_to_usecs(1)/USEC_PER_POLL; - tries 0; --tries) { + /* try until next clock tick */ + for (tries = jiffies_to_usecs(1)/USEC_PER_POLL; +tries 0; --tries) { + if (netif_tx_trylock(dev)) { if (!netif_queue_stopped(dev)) status = dev-hard_start_xmit(skb, dev); + netif_tx_unlock(dev); if (status == NETDEV_TX_OK) break; - /* tickle device maybe there is some cleanup */ - netpoll_poll(np); - - udelay(USEC_PER_POLL); } - netif_tx_unlock(dev); + + /* tickle device maybe there is some cleanup */ + netpoll_poll(np); + + udelay(USEC_PER_POLL); } local_irq_restore(flags); } - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: bond_compute_features() does not handle devices with different CSUM
proposed change. --- 2.6.20/drivers/net/bonding/bond_main.c 2007-05-29 19:43:39.010565000 -0700 +++ 2.6.20.fix/drivers/net/bonding/bond_main.c 2007-05-29 19:46:12.37698 -0700 @@ -1227,7 +1227,14 @@ int i; bond_for_each_slave(bond, slave, i) { - features = (slave-dev-features BOND_INTERSECT_FEATURES); + /* NETIF_F_HW_CSUM includes support for NET_IF_IP_CSUM +* as such when looking for the intersection we need to +* add it to the device supported features +*/ + unsigned long dev_features = slave-dev-features; + if (slave-dev-features NETIF_F_HW_CSUM) + dev_features |= NETIF_F_IP_CSUM; + features = (features BOND_INTERSECT_FEATURES); if (slave-dev-hard_header_len max_hard_header_len) max_hard_header_len = slave-dev-hard_header_len; } On 5/29/07, Laurent Chavey [EMAIL PROTECTED] wrote: kernel version = 2.6.20.1 file drivers/net/bonding/bonding_main.c function bond_compute_features() --- Given a system with two different NIC. One driver sets dev-features |= NETIF_F_HW_CSUM the other driver sets dev-features |= NETIF_F_IP_CSUM when enslaving the 2 device above, bond_compute_features() does not set the intersection of the 2 CSUM features (should be NETIF_F_IP_CSUM) --- should the bond features in the case above include NETIF_F_IP_CSUM - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html