Re: [PATCH 0/4] RFC: Realtek 83xx SMI driver core
Hi Linus,

did you make any progress with this?

I noticed that the Vodafone Easybox 904xdsl/904lte models both make use of the RTL8367 switch. About one million of these routers have been deployed in Germany. There is an OpenWrt fork at
https://github.com/Quallenauge/Easybox-904-XDSL/commits/master-lede
which depends on the out-of-tree patches which seem to be the basis for your Realtek 83xx driver patches.

Having your Realtek 83xx patches in the upstream Linux kernel would help tremendously in getting support for those router models merged in OpenWrt.

Regards,
Carl-Daniel
Re: [RFD] L2 Network namespace infrastructure
On 23.06.2007 19:19, Eric W. Biederman wrote:
> Patrick McHardy [EMAIL PROTECTED] writes:
>> Eric W. Biederman wrote:
>>> Depending upon the data structure it will either be modified to hold a per-entry network namespace pointer or there will be a separate copy per network namespace. For large global data structures like the ipv4 routing cache hash table adding an additional pointer to the entries appears the more reasonable solution.
>>
>> So the routing cache is shared between all namespaces?
>
> Yes. Each namespace has its own view so semantically it's not shared. But the initial fan out of the hash table (2M or something) isn't something we want to replicate on a per namespace basis even assuming the huge page allocations could happen.
>
> So we just tag the entries and add the network namespace as one more part of the key when doing hash table look ups.

Can one namespace DoS other namespaces' access to the routing cache? Two scenarios come to mind:
* provoking hash collisions
* lock contention (sorry, haven't checked whether/how we do locking)

Regards,
Carl-Daniel
-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
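The tagging scheme discussed in the message above (one shared hash table, with the namespace as an extra part of the lookup key) can be illustrated with a minimal userspace sketch. This is not the kernel's routing cache code: struct net_ns, struct rt_entry, rt_hash(), rt_insert() and rt_lookup() are hypothetical names chosen for illustration, and the hash function and table size are arbitrary assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RT_HASH_BUCKETS 256u    /* illustrative size, not the kernel's */

struct net_ns { int id; };      /* hypothetical stand-in for a namespace */

struct rt_entry {
	struct rt_entry *next;      /* bucket chain */
	const struct net_ns *ns;    /* tag: namespace owning this entry */
	uint32_t daddr;
	uint32_t saddr;
	int out_ifindex;
};

struct rt_cache {
	struct rt_entry *bucket[RT_HASH_BUCKETS];
};

/* The namespace pointer is mixed into the hash like any other key field. */
static uint32_t rt_hash(const struct net_ns *ns, uint32_t daddr, uint32_t saddr)
{
	uint32_t h = daddr ^ (saddr * 2654435761u) ^ (uint32_t)(uintptr_t)ns;

	h ^= h >> 16;
	return h % RT_HASH_BUCKETS;
}

static void rt_insert(struct rt_cache *c, struct rt_entry *e)
{
	uint32_t h;

	assert(c != NULL && e != NULL);
	h = rt_hash(e->ns, e->daddr, e->saddr);
	e->next = c->bucket[h];
	c->bucket[h] = e;
}

/* An entry only matches if addresses AND namespace tag are equal. */
static struct rt_entry *rt_lookup(const struct rt_cache *c,
				  const struct net_ns *ns,
				  uint32_t daddr, uint32_t saddr)
{
	struct rt_entry *e = c->bucket[rt_hash(ns, daddr, saddr)];

	for (; e != NULL; e = e->next)
		if (e->ns == ns && e->daddr == daddr && e->saddr == saddr)
			return e;
	return NULL;
}
```

In this sketch two namespaces can hold entries for identical address pairs without seeing each other's entry, but they still share buckets and chains, which is why the collision and locking questions raised above are relevant.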
Re: [PATCH] Virtual ethernet tunnel (v.2)
On 08.06.2007 19:00, Ben Greear wrote:
> I have another sysfs patch that allows setting a default skb-mark for an interface so that you can set the skb-mark before it hits the connection tracking logic, but I've been told this one has very little chance of getting into the kernel. The skb-mark patch is only useful (as far as I can tell) if you also include a patch Patrick McHardy did for me that allowed the conn-tracking logic to use skb-mark as part of its tuple. This allows me to do NAT between virtual routers (routing tables) on the same machine using veth-equivalent drivers to connect the routers. He thinks this will probably not ever get into the kernel either.

Are these patches available somewhere? I'm currently doing NAT between virtual routers by some advanced iproute2/iptables trickery, but I have no way to handle the occasional tuple conflict.

Regards,
Carl-Daniel
Re: [LARTC] [ANNOUNCE] iproute2-2.6.18-061002
Stephen Hemminger wrote:
> This is a much delayed update to the iproute2 command set. It can be downloaded from:
> http://developer.osdl.org/dev/iproute2/download/iproute2-2.6.18-061002.tar.gz

Thanks! Are there any plans to merge the ip arp patches at
http://www.ssi.bg/~ja/#iparp ?
Apologies if this has already been rejected before. Searching the archives I couldn't find such a discussion.

Regards,
Carl-Daniel
--
http://www.hailfinger.org/
[PATCH] Revert sky2 to 0.13a
Hi Jeff,

you may want to push this patch into 2.6.16. The version it reverts to has been running stable for over four weeks for various folks (CC'ed) and we have had no success communicating with the maintainer.

Regards,
Carl-Daniel

Revert sky2 to 0.13 with a four-line fix on top of it. Later versions cause random oopses and just hang on some chips.

Signed-off-by: Carl-Daniel Hailfinger [EMAIL PROTECTED]

diff -Nurp linux-2.6.16-rc4-git8/drivers/net/sky2.c linux-2.6.16-rc4-git8-sky2fix/drivers/net/sky2.c
--- linux-2.6.16-rc4-git8/drivers/net/sky2.c	2006-02-25 02:38:35.0 +0100
+++ linux-2.6.16-rc4-git8-sky2fix/drivers/net/sky2.c	2006-02-26 01:29:45.0 +0100
@@ -23,6 +23,12 @@
  * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  */
 
+/*
+ * TOTEST
+ *	- speed setting
+ *	- suspend/resume
+ */
+
 #include <linux/config.h>
 #include <linux/crc32.h>
 #include <linux/kernel.h>
@@ -51,7 +57,7 @@
 #include "sky2.h"
 
 #define DRV_NAME		"sky2"
-#define DRV_VERSION		"0.15"
+#define DRV_VERSION		"0.13a"
 #define PFX			DRV_NAME " "
 
 /*
@@ -96,10 +102,6 @@ static int copybreak __read_mostly = 256
 module_param(copybreak, int, 0);
 MODULE_PARM_DESC(copybreak, "Receive copy threshold");
 
-static int disable_msi = 0;
-module_param(disable_msi, int, 0);
-MODULE_PARM_DESC(disable_msi, "Disable Message Signaled Interrupt (MSI)");
-
 static const struct pci_device_id sky2_id_table[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9000) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9E00) },
@@ -195,11 +197,11 @@ static int sky2_set_power_state(struct s
 	pr_debug("sky2_set_power_state %d\n", state);
 	sky2_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_ON);
 
-	power_control = sky2_pci_read16(hw, hw->pm_cap + PCI_PM_PMC);
-	vaux = (sky2_read16(hw, B0_CTST) & Y2_VAUX_AVAIL) &&
+	pci_read_config_word(hw->pdev, hw->pm_cap + PCI_PM_PMC, &power_control);
+	vaux = (sky2_read8(hw, B0_CTST) & Y2_VAUX_AVAIL) &&
 		(power_control & PCI_PM_CAP_PME_D3cold);
 
-	power_control = sky2_pci_read16(hw, hw->pm_cap + PCI_PM_CTRL);
+	pci_read_config_word(hw->pdev, hw->pm_cap + PCI_PM_CTRL, &power_control);
 
 	power_control |= PCI_PM_CTRL_PME_STATUS;
 	power_control &= ~(PCI_PM_CTRL_STATE_MASK);
@@ -223,7 +225,7 @@ static int sky2_set_power_state(struct s
 		sky2_write8(hw, B2_Y2_CLK_GATE, 0);
 
 		/* Turn off phy power saving */
-		reg1 = sky2_pci_read32(hw, PCI_DEV_REG1);
+		pci_read_config_dword(hw->pdev, PCI_DEV_REG1, &reg1);
 		reg1 &= ~(PCI_Y2_PHY1_POWD | PCI_Y2_PHY2_POWD);
 
 		/* looks like this XL is back asswards .. */
@@ -232,28 +234,18 @@ static int sky2_set_power_state(struct s
 			if (hw->ports > 1)
 				reg1 |= PCI_Y2_PHY2_COMA;
 		}
-
-		if (hw->chip_id == CHIP_ID_YUKON_EC_U) {
-			sky2_pci_write32(hw, PCI_DEV_REG3, 0);
-			reg1 = sky2_pci_read32(hw, PCI_DEV_REG4);
-			reg1 &= P_ASPM_CONTROL_MSK;
-			sky2_pci_write32(hw, PCI_DEV_REG4, reg1);
-			sky2_pci_write32(hw, PCI_DEV_REG5, 0);
-		}
-
-		sky2_pci_write32(hw, PCI_DEV_REG1, reg1);
-
+		pci_write_config_dword(hw->pdev, PCI_DEV_REG1, reg1);
 		break;
 
 	case PCI_D3hot:
 	case PCI_D3cold:
 		/* Turn on phy power saving */
-		reg1 = sky2_pci_read32(hw, PCI_DEV_REG1);
+		pci_read_config_dword(hw->pdev, PCI_DEV_REG1, &reg1);
 		if (hw->chip_id == CHIP_ID_YUKON_XL && hw->chip_rev > 1)
 			reg1 &= ~(PCI_Y2_PHY1_POWD | PCI_Y2_PHY2_POWD);
 		else
 			reg1 |= (PCI_Y2_PHY1_POWD | PCI_Y2_PHY2_POWD);
-		sky2_pci_write32(hw, PCI_DEV_REG1, reg1);
+		pci_write_config_dword(hw->pdev, PCI_DEV_REG1, reg1);
 
 		if (hw->chip_id == CHIP_ID_YUKON_XL && hw->chip_rev > 1)
 			sky2_write8(hw, B2_Y2_CLK_GATE, 0);
@@ -275,7 +267,7 @@ static int sky2_set_power_state(struct s
 		ret = -1;
 	}
 
-	sky2_pci_write16(hw, hw->pm_cap + PCI_PM_CTRL, power_control);
+	pci_write_config_byte(hw->pdev, hw->pm_cap + PCI_PM_CTRL, power_control);
 	sky2_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_OFF);
 	return ret;
 }
@@ -473,31 +465,16 @@ static void sky2_phy_init(struct sky2_hw
 		ledover |= PHY_M_LED_MO_RX(MO_LED_OFF);
 	}
 
-	if (hw->chip_id == CHIP_ID_YUKON_EC_U && hw->chip_rev >= 2) {
-		/* apply fixes in PHY AFE */
-		gm_phy_write(hw, port, 22, 255);
-		/* increase differential signal amplitude in 10BASE-T */
-		gm_phy_write(hw, port, 24, 0xaa99);
-		gm_phy_write(hw, port, 23, 0x2011
Re: [PATCH 02/02] add mask options to fwmark masking code
Michael Richardson wrote:
> [PATCH] This patch introduces a mask to the fwmark test cases in the advanced routing. This lets one test individual bits of the fwmark to determine how things should be routed (pick a routing table).
>
> This patch retains compatibility with tests that do not set the mask by assuming a mask of 0 is equivalent to a mask of 0x.

Sorry if I misunderstood the intention of your patch, but isn't similar code already in mainline?

linux-2.6.16-rc3/net/sched/cls_u32.c:146
#ifdef CONFIG_CLS_U32_MARK
	if ((skb->nfmark & n->mark.mask) != n->mark.val) {

Regards,
Carl-Daniel
--
http://www.hailfinger.org/
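The masked comparison under discussion above can be sketched in a few lines of standalone C. This is only an illustration: fwmark_matches() is a hypothetical helper, not a kernel function, and treating a zero mask as "all bits set" is an assumption about what the truncated "0x" value in the quoted patch description was meant to say.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the packet mark matches the rule value under the rule mask.
 * ASSUMPTION: a mask of 0 means "rule did not set a mask" and is treated as
 * an all-ones mask, so old-style rules keep comparing the full mark.
 */
static bool fwmark_matches(uint32_t pkt_mark, uint32_t rule_val,
			   uint32_t rule_mask)
{
	if (rule_mask == 0)
		rule_mask = UINT32_MAX;

	/* Same shape as the cls_u32 test: compare only the masked bits. */
	return (pkt_mark & rule_mask) == rule_val;
}
```

With a mask of 0xff, for example, only the low byte of the mark selects the routing table and the remaining bits stay free for other uses.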
Re: [RFT] sky2 0.16
Ian Kumlien wrote:
> On Sun, 2006-02-19 at 14:20 +0100, Wolfgang Hoffmann wrote:
>> On Saturday 18 February 2006 18:00, Carl-Daniel Hailfinger wrote:
>>> Hi,
>>>
>>> Stephen Hemminger wrote:
>>>> Could everyone who has problems with hangs try the following patch (against current 2.6.16-rc3 version)
>>>
>>> If Stephen's patch doesn't work for you, could you try replacing sky2.c and sky2.h with the ones attached to this mail? I'd be very interested in feedback for my version of the hangfix.
>>
>> Yes, your version cures my hangs.
>
> It's official, it cures my hangs as well, but it doesn't do MSI, and MSI might add some additional complexity.

Could you all please test the attached patch against 2.6.16-rc4? It is a straight forward-port of my sky2 version that worked for you.

Regards,
Carl-Daniel
--
http://www.hailfinger.org/

diff -urN linux-2.6.16-rc4/drivers/net/sky2.c linux-2.6.16-rc4-sky2fix/drivers/net/sky2.c
--- linux-2.6.16-rc4/drivers/net/sky2.c	2006-02-21 01:31:18.0 +0100
+++ linux-2.6.16-rc4-sky2fix/drivers/net/sky2.c	2006-02-21 01:27:42.0 +0100
@@ -1863,6 +1863,17 @@
 	sky2_write32(hw, STAT_CTRL, SC_STAT_CLR_IRQ);
 
+	/*
+	 * Kick the STAT_LEV_TIMER_CTRL timer.
+	 * This fixes my hangs on Yukon-EC (0xb6) rev 1.
+	 * The if clause is there to start the timer only if it has been
+	 * configured correctly and not been disabled via ethtool.
+	 */
+	if (sky2_read8(hw, STAT_LEV_TIMER_CTRL) == TIM_START) {
+		sky2_write8(hw, STAT_LEV_TIMER_CTRL, TIM_STOP);
+		sky2_write8(hw, STAT_LEV_TIMER_CTRL, TIM_START);
+	}
+
 	hwidx = sky2_read16(hw, STAT_PUT_IDX);
 	BUG_ON(hwidx >= STATUS_RING_SIZE);
 	rmb();
Re: [RFT] sky2 0.16
Ian Kumlien wrote:
> On Sat, 2006-02-18 at 18:00 +0100, Carl-Daniel Hailfinger wrote:
>> Hi,
>>
>> Stephen Hemminger wrote:
>>> Could everyone who has problems with hangs try the following patch (against current 2.6.16-rc3 version)
>>
>> If Stephen's patch doesn't work for you, could you try replacing sky2.c and sky2.h with the ones attached to this mail? I'd be very interested in feedback for my version of the hangfix.
>
> Using that time stop and start on current 0.17 did not help:

And what about the modified 0.13 version I had attached?

Regards,
Carl-Daniel
--
http://www.hailfinger.org/
Re: [PATCH] sky2: fix hang on Yukon-EC (0xb6) rev 1
Carl-Daniel Hailfinger wrote:
> Stephen Hemminger wrote:
>> On Tue, 24 Jan 2006 14:19:56 +0100
>> Carl-Daniel Hailfinger [EMAIL PROTECTED] wrote:
>>> This patch for sky2 fixes a hang on Yukon-EC (0xb6) rev 1 where suddenly no more interrupts were delivered. I don't know the real cause of the hang due to lack of docs, but the patch has been running stable for a few hours whereas the unmodified driver will hang after less than 2 minutes.

OK, the patch has been stable for me for about 30 hours. As an added benefit, it seems to have reduced the NMI rate on my box from 1/second to 0.15/second, so something was wrong with these cards.

Regards,
Carl-Daniel
--
http://www.hailfinger.org/
[PATCH] sky2: fix ethtool ops
This fixes setting rx_coalesce_usecs_irq via ethtool in sky2. The write was directed to the wrong register.

Signed-off-by: Carl-Daniel Hailfinger [EMAIL PROTECTED]

--- linux/drivers/net/sky2.c	2006-01-23 23:41:35.0 +0100
+++ linux/drivers/net/sky2.c	2006-01-24 12:52:11.0 +0100
@@ -2843,7 +2843,7 @@
 	if (ecmd->rx_coalesce_usecs_irq == 0)
 		sky2_write8(hw, STAT_ISR_TIMER_CTRL, TIM_STOP);
 	else {
-		sky2_write32(hw, STAT_TX_TIMER_INI,
+		sky2_write32(hw, STAT_ISR_TIMER_INI,
 			     sky2_us2clk(hw, ecmd->rx_coalesce_usecs_irq));
 		sky2_write8(hw, STAT_ISR_TIMER_CTRL, TIM_START);
 	}
Re: [PATCH] sky2: fix hang on Yukon-EC (0xb6) rev 1
Stephen Hemminger wrote:
> On Tue, 24 Jan 2006 14:19:56 +0100
> Carl-Daniel Hailfinger [EMAIL PROTECTED] wrote:
>> This patch for sky2 fixes a hang on Yukon-EC (0xb6) rev 1 where suddenly no more interrupts were delivered. I don't know the real cause of the hang due to lack of docs, but the patch has been running stable for a few hours whereas the unmodified driver will hang after less than 2 minutes.
>
> This shouldn't be necessary, but I don't have specifications (yet) either. The logic is that clearing the interrupt should be okay since later on we check to see if the status ring is empty.

Hm. We check the TX status, but do we really also check the RX status? I always had the feeling that the hangs were due to RX packets not getting serviced. Well, we'll see the real reason once you have the docs.

> I'll hold off till I get hardware specifications, which are coming any day now.

Understandable. Just wanted to let you know that the patch has now been stable under various loads for 8 hours.

Should there indeed be a documented bug in my Yukon2 version, will the bugfix make it into 2.6.16?

Regards,
Carl-Daniel
--
http://www.hailfinger.org/
Re: sky2 0.11 instability
Stephen Hemminger wrote:
> You might try adjusting the interrupt coalescing parameters with
> ethtool -C eth0 ...
> But I can't give you hard guidelines as to what would make it better.
> I have a debug patch, but it needs work still.

I don't care whether that debug patch will freeze the box or perform other random funnies. All the debugging printks I added to the driver did not trigger and I'd try anything. So yes, I'm desperate.

Does the sk98lin driver have any code for such problems?

Regards,
Carl-Daniel
--
http://www.hailfinger.org/
Re: sky2 0.11 instability
Stephen Hemminger wrote:
> On Mon, 23 Jan 2006 20:57:10 +0100
> Carl-Daniel Hailfinger [EMAIL PROTECTED] wrote:
>> Stephen Hemminger wrote:
>>> You might try adjusting the interrupt coalescing parameters with
>>> ethtool -C eth0 ...
>>> But I can't give you hard guidelines as to what would make it better.
>>> I have a debug patch, but it needs work still.
>>
>> I don't care whether that debug patch will freeze the box or perform other random funnies. All the debugging printks I added to the driver did not trigger and I'd try anything. So yes, I'm desperate.
>>
>> Does the sk98lin driver have any code for such problems?
>
> There are several differences that the sk98lin driver has.
> * It programs some parts of the chip differently. But most of those are wrong. I started copying it, but where it was wrong I didn't copy the mistakes.
> * Sk98lin does NAPI wrong. It has interrupts disabled and runs packets through soft irq twice.
> * Sk98lin does its own buggy rx checksum validation.
> * Sk98lin does not do VLAN
> * Sk98lin programs PCI-Ex for 2K transfers, but that causes data corruption
>
> The one that probably is saving you with sk98lin, is it has a watchdog routine that tries to work around all the possible driver hangs. I prefer to find and fix these hangs, because a watchdog routine like that just masks the problem and introduces a bunch of SMP race conditions which the sk98lin author either didn't see or ignored.

Oh. Now that is news to me. Glad I didn't have a SMP machine with the old driver.

There is a bug in ethtool support in sky2. Namely, rx-frames{,-irq}=64 is wrapped to zero. And rx-usecs-irq is 20 no matter what I set it to.

# ethtool -C bridgeint0 rx-frames 64 rx-frames-irq 64 rx-usecs 1 rx-usecs-irq 1 tx-usecs 1 tx-frames 64
# ethtool -c bridgeint0
Coalesce parameters for bridgeint0:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 1
rx-frames: 0
rx-usecs-irq: 20
rx-frames-irq: 0

tx-usecs: 1
tx-frames: 64
tx-usecs-irq: 0
tx-frames-irq: 0

Will continue investigating.

Regards,
Carl-Daniel
--
http://www.hailfinger.org/
Re: sky2 0.11 instability
Carl-Daniel Hailfinger wrote:
> Stephen Hemminger wrote:
>> On Mon, 23 Jan 2006 20:57:10 +0100
>> Carl-Daniel Hailfinger [EMAIL PROTECTED] wrote:
>>> Stephen Hemminger wrote:
>>>> You might try adjusting the interrupt coalescing parameters with
>>>> ethtool -C eth0 ...
>>>> But I can't give you hard guidelines as to what would make it better.
>>>> I have a debug patch, but it needs work still.
>>>
>>> I don't care whether that debug patch will freeze the box or perform other random funnies. All the debugging printks I added to the driver did not trigger and I'd try anything. So yes, I'm desperate.
>>>
>>> Does the sk98lin driver have any code for such problems?
>>
>> There are several differences that the sk98lin driver has.
>> * It programs some parts of the chip differently. But most of those are wrong. I started copying it, but where it was wrong I didn't copy the mistakes.
>> * Sk98lin does NAPI wrong. It has interrupts disabled and runs packets through soft irq twice.
>> * Sk98lin does its own buggy rx checksum validation.
>> * Sk98lin does not do VLAN
>> * Sk98lin programs PCI-Ex for 2K transfers, but that causes data corruption
>>
>> The one that probably is saving you with sk98lin, is it has a watchdog routine that tries to work around all the possible driver hangs. I prefer to find and fix these hangs, because a watchdog routine like that just masks the problem and introduces a bunch of SMP race conditions which the sk98lin author either didn't see or ignored.
>
> Oh. Now that is news to me. Glad I didn't have a SMP machine with the old driver.
>
> There is a bug in ethtool support in sky2. Namely, rx-frames{,-irq}=64 is wrapped to zero. And rx-usecs-irq is 20 no matter what I set it to.

The following whitespace-damaged patch should help with the latter problem.

--- a/drivers/net/sky2.c	2006-01-23 23:41:35.0 +0100
+++ b/drivers/net/sky2.c	2006-01-24 03:41:21.0 +0100
@@ -2843,7 +2843,7 @@
 	if (ecmd->rx_coalesce_usecs_irq == 0)
 		sky2_write8(hw, STAT_ISR_TIMER_CTRL, TIM_STOP);
 	else {
-		sky2_write32(hw, STAT_TX_TIMER_INI,
+		sky2_write32(hw, STAT_ISR_TIMER_INI,
 			     sky2_us2clk(hw, ecmd->rx_coalesce_usecs_irq));
 		sky2_write8(hw, STAT_ISR_TIMER_CTRL, TIM_START);
 	}

Despite all the problems I'm having with sky2, I want to thank you for writing it. The driver is easily readable and I can at least try to get it running. With sk98lin I'm just stuck due to coding style and general obfuscation.

Yeah! I got the nic to reproducibly auto-recover. With the following ethtool settings it would hang after a few minutes and not recover until a rmmod/modprobe cycle. Now it comes back reliably.

# ethtool -C bridgeext0 rx-frames 63 rx-frames-irq 63 tx-frames 63 \
  rx-usecs 250 rx-usecs-irq 250 tx-usecs 250

Patch follows:

--- a/drivers/net/sky2.c	2006-01-23 23:41:35.0 +0100
+++ b/drivers/net/sky2.c	2006-01-24 04:59:38.0 +0100
@@ -1623,6 +1623,12 @@
 	unsigned txq = txqaddr[sky2->port];
 	u16 ridx;
 
+	//sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP);
+	sky2_write8(hw, STAT_LEV_TIMER_CTRL, TIM_STOP);
+	//sky2_write8(hw, STAT_ISR_TIMER_CTRL, TIM_STOP);
+	//sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START);
+	sky2_write8(hw, STAT_LEV_TIMER_CTRL, TIM_START);
+	//sky2_write8(hw, STAT_ISR_TIMER_CTRL, TIM_START);
 	/* Maybe we just missed an status interrupt */
 	spin_lock(&sky2->tx_lock);
 	ridx = sky2_read16(hw,
@@ -1639,6 +1645,7 @@
 	if (netif_msg_timer(sky2))
 		printk(KERN_ERR PFX "%s: tx timeout\n", dev->name);
 
+#if 0
 	sky2_write32(hw, Q_ADDR(txq, Q_CSR), BMU_STOP);
 	sky2_write32(hw, Y2_QADDR(txq, PREF_UNIT_CTRL), PREF_UNIT_RST_SET);
 
@@ -1646,6 +1653,7 @@
 	sky2_qset(hw, txq);
 	sky2_prefetch_init(hw, txq, sky2->tx_le_map, TX_RING_SIZE - 1);
+#endif
 }

Properties of the patch above: The device will fail after some time, enter the tx_timeout handler, recover and continue. Now if I could avoid entering the tx_timeout handler, I would be happy because it triggers only after hanging for approx. 10 seconds.

Error log with my patch so far:

Jan 24 05:09:27 switch kernel: NETDEV WATCHDOG: bridgeint0: transmit timed out
Jan 24 05:09:27 switch kernel: sky2 bridgeint0: tx timeout
Jan 24 05:09:41 switch kernel: NETDEV WATCHDOG: bridgeext0: transmit timed out
Jan 24 05:09:41 switch kernel: sky2 bridgeext0: tx timeout
Jan 24 05:09:41 switch kernel: sky2 bridgeext0: rx error, status 0x7ffc0001 length 1312
Jan 24 05:11:12 switch kernel: NETDEV WATCHDOG: bridgeint0: transmit timed out
Jan 24 05:11:12 switch kernel: sky2 bridgeint0: tx timeout
Jan 24 05:11:12 switch kernel: sky2 bridgeint0: rx error, status 0x7ffc0001 length 592
Jan 24 05:11:42 switch kernel: NETDEV WATCHDOG: bridgeint0: transmit timed out
Jan 24 05:11:42 switch kernel: sky2 bridgeint0: tx timeout
Jan 24 05:11:42 switch kernel: sky2
Re: sky2 0.11 instability
Hi,

Carl-Daniel Hailfinger wrote:
> Carl-Daniel Hailfinger wrote:
>> Carl-Daniel Hailfinger wrote:
>>> after sending 259 GB and receiving 25 GB over my SysKonnect SK-9E21 card (sky2 says it is a Yukon-EC (0xb6) rev 1), the card appears dead. Machine is an Athlon64 3200+ on an Asus A8N-SLI Deluxe board.
>>
>> I have now added a hard reset routine to the tx timeout path and hope it won't kill my machine.
>
> Apologies for mangled whitespace, this is just a rough cut'n'paste.
>
> --- linux-2.6.15/drivers/net/sky2.c.orig	2006-01-21 16:00:15.0 +0100
> +++ linux-2.6.15/drivers/net/sky2.c	2006-01-21 14:08:28.0 +0100
> @@ -1565,6 +1565,7 @@ static int sky2_autoneg_done(struct sky2
>  	return 0;
>  }
>  
> +static int sky2_reset(struct sky2_hw *hw);
>  /*
>   * Interrupt from PHY are handled outside of interrupt context
>   * because accessing phy registers requires spin wait which might
> @@ -1639,6 +1640,7 @@ static void sky2_tx_timeout(struct net_d
>  	if (netif_msg_timer(sky2))
>  		printk(KERN_ERR PFX "%s: tx timeout\n", dev->name);
>  
> +	if (0) {
>  	sky2_write32(hw, Q_ADDR(txq, Q_CSR), BMU_STOP);
>  	sky2_write32(hw, Y2_QADDR(txq, PREF_UNIT_CTRL), PREF_UNIT_RST_SET);
>  
> @@ -1646,6 +1648,12 @@ static void sky2_tx_timeout(struct net_d
>  	sky2_qset(hw, txq);
>  	sky2_prefetch_init(hw, txq, sky2->tx_le_map, TX_RING_SIZE - 1);
> +	} else {
> +	printk(KERN_ERR PFX "%s: recovering the HARD way...\n", dev->name);
> +	sky2_down(dev);
> +	sky2_reset(hw);
> +	sky2_up(dev);
> +	}
>  }
>
> And every time the kernel throws this message, I run the following script:
>
> #!/bin/bash
> deadinterface=`dmesg|grep HARD|tail -1|sed "s/.*sky2 //;s/:.*//"`
> ip l s $deadinterface down
> ip l s $deadinterface up
>
> After that, everything continues to work until the next tx timeout happens, and then the script again saves the day.
>
> More results about the circumstances of this bug: It seems that it will only trigger under LOW load. As long as I keep the interface busy, it will have no problems at all.

OK, more info about the circumstances of the bug.
- happens with sky2 0.11 and 0.13
- with low load (100 kB/s) it triggers after 12 hours and then approx. every 50 minutes
- with medium load (100-1200 kB/s) it triggers after 30 minutes and then approx. every 70 minutes
- with high RX load (9-12 MB/s) it triggers every 8 hours
- with high TX load (9-12 MB/s) I can't get it to trigger
- with stock tx_timeout handler, it will stay dead and no interrupts are received from the nic once it hangs
- simply taking the interface down and up again doesn't help
- with my modified tx_timeout handler, taking the interface down and up again after the timeout helps
- with stock tx_timeout handler, I have to unload and reload the module to fix up the card
- general pattern seems to be medium interrupt load -> instability
- ah yes, and this is a production machine at a slightly remote location. Silly me.

If you want me to test any patch, tell me. It can only get better.

Regards,
Carl-Daniel
--
http://www.hailfinger.org/
Re: sky2 0.11 instability
Stephen Hemminger wrote:
> You might try adjusting the interrupt coalescing parameters with
> ethtool -C eth0 ...
> But I can't give you hard guidelines as to what would make it better.
> I have a debug patch, but it needs work still.

ethtool -C bridgeint1 rx-frames 255 rx-frames-irq 255 rx-usecs 0 rx-usecs-irq 0 tx-usecs 0 tx-frames 255
always results in a hang after less than 2 minutes if the network activity is not too high (about 100-600 packets/s).

So yes, I can trigger this sucker on demand and give you all the debugging you need. Do you have any idea what the out-of-tree sk98lin did differently?

Regards,
Carl-Daniel
--
http://www.hailfinger.org/
Re: sky2 0.11 instability
Carl-Daniel Hailfinger wrote:
> Stephen Hemminger wrote:
>> You might try adjusting the interrupt coalescing parameters with
>> ethtool -C eth0 ...
>> But I can't give you hard guidelines as to what would make it better.
>> I have a debug patch, but it needs work still.

After experimenting further, the following command will always hang the card after 2-3 seconds:
ethtool -C bridgeint1 rx-frames 63 rx-frames-irq 63 rx-usecs 0 rx-usecs-irq 0 tx-usecs 0 tx-frames 63

Crude activity log (1 second interval) follows:

interrupts  RX packets  TX packets
# normal activity
18225503    1828622     2084564
18225914    1828932     2084939
18226422    1829361     2085422
18226875    1829694     2085832
18227286    1830012     2086183
18227622    1830270     2086465
18227963    1830541     2086738
18228340    1830827     2087057
18228710    1831107     2087382
18229091    1831390     2087694
18229467    1831677     2088002
18229835    1831954     2088338
# ethtool starts now
18230143    1832249     2088647
18230146    1832434     2088799
18230146    1832462     2088799
18230146    1832462     2088799
18230146    1832462     2088799
18230146    1832462     2088799
18230146    1832462     2088799
18230146    1832462     2088799
18230146    1832462     2088799
18230146    1832462     2088799
18230146    1832462     2088799
18230146    1832462     2088799
# the netdev watchdog triggers now

So yes, I can trigger this sucker on demand and give you all the debugging you need. Do you have any idea what the out-of-tree sk98lin v8.14.3.3 did differently?

Regards,
Carl-Daniel
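The stall visible in the activity log above (the interrupt counter freezes while the netdev watchdog has not fired yet) could be spotted mechanically from such once-per-second samples. The following is a minimal userspace sketch under that assumption; struct sample and find_stall() are hypothetical names for illustration and are not part of sky2 or any kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the activity log: cumulative counters sampled once per second. */
struct sample {
	uint64_t irqs;
	uint64_t rx_packets;
	uint64_t tx_packets;
};

/*
 * Return the index of the first sample of a run in which the interrupt
 * counter stayed unchanged for at least `window` consecutive intervals,
 * or -1 if no such run exists in the n samples.
 */
static int find_stall(const struct sample *s, size_t n, size_t window)
{
	size_t run = 0;     /* consecutive intervals without a new interrupt */
	size_t i;

	assert(s != NULL || n == 0);
	for (i = 1; i < n; i++) {
		if (s[i].irqs == s[i - 1].irqs) {
			if (++run >= window)
				return (int)(i - run);
		} else {
			run = 0;
		}
	}
	return -1;
}
```

A window of a few samples would flag the hang well before the roughly 10 second transmit timeout mentioned elsewhere in the thread, though choosing the window is a judgment call on an idle link.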
Re: sky2 0.11 instability
Hi,

Carl-Daniel Hailfinger wrote:
> after sending 259 GB and receiving 25 GB over my SysKonnect SK-9E21 card (sky2 says it is a Yukon-EC (0xb6) rev 1), the card appears dead. Machine is an Athlon64 3200+ on an Asus A8N-SLI Deluxe board.
>
> sky2 v0.11 addr 0xc900 irq 74 Yukon-EC (0xb6) rev 1
> sky2 eth3: addr 00:00:5a:70:30:fb
> [...]
> sky2 eth3: enabling interface
> [...]
> sky2 eth3: phy interrupt status 0x1c40 0x7d0c
> sky2 eth3: Link is up at 100 Mbps, full duplex, flow control both
> [...]
> NETDEV WATCHDOG: eth3: transmit timed out
> sky2 eth3: tx timeout
> NETDEV WATCHDOG: eth3: transmit timed out
> sky2 eth3: tx timeout
>
> switch:~ # ifconfig eth3
> eth3      Link encap:Ethernet  HWaddr 00:00:5A:70:30:FB
>           inet6 addr: fe80::200:5aff:fe70:30fb/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:130530358 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:209647800 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:25980735946 (24777.1 Mb)  TX bytes:259787058579 (247752.2 Mb)
>           Interrupt:74
>
> switch:~ # cat /proc/interrupts
>            CPU0
>   0:   11213627    IO-APIC-edge  timer
>   1:      24783    IO-APIC-edge  i8042
>   8:          0    IO-APIC-edge  rtc
>   9:          0   IO-APIC-level  acpi
>  15:     401558    IO-APIC-edge  ide1
>  50:  249384881   IO-APIC-level  eth0
>  58:  179123938   IO-APIC-level  sky2
>  66:          3   IO-APIC-level  sky2, ohci1394
>  74:   98956955   IO-APIC-level  sky2
>  82:      19952   IO-APIC-level  sky2
> 217:       1865   IO-APIC-level  libata, NVidia CK804
> 225:     263052   IO-APIC-level  libata, ehci_hcd:usb1
> NMI:      11098
> LOC:   11214113
> ERR:          0
> MIS:          0
>
> Not only will the card not transmit anymore, it also doesn't receive any packet at all. ethtool -r eth3 doesn't change anything, taking the interface down and up again also doesn't help. The interrupt count of interrupt 74 stays constant after failing.
> modprobe -r sky2; modprobe sky2
> fixes the problem for me, so maybe resetting the card on TX timeouts will help.
>
> The same problem appeared much earlier for another card which shared interrupt 58 with an onboard card driven by skge. After disabling the skge driver and rebooting, that card has been stable so far.
>
> The card is connected to a 100 MBit switch.
>
> These problems didn't appear with sk98lin v8.14.3.3 (that driver did survive about 10 TB of traffic before I rebooted).
>
> Register dumps are available on request (too big for this list).
>
> I will now try sky2 0.13 and report back.

And it hit the other interface after 200 MB transferred...

NETDEV WATCHDOG: bridgeext0: transmit timed out
sky2 bridgeext0: tx timeout
NETDEV WATCHDOG: bridgeext0: transmit timed out
sky2 transmit interrupt missed? recovered

Although the driver claims to recover, it doesn't recover at all.

What debug level would be advisable? It is now running with modprobe sky2 debug=2, but I can't see more than the messages above.

I have now added a hard reset routine to the tx timeout path and hope it won't kill my machine.

Regards,
Carl-Daniel
--
http://www.hailfinger.org/
Re: sky2 0.11 instability
Carl-Daniel Hailfinger wrote:
> Hi,
>
> Carl-Daniel Hailfinger wrote:
>> after sending 259 GB and receiving 25 GB over my SysKonnect SK-9E21 card (sky2 says it is a Yukon-EC (0xb6) rev 1), the card appears dead. Machine is an Athlon64 3200+ on an Asus A8N-SLI Deluxe board.
>>
>> sky2 v0.11 addr 0xc900 irq 74 Yukon-EC (0xb6) rev 1
>> sky2 eth3: addr 00:00:5a:70:30:fb
>> [...]
>> sky2 eth3: enabling interface
>> [...]
>> sky2 eth3: phy interrupt status 0x1c40 0x7d0c
>> sky2 eth3: Link is up at 100 Mbps, full duplex, flow control both
>> [...]
>> NETDEV WATCHDOG: eth3: transmit timed out
>> sky2 eth3: tx timeout
>> NETDEV WATCHDOG: eth3: transmit timed out
>> sky2 eth3: tx timeout
>>
>> switch:~ # ifconfig eth3
>> eth3      Link encap:Ethernet  HWaddr 00:00:5A:70:30:FB
>>           inet6 addr: fe80::200:5aff:fe70:30fb/64 Scope:Link
>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>           RX packets:130530358 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:209647800 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:25980735946 (24777.1 Mb)  TX bytes:259787058579 (247752.2 Mb)
>>           Interrupt:74
>>
>> switch:~ # cat /proc/interrupts
>>            CPU0
>>   0:   11213627    IO-APIC-edge  timer
>>   1:      24783    IO-APIC-edge  i8042
>>   8:          0    IO-APIC-edge  rtc
>>   9:          0   IO-APIC-level  acpi
>>  15:     401558    IO-APIC-edge  ide1
>>  50:  249384881   IO-APIC-level  eth0
>>  58:  179123938   IO-APIC-level  sky2
>>  66:          3   IO-APIC-level  sky2, ohci1394
>>  74:   98956955   IO-APIC-level  sky2
>>  82:      19952   IO-APIC-level  sky2
>> 217:       1865   IO-APIC-level  libata, NVidia CK804
>> 225:     263052   IO-APIC-level  libata, ehci_hcd:usb1
>> NMI:      11098
>> LOC:   11214113
>> ERR:          0
>> MIS:          0
>>
>> Not only will the card not transmit anymore, it also doesn't receive any packet at all. ethtool -r eth3 doesn't change anything, taking the interface down and up again also doesn't help. The interrupt count of interrupt 74 stays constant after failing.
>> modprobe -r sky2; modprobe sky2
>> fixes the problem for me, so maybe resetting the card on TX timeouts will help.
>>
>> The same problem appeared much earlier for another card which shared interrupt 58 with an onboard card driven by skge. After disabling the skge driver and rebooting, that card has been stable so far.
>>
>> The card is connected to a 100 MBit switch.
>>
>> These problems didn't appear with sk98lin v8.14.3.3 (that driver did survive about 10 TB of traffic before I rebooted).
>>
>> Register dumps are available on request (too big for this list).
>>
>> I will now try sky2 0.13 and report back.
>
> And it hit the other interface after 200 MB transferred...
>
> NETDEV WATCHDOG: bridgeext0: transmit timed out
> sky2 bridgeext0: tx timeout
> NETDEV WATCHDOG: bridgeext0: transmit timed out
> sky2 transmit interrupt missed? recovered
>
> Although the driver claims to recover, it doesn't recover at all.
>
> What debug level would be advisable? It is now running with modprobe sky2 debug=2, but I can't see more than the messages above.
>
> I have now added a hard reset routine to the tx timeout path and hope it won't kill my machine.

Apologies for mangled whitespace, this is just a rough cut'n'paste.

--- linux-2.6.15/drivers/net/sky2.c.orig	2006-01-21 16:00:15.0 +0100
+++ linux-2.6.15/drivers/net/sky2.c	2006-01-21 14:08:28.0 +0100
@@ -1565,6 +1565,7 @@ static int sky2_autoneg_done(struct sky2
 	return 0;
 }
 
+static int sky2_reset(struct sky2_hw *hw);
 /*
  * Interrupt from PHY are handled outside of interrupt context
  * because accessing phy registers requires spin wait which might
@@ -1639,6 +1640,7 @@ static void sky2_tx_timeout(struct net_d
 	if (netif_msg_timer(sky2))
 		printk(KERN_ERR PFX "%s: tx timeout\n", dev->name);
 
+	if (0) {
 	sky2_write32(hw, Q_ADDR(txq, Q_CSR), BMU_STOP);
 	sky2_write32(hw, Y2_QADDR(txq, PREF_UNIT_CTRL), PREF_UNIT_RST_SET);
 
@@ -1646,6 +1648,12 @@ static void sky2_tx_timeout(struct net_d
 	sky2_qset(hw, txq);
 	sky2_prefetch_init(hw, txq, sky2->tx_le_map, TX_RING_SIZE - 1);
+	} else {
+	printk(KERN_ERR PFX "%s: recovering the HARD way...\n", dev->name);
+	sky2_down(dev);
+	sky2_reset(hw);
+	sky2_up(dev);
+	}
 }

And every time the kernel throws this message, I run the following script:

#!/bin/bash
deadinterface=`dmesg|grep HARD|tail -1|sed "s/.*sky2 //;s/:.*//"`
ip l s $deadinterface down
ip l s $deadinterface up

After that, everything continues to work until the next tx timeout happens, and then the script again saves the day.

More results about the circumstances of this bug: It seems that it will only trigger under LOW load. As long as I keep the interface busy, it will have no problems at all.

Regards,
Carl-Daniel
--
http://www.hailfinger.org/