Re: sky2 hangs
On Thu, 1 Feb 2007 19:55:32 +0100 Thomas Glanzmann [EMAIL PROTECTED] wrote:

> Hello,
> I have a sky2 network card in my Intel Mac mini. It stops working under
> heavy network load, such as watching a DivX over http/sshfs. However, if I
> remove the driver module and load it again, it works and even the TCP
> connection doesn't get shut down. I automated the above procedure using a
> userland watchdog that I wrote myself, because the traditional watchdog
> wasn't that reliable and produced a lot of false positives:
>
> * Every ten seconds, check whether my default router is pingable (3 pings,
>   one has to come back). If it isn't, call the fix_network script. (The
>   script is called only once after a ping gets lost; to run it again, at
>   least one ping has to arrive first.)
>
>   (mini) [~] cat /usr/local/sbin/fix_network
>   #!/bin/bash
>   export PATH=/bin:/usr/bin:/usr/sbin:/sbin
>   rmmod sky2
>   modprobe sky2
>   ifdown eth0
>   ifup eth0
>
>   If after that no ping is received from the default router for another 90
>   seconds, I tell init to reboot and stop feeding the kernel software
>   watchdog.
>
> * My watchdog also checks whether the sshd process is running. If it is
>   down for more than 100 seconds it reboots the machine, too.
>
> Jan 27 22:35:35 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Jan 27 22:35:35 mini watchdog-tg[4146]: Running fix_network script.
> Jan 27 22:38:46 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Jan 27 22:38:46 mini watchdog-tg[4146]: Running fix_network script.
> Jan 27 22:44:17 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Jan 27 22:44:17 mini watchdog-tg[4146]: Running fix_network script.
> Jan 29 12:00:13 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Jan 29 12:00:13 mini watchdog-tg[4146]: Running fix_network script.
> Jan 29 19:18:59 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Jan 29 19:18:59 mini watchdog-tg[4146]: Running fix_network script.
> Jan 31 15:56:29 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Jan 31 15:56:29 mini watchdog-tg[4146]: Running fix_network script.
> Feb 1 08:56:57 mini watchdog-tg[4146]: No PONG received from 192.168.0.3 (failure 1 of 10)
> Feb 1 08:56:57 mini watchdog-tg[4146]: Running fix_network script.
>
> I have a question about this: I wonder why the Linux kernel (no longer?)
> increments the use counter of an Ethernet driver (I saw it on sky2 and
> e1000) when the interface is up, running and configured. I can unload the
> sky2 driver without doing an 'ifconfig eth0 down' beforehand. Could someone
> provide me with some background on this?

It was intentional in 2.6 to allow interfaces to be hot-removed. Remember
that with Internet protocols there is (normally) no hard binding between
address and device, and connections should not go down if the link fails.

> With that everything works. If someone is interested in my userland
> watchdog, just send me an e-mail.

Hopefully, it won't be necessary for long.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
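The watchdog policy described in the message above (run the fix script once per outage, reboot if the router stays unreachable for another 90 seconds) can be sketched as a small state machine. This is only an illustration of that description, not the author's watchdog: the names `wd_state`, `wd_step` and the action enum are hypothetical, and the "failure 1 of 10" counter seen in the log is not modelled.

```c
/* Hypothetical sketch of the watchdog policy; all names are invented. */
enum wd_action { WD_NONE, WD_RUN_FIX, WD_REBOOT };

struct wd_state {
    int link_down;   /* nonzero while the router is considered unreachable */
    int down_since;  /* time in seconds at which the fix script was run */
};

/* Called once per probe round (every ten seconds in the description).
 * ping_ok is nonzero if at least one of the pings came back. */
enum wd_action wd_step(struct wd_state *st, int now, int ping_ok)
{
    if (ping_ok) {
        /* A ping arrived: re-arm, so the fix script may run again later. */
        st->link_down = 0;
        return WD_NONE;
    }
    if (!st->link_down) {
        /* First lost round of this outage: run the fix script once. */
        st->link_down = 1;
        st->down_since = now;
        return WD_RUN_FIX;
    }
    /* Still down after the fix: reboot once 90 seconds have passed. */
    if (now - st->down_since >= 90)
        return WD_REBOOT;
    return WD_NONE;
}
```

A caller would map `WD_RUN_FIX` to running the fix_network script and `WD_REBOOT` to asking init to reboot.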
Re: sky2 hangs
On Thu, 1 Feb 2007 19:55:32 +0100 Thomas Glanzmann [EMAIL PROTECTED] wrote:

> Hello,
> I have a sky2 network card in my Intel Mac mini. It stops working under
> heavy network load, such as watching a DivX over http/sshfs.

Is this heavy Tx load (i.e. you are watching a movie served from the Mac
mini) or heavy Rx load (you are watching the movie on the Mac mini)?
Re: sky2 hangs
I can reproduce the problem now (on a Mac mini). Interestingly, it seems to
whack the whole Ethernet switch when it happens.

> - a previously suggested fix - passing idle=poll to the kernel - did not
>   work for me in the end

It is not an MSI or IRQ problem. It is a PHY problem (see below).

> - the lockups I see happen very periodically (somewhere around every 22-28
>   hours), as if the chip would die after a given amount of data
>   transferred; I know this looks stupid but I thought I might mention it
> - I have about 1 Mbit/s of (incoming) traffic on this interface, with
>   short, very high peaks, as there is a MySQL server on the other end,
>   receiving about 100 queries per second
> - unloading the sky2 module totally freezes the computer for me

If you do:
    ethtool -r eth0
it causes a PHY reset (renegotiation) and clears the problem.

-- 
Stephen Hemminger [EMAIL PROTECTED]
[PATCH 1/4] skge: handle zero address at open
Some motherboards are broken and have no address set. Failing at probe
time prevents the device from ever being used (for example, to download a
fixed BIOS). Instead, warn at probe time and check again when the device is
brought up. That way the address can be set.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- skge.orig/drivers/net/skge.c
+++ skge/drivers/net/skge.c
@@ -2373,6 +2373,9 @@ static int skge_up(struct net_device *de
 	size_t rx_size, tx_size;
 	int err;
 
+	if (!is_valid_ether_addr(dev->dev_addr))
+		return -EINVAL;
+
 	if (netif_msg_ifup(skge))
 		printk(KERN_INFO PFX "%s: enabling interface\n", dev->name);
 
@@ -3567,11 +3570,10 @@ static int __devinit skge_probe(struct p
 	if (!dev)
 		goto err_out_led_off;
 
+	/* Some motherboards are broken and has zero in ROM. */
 	if (!is_valid_ether_addr(dev->dev_addr)) {
-		printk(KERN_ERR PFX "%s: bad (zero?) ethernet address in rom\n",
+		printk(KERN_WARNING PFX "%s: bad (zero?) ethernet address in rom\n",
 		       pci_name(pdev));
-		err = -EIO;
-		goto err_out_free_netdev;
 	}
 
 	err = register_netdev(dev);
-- 
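To make the check in this patch concrete, here is a user-space sketch of what an address validity test of this kind does: it rejects the all-zero address and any multicast address (which includes broadcast). The names `mac_is_valid` and `dev_open_check` are hypothetical stand-ins, not the kernel's functions.

```c
#include <errno.h>
#include <stdint.h>

/* Nonzero if all six octets are zero. */
int mac_is_zero(const uint8_t a[6])
{
    return !(a[0] | a[1] | a[2] | a[3] | a[4] | a[5]);
}

/* Nonzero if the group bit (lowest bit of the first octet) is set.
 * The broadcast address ff:ff:ff:ff:ff:ff is covered by this test. */
int mac_is_multicast(const uint8_t a[6])
{
    return a[0] & 0x01;
}

/* A usable station address is neither zero nor multicast. */
int mac_is_valid(const uint8_t a[6])
{
    return !mac_is_multicast(a) && !mac_is_zero(a);
}

/* Mirrors the idea of the patch: refuse to bring the interface up,
 * rather than refusing to probe, while the address is unusable. */
int dev_open_check(const uint8_t a[6])
{
    return mac_is_valid(a) ? 0 : -EINVAL;
}
```

With this split, a device with a zero ROM address still registers, and the open only succeeds once an administrator has set a valid address.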
[PATCH 0/4] skge: update
Several enhancements: WOL now works, use dev_printk macros and allow handling broken hardware better.
-- 
[PATCH 4/4] skge: version 1.10
Mark this as 1.10 because WOL now works.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- skge.orig/drivers/net/skge.c
+++ skge/drivers/net/skge.c
@@ -42,7 +42,7 @@
 #include "skge.h"
 
 #define DRV_NAME		"skge"
-#define DRV_VERSION		"1.9"
+#define DRV_VERSION		"1.10"
 #define PFX				DRV_NAME " "
 
 #define DEFAULT_TX_RING_SIZE	128
-- 
[PATCH 3/4] skge: WOL support
Add WOL support for Yukon chipsets in skge device.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 drivers/net/skge.c |  158 +
 drivers/net/skge.h |    2
 2 files changed, 125 insertions(+), 35 deletions(-)

--- skge.orig/drivers/net/skge.c
+++ skge/drivers/net/skge.c
@@ -132,18 +132,93 @@ static void skge_get_regs(struct net_dev
 }
 
 /* Wake on Lan only supported on Yukon chips with rev 1 or above */
-static int wol_supported(const struct skge_hw *hw)
+static u32 wol_supported(const struct skge_hw *hw)
 {
-	return !((hw->chip_id == CHIP_ID_GENESIS ||
-		  (hw->chip_id == CHIP_ID_YUKON && hw->chip_rev == 0)));
+	if (hw->chip_id == CHIP_ID_YUKON && hw->chip_rev != 0)
+		return WAKE_MAGIC | WAKE_PHY;
+	else
+		return 0;
+}
+
+static u32 pci_wake_enabled(struct pci_dev *dev)
+{
+	int pm = pci_find_capability(dev, PCI_CAP_ID_PM);
+	u16 value;
+
+	/* If device doesn't support PM Capabilities, but request is to disable
+	 * wake events, it's a nop; otherwise fail */
+	if (!pm)
+		return 0;
+
+	pci_read_config_word(dev, pm + PCI_PM_PMC, &value);
+
+	value &= PCI_PM_CAP_PME_MASK;
+	value >>= ffs(PCI_PM_CAP_PME_MASK) - 1;   /* First bit of mask */
+
+	return value != 0;
+}
+
+static void skge_wol_init(struct skge_port *skge)
+{
+	struct skge_hw *hw = skge->hw;
+	int port = skge->port;
+	enum pause_control save_mode;
+	u32 ctrl;
+
+	/* Bring hardware out of reset */
+	skge_write16(hw, B0_CTST, CS_RST_CLR);
+	skge_write16(hw, SK_REG(port, GMAC_LINK_CTRL), GMLC_RST_CLR);
+
+	skge_write8(hw, SK_REG(port, GPHY_CTRL), GPC_RST_CLR);
+	skge_write8(hw, SK_REG(port, GMAC_CTRL), GMC_RST_CLR);
+
+	/* Force to 10/100 skge_reset will re-enable on resume */
+	save_mode = skge->flow_control;
+	skge->flow_control = FLOW_MODE_SYMMETRIC;
+
+	ctrl = skge->advertising;
+	skge->advertising &= ~(ADVERTISED_1000baseT_Half|ADVERTISED_1000baseT_Full);
+
+	skge_phy_reset(skge);
+
+	skge->flow_control = save_mode;
+	skge->advertising = ctrl;
+
+	/* Set GMAC to no flow control and auto update for speed/duplex */
+	gma_write16(hw, port, GM_GP_CTRL,
+		    GM_GPCR_FC_TX_DIS|GM_GPCR_TX_ENA|GM_GPCR_RX_ENA|
+		    GM_GPCR_DUP_FULL|GM_GPCR_FC_RX_DIS|GM_GPCR_AU_FCT_DIS);
+
+	/* Set WOL address */
+	memcpy_toio(hw->regs + WOL_REGS(port, WOL_MAC_ADDR),
+		    skge->netdev->dev_addr, ETH_ALEN);
+
+	/* Turn on appropriate WOL control bits */
+	skge_write16(hw, WOL_REGS(port, WOL_CTRL_STAT), WOL_CTL_CLEAR_RESULT);
+	ctrl = 0;
+	if (skge->wol & WAKE_PHY)
+		ctrl |= WOL_CTL_ENA_PME_ON_LINK_CHG|WOL_CTL_ENA_LINK_CHG_UNIT;
+	else
+		ctrl |= WOL_CTL_DIS_PME_ON_LINK_CHG|WOL_CTL_DIS_LINK_CHG_UNIT;
+
+	if (skge->wol & WAKE_MAGIC)
+		ctrl |= WOL_CTL_ENA_PME_ON_MAGIC_PKT|WOL_CTL_ENA_MAGIC_PKT_UNIT;
+	else
+		ctrl |= WOL_CTL_DIS_PME_ON_MAGIC_PKT|WOL_CTL_DIS_MAGIC_PKT_UNIT;;
+
+	ctrl |= WOL_CTL_DIS_PME_ON_PATTERN|WOL_CTL_DIS_PATTERN_UNIT;
+	skge_write16(hw, WOL_REGS(port, WOL_CTRL_STAT), ctrl);
+
+	/* block receiver */
+	skge_write8(hw, SK_REG(port, RX_GMF_CTRL_T), GMF_RST_SET);
 }
 
 static void skge_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
 {
 	struct skge_port *skge = netdev_priv(dev);
 
-	wol->supported = wol_supported(skge->hw) ? WAKE_MAGIC : 0;
-	wol->wolopts = skge->wol ? WAKE_MAGIC : 0;
+	wol->supported = wol_supported(skge->hw);
+	wol->wolopts = skge->wol;
 }
 
 static int skge_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
@@ -151,23 +226,12 @@ static int skge_set_wol(struct net_devic
 	struct skge_port *skge = netdev_priv(dev);
 	struct skge_hw *hw = skge->hw;
 
-	if (wol->wolopts != WAKE_MAGIC && wol->wolopts != 0)
-		return -EOPNOTSUPP;
-
-	if (wol->wolopts == WAKE_MAGIC && !wol_supported(hw))
+	if (wol->wolopts & wol_supported(hw))
 		return -EOPNOTSUPP;
 
-	skge->wol = wol->wolopts == WAKE_MAGIC;
-
-	if (skge->wol) {
-		memcpy_toio(hw->regs + WOL_MAC_ADDR, dev->dev_addr, ETH_ALEN);
-
-		skge_write16(hw, WOL_CTRL_STAT,
-			     WOL_CTL_ENA_PME_ON_MAGIC_PKT |
-			     WOL_CTL_ENA_MAGIC_PKT_UNIT);
-	} else
-		skge_write16(hw, WOL_CTRL_STAT, WOL_CTL_DEFAULT);
-
+	skge->wol = wol->wolopts;
+	if (!netif_running(dev))
+		skge_wol_init(skge);
 	return 0;
 }
 
@@ -3456,6 +3520,7 @@ static struct net_device *skge_devinit(s
 	skge->duplex = -1;
 	skge->speed = -1;
 	skge->advertising = skge_supported_modes(hw);
+	skge->wol = pci_wake_enabled(hw->pdev) ? wol_supported(hw) : 0;
 
 	hw
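The mask-and-shift step in pci_wake_enabled() is compact, so here is a stand-alone sketch of the same arithmetic. It assumes the PME support field occupies bits 11 to 15 of the power management capabilities word (a mask of 0xF800, which is my recollection of the usual definition and should be checked against the headers); the function name `pme_support_bits` is hypothetical.

```c
#include <stdint.h>
#include <strings.h>   /* ffs() */

/* Assumed value of the PME support mask in the PM capabilities word. */
#define PME_MASK 0xF800

/* Isolate the PME support field and shift it down to bit 0.
 * ffs() returns the 1-based index of the lowest set bit of the mask,
 * so subtracting one gives the shift count (11 for 0xF800). */
unsigned pme_support_bits(uint16_t pmc)
{
    uint16_t value = pmc & PME_MASK;

    value >>= ffs(PME_MASK) - 1;
    return value;
}
```

A nonzero result means the device advertises PME generation from at least one power state, which is the condition the patch uses to enable WOL by default.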
[PATCH 2/4] skge: use dev_printk
Use dev_printk related macros for PCI related errors and warnings.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- skge.orig/drivers/net/skge.c
+++ skge/drivers/net/skge.c
@@ -2395,7 +2395,7 @@ static int skge_up(struct net_device *de
 	BUG_ON(skge->dma & 7);
 
 	if ((u64)skge->dma >> 32 != ((u64) skge->dma + skge->mem_size) >> 32) {
-		printk(KERN_ERR PFX "pci_alloc_consistent region crosses 4G boundary\n");
+		dev_err(&hw->pdev->dev, "pci_alloc_consistent region crosses 4G boundary\n");
 		err = -EINVAL;
 		goto free_pci_mem;
 	}
@@ -3004,6 +3004,7 @@ static void skge_mac_intr(struct skge_hw
 /* Handle device specific framing and timeout interrupts */
 static void skge_error_irq(struct skge_hw *hw)
 {
+	struct pci_dev *pdev = hw->pdev;
 	u32 hwstatus = skge_read32(hw, B0_HWE_ISRC);
 
 	if (hw->chip_id == CHIP_ID_GENESIS) {
@@ -3019,12 +3020,12 @@ static void skge_error_irq(struct skge_h
 	}
 
 	if (hwstatus & IS_RAM_RD_PAR) {
-		printk(KERN_ERR PFX "Ram read data parity error\n");
+		dev_err(&pdev->dev, "Ram read data parity error\n");
 		skge_write16(hw, B3_RI_CTRL, RI_CLR_RD_PERR);
 	}
 
 	if (hwstatus & IS_RAM_WR_PAR) {
-		printk(KERN_ERR PFX "Ram write data parity error\n");
+		dev_err(&pdev->dev, "Ram write data parity error\n");
 		skge_write16(hw, B3_RI_CTRL, RI_CLR_WR_PERR);
 	}
 
@@ -3035,38 +3036,38 @@ static void skge_error_irq(struct skge_h
 		skge_mac_parity(hw, 1);
 
 	if (hwstatus & IS_R1_PAR_ERR) {
-		printk(KERN_ERR PFX "%s: receive queue parity error\n",
-		       hw->dev[0]->name);
+		dev_err(&pdev->dev, "%s: receive queue parity error\n",
+			hw->dev[0]->name);
 		skge_write32(hw, B0_R1_CSR, CSR_IRQ_CL_P);
 	}
 
 	if (hwstatus & IS_R2_PAR_ERR) {
-		printk(KERN_ERR PFX "%s: receive queue parity error\n",
-		       hw->dev[1]->name);
+		dev_err(&pdev->dev, "%s: receive queue parity error\n",
+			hw->dev[1]->name);
 		skge_write32(hw, B0_R2_CSR, CSR_IRQ_CL_P);
 	}
 
 	if (hwstatus & (IS_IRQ_MST_ERR|IS_IRQ_STAT)) {
 		u16 pci_status, pci_cmd;
 
-		pci_read_config_word(hw->pdev, PCI_COMMAND, &pci_cmd);
-		pci_read_config_word(hw->pdev, PCI_STATUS, &pci_status);
+		pci_read_config_word(pdev, PCI_COMMAND, &pci_cmd);
+		pci_read_config_word(pdev, PCI_STATUS, &pci_status);
 
-		printk(KERN_ERR PFX "%s: PCI error cmd=%#x status=%#x\n",
-		       pci_name(hw->pdev), pci_cmd, pci_status);
+		dev_err(&pdev->dev, "PCI error cmd=%#x status=%#x\n",
+			pci_cmd, pci_status);
 
 		/* Write the error bits back to clear them. */
 		pci_status &= PCI_STATUS_ERROR_BITS;
 		skge_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_ON);
-		pci_write_config_word(hw->pdev, PCI_COMMAND,
+		pci_write_config_word(pdev, PCI_COMMAND,
 				      pci_cmd | PCI_COMMAND_SERR | PCI_COMMAND_PARITY);
-		pci_write_config_word(hw->pdev, PCI_STATUS, pci_status);
+		pci_write_config_word(pdev, PCI_STATUS, pci_status);
 		skge_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_OFF);
 
 		/* if error still set then just ignore it */
 		hwstatus = skge_read32(hw, B0_HWE_ISRC);
 		if (hwstatus & IS_IRQ_STAT) {
-			printk(KERN_INFO PFX "unable to clear error (so ignoring them)\n");
+			dev_warn(&hw->pdev->dev, "unable to clear error (so ignoring them)\n");
 			hw->intr_mask &= ~IS_HW_ERR;
 		}
 	}
@@ -3280,8 +3281,8 @@ static int skge_reset(struct skge_hw *hw
 			hw->phy_addr = PHY_ADDR_BCOM;
 			break;
 		default:
-			printk(KERN_ERR PFX "%s: unsupported phy type 0x%x\n",
-			       pci_name(hw->pdev), hw->phy_type);
+			dev_err(&hw->pdev->dev, "unsupported phy type 0x%x\n",
+				hw->phy_type);
 			return -EOPNOTSUPP;
 		}
 		break;
@@ -3296,8 +3297,8 @@ static int skge_reset(struct skge_hw *hw
 		break;
 
 	default:
-		printk(KERN_ERR PFX "%s: unsupported chip type 0x%x\n",
-		       pci_name(hw->pdev), hw->chip_id);
+		dev_err(&hw->pdev->dev, "unsupported chip type 0x%x\n",
+			hw->chip_id);
 		return -EOPNOTSUPP;
 	}
 
@@ -3337,7 +3338,7 @@ static int skge_reset(struct skge_hw *hw
 
 	/* avoid boards with stuck Hardware error bits */
 	if ((skge_read32(hw, B0_ISRC) & IS_HW_ERR
[RFT] sky2 auto negotiation PHY errata
This patch applies the Marvell errata workaround before auto negotiation
(taken from drivers/net/phy/marvell.c). The Yukon II chips have an internal
version of the same PHY, so perhaps this errata workaround is necessary for
them as well.

For test only, but it may fix some of the hangs. It seems to fix the PHY
lockups I saw yesterday on the Mac mini.

---
 drivers/net/sky2.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 822dd0b..4f04ffa 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -387,6 +387,14 @@ static void sky2_phy_init(struct sky2_hw
 
 	if (sky2->autoneg == AUTONEG_ENABLE) {
 		if (sky2_is_copper(hw)) {
+			/* Errata setup */
+			gm_phy_write(hw, port, PHY_MARV_PAGE_ADDR, 0x1f);
+			gm_phy_write(hw, port, PHY_MARV_PAGE_DATA, 0x200c);
+			gm_phy_write(hw, port, PHY_MARV_PAGE_ADDR, 5);
+			gm_phy_write(hw, port, PHY_MARV_PAGE_DATA, 0);
+			gm_phy_write(hw, port, PHY_MARV_PAGE_DATA, 0x100);
+
+
 			if (sky2->advertising & ADVERTISED_1000baseT_Full)
 				ct1000 |= PHY_M_1000C_AFD;
 			if (sky2->advertising & ADVERTISED_1000baseT_Half)
-- 
1.4.1
[PATCH] sky2: flow control off
Turn flow control off for sky2. When flow control is on, the transmitter
may get randomly stuck. Perhaps there is a hardware problem, but until
Marvell provides errata information for a workaround, it should default to
off.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 drivers/net/sky2.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 822dd0b..a31dea5 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -3263,7 +3263,7 @@ #endif
 
 	/* Auto speed and flow control */
 	sky2->autoneg = AUTONEG_ENABLE;
-	sky2->flow_mode = FC_BOTH;
+	sky2->flow_mode = FC_NONE;
 	sky2->duplex = -1;
 	sky2->speed = -1;
 
-- 
1.4.1
Re: [TCP]: Fix truesize underflow
On Tue, 18 Apr 2006 22:32:04 +1000 Herbert Xu [EMAIL PROTECTED] wrote:

> Hi Dave:
>
> You're absolutely right about there being a problem with the TSO packet
> trimming code. The cause of this lies in the tcp_fragment() function.
>
> When we allocate a fragment for a completely non-linear packet the truesize
> is calculated for a payload length of zero. This means that truesize could
> in fact be less than the real payload length.
>
> When that happens the TSO packet trimming can cause truesize to become
> negative. This in turn can cause sk_forward_alloc to be -n * PAGE_SIZE
> which would trigger the warning.
>
> I've copied the code you used in tso_fragment which should work here.
>
> Signed-off-by: Herbert Xu [EMAIL PROTECTED]
>
> Everyone who's having the sk_forward_alloc warning problem should give
> this patch a go to see if it cures things.
>
> Just in case this still doesn't fix it, could everyone please also verify
> whether disabling SMP has any effect on reproducing this?
>
> Thanks,

Please put this in the next -stable load...
Re: I/OAT: Call for discussion
On Wed, 19 Apr 2006 09:39:37 -0700 Grover, Andrew [EMAIL PROTECTED] wrote:

> Over the past few months, we (the Intel networking group) have been working
> hard, often off-list, to get the I/OAT patches we've posted here merged
> into the mainline kernel branch, as well as Red Hat and SuSE. We've had
> some success, but not what's really important: getting it into the mainline
> kernel releases.

Vendor kernel support has little or no bearing on eventual inclusion.

> Of course some of this can be blamed on how a corporate culture approaches
> the open source community when it thinks it has something that gives it a
> competitive advantage in the marketplace. If we acted like jerks, it's just
> because we think we have something good here! :) But seriously, I know
> we've had longer turnaround times in releases and replying to comments than
> people have liked. All we can say is sorry, we really have been doing our
> best. People were kind enough to review our patches and suggest over 50
> improvements, we have fixed the patches accordingly, and we really do
> appreciate it.
>
> So OK assume we have a nice pretty patchset. Why should it go in? Since we
> have an NDA with Red Hat we've been trying to convince DaveM and Red Hat of
> I/OAT's merits off-list, but this kind of change needs a more public airing
> of all its pros and cons.

Off-list lobbying usually has a negative impact.

> We have posted all the performance data we have gathered so far on the
> linux-net wiki: http://linux-net.osdl.org/index.php/I/OAT , and listed the
> overall concerns that have been expressed in private. I'm hoping you will
> look at the data, re-examine the patches, and then we can talk about the
> technical issues here on the list, getting down to the specifics, so we can
> hash it out in public and settle on the right path to take.

The biggest barrier at this point seems to be hardware availability.
People generally don't care unless they use or are going to get that
hardware. Also the big benchmark data, although interesting, is usually
only interesting to vendors.

You probably will have to suffer out of tree for a while until the hardware
becomes more available. When the hardware is more common, then the
implementation details will be sorted out.

Also, after the 2+ years of getting TSO to work right, maybe the developers
are a little gun-shy at this point.
Re: Netpoll checksum issue
The changes to how hardware receive checksums are handled broke the
netpoll checksum code (for CHECKSUM_HW). Since this is not at all
performance critical, try this patch. It changes the code to always use the
normal software checksum.

--- linux-2.6.orig/net/core/netpoll.c	2006-03-22 09:30:56.0 -0800
+++ linux-2.6/net/core/netpoll.c	2006-04-19 10:30:13.0 -0700
@@ -102,20 +102,11 @@
 static int checksum_udp(struct sk_buff *skb, struct udphdr *uh,
 			     unsigned short ulen, u32 saddr, u32 daddr)
 {
-	unsigned int psum;
-
 	if (uh->check == 0 || skb->ip_summed == CHECKSUM_UNNECESSARY)
 		return 0;
 
-	psum = csum_tcpudp_nofold(saddr, daddr, ulen, IPPROTO_UDP, 0);
-
-	if (skb->ip_summed == CHECKSUM_HW &&
-	    !(u16)csum_fold(csum_add(psum, skb->csum)))
-		return 0;
-
-	skb->csum = psum;
-
-	return __skb_checksum_complete(skb);
+	skb->csum = csum_tcpudp_nofold(saddr, daddr, ulen, IPPROTO_UDP, 0);
+	return (u16) csum_fold(skb_checksum(skb, 0, skb->len, skb->csum));
 }
 
 /*
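The software path this patch falls back to is the ordinary Internet checksum over the UDP pseudo-header plus the whole datagram: a valid packet folds to zero. The sketch below shows that computation in portable user-space C. It is not the kernel's csum_* implementation; the function names are hypothetical, and addresses are passed as host-order 32-bit values for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to the running sum as big-endian 16-bit words,
 * padding a trailing odd byte with zero. */
uint32_t csum_add_bytes(const uint8_t *data, size_t len, uint32_t sum)
{
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)data[0] << 8;
    return sum;
}

/* Fold the carries back in and return the one's complement. */
uint16_t csum_fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Checksum over pseudo-header (addresses, protocol 17, UDP length)
 * and the datagram itself. Returns 0 for a datagram whose checksum
 * field is correct; with the field zeroed, returns the value to store. */
uint16_t udp_csum(uint32_t saddr, uint32_t daddr,
                  const uint8_t *udp, uint16_t ulen)
{
    uint32_t sum = 0;

    sum += (saddr >> 16) + (saddr & 0xffff);
    sum += (daddr >> 16) + (daddr & 0xffff);
    sum += 17;      /* IPPROTO_UDP */
    sum += ulen;
    sum = csum_add_bytes(udp, ulen, sum);
    return csum_fold16(sum);
}
```

Because the same routine both generates and verifies, a receiver only needs to check that the result over the received bytes is zero, which is the shape of the one-line return in the patch.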
Re: TOE info page
On Wed, 19 Apr 2006 19:22:14 -0400 Jeff Garzik [EMAIL PROTECTED] wrote:

> I created a TOE (TCP Offload Engine) info page for Linux, on the linux-net
> wiki:
>
> http://linux-net.osdl.org/index.php/TOE
>
> As soon as I can find a wiki admin, it will get added to the main page. I
> don't seem to have such access.
>
> Jeff

I am the main administrator. I updated the front page, and added a couple
more stubs for NAPI, TSO, UFO.
[PATCH] bridge: allow full size vlan tagged packets to be bridged
The Ethernet bridge code silently drops packets when forwarding a packet
that is too large for the destination interface (as per 802.1d). But it
should allow for VLAN tagged frames.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- bridge.orig/net/bridge/br_forward.c	2006-04-10 16:17:51.0 -0700
+++ bridge/net/bridge/br_forward.c	2006-04-19 13:50:42.0 -0700
@@ -16,6 +16,7 @@
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>
+#include <linux/if_vlan.h>
 #include <linux/netfilter_bridge.h>
 #include "br_private.h"
 
@@ -29,10 +30,15 @@
 	return 1;
 }
 
+static inline unsigned packet_length(const struct sk_buff *skb)
+{
+	return skb->len - (skb->protocol == htons(ETH_P_8021Q) ? VLAN_HLEN : 0);
+}
+
 int br_dev_queue_push_xmit(struct sk_buff *skb)
 {
 	/* drop mtu oversized packets except tso */
-	if (skb->len > skb->dev->mtu && !skb_shinfo(skb)->tso_size)
+	if (packet_length(skb) > skb->dev->mtu && !skb_shinfo(skb)->tso_size)
 		kfree_skb(skb);
 	else {
 #ifdef CONFIG_BRIDGE_NETFILTER
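A small worked example may help show why the tag has to be discounted: a full-size frame carrying an 802.1Q tag is 4 bytes longer than the MTU, yet its payload still fits. The sketch below models only the length test, in user space; `struct frame` and `oversized` are hypothetical, and the protocol field is kept in host byte order to avoid needing htons().

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_P_8021Q 0x8100  /* 802.1Q VLAN tag protocol identifier */
#define VLAN_HLEN   4       /* bytes added by one VLAN tag */

/* Hypothetical stand-in for the two packet fields the test needs. */
struct frame {
    size_t   len;       /* length being compared against the MTU */
    uint16_t protocol;  /* host byte order in this sketch */
};

/* Length with the VLAN tag, if any, discounted. */
size_t packet_length(const struct frame *f)
{
    return f->len - (f->protocol == ETH_P_8021Q ? VLAN_HLEN : 0);
}

/* Nonzero if the frame should be dropped as too large. Segmentation
 * offload packets are exempt, as in the patched condition. */
int oversized(const struct frame *f, size_t mtu, int is_tso)
{
    return packet_length(f) > mtu && !is_tso;
}
```

With an MTU of 1500, a tagged frame of length 1504 passes, while an untagged frame of the same length is still dropped.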
Re: Congestion Avoidance Monitoring Tools
On Thu, 20 Apr 2006 22:26:14 -0700 Piet Delaney [EMAIL PROTECTED] wrote:

> I'm upgrading our 2.6.12 kernel to 2.6.13, which includes significant
> congestion avoidance code additions and changes. I was wondering if there
> are any tools folks can recommend for testing the kernel to make sure the
> congestion avoidance code is operating correctly. For example, displaying
> the congestion window as a function of time while undergoing convergence.
>
> For causing congestion I could modify a kernel to discard packets once in a
> while on a lab gateway and hit it with iperf. HP's netperf looks
> interesting.
>
> Any suggestions?
>
> -piet

2.6.13 still had lots of problems; things didn't really get working right
until 2.6.15 or later, especially with TSO.

I have a tool that uses kprobes, see
http://developer.osdl.org/shemminger/prototypes/tcpprobe.tar.gz

I try to keep it up to date with the current kernel and build process; I
last used it on 2.6.16.
Hotplug race on name change
This:

> Without that patch, there is a race when registering network interfaces and
> renaming them with udev rules, because initially the address in sysfs
> doesn't contain useful data. See
> http://marc.theaimsgroup.com/?t=11446033892r=1w=2
>
> Breaking the recommended way of assigning persistent network interface
> names is, IMHO, a bug serious enough to be fixed in -stable.
>
> Signed-off-by: Alexander E. Patrakov [EMAIL PROTECTED]
>
> ---
> --- linux-2.6.16.5/net/core/dev.c
> +++ linux-2.6.16.5/net/core/dev.c
> @@ -2932,11 +2932,11 @@
>  		switch(dev->reg_state) {
>  		case NETREG_REGISTERING:
> +			dev->reg_state = NETREG_REGISTERED;
>  			err = netdev_register_sysfs(dev);
>  			if (err)
>  				printk(KERN_ERR "%s: failed sysfs registration (%d)\n",
>  				       dev->name, err);
> -			dev->reg_state = NETREG_REGISTERED;
>  			break;
>
>  		case NETREG_UNREGISTERING:

introduces new races in netdev_register_sysfs if the name changes, because
netdev_register_sysfs runs without RTNL at this point. So if some
application gets in and changes the device name while netdev_register_sysfs
is running, then the class_dev->class_id would end up not matching the
netdevice->name. This is not a big issue, since hotplug doesn't get run
until the device is registered.

Ideally, it would be possible to create the groups in the class device
before it is registered. That won't work with the existing class device
interface. I am working on a patch to extend class_device to allow the
creation of groups to be atomic (like the attributes are).
Fw: [Bug 6421] New: kernel 2.6.10-2.6.16 on alpha: arch/alpha/kernel/io.c, iowrite16_rep() BUG_ON((unsigned long)src & 0x1) triggered
Looks like PIO at unaligned addresses doesn't work on alpha...

Begin forwarded message:

Date: Fri, 21 Apr 2006 02:35:45 -0700
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: [Bug 6421] New: kernel 2.6.10-2.6.16 on alpha: arch/alpha/kernel/io.c, iowrite16_rep() BUG_ON((unsigned long)src & 0x1) triggered

http://bugzilla.kernel.org/show_bug.cgi?id=6421

           Summary: kernel 2.6.10-2.6.16 on alpha: arch/alpha/kernel/io.c,
                    iowrite16_rep() BUG_ON((unsigned long)src & 0x1) triggered
    Kernel Version: 2.6.16
            Status: NEW
          Severity: blocking
             Owner: [EMAIL PROTECTED]
         Submitter: [EMAIL PROTECTED]

Most recent kernel where this bug did not occur: 2.6.10 (I think so)
Distribution: self build (RH9, FC4/5 on alpha mix)
Hardware Environment: alpha sx164pc board with ne2000 ISA card.
Software Environment:
Problem Description:

Since 2.6.10 the BUG_ON() in arch/alpha/kernel/io.c iowrite16_rep() is
triggered randomly, but reproducibly, on the Linux alpha platform with an
ne2000/8390 ISA network card.

ne.c:v1.10 9/23/94 Donald Becker ([EMAIL PROTECTED])
Last modified Nov 1, 2000 by Paul Gortmaker
NE*000 ethercard probe at 0x320: 00 80 29 63 4a f6
eth%d: NE2000 found at 0x320, using IRQ 5.

I've traced this back to drivers/net/8390.c: ei_start_xmit(). I added a
workaround in ei_start_xmit(), because otherwise the system is no longer
usable, like this:

	if ( (unsigned long) skb->data & 0x1) {
		printk(KERN_WARNING "ei_start_xmit(): skb->data unaligned %p align to %p length %i\n",
		       skb->data, scratch, send_length);
		if ( send_length >= 128 )
			goto normal;
		memset( scratch, 0, 128);
		memcpy( scratch, skb->data, send_length < 128 ? send_length : 128);
		ei_block_output(dev, send_length, scratch, output_page);
	} else {
normal:
		ei_block_output(dev, send_length, skb->data, output_page);
	}

And the output in the kernel ring buffer is:

dmesg | grep xmit
ei_start_xmit(): skb->data unaligned fc0019be55d5 align to fc001ef37620 length 60
ei_start_xmit(): skb->data unaligned fc0019be55d5 align to fc001ef37620 length 60
ei_start_xmit(): skb->data unaligned fc0008ceb735 align to fc00019dbb18 length 60
ei_start_xmit(): skb->data unaligned fc000ea787c5 align to fc001f683540 length 60
ei_start_xmit(): skb->data unaligned fc000e864fe9 align to fc000b737350 length 60
ei_start_xmit(): skb->data unaligned fc0008cd883f align to fc000b737350 length 60

So why does the skb->data address become unaligned over time while the
system is running?

Steps to reproduce:
Every time. I think with higher system load the BUG is triggered earlier.

Here are some stack traces, with BUG_ON/WARN_ON added by me in more places
to trace down the problem:

Kernel bug at net/ipv4/ip_output.c:297
cc1(2841): Kernel Bug 1
pc = [fc62a92c] ra = [fc6407c8] ps = Not tainted
pc is at ip_queue_xmit+0x59c/0x690
ra is at tcp_transmit_skb+0x588/0xbb0
v0 = fbc4 t0 = 0cfa4195 t1 = fc785d18
t2 = fc00014e26e0 t3 = 0030 t4 = 0030
t5 = fc000cfa41a9 t6 = 0002 t7 = fc00062d8000
a0 = fc001c14ef40 a1 = a2 = 1400
a3 = 0600 a4 = a5 =
t8 = t9 = fc000cfa41a9 t10= 0001
t11= 0200 pv = fc62a390 at =
gp = fc7f1600 sp = fc00062db6e0
Trace:
[fc6407c8] tcp_transmit_skb+0x588/0xbb0
[fc6469ac] tcp_v4_send_check+0x11c/0x150
[fc6406cc] tcp_transmit_skb+0x48c/0xbb0
[fc641c48] tcp_retransmit_skb+0x128/0x7d0
[fc6436ec] tcp_xmit_retransmit_queue+0x1bc/0x3c0
[fc5f5618] sk_reset_timer+0x28/0x60
[fc63b8a0] tcp_ack+0x16e0/0x1e80
[fc63ec58] tcp_rcv_state_process+0x6c8/0x13f0
[fc649078] tcp_v4_do_rcv+0x128/0x480
[fc64a068] tcp_v4_rcv+0xc98/0xcb0
[fc62556c] ip_local_deliver+0x1ac/0x400
[fc625110] ip_rcv+0x480/0x730
[fc601794] netif_receive_skb+0x174/0x300
[fc6019ec] process_backlog+0xcc/0x1b0
[fc600394] net_rx_action+0xb4/0x1a0
[fc330cb0] __do_softirq+0x90/0x130
[fc3162e8] handle_IRQ_event+0x48/0x110
[fc330db4] do_softirq+0x64/0x70
[fc316c14] handle_irq+0x124/0x1b0
[fc31ff28] isa_device_interrupt+0x28/0x40
[fc31fe38] pyxis_device_interrupt+0x68/0x130
[fc317328] do_entInt+0x118/0x190
[fc3112d0] ret_from_sys_call+0x0/0x10
[fc393b90] __link_path_walk+0x900/0x10a0
[fc4ff080] memcmp+0x0/0x70
[fc393b90] __link_path_walk+0x900/0x10a0
[fc3943c0] link_path_walk+0x90/0x1b0
[fc4ff080] memcmp+0x0/0x70
[fc393b90] __link_path_walk+0x900/0x10a0
[fc3943c0] link_path_walk+0x90/0x1b0
[fc3956d8]
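The alignment test behind the BUG_ON, and the bounce-buffer idea used in the reporter's workaround, can be shown in isolation. This is a user-space sketch under my own naming (`is_aligned16` and `pio16_source` are hypothetical); it is not the driver code and only illustrates the technique of copying an odd-addressed buffer into an aligned scratch area before a 16-bit transfer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if p can be accessed as a sequence of 16-bit words,
 * i.e. the lowest address bit is clear. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 0x1) == 0;
}

/* Return a pointer that is safe to hand to a 16-bit transfer routine:
 * src itself when it is already aligned, otherwise a copy in the
 * caller-supplied aligned bounce buffer. Returns NULL if the data
 * does not fit into the bounce buffer. */
const void *pio16_source(const void *src, size_t len,
                         void *bounce, size_t bounce_len)
{
    if (is_aligned16(src))
        return src;
    if (len > bounce_len)
        return NULL;
    memcpy(bounce, src, len);
    return bounce;
}
```

Unlike the workaround quoted above, this version reports oversized buffers to the caller instead of falling through to the unaligned path.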
[PATCH 1/2] class device: add attribute_group creation
Extend the support of attribute groups in class_devices to allow groups to be created as part of the registration process. This allows network devices to avoid a race between registration and creating groups.

Note that unlike attributes that are a property of the class object, the groups are a property of the class_device object. This is done because there are different types of network devices (wireless for example).

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2-2.6.17.orig/drivers/base/class.c 2006-04-21 12:19:26.0 -0700
+++ sky2-2.6.17/drivers/base/class.c 2006-04-21 12:21:21.0 -0700
@@ -456,6 +456,35 @@
 	}
 }

+static int class_device_add_groups(struct class_device * cd)
+{
+	int i;
+	int error = 0;
+
+	if (cd->groups) {
+		for (i = 0; cd->groups[i]; i++) {
+			error = sysfs_create_group(&cd->kobj, cd->groups[i]);
+			if (error) {
+				while (--i >= 0)
+					sysfs_remove_group(&cd->kobj, cd->groups[i]);
+				goto out;
+			}
+		}
+	}
+out:
+	return error;
+}
+
+static void class_device_remove_groups(struct class_device * cd)
+{
+	int i;
+	if (cd->groups) {
+		for (i = 0; cd->groups[i]; i++) {
+			sysfs_remove_group(&cd->kobj, cd->groups[i]);
+		}
+	}
+}
+
 static ssize_t show_dev(struct class_device *class_dev, char *buf)
 {
 	return print_dev_t(buf, class_dev->devt);
@@ -559,6 +588,8 @@
 			class_name);
 	}

+	class_device_add_groups(class_dev);
+
 	kobject_uevent(&class_dev->kobj, KOBJ_ADD);

 	/* notify any interfaces this device is now here */
@@ -672,6 +703,7 @@
 	if (class_dev->devt_attr)
 		class_device_remove_file(class_dev, class_dev->devt_attr);
 	class_device_remove_attrs(class_dev);
+	class_device_remove_groups(class_dev);

 	kobject_uevent(&class_dev->kobj, KOBJ_REMOVE);
 	kobject_del(&class_dev->kobj);
--- sky2-2.6.17.orig/include/linux/device.h 2006-04-21 12:19:26.0 -0700
+++ sky2-2.6.17/include/linux/device.h 2006-04-21 12:19:36.0 -0700
@@ -200,6 +200,7 @@
  * @node: for internal use by the driver core only.
  * @kobj: for internal use by the driver core only.
  * @devt_attr: for internal use by the driver core only.
+ * @groups: optional additional groups to be created
  * @dev: if set, a symlink to the struct device is created in the sysfs
  * directory for this struct class device.
  * @class_data: pointer to whatever you want to store here for this struct
@@ -228,6 +229,7 @@
 	struct device		* dev;		/* not necessary, but nice to have */
 	void			* class_data;	/* class-specific data */
 	struct class_device	*parent;	/* parent of this child device, if there is one */
+	struct attribute_group  ** groups;	/* optional groups */

 	void	(*release)(struct class_device *dev);
 	int	(*uevent)(struct class_device *dev, char **envp,
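The unwind-on-failure loop in class_device_add_groups() can be tried out in plain userspace C. What follows is only a minimal sketch, not kernel code: struct demo_group, demo_create(), demo_remove() and demo_add_groups() are hypothetical stand-ins for struct attribute_group, sysfs_create_group(), sysfs_remove_group() and the new helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct attribute_group; not the kernel type. */
struct demo_group {
	const char *name;
	int fail;	/* nonzero: pretend creation fails */
	int created;	/* nonzero while the group "exists" */
};

/* Hypothetical stand-in for sysfs_create_group(). */
static int demo_create(struct demo_group *g)
{
	assert(g != NULL);
	if (g->fail)
		return -1;
	g->created = 1;
	return 0;
}

/* Hypothetical stand-in for sysfs_remove_group(). */
static void demo_remove(struct demo_group *g)
{
	assert(g != NULL);
	g->created = 0;
}

/* Create every group in a NULL-terminated array; undo on first failure. */
static int demo_add_groups(struct demo_group **groups)
{
	int i, error = 0;

	if (!groups)
		return 0;

	for (i = 0; groups[i]; i++) {
		error = demo_create(groups[i]);
		if (error) {
			/* roll back only the groups created so far */
			while (--i >= 0)
				demo_remove(groups[i]);
			break;
		}
	}
	return error;
}
```

If the third of three groups fails, the first two are removed again and the error is returned, so the caller never sees a half-populated set.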
[PATCH 2/2] netdev: create attribute_groups with class_device_add
Atomically create attributes when class device is added. This avoids the race between registering class_device (which generates hotplug event), and the creation of attribute groups.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2-2.6.17.orig/net/core/dev.c 2006-04-21 12:20:58.0 -0700
+++ sky2-2.6.17/net/core/dev.c 2006-04-21 12:21:45.0 -0700
@@ -3043,11 +3043,11 @@

 		switch(dev->reg_state) {
 		case NETREG_REGISTERING:
-			dev->reg_state = NETREG_REGISTERED;
 			err = netdev_register_sysfs(dev);
 			if (err)
 				printk(KERN_ERR "%s: failed sysfs registration (%d)\n",
 				       dev->name, err);
+			dev->reg_state = NETREG_REGISTERED;
 			break;

 		case NETREG_UNREGISTERING:
--- sky2-2.6.17.orig/net/core/net-sysfs.c 2006-04-21 12:20:58.0 -0700
+++ sky2-2.6.17/net/core/net-sysfs.c 2006-04-21 12:21:45.0 -0700
@@ -29,7 +29,7 @@

 static inline int dev_isalive(const struct net_device *dev)
 {
-	return dev->reg_state == NETREG_REGISTERED;
+	return dev->reg_state <= NETREG_REGISTERED;
 }

 /* use same locking rules as GIF* ioctl's */
@@ -445,58 +445,33 @@

 void netdev_unregister_sysfs(struct net_device * net)
 {
-	struct class_device * class_dev = &(net->class_dev);
-
-	if (net->get_stats)
-		sysfs_remove_group(&class_dev->kobj, &netstat_group);
-
-#ifdef WIRELESS_EXT
-	if (net->get_wireless_stats || (net->wireless_handlers &&
-			net->wireless_handlers->get_wireless_stats))
-		sysfs_remove_group(&class_dev->kobj, &wireless_group);
-#endif
-	class_device_del(class_dev);
-
+	class_device_del(&(net->class_dev));
 }

 /* Create sysfs entries for network device. */
 int netdev_register_sysfs(struct net_device *net)
 {
 	struct class_device *class_dev = &(net->class_dev);
-	int ret;
+	struct attribute_group **groups = net->sysfs_groups;

+	class_device_initialize(class_dev);
 	class_dev->class = &net_class;
 	class_dev->class_data = net;
+	class_dev->groups = groups;

+	BUILD_BUG_ON(BUS_ID_SIZE < IFNAMSIZ);
 	strlcpy(class_dev->class_id, net->name, BUS_ID_SIZE);
-	if ((ret = class_device_register(class_dev)))
-		goto out;

-	if (net->get_stats &&
-	    (ret = sysfs_create_group(&class_dev->kobj, &netstat_group)))
-		goto out_unreg;
+	if (net->get_stats)
+		*groups++ = &netstat_group;

 #ifdef WIRELESS_EXT
-	if (net->get_wireless_stats || (net->wireless_handlers &&
-			net->wireless_handlers->get_wireless_stats)) {
-		ret = sysfs_create_group(&class_dev->kobj, &wireless_group);
-		if (ret)
-			goto out_cleanup;
-	}
-	return 0;
-out_cleanup:
-	if (net->get_stats)
-		sysfs_remove_group(&class_dev->kobj, &netstat_group);
-#else
-	return 0;
+	if (net->get_wireless_stats
+	    || (net->wireless_handlers && net->wireless_handlers->get_wireless_stats))
+		*groups++ = &wireless_group;
 #endif

-out_unreg:
-	printk(KERN_WARNING "%s: sysfs attribute registration failed %d\n",
-	       net->name, ret);
-	class_device_unregister(class_dev);
-out:
-	return ret;
+	return class_device_add(class_dev);
 }

 int netdev_sysfs_init(void)
--- sky2-2.6.17.orig/include/linux/netdevice.h 2006-04-21 12:20:58.0 -0700
+++ sky2-2.6.17/include/linux/netdevice.h 2006-04-21 12:21:45.0 -0700
@@ -506,6 +506,8 @@

 	/* class/net/name entry */
 	struct class_device	class_dev;
+	/* space for optional statistics and wireless sysfs groups */
+	struct attribute_group  *sysfs_groups[3];
 };

 #define	NETDEV_ALIGN		32
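The three-slot sysfs_groups array in this patch holds at most two optional groups plus a NULL terminator. Here is a userspace sketch of that sizing, under stated assumptions: struct demo_dev and demo_fill_groups() are hypothetical, strings stand in for the attribute_group pointers, and the terminator is written explicitly, whereas the patch appears to rely on the net_device being zero-initialised.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_GROUPS 2	/* hypothetical: statistics + wireless */

struct demo_dev {
	int has_stats;
	int has_wireless;
	/* one extra slot so the list can always be NULL-terminated */
	const char *groups[DEMO_MAX_GROUPS + 1];
};

/* Fill dev->groups with the optional entries; return how many were added. */
static size_t demo_fill_groups(struct demo_dev *dev)
{
	const char **g = dev->groups;

	if (dev->has_stats)
		*g++ = "statistics";
	if (dev->has_wireless)
		*g++ = "wireless";
	*g = NULL;	/* explicit terminator (assumption: not in the patch) */

	return (size_t)(g - dev->groups);
}
```

A consumer that walks the array until it hits NULL, as class_device_add_groups() does, then never needs to know which optional groups were present.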
[RFC] netdev sysfs failure handling
In case of sysfs failure, don't let device be brought up. It can be cleared by unregister_netdevice so module can be unloaded normally.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2-2.6.17.orig/net/core/dev.c 2006-04-21 12:21:45.0 -0700
+++ sky2-2.6.17/net/core/dev.c 2006-04-21 12:46:48.0 -0700
@@ -3043,10 +3043,17 @@

 		switch(dev->reg_state) {
 		case NETREG_REGISTERING:
+			/* Can't do proper error handling here because
+			 * this is a delayed call after register_netdevice
+			 * so no way to tell device driver what is wrong.
+			 */
 			err = netdev_register_sysfs(dev);
-			if (err)
+			if (err) {
 				printk(KERN_ERR "%s: failed sysfs registration (%d)\n",
 				       dev->name, err);
+				/* Don't let device be brought up */
+				clear_bit(__LINK_STATE_PRESENT, &dev->state);
+			}
 			dev->reg_state = NETREG_REGISTERED;
 			break;
Re: [PATCH 9/10] d80211: rename master interface
On Fri, 21 Apr 2006 22:53:29 +0200 (CEST) Jiri Benc [EMAIL PROTECTED] wrote:

> Rename master interface to wmasterX to better reflect its purpose.
>
> Signed-off-by: Jiri Benc [EMAIL PROTECTED]
>
> ---
>
>  net/d80211/ieee80211.c   |    2 +-
>  net/d80211/ieee80211_i.h |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> 784f203467e4421aa0ecac34cb1647f4bdfe51be
> diff --git a/net/d80211/ieee80211.c b/net/d80211/ieee80211.c
> index 31f979c..1fd13dd 100644
> --- a/net/d80211/ieee80211.c
> +++ b/net/d80211/ieee80211.c
> @@ -4144,7 +4144,7 @@ struct net_device *ieee80211_alloc_hw(si
>  		((char *) local + ((sizeof(struct ieee80211_local) + 3) & ~3));
>
>  	ether_setup(mdev);
> -	memcpy(mdev->name, "wlan%d", 7);
> +	memcpy(mdev->name, "wmaster%d", 10);

Why not use strlcpy or strncpy? and use sizeof(mdev->name) or IFNAMSIZ rather than hard coded 10.

>
>  	local->dev_index = -1;
>  	local->mdev = mdev;
> diff --git a/net/d80211/ieee80211_i.h b/net/d80211/ieee80211_i.h
> index ea1d9ab..3580d1e 100644
> --- a/net/d80211/ieee80211_i.h
> +++ b/net/d80211/ieee80211_i.h
> @@ -318,7 +318,7 @@ #define IEEE80211_SUB_IF_TO_DEV(sub_if)
>  struct ieee80211_local {
>  	struct ieee80211_hw *hw;
>  	void *hw_priv;
> -	struct net_device *mdev; /* wlan# - master 802.11 device */
> +	struct net_device *mdev; /* wmaster# - master 802.11 device */
>  	int open_count;
>  	int monitors;
>  	struct ieee80211_conf conf;
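The review comment above asks for a copy bounded by the destination size instead of a hand-counted 10. A minimal userspace sketch of that idea follows. It is an illustration only: struct demo_netdev and demo_set_name() are hypothetical, DEMO_IFNAMSIZ mirrors the kernel's IFNAMSIZ of 16, and unlike strlcpy() this helper rejects an oversized name rather than truncating it.

```c
#include <assert.h>
#include <string.h>

#define DEMO_IFNAMSIZ 16	/* same value as the kernel's IFNAMSIZ */

struct demo_netdev {
	char name[DEMO_IFNAMSIZ];
};

/* Copy a name template; returns 0 on success, -1 if it would not fit. */
static int demo_set_name(struct demo_netdev *dev, const char *tmpl)
{
	size_t len = strlen(tmpl);

	/* bound comes from the destination, not from a hard-coded count */
	if (len >= sizeof(dev->name))
		return -1;
	memcpy(dev->name, tmpl, len + 1);	/* includes the trailing NUL */
	return 0;
}
```

"wmaster%d" is nine characters plus the NUL, which is where the 10 in the patch comes from; deriving the bound from sizeof means a later rename cannot silently get the count wrong.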
Re: [patch 2/5] s2io driver updates
On Sat, 22 Apr 2006 11:28:02 +0200 Francois Romieu [EMAIL PROTECTED] wrote:

> Ananda Raju [EMAIL PROTECTED] :
> [...]
> > Signed-off-by: Ananda Raju [EMAIL PROTECTED]
> > ---
> > diff -upNr perf_fixes/drivers/net/s2io.c dmesg_param_fixes/drivers/net/s2io.c
> > --- perf_fixes/drivers/net/s2io.c 2006-04-13 08:02:56.0 -0700
> > +++ dmesg_param_fixes/drivers/net/s2io.c 2006-04-13 09:08:22.0 -0700
> [...]
> > @@ -4626,6 +4633,45 @@ static int write_eeprom(nic_t * sp, int
> >  	return ret;
> >  }
> >
> > +static void s2io_vpd_read(nic_t *nic)
> > +{
> > +	u8 vpd_data[256],data;
>
> You may consider removing vpd_data from the stack and kmallocing it.

Since the lsvpd tool already does this in user space, why add more kernel code to do it? Adding more code just to print prettier console logs is bogus.
Re: [PATCH 02/11] ixgb: Fix the use of dprintk rather than printk
On Sat, 22 Apr 2006 11:03:01 +0200 Francois Romieu [EMAIL PROTECTED] wrote:

> Jeff Kirsher [EMAIL PROTECTED] :
> [...]
> > diff --git a/drivers/net/ixgb/ixgb.h b/drivers/net/ixgb/ixgb.h
> > index c83271b..a696c33 100644
> > --- a/drivers/net/ixgb/ixgb.h
> > +++ b/drivers/net/ixgb/ixgb.h
> [...]
> > @@ -192,6 +197,7 @@ struct ixgb_adapter {
> >  	/* structs defined in ixgb_hw.h */
> >  	struct ixgb_hw hw;
> > +	u16 msg_enable;
> >  	struct ixgb_hw_stats stats;
> >  #ifdef CONFIG_PCI_MSI
> >  	boolean_t have_msi;
> > diff --git a/drivers/net/ixgb/ixgb_ethtool.c b/drivers/net/ixgb/ixgb_ethtool.c
> > index d38ade5..e8d83de 100644
> > --- a/drivers/net/ixgb/ixgb_ethtool.c
> > +++ b/drivers/net/ixgb/ixgb_ethtool.c
> > @@ -251,6 +251,20 @@ ixgb_set_tso(struct net_device *netdev,
> >  }
> >  #endif /* NETIF_F_TSO */
> >
> > +static uint32_t
> > +ixgb_get_msglevel(struct net_device *netdev)
> > +{
> > +	struct ixgb_adapter *adapter = netdev->priv;
> > +	return adapter->msg_enable;
> > +}
> > +
> > +static void
> > +ixgb_set_msglevel(struct net_device *netdev, uint32_t data)
> > +{
> > +	struct ixgb_adapter *adapter = netdev->priv;
> > +	adapter->msg_enable = data;
> > +}
> > +
>
> Minor nits:
> - you may consider removing the u{8/16/32} in drivers/net/ixgb for consistency sake in a different patch (there is a strong majority of uint_something in the driver).

All the uint32_t should be removed. Kernel style is u32.
Re: Congestion Avoidance Monitoring Tools
On Mon, 24 Apr 2006 00:52:35 +0200 Hagen Paul Pfeifer [EMAIL PROTECTED] wrote:

> * Stephen Hemminger | 2006-04-21 08:19:17 [-0700]:
>
> > > > 2.6.13 still had lots of problems, things didn't really get working right till 2.6.15 or later. Especially with TSO.
> > >
> > > --verbose?
> >
> > I have a tool using kprobe's see
> > http://developer.osdl.org/shemminger/prototypes/tcpprobe.tar.gz
> > I try to keep it up to date with current kernel and build process, last used it on 2.6.16.
>
> wget http://developer.osdl.org/shemminger/prototypes/tcpprobe.tar.gz
>
> Ended with following error code: ;-)
>
> 00:32:48 ERROR 403: Forbidden.
>
> HGN

Fixed
Re: [patch 2/5] s2io driver updates
On Mon, 24 Apr 2006 10:39:52 -0700 Ananda Raju [EMAIL PROTECTED] wrote:

> Hi,
> Currently the only way we can differentiate between copper CX4 transponder adapters from optical transponder adapters is by reading the product name string in vpd.

That makes sense. Though often the VPD can be messed up by OEMs. Probably not a big issue with this driver.
[PATCH] netdev: hotplug napi race cleanup
This follows after the earlier two patches. Change the initialization of the class device portion of the net device to be done earlier, so that any races before registration completes are harmless. Add a mutex to avoid changes to netdevice during the class device registration.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- linux-2.6.orig/net/core/dev.c 2006-04-24 10:31:15.0 -0700
+++ linux-2.6/net/core/dev.c 2006-04-24 10:31:16.0 -0700
@@ -203,10 +203,12 @@

 #ifdef CONFIG_SYSFS
 extern int netdev_sysfs_init(void);
-extern int netdev_register_sysfs(struct net_device *);
-extern void netdev_unregister_sysfs(struct net_device *);
+extern void netdev_init_classdev(struct net_device *);
+#define netdev_register_sysfs(dev)	class_device_add(&(dev->class_dev))
+#define	netdev_unregister_sysfs(dev)	class_device_del(&(dev->class_dev))
 #else
 #define netdev_sysfs_init()		(0)
+#define netdev_init_classdev(dev)	do { } while(0)
 #define netdev_register_sysfs(dev)	(0)
 #define	netdev_unregister_sysfs(dev)	do { } while(0)
 #endif
@@ -2870,6 +2872,8 @@

 	set_bit(__LINK_STATE_PRESENT, &dev->state);

+	netdev_init_classdev(dev);
+
 	dev->next = NULL;
 	dev_init_scheduler(dev);
 	write_lock_bh(&dev_base_lock);
@@ -3047,7 +3051,10 @@
 			 * this is a delayed call after register_netdevice
 			 * so no way to tell device driver what is wrong.
 			 */
+			rtnl_lock();
 			err = netdev_register_sysfs(dev);
+			__rtnl_unlock();
+
 			if (err) {
 				printk(KERN_ERR "%s: failed sysfs registration (%d)\n",
 				       dev->name, err);
--- sky2-2.6.17.orig/net/core/net-sysfs.c 2006-04-24 10:31:14.0 -0700
+++ sky2-2.6.17/net/core/net-sysfs.c 2006-04-24 10:31:16.0 -0700
@@ -443,13 +443,8 @@
 #endif
 };

-void netdev_unregister_sysfs(struct net_device * net)
-{
-	class_device_del(&(net->class_dev));
-}
-
-/* Create sysfs entries for network device. */
-int netdev_register_sysfs(struct net_device *net)
+/* Setup class device */
+void netdev_init_classdev(struct net_device *net)
 {
 	struct class_device *class_dev = &(net->class_dev);
 	struct attribute_group **groups = net->sysfs_groups;
@@ -470,8 +465,6 @@
 	    || (net->wireless_handlers && net->wireless_handlers->get_wireless_stats))
 		*groups++ = &wireless_group;
 #endif
-
-	return class_device_add(class_dev);
 }

 int netdev_sysfs_init(void)
Re: is it a backwards compatability catch-22?
On Mon, 24 Apr 2006 16:47:34 -0700 Rick Jones [EMAIL PROTECTED] wrote:

> I might be out to lunch, certainly it happens often enough :)
>
> I've spent the afternoon trying to stop my NIC names from being random on each boot. To that end, I've been doing udev rules based on an example I found at http://www.debianhelp.co.uk/udev.htm
>
> In this case I'm running a Debian 2.6.15-1 kernel.
>
> It seems that the SYSFS{address} looks for a case sensitive match on the address (MAC) of the interface in rules like these:
>
> lumber:~# cat /etc/udev/rules.d/010_netinterfaces.rules
> KERNEL="eth*",SYSFS{address}=="00:30:6e:4c:27:3c", NAME="eth0"
> KERNEL="eth*",SYSFS{address}=="00:30:6e:4c:27:3d", NAME="eth1"
> KERNEL="eth*",SYSFS{address}=="00:12:79:9e:0e:d2", NAME="eth2"
> KERNEL="eth*",SYSFS{address}=="00:12:79:9e:0e:d3", NAME="eth3"
> KERNEL="eth*",SYSFS{address}=="00:0c:fc:00:08:71", NAME="eth4"
>
> it seems to want lower-case hex because that is what comes out of SYSFS. (?)
>
> Of course, ifconfig -a gives HW addresses in uppercase hex:
>
> lumber:~# ifconfig -a | grep HW
> eth0 Link encap:Ethernet HWaddr 00:30:6E:4C:27:3C
> eth1 Link encap:Ethernet HWaddr 00:30:6E:4C:27:3D
> eth2 Link encap:Ethernet HWaddr 00:12:79:9E:0E:D2
> eth3 Link encap:Ethernet HWaddr 00:12:79:9E:0E:D3
> eth4 Link encap:Ethernet HWaddr 00:0C:FC:00:08:71
>
> and some of the dmesg stuff - notably e100:
>
> lumber:~# dmesg | grep eth
> e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
> eth1: Tigon3 [partno(BCM95700A6) rev 0105 PHY(5701)] (PCI:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:30:6e:4c:27:3d
> eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[0]
> eth1: dma_rwctrl[76ff2d0f]
> e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection
> e100: eth3: e100_probe: addr 0x8002, irq 57, MAC addr 00:30:6E:4C:27:3C
> eth4: Neterion Xframe I 10GbE adapter (rev 4), Version Version 2.0.9.3, Intr type INTA
> e100: eth0: e100_watchdog: link up, 100Mbps, half-duplex
>
> While it isn't a showstopper it does become a bit inconvenient to have to downshift the MAC when taking it from ifconfig to use in the udev rules. Any chance the two can agree on one or the other? Or is each locked in a backwards compatibility embrace?
>
> rick jones
>
> and of course, arp matches ifconfig:
>
> lumber:~# arp -an
> ? (15.4.89.87) at 00:12:79:94:F8:24 [ether] on eth0
> ? (15.4.88.1) at 00:00:0C:07:AC:00 [ether] on eth0
>
> not that arp in and of itself matters in this situation.

Don't use the auto assigned format eth0, eth1, eth2? The udev stuff runs after the device has already chosen its default name. It has to, it's part of the hotplug infrastructure, and we don't want to depend on usermode to define the name.

Just choose some other convention eth_0 or something like that.

--
Stephen Hemminger [EMAIL PROTECTED]
OSDL http://developer.osdl.org/~shemminger
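The "downshift" Rick describes, turning the upper-case address printed by ifconfig into the lower-case form sysfs reports, is a one-loop job. The helper below is a hypothetical convenience for generating udev rules by hand; it is not part of udev, ifconfig, or the kernel.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Lower-case a MAC address string in place, e.g. one copied from ifconfig. */
static void demo_mac_to_sysfs_case(char *mac)
{
	assert(mac != NULL);
	for (; *mac; mac++)
		*mac = (char)tolower((unsigned char)*mac);
}
```

Digits and colons pass through tolower() unchanged, so only the hex letters A-F are affected.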
Re: [PATCH]: suspicious unlikely usage in tcp_transmit_skb()
On Mon, 24 Apr 2006 16:25:39 -0700 Hua Zhong [EMAIL PROTECTED] wrote:

> Hi,
>
> I am developing a profiling tool to check if likely/unlikely usages are wise. I find that the following one is always a miss:
>
> # Hit# miss Function:[EMAIL PROTECTED]
> ! 0 50505 tcp_transmit_skb():net/ipv4/[EMAIL PROTECTED]
>
> There is a chance that my tool is buggy, but I just want to confirm with you whether this does look suspicious and what your opinion is.
>
> Signed-off-by: Hua Zhong [EMAIL PROTECTED]
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index a28ae59..743016b 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -465,7 +465,7 @@ #define SYSCTL_FLAG_SACK	0x4
>  	TCP_INC_STATS(TCP_MIB_OUTSEGS);
>
>  	err = icsk->icsk_af_ops->queue_xmit(skb, 0);
> -	if (unlikely(err <= 0))
> +	if (likely(err <= 0))
>  		return err;
>
>  	tcp_enter_cwr(sk);

How about just taking off the likely/unlikely in this case.
[PATCH 2/5] sky2: add fake idle irq timer
Add a fake NAPI schedule once a second. This is an attempt to work around broken configurations with edge-triggered interrupts.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2-2.6.17.orig/drivers/net/sky2.c 2006-04-25 10:48:47.0 -0700
+++ sky2-2.6.17/drivers/net/sky2.c 2006-04-25 10:53:32.0 -0700
@@ -2086,6 +2086,20 @@
 	}
 }

+/* If idle then force a fake soft NAPI poll once a second
+ * to work around cases where sharing an edge triggered interrupt.
+ */
+static void sky2_idle(unsigned long arg)
+{
+	struct net_device *dev = (struct net_device *) arg;
+
+	local_irq_disable();
+	if (__netif_rx_schedule_prep(dev))
+		__netif_rx_schedule(dev);
+	local_irq_enable();
+}
+
+
 static int sky2_poll(struct net_device *dev0, int *budget)
 {
 	struct sky2_hw *hw = ((struct sky2_port *) netdev_priv(dev0))->hw;
@@ -2134,6 +2148,8 @@
 		sky2_write32(hw, STAT_CTRL, SC_STAT_CLR_IRQ);
 	}

+	mod_timer(&hw->idle_timer, jiffies + HZ);
+
 	local_irq_disable();
 	__netif_rx_complete(dev0);

@@ -3288,6 +3304,8 @@

 	sky2_write32(hw, B0_IMSK, Y2_IS_BASE);

+	setup_timer(&hw->idle_timer, sky2_idle, (unsigned long) dev);
+
 	pci_set_drvdata(pdev, hw);

 	return 0;
@@ -3323,13 +3341,15 @@
 	if (!hw)
 		return;

+	del_timer_sync(&hw->idle_timer);
+
+	sky2_write32(hw, B0_IMSK, 0);
 	dev0 = hw->dev[0];
 	dev1 = hw->dev[1];
 	if (dev1)
 		unregister_netdev(dev1);
 	unregister_netdev(dev0);

-	sky2_write32(hw, B0_IMSK, 0);
 	sky2_set_power_state(hw, PCI_D3hot);
 	sky2_write16(hw, B0_Y2LED, LED_STAT_OFF);
 	sky2_write8(hw, B0_CTST, CS_RST_SET);
--- sky2-2.6.17.orig/drivers/net/sky2.h 2006-04-25 10:48:42.0 -0700
+++ sky2-2.6.17/drivers/net/sky2.h 2006-04-25 10:51:33.0 -0700
@@ -1880,6 +1880,8 @@
 	struct sky2_status_le *st_le;
 	u32		     st_idx;
 	dma_addr_t	     st_dma;
+
+	struct timer_list    idle_timer;
 	int		     msi_detected;
 	wait_queue_head_t    msi_wait;
 };
[PATCH 3/5] sky2: use ALIGN() macro
The ALIGN() macro in kernel.h does the same math that the sky2 driver was using for padding.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2-2.6.17.orig/drivers/net/sky2.c 2006-04-25 10:47:03.0 -0700
+++ sky2-2.6.17/drivers/net/sky2.c 2006-04-25 10:47:28.0 -0700
@@ -925,8 +925,7 @@
 	skb = alloc_skb(size + RX_SKB_ALIGN, gfp_mask);
 	if (likely(skb)) {
 		unsigned long p	= (unsigned long) skb->data;
-		skb_reserve(skb,
-			((p + RX_SKB_ALIGN - 1) & ~(RX_SKB_ALIGN - 1)) - p);
+		skb_reserve(skb, ALIGN(p, RX_SKB_ALIGN) - p);
 	}

 	return skb;
@@ -1686,13 +1685,12 @@
 }


-#define roundup(x, y)	((((x)+((y)-1))/(y))*(y))
 /* Want receive buffer size to be multiple of 64 bits
  * and incl room for vlan and truncation
  */
 static inline unsigned sky2_buf_size(int mtu)
 {
-	return roundup(mtu + ETH_HLEN + VLAN_HLEN, 8) + 8;
+	return ALIGN(mtu + ETH_HLEN + VLAN_HLEN, 8) + 8;
 }

 static int sky2_change_mtu(struct net_device *dev, int new_mtu)
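The claim that ALIGN() "does the same math" holds when the alignment is a power of two, which is true of the value 8 used in sky2_buf_size(). A small userspace check, offered as a sketch: DEMO_ALIGN and DEMO_ROUNDUP are local copies written for this note (the first has the same mask-based shape as the kernel macro, the second is the division-based macro the patch deletes), and demo_macros_agree() is a hypothetical helper.

```c
#include <assert.h>

/* Mask-based round-up, same shape as the kernel's ALIGN(); a must be a power of two. */
#define DEMO_ALIGN(x, a)	(((x) + ((a) - 1)) & ~((a) - 1))

/* Division-based round-up, as removed from sky2.c by the patch. */
#define DEMO_ROUNDUP(x, y)	((((x) + ((y) - 1)) / (y)) * (y))

/* Returns 1 if both macros agree for every x in [0, limit). */
static int demo_macros_agree(unsigned long limit, unsigned long a)
{
	unsigned long x;

	assert(a != 0 && (a & (a - 1)) == 0);	/* power of two only */
	for (x = 0; x < limit; x++)
		if (DEMO_ALIGN(x, a) != DEMO_ROUNDUP(x, a))
			return 0;
	return 1;
}
```

For a 1500-byte MTU the argument is 1500 + 14 (ETH_HLEN) + 4 (VLAN_HLEN) = 1518, which both forms round up to 1520. For a non-power-of-two alignment the mask form would give wrong answers, so the division form is not a drop-in replacement in general.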
[PATCH 5/5] sky2: version 1.2
Update to version 1.2

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2-2.6.17.orig/drivers/net/sky2.c 2006-04-25 10:54:57.0 -0700
+++ sky2-2.6.17/drivers/net/sky2.c 2006-04-25 10:55:51.0 -0700
@@ -51,7 +51,7 @@
 #include "sky2.h"

 #define DRV_NAME		"sky2"
-#define DRV_VERSION		"1.1"
+#define DRV_VERSION		"1.2"
 #define PFX			DRV_NAME " "

 /*
[PATCH 0/5] sky2: version 1.2
Update to sky2 driver. Mostly fixes to try and handle users stuck with edge-triggered interrupts. Also, some minor cleanups.

Patches apply onto 1.1 version in 2.6.17-rc2
[PATCH 1/5] sky2: reschedule if irq still pending
This is a workaround for the case of edge-triggered irq's. Several users seem to have broken configurations sharing edge-triggered irq's. To avoid losing IRQ's, reschedule if more work arrives.

The changes to netdevice.h are to extract the part that puts device back in list into separate inline.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2-2.6.17.orig/drivers/net/sky2.c 2006-04-25 10:48:44.0 -0700
+++ sky2-2.6.17/drivers/net/sky2.c 2006-04-25 10:48:47.0 -0700
@@ -2093,6 +2093,7 @@
 	int work_done = 0;
 	u32 status = sky2_read32(hw, B0_Y2_SP_EISR);

+ restart_poll:
 	if (unlikely(status & ~Y2_IS_STAT_BMU)) {
 		if (status & Y2_IS_HW_ERR)
 			sky2_hw_intr(hw);
@@ -2123,7 +2124,7 @@
 	}

 	if (status & Y2_IS_STAT_BMU) {
-		work_done = sky2_status_intr(hw, work_limit);
+		work_done += sky2_status_intr(hw, work_limit - work_done);
 		*budget -= work_done;
 		dev0->quota -= work_done;

@@ -2133,9 +2134,22 @@
 		sky2_write32(hw, STAT_CTRL, SC_STAT_CLR_IRQ);
 	}

-	netif_rx_complete(dev0);
+	local_irq_disable();
+	__netif_rx_complete(dev0);

 	status = sky2_read32(hw, B0_Y2_SP_LISR);
+
+	if (unlikely(status)) {
+		/* More work pending, try and keep going */
+		if (__netif_rx_schedule_prep(dev0)) {
+			__netif_rx_reschedule(dev0, work_done);
+			status = sky2_read32(hw, B0_Y2_SP_EISR);
+			local_irq_enable();
+			goto restart_poll;
+		}
+	}
+
+	local_irq_enable();
 	return 0;
 }

@@ -2153,8 +2167,6 @@
 	prefetch(&hw->st_le[hw->st_idx]);
 	if (likely(__netif_rx_schedule_prep(dev0)))
 		__netif_rx_schedule(dev0);
-	else
-		printk(KERN_DEBUG PFX "irq race detected\n");

 	return IRQ_HANDLED;
 }
--- sky2-2.6.17.orig/include/linux/netdevice.h 2006-04-25 10:48:44.0 -0700
+++ sky2-2.6.17/include/linux/netdevice.h 2006-04-25 10:48:47.0 -0700
@@ -829,19 +829,21 @@
 		__netif_rx_schedule(dev);
 }

-/* Try to reschedule poll. Called by dev->poll() after netif_rx_complete().
- * Do not inline this?
- */
+
+static inline void __netif_rx_reschedule(struct net_device *dev, int undo)
+{
+	dev->quota += undo;
+	list_add_tail(&dev->poll_list, &__get_cpu_var(softnet_data).poll_list);
+	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
+}
+
+/* Try to reschedule poll. Called by dev->poll() after netif_rx_complete(). */
 static inline int netif_rx_reschedule(struct net_device *dev, int undo)
 {
 	if (netif_rx_schedule_prep(dev)) {
 		unsigned long flags;
-
-		dev->quota += undo;
-
 		local_irq_save(flags);
-		list_add_tail(&dev->poll_list, &__get_cpu_var(softnet_data).poll_list);
-		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
+		__netif_rx_reschedule(dev, undo);
 		local_irq_restore(flags);
 		return 1;
 	}
Re: [PATCH 2/5] sky2: add fake idle irq timer
On Tue, 25 Apr 2006 23:23:29 +0200 Francois Romieu [EMAIL PROTECTED] wrote:

> Stephen Hemminger [EMAIL PROTECTED] :
> [...]
> > --- sky2-2.6.17.orig/drivers/net/sky2.c 2006-04-25 10:48:47.0 -0700
> > +++ sky2-2.6.17/drivers/net/sky2.c 2006-04-25 10:53:32.0 -0700
> > @@ -2086,6 +2086,20 @@
> >  	}
> >  }
> >
> > +/* If idle then force a fake soft NAPI poll once a second
> > + * to work around cases where sharing an edge triggered interrupt.
> > + */
> > +static void sky2_idle(unsigned long arg)
> > +{
> > +	struct net_device *dev = (struct net_device *) arg;
> > +
> > +	local_irq_disable();
> > +	if (__netif_rx_schedule_prep(dev))
> > +		__netif_rx_schedule(dev);
> > +	local_irq_enable();
> > +}
> > +
> > +
> >  static int sky2_poll(struct net_device *dev0, int *budget)
> >  {
> >  	struct sky2_hw *hw = ((struct sky2_port *) netdev_priv(dev0))->hw;
> > @@ -2134,6 +2148,8 @@
> >  		sky2_write32(hw, STAT_CTRL, SC_STAT_CLR_IRQ);
> >  	}
> >
> > +	mod_timer(&hw->idle_timer, jiffies + HZ);
> > +
> >  	local_irq_disable();
> >  	__netif_rx_complete(dev0);
>
> Any objection against moving mod_timer() from sky2_poll() to sky2_idle() so as to keep poll() path unmodified ?

If traffic is moving, then I want the timer to keep getting rescheduled farther out.
Re: [PATCH]: suspicious unlikely usage in tcp_transmit_skb()
On Tue, 25 Apr 2006 14:46:49 -0700 (PDT) David S. Miller [EMAIL PROTECTED] wrote:

> From: Stephen Hemminger [EMAIL PROTECTED]
> Date: Tue, 25 Apr 2006 10:01:49 -0700
>
> > > # Hit# miss Function:[EMAIL PROTECTED]
> > > ! 0 50505 tcp_transmit_skb():net/ipv4/[EMAIL PROTECTED]
> ...
> > How about just taking off the likely/unlikely in this case.
>
> Why remove it when we'll now get a 50505 to 0 hit rate?

Depends on the data stream, but I guess if we are seeing high loss we really don't care about the CPU branch prediction.
Re: [PATCH 2/5] sky2: add fake idle irq timer
On Wed, 26 Apr 2006 00:39:00 +0200 Francois Romieu [EMAIL PROTECTED] wrote:

> Stephen Hemminger [EMAIL PROTECTED] :
> [...]
> > > Any objection against moving mod_timer() from sky2_poll() to sky2_idle() so as to keep poll() path unmodified ?
> >
> > If traffic is moving, then I want the timer to keep getting rescheduled farther out.
>
> If my version of the driver is not stale, the timer will not be rescheduled when work_done >= work_limit.

I am trying to work around possible lost IRQs, not netdev scheduler screw-ups. If work_done >= work_limit, then it will already be called back later when it returns 1.
Re: sky2 driver problems in 2.6.17-rc2-git6 (was: Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1)
On Tue, 25 Apr 2006 17:06:25 -0700 Guenther Thomsen [EMAIL PROTECTED] wrote: On Monday 17 April 2006 11:18, Stephen Hemminger wrote: I don't know what you are doing different, but my 2 port SysKonnect card is working fine. Running SMP AMD64 and 2.6.17 latest. Showing full speed on both ports. I missed that e-mail, sorry. I just gave it another try, this time with 2.6.16.11 . One port works fine (so far, I just did very limited testing with ttcp). The second port does negotiate IP address via DHCP, but the packgages it receives seem to be garbled: --8-- 0x: 6175 6469 7428 3131 3435 3939 3430 ..audit(11459940 0x0010: 3031 2e39 3738 3a33 3829 3a20 7573 6572 01.978:38):.user 0x0020: 2070 6964 3d33 3230 3920 7569 643d .pid=3209.uid= 12:56:23.725090 00:00:00:00:00:00 30:6e:6d:00:00:00 null I (s=32,r=55,P) len=42 12:56:24.603274 00:00:21:00:00:00 00:00:00:00:00:00 null disc/C len=43 12:56:26.619326 00:00:00:00:00:00 00:00:00:00:00:00 null I (s=0,r=0,C) len=42 12:56:28.635346 00:00:00:00:00:00 00:00:00:00:00:00 null I (s=0,r=0,C) len=42 12:56:29.734046 00:00:00:00:00:00 00:00:00:00:00:00 null I (s=0,r=0,C) len=42 12:56:29.865239 00:00:00:00:00:00 00:00:00:00:00:00 null I (s=0,r=0,C) len=42 12:56:30.651371 00:00:00:00:00:00 a6:00:00:00:4d:04, ethertype Unknown (0xe20c), length 60: 0x: 6175 6469 7428 3131 3435 3939 3436 ..audit(11459946 0x0010: 3031 2e33 3639 3a34 3729 3a20 7573 6572 01.369:47):.user 0x0020: 2070 6964 3d33 3239 3820 7569 643d .pid=3298.uid= 12:56:30.916718 00:00:f0:71:61:00 28:37:03:5b:3a:00 null I (s=16,r=0,C) len=42 12:56:30.923558 00:00:21:00:00:00 00:00:00:00:00:00 null rnr (r=55,C) len=42 12:56:32.667413 00:00:d0:2e:30:42 10:60:61:00:00:00, ethertype Unknown (0x572b), length 60: 0x: d675 0d00 0200 ...u 0x0010: 0x0020: 1300 .. 12:56:33.296384 00:00:00:00:00:00 00:00:00:00:00:00 null I (s=0,r=0,C) len=42 12:56:33.303222 00:00:00:00:00:00 00:00:00:00:00:00 null I (s=0,r=0,C) len=42 [..] 
13:00:44.340062 00:00:00:00:00:00 5f:00:00:00:00:00 null I (s=0,r=0,C) len=42 13:00:44.672350 00:00:00:00:00:00 00:00:00:00:00:00 null I (s=0,r=0,C) len=42 13:00:44.868724 00:00:00:00:00:00 00:00:00:00:00:00 null I (s=0,r=0,C) len=42 13:00:45.340123 00:00:00:00:00:00 00:00:00:00:00:00 null I (s=0,r=0,C) len=42 13:00:46.340173 00:00:00:00:00:00 00:00:00:00:00:00 null I (s=0,r=0,C) len=42 13:00:46.688433 IP truncated-ip - 1454 bytes missing! 192.168.65.66.40313 192.168.65.65.5001: . 1426488980:1426490428(1448) ack 1790562292 win 1460 nop,nop,timestamp[|tcp] 13:00:48.704431 00:00:21:00:00:00 00:00:00:00:00:00 null I (s=17,r=18,C) len=42 13:00:48.886426 00:00:00:00:00:00 00:00:00:00:00:00 null I (s=0,r=0,C) len=42 13:00:50.720463 00:00:00:00:00:00 00:00:00:00:00:00 null I (s=0,r=0,C) len=42 13:00:52.736496 00:00:00:00:00:00 00:00:00:00:00:00 null I (s=0,r=0,C) len=42 13:00:54.752522 00:00:00:00:00:00 00:00:00:00:00:00 null I (s=0,r=0,C) len=42 13:00:54.927556 00:00:00:00:00:00 00:00:00:00:00:00 null I (s=0,r=0,C) len=42 13:00:54.934394 00:00:00:00:00:00 00:00:00:00:00:00 null I (s=0,r=0,C) len=42 --8-- On a different host connected to the same switch, traffic looks more like: --8-- 2:01:49.388992 IP 192.168.64.1.ntp 255.255.255.255.ntp: NTPv3, Broadcast, length 48 12:01:50.176550 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15 12:01:51.235034 arp reply 192.168.64.32 is-at 00:0a:49:00:5e:8a 12:01:51.241857 arp reply 192.168.64.33 is-at 00:0a:49:00:5e:8b 12:01:51.891193 00:00:01:02:c8:58 45:c0:00:1c:00:20, ethertype Unknown (0xe000), length 60: 0x: 0001 1164 ee9b ...d 0x0010: 2f6b 8c87 /k.. 0x0020: .. 
12:01:52.192552 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15 12:01:52.801392 arp reply 192.168.64.34 is-at 00:0a:49:00:5e:8c 12:01:52.808240 arp reply 192.168.64.35 is-at 00:0a:49:00:5e:8d 12:01:54.208495 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15 12:01:56.224453 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15 12:01:58.240464 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15 12:02:00.029320 arp reply 192.168.64.39 is-at 00:0a:49:00:5e:ff 12:02:00.256420 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4
[RFC] bridge: partial rtnetlink hooks
This is the start of adding support for rtnetlink to the bridge code. So far it only supports accessing the list of links and notifying about link changes. It is just a prototype to get early feedback, don't use to build your own masterpiece yet.

--- bridge-2.6.orig/net/bridge/Makefile
+++ bridge-2.6/net/bridge/Makefile
@@ -6,7 +6,7 @@ obj-$(CONFIG_BRIDGE) += bridge.o

 bridge-y	:= br.o br_device.o br_fdb.o br_forward.o br_if.o br_input.o \
 			br_ioctl.o br_notify.o br_stp.o br_stp_bpdu.o \
-			br_stp_if.o br_stp_timer.o
+			br_stp_if.o br_stp_timer.o br_netlink.o

 bridge-$(CONFIG_SYSFS) += br_sysfs_if.o br_sysfs_br.o

--- bridge-2.6.orig/net/bridge/br.c
+++ bridge-2.6/net/bridge/br.c
@@ -30,17 +30,20 @@ static struct llc_sap *br_stp_sap;

 static int __init br_init(void)
 {
+	int err = -EADDRINUSE;
+
 	br_stp_sap = llc_sap_open(LLC_SAP_BSPAN, br_stp_rcv);
 	if (!br_stp_sap) {
 		printk(KERN_ERR "bridge: can't register sap for STP\n");
-		return -EBUSY;
+		goto out;
 	}

 	br_fdb_init();

 #ifdef CONFIG_BRIDGE_NETFILTER
-	if (br_netfilter_init())
-		return 1;
+	err = br_netfilter_init();
+	if (err)
+		goto unregister_sap;
 #endif
 	brioctl_set(br_ioctl_deviceless_stub);
 	br_handle_frame_hook = br_handle_frame;
@@ -50,13 +53,23 @@ static int __init br_init(void)

 	register_netdevice_notifier(&br_device_notifier);

+	br_netlink_init();
+
 	return 0;
+#ifdef CONFIG_BRIDGE_NETFILTER
+ unregister_sap:
+	llc_sap_close(br_stp_sap);
+#endif
+ out:
+	return err;
 }

 static void __exit br_deinit(void)
 {
 	llc_sap_close(br_stp_sap);

+	br_netlink_exit();
+
 #ifdef CONFIG_BRIDGE_NETFILTER
 	br_netfilter_fini();
 #endif
--- /dev/null
+++ bridge-2.6/net/bridge/br_netlink.c
@@ -0,0 +1,135 @@
+/*
+ *	Bridge netlink control interface
+ *
+ *	Authors:
+ *	Stephen Hemminger		[EMAIL PROTECTED]
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/rtnetlink.h>
+#include "br_private.h"
+
+static int br_fill_ifinfo(struct sk_buff *skb, const struct net_bridge_port *port,
+			  u32 pid, u32 seq, int event, unsigned int flags)
+{
+	const struct net_bridge *br = port->br;
+	const struct net_device *dev = port->dev;
+	struct ifinfomsg *r;
+	struct nlmsghdr *nlh;
+	unsigned char *b = skb->tail;
+	u32 mtu = dev->mtu;
+	u8 operstate = netif_running(dev) ? dev->operstate : IF_OPER_DOWN;
+
+	printk(KERN_DEBUG "bridge fill %s %s\n", dev->name, br->dev->name);
+
+	nlh = NLMSG_NEW(skb, pid, seq, event, sizeof(*r), flags);
+	r = NLMSG_DATA(nlh);
+	r->ifi_family = AF_BRIDGE;
+	r->__ifi_pad = 0;
+	r->ifi_type = dev->type;
+	r->ifi_index = dev->ifindex;
+	r->ifi_flags = dev_get_flags(dev);
+	r->ifi_change = 0;
+
+	RTA_PUT(skb, IFLA_IFNAME, strlen(dev->name)+1, dev->name);
+
+	RTA_PUT(skb, IFLA_MASTER, sizeof(int), &br->dev->ifindex);
+
+	if (dev->addr_len)
+		RTA_PUT(skb, IFLA_ADDRESS, dev->addr_len, dev->dev_addr);
+
+	RTA_PUT(skb, IFLA_MTU, sizeof(mtu), &mtu);
+	if (dev->ifindex != dev->iflink)
+		RTA_PUT(skb, IFLA_LINK, sizeof(int), &dev->iflink);
+
+
+	RTA_PUT(skb, IFLA_OPERSTATE, sizeof(operstate), &operstate);
+
+	if (event == RTM_NEWLINK) {
+		struct brifinfo portstate = {
+			.state = port->state,
+			.cost = port->path_cost,
+		};
+		RTA_PUT(skb, IFLA_PROTINFO, sizeof(portstate), &portstate);
+	}
+
+	nlh->nlmsg_len = skb->tail - b;
+
+	return skb->len;
+
+nlmsg_failure:
+rtattr_failure:
+
+	skb_trim(skb, b - skb->data);
+	return -1;
+}
+
+
+void br_ifinfo_notify(int event, struct net_bridge_port *port)
+{
+	struct sk_buff *skb;
+
+	printk(KERN_DEBUG "bridge notify event=%d\n", event);
+	skb = alloc_skb(NLMSG_SPACE(sizeof(struct ifinfomsg) + 128),
+			GFP_ATOMIC);
+	if (!skb) {
+		netlink_set_err(rtnl, 0, RTNLGRP_BRIDGE_IFINFO, ENOBUFS);
+		return;
+	}
+	if (br_fill_ifinfo(skb, port, current->pid, 0, event, 0) < 0) {
+		kfree_skb(skb);
+		netlink_set_err(rtnl, 0, RTNLGRP_BRIDGE_IFINFO, EINVAL);
+		return;
+	}
+	NETLINK_CB(skb).dst_group = RTNLGRP_IPV6_IFINFO;
+	netlink_broadcast(rtnl, skb, 0, RTNLGRP_BRIDGE_IFINFO, GFP_ATOMIC);
+}
+
+static int br_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb
Re: tune back idle cwnd closing?
On Wed, 26 Apr 2006 15:16:18 -0700 Rick Jones [EMAIL PROTECTED] wrote:

When your bursty application is not sending, other flows can take up the pipe space you are not using, and you must reprobe to figure that out.

If the restarted connection does normal slow-start, one of two things will happen, yes? Either it will grow its cwnd to >= the receiver's window, or it will have to stop before then because it triggered a packet loss. In the first case, seems it would have been just as good to let the connection burst. In the second case, is the effect on other connections really any better than if the connection just started up from where it was before?

BTW, is the RFC 2681? I looked that one up on ietf.org and the RFC by that number was a different beast entirely - at least at a very quick glance.

rick jones

http://www.faqs.org/rfcs/rfc2861.html

Long periods when the sender is application-limited can lead to the invalidation of the congestion window. During periods when the TCP sender is network-limited, the value of the congestion window is repeatedly revalidated by the successful transmission of a window of data without loss. When the TCP sender is network-limited, there is an incoming stream of acknowledgements that clocks out new data, giving concrete evidence of recent available bandwidth in the network. In contrast, during periods when the TCP sender is application-limited, the estimate of available capacity represented by the congestion window may become steadily less accurate over time. In particular, capacity that had once been used by the network-limited connection might now be used by other traffic.
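The decay that RFC 2861 specifies for an idle sender can be sketched in a few lines. This is only an illustration, not the kernel's code: the function name cwnd_after_idle and its parameters are invented for this example, and all times are in arbitrary ticks. It halves the window once for every full RTO the connection sat idle and never goes below the restart window.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): return the congestion
 * window to use after the sender has been idle for 'idle' ticks.
 * The window is halved once per elapsed RTO, with the restart window
 * as the lower bound. A window already below the restart window is
 * left unchanged.
 */
static uint32_t cwnd_after_idle(uint32_t cwnd, uint32_t restart_cwnd,
				uint32_t idle, uint32_t rto)
{
	uint32_t lower = restart_cwnd < cwnd ? restart_cwnd : cwnd;

	if (rto == 0)
		return cwnd;	/* no usable RTO estimate: leave cwnd alone */

	while (idle >= rto && cwnd > lower) {
		cwnd >>= 1;	/* one halving per idle RTO */
		idle -= rto;
	}
	return cwnd > lower ? cwnd : lower;
}
```

With a window of 64 segments, a restart window of 4 and an RTO of 200 ticks, 600 idle ticks leave a window of 8, so the connection still restarts well above slow-start but no longer bursts a full stale window.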
Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
On Fri, 28 Apr 2006 10:02:10 -0700 Caitlin Bestler [EMAIL PROTECTED] wrote:

Evgeniy Polyakov wrote:

On Fri, Apr 28, 2006 at 08:59:19AM -0700, Caitlin Bestler ([EMAIL PROTECTED]) wrote:

Btw, how is it supposed to work without header split capable hardware?

Hardware that can classify packets is obviously capable of doing header data separation, but that does not mean that it has to do so. If the host wants header data separation, its real value is that when packets arrive in order, fewer distinct copies are required to move the data to the user buffer (because separated data can be placed back-to-back in a data-only ring). But that's an optimization, it's not needed to make the idea worth doing, or even necessarily in the first implementation.

If there is dataflow, not flow of packets or flow of data with holes, it could be possible to modify recv() to just return the right pointer, so in theory userspace modifications would be minimal. With copy in place it completely does not differ from current design with copy_to_user() being used since memcpy() is just slightly faster than copy*user().

If the app is really ready to use a modified interface we might as well just give them a QP/CQ interface. But I suppose receive by pointer interfaces don't really stretch the sockets interface all that badly. The key is that you have to decide how the buffer is released, is it the next call? Or a separate call? Does releasing buffer N+2 release buffers N and N+1? What you want to avoid is having to keep a scoreboard of which buffers have been released.

Please just use existing AIO interface. We don't need another interface. The number of interfaces increases the exposed bug surface geometrically. Which means for each new interface, it means testing and fixing bugs in every possible usage.

But in context, header/data separation would allow in order packets to have the data be placed back to back, which could allow a single recv to report the payload of multiple successive TCP segments. So the benefit of header/data separation remains the same, and I still say it's an optimization that should not be made a requirement. The benefits of vj_channels exist even without them. When the packet classifier runs on the host, header/data separation would not be free.

I want to enable hardware offloads, not make the kernel bend over backwards to emulate how hardware would work. I'm just hoping that we can agree to let hardware do its work without being forced to work the same way the kernel does (i.e., running down a long list of arbitrary packet filter rules on a per packet basis).
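The buffer-release question raised above (does releasing buffer N+2 also release N and N+1?) can be illustrated with a cumulative release, which needs one counter instead of a per-buffer scoreboard. This is a sketch only: struct rx_ring and the rx_* functions are hypothetical names made up for this example, not part of any proposed interface, and sequence-number wraparound is ignored for brevity.

```c
#include <stdint.h>

/*
 * Hypothetical receive ring. Buffers are handed to the application in
 * sequence order; every buffer with a sequence number below 'released'
 * has been given back.
 */
struct rx_ring {
	uint32_t delivered;	/* next sequence number to hand out */
	uint32_t released;	/* all buffers below this are free again */
};

/* Hand the next buffer to the application; returns its sequence number. */
static uint32_t rx_deliver(struct rx_ring *r)
{
	return r->delivered++;
}

/*
 * Cumulative release: giving back buffer 'seq' also gives back every
 * earlier outstanding buffer. Returns how many buffers this call freed.
 */
static uint32_t rx_release_upto(struct rx_ring *r, uint32_t seq)
{
	uint32_t freed;

	if (seq >= r->delivered || seq < r->released)
		return 0;	/* not an outstanding buffer: ignore */

	freed = seq + 1 - r->released;
	r->released = seq + 1;
	return freed;
}

/* Number of buffers the application still holds. */
static uint32_t rx_outstanding(const struct rx_ring *r)
{
	return r->delivered - r->released;
}
```

The trade-off is that the application cannot hold on to an early buffer while returning a later one; allowing that is exactly what would force the scoreboard.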
[PATCH] netem: fix loss
The following one line fix is needed to make loss function of netem work right when doing loss on the local host. Otherwise, higher layers just recover.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- linux-2.6.orig/net/sched/sch_netem.c
+++ linux-2.6/net/sched/sch_netem.c
@@ -167,7 +167,7 @@ static int netem_enqueue(struct sk_buff
 	if (count == 0) {
 		sch->qstats.drops++;
 		kfree_skb(skb);
-		return NET_XMIT_DROP;
+		return NET_XMIT_BYPASS;
 	}

 	/*
Re: [LARTC] how to change classful netem loss probability?
Loss was broken, patch sent. The following works now:

# tc qdisc add dev eth1 root handle 1:0 netem loss 20%
# tc qdisc add dev eth1 parent 1:1 handle 10: tbf \
	rate 256kbit buffer 1600 limit 3000
# ping -f -c 1000 shell
1000 packets transmitted, 781 received, 21% packet loss, time 3214ms
rtt min/avg/max/mdev = 0.187/0.398/3.763/0.730 ms, ipg/ewma 3.217/0.538 ms
# tc qdisc chang dev eth1 handle 1: netem loss 1%
# ping -f -c 1000 shell
1000 packets transmitted, 990 received, 1% packet loss, time 2922ms
rtt min/avg/max/mdev = 0.187/2.739/3.298/0.789 ms, ipg/ewma 2.924/2.084 ms
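For readers wondering how a percentage such as "loss 20%" turns into a per-packet decision, the sketch below scales the percentage to a 32-bit threshold and compares it with a random 32-bit value. It is written under the assumption that netem works roughly this way; loss_threshold and should_drop are hypothetical names for this example, not functions from tc or the kernel, and the random number is passed in by the caller.

```c
#include <stdint.h>

/*
 * Hypothetical helper: scale a loss fraction in [0.0, 1.0] to a 32-bit
 * threshold. 0.0 maps to 0 and 1.0 maps to UINT32_MAX.
 */
static uint32_t loss_threshold(double fraction)
{
	if (fraction <= 0.0)
		return 0;
	if (fraction >= 1.0)
		return UINT32_MAX;
	return (uint32_t)(fraction * 4294967295.0 + 0.5);
}

/*
 * Hypothetical helper: decide whether to drop one packet. 'rnd' is a
 * uniformly distributed 32-bit random value supplied by the caller.
 * A threshold of 0 means loss is disabled.
 */
static int should_drop(uint32_t threshold, uint32_t rnd)
{
	return threshold != 0 && threshold >= rnd;
}
```

With a threshold built from 0.2, about one random value in five falls at or below it, which matches the 21% loss seen in the flood ping above.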
Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
On Fri, 28 Apr 2006 21:29:32 +0400 Evgeniy Polyakov [EMAIL PROTECTED] wrote:

On Fri, Apr 28, 2006 at 10:18:33AM -0700, Stephen Hemminger ([EMAIL PROTECTED]) wrote:

Please just use existing AIO interface. We don't need another interface. The number of interfaces increases the exposed bug surface geometrically. Which means for each new interface, it means testing and fixing bugs in every possible usage.

Networking AIO? Like [1] :) That would be really good.

1. http://tservice.net.ru/~s0mbre/old/?section=projects&item=naio

The existing infrastructure is there in the syscall layer, it just isn't really AIO for sockets. That naio project has two problems. First, it requires driver changes, and he is doing it on the stupidest of hardware; optimizing an 8139too is foolish. Second, introducing kevents seems unnecessary and hasn't been accepted in the mainline.

The existing Linux AIO model seems sufficient: http://lse.sourceforge.net/io/aio.html

There is work to put true POSIX AIO on top of this.
Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
On Fri, 28 Apr 2006 12:16:36 -0700 (PDT) David S. Miller [EMAIL PROTECTED] wrote:

From: Evgeniy Polyakov [EMAIL PROTECTED]
Date: Fri, 28 Apr 2006 21:55:39 +0400

On Fri, Apr 28, 2006 at 10:41:18AM -0700, Stephen Hemminger ([EMAIL PROTECTED]) wrote:

Second, introducing kevents, seems unnecessary and hasn't been accepted in the mainline.

kevent was never sent to lkml@ although it showed over 40% win over epoll for test web server. Sending it to lkml@ is just jumping into ... not into technical world, so I posted it first here, but without much attention though.

Frankly I found kevents to be a very strong idea.

But there is this huge semantic overload of kevent, poll, epoll, aio, regular sendmsg/recv, posix aio, etc. Perhaps a clean break with the socket interface is needed. Otherwise, there are nasty complications with applications that mix old socket calls and new interface on the same connection.
Re: [PATCH 001/100] TCP congestion module: add TCP-LP supporting for 2.6.16
On Mon, 1 May 2006 18:05:52 +0800 Wong Edison [EMAIL PROTECTED] wrote:

TCP Low Priority is a distributed algorithm whose goal is to utilize only the excess network bandwidth as compared to the ``fair share`` of bandwidth as targeted by TCP. Available from: http://www.ece.rice.edu/~akuzma/Doc/akuzma/TCP-LP.pdf

See http://www-ece.rice.edu/networks/TCP-LP/ for their implementation.

Our group made the following changes to the original TCP-LP implementation:
o We use newReno in most core CA handling. Only add some checking within cong_avoid.
o Error correcting in remote HZ, therefore remote HZ will be kept on checking and updating.
o Handling calculation of One-Way-Delay (OWD) within rtt_sample, since OWD has a similar meaning as RTT. Also correct the buggy formula.
o Handle reaction for Early Congestion Indication (ECI) within pkts_acked, as mentioned within pseudo code.
o OWD is handled in relative format, where local time stamp will be in tcp_time_stamp format.

Port from 2.4.19 to 2.6.16 as module by:
Wong Hoi Sing Edison [EMAIL PROTECTED]
Hung Hing Lun [EMAIL PROTECTED]

Signed-off-by: Wong Hoi Sing Edison [EMAIL PROTECTED]

Is this all of it? Your subject line says there are 99 more pieces. That seems huge.
Re: [PATCH 001/100] TCP congestion module: add TCP-LP supporting for 2.6.16
+/**
+ * struct lp
+ * @flag: TCP-LP state flag
+ * @sowd: smoothed OWD << 3
+ * @owd_min: min OWD
+ * @owd_max: max OWD
+ * @owd_max_rsv: reserved max owd
+ * @RHZ: estimated remote HZ
+ * @remote_ref_time: remote reference time
+ * @local_ref_time: local reference time
+ * @last_drop: time for last active drop
+ * @inference: current inference
+ *
+ * TCP-LP's private struct.
+ * We get the idea from original TCP-LP implementation where only left those we
+ * found are really useful.
+ */
+struct lp {
+	u32 flag;
+	u32 sowd;
+	u32 owd_min;
+	u32 owd_max;
+	u32 owd_max_rsv;
+	u32 RHZ;
+	u32 remote_ref_time;
+	u32 local_ref_time;
+	u32 last_drop;
+	u32 inference;
+};

It is best to keep structure element names lower case. s/RHZ/rhz/ or use remote_hz
Re: VJ Channel API - driver level (PATCH)
On Wed, 3 May 2006 11:12:15 -0700 Caitlin Bestler [EMAIL PROTECTED] wrote:

Evgeniy Polyakov wrote:

On Wed, May 03, 2006 at 08:56:23AM -0700, Caitlin Bestler ([EMAIL PROTECTED]) wrote:

I'd expect high end NIC ASICs to implement rx steering based upon some sort of hash (for load balancing), as well as explicit 1:1 steering between a sw channel and a hw channel. Both options for channel configuration are present in the driver interface. If netfilter assists can be done in hardware, I agree the driver interface will need to add support for these - otherwise, netfilter processing will stay above the driver.

Even if the hardware cannot fully implement netfilter rules there is still value in having an interface that documents exactly how much filtering a given piece of hardware can do. There is no point in having the kernel repeat packet classifications that have already been done by the NIC.

Please do not suppose that vj channel must rely on underlying hardware. New interface MUST work better or at least not worse than existing skb queueing for majority of users, and I doubt users with netfilter capable hardware are there. It is only some hint to the SW, not rules, that hardware can provide. The best would be ipv4/ipv6 hashing, and I think it is enough.

I agree. I was just stating that *if* there is direct hardware support then the software should be enabled to skip redundant checks. What I'm suggesting is really the equivalent of knowing whether the hardware generates or checks CRCs and TCP checksums. Don't mandate the feature, just have the option to avoid redundant work.

Also, like multicast filtering, you need to allow for the partial match case. If hardware can do some of the work, it helps.
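The "hint, not mandate" idea with partial matches discussed above can be sketched as a capability bitmask: the driver reports which classification steps the hardware already performed, and software does only the remainder. The HW_CLS_* bits and sw_checks_needed are hypothetical names invented for this illustration; no such flags exist in the driver interface under discussion.

```c
#include <stdint.h>

/* Hypothetical classification steps a NIC might perform on receive. */
#define HW_CLS_IPV4_HASH	(1u << 0)
#define HW_CLS_IPV6_HASH	(1u << 1)
#define HW_CLS_TCP_PORTS	(1u << 2)

/*
 * Return the subset of 'wanted' steps that software must still run,
 * given the steps 'hw_done' that the hardware reports as completed.
 * With hw_done == 0 everything stays in software, so hardware support
 * is an optimization and never a requirement.
 */
static uint32_t sw_checks_needed(uint32_t wanted, uint32_t hw_done)
{
	return wanted & ~hw_done;
}
```

A partial match simply leaves some bits set, in the same way a multicast hash filter leaves the exact address comparison to software.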
sky2 1.3-rc1
Here is a new version that addresses some of the outstanding bugs.

* There was a race in receive processing that would cause hang
* Some more support for Yukon Ultra found in dual-core Centrino laptops (I want one of these).

It does not fix the problems with dual port cards corrupting receive data (and possibly memory).

http://developer.osdl.org/shemminger/prototypes/sky2-1.3-rc1.tar.bz2

If this works for most people, I'll post as separate patches for 2.6.17 tomorrow.
Re: netlink+ARP+CLIP == broken,
On Wed, 03 May 2006 22:32:39 +0100 Simon Kelley [EMAIL PROTECTED] wrote:

Both net/ipv4/arp.c and net/atm/clip.c create neighbour tables with family == AF_INET. For most purposes this is fine, since the two modules each hold a pointer to their table and pass it into the neigh_* functions. A problem arises in neigh_add, which is called by the rtnetlink code and which iterates through all the neighbour tables looking for the first one with the correct family. Since there are two different tables with family == AF_INET, sometimes it picks the wrong one. This leads to the situation where sending a RTM_NEWNEIGH message via netlink can generate an ignored and useless entry in the clip table, whilst not affecting another entry in the ARP table, both entries for the same IP. Viz:

sid:~# ip neigh
192.168.3.40 dev eth0 lladdr 52:54:00:12:34:59 REACHABLE
192.168.3.40 dev eth0 FAILED

It's not immediately obvious how to fix this in a conceptually clean manner: neighbour tables are not associated with single netdevices, and they don't carry an address-type field. Given a {IP,lladdr,device} triple, it's easy to determine if the device is ether-like or CLIP, but then the update call would have to go via the ARP and CLIP modules, instead of direct to the neighbour module in an address independent way. New address types would need further additions to the netlink/neighbour code.

OTOH there are several obvious hacks that will fix the immediate problem. I'm happy to provide a patch implementing one if that's desired.

Looking again, I think this is also a security hole, since the CLIP code keeps a whole struct including pointers in the neighbour table entry where ARP has the MAC address. So this might provide a way to poke arbitrary pointers into the kernel via RTM_NEWNEIGH. Only for root, though.

This was fixed in 2.6.16.6 and current 2.6.17
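The ambiguity Simon describes is easy to reproduce in miniature: a lookup keyed only on address family returns whichever matching table happens to be first in the list. The sketch below is not the kernel's neigh_add code; struct tbl, find_by_family and the AF_INET_FAMILY constant are stand-ins made up for this illustration.

```c
#include <stddef.h>

#define AF_INET_FAMILY 2	/* stand-in for AF_INET in this sketch */

/* Hypothetical, stripped-down neighbour table list entry. */
struct tbl {
	int family;
	const char *name;
	struct tbl *next;
};

/*
 * Return the first table whose family matches, or NULL. When two
 * tables share a family, the result depends only on list order, so a
 * caller that meant the ARP table may be handed the CLIP table.
 */
static struct tbl *find_by_family(struct tbl *head, int family)
{
	for (; head; head = head->next)
		if (head->family == family)
			return head;
	return NULL;
}
```

If the CLIP table is linked in ahead of the ARP table, every AF_INET request lands in CLIP, which matches the stray FAILED entry in the ip neigh output above.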
[PATCH] bridge: keep track of received multicast packets
It makes sense to add this simple statistic to keep track of received multicast packets.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- bridge.orig/net/bridge/br_input.c	2006-04-21 14:28:55.0 -0700
+++ bridge/net/bridge/br_input.c	2006-05-04 16:07:24.0 -0700
@@ -66,6 +66,7 @@
 	}

 	if (is_multicast_ether_addr(dest)) {
+		br->statistics.multicast++;
 		br_flood_forward(br, skb, !passedup);
 		if (!passedup)
 			br_pass_frame_up(br, skb);
Re: [Netem] where i can find this netem patch?
On Fri, 05 May 2006 11:08:23 -0400 George Nychis [EMAIL PROTECTED] wrote:

Hi,

I need help finding this patch that Stephen made. He sent me a patch, but I do not think it's related to the patch that solved this problem. I will include the patch he did forward to me at the bottom.

However, here is the problem, I even tried his misspelling of change :)

thorium-ini 15849-tests # tc qdisc add dev ath0 root handle 1:0 netem drop 0%
thorium-ini 15849-tests # tc qdisc add dev ath0 parent 1:1 handle 10: xcp capacity 54Mbit limit 500
thorium-ini 15849-tests # tc qdisc change dev ath0 root handle 1:0 netem drop 1%
RTNETLINK answers: Invalid argument

The problem was you are giving handle 1:0, so the change request was going to xcp. And xcp doesn't understand the netem rtnetlink message. You want to do:

# tc qdisc change dev ath0 root netem drop 1%
Re: sky2 1.3-rc1
On Fri, 5 May 2006 19:35:09 +0200 Thomas Glanzmann [EMAIL PROTECTED] wrote:

Hello,

http://developer.osdl.org/shemminger/prototypes/sky2-1.3-rc1.tar.bz2

v0.15: 64 bytes from 10.0.0.138: icmp_seq=2 ttl=64 time=0.467 ms
v1.3-rc1: 64 bytes from 10.0.0.138: icmp_seq=4 ttl=64 time=32.9 ms

I can't confirm this. For me it is just perfect:

64 bytes from 89.106.66.1: icmp_seq=1 ttl=64 time=0.278 ms

:04:00.0 Ethernet controller: Marvell Technology Group Ltd.: Unknown device 4361 (rev 17)

Thomas

What is happening is that if there is a misconfiguration and irq routing is messed up (i.e. edge triggered), the driver will degenerate to polling every 100ms. If your system is this misconfigured, then ACPI or the BIOS needs to be fixed and the driver really only needs to work well enough to get the bug report out ;-)

The older driver was rewhacking the Transmit IRQ status timer, so it would give a bogus transmit status interrupt and that was masking issues.
Re: sky2 1.3-rc1
On Fri, 05 May 2006 19:42:27 +0100 Daniel Drake [EMAIL PROTECTED] wrote:

Stephen Hemminger wrote:

What is happening is that if there is a misconfiguration and irq routing is messed up (i.e. edge triggered), the driver will degenerate to polling every 100ms. If your system is this misconfigured, then ACPI or the BIOS needs to be fixed and the driver really only needs to work well enough to get the bug report out ;-)

Ok, thanks for the explanation. Can you give any hints as to how we can classify this misconfiguration? Barry's system has a level triggered IRQ assigned to sky2, and that IRQ is not shared: http://bugs.gentoo.org/show_bug.cgi?id=132056#c3

I'm just looking for something I can take to the ACPI developers, other than "it's broken because Stephen said so" ;)

Try running with the idle_timeout=0 module parameter. In that case there will be no polling timer. If it just hangs, then the problem is a missed interrupt.

You could use this to see if you are getting irq's

--- sky2.orig/drivers/net/sky2.c
+++ sky2/drivers/net/sky2.c
@@ -2125,6 +2125,9 @@ static int sky2_poll(struct net_device *
 	int work_done = 0;
 	u32 status = sky2_read32(hw, B0_Y2_SP_EISR);

+	if (netif_msg_intr((struct sky2_port *) netdev_priv(dev0)))
+		printk(KERN_DEBUG PFX "poll status %#x\n", status);
+
 	if (status & Y2_IS_HW_ERR)
 		sky2_hw_intr(hw);

@@ -2183,6 +2186,9 @@ static irqreturn_t sky2_intr(int irq, vo
 	if (status == 0 || status == ~0)
 		return IRQ_NONE;

+	if (netif_msg_intr((struct sky2_port *) netdev_priv(dev0)))
+		printk(KERN_DEBUG PFX "irq status %#x\n", status);
+
 	prefetch(&hw->st_le[hw->st_idx]);
 	if (likely(__netif_rx_schedule_prep(dev0)))
 		__netif_rx_schedule(dev0);
Re: Please pull upstream-fixes branch of wireless-2.6
Linus Torvalds wrote:

On Fri, 5 May 2006, Andrew Morton wrote:

On Fri, 5 May 2006 21:06:18 -0400 John W. Linville [EMAIL PROTECTED] wrote:

These are fixes intended for 2.6.17...thanks!

Jeff is offline for a couple of weeks. Please prepare a pull for Linus.

Actually, while Jeff is off, Steve Hemminger is supposed to be the network driver overlord ("All bow down before the mighty Shemminger"), so please do synchronize with him. Of course, that might be just Steve taking a look and telling me "yeah, please pull directly from John".

Linus

I had a bunch ready for monday...
Re: [PATCH] netdev: hotplug napi race cleanup
On Sat, 06 May 2006 18:09:47 -0700 (PDT) David S. Miller [EMAIL PROTECTED] wrote:

From: Stephen Hemminger [EMAIL PROTECTED]
Date: Mon, 24 Apr 2006 15:23:41 -0700

This follows after the earlier two patches. Change the initialization of the class device portion of the net device to be done earlier, so that any races before registration completes are harmless. Add a mutex to avoid changes to netdevice during the class device registration.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

I'm not going to apply this patch and instead request that we think about why this problem exists in the first place. This patch is even stronger evidence that doing the sysfs registry in the todo list processing is wrong. If you can legally do this while holding the rtnl semaphore, you can just as equally do it inside of register_netdevice() which is where it truly belongs. Then you can handle errors properly, unwind the state, and return the error to the caller instead of just losing the error and leaving the device in a half-registered state.

The issue is are there network devices that can't sleep during register_netdevice?
Re: 2.4 kern: want to print TCP cwnd with every packet
On Sat, 6 May 2006 10:19:16 -0400 (EDT) George P Nychis [EMAIL PROTECTED] wrote:

Hi,

I'd like to print the TCP cwnd for the sender, with every packet before it is sent out. This way i could plot the sender window over time to show TCP's behavior in certain conditions. I see in tcp_input.c several places where i could print the current window, but i'd have to add code in multiple places. I was wondering if there is any 1 place, right before a packet is sent out, that i could printk() tp->snd_cwnd

Thanks!
George

Look at http://developer.osdl.org/shemminger/prototypes/tcpprobe.tar.gz
Re: [PATCH] netdev: hotplug napi race cleanup
On Mon, 08 May 2006 11:37:31 -0700 (PDT) David S. Miller [EMAIL PROTECTED] wrote:

From: Stephen Hemminger [EMAIL PROTECTED]
Date: Mon, 8 May 2006 09:54:58 -0700

The issue is are there network devices that can't sleep during register_netdevice?

Oh right, I forgot about that.

We could do something like this in register_netdevice()

	if (in_atomic() || irqs_disabled())
		net_set_todo(dev);
	else {
		dev->reg_state = NETREG_REGISTERED;
		ret = netdev_register_sysfs(dev);
		if (ret) {
			...
	}

It seems a bit grotty, and might cause pain later.
[patch 8/9] Add more support for the Yukon Ultra chip found in dual core Centrino laptops.
The newest Yukon Ultra chipsets require more special tweaks. They seem to be like the Yukon XL chipsets. This code is transliterated from the latest SysKonnect driver; I don't have any Ultra hardware.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c
+++ sky2/drivers/net/sky2.c
@@ -304,7 +304,8 @@ static void sky2_phy_init(struct sky2_hw
 	struct sky2_port *sky2 = netdev_priv(hw->dev[port]);
 	u16 ctrl, ct1000, adv, pg, ledctrl, ledover;

-	if (sky2->autoneg == AUTONEG_ENABLE && hw->chip_id != CHIP_ID_YUKON_XL) {
+	if (sky2->autoneg == AUTONEG_ENABLE &&
+	    (hw->chip_id != CHIP_ID_YUKON_XL || hw->chip_id == CHIP_ID_YUKON_EC_U)) {
 		u16 ectrl = gm_phy_read(hw, port, PHY_MARV_EXT_CTRL);

 		ectrl &= ~(PHY_M_EC_M_DSC_MSK | PHY_M_EC_S_DSC_MSK |
@@ -332,7 +333,7 @@ static void sky2_phy_init(struct sky2_hw
 			ctrl |= PHY_M_PC_MDI_XMODE(PHY_M_PC_ENA_AUTO);

 			if (sky2->autoneg == AUTONEG_ENABLE &&
-			    hw->chip_id == CHIP_ID_YUKON_XL) {
+			    (hw->chip_id == CHIP_ID_YUKON_XL || hw->chip_id == CHIP_ID_YUKON_EC_U)) {
 				ctrl &= ~PHY_M_PC_DSC_MSK;
 				ctrl |= PHY_M_PC_DSC(2) | PHY_M_PC_DOWN_S_ENA;
 			}
@@ -448,10 +449,11 @@ static void sky2_phy_init(struct sky2_hw
 		gm_phy_write(hw, port, PHY_MARV_EXT_ADR, 3);

 		/* set LED Function Control register */
-		gm_phy_write(hw, port, PHY_MARV_PHY_CTRL, (PHY_M_LEDC_LOS_CTRL(1) |	/* LINK/ACT */
-							   PHY_M_LEDC_INIT_CTRL(7) |	/* 10 Mbps */
-							   PHY_M_LEDC_STA1_CTRL(7) |	/* 100 Mbps */
-							   PHY_M_LEDC_STA0_CTRL(7)));	/* 1000 Mbps */
+		gm_phy_write(hw, port, PHY_MARV_PHY_CTRL,
+			     (PHY_M_LEDC_LOS_CTRL(1) |	/* LINK/ACT */
+			      PHY_M_LEDC_INIT_CTRL(7) |	/* 10 Mbps */
+			      PHY_M_LEDC_STA1_CTRL(7) |	/* 100 Mbps */
+			      PHY_M_LEDC_STA0_CTRL(7)));	/* 1000 Mbps */

 		/* set Polarity Control register */
 		gm_phy_write(hw, port, PHY_MARV_PHY_STAT,
@@ -465,6 +467,25 @@ static void sky2_phy_init(struct sky2_hw
 		/* restore page register */
 		gm_phy_write(hw, port, PHY_MARV_EXT_ADR, pg);
 		break;
+	case CHIP_ID_YUKON_EC_U:
+		pg = gm_phy_read(hw, port, PHY_MARV_EXT_ADR);
+
+		/* select page 3 to access LED control register */
+		gm_phy_write(hw, port, PHY_MARV_EXT_ADR, 3);
+
+		/* set LED Function Control register */
+		gm_phy_write(hw, port, PHY_MARV_PHY_CTRL,
+			     (PHY_M_LEDC_LOS_CTRL(1) |	/* LINK/ACT */
+			      PHY_M_LEDC_INIT_CTRL(8) |	/* 10 Mbps */
+			      PHY_M_LEDC_STA1_CTRL(7) |	/* 100 Mbps */
+			      PHY_M_LEDC_STA0_CTRL(7)));	/* 1000 Mbps */
+
+		/* set Blink Rate in LED Timer Control Register */
+		gm_phy_write(hw, port, PHY_MARV_INT_MASK,
+			     ledctrl | PHY_M_LED_BLINK_RT(BLINK_84MS));
+		/* restore page register */
+		gm_phy_write(hw, port, PHY_MARV_EXT_ADR, pg);
+		break;

 	default:
 		/* set Tx LED (LED_TX) to blink mode on Rx OR Tx activity */
@@ -473,19 +494,21 @@ static void sky2_phy_init(struct sky2_hw
 		ledover |= PHY_M_LED_MO_RX(MO_LED_OFF);
 	}

-	if (hw->chip_id == CHIP_ID_YUKON_EC_U && hw->chip_rev >= 2) {
+	if (hw->chip_id == CHIP_ID_YUKON_EC_U && hw->chip_rev == CHIP_REV_YU_EC_A1) {
 		/* apply fixes in PHY AFE */
-		gm_phy_write(hw, port, 22, 255);
+		pg = gm_phy_read(hw, port, PHY_MARV_EXT_ADR);
+		gm_phy_write(hw, port, PHY_MARV_EXT_ADR, 255);
+
 		/* increase differential signal amplitude in 10BASE-T */
-		gm_phy_write(hw, port, 24, 0xaa99);
-		gm_phy_write(hw, port, 23, 0x2011);
+		gm_phy_write(hw, port, 0x18, 0xaa99);
+		gm_phy_write(hw, port, 0x17, 0x2011);

 		/* fix for IEEE A/B Symmetry failure in 1000BASE-T */
-		gm_phy_write(hw, port, 24, 0xa204);
-		gm_phy_write(hw, port, 23, 0x2002);
+		gm_phy_write(hw, port, 0x18, 0xa204);
+		gm_phy_write(hw, port, 0x17, 0x2002);

 		/* set page register to 0 */
-		gm_phy_write(hw, port, 22, 0);
+		gm_phy_write(hw, port, PHY_MARV_EXT_ADR, pg);
 	} else {
 		gm_phy_write(hw, port, PHY_MARV_LED_CTRL, ledctrl);

@@ -559,6 +582,11 @@
[patch 6/9] sky2: dont write status ring
It is more efficient not to write the status ring from the processor and just read the active portion.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c	2006-05-02 09:49:38.0 -0700
+++ sky2/drivers/net/sky2.c	2006-05-02 09:49:42.0 -0700
@@ -1865,35 +1865,28 @@
 static int sky2_status_intr(struct sky2_hw *hw, int to_do)
 {
 	int work_done = 0;
+	u16 hwidx = sky2_read16(hw, STAT_PUT_IDX);

 	rmb();

-	for(;;) {
+	while (hw->st_idx != hwidx) {
 		struct sky2_status_le *le = hw->st_le + hw->st_idx;
 		struct net_device *dev;
 		struct sky2_port *sky2;
 		struct sk_buff *skb;
 		u32 status;
 		u16 length;
-		u8 link, opcode;
-
-		opcode = le->opcode;
-		if (!opcode)
-			break;
-		opcode &= ~HW_OWNER;

 		hw->st_idx = RING_NEXT(hw->st_idx, STATUS_RING_SIZE);
-		le->opcode = 0;

-		link = le->link;
-		BUG_ON(link >= 2);
-		dev = hw->dev[link];
+		BUG_ON(le->link >= 2);
+		dev = hw->dev[le->link];

 		sky2 = netdev_priv(dev);
 		length = le->length;
 		status = le->status;

-		switch (opcode) {
+		switch (le->opcode & ~HW_OWNER) {
 		case OP_RXSTAT:
 			skb = sky2_receive(sky2, length, status);
 			if (!skb)
@@ -1944,8 +1937,8 @@
 		default:
 			if (net_ratelimit())
 				printk(KERN_WARNING PFX
-				       "unknown status opcode 0x%x\n", opcode);
-			break;
+				       "unknown status opcode 0x%x\n", le->opcode);
+			goto exit_loop;
 		}
 	}
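The loop structure in this patch, reading only the entries between the driver's own index and the hardware's put index without writing anything back, can be shown on a toy ring. This is a simplified sketch rather than driver code: struct ring, ring_drain and RING_SIZE are hypothetical, and only the RING_NEXT wrap-around arithmetic is modelled on the macro used in the patch.

```c
#include <stdint.h>

#define RING_SIZE	8u	/* must be a power of two; value is illustrative */
#define RING_NEXT(x, s)	(((x) + 1) & ((s) - 1))

/* Hypothetical status ring: 'idx' is the next entry software will read. */
struct ring {
	uint16_t idx;
	uint8_t entries[RING_SIZE];
};

/*
 * Consume entries from r->idx up to, but not including, 'put_idx'
 * (the position the hardware will write next). Entries are only read,
 * never cleared. Stops early after 'budget' entries and returns the
 * number consumed.
 */
static unsigned int ring_drain(struct ring *r, uint16_t put_idx,
			       unsigned int budget)
{
	unsigned int done = 0;

	while (r->idx != put_idx && done < budget) {
		/* a real driver would decode r->entries[r->idx] here */
		r->idx = RING_NEXT(r->idx, RING_SIZE);
		done++;
	}
	return done;
}
```

Because ownership is inferred from the two indices, the processor no longer has to dirty the cache line holding each entry just to mark it as consumed.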
[patch 5/9] sky2: edge triggered workaround enhancement
Need to make the edge-triggered workaround timer faster to get marginally
better performance. The test_and_set_bit in schedule_prep() acts as a barrier
already. Make it a module parameter so that laptop users who are concerned
about power can set it to 0, and users stuck with broken BIOSes can turn
the driver into pure polling.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c	2006-05-02 09:49:37.0 -0700
+++ sky2/drivers/net/sky2.c	2006-05-02 09:49:38.0 -0700
@@ -98,6 +98,10 @@
 module_param(disable_msi, int, 0);
 MODULE_PARM_DESC(disable_msi, "Disable Message Signaled Interrupt (MSI)");
 
+static int idle_timeout = 100;
+module_param(idle_timeout, int, 0);
+MODULE_PARM_DESC(idle_timeout, "Idle timeout workaround for lost interrupts (ms)");
+
 static const struct pci_device_id sky2_id_table[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9000) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9E00) },
@@ -2092,12 +2096,13 @@
  */
 static void sky2_idle(unsigned long arg)
 {
-	struct net_device *dev = (struct net_device *) arg;
+	struct sky2_hw *hw = (struct sky2_hw *) arg;
+	struct net_device *dev = hw->dev[0];
 
-	local_irq_disable();
 	if (__netif_rx_schedule_prep(dev))
 		__netif_rx_schedule(dev);
-	local_irq_enable();
+
+	mod_timer(&hw->idle_timer, jiffies + msecs_to_jiffies(idle_timeout));
 }
 
 
@@ -2145,8 +2150,6 @@
 	if (work_done >= work_limit)
 		return 1;
 
-	mod_timer(&hw->idle_timer, jiffies + HZ);
-
 	netif_rx_complete(dev0);
 
 	status = sky2_read32(hw, B0_Y2_SP_LISR);
@@ -2167,8 +2170,6 @@
 	prefetch(&hw->st_le[hw->st_idx]);
 	if (likely(__netif_rx_schedule_prep(dev0)))
 		__netif_rx_schedule(dev0);
-	else
-		printk(KERN_DEBUG PFX "irq race detected\n");
 
 	return IRQ_HANDLED;
 }
@@ -3290,7 +3291,10 @@
 
 	sky2_write32(hw, B0_IMSK, Y2_IS_BASE);
 
-	setup_timer(&hw->idle_timer, sky2_idle, (unsigned long) dev);
+	setup_timer(&hw->idle_timer, sky2_idle, (unsigned long) hw);
+	if (idle_timeout > 0)
+		mod_timer(&hw->idle_timer,
+			  jiffies + msecs_to_jiffies(idle_timeout));
 
 	pci_set_drvdata(pdev, hw);
 
[patch 1/9] sky2: backout NAPI reschedule
This is a backout of an earlier patch. The whole rescheduling hack was a bad
idea. It doesn't really solve the problem and it makes the code more
complicated for no good reason.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c
+++ sky2/drivers/net/sky2.c
@@ -2105,7 +2105,6 @@ static int sky2_poll(struct net_device *
 	int work_done = 0;
 	u32 status = sky2_read32(hw, B0_Y2_SP_EISR);
 
- restart_poll:
 	if (unlikely(status & ~Y2_IS_STAT_BMU)) {
 		if (status & Y2_IS_HW_ERR)
 			sky2_hw_intr(hw);
@@ -2136,7 +2135,7 @@ static int sky2_poll(struct net_device *
 	}
 
 	if (status & Y2_IS_STAT_BMU) {
-		work_done += sky2_status_intr(hw, work_limit - work_done);
+		work_done = sky2_status_intr(hw, work_limit);
 		*budget -= work_done;
 		dev0->quota -= work_done;
 
@@ -2148,22 +2147,9 @@ static int sky2_poll(struct net_device *
 
 	mod_timer(&hw->idle_timer, jiffies + HZ);
 
-	local_irq_disable();
-	__netif_rx_complete(dev0);
+	netif_rx_complete(dev0);
 
 	status = sky2_read32(hw, B0_Y2_SP_LISR);
-
-	if (unlikely(status)) {
-		/* More work pending, try and keep going */
-		if (__netif_rx_schedule_prep(dev0)) {
-			__netif_rx_reschedule(dev0, work_done);
-			status = sky2_read32(hw, B0_Y2_SP_EISR);
-			local_irq_enable();
-			goto restart_poll;
-		}
-	}
-
-	local_irq_enable();
 	return 0;
 }
 
@@ -2181,6 +2167,8 @@ static irqreturn_t sky2_intr(int irq, vo
 	prefetch(&hw->st_le[hw->st_idx]);
 	if (likely(__netif_rx_schedule_prep(dev0)))
 		__netif_rx_schedule(dev0);
+	else
+		printk(KERN_DEBUG PFX "irq race detected\n");
 
 	return IRQ_HANDLED;
 }
--- sky2.orig/include/linux/netdevice.h
+++ sky2/include/linux/netdevice.h
@@ -829,21 +829,19 @@ static inline void netif_rx_schedule(str
 		__netif_rx_schedule(dev);
 }
 
-
-static inline void __netif_rx_reschedule(struct net_device *dev, int undo)
-{
-	dev->quota += undo;
-	list_add_tail(&dev->poll_list, &__get_cpu_var(softnet_data).poll_list);
-	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
-}
-
-/* Try to reschedule poll. Called by dev->poll() after netif_rx_complete(). */
+/* Try to reschedule poll. Called by dev->poll() after netif_rx_complete().
+ * Do not inline this?
+ */
 static inline int netif_rx_reschedule(struct net_device *dev, int undo)
 {
 	if (netif_rx_schedule_prep(dev)) {
 		unsigned long flags;
+
+		dev->quota += undo;
+
 		local_irq_save(flags);
-		__netif_rx_reschedule(dev, undo);
+		list_add_tail(&dev->poll_list, &__get_cpu_var(softnet_data).poll_list);
+		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 		local_irq_restore(flags);
 		return 1;
 	}
[patch 0/9] sky2 update
Bug fixes for sky2 driver:
  * fix NAPI related race that caused hangs
  * possible fixes for Yukon Ultra PHY support
  * performance improvement of ring management
  * fix race with irq on module removal
[patch 4/9] sky2: use mask instead of modulo operation
Gcc isn't smart enough to know that it can do a modulo operation with
power of 2 constant by doing a mask. So add macro to do it for us.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c
+++ sky2/drivers/net/sky2.c
@@ -79,6 +79,8 @@
 #define NAPI_WEIGHT		64
 #define PHY_RETRIES		1000
 
+#define RING_NEXT(x,s)	(((x)+1) & ((s)-1))
+
 static const u32 default_msg =
     NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_LINK
     | NETIF_MSG_TIMER | NETIF_MSG_TX_ERR | NETIF_MSG_RX_ERR
@@ -719,7 +721,7 @@ static inline struct sky2_tx_le *get_tx_
 {
 	struct sky2_tx_le *le = sky2->tx_le + sky2->tx_prod;
 
-	sky2->tx_prod = (sky2->tx_prod + 1) % TX_RING_SIZE;
+	sky2->tx_prod = RING_NEXT(sky2->tx_prod, TX_RING_SIZE);
 	return le;
 }
 
@@ -735,7 +737,7 @@ static inline void sky2_put_idx(struct s
 static inline struct sky2_rx_le *sky2_next_rx(struct sky2_port *sky2)
 {
 	struct sky2_rx_le *le = sky2->rx_le + sky2->rx_put;
-	sky2->rx_put = (sky2->rx_put + 1) % RX_LE_SIZE;
+	sky2->rx_put = RING_NEXT(sky2->rx_put, RX_LE_SIZE);
 	return le;
 }
 
@@ -1078,7 +1080,7 @@ err_out:
 /* Modular subtraction in ring */
 static inline int tx_dist(unsigned tail, unsigned head)
 {
-	return (head - tail) % TX_RING_SIZE;
+	return (head - tail) & (TX_RING_SIZE - 1);
 }
 
 /* Number of list elements available for next tx */
@@ -1255,7 +1257,7 @@ static int sky2_xmit_frame(struct sk_buf
 		le->opcode = OP_BUFFER | HW_OWNER;
 
 		fre = sky2->tx_ring
-		    + ((re - sky2->tx_ring) + i + 1) % TX_RING_SIZE;
+		    + RING_NEXT((re - sky2->tx_ring) + i, TX_RING_SIZE);
 		pci_unmap_addr_set(fre, mapaddr, mapping);
 	}
 
@@ -1315,7 +1317,7 @@ static void sky2_tx_complete(struct sky2
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 			struct tx_ring_info *fre;
-			fre = sky2->tx_ring + (put + i + 1) % TX_RING_SIZE;
+			fre = sky2->tx_ring + RING_NEXT(put + i, TX_RING_SIZE);
 			pci_unmap_page(pdev, pci_unmap_addr(fre, mapaddr),
 				       skb_shinfo(skb)->frags[i].size,
 				       PCI_DMA_TODEVICE);
@@ -1876,7 +1878,7 @@ static int sky2_status_intr(struct sky2_
 			break;
 		opcode &= ~HW_OWNER;
 
-		hw->st_idx = (hw->st_idx + 1) % STATUS_RING_SIZE;
+		hw->st_idx = RING_NEXT(hw->st_idx, STATUS_RING_SIZE);
 		le->opcode = 0;
 
 		link = le->link;
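For readers who want to check the equivalence the patch relies on, here is a small userspace sketch. `RING_NEXT` is copied from the patch; `DEMO_RING_SIZE`, `next_masked`, `next_modulo` and `ring_dist` are hypothetical names used only here. The mask form matches the modulo form only when the ring size is a power of two, which this sketch assumes.

```c
/* Same macro as in the patch: valid only when s is a power of two. */
#define RING_NEXT(x, s) (((x) + 1) & ((s) - 1))

/* Hypothetical ring size, chosen only for this illustration. */
#define DEMO_RING_SIZE 8u

/* Advance an index with the mask form introduced by the patch. */
static unsigned next_masked(unsigned idx)
{
	return RING_NEXT(idx, DEMO_RING_SIZE);
}

/* Advance an index with the modulo form the patch replaces. */
static unsigned next_modulo(unsigned idx)
{
	return (idx + 1) % DEMO_RING_SIZE;
}

/* Modular distance from tail to head, in the style of tx_dist(). */
static unsigned ring_dist(unsigned tail, unsigned head)
{
	/* Unsigned wraparound in (head - tail) is harmless under the mask. */
	return (head - tail) & (DEMO_RING_SIZE - 1);
}
```

With a size of 8, index 7 wraps to 0 in both forms, and the distance from tail 6 to head 1 comes out as 3 even though `head - tail` underflows.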
[patch 2/9] sky2: status irq hang fix
The status interrupt flag should be cleared before processing, not
afterwards, to avoid a race. Need to process in poll routine even if no new
interrupt status. This is a normal occurrence when more than 64 frames
(NAPI weight) are processed in one poll routine.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c	2006-05-02 09:42:18.0 -0700
+++ sky2/drivers/net/sky2.c	2006-05-02 09:46:39.0 -0700
@@ -2105,45 +2105,42 @@
 	int work_done = 0;
 	u32 status = sky2_read32(hw, B0_Y2_SP_EISR);
 
-	if (unlikely(status & ~Y2_IS_STAT_BMU)) {
-		if (status & Y2_IS_HW_ERR)
-			sky2_hw_intr(hw);
+	if (status & Y2_IS_HW_ERR)
+		sky2_hw_intr(hw);
 
-		if (status & Y2_IS_IRQ_PHY1)
-			sky2_phy_intr(hw, 0);
+	if (status & Y2_IS_IRQ_PHY1)
+		sky2_phy_intr(hw, 0);
 
-		if (status & Y2_IS_IRQ_PHY2)
-			sky2_phy_intr(hw, 1);
+	if (status & Y2_IS_IRQ_PHY2)
+		sky2_phy_intr(hw, 1);
 
-		if (status & Y2_IS_IRQ_MAC1)
-			sky2_mac_intr(hw, 0);
+	if (status & Y2_IS_IRQ_MAC1)
+		sky2_mac_intr(hw, 0);
 
-		if (status & Y2_IS_IRQ_MAC2)
-			sky2_mac_intr(hw, 1);
+	if (status & Y2_IS_IRQ_MAC2)
+		sky2_mac_intr(hw, 1);
 
-		if (status & Y2_IS_CHK_RX1)
-			sky2_descriptor_error(hw, 0, "receive", Y2_IS_CHK_RX1);
+	if (status & Y2_IS_CHK_RX1)
+		sky2_descriptor_error(hw, 0, "receive", Y2_IS_CHK_RX1);
 
-		if (status & Y2_IS_CHK_RX2)
-			sky2_descriptor_error(hw, 1, "receive", Y2_IS_CHK_RX2);
+	if (status & Y2_IS_CHK_RX2)
+		sky2_descriptor_error(hw, 1, "receive", Y2_IS_CHK_RX2);
 
-		if (status & Y2_IS_CHK_TXA1)
-			sky2_descriptor_error(hw, 0, "transmit", Y2_IS_CHK_TXA1);
+	if (status & Y2_IS_CHK_TXA1)
+		sky2_descriptor_error(hw, 0, "transmit", Y2_IS_CHK_TXA1);
 
-		if (status & Y2_IS_CHK_TXA2)
-			sky2_descriptor_error(hw, 1, "transmit", Y2_IS_CHK_TXA2);
-	}
+	if (status & Y2_IS_CHK_TXA2)
+		sky2_descriptor_error(hw, 1, "transmit", Y2_IS_CHK_TXA2);
 
-	if (status & Y2_IS_STAT_BMU) {
-		work_done = sky2_status_intr(hw, work_limit);
-		*budget -= work_done;
-		dev0->quota -= work_done;
+	if (status & Y2_IS_STAT_BMU)
+		sky2_write32(hw, STAT_CTRL, SC_STAT_CLR_IRQ);
 
-		if (work_done >= work_limit)
-			return 1;
+	work_done = sky2_status_intr(hw, work_limit);
+	*budget -= work_done;
+	dev0->quota -= work_done;
 
-		sky2_write32(hw, STAT_CTRL, SC_STAT_CLR_IRQ);
-	}
+	if (work_done >= work_limit)
+		return 1;
 
 	mod_timer(&hw->idle_timer, jiffies + HZ);
 
[patch 3/9] sky2: tx ring index mask fix
Mask for transmit ring status was picking up bits from the unused sync ring.
They were always zero, so far... Also, make sure to remind self not to make
tx ring too big.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c
+++ sky2/drivers/net/sky2.c
@@ -1927,7 +1927,8 @@ static int sky2_status_intr(struct sky2_
 
 		case OP_TXINDEXLE:
 			/* TX index reports status for both ports */
-			sky2_tx_done(hw->dev[0], status & 0x);
+			BUILD_BUG_ON(TX_RING_SIZE > 0x1000);
+			sky2_tx_done(hw->dev[0], status & 0xfff);
 			if (hw->dev[1])
 				sky2_tx_done(hw->dev[1],
 				     ((status >> 24) & 0xff)
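The effect of a too-wide mask on a packed status word can be shown in a few lines. This is a sketch under assumptions: it assumes, as the `0xfff` mask and the `BUILD_BUG_ON` in the patch suggest, that the port 0 index occupies the low 12 bits and that the bits directly above it belong to another ring. The 16-bit "wide" mask, `DEMO_TX_RING_SIZE` and the function names are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Low 12 bits, matching the mask used by the patched call. */
#define DEMO_TX_INDEX_MASK 0xfffu

/* Hypothetical ring size; a 12-bit index can address at most 0x1000 slots. */
#define DEMO_TX_RING_SIZE 512u

/* Userspace analogue of BUILD_BUG_ON(TX_RING_SIZE > 0x1000). */
_Static_assert(DEMO_TX_RING_SIZE <= 0x1000,
	       "ring index must fit in the 12-bit field");

/* Extract only the 12-bit index field. */
static uint32_t tx_index_narrow(uint32_t status)
{
	return status & DEMO_TX_INDEX_MASK;
}

/* A wider mask (16 bits here, for illustration) also picks up neighbour bits. */
static uint32_t tx_index_wide(uint32_t status)
{
	return status & 0xffffu;
}
```

As long as the neighbouring field is zero both functions agree, which is consistent with the "always zero, so far" remark; once bit 12 or above is set, only the narrow mask still returns a valid ring index.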
[patch 9/9] sky2: version 1.3
Update version number, to track changes.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c
+++ sky2/drivers/net/sky2.c
@@ -51,7 +51,7 @@
 #include "sky2.h"
 
 #define DRV_NAME		"sky2"
-#define DRV_VERSION		"1.2"
+#define DRV_VERSION		"1.3"
 #define PFX			DRV_NAME " "
 
 /*
iproute2 git repository
I moved iproute2 out of CVS. New home is:
	git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git

Will keep CVS tree up to date until the next release; after that it
will rest in peace.
Please pull updated network drivers
These fixes are for 2.6.17, please excuse my git learning curve. I have
about had it for today.

git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/netdev-2.6.git upstream

Daniel Drake:
      softmac: don't reassociate if user asked for deauthentication
      softmac: make non-operational after being stopped

David Woodhouse:
      bcm43xx: Fix access to non-existent PHY registers

Herbert Valerio Riedel:
      au1000_eth.c: use ether_crc() from linux/crc32.h

Jean Delvare:
      ieee80211: Fix A band channel count (resent)

Jens Osterkamp:
      spidernet: introduce new setting
      spidernet: enable support for bcm5461 ethernet phy

Michael Buesch:
      bcm43xx: fix iwmode crash when down
      bcm43xx: Fix array overrun in bcm43xx_geo_init

Sergei Shtylyov:
      Fix RTL8019AS init for Toshiba RBTX49xx boards

Stefano Brivio:
      bcm43xx: check for valid MAC address in SPROM

Stephen Hemminger:
      sky2: backout NAPI reschedule
      sky2: status irq hang fix
      sky2: tx ring index mask fix
      sky2: use mask instead of modulo operation
      sky2: edge triggered workaround enhancement
      sky2: dont write status ring
      sky2: synchronize irq on remove
      Add more support for the Yukon Ultra chip found in dual core centino laptops.
      sky2: version 1.3
      Merge branch 'upstream-fixes' of git://git.kernel.org/.../linville/wireless-2.6
Re: tcp compound
On Tue, 9 May 2006 19:39:43 +0200 Angelo P. Castellani [EMAIL PROTECTED] wrote:

> I resend the file because I've sent an old (quite identical) copy

Moved discussion over to netdev mailing list..

Could you export symbols in tcp_vegas (and change config dependencies) to
allow code reuse rather than having to copy/paste everything from vegas?
Re: iproute2 git repository
On Tue, 09 May 2006 21:51:44 +1000 Herbert Xu [EMAIL PROTECTED] wrote:

> Stephen Hemminger [EMAIL PROTECTED] wrote:
> > I moved iproute2 out of CVS. New home is:
> > git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
>
> Thanks Stephen. BTW, how come there is a checked out tree sitting in
> that git directory?

fixed. stupid git
Re: [RFC] netdev sysfs failure handling
Something like this would handle errors better, but introduce possible
problems for drivers that call register_netdevice with irq's disabled.
There was some comment about racing with linkwatch, but don't see how
that could happen during creation. For 2.6.18?

--- bridge.orig/include/linux/netdevice.h	2006-05-09 11:17:08.0 -0700
+++ bridge/include/linux/netdevice.h	2006-05-09 11:18:52.0 -0700
@@ -433,8 +433,7 @@
 	/* register/unregister state machine */
 	enum { NETREG_UNINITIALIZED=0,
-	       NETREG_REGISTERING,	/* called register_netdevice */
-	       NETREG_REGISTERED,	/* completed register todo */
+	       NETREG_REGISTERED,	/* completed register_netdevice */
 	       NETREG_UNREGISTERING,	/* called unregister_netdevice */
 	       NETREG_UNREGISTERED,	/* completed unregister todo */
 	       NETREG_RELEASED,		/* called free_netdev */
--- bridge.orig/net/core/dev.c	2006-05-09 11:17:09.0 -0700
+++ bridge/net/core/dev.c	2006-05-09 11:37:18.0 -0700
@@ -2777,6 +2777,8 @@
 	BUG_ON(dev_boot_phase);
 	ASSERT_RTNL();
 
+	might_sleep();
+
 	/* When net_device's are persistent, this will be fatal. */
 	BUG_ON(dev->reg_state != NETREG_UNINITIALIZED);
 
@@ -2863,6 +2865,11 @@
 	if (!dev->rebuild_header)
 		dev->rebuild_header = default_rebuild_header;
 
+	ret = netdev_register_sysfs(dev);
+	if (ret)
+		goto out_err;
+	dev->reg_state = NETREG_REGISTERED;
+
 	/*
 	 *	Default initial state at registry is that the
 	 *	device is present.
@@ -2878,14 +2885,11 @@
 	hlist_add_head(&dev->name_hlist, head);
 	hlist_add_head(&dev->index_hlist, dev_index_hash(dev->ifindex));
 	dev_hold(dev);
-	dev->reg_state = NETREG_REGISTERING;
 	write_unlock_bh(&dev_base_lock);
 
 	/* Notify protocols, that a new device appeared. */
 	blocking_notifier_call_chain(&netdev_chain, NETDEV_REGISTER, dev);
 
-	/* Finish registration after unlock */
-	net_set_todo(dev);
 	ret = 0;
 
 out:
@@ -3008,7 +3012,7 @@
  *
  * We are invoked by rtnl_unlock() after it drops the semaphore.
  * This allows us to deal with problems:
- * 1) We can create/delete sysfs objects which invoke hotplug
+ * 1) We can delete sysfs objects which invoke hotplug
  *    without deadlocking with linkwatch via keventd.
  * 2) Since we run with the RTNL semaphore not held, we can sleep
  *    safely in order to wait for the netdev refcnt to drop to zero.
@@ -3017,8 +3021,6 @@
 void netdev_run_todo(void)
 {
 	struct list_head list = LIST_HEAD_INIT(list);
-	int err;
-
 
 	/* Need to guard against multiple cpu's getting out of order. */
 	mutex_lock(&net_todo_run_mutex);
@@ -3041,40 +3043,29 @@
 			= list_entry(list.next, struct net_device, todo_list);
 		list_del(&dev->todo_list);
 
-		switch(dev->reg_state) {
-		case NETREG_REGISTERING:
-			err = netdev_register_sysfs(dev);
-			if (err)
-				printk(KERN_ERR "%s: failed sysfs registration (%d)\n",
-				       dev->name, err);
-			dev->reg_state = NETREG_REGISTERED;
-			break;
-
-		case NETREG_UNREGISTERING:
-			netdev_unregister_sysfs(dev);
-			dev->reg_state = NETREG_UNREGISTERED;
-
-			netdev_wait_allrefs(dev);
-
-			/* paranoia */
-			BUG_ON(atomic_read(&dev->refcnt));
-			BUG_TRAP(!dev->ip_ptr);
-			BUG_TRAP(!dev->ip6_ptr);
-			BUG_TRAP(!dev->dn_ptr);
-
-
-			/* It must be the very last action,
-			 * after this 'dev' may point to freed up memory.
-			 */
-			if (dev->destructor)
-				dev->destructor(dev);
-			break;
-
-		default:
+		if (unlikely(dev->reg_state != NETREG_UNREGISTERING)) {
 			printk(KERN_ERR "network todo '%s' but state %d\n",
 			       dev->name, dev->reg_state);
-			break;
+			dump_stack();
+			continue;
 		}
+
+		netdev_unregister_sysfs(dev);
+		dev->reg_state = NETREG_UNREGISTERED;
+
+		netdev_wait_allrefs(dev);
+
+		/* paranoia */
+		BUG_ON(atomic_read(&dev->refcnt));
+		BUG_TRAP(!dev->ip_ptr);
+		BUG_TRAP(!dev->ip6_ptr);
+		BUG_TRAP(!dev->dn_ptr);
+
+		/* It must be the very last action,
+		 * after this 'dev' may point to freed up memory.
+		 */
+
Re: [RFC PATCH 34/35] Add the Xen virtual network device driver.
The stuff in /proc could easily just be added attributes to the class_device
kobject of the net device (and then show up in sysfs).

> +
> +#define GRANT_INVALID_REF	0
> +
> +#define NET_TX_RING_SIZE __RING_SIZE((struct netif_tx_sring *)0, PAGE_SIZE)
> +#define NET_RX_RING_SIZE __RING_SIZE((struct netif_rx_sring *)0, PAGE_SIZE)
> +
> +static inline void init_skb_shinfo(struct sk_buff *skb)
> +{
> +	atomic_set(&(skb_shinfo(skb)->dataref), 1);
> +	skb_shinfo(skb)->nr_frags = 0;
> +	skb_shinfo(skb)->frag_list = NULL;
> +}
> +

Could you use existing sk_buff_head instead of inventing your own skb queue?

> +struct netfront_info
> +{
> +	struct list_head list;
> +	struct net_device *netdev;
> +
> +	struct net_device_stats stats;
> +	unsigned int tx_full;
> +
> +	struct netif_tx_front_ring tx;
> +	struct netif_rx_front_ring rx;
> +
> +	spinlock_t tx_lock;
> +	spinlock_t rx_lock;
> +
> +	unsigned int handle;
> +	unsigned int evtchn, irq;
> +
> +	/* What is the status of our connection to the remote backend? */
> +#define BEST_CLOSED       0
> +#define BEST_DISCONNECTED 1
> +#define BEST_CONNECTED    2
> +	unsigned int backend_state;
> +
> +	/* Is this interface open or closed (down or up)? */
> +#define UST_CLOSED        0
> +#define UST_OPEN          1
> +	unsigned int user_state;
> +
> +	/* Receive-ring batched refills. */
> +#define RX_MIN_TARGET 8
> +#define RX_DFL_MIN_TARGET 64
> +#define RX_MAX_TARGET NET_RX_RING_SIZE
> +	int rx_min_target, rx_max_target, rx_target;
> +	struct sk_buff_head rx_batch;
> +
> +	struct timer_list rx_refill_timer;
> +
> +	/*
> +	 * {tx,rx}_skbs store outstanding skbuffs. The first entry in each
> +	 * array is an index into a chain of free entries.
> +	 */
> +	struct sk_buff *tx_skbs[NET_TX_RING_SIZE+1];
> +	struct sk_buff *rx_skbs[NET_RX_RING_SIZE+1];
> +
> +	grant_ref_t gref_tx_head;
> +	grant_ref_t grant_tx_ref[NET_TX_RING_SIZE + 1];
> +	grant_ref_t gref_rx_head;
> +	grant_ref_t grant_rx_ref[NET_TX_RING_SIZE + 1];
> +
> +	struct xenbus_device *xbdev;
> +	int tx_ring_ref;
> +	int rx_ring_ref;
> +	u8 mac[ETH_ALEN];

Isn't mac address already stored in dev->dev_addr and/or dev->perm_addr?
Re: [RFC PATCH 34/35] Add the Xen virtual network device driver.
> +static int setup_device(struct xenbus_device *dev, struct netfront_info *info)
> +{
> +	struct netif_tx_sring *txs;
> +	struct netif_rx_sring *rxs;
> +	int err;
> +	struct net_device *netdev = info->netdev;
> +
> +	info->tx_ring_ref = GRANT_INVALID_REF;
> +	info->rx_ring_ref = GRANT_INVALID_REF;
> +	info->rx.sring = NULL;
> +	info->tx.sring = NULL;
> +	info->irq = 0;
> +
> +	txs = (struct netif_tx_sring *)get_zeroed_page(GFP_KERNEL);
> +	if (!txs) {
> +		err = -ENOMEM;
> +		xenbus_dev_fatal(dev, err, "allocating tx ring page");
> +		goto fail;
> +	}
> +	rxs = (struct netif_rx_sring *)get_zeroed_page(GFP_KERNEL);
> +	if (!rxs) {
> +		err = -ENOMEM;
> +		xenbus_dev_fatal(dev, err, "allocating rx ring page");
> +		free_page((unsigned long)txs);
> +		goto fail;
> +	}
> +	info->backend_state = BEST_DISCONNECTED;
> +
> +	SHARED_RING_INIT(txs);
> +	FRONT_RING_INIT(&info->tx, txs, PAGE_SIZE);
> +
> +	SHARED_RING_INIT(rxs);
> +	FRONT_RING_INIT(&info->rx, rxs, PAGE_SIZE);
> +
> +	err = xenbus_grant_ring(dev, virt_to_mfn(txs));
> +	if (err < 0)
> +		goto fail;
> +	info->tx_ring_ref = err;
> +
> +	err = xenbus_grant_ring(dev, virt_to_mfn(rxs));
> +	if (err < 0)
> +		goto fail;
> +	info->rx_ring_ref = err;
> +
> +	err = xenbus_alloc_evtchn(dev, &info->evtchn);
> +	if (err)
> +		goto fail;
> +
> +	memcpy(netdev->dev_addr, info->mac, ETH_ALEN);
> +	network_connect(netdev);
> +	info->irq = bind_evtchn_to_irqhandler(
> +		info->evtchn, netif_int, SA_SAMPLE_RANDOM, netdev->name,

This doesn't look like a real random entropy source. Packets arriving from
another domain are easily timed.
Re: [RFC] netdev sysfs failure handling
On Tue, 09 May 2006 14:05:01 -0700 (PDT) David S. Miller [EMAIL PROTECTED] wrote:

> From: Stephen Hemminger [EMAIL PROTECTED]
> Date: Tue, 9 May 2006 12:01:07 -0700
>
> > Something like this would handle errors better, but introduce possible
> > problems for drivers that call register_netdevice with irq's disabled.
> > There was some comment about racing with linkwatch, but don't see how
> > that could happen during creation. For 2.6.18?
>
> I've been thinking about this a bit more. How can anyone be using
> this with IRQ's disabled if we have an ASSERT_RTNL() there?

Agreed, especially since rtnl is now a real mutex. The case that I was
worried about:

	rtnl_lock()
	spin_lock_irq(mylock);
	x = register_netdevice();
	...

Doesn't show up in any current code, even for the pseudo devices and
funny virtualized interfaces.
[PATCH] sky2: ifdown kills irq mask
Bringing down a port also masks off the status and other IRQs needed for
the device to function, due to missing parentheses.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c
+++ sky2/drivers/net/sky2.c
@@ -128,6 +128,7 @@ MODULE_DEVICE_TABLE(pci, sky2_id_table);
 /* Avoid conditionals by using array */
 static const unsigned txqaddr[] = { Q_XA1, Q_XA2 };
 static const unsigned rxqaddr[] = { Q_R1, Q_R2 };
+static const u32 portirq_msk[] = { Y2_IS_PORT_1, Y2_IS_PORT_2 };
 
 /* This driver supports yukon2 chipset only */
 static const char *yukon2_name[] = {
@@ -1084,7 +1085,7 @@ static int sky2_up(struct net_device *de
 
 	/* Enable interrupts from phy/mac for port */
 	imask = sky2_read32(hw, B0_IMSK);
-	imask |= (port == 0) ? Y2_IS_PORT_1 : Y2_IS_PORT_2;
+	imask |= portirq_msk[port];
 	sky2_write32(hw, B0_IMSK, imask);
 
 	return 0;
@@ -1435,7 +1436,7 @@ static int sky2_down(struct net_device *
 
 	/* Disable port IRQ */
 	imask = sky2_read32(hw, B0_IMSK);
-	imask &= ~(sky2->port == 0) ? Y2_IS_PORT_1 : Y2_IS_PORT_2;
+	imask &= ~portirq_msk[port];
 	sky2_write32(hw, B0_IMSK, imask);
 
 	/* turn off LED's */
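The precedence problem in the old `sky2_down()` line is easy to reproduce in userspace. In C, unary `~` binds tighter than `?:`, and `?:` binds tighter than `&=`, so the old statement parses as `imask &= ((~(port == 0)) ? PORT_1 : PORT_2)`. The sketch below uses hypothetical bit values (`DEMO_IS_PORT_1`, `DEMO_IS_PORT_2`) and hypothetical function names; the real register layout is not taken from the source.

```c
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define DEMO_IS_PORT_1 0x01u
#define DEMO_IS_PORT_2 0x02u

static const uint32_t demo_portirq_msk[] = { DEMO_IS_PORT_1, DEMO_IS_PORT_2 };

/*
 * Same parse as the old line:
 *     imask &= ~(port == 0) ? PORT_1 : PORT_2;
 * The comparison is held in an int first only to keep compilers from
 * warning about '~' applied to a boolean expression.
 */
static uint32_t mask_down_buggy(uint32_t imask, unsigned port)
{
	int is_port0 = (port == 0);

	/* ~0 is -1 and ~1 is -2: both non-zero, so PORT_1 is always chosen. */
	imask &= ~is_port0 ? DEMO_IS_PORT_1 : DEMO_IS_PORT_2;
	return imask;
}

/* Table lookup as in the patch: clears only the selected port bit. */
static uint32_t mask_down_fixed(uint32_t imask, unsigned port)
{
	imask &= ~demo_portirq_msk[port];
	return imask;
}
```

Starting from a mask of `0xf3`, the buggy form leaves `0x01` for either port, wiping every other enabled interrupt bit, while the fixed form leaves `0xf2` for port 0 and `0xf1` for port 1.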
Re: [RFC] netdev sysfs failure handling
On Tue, 09 May 2006 15:43:22 -0700 (PDT) David S. Miller [EMAIL PROTECTED] wrote:

> From: Stephen Hemminger [EMAIL PROTECTED]
> Date: Tue, 9 May 2006 14:40:49 -0700
>
> > Agreed, especially since rtnl is now a real mutex. The case, that I was
> > worried about:
> > 	rtnl_lock()
> > 	spin_lock_irq(mylock);
> > 	x = register_netdevice();
> > 	...
> > Doesn't show up in any current code, even for the pseudo devices and
> > funny virtualized interfaces.
>
> Right, therefore I think we should put something like your patch in
> there now perhaps. The case where we really needed the todo list is
> unregister, so that we can safely wait for all references to the net
> device to go away.
>
> I still wonder about those mentioned hotplug races wrt. linkwatch in
> the comment above netdev_run_todo(). Linkwatch is such a nuisance
> because it combines asynchronous link state change processing with
> keventd and RTNL locking.
>
> It sleeps waiting for __LINK_STATE_SCHED to clear with the RTNL held
> (via dev_deactivate()). But then again dev_close() code paths do this
> too, so the dev_deactivate() bit should be OK.
>
> Linkwatch, after doing the dev_activate(), emits a NETDEV_CHANGE
> notifier on netdev_chain and also sends out an RTM_NETLINK message.
> This is for the case where IFF_UP is set.
>
> Until we release the RTNL semaphore, during netdev register, nobody
> can go in and inspect the state of a net device. So doing the sysfs
> node creation in register_netdevice() should be OK as far as I can
> tell.
>
> Can anyone find a problem with this?

Also, by getting the netdevice fully in sysfs under RTNL, we are safe from
races with the hotplug uevent that occurs. Right now, it might be possible
on SMP for the hotplug to happen after register_netdevice, but before
the device shows up in sysfs.
[PATCH 0/2] register_netdevice and sysfs changes
This is a signed-off version of yesterday's fix, plus the bridge code no longer needs to be so tricky.
[PATCH 2/2] bridge: do sysfs registration inside rtnl
Now that netdevice sysfs registration is done as part of register_netdevice,
bridge code no longer has to be tricky when adding its kobjects to bridges.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- bridge.orig/net/bridge/br_if.c	2006-05-04 16:22:29.0 -0700
+++ bridge/net/bridge/br_if.c	2006-05-09 11:27:16.0 -0700
@@ -308,26 +308,19 @@
 	if (ret)
 		goto err2;
 
-	/* network device kobject is not setup until
-	 * after rtnl_unlock does it's hotplug magic.
-	 * so hold reference to avoid race.
-	 */
-	dev_hold(dev);
-	rtnl_unlock();
-
 	ret = br_sysfs_addbr(dev);
-	dev_put(dev);
-
-	if (ret)
-		unregister_netdev(dev);
- out:
-	return ret;
+	if (ret)
+		goto err3;
+	rtnl_unlock();
+	return 0;
 
+ err3:
+	unregister_netdev(dev);
  err2:
 	free_netdev(dev);
  err1:
 	rtnl_unlock();
-	goto out;
+	return ret;
 }
 
 int br_del_bridge(const char *name)
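The control flow the patch ends up with is the usual stacked-label unwind: each setup step that fails jumps to a label that undoes only the steps already completed. Below is a userspace sketch of that shape, not the bridge code itself. `demo_trace`, `demo_add_bridge` and the `DEMO_FAIL_*` values are hypothetical, the error values are arbitrary, and the kernel calls are represented only by comments and flag updates.

```c
/* Hypothetical failure-injection points, for illustration only. */
enum { DEMO_FAIL_NONE, DEMO_FAIL_REGISTER, DEMO_FAIL_SYSFS };

/* Records which resources are currently held. */
struct demo_trace {
	int locked;	/* stands in for holding the RTNL */
	int allocated;	/* stands in for the allocated net device */
	int registered;	/* stands in for a registered net device */
};

static int demo_add_bridge(struct demo_trace *t, int fail_at)
{
	int ret;

	t->locked = 1;			/* take the lock */
	t->allocated = 1;		/* allocate the device */

	ret = (fail_at == DEMO_FAIL_REGISTER) ? -1 : 0;	/* register step */
	if (ret)
		goto err2;
	t->registered = 1;

	ret = (fail_at == DEMO_FAIL_SYSFS) ? -2 : 0;	/* sysfs step, still locked */
	if (ret)
		goto err3;

	t->locked = 0;			/* success: drop the lock and return */
	return 0;

err3:
	t->registered = 0;		/* undo the register step */
err2:
	t->allocated = 0;		/* free the device */
	t->locked = 0;			/* drop the lock on every error path */
	return ret;
}
```

Because the sysfs step now runs while the lock is still held, a failure there can fall through the same labels as the earlier failures, which is what removes the need for the old hold/unlock/put sequence.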
[PATCH 1/2] netdev: do sysfs registration as part of register_netdevice
The last step of netdevice registration was being done by a delayed
call, but because it was delayed, it was impossible to return any error
code if the class_device registration failed.

Side effects:
 * one state in registration process is unnecessary.
 * register_netdevice can sleep inside class_device registration/hotplug
 * code in netdev_run_todo only does unregistration so it is simpler.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- bridge.orig/include/linux/netdevice.h	2006-05-09 11:17:08.0 -0700
+++ bridge/include/linux/netdevice.h	2006-05-09 11:18:52.0 -0700
@@ -433,8 +433,7 @@
 	/* register/unregister state machine */
 	enum { NETREG_UNINITIALIZED=0,
-	       NETREG_REGISTERING,	/* called register_netdevice */
-	       NETREG_REGISTERED,	/* completed register todo */
+	       NETREG_REGISTERED,	/* completed register_netdevice */
 	       NETREG_UNREGISTERING,	/* called unregister_netdevice */
 	       NETREG_UNREGISTERED,	/* completed unregister todo */
 	       NETREG_RELEASED,		/* called free_netdev */
--- bridge.orig/net/core/dev.c	2006-05-09 11:17:09.0 -0700
+++ bridge/net/core/dev.c	2006-05-09 11:37:18.0 -0700
@@ -2777,6 +2777,8 @@
 	BUG_ON(dev_boot_phase);
 	ASSERT_RTNL();
 
+	might_sleep();
+
 	/* When net_device's are persistent, this will be fatal. */
 	BUG_ON(dev->reg_state != NETREG_UNINITIALIZED);
 
@@ -2863,6 +2865,11 @@
 	if (!dev->rebuild_header)
 		dev->rebuild_header = default_rebuild_header;
 
+	ret = netdev_register_sysfs(dev);
+	if (ret)
+		goto out_err;
+	dev->reg_state = NETREG_REGISTERED;
+
 	/*
 	 *	Default initial state at registry is that the
 	 *	device is present.
@@ -2878,14 +2885,11 @@
 	hlist_add_head(&dev->name_hlist, head);
 	hlist_add_head(&dev->index_hlist, dev_index_hash(dev->ifindex));
 	dev_hold(dev);
-	dev->reg_state = NETREG_REGISTERING;
 	write_unlock_bh(&dev_base_lock);
 
 	/* Notify protocols, that a new device appeared. */
 	blocking_notifier_call_chain(&netdev_chain, NETDEV_REGISTER, dev);
 
-	/* Finish registration after unlock */
-	net_set_todo(dev);
 	ret = 0;
 
 out:
@@ -3008,7 +3012,7 @@
  *
  * We are invoked by rtnl_unlock() after it drops the semaphore.
  * This allows us to deal with problems:
- * 1) We can create/delete sysfs objects which invoke hotplug
+ * 1) We can delete sysfs objects which invoke hotplug
  *    without deadlocking with linkwatch via keventd.
  * 2) Since we run with the RTNL semaphore not held, we can sleep
  *    safely in order to wait for the netdev refcnt to drop to zero.
@@ -3017,8 +3021,6 @@
 void netdev_run_todo(void)
 {
 	struct list_head list = LIST_HEAD_INIT(list);
-	int err;
-
 
 	/* Need to guard against multiple cpu's getting out of order. */
 	mutex_lock(&net_todo_run_mutex);
@@ -3041,40 +3043,29 @@
 			= list_entry(list.next, struct net_device, todo_list);
 		list_del(&dev->todo_list);
 
-		switch(dev->reg_state) {
-		case NETREG_REGISTERING:
-			err = netdev_register_sysfs(dev);
-			if (err)
-				printk(KERN_ERR "%s: failed sysfs registration (%d)\n",
-				       dev->name, err);
-			dev->reg_state = NETREG_REGISTERED;
-			break;
-
-		case NETREG_UNREGISTERING:
-			netdev_unregister_sysfs(dev);
-			dev->reg_state = NETREG_UNREGISTERED;
-
-			netdev_wait_allrefs(dev);
-
-			/* paranoia */
-			BUG_ON(atomic_read(&dev->refcnt));
-			BUG_TRAP(!dev->ip_ptr);
-			BUG_TRAP(!dev->ip6_ptr);
-			BUG_TRAP(!dev->dn_ptr);
-
-
-			/* It must be the very last action,
-			 * after this 'dev' may point to freed up memory.
-			 */
-			if (dev->destructor)
-				dev->destructor(dev);
-			break;
-
-		default:
+		if (unlikely(dev->reg_state != NETREG_UNREGISTERING)) {
 			printk(KERN_ERR "network todo '%s' but state %d\n",
 			       dev->name, dev->reg_state);
-			break;
+			dump_stack();
+			continue;
 		}
+
+		netdev_unregister_sysfs(dev);
+		dev->reg_state = NETREG_UNREGISTERED;
+
+		netdev_wait_allrefs(dev);
+
+		/* paranoia */
+		BUG_ON(atomic_read(&dev->refcnt));
+		BUG_TRAP(!dev->ip_ptr);
+		BUG_TRAP(!dev->ip6_ptr
please pull upstream branch of netdev-2.6
The following changes since commit 6810b548b25114607e0814612d84125abccc0a4f:
  Andi Kleen:
        x86_64: Move ondemand timer into own work queue

are found in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/netdev-2.6.git upstream

Francois Romieu:
      dl2k: use DMA_48BIT_MASK constant

Herbert Valerio Riedel:
      phy: mdiobus_register(): initialize all phy_map entries

James Cameron:
      sis900: phy for FoxCon motherboard

Stephen Hemminger:
      sky2: ifdown kills irq mask

 drivers/net/dl2k.c          |   12 ++--
 drivers/net/phy/mdio_bus.c  |    4 +++-
 drivers/net/sis900.c        |    1 +
 drivers/net/sky2.c          |    5 +++--
 include/linux/dma-mapping.h |    1 +
 5 files changed, 14 insertions(+), 9 deletions(-)
Re: [PATCH 3/6] myri10ge - Driver header files
On Wed, 10 May 2006 23:36:18 +0200 Brice Goglin [EMAIL PROTECTED] wrote: [PATCH 3/6] myri10ge - Driver header files myri10ge driver header files. myri10ge_mcp.h is the generic header, while myri10ge_mcp_gen_header.h is automatically generated from our firmware image. Then clean it up after the auto generation. Auto generated code still gets maintained by humans. Signed-off-by: Brice Goglin [EMAIL PROTECTED] Signed-off-by: Andrew J. Gallatin [EMAIL PROTECTED] myri10ge_mcp.h| 233 ++ myri10ge_mcp_gen_header.h | 73 ++ 2 files changed, 306 insertions(+) --- /dev/null 2006-04-21 00:45:09.06443 -0700 +++ linux-mm/drivers/net/myri10ge/myri10ge_mcp.h 2006-04-21 08:20:59.0 -0700 @@ -0,0 +1,233 @@ +#ifndef _myri10ge_mcp_h +#define _myri10ge_mcp_h + +#define MYRI10GE_MCP_MAJOR 1 +#define MYRI10GE_MCP_MINOR 4 + Major/Minor for what. You don't have a character device. +#ifdef MYRI10GE_MCP +typedef signed char int8_t; +typedef signed shortint16_t; +typedef signed int int32_t; +typedef signed long longint64_t; +typedef unsigned char uint8_t; +typedef unsigned short uint16_t; +typedef unsigned int uint32_t; +typedef unsigned long long uint64_t; +#endif Use u8 u16 u32 +/* 8 Bytes */ +typedef struct +{ + uint32_t high; + uint32_t low; +} mcp_dma_addr_t; Run this through scripts/Lindent and get indentation right +/* 16 Bytes */ +typedef struct +{ + uint16_t checksum; + uint16_t length; +} mcp_slot_t; + +/* 64 Bytes */ +typedef struct +{ + uint32_t cmd; + uint32_t data0;/* will be low portion if data 32 bits */ + /* 8 */ + uint32_t data1;/* will be high portion if data 32 bits */ + uint32_t data2;/* currently unused.. */ + /* 16 */ + mcp_dma_addr_t response_addr; + /* 24 */ + uint8_t pad[40]; +} mcp_cmd_t; + +/* 8 Bytes */ +typedef struct +{ + uint32_t data; + uint32_t result; +} mcp_cmd_response_t; + + + +/* + flags used in mcp_kreq_ether_send_t: + + The SMALL flag is only needed in the first segment. It is raised + for packets that are total less or equal 512 bytes. 
+ + The CKSUM flag must be set in all segments. + + The PADDED flags is set if the packet needs to be padded, and it + must be set for all segments. + + The MYRI10GE_MCP_ETHER_FLAGS_ALIGN_ODD must be set if the cumulative + length of all previous segments was odd. +*/ + + +#define MYRI10GE_MCP_ETHER_FLAGS_SMALL 0x1 +#define MYRI10GE_MCP_ETHER_FLAGS_TSO_HDR0x1 +#define MYRI10GE_MCP_ETHER_FLAGS_FIRST 0x2 +#define MYRI10GE_MCP_ETHER_FLAGS_ALIGN_ODD 0x4 +#define MYRI10GE_MCP_ETHER_FLAGS_CKSUM 0x8 +#define MYRI10GE_MCP_ETHER_FLAGS_TSO_LAST 0x8 +#define MYRI10GE_MCP_ETHER_FLAGS_NO_TSO 0x10 +#define MYRI10GE_MCP_ETHER_FLAGS_TSO_CHOP 0x10 +#define MYRI10GE_MCP_ETHER_FLAGS_TSO_PLD0x20 + +#define MYRI10GE_MCP_ETHER_SEND_SMALL_SIZE 1520 +#define MYRI10GE_MCP_ETHER_MAX_MTU 9400 + +typedef union mcp_pso_or_cumlen +{ + uint16_t pseudo_hdr_offset; + uint16_t cum_len; +} mcp_pso_or_cumlen_t; + +#define MYRI10GE_MCP_ETHER_MAX_SEND_DESC 12 +#define MYRI10GE_MCP_ETHER_PAD 2 + +/* 16 Bytes */ +typedef struct +{ + uint32_t addr_high; + uint32_t addr_low; + uint16_t pseudo_hdr_offset; + uint16_t length; + uint8_t pad; + uint8_t rdma_count; + uint8_t cksum_offset; /* where to start computing cksum */ + uint8_t flags;/* as defined above */ +} mcp_kreq_ether_send_t; + +/* 8 Bytes */ +typedef struct +{ + uint32_t addr_high; + uint32_t addr_low; +} mcp_kreq_ether_recv_t; + + +/* Commands */ + +#define MYRI10GE_MCP_CMD_OFFSET 0xf8 + +typedef enum { + MYRI10GE_MCP_CMD_NONE = 0, + /* Reset the mcp, it is left in a safe state, waiting + for the driver to set all its parameters */ + MYRI10GE_MCP_CMD_RESET, + + /* get the version number of the current firmware.. + (may be available in the eeprom strings..? */ + MYRI10GE_MCP_GET_MCP_VERSION, + + + /* Parameters which must be set by the driver before it can + issue MYRI10GE_MCP_CMD_ETHERNET_UP. 
They persist until the next + MYRI10GE_MCP_CMD_RESET is issued */ + + MYRI10GE_MCP_CMD_SET_INTRQ_DMA, + MYRI10GE_MCP_CMD_SET_BIG_BUFFER_SIZE, /* in bytes, power of 2 */ + MYRI10GE_MCP_CMD_SET_SMALL_BUFFER_SIZE,/* in bytes */ + + + /* Parameters which refer to lanai SRAM addresses where the + driver must issue PIO writes for various things */ + + MYRI10GE_MCP_CMD_GET_SEND_OFFSET, + MYRI10GE_MCP_CMD_GET_SMALL_RX_OFFSET, + MYRI10GE_MCP_CMD_GET_BIG_RX_OFFSET, + MYRI10GE_MCP_CMD_GET_IRQ_ACK_OFFSET, + MYRI10GE_MCP_CMD_GET_IRQ_DEASSERT_OFFSET, + + /* Parameters which refer to rings stored on the MCP, + and whose size is controlled by the mcp */ + +
Re: [PATCH 4/6] myri10ge - First half of the driver
On Wed, 10 May 2006 14:40:22 -0700 (PDT) Brice Goglin [EMAIL PROTECTED] wrote:

> [PATCH 4/6] myri10ge - First half of the driver
>
> The first half of the myri10ge driver core.

Splitting it in half might help with email restrictions, but it kills future users of 'git bisect', who expect to have every kernel buildable.
Re: [PATCH 5/6] myri10ge - Second half of the driver
On Wed, 10 May 2006 14:42:41 -0700 (PDT) Brice Goglin [EMAIL PROTECTED] wrote: [PATCH 5/6] myri10ge - Second half of the driver The second half of the myri10ge driver core. Signed-off-by: Brice Goglin [EMAIL PROTECTED] Signed-off-by: Andrew J. Gallatin [EMAIL PROTECTED] myri10ge.c | 1540 + 1 file changed, 1540 insertions(+) --- linux/drivers/net/myri10ge/myri10ge.c.old 2006-05-09 23:00:54.0 +0200 +++ linux/drivers/net/myri10ge/myri10ge.c 2006-05-09 23:00:54.0 +0200 @@ -1481,3 +1481,1543 @@ static struct ethtool_ops myri10ge_ethto .get_stats_count= myri10ge_get_stats_count, .get_ethtool_stats = myri10ge_get_ethtool_stats }; + +static int +myri10ge_open(struct net_device *dev) It is preferred to put function declarations on one line. static int mril10ge_open(struct net_device *dev) +{ + struct myri10ge_priv *mgp; + size_t bytes; + myri10ge_cmd_t cmd; + int tx_ring_size, rx_ring_size; + int tx_ring_entries, rx_ring_entries; + int i, status, big_pow2; + + mgp = dev-priv; use netdev_priv(dev) + + if (mgp-running != MYRI10GE_ETH_STOPPED) + return -EBUSY; + + mgp-running = MYRI10GE_ETH_STARTING; + status = myri10ge_reset(mgp); + /* If the user sets an obscenely small MTU, adjust the small + * bytes down to nearly nothing */ + if (mgp-small_bytes = (dev-mtu + ETH_HLEN)) + mgp-small_bytes = 64; You should enforce mtu = 68 in your driver (see eth_change_mtu) +static int +myri10ge_close(struct net_device *dev) +{ + struct myri10ge_priv *mgp; + struct sk_buff *skb; + myri10ge_tx_buf_t *tx; + int status, i, old_down_cnt, len, idx; + myri10ge_cmd_t cmd; + + mgp = dev-priv; + + if (mgp-running != MYRI10GE_ETH_RUNNING) + return 0; + + if (mgp-tx.req_bytes == NULL) + return 0; + + del_timer_sync(mgp-watchdog_timer); + mgp-running = MYRI10GE_ETH_STOPPING; + if (myri10ge_napi) + netif_poll_disable(mgp-dev); + netif_carrier_off(dev); + netif_stop_queue(dev); + old_down_cnt = mgp-down_cnt; + mb(); + status = myri10ge_send_cmd(mgp, MYRI10GE_MCP_CMD_ETHERNET_DOWN, cmd); + if (status) { + 
printk(KERN_ERR myri10ge: %s: Couldn't bring down link\n, +dev-name); + } + set_current_state (TASK_UNINTERRUPTIBLE); + if (old_down_cnt == mgp-down_cnt) + schedule_timeout(HZ); + set_current_state(TASK_RUNNING); + if (old_down_cnt == mgp-down_cnt) { + printk(KERN_ERR myri10ge: %s never got down irq\n, +dev-name); + } Better to use a wait_queue and wait_event() +#ifdef NETIF_F_TSO +static inline unsigned long +myri10ge_tcpend(struct sk_buff *skb) +{ + struct iphdr *ip; + int iphlen, tcplen; + struct tcphdr *tcp; + + ip = (struct iphdr *) ((char *) skb-data + 14); + iphlen = ip-ihl 2; + tcp = (struct tcphdr *) ((char *) ip + iphlen); + tcplen = tcp-doff 2; + return (tcplen + iphlen + 14); +} +#endif The information you want is already in skb-nh.iph and skb-h.th and it works with VLAN's. Your code doesn't. + +static inline void +myri10ge_csum_fixup(struct sk_buff *skb, int cksum_offset, + int pseudo_hdr_offset) +{ + int csum; + uint16_t *csum_ptr; + + + csum = skb_checksum(skb, cksum_offset, + skb-len - cksum_offset, 0); + csum_ptr = (uint16_t *) (skb-h.raw + skb-csum); + if (!pskb_may_pull(skb, pseudo_hdr_offset)) { + printk(KERN_ERR myri10ge: can't pull skb %d\n, +pseudo_hdr_offset); + return; + } + *csum_ptr = csum_fold(csum); + /* need to fixup IPv4 UDP packets according to RFC768 */ + if (unlikely(*csum_ptr == 0 + skb-protocol == htons(ETH_P_IP) + skb-nh.iph-protocol == IPPROTO_UDP)) { + *csum_ptr = 0x; + } +} Use skb_checksum_help() instead of this code... + +/* + * Transmit a packet. We need to split the packet so that a single + * segment does not cross myri10ge-tx.boundary, so this makes segment + * counting tricky. So rather than try to count segments up front, we + * just give up if there are too few segments to hold a reasonably + * fragmented packet currently available. If we run + * out of segments while preparing a packet for DMA, we just linearize + * it and try again. 
+ */ + +static int +myri10ge_xmit(struct sk_buff *skb, struct net_device *dev) +{ + struct myri10ge_priv *mgp = dev-priv; + mcp_kreq_ether_send_t *req; + myri10ge_tx_buf_t *tx = mgp-tx; + struct skb_frag_struct *frag; + dma_addr_t bus; + uint32_t low, high_swapped; +
[PATCH] bonding: fix sparse warnings
Fix warning from sparse in bonding code about incorrect type in assignment.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- orig/drivers/net/bonding/bond_main.c	2006-05-04 16:22:10.0 -0700
+++ new/drivers/net/bonding/bond_main.c	2006-05-10 16:04:38.0 -0700
@@ -629,7 +629,7 @@
 	ioctl = slave_dev->do_ioctl;
 	strncpy(ifr.ifr_name, slave_dev->name, IFNAMSIZ);
 	etool.cmd = ETHTOOL_GSET;
-	ifr.ifr_data = (char*)&etool;
+	ifr.ifr_data = (void __user *) &etool;
 	if (!ioctl || (IOCTL(slave_dev, &ifr, SIOCETHTOOL) < 0)) {
 		return -1;
 	}
@@ -726,7 +726,7 @@
 	if (ioctl) {
 		strncpy(ifr.ifr_name, slave_dev->name, IFNAMSIZ);
 		etool.cmd = ETHTOOL_GLINK;
-		ifr.ifr_data = (char*)&etool;
+		ifr.ifr_data = (void __user *) &etool;
 		if (IOCTL(slave_dev, &ifr, SIOCETHTOOL) == 0) {
 			if (etool.data == 1) {
 				return BMSR_LSTATUS;
Re: [PATCH] bonding: fix sparse warnings
On Thu, 11 May 2006 00:22:03 +0100 Al Viro [EMAIL PROTECTED] wrote:

> On Wed, May 10, 2006 at 04:14:05PM -0700, Stephen Hemminger wrote:
> > Fix warning from sparse in bonding code about incorrect type in assignment
>
> *snerk*
>
> Only if you are building without -Wcast-to-as. It _is_ incorrect type
> in assignment. And the real fix is to expand the call, killing set_fs()
> in there.

More like this (in br_if.c)?

	struct ethtool_cmd ecmd = { ETHTOOL_GSET };
	struct ifreq ifr;
	mm_segment_t old_fs;
	int err;

	strncpy(ifr.ifr_name, dev->name, IFNAMSIZ);
	ifr.ifr_data = (void __user *) &ecmd;

	old_fs = get_fs();
	set_fs(KERNEL_DS);
	err = dev_ethtool(&ifr);
	set_fs(old_fs);

	if (!err) ...
Re: [RFC PATCH 34/35] Add the Xen virtual network device driver.
On Thu, 11 May 2006 11:47:52 +0200 Andi Kleen [EMAIL PROTECTED] wrote: On Thursday 11 May 2006 09:49, Keir Fraser wrote: On 11 May 2006, at 01:33, Herbert Xu wrote: But if sampling virtual events for randomness is really unsafe (is it really?) then native guests in Xen would also get bad random numbers and this would need to be somehow addressed. Good point. I wonder what VMWare does in this situation. Well, there's not much they can do except maybe jitter interrupt delivery. I doubt they do that though. The original complaint in our case was that we take entropy from interrupts caused by other local VMs, as well as external sources. There was a feeling that the former was more predictable and could form the basis of an attack. I have to say I'm unconvinced: I don't really see that it's significantly easier to inject precisely-timed interrupts into a local VM. Certainly not to better than +/- a few microseconds. As long as you add cycle-counter info to the entropy pool, the least significant bits of that will always be noise. I think I agree - e.g. i would expect the virtual interrupts to have enough jitter too. Maybe it would be good if someone could run a few statistics on the resulting numbers? Ok the randomness added doesn't consist only of the least significant bits. Currently it adds jiffies+full 32bit cycle count. I guess if it was a real problem the code could be changed to leave out the jiffies and only add maybe a 8 bit word from the low bits. But that would only help for the para case because the algorithm for native guests cannot be changed. 2. An entropy front/back is tricky -- how do we decide how much entropy to pull from domain0? How much should domain0 be prepared to give other domains? How easy is it to DoS domain0 by draining its entropy pool? Yuk. I claim (without having read any code) that in theory you need to have solved that problem already in the vTPM @) The base question under all this is how good does an entropy source have to be? 
and then what guarantees do we make about the entropy inputs used by /dev/random? If we can resolve those, then the virtual environment answer should fall out. This is an area where the security tin-foil hat types take over, and it gets real hard to make a good enough argument. People have built an expectation that /dev/random has really strong entropy, good enough to generate long term keys etc.
[PATCH] expose simplified skb_checksum_recalc
Many users of skb_checksum_help() are just using it to recalculate outbound checksum, so why not expose the interface in a more useful way. Suggested by Ingo Oeser. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- linux-2.6.orig/include/linux/skbuff.h 2006-04-27 11:12:53.0 -0700 +++ linux-2.6/include/linux/skbuff.h2006-05-11 11:17:39.0 -0700 @@ -1343,6 +1343,24 @@ __skb_checksum_complete(skb); } +extern int skb_checksum_recalc(struct sk_buff *skb); +/** + * skb_checksum_help - recalculate checksum of packet + * @skb: packet to process + * @inward: direction of flow, zero is receiving + * + * Invalidate hardware checksum when packet is to be mangled on + * receive and complete checksum manually on outgoing path. + */ +static inline int skb_checksum_help(struct sk_buff *skb, int inward) +{ + if (inward) { + skb-ip_summed = CHECKSUM_NONE; + return 0; + } + return skb_checksum_recalc(skb); +} + #ifdef CONFIG_NETFILTER static inline void nf_conntrack_put(struct nf_conntrack *nfct) { --- sky2.orig/net/core/dev.c2006-05-10 10:17:51.0 -0700 +++ sky2/net/core/dev.c 2006-05-11 11:22:27.0 -0700 @@ -1144,39 +1144,6 @@ EXPORT_SYMBOL(netif_device_attach); -/* - * Invalidate hardware checksum when packet is to be mangled, and - * complete checksum manually on outgoing path. - */ -int skb_checksum_help(struct sk_buff *skb, int inward) -{ - unsigned int csum; - int ret = 0, offset = skb-h.raw - skb-data; - - if (inward) { - skb-ip_summed = CHECKSUM_NONE; - goto out; - } - - if (skb_cloned(skb)) { - ret = pskb_expand_head(skb, 0, 0, GFP_ATOMIC); - if (ret) - goto out; - } - - BUG_ON(offset (int)skb-len); - csum = skb_checksum(skb, offset, skb-len-offset, 0); - - offset = skb-tail - skb-h.raw; - BUG_ON(offset = 0); - BUG_ON(skb-csum + 2 offset); - - *(u16*)(skb-h.raw + skb-csum) = csum_fold(csum); - skb-ip_summed = CHECKSUM_NONE; -out: - return ret; -} - /* Take action when hardware reception checksum errors are detected. 
*/ #ifdef CONFIG_BUG void netdev_rx_csum_fault(struct net_device *dev) @@ -3403,7 +3370,6 @@ EXPORT_SYMBOL(register_gifconf); EXPORT_SYMBOL(register_netdevice); EXPORT_SYMBOL(register_netdevice_notifier); -EXPORT_SYMBOL(skb_checksum_help); EXPORT_SYMBOL(synchronize_net); EXPORT_SYMBOL(unregister_netdevice); EXPORT_SYMBOL(unregister_netdevice_notifier); --- sky2.orig/net/core/skbuff.c 2006-04-27 11:12:54.0 -0700 +++ sky2/net/core/skbuff.c 2006-05-11 11:23:13.0 -0700 @@ -1334,6 +1334,36 @@ } /** + * skb_checksum_recalc - force software checksum + * @skb: skb to process + * Force complete checksum, this is used to force a software checksum + * on the outgoing path. + */ +int skb_checksum_recalc(struct sk_buff *skb) +{ + unsigned int csum; + int ret = 0, offset = skb-h.raw - skb-data; + + if (skb_cloned(skb)) { + ret = pskb_expand_head(skb, 0, 0, GFP_ATOMIC); + if (ret) + goto out; + } + + BUG_ON(offset (int)skb-len); + csum = skb_checksum(skb, offset, skb-len-offset, 0); + + offset = skb-tail - skb-h.raw; + BUG_ON(offset = 0); + BUG_ON(skb-csum + 2 offset); + + *(u16*)(skb-h.raw + skb-csum) = csum_fold(csum); + skb-ip_summed = CHECKSUM_NONE; +out: + return ret; +} + +/** * skb_dequeue - remove from the head of the queue * @list: list to dequeue from * @@ -1854,6 +1884,7 @@ EXPORT_SYMBOL(pskb_copy); EXPORT_SYMBOL(pskb_expand_head); EXPORT_SYMBOL(skb_checksum); +EXPORT_SYMBOL(skb_checksum_recalc); EXPORT_SYMBOL(skb_clone); EXPORT_SYMBOL(skb_clone_fraglist); EXPORT_SYMBOL(skb_copy); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] sky2: prevent dual port receiver problems
When both ports are receiving simultaneously, the receive logic gets confused and may pass up a packet before it is full. This causes hangs, and IP will see lots of garbage packets. There is even the potential for data corruption if a later arriving packet DMA's into freed memory.

It looks like a hardware bug because status arrives for a packet but no data is there. Until this bug is worked out, block the user from bringing up both ports at once.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c
+++ sky2/drivers/net/sky2.c
@@ -1020,8 +1020,19 @@ static int sky2_up(struct net_device *de
 	struct sky2_hw *hw = sky2->hw;
 	unsigned port = sky2->port;
 	u32 ramsize, rxspace, imask;
-	int err = -ENOMEM;
+	int err;
+	struct net_device *otherdev = hw->dev[sky2->port^1];
 
+	/* Block bringing up both ports at the same time on a dual port card.
+	 * There is an unfixed bug where receiver gets confused and picks up
+	 * packets out of order. Until this is fixed, prevent data corruption.
+	 */
+	if (otherdev && netif_running(otherdev)) {
+		printk(KERN_INFO PFX "dual port support is disabled.\n");
+		return -EBUSY;
+	}
+
+	err = -ENOMEM;
 	if (netif_msg_ifup(sky2))
 		printk(KERN_INFO PFX "%s: enabling interface\n", dev->name);
 
Re: skge driver oops
On Fri, 12 May 2006 11:36:24 +1000 David Arnold [EMAIL PROTECTED] wrote:

> i've been getting semi-regular lockups on my machine over 2.6.16
> series. I recently attached a serial console in an attempt to capture
> an OOPS. i got one yesterday.
>
> it's copied manually from the console, but hopefully the values are
> all accurate. there was more that had scrolled off screen above this
> too (sorry).
>
> oops, lspci, uname -a, .config and dmesg below.
>
> any suggestions for further debugging would be great, thanks,

Could you retest with the v1.5 version that is in 2.6.17-rc3?
Re: ixp2000: handle enp2611s with two gigabit ports
On Thu, 27 Apr 2006 00:24:11 +0200 Lennert Buytenhek [EMAIL PROTECTED] wrote:

> The ixp2000 driver for the enp2611 was developed on a board with three
> gigabit ports, but some enp2611 models only have two ports (and only
> one onboard PM3386.) The current driver assumes there are always three
> ports and so it doesn't work on the two-port version of the board at all.
>
> This patch adds a bit of logic to the enp2611 driver to limit the
> number of ports to 2 if the second PM3386 isn't detected.
>
> Signed-off-by: Lennert Buytenhek [EMAIL PROTECTED]

This patch got mangled, that is probably why jeff didn't apply it before he left. I had to fix it manually.

patching file drivers/net/ixp2000/enp2611.c
patch: malformed patch at line 106: module_init(enp2611_init_module);

In this part...

@@ -236,8 +240,10 @@
 	del_timer_sync(&link_check_timer);
 
 	ixpdev_deinit();
-	for (i = 0; i < 3; i++)
-		free_netdev(nds[i]);
+	for (i = 0; i < 3; i++) {
+		if (nds[i] != NULL) free_netdev(nds[i]);
+	}
 }
 
 module_init(enp2611_init_module);
[PATCH] net_sched: potential jiffy wrap bug in dev_watchdog
There is a potential jiffy wraparound bug in the transmit watchdog that is easily avoided by using time_after().

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- linux-2.6.orig/net/sched/sch_generic.c
+++ linux-2.6/net/sched/sch_generic.c
@@ -193,8 +193,10 @@ static void dev_watchdog(unsigned long a
 		    netif_running(dev) &&
 		    netif_carrier_ok(dev)) {
 			if (netif_queue_stopped(dev) &&
-			    (jiffies - dev->trans_start) > dev->watchdog_timeo) {
-				printk(KERN_INFO "NETDEV WATCHDOG: %s: transmit timed out\n", dev->name);
+			    time_after(jiffies, dev->trans_start + dev->watchdog_timeo)) {
+
+				printk(KERN_INFO "NETDEV WATCHDOG: %s: transmit timed out\n",
+				       dev->name);
 				dev->tx_timeout(dev);
 			}
 			if (!mod_timer(&dev->watchdog_timer, jiffies + dev->watchdog_timeo))
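For readers unfamiliar with the idiom, here is a minimal userland sketch of the signed-difference comparison that time_after() is built on. The names tick_after() and tx_timed_out() and the 32-bit counter width are assumptions made for illustration; this is not the kernel macro itself, which works on unsigned long and adds type checking.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userland analogue of time_after(a, b): true when tick
 * count a is later than b, even if the counter wrapped in between.
 * The unsigned subtraction wraps modulo 2^32 and the result is then
 * interpreted as a signed distance. */
static int tick_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Hypothetical helper mirroring the watchdog test in the patch:
 * has "now" passed trans_start + timeo? */
static int tx_timed_out(uint32_t now, uint32_t trans_start, uint32_t timeo)
{
	return tick_after(now, trans_start + timeo);
}
```

The comparison only gives a meaningful answer while the two timestamps are less than half the counter range apart, which is the usual assumption for timeouts of this kind.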
[PATCH 1/2] skge: bad checksums on big-endian platforms
Skge driver always causes bad checksums on big-endian. The checksum in the receive control block was being swapped when it doesn't need to be.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- skge-2.6.orig/drivers/net/skge.c
+++ skge-2.6/drivers/net/skge.c
@@ -2717,8 +2717,7 @@ static int skge_poll(struct net_device *
 		if (control & BMU_OWN)
 			break;
 
-		skb = skge_rx_get(skge, e, control, rd->status,
-				  le16_to_cpu(rd->csum2));
+		skb = skge_rx_get(skge, e, control, rd->status, rd->csum2);
 		if (likely(skb)) {
 			dev->last_rx = jiffies;
 			netif_receive_skb(skb);
[PATCH 2/2] skge: don't allow transmit ring to be too small
The driver will get stuck (permanent transmit timeout), if the transmit ring size is set too small. It needs to have enough ring elements to hold one maximum size transmit.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- skge-2.6.orig/drivers/net/skge.c
+++ skge-2.6/drivers/net/skge.c
@@ -402,7 +402,7 @@ static int skge_set_ring_param(struct ne
 	int err;
 
 	if (p->rx_pending == 0 || p->rx_pending > MAX_RX_RING_SIZE ||
-	    p->tx_pending == 0 || p->tx_pending > MAX_TX_RING_SIZE)
+	    p->tx_pending < MAX_SKB_FRAGS+1 || p->tx_pending > MAX_TX_RING_SIZE)
 		return -EINVAL;
 
 	skge->rx_ring.count = p->rx_pending;
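The bounds check in this patch can be sketched outside the driver roughly as follows. The SKETCH_* limits and the function name check_ring_param() are placeholders assumed for illustration; the real values come from the kernel headers and skge.h and are not stated in this thread.

```c
#include <assert.h>
#include <errno.h>

/* Placeholder limits, assumed for illustration only. */
#define SKETCH_MAX_SKB_FRAGS	18
#define SKETCH_MAX_TX_RING	1024
#define SKETCH_MAX_RX_RING	4096

/* Reject ring sizes the hardware queue cannot work with.  The transmit
 * ring must hold at least one maximally fragmented packet: one element
 * for the linear part plus one per fragment. */
static int check_ring_param(unsigned int rx_pending, unsigned int tx_pending)
{
	if (rx_pending == 0 || rx_pending > SKETCH_MAX_RX_RING)
		return -EINVAL;
	if (tx_pending < SKETCH_MAX_SKB_FRAGS + 1 ||
	    tx_pending > SKETCH_MAX_TX_RING)
		return -EINVAL;
	return 0;
}
```

With the old "tx_pending == 0" test, a ring of one element would have been accepted even though no fragmented packet could ever fit in it.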
Re: send(), sendmsg(), sendto() not thread-safe
On Mon, 15 May 2006 16:17:48 -0700 Rick Jones [EMAIL PROTECTED] wrote: David S. Miller wrote: From: Mark A Smith [EMAIL PROTECTED] Date: Mon, 15 May 2006 14:39:06 -0700 I discovered that in some cases, send(), sendmsg(), and sendto() are not thread-safe. Although the man page for these functions does not specify whether these functions are supposed to be thread-safe, my reading of the POSIX/SUSv3 specification tells me that they should be. I traced the problem to tcp_sendmsg(). I was very curious about this issue, so I wrote up a small page to describe in more detail my findings. You can find it at: http://www.almaden.ibm.com/cs/people/marksmith/sendmsg.html . # ./sendmsgclient localhost ERROR! We should have all 0! We don't! buff[16384]=1 buff[16385]=1 buff[16386]=1 buff[16387]=1 buff[16388]=1 buff[16389]=1 buff[16390]=1 buff[16391]=1 buff[16392]=1 buff[16393]=1 That's 10/32768 bad bytes # uname -a HP-UX tarry B.11.23 U ia64 2397028692 unlimited-user license Given that the URL above asserts that HP-UX claims atomicity, either there is a bug in the UX stack, or perhaps the test? I took a quick look at the HP-UX 11iv2 (aka 11.23) manpage for sendmsg and didn't see anything about atomicity there - on which manpage(s) or docs was the assertion of HP-UX atomicity made? I presume this is only for blocking sockets? I cannot at least off the top of my head see how a stack could offer it on non-blocking sockets. The test seems to be based on sending a big message. In this case, on non-blocking sockets, the send call will return partial status. The return from the system call will be less than the number of bytes requested. And frankly, BSD defines BSD socket semantics here not some wording in the POSIX standards. Have BSD socket semantics ever been updated/clarified any any quasi-official manner since the popular presence of threads? Or are/were Posix/Xopen filling a gap? 
rick jones
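On the partial-return point raised in this thread: whatever the atomicity guarantees turn out to be, a caller that needs the whole buffer sent usually loops on the return value. The following is a minimal sketch of such a loop, assuming a blocking stream socket; send_all() is a hypothetical helper name, not something from the test program discussed above. It says nothing about interleaving between threads, which still needs a lock in the application if whole messages must not be mixed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: keep calling send() until every byte has been
 * accepted by the stack.  Returns len on success, -1 on error (errno is
 * left as set by send).  On a non-blocking socket EAGAIN is reported as
 * an error and the caller is expected to poll and retry. */
static ssize_t send_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t left = len;

	while (left > 0) {
		ssize_t n = send(fd, p, left, 0);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;
		}
		p += n;			/* skip what was accepted */
		left -= (size_t)n;
	}
	return (ssize_t)len;
}
```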
Re: [PATCH] tcpdump may trace some outbound packets twice.
On Mon, 15 May 2006 16:11:05 -0700 (PDT) Ranjit Manomohan [EMAIL PROTECTED] wrote: On Mon, 15 May 2006, David S. Miller wrote: From: Ranjit Manomohan [EMAIL PROTECTED] Date: Mon, 15 May 2006 14:19:06 -0700 (PDT) Heres a new version which does a copy instead of the clone to avoid the double cloning issue. I still very much dislike this patch because it is creating 1 more clone per packet than is actually necessary and that is very expensive. dev_queue_xmit_nit() is going to clone whatever SKB you send into there, so better to just bump the reference count (with skb_get()) instead of cloning or copying. I was a bit apprehensive about just incrementing the refcnt but that works too. Attached is the modified version. -Thanks, Ranjit --- linux-2.6/net/sched/sch_generic.c 2006-05-10 12:34:52.0 -0700 +++ linux/net/sched/sch_generic.c 2006-05-15 15:48:03.0 -0700 @@ -136,8 +136,12 @@ if (!netif_queue_stopped(dev)) { int ret; + struct sk_buff *skbc = NULL; + /* Increment the reference count on the skb so + * that we can use it after a successful xmit. + */ if (netdev_nit) - dev_queue_xmit_nit(skb, dev); + skbc = skb_get(skb); skbc = netdev_nit ? skb_get(skb) : NULL; ret = dev-hard_start_xmit(skb, dev); if (ret == NETDEV_TX_OK) { @@ -145,9 +149,20 @@ dev-xmit_lock_owner = -1; spin_unlock(dev-xmit_lock); } + if (skbc) { + /* transmit succeeded, + * trace the buffer. */ + dev_queue_xmit_nit(skbc,dev); + kfree_skb(skbc); + } spin_lock(dev-queue_lock); return -1; } + + /* Call free in case we incremented refcnt */ + if (skbc) + kfree_skb(skbc); kfree_skb(NULL) is legal so the conditional here is unneeded. But the increased calls to kfree_skb(NULL) would probably bring the unlikely() hordes descending on kfree_skb, so maybe: if (unlikely(netdev_nit)) kfree_skb(skbc); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
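To illustrate why taking a reference is cheaper than cloning, and why a NULL-tolerant free makes the conditional unnecessary, here is a small userland sketch of the pattern. struct buf, buf_get() and buf_put() are hypothetical stand-ins for sk_buff, skb_get() and kfree_skb(); the counter is a plain int rather than an atomic, so this is single-threaded illustration only.

```c
#include <assert.h>
#include <stdlib.h>

struct buf {
	int users;		/* reference count */
	char data[64];
};

static int bufs_freed;		/* instrumentation for this sketch only */

static struct buf *buf_alloc(void)
{
	struct buf *b = calloc(1, sizeof(*b));

	if (b)
		b->users = 1;	/* the allocator holds the first reference */
	return b;
}

/* Take an extra reference: no allocation, no copy. */
static struct buf *buf_get(struct buf *b)
{
	b->users++;
	return b;
}

/* Drop a reference and free on the last one.  NULL is accepted so that
 * callers do not need their own check. */
static void buf_put(struct buf *b)
{
	if (!b)
		return;
	if (--b->users == 0) {
		free(b);
		bufs_freed++;
	}
}
```

In the patch above the extra reference plays the role of the copy handed to the packet tap after a successful transmit, and is simply dropped on every other path.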
Re: [PATCH] Enabling standard compliant behaviour in the Linux TCP implementation
On Tue, 16 May 2006 16:24:22 +0200 Angelo P. Castellani [EMAIL PROTECTED] wrote:

> Hi all,
> I'm a student doing a thesis about TCP performance over high BDP links
> and so about congestion control in TCP. To do this work I've built a
> testbed using the latest Linux release (2.6.16).
>
> Anyway I've came across the fact that Linux TCP implementation isn't
> fully standard compliant. Even if the choices made to be different from
> the standards have been wisely thought, I think that should be possible
> to disable these Linuxisms.
>
> Surely this can help all the people using Linux to evaluate a standard
> environment. Moreover it permits to compare the pros and cons of the
> Linux implementation against the standard one.
>
> So I've disabled the first two Linux-specific mechanisms I've found:
> - rate halving
> - dynamic reordering metric (dynamic DupThresh)
>
> These're disabled as long as net.ipv4.tcp_standard_compliant=1
> (default: 0).
>
> However I don't exclude that there're more non-standard details, so I
> hope that somebody can point some more differences between Linux and
> the RFCs.
>
> Moreover NewReno is implemented in the Impatient variant (resets the
> retransmit timer only on the first partial ack), with
> net.ipv4.tcp_slow_but_steady=1 (default: 0) you can enable the
> Slow-but-Steady variant (resets the retransmit timer every partial ack).
>
> Hoping that this can be useful, I attach the patch.
>
> Regards,
> Angelo P. Castellani

Read Linus's comments on standards. We make software for users, not for academic use.
http://kerneltrap.org/node/5725

If we added this then paranoid users would set it.

The Reno thing seems okay, if the default was the same as the original behavior but it makes one more test case to try.
Re: ifIndex allocation
On Tue, 16 May 2006 08:11:01 +0200 Sven Schnelle [EMAIL PROTECTED] wrote:

> Hi List,

Redirecting to netdev.

> investigating a problem with an snmp software for linux, i was
> wondering why the kernel allocates a new ifindex Number, even if the
> old one is still available. For example, if i unload a network driver
> module, and reload it, it has a different ifindex.

Because when you reload the driver it is effectively a completely new object. ifindex is basically an object id. Ifindices act as soft references, so user space can keep track of a particular network device even if its name changes or other operations happen. The reference is soft because the device can disappear. If the application wants to know about device removal it can catch the netlink event.

> Looking at the function dev_new_index (line 2620 in net/core/dev.c)
> there is a line 'static int ifindex'. Is there any special reason why
> this variable is static, and the list is not traversed from the
> beginning, so that the first free ifindex will be used?
>
> Best regards,
> Sven.

It is static because no other code should be looking at it. It doesn't retraverse from the start because it doesn't want to reuse an earlier index and confuse an application with a soft reference.
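The allocation policy described in this reply can be sketched in userland roughly like this. The fixed-size table and the names new_index(), sketch_register() and sketch_unregister() are assumptions for illustration; the kernel's dev_new_index() looks devices up by index rather than scanning an array.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_MAX_DEVS 8

/* 0 marks a free slot; otherwise the slot holds that device's index. */
static int table[SKETCH_MAX_DEVS];

static int index_in_use(int idx)
{
	for (int i = 0; i < SKETCH_MAX_DEVS; i++)
		if (table[i] == idx)
			return 1;
	return 0;
}

/* Hand out the next index after the last one issued.  The counter is
 * function-local and static, so nothing else can touch it, and freed
 * indices are not reused until the counter wraps around. */
static int new_index(void)
{
	static int last;

	for (;;) {
		last = (last == INT_MAX) ? 1 : last + 1;
		if (!index_in_use(last))
			return last;
	}
}

/* Returns the new device's index, or -1 if the table is full. */
static int sketch_register(void)
{
	for (int i = 0; i < SKETCH_MAX_DEVS; i++) {
		if (table[i] == 0) {
			table[i] = new_index();
			return table[i];
		}
	}
	return -1;
}

static void sketch_unregister(int idx)
{
	for (int i = 0; i < SKETCH_MAX_DEVS; i++)
		if (table[i] == idx)
			table[i] = 0;
}
```

Registering two devices, removing the first and registering a third yields index 3, not 1, so an application still holding index 1 as a soft reference sees "device gone" rather than a different device.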
Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1
Could you try the 2.6.17-rc4 version with this patch? It turns out the board seems to give out-of-order status responses.

Ignore the vendor sk98lin driver: when I try the stock version, it spends its life resetting itself because it sets up the PCI bus wrong. If I fix that, it spends its time getting confused because it can't handle intermixed status reports properly (checksum et al. is per port, not per board).

 drivers/net/sky2.c |   28 +++++++++++++++++++++-------
 1 files changed, 21 insertions(+), 7 deletions(-)

792547bc5e8e4f7d5a1070a168056f429635c254
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index ffd267f..11e7914 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1020,8 +1020,27 @@ static int sky2_up(struct net_device *de
 	struct sky2_hw *hw = sky2->hw;
 	unsigned port = sky2->port;
 	u32 ramsize, rxspace, imask;
-	int err = -ENOMEM;
+	int cap, err;
+	struct net_device *otherdev = hw->dev[sky2->port^1];
 
+	/*
+	 * Reduce split transactions (and turn off) rx checksums to
+	 * prevent problems with dual ports.
+	 */
+	if (otherdev && netif_running(otherdev) &&
+	    (cap = pci_find_capability(hw->pdev, PCI_CAP_ID_PCIX))) {
+		struct sky2_port *osky2 = netdev_priv(otherdev);
+		u16 cmd;
+
+		cmd = sky2_pci_read16(hw, cap + PCI_X_CMD);
+		cmd &= ~PCI_X_CMD_MAX_SPLIT;
+		sky2_pci_write16(hw, cap + PCI_X_CMD, cmd);
+
+		sky2->rx_csum = 0;
+		osky2->rx_csum = 0;
+	}
+
+	err = -ENOMEM;
 	if (netif_msg_ifup(sky2))
 		printk(KERN_INFO PFX "%s: enabling interface\n", dev->name);
 
@@ -3067,12 +3086,7 @@ static __devinit struct net_device *sky2
 	sky2->duplex = -1;
 	sky2->speed = -1;
 	sky2->advertising = sky2_supported_modes(hw);
-
-	/* Receive checksum disabled for Yukon XL
-	 * because of observed problems with incorrect
-	 * values when multiple packets are received in one interrupt
-	 */
-	sky2->rx_csum = (hw->chip_id != CHIP_ID_YUKON_XL);
+	sky2->rx_csum = 1;
 
 	spin_lock_init(&sky2->phy_lock);
 	sky2->tx_pending = TX_DEF_PENDING;
-- 
1.2.4
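The two bit tricks the patch relies on can be tried in isolation. The sketch below is a user-space illustration only, assuming the PCI-X command update is meant as a read-modify-write that clears the "maximum outstanding split transactions" field. The helper names clear_max_split() and sibling_port() are hypothetical, and the mask value 0x0070 is quoted from memory of the kernel's include/linux/pci_regs.h, so check it against your tree.

```c
#include <stdint.h>

/* Assumed value of the kernel's PCI_X_CMD_MAX_SPLIT mask (verify in pci_regs.h). */
#define PCI_X_CMD_MAX_SPLIT 0x0070

/* Clear only the max-split field of a PCI-X command word; all other bits
 * are preserved. Writing the result back is left to the caller. */
static uint16_t clear_max_split(uint16_t cmd)
{
	return (uint16_t)(cmd & ~PCI_X_CMD_MAX_SPLIT);
}

/* On a two-port board, port 0's sibling is port 1 and vice versa;
 * XOR with 1 flips between them, as in hw->dev[sky2->port^1]. */
static unsigned sibling_port(unsigned port)
{
	return port ^ 1u;
}
```

With a command word of 0x007f, clear_max_split() returns 0x000f: the three field bits are zeroed and the low four bits are untouched. A plain assignment of the inverted mask would instead overwrite every other bit in the register, which is why the masking form matters.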
Re: skge driver oops
On Fri, 12 May 2006 11:36:24 +1000 David Arnold [EMAIL PROTECTED] wrote:

> i've been getting semi-regular lockups on my machine over 2.6.16 series.
> I recently attached a serial console in an attempt to capture an OOPS. i
> got one yesterday. it's copied manually from the console, but hopefully
> the values are all accurate. there was more that had scrolled off screen
> above this too (sorry). oops, lspci, uname -a, .config and dmesg below.
> any suggestions for further debugging would be great, thanks,

I tried reproducing this and can't seem to cause it. Are you running anything special that could influence this: bridging, VLANs, bonding, netfilter, queueing disciplines, tc filters, ...?

What is the output of /proc/interrupts? Perhaps the devices don't like sharing an IRQ.