Re: e100: Wait for PHY reset to complete?
Anders Grafström wrote: I ran mii-diag when the LEDs went out and the register dump said it was in loopback. It is somewhat difficult reproduce. It seems to be timing dependent, something else has to occur at the same time. I must confess I have only seen it with the 2.6.13 kernel. I have not been able to reproduce it with 2.6.18. But I have found no change in the driver that would fix it so I suspect the problem is still there. Now looking at mdio_ctrl(), which implements mdio_write() and mdio_read(), I see that this patch http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15/2.6.15-mm3/broken-out/corruption-during-e100-mdi-register-access.patch most likely fixed the problem that I have experienced. although access to the MDI registers involves multiple steps which must not be intermixed, nothing was defending against two or more threads attempting simultaneous access. The most obvious situation where such interference could occur involves the watchdog versus ioctl paths Sorry for the noise. Thanks, Anders - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e100: Wait for PHY reset to complete?
Anders Grafström wrote: Anders Grafström wrote: I ran mii-diag when the LEDs went out and the register dump said it was in loopback. It is somewhat difficult reproduce. It seems to be timing dependent, something else has to occur at the same time. I must confess I have only seen it with the 2.6.13 kernel. I have not been able to reproduce it with 2.6.18. But I have found no change in the driver that would fix it so I suspect the problem is still there. Now looking at mdio_ctrl(), which implements mdio_write() and mdio_read(), I see that this patch http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15/2.6.15-mm3/broken-out/corruption-during-e100-mdi-register-access.patch most likely fixed the problem that I have experienced. although access to the MDI registers involves multiple steps which must not be intermixed, nothing was defending against two or more threads attempting simultaneous access. The most obvious situation where such interference could occur involves the watchdog versus ioctl paths Sorry for the noise. cool, thanks for digging that up and taking the time to report back ;) Auke - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e100: Wait for PHY reset to complete?
Auke Kok wrote: Anders Grafström wrote: Auke Kok wrote: Allthough the spec itself didn't talk about phy reset times, I've ran this patch with some debugging output on a few boxes and did some speed/duplex settings, and the PHY reset returned succesfull after the very first mdio_read, which is before any msleep(10) is executed. That is also expected behaviour. I think you might be confusing this with a MAC reset, which has a documented 10usec timeout (see 8255x developers manual). The driver already adheres to this by doing a 20usec delay after software/selective resets. which gets us back to the original problem: how did your driver end up in loopback mode? (and, how did you figure out that it did??). This is what the 2.4.33.3 driver does: void e100_phy_reset(struct e100_private *bdp) { u16 ctrl_reg; ctrl_reg = BMCR_RESET; e100_mdi_write(bdp, MII_BMCR, bdp-phy_addr, ctrl_reg); /* ieee 802.3 : The reset process shall be completed */ /* within 0.5 seconds from the settting of PHY reset bit. */ set_current_state(TASK_UNINTERRUPTIBLE); schedule_timeout(HZ / 2); } And here http://www.cs.helsinki.fi/linux/linux-kernel/2003-23/1245.html I found this entry: [EMAIL PROTECTED] (03/06/08 1.1218) [e100] misc ... * Add 1/2 second delay after PHY reset to allow link partner to see and respond to reset, per IEEE 802.3. I ran mii-diag when the LEDs went out and the register dump said it was in loopback. It is somewhat difficult reproduce. It seems to be timing dependent, something else has to occur at the same time. I must confess I have only seen it with the 2.6.13 kernel. I have not been able to reproduce it with 2.6.18. But I have found no change in the driver that would fix it so I suspect the problem is still there. I have tried adding debug output to see if I can read back the RESET bit in set state, but then the problem refuses to show so I don't think I can rule out an unfinished PHY reset. theoretically, yes, the ieee spec PHY reset timeout is kind of silly: in no way do we assume that we have re-negotiated link after 1/2 a second! Other code in the driver should take care of that, and since it works I'll assume it does ;) the mdio_read probably acts as a flush to the hardware too - masquerading problems, more goodness. Perhaps we should do a single read in all cases and forget about the timeout (is there an mdio_write_flush?) Anders, It appears that there isn't such a thing as an mdio_write_flush in e100 but we can force a flush by reading the status register. Not sure if it works 100% so please give this a try and let me know if this fixes the problem for you. Basically, we're preventing mdio writes that might happen after our reset write from being reordered. This might have caused an undefined state. Auke --- e100: force mdio write flush after PHY reset Make sure the PHY reset happens without delay by flushing the reset write. Signed-off-by: Auke Kok [EMAIL PROTECTED] diff --git a/drivers/net/e100.c b/drivers/net/e100.c index 815eb29..04a52cd 100644 --- a/drivers/net/e100.c +++ b/drivers/net/e100.c @@ -2197,6 +2197,8 @@ static int e100_set_settings(struct net_ int err; mdio_write(netdev, nic-mii.phy_id, MII_BMCR, BMCR_RESET); + /* flush MDIO write */ + mdio_read(netdev, nic-mii.phy_id, MII_BMSR); err = mii_ethtool_sset(nic-mii, cmd); e100_exec_cb(nic, NULL, e100_configure); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
e100: Wait for PHY reset to complete?
Shouldn't the e100 driver wait for PHY reset to complete before proceeding in e100_set_settings()? I'm asking because the interface (82551) sometimes ends up in loopback when I try to set AUTONEG. I'm thinking something like this patch. Anders --- linux-2.6.18.orig/drivers/net/e100.c2006-09-20 05:42:06.0 +0200 +++ linux-2.6.18/drivers/net/e100.c 2006-10-17 17:43:47.0 +0200 @@ -2201,8 +2201,16 @@ static int e100_set_settings(struct net_ { struct nic *nic = netdev_priv(netdev); int err; + int counter = 50; mdio_write(netdev, nic-mii.phy_id, MII_BMCR, BMCR_RESET); + + /* wait for possibly (ouch) 500ms */ + while (mdio_read(netdev, nic-mii.phy_id, MII_BMCR) BMCR_RESET) { + msleep(10); + if (!--counter) break; + } + err = mii_ethtool_sset(nic-mii, cmd); e100_exec_cb(nic, NULL, e100_configure); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e100: Wait for PHY reset to complete?
Anders Grafstrom [EMAIL PROTECTED] : [...] I'm thinking something like this patch. I do not have the spec for the max duration at hand but it makes sense. Can you: - decrease the duration of each sleep and increase the count of iterations - put the break on a line of its own - add a Signed-off-by: Foo Bar [EMAIL PROTECTED] line - Cc: [EMAIL PROTECTED] and Auke Kok [EMAIL PROTECTED] -- Ueimor - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e100: Wait for PHY reset to complete?
Francois Romieu wrote: Anders Grafstrom [EMAIL PROTECTED] : [...] I'm thinking something like this patch. I do not have the spec for the max duration at hand but it makes sense. Can you: - decrease the duration of each sleep and increase the count of iterations - put the break on a line of its own - add a Signed-off-by: Foo Bar [EMAIL PROTECTED] line - Cc: [EMAIL PROTECTED] and Auke Kok [EMAIL PROTECTED] I'm already checking out specs, which, for 8255x devices, are available on http://e1000.sf.net/ . Auke - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e100: Wait for PHY reset to complete?
Auke Kok wrote: Francois Romieu wrote: Anders Grafstrom [EMAIL PROTECTED] : [...] I'm thinking something like this patch. I do not have the spec for the max duration at hand but it makes sense. Can you: - decrease the duration of each sleep and increase the count of iterations - put the break on a line of its own - add a Signed-off-by: Foo Bar [EMAIL PROTECTED] line - Cc: [EMAIL PROTECTED] and Auke Kok [EMAIL PROTECTED] I'm already checking out specs, which, for 8255x devices, are available on http://e1000.sf.net/ . okay, I don't think this is needed at all: Allthough the spec itself didn't talk about phy reset times, I've ran this patch with some debugging output on a few boxes and did some speed/duplex settings, and the PHY reset returned succesfull after the very first mdio_read, which is before any msleep(10) is executed. That is also expected behaviour. I think you might be confusing this with a MAC reset, which has a documented 10usec timeout (see 8255x developers manual). The driver already adheres to this by doing a 20usec delay after software/selective resets. which gets us back to the original problem: how did your driver end up in loopback mode? (and, how did you figure out that it did??). Cheers, Auke - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e100: Wait for PHY reset to complete?
Auke Kok [EMAIL PROTECTED] : [...] okay, I don't think this is needed at all: Allthough the spec itself didn't talk about phy reset times, I've ran this patch with some debugging output on a few boxes and did some speed/duplex settings, and the PHY reset returned succesfull after the very first mdio_read, which is before any msleep(10) is executed. That is also expected behaviour. Would it be expected as well that the PHY has performed reset before the second mdio_write hits it when the mdio_read is not there at all ? -- Ueimor - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e100: Wait for PHY reset to complete?
Auke Kok wrote: Allthough the spec itself didn't talk about phy reset times, I've ran this patch with some debugging output on a few boxes and did some speed/duplex settings, and the PHY reset returned succesfull after the very first mdio_read, which is before any msleep(10) is executed. That is also expected behaviour. I think you might be confusing this with a MAC reset, which has a documented 10usec timeout (see 8255x developers manual). The driver already adheres to this by doing a 20usec delay after software/selective resets. which gets us back to the original problem: how did your driver end up in loopback mode? (and, how did you figure out that it did??). This is what the 2.4.33.3 driver does: void e100_phy_reset(struct e100_private *bdp) { u16 ctrl_reg; ctrl_reg = BMCR_RESET; e100_mdi_write(bdp, MII_BMCR, bdp-phy_addr, ctrl_reg); /* ieee 802.3 : The reset process shall be completed */ /* within 0.5 seconds from the settting of PHY reset bit. */ set_current_state(TASK_UNINTERRUPTIBLE); schedule_timeout(HZ / 2); } And here http://www.cs.helsinki.fi/linux/linux-kernel/2003-23/1245.html I found this entry: [EMAIL PROTECTED] (03/06/08 1.1218) [e100] misc ... * Add 1/2 second delay after PHY reset to allow link partner to see and respond to reset, per IEEE 802.3. I ran mii-diag when the LEDs went out and the register dump said it was in loopback. It is somewhat difficult reproduce. It seems to be timing dependent, something else has to occur at the same time. I must confess I have only seen it with the 2.6.13 kernel. I have not been able to reproduce it with 2.6.18. But I have found no change in the driver that would fix it so I suspect the problem is still there. I have tried adding debug output to see if I can read back the RESET bit in set state, but then the problem refuses to show so I don't think I can rule out an unfinished PHY reset. Anders - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e100: Wait for PHY reset to complete?
Anders Grafström wrote: Auke Kok wrote: Allthough the spec itself didn't talk about phy reset times, I've ran this patch with some debugging output on a few boxes and did some speed/duplex settings, and the PHY reset returned succesfull after the very first mdio_read, which is before any msleep(10) is executed. That is also expected behaviour. I think you might be confusing this with a MAC reset, which has a documented 10usec timeout (see 8255x developers manual). The driver already adheres to this by doing a 20usec delay after software/selective resets. which gets us back to the original problem: how did your driver end up in loopback mode? (and, how did you figure out that it did??). This is what the 2.4.33.3 driver does: void e100_phy_reset(struct e100_private *bdp) { u16 ctrl_reg; ctrl_reg = BMCR_RESET; e100_mdi_write(bdp, MII_BMCR, bdp-phy_addr, ctrl_reg); /* ieee 802.3 : The reset process shall be completed */ /* within 0.5 seconds from the settting of PHY reset bit. */ set_current_state(TASK_UNINTERRUPTIBLE); schedule_timeout(HZ / 2); } And here http://www.cs.helsinki.fi/linux/linux-kernel/2003-23/1245.html I found this entry: [EMAIL PROTECTED] (03/06/08 1.1218) [e100] misc ... * Add 1/2 second delay after PHY reset to allow link partner to see and respond to reset, per IEEE 802.3. I ran mii-diag when the LEDs went out and the register dump said it was in loopback. It is somewhat difficult reproduce. It seems to be timing dependent, something else has to occur at the same time. I must confess I have only seen it with the 2.6.13 kernel. I have not been able to reproduce it with 2.6.18. But I have found no change in the driver that would fix it so I suspect the problem is still there. I have tried adding debug output to see if I can read back the RESET bit in set state, but then the problem refuses to show so I don't think I can rule out an unfinished PHY reset. theoretically, yes, the ieee spec PHY reset timeout is kind of silly: in no way do we assume that we have re-negotiated link after 1/2 a second! Other code in the driver should take care of that, and since it works I'll assume it does ;) the mdio_read probably acts as a flush to the hardware too - masquerading problems, more goodness. Perhaps we should do a single read in all cases and forget about the timeout (is there an mdio_write_flush?) Basically the timeout is wrong: a LINK reset is not a PHY reset. The PHY is back online and ready to respond in (probably) a single clock cycle. The link can take up to 3 seconds in normal cases. Waiting for 1/2 a second does not fix anything there. Here's where the 8255x (PHY part) spec abandons us: I don't read anything about PHY reset timeouts in it. Can you try to debug if your while () timeout loop is actually waiting for a significant amount? something like adding a printk(KERN_ERR counted down to %d0 msec\n, counter); after the entire while{} loop should show you if there is variation in the PHY reset time needed for the PHY to be back online. running mii-diag before the link comes back up might be causing the issue in the first place, and certainly suggests a small race. Have you tried to run the e100-sbit branch from jgarzik's netdev-2.6 tree? We're still looking into merging this and I guess I should push it to -mm to have it receive some testing Cheers, Auke - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html