Re: e100: Wait for PHY reset to complete?

2006-10-31 Thread Anders Grafström
 Anders Grafström wrote:
 I ran mii-diag when the LEDs went out and the register dump
 said it was in loopback. It is somewhat difficult reproduce.
 It seems to be timing dependent, something else has to occur
 at the same time.
 I must confess I have only seen it with the 2.6.13 kernel.
 I have not been able to reproduce it with 2.6.18.
 But I have found no change in the driver that would fix it so
 I suspect the problem is still there.

Now looking at mdio_ctrl(), which implements mdio_write() and mdio_read(),
I see that this patch
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15/2.6.15-mm3/broken-out/corruption-during-e100-mdi-register-access.patch
most likely fixed the problem that I have experienced.

although access to the MDI registers involves
multiple steps which must not be intermixed, nothing was defending against
two or more threads attempting simultaneous access.  The most obvious
situation where such interference could occur involves the watchdog versus
ioctl paths

Sorry for the noise.
Thanks,
Anders


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e100: Wait for PHY reset to complete?

2006-10-31 Thread Auke Kok

Anders Grafström wrote:

Anders Grafström wrote:

I ran mii-diag when the LEDs went out and the register dump
said it was in loopback. It is somewhat difficult reproduce.
It seems to be timing dependent, something else has to occur
at the same time.
I must confess I have only seen it with the 2.6.13 kernel.
I have not been able to reproduce it with 2.6.18.
But I have found no change in the driver that would fix it so
I suspect the problem is still there.


Now looking at mdio_ctrl(), which implements mdio_write() and mdio_read(),
I see that this patch
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15/2.6.15-mm3/broken-out/corruption-during-e100-mdi-register-access.patch
most likely fixed the problem that I have experienced.

although access to the MDI registers involves
multiple steps which must not be intermixed, nothing was defending against
two or more threads attempting simultaneous access.  The most obvious
situation where such interference could occur involves the watchdog versus
ioctl paths

Sorry for the noise.


cool, thanks for digging that up and taking the time to report back ;)

Auke
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e100: Wait for PHY reset to complete?

2006-10-30 Thread Auke Kok

Auke Kok wrote:

Anders Grafström wrote:

Auke Kok wrote:
Allthough the spec itself didn't talk about phy reset times, I've ran 
this

patch with
some debugging output on a few boxes and did some speed/duplex settings,
and the PHY
reset returned succesfull after the very first mdio_read, which is 
before

any msleep(10)
is executed. That is also expected behaviour.

I think you might be confusing this with a MAC reset, which has a
documented 10usec
timeout (see 8255x developers manual). The driver already adheres to 
this

by doing a
20usec delay after software/selective resets.

which gets us back to the original problem: how did your driver end 
up in

loopback mode?
(and, how did you figure out that it did??).



This is what the 2.4.33.3 driver does:

void
e100_phy_reset(struct e100_private *bdp)
{
u16 ctrl_reg;
ctrl_reg = BMCR_RESET;
e100_mdi_write(bdp, MII_BMCR, bdp-phy_addr, ctrl_reg);
/* ieee 802.3 : The reset process shall be completed   */
/* within 0.5 seconds from the settting of PHY reset bit.  */
set_current_state(TASK_UNINTERRUPTIBLE);
schedule_timeout(HZ / 2);
}

And here
http://www.cs.helsinki.fi/linux/linux-kernel/2003-23/1245.html
I found this entry:

[EMAIL PROTECTED] (03/06/08 1.1218)
[e100] misc
...
* Add 1/2 second delay after PHY reset to allow link partner to
see and respond to reset, per IEEE 802.3.


I ran mii-diag when the LEDs went out and the register dump
said it was in loopback. It is somewhat difficult reproduce.
It seems to be timing dependent, something else has to occur
at the same time.
I must confess I have only seen it with the 2.6.13 kernel.
I have not been able to reproduce it with 2.6.18.
But I have found no change in the driver that would fix it so
I suspect the problem is still there.

I have tried adding debug output to see if I can read back the
RESET bit in set state, but then the problem refuses to show
so I don't think I can rule out an unfinished PHY reset.


theoretically, yes, the ieee spec PHY reset timeout is kind of silly: in 
no way do we assume that we have re-negotiated link after 1/2 a second! 
Other code in the driver should take care of that, and since it works 
I'll assume it does ;)


the mdio_read probably acts as a flush to the hardware too - 
masquerading problems, more goodness. Perhaps we should do a single read 
in all cases and forget about the timeout (is there an mdio_write_flush?)


Anders,

It appears that there isn't such a thing as an mdio_write_flush in e100 but we can force 
a flush by reading the status register. Not sure if it works 100% so please give this a 
try and let me know if this fixes the problem for you.


Basically, we're preventing mdio writes that might happen after our reset write from 
being reordered. This might have caused an undefined state.


Auke


---

e100: force mdio write flush after PHY reset

Make sure the PHY reset happens without delay by flushing the reset write.

Signed-off-by: Auke Kok [EMAIL PROTECTED]


diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index 815eb29..04a52cd 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -2197,6 +2197,8 @@ static int e100_set_settings(struct net_
int err;

mdio_write(netdev, nic-mii.phy_id, MII_BMCR, BMCR_RESET);
+   /* flush MDIO write */
+   mdio_read(netdev, nic-mii.phy_id, MII_BMSR);
err = mii_ethtool_sset(nic-mii, cmd);
e100_exec_cb(nic, NULL, e100_configure);

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e100: Wait for PHY reset to complete?

2006-10-25 Thread Francois Romieu
Anders Grafstrom [EMAIL PROTECTED] :
[...]
 I'm thinking something like this patch.

I do not have the spec for the max duration at hand but it makes sense.

Can you:
- decrease the duration of each sleep and increase the count of iterations
- put the break on a line of its own
- add a Signed-off-by: Foo Bar [EMAIL PROTECTED] line
- Cc: [EMAIL PROTECTED] and Auke Kok [EMAIL PROTECTED]

-- 
Ueimor
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e100: Wait for PHY reset to complete?

2006-10-25 Thread Auke Kok

Francois Romieu wrote:

Anders Grafstrom [EMAIL PROTECTED] :
[...]

I'm thinking something like this patch.


I do not have the spec for the max duration at hand but it makes sense.

Can you:
- decrease the duration of each sleep and increase the count of iterations
- put the break on a line of its own
- add a Signed-off-by: Foo Bar [EMAIL PROTECTED] line
- Cc: [EMAIL PROTECTED] and Auke Kok [EMAIL PROTECTED]


I'm already checking out specs, which, for 8255x devices, are available on 
http://e1000.sf.net/ .


Auke
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e100: Wait for PHY reset to complete?

2006-10-25 Thread Auke Kok

Auke Kok wrote:

Francois Romieu wrote:

Anders Grafstrom [EMAIL PROTECTED] :
[...]

I'm thinking something like this patch.


I do not have the spec for the max duration at hand but it makes sense.

Can you:
- decrease the duration of each sleep and increase the count of 
iterations

- put the break on a line of its own
- add a Signed-off-by: Foo Bar [EMAIL PROTECTED] line
- Cc: [EMAIL PROTECTED] and Auke Kok [EMAIL PROTECTED]


I'm already checking out specs, which, for 8255x devices, are available 
on http://e1000.sf.net/ .


okay, I don't think this is needed at all:

Allthough the spec itself didn't talk about phy reset times, I've ran this patch with 
some debugging output on a few boxes and did some speed/duplex settings, and the PHY 
reset returned succesfull after the very first mdio_read, which is before any msleep(10) 
is executed. That is also expected behaviour.


I think you might be confusing this with a MAC reset, which has a documented 10usec 
timeout (see 8255x developers manual). The driver already adheres to this by doing a 
20usec delay after software/selective resets.


which gets us back to the original problem: how did your driver end up in loopback mode? 
(and, how did you figure out that it did??).


Cheers,

Auke
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e100: Wait for PHY reset to complete?

2006-10-25 Thread Francois Romieu
Auke Kok [EMAIL PROTECTED] :
[...]
 okay, I don't think this is needed at all:
 
 Allthough the spec itself didn't talk about phy reset times, I've ran this 
 patch with some debugging output on a few boxes and did some speed/duplex 
 settings, and the PHY reset returned succesfull after the very first 
 mdio_read, which is before any msleep(10) is executed. That is also 
 expected behaviour.

Would it be expected as well that the PHY has performed reset before the
second mdio_write hits it when the mdio_read is not there at all ?

-- 
Ueimor
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e100: Wait for PHY reset to complete?

2006-10-25 Thread Anders Grafström
Auke Kok wrote:
 Allthough the spec itself didn't talk about phy reset times, I've ran this
 patch with
 some debugging output on a few boxes and did some speed/duplex settings,
 and the PHY
 reset returned succesfull after the very first mdio_read, which is before
 any msleep(10)
 is executed. That is also expected behaviour.

 I think you might be confusing this with a MAC reset, which has a
 documented 10usec
 timeout (see 8255x developers manual). The driver already adheres to this
 by doing a
 20usec delay after software/selective resets.

 which gets us back to the original problem: how did your driver end up in
 loopback mode?
 (and, how did you figure out that it did??).


This is what the 2.4.33.3 driver does:

void
e100_phy_reset(struct e100_private *bdp)
{
u16 ctrl_reg;
ctrl_reg = BMCR_RESET;
e100_mdi_write(bdp, MII_BMCR, bdp-phy_addr, ctrl_reg);
/* ieee 802.3 : The reset process shall be completed   */
/* within 0.5 seconds from the settting of PHY reset bit.  */
set_current_state(TASK_UNINTERRUPTIBLE);
schedule_timeout(HZ / 2);
}

And here
http://www.cs.helsinki.fi/linux/linux-kernel/2003-23/1245.html
I found this entry:

[EMAIL PROTECTED] (03/06/08 1.1218)
[e100] misc
...
* Add 1/2 second delay after PHY reset to allow link partner to
see and respond to reset, per IEEE 802.3.


I ran mii-diag when the LEDs went out and the register dump
said it was in loopback. It is somewhat difficult reproduce.
It seems to be timing dependent, something else has to occur
at the same time.
I must confess I have only seen it with the 2.6.13 kernel.
I have not been able to reproduce it with 2.6.18.
But I have found no change in the driver that would fix it so
I suspect the problem is still there.

I have tried adding debug output to see if I can read back the
RESET bit in set state, but then the problem refuses to show
so I don't think I can rule out an unfinished PHY reset.

Anders


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e100: Wait for PHY reset to complete?

2006-10-25 Thread Auke Kok

Anders Grafström wrote:

Auke Kok wrote:

Allthough the spec itself didn't talk about phy reset times, I've ran this
patch with
some debugging output on a few boxes and did some speed/duplex settings,
and the PHY
reset returned succesfull after the very first mdio_read, which is before
any msleep(10)
is executed. That is also expected behaviour.

I think you might be confusing this with a MAC reset, which has a
documented 10usec
timeout (see 8255x developers manual). The driver already adheres to this
by doing a
20usec delay after software/selective resets.

which gets us back to the original problem: how did your driver end up in
loopback mode?
(and, how did you figure out that it did??).



This is what the 2.4.33.3 driver does:

void
e100_phy_reset(struct e100_private *bdp)
{
u16 ctrl_reg;
ctrl_reg = BMCR_RESET;
e100_mdi_write(bdp, MII_BMCR, bdp-phy_addr, ctrl_reg);
/* ieee 802.3 : The reset process shall be completed   */
/* within 0.5 seconds from the settting of PHY reset bit.  */
set_current_state(TASK_UNINTERRUPTIBLE);
schedule_timeout(HZ / 2);
}

And here
http://www.cs.helsinki.fi/linux/linux-kernel/2003-23/1245.html
I found this entry:

[EMAIL PROTECTED] (03/06/08 1.1218)
[e100] misc
...
* Add 1/2 second delay after PHY reset to allow link partner to
see and respond to reset, per IEEE 802.3.


I ran mii-diag when the LEDs went out and the register dump
said it was in loopback. It is somewhat difficult reproduce.
It seems to be timing dependent, something else has to occur
at the same time.
I must confess I have only seen it with the 2.6.13 kernel.
I have not been able to reproduce it with 2.6.18.
But I have found no change in the driver that would fix it so
I suspect the problem is still there.

I have tried adding debug output to see if I can read back the
RESET bit in set state, but then the problem refuses to show
so I don't think I can rule out an unfinished PHY reset.


theoretically, yes, the ieee spec PHY reset timeout is kind of silly: in no way do we 
assume that we have re-negotiated link after 1/2 a second! Other code in the driver 
should take care of that, and since it works I'll assume it does ;)


the mdio_read probably acts as a flush to the hardware too - masquerading problems, more 
goodness. Perhaps we should do a single read in all cases and forget about the timeout 
(is there an mdio_write_flush?)


Basically the timeout is wrong: a LINK reset is not a PHY reset. The PHY is back online 
and ready to respond in (probably) a single clock cycle. The link can take up to 3 
seconds in normal cases. Waiting for 1/2 a second does not fix anything there. Here's 
where the 8255x (PHY part) spec abandons us: I don't read anything about PHY reset 
timeouts in it.


Can you try to debug if your while () timeout loop is actually waiting for a significant 
amount? something like adding a printk(KERN_ERR counted down to %d0 msec\n, counter); 
after the entire while{} loop should show you if there is variation in the PHY reset 
time needed for the PHY to be back online.


running mii-diag before the link comes back up might be causing the issue in the first 
place, and certainly suggests a small race.


Have you tried to run the e100-sbit branch from jgarzik's netdev-2.6 tree? We're still 
looking into merging this and I guess I should push it to -mm to have it receive some 
testing


Cheers,

Auke
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html