Re: [PATCH RFC] net: phy: Introduced the PHY_AN_PENDING state

2015-06-10 Thread Keng Soon Cheah
Resending as I didn't reply all.

On Tue, Jun 09, 2015 at 10:17:59PM -0700, Florian Fainelli wrote:
> 
> The typical way to work around these problems are to fix them at the PHY
> driver level, see below.
> 

My first attempt of work around is to target on the PHY driver but I
couldn't figure out a good way in doing that because

1. We use RT kernel. The jitter performance is very sensitive to the time
we spend in the kernel. Hence polling for auto-negotiation to complete
before starting a new one could hurt the system performance. Moreover,
I've seen the PHY takes up to 11s to complete auto-negotiation.

2. Ruling out status polling mean I have to switch to use some work
scheduling mechanism to let the PHY driver come back and check the
auto-negotiation status (which is what the state machine is doing now)
but this seem to even complicate the solution because the PHY state machine
has moved to PHY_AN state at the same time, but the auto-negotiation has
not really been fired yet. There are more conditions that need to be
consider to sync the PHY driver and the PHY state machine after the
previous auto-negotiation finish.

It looks like a dead end to me for continuing down the path of modifying
PHY driver. Do let me know if you have better idea to achieve the same
objective.

> 
> That sounds like a bug in the PHY state machine and/or the PHY driver if
> you are allowed to restart auto-negotiation while one is pending. Now
> that the PHY state machine has debug prints built-in, could you capture
> a trace of this failing case?
> 
> Is this observed with the generic PHY driver or a custom PHY driver?
> 

It's not really a problem in the state machine or PHY driver. A very common
scenario for an auto-negotiation to start before previous complete is at
system boot up where the previous auto-negotiation, either triggered by
hardware (because PHY come out from reset and auto-negotiate by itself)
or software (U-Boot triggering PHY software reset).

The other scenario that I'm able to induce the same effect is by doing
mii-tool -r in Linux.

> As usual with state machines, introducing a new state needs to be
> carefully done in order to make sure that all transitions are correct,
> so far I would rather work on finding the root cause/extending the
> timeout and/or making it configurable on a PHY-driver basis rather than
> having this additional state which is more error prone.
> 

I agree that introducing changes to the state machine will need careful
review and it's my last option within the constraint I have. I tried to
serialize the PHY_AN_PENDING and PHY_AN state to minimize the disruption
introduced to the state machine. There is only one entry to the
PHY_AN_PENDING state (via phy_start_aneg) and there is only one exit point
to continue with PHY_AN. The state machine would otherwise identical with
the original state machine.
Should the PHY drop to any state other than PHY_AN_PENDING, they will
always transition their state as usual.
Should the PHY state machine requires an auto-negotiation, it will always
enter PHY_AN_PENDING and always continue with PHY_AN as usual.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] net: phy: Introduced the PHY_AN_PENDING state

2015-06-09 Thread Florian Fainelli
Le 06/09/15 21:36, Keng Soon Cheah a écrit :
> The PHY_AN_PENDING state is put as a gate to enter the PHY_AN state
> where it will wait for any uncomplete auto-negotiation session to
> finish before starting a new one.
> 
> This extra state could be used to workaround some auto-negotation
> issues from certain vendors.

The typical way to work around these problems are to fix them at the PHY
driver level, see below.

> 
> an_pending_timeout module parameter is used to enable the AN_PENDING
> transition state. Set it to 0 to disable AN_PENDING state transition,
> set it to any non-zero value to specify the timeout period for
> PHY_AN_PENDING state in second. The default value is 0.
> 
> an_pending_guard module parameter serves as a guard band to delay
> the auto-negotiation firing after the previous auto-negotiation
> finish.
> 
> Signed-off-by: Keng Soon Cheah 
> 
> Conflicts:
> 
>   drivers/net/phy/phy.c
> ---
> We observed failure in the ethernet link operation when our board pairs
> with some network switch model. The problem happens when an
> auto-negotiation is started around the time the previous auto-negotiation
> complete. We believe this might be an interoperatibility issue between
> the PHYs but we need a short-term solution in software to workaround the
> issue.
> 
> We found that we are able to avoid from hitting the problem by waiting any
> pending auto-negotiation to complete before starting a new one and this
> patch is designed to serve the purpose.

That sounds like a bug in the PHY state machine and/or the PHY driver if
you are allowed to restart auto-negotiation while one is pending. Now
that the PHY state machine has debug prints built-in, could you capture
a trace of this failing case?

Is this observed with the generic PHY driver or a custom PHY driver?

> 
> A PHY_AN_PENDING state is introduced and it will act as a gate to enter
> the PHY_AN state. This state will check for auto-negotiation completion
> or timeout after an_pending_timeout period, then it will wait for
> an_pending_guard before triggering another auto-negotiation.
> 
> The following diagram shows the timing diagram
> 
> 
>an_pending_timeout   an_pending_guard
>V   V auto-nego
>|->||
>   ^
>auto-negotiation complete/timeout
> 
> We do not have plan to submit this patch upstream (unless the community
> feels this patch is useful in general) but we would like to seek for
> feedback or advice if this patch could introduce new problems.

As usual with state machines, introducing a new state needs to be
carefully done in order to make sure that all transitions are correct,
so far I would rather work on finding the root cause/extending the
timeout and/or making it configurable on a PHY-driver basis rather than
having this additional state which is more error prone.

Thanks!

> 
> ---
>  drivers/net/phy/phy.c |   44 +++-
>  include/linux/phy.h   |3 ++-
>  2 files changed, 45 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index b2197b5..35e6484 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -38,6 +38,16 @@
>  
>  #include 
>  
> +static unsigned int an_pending_timeout;
> +module_param(an_pending_timeout, uint, 0644);
> +MODULE_PARM_DESC(an_pending_timeout,
> + "Timeout period for PHY_AN_PENDING state in second. 0 to disable 
> PHY_AN_PENDING state (default)");
> +
> +static unsigned int an_pending_guard;
> +module_param(an_pending_guard, uint, 0644);
> +MODULE_PARM_DESC(an_pending_guard,
> + "Guard band period before firing auto-negotiation from PHY_AN_PENDING 
> state in second. Default to 0");
> +
>  static const char *phy_speed_to_str(int speed)
>  {
>   switch (speed) {
> @@ -82,7 +92,6 @@ static const char *phy_state_to_str(enum phy_state st)
>   return NULL;
>  }
>  
> -
>  /**
>   * phy_print_status - Convenience function to print out the current phy 
> status
>   * @phydev: the phy_device struct
> @@ -485,6 +494,18 @@ int phy_start_aneg(struct phy_device *phydev)
>  
>   /* Invalidate LP advertising flags */
>   phydev->lp_advertising = 0;
> + if (an_pending_timeout) {
> + switch (phydev->state) {
> + case PHY_AN_PENDING:
> + case PHY_HALTED:
> + break;
> + default:
> + phydev->state = PHY_AN_PENDING;
> + phydev->link_timeout = an_pending_timeout;
> + goto out_unlock;
> + }
> +
> + }
>  
>   err = phydev->drv->config_aneg(phydev);
>   if (err < 0)
> @@ -831,6 +852,27 @@ void phy_state_machine(struct work_struct *work)
>   phydev->link_timeout = PHY_AN_TIMEOUT;
>  
>   break;
> + case PHY_AN_PENDING:
> +

[PATCH RFC] net: phy: Introduced the PHY_AN_PENDING state

2015-06-09 Thread Keng Soon Cheah
The PHY_AN_PENDING state is put as a gate to enter the PHY_AN state
where it will wait for any uncomplete auto-negotiation session to
finish before starting a new one.

This extra state could be used to workaround some auto-negotation
issues from certain vendors.

an_pending_timeout module parameter is used to enable the AN_PENDING
transition state. Set it to 0 to disable AN_PENDING state transition,
set it to any non-zero value to specify the timeout period for
PHY_AN_PENDING state in second. The default value is 0.

an_pending_guard module parameter serves as a guard band to delay
the auto-negotiation firing after the previous auto-negotiation
finish.

Signed-off-by: Keng Soon Cheah 

Conflicts:

drivers/net/phy/phy.c
---
We observed failure in the ethernet link operation when our board pairs
with some network switch model. The problem happens when an
auto-negotiation is started around the time the previous auto-negotiation
complete. We believe this might be an interoperatibility issue between
the PHYs but we need a short-term solution in software to workaround the
issue.

We found that we are able to avoid from hitting the problem by waiting any
pending auto-negotiation to complete before starting a new one and this
patch is designed to serve the purpose.

A PHY_AN_PENDING state is introduced and it will act as a gate to enter
the PHY_AN state. This state will check for auto-negotiation completion
or timeout after an_pending_timeout period, then it will wait for
an_pending_guard before triggering another auto-negotiation.

The following diagram shows the timing diagram


   an_pending_timeout   an_pending_guard
   V   V auto-nego
   |->||
  ^
   auto-negotiation complete/timeout

We do not have plan to submit this patch upstream (unless the community
feels this patch is useful in general) but we would like to seek for
feedback or advice if this patch could introduce new problems.

---
 drivers/net/phy/phy.c |   44 +++-
 include/linux/phy.h   |3 ++-
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index b2197b5..35e6484 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -38,6 +38,16 @@
 
 #include 
 
+static unsigned int an_pending_timeout;
+module_param(an_pending_timeout, uint, 0644);
+MODULE_PARM_DESC(an_pending_timeout,
+   "Timeout period for PHY_AN_PENDING state in second. 0 to disable 
PHY_AN_PENDING state (default)");
+
+static unsigned int an_pending_guard;
+module_param(an_pending_guard, uint, 0644);
+MODULE_PARM_DESC(an_pending_guard,
+   "Guard band period before firing auto-negotiation from PHY_AN_PENDING 
state in second. Default to 0");
+
 static const char *phy_speed_to_str(int speed)
 {
switch (speed) {
@@ -82,7 +92,6 @@ static const char *phy_state_to_str(enum phy_state st)
return NULL;
 }
 
-
 /**
  * phy_print_status - Convenience function to print out the current phy status
  * @phydev: the phy_device struct
@@ -485,6 +494,18 @@ int phy_start_aneg(struct phy_device *phydev)
 
/* Invalidate LP advertising flags */
phydev->lp_advertising = 0;
+   if (an_pending_timeout) {
+   switch (phydev->state) {
+   case PHY_AN_PENDING:
+   case PHY_HALTED:
+   break;
+   default:
+   phydev->state = PHY_AN_PENDING;
+   phydev->link_timeout = an_pending_timeout;
+   goto out_unlock;
+   }
+
+   }
 
err = phydev->drv->config_aneg(phydev);
if (err < 0)
@@ -831,6 +852,27 @@ void phy_state_machine(struct work_struct *work)
phydev->link_timeout = PHY_AN_TIMEOUT;
 
break;
+   case PHY_AN_PENDING:
+   /* Check if negotiation is done.  Break if there's an error */
+   err = phy_aneg_done(phydev);
+   if (err < 0)
+   break;
+
+   /* If AN is done, we'll proceed with the real aneg triggering */
+   if (err > 0) {
+   if (phydev->link_timeout > 0)
+   phydev->link_timeout = -(an_pending_guard);
+   else if (phydev->link_timeout < 0)
+   phydev->link_timeout++;
+   } else
+   phydev->link_timeout--;
+
+   if (0 == phydev->link_timeout) {
+   needs_aneg = true;
+
+   phydev->link_timeout = PHY_AN_TIMEOUT;
+   }
+   break;
case PHY_AN:
err = phy_read_status(phydev);
if (err < 0)
diff --git a/include/linux/phy.h b/include/linux/phy.h
index a26c3f8..a63afdc 10