On 2017/08/02 10:34, Lennart Sorensen wrote:
> On Wed, Aug 02, 2017 at 02:28:07PM +0300, Neftin, Sasha wrote:
> > On 7/21/2017 21:36, Benjamin Poirier wrote:
> > > Lennart reported the following race condition:
> > > 
> > > \ e1000_watchdog_task
> > >      \ e1000e_has_link
> > >          \ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link
> > >              /* link is up */
> > >              mac->get_link_status = false;
> > > 
> > >                              /* interrupt */
> > >                              \ e1000_msix_other
> > >                                  hw->mac.get_link_status = true;
> > > 
> > >          link_active = !hw->mac.get_link_status
> > >          /* link_active is false, wrongly */
> > > 
> > > This problem arises because the single flag get_link_status is used to
> > > signal two different states: link status needs checking and link status is
> > > down.
> > > 
> > > Avoid the problem by using the return value of .check_for_link to signal
> > > the link status to e1000e_has_link().
> > > 
> > > Reported-by: Lennart Sorensen <lsore...@csclub.uwaterloo.ca>
> > > Signed-off-by: Benjamin Poirier <bpoir...@suse.com>
> > > ---
> > >   drivers/net/ethernet/intel/e1000e/mac.c    | 11 ++++++++---
> > >   drivers/net/ethernet/intel/e1000e/netdev.c |  2 +-
> > >   2 files changed, 9 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c
> > > index b322011ec282..f457c5703d0c 100644
> > > --- a/drivers/net/ethernet/intel/e1000e/mac.c
> > > +++ b/drivers/net/ethernet/intel/e1000e/mac.c
> > > @@ -410,6 +410,9 @@ void e1000e_clear_hw_cntrs_base(struct e1000_hw *hw)
> > >    *  Checks to see of the link status of the hardware has changed.  If a
> > >    *  change in link status has been detected, then we read the PHY registers
> > >    *  to get the current speed/duplex if link exists.
> > > + *
> > > + *  Returns a negative error code (-E1000_ERR_*) or 0 (link down) or 1 (link
> > > + *  up).
> > >    **/
> > >   s32 e1000e_check_for_copper_link(struct e1000_hw *hw)
> > >   {
> > > @@ -423,7 +426,7 @@ s32 e1000e_check_for_copper_link(struct e1000_hw *hw)
> > >            * Change or Rx Sequence Error interrupt.
> > >            */
> > >           if (!mac->get_link_status)
> > > -         return 0;
> > > +         return 1;
> > >           /* First we want to see if the MII Status Register reports
> > >            * link.  If so, then we want to get the current speed/duplex
> > > @@ -461,10 +464,12 @@ s32 e1000e_check_for_copper_link(struct e1000_hw *hw)
> > >            * different link partner.
> > >            */
> > >           ret_val = e1000e_config_fc_after_link_up(hw);
> > > - if (ret_val)
> > > + if (ret_val) {
> > >                   e_dbg("Error configuring flow control\n");
> > > +         return ret_val;
> > > + }
> > > - return ret_val;
> > > + return 1;
> > >   }
> > >   /**
> > > diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> > > index fc6a1d9999b2..5a8ab1136566 100644
> > > --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> > > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> > > @@ -5081,7 +5081,7 @@ static bool e1000e_has_link(struct e1000_adapter *adapter)
> > >           case e1000_media_type_copper:
> > >                   if (hw->mac.get_link_status) {
> > >                           ret_val = hw->mac.ops.check_for_link(hw);
> > > -                 link_active = !hw->mac.get_link_status;
> > > +                 link_active = ret_val > 0;
> > >                   } else {
> > >                           link_active = true;
> > >                   }
> > 
> > Hello Benjamin,
> > 
> > Will this patch fix any serious problem with link indication? Is it
> > necessary? Can we consider your patch series without part 4/5?
> 
> Without this patch, you have the race condition that can make the
> watchdog_task mistakenly think the link is down when it isn't, and then
> it resets the adapter, which does make the link go down.
> 
> So it is rather catastrophic for the interface.
> 
> The other patch to the interrupt handling should make it never get hit,
> but the issue does still exist if not fixed and I wouldn't rule out that
> it could possibly still happen even with the other fix in place.

Exactly. I couldn't have explained it better, thanks.
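
For anyone following along, here is a minimal single-threaded sketch of the
race and of why using the return value avoids it. This is not driver code:
mock_mac, irq_during_check, mock_check_for_link, has_link_old and
has_link_new are hypothetical names made up for illustration, the PHY is
assumed to always report link up, and the interrupt is simulated
deterministically rather than delivered asynchronously.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant driver state (not struct e1000_mac_info). */
struct mock_mac {
	bool get_link_status;  /* true: link status needs (re)checking */
	bool irq_during_check; /* simulate e1000_msix_other firing mid-check */
};

/*
 * Models .check_for_link with the patched return convention:
 * negative error code, 0 (link down) or 1 (link up).
 * Assumption: the PHY always reports link up in this sketch.
 */
static int mock_check_for_link(struct mock_mac *mac)
{
	if (!mac->get_link_status)
		return 1;

	/* link is up */
	mac->get_link_status = false;

	/* simulated interrupt: the handler asks for a new link check */
	if (mac->irq_during_check)
		mac->get_link_status = true;

	return 1;
}

/* Pre-patch logic: infers the link state from the shared flag. */
static bool has_link_old(struct mock_mac *mac)
{
	if (mac->get_link_status) {
		(void)mock_check_for_link(mac);
		return !mac->get_link_status; /* false if the interrupt raced us */
	}
	return true;
}

/* Post-patch logic: trusts the return value of the check itself. */
static bool has_link_new(struct mock_mac *mac)
{
	if (mac->get_link_status) {
		int ret_val = mock_check_for_link(mac);

		return ret_val > 0;
	}
	return true;
}
```

With get_link_status and irq_during_check both set, has_link_old() reports
the link as down even though the check saw it up, while has_link_new()
reports it as up. In both cases the flag stays set, so the next watchdog run
still rechecks the link.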
