On 2017/08/02 10:34, Lennart Sorensen wrote: > On Wed, Aug 02, 2017 at 02:28:07PM +0300, Neftin, Sasha wrote: > > On 7/21/2017 21:36, Benjamin Poirier wrote: > > > Lennart reported the following race condition: > > > > > > \ e1000_watchdog_task > > > \ e1000e_has_link > > > \ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link > > > /* link is up */ > > > mac->get_link_status = false; > > > > > > /* interrupt */ > > > \ e1000_msix_other > > > hw->mac.get_link_status = true; > > > > > > link_active = !hw->mac.get_link_status > > > /* link_active is false, wrongly */ > > > > > > This problem arises because the single flag get_link_status is used to > > > signal two different states: link status needs checking and link status is > > > down. > > > > > > Avoid the problem by using the return value of .check_for_link to signal > > > the link status to e1000e_has_link(). > > > > > > Reported-by: Lennart Sorensen <lsore...@csclub.uwaterloo.ca> > > > Signed-off-by: Benjamin Poirier <bpoir...@suse.com> > > > --- > > > drivers/net/ethernet/intel/e1000e/mac.c | 11 ++++++++--- > > > drivers/net/ethernet/intel/e1000e/netdev.c | 2 +- > > > 2 files changed, 9 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/net/ethernet/intel/e1000e/mac.c > > > b/drivers/net/ethernet/intel/e1000e/mac.c > > > index b322011ec282..f457c5703d0c 100644 > > > --- a/drivers/net/ethernet/intel/e1000e/mac.c > > > +++ b/drivers/net/ethernet/intel/e1000e/mac.c > > > @@ -410,6 +410,9 @@ void e1000e_clear_hw_cntrs_base(struct e1000_hw *hw) > > > * Checks to see of the link status of the hardware has changed. If a > > > * change in link status has been detected, then we read the PHY > > > registers > > > * to get the current speed/duplex if link exists. > > > + * > > > + * Returns a negative error code (-E1000_ERR_*) or 0 (link down) or 1 > > > (link > > > + * up). > > > **/ > > > s32 e1000e_check_for_copper_link(struct e1000_hw *hw) > > > { > > > @@ -423,7 +426,7 @@ s32 e1000e_check_for_copper_link(struct e1000_hw *hw) > > > * Change or Rx Sequence Error interrupt. > > > */ > > > if (!mac->get_link_status) > > > - return 0; > > > + return 1; > > > /* First we want to see if the MII Status Register reports > > > * link. If so, then we want to get the current speed/duplex > > > @@ -461,10 +464,12 @@ s32 e1000e_check_for_copper_link(struct e1000_hw > > > *hw) > > > * different link partner. > > > */ > > > ret_val = e1000e_config_fc_after_link_up(hw); > > > - if (ret_val) > > > + if (ret_val) { > > > e_dbg("Error configuring flow control\n"); > > > + return ret_val; > > > + } > > > - return ret_val; > > > + return 1; > > > } > > > /** > > > diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c > > > b/drivers/net/ethernet/intel/e1000e/netdev.c > > > index fc6a1d9999b2..5a8ab1136566 100644 > > > --- a/drivers/net/ethernet/intel/e1000e/netdev.c > > > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c > > > @@ -5081,7 +5081,7 @@ static bool e1000e_has_link(struct e1000_adapter > > > *adapter) > > > case e1000_media_type_copper: > > > if (hw->mac.get_link_status) { > > > ret_val = hw->mac.ops.check_for_link(hw); > > > - link_active = !hw->mac.get_link_status; > > > + link_active = ret_val > 0; > > > } else { > > > link_active = true; > > > } > > > > Hello Benjamin, > > > > Will this patch fix any serious problem with link indication? Is it > > necessary? Can we consider your patch series without 4/5 part? > > Without this patch, you have the race condition that can make the > watchdog_task mistakenly think the link is down when it isn't, and then > it resets the adapter, which does make the link go down. > > So it is rather catastrophic for the interface. > > The other patch to the interrupt handling should make it never get hit, > but the issue does still exist if not fixed and I wouldn't rule out that > it could possibly still happen even with the other fix in place.
Exactly. I wouldn't have explained it better, thanks.