** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Disco)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Eoan)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Focal)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1852077

Title:
  Backport: bonding: fix state transition issue in link monitoring

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Disco:
  In Progress
Status in linux source package in Eoan:
  In Progress
Status in linux source package in Focal:
  In Progress

Bug description:
  == Justification ==
  From the well-explained commit message:

  Since de77ecd4ef02 ("bonding: improve link-status update in
  mii-monitoring"), the bonding driver has utilized two separate variables
  to indicate the next link state a particular slave should transition to.
  Each is used to communicate to a different portion of the link state
  change commit logic; one to the bond_miimon_commit function itself, and
  another to the state transition logic.

   Unfortunately, the two variables can become unsynchronized,
  resulting in incorrect link state transitions within bonding.  This can
  cause slaves to become stuck in an incorrect link state until a
  subsequent carrier state transition.

   The issue occurs when a special case in bond_slave_netdev_event
  sets slave->link directly to BOND_LINK_FAIL.  On the next pass through
  bond_miimon_inspect after the slave goes carrier up, the BOND_LINK_FAIL
  case will set the proposed next state (link_new_state) to BOND_LINK_UP,
  but the new_link to BOND_LINK_DOWN.  The setting of the final link state
  from new_link comes after that from link_new_state, and so the slave
  will end up incorrectly in _DOWN state.

   Resolve this by combining the two variables into one.

  == Fixes ==
  * 1899bb32 (bonding: fix state transition issue in link monitoring)

  This patch can be cherry-picked into E/F

  For older releases like B/D, it will need to be backported, as they are
  missing the slave_err() printk macro added in 5237ff79 (bonding: add
  slave_foo printk macros) as well as the commit that replaces netdev_err()
  with slave_err(), e2a7420d (bonding/main: convert to using slave
  printk macros).

  For Xenial, the commit that causes this issue, de77ecd4, does not
  exist.

  == Test ==
  Test kernels can be found here:
  https://people.canonical.com/~phlin/kernel/lp-1852077-bonding/

  The X-hwe and Disco kernels were tested by the bug reporter, Aleksei;
  the patched kernels work as expected.

  == Regression Potential ==
  Low.
  This patch just unifies the variables used in the link state change commit
  logic to prevent an incorrect state from occurring. The changes
  are limited to the bonding driver itself.

  (Although include/net/bonding.h is also used by other drivers, the
  changes to that file only affect the bond_main.c driver.)

  == Original Bug Report ==
  There's an issue with the bonding driver in the current Ubuntu kernels.
  Sometimes one link gets stuck in a weird state.
  It was fixed upstream with the patch
  https://www.spinics.net/lists/netdev/msg609506.html
  (commit 1899bb325149e481de31a4f32b59ea6f24e176ea).

  We see this bug with Linux 4.15 (Ubuntu Xenial, HWE kernel), but it
  should be reproducible with other current kernel versions.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852077/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
