On 4/29/20 1:38 PM, Jay Vosburgh wrote:
Thomas Falcon <tlfal...@linux.ibm.com> wrote:

The following behavior has been observed when testing logical partition
migration of LACP-bonded VNIC devices in a PowerVM pseries environment.

1. When performing the migration, the bond master detects that a slave has
   lost its link, deactivates the LACP port, and sets the port's
   is_enabled flag to false.
2. The slave device then updates its carrier state to off while it resets
   itself. This update triggers a NETDEV_CHANGE notification, which performs
   a speed and duplex update. The device does not return a valid speed
   and duplex, so the master sets the slave link state to BOND_LINK_FAIL.
3. When the slave VNIC device(s) are active again, some operations, such
   as setting the port's is_enabled flag, are not performed when transitioning
   the link state back to BOND_LINK_UP from BOND_LINK_FAIL, though the state
   prior to the speed check was BOND_LINK_DOWN.
        Just to make sure I'm understanding correctly, with regard to
"the state prior to the speed check was BOND_LINK_DOWN," do you mean
that during step 1, the slave link is set to BOND_LINK_DOWN, and then in
step 2 changed from _DOWN to _FAIL?

Affected devices are therefore not utilized in the aggregation though they
are operational. The simplest way to fix this seems to be to restrict the
link state change to devices that are currently up and running.
        This sounds similar to an issue from last fall; can you confirm
that you're running with a kernel that includes:

1899bb325149 bonding: fix state transition issue in link monitoring

It did not have that fix.  I will patch the kernel and rerun the test.

Thanks,

Tom


        -J
        

CC: Jay Vosburgh <j.vosbu...@gmail.com>
CC: Veaceslav Falico <vfal...@gmail.com>
CC: Andy Gospodarek <a...@greyhouse.net>
Signed-off-by: Thomas Falcon <tlfal...@linux.ibm.com>
---
drivers/net/bonding/bond_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 2e70e43c5df5..d840da7cd379 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3175,7 +3175,8 @@ static int bond_slave_netdev_event(unsigned long event,
                 * speeds/duplex are available.
                 */
                if (bond_update_speed_duplex(slave) &&
-                   BOND_MODE(bond) == BOND_MODE_8023AD) {
+                   BOND_MODE(bond) == BOND_MODE_8023AD &&
+                   slave->link == BOND_LINK_UP) {
                        if (slave->last_link_up)
                                slave->link = BOND_LINK_FAIL;
                        else
--
2.18.2
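
For readers following the thread, the reasoning in steps 1-3 of the patch
description can be modeled with a small stand-alone state machine. This is
only a sketch under stated assumptions, not the bonding driver's code: every
toy_* name is hypothetical, step 1 is assumed to leave the slave in the DOWN
state (the point being asked about above), and recovery from FAIL is assumed
to skip re-enabling the port, as the description reports. The guard flag
stands in for the added "slave->link == BOND_LINK_UP" check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the bonding link states; not the kernel definitions. */
enum toy_link { TOY_LINK_UP, TOY_LINK_FAIL, TOY_LINK_DOWN };

struct toy_slave {
	enum toy_link link;
	bool port_enabled;  /* models the LACP port's is_enabled flag */
	bool was_up_before; /* models a nonzero slave->last_link_up */
};

/* Step 1: the bond sees the lost link and disables the LACP port.
 * Assumption: the slave ends up in the DOWN state here. */
static void toy_link_lost(struct toy_slave *s)
{
	assert(s != NULL);
	s->link = TOY_LINK_DOWN;
	s->port_enabled = false;
}

/* Step 2: a NETDEV_CHANGE event arrives while speed/duplex are invalid.
 * With guard set, the state is only touched if the slave is currently UP. */
static void toy_netdev_change(struct toy_slave *s, bool speed_valid, bool guard)
{
	assert(s != NULL);
	if (speed_valid)
		return;
	if (guard && s->link != TOY_LINK_UP)
		return;
	s->link = s->was_up_before ? TOY_LINK_FAIL : TOY_LINK_DOWN;
}

/* Step 3: the device is operational again.
 * Assumption: only the DOWN -> UP path re-enables the port, while
 * FAIL -> UP restores the link state and nothing else. */
static void toy_link_recovered(struct toy_slave *s)
{
	assert(s != NULL);
	if (s->link == TOY_LINK_DOWN)
		s->port_enabled = true;
	s->link = TOY_LINK_UP;
}

/* Runs steps 1-3 once and reports whether the port ends up enabled. */
bool toy_run_migration(bool guard)
{
	struct toy_slave s = { TOY_LINK_UP, true, true };

	toy_link_lost(&s);
	toy_netdev_change(&s, false, guard);
	toy_link_recovered(&s);
	return s.port_enabled;
}
```

In this model, toy_run_migration(false) leaves the port disabled even though
the link is back up, while toy_run_migration(true) re-enables it. Whether the
real driver still behaves this way on a kernel that includes commit
1899bb325149 is exactly what the rerun mentioned above is meant to establish.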

---
        -Jay Vosburgh, jay.vosbu...@canonical.com
