> -----Original Message----- > From: Jay Vosburgh [mailto:jay.vosbu...@canonical.com] > Sent: 2019年9月19日 16:12 > To: zhangsha (A) <zhangsha.zh...@huawei.com> > Cc: vfal...@gmail.com; a...@greyhouse.net; da...@davemloft.net; > net...@vger.kernel.org; linux-kernel@vger.kernel.org; yuehaibing > <yuehaib...@huawei.com>; hunongda <hunon...@huawei.com>; > Chenzhendong (alex) <alex.c...@huawei.com> > Subject: Re: [PATCH v3] bonding: force enable lacp port after link state > recovery for 802.3ad > > zhangsha (A) <zhangsha.zh...@huawei.com> wrote: > > >> -----Original Message----- > >> From: zhangsha (A) > >> Sent: 2019年9月18日 21:06 > >> To: jay.vosbu...@canonical.com; vfal...@gmail.com; > >> a...@greyhouse.net; da...@davemloft.net; net...@vger.kernel.org; > >> linux-kernel@vger.kernel.org; yuehaibing <yuehaib...@huawei.com>; > >> hunongda <hunon...@huawei.com>; Chenzhendong (alex) > >> <alex.c...@huawei.com>; zhangsha (A) <zhangsha.zh...@huawei.com> > >> Subject: [PATCH v3] bonding: force enable lacp port after link state > >> recovery for 802.3ad > >> > >> From: Sha Zhang <zhangsha.zh...@huawei.com> > >> > >> After the commit 334031219a84 ("bonding/802.3ad: fix slave link > >> initialization transition states") merged, the slave's link status > >> will be changed to BOND_LINK_FAIL from BOND_LINK_DOWN in the > following scenario: > >> - Driver reports loss of carrier and > >> bonding driver receives NETDEV_DOWN notifier > >> - slave's duplex and speed is zerod and > >> its port->is_enabled is cleard to 'false'; > >> - Driver reports link recovery and > >> bonding driver receives NETDEV_UP notifier; > >> - If speed/duplex getting failed here, the link status > >> will be changed to BOND_LINK_FAIL; > >> - The MII monotor later recover the slave's speed/duplex > >> and set link status to BOND_LINK_UP, but remains > >> the 'port->is_enabled' to 'false'. > >> > >> In this scenario, the lacp port will not be enabled even its speed > >> and duplex are valid. The bond will not send LACPDU's, and its state is > 'AD_STATE_DEFAULTED' > >> forever. The simplest fix I think is to call > >> bond_3ad_handle_link_change() in bond_miimon_commit, this function > >> can enable lacp after port slave speed check. > >> As enabled, the lacp port can run its state machine normally after link > recovery. > >> > >> Signed-off-by: Sha Zhang <zhangsha.zh...@huawei.com> > >> --- > >> drivers/net/bonding/bond_main.c | 3 ++- > >> 1 file changed, 2 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/net/bonding/bond_main.c > >> b/drivers/net/bonding/bond_main.c index 931d9d9..76324a5 100644 > >> --- a/drivers/net/bonding/bond_main.c > >> +++ b/drivers/net/bonding/bond_main.c > >> @@ -2206,7 +2206,8 @@ static void bond_miimon_commit(struct bonding > >> *bond) > >> */ > >> if (BOND_MODE(bond) == BOND_MODE_8023AD && > >> slave->link == BOND_LINK_UP) > >> - > >> bond_3ad_adapter_speed_duplex_changed(slave); > >> + bond_3ad_handle_link_change(slave, > >> + BOND_LINK_UP); > >> continue; > >> > >> case BOND_LINK_UP: > > > >Hi, David, > >I have replied your email for a while, I guess you may miss my email, so I > resend it. > >The following link address is the last email, please review the new one > >again, > thank you. > >https://patchwork.ozlabs.org/patch/1151915/ > > > >Last time, you doubted this is a driver specific problem, I prefer to > >believe it's not because I find the commit 4d2c0cda, its log says " > >Some NIC drivers don't have correct speed/duplex settings at the time > >they send NETDEV_UP notification ...". > > > >Anyway, I think the lacp status should be fixed correctly, since > >link-monitoring (miimon) set SPEED/DUPLEX right here. > > I suspect this is going to be related to the concurrent discussion with > Aleksei, and would like to see the instrumentation results from his tests > before > adding another change to attempt to resolve this. > > Also, what device are you using for your testing, and are you able to > run the instrumentation patch that I provided to Aleksei and provide its > results? > > -J >
Yes, I think it's the same problem. I am using a Huawei hinic card with kernel 4.19 and got the same message and the weird system mac "00:00:00:00:00:00": "link status definitely down for interface eth6, disabling it link status up again after 0 ms for interface eth6" I will run your instrumentation patch, but maybe I need more times. In fact, I have tried to reproduce the problem for thousands of times, but never succeeded. > --- > -Jay Vosburgh, jay.vosbu...@canonical.com