On Tue, Aug 18, 2020 at 04:53:42PM +1000, Jonathan Matthew wrote:
> On Mon, Aug 17, 2020 at 03:32:35PM -0400, Winfred Harrelson wrote:
> > On Mon, Aug 17, 2020 at 03:40:47PM +0200, Hrvoje Popovski wrote:
> > > On 17.8.2020. 11:46, Stuart Henderson wrote:
> > > > On 2020-08-15, Hrvoje Popovski <hrv...@srce.hr> wrote:
> > > >> On 15.8.2020. 0:48, Hrvoje Popovski wrote:
> > > >>> On 12.8.2020. 15:18, Winfred Harrelson wrote:
> > > >>>> On Tue, Aug 11, 2020 at 07:52:10PM +0100, Tom Smyth wrote:
> > > >>>>> Hi Winfred,
> > > >>>>> the intel 710 is a complex card, I would suggest that you try updating the
> > > >>>>> firmware on the card, available from intel.com or your card vendor,
> > > >>>>> you may have to boot to a live linux cd to apply the firmware update,
> > > >>>>>
> > > >>>>> but I had some issues with the Intel XL710 cards and I had to update the
> > > >>>>> firmware to get it working stable,
> > > >>>>>
> > > >>>>> I hope this helps
> > > >>>>> Tom Smyth
> > > >>>>
> > > >>>> Adding misc@openbsd.org back to the CC for the record.
> > > >>>>
> > > >>>> Thanks for the quick reply. I didn't reply back yesterday because I
> > > >>>> was having trouble getting the firmware updated from a Linux boot disk.
> > > >>>> I ended up having to try from a Windows boot disk. Unfortunately, I
> > > >>>> am getting the same thing again:
> > > >>>>
> > > >>>>
> > > >>>> wharrels@styx2:/home/wharrels# dmesg | grep ^ixl
> > > >>>> ixl0 at pci5 dev 0 function 0 "Intel XXV710 SFP28" rev 0x02: port 0, FW 8.0.61820 API 1.11, msix, 8 queues, address 3c:fd:fe:ed:b7:28
> > > >>>> ixl1 at pci5 dev 0 function 1 "Intel XXV710 SFP28" rev 0x02: port 1, FW 8.0.61820 API 1.11, msix, 8 queues, address 3c:fd:fe:ed:b7:29
> > > >>>> ixl2 at pci8 dev 0 function 0 "Intel XXV710 SFP28" rev 0x02: port 0, FW 8.0.61820 API 1.11, msix, 8 queues, address 3c:fd:fe:eb:19:b0
> > > >>>> ixl3 at pci8 dev 0 function 1 "Intel XXV710 SFP28" rev 0x02: port 1, FW 8.0.61820 API 1.11, msix, 8 queues, address 3c:fd:fe:eb:19:b1
> > > >>>> ixl4 at pci12 dev 0 function 0 "Intel X722 10GBASE-T" rev 0x09: port 0, FW 3.1.57069 API 1.5, msix, 8 queues, address 3c:ec:ef:1a:df:f2
> > > >>>> ixl5 at pci12 dev 0 function 1 "Intel X722 10GBASE-T" rev 0x09: port 1, FW 3.1.57069 API 1.5, msix, 8 queues, address 3c:ec:ef:1a:df:f3
> > > >>>>
> > > >>>> Yup, all the XXV710 cards have been updated to newest firmware.
> > > >>>>
> > > >>>> Now for the (failed) attempt:
> > > >>>>
> > > >>>> wharrels@styx2:/etc# ifconfig ixl0
> > > >>>> ixl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> > > >>>>         lladdr 3c:fd:fe:ed:b7:28
> > > >>>>         index 1 priority 0 llprio 3
> > > >>>>         media: Ethernet autoselect (25GbaseSR full-duplex)
> > > >>>>         status: active
> > > >>>> wharrels@styx2:/etc# ifconfig ixl2
> > > >>>> ixl2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> > > >>>>         lladdr 3c:fd:fe:eb:19:b0
> > > >>>>         index 3 priority 0 llprio 3
> > > >>>>         media: Ethernet autoselect (25GbaseSR full-duplex)
> > > >>>>         status: active
> > > >>>> wharrels@styx2:/etc# ifconfig aggr1 create
> > > >>>> wharrels@styx2:/etc# ifconfig aggr1 trunkport ixl0
> > > >>>> wharrels@styx2:/etc# ifconfig aggr1 trunkport ixl2
> > > >>>> wharrels@styx2:/etc# ifconfig aggr1 up
> > > >>>> wharrels@styx2:/etc# ifconfig aggr1
> > > >>>> aggr1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> > > >>>>         lladdr fe:e1:ba:d0:7c:e9
> > > >>>>         index 11 priority 0 llprio 7
> > > >>>>         trunk: trunkproto lacp
> > > >>>>         trunk id: [(8000,fe:e1:ba:d0:7c:e9,000B,0000,0000),
> > > >>>>                  (0000,00:00:00:00:00:00,0000,0000,0000)]
> > > >>>>                 ixl0 lacp actor system pri 0x8000 mac fe:e1:ba:d0:7c:e9, key 0xb, port pri 0x8000 number 0x1
> > > >>>>                 ixl0 lacp actor state activity,aggregation,defaulted
> > > >>>>                 ixl0 lacp partner system pri 0x0 mac 00:00:00:00:00:00, key 0x0, port pri 0x0 number 0x0
> > > >>>>                 ixl0 lacp partner state activity,aggregation,sync
> > > >>>>                 ixl0 port
> > > >>>>                 ixl2 lacp actor system pri 0x8000 mac fe:e1:ba:d0:7c:e9, key 0xb, port pri 0x8000 number 0x3
> > > >>>>                 ixl2 lacp actor state activity,aggregation,defaulted
> > > >>>>                 ixl2 lacp partner system pri 0x0 mac 00:00:00:00:00:00, key 0x0, port pri 0x0 number 0x0
> > > >>>>                 ixl2 lacp partner state activity,aggregation,sync
> > > >>>>                 ixl2 port
> > > >>>>         groups: aggr
> > > >>>>         media: Ethernet autoselect
> > > >>>>         status: no carrier
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> I tried doing another sysupgrade this morning just in case something
> > > >>>> had changed overnight but no luck. Any other ideas?
> > > >>>>
> > > >>>> Winfred
> > > >>>>
> > > >>>
> > > >>> Hi,
> > > >>>
> > > >>> could you try install snapshot from http://ftp.hostserver.de/archive/
> > > >>> that is older than Thu Jun 25 06:41:38 2020 UTC ...
> > > >>>
> > > >>> maybe this commit broke xxv710
> > > >>> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pci/if_ixl.c?rev=1.56&content-type=text/x-cvsweb-markup
> > > >>>
> > > >>> i have vlans over aggr over x710-da2 with latest snapshot and it's
> > > >>> working as expected ..
> > > >>>
> > > >>> ixl0 at pci1 dev 0 function 0 "Intel X710 SFP+" rev 0x02: port 0, FW 7.3.60988 API 1.10, msix, 8 queues
> > > >>> ixl1 at pci1 dev 0 function 1 "Intel X710 SFP+" rev 0x02: port 1, FW 7.3.60988 API 1.10, msix, 8 queues
> > > >>>
> > > >>
> > > >> with new firmware aggr is working
> > > >>
> > > >> ixl0 at pci1 dev 0 function 0 "Intel X710 SFP+" rev 0x02: port 0, FW 8.0.61820 API 1.11, msix, 8 queues
> > > >> ixl1 at pci1 dev 0 function 1 "Intel X710 SFP+" rev 0x02: port 1, FW 8.0.61820 API 1.11, msix, 8 queues
> > > >
> > > > That's the same firmware as in your previous (failing) report,
> > > > so is that "with new firmware and a snapshot from before Thu Jun 25"?
> >
> > Stuart, you may have gotten message from Hrvoje confused with mine
> > (Winfred). Hrvoje seems to have gotten this to work but I haven't.
> > I can use trunk(4) but I just think it would be nice to try to find
> > out what is going on here. Don't want to be a pain though.
> >
> > >
> > > it would be great if winfred could test snapshot before Jun 25 with
> > > xxv710 card. x710 card works great with new firmware (8.0) and older one
> > > 7.3 ..
> > >
> > I have no way of testing this (25Gbps cards in lacp bond) at home
> > so I have been testing at work. This is why I haven't done anything
> > over the weekend.
> >
> > Grabbed snapshot from 2020-06-24 with same results:
>
> This sounds like multicast filters aren't working properly with your nic.
> trunk(4) puts trunk ports in promisc mode, so multicast filters don't matter,
> but aggr(4) doesn't. Could you try running 'tcpdump -ni ixl0' for a while and
> see if that side of the aggr starts working?

I left the tcpdump running for a little over 5 minutes but that changed
nothing:

wharrels@styx2:/etc# ifconfig aggr1
aggr1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr fe:e1:ba:d1:25:69
        index 12 priority 0 llprio 7
        trunk: trunkproto lacp
        trunk id: [(8000,fe:e1:ba:d1:25:69,000C,0000,0000),
                 (0000,00:00:00:00:00:00,0000,0000,0000)]
                ixl0 lacp actor system pri 0x8000 mac fe:e1:ba:d1:25:69, key 0xc, port pri 0x8000 number 0x1
                ixl0 lacp actor state activity,aggregation,defaulted
                ixl0 lacp partner system pri 0x0 mac 00:00:00:00:00:00, key 0x0, port pri 0x0 number 0x0
                ixl0 lacp partner state activity,aggregation,sync
                ixl0 port
                ixl1 lacp actor system pri 0x8000 mac fe:e1:ba:d1:25:69, key 0xc, port pri 0x8000 number 0x2
                ixl1 lacp actor state activity,aggregation,defaulted
                ixl1 lacp partner system pri 0x0 mac 00:00:00:00:00:00, key 0x0, port pri 0x0 number 0x0
                ixl1 lacp partner state activity,aggregation,sync
                ixl1 port
        groups: aggr
        media: Ethernet autoselect
        status: no carrier

I also ran the same tcpdump on ixl1 but that didn't help.

> Other parts of the output indicate we're not compatible with some aspects of
> the new firmware API, so I guess we have some work to do there.

Wasn't working with the older firmware for the XXV710 cards either.
I originally had FW 6.0.48442 before updating to the newest.

Don't know how much longer I can keep messing with this before I have
to go back to trunk(4) and put this box into production. I will try
to get another test box up soon if I can. I also may be able to get
hold of some Mellanox cards for testing.

Thanks again to everyone for your time and help!

Winfred
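
P.S. For anyone re-checking this on another box: in LACP the "defaulted" actor state flag means the port is running on default partner information because it has not received an LACPDU from the other side, which matches the all-zero partner MAC in the output above. Below is a rough sketch, not a tested tool, that picks those ports out of saved `ifconfig aggr1` output. The function name `defaulted_ports` and the sample text are made up for illustration, and the regex assumes the line layout shown in this thread.

```python
import re

# Matches lines like:
#   "ixl0 lacp actor state activity,aggregation,defaulted"
# The layout is assumed from the ifconfig output quoted in this thread.
ACTOR_STATE = re.compile(r"^\s*(\S+)\s+lacp actor state\s+(\S+)\s*$")


def defaulted_ports(ifconfig_text):
    """Return member ports whose LACP actor state includes 'defaulted'."""
    ports = []
    for line in ifconfig_text.splitlines():
        match = ACTOR_STATE.match(line)
        if match and "defaulted" in match.group(2).split(","):
            ports.append(match.group(1))
    return ports


# Hypothetical sample, trimmed from the aggr1 output in this message.
SAMPLE = """\
        ixl0 lacp actor state activity,aggregation,defaulted
        ixl0 lacp partner state activity,aggregation,sync
        ixl1 lacp actor state activity,aggregation,defaulted
        ixl1 lacp partner state activity,aggregation,sync
"""

if __name__ == "__main__":
    print(defaulted_ports(SAMPLE))
```

An empty list would suggest LACPDUs are arriving on every member port; any port it names is still waiting to hear from the switch.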