Hi Assaf,

In addition to the commands listed by Jesse, please also provide the output of "ethtool -i <eth#>". This will help us identify the NIC and firmware revision you are using.
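For convenience, everything requested in this thread can be captured in one pass with a small script like the sketch below. The interface name (eth0) and output filename are assumptions — substitute your actual values:

```shell
#!/bin/sh
# Collect the diagnostics requested in this thread for one interface.
# IFACE defaults to eth0 -- an assumption; pass your real interface name.
IFACE="${1:-eth0}"
OUT="ice-diag-$(date +%Y%m%d-%H%M%S).txt"

{
  echo "== ethtool -i (driver/firmware revision) =="
  ethtool -i "$IFACE"
  echo "== ethtool -S (interface statistics) =="
  ethtool -S "$IFACE"
  echo "== ethtool -m (module/SFP EEPROM) =="
  ethtool -m "$IFACE"
  echo "== devlink dev info =="
  devlink dev info
} > "$OUT" 2>&1

echo "Diagnostics written to $OUT"
```

Running it once shortly after a link-flap event and attaching the resulting text file to your reply should cover all of the requests in one go.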
- Don

> -----Original Message-----
> From: Jesse Brandeburg <jesse.brandeb...@intel.com>
> Sent: Tuesday, December 5, 2023 10:47 AM
> To: Assaf Albo <ass...@qwilt.com>; e1000-devel@lists.sourceforge.net; Matan Levy <mat...@qwilt.com>
> Subject: Re: [e1000-devel] Intel E810 100Gb goes down sporadically
>
> On 12/3/2023 1:26 AM, Assaf Albo via E1000-devel wrote:
> > Hello guys,
> >
> > We are having constant network issues in production in that the link goes
> > down, waits *exactly* 7-8 seconds, and goes up again.
> > This can happen zero to a few times a day on all our servers; they are not
> > in the same location and are connected to different network devices.
> >
> > Each server runs as a KVM virtual machine with 60 CPUs (pinning) and 224Gi
> > (huge pages) - overall performance is excellent.
> > The NIC is PCI passed through to the KVM machine as is.
> > OS Rocky Linux 8.5, kernel 4.18.0-348.23.1.el8_5.x86_64 with Intel ice
> > 1.9.11 built and installed using rpm.
> > We have a traffic generator between two servers (our app: client+server)
> > that reaches 94Gb and can replicate this issue.
> >
> > The dmesg output once the issue occurs:
> > Nov 28 16:01:27 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is Down
> > Nov 28 16:01:35 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is up 100
> > Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg
> > Advertised: Off, Autoneg Negotiated: False, Flow Control: None
>
> Hi Assaf, sorry to hear you're having problems.
>
> W.r.t. the link down events, we need to determine whether it is a local
> down or a remote one.
>
> Please gather the 'ethtool -S eth0' statistics from a system that has
> had some problems, and send them to the list as text.
>
> Also, 'ethtool -m eth0'.
>
> The passthrough device shouldn't be any problem, but I do recommend that
> if you're passing through the device to a VM, you try to match the
> destination PCIe function number to the origination ID to prevent odd
> issues.
> Like, if your host device is 01:00.1, then (I'm not sure you can do
> this) I'd hope the VM device is 00:06.1, and not 00:06.0.
>
> So I guess with that statement I'd ask: do you ever see the problem on
> systems with
>
> 3b:00.0 (ice PF PCIe in host)
> 00:06.0 (ice PF in VM)
>
> having the link down issues?
>
> Please include output from 'devlink dev info', and, if you know it, what
> switch you're connected to.
>
> Also, do you see any stats or events on the switch side when the link is
> lost?
>
> - Jesse
>
> _______________________________________________
> E1000-devel mailing list
> E1000-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
> To learn more about Intel Ethernet, visit
> https://community.intel.com/t5/Ethernet-Products/bd-p/ethernet-products