Hi Assaf,

In addition to the commands Jesse listed, please also
provide the output of "ethtool -i <eth#>".
This will help us identify the NIC and the
firmware revision you are using.
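
If it is easier to gather everything in one pass, the requests in this
thread could be collected with something like the sketch below. It is only
an illustration: the helper names (diagnostic_commands, collect) are made up
for this example, it assumes Python 3.7 or later plus ethtool and devlink
inside the VM, and "ethtool -m" will likely need root.

```python
import subprocess


def diagnostic_commands(iface):
    """Return the commands requested in this thread for one interface."""
    return [
        ["ethtool", "-i", iface],    # driver and firmware information
        ["ethtool", "-S", iface],    # NIC statistics
        ["ethtool", "-m", iface],    # plugged-in module (transceiver) data
        ["devlink", "dev", "info"],  # devlink device information
    ]


def collect(iface, run=subprocess.run):
    """Run each command and return the combined output as plain text.

    `run` is injectable only so the function can be exercised without
    real hardware; by default it is subprocess.run.
    """
    sections = []
    for argv in diagnostic_commands(iface):
        try:
            result = run(argv, capture_output=True, text=True)
            body = result.stdout + result.stderr
        except FileNotFoundError:
            # Tool is not installed on this system; note it and continue.
            body = "(command not found)\n"
        sections.append("$ " + " ".join(argv) + "\n" + body)
    return "\n".join(sections)
```

Calling print(collect("eth0")) on an affected system and pasting the result
into a reply as text should cover what has been asked for so far.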

- Don


> -----Original Message-----
> From: Jesse Brandeburg <jesse.brandeb...@intel.com>
> Sent: Tuesday, December 5, 2023 10:47 AM
> To: Assaf Albo <ass...@qwilt.com>; e1000-devel@lists.sourceforge.net; Matan
> Levy <mat...@qwilt.com>
> Subject: Re: [e1000-devel] Intel E810 100Gb goes down sporadically
> 
> On 12/3/2023 1:26 AM, Assaf Albo via E1000-devel wrote:
> > Hello guys,
> >
> > We are having constant network issues in production in that the link goes
> > down, waits *exactly* 7-8 seconds, and goes up again.
> > This can happen zero to a few times a day on all our servers; they are not
> > in the same location and are connected to different network devices.
> >
> > Each server runs as a KVM virtual machine with 60 CPUs (Pinning) and 224Gi
> > (Huge pages) - overall performance is excellent.
> > The NIC is PCI passed through to the KVM machine AS IS.
> > OS Rocky Linux 8.5, kernel 4.18.0-348.23.1.el8_5.x86_64 with Intel ice
> > 1.9.11 built and installed using rpm.
> > We have a traffic generator between two servers (our app: client+server)
> > that is reaching 94Gb and can replicate this issue.
> >
> > The dmesg output when the issue occurs:
> > Nov 28 16:01:27 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is Down
> > Nov 28 16:01:35 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is up 100
> > Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg
> > Advertised: Off, Autoneg Negotiated: False, Flow Control: None
> 
> Hi Assaf, sorry to hear you're having problems.
> 
> w.r.t. the link down events we need to determine if it is a local down
> or remote.
> 
> Please gather the 'ethtool -S eth0' statistics for a system that has had
> some problems, and send to the list as text.
> 
> Also, please send the output of 'ethtool -m eth0'.
> 
> The passthrough device shouldn't be any problem but I do recommend that
> if you're passing through the device to a VM, you try to match the
> destination PCIe function number to the origination ID to prevent odd
> issues.
> 
> like if your host device is:
> 01:00.1 then (I'm not sure you can do this) I'd hope the VM device is
> 00:06.1, and not 00:06.0
> 
> So I guess with that statement I'd ask do you ever see the problem on
> systems with
> 3b:00.0 (ice PF PCIe in host)
> 00:06.0 (ice PF in VM)
> 
> having the link down issues?
> 
> Please include output from devlink dev info, and if you know it, what
> switch you're connected to.
> 
> Also, do you see any stats or events on the switch side when link is lost?
> 
> - Jesse
> 
> 
> _______________________________________________
> E1000-devel mailing list
> E1000-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
> To learn more about Intel Ethernet, visit
> https://community.intel.com/t5/Ethernet-Products/bd-p/ethernet-products

