Hi all,

My name is Iván. I am an OpenShift consultant working with a customer who is
facing the following issue: when a pod starts to OOM, the node's network
interfaces go down one after another until the node ends up in the 'NotReady'
state.
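
In case it helps, the OOM events and the link flaps can be correlated live
with something like the following (a minimal sketch; eno1/eno2 and bond0 are
the interface names from our hosts):

# on the affected node: follow kernel messages, filtered for OOM, NIC and bonding events
sh-4.4# journalctl -k -f | grep -E -i 'oom|i40e|i40iw|bond0'

# from outside the node: watch it flip to NotReady
$ oc get nodes -w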

Here is the driver information (modinfo), followed by the dmesg output:

sh-4.4# modinfo i40e
filename:       /lib/modules/4.18.0-193.51.1.el8_2.x86_64/kernel/drivers/net/ethernet/intel/i40e/i40e.ko.xz
version:        2.8.20-k
license:        GPL v2
description:    Intel(R) Ethernet Connection XL710 Network Driver
author:         Intel Corporation, <e1000-devel@lists.sourceforge.net>
rhelversion:    8.2
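
In case the firmware version is relevant, it can be cross-checked against the
driver with ethtool (eno1 is one of the bond0 slaves):

# prints driver, version, firmware-version and bus-info for the NIC
sh-4.4# ethtool -i eno1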

[Tue Jul 27 09:36:43 2021] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podd28f9b32_f407_44bd_a64d_cf05a10f2a5f.slice/crio-6ffbabbb06eb557c53304b0b253122a82ac4ea5d31535503f812a97dff9ac4c.scope: cache:0KB rss:132KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:904KB inactive_file:0KB active_file:0KB unevictable:0KB
[Tue Jul 27 09:36:43 2021] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[Tue Jul 27 09:36:43 2021] [21805]     0 21805    35869      651   176128        0         -1000 conmon
[Tue Jul 27 09:36:43 2021] [21806]     0 21806   383963     5780   249856        0         -1000 runc
[Tue Jul 27 09:36:43 2021] [21835]     0 21835     5029      855    65536        0         -1000 exe


*[Tue Jul 27 09:36:43 2021] Out of memory and no killable processes...*
*[Tue Jul 27 09:36:43 2021] i40e 0000:14:00.1: Query for DCB configuration failed, err I40E_ERR_NOT_READY aq_err OK*
*[Tue Jul 27 09:36:44 2021] i40e 0000:14:00.1: DCB init failed -63, disabled*
[Tue Jul 27 09:36:44 2021] bond0: (slave eno2): link status definitely down, disabling slave
*[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.0: Query for DCB configuration failed, err I40E_ERR_NOT_READY aq_err OK*
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.0: DCB init failed -63, disabled
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.1 eno2: port 4789 already offloaded
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.1 eno2: port 4789 already offloaded
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.1: FW LLDP is enabled

*[Tue Jul 27 09:36:45 2021] bond0: (slave eno1): link status definitely down, disabling slave*
[Tue Jul 27 09:36:45 2021] i40iw_deinit_device: state = 11
[Tue Jul 27 09:36:45 2021] bond0: (slave eno2): link status definitely up, 10000 Mbps full duplex
[Tue Jul 27 09:36:45 2021] ib_srpt srpt_remove_one(i40iw1): nothing to do.
[Tue Jul 27 09:36:45 2021] device vethda19890a entered promiscuous mode
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.0 eno1: port 4789 already offloaded
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.0 eno1: port 4789 already offloaded
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.0: FW LLDP is enabled
[Tue Jul 27 09:36:45 2021] i40iw_deinit_device: state = 11
[Tue Jul 27 09:36:45 2021] ib_srpt srpt_remove_one(i40iw0): nothing to do.
[Tue Jul 27 09:36:45 2021] i40iw_initialize_dev: DCB is set/clear = 0
[Tue Jul 27 09:36:45 2021] i40iw_wait_pe_ready: [1283] fm load status[x0703]
[Tue Jul 27 09:36:45 2021] i40iw_wait_pe_ready: [1288] I40E_GLPE_CPUSTATUS1 status[x0080]
[Tue Jul 27 09:36:45 2021] i40iw_wait_pe_ready: [1291] I40E_GLPE_CPUSTATUS2 status[x0080]
[Tue Jul 27 09:36:45 2021] bond0: (slave eno1): link status definitely up, 10000 Mbps full duplex
[Tue Jul 27 09:36:45 2021] i40iw_wait_pe_ready: [1283] fm load status[x0703]
[Tue Jul 27 09:36:45 2021] i40iw_wait_pe_ready: [1288] I40E_GLPE_CPUSTATUS1 status[x0080]
[Tue Jul 27 09:36:45 2021] i40iw_wait_pe_ready: [1291] I40E_GLPE_CPUSTATUS2 status[x0080]
[Tue Jul 27 09:36:45 2021] ib_srpt MAD registration failed for i40iw0-1.
[Tue Jul 27 09:36:45 2021] ib_srpt srpt_add_one(i40iw0) failed.
[Tue Jul 27 09:36:45 2021] i40iw_open: i40iw_open completed
[Tue Jul 27 09:36:45 2021] ib_srpt MAD registration failed for i40iw1-1.
[Tue Jul 27 09:36:45 2021] ib_srpt srpt_add_one(i40iw1) failed.
[Tue Jul 27 09:36:45 2021] i40iw_open: i40iw_open completed
[Tue Jul 27 09:36:49 2021] exe invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=-1000
[Tue Jul 27 09:36:50 2021] exe cpuset=/ mems_allowed=0-3
[Tue Jul 27 09:36:50 2021] CPU: 53 PID: 21835 Comm: exe Tainted: G        W    L   --------- -  - 4.18.0-193.51.1.el8_2.x86_64 #1
[Tue Jul 27 09:36:50 2021] Hardware name: HPE ProLiant XL170r Gen10/ProLiant XL170r Gen10, BIOS U38 10/26/2020
[Tue Jul 27 09:36:50 2021] Call Trace:
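
One thing I notice in the dump: all three processes in that cgroup (conmon,
runc, exe) run with oom_score_adj -1000, i.e. OOM_SCORE_ADJ_MIN, which makes
them exempt from the OOM killer. That seems to match the "Out of memory and
no killable processes" line, right after which the i40e DCB failures and the
bond0 link flaps begin. The adjustment can be checked per process like this
(21835 is the 'exe' PID from the dump above):

# -1000 disables OOM killing for the process entirely
sh-4.4# cat /proc/21835/oom_score_adj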

I have been looking for information about these error codes, but there is
very little available on the Internet.

Perhaps some of you have already run into this problem.

Any information would be very helpful!
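
Since the log also shows "FW LLDP is enabled" right around the DCB init
failures, one thing I plan to look at is the driver's private flags (assuming
this RHEL 8.2 i40e build exposes the disable-fw-lldp flag; I have not
verified that on these hosts):

# lists i40e private flags; disable-fw-lldp, where available, stops the firmware LLDP agent
sh-4.4# ethtool --show-priv-flags eno1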

BR,
-- 
*Ivan Pazos*

Senior OpenShift Consultant

Red Hat Iberia <https://www.redhat.com/>

ivan.pa...@redhat.com
Mobile : +34647962071
