I am getting a 'general protection fault: 0000 [#1] SMP PTI' in the i40e
driver. I am using the Intel SR-IOV CNI (https://github.com/intel/sriov-cni/)
and the SR-IOV network device plugin
(https://github.com/intel/sriov-network-device-plugin) to create and
delete pods with SR-IOV VFs attached to containers. If I add only a VLAN to
the VF (via the CNI), the crash occurs on 'kubectl delete pod'. If I add
both a VLAN and QoS to the VF (via the CNI), 'kubectl delete pod' does not
crash. I have not been able to reproduce the crash with 'ip link set
<iface> vf <vfid> vlan <vlanid>' commands.
Details:
Running Fedora 29, kernel 5.2.11-100.fc29.x86_64
$ ethtool -i eno1
driver: i40e
version: 2.8.20-k
firmware-version: 6.80 0x80003d71 18.8.9
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
The crash first occurred with i40e 2.8.20-k, so I downloaded and built
2.9.21, which also crashes. The details below are from 2.9.21.
[Sep18 10:35] general protection fault: 0000 [#1] SMP PTI
[ +0.000030] CPU: 35 PID: 2783 Comm: sriov Tainted: G OE
5.2.11-100.fc29.x86_64 #1
[ +0.000026] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.9.1
12/04/2018
[ +0.000032] RIP: 0010:i40e_config_vf_promiscuous_mode+0x17a/0x350 [i40e]
[ +0.000021] Code: 48 8b 00 83 d1 00 48 85 c0 75 ef 48 83 c6 08 48 39 34
24 75 dd 85 c9 74 77 44 0f b6 64 24 08 45 31 d2 4d 8b 3e 4d 85 ff 74 57
<41> 0f b7 4f 16 66 81 f9 ff 0f 77 43 0f b7 b3 ea 0c 00 00 45 31 c0
[ +0.000047] RSP: 0018:ffffa814c90c78b0 EFLAGS: 00010202
[ +0.000016] RAX: 0000000000000000 RBX: ffff9bcb9639f000 RCX:
0000000000000000
[ +0.000019] RDX: 0000000000000000 RSI: 0000000006000000 RDI:
ffff9bcb83f30370
[ +0.000020] RBP: ffff9bcb83f30008 R08: 0000000000000000 R09:
0000000000000023
[ +0.000020] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[ +0.000020] R13: 0000000000000000 R14: ffff9bcb9639f338 R15:
207904daf17a68cd
[ +0.000019] FS: 00000000006d85d0(0000) GS:ffff9bdbbf840000(0000)
knlGS:0000000000000000
[ +0.000022] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000017] CR2: 000000c000111000 CR3: 0000002023cf6002 CR4:
00000000003606e0
[ +0.000019] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ +0.000020] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ +0.000020] Call Trace:
[ +0.000018] ? i40e_ndo_set_vf_port_vlan+0x1c4/0x2c0 [i40e]
[ +0.000022] ? do_setlink+0x577/0xe90
[ +0.000018] ? security_sock_rcv_skb+0x2a/0x40
[ +0.000015] ? sk_filter_trim_cap+0x4f/0x210
[ +0.000015] ? netlink_attachskb+0x1bc/0x1d0
[ +0.000014] ? rtnl_setlink+0xdd/0x140
[ +0.000015] ? security_capset+0x50/0x60
[ +0.000013] ? rtnetlink_rcv_msg+0x2b1/0x360
[ +0.000014] ? rtnl_calcit.isra.32+0x110/0x110
[ +0.000014] ? netlink_rcv_skb+0x49/0x110
[ +0.000013] ? netlink_unicast+0x191/0x220
[ +0.000013] ? netlink_sendmsg+0x204/0x3d0
[ +0.000015] ? sock_sendmsg+0x4c/0x50
[ +0.000013] ? __sys_sendto+0xee/0x160
[ +0.000013] ? __sys_bind+0x79/0xf0
[ +0.000033] ? __sys_socket+0x93/0xe0
[ +0.000015] ? __x64_sys_sendto+0x24/0x30
[ +0.000017] ? do_syscall_64+0x5f/0x1a0
[ +0.000015] ? page_fault+0x8/0x30
[ +0.000013] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ +0.000017] Modules linked in: i40iw i40e(OE) veth vxlan ip6_udp_tunnel
udp_tunnel xt_statistic xt_nat xt_comment xt_mark nf_conntrack_netlink
xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE iavf tun bridge stp llc
ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4
xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw
iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vfio_pci
vfio_virqfd vfio_iommu_type1 vfio overlay ip_set uio nfnetlink
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables
rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi
sunrpc ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm
ib_umad rdma_cm ib_cm iw_cm iTCO_wdt iTCO_vendor_support dcdbas intel_rapl
sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate
intel_uncore
[ +0.000033] intel_rapl_perf joydev mxm_wmi lpc_ich ipmi_ssif ib_uverbs
ib_core mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter
pcc_cpufreq xfs libcrc32c mgag200 drm_kms_helper ttm virtio_net
net_failover drm crc32c_intel igb failover megaraid_sas dca i2c_algo_bit
wmi [last unloaded: i40e]
[ +0.005208] ---[ end trace deea73cdd7c0f936 ]---
[ +0.012672] RIP: 0010:i40e_config_vf_promiscuous_mode+0x17a/0x350 [i40e]
[ +0.000660] Code: 48 8b 00 83 d1 00 48 85 c0 75 ef 48 83 c6 08 48 39 34
24 75 dd 85 c9 74 77 44 0f b6 64 24 08 45 31 d2 4d 8b 3e 4d 85 ff 74 57
<41> 0f b7 4f 16 66 81 f9 ff 0f 77 43 0f b7 b3 ea 0c 00 00 45 31 c0
[ +0.001287] RSP: 0018:ffffa814c90c78b0 EFLAGS: 00010202
[ +0.000778] RAX: 0000000000000000 RBX: ffff9bcb9639f000 RCX:
0000000000000000
[ +0.000688] RDX: 0000000000000000 RSI: 0000000006000000 RDI:
ffff9bcb83f30370
[ +0.000657] RBP: ffff9bcb83f30008 R08: 0000000000000000 R09:
0000000000000023
[ +0.000676] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[ +0.000695] R13: 0000000000000000 R14: ffff9bcb9639f338 R15:
207904daf17a68cd
[ +0.000669] FS: 00000000006d85d0(0000) GS:ffff9bdbbf840000(0000)
knlGS:0000000000000000
[ +0.000658] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000669] CR2: 000000c000111000 CR3: 0000002023cf6002 CR4:
00000000003606e0
[ +0.000665] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ +0.000674] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ +0.000670] Kernel panic - not syncing: Fatal exception
I disassembled the object file with:
objdump -S i40e_virtchnl_pf.o > i40e_virtchnl_pf.txt
The crash is located in i40e_config_vf_promiscuous_mode() in
i40e_virtchnl_pf.c (the faulting instruction is marked with '-->'):
static i40e_status i40e_config_vf_promiscuous_mode(struct i40e_vf *vf,
                                                   u16 vsi_id,
                                                   bool allmulti,
                                                   bool alluni)
{
:
        if (vf->port_vlan_id) {
:
        } else if (i40e_getnum_vf_vsi_vlan_filters(vsi)) {
     1b5: 85 c9                 test   %ecx,%ecx
     1b7: 74 77                 je     230 <i40e_config_vf_promiscuous_mode+0x1e0>
                aq_ret = i40e_aq_set_vsi_mc_promisc_on_vlan(hw,
     1b9: 44 0f b6 64 24 08     movzbl 0x8(%rsp),%r12d
        i40e_status aq_ret = I40E_SUCCESS;
     1bf: 45 31 d2              xor    %r10d,%r10d
                hash_for_each(vsi->mac_filter_hash, bkt, f, hlist) {
     1c2: 4d 8b 3e              mov    (%r14),%r15
     1c5: 4d 85 ff              test   %r15,%r15
     1c8: 74 57                 je     221 <i40e_config_vf_promiscuous_mode+0x1d1>
                        if (f->vlan < 0 || f->vlan > I40E_MAX_VLANID)
 --> 1ca: 41 0f b7 4f 16        movzwl 0x16(%r15),%ecx
     1cf: 66 81 f9 ff 0f        cmp    $0xfff,%cx
     1d4: 77 43                 ja     219 <i40e_config_vf_promiscuous_mode+0x1c9>
                aq_ret = i40e_aq_set_vsi_mc_promisc_on_vlan(hw,
     1d6: 0f b7 b3 ea 0c 00 00  movzwl 0xcea(%rbx),%esi
     1dd: 45 31 c0              xor    %r8d,%r8d
     1e0: 44 89 e2              mov    %r12d,%edx
     1e3: 48 89 ef              mov    %rbp,%rdi
     1e6: e8 00 00 00 00        callq  1eb <i40e_config_vf_promiscuous_mode+0x19b>
                        if (aq_ret) {
     1eb: 85 c0                 test   %eax,%eax
     1ed: 0f 85 87 00 00 00     jne    27a <i40e_config_vf_promiscuous_mode+0x22a>
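
A note on why this shows up as a general protection fault rather than a
page fault: at the marked instruction, R15 holds the value just loaded from
the bucket head at (%r14), and the register dump shows
R15 = 207904daf17a68cd. On x86-64 with 4-level paging (the CR4 value in the
dump does not have the LA57 bit set), an access through an address whose
bits 63..47 are not all equal is non-canonical and raises #GP. The helper
below is only an illustrative userspace check with a hypothetical name
(is_canonical_48); it is not part of the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns true if addr is canonical under 4-level paging, i.e. bits
 * 63..47 are all zeros or all ones (the address is the sign extension
 * of its low 48 bits).
 */
bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;  /* the 17 bits 63..47 */

    return top == 0 || top == 0x1ffff;
}
```

With the values from the dump, is_canonical_48(0x207904daf17a68cd) is
false, while the bucket address in R14 (0xffff9bcb9639f338) passes. That is
consistent with the bucket head being readable and the entry pointer read
from it being garbage.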
FYI: I added a lot of debug prints to try to narrow this down, and if I
print all the buckets before hash_for_each(), the problem goes away. So it
looks like a race condition, with multiple threads accessing the same data.
I removed the debug prints, added a
spin_lock_bh(&vsi->mac_filter_hash_lock) /
spin_unlock_bh(&vsi->mac_filter_hash_lock) pair, and changed
hash_for_each() to hash_for_each_safe(), and the crash also went away.
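
For illustration, here is a minimal userspace sketch of one way to guard
such a walk: copy the VLAN IDs out of the table while holding the lock,
then make the per-VLAN calls after the lock is released. This is not the
driver code and not necessarily the right fix. The types, the names
(struct filter, struct filter_table, snapshot_vlans) and the atomic_flag
spinlock are hypothetical stand-ins for the filter hash and
mac_filter_hash_lock, and the idea of moving the per-VLAN call outside the
lock rests on the assumption (not verified here) that the call may block.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VLANID 4095
#define NBUCKETS   8

/* Hypothetical stand-in for a MAC/VLAN filter entry. */
struct filter {
    struct filter *next;
    int16_t vlan;
};

/* Hypothetical stand-in for the per-VSI filter hash and its lock. */
struct filter_table {
    atomic_flag lock;
    struct filter *buckets[NBUCKETS];
};

static void table_lock(struct filter_table *t)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set(&t->lock))
        ;
}

static void table_unlock(struct filter_table *t)
{
    atomic_flag_clear(&t->lock);
}

/*
 * Copy up to cap valid VLAN IDs into out[] while holding the lock.
 * Returns the number of IDs stored. The caller can then act on the
 * copy without touching the shared table.
 */
size_t snapshot_vlans(struct filter_table *t, int16_t *out, size_t cap)
{
    size_t n = 0;

    table_lock(t);
    for (size_t b = 0; b < NBUCKETS; b++) {
        for (struct filter *f = t->buckets[b]; f; f = f->next) {
            if (f->vlan < 0 || f->vlan > MAX_VLANID)
                continue;       /* skip invalid or "any VLAN" entries */
            if (n < cap)
                out[n++] = f->vlan;
        }
    }
    table_unlock(t);
    return n;
}
```

The point of the snapshot is that entries can be added or freed by another
thread as soon as the lock is dropped, so nothing after table_unlock()
dereferences a filter pointer.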
Let me know what additional data you need.
Thanks,
Billy McFall