Public bug reported:

intro
-----

Our internal test triggers the kernel crash dump below:
[  888.690348] Sun Mar 24 23:51:59 2024: DriVerTest - Start Test
 [  888.691834] ----------------------------------------------------------------------------------------------------
 [  888.983912] mlx5_core 0000:08:00.1 eth3: Link up
 [  888.987644] IPv6: ADDRCONF(NETDEV_CHANGE): eth3: link becomes ready
 [  889.336577] mlx5_core 0000:08:00.0 eth2: Link up
 [  894.635836] Sun Mar 24 11:52:04 PM IST 2024 - DriVerTest Debug Heartbeat
 [  940.431644] general protection fault, probably for non-canonical address 0x8002001400000000: 0000 [#1] SMP NOPTI
 [  940.432866] CPU: 7 PID: 94305 Comm: ethtool Tainted: G           OE     5.15.0-1039.17.g0d63875-bluefield #1
 [  940.433970] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 [  940.435220] RIP: 0010:netlink_policy_dump_add_policy+0x95/0x160
 [  940.435893] Code: 48 c1 e0 04 4c 8b 34 01 4d 85 f6 74 5b 31 db eb 10 4c 89 e8 83 c3 01 48 c1 e0 04 39 5c 01 08 72 3f 89 d8 48 c1 e0 04 4c 01 f0 <0f> b6 10 83 ea 08 83 fa 01 77 dc 0f b7 50 02 48 8b 70 08 48 8d 7c
 [  940.437921] RSP: 0018:ffa0000002d37a08 EFLAGS: 00010286
 [  940.438551] RAX: 8002001400000000 RBX: 0000000000000000 RCX: ff1100027d000000
 [  940.439351] RDX: 00000000fffffff8 RSI: 0000000000000018 RDI: ffa0000002d37a10
 [  940.440131] RBP: 0000000000000003 R08: 0000000000400000 R09: ff1100027d2d0f10
 [  940.440900] R10: 0000000000000318 R11: 0000000000000000 R12: ff1100011fa59bc0
 [  940.441683] R13: 0000000000000004 R14: 8002001400000000 R15: ffffffff83fa6540
 [  940.442459] FS:  00007f4a17993740(0000) GS:ff1100085f9c0000(0000) knlGS:0000000000000000
 [  940.443394] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [  940.444044] CR2: 0000000000429f50 CR3: 000000012fc2e002 CR4: 0000000000771ee0
 [  940.444847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [  940.445639] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [  940.446431] PKRU: 55555554
 [  940.446795] Call Trace:
 [  940.447144]  <TASK>
 [  940.447444]  ? __die_body+0x1b/0x60
 [  940.447880]  ? die_addr+0x39/0x60
 [  940.448315]  ? exc_general_protection+0x1bc/0x3c0
 [  940.448867]  ? asm_exc_general_protection+0x22/0x30
 [  940.449445]  ? netlink_policy_dump_add_policy+0x95/0x160
 [  940.450058]  ? netlink_policy_dump_add_policy+0xb2/0x160
 [  940.450714]  ? ethtool_get_phc_vclocks+0x70/0x70
 [  940.451272]  ctrl_dumppolicy_start+0xc4/0x2a0
 [  940.451788]  ? ethnl_reply_init+0xd0/0xd0
 [  940.452284]  ? __nla_parse+0x22/0x30
 [  940.452734]  ? __cond_resched+0x15/0x30
 [  940.453211]  ? kmem_cache_alloc_trace+0x44/0x390
 [  940.453750]  genl_start+0xc3/0x150
 [  940.454179]  __netlink_dump_start+0x175/0x250
 [  940.454706]  genl_family_rcv_msg_dumpit.isra.0+0x9a/0x100
 [  940.455334]  ? genl_family_rcv_msg_attrs_parse.isra.0+0xe0/0xe0
 [  940.455998]  ? genl_unlock+0x20/0x20
 [  940.456453]  ? genl_parallel_done+0x40/0x40
 [  940.456957]  genl_rcv_msg+0x11f/0x2b0
 [  940.457421]  ? genl_get_cmd+0x170/0x170
 [  940.457890]  ? ctrl_dumppolicy_put_op.isra.0+0x1e0/0x1e0
 [  940.458515]  ? genl_lock_done+0x60/0x60
 [  940.458987]  ? genl_family_rcv_msg_doit.isra.0+0x110/0x110
 [  940.459634]  netlink_rcv_skb+0x54/0x100
 [  940.460107]  genl_rcv+0x24/0x40
 [  940.460504]  netlink_unicast+0x18d/0x230
 [  940.460983]  netlink_sendmsg+0x240/0x4a0
 [  940.461472]  __sock_sendmsg+0x2f/0x40
 [  940.461922]  __sys_sendto+0xee/0x160
 [  940.462384]  ? __sys_recvmsg+0x56/0xa0
 [  940.462854]  ? exit_to_user_mode_prepare+0x35/0x170
 [  940.463439]  __x64_sys_sendto+0x25/0x30
 [  940.463906]  do_syscall_64+0x35/0x80
 [  940.464368]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
 [  940.464955] RIP: 0033:0x7f4a17aa940a
 [  940.465415] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
 [  940.467418] RSP: 002b:00007ffc3612cac8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 [  940.468284] RAX: ffffffffffffffda RBX: 0000000000c3b3b0 RCX: 00007f4a17aa940a
 [  940.469057] RDX: 0000000000000024 RSI: 0000000000c3b3b0 RDI: 0000000000000003
 [  940.469852] RBP: 0000000000c3b2a0 R08: 00007f4a17ba4200 R09: 000000000000000c
 [  940.470674] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000c3b340
 [  940.471470] R13: 0000000000c3b350 R14: 00007ffc3612caec R15: 0000000000c3b3b0
 [  940.472257]  </TASK>
 [  940.472570] Modules linked in: udp_diag(E) tcp_diag(E) inet_diag(E) 
iptable_raw(E) openvswitch(E) nsh(E) nf_conncount(E) rdma_ucm(OE) rdma_cm(OE) 
iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) 
mlxfw(OE) auxiliary(OE) mlxdevm(OE) ib_uverbs(OE) ib_core(OE) mlx_compat(OE) 
memtrack(OE) psample(E) ptp(E) pps_core(E) nfsv3(E) nfs_acl(E) 
rpcsec_gss_krb5(E) xt_conntrack(E) auth_rpcgss(E) xt_MASQUERADE(E) 
nf_conntrack_netlink(E) nfnetlink(E) xt_addrtype(E) iptable_filter(E) 
iptable_nat(E) nf_nat(E) br_netfilter(E) bridge(E) stp(E) llc(E) nfsv4(E) 
dns_resolver(E) nfs(E) lockd(E) grace(E) fscache(E) netfs(E) overlay(E) 
rfkill(E) sunrpc(E) kvm_intel(E) iTCO_wdt(E) iTCO_vendor_support(E) kvm(E) 
irqbypass(E) virtio_net(E) i2c_i801(E) pcspkr(E) i2c_smbus(E) lpc_ich(E) 
net_failover(E) mfd_core(E) failover(E) sch_fq_codel(E) drm(E) i2c_core(E) 
ip_tables(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) 
sha256_ssse3(E) sha1_ssse3(E) serio_raw(E) fuse(E)
 [  940.472612]  [last unloaded: ib_core]
 [  940.481959] ---[ end trace 09663efb82dc1774 ]---
 [  940.482523] RIP: 0010:netlink_policy_dump_add_policy+0x95/0x160
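
The fault line reports a non-canonical address, and the same value 8002001400000000 appears in both RAX and R14. On x86-64 a virtual address is canonical only if every bit above the implemented width is a copy of the highest implemented bit. The helper below is a minimal sketch of that rule, assuming either 48-bit (4-level paging) or 57-bit (5-level paging) addresses; `is_canonical()` is a hypothetical name, not a kernel function. The CR4 value in the dump (0000000000771ee0) appears to have bit 12 (LA57) set, which would suggest 5-level paging, consistent with kernel pointers such as ff1100027d000000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if addr is canonical for a virtual
 * address width of va_bits (48 for 4-level, 57 for 5-level paging on
 * x86-64). Bits 63 down to (va_bits - 1) must be all zeros or all ones. */
bool is_canonical(uint64_t addr, unsigned int va_bits)
{
	assert(va_bits >= 2 && va_bits <= 64); /* keeps the shifts below defined */

	unsigned int n = 65 - va_bits;          /* number of high bits that must agree */
	uint64_t top = addr >> (va_bits - 1);   /* those high bits, right-aligned */
	uint64_t all_ones = (UINT64_C(1) << n) - 1;

	return top == 0 || top == all_ones;
}
```

Under either width, 8002001400000000 fails the check, so it cannot be a real pointer. That fits the commit message's explanation that the policy dump code works with garbage picked up from the stack.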


fix
---

The following upstream patch needs to be cherry-picked:

commit c1b05105573b2cd5845921eb0d2caa26e2144a34
Author: Jakub Kicinski <k...@kernel.org>
Date:   Wed Nov 9 10:32:54 2022 -0800

    genetlink: fix single op policy dump when do is present

    Jonathan reports crashes when running net-next in Meta's fleet.
    Stats collection uses ethtool -I which does a per-op policy dump
    to check if stats are supported. We don't initialize the dumpit
    information if doit succeeds due to evaluation short-circuiting.

    The crash may look like this:

       BUG: kernel NULL pointer dereference, address: 0000000000000cc0
       RIP: 0010:netlink_policy_dump_add_policy+0x174/0x2a0
         ctrl_dumppolicy_start+0x19f/0x2f0
         genl_start+0xe7/0x140

    Or we may trigger a warning:

       WARNING: CPU: 1 PID: 785 at net/netlink/policy.c:87 netlink_policy_dump_get_policy_idx+0x79/0x80
       RIP: 0010:netlink_policy_dump_get_policy_idx+0x79/0x80
         ctrl_dumppolicy_put_op+0x214/0x360

    depending on what garbage we pick up from the stack.

    Reported-by: Jonathan Lemon <b...@meta.com>
    Fixes: 26588edbef60 ("genetlink: support split policies in ctrl_dumppolicy_put_op()")
    Reviewed-by: Jacob Keller <jacob.e.kel...@intel.com>
    Tested-by: Leon Romanovsky <leo...@nvidia.com>
    Link: https://lore.kernel.org/r/20221109183254.554051-1-k...@kernel.org
    Signed-off-by: Jakub Kicinski <k...@kernel.org>
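
The root cause named in the commit message, where the dumpit information is never initialized because short-circuit evaluation skips it once the doit lookup succeeds, can be modeled outside the kernel. The sketch below is a standalone C illustration, not the upstream patch: `struct op_info`, `lookup_op()`, `get_both_buggy()` and `get_both_fixed()` are hypothetical names. It assumes the lookup follows the kernel convention of returning 0 on success and a negative errno on failure, and that a failed lookup clears its output.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for a per-command op descriptor
 * (NOT the kernel's struct genl_split_ops). */
struct op_info {
	int cmd;
	const char *policy;
};

/* Hypothetical lookup: 0 on success, -ENOENT on failure. In this sketch only
 * command 1 exists, and it supports both "do" and "dump". On failure the
 * descriptor is cleared so it never carries stale data. */
int lookup_op(int cmd, int want_dump, struct op_info *out)
{
	assert(out != NULL);
	if (cmd != 1) {
		memset(out, 0, sizeof(*out));
		return -ENOENT;
	}
	out->cmd = cmd;
	out->policy = want_dump ? "dump-policy" : "do-policy";
	return 0;
}

/* Buggy pattern: a successful "do" lookup returns 0, so `&&` short-circuits,
 * the "dump" lookup never runs, and *dump keeps whatever it held before. */
int get_both_buggy(int cmd, struct op_info *doit, struct op_info *dump)
{
	return lookup_op(cmd, 0, doit) && lookup_op(cmd, 1, dump);
}

/* Fixed pattern: always run both lookups, and report failure only when
 * neither kind of op exists. */
int get_both_fixed(int cmd, struct op_info *doit, struct op_info *dump)
{
	int err_do = lookup_op(cmd, 0, doit);
	int err_dump = lookup_op(cmd, 1, dump);

	return (err_do && err_dump) ? -ENOENT : 0;
}
```

With command 1, `get_both_buggy()` returns 0 (success) while leaving the dump descriptor untouched, so a caller that trusts the return value goes on to use whatever the descriptor previously held. In the kernel that "previous content" is uninitialized stack memory, which matches the crash and warning variants the commit describes.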

** Affects: linux-bluefield (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2059961

Title:
  genetlink: fix single op policy dump when do is present

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/2059961/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
