On Tue, Jul 9, 2024 at 5:08 AM Mina Almasry <almasrym...@google.com> wrote: >

Hi Mina,
Thanks a lot for your reply!

> On Thu, Jul 4, 2024 at 10:57 AM Taehee Yoo <ap420...@gmail.com> wrote:
> >
> > I found several locking warnings while testing.
> >
>
> Thanks for Testing Taehee! And sorry for the late reply. I was off for
> a couple of days. With some minor tweaks to my test setup I was able
> to reproduce and fix all 3 warnings.
>
> > [ 1135.125874] WARNING: CPU: 1 PID: 1644 at
> > drivers/dma-buf/dma-buf.c:1123 dma_buf_map_attachment+0x164/0x2f0
> ...
> > [ 1136.178258] WARNING: CPU: 1 PID: 1644 at
> > drivers/dma-buf/dma-buf.c:1226 dma_buf_unmap_attachment+0x267/0x320
>
> Both of these are warnings that dma->resv is not locked when calling
> dma_buf_[un]map_attachment(). As far as I can tell so far, this can be
> resolved by using the unlocked versions:
> dma_buf_[un]map_attachment_unlocked() which is correct here for this
> static importer.
>
> ...
>
> > [ 1135.709313] WARNING: CPU: 3 PID: 1644 at
> > net/core/netdev_rx_queue.c:18 netdev_rx_queue_restart+0x3f4/0x5a0
>
> This is due to rtnl_lock() actually not being acquired in the unbind
> path, when the netlink socket is closed. Sorry about that. This is
> fixed by obtaining rtnl_lock() in the unbind path.
>
> With the fixes below all the warnings disappear. I'm planning to
> squash them to the next version. Let me know if those don't work for
> you. Thanks!
>
> diff --git a/net/core/devmem.c b/net/core/devmem.c
> index e52bca1a55c7c..a6ef1485b80f2 100644
> --- a/net/core/devmem.c
> +++ b/net/core/devmem.c
> @@ -46,8 +46,8 @@ void __net_devmem_dmabuf_binding_free(struct
> net_devmem_dmabuf_binding *binding)
>                  size, avail))
>         gen_pool_destroy(binding->chunk_pool);
>
> -       dma_buf_unmap_attachment(binding->attachment, binding->sgt,
> -                                DMA_FROM_DEVICE);
> +       dma_buf_unmap_attachment_unlocked(binding->attachment, binding->sgt,
> +                                         DMA_FROM_DEVICE);
>         dma_buf_detach(binding->dmabuf, binding->attachment);
>         dma_buf_put(binding->dmabuf);
>         xa_destroy(&binding->bound_rxqs);
> @@ -157,8 +157,8 @@ struct net_devmem_dmabuf_binding
> *net_devmem_bind_dmabuf(struct net_device *dev,
>                 goto err_free_id;
>         }
>
> -       binding->sgt =
> -               dma_buf_map_attachment(binding->attachment, DMA_FROM_DEVICE);
> +       binding->sgt = dma_buf_map_attachment_unlocked(binding->attachment,
> +                                                      DMA_FROM_DEVICE);
>         if (IS_ERR(binding->sgt)) {
>                 err = PTR_ERR(binding->sgt);
>                 goto err_detach;
> @@ -225,8 +225,8 @@ struct net_devmem_dmabuf_binding
> *net_devmem_bind_dmabuf(struct net_device *dev,
>                                 net_devmem_dmabuf_free_chunk_owner, NULL);
>         gen_pool_destroy(binding->chunk_pool);
>  err_unmap:
> -       dma_buf_unmap_attachment(binding->attachment, binding->sgt,
> -                                DMA_FROM_DEVICE);
> +       dma_buf_unmap_attachment_unlocked(binding->attachment, binding->sgt,
> +                                         DMA_FROM_DEVICE);
>  err_detach:
>         dma_buf_detach(dmabuf, binding->attachment);
>  err_free_id:
> diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> index 4b16b3ad2ec5b..33bb20c143997 100644
> --- a/net/core/netdev-genl.c
> +++ b/net/core/netdev-genl.c
> @@ -861,6 +861,9 @@ void netdev_nl_sock_priv_destroy(struct list_head *priv)
>         struct net_devmem_dmabuf_binding *binding;
>         struct net_devmem_dmabuf_binding *temp;
>
> -       list_for_each_entry_safe(binding, temp, priv, list)
> +       list_for_each_entry_safe(binding, temp, priv, list) {
> +               rtnl_lock();
>                 net_devmem_unbind_dmabuf(binding);
> +               rtnl_unlock();
> +       }
>  }
>
>
>
> --
> Thanks,
> Mina

I tested the above fix, and it works well.
I also found another bug.

[  236.625141] BUG: KASAN: slab-use-after-free in net_devmem_unbind_dmabuf+0x364/0x440
[  236.633488] Read of size 8 at addr ffff8881490d00b0 by task ncdevmem/1480

[  236.643137] CPU: 0 PID: 1480 Comm: ncdevmem Tainted: G        W 6.10.0-rc5+ #50 8d4b0a557c4b34e2938739913129f4523354121c
[  236.655443] Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
[  236.664307] Call Trace:
[  236.667443]  <TASK>
[  236.670234]  dump_stack_lvl+0x7e/0xc0
[  236.674583]  print_report+0xc1/0x5e0
[  236.678850]  ? __virt_addr_valid+0x1f5/0x3d0
[  236.683803]  ? net_devmem_unbind_dmabuf+0x364/0x440
[  236.689362]  kasan_report+0xb9/0xf0
[  236.693536]  ? net_devmem_unbind_dmabuf+0x364/0x440
[  236.699094]  net_devmem_unbind_dmabuf+0x364/0x440
[  236.704487]  ? __pfx_lock_release+0x10/0x10
[  236.709352]  ? __pfx_net_devmem_unbind_dmabuf+0x10/0x10
[  236.715256]  netdev_nl_sock_priv_destroy+0x77/0xd0
[  236.720743]  genl_release+0xed/0x190
[  236.725004]  ? __pfx_genl_release+0x10/0x10
[  236.729870]  ? rcu_is_watching+0x11/0xb0
[  236.734476]  ? netlink_release+0x7d8/0x18f0
[  236.739343]  ? trace_irq_enable.constprop.0+0xe4/0x130
[  236.745168]  ? __pfx_genl_release+0x10/0x10
[  236.750034]  netlink_release+0x839/0x18f0
[  236.754727]  ? netlink_release+0x1a9/0x18f0
[  236.759594]  ? __pfx_netlink_release+0x10/0x10
[  236.764719]  ? __pfx_down_write+0x10/0x10
[  236.769413]  ? __pfx_locks_remove_file+0x10/0x10
[  236.774718]  __sock_release+0xa3/0x260
[  236.779153]  sock_close+0x14/0x20
[  236.783153]  __fput+0x367/0xad0
[  236.786982]  ? trace_irq_enable.constprop.0+0xe4/0x130
[  236.792801]  task_work_run+0x12e/0x220
[  236.797243]  ? __pfx_task_work_run+0x10/0x10
[  236.802193]  ? do_raw_spin_unlock+0x54/0x220
[  236.807149]  do_exit+0x916/0x2570
...
[  236.994294] Allocated by task 1503:
[  236.998470]  kasan_save_stack+0x20/0x40
[  237.002992]  kasan_save_track+0x10/0x30
[  237.007513]  __kasan_slab_alloc+0x83/0x90
[  237.012203]  kmem_cache_alloc_node_noprof+0x154/0x380
[  237.017936]  kmalloc_reserve+0x140/0x240
[  237.022541]  __alloc_skb+0x10d/0x2d0
[  237.026801]  alloc_uevent_skb+0x79/0x210
[  237.031408]  kobject_uevent_env+0xd7c/0x10e0
[  237.036362]  __kobject_del+0x131/0x1d0
[  237.040794]  kobject_put+0x23e/0x3f0
[  237.045056]  net_rx_queue_update_kobjects+0x35d/0x470
[  237.050789]  netdev_unregister_kobject+0x139/0x250
[  237.056266]  unregister_netdevice_many_notify+0xf05/0x1900
[  237.062429]  unregister_netdevice_queue+0x29a/0x360
[  237.067988]  unregister_netdev+0x18/0x20
[  237.072594]  0xffffffffc09de73c
[  237.076422]  pci_device_remove+0xa7/0x1d0
[  237.081140]  device_release_driver_internal+0x36d/0x530
[  237.087044]  driver_detach+0xc1/0x180
[  237.091392]  bus_remove_driver+0x11a/0x2a0
[  237.096173]  pci_unregister_driver+0x26/0x250
[  237.101210]  0xffffffffc0a47b3c
[  237.105038]  __do_sys_delete_module.constprop.0+0x2ff/0x4b0
[  237.111289]  do_syscall_64+0x64/0x140
[  237.115637]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

[  237.123553] Freed by task 279:
[  237.127296]  kasan_save_stack+0x20/0x40
[  237.131816]  kasan_save_track+0x10/0x30
[  237.136337]  kasan_save_free_info+0x37/0x60
[  237.141203]  poison_slab_object+0xee/0x170
[  237.145983]  __kasan_slab_free+0x2f/0x50
[  237.150589]  kmem_cache_free+0x12e/0x470
[  237.155197]  skb_release_data+0x51f/0x790
[  237.159891]  consume_skb+0xa7/0x110
[  237.164063]  netlink_recvmsg+0x4f9/0xc80
[  237.168672]  ____sys_recvmsg+0x5fc/0x860
[  237.173278]  ___sys_recvmsg+0xd3/0x150
[  237.177712]  __sys_recvmsg+0xc6/0x160
[  237.182060]  do_syscall_64+0x64/0x140
[  237.186405]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

[  237.194320] The buggy address belongs to the object at ffff8881490d0040
 which belongs to the cache skbuff_small_head of size 640
[  237.208701] The buggy address is located 112 bytes inside of
 freed 640-byte region [ffff8881490d0040, ffff8881490d02c0)

[  237.224514] The buggy address belongs to the physical page:
[  237.230763] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1490d0
[  237.239433] head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[  237.247755] flags: 0x200000000000040(head|node=0|zone=2)
[  237.253746] page_type: 0xffffefff(slab)
[  237.258265] raw: 0200000000000040 ffff8881050cadc0 ffffea0004153b10 ffffea00044ddb10
[  237.266677] raw: 0000000000000000 0000000000120012 00000001ffffefff 0000000000000000
[  237.275096] head: 0200000000000040 ffff8881050cadc0 ffffea0004153b10 ffffea00044ddb10
[  237.283599] head: 0000000000000000 0000000000120012 00000001ffffefff 0000000000000000
[  237.292095] head: 0200000000000002 ffffea0005243401 ffffffffffffffff 0000000000000000
[  237.300590] head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
[  237.309088] page dumped because: kasan: bad access detected

[  237.317519] Memory state around the buggy address:
[  237.322993]  ffff8881490cff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  237.330887]  ffff8881490d0000: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
[  237.338776] >ffff8881490d0080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  237.346668]                                      ^
[  237.352139]  ffff8881490d0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  237.360032]  ffff8881490d0180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  237.367921] ==================================================================

Reproducer:
./ncdevmem -f <interface name> -l -p 5201 -v 7 -t 0 -q 2 &
sleep 10
modprobe -rv bnxt_en
killall ncdevmem

I think this is a devmem TCP core bug, so it should be reproducible
with other drivers as well.

Thanks!
Taehee Yoo
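
For illustration only, here is a userspace model of the ordering that the
reproducer exercises: the driver is removed while a binding is still live,
and the unbind then runs later from the socket close path. The trace alone
does not identify which object was freed, so this is a guess at the shape
of the problem, not a proposed patch. Every name below (model_dev,
model_rxq, model_binding and the model_* helpers) is hypothetical and is
not a kernel symbol. The sketch assumes that device teardown detaches any
live binding before freeing its queues, so a later unbind finds nothing
left to touch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; none of these are kernel types. */
struct model_binding;

struct model_rxq {
	bool bound;			/* true while a binding uses this queue */
};

struct model_dev {
	struct model_rxq *rxqs;		/* owned by the device, freed on teardown */
	size_t nr_rxqs;
	struct model_binding *binding;	/* at most one binding in this model */
};

struct model_binding {
	struct model_dev *dev;		/* NULL once detached from the device */
	size_t rxq_idx;
};

static struct model_dev *model_dev_create(size_t nr_rxqs)
{
	struct model_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->rxqs = calloc(nr_rxqs, sizeof(*dev->rxqs));
	if (!dev->rxqs) {
		free(dev);
		return NULL;
	}
	dev->nr_rxqs = nr_rxqs;
	return dev;
}

static void model_bind(struct model_dev *dev, struct model_binding *b,
		       size_t idx)
{
	assert(idx < dev->nr_rxqs && !dev->binding);
	dev->rxqs[idx].bound = true;
	dev->binding = b;
	b->dev = dev;
	b->rxq_idx = idx;
}

/* Returns true if device state was touched, false if already detached. */
static bool model_unbind(struct model_binding *b)
{
	struct model_dev *dev = b->dev;

	if (!dev)
		return false;
	dev->rxqs[b->rxq_idx].bound = false;
	dev->binding = NULL;
	b->dev = NULL;
	return true;
}

/* Teardown detaches a live binding before the queue memory goes away. */
static void model_dev_destroy(struct model_dev *dev)
{
	if (dev->binding)
		model_unbind(dev->binding);
	free(dev->rxqs);
	free(dev);
}
```

In this model, calling model_dev_destroy() and then model_unbind() on the
same binding is harmless, because the second call sees dev == NULL and
returns false. Without the detach step in model_dev_destroy(), the later
unbind would write into the freed rxqs array.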