Re: [BUG][5.20] refcount_t: underflow; use-after-free
Hi! Unfortunately, the use-after-free issue still happens on the 6.0-rc5 kernel, and it has become hard to reproduce: I spent the whole day at the computer before it happened again, while I was playing Tiny Tina's Wonderlands. So reliable reproduction is out of the question, and all we can count on is logs and tracing. I didn't see anything new in the logs. It seems we need to extend the logging somehow, so that we have more information the next time this happens.

Sep 18 20:52:16 primary-ws gnome-shell[2388]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
Sep 18 20:52:27 primary-ws gnome-shell[2388]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
Sep 18 20:53:44 primary-ws gnome-shell[2388]: Window manager warning: Window 0x4e3 sets an MWM hint indicating it isn't resizable, but sets min size 1 x 1 and max size 2147483647 x 2147483647; this doesn't make much sense.
Sep 18 20:53:45 primary-ws kernel: umip_printk: 11 callbacks suppressed
Sep 18 20:53:45 primary-ws kernel: umip: Wonderlands.exe[213853] ip:14ebb0d03 sp:4ee528: SGDT instruction cannot be used by applications.
Sep 18 20:53:45 primary-ws kernel: umip: Wonderlands.exe[213853] ip:14ebb0d03 sp:4ee528: For now, expensive software emulation returns the result.
Sep 18 20:53:53 primary-ws gnome-shell[2388]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
Sep 18 20:53:53 primary-ws kernel: umip: Wonderlands.exe[213853] ip:14ebb0d03 sp:4ee528: SGDT instruction cannot be used by applications.
Sep 18 20:53:53 primary-ws kernel: umip: Wonderlands.exe[213853] ip:14ebb0d03 sp:4ee528: For now, expensive software emulation returns the result.
Sep 18 20:54:15 primary-ws kernel: umip: Wonderlands.exe[214194] ip:15a270815 sp:6eaef490: SGDT instruction cannot be used by applications.
Sep 18 20:56:01 primary-ws kernel: umip_printk: 15 callbacks suppressed
Sep 18 20:56:01 primary-ws kernel: umip: Wonderlands.exe[213853] ip:15e3a82b0 sp:4ed178: SGDT instruction cannot be used by applications.
Sep 18 20:56:01 primary-ws kernel: umip: Wonderlands.exe[213853] ip:15e3a82b0 sp:4ed178: For now, expensive software emulation returns the result.
Sep 18 20:56:03 primary-ws kernel: umip: Wonderlands.exe[213853] ip:15e3a82b0 sp:4edbe8: SGDT instruction cannot be used by applications.
Sep 18 20:56:03 primary-ws kernel: umip: Wonderlands.exe[213853] ip:15e3a82b0 sp:4edbe8: For now, expensive software emulation returns the result.
Sep 18 20:56:03 primary-ws kernel: umip: Wonderlands.exe[213853] ip:15e3a82b0 sp:4ebf18: SGDT instruction cannot be used by applications.
Sep 18 20:57:55 primary-ws kernel: [ cut here ]
Sep 18 20:57:55 primary-ws kernel: refcount_t: underflow; use-after-free.
Sep 18 20:57:55 primary-ws kernel: WARNING: CPU: 22 PID: 235114 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
Sep 18 20:57:55 primary-ws kernel: Modules linked in: tls uinput rfcomm snd_seq_dummy snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_>
Sep 18 20:57:55 primary-ws kernel: asus_wmi ledtrig_audio sparse_keymap platform_profile irqbypass rfkill mc rapl snd_timer video wmi_bmof pcspkr snd k10temp i2c_piix4 soundcore acpi_cpufreq zram amdgpu drm_ttm_helper ttm iommu_v2 crct1>
Sep 18 20:57:55 primary-ws kernel: Unloaded tainted modules: amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_eda>
Sep 18 20:57:55 primary-ws kernel: pcc_cpufreq():1 pcc_cpufreq():1 fjes():1 fjes():1 pcc_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
Sep 18 20:57:55 primary-ws kernel: CPU: 22 PID: 235114 Comm: kworker/22:0 Tainted: GWL--- --- 6.0.0-0.rc5.20220914git3245cb65fd91.39.fc38.x86_64 #1
Sep 18 20:57:55 primary-ws kernel: Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
Sep 18 20:57:55 primary-ws kernel: Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]
Sep 18 20:57:55 primary-ws kernel: RIP: 0010:refcount_warn_saturate+0xba/0x110
Sep 18 20:57:55 primary-ws kernel: Code: 01 01 e8 69 6b 6f 00 0f 0b e9 32 38 a5 00 80 3d 4d 7d be 01 00 75 85 48 c7 c7 80 b7 8e 95 c6 05 3d 7d be 01 01 e8 46 6b 6f 00 <0f> 0b e9 0f 38 a5 00 80 3d 28 7d be 01 00 0f 85 5e ff ff ff 48 c7
Sep 18 20:57:55 primary-ws kernel: RSP: 0018:a1a853ccbe60 EFLAGS: 00010286
Sep 18 20:57:55 primary-ws kernel: RAX: 0026 RBX: 8e0e60a96c28 RCX:
Sep 18 20:57:55 primary-ws kernel: RDX: 0001 RSI: 958d255c RDI:
Sep 18 20:57:55 primary-ws kernel: RBP: 8e19a83f5600 R08: R09: a1a853ccbd10
Sep 18 20:57:55 primary-ws kernel: R10: 0003 R11: 8e19ee2fffe8 R12: 8e19a83fc800
Sep 18
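For context, the "refcount_t: underflow; use-after-free." message is printed by `refcount_warn_saturate()` (lib/refcount.c in these traces) when a `refcount_t` is decremented while it is already zero or saturated, i.e. when some code drops a reference it no longer owns. The "Workqueue:" line only says that the decrement which tripped the check ran inside the `drm_sched_entity_kill_jobs_work` work item; the unbalanced put itself may have happened earlier on another path. The following is a minimal, single-threaded userspace model of that check, meant only to illustrate the failure mode. It is a sketch under assumptions: `struct refcount_model`, `model_dec_and_test()` and `REFCOUNT_MODEL_SATURATED` are hypothetical names, and the real kernel code uses atomic operations and a one-time WARN rather than a plain message.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of a kernel refcount_t: single-threaded,
 * no atomics. Like the kernel, it pins the counter at a negative
 * "saturated" value once misuse is detected. */
#define REFCOUNT_MODEL_SATURATED (INT_MIN / 2)

struct refcount_model {
	int refs;
};

/* Number of underflow reports, so callers can observe the failure. */
static int underflow_warnings;

/* Hypothetical stand-in for refcount_warn_saturate(): saturate and report. */
static void model_warn_underflow(struct refcount_model *r)
{
	r->refs = REFCOUNT_MODEL_SATURATED;
	underflow_warnings++;
	fprintf(stderr, "refcount_t: underflow; use-after-free.\n");
}

/* Drop one reference. Returns true only when the caller released the
 * last reference and is therefore responsible for freeing the object. */
static bool model_dec_and_test(struct refcount_model *r)
{
	int old = r->refs;

	r->refs = old - 1;
	if (old == 1)
		return true;             /* last reference: free the object */
	if (old <= 0)
		model_warn_underflow(r); /* put on an object already released */
	return false;
}
```

With two references taken, the third `model_dec_and_test()` call finds the counter already at zero, which is the model's equivalent of the warning in the logs.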
Re: [BUG][5.20] refcount_t: underflow; use-after-free
On Fri, Aug 19, 2022 at 5:13 PM Maíra Canal wrote:
>
> Hi Mikhail,
>
> Could you please specify the steps to reproduce this use-after-free? I
> will try to reproduce it on the RX5700 XT and bisect the issue.
>

Hi Maíra, thanks for the help.
I'm afraid it won't be realistic to reproduce, because on a laptop with a 6800M (also RDNA 2 graphics) the problem does not occur.
Sorry for the long silence; I was trying to bisect the problem myself.

git bisect start
# status: waiting for both good and bad commits
# good: [3d7cb6b04c3f3115719235cc6866b10326de34cd] Linux 5.19
git bisect good 3d7cb6b04c3f3115719235cc6866b10326de34cd
# status: waiting for bad commit, 1 good commit known
# bad: [7ebfc85e2cd7b08f518b526173e9a33b56b3913b] Merge tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect bad 7ebfc85e2cd7b08f518b526173e9a33b56b3913b
# bad: [b44f2fd87919b5ae6e1756d4c7ba2cbba22238e1] Merge tag 'drm-next-2022-08-03' of git://anongit.freedesktop.org/drm/drm
# 001: GPU hangs + use-after-free issue - https://pastebin.com/z86E9ydx
git bisect bad b44f2fd87919b5ae6e1756d4c7ba2cbba22238e1
# good: [526942b8134cc34d25d27f95dfff98b8ce2f6fcd] Merge tag 'ata-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
# 002: good - https://pastebin.com/9qki65Sj
git bisect good 526942b8134cc34d25d27f95dfff98b8ce2f6fcd
# good: [45490ce2ff833c4ec0de66705e46ba41320860cb] nfp: flower: add support for tunnel offload without key ID
# 003: good - https://pastebin.com/vHk5eRkw
git bisect good 45490ce2ff833c4ec0de66705e46ba41320860cb
# skip: [e23a5e14aa278858c2e3d81ec34e83aa9a4177c5] Backmerge tag 'v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next
# 004: GPU not switched in graphic mode - https://pastebin.com/RmqCTMLD
git bisect skip e23a5e14aa278858c2e3d81ec34e83aa9a4177c5
# bad: [b2065fb21d9a789b14f737ea90facedabadeb8a4] drm/amdgpu: fix i2s_pdata out of bound array access
# 005: GPU hangs + use-after-free issue - https://pastebin.com/Zgw5Hc48
git bisect bad b2065fb21d9a789b14f737ea90facedabadeb8a4
# skip: [344feb7ccf764756937cfd74fa4ac5caba069c99] Merge tag 'amd-drm-next-5.20-2022-07-05' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
# 006: GPU not switched in graphic mode - https://pastebin.com/b8BUBE7Q
git bisect skip 344feb7ccf764756937cfd74fa4ac5caba069c99
# skip: [869b10ac8d2300327f554d83f4dbab041bf27d49] drm/amdgpu: add dm ip block for dcn 3.1.4
# 007: GPU not switched in graphic mode - https://pastebin.com/byd7HECH
git bisect skip 869b10ac8d2300327f554d83f4dbab041bf27d49
# skip: [676ad8e997036e2f815c293b76c356fb7cc97a08] drm: rcar-du: Lift z-pos restriction on primary plane for Gen3
# 008: GPU not switched in graphic mode - https://pastebin.com/3fXCTinb
git bisect skip 676ad8e997036e2f815c293b76c356fb7cc97a08
# skip: [5c57cbc390b166950c2e6c2f0c4edaeb0f47e97d] drm/bridge: lt9211: Convert to drm_of_get_data_lanes_count
# 009: Build error - https://pastebin.com/rxHe9QRB
git bisect skip 5c57cbc390b166950c2e6c2f0c4edaeb0f47e97d
# skip: [6db5e0c8692e590734a7ec7455365d9cbaa15ef1] Merge tag 'drm-intel-next-2022-07-06' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
# 010: GPU not switched in graphic mode - https://pastebin.com/rqubSuc8
git bisect skip 6db5e0c8692e590734a7ec7455365d9cbaa15ef1
# skip: [5d763a9955f0fbf2681a2f1fa87c416056bd0c89] drm/amd/display: Remove compiler warning
# 011: GPU not switched in graphic mode - https://pastebin.com/BrJs6ybP
git bisect skip 5d763a9955f0fbf2681a2f1fa87c416056bd0c89
# skip: [e6c2db2be986158afb9991d9fa8a38fe65a88516] drm/i915: Don't use DRM_DEBUG_WARN_ON for unexpected l3bank/mslice config
# 012: GPU not switched in graphic mode - https://pastebin.com/yxppyqbD
git bisect skip e6c2db2be986158afb9991d9fa8a38fe65a88516
# bad: [cb6b81b21bd9cf09d72b7fe711be1b55001eb166] Merge tag 'drm-misc-next-fixes-2022-07-21' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
# 013: GPU hangs without use-after-free issue - https://pastebin.com/iRek4bBy
git bisect bad cb6b81b21bd9cf09d72b7fe711be1b55001eb166
# skip: [48b927770f8ad3f8cf4a024a552abf272af9f592] drm/exynos/exynos7_drm_decon: free resources when clk_set_parent() failed.
# 014: GPU not switched in graphic mode - https://pastebin.com/ekp10xhP
git bisect skip 48b927770f8ad3f8cf4a024a552abf272af9f592
# skip: [c5da61cf5bab30059f22ea368702c445ee87171a] drm/amdgpu/display: add missing FP_START/END checks dcn32_clk_mgr.c
# 015: GPU not switched in graphic mode - https://pastebin.com/YbskKWmA
git bisect skip c5da61cf5bab30059f22ea368702c445ee87171a
# skip: [a77f7c89e62c6dfe405a64995812746f27adc510] drm/edid: convert drm_gtf_modes_for_range() to drm_edid
# 016: GPU not switched in graphic mode - https://pastebin.com/bA2AwkJ7
git bisect skip a77f7c89e62c6dfe405a64995812746f27adc510
# skip: [6fde8eec71796f3534f0c274066862829813b21f] drm/doc: Add KUnit documentation
# 017: GPU not switched in graphic mode -
Re: [BUG][5.20] refcount_t: underflow; use-after-free
On 08/17, Mikhail Gavrilov wrote:
> On Mon, Aug 15, 2022 at 3:37 PM Mikhail Gavrilov
> wrote:
> >
> > Thanks, I tested this patch.
> > But with this patch the use-after-free problem happens in another place:
>
> Does anyone have an idea why the second use-after-free happened?
> From the trace I can't tell which code is involved.
> I don't quite understand what the "Workqueue" entry in the trace means.

Hi Mikhail,

IIUC, you got this second use-after-free by applying the first version of Maíra's patch, right? That version added another unbalanced unlock to the cs ioctl flow, but this was fixed in the latest version, which you can find here: https://patchwork.freedesktop.org/patch/497680/
If this is the situation, can you check this latest version?

Thanks,

Melissa

>
> [ 408.358737] [ cut here ]
> [ 408.358743] refcount_t: underflow; use-after-free.
> [ 408.358760] WARNING: CPU: 9 PID: 62 at lib/refcount.c:28
> refcount_warn_saturate+0xba/0x110
> [ 408.358769] Modules linked in: uinput snd_seq_dummy rfcomm
> snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
> qrtr bnep sunrpc binfmt_misc snd_seq_midi snd_seq_midi_event mt76x2u
> mt76x2_common snd_hda_codec_realtek mt76x02_usb snd_hda_codec_generic
> iwlmvm snd_hda_codec_hdmi mt76_usb intel_rapl_msr snd_hda_intel
> mt76x02_lib intel_rapl_common snd_intel_dspcfg snd_intel_sdw_acpi mt76
> snd_hda_codec vfat fat snd_usb_audio snd_hda_core edac_mce_amd
> mac80211 snd_usbmidi_lib snd_hwdep snd_rawmidi mc snd_seq btusb
> kvm_amd iwlwifi snd_seq_device btrtl btbcm libarc4 btintel eeepc_wmi
> snd_pcm iwlmei kvm btmtk asus_wmi ledtrig_audio irqbypass joydev
> snd_timer sparse_keymap bluetooth platform_profile rapl cfg80211 snd
> video wmi_bmof soundcore i2c_piix4 k10temp rfkill mei
> [ 408.358853] asus_ec_sensors acpi_cpufreq zram hid_logitech_hidpp
> amdgpu igb dca drm_ttm_helper ttm iommu_v2 crct10dif_pclmul gpu_sched
> crc32_pclmul ucsi_ccg crc32c_intel drm_buddy nvme typec_ucsi
> drm_display_helper ghash_clmulni_intel ccp typec nvme_core sp5100_tco
> cec wmi ip6_tables ip_tables fuse
> [ 408.358880] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
> amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
> amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
> amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
> amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
> pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
> amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
> amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
> amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
> pcc_cpufreq():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
> [ 408.358953] pcc_cpufreq():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1
> fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
> [ 408.358967] CPU: 9 PID: 62 Comm: kworker/9:0 Tainted: G W L ---
> --- 6.0.0-0.rc1.13.fc38.x86_64+debug #1
> [ 408.358971] Hardware name: System manufacturer System Product
> Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
> [ 408.358974] Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]
> [ 408.358982] RIP: 0010:refcount_warn_saturate+0xba/0x110
> [ 408.358987] Code: 01 01 e8 d9 59 6f 00 0f 0b e9 a2 46 a5 00 80 3d 3e
> 7e be 01 00 75 85 48 c7 c7 70 99 8e 92 c6 05 2e 7e be 01 01 e8 b6 59
> 6f 00 <0f> 0b e9 7f 46 a5 00 80 3d 19 7e be 01 00 0f 85 5e ff ff ff 48
> c7
> [ 408.358990] RSP: 0018:b124003efe60 EFLAGS: 00010286
> [ 408.358994] RAX: 0026 RBX: 9987a025d428 RCX:
>
> [ 408.358997] RDX: 0001 RSI: 928d0754 RDI:
>
> [ 408.358999] RBP: 9994e4ff5600 R08: R09:
> b124003efd10
> [ 408.359001] R10: 0003 R11: 99952e2fffe8 R12:
> 9994e4ffc800
> [ 408.359004] R13: 998600228cc0 R14: 9994e4ffc805 R15:
> 9987a025d430
> [ 408.359006] FS: () GS:9994e4e0()
> knlGS:
> [ 408.359009] CS: 0010 DS: ES: CR0: 80050033
> [ 408.359012] CR2: 27ac39e78000 CR3: 0001a66d8000 CR4:
> 00350ee0
> [ 408.359015] Call Trace:
> [ 408.359017]
> [ 408.359020] process_one_work+0x2a0/0x600
> [ 408.359032] worker_thread+0x4f/0x3a0
> [ 408.359036] ? process_one_work+0x600/0x600
> [ 408.359039]
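The "unbalanced unlock" Melissa mentions is a release of a lock that is no longer held, which lockdep reports as "bad unlock balance detected!" (the warning Mikhail later confirms is gone with the latest version of the patch). A common way to keep lock and unlock balanced in a function with several exit paths is a single cleanup label. The sketch below is a hypothetical model, not the amdgpu cs ioctl code: `struct lock_model` only counts how deeply it is held, and `submit_model()` / `submit_model_buggy()` are illustrative names contrasting a correct error path with one that unlocks twice.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical lock that only tracks how deeply it is held, so an
 * unlock without a matching lock can be detected. */
struct lock_model {
	int depth;
	bool bad_unlock;
};

static void model_lock(struct lock_model *l)
{
	l->depth++;
}

static void model_unlock(struct lock_model *l)
{
	if (l->depth == 0) {
		l->bad_unlock = true; /* releasing a lock that is not held */
		return;
	}
	l->depth--;
}

/* Correct: every path leaves through one label that unlocks exactly once. */
static int submit_model(struct lock_model *l, bool fail)
{
	int ret = 0;

	model_lock(l);
	if (fail) {
		ret = -EINVAL;
		goto out_unlock;
	}
	/* ... work done under the lock ... */
out_unlock:
	model_unlock(l);
	return ret;
}

/* Buggy: the error path unlocks early and then also reaches the common
 * unlock, so the lock is released twice. */
static int submit_model_buggy(struct lock_model *l, bool fail)
{
	int ret = 0;

	model_lock(l);
	if (fail) {
		model_unlock(l);
		ret = -EINVAL;
		goto out_unlock;
	}
out_unlock:
	model_unlock(l);
	return ret;
}
```

Calling the buggy variant with `fail` set releases the lock twice, and the second release is the one the model flags.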
Re: [BUG][5.20] refcount_t: underflow; use-after-free
On 8/17/22 17:57, Mikhail Gavrilov wrote:
> On Wed, Aug 17, 2022 at 11:43 PM Maíra Canal wrote:
>>
>> Hi Mikhail,
>>
>> Looks like 45ecaea738830b9d521c93520c8f201359dcbd95 ("drm/sched: Partial
>> revert of 'drm/sched: Keep s_fence->parent pointer'") introduced the
>> error. Try reverting it and check if the use-after-free still happens.
>
> Thanks, but unfortunately this did not help. The use-after-free
> happened again, in a context I cannot make sense of.
> The only new thing is an added "suspicious RCU usage" warning, but it
> looks completely unrelated to the use-after-free issue.
>

Hi Mikhail,

Could you please specify the steps to reproduce this use-after-free? I will try to reproduce it on the RX5700 XT and bisect the issue.

Best Regards,
- Maíra Canal
Re: [BUG][5.20] refcount_t: underflow; use-after-free
On Wed, Aug 17, 2022 at 11:43 PM Maíra Canal wrote:
>
> Hi Mikhail,
>
> Looks like 45ecaea738830b9d521c93520c8f201359dcbd95 ("drm/sched: Partial
> revert of 'drm/sched: Keep s_fence->parent pointer'") introduced the
> error. Try reverting it and check if the use-after-free still happens.

Thanks, but unfortunately this did not help. The use-after-free happened again, in a context I cannot make sense of.
The only new thing is an added "suspicious RCU usage" warning, but it looks completely unrelated to the use-after-free issue.

[ 215.434115] [ cut here ]
[ 215.434184] refcount_t: underflow; use-after-free.
[ 215.434204] WARNING: CPU: 7 PID: 1258 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
[ 215.434214] Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc binfmt_misc snd_seq_midi snd_seq_midi_event intel_rapl_msr intel_rapl_common snd_hda_codec_realtek vfat snd_hda_codec_generic snd_hda_codec_hdmi mt76x2u fat mt76x2_common snd_hda_intel mt76x02_usb snd_intel_dspcfg snd_intel_sdw_acpi mt76_usb iwlmvm edac_mce_amd snd_usb_audio snd_hda_codec mt76x02_lib snd_hda_core snd_usbmidi_lib snd_hwdep snd_rawmidi uvcvideo mt76 kvm_amd snd_seq videobuf2_vmalloc videobuf2_memops snd_seq_device mac80211 videobuf2_v4l2 videobuf2_common kvm btusb iwlwifi snd_pcm btrtl videodev libarc4 eeepc_wmi btbcm asus_wmi iwlmei btintel ledtrig_audio xpad irqbypass sparse_keymap btmtk platform_profile joydev
[ 215.434436] hid_logitech_hidpp rapl ff_memless mc snd_timer bluetooth cfg80211 video pcspkr wmi_bmof snd soundcore k10temp i2c_piix4 rfkill mei asus_ec_sensors acpi_cpufreq zram amdgpu drm_ttm_helper ttm iommu_v2 ucsi_ccg gpu_sched crct10dif_pclmul crc32_pclmul typec_ucsi drm_buddy crc32c_intel ghash_clmulni_intel ccp igb sp5100_tco typec drm_display_helper nvme dca nvme_core cec wmi ip6_tables ip_tables fuse
[ 215.434528] Unloaded tainted modules: amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 fjes():1
[ 215.434672] pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[ 215.434702] CPU: 7 PID: 1258 Comm: kworker/7:3 Tainted: G W L --- --- 6.0.0-0.rc1.20220817git3cc40a443a04.14.fc38.x86_64 #1
[ 215.434709] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
[ 215.434715] Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]
[ 215.434728] RIP: 0010:refcount_warn_saturate+0xba/0x110
[ 215.434734] Code: 01 01 e8 59 59 6f 00 0f 0b e9 22 46 a5 00 80 3d be 7d be 01 00 75 85 48 c7 c7 c0 99 8e 92 c6 05 ae 7d be 01 01 e8 36 59 6f 00 <0f> 0b e9 ff 45 a5 00 80 3d 99 7d be 01 00 0f 85 5e ff ff ff 48 c7
[ 215.434740] RSP: 0018:9ccb0237fe60 EFLAGS: 00010286
[ 215.434747] RAX: 0026 RBX: 8d531f6f2828 RCX:
[ 215.434753] RDX: 0001 RSI: 928d07a4 RDI:
[ 215.434757] RBP: 8d61e47f5600 R08: R09: 9ccb0237fd10
[ 215.434762] R10: 0003 R11: 8d622e2fffe8 R12: 8d61e47fc800
[ 215.434767] R13: 8d5313e95500 R14: 8d61e47fc805 R15: 8d531f6f2830
[ 215.434772] FS: () GS:8d61e460() knlGS:
[ 215.434777] CS: 0010 DS: ES: CR0: 80050033
[ 215.434782] CR2: 7f0c8b815048 CR3: 0001ab0e8000 CR4: 00350ee0
[ 215.434788] Call Trace:
[ 215.434792]
[ 215.434797] process_one_work+0x2a0/0x600
[ 215.434819] worker_thread+0x4f/0x3a0
[ 215.434830] ? process_one_work+0x600/0x600
[ 215.434836] kthread+0xf5/0x120
[ 215.434842] ? kthread_complete_and_exit+0x20/0x20
[ 215.434854] ret_from_fork+0x22/0x30
[ 215.434881]
[ 215.434885] irq event stamp: 134873
[ 215.434890] hardirqs last enabled at (134881): [] __up_console_sem+0x5e/0x70
[ 215.434897] hardirqs
Re: [BUG][5.20] refcount_t: underflow; use-after-free
On 8/17/22 14:44, Mikhail Gavrilov wrote:
> On Wed, Aug 17, 2022 at 9:08 PM Melissa Wen wrote:
>> Hi Mikhail,
>>
>> IIUC, you got this second use-after-free by applying the first version
>> of Maíra's patch, right? That version added another unbalanced unlock
>> to the cs ioctl flow, but this was fixed in the latest version, which
>> you can find here: https://patchwork.freedesktop.org/patch/497680/
>> If this is the situation, can you check this latest version?
>>
>> Thanks,
>>
>> Melissa
>
> With the latest version, the "bad unlock balance detected!" warning is
> gone, but the use-after-free issue remains. And again the trace shows
> "Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]".

Hi Mikhail,

Looks like 45ecaea738830b9d521c93520c8f201359dcbd95 ("drm/sched: Partial revert of 'drm/sched: Keep s_fence->parent pointer'") introduced the error. Try reverting it and check if the use-after-free still happens.

Best Regards,
- Maíra Canal
Re: [BUG][5.20] refcount_t: underflow; use-after-free
On Wed, Aug 17, 2022 at 9:08 PM Melissa Wen wrote:
>
> Hi Mikhail,
>
> IIUC, you got this second user-after-free by applying the first version
> of Maíra's patch, right? So, that version was adding another unbalanced
> unlock to the cs ioctl flow, but it was solved in the latest version,
> that you can find here: https://patchwork.freedesktop.org/patch/497680/
> If this is the situation, can you check this last version?
>
> Thanks,
>
> Melissa

With the latest version, the "bad unlock balance detected!" warning is gone, but the use-after-free issue remains.
And again the warning comes from "Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]".

[ 297.834779] [ cut here ]
[ 297.834818] refcount_t: underflow; use-after-free.
[ 297.834831] WARNING: CPU: 30 PID: 2377 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
[ 297.834838] Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc binfmt_misc snd_seq_midi snd_seq_midi_event mt76x2u mt76x2_common mt76x02_usb mt76_usb mt76x02_lib snd_hda_codec_realtek iwlmvm intel_rapl_msr snd_hda_codec_generic snd_hda_codec_hdmi mt76 vfat fat snd_hda_intel intel_rapl_common mac80211 snd_intel_dspcfg snd_intel_sdw_acpi snd_usb_audio snd_hda_codec snd_usbmidi_lib btusb edac_mce_amd iwlwifi libarc4 uvcvideo snd_hda_core btrtl snd_rawmidi snd_hwdep videobuf2_vmalloc btbcm kvm_amd videobuf2_memops snd_seq iwlmei btintel videobuf2_v4l2 eeepc_wmi snd_seq_device videobuf2_common btmtk kvm xpad videodev joydev irqbypass snd_pcm asus_wmi hid_logitech_hidpp ff_memless cfg80211 bluetooth rapl mc
[ 297.834932] ledtrig_audio snd_timer sparse_keymap platform_profile wmi_bmof snd video pcspkr k10temp i2c_piix4 rfkill soundcore mei asus_ec_sensors acpi_cpufreq zram amdgpu drm_ttm_helper ttm crct10dif_pclmul crc32_pclmul crc32c_intel iommu_v2 ucsi_ccg gpu_sched typec_ucsi drm_buddy ghash_clmulni_intel drm_display_helper ccp igb typec sp5100_tco nvme cec nvme_core dca wmi ip6_tables ip_tables fuse
[ 297.834978] Unloaded tainted modules: amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 fjes():1
[ 297.835055] pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[ 297.835071] CPU: 30 PID: 2377 Comm: kworker/30:6 Tainted: G WL--- --- 6.0.0-0.rc1.20220817git3cc40a443a04.14.fc38.x86_64 #1
[ 297.835075] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
[ 297.835078] Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]
[ 297.835085] RIP: 0010:refcount_warn_saturate+0xba/0x110
[ 297.835088] Code: 01 01 e8 59 59 6f 00 0f 0b e9 22 46 a5 00 80 3d be 7d be 01 00 75 85 48 c7 c7 c0 99 8e aa c6 05 ae 7d be 01 01 e8 36 59 6f 00 <0f> 0b e9 ff 45 a5 00 80 3d 99 7d be 01 00 0f 85 5e ff ff ff 48 c7
[ 297.835091] RSP: 0018:bd3506df7e60 EFLAGS: 00010286
[ 297.835095] RAX: 0026 RBX: 961b250cbc28 RCX:
[ 297.835097] RDX: 0001 RSI: aa8d07a4 RDI:
[ 297.835100] RBP: 96276a3f5600 R08: R09: bd3506df7d10
[ 297.835102] R10: 0003 R11: 9627ae2fffe8 R12: 96276a3fc800
[ 297.835105] R13: 9618c03e6600 R14: 96276a3fc805 R15: 961b250cbc30
[ 297.835108] FS: () GS:96276a20() knlGS:
[ 297.835110] CS: 0010 DS: ES: CR0: 80050033
[ 297.835113] CR2: 621001e4a000 CR3: 00018d958000 CR4: 00350ee0
[ 297.835116] Call Trace:
[ 297.835118]
[ 297.835121] process_one_work+0x2a0/0x600
[ 297.835133] worker_thread+0x4f/0x3a0
[ 297.835139] ? process_one_work+0x600/0x600
[ 297.835142] kthread+0xf5/0x120
[ 297.835145] ? kthread_complete_and_exit+0x20/0x20
[ 297.835151] ret_from_fork+0x22/0x30
[ 297.835166]
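For readers following the thread: the "refcount_t: underflow; use-after-free." warning is raised when a reference count that has already reached zero is decremented again, i.e. some path released one reference more than it took, so the object may already have been freed. Below is a minimal userspace sketch of that check. It assumes a hypothetical, single-threaded `demo_refcount` type; the real `refcount_t` in include/linux/refcount.h uses atomic operations, and only its saturation value (`INT_MIN / 2`) is mirrored here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, single-threaded stand-in for the kernel's refcount_t. */
struct demo_refcount {
	int refs;
};

/* Value a misused counter is parked at, mirroring REFCOUNT_SATURATED. */
#define DEMO_REFCOUNT_SATURATED (INT_MIN / 2)

/* Number of underflow warnings printed so far. */
static int demo_underflow_warnings;

/* Take an extra reference; the caller must already own one. */
static void demo_refcount_inc(struct demo_refcount *r)
{
	assert(r->refs > 0); /* incrementing from 0 would also be a use-after-free */
	r->refs++;
}

/*
 * Drop one reference. Returns true only for the caller that dropped the
 * last reference, which is then responsible for freeing the object.
 */
static bool demo_refcount_dec_and_test(struct demo_refcount *r)
{
	int old = r->refs;

	if (old == 1) {
		r->refs = 0;
		return true;
	}
	if (old <= 0) {
		/* One put too many: the object may already be gone. */
		fprintf(stderr, "refcount_t: underflow; use-after-free.\n");
		demo_underflow_warnings++;
		r->refs = DEMO_REFCOUNT_SATURATED;
		return false;
	}
	r->refs = old - 1;
	return false;
}
```

Note that a check like this only fires at the extra put; it does not record which earlier get/put pair went wrong, which is why the traces in this thread point at the place where the imbalance was detected rather than where it was introduced.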
Re: [BUG][5.20] refcount_t: underflow; use-after-free
On Mon, Aug 15, 2022 at 3:37 PM Mikhail Gavrilov wrote:
>
> Thanks, I tested this patch.
> But with this patch use-after-free problem happening in another place:

Does anyone have an idea why the second use-after-free happened?
From the trace I can't tell which code is involved.
I also don't quite understand what the "Workqueue" entry in the trace means.

[ 408.358737] [ cut here ]
[ 408.358743] refcount_t: underflow; use-after-free.
[ 408.358760] WARNING: CPU: 9 PID: 62 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
[ 408.358769] Modules linked in: uinput snd_seq_dummy rfcomm snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc binfmt_misc snd_seq_midi snd_seq_midi_event mt76x2u mt76x2_common snd_hda_codec_realtek mt76x02_usb snd_hda_codec_generic iwlmvm snd_hda_codec_hdmi mt76_usb intel_rapl_msr snd_hda_intel mt76x02_lib intel_rapl_common snd_intel_dspcfg snd_intel_sdw_acpi mt76 snd_hda_codec vfat fat snd_usb_audio snd_hda_core edac_mce_amd mac80211 snd_usbmidi_lib snd_hwdep snd_rawmidi mc snd_seq btusb kvm_amd iwlwifi snd_seq_device btrtl btbcm libarc4 btintel eeepc_wmi snd_pcm iwlmei kvm btmtk asus_wmi ledtrig_audio irqbypass joydev snd_timer sparse_keymap bluetooth platform_profile rapl cfg80211 snd video wmi_bmof soundcore i2c_piix4 k10temp rfkill mei
[ 408.358853] asus_ec_sensors acpi_cpufreq zram hid_logitech_hidpp amdgpu igb dca drm_ttm_helper ttm iommu_v2 crct10dif_pclmul gpu_sched crc32_pclmul ucsi_ccg crc32c_intel drm_buddy nvme typec_ucsi drm_display_helper ghash_clmulni_intel ccp typec nvme_core sp5100_tco cec wmi ip6_tables ip_tables fuse
[ 408.358880] Unloaded tainted modules: amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
[ 408.358953] pcc_cpufreq():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[ 408.358967] CPU: 9 PID: 62 Comm: kworker/9:0 Tainted: G W L --- --- 6.0.0-0.rc1.13.fc38.x86_64+debug #1
[ 408.358971] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
[ 408.358974] Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]
[ 408.358982] RIP: 0010:refcount_warn_saturate+0xba/0x110
[ 408.358987] Code: 01 01 e8 d9 59 6f 00 0f 0b e9 a2 46 a5 00 80 3d 3e 7e be 01 00 75 85 48 c7 c7 70 99 8e 92 c6 05 2e 7e be 01 01 e8 b6 59 6f 00 <0f> 0b e9 7f 46 a5 00 80 3d 19 7e be 01 00 0f 85 5e ff ff ff 48 c7
[ 408.358990] RSP: 0018:b124003efe60 EFLAGS: 00010286
[ 408.358994] RAX: 0026 RBX: 9987a025d428 RCX:
[ 408.358997] RDX: 0001 RSI: 928d0754 RDI:
[ 408.358999] RBP: 9994e4ff5600 R08: R09: b124003efd10
[ 408.359001] R10: 0003 R11: 99952e2fffe8 R12: 9994e4ffc800
[ 408.359004] R13: 998600228cc0 R14: 9994e4ffc805 R15: 9987a025d430
[ 408.359006] FS: () GS:9994e4e0() knlGS:
[ 408.359009] CS: 0010 DS: ES: CR0: 80050033
[ 408.359012] CR2: 27ac39e78000 CR3: 0001a66d8000 CR4: 00350ee0
[ 408.359015] Call Trace:
[ 408.359017]
[ 408.359020] process_one_work+0x2a0/0x600
[ 408.359032] worker_thread+0x4f/0x3a0
[ 408.359036] ? process_one_work+0x600/0x600
[ 408.359039] kthread+0xf5/0x120
[ 408.359044] ? kthread_complete_and_exit+0x20/0x20
[ 408.359049] ret_from_fork+0x22/0x30
[ 408.359061]
[ 408.359063] irq event stamp: 5468
[ 408.359064] hardirqs last enabled at (5467): [] _raw_spin_unlock_irq+0x24/0x50
[ 408.359071] hardirqs last disabled at (5468): [] __schedule+0xe2c/0x16d0
[ 408.359076] softirqs last enabled at (2482): [] rht_deferred_worker+0x708/0xc00
[ 408.359079] softirqs last disabled at (2480): [] rht_deferred_worker+0x1f7/0xc00
[ 408.359082] ---[ end trace ]---

Full kernel log is here:
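On the "Workqueue" question above: that line says the warning was hit while a kernel worker thread (the `kworker/...` task in the "Comm:" field) was executing a deferred work item whose function is `drm_sched_entity_kill_jobs_work` from the gpu_sched module, queued on the system "events" workqueue. Because the work runs later and in a different thread, the call trace starts at `process_one_work` / `worker_thread` / `kthread` and no longer shows the code that queued it. The toy, single-threaded model below illustrates that split between queuing and execution; all `demo_*` names are hypothetical and this is not the kernel workqueue API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* A deferred unit of work: the function to run plus a name for logging. */
struct demo_work {
	void (*func)(struct demo_work *work);
	const char *name;
	struct demo_work *next;
};

/* A named FIFO of pending work items. */
struct demo_workqueue {
	const char *name;
	struct demo_work *head;
	struct demo_work *tail;
};

/* Queue the item and return at once; nothing runs yet. */
static void demo_queue_work(struct demo_workqueue *wq, struct demo_work *work)
{
	assert(work->func != NULL);
	work->next = NULL;
	if (wq->tail)
		wq->tail->next = work;
	else
		wq->head = work;
	wq->tail = work;
}

/*
 * Stand-in for a worker thread's loop: pop each item and call its
 * function. A backtrace taken inside func would show this loop as the
 * caller, not whoever queued the item.
 */
static int demo_worker_run(struct demo_workqueue *wq)
{
	int ran = 0;

	while (wq->head) {
		struct demo_work *work = wq->head;

		wq->head = work->next;
		if (!wq->head)
			wq->tail = NULL;
		printf("Workqueue: %s %s\n", wq->name, work->name);
		work->func(work);
		ran++;
	}
	return ran;
}

/* Example work function (hypothetical); counts how often it ran. */
static int demo_kill_jobs_calls;

static void demo_kill_jobs_work(struct demo_work *work)
{
	(void)work;
	demo_kill_jobs_calls++;
}
```

The printed line only imitates the shape of the oops header; the point is that the submitting code path and the executing code path are two different call stacks, so finding the culprit usually means looking at who queued the work and which references it was handed.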
Re: [BUG][5.20] refcount_t: underflow; use-after-free
On 15.08.22 at 12:55, Melissa Wen wrote:
> On 08/14, Maíra Canal wrote:
>> Hi Mikhail
>>
>> Looks like this use-after-free problem was introduced on
>> 90af0ca047f3049c4b46e902f432ad6ef1e2ded6. Checking this patch it seems
>> like: if amdgpu_cs_vm_handling return r != 0, then it will unlock
>> bo_list_mutex inside the function amdgpu_cs_vm_handling and again on
>> amdgpu_cs_parser_fini.
>>
>> Maybe the following patch will help:
>>
>> ---
>> From 71d718c0f53a334bb59bcd5dabd29bbe92c724af Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?Ma=C3=ADra=20Canal?=
>> Date: Sun, 14 Aug 2022 21:12:24 -0300
>> Subject: [PATCH] drm/amdgpu: Fix use-after-free on amdgpu_bo_list mutex
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset=UTF-8
>> Content-Transfer-Encoding: 8bit
>>
>> Fixes: 90af0ca047f3 ("drm/amdgpu: Protect the amdgpu_bo_list list with a mutex v2")
>> Reported-by: Mikhail Gavrilov
>> Signed-off-by: Maíra Canal
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 +++------
>>  1 file changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index d8f1335bc68f..a7fce7b14321 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -837,17 +837,14 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
>>  			continue;
>>
>>  		r = amdgpu_vm_bo_update(adev, bo_va, false);
>> -		if (r) {
>> -			mutex_unlock(&p->bo_list->bo_list_mutex);
>> +		if (r)
>>  			return r;
>> -		}
>>
>>  		r = amdgpu_sync_fence(&p->job->sync, bo_va->last_pt_update);
>> -		if (r) {
>> -			mutex_unlock(&p->bo_list->bo_list_mutex);
>> +		if (r)
>>  			return r;
>> -		}
>>  	}
>> +	mutex_unlock(&p->bo_list->bo_list_mutex);
> I think we don't need to unlock the bo_list_mutex here. If return != 0
> amdgpu_cs_parser_fini() will unlock it; otherwise, amdgpu_cs_submit()
> unlocks it in the end.

Yeah, exactly that. Apart from that, the patch looks good to me.

We moved the mutex unlocking around a few times during review. Probably just fallout from that.

Thanks for fixing this,
Christian.

> BR,
> Melissa
>>
>>  	r = amdgpu_vm_handle_moved(adev, vm);
>>  	if (r)
>> -- 
>> 2.37.1
>> ---
>> Best Regards,
>> - Maíra Canal
>>
>> On 8/14/22 18:11, Mikhail Gavrilov wrote:
>>> Hi folks.
>>> Joined testing 5.20 today (7ebfc85e2cd7).
>>> I encountered a frequently GPU freeze, after which a message appears
>>> in the kernel logs:
>>> [ 220.280990] [ cut here ]
>>> [ 220.281000] refcount_t: underflow; use-after-free.
>>> [ 220.281019] WARNING: CPU: 1 PID: 3746 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
>>> [ 220.281029] Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc snd_seq_midi snd_seq_midi_event vfat intel_rapl_msr fat intel_rapl_common snd_hda_codec_realtek mt76x2u snd_hda_codec_generic snd_hda_codec_hdmi mt76x2_common iwlmvm mt76x02_usb edac_mce_amd mt76_usb snd_hda_intel snd_intel_dspcfg mt76x02_lib snd_intel_sdw_acpi snd_usb_audio snd_hda_codec mt76 kvm_amd uvcvideo mac80211 snd_hda_core btusb eeepc_wmi snd_usbmidi_lib videobuf2_vmalloc videobuf2_memops kvm btrtl snd_rawmidi asus_wmi snd_hwdep videobuf2_v4l2 btbcm iwlwifi ledtrig_audio libarc4 btintel snd_seq videobuf2_common sparse_keymap btmtk irqbypass videodev snd_seq_device joydev xpad iwlmei platform_profile bluetooth ff_memless snd_pcm mc rapl
>>> [ 220.281185] video snd_timer cfg80211 wmi_bmof snd pcspkr soundcore k10temp i2c_piix4 rfkill mei asus_ec_sensors acpi_cpufreq zram hid_logitech_hidpp amdgpu igb dca drm_ttm_helper ttm crct10dif_pclmul iommu_v2 crc32_pclmul gpu_sched crc32c_intel ucsi_ccg drm_buddy nvme typec_ucsi ghash_clmulni_intel drm_display_helper ccp nvme_core typec sp5100_tco cec wmi ip6_tables ip_tables fuse
>>> [ 220.281258] Unloaded tainted modules: amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
Re: [BUG][5.20] refcount_t: underflow; use-after-free
On 08/14, Maíra Canal wrote:
> Hi Mikhail
>
> Looks like this use-after-free problem was introduced on
> 90af0ca047f3049c4b46e902f432ad6ef1e2ded6. Checking this patch it seems
> like: if amdgpu_cs_vm_handling return r != 0, then it will unlock
> bo_list_mutex inside the function amdgpu_cs_vm_handling and again on
> amdgpu_cs_parser_fini.
>
> Maybe the following patch will help:
>
> ---
> From 71d718c0f53a334bb59bcd5dabd29bbe92c724af Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Ma=C3=ADra=20Canal?=
> Date: Sun, 14 Aug 2022 21:12:24 -0300
> Subject: [PATCH] drm/amdgpu: Fix use-after-free on amdgpu_bo_list mutex
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Fixes: 90af0ca047f3 ("drm/amdgpu: Protect the amdgpu_bo_list list with a
> mutex v2")
> Reported-by: Mikhail Gavrilov
> Signed-off-by: Maíra Canal
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index d8f1335bc68f..a7fce7b14321 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -837,17 +837,14 @@ static int amdgpu_cs_vm_handling(struct
> amdgpu_cs_parser *p)
>  			continue;
>
>  		r = amdgpu_vm_bo_update(adev, bo_va, false);
> -		if (r) {
> -			mutex_unlock(&p->bo_list->bo_list_mutex);
> +		if (r)
>  			return r;
> -		}
>
>  		r = amdgpu_sync_fence(&p->job->sync, bo_va->last_pt_update);
> -		if (r) {
> -			mutex_unlock(&p->bo_list->bo_list_mutex);
> +		if (r)
>  			return r;
> -		}
>  	}
> +	mutex_unlock(&p->bo_list->bo_list_mutex);

I think we don't need to unlock the bo_list_mutex here. If it returns != 0,
amdgpu_cs_parser_fini() will unlock it; otherwise, amdgpu_cs_submit()
unlocks it at the end.

BR,
Melissa

>
>  	r = amdgpu_vm_handle_moved(adev, vm);
>  	if (r)
> -- 
> 2.37.1
> ---
> Best Regards,
> - Maíra Canal
>
> On 8/14/22 18:11, Mikhail Gavrilov wrote:
> > Hi folks.
> > Joined testing 5.20 today (7ebfc85e2cd7).
> > I encountered a frequently GPU freeze, after which a message appears
> > in the kernel logs:
> > [ 220.280990] [ cut here ]
> > [ 220.281000] refcount_t: underflow; use-after-free.
> > [ 220.281019] WARNING: CPU: 1 PID: 3746 at lib/refcount.c:28
> > refcount_warn_saturate+0xba/0x110
> > [ 220.281029] Modules linked in: uinput rfcomm snd_seq_dummy
> > snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
> > nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> > nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
> > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
> > qrtr bnep sunrpc snd_seq_midi snd_seq_midi_event vfat intel_rapl_msr
> > fat intel_rapl_common snd_hda_codec_realtek mt76x2u
> > snd_hda_codec_generic snd_hda_codec_hdmi mt76x2_common iwlmvm
> > mt76x02_usb edac_mce_amd mt76_usb snd_hda_intel snd_intel_dspcfg
> > mt76x02_lib snd_intel_sdw_acpi snd_usb_audio snd_hda_codec mt76
> > kvm_amd uvcvideo mac80211 snd_hda_core btusb eeepc_wmi snd_usbmidi_lib
> > videobuf2_vmalloc videobuf2_memops kvm btrtl snd_rawmidi asus_wmi
> > snd_hwdep videobuf2_v4l2 btbcm iwlwifi ledtrig_audio libarc4 btintel
> > snd_seq videobuf2_common sparse_keymap btmtk irqbypass videodev
> > snd_seq_device joydev xpad iwlmei platform_profile bluetooth
> > ff_memless snd_pcm mc rapl
> > [ 220.281185] video snd_timer cfg80211 wmi_bmof snd pcspkr soundcore
> > k10temp i2c_piix4 rfkill mei asus_ec_sensors acpi_cpufreq zram
> > hid_logitech_hidpp amdgpu igb dca drm_ttm_helper ttm crct10dif_pclmul
> > iommu_v2 crc32_pclmul gpu_sched crc32c_intel ucsi_ccg drm_buddy nvme
> > typec_ucsi ghash_clmulni_intel drm_display_helper ccp nvme_core typec
> > sp5100_tco cec wmi ip6_tables ip_tables fuse
> > [ 220.281258] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
> > amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
> > amd64_edac():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
> > amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
> > pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
> > amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
> > pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
> > pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> > pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> > pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> > pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> > pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
> > amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> > pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
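Melissa's point about lock ownership can be illustrated with a toy model: the lock taken for the submission should be released exactly once, by whichever step finishes the ioctl (amdgpu_cs_parser_fini() on error, amdgpu_cs_submit() on success, as described in the thread), so an error path inside amdgpu_cs_vm_handling() should simply return. The sketch below assumes a hypothetical `demo_lock` that counts unbalanced releases, loosely in the spirit of lockdep's "bad unlock balance detected!" report; it is neither the kernel mutex nor the lockdep API, and the control flow is heavily simplified from the real amdgpu code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lock that records releases of a lock nobody holds. */
struct demo_lock {
	bool held;
	int bad_unlocks;
};

static void demo_lock_acquire(struct demo_lock *l)
{
	l->held = true;
}

static void demo_lock_release(struct demo_lock *l)
{
	if (!l->held) {
		fprintf(stderr, "bad unlock balance detected!\n");
		l->bad_unlocks++;
		return;
	}
	l->held = false;
}

struct demo_parser {
	struct demo_lock bo_list_lock;
};

/* Pre-fix shape: drops the lock on error although cleanup drops it too. */
static int demo_vm_handling_buggy(struct demo_parser *p, int err)
{
	if (err) {
		demo_lock_release(&p->bo_list_lock);
		return err;
	}
	return 0;
}

/* Fixed shape: on error just return; the caller's cleanup owns the unlock. */
static int demo_vm_handling_fixed(struct demo_parser *p, int err)
{
	(void)p;
	return err;
}

/* Success path: submission finishes and releases the lock. */
static void demo_submit(struct demo_parser *p)
{
	demo_lock_release(&p->bo_list_lock);
}

/* Error path: cleanup releases the lock. */
static void demo_parser_fini(struct demo_parser *p)
{
	demo_lock_release(&p->bo_list_lock);
}

/*
 * Simplified ioctl flow: take the lock, run the handler, then release the
 * lock in exactly one place depending on the outcome. Returns how many
 * unbalanced releases were detected.
 */
static int demo_cs_ioctl(int (*vm_handling)(struct demo_parser *, int), int err)
{
	struct demo_parser p = { { false, 0 } };

	demo_lock_acquire(&p.bo_list_lock);
	if (vm_handling(&p, err))
		demo_parser_fini(&p);
	else
		demo_submit(&p);
	assert(!p.bo_list_lock.held); /* every path must end unlocked */
	return p.bo_list_lock.bad_unlocks;
}
```

With the buggy handler, the error path releases the lock twice and one unbalanced unlock is reported; with the fixed handler, every path releases it exactly once.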
Re: [BUG][5.20] refcount_t: underflow; use-after-free
On Mon, Aug 15, 2022 at 5:20 AM Maíra Canal wrote:
>
> Hi Mikhail
>
> Looks like this use-after-free problem was introduced on
> 90af0ca047f3049c4b46e902f432ad6ef1e2ded6. Checking this patch it seems
> like: if amdgpu_cs_vm_handling return r != 0, then it will unlock
> bo_list_mutex inside the function amdgpu_cs_vm_handling and again on
> amdgpu_cs_parser_fini.
>
> Maybe the following patch will help:

Thanks, I tested this patch.
But with this patch, the use-after-free problem happens in another place:

[ 894.012920] [ cut here ]
[ 894.012939] refcount_t: underflow; use-after-free.
[ 894.012968] WARNING: CPU: 14 PID: 205 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
[ 894.012999] Modules linked in: tls uinput rfcomm snd_seq_dummy snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc snd_seq_midi snd_seq_midi_event snd_hda_codec_realtek mt76x2u mt76x2_common snd_hda_codec_generic snd_hda_codec_hdmi intel_rapl_msr mt76x02_usb intel_rapl_common snd_hda_intel mt76_usb snd_intel_dspcfg vfat iwlmvm snd_intel_sdw_acpi mt76x02_lib fat snd_usb_audio snd_hda_codec mt76 edac_mce_amd snd_usbmidi_lib snd_hda_core btusb snd_rawmidi snd_hwdep mac80211 mc iwlwifi btrtl eeepc_wmi asus_wmi btbcm snd_seq kvm_amd libarc4 ledtrig_audio snd_seq_device btintel iwlmei sparse_keymap btmtk kvm snd_pcm irqbypass platform_profile snd_timer xpad joydev cfg80211 rapl hid_logitech_hidpp bluetooth ff_memless wmi_bmof video pcspkr snd k10temp i2c_piix4
[ 894.013086] soundcore rfkill mei asus_ec_sensors acpi_cpufreq zram amdgpu drm_ttm_helper ttm iommu_v2 crct10dif_pclmul ucsi_ccg gpu_sched crc32_pclmul crc32c_intel typec_ucsi drm_buddy typec drm_display_helper ghash_clmulni_intel igb ccp cec nvme sp5100_tco nvme_core dca wmi ip6_tables ip_tables fuse
[ 894.013322] Unloaded tainted modules: amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
[ 894.013455] pcc_cpufreq():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[ 894.013690] CPU: 14 PID: 205 Comm: kworker/14:1 Tainted: GW L--- --- 5.20.0-0.rc0.20220812git7ebfc85e2cd7.11.fc38.x86_64 #1
[ 894.013725] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
[ 894.013756] Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]
[ 894.013779] RIP: 0010:refcount_warn_saturate+0xba/0x110
[ 894.013796] Code: 01 01 e8 79 4a 6f 00 0f 0b e9 42 47 a5 00 80 3d de 7e be 01 00 75 85 48 c7 c7 f8 98 8e 9c c6 05 ce 7e be 01 01 e8 56 4a 6f 00 <0f> 0b e9 1f 47 a5 00 80 3d b9 7e be 01 00 0f 85 5e ff ff ff 48 c7
[ 894.013842] RSP: 0018:b48681153e60 EFLAGS: 00010286
[ 894.013858] RAX: 0026 RBX: 9bad16f1f028 RCX:
[ 894.013878] RDX: 0001 RSI: 9c8d06dc RDI:
[ 894.013897] RBP: 9bba663f5600 R08: R09: b48681153d10
[ 894.013916] R10: 0003 R11: 9bbaae2fffe8 R12: 9bba663fc800
[ 894.013934] R13: 9bab93fcab40 R14: 9bba663fc805 R15: 9bad16f1f030
[ 894.013954] FS: () GS:9bba6620() knlGS:
[ 894.013975] CS: 0010 DS: ES: CR0: 80050033
[ 894.013991] CR2: 1aa46b2ec008 CR3: 000101516000 CR4: 00350ee0
[ 894.014011] Call Trace:
[ 894.014022]
[ 894.014030] process_one_work+0x2a0/0x600
[ 894.014051] worker_thread+0x4f/0x3a0
[ 894.014065] ? process_one_work+0x600/0x600
[ 894.014079] kthread+0xf5/0x120
[ 894.014092] ? kthread_complete_and_exit+0x20/0x20
[ 894.014109] ret_from_fork+0x22/0x30
[ 894.014129]
[ 894.014137] irq event stamp: 5802
[ 894.014148] hardirqs last enabled at (5801): [] _raw_spin_unlock_irq+0x24/0x50
[ 894.014178] hardirqs last disabled at (5802): [] __schedule+0xe2c/0x16d0
[ 894.014206]
Re: [BUG][5.20] refcount_t: underflow; use-after-free
Hi Mikhail

Looks like this use-after-free problem was introduced in
90af0ca047f3049c4b46e902f432ad6ef1e2ded6. Checking this patch, it seems
that if amdgpu_cs_vm_handling returns r != 0, it unlocks bo_list_mutex
inside amdgpu_cs_vm_handling and then again in amdgpu_cs_parser_fini.

Maybe the following patch will help:

---
From 71d718c0f53a334bb59bcd5dabd29bbe92c724af Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ma=C3=ADra=20Canal?=
Date: Sun, 14 Aug 2022 21:12:24 -0300
Subject: [PATCH] drm/amdgpu: Fix use-after-free on amdgpu_bo_list mutex
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fixes: 90af0ca047f3 ("drm/amdgpu: Protect the amdgpu_bo_list list with a mutex v2")
Reported-by: Mikhail Gavrilov
Signed-off-by: Maíra Canal
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d8f1335bc68f..a7fce7b14321 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -837,17 +837,14 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
 			continue;
 
 		r = amdgpu_vm_bo_update(adev, bo_va, false);
-		if (r) {
-			mutex_unlock(&p->bo_list->bo_list_mutex);
+		if (r)
 			return r;
-		}
 
 		r = amdgpu_sync_fence(&p->job->sync, bo_va->last_pt_update);
-		if (r) {
-			mutex_unlock(&p->bo_list->bo_list_mutex);
+		if (r)
 			return r;
-		}
 	}
+	mutex_unlock(&p->bo_list->bo_list_mutex);
 
 	r = amdgpu_vm_handle_moved(adev, vm);
 	if (r)
-- 
2.37.1
---
Best Regards,
- Maíra Canal

On 8/14/22 18:11, Mikhail Gavrilov wrote:
> Hi folks.
> Joined testing 5.20 today (7ebfc85e2cd7).
> I encountered a frequently GPU freeze, after which a message appears
> in the kernel logs:
> [ 220.280990] [ cut here ]
> [ 220.281000] refcount_t: underflow; use-after-free.
> [ 220.281019] WARNING: CPU: 1 PID: 3746 at lib/refcount.c:28
> refcount_warn_saturate+0xba/0x110
> [ 220.281029] Modules linked in: uinput rfcomm snd_seq_dummy
> snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
> qrtr bnep sunrpc snd_seq_midi snd_seq_midi_event vfat intel_rapl_msr
> fat intel_rapl_common snd_hda_codec_realtek mt76x2u
> snd_hda_codec_generic snd_hda_codec_hdmi mt76x2_common iwlmvm
> mt76x02_usb edac_mce_amd mt76_usb snd_hda_intel snd_intel_dspcfg
> mt76x02_lib snd_intel_sdw_acpi snd_usb_audio snd_hda_codec mt76
> kvm_amd uvcvideo mac80211 snd_hda_core btusb eeepc_wmi snd_usbmidi_lib
> videobuf2_vmalloc videobuf2_memops kvm btrtl snd_rawmidi asus_wmi
> snd_hwdep videobuf2_v4l2 btbcm iwlwifi ledtrig_audio libarc4 btintel
> snd_seq videobuf2_common sparse_keymap btmtk irqbypass videodev
> snd_seq_device joydev xpad iwlmei platform_profile bluetooth
> ff_memless snd_pcm mc rapl
> [ 220.281185] video snd_timer cfg80211 wmi_bmof snd pcspkr soundcore
> k10temp i2c_piix4 rfkill mei asus_ec_sensors acpi_cpufreq zram
> hid_logitech_hidpp amdgpu igb dca drm_ttm_helper ttm crct10dif_pclmul
> iommu_v2 crc32_pclmul gpu_sched crc32c_intel ucsi_ccg drm_buddy nvme
> typec_ucsi ghash_clmulni_intel drm_display_helper ccp nvme_core typec
> sp5100_tco cec wmi ip6_tables ip_tables fuse
> [ 220.281258] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
> amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
> amd64_edac():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
> amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
> amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
> amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
> amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1
> [ 220.281388] pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
> fjes():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
> [ 220.281415] CPU: 1 PID: 3746 Comm: chrome:cs0 Tainted: G W L ---
> --- 5.20.0-0.rc0.20220812git7ebfc85e2cd7.10.fc38.x86_64 #1
[BUG][5.20] refcount_t: underflow; use-after-free
Hi folks.
I started testing 5.20 today (7ebfc85e2cd7).
I encounter frequent GPU freezes, after which this message appears in the kernel log:

[ 220.280990] [ cut here ]
[ 220.281000] refcount_t: underflow; use-after-free.
[ 220.281019] WARNING: CPU: 1 PID: 3746 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
[ 220.281029] Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc snd_seq_midi snd_seq_midi_event vfat intel_rapl_msr fat intel_rapl_common snd_hda_codec_realtek mt76x2u snd_hda_codec_generic snd_hda_codec_hdmi mt76x2_common iwlmvm mt76x02_usb edac_mce_amd mt76_usb snd_hda_intel snd_intel_dspcfg mt76x02_lib snd_intel_sdw_acpi snd_usb_audio snd_hda_codec mt76 kvm_amd uvcvideo mac80211 snd_hda_core btusb eeepc_wmi snd_usbmidi_lib videobuf2_vmalloc videobuf2_memops kvm btrtl snd_rawmidi asus_wmi snd_hwdep videobuf2_v4l2 btbcm iwlwifi ledtrig_audio libarc4 btintel snd_seq videobuf2_common sparse_keymap btmtk irqbypass videodev snd_seq_device joydev xpad iwlmei platform_profile bluetooth ff_memless snd_pcm mc rapl
[ 220.281185] video snd_timer cfg80211 wmi_bmof snd pcspkr soundcore k10temp i2c_piix4 rfkill mei asus_ec_sensors acpi_cpufreq zram hid_logitech_hidpp amdgpu igb dca drm_ttm_helper ttm crct10dif_pclmul iommu_v2 crc32_pclmul gpu_sched crc32c_intel ucsi_ccg drm_buddy nvme typec_ucsi ghash_clmulni_intel drm_display_helper ccp nvme_core typec sp5100_tco cec wmi ip6_tables ip_tables fuse
[ 220.281258] Unloaded tainted modules: amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1
[ 220.281388] pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[ 220.281415] CPU: 1 PID: 3746 Comm: chrome:cs0 Tainted: G W L --- --- 5.20.0-0.rc0.20220812git7ebfc85e2cd7.10.fc38.x86_64 #1
[ 220.281421] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
[ 220.281426] RIP: 0010:refcount_warn_saturate+0xba/0x110
[ 220.281431] Code: 01 01 e8 79 4a 6f 00 0f 0b e9 42 47 a5 00 80 3d de 7e be 01 00 75 85 48 c7 c7 f8 98 8e 98 c6 05 ce 7e be 01 01 e8 56 4a 6f 00 <0f> 0b e9 1f 47 a5 00 80 3d b9 7e be 01 00 0f 85 5e ff ff ff 48 c7
[ 220.281437] RSP: 0018:b4b0d18d7a80 EFLAGS: 00010282
[ 220.281443] RAX: 0026 RBX: 0003 RCX:
[ 220.281448] RDX: 0001 RSI: 988d06dc RDI:
[ 220.281452] RBP: R08: R09: b4b0d18d7930
[ 220.281457] R10: 0003 R11: a0672e2fffe8 R12: a058ca360400
[ 220.281461] R13: a05846c50a18 R14: fe00 R15: 0003
[ 220.281465] FS: 7f82683e06c0() GS:a066e2e0() knlGS:
[ 220.281470] CS: 0010 DS: ES: CR0: 80050033
[ 220.281475] CR2: 3590005cc000 CR3: 0001fca46000 CR4: 00350ee0
[ 220.281480] Call Trace:
[ 220.281485]
[ 220.281490] amdgpu_cs_ioctl+0x4e2/0x2070 [amdgpu]
[ 220.281806] ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
[ 220.282028] drm_ioctl_kernel+0xa4/0x150
[ 220.282043] drm_ioctl+0x21f/0x420
[ 220.282053] ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
[ 220.282275] ? lock_release+0x14f/0x460
[ 220.282282] ? _raw_spin_unlock_irqrestore+0x30/0x60
[ 220.282290] ? _raw_spin_unlock_irqrestore+0x30/0x60
[ 220.282297] ? lockdep_hardirqs_on+0x7d/0x100
[ 220.282305] ? _raw_spin_unlock_irqrestore+0x40/0x60
[ 220.282317] amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
[ 220.282534] __x64_sys_ioctl+0x90/0xd0
[ 220.282545] do_syscall_64+0x5b/0x80
[ 220.282551] ? futex_wake+0x6c/0x150
[ 220.282568] ? lock_is_held_type+0xe8/0x140
[ 220.282580] ? do_syscall_64+0x67/0x80
[ 220.282585] ? lockdep_hardirqs_on+0x7d/0x100
[ 220.282592] ? do_syscall_64+0x67/0x80
[ 220.282597] ? do_syscall_64+0x67/0x80
[ 220.282602] ?