Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-09-19 Thread Mikhail Gavrilov
Hi!
Unfortunately, the use-after-free issue still happens on the 6.0-rc5 kernel.
It has become hard to reproduce: I had spent the whole day at the computer
before it happened again, while I was playing Tiny Tina's Wonderlands.
So reproducibility is out of the question; we can only hope for logs and
tracing.
I didn't see anything new in the logs. It seems we need to expand the
logging somehow, so that the next time this happens we have more
information.
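For context on what extra logging could capture: the "refcount_t: underflow; use-after-free." message in the log below is printed by refcount_warn_saturate() (lib/refcount.c, per the trace) when a decrement would take the counter below zero, i.e. someone drops a reference the object no longer has. The stack trace therefore shows the code doing the extra put, not the code that dropped the last legitimate reference and freed the object. In the kernel, KASAN (CONFIG_KASAN) reports where an object was allocated and freed when it catches a use-after-free, and may be the most direct way to get that information. The following is only a userspace toy with hypothetical names (toy_refcount, toy_put) showing the idea of recording the previous put site, so that an underflow report names both sides:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's refcount_t. */
struct toy_refcount {
	int refs;
	const char *last_put_func; /* site of the most recent valid put */
	int last_put_line;
};

/* refcount_t saturates a broken counter instead of letting it wrap;
 * this toy uses the same value, INT_MIN / 2. */
#define TOY_REFCOUNT_SATURATED (INT_MIN / 2)

static void toy_refcount_init(struct toy_refcount *r, int initial)
{
	assert(initial > 0);
	r->refs = initial;
	r->last_put_func = NULL;
	r->last_put_line = 0;
}

/* Drops one reference; returns 1 when the caller dropped the last one. */
static int toy_refcount_dec_and_test(struct toy_refcount *r,
				     const char *func, int line)
{
	if (r->refs <= 0) {
		/* The extra logging: the bad put, plus the last valid put
		 * (usually the code path that freed the object). */
		fprintf(stderr,
			"toy_refcount: underflow; use-after-free at %s:%d "
			"(last valid put: %s:%d)\n",
			func, line,
			r->last_put_func ? r->last_put_func : "none",
			r->last_put_line);
		r->refs = TOY_REFCOUNT_SATURATED;
		return 0;
	}
	r->last_put_func = func;
	r->last_put_line = line;
	return --r->refs == 0;
}

/* Records the caller's location automatically. */
#define toy_put(r) toy_refcount_dec_and_test((r), __func__, __LINE__)
```

With a count of 2, two puts succeed (the second returns 1, meaning the object may now be freed), and a third put prints the underflow message together with the location of the previous put.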

Sep 18 20:52:16 primary-ws gnome-shell[2388]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
Sep 18 20:52:27 primary-ws gnome-shell[2388]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
Sep 18 20:53:44 primary-ws gnome-shell[2388]: Window manager warning: Window 0x4e3 sets an MWM hint indicating it isn't resizable, but sets min size 1 x 1 and max size 2147483647 x 2147483647; this doesn't make much sense.
Sep 18 20:53:45 primary-ws kernel: umip_printk: 11 callbacks suppressed
Sep 18 20:53:45 primary-ws kernel: umip: Wonderlands.exe[213853] ip:14ebb0d03 sp:4ee528: SGDT instruction cannot be used by applications.
Sep 18 20:53:45 primary-ws kernel: umip: Wonderlands.exe[213853] ip:14ebb0d03 sp:4ee528: For now, expensive software emulation returns the result.
Sep 18 20:53:53 primary-ws gnome-shell[2388]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
Sep 18 20:53:53 primary-ws kernel: umip: Wonderlands.exe[213853] ip:14ebb0d03 sp:4ee528: SGDT instruction cannot be used by applications.
Sep 18 20:53:53 primary-ws kernel: umip: Wonderlands.exe[213853] ip:14ebb0d03 sp:4ee528: For now, expensive software emulation returns the result.
Sep 18 20:54:15 primary-ws kernel: umip: Wonderlands.exe[214194] ip:15a270815 sp:6eaef490: SGDT instruction cannot be used by applications.
Sep 18 20:56:01 primary-ws kernel: umip_printk: 15 callbacks suppressed
Sep 18 20:56:01 primary-ws kernel: umip: Wonderlands.exe[213853] ip:15e3a82b0 sp:4ed178: SGDT instruction cannot be used by applications.
Sep 18 20:56:01 primary-ws kernel: umip: Wonderlands.exe[213853] ip:15e3a82b0 sp:4ed178: For now, expensive software emulation returns the result.
Sep 18 20:56:03 primary-ws kernel: umip: Wonderlands.exe[213853] ip:15e3a82b0 sp:4edbe8: SGDT instruction cannot be used by applications.
Sep 18 20:56:03 primary-ws kernel: umip: Wonderlands.exe[213853] ip:15e3a82b0 sp:4edbe8: For now, expensive software emulation returns the result.
Sep 18 20:56:03 primary-ws kernel: umip: Wonderlands.exe[213853] ip:15e3a82b0 sp:4ebf18: SGDT instruction cannot be used by applications.
Sep 18 20:57:55 primary-ws kernel: [ cut here ]
Sep 18 20:57:55 primary-ws kernel: refcount_t: underflow; use-after-free.
Sep 18 20:57:55 primary-ws kernel: WARNING: CPU: 22 PID: 235114 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
Sep 18 20:57:55 primary-ws kernel: Modules linked in: tls uinput
rfcomm snd_seq_dummy snd_hrtimer nft_objref nf_conntrack_netbios_ns
nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_>
Sep 18 20:57:55 primary-ws kernel:  asus_wmi ledtrig_audio
sparse_keymap platform_profile irqbypass rfkill mc rapl snd_timer
video wmi_bmof pcspkr snd k10temp i2c_piix4 soundcore acpi_cpufreq
zram amdgpu drm_ttm_helper ttm iommu_v2 crct1>
Sep 18 20:57:55 primary-ws kernel: Unloaded tainted modules:
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_eda>
Sep 18 20:57:55 primary-ws kernel:  pcc_cpufreq():1 pcc_cpufreq():1
fjes():1 fjes():1 pcc_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1
fjes():1
Sep 18 20:57:55 primary-ws kernel: CPU: 22 PID: 235114 Comm: kworker/22:0 Tainted: GWL---  --- 6.0.0-0.rc5.20220914git3245cb65fd91.39.fc38.x86_64 #1
Sep 18 20:57:55 primary-ws kernel: Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
Sep 18 20:57:55 primary-ws kernel: Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]
Sep 18 20:57:55 primary-ws kernel: RIP: 0010:refcount_warn_saturate+0xba/0x110
Sep 18 20:57:55 primary-ws kernel: Code: 01 01 e8 69 6b 6f 00 0f 0b e9 32 38 a5 00 80 3d 4d 7d be 01 00 75 85 48 c7 c7 80 b7 8e 95 c6 05 3d 7d be 01 01 e8 46 6b 6f 00 <0f> 0b e9 0f 38 a5 00 80 3d 28 7d be 01 00 0f 85 5e ff ff ff 48 c7
Sep 18 20:57:55 primary-ws kernel: RSP: 0018:a1a853ccbe60 EFLAGS: 00010286
Sep 18 20:57:55 primary-ws kernel: RAX: 0026 RBX: 8e0e60a96c28 RCX: 
Sep 18 20:57:55 primary-ws kernel: RDX: 0001 RSI: 958d255c RDI: 
Sep 18 20:57:55 primary-ws kernel: RBP: 8e19a83f5600 R08:  R09: a1a853ccbd10
Sep 18 20:57:55 primary-ws kernel: R10: 0003 R11: 8e19ee2fffe8 R12: 8e19a83fc800
Sep 18 

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-24 Thread Mikhail Gavrilov
On Fri, Aug 19, 2022 at 5:13 PM Maíra Canal  wrote:
>
> Hi Mikhail,
>
> Could you please specify the steps to reproduce this use-after-free? I
> will try to reproduce it on the RX5700 XT and bisect the issue.
>

Hi Maíra, thanks for the help.

I'm afraid it will be unrealistic to reproduce: on a laptop with a 6800M
(also RDNA 2 graphics) the problem does not occur.

Sorry for the long silence; I was trying to bisect the problem myself.

git bisect start
# status: waiting for both good and bad commits
# good: [3d7cb6b04c3f3115719235cc6866b10326de34cd] Linux 5.19
git bisect good 3d7cb6b04c3f3115719235cc6866b10326de34cd
# status: waiting for bad commit, 1 good commit known
# bad: [7ebfc85e2cd7b08f518b526173e9a33b56b3913b] Merge tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect bad 7ebfc85e2cd7b08f518b526173e9a33b56b3913b

# bad: [b44f2fd87919b5ae6e1756d4c7ba2cbba22238e1] Merge tag 'drm-next-2022-08-03' of git://anongit.freedesktop.org/drm/drm
# 001: GPU hangs + use-after-free issue - https://pastebin.com/z86E9ydx
git bisect bad b44f2fd87919b5ae6e1756d4c7ba2cbba22238e1

# good: [526942b8134cc34d25d27f95dfff98b8ce2f6fcd] Merge tag 'ata-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
# 002: good - https://pastebin.com/9qki65Sj
git bisect good 526942b8134cc34d25d27f95dfff98b8ce2f6fcd

# good: [45490ce2ff833c4ec0de66705e46ba41320860cb] nfp: flower: add support for tunnel offload without key ID
# 003: good - https://pastebin.com/vHk5eRkw
git bisect good 45490ce2ff833c4ec0de66705e46ba41320860cb

# skip: [e23a5e14aa278858c2e3d81ec34e83aa9a4177c5] Backmerge tag 'v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next
# 004: GPU not switched in graphic mode - https://pastebin.com/RmqCTMLD
git bisect skip e23a5e14aa278858c2e3d81ec34e83aa9a4177c5

# bad: [b2065fb21d9a789b14f737ea90facedabadeb8a4] drm/amdgpu: fix i2s_pdata out of bound array access
# 005: GPU hangs + use-after-free issue - https://pastebin.com/Zgw5Hc48
git bisect bad b2065fb21d9a789b14f737ea90facedabadeb8a4

# skip: [344feb7ccf764756937cfd74fa4ac5caba069c99] Merge tag 'amd-drm-next-5.20-2022-07-05' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
# 006: GPU not switched in graphic mode - https://pastebin.com/b8BUBE7Q
git bisect skip 344feb7ccf764756937cfd74fa4ac5caba069c99

# skip: [869b10ac8d2300327f554d83f4dbab041bf27d49] drm/amdgpu: add dm ip block for dcn 3.1.4
# 007: GPU not switched in graphic mode - https://pastebin.com/byd7HECH
git bisect skip 869b10ac8d2300327f554d83f4dbab041bf27d49

# skip: [676ad8e997036e2f815c293b76c356fb7cc97a08] drm: rcar-du: Lift z-pos restriction on primary plane for Gen3
# 008: GPU not switched in graphic mode - https://pastebin.com/3fXCTinb
git bisect skip 676ad8e997036e2f815c293b76c356fb7cc97a08

# skip: [5c57cbc390b166950c2e6c2f0c4edaeb0f47e97d] drm/bridge: lt9211: Convert to drm_of_get_data_lanes_count
# 009: Build error - https://pastebin.com/rxHe9QRB
git bisect skip 5c57cbc390b166950c2e6c2f0c4edaeb0f47e97d

# skip: [6db5e0c8692e590734a7ec7455365d9cbaa15ef1] Merge tag 'drm-intel-next-2022-07-06' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
# 010: GPU not switched in graphic mode - https://pastebin.com/rqubSuc8
git bisect skip 6db5e0c8692e590734a7ec7455365d9cbaa15ef1

# skip: [5d763a9955f0fbf2681a2f1fa87c416056bd0c89] drm/amd/display: Remove compiler warning
# 011: GPU not switched in graphic mode - https://pastebin.com/BrJs6ybP
git bisect skip 5d763a9955f0fbf2681a2f1fa87c416056bd0c89

# skip: [e6c2db2be986158afb9991d9fa8a38fe65a88516] drm/i915: Don't use DRM_DEBUG_WARN_ON for unexpected l3bank/mslice config
# 012: GPU not switched in graphic mode - https://pastebin.com/yxppyqbD
git bisect skip e6c2db2be986158afb9991d9fa8a38fe65a88516

# bad: [cb6b81b21bd9cf09d72b7fe711be1b55001eb166] Merge tag 'drm-misc-next-fixes-2022-07-21' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
# 013: GPU hangs without use-after-free issue - https://pastebin.com/iRek4bBy
git bisect bad cb6b81b21bd9cf09d72b7fe711be1b55001eb166

# skip: [48b927770f8ad3f8cf4a024a552abf272af9f592] drm/exynos/exynos7_drm_decon: free resources when clk_set_parent() failed.
# 014: GPU not switched in graphic mode - https://pastebin.com/ekp10xhP
git bisect skip 48b927770f8ad3f8cf4a024a552abf272af9f592

# skip: [c5da61cf5bab30059f22ea368702c445ee87171a] drm/amdgpu/display: add missing FP_START/END checks dcn32_clk_mgr.c
# 015: GPU not switched in graphic mode - https://pastebin.com/YbskKWmA
git bisect skip c5da61cf5bab30059f22ea368702c445ee87171a

# skip: [a77f7c89e62c6dfe405a64995812746f27adc510] drm/edid: convert drm_gtf_modes_for_range() to drm_edid
# 016: GPU not switched in graphic mode - https://pastebin.com/bA2AwkJ7
git bisect skip a77f7c89e62c6dfe405a64995812746f27adc510

# skip: [6fde8eec71796f3534f0c274066862829813b21f] drm/doc: Add KUnit documentation
# 017: GPU not switched in graphic mode - 
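Many commits in the bisect log above had to be skipped (build errors, or the GPU never switching to graphics mode). A skipped commit gives no verdict, so when untestable commits sit right next to the good/bad boundary, bisection can only narrow the culprit down to a range. A toy model of that in C, assuming a linear history with hypothetical verdicts (real history is a DAG, and git bisect's own choice of which commit to test next differs in detail):

```c
#include <assert.h>

/* Hypothetical verdicts for a linear history. */
enum verdict { GOOD, BAD, SKIP };

struct bisect_result {
	int first; /* lowest index that may be the first bad commit */
	int last;  /* highest such index; first == last means exact */
};

/* Finds a testable (non-SKIP) commit strictly between lo and hi,
 * starting at mid and moving outwards. Returns -1 if there is none. */
static int nearest_testable(const enum verdict *v, int lo, int hi, int mid)
{
	for (int d = 0; mid - d > lo || mid + d < hi; d++) {
		if (mid + d < hi && v[mid + d] != SKIP)
			return mid + d;
		if (mid - d > lo && v[mid - d] != SKIP)
			return mid - d;
	}
	return -1;
}

/* v[0] must be GOOD and v[n - 1] BAD; apart from skipped commits,
 * history is assumed to be good up to some point and bad after it. */
static struct bisect_result bisect(const enum verdict *v, int n)
{
	assert(n >= 2 && v[0] == GOOD && v[n - 1] == BAD);
	int lo = 0, hi = n - 1; /* last known good, first known bad */

	while (hi - lo > 1) {
		int t = nearest_testable(v, lo, hi, lo + (hi - lo) / 2);
		if (t < 0)
			break; /* only skipped commits remain in between */
		if (v[t] == GOOD)
			lo = t;
		else
			hi = t;
	}
	return (struct bisect_result){ .first = lo + 1, .last = hi };
}
```

For { GOOD, GOOD, SKIP, BAD, BAD, BAD } the result is the range 2..3: either the skipped commit 2 or commit 3 could be the first bad one, and no further testing can decide between them.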

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-24 Thread Melissa Wen
On 08/17, Mikhail Gavrilov wrote:
> On Mon, Aug 15, 2022 at 3:37 PM Mikhail Gavrilov
>  wrote:
> >
> > Thanks, I tested this patch.
> > But with this patch, the use-after-free happens in another place:
> 
> Does anyone have an idea why the second use-after-free happened?
> From the trace I can't tell which code is involved.
> I also don't quite understand what the "Workqueue" entry in the trace means.

Hi Mikhail,

IIUC, you got this second use-after-free by applying the first version
of Maíra's patch, right? That version added another unbalanced unlock to
the CS ioctl flow, which was fixed in the latest version, available here:
https://patchwork.freedesktop.org/patch/497680/
If that is the case, can you check this latest version?
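An unbalanced unlock means some path releases a lock it no longer holds, which lockdep reports as "bad unlock balance detected!" (the warning mentioned in this thread). A common kernel idiom to avoid it is a single unlock label that every error path jumps to. The sketch below is a userspace toy with hypothetical names (toy_lock, toy_submit); it is not the code of the CS ioctl or of Maíra's patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical toy lock that, like lockdep, complains when a lock
 * that is not held gets released. */
struct toy_lock {
	int held;
	int bad_unlocks;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held); /* no recursion in this toy */
	l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
	if (!l->held) {
		fprintf(stderr, "toy_lock: bad unlock balance detected!\n");
		l->bad_unlocks++;
		return;
	}
	l->held = 0;
}

/* Hypothetical submit path: every exit after taking the lock goes
 * through the single out_unlock label, so the lock is released once. */
static int toy_submit(struct toy_lock *l, int fail_step)
{
	int ret = 0;

	toy_lock_acquire(l);
	if (fail_step == 1) {
		ret = -EINVAL;
		goto out_unlock;
	}
	if (fail_step == 2) {
		ret = -ENOMEM;
		goto out_unlock;
	}
	/* ... the real work would happen here ... */
out_unlock:
	toy_lock_release(l);
	return ret;
}

/* Buggy variant: the error path unlocks and then also jumps to the
 * common label, releasing the lock a second time. */
static int toy_submit_buggy(struct toy_lock *l, int fail)
{
	int ret = 0;

	toy_lock_acquire(l);
	if (fail) {
		toy_lock_release(l);
		ret = -EINVAL;
		goto out_unlock;
	}
out_unlock:
	toy_lock_release(l);
	return ret;
}
```

toy_submit() releases the lock exactly once on every path, while toy_submit_buggy() releases it twice on its error path and trips the check.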

Thanks,

Melissa

> 
> [ 408.358737] [ cut here ]
> [ 408.358743] refcount_t: underflow; use-after-free.
> [ 408.358760] WARNING: CPU: 9 PID: 62 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
> [ 408.358769] Modules linked in: uinput snd_seq_dummy rfcomm
> snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
> qrtr bnep sunrpc binfmt_misc snd_seq_midi snd_seq_midi_event mt76x2u
> mt76x2_common snd_hda_codec_realtek mt76x02_usb snd_hda_codec_generic
> iwlmvm snd_hda_codec_hdmi mt76_usb intel_rapl_msr snd_hda_intel
> mt76x02_lib intel_rapl_common snd_intel_dspcfg snd_intel_sdw_acpi mt76
> snd_hda_codec vfat fat snd_usb_audio snd_hda_core edac_mce_amd
> mac80211 snd_usbmidi_lib snd_hwdep snd_rawmidi mc snd_seq btusb
> kvm_amd iwlwifi snd_seq_device btrtl btbcm libarc4 btintel eeepc_wmi
> snd_pcm iwlmei kvm btmtk asus_wmi ledtrig_audio irqbypass joydev
> snd_timer sparse_keymap bluetooth platform_profile rapl cfg80211 snd
> video wmi_bmof soundcore i2c_piix4 k10temp rfkill mei
> [ 408.358853] asus_ec_sensors acpi_cpufreq zram hid_logitech_hidpp
> amdgpu igb dca drm_ttm_helper ttm iommu_v2 crct10dif_pclmul gpu_sched
> crc32_pclmul ucsi_ccg crc32c_intel drm_buddy nvme typec_ucsi
> drm_display_helper ghash_clmulni_intel ccp typec nvme_core sp5100_tco
> cec wmi ip6_tables ip_tables fuse
> [ 408.358880] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
> amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
> amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
> amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
> amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
> pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
> amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
> amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
> amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
> pcc_cpufreq():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
> [ 408.358953] pcc_cpufreq():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1
> fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
> [ 408.358967] CPU: 9 PID: 62 Comm: kworker/9:0 Tainted: G W L --- --- 6.0.0-0.rc1.13.fc38.x86_64+debug #1
> [ 408.358971] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
> [ 408.358974] Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]
> [ 408.358982] RIP: 0010:refcount_warn_saturate+0xba/0x110
> [ 408.358987] Code: 01 01 e8 d9 59 6f 00 0f 0b e9 a2 46 a5 00 80 3d 3e 7e be 01 00 75 85 48 c7 c7 70 99 8e 92 c6 05 2e 7e be 01 01 e8 b6 59 6f 00 <0f> 0b e9 7f 46 a5 00 80 3d 19 7e be 01 00 0f 85 5e ff ff ff 48 c7
> [ 408.358990] RSP: 0018:b124003efe60 EFLAGS: 00010286
> [ 408.358994] RAX: 0026 RBX: 9987a025d428 RCX: 
> [ 408.358997] RDX: 0001 RSI: 928d0754 RDI: 
> [ 408.358999] RBP: 9994e4ff5600 R08:  R09: b124003efd10
> [ 408.359001] R10: 0003 R11: 99952e2fffe8 R12: 9994e4ffc800
> [ 408.359004] R13: 998600228cc0 R14: 9994e4ffc805 R15: 9987a025d430
> [ 408.359006] FS: () GS:9994e4e0() knlGS:
> [ 408.359009] CS: 0010 DS:  ES:  CR0: 80050033
> [ 408.359012] CR2: 27ac39e78000 CR3: 0001a66d8000 CR4: 00350ee0
> [ 408.359015] Call Trace:
> [ 408.359017] 
> [ 408.359020] process_one_work+0x2a0/0x600
> [ 408.359032] worker_thread+0x4f/0x3a0
> [ 408.359036] ? process_one_work+0x600/0x600
> [ 408.359039] 

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-19 Thread Maíra Canal
On 8/17/22 17:57, Mikhail Gavrilov wrote:
> On Wed, Aug 17, 2022 at 11:43 PM Maíra Canal  wrote:
>>
>> Hi Mikhail,
>>
>> Looks like 45ecaea738830b9d521c93520c8f201359dcbd95 ("drm/sched: Partial
>> revert of 'drm/sched: Keep s_fence->parent pointer'") introduced the
>> error. Try reverting it and check if the use-after-free still happens.
> 
> Thanks, but unfortunately, this did not lead to the expected result.
> The use-after-free happens again, in a context I can't make sense of.
> One new thing: a "suspicious RCU usage" warning appeared, but it looks
> completely unrelated to the use-after-free issue.
> 

Hi Mikhail,

Could you please specify the steps to reproduce this use-after-free? I
will try to reproduce it on the RX5700 XT and bisect the issue.

Best Regards,
- Maíra Canal


Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-17 Thread Mikhail Gavrilov
On Wed, Aug 17, 2022 at 11:43 PM Maíra Canal  wrote:
>
> Hi Mikhail,
>
> Looks like 45ecaea738830b9d521c93520c8f201359dcbd95 ("drm/sched: Partial
> revert of 'drm/sched: Keep s_fence->parent pointer'") introduced the
> error. Try reverting it and check if the use-after-free still happens.

Thanks, but unfortunately, this did not lead to the expected result.
The use-after-free happens again, in a context I can't make sense of.
One new thing: a "suspicious RCU usage" warning appeared, but it looks
completely unrelated to the use-after-free issue.

[ 215.434115] [ cut here ]
[ 215.434184] refcount_t: underflow; use-after-free.
[ 215.434204] WARNING: CPU: 7 PID: 1258 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
[ 215.434214] Modules linked in: uinput rfcomm snd_seq_dummy
snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
qrtr bnep sunrpc binfmt_misc snd_seq_midi snd_seq_midi_event
intel_rapl_msr intel_rapl_common snd_hda_codec_realtek vfat
snd_hda_codec_generic snd_hda_codec_hdmi mt76x2u fat mt76x2_common
snd_hda_intel mt76x02_usb snd_intel_dspcfg snd_intel_sdw_acpi mt76_usb
iwlmvm edac_mce_amd snd_usb_audio snd_hda_codec mt76x02_lib
snd_hda_core snd_usbmidi_lib snd_hwdep snd_rawmidi uvcvideo mt76
kvm_amd snd_seq videobuf2_vmalloc videobuf2_memops snd_seq_device
mac80211 videobuf2_v4l2 videobuf2_common kvm btusb iwlwifi snd_pcm
btrtl videodev libarc4 eeepc_wmi btbcm asus_wmi iwlmei btintel
ledtrig_audio xpad irqbypass sparse_keymap btmtk platform_profile
joydev
[ 215.434436] hid_logitech_hidpp rapl ff_memless mc snd_timer
bluetooth cfg80211 video pcspkr wmi_bmof snd soundcore k10temp
i2c_piix4 rfkill mei asus_ec_sensors acpi_cpufreq zram amdgpu
drm_ttm_helper ttm iommu_v2 ucsi_ccg gpu_sched crct10dif_pclmul
crc32_pclmul typec_ucsi drm_buddy crc32c_intel ghash_clmulni_intel ccp
igb sp5100_tco typec drm_display_helper nvme dca nvme_core cec wmi
ip6_tables ip_tables fuse
[ 215.434528] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
pcc_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 fjes():1
[ 215.434672] pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
pcc_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[ 215.434702] CPU: 7 PID: 1258 Comm: kworker/7:3 Tainted: G W L --- --- 6.0.0-0.rc1.20220817git3cc40a443a04.14.fc38.x86_64 #1
[ 215.434709] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
[ 215.434715] Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]
[ 215.434728] RIP: 0010:refcount_warn_saturate+0xba/0x110
[ 215.434734] Code: 01 01 e8 59 59 6f 00 0f 0b e9 22 46 a5 00 80 3d be 7d be 01 00 75 85 48 c7 c7 c0 99 8e 92 c6 05 ae 7d be 01 01 e8 36 59 6f 00 <0f> 0b e9 ff 45 a5 00 80 3d 99 7d be 01 00 0f 85 5e ff ff ff 48 c7
[ 215.434740] RSP: 0018:9ccb0237fe60 EFLAGS: 00010286
[ 215.434747] RAX: 0026 RBX: 8d531f6f2828 RCX: 
[ 215.434753] RDX: 0001 RSI: 928d07a4 RDI: 
[ 215.434757] RBP: 8d61e47f5600 R08:  R09: 9ccb0237fd10
[ 215.434762] R10: 0003 R11: 8d622e2fffe8 R12: 8d61e47fc800
[ 215.434767] R13: 8d5313e95500 R14: 8d61e47fc805 R15: 8d531f6f2830
[ 215.434772] FS: () GS:8d61e460() knlGS:
[ 215.434777] CS: 0010 DS:  ES:  CR0: 80050033
[ 215.434782] CR2: 7f0c8b815048 CR3: 0001ab0e8000 CR4: 00350ee0
[ 215.434788] Call Trace:
[ 215.434792] 
[ 215.434797] process_one_work+0x2a0/0x600
[ 215.434819] worker_thread+0x4f/0x3a0
[ 215.434830] ? process_one_work+0x600/0x600
[ 215.434836] kthread+0xf5/0x120
[ 215.434842] ? kthread_complete_and_exit+0x20/0x20
[ 215.434854] ret_from_fork+0x22/0x30
[ 215.434881] 
[ 215.434885] irq event stamp: 134873
[ 215.434890] hardirqs last enabled at (134881): [] __up_console_sem+0x5e/0x70
[ 215.434897] hardirqs 

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-17 Thread Maíra Canal
On 8/17/22 14:44, Mikhail Gavrilov wrote:
> On Wed, Aug 17, 2022 at 9:08 PM Melissa Wen  wrote:
>>
>> Hi Mikhail,
>>
>> IIUC, you got this second use-after-free by applying the first version
>> of Maíra's patch, right? That version added another unbalanced unlock to
>> the CS ioctl flow, which was fixed in the latest version, available here:
>> https://patchwork.freedesktop.org/patch/497680/
>> If that is the case, can you check this latest version?
>>
>> Thanks,
>>
>> Melissa
>
> With the latest version, the "bad unlock balance detected!" warning is
> gone, but the use-after-free issue remains.
> And again: "Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]".


Hi Mikhail,

Looks like 45ecaea738830b9d521c93520c8f201359dcbd95 ("drm/sched: Partial 
revert of 'drm/sched: Keep s_fence->parent pointer'") introduced the 
error. Try reverting it and check if the use-after-free still happens.
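The commit title refers to keeping the s_fence->parent pointer. As a general rule, a long-lived pointer to a refcounted object should own a reference of its own, and exactly one code path should drop it; if two paths both believe they own that reference, the second put underflows. A generic sketch with hypothetical toy_fence/toy_job types; it is not the drm/sched code and does not describe what that commit changed:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a fence. */
struct toy_fence {
	int refs;
};

static struct toy_fence *toy_fence_create(void)
{
	struct toy_fence *f = malloc(sizeof(*f));

	assert(f != NULL);
	f->refs = 1; /* the creator's reference */
	return f;
}

static struct toy_fence *toy_fence_get(struct toy_fence *f)
{
	f->refs++;
	return f;
}

/* Drops one reference; returns 1 if this put freed the object. */
static int toy_fence_put(struct toy_fence *f)
{
	assert(f->refs > 0); /* sanity check on a still-live object */
	if (--f->refs == 0) {
		free(f);
		return 1;
	}
	return 0;
}

/* Hypothetical job that keeps a pointer to a parent fence. */
struct toy_job {
	struct toy_fence *parent;
};

/* Storing the pointer takes a reference owned by the job... */
static void toy_job_set_parent(struct toy_job *job, struct toy_fence *f)
{
	job->parent = toy_fence_get(f);
}

/* ...and only this function drops it. Clearing the pointer means a
 * second cleanup path sees NULL instead of putting the fence again. */
static void toy_job_clear_parent(struct toy_job *job)
{
	if (job->parent) {
		toy_fence_put(job->parent);
		job->parent = NULL;
	}
}
```

Calling toy_job_clear_parent() twice is harmless here; without the NULL assignment, the second call would drop a reference the job no longer owns.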


Best Regards,
- Maíra Canal

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-17 Thread Mikhail Gavrilov
On Wed, Aug 17, 2022 at 9:08 PM Melissa Wen  wrote:
>
> Hi Mikhail,
>
> IIUC, you got this second use-after-free by applying the first version
> of Maíra's patch, right? That version added another unbalanced unlock to
> the CS ioctl flow, which was fixed in the latest version, available here:
> https://patchwork.freedesktop.org/patch/497680/
> If that is the case, can you check this latest version?
>
> Thanks,
>
> Melissa

With the latest version, the "bad unlock balance detected!" warning is
gone, but the use-after-free issue remains.
And again: "Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]".
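On the "Workqueue" question: that line means the warning fired in a kernel worker thread (the kworker/... in the Comm field) while it was running a work item whose function is drm_sched_entity_kill_jobs_work from the gpu_sched module, queued on the system "events" workqueue. Work items run later, from process_one_work(), so the code that queued the item is no longer on the stack; that is why the call trace shows only the generic worker frames (process_one_work(), worker_thread(), kthread()). A userspace toy with hypothetical names (toy_workqueue, toy_queue_work), showing why recording who queued a work item helps:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy work item: a callback plus the state it needs. */
struct toy_work {
	void (*func)(struct toy_work *work);
	const char *queued_by; /* extra debug info: who queued this item */
	int *refs;             /* counter the callback drops a reference on */
	int underflowed;       /* set by the callback if its put was invalid */
};

#define TOY_QUEUE_CAP 8

struct toy_workqueue {
	struct toy_work *items[TOY_QUEUE_CAP];
	size_t n;
};

/* Remember the caller at queue time, because it will be gone later. */
static void toy_queue_work(struct toy_workqueue *wq, struct toy_work *w,
			   const char *caller)
{
	assert(wq->n < TOY_QUEUE_CAP);
	w->queued_by = caller;
	w->underflowed = 0;
	wq->items[wq->n++] = w;
}

/* The "worker": runs queued items later, from its own loop, so the
 * functions that queued them are no longer on the call stack. */
static void toy_worker_run(struct toy_workqueue *wq)
{
	for (size_t i = 0; i < wq->n; i++)
		wq->items[i]->func(wq->items[i]);
	wq->n = 0;
}

/* Hypothetical work function that drops one reference. */
static void toy_put_work(struct toy_work *w)
{
	if (*w->refs <= 0) {
		w->underflowed = 1;
		fprintf(stderr, "underflow in work queued by %s\n",
			w->queued_by);
		return;
	}
	(*w->refs)--;
}
```

If two paths each queue a put while only one reference is outstanding, nothing happens until the worker runs; the second put is then reported together with the name of the path that queued it, which a plain stack trace taken inside the worker could not show.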

[  297.834779] [ cut here ]
[  297.834818] refcount_t: underflow; use-after-free.
[  297.834831] WARNING: CPU: 30 PID: 2377 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
[  297.834838] Modules linked in: uinput rfcomm snd_seq_dummy
snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
qrtr bnep sunrpc binfmt_misc snd_seq_midi snd_seq_midi_event mt76x2u
mt76x2_common mt76x02_usb mt76_usb mt76x02_lib snd_hda_codec_realtek
iwlmvm intel_rapl_msr snd_hda_codec_generic snd_hda_codec_hdmi mt76
vfat fat snd_hda_intel intel_rapl_common mac80211 snd_intel_dspcfg
snd_intel_sdw_acpi snd_usb_audio snd_hda_codec snd_usbmidi_lib btusb
edac_mce_amd iwlwifi libarc4 uvcvideo snd_hda_core btrtl snd_rawmidi
snd_hwdep videobuf2_vmalloc btbcm kvm_amd videobuf2_memops snd_seq
iwlmei btintel videobuf2_v4l2 eeepc_wmi snd_seq_device
videobuf2_common btmtk kvm xpad videodev joydev irqbypass snd_pcm
asus_wmi hid_logitech_hidpp ff_memless cfg80211 bluetooth rapl mc
[  297.834932]  ledtrig_audio snd_timer sparse_keymap platform_profile
wmi_bmof snd video pcspkr k10temp i2c_piix4 rfkill soundcore mei
asus_ec_sensors acpi_cpufreq zram amdgpu drm_ttm_helper ttm
crct10dif_pclmul crc32_pclmul crc32c_intel iommu_v2 ucsi_ccg gpu_sched
typec_ucsi drm_buddy ghash_clmulni_intel drm_display_helper ccp igb
typec sp5100_tco nvme cec nvme_core dca wmi ip6_tables ip_tables fuse
[  297.834978] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 fjes():1
[  297.835055]  pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
pcc_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[  297.835071] CPU: 30 PID: 2377 Comm: kworker/30:6 Tainted: G
WL---  ---
6.0.0-0.rc1.20220817git3cc40a443a04.14.fc38.x86_64 #1
[  297.835075] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
[  297.835078] Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]
[  297.835085] RIP: 0010:refcount_warn_saturate+0xba/0x110
[  297.835088] Code: 01 01 e8 59 59 6f 00 0f 0b e9 22 46 a5 00 80 3d
be 7d be 01 00 75 85 48 c7 c7 c0 99 8e aa c6 05 ae 7d be 01 01 e8 36
59 6f 00 <0f> 0b e9 ff 45 a5 00 80 3d 99 7d be 01 00 0f 85 5e ff ff ff
48 c7
[  297.835091] RSP: 0018:bd3506df7e60 EFLAGS: 00010286
[  297.835095] RAX: 0026 RBX: 961b250cbc28 RCX: 
[  297.835097] RDX: 0001 RSI: aa8d07a4 RDI: 
[  297.835100] RBP: 96276a3f5600 R08:  R09: bd3506df7d10
[  297.835102] R10: 0003 R11: 9627ae2fffe8 R12: 96276a3fc800
[  297.835105] R13: 9618c03e6600 R14: 96276a3fc805 R15: 961b250cbc30
[  297.835108] FS:  () GS:96276a20()
knlGS:
[  297.835110] CS:  0010 DS:  ES:  CR0: 80050033
[  297.835113] CR2: 621001e4a000 CR3: 00018d958000 CR4: 00350ee0
[  297.835116] Call Trace:
[  297.835118]  <TASK>
[  297.835121]  process_one_work+0x2a0/0x600
[  297.835133]  worker_thread+0x4f/0x3a0
[  297.835139]  ? process_one_work+0x600/0x600
[  297.835142]  kthread+0xf5/0x120
[  297.835145]  ? kthread_complete_and_exit+0x20/0x20
[  297.835151]  ret_from_fork+0x22/0x30
[  297.835166]  </TASK>
[  

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-16 Thread Mikhail Gavrilov
On Mon, Aug 15, 2022 at 3:37 PM Mikhail Gavrilov
 wrote:
>
> Thanks, I tested this patch.
> But with this patch, the use-after-free problem happens in another place:

Does anyone have an idea why the second use-after-free happened?
From the trace I can't tell which code is involved.
I also don't quite understand what the "Workqueue" entry in the trace means.
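For context on the warning itself: `refcount_t: underflow; use-after-free.` is printed by `refcount_warn_saturate()` in lib/refcount.c when a `refcount_t` is decremented while it is already zero (or already saturated), i.e. there was one more "put" than "get". The `Workqueue:` line only says the warning fired while a kworker was running the `drm_sched_entity_kill_jobs_work` item from the system `events` workqueue. That does not by itself show where the unbalanced reference drop originated. Below is a simplified userspace model of that check, offered as a sketch only: the `model_*` names are hypothetical, C11 atomics with default (sequentially consistent) ordering stand in for the kernel's release/acquire ordering, and the kernel prints the warning only once (`WARN_ONCE`), whereas this model prints it on every underflow.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of the refcount_t underflow check; the
 * saturation value mirrors the kernel's REFCOUNT_SATURATED (INT_MIN / 2). */
#define MODEL_REFCOUNT_SATURATED (INT_MIN / 2)

typedef struct {
	atomic_int refs;
} model_refcount_t;

/* Number of underflow warnings emitted so far (for demonstration only). */
static int model_warnings;

static void model_warn_underflow(model_refcount_t *r)
{
	/* Park the counter far below zero so further stray puts cannot
	 * hit the "last reference" case and trigger another free. */
	atomic_store(&r->refs, MODEL_REFCOUNT_SATURATED);
	model_warnings++;
	fprintf(stderr, "refcount_t: underflow; use-after-free.\n");
}

/* Drops one reference. Returns true only for the put that takes the
 * counter from 1 to 0, i.e. when the caller must free the object. */
static bool model_refcount_dec_and_test(model_refcount_t *r)
{
	int old = atomic_fetch_sub(&r->refs, 1);

	if (old == 1)
		return true;
	/* The counter was already 0 (or saturated): this put has no
	 * matching get, so the object may already have been freed. */
	if (old <= 0)
		model_warn_underflow(r);
	return false;
}
```

With two references taken, the first put returns `false`, the second returns `true` (free the object), and a third put takes the underflow branch and saturates the counter.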

[ 408.358737] [ cut here ]
[ 408.358743] refcount_t: underflow; use-after-free.
[ 408.358760] WARNING: CPU: 9 PID: 62 at lib/refcount.c:28
refcount_warn_saturate+0xba/0x110
[ 408.358769] Modules linked in: uinput snd_seq_dummy rfcomm
snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
qrtr bnep sunrpc binfmt_misc snd_seq_midi snd_seq_midi_event mt76x2u
mt76x2_common snd_hda_codec_realtek mt76x02_usb snd_hda_codec_generic
iwlmvm snd_hda_codec_hdmi mt76_usb intel_rapl_msr snd_hda_intel
mt76x02_lib intel_rapl_common snd_intel_dspcfg snd_intel_sdw_acpi mt76
snd_hda_codec vfat fat snd_usb_audio snd_hda_core edac_mce_amd
mac80211 snd_usbmidi_lib snd_hwdep snd_rawmidi mc snd_seq btusb
kvm_amd iwlwifi snd_seq_device btrtl btbcm libarc4 btintel eeepc_wmi
snd_pcm iwlmei kvm btmtk asus_wmi ledtrig_audio irqbypass joydev
snd_timer sparse_keymap bluetooth platform_profile rapl cfg80211 snd
video wmi_bmof soundcore i2c_piix4 k10temp rfkill mei
[ 408.358853] asus_ec_sensors acpi_cpufreq zram hid_logitech_hidpp
amdgpu igb dca drm_ttm_helper ttm iommu_v2 crct10dif_pclmul gpu_sched
crc32_pclmul ucsi_ccg crc32c_intel drm_buddy nvme typec_ucsi
drm_display_helper ghash_clmulni_intel ccp typec nvme_core sp5100_tco
cec wmi ip6_tables ip_tables fuse
[ 408.358880] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
[ 408.358953] pcc_cpufreq():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1
fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[ 408.358967] CPU: 9 PID: 62 Comm: kworker/9:0 Tainted: G W L ---
--- 6.0.0-0.rc1.13.fc38.x86_64+debug #1
[ 408.358971] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
[ 408.358974] Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]
[ 408.358982] RIP: 0010:refcount_warn_saturate+0xba/0x110
[ 408.358987] Code: 01 01 e8 d9 59 6f 00 0f 0b e9 a2 46 a5 00 80 3d 3e
7e be 01 00 75 85 48 c7 c7 70 99 8e 92 c6 05 2e 7e be 01 01 e8 b6 59
6f 00 <0f> 0b e9 7f 46 a5 00 80 3d 19 7e be 01 00 0f 85 5e ff ff ff 48
c7
[ 408.358990] RSP: 0018:b124003efe60 EFLAGS: 00010286
[ 408.358994] RAX: 0026 RBX: 9987a025d428 RCX: 
[ 408.358997] RDX: 0001 RSI: 928d0754 RDI: 
[ 408.358999] RBP: 9994e4ff5600 R08:  R09: b124003efd10
[ 408.359001] R10: 0003 R11: 99952e2fffe8 R12: 9994e4ffc800
[ 408.359004] R13: 998600228cc0 R14: 9994e4ffc805 R15: 9987a025d430
[ 408.359006] FS: () GS:9994e4e0()
knlGS:
[ 408.359009] CS: 0010 DS:  ES:  CR0: 80050033
[ 408.359012] CR2: 27ac39e78000 CR3: 0001a66d8000 CR4: 00350ee0
[ 408.359015] Call Trace:
[ 408.359017] <TASK>
[ 408.359020] process_one_work+0x2a0/0x600
[ 408.359032] worker_thread+0x4f/0x3a0
[ 408.359036] ? process_one_work+0x600/0x600
[ 408.359039] kthread+0xf5/0x120
[ 408.359044] ? kthread_complete_and_exit+0x20/0x20
[ 408.359049] ret_from_fork+0x22/0x30
[ 408.359061] </TASK>
[ 408.359063] irq event stamp: 5468
[ 408.359064] hardirqs last enabled at (5467): []
_raw_spin_unlock_irq+0x24/0x50
[ 408.359071] hardirqs last disabled at (5468): []
__schedule+0xe2c/0x16d0
[ 408.359076] softirqs last enabled at (2482): []
rht_deferred_worker+0x708/0xc00
[ 408.359079] softirqs last disabled at (2480): []
rht_deferred_worker+0x1f7/0xc00
[ 408.359082] ---[ end trace  ]---


Full kernel log is here: 

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-15 Thread Christian König

On 15.08.22 at 12:55, Melissa Wen wrote:
> On 08/14, Maíra Canal wrote:
>> Hi Mikhail
>>
>> Looks like this use-after-free problem was introduced on
>> 90af0ca047f3049c4b46e902f432ad6ef1e2ded6. Checking this patch it seems
>> like: if amdgpu_cs_vm_handling returns r != 0, then it will unlock
>> bo_list_mutex inside the function amdgpu_cs_vm_handling and again on
>> amdgpu_cs_parser_fini.
>>
>> Maybe the following patch will help:
>>
>> ---
>> From 71d718c0f53a334bb59bcd5dabd29bbe92c724af Mon Sep 17 00:00:00 2001
>> From: =?UTF-8?q?Ma=C3=ADra=20Canal?=
>> Date: Sun, 14 Aug 2022 21:12:24 -0300
>> Subject: [PATCH] drm/amdgpu: Fix use-after-free on amdgpu_bo_list mutex
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset=UTF-8
>> Content-Transfer-Encoding: 8bit
>>
>> Fixes: 90af0ca047f3 ("drm/amdgpu: Protect the amdgpu_bo_list list with a
>> mutex v2")
>> Reported-by: Mikhail Gavrilov
>> Signed-off-by: Maíra Canal
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 +++------
>>  1 file changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index d8f1335bc68f..a7fce7b14321 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -837,17 +837,14 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
>>  			continue;
>>
>>  		r = amdgpu_vm_bo_update(adev, bo_va, false);
>> -		if (r) {
>> -			mutex_unlock(&p->bo_list->bo_list_mutex);
>> +		if (r)
>>  			return r;
>> -		}
>>
>>  		r = amdgpu_sync_fence(&p->job->sync, bo_va->last_pt_update);
>> -		if (r) {
>> -			mutex_unlock(&p->bo_list->bo_list_mutex);
>> +		if (r)
>>  			return r;
>> -		}
>>  	}
>> +	mutex_unlock(&p->bo_list->bo_list_mutex);
> I think we don't need to unlock the bo_list_mutex here. If return != 0
> amdgpu_cs_parser_fini() will unlock it; otherwise, amdgpu_cs_submit()
> unlocks it in the end.

Yeah, exactly that.

Apart from that, the patch looks good to me. We moved the mutex unlocking
around a few times during review. Probably just fallout from that.

Thanks for fixing this,
Christian.

> BR,
>
> Melissa
>>
>>  	r = amdgpu_vm_handle_moved(adev, vm);
>>  	if (r)
>> -- 
>> 2.37.1
>> ---
>> Best Regards,
>> - Maíra Canal
>>
>> On 8/14/22 18:11, Mikhail Gavrilov wrote:
>>> Hi folks.
>>> Started testing 5.20 today (7ebfc85e2cd7).
>>> I encountered frequent GPU freezes, after which a message appears
>>> in the kernel logs:
>>> [ 220.280990] [ cut here ]
>>> [ 220.281000] refcount_t: underflow; use-after-free.
>>> [ 220.281019] WARNING: CPU: 1 PID: 3746 at lib/refcount.c:28
>>> refcount_warn_saturate+0xba/0x110
>>> [ 220.281029] Modules linked in: uinput rfcomm snd_seq_dummy
>>> snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
>>> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
>>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
>>> qrtr bnep sunrpc snd_seq_midi snd_seq_midi_event vfat intel_rapl_msr
>>> fat intel_rapl_common snd_hda_codec_realtek mt76x2u
>>> snd_hda_codec_generic snd_hda_codec_hdmi mt76x2_common iwlmvm
>>> mt76x02_usb edac_mce_amd mt76_usb snd_hda_intel snd_intel_dspcfg
>>> mt76x02_lib snd_intel_sdw_acpi snd_usb_audio snd_hda_codec mt76
>>> kvm_amd uvcvideo mac80211 snd_hda_core btusb eeepc_wmi snd_usbmidi_lib
>>> videobuf2_vmalloc videobuf2_memops kvm btrtl snd_rawmidi asus_wmi
>>> snd_hwdep videobuf2_v4l2 btbcm iwlwifi ledtrig_audio libarc4 btintel
>>> snd_seq videobuf2_common sparse_keymap btmtk irqbypass videodev
>>> snd_seq_device joydev xpad iwlmei platform_profile bluetooth
>>> ff_memless snd_pcm mc rapl
>>> [ 220.281185] video snd_timer cfg80211 wmi_bmof snd pcspkr soundcore
>>> k10temp i2c_piix4 rfkill mei asus_ec_sensors acpi_cpufreq zram
>>> hid_logitech_hidpp amdgpu igb dca drm_ttm_helper ttm crct10dif_pclmul
>>> iommu_v2 crc32_pclmul gpu_sched crc32c_intel ucsi_ccg drm_buddy nvme
>>> typec_ucsi ghash_clmulni_intel drm_display_helper ccp nvme_core typec
>>> sp5100_tco cec wmi ip6_tables ip_tables fuse
>>> [ 220.281258] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
>>> amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
>>> amd64_edac():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
>>> amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
>>> pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
>>> amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
>>> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
>>> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
>>> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
>>> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
>>> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
>>> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
>>> amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
>>> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
>>> pcc_cpufreq():1

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-15 Thread Melissa Wen
On 08/14, Maíra Canal wrote:
> Hi Mikhail
> 
> Looks like this use-after-free problem was introduced on
> 90af0ca047f3049c4b46e902f432ad6ef1e2ded6. Checking this patch it seems
> like: if amdgpu_cs_vm_handling returns r != 0, then it will unlock
> bo_list_mutex inside the function amdgpu_cs_vm_handling and again on
> amdgpu_cs_parser_fini.
> 
> Maybe the following patch will help:
> 
> ---
> From 71d718c0f53a334bb59bcd5dabd29bbe92c724af Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Ma=C3=ADra=20Canal?= 
> Date: Sun, 14 Aug 2022 21:12:24 -0300
> Subject: [PATCH] drm/amdgpu: Fix use-after-free on amdgpu_bo_list mutex
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Fixes: 90af0ca047f3 ("drm/amdgpu: Protect the amdgpu_bo_list list with a
> mutex v2")
> Reported-by: Mikhail Gavrilov 
> Signed-off-by: Maíra Canal 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index d8f1335bc68f..a7fce7b14321 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -837,17 +837,14 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
>  			continue;
> 
>  		r = amdgpu_vm_bo_update(adev, bo_va, false);
> -		if (r) {
> -			mutex_unlock(&p->bo_list->bo_list_mutex);
> +		if (r)
>  			return r;
> -		}
> 
>  		r = amdgpu_sync_fence(&p->job->sync, bo_va->last_pt_update);
> -		if (r) {
> -			mutex_unlock(&p->bo_list->bo_list_mutex);
> +		if (r)
>  			return r;
> -		}
>  	}
> +	mutex_unlock(&p->bo_list->bo_list_mutex);

I think we don't need to unlock the bo_list_mutex here. If return != 0
amdgpu_cs_parser_fini() will unlock it; otherwise, amdgpu_cs_submit()
unlocks it in the end.
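The ownership rule stated here (the callee returns with the mutex still held, and exactly one of the error cleanup or the submit path releases it) can be illustrated with a small model. This is a sketch under stated assumptions: the lock is modelled as a plain counter, all function names are hypothetical stand-ins for `amdgpu_cs_vm_handling()`, `amdgpu_cs_parser_fini()` and `amdgpu_cs_submit()`, and the control flow is reduced to the two paths described in the comment above rather than the real ioctl.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the BO list mutex is a depth counter, and every
 * release is also counted so that a double release is easy to spot. */
struct cs_parser {
	int lock_depth; /* 1 while the mutex is held */
	int unlocks;    /* how many times it has been released */
};

static void bo_list_lock(struct cs_parser *p)   { p->lock_depth++; }
static void bo_list_unlock(struct cs_parser *p) { p->lock_depth--; p->unlocks++; }

/* Stand-in for amdgpu_cs_vm_handling(): never touches the mutex, so it
 * returns with the lock held on both the success and the error path. */
static int cs_vm_handling(struct cs_parser *p, bool fail)
{
	(void)p;
	return fail ? -1 : 0;
}

/* Stand-ins for the success path (submit) and the error path (fini):
 * each releases the lock exactly once. */
static void cs_submit(struct cs_parser *p)      { bo_list_unlock(p); }
static void cs_parser_fini(struct cs_parser *p) { bo_list_unlock(p); }

static struct cs_parser cs_ioctl(bool fail)
{
	struct cs_parser p = { 0, 0 };

	bo_list_lock(&p);
	if (cs_vm_handling(&p, fail))
		cs_parser_fini(&p); /* error: cleanup drops the lock */
	else
		cs_submit(&p);      /* success: submit drops the lock */
	return p;
}
```

Both paths end with the lock released exactly once; adding an unlock inside `cs_vm_handling()` on its error path would make the error-path count 2, which is the double unlock discussed in this thread.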

BR,

Melissa
> 
>  	r = amdgpu_vm_handle_moved(adev, vm);
>  	if (r)
> -- 
> 2.37.1
> ---
> Best Regards,
> - Maíra Canal
> 
> On 8/14/22 18:11, Mikhail Gavrilov wrote:
> > Hi folks.
> > Started testing 5.20 today (7ebfc85e2cd7).
> > I encountered frequent GPU freezes, after which a message appears
> > in the kernel logs:
> > [ 220.280990] [ cut here ]
> > [ 220.281000] refcount_t: underflow; use-after-free.
> > [ 220.281019] WARNING: CPU: 1 PID: 3746 at lib/refcount.c:28
> > refcount_warn_saturate+0xba/0x110
> > [ 220.281029] Modules linked in: uinput rfcomm snd_seq_dummy
> > snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
> > nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> > nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
> > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
> > qrtr bnep sunrpc snd_seq_midi snd_seq_midi_event vfat intel_rapl_msr
> > fat intel_rapl_common snd_hda_codec_realtek mt76x2u
> > snd_hda_codec_generic snd_hda_codec_hdmi mt76x2_common iwlmvm
> > mt76x02_usb edac_mce_amd mt76_usb snd_hda_intel snd_intel_dspcfg
> > mt76x02_lib snd_intel_sdw_acpi snd_usb_audio snd_hda_codec mt76
> > kvm_amd uvcvideo mac80211 snd_hda_core btusb eeepc_wmi snd_usbmidi_lib
> > videobuf2_vmalloc videobuf2_memops kvm btrtl snd_rawmidi asus_wmi
> > snd_hwdep videobuf2_v4l2 btbcm iwlwifi ledtrig_audio libarc4 btintel
> > snd_seq videobuf2_common sparse_keymap btmtk irqbypass videodev
> > snd_seq_device joydev xpad iwlmei platform_profile bluetooth
> > ff_memless snd_pcm mc rapl
> > [ 220.281185] video snd_timer cfg80211 wmi_bmof snd pcspkr soundcore
> > k10temp i2c_piix4 rfkill mei asus_ec_sensors acpi_cpufreq zram
> > hid_logitech_hidpp amdgpu igb dca drm_ttm_helper ttm crct10dif_pclmul
> > iommu_v2 crc32_pclmul gpu_sched crc32c_intel ucsi_ccg drm_buddy nvme
> > typec_ucsi ghash_clmulni_intel drm_display_helper ccp nvme_core typec
> > sp5100_tco cec wmi ip6_tables ip_tables fuse
> > [ 220.281258] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
> > amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
> > amd64_edac():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
> > amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
> > pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
> > amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
> > pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
> > pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> > pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> > pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> > pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> > pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
> > amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> > pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> > 

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-15 Thread Mikhail Gavrilov
On Mon, Aug 15, 2022 at 5:20 AM Maíra Canal  wrote:
>
> Hi Mikhail
>
> Looks like this use-after-free problem was introduced on
> 90af0ca047f3049c4b46e902f432ad6ef1e2ded6. Checking this patch it seems
> like: if amdgpu_cs_vm_handling returns r != 0, then it will unlock
> bo_list_mutex inside the function amdgpu_cs_vm_handling and again on
> amdgpu_cs_parser_fini.
>
> Maybe the following patch will help:

Thanks, I tested this patch.
But with this patch, the use-after-free problem happens in another place:

[  894.012920] [ cut here ]
[  894.012939] refcount_t: underflow; use-after-free.
[  894.012968] WARNING: CPU: 14 PID: 205 at lib/refcount.c:28
refcount_warn_saturate+0xba/0x110
[  894.012999] Modules linked in: tls uinput rfcomm snd_seq_dummy
snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
qrtr bnep sunrpc snd_seq_midi snd_seq_midi_event snd_hda_codec_realtek
mt76x2u mt76x2_common snd_hda_codec_generic snd_hda_codec_hdmi
intel_rapl_msr mt76x02_usb intel_rapl_common snd_hda_intel mt76_usb
snd_intel_dspcfg vfat iwlmvm snd_intel_sdw_acpi mt76x02_lib fat
snd_usb_audio snd_hda_codec mt76 edac_mce_amd snd_usbmidi_lib
snd_hda_core btusb snd_rawmidi snd_hwdep mac80211 mc iwlwifi btrtl
eeepc_wmi asus_wmi btbcm snd_seq kvm_amd libarc4 ledtrig_audio
snd_seq_device btintel iwlmei sparse_keymap btmtk kvm snd_pcm
irqbypass platform_profile snd_timer xpad joydev cfg80211 rapl
hid_logitech_hidpp bluetooth ff_memless wmi_bmof video pcspkr snd
k10temp i2c_piix4
[  894.013086]  soundcore rfkill mei asus_ec_sensors acpi_cpufreq zram
amdgpu drm_ttm_helper ttm iommu_v2 crct10dif_pclmul ucsi_ccg gpu_sched
crc32_pclmul crc32c_intel typec_ucsi drm_buddy typec
drm_display_helper ghash_clmulni_intel igb ccp cec nvme sp5100_tco
nvme_core dca wmi ip6_tables ip_tables fuse
[  894.013322] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
[  894.013455]  pcc_cpufreq():1 pcc_cpufreq():1 fjes():1
pcc_cpufreq():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[  894.013690] CPU: 14 PID: 205 Comm: kworker/14:1 Tainted: GW
   L---  ---
5.20.0-0.rc0.20220812git7ebfc85e2cd7.11.fc38.x86_64 #1
[  894.013725] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
[  894.013756] Workqueue: events drm_sched_entity_kill_jobs_work [gpu_sched]
[  894.013779] RIP: 0010:refcount_warn_saturate+0xba/0x110
[  894.013796] Code: 01 01 e8 79 4a 6f 00 0f 0b e9 42 47 a5 00 80 3d
de 7e be 01 00 75 85 48 c7 c7 f8 98 8e 9c c6 05 ce 7e be 01 01 e8 56
4a 6f 00 <0f> 0b e9 1f 47 a5 00 80 3d b9 7e be 01 00 0f 85 5e ff ff ff
48 c7
[  894.013842] RSP: 0018:b48681153e60 EFLAGS: 00010286
[  894.013858] RAX: 0026 RBX: 9bad16f1f028 RCX: 
[  894.013878] RDX: 0001 RSI: 9c8d06dc RDI: 
[  894.013897] RBP: 9bba663f5600 R08:  R09: b48681153d10
[  894.013916] R10: 0003 R11: 9bbaae2fffe8 R12: 9bba663fc800
[  894.013934] R13: 9bab93fcab40 R14: 9bba663fc805 R15: 9bad16f1f030
[  894.013954] FS:  () GS:9bba6620()
knlGS:
[  894.013975] CS:  0010 DS:  ES:  CR0: 80050033
[  894.013991] CR2: 1aa46b2ec008 CR3: 000101516000 CR4: 00350ee0
[  894.014011] Call Trace:
[  894.014022]  <TASK>
[  894.014030]  process_one_work+0x2a0/0x600
[  894.014051]  worker_thread+0x4f/0x3a0
[  894.014065]  ? process_one_work+0x600/0x600
[  894.014079]  kthread+0xf5/0x120
[  894.014092]  ? kthread_complete_and_exit+0x20/0x20
[  894.014109]  ret_from_fork+0x22/0x30
[  894.014129]  </TASK>
[  894.014137] irq event stamp: 5802
[  894.014148] hardirqs last  enabled at (5801): []
_raw_spin_unlock_irq+0x24/0x50
[  894.014178] hardirqs last disabled at (5802): []
__schedule+0xe2c/0x16d0
[  894.014206] 

Re: [BUG][5.20] refcount_t: underflow; use-after-free

2022-08-14 Thread Maíra Canal
Hi Mikhail

Looks like this use-after-free problem was introduced on
90af0ca047f3049c4b46e902f432ad6ef1e2ded6. Checking this patch it seems
like: if amdgpu_cs_vm_handling returns r != 0, then it will unlock
bo_list_mutex inside the function amdgpu_cs_vm_handling and again on
amdgpu_cs_parser_fini.

Maybe the following patch will help:

---
From 71d718c0f53a334bb59bcd5dabd29bbe92c724af Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ma=C3=ADra=20Canal?= 
Date: Sun, 14 Aug 2022 21:12:24 -0300
Subject: [PATCH] drm/amdgpu: Fix use-after-free on amdgpu_bo_list mutex
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fixes: 90af0ca047f3 ("drm/amdgpu: Protect the amdgpu_bo_list list with a
mutex v2")
Reported-by: Mikhail Gavrilov 
Signed-off-by: Maíra Canal 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d8f1335bc68f..a7fce7b14321 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -837,17 +837,14 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
 			continue;
 
 		r = amdgpu_vm_bo_update(adev, bo_va, false);
-		if (r) {
-			mutex_unlock(&p->bo_list->bo_list_mutex);
+		if (r)
 			return r;
-		}
 
 		r = amdgpu_sync_fence(&p->job->sync, bo_va->last_pt_update);
-		if (r) {
-			mutex_unlock(&p->bo_list->bo_list_mutex);
+		if (r)
 			return r;
-		}
 	}
+	mutex_unlock(&p->bo_list->bo_list_mutex);
 
 	r = amdgpu_vm_handle_moved(adev, vm);
 	if (r)
-- 
2.37.1
---
Best Regards,
- Maíra Canal

On 8/14/22 18:11, Mikhail Gavrilov wrote:
> Hi folks.
> Started testing 5.20 today (7ebfc85e2cd7).
> I encountered frequent GPU freezes, after which a message appears
> in the kernel logs:
> [ 220.280990] [ cut here ]
> [ 220.281000] refcount_t: underflow; use-after-free.
> [ 220.281019] WARNING: CPU: 1 PID: 3746 at lib/refcount.c:28
> refcount_warn_saturate+0xba/0x110
> [ 220.281029] Modules linked in: uinput rfcomm snd_seq_dummy
> snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
> qrtr bnep sunrpc snd_seq_midi snd_seq_midi_event vfat intel_rapl_msr
> fat intel_rapl_common snd_hda_codec_realtek mt76x2u
> snd_hda_codec_generic snd_hda_codec_hdmi mt76x2_common iwlmvm
> mt76x02_usb edac_mce_amd mt76_usb snd_hda_intel snd_intel_dspcfg
> mt76x02_lib snd_intel_sdw_acpi snd_usb_audio snd_hda_codec mt76
> kvm_amd uvcvideo mac80211 snd_hda_core btusb eeepc_wmi snd_usbmidi_lib
> videobuf2_vmalloc videobuf2_memops kvm btrtl snd_rawmidi asus_wmi
> snd_hwdep videobuf2_v4l2 btbcm iwlwifi ledtrig_audio libarc4 btintel
> snd_seq videobuf2_common sparse_keymap btmtk irqbypass videodev
> snd_seq_device joydev xpad iwlmei platform_profile bluetooth
> ff_memless snd_pcm mc rapl
> [ 220.281185] video snd_timer cfg80211 wmi_bmof snd pcspkr soundcore
> k10temp i2c_piix4 rfkill mei asus_ec_sensors acpi_cpufreq zram
> hid_logitech_hidpp amdgpu igb dca drm_ttm_helper ttm crct10dif_pclmul
> iommu_v2 crc32_pclmul gpu_sched crc32c_intel ucsi_ccg drm_buddy nvme
> typec_ucsi ghash_clmulni_intel drm_display_helper ccp nvme_core typec
> sp5100_tco cec wmi ip6_tables ip_tables fuse
> [ 220.281258] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
> amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
> amd64_edac():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
> amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
> amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
> amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
> pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
> pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
> amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1
> [ 220.281388] pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
> fjes():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
> [ 220.281415] CPU: 1 PID: 3746 Comm: chrome:cs0 Tainted: G W L ---
> --- 5.20.0-0.rc0.20220812git7ebfc85e2cd7.10.fc38.x86_64 #1
> [ 

[BUG][5.20] refcount_t: underflow; use-after-free

2022-08-14 Thread Mikhail Gavrilov
Hi folks.
Started testing 5.20 today (7ebfc85e2cd7).
I encountered frequent GPU freezes, after which a message appears
in the kernel logs:
[ 220.280990] [ cut here ]
[ 220.281000] refcount_t: underflow; use-after-free.
[ 220.281019] WARNING: CPU: 1 PID: 3746 at lib/refcount.c:28
refcount_warn_saturate+0xba/0x110
[ 220.281029] Modules linked in: uinput rfcomm snd_seq_dummy
snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink
qrtr bnep sunrpc snd_seq_midi snd_seq_midi_event vfat intel_rapl_msr
fat intel_rapl_common snd_hda_codec_realtek mt76x2u
snd_hda_codec_generic snd_hda_codec_hdmi mt76x2_common iwlmvm
mt76x02_usb edac_mce_amd mt76_usb snd_hda_intel snd_intel_dspcfg
mt76x02_lib snd_intel_sdw_acpi snd_usb_audio snd_hda_codec mt76
kvm_amd uvcvideo mac80211 snd_hda_core btusb eeepc_wmi snd_usbmidi_lib
videobuf2_vmalloc videobuf2_memops kvm btrtl snd_rawmidi asus_wmi
snd_hwdep videobuf2_v4l2 btbcm iwlwifi ledtrig_audio libarc4 btintel
snd_seq videobuf2_common sparse_keymap btmtk irqbypass videodev
snd_seq_device joydev xpad iwlmei platform_profile bluetooth
ff_memless snd_pcm mc rapl
[ 220.281185] video snd_timer cfg80211 wmi_bmof snd pcspkr soundcore
k10temp i2c_piix4 rfkill mei asus_ec_sensors acpi_cpufreq zram
hid_logitech_hidpp amdgpu igb dca drm_ttm_helper ttm crct10dif_pclmul
iommu_v2 crc32_pclmul gpu_sched crc32c_intel ucsi_ccg drm_buddy nvme
typec_ucsi ghash_clmulni_intel drm_display_helper ccp nvme_core typec
sp5100_tco cec wmi ip6_tables ip_tables fuse
[ 220.281258] Unloaded tainted modules: amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1
amd64_edac():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 amd64_edac():1 amd64_edac():1
pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1 amd64_edac():1
pcc_cpufreq():1 pcc_cpufreq():1 amd64_edac():1 pcc_cpufreq():1
amd64_edac():1 pcc_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1
[ 220.281388] pcc_cpufreq():1 fjes():1 pcc_cpufreq():1 fjes():1
fjes():1 fjes():1 fjes():1 fjes():1 fjes():1 fjes():1
[ 220.281415] CPU: 1 PID: 3746 Comm: chrome:cs0 Tainted: G W L ---
--- 5.20.0-0.rc0.20220812git7ebfc85e2cd7.10.fc38.x86_64 #1
[ 220.281421] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
[ 220.281426] RIP: 0010:refcount_warn_saturate+0xba/0x110
[ 220.281431] Code: 01 01 e8 79 4a 6f 00 0f 0b e9 42 47 a5 00 80 3d de
7e be 01 00 75 85 48 c7 c7 f8 98 8e 98 c6 05 ce 7e be 01 01 e8 56 4a
6f 00 <0f> 0b e9 1f 47 a5 00 80 3d b9 7e be 01 00 0f 85 5e ff ff ff 48
c7
[ 220.281437] RSP: 0018:b4b0d18d7a80 EFLAGS: 00010282
[ 220.281443] RAX: 0026 RBX: 0003 RCX: 
[ 220.281448] RDX: 0001 RSI: 988d06dc RDI: 
[ 220.281452] RBP:  R08:  R09: b4b0d18d7930
[ 220.281457] R10: 0003 R11: a0672e2fffe8 R12: a058ca360400
[ 220.281461] R13: a05846c50a18 R14: fe00 R15: 0003
[ 220.281465] FS: 7f82683e06c0() GS:a066e2e0()
knlGS:
[ 220.281470] CS: 0010 DS:  ES:  CR0: 80050033
[ 220.281475] CR2: 3590005cc000 CR3: 0001fca46000 CR4: 00350ee0
[ 220.281480] Call Trace:
[ 220.281485] <TASK>
[ 220.281490] amdgpu_cs_ioctl+0x4e2/0x2070 [amdgpu]
[ 220.281806] ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
[ 220.282028] drm_ioctl_kernel+0xa4/0x150
[ 220.282043] drm_ioctl+0x21f/0x420
[ 220.282053] ? amdgpu_cs_find_mapping+0xe0/0xe0 [amdgpu]
[ 220.282275] ? lock_release+0x14f/0x460
[ 220.282282] ? _raw_spin_unlock_irqrestore+0x30/0x60
[ 220.282290] ? _raw_spin_unlock_irqrestore+0x30/0x60
[ 220.282297] ? lockdep_hardirqs_on+0x7d/0x100
[ 220.282305] ? _raw_spin_unlock_irqrestore+0x40/0x60
[ 220.282317] amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
[ 220.282534] __x64_sys_ioctl+0x90/0xd0
[ 220.282545] do_syscall_64+0x5b/0x80
[ 220.282551] ? futex_wake+0x6c/0x150
[ 220.282568] ? lock_is_held_type+0xe8/0x140
[ 220.282580] ? do_syscall_64+0x67/0x80
[ 220.282585] ? lockdep_hardirqs_on+0x7d/0x100
[ 220.282592] ? do_syscall_64+0x67/0x80
[ 220.282597] ? do_syscall_64+0x67/0x80
[ 220.282602] ?