On 03.07.25 15:54, Thomas Zimmermann wrote: > Hi > > Am 03.07.25 um 15:45 schrieb Christian König: >> On 03.07.25 15:37, Thomas Zimmermann wrote: >>> Hi >>> >>> Am 03.07.25 um 13:59 schrieb Bert Karwatzki: >>>> When booting next-20250703 on my Msi Alpha 15 Laptop running debian sid >>>> (last >>>> updated 20250703) I get a several warnings of the following kind: >>>> >>>> [ 8.702999] [ T1628] ------------[ cut here ]------------ >>>> [ 8.703001] [ T1628] WARNING: drivers/gpu/drm/drm_gem.c:287 at >>>> drm_gem_object_handle_put_unlocked+0xaa/0xe0, CPU#14: Xorg/1628 >>> Well, that didn't take long to blow up. Thanks for reporting the bug. >>> >>> I have an idea how to fix this, but it would likely just trigger the next >>> issue. >>> >>> Christian, can we revert this patch, and also the other patches that switch >>> from import_attach->dmabuf to ->dma_buf that cased the problem? >> Sure we can, but I would rather vote for fixing this at least for now. Those >> patches are not just cleanup, but are fixing rare occurring real world >> problems. >> >> If we can't get it working in the next week or so we can still revert back >> to a working state. >> >> What exactly is the issue? That cursors don't necessarily have GEM handles? >> If yes how we grab/drop handle refs when we have a DMA-buf? > > A dozen drivers apparently use drm_gem_fb_destroy() but not > drm_gem_fb_init_with_funcs(). So they don't take the ref on the handle. > That's what we're seeing here. Fixing this would mean to go through all > affected drivers and take the handle refs an needed. The shortcut would be to > take the handle refs in drm_framebuffer_init() and put them in > drm_framebuffer_cleanup(). Those are the minimal calls for all > implementations. But there's the fbdev code of some drivers that does magic > hackery on framebuffer and object allocation. so whatever we do, it's likely > not a quick fixup. Best regards Thomas
Ok that sounds worse than I thought it would be. Feel free to add my Acked-by to a revert for now. Thanks, Christian. >> >> Regards, >> Christian. >> >>> Best regards >>> Thomas >>> >>>> [ 8.703007] [ T1628] Modules linked in: snd_seq_dummy >>>> snd_hrtimer snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq >>>> snd_seq_device rfcomm bnep nls_ascii nls_cp437 vfat fat snd_ctl_led >>>> snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component >>>> snd_hda_codec_hdmi snd_hda_intel btusb snd_intel_dspcfg btrtl btintel >>>> snd_hda_codec uvcvideo snd_soc_dmic snd_acp3x_pdm_dma btbcm snd_acp3x_rn >>>> btmtk snd_hwdep videobuf2_vmalloc snd_soc_core snd_hda_core >>>> videobuf2_memops snd_pcm_oss uvc videobuf2_v4l2 bluetooth snd_mixer_oss >>>> videodev snd_pcm snd_rn_pci_acp3x videobuf2_common snd_acp_config >>>> snd_timer msi_wmi ecdh_generic snd_soc_acpi ecc mc sparse_keymap snd >>>> wmi_bmof edac_mce_amd k10temp soundcore snd_pci_acp3x ccp ac battery >>>> button joydev hid_sensor_accel_3d hid_sensor_prox hid_sensor_als >>>> hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_trigger >>>> industrialio_triggered_buffer kfifo_buf industrialio hid_sensor_iio_common >>>> amd_pmc evdev mt7921e mt7921_common mt792x_lib >>>> mt76_connac_lib mt76 mac80211 libarc4 cfg80211 rfkill msr fuse >>>> [ 8.703056] [ T1628] nvme_fabrics efi_pstore configfs efivarfs >>>> autofs4 ext4 mbcache jbd2 usbhid amdgpu drm_client_lib i2c_algo_bit >>>> drm_ttm_helper ttm drm_panel_backlight_quirks drm_exec drm_suballoc_helper >>>> amdxcp drm_buddy xhci_pci gpu_sched xhci_hcd drm_display_helper >>>> hid_sensor_hub hid_multitouch mfd_core hid_generic drm_kms_helper psmouse >>>> i2c_hid_acpi nvme usbcore amd_sfh i2c_hid hid cec serio_raw nvme_core >>>> r8169 crc16 i2c_piix4 usb_common i2c_smbus i2c_designware_platform >>>> i2c_designware_core >>>> [ 8.703082] [ T1628] CPU: 14 UID: 1000 PID: 1628 Comm: Xorg Not >>>> tainted 6.16.0-rc4-next-20250703-master #127 PREEMPT_{RT,(full)} >>>> [ 8.703085] [ T1628] Hardware name: Micro-Star International >>>> Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.10F 11/11/2024 >>>> [ 8.703086] [ T1628] RIP: >>>> 0010:drm_gem_object_handle_put_unlocked+0xaa/0xe0 >>>> [ 8.703088] [ T1628] Code: c7 f6 8a ff 48 89 ef e8 94 d4 2e 00 >>>> eb d8 48 8b 43 08 48 8d b8 d8 06 00 00 e8 52 78 2b 00 c7 83 08 01 00 00 00 >>>> 00 00 00 eb 98 <0f> 0b 5b 5d e9 98 f6 8a ff 48 8b 83 68 01 00 00 48 8b 00 >>>> 48 85 c0 >>>> [ 8.703089] [ T1628] RSP: 0018:ffffb8e8c7fbfb00 EFLAGS: 00010246 >>>> [ 8.703091] [ T1628] RAX: 0000000000000000 RBX: >>>> 0000000000000001 RCX: 0000000000000000 >>>> [ 8.703092] [ T1628] RDX: 0000000000000000 RSI: >>>> ffff94cdc062b478 RDI: ffff94ce71390448 >>>> [ 8.703093] [ T1628] RBP: ffff94ce14780010 R08: >>>> ffff94cdc062b618 R09: ffff94ce14780278 >>>> [ 8.703094] [ T1628] R10: 0000000000000001 R11: >>>> ffff94cdc062b478 R12: ffff94ce14780010 >>>> [ 8.703095] [ T1628] R13: 0000000000000007 R14: >>>> 0000000000000004 R15: ffff94ce14780010 >>>> [ 8.703096] [ T1628] FS: 00007fc164276b00(0000) >>>> GS:ffff94dcb49cf000(0000) knlGS:0000000000000000 >>>> [ 8.703097] [ T1628] CS: 0010 DS: 0000 ES: 0000 CR0: >>>> 0000000080050033 >>>> [ 8.703098] [ T1628] CR2: 00005647ccd53008 CR3: >>>> 000000012533f000 CR4: 0000000000750ef0 >>>> [ 8.703099] [ T1628] PKRU: 55555554 >>>> [ 8.703100] [ T1628] Call Trace: >>>> [ 8.703101] [ T1628] <TASK> >>>> [ 8.703104] [ T1628] drm_gem_fb_destroy+0x27/0x50 >>>> [drm_kms_helper] >>>> [ 8.703113] [ T1628] >>>> __drm_atomic_helper_plane_destroy_state+0x1a/0xa0 [drm_kms_helper] >>>> [ 8.703119] [ T1628] >>>> drm_atomic_helper_plane_destroy_state+0x10/0x20 [drm_kms_helper] >>>> [ 8.703124] [ T1628] drm_atomic_state_default_clear+0x1c0/0x2e0 >>>> [ 8.703127] [ T1628] __drm_atomic_state_free+0x6c/0xb0 >>>> [ 8.703129] [ T1628] drm_atomic_helper_disable_plane+0x92/0xe0 >>>> [drm_kms_helper] >>>> [ 8.703135] [ T1628] drm_mode_cursor_universal+0xf2/0x2a0 >>>> [ 8.703140] [ T1628] drm_mode_cursor_common.part.0+0x9c/0x1e0 >>>> [ 8.703144] [ T1628] ? drm_mode_setplane+0x320/0x320 >>>> [ 8.703146] [ T1628] drm_mode_cursor_ioctl+0x8a/0xa0 >>>> [ 8.703148] [ T1628] drm_ioctl_kernel+0xa1/0xf0 >>>> [ 8.703151] [ T1628] drm_ioctl+0x26a/0x510 >>>> [ 8.703153] [ T1628] ? drm_mode_setplane+0x320/0x320 >>>> [ 8.703155] [ T1628] ? srso_alias_return_thunk+0x5/0xfbef5 >>>> [ 8.703157] [ T1628] ? rt_spin_unlock+0x12/0x40 >>>> [ 8.703159] [ T1628] ? do_setitimer+0x185/0x1d0 >>>> [ 8.703161] [ T1628] ? srso_alias_return_thunk+0x5/0xfbef5 >>>> [ 8.703164] [ T1628] amdgpu_drm_ioctl+0x46/0x90 [amdgpu] >>>> [ 8.703283] [ T1628] __x64_sys_ioctl+0x91/0xe0 >>>> [ 8.703286] [ T1628] do_syscall_64+0x65/0xfc0 >>>> [ 8.703289] [ T1628] entry_SYSCALL_64_after_hwframe+0x55/0x5d >>>> [ 8.703291] [ T1628] RIP: 0033:0x7fc1645f78db >>>> [ 8.703292] [ T1628] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 >>>> 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 >>>> 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 >>>> 28 00 00 >>>> [ 8.703294] [ T1628] RSP: 002b:00007ffd75bce430 EFLAGS: >>>> 00000246 ORIG_RAX: 0000000000000010 >>>> [ 8.703295] [ T1628] RAX: ffffffffffffffda RBX: >>>> 000056224e896ea0 RCX: 00007fc1645f78db >>>> [ 8.703296] [ T1628] RDX: 00007ffd75bce4c0 RSI: >>>> 00000000c01c64a3 RDI: 000000000000000f >>>> [ 8.703297] [ T1628] RBP: 00007ffd75bce4c0 R08: >>>> 0000000000000100 R09: 0000562210547ab0 >>>> [ 8.703298] [ T1628] R10: 000000000000004c R11: >>>> 0000000000000246 R12: 00000000c01c64a3 >>>> [ 8.703298] [ T1628] R13: 000000000000000f R14: >>>> 0000000000000000 R15: 000056224e5c1cd0 >>>> [ 8.703302] [ T1628] </TASK> >>>> [ 8.703303] [ T1628] ---[ end trace 0000000000000000 ]--- >>>> >>>> As the warnings do not occur in next-20250702, I looked at the commits >>>> given by >>>> $ git log --oneline next-20250702..next-20250703 drivers/gpu/drm >>>> to search for a culprit. So I reverted the most likely candidate, >>>> commit 582111e630f5 ("drm/gem: Acquire references on GEM handles for >>>> framebuffers"), >>>> in next-20250703 and the warnings disappeared. >>>> This is the hardware I used: >>>> $ lspci >>>> 00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne >>>> Root Complex >>>> 00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU >>>> 00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy >>>> Host Bridge >>>> 00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP >>>> Bridge >>>> 00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy >>>> Host Bridge >>>> 00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe >>>> GPP Bridge >>>> 00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe >>>> GPP Bridge >>>> 00:02.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe >>>> GPP Bridge >>>> 00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe >>>> GPP Bridge >>>> 00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy >>>> Host Bridge >>>> 00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal >>>> PCIe GPP Bridge to Bus >>>> 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller >>>> (rev 51) >>>> 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev >>>> 51) >>>> 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data >>>> Fabric; Function 0 >>>> 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data >>>> Fabric; Function 1 >>>> 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data >>>> Fabric; Function 2 >>>> 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data >>>> Fabric; Function 3 >>>> 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data >>>> Fabric; Function 4 >>>> 00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data >>>> Fabric; Function 5 >>>> 00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data >>>> Fabric; Function 6 >>>> 00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data >>>> Fabric; Function 7 >>>> 01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL >>>> Upstream Port of PCI Express Switch (rev c3) >>>> 02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL >>>> Downstream Port of PCI Express Switch >>>> 03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 >>>> [Radeon RX 6600/6600 XT/6600M] (rev c3) >>>> 03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 >>>> HDMI/DP Audio Controller >>>> 04:00.0 Network controller: MEDIATEK Corp. MT7921K (RZ608) Wi-Fi 6E 80MHz >>>> 05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. >>>> RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller (rev 15) >>>> 06:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. >>>> KC3000/FURY Renegade NVMe SSD [E18] (rev 01) >>>> 07:00.0 Non-Volatile memory controller: Micron/Crucial Technology P1 NVMe >>>> PCIe SSD[Frampton] (rev 03) >>>> 08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] >>>> Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5) >>>> 08:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon >>>> High Definition Audio Controller >>>> 08:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family >>>> 17h (Models 10h-1fh) Platform Security Processor >>>> 08:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne >>>> USB 3.1 >>>> 08:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne >>>> USB 3.1 >>>> 08:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] Audio >>>> Coprocessor (rev 01) >>>> 08:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family >>>> 17h/19h/1ah HD Audio Controller >>>> 08:00.7 Signal processing controller: Advanced Micro Devices, Inc. [AMD] >>>> Sensor Fusion Hub >>>> >>>> >>>> Bert Karwatzki >