Re: [Intel-gfx] [PATCH v3 7/9] drm: update global mutex lock in the ioctl handler
On Thu, Aug 19, 2021 at 12:53 PM Desmond Cheong Zhi Xi wrote:
>
> On 18/8/21 7:02 pm, Daniel Vetter wrote:
> > On Wed, Aug 18, 2021 at 03:38:22PM +0800, Desmond Cheong Zhi Xi wrote:
> >> In a future patch, a read lock on drm_device.master_rwsem is
> >> held in the ioctl handler before the check for ioctl
> >> permissions. However, this produces the following lockdep splat:
> >>
> >> ======================================================
> >> WARNING: possible circular locking dependency detected
> >> 5.14.0-rc6-CI-Patchwork_20831+ #1 Tainted: G U
> >> ------------------------------------------------------
> >> kms_lease/1752 is trying to acquire lock:
> >> 827bad88 (drm_global_mutex){+.+.}-{3:3}, at: drm_open+0x64/0x280
> >>
> >> but task is already holding lock:
> >> 88812e350108 (&dev->master_rwsem){}-{3:3}, at:
> >> drm_ioctl_kernel+0xfb/0x1a0
> >>
> >> which lock already depends on the new lock.
> >>
> >> the existing dependency chain (in reverse order) is:
> >>
> >> -> #2 (&dev->master_rwsem){}-{3:3}:
> >>        lock_acquire+0xd3/0x310
> >>        down_read+0x3b/0x140
> >>        drm_master_internal_acquire+0x1d/0x60
> >>        drm_client_modeset_commit+0x10/0x40
> >>        __drm_fb_helper_restore_fbdev_mode_unlocked+0x88/0xb0
> >>        drm_fb_helper_set_par+0x34/0x40
> >>        intel_fbdev_set_par+0x11/0x40 [i915]
> >>        fbcon_init+0x270/0x4f0
> >>        visual_init+0xc6/0x130
> >>        do_bind_con_driver+0x1de/0x2c0
> >>        do_take_over_console+0x10e/0x180
> >>        do_fbcon_takeover+0x53/0xb0
> >>        register_framebuffer+0x22d/0x310
> >>        __drm_fb_helper_initial_config_and_unlock+0x36c/0x540
> >>        intel_fbdev_initial_config+0xf/0x20 [i915]
> >>        async_run_entry_fn+0x28/0x130
> >>        process_one_work+0x26d/0x5c0
> >>        worker_thread+0x37/0x390
> >>        kthread+0x13b/0x170
> >>        ret_from_fork+0x1f/0x30
> >>
> >> -> #1 (&helper->lock){+.+.}-{3:3}:
> >>        lock_acquire+0xd3/0x310
> >>        __mutex_lock+0xa8/0x930
> >>        __drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xb0
> >>        intel_fbdev_restore_mode+0x2b/0x50 [i915]
> >>        drm_lastclose+0x27/0x50
> >>        drm_release_noglobal+0x42/0x60
> >>        __fput+0x9e/0x250
> >>        task_work_run+0x6b/0xb0
> >>        exit_to_user_mode_prepare+0x1c5/0x1d0
> >>        syscall_exit_to_user_mode+0x19/0x50
> >>        do_syscall_64+0x46/0xb0
> >>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>
> >> -> #0 (drm_global_mutex){+.+.}-{3:3}:
> >>        validate_chain+0xb39/0x1e70
> >>        __lock_acquire+0x5a1/0xb70
> >>        lock_acquire+0xd3/0x310
> >>        __mutex_lock+0xa8/0x930
> >>        drm_open+0x64/0x280
> >>        drm_stub_open+0x9f/0x100
> >>        chrdev_open+0x9f/0x1d0
> >>        do_dentry_open+0x14a/0x3a0
> >>        dentry_open+0x53/0x70
> >>        drm_mode_create_lease_ioctl+0x3cb/0x970
> >>        drm_ioctl_kernel+0xc9/0x1a0
> >>        drm_ioctl+0x201/0x3d0
> >>        __x64_sys_ioctl+0x6a/0xa0
> >>        do_syscall_64+0x37/0xb0
> >>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>
> >> other info that might help us debug this:
> >> Chain exists of:
> >>   drm_global_mutex --> &helper->lock --> &dev->master_rwsem
> >> Possible unsafe locking scenario:
> >>        CPU0                    CPU1
> >>        ----                    ----
> >>   lock(&dev->master_rwsem);
> >>                                lock(&helper->lock);
> >>                                lock(&dev->master_rwsem);
> >>   lock(drm_global_mutex);
> >>
> >> *** DEADLOCK ***
> >>
> >> The lock hierarchy inversion happens because we grab the
> >> drm_global_mutex while already holding on to master_rwsem. To avoid
> >> this, we do some prep work to grab the drm_global_mutex before
> >> checking for ioctl permissions.
> >>
> >> At the same time, we update the check for the global mutex to use the
> >> drm_dev_needs_global_mutex helper function.
> >
> > This is intentional, essentially we force all non-legacy drivers to have
> > unlocked ioctl (otherwise everyone forgets to set that flag).
> >
> > For non-legacy drivers the global lock only ensures ordering between
> > drm_open and lastclose (I think at least), and between
> > drm_dev_register/unregister and the backwards ->load/unload callbacks
> > (which are called in the wrong place, but we cannot fix that for legacy
> > drivers).
> >
> > ->load/unload should be completely unused (maybe radeon still uses it),
> > and ->lastclose is also on the decline.
> >
>
> Ah ok got it, I'll change the check back to
> drm_core_check_feature(dev, DRIVER_LEGACY) then.
>
> > Maybe we should update the comment of drm_global_mutex to explain what it
> > protects and why.
> >
>
> The comments in drm_dev_needs_global_mutex make sense I think, I just
> didn't read the code closely enough.
>
> > I'm also confused how this patch connects to the splat, since for i915 we
>
> Right, my bad, this is a separate instance of
Re: [Intel-gfx] [PATCH v3 7/9] drm: update global mutex lock in the ioctl handler
On 18/8/21 7:02 pm, Daniel Vetter wrote:
> On Wed, Aug 18, 2021 at 03:38:22PM +0800, Desmond Cheong Zhi Xi wrote:
>> In a future patch, a read lock on drm_device.master_rwsem is
>> held in the ioctl handler before the check for ioctl
>> permissions. However, this produces the following lockdep splat:
>>
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 5.14.0-rc6-CI-Patchwork_20831+ #1 Tainted: G U
>> ------------------------------------------------------
>> kms_lease/1752 is trying to acquire lock:
>> 827bad88 (drm_global_mutex){+.+.}-{3:3}, at: drm_open+0x64/0x280
>>
>> but task is already holding lock:
>> 88812e350108 (&dev->master_rwsem){}-{3:3}, at:
>> drm_ioctl_kernel+0xfb/0x1a0
>>
>> which lock already depends on the new lock.
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #2 (&dev->master_rwsem){}-{3:3}:
>>        lock_acquire+0xd3/0x310
>>        down_read+0x3b/0x140
>>        drm_master_internal_acquire+0x1d/0x60
>>        drm_client_modeset_commit+0x10/0x40
>>        __drm_fb_helper_restore_fbdev_mode_unlocked+0x88/0xb0
>>        drm_fb_helper_set_par+0x34/0x40
>>        intel_fbdev_set_par+0x11/0x40 [i915]
>>        fbcon_init+0x270/0x4f0
>>        visual_init+0xc6/0x130
>>        do_bind_con_driver+0x1de/0x2c0
>>        do_take_over_console+0x10e/0x180
>>        do_fbcon_takeover+0x53/0xb0
>>        register_framebuffer+0x22d/0x310
>>        __drm_fb_helper_initial_config_and_unlock+0x36c/0x540
>>        intel_fbdev_initial_config+0xf/0x20 [i915]
>>        async_run_entry_fn+0x28/0x130
>>        process_one_work+0x26d/0x5c0
>>        worker_thread+0x37/0x390
>>        kthread+0x13b/0x170
>>        ret_from_fork+0x1f/0x30
>>
>> -> #1 (&helper->lock){+.+.}-{3:3}:
>>        lock_acquire+0xd3/0x310
>>        __mutex_lock+0xa8/0x930
>>        __drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xb0
>>        intel_fbdev_restore_mode+0x2b/0x50 [i915]
>>        drm_lastclose+0x27/0x50
>>        drm_release_noglobal+0x42/0x60
>>        __fput+0x9e/0x250
>>        task_work_run+0x6b/0xb0
>>        exit_to_user_mode_prepare+0x1c5/0x1d0
>>        syscall_exit_to_user_mode+0x19/0x50
>>        do_syscall_64+0x46/0xb0
>>        entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> -> #0 (drm_global_mutex){+.+.}-{3:3}:
>>        validate_chain+0xb39/0x1e70
>>        __lock_acquire+0x5a1/0xb70
>>        lock_acquire+0xd3/0x310
>>        __mutex_lock+0xa8/0x930
>>        drm_open+0x64/0x280
>>        drm_stub_open+0x9f/0x100
>>        chrdev_open+0x9f/0x1d0
>>        do_dentry_open+0x14a/0x3a0
>>        dentry_open+0x53/0x70
>>        drm_mode_create_lease_ioctl+0x3cb/0x970
>>        drm_ioctl_kernel+0xc9/0x1a0
>>        drm_ioctl+0x201/0x3d0
>>        __x64_sys_ioctl+0x6a/0xa0
>>        do_syscall_64+0x37/0xb0
>>        entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> other info that might help us debug this:
>> Chain exists of:
>>   drm_global_mutex --> &helper->lock --> &dev->master_rwsem
>> Possible unsafe locking scenario:
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(&dev->master_rwsem);
>>                                lock(&helper->lock);
>>                                lock(&dev->master_rwsem);
>>   lock(drm_global_mutex);
>>
>> *** DEADLOCK ***
>>
>> The lock hierarchy inversion happens because we grab the
>> drm_global_mutex while already holding on to master_rwsem. To avoid
>> this, we do some prep work to grab the drm_global_mutex before
>> checking for ioctl permissions.
>>
>> At the same time, we update the check for the global mutex to use the
>> drm_dev_needs_global_mutex helper function.
>
> This is intentional, essentially we force all non-legacy drivers to have
> unlocked ioctl (otherwise everyone forgets to set that flag).
>
> For non-legacy drivers the global lock only ensures ordering between
> drm_open and lastclose (I think at least), and between
> drm_dev_register/unregister and the backwards ->load/unload callbacks
> (which are called in the wrong place, but we cannot fix that for legacy
> drivers).
>
> ->load/unload should be completely unused (maybe radeon still uses it),
> and ->lastclose is also on the decline.
>

Ah ok got it, I'll change the check back to
drm_core_check_feature(dev, DRIVER_LEGACY) then.

> Maybe we should update the comment of drm_global_mutex to explain what it
> protects and why.
>

The comments in drm_dev_needs_global_mutex make sense I think, I just
didn't read the code closely enough.

> I'm also confused how this patch connects to the splat, since for i915 we

Right, my bad, this is a separate instance of circular locking. I was too
hasty when I saw that for legacy drivers we might grab master_rwsem then
drm_global_mutex in the ioctl handler.

> shouldn't be taking the drm_global_lock here at all.
>
> The problem seems to be the drm_open_helper when we create a new lease,
> which is an entirely different can of worms.
>
> I'm honestly not sure how to best do that, but we should be able to create
> a file and then call drm_open_helper directly, or well a version of that
> which never takes the drm_global_mutex. Because that is not needed for
> nested drm_file opening:
>
> - legacy drivers never go down
[Intel-gfx] [PATCH v3 7/9] drm: update global mutex lock in the ioctl handler
In a future patch, a read lock on drm_device.master_rwsem is
held in the ioctl handler before the check for ioctl
permissions. However, this produces the following lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc6-CI-Patchwork_20831+ #1 Tainted: G U
------------------------------------------------------
kms_lease/1752 is trying to acquire lock:
827bad88 (drm_global_mutex){+.+.}-{3:3}, at: drm_open+0x64/0x280

but task is already holding lock:
88812e350108 (&dev->master_rwsem){}-{3:3}, at:
drm_ioctl_kernel+0xfb/0x1a0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&dev->master_rwsem){}-{3:3}:
       lock_acquire+0xd3/0x310
       down_read+0x3b/0x140
       drm_master_internal_acquire+0x1d/0x60
       drm_client_modeset_commit+0x10/0x40
       __drm_fb_helper_restore_fbdev_mode_unlocked+0x88/0xb0
       drm_fb_helper_set_par+0x34/0x40
       intel_fbdev_set_par+0x11/0x40 [i915]
       fbcon_init+0x270/0x4f0
       visual_init+0xc6/0x130
       do_bind_con_driver+0x1de/0x2c0
       do_take_over_console+0x10e/0x180
       do_fbcon_takeover+0x53/0xb0
       register_framebuffer+0x22d/0x310
       __drm_fb_helper_initial_config_and_unlock+0x36c/0x540
       intel_fbdev_initial_config+0xf/0x20 [i915]
       async_run_entry_fn+0x28/0x130
       process_one_work+0x26d/0x5c0
       worker_thread+0x37/0x390
       kthread+0x13b/0x170
       ret_from_fork+0x1f/0x30

-> #1 (&helper->lock){+.+.}-{3:3}:
       lock_acquire+0xd3/0x310
       __mutex_lock+0xa8/0x930
       __drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xb0
       intel_fbdev_restore_mode+0x2b/0x50 [i915]
       drm_lastclose+0x27/0x50
       drm_release_noglobal+0x42/0x60
       __fput+0x9e/0x250
       task_work_run+0x6b/0xb0
       exit_to_user_mode_prepare+0x1c5/0x1d0
       syscall_exit_to_user_mode+0x19/0x50
       do_syscall_64+0x46/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #0 (drm_global_mutex){+.+.}-{3:3}:
       validate_chain+0xb39/0x1e70
       __lock_acquire+0x5a1/0xb70
       lock_acquire+0xd3/0x310
       __mutex_lock+0xa8/0x930
       drm_open+0x64/0x280
       drm_stub_open+0x9f/0x100
       chrdev_open+0x9f/0x1d0
       do_dentry_open+0x14a/0x3a0
       dentry_open+0x53/0x70
       drm_mode_create_lease_ioctl+0x3cb/0x970
       drm_ioctl_kernel+0xc9/0x1a0
       drm_ioctl+0x201/0x3d0
       __x64_sys_ioctl+0x6a/0xa0
       do_syscall_64+0x37/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:
Chain exists of:
  drm_global_mutex --> &helper->lock --> &dev->master_rwsem
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&dev->master_rwsem);
                               lock(&helper->lock);
                               lock(&dev->master_rwsem);
  lock(drm_global_mutex);

*** DEADLOCK ***

The lock hierarchy inversion happens because we grab the
drm_global_mutex while already holding on to master_rwsem. To avoid
this, we do some prep work to grab the drm_global_mutex before
checking for ioctl permissions.

At the same time, we update the check for the global mutex to use the
drm_dev_needs_global_mutex helper function.

Signed-off-by: Desmond Cheong Zhi Xi
---
 drivers/gpu/drm/drm_ioctl.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 880fc565d599..2cb57378a787 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -779,19 +779,19 @@ long drm_ioctl_kernel(struct file *file, drm_ioctl_t *func, void *kdata,
 	if (drm_dev_is_unplugged(dev))
 		return -ENODEV;
 
+	/* Enforce sane locking for modern driver ioctls. */
+	if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
+		mutex_lock(&drm_global_mutex);
+
 	retcode = drm_ioctl_permit(flags, file_priv);
 	if (unlikely(retcode))
-		return retcode;
+		goto out;
 
-	/* Enforce sane locking for modern driver ioctls. */
-	if (likely(!drm_core_check_feature(dev, DRIVER_LEGACY)) ||
-	    (flags & DRM_UNLOCKED))
-		retcode = func(dev, kdata, file_priv);
-	else {
-		mutex_lock(&drm_global_mutex);
-		retcode = func(dev, kdata, file_priv);
+	retcode = func(dev, kdata, file_priv);
+
+out:
+	if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
 		mutex_unlock(&drm_global_mutex);
-	}
 	return retcode;
 }
 EXPORT_SYMBOL(drm_ioctl_kernel);
-- 
2.25.1
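
For readers following the splat, the reported chain can be modelled as a small directed graph in which an edge A -> B means "B was acquired while A was held". The sketch below is plain userspace C, not kernel code, and it is not how lockdep is implemented; the names lock_graph, record_dependency and the enum values are made up for illustration. It only shows why the third dependency from the report (drm_global_mutex taken under master_rwsem) is flagged once the first two are already known.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes, named after the three locks in the splat. */
enum lock_class {
	GLOBAL_MUTEX,	/* drm_global_mutex */
	HELPER_LOCK,	/* the fb helper lock from entry #1 */
	MASTER_RWSEM,	/* drm_device.master_rwsem */
	NUM_LOCK_CLASSES
};

/* edge[a][b] is true once some path acquired b while holding a. */
struct lock_graph {
	bool edge[NUM_LOCK_CLASSES][NUM_LOCK_CLASSES];
};

/* Depth-first search: is "to" reachable from "from" along recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NUM_LOCK_CLASSES; next++) {
		if (g->edge[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record that "acquired" was taken while "held" was held. Returns false,
 * without recording anything, if the new edge would close a cycle, i.e.
 * if "held" is already reachable from "acquired".
 */
static bool record_dependency(struct lock_graph *g, int held, int acquired)
{
	bool seen[NUM_LOCK_CLASSES] = { false };

	assert(held >= 0 && held < NUM_LOCK_CLASSES);
	assert(acquired >= 0 && acquired < NUM_LOCK_CLASSES);

	if (reachable(g, acquired, held, seen))
		return false;
	g->edge[held][acquired] = true;
	return true;
}
```

Feeding it the chain from the report, drm_global_mutex --> helper lock --> master_rwsem, the first two calls to record_dependency() succeed, and a third call with MASTER_RWSEM held and GLOBAL_MUTEX acquired is rejected, which corresponds to entry #0 of the splat.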
Re: [Intel-gfx] [PATCH v3 7/9] drm: update global mutex lock in the ioctl handler
On Wed, Aug 18, 2021 at 03:38:22PM +0800, Desmond Cheong Zhi Xi wrote:
> In a future patch, a read lock on drm_device.master_rwsem is
> held in the ioctl handler before the check for ioctl
> permissions. However, this produces the following lockdep splat:
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.14.0-rc6-CI-Patchwork_20831+ #1 Tainted: G U
> ------------------------------------------------------
> kms_lease/1752 is trying to acquire lock:
> 827bad88 (drm_global_mutex){+.+.}-{3:3}, at: drm_open+0x64/0x280
>
> but task is already holding lock:
> 88812e350108 (&dev->master_rwsem){}-{3:3}, at:
> drm_ioctl_kernel+0xfb/0x1a0
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&dev->master_rwsem){}-{3:3}:
>        lock_acquire+0xd3/0x310
>        down_read+0x3b/0x140
>        drm_master_internal_acquire+0x1d/0x60
>        drm_client_modeset_commit+0x10/0x40
>        __drm_fb_helper_restore_fbdev_mode_unlocked+0x88/0xb0
>        drm_fb_helper_set_par+0x34/0x40
>        intel_fbdev_set_par+0x11/0x40 [i915]
>        fbcon_init+0x270/0x4f0
>        visual_init+0xc6/0x130
>        do_bind_con_driver+0x1de/0x2c0
>        do_take_over_console+0x10e/0x180
>        do_fbcon_takeover+0x53/0xb0
>        register_framebuffer+0x22d/0x310
>        __drm_fb_helper_initial_config_and_unlock+0x36c/0x540
>        intel_fbdev_initial_config+0xf/0x20 [i915]
>        async_run_entry_fn+0x28/0x130
>        process_one_work+0x26d/0x5c0
>        worker_thread+0x37/0x390
>        kthread+0x13b/0x170
>        ret_from_fork+0x1f/0x30
>
> -> #1 (&helper->lock){+.+.}-{3:3}:
>        lock_acquire+0xd3/0x310
>        __mutex_lock+0xa8/0x930
>        __drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xb0
>        intel_fbdev_restore_mode+0x2b/0x50 [i915]
>        drm_lastclose+0x27/0x50
>        drm_release_noglobal+0x42/0x60
>        __fput+0x9e/0x250
>        task_work_run+0x6b/0xb0
>        exit_to_user_mode_prepare+0x1c5/0x1d0
>        syscall_exit_to_user_mode+0x19/0x50
>        do_syscall_64+0x46/0xb0
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> -> #0 (drm_global_mutex){+.+.}-{3:3}:
>        validate_chain+0xb39/0x1e70
>        __lock_acquire+0x5a1/0xb70
>        lock_acquire+0xd3/0x310
>        __mutex_lock+0xa8/0x930
>        drm_open+0x64/0x280
>        drm_stub_open+0x9f/0x100
>        chrdev_open+0x9f/0x1d0
>        do_dentry_open+0x14a/0x3a0
>        dentry_open+0x53/0x70
>        drm_mode_create_lease_ioctl+0x3cb/0x970
>        drm_ioctl_kernel+0xc9/0x1a0
>        drm_ioctl+0x201/0x3d0
>        __x64_sys_ioctl+0x6a/0xa0
>        do_syscall_64+0x37/0xb0
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> other info that might help us debug this:
> Chain exists of:
>   drm_global_mutex --> &helper->lock --> &dev->master_rwsem
> Possible unsafe locking scenario:
>        CPU0                    CPU1
>        ----                    ----
>   lock(&dev->master_rwsem);
>                                lock(&helper->lock);
>                                lock(&dev->master_rwsem);
>   lock(drm_global_mutex);
>
> *** DEADLOCK ***
>
> The lock hierarchy inversion happens because we grab the
> drm_global_mutex while already holding on to master_rwsem. To avoid
> this, we do some prep work to grab the drm_global_mutex before
> checking for ioctl permissions.
>
> At the same time, we update the check for the global mutex to use the
> drm_dev_needs_global_mutex helper function.

This is intentional, essentially we force all non-legacy drivers to have
unlocked ioctl (otherwise everyone forgets to set that flag).

For non-legacy drivers the global lock only ensures ordering between
drm_open and lastclose (I think at least), and between
drm_dev_register/unregister and the backwards ->load/unload callbacks
(which are called in the wrong place, but we cannot fix that for legacy
drivers).

->load/unload should be completely unused (maybe radeon still uses it),
and ->lastclose is also on the decline.

Maybe we should update the comment of drm_global_mutex to explain what it
protects and why.

I'm also confused how this patch connects to the splat, since for i915 we
shouldn't be taking the drm_global_lock here at all.

The problem seems to be the drm_open_helper when we create a new lease,
which is an entirely different can of worms.

I'm honestly not sure how to best do that, but we should be able to create
a file and then call drm_open_helper directly, or well a version of that
which never takes the drm_global_mutex. Because that is not needed for
nested drm_file opening:

- legacy drivers never go down this path because leases are only supported
  with modesetting, and modesetting is only supported for non-legacy
  drivers

- the races against dev->open_count due to last_close or ->load callbacks
  don't matter, because for the entire ioctl we already have an open
  drm_file and that won't disappear.

So this