Re: [Intel-gfx] [PATCH v3 7/9] drm: update global mutex lock in the ioctl handler

2021-08-19 Thread Daniel Vetter
On Thu, Aug 19, 2021 at 12:53 PM Desmond Cheong Zhi Xi
 wrote:
>
> On 18/8/21 7:02 pm, Daniel Vetter wrote:
> > On Wed, Aug 18, 2021 at 03:38:22PM +0800, Desmond Cheong Zhi Xi wrote:
> >> In a future patch, a read lock on drm_device.master_rwsem is
> >> held in the ioctl handler before the check for ioctl
> >> permissions. However, this produces the following lockdep splat:
> >>
> >> ==
> >> WARNING: possible circular locking dependency detected
> >> 5.14.0-rc6-CI-Patchwork_20831+ #1 Tainted: G U
> >> --
> >> kms_lease/1752 is trying to acquire lock:
> >> 827bad88 (drm_global_mutex){+.+.}-{3:3}, at: drm_open+0x64/0x280
> >>
> >> but task is already holding lock:
> >> 88812e350108 (&dev->master_rwsem){++++}-{3:3}, at:
> >> drm_ioctl_kernel+0xfb/0x1a0
> >>
> >> which lock already depends on the new lock.
> >>
> >> the existing dependency chain (in reverse order) is:
> >>
> >> -> #2 (&dev->master_rwsem){++++}-{3:3}:
> >> lock_acquire+0xd3/0x310
> >> down_read+0x3b/0x140
> >> drm_master_internal_acquire+0x1d/0x60
> >> drm_client_modeset_commit+0x10/0x40
> >> __drm_fb_helper_restore_fbdev_mode_unlocked+0x88/0xb0
> >> drm_fb_helper_set_par+0x34/0x40
> >> intel_fbdev_set_par+0x11/0x40 [i915]
> >> fbcon_init+0x270/0x4f0
> >> visual_init+0xc6/0x130
> >> do_bind_con_driver+0x1de/0x2c0
> >> do_take_over_console+0x10e/0x180
> >> do_fbcon_takeover+0x53/0xb0
> >> register_framebuffer+0x22d/0x310
> >> __drm_fb_helper_initial_config_and_unlock+0x36c/0x540
> >> intel_fbdev_initial_config+0xf/0x20 [i915]
> >> async_run_entry_fn+0x28/0x130
> >> process_one_work+0x26d/0x5c0
> >> worker_thread+0x37/0x390
> >> kthread+0x13b/0x170
> >> ret_from_fork+0x1f/0x30
> >>
> >> -> #1 (&helper->lock){+.+.}-{3:3}:
> >> lock_acquire+0xd3/0x310
> >> __mutex_lock+0xa8/0x930
> >> __drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xb0
> >> intel_fbdev_restore_mode+0x2b/0x50 [i915]
> >> drm_lastclose+0x27/0x50
> >> drm_release_noglobal+0x42/0x60
> >> __fput+0x9e/0x250
> >> task_work_run+0x6b/0xb0
> >> exit_to_user_mode_prepare+0x1c5/0x1d0
> >> syscall_exit_to_user_mode+0x19/0x50
> >> do_syscall_64+0x46/0xb0
> >> entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>
> >> -> #0 (drm_global_mutex){+.+.}-{3:3}:
> >> validate_chain+0xb39/0x1e70
> >> __lock_acquire+0x5a1/0xb70
> >> lock_acquire+0xd3/0x310
> >> __mutex_lock+0xa8/0x930
> >> drm_open+0x64/0x280
> >> drm_stub_open+0x9f/0x100
> >> chrdev_open+0x9f/0x1d0
> >> do_dentry_open+0x14a/0x3a0
> >> dentry_open+0x53/0x70
> >> drm_mode_create_lease_ioctl+0x3cb/0x970
> >> drm_ioctl_kernel+0xc9/0x1a0
> >> drm_ioctl+0x201/0x3d0
> >> __x64_sys_ioctl+0x6a/0xa0
> >> do_syscall_64+0x37/0xb0
> >> entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>
> >> other info that might help us debug this:
> >> Chain exists of:
> >>    drm_global_mutex --> &helper->lock --> &dev->master_rwsem
> >>
> >>   Possible unsafe locking scenario:
> >>
> >>         CPU0                    CPU1
> >>         ----                    ----
> >>    lock(&dev->master_rwsem);
> >>                                 lock(&helper->lock);
> >>                                 lock(&dev->master_rwsem);
> >>    lock(drm_global_mutex);
> >>
> >>   *** DEADLOCK ***
> >>
> >> The lock hierarchy inversion happens because we grab the
> >> drm_global_mutex while already holding on to master_rwsem. To avoid
> >> this, we do some prep work to grab the drm_global_mutex before
> >> checking for ioctl permissions.
> >>
> >> At the same time, we update the check for the global mutex to use the
> >> drm_dev_needs_global_mutex helper function.
> >
> > This is intentional, essentially we force all non-legacy drivers to have
> > unlocked ioctl (otherwise everyone forgets to set that flag).
> >
> > For non-legacy drivers the global lock only ensures ordering between
> > drm_open and lastclose (I think at least), and between
> > drm_dev_register/unregister and the backwards ->load/unload callbacks
> > (which are called in the wrong place, but we cannot fix that for legacy
> > drivers).
> >
> > ->load/unload should be completely unused (maybe radeon still uses it),
> > and ->lastclose is also on the decline.
> >
>
> Ah ok got it, I'll change the check back to
> drm_core_check_feature(dev, DRIVER_LEGACY) then.
>
> > Maybe we should update the comment of drm_global_mutex to explain what it
> > protects and why.
> >
>
> The comments in drm_dev_needs_global_mutex make sense I think, I just
> didn't read the code closely enough.
>
> > I'm also confused how this patch connects to the splat, since for i915 we
>
> Right, my bad, this is a separate instance of circular locking. I was
> too hasty when I saw that for legacy drivers we might grab master_rwsem
> then drm_global_mutex in the ioctl handler.

Re: [Intel-gfx] [PATCH v3 7/9] drm: update global mutex lock in the ioctl handler

2021-08-19 Thread Desmond Cheong Zhi Xi

On 18/8/21 7:02 pm, Daniel Vetter wrote:

On Wed, Aug 18, 2021 at 03:38:22PM +0800, Desmond Cheong Zhi Xi wrote:

In a future patch, a read lock on drm_device.master_rwsem is
held in the ioctl handler before the check for ioctl
permissions. However, this produces the following lockdep splat:

==
WARNING: possible circular locking dependency detected
5.14.0-rc6-CI-Patchwork_20831+ #1 Tainted: G U
--
kms_lease/1752 is trying to acquire lock:
827bad88 (drm_global_mutex){+.+.}-{3:3}, at: drm_open+0x64/0x280

but task is already holding lock:
88812e350108 (&dev->master_rwsem){++++}-{3:3}, at:
drm_ioctl_kernel+0xfb/0x1a0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&dev->master_rwsem){++++}-{3:3}:
lock_acquire+0xd3/0x310
down_read+0x3b/0x140
drm_master_internal_acquire+0x1d/0x60
drm_client_modeset_commit+0x10/0x40
__drm_fb_helper_restore_fbdev_mode_unlocked+0x88/0xb0
drm_fb_helper_set_par+0x34/0x40
intel_fbdev_set_par+0x11/0x40 [i915]
fbcon_init+0x270/0x4f0
visual_init+0xc6/0x130
do_bind_con_driver+0x1de/0x2c0
do_take_over_console+0x10e/0x180
do_fbcon_takeover+0x53/0xb0
register_framebuffer+0x22d/0x310
__drm_fb_helper_initial_config_and_unlock+0x36c/0x540
intel_fbdev_initial_config+0xf/0x20 [i915]
async_run_entry_fn+0x28/0x130
process_one_work+0x26d/0x5c0
worker_thread+0x37/0x390
kthread+0x13b/0x170
ret_from_fork+0x1f/0x30

-> #1 (&helper->lock){+.+.}-{3:3}:
lock_acquire+0xd3/0x310
__mutex_lock+0xa8/0x930
__drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xb0
intel_fbdev_restore_mode+0x2b/0x50 [i915]
drm_lastclose+0x27/0x50
drm_release_noglobal+0x42/0x60
__fput+0x9e/0x250
task_work_run+0x6b/0xb0
exit_to_user_mode_prepare+0x1c5/0x1d0
syscall_exit_to_user_mode+0x19/0x50
do_syscall_64+0x46/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #0 (drm_global_mutex){+.+.}-{3:3}:
validate_chain+0xb39/0x1e70
__lock_acquire+0x5a1/0xb70
lock_acquire+0xd3/0x310
__mutex_lock+0xa8/0x930
drm_open+0x64/0x280
drm_stub_open+0x9f/0x100
chrdev_open+0x9f/0x1d0
do_dentry_open+0x14a/0x3a0
dentry_open+0x53/0x70
drm_mode_create_lease_ioctl+0x3cb/0x970
drm_ioctl_kernel+0xc9/0x1a0
drm_ioctl+0x201/0x3d0
__x64_sys_ioctl+0x6a/0xa0
do_syscall_64+0x37/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:
Chain exists of:
   drm_global_mutex --> &helper->lock --> &dev->master_rwsem

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&dev->master_rwsem);
                                lock(&helper->lock);
                                lock(&dev->master_rwsem);
   lock(drm_global_mutex);

  *** DEADLOCK ***

The lock hierarchy inversion happens because we grab the
drm_global_mutex while already holding on to master_rwsem. To avoid
this, we do some prep work to grab the drm_global_mutex before
checking for ioctl permissions.

At the same time, we update the check for the global mutex to use the
drm_dev_needs_global_mutex helper function.


This is intentional, essentially we force all non-legacy drivers to have
unlocked ioctl (otherwise everyone forgets to set that flag).

For non-legacy drivers the global lock only ensures ordering between
drm_open and lastclose (I think at least), and between
drm_dev_register/unregister and the backwards ->load/unload callbacks
(which are called in the wrong place, but we cannot fix that for legacy
drivers).

->load/unload should be completely unused (maybe radeon still uses it),
and ->lastclose is also on the decline.



Ah ok got it, I'll change the check back to
drm_core_check_feature(dev, DRIVER_LEGACY) then.


Maybe we should update the comment of drm_global_mutex to explain what it
protects and why.



The comments in drm_dev_needs_global_mutex make sense I think, I just
didn't read the code closely enough.
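
For reference, the helper being discussed lives in
drivers/gpu/drm/drm_file.c. Around v5.14 it reads roughly as follows
(reproduced from memory as a sketch; check the tree for the exact
wording of the comments):

static bool drm_dev_needs_global_mutex(struct drm_device *dev)
{
	/* Legacy drivers rely on all kinds of BKL locking semantics. */
	if (drm_core_check_feature(dev, DRIVER_LEGACY))
		return true;

	/*
	 * The deprecated ->load callback must be called after the driver is
	 * already registered, so such drivers rely on the BKL to make sure
	 * an opened drm_file is not used too early.
	 */
	if (dev->driver->load)
		return true;

	/*
	 * Drivers with a ->lastclose callback assume it is synchronized
	 * against concurrent opens, which again needs the BKL.
	 */
	if (dev->driver->lastclose)
		return true;

	return false;
}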


I'm also confused how this patch connects to the splat, since for i915 we


Right, my bad, this is a separate instance of circular locking. I was
too hasty when I saw that for legacy drivers we might grab master_rwsem
then drm_global_mutex in the ioctl handler.
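
The class of bug lockdep is reporting here can be reproduced in
userspace with nothing DRM-specific. Below is a minimal, self-contained
C program that deadlocks with the same two-thread shape as the splat's
CPU0/CPU1 diagram; the three kernel locks are stood in for by plain
pthread mutexes, and the #1/#2 dependency chain is compressed into one
thread:

/* abba.c -- build with: cc -pthread -o abba abba.c; it hangs by design. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER; /* drm_global_mutex */
static pthread_mutex_t helper_lock  = PTHREAD_MUTEX_INITIALIZER; /* &helper->lock */
static pthread_mutex_t master_rwsem = PTHREAD_MUTEX_INITIALIZER; /* &dev->master_rwsem */

/* CPU0: the ioctl path -- master_rwsem first, then the global mutex. */
static void *cpu0(void *unused)
{
	pthread_mutex_lock(&master_rwsem);	/* drm_ioctl_kernel */
	sleep(1);				/* widen the race window */
	pthread_mutex_lock(&global_mutex);	/* drm_open: blocks forever */
	pthread_mutex_unlock(&global_mutex);
	pthread_mutex_unlock(&master_rwsem);
	return NULL;
}

/* CPU1: the fbdev chain -- global mutex, helper lock, then master_rwsem. */
static void *cpu1(void *unused)
{
	pthread_mutex_lock(&global_mutex);
	pthread_mutex_lock(&helper_lock);	/* fb helper restore */
	sleep(1);
	pthread_mutex_lock(&master_rwsem);	/* blocks forever: cpu0 holds it */
	pthread_mutex_unlock(&master_rwsem);
	pthread_mutex_unlock(&helper_lock);
	pthread_mutex_unlock(&global_mutex);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, cpu0, NULL);
	pthread_create(&b, NULL, cpu1, NULL);
	pthread_join(a, NULL);			/* never returns */
	pthread_join(b, NULL);
	puts("not reached while the inversion exists");
	return 0;
}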


shouldn't be taking the drm_global_mutex here at all. The problem seems to
be the drm_open_helper when we create a new lease, which is an entirely
different can of worms.

I'm honestly not sure how to best do that, but we should be able to create
a file and then call drm_open_helper directly, or well a version of that
which never takes the drm_global_mutex. Because that is not needed for
nested drm_file opening:
- legacy drivers never go down this path because leases are only supported
  with modesetting, and modesetting is only supported for non-legacy
  drivers

[Intel-gfx] [PATCH v3 7/9] drm: update global mutex lock in the ioctl handler

2021-08-18 Thread Desmond Cheong Zhi Xi
In a future patch, a read lock on drm_device.master_rwsem is
held in the ioctl handler before the check for ioctl
permissions. However, this produces the following lockdep splat:

==
WARNING: possible circular locking dependency detected
5.14.0-rc6-CI-Patchwork_20831+ #1 Tainted: G U
--
kms_lease/1752 is trying to acquire lock:
827bad88 (drm_global_mutex){+.+.}-{3:3}, at: drm_open+0x64/0x280

but task is already holding lock:
88812e350108 (&dev->master_rwsem){++++}-{3:3}, at:
drm_ioctl_kernel+0xfb/0x1a0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&dev->master_rwsem){++++}-{3:3}:
   lock_acquire+0xd3/0x310
   down_read+0x3b/0x140
   drm_master_internal_acquire+0x1d/0x60
   drm_client_modeset_commit+0x10/0x40
   __drm_fb_helper_restore_fbdev_mode_unlocked+0x88/0xb0
   drm_fb_helper_set_par+0x34/0x40
   intel_fbdev_set_par+0x11/0x40 [i915]
   fbcon_init+0x270/0x4f0
   visual_init+0xc6/0x130
   do_bind_con_driver+0x1de/0x2c0
   do_take_over_console+0x10e/0x180
   do_fbcon_takeover+0x53/0xb0
   register_framebuffer+0x22d/0x310
   __drm_fb_helper_initial_config_and_unlock+0x36c/0x540
   intel_fbdev_initial_config+0xf/0x20 [i915]
   async_run_entry_fn+0x28/0x130
   process_one_work+0x26d/0x5c0
   worker_thread+0x37/0x390
   kthread+0x13b/0x170
   ret_from_fork+0x1f/0x30

-> #1 (&helper->lock){+.+.}-{3:3}:
   lock_acquire+0xd3/0x310
   __mutex_lock+0xa8/0x930
   __drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xb0
   intel_fbdev_restore_mode+0x2b/0x50 [i915]
   drm_lastclose+0x27/0x50
   drm_release_noglobal+0x42/0x60
   __fput+0x9e/0x250
   task_work_run+0x6b/0xb0
   exit_to_user_mode_prepare+0x1c5/0x1d0
   syscall_exit_to_user_mode+0x19/0x50
   do_syscall_64+0x46/0xb0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #0 (drm_global_mutex){+.+.}-{3:3}:
   validate_chain+0xb39/0x1e70
   __lock_acquire+0x5a1/0xb70
   lock_acquire+0xd3/0x310
   __mutex_lock+0xa8/0x930
   drm_open+0x64/0x280
   drm_stub_open+0x9f/0x100
   chrdev_open+0x9f/0x1d0
   do_dentry_open+0x14a/0x3a0
   dentry_open+0x53/0x70
   drm_mode_create_lease_ioctl+0x3cb/0x970
   drm_ioctl_kernel+0xc9/0x1a0
   drm_ioctl+0x201/0x3d0
   __x64_sys_ioctl+0x6a/0xa0
   do_syscall_64+0x37/0xb0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:
Chain exists of:
  drm_global_mutex --> &helper->lock --> &dev->master_rwsem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&dev->master_rwsem);
                               lock(&helper->lock);
                               lock(&dev->master_rwsem);
  lock(drm_global_mutex);

 *** DEADLOCK ***

The lock hierarchy inversion happens because we grab the
drm_global_mutex while already holding on to master_rwsem. To avoid
this, we do some prep work to grab the drm_global_mutex before
checking for ioctl permissions.

At the same time, we update the check for the global mutex to use the
drm_dev_needs_global_mutex helper function.

Signed-off-by: Desmond Cheong Zhi Xi 
---
 drivers/gpu/drm/drm_ioctl.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 880fc565d599..2cb57378a787 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -779,19 +779,19 @@ long drm_ioctl_kernel(struct file *file, drm_ioctl_t *func, void *kdata,
 	if (drm_dev_is_unplugged(dev))
 		return -ENODEV;
 
+	/* Enforce sane locking for modern driver ioctls. */
+	if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
+		mutex_lock(&drm_global_mutex);
+
 	retcode = drm_ioctl_permit(flags, file_priv);
 	if (unlikely(retcode))
-		return retcode;
+		goto out;
 
-	/* Enforce sane locking for modern driver ioctls. */
-	if (likely(!drm_core_check_feature(dev, DRIVER_LEGACY)) ||
-	    (flags & DRM_UNLOCKED))
-		retcode = func(dev, kdata, file_priv);
-	else {
-		mutex_lock(&drm_global_mutex);
-		retcode = func(dev, kdata, file_priv);
+	retcode = func(dev, kdata, file_priv);
+
+out:
+	if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
 		mutex_unlock(&drm_global_mutex);
-	}
 	return retcode;
 }
 EXPORT_SYMBOL(drm_ioctl_kernel);
-- 
2.25.1
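
As a reading aid, here is the locking portion of drm_ioctl_kernel() as
it would look with the hunk above applied -- just the patched function
flattened out, with the local-variable preamble reproduced from the
kernel source as best I recall it:

long drm_ioctl_kernel(struct file *file, drm_ioctl_t *func, void *kdata,
		      u32 flags)
{
	struct drm_file *file_priv = file->private_data;
	struct drm_device *dev = file_priv->minor->dev;
	int retcode;

	if (drm_dev_is_unplugged(dev))
		return -ENODEV;

	/* Take the global lock first, so the permission check (and
	 * everything after it) runs under it; this is the reordering the
	 * commit message describes. */
	if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
		mutex_lock(&drm_global_mutex);

	retcode = drm_ioctl_permit(flags, file_priv);
	if (unlikely(retcode))
		goto out;

	retcode = func(dev, kdata, file_priv);

out:
	if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
		mutex_unlock(&drm_global_mutex);
	return retcode;
}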



Re: [Intel-gfx] [PATCH v3 7/9] drm: update global mutex lock in the ioctl handler

2021-08-18 Thread Daniel Vetter
On Wed, Aug 18, 2021 at 03:38:22PM +0800, Desmond Cheong Zhi Xi wrote:
> In a future patch, a read lock on drm_device.master_rwsem is
> held in the ioctl handler before the check for ioctl
> permissions. However, this produces the following lockdep splat:
> 
> ==
> WARNING: possible circular locking dependency detected
> 5.14.0-rc6-CI-Patchwork_20831+ #1 Tainted: G U
> --
> kms_lease/1752 is trying to acquire lock:
> 827bad88 (drm_global_mutex){+.+.}-{3:3}, at: drm_open+0x64/0x280
> 
> but task is already holding lock:
> 88812e350108 (&dev->master_rwsem){++++}-{3:3}, at:
> drm_ioctl_kernel+0xfb/0x1a0
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #2 (&dev->master_rwsem){++++}-{3:3}:
>lock_acquire+0xd3/0x310
>down_read+0x3b/0x140
>drm_master_internal_acquire+0x1d/0x60
>drm_client_modeset_commit+0x10/0x40
>__drm_fb_helper_restore_fbdev_mode_unlocked+0x88/0xb0
>drm_fb_helper_set_par+0x34/0x40
>intel_fbdev_set_par+0x11/0x40 [i915]
>fbcon_init+0x270/0x4f0
>visual_init+0xc6/0x130
>do_bind_con_driver+0x1de/0x2c0
>do_take_over_console+0x10e/0x180
>do_fbcon_takeover+0x53/0xb0
>register_framebuffer+0x22d/0x310
>__drm_fb_helper_initial_config_and_unlock+0x36c/0x540
>intel_fbdev_initial_config+0xf/0x20 [i915]
>async_run_entry_fn+0x28/0x130
>process_one_work+0x26d/0x5c0
>worker_thread+0x37/0x390
>kthread+0x13b/0x170
>ret_from_fork+0x1f/0x30
> 
> -> #1 (&helper->lock){+.+.}-{3:3}:
>lock_acquire+0xd3/0x310
>__mutex_lock+0xa8/0x930
>__drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xb0
>intel_fbdev_restore_mode+0x2b/0x50 [i915]
>drm_lastclose+0x27/0x50
>drm_release_noglobal+0x42/0x60
>__fput+0x9e/0x250
>task_work_run+0x6b/0xb0
>exit_to_user_mode_prepare+0x1c5/0x1d0
>syscall_exit_to_user_mode+0x19/0x50
>do_syscall_64+0x46/0xb0
>entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #0 (drm_global_mutex){+.+.}-{3:3}:
>validate_chain+0xb39/0x1e70
>__lock_acquire+0x5a1/0xb70
>lock_acquire+0xd3/0x310
>__mutex_lock+0xa8/0x930
>drm_open+0x64/0x280
>drm_stub_open+0x9f/0x100
>chrdev_open+0x9f/0x1d0
>do_dentry_open+0x14a/0x3a0
>dentry_open+0x53/0x70
>drm_mode_create_lease_ioctl+0x3cb/0x970
>drm_ioctl_kernel+0xc9/0x1a0
>drm_ioctl+0x201/0x3d0
>__x64_sys_ioctl+0x6a/0xa0
>do_syscall_64+0x37/0xb0
>entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> other info that might help us debug this:
> Chain exists of:
>   drm_global_mutex --> &helper->lock --> &dev->master_rwsem
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&dev->master_rwsem);
>                                lock(&helper->lock);
>                                lock(&dev->master_rwsem);
>   lock(drm_global_mutex);
> 
>  *** DEADLOCK ***
> 
> The lock hierarchy inversion happens because we grab the
> drm_global_mutex while already holding on to master_rwsem. To avoid
> this, we do some prep work to grab the drm_global_mutex before
> checking for ioctl permissions.
> 
> At the same time, we update the check for the global mutex to use the
> drm_dev_needs_global_mutex helper function.

This is intentional, essentially we force all non-legacy drivers to have
unlocked ioctl (otherwise everyone forgets to set that flag).

For non-legacy drivers the global lock only ensures ordering between
drm_open and lastclose (I think at least), and between
drm_dev_register/unregister and the backwards ->load/unload callbacks
(which are called in the wrong place, but we cannot fix that for legacy
drivers).

->load/unload should be completely unused (maybe radeon still uses it),
and ->lastclose is also on the decline.
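
To make the "forgets to set that flag" point concrete: ioctls are
declared in the drm_ioctls[] table in drivers/gpu/drm/drm_ioctl.c via
the DRM_IOCTL_DEF() macro, and under the old scheme a legacy driver's
ioctl ran under drm_global_mutex unless someone remembered to add
DRM_UNLOCKED. The flag combinations below are illustrative, not copied
from the real table:

/* Illustrative entries only; DRM_IOCTL_DEF() is local to drm_ioctl.c. */
static const struct drm_ioctl_desc example_ioctls[] = {
	/* Modern ioctl: on a non-legacy driver this never takes
	 * drm_global_mutex, whether or not DRM_UNLOCKED is set. */
	DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE,
		      drm_mode_create_lease_ioctl, DRM_MASTER),

	/* On a DRIVER_LEGACY device, omitting DRM_UNLOCKED here would
	 * wrap the handler in drm_global_mutex. */
	DRM_IOCTL_DEF(DRM_IOCTL_GET_MAGIC, drm_getmagic, DRM_UNLOCKED),
};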

Maybe we should update the comment of drm_global_mutex to explain what it
protects and why.

I'm also confused how this patch connects to the splat, since for i915 we
shouldn't be taking the drm_global_mutex here at all. The problem seems to
be the drm_open_helper when we create a new lease, which is an entirely
different can of worms.

I'm honestly not sure how to best do that, but we should be able to create
a file and then call drm_open_helper directly, or well a version of that
which never takes the drm_global_mutex. Because that is not needed for
nested drm_file opening:
- legacy drivers never go down this path because leases are only supported
  with modesetting, and modesetting is only supported for non-legacy
  drivers
- the races against dev->open_count due to lastclose or ->load callbacks
  don't matter, because for the entire ioctl we already have an open
  drm_file and that won't disappear.

So this
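
To give that suggestion a concrete shape, here is an entirely
hypothetical sketch, not kernel code: drm_open_helper() is static in
drivers/gpu/drm/drm_file.c today, and file_clone_open_noopen() is an
invented name. Per the splat, the lease ioctl's nested struct file
currently comes from dentry_open(), which re-enters chrdev_open() ->
drm_stub_open() -> drm_open(), and drm_open() is what takes
drm_global_mutex. The sketch only shows where a mutex-free attach could
happen instead:

/* Hypothetical: clone the lessor's file without running ->open(), then
 * attach the new drm_file by hand, so drm_global_mutex is never taken. */
static struct file *drm_lease_file_open_nolock(struct file *lessor_file,
					       struct drm_minor *minor)
{
	struct file *file;
	int ret;

	file = file_clone_open_noopen(lessor_file);	/* invented helper */
	if (IS_ERR(file))
		return file;

	/*
	 * Safe without drm_global_mutex: we are nested inside an ioctl on
	 * an already-open drm_file, so dev->open_count cannot reach zero
	 * under us (no lastclose race), and legacy drivers never get here
	 * since leases require modesetting.
	 */
	ret = drm_open_helper(file, minor);
	if (ret) {
		fput(file);
		return ERR_PTR(ret);
	}
	return file;
}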