Re: [Intel-gfx] [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()

Chris Wilson Tue, 18 Oct 2016 01:38:57 -0700

On Tue, Oct 18, 2016 at 09:22:56AM +0100, Tvrtko Ursulin wrote:
> 
> On 18/10/2016 09:17, Chris Wilson wrote:
> >On Tue, Oct 18, 2016 at 09:01:30AM +0100, Tvrtko Ursulin wrote:
> >>On 17/10/2016 15:10, Chris Wilson wrote:
> >>>When handling execbuf relocations, we play a delicate dance with
> >>>pagefault. We first try to access the user pages underneath our
> >>>struct_mutex. However, if those pages were inside a GEM object, we may
> >>>trigger a pagefault and deadlock as i915_gem_fault() tries to
> >>>recursively acquire struct_mutex. Instead, we choose to disable
> >>>pagefaulting around the copy_from_user whilst inside the struct_mutex
> >>>and handle the EFAULT by falling back to a copy outside the
> >>>struct_mutex.
> >>>
> >>>We however presumed that disabling pagefaults would be expensive. It is
> >>>just an operation on the local current task. Cheap enough that we can
> >>>restrict the disable/enable to the critical section around the copy, and
> >>>so avoid having to handle the atomic sections within the relocation
> >>>handling itself.
> >>>
> >>>Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
> >>>Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
> >>>Cc: Tvrtko Ursulin <tvrtko.ursu...@linux.intel.com>
> >>>---
> >>>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 
> >>> +++++++++++++-----------------
> >>>  1 file changed, 28 insertions(+), 36 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
> >>>b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>index 1d02e74ce62d..22342ad0e07f 100644
> >>>--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>@@ -551,20 +551,6 @@ repeat:
> >>>   return 0;
> >>>  }
> >>>-static bool object_is_idle(struct drm_i915_gem_object *obj)
> >>>-{
> >>>-  unsigned long active = i915_gem_object_get_active(obj);
> >>>-  int idx;
> >>>-
> >>>-  for_each_active(active, idx) {
> >>>-          if (!i915_gem_active_is_idle(&obj->last_read[idx],
> >>>-                                       &obj->base.dev->struct_mutex))
> >>>-                  return false;
> >>>-  }
> >>>-
> >>>-  return true;
> >>>-}
> >>>-
> >>>  static int
> >>>  i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
> >>>                              struct eb_vmas *eb,
> >>>@@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct 
> >>>drm_i915_gem_object *obj,
> >>>           return -EINVAL;
> >>>   }
> >>>-  /* We can't wait for rendering with pagefaults disabled */
> >>>-  if (pagefault_disabled() && !object_is_idle(obj))
> >>>-          return -EFAULT;
> >>>-
> >>>   ret = relocate_entry(obj, reloc, cache, target_offset);
> >>>   if (ret)
> >>>           return ret;
> >>>@@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma 
> >>>*vma,
> >>>   remain = entry->relocation_count;
> >>>   while (remain) {
> >>>           struct drm_i915_gem_relocation_entry *r = stack_reloc;
> >>>-          int count = remain;
> >>>-          if (count > ARRAY_SIZE(stack_reloc))
> >>>-                  count = ARRAY_SIZE(stack_reloc);
> >>>+          unsigned long unwritten;
> >>>+          unsigned int count;
> >>>+
> >>>+          count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
> >>>           remain -= count;
> >>>-          if (__copy_from_user_inatomic(r, user_relocs, 
> >>>count*sizeof(r[0]))) {
> >>>+          /* This is the fast path and we cannot handle a pagefault
> >>>+           * whilst holding the struct mutex lest the user pass in the
> >>>+           * relocations contained within a mmaped bo. For in such a case
> >>>+           * we, the page fault handler would call i915_gem_fault() and
> >>>+           * we would try to acquire the struct mutex again. Obviously
> >>>+           * this is bad and so lockdep complains vehemently.
> >>>+           */
> >>>+          pagefault_disable();
> >>>+          unwritten = __copy_from_user_inatomic(r, user_relocs, 
> >>>count*sizeof(r[0]));
> >>>+          pagefault_enable();
> >>>+          if (unwritten) {
> >>>                   ret = -EFAULT;
> >>>                   goto out;
> >>>           }
> >>>@@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma 
> >>>*vma,
> >>>                   if (ret)
> >>>                           goto out;
> >>>-                  if (r->presumed_offset != offset &&
> >>>-                      __put_user(r->presumed_offset,
> >>>-                                 &user_relocs->presumed_offset)) {
> >>>-                          ret = -EFAULT;
> >>>-                          goto out;
> >>>+                  if (r->presumed_offset != offset) {
> >>>+                          /* Copying back to the user is allowed to fail.
> >>>+                           * The information passed back is a hint as
> >>>+                           * to the final location. If the copy_to_user
> >>>+                           * fails after a successful copy_from_user,
> >>>+                           * it must be a readonly location and so
> >>>+                           * we presume the user knows what they are
> >>>+                           * doing!
> >>>+                           */
> >>>+                          pagefault_disable();
> >>>+                          __put_user(r->presumed_offset,
> >>>+                                     &user_relocs->presumed_offset);
> >>>+                          pagefault_enable();
> >>Why is a good idea to ignore potential errors here?
> >Wrong question: why did we think it a good idea to ignore success here?
> 
> How were we ignoring errors, I don't follow?
> 
> >(a) it is safe to do so, and I can legitimately setup userspace to use
> >this
> 
> How what where? :)


igt, whereelse, has a test case to check we can execute from read only
memory.
 
> >(b) reporting an error after we have committed the change is broken
> >anyway.
> 
> You mean in the half way through cases?

We've already made the user/gpu visible change. Failing here is broken.
(User doesn't notice the change, presumed_offset doesn't match actual,
kernel is oblivious to mismatch, GPU hangs.)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()

Reply via email to