On 10/19/20 10:10 AM, Maarten Lankhorst wrote:
On 19-10-2020 09:30, Thomas Hellström (Intel) wrote:
On 10/16/20 12:43 PM, Maarten Lankhorst wrote:
Instead of doing what we do currently, which will never work with
PROVE_LOCKING, do the same as AMD does, and something similar to the
relocation slowpath. When all locks are dropped, we acquire the
pages for pinning. When the locks are taken, we transfer those
pages in .get_pages() to the bo. As a final check before installing
the fences, we ensure that the mmu notifier was not called; if it
was, we return -EAGAIN to userspace to signal it has to start over.
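
Roughly, that flow could be sketched like this (a minimal sketch only; my_userptr, userptr_submit_init() and userptr_submit_done() are made-up names, not the actual i915 code):

#include <linux/mm.h>
#include <linux/mmu_notifier.h>

struct my_userptr {
	struct mmu_interval_notifier notifier;
	unsigned long userptr;		/* start of the user mapping */
	unsigned int num_pages;
	unsigned long notifier_seq;	/* notifier seq sampled before pinning */
	struct page **pvec;		/* pages pinned in submit_init */
};

/* 1) All object locks dropped: sample the notifier seq and pin the pages. */
static int userptr_submit_init(struct my_userptr *p)
{
	int pinned;

	p->notifier_seq = mmu_interval_read_begin(&p->notifier);
	pinned = pin_user_pages_fast(p->userptr, p->num_pages,
				     FOLL_WRITE, p->pvec);
	if (pinned < 0)
		return pinned;
	if ((unsigned int)pinned < p->num_pages) {
		unpin_user_pages(p->pvec, pinned);
		return -EFAULT;
	}
	return 0;
}

/*
 * 2) .get_pages(), called with the object lock held, adopts p->pvec into
 *    the object's sg_table (not shown).
 *
 * 3) Under the notifier lock, as a final check before installing fences:
 */
static int userptr_submit_done(struct my_userptr *p)
{
	/* The notifier fired since we pinned: make userspace start over. */
	if (mmu_interval_read_retry(&p->notifier, p->notifier_seq))
		return -EAGAIN;
	return 0;
}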

Changes since v1:
- Unbinding is done in submit_init only. submit_begin() removed.
- MMU_NOTFIER -> MMU_NOTIFIER
Changes since v2:
- Make i915->mm.notifier a spinlock.
Changes since v3:
- Add WARN_ON if there are any page references left, should have been 0.
- Return 0 on success in submit_init(), bug from spinlock conversion.
- Release pvec outside of notifier_lock (Thomas).

Signed-off-by: Maarten Lankhorst <maarten.lankho...@linux.intel.com>
There are a number of failures in gem_userptr_blits (run with a patch by Chris that enables it to run properly) that need investigating.
---
   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  94 ++-
   drivers/gpu/drm/i915/gem/i915_gem_object.h    |  35 +-
   .../gpu/drm/i915/gem/i915_gem_object_types.h  |  10 +-
   drivers/gpu/drm/i915/gem/i915_gem_pages.c     |   2 +-
   drivers/gpu/drm/i915/gem/i915_gem_userptr.c   | 765 +++++-------------
   drivers/gpu/drm/i915/i915_drv.h               |   9 +-
   drivers/gpu/drm/i915/i915_gem.c               |   5 +-
   7 files changed, 334 insertions(+), 586 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 89d7e7980eae..c9db199c4d81 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -52,14 +52,16 @@ enum {
   /* __EXEC_OBJECT_NO_RESERVE is BIT(31), defined in i915_vma.h */
   #define __EXEC_OBJECT_HAS_PIN        BIT(30)
   #define __EXEC_OBJECT_HAS_FENCE        BIT(29)
-#define __EXEC_OBJECT_NEEDS_MAP        BIT(28)
-#define __EXEC_OBJECT_NEEDS_BIAS    BIT(27)
-#define __EXEC_OBJECT_INTERNAL_FLAGS    (~0u << 27) /* all of the above + */
+#define __EXEC_OBJECT_USERPTR_INIT    BIT(28)
+#define __EXEC_OBJECT_NEEDS_MAP        BIT(27)
+#define __EXEC_OBJECT_NEEDS_BIAS    BIT(26)
+#define __EXEC_OBJECT_INTERNAL_FLAGS    (~0u << 26) /* all of the above + */
   #define __EXEC_OBJECT_RESERVED (__EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_FENCE)
     #define __EXEC_HAS_RELOC    BIT(31)
   #define __EXEC_ENGINE_PINNED    BIT(30)
-#define __EXEC_INTERNAL_FLAGS    (~0u << 30)
+#define __EXEC_USERPTR_USED    BIT(29)
+#define __EXEC_INTERNAL_FLAGS    (~0u << 29)
   #define UPDATE            PIN_OFFSET_FIXED
     #define BATCH_OFFSET_BIAS (256*1024)
@@ -865,6 +867,19 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
           }
             eb_add_vma(eb, i, batch, vma);
+
+        if (i915_gem_object_is_userptr(vma->obj)) {
+            err = i915_gem_object_userptr_submit_init(vma->obj);
+            if (err) {
+                if (i + 1 < eb->buffer_count)
+                    eb->vma[i + 1].vma = NULL;
Why i+1 here? Perhaps worth a code comment?

+            }
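
E.g., if the reason is that eb_add_vma() above has already filled slot i and the cleanup path walks eb->vma[] until the first NULL entry (just my guess at the rationale), a comment along these lines next to the assignment would make it obvious:

				/*
				 * Slot i was already filled in by eb_add_vma()
				 * above, so NULL-terminate the array one entry
				 * past it; cleanup stops at the first NULL vma.
				 */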

...

   +#ifdef CONFIG_MMU_NOTIFIER
+    if (!err && (eb->args->flags & __EXEC_USERPTR_USED)) {
+        spin_lock(&eb->i915->mm.notifier_lock);
+
+        /*
+         * count is always at least 1, otherwise __EXEC_USERPTR_USED
+         * could not have been set
+         */
+        for (i = count - 1; i; i--) {
Aren't we missing index 0 here? The loop stops before i reaches 0 (see the sketch after this hunk).

+            struct eb_vma *ev = &eb->vma[i];
+            struct drm_i915_gem_object *obj = ev->vma->obj;
+
+            if (!i915_gem_object_is_userptr(obj))
+                continue;
+
+            err = i915_gem_object_userptr_submit_done(obj);
+            if (err)
+                break;
Is there a chance we could avoid setting the fence for this object if we need to restart with -EAGAIN and thus end up submitting an empty batchbuffer? If we allow setting a fence even when returning -EAGAIN, it would likely not be the fence waited for in the invalidation callback, but rather a fence waited for during unbind in the next submit_init(), which is undesirable.

+        }
+
+        spin_unlock(&eb->i915->mm.notifier_lock);
+    }
+#endif
+
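
For reference, assuming i is unsigned here, a downward walk that still reaches slot 0 could look like the sketch below; whether the fence should then be skipped for objects that hit -EAGAIN is the open question above.

		for (i = count; i-- > 0; ) {
			struct eb_vma *ev = &eb->vma[i];
			struct drm_i915_gem_object *obj = ev->vma->obj;

			if (!i915_gem_object_is_userptr(obj))
				continue;

			err = i915_gem_object_userptr_submit_done(obj);
			if (err)
				break;
		}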
...
+/**
+ * i915_gem_userptr_invalidate - callback to notify about mm change
+ *
+ * @mni: the range (mm) is about to update
+ * @range: details on the invalidation
+ * @cur_seq: Value to pass to mmu_interval_set_seq()
+ *
+ * Block for operations on BOs to finish and mark pages as accessed and
+ * potentially dirty.
+ */
+static bool i915_gem_userptr_invalidate(struct mmu_interval_notifier *mni,
+                    const struct mmu_notifier_range *range,
+                    unsigned long cur_seq)
   {
-    struct i915_mmu_notifier *mn =
-        container_of(_mn, struct i915_mmu_notifier, mn);
-    struct interval_tree_node *it;
-    unsigned long end;
-    int ret = 0;
-
-    if (RB_EMPTY_ROOT(&mn->objects.rb_root))
-        return 0;
-
-    /* interval ranges are inclusive, but invalidate range is exclusive */
-    end = range->end - 1;
-
-    spin_lock(&mn->lock);
-    it = interval_tree_iter_first(&mn->objects, range->start, end);
-    while (it) {
-        struct drm_i915_gem_object *obj;
-
-        if (!mmu_notifier_range_blockable(range)) {
-            ret = -EAGAIN;
-            break;
-        }
+    struct drm_i915_gem_object *obj = container_of(mni, struct drm_i915_gem_object, userptr.notifier);
+    struct drm_i915_private *i915 = to_i915(obj->base.dev);
+    long r;

-        /*
-         * The mmu_object is released late when destroying the
-         * GEM object so it is entirely possible to gain a
-         * reference on an object in the process of being freed
-         * since our serialisation is via the spinlock and not
-         * the struct_mutex - and consequently use it after it
-         * is freed and then double free it. To prevent that
-         * use-after-free we only acquire a reference on the
-         * object if it is not in the process of being destroyed.
-         */
-        obj = container_of(it, struct i915_mmu_object, it)->obj;
-        if (!kref_get_unless_zero(&obj->base.refcount)) {
-            it = interval_tree_iter_next(it, range->start, end);
-            continue;
-        }
-        spin_unlock(&mn->lock);
+    if (!mmu_notifier_range_blockable(range))
+        return false;

-        ret = i915_gem_object_unbind(obj,
-                         I915_GEM_OBJECT_UNBIND_ACTIVE |
-                         I915_GEM_OBJECT_UNBIND_BARRIER);
-        if (ret == 0)
-            ret = __i915_gem_object_put_pages(obj);
-        i915_gem_object_put(obj);
-        if (ret)
-            return ret;
If we add an additional fence wait here, before setting the seq (which takes the invalidation write seqlock), we'd reduce (but definitely not eliminate) the chance of waiting inside the invalidation write seqlock; such a wait could stall submit_init until the write_seqlock is released. That would make the new userptr invalidation about as unlikely as the previous one to trigger a wait in the command submission path.

I think it doesn't matter much. I chose to wait after setting the seq to ensure new userptr submitters get the new seqno as soon as possible; we only want to force previous waiters to complete. Invalidation of userptr, especially of active ones, shouldn't be a hot path, so I don't think we want to spend much time optimizing it. :)

It's not so much about optimising, really, but about avoiding unexpected behaviour under memory pressure that could potentially break existing user-space. As soon as we set the new seqno, new submitters will block uninterruptibly waiting for the invalidation to complete, but I really see no reliable way around that.
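
For reference, the ordering we're discussing, sketched against the hypothetical my_userptr from the earlier sketch (the resv pointer to the BO's reservation object is an assumed field; this is not the actual patch code):

#include <linux/dma-resv.h>

static bool userptr_invalidate(struct mmu_interval_notifier *mni,
			       const struct mmu_notifier_range *range,
			       unsigned long cur_seq)
{
	struct my_userptr *p = container_of(mni, struct my_userptr, notifier);
	long r;

	if (!mmu_notifier_range_blockable(range))
		return false;

	/*
	 * Publish the new seq first so new submitters immediately see the
	 * invalidation and back off with -EAGAIN, then wait for the fences
	 * already installed on the BO.  Waiting *before*
	 * mmu_interval_set_seq() would keep the fence wait outside the
	 * write-side seq, at the cost of letting new submitters race in
	 * against the stale pages for a little longer.
	 */
	mmu_interval_set_seq(mni, cur_seq);

	/* p->resv: the BO's dma_resv (assumed field, see note above). */
	r = dma_resv_wait_timeout_rcu(p->resv, true, false,
				      MAX_SCHEDULE_TIMEOUT);
	WARN_ON_ONCE(r <= 0);	/* should not expire with MAX_SCHEDULE_TIMEOUT */

	return true;
}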

/Thomas

