amdgpu: implement amdgpu_gem_prime_move_notify v2

VMware Tue, 18 Feb 2020 12:18:05 -0800

On 2/17/20 6:55 PM, Daniel Vetter wrote:

On Mon, Feb 17, 2020 at 04:45:09PM +0100, Christian König wrote:

Implement the importer side of unpinned DMA-buf handling.


v2: update page tables immediately

Signed-off-by: Christian König <christian.koe...@amd.com>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 66 ++++++++++++++++++++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  6 ++
  2 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index 770baba621b3..48de7624d49c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -453,7 +453,71 @@ amdgpu_dma_buf_create_obj(struct drm_device *dev, struct 
dma_buf *dma_buf)
        return ERR_PTR(ret);
  }

+/**

+ * amdgpu_dma_buf_move_notify - &attach.move_notify implementation
+ *
+ * @attach: the DMA-buf attachment
+ *
+ * Invalidate the DMA-buf attachment, making sure that we re-create the
+ * mapping before the next use.
+ */
+static void
+amdgpu_dma_buf_move_notify(struct dma_buf_attachment *attach)
+{
+       struct drm_gem_object *obj = attach->importer_priv;
+       struct ww_acquire_ctx *ticket = dma_resv_locking_ctx(obj->resv);
+       struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
+       struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
+       struct ttm_operation_ctx ctx = { false, false };
+       struct ttm_placement placement = {};
+       struct amdgpu_vm_bo_base *bo_base;
+       int r;
+
+       if (bo->tbo.mem.mem_type == TTM_PL_SYSTEM)
+               return;
+
+       r = ttm_bo_validate(&bo->tbo, &placement, &ctx);
+       if (r) {
+               DRM_ERROR("Failed to invalidate DMA-buf import (%d)\n", r);
+               return;
+       }
+
+       for (bo_base = bo->vm_bo; bo_base; bo_base = bo_base->next) {
+               struct amdgpu_vm *vm = bo_base->vm;
+               struct dma_resv *resv = vm->root.base.bo->tbo.base.resv;
+
+               if (ticket) {

Yeah so this is kinda why I've been a total pain about the exact semantics
of the move_notify hook. I think we should flat-out require that importers
_always_ have a ticket attached when they call this, and that they can cope
with additional locks being taken (i.e. full EDEADLK handling).

Simplest way to force that contract is to add a dummy 2nd ww_mutex lock to
the dma_resv object, which we then can take #ifdef
CONFIG_DEBUG_WW_MUTEX_SLOWPATH. Plus maybe a WARN_ON(!ticket).

Now the real disaster is how we handle deadlocks. Two issues:

- Ideally we'd keep any lock we've taken locked until the end, it helps
   avoid needless backoffs. I've played around a bit with that but not
   even at PoC level, just an idea:

https://cgit.freedesktop.org/~danvet/drm/commit/?id=b1799c5a0f02df9e1bb08d27be37331255ab7582

   Idea is essentially to track a list of objects we had to lock as part of
   the ttm_bo_validate of the main object.

- Second one is if we get an EDEADLK on one of these sublocks (like the
   one here). We need to pass that up the entire callchain, including a
   temporary reference (we have to drop locks to do the ww_mutex_lock_slow
   call), and need a custom callback to drop that temporary reference
   (since that's all driver specific, might even be internal ww_mutex and
   not anything remotely looking like a normal dma_buf). This probably
   needs the exec util helpers from ttm, but at the dma_resv level, so that
   we can do something like this:

struct dma_resv_ticket {
        struct ww_acquire_ctx base;

        /* can be set by anyone (including other drivers) that got hold of
         * this ticket and had to acquire some new lock. This lock might
         * protect anything, including driver-internal stuff, and isn't
         * required to be a dma_buf or even just a dma_resv. */
        struct ww_mutex *contended_lock;

        /* callback which the driver (which might be a dma-buf exporter
         * and not matching the driver that started this locking ticket)
         * sets together with @contended_lock, for the main driver to drop
         * when it calls dma_resv_unlock on the contended_lock. */
        void (*drop_ref)(struct ww_mutex *contended_lock);
};

This is all supremely nasty (also ttm_bo_validate would need to be
improved to handle these sublocks and random new objects that could force
a ww_mutex_lock_slow).
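
To make the proposed hand-off concrete, here is a minimal userspace sketch of
the flow described above: a callee that hits a contended sublock records it in
the ticket together with a drop_ref callback and returns -EDEADLK, and the
top-level caller later slow-locks, unlocks, and drops the temporary reference.
Everything prefixed with mock_ is hypothetical and only stands in for the
kernel types; no real ww_mutex or dma_resv API is used, and the "slow lock"
simply assumes the other holder has released.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a ww_mutex; not the kernel type. */
struct mock_lock {
	int held_by_other;	/* 1 while another transaction holds the lock */
	int held_by_us;		/* 1 while our transaction holds the lock */
	int refcount;		/* references keeping the lock's owner alive */
};

/* Mirrors the dma_resv_ticket idea from the mail, using mock types. */
struct mock_ticket {
	struct mock_lock *contended_lock;
	void (*drop_ref)(struct mock_lock *contended_lock);
};

static void mock_drop_ref(struct mock_lock *lock)
{
	lock->refcount--;
}

/* Callee side: e.g. a move_notify-like hook that needs one more lock. */
static int mock_take_sublock(struct mock_ticket *ticket, struct mock_lock *lock)
{
	if (lock->held_by_other) {
		lock->refcount++;	/* temporary reference for the caller */
		ticket->contended_lock = lock;
		ticket->drop_ref = mock_drop_ref;
		return -EDEADLK;	/* passed up the whole call chain */
	}
	lock->held_by_us = 1;
	return 0;
}

/* Caller side: runs only after the top level dropped all its own locks. */
static void mock_lock_contended_slow(struct mock_ticket *ticket)
{
	struct mock_lock *lock = ticket->contended_lock;

	/* A real slow lock would sleep here until the holder releases. */
	lock->held_by_other = 0;
	lock->held_by_us = 1;
}

/* Caller side: unlock the contended lock and drop the temporary reference. */
static void mock_unlock_contended(struct mock_ticket *ticket)
{
	struct mock_lock *lock = ticket->contended_lock;

	if (!lock)
		return;
	lock->held_by_us = 0;
	ticket->drop_ref(lock);
	ticket->contended_lock = NULL;
	ticket->drop_ref = NULL;
}
```

The point of the callback is that the top-level driver never needs to know
what kind of object owns the contended lock; it only calls whatever the
callee stored in the ticket.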

Just a short comment on this:

Neither the currently used wait-die nor the wound-wait algorithm
*strictly* requires a slow lock on the contended lock. For wait-die it's
just very convenient since it makes us sleep instead of spinning with
-EDEADLK on the contended lock. For wound-wait IIRC one could just
immediately restart the whole locking transaction after an -EDEADLK, and
the transaction would automatically end up waiting on the contended
lock, provided the mutex lock stealing is not allowed. There is however
a possibility that the transaction will be wounded again on another
lock, taken before the contended lock, but I think there are ways to
improve the wound-wait algorithm to reduce that probability.

So in short, choosing the wound-wait algorithm instead of wait-die and
perhaps modifying the ww mutex code somewhat would probably help passing
an -EDEADLK up the call chain without requiring passing the contended
lock, as long as each locker releases its own locks when receiving an
-EDEADLK.
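
For readers less familiar with the two algorithms, the difference in who
receives the -EDEADLK can be sketched as a pair of decision functions. This is
a simplified userspace illustration, not the kernel's ww_mutex code: the enum,
the function names, and the convention that a lower stamp means an older
transaction are assumptions made for the example.

```c
#include <stdbool.h>

/* What happens when a requester finds the lock held by another transaction. */
enum ww_outcome {
	WW_REQUESTER_WAITS,	/* requester sleeps on the contended lock */
	WW_REQUESTER_BACKS_OFF,	/* requester gets -EDEADLK and drops its locks */
	WW_HOLDER_WOUNDED,	/* holder gets -EDEADLK at its next lock attempt */
};

/* Assumed convention: a lower stamp means an older transaction. */
static bool is_older(unsigned long a, unsigned long b)
{
	return a < b;
}

/* Wait-die: an older requester waits, a younger requester backs off. */
static enum ww_outcome wait_die(unsigned long requester, unsigned long holder)
{
	return is_older(requester, holder) ? WW_REQUESTER_WAITS
					   : WW_REQUESTER_BACKS_OFF;
}

/* Wound-wait: an older requester wounds the holder, a younger one waits. */
static enum ww_outcome wound_wait(unsigned long requester, unsigned long holder)
{
	return is_older(requester, holder) ? WW_HOLDER_WOUNDED
					   : WW_REQUESTER_WAITS;
}
```

Under wound-wait the transaction that has to back off is the current holder,
not the requester, which fits the argument above that a wounded transaction
can release its own locks and restart without being told which lock was
contended.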


/Thomas





Re: [Intel-gfx] [PATCH 5/5] drm/amdgpu: implement amdgpu_gem_prime_move_notify v2
