Re: [Intel-gfx] [PATCH 5/5] drm/amdgpu: implement amdgpu_gem_prime_move_notify v2
On 2/24/20 7:46 PM, Christian König wrote:
> On 2/23/20 5:54 PM, Thomas Hellström (VMware) wrote:
>> On 2/23/20 4:45 PM, Christian König wrote:
>>> On 2/21/20 6:12 PM, Daniel Vetter wrote:
>>> [SNIP]
>>>> Yeah, the Great Plan (tm) is to fully rely on ww_mutex slowly degenerating into essentially a global lock. But only when there's actual contention and thrashing.
>>> Yes, exactly. A really big problem in TTM currently is that we drop the lock after evicting BOs, because they tend to move in again directly after that. From practice I can also confirm that there is exactly zero benefit from dropping locks early and reacquiring them, for example for the VM page tables. That just makes it more likely that somebody needs to roll back, and that is what we need to avoid in the first place.
>> If you have a benchmarking setup available, it would be very interesting for future reference to see how changing from WD to WW mutexes affects the rollback frequency. WW is known to cause rollbacks much less frequently, but there is more work associated with each rollback.
> Not offhand. To be honest, I still have a hard time getting a grip on the difference between WD and WW from the algorithm point of view, so I can't judge that difference at all.

OK. I don't think a detailed understanding of the algorithms is strictly necessary, though. If we had a good test case, we'd just have to change DEFINE_WD_CLASS in dma-resv.c to DEFINE_WW_CLASS and benchmark the relevant figures.

>>> Contention on BO locks during command submission is perfectly fine as long as it is as lightweight as possible while we don't have thrashing. When we have thrashing, multi-submission performance is best achieved by just favoring a single process until it finishes its business and blocking everybody else.
>> Hmm. Sounds like we need a per-manager ww_rwsem protecting manager allocation, taken in write mode when there's thrashing, and in read mode otherwise.
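[Editor's note: since the WD/WW difference comes up just above, the core decision rules can be sketched in a few lines. This is a schematic userspace illustration, not the kernel's ww_mutex implementation; the function names are made up, and a lower stamp is assumed to mean an older transaction.]

```c
#include <stdbool.h>

/* Wait-Die: an older requester is allowed to wait; a younger requester
 * backs off ("dies") and rolls itself back. Rollbacks are more frequent,
 * but each one is cheap, since the requester only undoes its own work. */
static inline bool wait_die_requester_backs_off(unsigned long requester,
						unsigned long holder)
{
	return requester > holder;	/* younger than holder -> die */
}

/* Wound-Wait: a younger requester waits; an older requester "wounds"
 * the younger lock holder, forcing the *holder* to roll back instead.
 * Rollbacks are rarer, but each one costs more, since another
 * transaction is interrupted mid-flight. */
static inline bool wound_wait_holder_is_wounded(unsigned long requester,
						unsigned long holder)
{
	return requester < holder;	/* older than holder -> wound it */
}
```

The asymmetry in who rolls back is exactly the trade-off mentioned above: WW rolls back less often, but each rollback involves more work.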
>> That would limit the number of "unnecessary" locks we'd have to keep and reduce unwanted side effects (see below):
> Well, per-manager (you mean per-domain here, don't you?) yes. That doesn't sound all that useful, because we rarely use only one domain,

Well, the difference compared to keeping all locks would boil down to: "Instead of keeping the locks of all BOs we evict from thrashing domains, keep the locks of all thrashing domains in write mode". This means that for domains that are not thrashing, we'd just keep read locks.

> but I'm actually questioning, and have been for quite a while, whether the per-BO lock scheme was the right approach. See, from the performance aspect the closest to an ideal solution I can think of would be a ww_rwsem per user of a resource. In other words, we don't lock BOs, but instead a list of all their users, and when you want to evict a BO you need to walk that list and inform all users that the BO will be moving. During command submission you then have the fast path, which just grabs the read side of the user lock and checks that all BOs are still in the expected place. If some BOs were evicted, you back off and start the slow path, e.g. maybe even copy additional data from userspace, then grab the write side of the lock, etc. That approach is similar to what we use in amdgpu with the per-VM BOs, but goes a step further. The problem is that we are so used to per-BO locks in the kernel that this is probably not doable any more.

I think we need some sort of per-BO lock to protect BO metadata. But yes, relying solely on them to resolve other resource (domain) contention may not be (or rather, probably isn't) the right choice.

>>> Because of this I would actually vote for forbidding the release of individual ww_mutex locks within a context.
>> Yes, I see the problem. But my first reaction is that this might have undesirable side effects. Let's say somebody wanted to swap the evicted BOs out?
> Please explain further; offhand I don't see the problem here.
Let's say thread A evicts a lot of thread B's BOs and keeps the locks of those BOs for a prolonged time. Then thread C needs memory and wants to swap thread B's BOs out. It can't, or at least not without a certain delay, because thread A unnecessarily holds the locks.

> In general I actually wanted to rework TTM in such a way that BOs in the SYSTEM/SWAPABLE domain are always backed by a shmem file instead of the struct page array we currently have.

That would probably work well, provided there are no SYSTEM + write-combined users anymore. Typically, in the old AGP days, you wouldn't change the caching mode when evicting from write-combined AGP to SYSTEM, because of the dead slow wbinvd() operation.

>> Or CPU writes to them causing faults, which might also block mmap_sem, which in turn blocks khugepaged?
> Mhm, I only have a high-level view of how khugepaged works too, so why does it grab mmap_sem on the write side?

If I understand it correctly, it's needed when collapsing PMD directories to huge PMD pages. But this w
Re: [Intel-gfx] [PATCH 5/5] drm/amdgpu: implement amdgpu_gem_prime_move_notify v2
On 2/23/20 4:45 PM, Christian König wrote:
[SNIP]
> Because of this I would actually vote for forbidding the release of individual ww_mutex locks within a context.

Yes, I see the problem. But my first reaction is that this might have undesirable side effects. Let's say somebody wanted to swap the evicted BOs out? Or CPU writes to them causing faults, which might also block mmap_sem, which in turn blocks khugepaged?

Still, it's a fairly simple solution to a problem that seems otherwise hard to solve efficiently.

Thanks,
Thomas
Re: [Intel-gfx] [PATCH 5/5] drm/amdgpu: implement amdgpu_gem_prime_move_notify v2
On 2/17/20 6:55 PM, Daniel Vetter wrote:
> On Mon, Feb 17, 2020 at 04:45:09PM +0100, Christian König wrote:
>> Implement the importer side of unpinned DMA-buf handling.
>>
>> v2: update page tables immediately
>>
>> Signed-off-by: Christian König
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 66 -
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  6 ++
>>  2 files changed, 71 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
>> index 770baba621b3..48de7624d49c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
>> @@ -453,7 +453,71 @@ amdgpu_dma_buf_create_obj(struct drm_device *dev, struct dma_buf *dma_buf)
>>  	return ERR_PTR(ret);
>>  }
>>
>> +/**
>> + * amdgpu_dma_buf_move_notify - _notify implementation
>> + *
>> + * @attach: the DMA-buf attachment
>> + *
>> + * Invalidate the DMA-buf attachment, making sure that we re-create the
>> + * mapping before the next use.
>> + */
>> +static void
>> +amdgpu_dma_buf_move_notify(struct dma_buf_attachment *attach)
>> +{
>> +	struct drm_gem_object *obj = attach->importer_priv;
>> +	struct ww_acquire_ctx *ticket = dma_resv_locking_ctx(obj->resv);
>> +	struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
>> +	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>> +	struct ttm_operation_ctx ctx = { false, false };
>> +	struct ttm_placement placement = {};
>> +	struct amdgpu_vm_bo_base *bo_base;
>> +	int r;
>> +
>> +	if (bo->tbo.mem.mem_type == TTM_PL_SYSTEM)
>> +		return;
>> +
>> +	r = ttm_bo_validate(&bo->tbo, &placement, &ctx);
>> +	if (r) {
>> +		DRM_ERROR("Failed to invalidate DMA-buf import (%d)\n", r);
>> +		return;
>> +	}
>> +
>> +	for (bo_base = bo->vm_bo; bo_base; bo_base = bo_base->next) {
>> +		struct amdgpu_vm *vm = bo_base->vm;
>> +		struct dma_resv *resv = vm->root.base.bo->tbo.base.resv;
>> +
>> +		if (ticket) {
> Yeah, so this is kinda why I've been a total pain about the exact semantics of the move_notify hook.
I think we should flat-out require that importers _always_ have a ticket attach when they call this, and that they can cope with additional locks being taken (i.e. full EDEADLK) handling. Simplest way to force that contract is to add a dummy 2nd ww_mutex lock to the dma_resv object, which we then can take #ifdef CONFIG_WW_MUTEX_SLOWPATH_DEBUG. Plus maybe a WARN_ON(!ticket). Now the real disaster is how we handle deadlocks. Two issues: - Ideally we'd keep any lock we've taken locked until the end, it helps avoid needless backoffs. I've played around a bit with that but not even poc level, just an idea: https://cgit.freedesktop.org/~danvet/drm/commit/?id=b1799c5a0f02df9e1bb08d27be37331255ab7582 Idea is essentially to track a list of objects we had to lock as part of the ttm_bo_validate of the main object. - Second one is if we get an EDEADLK on one of these sublocks (like the one here). We need to pass that up the entire callchain, including a temporary reference (we have to drop locks to do the ww_mutex_lock_slow call), and need a custom callback to drop that temporary reference (since that's all driver specific, might even be internal ww_mutex and not anything remotely looking like a normal dma_buf). This probably needs the exec util helpers from ttm, but at the dma_resv level, so that we can do something like this: struct dma_resv_ticket { struct ww_acquire_ctx base; /* can be set by anyone (including other drivers) that got hold of * this ticket and had to acquire some new lock. This lock might * protect anything, including driver-internal stuff, and isn't * required to be a dma_buf or even just a dma_resv. */ struct ww_mutex *contended_lock; /* callback which the driver (which might be a dma-buf exporter * and not matching the driver that started this locking ticket) * sets together with @contended_lock, for the main driver to drop * when it calls dma_resv_unlock on the contended_lock.
*/ void (*drop_ref)(struct ww_mutex *contended_lock); }; This is all supremely nasty (also ttm_bo_validate would need to be improved to handle these sublocks and random new objects that could force a ww_mutex_lock_slow). Just a short comment on this: Neither the currently used wait-die nor the wound-wait algorithm *strictly* requires a slow lock on the contended lock. For wait-die it's just very convenient since it makes us sleep instead of spinning with -EDEADLK on the contended lock. For wound-wait IIRC one could just immediately restart the whole locking transaction after an -EDEADLK, and the transaction would automatically end up waiting on the contended lock, provided the mutex lock stealing is not allowed. There is however a possibility that the transaction will be wounded again on another lock, taken before the contended lock,
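The wait-die vs wound-wait back-off rule discussed above can be condensed into a tiny decision function. This is a hedged, illustrative userspace sketch, not kernel code: the `ticket_t` stamps stand in for ww_acquire_ctx stamps, and both function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sketch (not the kernel implementation): in both
 * algorithms the older transaction (lower ticket/stamp) wins on
 * contention.  Under wait-die the younger contender backs off with
 * -EDEADLK and restarts; under wound-wait the older contender instead
 * "wounds" the younger lock holder, so only younger transactions ever
 * roll back. */
typedef unsigned long ticket_t; /* lower value == older transaction */

/* Wait-die: may the contender sleep on the lock, or must it die? */
static bool wait_die_may_wait(ticket_t contender, ticket_t holder)
{
    return contender < holder; /* older waits, younger dies */
}

/* Wound-wait: does the contender wound the current holder? */
static bool wound_wait_wounds_holder(ticket_t contender, ticket_t holder)
{
    return contender < holder; /* older wounds younger holder */
}
```

This also illustrates why wound-wait rolls back less often: the roll-back is pushed onto the (younger) holder only when an older transaction actually arrives, instead of every younger contender dying preemptively.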
Re: [Intel-gfx] [PATCH] drm/auth: Drop master_create/destroy hooks
On 1/27/20 11:02 AM, Daniel Vetter wrote: vmwgfx stopped using them. With the drm device model that we've slowly evolved over the past few years master status essentially controls access to display resources, and nothing else. Since that's a pure access permission check drivers should have no need at all to track additional state on a per file basis. Aside: For cleanup and restoring kernel-internal clients the grand plan is to move everyone over to drm_client and drm_master_internal_acquire/release, like the generic fbdev code already does. That should get rid of most ->lastclose implementations, and I think also subsumes any processing vmwgfx does in master_set/drop. Cc: "Thomas Hellström (VMware)" Signed-off-by: Daniel Vetter Reviewed-by: Thomas Hellstrom ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm: Avoid drm_global_mutex for simple inc/dec of dev->open_count
On 1/24/20 7:39 PM, Chris Wilson wrote: Quoting Thomas Hellström (VMware) (2020-01-24 13:37:47) On 1/24/20 2:01 PM, Chris Wilson wrote: Since drm_global_mutex is a true global mutex across devices, we don't want to acquire it unless absolutely necessary. For maintaining the device local open_count, we can use atomic operations on the counter itself, except when making the transition to/from 0. Here, we tackle the easy portion of delaying acquiring the drm_global_mutex for the final release by using atomic_dec_and_mutex_lock(), leaving the global serialisation across the device opens. Signed-off-by: Chris Wilson Cc: Thomas Hellström (VMware) For the series: Reviewed-by: Thomas Hellström Now being opt-in, it is fairly limited in scope and will not randomly break others (touch wood) and the close() racing in BAT didn't throw anything up, so pushed to drm-misc-next. Thanks for the review and suggestions, Next task is to suggest others might like to use it as well. -Chris Thanks. I'll look at doing the same for those drivers I audited. /Thomas
Re: [Intel-gfx] [PATCH] drm: Avoid drm_global_mutex for simple inc/dec of dev->open_count
On 1/24/20 2:01 PM, Chris Wilson wrote: Since drm_global_mutex is a true global mutex across devices, we don't want to acquire it unless absolutely necessary. For maintaining the device local open_count, we can use atomic operations on the counter itself, except when making the transition to/from 0. Here, we tackle the easy portion of delaying acquiring the drm_global_mutex for the final release by using atomic_dec_and_mutex_lock(), leaving the global serialisation across the device opens. Signed-off-by: Chris Wilson Cc: Thomas Hellström (VMware) For the series: Reviewed-by: Thomas Hellström Now the only remaining (though pre-existing) problem I can see is that there is no corresponding mutex lock in drm_open() so that firstopen might race with lastclose.. Or I might be missing something.. /Thomas
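The atomic_dec_and_mutex_lock() shape the patch relies on can be sketched in userspace C11. This is illustrative only — the pthread/stdatomic mapping and the `dec_and_lock` name are mine, not the kernel implementation:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Decrement *count; only when it hits zero, return true with *lock
 * held.  Non-final releases stay lock-free, mirroring the semantics of
 * atomic_dec_and_mutex_lock(). */
static bool dec_and_lock(atomic_int *count, pthread_mutex_t *lock)
{
    int old = atomic_load(count);

    /* Fast path: while the count stays above 1, just decrement it. */
    while (old > 1) {
        if (atomic_compare_exchange_weak(count, &old, old - 1))
            return false; /* lock not taken */
    }

    /* Slow path: we may be the last reference; take the lock and
     * decrement under it so the 1 -> 0 transition is serialised. */
    pthread_mutex_lock(lock);
    if (atomic_fetch_sub(count, 1) == 1)
        return true; /* count hit zero, lock held by caller */
    pthread_mutex_unlock(lock);
    return false;
}

/* Two releases of an open_count of 2: only the final one takes the
 * global lock (the drm_release()/lastclose analogue). */
static int demo(void)
{
    atomic_int count;
    bool locked_on_first, locked_on_last;

    atomic_init(&count, 2);
    locked_on_first = dec_and_lock(&count, &global_lock);
    locked_on_last = dec_and_lock(&count, &global_lock);
    if (locked_on_last)
        pthread_mutex_unlock(&global_lock);

    return !locked_on_first && locked_on_last && atomic_load(&count) == 0;
}
```

The design point matches the discussion above: contention on the true global mutex is confined to the to/from-zero transitions, so routine opens and releases never serialise across devices.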
Re: [Intel-gfx] [PATCH v2] drm: Release filp before global lock
On 1/23/20 11:21 PM, Chris Wilson wrote: The file is not part of the global drm resource and can be released prior to take the global mutex to drop the open_count (and potentially close) the drm device. As the global mutex is indeed global, not only within the device but across devices, a slow file release mechanism can bottleneck the entire system. However, inside drm_close_helper() there are a number of dev->driver callbacks that take the drm_device as the first parameter... Worryingly some of those callbacks may be (implicitly) depending on the global mutex. From a quick audit, via, sis and vmwgfx are safe, so for those Acked-by: Thomas Hellstrom Savage appears to be unsafe, due to unprotected access in the dma device member. Haven't audited i810 or potential other drivers affected. Perhaps it makes sense to enable lockfree filp release on a per-driver basis to begin with? /Thomas
Re: [Intel-gfx] [PATCH] drm: Release filp before global lock
On 1/22/20 11:00 PM, Chris Wilson wrote: Quoting Thomas Hellström (VMware) (2020-01-22 21:52:23) Hi, Chris, On 1/22/20 4:56 PM, Chris Wilson wrote: The file is not part of the global drm resource and can be released prior to take the global mutex to drop the open_count (and potentially close) the drm device. However, inside drm_close_helper() there are a number of dev->driver callbacks that take the drm_device as the first parameter... Worryingly some of those callbacks may be (implicitly) depending on the global mutex. I read this as you suspect that there are driver callbacks inside drm_close_helper() that might need the global mutex held? But then it wouldn't be safe to move the lock? Is there a strong motivation for moving the locking in the first place? Also a minor nit below: The number of processes stuck on 'D' due to mutex_lock() caught my attention while they were cleaning up files. I think everyone else will be less impressed if their driver was stuck because i915 was freeing a user's filp. Understood. Perhaps a short motivation in the log message? Signed-off-by: Chris Wilson --- drivers/gpu/drm/drm_file.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 92d16724f949..84ed313ee2e9 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -438,12 +438,12 @@ int drm_release(struct inode *inode, struct file *filp) struct drm_minor *minor = file_priv->minor; struct drm_device *dev = minor->dev; - mutex_lock(&drm_global_mutex); - DRM_DEBUG("open_count = %d\n", dev->open_count); The read of dev->open_count should still be inside the lock to be consistent with the value that is decremented below. Perhaps move the DRM_DEBUG()? Sure. Is it even worth a debug? Probably an old relic. I'm fine with letting it go. Thanks, Thomas -Chris
Re: [Intel-gfx] [PATCH] drm: Release filp before global lock
Hi, Chris, On 1/22/20 4:56 PM, Chris Wilson wrote: The file is not part of the global drm resource and can be released prior to take the global mutex to drop the open_count (and potentially close) the drm device. However, inside drm_close_helper() there are a number of dev->driver callbacks that take the drm_device as the first parameter... Worryingly some of those callbacks may be (implicitly) depending on the global mutex. I read this as you suspect that there are driver callbacks inside drm_close_helper() that might need the global mutex held? But then it wouldn't be safe to move the lock? Is there a strong motivation for moving the locking in the first place? Also a minor nit below: Signed-off-by: Chris Wilson --- drivers/gpu/drm/drm_file.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 92d16724f949..84ed313ee2e9 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -438,12 +438,12 @@ int drm_release(struct inode *inode, struct file *filp) struct drm_minor *minor = file_priv->minor; struct drm_device *dev = minor->dev; - mutex_lock(&drm_global_mutex); - DRM_DEBUG("open_count = %d\n", dev->open_count); The read of dev->open_count should still be inside the lock to be consistent with the value that is decremented below. Perhaps move the DRM_DEBUG()? drm_close_helper(filp); + mutex_lock(&drm_global_mutex); + if (!--dev->open_count) drm_lastclose(dev); Thanks, Thomas
Re: [Intel-gfx] [PATCH] drm/ttm: remove ttm_bo_wait_unreserved
On 8/22/19 4:24 PM, Thomas Hellström (VMware) wrote: On 8/22/19 4:02 PM, Koenig, Christian wrote: Am 22.08.19 um 15:06 schrieb Daniel Vetter: On Thu, Aug 22, 2019 at 07:56:56AM +, Koenig, Christian wrote: Am 22.08.19 um 08:49 schrieb Daniel Vetter: With nouveau fixed all ttm-using drivers have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer. Assuming I didn't screw up anything with my audit of course. v2: - Dont forget wu_mutex (Christian König) - Keep the mmap_sem-less wait optimization (Thomas) - Use _lock_interruptible to be good citizens (Thomas) Reviewed-by: Christian König btw I realized I didn't remove your r-b, since v1 was broken. For formality, can you pls reaffirm, or still something broken? My r-b is still valid. Only problem I see is that neither of us seems to have a good idea about the different VM_FAULT_* replies. I took a look in mm/gup.c. It seems like when using get_user_pages, VM_FAULT_RETRY will retry s/retry/return/ to a requesting caller telling it that a long wait was expected and not performed, whereas VM_FAULT_NOPAGE will just keep get_user_pages to spin. So the proposed patch should be correct from my understanding. If the fault originates from user-space, I guess either is fine. /Thomas
Re: [Intel-gfx] [PATCH] drm/ttm: remove ttm_bo_wait_unreserved
On 8/22/19 4:02 PM, Koenig, Christian wrote: Am 22.08.19 um 15:06 schrieb Daniel Vetter: On Thu, Aug 22, 2019 at 07:56:56AM +, Koenig, Christian wrote: Am 22.08.19 um 08:49 schrieb Daniel Vetter: With nouveau fixed all ttm-using drivers have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer. Assuming I didn't screw up anything with my audit of course. v2: - Dont forget wu_mutex (Christian König) - Keep the mmap_sem-less wait optimization (Thomas) - Use _lock_interruptible to be good citizens (Thomas) Reviewed-by: Christian König btw I realized I didn't remove your r-b, since v1 was broken. For formality, can you pls reaffirm, or still something broken? My r-b is still valid. Only problem I see is that neither of us seems to have a good idea about the different VM_FAULT_* replies. I took a look in mm/gup.c. It seems like when using get_user_pages, VM_FAULT_RETRY will retry to a requesting caller telling it that a long wait was expected and not performed, whereas VM_FAULT_NOPAGE will just keep get_user_pages to spin. So the proposed patch should be correct from my understanding. If the fault originates from user-space, I guess either is fine. /Thomas But that worked before so it should still work now, Christian.
Also from the other thread: Reviewed-by: Thomas Hellström Thanks, Daniel Signed-off-by: Daniel Vetter Cc: Christian Koenig Cc: Huang Rui Cc: Gerd Hoffmann Cc: "VMware Graphics" Cc: Thomas Hellstrom --- drivers/gpu/drm/ttm/ttm_bo.c | 36 --- drivers/gpu/drm/ttm/ttm_bo_util.c | 1 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 18 +--- include/drm/ttm/ttm_bo_api.h | 4 4 files changed, 5 insertions(+), 54 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 20ff56f27aa4..d1ce5d315d5b 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -162,7 +162,6 @@ static void ttm_bo_release_list(struct kref *list_kref) dma_fence_put(bo->moving); if (!ttm_bo_uses_embedded_gem_object(bo)) dma_resv_fini(&bo->base._resv); - mutex_destroy(&bo->wu_mutex); bo->destroy(bo); ttm_mem_global_free(bdev->glob->mem_glob, acc_size); } @@ -1319,7 +1318,6 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev, INIT_LIST_HEAD(&bo->ddestroy); INIT_LIST_HEAD(&bo->swap); INIT_LIST_HEAD(&bo->io_reserve_lru); - mutex_init(&bo->wu_mutex); bo->bdev = bdev; bo->type = type; bo->num_pages = num_pages; @@ -1954,37 +1952,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev) ; } EXPORT_SYMBOL(ttm_bo_swapout_all); - -/** - * ttm_bo_wait_unreserved - interruptible wait for a buffer object to become - * unreserved - * - * @bo: Pointer to buffer - */ -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo) -{ - int ret; - - /* -* In the absense of a wait_unlocked API, -* Use the bo::wu_mutex to avoid triggering livelocks due to -* concurrent use of this function. Note that this use of -* bo::wu_mutex can go away if we change locking order to -* mmap_sem -> bo::reserve.
-*/ - ret = mutex_lock_interruptible(&bo->wu_mutex); - if (unlikely(ret != 0)) - return -ERESTARTSYS; - if (!dma_resv_is_locked(bo->base.resv)) - goto out_unlock; - ret = dma_resv_lock_interruptible(bo->base.resv, NULL); - if (ret == -EINTR) - ret = -ERESTARTSYS; - if (unlikely(ret != 0)) - goto out_unlock; - dma_resv_unlock(bo->base.resv); - -out_unlock: - mutex_unlock(&bo->wu_mutex); - return ret; -} diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index fe81c565e7ef..82ea26a49959 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -508,7 +508,6 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo, INIT_LIST_HEAD(&fbo->base.lru); INIT_LIST_HEAD(&fbo->base.swap); INIT_LIST_HEAD(&fbo->base.io_reserve_lru); - mutex_init(&fbo->base.wu_mutex); fbo->base.moving = NULL; drm_vma_node_reset(&fbo->base.base.vma_node); atomic_set(&fbo->base.cpu_writers, 0); diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 76eedb963693..a61a35e57d1c 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,30 +125,22 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma; - /* -* Work around locking order reversal in fault / nopfn -* between mmap_sem and bo_reserve: Perform a trylock operation -* for reserve, and if it fails, retry the fault after waiting -
Re: [Intel-gfx] [PATCH 1/3] dma_resv: prime lockdep annotations
On 8/21/19 9:51 PM, Daniel Vetter wrote: On Wed, Aug 21, 2019 at 08:27:59PM +0200, Thomas Hellström (VMware) wrote: On 8/21/19 8:11 PM, Daniel Vetter wrote: On Wed, Aug 21, 2019 at 7:06 PM Thomas Hellström (VMware) wrote: On 8/21/19 6:34 PM, Daniel Vetter wrote: On Wed, Aug 21, 2019 at 05:54:27PM +0200, Thomas Hellström (VMware) wrote: On 8/20/19 4:53 PM, Daniel Vetter wrote: Full audit of everyone: - i915, radeon, amdgpu should be clean per their maintainers. - vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all. - panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean. - v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section. - vmwgfx has a bunch of ioctls that do their own copy_*_user: - vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe. - vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out. - a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there. Summary: vmwgfx seems to be fine too. - virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe. - qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. 
Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe. - A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up. Cc: Alex Deucher Cc: Christian König Cc: Chris Wilson Cc: Thomas Zimmermann Cc: Rob Herring Cc: Tomeu Vizoso Cc: Eric Anholt Cc: Dave Airlie Cc: Gerd Hoffmann Cc: Ben Skeggs Cc: "VMware Graphics" Cc: Thomas Hellstrom Signed-off-by: Daniel Vetter --- drivers/dma-buf/dma-resv.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 42a8f3f11681..3edca10d3faf 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -34,6 +34,7 @@ #include #include +#include /** * DOC: Reservation Object Overview @@ -107,6 +108,17 @@ void dma_resv_init(struct dma_resv *obj) _seqcount_class); RCU_INIT_POINTER(obj->fence, NULL); RCU_INIT_POINTER(obj->fence_excl, NULL); + + if (IS_ENABLED(CONFIG_LOCKDEP)) { + if (current->mm) + down_read(&current->mm->mmap_sem); + ww_mutex_lock(&obj->lock, NULL); + fs_reclaim_acquire(GFP_KERNEL); + fs_reclaim_release(GFP_KERNEL); + ww_mutex_unlock(&obj->lock); + if (current->mm) + up_read(&current->mm->mmap_sem); + } } EXPORT_SYMBOL(dma_resv_init); I assume if this would have been easily done and maintainable using only lockdep annotation instead of actually acquiring the locks, that would have been done? There's might_lock(), plus a pile of macros, but they don't map obviously, so pretty good chances I accidentally end up with the wrong type of annotation. Easier to just take the locks quickly, and stuff that all into a lockdep-only section to avoid overhead. Otherwise LGTM.
Reviewed-by: Thomas Hellström Will test this and let you know if it trips on vmwgfx, but it really shouldn't. Thanks, Daniel One thing that strikes me is that this puts restrictions on where you can actually initialize a dma_resv, even if locking orders are otherwise obeyed. But that might not be a big problem. Hm yeah ... the trouble is I need a non-kthread thread so that I have a current->mm. Otherwise I'd have put it into some init section with a temp dma_buf. And I kinda don't want to create a fake ->mm just for lockdep priming. I don't expect this to be a real problem in practice, since before you've called dma_resv_init the res
Re: [Intel-gfx] [PATCH] drm/ttm: remove ttm_bo_wait_unreserved
On 8/22/19 10:47 AM, Daniel Vetter wrote: On Thu, Aug 22, 2019 at 9:56 AM Koenig, Christian wrote: Am 22.08.19 um 08:49 schrieb Daniel Vetter: With nouveau fixed all ttm-using drivers have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer. Assuming I didn't screw up anything with my audit of course. v2: - Dont forget wu_mutex (Christian König) - Keep the mmap_sem-less wait optimization (Thomas) - Use _lock_interruptible to be good citizens (Thomas) Reviewed-by: Christian König Signed-off-by: Daniel Vetter Cc: Christian Koenig Cc: Huang Rui Cc: Gerd Hoffmann Cc: "VMware Graphics" Cc: Thomas Hellstrom --- drivers/gpu/drm/ttm/ttm_bo.c | 36 --- drivers/gpu/drm/ttm/ttm_bo_util.c | 1 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 18 +--- include/drm/ttm/ttm_bo_api.h | 4 4 files changed, 5 insertions(+), 54 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 20ff56f27aa4..d1ce5d315d5b 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -162,7 +162,6 @@ static void ttm_bo_release_list(struct kref *list_kref) dma_fence_put(bo->moving); if (!ttm_bo_uses_embedded_gem_object(bo)) dma_resv_fini(&bo->base._resv); - mutex_destroy(&bo->wu_mutex); bo->destroy(bo); ttm_mem_global_free(bdev->glob->mem_glob, acc_size); } @@ -1319,7 +1318,6 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev, INIT_LIST_HEAD(&bo->ddestroy); INIT_LIST_HEAD(&bo->swap); INIT_LIST_HEAD(&bo->io_reserve_lru); - mutex_init(&bo->wu_mutex); bo->bdev = bdev; bo->type = type; bo->num_pages = num_pages; @@ -1954,37 +1952,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev) ; } EXPORT_SYMBOL(ttm_bo_swapout_all); - -/** - * ttm_bo_wait_unreserved - interruptible wait for a buffer object to become - * unreserved - * - * @bo: Pointer to buffer - */ -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo) -{ - int ret; - - /* - * In the absense of a wait_unlocked API, - * Use the bo::wu_mutex to avoid triggering livelocks due to - * concurrent use of
this function. Note that this use of - * bo::wu_mutex can go away if we change locking order to - * mmap_sem -> bo::reserve. - */ - ret = mutex_lock_interruptible(&bo->wu_mutex); - if (unlikely(ret != 0)) - return -ERESTARTSYS; - if (!dma_resv_is_locked(bo->base.resv)) - goto out_unlock; - ret = dma_resv_lock_interruptible(bo->base.resv, NULL); - if (ret == -EINTR) - ret = -ERESTARTSYS; - if (unlikely(ret != 0)) - goto out_unlock; - dma_resv_unlock(bo->base.resv); - -out_unlock: - mutex_unlock(&bo->wu_mutex); - return ret; -} diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index fe81c565e7ef..82ea26a49959 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -508,7 +508,6 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo, INIT_LIST_HEAD(&fbo->base.lru); INIT_LIST_HEAD(&fbo->base.swap); INIT_LIST_HEAD(&fbo->base.io_reserve_lru); - mutex_init(&fbo->base.wu_mutex); fbo->base.moving = NULL; drm_vma_node_reset(&fbo->base.base.vma_node); atomic_set(&fbo->base.cpu_writers, 0); diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 76eedb963693..a61a35e57d1c 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,30 +125,22 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma; - /* - * Work around locking order reversal in fault / nopfn - * between mmap_sem and bo_reserve: Perform a trylock operation - * for reserve, and if it fails, retry the fault after waiting - * for the buffer to become unreserved. - */ if (unlikely(!dma_resv_trylock(bo->base.resv))) { if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { Not an expert on fault handling, but shouldn't this now be one if? E.g. if FAULT_FLAG_RETRY_NOWAIT is set we should return VM_FAULT_NOPAGE instead of VM_FAULT_RETRY. Honestly I have no idea at all about this stuff.
I just learned about the mmap_sem less retry now that Thomas pointed it out, and I have no idea how anything else here works. My approach has always been to just throw ridiculous amounts of really nasty tests at fault handling (including handling our own gtt mmaps to copy*user in relocs or gup for userptr and all that), and leave it at that :-) But really take that with a grain of salt, No idea either. It should be functionally equivalent to wha
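The flag-driven retry decision being debated here can be written down as a pure function. This is a hypothetical userspace mock — the flag and return names only mirror the kernel's FAULT_FLAG_* / VM_FAULT_* conventions, and the placement of NOWAIT handling follows the discussion above rather than any specific merged code:

```c
#include <assert.h>

#define FLAG_ALLOW_RETRY  (1u << 0)
#define FLAG_RETRY_NOWAIT (1u << 1)

enum fault_ret { FAULT_NOPAGE, FAULT_RETRY_WAITED, FAULT_RETRY_IMMEDIATE };

/* Sketch of what a fault handler may do when the BO trylock fails:
 * only when the core allows a retry AND did not ask for NOWAIT is the
 * handler permitted to drop mmap_sem and sleep on the reservation
 * before reporting RETRY.  With NOWAIT set it must report RETRY
 * without blocking, and with no retry allowed it reports NOPAGE,
 * which makes callers such as get_user_pages() simply re-run (spin
 * on) the fault. */
static enum fault_ret contended_fault(unsigned int flags)
{
    if (flags & FLAG_ALLOW_RETRY) {
        if (!(flags & FLAG_RETRY_NOWAIT)) {
            /* Drop mmap_sem here, block on the BO lock, then
             * report RETRY so the fault is re-run with the lock
             * likely available. */
            return FAULT_RETRY_WAITED;
        }
        /* Caller forbids sleeping: report RETRY immediately. */
        return FAULT_RETRY_IMMEDIATE;
    }
    /* No retry allowed: NOPAGE makes the caller re-run the fault. */
    return FAULT_NOPAGE;
}
```

Writing it this way makes the caller-side contract visible: the handler may only release mmap_sem (and thus return RETRY after a wait) when the fault core explicitly opted in via ALLOW_RETRY.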
Re: [Intel-gfx] [PATCH 3/3] drm/ttm: remove ttm_bo_wait_unreserved
On 8/21/19 4:47 PM, Daniel Vetter wrote: On Wed, Aug 21, 2019 at 4:27 PM Thomas Hellström (VMware) wrote: On 8/21/19 4:09 PM, Daniel Vetter wrote: On Wed, Aug 21, 2019 at 2:47 PM Thomas Hellström (VMware) wrote: On 8/21/19 2:40 PM, Thomas Hellström (VMware) wrote: On 8/20/19 4:53 PM, Daniel Vetter wrote: With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer. Assuming I didn't screw up anything with my audit of course. Signed-off-by: Daniel Vetter Cc: Christian Koenig Cc: Huang Rui Cc: Gerd Hoffmann Cc: "VMware Graphics" Cc: Thomas Hellstrom --- drivers/gpu/drm/ttm/ttm_bo.c| 34 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 + include/drm/ttm/ttm_bo_api.h| 1 - 3 files changed, 1 insertion(+), 60 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 20ff56f27aa4..a952dd624b06 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -1954,37 +1954,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev) ; } EXPORT_SYMBOL(ttm_bo_swapout_all); - -/** - * ttm_bo_wait_unreserved - interruptible wait for a buffer object to become - * unreserved - * - * @bo: Pointer to buffer - */ -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo) -{ -int ret; - -/* - * In the absense of a wait_unlocked API, - * Use the bo::wu_mutex to avoid triggering livelocks due to - * concurrent use of this function. Note that this use of - * bo::wu_mutex can go away if we change locking order to - * mmap_sem -> bo::reserve. 
- */ -ret = mutex_lock_interruptible(&bo->wu_mutex); -if (unlikely(ret != 0)) -return -ERESTARTSYS; -if (!dma_resv_is_locked(bo->base.resv)) -goto out_unlock; -ret = dma_resv_lock_interruptible(bo->base.resv, NULL); -if (ret == -EINTR) -ret = -ERESTARTSYS; -if (unlikely(ret != 0)) -goto out_unlock; -dma_resv_unlock(bo->base.resv); - -out_unlock: -mutex_unlock(&bo->wu_mutex); -return ret; -} diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 76eedb963693..505e1787aeea 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,31 +125,7 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma; -/* - * Work around locking order reversal in fault / nopfn - * between mmap_sem and bo_reserve: Perform a trylock operation - * for reserve, and if it fails, retry the fault after waiting - * for the buffer to become unreserved. - */ -if (unlikely(!dma_resv_trylock(bo->base.resv))) { -if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { -if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { -ttm_bo_get(bo); - up_read(&vmf->vma->vm_mm->mmap_sem); -(void) ttm_bo_wait_unreserved(bo); -ttm_bo_put(bo); -} - -return VM_FAULT_RETRY; -} - -/* - * If we'd want to change locking order to - * mmap_sem -> bo::reserve, we'd use a blocking reserve here - * instead of retrying the fault... - */ I think you should justify why the above code is removed, since the comments actually outline what to do if we change locking order. The code that's removed above is not for adjusting locking orders but to decrease the mm latency by releasing the mmap_sem while waiting for bo reserve which in turn might be waiting for GPU. At a minimum we should have a separate patch with justification. Note that the caller here ensures locking progress by adjusting the RETRY flags after a retry. That would be patches 1&2 in this series. Hmm? Those seem to touch only dma-buf and nouveau not ttm?
I mean this patch should look along the lines of (based on an older tree) to implement the new locking-order mmap_sem->reservation, Only nouveau was breaking was doing copy_*_user or get_user_pages while holding dma_resv locks, no one else. So nothing else needed to be changed. But patch 1 contains the full audit. I might have missed something. but to keep the mm latency optimization using the RETRY functionality: Still no idea why this is needed? All the comments here and the code and history seem like they've been about the mmap_sem vs dma_resv inversion between driver ioctls and fault handling here. Once that's officially fixed there's no reason to play games here and retry loops - previously that was necessary because the old ttm_bo_vm_fault had a busy spin and that's definitely not nice. If it's needed I think it should be a second patch on top, to keep this all clear. I had to audit an enormous amount of code, I'd like to make sure I didn't miss anything before we start to mak
Re: [Intel-gfx] [PATCH 1/3] dma_resv: prime lockdep annotations
On 8/21/19 6:34 PM, Daniel Vetter wrote: On Wed, Aug 21, 2019 at 05:54:27PM +0200, Thomas Hellström (VMware) wrote: On 8/20/19 4:53 PM, Daniel Vetter wrote: Full audit of everyone: - i915, radeon, amdgpu should be clean per their maintainers. - vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all. - panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean. - v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section. - vmwgfx has a bunch of ioctls that do their own copy_*_user: - vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe. - vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out. - a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there. Summary: vmwgfx seems to be fine too. - virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe. - qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. 
The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe. - A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up. Cc: Alex Deucher Cc: Christian König Cc: Chris Wilson Cc: Thomas Zimmermann Cc: Rob Herring Cc: Tomeu Vizoso Cc: Eric Anholt Cc: Dave Airlie Cc: Gerd Hoffmann Cc: Ben Skeggs Cc: "VMware Graphics" Cc: Thomas Hellstrom Signed-off-by: Daniel Vetter --- drivers/dma-buf/dma-resv.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 42a8f3f11681..3edca10d3faf 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -34,6 +34,7 @@ #include #include +#include /** * DOC: Reservation Object Overview @@ -107,6 +108,17 @@ void dma_resv_init(struct dma_resv *obj) &reservation_seqcount_class); RCU_INIT_POINTER(obj->fence, NULL); RCU_INIT_POINTER(obj->fence_excl, NULL); + + if (IS_ENABLED(CONFIG_LOCKDEP)) { + if (current->mm) + down_read(&current->mm->mmap_sem); + ww_mutex_lock(&obj->lock, NULL); + fs_reclaim_acquire(GFP_KERNEL); + fs_reclaim_release(GFP_KERNEL); + ww_mutex_unlock(&obj->lock); + if (current->mm) + up_read(&current->mm->mmap_sem); + } } EXPORT_SYMBOL(dma_resv_init); I assume if this would have been easily done and maintainable using only lockdep annotation instead of actually acquiring the locks, that would have been done? There's might_lock(), plus a pile of macros, but they don't map obviously, so pretty good chances I accidentally end up with the wrong type of annotation. Easier to just take the locks quickly, and stuff that all into a lockdep-only section to avoid overhead. Otherwise LGTM. Reviewed-by: Thomas Hellström Will test this and let you know if it trips on vmwgfx, but it really shouldn't.
Thanks, Daniel One thing that strikes me is that this puts restrictions on where you can actually initialize a dma_resv, even if locking orders are otherwise obeyed. But that might not be a big problem. /Thomas ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 3/3] drm/ttm: remove ttm_bo_wait_unreserved
On 8/21/19 5:14 PM, Daniel Vetter wrote: On Wed, Aug 21, 2019 at 5:03 PM Thomas Hellström (VMware) wrote: On 8/21/19 4:47 PM, Daniel Vetter wrote: On Wed, Aug 21, 2019 at 4:27 PM Thomas Hellström (VMware) wrote: On 8/21/19 4:09 PM, Daniel Vetter wrote: On Wed, Aug 21, 2019 at 2:47 PM Thomas Hellström (VMware) wrote: On 8/21/19 2:40 PM, Thomas Hellström (VMware) wrote: On 8/20/19 4:53 PM, Daniel Vetter wrote: With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer. Assuming I didn't screw up anything with my audit of course. Signed-off-by: Daniel Vetter Cc: Christian Koenig Cc: Huang Rui Cc: Gerd Hoffmann Cc: "VMware Graphics" Cc: Thomas Hellstrom --- drivers/gpu/drm/ttm/ttm_bo.c| 34 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 + include/drm/ttm/ttm_bo_api.h| 1 - 3 files changed, 1 insertion(+), 60 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 20ff56f27aa4..a952dd624b06 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -1954,37 +1954,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev) ; } EXPORT_SYMBOL(ttm_bo_swapout_all); - -/** - * ttm_bo_wait_unreserved - interruptible wait for a buffer object to become - * unreserved - * - * @bo: Pointer to buffer - */ -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo) -{ -int ret; - -/* - * In the absense of a wait_unlocked API, - * Use the bo::wu_mutex to avoid triggering livelocks due to - * concurrent use of this function. Note that this use of - * bo::wu_mutex can go away if we change locking order to - * mmap_sem -> bo::reserve. 
- */ -ret = mutex_lock_interruptible(&bo->wu_mutex); -if (unlikely(ret != 0)) -return -ERESTARTSYS; -if (!dma_resv_is_locked(bo->base.resv)) -goto out_unlock; -ret = dma_resv_lock_interruptible(bo->base.resv, NULL); -if (ret == -EINTR) -ret = -ERESTARTSYS; -if (unlikely(ret != 0)) -goto out_unlock; -dma_resv_unlock(bo->base.resv); - -out_unlock: -mutex_unlock(&bo->wu_mutex); -return ret; -} diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 76eedb963693..505e1787aeea 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,31 +125,7 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma; -/* - * Work around locking order reversal in fault / nopfn - * between mmap_sem and bo_reserve: Perform a trylock operation - * for reserve, and if it fails, retry the fault after waiting - * for the buffer to become unreserved. - */ -if (unlikely(!dma_resv_trylock(bo->base.resv))) { -if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { -if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { -ttm_bo_get(bo); - up_read(&vmf->vma->vm_mm->mmap_sem); -(void) ttm_bo_wait_unreserved(bo); -ttm_bo_put(bo); -} - -return VM_FAULT_RETRY; -} - -/* - * If we'd want to change locking order to - * mmap_sem -> bo::reserve, we'd use a blocking reserve here - * instead of retrying the fault... - */ I think you should justify why the above code is removed, since the comments actually outline what to do if we change locking order. The code that's removed above is not for adjusting locking orders but to decrease the mm latency by releasing the mmap_sem while waiting for bo reserve which in turn might be waiting for GPU. At a minimum we should have a separate patch with justification. Note that the caller here ensures locking progress by adjusting the RETRY flags after a retry. That would be patches 1&2 in this series. Hmm? Those seem to touch only dma-buf and nouveau not ttm?
I mean this patch should look along the lines of (based on an older tree) to implement the new locking-order mmap_sem->reservation, Only nouveau was breaking was doing copy_*_user or get_user_pages while holding dma_resv locks, no one else. So nothing else needed to be changed. But patch 1 contains the full audit. I might have missed something. but to keep the mm latency optimization using the RETRY functionality: Still no idea why this is needed? All the comments here and the code and history seem like they've been about the mmap_sem vs dma_resv inversion between driver ioctls and fault handling here. Once that's officially fixed there's no reason to play games here and retry loops - previously that was necessary because the old ttm_bo_vm_fault had a busy spin and that's definitely not nice. If it's needed I think it should be a second patch on top, to keep this all cle
Re: [Intel-gfx] [PATCH 1/3] dma_resv: prime lockdep annotations
On 8/21/19 8:11 PM, Daniel Vetter wrote: On Wed, Aug 21, 2019 at 7:06 PM Thomas Hellström (VMware) wrote: On 8/21/19 6:34 PM, Daniel Vetter wrote: On Wed, Aug 21, 2019 at 05:54:27PM +0200, Thomas Hellström (VMware) wrote: On 8/20/19 4:53 PM, Daniel Vetter wrote: Full audit of everyone: - i915, radeon, amdgpu should be clean per their maintainers. - vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all. - panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean. - v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section. - vmwgfx has a bunch of ioctls that do their own copy_*_user: - vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe. - vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out. - a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there. Summary: vmwgfx seems to be fine too. - virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe. - qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. 
The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe. - A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up. Cc: Alex Deucher Cc: Christian König Cc: Chris Wilson Cc: Thomas Zimmermann Cc: Rob Herring Cc: Tomeu Vizoso Cc: Eric Anholt Cc: Dave Airlie Cc: Gerd Hoffmann Cc: Ben Skeggs Cc: "VMware Graphics" Cc: Thomas Hellstrom Signed-off-by: Daniel Vetter --- drivers/dma-buf/dma-resv.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 42a8f3f11681..3edca10d3faf 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -34,6 +34,7 @@ #include #include +#include /** * DOC: Reservation Object Overview @@ -107,6 +108,17 @@ void dma_resv_init(struct dma_resv *obj) &reservation_seqcount_class); RCU_INIT_POINTER(obj->fence, NULL); RCU_INIT_POINTER(obj->fence_excl, NULL); + + if (IS_ENABLED(CONFIG_LOCKDEP)) { + if (current->mm) + down_read(&current->mm->mmap_sem); + ww_mutex_lock(&obj->lock, NULL); + fs_reclaim_acquire(GFP_KERNEL); + fs_reclaim_release(GFP_KERNEL); + ww_mutex_unlock(&obj->lock); + if (current->mm) + up_read(&current->mm->mmap_sem); + } } EXPORT_SYMBOL(dma_resv_init); I assume if this would have been easily done and maintainable using only lockdep annotation instead of actually acquiring the locks, that would have been done? There's might_lock(), plus a pile of macros, but they don't map obviously, so pretty good chances I accidentally end up with the wrong type of annotation. Easier to just take the locks quickly, and stuff that all into a lockdep-only section to avoid overhead. Otherwise LGTM. Reviewed-by: Thomas Hellström Will test this and let you know if it trips on vmwgfx, but it really shouldn't.
Thanks, Daniel One thing that strikes me is that this puts restrictions on where you can actually initialize a dma_resv, even if locking orders are otherwise obeyed. But that might not be a big problem. Hm yeah ... the trouble is I need a non-kthread thread so that I have a current->mm. Otherwise I'd have put it into some init section with a temp dma_buf. And I kinda don't want to create a fake ->mm just for lockdep priming. I don't expect this to be a real problem in practice, since before you've called dma_resv_init the reservation lock doesn't exist, so you can't hold it. And you've probably just allocated it, so fs_reclaim is going to be fine. And if you allocate dma_resv obj
Re: [Intel-gfx] [PATCH 3/3] drm/ttm: remove ttm_bo_wait_unreserved
On 8/20/19 4:53 PM, Daniel Vetter wrote: With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer. Assuming I didn't screw up anything with my audit of course. Signed-off-by: Daniel Vetter Cc: Christian Koenig Cc: Huang Rui Cc: Gerd Hoffmann Cc: "VMware Graphics" Cc: Thomas Hellstrom --- drivers/gpu/drm/ttm/ttm_bo.c| 34 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 + include/drm/ttm/ttm_bo_api.h| 1 - 3 files changed, 1 insertion(+), 60 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 20ff56f27aa4..a952dd624b06 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -1954,37 +1954,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev) ; } EXPORT_SYMBOL(ttm_bo_swapout_all); - -/** - * ttm_bo_wait_unreserved - interruptible wait for a buffer object to become - * unreserved - * - * @bo: Pointer to buffer - */ -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo) -{ - int ret; - - /* -* In the absense of a wait_unlocked API, -* Use the bo::wu_mutex to avoid triggering livelocks due to -* concurrent use of this function. Note that this use of -* bo::wu_mutex can go away if we change locking order to -* mmap_sem -> bo::reserve. 
-*/ - ret = mutex_lock_interruptible(&bo->wu_mutex); - if (unlikely(ret != 0)) - return -ERESTARTSYS; - if (!dma_resv_is_locked(bo->base.resv)) - goto out_unlock; - ret = dma_resv_lock_interruptible(bo->base.resv, NULL); - if (ret == -EINTR) - ret = -ERESTARTSYS; - if (unlikely(ret != 0)) - goto out_unlock; - dma_resv_unlock(bo->base.resv); - -out_unlock: - mutex_unlock(&bo->wu_mutex); - return ret; -} diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 76eedb963693..505e1787aeea 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,31 +125,7 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma; - /* -* Work around locking order reversal in fault / nopfn -* between mmap_sem and bo_reserve: Perform a trylock operation -* for reserve, and if it fails, retry the fault after waiting -* for the buffer to become unreserved. -*/ - if (unlikely(!dma_resv_trylock(bo->base.resv))) { - if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { - if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { - ttm_bo_get(bo); - up_read(&vmf->vma->vm_mm->mmap_sem); - (void) ttm_bo_wait_unreserved(bo); - ttm_bo_put(bo); - } - - return VM_FAULT_RETRY; - } - - /* -* If we'd want to change locking order to -* mmap_sem -> bo::reserve, we'd use a blocking reserve here -* instead of retrying the fault... -*/ I think you should justify why the above code is removed, since the comments actually outline what to do if we change locking order. The code that's removed above is not for adjusting locking orders but to decrease the mm latency by releasing the mmap_sem while waiting for bo reserve which in turn might be waiting for GPU. At a minimum we should have a separate patch with justification. Note that the caller here ensures locking progress by adjusting the RETRY flags after a retry. /Thomas - return VM_FAULT_NOPAGE; - } + dma_resv_lock(bo->base.resv, NULL); /* * Refuse to fault imported pages.
This should be handled diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h index 43c4929a2171..6b50e624e3e2 100644 --- a/include/drm/ttm/ttm_bo_api.h +++ b/include/drm/ttm/ttm_bo_api.h @@ -765,7 +765,6 @@ ssize_t ttm_bo_io(struct ttm_bo_device *bdev, struct file *filp, int ttm_bo_swapout(struct ttm_bo_global *glob, struct ttm_operation_ctx *ctx); void ttm_bo_swapout_all(struct ttm_bo_device *bdev); -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo); /** * ttm_bo_uses_embedded_gem_object - check if the given bo uses the
Re: [Intel-gfx] [PATCH 3/3] drm/ttm: remove ttm_bo_wait_unreserved
On 8/21/19 2:40 PM, Thomas Hellström (VMware) wrote: On 8/20/19 4:53 PM, Daniel Vetter wrote: With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer. Assuming I didn't screw up anything with my audit of course. Signed-off-by: Daniel Vetter Cc: Christian Koenig Cc: Huang Rui Cc: Gerd Hoffmann Cc: "VMware Graphics" Cc: Thomas Hellstrom --- drivers/gpu/drm/ttm/ttm_bo.c | 34 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 + include/drm/ttm/ttm_bo_api.h | 1 - 3 files changed, 1 insertion(+), 60 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 20ff56f27aa4..a952dd624b06 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -1954,37 +1954,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev) ; } EXPORT_SYMBOL(ttm_bo_swapout_all); - -/** - * ttm_bo_wait_unreserved - interruptible wait for a buffer object to become - * unreserved - * - * @bo: Pointer to buffer - */ -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo) -{ - int ret; - - /* - * In the absense of a wait_unlocked API, - * Use the bo::wu_mutex to avoid triggering livelocks due to - * concurrent use of this function. Note that this use of - * bo::wu_mutex can go away if we change locking order to - * mmap_sem -> bo::reserve. 
- */ - ret = mutex_lock_interruptible(&bo->wu_mutex); - if (unlikely(ret != 0)) - return -ERESTARTSYS; - if (!dma_resv_is_locked(bo->base.resv)) - goto out_unlock; - ret = dma_resv_lock_interruptible(bo->base.resv, NULL); - if (ret == -EINTR) - ret = -ERESTARTSYS; - if (unlikely(ret != 0)) - goto out_unlock; - dma_resv_unlock(bo->base.resv); - -out_unlock: - mutex_unlock(&bo->wu_mutex); - return ret; -} diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 76eedb963693..505e1787aeea 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,31 +125,7 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma; - /* - * Work around locking order reversal in fault / nopfn - * between mmap_sem and bo_reserve: Perform a trylock operation - * for reserve, and if it fails, retry the fault after waiting - * for the buffer to become unreserved. - */ - if (unlikely(!dma_resv_trylock(bo->base.resv))) { - if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { - if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { - ttm_bo_get(bo); - up_read(&vmf->vma->vm_mm->mmap_sem); - (void) ttm_bo_wait_unreserved(bo); - ttm_bo_put(bo); - } - - return VM_FAULT_RETRY; - } - - /* - * If we'd want to change locking order to - * mmap_sem -> bo::reserve, we'd use a blocking reserve here - * instead of retrying the fault... - */ I think you should justify why the above code is removed, since the comments actually outline what to do if we change locking order. The code that's removed above is not for adjusting locking orders but to decrease the mm latency by releasing the mmap_sem while waiting for bo reserve which in turn might be waiting for GPU. At a minimum we should have a separate patch with justification. Note that the caller here ensures locking progress by adjusting the RETRY flags after a retry.
And this last sentence also means we still can remove the wu-mutex when the locking order is changed, since the chance of livelock goes away. IIRC only a single RETRY spin is allowed by the mm core. /Thomas /Thomas - return VM_FAULT_NOPAGE; - } + dma_resv_lock(bo->base.resv, NULL); /* * Refuse to fault imported pages. This should be handled diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h index 43c4929a2171..6b50e624e3e2 100644 --- a/include/drm/ttm/ttm_bo_api.h +++ b/include/drm/ttm/ttm_bo_api.h @@ -765,7 +765,6 @@ ssize_t ttm_bo_io(struct ttm_bo_device *bdev, struct file *filp, int ttm_bo_swapout(struct ttm_bo_global *glob, struct ttm_operation_ctx *ctx); void ttm_bo_swapout_all(struct ttm_bo_device *bdev); -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo); /** * ttm_bo_uses_embedded_gem_object - check if the given bo uses the
Re: [Intel-gfx] [PATCH 3/3] drm/ttm: remove ttm_bo_wait_unreserved
On 8/21/19 4:09 PM, Daniel Vetter wrote: On Wed, Aug 21, 2019 at 2:47 PM Thomas Hellström (VMware) wrote: On 8/21/19 2:40 PM, Thomas Hellström (VMware) wrote: On 8/20/19 4:53 PM, Daniel Vetter wrote: With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer. Assuming I didn't screw up anything with my audit of course. Signed-off-by: Daniel Vetter Cc: Christian Koenig Cc: Huang Rui Cc: Gerd Hoffmann Cc: "VMware Graphics" Cc: Thomas Hellstrom --- drivers/gpu/drm/ttm/ttm_bo.c| 34 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 + include/drm/ttm/ttm_bo_api.h| 1 - 3 files changed, 1 insertion(+), 60 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 20ff56f27aa4..a952dd624b06 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -1954,37 +1954,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev) ; } EXPORT_SYMBOL(ttm_bo_swapout_all); - -/** - * ttm_bo_wait_unreserved - interruptible wait for a buffer object to become - * unreserved - * - * @bo: Pointer to buffer - */ -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo) -{ -int ret; - -/* - * In the absense of a wait_unlocked API, - * Use the bo::wu_mutex to avoid triggering livelocks due to - * concurrent use of this function. Note that this use of - * bo::wu_mutex can go away if we change locking order to - * mmap_sem -> bo::reserve. 
- */ -ret = mutex_lock_interruptible(&bo->wu_mutex); -if (unlikely(ret != 0)) -return -ERESTARTSYS; -if (!dma_resv_is_locked(bo->base.resv)) -goto out_unlock; -ret = dma_resv_lock_interruptible(bo->base.resv, NULL); -if (ret == -EINTR) -ret = -ERESTARTSYS; -if (unlikely(ret != 0)) -goto out_unlock; -dma_resv_unlock(bo->base.resv); - -out_unlock: -mutex_unlock(&bo->wu_mutex); -return ret; -} diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 76eedb963693..505e1787aeea 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,31 +125,7 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma; -/* - * Work around locking order reversal in fault / nopfn - * between mmap_sem and bo_reserve: Perform a trylock operation - * for reserve, and if it fails, retry the fault after waiting - * for the buffer to become unreserved. - */ -if (unlikely(!dma_resv_trylock(bo->base.resv))) { -if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { -if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { -ttm_bo_get(bo); - up_read(&vmf->vma->vm_mm->mmap_sem); -(void) ttm_bo_wait_unreserved(bo); -ttm_bo_put(bo); -} - -return VM_FAULT_RETRY; -} - -/* - * If we'd want to change locking order to - * mmap_sem -> bo::reserve, we'd use a blocking reserve here - * instead of retrying the fault... - */ I think you should justify why the above code is removed, since the comments actually outline what to do if we change locking order. The code that's removed above is not for adjusting locking orders but to decrease the mm latency by releasing the mmap_sem while waiting for bo reserve which in turn might be waiting for GPU. At a minimum we should have a separate patch with justification. Note that the caller here ensures locking progress by adjusting the RETRY flags after a retry. That would be patches 1&2 in this series. Hmm? Those seem to touch only dma-buf and nouveau not ttm?
I mean this patch should look along the lines of (based on an older tree) to implement the new locking-order mmap_sem->reservation, but to keep the mm latency optimization using the RETRY functionality: Thanks, Thomas diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 85f5bcbe0c76..68482c67b9f7 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,30 +125,20 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma; - /* -* Work around locking order reversal in fault / nopfn -* between mmap_sem and bo_reserve: Perform a trylock operation -* for reserve, and if it fails, retry the fault after waiting -* for the buffer to become unreserved. -*/ + /* Avoid blocking on reservation with mmap_sem held, if possible */ if (unlikely(!reservation_object_trylock(bo->base.resv))) { - if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { - if (!(
Re: [Intel-gfx] [PATCH 1/3] dma_resv: prime lockdep annotations
On 8/20/19 4:53 PM, Daniel Vetter wrote: Full audit of everyone: - i915, radeon, amdgpu should be clean per their maintainers. - vram helpers should be fine, they don't do command submission, so really no business holding struct_mutex while doing copy_*_user. But I haven't checked them all. - panfrost seems to dma_resv_lock only in panfrost_job_push, which looks clean. - v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(), copying from/to userspace happens all in v3d_lookup_bos which is outside of the critical section. - vmwgfx has a bunch of ioctls that do their own copy_*_user: - vmw_execbuf_process: First this does some copies in vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself. Then comes the usual ttm reserve/validate sequence, then actual submission/fencing, then unreserving, and finally some more copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of details, but looks all safe. - vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be seen, seems to only create a fence and copy it out. - a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be found there. Summary: vmwgfx seems to be fine too. - virtio: There's virtio_gpu_execbuffer_ioctl, which does all the copying from userspace before even looking up objects through their handles, so safe. Plus the getparam/getcaps ioctl, also both safe. - qxl only has qxl_execbuffer_ioctl, which calls into qxl_process_single_command. There's a lovely comment before the __copy_from_user_inatomic that the slowpath should be copied from i915, but I guess that never happened. Try not to be unlucky and get your CS data evicted between when it's written and the kernel tries to read it. The only other copy_from_user is for relocs, but those are done before qxl_release_reserve_list(), which seems to be the only thing reserving buffers (in the ttm/dma_resv sense) in that code. So looks safe. 
- A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this everywhere and needs to be fixed up. Cc: Alex Deucher Cc: Christian König Cc: Chris Wilson Cc: Thomas Zimmermann Cc: Rob Herring Cc: Tomeu Vizoso Cc: Eric Anholt Cc: Dave Airlie Cc: Gerd Hoffmann Cc: Ben Skeggs Cc: "VMware Graphics" Cc: Thomas Hellstrom Signed-off-by: Daniel Vetter --- drivers/dma-buf/dma-resv.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 42a8f3f11681..3edca10d3faf 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -34,6 +34,7 @@ #include #include +#include /** * DOC: Reservation Object Overview @@ -107,6 +108,17 @@ void dma_resv_init(struct dma_resv *obj) &reservation_seqcount_class); RCU_INIT_POINTER(obj->fence, NULL); RCU_INIT_POINTER(obj->fence_excl, NULL); + + if (IS_ENABLED(CONFIG_LOCKDEP)) { + if (current->mm) + down_read(&current->mm->mmap_sem); + ww_mutex_lock(&obj->lock, NULL); + fs_reclaim_acquire(GFP_KERNEL); + fs_reclaim_release(GFP_KERNEL); + ww_mutex_unlock(&obj->lock); + if (current->mm) + up_read(&current->mm->mmap_sem); + } } EXPORT_SYMBOL(dma_resv_init); I assume if this would have been easily done and maintainable using only lockdep annotation instead of actually acquiring the locks, that would have been done? Otherwise LGTM. Reviewed-by: Thomas Hellström Will test this and let you know if it trips on vmwgfx, but it really shouldn't. /Thomas
Re: [Intel-gfx] [PATCH 3/3] drm/ttm: remove ttm_bo_wait_unreserved
On 8/20/19 4:53 PM, Daniel Vetter wrote: With nouveau fixed all ttm-using drivers have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer. Assuming I didn't screw up anything with my audit of course. Signed-off-by: Daniel Vetter Cc: Christian Koenig Cc: Huang Rui Cc: Gerd Hoffmann Cc: "VMware Graphics" Cc: Thomas Hellstrom --- drivers/gpu/drm/ttm/ttm_bo.c| 34 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 + include/drm/ttm/ttm_bo_api.h| 1 - 3 files changed, 1 insertion(+), 60 deletions(-) + dma_resv_lock(bo->base.resv, NULL); interruptible, or at least killable. (IIRC killable is the right choice in fault code, even if TTM initially implemented interruptible to get reasonable Xorg "silken mouse" latency). Thanks, /Thomas
Re: [Intel-gfx] [PATCH 3/3] drm/ttm: remove ttm_bo_wait_unreserved
On 8/21/19 4:10 PM, Daniel Vetter wrote: On Wed, Aug 21, 2019 at 3:16 PM Thomas Hellström (VMware) wrote: On 8/20/19 4:53 PM, Daniel Vetter wrote: With nouveau fixed all ttm-using drivers have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer. Assuming I didn't screw up anything with my audit of course. Signed-off-by: Daniel Vetter Cc: Christian Koenig Cc: Huang Rui Cc: Gerd Hoffmann Cc: "VMware Graphics" Cc: Thomas Hellstrom --- drivers/gpu/drm/ttm/ttm_bo.c| 34 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 + include/drm/ttm/ttm_bo_api.h| 1 - 3 files changed, 1 insertion(+), 60 deletions(-) + dma_resv_lock(bo->base.resv, NULL); interruptible, or at least killable. (IIRC killable is the right choice in fault code, even if TTM initially implemented interruptible to get reasonable Xorg "silken mouse" latency). I think interruptible is fine. I chickened out of that for v1 because I always mix up the return code for _lock_interruptible() :-). IIRC I think the in-kernel users of fault() were unhappy with interruptible. (GUP?), but I guess it's better to use a bulk change at some point if necessary. /Thomas I'll add that for the next version too. -Daniel
Re: [Intel-gfx] [PATCH 3/3] drm/ttm: remove ttm_bo_wait_unreserved
On 8/21/19 5:22 PM, Daniel Vetter wrote: On Wed, Aug 21, 2019 at 5:19 PM Thomas Hellström (VMware) wrote: On 8/21/19 5:14 PM, Daniel Vetter wrote: On Wed, Aug 21, 2019 at 5:03 PM Thomas Hellström (VMware) wrote: On 8/21/19 4:47 PM, Daniel Vetter wrote: On Wed, Aug 21, 2019 at 4:27 PM Thomas Hellström (VMware) wrote: On 8/21/19 4:09 PM, Daniel Vetter wrote: On Wed, Aug 21, 2019 at 2:47 PM Thomas Hellström (VMware) wrote: On 8/21/19 2:40 PM, Thomas Hellström (VMware) wrote: On 8/20/19 4:53 PM, Daniel Vetter wrote: With nouveau fixed all ttm-using drives have the correct nesting of mmap_sem vs dma_resv, and we can just lock the buffer. Assuming I didn't screw up anything with my audit of course. Signed-off-by: Daniel Vetter Cc: Christian Koenig Cc: Huang Rui Cc: Gerd Hoffmann Cc: "VMware Graphics" Cc: Thomas Hellstrom --- drivers/gpu/drm/ttm/ttm_bo.c| 34 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 + include/drm/ttm/ttm_bo_api.h| 1 - 3 files changed, 1 insertion(+), 60 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 20ff56f27aa4..a952dd624b06 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -1954,37 +1954,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev) ; } EXPORT_SYMBOL(ttm_bo_swapout_all); - -/** - * ttm_bo_wait_unreserved - interruptible wait for a buffer object to become - * unreserved - * - * @bo: Pointer to buffer - */ -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo) -{ -int ret; - -/* - * In the absense of a wait_unlocked API, - * Use the bo::wu_mutex to avoid triggering livelocks due to - * concurrent use of this function. Note that this use of - * bo::wu_mutex can go away if we change locking order to - * mmap_sem -> bo::reserve. 
- */ -ret = mutex_lock_interruptible(&bo->wu_mutex); -if (unlikely(ret != 0)) -return -ERESTARTSYS; -if (!dma_resv_is_locked(bo->base.resv)) -goto out_unlock; -ret = dma_resv_lock_interruptible(bo->base.resv, NULL); -if (ret == -EINTR) -ret = -ERESTARTSYS; -if (unlikely(ret != 0)) -goto out_unlock; -dma_resv_unlock(bo->base.resv); - -out_unlock: -mutex_unlock(&bo->wu_mutex); -return ret; -} diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 76eedb963693..505e1787aeea 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -125,31 +125,7 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) &bdev->man[bo->mem.mem_type]; struct vm_area_struct cvma; -/* - * Work around locking order reversal in fault / nopfn - * between mmap_sem and bo_reserve: Perform a trylock operation - * for reserve, and if it fails, retry the fault after waiting - * for the buffer to become unreserved. - */ -if (unlikely(!dma_resv_trylock(bo->base.resv))) { -if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { -if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { -ttm_bo_get(bo); - up_read(&vmf->vma->vm_mm->mmap_sem); -(void) ttm_bo_wait_unreserved(bo); -ttm_bo_put(bo); -} - -return VM_FAULT_RETRY; -} - -/* - * If we'd want to change locking order to - * mmap_sem -> bo::reserve, we'd use a blocking reserve here - * instead of retrying the fault... - */ I think you should justify why the above code is removed, since the comments actually outline what to do if we change locking order. The code that's removed above is not for adjusting locking orders but to decrease the mm latency by releasing the mmap_sem while waiting for bo reserve which in turn might be waiting for GPU. At a minimum we should have a separate patch with justification. Note that the caller here ensures locking progress by adjusting the RETRY flags after a retry. That would be patches 1&2 in this series. Hmm? Those seem to touch only dma-buf and nouveau not ttm?
I mean this patch should look along the lines of (based on an older tree) to implement the new locking-order mmap_sem->reservation, Only nouveau was breaking was doing copy_*_user or get_user_pages while holding dma_resv locks, no one else. So nothing else needed to be changed. But patch 1 contains the full audit. I might have missed something. but to keep the mm latency optimization using the RETRY functionality: Still no idea why this is needed? All the comments here and the code and history seem like they've been about the mmap_sem vs dma_resv inversion between driver ioctls and fault handling here. Once that's officially fixed there's no reason to play games here and retry loops - previously that was necessary because the old ttm_bo_vm_fault