[Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [01/20] drm/i915: Preallocate stashes for vma page-directories
== Series Details ==

Series: series starting with [01/20] drm/i915: Preallocate stashes for vma page-directories
URL   : https://patchwork.freedesktop.org/series/79129/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8708 -> Patchwork_18082

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18082/index.html

Known issues
------------

  Here are the changes found in Patchwork_18082 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_suspend@basic-s0:
    - fi-ilk-650:         [PASS][1] -> [DMESG-WARN][2] ([i915#164])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8708/fi-ilk-650/igt@gem_exec_susp...@basic-s0.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18082/fi-ilk-650/igt@gem_exec_susp...@basic-s0.html

  * igt@gem_flink_basic@double-flink:
    - fi-tgl-y:           [PASS][3] -> [DMESG-WARN][4] ([i915#402]) +1 similar issue
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8708/fi-tgl-y/igt@gem_flink_ba...@double-flink.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18082/fi-tgl-y/igt@gem_flink_ba...@double-flink.html

  * igt@kms_busy@basic@flip:
    - fi-kbl-x1275:       [PASS][5] -> [DMESG-WARN][6] ([i915#62] / [i915#92] / [i915#95])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8708/fi-kbl-x1275/igt@kms_busy@ba...@flip.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18082/fi-kbl-x1275/igt@kms_busy@ba...@flip.html

#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s0:
    - fi-tgl-u2:          [FAIL][7] ([i915#1888]) -> [PASS][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8708/fi-tgl-u2/igt@gem_exec_susp...@basic-s0.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18082/fi-tgl-u2/igt@gem_exec_susp...@basic-s0.html

  * igt@i915_pm_backlight@basic-brightness:
    - fi-whl-u:           [DMESG-WARN][9] ([i915#95]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8708/fi-whl-u/igt@i915_pm_backli...@basic-brightness.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18082/fi-whl-u/igt@i915_pm_backli...@basic-brightness.html

  * igt@vgem_basic@setversion:
    - fi-tgl-y:           [DMESG-WARN][11] ([i915#402]) -> [PASS][12] +1 similar issue
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8708/fi-tgl-y/igt@vgem_ba...@setversion.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18082/fi-tgl-y/igt@vgem_ba...@setversion.html

#### Warnings ####

  * igt@gem_exec_suspend@basic-s0:
    - fi-kbl-x1275:       [DMESG-WARN][13] ([i915#62] / [i915#92]) -> [DMESG-WARN][14] ([i915#62] / [i915#92] / [i915#95]) +3 similar issues
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8708/fi-kbl-x1275/igt@gem_exec_susp...@basic-s0.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18082/fi-kbl-x1275/igt@gem_exec_susp...@basic-s0.html

  * igt@kms_cursor_legacy@basic-flip-after-cursor-varying-size:
    - fi-kbl-x1275:       [DMESG-WARN][15] ([i915#62] / [i915#92] / [i915#95]) -> [DMESG-WARN][16] ([i915#62] / [i915#92]) +2 similar issues
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8708/fi-kbl-x1275/igt@kms_cursor_leg...@basic-flip-after-cursor-varying-size.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18082/fi-kbl-x1275/igt@kms_cursor_leg...@basic-flip-after-cursor-varying-size.html

  [i915#164]: https://gitlab.freedesktop.org/drm/intel/issues/164
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62
  [i915#92]: https://gitlab.freedesktop.org/drm/intel/issues/92
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95

Participating hosts (43 -> 37)
------------------------------

  Additional (1): fi-skl-guc
  Missing    (7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-byt-clapper fi-bdw-samus

Build changes
-------------

  * Linux: CI_DRM_8708 -> Patchwork_18082

  CI-20190529: 20190529
  CI_DRM_8708: 170e94a1430fd0a4f0841ad0f7366904d52e49be @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5722: 9985cf23e9db9557bc7d714f5b72602e427497d3 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_18082: 33bcfafbc4ab5abd3e686c0e7b58aa7f7b61da4a @ git://anongit.freedesktop.org/gfx-ci/linux

== Linux commits ==

33bcfafbc4ab drm/i915: Track i915_vma with its own reference counter
f3622825131d drm/i915/gem: Replace i915_gem_object.mm.mutex with reservation_ww_class
0c763806ba4f drm/i915/gem: Pull execbuf dma resv under a single critical section
2036c3ef6f3e drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.
3a96506f63d3 drm/i915/gem: Reintroduce multiple passes for reloc processing
8b519af3185f
[Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [01/20] drm/i915: Preallocate stashes for vma page-directories
== Series Details ==

Series: series starting with [01/20] drm/i915: Preallocate stashes for vma page-directories
URL   : https://patchwork.freedesktop.org/series/79129/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.0
Fast mode used, each commit won't be checked separately.
+drivers/gpu/drm/i915/selftests/i915_syncmap.c:80:54: warning: dubious: x | !y

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/20] drm/i915: Preallocate stashes for vma page-directories
== Series Details ==

Series: series starting with [01/20] drm/i915: Preallocate stashes for vma page-directories
URL   : https://patchwork.freedesktop.org/series/79129/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
b5c5d0ce6d3d drm/i915: Preallocate stashes for vma page-directories
5ff5ae1fb079 drm/i915: Switch to object allocations for page directories
1da308be5b55 drm/i915/gem: Don't drop the timeline lock during execbuf
a8aa1e74f47e drm/i915/gem: Rename execbuf.bind_link to unbound_link
6779fcaed114 drm/i915/gem: Break apart the early i915_vma_pin from execbuf object lookup
c3cc3ce10691 drm/i915/gem: Remove the call for no-evict i915_vma_pin
acb85ac92a2f drm/i915: Add list_for_each_entry_safe_continue_reverse
-:21: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'pos' - possible side-effects?
#21: FILE: drivers/gpu/drm/i915/i915_utils.h:269:
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member) \
+	for (pos = list_prev_entry(pos, member), \
+	     n = list_prev_entry(pos, member); \
+	     &pos->member != (head); \
+	     pos = n, n = list_prev_entry(n, member))

-:21: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'n' - possible side-effects?
#21: FILE: drivers/gpu/drm/i915/i915_utils.h:269:
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member) \
+	for (pos = list_prev_entry(pos, member), \
+	     n = list_prev_entry(pos, member); \
+	     &pos->member != (head); \
+	     pos = n, n = list_prev_entry(n, member))

-:21: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'member' - possible side-effects?
#21: FILE: drivers/gpu/drm/i915/i915_utils.h:269:
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member) \
+	for (pos = list_prev_entry(pos, member), \
+	     n = list_prev_entry(pos, member); \
+	     &pos->member != (head); \
+	     pos = n, n = list_prev_entry(n, member))

total: 0 errors, 0 warnings, 3 checks, 12 lines checked
ed90e5840625 drm/i915: Always defer fenced work to the worker
03e637530b30 drm/i915/gem: Assign context id for async work
fe694efd1728 drm/i915: Export a preallocate variant of i915_active_acquire()
8669de2e382e drm/i915/gem: Separate the ww_mutex walker into its own list
a1fb11db685d drm/i915/gem: Asynchronous GTT unbinding
92aa37aeda01 drm/i915/gem: Bind the fence async for execbuf
6002974eb6ce drm/i915/gem: Include cmdparser in common execbuf pinning
8b519af3185f drm/i915/gem: Include secure batch in common execbuf pinning
3a96506f63d3 drm/i915/gem: Reintroduce multiple passes for reloc processing
-:1434: WARNING:MEMORY_BARRIER: memory barrier without comment
#1434: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c:161:
+	wmb();

total: 0 errors, 1 warnings, 0 checks, 1421 lines checked
2036c3ef6f3e drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.
-:59: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#59: new file mode 100644

-:354: WARNING:LINE_SPACING: Missing a blank line after declarations
#354: FILE: drivers/gpu/drm/i915/mm/st_acquire_ctx.c:106:
+	const unsigned int total = ARRAY_SIZE(dl->obj);
+	I915_RND_STATE(prng);

-:450: WARNING:YIELD: Using yield() is generally wrong. See yield() kernel-doc (sched/core.c)
#450: FILE: drivers/gpu/drm/i915/mm/st_acquire_ctx.c:202:
+	yield(); /* start all threads before we begin */

total: 0 errors, 3 warnings, 0 checks, 446 lines checked
0c763806ba4f drm/i915/gem: Pull execbuf dma resv under a single critical section
f3622825131d drm/i915/gem: Replace i915_gem_object.mm.mutex with reservation_ww_class
33bcfafbc4ab drm/i915: Track i915_vma with its own reference counter
-:2081: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#2081: FILE: drivers/gpu/drm/i915/gt/intel_gtt.h:254:
+	spinlock_t lock;

-:3963: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#3963: FILE: drivers/gpu/drm/i915/i915_vma.h:392:
+	spinlock_t lock;

-:4210: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#4210: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:419:
+					if (offset < hole_start + vma->size)

-:4221: WARNING:LONG_LINE: line length of 101 exceeds 100 columns
#4221: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:427:
+					__func__, p->name, err, npages, prime, offset,

-:4231: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#4231: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:444:
+
[Intel-gfx] [PATCH 09/20] drm/i915/gem: Assign context id for async work
Allocate a few dma fence context ids that we can use to associate async
work [for the CPU] launched on behalf of this context. For extra fun, we
allow a configurable concurrency width.

A current example would be that we spawn an unbound worker for every
userptr get_pages. In the future, we wish to charge this work to the
context that initiated the async work and to impose concurrency limits
based on the context.

Signed-off-by: Chris Wilson
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c       | 4
 drivers/gpu/drm/i915/gem/i915_gem_context.h       | 6 ++
 drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 6 ++
 3 files changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 41784df51e58..bd68746327b3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -714,6 +714,10 @@ __create_context(struct drm_i915_private *i915)
 	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
 	mutex_init(&ctx->mutex);
 
+	ctx->async.width = rounddown_pow_of_two(num_online_cpus());
+	ctx->async.context = dma_fence_context_alloc(ctx->async.width);
+	ctx->async.width--;
+
 	spin_lock_init(&ctx->stale.lock);
 	INIT_LIST_HEAD(&ctx->stale.engines);
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
index 3702b2fb27ab..e104ff0ae740 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
@@ -134,6 +134,12 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data,
 				       struct drm_file *file);
 
+static inline u64 i915_gem_context_async_id(struct i915_gem_context *ctx)
+{
+	return (ctx->async.context +
+		(atomic_fetch_inc(&ctx->async.cur) & ctx->async.width));
+}
+
 static inline struct i915_gem_context *
 i915_gem_context_get(struct i915_gem_context *ctx)
 {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index ae14ca24a11f..52561f98000f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -85,6 +85,12 @@ struct i915_gem_context {
 	struct intel_timeline *timeline;
 
+	struct {
+		u64 context;
+		atomic_t cur;
+		unsigned int width;
+	} async;
+
 	/**
 	 * @vm: unique address space (GTT)
 	 *
-- 
2.20.1
[Intel-gfx] [PATCH 05/20] drm/i915/gem: Break apart the early i915_vma_pin from execbuf object lookup
As a prelude to the next step, where we want to perform all the object
allocations together under the same lock, we must first delay the
i915_vma_pin() as it implicitly does the allocations for us, one by one.
Because it performs the allocations one by one, it is not allowed to
wait/evict; by pulling all the allocations together, the entire set can
be scheduled as one.

Signed-off-by: Chris Wilson
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 70 +++
 1 file changed, 39 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index bf8193d9e279..35a57c1fc9c3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -33,6 +33,8 @@ struct eb_vma {
 	/** This vma's place in the execbuf reservation list */
 	struct drm_i915_gem_exec_object2 *exec;
+
+	struct list_head bind_link;
 	struct list_head unbound_link;
 	struct list_head reloc_link;
 
@@ -240,8 +242,8 @@ struct i915_execbuffer {
 	/** actual size of execobj[] as we may extend it for the cmdparser */
 	unsigned int buffer_count;
 
-	/** list of vma not yet bound during reservation phase */
-	struct list_head unbound;
+	/** list of all vma required to bound for this execbuf */
+	struct list_head bind_list;
 
 	/** list of vma that have execobj.relocation_count */
 	struct list_head relocs;
@@ -565,6 +567,8 @@ eb_add_vma(struct i915_execbuffer *eb,
 			  eb->lut_size)]);
 	}
 
+	list_add_tail(&ev->bind_link, &eb->bind_list);
+
 	if (entry->relocation_count)
 		list_add_tail(&ev->reloc_link, &eb->relocs);
 
@@ -586,16 +590,6 @@ eb_add_vma(struct i915_execbuffer *eb,
 		eb->batch = ev;
 	}
-
-	if (eb_pin_vma(eb, entry, ev)) {
-		if (entry->offset != vma->node.start) {
-			entry->offset = vma->node.start | UPDATE;
-			eb->args->flags |= __EXEC_HAS_RELOC;
-		}
-	} else {
-		eb_unreserve_vma(ev);
-		list_add_tail(&ev->unbound_link, &eb->unbound);
-	}
 }
 
 static int eb_reserve_vma(const struct i915_execbuffer *eb,
@@ -670,13 +664,31 @@ static int wait_for_timeline(struct intel_timeline *tl)
 	} while (1);
 }
 
-static int eb_reserve(struct i915_execbuffer *eb)
+static int eb_reserve_vm(struct i915_execbuffer *eb)
 {
-	const unsigned int count = eb->buffer_count;
 	unsigned int pin_flags = PIN_USER | PIN_NONBLOCK;
-	struct list_head last;
+	struct list_head last, unbound;
 	struct eb_vma *ev;
-	unsigned int i, pass;
+	unsigned int pass;
+
+	INIT_LIST_HEAD(&unbound);
+	list_for_each_entry(ev, &eb->bind_list, bind_link) {
+		struct drm_i915_gem_exec_object2 *entry = ev->exec;
+		struct i915_vma *vma = ev->vma;
+
+		if (eb_pin_vma(eb, entry, ev)) {
+			if (entry->offset != vma->node.start) {
+				entry->offset = vma->node.start | UPDATE;
+				eb->args->flags |= __EXEC_HAS_RELOC;
+			}
+		} else {
+			eb_unreserve_vma(ev);
+			list_add_tail(&ev->unbound_link, &unbound);
+		}
+	}
+
+	if (list_empty(&unbound))
+		return 0;
 
 	/*
 	 * Attempt to pin all of the buffers into the GTT.
@@ -699,7 +711,7 @@ static int eb_reserve(struct i915_execbuffer *eb)
 		if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex))
 			return -EINTR;
 
-		list_for_each_entry(ev, &eb->unbound, unbound_link) {
+		list_for_each_entry(ev, &unbound, unbound_link) {
 			err = eb_reserve_vma(eb, ev, pin_flags);
 			if (err)
 				break;
@@ -710,13 +722,11 @@ static int eb_reserve(struct i915_execbuffer *eb)
 		}
 
 		/* Resort *all* the objects into priority order */
-		INIT_LIST_HEAD(&eb->unbound);
+		INIT_LIST_HEAD(&unbound);
 		INIT_LIST_HEAD(&last);
-		for (i = 0; i < count; i++) {
-			unsigned int flags;
+		list_for_each_entry(ev, &eb->bind_list, bind_link) {
+			unsigned int flags = ev->flags;
 
-			ev = &eb->vma[i];
-			flags = ev->flags;
 			if (flags & EXEC_OBJECT_PINNED &&
 			    flags & __EXEC_OBJECT_HAS_PIN)
 				continue;
@@ -725,17 +735,17 @@ static int eb_reserve(struct i915_execbuffer *eb)
 
 			if (flags & EXEC_OBJECT_PINNED)
 				/* Pi
[Intel-gfx] [PATCH 15/20] drm/i915/gem: Include secure batch in common execbuf pinning
Pull the GGTT binding for the secure batch dispatch into the common vma
pinning routine for execbuf, so that there is just a single central
place for all i915_vma_pin().

Signed-off-by: Chris Wilson
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 88 +++
 1 file changed, 51 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 8e4681427ce3..629b736adf2c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1648,6 +1648,48 @@ static int eb_alloc_cmdparser(struct i915_execbuffer *eb)
 	return err;
 }
 
+static int eb_secure_batch(struct i915_execbuffer *eb)
+{
+	struct i915_vma *vma = eb->batch->vma;
+
+	/*
+	 * snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
+	 * batch" bit. Hence we need to pin secure batches into the global gtt.
+	 * hsw should have this fixed, but bdw mucks it up again.
+	 */
+	if (!(eb->batch_flags & I915_DISPATCH_SECURE))
+		return 0;
+
+	if (GEM_WARN_ON(vma->vm != &eb->engine->gt->ggtt->vm)) {
+		struct eb_vma *ev;
+
+		ev = kzalloc(sizeof(*ev), GFP_KERNEL);
+		if (!ev)
+			return -ENOMEM;
+
+		vma = i915_vma_instance(vma->obj,
+					&eb->engine->gt->ggtt->vm,
+					NULL);
+		if (IS_ERR(vma)) {
+			kfree(ev);
+			return PTR_ERR(vma);
+		}
+
+		ev->vma = i915_vma_get(vma);
+		ev->exec = &no_entry;
+
+		list_add(&ev->submit_link, &eb->submit_list);
+		list_add(&ev->reloc_link, &eb->array->aux_list);
+		list_add(&ev->bind_link, &eb->bind_list);
+
+		GEM_BUG_ON(eb->batch->vma->private);
+		eb->batch = ev;
+	}
+
+	eb->batch->flags |= EXEC_OBJECT_NEEDS_GTT;
+	return 0;
+}
+
 static unsigned int eb_batch_index(const struct i915_execbuffer *eb)
 {
 	if (eb->args->flags & I915_EXEC_BATCH_FIRST)
@@ -2442,6 +2484,10 @@ static int eb_relocate(struct i915_execbuffer *eb)
 	if (err)
 		return err;
 
+	err = eb_secure_batch(eb);
+	if (err)
+		return err;
+
 	err = eb_reserve_vm(eb);
 	if (err)
 		return err;
@@ -2796,7 +2842,7 @@ add_to_client(struct i915_request *rq, struct drm_file *file)
 	spin_unlock(&file_priv->mm.lock);
 }
 
-static int eb_submit(struct i915_execbuffer *eb, struct i915_vma *batch)
+static int eb_submit(struct i915_execbuffer *eb)
 {
 	int err;
 
@@ -2823,7 +2869,7 @@ static int eb_submit(struct i915_execbuffer *eb, struct i915_vma *batch)
 	}
 
 	err = eb->engine->emit_bb_start(eb->request,
-					batch->node.start +
+					eb->batch->vma->node.start +
					eb->batch_start_offset,
 					eb->batch_len,
 					eb->batch_flags);
@@ -3296,7 +3342,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	struct i915_execbuffer eb;
 	struct dma_fence *in_fence = NULL;
 	struct sync_file *out_fence = NULL;
-	struct i915_vma *batch;
 	int out_fence_fd = -1;
 	int err;
 
@@ -3388,34 +3433,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (err)
 		goto err_vma;
 
-	/*
-	 * snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
-	 * batch" bit. Hence we need to pin secure batches into the global gtt.
-	 * hsw should have this fixed, but bdw mucks it up again. */
-	batch = i915_vma_get(eb.batch->vma);
-	if (eb.batch_flags & I915_DISPATCH_SECURE) {
-		struct i915_vma *vma;
-
-		/*
-		 * So on first glance it looks freaky that we pin the batch here
-		 * outside of the reservation loop. But:
-		 * - The batch is already pinned into the relevant ppgtt, so we
-		 *   already have the backing storage fully allocated.
-		 * - No other BO uses the global gtt (well contexts, but meh),
-		 *   so we don't really have issues with multiple objects not
-		 *   fitting due to fragmentation.
-		 * So this is actually safe.
-		 */
-		vma = i915_gem_object_ggtt_pin(batch->obj, NULL, 0, 0, 0);
-		if (IS_ERR(vma)) {
-			err = PTR_ERR(vma);
-			goto err_vma;
-		}
-
-		GEM_BUG_ON(vma->obj != batch->obj);
-		batch = vma;
-	}
-
 	/* All GPU relocation batches must be submitted prior to the user rq */
 	GEM_BUG_ON(
[Intel-gfx] [PATCH 14/20] drm/i915/gem: Include cmdparser in common execbuf pinning
Pull the cmdparser allocations into the reservation phase, and then
they are included in the common vma pinning pass.

Signed-off-by: Chris Wilson
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 348 ++
 drivers/gpu/drm/i915/gem/i915_gem_object.h |  10 +
 drivers/gpu/drm/i915/i915_cmd_parser.c     |  21 +-
 3 files changed, 218 insertions(+), 161 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index c14c3b7e0dfd..8e4681427ce3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -25,6 +25,7 @@
 #include "i915_gem_clflush.h"
 #include "i915_gem_context.h"
 #include "i915_gem_ioctls.h"
+#include "i915_memcpy.h"
 #include "i915_sw_fence_work.h"
 #include "i915_trace.h"
 
@@ -52,6 +53,7 @@ struct eb_bind_vma {
 
 struct eb_vma_array {
 	struct kref kref;
+	struct list_head aux_list;
 	struct eb_vma vma[];
 };
 
@@ -246,7 +248,6 @@ struct i915_execbuffer {
 	struct i915_request *request; /** our request to build */
 	struct eb_vma *batch; /** identity of the batch obj/vma */
-	struct i915_vma *trampoline; /** trampoline used for chaining */
 
 	/** actual size of execobj[] as we may extend it for the cmdparser */
 	unsigned int buffer_count;
@@ -281,6 +282,11 @@ struct i915_execbuffer {
 		unsigned int rq_size;
 	} reloc_cache;
 
+	struct eb_cmdparser {
+		struct eb_vma *shadow;
+		struct eb_vma *trampoline;
+	} parser;
+
 	u64 invalid_flags; /** Set of execobj.flags that are invalid */
 	u32 context_flags; /** Set of execobj.flags to insert from the ctx */
 
@@ -298,6 +304,10 @@ struct i915_execbuffer {
 	struct eb_vma_array *array;
 };
 
+static struct drm_i915_gem_exec_object2 no_entry = {
+	.offset = -1ull
+};
+
 static inline bool eb_use_cmdparser(const struct i915_execbuffer *eb)
 {
 	return intel_engine_requires_cmd_parser(eb->engine) ||
@@ -314,6 +324,7 @@ static struct eb_vma_array *eb_vma_array_create(unsigned int count)
 		return NULL;
 
 	kref_init(&arr->kref);
+	INIT_LIST_HEAD(&arr->aux_list);
 	arr->vma[0].vma = NULL;
 
 	return arr;
@@ -339,16 +350,31 @@ static inline void eb_unreserve_vma(struct eb_vma *ev)
 		       __EXEC_OBJECT_HAS_FENCE);
 }
 
+static void eb_vma_destroy(struct eb_vma *ev)
+{
+	eb_unreserve_vma(ev);
+	i915_vma_put(ev->vma);
+}
+
+static void eb_destroy_aux(struct eb_vma_array *arr)
+{
+	struct eb_vma *ev, *en;
+
+	list_for_each_entry_safe(ev, en, &arr->aux_list, reloc_link) {
+		eb_vma_destroy(ev);
+		kfree(ev);
+	}
+}
+
 static void eb_vma_array_destroy(struct kref *kref)
 {
 	struct eb_vma_array *arr = container_of(kref, typeof(*arr), kref);
-	struct eb_vma *ev = arr->vma;
+	struct eb_vma *ev;
 
-	while (ev->vma) {
-		eb_unreserve_vma(ev);
-		i915_vma_put(ev->vma);
-		ev++;
-	}
+	eb_destroy_aux(arr);
+
+	for (ev = arr->vma; ev->vma; ev++)
+		eb_vma_destroy(ev);
 
 	kvfree(arr);
 }
@@ -396,8 +422,8 @@ eb_lock_vma(struct i915_execbuffer *eb, struct ww_acquire_ctx *acquire)
 
 static int eb_create(struct i915_execbuffer *eb)
 {
-	/* Allocate an extra slot for use by the command parser + sentinel */
-	eb->array = eb_vma_array_create(eb->buffer_count + 2);
+	/* Allocate an extra slot for use by the sentinel */
+	eb->array = eb_vma_array_create(eb->buffer_count + 1);
 	if (!eb->array)
 		return -ENOMEM;
 
@@ -1078,7 +1104,7 @@ static int eb_reserve_vma(struct eb_vm_work *work, struct eb_bind_vma *bind)
 	GEM_BUG_ON(!(drm_mm_node_allocated(&vma->node) ^
 		     drm_mm_node_allocated(&bind->hole)));
 
-	if (entry->offset != vma->node.start) {
+	if (entry != &no_entry && entry->offset != vma->node.start) {
 		entry->offset = vma->node.start | UPDATE;
 		*work->p_flags |= __EXEC_HAS_RELOC;
 	}
@@ -1374,7 +1400,8 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
 		struct i915_vma *vma = ev->vma;
 
 		if (eb_pin_vma_inplace(eb, entry, ev)) {
-			if (entry->offset != vma->node.start) {
+			if (entry != &no_entry &&
+			    entry->offset != vma->node.start) {
 				entry->offset = vma->node.start | UPDATE;
 				eb->args->flags |= __EXEC_HAS_RELOC;
 			}
@@ -1514,6 +1541,113 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
 	} while (1);
 }
 
+static int eb_alloc_cmdparser(struct i915_execbuffer *eb)
+{
+	struct intel_gt_buffer_pool_node *pool;
+	struct i915_vma *vma;
+	struct eb_vma *ev;
[Intel-gfx] [PATCH 13/20] drm/i915/gem: Bind the fence async for execbuf
It is illegal to wait on another vma while holding the vm->mutex, as
that easily leads to ABBA deadlocks (we wait on a second vma that waits
on us to release the vm->mutex). So while the vm->mutex exists, move the
waiting outside of the lock into the async binding pipeline.

Signed-off-by: Chris Wilson
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c   |  21 +--
 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 137 +-
 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h |   5 +
 3 files changed, 151 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 6a406e8798ef..c14c3b7e0dfd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1056,15 +1056,6 @@ static int eb_reserve_vma(struct eb_vm_work *work, struct eb_bind_vma *bind)
 		return err;
 
 pin:
-	if (unlikely(exec_flags & EXEC_OBJECT_NEEDS_FENCE)) {
-		err = __i915_vma_pin_fence(vma); /* XXX no waiting */
-		if (unlikely(err))
-			return err;
-
-		if (vma->fence)
-			bind->ev->flags |= __EXEC_OBJECT_HAS_FENCE;
-	}
-
 	bind_flags &= ~atomic_read(&vma->flags);
 	if (bind_flags) {
 		err = set_bind_fence(vma, work);
@@ -1095,6 +1086,15 @@ static int eb_reserve_vma(struct eb_vm_work *work, struct eb_bind_vma *bind)
 	bind->ev->flags |= __EXEC_OBJECT_HAS_PIN;
 	GEM_BUG_ON(eb_vma_misplaced(entry, vma, bind->ev->flags));
 
+	if (unlikely(exec_flags & EXEC_OBJECT_NEEDS_FENCE)) {
+		err = __i915_vma_pin_fence_async(vma, &work->base);
+		if (unlikely(err))
+			return err;
+
+		if (vma->fence)
+			bind->ev->flags |= __EXEC_OBJECT_HAS_FENCE;
+	}
+
 	return 0;
 }
 
@@ -1160,6 +1160,9 @@ static void __eb_bind_vma(struct eb_vm_work *work)
 		struct eb_bind_vma *bind = &work->bind[n];
 		struct i915_vma *vma = bind->ev->vma;
 
+		if (bind->ev->flags & __EXEC_OBJECT_HAS_FENCE)
+			__i915_vma_apply_fence_async(vma);
+
 		if (!bind->bind_flags)
 			goto put;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
index 7fb36b12fe7a..734b6aa61809 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
@@ -21,10 +21,13 @@
  * IN THE SOFTWARE.
  */
 
+#include "i915_active.h"
 #include "i915_drv.h"
 #include "i915_scatterlist.h"
+#include "i915_sw_fence_work.h"
 #include "i915_pvinfo.h"
 #include "i915_vgpu.h"
+#include "i915_vma.h"
 
 /**
  * DOC: fence register handling
@@ -340,19 +343,37 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt *ggtt)
 	return ERR_PTR(-EDEADLK);
 }
 
+static int fence_wait_bind(struct i915_fence_reg *reg)
+{
+	struct dma_fence *fence;
+	int err = 0;
+
+	fence = i915_active_fence_get(&reg->active.excl);
+	if (fence) {
+		err = dma_fence_wait(fence, true);
+		dma_fence_put(fence);
+	}
+
+	return err;
+}
+
 int __i915_vma_pin_fence(struct i915_vma *vma)
 {
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
-	struct i915_fence_reg *fence;
+	struct i915_fence_reg *fence = vma->fence;
 	struct i915_vma *set = i915_gem_object_is_tiled(vma->obj) ? vma : NULL;
 	int err;
 
 	lockdep_assert_held(&vma->vm->mutex);
 
 	/* Just update our place in the LRU if our fence is getting reused. */
-	if (vma->fence) {
-		fence = vma->fence;
+	if (fence) {
 		GEM_BUG_ON(fence->vma != vma);
+
+		err = fence_wait_bind(fence);
+		if (err)
+			return err;
+
 		atomic_inc(&fence->pin_count);
 		if (!fence->dirty) {
 			list_move_tail(&fence->link, &ggtt->fence_list);
@@ -384,6 +405,116 @@ int __i915_vma_pin_fence(struct i915_vma *vma)
 	return err;
 }
 
+static int set_bind_fence(struct i915_fence_reg *fence,
+			  struct dma_fence_work *work)
+{
+	struct dma_fence *prev;
+	int err;
+
+	if (rcu_access_pointer(fence->active.excl.fence) == &work->dma)
+		return 0;
+
+	err = i915_sw_fence_await_active(&work->chain,
+					 &fence->active,
+					 I915_ACTIVE_AWAIT_ACTIVE);
+	if (err)
+		return err;
+
+	if (i915_active_acquire(&fence->active))
+		return -ENOENT;
+
+	prev = i915_active_set_exclusive(&fence->active, &work->dma);
+	if (unlikely(prev)) {
+		err = i915_sw_fence_await_dma_fence(&work->chain, prev, 0,
[Intel-gfx] [PATCH 08/20] drm/i915: Always defer fenced work to the worker
Currently, if an error is raised we always call the cleanup locally
[and skip the main work callback]. However, some future users may need
to take a mutex to clean up, and so we cannot immediately execute the
cleanup as we may still be in interrupt context.

With the execute-immediate flag, for most cases this should still
result in immediate cleanup of an error.

Signed-off-by: Chris Wilson
---
 drivers/gpu/drm/i915/i915_sw_fence_work.c | 25 +++
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c
index a3a81bb8f2c3..29f63ebc24e8 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
@@ -16,11 +16,14 @@ static void fence_complete(struct dma_fence_work *f)
 static void fence_work(struct work_struct *work)
 {
 	struct dma_fence_work *f = container_of(work, typeof(*f), work);
-	int err;
 
-	err = f->ops->work(f);
-	if (err)
-		dma_fence_set_error(&f->dma, err);
+	if (!f->dma.error) {
+		int err;
+
+		err = f->ops->work(f);
+		if (err)
+			dma_fence_set_error(&f->dma, err);
+	}
 
 	fence_complete(f);
 	dma_fence_put(&f->dma);
@@ -36,15 +39,11 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 		if (fence->error)
 			dma_fence_set_error(&f->dma, fence->error);
 
-		if (!f->dma.error) {
-			dma_fence_get(&f->dma);
-			if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
-				fence_work(&f->work);
-			else
-				queue_work(system_unbound_wq, &f->work);
-		} else {
-			fence_complete(f);
-		}
+		dma_fence_get(&f->dma);
+		if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
+			fence_work(&f->work);
+		else
+			queue_work(system_unbound_wq, &f->work);
 		break;
 
 	case FENCE_FREE:
-- 
2.20.1
[Intel-gfx] [PATCH 02/20] drm/i915: Switch to object allocations for page directories
The GEM object is grossly overweight for the practicality of tracking large numbers of individual pages, yet it is currently our only abstraction for tracking DMA allocations. Since those allocations need to be reserved upfront before an operation, and that we need to break away from simple system memory, we need to ditch using plain struct page wrappers. In the process, we drop the WC mapping as we ended up clflushing everything anyway due to various issues across a wider range of platforms. Though in a future step, we need to drop the kmap_atomic approach which suggests we need to pre-map all the pages and keep them mapped. Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_object_types.h | 1 + .../gpu/drm/i915/gem/selftests/huge_pages.c | 2 +- .../drm/i915/gem/selftests/i915_gem_context.c | 2 +- drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 53 ++-- drivers/gpu/drm/i915/gt/gen6_ppgtt.h | 1 + drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 89 +++--- drivers/gpu/drm/i915/gt/intel_ggtt.c | 37 ++- drivers/gpu/drm/i915/gt/intel_gtt.c | 292 +++--- drivers/gpu/drm/i915/gt/intel_gtt.h | 94 ++ drivers/gpu/drm/i915/gt/intel_ppgtt.c | 42 ++- .../gpu/drm/i915/gt/intel_ring_submission.c | 16 +- drivers/gpu/drm/i915/gvt/scheduler.c | 17 +- drivers/gpu/drm/i915/i915_drv.c | 1 + drivers/gpu/drm/i915/i915_drv.h | 5 - drivers/gpu/drm/i915/i915_vma.c | 18 +- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 23 ++ drivers/gpu/drm/i915/selftests/mock_gtt.c | 4 + 17 files changed, 278 insertions(+), 419 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 5335f799b548..d0847d7896f9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -282,6 +282,7 @@ struct drm_i915_gem_object { } userptr; unsigned long scratch; + u64 encode; void *gvt_info; }; diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c 
b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c index 8291ede6902c..9fb06fcc8f8f 100644 --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c @@ -393,7 +393,7 @@ static int igt_mock_exhaust_device_supported_pages(void *arg) */ for (i = 1; i < BIT(ARRAY_SIZE(page_sizes)); i++) { - unsigned int combination = 0; + unsigned int combination = SZ_4K; for (j = 0; j < ARRAY_SIZE(page_sizes); j++) { if (i & BIT(j)) diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c index b81978890641..1308198543d8 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c @@ -1745,7 +1745,7 @@ static int check_scratch_page(struct i915_gem_context *ctx, u32 *out) if (!vm) return -ENODEV; - page = vm->scratch[0].base.page; + page = __px_page(vm->scratch[0]); if (!page) { pr_err("No scratch page!\n"); return -EINVAL; diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c index 35e2b698f9ed..1c1e807f674d 100644 --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c @@ -16,8 +16,10 @@ static inline void gen6_write_pde(const struct gen6_ppgtt *ppgtt, const unsigned int pde, const struct i915_page_table *pt) { + dma_addr_t addr = pt ? 
px_dma(pt) : px_dma(ppgtt->base.vm.scratch[1]); + /* Caller needs to make sure the write completes if necessary */ - iowrite32(GEN6_PDE_ADDR_ENCODE(px_dma(pt)) | GEN6_PDE_VALID, + iowrite32(GEN6_PDE_ADDR_ENCODE(addr) | GEN6_PDE_VALID, ppgtt->pd_addr + pde); } @@ -79,7 +81,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm, { struct gen6_ppgtt * const ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm)); const unsigned int first_entry = start / I915_GTT_PAGE_SIZE; - const gen6_pte_t scratch_pte = vm->scratch[0].encode; + const gen6_pte_t scratch_pte = vm->scratch[0]->encode; unsigned int pde = first_entry / GEN6_PTES; unsigned int pte = first_entry % GEN6_PTES; unsigned int num_entries = length / I915_GTT_PAGE_SIZE; @@ -90,8 +92,6 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm, const unsigned int count = min(num_entries, GEN6_PTES - pte); gen6_pte_t *vaddr; - GEM_BUG_ON(px_base(pt) == px_base(&vm->scratch[1])); - num_entries -= count;
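The diff above adds a u64 encode field to the GEM object and switches clear_range to use vm->scratch[0]->encode, i.e. the scratch page's PTE encoding is computed once at init and cached rather than re-encoded on every clear. A minimal userspace sketch of that caching pattern follows; pte_encode, PTE_VALID and struct scratch_page are illustrative stand-ins, not the real GEN6 bit layout:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: the real PDE/PTE encodings differ per platform.
 * The idea is that the scratch page's encoding is computed once and
 * cached (like obj->encode), so clear paths fill with a ready value. */
#define PTE_VALID	(1ull << 0)

static uint64_t pte_encode(uint64_t dma_addr, uint64_t flags)
{
	return dma_addr | flags;	/* addr is page aligned; low bits free */
}

struct scratch_page {
	uint64_t dma;
	uint64_t encode;	/* cached at init, like obj->encode */
};

static void scratch_init(struct scratch_page *s, uint64_t dma)
{
	s->dma = dma;
	s->encode = pte_encode(dma, PTE_VALID);
}

/* clear_range: point every entry at scratch using the cached encoding */
static void clear_range(uint64_t *ptes, unsigned int count,
			const struct scratch_page *s)
{
	while (count--)
		*ptes++ = s->encode;
}
```

The win is small per entry but the clear paths run over whole ranges, so avoiding the re-encode in the inner loop adds up.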
[Intel-gfx] [PATCH 12/20] drm/i915/gem: Asynchronous GTT unbinding
It is reasonably common for userspace (even modern drivers like iris) to reuse an active address for a new buffer. This would cause the application to stall under its mutex (originally struct_mutex) until the old batches were idle and it could synchronously remove the stale PTE. However, we can queue up a job that waits for the old nodes' completion signals and, upon those signals, removes the old nodes, replacing them with the new ones for the batch. This is still CPU driven, but in theory we can do the GTT patching from the GPU. The job itself has a completion signal allowing the execbuf to wait upon the rebinding, and also allowing other observers to coordinate with the common VM activity. Letting userspace queue up more work lets it get more done without blocking other clients. In turn, we take care not to let it queue up too much concurrent work, creating a small number of queues for each context to limit the number of concurrent tasks. The implementation relies on only scheduling one unbind operation per vma, as we use the unbound vma->node location to track the stale PTE.
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1402 Signed-off-by: Chris Wilson Cc: Matthew Auld Cc: Andi Shyti --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 917 -- drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 1 + drivers/gpu/drm/i915/gt/intel_gtt.c | 4 + drivers/gpu/drm/i915/gt/intel_gtt.h | 2 + drivers/gpu/drm/i915/i915_gem.c | 7 + drivers/gpu/drm/i915/i915_gem_gtt.c | 5 + drivers/gpu/drm/i915/i915_vma.c | 71 +- drivers/gpu/drm/i915/i915_vma.h | 4 + 8 files changed, 883 insertions(+), 128 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 4d8ac89c56fc..6a406e8798ef 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -18,6 +18,7 @@ #include "gt/intel_gt.h" #include "gt/intel_gt_buffer_pool.h" #include "gt/intel_gt_pm.h" +#include "gt/intel_gt_requests.h" #include "gt/intel_ring.h" #include "i915_drv.h" @@ -43,6 +44,12 @@ struct eb_vma { u32 handle; }; +struct eb_bind_vma { + struct eb_vma *ev; + struct drm_mm_node hole; + unsigned int bind_flags; +}; + struct eb_vma_array { struct kref kref; struct eb_vma vma[]; @@ -66,11 +73,12 @@ struct eb_vma_array { I915_EXEC_RESOURCE_STREAMER) /* Catch emission of unexpected errors for CI! 
*/ +#define __EINVAL__ 22 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM) #undef EINVAL #define EINVAL ({ \ DRM_DEBUG_DRIVER("EINVAL at %s:%d\n", __func__, __LINE__); \ - 22; \ + __EINVAL__; \ }) #endif @@ -311,6 +319,12 @@ static struct eb_vma_array *eb_vma_array_create(unsigned int count) return arr; } +static struct eb_vma_array *eb_vma_array_get(struct eb_vma_array *arr) +{ + kref_get(&arr->kref); + return arr; +} + static inline void eb_unreserve_vma(struct eb_vma *ev) { struct i915_vma *vma = ev->vma; @@ -444,7 +458,10 @@ eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry, const struct i915_vma *vma, unsigned int flags) { - if (vma->node.size < entry->pad_to_size) + if (test_bit(I915_VMA_ERROR_BIT, __i915_vma_flags(vma))) + return true; + + if (vma->node.size < max(vma->size, entry->pad_to_size)) return true; if (entry->alignment && !IS_ALIGNED(vma->node.start, entry->alignment)) @@ -469,32 +486,6 @@ eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry, return false; } -static u64 eb_pin_flags(const struct drm_i915_gem_exec_object2 *entry, - unsigned int exec_flags) -{ - u64 pin_flags = 0; - - if (exec_flags & EXEC_OBJECT_NEEDS_GTT) - pin_flags |= PIN_GLOBAL; - - /* -* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset, -* limit address to the first 4GBs for unflagged objects. 
-*/ - if (!(exec_flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)) - pin_flags |= PIN_ZONE_4G; - - if (exec_flags & __EXEC_OBJECT_NEEDS_MAP) - pin_flags |= PIN_MAPPABLE; - - if (exec_flags & EXEC_OBJECT_PINNED) - pin_flags |= entry->offset | PIN_OFFSET_FIXED; - else if (exec_flags & __EXEC_OBJECT_NEEDS_BIAS) - pin_flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS; - - return pin_flags; -} - static bool eb_pin_vma_fence_inplace(struct eb_vma *ev) { struct i915_vma *vma = ev->vma; @@ -522,6 +513,10 @@ eb_pin_vma_inplace(struct i915_execbuffer *eb, struct i915_vma *vma = ev->vma; unsigned int pin_flags; + /* Concurrent async binds in progress, get in the queue */ + if (!i915_active_is_idle(&vma->vm->binding)) + return false
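The scheme described above, queue a job that fires only once the old binding's fences signal and only then drop the stale node, can be modelled in miniature. This is a hypothetical userspace toy, not the driver's fence machinery; every type and name here is invented to show just the ordering:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of async unbind: nothing is removed while the old batches
 * are active; once the "fence" signals, the stale node is released and
 * a completion is raised for execbuf and other VM observers. */
struct toy_fence {
	int signaled;
};

struct unbind_job {
	struct toy_fence *wait;	/* old vma's last activity */
	int *node_busy;		/* stale GTT node still holding the address */
	struct toy_fence done;	/* execbuf can wait on the rebinding */
};

/* Poll-style "scheduler": runs the job only when its wait has signaled. */
static int unbind_job_run(struct unbind_job *job)
{
	if (!job->wait->signaled)
		return 0;	/* old batches still active: stay queued */

	*job->node_busy = 0;	/* remove the stale PTE, free the node */
	job->done.signaled = 1;	/* let observers coordinate on the VM */
	return 1;
}
```

In the real patch the scheduling is event driven (fence callbacks) rather than polled, and the vma->node location itself tracks the stale PTE, which is why only one unbind may be queued per vma.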
[Intel-gfx] [PATCH 10/20] drm/i915: Export a preallocate variant of i915_active_acquire()
Sometimes we have to be very careful not to allocate underneath a mutex (or spinlock) and yet still want to track activity. Enter i915_active_acquire_for_context(). This raises the activity counter on i915_active prior to use and ensures that the fence-tree contains a slot for the context. Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 2 +- drivers/gpu/drm/i915/gt/intel_timeline.c | 4 +- drivers/gpu/drm/i915/i915_active.c| 113 +++--- drivers/gpu/drm/i915/i915_active.h| 14 ++- 4 files changed, 113 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 1015c4fd9f3e..6d20be29ff3c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1789,7 +1789,7 @@ __parser_mark_active(struct i915_vma *vma, { struct intel_gt_buffer_pool_node *node = vma->private; - return i915_active_ref(&node->active, tl, fence); + return i915_active_ref(&node->active, tl->fence_context, fence); } static int diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c index 4546284fede1..e4a5326633b8 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline.c +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c @@ -479,7 +479,9 @@ __intel_timeline_get_seqno(struct intel_timeline *tl, * free it after the current request is retired, which ensures that * all writes into the cacheline from previous requests are complete. 
*/ - err = i915_active_ref(&tl->hwsp_cacheline->active, tl, &rq->fence); + err = i915_active_ref(&tl->hwsp_cacheline->active, + tl->fence_context, + &rq->fence); if (err) goto err_cacheline; diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c index d960d0be5bd2..3f595446fd44 100644 --- a/drivers/gpu/drm/i915/i915_active.c +++ b/drivers/gpu/drm/i915/i915_active.c @@ -217,11 +217,10 @@ excl_retire(struct dma_fence *fence, struct dma_fence_cb *cb) } static struct i915_active_fence * -active_instance(struct i915_active *ref, struct intel_timeline *tl) +active_instance(struct i915_active *ref, u64 idx) { struct active_node *node, *prealloc; struct rb_node **p, *parent; - u64 idx = tl->fence_context; /* * We track the most recently used timeline to skip a rbtree search @@ -353,21 +352,17 @@ __active_del_barrier(struct i915_active *ref, struct active_node *node) return active_del_barrier(ref, node, barrier_to_engine(node)); } -int i915_active_ref(struct i915_active *ref, - struct intel_timeline *tl, - struct dma_fence *fence) +int i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence) { struct i915_active_fence *active; int err; - lockdep_assert_held(&tl->mutex); - /* Prevent reaping in case we malloc/wait while building the tree */ err = i915_active_acquire(ref); if (err) return err; - active = active_instance(ref, tl); + active = active_instance(ref, idx); if (!active) { err = -ENOMEM; goto out; @@ -384,32 +379,104 @@ int i915_active_ref(struct i915_active *ref, atomic_dec(&ref->count); } if (!__i915_active_fence_set(active, fence)) - atomic_inc(&ref->count); + __i915_active_acquire(ref); out: i915_active_release(ref); return err; } -struct dma_fence * -i915_active_set_exclusive(struct i915_active *ref, struct dma_fence *f) +static struct dma_fence * +__i915_active_set_fence(struct i915_active *ref, + struct i915_active_fence *active, + struct dma_fence *fence) { struct dma_fence *prev; /* We expect the caller to 
manage the exclusive timeline ordering */ GEM_BUG_ON(i915_active_is_idle(ref)); + if (is_barrier(active)) { /* proto-node used by our idle barrier */ + /* +* This request is on the kernel_context timeline, and so +* we can use it to substitute for the pending idle-barrer +* request that we want to emit on the kernel_context. +*/ + __active_del_barrier(ref, node_from_active(active)); + RCU_INIT_POINTER(active->fence, NULL); + atomic_dec(&ref->count); + } + rcu_read_lock(); - prev = __i915_active_fence_set(&ref->excl, f); + prev = __i915_active_fence_set(active, fence); if (prev) prev = dma_fence_get_rcu(prev); else - atomic_inc(&ref->count); + __i915_active_acquire(ref); rcu_read_un
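The reason active_instance() now takes a plain u64 index is the pattern the commit message names: raise the activity count and make sure the fence-tree already contains a slot for the context while allocation is still legal, so the later assignment in atomic context is a pure lookup that cannot hit -ENOMEM. A hypothetical sketch of that reserve-then-use pattern (names invented, a linked list standing in for the rbtree):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of i915_active_acquire_for_context(): ensure a
 * slot keyed by the timeline's fence context exists before any
 * spinlock is taken, so the fence can be set without allocating. */
struct active_slot {
	uint64_t idx;
	void *fence;
	struct active_slot *next;
};

struct active {
	int count;			/* models atomic_t ref->count */
	struct active_slot *slots;
};

static struct active_slot *active_find(struct active *ref, uint64_t idx)
{
	struct active_slot *s;

	for (s = ref->slots; s; s = s->next)
		if (s->idx == idx)
			return s;
	return NULL;
}

/* May sleep/allocate: call before entering atomic context. */
static int active_acquire_for_context(struct active *ref, uint64_t idx)
{
	if (!active_find(ref, idx)) {
		struct active_slot *s = calloc(1, sizeof(*s));

		if (!s)
			return -1;	/* -ENOMEM while it is still safe */
		s->idx = idx;
		s->next = ref->slots;
		ref->slots = s;
	}
	ref->count++;
	return 0;
}

/* Atomic context: lookup only, guaranteed to succeed after acquire. */
static void active_set_fence(struct active *ref, uint64_t idx, void *fence)
{
	struct active_slot *s = active_find(ref, idx);

	assert(s);		/* slot was preallocated */
	s->fence = fence;
}
```

The i915 code additionally keeps the reference counting honest across barriers, which this sketch ignores.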
[Intel-gfx] [PATCH 17/20] drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.
From: Maarten Lankhorst i915_gem_ww_ctx is used to lock all gem bo's for pinning and memory eviction. We don't use it yet, but lets start adding the definition first. To use it, we have to pass a non-NULL ww to gem_object_lock, and don't unlock directly. It is done in i915_gem_ww_ctx_fini. Changes since v1: - Change ww_ctx and obj order in locking functions (Jonas Lahtinen) Signed-off-by: Maarten Lankhorst --- drivers/gpu/drm/i915/Makefile | 4 + drivers/gpu/drm/i915/i915_globals.c | 1 + drivers/gpu/drm/i915/i915_globals.h | 1 + drivers/gpu/drm/i915/mm/i915_acquire_ctx.c| 139 ++ drivers/gpu/drm/i915/mm/i915_acquire_ctx.h| 34 +++ drivers/gpu/drm/i915/mm/st_acquire_ctx.c | 242 ++ .../drm/i915/selftests/i915_mock_selftests.h | 1 + 7 files changed, 422 insertions(+) create mode 100644 drivers/gpu/drm/i915/mm/i915_acquire_ctx.c create mode 100644 drivers/gpu/drm/i915/mm/i915_acquire_ctx.h create mode 100644 drivers/gpu/drm/i915/mm/st_acquire_ctx.c diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 41a27fd5dbc7..33c85b4ff3ed 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -124,6 +124,10 @@ gt-y += \ gt/gen9_renderstate.o i915-y += $(gt-y) +# Memory + DMA management +i915-y += \ + mm/i915_acquire_ctx.o + # GEM (Graphics Execution Management) code gem-y += \ gem/i915_gem_busy.o \ diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c index 3aa213684293..51ec42a14694 100644 --- a/drivers/gpu/drm/i915/i915_globals.c +++ b/drivers/gpu/drm/i915/i915_globals.c @@ -87,6 +87,7 @@ static void __i915_globals_cleanup(void) static __initconst int (* const initfn[])(void) = { i915_global_active_init, + i915_global_acquire_init, i915_global_buddy_init, i915_global_context_init, i915_global_gem_context_init, diff --git a/drivers/gpu/drm/i915/i915_globals.h b/drivers/gpu/drm/i915/i915_globals.h index b2f5cd9b9b1a..11227abf2769 100644 --- a/drivers/gpu/drm/i915/i915_globals.h +++ 
b/drivers/gpu/drm/i915/i915_globals.h @@ -27,6 +27,7 @@ void i915_globals_exit(void); /* constructors */ int i915_global_active_init(void); +int i915_global_acquire_init(void); int i915_global_buddy_init(void); int i915_global_context_init(void); int i915_global_gem_context_init(void); diff --git a/drivers/gpu/drm/i915/mm/i915_acquire_ctx.c b/drivers/gpu/drm/i915/mm/i915_acquire_ctx.c new file mode 100644 index ..d1c3b958c15d --- /dev/null +++ b/drivers/gpu/drm/i915/mm/i915_acquire_ctx.c @@ -0,0 +1,139 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2020 Intel Corporation + */ + +#include + +#include "i915_globals.h" +#include "gem/i915_gem_object.h" + +#include "i915_acquire_ctx.h" + +static struct i915_global_acquire { + struct i915_global base; + struct kmem_cache *slab_acquires; +} global; + +struct i915_acquire { + struct drm_i915_gem_object *obj; + struct i915_acquire *next; +}; + +static struct i915_acquire *i915_acquire_alloc(void) +{ + return kmem_cache_alloc(global.slab_acquires, GFP_KERNEL); +} + +static void i915_acquire_free(struct i915_acquire *lnk) +{ + kmem_cache_free(global.slab_acquires, lnk); +} + +void i915_acquire_ctx_init(struct i915_acquire_ctx *ctx) +{ + ww_acquire_init(&ctx->ctx, &reservation_ww_class); + ctx->locked = NULL; +} + +int i915_acquire_ctx_lock(struct i915_acquire_ctx *ctx, + struct drm_i915_gem_object *obj) +{ + struct i915_acquire *lock, *lnk; + int err; + + lock = i915_acquire_alloc(); + if (!lock) + return -ENOMEM; + + lock->obj = i915_gem_object_get(obj); + lock->next = NULL; + + while ((lnk = lock)) { + obj = lnk->obj; + lock = lnk->next; + + err = dma_resv_lock_interruptible(obj->base.resv, &ctx->ctx); + if (err == -EDEADLK) { + struct i915_acquire *old; + + while ((old = ctx->locked)) { + i915_gem_object_unlock(old->obj); + ctx->locked = old->next; + old->next = lock; + lock = old; + } + + err = dma_resv_lock_slow_interruptible(obj->base.resv, + &ctx->ctx); + } + if (!err) { + lnk->next = ctx->locked; + 
ctx->locked = lnk; + } else { + i915_gem_object_put(obj); + i915_acquire_free(lnk); + } + if (err == -EALREADY) + err = 0; + if (err) + break; + } + + w
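The dma_resv_lock_interruptible / -EDEADLK / dma_resv_lock_slow_interruptible dance in i915_acquire_ctx_lock is the standard ww-mutex backoff: on deadlock, drop everything held, take the contended lock via the blocking slow path, and reacquire the rest. The control flow can be exercised with fake locks; everything here (fake_lock, deadlk_once, acquire_all) is invented for the demo:

```c
#include <assert.h>
#include <errno.h>

struct fake_obj {
	int locked;
	int deadlk_once;	/* force one -EDEADLK to exercise backoff */
};

static int fake_lock(struct fake_obj *obj)
{
	if (obj->deadlk_once) {
		obj->deadlk_once = 0;
		return -EDEADLK;
	}
	assert(!obj->locked);
	obj->locked = 1;
	return 0;
}

static void fake_unlock(struct fake_obj *obj)
{
	assert(obj->locked);
	obj->locked = 0;
}

/* Slow path: after backing off we are guaranteed to win eventually. */
static void fake_lock_slow(struct fake_obj *obj)
{
	assert(!obj->locked);
	obj->locked = 1;
}

/* Lock objs[0..count) with ww-style backoff: on -EDEADLK, release
 * everything held so far, take the contended lock first via the slow
 * path, rotate it to the head, then reacquire the rest. */
static int acquire_all(struct fake_obj **objs, int count)
{
	int i, n, err;

	for (i = 0; i < count; i++) {
		err = fake_lock(objs[i]);
		if (err == -EDEADLK) {
			struct fake_obj *contended = objs[i];

			for (n = 0; n < i; n++)
				fake_unlock(objs[n]);
			fake_lock_slow(contended);
			for (n = i; n > 0; n--)
				objs[n] = objs[n - 1];
			objs[0] = contended;
			i = 0;		/* objs[0] now held; resume at 1 */
			continue;
		}
		if (err)
			return err;
	}
	return 0;
}
```

Rotating the contended object to the head is what guarantees forward progress: on the next pass it is already held, so the set of locks we wait for strictly shrinks.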
[Intel-gfx] [PATCH 04/20] drm/i915/gem: Rename execbuf.bind_link to unbound_link
Rename the current list of unbound objects so that we can track of all objects that we need to bind, as well as the list of currently unbound [unprocessed] objects. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index e4d06db3f313..bf8193d9e279 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -33,7 +33,7 @@ struct eb_vma { /** This vma's place in the execbuf reservation list */ struct drm_i915_gem_exec_object2 *exec; - struct list_head bind_link; + struct list_head unbound_link; struct list_head reloc_link; struct hlist_node node; @@ -594,7 +594,7 @@ eb_add_vma(struct i915_execbuffer *eb, } } else { eb_unreserve_vma(ev); - list_add_tail(&ev->bind_link, &eb->unbound); + list_add_tail(&ev->unbound_link, &eb->unbound); } } @@ -699,7 +699,7 @@ static int eb_reserve(struct i915_execbuffer *eb) if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex)) return -EINTR; - list_for_each_entry(ev, &eb->unbound, bind_link) { + list_for_each_entry(ev, &eb->unbound, unbound_link) { err = eb_reserve_vma(eb, ev, pin_flags); if (err) break; @@ -725,15 +725,15 @@ static int eb_reserve(struct i915_execbuffer *eb) if (flags & EXEC_OBJECT_PINNED) /* Pinned must have their slot */ - list_add(&ev->bind_link, &eb->unbound); + list_add(&ev->unbound_link, &eb->unbound); else if (flags & __EXEC_OBJECT_NEEDS_MAP) /* Map require the lowest 256MiB (aperture) */ - list_add_tail(&ev->bind_link, &eb->unbound); + list_add_tail(&ev->unbound_link, &eb->unbound); else if (!(flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)) /* Prioritise 4GiB region for restricted bo */ - list_add(&ev->bind_link, &last); + list_add(&ev->unbound_link, &last); else - list_add_tail(&ev->bind_link, &last); + list_add_tail(&ev->unbound_link, &last); } 
list_splice_tail(&last, &eb->unbound); mutex_unlock(&eb->i915->drm.struct_mutex); -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 19/20] drm/i915/gem: Replace i915_gem_object.mm.mutex with reservation_ww_class
Our goal is to pull all memory reservations (in the next iteration, obj->ops->get_pages()) under a ww_mutex, and to align those reservations with other drivers, i.e. to control all such allocations with the reservation_ww_class. Currently, this is under the purview of obj->mm.mutex, and while obj->mm remains an embedded struct we can "simply" switch to using the reservation_ww_class obj->base.resv->lock. The major consequence is the impact on the shrinker paths: as the reservation_ww_class is used to wrap allocations, and a ww_mutex does not support subclassing, we cannot do our usual trick of knowing that we never recurse inside the shrinker, and instead have to finish the reclaim with a trylock. This may result in us failing to release the pages after having released the vma. This will have to do until a better idea comes along. However, this step only converts the mutex over; it continues to treat everything as a single allocation and to pin the pages. With the ww_mutex in place we can remove that temporary pinning, as we can then reserve all storage en masse. One last thing to do: kill the implicit page pinning for active vma. This will require us to invalidate vma->pages when the backing store is removed (and we expect that while the vma is active, we mark the backing store as active so that it cannot be removed while the HW is busy.)
Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_clflush.c | 20 +- drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 18 +- drivers/gpu/drm/i915/gem/i915_gem_domain.c| 65 ++--- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 44 +++- drivers/gpu/drm/i915/gem/i915_gem_object.c| 8 +- drivers/gpu/drm/i915/gem/i915_gem_object.h| 37 +-- .../gpu/drm/i915/gem/i915_gem_object_types.h | 1 - drivers/gpu/drm/i915/gem/i915_gem_pages.c | 136 +-- drivers/gpu/drm/i915/gem/i915_gem_phys.c | 8 +- drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 13 +- drivers/gpu/drm/i915/gem/i915_gem_tiling.c| 2 - drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 15 +- .../gpu/drm/i915/gem/selftests/huge_pages.c | 32 ++- .../i915/gem/selftests/i915_gem_coherency.c | 14 +- .../drm/i915/gem/selftests/i915_gem_context.c | 10 +- .../drm/i915/gem/selftests/i915_gem_mman.c| 2 + drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 2 - drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 1 - drivers/gpu/drm/i915/gt/intel_ggtt.c | 5 +- drivers/gpu/drm/i915/gt/intel_gtt.h | 2 - drivers/gpu/drm/i915/gt/intel_ppgtt.c | 1 + drivers/gpu/drm/i915/i915_gem.c | 16 +- drivers/gpu/drm/i915/i915_vma.c | 225 +++--- drivers/gpu/drm/i915/i915_vma_types.h | 6 - drivers/gpu/drm/i915/mm/i915_acquire_ctx.c| 11 +- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 4 +- .../drm/i915/selftests/intel_memory_region.c | 17 +- 27 files changed, 319 insertions(+), 396 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c index bc0223716906..a32fd0d5570b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c @@ -27,16 +27,8 @@ static void __do_clflush(struct drm_i915_gem_object *obj) static int clflush_work(struct dma_fence_work *base) { struct clflush *clflush = container_of(base, typeof(*clflush), base); - struct drm_i915_gem_object *obj = clflush->obj; - int err; - - err = i915_gem_object_pin_pages(obj); - if (err) - return err; - - 
__do_clflush(obj); - i915_gem_object_unpin_pages(obj); + __do_clflush(clflush->obj); return 0; } @@ -44,7 +36,7 @@ static void clflush_release(struct dma_fence_work *base) { struct clflush *clflush = container_of(base, typeof(*clflush), base); - i915_gem_object_put(clflush->obj); + i915_gem_object_unpin_pages(clflush->obj); } static const struct dma_fence_work_ops clflush_ops = { @@ -63,8 +55,14 @@ static struct clflush *clflush_work_create(struct drm_i915_gem_object *obj) if (!clflush) return NULL; + if (__i915_gem_object_get_pages_locked(obj)) { + kfree(clflush); + return NULL; + } + dma_fence_work_init(&clflush->base, &clflush_ops); - clflush->obj = i915_gem_object_get(obj); /* obj <-> clflush cycle */ + __i915_gem_object_pin_pages(obj); + clflush->obj = obj; /* Beware the obj.resv <-> clflush fence cycle */ return clflush; } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c index 2679380159fc..049a15e6b496 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c @@ -124,19 +124,12 @@ static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_dire bool write = (direction == DMA_B
[Intel-gfx] [PATCH 16/20] drm/i915/gem: Reintroduce multiple passes for reloc processing
The prospect of locking the entire submission sequence under a wide ww_mutex re-imposes some key restrictions, in particular that we must not call copy_(from|to)_user underneath the mutex (as the faulthandlers themselves may need to take the ww_mutex). To satisfy this requirement, we need to split the relocation handling into multiple phases again. After dropping the reservations, we need to allocate enough buffer space to both copy the relocations from userspace into, and serve as the relocation command buffer. Once we have finished copying the relocations, we can then re-aquire all the objects for the execbuf and rebind them, including our new relocations objects. After we have bound all the new and old objects into their final locations, we can then convert the relocation entries into the GPU commands to update the relocated vma. Finally, once it is all over and we have dropped the ww_mutex for the last time, we can then complete the update of the user relocation entries. Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 842 -- .../i915/gem/selftests/i915_gem_execbuffer.c | 195 ++-- 2 files changed, 520 insertions(+), 517 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 629b736adf2c..fbf5c5cd51ca 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -35,6 +35,7 @@ struct eb_vma { /** This vma's place in the execbuf reservation list */ struct drm_i915_gem_exec_object2 *exec; + u32 bias; struct list_head bind_link; struct list_head unbound_link; @@ -60,15 +61,12 @@ struct eb_vma_array { #define __EXEC_OBJECT_HAS_PIN BIT(31) #define __EXEC_OBJECT_HAS_FENCEBIT(30) #define __EXEC_OBJECT_NEEDS_MAPBIT(29) -#define __EXEC_OBJECT_NEEDS_BIAS BIT(28) -#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 28) /* all of the above */ +#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 29) /* all of the above */ #define 
__EXEC_HAS_RELOC BIT(31) #define __EXEC_INTERNAL_FLAGS (~0u << 31) #define UPDATE PIN_OFFSET_FIXED -#define BATCH_OFFSET_BIAS (256*1024) - #define __I915_EXEC_ILLEGAL_FLAGS \ (__I915_EXEC_UNKNOWN_FLAGS | \ I915_EXEC_CONSTANTS_MASK | \ @@ -266,20 +264,18 @@ struct i915_execbuffer { * obj/page */ struct reloc_cache { - struct drm_mm_node node; /** temporary GTT binding */ unsigned int gen; /** Cached value of INTEL_GEN */ bool use_64bit_reloc : 1; - bool has_llc : 1; bool has_fence : 1; bool needs_unfenced : 1; struct intel_context *ce; - struct i915_vma *target; - struct i915_request *rq; - struct i915_vma *rq_vma; - u32 *rq_cmd; - unsigned int rq_size; + struct eb_relocs_link { + struct i915_vma *vma; + } head; + struct drm_i915_gem_relocation_entry *map; + unsigned int pos; } reloc_cache; struct eb_cmdparser { @@ -288,7 +284,7 @@ struct i915_execbuffer { } parser; u64 invalid_flags; /** Set of execobj.flags that are invalid */ - u32 context_flags; /** Set of execobj.flags to insert from the ctx */ + u32 context_bias; u32 batch_start_offset; /** Location within object of batch */ u32 batch_len; /** Length of batch within object */ @@ -308,6 +304,12 @@ static struct drm_i915_gem_exec_object2 no_entry = { .offset = -1ull }; +static u64 noncanonical_addr(u64 addr, const struct i915_address_space *vm) +{ + GEM_BUG_ON(!is_power_of_2(vm->total)); + return addr & (vm->total - 1); +} + static inline bool eb_use_cmdparser(const struct i915_execbuffer *eb) { return intel_engine_requires_cmd_parser(eb->engine) || @@ -479,11 +481,12 @@ static int eb_create(struct i915_execbuffer *eb) return 0; } -static bool -eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry, -const struct i915_vma *vma, -unsigned int flags) +static bool eb_vma_misplaced(const struct eb_vma *ev) { + const struct drm_i915_gem_exec_object2 *entry = ev->exec; + const struct i915_vma *vma = ev->vma; + unsigned int flags = ev->flags; + if (test_bit(I915_VMA_ERROR_BIT, __i915_vma_flags(vma))) return 
true; @@ -497,8 +500,7 @@ eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry, vma->node.start != entry->offset) return true; - if (flags & __EXEC_OBJECT_NEEDS_BIAS && - vma->node.start < BATCH_OFFSET_BIAS) + if (vma->node.start < ev->bias) return true; if (!(flags & EXEC_OB
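The noncanonical_addr() helper added in the diff masks a canonical (sign-extended) address down to an offset inside the vm, relying on vm->total being a power of two. The same arithmetic, standalone:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace passes GTT addresses above bit 47 in canonical form
 * (upper bits sign-extended); masking by vm_total - 1 recovers the
 * plain offset, valid only because vm_total is a power of two. */
static uint64_t noncanonical_addr(uint64_t addr, uint64_t vm_total)
{
	assert((vm_total & (vm_total - 1)) == 0);	/* power of two */
	return addr & (vm_total - 1);
}
```

For a 48-bit vm this strips the 0xffff sign extension from the top 16 bits while leaving already-plain offsets untouched.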
[Intel-gfx] [PATCH 01/20] drm/i915: Preallocate stashes for vma page-directories
We need to make the DMA allocations used for page directories to be performed up front so that we can include those allocations in our memory reservation pass. The downside is that we have to assume the worst case, even before we know the final layout, and always allocate enough page directories for this object, even when there will be overlap. It should be noted that the lifetime for the page directories DMA is more or less decoupled from individual fences as they will be shared across objects across timelines. Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_client_blt.c| 11 +-- drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 38 +++-- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 77 +- drivers/gpu/drm/i915/gt/intel_ggtt.c | 60 +++--- drivers/gpu/drm/i915/gt/intel_gtt.h | 39 ++--- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 80 --- drivers/gpu/drm/i915/i915_vma.c | 29 --- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 60 -- drivers/gpu/drm/i915/selftests/mock_gtt.c | 22 ++--- 9 files changed, 230 insertions(+), 186 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c index 278664f831e7..947c8aa8e13e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c @@ -32,12 +32,13 @@ static void vma_clear_pages(struct i915_vma *vma) vma->pages = NULL; } -static int vma_bind(struct i915_address_space *vm, - struct i915_vma *vma, - enum i915_cache_level cache_level, - u32 flags) +static void vma_bind(struct i915_address_space *vm, +struct i915_vm_pt_stash *stash, +struct i915_vma *vma, +enum i915_cache_level cache_level, +u32 flags) { - return vm->vma_ops.bind_vma(vm, vma, cache_level, flags); + vm->vma_ops.bind_vma(vm, stash, vma, cache_level, flags); } static void vma_unbind(struct i915_address_space *vm, struct i915_vma *vma) diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c index 05497b50103f..35e2b698f9ed 100644 --- 
a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c @@ -177,16 +177,16 @@ static void gen6_flush_pd(struct gen6_ppgtt *ppgtt, u64 start, u64 end) mutex_unlock(&ppgtt->flush); } -static int gen6_alloc_va_range(struct i915_address_space *vm, - u64 start, u64 length) +static void gen6_alloc_va_range(struct i915_address_space *vm, + struct i915_vm_pt_stash *stash, + u64 start, u64 length) { struct gen6_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm)); struct i915_page_directory * const pd = ppgtt->base.pd; - struct i915_page_table *pt, *alloc = NULL; + struct i915_page_table *pt; intel_wakeref_t wakeref; u64 from = start; unsigned int pde; - int ret = 0; wakeref = intel_runtime_pm_get(&vm->i915->runtime_pm); @@ -197,21 +197,17 @@ static int gen6_alloc_va_range(struct i915_address_space *vm, if (px_base(pt) == px_base(&vm->scratch[1])) { spin_unlock(&pd->lock); - pt = fetch_and_zero(&alloc); - if (!pt) - pt = alloc_pt(vm); - if (IS_ERR(pt)) { - ret = PTR_ERR(pt); - goto unwind_out; - } + pt = stash->pt[0]; + GEM_BUG_ON(!pt); fill32_px(pt, vm->scratch[0].encode); spin_lock(&pd->lock); if (pd->entry[pde] == &vm->scratch[1]) { + stash->pt[0] = pt->stash; + atomic_set(&pt->used, 0); pd->entry[pde] = pt; } else { - alloc = pt; pt = pd->entry[pde]; } } @@ -223,15 +219,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm, if (i915_vma_is_bound(ppgtt->vma, I915_VMA_GLOBAL_BIND)) gen6_flush_pd(ppgtt, from, start); - goto out; - -unwind_out: - gen6_ppgtt_clear_range(vm, from, start - from); -out: - if (alloc) - free_px(vm, alloc); intel_runtime_pm_put(&vm->i915->runtime_pm, wakeref); - return ret; } static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt) @@ -299,10 +287,11 @@ static void pd_vma_clear_pages(struct i915_vma *vma) vma->pages = NULL; } -static int pd_vma_bind(struct i915_address_space *vm, -
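In the diff above, gen6_alloc_va_range no longer calls alloc_pt() under the pd spinlock; it pops a preallocated entry with stash->pt[0] = pt->stash, so the locked path can no longer fail. A minimal userspace sketch of that stash (the struct layout and names here are simplified stand-ins):

```c
#include <assert.h>
#include <stdlib.h>

/* Page tables are pre-allocated where sleeping/allocating is legal and
 * chained through a ->stash pointer; the allocation path under the
 * spinlock only ever pops a ready-made entry. */
struct pt {
	int used;
	struct pt *stash;
};

struct pt_stash {
	struct pt *pt;
};

/* Worst case: one entry per pde the range may touch, even if some
 * turn out to overlap with already-populated entries. */
static int stash_prealloc(struct pt_stash *stash, unsigned int count)
{
	while (count--) {
		struct pt *pt = calloc(1, sizeof(*pt));

		if (!pt)
			return -1;	/* -ENOMEM, reported before we commit */
		pt->stash = stash->pt;
		stash->pt = pt;
	}
	return 0;
}

/* Called with the "spinlock" held: must not fail, must not allocate. */
static struct pt *stash_pop(struct pt_stash *stash)
{
	struct pt *pt = stash->pt;

	if (pt)
		stash->pt = pt->stash;
	return pt;
}

/* Unused leftovers (from the worst-case estimate) are freed at the end. */
static void stash_free(struct pt_stash *stash)
{
	struct pt *pt;

	while ((pt = stash_pop(stash)))
		free(pt);
}
```

This is exactly the trade the commit message describes: over-allocate up front so that the reservation pass can account for the directories, then return whatever the final layout did not need.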
[Intel-gfx] s/obj->mm.lock//
This is the easy part; pulling reservation of multiple objects under an ww acquire context. With one simple rule that eviction is handled by the ww acquire context, we can carefully transition the driver over to using eviction. Instead of feeding the acquire context everywhere, we make the caller gather up all the objects they need to acquire into the context, then acquire their backing store. The major boon here is that by providing a clean set of objects (that we have not yet started to acquire any auxiliary attachments for) to the acquire context, it can handle all the EDEADLK processing for us [since it is a pure locking operation and does not need to release attachments upon revoking the locks].

As a sketch of what that would look like, to illustrate the major work remaining:

static int evict(struct drm_i915_gem_object *obj,
		 struct i915_acquire_ctx *ctx)
{
	struct intel_memory_region *mem = obj->mm->region;
	struct drm_i915_gem_object *swap; // struct i915_mm_bo *swap
	struct i915_request *rq;
	int err;

	/* swap = mem->create_eviction_target(obj); */
	swap = i915_gem_object_create_shmem(mem->i915, obj->base.size);
	if (IS_ERR(swap))
		return PTR_ERR(swap);

	err = dma_resv_lock_interruptible(swap, &ctx->ctx);
	GEM_BUG_ON(err == -EALREADY);
	if (err == -EDEADLK)
		goto out;

	/* Obviously swap has to be carefully chosen so that this may succeed */
	err = __i915_gem_object_get_pages_locked(swap);
	if (err)
		goto out_unlock;

	rq = pinned_evict_copy(ctx, obj, swap);
	if (IS_ERR(rq)) {
		err = PTR_ERR(rq);
		goto out_unlock;
	}

	err = i915_gem_revoke_mm(obj);
	if (err)
		goto out_request;

	/* Alternatively you could wait synchronously! */
	mem->release_blocks(&obj->mm->blocks, rq);
	i915_mm_bo_put(xchg(&obj->mm, i915_mm_bo_get(swap)));

	dma_resv_add_exclusive_fence(obj->base.resv, &rq->fence);
out_request:
	i915_request_put(rq);
out_unlock:
	dma_resv_unlock(swap);
out:
	i915_gem_object_put(swap);
	return err;
}

static int relock_all(struct i915_acquire_ctx *ctx)
{
	struct i915_acquire_link *lnk, *lock;
	int err;

	for (lnk = ctx->locked; lnk; lnk = lnk->next)
		dma_resv_unlock(lnk->obj->base.resv);

	lock = fetch_and_zero(&ctx->locked);
	while ((lnk = lock)) {
		struct drm_i915_gem_object *obj;

		obj = lnk->obj;
		lock = lnk->next;

		if (ctx->locked)
			err = dma_resv_lock_interruptible(obj->base.resv,
							  &ctx->ctx);
		else
			err = dma_resv_lock_slow_interruptible(obj->base.resv,
							       &ctx->ctx);
		GEM_BUG_ON(err == -EALREADY);
		if (err == -EDEADLK) {
			struct i915_acquire *old;

			while ((old = ctx->locked)) {
				dma_resv_unlock(old->obj->base.resv);
				ctx->locked = old->next;
				old->next = lock;
				lock = old;
			}
			lnk->next = lock;
			lock = lnk;
			continue;
		}
		if (err) {
			lock = lnk;
			break;
		}

		lnk->next = ctx->locked;
		ctx->locked = lnk;
	}

	while ((lnk = lock)) {
		lock = lnk->next;
		i915_gem_object_put(lnk->obj);
		i915_acquire_free(lnk);
	}

	return err;
}

int i915_acquire_mm(struct i915_acquire_ctx *ctx)
{
	struct i915_acquire_link *lnk;
	int n, err;

restart:
	for (lnk = ctx->locked; lnk; lnk = lnk->next) {
		for (n = 0; !i915_gem_object_has_pages(lnk->obj); n++) {
			struct drm_i915_gem_object *evictee = NULL;

			mem = get_preferred_memregion_for_object(lnk->obj, n);
			if (!mem)
				return -ENXIO;

			while (!i915_gem_object_get_pages(lnk->obj)) {
				struct i915_acquire_link *this;

				evictee = mem->get_eviction_candidate(mem, evictee);
				if (!evictee)
					break;

				err = dma_resv_lock_interruptible(evictee, &ctx-
[Intel-gfx] [PATCH 07/20] drm/i915: Add list_for_each_entry_safe_continue_reverse
One more list iterator variant, for when we want to unwind from inside one list iterator with the intention of restarting from the current entry as the new head of the list. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_utils.h | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h index 03a73d2bd50d..6ebccdd12d4c 100644 --- a/drivers/gpu/drm/i915/i915_utils.h +++ b/drivers/gpu/drm/i915/i915_utils.h @@ -266,6 +266,12 @@ static inline int list_is_last_rcu(const struct list_head *list, return READ_ONCE(list->next) == head; } +#define list_for_each_entry_safe_continue_reverse(pos, n, head, member) \ + for (pos = list_prev_entry(pos, member),\ +n = list_prev_entry(pos, member); \ +&pos->member != (head);\ +pos = n, n = list_prev_entry(n, member)) + /* * Wait until the work is finally complete, even if it tries to postpone * by requeueing itself. Note, that if the worker never cancels itself, -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 06/20] drm/i915/gem: Remove the call for no-evict i915_vma_pin
Remove the stub i915_vma_pin() used for incrementally pinning objects for execbuf (under the severe restriction that they must not wait on a resource as we may have already pinned it) and replace it with an i915_vma_pin_inplace() that is only allowed to reclaim the currently bound location for the vma (and will never wait for a pinned resource). Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 69 +++ drivers/gpu/drm/i915/i915_vma.c | 6 +- drivers/gpu/drm/i915/i915_vma.h | 2 + 3 files changed, 45 insertions(+), 32 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 35a57c1fc9c3..1015c4fd9f3e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -452,49 +452,55 @@ static u64 eb_pin_flags(const struct drm_i915_gem_exec_object2 *entry, return pin_flags; } +static bool eb_pin_vma_fence_inplace(struct eb_vma *ev) +{ + struct i915_vma *vma = ev->vma; + struct i915_fence_reg *reg = vma->fence; + + if (reg) { + if (READ_ONCE(reg->dirty)) + return false; + + atomic_inc(&reg->pin_count); + ev->flags |= __EXEC_OBJECT_HAS_FENCE; + } else { + if (i915_gem_object_is_tiled(vma->obj)) + return false; + } + + return true; +} + static inline bool -eb_pin_vma(struct i915_execbuffer *eb, - const struct drm_i915_gem_exec_object2 *entry, - struct eb_vma *ev) +eb_pin_vma_inplace(struct i915_execbuffer *eb, + const struct drm_i915_gem_exec_object2 *entry, + struct eb_vma *ev) { struct i915_vma *vma = ev->vma; - u64 pin_flags; + unsigned int pin_flags; - if (vma->node.size) - pin_flags = vma->node.start; - else - pin_flags = entry->offset & PIN_OFFSET_MASK; + if (eb_vma_misplaced(entry, vma, ev->flags)) + return false; - pin_flags |= PIN_USER | PIN_NOEVICT | PIN_OFFSET_FIXED; + pin_flags = PIN_USER; if (unlikely(ev->flags & EXEC_OBJECT_NEEDS_GTT)) pin_flags |= PIN_GLOBAL; /* Attempt to reuse the current location if available */ - if 
(unlikely(i915_vma_pin(vma, 0, 0, pin_flags))) { - if (entry->flags & EXEC_OBJECT_PINNED) - return false; - - /* Failing that pick any _free_ space if suitable */ - if (unlikely(i915_vma_pin(vma, - entry->pad_to_size, - entry->alignment, - eb_pin_flags(entry, ev->flags) | - PIN_USER | PIN_NOEVICT))) - return false; - } + if (!i915_vma_pin_inplace(vma, pin_flags)) + return false; if (unlikely(ev->flags & EXEC_OBJECT_NEEDS_FENCE)) { - if (unlikely(i915_vma_pin_fence(vma))) { - i915_vma_unpin(vma); + if (!eb_pin_vma_fence_inplace(ev)) { + __i915_vma_unpin(vma); return false; } - - if (vma->fence) - ev->flags |= __EXEC_OBJECT_HAS_FENCE; } + GEM_BUG_ON(eb_vma_misplaced(entry, vma, ev->flags)); + ev->flags |= __EXEC_OBJECT_HAS_PIN; - return !eb_vma_misplaced(entry, vma, ev->flags); + return true; } static int @@ -676,14 +682,17 @@ static int eb_reserve_vm(struct i915_execbuffer *eb) struct drm_i915_gem_exec_object2 *entry = ev->exec; struct i915_vma *vma = ev->vma; - if (eb_pin_vma(eb, entry, ev)) { + if (eb_pin_vma_inplace(eb, entry, ev)) { if (entry->offset != vma->node.start) { entry->offset = vma->node.start | UPDATE; eb->args->flags |= __EXEC_HAS_RELOC; } } else { - eb_unreserve_vma(ev); - list_add_tail(&ev->unbound_link, &unbound); + /* Lightly sort user placed objects to the fore */ + if (ev->flags & EXEC_OBJECT_PINNED) + list_add(&ev->unbound_link, &unbound); + else + list_add_tail(&ev->unbound_link, &unbound); } } diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 3b43d485d7c2..0c9af30fc3d6 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -740,11 +740,13 @@ i915_vma_detach(struct i915_vma *vma) list_del(&vma->vm_link); } -static bool t
[Intel-gfx] [PATCH 11/20] drm/i915/gem: Separate the ww_mutex walker into its own list
In preparation for making eb_vma bigger and heavier to run in parallel, we need to stop applying an in-place swap() to reorder around ww_mutex deadlocks. Keep the array intact and reorder the locks using a dedicated list. Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 83 --- 1 file changed, 54 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 6d20be29ff3c..4d8ac89c56fc 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -37,6 +37,7 @@ struct eb_vma { struct list_head bind_link; struct list_head unbound_link; struct list_head reloc_link; + struct list_head submit_link; struct hlist_node node; u32 handle; @@ -248,6 +249,8 @@ struct i915_execbuffer { /** list of vma that have execobj.relocation_count */ struct list_head relocs; + struct list_head submit_list; + /** * Track the most recently used object for relocations, as we * frequently have to perform multiple relocations within the same @@ -341,6 +344,42 @@ static void eb_vma_array_put(struct eb_vma_array *arr) kref_put(&arr->kref, eb_vma_array_destroy); } +static int +eb_lock_vma(struct i915_execbuffer *eb, struct ww_acquire_ctx *acquire) +{ + struct eb_vma *ev; + int err = 0; + + list_for_each_entry(ev, &eb->submit_list, submit_link) { + struct i915_vma *vma = ev->vma; + + err = ww_mutex_lock_interruptible(&vma->resv->lock, acquire); + if (err == -EDEADLK) { + struct eb_vma *unlock = ev, *en; + + list_for_each_entry_safe_continue_reverse(unlock, en, + &eb->submit_list, + submit_link) { + ww_mutex_unlock(&unlock->vma->resv->lock); + list_move_tail(&unlock->submit_link, &eb->submit_list); + } + + GEM_BUG_ON(!list_is_first(&ev->submit_link, &eb->submit_list)); + err = ww_mutex_lock_slow_interruptible(&vma->resv->lock, + acquire); + } + if (err) { + list_for_each_entry_continue_reverse(ev, +&eb->submit_list, +submit_link) + 
ww_mutex_unlock(&ev->vma->resv->lock); + break; + } + } + + return err; +} + static int eb_create(struct i915_execbuffer *eb) { /* Allocate an extra slot for use by the command parser + sentinel */ @@ -393,6 +432,10 @@ static int eb_create(struct i915_execbuffer *eb) eb->lut_size = -eb->buffer_count; } + INIT_LIST_HEAD(&eb->bind_list); + INIT_LIST_HEAD(&eb->submit_list); + INIT_LIST_HEAD(&eb->relocs); + return 0; } @@ -574,6 +617,7 @@ eb_add_vma(struct i915_execbuffer *eb, } list_add_tail(&ev->bind_link, &eb->bind_list); + list_add_tail(&ev->submit_link, &eb->submit_list); if (entry->relocation_count) list_add_tail(&ev->reloc_link, &eb->relocs); @@ -910,9 +954,6 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb) unsigned int i; int err = 0; - INIT_LIST_HEAD(&eb->bind_list); - INIT_LIST_HEAD(&eb->relocs); - for (i = 0; i < eb->buffer_count; i++) { struct i915_vma *vma; @@ -1583,38 +1624,19 @@ static int eb_relocate(struct i915_execbuffer *eb) static int eb_move_to_gpu(struct i915_execbuffer *eb) { - const unsigned int count = eb->buffer_count; struct ww_acquire_ctx acquire; - unsigned int i; + struct eb_vma *ev; int err = 0; ww_acquire_init(&acquire, &reservation_ww_class); - for (i = 0; i < count; i++) { - struct eb_vma *ev = &eb->vma[i]; - struct i915_vma *vma = ev->vma; - - err = ww_mutex_lock_interruptible(&vma->resv->lock, &acquire); - if (err == -EDEADLK) { - GEM_BUG_ON(i == 0); - do { - int j = i - 1; - - ww_mutex_unlock(&eb->vma[j].vma->resv->lock); - - swap(eb->vma[i], eb->vma[j]); - } while (--i); + err = eb_lock_vma(eb, &acquire); + if (err) + goto err_fini; - err = ww_mutex_lock_slow_interruptible(&vma->resv->lock, -
[Intel-gfx] [PATCH 18/20] drm/i915/gem: Pull execbuf dma resv under a single critical section
Acquire all the objects and their backing storage, and page directories, as used by execbuf under a single common ww_mutex. Albeit we have to restart the critical section a few times in order to handle various restrictions (such as avoiding copy_(from|to)_user and mmap_sem). Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 157 -- .../i915/gem/selftests/i915_gem_execbuffer.c | 8 +- 2 files changed, 78 insertions(+), 87 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index fbf5c5cd51ca..bcd100f8a6c7 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -20,6 +20,7 @@ #include "gt/intel_gt_pm.h" #include "gt/intel_gt_requests.h" #include "gt/intel_ring.h" +#include "mm/i915_acquire_ctx.h" #include "i915_drv.h" #include "i915_gem_clflush.h" @@ -244,6 +245,8 @@ struct i915_execbuffer { struct intel_context *context; /* logical state for the request */ struct i915_gem_context *gem_context; /** caller's context */ + struct i915_acquire_ctx acquire; /** lock for _all_ DMA reservations */ + struct i915_request *request; /** our request to build */ struct eb_vma *batch; /** identity of the batch obj/vma */ @@ -386,42 +389,6 @@ static void eb_vma_array_put(struct eb_vma_array *arr) kref_put(&arr->kref, eb_vma_array_destroy); } -static int -eb_lock_vma(struct i915_execbuffer *eb, struct ww_acquire_ctx *acquire) -{ - struct eb_vma *ev; - int err = 0; - - list_for_each_entry(ev, &eb->submit_list, submit_link) { - struct i915_vma *vma = ev->vma; - - err = ww_mutex_lock_interruptible(&vma->resv->lock, acquire); - if (err == -EDEADLK) { - struct eb_vma *unlock = ev, *en; - - list_for_each_entry_safe_continue_reverse(unlock, en, - &eb->submit_list, - submit_link) { - ww_mutex_unlock(&unlock->vma->resv->lock); - list_move_tail(&unlock->submit_link, &eb->submit_list); - } - - GEM_BUG_ON(!list_is_first(&ev->submit_link, 
&eb->submit_list)); - err = ww_mutex_lock_slow_interruptible(&vma->resv->lock, - acquire); - } - if (err) { - list_for_each_entry_continue_reverse(ev, -&eb->submit_list, -submit_link) - ww_mutex_unlock(&ev->vma->resv->lock); - break; - } - } - - return err; -} - static int eb_create(struct i915_execbuffer *eb) { /* Allocate an extra slot for use by the sentinel */ @@ -665,6 +632,25 @@ eb_add_vma(struct i915_execbuffer *eb, } } +static int eb_lock_mm(struct i915_execbuffer *eb) +{ + struct eb_vma *ev; + int err; + + list_for_each_entry(ev, &eb->bind_list, bind_link) { + err = i915_acquire_ctx_lock(&eb->acquire, ev->vma->obj); + if (err) + return err; + } + + return 0; +} + +static int eb_acquire_mm(struct i915_execbuffer *eb) +{ + return i915_acquire_mm(&eb->acquire); +} + struct eb_vm_work { struct dma_fence_work base; struct eb_vma_array *array; @@ -1390,7 +1376,15 @@ static int eb_reserve_vm(struct i915_execbuffer *eb) unsigned long count; struct eb_vma *ev; unsigned int pass; - int err = 0; + int err; + + err = eb_lock_mm(eb); + if (err) + return err; + + err = eb_acquire_mm(eb); + if (err) + return err; count = 0; INIT_LIST_HEAD(&unbound); @@ -1416,10 +1410,19 @@ static int eb_reserve_vm(struct i915_execbuffer *eb) if (count == 0) return 0; + /* We need to reserve page directories, release all, start over */ + i915_acquire_ctx_fini(&eb->acquire); + pass = 0; do { struct eb_vm_work *work; + i915_acquire_ctx_init(&eb->acquire); + + err = eb_lock_mm(eb); + if (err) + return err; + work = eb_vm_work(eb, count); if (!work) return -ENOMEM; @@ -1437,6 +1440,10 @@ static int eb_reserve_vm(struct i915_execbuffer *eb) } } + err = eb_acquire_mm(eb); + if (err) + return eb_vm_work_cancel(work, err); +
[Intel-gfx] [PATCH 03/20] drm/i915/gem: Don't drop the timeline lock during execbuf
Our timeline lock is our defence against a concurrent execbuf interrupting our request construction. We need to hold it throughout or, for example, a second thread may interject a relocation request in between our own relocation request and execution in the ring. A second, major benefit is that it allows us to preserve a large chunk of the ringbuffer for our exclusive use, which should virtually eliminate the threat of hitting a wait_for_space during request construction -- although we should have already dropped other contentious locks at that point. Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 241 -- .../i915/gem/selftests/i915_gem_execbuffer.c | 24 +- 2 files changed, 186 insertions(+), 79 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index dd15a799f9d6..e4d06db3f313 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -259,6 +259,8 @@ struct i915_execbuffer { bool has_fence : 1; bool needs_unfenced : 1; + struct intel_context *ce; + struct i915_vma *target; struct i915_request *rq; struct i915_vma *rq_vma; @@ -639,6 +641,35 @@ static int eb_reserve_vma(const struct i915_execbuffer *eb, return 0; } +static void retire_requests(struct intel_timeline *tl, struct i915_request *end) +{ + struct i915_request *rq, *rn; + + list_for_each_entry_safe(rq, rn, &tl->requests, link) + if (rq == end || !i915_request_retire(rq)) + break; +} + +static int wait_for_timeline(struct intel_timeline *tl) +{ + do { + struct dma_fence *fence; + int err; + + fence = i915_active_fence_get(&tl->last_request); + if (!fence) + return 0; + + err = dma_fence_wait(fence, true); + dma_fence_put(fence); + if (err) + return err; + + /* Retiring may trigger a barrier, requiring an extra pass */ + retire_requests(tl, NULL); + } while (1); +} + static int eb_reserve(struct i915_execbuffer *eb) { const unsigned int count = 
eb->buffer_count; @@ -646,7 +677,6 @@ static int eb_reserve(struct i915_execbuffer *eb) struct list_head last; struct eb_vma *ev; unsigned int i, pass; - int err = 0; /* * Attempt to pin all of the buffers into the GTT. @@ -662,18 +692,22 @@ static int eb_reserve(struct i915_execbuffer *eb) * room for the earlier objects *unless* we need to defragment. */ - if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex)) - return -EINTR; - pass = 0; do { + int err = 0; + + if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex)) + return -EINTR; + list_for_each_entry(ev, &eb->unbound, bind_link) { err = eb_reserve_vma(eb, ev, pin_flags); if (err) break; } - if (!(err == -ENOSPC || err == -EAGAIN)) - break; + if (!(err == -ENOSPC || err == -EAGAIN)) { + mutex_unlock(&eb->i915->drm.struct_mutex); + return err; + } /* Resort *all* the objects into priority order */ INIT_LIST_HEAD(&eb->unbound); @@ -702,11 +736,10 @@ static int eb_reserve(struct i915_execbuffer *eb) list_add_tail(&ev->bind_link, &last); } list_splice_tail(&last, &eb->unbound); + mutex_unlock(&eb->i915->drm.struct_mutex); if (err == -EAGAIN) { - mutex_unlock(&eb->i915->drm.struct_mutex); flush_workqueue(eb->i915->mm.userptr_wq); - mutex_lock(&eb->i915->drm.struct_mutex); continue; } @@ -715,25 +748,23 @@ static int eb_reserve(struct i915_execbuffer *eb) break; case 1: - /* Too fragmented, unbind everything and retry */ - mutex_lock(&eb->context->vm->mutex); - err = i915_gem_evict_vm(eb->context->vm); - mutex_unlock(&eb->context->vm->mutex); + /* +* Too fragmented, retire everything on the timeline +* and so make it all [contexts included] available to +* evict. +*/ + err = wait_for_timeline(eb->context->timeline); if (err) -
Re: [Intel-gfx] linux-next: manual merge of the drm-intel tree with the drm-intel-fixes tree
Hi all, On Tue, 30 Jun 2020 11:52:02 +1000 Stephen Rothwell wrote: > > Today's linux-next merge of the drm-intel tree got a conflict in: > > drivers/gpu/drm/i915/gvt/handlers.c > > between commit: > > fc1e3aa0337c ("drm/i915/gvt: Fix incorrect check of enabled bits in mask > registers") > > from the drm-intel-fixes tree and commit: > > 5f4ae2704d59 ("drm/i915: Identify Cometlake platform") > > from the drm-intel tree. > > I fixed it up (see below) and can carry the fix as necessary. This > is now fixed as far as linux-next is concerned, but any non trivial > conflicts should be mentioned to your upstream maintainer when your tree > is submitted for merging. You may also want to consider cooperating > with the maintainer of the conflicting tree to minimise any particularly > complex conflicts. > > diff --cc drivers/gpu/drm/i915/gvt/handlers.c > index fadd2adb8030,26cae4846c82.. > --- a/drivers/gpu/drm/i915/gvt/handlers.c > +++ b/drivers/gpu/drm/i915/gvt/handlers.c > @@@ -1731,8 -1734,9 +1734,9 @@@ static int ring_mode_mmio_write(struct > return 0; > } > > - if (IS_COFFEELAKE(vgpu->gvt->gt->i915) && > + if ((IS_COFFEELAKE(vgpu->gvt->gt->i915) || > + IS_COMETLAKE(vgpu->gvt->gt->i915)) && > -data & _MASKED_BIT_ENABLE(2)) { > +IS_MASKED_BITS_ENABLED(data, 2)) { > enter_failsafe_mode(vgpu, GVT_FAILSAFE_UNSUPPORTED_GUEST); > return 0; > } This is now a conflict between the drm tree and Linus' tree. -- Cheers, Stephen Rothwell
Re: [Intel-gfx] [PATCH 17/23] drm/i915/gem: Asynchronous GTT unbinding
Quoting Andi Shyti (2020-07-05 23:01:29) > Hi Chris, > > > +static int gen6_fixup_ggtt(struct i915_vma *vma) > > you create this function here and remove it in patch 21. This > series is a bit confusing, can we have a final version of it? It gets removed because the next patches reorder all the pinning around this central function. Until that occurs, the fixup occurs after we do the pinning. And those patches depend on this to provide the central pinning. So it's a circular dependency and this patch needs to provide the fixup so that it works by itself. -Chris
Re: [Intel-gfx] [PATCH 17/23] drm/i915/gem: Asynchronous GTT unbinding
Hi Chris, > +static int gen6_fixup_ggtt(struct i915_vma *vma) you create this function here and remove it in patch 21. This series is a bit confusing, can we have a final version of it? Andi
Re: [Intel-gfx] [PATCH v2 5/6] drm/i915/display: use port_info in intel_ddi_init
Hi Lucas, Thank you for the patch! Yet something to improve: [auto build test ERROR on drm-intel/for-linux-next] [cannot apply to drm-tip/drm-tip v5.8-rc3 next-20200703] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Lucas-De-Marchi/display-ddi-keep-register-indexes-in-a-table/20200625-081557 base: git://anongit.freedesktop.org/drm-intel for-linux-next config: x86_64-randconfig-a002-20200705 (attached as .config) compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project f804bd586ee58199db4cfb2da8e9ef067425900b) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install x86_64 cross compiling tool for clang build # apt-get install binutils-x86-64-linux-gnu # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All errors (new ones prefixed by >>): >> drivers/gpu/drm/i915/display/intel_ddi.c:4900:28: error: format string is >> not a string literal (potentially insecure) [-Werror,-Wformat-security] DRM_MODE_ENCODER_TMDS, port_info->name); ^~~ drivers/gpu/drm/i915/display/intel_ddi.c:4900:28: note: treat the string as an argument to avoid this DRM_MODE_ENCODER_TMDS, port_info->name); ^ "%s", 1 error generated. 
vim +4900 drivers/gpu/drm/i915/display/intel_ddi.c 4860 4861 void intel_ddi_init(struct drm_i915_private *dev_priv, 4862 const struct intel_ddi_port_info *port_info) 4863 { 4864 enum port port = port_info->port; 4865 struct intel_digital_port *intel_dig_port; 4866 struct intel_encoder *encoder; 4867 bool init_hdmi, init_dp, init_lspcon = false; 4868 4869 init_hdmi = intel_bios_port_supports_dvi(dev_priv, port) || 4870 intel_bios_port_supports_hdmi(dev_priv, port); 4871 init_dp = intel_bios_port_supports_dp(dev_priv, port); 4872 4873 if (intel_bios_is_lspcon_present(dev_priv, port)) { 4874 /* 4875 * Lspcon device needs to be driven with DP connector 4876 * with special detection sequence. So make sure DP 4877 * is initialized before lspcon. 4878 */ 4879 init_dp = true; 4880 init_lspcon = true; 4881 init_hdmi = false; 4882 drm_dbg_kms(&dev_priv->drm, "VBT says port %s has lspcon\n", 4883 port_info->name); 4884 } 4885 4886 if (!init_dp && !init_hdmi) { 4887 drm_dbg_kms(&dev_priv->drm, 4888 "VBT says port %s is not DVI/HDMI/DP compatible, respect it\n", 4889 port_info->name); 4890 return; 4891 } 4892 4893 intel_dig_port = kzalloc(sizeof(*intel_dig_port), GFP_KERNEL); 4894 if (!intel_dig_port) 4895 return; 4896 4897 encoder = &intel_dig_port->base; 4898 4899 drm_encoder_init(&dev_priv->drm, &encoder->base, &intel_ddi_funcs, > 4900 DRM_MODE_ENCODER_TMDS, port_info->name); --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org