[Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [01/20] drm/i915: Preallocate stashes for vma page-directories

2020-07-05 Thread Patchwork
== Series Details ==

Series: series starting with [01/20] drm/i915: Preallocate stashes for vma 
page-directories
URL   : https://patchwork.freedesktop.org/series/79129/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8708 -> Patchwork_18082


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18082/index.html

Known issues


  Here are the changes found in Patchwork_18082 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@gem_exec_suspend@basic-s0:
- fi-ilk-650: [PASS][1] -> [DMESG-WARN][2] ([i915#164])
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8708/fi-ilk-650/igt@gem_exec_susp...@basic-s0.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18082/fi-ilk-650/igt@gem_exec_susp...@basic-s0.html

  * igt@gem_flink_basic@double-flink:
- fi-tgl-y:   [PASS][3] -> [DMESG-WARN][4] ([i915#402]) +1 similar 
issue
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8708/fi-tgl-y/igt@gem_flink_ba...@double-flink.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18082/fi-tgl-y/igt@gem_flink_ba...@double-flink.html

  * igt@kms_busy@basic@flip:
- fi-kbl-x1275:   [PASS][5] -> [DMESG-WARN][6] ([i915#62] / [i915#92] / 
[i915#95])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8708/fi-kbl-x1275/igt@kms_busy@ba...@flip.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18082/fi-kbl-x1275/igt@kms_busy@ba...@flip.html

  
 Possible fixes 

  * igt@gem_exec_suspend@basic-s0:
- fi-tgl-u2:  [FAIL][7] ([i915#1888]) -> [PASS][8]
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8708/fi-tgl-u2/igt@gem_exec_susp...@basic-s0.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18082/fi-tgl-u2/igt@gem_exec_susp...@basic-s0.html

  * igt@i915_pm_backlight@basic-brightness:
- fi-whl-u:   [DMESG-WARN][9] ([i915#95]) -> [PASS][10]
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8708/fi-whl-u/igt@i915_pm_backli...@basic-brightness.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18082/fi-whl-u/igt@i915_pm_backli...@basic-brightness.html

  * igt@vgem_basic@setversion:
- fi-tgl-y:   [DMESG-WARN][11] ([i915#402]) -> [PASS][12] +1 
similar issue
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8708/fi-tgl-y/igt@vgem_ba...@setversion.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18082/fi-tgl-y/igt@vgem_ba...@setversion.html

  
 Warnings 

  * igt@gem_exec_suspend@basic-s0:
- fi-kbl-x1275:   [DMESG-WARN][13] ([i915#62] / [i915#92]) -> 
[DMESG-WARN][14] ([i915#62] / [i915#92] / [i915#95]) +3 similar issues
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8708/fi-kbl-x1275/igt@gem_exec_susp...@basic-s0.html
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18082/fi-kbl-x1275/igt@gem_exec_susp...@basic-s0.html

  * igt@kms_cursor_legacy@basic-flip-after-cursor-varying-size:
- fi-kbl-x1275:   [DMESG-WARN][15] ([i915#62] / [i915#92] / [i915#95]) 
-> [DMESG-WARN][16] ([i915#62] / [i915#92]) +2 similar issues
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8708/fi-kbl-x1275/igt@kms_cursor_leg...@basic-flip-after-cursor-varying-size.html
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18082/fi-kbl-x1275/igt@kms_cursor_leg...@basic-flip-after-cursor-varying-size.html

  
  [i915#164]: https://gitlab.freedesktop.org/drm/intel/issues/164
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62
  [i915#92]: https://gitlab.freedesktop.org/drm/intel/issues/92
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95


Participating hosts (43 -> 37)
--

  Additional (1): fi-skl-guc 
  Missing(7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan 
fi-ctg-p8600 fi-byt-clapper fi-bdw-samus 


Build changes
-

  * Linux: CI_DRM_8708 -> Patchwork_18082

  CI-20190529: 20190529
  CI_DRM_8708: 170e94a1430fd0a4f0841ad0f7366904d52e49be @ 
git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5722: 9985cf23e9db9557bc7d714f5b72602e427497d3 @ 
git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_18082: 33bcfafbc4ab5abd3e686c0e7b58aa7f7b61da4a @ 
git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

33bcfafbc4ab drm/i915: Track i915_vma with its own reference counter
f3622825131d drm/i915/gem: Replace i915_gem_object.mm.mutex with 
reservation_ww_class
0c763806ba4f drm/i915/gem: Pull execbuf dma resv under a single critical section
2036c3ef6f3e drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.
3a96506f63d3 drm/i915/gem: Reintroduce multiple passes for reloc processing
8b519af3185f 

[Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [01/20] drm/i915: Preallocate stashes for vma page-directories

2020-07-05 Thread Patchwork
== Series Details ==

Series: series starting with [01/20] drm/i915: Preallocate stashes for vma 
page-directories
URL   : https://patchwork.freedesktop.org/series/79129/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.0
Fast mode used, each commit won't be checked separately.
+drivers/gpu/drm/i915/selftests/i915_syncmap.c:80:54: warning: dubious: x | !y



[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/20] drm/i915: Preallocate stashes for vma page-directories

2020-07-05 Thread Patchwork
== Series Details ==

Series: series starting with [01/20] drm/i915: Preallocate stashes for vma 
page-directories
URL   : https://patchwork.freedesktop.org/series/79129/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
b5c5d0ce6d3d drm/i915: Preallocate stashes for vma page-directories
5ff5ae1fb079 drm/i915: Switch to object allocations for page directories
1da308be5b55 drm/i915/gem: Don't drop the timeline lock during execbuf
a8aa1e74f47e drm/i915/gem: Rename execbuf.bind_link to unbound_link
6779fcaed114 drm/i915/gem: Break apart the early i915_vma_pin from execbuf 
object lookup
c3cc3ce10691 drm/i915/gem: Remove the call for no-evict i915_vma_pin
acb85ac92a2f drm/i915: Add list_for_each_entry_safe_continue_reverse
-:21: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'pos' - possible side-effects?
#21: FILE: drivers/gpu/drm/i915/i915_utils.h:269:
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)
\
+   for (pos = list_prev_entry(pos, member),\
+n = list_prev_entry(pos, member);  \
+&pos->member != (head);\
+pos = n, n = list_prev_entry(n, member))

-:21: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'n' - possible side-effects?
#21: FILE: drivers/gpu/drm/i915/i915_utils.h:269:
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)
\
+   for (pos = list_prev_entry(pos, member),\
+n = list_prev_entry(pos, member);  \
+&pos->member != (head);\
+pos = n, n = list_prev_entry(n, member))

-:21: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'member' - possible 
side-effects?
#21: FILE: drivers/gpu/drm/i915/i915_utils.h:269:
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)
\
+   for (pos = list_prev_entry(pos, member),\
+n = list_prev_entry(pos, member);  \
+&pos->member != (head);\
+pos = n, n = list_prev_entry(n, member))

total: 0 errors, 0 warnings, 3 checks, 12 lines checked
ed90e5840625 drm/i915: Always defer fenced work to the worker
03e637530b30 drm/i915/gem: Assign context id for async work
fe694efd1728 drm/i915: Export a preallocate variant of i915_active_acquire()
8669de2e382e drm/i915/gem: Separate the ww_mutex walker into its own list
a1fb11db685d drm/i915/gem: Asynchronous GTT unbinding
92aa37aeda01 drm/i915/gem: Bind the fence async for execbuf
6002974eb6ce drm/i915/gem: Include cmdparser in common execbuf pinning
8b519af3185f drm/i915/gem: Include secure batch in common execbuf pinning
3a96506f63d3 drm/i915/gem: Reintroduce multiple passes for reloc processing
-:1434: WARNING:MEMORY_BARRIER: memory barrier without comment
#1434: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c:161:
+   wmb();

total: 0 errors, 1 warnings, 0 checks, 1421 lines checked
2036c3ef6f3e drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.
-:59: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does 
MAINTAINERS need updating?
#59: 
new file mode 100644

-:354: WARNING:LINE_SPACING: Missing a blank line after declarations
#354: FILE: drivers/gpu/drm/i915/mm/st_acquire_ctx.c:106:
+   const unsigned int total = ARRAY_SIZE(dl->obj);
+   I915_RND_STATE(prng);

-:450: WARNING:YIELD: Using yield() is generally wrong. See yield() kernel-doc 
(sched/core.c)
#450: FILE: drivers/gpu/drm/i915/mm/st_acquire_ctx.c:202:
+   yield(); /* start all threads before we begin */

total: 0 errors, 3 warnings, 0 checks, 446 lines checked
0c763806ba4f drm/i915/gem: Pull execbuf dma resv under a single critical section
f3622825131d drm/i915/gem: Replace i915_gem_object.mm.mutex with 
reservation_ww_class
33bcfafbc4ab drm/i915: Track i915_vma with its own reference counter
-:2081: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#2081: FILE: drivers/gpu/drm/i915/gt/intel_gtt.h:254:
+   spinlock_t lock;

-:3963: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#3963: FILE: drivers/gpu/drm/i915/i915_vma.h:392:
+   spinlock_t lock;

-:4210: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code 
refactoring
#4210: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:419:
+   if (offset < hole_start + 
vma->size)

-:4221: WARNING:LONG_LINE: line length of 101 exceeds 100 columns
#4221: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:427:
+  __func__, p->name, err, 
npages, prime, offset,

-:4231: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code 
refactoring
#4231: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:444:
+ 

[Intel-gfx] [PATCH 09/20] drm/i915/gem: Assign context id for async work

2020-07-05 Thread Chris Wilson
Allocate a few dma fence context ids that we can use to associate async work
[for the CPU] launched on behalf of this context. For extra fun, we allow
a configurable concurrency width.

A current example would be that we spawn an unbound worker for every
userptr get_pages. In the future, we wish to charge this work to the
context that initiated the async work and to impose concurrency limits
based on the context.
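
For reference, the id selection added below can be modelled by a minimal
userspace sketch (the six-CPU count and the fixed base are assumptions for
the example; dma_fence_context_alloc() is stood in for by a constant):

/* Illustrative only: models ctx->async.{context,cur,width} from this patch. */
#include <stdio.h>
#include <stdatomic.h>

static unsigned long long base = 1000;  /* stand-in for dma_fence_context_alloc() */
static unsigned int mask;               /* ctx->async.width after the decrement */
static atomic_uint cur;                 /* ctx->async.cur */

static unsigned long long async_id(void)
{
        /* mirrors i915_gem_context_async_id(): base + (fetch_inc(cur) & mask) */
        return base + (atomic_fetch_add(&cur, 1) & mask);
}

int main(void)
{
        unsigned int width = 4;         /* rounddown_pow_of_two(6 online CPUs) */

        mask = width - 1;               /* the "ctx->async.width--" step */
        for (int i = 0; i < 8; i++)
                printf("job %d -> fence context %llu\n", i, async_id());
        return 0;                       /* cycles 1000..1003, then wraps */
}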

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 4 
 drivers/gpu/drm/i915/gem/i915_gem_context.h   | 6 ++
 drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 6 ++
 3 files changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 41784df51e58..bd68746327b3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -714,6 +714,10 @@ __create_context(struct drm_i915_private *i915)
ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
mutex_init(&ctx->mutex);
 
+   ctx->async.width = rounddown_pow_of_two(num_online_cpus());
+   ctx->async.context = dma_fence_context_alloc(ctx->async.width);
+   ctx->async.width--;
+
spin_lock_init(&ctx->stale.lock);
INIT_LIST_HEAD(&ctx->stale.engines);
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h 
b/drivers/gpu/drm/i915/gem/i915_gem_context.h
index 3702b2fb27ab..e104ff0ae740 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
@@ -134,6 +134,12 @@ int i915_gem_context_setparam_ioctl(struct drm_device 
*dev, void *data,
 int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file);
 
+static inline u64 i915_gem_context_async_id(struct i915_gem_context *ctx)
+{
+   return (ctx->async.context +
+   (atomic_fetch_inc(&ctx->async.cur) & ctx->async.width));
+}
+
 static inline struct i915_gem_context *
 i915_gem_context_get(struct i915_gem_context *ctx)
 {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index ae14ca24a11f..52561f98000f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -85,6 +85,12 @@ struct i915_gem_context {
 
struct intel_timeline *timeline;
 
+   struct {
+   u64 context;
+   atomic_t cur;
+   unsigned int width;
+   } async;
+
/**
 * @vm: unique address space (GTT)
 *
-- 
2.20.1



[Intel-gfx] [PATCH 05/20] drm/i915/gem: Break apart the early i915_vma_pin from execbuf object lookup

2020-07-05 Thread Chris Wilson
As a prelude to the next step where we want to perform all the object
allocations together under the same lock, we first must delay the
i915_vma_pin() as that implicitly does the allocations for us, one by
one. As it only does the allocations one by one, it is not allowed to
wait/evict, whereas by pulling all the allocations together, the entire set
can be scheduled as one.

Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 70 +++
 1 file changed, 39 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index bf8193d9e279..35a57c1fc9c3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -33,6 +33,8 @@ struct eb_vma {
 
/** This vma's place in the execbuf reservation list */
struct drm_i915_gem_exec_object2 *exec;
+
+   struct list_head bind_link;
struct list_head unbound_link;
struct list_head reloc_link;
 
@@ -240,8 +242,8 @@ struct i915_execbuffer {
/** actual size of execobj[] as we may extend it for the cmdparser */
unsigned int buffer_count;
 
-   /** list of vma not yet bound during reservation phase */
-   struct list_head unbound;
+   /** list of all vma required to be bound for this execbuf */
+   struct list_head bind_list;
 
/** list of vma that have execobj.relocation_count */
struct list_head relocs;
@@ -565,6 +567,8 @@ eb_add_vma(struct i915_execbuffer *eb,
eb->lut_size)]);
}
 
+   list_add_tail(&ev->bind_link, &eb->bind_list);
+
if (entry->relocation_count)
list_add_tail(&ev->reloc_link, &eb->relocs);
 
@@ -586,16 +590,6 @@ eb_add_vma(struct i915_execbuffer *eb,
 
eb->batch = ev;
}
-
-   if (eb_pin_vma(eb, entry, ev)) {
-   if (entry->offset != vma->node.start) {
-   entry->offset = vma->node.start | UPDATE;
-   eb->args->flags |= __EXEC_HAS_RELOC;
-   }
-   } else {
-   eb_unreserve_vma(ev);
-   list_add_tail(&ev->unbound_link, &eb->unbound);
-   }
 }
 
 static int eb_reserve_vma(const struct i915_execbuffer *eb,
@@ -670,13 +664,31 @@ static int wait_for_timeline(struct intel_timeline *tl)
} while (1);
 }
 
-static int eb_reserve(struct i915_execbuffer *eb)
+static int eb_reserve_vm(struct i915_execbuffer *eb)
 {
-   const unsigned int count = eb->buffer_count;
unsigned int pin_flags = PIN_USER | PIN_NONBLOCK;
-   struct list_head last;
+   struct list_head last, unbound;
struct eb_vma *ev;
-   unsigned int i, pass;
+   unsigned int pass;
+
+   INIT_LIST_HEAD(&unbound);
+   list_for_each_entry(ev, &eb->bind_list, bind_link) {
+   struct drm_i915_gem_exec_object2 *entry = ev->exec;
+   struct i915_vma *vma = ev->vma;
+
+   if (eb_pin_vma(eb, entry, ev)) {
+   if (entry->offset != vma->node.start) {
+   entry->offset = vma->node.start | UPDATE;
+   eb->args->flags |= __EXEC_HAS_RELOC;
+   }
+   } else {
+   eb_unreserve_vma(ev);
+   list_add_tail(&ev->unbound_link, &unbound);
+   }
+   }
+
+   if (list_empty(&unbound))
+   return 0;
 
/*
 * Attempt to pin all of the buffers into the GTT.
@@ -699,7 +711,7 @@ static int eb_reserve(struct i915_execbuffer *eb)
if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex))
return -EINTR;
 
-   list_for_each_entry(ev, &eb->unbound, unbound_link) {
+   list_for_each_entry(ev, &unbound, unbound_link) {
err = eb_reserve_vma(eb, ev, pin_flags);
if (err)
break;
@@ -710,13 +722,11 @@ static int eb_reserve(struct i915_execbuffer *eb)
}
 
/* Resort *all* the objects into priority order */
-   INIT_LIST_HEAD(&eb->unbound);
+   INIT_LIST_HEAD(&unbound);
INIT_LIST_HEAD(&last);
-   for (i = 0; i < count; i++) {
-   unsigned int flags;
+   list_for_each_entry(ev, &eb->bind_list, bind_link) {
+   unsigned int flags = ev->flags;
 
-   ev = &eb->vma[i];
-   flags = ev->flags;
if (flags & EXEC_OBJECT_PINNED &&
flags & __EXEC_OBJECT_HAS_PIN)
continue;
@@ -725,17 +735,17 @@ static int eb_reserve(struct i915_execbuffer *eb)
 
if (flags & EXEC_OBJECT_PINNED)
/* Pi

[Intel-gfx] [PATCH 15/20] drm/i915/gem: Include secure batch in common execbuf pinning

2020-07-05 Thread Chris Wilson
Pull the GGTT binding for the secure batch dispatch into the common vma
pinning routine for execbuf, so that there is just a single central
place for all i915_vma_pin().

Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 88 +++
 1 file changed, 51 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 8e4681427ce3..629b736adf2c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1648,6 +1648,48 @@ static int eb_alloc_cmdparser(struct i915_execbuffer *eb)
return err;
 }
 
+static int eb_secure_batch(struct i915_execbuffer *eb)
+{
+   struct i915_vma *vma = eb->batch->vma;
+
+   /*
+* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
+* batch" bit. Hence we need to pin secure batches into the global gtt.
+* hsw should have this fixed, but bdw mucks it up again.
+*/
+   if (!(eb->batch_flags & I915_DISPATCH_SECURE))
+   return 0;
+
+   if (GEM_WARN_ON(vma->vm != &eb->engine->gt->ggtt->vm)) {
+   struct eb_vma *ev;
+
+   ev = kzalloc(sizeof(*ev), GFP_KERNEL);
+   if (!ev)
+   return -ENOMEM;
+
+   vma = i915_vma_instance(vma->obj,
+   &eb->engine->gt->ggtt->vm,
+   NULL);
+   if (IS_ERR(vma)) {
+   kfree(ev);
+   return PTR_ERR(vma);
+   }
+
+   ev->vma = i915_vma_get(vma);
+   ev->exec = &no_entry;
+
+   list_add(&ev->submit_link, &eb->submit_list);
+   list_add(&ev->reloc_link, &eb->array->aux_list);
+   list_add(&ev->bind_link, &eb->bind_list);
+
+   GEM_BUG_ON(eb->batch->vma->private);
+   eb->batch = ev;
+   }
+
+   eb->batch->flags |= EXEC_OBJECT_NEEDS_GTT;
+   return 0;
+}
+
 static unsigned int eb_batch_index(const struct i915_execbuffer *eb)
 {
if (eb->args->flags & I915_EXEC_BATCH_FIRST)
@@ -2442,6 +2484,10 @@ static int eb_relocate(struct i915_execbuffer *eb)
if (err)
return err;
 
+   err = eb_secure_batch(eb);
+   if (err)
+   return err;
+
err = eb_reserve_vm(eb);
if (err)
return err;
@@ -2796,7 +2842,7 @@ add_to_client(struct i915_request *rq, struct drm_file 
*file)
spin_unlock(&file_priv->mm.lock);
 }
 
-static int eb_submit(struct i915_execbuffer *eb, struct i915_vma *batch)
+static int eb_submit(struct i915_execbuffer *eb)
 {
int err;
 
@@ -2823,7 +2869,7 @@ static int eb_submit(struct i915_execbuffer *eb, struct 
i915_vma *batch)
}
 
err = eb->engine->emit_bb_start(eb->request,
-   batch->node.start +
+   eb->batch->vma->node.start +
eb->batch_start_offset,
eb->batch_len,
eb->batch_flags);
@@ -3296,7 +3342,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
struct i915_execbuffer eb;
struct dma_fence *in_fence = NULL;
struct sync_file *out_fence = NULL;
-   struct i915_vma *batch;
int out_fence_fd = -1;
int err;
 
@@ -3388,34 +3433,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
if (err)
goto err_vma;
 
-   /*
-* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
-* batch" bit. Hence we need to pin secure batches into the global gtt.
-* hsw should have this fixed, but bdw mucks it up again. */
-   batch = i915_vma_get(eb.batch->vma);
-   if (eb.batch_flags & I915_DISPATCH_SECURE) {
-   struct i915_vma *vma;
-
-   /*
-* So on first glance it looks freaky that we pin the batch here
-* outside of the reservation loop. But:
-* - The batch is already pinned into the relevant ppgtt, so we
-*   already have the backing storage fully allocated.
-* - No other BO uses the global gtt (well contexts, but meh),
-*   so we don't really have issues with multiple objects not
-*   fitting due to fragmentation.
-* So this is actually safe.
-*/
-   vma = i915_gem_object_ggtt_pin(batch->obj, NULL, 0, 0, 0);
-   if (IS_ERR(vma)) {
-   err = PTR_ERR(vma);
-   goto err_vma;
-   }
-
-   GEM_BUG_ON(vma->obj != batch->obj);
-   batch = vma;
-   }
-
/* All GPU relocation batches must be submitted prior to the user rq */
GEM_BUG_ON(

[Intel-gfx] [PATCH 14/20] drm/i915/gem: Include cmdparser in common execbuf pinning

2020-07-05 Thread Chris Wilson
Pull the cmdparser allocations into the reservation phase, so that they
are included in the common vma pinning pass.

Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 348 ++
 drivers/gpu/drm/i915/gem/i915_gem_object.h|  10 +
 drivers/gpu/drm/i915/i915_cmd_parser.c|  21 +-
 3 files changed, 218 insertions(+), 161 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index c14c3b7e0dfd..8e4681427ce3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -25,6 +25,7 @@
 #include "i915_gem_clflush.h"
 #include "i915_gem_context.h"
 #include "i915_gem_ioctls.h"
+#include "i915_memcpy.h"
 #include "i915_sw_fence_work.h"
 #include "i915_trace.h"
 
@@ -52,6 +53,7 @@ struct eb_bind_vma {
 
 struct eb_vma_array {
struct kref kref;
+   struct list_head aux_list;
struct eb_vma vma[];
 };
 
@@ -246,7 +248,6 @@ struct i915_execbuffer {
 
struct i915_request *request; /** our request to build */
struct eb_vma *batch; /** identity of the batch obj/vma */
-   struct i915_vma *trampoline; /** trampoline used for chaining */
 
/** actual size of execobj[] as we may extend it for the cmdparser */
unsigned int buffer_count;
@@ -281,6 +282,11 @@ struct i915_execbuffer {
unsigned int rq_size;
} reloc_cache;
 
+   struct eb_cmdparser {
+   struct eb_vma *shadow;
+   struct eb_vma *trampoline;
+   } parser;
+
u64 invalid_flags; /** Set of execobj.flags that are invalid */
u32 context_flags; /** Set of execobj.flags to insert from the ctx */
 
@@ -298,6 +304,10 @@ struct i915_execbuffer {
struct eb_vma_array *array;
 };
 
+static struct drm_i915_gem_exec_object2 no_entry = {
+   .offset = -1ull
+};
+
 static inline bool eb_use_cmdparser(const struct i915_execbuffer *eb)
 {
return intel_engine_requires_cmd_parser(eb->engine) ||
@@ -314,6 +324,7 @@ static struct eb_vma_array *eb_vma_array_create(unsigned 
int count)
return NULL;
 
kref_init(&arr->kref);
+   INIT_LIST_HEAD(&arr->aux_list);
arr->vma[0].vma = NULL;
 
return arr;
@@ -339,16 +350,31 @@ static inline void eb_unreserve_vma(struct eb_vma *ev)
   __EXEC_OBJECT_HAS_FENCE);
 }
 
+static void eb_vma_destroy(struct eb_vma *ev)
+{
+   eb_unreserve_vma(ev);
+   i915_vma_put(ev->vma);
+}
+
+static void eb_destroy_aux(struct eb_vma_array *arr)
+{
+   struct eb_vma *ev, *en;
+
+   list_for_each_entry_safe(ev, en, &arr->aux_list, reloc_link) {
+   eb_vma_destroy(ev);
+   kfree(ev);
+   }
+}
+
 static void eb_vma_array_destroy(struct kref *kref)
 {
struct eb_vma_array *arr = container_of(kref, typeof(*arr), kref);
-   struct eb_vma *ev = arr->vma;
+   struct eb_vma *ev;
 
-   while (ev->vma) {
-   eb_unreserve_vma(ev);
-   i915_vma_put(ev->vma);
-   ev++;
-   }
+   eb_destroy_aux(arr);
+
+   for (ev = arr->vma; ev->vma; ev++)
+   eb_vma_destroy(ev);
 
kvfree(arr);
 }
@@ -396,8 +422,8 @@ eb_lock_vma(struct i915_execbuffer *eb, struct 
ww_acquire_ctx *acquire)
 
 static int eb_create(struct i915_execbuffer *eb)
 {
-   /* Allocate an extra slot for use by the command parser + sentinel */
-   eb->array = eb_vma_array_create(eb->buffer_count + 2);
+   /* Allocate an extra slot for use by the sentinel */
+   eb->array = eb_vma_array_create(eb->buffer_count + 1);
if (!eb->array)
return -ENOMEM;
 
@@ -1078,7 +1104,7 @@ static int eb_reserve_vma(struct eb_vm_work *work, struct 
eb_bind_vma *bind)
GEM_BUG_ON(!(drm_mm_node_allocated(&vma->node) ^
 drm_mm_node_allocated(&bind->hole)));
 
-   if (entry->offset != vma->node.start) {
+   if (entry != &no_entry && entry->offset != vma->node.start) {
entry->offset = vma->node.start | UPDATE;
*work->p_flags |= __EXEC_HAS_RELOC;
}
@@ -1374,7 +1400,8 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
struct i915_vma *vma = ev->vma;
 
if (eb_pin_vma_inplace(eb, entry, ev)) {
-   if (entry->offset != vma->node.start) {
+   if (entry != &no_entry &&
+   entry->offset != vma->node.start) {
entry->offset = vma->node.start | UPDATE;
eb->args->flags |= __EXEC_HAS_RELOC;
}
@@ -1514,6 +1541,113 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
} while (1);
 }
 
+static int eb_alloc_cmdparser(struct i915_execbuffer *eb)
+{
+   struct intel_gt_buffer_pool_node *pool;
+   struct i915_vma *vma;
+   struct eb_vma *ev;

[Intel-gfx] [PATCH 13/20] drm/i915/gem: Bind the fence async for execbuf

2020-07-05 Thread Chris Wilson
It is illegal to wait on another vma while holding the vm->mutex, as
that easily leads to ABBA deadlocks (we wait on a second vma that waits
on us to release the vm->mutex). So while the vm->mutex exists, move the
waiting outside of the lock into the async binding pipeline.
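
A generic userspace illustration of that ABBA shape (not driver code; a
trylock is used purely so the demo reports the hazard instead of hanging):

/* Illustrative only: thread A holds "vm->mutex" and wants the other lock,
 * thread B holds the other lock and wants "vm->mutex". */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t vm_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t other_vma = PTHREAD_MUTEX_INITIALIZER;

static void *binder(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&vm_mutex);
        usleep(1000);
        if (pthread_mutex_trylock(&other_vma))   /* blocking here would deadlock */
                puts("binder: waiting under vm->mutex would deadlock");
        else
                pthread_mutex_unlock(&other_vma);
        pthread_mutex_unlock(&vm_mutex);
        return NULL;
}

static void *worker(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&other_vma);
        usleep(1000);
        pthread_mutex_lock(&vm_mutex);           /* needs what binder holds */
        pthread_mutex_unlock(&vm_mutex);
        pthread_mutex_unlock(&other_vma);
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, binder, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}

Moving the fence wait into the async binding pipeline is the driver-side
equivalent of never sleeping for the second resource while the first is held.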

Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  21 +--
 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c  | 137 +-
 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h  |   5 +
 3 files changed, 151 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 6a406e8798ef..c14c3b7e0dfd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1056,15 +1056,6 @@ static int eb_reserve_vma(struct eb_vm_work *work, 
struct eb_bind_vma *bind)
return err;
 
 pin:
-   if (unlikely(exec_flags & EXEC_OBJECT_NEEDS_FENCE)) {
-   err = __i915_vma_pin_fence(vma); /* XXX no waiting */
-   if (unlikely(err))
-   return err;
-
-   if (vma->fence)
-   bind->ev->flags |= __EXEC_OBJECT_HAS_FENCE;
-   }
-
bind_flags &= ~atomic_read(&vma->flags);
if (bind_flags) {
err = set_bind_fence(vma, work);
@@ -1095,6 +1086,15 @@ static int eb_reserve_vma(struct eb_vm_work *work, 
struct eb_bind_vma *bind)
bind->ev->flags |= __EXEC_OBJECT_HAS_PIN;
GEM_BUG_ON(eb_vma_misplaced(entry, vma, bind->ev->flags));
 
+   if (unlikely(exec_flags & EXEC_OBJECT_NEEDS_FENCE)) {
+   err = __i915_vma_pin_fence_async(vma, &work->base);
+   if (unlikely(err))
+   return err;
+
+   if (vma->fence)
+   bind->ev->flags |= __EXEC_OBJECT_HAS_FENCE;
+   }
+
return 0;
 }
 
@@ -1160,6 +1160,9 @@ static void __eb_bind_vma(struct eb_vm_work *work)
struct eb_bind_vma *bind = &work->bind[n];
struct i915_vma *vma = bind->ev->vma;
 
+   if (bind->ev->flags & __EXEC_OBJECT_HAS_FENCE)
+   __i915_vma_apply_fence_async(vma);
+
if (!bind->bind_flags)
goto put;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
index 7fb36b12fe7a..734b6aa61809 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
@@ -21,10 +21,13 @@
  * IN THE SOFTWARE.
  */
 
+#include "i915_active.h"
 #include "i915_drv.h"
 #include "i915_scatterlist.h"
+#include "i915_sw_fence_work.h"
 #include "i915_pvinfo.h"
 #include "i915_vgpu.h"
+#include "i915_vma.h"
 
 /**
  * DOC: fence register handling
@@ -340,19 +343,37 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt 
*ggtt)
return ERR_PTR(-EDEADLK);
 }
 
+static int fence_wait_bind(struct i915_fence_reg *reg)
+{
+   struct dma_fence *fence;
+   int err = 0;
+
+   fence = i915_active_fence_get(®->active.excl);
+   if (fence) {
+   err = dma_fence_wait(fence, true);
+   dma_fence_put(fence);
+   }
+
+   return err;
+}
+
 int __i915_vma_pin_fence(struct i915_vma *vma)
 {
struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
-   struct i915_fence_reg *fence;
+   struct i915_fence_reg *fence = vma->fence;
struct i915_vma *set = i915_gem_object_is_tiled(vma->obj) ? vma : NULL;
int err;
 
lockdep_assert_held(&vma->vm->mutex);
 
/* Just update our place in the LRU if our fence is getting reused. */
-   if (vma->fence) {
-   fence = vma->fence;
+   if (fence) {
GEM_BUG_ON(fence->vma != vma);
+
+   err = fence_wait_bind(fence);
+   if (err)
+   return err;
+
atomic_inc(&fence->pin_count);
if (!fence->dirty) {
list_move_tail(&fence->link, &ggtt->fence_list);
@@ -384,6 +405,116 @@ int __i915_vma_pin_fence(struct i915_vma *vma)
return err;
 }
 
+static int set_bind_fence(struct i915_fence_reg *fence,
+ struct dma_fence_work *work)
+{
+   struct dma_fence *prev;
+   int err;
+
+   if (rcu_access_pointer(fence->active.excl.fence) == &work->dma)
+   return 0;
+
+   err = i915_sw_fence_await_active(&work->chain,
+&fence->active,
+I915_ACTIVE_AWAIT_ACTIVE);
+   if (err)
+   return err;
+
+   if (i915_active_acquire(&fence->active))
+   return -ENOENT;
+
+   prev = i915_active_set_exclusive(&fence->active, &work->dma);
+   if (unlikely(prev)) {
+   err = i915_sw_fence_await_dma_fence(&work->chain, prev, 0,
+   

[Intel-gfx] [PATCH 08/20] drm/i915: Always defer fenced work to the worker

2020-07-05 Thread Chris Wilson
Currently, if an error is raised, we always call the cleanup locally
[and skip the main work callback]. However, some future users may need
to take a mutex to clean up, and so we cannot execute the cleanup
immediately as we may still be in interrupt context.

With the execute-immediate flag, for most cases this should result in
immediate cleanup of an error.
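
A rough sketch of the constraint being worked around (illustrative names,
not this patch's code): cleanup that needs a mutex must be bounced to
process context, e.g. via the unbound workqueue, because the fence callback
may fire in interrupt context:

struct my_cleanup {
        struct work_struct work;
        struct mutex *lock;             /* whatever the release path must take */
};

static void my_cleanup_fn(struct work_struct *work)
{
        struct my_cleanup *c = container_of(work, struct my_cleanup, work);

        mutex_lock(c->lock);            /* safe: we are in process context here */
        /* ... drop references, free resources ... */
        mutex_unlock(c->lock);
        kfree(c);
}

static void on_fence_error(struct my_cleanup *c)
{
        /* Possibly called from irq context: never mutex_lock() here, always
         * bounce to the worker, as this patch now does unconditionally. */
        INIT_WORK(&c->work, my_cleanup_fn);
        queue_work(system_unbound_wq, &c->work);
}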

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_sw_fence_work.c | 25 +++
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c 
b/drivers/gpu/drm/i915/i915_sw_fence_work.c
index a3a81bb8f2c3..29f63ebc24e8 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
@@ -16,11 +16,14 @@ static void fence_complete(struct dma_fence_work *f)
 static void fence_work(struct work_struct *work)
 {
struct dma_fence_work *f = container_of(work, typeof(*f), work);
-   int err;
 
-   err = f->ops->work(f);
-   if (err)
-   dma_fence_set_error(&f->dma, err);
+   if (!f->dma.error) {
+   int err;
+
+   err = f->ops->work(f);
+   if (err)
+   dma_fence_set_error(&f->dma, err);
+   }
 
fence_complete(f);
dma_fence_put(&f->dma);
@@ -36,15 +39,11 @@ fence_notify(struct i915_sw_fence *fence, enum 
i915_sw_fence_notify state)
if (fence->error)
dma_fence_set_error(&f->dma, fence->error);
 
-   if (!f->dma.error) {
-   dma_fence_get(&f->dma);
-   if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
-   fence_work(&f->work);
-   else
-   queue_work(system_unbound_wq, &f->work);
-   } else {
-   fence_complete(f);
-   }
+   dma_fence_get(&f->dma);
+   if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
+   fence_work(&f->work);
+   else
+   queue_work(system_unbound_wq, &f->work);
break;
 
case FENCE_FREE:
-- 
2.20.1



[Intel-gfx] [PATCH 02/20] drm/i915: Switch to object allocations for page directories

2020-07-05 Thread Chris Wilson
The GEM object is grossly overweight for the practicality of tracking
large numbers of individual pages, yet it is currently our only
abstraction for tracking DMA allocations. Since those allocations need
to be reserved upfront before an operation, and we need to break
away from simple system memory, we need to ditch using plain struct page
wrappers.

In the process, we drop the WC mapping as we ended up clflushing
everything anyway due to various issues across a wider range of
platforms. Though in a future step, we need to drop the kmap_atomic
approach which suggests we need to pre-map all the pages and keep them
mapped.

Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   1 +
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |   2 +-
 .../drm/i915/gem/selftests/i915_gem_context.c |   2 +-
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c  |  53 ++--
 drivers/gpu/drm/i915/gt/gen6_ppgtt.h  |   1 +
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c  |  89 +++---
 drivers/gpu/drm/i915/gt/intel_ggtt.c  |  37 ++-
 drivers/gpu/drm/i915/gt/intel_gtt.c   | 292 +++---
 drivers/gpu/drm/i915/gt/intel_gtt.h   |  94 ++
 drivers/gpu/drm/i915/gt/intel_ppgtt.c |  42 ++-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  16 +-
 drivers/gpu/drm/i915/gvt/scheduler.c  |  17 +-
 drivers/gpu/drm/i915/i915_drv.c   |   1 +
 drivers/gpu/drm/i915/i915_drv.h   |   5 -
 drivers/gpu/drm/i915/i915_vma.c   |  18 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  23 ++
 drivers/gpu/drm/i915/selftests/mock_gtt.c |   4 +
 17 files changed, 278 insertions(+), 419 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 5335f799b548..d0847d7896f9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -282,6 +282,7 @@ struct drm_i915_gem_object {
} userptr;
 
unsigned long scratch;
+   u64 encode;
 
void *gvt_info;
};
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c 
b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index 8291ede6902c..9fb06fcc8f8f 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -393,7 +393,7 @@ static int igt_mock_exhaust_device_supported_pages(void 
*arg)
 */
 
for (i = 1; i < BIT(ARRAY_SIZE(page_sizes)); i++) {
-   unsigned int combination = 0;
+   unsigned int combination = SZ_4K;
 
for (j = 0; j < ARRAY_SIZE(page_sizes); j++) {
if (i & BIT(j))
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index b81978890641..1308198543d8 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -1745,7 +1745,7 @@ static int check_scratch_page(struct i915_gem_context 
*ctx, u32 *out)
if (!vm)
return -ENODEV;
 
-   page = vm->scratch[0].base.page;
+   page = __px_page(vm->scratch[0]);
if (!page) {
pr_err("No scratch page!\n");
return -EINVAL;
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c 
b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index 35e2b698f9ed..1c1e807f674d 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -16,8 +16,10 @@ static inline void gen6_write_pde(const struct gen6_ppgtt 
*ppgtt,
  const unsigned int pde,
  const struct i915_page_table *pt)
 {
+   dma_addr_t addr = pt ? px_dma(pt) : px_dma(ppgtt->base.vm.scratch[1]);
+
/* Caller needs to make sure the write completes if necessary */
-   iowrite32(GEN6_PDE_ADDR_ENCODE(px_dma(pt)) | GEN6_PDE_VALID,
+   iowrite32(GEN6_PDE_ADDR_ENCODE(addr) | GEN6_PDE_VALID,
  ppgtt->pd_addr + pde);
 }
 
@@ -79,7 +81,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space 
*vm,
 {
struct gen6_ppgtt * const ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
const unsigned int first_entry = start / I915_GTT_PAGE_SIZE;
-   const gen6_pte_t scratch_pte = vm->scratch[0].encode;
+   const gen6_pte_t scratch_pte = vm->scratch[0]->encode;
unsigned int pde = first_entry / GEN6_PTES;
unsigned int pte = first_entry % GEN6_PTES;
unsigned int num_entries = length / I915_GTT_PAGE_SIZE;
@@ -90,8 +92,6 @@ static void gen6_ppgtt_clear_range(struct i915_address_space 
*vm,
const unsigned int count = min(num_entries, GEN6_PTES - pte);
gen6_pte_t *vaddr;
 
-   GEM_BUG_ON(px_base(pt) == px_base(&vm->scratch[1]));
-
num_entries -= count;
 
  

[Intel-gfx] [PATCH 12/20] drm/i915/gem: Asynchronous GTT unbinding

2020-07-05 Thread Chris Wilson
It is reasonably common for userspace (even modern drivers like iris) to
reuse an active address for a new buffer. This would cause the
application to stall under its mutex (originally struct_mutex) until the
old batches were idle and it could synchronously remove the stale PTE.
However, we can queue up a job that waits on the signal for the old
nodes to complete and upon those signals, remove the old nodes replacing
them with the new ones for the batch. This is still CPU driven, but in
theory we can do the GTT patching from the GPU. The job itself has a
completion signal allowing the execbuf to wait upon the rebinding, and
also other observers to coordinate with the common VM activity.

Letting userspace queue up more work lets it do more without blocking
other clients. In turn, we take care not to let it queue up too much
concurrent work, creating a small number of queues for each context to
limit the number of concurrent tasks.

The implementation relies on only scheduling one unbind operation per
vma as we use the unbound vma->node location to track the stale PTE.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1402
Signed-off-by: Chris Wilson 
Cc: Matthew Auld 
Cc: Andi Shyti 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 917 --
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c  |   1 +
 drivers/gpu/drm/i915/gt/intel_gtt.c   |   4 +
 drivers/gpu/drm/i915/gt/intel_gtt.h   |   2 +
 drivers/gpu/drm/i915/i915_gem.c   |   7 +
 drivers/gpu/drm/i915/i915_gem_gtt.c   |   5 +
 drivers/gpu/drm/i915/i915_vma.c   |  71 +-
 drivers/gpu/drm/i915/i915_vma.h   |   4 +
 8 files changed, 883 insertions(+), 128 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 4d8ac89c56fc..6a406e8798ef 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -18,6 +18,7 @@
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_buffer_pool.h"
 #include "gt/intel_gt_pm.h"
+#include "gt/intel_gt_requests.h"
 #include "gt/intel_ring.h"
 
 #include "i915_drv.h"
@@ -43,6 +44,12 @@ struct eb_vma {
u32 handle;
 };
 
+struct eb_bind_vma {
+   struct eb_vma *ev;
+   struct drm_mm_node hole;
+   unsigned int bind_flags;
+};
+
 struct eb_vma_array {
struct kref kref;
struct eb_vma vma[];
@@ -66,11 +73,12 @@ struct eb_vma_array {
 I915_EXEC_RESOURCE_STREAMER)
 
 /* Catch emission of unexpected errors for CI! */
+#define __EINVAL__ 22
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
 #undef EINVAL
 #define EINVAL ({ \
DRM_DEBUG_DRIVER("EINVAL at %s:%d\n", __func__, __LINE__); \
-   22; \
+   __EINVAL__; \
 })
 #endif
 
@@ -311,6 +319,12 @@ static struct eb_vma_array *eb_vma_array_create(unsigned 
int count)
return arr;
 }
 
+static struct eb_vma_array *eb_vma_array_get(struct eb_vma_array *arr)
+{
+   kref_get(&arr->kref);
+   return arr;
+}
+
 static inline void eb_unreserve_vma(struct eb_vma *ev)
 {
struct i915_vma *vma = ev->vma;
@@ -444,7 +458,10 @@ eb_vma_misplaced(const struct drm_i915_gem_exec_object2 
*entry,
 const struct i915_vma *vma,
 unsigned int flags)
 {
-   if (vma->node.size < entry->pad_to_size)
+   if (test_bit(I915_VMA_ERROR_BIT, __i915_vma_flags(vma)))
+   return true;
+
+   if (vma->node.size < max(vma->size, entry->pad_to_size))
return true;
 
if (entry->alignment && !IS_ALIGNED(vma->node.start, entry->alignment))
@@ -469,32 +486,6 @@ eb_vma_misplaced(const struct drm_i915_gem_exec_object2 
*entry,
return false;
 }
 
-static u64 eb_pin_flags(const struct drm_i915_gem_exec_object2 *entry,
-   unsigned int exec_flags)
-{
-   u64 pin_flags = 0;
-
-   if (exec_flags & EXEC_OBJECT_NEEDS_GTT)
-   pin_flags |= PIN_GLOBAL;
-
-   /*
-* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
-* limit address to the first 4GBs for unflagged objects.
-*/
-   if (!(exec_flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS))
-   pin_flags |= PIN_ZONE_4G;
-
-   if (exec_flags & __EXEC_OBJECT_NEEDS_MAP)
-   pin_flags |= PIN_MAPPABLE;
-
-   if (exec_flags & EXEC_OBJECT_PINNED)
-   pin_flags |= entry->offset | PIN_OFFSET_FIXED;
-   else if (exec_flags & __EXEC_OBJECT_NEEDS_BIAS)
-   pin_flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS;
-
-   return pin_flags;
-}
-
 static bool eb_pin_vma_fence_inplace(struct eb_vma *ev)
 {
struct i915_vma *vma = ev->vma;
@@ -522,6 +513,10 @@ eb_pin_vma_inplace(struct i915_execbuffer *eb,
struct i915_vma *vma = ev->vma;
unsigned int pin_flags;
 
+   /* Concurrent async binds in progress, get in the queue */
+   if (!i915_active_is_idle(&vma->vm->binding))
+   return false

[Intel-gfx] [PATCH 10/20] drm/i915: Export a preallocate variant of i915_active_acquire()

2020-07-05 Thread Chris Wilson
Sometimes we have to be very careful not to allocate underneath a mutex
(or spinlock) and yet still want to track activity. Enter
i915_active_acquire_for_context(). This raises the activity counter on
i915_active prior to use and ensures that the fence-tree contains a slot
for the context.
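
The underlying pattern, as a minimal userspace sketch (not the i915_active
implementation): do all allocation before taking the non-sleeping lock so
that the critical section itself never allocates:

/* Illustrative only: preallocate outside the lock, consume inside it. */
#include <pthread.h>
#include <stdlib.h>

struct node { int id; struct node *next; };

static pthread_spinlock_t lock;
static struct node *head;

static int track_activity(int id)
{
        struct node *n = malloc(sizeof(*n));    /* done before locking */

        if (!n)
                return -1;
        n->id = id;

        pthread_spin_lock(&lock);               /* no allocation past this point */
        n->next = head;
        head = n;
        pthread_spin_unlock(&lock);
        return 0;
}

int main(void)
{
        pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
        for (int i = 0; i < 4; i++)
                track_activity(i);
        return 0;
}

i915_active_acquire_for_context() plays the same role: the slot for the
context is guaranteed to exist before the caller enters its non-allocating
section.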

Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c|   2 +-
 drivers/gpu/drm/i915/gt/intel_timeline.c  |   4 +-
 drivers/gpu/drm/i915/i915_active.c| 113 +++---
 drivers/gpu/drm/i915/i915_active.h|  14 ++-
 4 files changed, 113 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 1015c4fd9f3e..6d20be29ff3c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1789,7 +1789,7 @@ __parser_mark_active(struct i915_vma *vma,
 {
struct intel_gt_buffer_pool_node *node = vma->private;
 
-   return i915_active_ref(&node->active, tl, fence);
+   return i915_active_ref(&node->active, tl->fence_context, fence);
 }
 
 static int
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c 
b/drivers/gpu/drm/i915/gt/intel_timeline.c
index 4546284fede1..e4a5326633b8 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -479,7 +479,9 @@ __intel_timeline_get_seqno(struct intel_timeline *tl,
 * free it after the current request is retired, which ensures that
 * all writes into the cacheline from previous requests are complete.
 */
-   err = i915_active_ref(&tl->hwsp_cacheline->active, tl, &rq->fence);
+   err = i915_active_ref(&tl->hwsp_cacheline->active,
+ tl->fence_context,
+ &rq->fence);
if (err)
goto err_cacheline;
 
diff --git a/drivers/gpu/drm/i915/i915_active.c 
b/drivers/gpu/drm/i915/i915_active.c
index d960d0be5bd2..3f595446fd44 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -217,11 +217,10 @@ excl_retire(struct dma_fence *fence, struct dma_fence_cb 
*cb)
 }
 
 static struct i915_active_fence *
-active_instance(struct i915_active *ref, struct intel_timeline *tl)
+active_instance(struct i915_active *ref, u64 idx)
 {
struct active_node *node, *prealloc;
struct rb_node **p, *parent;
-   u64 idx = tl->fence_context;
 
/*
 * We track the most recently used timeline to skip a rbtree search
@@ -353,21 +352,17 @@ __active_del_barrier(struct i915_active *ref, struct 
active_node *node)
return active_del_barrier(ref, node, barrier_to_engine(node));
 }
 
-int i915_active_ref(struct i915_active *ref,
-   struct intel_timeline *tl,
-   struct dma_fence *fence)
+int i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence)
 {
struct i915_active_fence *active;
int err;
 
-   lockdep_assert_held(&tl->mutex);
-
/* Prevent reaping in case we malloc/wait while building the tree */
err = i915_active_acquire(ref);
if (err)
return err;
 
-   active = active_instance(ref, tl);
+   active = active_instance(ref, idx);
if (!active) {
err = -ENOMEM;
goto out;
@@ -384,32 +379,104 @@ int i915_active_ref(struct i915_active *ref,
atomic_dec(&ref->count);
}
if (!__i915_active_fence_set(active, fence))
-   atomic_inc(&ref->count);
+   __i915_active_acquire(ref);
 
 out:
i915_active_release(ref);
return err;
 }
 
-struct dma_fence *
-i915_active_set_exclusive(struct i915_active *ref, struct dma_fence *f)
+static struct dma_fence *
+__i915_active_set_fence(struct i915_active *ref,
+   struct i915_active_fence *active,
+   struct dma_fence *fence)
 {
struct dma_fence *prev;
 
/* We expect the caller to manage the exclusive timeline ordering */
GEM_BUG_ON(i915_active_is_idle(ref));
 
+   if (is_barrier(active)) { /* proto-node used by our idle barrier */
+   /*
+* This request is on the kernel_context timeline, and so
+* we can use it to substitute for the pending idle-barrer
+* request that we want to emit on the kernel_context.
+*/
+   __active_del_barrier(ref, node_from_active(active));
+   RCU_INIT_POINTER(active->fence, NULL);
+   atomic_dec(&ref->count);
+   }
+
rcu_read_lock();
-   prev = __i915_active_fence_set(&ref->excl, f);
+   prev = __i915_active_fence_set(active, fence);
if (prev)
prev = dma_fence_get_rcu(prev);
else
-   atomic_inc(&ref->count);
+   __i915_active_acquire(ref);
rcu_read_un

[Intel-gfx] [PATCH 17/20] drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.

2020-07-05 Thread Chris Wilson
From: Maarten Lankhorst 

i915_gem_ww_ctx is used to lock all gem bo's for pinning and memory
eviction. We don't use it yet, but let's start adding the definition
first.

To use it, we have to pass a non-NULL ww to gem_object_lock, and must not
unlock directly; unlocking is done in i915_gem_ww_ctx_fini.
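
For orientation, the acquire/backoff dance this enables typically looks like
the following sketch (placeholder object array, stock reservation_ww_class /
dma_resv calls, error handling trimmed; not the interface added here):

/* Illustrative sketch of the ww_mutex backoff loop. */
static int lock_objs(struct drm_i915_gem_object **objs, int count,
                     struct ww_acquire_ctx *acquire)
{
        int contended = -1;
        int i, err;

        ww_acquire_init(acquire, &reservation_ww_class);
retry:
        if (contended != -1) {
                /* Sleep on the lock we lost to, so the retry makes progress. */
                err = dma_resv_lock_slow_interruptible(objs[contended]->base.resv,
                                                       acquire);
                if (err)
                        goto err_fini;
        }

        for (i = 0; i < count; i++) {
                if (i == contended)     /* already held via the slow path */
                        continue;

                err = dma_resv_lock_interruptible(objs[i]->base.resv, acquire);
                if (err) {
                        int j;

                        for (j = 0; j < i; j++)
                                dma_resv_unlock(objs[j]->base.resv);
                        if (contended > i)
                                dma_resv_unlock(objs[contended]->base.resv);

                        if (err == -EDEADLK) {
                                contended = i;
                                goto retry;
                        }
                        goto err_fini;
                }
        }

        ww_acquire_done(acquire);       /* no further locks may join this ctx */
        return 0;

err_fini:
        ww_acquire_fini(acquire);
        return err;
}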

Changes since v1:
- Change ww_ctx and obj order in locking functions (Jonas Lahtinen)

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/Makefile |   4 +
 drivers/gpu/drm/i915/i915_globals.c   |   1 +
 drivers/gpu/drm/i915/i915_globals.h   |   1 +
 drivers/gpu/drm/i915/mm/i915_acquire_ctx.c| 139 ++
 drivers/gpu/drm/i915/mm/i915_acquire_ctx.h|  34 +++
 drivers/gpu/drm/i915/mm/st_acquire_ctx.c  | 242 ++
 .../drm/i915/selftests/i915_mock_selftests.h  |   1 +
 7 files changed, 422 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/mm/i915_acquire_ctx.c
 create mode 100644 drivers/gpu/drm/i915/mm/i915_acquire_ctx.h
 create mode 100644 drivers/gpu/drm/i915/mm/st_acquire_ctx.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 41a27fd5dbc7..33c85b4ff3ed 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -124,6 +124,10 @@ gt-y += \
gt/gen9_renderstate.o
 i915-y += $(gt-y)
 
+# Memory + DMA management
+i915-y += \
+   mm/i915_acquire_ctx.o
+
 # GEM (Graphics Execution Management) code
 gem-y += \
gem/i915_gem_busy.o \
diff --git a/drivers/gpu/drm/i915/i915_globals.c 
b/drivers/gpu/drm/i915/i915_globals.c
index 3aa213684293..51ec42a14694 100644
--- a/drivers/gpu/drm/i915/i915_globals.c
+++ b/drivers/gpu/drm/i915/i915_globals.c
@@ -87,6 +87,7 @@ static void __i915_globals_cleanup(void)
 
 static __initconst int (* const initfn[])(void) = {
i915_global_active_init,
+   i915_global_acquire_init,
i915_global_buddy_init,
i915_global_context_init,
i915_global_gem_context_init,
diff --git a/drivers/gpu/drm/i915/i915_globals.h 
b/drivers/gpu/drm/i915/i915_globals.h
index b2f5cd9b9b1a..11227abf2769 100644
--- a/drivers/gpu/drm/i915/i915_globals.h
+++ b/drivers/gpu/drm/i915/i915_globals.h
@@ -27,6 +27,7 @@ void i915_globals_exit(void);
 
 /* constructors */
 int i915_global_active_init(void);
+int i915_global_acquire_init(void);
 int i915_global_buddy_init(void);
 int i915_global_context_init(void);
 int i915_global_gem_context_init(void);
diff --git a/drivers/gpu/drm/i915/mm/i915_acquire_ctx.c 
b/drivers/gpu/drm/i915/mm/i915_acquire_ctx.c
new file mode 100644
index ..d1c3b958c15d
--- /dev/null
+++ b/drivers/gpu/drm/i915/mm/i915_acquire_ctx.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#include 
+
+#include "i915_globals.h"
+#include "gem/i915_gem_object.h"
+
+#include "i915_acquire_ctx.h"
+
+static struct i915_global_acquire {
+   struct i915_global base;
+   struct kmem_cache *slab_acquires;
+} global;
+
+struct i915_acquire {
+   struct drm_i915_gem_object *obj;
+   struct i915_acquire *next;
+};
+
+static struct i915_acquire *i915_acquire_alloc(void)
+{
+   return kmem_cache_alloc(global.slab_acquires, GFP_KERNEL);
+}
+
+static void i915_acquire_free(struct i915_acquire *lnk)
+{
+   kmem_cache_free(global.slab_acquires, lnk);
+}
+
+void i915_acquire_ctx_init(struct i915_acquire_ctx *ctx)
+{
+   ww_acquire_init(&ctx->ctx, &reservation_ww_class);
+   ctx->locked = NULL;
+}
+
+int i915_acquire_ctx_lock(struct i915_acquire_ctx *ctx,
+ struct drm_i915_gem_object *obj)
+{
+   struct i915_acquire *lock, *lnk;
+   int err;
+
+   lock = i915_acquire_alloc();
+   if (!lock)
+   return -ENOMEM;
+
+   lock->obj = i915_gem_object_get(obj);
+   lock->next = NULL;
+
+   while ((lnk = lock)) {
+   obj = lnk->obj;
+   lock = lnk->next;
+
+   err = dma_resv_lock_interruptible(obj->base.resv, &ctx->ctx);
+   if (err == -EDEADLK) {
+   struct i915_acquire *old;
+
+   while ((old = ctx->locked)) {
+   i915_gem_object_unlock(old->obj);
+   ctx->locked = old->next;
+   old->next = lock;
+   lock = old;
+   }
+
+   err = dma_resv_lock_slow_interruptible(obj->base.resv,
+  &ctx->ctx);
+   }
+   if (!err) {
+   lnk->next = ctx->locked;
+   ctx->locked = lnk;
+   } else {
+   i915_gem_object_put(obj);
+   i915_acquire_free(lnk);
+   }
+   if (err == -EALREADY)
+   err = 0;
+   if (err)
+   break;
+   }
+
+   w

[Intel-gfx] [PATCH 04/20] drm/i915/gem: Rename execbuf.bind_link to unbound_link

2020-07-05 Thread Chris Wilson
Rename the current list of unbound objects so that we can keep track of all
objects that we need to bind, as well as the list of currently unbound
[unprocessed] objects.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index e4d06db3f313..bf8193d9e279 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -33,7 +33,7 @@ struct eb_vma {
 
/** This vma's place in the execbuf reservation list */
struct drm_i915_gem_exec_object2 *exec;
-   struct list_head bind_link;
+   struct list_head unbound_link;
struct list_head reloc_link;
 
struct hlist_node node;
@@ -594,7 +594,7 @@ eb_add_vma(struct i915_execbuffer *eb,
}
} else {
eb_unreserve_vma(ev);
-   list_add_tail(&ev->bind_link, &eb->unbound);
+   list_add_tail(&ev->unbound_link, &eb->unbound);
}
 }
 
@@ -699,7 +699,7 @@ static int eb_reserve(struct i915_execbuffer *eb)
if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex))
return -EINTR;
 
-   list_for_each_entry(ev, &eb->unbound, bind_link) {
+   list_for_each_entry(ev, &eb->unbound, unbound_link) {
err = eb_reserve_vma(eb, ev, pin_flags);
if (err)
break;
@@ -725,15 +725,15 @@ static int eb_reserve(struct i915_execbuffer *eb)
 
if (flags & EXEC_OBJECT_PINNED)
/* Pinned must have their slot */
-   list_add(&ev->bind_link, &eb->unbound);
+   list_add(&ev->unbound_link, &eb->unbound);
else if (flags & __EXEC_OBJECT_NEEDS_MAP)
/* Map require the lowest 256MiB (aperture) */
-   list_add_tail(&ev->bind_link, &eb->unbound);
+   list_add_tail(&ev->unbound_link, &eb->unbound);
else if (!(flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS))
/* Prioritise 4GiB region for restricted bo */
-   list_add(&ev->bind_link, &last);
+   list_add(&ev->unbound_link, &last);
else
-   list_add_tail(&ev->bind_link, &last);
+   list_add_tail(&ev->unbound_link, &last);
}
list_splice_tail(&last, &eb->unbound);
mutex_unlock(&eb->i915->drm.struct_mutex);
-- 
2.20.1



[Intel-gfx] [PATCH 19/20] drm/i915/gem: Replace i915_gem_object.mm.mutex with reservation_ww_class

2020-07-05 Thread Chris Wilson
Our goal is to pull all memory reservations (next iteration
obj->ops->get_pages()) under a ww_mutex, and to align those reservations
with other drivers, i.e. control all such allocations with the
reservation_ww_class. Currently, this is under the purview of the
obj->mm.mutex, and while obj->mm remains an embedded struct we can
"simply" switch to using the reservation_ww_class obj->base.resv->lock

The major consequence is the impact on the shrinker paths as the
reservation_ww_class is used to wrap allocations, and a ww_mutex does
not support subclassing, so we cannot do our usual trick of knowing that
we never recurse inside the shrinker; instead we have to finish the
reclaim with a trylock. This may result in us failing to release the
pages after having released the vma. This will have to do until a better
idea comes along.
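
A hedged sketch of that trylock-based reclaim (shrinker plumbing elided,
helper name assumed; only the locking shape is shown):

/* Illustrative only: inside the shrinker we cannot take obj->base.resv
 * normally, since the allocation we are reclaiming for may already hold
 * a reservation_ww_class lock, so reclaim is attempted opportunistically. */
static bool try_reclaim_object(struct drm_i915_gem_object *obj)
{
        bool reclaimed = false;

        if (!dma_resv_trylock(obj->base.resv))
                return false;           /* contended: skip, move to the next object */

        /* ... unbind idle vma and release the backing pages here ... */
        reclaimed = true;

        dma_resv_unlock(obj->base.resv);
        return reclaimed;
}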

However, this step only converts the mutex over and continues to treat
everything as a single allocation and pinning the pages. With the
ww_mutex in place we can remove the temporary pinning, as we can then
reserve all storage en masse.

One last thing to do: kill the implicit page pinning for active vma.
This will require us to invalidate the vma->pages when the backing store
is removed (and we expect that while the vma is active, we mark the
backing store as active so that it cannot be removed while the HW is
busy.)

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/gem/i915_gem_clflush.c   |  20 +-
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|  18 +-
 drivers/gpu/drm/i915/gem/i915_gem_domain.c|  65 ++---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  44 +++-
 drivers/gpu/drm/i915/gem/i915_gem_object.c|   8 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.h|  37 +--
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   1 -
 drivers/gpu/drm/i915/gem/i915_gem_pages.c | 136 +--
 drivers/gpu/drm/i915/gem/i915_gem_phys.c  |   8 +-
 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |  13 +-
 drivers/gpu/drm/i915/gem/i915_gem_tiling.c|   2 -
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |  15 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  32 ++-
 .../i915/gem/selftests/i915_gem_coherency.c   |  14 +-
 .../drm/i915/gem/selftests/i915_gem_context.c |  10 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c|   2 +
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c  |   2 -
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c  |   1 -
 drivers/gpu/drm/i915/gt/intel_ggtt.c  |   5 +-
 drivers/gpu/drm/i915/gt/intel_gtt.h   |   2 -
 drivers/gpu/drm/i915/gt/intel_ppgtt.c |   1 +
 drivers/gpu/drm/i915/i915_gem.c   |  16 +-
 drivers/gpu/drm/i915/i915_vma.c   | 225 +++---
 drivers/gpu/drm/i915/i915_vma_types.h |   6 -
 drivers/gpu/drm/i915/mm/i915_acquire_ctx.c|  11 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   4 +-
 .../drm/i915/selftests/intel_memory_region.c  |  17 +-
 27 files changed, 319 insertions(+), 396 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c 
b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c
index bc0223716906..a32fd0d5570b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c
@@ -27,16 +27,8 @@ static void __do_clflush(struct drm_i915_gem_object *obj)
 static int clflush_work(struct dma_fence_work *base)
 {
struct clflush *clflush = container_of(base, typeof(*clflush), base);
-   struct drm_i915_gem_object *obj = clflush->obj;
-   int err;
-
-   err = i915_gem_object_pin_pages(obj);
-   if (err)
-   return err;
-
-   __do_clflush(obj);
-   i915_gem_object_unpin_pages(obj);
 
+   __do_clflush(clflush->obj);
return 0;
 }
 
@@ -44,7 +36,7 @@ static void clflush_release(struct dma_fence_work *base)
 {
struct clflush *clflush = container_of(base, typeof(*clflush), base);
 
-   i915_gem_object_put(clflush->obj);
+   i915_gem_object_unpin_pages(clflush->obj);
 }
 
 static const struct dma_fence_work_ops clflush_ops = {
@@ -63,8 +55,14 @@ static struct clflush *clflush_work_create(struct 
drm_i915_gem_object *obj)
if (!clflush)
return NULL;
 
+   if (__i915_gem_object_get_pages_locked(obj)) {
+   kfree(clflush);
+   return NULL;
+   }
+
dma_fence_work_init(&clflush->base, &clflush_ops);
-   clflush->obj = i915_gem_object_get(obj); /* obj <-> clflush cycle */
+   __i915_gem_object_pin_pages(obj);
+   clflush->obj = obj; /* Beware the obj.resv <-> clflush fence cycle */
 
return clflush;
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 2679380159fc..049a15e6b496 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -124,19 +124,12 @@ static int i915_gem_begin_cpu_access(struct dma_buf 
*dma_buf, enum dma_data_dire
bool write = (direction == DMA_B

[Intel-gfx] [PATCH 16/20] drm/i915/gem: Reintroduce multiple passes for reloc processing

2020-07-05 Thread Chris Wilson
The prospect of locking the entire submission sequence under a wide
ww_mutex re-imposes some key restrictions, in particular that we must
not call copy_(from|to)_user underneath the mutex (as the faulthandlers
themselves may need to take the ww_mutex). To satisfy this requirement,
we need to split the relocation handling into multiple phases again.
After dropping the reservations, we need to allocate enough buffer space
both to hold the relocations copied from userspace and to serve as the
relocation command buffer. Once we have finished copying the
relocations, we can then re-acquire all the objects for the execbuf and
rebind them, including our new relocation objects. After we have bound
all the new and old objects into their final locations, we can then
convert the relocation entries into the GPU commands to update the
relocated vma. Finally, once it is all over and we have dropped the
ww_mutex for the last time, we can then complete the update of the user
relocation entries.
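
In rough pseudo-code, the flow described above looks like this (the
helper names are illustrative only, not the functions introduced by this
patch):

static int eb_process_relocations(struct i915_execbuffer *eb)
{
        int err;

        /* Phase 1: no reservations held, so we are free to fault while
         * copying the user relocation entries into freshly allocated
         * buffers (which later double as relocation command buffers).
         */
        err = eb_copy_user_relocs(eb);
        if (err)
                return err;

        /* Phase 2: re-acquire every object for the execbuf, including
         * the new relocation buffers, and bind them into their final
         * locations.
         */
        err = eb_reacquire_and_bind(eb);
        if (err)
                return err;

        /* Phase 3: with the final addresses known, convert the copied
         * entries into GPU commands that patch the relocated vma.
         */
        return eb_emit_reloc_commands(eb);
}

/* Phase 4 happens after the ww_mutex has been dropped for the last
 * time: write the updated offsets back to the user's relocation
 * entries, again free to fault.
 */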

Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 842 --
 .../i915/gem/selftests/i915_gem_execbuffer.c  | 195 ++--
 2 files changed, 520 insertions(+), 517 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 629b736adf2c..fbf5c5cd51ca 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -35,6 +35,7 @@ struct eb_vma {
 
/** This vma's place in the execbuf reservation list */
struct drm_i915_gem_exec_object2 *exec;
+   u32 bias;
 
struct list_head bind_link;
struct list_head unbound_link;
@@ -60,15 +61,12 @@ struct eb_vma_array {
 #define __EXEC_OBJECT_HAS_PIN  BIT(31)
 #define __EXEC_OBJECT_HAS_FENCEBIT(30)
 #define __EXEC_OBJECT_NEEDS_MAPBIT(29)
-#define __EXEC_OBJECT_NEEDS_BIAS   BIT(28)
-#define __EXEC_OBJECT_INTERNAL_FLAGS   (~0u << 28) /* all of the above */
+#define __EXEC_OBJECT_INTERNAL_FLAGS   (~0u << 29) /* all of the above */
 
 #define __EXEC_HAS_RELOC   BIT(31)
 #define __EXEC_INTERNAL_FLAGS  (~0u << 31)
 #define UPDATE PIN_OFFSET_FIXED
 
-#define BATCH_OFFSET_BIAS (256*1024)
-
 #define __I915_EXEC_ILLEGAL_FLAGS \
(__I915_EXEC_UNKNOWN_FLAGS | \
 I915_EXEC_CONSTANTS_MASK  | \
@@ -266,20 +264,18 @@ struct i915_execbuffer {
 * obj/page
 */
struct reloc_cache {
-   struct drm_mm_node node; /** temporary GTT binding */
unsigned int gen; /** Cached value of INTEL_GEN */
bool use_64bit_reloc : 1;
-   bool has_llc : 1;
bool has_fence : 1;
bool needs_unfenced : 1;
 
struct intel_context *ce;
 
-   struct i915_vma *target;
-   struct i915_request *rq;
-   struct i915_vma *rq_vma;
-   u32 *rq_cmd;
-   unsigned int rq_size;
+   struct eb_relocs_link {
+   struct i915_vma *vma;
+   } head;
+   struct drm_i915_gem_relocation_entry *map;
+   unsigned int pos;
} reloc_cache;
 
struct eb_cmdparser {
@@ -288,7 +284,7 @@ struct i915_execbuffer {
} parser;
 
u64 invalid_flags; /** Set of execobj.flags that are invalid */
-   u32 context_flags; /** Set of execobj.flags to insert from the ctx */
+   u32 context_bias;
 
u32 batch_start_offset; /** Location within object of batch */
u32 batch_len; /** Length of batch within object */
@@ -308,6 +304,12 @@ static struct drm_i915_gem_exec_object2 no_entry = {
.offset = -1ull
 };
 
+static u64 noncanonical_addr(u64 addr, const struct i915_address_space *vm)
+{
+   GEM_BUG_ON(!is_power_of_2(vm->total));
+   return addr & (vm->total - 1);
+}
+
 static inline bool eb_use_cmdparser(const struct i915_execbuffer *eb)
 {
return intel_engine_requires_cmd_parser(eb->engine) ||
@@ -479,11 +481,12 @@ static int eb_create(struct i915_execbuffer *eb)
return 0;
 }
 
-static bool
-eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry,
-const struct i915_vma *vma,
-unsigned int flags)
+static bool eb_vma_misplaced(const struct eb_vma *ev)
 {
+   const struct drm_i915_gem_exec_object2 *entry = ev->exec;
+   const struct i915_vma *vma = ev->vma;
+   unsigned int flags = ev->flags;
+
if (test_bit(I915_VMA_ERROR_BIT, __i915_vma_flags(vma)))
return true;
 
@@ -497,8 +500,7 @@ eb_vma_misplaced(const struct drm_i915_gem_exec_object2 
*entry,
vma->node.start != entry->offset)
return true;
 
-   if (flags & __EXEC_OBJECT_NEEDS_BIAS &&
-   vma->node.start < BATCH_OFFSET_BIAS)
+   if (vma->node.start < ev->bias)
return true;
 
if (!(flags & EXEC_OB

[Intel-gfx] [PATCH 01/20] drm/i915: Preallocate stashes for vma page-directories

2020-07-05 Thread Chris Wilson
We need the DMA allocations used for page directories to be performed
up front so that we can include those allocations in our memory
reservation pass. The downside is that we have to assume the
worst case, even before we know the final layout, and always allocate
enough page directories for this object, even when there will be overlap.

It should be noted that the lifetime for the page directories DMA is
more or less decoupled from individual fences as they will be shared
across objects across timelines.
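
A sketch of the intended usage (the helper names are indicative of the
stash interface added by this patch, but treat this as an illustration
rather than the exact code):

static int bind_vma_with_stash(struct i915_address_space *vm,
                               struct i915_vma *vma)
{
        struct i915_vm_pt_stash stash = {};
        int err;

        /* Preallocate the worst-case number of page directories/tables
         * for this range, while we can still block and fail cleanly.
         */
        err = i915_vm_alloc_pt_stash(vm, &stash, vma->size);
        if (err)
                return err;

        /* The allocators then only pop entries from the stash (see
         * gen6_alloc_va_range() below consuming stash->pt[0]), so this
         * step can no longer fail with -ENOMEM.
         */
        vm->allocate_va_range(vm, &stash, vma->node.start, vma->size);

        /* Any unused preallocations are returned to the system. */
        i915_vm_free_pt_stash(vm, &stash);
        return 0;
}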

Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gem/i915_gem_client_blt.c| 11 +--
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c  | 38 +++--
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c  | 77 +-
 drivers/gpu/drm/i915/gt/intel_ggtt.c  | 60 +++---
 drivers/gpu/drm/i915/gt/intel_gtt.h   | 39 ++---
 drivers/gpu/drm/i915/gt/intel_ppgtt.c | 80 ---
 drivers/gpu/drm/i915/i915_vma.c   | 29 ---
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 60 --
 drivers/gpu/drm/i915/selftests/mock_gtt.c | 22 ++---
 9 files changed, 230 insertions(+), 186 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c 
b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
index 278664f831e7..947c8aa8e13e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
@@ -32,12 +32,13 @@ static void vma_clear_pages(struct i915_vma *vma)
vma->pages = NULL;
 }
 
-static int vma_bind(struct i915_address_space *vm,
-   struct i915_vma *vma,
-   enum i915_cache_level cache_level,
-   u32 flags)
+static void vma_bind(struct i915_address_space *vm,
+struct i915_vm_pt_stash *stash,
+struct i915_vma *vma,
+enum i915_cache_level cache_level,
+u32 flags)
 {
-   return vm->vma_ops.bind_vma(vm, vma, cache_level, flags);
+   vm->vma_ops.bind_vma(vm, stash, vma, cache_level, flags);
 }
 
 static void vma_unbind(struct i915_address_space *vm, struct i915_vma *vma)
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c 
b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index 05497b50103f..35e2b698f9ed 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -177,16 +177,16 @@ static void gen6_flush_pd(struct gen6_ppgtt *ppgtt, u64 
start, u64 end)
mutex_unlock(&ppgtt->flush);
 }
 
-static int gen6_alloc_va_range(struct i915_address_space *vm,
-  u64 start, u64 length)
+static void gen6_alloc_va_range(struct i915_address_space *vm,
+   struct i915_vm_pt_stash *stash,
+   u64 start, u64 length)
 {
struct gen6_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
struct i915_page_directory * const pd = ppgtt->base.pd;
-   struct i915_page_table *pt, *alloc = NULL;
+   struct i915_page_table *pt;
intel_wakeref_t wakeref;
u64 from = start;
unsigned int pde;
-   int ret = 0;
 
wakeref = intel_runtime_pm_get(&vm->i915->runtime_pm);
 
@@ -197,21 +197,17 @@ static int gen6_alloc_va_range(struct i915_address_space 
*vm,
if (px_base(pt) == px_base(&vm->scratch[1])) {
spin_unlock(&pd->lock);
 
-   pt = fetch_and_zero(&alloc);
-   if (!pt)
-   pt = alloc_pt(vm);
-   if (IS_ERR(pt)) {
-   ret = PTR_ERR(pt);
-   goto unwind_out;
-   }
+   pt = stash->pt[0];
+   GEM_BUG_ON(!pt);
 
fill32_px(pt, vm->scratch[0].encode);
 
spin_lock(&pd->lock);
if (pd->entry[pde] == &vm->scratch[1]) {
+   stash->pt[0] = pt->stash;
+   atomic_set(&pt->used, 0);
pd->entry[pde] = pt;
} else {
-   alloc = pt;
pt = pd->entry[pde];
}
}
@@ -223,15 +219,7 @@ static int gen6_alloc_va_range(struct i915_address_space 
*vm,
if (i915_vma_is_bound(ppgtt->vma, I915_VMA_GLOBAL_BIND))
gen6_flush_pd(ppgtt, from, start);
 
-   goto out;
-
-unwind_out:
-   gen6_ppgtt_clear_range(vm, from, start - from);
-out:
-   if (alloc)
-   free_px(vm, alloc);
intel_runtime_pm_put(&vm->i915->runtime_pm, wakeref);
-   return ret;
 }
 
 static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt)
@@ -299,10 +287,11 @@ static void pd_vma_clear_pages(struct i915_vma *vma)
vma->pages = NULL;
 }
 
-static int pd_vma_bind(struct i915_address_space *vm,
- 

[Intel-gfx] s/obj->mm.lock//

2020-07-05 Thread Chris Wilson
This is the easy part; pulling reservation of multiple objects under a
ww acquire context. With one simple rule that eviction is handled by the
ww acquire context, we can carefully transition the driver over to using
eviction. Instead of feeding the acquire context everywhere, we make the
caller gather up all the objects they need to acquire into the context,
then acquire their backing store. The major boon here is that by
providing a clean set of objects (that we have not yet started to
acquire any auxiliary attachments for) to the acquire context, it can
handle all the EDEADLK processing for us [since it is a pure locking
operation and does not need to release attachments upon revoking the
locks].

As a sketch of what that would look like, to illustrate the major work
remaining:

static int evict(struct drm_i915_gem_object *obj, struct i915_acquire_ctx *ctx)
{
struct intel_memory_region *mem = obj->mm->region;
struct drm_i915_gem_object *swap; // struct i915_mm_bo *swap
struct i915_request *rq;
int err;

/* swap = mem->create_eviction_target(obj); */
swap = i915_gem_object_create_shmem(mem->i915, obj->base.size);
if (IS_ERR(swap))
return PTR_ERR(swap);

err = dma_resv_lock_interruptible(swap, &ctx->ctx);
GEM_BUG_ON(err == -EALREADY);
if (err == -EDEADLK)
goto out;

/* Obviously swap has to be carefully chosen so that this may succeed */
err = __i915_gem_object_get_pages_locked(swap);
if (err)
goto out_unlock;

rq = pinned_evict_copy(ctx, obj, swap);
if (IS_ERR(rq)) {
err = PTR_ERR(rq);
goto out_unlock;
}

err = i915_gem_revoke_mm(obj);
if (err)
goto out_request;

/* Alternatively you could wait synchronously! */
mem->release_blocks(&obj->mm->blocks, rq);
i915_mm_bo_put(xchg(&obj->mm, i915_mm_bo_get(swap)));

dma_resv_add_exclusive_fence(obj->base.resv, &rq->fence);
out_request:
i915_request_put(rq);
out_unlock:
dma_resv_unlock(swap);
out:
i915_gem_object_put(swap);
return err;
}

static int relock_all(struct i915_acquire_ctx *ctx)
{
struct i915_acquire_link *lnk, *lock;
int err;

for (lnk = ctx->locked; lnk; lnk = lnk->next)
dma_resv_unlock(lnk->obj->base.resv);

lock = fetch_and_zero(&ctx->locked);
while ((lnk = lock)) {
struct drm_i915_gem_object *obj;

obj = lnk->obj;
lock = lnk->next;

if (ctx->locked)
err = dma_resv_lock_interruptible(obj->base.resv,
  &ctx->ctx);
else
err = dma_resv_lock_slow_interruptible(obj->base.resv,
   &ctx->ctx);
GEM_BUG_ON(err == -EALREADY);
if (err == -EDEADLK) {
struct i915_acquire_link *old;

while ((old = ctx->locked)) {
dma_resv_unlock(old->obj->base.resv);
ctx->locked = old->next;
old->next = lock;
lock = old;
}

lnk->next = lock;
lock = lnk;
continue;
}
if (err) {
lock = lnk;
break;
}

lnk->next = ctx->locked;
ctx->locked = lnk;
}

while ((lnk = lock)) {
lock = lnk->next;
i915_gem_object_put(lnk->obj);
i915_acquire_free(lnk);
}

return err;
}

int i915_acquire_mm(struct i915_acquire_ctx *ctx)
{
struct i915_acquire_link *lnk;
int n, err;

restart:
for (lnk = ctx->locked; lnk; lnk = lnk->next) {
for (n = 0;
 !i915_gem_object_has_pages(lnk->obj);
 n++) {
struct drm_i915_gem_object *evictee = NULL;

mem = get_preferred_memregion_for_object(lnk->obj, n);
if (!mem)
return -ENXIO;

while (!i915_gem_object_get_pages(lnk->obj)) {
struct i915_acquire_link *this;

evictee = mem->get_eviction_candidate(mem,
  evictee);
if (!evictee)
break;

err = dma_resv_lock_interruptible(evictee,
  &ctx-

[Intel-gfx] [PATCH 07/20] drm/i915: Add list_for_each_entry_safe_continue_reverse

2020-07-05 Thread Chris Wilson
One more list iterator variant, for when we want to unwind from inside
one list iterator with the intention of restarting from the current
entry as the new head of the list.
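
For reference, the intended use is unwinding partial work from inside a
forward walk; abridged from the ww_mutex walker introduced later in the
series (patch 11):

list_for_each_entry(ev, &eb->submit_list, submit_link) {
        err = ww_mutex_lock_interruptible(&ev->vma->resv->lock, acquire);
        if (err == -EDEADLK) {
                struct eb_vma *unlock = ev, *en;

                /* Drop every lock taken so far, newest first, leaving
                 * 'ev' to become the new head we restart from.
                 */
                list_for_each_entry_safe_continue_reverse(unlock, en,
                                                          &eb->submit_list,
                                                          submit_link) {
                        ww_mutex_unlock(&unlock->vma->resv->lock);
                        list_move_tail(&unlock->submit_link, &eb->submit_list);
                }
        }
}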

Signed-off-by: Chris Wilson 
Reviewed-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_utils.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_utils.h 
b/drivers/gpu/drm/i915/i915_utils.h
index 03a73d2bd50d..6ebccdd12d4c 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -266,6 +266,12 @@ static inline int list_is_last_rcu(const struct list_head 
*list,
return READ_ONCE(list->next) == head;
 }
 
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)
\
+   for (pos = list_prev_entry(pos, member),\
+n = list_prev_entry(pos, member);  \
+&pos->member != (head);\
+pos = n, n = list_prev_entry(n, member))
+
 /*
  * Wait until the work is finally complete, even if it tries to postpone
  * by requeueing itself. Note, that if the worker never cancels itself,
-- 
2.20.1



[Intel-gfx] [PATCH 06/20] drm/i915/gem: Remove the call for no-evict i915_vma_pin

2020-07-05 Thread Chris Wilson
Remove the stub i915_vma_pin() used for incrementally pinning objects for
execbuf (under the severe restriction that they must not wait on a
resource as we may have already pinned it) and replace it with an
i915_vma_pin_inplace() that is only allowed to reclaim the currently
bound location for the vma (and will never wait for a pinned resource).
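
A minimal illustration of the contract (not the exact body of the patch;
the real implementation also has to close the race against a concurrent
unbind, which this sketch glosses over):

bool i915_vma_pin_inplace(struct i915_vma *vma, unsigned int flags)
{
        /* Only reuse the existing binding; never unbind, wait or evict. */
        if (!i915_vma_is_bound(vma, flags & I915_VMA_BIND_MASK))
                return false;

        __i915_vma_pin(vma);
        if (unlikely(!i915_vma_is_bound(vma, flags & I915_VMA_BIND_MASK))) {
                __i915_vma_unpin(vma);
                return false;
        }

        return true;
}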

Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 69 +++
 drivers/gpu/drm/i915/i915_vma.c   |  6 +-
 drivers/gpu/drm/i915/i915_vma.h   |  2 +
 3 files changed, 45 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 35a57c1fc9c3..1015c4fd9f3e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -452,49 +452,55 @@ static u64 eb_pin_flags(const struct 
drm_i915_gem_exec_object2 *entry,
return pin_flags;
 }
 
+static bool eb_pin_vma_fence_inplace(struct eb_vma *ev)
+{
+   struct i915_vma *vma = ev->vma;
+   struct i915_fence_reg *reg = vma->fence;
+
+   if (reg) {
+   if (READ_ONCE(reg->dirty))
+   return false;
+
+   atomic_inc(®->pin_count);
+   ev->flags |= __EXEC_OBJECT_HAS_FENCE;
+   } else {
+   if (i915_gem_object_is_tiled(vma->obj))
+   return false;
+   }
+
+   return true;
+}
+
 static inline bool
-eb_pin_vma(struct i915_execbuffer *eb,
-  const struct drm_i915_gem_exec_object2 *entry,
-  struct eb_vma *ev)
+eb_pin_vma_inplace(struct i915_execbuffer *eb,
+  const struct drm_i915_gem_exec_object2 *entry,
+  struct eb_vma *ev)
 {
struct i915_vma *vma = ev->vma;
-   u64 pin_flags;
+   unsigned int pin_flags;
 
-   if (vma->node.size)
-   pin_flags = vma->node.start;
-   else
-   pin_flags = entry->offset & PIN_OFFSET_MASK;
+   if (eb_vma_misplaced(entry, vma, ev->flags))
+   return false;
 
-   pin_flags |= PIN_USER | PIN_NOEVICT | PIN_OFFSET_FIXED;
+   pin_flags = PIN_USER;
if (unlikely(ev->flags & EXEC_OBJECT_NEEDS_GTT))
pin_flags |= PIN_GLOBAL;
 
/* Attempt to reuse the current location if available */
-   if (unlikely(i915_vma_pin(vma, 0, 0, pin_flags))) {
-   if (entry->flags & EXEC_OBJECT_PINNED)
-   return false;
-
-   /* Failing that pick any _free_ space if suitable */
-   if (unlikely(i915_vma_pin(vma,
- entry->pad_to_size,
- entry->alignment,
- eb_pin_flags(entry, ev->flags) |
- PIN_USER | PIN_NOEVICT)))
-   return false;
-   }
+   if (!i915_vma_pin_inplace(vma, pin_flags))
+   return false;
 
if (unlikely(ev->flags & EXEC_OBJECT_NEEDS_FENCE)) {
-   if (unlikely(i915_vma_pin_fence(vma))) {
-   i915_vma_unpin(vma);
+   if (!eb_pin_vma_fence_inplace(ev)) {
+   __i915_vma_unpin(vma);
return false;
}
-
-   if (vma->fence)
-   ev->flags |= __EXEC_OBJECT_HAS_FENCE;
}
 
+   GEM_BUG_ON(eb_vma_misplaced(entry, vma, ev->flags));
+
ev->flags |= __EXEC_OBJECT_HAS_PIN;
-   return !eb_vma_misplaced(entry, vma, ev->flags);
+   return true;
 }
 
 static int
@@ -676,14 +682,17 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
struct drm_i915_gem_exec_object2 *entry = ev->exec;
struct i915_vma *vma = ev->vma;
 
-   if (eb_pin_vma(eb, entry, ev)) {
+   if (eb_pin_vma_inplace(eb, entry, ev)) {
if (entry->offset != vma->node.start) {
entry->offset = vma->node.start | UPDATE;
eb->args->flags |= __EXEC_HAS_RELOC;
}
} else {
-   eb_unreserve_vma(ev);
-   list_add_tail(&ev->unbound_link, &unbound);
+   /* Lightly sort user placed objects to the fore */
+   if (ev->flags & EXEC_OBJECT_PINNED)
+   list_add(&ev->unbound_link, &unbound);
+   else
+   list_add_tail(&ev->unbound_link, &unbound);
}
}
 
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 3b43d485d7c2..0c9af30fc3d6 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -740,11 +740,13 @@ i915_vma_detach(struct i915_vma *vma)
list_del(&vma->vm_link);
 }
 
-static bool t

[Intel-gfx] [PATCH 11/20] drm/i915/gem: Separate the ww_mutex walker into its own list

2020-07-05 Thread Chris Wilson
In preparation for making eb_vma bigger and heavier to run in parallel,
we need to stop applying an in-place swap() to reorder around ww_mutex
deadlocks. Keep the array intact and reorder the locks using a dedicated
list.

Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 83 ---
 1 file changed, 54 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 6d20be29ff3c..4d8ac89c56fc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -37,6 +37,7 @@ struct eb_vma {
struct list_head bind_link;
struct list_head unbound_link;
struct list_head reloc_link;
+   struct list_head submit_link;
 
struct hlist_node node;
u32 handle;
@@ -248,6 +249,8 @@ struct i915_execbuffer {
/** list of vma that have execobj.relocation_count */
struct list_head relocs;
 
+   struct list_head submit_list;
+
/**
 * Track the most recently used object for relocations, as we
 * frequently have to perform multiple relocations within the same
@@ -341,6 +344,42 @@ static void eb_vma_array_put(struct eb_vma_array *arr)
kref_put(&arr->kref, eb_vma_array_destroy);
 }
 
+static int
+eb_lock_vma(struct i915_execbuffer *eb, struct ww_acquire_ctx *acquire)
+{
+   struct eb_vma *ev;
+   int err = 0;
+
+   list_for_each_entry(ev, &eb->submit_list, submit_link) {
+   struct i915_vma *vma = ev->vma;
+
+   err = ww_mutex_lock_interruptible(&vma->resv->lock, acquire);
+   if (err == -EDEADLK) {
+   struct eb_vma *unlock = ev, *en;
+
+   list_for_each_entry_safe_continue_reverse(unlock, en,
+ 
&eb->submit_list,
+ submit_link) {
+   ww_mutex_unlock(&unlock->vma->resv->lock);
+   list_move_tail(&unlock->submit_link, 
&eb->submit_list);
+   }
+
+   GEM_BUG_ON(!list_is_first(&ev->submit_link, 
&eb->submit_list));
+   err = ww_mutex_lock_slow_interruptible(&vma->resv->lock,
+  acquire);
+   }
+   if (err) {
+   list_for_each_entry_continue_reverse(ev,
+&eb->submit_list,
+submit_link)
+   ww_mutex_unlock(&ev->vma->resv->lock);
+   break;
+   }
+   }
+
+   return err;
+}
+
 static int eb_create(struct i915_execbuffer *eb)
 {
/* Allocate an extra slot for use by the command parser + sentinel */
@@ -393,6 +432,10 @@ static int eb_create(struct i915_execbuffer *eb)
eb->lut_size = -eb->buffer_count;
}
 
+   INIT_LIST_HEAD(&eb->bind_list);
+   INIT_LIST_HEAD(&eb->submit_list);
+   INIT_LIST_HEAD(&eb->relocs);
+
return 0;
 }
 
@@ -574,6 +617,7 @@ eb_add_vma(struct i915_execbuffer *eb,
}
 
list_add_tail(&ev->bind_link, &eb->bind_list);
+   list_add_tail(&ev->submit_link, &eb->submit_list);
 
if (entry->relocation_count)
list_add_tail(&ev->reloc_link, &eb->relocs);
@@ -910,9 +954,6 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
unsigned int i;
int err = 0;
 
-   INIT_LIST_HEAD(&eb->bind_list);
-   INIT_LIST_HEAD(&eb->relocs);
-
for (i = 0; i < eb->buffer_count; i++) {
struct i915_vma *vma;
 
@@ -1583,38 +1624,19 @@ static int eb_relocate(struct i915_execbuffer *eb)
 
 static int eb_move_to_gpu(struct i915_execbuffer *eb)
 {
-   const unsigned int count = eb->buffer_count;
struct ww_acquire_ctx acquire;
-   unsigned int i;
+   struct eb_vma *ev;
int err = 0;
 
ww_acquire_init(&acquire, &reservation_ww_class);
 
-   for (i = 0; i < count; i++) {
-   struct eb_vma *ev = &eb->vma[i];
-   struct i915_vma *vma = ev->vma;
-
-   err = ww_mutex_lock_interruptible(&vma->resv->lock, &acquire);
-   if (err == -EDEADLK) {
-   GEM_BUG_ON(i == 0);
-   do {
-   int j = i - 1;
-
-   ww_mutex_unlock(&eb->vma[j].vma->resv->lock);
-
-   swap(eb->vma[i],  eb->vma[j]);
-   } while (--i);
+   err = eb_lock_vma(eb, &acquire);
+   if (err)
+   goto err_fini;
 
-   err = ww_mutex_lock_slow_interruptible(&vma->resv->lock,
-   

[Intel-gfx] [PATCH 18/20] drm/i915/gem: Pull execbuf dma resv under a single critical section

2020-07-05 Thread Chris Wilson
Acquire all the objects and their backing storage, and page directories,
as used by execbuf under a single common ww_mutex. Admittedly, we have to
restart the critical section a few times in order to handle various
restrictions (such as avoiding copy_(from|to)_user and mmap_sem).
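
Condensed from the diff below, the reservation pass now sits inside one
acquire context (restart and error handling trimmed for brevity):

static int eb_reserve_vm_sketch(struct i915_execbuffer *eb)
{
        struct eb_vma *ev;
        int err;

        /* Lock the dma-resv of every object used by the execbuf... */
        list_for_each_entry(ev, &eb->bind_list, bind_link) {
                err = i915_acquire_ctx_lock(&eb->acquire, ev->vma->obj);
                if (err)
                        return err;
        }

        /* ...then acquire all of their backing store in one go, with
         * the acquire context handling any -EDEADLK backoff for us.
         */
        return i915_acquire_mm(&eb->acquire);
}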

Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 157 --
 .../i915/gem/selftests/i915_gem_execbuffer.c  |   8 +-
 2 files changed, 78 insertions(+), 87 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index fbf5c5cd51ca..bcd100f8a6c7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -20,6 +20,7 @@
 #include "gt/intel_gt_pm.h"
 #include "gt/intel_gt_requests.h"
 #include "gt/intel_ring.h"
+#include "mm/i915_acquire_ctx.h"
 
 #include "i915_drv.h"
 #include "i915_gem_clflush.h"
@@ -244,6 +245,8 @@ struct i915_execbuffer {
struct intel_context *context; /* logical state for the request */
struct i915_gem_context *gem_context; /** caller's context */
 
+   struct i915_acquire_ctx acquire; /** lock for _all_ DMA reservations */
+
struct i915_request *request; /** our request to build */
struct eb_vma *batch; /** identity of the batch obj/vma */
 
@@ -386,42 +389,6 @@ static void eb_vma_array_put(struct eb_vma_array *arr)
kref_put(&arr->kref, eb_vma_array_destroy);
 }
 
-static int
-eb_lock_vma(struct i915_execbuffer *eb, struct ww_acquire_ctx *acquire)
-{
-   struct eb_vma *ev;
-   int err = 0;
-
-   list_for_each_entry(ev, &eb->submit_list, submit_link) {
-   struct i915_vma *vma = ev->vma;
-
-   err = ww_mutex_lock_interruptible(&vma->resv->lock, acquire);
-   if (err == -EDEADLK) {
-   struct eb_vma *unlock = ev, *en;
-
-   list_for_each_entry_safe_continue_reverse(unlock, en,
- 
&eb->submit_list,
- submit_link) {
-   ww_mutex_unlock(&unlock->vma->resv->lock);
-   list_move_tail(&unlock->submit_link, 
&eb->submit_list);
-   }
-
-   GEM_BUG_ON(!list_is_first(&ev->submit_link, 
&eb->submit_list));
-   err = ww_mutex_lock_slow_interruptible(&vma->resv->lock,
-  acquire);
-   }
-   if (err) {
-   list_for_each_entry_continue_reverse(ev,
-&eb->submit_list,
-submit_link)
-   ww_mutex_unlock(&ev->vma->resv->lock);
-   break;
-   }
-   }
-
-   return err;
-}
-
 static int eb_create(struct i915_execbuffer *eb)
 {
/* Allocate an extra slot for use by the sentinel */
@@ -665,6 +632,25 @@ eb_add_vma(struct i915_execbuffer *eb,
}
 }
 
+static int eb_lock_mm(struct i915_execbuffer *eb)
+{
+   struct eb_vma *ev;
+   int err;
+
+   list_for_each_entry(ev, &eb->bind_list, bind_link) {
+   err = i915_acquire_ctx_lock(&eb->acquire, ev->vma->obj);
+   if (err)
+   return err;
+   }
+
+   return 0;
+}
+
+static int eb_acquire_mm(struct i915_execbuffer *eb)
+{
+   return i915_acquire_mm(&eb->acquire);
+}
+
 struct eb_vm_work {
struct dma_fence_work base;
struct eb_vma_array *array;
@@ -1390,7 +1376,15 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
unsigned long count;
struct eb_vma *ev;
unsigned int pass;
-   int err = 0;
+   int err;
+
+   err = eb_lock_mm(eb);
+   if (err)
+   return err;
+
+   err = eb_acquire_mm(eb);
+   if (err)
+   return err;
 
count = 0;
INIT_LIST_HEAD(&unbound);
@@ -1416,10 +1410,19 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
if (count == 0)
return 0;
 
+   /* We need to reserve page directories, release all, start over */
+   i915_acquire_ctx_fini(&eb->acquire);
+
pass = 0;
do {
struct eb_vm_work *work;
 
+   i915_acquire_ctx_init(&eb->acquire);
+
+   err = eb_lock_mm(eb);
+   if (err)
+   return err;
+
work = eb_vm_work(eb, count);
if (!work)
return -ENOMEM;
@@ -1437,6 +1440,10 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
}
}
 
+   err = eb_acquire_mm(eb);
+   if (err)
+   return eb_vm_work_cancel(work, err);
+

[Intel-gfx] [PATCH 03/20] drm/i915/gem: Don't drop the timeline lock during execbuf

2020-07-05 Thread Chris Wilson
Our timeline lock is our defence against a concurrent execbuf
interrupting our request construction. We need to hold it throughout or,
for example, a second thread may interject a relocation request in
between our own relocation request and execution in the ring.

A second, major benefit, is that it allows us to preserve a large chunk
of the ringbuffer for our exclusive use; which should virtually
eliminate the threat of hitting a wait_for_space during request
construction -- although we should have already dropped other
contentious locks at that point.
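
As a sketch of the idea (illustrative, not the patch itself): the
timeline mutex is taken once before request construction begins and only
released after the request has been emitted to the ring:

        struct intel_timeline *tl = eb->context->timeline;

        err = mutex_lock_interruptible(&tl->mutex);
        if (err)
                return err;

        /* relocations, batch setup and request construction all happen
         * here; no concurrent execbuf can interject into the ring.
         */

        mutex_unlock(&tl->mutex);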

Signed-off-by: Chris Wilson 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 241 --
 .../i915/gem/selftests/i915_gem_execbuffer.c  |  24 +-
 2 files changed, 186 insertions(+), 79 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index dd15a799f9d6..e4d06db3f313 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -259,6 +259,8 @@ struct i915_execbuffer {
bool has_fence : 1;
bool needs_unfenced : 1;
 
+   struct intel_context *ce;
+
struct i915_vma *target;
struct i915_request *rq;
struct i915_vma *rq_vma;
@@ -639,6 +641,35 @@ static int eb_reserve_vma(const struct i915_execbuffer *eb,
return 0;
 }
 
+static void retire_requests(struct intel_timeline *tl, struct i915_request 
*end)
+{
+   struct i915_request *rq, *rn;
+
+   list_for_each_entry_safe(rq, rn, &tl->requests, link)
+   if (rq == end || !i915_request_retire(rq))
+   break;
+}
+
+static int wait_for_timeline(struct intel_timeline *tl)
+{
+   do {
+   struct dma_fence *fence;
+   int err;
+
+   fence = i915_active_fence_get(&tl->last_request);
+   if (!fence)
+   return 0;
+
+   err = dma_fence_wait(fence, true);
+   dma_fence_put(fence);
+   if (err)
+   return err;
+
+   /* Retiring may trigger a barrier, requiring an extra pass */
+   retire_requests(tl, NULL);
+   } while (1);
+}
+
 static int eb_reserve(struct i915_execbuffer *eb)
 {
const unsigned int count = eb->buffer_count;
@@ -646,7 +677,6 @@ static int eb_reserve(struct i915_execbuffer *eb)
struct list_head last;
struct eb_vma *ev;
unsigned int i, pass;
-   int err = 0;
 
/*
 * Attempt to pin all of the buffers into the GTT.
@@ -662,18 +692,22 @@ static int eb_reserve(struct i915_execbuffer *eb)
 * room for the earlier objects *unless* we need to defragment.
 */
 
-   if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex))
-   return -EINTR;
-
pass = 0;
do {
+   int err = 0;
+
+   if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex))
+   return -EINTR;
+
list_for_each_entry(ev, &eb->unbound, bind_link) {
err = eb_reserve_vma(eb, ev, pin_flags);
if (err)
break;
}
-   if (!(err == -ENOSPC || err == -EAGAIN))
-   break;
+   if (!(err == -ENOSPC || err == -EAGAIN)) {
+   mutex_unlock(&eb->i915->drm.struct_mutex);
+   return err;
+   }
 
/* Resort *all* the objects into priority order */
INIT_LIST_HEAD(&eb->unbound);
@@ -702,11 +736,10 @@ static int eb_reserve(struct i915_execbuffer *eb)
list_add_tail(&ev->bind_link, &last);
}
list_splice_tail(&last, &eb->unbound);
+   mutex_unlock(&eb->i915->drm.struct_mutex);
 
if (err == -EAGAIN) {
-   mutex_unlock(&eb->i915->drm.struct_mutex);
flush_workqueue(eb->i915->mm.userptr_wq);
-   mutex_lock(&eb->i915->drm.struct_mutex);
continue;
}
 
@@ -715,25 +748,23 @@ static int eb_reserve(struct i915_execbuffer *eb)
break;
 
case 1:
-   /* Too fragmented, unbind everything and retry */
-   mutex_lock(&eb->context->vm->mutex);
-   err = i915_gem_evict_vm(eb->context->vm);
-   mutex_unlock(&eb->context->vm->mutex);
+   /*
+* Too fragmented, retire everything on the timeline
+* and so make it all [contexts included] available to
+* evict.
+*/
+   err = wait_for_timeline(eb->context->timeline);
if (err)
- 

Re: [Intel-gfx] linux-next: manual merge of the drm-intel tree with the drm-intel-fixes tree

2020-07-05 Thread Stephen Rothwell
Hi all,

On Tue, 30 Jun 2020 11:52:02 +1000 Stephen Rothwell  
wrote:
>
> Today's linux-next merge of the drm-intel tree got a conflict in:
> 
>   drivers/gpu/drm/i915/gvt/handlers.c
> 
> between commit:
> 
>   fc1e3aa0337c ("drm/i915/gvt: Fix incorrect check of enabled bits in mask 
> registers")
> 
> from the drm-intel-fixes tree and commit:
> 
>   5f4ae2704d59 ("drm/i915: Identify Cometlake platform")
> 
> from the drm-intel tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> diff --cc drivers/gpu/drm/i915/gvt/handlers.c
> index fadd2adb8030,26cae4846c82..
> --- a/drivers/gpu/drm/i915/gvt/handlers.c
> +++ b/drivers/gpu/drm/i915/gvt/handlers.c
> @@@ -1731,8 -1734,9 +1734,9 @@@ static int ring_mode_mmio_write(struct 
>   return 0;
>   }
>   
> - if (IS_COFFEELAKE(vgpu->gvt->gt->i915) &&
> + if ((IS_COFFEELAKE(vgpu->gvt->gt->i915) ||
> +  IS_COMETLAKE(vgpu->gvt->gt->i915)) &&
>  -data & _MASKED_BIT_ENABLE(2)) {
>  +IS_MASKED_BITS_ENABLED(data, 2)) {
>   enter_failsafe_mode(vgpu, GVT_FAILSAFE_UNSUPPORTED_GUEST);
>   return 0;
>   }

This is now a conflict between the drm tree and Linus' tree.

-- 
Cheers,
Stephen Rothwell




Re: [Intel-gfx] [PATCH 17/23] drm/i915/gem: Asynchronous GTT unbinding

2020-07-05 Thread Chris Wilson
Quoting Andi Shyti (2020-07-05 23:01:29)
> Hi Chris,
> 
> > +static int gen6_fixup_ggtt(struct i915_vma *vma)
> 
> you create this function here and remove it in patch 21. This
> series is a bit confusing, can we have a final version of it?

It gets removed because the next patches reorder all the pinning around
this central function. Until that happens, the fixup occurs after we do
the pinning. And those patches depend on this to provide the central
pinning. So it's a circular dependency and this patch needs to provide
the fixup so that it works by itself.
-Chris


Re: [Intel-gfx] [PATCH 17/23] drm/i915/gem: Asynchronous GTT unbinding

2020-07-05 Thread Andi Shyti
Hi Chris,

> +static int gen6_fixup_ggtt(struct i915_vma *vma)

you create this function here and remove it in patch 21. This
series is a bit confusing, can we have a final version of it?

Andi


Re: [Intel-gfx] [PATCH v2 5/6] drm/i915/display: use port_info in intel_ddi_init

2020-07-05 Thread kernel test robot
Hi Lucas,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on drm-intel/for-linux-next]
[cannot apply to drm-tip/drm-tip v5.8-rc3 next-20200703]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use  as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Lucas-De-Marchi/display-ddi-keep-register-indexes-in-a-table/20200625-081557
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: x86_64-randconfig-a002-20200705 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 
f804bd586ee58199db4cfb2da8e9ef067425900b)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> drivers/gpu/drm/i915/display/intel_ddi.c:4900:28: error: format string is 
>> not a string literal (potentially insecure) [-Werror,-Wformat-security]
DRM_MODE_ENCODER_TMDS, port_info->name);
   ^~~
   drivers/gpu/drm/i915/display/intel_ddi.c:4900:28: note: treat the string as 
an argument to avoid this
DRM_MODE_ENCODER_TMDS, port_info->name);
   ^
   "%s", 
   1 error generated.

vim +4900 drivers/gpu/drm/i915/display/intel_ddi.c

  4860  
  4861  void intel_ddi_init(struct drm_i915_private *dev_priv,
  4862  const struct intel_ddi_port_info *port_info)
  4863  {
  4864  enum port port = port_info->port;
  4865  struct intel_digital_port *intel_dig_port;
  4866  struct intel_encoder *encoder;
  4867  bool init_hdmi, init_dp, init_lspcon = false;
  4868  
  4869  init_hdmi = intel_bios_port_supports_dvi(dev_priv, port) ||
  4870  intel_bios_port_supports_hdmi(dev_priv, port);
  4871  init_dp = intel_bios_port_supports_dp(dev_priv, port);
  4872  
  4873  if (intel_bios_is_lspcon_present(dev_priv, port)) {
  4874  /*
  4875   * Lspcon device needs to be driven with DP connector
  4876   * with special detection sequence. So make sure DP
  4877   * is initialized before lspcon.
  4878   */
  4879  init_dp = true;
  4880  init_lspcon = true;
  4881  init_hdmi = false;
  4882  drm_dbg_kms(&dev_priv->drm, "VBT says port %s has 
lspcon\n",
  4883  port_info->name);
  4884  }
  4885  
  4886  if (!init_dp && !init_hdmi) {
  4887  drm_dbg_kms(&dev_priv->drm,
  4888  "VBT says port %s is not DVI/HDMI/DP 
compatible, respect it\n",
  4889  port_info->name);
  4890  return;
  4891  }
  4892  
  4893  intel_dig_port = kzalloc(sizeof(*intel_dig_port), GFP_KERNEL);
  4894  if (!intel_dig_port)
  4895  return;
  4896  
  4897  encoder = &intel_dig_port->base;
  4898  
  4899  drm_encoder_init(&dev_priv->drm, &encoder->base, 
&intel_ddi_funcs,
> 4900   DRM_MODE_ENCODER_TMDS, port_info->name);
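
The warning is clang's -Wformat-security objecting to a non-literal
being passed where drm_encoder_init() expects a printf-style format; the
conventional fix, as the note above suggests, is to make the name an
argument to an explicit "%s" format, e.g.:

        drm_encoder_init(&dev_priv->drm, &encoder->base, &intel_ddi_funcs,
                         DRM_MODE_ENCODER_TMDS, "%s", port_info->name);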

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org

