Re: [Intel-gfx] [PATCH 8/8] drm/i915: Wait on external rendering for GEM objects

2016-07-20 Thread Joonas Lahtinen
On ti, 2016-07-19 at 15:27 +0100, Chris Wilson wrote:
> On Tue, Jul 19, 2016 at 05:12:22PM +0300, Joonas Lahtinen wrote:
> > 
> > On ti, 2016-07-19 at 11:51 +0100, Chris Wilson wrote:
> > > 
> > > + resv = i915_gem_object_get_dmabuf_resv(obj);
> > > + if (resv) {
> > > + long err;
> > We already have ret in the function scope.
> ret is int, we need a long. At least I can attest to our test coverage!
> 
> > 
> > > 
> > > @@ -3402,13 +3414,13 @@ i915_gem_object_set_to_gtt_domain(struct 
> > > drm_i915_gem_object *obj, bool write)
> > >   struct i915_vma *vma;
> > >   int ret;
> > >  
> > > - if (obj->base.write_domain == I915_GEM_DOMAIN_GTT)
> > > - return 0;
> > > -
> > >   ret = i915_gem_object_wait_rendering(obj, !write);
> > >   if (ret)
> > >   return ret;
> > >  
> > > + if (obj->base.write_domain == I915_GEM_DOMAIN_GTT)
> > > + return 0;
> > > +
> > Not sure I follow this change, wait rendering would only be issued if
> > DOMAIN_CPU was in place, this function was about moving to gtt_domain
> > so should really be NOP if in GTT domain already, no? Maybe calling
> > site should do an explicit wait_rendering if needed.
> No, this is where we do the wait rendering. The API says
> set-to-gtt-domain implies that it is out of the GPU domain (no
> rendering) and out of the CPU domain. (Similarly for set-to-cpu-domain.)
> The relaxation I've applied here is to try and catch third parties that
> have not updated our domain tracking for their access.
> 
> Moving the wait_rendering to the parent is a fair amount of burden.
> 
> > 
> > > 
> > >   vma->pin_count = 0;
> > > - ret = i915_vma_unbind(vma);
> > Maybe add a comment/TODO/FIXME: about the potential WARN.
> __i915_vma_unbind_no_wait() is itself an automatic FIXME - it only
> exists to cover up a WARN. (I wish we had this series in place first so
> that such a hack wasn't applied.)

OK then,

Reviewed-by: Joonas Lahtinen 

> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 8/8] drm/i915: Wait on external rendering for GEM objects

2016-07-19 Thread Chris Wilson
On Tue, Jul 19, 2016 at 05:12:22PM +0300, Joonas Lahtinen wrote:
> On ti, 2016-07-19 at 11:51 +0100, Chris Wilson wrote:
> > +   resv = i915_gem_object_get_dmabuf_resv(obj);
> > +   if (resv) {
> > +   long err;
> 
> We already have ret in the function scope.

ret is int, we need a long. At least I can attest to our test coverage!

> > @@ -3402,13 +3414,13 @@ i915_gem_object_set_to_gtt_domain(struct 
> > drm_i915_gem_object *obj, bool write)
> >     struct i915_vma *vma;
> >     int ret;
> >  
> > -   if (obj->base.write_domain == I915_GEM_DOMAIN_GTT)
> > -   return 0;
> > -
> >     ret = i915_gem_object_wait_rendering(obj, !write);
> >     if (ret)
> >     return ret;
> >  
> > +   if (obj->base.write_domain == I915_GEM_DOMAIN_GTT)
> > +   return 0;
> > +
> 
> Not sure I follow this change, wait rendering would only be issued if
> DOMAIN_CPU was in place, this function was about moving to gtt_domain
> so should really be NOP if in GTT domain already, no? Maybe calling
> site should do an explicit wait_rendering if needed.

No, this is where we do the wait rendering. The API says
set-to-gtt-domain implies that it is out of the GPU domain (no
rendering) and out of the CPU domain. (Similarly for set-to-cpu-domain.)
The relaxation I've applied here is to try and catch third parties that
have not updated our domain tracking for their access.

Moving the wait_rendering to the parent is a fair amount of burden.

> >     vma->pin_count = 0;
> > -   ret = i915_vma_unbind(vma);
> 
> Maybe add a comment/TODO/FIXME: about the potential WARN.

__i915_vma_unbind_no_wait() is itself an automatic FIXME - it only
exists to cover up a WARN. (I wish we had this series in place first so
that such a hack wasn't applied.)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 8/8] drm/i915: Wait on external rendering for GEM objects

2016-07-19 Thread Joonas Lahtinen
On ti, 2016-07-19 at 11:51 +0100, Chris Wilson wrote:
> When transitioning to the GTT or CPU domain we wait on all rendering
> from i915 to complete (with the optimisation of allowing concurrent read
> access by both the GPU and client). We don't yet ensure all rendering
> from third parties (tracked by implicit fences on the dma-buf) is
> complete. Since implicitly tracked rendering by third parties will
> ignore our cache-domain tracking, we have to always wait upon rendering
> from third-parties when transitioning to direct access to the backing
> store. We still rely on clients notifying us of cache domain changes
> (i.e. they need to move to the GTT read or write domain after doing a CPU
> access before letting the third party render again).
> 
> v2:
> This introduces a potential WARN_ON into i915_gem_object_free() as the
> current i915_vma_unbind() calls i915_gem_object_wait_rendering(). To
> hit this path we first need to render with the GPU, have a dma-buf
> attached with an unsignaled fence and then interrupt the wait. It does
> get fixed later in the series (when i915_vma_unbind() only waits on the
> active VMA and not all, including third-party, rendering.
> 
> To offset that risk, use the __i915_vma_unbind_no_wait hack.
> 
> Testcase: igt/prime_vgem/basic-fence-read
> Testcase: igt/prime_vgem/basic-fence-mmap
> Signed-off-by: Chris Wilson 
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 44 
> ++---
>  1 file changed, 28 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 079e09cee16a..37868cc594cb 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -29,10 +29,12 @@
>  #include 
>  #include 
>  #include "i915_drv.h"
> +#include "i915_gem_dmabuf.h"
>  #include "i915_vgpu.h"
>  #include "i915_trace.h"
>  #include "intel_drv.h"
>  #include "intel_mocs.h"
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -511,6 +513,10 @@ int i915_gem_obj_prepare_shmem_read(struct 
> drm_i915_gem_object *obj,
>   if (WARN_ON(!i915_gem_object_has_struct_page(obj)))
>   return -EINVAL;
>  
> + ret = i915_gem_object_wait_rendering(obj, true);
> + if (ret)
> + return ret;
> +
>   if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) {
>   /* If we're not in the cpu read domain, set ourself into the gtt
>    * read domain and manually flush cachelines (if required). This
> @@ -518,9 +524,6 @@ int i915_gem_obj_prepare_shmem_read(struct 
> drm_i915_gem_object *obj,
>    * anyway again before the next pread happens. */
>   *needs_clflush = !cpu_cache_is_coherent(obj->base.dev,
>   obj->cache_level);
> - ret = i915_gem_object_wait_rendering(obj, true);
> - if (ret)
> - return ret;
>   }
>  
>   ret = i915_gem_object_get_pages(obj);
> @@ -1132,15 +1135,16 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
>  
>   obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
>  
> + ret = i915_gem_object_wait_rendering(obj, false);
> + if (ret)
> + return ret;
> +
>   if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
>   /* If we're not in the cpu write domain, set ourself into the 
> gtt
>    * write domain and manually flush cachelines (if required). 
> This
>    * optimizes for the case when the gpu will use the data
>    * right away and we therefore have to clflush anyway. */
>   needs_clflush_after = cpu_write_needs_clflush(obj);
> - ret = i915_gem_object_wait_rendering(obj, false);
> - if (ret)
> - return ret;
>   }
>   /* Same trick applies to invalidate partially written cachelines read
>    * before writing. */
> @@ -1335,11 +1339,9 @@ int
>  i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
>      bool readonly)
>  {
> + struct reservation_object *resv;
>   int ret, i;
>  
> - if (!obj->active)
> - return 0;
> -
>   if (readonly) {
>   if (obj->last_write_req != NULL) {
>   ret = i915_wait_request(obj->last_write_req);
> @@ -1366,6 +1368,16 @@ i915_gem_object_wait_rendering(struct 
> drm_i915_gem_object *obj,
>   GEM_BUG_ON(obj->active);
>   }
>  
> + resv = i915_gem_object_get_dmabuf_resv(obj);
> + if (resv) {
> + long err;

We already have ret in the function scope.

> +
> + err = reservation_object_wait_timeout_rcu(resv, !readonly, true,
> +   MAX_SCHEDULE_TIMEOUT);
> + if (err < 0)
> + return err;
> + }
> +
>   return 0;
>  }
>  
> @@ -3402,13 +3414,13 @@ i915_gem_object_set_to_gtt_domain(struct 
>

[Intel-gfx] [PATCH 8/8] drm/i915: Wait on external rendering for GEM objects

2016-07-19 Thread Chris Wilson
When transitioning to the GTT or CPU domain we wait on all rendering
from i915 to complete (with the optimisation of allowing concurrent read
access by both the GPU and client). We don't yet ensure all rendering
from third parties (tracked by implicit fences on the dma-buf) is
complete. Since implicitly tracked rendering by third parties will
ignore our cache-domain tracking, we have to always wait upon rendering
from third-parties when transitioning to direct access to the backing
store. We still rely on clients notifying us of cache domain changes
(i.e. they need to move to the GTT read or write domain after doing a CPU
access before letting the third party render again).

v2:
This introduces a potential WARN_ON into i915_gem_object_free() as the
current i915_vma_unbind() calls i915_gem_object_wait_rendering(). To
hit this path we first need to render with the GPU, have a dma-buf
attached with an unsignaled fence and then interrupt the wait. It does
get fixed later in the series (when i915_vma_unbind() only waits on the
active VMA and not all, including third-party, rendering.

To offset that risk, use the __i915_vma_unbind_no_wait hack.

Testcase: igt/prime_vgem/basic-fence-read
Testcase: igt/prime_vgem/basic-fence-mmap
Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_gem.c | 44 ++---
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 079e09cee16a..37868cc594cb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -29,10 +29,12 @@
 #include 
 #include 
 #include "i915_drv.h"
+#include "i915_gem_dmabuf.h"
 #include "i915_vgpu.h"
 #include "i915_trace.h"
 #include "intel_drv.h"
 #include "intel_mocs.h"
+#include 
 #include 
 #include 
 #include 
@@ -511,6 +513,10 @@ int i915_gem_obj_prepare_shmem_read(struct 
drm_i915_gem_object *obj,
if (WARN_ON(!i915_gem_object_has_struct_page(obj)))
return -EINVAL;
 
+   ret = i915_gem_object_wait_rendering(obj, true);
+   if (ret)
+   return ret;
+
if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) {
/* If we're not in the cpu read domain, set ourself into the gtt
 * read domain and manually flush cachelines (if required). This
@@ -518,9 +524,6 @@ int i915_gem_obj_prepare_shmem_read(struct 
drm_i915_gem_object *obj,
 * anyway again before the next pread happens. */
*needs_clflush = !cpu_cache_is_coherent(obj->base.dev,
obj->cache_level);
-   ret = i915_gem_object_wait_rendering(obj, true);
-   if (ret)
-   return ret;
}
 
ret = i915_gem_object_get_pages(obj);
@@ -1132,15 +1135,16 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 
obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
 
+   ret = i915_gem_object_wait_rendering(obj, false);
+   if (ret)
+   return ret;
+
if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
/* If we're not in the cpu write domain, set ourself into the 
gtt
 * write domain and manually flush cachelines (if required). 
This
 * optimizes for the case when the gpu will use the data
 * right away and we therefore have to clflush anyway. */
needs_clflush_after = cpu_write_needs_clflush(obj);
-   ret = i915_gem_object_wait_rendering(obj, false);
-   if (ret)
-   return ret;
}
/* Same trick applies to invalidate partially written cachelines read
 * before writing. */
@@ -1335,11 +1339,9 @@ int
 i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
   bool readonly)
 {
+   struct reservation_object *resv;
int ret, i;
 
-   if (!obj->active)
-   return 0;
-
if (readonly) {
if (obj->last_write_req != NULL) {
ret = i915_wait_request(obj->last_write_req);
@@ -1366,6 +1368,16 @@ i915_gem_object_wait_rendering(struct 
drm_i915_gem_object *obj,
GEM_BUG_ON(obj->active);
}
 
+   resv = i915_gem_object_get_dmabuf_resv(obj);
+   if (resv) {
+   long err;
+
+   err = reservation_object_wait_timeout_rcu(resv, !readonly, true,
+ MAX_SCHEDULE_TIMEOUT);
+   if (err < 0)
+   return err;
+   }
+
return 0;
 }
 
@@ -3402,13 +3414,13 @@ i915_gem_object_set_to_gtt_domain(struct 
drm_i915_gem_object *obj, bool write)
struct i915_vma *vma;
int ret;
 
-   if (obj->base.write_domain == I915_GEM_DOMAIN_GTT)
-   return 0;
-
ret = i915_gem_object_wait_rendering(obj, !write);
if (