Re: [Intel-gfx] [PATCH] drm/i915: Only move to the CPU write domain if keeping the GTT pages

2015-08-19 Thread akash goel
On Sun, Aug 9, 2015 at 7:02 PM, Goel, Akash akash.g...@intel.com wrote:


 On 8/9/2015 6:19 PM, Chris Wilson wrote:

 On Sun, Aug 09, 2015 at 05:11:52PM +0530, Goel, Akash wrote:



 On 8/9/2015 4:25 PM, Chris Wilson wrote:

 On Sun, Aug 09, 2015 at 04:23:01PM +0530, Goel, Akash wrote:

 On 8/7/2015 1:37 PM, Daniel Vetter wrote:

 I presume though you only want to avoid clflush when actually purging
 an
 object, so maybe we can keep this by purging the shmem backing node
 first
 and checking here for __I915_MADV_PURGED instead?


 An object marked as MADV_DONT_NEED implies that it will be
 purged/truncated right away after the call to the put_pages_gtt
 function. So doing it the other way round, purging first and then
 checking for __I915_MADV_PURGED, might be equivalent.


 But disregards a few nice sanity checks, which I would like to keep.
 -Chris

 Fine, just wanted to convey that doing the other way round may not
 be really beneficial.

 About the other point of virtually indexed/physically tagged cache,
 would it be safe to just rely on the MADV_DONT_NEED state of the object
 (which indicates that there are no active CPU mmappings)?
 Due to earlier CPU mmappings, could there be cachelines holding
 stale data?


 If the conflicts survive munmap(), I don't have a clever idea on how to
 avoid the clflush before we hand back the pages to the system.

 One case could be, as you suggested, to check whether a CPU mapping was
 ever created for the object & so avoid the clflush for GPU (GPU + GTT)
 only objects.


We have verified (on BYT/CHV) that cachelines corresponding to an
object's pages do not get invalidated on munmap, so there is a
possibility of stale data even when no CPU mappings are active.

So we need to amend this patch to also check whether a CPU mapping was
ever created for the object.
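
As a rough illustration only (not actual patch code), the amended check
inside i915_gem_object_put_pages_gtt() could then take a shape like the
following, where has_cpu_mmap is a hypothetical flag standing in for
whatever tracking the real patch would add at the points where CPU
mmappings of the object are created:

    /* Sketch: flush unless the object is being discarded AND was
     * never CPU mmapped, since stale cachelines from an old WB
     * mmapping can survive munmap() (observed on BYT/CHV).
     */
    if (obj->madv == I915_MADV_WILLNEED || obj->has_cpu_mmap) {
        ret = i915_gem_object_set_to_cpu_domain(obj, true);
        if (ret) {
            /* In the event of a disaster, abandon all caches
             * and hope for the best.
             */
            WARN_ON(ret != -EIO);
            i915_gem_clflush_object(obj, true);
            obj->base.read_domains = I915_GEM_DOMAIN_CPU;
            obj->base.write_domain = I915_GEM_DOMAIN_CPU;
        }
    }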

Best regards
Akash

 Best regards
 Akash

 -Chris



Re: [Intel-gfx] [PATCH] drm/i915: Only move to the CPU write domain if keeping the GTT pages

2015-08-12 Thread Daniel Vetter
On Fri, Aug 07, 2015 at 02:07:39PM +0100, Chris Wilson wrote:
 On Fri, Aug 07, 2015 at 01:55:01PM +0200, Daniel Vetter wrote:
  On Fri, Aug 07, 2015 at 11:10:58AM +0100, Chris Wilson wrote:
   On Fri, Aug 07, 2015 at 10:07:28AM +0200, Daniel Vetter wrote:
On Thu, Aug 06, 2015 at 05:43:39PM +0100, Chris Wilson wrote:
But it's still salvageable I think since we only care about coherency for
the gpu (where data might be stuck in cpu caches). From the cpu's pov (and
hence the entire system except the gpu) we should never see inconsistency
really - as soon as the gpu does a write to a cacheline it'll win, and
before that nothing in the system can assume anything about the contents
of these pages.
   
   But the GPU doesn't write to cachelines (except in LLC/snooped+flush).
   The issue is what happens when the user lies about writing to the object
   through a WB cpu mapping (dirtying a cacheline) and the GPU also does.
   Who wins then?
   
   We have postulated that it could be entirely possible for the CPU to
   trust its cache and return local contents and for those to be also
   considered not dirty and so not flushed to memory. Later, we then read
   what the gpu wrote and chaos ensues.
  
  This was just with an eye towards purged memory where we don't care about
  correct data anyway. The only thing we care about is that when it's all
  overwritten again by someone, that someone should win. And since GEM
  assumes new pages are in the cpu domain and clflushes them first, that
  should hold even for GEM. But the tricky part is that I think we can pull
  this off only if the backing storage is purged already.
 
 But what's the difference between:
 
 lock
   put_pages
   purge
 
 and
 
 lock
   purge
   put_pages
 
 if you are dismissing the user dirtying CPU cachelines vs already dirty
 GPU data as being a source of worry?

Just worrying that eventually we'll change something and end up without
the purge after put_pages. If we reorder we can't get it wrong.
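
(For illustration, a simplified sketch of the two orderings being
discussed, using the existing i915_gem_object_truncate()/put_pages()
names but not the actual call sites; the reordered variant would also
mean revisiting the BUG_ON(obj->madv == __I915_MADV_PURGED) sanity
check at the top of put_pages_gtt():)

    /* Current flow: put_pages() cannot yet tell that the object is
     * about to be purged, so it must stay conservative.
     */
    i915_gem_object_put_pages(obj);   /* obj->madv == I915_MADV_DONTNEED */
    i915_gem_object_truncate(obj);    /* marks __I915_MADV_PURGED */

    /* Reordered flow: the purge happens first, so put_pages() can
     * simply skip the clflush/domain move for __I915_MADV_PURGED
     * objects, and a later refactor cannot "forget" the purge.
     */
    i915_gem_object_truncate(obj);    /* marks __I915_MADV_PURGED */
    i915_gem_object_put_pages(obj);   /* may skip set_to_cpu_domain() */
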
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH] drm/i915: Only move to the CPU write domain if keeping the GTT pages

2015-08-12 Thread Chris Wilson
On Wed, Aug 12, 2015 at 02:35:59PM +0200, Daniel Vetter wrote:
 On Fri, Aug 07, 2015 at 02:07:39PM +0100, Chris Wilson wrote:
  On Fri, Aug 07, 2015 at 01:55:01PM +0200, Daniel Vetter wrote:
   On Fri, Aug 07, 2015 at 11:10:58AM +0100, Chris Wilson wrote:
On Fri, Aug 07, 2015 at 10:07:28AM +0200, Daniel Vetter wrote:
 On Thu, Aug 06, 2015 at 05:43:39PM +0100, Chris Wilson wrote:
 But it's still salvageable I think since we only care about coherency for
 the gpu (where data might be stuck in cpu caches). From the cpu's pov (and
 hence the entire system except the gpu) we should never see inconsistency
 really - as soon as the gpu does a write to a cacheline it'll win, and
 before that nothing in the system can assume anything about the contents
 of these pages.

But the GPU doesn't write to cachelines (except in LLC/snooped+flush).
The issue is what happens when the user lies about writing to the object
through a WB cpu mapping (dirtying a cacheline) and the GPU also does.
Who wins then?

We have postulated that it could be entirely possible for the CPU to
trust its cache and return local contents and for those to be also
considered not dirty and so not flushed to memory. Later, we then read
what the gpu wrote and chaos ensues.
   
   This was just with an eye towards purged memory where we don't care about
   correct data anyway. The only thing we care about is that when it's all
   overwritten again by someone, that someone should win. And since GEM
   assumes new pages are in the cpu domain and clflushes them first, that
   should hold even for GEM. But the tricky part is that I think we can pull
   this off only if the backing storage is purged already.
  
  But what's the difference between:
  
  lock
put_pages
purge
  
  and
  
  lock
purge
put_pages
  
  if you are dismissing the user dirtying CPU cachelines vs already dirty
  GPU data as being a source of worry?
 
 Just worrying that eventually we'll change something and end up without
 the purge after put_pages. If we reorder we can't get it wrong.

I'd rather keep my sanity checks (that say we are not allowed to touch
the backing storage whilst we are still using it). So how about a
comment in put_pages() detailing the issue?
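
(One possible wording for such a comment, purely illustrative and not
taken from any posted patch, might be:)

    /* Ordering note: we deliberately flush/move the object to the CPU
     * domain here, before the pages are released, because stale
     * cachelines from an old WB CPU mmapping can survive munmap()
     * (verified on BYT/CHV).  Only objects about to be purged may skip
     * this, and the purge must happen after put_pages(); see the
     * BUG_ON(obj->madv == __I915_MADV_PURGED) above.
     */
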
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH] drm/i915: Only move to the CPU write domain if keeping the GTT pages

2015-08-09 Thread Goel, Akash



On 8/7/2015 1:37 PM, Daniel Vetter wrote:

On Thu, Aug 06, 2015 at 05:43:39PM +0100, Chris Wilson wrote:

We have for a long time been ultra-paranoid about the situation whereby
we hand back pages to the system that have been written to by the GPU
and potentially simultaneously by the user through a CPU mmapping. We
can relax this restriction when we know that the cache domain tracking
is true and there can be no stale cacheline invalidations. This is
true if the object has never been CPU mmapped as all internal accesses
(i.e. kmap/iomap) are carefully flushed. For a CPU mmapping, one would
expect that the invalid cache lines are resolved on PTE/TLB shootdown
during munmap(), so the only situation we need to be paranoid about is


That seems a pretty strong assumption given that x86 caches are physically
indexed - I'd never expect flushing to happen on munmap.



x86 L1 caches are most probably virtually indexed/physically tagged and
generally this is the prevalent & most optimal cache configuration.


For virtually indexed/physically tagged caches, a cache flush is not
really required either on context switch or on page table update
(munmap, PTE/TLB shootdown).


So there could be a few cache lines holding stale data (due to a prior
CPU mmapping) for the object being purged/truncated.
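
(As an aside, the conservative fallback in that case is simply to flush
the object's backing pages explicitly before they are released; a
minimal sketch using the generic DRM helper, assuming obj->pages is the
object's sg_table as in this era of the driver:)

    /* Flush any cachelines that may still hold stale CPU data for the
     * object's backing pages, e.g. from an old WB mmapping.
     */
    if (obj->pages)
        drm_clflush_sg(obj->pages);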





when such a CPU mmapping exists at the time of put_pages. Given that we
need to treat put_pages carefully as we may return live data to the
system that we want to use again in the future (i.e. I915_MADV_WILLNEED
pages) we can simply treat a live CPU mmapping as a special case of
WILLNEED (which it is!). Any I915_MADV_DONTNEED pages and their
mmappings are shot down immediately following put_pages.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Goel, Akash akash.g...@intel.com
Cc: Ville Syrjälä ville.syrj...@linux.intel.com
Cc: Daniel Vetter daniel.vet...@ffwll.ch
Cc: Jesse Barnes jbar...@virtuousgeek.org


But it's still salvageable I think since we only care about coherency for
the gpu (where data might be stuck in cpu caches). From the cpu's pov (and
hence the entire system except the gpu) we should never see inconsistency
really - as soon as the gpu does a write to a cacheline it'll win, and
before that nothing in the system can assume anything about the contents
of these pages.

So imo a simple

/* For purgeable objects we don't care about object contents. */

would be enough.

Well except that there's gl extensions which expose purgeable objects, so
I guess we can't actually do this.

I presume though you only want to avoid clflush when actually purging an
object, so maybe we can keep this by purging the shmem backing node first
and checking here for __I915_MADV_PURGED instead?


An object marked as MADV_DONT_NEED implies that it will be
purged/truncated right away after the call to the put_pages_gtt
function. So doing it the other way round, purging first and then
checking for __I915_MADV_PURGED, might be equivalent.


Best regards
Akash



Oh and some perf data would be good for this.
-Daniel


---
  drivers/gpu/drm/i915/i915_gem.c | 49 ++---
  1 file changed, 36 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2dfe707f11d3..24deace364a5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2047,22 +2047,45 @@ i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)

BUG_ON(obj->madv == __I915_MADV_PURGED);

-   ret = i915_gem_object_set_to_cpu_domain(obj, true);
-   if (ret) {
-   /* In the event of a disaster, abandon all caches and
-* hope for the best.
-*/
-   WARN_ON(ret != -EIO);
-   i915_gem_clflush_object(obj, true);
-   obj->base.read_domains = obj->base.write_domain = I915_GEM_DOMAIN_CPU;
-   }
-
i915_gem_gtt_finish_object(obj);

-   if (i915_gem_object_needs_bit17_swizzle(obj))
-   i915_gem_object_save_bit_17_swizzle(obj);
+   /* If we need to access the data in the future, we need to
+* be sure that the contents of the object is coherent with
+* the CPU prior to releasing the pages back to the system.
+* Once we unpin them, the mm is free to move them to different
+* zones or even swap them out to disk - all without our
+* intervention. (Though we could track such operations with our
+* own gemfs, if we ever write one.) As such if we want to keep
+* the data, set it to the CPU domain now just in case someone
+* else touches it.
+*
+* For a long time we have been paranoid about handing back
+* pages to the system with stale cacheline invalidation. For
+* all internal use (kmap/iomap), we know that the domain tracking is
+* accurate. However, the userspace API is lax and the user can CPU
+* mmap the 

Re: [Intel-gfx] [PATCH] drm/i915: Only move to the CPU write domain if keeping the GTT pages

2015-08-09 Thread Goel, Akash



On 8/9/2015 4:25 PM, Chris Wilson wrote:

On Sun, Aug 09, 2015 at 04:23:01PM +0530, Goel, Akash wrote:

On 8/7/2015 1:37 PM, Daniel Vetter wrote:

I presume though you only want to avoid clflush when actually purging an
object, so maybe we can keep this by purging the shmem backing node first
and checking here for __I915_MADV_PURGED instead?


An object marked as MADV_DONT_NEED implies that it will be
purged/truncated right away after the call to the put_pages_gtt
function.
So doing it the other way round, purging first and then checking for
__I915_MADV_PURGED, might be equivalent.


But disregards a few nice sanity checks, which I would like to keep.
-Chris
Fine, just wanted to convey that doing the other way round may not be 
really beneficial.


About the other point of virtually indexed/physically tagged cache, 
would it be safe to just rely on the MADV_DONT_NEED state of the object
(which indicates that there are no active CPU mmappings)?
Due to earlier CPU mmappings, could there be cachelines holding
stale data?


Best regards
Akash






Re: [Intel-gfx] [PATCH] drm/i915: Only move to the CPU write domain if keeping the GTT pages

2015-08-09 Thread Chris Wilson
On Sun, Aug 09, 2015 at 04:23:01PM +0530, Goel, Akash wrote:
 On 8/7/2015 1:37 PM, Daniel Vetter wrote:
 I presume though you only want to avoid clflush when actually purging an
 object, so maybe we can keep this by purging the shmem backing node first
 and checking here for __I915_MADV_PURGED instead?
 
 An object marked as MADV_DONT_NEED implies that it will be
 purged/truncated right away after the call to the put_pages_gtt
 function. So doing it the other way round, purging first and then
 checking for __I915_MADV_PURGED, might be equivalent.

But disregards a few nice sanity checks, which I would like to keep.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH] drm/i915: Only move to the CPU write domain if keeping the GTT pages

2015-08-09 Thread Chris Wilson
On Sun, Aug 09, 2015 at 05:11:52PM +0530, Goel, Akash wrote:
 
 
 On 8/9/2015 4:25 PM, Chris Wilson wrote:
 On Sun, Aug 09, 2015 at 04:23:01PM +0530, Goel, Akash wrote:
 On 8/7/2015 1:37 PM, Daniel Vetter wrote:
 I presume though you only want to avoid clflush when actually purging an
 object, so maybe we can keep this by purging the shmem backing node first
 and checking here for __I915_MADV_PURGED instead?
 
 An object marked as MADV_DONT_NEED implies that it will be
 purged/truncated right away after the call to the put_pages_gtt
 function. So doing it the other way round, purging first and then
 checking for __I915_MADV_PURGED, might be equivalent.
 
 But disregards a few nice sanity checks, which I would like to keep.
 -Chris
 Fine, just wanted to convey that doing the other way round may not
 be really beneficial.
 
 About the other point of virtually indexed/physically tagged cache,
 would it be safe to just rely on the MADV_DONT_NEED state of the object
 (which indicates that there are no active CPU mmappings)?
 Due to earlier CPU mmappings, could there be cachelines holding
 stale data?

If the conflicts survive munmap(), I don't have a clever idea on how to
avoid the clflush before we hand back the pages to the system.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH] drm/i915: Only move to the CPU write domain if keeping the GTT pages

2015-08-09 Thread Goel, Akash



On 8/9/2015 6:19 PM, Chris Wilson wrote:

On Sun, Aug 09, 2015 at 05:11:52PM +0530, Goel, Akash wrote:



On 8/9/2015 4:25 PM, Chris Wilson wrote:

On Sun, Aug 09, 2015 at 04:23:01PM +0530, Goel, Akash wrote:

On 8/7/2015 1:37 PM, Daniel Vetter wrote:

I presume though you only want to avoid clflush when actually purging an
object, so maybe we can keep this by purging the shmem backing node first
and checking here for __I915_MADV_PURGED instead?


An object marked as MADV_DONT_NEED implies that it will be
purged/truncated right away after the call to the put_pages_gtt
function.
So doing it the other way round, purging first and then checking for
__I915_MADV_PURGED, might be equivalent.


But disregards a few nice sanity checks, which I would like to keep.
-Chris

Fine, just wanted to convey that doing the other way round may not
be really beneficial.

About the other point of virtually indexed/physically tagged cache,
would it be safe to just rely on the MADV_DONT_NEED state of the object
(which indicates that there are no active CPU mmappings)?
Due to earlier CPU mmappings, could there be cachelines holding
stale data?


If the conflicts survive munmap(), I don't have a clever idea on how to
avoid the clflush before we hand back the pages to the system.
One case could be, as you suggested, to check whether a CPU mapping was
ever created for the object & so avoid the clflush for GPU (GPU + GTT)
only objects.


Best regards
Akash

-Chris




Re: [Intel-gfx] [PATCH] drm/i915: Only move to the CPU write domain if keeping the GTT pages

2015-08-07 Thread Daniel Vetter
On Thu, Aug 06, 2015 at 05:43:39PM +0100, Chris Wilson wrote:
 We have for a long time been ultra-paranoid about the situation whereby
 we hand back pages to the system that have been written to by the GPU
 and potentially simultaneously by the user through a CPU mmapping. We
 can relax this restriction when we know that the cache domain tracking
 is true and there can be no stale cacheline invalidations. This is
 true if the object has never been CPU mmapped as all internal accesses
 (i.e. kmap/iomap) are carefully flushed. For a CPU mmapping, one would
 expect that the invalid cache lines are resolved on PTE/TLB shootdown
 during munmap(), so the only situation we need to be paranoid about is

That seems a pretty strong assumption given that x86 caches are physically
indexed - I'd never expect flushing to happen on munmap.

 when such a CPU mmapping exists at the time of put_pages. Given that we
 need to treat put_pages carefully as we may return live data to the
 system that we want to use again in the future (i.e. I915_MADV_WILLNEED
 pages) we can simply treat a live CPU mmapping as a special case of
 WILLNEED (which it is!). Any I915_MADV_DONTNEED pages and their
 mmappings are shot down immediately following put_pages.
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 Cc: Goel, Akash akash.g...@intel.com
 Cc: Ville Syrjälä ville.syrj...@linux.intel.com
 Cc: Daniel Vetter daniel.vet...@ffwll.ch
 Cc: Jesse Barnes jbar...@virtuousgeek.org

But it's still salvageable I think since we only care about coherency for
the gpu (where data might be stuck in cpu caches). From the cpu's pov (and
hence the entire system except the gpu) we should never see inconsistency
really - as soon as the gpu does a write to a cacheline it'll win, and
before that nothing in the system can assume anything about the contents
of these pages.

So imo a simple

/* For purgeable objects we don't care about object contents. */

would be enough.

Well except that there's gl extensions which expose purgeable objects, so
I guess we can't actually do this.

I presume though you only want to avoid clflush when actually purging an
object, so maybe we can keep this by purging the shmem backing node first
and checking here for __I915_MADV_PURGED instead?
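
(A minimal sketch of that check inside i915_gem_object_put_pages_gtt(),
illustrative only, assuming the shmem backing node really has been
truncated beforehand; that ordering would also mean relaxing the
existing BUG_ON(obj->madv == __I915_MADV_PURGED) sanity check:)

    if (obj->madv != __I915_MADV_PURGED) {
        ret = i915_gem_object_set_to_cpu_domain(obj, true);
        if (ret) {
            WARN_ON(ret != -EIO);
            i915_gem_clflush_object(obj, true);
            obj->base.read_domains = I915_GEM_DOMAIN_CPU;
            obj->base.write_domain = I915_GEM_DOMAIN_CPU;
        }
    }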

Oh and some perf data would be good for this.
-Daniel

 ---
  drivers/gpu/drm/i915/i915_gem.c | 49 ++---
  1 file changed, 36 insertions(+), 13 deletions(-)
 
 diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
 index 2dfe707f11d3..24deace364a5 100644
 --- a/drivers/gpu/drm/i915/i915_gem.c
 +++ b/drivers/gpu/drm/i915/i915_gem.c
 @@ -2047,22 +2047,45 @@ i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
  
   BUG_ON(obj->madv == __I915_MADV_PURGED);
  
 - ret = i915_gem_object_set_to_cpu_domain(obj, true);
 - if (ret) {
 - /* In the event of a disaster, abandon all caches and
 -  * hope for the best.
 -  */
 - WARN_ON(ret != -EIO);
 - i915_gem_clflush_object(obj, true);
 - obj->base.read_domains = obj->base.write_domain = I915_GEM_DOMAIN_CPU;
 - }
 -
   i915_gem_gtt_finish_object(obj);
  
 - if (i915_gem_object_needs_bit17_swizzle(obj))
 - i915_gem_object_save_bit_17_swizzle(obj);
 + /* If we need to access the data in the future, we need to
 +  * be sure that the contents of the object is coherent with
 +  * the CPU prior to releasing the pages back to the system.
 +  * Once we unpin them, the mm is free to move them to different
 +  * zones or even swap them out to disk - all without our
 +  * intervention. (Though we could track such operations with our
 +  * own gemfs, if we ever write one.) As such if we want to keep
 +  * the data, set it to the CPU domain now just in case someone
 +  * else touches it.
 +  *
 +  * For a long time we have been paranoid about handing back
 +  * pages to the system with stale cacheline invalidation. For
 +  * all internal use (kmap/iomap), we know that the domain tracking is
 +  * accurate. However, the userspace API is lax and the user can CPU
 +  * mmap the object and invalidate cachelines without our accurate
 +  * tracking. We have been paranoid to be sure that we always flushed
 +  * the cachelines when we stopped using the pages. However, given
 +  * that the CPU PTE/TLB shootdown must have invalidated the cachelines
 +  * upon munmap(), we only need to be paranoid about a live CPU mmap
 +  * now. For this, we need only treat it as live data; see
 +  * discard_backing_storage().
 +  */
 + if (obj->madv == I915_MADV_WILLNEED) {
 + ret = i915_gem_object_set_to_cpu_domain(obj, true);
 + if (ret) {
 + /* In the event of a disaster, abandon all caches and
 +  * hope for the best.
 +  */
 

Re: [Intel-gfx] [PATCH] drm/i915: Only move to the CPU write domain if keeping the GTT pages

2015-08-07 Thread Chris Wilson
On Fri, Aug 07, 2015 at 10:07:28AM +0200, Daniel Vetter wrote:
 On Thu, Aug 06, 2015 at 05:43:39PM +0100, Chris Wilson wrote:
 But it's still salvageable I think since we only care about coherency for
 the gpu (where data might be stuck in cpu caches). From the cpu's pov (and
 hence the entire system except the gpu) we should never see inconsistency
 really - as soon as the gpu does a write to a cacheline it'll win, and
 before that nothing in the system can assume anything about the contents
 of these pages.

But the GPU doesn't write to cachelines (except in LLC/snooped+flush).
The issue is what happens when the user lies about writing to the object
through a WB cpu mapping (dirtying a cacheline) and the GPU also does.
Who wins then?

We have postulated that it could be entirely possible for the CPU to
trust its cache and return local contents and for those to be also
considered not dirty and so not flushed to memory. Later, we then read
what the gpu wrote and chaos ensues.
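
(A rough timeline of the conflict being described, for illustration:)

    /*
     * 1. user writes through a WB CPU mmapping: the cacheline now
     *    holds the CPU's local data
     * 2. GPU writes the same bytes directly to memory (no snooping)
     * 3. CPU reads hit that cacheline and keep returning the local
     *    data; the GPU's write is not visible
     * 4. once the (clean) line is eventually evicted, reads suddenly
     *    return the GPU's data instead, so the object's contents
     *    appear to change underneath the reader
     */
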
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH] drm/i915: Only move to the CPU write domain if keeping the GTT pages

2015-08-07 Thread Daniel Vetter
On Fri, Aug 07, 2015 at 11:10:58AM +0100, Chris Wilson wrote:
 On Fri, Aug 07, 2015 at 10:07:28AM +0200, Daniel Vetter wrote:
  On Thu, Aug 06, 2015 at 05:43:39PM +0100, Chris Wilson wrote:
  But it's still salvageable I think since we only care about coherency for
  the gpu (where data might be stuck in cpu caches). From the cpu's pov (and
  hence the entire system except the gpu) we should never see inconsistency
  really - as soon as the gpu does a write to a cacheline it'll win, and
  before that nothing in the system can assume anything about the contents
  of these pages.
 
 But the GPU doesn't write to cachelines (except in LLC/snooped+flush).
 The issue is what happens when the user lies about writing to the object
 through a WB cpu mapping (dirtying a cacheline) and the GPU also does.
 Who wins then?
 
 We have postulated that it could be entirely possible for the CPU to
 trust its cache and return local contents and for those to be also
 considered not dirty and so not flushed to memory. Later, we then read
 what the gpu wrote and chaos ensues.

This was just with an eye towards purged memory where we don't care about
correct data anyway. The only thing we care about is that when it's all
overwritten again by someone, that someone should win. And since GEM
assumes new pages are in the cpu domain and clflushes them first, that
should hold even for GEM. But the tricky part is that I think we can pull
this off only if the backing storage is purged already.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH] drm/i915: Only move to the CPU write domain if keeping the GTT pages

2015-08-07 Thread Chris Wilson
On Fri, Aug 07, 2015 at 01:55:01PM +0200, Daniel Vetter wrote:
 On Fri, Aug 07, 2015 at 11:10:58AM +0100, Chris Wilson wrote:
  On Fri, Aug 07, 2015 at 10:07:28AM +0200, Daniel Vetter wrote:
   On Thu, Aug 06, 2015 at 05:43:39PM +0100, Chris Wilson wrote:
   But it's still salvageable I think since we only care about coherency for
   the gpu (where data might be stuck in cpu caches). From the cpu's pov (and
   hence the entire system except the gpu) we should never see inconsistency
   really - as soon as the gpu does a write to a cacheline it'll win, and
   before that nothing in the system can assume anything about the contents
   of these pages.
  
  But the GPU doesn't write to cachelines (except in LLC/snooped+flush).
  The issue is what happens when the user lies about writing to the object
  through a WB cpu mapping (dirtying a cacheline) and the GPU also does.
  Who wins then?
  
  We have postulated that it could be entirely possible for the CPU to
  trust its cache and return local contents and for those to be also
  considered not dirty and so not flushed to memory. Later, we then read
  what the gpu wrote and chaos ensues.
 
 This was just with an eye towards purged memory where we don't care about
 correct data anyway. The only thing we care about is that when it's all
 overwritten again by someone, that someone should win. And since GEM
 assumes new pages are in the cpu domain and clflushes them first, that
 should hold even for GEM. But the tricky part is that I think we can pull
 this off only if the backing storage is purged already.

But what's the difference between:

lock
  put_pages
  purge

and

lock
  purge
  put_pages

if you are dismissing the user dirtying CPU cachelines vs already dirty
GPU data as being a source of worry?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[Intel-gfx] [PATCH] drm/i915: Only move to the CPU write domain if keeping the GTT pages

2015-08-06 Thread Chris Wilson
We have for a long time been ultra-paranoid about the situation whereby
we hand back pages to the system that have been written to by the GPU
and potentially simultaneously by the user through a CPU mmapping. We
can relax this restriction when we know that the cache domain tracking
is true and there can be no stale cacheline invalidations. This is
true if the object has never been CPU mmapped as all internal accesses
(i.e. kmap/iomap) are carefully flushed. For a CPU mmapping, one would
expect that the invalid cache lines are resolved on PTE/TLB shootdown
during munmap(), so the only situation we need to be paranoid about is
when such a CPU mmapping exists at the time of put_pages. Given that we
need to treat put_pages carefully as we may return live data to the
system that we want to use again in the future (i.e. I915_MADV_WILLNEED
pages) we can simply treat a live CPU mmapping as a special case of
WILLNEED (which it is!). Any I915_MADV_DONTNEED pages and their
mmappings are shot down immediately following put_pages.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Goel, Akash akash.g...@intel.com
Cc: Ville Syrjälä ville.syrj...@linux.intel.com
Cc: Daniel Vetter daniel.vet...@ffwll.ch
Cc: Jesse Barnes jbar...@virtuousgeek.org
---
 drivers/gpu/drm/i915/i915_gem.c | 49 ++---
 1 file changed, 36 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2dfe707f11d3..24deace364a5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2047,22 +2047,45 @@ i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
 
BUG_ON(obj->madv == __I915_MADV_PURGED);
 
-   ret = i915_gem_object_set_to_cpu_domain(obj, true);
-   if (ret) {
-   /* In the event of a disaster, abandon all caches and
-* hope for the best.
-*/
-   WARN_ON(ret != -EIO);
-   i915_gem_clflush_object(obj, true);
-   obj->base.read_domains = obj->base.write_domain = I915_GEM_DOMAIN_CPU;
-   }
-
i915_gem_gtt_finish_object(obj);
 
-   if (i915_gem_object_needs_bit17_swizzle(obj))
-   i915_gem_object_save_bit_17_swizzle(obj);
+   /* If we need to access the data in the future, we need to
+* be sure that the contents of the object is coherent with
+* the CPU prior to releasing the pages back to the system.
+* Once we unpin them, the mm is free to move them to different
+* zones or even swap them out to disk - all without our
+* intervention. (Though we could track such operations with our
+* own gemfs, if we ever write one.) As such if we want to keep
+* the data, set it to the CPU domain now just in case someone
+* else touches it.
+*
+* For a long time we have been paranoid about handing back
+* pages to the system with stale cacheline invalidation. For
+* all internal use (kmap/iomap), we know that the domain tracking is
+* accurate. However, the userspace API is lax and the user can CPU
+* mmap the object and invalidate cachelines without our accurate
+* tracking. We have been paranoid to be sure that we always flushed
+* the cachelines when we stopped using the pages. However, given
+* that the CPU PTE/TLB shootdown must have invalidated the cachelines
+* upon munmap(), we only need to be paranoid about a live CPU mmap
+* now. For this, we need only treat it as live data; see
+* discard_backing_storage().
+*/
+   if (obj->madv == I915_MADV_WILLNEED) {
+   ret = i915_gem_object_set_to_cpu_domain(obj, true);
+   if (ret) {
+   /* In the event of a disaster, abandon all caches and
+* hope for the best.
+*/
+   WARN_ON(ret != -EIO);
+   i915_gem_clflush_object(obj, true);
+   obj->base.read_domains = I915_GEM_DOMAIN_CPU;
+   obj->base.write_domain = I915_GEM_DOMAIN_CPU;
+   }
 
-   if (obj->madv == I915_MADV_DONTNEED)
+   if (i915_gem_object_needs_bit17_swizzle(obj))
+   i915_gem_object_save_bit_17_swizzle(obj);
+   } else
obj->dirty = 0;
 
st_for_each_page(iter, obj->pages) {
-- 
2.5.0
