Re: [Intel-gfx] [PATCH 1/5] drm/i915: document caching related bits

2021-07-14 Thread Daniel Vetter
On Wed, Jul 14, 2021 at 02:42:53PM +0300, Ville Syrjälä wrote:
> On Wed, Jul 14, 2021 at 01:16:57PM +0200, Daniel Vetter wrote:
> > On Tue, Jul 13, 2021 at 09:46:30PM +0300, Ville Syrjälä wrote:
> > > On Tue, Jul 13, 2021 at 07:24:23PM +0100, Matthew Auld wrote:
> > > > On Tue, 13 Jul 2021 at 18:47, Ville Syrjälä
> > > >  wrote:
> > > > >
> > > > > On Tue, Jul 13, 2021 at 05:13:37PM +0100, Matthew Auld wrote:
> > > > > > On Tue, 13 Jul 2021 at 16:55, Ville Syrjälä
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Jul 13, 2021 at 11:45:50AM +0100, Matthew Auld wrote:
> > > > > > > > + /**
> > > > > > > > +  * @cache_coherent:
> > > > > > > > +  *
> > > > > > > > +  * Track whether the pages are coherent with the GPU if 
> > > > > > > > reading or
> > > > > > > > +  * writing through the CPU cache.
> > > > > > > > +  *
> > > > > > > > +  * This largely depends on the @cache_level, for example 
> > > > > > > > if the object
> > > > > > > > +  * is marked as I915_CACHE_LLC, then GPU access is 
> > > > > > > > coherent for both
> > > > > > > > +  * reads and writes through the CPU cache.
> > > > > > > > +  *
> > > > > > > > +  * Note that on platforms with shared-LLC 
> > > > > > > > support(HAS_LLC) reads through
> > > > > > > > +  * the CPU cache are always coherent, regardless of the 
> > > > > > > > @cache_level. On
> > > > > > > > +  * snooping based platforms this is not the case, unless 
> > > > > > > > the full
> > > > > > > > +  * I915_CACHE_LLC or similar setting is used.
> > > > > > > > +  *
> > > > > > > > +  * As a result of this we need to track coherency 
> > > > > > > > separately for reads
> > > > > > > > +  * and writes, in order to avoid superfluous flushing on 
> > > > > > > > shared-LLC
> > > > > > > > +  * platforms, for reads.
> > > > > > > > +  *
> > > > > > > > +  * I915_BO_CACHE_COHERENT_FOR_READ:
> > > > > > > > +  *
> > > > > > > > +  * When reading through the CPU cache, the GPU is still 
> > > > > > > > coherent. Note
> > > > > > > > +  * that no data has actually been modified here, so it 
> > > > > > > > might seem
> > > > > > > > +  * strange that we care about this.
> > > > > > > > +  *
> > > > > > > > +  * As an example, if some object is mapped on the CPU 
> > > > > > > > with write-back
> > > > > > > > +  * caching, and we read some page, then the cache likely 
> > > > > > > > now contains
> > > > > > > > +  * the data from that read. At this point the cache and 
> > > > > > > > main memory
> > > > > > > > +  * match up, so all good. But next the GPU needs to write 
> > > > > > > > some data to
> > > > > > > > +  * that same page. Now if the @cache_level is 
> > > > > > > > I915_CACHE_NONE and the
> > > > > > > > +  * the platform doesn't have the shared-LLC, then the GPU 
> > > > > > > > will
> > > > > > > > +  * effectively skip invalidating the cache(or however 
> > > > > > > > that works
> > > > > > > > +  * internally) when writing the new value.  This is 
> > > > > > > > really bad since the
> > > > > > > > +  * GPU has just written some new data to main memory, but 
> > > > > > > > the CPU cache
> > > > > > > > +  * is still valid and now contains stale data. As a 
> > > > > > > > result the next time
> > > > > > > > +  * we do a cached read with the CPU, we are rewarded with 
> > > > > > > > stale data.
> > > > > > > > +  * Likewise if the cache is later flushed, we might be 
> > > > > > > > rewarded with
> > > > > > > > +  * overwriting main memory with stale data.
> > > > > > > > +  *
> > > > > > > > +  * I915_BO_CACHE_COHERENT_FOR_WRITE:
> > > > > > > > +  *
> > > > > > > > +  * When writing through the CPU cache, the GPU is still 
> > > > > > > > coherent. Note
> > > > > > > > +  * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
> > > > > > > > +  *
> > > > > > > > +  * This is never set when I915_CACHE_NONE is used for 
> > > > > > > > @cache_level,
> > > > > > > > +  * where instead we have to manually flush the caches 
> > > > > > > > after writing
> > > > > > > > +  * through the CPU cache. For other cache levels this 
> > > > > > > > should be set and
> > > > > > > > +  * the object is therefore considered coherent for both 
> > > > > > > > reads and writes
> > > > > > > > +  * through the CPU cache.
> > > > > > >
> > > > > > > I don't remember why we have this read vs. write split and this 
> > > > > > > new
> > > > > > > documentation doesn't seem to really explain it either.
> > > > > >
> > > > > > Hmm, I attempted to explain that earlier:
> > > > > >
> > > > > > * Note that on platforms with shared-LLC support(HAS_LLC) reads 
> > > > > > through
> > > > > > * the CPU cache are always coherent, regardless of the 
> > > > > > @cache_level. On
> > > > > > * snooping based platforms this is not the case, unless the full
> > > > > > * I915_CACHE_LLC or 

Re: [Intel-gfx] [PATCH 1/5] drm/i915: document caching related bits

2021-07-14 Thread Ville Syrjälä
On Wed, Jul 14, 2021 at 02:42:53PM +0300, Ville Syrjälä wrote:
> On Wed, Jul 14, 2021 at 01:16:57PM +0200, Daniel Vetter wrote:
> > On Tue, Jul 13, 2021 at 09:46:30PM +0300, Ville Syrjälä wrote:
> > > On Tue, Jul 13, 2021 at 07:24:23PM +0100, Matthew Auld wrote:
> > > > On Tue, 13 Jul 2021 at 18:47, Ville Syrjälä
> > > >  wrote:
> > > > >
> > > > > On Tue, Jul 13, 2021 at 05:13:37PM +0100, Matthew Auld wrote:
> > > > > > On Tue, 13 Jul 2021 at 16:55, Ville Syrjälä
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Jul 13, 2021 at 11:45:50AM +0100, Matthew Auld wrote:
> > > > > > > > + /**
> > > > > > > > +  * @cache_coherent:
> > > > > > > > +  *
> > > > > > > > +  * Track whether the pages are coherent with the GPU if 
> > > > > > > > reading or
> > > > > > > > +  * writing through the CPU cache.
> > > > > > > > +  *
> > > > > > > > +  * This largely depends on the @cache_level, for example 
> > > > > > > > if the object
> > > > > > > > +  * is marked as I915_CACHE_LLC, then GPU access is 
> > > > > > > > coherent for both
> > > > > > > > +  * reads and writes through the CPU cache.
> > > > > > > > +  *
> > > > > > > > +  * Note that on platforms with shared-LLC 
> > > > > > > > support(HAS_LLC) reads through
> > > > > > > > +  * the CPU cache are always coherent, regardless of the 
> > > > > > > > @cache_level. On
> > > > > > > > +  * snooping based platforms this is not the case, unless 
> > > > > > > > the full
> > > > > > > > +  * I915_CACHE_LLC or similar setting is used.
> > > > > > > > +  *
> > > > > > > > +  * As a result of this we need to track coherency 
> > > > > > > > separately for reads
> > > > > > > > +  * and writes, in order to avoid superfluous flushing on 
> > > > > > > > shared-LLC
> > > > > > > > +  * platforms, for reads.
> > > > > > > > +  *
> > > > > > > > +  * I915_BO_CACHE_COHERENT_FOR_READ:
> > > > > > > > +  *
> > > > > > > > +  * When reading through the CPU cache, the GPU is still 
> > > > > > > > coherent. Note
> > > > > > > > +  * that no data has actually been modified here, so it 
> > > > > > > > might seem
> > > > > > > > +  * strange that we care about this.
> > > > > > > > +  *
> > > > > > > > +  * As an example, if some object is mapped on the CPU 
> > > > > > > > with write-back
> > > > > > > > +  * caching, and we read some page, then the cache likely 
> > > > > > > > now contains
> > > > > > > > +  * the data from that read. At this point the cache and 
> > > > > > > > main memory
> > > > > > > > +  * match up, so all good. But next the GPU needs to write 
> > > > > > > > some data to
> > > > > > > > +  * that same page. Now if the @cache_level is 
> > > > > > > > I915_CACHE_NONE and the
> > > > > > > > +  * the platform doesn't have the shared-LLC, then the GPU 
> > > > > > > > will
> > > > > > > > +  * effectively skip invalidating the cache(or however 
> > > > > > > > that works
> > > > > > > > +  * internally) when writing the new value.  This is 
> > > > > > > > really bad since the
> > > > > > > > +  * GPU has just written some new data to main memory, but 
> > > > > > > > the CPU cache
> > > > > > > > +  * is still valid and now contains stale data. As a 
> > > > > > > > result the next time
> > > > > > > > +  * we do a cached read with the CPU, we are rewarded with 
> > > > > > > > stale data.
> > > > > > > > +  * Likewise if the cache is later flushed, we might be 
> > > > > > > > rewarded with
> > > > > > > > +  * overwriting main memory with stale data.
> > > > > > > > +  *
> > > > > > > > +  * I915_BO_CACHE_COHERENT_FOR_WRITE:
> > > > > > > > +  *
> > > > > > > > +  * When writing through the CPU cache, the GPU is still 
> > > > > > > > coherent. Note
> > > > > > > > +  * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
> > > > > > > > +  *
> > > > > > > > +  * This is never set when I915_CACHE_NONE is used for 
> > > > > > > > @cache_level,
> > > > > > > > +  * where instead we have to manually flush the caches 
> > > > > > > > after writing
> > > > > > > > +  * through the CPU cache. For other cache levels this 
> > > > > > > > should be set and
> > > > > > > > +  * the object is therefore considered coherent for both 
> > > > > > > > reads and writes
> > > > > > > > +  * through the CPU cache.
> > > > > > >
> > > > > > > I don't remember why we have this read vs. write split and this 
> > > > > > > new
> > > > > > > documentation doesn't seem to really explain it either.
> > > > > >
> > > > > > Hmm, I attempted to explain that earlier:
> > > > > >
> > > > > > * Note that on platforms with shared-LLC support(HAS_LLC) reads 
> > > > > > through
> > > > > > * the CPU cache are always coherent, regardless of the 
> > > > > > @cache_level. On
> > > > > > * snooping based platforms this is not the case, unless the full
> > > > > > * I915_CACHE_LLC or 

Re: [Intel-gfx] [PATCH 1/5] drm/i915: document caching related bits

2021-07-14 Thread Ville Syrjälä
On Wed, Jul 14, 2021 at 01:16:57PM +0200, Daniel Vetter wrote:
> On Tue, Jul 13, 2021 at 09:46:30PM +0300, Ville Syrjälä wrote:
> > On Tue, Jul 13, 2021 at 07:24:23PM +0100, Matthew Auld wrote:
> > > On Tue, 13 Jul 2021 at 18:47, Ville Syrjälä
> > >  wrote:
> > > >
> > > > On Tue, Jul 13, 2021 at 05:13:37PM +0100, Matthew Auld wrote:
> > > > > On Tue, 13 Jul 2021 at 16:55, Ville Syrjälä
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Jul 13, 2021 at 11:45:50AM +0100, Matthew Auld wrote:
> > > > > > > + /**
> > > > > > > +  * @cache_coherent:
> > > > > > > +  *
> > > > > > > +  * Track whether the pages are coherent with the GPU if 
> > > > > > > reading or
> > > > > > > +  * writing through the CPU cache.
> > > > > > > +  *
> > > > > > > +  * This largely depends on the @cache_level, for example if 
> > > > > > > the object
> > > > > > > +  * is marked as I915_CACHE_LLC, then GPU access is coherent 
> > > > > > > for both
> > > > > > > +  * reads and writes through the CPU cache.
> > > > > > > +  *
> > > > > > > +  * Note that on platforms with shared-LLC support(HAS_LLC) 
> > > > > > > reads through
> > > > > > > +  * the CPU cache are always coherent, regardless of the 
> > > > > > > @cache_level. On
> > > > > > > +  * snooping based platforms this is not the case, unless 
> > > > > > > the full
> > > > > > > +  * I915_CACHE_LLC or similar setting is used.
> > > > > > > +  *
> > > > > > > +  * As a result of this we need to track coherency 
> > > > > > > separately for reads
> > > > > > > +  * and writes, in order to avoid superfluous flushing on 
> > > > > > > shared-LLC
> > > > > > > +  * platforms, for reads.
> > > > > > > +  *
> > > > > > > +  * I915_BO_CACHE_COHERENT_FOR_READ:
> > > > > > > +  *
> > > > > > > +  * When reading through the CPU cache, the GPU is still 
> > > > > > > coherent. Note
> > > > > > > +  * that no data has actually been modified here, so it 
> > > > > > > might seem
> > > > > > > +  * strange that we care about this.
> > > > > > > +  *
> > > > > > > +  * As an example, if some object is mapped on the CPU with 
> > > > > > > write-back
> > > > > > > +  * caching, and we read some page, then the cache likely 
> > > > > > > now contains
> > > > > > > +  * the data from that read. At this point the cache and 
> > > > > > > main memory
> > > > > > > +  * match up, so all good. But next the GPU needs to write 
> > > > > > > some data to
> > > > > > > +  * that same page. Now if the @cache_level is 
> > > > > > > I915_CACHE_NONE and the
> > > > > > > +  * the platform doesn't have the shared-LLC, then the GPU 
> > > > > > > will
> > > > > > > +  * effectively skip invalidating the cache(or however that 
> > > > > > > works
> > > > > > > +  * internally) when writing the new value.  This is really 
> > > > > > > bad since the
> > > > > > > +  * GPU has just written some new data to main memory, but 
> > > > > > > the CPU cache
> > > > > > > +  * is still valid and now contains stale data. As a result 
> > > > > > > the next time
> > > > > > > +  * we do a cached read with the CPU, we are rewarded with 
> > > > > > > stale data.
> > > > > > > +  * Likewise if the cache is later flushed, we might be 
> > > > > > > rewarded with
> > > > > > > +  * overwriting main memory with stale data.
> > > > > > > +  *
> > > > > > > +  * I915_BO_CACHE_COHERENT_FOR_WRITE:
> > > > > > > +  *
> > > > > > > +  * When writing through the CPU cache, the GPU is still 
> > > > > > > coherent. Note
> > > > > > > +  * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
> > > > > > > +  *
> > > > > > > +  * This is never set when I915_CACHE_NONE is used for 
> > > > > > > @cache_level,
> > > > > > > +  * where instead we have to manually flush the caches after 
> > > > > > > writing
> > > > > > > +  * through the CPU cache. For other cache levels this 
> > > > > > > should be set and
> > > > > > > +  * the object is therefore considered coherent for both 
> > > > > > > reads and writes
> > > > > > > +  * through the CPU cache.
> > > > > >
> > > > > > I don't remember why we have this read vs. write split and this new
> > > > > > documentation doesn't seem to really explain it either.
> > > > >
> > > > > Hmm, I attempted to explain that earlier:
> > > > >
> > > > > * Note that on platforms with shared-LLC support(HAS_LLC) reads 
> > > > > through
> > > > > * the CPU cache are always coherent, regardless of the @cache_level. 
> > > > > On
> > > > > * snooping based platforms this is not the case, unless the full
> > > > > * I915_CACHE_LLC or similar setting is used.
> > > > > *
> > > > > * As a result of this we need to track coherency separately for reads
> > > > > * and writes, in order to avoid superfluous flushing on shared-LLC
> > > > > * platforms, for reads.
> > > > >
> > > > > So AFAIK it's just because s

Re: [Intel-gfx] [PATCH 1/5] drm/i915: document caching related bits

2021-07-14 Thread Daniel Vetter
On Tue, Jul 13, 2021 at 09:46:30PM +0300, Ville Syrjälä wrote:
> On Tue, Jul 13, 2021 at 07:24:23PM +0100, Matthew Auld wrote:
> > On Tue, 13 Jul 2021 at 18:47, Ville Syrjälä
> >  wrote:
> > >
> > > On Tue, Jul 13, 2021 at 05:13:37PM +0100, Matthew Auld wrote:
> > > > On Tue, 13 Jul 2021 at 16:55, Ville Syrjälä
> > > >  wrote:
> > > > >
> > > > > On Tue, Jul 13, 2021 at 11:45:50AM +0100, Matthew Auld wrote:
> > > > > > + /**
> > > > > > +  * @cache_coherent:
> > > > > > +  *
> > > > > > +  * Track whether the pages are coherent with the GPU if 
> > > > > > reading or
> > > > > > +  * writing through the CPU cache.
> > > > > > +  *
> > > > > > +  * This largely depends on the @cache_level, for example if 
> > > > > > the object
> > > > > > +  * is marked as I915_CACHE_LLC, then GPU access is coherent 
> > > > > > for both
> > > > > > +  * reads and writes through the CPU cache.
> > > > > > +  *
> > > > > > +  * Note that on platforms with shared-LLC support(HAS_LLC) 
> > > > > > reads through
> > > > > > +  * the CPU cache are always coherent, regardless of the 
> > > > > > @cache_level. On
> > > > > > +  * snooping based platforms this is not the case, unless the 
> > > > > > full
> > > > > > +  * I915_CACHE_LLC or similar setting is used.
> > > > > > +  *
> > > > > > +  * As a result of this we need to track coherency separately 
> > > > > > for reads
> > > > > > +  * and writes, in order to avoid superfluous flushing on 
> > > > > > shared-LLC
> > > > > > +  * platforms, for reads.
> > > > > > +  *
> > > > > > +  * I915_BO_CACHE_COHERENT_FOR_READ:
> > > > > > +  *
> > > > > > +  * When reading through the CPU cache, the GPU is still 
> > > > > > coherent. Note
> > > > > > +  * that no data has actually been modified here, so it might 
> > > > > > seem
> > > > > > +  * strange that we care about this.
> > > > > > +  *
> > > > > > +  * As an example, if some object is mapped on the CPU with 
> > > > > > write-back
> > > > > > +  * caching, and we read some page, then the cache likely now 
> > > > > > contains
> > > > > > +  * the data from that read. At this point the cache and main 
> > > > > > memory
> > > > > > +  * match up, so all good. But next the GPU needs to write 
> > > > > > some data to
> > > > > > +  * that same page. Now if the @cache_level is I915_CACHE_NONE 
> > > > > > and the
> > > > > > +  * the platform doesn't have the shared-LLC, then the GPU will
> > > > > > +  * effectively skip invalidating the cache(or however that 
> > > > > > works
> > > > > > +  * internally) when writing the new value.  This is really 
> > > > > > bad since the
> > > > > > +  * GPU has just written some new data to main memory, but the 
> > > > > > CPU cache
> > > > > > +  * is still valid and now contains stale data. As a result 
> > > > > > the next time
> > > > > > +  * we do a cached read with the CPU, we are rewarded with 
> > > > > > stale data.
> > > > > > +  * Likewise if the cache is later flushed, we might be 
> > > > > > rewarded with
> > > > > > +  * overwriting main memory with stale data.
> > > > > > +  *
> > > > > > +  * I915_BO_CACHE_COHERENT_FOR_WRITE:
> > > > > > +  *
> > > > > > +  * When writing through the CPU cache, the GPU is still 
> > > > > > coherent. Note
> > > > > > +  * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
> > > > > > +  *
> > > > > > +  * This is never set when I915_CACHE_NONE is used for 
> > > > > > @cache_level,
> > > > > > +  * where instead we have to manually flush the caches after 
> > > > > > writing
> > > > > > +  * through the CPU cache. For other cache levels this should 
> > > > > > be set and
> > > > > > +  * the object is therefore considered coherent for both reads 
> > > > > > and writes
> > > > > > +  * through the CPU cache.
> > > > >
> > > > > I don't remember why we have this read vs. write split and this new
> > > > > documentation doesn't seem to really explain it either.
> > > >
> > > > Hmm, I attempted to explain that earlier:
> > > >
> > > > * Note that on platforms with shared-LLC support(HAS_LLC) reads through
> > > > * the CPU cache are always coherent, regardless of the @cache_level. On
> > > > * snooping based platforms this is not the case, unless the full
> > > > * I915_CACHE_LLC or similar setting is used.
> > > > *
> > > > * As a result of this we need to track coherency separately for reads
> > > > * and writes, in order to avoid superfluous flushing on shared-LLC
> > > > * platforms, for reads.
> > > >
> > > > So AFAIK it's just because shared-LLC can be coherent for reads, while
> > > > also not being coherent for writes(CACHE_NONE),
> > >
> > > CPU vs. GPU is fully coherent when it comes to LLC. Or at least I've
> > > never heard of any mechanism that would make it only partially coherent.
> > 
> > What do you mean by "comes to LLC"

Re: [Intel-gfx] [PATCH 1/5] drm/i915: document caching related bits

2021-07-13 Thread Ville Syrjälä
On Tue, Jul 13, 2021 at 07:24:23PM +0100, Matthew Auld wrote:
> On Tue, 13 Jul 2021 at 18:47, Ville Syrjälä
>  wrote:
> >
> > On Tue, Jul 13, 2021 at 05:13:37PM +0100, Matthew Auld wrote:
> > > On Tue, 13 Jul 2021 at 16:55, Ville Syrjälä
> > >  wrote:
> > > >
> > > > On Tue, Jul 13, 2021 at 11:45:50AM +0100, Matthew Auld wrote:
> > > > > + /**
> > > > > +  * @cache_coherent:
> > > > > +  *
> > > > > +  * Track whether the pages are coherent with the GPU if reading 
> > > > > or
> > > > > +  * writing through the CPU cache.
> > > > > +  *
> > > > > +  * This largely depends on the @cache_level, for example if the 
> > > > > object
> > > > > +  * is marked as I915_CACHE_LLC, then GPU access is coherent for 
> > > > > both
> > > > > +  * reads and writes through the CPU cache.
> > > > > +  *
> > > > > +  * Note that on platforms with shared-LLC support(HAS_LLC) 
> > > > > reads through
> > > > > +  * the CPU cache are always coherent, regardless of the 
> > > > > @cache_level. On
> > > > > +  * snooping based platforms this is not the case, unless the 
> > > > > full
> > > > > +  * I915_CACHE_LLC or similar setting is used.
> > > > > +  *
> > > > > +  * As a result of this we need to track coherency separately 
> > > > > for reads
> > > > > +  * and writes, in order to avoid superfluous flushing on 
> > > > > shared-LLC
> > > > > +  * platforms, for reads.
> > > > > +  *
> > > > > +  * I915_BO_CACHE_COHERENT_FOR_READ:
> > > > > +  *
> > > > > +  * When reading through the CPU cache, the GPU is still 
> > > > > coherent. Note
> > > > > +  * that no data has actually been modified here, so it might 
> > > > > seem
> > > > > +  * strange that we care about this.
> > > > > +  *
> > > > > +  * As an example, if some object is mapped on the CPU with 
> > > > > write-back
> > > > > +  * caching, and we read some page, then the cache likely now 
> > > > > contains
> > > > > +  * the data from that read. At this point the cache and main 
> > > > > memory
> > > > > +  * match up, so all good. But next the GPU needs to write some 
> > > > > data to
> > > > > +  * that same page. Now if the @cache_level is I915_CACHE_NONE 
> > > > > and the
> > > > > +  * the platform doesn't have the shared-LLC, then the GPU will
> > > > > +  * effectively skip invalidating the cache(or however that works
> > > > > +  * internally) when writing the new value.  This is really bad 
> > > > > since the
> > > > > +  * GPU has just written some new data to main memory, but the 
> > > > > CPU cache
> > > > > +  * is still valid and now contains stale data. As a result the 
> > > > > next time
> > > > > +  * we do a cached read with the CPU, we are rewarded with stale 
> > > > > data.
> > > > > +  * Likewise if the cache is later flushed, we might be rewarded 
> > > > > with
> > > > > +  * overwriting main memory with stale data.
> > > > > +  *
> > > > > +  * I915_BO_CACHE_COHERENT_FOR_WRITE:
> > > > > +  *
> > > > > +  * When writing through the CPU cache, the GPU is still 
> > > > > coherent. Note
> > > > > +  * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
> > > > > +  *
> > > > > +  * This is never set when I915_CACHE_NONE is used for 
> > > > > @cache_level,
> > > > > +  * where instead we have to manually flush the caches after 
> > > > > writing
> > > > > +  * through the CPU cache. For other cache levels this should be 
> > > > > set and
> > > > > +  * the object is therefore considered coherent for both reads 
> > > > > and writes
> > > > > +  * through the CPU cache.
> > > >
> > > > I don't remember why we have this read vs. write split and this new
> > > > documentation doesn't seem to really explain it either.
> > >
> > > Hmm, I attempted to explain that earlier:
> > >
> > > * Note that on platforms with shared-LLC support(HAS_LLC) reads through
> > > * the CPU cache are always coherent, regardless of the @cache_level. On
> > > * snooping based platforms this is not the case, unless the full
> > > * I915_CACHE_LLC or similar setting is used.
> > > *
> > > * As a result of this we need to track coherency separately for reads
> > > * and writes, in order to avoid superfluous flushing on shared-LLC
> > > * platforms, for reads.
> > >
> > > So AFAIK it's just because shared-LLC can be coherent for reads, while
> > > also not being coherent for writes(CACHE_NONE),
> >
> > CPU vs. GPU is fully coherent when it comes to LLC. Or at least I've
> > never heard of any mechanism that would make it only partially coherent.
> 
> What do you mean by "comes to LLC", are you talking about HAS_LLC() or
> I915_CACHE_LLC?

I'm talking about the actual cache.

> 
> If you set I915_CACHE_LLC, then yes it is fully coherent for both
> HAS_LLC() and HAS_SNOOP().
> 
> If you set I915_CACHE_NONE, then reads are still coherent on
> HAS_LLC(),

Reads and wri

Re: [Intel-gfx] [PATCH 1/5] drm/i915: document caching related bits

2021-07-13 Thread Matthew Auld
On Tue, 13 Jul 2021 at 18:47, Ville Syrjälä
 wrote:
>
> On Tue, Jul 13, 2021 at 05:13:37PM +0100, Matthew Auld wrote:
> > On Tue, 13 Jul 2021 at 16:55, Ville Syrjälä
> >  wrote:
> > >
> > > On Tue, Jul 13, 2021 at 11:45:50AM +0100, Matthew Auld wrote:
> > > > + /**
> > > > +  * @cache_coherent:
> > > > +  *
> > > > +  * Track whether the pages are coherent with the GPU if reading or
> > > > +  * writing through the CPU cache.
> > > > +  *
> > > > +  * This largely depends on the @cache_level, for example if the 
> > > > object
> > > > +  * is marked as I915_CACHE_LLC, then GPU access is coherent for 
> > > > both
> > > > +  * reads and writes through the CPU cache.
> > > > +  *
> > > > +  * Note that on platforms with shared-LLC support(HAS_LLC) reads 
> > > > through
> > > > +  * the CPU cache are always coherent, regardless of the 
> > > > @cache_level. On
> > > > +  * snooping based platforms this is not the case, unless the full
> > > > +  * I915_CACHE_LLC or similar setting is used.
> > > > +  *
> > > > +  * As a result of this we need to track coherency separately for 
> > > > reads
> > > > +  * and writes, in order to avoid superfluous flushing on 
> > > > shared-LLC
> > > > +  * platforms, for reads.
> > > > +  *
> > > > +  * I915_BO_CACHE_COHERENT_FOR_READ:
> > > > +  *
> > > > +  * When reading through the CPU cache, the GPU is still coherent. 
> > > > Note
> > > > +  * that no data has actually been modified here, so it might seem
> > > > +  * strange that we care about this.
> > > > +  *
> > > > +  * As an example, if some object is mapped on the CPU with 
> > > > write-back
> > > > +  * caching, and we read some page, then the cache likely now 
> > > > contains
> > > > +  * the data from that read. At this point the cache and main 
> > > > memory
> > > > +  * match up, so all good. But next the GPU needs to write some 
> > > > data to
> > > > +  * that same page. Now if the @cache_level is I915_CACHE_NONE and 
> > > > the
> > > > +  * the platform doesn't have the shared-LLC, then the GPU will
> > > > +  * effectively skip invalidating the cache(or however that works
> > > > +  * internally) when writing the new value.  This is really bad 
> > > > since the
> > > > +  * GPU has just written some new data to main memory, but the CPU 
> > > > cache
> > > > +  * is still valid and now contains stale data. As a result the 
> > > > next time
> > > > +  * we do a cached read with the CPU, we are rewarded with stale 
> > > > data.
> > > > +  * Likewise if the cache is later flushed, we might be rewarded 
> > > > with
> > > > +  * overwriting main memory with stale data.
> > > > +  *
> > > > +  * I915_BO_CACHE_COHERENT_FOR_WRITE:
> > > > +  *
> > > > +  * When writing through the CPU cache, the GPU is still coherent. 
> > > > Note
> > > > +  * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
> > > > +  *
> > > > +  * This is never set when I915_CACHE_NONE is used for 
> > > > @cache_level,
> > > > +  * where instead we have to manually flush the caches after 
> > > > writing
> > > > +  * through the CPU cache. For other cache levels this should be 
> > > > set and
> > > > +  * the object is therefore considered coherent for both reads and 
> > > > writes
> > > > +  * through the CPU cache.
> > >
> > > I don't remember why we have this read vs. write split and this new
> > > documentation doesn't seem to really explain it either.
> >
> > Hmm, I attempted to explain that earlier:
> >
> > * Note that on platforms with shared-LLC support(HAS_LLC) reads through
> > * the CPU cache are always coherent, regardless of the @cache_level. On
> > * snooping based platforms this is not the case, unless the full
> > * I915_CACHE_LLC or similar setting is used.
> > *
> > * As a result of this we need to track coherency separately for reads
> > * and writes, in order to avoid superfluous flushing on shared-LLC
> > * platforms, for reads.
> >
> > So AFAIK it's just because shared-LLC can be coherent for reads, while
> > also not being coherent for writes(CACHE_NONE),
>
> CPU vs. GPU is fully coherent when it comes to LLC. Or at least I've
> never heard of any mechanism that would make it only partially coherent.

What do you mean by "comes to LLC", are you talking about HAS_LLC() or
I915_CACHE_LLC?

If you set I915_CACHE_LLC, then yes it is fully coherent for both
HAS_LLC() and HAS_SNOOP().

If you set I915_CACHE_NONE, then reads are still coherent on
HAS_LLC(), for HAS_SNOOP() they are not. Or at least that is the
existing behaviour in the driver AFAIK.

>
> --
> Ville Syrjälä
> Intel
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/5] drm/i915: document caching related bits

2021-07-13 Thread Ville Syrjälä
On Tue, Jul 13, 2021 at 05:13:37PM +0100, Matthew Auld wrote:
> On Tue, 13 Jul 2021 at 16:55, Ville Syrjälä
>  wrote:
> >
> > On Tue, Jul 13, 2021 at 11:45:50AM +0100, Matthew Auld wrote:
> > > + /**
> > > +  * @cache_coherent:
> > > +  *
> > > +  * Track whether the pages are coherent with the GPU if reading or
> > > +  * writing through the CPU cache.
> > > +  *
> > > +  * This largely depends on the @cache_level, for example if the 
> > > object
> > > +  * is marked as I915_CACHE_LLC, then GPU access is coherent for both
> > > +  * reads and writes through the CPU cache.
> > > +  *
> > > +  * Note that on platforms with shared-LLC support(HAS_LLC) reads 
> > > through
> > > +  * the CPU cache are always coherent, regardless of the 
> > > @cache_level. On
> > > +  * snooping based platforms this is not the case, unless the full
> > > +  * I915_CACHE_LLC or similar setting is used.
> > > +  *
> > > +  * As a result of this we need to track coherency separately for 
> > > reads
> > > +  * and writes, in order to avoid superfluous flushing on shared-LLC
> > > +  * platforms, for reads.
> > > +  *
> > > +  * I915_BO_CACHE_COHERENT_FOR_READ:
> > > +  *
> > > +  * When reading through the CPU cache, the GPU is still coherent. 
> > > Note
> > > +  * that no data has actually been modified here, so it might seem
> > > +  * strange that we care about this.
> > > +  *
> > > +  * As an example, if some object is mapped on the CPU with 
> > > write-back
> > > +  * caching, and we read some page, then the cache likely now 
> > > contains
> > > +  * the data from that read. At this point the cache and main memory
> > > +  * match up, so all good. But next the GPU needs to write some data 
> > > to
> > > +  * that same page. Now if the @cache_level is I915_CACHE_NONE and 
> > > the
> > > +  * the platform doesn't have the shared-LLC, then the GPU will
> > > +  * effectively skip invalidating the cache(or however that works
> > > +  * internally) when writing the new value.  This is really bad 
> > > since the
> > > +  * GPU has just written some new data to main memory, but the CPU 
> > > cache
> > > +  * is still valid and now contains stale data. As a result the next 
> > > time
> > > +  * we do a cached read with the CPU, we are rewarded with stale 
> > > data.
> > > +  * Likewise if the cache is later flushed, we might be rewarded with
> > > +  * overwriting main memory with stale data.
> > > +  *
> > > +  * I915_BO_CACHE_COHERENT_FOR_WRITE:
> > > +  *
> > > +  * When writing through the CPU cache, the GPU is still coherent. 
> > > Note
> > > +  * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
> > > +  *
> > > +  * This is never set when I915_CACHE_NONE is used for @cache_level,
> > > +  * where instead we have to manually flush the caches after writing
> > > +  * through the CPU cache. For other cache levels this should be set 
> > > and
> > > +  * the object is therefore considered coherent for both reads and 
> > > writes
> > > +  * through the CPU cache.
> >
> > I don't remember why we have this read vs. write split and this new
> > documentation doesn't seem to really explain it either.
> 
> Hmm, I attempted to explain that earlier:
> 
> * Note that on platforms with shared-LLC support(HAS_LLC) reads through
> * the CPU cache are always coherent, regardless of the @cache_level. On
> * snooping based platforms this is not the case, unless the full
> * I915_CACHE_LLC or similar setting is used.
> *
> * As a result of this we need to track coherency separately for reads
> * and writes, in order to avoid superfluous flushing on shared-LLC
> * platforms, for reads.
> 
> So AFAIK it's just because shared-LLC can be coherent for reads, while
> also not being coherent for writes(CACHE_NONE),

CPU vs. GPU is fully coherent when it comes to LLC. Or at least I've
never heard of any mechanism that would make it only partially coherent.

-- 
Ville Syrjälä
Intel
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/5] drm/i915: document caching related bits

2021-07-13 Thread Daniel Vetter
On Tue, Jul 13, 2021 at 6:14 PM Matthew Auld
 wrote:
> On Tue, 13 Jul 2021 at 16:55, Ville Syrjälä
>  wrote:
> >
> > On Tue, Jul 13, 2021 at 11:45:50AM +0100, Matthew Auld wrote:
> > > + /**
> > > +  * @cache_coherent:
> > > +  *
> > > +  * Track whether the pages are coherent with the GPU if reading or
> > > +  * writing through the CPU cache.
> > > +  *
> > > +  * This largely depends on the @cache_level, for example if the 
> > > object
> > > +  * is marked as I915_CACHE_LLC, then GPU access is coherent for both
> > > +  * reads and writes through the CPU cache.
> > > +  *
> > > +  * Note that on platforms with shared-LLC support(HAS_LLC) reads 
> > > through
> > > +  * the CPU cache are always coherent, regardless of the 
> > > @cache_level. On
> > > +  * snooping based platforms this is not the case, unless the full
> > > +  * I915_CACHE_LLC or similar setting is used.
> > > +  *
> > > +  * As a result of this we need to track coherency separately for 
> > > reads
> > > +  * and writes, in order to avoid superfluous flushing on shared-LLC
> > > +  * platforms, for reads.
> > > +  *
> > > +  * I915_BO_CACHE_COHERENT_FOR_READ:
> > > +  *
> > > +  * When reading through the CPU cache, the GPU is still coherent. 
> > > Note
> > > +  * that no data has actually been modified here, so it might seem
> > > +  * strange that we care about this.
> > > +  *
> > > +  * As an example, if some object is mapped on the CPU with 
> > > write-back
> > > +  * caching, and we read some page, then the cache likely now 
> > > contains
> > > +  * the data from that read. At this point the cache and main memory
> > > +  * match up, so all good. But next the GPU needs to write some data 
> > > to
> > > +  * that same page. Now if the @cache_level is I915_CACHE_NONE and 
> > > the
> > > +  * the platform doesn't have the shared-LLC, then the GPU will
> > > +  * effectively skip invalidating the cache(or however that works
> > > +  * internally) when writing the new value.  This is really bad 
> > > since the
> > > +  * GPU has just written some new data to main memory, but the CPU 
> > > cache
> > > +  * is still valid and now contains stale data. As a result the next 
> > > time
> > > +  * we do a cached read with the CPU, we are rewarded with stale 
> > > data.
> > > +  * Likewise if the cache is later flushed, we might be rewarded with
> > > +  * overwriting main memory with stale data.
> > > +  *
> > > +  * I915_BO_CACHE_COHERENT_FOR_WRITE:
> > > +  *
> > > +  * When writing through the CPU cache, the GPU is still coherent. 
> > > Note
> > > +  * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
> > > +  *
> > > +  * This is never set when I915_CACHE_NONE is used for @cache_level,
> > > +  * where instead we have to manually flush the caches after writing
> > > +  * through the CPU cache. For other cache levels this should be set 
> > > and
> > > +  * the object is therefore considered coherent for both reads and 
> > > writes
> > > +  * through the CPU cache.
> >
> > I don't remember why we have this read vs. write split and this new
> > documentation doesn't seem to really explain it either.
>
> Hmm, I attempted to explain that earlier:
>
> * Note that on platforms with shared-LLC support(HAS_LLC) reads through
> * the CPU cache are always coherent, regardless of the @cache_level. On
> * snooping based platforms this is not the case, unless the full
> * I915_CACHE_LLC or similar setting is used.
> *
> * As a result of this we need to track coherency separately for reads
> * and writes, in order to avoid superfluous flushing on shared-LLC
> * platforms, for reads.
>
> So AFAIK it's just because shared-LLC can be coherent for reads, while
> also not being coherent for writes(CACHE_NONE), so being able to track
> each separately is kind of needed to avoid unnecessary flushing for
> the read cases i.e simple boolean for coherent vs non-coherent is not
> enough.
>
> I can try to reword things to make that more clear.

Maybe highlight the security aspect a bit more: When reads are always
coherent, we don't have to force the clflush. If reads are not
coherent we must ensure that the clflush has finished before userspace
can get at the backing storage, like writing ptes and similar things.
Writes otoh can only result in userspace eating cacheling corruption
if it races against the kernel (by e.g. trying to predict where we'll
bind a buffer and issuing gpu access to that location before the
buffer is actually bound from some other engine in parallel with an
execbuf that binds the buffer).

Atm we don't do a great job with that, but that's something that I
think is getting looked into.
-Daniel

> > Is it for optimizing some display related case where we can omit the
> > invalidates but still have to do the writeback to keep the disp

Re: [Intel-gfx] [PATCH 1/5] drm/i915: document caching related bits

2021-07-13 Thread Matthew Auld
On Tue, 13 Jul 2021 at 16:55, Ville Syrjälä
 wrote:
>
> On Tue, Jul 13, 2021 at 11:45:50AM +0100, Matthew Auld wrote:
> > + /**
> > +  * @cache_coherent:
> > +  *
> > +  * Track whether the pages are coherent with the GPU if reading or
> > +  * writing through the CPU cache.
> > +  *
> > +  * This largely depends on the @cache_level, for example if the object
> > +  * is marked as I915_CACHE_LLC, then GPU access is coherent for both
> > +  * reads and writes through the CPU cache.
> > +  *
> > +  * Note that on platforms with shared-LLC support(HAS_LLC) reads 
> > through
> > +  * the CPU cache are always coherent, regardless of the @cache_level. 
> > On
> > +  * snooping based platforms this is not the case, unless the full
> > +  * I915_CACHE_LLC or similar setting is used.
> > +  *
> > +  * As a result of this we need to track coherency separately for reads
> > +  * and writes, in order to avoid superfluous flushing on shared-LLC
> > +  * platforms, for reads.
> > +  *
> > +  * I915_BO_CACHE_COHERENT_FOR_READ:
> > +  *
> > +  * When reading through the CPU cache, the GPU is still coherent. Note
> > +  * that no data has actually been modified here, so it might seem
> > +  * strange that we care about this.
> > +  *
> > +  * As an example, if some object is mapped on the CPU with write-back
> > +  * caching, and we read some page, then the cache likely now contains
> > +  * the data from that read. At this point the cache and main memory
> > +  * match up, so all good. But next the GPU needs to write some data to
> > +  * that same page. Now if the @cache_level is I915_CACHE_NONE and the
> > +  * the platform doesn't have the shared-LLC, then the GPU will
> > +  * effectively skip invalidating the cache(or however that works
> > +  * internally) when writing the new value.  This is really bad since 
> > the
> > +  * GPU has just written some new data to main memory, but the CPU 
> > cache
> > +  * is still valid and now contains stale data. As a result the next 
> > time
> > +  * we do a cached read with the CPU, we are rewarded with stale data.
> > +  * Likewise if the cache is later flushed, we might be rewarded with
> > +  * overwriting main memory with stale data.
> > +  *
> > +  * I915_BO_CACHE_COHERENT_FOR_WRITE:
> > +  *
> > +  * When writing through the CPU cache, the GPU is still coherent. Note
> > +  * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
> > +  *
> > +  * This is never set when I915_CACHE_NONE is used for @cache_level,
> > +  * where instead we have to manually flush the caches after writing
> > +  * through the CPU cache. For other cache levels this should be set 
> > and
> > +  * the object is therefore considered coherent for both reads and 
> > writes
> > +  * through the CPU cache.
>
> I don't remember why we have this read vs. write split and this new
> documentation doesn't seem to really explain it either.

Hmm, I attempted to explain that earlier:

* Note that on platforms with shared-LLC support(HAS_LLC) reads through
* the CPU cache are always coherent, regardless of the @cache_level. On
* snooping based platforms this is not the case, unless the full
* I915_CACHE_LLC or similar setting is used.
*
* As a result of this we need to track coherency separately for reads
* and writes, in order to avoid superfluous flushing on shared-LLC
* platforms, for reads.

So AFAIK it's just because shared-LLC can be coherent for reads, while
also not being coherent for writes(CACHE_NONE), so being able to track
each separately is kind of needed to avoid unnecessary flushing for
the read cases i.e simple boolean for coherent vs non-coherent is not
enough.

I can try to reword things to make that more clear.

>
> Is it for optimizing some display related case where we can omit the
> invalidates but still have to do the writeback to keep the display
> engine happy?
>
> --
> Ville Syrjälä
> Intel
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/5] drm/i915: document caching related bits

2021-07-13 Thread Ville Syrjälä
On Tue, Jul 13, 2021 at 11:45:50AM +0100, Matthew Auld wrote:
> + /**
> +  * @cache_coherent:
> +  *
> +  * Track whether the pages are coherent with the GPU if reading or
> +  * writing through the CPU cache.
> +  *
> +  * This largely depends on the @cache_level, for example if the object
> +  * is marked as I915_CACHE_LLC, then GPU access is coherent for both
> +  * reads and writes through the CPU cache.
> +  *
> +  * Note that on platforms with shared-LLC support(HAS_LLC) reads through
> +  * the CPU cache are always coherent, regardless of the @cache_level. On
> +  * snooping based platforms this is not the case, unless the full
> +  * I915_CACHE_LLC or similar setting is used.
> +  *
> +  * As a result of this we need to track coherency separately for reads
> +  * and writes, in order to avoid superfluous flushing on shared-LLC
> +  * platforms, for reads.
> +  *
> +  * I915_BO_CACHE_COHERENT_FOR_READ:
> +  *
> +  * When reading through the CPU cache, the GPU is still coherent. Note
> +  * that no data has actually been modified here, so it might seem
> +  * strange that we care about this.
> +  *
> +  * As an example, if some object is mapped on the CPU with write-back
> +  * caching, and we read some page, then the cache likely now contains
> +  * the data from that read. At this point the cache and main memory
> +  * match up, so all good. But next the GPU needs to write some data to
> +  * that same page. Now if the @cache_level is I915_CACHE_NONE and the
> +  * the platform doesn't have the shared-LLC, then the GPU will
> +  * effectively skip invalidating the cache(or however that works
> +  * internally) when writing the new value.  This is really bad since the
> +  * GPU has just written some new data to main memory, but the CPU cache
> +  * is still valid and now contains stale data. As a result the next time
> +  * we do a cached read with the CPU, we are rewarded with stale data.
> +  * Likewise if the cache is later flushed, we might be rewarded with
> +  * overwriting main memory with stale data.
> +  *
> +  * I915_BO_CACHE_COHERENT_FOR_WRITE:
> +  *
> +  * When writing through the CPU cache, the GPU is still coherent. Note
> +  * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
> +  *
> +  * This is never set when I915_CACHE_NONE is used for @cache_level,
> +  * where instead we have to manually flush the caches after writing
> +  * through the CPU cache. For other cache levels this should be set and
> +  * the object is therefore considered coherent for both reads and writes
> +  * through the CPU cache.

I don't remember why we have this read vs. write split and this new
documentation doesn't seem to really explain it either.

Is it for optimizing some display related case where we can omit the
invalidates but still have to do the writeback to keep the display
engine happy?

-- 
Ville Syrjälä
Intel
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/5] drm/i915: document caching related bits

2021-07-13 Thread Mika Kuoppala
Matthew Auld  writes:

> Try to document the object caching related bits, like cache_coherent and
> cache_dirty.
>
> Suggested-by: Daniel Vetter 
> Signed-off-by: Matthew Auld 
> ---
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  | 135 +-
>  drivers/gpu/drm/i915/i915_drv.h   |   9 --
>  2 files changed, 131 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
> b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index ef3de2ae9723..02c3529b774c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -92,6 +92,57 @@ struct drm_i915_gem_object_ops {
>   const char *name; /* friendly name for debug, e.g. lockdep classes */
>  };
>  
> +/**
> + * enum i915_cache_level - The supported GTT caching values for system memory
> + * pages.
> + *
> + * These translate to some special GTT PTE bits when binding pages into some
> + * address space. It also determines whether an object, or rather its pages 
> are
> + * coherent with the GPU, when also reading or writing through the CPU cache
> + * with those pages.
> + *
> + * Userspace can also control this through struct drm_i915_gem_caching.
> + */
> +enum i915_cache_level {
> + /**
> +  * @I915_CACHE_NONE:
> +  *
> +  * Not coherent with the CPU cache. If the cache is dirty and we need
> +  * the underlying pages to be coherent with some later GPU access then
> +  * we need to manually flush the pages.
> +  *
> +  * Note that on shared-LLC platforms reads through the CPU cache are
> +  * still coherent even with this setting. See also
> +  * I915_BO_CACHE_COHERENT_FOR_READ for more details.
> +  */
> + I915_CACHE_NONE = 0,
> + /**
> +  * @I915_CACHE_LLC:
> +  *
> +  * Coherent with the CPU cache. If the cache is dirty, then the GPU will
> +  * ensure that access remains coherent, when both reading and writing
> +  * through the CPU cache.
> +  *
> +  * Applies to both platforms with shared-LLC(HAS_LLC), and snooping
> +  * based platforms(HAS_SNOOP).
> +  */
> + I915_CACHE_LLC,
> + /**
> +  * @I915_CACHE_L3_LLC:
> +  *
> +  * gen7+, L3 sits between the domain specifc caches, eg sampler/render

typo: specifc

> +  * caches, and the large Last-Level-Cache. LLC is coherent with the CPU,
> +  * but L3 is only visible to the GPU.
> +  */

I dont get the difference between this and I915_CACHE_LLC.
Could the diff between LLC and L3_LLC be described here with example?

Thanks,
-Mika

> + I915_CACHE_L3_LLC,
> + /**
> +  * @I915_CACHE_WT:
> +  *
> +  * hsw:gt3e Write-through for scanout buffers.
> +  */
> + I915_CACHE_WT,
> +};
> +
>  enum i915_map_type {
>   I915_MAP_WB = 0,
>   I915_MAP_WC,
> @@ -228,14 +279,90 @@ struct drm_i915_gem_object {
>   unsigned int mem_flags;
>  #define I915_BO_FLAG_STRUCT_PAGE BIT(0) /* Object backed by struct pages */
>  #define I915_BO_FLAG_IOMEM   BIT(1) /* Object backed by IO memory */
> - /*
> -  * Is the object to be mapped as read-only to the GPU
> -  * Only honoured if hardware has relevant pte bit
> + /**
> +  * @cache_level: The desired GTT caching level.
> +  *
> +  * See enum i915_cache_level for possible values, along with what
> +  * each does.
>*/
>   unsigned int cache_level:3;
> - unsigned int cache_coherent:2;
> + /**
> +  * @cache_coherent:
> +  *
> +  * Track whether the pages are coherent with the GPU if reading or
> +  * writing through the CPU cache.
> +  *
> +  * This largely depends on the @cache_level, for example if the object
> +  * is marked as I915_CACHE_LLC, then GPU access is coherent for both
> +  * reads and writes through the CPU cache.
> +  *
> +  * Note that on platforms with shared-LLC support(HAS_LLC) reads through
> +  * the CPU cache are always coherent, regardless of the @cache_level. On
> +  * snooping based platforms this is not the case, unless the full
> +  * I915_CACHE_LLC or similar setting is used.
> +  *
> +  * As a result of this we need to track coherency separately for reads
> +  * and writes, in order to avoid superfluous flushing on shared-LLC
> +  * platforms, for reads.
> +  *
> +  * I915_BO_CACHE_COHERENT_FOR_READ:
> +  *
> +  * When reading through the CPU cache, the GPU is still coherent. Note
> +  * that no data has actually been modified here, so it might seem
> +  * strange that we care about this.
> +  *
> +  * As an example, if some object is mapped on the CPU with write-back
> +  * caching, and we read some page, then the cache likely now contains
> +  * the data from that read. At this point the cache and main memory
> +  * match up, so all good. But next the GPU needs to write some data to
> +  * that same p

[Intel-gfx] [PATCH 1/5] drm/i915: document caching related bits

2021-07-13 Thread Matthew Auld
Try to document the object caching related bits, like cache_coherent and
cache_dirty.

Suggested-by: Daniel Vetter 
Signed-off-by: Matthew Auld 
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  | 135 +-
 drivers/gpu/drm/i915/i915_drv.h   |   9 --
 2 files changed, 131 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index ef3de2ae9723..02c3529b774c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -92,6 +92,57 @@ struct drm_i915_gem_object_ops {
const char *name; /* friendly name for debug, e.g. lockdep classes */
 };
 
+/**
+ * enum i915_cache_level - The supported GTT caching values for system memory
+ * pages.
+ *
+ * These translate to some special GTT PTE bits when binding pages into some
+ * address space. It also determines whether an object, or rather its pages are
+ * coherent with the GPU, when also reading or writing through the CPU cache
+ * with those pages.
+ *
+ * Userspace can also control this through struct drm_i915_gem_caching.
+ */
+enum i915_cache_level {
+   /**
+* @I915_CACHE_NONE:
+*
+* Not coherent with the CPU cache. If the cache is dirty and we need
+* the underlying pages to be coherent with some later GPU access then
+* we need to manually flush the pages.
+*
+* Note that on shared-LLC platforms reads through the CPU cache are
+* still coherent even with this setting. See also
+* I915_BO_CACHE_COHERENT_FOR_READ for more details.
+*/
+   I915_CACHE_NONE = 0,
+   /**
+* @I915_CACHE_LLC:
+*
+* Coherent with the CPU cache. If the cache is dirty, then the GPU will
+* ensure that access remains coherent, when both reading and writing
+* through the CPU cache.
+*
+* Applies to both platforms with shared-LLC(HAS_LLC), and snooping
+* based platforms(HAS_SNOOP).
+*/
+   I915_CACHE_LLC,
+   /**
+* @I915_CACHE_L3_LLC:
+*
+* gen7+, L3 sits between the domain specifc caches, eg sampler/render
+* caches, and the large Last-Level-Cache. LLC is coherent with the CPU,
+* but L3 is only visible to the GPU.
+*/
+   I915_CACHE_L3_LLC,
+   /**
+* @I915_CACHE_WT:
+*
+* hsw:gt3e Write-through for scanout buffers.
+*/
+   I915_CACHE_WT,
+};
+
 enum i915_map_type {
I915_MAP_WB = 0,
I915_MAP_WC,
@@ -228,14 +279,90 @@ struct drm_i915_gem_object {
unsigned int mem_flags;
 #define I915_BO_FLAG_STRUCT_PAGE BIT(0) /* Object backed by struct pages */
 #define I915_BO_FLAG_IOMEM   BIT(1) /* Object backed by IO memory */
-   /*
-* Is the object to be mapped as read-only to the GPU
-* Only honoured if hardware has relevant pte bit
+   /**
+* @cache_level: The desired GTT caching level.
+*
+* See enum i915_cache_level for possible values, along with what
+* each does.
 */
unsigned int cache_level:3;
-   unsigned int cache_coherent:2;
+   /**
+* @cache_coherent:
+*
+* Track whether the pages are coherent with the GPU if reading or
+* writing through the CPU cache.
+*
+* This largely depends on the @cache_level, for example if the object
+* is marked as I915_CACHE_LLC, then GPU access is coherent for both
+* reads and writes through the CPU cache.
+*
+* Note that on platforms with shared-LLC support(HAS_LLC) reads through
+* the CPU cache are always coherent, regardless of the @cache_level. On
+* snooping based platforms this is not the case, unless the full
+* I915_CACHE_LLC or similar setting is used.
+*
+* As a result of this we need to track coherency separately for reads
+* and writes, in order to avoid superfluous flushing on shared-LLC
+* platforms, for reads.
+*
+* I915_BO_CACHE_COHERENT_FOR_READ:
+*
+* When reading through the CPU cache, the GPU is still coherent. Note
+* that no data has actually been modified here, so it might seem
+* strange that we care about this.
+*
+* As an example, if some object is mapped on the CPU with write-back
+* caching, and we read some page, then the cache likely now contains
+* the data from that read. At this point the cache and main memory
+* match up, so all good. But next the GPU needs to write some data to
+* that same page. Now if the @cache_level is I915_CACHE_NONE and the
+* the platform doesn't have the shared-LLC, then the GPU will
+* effectively skip invalidating the cache(or however that works
+* internally) when writing the new value.  This i