On Wed, Jul 14, 2021 at 02:42:53PM +0300, Ville Syrjälä wrote:
> On Wed, Jul 14, 2021 at 01:16:57PM +0200, Daniel Vetter wrote:
> > On Tue, Jul 13, 2021 at 09:46:30PM +0300, Ville Syrjälä wrote:
> > > On Tue, Jul 13, 2021 at 07:24:23PM +0100, Matthew Auld wrote:
> > > > On Tue, 13 Jul 2021 at 18:47, Ville Syrjälä
> > > > <ville.syrj...@linux.intel.com> wrote:
> > > > >
> > > > > On Tue, Jul 13, 2021 at 05:13:37PM +0100, Matthew Auld wrote:
> > > > > > On Tue, 13 Jul 2021 at 16:55, Ville Syrjälä
> > > > > > <ville.syrj...@linux.intel.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jul 13, 2021 at 11:45:50AM +0100, Matthew Auld wrote:
> > > > > > > > +     /**
> > > > > > > > +      * @cache_coherent:
> > > > > > > > +      *
> > > > > > > > +      * Track whether the pages are coherent with the GPU if 
> > > > > > > > reading or
> > > > > > > > +      * writing through the CPU cache.
> > > > > > > > +      *
> > > > > > > > +      * This largely depends on the @cache_level, for example 
> > > > > > > > if the object
> > > > > > > > +      * is marked as I915_CACHE_LLC, then GPU access is 
> > > > > > > > coherent for both
> > > > > > > > +      * reads and writes through the CPU cache.
> > > > > > > > +      *
> > > > > > > > +      * Note that on platforms with shared-LLC 
> > > > > > > > support(HAS_LLC) reads through
> > > > > > > > +      * the CPU cache are always coherent, regardless of the 
> > > > > > > > @cache_level. On
> > > > > > > > +      * snooping based platforms this is not the case, unless 
> > > > > > > > the full
> > > > > > > > +      * I915_CACHE_LLC or similar setting is used.
> > > > > > > > +      *
> > > > > > > > +      * As a result of this we need to track coherency 
> > > > > > > > separately for reads
> > > > > > > > +      * and writes, in order to avoid superfluous flushing on 
> > > > > > > > shared-LLC
> > > > > > > > +      * platforms, for reads.
> > > > > > > > +      *
> > > > > > > > +      * I915_BO_CACHE_COHERENT_FOR_READ:
> > > > > > > > +      *
> > > > > > > > +      * When reading through the CPU cache, the GPU is still 
> > > > > > > > coherent. Note
> > > > > > > > +      * that no data has actually been modified here, so it 
> > > > > > > > might seem
> > > > > > > > +      * strange that we care about this.
> > > > > > > > +      *
> > > > > > > > +      * As an example, if some object is mapped on the CPU 
> > > > > > > > with write-back
> > > > > > > > +      * caching, and we read some page, then the cache likely 
> > > > > > > > now contains
> > > > > > > > +      * the data from that read. At this point the cache and 
> > > > > > > > main memory
> > > > > > > > +      * match up, so all good. But next the GPU needs to write 
> > > > > > > > some data to
> > > > > > > > +      * that same page. Now if the @cache_level is 
> > > > > > > > I915_CACHE_NONE and the
> > > > > > > > +      * the platform doesn't have the shared-LLC, then the GPU 
> > > > > > > > will
> > > > > > > > +      * effectively skip invalidating the cache(or however 
> > > > > > > > that works
> > > > > > > > +      * internally) when writing the new value.  This is 
> > > > > > > > really bad since the
> > > > > > > > +      * GPU has just written some new data to main memory, but 
> > > > > > > > the CPU cache
> > > > > > > > +      * is still valid and now contains stale data. As a 
> > > > > > > > result the next time
> > > > > > > > +      * we do a cached read with the CPU, we are rewarded with 
> > > > > > > > stale data.
> > > > > > > > +      * Likewise if the cache is later flushed, we might be 
> > > > > > > > rewarded with
> > > > > > > > +      * overwriting main memory with stale data.
> > > > > > > > +      *
> > > > > > > > +      * I915_BO_CACHE_COHERENT_FOR_WRITE:
> > > > > > > > +      *
> > > > > > > > +      * When writing through the CPU cache, the GPU is still 
> > > > > > > > coherent. Note
> > > > > > > > +      * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
> > > > > > > > +      *
> > > > > > > > +      * This is never set when I915_CACHE_NONE is used for 
> > > > > > > > @cache_level,
> > > > > > > > +      * where instead we have to manually flush the caches 
> > > > > > > > after writing
> > > > > > > > +      * through the CPU cache. For other cache levels this 
> > > > > > > > should be set and
> > > > > > > > +      * the object is therefore considered coherent for both 
> > > > > > > > reads and writes
> > > > > > > > +      * through the CPU cache.
> > > > > > >
> > > > > > > I don't remember why we have this read vs. write split and this 
> > > > > > > new
> > > > > > > documentation doesn't seem to really explain it either.
> > > > > >
> > > > > > Hmm, I attempted to explain that earlier:
> > > > > >
> > > > > > * Note that on platforms with shared-LLC support(HAS_LLC) reads 
> > > > > > through
> > > > > > * the CPU cache are always coherent, regardless of the 
> > > > > > @cache_level. On
> > > > > > * snooping based platforms this is not the case, unless the full
> > > > > > * I915_CACHE_LLC or similar setting is used.
> > > > > > *
> > > > > > * As a result of this we need to track coherency separately for 
> > > > > > reads
> > > > > > * and writes, in order to avoid superfluous flushing on shared-LLC
> > > > > > * platforms, for reads.
> > > > > >
> > > > > > So AFAIK it's just because shared-LLC can be coherent for reads, 
> > > > > > while
> > > > > > also not being coherent for writes(CACHE_NONE),
> > > > >
> > > > > CPU vs. GPU is fully coherent when it comes to LLC. Or at least I've
> > > > > never heard of any mechanism that would make it only partially 
> > > > > coherent.
> > > > 
> > > > What do you mean by "comes to LLC", are you talking about HAS_LLC() or
> > > > I915_CACHE_LLC?
> > > 
> > > I'm talking about the actual cache.
> > > 
> > > > 
> > > > If you set I915_CACHE_LLC, then yes it is fully coherent for both
> > > > HAS_LLC() and HAS_SNOOP().
> > > > 
> > > > If you set I915_CACHE_NONE, then reads are still coherent on
> > > > HAS_LLC(),
> > > 
> > > Reads and writes both. The only thing that's not coherent is the
> > > display engine.
> > 
> > There's a lot of code which seems to disagree,
> 
> Can't even imagine why anyone would make a cache coherency protocol
> that only handles reads but not writes...
> 
> > plus there's now this new
> > MOCS thing.
> 
> That's just a full LLC bypass AFAICS. Can't omit invalidates if
> you use that one or you'll just get stale data from the cache
> on reads as well.
> 
> > I really hope we don't have all those cache coherency bits
> > just because the code complexity is entertaining?
> 
> They were definitely added to fix a display issue, and before
> that it was just a single flag, which wasn't doing what the display
> needed. I think before the flag was added we used some other indicators
> to check when we need to clflush, or maybe we did a some extra pointless
> clflushes here and there and the broken single flag was supposed to
> avoid those. Not quite sure.

Hm I thought cache_dirty was added for that display case? But yeah this
entire model is maybe not super well-defined in terms of all the use-cases
and what exactly it means ...

I'm also not clear why this is on the object, and not some kind of
function which computes it from the platform + cache_level.

> I suppose these two flags should maybe have been named more like
> "needs invalidate" and "needs writeback" to make it clear what 
> one needs to do.

tbh I have no idea, this all started to get added after I disappeared for
a few years into display.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to