[Intel-gfx] [PATCH] [RFC] intel: Non-LLC based non-blocking maps.

2012-06-19 Thread Chris Wilson
On Tue, 19 Jun 2012 09:13:20 -0700, Ben Widawsky  wrote:
> On Tue, 19 Jun 2012 09:22:03 +0100
> Chris Wilson  wrote:
> 
> > On Mon, 18 Jun 2012 20:38:15 -0700, Ben Widawsky wrote:
> > > The history on this patch goes back quite a way. This time around, the
> > > patch builds on top of the map_unsynchronized that Eric pushed. Eric's
> > > patch attempted only to solve the problem for LLC machines. Unlike
> > > my earlier versions of this patch (with the help from Daniel Vetter), we
> > > do not attempt to cpu map objects in an unsynchronized manner.
> > > 
> > > The concept is fairly simple - once a buffer is moved into the GTT
> > > domain, we can assume it remains there unless we tell it otherwise (via
> > > cpu map). It therefore stands to reason that as long as we can keep the
> > > object in the GTT domain, and don't ever count on reading back contents,
> > > things might just work. I believe as long as we are doing GTT mappings
> > > only, we get to avoid worrying about clflushing the dirtied cachelines, but
> > > that could use some fact checking.
> > > 
> > > The patch makes some assumptions about how the kernel does buffer
> > > tracking; this could be construed as an ABI dependency, but actually the
> > > behavior is pretty confined. It exploits the fact that BOs are only moved
> > > into the CPU domain under certain circumstances, and daintily dances
> > > around those conditions. The main thing here is we assume MADV_WILLNEED
> > > prevents the object from getting evicted.
> > > 
> > > I am not aware of a good way to test its effectiveness
> > > performance-wise; but it introduces no regressions with piglit on my
> > > ILK, or SNB.
> > 
> > This is broken wrt cache invalidation if I want to rewrite part of
> > the buffer that already has been read by the GPU.
> > -Chris
> > 
> 
> Well, if you're talking about what I think you're talking about (i.e. not
> clflushing, but simply dealing with the GPU's internal caching), it's a
> problem that has existed with all of the non-LLC non-blocking map
> patches, and sort of the point of non-blocking maps. Play it fast and
> loose, submit pipe controls if you get nervous.
> 
> Did I catch your meaning, or were you just talking about clflushing
> stuff (we also miss chipset flush on really old platforms; I was
> thinking of restricting this to ILK only)?

Sorry, I actually meant GPU caches. However, I was under the false
impression that you were changing the existing API, not bringing GTT maps
into compliance with the new async mappings. My warning was merely about
the issue that can arise from missing the invalidate when reusing an async
map (or even if the sampler prefetches further than expected, which is
what I was stung by most recently). Furthermore, Daniel has just added
unconditional invalidates before each batchbuffer, which neatly papers
over this issue (in the future at least).

Again, sorry for the noise, please continue :)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
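
The hazard described above (a CPU write through an async map that the GPU
does not see because its internal cache was never invalidated) can be
illustrated with a toy model. This is only a sketch under stated
assumptions: a single word of memory and a single cached copy stand in for
the buffer and the GPU cache, and every name here (toy_gpu, toy_gpu_read,
toy_gpu_invalidate, toy_cpu_write_unsync) is hypothetical and does not exist
in libdrm or the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: one GPU-side cached copy of one word of buffer memory.
 * All names are hypothetical; nothing here is libdrm or i915 API. */
struct toy_gpu {
    unsigned int cached; /* value held in the GPU's internal cache */
    bool valid;          /* true once the GPU has read the buffer */
};

/* GPU read: served from the cache when valid, else fetched from memory. */
static unsigned int toy_gpu_read(struct toy_gpu *gpu, const unsigned int *mem)
{
    assert(gpu != NULL && mem != NULL);
    if (!gpu->valid) {
        gpu->cached = *mem;
        gpu->valid = true;
    }
    return gpu->cached;
}

/* Stand-in for an invalidate emitted before the next batch. */
static void toy_gpu_invalidate(struct toy_gpu *gpu)
{
    assert(gpu != NULL);
    gpu->valid = false;
}

/* CPU write through an unsynchronized map: updates memory only and
 * knows nothing about what the GPU has cached. */
static void toy_cpu_write_unsync(unsigned int *mem, unsigned int value)
{
    assert(mem != NULL);
    *mem = value;
}
```

In this model, a GPU read followed by toy_cpu_write_unsync() and a second
GPU read returns the old value; only after toy_gpu_invalidate() does the GPU
observe the rewrite. Unconditional invalidates before each batchbuffer
correspond to always calling toy_gpu_invalidate() before the next read.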


[Intel-gfx] [PATCH] [RFC] intel: Non-LLC based non-blocking maps.

2012-06-19 Thread Chris Wilson
On Mon, 18 Jun 2012 20:38:15 -0700, Ben Widawsky  wrote:
> The history on this patch goes back quite a way. This time around, the
> patch builds on top of the map_unsynchronized that Eric pushed. Eric's
> patch attempted only to solve the problem for LLC machines. Unlike
> my earlier versions of this patch (with the help from Daniel Vetter), we
> do not attempt to cpu map objects in an unsynchronized manner.
> 
> The concept is fairly simple - once a buffer is moved into the GTT
> domain, we can assume it remains there unless we tell it otherwise (via
> cpu map). It therefore stands to reason that as long as we can keep the
> object in the GTT domain, and don't ever count on reading back contents,
> things might just work. I believe as long as we are doing GTT mappings
> only, we get to avoid worrying about clflushing the dirtied cachelines, but
> that could use some fact checking.
> 
> The patch makes some assumptions about how the kernel does buffer
> tracking; this could be construed as an ABI dependency, but actually the
> behavior is pretty confined. It exploits the fact that BOs are only moved
> into the CPU domain under certain circumstances, and daintily dances
> around those conditions. The main thing here is we assume MADV_WILLNEED
> prevents the object from getting evicted.
> 
> I am not aware of a good way to test its effectiveness
> performance-wise; but it introduces no regressions with piglit on my
> ILK, or SNB.

This is broken wrt cache invalidation if I want to rewrite part of
the buffer that already has been read by the GPU.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[Intel-gfx] [PATCH] [RFC] intel: Non-LLC based non-blocking maps.

2012-06-19 Thread Ben Widawsky
On Tue, 19 Jun 2012 09:22:03 +0100
Chris Wilson  wrote:

> On Mon, 18 Jun 2012 20:38:15 -0700, Ben Widawsky  wrote:
> > The history on this patch goes back quite a way. This time around, the
> > patch builds on top of the map_unsynchronized that Eric pushed. Eric's
> > patch attempted only to solve the problem for LLC machines. Unlike
> > my earlier versions of this patch (with the help from Daniel Vetter), we
> > do not attempt to cpu map objects in an unsynchronized manner.
> > 
> > The concept is fairly simple - once a buffer is moved into the GTT
> > domain, we can assume it remains there unless we tell it otherwise (via
> > cpu map). It therefore stands to reason that as long as we can keep the
> > object in the GTT domain, and don't ever count on reading back contents,
> > things might just work. I believe as long as we are doing GTT mappings
> > only, we get to avoid worrying about clflushing the dirtied cachelines, but
> > that could use some fact checking.
> > 
> > The patch makes some assumptions about how the kernel does buffer
> > tracking; this could be construed as an ABI dependency, but actually the
> > behavior is pretty confined. It exploits the fact that BOs are only moved
> > into the CPU domain under certain circumstances, and daintily dances
> > around those conditions. The main thing here is we assume MADV_WILLNEED
> > prevents the object from getting evicted.
> > 
> > I am not aware of a good way to test its effectiveness
> > performance-wise; but it introduces no regressions with piglit on my
> > ILK, or SNB.
> 
> This is broken wrt cache invalidation if I want to rewrite part of
> the buffer that already has been read by the GPU.
> -Chris
> 

Well, if you're talking about what I think you're talking about (i.e. not
clflushing, but simply dealing with the GPU's internal caching), it's a
problem that has existed with all of the non-LLC non-blocking map
patches, and sort of the point of non-blocking maps. Play it fast and
loose, submit pipe controls if you get nervous.

Did I catch your meaning, or were you just talking about clflushing
stuff (we also miss chipset flush on really old platforms; I was
thinking of restricting this to ILK only)?

-- 
Ben Widawsky, Intel Open Source Technology Center
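
The assumption in the quoted patch description (a buffer stays in the GTT
domain until a cpu map moves it out, and MADV_WILLNEED keeps it from being
evicted) can be written down as a small state model. This is a sketch of
the stated assumption, not the patch itself: the types and functions
(toy_bo, toy_map_gtt, toy_map_cpu, toy_can_map_unsync) are hypothetical, and
the real kernel tracking has more states than the two modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-state model of the domain tracking described above. */
enum toy_domain { TOY_DOMAIN_CPU, TOY_DOMAIN_GTT };
enum toy_madv { TOY_MADV_WILLNEED, TOY_MADV_DONTNEED };

struct toy_bo {
    enum toy_domain domain; /* domain the buffer is assumed to be in */
    enum toy_madv madv;     /* advice last given for the buffer */
};

/* A synchronized GTT map moves the buffer into the GTT domain. */
static void toy_map_gtt(struct toy_bo *bo)
{
    assert(bo != NULL);
    bo->domain = TOY_DOMAIN_GTT;
}

/* A cpu map is the one operation assumed to move it back out. */
static void toy_map_cpu(struct toy_bo *bo)
{
    assert(bo != NULL);
    bo->domain = TOY_DOMAIN_CPU;
}

/* Under the stated assumptions, an unsynchronized GTT map is only
 * attempted while the buffer is in the GTT domain and marked WILLNEED. */
static bool toy_can_map_unsync(const struct toy_bo *bo)
{
    assert(bo != NULL);
    return bo->domain == TOY_DOMAIN_GTT && bo->madv == TOY_MADV_WILLNEED;
}
```

A caller in this model would fall back to a normal, blocking map whenever
toy_can_map_unsync() returns false, for example after any cpu map of the
same buffer.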

