Re: [Mesa-dev] [PATCH 00/10] Support Skylake MCS buffers (fast clears)

2015-10-14 Thread Ben Widawsky
On Tue, Oct 13, 2015 at 08:50:17PM -0700, Ben Widawsky wrote:
> This patch series adds support for fast color clears on SKL as it exists on
> previous generations of hardware minus the new hardware restriction on surface
> formats. Additionally, it adds support for utilizing clear values with up to
> 32b per color channel (see note at the bottom). It is based on work originally
> done by Kristian, so thanks to him for that initial work as well as helping me
> debug some of the issues.
> 
> Additionally, thanks to Chad for helping track down the last bug in the
> rectangle scaling code which was (for me) being masked by another bug (#3
> below). I imagine it would have been several more weeks at least before I
> uncovered it.
> 
> We knew that SKL added the extra DWORDs to the RENDER_SURFACE_STATE in order
> to support the 32b per channel. As it turned out though, Skylake made other
> changes to support this which caused weird failures which seemed to interfere
> with each other.
> 
> 1. Not all surface formats support lossless compression.
> 2. Clearing multiple color buffer attachments must happen in n passes
> 3. Change to the scaling factors for the MCS surface - SKL has 2x height (this
> was the bug which Chad helped uncover, I had it correct in my patch from March
> http://lists.freedesktop.org/archives/mesa-dev/2015-March/079084.html, but we
> had other problems which prevented merge, including #1 and #2 above).
> 
> I have no piglit, dEQP or CTS regressions (except for the last patch). I
> haven't yet, but will collect perf data on this ASAP. Historically we've come
> to expect this to provide large gains in tests which are memory bandwidth
> limited and doing many clears.

I left out the note here about 32b having two small regressions.

I did some very basic performance data collection. As expected, the rep_clears
which were already enabled by Chad seem to actually provide most of the gains. I
didn't actually run long enough to do much except prove to myself that there
aren't any performance regressions over the gen9 rep clears. Here are the
results, which shouldn't be taken too seriously (5 runs only).

Benchmark          % diff (master->full 32b fast clears)
OglBatch0           1.87
OglBatch1           0.54
OglBatch2          -0.44
OglBatch3           0.11
OglBatch4          -0.94
OglBatch5          -2.11
OglBatch6           1.18
OglBatch7           7.02
OglDeferred         3.05
OglDeferredAA       3.6
OglFillPixel        0.07
OglFillTexMulti    -0.01
OglFillTexSingle    0.03
OglGeomPoint        0.07
OglGeomTriList      0.74
OglGeomTriStrip    -0.13
OglHdrBloom        -1.93
OglMultithread     -0.96
OglPSBump2          0.33
OglPSBump8          0.31
OglPSPhong          0.18
OglPSPom           -0.08
OglShMapPcf         0.03
OglShMapVsm        -0.3
OglTerrainFlyInst   0.46
OglTerrainPanInst   0.4
OglTexFilterAniso  -0.08
OglTexFilterTri     0.13
OglTexMem128        0.2
OglTexMem512       -0.03
OglVSDiffuse1       0.23
OglVSDiffuse8      -0.23
OglVSInstancing    -0.15
OglVSTangent       -0.06
OglZBuffer          0.07
fill                0.17
filloff            -0.01
fur                -0.19
heaven              0.56
plot3d             -0.18
trex                4.51
trexoff             3.69
triangle            0.04
valley              1.86
warsow              0.18
xonotic             0.4
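
For anyone wanting to reproduce a "% diff" column like the one above, here is a minimal C sketch. It assumes the column is the relative change of the mean score over the runs, master to patched; the thread does not spell out the formula, so treat that as an assumption. The helper names `mean` and `percent_diff` are hypothetical and not part of any benchmark harness.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of `count` benchmark scores. */
static double
mean(const double *samples, size_t count)
{
   assert(count > 0);
   double sum = 0.0;
   for (size_t i = 0; i < count; i++)
      sum += samples[i];
   return sum / (double)count;
}

/* Relative change of the mean score, in percent, going from the master
 * branch to the patched branch. Assumed formula, see the note above.
 */
static double
percent_diff(const double *master, const double *patched, size_t count)
{
   double base = mean(master, count);
   assert(base != 0.0);
   return (mean(patched, count) - base) / base * 100.0;
}
```

With only 5 runs per branch the mean is sensitive to a single outlier, which is one more reason not to read much into differences of a fraction of a percent.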


BTW: the patches are here as well (with 32b support reverted):
http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=skl-fast-clear
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/10] Support Skylake MCS buffers (fast clears)

2015-10-14 Thread Neil Roberts
Looks good, it'll be great to get this landed. Patches 1-3 and 6-8 are:

Reviewed-by: Neil Roberts 

I've sent comments separately for 4, 5 and 9. Hopefully I can try to
help with patch 10 once my SKL machine arrives.

Regards,
- Neil

Ben Widawsky writes:

> This patch series adds support for fast color clears on SKL as it exists on
> previous generations of hardware minus the new hardware restriction on surface
> formats. Additionally, it adds support for utilizing clear values with up to
> 32b per color channel (see note at the bottom). It is based on work originally
> done by Kristian, so thanks to him for that initial work as well as helping me
> debug some of the issues.
>
> Additionally, thanks to Chad for helping track down the last bug in the
> rectangle scaling code which was (for me) being masked by another bug (#3
> below). I imagine it would have been several more weeks at least before I
> uncovered it.
>
> We knew that SKL added the extra DWORDs to the RENDER_SURFACE_STATE in order
> to support the 32b per channel. As it turned out though, Skylake made other
> changes to support this which caused weird failures which seemed to interfere
> with each other.
>
> 1. Not all surface formats support lossless compression.
> 2. Clearing multiple color buffer attachments must happen in n passes
> 3. Change to the scaling factors for the MCS surface - SKL has 2x height (this
> was the bug which Chad helped uncover, I had it correct in my patch from March
> http://lists.freedesktop.org/archives/mesa-dev/2015-March/079084.html, but we
> had other problems which prevented merge, including #1 and #2 above).
>
> I have no piglit, dEQP or CTS regressions (except for the last patch). I
> haven't yet, but will collect perf data on this ASAP. Historically we've come
> to expect this to provide large gains in tests which are memory bandwidth
> limited and doing many clears.
>
> Ben Widawsky (10):
>   i965/gen8+: Remove redundant zeroing of surface state
>   i965/gen8+: Extract color clear surface state
>   i965/skl: Enable fast color clears on SKL
>   i965/skl: skip fast clears for certain surface formats
>   i965/meta/gen9: Individually fast clear color attachments
>   Revert "i965/gen9: Disable MCS for 1x color surfaces"
>   Revert "i965/gen9: Enable rep clears on gen9"
>   i965/meta: Assert fast clears and rep clears never overlap
>   i965/meta: Remove fast_clear_color variable
>   i965/gen9: Support fast clears for 32b float
>
>  src/mesa/drivers/dri/i965/brw_context.h |   1 +
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 172 ++--
>  src/mesa/drivers/dri/i965/brw_surface_formats.c |  27 
>  src/mesa/drivers/dri/i965/gen8_surface_state.c  |  48 ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  20 +--
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h   |   7 +-
>  6 files changed, 205 insertions(+), 70 deletions(-)
>
> -- 
> 2.6.1
>


[Mesa-dev] [PATCH 00/10] Support Skylake MCS buffers (fast clears)

2015-10-13 Thread Ben Widawsky
This patch series adds support for fast color clears on SKL as it exists on
previous generations of hardware minus the new hardware restriction on surface
formats. Additionally, it adds support for utilizing clear values with up to 32b
per color channel (see note at the bottom). It is based on work originally done
by Kristian, so thanks to him for that initial work as well as helping me debug
some of the issues.

Additionally, thanks to Chad for helping track down the last bug in the
rectangle scaling code which was (for me) being masked by another bug (#3
below). I imagine it would have been several more weeks at least before I
uncovered it.

We knew that SKL added the extra DWORDs to the RENDER_SURFACE_STATE in order to
support the 32b per channel. As it turned out though, Skylake made other changes
to support this which caused weird failures which seemed to interfere with
each other.

1. Not all surface formats support lossless compression.
2. Clearing multiple color buffer attachments must happen in n passes
3. Change to the scaling factors for the MCS surface - SKL has 2x height (this
was the bug which Chad helped uncover, I had it correct in my patch from March
http://lists.freedesktop.org/archives/mesa-dev/2015-March/079084.html, but we
had other problems which prevented merge, including #1 and #2 above).
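
To make the three points above a bit more concrete, here is a minimal C sketch. Every name in it (`clear_rect`, `scale_clear_rect`, `plan_fast_clears`) is hypothetical, and the block sizes passed in are illustrative values rather than numbers from the driver or the hardware documentation. It only tries to show two things: the fast clear rectangle is the render target rectangle aligned outward and divided by a per-generation block size, so halving the rows covered by one unit doubles the scaled height (#3); and each attachment whose format is eligible (#1) gets its own clear pass (#2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rectangle type: x1/y1 are exclusive. */
struct clear_rect {
   uint32_t x0, y0, x1, y1;
};

static uint32_t
align_down(uint32_t value, uint32_t alignment)
{
   return value - (value % alignment);
}

static uint32_t
align_up(uint32_t value, uint32_t alignment)
{
   return align_down(value + alignment - 1, alignment);
}

/* Scale a render target rectangle down to fast clear rectangle units.
 * block_w/block_h are the pixels covered by one unit; the real values
 * depend on the hardware generation and are NOT given here.
 */
static struct clear_rect
scale_clear_rect(struct clear_rect rt, uint32_t block_w, uint32_t block_h)
{
   assert(block_w > 0 && block_h > 0);
   struct clear_rect out = {
      .x0 = align_down(rt.x0, block_w) / block_w,
      .y0 = align_down(rt.y0, block_h) / block_h,
      .x1 = align_up(rt.x1, block_w) / block_w,
      .y1 = align_up(rt.y1, block_h) / block_h,
   };
   return out;
}

/* Decide per attachment whether it is fast cleared, and return how many
 * separate clear passes that takes (one per eligible attachment).
 * format_ok[i] says whether attachment i has a format that supports it.
 */
static unsigned
plan_fast_clears(const bool *format_ok, unsigned count, bool *fast_cleared)
{
   unsigned passes = 0;
   for (unsigned i = 0; i < count; i++) {
      fast_cleared[i] = format_ok[i];
      if (format_ok[i])
         passes++;
   }
   return passes;
}
```

For example, a 1024x512 target scaled with an illustrative 64x32 block gives a 16x16 rectangle, while a 64x16 block gives 16x32, i.e. the same width and twice the height. Attachments that `plan_fast_clears` leaves unmarked would take the regular clear path instead.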

I have no piglit, dEQP or CTS regressions (except for the last patch). I haven't
yet, but will collect perf data on this ASAP. Historically we've come to expect
this to provide large gains in tests which are memory bandwidth limited and
doing many clears.

Ben Widawsky (10):
  i965/gen8+: Remove redundant zeroing of surface state
  i965/gen8+: Extract color clear surface state
  i965/skl: Enable fast color clears on SKL
  i965/skl: skip fast clears for certain surface formats
  i965/meta/gen9: Individually fast clear color attachments
  Revert "i965/gen9: Disable MCS for 1x color surfaces"
  Revert "i965/gen9: Enable rep clears on gen9"
  i965/meta: Assert fast clears and rep clears never overlap
  i965/meta: Remove fast_clear_color variable
  i965/gen9: Support fast clears for 32b float

 src/mesa/drivers/dri/i965/brw_context.h |   1 +
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 172 ++--
 src/mesa/drivers/dri/i965/brw_surface_formats.c |  27 
 src/mesa/drivers/dri/i965/gen8_surface_state.c  |  48 ---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  20 +--
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h   |   7 +-
 6 files changed, 205 insertions(+), 70 deletions(-)

-- 
2.6.1
