Re: [Mesa-dev] i965: On-demand render target flushing

2017-03-30 Thread Francisco Jerez
"Pohjolainen, Topi"  writes:

> Jason, Curro, do you have any opinion if this is worth pursuing?
> I need something for blorp blits at least - using blorp for texture
> uploads on top of current excessive flushing regresses perf.
>

I wouldn't be surprised if it improves throughput slightly for workloads
doing both compute and 3D rendering, but only testing will tell how
helpful it is in practice...

> When working on gpu hangs on SKL we also identified compute
> flushing caches it shouldn't. I think those could be addressed
> nicely here as well. I can respin if this is something we'd like
> to have.
>
> On Fri, Feb 17, 2017 at 09:32:03PM +0200, Topi Pohjolainen wrote:
>> Currently:
>> 
>> 1) Blorp color clears and resolves emit unconditional render target
>>flush + command stream after every clear/resolve (including
>>regular non-fast clears).
>> 
>> 2) Blorp color clears, resolves and blits emit texture and constant
>>cache resolves even in case only destination is dirty. This is
>>because brw_render_cache_set_check_flush() does both render target
>>flush as well as the top-of-pipe read cache flushes.
>> 
>> 3) Similarly to item 2, 3D and compute paths also flush texture and
>>constant caches even if none of the texture surfaces are dirty.
>> 
>> 4) In case of multiple surfaces needing resolves, all render paths
>>(blorp, 3D and compute) emit render target, texture and constant
>>cache flushes after each resolve instead of just once after all
>>resolves.
>> 
>> This series addresses all four cases. Good news are that even though
>> the current setup isn't optimal, it doesn't actually get any better in
>> most cases performance wise. There is modest gain in OglDrvRes which
>> does heavy blorp blitting. I'm expecting this series also to make
>> blorp tex uploads and blorp mipmap generation more competitive.
>> 
>> Bad news are in the final patch - it looks that current unconditional
>> flushing/stalling has been hiding bugs elsewhere. There are cases
>> which rely on the flushes after non-fast clears. Hunting the real
>> cause is, however, difficult. I only saw them in CI system within
>> full runs and was not able to reproduce them myself.
>> 
>> As first steps the series introduces end-of-pipe synchronization.
>> This is a flush combined with stall and post-sync operation of
>> writing a double word (32 bits). Until now this wasn't really
>> needed as there was in many cases double flushing which in turn
>> looks to take long enough to hide the need for the sync. I also
>> noticed that one needs to be rather careful with it - performance
>> gets decreased noticeably when used unneeded.
>> 
>> I don't really know if we want to go this way myself even. Current
>> logic - while not ideal - is rather simple.
>> 
>> Topi Pohjolainen (16):
>>   i965/miptree: Tell if anything got resolved
>>   i965/gen6+: Implement end-of-pipe sync
>>   i965: Hook end-of-pipe-sync after texture resolves
>>   i965: Hook end-of-pipe-sync after image resolves
>>   i965: Hook end-of-pipe-sync after framebuffer resolves
>>   i965: Consider layered rt resolves along with other
>>   i965: Add color resolve end-of-pipe-sync before switch to blit ring
>>   i965/dri2: Add end-of-pipe-sync after color resolves
>>   i965/miptree: Add color resolve end-of-pipe-sync before sharing
>>   i965: Add end-of-pipe sync before non-gpu read of color resolves
>>   i965/blorp: Do more fine grained flushing/syncing
>>   i965/blorp/blit: Refactor hiz/ccs prep for blits
>>   i965/blorp: Use conditional end-of-pipe-sync
>>   i965: Consider surface resolves and sync after blorp ops
>>   i965: Check if fast color clear state transition needs sync
>>   i965/blorp: Drop unnecessary flushes after clear/resolve
>> 
>>  src/mesa/drivers/dri/i965/brw_blorp.c  | 187 ++
>>  src/mesa/drivers/dri/i965/brw_compute.c|   2 +
>>  src/mesa/drivers/dri/i965/brw_context.c| 333 
>> +++--
>>  src/mesa/drivers/dri/i965/brw_context.h|   3 +
>>  src/mesa/drivers/dri/i965/brw_draw.c   |  36 +--
>>  src/mesa/drivers/dri/i965/brw_pipe_control.c   |  91 +++
>>  src/mesa/drivers/dri/i965/genX_blorp_exec.c|  11 -
>>  src/mesa/drivers/dri/i965/intel_blit.c |  16 +-
>>  src/mesa/drivers/dri/i965/intel_copy_image.c   |  10 +-
>>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c  |  25 +-
>>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h  |   2 +-
>>  src/mesa/drivers/dri/i965/intel_pixel.c|   4 +
>>  src/mesa/drivers/dri/i965/intel_pixel_bitmap.c |   5 +-
>>  src/mesa/drivers/dri/i965/intel_pixel_read.c   |   7 +-
>>  src/mesa/drivers/dri/i965/intel_tex_image.c|  10 +-
>>  src/mesa/drivers/dri/i965/intel_tex_subimage.c |  11 +-
>>  16 files changed, 557 insertions(+), 196 deletions(-)
>> 
>> -- 
>> 2.5.5
>> 


signature.asc
Description: PGP signature
___
mesa-dev mailing list

Re: [Mesa-dev] i965: On-demand render target flushing

2017-03-28 Thread Pohjolainen, Topi

Jason, Curro, do you have any opinion if this is worth pursuing?
I need something for blorp blits at least - using blorp for texture
uploads on top of current excessive flushing regresses perf.

When working on gpu hangs on SKL we also identified compute
flushing caches it shouldn't. I think those could be addressed
nicely here as well. I can respin if this is something we'd like
to have.

On Fri, Feb 17, 2017 at 09:32:03PM +0200, Topi Pohjolainen wrote:
> Currently:
> 
> 1) Blorp color clears and resolves emit unconditional render target
>flush + command stream after every clear/resolve (including
>regular non-fast clears).
> 
> 2) Blorp color clears, resolves and blits emit texture and constant
>cache resolves even in case only destination is dirty. This is
>because brw_render_cache_set_check_flush() does both render target
>flush as well as the top-of-pipe read cache flushes.
> 
> 3) Similarly to item 2, 3D and compute paths also flush texture and
>constant caches even if none of the texture surfaces are dirty.
> 
> 4) In case of multiple surfaces needing resolves, all render paths
>(blorp, 3D and compute) emit render target, texture and constant
>cache flushes after each resolve instead of just once after all
>resolves.
> 
> This series addresses all four cases. Good news are that even though
> the current setup isn't optimal, it doesn't actually get any better in
> most cases performance wise. There is modest gain in OglDrvRes which
> does heavy blorp blitting. I'm expecting this series also to make
> blorp tex uploads and blorp mipmap generation more competitive.
> 
> Bad news are in the final patch - it looks that current unconditional
> flushing/stalling has been hiding bugs elsewhere. There are cases
> which rely on the flushes after non-fast clears. Hunting the real
> cause is, however, difficult. I only saw them in CI system within
> full runs and was not able to reproduce them myself.
> 
> As first steps the series introduces end-of-pipe synchronization.
> This is a flush combined with stall and post-sync operation of
> writing a double word (32 bits). Until now this wasn't really
> needed as there was in many cases double flushing which in turn
> looks to take long enough to hide the need for the sync. I also
> noticed that one needs to be rather careful with it - performance
> gets decreased noticeably when used unneeded.
> 
> I don't really know if we want to go this way myself even. Current
> logic - while not ideal - is rather simple.
> 
> Topi Pohjolainen (16):
>   i965/miptree: Tell if anything got resolved
>   i965/gen6+: Implement end-of-pipe sync
>   i965: Hook end-of-pipe-sync after texture resolves
>   i965: Hook end-of-pipe-sync after image resolves
>   i965: Hook end-of-pipe-sync after framebuffer resolves
>   i965: Consider layered rt resolves along with other
>   i965: Add color resolve end-of-pipe-sync before switch to blit ring
>   i965/dri2: Add end-of-pipe-sync after color resolves
>   i965/miptree: Add color resolve end-of-pipe-sync before sharing
>   i965: Add end-of-pipe sync before non-gpu read of color resolves
>   i965/blorp: Do more fine grained flushing/syncing
>   i965/blorp/blit: Refactor hiz/ccs prep for blits
>   i965/blorp: Use conditional end-of-pipe-sync
>   i965: Consider surface resolves and sync after blorp ops
>   i965: Check if fast color clear state transition needs sync
>   i965/blorp: Drop unnecessary flushes after clear/resolve
> 
>  src/mesa/drivers/dri/i965/brw_blorp.c  | 187 ++
>  src/mesa/drivers/dri/i965/brw_compute.c|   2 +
>  src/mesa/drivers/dri/i965/brw_context.c| 333 
> +++--
>  src/mesa/drivers/dri/i965/brw_context.h|   3 +
>  src/mesa/drivers/dri/i965/brw_draw.c   |  36 +--
>  src/mesa/drivers/dri/i965/brw_pipe_control.c   |  91 +++
>  src/mesa/drivers/dri/i965/genX_blorp_exec.c|  11 -
>  src/mesa/drivers/dri/i965/intel_blit.c |  16 +-
>  src/mesa/drivers/dri/i965/intel_copy_image.c   |  10 +-
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c  |  25 +-
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h  |   2 +-
>  src/mesa/drivers/dri/i965/intel_pixel.c|   4 +
>  src/mesa/drivers/dri/i965/intel_pixel_bitmap.c |   5 +-
>  src/mesa/drivers/dri/i965/intel_pixel_read.c   |   7 +-
>  src/mesa/drivers/dri/i965/intel_tex_image.c|  10 +-
>  src/mesa/drivers/dri/i965/intel_tex_subimage.c |  11 +-
>  16 files changed, 557 insertions(+), 196 deletions(-)
> 
> -- 
> 2.5.5
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] i965: On-demand render target flushing

2017-02-27 Thread Pohjolainen, Topi
On Fri, Feb 17, 2017 at 09:32:03PM +0200, Topi Pohjolainen wrote:
> Currently:
> 
> 1) Blorp color clears and resolves emit unconditional render target
>flush + command stream after every clear/resolve (including
>regular non-fast clears).
> 
> 2) Blorp color clears, resolves and blits emit texture and constant
>cache resolves even in case only destination is dirty. This is
>because brw_render_cache_set_check_flush() does both render target
>flush as well as the top-of-pipe read cache flushes.
> 
> 3) Similarly to item 2, 3D and compute paths also flush texture and
>constant caches even if none of the texture surfaces are dirty.
> 
> 4) In case of multiple surfaces needing resolves, all render paths
>(blorp, 3D and compute) emit render target, texture and constant
>cache flushes after each resolve instead of just once after all
>resolves.
> 
> This series addresses all four cases. Good news are that even though
> the current setup isn't optimal, it doesn't actually get any better in
> most cases performance wise. There is modest gain in OglDrvRes which
> does heavy blorp blitting. I'm expecting this series also to make
> blorp tex uploads and blorp mipmap generation more competitive.
> 
> Bad news are in the final patch - it looks that current unconditional
> flushing/stalling has been hiding bugs elsewhere. There are cases
> which rely on the flushes after non-fast clears. Hunting the real
> cause is, however, difficult. I only saw them in CI system within
> full runs and was not able to reproduce them myself.

With the added patch disabling deferred allocation of mcs for CCS_E,
the last patch gives:

  OglDrvRes:   4.30768% +/- 0.183354%
  OglDrvState: 3.16878% +/- 0.769695%

This is mostly due to blorp blits flushing less than before.

This series also improves blorp tex upload. I'll send some updates
to that series.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] i965: On-demand render target flushing

2017-02-17 Thread Topi Pohjolainen
Currently:

1) Blorp color clears and resolves emit unconditional render target
   flush + command stream after every clear/resolve (including
   regular non-fast clears).

2) Blorp color clears, resolves and blits emit texture and constant
   cache resolves even in case only destination is dirty. This is
   because brw_render_cache_set_check_flush() does both render target
   flush as well as the top-of-pipe read cache flushes.

3) Similarly to item 2, 3D and compute paths also flush texture and
   constant caches even if none of the texture surfaces are dirty.

4) In case of multiple surfaces needing resolves, all render paths
   (blorp, 3D and compute) emit render target, texture and constant
   cache flushes after each resolve instead of just once after all
   resolves.

This series addresses all four cases. Good news are that even though
the current setup isn't optimal, it doesn't actually get any better in
most cases performance wise. There is modest gain in OglDrvRes which
does heavy blorp blitting. I'm expecting this series also to make
blorp tex uploads and blorp mipmap generation more competitive.

Bad news are in the final patch - it looks that current unconditional
flushing/stalling has been hiding bugs elsewhere. There are cases
which rely on the flushes after non-fast clears. Hunting the real
cause is, however, difficult. I only saw them in CI system within
full runs and was not able to reproduce them myself.

As first steps the series introduces end-of-pipe synchronization.
This is a flush combined with stall and post-sync operation of
writing a double word (32 bits). Until now this wasn't really
needed as there was in many cases double flushing which in turn
looks to take long enough to hide the need for the sync. I also
noticed that one needs to be rather careful with it - performance
gets decreased noticeably when used unneeded.

I don't really know if we want to go this way myself even. Current
logic - while not ideal - is rather simple.

Topi Pohjolainen (16):
  i965/miptree: Tell if anything got resolved
  i965/gen6+: Implement end-of-pipe sync
  i965: Hook end-of-pipe-sync after texture resolves
  i965: Hook end-of-pipe-sync after image resolves
  i965: Hook end-of-pipe-sync after framebuffer resolves
  i965: Consider layered rt resolves along with other
  i965: Add color resolve end-of-pipe-sync before switch to blit ring
  i965/dri2: Add end-of-pipe-sync after color resolves
  i965/miptree: Add color resolve end-of-pipe-sync before sharing
  i965: Add end-of-pipe sync before non-gpu read of color resolves
  i965/blorp: Do more fine grained flushing/syncing
  i965/blorp/blit: Refactor hiz/ccs prep for blits
  i965/blorp: Use conditional end-of-pipe-sync
  i965: Consider surface resolves and sync after blorp ops
  i965: Check if fast color clear state transition needs sync
  i965/blorp: Drop unnecessary flushes after clear/resolve

 src/mesa/drivers/dri/i965/brw_blorp.c  | 187 ++
 src/mesa/drivers/dri/i965/brw_compute.c|   2 +
 src/mesa/drivers/dri/i965/brw_context.c| 333 +++--
 src/mesa/drivers/dri/i965/brw_context.h|   3 +
 src/mesa/drivers/dri/i965/brw_draw.c   |  36 +--
 src/mesa/drivers/dri/i965/brw_pipe_control.c   |  91 +++
 src/mesa/drivers/dri/i965/genX_blorp_exec.c|  11 -
 src/mesa/drivers/dri/i965/intel_blit.c |  16 +-
 src/mesa/drivers/dri/i965/intel_copy_image.c   |  10 +-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c  |  25 +-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h  |   2 +-
 src/mesa/drivers/dri/i965/intel_pixel.c|   4 +
 src/mesa/drivers/dri/i965/intel_pixel_bitmap.c |   5 +-
 src/mesa/drivers/dri/i965/intel_pixel_read.c   |   7 +-
 src/mesa/drivers/dri/i965/intel_tex_image.c|  10 +-
 src/mesa/drivers/dri/i965/intel_tex_subimage.c |  11 +-
 16 files changed, 557 insertions(+), 196 deletions(-)

-- 
2.5.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev