Re: [Mesa-dev] [PATCH] i965/fs: Don't try 16-wide if 8-wide uses more than half the registers.

2012-08-26 Thread Kenneth Graunke
On 08/21/2012 10:49 PM, Eric Anholt wrote:
 I don't like the idea of losing 16-wide on apps where it might have
 succeeded if we just tried.  Note that a chunk of register space gets
 eaten by things that don't scale with number of pixels, like attribute
 setup, push constants, and the MRF hack.

Nor do I, which is why I was trying to be conservative.  I figured the
attribute setup could be accounted for by a large enough fudge factor,
but I forgot about the MRF hack region.

 I could be convinced otherwise with some shader-db stats, though.

Good call.

total instructions in shared programs: 507477 -> 507477 (0.00%)
instructions in affected programs: 0 -> 0

However, the margin is more slim than I thought: Unigine Sanctuary
starts losing out on 16-wide with the cut-off at 70 (my patch was 75).
Perhaps I should refine the heuristic to account for the non-scaling
sections.
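
One possible refinement along these lines, as a rough sketch rather than what the driver does: treat part of the 8-wide register count as fixed (attribute setup, push constants, the MRF hack region) and double only the rest.  The names kTotalGrfs, estimate_simd16_grfs and worth_trying_simd16 are hypothetical, the 128-register total is an assumption about the hardware, and how the fixed count would be obtained is left open.

```cpp
#include <cassert>

// Assumption: number of GRFs available to a single thread.
constexpr int kTotalGrfs = 128;

// Estimate 16-wide register use from an 8-wide compile.
// fixed_grfs is the part that does not scale with pixel count;
// the remainder is assumed to double exactly (a simplification).
inline int estimate_simd16_grfs(int grf_used_simd8, int fixed_grfs)
{
   assert(fixed_grfs >= 0 && fixed_grfs <= grf_used_simd8);
   return fixed_grfs + 2 * (grf_used_simd8 - fixed_grfs);
}

// Only attempt the 16-wide compile if the estimate fits the register file.
inline bool worth_trying_simd16(int grf_used_simd8, int fixed_grfs)
{
   return estimate_simd16_grfs(grf_used_simd8, fixed_grfs) <= kTotalGrfs;
}
```

Under these assumptions, a shader using 75 GRFs at 8-wide with no fixed registers estimates to 150 and would be skipped, while the same shader with 22 fixed registers estimates to exactly 128 and would still be attempted.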
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Don't try 16-wide if 8-wide uses more than half the registers.

2012-08-26 Thread Eric Anholt
Kenneth Graunke kenn...@whitecape.org writes:

 On 08/21/2012 10:49 PM, Eric Anholt wrote:
 I don't like the idea of losing 16-wide on apps where it might have
 succeeded if we just tried.  Note that a chunk of register space gets
 eaten by things that don't scale with number of pixels, like attribute
 setup, push constants, and the MRF hack.

 Nor do I, which is why I was trying to be conservative.  I figured the
 attribute setup could be accounted for by a large enough fudge factor,
 but I forgot about the MRF hack region.

 I could be convinced otherwise with some shader-db stats, though.

 Good call.

 total instructions in shared programs: 507477 -> 507477 (0.00%)
 instructions in affected programs: 0 -> 0

 However, the margin is more slim than I thought: Unigine Sanctuary
 starts losing out on 16-wide with the cut-off at 70 (my patch was 75).
 Perhaps I should refine the heuristic to account for the non-scaling
 sections.

Sounds good enough for me.




Re: [Mesa-dev] [PATCH] i965/fs: Don't try 16-wide if 8-wide uses more than half the registers.

2012-08-22 Thread Ian Romanick

On 08/21/2012 10:49 PM, Eric Anholt wrote:
 Kenneth Graunke kenn...@whitecape.org writes:

 16-wide programs use roughly twice as many registers as 8-wide, and we
 don't support spilling in 16-wide.  So if an 8-wide program uses more
 than half the available GRFs, the 16-wide one almost certainly will fail
 to compile during register allocation.

 Not only that, but attempting to compile such shaders is expensive:
 programs that use a lot of registers tend to be quite complex, meaning
 that we spend more time than usual generating and optimizing code.  If
 we fail at register allocation, we've failed at the last step, after
 needlessly burning through a lot of CPU time.

 To make things worse, such shader compilation typically happens at the
 first draw call using the shader, so it can cause the GPU to stall.

 With all that in mind, it makes sense to short-circuit the 16-wide
 attempt if the 8-wide program uses too many registers.  I've chosen 75
 to be conservative---if we /can/ compile a SIMD16 program, we want to.

 Reduces the number of GPU stalls due to fragment shader recompiles
 in Left 4 Dead 2 by about 20%, and reduces the duration of many of the
 remaining stalls by about half.

 I don't like the idea of losing 16-wide on apps where it might have
 succeeded if we just tried.  Note that a chunk of register space gets
 eaten by things that don't scale with number of pixels, like attribute
 setup, push constants, and the MRF hack.

I think that's part of the reason he picked 75% usage as the cut-off
instead of 50%.

 I could be convinced otherwise with some shader-db stats, though.

Yeah, we probably should do that, just to be safe.

 However, there's that one shader that spills still, iirc, and we
 definitely shouldn't try for 16-wide on that one!



Re: [Mesa-dev] [PATCH] i965/fs: Don't try 16-wide if 8-wide uses more than half the registers.

2012-08-22 Thread Matt Turner
On Wed, Aug 22, 2012 at 5:32 PM, Ian Romanick i...@freedesktop.org wrote:
 On 08/21/2012 10:49 PM, Eric Anholt wrote:
 I don't like the idea of losing 16-wide on apps where it might have
 succeeded if we just tried.  Note that a chunk of register space gets
 eaten by things that don't scale with number of pixels, like attribute
 setup, push constants, and the MRF hack.


 I think that's part of the reason he picked 75% usage as the cut-off instead
 of 50%.

Actually, I think he picked 75 because it was the smallest shader in
L4D2 that failed to compile for 16-wide.


Re: [Mesa-dev] [PATCH] i965/fs: Don't try 16-wide if 8-wide uses more than half the registers.

2012-08-21 Thread Eric Anholt
Kenneth Graunke kenn...@whitecape.org writes:

 16-wide programs use roughly twice as many registers as 8-wide, and we
 don't support spilling in 16-wide.  So if an 8-wide program uses more
 than half the available GRFs, the 16-wide one almost certainly will fail
 to compile during register allocation.

 Not only that, but attempting to compile such shaders is expensive:
 programs that use a lot of registers tend to be quite complex, meaning
 that we spend more time than usual generating and optimizing code.  If
 we fail at register allocation, we've failed at the last step, after
 needlessly burning through a lot of CPU time.

 To make things worse, such shader compilation typically happens at the
 first draw call using the shader, so it can cause the GPU to stall.

 With all that in mind, it makes sense to short-circuit the 16-wide
 attempt if the 8-wide program uses too many registers.  I've chosen 75
 to be conservative---if we /can/ compile a SIMD16 program, we want to.

 Reduces the number of GPU stalls due to fragment shader recompiles
 in Left 4 Dead 2 by about 20%, and reduces the duration of many of the
 remaining stalls by about half.

I don't like the idea of losing 16-wide on apps where it might have
succeeded if we just tried.  Note that a chunk of register space gets
eaten by things that don't scale with number of pixels, like attribute
setup, push constants, and the MRF hack.

I could be convinced otherwise with some shader-db stats, though.

However, there's that one shader that spills still, iirc, and we
definitely shouldn't try for 16-wide on that one!




Re: [Mesa-dev] [PATCH] i965/fs: Don't try 16-wide if 8-wide uses more than half the registers.

2012-08-16 Thread Paul Berry
On 15 August 2012 16:37, Kenneth Graunke kenn...@whitecape.org wrote:

 16-wide programs use roughly twice as many registers as 8-wide, and we
 don't support spilling in 16-wide.  So if an 8-wide program uses more
 than half the available GRFs, the 16-wide one almost certainly will fail
 to compile during register allocation.

 Not only that, but attempting to compile such shaders is expensive:
 programs that use a lot of registers tend to be quite complex, meaning
 that we spend more time than usual generating and optimizing code.  If
 we fail at register allocation, we've failed at the last step, after
 needlessly burning through a lot of CPU time.

 To make things worse, such shader compilation typically happens at the
 first draw call using the shader, so it can cause the GPU to stall.

 With all that in mind, it makes sense to short-circuit the 16-wide
 attempt if the 8-wide program uses too many registers.  I've chosen 75
 to be conservative---if we /can/ compile a SIMD16 program, we want to.

 Reduces the number of GPU stalls due to fragment shader recompiles
 in Left 4 Dead 2 by about 20%, and reduces the duration of many of the
 remaining stalls by about half.

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/drivers/dri/i965/brw_fs.cpp | 5 ++++-
  1 file changed, 4 insertions(+), 1 deletion(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
 index e2dafdc..a113105 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
 @@ -2100,7 +2100,10 @@ brw_wm_fs_emit(struct brw_context *brw, struct brw_wm_compile *c,
        return false;
     }

 -   if (intel->gen >= 5 && c->prog_data.nr_pull_params == 0) {
 +   if (v.grf_used >= 75) {
 +      perf_debug("Too many registers to attempt compiling a 16-wide shader; "
 +                 "falling back to 8-wide at a 10-20%% performance cost.\n");
 +   } else if (intel->gen >= 5 && c->prog_data.nr_pull_params == 0) {


It looks like this code will give the perf warning even for situations
where we couldn't do 16-wide anyhow (intel->gen == 4 ||
c->prog_data.nr_pull_params != 0).  To avoid confusing people, we should
probably only give the perf warning if register count is the *only* reason
we can't do 16-wide.


        c->dispatch_width = 16;
        fs_visitor v2(c, prog, shader);
        v2.import_uniforms(v);
 --
 1.7.11.4
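
A minimal sketch of the reordering suggested above, not the code that was committed: first rule out the cases where 16-wide is impossible anyway, and only then let the register count trigger the warning.  CompileInputs, Simd16Decision and decide_simd16 are hypothetical stand-ins for the driver state in brw_wm_fs_emit, and fprintf stands in for perf_debug.

```cpp
#include <cstdio>

// Hypothetical stand-in for the values the patch reads.
struct CompileInputs {
   int gen;             // intel->gen in the patch
   int nr_pull_params;  // c->prog_data.nr_pull_params in the patch
   int grf_used;        // v.grf_used from the 8-wide compile
};

enum class Simd16Decision { Unsupported, SkippedTooManyRegs, Attempt };

Simd16Decision decide_simd16(const CompileInputs &in)
{
   // 16-wide is not possible here regardless of register use,
   // so stay silent instead of emitting a misleading warning.
   if (in.gen < 5 || in.nr_pull_params != 0)
      return Simd16Decision::Unsupported;

   // Register count is now the only remaining blocker: warn and skip.
   if (in.grf_used >= 75) {
      std::fprintf(stderr,
                   "Too many registers to attempt compiling a 16-wide "
                   "shader; falling back to 8-wide.\n");
      return Simd16Decision::SkippedTooManyRegs;
   }

   return Simd16Decision::Attempt;
}
```

With this ordering, a gen 4 part or a shader with pull params never sees the warning, whatever its register count.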



[Mesa-dev] [PATCH] i965/fs: Don't try 16-wide if 8-wide uses more than half the registers.

2012-08-15 Thread Kenneth Graunke
16-wide programs use roughly twice as many registers as 8-wide, and we
don't support spilling in 16-wide.  So if an 8-wide program uses more
than half the available GRFs, the 16-wide one almost certainly will fail
to compile during register allocation.

Not only that, but attempting to compile such shaders is expensive:
programs that use a lot of registers tend to be quite complex, meaning
that we spend more time than usual generating and optimizing code.  If
we fail at register allocation, we've failed at the last step, after
needlessly burning through a lot of CPU time.

To make things worse, such shader compilation typically happens at the
first draw call using the shader, so it can cause the GPU to stall.

With all that in mind, it makes sense to short-circuit the 16-wide
attempt if the 8-wide program uses too many registers.  I've chosen 75
to be conservative---if we /can/ compile a SIMD16 program, we want to.

Reduces the number of GPU stalls due to fragment shader recompiles
in Left 4 Dead 2 by about 20%, and reduces the duration of many of the
remaining stalls by about half.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index e2dafdc..a113105 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2100,7 +2100,10 @@ brw_wm_fs_emit(struct brw_context *brw, struct brw_wm_compile *c,
       return false;
    }
 
-   if (intel->gen >= 5 && c->prog_data.nr_pull_params == 0) {
+   if (v.grf_used >= 75) {
+      perf_debug("Too many registers to attempt compiling a 16-wide shader; "
+                 "falling back to 8-wide at a 10-20%% performance cost.\n");
+   } else if (intel->gen >= 5 && c->prog_data.nr_pull_params == 0) {
       c->dispatch_width = 16;
       fs_visitor v2(c, prog, shader);
       v2.import_uniforms(v);
-- 
1.7.11.4
