Re: [Mesa-dev] [PATCH] i965/fs: Don't try 16-wide if 8-wide uses more than half the registers.
On 08/21/2012 10:49 PM, Eric Anholt wrote:
> I don't like the idea of losing 16-wide on apps where it might have succeeded if we just tried. Note that a chunk of register space gets eaten by things that don't scale with number of pixels, like attribute setup, push constants, and the MRF hack.

Nor do I, which is why I was trying to be conservative. I figured the attribute setup could be accounted for by a large enough fudge factor, but I forgot about the MRF hack region.

> I could be convinced otherwise with some shader-db stats, though.

Good call.

total instructions in shared programs: 507477 -> 507477 (0.00%)
instructions in affected programs:     0 -> 0

However, the margin is slimmer than I thought: Unigine Sanctuary starts losing out on 16-wide with the cut-off at 70 (my patch was 75). Perhaps I should refine the heuristic to account for the non-scaling sections.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Don't try 16-wide if 8-wide uses more than half the registers.
Kenneth Graunke <kenn...@whitecape.org> writes:
> On 08/21/2012 10:49 PM, Eric Anholt wrote:
>> I don't like the idea of losing 16-wide on apps where it might have succeeded if we just tried. Note that a chunk of register space gets eaten by things that don't scale with number of pixels, like attribute setup, push constants, and the MRF hack.
>
> Nor do I, which is why I was trying to be conservative. I figured the attribute setup could be accounted for by a large enough fudge factor, but I forgot about the MRF hack region.
>
>> I could be convinced otherwise with some shader-db stats, though.
>
> Good call.
>
> total instructions in shared programs: 507477 -> 507477 (0.00%)
> instructions in affected programs:     0 -> 0
>
> However, the margin is slimmer than I thought: Unigine Sanctuary starts losing out on 16-wide with the cut-off at 70 (my patch was 75). Perhaps I should refine the heuristic to account for the non-scaling sections.

Sounds good enough for me.
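A minimal sketch of the refinement floated above: instead of a flat cut-off, subtract the registers that do not scale with pixel count before doubling. This is not code from the patch or from Mesa. The function names, the `fixed_regs` input, and the 128-GRF total are assumptions for illustration, and treating the non-scaling part as exactly constant is a simplification.

```cpp
// Assumption: 128 general registers (GRFs) are available to a thread.
const int kTotalGrfs = 128;

// Estimate the 16-wide register need from an 8-wide compile.
// grf_used_simd8: registers used by the 8-wide program.
// fixed_regs:     hypothetical count of registers that do not scale with
//                 pixel count (attribute setup, push constants, MRF hack).
int estimate_simd16_grfs(int grf_used_simd8, int fixed_regs)
{
   // Guard against a fixed count larger than the total in use.
   int scaling = grf_used_simd8 > fixed_regs ? grf_used_simd8 - fixed_regs : 0;

   // Only the per-pixel part is assumed to double at 16-wide.
   return fixed_regs + 2 * scaling;
}

// True if the estimate says a 16-wide attempt is worth making.
bool simd16_might_fit(int grf_used_simd8, int fixed_regs)
{
   return estimate_simd16_grfs(grf_used_simd8, fixed_regs) <= kTotalGrfs;
}
```

With made-up inputs, an 8-wide program using 60 registers of which 10 are fixed is estimated at 110 registers for 16-wide, so the attempt would go ahead. The same program at 75 registers is estimated at 140 and would be skipped.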
Re: [Mesa-dev] [PATCH] i965/fs: Don't try 16-wide if 8-wide uses more than half the registers.
On 08/21/2012 10:49 PM, Eric Anholt wrote:
> Kenneth Graunke <kenn...@whitecape.org> writes:
>> 16-wide programs use roughly twice as many registers as 8-wide, and we don't support spilling in 16-wide. So if an 8-wide program uses more than half the available GRFs, the 16-wide one almost certainly will fail to compile during register allocation.
>>
>> Not only that, but attempting to compile such shaders is expensive: programs that use a lot of registers tend to be quite complex, meaning that we spend more time than usual generating and optimizing code. If we fail at register allocation, we've failed at the last step, after needlessly burning through a lot of CPU time.
>>
>> To make things worse, such shader compilation typically happens at the first draw call using the shader, so it can cause the GPU to stall.
>>
>> With all that in mind, it makes sense to short-circuit the 16-wide attempt if the 8-wide program uses too many registers. I've chosen 75 to be conservative---if we /can/ compile a SIMD16 program, we want to.
>>
>> Reduces the number of GPU stalls due to fragment shader recompiles in Left 4 Dead 2 by about 20%, and reduces the duration of many of the remaining stalls by about half.
>
> I don't like the idea of losing 16-wide on apps where it might have succeeded if we just tried. Note that a chunk of register space gets eaten by things that don't scale with number of pixels, like attribute setup, push constants, and the MRF hack.

I think that's part of the reason he picked 75% usage as the cut-off instead of 50%.

> I could be convinced otherwise with some shader-db stats, though.

Yeah, we probably should do that, just to be safe.

> However, there's that one shader that spills still, iirc, and we definitely shouldn't try for 16-wide on that one!
Re: [Mesa-dev] [PATCH] i965/fs: Don't try 16-wide if 8-wide uses more than half the registers.
On Wed, Aug 22, 2012 at 5:32 PM, Ian Romanick <i...@freedesktop.org> wrote:
> On 08/21/2012 10:49 PM, Eric Anholt wrote:
>> I don't like the idea of losing 16-wide on apps where it might have succeeded if we just tried. Note that a chunk of register space gets eaten by things that don't scale with number of pixels, like attribute setup, push constants, and the MRF hack.
>
> I think that's part of the reason he picked 75% usage as the cut-off instead of 50%.

Actually, I think he picked 75 because it was the smallest shader in L4D2 that failed to compile for 16-wide.
Re: [Mesa-dev] [PATCH] i965/fs: Don't try 16-wide if 8-wide uses more than half the registers.
Kenneth Graunke <kenn...@whitecape.org> writes:
> 16-wide programs use roughly twice as many registers as 8-wide, and we don't support spilling in 16-wide. So if an 8-wide program uses more than half the available GRFs, the 16-wide one almost certainly will fail to compile during register allocation.
>
> Not only that, but attempting to compile such shaders is expensive: programs that use a lot of registers tend to be quite complex, meaning that we spend more time than usual generating and optimizing code. If we fail at register allocation, we've failed at the last step, after needlessly burning through a lot of CPU time.
>
> To make things worse, such shader compilation typically happens at the first draw call using the shader, so it can cause the GPU to stall.
>
> With all that in mind, it makes sense to short-circuit the 16-wide attempt if the 8-wide program uses too many registers. I've chosen 75 to be conservative---if we /can/ compile a SIMD16 program, we want to.
>
> Reduces the number of GPU stalls due to fragment shader recompiles in Left 4 Dead 2 by about 20%, and reduces the duration of many of the remaining stalls by about half.

I don't like the idea of losing 16-wide on apps where it might have succeeded if we just tried. Note that a chunk of register space gets eaten by things that don't scale with number of pixels, like attribute setup, push constants, and the MRF hack.

I could be convinced otherwise with some shader-db stats, though. However, there's that one shader that spills still, iirc, and we definitely shouldn't try for 16-wide on that one!
Re: [Mesa-dev] [PATCH] i965/fs: Don't try 16-wide if 8-wide uses more than half the registers.
On 15 August 2012 16:37, Kenneth Graunke <kenn...@whitecape.org> wrote:
> 16-wide programs use roughly twice as many registers as 8-wide, and we don't support spilling in 16-wide. So if an 8-wide program uses more than half the available GRFs, the 16-wide one almost certainly will fail to compile during register allocation.
>
> Not only that, but attempting to compile such shaders is expensive: programs that use a lot of registers tend to be quite complex, meaning that we spend more time than usual generating and optimizing code. If we fail at register allocation, we've failed at the last step, after needlessly burning through a lot of CPU time.
>
> To make things worse, such shader compilation typically happens at the first draw call using the shader, so it can cause the GPU to stall.
>
> With all that in mind, it makes sense to short-circuit the 16-wide attempt if the 8-wide program uses too many registers. I've chosen 75 to be conservative---if we /can/ compile a SIMD16 program, we want to.
>
> Reduces the number of GPU stalls due to fragment shader recompiles in Left 4 Dead 2 by about 20%, and reduces the duration of many of the remaining stalls by about half.
>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index e2dafdc..a113105 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2100,7 +2100,10 @@ brw_wm_fs_emit(struct brw_context *brw, struct brw_wm_compile *c,
>        return false;
>     }
>
> -   if (intel->gen >= 5 && c->prog_data.nr_pull_params == 0) {
> +   if (v.grf_used >= 75) {
> +      perf_debug("Too many registers to attempt compiling a 16-wide shader; "
> +                 "falling back to 8-wide at a 10-20%% performance cost.\n");
> +   } else if (intel->gen >= 5 && c->prog_data.nr_pull_params == 0) {

It looks like this code will give the perf warning even for situations where we couldn't do 16-wide anyhow (intel->gen == 4 || c->prog_data.nr_pull_params != 0). To avoid confusing people, we should probably only give the perf warning if register count is the *only* reason we can't do 16-wide.

>        c->dispatch_width = 16;
>        fs_visitor v2(c, prog, shader);
>        v2.import_uniforms(v);
> --
> 1.7.11.4
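One way to act on the review comment above, sketched outside the driver: check the existing 16-wide eligibility conditions first, so the register-count warning is emitted only when register pressure is the sole blocker. `CompileState`, `Simd16Decision`, `decide_simd16`, and `should_warn` are hypothetical names, not Mesa API. Only the cut-off of 75 and the two eligibility conditions come from the patch.

```cpp
// Hypothetical stand-ins for the driver state the patch reads.
struct CompileState {
   int gen;             // hardware generation (intel->gen in the patch)
   int nr_pull_params;  // c->prog_data.nr_pull_params in the patch
   int grf_used;        // v.grf_used from the 8-wide compile
};

enum class Simd16Decision {
   Attempt,          // go ahead and try the 16-wide compile
   SkipUnsupported,  // 16-wide was never an option; stay silent
   SkipTooManyRegs   // register count is the only blocker; warn
};

const int kGrfCutoff = 75;  // cut-off proposed in the patch

Simd16Decision decide_simd16(const CompileState &s)
{
   // Test the pre-existing conditions first so that the register
   // check is reached only when 16-wide would otherwise be tried.
   if (s.gen < 5 || s.nr_pull_params != 0)
      return Simd16Decision::SkipUnsupported;

   if (s.grf_used >= kGrfCutoff)
      return Simd16Decision::SkipTooManyRegs;

   return Simd16Decision::Attempt;
}

// The perf warning belongs to exactly one outcome.
bool should_warn(Simd16Decision d)
{
   return d == Simd16Decision::SkipTooManyRegs;
}
```

In the driver this would amount to nesting the `v.grf_used` check inside the existing `intel->gen >= 5 && c->prog_data.nr_pull_params == 0` branch rather than placing it in front of that branch.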
[Mesa-dev] [PATCH] i965/fs: Don't try 16-wide if 8-wide uses more than half the registers.
16-wide programs use roughly twice as many registers as 8-wide, and we don't support spilling in 16-wide. So if an 8-wide program uses more than half the available GRFs, the 16-wide one almost certainly will fail to compile during register allocation.

Not only that, but attempting to compile such shaders is expensive: programs that use a lot of registers tend to be quite complex, meaning that we spend more time than usual generating and optimizing code. If we fail at register allocation, we've failed at the last step, after needlessly burning through a lot of CPU time.

To make things worse, such shader compilation typically happens at the first draw call using the shader, so it can cause the GPU to stall.

With all that in mind, it makes sense to short-circuit the 16-wide attempt if the 8-wide program uses too many registers. I've chosen 75 to be conservative---if we /can/ compile a SIMD16 program, we want to.

Reduces the number of GPU stalls due to fragment shader recompiles in Left 4 Dead 2 by about 20%, and reduces the duration of many of the remaining stalls by about half.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index e2dafdc..a113105 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2100,7 +2100,10 @@ brw_wm_fs_emit(struct brw_context *brw, struct brw_wm_compile *c,
       return false;
    }
 
-   if (intel->gen >= 5 && c->prog_data.nr_pull_params == 0) {
+   if (v.grf_used >= 75) {
+      perf_debug("Too many registers to attempt compiling a 16-wide shader; "
+                 "falling back to 8-wide at a 10-20%% performance cost.\n");
+   } else if (intel->gen >= 5 && c->prog_data.nr_pull_params == 0) {
       c->dispatch_width = 16;
       fs_visitor v2(c, prog, shader);
       v2.import_uniforms(v);
-- 
1.7.11.4