Re: [Mesa-dev] Clamp/saturate optimizations v3
On 26.08.2014 15:17, Abdiel Janulgue wrote: > On 23.08.2014 02:57, Ian Romanick wrote: >> Patches 2, 3, 4, 5, 6, 9, 10, 11, 12, 15, and 17 are >> >> Reviewed-by: Ian Romanick >> >> (Additional question below.) >> >> On 08/18/2014 05:17 AM, Abdiel Janulgue wrote: >>> v3 of clamp and saturate optimizations >>> >>> Changes since v1: >>> - Only remove the old try_emit_saturate operations after the new >>> optimizations are >>>in place. (Matt, Ian) >>> - Output [min/max](saturate(x),b) instead of saturate([min/max](x,b)) as >>> suggested >>>by Ilia Mirkin. >>> - The change above required some refactoring in the fs/vec4 backend to >>> allow >>>propagation of certain instructions with saturate flag to SEL. For other >>> instructions, >>>we don't propagate saturate instructions, similar to the previous >>> behaviour. >>> Since v2: >>> - Fix comments to reflect we are doing a commutative operation, add >>> missing conditions >>>when optimizing clamp in opt_algebraic pass. >>> - Refactor try_emit_saturate() in i965/fs instead of completely removing >>> it. This fixed a >>>a regression where the changes emitted an (extra) unnecessary saturated >>> mov when the >>>expression generating src can do saturate directly instead. >>> - Fix regression in the i965/vec4 copy-propagate optimization caused by >>> ignoring >>>channels in the propagated instruction. >>> - Count generated loops from the fs/vec4 generator. >>> >>> Results from our shader-db: >>> >>> total instructions in shared programs: 4538627 -> 4560104 (0.47%) >>> instructions in affected programs: 45144 -> 66621 (47.57%) >>> total loops in shared programs:887 -> 711 (-19.84%) >>> GAINED:0 >>> LOST: 36 >> >> Can we try benchmarking the applications that have shaders that lost >> SIMD16 before pushing these changes? I'd hate to have an "optimization" >> that actually makes performance worse. :( > > There were a couple of games that lost simd16: heroes-of-newerth and > savage2. However, it is interesting that the same games also reported an > even higher amount of shaders that gained improvements (loop reductions > and instruction count reduction) with the saturate optimizations > switched on. > > I did a preliminary comparison on HoN: > > x Heroes_Of_Newearth_master > + Heroes_Of_Newearth_sat_fixes > > N Min MaxMedian AvgStddev > x 11 53.1 55.8 55.1 54.872727 0.83317574 > + 11 54.256 55.1 55.036364 0.59375538 > No difference proven at 95.0% confidence > > I'll post the results of savage2 as soon as I get it running. Savage 2 results (maximum-quality settings, AA turned off): x savage2.master + savage2.sat_fixes N Min MaxMedian Avg Stddev x 11 16.1 35.2 29.7 27.881818 5.7516638 + 11 18.235 30.7 28.718182 5.4258305 No difference proven at 95.0% confidence Both of the apps are tested on IVB. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Clamp/saturate optimizations v3
On 23.08.2014 02:57, Ian Romanick wrote: > Patches 2, 3, 4, 5, 6, 9, 10, 11, 12, 15, and 17 are > > Reviewed-by: Ian Romanick > > (Additional question below.) > > On 08/18/2014 05:17 AM, Abdiel Janulgue wrote: >> v3 of clamp and saturate optimizations >> >> Changes since v1: >> - Only remove the old try_emit_saturate operations after the new >> optimizations are >>in place. (Matt, Ian) >> - Output [min/max](saturate(x),b) instead of saturate([min/max](x,b)) as >> suggested >>by Ilia Mirkin. >> - The change above required some refactoring in the fs/vec4 backend to allow >>propagation of certain instructions with saturate flag to SEL. For other >> instructions, >>we don't propagate saturate instructions, similar to the previous >> behaviour. >> Since v2: >> - Fix comments to reflect we are doing a commutative operation, add missing >> conditions >>when optimizing clamp in opt_algebraic pass. >> - Refactor try_emit_saturate() in i965/fs instead of completely removing >> it. This fixed a >>a regression where the changes emitted an (extra) unnecessary saturated >> mov when the >>expression generating src can do saturate directly instead. >> - Fix regression in the i965/vec4 copy-propagate optimization caused by >> ignoring >>channels in the propagated instruction. >> - Count generated loops from the fs/vec4 generator. >> >> Results from our shader-db: >> >> total instructions in shared programs: 4538627 -> 4560104 (0.47%) >> instructions in affected programs: 45144 -> 66621 (47.57%) >> total loops in shared programs:887 -> 711 (-19.84%) >> GAINED:0 >> LOST: 36 > > Can we try benchmarking the applications that have shaders that lost > SIMD16 before pushing these changes? I'd hate to have an "optimization" > that actually makes performance worse. :( There were a couple of games that lost simd16: heroes-of-newerth and savage2. However, it is interesting that the same games also reported an even higher amount of shaders that gained improvements (loop reductions and instruction count reduction) with the saturate optimizations switched on. I did a preliminary comparison on HoN: x Heroes_Of_Newearth_master + Heroes_Of_Newearth_sat_fixes N Min MaxMedian AvgStddev x 11 53.1 55.8 55.1 54.872727 0.83317574 + 11 54.256 55.1 55.036364 0.59375538 No difference proven at 95.0% confidence I'll post the results of savage2 as soon as I get it running. > >> I modified shader-db a bit to catch loops unrolls. The shaders that show >> increase in >> instruction count are all due to the loop unroll pass triggered by this >> optimization >> on games that contain looped clamp/saturate operation. The unroll pass also >> resulted in a few shaders with looped clamp/sat skipping SIMD16 generation. >> >> ** No piglit regressions observed ** >> >> Abdiel Janulgue (17): >> i965/vec4/fs: Count loops in shader debug >> glsl: Add ir_unop_saturate >> glsl: Add constant evaluation of ir_unop_saturate >> glsl: Add a pass to lower ir_unop_saturate to clamp(x, 0, 1) >> ir_to_mesa, glsl_to_tgsi: lower ir_unop_saturate >> ir_to_mesa, glsl_to_tgsi: Add support for ir_unop_saturate >> i965/fs: Add support for ir_unop_saturate >> i965/vec4: Add support for ir_unop_saturate >> glsl: Implement saturate as ir_unop_saturate >> glsl: Optimize clamp(x, 0, 1) as saturate(x) >> glsl: Optimize clamp(x, 0.0, b), where b < 1.0 as min(saturate(x),b) >> glsl: Optimize clamp(x, b, 1.0), where b > 0.0 as max(saturate(x),b) >> i965/fs: Allow propagation of instructions with saturate flag to sel >> i965/vec4: Allow propagation of instructions with saturate flag to sel >> ir_to_mesa, glsl_to_tgsi: Remove try_emit_saturate >> i965/fs: Refactor try_emit_saturate >> i965/vec4: Remove try_emit_saturate >> >> src/glsl/ir.cpp | 2 + >> src/glsl/ir.h| 1 + >> src/glsl/ir_builder.cpp | 6 +- >> src/glsl/ir_constant_expression.cpp | 6 ++ >> src/glsl/ir_optimization.h | 1 + >> src/glsl/ir_validate.cpp | 1 + >> src/glsl/lower_instructions.cpp | 29 >> src/glsl/opt_algebraic.cpp | 98 >> ++ >> src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp | 1 + >> src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp| 18 - >> src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 6 +- >> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 27 --- >> src/mesa/drivers/dri/i965/brw_vec4.h | 2 +- >> s
Re: [Mesa-dev] Clamp/saturate optimizations v3
Patches 2, 3, 4, 5, 6, 9, 10, 11, 12, 15, and 17 are Reviewed-by: Ian Romanick (Additional question below.) On 08/18/2014 05:17 AM, Abdiel Janulgue wrote: > v3 of clamp and saturate optimizations > > Changes since v1: > - Only remove the old try_emit_saturate operations after the new > optimizations are >in place. (Matt, Ian) > - Output [min/max](saturate(x),b) instead of saturate([min/max](x,b)) as > suggested >by Ilia Mirkin. > - The change above required some refactoring in the fs/vec4 backend to allow >propagation of certain instructions with saturate flag to SEL. For other > instructions, >we don't propagate saturate instructions, similar to the previous > behaviour. > Since v2: > - Fix comments to reflect we are doing a commutative operation, add missing > conditions >when optimizing clamp in opt_algebraic pass. > - Refactor try_emit_saturate() in i965/fs instead of completely removing it. > This fixed a >a regression where the changes emitted an (extra) unnecessary saturated > mov when the >expression generating src can do saturate directly instead. > - Fix regression in the i965/vec4 copy-propagate optimization caused by > ignoring >channels in the propagated instruction. > - Count generated loops from the fs/vec4 generator. > > Results from our shader-db: > > total instructions in shared programs: 4538627 -> 4560104 (0.47%) > instructions in affected programs: 45144 -> 66621 (47.57%) > total loops in shared programs:887 -> 711 (-19.84%) > GAINED:0 > LOST: 36 Can we try benchmarking the applications that have shaders that lost SIMD16 before pushing these changes? I'd hate to have an "optimization" that actually makes performance worse. :( > I modified shader-db a bit to catch loops unrolls. The shaders that show > increase in > instruction count are all due to the loop unroll pass triggered by this > optimization > on games that contain looped clamp/saturate operation. The unroll pass also > resulted in a few shaders with looped clamp/sat skipping SIMD16 generation. > > ** No piglit regressions observed ** > > Abdiel Janulgue (17): > i965/vec4/fs: Count loops in shader debug > glsl: Add ir_unop_saturate > glsl: Add constant evaluation of ir_unop_saturate > glsl: Add a pass to lower ir_unop_saturate to clamp(x, 0, 1) > ir_to_mesa, glsl_to_tgsi: lower ir_unop_saturate > ir_to_mesa, glsl_to_tgsi: Add support for ir_unop_saturate > i965/fs: Add support for ir_unop_saturate > i965/vec4: Add support for ir_unop_saturate > glsl: Implement saturate as ir_unop_saturate > glsl: Optimize clamp(x, 0, 1) as saturate(x) > glsl: Optimize clamp(x, 0.0, b), where b < 1.0 as min(saturate(x),b) > glsl: Optimize clamp(x, b, 1.0), where b > 0.0 as max(saturate(x),b) > i965/fs: Allow propagation of instructions with saturate flag to sel > i965/vec4: Allow propagation of instructions with saturate flag to sel > ir_to_mesa, glsl_to_tgsi: Remove try_emit_saturate > i965/fs: Refactor try_emit_saturate > i965/vec4: Remove try_emit_saturate > > src/glsl/ir.cpp | 2 + > src/glsl/ir.h| 1 + > src/glsl/ir_builder.cpp | 6 +- > src/glsl/ir_constant_expression.cpp | 6 ++ > src/glsl/ir_optimization.h | 1 + > src/glsl/ir_validate.cpp | 1 + > src/glsl/lower_instructions.cpp | 29 > src/glsl/opt_algebraic.cpp | 98 > ++ > src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp | 1 + > src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp| 18 - > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 6 +- > src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 27 --- > src/mesa/drivers/dri/i965/brw_vec4.h | 2 +- > src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 85 > +++--- > src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 6 +- > src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 25 ++- > src/mesa/program/ir_to_mesa.cpp | 59 > +++- > src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 63 > +++-- > 18 files changed, 261 insertions(+), 175 deletions(-) > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Clamp/saturate optimizations v3
On 20.08.2014 05:40, Matt Turner wrote: > > Patches 2-4, (5-9 already reviewed), 10, 13-16, (17 already reviewed) are > > Reviewed-by: Matt Turner > > I've requested a change come before patch 1, and then rebased patch 1 > should be an easy R-b. > > I'll need to take a closer look at 11 and 12. > Thanks for the review! I'll send the updated changes ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Clamp/saturate optimizations v3
On Mon, Aug 18, 2014 at 5:17 AM, Abdiel Janulgue wrote: > v3 of clamp and saturate optimizations Patches 2-4, (5-9 already reviewed), 10, 13-16, (17 already reviewed) are Reviewed-by: Matt Turner I've requested a change come before patch 1, and then rebased patch 1 should be an easy R-b. I'll need to take a closer look at 11 and 12. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] Clamp/saturate optimizations v3
v3 of clamp and saturate optimizations Changes since v1: - Only remove the old try_emit_saturate operations after the new optimizations are in place. (Matt, Ian) - Output [min/max](saturate(x),b) instead of saturate([min/max](x,b)) as suggested by Ilia Mirkin. - The change above required some refactoring in the fs/vec4 backend to allow propagation of certain instructions with saturate flag to SEL. For other instructions, we don't propagate saturate instructions, similar to the previous behaviour. Since v2: - Fix comments to reflect we are doing a commutative operation, add missing conditions when optimizing clamp in opt_algebraic pass. - Refactor try_emit_saturate() in i965/fs instead of completely removing it. This fixed a a regression where the changes emitted an (extra) unnecessary saturated mov when the expression generating src can do saturate directly instead. - Fix regression in the i965/vec4 copy-propagate optimization caused by ignoring channels in the propagated instruction. - Count generated loops from the fs/vec4 generator. Results from our shader-db: total instructions in shared programs: 4538627 -> 4560104 (0.47%) instructions in affected programs: 45144 -> 66621 (47.57%) total loops in shared programs:887 -> 711 (-19.84%) GAINED:0 LOST: 36 I modified shader-db a bit to catch loops unrolls. The shaders that show increase in instruction count are all due to the loop unroll pass triggered by this optimization on games that contain looped clamp/saturate operation. The unroll pass also resulted in a few shaders with looped clamp/sat skipping SIMD16 generation. ** No piglit regressions observed ** Abdiel Janulgue (17): i965/vec4/fs: Count loops in shader debug glsl: Add ir_unop_saturate glsl: Add constant evaluation of ir_unop_saturate glsl: Add a pass to lower ir_unop_saturate to clamp(x, 0, 1) ir_to_mesa, glsl_to_tgsi: lower ir_unop_saturate ir_to_mesa, glsl_to_tgsi: Add support for ir_unop_saturate i965/fs: Add support for ir_unop_saturate i965/vec4: Add support for ir_unop_saturate glsl: Implement saturate as ir_unop_saturate glsl: Optimize clamp(x, 0, 1) as saturate(x) glsl: Optimize clamp(x, 0.0, b), where b < 1.0 as min(saturate(x),b) glsl: Optimize clamp(x, b, 1.0), where b > 0.0 as max(saturate(x),b) i965/fs: Allow propagation of instructions with saturate flag to sel i965/vec4: Allow propagation of instructions with saturate flag to sel ir_to_mesa, glsl_to_tgsi: Remove try_emit_saturate i965/fs: Refactor try_emit_saturate i965/vec4: Remove try_emit_saturate src/glsl/ir.cpp | 2 + src/glsl/ir.h| 1 + src/glsl/ir_builder.cpp | 6 +- src/glsl/ir_constant_expression.cpp | 6 ++ src/glsl/ir_optimization.h | 1 + src/glsl/ir_validate.cpp | 1 + src/glsl/lower_instructions.cpp | 29 src/glsl/opt_algebraic.cpp | 98 ++ src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp | 1 + src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp| 18 - src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 6 +- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 27 --- src/mesa/drivers/dri/i965/brw_vec4.h | 2 +- src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 85 +++--- src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 6 +- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 25 ++- src/mesa/program/ir_to_mesa.cpp | 59 +++- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 63 +++-- 18 files changed, 261 insertions(+), 175 deletions(-) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev