Re: [Mesa-dev] Clamp/saturate optimizations v3

2014-08-28 Thread Abdiel Janulgue


On 26.08.2014 15:17, Abdiel Janulgue wrote:
> On 23.08.2014 02:57, Ian Romanick wrote:
>> Patches 2, 3, 4, 5, 6, 9, 10, 11, 12, 15, and 17 are
>>
>> Reviewed-by: Ian Romanick 
>>
>> (Additional question below.)
>>
>> On 08/18/2014 05:17 AM, Abdiel Janulgue wrote:
>>> v3 of clamp and saturate optimizations
>>>
>>> Changes since v1: 
>>>  - Only remove the old try_emit_saturate operations after the new 
>>> optimizations are
>>>in place. (Matt, Ian)
>>>  - Output [min/max](saturate(x),b) instead of saturate([min/max](x,b)) as 
>>> suggested
>>>by Ilia Mirkin.
>>>  - The change above required some refactoring in the fs/vec4 backend to 
>>> allow
>>>propagation of certain instructions with saturate flag to SEL. For other 
>>> instructions,
>>>we don't propagate saturate instructions, similar to the previous 
>>> behaviour.
>>> Since v2:
>>>  - Fix comments to reflect we are doing a commutative operation, add 
>>> missing conditions
>>>when optimizing clamp in opt_algebraic pass.
>>>  - Refactor try_emit_saturate() in i965/fs instead of completely removing 
>>> it. This fixed a
>>>a regression where the changes emitted an (extra) unnecessary saturated 
>>> mov when the 
>>>expression generating src can do saturate directly instead.
>>>  - Fix regression in the i965/vec4 copy-propagate optimization caused by 
>>> ignoring 
>>>channels in the propagated instruction.
>>>  - Count generated loops from the fs/vec4 generator.
>>>
>>> Results from our shader-db:
>>>
>>> total instructions in shared programs: 4538627 -> 4560104 (0.47%)
>>> instructions in affected programs: 45144 -> 66621 (47.57%)
>>> total loops in shared programs:887 -> 711 (-19.84%)
>>> GAINED:0
>>> LOST:  36
>>
>> Can we try benchmarking the applications that have shaders that lost
>> SIMD16 before pushing these changes?  I'd hate to have an "optimization"
>> that actually makes performance worse. :(
> 
> There were a couple of games that lost simd16: heroes-of-newerth and
> savage2. However, it is interesting that the same games also reported an
> even higher amount of shaders that gained improvements (loop reductions
> and instruction count reduction) with the saturate optimizations
> switched on.
> 
> I did a preliminary comparison on HoN:
> 
> x Heroes_Of_Newearth_master
> + Heroes_Of_Newearth_sat_fixes
> 
> N   Min   MaxMedian   AvgStddev
> x  11  53.1  55.8  55.1 54.872727 0.83317574
> +  11  54.256  55.1 55.036364 0.59375538
> No difference proven at 95.0% confidence
> 
> I'll post the results of savage2 as soon as I get it running.

Savage 2 results (maximum-quality settings, AA turned off):

x savage2.master
+ savage2.sat_fixes

N   Min   MaxMedian   Avg Stddev
x  11  16.1  35.2  29.7 27.881818  5.7516638
+  11  18.235  30.7 28.718182  5.4258305
No difference proven at 95.0% confidence

Both of the apps are tested on IVB.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Clamp/saturate optimizations v3

2014-08-26 Thread Abdiel Janulgue
On 23.08.2014 02:57, Ian Romanick wrote:
> Patches 2, 3, 4, 5, 6, 9, 10, 11, 12, 15, and 17 are
> 
> Reviewed-by: Ian Romanick 
> 
> (Additional question below.)
> 
> On 08/18/2014 05:17 AM, Abdiel Janulgue wrote:
>> v3 of clamp and saturate optimizations
>>
>> Changes since v1: 
>>  - Only remove the old try_emit_saturate operations after the new 
>> optimizations are
>>in place. (Matt, Ian)
>>  - Output [min/max](saturate(x),b) instead of saturate([min/max](x,b)) as 
>> suggested
>>by Ilia Mirkin.
>>  - The change above required some refactoring in the fs/vec4 backend to allow
>>propagation of certain instructions with saturate flag to SEL. For other 
>> instructions,
>>we don't propagate saturate instructions, similar to the previous 
>> behaviour.
>> Since v2:
>>  - Fix comments to reflect we are doing a commutative operation, add missing 
>> conditions
>>when optimizing clamp in opt_algebraic pass.
>>  - Refactor try_emit_saturate() in i965/fs instead of completely removing 
>> it. This fixed a
>>a regression where the changes emitted an (extra) unnecessary saturated 
>> mov when the 
>>expression generating src can do saturate directly instead.
>>  - Fix regression in the i965/vec4 copy-propagate optimization caused by 
>> ignoring 
>>channels in the propagated instruction.
>>  - Count generated loops from the fs/vec4 generator.
>>
>> Results from our shader-db:
>>
>> total instructions in shared programs: 4538627 -> 4560104 (0.47%)
>> instructions in affected programs: 45144 -> 66621 (47.57%)
>> total loops in shared programs:887 -> 711 (-19.84%)
>> GAINED:0
>> LOST:  36
> 
> Can we try benchmarking the applications that have shaders that lost
> SIMD16 before pushing these changes?  I'd hate to have an "optimization"
> that actually makes performance worse. :(

There were a couple of games that lost simd16: heroes-of-newerth and
savage2. However, it is interesting that the same games also reported an
even higher amount of shaders that gained improvements (loop reductions
and instruction count reduction) with the saturate optimizations
switched on.

I did a preliminary comparison on HoN:

x Heroes_Of_Newearth_master
+ Heroes_Of_Newearth_sat_fixes

N   Min   MaxMedian   AvgStddev
x  11  53.1  55.8  55.1 54.872727 0.83317574
+  11  54.256  55.1 55.036364 0.59375538
No difference proven at 95.0% confidence

I'll post the results of savage2 as soon as I get it running.

> 
>> I modified shader-db a bit to catch loops unrolls. The shaders that show 
>> increase in
>> instruction count are all due to the loop unroll pass triggered by this 
>> optimization
>> on games that contain looped clamp/saturate operation. The unroll pass also
>> resulted in a few shaders with looped clamp/sat skipping SIMD16 generation.
>>
>> ** No piglit regressions observed **
>>
>> Abdiel Janulgue (17):
>>   i965/vec4/fs: Count loops in shader debug
>>   glsl: Add ir_unop_saturate
>>   glsl: Add constant evaluation of ir_unop_saturate
>>   glsl: Add a pass to lower ir_unop_saturate to clamp(x, 0, 1)
>>   ir_to_mesa, glsl_to_tgsi: lower ir_unop_saturate
>>   ir_to_mesa, glsl_to_tgsi: Add support for ir_unop_saturate
>>   i965/fs: Add support for ir_unop_saturate
>>   i965/vec4: Add support for ir_unop_saturate
>>   glsl: Implement saturate as ir_unop_saturate
>>   glsl: Optimize clamp(x, 0, 1) as saturate(x)
>>   glsl: Optimize clamp(x, 0.0, b), where b < 1.0 as min(saturate(x),b)
>>   glsl: Optimize clamp(x, b, 1.0), where b > 0.0 as max(saturate(x),b)
>>   i965/fs: Allow propagation of instructions with saturate flag to sel
>>   i965/vec4: Allow propagation of instructions with saturate flag to sel
>>   ir_to_mesa, glsl_to_tgsi: Remove try_emit_saturate
>>   i965/fs: Refactor try_emit_saturate
>>   i965/vec4: Remove try_emit_saturate
>>
>>  src/glsl/ir.cpp  |  2 +
>>  src/glsl/ir.h|  1 +
>>  src/glsl/ir_builder.cpp  |  6 +-
>>  src/glsl/ir_constant_expression.cpp  |  6 ++
>>  src/glsl/ir_optimization.h   |  1 +
>>  src/glsl/ir_validate.cpp |  1 +
>>  src/glsl/lower_instructions.cpp  | 29 
>>  src/glsl/opt_algebraic.cpp   | 98 
>> ++
>>  src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp |  1 +
>>  src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp| 18 -
>>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  6 +-
>>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 27 ---
>>  src/mesa/drivers/dri/i965/brw_vec4.h |  2 +-
>>  s

Re: [Mesa-dev] Clamp/saturate optimizations v3

2014-08-22 Thread Ian Romanick
Patches 2, 3, 4, 5, 6, 9, 10, 11, 12, 15, and 17 are

Reviewed-by: Ian Romanick 

(Additional question below.)

On 08/18/2014 05:17 AM, Abdiel Janulgue wrote:
> v3 of clamp and saturate optimizations
> 
> Changes since v1: 
>  - Only remove the old try_emit_saturate operations after the new 
> optimizations are
>in place. (Matt, Ian)
>  - Output [min/max](saturate(x),b) instead of saturate([min/max](x,b)) as 
> suggested
>by Ilia Mirkin.
>  - The change above required some refactoring in the fs/vec4 backend to allow
>propagation of certain instructions with saturate flag to SEL. For other 
> instructions,
>we don't propagate saturate instructions, similar to the previous 
> behaviour.
> Since v2:
>  - Fix comments to reflect we are doing a commutative operation, add missing 
> conditions
>when optimizing clamp in opt_algebraic pass.
>  - Refactor try_emit_saturate() in i965/fs instead of completely removing it. 
> This fixed a
>a regression where the changes emitted an (extra) unnecessary saturated 
> mov when the 
>expression generating src can do saturate directly instead.
>  - Fix regression in the i965/vec4 copy-propagate optimization caused by 
> ignoring 
>channels in the propagated instruction.
>  - Count generated loops from the fs/vec4 generator.
> 
> Results from our shader-db:
> 
> total instructions in shared programs: 4538627 -> 4560104 (0.47%)
> instructions in affected programs: 45144 -> 66621 (47.57%)
> total loops in shared programs:887 -> 711 (-19.84%)
> GAINED:0
> LOST:  36

Can we try benchmarking the applications that have shaders that lost
SIMD16 before pushing these changes?  I'd hate to have an "optimization"
that actually makes performance worse. :(

> I modified shader-db a bit to catch loops unrolls. The shaders that show 
> increase in
> instruction count are all due to the loop unroll pass triggered by this 
> optimization
> on games that contain looped clamp/saturate operation. The unroll pass also
> resulted in a few shaders with looped clamp/sat skipping SIMD16 generation.
> 
> ** No piglit regressions observed **
> 
> Abdiel Janulgue (17):
>   i965/vec4/fs: Count loops in shader debug
>   glsl: Add ir_unop_saturate
>   glsl: Add constant evaluation of ir_unop_saturate
>   glsl: Add a pass to lower ir_unop_saturate to clamp(x, 0, 1)
>   ir_to_mesa, glsl_to_tgsi: lower ir_unop_saturate
>   ir_to_mesa, glsl_to_tgsi: Add support for ir_unop_saturate
>   i965/fs: Add support for ir_unop_saturate
>   i965/vec4: Add support for ir_unop_saturate
>   glsl: Implement saturate as ir_unop_saturate
>   glsl: Optimize clamp(x, 0, 1) as saturate(x)
>   glsl: Optimize clamp(x, 0.0, b), where b < 1.0 as min(saturate(x),b)
>   glsl: Optimize clamp(x, b, 1.0), where b > 0.0 as max(saturate(x),b)
>   i965/fs: Allow propagation of instructions with saturate flag to sel
>   i965/vec4: Allow propagation of instructions with saturate flag to sel
>   ir_to_mesa, glsl_to_tgsi: Remove try_emit_saturate
>   i965/fs: Refactor try_emit_saturate
>   i965/vec4: Remove try_emit_saturate
> 
>  src/glsl/ir.cpp  |  2 +
>  src/glsl/ir.h|  1 +
>  src/glsl/ir_builder.cpp  |  6 +-
>  src/glsl/ir_constant_expression.cpp  |  6 ++
>  src/glsl/ir_optimization.h   |  1 +
>  src/glsl/ir_validate.cpp |  1 +
>  src/glsl/lower_instructions.cpp  | 29 
>  src/glsl/opt_algebraic.cpp   | 98 
> ++
>  src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp |  1 +
>  src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp| 18 -
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  6 +-
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 27 ---
>  src/mesa/drivers/dri/i965/brw_vec4.h |  2 +-
>  src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp  | 85 
> +++---
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  6 +-
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp   | 25 ++-
>  src/mesa/program/ir_to_mesa.cpp  | 59 
> +++-
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp   | 63 
> +++--
>  18 files changed, 261 insertions(+), 175 deletions(-)
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Clamp/saturate optimizations v3

2014-08-20 Thread Abdiel Janulgue


On 20.08.2014 05:40, Matt Turner wrote:
> 
> Patches 2-4, (5-9 already reviewed), 10, 13-16, (17 already reviewed) are
> 
> Reviewed-by: Matt Turner 
> 
> I've requested a change come before patch 1, and then rebased patch 1
> should be an easy R-b.
> 
> I'll need to take a closer look at 11 and 12.
> 

Thanks for the review! I'll send the updated changes
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Clamp/saturate optimizations v3

2014-08-19 Thread Matt Turner
On Mon, Aug 18, 2014 at 5:17 AM, Abdiel Janulgue
 wrote:
> v3 of clamp and saturate optimizations

Patches 2-4, (5-9 already reviewed), 10, 13-16, (17 already reviewed) are

Reviewed-by: Matt Turner 

I've requested a change come before patch 1, and then rebased patch 1
should be an easy R-b.

I'll need to take a closer look at 11 and 12.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Clamp/saturate optimizations v3

2014-08-18 Thread Abdiel Janulgue
v3 of clamp and saturate optimizations

Changes since v1: 
 - Only remove the old try_emit_saturate operations after the new optimizations 
are
   in place. (Matt, Ian)
 - Output [min/max](saturate(x),b) instead of saturate([min/max](x,b)) as 
suggested
   by Ilia Mirkin.
 - The change above required some refactoring in the fs/vec4 backend to allow
   propagation of certain instructions with saturate flag to SEL. For other 
instructions,
   we don't propagate saturate instructions, similar to the previous behaviour.
Since v2:
 - Fix comments to reflect we are doing a commutative operation, add missing 
conditions
   when optimizing clamp in opt_algebraic pass.
 - Refactor try_emit_saturate() in i965/fs instead of completely removing it. 
This fixed a
   a regression where the changes emitted an (extra) unnecessary saturated mov 
when the 
   expression generating src can do saturate directly instead.
 - Fix regression in the i965/vec4 copy-propagate optimization caused by 
ignoring 
   channels in the propagated instruction.
 - Count generated loops from the fs/vec4 generator.

Results from our shader-db:

total instructions in shared programs: 4538627 -> 4560104 (0.47%)
instructions in affected programs: 45144 -> 66621 (47.57%)
total loops in shared programs:887 -> 711 (-19.84%)
GAINED:0
LOST:  36

I modified shader-db a bit to catch loops unrolls. The shaders that show 
increase in
instruction count are all due to the loop unroll pass triggered by this 
optimization
on games that contain looped clamp/saturate operation. The unroll pass also
resulted in a few shaders with looped clamp/sat skipping SIMD16 generation.

** No piglit regressions observed **

Abdiel Janulgue (17):
  i965/vec4/fs: Count loops in shader debug
  glsl: Add ir_unop_saturate
  glsl: Add constant evaluation of ir_unop_saturate
  glsl: Add a pass to lower ir_unop_saturate to clamp(x, 0, 1)
  ir_to_mesa, glsl_to_tgsi: lower ir_unop_saturate
  ir_to_mesa, glsl_to_tgsi: Add support for ir_unop_saturate
  i965/fs: Add support for ir_unop_saturate
  i965/vec4: Add support for ir_unop_saturate
  glsl: Implement saturate as ir_unop_saturate
  glsl: Optimize clamp(x, 0, 1) as saturate(x)
  glsl: Optimize clamp(x, 0.0, b), where b < 1.0 as min(saturate(x),b)
  glsl: Optimize clamp(x, b, 1.0), where b > 0.0 as max(saturate(x),b)
  i965/fs: Allow propagation of instructions with saturate flag to sel
  i965/vec4: Allow propagation of instructions with saturate flag to sel
  ir_to_mesa, glsl_to_tgsi: Remove try_emit_saturate
  i965/fs: Refactor try_emit_saturate
  i965/vec4: Remove try_emit_saturate

 src/glsl/ir.cpp  |  2 +
 src/glsl/ir.h|  1 +
 src/glsl/ir_builder.cpp  |  6 +-
 src/glsl/ir_constant_expression.cpp  |  6 ++
 src/glsl/ir_optimization.h   |  1 +
 src/glsl/ir_validate.cpp |  1 +
 src/glsl/lower_instructions.cpp  | 29 
 src/glsl/opt_algebraic.cpp   | 98 
++
 src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp |  1 +
 src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp| 18 -
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  6 +-
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 27 ---
 src/mesa/drivers/dri/i965/brw_vec4.h |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp  | 85 
+++---
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  6 +-
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp   | 25 ++-
 src/mesa/program/ir_to_mesa.cpp  | 59 +++-
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp   | 63 +++--
 18 files changed, 261 insertions(+), 175 deletions(-)

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev