Re: [Mesa-dev] [PATCH 3/3] intel/compiler: implement more algebraic optimizations

2019-03-03 Thread Iago Toral
On Thu, 2019-02-28 at 16:20 -0800, Ian Romanick wrote:
> On 2/28/19 4:47 AM, Iago Toral wrote:
> > On Wed, 2019-02-27 at 17:04 -0800, Ian Romanick wrote:
> > > On 2/27/19 4:45 AM, Iago Toral Quiroga wrote:
> > > > Now that we propagate constants to the first source of 2src
> > > > instructions, we see more opportunities for constant folding in the
> > > > backend.
> > > > 
> > > > Shader-db results on KBL:
> > > > 
> > > > total instructions in shared programs: 14965607 -> 14855983 (-0.73%)
> > > > instructions in affected programs: 3988102 -> 3878478 (-2.75%)
> > > > helped: 14292
> > > > HURT: 59
> > > > 
> > > > total cycles in shared programs: 344324295 -> 340656008 (-1.07%)
> > > > cycles in affected programs: 247527740 -> 243859453 (-1.48%)
> > > > helped: 14056
> > > > HURT: 3314
> > > > 
> > > > total loops in shared programs: 4283 -> 4283 (0.00%)
> > > > loops in affected programs: 0 -> 0
> > > > helped: 0
> > > > HURT: 0
> > > > 
> > > > total spills in shared programs: 27812 -> 24350 (-12.45%)
> > > > spills in affected programs: 24921 -> 21459 (-13.89%)
> > > > helped: 345
> > > > HURT: 19
> > > > 
> > > > total fills in shared programs: 24173 -> 22032 (-8.86%)
> > > > fills in affected programs: 21124 -> 18983 (-10.14%)
> > > > helped: 355
> > > > HURT: 25
> > > 
> > > Ignore my previous questions about nir_opt_constant_folding after
> > > nir_opt_algebraic_late.  I had done that because I added a bunch of
> > > things to nir_opt_algebraic_late that created my constant folding
> > > opportunities.
> > > 
> > > These are the combined changes for this patch and the previous
> > > patch.  For this patch alone, I got:
> > > 
> > > total instructions in shared programs: 15306213 -> 15221518 (-0.55%)
> > > instructions in affected programs: 2911451 -> 2826756 (-2.91%)
> > > helped: 13121
> > > HURT: 44
> > > helped stats (abs) min: 1 max: 51 x̄: 6.66 x̃: 6
> > > helped stats (rel) min: <.01% max: 16.67% x̄: 4.27% x̃: 3.30%
> > > HURT stats (abs)   min: 3 max: 453 x̄: 61.16 x̃: 5
> > > HURT stats (rel)   min: 0.20% max: 151.00% x̄: 31.57% x̃: 19.23%
> > > 95% mean confidence interval for instructions value: -6.61 -6.26
> > > 95% mean confidence interval for instructions %-change: -4.23% -4.07%
> > > Instructions are helped.
> > > 
> > > total cycles in shared programs: 375419164 -> 372829148 (-0.69%)
> > > cycles in affected programs: 146769299 -> 144179283 (-1.76%)
> > > helped: 10992
> > > HURT: 1833
> > > helped stats (abs) min: 1 max: 56127 x̄: 250.29 x̃: 18
> > > helped stats (rel) min: <.01% max: 40.52% x̄: 3.11% x̃: 2.58%
> > > HURT stats (abs)   min: 1 max: 1718 x̄: 87.93 x̃: 42
> > > HURT stats (rel)   min: <.01% max: 139.33% x̄: 7.74% x̃: 3.08%
> > > 95% mean confidence interval for cycles value: -248.21 -155.69
> > > 95% mean confidence interval for cycles %-change: -1.67% -1.44%
> > > Cycles are helped.
> > > 
> > > total spills in shared programs: 28828 -> 28888 (0.21%)
> > > spills in affected programs: 2037 -> 2097 (2.95%)
> > > helped: 0
> > > HURT: 24
> > > 
> > > total fills in shared programs: 35542 -> 35639 (0.27%)
> > > fills in affected programs: 3078 -> 3175 (3.15%)
> > > helped: 2
> > > HURT: 26
> > > 
> > > I decided to look at some of the hurt shaders... it looks like some
> > > of the Unigine geometry shaders really took a beating (+150%
> > > instructions).
> > > Note the "max" in the "instructions in affected programs" above.
> > 
> > I am seeing quite different results on my KBL laptop:
> > 
> > total instructions in shared programs: 14945933 -> 14858158 (-0.59%)
> > instructions in affected programs: 2842901 -> 2755126 (-3.09%)
> > helped: 13196
> > HURT: 5
> > 
> > instructions HURT:   shaders/closed/steam/deus-ex-mankind-divided/274.shader_test CS SIMD8: 1535 -> 1538 (0.20%)
> > instructions HURT:   shaders/closed/steam/deus-ex-mankind-divided/184.shader_test CS SIMD8: 1535 -> 1538 (0.20%)
> > instructions HURT:   shaders/dolphin/ubershaders/147.shader_test FS SIMD8: 3481 -> 3491 (0.29%)
> > instructions HURT:   shaders/dolphin/ubershaders/156.shader_test FS SIMD8: 3465 -> 3475 (0.29%)
> > instructions HURT:   shaders/dolphin/ubershaders/138.shader_test FS SIMD8: 3465 -> 3475 (0.29%)
> > 
> > Did you test on a different gen? Can you paste here the paths of some
> > of the GS shaders where you see the big regressions so I can verify I
> > have them in my shader-db?
> > 
> > Also, how did you test this patch exactly? When I was going to capture
> > the reference shader-db results for patch 2 in this series so I could
> > extract the results for patch 3 by comparing against it, I noticed that
> > patch 2 would create constant folding scenarios (for example for ADD
> > and MUL) that, before this patch, would hit an assertion in the driver
> > since the algebraic pass only expects to find these opportunities for F
> > types and will assert on that, so I guess you n

Re: [Mesa-dev] [PATCH 3/3] intel/compiler: implement more algebraic optimizations

2019-02-28 Thread Ian Romanick
On 2/28/19 4:47 AM, Iago Toral wrote:
> On Wed, 2019-02-27 at 17:04 -0800, Ian Romanick wrote:
>> On 2/27/19 4:45 AM, Iago Toral Quiroga wrote:
>>> Now that we propagate constants to the first source of 2src
>>> instructions, we see more opportunities for constant folding in the
>>> backend.
>>>
>>> Shader-db results on KBL:
>>>
>>> total instructions in shared programs: 14965607 -> 14855983 (-0.73%)
>>> instructions in affected programs: 3988102 -> 3878478 (-2.75%)
>>> helped: 14292
>>> HURT: 59
>>>
>>> total cycles in shared programs: 344324295 -> 340656008 (-1.07%)
>>> cycles in affected programs: 247527740 -> 243859453 (-1.48%)
>>> helped: 14056
>>> HURT: 3314
>>>
>>> total loops in shared programs: 4283 -> 4283 (0.00%)
>>> loops in affected programs: 0 -> 0
>>> helped: 0
>>> HURT: 0
>>>
>>> total spills in shared programs: 27812 -> 24350 (-12.45%)
>>> spills in affected programs: 24921 -> 21459 (-13.89%)
>>> helped: 345
>>> HURT: 19
>>>
>>> total fills in shared programs: 24173 -> 22032 (-8.86%)
>>> fills in affected programs: 21124 -> 18983 (-10.14%)
>>> helped: 355
>>> HURT: 25
>>
>> Ignore my previous questions about nir_opt_constant_folding after
>> nir_opt_algebraic_late.  I had done that because I added a bunch of
>> things to nir_opt_algebraic_late that created my constant folding
>> opportunities.
>>
>> These are the combined changes for this patch and the previous
>> patch.  For this patch alone, I got:
>>
>> total instructions in shared programs: 15306213 -> 15221518 (-0.55%)
>> instructions in affected programs: 2911451 -> 2826756 (-2.91%)
>> helped: 13121
>> HURT: 44
>> helped stats (abs) min: 1 max: 51 x̄: 6.66 x̃: 6
>> helped stats (rel) min: <.01% max: 16.67% x̄: 4.27% x̃: 3.30%
>> HURT stats (abs)   min: 3 max: 453 x̄: 61.16 x̃: 5
>> HURT stats (rel)   min: 0.20% max: 151.00% x̄: 31.57% x̃: 19.23%
>> 95% mean confidence interval for instructions value: -6.61 -6.26
>> 95% mean confidence interval for instructions %-change: -4.23% -4.07%
>> Instructions are helped.
>>
>> total cycles in shared programs: 375419164 -> 372829148 (-0.69%)
>> cycles in affected programs: 146769299 -> 144179283 (-1.76%)
>> helped: 10992
>> HURT: 1833
>> helped stats (abs) min: 1 max: 56127 x̄: 250.29 x̃: 18
>> helped stats (rel) min: <.01% max: 40.52% x̄: 3.11% x̃: 2.58%
>> HURT stats (abs)   min: 1 max: 1718 x̄: 87.93 x̃: 42
>> HURT stats (rel)   min: <.01% max: 139.33% x̄: 7.74% x̃: 3.08%
>> 95% mean confidence interval for cycles value: -248.21 -155.69
>> 95% mean confidence interval for cycles %-change: -1.67% -1.44%
>> Cycles are helped.
>>
>> total spills in shared programs: 28828 -> 28888 (0.21%)
>> spills in affected programs: 2037 -> 2097 (2.95%)
>> helped: 0
>> HURT: 24
>>
>> total fills in shared programs: 35542 -> 35639 (0.27%)
>> fills in affected programs: 3078 -> 3175 (3.15%)
>> helped: 2
>> HURT: 26
>>
>> I decided to look at some of the hurt shaders... it looks like some
>> of the Unigine geometry shaders really took a beating (+150%
>> instructions).
>> Note the "max" in the "instructions in affected programs" above.
> 
> I am seeing quite different results on my KBL laptop:
> 
> total instructions in shared programs: 14945933 -> 14858158 (-0.59%)
> instructions in affected programs: 2842901 -> 2755126 (-3.09%)
> helped: 13196
> HURT: 5
> 
> instructions HURT:   shaders/closed/steam/deus-ex-mankind-divided/274.shader_test CS SIMD8: 1535 -> 1538 (0.20%)
> instructions HURT:   shaders/closed/steam/deus-ex-mankind-divided/184.shader_test CS SIMD8: 1535 -> 1538 (0.20%)
> instructions HURT:   shaders/dolphin/ubershaders/147.shader_test FS SIMD8: 3481 -> 3491 (0.29%)
> instructions HURT:   shaders/dolphin/ubershaders/156.shader_test FS SIMD8: 3465 -> 3475 (0.29%)
> instructions HURT:   shaders/dolphin/ubershaders/138.shader_test FS SIMD8: 3465 -> 3475 (0.29%)
> 
> Did you test on a different gen? Can you paste here the paths of some
> of the GS shaders where you see the big regressions so I can verify I
> have them in my shader-db?
> 
> Also, how did you test this patch exactly? When I was going to capture
> the reference shader-db results for patch 2 in this series so I could
> extract the results for patch 3 by comparing against it, I noticed that
> patch 2 would create constant folding scenarios (for example for ADD
> and MUL) that, before this patch, would hit an assertion in the driver
> since the algebraic pass only expects to find these opportunities for F
> types and will assert on that, so I guess you noticed this and fixed it
> before taking your numbers?

I ran it through my usual shader-db gauntlet that runs shader-db at each
commit for SKL, BDW, HSW, IVB, SNB, ILK, and GM45.  *But* since one pass
of that takes a really, really long time, I only run release builds with
-march=native and all the other tricks.  None of the assertions would
exist in that run.
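
Ian's point about release builds comes down to how `assert` works in C and C++: when `NDEBUG` is defined, the macro expands to `((void)0)`, so the condition is neither evaluated nor enforced. The sketch below, assuming a release build is one that defines `NDEBUG`, shows both behaviours in a single translation unit by re-including `<assert.h>`, which the C standard specifies redefines `assert` according to the current state of `NDEBUG` each time it is included. `type_is_float`, `checked_in_debug` and `checked_in_release` are hypothetical names, not Mesa functions.

```cpp
// Counts how often the asserted condition is actually evaluated.
static int evaluations = 0;

// Hypothetical stand-in for "this immediate is a float" (type 0 == F here).
static bool type_is_float(int type) {
   ++evaluations;
   return type == 0;
}

// Debug-style code: NDEBUG is not defined, so assert() evaluates its
// argument and would abort the program if it were false.
#undef NDEBUG
#include <assert.h>
int checked_in_debug(int type) {
   assert(type_is_float(type));
   return type;
}

// Release-style code: with NDEBUG defined, assert() expands to ((void)0),
// so the condition is neither evaluated nor enforced.
#define NDEBUG
#include <assert.h>
int checked_in_release(int type) {
   assert(type_is_float(type));
   return type;
}

// Restore checking asserts for any code that follows.
#undef NDEBUG
#include <assert.h>
```

Calling `checked_in_debug(1)` would abort, while `checked_in_release(1)` returns normally. This is how a type assertion inside `opt_algebraic()` can go unnoticed in a shader-db run that only uses release builds.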

If patch 2 creates possible assertion failures, the two patches should
probably be re-ordered or the previous pa

Re: [Mesa-dev] [PATCH 3/3] intel/compiler: implement more algebraic optimizations

2019-02-28 Thread Iago Toral
On Wed, 2019-02-27 at 17:04 -0800, Ian Romanick wrote:
> On 2/27/19 4:45 AM, Iago Toral Quiroga wrote:
> > Now that we propagate constants to the first source of 2src
> > instructions, we see more opportunities for constant folding in the
> > backend.
> > 
> > Shader-db results on KBL:
> > 
> > total instructions in shared programs: 14965607 -> 14855983 (-0.73%)
> > instructions in affected programs: 3988102 -> 3878478 (-2.75%)
> > helped: 14292
> > HURT: 59
> > 
> > total cycles in shared programs: 344324295 -> 340656008 (-1.07%)
> > cycles in affected programs: 247527740 -> 243859453 (-1.48%)
> > helped: 14056
> > HURT: 3314
> > 
> > total loops in shared programs: 4283 -> 4283 (0.00%)
> > loops in affected programs: 0 -> 0
> > helped: 0
> > HURT: 0
> > 
> > total spills in shared programs: 27812 -> 24350 (-12.45%)
> > spills in affected programs: 24921 -> 21459 (-13.89%)
> > helped: 345
> > HURT: 19
> > 
> > total fills in shared programs: 24173 -> 22032 (-8.86%)
> > fills in affected programs: 21124 -> 18983 (-10.14%)
> > helped: 355
> > HURT: 25
> 
> Ignore my previous questions about nir_opt_constant_folding after
> nir_opt_algebraic_late.  I had done that because I added a bunch of
> things to nir_opt_algebraic_late that created my constant folding
> opportunities.
> 
> These are the combined changes for this patch and the previous
> patch.  For this patch alone, I got:
> 
> total instructions in shared programs: 15306213 -> 15221518 (-0.55%)
> instructions in affected programs: 2911451 -> 2826756 (-2.91%)
> helped: 13121
> HURT: 44
> helped stats (abs) min: 1 max: 51 x̄: 6.66 x̃: 6
> helped stats (rel) min: <.01% max: 16.67% x̄: 4.27% x̃: 3.30%
> HURT stats (abs)   min: 3 max: 453 x̄: 61.16 x̃: 5
> HURT stats (rel)   min: 0.20% max: 151.00% x̄: 31.57% x̃: 19.23%
> 95% mean confidence interval for instructions value: -6.61 -6.26
> 95% mean confidence interval for instructions %-change: -4.23% -4.07%
> Instructions are helped.
> 
> total cycles in shared programs: 375419164 -> 372829148 (-0.69%)
> cycles in affected programs: 146769299 -> 144179283 (-1.76%)
> helped: 10992
> HURT: 1833
> helped stats (abs) min: 1 max: 56127 x̄: 250.29 x̃: 18
> helped stats (rel) min: <.01% max: 40.52% x̄: 3.11% x̃: 2.58%
> HURT stats (abs)   min: 1 max: 1718 x̄: 87.93 x̃: 42
> HURT stats (rel)   min: <.01% max: 139.33% x̄: 7.74% x̃: 3.08%
> 95% mean confidence interval for cycles value: -248.21 -155.69
> 95% mean confidence interval for cycles %-change: -1.67% -1.44%
> Cycles are helped.
> 
> total spills in shared programs: 28828 -> 28888 (0.21%)
> spills in affected programs: 2037 -> 2097 (2.95%)
> helped: 0
> HURT: 24
> 
> total fills in shared programs: 35542 -> 35639 (0.27%)
> fills in affected programs: 3078 -> 3175 (3.15%)
> helped: 2
> HURT: 26
> 
> I decided to look at some of the hurt shaders... it looks like some
> of the Unigine geometry shaders really took a beating (+150%
> instructions).
> Note the "max" in the "instructions in affected programs" above.

I am seeing quite different results on my KBL laptop:

total instructions in shared programs: 14945933 -> 14858158 (-0.59%)
instructions in affected programs: 2842901 -> 2755126 (-3.09%)
helped: 13196
HURT: 5

instructions HURT:   shaders/closed/steam/deus-ex-mankind-divided/274.shader_test CS SIMD8: 1535 -> 1538 (0.20%)
instructions HURT:   shaders/closed/steam/deus-ex-mankind-divided/184.shader_test CS SIMD8: 1535 -> 1538 (0.20%)
instructions HURT:   shaders/dolphin/ubershaders/147.shader_test FS SIMD8: 3481 -> 3491 (0.29%)
instructions HURT:   shaders/dolphin/ubershaders/156.shader_test FS SIMD8: 3465 -> 3475 (0.29%)
instructions HURT:   shaders/dolphin/ubershaders/138.shader_test FS SIMD8: 3465 -> 3475 (0.29%)

Did you test on a different gen? Can you paste here the paths of some
of the GS shaders where you see the big regressions so I can verify I
have them in my shader-db?

Also, how did you test this patch exactly? When I was going to capture
the reference shader-db results for patch 2 in this series so I could
extract the results for patch 3 by comparing against it, I noticed that
patch 2 would create constant folding scenarios (for example for ADD
and MUL) that, before this patch, would hit an assertion in the driver
since the algebraic pass only expects to find these opportunities for F
types and will assert on that, so I guess you noticed this and fixed it
before taking your numbers?
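
Before this patch, the MUL case folded constants with `inst->src[0].f *= inst->src[1].f` behind `assert(inst->src[0].type == BRW_REGISTER_TYPE_F)`. The assertion matters because an immediate holds one raw payload whose meaning depends on its type: multiplying integer payloads as if they were floats produces garbage rather than an approximately right answer. Below is a minimal sketch assuming a simplified immediate made of a type tag and a 32-bit payload; `Imm`, `ImmType`, `fold_mul_float_only` and `fold_mul_typed` are hypothetical and do not reflect the real `brw_reg` layout.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical immediate: a type tag plus one raw 32-bit payload.
enum class ImmType { F, D, UD };

struct Imm {
   ImmType type;
   uint32_t bits;
};

// Reinterpret payload bits as a float and back (no value conversion).
static float as_float(uint32_t bits) {
   float f;
   std::memcpy(&f, &bits, sizeof f);
   return f;
}

static uint32_t from_float(float f) {
   uint32_t bits;
   std::memcpy(&bits, &f, sizeof bits);
   return bits;
}

// Float-only folding, analogous to "src[0].f *= src[1].f": correct only
// when both payloads really hold floats.
Imm fold_mul_float_only(Imm a, Imm b) {
   return {a.type, from_float(as_float(a.bits) * as_float(b.bits))};
}

// Type-aware folding: interpret the payload according to its tag.
Imm fold_mul_typed(Imm a, Imm b) {
   switch (a.type) {
   case ImmType::F:
      return fold_mul_float_only(a, b);
   case ImmType::D: {
      // Multiply in 64 bits and keep the low 32 (two's-complement wrap),
      // which avoids signed-overflow undefined behaviour.
      int64_t p = int64_t(int32_t(a.bits)) * int64_t(int32_t(b.bits));
      return {a.type, uint32_t(p)};
   }
   case ImmType::UD:
      return {a.type, a.bits * b.bits};  // wraps modulo 2^32
   }
   return a;  // unreachable for the tags above
}
```

With D operands 3 and 4, `fold_mul_typed` yields 12, while `fold_mul_float_only` multiplies two tiny denormals whose product underflows to zero.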

> More comments below by SHL...
> 
> > LOST:   0
> > GAINED: 5
> > ---
> >  src/intel/compiler/brw_fs.cpp | 203 --
> >  1 file changed, 195 insertions(+), 8 deletions(-)
> > 
> > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> > index 2358acbeb59..b2b60237c82 100644
> > --- a/src/intel/compiler/brw_fs.cpp
> > +++ b/src/intel/compiler/brw_fs.cpp
> > @@ -2583,9 +2583,55 @@ fs_visitor::opt_algebraic()
> >   break;
> >  
> >case BRW_OPCODE_MUL:
> > - if (inst->src[1]

Re: [Mesa-dev] [PATCH 3/3] intel/compiler: implement more algebraic optimizations

2019-02-27 Thread Iago Toral
On Wed, 2019-02-27 at 17:04 -0800, Ian Romanick wrote:
> On 2/27/19 4:45 AM, Iago Toral Quiroga wrote:
> > Now that we propagate constants to the first source of 2src
> > instructions, we see more opportunities for constant folding in the
> > backend.
> > 
> > Shader-db results on KBL:
> > 
> > total instructions in shared programs: 14965607 -> 14855983 (-0.73%)
> > instructions in affected programs: 3988102 -> 3878478 (-2.75%)
> > helped: 14292
> > HURT: 59
> > 
> > total cycles in shared programs: 344324295 -> 340656008 (-1.07%)
> > cycles in affected programs: 247527740 -> 243859453 (-1.48%)
> > helped: 14056
> > HURT: 3314
> > 
> > total loops in shared programs: 4283 -> 4283 (0.00%)
> > loops in affected programs: 0 -> 0
> > helped: 0
> > HURT: 0
> > 
> > total spills in shared programs: 27812 -> 24350 (-12.45%)
> > spills in affected programs: 24921 -> 21459 (-13.89%)
> > helped: 345
> > HURT: 19
> > 
> > total fills in shared programs: 24173 -> 22032 (-8.86%)
> > fills in affected programs: 21124 -> 18983 (-10.14%)
> > helped: 355
> > HURT: 25
> 
> Ignore my previous questions about nir_opt_constant_folding after
> nir_opt_algebraic_late.  I had done that because I added a bunch of
> things to nir_opt_algebraic_late that created my constant folding
> opportunities.
> 
> These are the combined changes for this patch and the previous
> patch.  For this patch alone, I got:
> 
> total instructions in shared programs: 15306213 -> 15221518 (-0.55%)
> instructions in affected programs: 2911451 -> 2826756 (-2.91%)
> helped: 13121
> HURT: 44
> helped stats (abs) min: 1 max: 51 x̄: 6.66 x̃: 6
> helped stats (rel) min: <.01% max: 16.67% x̄: 4.27% x̃: 3.30%
> HURT stats (abs)   min: 3 max: 453 x̄: 61.16 x̃: 5
> HURT stats (rel)   min: 0.20% max: 151.00% x̄: 31.57% x̃: 19.23%
> 95% mean confidence interval for instructions value: -6.61 -6.26
> 95% mean confidence interval for instructions %-change: -4.23% -4.07%
> Instructions are helped.
> 
> total cycles in shared programs: 375419164 -> 372829148 (-0.69%)
> cycles in affected programs: 146769299 -> 144179283 (-1.76%)
> helped: 10992
> HURT: 1833
> helped stats (abs) min: 1 max: 56127 x̄: 250.29 x̃: 18
> helped stats (rel) min: <.01% max: 40.52% x̄: 3.11% x̃: 2.58%
> HURT stats (abs)   min: 1 max: 1718 x̄: 87.93 x̃: 42
> HURT stats (rel)   min: <.01% max: 139.33% x̄: 7.74% x̃: 3.08%
> 95% mean confidence interval for cycles value: -248.21 -155.69
> 95% mean confidence interval for cycles %-change: -1.67% -1.44%
> Cycles are helped.
> 
> total spills in shared programs: 28828 -> 28888 (0.21%)
> spills in affected programs: 2037 -> 2097 (2.95%)
> helped: 0
> HURT: 24
> 
> total fills in shared programs: 35542 -> 35639 (0.27%)
> fills in affected programs: 3078 -> 3175 (3.15%)
> helped: 2
> HURT: 26
> 
> I decided to look at some of the hurt shaders... it looks like some
> of the Unigine geometry shaders really took a beating (+150%
> instructions).
> Note the "max" in the "instructions in affected programs" above.
> 
> More comments below by SHL...
> 
> > LOST:   0
> > GAINED: 5
> > ---
> >  src/intel/compiler/brw_fs.cpp | 203 --
> >  1 file changed, 195 insertions(+), 8 deletions(-)
> > 
> > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> > index 2358acbeb59..b2b60237c82 100644
> > --- a/src/intel/compiler/brw_fs.cpp
> > +++ b/src/intel/compiler/brw_fs.cpp
> > @@ -2583,9 +2583,55 @@ fs_visitor::opt_algebraic()
> >   break;
> >  
> >case BRW_OPCODE_MUL:
> > - if (inst->src[1].file != IMM)
> > + if (inst->src[0].file != IMM && inst->src[1].file != IMM)
> >  continue;
> >  
> > + /* Constant folding */
> > + if (inst->src[0].file == IMM && inst->src[1].file == IMM) {
> > +assert(inst->src[0].type == inst->src[1].type);
> > +bool local_progress = true;
> > +switch (inst->src[0].type) {
> > +case BRW_REGISTER_TYPE_HF: {
> > +   float v1 = _mesa_half_to_float(inst->src[0].ud & 0xffffu);
> > +   float v2 = _mesa_half_to_float(inst->src[1].ud & 0xffffu);
> > +   inst->src[0] = brw_imm_w(_mesa_float_to_half(v1 * v2));
> > +   break;
> > +}
> > +case BRW_REGISTER_TYPE_W: {
> > +   int16_t v1 = inst->src[0].ud & 0xffffu;
> > +   int16_t v2 = inst->src[1].ud & 0xffffu;
> > +   inst->src[0] = brw_imm_w(v1 * v2);
> > +   break;
> > +}
> > +case BRW_REGISTER_TYPE_UW: {
> > +   uint16_t v1 = inst->src[0].ud & 0xffffu;
> > +   uint16_t v2 = inst->src[1].ud & 0xffffu;
> > +   inst->src[0] = brw_imm_uw(v1 * v2);
> > +   break;
> > +}
> > +case BRW_REGISTER_TYPE_F:
> > +   inst->src[0].f *= inst->src[1].f;
> > +   break;
> > +case BRW_REGIS

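The W and UW cases in the patch read each 16-bit immediate from the low half of the 32-bit payload (`.ud & 0xffffu`) before multiplying. A minimal sketch of that folding with explicit 16-bit wraparound follows; `fold_mul_w` and `fold_mul_uw` are hypothetical helpers that return just the low 16 bits, and they do not attempt to reproduce how `brw_imm_w()` or `brw_imm_uw()` encode the payload.

```cpp
#include <cstdint>

// Signed 16-bit (W) multiply with two's-complement wraparound.
uint32_t fold_mul_w(uint32_t a_payload, uint32_t b_payload) {
   int16_t a = static_cast<int16_t>(a_payload & 0xffffu);
   int16_t b = static_cast<int16_t>(b_payload & 0xffffu);
   // int16_t operands promote to int; |a * b| <= 2^30 always fits.
   int product = a * b;
   // Converting to uint16_t keeps the low 16 bits (modulo 2^16).
   return static_cast<uint16_t>(product);
}

// Unsigned 16-bit (UW) multiply, reduced modulo 2^16.
uint32_t fold_mul_uw(uint32_t a_payload, uint32_t b_payload) {
   // Widen to uint32_t before multiplying so the product wraps with
   // defined behaviour instead of overflowing a promoted int.
   uint32_t a = a_payload & 0xffffu;
   uint32_t b = b_payload & 0xffffu;
   return (a * b) & 0xffffu;
}
```

One design choice worth noting: two `uint16_t` operands are promoted to `int` before multiplication, so `0xffff * 0xffff` (4294836225) would overflow a 32-bit `int`; the sketch widens to `uint32_t` first. Signed 16-bit products always fit in `int`, so the W case needs no widening.
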
Re: [Mesa-dev] [PATCH 3/3] intel/compiler: implement more algebraic optimizations

2019-02-27 Thread Ian Romanick
On 2/27/19 4:45 AM, Iago Toral Quiroga wrote:
> Now that we propagate constants to the first source of 2src instructions,
> we see more opportunities for constant folding in the backend.
> 
> Shader-db results on KBL:
> 
> total instructions in shared programs: 14965607 -> 14855983 (-0.73%)
> instructions in affected programs: 3988102 -> 3878478 (-2.75%)
> helped: 14292
> HURT: 59
> 
> total cycles in shared programs: 344324295 -> 340656008 (-1.07%)
> cycles in affected programs: 247527740 -> 243859453 (-1.48%)
> helped: 14056
> HURT: 3314
> 
> total loops in shared programs: 4283 -> 4283 (0.00%)
> loops in affected programs: 0 -> 0
> helped: 0
> HURT: 0
> 
> total spills in shared programs: 27812 -> 24350 (-12.45%)
> spills in affected programs: 24921 -> 21459 (-13.89%)
> helped: 345
> HURT: 19
> 
> total fills in shared programs: 24173 -> 22032 (-8.86%)
> fills in affected programs: 21124 -> 18983 (-10.14%)
> helped: 355
> HURT: 25

Ignore my previous questions about nir_opt_constant_folding after
nir_opt_algebraic_late.  I had done that because I added a bunch of
things to nir_opt_algebraic_late that created my constant folding
opportunities.

These are the combined changes for this patch and the previous patch.  For
this patch alone, I got:

total instructions in shared programs: 15306213 -> 15221518 (-0.55%)
instructions in affected programs: 2911451 -> 2826756 (-2.91%)
helped: 13121
HURT: 44
helped stats (abs) min: 1 max: 51 x̄: 6.66 x̃: 6
helped stats (rel) min: <.01% max: 16.67% x̄: 4.27% x̃: 3.30%
HURT stats (abs)   min: 3 max: 453 x̄: 61.16 x̃: 5
HURT stats (rel)   min: 0.20% max: 151.00% x̄: 31.57% x̃: 19.23%
95% mean confidence interval for instructions value: -6.61 -6.26
95% mean confidence interval for instructions %-change: -4.23% -4.07%
Instructions are helped.

total cycles in shared programs: 375419164 -> 372829148 (-0.69%)
cycles in affected programs: 146769299 -> 144179283 (-1.76%)
helped: 10992
HURT: 1833
helped stats (abs) min: 1 max: 56127 x̄: 250.29 x̃: 18
helped stats (rel) min: <.01% max: 40.52% x̄: 3.11% x̃: 2.58%
HURT stats (abs)   min: 1 max: 1718 x̄: 87.93 x̃: 42
HURT stats (rel)   min: <.01% max: 139.33% x̄: 7.74% x̃: 3.08%
95% mean confidence interval for cycles value: -248.21 -155.69
95% mean confidence interval for cycles %-change: -1.67% -1.44%
Cycles are helped.

total spills in shared programs: 28828 -> 28888 (0.21%)
spills in affected programs: 2037 -> 2097 (2.95%)
helped: 0
HURT: 24

total fills in shared programs: 35542 -> 35639 (0.27%)
fills in affected programs: 3078 -> 3175 (3.15%)
helped: 2
HURT: 26
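
The shader-db totals can be cross-checked against each other: programs outside the affected set are unchanged, so the shared total moves by exactly as much as the affected total. A short sketch, assuming the summaries compute percentages as (after - before) / before * 100; `percent_change`, `shared_after` and `hundredths` are illustrative helpers, not part of shader-db.

```cpp
#include <cmath>
#include <cstdint>

// Percent change, assuming shader-db computes (after - before) / before * 100.
double percent_change(int64_t before, int64_t after) {
   return static_cast<double>(after - before) / static_cast<double>(before) * 100.0;
}

// Programs outside the affected set are unchanged, so the shared total moves
// by exactly the affected delta; recover the shared "after" value from it.
int64_t shared_after(int64_t shared_before, int64_t affected_before,
                     int64_t affected_after) {
   return shared_before + (affected_after - affected_before);
}

// Round a percentage to two decimals, expressed in hundredths (0.21% -> 21).
long hundredths(double percent) { return std::lround(percent * 100.0); }
```

Applied to the spills lines (2037 -> 2097 in affected programs), this gives a shared total of 28828 -> 28888, which matches the reported +0.21%; the instruction and fill totals check out the same way.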

I decided to look at some of the hurt shaders... it looks like some of
the Unigine geometry shaders really took a beating (+150% instructions).
Note the "max" in the "instructions in affected programs" above.

More comments below by SHL...

> LOST:   0
> GAINED: 5
> ---
>  src/intel/compiler/brw_fs.cpp | 203 --
>  1 file changed, 195 insertions(+), 8 deletions(-)
> 
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 2358acbeb59..b2b60237c82 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -2583,9 +2583,55 @@ fs_visitor::opt_algebraic()
>   break;
>  
>case BRW_OPCODE_MUL:
> - if (inst->src[1].file != IMM)
> + if (inst->src[0].file != IMM && inst->src[1].file != IMM)
>  continue;
>  
> + /* Constant folding */
> + if (inst->src[0].file == IMM && inst->src[1].file == IMM) {
> +assert(inst->src[0].type == inst->src[1].type);
> +bool local_progress = true;
> +switch (inst->src[0].type) {
> +case BRW_REGISTER_TYPE_HF: {
> +   float v1 = _mesa_half_to_float(inst->src[0].ud & 0xffffu);
> +   float v2 = _mesa_half_to_float(inst->src[1].ud & 0xffffu);
> +   inst->src[0] = brw_imm_w(_mesa_float_to_half(v1 * v2));
> +   break;
> +}
> +case BRW_REGISTER_TYPE_W: {
> +   int16_t v1 = inst->src[0].ud & 0xffffu;
> +   int16_t v2 = inst->src[1].ud & 0xffffu;
> +   inst->src[0] = brw_imm_w(v1 * v2);
> +   break;
> +}
> +case BRW_REGISTER_TYPE_UW: {
> +   uint16_t v1 = inst->src[0].ud & 0xffffu;
> +   uint16_t v2 = inst->src[1].ud & 0xffffu;
> +   inst->src[0] = brw_imm_uw(v1 * v2);
> +   break;
> +}
> +case BRW_REGISTER_TYPE_F:
> +   inst->src[0].f *= inst->src[1].f;
> +   break;
> +case BRW_REGISTER_TYPE_D:
> +   inst->src[0].d *= inst->src[1].d;
> +   break;
> +case BRW_REGISTER_TYPE_UD:
> +   inst->src[0].ud *= inst->src[1].ud;
> +   break;
> +default:
> +   local_progress = false;
> +   break;
> +};
> +
> +if (

Re: [Mesa-dev] [PATCH 3/3] intel/compiler: implement more algebraic optimizations

2019-02-27 Thread Ian Romanick
On 2/27/19 4:45 AM, Iago Toral Quiroga wrote:
> Now that we propagate constants to the first source of 2src instructions,
> we see more opportunities for constant folding in the backend.

All the benefit of the series is from more constant folding?  Once upon
a time, I had a patch that added another call to
nir_opt_constant_folding after we call nir_opt_algebraic_late.  My
recollection is that it hurt vec4 shaders, but it helped scalar shaders
quite a bit.  How does doing that affect these results?

Hrm... I can collect that data.
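
The idea behind running nir_opt_constant_folding again after nir_opt_algebraic_late is that a late algebraic pass can rewrite an expression into a shape containing new all-constant operations, which only a constant-folding pass run afterwards can collapse. The toy below illustrates that ordering effect with a made-up IR and a hypothetical rewrite rule (a - b -> a + (-b)); it is not NIR, and the rule is not claimed to be one from nir_opt_algebraic_late.

```cpp
#include <memory>

// Toy expression IR, purely illustrative (not NIR).
enum class Op { Const, Var, Neg, Add, Sub };

struct Expr;
using P = std::shared_ptr<Expr>;

struct Expr {
   Op op;
   double value;  // only meaningful for Op::Const
   P a, b;        // operands, null when unused
};

P make(Op op, P a = nullptr, P b = nullptr, double v = 0.0) {
   return std::make_shared<Expr>(Expr{op, v, a, b});
}
P constant(double v) { return make(Op::Const, nullptr, nullptr, v); }
P variable() { return make(Op::Var); }

// Hypothetical "late algebraic" rule: a - b  ->  a + (-b).
// When b is a constant, this creates a new foldable Neg(Const) node.
P late_algebraic(const P &e) {
   if (!e) return e;
   P a = late_algebraic(e->a);
   P b = late_algebraic(e->b);
   if (e->op == Op::Sub)
      return make(Op::Add, a, make(Op::Neg, b));
   return make(e->op, a, b, e->value);
}

static bool is_const(const P &x) { return x && x->op == Op::Const; }

// Constant folding: evaluate operations whose operands are all constants.
P constant_fold(const P &e) {
   if (!e) return e;
   P a = constant_fold(e->a);
   P b = constant_fold(e->b);
   if (e->op == Op::Neg && is_const(a)) return constant(-a->value);
   if (e->op == Op::Add && is_const(a) && is_const(b))
      return constant(a->value + b->value);
   if (e->op == Op::Sub && is_const(a) && is_const(b))
      return constant(a->value - b->value);
   return make(e->op, a, b, e->value);
}

// Node count, used as a stand-in for instruction count.
int node_count(const P &e) { return e ? 1 + node_count(e->a) + node_count(e->b) : 0; }
```

Folding `x - 2.0` before the late pass changes nothing; after the late pass the tree is `x + Neg(2.0)`, and only a second folding pass turns it into `x + (-2.0)`.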

> Shader-db results on KBL:
> 
> total instructions in shared programs: 14965607 -> 14855983 (-0.73%)
> instructions in affected programs: 3988102 -> 3878478 (-2.75%)
> helped: 14292
> HURT: 59
> 
> total cycles in shared programs: 344324295 -> 340656008 (-1.07%)
> cycles in affected programs: 247527740 -> 243859453 (-1.48%)
> helped: 14056
> HURT: 3314
> 
> total loops in shared programs: 4283 -> 4283 (0.00%)
> loops in affected programs: 0 -> 0
> helped: 0
> HURT: 0
> 
> total spills in shared programs: 27812 -> 24350 (-12.45%)
> spills in affected programs: 24921 -> 21459 (-13.89%)
> helped: 345
> HURT: 19
> 
> total fills in shared programs: 24173 -> 22032 (-8.86%)
> fills in affected programs: 21124 -> 18983 (-10.14%)
> helped: 355
> HURT: 25
> 
> LOST:   0
> GAINED: 5
> ---
>  src/intel/compiler/brw_fs.cpp | 203 --
>  1 file changed, 195 insertions(+), 8 deletions(-)
> 
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 2358acbeb59..b2b60237c82 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -2583,9 +2583,55 @@ fs_visitor::opt_algebraic()
>   break;
>  
>case BRW_OPCODE_MUL:
> - if (inst->src[1].file != IMM)
> + if (inst->src[0].file != IMM && inst->src[1].file != IMM)
>  continue;
>  
> + /* Constant folding */
> + if (inst->src[0].file == IMM && inst->src[1].file == IMM) {
> +assert(inst->src[0].type == inst->src[1].type);
> +bool local_progress = true;
> +switch (inst->src[0].type) {
> +case BRW_REGISTER_TYPE_HF: {
> +   float v1 = _mesa_half_to_float(inst->src[0].ud & 0xffffu);
> +   float v2 = _mesa_half_to_float(inst->src[1].ud & 0xffffu);
> +   inst->src[0] = brw_imm_w(_mesa_float_to_half(v1 * v2));
> +   break;
> +}
> +case BRW_REGISTER_TYPE_W: {
> +   int16_t v1 = inst->src[0].ud & 0xffffu;
> +   int16_t v2 = inst->src[1].ud & 0xffffu;
> +   inst->src[0] = brw_imm_w(v1 * v2);
> +   break;
> +}
> +case BRW_REGISTER_TYPE_UW: {
> +   uint16_t v1 = inst->src[0].ud & 0xffffu;
> +   uint16_t v2 = inst->src[1].ud & 0xffffu;
> +   inst->src[0] = brw_imm_uw(v1 * v2);
> +   break;
> +}
> +case BRW_REGISTER_TYPE_F:
> +   inst->src[0].f *= inst->src[1].f;
> +   break;
> +case BRW_REGISTER_TYPE_D:
> +   inst->src[0].d *= inst->src[1].d;
> +   break;
> +case BRW_REGISTER_TYPE_UD:
> +   inst->src[0].ud *= inst->src[1].ud;
> +   break;
> +default:
> +   local_progress = false;
> +   break;
> +};
> +
> +if (local_progress) {
> +   inst->opcode = BRW_OPCODE_MOV;
> +   inst->src[1] = reg_undef;
> +   progress = true;
> +   break;
> +}
> + }
> +
> +
>   /* a * 1.0 = a */
>   if (inst->src[1].is_one()) {
>  inst->opcode = BRW_OPCODE_MOV;
> @@ -2594,6 +2640,14 @@ fs_visitor::opt_algebraic()
>  break;
>   }
>  
> + if (inst->src[0].is_one()) {
> +inst->opcode = BRW_OPCODE_MOV;
> +inst->src[0] = inst->src[1];
> +inst->src[1] = reg_undef;
> +progress = true;
> +break;
> + }
> +
>   /* a * -1.0 = -a */
>   if (inst->src[1].is_negative_one()) {
>  inst->opcode = BRW_OPCODE_MOV;
> @@ -2603,27 +2657,160 @@ fs_visitor::opt_algebraic()
>  break;
>   }
>  
> - if (inst->src[0].file == IMM) {
> -assert(inst->src[0].type == BRW_REGISTER_TYPE_F);
> + if (inst->src[0].is_negative_one()) {
> +inst->opcode = BRW_OPCODE_MOV;
> +inst->src[0] = inst->src[1];
> +inst->src[0].negate = !inst->src[1].negate;
> +inst->src[1] = reg_undef;
> +progress = true;
> +break;
> + }
> +
> + /* a * 0 = 0 (this is not exact for floating point) */
> + if (inst->src[1].is_zero() &&
> + brw_reg_type_is_integer(inst->src[1].type)) {
> +inst->opcode = BRW_OPCODE_MOV;
> +  

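The comment `a * 0 = 0 (this is not exact for floating point)` explains why the new zero rules are guarded by `brw_reg_type_is_integer()`. For IEEE 754 floats, `a * 0.0` is `-0.0` for negative finite `a` and NaN for infinite or NaN `a`, so rewriting it to a plain zero would change results. A small check, assuming default (non-fast-math) floating-point semantics; `mul_by_zero_is_zero` and `int_mul_by_zero_is_zero` are hypothetical helpers.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// True when a * 0.0f is exactly +0.0f, i.e. when "a * 0 = 0" would be safe.
bool mul_by_zero_is_zero(float a) {
   float r = a * 0.0f;
   // NaN fails the == test; -0.0f passes it but has its sign bit set.
   return r == 0.0f && !std::signbit(r);
}

// For integers the identity holds for every input.
bool int_mul_by_zero_is_zero(int32_t a) { return a * 0 == 0; }
```

Only finite non-negative floats survive the float check, which is why the patch restricts the rewrite to integer types instead of trying to prove the float operand is safe.
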
[Mesa-dev] [PATCH 3/3] intel/compiler: implement more algebraic optimizations

2019-02-27 Thread Iago Toral Quiroga
Now that we propagate constants to the first source of 2src instructions,
we see more opportunities for constant folding in the backend.
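
A MUL can only be folded when both of its sources are immediates, so propagating constants into the first source is what creates `MUL imm, imm` instructions for `opt_algebraic()` to fold. A minimal sketch with a hypothetical, heavily simplified instruction type (`Inst`, `Src`, `propagate` and `fold_mul` are illustrative names, not the `fs_inst` API), following the patch's shape of turning the MUL into a MOV of the product with `src[1]` left undefined:

```cpp
#include <cstdint>

// Hypothetical, heavily simplified backend instruction (not brw's fs_inst).
enum class Opcode { MUL, MOV };
enum class File { VGRF, IMM, BAD };  // BAD stands in for an undefined source

struct Src {
   File file;
   int32_t imm;  // valid when file == IMM
   int vgrf;     // register number when file == VGRF
};

struct Inst {
   Opcode op;
   Src src[2];
};

// Replace every use of register `vgrf` with a known constant.
void propagate(Inst &inst, int vgrf, int32_t value) {
   for (Src &s : inst.src)
      if (s.file == File::VGRF && s.vgrf == vgrf)
         s = Src{File::IMM, value, -1};
}

// Fold MUL imm, imm into MOV imm: src[0] receives the product and src[1]
// becomes undefined. Returns true if the instruction was rewritten.
bool fold_mul(Inst &inst) {
   if (inst.op != Opcode::MUL ||
       inst.src[0].file != File::IMM || inst.src[1].file != File::IMM)
      return false;
   int64_t p = int64_t(inst.src[0].imm) * inst.src[1].imm;
   inst.src[0].imm = int32_t(uint32_t(p));  // keep the low 32 bits
   inst.src[1] = Src{File::BAD, 0, -1};
   inst.op = Opcode::MOV;
   return true;
}
```

With `MUL vgrf7, 5`, folding fails until the constant 6 known for `vgrf7` is propagated into the first source, after which the instruction becomes `MOV 30`.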

Shader-db results on KBL:

total instructions in shared programs: 14965607 -> 14855983 (-0.73%)
instructions in affected programs: 3988102 -> 3878478 (-2.75%)
helped: 14292
HURT: 59

total cycles in shared programs: 344324295 -> 340656008 (-1.07%)
cycles in affected programs: 247527740 -> 243859453 (-1.48%)
helped: 14056
HURT: 3314

total loops in shared programs: 4283 -> 4283 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total spills in shared programs: 27812 -> 24350 (-12.45%)
spills in affected programs: 24921 -> 21459 (-13.89%)
helped: 345
HURT: 19

total fills in shared programs: 24173 -> 22032 (-8.86%)
fills in affected programs: 21124 -> 18983 (-10.14%)
helped: 355
HURT: 25

LOST:   0
GAINED: 5
---
 src/intel/compiler/brw_fs.cpp | 203 --
 1 file changed, 195 insertions(+), 8 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 2358acbeb59..b2b60237c82 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -2583,9 +2583,55 @@ fs_visitor::opt_algebraic()
  break;
 
   case BRW_OPCODE_MUL:
- if (inst->src[1].file != IMM)
+ if (inst->src[0].file != IMM && inst->src[1].file != IMM)
 continue;
 
+ /* Constant folding */
+ if (inst->src[0].file == IMM && inst->src[1].file == IMM) {
+assert(inst->src[0].type == inst->src[1].type);
+bool local_progress = true;
+switch (inst->src[0].type) {
+case BRW_REGISTER_TYPE_HF: {
+   float v1 = _mesa_half_to_float(inst->src[0].ud & 0xffffu);
+   float v2 = _mesa_half_to_float(inst->src[1].ud & 0xffffu);
+   inst->src[0] = brw_imm_w(_mesa_float_to_half(v1 * v2));
+   break;
+}
+case BRW_REGISTER_TYPE_W: {
+   int16_t v1 = inst->src[0].ud & 0xffffu;
+   int16_t v2 = inst->src[1].ud & 0xffffu;
+   inst->src[0] = brw_imm_w(v1 * v2);
+   break;
+}
+case BRW_REGISTER_TYPE_UW: {
+   uint16_t v1 = inst->src[0].ud & 0xffffu;
+   uint16_t v2 = inst->src[1].ud & 0xffffu;
+   inst->src[0] = brw_imm_uw(v1 * v2);
+   break;
+}
+case BRW_REGISTER_TYPE_F:
+   inst->src[0].f *= inst->src[1].f;
+   break;
+case BRW_REGISTER_TYPE_D:
+   inst->src[0].d *= inst->src[1].d;
+   break;
+case BRW_REGISTER_TYPE_UD:
+   inst->src[0].ud *= inst->src[1].ud;
+   break;
+default:
+   local_progress = false;
+   break;
+};
+
+if (local_progress) {
+   inst->opcode = BRW_OPCODE_MOV;
+   inst->src[1] = reg_undef;
+   progress = true;
+   break;
+}
+ }
+
+
  /* a * 1.0 = a */
  if (inst->src[1].is_one()) {
 inst->opcode = BRW_OPCODE_MOV;
@@ -2594,6 +2640,14 @@ fs_visitor::opt_algebraic()
 break;
  }
 
+ if (inst->src[0].is_one()) {
+inst->opcode = BRW_OPCODE_MOV;
+inst->src[0] = inst->src[1];
+inst->src[1] = reg_undef;
+progress = true;
+break;
+ }
+
  /* a * -1.0 = -a */
  if (inst->src[1].is_negative_one()) {
 inst->opcode = BRW_OPCODE_MOV;
@@ -2603,27 +2657,160 @@ fs_visitor::opt_algebraic()
 break;
  }
 
- if (inst->src[0].file == IMM) {
-assert(inst->src[0].type == BRW_REGISTER_TYPE_F);
+ if (inst->src[0].is_negative_one()) {
+inst->opcode = BRW_OPCODE_MOV;
+inst->src[0] = inst->src[1];
+inst->src[0].negate = !inst->src[1].negate;
+inst->src[1] = reg_undef;
+progress = true;
+break;
+ }
+
+ /* a * 0 = 0 (this is not exact for floating point) */
+ if (inst->src[1].is_zero() &&
+ brw_reg_type_is_integer(inst->src[1].type)) {
+inst->opcode = BRW_OPCODE_MOV;
+inst->src[0] = inst->src[1];
+inst->src[1] = reg_undef;
+progress = true;
+break;
+ }
+
+ if (inst->src[0].is_zero() &&
+ brw_reg_type_is_integer(inst->src[0].type)) {
 inst->opcode = BRW_OPCODE_MOV;
-inst->src[0].f *= inst->src[1].f;
 inst->src[1] = reg_undef;
 progress = true;
 break;
  }
  break;
   case BRW_OPCODE_ADD:
- if (inst->src[1].file != IMM)
+ if (inst->src[0].file != IMM && inst->src[1].file != IMM)
 continue;
 
- if (inst->src[0].file == IMM) {
-