Re: [Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD

2016-10-02 Thread Ilia Mirkin
On Sun, Oct 2, 2016 at 2:43 PM, Tobias Klausmann
 wrote:
>
>
> On 02.10.2016 20:26, Ilia Mirkin wrote:
>>
>> That's very odd. LoadPropagation should have picked that up even in
>> its current form. Should try to figure out why it didn't and that is
>> likely to "fix" a *lot* more situations.
>
>
> Actually i was coming from an, given really constrained, addition to the
> LoadPropagation pass, where i was told to fix it within OP_MAD :/

Why doesn't LoadPropagation handle it as-is?
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD

2016-10-02 Thread Tobias Klausmann



On 02.10.2016 20:26, Ilia Mirkin wrote:

That's very odd. LoadPropagation should have picked that up even in
its current form. Should try to figure out why it didn't and that is
likely to "fix" a *lot* more situations.


Actually i was coming from an, given really constrained, addition to the 
LoadPropagation pass, where i was told to fix it within OP_MAD :/



On Sun, Oct 2, 2016 at 2:24 PM, Tobias Klausmann
 wrote:


On 02.10.2016 20:03, Ilia Mirkin wrote:

On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann
 wrote:

Previously we'd end up with an unnecessary mov for the thirs immediate
value.

total instructions in shared programs : 851881 -> 851864 (-0.00%)
total gprs used in shared programs: 110295 -> 110295 (0.00%)
total local used in shared programs   : 1020 -> 1020 (0.00%)

  localgpr   inst  bytes
  helped   0   0  17  17
hurt   0   0   0   0

Suggested-by: Karol Herbst 
Signed-off-by: Tobias Klausmann 
---
   src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15
---
   1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 9875738..8bb5cf9 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i,
ImmediateValue &imm0, int s)
 break;
  case OP_MAD:
 if (imm0.isInteger(0)) {
+ ImmediateValue imm1;
i->setSrc(0, i->getSrc(2));
i->src(0).mod = i->src(2).mod;
i->setSrc(1, NULL);
i->setSrc(2, NULL);
- i->op = i->src(0).mod.getOp();
- if (i->op != OP_CVT)
-i->src(0).mod = 0;
+ if (i->src(0).getImmediate(imm1)) {
+bld.setPosition(i, false);
+newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64),
+ i->dType);
+delete_Instruction(prog, i);

What's an example of a situation where this helps? It shouldn't
matter, the mov's should get cleaned up. [Clearly 17 shaders
disagree...] Is this just a side-effect of the fact that we don't run
the opts to a fixed point?


It is a second mov that causes a problem for later folding in the imm, here
output of a testshader[1]:

   0: nop u32 %r56 (0)
   1: ld  u32 %r31 c0[0x0] (0)
   2: ld  u32 %r37 c0[0x140] (0)
   3: mov u32 %r38 0x (0)
   4: mov u32 %r39 0x3f80 (0)
   5: mad f32 %r40 %r37 %r38 %r39 (0)
   6: mad f32 %r44 %r37 %r38 %r38 (0)
   7: add f32 %r53 %r31 %r40 (0)
   8: add f32 %r54 %r31 %r44 (0)
   9: add f32 %r57 %r56 %r44 (0)

Constantfolding...

MAIN:-1 ()
BB:0 (14 instructions) - df = { }
  -> BB:1 (tree)
   0: nop u32 %r56 (0)
   1: ld  u32 %r31 c0[0x0] (0)
   2: ld  u32 %r37 c0[0x140] (0)
   3: mov u32 %r38 0x (0)
   4: mov u32 %r39 0x3f80 (0)
   5: mov f32 %r40 %r39 (0)
   6: mov f32 %r44 %r38 (0)
   7: add f32 %r53 %r31 %r40 (0)
   8: mov f32 %r54 %r31 (0)
   9: mov f32 %r57 %r56 (0)


The outcome:
   0: ld  u32 $r2 c0[0x0] (8)
   1: mov u32 $r0 0x3f80 (8)
   2: add ftz f32 $r0 $r2 $r0 (8)
   3: mov f32 $r3 $r1 (8)
   4: mov u32 $r1 $r2 (8)
   5: export b128 # o[0x0] $r0q (8)

With patch:
   0: ld  u32 $r2 c0[0x0] (8)
   1: add ftz f32 $r0 $r2 1.00 (8)
   2: mov f32 $r3 $r1 (8)
   3: mov u32 $r1 $r2 (8)
   4: export b128 # o[0x0] $r0q (8)


[1]:
VERT
PROPERTY NEXT_SHADER FRAG
DCL OUT[0], GENERIC[0]
DCL CONST[0]
DCL TEMP[0..1], LOCAL
IMM[0] FLT32 {0.0078,-1., 0., 0.5000}
IMM[1] FLT32 {1., 0., 65535., 0.0100}
   0: MOV TEMP[0].xyz, CONST[0].
  39: MAD TEMP[1], CONST[20]., IMM[1]., IMM[1].xyyy
  41: ADD TEMP[1], TEMP[0], TEMP[1]
208: MOV OUT[0], TEMP[1]
211: END





+ }
+ else {
+i->op = i->src(0).mod.getOp();
+if (i->op != OP_CVT)
+   i->src(0).mod = 0;
+ }
 } else
 if (i->subOp != NV50_IR_SUBOP_MUL_HIGH &&
 (imm0.isInteger(1) || imm0.isInteger(-1))) {
--
2.10.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau




___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD

2016-10-02 Thread Ilia Mirkin
That's very odd. LoadPropagation should have picked that up even in
its current form. Should try to figure out why it didn't and that is
likely to "fix" a *lot* more situations.

On Sun, Oct 2, 2016 at 2:24 PM, Tobias Klausmann
 wrote:
>
>
> On 02.10.2016 20:03, Ilia Mirkin wrote:
>>
>> On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann
>>  wrote:
>>>
>>> Previously we'd end up with an unnecessary mov for the thirs immediate
>>> value.
>>>
>>> total instructions in shared programs : 851881 -> 851864 (-0.00%)
>>> total gprs used in shared programs: 110295 -> 110295 (0.00%)
>>> total local used in shared programs   : 1020 -> 1020 (0.00%)
>>>
>>>  localgpr   inst  bytes
>>>  helped   0   0  17  17
>>>hurt   0   0   0   0
>>>
>>> Suggested-by: Karol Herbst 
>>> Signed-off-by: Tobias Klausmann 
>>> ---
>>>   src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15
>>> ---
>>>   1 file changed, 12 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> index 9875738..8bb5cf9 100644
>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> @@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i,
>>> ImmediateValue &imm0, int s)
>>> break;
>>>  case OP_MAD:
>>> if (imm0.isInteger(0)) {
>>> + ImmediateValue imm1;
>>>i->setSrc(0, i->getSrc(2));
>>>i->src(0).mod = i->src(2).mod;
>>>i->setSrc(1, NULL);
>>>i->setSrc(2, NULL);
>>> - i->op = i->src(0).mod.getOp();
>>> - if (i->op != OP_CVT)
>>> -i->src(0).mod = 0;
>>> + if (i->src(0).getImmediate(imm1)) {
>>> +bld.setPosition(i, false);
>>> +newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64),
>>> + i->dType);
>>> +delete_Instruction(prog, i);
>>
>> What's an example of a situation where this helps? It shouldn't
>> matter, the mov's should get cleaned up. [Clearly 17 shaders
>> disagree...] Is this just a side-effect of the fact that we don't run
>> the opts to a fixed point?
>
>
> It is a second mov that causes a problem for later folding in the imm, here
> output of a testshader[1]:
>
>   0: nop u32 %r56 (0)
>   1: ld  u32 %r31 c0[0x0] (0)
>   2: ld  u32 %r37 c0[0x140] (0)
>   3: mov u32 %r38 0x (0)
>   4: mov u32 %r39 0x3f80 (0)
>   5: mad f32 %r40 %r37 %r38 %r39 (0)
>   6: mad f32 %r44 %r37 %r38 %r38 (0)
>   7: add f32 %r53 %r31 %r40 (0)
>   8: add f32 %r54 %r31 %r44 (0)
>   9: add f32 %r57 %r56 %r44 (0)
>
> Constantfolding...
>
> MAIN:-1 ()
> BB:0 (14 instructions) - df = { }
>  -> BB:1 (tree)
>   0: nop u32 %r56 (0)
>   1: ld  u32 %r31 c0[0x0] (0)
>   2: ld  u32 %r37 c0[0x140] (0)
>   3: mov u32 %r38 0x (0)
>   4: mov u32 %r39 0x3f80 (0)
>   5: mov f32 %r40 %r39 (0)
>   6: mov f32 %r44 %r38 (0)
>   7: add f32 %r53 %r31 %r40 (0)
>   8: mov f32 %r54 %r31 (0)
>   9: mov f32 %r57 %r56 (0)
>
>
> The outcome:
>   0: ld  u32 $r2 c0[0x0] (8)
>   1: mov u32 $r0 0x3f80 (8)
>   2: add ftz f32 $r0 $r2 $r0 (8)
>   3: mov f32 $r3 $r1 (8)
>   4: mov u32 $r1 $r2 (8)
>   5: export b128 # o[0x0] $r0q (8)
>
> With patch:
>   0: ld  u32 $r2 c0[0x0] (8)
>   1: add ftz f32 $r0 $r2 1.00 (8)
>   2: mov f32 $r3 $r1 (8)
>   3: mov u32 $r1 $r2 (8)
>   4: export b128 # o[0x0] $r0q (8)
>
>
> [1]:
> VERT
> PROPERTY NEXT_SHADER FRAG
> DCL OUT[0], GENERIC[0]
> DCL CONST[0]
> DCL TEMP[0..1], LOCAL
> IMM[0] FLT32 {0.0078,-1., 0., 0.5000}
> IMM[1] FLT32 {1., 0., 65535., 0.0100}
>   0: MOV TEMP[0].xyz, CONST[0].
>  39: MAD TEMP[1], CONST[20]., IMM[1]., IMM[1].xyyy
>  41: ADD TEMP[1], TEMP[0], TEMP[1]
> 208: MOV OUT[0], TEMP[1]
> 211: END
>
>
>
>
>>
>>> + }
>>> + else {
>>> +i->op = i->src(0).mod.getOp();
>>> +if (i->op != OP_CVT)
>>> +   i->src(0).mod = 0;
>>> + }
>>> } else
>>> if (i->subOp != NV50_IR_SUBOP_MUL_HIGH &&
>>> (imm0.isInteger(1) || imm0.isInteger(-1))) {
>>> --
>>> 2.10.0
>>>
>>> ___
>>> Nouveau mailing list
>>> Nouveau@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/nouveau
>
>
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD

2016-10-02 Thread Tobias Klausmann



On 02.10.2016 20:03, Ilia Mirkin wrote:

On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann
 wrote:

Previously we'd end up with an unnecessary mov for the thirs immediate value.

total instructions in shared programs : 851881 -> 851864 (-0.00%)
total gprs used in shared programs: 110295 -> 110295 (0.00%)
total local used in shared programs   : 1020 -> 1020 (0.00%)

 localgpr   inst  bytes
 helped   0   0  17  17
   hurt   0   0   0   0

Suggested-by: Karol Herbst 
Signed-off-by: Tobias Klausmann 
---
  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 ---
  1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 9875738..8bb5cf9 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue 
&imm0, int s)
break;
 case OP_MAD:
if (imm0.isInteger(0)) {
+ ImmediateValue imm1;
   i->setSrc(0, i->getSrc(2));
   i->src(0).mod = i->src(2).mod;
   i->setSrc(1, NULL);
   i->setSrc(2, NULL);
- i->op = i->src(0).mod.getOp();
- if (i->op != OP_CVT)
-i->src(0).mod = 0;
+ if (i->src(0).getImmediate(imm1)) {
+bld.setPosition(i, false);
+newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64),
+ i->dType);
+delete_Instruction(prog, i);

What's an example of a situation where this helps? It shouldn't
matter, the mov's should get cleaned up. [Clearly 17 shaders
disagree...] Is this just a side-effect of the fact that we don't run
the opts to a fixed point?


It is a second mov that causes a problem for later folding in the imm, 
here output of a testshader[1]:


  0: nop u32 %r56 (0)
  1: ld  u32 %r31 c0[0x0] (0)
  2: ld  u32 %r37 c0[0x140] (0)
  3: mov u32 %r38 0x (0)
  4: mov u32 %r39 0x3f80 (0)
  5: mad f32 %r40 %r37 %r38 %r39 (0)
  6: mad f32 %r44 %r37 %r38 %r38 (0)
  7: add f32 %r53 %r31 %r40 (0)
  8: add f32 %r54 %r31 %r44 (0)
  9: add f32 %r57 %r56 %r44 (0)

Constantfolding...

MAIN:-1 ()
BB:0 (14 instructions) - df = { }
 -> BB:1 (tree)
  0: nop u32 %r56 (0)
  1: ld  u32 %r31 c0[0x0] (0)
  2: ld  u32 %r37 c0[0x140] (0)
  3: mov u32 %r38 0x (0)
  4: mov u32 %r39 0x3f80 (0)
  5: mov f32 %r40 %r39 (0)
  6: mov f32 %r44 %r38 (0)
  7: add f32 %r53 %r31 %r40 (0)
  8: mov f32 %r54 %r31 (0)
  9: mov f32 %r57 %r56 (0)


The outcome:
  0: ld  u32 $r2 c0[0x0] (8)
  1: mov u32 $r0 0x3f80 (8)
  2: add ftz f32 $r0 $r2 $r0 (8)
  3: mov f32 $r3 $r1 (8)
  4: mov u32 $r1 $r2 (8)
  5: export b128 # o[0x0] $r0q (8)

With patch:
  0: ld  u32 $r2 c0[0x0] (8)
  1: add ftz f32 $r0 $r2 1.00 (8)
  2: mov f32 $r3 $r1 (8)
  3: mov u32 $r1 $r2 (8)
  4: export b128 # o[0x0] $r0q (8)


[1]:
VERT
PROPERTY NEXT_SHADER FRAG
DCL OUT[0], GENERIC[0]
DCL CONST[0]
DCL TEMP[0..1], LOCAL
IMM[0] FLT32 {0.0078,-1., 0., 0.5000}
IMM[1] FLT32 {1., 0., 65535., 0.0100}
  0: MOV TEMP[0].xyz, CONST[0].
 39: MAD TEMP[1], CONST[20]., IMM[1]., IMM[1].xyyy
 41: ADD TEMP[1], TEMP[0], TEMP[1]
208: MOV OUT[0], TEMP[1]
211: END






+ }
+ else {
+i->op = i->src(0).mod.getOp();
+if (i->op != OP_CVT)
+   i->src(0).mod = 0;
+ }
} else
if (i->subOp != NV50_IR_SUBOP_MUL_HIGH &&
(imm0.isInteger(1) || imm0.isInteger(-1))) {
--
2.10.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD

2016-10-02 Thread Ilia Mirkin
On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann
 wrote:
> Previously we'd end up with an unnecessary mov for the thirs immediate value.
>
> total instructions in shared programs : 851881 -> 851864 (-0.00%)
> total gprs used in shared programs: 110295 -> 110295 (0.00%)
> total local used in shared programs   : 1020 -> 1020 (0.00%)
>
> localgpr   inst  bytes
> helped   0   0  17  17
>   hurt   0   0   0   0
>
> Suggested-by: Karol Herbst 
> Signed-off-by: Tobias Klausmann 
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index 9875738..8bb5cf9 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue 
> &imm0, int s)
>break;
> case OP_MAD:
>if (imm0.isInteger(0)) {
> + ImmediateValue imm1;
>   i->setSrc(0, i->getSrc(2));
>   i->src(0).mod = i->src(2).mod;
>   i->setSrc(1, NULL);
>   i->setSrc(2, NULL);
> - i->op = i->src(0).mod.getOp();
> - if (i->op != OP_CVT)
> -i->src(0).mod = 0;
> + if (i->src(0).getImmediate(imm1)) {
> +bld.setPosition(i, false);
> +newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64),
> + i->dType);
> +delete_Instruction(prog, i);

What's an example of a situation where this helps? It shouldn't
matter, the mov's should get cleaned up. [Clearly 17 shaders
disagree...] Is this just a side-effect of the fact that we don't run
the opts to a fixed point?

> + }
> + else {
> +i->op = i->src(0).mod.getOp();
> +if (i->op != OP_CVT)
> +   i->src(0).mod = 0;
> + }
>} else
>if (i->subOp != NV50_IR_SUBOP_MUL_HIGH &&
>(imm0.isInteger(1) || imm0.isInteger(-1))) {
> --
> 2.10.0
>
> ___
> Nouveau mailing list
> Nouveau@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD

2016-10-02 Thread Tobias Klausmann
Previously we'd end up with an unnecessary mov for the thirs immediate value.

total instructions in shared programs : 851881 -> 851864 (-0.00%)
total gprs used in shared programs: 110295 -> 110295 (0.00%)
total local used in shared programs   : 1020 -> 1020 (0.00%)

localgpr   inst  bytes
helped   0   0  17  17
  hurt   0   0   0   0

Suggested-by: Karol Herbst 
Signed-off-by: Tobias Klausmann 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 9875738..8bb5cf9 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue 
&imm0, int s)
   break;
case OP_MAD:
   if (imm0.isInteger(0)) {
+ ImmediateValue imm1;
  i->setSrc(0, i->getSrc(2));
  i->src(0).mod = i->src(2).mod;
  i->setSrc(1, NULL);
  i->setSrc(2, NULL);
- i->op = i->src(0).mod.getOp();
- if (i->op != OP_CVT)
-i->src(0).mod = 0;
+ if (i->src(0).getImmediate(imm1)) {
+bld.setPosition(i, false);
+newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64),
+ i->dType);
+delete_Instruction(prog, i);
+ }
+ else {
+i->op = i->src(0).mod.getOp();
+if (i->op != OP_CVT)
+   i->src(0).mod = 0;
+ }
   } else
   if (i->subOp != NV50_IR_SUBOP_MUL_HIGH &&
   (imm0.isInteger(1) || imm0.isInteger(-1))) {
-- 
2.10.0

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau