Re: [Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
On Sun, Oct 2, 2016 at 2:43 PM, Tobias Klausmann wrote: > > > On 02.10.2016 20:26, Ilia Mirkin wrote: >> >> That's very odd. LoadPropagation should have picked that up even in >> its current form. Should try to figure out why it didn't and that is >> likely to "fix" a *lot* more situations. > > > Actually i was coming from an, given really constrained, addition to the > LoadPropagation pass, where i was told to fix it within OP_MAD :/ Why doesn't LoadPropagation handle it as-is? ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
On 02.10.2016 20:26, Ilia Mirkin wrote: That's very odd. LoadPropagation should have picked that up even in its current form. Should try to figure out why it didn't and that is likely to "fix" a *lot* more situations. Actually i was coming from an, given really constrained, addition to the LoadPropagation pass, where i was told to fix it within OP_MAD :/ On Sun, Oct 2, 2016 at 2:24 PM, Tobias Klausmann wrote: On 02.10.2016 20:03, Ilia Mirkin wrote: On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann wrote: Previously we'd end up with an unnecessary mov for the thirs immediate value. total instructions in shared programs : 851881 -> 851864 (-0.00%) total gprs used in shared programs: 110295 -> 110295 (0.00%) total local used in shared programs : 1020 -> 1020 (0.00%) localgpr inst bytes helped 0 0 17 17 hurt 0 0 0 0 Suggested-by: Karol Herbst Signed-off-by: Tobias Klausmann --- src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 9875738..8bb5cf9 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s) break; case OP_MAD: if (imm0.isInteger(0)) { + ImmediateValue imm1; i->setSrc(0, i->getSrc(2)); i->src(0).mod = i->src(2).mod; i->setSrc(1, NULL); i->setSrc(2, NULL); - i->op = i->src(0).mod.getOp(); - if (i->op != OP_CVT) -i->src(0).mod = 0; + if (i->src(0).getImmediate(imm1)) { +bld.setPosition(i, false); +newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64), + i->dType); +delete_Instruction(prog, i); What's an example of a situation where this helps? It shouldn't matter, the mov's should get cleaned up. [Clearly 17 shaders disagree...] Is this just a side-effect of the fact that we don't run the opts to a fixed point? It is a second mov that causes a problem for later folding in the imm, here output of a testshader[1]: 0: nop u32 %r56 (0) 1: ld u32 %r31 c0[0x0] (0) 2: ld u32 %r37 c0[0x140] (0) 3: mov u32 %r38 0x (0) 4: mov u32 %r39 0x3f80 (0) 5: mad f32 %r40 %r37 %r38 %r39 (0) 6: mad f32 %r44 %r37 %r38 %r38 (0) 7: add f32 %r53 %r31 %r40 (0) 8: add f32 %r54 %r31 %r44 (0) 9: add f32 %r57 %r56 %r44 (0) Constantfolding... MAIN:-1 () BB:0 (14 instructions) - df = { } -> BB:1 (tree) 0: nop u32 %r56 (0) 1: ld u32 %r31 c0[0x0] (0) 2: ld u32 %r37 c0[0x140] (0) 3: mov u32 %r38 0x (0) 4: mov u32 %r39 0x3f80 (0) 5: mov f32 %r40 %r39 (0) 6: mov f32 %r44 %r38 (0) 7: add f32 %r53 %r31 %r40 (0) 8: mov f32 %r54 %r31 (0) 9: mov f32 %r57 %r56 (0) The outcome: 0: ld u32 $r2 c0[0x0] (8) 1: mov u32 $r0 0x3f80 (8) 2: add ftz f32 $r0 $r2 $r0 (8) 3: mov f32 $r3 $r1 (8) 4: mov u32 $r1 $r2 (8) 5: export b128 # o[0x0] $r0q (8) With patch: 0: ld u32 $r2 c0[0x0] (8) 1: add ftz f32 $r0 $r2 1.00 (8) 2: mov f32 $r3 $r1 (8) 3: mov u32 $r1 $r2 (8) 4: export b128 # o[0x0] $r0q (8) [1]: VERT PROPERTY NEXT_SHADER FRAG DCL OUT[0], GENERIC[0] DCL CONST[0] DCL TEMP[0..1], LOCAL IMM[0] FLT32 {0.0078,-1., 0., 0.5000} IMM[1] FLT32 {1., 0., 65535., 0.0100} 0: MOV TEMP[0].xyz, CONST[0]. 39: MAD TEMP[1], CONST[20]., IMM[1]., IMM[1].xyyy 41: ADD TEMP[1], TEMP[0], TEMP[1] 208: MOV OUT[0], TEMP[1] 211: END + } + else { +i->op = i->src(0).mod.getOp(); +if (i->op != OP_CVT) + i->src(0).mod = 0; + } } else if (i->subOp != NV50_IR_SUBOP_MUL_HIGH && (imm0.isInteger(1) || imm0.isInteger(-1))) { -- 2.10.0 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
That's very odd. LoadPropagation should have picked that up even in its current form. Should try to figure out why it didn't and that is likely to "fix" a *lot* more situations. On Sun, Oct 2, 2016 at 2:24 PM, Tobias Klausmann wrote: > > > On 02.10.2016 20:03, Ilia Mirkin wrote: >> >> On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann >> wrote: >>> >>> Previously we'd end up with an unnecessary mov for the thirs immediate >>> value. >>> >>> total instructions in shared programs : 851881 -> 851864 (-0.00%) >>> total gprs used in shared programs: 110295 -> 110295 (0.00%) >>> total local used in shared programs : 1020 -> 1020 (0.00%) >>> >>> localgpr inst bytes >>> helped 0 0 17 17 >>>hurt 0 0 0 0 >>> >>> Suggested-by: Karol Herbst >>> Signed-off-by: Tobias Klausmann >>> --- >>> src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 >>> --- >>> 1 file changed, 12 insertions(+), 3 deletions(-) >>> >>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >>> index 9875738..8bb5cf9 100644 >>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp >>> @@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, >>> ImmediateValue &imm0, int s) >>> break; >>> case OP_MAD: >>> if (imm0.isInteger(0)) { >>> + ImmediateValue imm1; >>>i->setSrc(0, i->getSrc(2)); >>>i->src(0).mod = i->src(2).mod; >>>i->setSrc(1, NULL); >>>i->setSrc(2, NULL); >>> - i->op = i->src(0).mod.getOp(); >>> - if (i->op != OP_CVT) >>> -i->src(0).mod = 0; >>> + if (i->src(0).getImmediate(imm1)) { >>> +bld.setPosition(i, false); >>> +newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64), >>> + i->dType); >>> +delete_Instruction(prog, i); >> >> What's an example of a situation where this helps? It shouldn't >> matter, the mov's should get cleaned up. [Clearly 17 shaders >> disagree...] Is this just a side-effect of the fact that we don't run >> the opts to a fixed point? > > > It is a second mov that causes a problem for later folding in the imm, here > output of a testshader[1]: > > 0: nop u32 %r56 (0) > 1: ld u32 %r31 c0[0x0] (0) > 2: ld u32 %r37 c0[0x140] (0) > 3: mov u32 %r38 0x (0) > 4: mov u32 %r39 0x3f80 (0) > 5: mad f32 %r40 %r37 %r38 %r39 (0) > 6: mad f32 %r44 %r37 %r38 %r38 (0) > 7: add f32 %r53 %r31 %r40 (0) > 8: add f32 %r54 %r31 %r44 (0) > 9: add f32 %r57 %r56 %r44 (0) > > Constantfolding... > > MAIN:-1 () > BB:0 (14 instructions) - df = { } > -> BB:1 (tree) > 0: nop u32 %r56 (0) > 1: ld u32 %r31 c0[0x0] (0) > 2: ld u32 %r37 c0[0x140] (0) > 3: mov u32 %r38 0x (0) > 4: mov u32 %r39 0x3f80 (0) > 5: mov f32 %r40 %r39 (0) > 6: mov f32 %r44 %r38 (0) > 7: add f32 %r53 %r31 %r40 (0) > 8: mov f32 %r54 %r31 (0) > 9: mov f32 %r57 %r56 (0) > > > The outcome: > 0: ld u32 $r2 c0[0x0] (8) > 1: mov u32 $r0 0x3f80 (8) > 2: add ftz f32 $r0 $r2 $r0 (8) > 3: mov f32 $r3 $r1 (8) > 4: mov u32 $r1 $r2 (8) > 5: export b128 # o[0x0] $r0q (8) > > With patch: > 0: ld u32 $r2 c0[0x0] (8) > 1: add ftz f32 $r0 $r2 1.00 (8) > 2: mov f32 $r3 $r1 (8) > 3: mov u32 $r1 $r2 (8) > 4: export b128 # o[0x0] $r0q (8) > > > [1]: > VERT > PROPERTY NEXT_SHADER FRAG > DCL OUT[0], GENERIC[0] > DCL CONST[0] > DCL TEMP[0..1], LOCAL > IMM[0] FLT32 {0.0078,-1., 0., 0.5000} > IMM[1] FLT32 {1., 0., 65535., 0.0100} > 0: MOV TEMP[0].xyz, CONST[0]. > 39: MAD TEMP[1], CONST[20]., IMM[1]., IMM[1].xyyy > 41: ADD TEMP[1], TEMP[0], TEMP[1] > 208: MOV OUT[0], TEMP[1] > 211: END > > > > >> >>> + } >>> + else { >>> +i->op = i->src(0).mod.getOp(); >>> +if (i->op != OP_CVT) >>> + i->src(0).mod = 0; >>> + } >>> } else >>> if (i->subOp != NV50_IR_SUBOP_MUL_HIGH && >>> (imm0.isInteger(1) || imm0.isInteger(-1))) { >>> -- >>> 2.10.0 >>> >>> ___ >>> Nouveau mailing list >>> Nouveau@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/nouveau > > ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
On 02.10.2016 20:03, Ilia Mirkin wrote: On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann wrote: Previously we'd end up with an unnecessary mov for the thirs immediate value. total instructions in shared programs : 851881 -> 851864 (-0.00%) total gprs used in shared programs: 110295 -> 110295 (0.00%) total local used in shared programs : 1020 -> 1020 (0.00%) localgpr inst bytes helped 0 0 17 17 hurt 0 0 0 0 Suggested-by: Karol Herbst Signed-off-by: Tobias Klausmann --- src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 9875738..8bb5cf9 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s) break; case OP_MAD: if (imm0.isInteger(0)) { + ImmediateValue imm1; i->setSrc(0, i->getSrc(2)); i->src(0).mod = i->src(2).mod; i->setSrc(1, NULL); i->setSrc(2, NULL); - i->op = i->src(0).mod.getOp(); - if (i->op != OP_CVT) -i->src(0).mod = 0; + if (i->src(0).getImmediate(imm1)) { +bld.setPosition(i, false); +newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64), + i->dType); +delete_Instruction(prog, i); What's an example of a situation where this helps? It shouldn't matter, the mov's should get cleaned up. [Clearly 17 shaders disagree...] Is this just a side-effect of the fact that we don't run the opts to a fixed point? It is a second mov that causes a problem for later folding in the imm, here output of a testshader[1]: 0: nop u32 %r56 (0) 1: ld u32 %r31 c0[0x0] (0) 2: ld u32 %r37 c0[0x140] (0) 3: mov u32 %r38 0x (0) 4: mov u32 %r39 0x3f80 (0) 5: mad f32 %r40 %r37 %r38 %r39 (0) 6: mad f32 %r44 %r37 %r38 %r38 (0) 7: add f32 %r53 %r31 %r40 (0) 8: add f32 %r54 %r31 %r44 (0) 9: add f32 %r57 %r56 %r44 (0) Constantfolding... MAIN:-1 () BB:0 (14 instructions) - df = { } -> BB:1 (tree) 0: nop u32 %r56 (0) 1: ld u32 %r31 c0[0x0] (0) 2: ld u32 %r37 c0[0x140] (0) 3: mov u32 %r38 0x (0) 4: mov u32 %r39 0x3f80 (0) 5: mov f32 %r40 %r39 (0) 6: mov f32 %r44 %r38 (0) 7: add f32 %r53 %r31 %r40 (0) 8: mov f32 %r54 %r31 (0) 9: mov f32 %r57 %r56 (0) The outcome: 0: ld u32 $r2 c0[0x0] (8) 1: mov u32 $r0 0x3f80 (8) 2: add ftz f32 $r0 $r2 $r0 (8) 3: mov f32 $r3 $r1 (8) 4: mov u32 $r1 $r2 (8) 5: export b128 # o[0x0] $r0q (8) With patch: 0: ld u32 $r2 c0[0x0] (8) 1: add ftz f32 $r0 $r2 1.00 (8) 2: mov f32 $r3 $r1 (8) 3: mov u32 $r1 $r2 (8) 4: export b128 # o[0x0] $r0q (8) [1]: VERT PROPERTY NEXT_SHADER FRAG DCL OUT[0], GENERIC[0] DCL CONST[0] DCL TEMP[0..1], LOCAL IMM[0] FLT32 {0.0078,-1., 0., 0.5000} IMM[1] FLT32 {1., 0., 65535., 0.0100} 0: MOV TEMP[0].xyz, CONST[0]. 39: MAD TEMP[1], CONST[20]., IMM[1]., IMM[1].xyyy 41: ADD TEMP[1], TEMP[0], TEMP[1] 208: MOV OUT[0], TEMP[1] 211: END + } + else { +i->op = i->src(0).mod.getOp(); +if (i->op != OP_CVT) + i->src(0).mod = 0; + } } else if (i->subOp != NV50_IR_SUBOP_MUL_HIGH && (imm0.isInteger(1) || imm0.isInteger(-1))) { -- 2.10.0 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann wrote: > Previously we'd end up with an unnecessary mov for the thirs immediate value. > > total instructions in shared programs : 851881 -> 851864 (-0.00%) > total gprs used in shared programs: 110295 -> 110295 (0.00%) > total local used in shared programs : 1020 -> 1020 (0.00%) > > localgpr inst bytes > helped 0 0 17 17 > hurt 0 0 0 0 > > Suggested-by: Karol Herbst > Signed-off-by: Tobias Klausmann > --- > src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 --- > 1 file changed, 12 insertions(+), 3 deletions(-) > > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > index 9875738..8bb5cf9 100644 > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > @@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue > &imm0, int s) >break; > case OP_MAD: >if (imm0.isInteger(0)) { > + ImmediateValue imm1; > i->setSrc(0, i->getSrc(2)); > i->src(0).mod = i->src(2).mod; > i->setSrc(1, NULL); > i->setSrc(2, NULL); > - i->op = i->src(0).mod.getOp(); > - if (i->op != OP_CVT) > -i->src(0).mod = 0; > + if (i->src(0).getImmediate(imm1)) { > +bld.setPosition(i, false); > +newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64), > + i->dType); > +delete_Instruction(prog, i); What's an example of a situation where this helps? It shouldn't matter, the mov's should get cleaned up. [Clearly 17 shaders disagree...] Is this just a side-effect of the fact that we don't run the opts to a fixed point? > + } > + else { > +i->op = i->src(0).mod.getOp(); > +if (i->op != OP_CVT) > + i->src(0).mod = 0; > + } >} else >if (i->subOp != NV50_IR_SUBOP_MUL_HIGH && >(imm0.isInteger(1) || imm0.isInteger(-1))) { > -- > 2.10.0 > > ___ > Nouveau mailing list > Nouveau@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] [PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
Previously we'd end up with an unnecessary mov for the thirs immediate value. total instructions in shared programs : 851881 -> 851864 (-0.00%) total gprs used in shared programs: 110295 -> 110295 (0.00%) total local used in shared programs : 1020 -> 1020 (0.00%) localgpr inst bytes helped 0 0 17 17 hurt 0 0 0 0 Suggested-by: Karol Herbst Signed-off-by: Tobias Klausmann --- src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 9875738..8bb5cf9 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -1008,13 +1008,22 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s) break; case OP_MAD: if (imm0.isInteger(0)) { + ImmediateValue imm1; i->setSrc(0, i->getSrc(2)); i->src(0).mod = i->src(2).mod; i->setSrc(1, NULL); i->setSrc(2, NULL); - i->op = i->src(0).mod.getOp(); - if (i->op != OP_CVT) -i->src(0).mod = 0; + if (i->src(0).getImmediate(imm1)) { +bld.setPosition(i, false); +newi = bld.mkMov(i->getDef(0), bld.mkImm(imm1.reg.data.u64), + i->dType); +delete_Instruction(prog, i); + } + else { +i->op = i->src(0).mod.getOp(); +if (i->op != OP_CVT) + i->src(0).mod = 0; + } } else if (i->subOp != NV50_IR_SUBOP_MUL_HIGH && (imm0.isInteger(1) || imm0.isInteger(-1))) { -- 2.10.0 ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau