Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5
On Thursday, March 19, 2015 04:21:25 PM Eero Tamminen wrote: > Hi, > > On 03/16/2015 06:37 PM, Matt Turner wrote: > > On Mon, Mar 16, 2015 at 4:54 AM, Tapani Pälli > > wrote: > >> Is there some particular Piglit test case that hits this path and is it > >> possible with gen>5 (by removing gen check)? I've tried this with > >> handicrafted shader_test and also shader-db and cannot hit the conditions > >> for changes to happen. Would be nice to be able to run examine changes and > >> understand this better. > > > > Well, from the shader-db stats there must be some shaders affected in > > shader-db. > > > > Use INTEL_DEVID_OVERRIDE=... with a Gen4 or Gen5 PCI ID from > > include/pci_ids/i965_pci_ids.h. > > And disable drawing to avoid issues from mismatched GPU output? > > (With shader cache, I guess these could be used to implement "offline > compiler" for a build machine running on different GEN from target > machine. :-)) > > > - Eero libdrm handles the INTEL_DEVID_OVERRIDE variable. If the PCI ID doesn't match the actual one on your system, it skips executing batches. So yeah, that just happens automatically. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5
On 03/19/2015 07:21 AM, Eero Tamminen wrote: > Hi, > > On 03/16/2015 06:37 PM, Matt Turner wrote: >> On Mon, Mar 16, 2015 at 4:54 AM, Tapani Pälli >> wrote: >>> Is there some particular Piglit test case that hits this path and is it >>> possible with gen>5 (by removing gen check)? I've tried this with >>> handicrafted shader_test and also shader-db and cannot hit the >>> conditions >>> for changes to happen. Would be nice to be able to run examine >>> changes and >>> understand this better. >> >> Well, from the shader-db stats there must be some shaders affected in >> shader-db. >> >> Use INTEL_DEVID_OVERRIDE=... with a Gen4 or Gen5 PCI ID from >> include/pci_ids/i965_pci_ids.h. > > And disable drawing to avoid issues from mismatched GPU output? > > (With shader cache, I guess these could be used to implement "offline > compiler" for a build machine running on different GEN from target > machine. :-)) Shader-db doesn't do anything other than compile and link shaders, so there's no drawing to disable. As you will see in the commit messages of a series that should go out today, I've been collection shader-db for all platforms on my IVB system for quite some time. It seems that Jason has added an option to shader-db to force a specific platform via -p. Even easier! > - Eero > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5
Hi, On 03/16/2015 06:37 PM, Matt Turner wrote: On Mon, Mar 16, 2015 at 4:54 AM, Tapani Pälli wrote: Is there some particular Piglit test case that hits this path and is it possible with gen>5 (by removing gen check)? I've tried this with handicrafted shader_test and also shader-db and cannot hit the conditions for changes to happen. Would be nice to be able to run examine changes and understand this better. Well, from the shader-db stats there must be some shaders affected in shader-db. Use INTEL_DEVID_OVERRIDE=... with a Gen4 or Gen5 PCI ID from include/pci_ids/i965_pci_ids.h. And disable drawing to avoid issues from mismatched GPU output? (With shader cache, I guess these could be used to implement "offline compiler" for a build machine running on different GEN from target machine. :-)) - Eero ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5
Reviewed-by: Tapani Pälli On 03/11/2015 10:44 PM, Ian Romanick wrote: From: Ian Romanick On platforms that do not natively generate 0u and ~0u for Boolean results, b2f expressions that look like f = b2f(expr cmp 0) will generate better code by pretending the expression is f = ir_triop_sel(0.0, 1.0, expr cmp 0) This is because the last instruction of "expr" can generate the condition code for the "cmp 0". This avoids having to do the "-(b & 1)" trick to generate 0u or ~0u for the Boolean result. This means code like mov(16) g16<1>F 1F mul.ge.f0(16) nullg6<8,8,1>F g14<8,8,1>F (+f0) sel(16) m6<1>F g16<8,8,1>F 0F will be generated instead of mul(16) g2<1>F g12<8,8,1>F g4<8,8,1>F cmp.ge.f0(16) g2<1>D g4<8,8,1>F 0F and(16) g4<1>D g2<8,8,1>D 1D and(16) m6<1>D -g4<8,8,1>D 0x3f80UD v2: When the comparison is either == 0.0 or != 0.0 use the knowledge that the true (or false) case already results in zero would allow better code generation by possibly avoiding a load-immediate instruction. v3: Apply the optimization even when neither comparitor is zero. Shader-db results: GM45 (0x2A42): total instructions in shared programs: 3551002 -> 3550829 (-0.00%) instructions in affected programs: 33269 -> 33096 (-0.52%) helped:121 Iron Lake (0x0046): total instructions in shared programs: 4993327 -> 4993146 (-0.00%) instructions in affected programs: 34199 -> 34018 (-0.53%) helped:129 No change on other platforms. Signed-off-by: Ian Romanick Cc: Tapani Palli --- src/mesa/drivers/dri/i965/brw_fs.h | 2 + src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 101 +-- 2 files changed, 99 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index d9d5858..075e90c 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -307,6 +307,7 @@ public: const fs_reg &a); void emit_minmax(enum brw_conditional_mod conditionalmod, const fs_reg &dst, const fs_reg &src0, const fs_reg &src1); + bool try_emit_b2f_of_comparison(ir_expression *ir); bool try_emit_saturate(ir_expression *ir); bool try_emit_line(ir_expression *ir); bool try_emit_mad(ir_expression *ir); @@ -317,6 +318,7 @@ public: bool opt_saturate_propagation(); bool opt_cmod_propagation(); void emit_bool_to_cond_code(ir_rvalue *condition); + void emit_bool_to_cond_code_of_reg(ir_expression *expr, fs_reg op[3]); void emit_if_gen6(ir_if *ir); void emit_unspill(bblock_t *block, fs_inst *inst, fs_reg reg, uint32_t spill_offset, int count); diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 3025a9d..3d79796 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -475,6 +475,87 @@ fs_visitor::try_emit_mad(ir_expression *ir) return true; } +bool +fs_visitor::try_emit_b2f_of_comparison(ir_expression *ir) +{ + /* On platforms that do not natively generate 0u and ~0u for Boolean +* results, b2f expressions that look like +* +* f = b2f(expr cmp 0) +* +* will generate better code by pretending the expression is +* +* f = ir_triop_csel(0.0, 1.0, expr cmp 0) +* +* This is because the last instruction of "expr" can generate the +* condition code for the "cmp 0". This avoids having to do the "-(b & 1)" +* trick to generate 0u or ~0u for the Boolean result. This means code like +* +* mov(16) g16<1>F 1F +* mul.ge.f0(16) nullg6<8,8,1>F g14<8,8,1>F +* (+f0) sel(16) m6<1>F g16<8,8,1>F 0F +* +* will be generated instead of +* +* mul(16) g2<1>F g12<8,8,1>F g4<8,8,1>F +* cmp.ge.f0(16) g2<1>D g4<8,8,1>F 0F +* and(16) g4<1>D g2<8,8,1>D 1D +* and(16) m6<1>D -g4<8,8,1>D 0x3f80UD +* +* When the comparison is either == 0.0 or != 0.0 using the knowledge that +* the true (or false) case already results in zero would allow better code +* generation by possibly avoiding a load-immediate instruction. +*/ + ir_expression *cmp = ir->operands[0]->as_expression(); + if (cmp == NULL) + return false; + + if (cmp->operation == ir_binop_equal || cmp->operation == ir_binop_nequal) { + for (unsigned i = 0; i < 2; i++) { + ir_constant *c = cmp->operands[i]->as_constant(); + if (c == NULL || !c->is_zero()) +continue; + + ir_expression *expr = cmp->operands[i ^ 1]->as_e
Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5
On 03/17/2015 07:42 AM, Tapani Pälli wrote: On 03/16/2015 06:37 PM, Matt Turner wrote: On Mon, Mar 16, 2015 at 4:54 AM, Tapani Pälli wrote: Is there some particular Piglit test case that hits this path and is it possible with gen>5 (by removing gen check)? I've tried this with handicrafted shader_test and also shader-db and cannot hit the conditions for changes to happen. Would be nice to be able to run examine changes and understand this better. Well, from the shader-db stats there must be some shaders affected in shader-db. Use INTEL_DEVID_OVERRIDE=... with a Gen4 or Gen5 PCI ID from include/pci_ids/i965_pci_ids.h. Yeah, got that but run.py for me says following error so I did not want to try manually go one by one. The amount of tests it runs before error changes between runs. seems I did not have the latest head, did git pull and now shader-db works for me, not sure what that error was. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5
On 03/16/2015 06:37 PM, Matt Turner wrote: On Mon, Mar 16, 2015 at 4:54 AM, Tapani Pälli wrote: Is there some particular Piglit test case that hits this path and is it possible with gen>5 (by removing gen check)? I've tried this with handicrafted shader_test and also shader-db and cannot hit the conditions for changes to happen. Would be nice to be able to run examine changes and understand this better. Well, from the shader-db stats there must be some shaders affected in shader-db. Use INTEL_DEVID_OVERRIDE=... with a Gen4 or Gen5 PCI ID from include/pci_ids/i965_pci_ids.h. Yeah, got that but run.py for me says following error so I did not want to try manually go one by one. The amount of tests it runs before error changes between runs. --- 8< --- [tpalli@localhost shader-db]$ ./run.py shaders shaders/humus-domino/1.shader_test vs : 140.073 secs shaders/nexuiz/43.shader_testvs : 250.092 secs shaders/anholt/12.shader_testvs : 80.062 secs Traceback (most recent call last): File "./run.py", line 149, in main() File "./run.py", line 142, in main for t in executor.map(run_test, filenames): File "/usr/lib64/python3.4/concurrent/futures/_base.py", line 549, in result_iterator yield future.result() File "/usr/lib64/python3.4/concurrent/futures/_base.py", line 395, in result return self.__get_result() File "/usr/lib64/python3.4/concurrent/futures/_base.py", line 354, in __get_result raise self._exception File "/usr/lib64/python3.4/concurrent/futures/thread.py", line 54, in run result = self.fn(*self.args, **self.kwargs) File "./run.py", line 95, in run_test counts[current_type] = counts[current_type] + 1 KeyError: 'UNKNOWN' ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5
On 03/16/2015 10:06 AM, Matt Turner wrote: > On Wed, Mar 11, 2015 at 1:44 PM, Ian Romanick wrote: >> From: Ian Romanick >> >> On platforms that do not natively generate 0u and ~0u for Boolean >> results, b2f expressions that look like >> >>f = b2f(expr cmp 0) >> >> will generate better code by pretending the expression is >> >> f = ir_triop_sel(0.0, 1.0, expr cmp 0) >> >> This is because the last instruction of "expr" can generate the >> condition code for the "cmp 0". This avoids having to do the "-(b & 1)" >> trick to generate 0u or ~0u for the Boolean result. This means code like >> >> mov(16) g16<1>F 1F >> mul.ge.f0(16) nullg6<8,8,1>F g14<8,8,1>F >> (+f0) sel(16) m6<1>F g16<8,8,1>F 0F >> >> will be generated instead of >> >> mul(16) g2<1>F g12<8,8,1>F g4<8,8,1>F >> cmp.ge.f0(16) g2<1>D g4<8,8,1>F 0F > > Presumably this g4 should be g2? Probably. I was cutting out of a diff of shader-db results, and I must have botched it. Here's the diff from shaders/anholt/6.shader_test: @@ -129,7 +129,7 @@ ) Native code for unnamed fragment shader 3 -SIMD8 shader: 77 instructions. 0 loops. Compacted 1232 to 832 bytes (32%) +SIMD8 shader: 76 instructions. 0 loops. Compacted 1216 to 816 bytes (33%) START B0 add(8) g9<1>UW g1.4<2,4,0>UW 0x10101010V { align1 }; mov(8) m3<1>F 16F { align1 }; @@ -163,7 +163,7 @@ add(8) g2<1>F g2<8,8,1>F g6<8,8,1>F { align1 compacted }; send(8) 2 g6<1>F g2<8,8,1>F math rsq mlen 1 rlen 1 { align1 }; -mul(8) g2<1>F g5<8,8,1>F g6<8,8,1>F { align1 compacted }; +mul.ge.f0(8)g2<1>F g5<8,8,1>F g6<8,8,1>F { align1 compacted }; mul(8) g5<1>F -g12<8,8,1>F-g12<8,8,1>F{ align1 compacted }; mul(8) g7<1>F -g16<8,8,1>F-g16<8,8,1>F{ align1 compacted }; mul(8) g8<1>F -g15<8,8,1>F-g15<8,8,1>F{ align1 compacted }; @@ -194,14 +194,13 @@ send(8) 2 g3<1>F g3<8,8,1>F math pow mlen 2 rlen 1 { align1 }; mul(8) m6<1>F g11<8,8,1>F g8<8,8,1>F { align1 }; -cmp.ge.f0(8)g4<1>F g2<8,8,1>F 0F { align1 }; +mov(8) g4<1>F 1F { align1 }; mov(8) m2<1>F g6<8,8,1>F { align1 }; mov(8) m3<1>F g7<8,8,1>F { align1 }; add(8) g9<1>F g2<8,8,1>F g3<8,8,1>F { align1 compacted }; -and(8) g8<1>D g4<8,8,1>D 1D { align1 }; +(+f0) sel(8)g8<1>F g4<8,8,1>F 0F { align1 }; send(8) 2 g4<1>UW null sampler (1, 0, 3, 1) mlen 5 rlen 4 { align1 }; -and(8) g8<1>D -g8<8,8,1>D 0x3f80UD{ align1 }; mul(8) g9<1>F g9<8,8,1>F g4<8,8,1>F { align1 compacted }; mul(8) m3<1>F g8<8,8,1>F g9<8,8,1>F { align1 }; mul(8) g9<1>F g2<8,8,1>F 0.7F{ align1 }; I think I can adjust the commit message to: "...This means code like mul.ge.f0(8)g2<1>F g5<8,8,1>F g6<8,8,1>F mov(8) g4<1>F 1F (+f0) sel(8)g8<1>F g4<8,8,1>F 0F will be generated instead of mul(8) g2<1>F g5<8,8,1>F g6<8,8,1>F cmp.ge.f0(8)g4<1>F g2<8,8,1>F 0F and(8) g8<1>D g4<8,8,1>D 1D and(8) g8<1>D -g8<8,8,1>D 0x3f80UD" I'll update the comment in the code too. >> and(16) g4<1>D g2<8,8,1>D 1D >> and(16) m6<1>D -g4<8,8,1>D 0x3f80UD >> >> v2: When the comparison is either == 0.0 or != 0.0 use the knowledge >> that the true (or false) case already results in zero would allow better >> code generation by possibly avoiding a load-immediate instruction. >> >> v3: Apply the optimization even when neither comparitor is zero. >> >> Shader-db results: >> >> GM45 (0x2A42): >> total instructions in shared programs: 3551002 -> 3550829 (-0.00%) >> instructions in affected programs: 33269 -> 33096 (-0.52%) >> helped:121 >> >> Iron Lake (0x0046): >> total instructions in shared programs: 4993327 -> 4993146 (-0.00%) >> instructions in affected programs: 34199 -> 34018 (-0.53%) >> helped:129 >> >> No change on other platforms. >> >> Signed-off-by: Ian Romanick >> Cc: Tapani Palli >> --- >> src/mesa/drivers/dri/i965/brw_fs.h | 2 + >> src/mesa/drivers/dri/i965/brw_f
Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5
On 03/16/2015 09:37 AM, Matt Turner wrote: > On Mon, Mar 16, 2015 at 4:54 AM, Tapani Pälli wrote: >> Is there some particular Piglit test case that hits this path and is it >> possible with gen>5 (by removing gen check)? I've tried this with >> handicrafted shader_test and also shader-db and cannot hit the conditions >> for changes to happen. Would be nice to be able to run examine changes and >> understand this better. > > Well, from the shader-db stats there must be some shaders affected in > shader-db. For example, shaders/anholt/6.shader_test was helped. That appears to be the smallest (and one of the only open-source) shaders that is helped. helped: shaders/anholt/6.shader_test FS SIMD16: 86 -> 85 (-1.16%) helped: shaders/anholt/6.shader_test FS SIMD8:77 -> 76 (-1.30%) The part of that shader that generates the code we care about is: gl_FragColor = step(0.0, n_dot_l) * vec4((diffuse + vec3(specular)) * shadow, material_color.w); It should be easy enough to create a test using step() that generates similar code on GEN4 and GEN5. > Use INTEL_DEVID_OVERRIDE=... with a Gen4 or Gen5 PCI ID from > include/pci_ids/i965_pci_ids.h. That's part of the reason I include the platform name and the PCI ID in the shader-db log in the commit message. :) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5
On Wed, Mar 11, 2015 at 1:44 PM, Ian Romanick wrote: > From: Ian Romanick > > On platforms that do not natively generate 0u and ~0u for Boolean > results, b2f expressions that look like > >f = b2f(expr cmp 0) > > will generate better code by pretending the expression is > > f = ir_triop_sel(0.0, 1.0, expr cmp 0) > > This is because the last instruction of "expr" can generate the > condition code for the "cmp 0". This avoids having to do the "-(b & 1)" > trick to generate 0u or ~0u for the Boolean result. This means code like > > mov(16) g16<1>F 1F > mul.ge.f0(16) nullg6<8,8,1>F g14<8,8,1>F > (+f0) sel(16) m6<1>F g16<8,8,1>F 0F > > will be generated instead of > > mul(16) g2<1>F g12<8,8,1>F g4<8,8,1>F > cmp.ge.f0(16) g2<1>D g4<8,8,1>F 0F Presumably this g4 should be g2? > and(16) g4<1>D g2<8,8,1>D 1D > and(16) m6<1>D -g4<8,8,1>D 0x3f80UD > > v2: When the comparison is either == 0.0 or != 0.0 use the knowledge > that the true (or false) case already results in zero would allow better > code generation by possibly avoiding a load-immediate instruction. > > v3: Apply the optimization even when neither comparitor is zero. > > Shader-db results: > > GM45 (0x2A42): > total instructions in shared programs: 3551002 -> 3550829 (-0.00%) > instructions in affected programs: 33269 -> 33096 (-0.52%) > helped:121 > > Iron Lake (0x0046): > total instructions in shared programs: 4993327 -> 4993146 (-0.00%) > instructions in affected programs: 34199 -> 34018 (-0.53%) > helped:129 > > No change on other platforms. > > Signed-off-by: Ian Romanick > Cc: Tapani Palli > --- > src/mesa/drivers/dri/i965/brw_fs.h | 2 + > src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 101 > +-- > 2 files changed, 99 insertions(+), 4 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.h > b/src/mesa/drivers/dri/i965/brw_fs.h > index d9d5858..075e90c 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.h > +++ b/src/mesa/drivers/dri/i965/brw_fs.h > @@ -307,6 +307,7 @@ public: > const fs_reg &a); > void emit_minmax(enum brw_conditional_mod conditionalmod, const fs_reg > &dst, > const fs_reg &src0, const fs_reg &src1); > + bool try_emit_b2f_of_comparison(ir_expression *ir); > bool try_emit_saturate(ir_expression *ir); > bool try_emit_line(ir_expression *ir); > bool try_emit_mad(ir_expression *ir); > @@ -317,6 +318,7 @@ public: > bool opt_saturate_propagation(); > bool opt_cmod_propagation(); > void emit_bool_to_cond_code(ir_rvalue *condition); > + void emit_bool_to_cond_code_of_reg(ir_expression *expr, fs_reg op[3]); > void emit_if_gen6(ir_if *ir); > void emit_unspill(bblock_t *block, fs_inst *inst, fs_reg reg, > uint32_t spill_offset, int count); > diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > index 3025a9d..3d79796 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > @@ -475,6 +475,87 @@ fs_visitor::try_emit_mad(ir_expression *ir) > return true; > } > > +bool > +fs_visitor::try_emit_b2f_of_comparison(ir_expression *ir) > +{ > + /* On platforms that do not natively generate 0u and ~0u for Boolean > +* results, b2f expressions that look like > +* > +* f = b2f(expr cmp 0) > +* > +* will generate better code by pretending the expression is > +* > +* f = ir_triop_csel(0.0, 1.0, expr cmp 0) > +* > +* This is because the last instruction of "expr" can generate the > +* condition code for the "cmp 0". This avoids having to do the "-(b & > 1)" > +* trick to generate 0u or ~0u for the Boolean result. This means code > like > +* > +* mov(16) g16<1>F 1F > +* mul.ge.f0(16) nullg6<8,8,1>F g14<8,8,1>F > +* (+f0) sel(16) m6<1>F g16<8,8,1>F 0F > +* > +* will be generated instead of > +* > +* mul(16) g2<1>F g12<8,8,1>F g4<8,8,1>F > +* cmp.ge.f0(16) g2<1>D g4<8,8,1>F 0F > +* and(16) g4<1>D g2<8,8,1>D 1D > +* and(16) m6<1>D -g4<8,8,1>D 0x3f80UD > +* > +* When the comparison is either == 0.0 or != 0.0 using the knowledge that > +* the true (or false) case already results in zero would allow better > code > +* generation by possibly avoiding a load-immediate instruction. > +*/ > + ir_expression *cmp = ir->operands[0]->as_expression(); > + if (cmp == NULL) > + return false; > + > + if (cmp->operation == ir_binop_equal || cmp->operation == > ir_binop_nequal) { > +
Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5
On Mon, Mar 16, 2015 at 4:54 AM, Tapani Pälli wrote: > Is there some particular Piglit test case that hits this path and is it > possible with gen>5 (by removing gen check)? I've tried this with > handicrafted shader_test and also shader-db and cannot hit the conditions > for changes to happen. Would be nice to be able to run examine changes and > understand this better. Well, from the shader-db stats there must be some shaders affected in shader-db. Use INTEL_DEVID_OVERRIDE=... with a Gen4 or Gen5 PCI ID from include/pci_ids/i965_pci_ids.h. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5
Hi Ian; Is there some particular Piglit test case that hits this path and is it possible with gen>5 (by removing gen check)? I've tried this with handicrafted shader_test and also shader-db and cannot hit the conditions for changes to happen. Would be nice to be able to run examine changes and understand this better. On 03/11/2015 10:44 PM, Ian Romanick wrote: From: Ian Romanick On platforms that do not natively generate 0u and ~0u for Boolean results, b2f expressions that look like f = b2f(expr cmp 0) will generate better code by pretending the expression is f = ir_triop_sel(0.0, 1.0, expr cmp 0) This is because the last instruction of "expr" can generate the condition code for the "cmp 0". This avoids having to do the "-(b & 1)" trick to generate 0u or ~0u for the Boolean result. This means code like mov(16) g16<1>F 1F mul.ge.f0(16) nullg6<8,8,1>F g14<8,8,1>F (+f0) sel(16) m6<1>F g16<8,8,1>F 0F will be generated instead of mul(16) g2<1>F g12<8,8,1>F g4<8,8,1>F cmp.ge.f0(16) g2<1>D g4<8,8,1>F 0F and(16) g4<1>D g2<8,8,1>D 1D and(16) m6<1>D -g4<8,8,1>D 0x3f80UD v2: When the comparison is either == 0.0 or != 0.0 use the knowledge that the true (or false) case already results in zero would allow better code generation by possibly avoiding a load-immediate instruction. v3: Apply the optimization even when neither comparitor is zero. Shader-db results: GM45 (0x2A42): total instructions in shared programs: 3551002 -> 3550829 (-0.00%) instructions in affected programs: 33269 -> 33096 (-0.52%) helped:121 Iron Lake (0x0046): total instructions in shared programs: 4993327 -> 4993146 (-0.00%) instructions in affected programs: 34199 -> 34018 (-0.53%) helped:129 No change on other platforms. Signed-off-by: Ian Romanick Cc: Tapani Palli --- src/mesa/drivers/dri/i965/brw_fs.h | 2 + src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 101 +-- 2 files changed, 99 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index d9d5858..075e90c 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -307,6 +307,7 @@ public: const fs_reg &a); void emit_minmax(enum brw_conditional_mod conditionalmod, const fs_reg &dst, const fs_reg &src0, const fs_reg &src1); + bool try_emit_b2f_of_comparison(ir_expression *ir); bool try_emit_saturate(ir_expression *ir); bool try_emit_line(ir_expression *ir); bool try_emit_mad(ir_expression *ir); @@ -317,6 +318,7 @@ public: bool opt_saturate_propagation(); bool opt_cmod_propagation(); void emit_bool_to_cond_code(ir_rvalue *condition); + void emit_bool_to_cond_code_of_reg(ir_expression *expr, fs_reg op[3]); void emit_if_gen6(ir_if *ir); void emit_unspill(bblock_t *block, fs_inst *inst, fs_reg reg, uint32_t spill_offset, int count); diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 3025a9d..3d79796 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -475,6 +475,87 @@ fs_visitor::try_emit_mad(ir_expression *ir) return true; } +bool +fs_visitor::try_emit_b2f_of_comparison(ir_expression *ir) +{ + /* On platforms that do not natively generate 0u and ~0u for Boolean +* results, b2f expressions that look like +* +* f = b2f(expr cmp 0) +* +* will generate better code by pretending the expression is +* +* f = ir_triop_csel(0.0, 1.0, expr cmp 0) +* +* This is because the last instruction of "expr" can generate the +* condition code for the "cmp 0". This avoids having to do the "-(b & 1)" +* trick to generate 0u or ~0u for the Boolean result. This means code like +* +* mov(16) g16<1>F 1F +* mul.ge.f0(16) nullg6<8,8,1>F g14<8,8,1>F +* (+f0) sel(16) m6<1>F g16<8,8,1>F 0F +* +* will be generated instead of +* +* mul(16) g2<1>F g12<8,8,1>F g4<8,8,1>F +* cmp.ge.f0(16) g2<1>D g4<8,8,1>F 0F +* and(16) g4<1>D g2<8,8,1>D 1D +* and(16) m6<1>D -g4<8,8,1>D 0x3f80UD +* +* When the comparison is either == 0.0 or != 0.0 using the knowledge that +* the true (or false) case already results in zero would allow better code +* generation by possibly avoiding a load-immediate instruction. +*/ + ir_expression *cmp = ir->operands[0]->as_expression(); + if (cmp == NULL) + return false; + + if (cmp-