Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5

2015-03-19 Thread Kenneth Graunke
On Thursday, March 19, 2015 04:21:25 PM Eero Tamminen wrote:
> Hi,
> 
> On 03/16/2015 06:37 PM, Matt Turner wrote:
> > On Mon, Mar 16, 2015 at 4:54 AM, Tapani Pälli  
> > wrote:
> >> Is there some particular Piglit test case that hits this path and is it
> >> possible with gen>5 (by removing gen check)? I've tried this with
> >> handicrafted shader_test and also shader-db and cannot hit the conditions
> >> for changes to happen. Would be nice to be able to run examine changes and
> >> understand this better.
> >
> > Well, from the shader-db stats there must be some shaders affected in 
> > shader-db.
> >
> > Use INTEL_DEVID_OVERRIDE=... with a Gen4 or Gen5 PCI ID from
> > include/pci_ids/i965_pci_ids.h.
> 
> And disable drawing to avoid issues from mismatched GPU output?
> 
> (With shader cache, I guess these could be used to implement "offline 
> compiler" for a build machine running on different GEN from target 
> machine. :-))
> 
> 
>   - Eero

libdrm handles the INTEL_DEVID_OVERRIDE variable.  If the PCI ID doesn't
match the actual one on your system, it skips executing batches.

So yeah, that just happens automatically.


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5

2015-03-19 Thread Ian Romanick
On 03/19/2015 07:21 AM, Eero Tamminen wrote:
> Hi,
> 
> On 03/16/2015 06:37 PM, Matt Turner wrote:
>> On Mon, Mar 16, 2015 at 4:54 AM, Tapani Pälli 
>> wrote:
>>> Is there some particular Piglit test case that hits this path and is it
>>> possible with gen>5 (by removing gen check)? I've tried this with
>>> handicrafted shader_test and also shader-db and cannot hit the
>>> conditions
>>> for changes to happen. Would be nice to be able to run examine
>>> changes and
>>> understand this better.
>>
>> Well, from the shader-db stats there must be some shaders affected in
>> shader-db.
>>
>> Use INTEL_DEVID_OVERRIDE=... with a Gen4 or Gen5 PCI ID from
>> include/pci_ids/i965_pci_ids.h.
> 
> And disable drawing to avoid issues from mismatched GPU output?
> 
> (With shader cache, I guess these could be used to implement "offline
> compiler" for a build machine running on different GEN from target
> machine. :-))

Shader-db doesn't do anything other than compile and link shaders, so
there's no drawing to disable.  As you will see in the commit messages
of a series that should go out today, I've been collection shader-db for
all platforms on my IVB system for quite some time.

It seems that Jason has added an option to shader-db to force a specific
platform via -p.  Even easier!

> - Eero
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5

2015-03-19 Thread Eero Tamminen

Hi,

On 03/16/2015 06:37 PM, Matt Turner wrote:

On Mon, Mar 16, 2015 at 4:54 AM, Tapani Pälli  wrote:

Is there some particular Piglit test case that hits this path and is it
possible with gen>5 (by removing gen check)? I've tried this with
handicrafted shader_test and also shader-db and cannot hit the conditions
for changes to happen. Would be nice to be able to run examine changes and
understand this better.


Well, from the shader-db stats there must be some shaders affected in shader-db.

Use INTEL_DEVID_OVERRIDE=... with a Gen4 or Gen5 PCI ID from
include/pci_ids/i965_pci_ids.h.


And disable drawing to avoid issues from mismatched GPU output?

(With shader cache, I guess these could be used to implement "offline 
compiler" for a build machine running on different GEN from target 
machine. :-))



- Eero

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5

2015-03-18 Thread Tapani

Reviewed-by: Tapani Pälli 

On 03/11/2015 10:44 PM, Ian Romanick wrote:

From: Ian Romanick 

On platforms that do not natively generate 0u and ~0u for Boolean
results, b2f expressions that look like

f = b2f(expr cmp 0)

will generate better code by pretending the expression is

 f = ir_triop_sel(0.0, 1.0, expr cmp 0)

This is because the last instruction of "expr" can generate the
condition code for the "cmp 0".  This avoids having to do the "-(b & 1)"
trick to generate 0u or ~0u for the Boolean result.  This means code like

 mov(16) g16<1>F 1F
 mul.ge.f0(16)   nullg6<8,8,1>F  g14<8,8,1>F
 (+f0) sel(16)   m6<1>F  g16<8,8,1>F 0F

will be generated instead of

 mul(16) g2<1>F  g12<8,8,1>F g4<8,8,1>F
 cmp.ge.f0(16)   g2<1>D  g4<8,8,1>F  0F
 and(16) g4<1>D  g2<8,8,1>D  1D
 and(16) m6<1>D  -g4<8,8,1>D 0x3f80UD

v2: When the comparison is either == 0.0 or != 0.0 use the knowledge
that the true (or false) case already results in zero would allow better
code generation by possibly avoiding a load-immediate instruction.

v3: Apply the optimization even when neither comparitor is zero.

Shader-db results:

GM45 (0x2A42):
total instructions in shared programs: 3551002 -> 3550829 (-0.00%)
instructions in affected programs: 33269 -> 33096 (-0.52%)
helped:121

Iron Lake (0x0046):
total instructions in shared programs: 4993327 -> 4993146 (-0.00%)
instructions in affected programs: 34199 -> 34018 (-0.53%)
helped:129

No change on other platforms.

Signed-off-by: Ian Romanick 
Cc: Tapani Palli 
---
  src/mesa/drivers/dri/i965/brw_fs.h   |   2 +
  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 101 +--
  2 files changed, 99 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index d9d5858..075e90c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -307,6 +307,7 @@ public:
   const fs_reg &a);
 void emit_minmax(enum brw_conditional_mod conditionalmod, const fs_reg 
&dst,
  const fs_reg &src0, const fs_reg &src1);
+   bool try_emit_b2f_of_comparison(ir_expression *ir);
 bool try_emit_saturate(ir_expression *ir);
 bool try_emit_line(ir_expression *ir);
 bool try_emit_mad(ir_expression *ir);
@@ -317,6 +318,7 @@ public:
 bool opt_saturate_propagation();
 bool opt_cmod_propagation();
 void emit_bool_to_cond_code(ir_rvalue *condition);
+   void emit_bool_to_cond_code_of_reg(ir_expression *expr, fs_reg op[3]);
 void emit_if_gen6(ir_if *ir);
 void emit_unspill(bblock_t *block, fs_inst *inst, fs_reg reg,
   uint32_t spill_offset, int count);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 3025a9d..3d79796 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -475,6 +475,87 @@ fs_visitor::try_emit_mad(ir_expression *ir)
 return true;
  }
  
+bool

+fs_visitor::try_emit_b2f_of_comparison(ir_expression *ir)
+{
+   /* On platforms that do not natively generate 0u and ~0u for Boolean
+* results, b2f expressions that look like
+*
+* f = b2f(expr cmp 0)
+*
+* will generate better code by pretending the expression is
+*
+* f = ir_triop_csel(0.0, 1.0, expr cmp 0)
+*
+* This is because the last instruction of "expr" can generate the
+* condition code for the "cmp 0".  This avoids having to do the "-(b & 1)"
+* trick to generate 0u or ~0u for the Boolean result.  This means code like
+*
+* mov(16) g16<1>F 1F
+* mul.ge.f0(16)   nullg6<8,8,1>F  g14<8,8,1>F
+* (+f0) sel(16)   m6<1>F  g16<8,8,1>F 0F
+*
+* will be generated instead of
+*
+* mul(16) g2<1>F  g12<8,8,1>F g4<8,8,1>F
+* cmp.ge.f0(16)   g2<1>D  g4<8,8,1>F  0F
+* and(16) g4<1>D  g2<8,8,1>D  1D
+* and(16) m6<1>D  -g4<8,8,1>D 0x3f80UD
+*
+* When the comparison is either == 0.0 or != 0.0 using the knowledge that
+* the true (or false) case already results in zero would allow better code
+* generation by possibly avoiding a load-immediate instruction.
+*/
+   ir_expression *cmp = ir->operands[0]->as_expression();
+   if (cmp == NULL)
+  return false;
+
+   if (cmp->operation == ir_binop_equal || cmp->operation == ir_binop_nequal) {
+  for (unsigned i = 0; i < 2; i++) {
+ ir_constant *c = cmp->operands[i]->as_constant();
+ if (c == NULL || !c->is_zero())
+continue;
+
+ ir_expression *expr = cmp->operands[i ^ 1]->as_e

Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5

2015-03-16 Thread Tapani Pälli


On 03/17/2015 07:42 AM, Tapani Pälli wrote:



On 03/16/2015 06:37 PM, Matt Turner wrote:

On Mon, Mar 16, 2015 at 4:54 AM, Tapani Pälli 
wrote:

Is there some particular Piglit test case that hits this path and is it
possible with gen>5 (by removing gen check)? I've tried this with
handicrafted shader_test and also shader-db and cannot hit the
conditions
for changes to happen. Would be nice to be able to run examine
changes and
understand this better.


Well, from the shader-db stats there must be some shaders affected in
shader-db.

Use INTEL_DEVID_OVERRIDE=... with a Gen4 or Gen5 PCI ID from
include/pci_ids/i965_pci_ids.h.



Yeah, got that but run.py for me says following error so I did not want
to try manually go one by one. The amount of tests it runs before error
changes between runs.


seems I did not have the latest head, did git pull and now shader-db 
works for me, not sure what that error was.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5

2015-03-16 Thread Tapani Pälli



On 03/16/2015 06:37 PM, Matt Turner wrote:

On Mon, Mar 16, 2015 at 4:54 AM, Tapani Pälli  wrote:

Is there some particular Piglit test case that hits this path and is it
possible with gen>5 (by removing gen check)? I've tried this with
handicrafted shader_test and also shader-db and cannot hit the conditions
for changes to happen. Would be nice to be able to run examine changes and
understand this better.


Well, from the shader-db stats there must be some shaders affected in shader-db.

Use INTEL_DEVID_OVERRIDE=... with a Gen4 or Gen5 PCI ID from
include/pci_ids/i965_pci_ids.h.



Yeah, got that but run.py for me says following error so I did not want 
to try manually go one by one. The amount of tests it runs before error 
changes between runs.


--- 8< ---
[tpalli@localhost shader-db]$ ./run.py shaders
shaders/humus-domino/1.shader_test   vs   : 140.073 secs
shaders/nexuiz/43.shader_testvs   : 250.092 secs
shaders/anholt/12.shader_testvs   :  80.062 secs
Traceback (most recent call last):
  File "./run.py", line 149, in 
main()
  File "./run.py", line 142, in main
for t in executor.map(run_test, filenames):
  File "/usr/lib64/python3.4/concurrent/futures/_base.py", line 549, in 
result_iterator

yield future.result()
  File "/usr/lib64/python3.4/concurrent/futures/_base.py", line 395, in 
result

return self.__get_result()
  File "/usr/lib64/python3.4/concurrent/futures/_base.py", line 354, in 
__get_result

raise self._exception
  File "/usr/lib64/python3.4/concurrent/futures/thread.py", line 54, in run
result = self.fn(*self.args, **self.kwargs)
  File "./run.py", line 95, in run_test
counts[current_type] = counts[current_type] + 1
KeyError: 'UNKNOWN'

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5

2015-03-16 Thread Ian Romanick
On 03/16/2015 10:06 AM, Matt Turner wrote:
> On Wed, Mar 11, 2015 at 1:44 PM, Ian Romanick  wrote:
>> From: Ian Romanick 
>>
>> On platforms that do not natively generate 0u and ~0u for Boolean
>> results, b2f expressions that look like
>>
>>f = b2f(expr cmp 0)
>>
>> will generate better code by pretending the expression is
>>
>> f = ir_triop_sel(0.0, 1.0, expr cmp 0)
>>
>> This is because the last instruction of "expr" can generate the
>> condition code for the "cmp 0".  This avoids having to do the "-(b & 1)"
>> trick to generate 0u or ~0u for the Boolean result.  This means code like
>>
>> mov(16) g16<1>F 1F
>> mul.ge.f0(16)   nullg6<8,8,1>F  g14<8,8,1>F
>> (+f0) sel(16)   m6<1>F  g16<8,8,1>F 0F
>>
>> will be generated instead of
>>
>> mul(16) g2<1>F  g12<8,8,1>F g4<8,8,1>F
>> cmp.ge.f0(16)   g2<1>D  g4<8,8,1>F  0F
> 
> Presumably this g4 should be g2?

Probably.  I was cutting out of a diff of shader-db results, and I must
have botched it.  Here's the diff from shaders/anholt/6.shader_test:

@@ -129,7 +129,7 @@
 )
 
 Native code for unnamed fragment shader 3
-SIMD8 shader: 77 instructions. 0 loops. Compacted 1232 to 832 bytes (32%)
+SIMD8 shader: 76 instructions. 0 loops. Compacted 1216 to 816 bytes (33%)
START B0
 add(8)  g9<1>UW g1.4<2,4,0>UW   0x10101010V { align1 };
 mov(8)  m3<1>F  16F { align1 };
@@ -163,7 +163,7 @@
 add(8)  g2<1>F  g2<8,8,1>F  g6<8,8,1>F  { align1 
compacted };
 send(8) 2   g6<1>F  g2<8,8,1>F
 math rsq mlen 1 rlen 1  { 
align1 };
-mul(8)  g2<1>F  g5<8,8,1>F  g6<8,8,1>F  { align1 
compacted };
+mul.ge.f0(8)g2<1>F  g5<8,8,1>F  g6<8,8,1>F  { align1 
compacted };
 mul(8)  g5<1>F  -g12<8,8,1>F-g12<8,8,1>F{ align1 
compacted };
 mul(8)  g7<1>F  -g16<8,8,1>F-g16<8,8,1>F{ align1 
compacted };
 mul(8)  g8<1>F  -g15<8,8,1>F-g15<8,8,1>F{ align1 
compacted };
@@ -194,14 +194,13 @@
 send(8) 2   g3<1>F  g3<8,8,1>F
 math pow mlen 2 rlen 1  { 
align1 };
 mul(8)  m6<1>F  g11<8,8,1>F g8<8,8,1>F  { align1 };
-cmp.ge.f0(8)g4<1>F  g2<8,8,1>F  0F  { align1 };
+mov(8)  g4<1>F  1F  { align1 };
 mov(8)  m2<1>F  g6<8,8,1>F  { align1 };
 mov(8)  m3<1>F  g7<8,8,1>F  { align1 };
 add(8)  g9<1>F  g2<8,8,1>F  g3<8,8,1>F  { align1 
compacted };
-and(8)  g8<1>D  g4<8,8,1>D  1D  { align1 };
+(+f0) sel(8)g8<1>F  g4<8,8,1>F  0F  { align1 };
 send(8) 2   g4<1>UW null
 sampler (1, 0, 3, 1) mlen 5 rlen 4  { 
align1 };
-and(8)  g8<1>D  -g8<8,8,1>D 0x3f80UD{ align1 };
 mul(8)  g9<1>F  g9<8,8,1>F  g4<8,8,1>F  { align1 
compacted };
 mul(8)  m3<1>F  g8<8,8,1>F  g9<8,8,1>F  { align1 };
 mul(8)  g9<1>F  g2<8,8,1>F  0.7F{ align1 };

I think I can adjust the commit message to:

"...This means code like

mul.ge.f0(8)g2<1>F  g5<8,8,1>F  g6<8,8,1>F
mov(8)  g4<1>F  1F
(+f0) sel(8)g8<1>F  g4<8,8,1>F  0F

will be generated instead of

mul(8)  g2<1>F  g5<8,8,1>F  g6<8,8,1>F
cmp.ge.f0(8)g4<1>F  g2<8,8,1>F  0F
and(8)  g8<1>D  g4<8,8,1>D  1D
and(8)  g8<1>D  -g8<8,8,1>D 0x3f80UD"

I'll update the comment in the code too.

>> and(16) g4<1>D  g2<8,8,1>D  1D
>> and(16) m6<1>D  -g4<8,8,1>D 0x3f80UD
>>
>> v2: When the comparison is either == 0.0 or != 0.0 use the knowledge
>> that the true (or false) case already results in zero would allow better
>> code generation by possibly avoiding a load-immediate instruction.
>>
>> v3: Apply the optimization even when neither comparitor is zero.
>>
>> Shader-db results:
>>
>> GM45 (0x2A42):
>> total instructions in shared programs: 3551002 -> 3550829 (-0.00%)
>> instructions in affected programs: 33269 -> 33096 (-0.52%)
>> helped:121
>>
>> Iron Lake (0x0046):
>> total instructions in shared programs: 4993327 -> 4993146 (-0.00%)
>> instructions in affected programs: 34199 -> 34018 (-0.53%)
>> helped:129
>>
>> No change on other platforms.
>>
>> Signed-off-by: Ian Romanick 
>> Cc: Tapani Palli 
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs.h   |   2 +
>>  src/mesa/drivers/dri/i965/brw_f

Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5

2015-03-16 Thread Ian Romanick
On 03/16/2015 09:37 AM, Matt Turner wrote:
> On Mon, Mar 16, 2015 at 4:54 AM, Tapani Pälli  wrote:
>> Is there some particular Piglit test case that hits this path and is it
>> possible with gen>5 (by removing gen check)? I've tried this with
>> handicrafted shader_test and also shader-db and cannot hit the conditions
>> for changes to happen. Would be nice to be able to run examine changes and
>> understand this better.
> 
> Well, from the shader-db stats there must be some shaders affected in 
> shader-db.

For example, shaders/anholt/6.shader_test was helped.  That appears to
be the smallest (and one of the only open-source) shaders that is
helped.

helped:   shaders/anholt/6.shader_test FS SIMD16:   86 -> 85 (-1.16%)
helped:   shaders/anholt/6.shader_test FS SIMD8:77 -> 76 (-1.30%)

The part of that shader that generates the code we care about is:

gl_FragColor = step(0.0, n_dot_l) *
vec4((diffuse + vec3(specular)) * shadow, material_color.w);

It should be easy enough to create a test using step() that generates
similar code on GEN4 and GEN5.

> Use INTEL_DEVID_OVERRIDE=... with a Gen4 or Gen5 PCI ID from
> include/pci_ids/i965_pci_ids.h.

That's part of the reason I include the platform name and the PCI ID in
the shader-db log in the commit message. :)

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5

2015-03-16 Thread Matt Turner
On Wed, Mar 11, 2015 at 1:44 PM, Ian Romanick  wrote:
> From: Ian Romanick 
>
> On platforms that do not natively generate 0u and ~0u for Boolean
> results, b2f expressions that look like
>
>f = b2f(expr cmp 0)
>
> will generate better code by pretending the expression is
>
> f = ir_triop_sel(0.0, 1.0, expr cmp 0)
>
> This is because the last instruction of "expr" can generate the
> condition code for the "cmp 0".  This avoids having to do the "-(b & 1)"
> trick to generate 0u or ~0u for the Boolean result.  This means code like
>
> mov(16) g16<1>F 1F
> mul.ge.f0(16)   nullg6<8,8,1>F  g14<8,8,1>F
> (+f0) sel(16)   m6<1>F  g16<8,8,1>F 0F
>
> will be generated instead of
>
> mul(16) g2<1>F  g12<8,8,1>F g4<8,8,1>F
> cmp.ge.f0(16)   g2<1>D  g4<8,8,1>F  0F

Presumably this g4 should be g2?

> and(16) g4<1>D  g2<8,8,1>D  1D
> and(16) m6<1>D  -g4<8,8,1>D 0x3f80UD
>
> v2: When the comparison is either == 0.0 or != 0.0 use the knowledge
> that the true (or false) case already results in zero would allow better
> code generation by possibly avoiding a load-immediate instruction.
>
> v3: Apply the optimization even when neither comparitor is zero.
>
> Shader-db results:
>
> GM45 (0x2A42):
> total instructions in shared programs: 3551002 -> 3550829 (-0.00%)
> instructions in affected programs: 33269 -> 33096 (-0.52%)
> helped:121
>
> Iron Lake (0x0046):
> total instructions in shared programs: 4993327 -> 4993146 (-0.00%)
> instructions in affected programs: 34199 -> 34018 (-0.53%)
> helped:129
>
> No change on other platforms.
>
> Signed-off-by: Ian Romanick 
> Cc: Tapani Palli 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.h   |   2 +
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 101 
> +--
>  2 files changed, 99 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
> b/src/mesa/drivers/dri/i965/brw_fs.h
> index d9d5858..075e90c 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -307,6 +307,7 @@ public:
>   const fs_reg &a);
> void emit_minmax(enum brw_conditional_mod conditionalmod, const fs_reg 
> &dst,
>  const fs_reg &src0, const fs_reg &src1);
> +   bool try_emit_b2f_of_comparison(ir_expression *ir);
> bool try_emit_saturate(ir_expression *ir);
> bool try_emit_line(ir_expression *ir);
> bool try_emit_mad(ir_expression *ir);
> @@ -317,6 +318,7 @@ public:
> bool opt_saturate_propagation();
> bool opt_cmod_propagation();
> void emit_bool_to_cond_code(ir_rvalue *condition);
> +   void emit_bool_to_cond_code_of_reg(ir_expression *expr, fs_reg op[3]);
> void emit_if_gen6(ir_if *ir);
> void emit_unspill(bblock_t *block, fs_inst *inst, fs_reg reg,
>   uint32_t spill_offset, int count);
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index 3025a9d..3d79796 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -475,6 +475,87 @@ fs_visitor::try_emit_mad(ir_expression *ir)
> return true;
>  }
>
> +bool
> +fs_visitor::try_emit_b2f_of_comparison(ir_expression *ir)
> +{
> +   /* On platforms that do not natively generate 0u and ~0u for Boolean
> +* results, b2f expressions that look like
> +*
> +* f = b2f(expr cmp 0)
> +*
> +* will generate better code by pretending the expression is
> +*
> +* f = ir_triop_csel(0.0, 1.0, expr cmp 0)
> +*
> +* This is because the last instruction of "expr" can generate the
> +* condition code for the "cmp 0".  This avoids having to do the "-(b & 
> 1)"
> +* trick to generate 0u or ~0u for the Boolean result.  This means code 
> like
> +*
> +* mov(16) g16<1>F 1F
> +* mul.ge.f0(16)   nullg6<8,8,1>F  g14<8,8,1>F
> +* (+f0) sel(16)   m6<1>F  g16<8,8,1>F 0F
> +*
> +* will be generated instead of
> +*
> +* mul(16) g2<1>F  g12<8,8,1>F g4<8,8,1>F
> +* cmp.ge.f0(16)   g2<1>D  g4<8,8,1>F  0F
> +* and(16) g4<1>D  g2<8,8,1>D  1D
> +* and(16) m6<1>D  -g4<8,8,1>D 0x3f80UD
> +*
> +* When the comparison is either == 0.0 or != 0.0 using the knowledge that
> +* the true (or false) case already results in zero would allow better 
> code
> +* generation by possibly avoiding a load-immediate instruction.
> +*/
> +   ir_expression *cmp = ir->operands[0]->as_expression();
> +   if (cmp == NULL)
> +  return false;
> +
> +   if (cmp->operation == ir_binop_equal || cmp->operation == 
> ir_binop_nequal) {
> +

Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5

2015-03-16 Thread Matt Turner
On Mon, Mar 16, 2015 at 4:54 AM, Tapani Pälli  wrote:
> Is there some particular Piglit test case that hits this path and is it
> possible with gen>5 (by removing gen check)? I've tried this with
> handicrafted shader_test and also shader-db and cannot hit the conditions
> for changes to happen. Would be nice to be able to run examine changes and
> understand this better.

Well, from the shader-db stats there must be some shaders affected in shader-db.

Use INTEL_DEVID_OVERRIDE=... with a Gen4 or Gen5 PCI ID from
include/pci_ids/i965_pci_ids.h.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/5] i965/fs: Emit better b2f of an expression on GEN4 and GEN5

2015-03-16 Thread Tapani Pälli

Hi Ian;

Is there some particular Piglit test case that hits this path and is it 
possible with gen>5 (by removing gen check)? I've tried this with 
handicrafted shader_test and also shader-db and cannot hit the 
conditions for changes to happen. Would be nice to be able to run 
examine changes and understand this better.



On 03/11/2015 10:44 PM, Ian Romanick wrote:

From: Ian Romanick 

On platforms that do not natively generate 0u and ~0u for Boolean
results, b2f expressions that look like

f = b2f(expr cmp 0)

will generate better code by pretending the expression is

 f = ir_triop_sel(0.0, 1.0, expr cmp 0)

This is because the last instruction of "expr" can generate the
condition code for the "cmp 0".  This avoids having to do the "-(b & 1)"
trick to generate 0u or ~0u for the Boolean result.  This means code like

 mov(16) g16<1>F 1F
 mul.ge.f0(16)   nullg6<8,8,1>F  g14<8,8,1>F
 (+f0) sel(16)   m6<1>F  g16<8,8,1>F 0F

will be generated instead of

 mul(16) g2<1>F  g12<8,8,1>F g4<8,8,1>F
 cmp.ge.f0(16)   g2<1>D  g4<8,8,1>F  0F
 and(16) g4<1>D  g2<8,8,1>D  1D
 and(16) m6<1>D  -g4<8,8,1>D 0x3f80UD

v2: When the comparison is either == 0.0 or != 0.0 use the knowledge
that the true (or false) case already results in zero would allow better
code generation by possibly avoiding a load-immediate instruction.

v3: Apply the optimization even when neither comparitor is zero.

Shader-db results:

GM45 (0x2A42):
total instructions in shared programs: 3551002 -> 3550829 (-0.00%)
instructions in affected programs: 33269 -> 33096 (-0.52%)
helped:121

Iron Lake (0x0046):
total instructions in shared programs: 4993327 -> 4993146 (-0.00%)
instructions in affected programs: 34199 -> 34018 (-0.53%)
helped:129

No change on other platforms.

Signed-off-by: Ian Romanick 
Cc: Tapani Palli 
---
  src/mesa/drivers/dri/i965/brw_fs.h   |   2 +
  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 101 +--
  2 files changed, 99 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index d9d5858..075e90c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -307,6 +307,7 @@ public:
   const fs_reg &a);
 void emit_minmax(enum brw_conditional_mod conditionalmod, const fs_reg 
&dst,
  const fs_reg &src0, const fs_reg &src1);
+   bool try_emit_b2f_of_comparison(ir_expression *ir);
 bool try_emit_saturate(ir_expression *ir);
 bool try_emit_line(ir_expression *ir);
 bool try_emit_mad(ir_expression *ir);
@@ -317,6 +318,7 @@ public:
 bool opt_saturate_propagation();
 bool opt_cmod_propagation();
 void emit_bool_to_cond_code(ir_rvalue *condition);
+   void emit_bool_to_cond_code_of_reg(ir_expression *expr, fs_reg op[3]);
 void emit_if_gen6(ir_if *ir);
 void emit_unspill(bblock_t *block, fs_inst *inst, fs_reg reg,
   uint32_t spill_offset, int count);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 3025a9d..3d79796 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -475,6 +475,87 @@ fs_visitor::try_emit_mad(ir_expression *ir)
 return true;
  }

+bool
+fs_visitor::try_emit_b2f_of_comparison(ir_expression *ir)
+{
+   /* On platforms that do not natively generate 0u and ~0u for Boolean
+* results, b2f expressions that look like
+*
+* f = b2f(expr cmp 0)
+*
+* will generate better code by pretending the expression is
+*
+* f = ir_triop_csel(0.0, 1.0, expr cmp 0)
+*
+* This is because the last instruction of "expr" can generate the
+* condition code for the "cmp 0".  This avoids having to do the "-(b & 1)"
+* trick to generate 0u or ~0u for the Boolean result.  This means code like
+*
+* mov(16) g16<1>F 1F
+* mul.ge.f0(16)   nullg6<8,8,1>F  g14<8,8,1>F
+* (+f0) sel(16)   m6<1>F  g16<8,8,1>F 0F
+*
+* will be generated instead of
+*
+* mul(16) g2<1>F  g12<8,8,1>F g4<8,8,1>F
+* cmp.ge.f0(16)   g2<1>D  g4<8,8,1>F  0F
+* and(16) g4<1>D  g2<8,8,1>D  1D
+* and(16) m6<1>D  -g4<8,8,1>D 0x3f80UD
+*
+* When the comparison is either == 0.0 or != 0.0 using the knowledge that
+* the true (or false) case already results in zero would allow better code
+* generation by possibly avoiding a load-immediate instruction.
+*/
+   ir_expression *cmp = ir->operands[0]->as_expression();
+   if (cmp == NULL)
+  return false;
+
+   if (cmp-