Re: [Mesa-dev] [PATCH 4/7] i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode.

2015-05-04 Thread Francisco Jerez
Kenneth Graunke kenn...@whitecape.org writes:

 On Thursday, April 30, 2015 04:59:49 PM Francisco Jerez wrote:
 Matt Turner matts...@gmail.com writes:
 
  On Fri, Feb 20, 2015 at 11:49 AM, Francisco Jerez curroje...@riseup.net 
  wrote:
  ---
   src/mesa/drivers/dri/i965/brw_fs.cpp   | 49 
  ++
   src/mesa/drivers/dri/i965/brw_fs.h |  1 +
   src/mesa/drivers/dri/i965/brw_fs_cse.cpp   |  1 +
   src/mesa/drivers/dri/i965/brw_vec4.cpp | 41 +
   src/mesa/drivers/dri/i965/brw_vec4.h   |  1 +
   src/mesa/drivers/dri/i965/brw_vec4_cse.cpp |  1 +
   6 files changed, 94 insertions(+)
 
  diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
  b/src/mesa/drivers/dri/i965/brw_fs.cpp
  index d567c2b..4537900 100644
  --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
  +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
  @@ -2734,6 +2734,54 @@ fs_visitor::compute_to_mrf()
   }
 
   /**
  + * Eliminate FIND_LIVE_CHANNEL instructions occurring outside any control
  + * flow.  We could probably do better here with some form of divergence
  + * analysis.
  + */
  +bool
  +fs_visitor::eliminate_find_live_channel()
  +{
  +   bool progress = false;
  +   unsigned depth = 0;
  +
  +   foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
  +  switch (inst-opcode) {
  +  case BRW_OPCODE_IF:
  +  case BRW_OPCODE_DO:
  + depth++;
  + break;
  +
  +  case BRW_OPCODE_ENDIF:
  +  case BRW_OPCODE_WHILE:
  + depth--;
  + break;
  +
  +  case FS_OPCODE_DISCARD_JUMP:
  + /* This can potentially make control flow non-uniform until the 
  end
  +  * of the program.
  +  */
  + depth++;
  + break;

 Why not just return progress at this point?  depth++ just makes it
 effectively never do anything again, but continue iterating over the
 rest of the program.

Heh, indeed.

 Otherwise, this all seems good to me.  This patch is
 Reviewed-by: Kenneth Graunke kenn...@whitecape.org

Thank you both for having a look. :)

  +
  +  case SHADER_OPCODE_FIND_LIVE_CHANNEL:
  + if (depth == 0) {
  +inst-opcode = BRW_OPCODE_MOV;
  +inst-src[0] = fs_reg(0);
 
  How does this work if we're on a primitive edge and channel zero is
  disabled? Does the EU not actually disable those channels except in
  the framebuffer write or something?
 
 Yes, exactly, samples outside the primitive (or samples discarded by the
 early depth/stencil test) have the corresponding execution mask bit
 enabled, otherwise it wouldn't be possible to calculate derivatives near
 the edge of a primitive.  Whole subspans not covered by the primitive or
 dropped by early fragment tests are never dispatched to any PS thread,
 so channel 0 should always have valid contents except when disabled by
 control flow.
 
  +inst-sources = 1;
  +inst-force_writemask_all = true;
  +progress = true;
  + }
  + break;
  +
  +  default:
  + break;
  +  }
  +   }
  +
  +   return progress;
  +}
 
  I wouldn't mind if this was just part of opt_algebraic. Less code that
  way and one fewer pass over the instruction list. What do you think,
  Ken?
 
 I shortly considered doing that but finally decided not to because:
  - It's not an algebraic instruction.
  - The optimization is non-local unlike the other fully self-contained
algebraic optimizations, as it needs to keep track of the control
flow nesting level.


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/7] i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode.

2015-04-30 Thread Francisco Jerez
Matt Turner matts...@gmail.com writes:

 On Fri, Feb 20, 2015 at 11:49 AM, Francisco Jerez curroje...@riseup.net 
 wrote:
 ---
  src/mesa/drivers/dri/i965/brw_fs.cpp   | 49 
 ++
  src/mesa/drivers/dri/i965/brw_fs.h |  1 +
  src/mesa/drivers/dri/i965/brw_fs_cse.cpp   |  1 +
  src/mesa/drivers/dri/i965/brw_vec4.cpp | 41 +
  src/mesa/drivers/dri/i965/brw_vec4.h   |  1 +
  src/mesa/drivers/dri/i965/brw_vec4_cse.cpp |  1 +
  6 files changed, 94 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs.cpp
 index d567c2b..4537900 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
 @@ -2734,6 +2734,54 @@ fs_visitor::compute_to_mrf()
  }

  /**
 + * Eliminate FIND_LIVE_CHANNEL instructions occurring outside any control
 + * flow.  We could probably do better here with some form of divergence
 + * analysis.
 + */
 +bool
 +fs_visitor::eliminate_find_live_channel()
 +{
 +   bool progress = false;
 +   unsigned depth = 0;
 +
 +   foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
 +  switch (inst-opcode) {
 +  case BRW_OPCODE_IF:
 +  case BRW_OPCODE_DO:
 + depth++;
 + break;
 +
 +  case BRW_OPCODE_ENDIF:
 +  case BRW_OPCODE_WHILE:
 + depth--;
 + break;
 +
 +  case FS_OPCODE_DISCARD_JUMP:
 + /* This can potentially make control flow non-uniform until the end
 +  * of the program.
 +  */
 + depth++;
 + break;
 +
 +  case SHADER_OPCODE_FIND_LIVE_CHANNEL:
 + if (depth == 0) {
 +inst-opcode = BRW_OPCODE_MOV;
 +inst-src[0] = fs_reg(0);

 How does this work if we're on a primitive edge and channel zero is
 disabled? Does the EU not actually disable those channels except in
 the framebuffer write or something?

Yes, exactly, samples outside the primitive (or samples discarded by the
early depth/stencil test) have the corresponding execution mask bit
enabled, otherwise it wouldn't be possible to calculate derivatives near
the edge of a primitive.  Whole subspans not covered by the primitive or
dropped by early fragment tests are never dispatched to any PS thread,
so channel 0 should always have valid contents except when disabled by
control flow.

 +inst-sources = 1;
 +inst-force_writemask_all = true;
 +progress = true;
 + }
 + break;
 +
 +  default:
 + break;
 +  }
 +   }
 +
 +   return progress;
 +}

 I wouldn't mind if this was just part of opt_algebraic. Less code that
 way and one fewer pass over the instruction list. What do you think,
 Ken?

I shortly considered doing that but finally decided not to because:
 - It's not an algebraic instruction.
 - The optimization is non-local unlike the other fully self-contained
   algebraic optimizations, as it needs to keep track of the control
   flow nesting level.


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/7] i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode.

2015-04-30 Thread Kenneth Graunke
On Thursday, April 30, 2015 04:59:49 PM Francisco Jerez wrote:
 Matt Turner matts...@gmail.com writes:
 
  On Fri, Feb 20, 2015 at 11:49 AM, Francisco Jerez curroje...@riseup.net 
  wrote:
  ---
   src/mesa/drivers/dri/i965/brw_fs.cpp   | 49 
  ++
   src/mesa/drivers/dri/i965/brw_fs.h |  1 +
   src/mesa/drivers/dri/i965/brw_fs_cse.cpp   |  1 +
   src/mesa/drivers/dri/i965/brw_vec4.cpp | 41 +
   src/mesa/drivers/dri/i965/brw_vec4.h   |  1 +
   src/mesa/drivers/dri/i965/brw_vec4_cse.cpp |  1 +
   6 files changed, 94 insertions(+)
 
  diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
  b/src/mesa/drivers/dri/i965/brw_fs.cpp
  index d567c2b..4537900 100644
  --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
  +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
  @@ -2734,6 +2734,54 @@ fs_visitor::compute_to_mrf()
   }
 
   /**
  + * Eliminate FIND_LIVE_CHANNEL instructions occurring outside any control
  + * flow.  We could probably do better here with some form of divergence
  + * analysis.
  + */
  +bool
  +fs_visitor::eliminate_find_live_channel()
  +{
  +   bool progress = false;
  +   unsigned depth = 0;
  +
  +   foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
  +  switch (inst-opcode) {
  +  case BRW_OPCODE_IF:
  +  case BRW_OPCODE_DO:
  + depth++;
  + break;
  +
  +  case BRW_OPCODE_ENDIF:
  +  case BRW_OPCODE_WHILE:
  + depth--;
  + break;
  +
  +  case FS_OPCODE_DISCARD_JUMP:
  + /* This can potentially make control flow non-uniform until the 
  end
  +  * of the program.
  +  */
  + depth++;
  + break;

Why not just return progress at this point?  depth++ just makes it
effectively never do anything again, but continue iterating over the
rest of the program.

Otherwise, this all seems good to me.  This patch is
Reviewed-by: Kenneth Graunke kenn...@whitecape.org

  +
  +  case SHADER_OPCODE_FIND_LIVE_CHANNEL:
  + if (depth == 0) {
  +inst-opcode = BRW_OPCODE_MOV;
  +inst-src[0] = fs_reg(0);
 
  How does this work if we're on a primitive edge and channel zero is
  disabled? Does the EU not actually disable those channels except in
  the framebuffer write or something?
 
 Yes, exactly, samples outside the primitive (or samples discarded by the
 early depth/stencil test) have the corresponding execution mask bit
 enabled, otherwise it wouldn't be possible to calculate derivatives near
 the edge of a primitive.  Whole subspans not covered by the primitive or
 dropped by early fragment tests are never dispatched to any PS thread,
 so channel 0 should always have valid contents except when disabled by
 control flow.
 
  +inst-sources = 1;
  +inst-force_writemask_all = true;
  +progress = true;
  + }
  + break;
  +
  +  default:
  + break;
  +  }
  +   }
  +
  +   return progress;
  +}
 
  I wouldn't mind if this was just part of opt_algebraic. Less code that
  way and one fewer pass over the instruction list. What do you think,
  Ken?
 
 I shortly considered doing that but finally decided not to because:
  - It's not an algebraic instruction.
  - The optimization is non-local unlike the other fully self-contained
algebraic optimizations, as it needs to keep track of the control
flow nesting level.


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/7] i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode.

2015-04-29 Thread Matt Turner
On Fri, Feb 20, 2015 at 11:49 AM, Francisco Jerez curroje...@riseup.net wrote:
 ---
  src/mesa/drivers/dri/i965/brw_fs.cpp   | 49 
 ++
  src/mesa/drivers/dri/i965/brw_fs.h |  1 +
  src/mesa/drivers/dri/i965/brw_fs_cse.cpp   |  1 +
  src/mesa/drivers/dri/i965/brw_vec4.cpp | 41 +
  src/mesa/drivers/dri/i965/brw_vec4.h   |  1 +
  src/mesa/drivers/dri/i965/brw_vec4_cse.cpp |  1 +
  6 files changed, 94 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs.cpp
 index d567c2b..4537900 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
 @@ -2734,6 +2734,54 @@ fs_visitor::compute_to_mrf()
  }

  /**
 + * Eliminate FIND_LIVE_CHANNEL instructions occurring outside any control
 + * flow.  We could probably do better here with some form of divergence
 + * analysis.
 + */
 +bool
 +fs_visitor::eliminate_find_live_channel()
 +{
 +   bool progress = false;
 +   unsigned depth = 0;
 +
 +   foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
 +  switch (inst-opcode) {
 +  case BRW_OPCODE_IF:
 +  case BRW_OPCODE_DO:
 + depth++;
 + break;
 +
 +  case BRW_OPCODE_ENDIF:
 +  case BRW_OPCODE_WHILE:
 + depth--;
 + break;
 +
 +  case FS_OPCODE_DISCARD_JUMP:
 + /* This can potentially make control flow non-uniform until the end
 +  * of the program.
 +  */
 + depth++;
 + break;
 +
 +  case SHADER_OPCODE_FIND_LIVE_CHANNEL:
 + if (depth == 0) {
 +inst-opcode = BRW_OPCODE_MOV;
 +inst-src[0] = fs_reg(0);

How does this work if we're on a primitive edge and channel zero is
disabled? Does the EU not actually disable those channels except in
the framebuffer write or something?

 +inst-sources = 1;
 +inst-force_writemask_all = true;
 +progress = true;
 + }
 + break;
 +
 +  default:
 + break;
 +  }
 +   }
 +
 +   return progress;
 +}

I wouldn't mind if this was just part of opt_algebraic. Less code that
way and one fewer pass over the instruction list. What do you think,
Ken?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/7] i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode.

2015-02-20 Thread Francisco Jerez
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 49 ++
 src/mesa/drivers/dri/i965/brw_fs.h |  1 +
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp   |  1 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 41 +
 src/mesa/drivers/dri/i965/brw_vec4.h   |  1 +
 src/mesa/drivers/dri/i965/brw_vec4_cse.cpp |  1 +
 6 files changed, 94 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index d567c2b..4537900 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2734,6 +2734,54 @@ fs_visitor::compute_to_mrf()
 }
 
 /**
+ * Eliminate FIND_LIVE_CHANNEL instructions occurring outside any control
+ * flow.  We could probably do better here with some form of divergence
+ * analysis.
+ */
+bool
+fs_visitor::eliminate_find_live_channel()
+{
+   bool progress = false;
+   unsigned depth = 0;
+
+   foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
+  switch (inst-opcode) {
+  case BRW_OPCODE_IF:
+  case BRW_OPCODE_DO:
+ depth++;
+ break;
+
+  case BRW_OPCODE_ENDIF:
+  case BRW_OPCODE_WHILE:
+ depth--;
+ break;
+
+  case FS_OPCODE_DISCARD_JUMP:
+ /* This can potentially make control flow non-uniform until the end
+  * of the program.
+  */
+ depth++;
+ break;
+
+  case SHADER_OPCODE_FIND_LIVE_CHANNEL:
+ if (depth == 0) {
+inst-opcode = BRW_OPCODE_MOV;
+inst-src[0] = fs_reg(0);
+inst-sources = 1;
+inst-force_writemask_all = true;
+progress = true;
+ }
+ break;
+
+  default:
+ break;
+  }
+   }
+
+   return progress;
+}
+
+/**
  * Once we've generated code, try to convert normal FS_OPCODE_FB_WRITE
  * instructions to FS_OPCODE_REP_FB_WRITE.
  */
@@ -3678,6 +3726,7 @@ fs_visitor::optimize()
   OPT(opt_saturate_propagation);
   OPT(register_coalesce);
   OPT(compute_to_mrf);
+  OPT(eliminate_find_live_channel);
 
   OPT(compact_virtual_grfs);
} while (progress);
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 9375412..6e5fa1d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -228,6 +228,7 @@ public:
bool opt_register_renaming();
bool register_coalesce();
bool compute_to_mrf();
+   bool eliminate_find_live_channel();
bool dead_code_eliminate();
bool remove_duplicate_mrf_writes();
bool virtual_grf_interferes(int a, int b);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
index bbe0747..04a073e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
@@ -89,6 +89,7 @@ is_expression(const fs_inst *const inst)
case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD:
case FS_OPCODE_CINTERP:
case FS_OPCODE_LINTERP:
+   case SHADER_OPCODE_FIND_LIVE_CHANNEL:
case SHADER_OPCODE_BROADCAST:
   return true;
case SHADER_OPCODE_RCP:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 601b8c1..219c05e 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1250,6 +1250,46 @@ vec4_visitor::opt_register_coalesce()
 }
 
 /**
+ * Eliminate FIND_LIVE_CHANNEL instructions occurring outside any control
+ * flow.  We could probably do better here with some form of divergence
+ * analysis.
+ */
+bool
+vec4_visitor::eliminate_find_live_channel()
+{
+   bool progress = false;
+   unsigned depth = 0;
+
+   foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) {
+  switch (inst-opcode) {
+  case BRW_OPCODE_IF:
+  case BRW_OPCODE_DO:
+ depth++;
+ break;
+
+  case BRW_OPCODE_ENDIF:
+  case BRW_OPCODE_WHILE:
+ depth--;
+ break;
+
+  case SHADER_OPCODE_FIND_LIVE_CHANNEL:
+ if (depth == 0) {
+inst-opcode = BRW_OPCODE_MOV;
+inst-src[0] = src_reg(0);
+inst-force_writemask_all = true;
+progress = true;
+ }
+ break;
+
+  default:
+ break;
+  }
+   }
+
+   return progress;
+}
+
+/**
  * Splits virtual GRFs requesting more than one contiguous physical register.
  *
  * We initially create large virtual GRFs for temporary structures, arrays,
@@ -1880,6 +1920,7 @@ vec4_visitor::run()
   OPT(opt_cse);
   OPT(opt_algebraic);
   OPT(opt_register_coalesce);
+  OPT(eliminate_find_live_channel);
} while (progress);
 
pass_num = 0;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index a24f843..7a92d59 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -207,6 +207,7 @@ public:
bool opt_cse();
bool opt_algebraic();
bool