Re: [Mesa-dev] [PATCH] i965: Extend compute-to-mrf pass to understand blocks of MOVs

2014-07-08 Thread Kristian Høgsberg
On Mon, Jun 30, 2014 at 3:14 PM, Matt Turner <matts...@gmail.com> wrote:
> On Fri, Jun 27, 2014 at 12:00 PM, Kristian Høgsberg <k...@bitplanet.net> wrote:
>> From: Kristian Høgsberg <krh@century-sparrow.local>
>
> With your email address fixed,

Done, thanks for the review.  I realized that this also applies to MRT
shaders where we write the same value to two render targets.  From
piglit's fbo-drawbuffers2-blend:

0x0000: pln(8)      g6<1>F     g4<0,1,0>F    g2<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0008: pln(8)      g7<1>F     g4.4<0,1,0>F  g2<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0010: pln(8)      g8<1>F     g5.4<0,1,0>F  g2<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0018: math inv(8) g2<1>F     g8<8,8,1>F    null
    { align1 WE_normal 1Q compacted };
0x0020: mul(8)      g3<1>F     g6<8,8,1>F    g2<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0028: mul(8)      g4<1>F     g7<8,8,1>F    g2<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0030: send(8)     g2<1>UW    g3<8,8,1>F
    sampler (1, 0, 0, 1) mlen 2 rlen 4
    { align1 WE_normal 1Q };
0x0040: mov(8)      g113<1>F   g2<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0048: mov(8)      g114<1>F   g3<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0050: mov(8)      g115<1>F   g4<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0058: mov(8)      g116<1>F   g5<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0060: sendc(8)    null       g113<8,8,1>F
    render RT write SIMD8 LastRT Surface = 0
    mlen 4 rlen 0 { align1 WE_normal 1Q EOT };

becomes:

0x0000: pln(8)      g6<1>F     g4<0,1,0>F    g2<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0008: pln(8)      g8<1>F     g4.4<0,1,0>F  g2<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0010: pln(8)      g9<1>F     g5.4<0,1,0>F  g2<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0018: math inv(8) g7<1>F     g9<8,8,1>F    null
    { align1 WE_normal 1Q compacted };
0x0020: mul(8)      g2<1>F     g6<8,8,1>F    g7<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0028: mul(8)      g3<1>F     g8<8,8,1>F    g7<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0030: send(8)     g113<1>UW  g2<8,8,1>F
    sampler (1, 0, 0, 1) mlen 2 rlen 4
    { align1 WE_normal 1Q };
0x0040: sendc(8)    null       g113<8,8,1>F
    render RT write SIMD8 LastRT Surface = 0
    mlen 4 rlen 0 { align1 WE_normal 1Q EOT };

which is lovely.

Kristian

> Reviewed-by: Matt Turner <matts...@gmail.com>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Extend compute-to-mrf pass to understand blocks of MOVs

2014-07-08 Thread Kristian Høgsberg
On Mon, Jul 7, 2014 at 11:07 PM, Kristian Høgsberg <k...@bitplanet.net> wrote:
> On Mon, Jun 30, 2014 at 3:14 PM, Matt Turner <matts...@gmail.com> wrote:
>> On Fri, Jun 27, 2014 at 12:00 PM, Kristian Høgsberg <k...@bitplanet.net> wrote:
>>> From: Kristian Høgsberg <krh@century-sparrow.local>
>>
>> With your email address fixed,
>
> Done, thanks for the review.  I realized that this also applies to MRT
> shaders where we write the same value to two render targets.  From
> piglit's fbo-drawbuffers2-blend:
>
> 0x0000: pln(8)      g6<1>F     g4<0,1,0>F    g2<8,8,1>F
>     { align1 WE_normal 1Q compacted };
> 0x0008: pln(8)      g7<1>F     g4.4<0,1,0>F  g2<8,8,1>F
>     { align1 WE_normal 1Q compacted };
> 0x0010: pln(8)      g8<1>F     g5.4<0,1,0>F  g2<8,8,1>F
>     { align1 WE_normal 1Q compacted };
> 0x0018: math inv(8) g2<1>F     g8<8,8,1>F    null
>     { align1 WE_normal 1Q compacted };
> 0x0020: mul(8)      g3<1>F     g6<8,8,1>F    g2<8,8,1>F
>     { align1 WE_normal 1Q compacted };
> 0x0028: mul(8)      g4<1>F     g7<8,8,1>F    g2<8,8,1>F
>     { align1 WE_normal 1Q compacted };
> 0x0030: send(8)     g2<1>UW    g3<8,8,1>F
>     sampler (1, 0, 0, 1) mlen 2 rlen 4

Ugh, nevermind that, that's a sampler send there...

>     { align1 WE_normal 1Q };
> 0x0040: mov(8)      g113<1>F   g2<8,8,1>F
>     { align1 WE_normal 1Q compacted };
> 0x0048: mov(8)      g114<1>F   g3<8,8,1>F
>     { align1 WE_normal 1Q compacted };
> 0x0050: mov(8)      g115<1>F   g4<8,8,1>F
>     { align1 WE_normal 1Q compacted };
> 0x0058: mov(8)      g116<1>F   g5<8,8,1>F
>     { align1 WE_normal 1Q compacted };
> 0x0060: sendc(8)    null       g113<8,8,1>F
>     render RT write SIMD8 LastRT Surface = 0
>     mlen 4 rlen 0 { align1 WE_normal 1Q EOT };
>
> becomes:
>
> 0x0000: pln(8)      g6<1>F     g4<0,1,0>F    g2<8,8,1>F
>     { align1 WE_normal 1Q compacted };
> 0x0008: pln(8)      g8<1>F     g4.4<0,1,0>F  g2<8,8,1>F
>     { align1 WE_normal 1Q compacted };
> 0x0010: pln(8)      g9<1>F     g5.4<0,1,0>F  g2<8,8,1>F
>     { align1 WE_normal 1Q compacted };
> 0x0018: math inv(8) g7<1>F     g9<8,8,1>F    null
>     { align1 WE_normal 1Q compacted };
> 0x0020: mul(8)      g2<1>F     g6<8,8,1>F    g7<8,8,1>F
>     { align1 WE_normal 1Q compacted };
> 0x0028: mul(8)      g3<1>F     g8<8,8,1>F    g7<8,8,1>F
>     { align1 WE_normal 1Q compacted };
> 0x0030: send(8)     g113<1>UW  g2<8,8,1>F
>     sampler (1, 0, 0, 1) mlen 2 rlen 4
>     { align1 WE_normal 1Q };
> 0x0040: sendc(8)    null       g113<8,8,1>F
>     render RT write SIMD8 LastRT Surface = 0
>     mlen 4 rlen 0 { align1 WE_normal 1Q EOT };
>
> which is lovely.
>
> Kristian
>
>> Reviewed-by: Matt Turner <matts...@gmail.com>


Re: [Mesa-dev] [PATCH] i965: Extend compute-to-mrf pass to understand blocks of MOVs

2014-06-30 Thread Matt Turner
On Fri, Jun 27, 2014 at 12:00 PM, Kristian Høgsberg <k...@bitplanet.net> wrote:
> From: Kristian Høgsberg <krh@century-sparrow.local>

With your email address fixed,

Reviewed-by: Matt Turner <matts...@gmail.com>


[Mesa-dev] [PATCH] i965: Extend compute-to-mrf pass to understand blocks of MOVs

2014-06-27 Thread Kristian Høgsberg
From: Kristian Høgsberg <krh@century-sparrow.local>

The current compute-to-mrf pass doesn't handle blocks of MOVs.  Shaders
that end with a texture fetch followed by an fb write are left like this:

0x0000: pln(8)      g6<1>F     g4<0,1,0>F    g2<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0008: pln(8)      g7<1>F     g4.4<0,1,0>F  g2<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0010: send(8)     g2<1>UW    g6<8,8,1>F
    sampler (1, 0, 0, 1) mlen 2 rlen 4
    { align1 WE_normal 1Q };
0x0020: mov(8)      g113<1>F   g2<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0028: mov(8)      g114<1>F   g3<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0030: mov(8)      g115<1>F   g4<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0038: mov(8)      g116<1>F   g5<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0040: sendc(8)    null       g113<8,8,1>F
    render ( RT write, 0, 4, 12) mlen 4 rlen 0
    { align1 WE_normal 1Q EOT };

This patch lets compute-to-mrf recognize blocks of MOVs and match them to
instructions (typically SEND) that write multiple registers.  With this,
the above shader becomes:

0x0000: pln(8)      g6<1>F     g4<0,1,0>F    g2<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0008: pln(8)      g7<1>F     g4.4<0,1,0>F  g2<8,8,1>F
    { align1 WE_normal 1Q compacted };
0x0010: send(8)     g113<1>UW  g6<8,8,1>F
    sampler (1, 0, 0, 1) mlen 2 rlen 4
    { align1 WE_normal 1Q };
0x0020: sendc(8)    null       g113<8,8,1>F
    render ( RT write, 0, 20, 12) mlen 4 rlen 0
    { align1 WE_normal 1Q EOT };

which is the bulk of the shader db results:

total instructions in shared programs: 987040 - 986720 (-0.03%)
instructions in affected programs: 844 - 524 (-37.91%)
GAINED:0
LOST:  0
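
The percentages above can be re-derived from the raw instruction counts. A small sketch for checking them; `percent_change` is a hypothetical helper written for this note, not part of shader-db:

```cpp
#include <cassert>

// Hypothetical helper: percentage change from `before` to `after`.
// A negative result means the instruction count went down.
double percent_change(long before, long after)
{
   assert(before > 0);
   return 100.0 * static_cast<double>(after - before) /
          static_cast<double>(before);
}
```

Both lines drop by the same 320 instructions (987040 - 986720 and 844 - 524), which is why the relative change is tiny across all shared programs but large within the affected ones.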

No measurable performance impact.  No piglit regressions.
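
To illustrate the block matching described above, here is a minimal standalone sketch. It mirrors the three-way test the patch adds to compute_to_mrf(), but runs on a simplified, hypothetical `Mov` record rather than the real fs_inst, and the names `Mov`, `Block` and `find_trailing_block` are made up for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a GRF-to-MRF MOV (not fs_inst).
struct Mov {
   int dst_mrf;     // destination MRF number
   int src_grf;     // source virtual GRF number
   int src_offset;  // register offset within that virtual GRF
   bool saturate;   // a saturating MOV can't be folded into a send
};

struct Block {
   std::size_t start;  // index of the first MOV in the block
   std::size_t end;    // index of the last MOV in the block
   int size;           // number of MOVs in the block, 0 if none is open
};

// Walks the MOVs in order and returns the block state after the last one.
// step is dispatch_width / 8, i.e. 1 for SIMD8 and 2 for SIMD16.
Block find_trailing_block(const std::vector<Mov> &movs, int step)
{
   assert(step > 0);
   Block block = {0, 0, 0};
   for (std::size_t i = 0; i < movs.size(); i++) {
      const Mov &m = movs[i];
      if (m.saturate) {
         // send can't saturate, so cancel the block.
         block.size = 0;
      } else if (block.size > 0 &&
                 m.dst_mrf == movs[block.end].dst_mrf + step &&
                 m.src_grf == movs[block.end].src_grf &&
                 m.src_offset == movs[block.end].src_offset + 1) {
         // Next MRF and next register of the same GRF: extend the block.
         block.size++;
         block.end = i;
      } else if (m.src_offset == 0) {
         // A MOV from the first register of a GRF starts a new block.
         block.start = block.end = i;
         block.size = 1;
      }
   }
   return block;
}
```

For the shader above, four MOVs from offsets 0..3 of one GRF into m113..m116 give a block of size 4, which is what gets compared against the send's regs_written before the send is rewritten to return in the MRFs directly.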

Signed-off-by: Kristian Høgsberg <k...@bitplanet.net>
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 63 ++--
 1 file changed, 53 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 929379a..bbdc1f1 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2076,7 +2076,8 @@ bool
 fs_visitor::compute_to_mrf()
 {
    bool progress = false;
-   int next_ip = 0;
+   int next_ip = 0, block_size = 0, step = dispatch_width / 8;
+   fs_inst *block_start = NULL, *block_end = NULL;
 
    calculate_live_intervals();
 
@@ -2092,8 +2093,27 @@ fs_visitor::compute_to_mrf()
           inst->dst.type != inst->src[0].type ||
           inst->src[0].abs || inst->src[0].negate ||
           !inst->src[0].is_contiguous() ||
-          inst->src[0].subreg_offset)
+          inst->src[0].subreg_offset) {
+         block_start = NULL;
          continue;
+      }
+
+      /* We're trying to identify a block of GRF-to-MRF MOVs for the purpose
+       * of rewriting the send that assigned the GRFs to just return in the
+       * MRFs directly.  send can't saturate, so if any of the MOVs do that,
+       * cancel the block.
+       */
+      if (inst->saturate) {
+         block_start = NULL;
+      } else if (block_start && inst->dst.reg == block_end->dst.reg + step &&
+                 inst->src[0].reg == block_end->src[0].reg &&
+                 inst->src[0].reg_offset == block_end->src[0].reg_offset + 1) {
+         block_size++;
+         block_end = inst;
+      } else if (inst->src[0].reg_offset == 0) {
+         block_size = 1;
+         block_start = block_end = inst;
+      }
 
       /* Work out which hardware MRF registers are written by this
        * instruction.
@@ -2136,14 +2156,8 @@ fs_visitor::compute_to_mrf()
             if (scan_inst->is_partial_write())
                break;
 
-            /* Things returning more than one register would need us to
-             * understand coalescing out more than one MOV at a time.
-             */
-            if (scan_inst->regs_written > 1)
-               break;
-
-            /* SEND instructions can't have MRF as a destination. */
-            if (scan_inst->mlen)
+            /* SEND instructions can't have MRF as a destination before Gen7. */
+            if (brw->gen < 7 && scan_inst->mlen)
                break;
 
             if (brw->gen == 6) {
@@ -2155,6 +2169,35 @@ fs_visitor::compute_to_mrf()
                }
             }
 
+            /* We have a contiguous block of mov to MRF that aligns with the
+             * return registers of a send instruction.  Modify the send
+             * instruction to just return in the MRFs.
+             */
+            if (scan_inst->mlen > 0 &&
+                scan_inst->regs_written == block_size