Re: [Mesa-dev] [PATCH] i965: Extend compute-to-mrf pass to understand blocks of MOVs
On Mon, Jun 30, 2014 at 3:14 PM, Matt Turner <matts...@gmail.com> wrote:
> On Fri, Jun 27, 2014 at 12:00 PM, Kristian Høgsberg <k...@bitplanet.net> wrote:
>> From: Kristian Høgsberg <krh@century-sparrow.local>
>
> With your email address fixed,

Done, thanks for the review. I realized that this also applies to MRT shaders where we write the same value to two render targets. From piglit's fbo-drawbuffers2-blend:

0x0000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x0008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x0010: pln(8) g8<1>F g5.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x0018: math inv(8) g2<1>F g8<8,8,1>F null { align1 WE_normal 1Q compacted };
0x0020: mul(8) g3<1>F g6<8,8,1>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x0028: mul(8) g4<1>F g7<8,8,1>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x0030: send(8) g2<1>UW g3<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
0x0040: mov(8) g113<1>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x0048: mov(8) g114<1>F g3<8,8,1>F { align1 WE_normal 1Q compacted };
0x0050: mov(8) g115<1>F g4<8,8,1>F { align1 WE_normal 1Q compacted };
0x0058: mov(8) g116<1>F g5<8,8,1>F { align1 WE_normal 1Q compacted };
0x0060: sendc(8) null g113<8,8,1>F render RT write SIMD8 LastRT Surface = 0 mlen 4 rlen 0 { align1 WE_normal 1Q EOT };

becomes:

0x0000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x0008: pln(8) g8<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x0010: pln(8) g9<1>F g5.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x0018: math inv(8) g7<1>F g9<8,8,1>F null { align1 WE_normal 1Q compacted };
0x0020: mul(8) g2<1>F g6<8,8,1>F g7<8,8,1>F { align1 WE_normal 1Q compacted };
0x0028: mul(8) g3<1>F g8<8,8,1>F g7<8,8,1>F { align1 WE_normal 1Q compacted };
0x0030: send(8) g113<1>UW g2<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
0x0040: sendc(8) null g113<8,8,1>F render RT write SIMD8 LastRT Surface = 0 mlen 4 rlen 0 { align1 WE_normal 1Q EOT };

which is lovely.

Kristian

> Reviewed-by: Matt Turner <matts...@gmail.com>

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Extend compute-to-mrf pass to understand blocks of MOVs
On Mon, Jul 7, 2014 at 11:07 PM, Kristian Høgsberg <k...@bitplanet.net> wrote:
> On Mon, Jun 30, 2014 at 3:14 PM, Matt Turner <matts...@gmail.com> wrote:
>> On Fri, Jun 27, 2014 at 12:00 PM, Kristian Høgsberg <k...@bitplanet.net> wrote:
>>> From: Kristian Høgsberg <krh@century-sparrow.local>
>>
>> With your email address fixed,
>
> Done, thanks for the review. I realized that this also applies to MRT shaders where we write the same value to two render targets. From piglit's fbo-drawbuffers2-blend:
>
> 0x0000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
> 0x0008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
> 0x0010: pln(8) g8<1>F g5.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
> 0x0018: math inv(8) g2<1>F g8<8,8,1>F null { align1 WE_normal 1Q compacted };
> 0x0020: mul(8) g3<1>F g6<8,8,1>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
> 0x0028: mul(8) g4<1>F g7<8,8,1>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
> 0x0030: send(8) g2<1>UW g3<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };

Ugh, nevermind that, that's a sampler send there...

> 0x0040: mov(8) g113<1>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
> 0x0048: mov(8) g114<1>F g3<8,8,1>F { align1 WE_normal 1Q compacted };
> 0x0050: mov(8) g115<1>F g4<8,8,1>F { align1 WE_normal 1Q compacted };
> 0x0058: mov(8) g116<1>F g5<8,8,1>F { align1 WE_normal 1Q compacted };
> 0x0060: sendc(8) null g113<8,8,1>F render RT write SIMD8 LastRT Surface = 0 mlen 4 rlen 0 { align1 WE_normal 1Q EOT };
>
> becomes:
>
> 0x0000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
> 0x0008: pln(8) g8<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
> 0x0010: pln(8) g9<1>F g5.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
> 0x0018: math inv(8) g7<1>F g9<8,8,1>F null { align1 WE_normal 1Q compacted };
> 0x0020: mul(8) g2<1>F g6<8,8,1>F g7<8,8,1>F { align1 WE_normal 1Q compacted };
> 0x0028: mul(8) g3<1>F g8<8,8,1>F g7<8,8,1>F { align1 WE_normal 1Q compacted };
> 0x0030: send(8) g113<1>UW g2<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
> 0x0040: sendc(8) null g113<8,8,1>F render RT write SIMD8 LastRT Surface = 0 mlen 4 rlen 0 { align1 WE_normal 1Q EOT };
>
> which is lovely.
>
> Kristian
>
>> Reviewed-by: Matt Turner <matts...@gmail.com>
Re: [Mesa-dev] [PATCH] i965: Extend compute-to-mrf pass to understand blocks of MOVs
On Fri, Jun 27, 2014 at 12:00 PM, Kristian Høgsberg <k...@bitplanet.net> wrote:
> From: Kristian Høgsberg <krh@century-sparrow.local>

With your email address fixed,

Reviewed-by: Matt Turner <matts...@gmail.com>
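As a rough illustration of the block matching that the patch under review adds to compute-to-mrf, here is a simplified C++ model. It is a sketch under stated assumptions, not the driver code: `Mov`, `trailing_block_size` and `can_retarget_send` are hypothetical names, not Mesa's `fs_inst` or any i965 API. `step` stands for `dispatch_width / 8` as in the patch. Unlike the patch, the sketch also resets the block when a MOV neither extends nor starts one, and the send check mirrors only the part of the patch's condition that is visible in the posted diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a GRF-to-MRF MOV. The fields loosely mirror
// dst.reg, src[0].reg, src[0].reg_offset and saturate from the patch.
struct Mov {
   int dst_mrf;        // destination MRF number
   int src_grf;        // source virtual GRF number
   int src_reg_offset; // register offset inside the source virtual GRF
   bool saturate;      // a saturating MOV cannot be folded into a send
};

// Walk a run of MOVs and return the size of the block that is still open
// after the last one, or 0 if no block is open.
int trailing_block_size(const std::vector<Mov> &movs, int step)
{
   assert(step > 0);
   const Mov *block_end = nullptr;
   int block_size = 0;

   for (const Mov &inst : movs) {
      if (inst.saturate) {
         // send can't saturate, so cancel the block.
         block_end = nullptr;
         block_size = 0;
      } else if (block_end &&
                 inst.dst_mrf == block_end->dst_mrf + step &&
                 inst.src_grf == block_end->src_grf &&
                 inst.src_reg_offset == block_end->src_reg_offset + 1) {
         // Next MRF, same source GRF, next register offset: extend.
         block_size++;
         block_end = &inst;
      } else if (inst.src_reg_offset == 0) {
         // A MOV from the first register of a GRF starts a new block.
         block_size = 1;
         block_end = &inst;
      } else {
         // Assumption of this sketch only: anything else closes the block.
         block_end = nullptr;
         block_size = 0;
      }
   }
   return block_size;
}

// Visible part of the patch's test: the instruction is a send (mlen > 0)
// and it writes exactly as many registers as the block of MOVs copies.
bool can_retarget_send(int mlen, int regs_written, int block_size)
{
   return mlen > 0 && regs_written == block_size;
}
```

For the four SIMD8 MOVs in the listings (g113 through g116, fed from consecutive registers of one send result), the sketch reports a block of 4, which matches the rlen 4 of the sampler send.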
[Mesa-dev] [PATCH] i965: Extend compute-to-mrf pass to understand blocks of MOVs
From: Kristian Høgsberg <krh@century-sparrow.local>

The current compute-to-mrf pass doesn't handle blocks of MOVs. Shaders that end with a texture fetch followed by an fb write are left like this:

0x0000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x0008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x0010: send(8) g2<1>UW g6<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
0x0020: mov(8) g113<1>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x0028: mov(8) g114<1>F g3<8,8,1>F { align1 WE_normal 1Q compacted };
0x0030: mov(8) g115<1>F g4<8,8,1>F { align1 WE_normal 1Q compacted };
0x0038: mov(8) g116<1>F g5<8,8,1>F { align1 WE_normal 1Q compacted };
0x0040: sendc(8) null g113<8,8,1>F render ( RT write, 0, 4, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT };

This patch lets compute-to-mrf recognize blocks of MOVs and match them to instructions (typically SEND) that write multiple registers. With this, the above shader becomes:

0x0000: pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x0008: pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 WE_normal 1Q compacted };
0x0010: send(8) g113<1>UW g6<8,8,1>F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 WE_normal 1Q };
0x0020: sendc(8) null g113<8,8,1>F render ( RT write, 0, 20, 12) mlen 4 rlen 0 { align1 WE_normal 1Q EOT };

which is the bulk of the shader db results:

total instructions in shared programs: 987040 -> 986720 (-0.03%)
instructions in affected programs:     844 -> 524 (-37.91%)
GAINED:                                0
LOST:                                  0

No measurable performance impact. No piglit regressions.

Signed-off-by: Kristian Høgsberg <k...@bitplanet.net>
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 63 ++--
 1 file changed, 53 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 929379a..bbdc1f1 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2076,7 +2076,8 @@ bool
 fs_visitor::compute_to_mrf()
 {
    bool progress = false;
-   int next_ip = 0;
+   int next_ip = 0, block_size = 0, step = dispatch_width / 8;
+   fs_inst *block_start = NULL, *block_end = NULL;
 
    calculate_live_intervals();
 
@@ -2092,8 +2093,27 @@ fs_visitor::compute_to_mrf()
          inst->dst.type != inst->src[0].type ||
          inst->src[0].abs || inst->src[0].negate ||
          !inst->src[0].is_contiguous() ||
-         inst->src[0].subreg_offset)
+         inst->src[0].subreg_offset) {
+         block_start = NULL;
          continue;
+      }
+
+      /* We're trying to identify a block of GRF-to-MRF MOVs for the purpose
+       * of rewriting the send that assigned the GRFs to just return in the
+       * MRFs directly. send can't saturate, so if any of the MOVs do that,
+       * cancel the block.
+       */
+      if (inst->saturate) {
+         block_start = NULL;
+      } else if (block_start && inst->dst.reg == block_end->dst.reg + step &&
+                 inst->src[0].reg == block_end->src[0].reg &&
+                 inst->src[0].reg_offset == block_end->src[0].reg_offset + 1) {
+         block_size++;
+         block_end = inst;
+      } else if (inst->src[0].reg_offset == 0) {
+         block_size = 1;
+         block_start = block_end = inst;
+      }
 
       /* Work out which hardware MRF registers are written by this
        * instruction.
@@ -2136,14 +2156,8 @@ fs_visitor::compute_to_mrf()
             if (scan_inst->is_partial_write())
                break;
 
-            /* Things returning more than one register would need us to
-             * understand coalescing out more than one MOV at a time.
-             */
-            if (scan_inst->regs_written > 1)
-               break;
-
-            /* SEND instructions can't have MRF as a destination. */
-            if (scan_inst->mlen)
+            /* SEND instructions can't have MRF as a destination before Gen7. */
+            if (brw->gen < 7 && scan_inst->mlen)
                break;
 
             if (brw->gen == 6) {
@@ -2155,6 +2169,35 @@ fs_visitor::compute_to_mrf()
                }
             }
 
+            /* We have a contiguous block of mov to MRF that aligns with the
+             * return registers of a send instruction. Modify the send
+             * instruction to just return in the MRFs.
+             */
+            if (scan_inst->mlen > 0 &&
+                scan_inst->regs_written == block_size