(Co-authored by Matt Turner.)

Image atomics, for example, return a value - but the shader may not want to use it. Until now, we assigned a useless VGRF destination anyway. This seemed harmless, but it can actually be quite harmful. The register allocator has to assign that VGRF to a real register. It may assign the same actual GRF to the destination of an instruction that follows soon after.
This results in a write-after-write (WAW) dependency, and a stall.

A number of "Deus Ex: Mankind Divided" shaders use image atomics, but don't use the return value. Several of these were hitting WAW stalls for nearly 14,000 cycles a pop. This patch cuts one shader's cycles by 98.35%!

Making dead code elimination null out the destination avoids this issue.

We can drop the redundant NOP handling as well - a following hunk already turns instructions with null destinations and no effects into NOPs.

We do need to special case memory barriers as they need an actual VGRF to function correctly, even though that register serves only to create register dependencies.

On Skylake:

total instructions in shared programs: 13700090 -> 13699698 (-0.00%)
instructions in affected programs: 13514 -> 13122 (-2.90%)
helped: 2
HURT: 0

total cycles in shared programs: 288879704 -> 278651496 (-3.54%)
cycles in affected programs: 15842112 -> 5613904 (-64.56%)
helped: 19
HURT: 3

total spills in shared programs: 15040 -> 14990 (-0.33%)
spills in affected programs: 140 -> 90 (-35.71%)
helped: 2
HURT: 0

total fills in shared programs: 17328 -> 17264 (-0.37%)
fills in affected programs: 228 -> 164 (-28.07%)
helped: 2
HURT: 0

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
index 8a0469a..4c524a4 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
@@ -52,7 +52,8 @@ fs_visitor::dead_code_eliminate()
                                sizeof(BITSET_WORD));
 
       foreach_inst_in_block_reverse_safe(fs_inst, inst, block) {
-         if (inst->dst.file == VGRF && !inst->has_side_effects()) {
+         if (inst->dst.file == VGRF &&
+             inst->opcode != SHADER_OPCODE_MEMORY_FENCE) {
             const unsigned var = live_intervals->var_from_reg(inst->dst);
             bool result_live = false;
 
@@ -60,13 +61,8 @@ fs_visitor::dead_code_eliminate()
                result_live |= BITSET_TEST(live, var + i);
 
             if (!result_live) {
+               inst->dst = fs_reg(retype(brw_null_reg(), inst->dst.type));
                progress = true;
-
-               if (inst->writes_accumulator || inst->flags_written()) {
-                  inst->dst = fs_reg(retype(brw_null_reg(), inst->dst.type));
-               } else {
-                  inst->opcode = BRW_OPCODE_NOP;
-               }
             }
          }
 
-- 
2.10.2
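
For readers who want to see the resulting control flow in isolation, here is a toy model of the pass as it behaves after this patch. It is only a sketch: Inst, File, Op and this free-standing dead_code_eliminate() are hypothetical stand-ins, not the i965 types, and the model assumes one whole-register write per instruction and ignores flag and accumulator writes, partial writes and multi-register destinations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy IR for illustration only; none of these types exist in Mesa.
enum class File { Null, VGRF };
enum class Op { Mov, Add, ImageAtomic, MemoryFence, Nop };

struct Inst {
    Op op;
    File dst_file;
    int dst;                // virtual register written (ignored when dst_file is Null)
    std::vector<int> srcs;  // virtual registers read
    bool side_effects;      // e.g. image atomics and fences
};

// Walk one basic block backwards. `live` marks the registers that are
// live at the end of the block. Returns true if anything was changed.
bool dead_code_eliminate(std::vector<Inst>& block, std::vector<bool> live)
{
    bool progress = false;

    for (std::size_t i = block.size(); i-- > 0;) {
        Inst& inst = block[i];

        // Step 1: a VGRF destination that nobody reads becomes the null
        // register, even if the instruction has side effects. Fences are
        // exempt because they need a real register.
        if (inst.dst_file == File::VGRF && inst.op != Op::MemoryFence) {
            assert(inst.dst >= 0 &&
                   static_cast<std::size_t>(inst.dst) < live.size());
            if (!live[inst.dst]) {
                inst.dst_file = File::Null;
                progress = true;
            }
        }

        // Step 2: a null destination plus no side effects means the
        // instruction does nothing at all, so it becomes a NOP.
        if (inst.dst_file == File::Null && !inst.side_effects &&
            inst.op != Op::Nop) {
            inst.op = Op::Nop;
            inst.srcs.clear();
            progress = true;
            continue;
        }

        // Step 3: update liveness for the instructions above this one.
        if (inst.dst_file == File::VGRF)
            live[inst.dst] = false;
        for (int s : inst.srcs)
            live[s] = true;
    }

    return progress;
}
```

In this model an image atomic with an unused result keeps its opcode but ends up with a null destination, so there is no register left for the allocator to alias with a later write. An unused Add becomes a NOP through step 2 alone, which is why the separate NOP branch in step 1 is no longer needed, and the fence keeps its VGRF.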