The opt_zero_samples optimisation tries to find the corresponding load_payload instruction for each sample instruction. However, it previously only looked at the immediately preceding instruction. This patch makes it search back within the block for the last instruction that wrote to each individual argument of the send message. There are two reasons to do this:
Firstly, on Gen<=6 load_payload isn't used and there is a separate message register file (MRF). This version of the optimisation also finds MOVs into the MRF registers, so it now also works on SNB. Unfortunately this doesn't show up in a shader-db report because the dead code eliminator doesn't do anything for instructions writing to MRF registers, so it can't remove the redundant MOVs. However, if I hack Mesa to report the message lengths instead of the instruction counts then it shows this:

total mlen in shared programs: 2600373 -> 2574663 (-0.99%)
mlen in affected programs:     237077 -> 211367 (-10.84%)
helped:                        3508
HURT:                          0

I haven't tested whether reducing the message length without decreasing the instruction count is actually a performance benefit, but it's hard to imagine that it could be a disadvantage. It also paves the way to reducing the instruction count later if someone improves the dead code eliminator.

Secondly, it could help on other gens because the load_payload instruction can sometimes become separated from the corresponding send instruction, and the old version wouldn't work in those cases. Currently this doesn't seem to make any difference in practice because the register coalescer is run after this optimisation. However, this version seems more robust.
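For readers who don't know the i965 IR, here is a rough standalone sketch of the idea: scan backwards for the last write to a parameter register, then trim trailing zero parameters while always keeping parameter 0. The Inst struct, the Op enum and both helper functions are made up for illustration and are not the real fs_inst API. It also ignores load_payload and SIMD16 register pairs, so it only models the MOV-into-MRF case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for the real IR (not fs_inst).
enum class Op { Mov, Other };

struct Inst {
   Op op;          // what kind of instruction this is
   int dst;        // index of the register it writes
   bool src_zero;  // true if the source is an immediate zero
   bool partial;   // true if the write is partial (e.g. predicated)
};

// Walk backwards from the send at index `send` to the most recent
// instruction that writes `reg`. Only a full MOV of zero counts.
static bool
last_write_is_zero(const std::vector<Inst> &block, std::size_t send, int reg)
{
   assert(send <= block.size());

   for (std::size_t i = send; i-- > 0;) {
      const Inst &inst = block[i];
      if (inst.dst != reg)
         continue;

      // The first writer found going backwards is the last writer.
      return inst.op == Op::Mov && !inst.partial && inst.src_zero;
   }

   // No writer found in this block, so assume nothing.
   return false;
}

// Drop trailing zero parameters. Parameter 0 is always kept.
static int
trim_zero_params(const std::vector<Inst> &block, std::size_t send,
                 int base_reg, int num_params)
{
   while (num_params > 1 &&
          last_write_is_zero(block, send, base_reg + num_params - 1))
      num_params--;

   return num_params;
}
```

Note that an unrelated instruction sitting between the MOV and the send doesn't stop the search, which is the case the old previous-instruction check couldn't handle.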
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 54 ++++++++++++++++++++++++++++--------
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 97d7fd7..f87a5a7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2150,6 +2150,41 @@ fs_visitor::opt_algebraic()
    return progress;
 }
 
+static bool
+last_texture_source_is_zero(const fs_inst *send_inst)
+{
+   int reg_offset = send_inst->mlen - send_inst->exec_size / 8;
+   fs_reg src;
+
+   /* Get the last argument of the texture instruction */
+   if (send_inst->is_send_from_grf())
+      src = byte_offset(send_inst->src[0], reg_offset * 32);
+   else
+      src = fs_reg(MRF, send_inst->base_mrf + reg_offset);
+
+   /* Look for the last instruction that writes to the source */
+   foreach_inst_in_block_reverse_starting_from(const fs_inst, inst, send_inst) {
+      if (inst->overwrites_reg(src)) {
+         if (inst->opcode == SHADER_OPCODE_LOAD_PAYLOAD) {
+            const int src_num = ((send_inst->mlen - send_inst->header_size) /
+                                 (inst->exec_size / 8) +
+                                 inst->header_size - 1);
+            return inst->src[src_num].is_zero();
+         } else if (inst->opcode == BRW_OPCODE_MOV) {
+            if (inst->is_partial_write() || !inst->dst.equals(src))
+               return false;
+
+            return inst->src[0].is_zero();
+         }
+
+         /* Something unknown is writing to the src */
+         break;
+      }
+   }
+
+   return false;
+}
+
 /**
  * Optimize sample messages that have constant zero values for the trailing
  * texture coordinates. We can just reduce the message length for these
@@ -2173,12 +2208,6 @@ fs_visitor::opt_zero_samples()
       if (!inst->is_tex())
          continue;
 
-      fs_inst *load_payload = (fs_inst *) inst->prev;
-
-      if (load_payload->is_head_sentinel() ||
-          load_payload->opcode != SHADER_OPCODE_LOAD_PAYLOAD)
-         continue;
-
       /* We don't want to remove the message header or the first parameter.
        * Removing the first parameter is not allowed, see the Haswell PRM
        * volume 7, page 149:
@@ -2186,12 +2215,13 @@ fs_visitor::opt_zero_samples()
        * "Parameter 0 is required except for the sampleinfo message, which
        *  has no parameter 0"
        */
-      while (inst->mlen > inst->header_size + inst->exec_size / 8 &&
-             load_payload->src[(inst->mlen - inst->header_size) /
-                               (inst->exec_size / 8) +
-                               inst->header_size - 1].is_zero()) {
-         inst->mlen -= inst->exec_size / 8;
-         progress = true;
+      while (inst->mlen > inst->header_size + inst->exec_size / 8) {
+         if (last_texture_source_is_zero(inst)) {
+            inst->mlen -= inst->exec_size / 8;
+            progress = true;
+         } else {
+            break;
+         }
       }
    }
 
-- 
1.9.3