"discard" instructions generate HALT instructions which jump to a final HALT near the end of the shader. Previously, fs_generator created this final jump target when it saw the first FS_OPCODE_FB_WRITE, causing it to jump right before the FB write epilogue. This is normally good.
However, INTEL_DEBUG=shader_time also has an epilogue section which records the final timestamp. The frontend emits IR for this just before FS_OPCODE_FB_WRITE. Unfortunately, this led to the following ordering: 1. Shader Time Epilogue 2. Final HALT (where discards jump) 3. Framebuffer Write Epilogue This meant that discarded pixels completely skipped the shader time epilogue, causing no ending timestamp to be written. This obviously led to inaccurate results. This patch adds a new FS_OPCODE_PLACEHOLDER_HALT in the IR stream just before any epilogue sections. This is where the final HALT should be generated, and makes it easy to ensure the correct ordering: 1. Final HALT 2. Shader Time Epilogue 3. Framebuffer Write Epilogue For shaders that don't discard, this opcode compiles away to nothing. The scheduler adds barrier dependencies to make sure that it doesn't get moved above any FS_OPCODE_DISCARD_JUMP instructions. One 8-wide shader in GLBenchmark 2.7 dropped from 2291.67 Gcycles to a mere 5.13 Gcycles. Cc: Eric Anholt <e...@anholt.net> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org> --- src/mesa/drivers/dri/i965/brw_defines.h | 1 + src/mesa/drivers/dri/i965/brw_fs.cpp | 2 ++ src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 13 +++++++------ src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp | 3 +++ src/mesa/drivers/dri/i965/brw_shader.cpp | 3 +++ 5 files changed, 16 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index b45e4a4..47dceac 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -732,6 +732,7 @@ enum opcode { FS_OPCODE_PACK_HALF_2x16_SPLIT, FS_OPCODE_UNPACK_HALF_2x16_SPLIT_X, FS_OPCODE_UNPACK_HALF_2x16_SPLIT_Y, + FS_OPCODE_PLACEHOLDER_HALT, VS_OPCODE_URB_WRITE, VS_OPCODE_SCRATCH_READ, diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index a34d5e8..e6e4302 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2766,6 +2766,8 @@ fs_visitor::run() if (failed) return false; + emit(FS_OPCODE_PLACEHOLDER_HALT); + emit_fb_writes(); split_virtual_grfs(); diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp index 5db481c..28217d7 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp @@ -102,12 +102,6 @@ fs_generator::generate_fb_write(fs_inst *inst) struct brw_reg implied_header; uint32_t msg_control; - /* Note that the jumps emitted to this point mean that the g0 -> - * base_mrf setup must be inside of this function, so that we jump - * to a point containing it. - */ - patch_discard_jumps_to_fb_writes(); - /* Header is 2 regs, g0 and g1 are the contents. g0 will be implied * move, here's g1. */ @@ -1342,6 +1336,13 @@ fs_generator::generate_code(exec_list *instructions) generate_unpack_half_2x16_split(inst, dst, src[0]); break; + case FS_OPCODE_PLACEHOLDER_HALT: + /* This is the place where the final HALT needs to be inserted if + * we've emitted any discards. If not, this will emit no code. + */ + patch_discard_jumps_to_fb_writes(); + break; + default: if (inst->opcode < (int) ARRAY_SIZE(opcode_descs)) { _mesa_problem(ctx, "Unsupported opcode `%s' in FS", diff --git a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp index c125928..74f7c67 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp @@ -469,6 +469,9 @@ instruction_scheduler::calculate_deps() schedule_node *n = (schedule_node *)node; fs_inst *inst = n->inst; + if (inst->opcode == FS_OPCODE_PLACEHOLDER_HALT) + add_barrier_deps(n); + /* read-after-write deps. */ for (int i = 0; i < 3; i++) { if (inst->src[i].file == GRF) { diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp index e4392bd..066cf4e 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.cpp +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp @@ -471,6 +471,9 @@ brw_instruction_name(enum opcode op) case FS_OPCODE_UNPACK_HALF_2x16_SPLIT_Y: return "unpack_half_2x16_split_y"; + case FS_OPCODE_PLACEHOLDER_HALT: + return "placeholder_halt"; + case VS_OPCODE_URB_WRITE: return "urb_write"; case VS_OPCODE_SCRATCH_READ: -- 1.8.2 _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev