Re: [Mesa-dev] [PATCH 4/7] i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode.
Kenneth Graunke kenn...@whitecape.org writes: On Thursday, April 30, 2015 04:59:49 PM Francisco Jerez wrote: Matt Turner matts...@gmail.com writes: On Fri, Feb 20, 2015 at 11:49 AM, Francisco Jerez curroje...@riseup.net wrote: --- src/mesa/drivers/dri/i965/brw_fs.cpp | 49 ++ src/mesa/drivers/dri/i965/brw_fs.h | 1 + src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 1 + src/mesa/drivers/dri/i965/brw_vec4.cpp | 41 + src/mesa/drivers/dri/i965/brw_vec4.h | 1 + src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 1 + 6 files changed, 94 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index d567c2b..4537900 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2734,6 +2734,54 @@ fs_visitor::compute_to_mrf() } /** + * Eliminate FIND_LIVE_CHANNEL instructions occurring outside any control + * flow. We could probably do better here with some form of divergence + * analysis. + */ +bool +fs_visitor::eliminate_find_live_channel() +{ + bool progress = false; + unsigned depth = 0; + + foreach_block_and_inst_safe(block, fs_inst, inst, cfg) { + switch (inst-opcode) { + case BRW_OPCODE_IF: + case BRW_OPCODE_DO: + depth++; + break; + + case BRW_OPCODE_ENDIF: + case BRW_OPCODE_WHILE: + depth--; + break; + + case FS_OPCODE_DISCARD_JUMP: + /* This can potentially make control flow non-uniform until the end + * of the program. + */ + depth++; + break; Why not just return progress at this point? depth++ just makes it effectively never do anything again, but continue iterating over the rest of the program. Heh, indeed. Otherwise, this all seems good to me. This patch is Reviewed-by: Kenneth Graunke kenn...@whitecape.org Thank you both for having a look. :) + + case SHADER_OPCODE_FIND_LIVE_CHANNEL: + if (depth == 0) { +inst-opcode = BRW_OPCODE_MOV; +inst-src[0] = fs_reg(0); How does this work if we're on a primitive edge and channel zero is disabled? Does the EU not actually disable those channels except in the framebuffer write or something? Yes, exactly, samples outside the primitive (or samples discarded by the early depth/stencil test) have the corresponding execution mask bit enabled, otherwise it wouldn't be possible to calculate derivatives near the edge of a primitive. Whole subspans not covered by the primitive or dropped by early fragment tests are never dispatched to any PS thread, so channel 0 should always have valid contents except when disabled by control flow. +inst-sources = 1; +inst-force_writemask_all = true; +progress = true; + } + break; + + default: + break; + } + } + + return progress; +} I wouldn't mind if this was just part of opt_algebraic. Less code that way and one fewer pass over the instruction list. What do you think, Ken? I shortly considered doing that but finally decided not to because: - It's not an algebraic instruction. - The optimization is non-local unlike the other fully self-contained algebraic optimizations, as it needs to keep track of the control flow nesting level. signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/7] i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode.
Matt Turner matts...@gmail.com writes: On Fri, Feb 20, 2015 at 11:49 AM, Francisco Jerez curroje...@riseup.net wrote: --- src/mesa/drivers/dri/i965/brw_fs.cpp | 49 ++ src/mesa/drivers/dri/i965/brw_fs.h | 1 + src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 1 + src/mesa/drivers/dri/i965/brw_vec4.cpp | 41 + src/mesa/drivers/dri/i965/brw_vec4.h | 1 + src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 1 + 6 files changed, 94 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index d567c2b..4537900 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2734,6 +2734,54 @@ fs_visitor::compute_to_mrf() } /** + * Eliminate FIND_LIVE_CHANNEL instructions occurring outside any control + * flow. We could probably do better here with some form of divergence + * analysis. + */ +bool +fs_visitor::eliminate_find_live_channel() +{ + bool progress = false; + unsigned depth = 0; + + foreach_block_and_inst_safe(block, fs_inst, inst, cfg) { + switch (inst-opcode) { + case BRW_OPCODE_IF: + case BRW_OPCODE_DO: + depth++; + break; + + case BRW_OPCODE_ENDIF: + case BRW_OPCODE_WHILE: + depth--; + break; + + case FS_OPCODE_DISCARD_JUMP: + /* This can potentially make control flow non-uniform until the end + * of the program. + */ + depth++; + break; + + case SHADER_OPCODE_FIND_LIVE_CHANNEL: + if (depth == 0) { +inst-opcode = BRW_OPCODE_MOV; +inst-src[0] = fs_reg(0); How does this work if we're on a primitive edge and channel zero is disabled? Does the EU not actually disable those channels except in the framebuffer write or something? Yes, exactly, samples outside the primitive (or samples discarded by the early depth/stencil test) have the corresponding execution mask bit enabled, otherwise it wouldn't be possible to calculate derivatives near the edge of a primitive. Whole subspans not covered by the primitive or dropped by early fragment tests are never dispatched to any PS thread, so channel 0 should always have valid contents except when disabled by control flow. +inst-sources = 1; +inst-force_writemask_all = true; +progress = true; + } + break; + + default: + break; + } + } + + return progress; +} I wouldn't mind if this was just part of opt_algebraic. Less code that way and one fewer pass over the instruction list. What do you think, Ken? I shortly considered doing that but finally decided not to because: - It's not an algebraic instruction. - The optimization is non-local unlike the other fully self-contained algebraic optimizations, as it needs to keep track of the control flow nesting level. signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/7] i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode.
On Thursday, April 30, 2015 04:59:49 PM Francisco Jerez wrote: Matt Turner matts...@gmail.com writes: On Fri, Feb 20, 2015 at 11:49 AM, Francisco Jerez curroje...@riseup.net wrote: --- src/mesa/drivers/dri/i965/brw_fs.cpp | 49 ++ src/mesa/drivers/dri/i965/brw_fs.h | 1 + src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 1 + src/mesa/drivers/dri/i965/brw_vec4.cpp | 41 + src/mesa/drivers/dri/i965/brw_vec4.h | 1 + src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 1 + 6 files changed, 94 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index d567c2b..4537900 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2734,6 +2734,54 @@ fs_visitor::compute_to_mrf() } /** + * Eliminate FIND_LIVE_CHANNEL instructions occurring outside any control + * flow. We could probably do better here with some form of divergence + * analysis. + */ +bool +fs_visitor::eliminate_find_live_channel() +{ + bool progress = false; + unsigned depth = 0; + + foreach_block_and_inst_safe(block, fs_inst, inst, cfg) { + switch (inst-opcode) { + case BRW_OPCODE_IF: + case BRW_OPCODE_DO: + depth++; + break; + + case BRW_OPCODE_ENDIF: + case BRW_OPCODE_WHILE: + depth--; + break; + + case FS_OPCODE_DISCARD_JUMP: + /* This can potentially make control flow non-uniform until the end + * of the program. + */ + depth++; + break; Why not just return progress at this point? depth++ just makes it effectively never do anything again, but continue iterating over the rest of the program. Otherwise, this all seems good to me. This patch is Reviewed-by: Kenneth Graunke kenn...@whitecape.org + + case SHADER_OPCODE_FIND_LIVE_CHANNEL: + if (depth == 0) { +inst-opcode = BRW_OPCODE_MOV; +inst-src[0] = fs_reg(0); How does this work if we're on a primitive edge and channel zero is disabled? Does the EU not actually disable those channels except in the framebuffer write or something? Yes, exactly, samples outside the primitive (or samples discarded by the early depth/stencil test) have the corresponding execution mask bit enabled, otherwise it wouldn't be possible to calculate derivatives near the edge of a primitive. Whole subspans not covered by the primitive or dropped by early fragment tests are never dispatched to any PS thread, so channel 0 should always have valid contents except when disabled by control flow. +inst-sources = 1; +inst-force_writemask_all = true; +progress = true; + } + break; + + default: + break; + } + } + + return progress; +} I wouldn't mind if this was just part of opt_algebraic. Less code that way and one fewer pass over the instruction list. What do you think, Ken? I shortly considered doing that but finally decided not to because: - It's not an algebraic instruction. - The optimization is non-local unlike the other fully self-contained algebraic optimizations, as it needs to keep track of the control flow nesting level. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/7] i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode.
On Fri, Feb 20, 2015 at 11:49 AM, Francisco Jerez curroje...@riseup.net wrote: --- src/mesa/drivers/dri/i965/brw_fs.cpp | 49 ++ src/mesa/drivers/dri/i965/brw_fs.h | 1 + src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 1 + src/mesa/drivers/dri/i965/brw_vec4.cpp | 41 + src/mesa/drivers/dri/i965/brw_vec4.h | 1 + src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 1 + 6 files changed, 94 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index d567c2b..4537900 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2734,6 +2734,54 @@ fs_visitor::compute_to_mrf() } /** + * Eliminate FIND_LIVE_CHANNEL instructions occurring outside any control + * flow. We could probably do better here with some form of divergence + * analysis. + */ +bool +fs_visitor::eliminate_find_live_channel() +{ + bool progress = false; + unsigned depth = 0; + + foreach_block_and_inst_safe(block, fs_inst, inst, cfg) { + switch (inst-opcode) { + case BRW_OPCODE_IF: + case BRW_OPCODE_DO: + depth++; + break; + + case BRW_OPCODE_ENDIF: + case BRW_OPCODE_WHILE: + depth--; + break; + + case FS_OPCODE_DISCARD_JUMP: + /* This can potentially make control flow non-uniform until the end + * of the program. + */ + depth++; + break; + + case SHADER_OPCODE_FIND_LIVE_CHANNEL: + if (depth == 0) { +inst-opcode = BRW_OPCODE_MOV; +inst-src[0] = fs_reg(0); How does this work if we're on a primitive edge and channel zero is disabled? Does the EU not actually disable those channels except in the framebuffer write or something? +inst-sources = 1; +inst-force_writemask_all = true; +progress = true; + } + break; + + default: + break; + } + } + + return progress; +} I wouldn't mind if this was just part of opt_algebraic. Less code that way and one fewer pass over the instruction list. What do you think, Ken? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/7] i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode.
--- src/mesa/drivers/dri/i965/brw_fs.cpp | 49 ++ src/mesa/drivers/dri/i965/brw_fs.h | 1 + src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 1 + src/mesa/drivers/dri/i965/brw_vec4.cpp | 41 + src/mesa/drivers/dri/i965/brw_vec4.h | 1 + src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 1 + 6 files changed, 94 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index d567c2b..4537900 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2734,6 +2734,54 @@ fs_visitor::compute_to_mrf() } /** + * Eliminate FIND_LIVE_CHANNEL instructions occurring outside any control + * flow. We could probably do better here with some form of divergence + * analysis. + */ +bool +fs_visitor::eliminate_find_live_channel() +{ + bool progress = false; + unsigned depth = 0; + + foreach_block_and_inst_safe(block, fs_inst, inst, cfg) { + switch (inst-opcode) { + case BRW_OPCODE_IF: + case BRW_OPCODE_DO: + depth++; + break; + + case BRW_OPCODE_ENDIF: + case BRW_OPCODE_WHILE: + depth--; + break; + + case FS_OPCODE_DISCARD_JUMP: + /* This can potentially make control flow non-uniform until the end + * of the program. + */ + depth++; + break; + + case SHADER_OPCODE_FIND_LIVE_CHANNEL: + if (depth == 0) { +inst-opcode = BRW_OPCODE_MOV; +inst-src[0] = fs_reg(0); +inst-sources = 1; +inst-force_writemask_all = true; +progress = true; + } + break; + + default: + break; + } + } + + return progress; +} + +/** * Once we've generated code, try to convert normal FS_OPCODE_FB_WRITE * instructions to FS_OPCODE_REP_FB_WRITE. */ @@ -3678,6 +3726,7 @@ fs_visitor::optimize() OPT(opt_saturate_propagation); OPT(register_coalesce); OPT(compute_to_mrf); + OPT(eliminate_find_live_channel); OPT(compact_virtual_grfs); } while (progress); diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 9375412..6e5fa1d 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -228,6 +228,7 @@ public: bool opt_register_renaming(); bool register_coalesce(); bool compute_to_mrf(); + bool eliminate_find_live_channel(); bool dead_code_eliminate(); bool remove_duplicate_mrf_writes(); bool virtual_grf_interferes(int a, int b); diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp index bbe0747..04a073e 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp @@ -89,6 +89,7 @@ is_expression(const fs_inst *const inst) case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD: case FS_OPCODE_CINTERP: case FS_OPCODE_LINTERP: + case SHADER_OPCODE_FIND_LIVE_CHANNEL: case SHADER_OPCODE_BROADCAST: return true; case SHADER_OPCODE_RCP: diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 601b8c1..219c05e 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1250,6 +1250,46 @@ vec4_visitor::opt_register_coalesce() } /** + * Eliminate FIND_LIVE_CHANNEL instructions occurring outside any control + * flow. We could probably do better here with some form of divergence + * analysis. + */ +bool +vec4_visitor::eliminate_find_live_channel() +{ + bool progress = false; + unsigned depth = 0; + + foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) { + switch (inst-opcode) { + case BRW_OPCODE_IF: + case BRW_OPCODE_DO: + depth++; + break; + + case BRW_OPCODE_ENDIF: + case BRW_OPCODE_WHILE: + depth--; + break; + + case SHADER_OPCODE_FIND_LIVE_CHANNEL: + if (depth == 0) { +inst-opcode = BRW_OPCODE_MOV; +inst-src[0] = src_reg(0); +inst-force_writemask_all = true; +progress = true; + } + break; + + default: + break; + } + } + + return progress; +} + +/** * Splits virtual GRFs requesting more than one contiguous physical register. * * We initially create large virtual GRFs for temporary structures, arrays, @@ -1880,6 +1920,7 @@ vec4_visitor::run() OPT(opt_cse); OPT(opt_algebraic); OPT(opt_register_coalesce); + OPT(eliminate_find_live_channel); } while (progress); pass_num = 0; diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index a24f843..7a92d59 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -207,6 +207,7 @@ public: bool opt_cse(); bool opt_algebraic(); bool