El 08/09/17 a las 11:06, Alejandro Piñeiro escribió: > On 08/09/17 02:50, Francisco Jerez wrote: >> Currently the liveness analysis pass would extend a live interval up >> to the top of the program when no unconditional and complete >> definition of the variable is found that dominates all of its uses. >> >> This can lead to a serious performance problem in shaders containing >> many partial writes, like scalar arithmetic, FP64 and soon FP16 >> operations. > Just tested with the VK_KHR_16bit_storage implementation. Works really > fine with the most problematic tests, so we can drop the "i965/fs: Add > reuse_16bit_conversions_register optimization" patch (that was already > NAKed by both you and Connor). > > My test was limited to that extension CTS tests, but in case it helps: > Tested-by: Alejandro Piñeiro <apinhe...@igalia.com>
I've seen one regression on a full VK-CTS run with this patch over the VK_KHR_16bit_storage branch, but I can reproduce the same regression applying it on master. Failing test is: dEQP-VK.glsl.return.return_in_dynamic_loop_dynamic_vertex >> The number of oversize live intervals in such workloads >> can cause the compilation time of the shader to explode because of the >> worse than quadratic behavior of the register allocator and scheduler >> when running out of registers, and it can also cause the running time >> of the shader to explode due to the amount of spilling it leads to, >> which is orders of magnitude slower than GRF memory. >> >> This patch fixes it by computing the intersection of our current live >> intervals with the subset of the program that can possibly be reached >> from any definition of the variable. Extending the storage allocation >> of the variable beyond that is pretty useless because its value is >> guaranteed to be undefined at a point that cannot be reached from any >> definition. >> >> No significant change in the running time of shader-db (with 5% >> statistical significance). >> >> shader-db results on IVB: >> >> total cycles in shared programs: 61108780 -> 60932856 (-0.29%) >> cycles in affected programs: 16335482 -> 16159558 (-1.08%) >> helped: 5121 >> HURT: 4347 >> >> total spills in shared programs: 1309 -> 1288 (-1.60%) >> spills in affected programs: 249 -> 228 (-8.43%) >> helped: 3 >> HURT: 0 >> >> total fills in shared programs: 1652 -> 1597 (-3.33%) >> fills in affected programs: 262 -> 207 (-20.99%) >> helped: 4 >> HURT: 0 >> >> LOST: 2 >> GAINED: 209 >> >> shader-db results on BDW: >> >> total cycles in shared programs: 67617262 -> 67361220 (-0.38%) >> cycles in affected programs: 23397142 -> 23141100 (-1.09%) >> helped: 8045 >> HURT: 6488 >> >> total spills in shared programs: 1456 -> 1252 (-14.01%) >> spills in affected programs: 465 -> 261 (-43.87%) >> helped: 3 >> HURT: 0 >> >> total fills in shared programs: 1720 -> 1465 (-14.83%) >> fills in affected programs: 471 -> 216 (-54.14%) >> helped: 4 >> HURT: 0 >> >> LOST: 2 >> GAINED: 162 >> >> shader-db results on SKL: >> >> total cycles in shared programs: 65436248 -> 65245186 (-0.29%) >> cycles in affected programs: 22560936 -> 22369874 (-0.85%) >> helped: 8457 >> HURT: 6247 >> >> total spills in shared programs: 437 -> 437 (0.00%) >> spills in affected programs: 0 -> 0 >> helped: 0 >> HURT: 0 >> >> total fills in shared programs: 870 -> 854 (-1.84%) >> fills in affected programs: 16 -> 0 >> helped: 1 >> HURT: 0 >> >> LOST: 0 >> GAINED: 107 >> --- >> src/intel/compiler/brw_fs_live_variables.cpp | 34 >> ++++++++++++++++++++++++---- >> src/intel/compiler/brw_fs_live_variables.h | 12 ++++++++++ >> 2 files changed, 42 insertions(+), 4 deletions(-) >> >> diff --git a/src/intel/compiler/brw_fs_live_variables.cpp >> b/src/intel/compiler/brw_fs_live_variables.cpp >> index c449672a519..059f076fa51 100644 >> --- a/src/intel/compiler/brw_fs_live_variables.cpp >> +++ b/src/intel/compiler/brw_fs_live_variables.cpp >> @@ -83,9 +83,11 @@ fs_live_variables::setup_one_write(struct block_data *bd, >> fs_inst *inst, >> /* The def[] bitset marks when an initialization in a block completely >> * screens off previous updates of that variable (VGRF channel). >> */ >> - if (inst->dst.file == VGRF && !inst->is_partial_write()) { >> - if (!BITSET_TEST(bd->use, var)) >> + if (inst->dst.file == VGRF) { >> + if (!inst->is_partial_write() && !BITSET_TEST(bd->use, var)) >> BITSET_SET(bd->def, var); >> + >> + BITSET_SET(bd->defout, var); >> } >> } >> >> @@ -199,6 +201,28 @@ fs_live_variables::compute_live_variables() >> } >> } >> } >> + >> + /* Propagate defin and defout down the CFG to calculate the union of live >> + * variables potentially defined along any possible control flow path. >> + */ >> + do { >> + cont = false; >> + >> + foreach_block (block, cfg) { >> + const struct block_data *bd = &block_data[block->num]; >> + >> + foreach_list_typed(bblock_link, child_link, link, &block->children) { >> + struct block_data *child_bd = >> &block_data[child_link->block->num]; >> + >> + for (int i = 0; i < bitset_words; i++) { >> + const BITSET_WORD new_def = bd->defout[i] & >> ~child_bd->defin[i]; >> + child_bd->defin[i] |= new_def; >> + child_bd->defout[i] |= new_def; >> + cont |= new_def; >> + } >> + } >> + } >> + } while (cont); >> } >> >> /** >> @@ -212,12 +236,12 @@ fs_live_variables::compute_start_end() >> struct block_data *bd = &block_data[block->num]; >> >> for (int i = 0; i < num_vars; i++) { >> - if (BITSET_TEST(bd->livein, i)) { >> + if (BITSET_TEST(bd->livein, i) && BITSET_TEST(bd->defin, i)) { >> start[i] = MIN2(start[i], block->start_ip); >> end[i] = MAX2(end[i], block->start_ip); >> } >> >> - if (BITSET_TEST(bd->liveout, i)) { >> + if (BITSET_TEST(bd->liveout, i) && BITSET_TEST(bd->defout, i)) { >> start[i] = MIN2(start[i], block->end_ip); >> end[i] = MAX2(end[i], block->end_ip); >> } >> @@ -260,6 +284,8 @@ fs_live_variables::fs_live_variables(fs_visitor *v, >> const cfg_t *cfg) >> block_data[i].use = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words); >> block_data[i].livein = rzalloc_array(mem_ctx, BITSET_WORD, >> bitset_words); >> block_data[i].liveout = rzalloc_array(mem_ctx, BITSET_WORD, >> bitset_words); >> + block_data[i].defin = rzalloc_array(mem_ctx, BITSET_WORD, >> bitset_words); >> + block_data[i].defout = rzalloc_array(mem_ctx, BITSET_WORD, >> bitset_words); >> >> block_data[i].flag_def[0] = 0; >> block_data[i].flag_use[0] = 0; >> diff --git a/src/intel/compiler/brw_fs_live_variables.h >> b/src/intel/compiler/brw_fs_live_variables.h >> index d2d5898ed1c..9e95e443170 100644 >> --- a/src/intel/compiler/brw_fs_live_variables.h >> +++ b/src/intel/compiler/brw_fs_live_variables.h >> @@ -55,6 +55,18 @@ struct block_data { >> /** Which defs reach the exit point of the block. */ >> BITSET_WORD *liveout; >> >> + /** >> + * Variables such that the entry point of the block may be reached from >> any >> + * of their definitions. >> + */ >> + BITSET_WORD *defin; >> + >> + /** >> + * Variables such that the exit point of the block may be reached from >> any >> + * of their definitions. >> + */ >> + BITSET_WORD *defout; >> + >> BITSET_WORD flag_def[1]; >> BITSET_WORD flag_use[1]; >> BITSET_WORD flag_livein[1]; > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev