On 2/28/19 4:47 AM, Iago Toral wrote: > On Wed, 2019-02-27 at 17:04 -0800, Ian Romanick wrote: >> On 2/27/19 4:45 AM, Iago Toral Quiroga wrote: >>> Now that we propagate constants to the first source of 2src >>> instructions we >>> see more opportunities of constant folding in the backend. >>> >>> Shader-db results on KBL: >>> >>> total instructions in shared programs: 14965607 -> 14855983 (- >>> 0.73%) >>> instructions in affected programs: 3988102 -> 3878478 (-2.75%) >>> helped: 14292 >>> HURT: 59 >>> >>> total cycles in shared programs: 344324295 -> 340656008 (-1.07%) >>> cycles in affected programs: 247527740 -> 243859453 (-1.48%) >>> helped: 14056 >>> HURT: 3314 >>> >>> total loops in shared programs: 4283 -> 4283 (0.00%) >>> loops in affected programs: 0 -> 0 >>> helped: 0 >>> HURT: 0 >>> >>> total spills in shared programs: 27812 -> 24350 (-12.45%) >>> spills in affected programs: 24921 -> 21459 (-13.89%) >>> helped: 345 >>> HURT: 19 >>> >>> total fills in shared programs: 24173 -> 22032 (-8.86%) >>> fills in affected programs: 21124 -> 18983 (-10.14%) >>> helped: 355 >>> HURT: 25 >> >> Ignore my previous questions about nir_opt_constant_folding after >> nir_opt_algebraic_late. I had done that because I added a bunch of >> things to nir_opt_algebraic_late that created my constant folding >> opportunities. >> >> This is the combined changes for this patch and the previous >> patch. For >> this patch alone, I got: >> >> total instructions in shared programs: 15306213 -> 15221518 (-0.55%) >> instructions in affected programs: 2911451 -> 2826756 (-2.91%) >> helped: 13121 >> HURT: 44 >> helped stats (abs) min: 1 max: 51 x̄: 6.66 x̃: 6 >> helped stats (rel) min: <.01% max: 16.67% x̄: 4.27% x̃: 3.30% >> HURT stats (abs) min: 3 max: 453 x̄: 61.16 x̃: 5 >> HURT stats (rel) min: 0.20% max: 151.00% x̄: 31.57% x̃: 19.23% >> 95% mean confidence interval for instructions value: -6.61 -6.26 >> 95% mean confidence interval for instructions %-change: -4.23% -4.07% >> Instructions are helped. >> >> total cycles in shared programs: 375419164 -> 372829148 (-0.69%) >> cycles in affected programs: 146769299 -> 144179283 (-1.76%) >> helped: 10992 >> HURT: 1833 >> helped stats (abs) min: 1 max: 56127 x̄: 250.29 x̃: 18 >> helped stats (rel) min: <.01% max: 40.52% x̄: 3.11% x̃: 2.58% >> HURT stats (abs) min: 1 max: 1718 x̄: 87.93 x̃: 42 >> HURT stats (rel) min: <.01% max: 139.33% x̄: 7.74% x̃: 3.08% >> 95% mean confidence interval for cycles value: -248.21 -155.69 >> 95% mean confidence interval for cycles %-change: -1.67% -1.44% >> Cycles are helped. >> >> total spills in shared programs: 28828 -> 28888 (0.21%) >> spills in affected programs: 2037 -> 2097 (2.95%) >> helped: 0 >> HURT: 24 >> >> total fills in shared programs: 35542 -> 35639 (0.27%) >> fills in affected programs: 3078 -> 3175 (3.15%) >> helped: 2 >> HURT: 26 >> >> I decided to look at some of the hurt shaders... it looks like some >> of >> the Unigine geometry shaders really took a beating (+150% >> instructions). >> Note the "max" in the "instructions in affected programs" above. > > I am seeing quite different results on my KBL laptop: > > total instructions in shared programs: 14945933 -> 14858158 (-0.59%) > instructions in affected programs: 2842901 -> 2755126 (-3.09%) > helped: 13196 > HURT: 5 > > instructions HURT: shaders/closed/steam/deus-ex-mankind- > divided/274.shader_test CS SIMD8: 1535 -> 1538 (0.20%) > instructions HURT: shaders/closed/steam/deus-ex-mankind- > divided/184.shader_test CS SIMD8: 1535 -> 1538 (0.20%) > instructions HURT: shaders/dolphin/ubershaders/147.shader_test FS > SIMD8: 3481 -> 3491 (0.29%) > instructions HURT: shaders/dolphin/ubershaders/156.shader_test FS > SIMD8: 3465 -> 3475 (0.29%) > instructions HURT: shaders/dolphin/ubershaders/138.shader_test FS > SIMD8: 3465 -> 3475 (0.29%) > > Did you test on a different gen? Can you paste here the paths of some > of the GS shaders where you see the big regressions so I can verify I > have them in my shader-db? > > Also, how did you test this patch exactly? When I was going to capture > the reference shader-db results for patch 2 in this series so I could > extract the results for patch 3 by comparing against it, I noticed that > patch 2 would create constant folding scenarios (for example for ADD > and MUL) that, before this patch, would hit an assertion in the driver > since the algebraic pass only expects to find these opportunities for F > types and will assert on that, so I guess you noticed this and fixed it > before taking your numbers?
I ran it through my usual shader-db gauntlet that runs shader-db at each commit for SKL, BDW, HSW, IVB, SNB, ILK, and GM45. *But* since one pass of that takes a really, really long time, I only run release builds with -march=native and all the other tricks. None of the assertions would exist in that run. If patch 2 creates possible assertion failures, the two patches should probably be re-ordered or the previous patch should do something to avoid the assertions. Some day, someone will hit that patch in a bisect, and they will be sad. What I think happened is the previous patch helped a bunch of shaders by breaking them, and this patch fixes them... making them "worse." When I look at the total across both patches, I get results much more similar to yours, so this was a false alarm. Skylake total instructions in shared programs: 15328886 -> 15219343 (-0.71%) instructions in affected programs: 3999638 -> 3890095 (-2.74%) helped: 14311 HURT: 65 helped stats (abs) min: 1 max: 502 x̄: 7.67 x̃: 4 helped stats (rel) min: 0.04% max: 26.58% x̄: 4.07% x̃: 3.05% HURT stats (abs) min: 1 max: 22 x̄: 2.35 x̃: 1 HURT stats (rel) min: 0.06% max: 1.96% x̄: 0.42% x̃: 0.34% 95% mean confidence interval for instructions value: -7.88 -7.36 95% mean confidence interval for instructions %-change: -4.10% -3.99% Instructions are helped. total cycles in shared programs: 376419186 -> 372742630 (-0.98%) cycles in affected programs: 277719712 -> 274043156 (-1.32%) helped: 14204 HURT: 3270 helped stats (abs) min: 1 max: 118186 x̄: 278.57 x̃: 17 helped stats (rel) min: <.01% max: 40.52% x̄: 2.71% x̃: 2.14% HURT stats (abs) min: 1 max: 5031 x̄: 85.69 x̃: 26 HURT stats (rel) min: <.01% max: 65.74% x̄: 6.12% x̃: 2.00% 95% mean confidence interval for cycles value: -247.21 -173.60 95% mean confidence interval for cycles %-change: -1.15% -0.96% Cycles are helped. total spills in shared programs: 32328 -> 28888 (-10.64%) spills in affected programs: 26262 -> 22822 (-13.10%) helped: 345 HURT: 17 total fills in shared programs: 37768 -> 35639 (-5.64%) fills in affected programs: 22394 -> 20265 (-9.51%) helped: 354 HURT: 24 LOST: 0 GAINED: 5 >> More comments below by SHL... >> >>> LOST: 0 >>> GAINED: 5 >>> --- >>> src/intel/compiler/brw_fs.cpp | 203 >>> ++++++++++++++++++++++++++++++++-- >>> 1 file changed, 195 insertions(+), 8 deletions(-) >>> >>> diff --git a/src/intel/compiler/brw_fs.cpp >>> b/src/intel/compiler/brw_fs.cpp >>> index 2358acbeb59..b2b60237c82 100644 >>> --- a/src/intel/compiler/brw_fs.cpp >>> +++ b/src/intel/compiler/brw_fs.cpp >>> @@ -2583,9 +2583,55 @@ fs_visitor::opt_algebraic() >>> break; >>> >>> case BRW_OPCODE_MUL: >>> - if (inst->src[1].file != IMM) >>> + if (inst->src[0].file != IMM && inst->src[1].file != IMM) >>> continue; >>> >>> + /* Constant folding */ >>> + if (inst->src[0].file == IMM && inst->src[1].file == IMM) >>> { >>> + assert(inst->src[0].type == inst->src[1].type); >>> + bool local_progress = true; >>> + switch (inst->src[0].type) { >>> + case BRW_REGISTER_TYPE_HF: { >>> + float v1 = _mesa_half_to_float(inst->src[0].ud & >>> 0xffffu); >>> + float v2 = _mesa_half_to_float(inst->src[1].ud & >>> 0xffffu); >>> + inst->src[0] = brw_imm_w(_mesa_float_to_half(v1 * >>> v2)); >>> + break; >>> + } >>> + case BRW_REGISTER_TYPE_W: { >>> + int16_t v1 = inst->src[0].ud & 0xffffu; >>> + int16_t v2 = inst->src[1].ud & 0xffffu; >>> + inst->src[0] = brw_imm_w(v1 * v2); >>> + break; >>> + } >>> + case BRW_REGISTER_TYPE_UW: { >>> + uint16_t v1 = inst->src[0].ud & 0xffffu; >>> + uint16_t v2 = inst->src[1].ud & 0xffffu; >>> + inst->src[0] = brw_imm_uw(v1 * v2); >>> + break; >>> + } >>> + case BRW_REGISTER_TYPE_F: >>> + inst->src[0].f *= inst->src[1].f; >>> + break; >>> + case BRW_REGISTER_TYPE_D: >>> + inst->src[0].d *= inst->src[1].d; >>> + break; >>> + case BRW_REGISTER_TYPE_UD: >>> + inst->src[0].ud *= inst->src[1].ud; >>> + break; >>> + default: >>> + local_progress = false; >>> + break; >>> + }; >>> + >>> + if (local_progress) { >>> + inst->opcode = BRW_OPCODE_MOV; >>> + inst->src[1] = reg_undef; >>> + progress = true; >>> + break; >>> + } >>> + } >>> + >>> + >>> /* a * 1.0 = a */ >>> if (inst->src[1].is_one()) { >>> inst->opcode = BRW_OPCODE_MOV; >>> @@ -2594,6 +2640,14 @@ fs_visitor::opt_algebraic() >>> break; >>> } >>> >>> + if (inst->src[0].is_one()) { >>> + inst->opcode = BRW_OPCODE_MOV; >>> + inst->src[0] = inst->src[1]; >>> + inst->src[1] = reg_undef; >>> + progress = true; >>> + break; >>> + } >>> + >>> /* a * -1.0 = -a */ >>> if (inst->src[1].is_negative_one()) { >>> inst->opcode = BRW_OPCODE_MOV; >>> @@ -2603,27 +2657,160 @@ fs_visitor::opt_algebraic() >>> break; >>> } >>> >>> - if (inst->src[0].file == IMM) { >>> - assert(inst->src[0].type == BRW_REGISTER_TYPE_F); >>> + if (inst->src[0].is_negative_one()) { >>> + inst->opcode = BRW_OPCODE_MOV; >>> + inst->src[0] = inst->src[1]; >>> + inst->src[0].negate = !inst->src[1].negate; >>> + inst->src[1] = reg_undef; >>> + progress = true; >>> + break; >>> + } >>> + >>> + /* a * 0 = 0 (this is not exact for floating point) */ >>> + if (inst->src[1].is_zero() && >>> + brw_reg_type_is_integer(inst->src[1].type)) { >>> + inst->opcode = BRW_OPCODE_MOV; >>> + inst->src[0] = inst->src[1]; >>> + inst->src[1] = reg_undef; >>> + progress = true; >>> + break; >>> + } >>> + >>> + if (inst->src[0].is_zero() && >>> + brw_reg_type_is_integer(inst->src[0].type)) { >>> inst->opcode = BRW_OPCODE_MOV; >>> - inst->src[0].f *= inst->src[1].f; >>> inst->src[1] = reg_undef; >>> progress = true; >>> break; >>> } >>> break; >>> case BRW_OPCODE_ADD: >>> - if (inst->src[1].file != IMM) >>> + if (inst->src[0].file != IMM && inst->src[1].file != IMM) >>> continue; >>> >>> - if (inst->src[0].file == IMM) { >>> - assert(inst->src[0].type == BRW_REGISTER_TYPE_F); >>> + /* Constant folding */ >>> + if (inst->src[0].file == IMM && inst->src[1].file == IMM) >>> { >>> + assert(inst->src[0].type == inst->src[1].type); >>> + bool local_progress = true; >>> + switch (inst->src[0].type) { >>> + case BRW_REGISTER_TYPE_HF: { >>> + float v1 = _mesa_half_to_float(inst->src[0].ud & >>> 0xffffu); >>> + float v2 = _mesa_half_to_float(inst->src[1].ud & >>> 0xffffu); >>> + inst->src[0] = brw_imm_w(_mesa_float_to_half(v1 + >>> v2)); >>> + break; >>> + } >>> + case BRW_REGISTER_TYPE_W: { >>> + int16_t v1 = inst->src[0].ud & 0xffffu; >>> + int16_t v2 = inst->src[1].ud & 0xffffu; >>> + inst->src[0] = brw_imm_w(v1 + v2); >>> + break; >>> + } >>> + case BRW_REGISTER_TYPE_UW: { >>> + uint16_t v1 = inst->src[0].ud & 0xffffu; >>> + uint16_t v2 = inst->src[1].ud & 0xffffu; >>> + inst->src[0] = brw_imm_uw(v1 + v2); >>> + break; >>> + } >>> + case BRW_REGISTER_TYPE_F: >>> + inst->src[0].f += inst->src[1].f; >>> + break; >>> + case BRW_REGISTER_TYPE_D: >>> + inst->src[0].d += inst->src[1].d; >>> + break; >>> + case BRW_REGISTER_TYPE_UD: >>> + inst->src[0].ud += inst->src[1].ud; >>> + break; >>> + default: >>> + local_progress = false; >>> + break; >>> + }; >>> + >>> + if (local_progress) { >>> + inst->opcode = BRW_OPCODE_MOV; >>> + inst->src[1] = reg_undef; >>> + progress = true; >>> + break; >>> + } >>> + } >>> + >>> + /* a + 0 = a (this is not exact for floating point) */ >>> + if (inst->src[1].is_zero() && >>> + brw_reg_type_is_integer(inst->src[1].type)) { >>> inst->opcode = BRW_OPCODE_MOV; >>> - inst->src[0].f += inst->src[1].f; >>> inst->src[1] = reg_undef; >>> progress = true; >>> break; >>> } >>> + >>> + if (inst->src[0].is_zero() && >>> + brw_reg_type_is_integer(inst->src[0].type)) { >>> + inst->opcode = BRW_OPCODE_MOV; >>> + inst->src[0] = inst->src[1]; >>> + inst->src[1] = reg_undef; >>> + progress = true; >>> + break; >>> + } >>> + break; >>> + case BRW_OPCODE_SHL: >>> + if (inst->src[0].file == IMM && inst->src[1].file == IMM) >>> { >>> + bool local_progress = true; >>> + switch (inst->src[0].type) { >>> + case BRW_REGISTER_TYPE_D: >>> + case BRW_REGISTER_TYPE_UD: >>> + inst->src[0].ud <<= inst->src[1].ud; >>> + break; >> >> There's something fishy going on here. In one of the less hurt >> Unigine >> shaders, the before code was: >> >> mov(1) g2<1>UD 0x00000001UD { >> align1 WE_all 1N compacted }; >> mov(8) g126<1>UD g1<8,8,1>UD { >> align1 WE_all 1Q compacted }; >> shl(8) g127<1>UD g2<0,1,0>UD 0xffffffffUD { >> align1 1Q compacted }; >> >> and the after code is >> >> mov(8) g126<1>UD g1<8,8,1>UD { >> align1 WE_all 1Q compacted }; >> mov(8) g127<1>UD 0x00000008UD { >> align1 WE_all 1Q compacted }; >> >> That doesn't seem right. I thought 1U << 0xffffffffU should be 1u << >> 31 >> = 0x80000000. Since Gen explicitly only uses the low 5 bits of src1, >> maybe we need "& 0x1f" around all the shift counts? I thought the >> CPU >> did the same thing... >> >> Does this series pass the CI? >> >>> + case BRW_REGISTER_TYPE_W: >>> + case BRW_REGISTER_TYPE_UW: { >>> + uint16_t v1 = inst->src[0].ud & 0xffffu; >>> + uint16_t v2 = inst->src[1].ud & 0xffffu; >>> + inst->src[0] = retype(brw_imm_uw(v1 << v2), inst- >>>> src[0].type); >>> + break; >>> + } >>> + default: >>> + local_progress = false; >>> + break; >>> + } >>> + if (local_progress) { >>> + inst->opcode = BRW_OPCODE_MOV; >>> + inst->src[1] = reg_undef; >>> + progress = true; >>> + break; >>> + } >>> + } >>> + break; >>> + case BRW_OPCODE_SHR: >>> + if (inst->src[0].file == IMM && inst->src[1].file == IMM) >>> { >>> + bool local_progress = true; >>> + switch (inst->src[0].type) { >>> + case BRW_REGISTER_TYPE_D: >>> + inst->src[0].d >>= inst->src[1].ud; >>> + break; >>> + case BRW_REGISTER_TYPE_UD: >>> + inst->src[0].ud >>= inst->src[1].ud; >>> + break; >>> + case BRW_REGISTER_TYPE_W: { >>> + int16_t v1 = inst->src[0].ud & 0xffffu; >>> + uint16_t v2 = inst->src[1].ud & 0xffffu; >>> + inst->src[0] = brw_imm_w(v1 >> v2); >>> + break; >>> + } >>> + case BRW_REGISTER_TYPE_UW: { >>> + uint16_t v1 = inst->src[0].ud & 0xffffu; >>> + uint16_t v2 = inst->src[1].ud & 0xffffu; >>> + inst->src[0] = brw_imm_uw(v1 >> v2); >>> + break; >>> + } >>> + default: >>> + local_progress = false; >>> + break; >>> + } >>> + if (local_progress) { >>> + inst->opcode = BRW_OPCODE_MOV; >>> + inst->src[1] = reg_undef; >>> + progress = true; >>> + break; >>> + } >>> + } >>> break; >>> case BRW_OPCODE_OR: >>> if (inst->src[0].equals(inst->src[1]) || _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev