Hey all, apologies for the long delay. Haven't had time until recently to look into this further.

>>> The zero extract now matching against other modes would generate a
>>> test + branch rather than the combined instruction which led to the
>>> code size regression. I've updated the patch so that tbnz etc. matches GPI
>>> and that brings code size down to <0.2% in spec2017 and <0.4% in spec2006.
>>
>> That's looking better indeed. I notice there are still differences, eg. tbz/tbnz
>> counts are significantly different in perlbench, with ~350 missed cases overall
>> (mostly tbz reg, #7).
>>
>> There are also more uses of uxtw, ubfiz, sbfiz - for example I see cases like this
>> in namd:
>>
>>   42c7dc: 13007400 sbfx w0, w0, #0, #30
>>   42c7e0: 937c7c00 sbfiz x0, x0, #4, #32
>>
>> So it would be a good idea to check any benchmarks where there is still a
>> non-trivial codesize difference. You can get a quick idea what is happening by
>> grepping for instructions like this:
>>
>> grep -c sbfiz out1.txt out2.txt
>> out1.txt:872
>> out2.txt:934
>>
>> grep -c tbnz out1.txt out2.txt
>> out1.txt:5189
>> out2.txt:4989
That's really good insight, Wilco! I took a look at the tbnz/tbz case in perl, and we lose the match because allowing SI mode on extv/extzv causes subst in combine.c to generate:

    (lshiftrt:SI (reg:SI 107 [ _16 ])
        (const_int 7 [0x7]))
    (nil)

Instead of:

    (and:DI (lshiftrt:DI (subreg:DI (reg:SI 107 [ _16 ]) 0)
            (const_int 7 [0x7]))
        (const_int 1 [0x1]))

The latter case is picked up in make_compound_operation_int and transformed into a zero_extract, while the new case is left alone. An lshiftrt generally can't be reduced down to a bit-test, but in this case it can because we have zero_bit information on it. Given that, looking around try_combine, it seems the best place to detect this pattern is in the 2nd chance code after the first failure of recog_for_combine, which is what I've done in this patch. I think this is the right place for the fix, given that changing subst/make_compound_operation_int leads to significantly more diffs.

After this change the total number of tbnz/tbz lines up near identical to the baseline, which is good, and overall size is within 0.1% on spec 2017 and spec 2006. However, looking further at ubfiz, there's a pretty large increase in certain benchmarks. I looked into spec 2017/blender and we fail to combine this pattern:

    Trying 653 -> 654:
      653: r512:SI=r94:SI 0>>0x8
          REG_DEAD r94:SI
      654: r513:DI=zero_extend(r512:SI)
          REG_DEAD r512:SI
    Failed to match this instruction:
    (set (reg:DI 513)
        (zero_extract:DI (reg:SI 94 [ bswapdst_4 ])
            (const_int 8 [0x8])
            (const_int 8 [0x8])))

Where previously we combined it like this:

    Trying 653 -> 654:
      653: r512:SI=r94:SI 0>>0x8
          REG_DEAD r94:SI
      654: r513:DI=zero_extend(r512:SI)
          REG_DEAD r512:SI
    Successfully matched this instruction:
    (set (reg:DI 513)
        (zero_extract:DI (subreg:DI (reg:SI 94 [ bswapdst_4 ]) 0)   // subreg used
            (const_int 8 [0x8])
            (const_int 8 [0x8])))

Here's where I'm at an impasse.
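For readers following along, the two observations above can be modelled with plain integers. This is only a sketch, not GCC code: `zero_extract` and `shift_is_single_bit_test` are hypothetical helper names, bit 0 is assumed to be the least significant bit, and the 0xFF mask is an assumed example of what the known-zero-bits information might say about the register (the actual mask is not given in this thread).

```python
def zero_extract(value: int, size: int, pos: int) -> int:
    """Integer model of a zero_extract: SIZE bits of VALUE starting at bit POS.

    Assumes bit 0 is the least significant bit.
    """
    return (value >> pos) & ((1 << size) - 1)


def shift_is_single_bit_test(nonzero_mask: int, shift: int) -> bool:
    """True if (x >> shift) can only be 0 or 1 for every x within nonzero_mask."""
    return (nonzero_mask >> shift) <= 1


# Assumption for illustration only: just the low 8 bits of the register can be set.
ASSUMED_NONZERO_MASK = 0xFF

if __name__ == "__main__":
    # Under that assumption, shifting right by 7 is the same as testing bit 7.
    print(shift_is_single_bit_test(ASSUMED_NONZERO_MASK, 7))
    print(all((x >> 7) == zero_extract(x, 1, 7) for x in range(0x100)))
    # With no such knowledge the shift keeps many bits, so it is not a bit test.
    print(shift_is_single_bit_test(0xFFFFFFFF, 7))
    # Extracting bits 8..15 yields the same value whether the source is viewed
    # as 32 bits or as a 64-bit value with arbitrary upper bits.
    x32 = 0x12345678
    x64 = (0xDEADBEEF << 32) | x32
    print(zero_extract(x32, 8, 8) == zero_extract(x64, 8, 8))
```

The last check only says that the extracted value agrees in this integer model; whether a mode-mismatched extract is valid RTL is a separate question that the model does not answer.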
The code that generates the modes in get_best_reg_extraction_insn now looks at the inner mode of SI, since extzvsi is valid, and generates a non-subreg use. However, the MD pattern expects all modes to be DI or all to be SI, not a mix. I think a fix could be done to canonicalize these extracts to the same mode, but I'm unsure whether a mode-mismatched extract RTX is valid in general, which would make this a fairly large change.

The latest patch with the fix for tbnz/tbz is attached, alongside the SPEC size numbers and the instruction counts for SPEC 2017, for reference.

>>> Can you send me the necessary documents to make that happen? Thanks!
>>
>> That's something you need to sort out with the fsf. There is a mailing list
>> for this:
>> mailto:ass...@gnu.org.

I haven't had any response from my previous mail there. Should I add one of you to the CC or mail someone specifically to get traction?

Best,
Modi
pr86901.patch
SPEC CPU2006 section sizes, base vs. diff (% increase is on text):

     text      data       bss     total % increase  filename
  1038264    243802     12472   1294538             base/benchspec/CPU2006/400.perlbench/exe/perlbench_base.gcc9-base
  1038344    243714     12472   1294530      0.01%  diff/benchspec/CPU2006/400.perlbench/exe/perlbench_base.gcc9-diff
    72024     16030      4328     92382             base/benchspec/CPU2006/401.bzip2/exe/bzip2_base.gcc9-base
    72016     16030      4328     92374     -0.01%  diff/benchspec/CPU2006/401.bzip2/exe/bzip2_base.gcc9-diff
  2651976    816398    749792   4218166             base/benchspec/CPU2006/403.gcc/exe/gcc_base.gcc9-base
  2652096    816558    749792   4218446      0.00%  diff/benchspec/CPU2006/403.gcc/exe/gcc_base.gcc9-diff
     8232      4803     11912     24947             base/benchspec/CPU2006/429.mcf/exe/mcf_base.gcc9-base
     8232      4803     11912     24947      0.00%  diff/benchspec/CPU2006/429.mcf/exe/mcf_base.gcc9-diff
   105912     31228     37176    174316             base/benchspec/CPU2006/433.milc/exe/milc_base.gcc9-base
   105904     31228     37176    174308     -0.01%  diff/benchspec/CPU2006/433.milc/exe/milc_base.gcc9-diff
   205752     25512       496    231760             base/benchspec/CPU2006/444.namd/exe/namd_base.gcc9-base
   204792     25288       496    230576     -0.47%  diff/benchspec/CPU2006/444.namd/exe/namd_base.gcc9-diff
   757960   2917634   2328968   6004562             base/benchspec/CPU2006/445.gobmk/exe/gobmk_base.gcc9-base
   757952   2917634   2328968   6004554      0.00%  diff/benchspec/CPU2006/445.gobmk/exe/gobmk_base.gcc9-diff
  2180808    880678      3528   3065014             base/benchspec/CPU2006/447.dealII/exe/dealII_base.gcc9-base
  2181248    880670      3528   3065446      0.02%  diff/benchspec/CPU2006/447.dealII/exe/dealII_base.gcc9-diff
   345832     79234      1584    426650             base/benchspec/CPU2006/450.soplex/exe/soplex_base.gcc9-base
   345936     79234      1584    426754      0.03%  diff/benchspec/CPU2006/450.soplex/exe/soplex_base.gcc9-diff
   779768    225837    161496   1167101             base/benchspec/CPU2006/453.povray/exe/povray_base.gcc9-base
   779904    225965    161496   1167365      0.02%  diff/benchspec/CPU2006/453.povray/exe/povray_base.gcc9-diff
   249528     69146     81944    400618             base/benchspec/CPU2006/456.hmmer/exe/hmmer_base.gcc9-base
   249536     69146     81944    400626      0.00%  diff/benchspec/CPU2006/456.hmmer/exe/hmmer_base.gcc9-diff
   125080     38096   2576288   2739464             base/benchspec/CPU2006/458.sjeng/exe/sjeng_base.gcc9-base
   125080     38096   2576288   2739464      0.00%  diff/benchspec/CPU2006/458.sjeng/exe/sjeng_base.gcc9-diff
    30452     11213        96     41761             base/benchspec/CPU2006/462.libquantum/exe/libquantum_base.gcc9-base
    30420     11221        96     41737     -0.11%  diff/benchspec/CPU2006/462.libquantum/exe/libquantum_base.gcc9-diff
   567320    127533    371064   1065917             base/benchspec/CPU2006/464.h264ref/exe/h264ref_base.gcc9-base
   567304    127533    371064   1065901      0.00%  diff/benchspec/CPU2006/464.h264ref/exe/h264ref_base.gcc9-diff
     8376      4584        24     12984             base/benchspec/CPU2006/470.lbm/exe/lbm_base.gcc9-base
     8368      4584        24     12976     -0.10%  diff/benchspec/CPU2006/470.lbm/exe/lbm_base.gcc9-diff
   484536    216255     14528    715319             base/benchspec/CPU2006/471.omnetpp/exe/omnetpp_base.gcc9-base
   484528    216255     14528    715311      0.00%  diff/benchspec/CPU2006/471.omnetpp/exe/omnetpp_base.gcc9-diff
    33256     10075      5152     48483             base/benchspec/CPU2006/473.astar/exe/astar_base.gcc9-base
    33248     10075      5152     48475     -0.02%  diff/benchspec/CPU2006/473.astar/exe/astar_base.gcc9-diff
   155608     53889     32888    242385             base/benchspec/CPU2006/482.sphinx3/exe/sphinx_livepretend_base.gcc9-base
   155600     53889     32888    242377     -0.01%  diff/benchspec/CPU2006/482.sphinx3/exe/sphinx_livepretend_base.gcc9-diff
  2930312   1698488     11544   4640344             base/benchspec/CPU2006/483.xalancbmk/exe/Xalan_base.gcc9-base
  2930304   1698488     11544   4640336      0.00%  diff/benchspec/CPU2006/483.xalancbmk/exe/Xalan_base.gcc9-diff
     1016      1728         8      2752             base/benchspec/CPU2006/998.specrand/exe/specrand_base.gcc9-base
     1016      1728         8      2752      0.00%  diff/benchspec/CPU2006/998.specrand/exe/specrand_base.gcc9-diff
     1016      1728         8      2752             base/benchspec/CPU2006/999.specrand/exe/specrand_base.gcc9-base
     1016      1728         8      2752      0.00%  diff/benchspec/CPU2006/999.specrand/exe/specrand_base.gcc9-diff

Total text: base 12733028, diff 12732844, % increase 0.00%
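The "% increase" figures in the size listing are consistent with the relative change in the text column (for example, perlbench and namd work out to 0.01% and -0.47% that way). A minimal sketch of that arithmetic follows; `parse_size_row` and `text_increase_percent` are hypothetical names, and the five-field row layout (text, data, bss, total, filename) is assumed from the listing rather than from any particular `size` version.

```python
def parse_size_row(line: str) -> dict:
    """Split one 'text data bss total filename' row into named fields."""
    text, data, bss, total, filename = line.split()
    return {
        "text": int(text),
        "data": int(data),
        "bss": int(bss),
        "total": int(total),
        "filename": filename,
    }


def text_increase_percent(base: dict, diff: dict) -> float:
    """Relative change of the text section from base to diff, in percent."""
    return (diff["text"] - base["text"]) / base["text"] * 100.0


if __name__ == "__main__":
    # Rows copied from the SPEC CPU2006 perlbench entry in the listing.
    base = parse_size_row(
        "1038264 243802 12472 1294538 "
        "base/benchspec/CPU2006/400.perlbench/exe/perlbench_base.gcc9-base"
    )
    diff = parse_size_row(
        "1038344 243714 12472 1294530 "
        "diff/benchspec/CPU2006/400.perlbench/exe/perlbench_base.gcc9-diff"
    )
    print(f"{text_increase_percent(base, diff):.2f}%")
```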
SPEC 2017 instruction counts (d-b = diff - base):

       tbnz        |       tbz         |      sbfiz        |      ubfiz        |       uxtw        |       sxtw        |
  base  diff   d-b |  base  diff   d-b |  base  diff   d-b |  base  diff   d-b |  base  diff   d-b |  base  diff   d-b | filename
  4394  4396     2 |  4442  4444     2 |   215   215     0 |   322   323     1 |   328   328     0 |  3217  3234    17 | diff/benchspec/CPU/500.perlbench_r/exe/perlbench_r_base.diff-64
  7755  7833    78 |  6645  6728    83 |  1235  1236     1 |  2801  3018   217 | 10271 10271     0 |  9770  9794    24 | diff/benchspec/CPU/502.gcc_r/exe/cpugcc_r_base.diff-64
    19    19     0 |    12    12     0 |     1     1     0 |     0     0     0 |     0     0     0 |    25    25     0 | diff/benchspec/CPU/505.mcf_r/exe/mcf_r_base.diff-64
   148   148     0 |    65    65     0 |   724   786    62 |  3302  3475   173 |   735   735     0 |  5057  5121    64 | diff/benchspec/CPU/508.namd_r/exe/namd_r_base.diff-64
  1295  1295     0 |  1447  1448     1 |  1694  1694     0 |  5775  5899   124 |  7367  7368     1 | 11068 11072     4 | diff/benchspec/CPU/510.parest_r/exe/parest_r_base.diff-64
     5     5     0 |     8     8     0 |     3     3     0 |    25    27     2 |     2     2     0 |    13    13     0 | diff/benchspec/CPU/511.povray_r/exe/imagevalidate_511_base.diff-64
   312   312     0 |   438   438     0 |   245   245     0 |   140   154    14 |   165   165     0 |   982   987     5 | diff/benchspec/CPU/511.povray_r/exe/povray_r_base.diff-64
     3     3     0 |     3     3     0 |     5     5     0 |     0     0     0 |     0     0     0 |     8     8     0 | diff/benchspec/CPU/519.lbm_r/exe/lbm_r_base.diff-64
   690   690     0 |   361   361     0 |   336   336     0 |    26    26     0 |   179   179     0 |  1216  1216     0 | diff/benchspec/CPU/520.omnetpp_r/exe/omnetpp_r_base.diff-64
   446   451     5 |   905   905     0 |   138   138     0 |  1108  1249   141 |  3946  3946     0 |   893   893     0 | diff/benchspec/CPU/523.xalancbmk_r/exe/cpuxalan_r_base.diff-64
  6750  6761    11 |  7849  7855     6 |  2125  2125     0 |   813   952   139 |  1341  1340    -1 |  8339  8358    19 | diff/benchspec/CPU/526.blender_r/exe/blender_r_base.diff-64
     5     5     0 |     8     8     0 |     3     3     0 |    25    27     2 |     2     2     0 |    13    13     0 | diff/benchspec/CPU/526.blender_r/exe/imagevalidate_526_base.diff-64
    43    43     0 |    13    13     0 |    18    18     0 |    19    23     4 |     9     9     0 |   414   424    10 | diff/benchspec/CPU/531.deepsjeng_r/exe/deepsjeng_r_base.diff-64
     5     5     0 |     8     8     0 |     3     3     0 |    25    27     2 |     2     2     0 |    13    13     0 | diff/benchspec/CPU/538.imagick_r/exe/imagevalidate_538_base.diff-64
   340   340     0 |   497   497     0 |     2     2     0 |   682   682     0 |    75    75     0 |   157   158     1 | diff/benchspec/CPU/538.imagick_r/exe/imagick_r_base.diff-64
    24    24     0 |    26    26     0 |    13    13     0 |     6    10     4 |    44    44     0 |   595   595     0 | diff/benchspec/CPU/541.leela_r/exe/leela_r_base.diff-64
    56    56     0 |    70    70     0 |    92    92     0 |    36    47    11 |    10    10     0 |   454   454     0 | diff/benchspec/CPU/544.nab_r/exe/nab_r_base.diff-64
    22    22     0 |     9     9     0 |     1     1     0 |   115   118     3 |   251   251     0 |    28    28     0 | diff/benchspec/CPU/557.xz_r/exe/xz_r_base.diff-64
     0     0     0 |     0     0     0 |     0     0     0 |     0     0     0 |     0     0     0 |     4     4     0 | diff/benchspec/CPU/997.specrand_fr/exe/specrand_fr_base.diff-64
     0     0     0 |     0     0     0 |     0     0     0 |     0     0     0 |     0     0     0 |     4     4     0 | diff/benchspec/CPU/999.specrand_ir/exe/specrand_ir_base.diff-64
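Counts like the ones above can be gathered from disassembly dumps in the same spirit as the `grep -c` suggestion quoted earlier. The sketch below is an assumption about one way to do it, not the script actually used for these numbers: `count_mnemonics` and `count_deltas` are hypothetical names, the input is assumed to be objdump-style text with one instruction per line, and a mnemonic is matched as a whole whitespace-separated token, which is slightly stricter than grep's substring match.

```python
from collections import Counter

# Mnemonics tracked in the instruction-count table.
MNEMONICS = ("tbnz", "tbz", "sbfiz", "ubfiz", "uxtw", "sxtw")


def count_mnemonics(disassembly: str, mnemonics=MNEMONICS) -> Counter:
    """Count lines whose whitespace-separated tokens include each mnemonic."""
    counts = Counter({m: 0 for m in mnemonics})
    for line in disassembly.splitlines():
        tokens = line.split()
        for m in mnemonics:
            if m in tokens:
                counts[m] += 1
    return counts


def count_deltas(base: Counter, diff: Counter) -> dict:
    """Per-mnemonic diff - base, as in the d-b columns."""
    return {m: diff[m] - base[m] for m in base}


if __name__ == "__main__":
    # The two namd lines quoted earlier in the thread, used as sample input.
    sample = (
        "  42c7dc: 13007400 sbfx w0, w0, #0, #30\n"
        "  42c7e0: 937c7c00 sbfiz x0, x0, #4, #32\n"
    )
    print(dict(count_mnemonics(sample)))
```

Token matching keeps `tbz` and `tbnz` separate and avoids counting `sbfx` as `sbfiz`, at the cost of missing mnemonics that are glued to other text on the line.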
SPEC CPU 2017 section sizes, base vs. diff (% increase is on text):

     text      data       bss     total % increase  filename
  1775240    561457      8760   2345457             base/benchspec/CPU/500.perlbench_r/exe/perlbench_r_base.base-64
  1775432    561305      8760   2345497      0.01%  diff/benchspec/CPU/500.perlbench_r/exe/perlbench_r_base.diff-64
  7755224   2191963   1146680  11093867             base/benchspec/CPU/502.gcc_r/exe/cpugcc_r_base.base-64
  7754488   2192051   1146680  11093219     -0.01%  diff/benchspec/CPU/502.gcc_r/exe/cpugcc_r_base.diff-64
    23064      6391       728     30183             base/benchspec/CPU/505.mcf_r/exe/mcf_r_base.base-64
    23064      6391       728     30183      0.00%  diff/benchspec/CPU/505.mcf_r/exe/mcf_r_base.diff-64
   677176     30046       960    708182             base/benchspec/CPU/508.namd_r/exe/namd_r_base.base-64
   677816     30046       960    708822      0.09%  diff/benchspec/CPU/508.namd_r/exe/namd_r_base.diff-64
  7082456   1704125     16128   8802709             base/benchspec/CPU/510.parest_r/exe/parest_r_base.base-64
  7084584   1704145     16128   8804857      0.03%  diff/benchspec/CPU/510.parest_r/exe/parest_r_base.diff-64
    14072      9141        24     23237             base/benchspec/CPU/511.povray_r/exe/imagevalidate_511_base.base-64
    14072      9141        24     23237      0.00%  diff/benchspec/CPU/511.povray_r/exe/imagevalidate_511_base.diff-64
   789000    246142    188784   1223926             base/benchspec/CPU/511.povray_r/exe/povray_r_base.base-64
   789032    246134    188784   1223950      0.00%  diff/benchspec/CPU/511.povray_r/exe/povray_r_base.diff-64
    10648      4744        24     15416             base/benchspec/CPU/519.lbm_r/exe/lbm_r_base.base-64
    10640      4744        24     15408     -0.08%  diff/benchspec/CPU/519.lbm_r/exe/lbm_r_base.diff-64
  1505352    744909     46536   2296797             base/benchspec/CPU/520.omnetpp_r/exe/omnetpp_r_base.base-64
  1505344    744925     46536   2296805      0.00%  diff/benchspec/CPU/520.omnetpp_r/exe/omnetpp_r_base.diff-64
  3992744   1933641     14752   5941137             base/benchspec/CPU/523.xalancbmk_r/exe/cpuxalan_r_base.base-64
  3992736   1933641     14752   5941129      0.00%  diff/benchspec/CPU/523.xalancbmk_r/exe/cpuxalan_r_base.diff-64
  7994116   7841208    417344  16252668             base/benchspec/CPU/526.blender_r/exe/blender_r_base.base-64
  7994276   7841400    417344  16253020      0.00%  diff/benchspec/CPU/526.blender_r/exe/blender_r_base.diff-64
    14072      9141        24     23237             base/benchspec/CPU/526.blender_r/exe/imagevalidate_526_base.base-64
    14072      9141        24     23237      0.00%  diff/benchspec/CPU/526.blender_r/exe/imagevalidate_526_base.diff-64
    76264     16488  12138272  12231024             base/benchspec/CPU/531.deepsjeng_r/exe/deepsjeng_r_base.base-64
    76296     16488  12138272  12231056      0.04%  diff/benchspec/CPU/531.deepsjeng_r/exe/deepsjeng_r_base.diff-64
    14072      9080        24     23176             base/benchspec/CPU/538.imagick_r/exe/imagevalidate_538_base.base-64
    14072      9080        24     23176      0.00%  diff/benchspec/CPU/538.imagick_r/exe/imagevalidate_538_base.diff-64
  1772984    408975      5520   2187479             base/benchspec/CPU/538.imagick_r/exe/imagick_r_base.base-64
  1772152    409119      5520   2186791     -0.05%  diff/benchspec/CPU/538.imagick_r/exe/imagick_r_base.diff-64
   174616     46829     30032    251477             base/benchspec/CPU/541.leela_r/exe/leela_r_base.base-64
   174624     46829     30032    251485      0.00%  diff/benchspec/CPU/541.leela_r/exe/leela_r_base.diff-64
   166616     39807    381952    588375             base/benchspec/CPU/544.nab_r/exe/nab_r_base.base-64
   166616     39807    381952    588375      0.00%  diff/benchspec/CPU/544.nab_r/exe/nab_r_base.diff-64
   131672     69274     17576    218522             base/benchspec/CPU/557.xz_r/exe/xz_r_base.base-64
   131680     69274     17576    218530      0.01%  diff/benchspec/CPU/557.xz_r/exe/xz_r_base.diff-64
     2440      2361      5008      9809             base/benchspec/CPU/997.specrand_fr/exe/specrand_fr_base.base-64
     2440      2361      5008      9809      0.00%  diff/benchspec/CPU/997.specrand_fr/exe/specrand_fr_base.diff-64
     2440      2361      5008      9809             base/benchspec/CPU/999.specrand_ir/exe/specrand_ir_base.base-64
     2440      2361      5008      9809      0.00%  diff/benchspec/CPU/999.specrand_ir/exe/specrand_ir_base.diff-64

Total text: base 33974268, diff 33975876, % increase 0.00%