Oleg Endo <oleg.e...@t-online.de> writes:
> On Thu, 2024-06-20 at 14:34 +0100, Richard Sandiford wrote:
>>
>> I tried compiling at least one target per CPU directory and comparing
>> the assembly output for parts of the GCC testsuite.  This is just a way
>> of getting a flavour of how the pass performs; it obviously isn't a
>> meaningful benchmark.  All targets seemed to improve on average:
>>
>> Target                Tests  Good  Bad   %Good    Delta  Median
>> ======                =====  ====  ===   =====    =====  ======
>> aarch64-linux-gnu      2215  1975  240  89.16%    -4159      -1
>> aarch64_be-linux-gnu   1569  1483   86  94.52%   -10117      -1
>> alpha-linux-gnu        1454  1370   84  94.22%    -9502      -1
>> amdgcn-amdhsa          5122  4671  451  91.19%   -35737      -1
>> arc-elf                2166  1932  234  89.20%   -37742      -1
>> arm-linux-gnueabi      1953  1661  292  85.05%   -12415      -1
>> arm-linux-gnueabihf    1834  1549  285  84.46%   -11137      -1
>> avr-elf                4789  4330  459  90.42%  -441276      -4
>> bfin-elf               2795  2394  401  85.65%   -19252      -1
>> bpf-elf                3122  2928  194  93.79%    -8785      -1
>> c6x-elf                2227  1929  298  86.62%   -17339      -1
>> cris-elf               3464  3270  194  94.40%   -23263      -2
>> csky-elf               2915  2591  324  88.89%   -22146      -1
>> epiphany-elf           2399  2304   95  96.04%   -28698      -2
>> fr30-elf               7712  7299  413  94.64%   -99830      -2
>> frv-linux-gnu          3332  2877  455  86.34%   -25108      -1
>> ft32-elf               2775  2667  108  96.11%   -25029      -1
>> h8300-elf              3176  2862  314  90.11%   -29305      -2
>> hppa64-hp-hpux11.23    4287  4247   40  99.07%   -45963      -2
>> ia64-linux-gnu         2343  1946  397  83.06%    -9907      -2
>> iq2000-elf             9684  9637   47  99.51%  -126557      -2
>> lm32-elf               2681  2608   73  97.28%   -59884      -3
>> loongarch64-linux-gnu  1303  1218   85  93.48%   -13375      -2
>> m32r-elf               1626  1517  109  93.30%    -9323      -2
>> m68k-linux-gnu         3022  2620  402  86.70%   -21531      -1
>> mcore-elf              2315  2085  230  90.06%   -24160      -1
>> microblaze-elf         2782  2585  197  92.92%   -16530      -1
>> mipsel-linux-gnu       1958  1827  131  93.31%   -15462      -1
>> mipsisa64-linux-gnu    1655  1488  167  89.91%   -16592      -2
>> mmix                   4914  4814  100  97.96%   -63021      -1
>> mn10300-elf            3639  3320  319  91.23%   -34752      -2
>> moxie-rtems            3497  3252  245  92.99%   -87305      -3
>> msp430-elf             4353  3876  477  89.04%   -23780      -1
>> nds32le-elf            3042  2780  262  91.39%   -27320      -1
>> nios2-linux-gnu        1683  1355  328  80.51%    -8065      -1
>> nvptx-none             2114  1781  333  84.25%   -12589      -2
>> or1k-elf               3045  2699  346  88.64%   -14328      -2
>> pdp11                  4515  4146  369  91.83%   -26047      -2
>> pru-elf                1585  1245  340  78.55%    -5225      -1
>> riscv32-elf            2122  2000  122  94.25%  -101162      -2
>> riscv64-elf            1841  1726  115  93.75%   -49997      -2
>> rl78-elf               2823  2530  293  89.62%   -40742      -4
>> rx-elf                 2614  2480  134  94.87%   -18863      -1
>> s390-linux-gnu         1591  1393  198  87.55%   -16696      -1
>> s390x-linux-gnu        2015  1879  136  93.25%   -21134      -1
>> sh-linux-gnu           1870  1507  363  80.59%    -9491      -1
>> sparc-linux-gnu        1123  1075   48  95.73%   -14503      -1
>> sparc-wrs-vxworks      1121  1073   48  95.72%   -14578      -1
>> sparc64-linux-gnu      1096  1021   75  93.16%   -15003      -1
>> v850-elf               1897  1728  169  91.09%   -11078      -1
>> vax-netbsdelf          3035  2995   40  98.68%   -27642      -1
>> visium-elf             1392  1106  286  79.45%    -7984      -2
>> xstormy16-elf          2577  2071  506  80.36%   -13061      -1
>>
>
> Since you have already briefly compared some of the code, can you share
> those cases which get worse and might require some potential follow up
> patches?

I think a lot of them are unpredictable secondary effects, such as on
register allocation, tail merging potential, and so on.  For sh, it also
includes whether delay slots are filled with useful work, or whether
they get a nop.  (Instruction combination tends to create more complex
instructions, so there will be fewer 2-byte instructions to act as
delay slot candidates.)

Also, this kind of combination can decrease the number of instructions
but increase the constant pool size.  The figures take that into account.
(The comparison is a bit ad-hoc, though, since I wasn't dedicated enough
to try to build a full source->executable toolchain for each target. :))

To give one example, the effect on gcc.c-torture/compile/20040727-1.c is:

@@ -6,18 +6,21 @@
        .global GC_dirty_init
        .type   GC_dirty_init, @function
 GC_dirty_init:
-       mov.l   .L2,r4
-       mov     r4,r6
-       mov     r4,r5
-       add     #-64,r5
-       mov.l   .L3,r0
+       mov.l   .L2,r6
+       mov.l   .L3,r5
+       mov.l   .L4,r4
+       mov.l   .L5,r0
        jmp     @r0
-       add     #-128,r4
-.L4:
+       nop
+.L6:
        .align 2
 .L2:
        .long   GC_old_exc_ports+132
 .L3:
+       .long   GC_old_exc_ports+68
+.L4:
+       .long   GC_old_exc_ports+4
+.L5:
        .long   task_get_exception_ports
        .size   GC_dirty_init, .-GC_dirty_init
        .local  GC_old_exc_ports

Thanks,
Richard
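
As a rough illustration of the trade-off in the GC_dirty_init example
(fewer instructions, larger constant pool), the sketch below tallies both
versions.  It assumes 2 bytes for each SH instruction shown and 4 bytes
for each .long entry, and it ignores alignment padding.  It is not the
script used to produce the figures in the table, and the helper name
code_size is made up for this illustration.

```python
# Assumed sizes (not taken from the original mail): each SH instruction
# listed here occupies 2 bytes, each .long constant-pool entry 4 bytes.
INSN_BYTES = 2
POOL_ENTRY_BYTES = 4


def code_size(insns, pool):
    """Return instruction bytes plus constant-pool bytes (padding ignored)."""
    return len(insns) * INSN_BYTES + len(pool) * POOL_ENTRY_BYTES


# Instructions and pool entries before the change (the '-' and context lines).
before_insns = [
    "mov.l .L2,r4", "mov r4,r6", "mov r4,r5", "add #-64,r5",
    "mov.l .L3,r0", "jmp @r0", "add #-128,r4",  # last one fills the delay slot
]
before_pool = ["GC_old_exc_ports+132", "task_get_exception_ports"]

# Instructions and pool entries after the change (the '+' and context lines).
after_insns = [
    "mov.l .L2,r6", "mov.l .L3,r5", "mov.l .L4,r4", "mov.l .L5,r0",
    "jmp @r0", "nop",  # the delay slot now holds a nop
]
after_pool = [
    "GC_old_exc_ports+132", "GC_old_exc_ports+68",
    "GC_old_exc_ports+4", "task_get_exception_ports",
]

if __name__ == "__main__":
    for name, insns, pool in (("before", before_insns, before_pool),
                              ("after", after_insns, after_pool)):
        print(f"{name}: {len(insns)} insns, {len(pool)} pool entries, "
              f"{code_size(insns, pool)} bytes")
```

Under these assumptions the instruction count drops from 7 to 6 while the
combined size grows from 22 to 28 bytes, which is the kind of case that
shows up in the Bad column.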