Re: [PATCH 6/6] Add a late-combine pass [PR106594]

Richard Sandiford Fri, 21 Jun 2024 01:10:10 -0700

Oleg Endo <oleg.e...@t-online.de> writes:
> On Thu, 2024-06-20 at 14:34 +0100, Richard Sandiford wrote:
>> 
>> I tried compiling at least one target per CPU directory and comparing
>> the assembly output for parts of the GCC testsuite.  This is just a way
>> of getting a flavour of how the pass performs; it obviously isn't a
>> meaningful benchmark.  All targets seemed to improve on average:
>> 
>> Target                 Tests   Good    Bad   %Good   Delta  Median
>> ======                 =====   ====    ===   =====   =====  ======
>> aarch64-linux-gnu       2215   1975    240  89.16%   -4159      -1
>> aarch64_be-linux-gnu    1569   1483     86  94.52%  -10117      -1
>> alpha-linux-gnu         1454   1370     84  94.22%   -9502      -1
>> amdgcn-amdhsa           5122   4671    451  91.19%  -35737      -1
>> arc-elf                 2166   1932    234  89.20%  -37742      -1
>> arm-linux-gnueabi       1953   1661    292  85.05%  -12415      -1
>> arm-linux-gnueabihf     1834   1549    285  84.46%  -11137      -1
>> avr-elf                 4789   4330    459  90.42% -441276      -4
>> bfin-elf                2795   2394    401  85.65%  -19252      -1
>> bpf-elf                 3122   2928    194  93.79%   -8785      -1
>> c6x-elf                 2227   1929    298  86.62%  -17339      -1
>> cris-elf                3464   3270    194  94.40%  -23263      -2
>> csky-elf                2915   2591    324  88.89%  -22146      -1
>> epiphany-elf            2399   2304     95  96.04%  -28698      -2
>> fr30-elf                7712   7299    413  94.64%  -99830      -2
>> frv-linux-gnu           3332   2877    455  86.34%  -25108      -1
>> ft32-elf                2775   2667    108  96.11%  -25029      -1
>> h8300-elf               3176   2862    314  90.11%  -29305      -2
>> hppa64-hp-hpux11.23     4287   4247     40  99.07%  -45963      -2
>> ia64-linux-gnu          2343   1946    397  83.06%   -9907      -2
>> iq2000-elf              9684   9637     47  99.51% -126557      -2
>> lm32-elf                2681   2608     73  97.28%  -59884      -3
>> loongarch64-linux-gnu   1303   1218     85  93.48%  -13375      -2
>> m32r-elf                1626   1517    109  93.30%   -9323      -2
>> m68k-linux-gnu          3022   2620    402  86.70%  -21531      -1
>> mcore-elf               2315   2085    230  90.06%  -24160      -1
>> microblaze-elf          2782   2585    197  92.92%  -16530      -1
>> mipsel-linux-gnu        1958   1827    131  93.31%  -15462      -1
>> mipsisa64-linux-gnu     1655   1488    167  89.91%  -16592      -2
>> mmix                    4914   4814    100  97.96%  -63021      -1
>> mn10300-elf             3639   3320    319  91.23%  -34752      -2
>> moxie-rtems             3497   3252    245  92.99%  -87305      -3
>> msp430-elf              4353   3876    477  89.04%  -23780      -1
>> nds32le-elf             3042   2780    262  91.39%  -27320      -1
>> nios2-linux-gnu         1683   1355    328  80.51%   -8065      -1
>> nvptx-none              2114   1781    333  84.25%  -12589      -2
>> or1k-elf                3045   2699    346  88.64%  -14328      -2
>> pdp11                   4515   4146    369  91.83%  -26047      -2
>> pru-elf                 1585   1245    340  78.55%   -5225      -1
>> riscv32-elf             2122   2000    122  94.25% -101162      -2
>> riscv64-elf             1841   1726    115  93.75%  -49997      -2
>> rl78-elf                2823   2530    293  89.62%  -40742      -4
>> rx-elf                  2614   2480    134  94.87%  -18863      -1
>> s390-linux-gnu          1591   1393    198  87.55%  -16696      -1
>> s390x-linux-gnu         2015   1879    136  93.25%  -21134      -1
>> sh-linux-gnu            1870   1507    363  80.59%   -9491      -1
>> sparc-linux-gnu         1123   1075     48  95.73%  -14503      -1
>> sparc-wrs-vxworks       1121   1073     48  95.72%  -14578      -1
>> sparc64-linux-gnu       1096   1021     75  93.16%  -15003      -1
>> v850-elf                1897   1728    169  91.09%  -11078      -1
>> vax-netbsdelf           3035   2995     40  98.68%  -27642      -1
>> visium-elf              1392   1106    286  79.45%   -7984      -2
>> xstormy16-elf           2577   2071    506  80.36%  -13061      -1
>> 
>> 
>
> Since you have already briefly compared some of the code, can you share
> those cases which get worse and might require some potential follow up
> patches?


I think a lot of them are unpredictable secondary effects, for example on
register allocation or on tail-merging opportunities.  For sh, they also
include whether delay slots are filled with useful work or with a nop.
(Instruction combination tends to create more complex instructions, so
there will be fewer 2-byte instructions to act as delay slot candidates.)

Also, this kind of combination can decrease the number of instructions
but increase the constant pool size.  The figures take that into account.
(The comparison is a bit ad hoc, though, since I wasn't dedicated enough
to try to build a full source->executable toolchain for each target. :))
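
For readers who want to reproduce the shape of the quoted table, here is
a rough Python sketch of one way the summary columns could be derived
from per-test size differences.  Tests = Good + Bad holds for every row,
so unchanged tests appear not to be counted.  The reading of Delta as the
total change and Median as the median per-test change is an assumption,
as is the unit of each difference; the names summarise and deltas are
hypothetical, and this is not the script that produced the figures.

```python
from statistics import median


def summarise(deltas):
    """Summarise per-test size changes (new - old; negative = smaller).

    Assumption: tests whose output did not change are left out, which
    matches Tests = Good + Bad in the quoted table.
    """
    changed = [d for d in deltas if d != 0]
    good = sum(1 for d in changed if d < 0)   # tests that got smaller
    bad = len(changed) - good                 # tests that got bigger
    return {
        "tests": len(changed),
        "good": good,
        "bad": bad,
        "pct_good": round(100.0 * good / len(changed), 2) if changed else 0.0,
        "delta": sum(changed),                # assumed meaning of "Delta"
        "median": median(changed) if changed else 0,
    }


# Made-up per-test differences, purely for illustration.
example = summarise([-3, -1, 0, 2, -1])
```

A large negative Delta with a Median of -1 or -2 would then mean that
most changed tests shrink slightly while a few shrink by a lot.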

To give one example, the effect on gcc.c-torture/compile/20040727-1.c is:

@@ -6,18 +6,21 @@
        .global GC_dirty_init
        .type   GC_dirty_init, @function
 GC_dirty_init:
-       mov.l   .L2,r4
-       mov     r4,r6
-       mov     r4,r5
-       add     #-64,r5
-       mov.l   .L3,r0
+       mov.l   .L2,r6
+       mov.l   .L3,r5
+       mov.l   .L4,r4
+       mov.l   .L5,r0
        jmp     @r0
-       add     #-128,r4
-.L4:
+       nop
+.L6:
        .align 2
 .L2:
        .long   GC_old_exc_ports+132
 .L3:
+       .long   GC_old_exc_ports+68
+.L4:
+       .long   GC_old_exc_ports+4
+.L5:
        .long   task_get_exception_ports
        .size   GC_dirty_init, .-GC_dirty_init
        .local  GC_old_exc_ports
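
To make the instruction-count versus constant-pool trade-off in this
diff concrete, the following Python sketch splits the hunk into its old
and new sides and counts instructions and .long pool entries on each.
The byte sizes (2 bytes per instruction, 4 bytes per .long) and the
decision to ignore alignment padding are assumptions made for
illustration; the helper names sides and measure are hypothetical, and
this is not necessarily how the figures in the table were computed.

```python
DIFF = """\
 GC_dirty_init:
-       mov.l   .L2,r4
-       mov     r4,r6
-       mov     r4,r5
-       add     #-64,r5
-       mov.l   .L3,r0
+       mov.l   .L2,r6
+       mov.l   .L3,r5
+       mov.l   .L4,r4
+       mov.l   .L5,r0
        jmp     @r0
-       add     #-128,r4
-.L4:
+       nop
+.L6:
        .align 2
 .L2:
        .long   GC_old_exc_ports+132
 .L3:
+       .long   GC_old_exc_ports+68
+.L4:
+       .long   GC_old_exc_ports+4
+.L5:
        .long   task_get_exception_ports
"""


def sides(diff):
    """Split a unified-diff hunk body into (old_lines, new_lines)."""
    old, new = [], []
    for line in diff.splitlines():
        if not line:
            continue
        tag, body = line[0], line[1:]
        if tag in " -":          # context and removed lines form the old side
            old.append(body)
        if tag in " +":          # context and added lines form the new side
            new.append(body)
    return old, new


def measure(lines):
    """Return (instruction count, .long count) for a list of asm lines."""
    insns = longs = 0
    for body in lines:
        tokens = body.split()
        if not tokens or tokens[0].endswith(":"):
            continue             # blank line or label
        if tokens[0] == ".long":
            longs += 1           # one constant pool entry
        elif not tokens[0].startswith("."):
            insns += 1           # anything that is not a directive
    return insns, longs


OLD, NEW = sides(DIFF)
for name, side in (("old", OLD), ("new", NEW)):
    insns, longs = measure(side)
    # Assumed sizes: 2-byte instructions, 4-byte pool entries, no padding.
    print(name, insns, "insns,", longs, "pool entries,",
          2 * insns + 4 * longs, "bytes")
```

On this hunk the old side has 7 instructions and 2 pool entries, and
the new side has 6 instructions and 4 pool entries, so the function
loses one instruction but gains two pool entries and a nop in the delay
slot.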

Thanks,
Richard
