On 9/26/23 10:21, Richard Sandiford wrote:
This patch adds a combine pass that runs late in the pipeline.
There are two instances: one between combine and split1, and one
after postreload.

The pass currently has a single objective: remove definitions by
substituting into all uses.  The pre-RA version tries to restrict
itself to cases that are likely to have a neutral or beneficial
effect on register pressure.

The patch fixes PR106594.  It also fixes a few FAILs and XFAILs
in the aarch64 test results, mostly due to making proper use of
MOVPRFX in cases where we didn't previously.  I hope it would
also help with Robin's vec_duplicate testcase, although the
pressure heuristic might need tweaking for that case.

This is just a first step..  I'm hoping that the pass could be
used for other combine-related optimisations in future.  In particular,
the post-RA version doesn't need to restrict itself to cases where all
uses are substitutitable, since it doesn't have to worry about register
pressure.  If we did that, and if we extended it to handle multi-register
REGs, the pass might be a viable replacement for regcprop, which in
turn might reduce the cost of having a post-RA instance of the new pass.

I've run an assembly comparison with one target per CPU directory,
and it seems to be a win for all targets except nvptx (which is hard
to measure, being a higher-level asm).  The biggest winner seemed
to be AVR.

However, if a version of the pass does go in, it might be better
to enable it by default only on targets where the extra compile
time seems to be worth it.  IMO, fixing PR106594 and the MOVPRFX
issues makes it worthwhile for AArch64.

The patch contains various bug fixes and new helper routines.
I'd submit those separately in the final version.  Because of
that, there's no GNU changelog yet.

Bootstrapped & regression tested on aarch64-linux-gnu so far.
Very interesting.

I would generally expect it to be a win on most targets and might allow us to reduce the number of post-reload hacks we do. So I'd lean towards enabling it everywhere.


With that in mind, I briefly threw it into my tester. The first thing that popped out was rl78-elf regresses on compile/20021008-1.c.

In the pre-RA version we've taken these insns:

(insn 22 21 7 2 (set (reg/v/f:HI 44 [ buf ])
        (const_int 0 [0])) "k.c":9:9 -1
     (nil))
(insn 7 22 8 2 (set (subreg:SI (reg:DF 43 [ _1 ]) 0)
        (mem:SI (plus:HI (reg/v/f:HI 44 [ buf ])
                (const_int 1 [0x1])) [1 MEM[(long double *)buf_4(D) + 1B]+0 S4 A16])) 
"k.c":9:9 2 {movsi}
     (nil))
(insn 8 7 9 2 (set (subreg:SI (reg:DF 43 [ _1 ]) 4)
        (mem:SI (plus:HI (reg/v/f:HI 44 [ buf ])
                (const_int 5 [0x5])) [1 MEM[(long double *)buf_4(D) + 1B]+4 S4 A16])) 
"k.c":9:9 2 {movsi}
     (expr_list:REG_DEAD (reg/v/f:HI 44 [ buf ])
        (nil)))


We combine insn 22 with insn 7 and 8 resulting in:

(insn 7 22 8 2 (set (subreg:SI (reg:DF 43 [ _1 ]) 0)
        (mem:SI (const_int 1 [0x1]) [1 MEM[(long double *)buf_4(D) + 1B]+0 S4 A16])) 
"k.c":9:9 2 {movsi}
     (nil))
(insn 8 7 9 2 (set (subreg:SI (reg:DF 43 [ _1 ]) 4)
        (mem:SI (const_int 5 [0x5]) [1 MEM[(long double *)buf_4(D) + 1B]+4 S4 A16])) 
"k.c":9:9 2 {movsi}
     (nil))


Which ultimately triggers an assembler error:

k.s: Assembler messages:
k.s:41: Error: movw ax,!1
k.s:41: Error:          ^ Expression not word-aligned
k.s:43: Error: movw ax,!3
k.s:43: Error:          ^ Expression not word-aligned
[ ... ]


This seems more likely than not to be a target issue. I suspect combine didn't trip over this because of the multiple uses of (reg 44).


Jeff

Reply via email to