https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #23 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
(In reply to wilco from comment #22)
>
> What I meant is that your patch still makes a large difference on the
> original test case despite making no difference in simple cases like the
> above.
It is certainly papering over something, just like the whole init-regs pass,
which is even documented to do that:
/* Check all of the uses of pseudo variables. If any use that is MUST
uninitialized, add a store of 0 immediately before it. For
subregs, this makes combine happy. For full word regs, this makes
other optimizations, like the register allocator and the reg-stack
happy as well as papers over some problems on the arm and other
processors where certain isa constraints cannot be handled by gcc.
I have seen the DI = 0 decay to two SI = 0, which are finally removed
well before the init-regs pass runs, and the init-regs pass finds
nothing to do in this test case. Nevertheless they have a very
positive influence on the lra pass. At the moment I do not see
what could replace this. Magic.
> Anyway, there is another bug: on AArch64 we correctly recognize there are 8
> 1-byte loads, shifts and orrs which can be replaced by a single 8-byte load
> and a byte reverse. Although it is recognized on ARM and works correctly if
> it is a little endian load, it doesn't perform the optimization if a byte
> reverse is needed. As a result there are lots of 64-bit shifts and orrs
> which create huge register pressure if not expanded early.
>
> This testcase is turning out to be a goldmine of bugs...
Yes, and the test case can be modified to exercise other insns too.
For instance I just added di-mode ~ to the sigma blocks:
#define Sigma0(x) ~(ROTR((x),28) ^ ROTR((x),34) ^ ROTR((x),39))
#define Sigma1(x) ~(ROTR((x),14) ^ ROTR((x),18) ^ ROTR((x),41))
#define sigma0(x) ~(ROTR((x),1) ^ ROTR((x),8) ^ ((x)>>7))
#define sigma1(x) ~(ROTR((x),19) ^ ROTR((x),61) ^ ((x)>>6))
and saw the stack use nearly double to 528 with -marm -mfpu=vfp
-msoft-float -Os. When I disable the one_cmpldi2 pattern it goes
back to 278 again.
Thus I will add this to the second patch:
@@ -5020,7 +5020,7 @@
(define_insn_and_split "one_cmpldi2"
[(set (match_operand:DI 0 "s_register_operand" "=w,&r,&r,?w")
(not:DI (match_operand:DI 1 "s_register_operand" " w, 0, r, w")))]
- "TARGET_32BIT"
+ "TARGET_32BIT && TARGET_HARD_FLOAT"
"@
vmvn\t%P0, %P1
#
Not every di2 pattern is harmful; for instance, unary minus does nothing.
It is mostly the patterns that mix =w and =r alternatives.
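
For anyone who wants to try the modified sigma macros in isolation, here is
a minimal sketch. It is not the original test case: the ROTR definition is
an assumption (a plain 64-bit rotate right, since the real one is not shown
in this comment), and the mix() function is a hypothetical driver whose only
purpose is to make the compiler emit DImode NOT, rotate and XOR operations.

```c
#include <stdint.h>

/* Assumption: ROTR is a 64-bit rotate right, valid for 0 < n < 64.
   The definition used by the original test case is not quoted here. */
#define ROTR(x, n) (((x) >> (n)) | ((x) << (64 - (n))))

/* The sigma blocks with the added DImode ~, as quoted above. */
#define Sigma0(x) ~(ROTR((x),28) ^ ROTR((x),34) ^ ROTR((x),39))
#define Sigma1(x) ~(ROTR((x),14) ^ ROTR((x),18) ^ ROTR((x),41))
#define sigma0(x) ~(ROTR((x),1) ^ ROTR((x),8) ^ ((x)>>7))
#define sigma1(x) ~(ROTR((x),19) ^ ROTR((x),61) ^ ((x)>>6))

/* Hypothetical driver, not part of the original test case: combines
   all four blocks so that each one survives optimization. */
uint64_t mix(uint64_t a, uint64_t e, uint64_t w0, uint64_t w1)
{
    return Sigma0(a) + Sigma1(e) + sigma0(w0) + sigma1(w1);
}
```

Building this with -marm -mfpu=vfp -msoft-float -Os and adding -fstack-usage
should give a per-function stack figure to compare with and without the
pattern change. A function this small will not reproduce the 528 versus 278
numbers, which come from the full test case.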