https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #23 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
(In reply to wilco from comment #22)
>
> What I meant is that your patch still makes a large difference on the
> original test case despite making no difference in simple cases like the
> above.

For sure it is papering over something, as does the complete init-regs
pass; it is even documented to do that:

/* Check all of the uses of pseudo variables.  If any use that is MUST
   uninitialized, add a store of 0 immediately before it.  For
   subregs, this makes combine happy.  For full word regs, this makes
   other optimizations, like the register allocator and the reg-stack
   happy as well as papers over some problems on the arm and other
   processors where certain isa constraints cannot be handled by gcc.

I have seen the DI = 0 decay into two SI = 0 stores, which are finally
removed well before the init-regs pass runs, and the init-regs pass finds
nothing to do in this test case.  Nevertheless they have a very positive
influence on the lra pass.  At the moment I do not see what could
replace this.  Magic.

> Anyway, there is another bug: on AArch64 we correctly recognize there are 8
> 1-byte loads, shifts and orrs which can be replaced by a single 8-byte load
> and a byte reverse. Although it is recognized on ARM and works correctly if
> it is a little endian load, it doesn't perform the optimization if a byte
> reverse is needed. As a result there are lots of 64-bit shifts and orrs
> which create huge register pressure if not expanded early.
>
> This testcase is turning out to be a goldmine of bugs...

Yes, and the test case can be modified to exercise other insns too.
For instance I just added a DImode ~ to the sigma blocks:

#define Sigma0(x)       ~(ROTR((x),28) ^ ROTR((x),34) ^ ROTR((x),39))
#define Sigma1(x)       ~(ROTR((x),14) ^ ROTR((x),18) ^ ROTR((x),41))
#define sigma0(x)       ~(ROTR((x),1)  ^ ROTR((x),8)  ^ ((x)>>7))
#define sigma1(x)       ~(ROTR((x),19) ^ ROTR((x),61) ^ ((x)>>6))

and saw the stack use double to 528 with -marm -mfpu=vfp -msoft-float -Os.
When I disable the one_cmpldi2 pattern it goes back to 278 again,
thus I will add this to the second patch:

@@ -5020,7 +5020,7 @@
 (define_insn_and_split "one_cmpldi2"
   [(set (match_operand:DI 0 "s_register_operand"        "=w,&r,&r,?w")
        (not:DI (match_operand:DI 1 "s_register_operand" " w, 0, r, w")))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && TARGET_HARD_FLOAT"
   "@
    vmvn\t%P0, %P1
    #

Not every di2 pattern is harmful; for instance, unary minus does nothing.
Mostly it is the patterns that mix =w and =r alternatives.
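
For readers without the attached test case at hand, here is a minimal
sketch of the two code shapes discussed in this comment: a byte-wise
big-endian 64-bit load (the 8 1-byte loads, shifts and orrs) and two of
the complemented sigma macros.  The ROTR definition and the helper names
load_be64, big_sigma0 and small_sigma0 are assumptions made for
illustration; they are not taken from the test case in the bug report.

```c
#include <stdint.h>

/* Assumed definition: 64-bit rotate right, valid for 0 < n < 64.
   The real test case may define ROTR differently.  */
#define ROTR(x, n) (((x) >> (n)) | ((x) << (64 - (n))))

/* Two of the modified macros from the comment, with the added DImode "~".  */
#define Sigma0(x)       ~(ROTR((x),28) ^ ROTR((x),34) ^ ROTR((x),39))
#define sigma0(x)       ~(ROTR((x),1)  ^ ROTR((x),8)  ^ ((x)>>7))

/* Hypothetical helper: assemble a big-endian 64-bit value from 8 bytes.
   This is the load/shift/orr pattern that could be a single 8-byte load
   plus a byte reverse on a little-endian target.  */
uint64_t load_be64(const uint8_t *p)
{
    return ((uint64_t)p[0] << 56) | ((uint64_t)p[1] << 48)
         | ((uint64_t)p[2] << 40) | ((uint64_t)p[3] << 32)
         | ((uint64_t)p[4] << 24) | ((uint64_t)p[5] << 16)
         | ((uint64_t)p[6] <<  8) |  (uint64_t)p[7];
}

/* Hypothetical wrappers so the macros end up in separately visible
   functions when looking at the generated assembly.  */
uint64_t big_sigma0(uint64_t x)
{
    return Sigma0(x);
}

uint64_t small_sigma0(uint64_t x)
{
    return sigma0(x);
}
```

Compiling this with the options quoted above plus -S (or -fstack-usage)
shows how the 64-bit operations are expanded.  Whether such a small sketch
reproduces the stack growth seen with the full test case is not
established here.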