[Bug target/77308] surprisingly large stack usage for sha512 on arm

bernd.edlinger at hotmail dot de Thu, 27 Oct 2016 09:52:39 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308


--- Comment #23 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
(In reply to wilco from comment #22)
> 
> What I meant is that your patch still makes a large difference on the
> original test case despite making no difference in simple cases like the
> above.

It is certainly papering over something, as does the entire init-regs
pass; it is even documented to do that:

/* Check all of the uses of pseudo variables.  If any use that is MUST
   uninitialized, add a store of 0 immediately before it.  For
   subregs, this makes combine happy.  For full word regs, this makes
   other optimizations, like the register allocator and the reg-stack
   happy as well as papers over some problems on the arm and other
   processors where certain isa constraints cannot be handled by gcc.

I have seen the DI = 0 decay into two SI = 0 sets, which are then
removed well before the init-regs pass runs, and the init-regs pass
finds nothing to do in this test case.  Nevertheless, they have a very
positive influence on the lra pass.  At the moment I do not see
what could replace this.  Magic.

> Anyway, there is another bug: on AArch64 we correctly recognize there are 8
> 1-byte loads, shifts and orrs which can be replaced by a single 8-byte load
> and a byte reverse. Although it is recognized on ARM and works correctly if
> it is a little endian load, it doesn't perform the optimization if a byte
> reverse is needed. As a result there are lots of 64-bit shifts and orrs
> which create huge register pressure if not expanded early.
> 
> This testcase is turning out to be a goldmine of bugs...
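
For reference, the load idiom described in the quote above looks roughly
like the following.  This is only a sketch: the helper name load_be64 is
made up for illustration and is not taken from the test case.  It builds
a 64-bit big-endian value from 8 single-byte loads, shifts and ORs, which
is the sequence a compiler may be able to replace with one 8-byte load
plus a byte reverse.

```c
#include <stdint.h>

/* Hypothetical helper (name is an assumption, not from the test case):
   assemble a 64-bit value from 8 bytes stored in big-endian order.  */
static uint64_t load_be64(const unsigned char *p)
{
    return ((uint64_t)p[0] << 56) | ((uint64_t)p[1] << 48) |
           ((uint64_t)p[2] << 40) | ((uint64_t)p[3] << 32) |
           ((uint64_t)p[4] << 24) | ((uint64_t)p[5] << 16) |
           ((uint64_t)p[6] <<  8) |  (uint64_t)p[7];
}
```

Calling load_be64 on the bytes 01 23 45 67 89 ab cd ef yields
0x0123456789abcdef regardless of the host endianness.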


Yes, and the test case can be modified to exercise other insns too.

For instance I just added di-mode ~ to the sigma blocks:

#define Sigma0(x)       ~(ROTR((x),28) ^ ROTR((x),34) ^ ROTR((x),39))
#define Sigma1(x)       ~(ROTR((x),14) ^ ROTR((x),18) ^ ROTR((x),41))
#define sigma0(x)       ~(ROTR((x),1)  ^ ROTR((x),8)  ^ ((x)>>7))
#define sigma1(x)       ~(ROTR((x),19) ^ ROTR((x),61) ^ ((x)>>6))

and saw the stack use double to 528 with -marm -mfpu=vfp -msoft-float -Os;
when I disable the one_cmpldi2 pattern, it goes back to 278 again.

Thus I will add this to the second patch:

@@ -5020,7 +5020,7 @@
 (define_insn_and_split "one_cmpldi2"
   [(set (match_operand:DI 0 "s_register_operand"        "=w,&r,&r,?w")
        (not:DI (match_operand:DI 1 "s_register_operand" " w, 0, r, w")))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && TARGET_HARD_FLOAT"
   "@
    vmvn\t%P0, %P1
    #


Not every di2 pattern is harmful; for instance, unary minus does nothing.
The harmful ones are mostly the patterns that mix =w and =r alternatives.
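
To try the modified sigma macros in isolation, a minimal sketch such as
the one below may be enough.  The ROTR definition and the function name
mix are assumptions made for illustration (the usual 64-bit rotate right;
the real test case is much larger), so the stack numbers will not match
the ones quoted above.  Compiling with GCC's -fstack-usage option in
addition to the flags mentioned should give a per-function stack size to
compare with and without the one_cmpldi2 change.

```c
#include <stdint.h>

/* Assumed 64-bit rotate right; not shown in this message.
   Only valid for 0 < s < 64, which holds for all uses below.  */
#define ROTR(x,s)       (((x) >> (s)) | ((x) << (64 - (s))))

/* The sigma macros with the extra DImode ~ added, as in the message.  */
#define Sigma0(x)       ~(ROTR((x),28) ^ ROTR((x),34) ^ ROTR((x),39))
#define Sigma1(x)       ~(ROTR((x),14) ^ ROTR((x),18) ^ ROTR((x),41))
#define sigma0(x)       ~(ROTR((x),1)  ^ ROTR((x),8)  ^ ((x)>>7))
#define sigma1(x)       ~(ROTR((x),19) ^ ROTR((x),61) ^ ((x)>>6))

/* Hypothetical driver: uses all four macros on 64-bit operands so that
   the DImode not, rotate and xor patterns are exercised.  */
uint64_t mix(uint64_t a, uint64_t e, uint64_t w0, uint64_t w1)
{
    return Sigma0(a) + Sigma1(e) + sigma0(w0) + sigma1(w1);
}
```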
