RE: New rematerialization sub-pass in LRA

Wilco Dijkstra Tue, 14 Oct 2014 09:38:18 -0700

> Wilco Dijkstra wrote:
> > Vladimir Makarov wrote:
> > > On SPECINT2k performance is ~0.5% worse (5.5% regression on perlbmk), and
> > > SPECFP is ~0.2% faster.
> > Thanks for reporting this.  It is important for me as I have no aarch64
> > machine for benchmarking.
> >
> > Perlbmk performance degradation is too big and I'll definitely look at
> > this problem.
> 
> Looking at the diffs in regexec.c which has the hot function regmatch(),
> nothing obvious stands out that could cause a serious regression.
> I did notice this around line 2300:
> 
> .L802:
>         ldr     x1, [x23, 48]
>         adrp    x5, PL_savestack_ix
>         ldr     w0, [x23]
>         str     x5, [sp, 104]
>         str     x1, [x24, #:lo12:PL_regcc]
>         ldr     w27, [x1, 4]
>         bl      regcppush
> -       ldr     x5, [sp, 104]
>         str     w0, [sp, 112]
>         ldr     x0, [x23, 32]
> +       adrp    x5, PL_savestack_ix
>         ldr     w28, [x5, #:lo12:PL_savestack_ix]
> +       str     x5, [sp, 104]
>         bl      regmatch
>         ldr     x5, [sp, 104]
>         mov     w19, w0
>         ldr     w1, [sp, 112]
>         ldr     w0, [x5, #:lo12:PL_savestack_ix]
> 
> So it rematerializes one instance, but fails to rematerialize the second use.
> An extra store is inserted, and the first adrp and store are not removed as
> dead.


A simple example that reproduces the issue (-mcpu=cortex-a57 -O2
-fomit-frame-pointer -ffixed-x19 -ffixed-x20 -ffixed-x21 -ffixed-x22
-ffixed-x23 -ffixed-x24 -ffixed-x25 -ffixed-x26 -ffixed-x27 -ffixed-x28
-ffixed-x29 -ffixed-x30). It looks like an odd interaction between
-fcaller-saves and rematerialization.

void g(void);
int x;
int f3b(int y)
{
   y += x;
   g();
   y += x;
   g();
   y += x;
   return y;
}

f3b:
        adrp    x2, x   --> DEAD
        sub     sp, sp, #16
        ldr     w1, [x2, #:lo12:x]
        str     x2, [sp]  --> DEAD
        add     w0, w0, w1
        str     w0, [sp]  --> reuse of stackslot!!!
        bl      g
        adrp    x2, x
        ldr     w0, [sp]
        ldr     w1, [x2, #:lo12:x]
        str     x2, [sp, 8]
        add     w0, w0, w1
        str     w0, [sp]  --> REMOVE
        bl      g
        ldr     x2, [sp, 8] --> rematerialize adrp
        ldr     w0, [sp]
        add     sp, sp, 16
        ldr     w1, [x2, #:lo12:x]
        add     w0, w0, w1
        ret
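
For anyone who wants to check the source-level behaviour before looking at
the code generation, here is a minimal sketch of a harness. The body of g(),
the calls counter and the demo() helper are assumptions made up for
illustration; they are not part of the testcase above. The point it shows is
that the value of x has to be reloaded after each call to g(), while the
address of x does not change, so only the adrp result is a candidate for
rematerialization.

```c
#include <assert.h>

int x;              /* global read by f3b(); only its address is invariant */
static int calls;   /* hypothetical counter, not in the original testcase */

/* Hypothetical g(): modifies x so that every reload in f3b() is observable. */
void g(void)
{
    calls++;
    x += 10;
}

/* Same body as the testcase above. */
int f3b(int y)
{
    y += x;     /* first load of x */
    g();
    y += x;     /* x must be reloaded: g() may have changed it */
    g();
    y += x;     /* and reloaded again */
    return y;
}

/* Hypothetical helper: runs f3b() from a known starting state. */
int demo(void)
{
    x = 1;
    calls = 0;
    int r = f3b(100);   /* 100 + 1, then + 11, then + 21 */
    assert(calls == 2);
    return r;
}
```

Calling demo() from a main() is enough to run it. Note that with g() defined
in the same file the compiler is free to inline it, so to look at the
assembly shown above g() should stay in a separate translation unit as in the
original testcase.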

Wilco
