[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

wdijkstr at arm dot com Tue, 21 Oct 2014 10:41:53 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503


--- Comment #10 from Wilco <wdijkstr at arm dot com> ---
The loops shown are not the correct inner loops for those options: with
-ffast-math they are vectorized. LLVM unrolls the vectorized loop 2x but GCC
doesn't. So the question is why GCC doesn't unroll vectorized loops like LLVM does.

GCC:

.L24:
    ldr    q3, [x13, x5]
    add    x6, x6, 1
    ldr    q2, [x16, x5]
    cmp    x6, x12
    add    x5, x5, 16
    fmla    v1.2d, v3.2d, v2.2d
    bcc    .L24
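To map the GCC loop above back to source: each iteration loads two doubles from each of two arrays (q3, q2) and accumulates their products into the single register v1 with one fmla. The sketch below assumes the source is a plain double-precision dot product; the function name `dot` is hypothetical and not taken from the bug report. Because every iteration adds into the same accumulator, each fmla has to wait for the previous one to complete.

```c
#include <stddef.h>

/* Hypothetical source loop (assumed, not from the bug report).
   With -O3 -ffast-math on AArch64, a loop like this can be vectorized
   into one fmla per iteration feeding a single accumulator. */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];   /* one accumulator: loop-carried dependency */
    return sum;
}
```

For example, dot() on {1, 2, 3, 4, 5} and {1, 1, 1, 1, 1} with n = 5 returns 15.0.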

LLVM:

.LBB2_12:
    ldur    q2, [x8, #-16]
    ldr    q3, [x8], #32
    ldur    q4, [x21, #-16]
    ldr    q5, [x21], #32
    fmla    v1.2d, v2.2d, v4.2d
    fmla    v0.2d, v3.2d, v5.2d
    sub    x30, x30, #4            // =4
    cbnz    x30, .LBB2_12
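The LLVM loop keeps two independent accumulators (v1 and v0), so consecutive multiply-adds do not depend on each other and can overlap in the pipeline. Below is a minimal sketch of that 2x unrolling, written in scalar C for portability: each `sumN` stands in for one vector accumulator, so it shows only the unrolling idea, not the vectorization. The name `dot_unrolled2` is hypothetical. Splitting the sum changes the order of the floating-point additions, which is why a compiler may only apply this transformation when reassociation is permitted, e.g. under -ffast-math.

```c
#include <stddef.h>

/* Hypothetical hand-unrolled dot product with two independent
   accumulators, mirroring the structure of the LLVM loop above. */
double dot_unrolled2(const double *a, const double *b, size_t n)
{
    double sum0 = 0.0, sum1 = 0.0;
    size_t i = 0;

    /* Main loop: two products per iteration, each into its own accumulator. */
    for (; i + 2 <= n; i += 2) {
        sum0 += a[i]     * b[i];
        sum1 += a[i + 1] * b[i + 1];
    }

    /* Remainder: handle the last element when n is odd. */
    for (; i < n; i++)
        sum0 += a[i] * b[i];

    /* Combine the partial sums once, after the loop. */
    return sum0 + sum1;
}
```

For the same inputs as a single-accumulator version, the result can differ in the last bits for general floating-point data, since the additions are grouped differently.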
