https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70809

            Bug ID: 70809
           Summary: [AArch64] aarch64_vmls pattern should be rejected if
                    -ffp-contract=off
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jgreenhalgh at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*-*-*

Take this simple testcase:

  void
  foo (float * __restrict__ __attribute__ ((aligned (16))) a,
       float * __restrict__ __attribute__ ((aligned (16))) x,
       float * __restrict__ __attribute__ ((aligned (16))) y,
       float * __restrict__ __attribute__ ((aligned (16))) z)
  {
    unsigned i = 0;
    for (i = 0; i < 256; i++)
      a[i] = x[i] - (y[i] * z[i]);
  }

GCC for AArch64 (all versions) will generate a vectorized fmls instruction even
when given the -ffp-contract=off option (for trunk and 6 you'll need to play
with -mcpu options to find one whose cost model permits the combine):

(for trunk) $ gcc -O3 -ffp-contract=off -mcpu=xgene1 foo.c

   <snip>
   .L4:
        ldr     q2, [x9, x4]
        add     w5, w5, 1
        ldr     q1, [x8, x4]
        cmp     w5, w7
        ldr     q0, [x10, x4]
        fmls    v0.4s, v2.4s, v1.4s
        str     q0, [x6, x4]
        add     x4, x4, 16
        bcc     .L4
  <snip>

The problem seems pretty clear: the aarch64_vmls<mode> pattern needs to be
tightened up so that it does not fuse multiplies and subtracts unless
-ffp-contract=fast is in effect.

  (define_insn "aarch64_vmls<mode>"
    [(set (match_operand:VDQF 0 "register_operand" "=w")
         (minus:VDQF (match_operand:VDQF 1 "register_operand" "0")
                     (mult:VDQF (match_operand:VDQF 2 "register_operand" "w")
                                (match_operand:VDQF 3 "register_operand" "w"))))]
    "TARGET_SIMD"
    "fmls\\t%0.<Vtype>, %2.<Vtype>, %3.<Vtype>"
    [(set_attr "type" "neon_fp_mla_<Vetype>_scalar<q>")]
  )
