[Bug rtl-optimization/41574] New: Distribute floating point expressions causes bad code.

dougkwan at google dot com Mon, 05 Oct 2009 01:01:35 -0700

We found this problem on ARM but I believe the problem affect other platforms
as well.  In the function distribute_and_simplify() of combine.c, there is not
check for floating point expressions.  Sometimes it incorrectly optimizes a
floating point RTL.


-----bug.c----
static const double one=1.0;

double
f(double x)
{
  /* This is incorrectly transformed to x + x*x */
  return x*(one+x);
}
---------

$ arm-eabi-gcc -O2 -S -march=armv7-a -mfloat-abi=hard -mfpu=neon
-fno-unsafe-math-optimizations -g0 bug.c
$ cat bug.s
        .arch armv7-a
        .eabi_attribute 27, 3
        .eabi_attribute 28, 1
        .fpu neon
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 1
        .eabi_attribute 30, 2
        .eabi_attribute 18, 4
        .file   "bug.c"
        .text
        .align  2
        .global f
        .type   f, %function
f:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        fmacd   d0, d0, d0
        bx      lr
        .size   f, .-f
        .ident  "GCC: (GNU) 4.5.0 20091005 (experimental)"

The expression x*(1.0 + x) above is optmized into x + x * x using the fmacd
instruction, which multiplies and accumlates.  The following is part combine
pass dump, note how instruction 13 is modified.:

---

;; Function f (f)

starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
insn_cost 2: 4
insn_cost 6: 4
insn_cost 7: 4
insn_cost 8: 4
insn_cost 13: 4
insn_cost 16: 0
deferring deletion of insn with uid = 2.
modifying insn i2     7 r138:DF=s0:DF+r139:DF
      REG_DEAD: r139:DF
deferring rescan insn with uid = 7.
modifying insn i3     8 r137:DF=r138:DF*s0:DF
      REG_DEAD: r138:DF
deferring rescan insn with uid = 8.
deferring deletion of insn with uid = 7.
deferring deletion of insn with uid = 6.
modifying insn i3     8 r137:DF=s0:DF*s0:DF+s0:DF
deferring rescan insn with uid = 8.
deferring deletion of insn with uid = 8.
modifying insn i3    13 s0:DF=s0:DF*s0:DF+s0:DF
deferring rescan insn with uid = 13.
(note 4 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(note 2 4 3 2 NOTE_INSN_DELETED)

(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)

(note 6 3 7 2 NOTE_INSN_DELETED)

(note 7 6 8 2 NOTE_INSN_DELETED)

(note 8 7 13 2 NOTE_INSN_DELETED)

(insn 13 8 16 2 bug.c:8 (set (reg/i:DF 63 s0)
        (plus:DF (mult:DF (reg:DF 63 s0 [ x ])
                (reg:DF 63 s0 [ x ]))
            (reg:DF 63 s0 [ x ]))) 610 {*muldf3adddf_vfp} (nil))

(insn 16 13 0 2 bug.c:8 (use (reg/i:DF 63 s0)) -1 (nil))
starting the processing of deferred insns
deleting insn with uid = 2.
deleting insn with uid = 6.
deleting insn with uid = 7.
deleting insn with uid = 8.
rescanning insn with uid = 13.
deleting insn with uid = 13.
ending the processing of deferred insns

;; Combiner totals: 10 attempts, 10 substitutions (3 requiring new space),
;; 3 successes.

---

The problem happens in this part of distribute_and_simplify_rtx ():


  tmp = apply_distributive_law (simplify_gen_binary (inner_code, mode,
                                                     new_op0, new_op1));
  if (GET_CODE (tmp) != outer_code
      && rtx_cost (tmp, SET, optimize_this_for_speed_p)
         < rtx_cost (x, SET, optimize_this_for_speed_p))
    return tmp;

It synthesizes a new expression by distributing one of the sub-expressions and
then call apply_distribute_law.  In the test-case above, apply_distribute_law
detects a floating point RTL expression and returns immediately but the
simplified expression generated has a lower RTL-cost than the original
expression.  Hence distribute_and_simplify_rtx returns the simplified
expression, eventhough it should not unless -funsafe-math-optimizations is
given.

I think the fix is to add this test at the entrance of the function
distribute_and_simplify_rtx

  /* Distributivity is not true for floating point as it can change the
     value.  So we don't do it unless -funsafe-math-optimizations.  */
  if (FLOAT_MODE_P (GET_MODE (x))
      && ! flag_unsafe_math_optimizations)
    return NULL_RTX;


-- 
           Summary: Distribute floating point expressions causes bad code.
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dougkwan at google dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: arm-none-eabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41574

[Bug rtl-optimization/41574] New: Distribute floating point expressions causes bad code.

Reply via email to