We found this problem on ARM, but I believe it affects other platforms as well. The function distribute_and_simplify_rtx() in combine.c has no check for floating point expressions, so it sometimes incorrectly optimizes floating point RTL.

-----bug.c----
static const double one=1.0;

double
f(double x)
{
  /* This is incorrectly transformed to x + x*x */
  return x*(one+x);
}
---------

$ arm-eabi-gcc -O2 -S -march=armv7-a -mfloat-abi=hard -mfpu=neon -fno-unsafe-math-optimizations -g0 bug.c
$ cat bug.s
        .arch armv7-a
        .eabi_attribute 27, 3
        .eabi_attribute 28, 1
        .fpu neon
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 1
        .eabi_attribute 30, 2
        .eabi_attribute 18, 4
        .file   "bug.c"
        .text
        .align  2
        .global f
        .type   f, %function
f:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        fmacd   d0, d0, d0
        bx      lr
        .size   f, .-f
        .ident  "GCC: (GNU) 4.5.0 20091005 (experimental)"

The expression x*(1.0 + x) above is optimized into x + x * x using the fmacd instruction, which multiplies and accumulates. The following is part of the combine pass dump; note how instruction 13 is modified:

---
;; Function f (f)

starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
insn_cost 2: 4
insn_cost 6: 4
insn_cost 7: 4
insn_cost 8: 4
insn_cost 13: 4
insn_cost 16: 0
deferring deletion of insn with uid = 2.
modifying insn i2     7 r138:DF=s0:DF+r139:DF
      REG_DEAD: r139:DF
deferring rescan insn with uid = 7.
modifying insn i3     8 r137:DF=r138:DF*s0:DF
      REG_DEAD: r138:DF
deferring rescan insn with uid = 8.
deferring deletion of insn with uid = 7.
deferring deletion of insn with uid = 6.
modifying insn i3     8 r137:DF=s0:DF*s0:DF+s0:DF
deferring rescan insn with uid = 8.
deferring deletion of insn with uid = 8.
modifying insn i3    13 s0:DF=s0:DF*s0:DF+s0:DF
deferring rescan insn with uid = 13.

(note 4 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(note 2 4 3 2 NOTE_INSN_DELETED)

(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)

(note 6 3 7 2 NOTE_INSN_DELETED)

(note 7 6 8 2 NOTE_INSN_DELETED)

(note 8 7 13 2 NOTE_INSN_DELETED)

(insn 13 8 16 2 bug.c:8 (set (reg/i:DF 63 s0)
        (plus:DF (mult:DF (reg:DF 63 s0 [ x ])
                (reg:DF 63 s0 [ x ]))
            (reg:DF 63 s0 [ x ]))) 610 {*muldf3adddf_vfp} (nil))

(insn 16 13 0 2 bug.c:8 (use (reg/i:DF 63 s0)) -1 (nil))

starting the processing of deferred insns
deleting insn with uid = 2.
deleting insn with uid = 6.
deleting insn with uid = 7.
deleting insn with uid = 8.
rescanning insn with uid = 13.
deleting insn with uid = 13.
ending the processing of deferred insns

;; Combiner totals: 10 attempts, 10 substitutions (3 requiring new space),
;; 3 successes.
---

The problem happens in this part of distribute_and_simplify_rtx ():

  tmp = apply_distributive_law (simplify_gen_binary (inner_code, mode,
                                                     new_op0, new_op1));
  if (GET_CODE (tmp) != outer_code
      && rtx_cost (tmp, SET, optimize_this_for_speed_p)
         < rtx_cost (x, SET, optimize_this_for_speed_p))
    return tmp;

It synthesizes a new expression by distributing one of the sub-expressions and then calls apply_distributive_law. In the test case above, apply_distributive_law detects a floating point RTL expression and returns immediately, but the already-distributed expression it was given has a lower RTL cost than the original expression. Hence distribute_and_simplify_rtx returns the distributed expression, even though it should not do so unless -funsafe-math-optimizations is given.

I think the fix is to add this test at the entrance of the function distribute_and_simplify_rtx:

  /* Distributivity is not true for floating point as it can change the
     value.  So we don't do it unless -funsafe-math-optimizations.  */
  if (FLOAT_MODE_P (GET_MODE (x))
      && ! flag_unsafe_math_optimizations)
    return NULL_RTX;

-- 
Summary: Distribute floating point expressions causes bad code.
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dougkwan at google dot com
GCC build triplet: x86_64-unknown-linux-gnu
GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: arm-none-eabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41574