http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57159
Bug #: 57159 Summary: Latent bug in RTL GCSE/PRE Classification: Unclassified Product: gcc Version: unknown Status: UNCONFIRMED Severity: minor Priority: P3 Component: rtl-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: ju...@gcc.gnu.org Created attachment 30018 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30018 testcase We encountered a wrong-code bug in an out-of-tree port (4.6-based, though the affected code doesn't appear to have changed that much) in target-independent code, though I have not been able to reproduce the problem on a currently-supported target. The port in question is unusual in that it allows read-modify-write arithmetic directly on memory locations (for some operations, at least). That means that during RTL expansion, the compiler may attach REG_EQUAL notes to insns that contain memory operands: (insn 16 15 17 (set (reg:SI 70) (mem/s:SI (reg/v/f:SI 68 [ iter ]) [3 iter_4(D)->xhv_riter+0 S4 A32])) hvmin.c:32 -1 (nil)) (insn 17 16 0 (set (reg:SI 56 [ D.4066 ]) (plus:SI (reg:SI 70) (const_int 1 [0x1]))) hvmin.c:32 -1 (expr_list:REG_EQUAL (plus:SI (mem/s:SI (reg/v/f:SI 68 [ iter ]) [3 iter_4(D)->xhv_riter+0 S4 A32]) (const_int 1 [0x1])) (nil))) That in itself is not a problem, but certain parts of gcse.c's PRE code examine only the instruction itself (e.g. when building tables of "interesting" memory locations -- compute_ld_motion_mems), whereas other parts pay attention to the REG_EQUAL note (hash_scan_set). That means that memory references which are "too complicated" for PRE to handle properly can leak through to later parts of the algorithm and cause misoptimisations -- in the case of the attached file, the increment in the loop is optimised away, and the loop spins forever. The fix then is to prevent "complicated" memory references which may be present in REG_EQUAL notes from being considered for load motion, by stopping them from being added to the relevant tables to start with. That's what the attached patch does, and it works for us -- though without a way of reproducing the bug on mainline, it's not quite obvious that it should be applied there. Interestingly, on x86 (which also allows read-modify-write operations), it looks like the bug is latent because a normal addition clobbers the flags register: (insn 28 91 94 4 (parallel [ (set (reg:SI 60 [ D.2122 ]) (plus:SI (reg:SI 74 [ iter_4(D)->xhv_riter ]) (const_int 1 [0x1]))) (clobber (reg:CC 17 flags)) ]) hvmin.c:32 252 {*addsi_1} (expr_list:REG_DEAD (reg:SI 74 [ iter_4(D)->xhv_riter ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_EQUAL (plus:SI (mem/s:SI (reg/v/f:DI 72 [ iter ]) [3 iter_4(D)->xhv_riter+0 S4 A32]) (const_int 1 [0x1])) (nil))))) This is apparently sufficient to prevent the misoptimisation (probably because of want_to_gcse_p's call to can_assign_to_reg_without_clobbers_p). I tried to reproduce on m68k also, but addsi patterns are expanded differently there so the failing condition doesn't trigger.