[Bug rtl-optimization/101188] [postreload] Uses content of a clobbered register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188 Georg-Johann Lay changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #18 from Georg-Johann Lay --- Fixed in v14.
[Bug rtl-optimization/101188] [postreload] Uses content of a clobbered register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188 --- Comment #17 from Georg-Johann Lay --- Created attachment 55318 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55318&action=edit Reduced patch for v8 (handles only CLOBBERs, hence should be good enough for AVR).
[Bug rtl-optimization/101188] [postreload] Uses content of a clobbered register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188 --- Comment #16 from Georg-Johann Lay --- Created attachment 55317 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55317&action=edit Backport to v13 of Jeff's patch.
[Bug rtl-optimization/101188] [postreload] Uses content of a clobbered register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188 --- Comment #15 from CVS Commits --- The master branch has been updated by Jeff Law : https://gcc.gnu.org/g:ae193f9008e02683e27f3c87f3b06f38e103b1d0 commit r14-1738-gae193f9008e02683e27f3c87f3b06f38e103b1d0 Author: Jeff Law Date: Mon Jun 12 12:52:10 2023 -0600 [committed] [PR rtl-optimization/101188] Fix reload_cse_move2add ignoring clobbers So as Georg-Johann discusses in the BZ, reload_cse_move2add can generate incorrect code when optimizing code with clobbers. Specifically in the case where we try to optimize a sequence of 4 operations down to 3 operations we can reset INSN to the next instruction and continue the loop. That skips the code to invalidate objects based on things like REG_INC nodes, stack pushes and most importantly clobbers attached to the current insn. This patch factors all of the invalidation code used by reload_cse_move2add into a new function and calls it at the appropriate time. Georg-Johann has confirmed this patch fixes his avr bug and I've had it in my tester over the weekend. It's bootstrapped and regression tested on aarch64, m68k, sh4, alpha and hppa. It's also regression tested successfully on a wide variety of other targets. gcc/ PR rtl-optimization/101188 * postreload.cc (reload_cse_move2add_invalidate): New function, extracted from... (reload_cse_move2add): Call reload_cse_move2add_invalidate. gcc/testsuite PR rtl-optimization/101188 * gcc.c-torture/execute/pr101188.c: New test
[Bug rtl-optimization/101188] [postreload] Uses content of a clobbered register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188 --- Comment #14 from Ulrich Weigand --- (In reply to Georg-Johann Lay from comment #13) > Also I don't have a test case for your scenario. I can reproduce the bug > back to v5 on avr and maybe it is even older. As it appears, this PR lead > to no hickups on any other target, so for now I'd like to keep the fix > restricted to what I can test. I agree that your patch looks correct and unlikely to cause any new problems, so I won't object to it being committed. I just wanted to point out that it might not be a complete fix.
[Bug rtl-optimization/101188] [postreload] Uses content of a clobbered register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188 --- Comment #13 from Georg-Johann Lay --- (In reply to Ulrich Weigand from comment #12) > I think your root cause analysis is correct. In this part of code: > > if (success) > delete_insn (insn); > changed |= success; > insn = next; > move2add_record_mode (reg); > reg_offset[regno] > = trunc_int_for_mode (added_offset + base_offset, > mode); > continue; > > the intent seems to be to manually update the move2add data structures to > account for the effects of "next", because the default logic is now skipped > for the "next" insn. That's why in particular the reg mode and offset are > manually calculated. > > This manual logic however is really only correct if "next" is actually just > a simple SET. Reading the comment before the whole loop: > /* For simplicity, we only perform this optimization on > straightforward SETs. */ > makes me suspect the original author assumed that "next" is in fact a > straightforward SET here as well. That would render the optimization far less likely, e..g in the case of clobber of CCmode regs. I understodd the comment as only referring to "insn", not necessarily to "next". > This is however not true due to behavior > of the "single_set" extractor. (I'm wondering if "single_set" used to be > defined differently back in the days?) > Your fix does look correct to me as far as handling parallel CLOBBERs go. > However, looking at "single_set", it seems there is yet another case: the > extractor also accepts a parallel of two or more SETs, as long as all except > one of those SETs have destinations that are dead. These cases would still > not be handled correctly with your patch, I think. > > I'm wondering whether it is even worthwhile to attempt to cover those cases. > Maybe a more straightforward fix would be to keep in line with the > above-mentioned comment about "straightforward SETs" and just check for a > single SET directly instead of using "single_set" here. Do you think this > would miss any important optimizations? Not sure about how many optimizations this would kill. Many insns are parallells that also set CCmode regs which don't interfere with this optimization, but only considering SETs would skip all such optimizations on targets that can have CCmode during reload (avr is not one of them). Also I don't have a test case for your scenario. I can reproduce the bug back to v5 on avr and maybe it is even older. As it appears, this PR lead to no hickups on any other target, so for now I'd like to keep the fix restricted to what I can test. I already started a review this morning: https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620446.html
[Bug rtl-optimization/101188] [postreload] Uses content of a clobbered register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188 Ulrich Weigand changed: What|Removed |Added CC||uweigand at gcc dot gnu.org --- Comment #12 from Ulrich Weigand --- Sorry for not responding earlier, I've been out on vacation. I think your root cause analysis is correct. In this part of code: if (success) delete_insn (insn); changed |= success; insn = next; move2add_record_mode (reg); reg_offset[regno] = trunc_int_for_mode (added_offset + base_offset, mode); continue; the intent seems to be to manually update the move2add data structures to account for the effects of "next", because the default logic is now skipped for the "next" insn. That's why in particular the reg mode and offset are manually calculated. This manual logic however is really only correct if "next" is actually just a simple SET. Reading the comment before the whole loop: /* For simplicity, we only perform this optimization on straightforward SETs. */ makes me suspect the original author assumed that "next" is in fact a straightforward SET here as well. This is however not true due to behavior of the "single_set" extractor. (I'm wondering if "single_set" used to be defined differently back in the days?) Your fix does look correct to me as far as handling parallel CLOBBERs go. However, looking at "single_set", it seems there is yet another case: the extractor also accepts a parallel of two or more SETs, as long as all except one of those SETs have destinations that are dead. These cases would still not be handled correctly with your patch, I think. I'm wondering whether it is even worthwhile to attempt to cover those cases. Maybe a more straightforward fix would be to keep in line with the above-mentioned comment about "straightforward SETs" and just check for a single SET directly instead of using "single_set" here. Do you think this would miss any important optimizations?
[Bug rtl-optimization/101188] [postreload] Uses content of a clobbered register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188 --- Comment #11 from Georg-Johann Lay --- Created attachment 55217 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55217&action=edit Proposed patch for postreload.cc to record clobbers of next insn + test case. This patch solves the problem for avr and tests with no additional regressions for avr. -- rtl-optimization/101188: Don't bypass clobbers of some insns that are optimized or are optimization candidates. gcc/ PR rtl-optimization/101188 * postreload.cc (reload_cse_move2add): Record clobbers of next insn using move2add_note_store. gcc/testsuite/ PR rtl-optimization/101188 * gcc.c-torture/execute/pr101188.c: New test.
[Bug rtl-optimization/101188] [postreload] Uses content of a clobbered register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188 --- Comment #10 from Georg-Johann Lay --- Created attachment 55189 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55189&action=edit Proposed patch for postreload.cc so analysis is not bypassing "next" insn. (In reply to Georg-Johann Lay from comment #9) > postreload.cc::reload_cse_move2add() loops over all insns, and at some point > it encounters > > (insn 44 14 15 2 (set (reg/f:HI 14 r14 [58]) > (reg/v/f:HI 16 r16 [orig:51 self ] [51])) "fail1.c":28:5 101 > {*movhi_split} > (nil)) > > During the analysis for that insn, it executes > > rtx_insn *next = next_nonnote_nondebug_insn (insn); > rtx set = NULL_RTX; > if (next) > set = single_set (next); > > where next is > > (insn 15 44 16 2 (parallel [ > (set (reg/f:HI 14 r14 [58]) > (plus:HI (reg/f:HI 14 r14 [58]) > (const_int 68 [0x44]))) > (clobber (reg:QI 31 r31)) > ]) "fail1.c":28:5 175 {addhi3_clobber} > (nil)) > > Further down, it continues with success = 0: > > if (success) > delete_insn (insn); > changed |= success; > insn = next; > [...] > continue; > > The scan then continues with NEXT_INSN (insn), which is the insn AFTER insn > 15, so the CLOBBER of QI:31 in insn 15 is bypassed, and note_stores or > similar is never executed on insn 15. A simple solution is to not "insn = next;" and just to pseudo-delete insn so that NEXT_INSN (insn) works in the for loop: diff --git a/gcc/postreload.cc b/gcc/postreload.cc index fb392651e1b..388b8c0a506 100644 --- a/gcc/postreload.cc +++ b/gcc/postreload.cc @@ -2031,9 +2031,8 @@ reload_cse_move2add (rtx_insn *first) } } if (success) - delete_insn (insn); + SET_INSN_DELETED (insn); changed |= success; - insn = next; move2add_record_mode (reg); reg_offset[regno] = trunc_int_for_mode (added_offset + base_offset,
[Bug rtl-optimization/101188] [postreload] Uses content of a clobbered register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188 --- Comment #9 from Georg-Johann Lay --- The bug works as follows: postreload.cc::reload_cse_move2add() loops over all insns, and at some point it encounters (insn 44 14 15 2 (set (reg/f:HI 14 r14 [58]) (reg/v/f:HI 16 r16 [orig:51 self ] [51])) "fail1.c":28:5 101 {*movhi_split} (nil)) During the analysis for that insn, it executes rtx_insn *next = next_nonnote_nondebug_insn (insn); rtx set = NULL_RTX; if (next) set = single_set (next); where next is (insn 15 44 16 2 (parallel [ (set (reg/f:HI 14 r14 [58]) (plus:HI (reg/f:HI 14 r14 [58]) (const_int 68 [0x44]))) (clobber (reg:QI 31 r31)) ]) "fail1.c":28:5 175 {addhi3_clobber} (nil)) Further down, it continues with success = 0: if (success) delete_insn (insn); changed |= success; insn = next; [...] continue; The scan then continues with NEXT_INSN (insn), which is the insn AFTER insn 15, so the CLOBBER of QI:31 in insn 15 is bypassed, and note_stores or similar is never executed on insn 15. The "set = single_set (next)" also bypasses that insn 15 is a PARALLEL with a CLOBBER of a general purpose register. Appears the code is in postreload since 2003, when postreload.c was split out of reload1.c.
[Bug rtl-optimization/101188] [postreload] Uses content of a clobbered register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188 Georg-Johann Lay changed: What|Removed |Added Summary|[AVR] Miscompilation and|[postreload] Uses content |function pointers |of a clobbered register See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=56833 --- Comment #8 from Georg-Johann Lay --- Changing the title to something that resembles what is going wrong. Also there is PR56833 which was fixed around v4.9, so maybe that fix was incomplete. There is also PR56442 which is still open, and where it's unclear whether that is a duplicate.