Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver
I was mainly arguing with the sentence that for asm volatile, the output constraints (did you mean outputs and clobbers?) have essentially no meaning. While for some optimizations perhaps it might be desirable to treat asm volatile as full optimization barrier, I'm not sure about all of them (i.e. it would be important to look for performance degradations caused by that change), and for var-tracking I'd argue that asm vs. asm volatile is completely unimportant, if the asm volatile pattern (or even UNSPEC_VOLATILE) doesn't say it clobbers or sets hard register XYZ, then it can't change it (well, it could but is responsible of restoring it), similarly for memory, if it doesn't clobber memory and doesn't set some memory, it can't do that. Yes, I meant outputs and clobbers. The origin of all this is: `blockage' This pattern defines a pseudo insn that prevents the instruction scheduler and other passes from moving instructions and using register equivalences across the boundary defined by the blockage insn. This needs to be an UNSPEC_VOLATILE pattern or a volatile ASM. Now the typical blockage insn is that of the x86: ;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and ;; all of memory. This blocks insns from being moved across this point. (define_insn blockage [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)] [(set_attr length 0)]) [Note the pretty adamant comment about UNSPEC_VOLATILE here...] So, if UNSPEC_VOLATILE is not treated specially, the blockage insn of most ports won't block hard register equivalences. And blockages are precisely used to do that, see e.g. the untyped_call pattern of the x86: /* The optimizer does not know that the call sets the function value registers we stored in the result block. We avoid problems by claiming that all hard registers are used and clobbered at this point. */ emit_insn (gen_blockage ()); That being said, I agree that volatile asms are a separate discussion. So, do you object even to the var-tracking change (which would mean if var-tracking found that say some variable's value lives in hard register 12, when encountering asm volatile it would mean to forget about that even when hard register 12 isn't clobbered by the asm, nor set)? For now var-tracking seems to be the most important issue, as we generate invalid debug info there (latent before, but never hit on inline asm on x86_64/i686 except on setjmp calls). For volatile asms, no, thanks for fixing the fallout on this side. As for other passes, the r193802 change e.g. regresses slightly modified pr44194-1.c testcase on x86_64, struct ints { int a, b, c; } foo(); void bar(int a, int b); void func() { struct ints s = foo(); int x; asm volatile ( : =r (x)); bar(s.a, s.b); } func: .LFB0: - xorl%eax, %eax subq$24, %rsp .LCFI0: + xorl%eax, %eax callfoo - movq%rax, %rcx - shrq$32, %rcx - movq%rcx, %rsi - movl%eax, %edi + movq%rax, (%rsp) + movl%edx, 8(%rsp) + movl4(%rsp), %esi + movl(%rsp), %edi addq$24, %rsp .LCFI1: jmp bar To me it looks like an unnecessary pessimization, the volatile there I understand that just it doesn't want to be moved around (e.g. scheduled before the foo call or after bar), that it can't be DCEd, but as it doesn't mention it modifies memory, I don't understand why it should force some registers to stack and back when it has no way to know the compiler would be emitting anything like that at all. Compare that to asm volatile ( : =r (x) : : memory); in the testcase, where the r193802 commit makes no difference, the stores/loads are done in both cases. OK, I agree that the fallout for DSE, and the effect on memory in general, is undesirable, especially for volatile asms. For CSE, I'd agree it should treat asm volatile as barriers, so for cselib.c we probably need some additional cselib_init flag how we want to treat volatile asms... For UNSPEC_VOLATILE, perhaps even DSE should stay conservative, but for var-tracking.c I still don't see a reason for that. Thank you (as well as the others) for the detailed feedback. Clearly my initial stance on this was a bit extreme and needs to be watered down. I'll ponder about this over the week-end and propose something next week. -- Eric Botcazou
Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver
On Tue, Nov 27, 2012 at 8:29 PM, Uros Bizjak ubiz...@gmail.com wrote: As long as volatile asms and UNSPEC_VOLATILE insns (aka. barriers) are handled the same way and consistently throughout gcc, I'm fine. It seems your patch does that, so thanks! But the question is also what effects your patch can have e.g. on RTL DSE. Looks like the patch caused a bootstrap for s390x. Eagerly awaiting a PR for that, but whoever is interested on that, please try Jakub's patch first... 2012-11-26 Jakub Jelinek ja...@redhat.com PR debug/36728 PR debug/55467 * cselib.c (cselib_process_insn): If cselib_preserve_constants, don't reset table and exit early on volatile insns and setjmp. Reset table afterwards on setjmp. * gcc.dg/guality/pr36728-1.c: Include ../nop.h, make sure the asm are non-empty and add dependency between the first and second asm. * gcc.dg/guality/pr36728-2.c: Likewise. * gcc.dg/guality/pr36728-3.c: New test. * gcc.dg/guality/pr36728-4.c: New test. I have hit the same ICE on alpha-linux-gnu bootstrap. Jakub's patch fixes the ICE, and allows bootstrap to pass well into stage2 now. However, it takes ~10 hours for full bootstrap+regtest to finish, will report back tomorrow morning (CET). The results look OK [1]. Please note that I didn't patch the testsuite. [1] http://gcc.gnu.org/ml/gcc-testresults/2012-11/msg02335.html Uros.
Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver
On Tue, 27 Nov 2012, Jakub Jelinek wrote: 2012-11-26 Jakub Jelinek ja...@redhat.com PR debug/36728 PR debug/55467 * cselib.c (cselib_process_insn): If cselib_preserve_constants, don't reset table and exit early on volatile insns and setjmp. Reset table afterwards on setjmp. It seems this also fixes the s390x-linux bootstrap; at least the test-case in PR bootstrap/55511. Thanks again. (N.B. there may also be a bug in var-tracking, covered up by the patch above.) brgds, H-P
Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver
On Wed, Nov 28, 2012 at 08:44:18AM -0500, Hans-Peter Nilsson wrote: On Tue, 27 Nov 2012, Jakub Jelinek wrote: 2012-11-26 Jakub Jelinek ja...@redhat.com PR debug/36728 PR debug/55467 * cselib.c (cselib_process_insn): If cselib_preserve_constants, don't reset table and exit early on volatile insns and setjmp. Reset table afterwards on setjmp. It seems this also fixes the s390x-linux bootstrap; at least the test-case in PR bootstrap/55511. Thanks again. (N.B. there may also be a bug in var-tracking, covered up by the patch above.) The patch is actually a fix for that. The thing is that both cselib was doing the wrong thing for the resets (not calling cselib_preserve_only_constants () before cselib_reset_table resp. cselib_reset_table not prepared to the thing that it would need to flush all regs/mems, not just from the VALUEs that are kept in the hash table, but also from those that are dropped), and also not adding some special magic microoperation for the volatile insns (unclear what exactly it would have to do). By never handling the volatile insns specially during var-tracking all those issues are gone. Jakub
Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver
Eric Botcazou ebotca...@adacore.com writes: I strongly disagree with this. Outputs and clobbers have significant meaning even on volatile asms, asm volatile doesn't mean all registers and all memory are supposed to be considered clobbered. For memory you have memory clobber for that, for registers it is users responsibility to describe exactly what changes, either in clobbers or in outputs. The difference between non-volatile and volatile asm is, as the documentation states: The `volatile' keyword indicates that the instruction has important side-effects. GCC will not delete a volatile `asm' if it is reachable. Volatile asm acts as an optimization barrier to some extent, but it really can't modify registers or memory that aren't described as modified in the asm pattern. The important side-effects are of some other kind than modifying registers or memory visible from the current function. Ditto for UNSPEC_VOLATILE. Well, the last sentence would essentially void the entire argument I think. It's well established in the RTL middle-end that UNSPEC_VOLATILE can do pretty much everything behind the back of the compiler. As always when jumping in the middle of thread, I might well be missing the point sorry, but this sounded a bit extreme if taken literally. An UNSPEC_VOLATILE doesn't in itself force the function to save all call-saved registers, so an UNSPEC_VOLATILE that modifies those registers behind the back of the compiler would lead to us generating wrong code. And I thought UNSPEC_VOLATILEs that also clobber visible memory in an unpredictable way had to have something like (clobber (mem:BLK (scratch))). I thought Jakub's the important side-effects are of some other kind than modifying registers or memory visible from the current function applied to UNSPEC_VOLATILE too. Richard
Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver
On 11/27/2012 09:35 AM, Eric Botcazou wrote: It's well established in the RTL middle-end that UNSPEC_VOLATILE can do pretty much everything behind the back of the compiler. This is false. U_V is a scheduling barrier, and is never deleted as dead (as opposed to unreachable) code. Certainly we don't do anything so complete as invalidate all registers, nor do we invalidate memory. r~
Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver
On 11/27/2012 04:03 AM, Jakub Jelinek wrote: 2012-11-26 Jakub Jelinek ja...@redhat.com PR debug/36728 PR debug/55467 * cselib.c (cselib_process_insn): If cselib_preserve_constants, don't reset table and exit early on volatile insns and setjmp. Reset table afterwards on setjmp. * gcc.dg/guality/pr36728-1.c: Include ../nop.h, make sure the asm are non-empty and add dependency between the first and second asm. * gcc.dg/guality/pr36728-2.c: Likewise. * gcc.dg/guality/pr36728-3.c: New test. * gcc.dg/guality/pr36728-4.c: New test. Ok. It appears that PRs 55507 and 55511 are also fixed. r~
Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver
I quoted the whole discussion, see a single line below. On Mon, 19 Nov 2012, Eric Botcazou wrote: Unfortunately, it causes regressions; read on for a very brief analysis. For x86_64-linux (base multilib): Running /home/hp/gcctop/tmp/pr55030-2n/gcc/gcc/testsuite/gcc.dg/guality/guality.exp ... ... FAIL: gcc.dg/guality/pr36728-1.c -O1 line 12 arg7 == 30 [...] I looked into gcc.dg/guality/pr36728-1.c at base -O1. The generated assembly code modulo debug info, is the same. The value of arg7 is optimized out, says gdb in gcc.log. The problem doesn't seem to be any md-generated frame-related-barrier as was my first guess, but the volatile asms in the source(!). The first one looks like this (and the second one similar) in pr36728-1.c.168r.cse1 (r193583): (insn 26 25 27 2 (parallel [ (set (mem/c:SI (plus:DI (reg/f:DI 20 frame) (const_int -32 [0xffe0])) [0 y+0 S4 A256]) (asm_operands/v:SI () (=m) 0 [ (mem/c:SI (plus:DI (reg/f:DI 20 frame) (const_int -32 [0xffe0])) [0 y+0 S4 A256]) ] [ (asm_input:SI (m) (null):0) ] [] /home/hp/gcctop/tmp/nbasf/gcc/gcc/testsuite/gcc.dg/guality/pr36728-1.c:12)) (clobber (reg:QI 18 fpsr)) (clobber (reg:QI 17 flags)) ]) /home/hp/gcctop/tmp/nbasf/gcc/gcc/testsuite/gcc.dg/guality/pr36728-1.c:12 -1 (nil)) It's not caught by the previous test: - GET_CODE (PATTERN (insn)) == ASM_OPERANDS - MEM_VOLATILE_P (PATTERN (insn))) ...but since it is a volatile asm (as seen in the source and by the /v on the asm_operands) volatile_insn_p does catch it (as expected and intended) and down the road for some reason, gcc can't recover the arg7 contents. Somewhat expected, but this volatile asm is also more volatile than intended; a clobber list for example as seen above inserted by the md, is now redundant. Thanks for the analysis. I don't think that the redundancy of the clobber list matters here: the clobbers are added to all asm statements, volatile or not, so they aren't intended to make the volatile statements more precise in the hope of optimizing around them. IIIUC, Jakub disagrees on this point, see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55467#c13. Sigh, hoping we can get consensus on this point, or at least that gcc has a consistent view... I'm not sure what to do with this. Try changing volatile_insn_p adding a parameter to optionally allow volatile asms with outputs to pass? But then, when *should* that distinction be done, to let such volatile asms be allowed to pass as not-really-as-volatile-as-we-look-for? I'd think never for any of the the patched code, or maybe only when looking at effects on memory. Yes, weakening volatile_insn_p seems also dangerous to me, and I'd rather lean towards the opposite, conservative side. We apparently have a small conflict between the meaning of volatile asms with operands at the source level and volatile_insn_p. However, I think that the latter interpretation is the correct one: if your asm statements have effects that can be described, then you should use output constraints instead of volatile; otherwise, you should use volatile and the output constraints have essentially no meaning. The gcc.dg/guality/pr36728-[12].c testcases use volatile asms in an artificial way to coax the compiler into following a certain behaviour, so I don't think that they should be taken into account to judge the correctness of the change. Therefore, I'd go ahead and apply the full patch below, possibly adjusting both testcases to cope with it. Tentative (and untested) patch attached. I've also CCed Jakub though, as he might have a different opinion. gcc: PR middle-end/55030 * builtins.c (expand_builtin_setjmp_receiver): Update comment regarding purpose of blockage. * emit-rtl.c [!HAVE_blockage] (gen_blockage): Similarly for the head comment. * rtlanal.c (volatile_insn_p): Ditto. * doc/md.texi (blockage): Update similarly. Change wording to require one of two forms, rather than implying a wider choice. * cse.c (cse_insn): Where checking for blocking insns, use volatile_insn_p instead of manual check for volatile ASM. * dse.c (scan_insn): Ditto. * cselib.c (cselib_process_insn): Ditto. -- Eric Botcazou
Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver
On Tue, Nov 27, 2012 at 06:44:23AM -0500, Hans-Peter Nilsson wrote: We apparently have a small conflict between the meaning of volatile asms with operands at the source level and volatile_insn_p. However, I think that the latter interpretation is the correct one: if your asm statements have effects that can be described, then you should use output constraints instead of volatile; otherwise, you should use volatile and the output constraints have essentially no meaning. I strongly disagree with this. Outputs and clobbers have significant meaning even on volatile asms, asm volatile doesn't mean all registers and all memory are supposed to be considered clobbered. For memory you have memory clobber for that, for registers it is users responsibility to describe exactly what changes, either in clobbers or in outputs. The difference between non-volatile and volatile asm is, as the documentation states: The `volatile' keyword indicates that the instruction has important side-effects. GCC will not delete a volatile `asm' if it is reachable. Volatile asm acts as an optimization barrier to some extent, but it really can't modify registers or memory that aren't described as modified in the asm pattern. The important side-effects are of some other kind than modifying registers or memory visible from the current function. Ditto for UNSPEC_VOLATILE. So, at least from var-tracking POV which doesn't attempt to perform any optimizations across any insn, just tries to track where values live, IMHO a volatile asm acts exactly the same as non-volatile, that is why I'm testing following patch right now. But the question is also what effects your patch can have e.g. on RTL DSE. 2012-11-26 Jakub Jelinek ja...@redhat.com PR debug/36728 PR debug/55467 * cselib.c (cselib_process_insn): If cselib_preserve_constants, don't reset table and exit early on volatile insns and setjmp. Reset table afterwards on setjmp. * gcc.dg/guality/pr36728-1.c: Include ../nop.h, make sure the asm are non-empty and add dependency between the first and second asm. * gcc.dg/guality/pr36728-2.c: Likewise. * gcc.dg/guality/pr36728-3.c: New test. * gcc.dg/guality/pr36728-4.c: New test. --- gcc/cselib.c.jj 2012-11-26 10:14:26.0 +0100 +++ gcc/cselib.c2012-11-26 20:01:10.488304558 +0100 @@ -2626,11 +2626,12 @@ cselib_process_insn (rtx insn) cselib_current_insn = insn; /* Forget everything at a CODE_LABEL, a volatile insn, or a setjmp. */ - if (LABEL_P (insn) - || (CALL_P (insn) - find_reg_note (insn, REG_SETJMP, NULL)) - || (NONJUMP_INSN_P (insn) - volatile_insn_p (PATTERN (insn + if ((LABEL_P (insn) + || (CALL_P (insn) + find_reg_note (insn, REG_SETJMP, NULL)) + || (NONJUMP_INSN_P (insn) + volatile_insn_p (PATTERN (insn + !cselib_preserve_constants) { cselib_reset_table (next_uid); cselib_current_insn = NULL_RTX; @@ -2668,9 +2669,18 @@ cselib_process_insn (rtx insn) /* Look for any CLOBBERs in CALL_INSN_FUNCTION_USAGE, but only after we have processed the insn. */ if (CALL_P (insn)) -for (x = CALL_INSN_FUNCTION_USAGE (insn); x; x = XEXP (x, 1)) - if (GET_CODE (XEXP (x, 0)) == CLOBBER) - cselib_invalidate_rtx (XEXP (XEXP (x, 0), 0)); +{ + for (x = CALL_INSN_FUNCTION_USAGE (insn); x; x = XEXP (x, 1)) + if (GET_CODE (XEXP (x, 0)) == CLOBBER) + cselib_invalidate_rtx (XEXP (XEXP (x, 0), 0)); + /* Flush evertything on setjmp. */ + if (cselib_preserve_constants + find_reg_note (insn, REG_SETJMP, NULL)) + { + cselib_preserve_only_values (); + cselib_reset_table (next_uid); + } +} /* On setter of the hard frame pointer if frame_pointer_needed, invalidate stack_pointer_rtx, so that sp and {,h}fp based --- gcc/testsuite/gcc.dg/guality/pr36728-1.c.jj 2012-11-26 10:14:25.0 +0100 +++ gcc/testsuite/gcc.dg/guality/pr36728-1.c2012-11-27 10:01:14.517080157 +0100 @@ -1,7 +1,11 @@ /* PR debug/36728 */ /* { dg-do run } */ /* { dg-options -g } */ -int a; + +#include ../nop.h + +int a, b; + int __attribute__((noinline)) foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7) { @@ -9,9 +13,9 @@ foo (int arg1, int arg2, int arg3, int a int __attribute__ ((aligned(32))) y; y = 2; - asm ( : =m (y) : m (y)); + asm (NOP : =m (y), =m (b) : m (y)); x[0] = 25; - asm ( : =m (x[0]), =m (a) : m (x[0])); + asm (NOP : =m (x[0]), =m (a) : m (x[0]), m (b)); return y; } @@ -21,23 +25,23 @@ foo (int arg1, int arg2, int arg3, int a and arg2. So it is expected that these values are unavailable in some of these tests. */ -/* { dg-final { gdb-test 12 arg1 1 { target { ! s390*-*-* } } } }*/ -/* { dg-final { gdb-test 12 arg2 2 { target { ! s390*-*-* } } } }*/ -/* { dg-final { gdb-test 12
Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver
On Tue, 27 Nov 2012, Jakub Jelinek wrote: On Tue, Nov 27, 2012 at 06:44:23AM -0500, Hans-Peter Nilsson wrote: JFTR: No I didn't, Eric wrote the below. But, it made sense to me. :) We apparently have a small conflict between the meaning of volatile asms with operands at the source level and volatile_insn_p. However, I think that the latter interpretation is the correct one: if your asm statements have effects that can be described, then you should use output constraints instead of volatile; otherwise, you should use volatile and the output constraints have essentially no meaning. I strongly disagree with this. [...] As long as volatile asms and UNSPEC_VOLATILE insns (aka. barriers) are handled the same way and consistently throughout gcc, I'm fine. It seems your patch does that, so thanks! But the question is also what effects your patch can have e.g. on RTL DSE. Looks like the patch caused a bootstrap for s390x. Eagerly awaiting a PR for that, but whoever is interested on that, please try Jakub's patch first... 2012-11-26 Jakub Jelinek ja...@redhat.com PR debug/36728 PR debug/55467 * cselib.c (cselib_process_insn): If cselib_preserve_constants, don't reset table and exit early on volatile insns and setjmp. Reset table afterwards on setjmp. * gcc.dg/guality/pr36728-1.c: Include ../nop.h, make sure the asm are non-empty and add dependency between the first and second asm. * gcc.dg/guality/pr36728-2.c: Likewise. * gcc.dg/guality/pr36728-3.c: New test. * gcc.dg/guality/pr36728-4.c: New test. brgds, H-P
Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver
On Tue, Nov 27, 2012 at 01:03:47PM +0100, Jakub Jelinek wrote: So, at least from var-tracking POV which doesn't attempt to perform any optimizations across any insn, just tries to track where values live, IMHO a volatile asm acts exactly the same as non-volatile, that is why I'm testing following patch right now. 2012-11-26 Jakub Jelinek ja...@redhat.com PR debug/36728 PR debug/55467 * cselib.c (cselib_process_insn): If cselib_preserve_constants, don't reset table and exit early on volatile insns and setjmp. Reset table afterwards on setjmp. * gcc.dg/guality/pr36728-1.c: Include ../nop.h, make sure the asm are non-empty and add dependency between the first and second asm. * gcc.dg/guality/pr36728-2.c: Likewise. * gcc.dg/guality/pr36728-3.c: New test. * gcc.dg/guality/pr36728-4.c: New test. Bootstrapped/regtested fine on x86_64-linux and i686-linux (and fixes the pr36728-[34].c failures that appear without the patch, which are of the wrong debug kind). Jakub
Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver
I strongly disagree with this. Outputs and clobbers have significant meaning even on volatile asms, asm volatile doesn't mean all registers and all memory are supposed to be considered clobbered. For memory you have memory clobber for that, for registers it is users responsibility to describe exactly what changes, either in clobbers or in outputs. The difference between non-volatile and volatile asm is, as the documentation states: The `volatile' keyword indicates that the instruction has important side-effects. GCC will not delete a volatile `asm' if it is reachable. Volatile asm acts as an optimization barrier to some extent, but it really can't modify registers or memory that aren't described as modified in the asm pattern. The important side-effects are of some other kind than modifying registers or memory visible from the current function. Ditto for UNSPEC_VOLATILE. Well, the last sentence would essentially void the entire argument I think. It's well established in the RTL middle-end that UNSPEC_VOLATILE can do pretty much everything behind the back of the compiler. Now I agree that the discussion exists for volatile asms. But you have for example in the unmodified cse.c: /* A volatile ASM invalidates everything. */ if (NONJUMP_INSN_P (insn) GET_CODE (PATTERN (insn)) == ASM_OPERANDS MEM_VOLATILE_P (PATTERN (insn))) flush_hash_table (); and the comment of volatile_insn_p is rather explicit: /* Nonzero if X contains any volatile instructions. These are instructions which may cause unpredictable machine state instructions, and thus no instructions should be moved or combined across them. This includes only volatile asms and UNSPEC_VOLATILE instructions. */ The problem is that the various RTL passes don't have a consistent view on that so the patch attempts to tidy this up in a conservative manner. I think that a compromise could be to say that volatile asms without outputs are full barriers (like UNSPEC_VOLATILE) but other volatile asms aren't. That's what the unmodified cse.c, cselib.c and dse.c currently implement. But implementing it consistently would mean weakening volatile_insn_p. So, at least from var-tracking POV which doesn't attempt to perform any optimizations across any insn, just tries to track where values live, IMHO a volatile asm acts exactly the same as non-volatile, that is why I'm testing following patch right now. But the question is also what effects your patch can have e.g. on RTL DSE. It will make all volatile asms be treated as volatile asms without outputs. -- Eric Botcazou
Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver
On Tue, Nov 27, 2012 at 06:35:52PM +0100, Eric Botcazou wrote: Now I agree that the discussion exists for volatile asms. But you have for example in the unmodified cse.c: /* A volatile ASM invalidates everything. */ if (NONJUMP_INSN_P (insn) GET_CODE (PATTERN (insn)) == ASM_OPERANDS MEM_VOLATILE_P (PATTERN (insn))) flush_hash_table (); That was weird indeed, because asm volatile ( : : r (x)); flushed (for targets that don't add any implicit clobbers) but e.g. asm volatile ( : : r (x) : r23); wouldn't (then the pattern is a parallel with ASM_OPERANDS and some CLOBBERs). The effect of that is that it never triggered on i?86/x86_64, because those have the implicit fprs/flags clobbers. I was mainly arguing with the sentence that for asm volatile, the output constraints (did you mean outputs and clobbers?) have essentially no meaning. While for some optimizations perhaps it might be desirable to treat asm volatile as full optimization barrier, I'm not sure about all of them (i.e. it would be important to look for performance degradations caused by that change), and for var-tracking I'd argue that asm vs. asm volatile is completely unimportant, if the asm volatile pattern (or even UNSPEC_VOLATILE) doesn't say it clobbers or sets hard register XYZ, then it can't change it (well, it could but is responsible of restoring it), similarly for memory, if it doesn't clobber memory and doesn't set some memory, it can't do that. So, do you object even to the var-tracking change (which would mean if var-tracking found that say some variable's value lives in hard register 12, when encountering asm volatile it would mean to forget about that even when hard register 12 isn't clobbered by the asm, nor set)? For now var-tracking seems to be the most important issue, as we generate invalid debug info there (latent before, but never hit on inline asm on x86_64/i686 except on setjmp calls). As for other passes, the r193802 change e.g. regresses slightly modified pr44194-1.c testcase on x86_64, struct ints { int a, b, c; } foo(); void bar(int a, int b); void func() { struct ints s = foo(); int x; asm volatile ( : =r (x)); bar(s.a, s.b); } func: .LFB0: - xorl%eax, %eax subq$24, %rsp .LCFI0: + xorl%eax, %eax callfoo - movq%rax, %rcx - shrq$32, %rcx - movq%rcx, %rsi - movl%eax, %edi + movq%rax, (%rsp) + movl%edx, 8(%rsp) + movl4(%rsp), %esi + movl(%rsp), %edi addq$24, %rsp .LCFI1: jmp bar To me it looks like an unnecessary pessimization, the volatile there I understand that just it doesn't want to be moved around (e.g. scheduled before the foo call or after bar), that it can't be DCEd, but as it doesn't mention it modifies memory, I don't understand why it should force some registers to stack and back when it has no way to know the compiler would be emitting anything like that at all. Compare that to asm volatile ( : =r (x) : : memory); in the testcase, where the r193802 commit makes no difference, the stores/loads are done in both cases. For CSE, I'd agree it should treat asm volatile as barriers, so for cselib.c we probably need some additional cselib_init flag how we want to treat volatile asms... For UNSPEC_VOLATILE, perhaps even DSE should stay conservative, but for var-tracking.c I still don't see a reason for that. Jakub
Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver
Hello! On Tue, 27 Nov 2012, Jakub Jelinek wrote: On Tue, Nov 27, 2012 at 06:44:23AM -0500, Hans-Peter Nilsson wrote: JFTR: No I didn't, Eric wrote the below. But, it made sense to me. :) We apparently have a small conflict between the meaning of volatile asms with operands at the source level and volatile_insn_p. However, I think that the latter interpretation is the correct one: if your asm statements have effects that can be described, then you should use output constraints instead of volatile; otherwise, you should use volatile and the output constraints have essentially no meaning. I strongly disagree with this. [...] As long as volatile asms and UNSPEC_VOLATILE insns (aka. barriers) are handled the same way and consistently throughout gcc, I'm fine. It seems your patch does that, so thanks! But the question is also what effects your patch can have e.g. on RTL DSE. Looks like the patch caused a bootstrap for s390x. Eagerly awaiting a PR for that, but whoever is interested on that, please try Jakub's patch first... 2012-11-26 Jakub Jelinek ja...@redhat.com PR debug/36728 PR debug/55467 * cselib.c (cselib_process_insn): If cselib_preserve_constants, don't reset table and exit early on volatile insns and setjmp. Reset table afterwards on setjmp. * gcc.dg/guality/pr36728-1.c: Include ../nop.h, make sure the asm are non-empty and add dependency between the first and second asm. * gcc.dg/guality/pr36728-2.c: Likewise. * gcc.dg/guality/pr36728-3.c: New test. * gcc.dg/guality/pr36728-4.c: New test. I have hit the same ICE on alpha-linux-gnu bootstrap. Jakub's patch fixes the ICE, and allows bootstrap to pass well into stage2 now. However, it takes ~10 hours for full bootstrap+regtest to finish, will report back tomorrow morning (CET). Uros.
Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver
Unfortunately, it causes regressions; read on for a very brief analysis. For x86_64-linux (base multilib): Running /home/hp/gcctop/tmp/pr55030-2n/gcc/gcc/testsuite/gcc.dg/guality/guality.exp ... ... FAIL: gcc.dg/guality/pr36728-1.c -O1 line 12 arg7 == 30 [...] I looked into gcc.dg/guality/pr36728-1.c at base -O1. The generated assembly code modulo debug info, is the same. The value of arg7 is optimized out, says gdb in gcc.log. The problem doesn't seem to be any md-generated frame-related-barrier as was my first guess, but the volatile asms in the source(!). The first one looks like this (and the second one similar) in pr36728-1.c.168r.cse1 (r193583): (insn 26 25 27 2 (parallel [ (set (mem/c:SI (plus:DI (reg/f:DI 20 frame) (const_int -32 [0xffe0])) [0 y+0 S4 A256]) (asm_operands/v:SI () (=m) 0 [ (mem/c:SI (plus:DI (reg/f:DI 20 frame) (const_int -32 [0xffe0])) [0 y+0 S4 A256]) ] [ (asm_input:SI (m) (null):0) ] [] /home/hp/gcctop/tmp/nbasf/gcc/gcc/testsuite/gcc.dg/guality/pr36728-1.c:12)) (clobber (reg:QI 18 fpsr)) (clobber (reg:QI 17 flags)) ]) /home/hp/gcctop/tmp/nbasf/gcc/gcc/testsuite/gcc.dg/guality/pr36728-1.c:12 -1 (nil)) It's not caught by the previous test: - GET_CODE (PATTERN (insn)) == ASM_OPERANDS - MEM_VOLATILE_P (PATTERN (insn))) ...but since it is a volatile asm (as seen in the source and by the /v on the asm_operands) volatile_insn_p does catch it (as expected and intended) and down the road for some reason, gcc can't recover the arg7 contents. Somewhat expected, but this volatile asm is also more volatile than intended; a clobber list for example as seen above inserted by the md, is now redundant. Thanks for the analysis. I don't think that the redundancy of the clobber list matters here: the clobbers are added to all asm statements, volatile or not, so they aren't intended to make the volatile statements more precise in the hope of optimizing around them. I'm not sure what to do with this. Try changing volatile_insn_p adding a parameter to optionally allow volatile asms with outputs to pass? But then, when *should* that distinction be done, to let such volatile asms be allowed to pass as not-really-as-volatile-as-we-look-for? I'd think never for any of the the patched code, or maybe only when looking at effects on memory. Yes, weakening volatile_insn_p seems also dangerous to me, and I'd rather lean towards the opposite, conservative side. We apparently have a small conflict between the meaning of volatile asms with operands at the source level and volatile_insn_p. However, I think that the latter interpretation is the correct one: if your asm statements have effects that can be described, then you should use output constraints instead of volatile; otherwise, you should use volatile and the output constraints have essentially no meaning. The gcc.dg/guality/pr36728-[12].c testcases use volatile asms in an artificial way to coax the compiler into following a certain behaviour, so I don't think that they should be taken into account to judge the correctness of the change. Therefore, I'd go ahead and apply the full patch below, possibly adjusting both testcases to cope with it. Tentative (and untested) patch attached. I've also CCed Jakub though, as he might have a different opinion. gcc: PR middle-end/55030 * builtins.c (expand_builtin_setjmp_receiver): Update comment regarding purpose of blockage. * emit-rtl.c [!HAVE_blockage] (gen_blockage): Similarly for the head comment. * rtlanal.c (volatile_insn_p): Ditto. * doc/md.texi (blockage): Update similarly. Change wording to require one of two forms, rather than implying a wider choice. * cse.c (cse_insn): Where checking for blocking insns, use volatile_insn_p instead of manual check for volatile ASM. * dse.c (scan_insn): Ditto. * cselib.c (cselib_process_insn): Ditto. -- Eric BotcazouIndex: gcc.dg/guality/pr36728-1.c === --- gcc.dg/guality/pr36728-1.c (revision 193596) +++ gcc.dg/guality/pr36728-1.c (working copy) @@ -9,10 +9,10 @@ foo (int arg1, int arg2, int arg3, int a int __attribute__ ((aligned(32))) y; y = 2; - asm volatile ( : =m (y) : m (y)); + asm ( : =m (y) : m (y)); x[0] = 25; - asm volatile ( : =m (x[0]) : m (x[0])); - return y; + asm ( : =m (x[0]) : m (x[0])); + return y + x[0]; } /* On s390(x) r2 and r3 are (depending on the optimization level) used @@ -39,11 +39,13 @@ foo (int arg1, int arg2, int arg3, int a /* { dg-final { gdb-test 14 *x (char) 25 } } */ /* { dg-final { gdb-test 14 y 2 } } */ +int a; + int main () { int l = 0; -
Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver
This is a target-specific blockage insn, but with the general form found in all targets defining it. The default blockage is an empty asm-volatile, which is what cse_insn recognized. The blockage insn is there to prevent scheduling of the critical insns and register values. It's almost obvious that an unspec_volatile should have that effect too; at least that's intended by the code in expand_builtin_setjmp_receiver. Luckily (or unluckily regarding the presence of the bug) *this* cse code is not run post-frame-layout (post-reload-cse does not use this code), or it seems people would soon notice register values used from the wrong side of the blockage, considering its critical use at the restore of the stack-pointer. As mentioned in the previous patch, http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01172.html, clobbering the frame-pointer (and then using it) does not seem the logical alternative to the patch below; the blockage insn should just do its job. Agreed. I updated comments and documentation so it's more apparent that register uses should also not be moved across the blockage; see the patched comments. Tested native x86_64-unknown-linux-gnu w./wo. -m32 and before/after 192677. Ok to commit? gcc: PR middle-end/55030 * builtins.c (expand_builtin_setjmp_receiver): Update comment regarding purpose of blockage. * emit-rtl.c [!HAVE_blockage] (gen_blockage): Similarly for the head comment. * doc/md.texi (blockage): Update similarly. Change wording to require one of two forms, rather than implying a wider choice. * cse.c (cse_insn): Where checking for blocking insns, treat UNSPEC_VOLATILE as blocking, besides volatile ASM. That's fine on principle, but there is a predicate for this (volatile_insn_p) so I think we should use it here. Moreover, cselib_process_insn has the same check so we should adjust it as well, which in turn means that dse.c:scan_insn should probably be adjusted too. OK with these changes, thanks. -- Eric Botcazou
[RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver
The problem exposed in PR55030 (repeatable on x86_64-linux with -m32 at r192676) is that the fake-frame-pointer frame is replaced with the actual-frame-pointer bp in cse1, around the critical insn in __builtin_setjmp_receiver that restores their defined offset. The patch in PR55030/r192676 removed a clobber of the frame-pointer, and cse1 felt free to replace the former register with the latter, as in the following expansion of __builtin_setjmp_receiver (in builtins.c: expand_builtin_setjmp_receiver) from a reduced test-case, see the PR. (You might find the setup/restore code having an unexpected order; there are two __builtin_setjmps in series and the setup of the second one is storing the wrong value for the frame-pointer.) The pre-transformation dump in cse1 says: (code_label/s 13 51 14 3 2 [2 uses]) (note 14 13 15 3 [bb 3] NOTE_INSN_BASIC_BLOCK) (insn 15 14 78 3 (use (reg/f:SI 6 bp)) /home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:31 -1 (nil)) (insn 78 15 17 3 (set (reg/f:SI 20 frame) (reg/f:SI 6 bp)) /home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:31 -1 (nil)) (insn 17 78 56 3 (unspec_volatile [ (const_int 0 [0]) ] UNSPECV_BLOCKAGE) /home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:31 643 {blockage} (nil)) (insn 56 17 57 3 (set (reg/f:SI 74) (symbol_ref:SI (chk_fail_buf) [flags 0x40] var_decl 0x77512688 chk_fail_buf)) /home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:37 65 {*movsi_internal} (nil)) (insn 57 56 58 3 (set (mem:SI (reg/f:SI 74) [5 S4 A8]) (reg/f:SI 20 frame)) /home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:37 65 {*movsi_internal} (nil)) Before r192676 and after the reversion in r192701, there was/is that weird extra (clobber (reg/f:SI 6 bp) inserted before insn 17. But without that, we notice post-cse1-transformation, a change in insn 57: (code_label/s 13 51 14 3 2 [3 uses]) (note 14 13 15 3 [bb 3] NOTE_INSN_BASIC_BLOCK) (insn 15 14 78 3 (use (reg/f:SI 6 bp)) /home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:31 -1 (nil)) (insn 78 15 17 3 (set (reg/f:SI 20 frame) (reg/f:SI 6 bp)) /home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:31 -1 (nil)) (insn 17 78 56 3 (unspec_volatile [ (const_int 0 [0]) ] UNSPECV_BLOCKAGE) /home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:31 643 {blockage} (nil)) (insn 56 17 57 3 (set (reg/f:SI 74) (symbol_ref:SI (chk_fail_buf) [flags 0x40] var_decl 0x77512688 chk_fail_buf)) /home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:37 65 {*movsi_internal} (nil)) (insn 57 56 58 3 (set (mem:SI (reg/f:SI 74) [5 S4 A8]) (reg/f:SI 6 bp)) /home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:37 65 {*movsi_internal} (nil)) It seemed wrong for this change to be stopped *only* by that (clobber bp). Stepping through the code for insn 78 and 57, I was looking for related special-casing of frame- and other similar registers in cse.c and was a bit confused when I found none (not counting hash_rtx_cb). (I admit lack of special-casing is a good sign, though. :) Then I was blinded by the obvious; the next insn, insn 17, is a blockage insn. This is a target-specific blockage insn, but with the general form found in all targets defining it. The default blockage is an empty asm-volatile, which is what cse_insn recognized. The blockage insn is there to prevent scheduling of the critical insns and register values. It's almost obvious that an unspec_volatile should have that effect too; at least that's intended by the code in expand_builtin_setjmp_receiver. Luckily (or unluckily regarding the presence of the bug) *this* cse code is not run post-frame-layout (post-reload-cse does not use this code), or it seems people would soon notice register values used from the wrong side of the blockage, considering its critical use at the restore of the stack-pointer. As mentioned in the previous patch, http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01172.html, clobbering the frame-pointer (and then using it) does not seem the logical alternative to the patch below; the blockage insn should just do its job. I updated comments and documentation so it's more apparent that register uses should also not be moved across the blockage; see the patched comments. Tested native x86_64-unknown-linux-gnu w./wo. -m32 and before/after 192677. Ok to commit? gcc: PR middle-end/55030 * builtins.c (expand_builtin_setjmp_receiver): Update comment regarding purpose of blockage. * emit-rtl.c [!HAVE_blockage] (gen_blockage): Similarly for