Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-30 Thread Eric Botcazou
 I was mainly arguing with the sentence that for asm volatile, the output
 constraints (did you mean outputs and clobbers?) have essentially no
 meaning.  While for some optimizations perhaps it might be desirable
 to treat asm volatile as full optimization barrier, I'm not sure about all
 of them (i.e. it would be important to look for performance degradations
 caused by that change), and for var-tracking I'd argue that asm vs. asm
 volatile is completely unimportant, if the asm volatile pattern (or even
 UNSPEC_VOLATILE) doesn't say it clobbers or sets hard register XYZ, then it
 can't change it (well, it could but is responsible of restoring it),
 similarly for memory, if it doesn't clobber memory and doesn't set some
 memory, it can't do that.

Yes, I meant outputs and clobbers.  The origin of all this is:

`blockage'
 This pattern defines a pseudo insn that prevents the instruction
 scheduler and other passes from moving instructions and using
 register equivalences across the boundary defined by the blockage
 insn.  This needs to be an UNSPEC_VOLATILE pattern or a volatile
 ASM.

Now the typical blockage insn is that of the x86:

;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
;; all of memory.  This blocks insns from being moved across this point.

(define_insn blockage
  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
  
  
  [(set_attr length 0)])

[Note the pretty adamant comment about UNSPEC_VOLATILE here...]

So, if UNSPEC_VOLATILE is not treated specially, the blockage insn of most 
ports won't block hard register equivalences.  And blockages are precisely 
used to do that, see e.g. the untyped_call pattern of the x86:

  /* The optimizer does not know that the call sets the function value
 registers we stored in the result block.  We avoid problems by
 claiming that all hard registers are used and clobbered at this
 point.  */
  emit_insn (gen_blockage ());

That being said, I agree that volatile asms are a separate discussion.

 So, do you object even to the var-tracking change (which would mean if
 var-tracking found that say some variable's value lives in hard register 12,
 when encountering asm volatile it would mean to forget about that even when
 hard register 12 isn't clobbered by the asm, nor set)?  For now
 var-tracking seems to be the most important issue, as we generate invalid
 debug info there (latent before, but never hit on inline asm on x86_64/i686
 except on setjmp calls).

For volatile asms, no, thanks for fixing the fallout on this side.

 As for other passes, the r193802 change e.g. regresses slightly modified
 pr44194-1.c testcase on x86_64,
 struct ints { int a, b, c; } foo();
 void bar(int a, int b);
 
 void func() {
   struct ints s = foo();
   int x;
   asm volatile ( : =r (x));
   bar(s.a, s.b);
 }
 
  func:
  .LFB0:
 - xorl%eax, %eax
   subq$24, %rsp
  .LCFI0:
 + xorl%eax, %eax
   callfoo
 - movq%rax, %rcx
 - shrq$32, %rcx
 - movq%rcx, %rsi
 - movl%eax, %edi
 + movq%rax, (%rsp)
 + movl%edx, 8(%rsp)
 + movl4(%rsp), %esi
 + movl(%rsp), %edi
   addq$24, %rsp
  .LCFI1:
   jmp bar
 
 To me it looks like an unnecessary pessimization, the volatile there
 I understand that just it doesn't want to be moved around (e.g. scheduled
 before the foo call or after bar), that it can't be DCEd, but as it doesn't
 mention it modifies memory, I don't understand why it should force some
 registers to stack and back when it has no way to know the compiler would be
 emitting anything like that at all.
 Compare that to
   asm volatile ( : =r (x) : : memory);
 in the testcase, where the r193802 commit makes no difference, the
 stores/loads are done in both cases.

OK, I agree that the fallout for DSE, and the effect on memory in general, is 
undesirable, especially for volatile asms.

 For CSE, I'd agree it should treat asm volatile as barriers, so for cselib.c
 we probably need some additional cselib_init flag how we want to treat
 volatile asms...  For UNSPEC_VOLATILE, perhaps even DSE should stay
 conservative, but for var-tracking.c I still don't see a reason for that.

Thank you (as well as the others) for the detailed feedback.  Clearly my 
initial stance on this was a bit extreme and needs to be watered down.
I'll ponder about this over the week-end and propose something next week.

-- 
Eric Botcazou


Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-28 Thread Uros Bizjak
On Tue, Nov 27, 2012 at 8:29 PM, Uros Bizjak ubiz...@gmail.com wrote:

 As long as volatile asms and UNSPEC_VOLATILE insns (aka.
 barriers) are handled the same way and consistently throughout
 gcc, I'm fine.  It seems your patch does that, so thanks!

  But the question is also what effects your patch can have e.g. on RTL DSE.

 Looks like the patch caused a bootstrap for s390x.

 Eagerly awaiting a PR for that, but whoever is interested
 on that, please try Jakub's patch first...

  2012-11-26  Jakub Jelinek  ja...@redhat.com
 
  PR debug/36728
  PR debug/55467
  * cselib.c (cselib_process_insn): If cselib_preserve_constants,
  don't reset table and exit early on volatile insns and setjmp.
  Reset table afterwards on setjmp.
 
  * gcc.dg/guality/pr36728-1.c: Include ../nop.h, make sure the asm
  are non-empty and add dependency between the first and second asm.
  * gcc.dg/guality/pr36728-2.c: Likewise.
  * gcc.dg/guality/pr36728-3.c: New test.
  * gcc.dg/guality/pr36728-4.c: New test.

 I have hit the same ICE on alpha-linux-gnu bootstrap. Jakub's patch
 fixes the ICE, and allows bootstrap to pass well into stage2 now.
 However, it takes ~10 hours for full bootstrap+regtest to finish, will
 report back tomorrow morning (CET).

The results look OK [1]. Please note that I didn't patch the testsuite.

[1] http://gcc.gnu.org/ml/gcc-testresults/2012-11/msg02335.html

Uros.


Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-28 Thread Hans-Peter Nilsson
On Tue, 27 Nov 2012, Jakub Jelinek wrote:
 2012-11-26  Jakub Jelinek  ja...@redhat.com

   PR debug/36728
   PR debug/55467
   * cselib.c (cselib_process_insn): If cselib_preserve_constants,
   don't reset table and exit early on volatile insns and setjmp.
   Reset table afterwards on setjmp.

It seems this also fixes the s390x-linux bootstrap; at least the
test-case in PR bootstrap/55511.  Thanks again.

(N.B. there may also be a bug in var-tracking, covered up by the
patch above.)

brgds, H-P


Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-28 Thread Jakub Jelinek
On Wed, Nov 28, 2012 at 08:44:18AM -0500, Hans-Peter Nilsson wrote:
 On Tue, 27 Nov 2012, Jakub Jelinek wrote:
  2012-11-26  Jakub Jelinek  ja...@redhat.com
 
  PR debug/36728
  PR debug/55467
  * cselib.c (cselib_process_insn): If cselib_preserve_constants,
  don't reset table and exit early on volatile insns and setjmp.
  Reset table afterwards on setjmp.
 
 It seems this also fixes the s390x-linux bootstrap; at least the
 test-case in PR bootstrap/55511.  Thanks again.
 
 (N.B. there may also be a bug in var-tracking, covered up by the
 patch above.)

The patch is actually a fix for that.  The thing is that
both cselib was doing the wrong thing for the resets (not calling
cselib_preserve_only_constants () before cselib_reset_table resp.
cselib_reset_table not prepared to the thing that it would need
to flush all regs/mems, not just from the VALUEs that are kept in the hash
table, but also from those that are dropped), and also not adding some
special magic microoperation for the volatile insns (unclear what exactly
it would have to do).  By never handling the volatile insns specially during
var-tracking all those issues are gone.

Jakub


Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-28 Thread Richard Sandiford
Eric Botcazou ebotca...@adacore.com writes:
 I strongly disagree with this.  Outputs and clobbers have significant
 meaning even on volatile asms, asm volatile doesn't mean all registers and
 all memory are supposed to be considered clobbered.  For memory you have
 memory clobber for that, for registers it is users responsibility to
 describe exactly what changes, either in clobbers or in outputs.
 The difference between non-volatile and volatile asm is, as the
 documentation states:
 
 The `volatile' keyword indicates that the instruction has important
 side-effects.  GCC will not delete a volatile `asm' if it is reachable.
 
 Volatile asm acts as an optimization barrier to some extent, but it really
 can't modify registers or memory that aren't described as modified in the
 asm pattern.  The important side-effects are of some other kind than
 modifying registers or memory visible from the current function.
 Ditto for UNSPEC_VOLATILE.

 Well, the last sentence would essentially void the entire argument I
 think.  It's well established in the RTL middle-end that
 UNSPEC_VOLATILE can do pretty much everything behind the back of the
 compiler.

As always when jumping in the middle of thread, I might well be missing
the point sorry, but this sounded a bit extreme if taken literally.
An UNSPEC_VOLATILE doesn't in itself force the function to save all
call-saved registers, so an UNSPEC_VOLATILE that modifies those registers
behind the back of the compiler would lead to us generating wrong code.
And I thought UNSPEC_VOLATILEs that also clobber visible memory in an
unpredictable way had to have something like (clobber (mem:BLK (scratch))).

I thought Jakub's the important side-effects are of some other
kind than modifying registers or memory visible from the current
function applied to UNSPEC_VOLATILE too.

Richard


Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-28 Thread Richard Henderson
On 11/27/2012 09:35 AM, Eric Botcazou wrote:
 It's well established in the RTL middle-end that UNSPEC_VOLATILE can do 
 pretty 
 much everything behind the back of the compiler.

This is false.  U_V is a scheduling barrier, and is never deleted as
dead (as opposed to unreachable) code.  Certainly we don't do anything
so complete as invalidate all registers, nor do we invalidate memory.


r~


Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-28 Thread Richard Henderson
On 11/27/2012 04:03 AM, Jakub Jelinek wrote:
 2012-11-26  Jakub Jelinek  ja...@redhat.com
 
   PR debug/36728
   PR debug/55467
   * cselib.c (cselib_process_insn): If cselib_preserve_constants,
   don't reset table and exit early on volatile insns and setjmp.
   Reset table afterwards on setjmp.
 
   * gcc.dg/guality/pr36728-1.c: Include ../nop.h, make sure the asm
   are non-empty and add dependency between the first and second asm.
   * gcc.dg/guality/pr36728-2.c: Likewise.
   * gcc.dg/guality/pr36728-3.c: New test.
   * gcc.dg/guality/pr36728-4.c: New test.

Ok.

It appears that PRs 55507 and 55511 are also fixed.


r~


Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-27 Thread Hans-Peter Nilsson
I quoted the whole discussion, see a single line below.

On Mon, 19 Nov 2012, Eric Botcazou wrote:
  Unfortunately, it causes regressions; read on for a very brief
  analysis.
 
  For x86_64-linux (base multilib):
 
  Running
  /home/hp/gcctop/tmp/pr55030-2n/gcc/gcc/testsuite/gcc.dg/guality/guality.exp
  ... ...
  FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 12 arg7 == 30
  [...]
 
  I looked into gcc.dg/guality/pr36728-1.c at base -O1.
 
  The generated assembly code modulo debug info, is the same.  The
  value of arg7 is optimized out, says gdb in gcc.log.  The
  problem doesn't seem to be any md-generated
  frame-related-barrier as was my first guess, but the volatile
  asms in the source(!).  The first one looks like this (and the
  second one similar) in pr36728-1.c.168r.cse1 (r193583):
 
  (insn 26 25 27 2 (parallel [
  (set (mem/c:SI (plus:DI (reg/f:DI 20 frame)
  (const_int -32 [0xffe0])) [0 y+0 S4
  A256]) (asm_operands/v:SI () (=m) 0 [
  (mem/c:SI (plus:DI (reg/f:DI 20 frame)
  (const_int -32 [0xffe0])) [0 y+0
  S4 A256]) ]
   [
  (asm_input:SI (m) (null):0)
  ]
   []
  /home/hp/gcctop/tmp/nbasf/gcc/gcc/testsuite/gcc.dg/guality/pr36728-1.c:12))
  (clobber (reg:QI 18 fpsr))
  (clobber (reg:QI 17 flags))
  ])
  /home/hp/gcctop/tmp/nbasf/gcc/gcc/testsuite/gcc.dg/guality/pr36728-1.c:12
  -1 (nil))
 
  It's not caught by the previous test:
  -   GET_CODE (PATTERN (insn)) == ASM_OPERANDS
  -   MEM_VOLATILE_P (PATTERN (insn)))
 
  ...but since it is a volatile asm (as seen in the source and by
  the /v on the asm_operands) volatile_insn_p does catch it (as
  expected and intended) and down the road for some reason, gcc
  can't recover the arg7 contents.  Somewhat expected, but this
  volatile asm is also more volatile than intended; a clobber list
  for example as seen above inserted by the md, is now redundant.

 Thanks for the analysis.  I don't think that the redundancy of the clobber
 list matters here: the clobbers are added to all asm statements, volatile or
 not, so they aren't intended to make the volatile statements more precise in
 the hope of optimizing around them.

IIIUC, Jakub disagrees on this point, see
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55467#c13.

Sigh, hoping we can get consensus on this point, or at least
that gcc has a consistent view...

  I'm not sure what to do with this.  Try changing volatile_insn_p
  adding a parameter to optionally allow volatile asms with
  outputs to pass?  But then, when *should* that distinction be
  done, to let such volatile asms be allowed to pass as
  not-really-as-volatile-as-we-look-for?  I'd think never for
  any of the the patched code, or maybe only when looking at
  effects on memory.

 Yes, weakening volatile_insn_p seems also dangerous to me, and I'd rather lean
 towards the opposite, conservative side.

 We apparently have a small conflict between the meaning of volatile asms with
 operands at the source level and volatile_insn_p.  However, I think that the
 latter interpretation is the correct one: if your asm statements have effects
 that can be described, then you should use output constraints instead of
 volatile; otherwise, you should use volatile and the output constraints have
 essentially no meaning.

 The gcc.dg/guality/pr36728-[12].c testcases use volatile asms in an artificial
 way to coax the compiler into following a certain behaviour, so I don't think
 that they should be taken into account to judge the correctness of the change.
 Therefore, I'd go ahead and apply the full patch below, possibly adjusting
 both testcases to cope with it.  Tentative (and untested) patch attached.

 I've also CCed Jakub though, as he might have a different opinion.

  gcc:
  PR middle-end/55030
  * builtins.c (expand_builtin_setjmp_receiver): Update comment
  regarding purpose of blockage.
  * emit-rtl.c [!HAVE_blockage] (gen_blockage): Similarly for
  the head comment.
  * rtlanal.c (volatile_insn_p): Ditto.
  * doc/md.texi (blockage): Update similarly.  Change wording to
  require one of two forms, rather than implying a wider choice.
  * cse.c (cse_insn): Where checking for blocking insns, use
  volatile_insn_p instead of manual check for volatile ASM.
  * dse.c (scan_insn): Ditto.
  * cselib.c (cselib_process_insn): Ditto.

 --
 Eric Botcazou



Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-27 Thread Jakub Jelinek
On Tue, Nov 27, 2012 at 06:44:23AM -0500, Hans-Peter Nilsson wrote:
  We apparently have a small conflict between the meaning of volatile asms 
  with
  operands at the source level and volatile_insn_p.  However, I think that the
  latter interpretation is the correct one: if your asm statements have 
  effects
  that can be described, then you should use output constraints instead of
  volatile; otherwise, you should use volatile and the output constraints have
  essentially no meaning.

I strongly disagree with this.  Outputs and clobbers have significant
meaning even on volatile asms, asm volatile doesn't mean all registers and
all memory are supposed to be considered clobbered.  For memory you have
memory clobber for that, for registers it is users responsibility to
describe exactly what changes, either in clobbers or in outputs.
The difference between non-volatile and volatile asm is, as the
documentation states:

The `volatile' keyword indicates that the instruction has important
side-effects.  GCC will not delete a volatile `asm' if it is reachable.

Volatile asm acts as an optimization barrier to some extent, but it really
can't modify registers or memory that aren't described as modified in the
asm pattern.  The important side-effects are of some other kind than
modifying registers or memory visible from the current function.
Ditto for UNSPEC_VOLATILE.

So, at least from var-tracking POV which doesn't attempt to perform any
optimizations across any insn, just tries to track where values live, IMHO a
volatile asm acts exactly the same as non-volatile, that is why I'm testing
following patch right now.

But the question is also what effects your patch can have e.g. on RTL DSE.

2012-11-26  Jakub Jelinek  ja...@redhat.com

PR debug/36728
PR debug/55467
* cselib.c (cselib_process_insn): If cselib_preserve_constants,
don't reset table and exit early on volatile insns and setjmp.
Reset table afterwards on setjmp.

* gcc.dg/guality/pr36728-1.c: Include ../nop.h, make sure the asm
are non-empty and add dependency between the first and second asm.
* gcc.dg/guality/pr36728-2.c: Likewise.
* gcc.dg/guality/pr36728-3.c: New test.
* gcc.dg/guality/pr36728-4.c: New test.

--- gcc/cselib.c.jj 2012-11-26 10:14:26.0 +0100
+++ gcc/cselib.c2012-11-26 20:01:10.488304558 +0100
@@ -2626,11 +2626,12 @@ cselib_process_insn (rtx insn)
   cselib_current_insn = insn;
 
   /* Forget everything at a CODE_LABEL, a volatile insn, or a setjmp.  */
-  if (LABEL_P (insn)
-  || (CALL_P (insn)
-  find_reg_note (insn, REG_SETJMP, NULL))
-  || (NONJUMP_INSN_P (insn)
-  volatile_insn_p (PATTERN (insn
+  if ((LABEL_P (insn)
+   || (CALL_P (insn)
+   find_reg_note (insn, REG_SETJMP, NULL))
+   || (NONJUMP_INSN_P (insn)
+   volatile_insn_p (PATTERN (insn
+   !cselib_preserve_constants)
 {
   cselib_reset_table (next_uid);
   cselib_current_insn = NULL_RTX;
@@ -2668,9 +2669,18 @@ cselib_process_insn (rtx insn)
   /* Look for any CLOBBERs in CALL_INSN_FUNCTION_USAGE, but only
  after we have processed the insn.  */
   if (CALL_P (insn))
-for (x = CALL_INSN_FUNCTION_USAGE (insn); x; x = XEXP (x, 1))
-  if (GET_CODE (XEXP (x, 0)) == CLOBBER)
-   cselib_invalidate_rtx (XEXP (XEXP (x, 0), 0));
+{
+  for (x = CALL_INSN_FUNCTION_USAGE (insn); x; x = XEXP (x, 1))
+   if (GET_CODE (XEXP (x, 0)) == CLOBBER)
+ cselib_invalidate_rtx (XEXP (XEXP (x, 0), 0));
+  /* Flush evertything on setjmp.  */
+  if (cselib_preserve_constants
+  find_reg_note (insn, REG_SETJMP, NULL))
+   {
+ cselib_preserve_only_values ();
+ cselib_reset_table (next_uid);
+   }
+}
 
   /* On setter of the hard frame pointer if frame_pointer_needed,
  invalidate stack_pointer_rtx, so that sp and {,h}fp based
--- gcc/testsuite/gcc.dg/guality/pr36728-1.c.jj 2012-11-26 10:14:25.0 
+0100
+++ gcc/testsuite/gcc.dg/guality/pr36728-1.c2012-11-27 10:01:14.517080157 
+0100
@@ -1,7 +1,11 @@
 /* PR debug/36728 */
 /* { dg-do run } */
 /* { dg-options -g } */
-int a;
+
+#include ../nop.h
+
+int a, b;
+
 int __attribute__((noinline))
 foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7)
 {
@@ -9,9 +13,9 @@ foo (int arg1, int arg2, int arg3, int a
   int __attribute__ ((aligned(32))) y;
 
   y = 2;
-  asm ( : =m (y) : m (y));
+  asm (NOP : =m (y), =m (b) : m (y));
   x[0] = 25;
-  asm ( : =m (x[0]), =m (a) : m (x[0]));
+  asm (NOP : =m (x[0]), =m (a) : m (x[0]), m (b));
   return y;
 }
 
@@ -21,23 +25,23 @@ foo (int arg1, int arg2, int arg3, int a
and arg2.  So it is expected that these values are unavailable in
some of these tests.  */
 
-/* { dg-final { gdb-test 12 arg1 1 { target { ! s390*-*-* } } } }*/
-/* { dg-final { gdb-test 12 arg2 2 { target { ! s390*-*-* } } } }*/
-/* { dg-final { gdb-test 12 

Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-27 Thread Hans-Peter Nilsson
On Tue, 27 Nov 2012, Jakub Jelinek wrote:
 On Tue, Nov 27, 2012 at 06:44:23AM -0500, Hans-Peter Nilsson wrote:

JFTR: No I didn't, Eric wrote the below.  But, it made sense to me. :)

   We apparently have a small conflict between the meaning of volatile asms 
   with
   operands at the source level and volatile_insn_p.  However, I think that 
   the
   latter interpretation is the correct one: if your asm statements have 
   effects
   that can be described, then you should use output constraints instead of
   volatile; otherwise, you should use volatile and the output constraints 
   have
   essentially no meaning.

 I strongly disagree with this.
 [...]

As long as volatile asms and UNSPEC_VOLATILE insns (aka.
barriers) are handled the same way and consistently throughout
gcc, I'm fine.  It seems your patch does that, so thanks!

 But the question is also what effects your patch can have e.g. on RTL DSE.

Looks like the patch caused a bootstrap for s390x.

Eagerly awaiting a PR for that, but whoever is interested
on that, please try Jakub's patch first...

 2012-11-26  Jakub Jelinek  ja...@redhat.com

   PR debug/36728
   PR debug/55467
   * cselib.c (cselib_process_insn): If cselib_preserve_constants,
   don't reset table and exit early on volatile insns and setjmp.
   Reset table afterwards on setjmp.

   * gcc.dg/guality/pr36728-1.c: Include ../nop.h, make sure the asm
   are non-empty and add dependency between the first and second asm.
   * gcc.dg/guality/pr36728-2.c: Likewise.
   * gcc.dg/guality/pr36728-3.c: New test.
   * gcc.dg/guality/pr36728-4.c: New test.

brgds, H-P


Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-27 Thread Jakub Jelinek
On Tue, Nov 27, 2012 at 01:03:47PM +0100, Jakub Jelinek wrote:
 So, at least from var-tracking POV which doesn't attempt to perform any
 optimizations across any insn, just tries to track where values live, IMHO a
 volatile asm acts exactly the same as non-volatile, that is why I'm testing
 following patch right now.
 
 2012-11-26  Jakub Jelinek  ja...@redhat.com
 
   PR debug/36728
   PR debug/55467
   * cselib.c (cselib_process_insn): If cselib_preserve_constants,
   don't reset table and exit early on volatile insns and setjmp.
   Reset table afterwards on setjmp.
 
   * gcc.dg/guality/pr36728-1.c: Include ../nop.h, make sure the asm
   are non-empty and add dependency between the first and second asm.
   * gcc.dg/guality/pr36728-2.c: Likewise.
   * gcc.dg/guality/pr36728-3.c: New test.
   * gcc.dg/guality/pr36728-4.c: New test.

Bootstrapped/regtested fine on x86_64-linux and i686-linux (and fixes the
pr36728-[34].c failures that appear without the patch, which are of the
wrong debug kind).

Jakub


Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-27 Thread Eric Botcazou
 I strongly disagree with this.  Outputs and clobbers have significant
 meaning even on volatile asms, asm volatile doesn't mean all registers and
 all memory are supposed to be considered clobbered.  For memory you have
 memory clobber for that, for registers it is users responsibility to
 describe exactly what changes, either in clobbers or in outputs.
 The difference between non-volatile and volatile asm is, as the
 documentation states:
 
 The `volatile' keyword indicates that the instruction has important
 side-effects.  GCC will not delete a volatile `asm' if it is reachable.
 
 Volatile asm acts as an optimization barrier to some extent, but it really
 can't modify registers or memory that aren't described as modified in the
 asm pattern.  The important side-effects are of some other kind than
 modifying registers or memory visible from the current function.
 Ditto for UNSPEC_VOLATILE.

Well, the last sentence would essentially void the entire argument I think.  
It's well established in the RTL middle-end that UNSPEC_VOLATILE can do pretty 
much everything behind the back of the compiler.

Now I agree that the discussion exists for volatile asms.  But you have for 
example in the unmodified cse.c:

  /* A volatile ASM invalidates everything.  */
  if (NONJUMP_INSN_P (insn)
   GET_CODE (PATTERN (insn)) == ASM_OPERANDS
   MEM_VOLATILE_P (PATTERN (insn)))
flush_hash_table ();

and the comment of volatile_insn_p is rather explicit:

/* Nonzero if X contains any volatile instructions.  These are instructions
   which may cause unpredictable machine state instructions, and thus no
   instructions should be moved or combined across them.  This includes
   only volatile asms and UNSPEC_VOLATILE instructions.  */

The problem is that the various RTL passes don't have a consistent view on 
that so the patch attempts to tidy this up in a conservative manner.

I think that a compromise could be to say that volatile asms without outputs 
are full barriers (like UNSPEC_VOLATILE) but other volatile asms aren't.
That's what the unmodified cse.c, cselib.c and dse.c currently implement.
But implementing it consistently would mean weakening volatile_insn_p.

 So, at least from var-tracking POV which doesn't attempt to perform any
 optimizations across any insn, just tries to track where values live, IMHO a
 volatile asm acts exactly the same as non-volatile, that is why I'm testing
 following patch right now.
 
 But the question is also what effects your patch can have e.g. on RTL DSE.

It will make all volatile asms be treated as volatile asms without outputs.

-- 
Eric Botcazou


Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-27 Thread Jakub Jelinek
On Tue, Nov 27, 2012 at 06:35:52PM +0100, Eric Botcazou wrote:
 Now I agree that the discussion exists for volatile asms.  But you have for 
 example in the unmodified cse.c:
 
   /* A volatile ASM invalidates everything.  */
   if (NONJUMP_INSN_P (insn)
GET_CODE (PATTERN (insn)) == ASM_OPERANDS
MEM_VOLATILE_P (PATTERN (insn)))
 flush_hash_table ();

That was weird indeed, because
  asm volatile ( : : r (x));
flushed (for targets that don't add any implicit clobbers) but e.g.
  asm volatile ( : : r (x) : r23);
wouldn't (then the pattern is a parallel with ASM_OPERANDS and some
CLOBBERs).  The effect of that is that it never triggered on i?86/x86_64,
because those have the implicit fprs/flags clobbers.

I was mainly arguing with the sentence that for asm volatile, the output
constraints (did you mean outputs and clobbers?) have essentially no
meaning.  While for some optimizations perhaps it might be desirable
to treat asm volatile as full optimization barrier, I'm not sure about all
of them (i.e. it would be important to look for performance degradations
caused by that change), and for var-tracking I'd argue that asm vs. asm
volatile is completely unimportant, if the asm volatile pattern (or even
UNSPEC_VOLATILE) doesn't say it clobbers or sets hard register XYZ, then it
can't change it (well, it could but is responsible of restoring it),
similarly for memory, if it doesn't clobber memory and doesn't set some
memory, it can't do that.

So, do you object even to the var-tracking change (which would mean if
var-tracking found that say some variable's value lives in hard register 12, 
when
encountering asm volatile it would mean to forget about that even when
hard register 12 isn't clobbered by the asm, nor set)?  For now var-tracking
seems to be the most important issue, as we generate invalid debug info
there (latent before, but never hit on inline asm on x86_64/i686 except
on setjmp calls).

As for other passes, the r193802 change e.g. regresses slightly modified
pr44194-1.c testcase on x86_64,
struct ints { int a, b, c; } foo();
void bar(int a, int b);

void func() {
  struct ints s = foo();
  int x;
  asm volatile ( : =r (x));
  bar(s.a, s.b);
}

 func:
 .LFB0:
-   xorl%eax, %eax
subq$24, %rsp
 .LCFI0:
+   xorl%eax, %eax
callfoo
-   movq%rax, %rcx
-   shrq$32, %rcx
-   movq%rcx, %rsi
-   movl%eax, %edi
+   movq%rax, (%rsp)
+   movl%edx, 8(%rsp)
+   movl4(%rsp), %esi
+   movl(%rsp), %edi
addq$24, %rsp
 .LCFI1:
jmp bar

To me it looks like an unnecessary pessimization, the volatile there
I understand that just it doesn't want to be moved around (e.g. scheduled
before the foo call or after bar), that it can't be DCEd, but as it doesn't
mention it modifies memory, I don't understand why it should force some
registers to stack and back when it has no way to know the compiler would be
emitting anything like that at all.
Compare that to
  asm volatile ( : =r (x) : : memory);
in the testcase, where the r193802 commit makes no difference, the
stores/loads are done in both cases.

For CSE, I'd agree it should treat asm volatile as barriers, so for cselib.c
we probably need some additional cselib_init flag how we want to treat
volatile asms...  For UNSPEC_VOLATILE, perhaps even DSE should stay
conservative, but for var-tracking.c I still don't see a reason for that.

Jakub


Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-27 Thread Uros Bizjak
Hello!

 On Tue, 27 Nov 2012, Jakub Jelinek wrote:
  On Tue, Nov 27, 2012 at 06:44:23AM -0500, Hans-Peter Nilsson wrote:

 JFTR: No I didn't, Eric wrote the below.  But, it made sense to me. :)

We apparently have a small conflict between the meaning of volatile 
asms with
operands at the source level and volatile_insn_p.  However, I think 
that the
latter interpretation is the correct one: if your asm statements have 
effects
that can be described, then you should use output constraints instead of
volatile; otherwise, you should use volatile and the output constraints 
have
essentially no meaning.
 
  I strongly disagree with this.
  [...]

 As long as volatile asms and UNSPEC_VOLATILE insns (aka.
 barriers) are handled the same way and consistently throughout
 gcc, I'm fine.  It seems your patch does that, so thanks!

  But the question is also what effects your patch can have e.g. on RTL DSE.

 Looks like the patch caused a bootstrap for s390x.

 Eagerly awaiting a PR for that, but whoever is interested
 on that, please try Jakub's patch first...

  2012-11-26  Jakub Jelinek  ja...@redhat.com
 
  PR debug/36728
  PR debug/55467
  * cselib.c (cselib_process_insn): If cselib_preserve_constants,
  don't reset table and exit early on volatile insns and setjmp.
  Reset table afterwards on setjmp.
 
  * gcc.dg/guality/pr36728-1.c: Include ../nop.h, make sure the asm
  are non-empty and add dependency between the first and second asm.
  * gcc.dg/guality/pr36728-2.c: Likewise.
  * gcc.dg/guality/pr36728-3.c: New test.
  * gcc.dg/guality/pr36728-4.c: New test.

I have hit the same ICE on alpha-linux-gnu bootstrap. Jakub's patch
fixes the ICE, and allows bootstrap to pass well into stage2 now.
However, it takes ~10 hours for full bootstrap+regtest to finish, will
report back tomorrow morning (CET).

Uros.


Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-19 Thread Eric Botcazou
 Unfortunately, it causes regressions; read on for a very brief
 analysis.
 
 For x86_64-linux (base multilib):
 
 Running
 /home/hp/gcctop/tmp/pr55030-2n/gcc/gcc/testsuite/gcc.dg/guality/guality.exp
 ... ...
 FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 12 arg7 == 30
 [...]

 I looked into gcc.dg/guality/pr36728-1.c at base -O1.
 
 The generated assembly code modulo debug info, is the same.  The
 value of arg7 is optimized out, says gdb in gcc.log.  The
 problem doesn't seem to be any md-generated
 frame-related-barrier as was my first guess, but the volatile
 asms in the source(!).  The first one looks like this (and the
 second one similar) in pr36728-1.c.168r.cse1 (r193583):
 
 (insn 26 25 27 2 (parallel [
 (set (mem/c:SI (plus:DI (reg/f:DI 20 frame)
 (const_int -32 [0xffe0])) [0 y+0 S4
 A256]) (asm_operands/v:SI () (=m) 0 [
 (mem/c:SI (plus:DI (reg/f:DI 20 frame)
 (const_int -32 [0xffe0])) [0 y+0
 S4 A256]) ]
  [
 (asm_input:SI (m) (null):0)
 ]
  []
 /home/hp/gcctop/tmp/nbasf/gcc/gcc/testsuite/gcc.dg/guality/pr36728-1.c:12))
 (clobber (reg:QI 18 fpsr))
 (clobber (reg:QI 17 flags))
 ])
 /home/hp/gcctop/tmp/nbasf/gcc/gcc/testsuite/gcc.dg/guality/pr36728-1.c:12
 -1 (nil))
 
 It's not caught by the previous test:
 -   GET_CODE (PATTERN (insn)) == ASM_OPERANDS
 -   MEM_VOLATILE_P (PATTERN (insn)))
 
 ...but since it is a volatile asm (as seen in the source and by
 the /v on the asm_operands) volatile_insn_p does catch it (as
 expected and intended) and down the road for some reason, gcc
 can't recover the arg7 contents.  Somewhat expected, but this
 volatile asm is also more volatile than intended; a clobber list
 for example as seen above inserted by the md, is now redundant.

Thanks for the analysis.  I don't think that the redundancy of the clobber 
list matters here: the clobbers are added to all asm statements, volatile or 
not, so they aren't intended to make the volatile statements more precise in 
the hope of optimizing around them.

 I'm not sure what to do with this.  Try changing volatile_insn_p
 adding a parameter to optionally allow volatile asms with
 outputs to pass?  But then, when *should* that distinction be
 done, to let such volatile asms be allowed to pass as
 not-really-as-volatile-as-we-look-for?  I'd think never for
 any of the the patched code, or maybe only when looking at
 effects on memory.

Yes, weakening volatile_insn_p seems also dangerous to me, and I'd rather lean 
towards the opposite, conservative side.

We apparently have a small conflict between the meaning of volatile asms with 
operands at the source level and volatile_insn_p.  However, I think that the 
latter interpretation is the correct one: if your asm statements have effects 
that can be described, then you should use output constraints instead of 
volatile; otherwise, you should use volatile and the output constraints have 
essentially no meaning.

The gcc.dg/guality/pr36728-[12].c testcases use volatile asms in an artificial 
way to coax the compiler into following a certain behaviour, so I don't think 
that they should be taken into account to judge the correctness of the change.
Therefore, I'd go ahead and apply the full patch below, possibly adjusting 
both testcases to cope with it.  Tentative (and untested) patch attached.

I've also CCed Jakub though, as he might have a different opinion.

 gcc:
   PR middle-end/55030
   * builtins.c (expand_builtin_setjmp_receiver): Update comment
   regarding purpose of blockage.
   * emit-rtl.c [!HAVE_blockage] (gen_blockage): Similarly for
   the head comment.
   * rtlanal.c (volatile_insn_p): Ditto.
   * doc/md.texi (blockage): Update similarly.  Change wording to
   require one of two forms, rather than implying a wider choice.
   * cse.c (cse_insn): Where checking for blocking insns, use
   volatile_insn_p instead of manual check for volatile ASM.
   * dse.c (scan_insn): Ditto.
   * cselib.c (cselib_process_insn): Ditto.

-- 
Eric BotcazouIndex: gcc.dg/guality/pr36728-1.c
===
--- gcc.dg/guality/pr36728-1.c	(revision 193596)
+++ gcc.dg/guality/pr36728-1.c	(working copy)
@@ -9,10 +9,10 @@ foo (int arg1, int arg2, int arg3, int a
   int __attribute__ ((aligned(32))) y;
 
   y = 2;
-  asm volatile ( : =m (y) : m (y));
+  asm ( : =m (y) : m (y));
   x[0] = 25;
-  asm volatile ( : =m (x[0]) : m (x[0]));
-  return y;
+  asm ( : =m (x[0]) : m (x[0]));
+  return y + x[0];
 }
 
 /* On s390(x) r2 and r3 are (depending on the optimization level) used
@@ -39,11 +39,13 @@ foo (int arg1, int arg2, int arg3, int a
 /* { dg-final { gdb-test 14 *x (char) 25 } } */
 /* { dg-final { gdb-test 14 y 2 } } */
 
+int a;
+
 int
 main ()
 {
   int l = 0;
-  

Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-12 Thread Eric Botcazou
 This is a target-specific blockage insn, but with the general form
 found in all targets defining it.  The default blockage is an empty
 asm-volatile, which is what cse_insn recognized.  The blockage insn is
 there to prevent scheduling of the critical insns and register
 values.  It's almost obvious that an unspec_volatile should have that
 effect too; at least that's intended by the code in
 expand_builtin_setjmp_receiver.  Luckily (or unluckily regarding the
 presence of the bug) *this* cse code is not run post-frame-layout
 (post-reload-cse does not use this code), or it seems people would
 soon notice register values used from the wrong side of the blockage,
 considering its critical use at the restore of the stack-pointer.
 As mentioned in the previous patch,
 http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01172.html, clobbering
 the frame-pointer (and then using it) does not seem the logical
 alternative to the patch below; the blockage insn should just do its job.

Agreed.

 I updated comments and documentation so it's more apparent that
 register uses should also not be moved across the blockage; see the
 patched comments.
 
 Tested native x86_64-unknown-linux-gnu w./wo. -m32 and before/after
 192677.  Ok to commit?
 
 gcc:
   PR middle-end/55030
   * builtins.c (expand_builtin_setjmp_receiver): Update comment
   regarding purpose of blockage.
   * emit-rtl.c [!HAVE_blockage] (gen_blockage): Similarly for
   the head comment.
   * doc/md.texi (blockage): Update similarly.  Change wording to
   require one of two forms, rather than implying a wider choice.
   * cse.c (cse_insn): Where checking for blocking insns, treat
   UNSPEC_VOLATILE as blocking, besides volatile ASM.

That's fine on principle, but there is a predicate for this (volatile_insn_p) 
so I think we should use it here.  Moreover, cselib_process_insn has the same 
check so we should adjust it as well, which in turn means that dse.c:scan_insn 
should probably be adjusted too.  OK with these changes, thanks.

-- 
Eric Botcazou


[RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-11 Thread Hans-Peter Nilsson
The problem exposed in PR55030 (repeatable on x86_64-linux with -m32 at
r192676) is that the fake-frame-pointer frame is replaced with the
actual-frame-pointer bp in cse1, around the critical insn in
__builtin_setjmp_receiver that restores their defined offset.  The
patch in PR55030/r192676 removed a clobber of the frame-pointer, and
cse1 felt free to replace the former register with the latter, as in
the following expansion of __builtin_setjmp_receiver (in builtins.c:
expand_builtin_setjmp_receiver) from a reduced test-case, see the PR.
(You might find the setup/restore code having an unexpected order;
there are two __builtin_setjmps in series and the setup of the second
one is storing the wrong value for the frame-pointer.)

The pre-transformation dump in cse1 says:

(code_label/s 13 51 14 3 2  [2 uses])
(note 14 13 15 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 15 14 78 3 (use (reg/f:SI 6 bp)) 
/home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:31
 -1
 (nil))
(insn 78 15 17 3 (set (reg/f:SI 20 frame)
(reg/f:SI 6 bp)) 
/home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:31
 -1
 (nil))
(insn 17 78 56 3 (unspec_volatile [
(const_int 0 [0])
] UNSPECV_BLOCKAGE) 
/home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:31
 643 {blockage}
 (nil))
(insn 56 17 57 3 (set (reg/f:SI 74)
(symbol_ref:SI (chk_fail_buf) [flags 0x40]  var_decl 0x77512688 
chk_fail_buf)) 
/home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:37
 65 {*movsi_internal}
 (nil))
(insn 57 56 58 3 (set (mem:SI (reg/f:SI 74) [5 S4 A8])
(reg/f:SI 20 frame)) 
/home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:37
 65 {*movsi_internal}
 (nil))

Before r192676 and after the reversion in r192701, there was/is that
weird extra (clobber (reg/f:SI 6 bp) inserted before insn 17.  But
without that, we notice post-cse1-transformation, a change in insn 57:

(code_label/s 13 51 14 3 2  [3 uses])
(note 14 13 15 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 15 14 78 3 (use (reg/f:SI 6 bp)) 
/home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:31
 -1
 (nil))
(insn 78 15 17 3 (set (reg/f:SI 20 frame)
(reg/f:SI 6 bp)) 
/home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:31
 -1
 (nil))
(insn 17 78 56 3 (unspec_volatile [
(const_int 0 [0])
] UNSPECV_BLOCKAGE) 
/home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:31
 643 {blockage}
 (nil))
(insn 56 17 57 3 (set (reg/f:SI 74)
(symbol_ref:SI (chk_fail_buf) [flags 0x40]  var_decl 0x77512688 
chk_fail_buf)) 
/home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:37
 65 {*movsi_internal}
 (nil))
(insn 57 56 58 3 (set (mem:SI (reg/f:SI 74) [5 S4 A8])
(reg/f:SI 6 bp)) 
/home/hp/gcctop/tmp/base/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/pr55030-chk.c:37
 65 {*movsi_internal}
 (nil))

It seemed wrong for this change to be stopped *only* by that (clobber bp).
Stepping through the code for insn 78 and 57, I was looking for
related special-casing of frame- and other similar registers in cse.c
and was a bit confused when I found none (not counting hash_rtx_cb).
(I admit lack of special-casing is a good sign, though. :)  Then I was
blinded by the obvious; the next insn, insn 17, is a blockage insn.

This is a target-specific blockage insn, but with the general form
found in all targets defining it.  The default blockage is an empty
asm-volatile, which is what cse_insn recognized.  The blockage insn is
there to prevent scheduling of the critical insns and register
values.  It's almost obvious that an unspec_volatile should have that
effect too; at least that's intended by the code in
expand_builtin_setjmp_receiver.  Luckily (or unluckily regarding the
presence of the bug) *this* cse code is not run post-frame-layout
(post-reload-cse does not use this code), or it seems people would
soon notice register values used from the wrong side of the blockage,
considering its critical use at the restore of the stack-pointer.
As mentioned in the previous patch,
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01172.html, clobbering
the frame-pointer (and then using it) does not seem the logical
alternative to the patch below; the blockage insn should just do its job.

I updated comments and documentation so it's more apparent that
register uses should also not be moved across the blockage; see the
patched comments.

Tested native x86_64-unknown-linux-gnu w./wo. -m32 and before/after
192677.  Ok to commit?

gcc:
PR middle-end/55030
* builtins.c (expand_builtin_setjmp_receiver): Update comment
regarding purpose of blockage.
* emit-rtl.c [!HAVE_blockage] (gen_blockage): Similarly for