Re: [RFC] Do not consider volatile asms as optimization barriers #1

2014-03-03 Thread Richard Sandiford
Eric Botcazou ebotca...@adacore.com writes:
 Thanks for doing this.  FWIW I agree it's probably the best stop-gap fix.
 But the implication seems to be that unspec_volatile and volatile asms
 are volatile in different ways.  IMO they're volatile in the same way
 and the problems for volatile asms apply to unspec_volatile too.

 I disagree, we need a simple way for the RTL middle-end as well as the back-
 ends to block most optimizations across a specific point (e.g. a non-local 
 label as in HP's fix) and UNSPEC_VOLATILE is the best candidate, at least in 
 the short term.

I don't agree it's the best candidate if...

 E.g. although cse.c will flush the table for unspec_volatile,
 it isn't the case that unspec_volatile forces a containing function
 to save all call-saved registers.  That would be excessive for a plain
 blockage instruction.  So again we seem to be assuming one thing in places
 like cse.c and another in the register allocator.  Code that uses the DF
 framework will also assume that registers are not implicitly clobbered
 by an unspec_volatile:
 [...]
 Also, ira-lives.c (which tracks the liveness of both pseudo and hard
 registers) doesn't mention volatile at all.

 Yes, the definition of a blockage instruction is somewhat vague and I agree 
 that it shouldn't cause registers to be spilled.  But it needs to block most, 
 if not all, optimizations.

...it's so loosely defined.  If non-local labels are the specific problem,
I think it'd be better to limit the flush to that.

I'm back to throwing examples around, sorry, but take the MIPS testcase:

volatile int x = 1;

void foo (void)
{
  x = 1;
  __builtin_mips_set_fcsr (0);
  x = 2;
}

where __builtin_mips_set_fcsr is a handy way of getting unspec_volatile.
(I'm not interested in what the function does here.)  Even at -O2,
the cse.c code successfully prevents %hi(x) from being shared,
as you'd expect:

        li      $3,1                    # 0x1
        lui     $2,%hi(x)
        sw      $3,%lo(x)($2)
        move    $2,$0
        ctc1    $2,$31
        li      $3,2                    # 0x2
        lui     $2,%hi(x)
        sw      $3,%lo(x)($2)
        j       $31
        nop

But put it in a loop:

void frob (void)
{
  for (;;)
    {
      x = 1;
      __builtin_mips_set_fcsr (0);
      x = 2;
    }
}

and we get the rather bizarre code:

        lui     $2,%hi(x)
        li      $6,1                    # 0x1
        move    $5,$0
        move    $4,$2
        li      $3,2                    # 0x2
        .align  3
.L3:
        sw      $6,%lo(x)($2)
        ctc1    $5,$31
        sw      $3,%lo(x)($4)
        j       .L3
        lui     $2,%hi(x)

Here the _second_ %hi(x), the 1 and the 2 have been hoisted but the first
%hi(x) is reloaded each time.  So what's the correct behaviour here?
Should the hoisting of the second %hi(x) have been disabled because the
loop contains an unspec_volatile?  What about the 1 (from the first store)
and the 2?

If instead it was:

   for (i = 0; i < 100; i++)

would converting to a hardware do-loop be acceptable?

 So most passes assume that no pseudos or hard registers will be
 implicitly clobbered by unspec_volatile, just like for a volatile asm.
 And IMO that's right.  I think the rule should be the same for volatile
 asms and unspec_volatiles, and the same for registers as it already is for
 memory: if the instruction clobbers something, it should say so explicitly.

 IMO that would buy us nothing and, on the contrary, would add complexity where
 there currently isn't.  We really need a simple blockage instruction.

 Volatile itself should:
 
 (a) prevent deletion or duplication of the operation
 (b) prevent reordering wrt other volatiles
 (c) prevent the operation from being considered equivalent to any other
 operation (even if it's structurally identical and has the same inputs)
 
 but nothing beyond that.

 Maybe UNSPEC_VOLATILE is a misnomer then and we should allow volatile UNSPECs 
 along the above lines.

That'd be fine with me, especially since with the patch it sounds like
using volatile asm would produce better code than a built-in function
that expands to an unspec_volatile, whereas IMO the opposite should
always be true.

But I still think we need a similar list for what unspec_volatile
means now, if we decide to keep something with the current meaning.
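
The rule argued for above (an instruction that clobbers something should say so explicitly) is how volatile asms already look at the source level.  A minimal sketch, assuming GCC-style extended asm; compiler_barrier, shared_value and store_twice are hypothetical names used only for illustration and do not come from the patch under discussion:

```c
/* Hypothetical helper: an empty volatile asm whose only declared
   effect is the explicit "memory" clobber.  No registers are listed,
   so the compiler may assume none are clobbered.  */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}

static int shared_value;  /* hypothetical global, illustration only */

int store_twice(void)
{
    shared_value = 1;
    /* The "memory" clobber says the asm may read or write memory, so
       the store above must be emitted before it and the store below
       after it.  */
    compiler_barrier();
    shared_value = 2;
    return shared_value;
}
```

If the asm also overwrote a hard register, that register would have to be named in the clobber list as well; nothing is implied by "volatile" alone.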

Thanks,
Richard


Re: Is LLVM really pulling ahead of gcc?

2014-03-03 Thread Jakub Jelinek
On Mon, Mar 03, 2014 at 10:20:10AM +0800, lin zuojian wrote:
 I saw
 
 http://www.phoronix.com/scan.php?page=article&item=llvm34_gcc49_compilers&num=1
 and LLVM/clang beat gcc in several tests.  But I ran those tests,
 e.g. SciMark, and the results didn't look like that on my i7 machine.
 Was that article written by Apple?

See my comment there and various others.  The Phoronix benchmarks are usually
microbenchmarks where the results vary a lot from one run to another;
I have very rarely been able to reproduce anything close to the
numbers Michael posts, and even when, e.g., this January I improved
predcom a lot in GCC on one of the microbenchmarks, no changes were
visible in Michael's numbers afterwards.  I find his articles very
LLVM/clang biased and the numbers untrustworthy.  Of course it makes sense
for us to look even at those microbenchmarks from time to time and try to
improve things if possible; often the problem is primarily badly chosen
compiler options on the Phoronix side (say, benchmarks needing -ffast-math
for vectorization and it not being supplied, so the code is never vectorized
by any compiler).  SPEC is of course a far more trustworthy benchmark.

Jakub


Re: [PATCH v4] PR middle-end/60281

2014-03-03 Thread lin zuojian
Hi Jakub,
Any comments on this patch?
--
Regards
lin zuojian

On Tue, Feb 25, 2014 at 03:28:14PM +0800, lin zuojian wrote:
 Sorry, I forgot to set the alignment of another shadow_mem, so many strb
 instructions showed up.
 Here is patch v4 (the last one contained HTML, damn Thunderbird).
 
 --
 Without aligning the asan stack base, the base will only be 64-bit aligned
 on ARM machines.
 But asan requires a 256-bit aligned base, because:
 1. the right shift by ASAN_SHADOW_SHIFT (which is 3) requires the low 3 bits
 to be zero
 2. store multiple/load multiple instructions require the next 2 bits to be
 zero
 
 Together, the lowest 5 bits must be zero.  That means the base must be
 32-byte (256-bit) aligned.
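
The arithmetic in the description above can be checked with a small sketch.  This is not the patch code; the macro values are assumptions taken from the description (ASAN_SHADOW_SHIFT is 3, SImode alignment is 32 bits) and the function names are hypothetical:

```c
#include <stdint.h>

#define SHADOW_SHIFT      3   /* mirrors ASAN_SHADOW_SHIFT in the description */
#define SIMODE_ALIGN_BITS 32  /* assumed value of GET_MODE_ALIGNMENT (SImode) */
#define BITS_PER_BYTE     8   /* assumed value of BITS_PER_UNIT */

/* Required alignment of the asan stack base, in bytes:
   (32 << 3) / 8 = 32 bytes, i.e. 256 bits.  */
static uintptr_t asan_base_align_bytes(void)
{
    return ((uintptr_t)SIMODE_ALIGN_BITS << SHADOW_SHIFT) / BITS_PER_BYTE;
}

/* Round BASE down to that alignment by clearing the low 5 bits;
   this is the same as AND-ing with -32.  */
static uintptr_t align_base_down(uintptr_t base)
{
    return base & ~(asan_base_align_bytes() - 1);
}
```

After the rounding, base >> 3 is a multiple of 4, which is what a 32-bit aligned shadow store needs.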
 
 * asan.c (asan_emit_stack_protection): Forcing the base to align to 256
 bits if STRICT_ALIGNMENT.
 And set shadow_mem align to 256 bits if STRICT_ALIGNMENT
 * cfgexpand.c (expand_stack_vars): set base_align appropriately when
 asan is on
 (expand_used_vars): Leaving a space in the stack frame for alignment if
 STRICT_ALIGNMENT
 ---
 gcc/asan.c | 14 ++
 gcc/cfgexpand.c | 13 -
 2 files changed, 26 insertions(+), 1 deletion(-)
 
 diff --git a/gcc/asan.c b/gcc/asan.c
 index 53992a8..64898cd 100644
 --- a/gcc/asan.c
 +++ b/gcc/asan.c
 @@ -1017,8 +1017,16 @@ asan_emit_stack_protection (rtx base, rtx pbase,
 unsigned int alignb,
 base_align_bias = ((asan_frame_size + alignb - 1)
  & ~(alignb - HOST_WIDE_INT_1)) - asan_frame_size;
 }
 + /* Align base if target is STRICT_ALIGNMENT. */
 + if (STRICT_ALIGNMENT)
 + base = expand_binop (Pmode, and_optab, base,
 + gen_int_mode (-((GET_MODE_ALIGNMENT (SImode) << ASAN_SHADOW_SHIFT) /
 BITS_PER_UNIT), Pmode),
 + NULL_RTX, 1, OPTAB_DIRECT);
 +
 if (use_after_return_class == -1 && pbase)
 emit_move_insn (pbase, base);
 +
 +
 base = expand_binop (Pmode, add_optab, base,
 gen_int_mode (base_offset - base_align_bias, Pmode),
 NULL_RTX, 1, OPTAB_DIRECT);
 @@ -1097,6 +1105,8 @@ asan_emit_stack_protection (rtx base, rtx pbase,
 unsigned int alignb,
  && (ASAN_RED_ZONE_SIZE >> ASAN_SHADOW_SHIFT) == 4);
 shadow_mem = gen_rtx_MEM (SImode, shadow_base);
 set_mem_alias_set (shadow_mem, asan_shadow_set);
 + if (STRICT_ALIGNMENT)
 + set_mem_align(shadow_mem, (GET_MODE_ALIGNMENT (SImode)));
 prev_offset = base_offset;
 for (l = length; l; l -= 2)
 {
 @@ -1186,6 +1196,10 @@ asan_emit_stack_protection (rtx base, rtx pbase,
 unsigned int alignb,
 
 shadow_mem = gen_rtx_MEM (BLKmode, shadow_base);
 set_mem_alias_set (shadow_mem, asan_shadow_set);
 +
 + if (STRICT_ALIGNMENT)
 + set_mem_align(shadow_mem, (GET_MODE_ALIGNMENT (SImode)));
 +
 prev_offset = base_offset;
 last_offset = base_offset;
 last_size = 0;
 diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
 index 06d494c..14fd1c2 100644
 --- a/gcc/cfgexpand.c
 +++ b/gcc/cfgexpand.c
 @@ -1013,10 +1013,18 @@ expand_stack_vars (bool (*pred) (size_t), struct
 stack_vars_data *data)
 if (data->asan_base == NULL)
 data->asan_base = gen_reg_rtx (Pmode);
 base = data->asan_base;
 +
 + if (!STRICT_ALIGNMENT)
 + base_align = crtl->max_used_stack_slot_alignment;
 + else
 + base_align = MAX(crtl->max_used_stack_slot_alignment,
 + (GET_MODE_ALIGNMENT (SImode) << ASAN_SHADOW_SHIFT));
 }
 else
 + {
 offset = alloc_stack_frame_space (stack_vars[i].size, alignb);
 - base_align = crtl->max_used_stack_slot_alignment;
 + base_align = crtl->max_used_stack_slot_alignment;
 + }
 }
 else
 {
 @@ -1843,6 +1851,9 @@ expand_used_vars (void)
 = alloc_stack_frame_space (redzonesz, ASAN_RED_ZONE_SIZE);
 data.asan_vec.safe_push (prev_offset);
 data.asan_vec.safe_push (offset);
 + /* Leave a space for alignment if STRICT_ALIGNMENT. */
 + if (STRICT_ALIGNMENT)
 + alloc_stack_frame_space ((GET_MODE_ALIGNMENT (SImode) <<
 ASAN_SHADOW_SHIFT) / BITS_PER_UNIT , 1);
 
 var_end_seq
 = asan_emit_stack_protection (virtual_stack_vars_rtx,
 -- 
 1.8.3.2
 


Re: [PATCH v4] PR middle-end/60281

2014-03-03 Thread Jakub Jelinek
On Fri, Feb 28, 2014 at 08:47:56AM +0100, Bernd Edlinger wrote:
 I see the problem too.
 
 But I think it is not necessary to change the stack alignment
 to solve the problem.
 
 It appears to me that the code in asan_emit_stack_protection
 is just wrong. It uses SImode when the memory is not aligned
 enough for that mode. This would not happen if that code
 is rewritten to use get_best_mode, and by the way, even on
 x86_64 the emitted code is not optimal, because that target
 could work with DImode more efficiently.

No, the use of SImode on x86_64 is very much intentional, movabsqs + movq
are generally slower.

 So, to fix that, it would be better to concentrate on that function,
 and use word_mode instead of SImode, and let get_best_mode
 choose the required mode.

No.  As I wrote earlier, the alternative is to use unaligned stores for ARM,
I've asked Lin to benchmark that compared to his patch, but haven't seen
that done yet.

Jakub


Re: [PATCH v4] PR middle-end/60281

2014-03-03 Thread Jakub Jelinek
On Sat, Mar 01, 2014 at 08:22:32PM +0100, Bernd Edlinger wrote:
 So, that is what I mean: this patch makes the stack grow by
 32 bytes, just because the emit_stack_protection uses SImode,
 with unaligned addresses which is not possible for ARM, and
 not optimal for X86_64.

Incorrect, for this case it is optimal on x86-64.

Jakub


Re: [PATCH v4] PR middle-end/60281

2014-03-03 Thread Jakub Jelinek
On Mon, Mar 03, 2014 at 04:51:20PM +0800, lin zuojian wrote:
 Hi Jakub,
 Any comments on this patch?

Can you please repost the patch (+ ChangeLog entry) as attachment?
Or use a saner MUA?  When all tabs are eaten and other whitespace crippled,
it is impossible to look at the formatting.

Jakub


Re: [PATCH v4] PR middle-end/60281

2014-03-03 Thread lin zuojian
Hi Jakub,

 No.  As I wrote earlier, the alternative is to use unaligned stores for ARM,
 I've asked Lin to benchmark that compared to his patch, but haven't seen
 that done yet.
 
   Jakub
I have not benchmarked it yet.  But according to what I heard from an ARM
engineer at Huawei, unaligned accesses are usually slow and it is not
recommended to use them too much.
-- 
Regards
lin zuojian


Re: [PATCH v4] PR middle-end/60281

2014-03-03 Thread Jakub Jelinek
On Mon, Mar 03, 2014 at 05:04:45PM +0800, lin zuojian wrote:
  No.  As I wrote earlier, the alternative is to use unaligned stores for ARM,
  I've asked Lin to benchmark that compared to his patch, but haven't seen
  that done yet.

 I have not benchmarked it yet.  But according to what I heard from an ARM
 engineer at Huawei, unaligned accesses are usually slow and it is not
 recommended to use them too much.

It is expected it will not be as fast as aligned store, the question is if
an unaligned 32-bit store is faster than 4 8-bit stores, and/or if the cost of
the unaligned stores is bad enough (note, usually it is just a few stores in
the prologue and epilogue) to offset for the penalties introduced by
realigning the stack (typically one extra register that has to be live, plus
the cost of the realignment itself).
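
To make the two alternatives being compared concrete, here is a minimal sketch in portable C.  It is not the asan code; store_bytes and store_word_unaligned are hypothetical names, and whether the memcpy form becomes a single unaligned store depends on the target and compiler options:

```c
#include <stdint.h>
#include <string.h>

/* Alternative 1: four separate 8-bit stores.  */
static void store_bytes(uint8_t *dst, const uint8_t src[4])
{
    dst[0] = src[0];
    dst[1] = src[1];
    dst[2] = src[2];
    dst[3] = src[3];
}

/* Alternative 2: one 32-bit store to a possibly unaligned address.
   Going through memcpy keeps this well defined in C regardless of
   the alignment of DST.  */
static void store_word_unaligned(uint8_t *dst, const uint8_t src[4])
{
    uint32_t word;
    memcpy(&word, src, sizeof word);
    memcpy(dst, &word, sizeof word);
}
```

Both functions leave the same bytes in memory; only the cost differs, which is what a benchmark would have to measure against the cost of realigning the stack.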

Jakub


Re: [PATCH v4] PR middle-end/60281

2014-03-03 Thread lin zuojian
Okay,I will use mutt as my MUA.
--
Regards
lin zuojian

On Mon, Mar 03, 2014 at 09:58:59AM +0100, Jakub Jelinek wrote:
 On Mon, Mar 03, 2014 at 04:51:20PM +0800, lin zuojian wrote:
  Hi Jakub,
  Any comments on this patch?
 
 Can you please repost the patch (+ ChangeLog entry) as attachment?
 Or use a saner MUA?  When all tabs are eaten and other whitespace crippled,
 it is impossible to look at the formatting.
 
   Jakub


Re: [PATCH v4] PR middle-end/60281

2014-03-03 Thread lin zuojian
Hi Jakub,

On Mon, Mar 03, 2014 at 10:08:53AM +0100, Jakub Jelinek wrote:
 On Mon, Mar 03, 2014 at 05:04:45PM +0800, lin zuojian wrote:
   No.  As I wrote earlier, the alternative is to use unaligned stores for 
   ARM,
   I've asked Lin to benchmark that compared to his patch, but haven't seen
   that done yet.
 
  I have not benchmarked it yet.  But according to what I heard from an ARM
  engineer at Huawei, unaligned accesses are usually slow and it is not
  recommended to use them too much.
 
 It is expected it will not be as fast as aligned store, the question is if
 an unaligned 32-bit store is faster than 4 8-bit stores, and/or if the cost of
 the unaligned stores is bad enough (note, usually it is just a few stores in
 the prologue and epilogue) to offset for the penalties introduced by
 realigning the stack (typically one extra register that has to be live, plus
 the cost of the realignment itself).
Should I run some microbenchmarks?  Like:
char * a = new char[100];
a++;
int * b = reinterpret_cast<int*>(a);
b[0] = xxx;
b[4] = xxx;
b[1] = xxx;
b[2] = xxx;
b[3] = xxx;
...
I don't know if this is accurate.
 
   Jakub


Re: C++ PATCH for c++/58170 (ICE with alias template)

2014-03-03 Thread Jakub Jelinek
On Sat, Feb 22, 2014 at 12:48:40AM -0500, Jason Merrill wrote:
 There's no reason why we wouldn't check for dependent scopes when
 parsing the target of an alias declaration, and indeed not doing so
 led to the ICE here.
 
 The rest of the patch improves the diagnostic for this testcase (and
 some others).
 
 Tested x86_64-pc-linux-gnu, applying to trunk.  Also applying the
 cp_parser_type_name hunk to 4.8.

This broke the obj-c++.dg/invalid-method-2.mm testcase, fixed thusly,
applied as obvious to the trunk:

2014-03-03  Jakub Jelinek  ja...@redhat.com

PR objc++/60398
* obj-c++.dg/invalid-method-2.mm: Adjust dg-error regexps.

--- gcc/testsuite/obj-c++.dg/invalid-method-2.mm.jj 2012-11-15 18:21:44.0 +0100
+++ gcc/testsuite/obj-c++.dg/invalid-method-2.mm 2014-03-03 10:26:50.333818063 +0100
@@ -7,11 +7,11 @@
 @end
 
 @implementation MyClass
-- (x) method /* { dg-error "expected" } */
+- (x) method /* { dg-error "expected|type" } */
 {
   return 0;
 }
-- (id) method2: (x)argument /* { dg-error "expected" } */
+- (id) method2: (x)argument /* { dg-error "expected|type" } */
 {
   return 0;
 }


Jakub


Re: Optimize n?rotate(x,n):x

2014-03-03 Thread Richard Biener
On Sat, Mar 1, 2014 at 3:33 PM, Marc Glisse marc.gli...@inria.fr wrote:
 Hello,

 again, a stage 1 patch that I will ping then, but early comments are
 welcome.

 PR 59100 was asking to transform n?rotate(x,n):x to rotate(x,n) (because it
 can be hard to write a strictly valid rotate in plain C). The operation is
 really:
 (x != neutral) ? x op y : y
 where neutral is such that (neutral op y) is always y, so that's what I
 implemented (and absorbing elements while I was at it).

 For some operations on some platforms, the transformation may not be such a
 good idea, in particular if division is very slow and b is 1 most of the
 time, then computing a/b may be slower than (b!=1)?a/b:a. The easiest might
 be to comment out those operations in the switch for now. I think divisions
 are the only ones slow enough to deserve this, though there certainly are
 CPUs where multiplication is not so fast, and even for rotate it may not
 always be a win if the processor doesn't have a rotate instruction and the
 shift amount is almost always 0.
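
For readers unfamiliar with the pattern, a minimal sketch of the guarded rotate the PR is about, next to a branch-free form.  The function names are hypothetical, and both assume 0 <= n < width of unsigned and a power-of-two width:

```c
#include <limits.h>

/* Guarded form: the n == 0 test avoids a right shift by the full
   width, which would be undefined.  0 is the neutral element of the
   rotate on the right-hand side.  */
static unsigned rot_guarded(unsigned x, unsigned n)
{
    const unsigned bits = CHAR_BIT * sizeof(unsigned);
    return (n == 0) ? x : ((x << n) | (x >> (bits - n)));
}

/* Branch-free form: masking the right-shift count keeps it in range,
   so n == 0 needs no special case.  */
static unsigned rot_branchless(unsigned x, unsigned n)
{
    const unsigned bits = CHAR_BIT * sizeof(unsigned);
    return (x << n) | (x >> (-n & (bits - 1)));
}
```

The transformation discussed here lets the compiler treat the first form like an unconditional rotate, since the guarded-off case returns exactly what the rotate would compute for n == 0.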

You only handle integer operations, so checking for INTEGER_TYPE_P
early on would make sense.

Note that some archs may dispatch to libgcc for integer operations
so it may make sense to check whether that is the case (you can
query optabs to check that) - if the comparison also dispatches to
libgcc then of course the transform would be a win again.  Even on
x86 TImode division uses libgcc.  Note also that value-profiling
may have created this special case in the first place!
(gimple_divmod_fixed_value_transform)

Otherwise I think this is a good transform.  Did you check if it triggers
during GCC bootstrap?

Thanks,
Richard.

 Passes bootstrap+testsuite on x86_64-linux-gnu.

 2014-03-01  Marc Glisse  marc.gli...@inria.fr

 PR tree-optimization/59100
 gcc/
 * tree-ssa-phiopt.c (neutral_element_p, absorbing_element_p): New
 functions.
 (value_replacement): Handle conditional binary operations with a
 neutral or absorbing element.
 gcc/testsuite/
 * gcc.dg/tree-ssa/phi-opt-12.c: New file.

 --
 Marc Glisse
 Index: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c
 ===
 --- gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c  (revision 0)
 +++ gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c  (working copy)
 @@ -0,0 +1,23 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O -fdump-tree-phiopt1" } */
 +
 +int f(int a, int b, int c) {
 +  if (c > 5) return c;
 +  if (a == 0) return b;
 +  return a + b;
 +}
 +
 +unsigned rot(unsigned x, int n) {
 +  const int bits = __CHAR_BIT__ * __SIZEOF_INT__;
 +  return (n == 0) ? x : ((x << n) | (x >> (bits - n)));
 +}
 +
 +unsigned m(unsigned a, unsigned b) {
 +  if (a == 0)
 +    return 0;
 +  else
 +    return a & b;
 +}
 +
 +/* { dg-final { scan-tree-dump-times "goto" 2 "phiopt1" } } */
 +/* { dg-final { cleanup-tree-dump "phiopt1" } } */

 Property changes on: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c
 ___
 Added: svn:eol-style
 ## -0,0 +1 ##
 +native
 \ No newline at end of property
 Added: svn:keywords
 ## -0,0 +1 ##
 +Author Date Id Revision URL
 \ No newline at end of property
 Index: gcc/tree-ssa-phiopt.c
 ===
 --- gcc/tree-ssa-phiopt.c   (revision 208241)
 +++ gcc/tree-ssa-phiopt.c   (working copy)
 @@ -140,20 +140,37 @@ static bool gate_hoist_loads (void);
 x = PHI (CONST, a)

 Gets replaced with:
   bb0:
   bb2:
 t1 = a == CONST;
 t2 = b > c;
 t3 = t1 & t2;
 x = a;

 +
 +   It also replaces
 +
 + bb0:
 +   if (a != 0) goto bb1; else goto bb2;
 + bb1:
 +   c = a + b;
 + bb2:
 +   x = PHI <c (bb1), b (bb0), ...>;
 +
 +   with
 +
 + bb0:
 +   c = a + b;
 + bb2:
 +   x = PHI <c (bb0), ...>;
 +
 ABS Replacement
 ---

 This transformation, implemented in abs_replacement, replaces

   bb0:
 if (a >= 0) goto bb2; else goto bb1;
   bb1:
 x = -a;
   bb2:
 @@ -809,20 +826,79 @@ operand_equal_for_value_replacement (con
if (rhs_is_fed_for_value_replacement (arg0, arg1, code, tmp))
  return true;

tmp = gimple_assign_rhs2 (def);
if (rhs_is_fed_for_value_replacement (arg0, arg1, code, tmp))
  return true;

return false;
  }

 +/* Returns true if ARG is a neutral element for operation CODE
 +   on the RIGHT side.  */
 +
 +static bool
 +neutral_element_p (tree_code code, tree arg, bool right)
 +{
 +  switch (code)
 +{
 +case PLUS_EXPR:
 +case BIT_IOR_EXPR:
 +case BIT_XOR_EXPR:
 +  return integer_zerop (arg);
 +
 +case LROTATE_EXPR:
 +case RROTATE_EXPR:
 +case LSHIFT_EXPR:
 +case RSHIFT_EXPR:
 +case MINUS_EXPR:
 +case POINTER_PLUS_EXPR:
 +  return right && integer_zerop (arg);
 +
 +case MULT_EXPR:
 +  return integer_onep (arg);
 +
 

[Patch, avr] Remove atxmega16x1 from device list

2014-03-03 Thread Senthil Kumar Selvaraj
The atxmega16x1 AVR variant doesn't exist, but the device name got
into the device list in gcc/config/avr/avr-mcus.def. This patch removes the
non-existent device.

If ok, could someone commit please? I don't have commit access.

Regards
Senthil

2014-03-03  Senthil Kumar Selvaraj  senthil_kumar.selva...@atmel.com

* config/avr/avr-mcus.def: Remove atxmega16x1.
* config/avr/avr-tables.opt: Regenerate.
* config/avr/t-multilib: Regenerate.
* doc/avr-mmcu.texi: Regenerate.


diff --git gcc/config/avr/avr-mcus.def gcc/config/avr/avr-mcus.def
index affd9f3..d068f5e 100644
--- gcc/config/avr/avr-mcus.def
+++ gcc/config/avr/avr-mcus.def
@@ -260,7 +260,6 @@ AVR_MCU ("atmega2561", ARCH_AVR6, "__AVR_ATmega2561__", 0, 0, 0
 AVR_MCU ("avrxmega2",   ARCH_AVRXMEGA2, NULL,                  0, 0, 0x2000, 1, "x32a4")
 AVR_MCU ("atxmega16a4", ARCH_AVRXMEGA2, "__AVR_ATxmega16A4__", 0, 0, 0x2000, 1, "x16a4")
 AVR_MCU ("atxmega16d4", ARCH_AVRXMEGA2, "__AVR_ATxmega16D4__", 0, 0, 0x2000, 1, "x16d4")
-AVR_MCU ("atxmega16x1", ARCH_AVRXMEGA2, "__AVR_ATxmega16X1__", 0, 0, 0x2000, 1, "x16x1")
 AVR_MCU ("atxmega32a4", ARCH_AVRXMEGA2, "__AVR_ATxmega32A4__", 0, 0, 0x2000, 1, "x32a4")
 AVR_MCU ("atxmega32d4", ARCH_AVRXMEGA2, "__AVR_ATxmega32D4__", 0, 0, 0x2000, 1, "x32d4")
 AVR_MCU ("atxmega32x1", ARCH_AVRXMEGA2, "__AVR_ATxmega32X1__", 0, 0, 0x2000, 1, "x32x1")
diff --git gcc/config/avr/avr-tables.opt gcc/config/avr/avr-tables.opt
index 90de7e1..b5c6d82 100644
--- gcc/config/avr/avr-tables.opt
+++ gcc/config/avr/avr-tables.opt
@@ -597,173 +597,170 @@ EnumValue
 Enum(avr_mcu) String(atxmega16d4) Value(190)
 
 EnumValue
-Enum(avr_mcu) String(atxmega16x1) Value(191)
+Enum(avr_mcu) String(atxmega32a4) Value(191)
 
 EnumValue
-Enum(avr_mcu) String(atxmega32a4) Value(192)
+Enum(avr_mcu) String(atxmega32d4) Value(192)
 
 EnumValue
-Enum(avr_mcu) String(atxmega32d4) Value(193)
+Enum(avr_mcu) String(atxmega32x1) Value(193)
 
 EnumValue
-Enum(avr_mcu) String(atxmega32x1) Value(194)
+Enum(avr_mcu) String(atmxt112sl) Value(194)
 
 EnumValue
-Enum(avr_mcu) String(atmxt112sl) Value(195)
+Enum(avr_mcu) String(atmxt224) Value(195)
 
 EnumValue
-Enum(avr_mcu) String(atmxt224) Value(196)
+Enum(avr_mcu) String(atmxt224e) Value(196)
 
 EnumValue
-Enum(avr_mcu) String(atmxt224e) Value(197)
+Enum(avr_mcu) String(atmxt336s) Value(197)
 
 EnumValue
-Enum(avr_mcu) String(atmxt336s) Value(198)
+Enum(avr_mcu) String(atxmega16a4u) Value(198)
 
 EnumValue
-Enum(avr_mcu) String(atxmega16a4u) Value(199)
+Enum(avr_mcu) String(atxmega16c4) Value(199)
 
 EnumValue
-Enum(avr_mcu) String(atxmega16c4) Value(200)
+Enum(avr_mcu) String(atxmega32a4u) Value(200)
 
 EnumValue
-Enum(avr_mcu) String(atxmega32a4u) Value(201)
+Enum(avr_mcu) String(atxmega32c4) Value(201)
 
 EnumValue
-Enum(avr_mcu) String(atxmega32c4) Value(202)
+Enum(avr_mcu) String(atxmega32e5) Value(202)
 
 EnumValue
-Enum(avr_mcu) String(atxmega32e5) Value(203)
+Enum(avr_mcu) String(avrxmega4) Value(203)
 
 EnumValue
-Enum(avr_mcu) String(avrxmega4) Value(204)
+Enum(avr_mcu) String(atxmega64a3) Value(204)
 
 EnumValue
-Enum(avr_mcu) String(atxmega64a3) Value(205)
+Enum(avr_mcu) String(atxmega64d3) Value(205)
 
 EnumValue
-Enum(avr_mcu) String(atxmega64d3) Value(206)
+Enum(avr_mcu) String(atxmega64a3u) Value(206)
 
 EnumValue
-Enum(avr_mcu) String(atxmega64a3u) Value(207)
+Enum(avr_mcu) String(atxmega64a4u) Value(207)
 
 EnumValue
-Enum(avr_mcu) String(atxmega64a4u) Value(208)
+Enum(avr_mcu) String(atxmega64b1) Value(208)
 
 EnumValue
-Enum(avr_mcu) String(atxmega64b1) Value(209)
+Enum(avr_mcu) String(atxmega64b3) Value(209)
 
 EnumValue
-Enum(avr_mcu) String(atxmega64b3) Value(210)
+Enum(avr_mcu) String(atxmega64c3) Value(210)
 
 EnumValue
-Enum(avr_mcu) String(atxmega64c3) Value(211)
+Enum(avr_mcu) String(atxmega64d4) Value(211)
 
 EnumValue
-Enum(avr_mcu) String(atxmega64d4) Value(212)
+Enum(avr_mcu) String(avrxmega5) Value(212)
 
 EnumValue
-Enum(avr_mcu) String(avrxmega5) Value(213)
+Enum(avr_mcu) String(atxmega64a1) Value(213)
 
 EnumValue
-Enum(avr_mcu) String(atxmega64a1) Value(214)
+Enum(avr_mcu) String(atxmega64a1u) Value(214)
 
 EnumValue
-Enum(avr_mcu) String(atxmega64a1u) Value(215)
+Enum(avr_mcu) String(avrxmega6) Value(215)
 
 EnumValue
-Enum(avr_mcu) String(avrxmega6) Value(216)
+Enum(avr_mcu) String(atxmega128a3) Value(216)
 
 EnumValue
-Enum(avr_mcu) String(atxmega128a3) Value(217)
+Enum(avr_mcu) String(atxmega128d3) Value(217)
 
 EnumValue
-Enum(avr_mcu) String(atxmega128d3) Value(218)
+Enum(avr_mcu) String(atxmega192a3) Value(218)
 
 EnumValue
-Enum(avr_mcu) String(atxmega192a3) Value(219)
+Enum(avr_mcu) String(atxmega192d3) Value(219)
 
 EnumValue
-Enum(avr_mcu) String(atxmega192d3) Value(220)
+Enum(avr_mcu) String(atxmega256a3) Value(220)
 
 EnumValue
-Enum(avr_mcu) String(atxmega256a3) Value(221)
+Enum(avr_mcu) String(atxmega256a3b) Value(221)
 
 EnumValue
-Enum(avr_mcu) String(atxmega256a3b) Value(222)



Re: calloc = malloc + memset

2014-03-03 Thread Richard Biener
On Fri, Feb 28, 2014 at 11:48 PM, Marc Glisse marc.gli...@inria.fr wrote:
 Hello,

 this is a stage 1 patch, and I'll ping it then, but if you have comments
 now...

 Passes bootstrap+testsuite on x86_64-linux-gnu.

 2014-02-28  Marc Glisse  marc.gli...@inria.fr

 PR tree-optimization/57742
 gcc/
 * tree-ssa-forwprop.c (simplify_malloc_memset): New function.
 (simplify_builtin_call): Call it.
 gcc/testsuite/
 * g++.dg/tree-ssa/calloc.C: New testcase.
 * gcc.dg/tree-ssa/calloc.c: Likewise.

 --
 Marc Glisse
 Index: gcc/testsuite/g++.dg/tree-ssa/calloc.C
 ===
 --- gcc/testsuite/g++.dg/tree-ssa/calloc.C  (revision 0)
 +++ gcc/testsuite/g++.dg/tree-ssa/calloc.C  (working copy)
 @@ -0,0 +1,35 @@
 +/* { dg-do compile } */
 +/* { dg-options "-std=gnu++11 -O3 -fdump-tree-optimized" } */
 +
 +#include <new>
 +#include <vector>
 +#include <cstdlib>
 +
 +void g(void*);
 +inline void* operator new(std::size_t sz) _GLIBCXX_THROW (std::bad_alloc)
 +{
 +  void *p;
 +
 +  if (sz == 0)
 +sz = 1;
 +
 +  // Slightly modified from the libsupc++ version, that one has 2 calls
 +  // to malloc which makes it too hard to optimize.
 +  while ((p = std::malloc (sz)) == 0)
 +{
 +  std::new_handler handler = std::get_new_handler ();
 +  if (! handler)
 +_GLIBCXX_THROW_OR_ABORT(std::bad_alloc());
 +  handler ();
 +}
 +  return p;
 +}
 +
 +void f(void*p,int n){
 +  new(p)std::vector<int>(n);
 +}
 +
 +/* { dg-final { scan-tree-dump-times "calloc" 1 "optimized" } } */
 +/* { dg-final { scan-tree-dump-not "malloc" "optimized" } } */
 +/* { dg-final { scan-tree-dump-not "memset" "optimized" } } */
 +/* { dg-final { cleanup-tree-dump "optimized" } } */

 Property changes on: gcc/testsuite/g++.dg/tree-ssa/calloc.C
 ___
 Added: svn:eol-style
 ## -0,0 +1 ##
 +native
 \ No newline at end of property
 Added: svn:keywords
 ## -0,0 +1 ##
 +Author Date Id Revision URL
 \ No newline at end of property
 Index: gcc/testsuite/gcc.dg/tree-ssa/calloc.c
 ===
 --- gcc/testsuite/gcc.dg/tree-ssa/calloc.c  (revision 0)
 +++ gcc/testsuite/gcc.dg/tree-ssa/calloc.c  (working copy)
 @@ -0,0 +1,29 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -fdump-tree-optimized" } */
 +
 +#include <stdlib.h>
 +#include <string.h>
 +
 +extern int a;
 +extern int* b;
 +int n;
 +void* f(long*q){
 +  int*p=malloc(n);
 +  ++*q;
 +  if(p){
 +++*q;
 +a=2;
 +memset(p,0,n);
 +*b=3;
 +  }
 +  return p;
 +}
 +void* g(void){
 +  float*p=calloc(8,4);
 +  return memset(p,0,32);
 +}
 +
 +/* { dg-final { scan-tree-dump-times "calloc" 2 "optimized" } } */
 +/* { dg-final { scan-tree-dump-not "malloc" "optimized" } } */
 +/* { dg-final { scan-tree-dump-not "memset" "optimized" } } */
 +/* { dg-final { cleanup-tree-dump "optimized" } } */

 Property changes on: gcc/testsuite/gcc.dg/tree-ssa/calloc.c
 ___
 Added: svn:keywords
 ## -0,0 +1 ##
 +Author Date Id Revision URL
 \ No newline at end of property
 Added: svn:eol-style
 ## -0,0 +1 ##
 +native
 \ No newline at end of property
 Index: gcc/tree-ssa-forwprop.c
 ===
 --- gcc/tree-ssa-forwprop.c (revision 208224)
 +++ gcc/tree-ssa-forwprop.c (working copy)
 @@ -1487,20 +1487,149 @@ constant_pointer_difference (tree p1, tr
  }

   for (i = 0; i < cnt[0]; i++)
     for (j = 0; j < cnt[1]; j++)
if (exps[0][i] == exps[1][j])
 return size_binop (MINUS_EXPR, offs[0][i], offs[1][j]);

return NULL_TREE;
  }

 +/* Optimize
 +   ptr = malloc (n);
 +   memset (ptr, 0, n);
 +   into
 +   ptr = calloc (n);
 +   gsi_p is known to point to a call to __builtin_memset.  */
 +static bool
 +simplify_malloc_memset (gimple_stmt_iterator *gsi_p)
 +{
 +  /* First make sure we have:
 + ptr = malloc (n);
 + memset (ptr, 0, n);  */
 +  gimple stmt2 = gsi_stmt (*gsi_p);
 +  if (!integer_zerop (gimple_call_arg (stmt2, 1)))
 +return false;
 +  tree ptr1, ptr2 = gimple_call_arg (stmt2, 0);
 +  tree size = gimple_call_arg (stmt2, 2);
 +  if (TREE_CODE (ptr2) != SSA_NAME)
 +return false;
 +  gimple stmt1 = SSA_NAME_DEF_STMT (ptr2);
 +  tree callee1;
 +  /* Handle the case where STMT1 is a unary PHI, which happens
 + for instance with:
 + while (!(p = malloc (n))) { ... }
 + memset (p, 0, n);  */
 +  if (!stmt1)
 +return false;
 +  if (gimple_code (stmt1) == GIMPLE_PHI
 +      && gimple_phi_num_args (stmt1) == 1)
 +{
 +  ptr1 = gimple_phi_arg_def (stmt1, 0);
 +  if (TREE_CODE (ptr1) != SSA_NAME)
 +   return false;
 +  stmt1 = SSA_NAME_DEF_STMT (ptr1);
 +}
 +  else
 +ptr1 = ptr2;
 +  if (!stmt1
 +  || !is_gimple_call (stmt1)
 +  || !(callee1 = gimple_call_fndecl (stmt1)))
 +return false;

That's a bit 

Re: [PATCH] [lto/55113] Fix use of -fshort-double with -flto for powerpc

2014-03-03 Thread Richard Biener
On Sat, Mar 1, 2014 at 11:23 PM, Paulo J. Matos pa...@matos-sorge.com wrote:

 This patch fixes lto/55113 for powerpc.
 Combining -fshort-double with -flto is now working fine.

 I attach patch and testcase (unsure if testcase is in the right place).
 Tested with target powerpc-abispe.


 2014-03-01  Paulo Matos  pa...@matos-sorge.com

 * c-family/c.opt: Add LTO FE support for fshort-double option.
 * tree-streamer.c (record_common_node): Assert we don't record
 nodes with type double.
 (preload_common_node): Skip type double, complex double and
 double pointer since it is now frontend dependent due to
 fshort-double option.

 2014-03-01  Paulo Matos  pa...@matos-sorge.com

 * gcc.target/powerpc/pr55113.c: New testcase.


 OK to commit?

Index: gcc/c-family/c.opt
===
--- gcc/c-family/c.opt  (revision 208249)
+++ gcc/c-family/c.opt  (working copy)
@@ -1141,7 +1141,7 @@ C++ ObjC++ Optimization Var(flag_rtti) I
 Generate run time type descriptor information

 fshort-double
-C ObjC C++ ObjC++ Optimization Var(flag_short_double)
+C ObjC C++ ObjC++ LTO Optimization Var(flag_short_double)
 Use the same size for double as for float

This hunk isn't needed.

Index: gcc/testsuite/gcc.target/powerpc/pr55113.c
===
--- gcc/testsuite/gcc.target/powerpc/pr55113.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr55113.c  (working copy)
@@ -0,0 +1,11 @@
+#include <stdio.h>
+
+int main()
+{
+   static float f;
+   float a = 1.0;
+   float b = 2.0;
+   f = a + b * 1e-12;
+   printf("%f\n", f);
+   return 0;
+}

That doesn't seem to be run with -flto or -fshort-double.  The proper
place for a testcase is gcc.dg/lto/ with something like

{ dg-lto-do link }
{ dg-lto-options { { -O2 -fshort-double -flto } } }

and naming the testcase pr55113_0.c.  Your testcase doesn't use
doubles at all ... (well, ok, a + b * 1e-12 uses them implicitly, but
for that we have -fsingle-precision-constant)
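
The point about the implicit double can be seen with a small sketch.  It assumes a C11 compiler for _Generic and default options (no -fsingle-precision-constant); the function names are hypothetical:

```c
/* An unsuffixed floating constant such as 1e-12 has type double, so
   multiplying a float by it promotes the float and yields a double.  */
static int product_with_plain_constant_is_double(void)
{
    float b = 2.0f;
    return _Generic(b * 1e-12, double: 1, default: 0);
}

/* With an f suffix the constant is a float and so is the product.  */
static int product_with_f_constant_is_float(void)
{
    float b = 2.0f;
    return _Generic(b * 1e-12f, float: 1, default: 0);
}
```

So the quoted testcase only touches double through that one constant; a test that declares double variables would exercise the option more directly.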

Richard.

 --
 PMatos


[C++ Patch] PR 60376

2014-03-03 Thread Paolo Carlini

Hi,

in this -std=c++1y regression we ICE on invalid code during error recovery:

60376.C: In function ‘void bar()’:
60376.C:8:9: error: expected nested-name-specifier before ‘(’ token
using (A().foo);
^
60376.C:8:9: error: expected unqualified-id before ‘(’ token
60376.C:8:9: error: expected ‘;’ before ‘(’ token
60376.C:8:18: error: statement cannot resolve address of overloaded function
using (A().foo);
^
60376.C:8:9: internal compiler error: ...

I think we can take the occasion to handle the issue early, in the 
parser, thus cleaning up the diagnostic we emit in such cases, that is, 
emit only the first error message (to achieve that I used the same 
approach already used elsewhere: detect the hard error condition and 
skip). Tested x86_64-linux.


Thanks,
Paolo.

PS: Submitter posted also an ICE on valid issue, which seems separate to me.

/
/gcc/cp
2014-03-03  Paolo Carlini  paolo.carl...@oracle.com

PR c++/60376
* parser.c (cp_parser_using_declaration): Early return when
cp_parser_nested_name_specifier errors out.

/gcc/testsuite
2014-03-03  Paolo Carlini  paolo.carl...@oracle.com

PR c++/60376
* g++.dg/cpp1y/pr60376.C: New.

/libstdc++-v3
2014-03-03  Paolo Carlini  paolo.carl...@oracle.com

PR c++/60376
* testsuite/29_atomics/headers/atomic/types_std_c++0x_neg.cc:
Adjust dg-error directives.
Index: gcc/cp/parser.c
===
--- gcc/cp/parser.c (revision 208269)
+++ gcc/cp/parser.c (working copy)
@@ -15932,10 +15932,17 @@ cp_parser_using_declaration (cp_parser* parser,
   /* If we saw `typename', or didn't see `::', then there must be a
  nested-name-specifier present.  */
   if (typename_p || !global_scope_p)
-qscope = cp_parser_nested_name_specifier (parser, typename_p,
- /*check_dependency_p=*/true,
- /*type_p=*/false,
- /*is_declaration=*/true);
+{
+  qscope = cp_parser_nested_name_specifier (parser, typename_p,
+   /*check_dependency_p=*/true,
+   /*type_p=*/false,
+   /*is_declaration=*/true);
+  if (!qscope && !cp_parser_uncommitted_to_tentative_parse_p (parser))
+   {
+ cp_parser_skip_to_end_of_block_or_statement (parser);
+ return false;
+   }
+}
   /* Otherwise, we could be in either of the two productions.  In that
  case, treat the nested-name-specifier as optional.  */
   else
Index: gcc/testsuite/g++.dg/cpp1y/pr60376.C
===
--- gcc/testsuite/g++.dg/cpp1y/pr60376.C(revision 0)
+++ gcc/testsuite/g++.dg/cpp1y/pr60376.C(working copy)
@@ -0,0 +1,12 @@
+// PR c++/60376
+// { dg-options "-std=c++1y" }
+
+struct A
+{
+  int foo();
+};
+
+template<typename> void bar()
+{
+  using (A().foo);  // { dg-error "expected" }
+}
Index: libstdc++-v3/testsuite/29_atomics/headers/atomic/types_std_c++0x_neg.cc
===
--- libstdc++-v3/testsuite/29_atomics/headers/atomic/types_std_c++0x_neg.cc (revision 208269)
+++ libstdc++-v3/testsuite/29_atomics/headers/atomic/types_std_c++0x_neg.cc (working copy)
@@ -121,50 +121,3 @@ void test01()
 // { dg-error expected nested-name-specifier  { target *-*-* } 72 }
 // { dg-error expected nested-name-specifier  { target *-*-* } 73 }
 // { dg-error expected nested-name-specifier  { target *-*-* } 75 }
-
-// { dg-error declared  { target *-*-* } 26 }
-// { dg-error declared  { target *-*-* } 27 }
-// { dg-error declared  { target *-*-* } 28 }
-// { dg-error declared  { target *-*-* } 29 }
-// { dg-error declared  { target *-*-* } 30 }
-// { dg-error declared  { target *-*-* } 31 }
-// { dg-error declared  { target *-*-* } 32 }
-// { dg-error declared  { target *-*-* } 34 }
-// { dg-error declared  { target *-*-* } 36 }
-// { dg-error declared  { target *-*-* } 37 }
-// { dg-error declared  { target *-*-* } 38 }
-// { dg-error declared  { target *-*-* } 39 }
-// { dg-error declared  { target *-*-* } 40 }
-// { dg-error declared  { target *-*-* } 41 }
-// { dg-error declared  { target *-*-* } 42 }
-// { dg-error declared  { target *-*-* } 43 }
-// { dg-error declared  { target *-*-* } 44 }
-// { dg-error declared  { target *-*-* } 45 }
-// { dg-error declared  { target *-*-* } 46 }
-// { dg-error declared  { target *-*-* } 47 }
-// { dg-error declared  { target *-*-* } 48 }
-// { dg-error declared  { target *-*-* } 49 }
-// { dg-error declared  { target *-*-* } 50 }
-// { dg-error declared  { target *-*-* } 52 }
-// { dg-error declared  { target *-*-* } 53 }
-// { dg-error declared  { target *-*-* } 54 }
-// { dg-error declared  { target *-*-* } 55 }
-// { dg-error declared  { 

Re: [RFC] Do not consider volatile asms as optimization barriers #1

2014-03-03 Thread Richard Biener
On Mon, Mar 3, 2014 at 9:01 AM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 Eric Botcazou ebotca...@adacore.com writes:
 Thanks for doing this.  FWIW I agree it's probably the best stop-gap fix.
 But the implication seems to be that unspec_volatile and volatile asms
 are volatile in different ways.  IMO they're volatile in the same way
 and the problems for volatile asms apply to unspec_volatile too.

 I disagree, we need a simple way for the RTL middle-end as well as the back-
 ends to block most optimizations across a specific point (e.g. a non-local
 label as in HP's fix) and UNSPEC_VOLATILE is the best candidate, at least in
 the short term.

 I don't agree it's the best candidate if...

 E.g. although cse.c will flush the table for unspec_volatile,
 it isn't the case that unspec_volatile forces a containing function
 to save all call-saved registers.  That would be excessive for a plain
 blockage instruction.  So again we seem to be assuming one thing in places
 like cse.c and another in the register allocator.  Code that uses the DF
 framework will also assume that registers are not implicitly clobbered
 by an unspec_volatile:
 [...]
 Also, ira-lives.c (which tracks the liveness of both pseudo and hard
 registers) doesn't mention volatile at all.

 Yes, the definition of a blockage instruction is somewhat vague and I agree
 that it shouldn't cause registers to be spilled.  But it needs to block most,
 if not all, optimizations.

 ...it's so loosely defined.  If non-local labels are the specific problem,
 I think it'd be better to limit the flush to that.

non-local labels should block most optimizations by the fact they
are a receiver of control flow and thus should have an abnormal
edge coming into them.  If that's not the case (no abnormal edge)
then that's the bug to fix.

Otherwise I agree with Richard.  Please sit down and _exactly_ define
what 'volatile' in an asm provides for guarantees compared to non-volatile
asms.  Likewise do so for volatile UNSPECs.

A volatile shouldn't be a cheap way out of properly enumerating all
uses, defs and clobbers of a stmt.  If volatile is used to tell the
insn has additional uses/defs or clobbers to those explicitly given
the only reason that may be valid is because we cannot explicitly
enumerate those.  But we should fix that instead (for example with
the special register idea or by adding a middle-end wide special
blockage that you can use/def/clobber).
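
As a source-level illustration of "say so explicitly", here is a minimal
sketch using GCC-style extended asm (it assumes a compiler that accepts this
syntax, such as GCC or Clang; the function names are hypothetical).  Each asm
lists what it touches in its operands and clobbers, so nothing has to be
inferred from the volatile keyword alone:

```c
/* Compiler-level barrier: the "memory" clobber states explicitly that the
   asm may read or write arbitrary memory.  No registers are clobbered.  */
static inline void compiler_barrier (void)
{
  __asm__ __volatile__ ("" : : : "memory");
}

/* An asm whose only declared effect is on its operand: the "+r" constraint
   says the value is both read and written by the asm.  */
static inline int opaque_copy (int v)
{
  __asm__ __volatile__ ("" : "+r" (v));
  return v;
}

int store_twice (int *p)
{
  *p = 1;
  compiler_barrier ();   /* the asm may observe *p, so the store stays put */
  *p = opaque_copy (2);
  return *p;
}
```

The asm templates are empty, so no instruction is emitted; only the declared
operands and clobbers constrain the compiler.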

To better assess the problem at hand can you enumerate the cases
where you need that special easy blockage instruction?  With
testcases please?

Note that on GIMPLE even volatiles are not strictly ordered if they
don't have a dependence that orders them (that doesn't mean that
any existing transform deliberately re-orders them, but as shown
with the loop example below such re-ordering can happen
as a side-effect of a valid transform).

 I'm back to throwing examples around, sorry, but take the MIPS testcase:

 volatile int x = 1;

 void foo (void)
 {
   x = 1;
   __builtin_mips_set_fcsr (0);
   x = 2;
 }

 where __builtin_mips_set_fcsr is a handy way of getting unspec_volatile.
 (I'm not interested in what the function does here.)  Even at -O2,
 the cse.c code successfully prevents %hi(x) from being shared,
 as you'd expect:

 li      $3,1                    # 0x1
 lui     $2,%hi(x)
 sw      $3,%lo(x)($2)
 move    $2,$0
 ctc1    $2,$31
 li      $3,2                    # 0x2
 lui     $2,%hi(x)
 sw      $3,%lo(x)($2)
 j       $31
 nop

 But put it in a loop:

 void frob (void)
 {
   for (;;)
     {
       x = 1;
       __builtin_mips_set_fcsr (0);
       x = 2;
     }
 }

 and we get the rather bizarre code:

 lui     $2,%hi(x)
 li      $6,1                    # 0x1
 move    $5,$0
 move    $4,$2
 li      $3,2                    # 0x2
 .align  3
 .L3:
 sw      $6,%lo(x)($2)
 ctc1    $5,$31
 sw      $3,%lo(x)($4)
 j       .L3
 lui     $2,%hi(x)

 Here the _second_ %hi(x), the 1 and the 2 have been hoisted but the first
 %hi(x) is reloaded each time.  So what's the correct behaviour here?
 Should the hoisting of the second %hi(x) have been disabled because the
 loop contains an unspec_volatile?  What about the 1 (from the first store)
 and the 2?

 If instead it was:

     for (i = 0; i < 100; i++)

 would converting to a hardware do-loop be acceptable?

 So most passes assume that no pseudos or hard registers will be
 implicitly clobbered by unspec_volatile, just like for a volatile asm.
 And IMO that's right.  I think the rule should be the same for volatile
 asms and unspec_volatiles, and the same for registers as it already is for
 memory: if the instruction clobbers something, it should say so explicitly.

 IMO that would buy us nothing and, on the contrary, would add 

Re: [RFC] Do not consider volatile asms as optimization barriers #1

2014-03-03 Thread Eric Botcazou
 non-local labels should block most optimizations by the fact they
 are a receiver of control flow and thus should have an abnormal
 edge coming into them.  If that's not the case (no abnormal edge)
 then that's the bug to fix.

It's (of course) more complicated, you need to look at HP's fix and testcase 
to see why we need a full optimization barrier.  See also the prologue and 
epilogue of many architectures which also need a blockage when they are 
establishing or destroying the frame.

 Otherwise I agree with Richard.  Please sit down and _exactly_ define
 what 'volatile' in an asm provides for guarantees compared to non-volatile
 asms.  Likewise do so for volatile UNSPECs.

Too late, we apparently all agree about what volatile asms and future volatile 
UNSPECs mean. :-)  The remaining point is UNSPEC_VOLATILE, but the discussion 
can be deferred until the next stage 1.

 A volatile shouldn't be a cheap way out of properly enumerating all
 uses, defs and clobbers of a stmt.  If volatile is used to tell the
 insn has additional uses/defs or clobbers to those explicitly given
 the only reason that may be valid is because we cannot explicitly
 enumerate those.  But we should fix that instead (for example with
 the special register idea or by adding a middle-end wide special
 blockage that you can use/def/clobber).

For the time being this special blockage is UNSPEC_VOLATILE for RTL.

-- 
Eric Botcazou


Re: [RFC] Do not consider volatile asms as optimization barriers #1

2014-03-03 Thread Richard Sandiford
Eric Botcazou ebotca...@adacore.com writes:
 non-local labels should block most optimizations by the fact they
 are a receiver of control flow and thus should have an abnormal
 edge coming into them.  If that's not the case (no abnormal edge)
 then that's the bug to fix.

 It's (of course) more complicated, you need to look at HP's fix and testcase 
 to see why we need a full optimization barrier.  See also the prologue and 
 epilogue of many architectures which also need a blockage when they are 
 establishing or destroying the frame.

But the prologue/epilogue case often doesn't need to be a full blockage.
We could move a load-immediate instruction -- or even an access to
known-global memory -- before the allocation or after the deallocation.  This
can actually be important on architectures that use load-multiple to restore
the return register and want the prefetcher to see the target address as
early as possible.

So I think the prologue and epilogue is one case where we really do want
to spell out what's clobbered by the allocation and deallocation.

Thanks,
Richard


[committed] Add a testcase for PR60400

2014-03-03 Thread Jakub Jelinek
Hi!

I've committed the following testcase for an ICE fixed by r200376
on the trunk.  When the fix is backported to 4.8/4.7 branches,
the testcase should be added there too.

2014-03-03  Jakub Jelinek  ja...@redhat.com

PR preprocessor/60400
* c-c++-common/cpp/pr60400.c: New test.
* c-c++-common/cpp/pr60400-1.h: New file.
* c-c++-common/cpp/pr60400-2.h: New file.

--- gcc/testsuite/c-c++-common/cpp/pr60400.c.jj 2014-03-03 11:56:07.524663686 +0100
+++ gcc/testsuite/c-c++-common/cpp/pr60400.c    2014-03-03 11:55:56.476728948 +0100
@@ -0,0 +1,13 @@
+/* PR preprocessor/60400 */
+/* { dg-do compile } */
+/* { dg-options "-trigraphs -Wtrigraphs" } */
+
+??=include "pr60400-1.h"
+??=include "pr60400-2.h"
+
+/* { dg-warning "trigraph" "" { target *-*-* } 1 } */
+/* { dg-warning "trigraph" "" { target *-*-* } 2 } */
+/* { dg-warning "trigraph" "" { target *-*-* } 3 } */
+/* { dg-warning "trigraph" "" { target *-*-* } 4 } */
+/* { dg-warning "trigraph" "" { target *-*-* } 5 } */
+/* { dg-warning "trigraph" "" { target *-*-* } 6 } */
--- gcc/testsuite/c-c++-common/cpp/pr60400-1.h.jj   2014-03-03 11:56:01.322701306 +0100
+++ gcc/testsuite/c-c++-common/cpp/pr60400-1.h  2014-03-03 11:52:40.0 +0100
@@ -0,0 +1,3 @@
+??=ifndef PR60400_1_H
+??=define PR60400_1_H
+??=endif
--- gcc/testsuite/c-c++-common/cpp/pr60400-2.h.jj   2014-03-03 11:56:04.678681720 +0100
+++ gcc/testsuite/c-c++-common/cpp/pr60400-2.h  2014-03-03 11:52:55.0 +0100
@@ -0,0 +1,4 @@
+??=ifndef PR60400_2_H
+??=define PR60400_2_H
+??=include "pr60400-1.h"
+??=endif

Jakub


Re: [RFC] Do not consider volatile asms as optimization barriers #1

2014-03-03 Thread Eric Botcazou
 ...it's so loosely defined.  If non-local labels are the specific problem,
 I think it'd be better to limit the flush to that.

No, I wrote "e.g.", so non-local labels are not the only problem.

 I'm back to throwing examples around, sorry, but take the MIPS testcase:
 
 volatile int x = 1;
 
 void foo (void)
 {
   x = 1;
   __builtin_mips_set_fcsr (0);
   x = 2;
 }
 
 where __builtin_mips_set_fcsr is a handy way of getting unspec_volatile.
 (I'm not interested in what the function does here.)  Even at -O2,
 the cse.c code successfully prevents %hi(x) from being shared,
 as you'd expect:
 
 li      $3,1                    # 0x1
 lui     $2,%hi(x)
 sw      $3,%lo(x)($2)
 move    $2,$0
 ctc1    $2,$31
 li      $3,2                    # 0x2
 lui     $2,%hi(x)
 sw      $3,%lo(x)($2)
 j       $31
 nop
 
 But put it in a loop:
 
 void frob (void)
 {
   for (;;)
     {
       x = 1;
       __builtin_mips_set_fcsr (0);
       x = 2;
     }
 }
 
 and we get the rather bizarre code:
 
 lui     $2,%hi(x)
 li      $6,1                    # 0x1
 move    $5,$0
 move    $4,$2
 li      $3,2                    # 0x2
 .align  3
 .L3:
 sw      $6,%lo(x)($2)
 ctc1    $5,$31
 sw      $3,%lo(x)($4)
 j       .L3
 lui     $2,%hi(x)
 
 Here the _second_ %hi(x), the 1 and the 2 have been hoisted but the first
 %hi(x) is reloaded each time.  So what's the correct behaviour here?
 Should the hoisting of the second %hi(x) have been disabled because the
 loop contains an unspec_volatile?  What about the 1 (from the first store)
 and the 2?

Well, I personally wouldn't spend much time on the code generated in a loop 
containing an UNSPEC_VOLATILE.  If an instruction or a builtin is supposed to 
be performance-sensitive, then don't use an UNSPEC_VOLATILE by all means and 
properly model it instead!

-- 
Eric Botcazou


Re: [PATCH,GRAPHITE] Fix for P1 bug 58028

2014-03-03 Thread Richard Biener
On Fri, Feb 28, 2014 at 8:37 PM, Mircea Namolaru
mircea.namol...@inria.fr wrote:
 Hi,

 Thanks. Here is the updated patch.

Bootstrapped / tested on x86_64-unknown-linux-gnu and applied.

Thanks,
Richard.

 2014-02-26  Tobias Grosser  tob...@grosser.es
 Mircea Namolaru  mircea.namol...@inria.fr

  PR tree-optimization/58028
  * graphite-clast-to-gimple.c (set_cloog_options): Don't remove scalar
dimensions.

 Index: gcc/graphite-clast-to-gimple.c
 ===
 --- gcc/graphite-clast-to-gimple.c  (revision 207298)
 +++ gcc/graphite-clast-to-gimple.c  (working copy)
 @@ -1522,6 +1522,13 @@
   variables.  */
    options->save_domains = 1;

 +  /* Do not remove scalar dimensions.  CLooG by default removes scalar
 + dimensions very early from the input schedule.  However, they are
 + necessary to correctly derive from the saved domains
 + (options->save_domains) the relationship between the generated loops
 + and the schedule dimensions they are generated from.  */
 +  options->noscalars = 1;
 +
/* Disable optimizations and make cloog generate source code closer to the
   input.  This is useful for debugging,  but later we want the optimized
   code.

 Mircea


Re: [PATCH,GRAPHITE] Fix for P1 bug 58028

2014-03-03 Thread Tobias Grosser

On 03/03/2014 12:39 PM, Richard Biener wrote:

On Fri, Feb 28, 2014 at 8:37 PM, Mircea Namolaru
mircea.namol...@inria.fr wrote:

Hi,

Thanks. Here is the updated patch.


Bootstrapped / tested on x86_64-unknown-linux-gnu and applied.


Thanks Richard!

Tobias



Re: [PING][PATCH][AARCH64]Resolves testsuite/gcc.target/aarch64/aapcs64/ret-func-1.c regression

2014-03-03 Thread Kyrill Tkachov

On 27/02/14 12:33, Marcus Shawcroft wrote:

On 24 February 2014 09:49, Renlin Li renlin...@arm.com wrote:


gcc/testsuite/ChangeLog:

2014-02-03  Renlin Li renlin...@arm.com

 * gcc.target/aarch64/aapcs64/validate_memory.h: Move f32in64 and
i32in128 cases
 outside special big-endian processing block.


This is a fix for a broken test case, this is OK.


Hi all,

I've committed this as r208275 with ChangeLog entry:

2014-03-03  Renlin Li  renlin...@arm.com

* gcc.target/aarch64/aapcs64/validate_memory.h: Move f32in64 and
i32in128 cases outside special big-endian processing block.


/Marcus






[testsuite, i386] Fix gcc.target/i386/prefetchwt1-1.c on Solaris 9/x86

2014-03-03 Thread Rainer Orth
The new gcc.target/i386/prefetchwt1-1.c test currently FAILs on Solaris 9/x86:

FAIL: gcc.target/i386/prefetchwt1-1.c (test for excess errors)
Excess errors:
/var/gcc/regression/trunk/9-gcc-gas/build/gcc/include/xmmintrin.h:1195:1: error:
 inlining failed in call to always_inline '_mm_prefetch': target specific option
 mismatch
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/prefetchwt1-1.c:12:5:
error: called from here

gcc.target/i386/prefetchwt1-1.c: output file does not exist
UNRESOLVED: gcc.target/i386/prefetchwt1-1.c scan-assembler [ \\t]+prefetchwt1[ \\t]+

This can be fixed by compiling with -msse2.

Tested with the appropriate runtest invocation on i386-pc-solaris2.9,
i386-pc-solaris2.11, and x86_64-unknown-linux-gnu.

Ok for mainline?

Rainer


2014-03-03  Rainer Orth  r...@cebitec.uni-bielefeld.de

* gcc.target/i386/prefetchwt1-1.c: Add -msse2 to dg-options.

# HG changeset patch
# Parent 0b7661d6f1837948f92eeafc1be90622bf52c7f5
Fix gcc.target/i386/prefetchwt1-1.c on Solaris 9/x86

diff --git a/gcc/testsuite/gcc.target/i386/prefetchwt1-1.c b/gcc/testsuite/gcc.target/i386/prefetchwt1-1.c
--- a/gcc/testsuite/gcc.target/i386/prefetchwt1-1.c
+++ b/gcc/testsuite/gcc.target/i386/prefetchwt1-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mprefetchwt1 -O2" } */
+/* { dg-options "-msse2 -mprefetchwt1 -O2" } */
 /* { dg-final { scan-assembler "\[ \\t\]+prefetchwt1\[ \\t\]+" } } */
 
 #include <x86intrin.h>

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[testsuite, g++] Fix g++.dg/abi/anon2.C with -std=c++98

2014-03-03 Thread Rainer Orth
The new g++.dg/abi/anon2.C test currently causes lots of noise in
mail-report.log:

UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler .weak(_definition)?[ \\t]_?_ZN2N11D1C3fn1ENS0_1BE
UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler .weak(_definition)?[ \\t]_?_ZN2N11D1C3fn2ES1_
UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler-not .weak(_definition)?[ \\t]_?_ZN2N23._31C3fn1ENS0_1BE
UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler-not .weak(_definition)?[ \\t]_?_ZN2N23._31C3fn2ES1_
UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler .weak(_definition)?[ \\t]_?_ZN2N31D1CIiE3fn1ENS0_1BE
UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler .weak(_definition)?[ \\t]_?_ZN2N31D1CIiE3fn2ES2_
UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler-not .weak(_definition)?[ \\t]_?_ZN2N43._91CIiE3fn1ENS0_1BE
UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler-not .weak(_definition)?[ \\t]_?_ZN2N43._91CIiE3fn2ES2_

Since the file doesn't compile with -std=c++98, the assembler scans
should be restricted to ! c++98.  The following patch does just that.

Tested with the appropriate runtest invocations on i386-pc-solaris2.11
and x86_64-unknown-linux-gnu.

Ok for mainline?

Rainer


2014-03-03  Rainer Orth  r...@cebitec.uni-bielefeld.de

* g++.dg/abi/anon2.C: Don't scan assembler for c++98.

# HG changeset patch
# Parent eb8ef98d9271199f43ddfa3c64f85f2fda1a
Fix g++.dg/abi/anon2.C with -std=c++98

diff --git a/gcc/testsuite/g++.dg/abi/anon2.C b/gcc/testsuite/g++.dg/abi/anon2.C
--- a/gcc/testsuite/g++.dg/abi/anon2.C
+++ b/gcc/testsuite/g++.dg/abi/anon2.C
@@ -6,9 +6,9 @@ namespace N1 {
 typedef enum { X, Y } A;
 typedef struct { } B;
 struct C {
-  // { dg-final { scan-assembler .weak\(_definition\)?\[ \t\]_?_ZN2N11D1C3fn1ENS0_1BE } }
+  // { dg-final { scan-assembler .weak\(_definition\)?\[ \t\]_?_ZN2N11D1C3fn1ENS0_1BE { target { ! c++98 } } } }
   static void fn1 (B) { }
-  // { dg-final { scan-assembler .weak\(_definition\)?\[ \t\]_?_ZN2N11D1C3fn2ES1_ } }
+  // { dg-final { scan-assembler .weak\(_definition\)?\[ \t\]_?_ZN2N11D1C3fn2ES1_ { target { ! c++98 } } } }
   static void fn2 (C) { }
 };
   } D;
@@ -22,9 +22,9 @@ namespace N2 {
 typedef enum { X, Y } A;
 typedef struct { } B;
 struct C {
-  // { dg-final { scan-assembler-not .weak\(_definition\)?\[ \t\]_?_ZN2N23._31C3fn1ENS0_1BE } }
+  // { dg-final { scan-assembler-not .weak\(_definition\)?\[ \t\]_?_ZN2N23._31C3fn1ENS0_1BE { target { ! c++98 } } } }
   static void fn1 (B) { } // { dg-error no linkage  { target c++98 } }
-  // { dg-final { scan-assembler-not .weak\(_definition\)?\[ \t\]_?_ZN2N23._31C3fn2ES1_ } }
+  // { dg-final { scan-assembler-not .weak\(_definition\)?\[ \t\]_?_ZN2N23._31C3fn2ES1_ { target { ! c++98 } } } }
   static void fn2 (C) { } // { dg-error no linkage  { target c++98 } }
 };
   } const D;
@@ -38,9 +38,9 @@ namespace N3 {
 typedef enum { X, Y } A;
 typedef struct { } B;
 template <class T> struct C {
-  // { dg-final { scan-assembler .weak\(_definition\)?\[ \t\]_?_ZN2N31D1CIiE3fn1ENS0_1BE } }
+  // { dg-final { scan-assembler .weak\(_definition\)?\[ \t\]_?_ZN2N31D1CIiE3fn1ENS0_1BE { target { ! c++98 } } } }
   static void fn1 (B) { }
-  // { dg-final { scan-assembler .weak\(_definition\)?\[ \t\]_?_ZN2N31D1CIiE3fn2ES2_ } }
+  // { dg-final { scan-assembler .weak\(_definition\)?\[ \t\]_?_ZN2N31D1CIiE3fn2ES2_ { target { ! c++98 } } } }
   static void fn2 (C) { }
 };
   } D;
@@ -54,9 +54,9 @@ namespace N4 {
 typedef enum { X, Y } A;
 typedef struct { } B;
 template <class T> struct C {
-  // { dg-final { scan-assembler-not .weak\(_definition\)?\[ \t\]_?_ZN2N43._91CIiE3fn1ENS0_1BE } }
+  // { dg-final { scan-assembler-not .weak\(_definition\)?\[ \t\]_?_ZN2N43._91CIiE3fn1ENS0_1BE { target { ! c++98 } } } }
   static void fn1 (B) { } // { not-dg-error no linkage  { target c++98 } }
-  // { dg-final { scan-assembler-not .weak\(_definition\)?\[ \t\]_?_ZN2N43._91CIiE3fn2ES2_ } }
+  // { dg-final { scan-assembler-not .weak\(_definition\)?\[ \t\]_?_ZN2N43._91CIiE3fn2ES2_ { target { ! c++98 } } } }
   static void fn2 (C) { } // { not-dg-error no linkage  { target c++98 } }
 };
   } const D;

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PING] [PATCH ARM]: Fix more -mapcs-frame failures

2014-03-03 Thread Christian Bruel
http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01402.html

fixes -mapcs-frame -g ICEs.

ok for trunk ?





Re: [PATCH, LIBITM] Backport libitm bug fixes to FSF 4.8

2014-03-03 Thread Torvald Riegel
On Fri, 2014-02-28 at 19:32 -0600, Peter Bergner wrote:
 I'd like to ask for permission to backport the following two LIBITM bug
 fixes to the FSF 4.8 branch.  Although these are not technically fixing
 regressions, they do fix the libitm.c/reentrant.c testsuite failure on
 s390 and powerpc (or at least it will when we finally get our power8
 code backported to FSF 4.8).  It also fixes a real bug on x86 that is
 latent because we don't currently have a test case that warms up the
 x86's RTM hardware enough such that its xbegin succeeds exposing the
 bug.  I'd like this backport so that the 4.8 based distros won't need
 to carry this as an add-on patch.
 
 It should also be fairly safe as well, since the fixed code is limited
 to the arches (x86, s390 and powerpc) that define USE_HTM_FASTPATH,
 so all others definitely won't see a difference.

Looks good to me.

 I'll note I CC'd some of the usual suspects interested in TM as well
 as the normal RMs, because LIBITM doesn't seem to have a maintainer
 or reviewer listed in the MAINTAINERS file.  Is that an oversight or???

I'm reviewing all libitm patches that I'm aware of (but I don't read
gcc-patches regularly).  Should I add myself as maintainer for libitm?
Does this come with any other responsibilities than reviewing patches?

Torvald



[Patch] Try to peephole2 QI *4 into SI on i386

2014-03-03 Thread lin zuojian

---
 gcc/config/i386/i386.md | 49 +
 1 file changed, 49 insertions(+)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b9f1320..86ab025 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -18535,6 +18535,55 @@
   [(set_attr type other)
(set_attr length 3)])
 
+(define_peephole2
+ [(set (mem:QI (match_operand 0 register_operand))
+   (match_operand 1 const_int_operand))
+   (set (mem:QI (plus (match_dup 0)
+(const_int 1)))
+   (match_operand 2 const_int_operand))
+   (set (mem:QI (plus (match_dup 0)
+(const_int 2)))
+   (match_operand 3 const_int_operand))
+   (set (mem:QI (plus (match_dup 0)
+(const_int 3)))
+   (match_operand 4 const_int_operand))]
+   
+   [(set (mem:SI (match_dup 0))
+   (match_operand 5 const_int_operand))]
+{
+   int32_t _const = (INTVAL(operands[1]) & 0xff) | (((INTVAL(operands[2])) & 0xff) << 8)
+                    | (((INTVAL(operands[3])) & 0xff) << 16) | (((INTVAL(operands[4])) & 0xff) << 24);
+   operands[5] = gen_rtx_CONST_INT (SImode, _const);
+}
+)
+
+(define_peephole2
+ [(set (mem:QI (plus (match_operand 0 register_operand)
+(match_operand 6 
const_int_operand)))
+   (match_operand 1 const_int_operand))
+   (set (mem:QI (plus (match_dup 0)
+(match_operand 7 
const_int_operand)))
+   (match_operand 2 const_int_operand))
+   (set (mem:QI (plus (match_dup 0)
+(match_operand 8 
const_int_operand)))
+   (match_operand 3 const_int_operand))
+   (set (mem:QI (plus (match_dup 0)
+(match_operand 9 
const_int_operand)))
+   (match_operand 4 const_int_operand))]
+   
+   [(set (mem:SI (match_dup 0))
+   (match_operand 5 const_int_operand))]
+{
+   if ((INTVAL(operands[7]) - INTVAL(operands[6]) != 1)
+       && (INTVAL(operands[8]) - INTVAL(operands[7]) != 1)
+       && (INTVAL(operands[9]) - INTVAL(operands[8]) != 1))
+   FAIL;
+   int32_t _const = (INTVAL(operands[1]) & 0xff) | (((INTVAL(operands[2])) & 0xff) << 8)
+                    | (((INTVAL(operands[3])) & 0xff) << 16) | (((INTVAL(operands[4])) & 0xff) << 24);
+   operands[5] = gen_rtx_CONST_INT (SImode, _const);
+}
+)
+
 (include mmx.md)
 (include sse.md)
 (include sync.md)
-- 
1.8.3.2
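
For readers following the constant computation in the peephole bodies, here
is a minimal stand-alone sketch of the same byte packing in plain C.  It
assumes a little-endian target (as on i386), and pack_le_bytes is a
hypothetical helper name, not something from the patch:

```c
#include <stdint.h>

/* Combine four byte constants b0..b3, stored at increasing addresses,
   into the 32-bit value that a single little-endian 4-byte store would
   write.  Only the low 8 bits of each argument are used.  */
uint32_t pack_le_bytes (int64_t b0, int64_t b1, int64_t b2, int64_t b3)
{
  return ((uint32_t) (b0 & 0xff))
         | ((uint32_t) (b1 & 0xff) << 8)
         | ((uint32_t) (b2 & 0xff) << 16)
         | ((uint32_t) (b3 & 0xff) << 24);
}
```

The sketch shifts unsigned values, which keeps the shift by 24 well defined
when the top byte is 0x80 or larger.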


- End forwarded message -


[PATCH v5] PR middle-end/60281

2014-03-03 Thread lin zuojian
Hi,
In this patch I only add ChangeLog.
--
Without aligning the asan stack base, this base will only be 64-bit aligned
on ARM machines.
But asan requires a 256-bit aligned base, because:
1. the right shift by ASAN_SHADOW_SHIFT (which is 3) requires the low 3 bits
to be zero;
2. store-multiple/load-multiple instructions require the next 2 bits to be
zero.

Together, the lowest 5 bits must be zero.  That means a 32-byte (256-bit)
aligned base.
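
The arithmetic above can be checked with a small stand-alone sketch.  The
macro and function names here are illustrative stand-ins, not the GCC
internals, and the constant shadow offset that asan adds is left out on the
assumption that it is itself at least 4-byte aligned:

```c
#include <stdint.h>

#define SHADOW_SHIFT 3   /* stands in for ASAN_SHADOW_SHIFT */
#define WORD_BYTES   4   /* size of one 32-bit (SImode) shadow store */

/* Required base alignment in bytes: 4 << 3 == 32, i.e. 256 bits.  */
#define BASE_ALIGN_BYTES (WORD_BYTES << SHADOW_SHIFT)

/* Round BASE down to the required alignment by clearing the low 5 bits.  */
uintptr_t align_base (uintptr_t base)
{
  return base & ~(uintptr_t) (BASE_ALIGN_BYTES - 1);
}

/* Shadow offset of an address, before the constant shadow base is added.  */
uintptr_t shadow_offset (uintptr_t addr)
{
  return addr >> SHADOW_SHIFT;
}
```

With a 32-byte aligned base, the shadow offset is a multiple of 4, which is
what a word-sized shadow store needs on a strict-alignment target.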

* asan.c (asan_emit_stack_protection): Forcing the base to align to
appropriate bits if STRICT_ALIGNMENT.And set shadow_mem align to
appropriate bits if STRICT_ALIGNMENT.
* cfgexpand.c (expand_stack_vars): set base_align appropriately when 
asan is
on.
(expand_used_vars): Leaving a space in the stack frame for alignment if
STRICT_ALIGNMENT.

Signed-off-by: lin zuojian manjian2...@gmail.com
---
 gcc/ChangeLog   | 10 ++
 gcc/asan.c  | 14 ++
 gcc/cfgexpand.c | 13 -
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index c056946..c00bfc4 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2014-03-03  lin zuojian  manjian2...@gmail.com
+   Fix PR middle-end/60281
+   * asan.c (asan_emit_stack_protection): Forcing the base to align to
+   appropriate bits if STRICT_ALIGNMENT.And set shadow_mem align to 
+   appropriate bits if STRICT_ALIGNMENT.
+   * cfgexpand.c (expand_stack_vars): set base_align appropriately when 
asan is
+   on.
+   (expand_used_vars): Leaving a space in the stack frame for alignment if
+   STRICT_ALIGNMENT.
+
 2014-03-02  Jan Hubicka  hubi...@ucw.cz
 
PR ipa/60150
diff --git a/gcc/asan.c b/gcc/asan.c
index 53992a8..64898cd 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1017,8 +1017,16 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
       base_align_bias = ((asan_frame_size + alignb - 1)
                          & ~(alignb - HOST_WIDE_INT_1)) - asan_frame_size;
 }
+  /* Align base if target is STRICT_ALIGNMENT.  */
+  if (STRICT_ALIGNMENT)
+    base = expand_binop (Pmode, and_optab, base,
+                         gen_int_mode (-((GET_MODE_ALIGNMENT (SImode) << ASAN_SHADOW_SHIFT) / BITS_PER_UNIT), Pmode),
+                         NULL_RTX, 1, OPTAB_DIRECT);
+
   if (use_after_return_class == -1 && pbase)
 emit_move_insn (pbase, base);
+
+
   base = expand_binop (Pmode, add_optab, base,
   gen_int_mode (base_offset - base_align_bias, Pmode),
   NULL_RTX, 1, OPTAB_DIRECT);
@@ -1097,6 +1105,8 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
               && (ASAN_RED_ZONE_SIZE >> ASAN_SHADOW_SHIFT) == 4);
   shadow_mem = gen_rtx_MEM (SImode, shadow_base);
   set_mem_alias_set (shadow_mem, asan_shadow_set);
+  if (STRICT_ALIGNMENT)
+ set_mem_align(shadow_mem, (GET_MODE_ALIGNMENT (SImode)));
   prev_offset = base_offset;
   for (l = length; l; l -= 2)
 {
@@ -1186,6 +1196,10 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
 
   shadow_mem = gen_rtx_MEM (BLKmode, shadow_base);
   set_mem_alias_set (shadow_mem, asan_shadow_set);
+
+  if (STRICT_ALIGNMENT)
+ set_mem_align(shadow_mem, (GET_MODE_ALIGNMENT (SImode)));
+
   prev_offset = base_offset;
   last_offset = base_offset;
   last_size = 0;
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 06d494c..14fd1c2 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1013,10 +1013,18 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
               if (data->asan_base == NULL)
                 data->asan_base = gen_reg_rtx (Pmode);
               base = data->asan_base;
+
+              if (!STRICT_ALIGNMENT)
+                base_align = crtl->max_used_stack_slot_alignment;
+              else
+                base_align = MAX(crtl->max_used_stack_slot_alignment,
+                                 (GET_MODE_ALIGNMENT (SImode) << ASAN_SHADOW_SHIFT));
             }
           else
+            {
               offset = alloc_stack_frame_space (stack_vars[i].size, alignb);
-              base_align = crtl->max_used_stack_slot_alignment;
+              base_align = crtl->max_used_stack_slot_alignment;
+            }
         }
   else
{
@@ -1843,6 +1851,9 @@ expand_used_vars (void)
= alloc_stack_frame_space (redzonesz, ASAN_RED_ZONE_SIZE);
  data.asan_vec.safe_push (prev_offset);
  data.asan_vec.safe_push (offset);
+          /* Leave a space for alignment if STRICT_ALIGNMENT.  */
+          if (STRICT_ALIGNMENT)
+            alloc_stack_frame_space ((GET_MODE_ALIGNMENT (SImode) << ASAN_SHADOW_SHIFT) / BITS_PER_UNIT , 1);
 
  var_end_seq
= asan_emit_stack_protection (virtual_stack_vars_rtx,
-- 
1.8.3.2




Re: [RFC] Do not consider volatile asms as optimization barriers #1

2014-03-03 Thread Richard Sandiford
Eric Botcazou ebotca...@adacore.com writes:
 ...it's so loosely defined.  If non-local labels are the specific problem,
 I think it'd be better to limit the flush to that.

 No, I wrote "e.g.", so non-local labels are not the only problem.

What are the others though?  As discussed in the other subthread,
I don't think prologue and epilogue barriers are quite the same.

 I'm back to throwing examples around, sorry, but take the MIPS testcase:
 
 volatile int x = 1;
 
 void foo (void)
 {
   x = 1;
   __builtin_mips_set_fcsr (0);
   x = 2;
 }
 
 where __builtin_mips_set_fcsr is a handy way of getting unspec_volatile.
 (I'm not interested in what the function does here.)  Even at -O2,
 the cse.c code successfully prevents %hi(x) from being shared,
 as you'd expect:
 
 li      $3,1                    # 0x1
 lui     $2,%hi(x)
 sw      $3,%lo(x)($2)
 move    $2,$0
 ctc1    $2,$31
 li      $3,2                    # 0x2
 lui     $2,%hi(x)
 sw      $3,%lo(x)($2)
 j       $31
 nop
 
 But put it in a loop:
 
 void frob (void)
 {
   for (;;)
 {
   x = 1;
   __builtin_mips_set_fcsr (0);
   x = 2;
 }
 }
 
 and we get the rather bizarre code:
 
 lui     $2,%hi(x)
 li      $6,1            # 0x1
 move    $5,$0
 move    $4,$2
 li      $3,2            # 0x2
 .align  3
 .L3:
 sw      $6,%lo(x)($2)
 ctc1    $5,$31
 sw      $3,%lo(x)($4)
 j       .L3
 lui     $2,%hi(x)
 
 Here the _second_ %hi(x), the 1 and the 2 have been hoisted but the first
 %hi(x) is reloaded each time.  So what's the correct behaviour here?
 Should the hoisting of the second %hi(x) have been disabled because the
 loop contains an unspec_volatile?  What about the 1 (from the first store)
 and the 2?

 Well, I personally wouldn't spend much time on the code generated in a loop 
 containing an UNSPEC_VOLATILE.  If an instruction or a builtin is supposed to 
 be performance-sensitive, then don't use an UNSPEC_VOLATILE by all means and 
 properly model it instead!

That doesn't really answer the question though.  What's the correct
behaviour for an unspec volatile in a loop?  I don't think it's what
we did in the example above, since it doesn't seem self-consistent.
And "not spending too much time" is again a bit vague in terms of
saying what's right and what's wrong.

My point is that if the construct is well-defined enough to handle the
important things we want it to handle, the answer should be known to somebody,
even if it isn't to me. :-)

Thanks,
Richard



Re: [PATCH v5] PR middle-end/60281

2014-03-03 Thread Jakub Jelinek
On Mon, Mar 03, 2014 at 09:10:28PM +0800, lin zuojian wrote:
 +2014-03-03  lin zuojian  manjian2...@gmail.com

Capital letters at the beginning of name/surname please.
One empty line after the date/name/mail line.

 + Fix PR middle-end/60281

Remove "Fix ", just
PR middle-end/60281
 + * asan.c (asan_emit_stack_protection): Forcing the base to align to

Force instead of Forcing.

 + appropriate bits if STRICT_ALIGNMENT.And set shadow_mem align to 

Two spaces after dot, remove "And", just write ".  Set shadow_mem align ..."

 + appropriate bits if STRICT_ALIGNMENT.
 + * cfgexpand.c (expand_stack_vars): set base_align appropriately when 
 asan is

Capital S in "Set".  Line too long (at most 80 chars), so wrap after "when"?

 + on.
 + (expand_used_vars): Leaving a space in the stack frame for alignment if
 + STRICT_ALIGNMENT.

Leave.

 index 53992a8..64898cd 100644
 --- a/gcc/asan.c
 +++ b/gcc/asan.c
 @@ -1017,8 +1017,16 @@ asan_emit_stack_protection (rtx base, rtx pbase, 
 unsigned int alignb,
   base_align_bias = ((asan_frame_size + alignb - 1)
   & ~(alignb - HOST_WIDE_INT_1)) - asan_frame_size;
  }
 +  /* Align base if target is STRICT_ALIGNMENT.  */
 +  if (STRICT_ALIGNMENT)
 +   base = expand_binop (Pmode, and_optab, base,
 +gen_int_mode (-((GET_MODE_ALIGNMENT (SImode) << ASAN_SHADOW_SHIFT) / BITS_PER_UNIT), Pmode),
 +NULL_RTX, 1, OPTAB_DIRECT);

Wrong formatting.  base should start 2 spaces after if, so something like:
  if (STRICT_ALIGNMENT)
    base = expand_binop (Pmode, and_optab, base,
                         gen_int_mode (-((GET_MODE_ALIGNMENT (SImode)
                                          << ASAN_SHADOW_SHIFT)
                                         / BITS_PER_UNIT), Pmode), NULL_RTX,
                         1, OPTAB_DIRECT);
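
To make the effect of that masking concrete, here is a minimal standalone C sketch. The constants are assumptions chosen for illustration (a 32-bit SImode alignment, a shadow shift of 3 and 8 bits per unit), and align_down and shadow_offset are hypothetical helper names, not GCC functions. Under those assumptions the base is rounded down to a 32-byte boundary, so the corresponding shadow offset is a multiple of 4 and a 4-byte shadow store would be aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only.  */
#define MODE_ALIGN_BITS 32u        /* stands in for GET_MODE_ALIGNMENT (SImode) */
#define SHADOW_SHIFT 3u            /* stands in for ASAN_SHADOW_SHIFT */
#define BITS_PER_UNIT_ASSUMED 8u   /* stands in for BITS_PER_UNIT */

/* Required base alignment in bytes: (32 << 3) / 8 = 32.  */
#define BASE_ALIGN_BYTES \
  ((MODE_ALIGN_BITS << SHADOW_SHIFT) / BITS_PER_UNIT_ASSUMED)

/* Round ADDR down to a multiple of ALIGN, which must be a power of two.
   Masking with ~(align - 1) is the same as and-ing with -align in
   two's complement arithmetic.  */
uintptr_t
align_down (uintptr_t addr, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return addr & ~(align - 1);
}

/* Offset of ADDR's shadow byte; the constant shadow base is left out.  */
uintptr_t
shadow_offset (uintptr_t addr)
{
  return addr >> SHADOW_SHIFT;
}
```

For example, align_down (0x1237, BASE_ALIGN_BYTES) gives 0x1220, and shadow_offset of that value is 0x244, which is divisible by 4.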

 +
if (use_after_return_class == -1 && pbase)
  emit_move_insn (pbase, base);
 +
 +

If you really want, just add one vertical space here, certainly not two.

base = expand_binop (Pmode, add_optab, base,
  gen_int_mode (base_offset - base_align_bias, Pmode),
  NULL_RTX, 1, OPTAB_DIRECT);
 @@ -1097,6 +1105,8 @@ asan_emit_stack_protection (rtx base, rtx pbase, 
 unsigned int alignb,
  (ASAN_RED_ZONE_SIZE >> ASAN_SHADOW_SHIFT) == 4);
shadow_mem = gen_rtx_MEM (SImode, shadow_base);
set_mem_alias_set (shadow_mem, asan_shadow_set);
 +  if (STRICT_ALIGNMENT)
 +   set_mem_align(shadow_mem, (GET_MODE_ALIGNMENT (SImode)));

set_mem_align should be indented by 4 spaces, missing space before opening (
after set_mem_align.

 @@ -1186,6 +1196,10 @@ asan_emit_stack_protection (rtx base, rtx pbase, 
 unsigned int alignb,
  
shadow_mem = gen_rtx_MEM (BLKmode, shadow_base);
set_mem_alias_set (shadow_mem, asan_shadow_set);
 +
 +  if (STRICT_ALIGNMENT)
 +   set_mem_align(shadow_mem, (GET_MODE_ALIGNMENT (SImode)));
 +

Likewise.
 @@ -1013,10 +1013,18 @@ expand_stack_vars (bool (*pred) (size_t), struct 
 stack_vars_data *data)
 if (data->asan_base == NULL)
   data->asan_base = gen_reg_rtx (Pmode);
 base = data->asan_base;
 +
 +   if (!STRICT_ALIGNMENT)
 + base_align = crtl->max_used_stack_slot_alignment;
 +   else
 + base_align = MAX(crtl->max_used_stack_slot_alignment,
 + (GET_MODE_ALIGNMENT (SImode) << ASAN_SHADOW_SHIFT));

Wrong formatting: "if" should be aligned below "base = data->asan_base",
"base_align =" two spaces more.  Space between MAX and "(", no need for the
extra parens around the <<.
   }
 else
 + {
   offset = alloc_stack_frame_space (stack_vars[i].size, alignb);
 -   base_align = crtl->max_used_stack_slot_alignment;
 + base_align = crtl->max_used_stack_slot_alignment;
 + }

Again, wrong indentation.
   }
else
   {
 @@ -1843,6 +1851,9 @@ expand_used_vars (void)
   = alloc_stack_frame_space (redzonesz, ASAN_RED_ZONE_SIZE);
 data.asan_vec.safe_push (prev_offset);
 data.asan_vec.safe_push (offset);
 +   /* Leave a space for alignment if STRICT_ALIGNMENT.  */
 +   if (STRICT_ALIGNMENT)
 + alloc_stack_frame_space ((GET_MODE_ALIGNMENT (SImode) << ASAN_SHADOW_SHIFT) / BITS_PER_UNIT , 1);

Likewise, too long line, which needs to be split, and extra space before ,

Jakub


Re: copyright dates in binutils (and includes/)

2014-03-03 Thread Richard Sandiford
Alan Modra amo...@gmail.com writes:
 On Mon, Mar 03, 2014 at 02:14:51PM +1030, Alan Modra wrote:
 I'll post update-copyright.py separately.

Thanks for doing this, looks good to me FWIW.  I don't know whether
we want to keep a single script for both GCC and binutils+gdb or fork,
but separate copies probably make sense.

As far as including include/ goes: the only reason that didn't happen
for gcc/ was because I didn't want to sort out which files were GCC-
specific and whether binutils, GCC or GDB was the master for each file.
So if we do your option (b) I think we should do (c) as well.  I suppose
that means syncing GCC's include/ with binutils+gdb and then adding
GCC's include/ to the list of approved directories.  I'm happy to try
that if it sounds OK.  (Maybe after the 4.9 release, not sure.)

Thanks,
Richard



Re: calloc = malloc + memset

2014-03-03 Thread Marc Glisse

On Mon, 3 Mar 2014, Richard Biener wrote:


That's a bit much of ad-hoc pattern-matching ... wouldn't be
p = malloc (n);
memset (p, 0, n);

transform better suited to the strlen opt pass?  After all that tracks
what 'string' is associated with a SSA name pointer through
arbitrary statements using a lattice.


Too early, it needs to run later than ldist, or there won't be any
memset to match in the std::vector case. Would you consider moving or
duplicating either strlen or ldist so they are run in the order I need?


The same probably applies to calloc(); memset (, 0,);


Oh, you mean the length doesn't have to match for calloc? That's true, I
completely missed that.


though here you
could even match points-to info (after all even only clearing a portion of
the calloc()ed memory is dead code).  If points-to conservatively computes
that the memset pointer only points to null or the memory tag the
calloc return value points to then you can discard it without further
checking ...


I'll look into it (and DSE). Note that the calloc case is just an
afterthought, what I really care about is replacing malloc.
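
For readers who want the pattern spelled out, here is a minimal C sketch of the two forms being discussed. The function names are hypothetical and chosen for illustration; the null check before memset is added here to keep the standalone example well-defined and is not part of the two-line pattern quoted above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Two-step form: allocate, then clear.  This is the shape the proposed
   transformation looks for.  */
void *
zeroed_alloc_two_step (size_t n)
{
  void *p = malloc (n);
  if (p != NULL)
    memset (p, 0, n);
  return p;
}

/* Single-call form the two steps could be replaced with.  */
void *
zeroed_alloc_calloc (size_t n)
{
  return calloc (n, 1);
}

/* Return 1 if both forms hand back N zero bytes, 0 otherwise
   (including when either allocation fails).  */
int
zeroed_allocs_agree (size_t n)
{
  assert (n > 0);   /* malloc (0) may legitimately return NULL */
  unsigned char *a = zeroed_alloc_two_step (n);
  unsigned char *b = zeroed_alloc_calloc (n);
  int same = (a != NULL && b != NULL);
  for (size_t i = 0; same && i < n; i++)
    if (a[i] != 0 || b[i] != 0)
      same = 0;
  free (a);
  free (b);
  return same;
}
```

The sketch only shows that the two forms are observably equivalent to the caller; whether the compiler may merge them depends on nothing touching the memory between the malloc and the memset, which is what the check in the patch fragment below is for.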


+  /* Finally, make sure the memory is not used before stmt2.  */
+  ao_ref ref;
+  ao_ref_init_from_ptr_and_size (&ref, ptr1, size);
+  tree vdef = gimple_vuse (stmt2);
+  if (vdef == NULL)
+    return false;
+  while (true)
+    {
+      gimple cur = SSA_NAME_DEF_STMT (vdef);
+      if (cur == stmt1) break;
+      if (stmt_may_clobber_ref_p_1 (cur, &ref))
+       return false;
+      vdef = gimple_vuse (cur);
+    }


We have walk_aliased_vdefs() for this.


As explained in the PR, walk_aliased_vdefs misses the call to malloc (it 
doesn't clobber the memory pointed to by p). You then suggested:
Exact pattern matching of the CFG involved might be the easiest, plus 
manually implementing walk_aliased_vdefs by simply walking the use-def 
chain of the virtual operands from the memset operation to the malloc and 
checking stmt_may_clobber_ref_p_1 on the ao_ref_init_from_ptr_and_size 
ref.



That said, please try to integrate this kind of transforms with
the strlen opt pass (even if it requires making its lattice more generic).


Assuming the passes have a chance of being reordered, I'll try to 
understand how strlen works.


Thanks for the comments,

--
Marc Glisse


[Patch v2] Try to peephole2 QI *4 into SI on i386

2014-03-03 Thread lin zuojian
Hi,
Patch v1 is just wrong.
--
Regards
lin zuojian

---
 gcc/config/i386/i386.md | 50 +
 1 file changed, 50 insertions(+)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b9f1320..e44fb14 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -18535,6 +18535,56 @@
   [(set_attr "type" "other")
    (set_attr "length" "3")])
 
+(define_peephole2
+ [(set (mem:QI (match_operand 0 "register_operand"))
+   (match_operand 1 "const_int_operand"))
+   (set (mem:QI (plus (match_dup 0)
+                      (const_int 1)))
+   (match_operand 2 "const_int_operand"))
+   (set (mem:QI (plus (match_dup 0)
+                      (const_int 2)))
+   (match_operand 3 "const_int_operand"))
+   (set (mem:QI (plus (match_dup 0)
+                      (const_int 3)))
+   (match_operand 4 "const_int_operand"))]
+   ""
+   [(set (mem:SI (match_dup 0))
+   (match_operand 5 "const_int_operand"))]
+{
+   int32_t _const = (INTVAL(operands[1]) & 0xff) | (((INTVAL(operands[2])) & 0xff) << 8)
+                    | (((INTVAL(operands[3])) & 0xff) << 16) | (((INTVAL(operands[4])) & 0xff) << 24);
+   operands[5] = gen_rtx_CONST_INT (SImode, _const);
+}
+)
+
+(define_peephole2
+ [(set (mem:QI (plus (match_operand 0 "register_operand")
+                     (match_operand 6 "const_int_operand")))
+   (match_operand 1 "const_int_operand"))
+   (set (mem:QI (plus (match_dup 0)
+                      (match_operand 7 "const_int_operand")))
+   (match_operand 2 "const_int_operand"))
+   (set (mem:QI (plus (match_dup 0)
+                      (match_operand 8 "const_int_operand")))
+   (match_operand 3 "const_int_operand"))
+   (set (mem:QI (plus (match_dup 0)
+                      (match_operand 9 "const_int_operand")))
+   (match_operand 4 "const_int_operand"))]
+   ""
+   [(set (mem:SI (plus (match_dup 0)
+                       (match_dup 6)))
+   (match_operand 5 "const_int_operand"))]
+{
+   if ((INTVAL(operands[7]) - INTVAL(operands[6]) != 1)
+       || (INTVAL(operands[8]) - INTVAL(operands[7]) != 1)
+       || (INTVAL(operands[9]) - INTVAL(operands[8]) != 1))
+       FAIL;
+   int32_t _const = (INTVAL(operands[1]) & 0xff) | (((INTVAL(operands[2])) & 0xff) << 8)
+                    | (((INTVAL(operands[3])) & 0xff) << 16) | (((INTVAL(operands[4])) & 0xff) << 24);
+   operands[5] = gen_rtx_CONST_INT (SImode, _const);
+}
+)
+
 (include "mmx.md")
 (include "sse.md")
 (include "sync.md")
-- 
1.8.3.2
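
The arithmetic in the preparation statements above can be tried outside the compiler. The following is a minimal C sketch, not part of the patch: pack_le32 and offsets_are_consecutive are hypothetical names, and the byte order is assumed to be little-endian as on i386. It uses unsigned arithmetic so that shifting into the top byte is well-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine four byte constants, stored at increasing addresses, into the
   32-bit value a single little-endian word store would have to write.  */
uint32_t
pack_le32 (const uint8_t b[4])
{
  assert (b != NULL);
  return (uint32_t) b[0]
         | ((uint32_t) b[1] << 8)
         | ((uint32_t) b[2] << 16)
         | ((uint32_t) b[3] << 24);
}

/* The merge is only meaningful when the four byte stores are adjacent:
   every gap between successive offsets has to be exactly 1.  */
int
offsets_are_consecutive (const long off[4])
{
  assert (off != NULL);
  return off[1] - off[0] == 1
         && off[2] - off[1] == 1
         && off[3] - off[2] == 1;
}
```

For instance, the bytes 0x11, 0x22, 0x33, 0x44 at offsets 4, 5, 6, 7 pack to 0x44332211, while offsets 4, 5, 7, 8 would have to be rejected.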



Re: C++ PATCH for c++/55877 (names for linkage purposes)

2014-03-03 Thread Dominique Dhumieres
The test g++.dg/abi/anon2.C introduced at r208157 causes:

UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler .weak(_definition)?[ 
\\t]_?_ZN2N11D1C3fn1ENS0_1BE
UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler .weak(_definition)?[ 
\\t]_?_ZN2N11D1C3fn2ES1_
UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler-not 
.weak(_definition)?[ \\t]_?_ZN2N23._31C3fn1ENS0_1BE
UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler-not 
.weak(_definition)?[ \\t]_?_ZN2N23._31C3fn2ES1_
UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler .weak(_definition)?[ 
\\t]_?_ZN2N31D1CIiE3fn1ENS0_1BE
UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler .weak(_definition)?[ 
\\t]_?_ZN2N31D1CIiE3fn2ES2_
UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler-not 
.weak(_definition)?[ \\t]_?_ZN2N43._91CIiE3fn1ENS0_1BE
UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler-not 
.weak(_definition)?[ \\t]_?_ZN2N43._91CIiE3fn2ES2_

This is silenced by the following patch

--- ../_clean/gcc/testsuite/g++.dg/abi/anon2.C  2014-02-26 10:46:45.0 
+0100
+++ gcc/testsuite/g++.dg/abi/anon2.C2014-03-01 14:45:54.0 +0100
@@ -6,9 +6,9 @@ namespace N1 {
 typedef enum { X, Y } A;
 typedef struct { } B;
 struct C {
-  // { dg-final { scan-assembler .weak\(_definition\)?\[ 
\t\]_?_ZN2N11D1C3fn1ENS0_1BE } }
+  // { dg-final { scan-assembler .weak\(_definition\)?\[ 
\t\]_?_ZN2N11D1C3fn1ENS0_1BE { target c++11 } } }
   static void fn1 (B) { }
-  // { dg-final { scan-assembler .weak\(_definition\)?\[ 
\t\]_?_ZN2N11D1C3fn2ES1_ } }
+  // { dg-final { scan-assembler .weak\(_definition\)?\[ 
\t\]_?_ZN2N11D1C3fn2ES1_ { target c++11 } } }
   static void fn2 (C) { }
 };
   } D;
@@ -22,9 +22,9 @@ namespace N2 {
 typedef enum { X, Y } A;
 typedef struct { } B;
 struct C {
-  // { dg-final { scan-assembler-not .weak\(_definition\)?\[ 
\t\]_?_ZN2N23._31C3fn1ENS0_1BE } }
+  // { dg-final { scan-assembler-not .weak\(_definition\)?\[ 
\t\]_?_ZN2N23._31C3fn1ENS0_1BE { target c++11 } } }
   static void fn1 (B) { } // { dg-error no linkage  { target c++98 } }
-  // { dg-final { scan-assembler-not .weak\(_definition\)?\[ 
\t\]_?_ZN2N23._31C3fn2ES1_ } }
+  // { dg-final { scan-assembler-not .weak\(_definition\)?\[ 
\t\]_?_ZN2N23._31C3fn2ES1_ { target c++11 } } }
   static void fn2 (C) { } // { dg-error no linkage  { target c++98 } }
 };
   } const D;
@@ -38,9 +38,9 @@ namespace N3 {
 typedef enum { X, Y } A;
 typedef struct { } B;
 template class T struct C {
-  // { dg-final { scan-assembler .weak\(_definition\)?\[ 
\t\]_?_ZN2N31D1CIiE3fn1ENS0_1BE } }
+  // { dg-final { scan-assembler .weak\(_definition\)?\[ 
\t\]_?_ZN2N31D1CIiE3fn1ENS0_1BE { target c++11 } } }
   static void fn1 (B) { }
-  // { dg-final { scan-assembler .weak\(_definition\)?\[ 
\t\]_?_ZN2N31D1CIiE3fn2ES2_ } }
+  // { dg-final { scan-assembler .weak\(_definition\)?\[ 
\t\]_?_ZN2N31D1CIiE3fn2ES2_ { target c++11 } } }
   static void fn2 (C) { }
 };
   } D;
@@ -54,9 +54,9 @@ namespace N4 {
 typedef enum { X, Y } A;
 typedef struct { } B;
 template class T struct C {
-  // { dg-final { scan-assembler-not .weak\(_definition\)?\[ 
\t\]_?_ZN2N43._91CIiE3fn1ENS0_1BE } }
+  // { dg-final { scan-assembler-not .weak\(_definition\)?\[ 
\t\]_?_ZN2N43._91CIiE3fn1ENS0_1BE { target c++11 } } }
   static void fn1 (B) { } // { not-dg-error no linkage  { target c++98 
} }
-  // { dg-final { scan-assembler-not .weak\(_definition\)?\[ 
\t\]_?_ZN2N43._91CIiE3fn2ES2_ } }
+  // { dg-final { scan-assembler-not .weak\(_definition\)?\[ 
\t\]_?_ZN2N43._91CIiE3fn2ES2_ { target c++11 } } }
   static void fn2 (C) { } // { not-dg-error no linkage  { target c++98 
} }
 };
   } const D;

Dominique


Re: C++ PATCH for c++/55877 (names for linkage purposes)

2014-03-03 Thread Jakub Jelinek
On Mon, Mar 03, 2014 at 04:49:24PM +0100, Dominique Dhumieres wrote:
 The test g++.dg/abi/anon2.C introduced at r208157 causes:
 
 UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler 
 .weak(_definition)?[ \\t]_?_ZN2N11D1C3fn1ENS0_1BE
 UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler 
 .weak(_definition)?[ \\t]_?_ZN2N11D1C3fn2ES1_
 UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler-not 
 .weak(_definition)?[ \\t]_?_ZN2N23._31C3fn1ENS0_1BE
 UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler-not 
 .weak(_definition)?[ \\t]_?_ZN2N23._31C3fn2ES1_
 UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler 
 .weak(_definition)?[ \\t]_?_ZN2N31D1CIiE3fn1ENS0_1BE
 UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler 
 .weak(_definition)?[ \\t]_?_ZN2N31D1CIiE3fn2ES2_
 UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler-not 
 .weak(_definition)?[ \\t]_?_ZN2N43._91CIiE3fn1ENS0_1BE
 UNRESOLVED: g++.dg/abi/anon2.C -std=c++98  scan-assembler-not 
 .weak(_definition)?[ \\t]_?_ZN2N43._91CIiE3fn2ES2_
 
 This is silenced by the following patch

Rainer has already posted a similar patch earlier today.

Jakub


Re: [Patch, avr] Remove atxmega16x1 from device list

2014-03-03 Thread Denis Chertykov
2014-03-03 13:41 GMT+04:00 Senthil Kumar Selvaraj
senthil_kumar.selva...@atmel.com:
 The atxmega16x1 AVR variant doesn't exist, but the device name got
 into the device list in gcc/config/avr/avr-mcus.def. This patch removes the
 non-existent device.

 If ok, could someone commit please? I don't have commit access.

 Regards
 Senthil

 2014-03-03  Senthil Kumar Selvaraj  senthil_kumar.selva...@atmel.com

 * config/avr/avr-mcus.def: Remove atxmega16x1.
 * config/avr/avr-tables.opt: Regenerate.
 * config/avr/t-multilib: Regenerate.
 * doc/avr-mmcu.texi: Regenerate.


Committed.

Denis.


Re: [PATCH, rs6000] Restrict reload use of FLOAT_REGS

2014-03-03 Thread Ulrich Weigand
David Edelsohn wrote:
 On Fri, Feb 28, 2014 at 7:11 PM, Bill Schmidt
 wschm...@linux.vnet.ibm.com wrote:
  * config/rs6000/rs6000.c (rs6000_preferred_reload_class): Disallow
  PLUS rtx's from reloading into a superset of FLOAT_REGS; relax
  constraint on constants to only prevent them from being reloaded
  into a superset of FLOAT_REGS.
 
 This is okay with me. Uli is the best one to comment if this is the right 
 test.


-  if (CONSTANT_P (x) && reg_classes_intersect_p (rclass, FLOAT_REGS))
+  if ((CONSTANT_P (x) || GET_CODE (x) == PLUS)
+      && reg_class_subset_p (FLOAT_REGS, rclass))
     return NO_REGS;

So the reg_class test change is really a no-op: given the set of classes
defined for rs6000, rclass intersects FLOAT_REGS if and only if FLOAT_REGS
is a subset of rclass ...

This test (in either form) is probably safe, but not the best we can
do when dealing with mixed superset classes like ALL_REGS, since we'll
completely reject loading a constant into ALL_REGS, even though it could
be loaded fine into a GPR.

The best way seems to be to *restrict* the preferred reload class,
but not all the way down to NO_REGS, but to the largest subclass of
the original rclass that can actually handle constants (which might
be GENERAL_REGS or BASE_REGS).  This could be implemented by
something along the lines of:

    if (CONSTANT_P (x) || GET_CODE (x) == PLUS)
      {
        if (reg_class_subset_p (GENERAL_REGS, rclass))
          return GENERAL_REGS;
        else if (reg_class_subset_p (BASE_REGS, rclass))
          return BASE_REGS;
        else
          return NO_REGS;
      }

(which is similar to how we did it for s390).
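
To see what that narrowing does for the different classes, here is a small standalone C model. Everything in it is an assumption made for illustration: the register classes are toy bitmasks (four GPRs, of which the first is not usable as a base, and four FPRs), and the functions only borrow the GCC names; none of this is the rs6000 back end.

```c
#include <assert.h>

/* Toy register classes; the enumerator names mirror GCC's, the contents
   are invented for this sketch.  */
enum reg_class
{
  NO_REGS, BASE_REGS, GENERAL_REGS, FLOAT_REGS, ALL_REGS, N_REG_CLASSES
};

/* Bits 0-3 are GPRs (bit 0 cannot serve as a base), bits 4-7 are FPRs.  */
static const unsigned class_contents[N_REG_CLASSES] = {
  0x00,   /* NO_REGS */
  0x0e,   /* BASE_REGS */
  0x0f,   /* GENERAL_REGS */
  0xf0,   /* FLOAT_REGS */
  0xff    /* ALL_REGS */
};

/* Nonzero if every register in A is also in B.  */
int
reg_class_subset_p (enum reg_class a, enum reg_class b)
{
  return (class_contents[a] & ~class_contents[b]) == 0;
}

/* Narrow RCLASS to the largest subclass that can hold a constant,
   following the if/else chain suggested above.  */
enum reg_class
preferred_class_for_constant (enum reg_class rclass)
{
  assert (rclass < N_REG_CLASSES);
  if (reg_class_subset_p (GENERAL_REGS, rclass))
    return GENERAL_REGS;
  else if (reg_class_subset_p (BASE_REGS, rclass))
    return BASE_REGS;
  else
    return NO_REGS;
}
```

In this model ALL_REGS narrows to GENERAL_REGS instead of being rejected outright, FLOAT_REGS still yields NO_REGS, and BASE_REGS is returned unchanged.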

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [testsuite, g++] Fix g++.dg/abi/anon2.C with -std=c++98

2014-03-03 Thread Jason Merrill

OK, thanks.

Jason


Re: [patch,avr] Device specific instructions support for avr

2014-03-03 Thread Denis Chertykov
2014-03-03 13:34 GMT+04:00 S, Pitchumani pitchuman...@atmel.com:
 Hi,

 A few AVR Xmega devices support instructions beyond those of the architecture
 they belong to. For example, the atxmega128b1 device supports the RMW
 instructions (XCH, LAC, LAS and LAT), but not all avrxmega6 devices do.

 Now, avr-gcc passes architecture name to assembler instead of device name. So,
 RMW instructions are not recognized (illegal opcode error) by assembler.

 To address this issue, we could add device specific ISA to device details
 in GCC. Driver can pass additional option based on specific ISA that a device
 has. Assembler can add device specific ISA to architecture ISA based on the
 option it receives.

 I have attached patches for avr-gcc.

 device-specific-isa-avr-gcc.patch:
 * Device specific ISA information is added to device details.
 * avr-gcc passes -mrmw option to assembler if the selected device
 has RMW instruction support.

I don't like the additional option '-mrmw' because we already have a way
of passing device-specific ISA.
IMHO it is better to add a new avr_arch (i.e. atxmega128b1 is ARCH_AVRXMEGA6U).
GAS already has AVR_ISA_XMEGAU for RMW instructions.

Denis.


Re: [PATCH, LIBITM] Backport libitm bug fixes to FSF 4.8

2014-03-03 Thread Richard Henderson
On 03/03/2014 04:48 AM, Torvald Riegel wrote:
 Should I add myself as maintainer for libitm?

Yes.

 Does this come with any other responsibilities than reviewing patches?

No.


r~


Re: [PATCH, rs6000] Restrict reload use of FLOAT_REGS

2014-03-03 Thread Bill Schmidt
Uli, thanks.  Sorry I misunderstood what you said the first time.  I'm
currently testing this version, which I've verified fixes the bug.  I'll
plan to commit the new version after testing provided neither you nor
David objects.

Thanks!
Bill

On Mon, 2014-03-03 at 17:17 +0100, Ulrich Weigand wrote:
 David Edelsohn wrote:
  On Fri, Feb 28, 2014 at 7:11 PM, Bill Schmidt
  wschm...@linux.vnet.ibm.com wrote:
   * config/rs6000/rs6000.c (rs6000_preferred_reload_class): Disallow
   PLUS rtx's from reloading into a superset of FLOAT_REGS; relax
   constraint on constants to only prevent them from being reloaded
   into a superset of FLOAT_REGS.
  
  This is okay with me. Uli is the best one to comment if this is the right 
  test.
 
 
 -  if (CONSTANT_P (x) && reg_classes_intersect_p (rclass, FLOAT_REGS))
 +  if ((CONSTANT_P (x) || GET_CODE (x) == PLUS)
 +      && reg_class_subset_p (FLOAT_REGS, rclass))
      return NO_REGS;
 
 So the reg_class test change is really a no-op: given the set of classes
 defined for rs6000, rclass intersects FLOAT_REGS if and only if FLOAT_REGS
 is a subset of rclass ...
 
 This test (in either form) is probably safe, but not the best we can
 do when dealing with mixed superset classes like ALL_REGS, since we'll
 completely reject loading a constant into ALL_REGS, even though it could
 be loaded fine into a GPR.
 
 The best way seems to be to *restrict* the preferred reload class,
 but not all the way down to NO_REGS, but to the largest subclass of
 the original rclass that can actually handle constants (which might
 be GENERAL_REGS or BASE_REGS).  This could be implemented by
 something along the lines of:
 
     if (CONSTANT_P (x) || GET_CODE (x) == PLUS)
       {
         if (reg_class_subset_p (GENERAL_REGS, rclass))
           return GENERAL_REGS;
         else if (reg_class_subset_p (BASE_REGS, rclass))
           return BASE_REGS;
         else
           return NO_REGS;
       }
 
 (which is similar to how we did it for s390).
 
 Bye,
 Ulrich
 



RFA: Add PchIgnore option property

2014-03-03 Thread Joern Rennecke
I've been looking at how to make the precompiled header mechanism allow
me to use the ARC -misize option (which outputs additional information
about gcc's idea of instruction addresses for the purpose of branch
shortening, to help debugging the latter) in a compilation involving
precompiled headers.
I can't use TARGET_CHECK_PCH_TARGET_FLAGS for that purpose because
-misize uses its own variable (to save on target_flags bits).
If I wanted to use the TARGET_PCH_VALID_P hook, I'd have to duplicate
lots of code from default_pch_valid_p, which is intricately tied to the
pch implementation, and also 'knows' that non-target flags never affect
pch.
Moreover, having this extra information encoded in a separate function,
instead of at the specific option(s) in config/target/target.opt, is
rather messy.

Therefore, I propose to add a new option property to mark an option that should
be ignored for the purpose of checking pch validity.

The attached patch implements this as PchIgnore.

Bootstrapped on i686-pc-linux-gnu.

OK to apply?


pch-ignore-patch
Description: Binary data


[jit] Add more syntactic sugar to C++ wrapper API

2014-03-03 Thread David Malcolm
Committed to branch dmalcolm/jit:

gcc/jit/
* libgccjit++.h (gccjit::function::operator()): Add overload for
a call with 3 arguments.
(gccjit::block::add_call): Likewise for 4 arguments.
(gccjit::rvalue::cast_to): New method.
(gccjit::rvalue::operator[]): New methods.
---
 gcc/jit/ChangeLog.jit |  8 
 gcc/jit/libgccjit++.h | 55 ++-
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index f2fea8c..dd4bf84 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,3 +1,11 @@
+2014-03-03  David Malcolm  dmalc...@redhat.com
+
+   * libgccjit++.h (gccjit::function::operator()): Add overload for
+   a call with 3 arguments.
+   (gccjit::block::add_call): Likewise for 4 arguments.
+   (gccjit::rvalue::cast_to): New method.
+   (gccjit::rvalue::operator[]): New methods.
+
 2014-02-28  David Malcolm  dmalc...@redhat.com
 
* libgccjit.c (gcc_jit_context_new_binary_op): Check that the
diff --git a/gcc/jit/libgccjit++.h b/gcc/jit/libgccjit++.h
index b77e82f..e7ff5ea 100644
--- a/gcc/jit/libgccjit++.h
+++ b/gcc/jit/libgccjit++.h
@@ -310,6 +310,8 @@ namespace gccjit
   location loc = location ());
 rvalue operator() (rvalue arg0, rvalue arg1,
   location loc = location ());
+rvalue operator() (rvalue arg0, rvalue arg1, rvalue arg2,
+  location loc = location ());
   };
 
   class block : public object
@@ -347,6 +349,9 @@ namespace gccjit
 rvalue add_call (function other,
 rvalue arg0, rvalue arg1, rvalue arg2,
 location loc = location ());
+rvalue add_call (function other,
+rvalue arg0, rvalue arg1, rvalue arg2, rvalue arg3,
+location loc = location ());
 
 void add_comment (const std::string text,
  location loc = location ());
@@ -381,7 +386,14 @@ namespace gccjit
  location loc = location ());
 
 lvalue dereference (location loc = location ());
- };
+
+rvalue cast_to (type type_,
+   location loc = location ());
+
+/* Array access.  */
+lvalue operator[] (rvalue index);
+lvalue operator[] (int index);
+  };
 
   class lvalue : public rvalue
   {
@@ -1249,6 +1261,16 @@ block::add_call (function other,
 }
 
 inline rvalue
+block::add_call (function other,
+rvalue arg0, rvalue arg1, rvalue arg2, rvalue arg3,
+location loc)
+{
+  rvalue c = get_context ().new_call (other, arg0, arg1, arg2, arg3, loc);
+  add_eval (c);
+  return c;
+}
+
+inline rvalue
 function::operator() (location loc)
 {
   return get_context ().new_call (*this, loc);
@@ -1269,6 +1291,14 @@ function::operator() (rvalue arg0, rvalue arg1,
  arg0, arg1,
  loc);
 }
+inline rvalue
+function::operator() (rvalue arg0, rvalue arg1, rvalue arg2,
+ location loc)
+{
+  return get_context ().new_call (*this,
+ arg0, arg1, arg2,
+ loc);
+}
 
 // class block
 inline block::block () : object (NULL) {}
@@ -1327,6 +1357,29 @@ rvalue::dereference (location loc)
 loc.get_inner_location ()));
 }
 
+inline rvalue
+rvalue::cast_to (type type_,
+location loc)
+{
+  return get_context ().new_cast (*this, type_, loc);
+}
+
+inline lvalue
+rvalue::operator[] (rvalue index)
+{
+  return get_context ().new_array_access (*this, index);
+}
+
+inline lvalue
+rvalue::operator[] (int index)
+{
+  context ctxt = get_context ();
+  type int_t = ctxt.get_int_type <int> ();
+  return ctxt.new_array_access (*this,
+   ctxt.new_rvalue (int_t,
+index));
+}
+
 // class lvalue : public rvalue
 inline lvalue::lvalue () : rvalue () {}
 inline lvalue::lvalue (gcc_jit_lvalue *inner)
-- 
1.7.11.7



Re: RFA: Add PchIgnore option property

2014-03-03 Thread Joseph S. Myers
On Mon, 3 Mar 2014, Joern Rennecke wrote:

 be ignored for the purpose of checking pch validity.
 
 The attached patch implements this as PchIgnore.
 
 Bootstrapped on i686-pc-linux-gnu.
 
 OK to apply?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH, libsanitizer] Disable for PowerPC little endian for now

2014-03-03 Thread Bill Schmidt
Hi,

Currently most or all of the libsanitizer tests fail for
powerpc64le-linux-gnu, and we won't be able to address this in GCC 4.9.
(We do plan to look at this in the next release.)  Richard Biener
suggested we mark it as unsupported, as this patch implements.  

Bootstrapped and tested on powerpc64le-unknown-linux-gnu, verifying that
libsanitizer is not built.  Also bootstrapped and tested on
powerpc64-unknown-linux-gnu, verifying that libsanitizer is built
normally.  Is this ok for trunk?

Thanks,
Bill


2014-03-03  Bill Schmidt  wschm...@linux.vnet.ibm.com

* configure.tgt: Unsupported for little endian PowerPC for now.


Index: libsanitizer/configure.tgt
===
--- libsanitizer/configure.tgt  (revision 208265)
+++ libsanitizer/configure.tgt  (working copy)
@@ -26,6 +26,9 @@ case ${target} in
LSAN_SUPPORTED=yes
fi
;;
+  powerpc*le-*-linux*)
+   UNSUPPORTED=1
+   ;;
   powerpc*-*-linux*)
;;
   sparc*-*-linux*)




Re: [PATCH, libsanitizer] Disable for PowerPC little endian for now

2014-03-03 Thread Jakub Jelinek
On Mon, Mar 03, 2014 at 01:21:47PM -0600, Bill Schmidt wrote:
 Currently most or all of the libsanitizer tests fail for
 powerpc64le-linux-gnu, and we won't be able to address this in GCC 4.9.
 (We do plan to look at this in the next release.)  Richard Biener
 suggested we mark it as unsupported, as this patch implements.  
 
 Bootstrapped and tested on powerpc64le-unknown-linux-gnu, verifying that
 libsanitizer is not built.  Also bootstrapped and tested on
 powerpc64-unknown-linux-gnu, verifying that libsanitizer is built
 normally.  Is this ok for trunk?

 2014-03-03  Bill Schmidt  wschm...@linux.vnet.ibm.com
 
   * configure.tgt: Unsupported for little endian PowerPC for now.

This is ok.

 --- libsanitizer/configure.tgt(revision 208265)
 +++ libsanitizer/configure.tgt(working copy)
 @@ -26,6 +26,9 @@ case ${target} in
   LSAN_SUPPORTED=yes
   fi
   ;;
 +  powerpc*le-*-linux*)
 + UNSUPPORTED=1
 + ;;
powerpc*-*-linux*)
   ;;
sparc*-*-linux*)
 

Jakub


Re: [C++ Patch] PR 60376

2014-03-03 Thread Jason Merrill

OK.

Jason


libgo patch committed: Update to Go 1.2.1 release

2014-03-03 Thread Ian Lance Taylor
I've committed a patch to libgo to update to the Go 1.2.1 release.  This
is a small patch that fixes a couple of serious bugs that have no source
code workarounds.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline.

Ian

diff -r eb03648db396 libgo/MERGE
--- a/libgo/MERGE	Mon Mar 03 07:44:35 2014 -0800
+++ b/libgo/MERGE	Mon Mar 03 12:13:10 2014 -0800
@@ -1,4 +1,4 @@
-65bf677ab8d8
+0ddbdc3c7ce2
 
 The first line of this file holds the Mercurial revision number of the
 last merge done from the master library sources.
diff -r eb03648db396 libgo/go/database/sql/sql.go
--- a/libgo/go/database/sql/sql.go	Mon Mar 03 07:44:35 2014 -0800
+++ b/libgo/go/database/sql/sql.go	Mon Mar 03 12:13:10 2014 -0800
@@ -620,8 +620,8 @@
 	}
 
 	// If db.maxOpen > 0 and the number of open connections is over the limit
-	// or there are no free connection, then make a request and wait.
-	if db.maxOpen > 0 && (db.numOpen >= db.maxOpen || db.freeConn.Len() == 0) {
+	// and there are no free connection, make a request and wait.
+	if db.maxOpen > 0 && db.numOpen >= db.maxOpen && db.freeConn.Len() == 0 {
 		// Make the connRequest channel. It's buffered so that the
 		// connectionOpener doesn't block while waiting for the req to be read.
 		ch := make(chan interface{}, 1)
diff -r eb03648db396 libgo/go/database/sql/sql_test.go
--- a/libgo/go/database/sql/sql_test.go	Mon Mar 03 07:44:35 2014 -0800
+++ b/libgo/go/database/sql/sql_test.go	Mon Mar 03 12:13:10 2014 -0800
@@ -1005,6 +1005,29 @@
 	}
 }
 
+func TestSingleOpenConn(t *testing.T) {
+	db := newTestDB(t, "people")
+	defer closeDB(t, db)
+
+	db.SetMaxOpenConns(1)
+
+	rows, err := db.Query("SELECT|people|name|")
+	if err != nil {
+		t.Fatal(err)
+	}
+	if err = rows.Close(); err != nil {
+		t.Fatal(err)
+	}
+	// shouldn't deadlock
+	rows, err = db.Query("SELECT|people|name|")
+	if err != nil {
+		t.Fatal(err)
+	}
+	if err = rows.Close(); err != nil {
+		t.Fatal(err)
+	}
+}
+
 // golang.org/issue/5323
 func TestStmtCloseDeps(t *testing.T) {
 	if testing.Short() {
diff -r eb03648db396 libgo/go/net/fd_windows.go
--- a/libgo/go/net/fd_windows.go	Mon Mar 03 07:44:35 2014 -0800
+++ b/libgo/go/net/fd_windows.go	Mon Mar 03 12:13:10 2014 -0800
@@ -513,12 +513,7 @@
 	})
 }
 
-func (fd *netFD) accept(toAddr func(syscall.Sockaddr) Addr) (*netFD, error) {
-	if err := fd.readLock(); err != nil {
-		return nil, err
-	}
-	defer fd.readUnlock()
-
+func (fd *netFD) acceptOne(toAddr func(syscall.Sockaddr) Addr, rawsa []syscall.RawSockaddrAny, o *operation) (*netFD, error) {
 	// Get new socket.
 	s, err := sysSocket(fd.family, fd.sotype, 0)
 	if err != nil {
@@ -537,9 +532,7 @@
 	}
 
 	// Submit accept request.
-	o := fd.rop
 	o.handle = s
-	var rawsa [2]syscall.RawSockaddrAny
 	o.rsan = int32(unsafe.Sizeof(rawsa[0]))
 	_, err = rsrv.ExecIO(o, AcceptEx, func(o *operation) error {
 		return syscall.AcceptEx(o.fd.sysfd, o.handle, (*byte)(unsafe.Pointer(rawsa[0])), 0, uint32(o.rsan), uint32(o.rsan), o.qty, o.o)
@@ -556,6 +549,45 @@
 		return nil, OpError{Setsockopt, fd.net, fd.laddr, err}
 	}
 
+	return netfd, nil
+}
+
+func (fd *netFD) accept(toAddr func(syscall.Sockaddr) Addr) (*netFD, error) {
+	if err := fd.readLock(); err != nil {
+		return nil, err
+	}
+	defer fd.readUnlock()
+
+	o := fd.rop
+	var netfd *netFD
+	var err error
+	var rawsa [2]syscall.RawSockaddrAny
+	for {
+		netfd, err = fd.acceptOne(toAddr, rawsa[:], o)
+		if err == nil {
+			break
+		}
+		// Sometimes we see WSAECONNRESET and ERROR_NETNAME_DELETED is
+		// returned here. These happen if connection reset is received
+		// before AcceptEx could complete. These errors relate to new
+		// connection, not to AcceptEx, so ignore broken connection and
+		// try AcceptEx again for more connections.
+		operr, ok := err.(*OpError)
+		if !ok {
+			return nil, err
+		}
+		errno, ok := operr.Err.(syscall.Errno)
+		if !ok {
+			return nil, err
+		}
+		switch errno {
+		case syscall.ERROR_NETNAME_DELETED, syscall.WSAECONNRESET:
+			// ignore these and try again
+		default:
+			return nil, err
+		}
+	}
+
 	// Get local and peer addr out of AcceptEx buffer.
 	var lrsa, rrsa *syscall.RawSockaddrAny
 	var llen, rlen int32
diff -r eb03648db396 libgo/runtime/mgc0.c
--- a/libgo/runtime/mgc0.c	Mon Mar 03 07:44:35 2014 -0800
+++ b/libgo/runtime/mgc0.c	Mon Mar 03 12:13:10 2014 -0800
@@ -1770,6 +1770,8 @@
 void
 runtime_gchelper(void)
 {
+	uint32 nproc;
+
 	gchelperstart();
 
 	// parallel mark for over gc roots
@@ -1786,7 +1788,8 @@
 
 	runtime_parfordo(work.sweepfor);
 	bufferList[runtime_m()->helpgc].busy = 0;
-	if(runtime_xadd(&work.ndone, +1) == work.nproc-1)
+	nproc = work.nproc;  // work.nproc can change right after we increment work.ndone
+	if(runtime_xadd(&work.ndone, +1) == nproc-1)
 		runtime_notewakeup(&work.alldone);
 }
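
The ordering that the mgc0.c hunk enforces can be sketched with C11 atomics.
This is only an illustration under stated assumptions: the struct and function
names are hypothetical, and atomic_fetch_add stands in for runtime_xadd (it
returns the old value, so the sketch adds 1 to get the new count).  The point
is that nproc is read before the increment is published, because the
coordinator may reuse the work structure once the last helper has counted
itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared GC work structure.  */
struct work_sketch
{
  uint32_t nproc;      /* number of procs taking part, including the caller */
  atomic_uint ndone;   /* number of helpers that have finished */
};

/* Returns 1 if the calling helper is the last one to finish, so the
   caller should wake the coordinator, and 0 otherwise.  */
static int
helper_done (struct work_sketch *w)
{
  /* Snapshot nproc first: after the increment below becomes visible,
     the coordinator may change w->nproc for the next cycle.  */
  uint32_t nproc = w->nproc;
  uint32_t done = atomic_fetch_add (&w->ndone, 1) + 1;
  return done == nproc - 1;
}
```

With nproc set to 3 there are two helpers, and only the second call reports
that it was the last.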
 


web site patch committed: Add Go note for GCC 4.9

2014-03-03 Thread Ian Lance Taylor
I committed this patch to the web site to mention that GCC 4.9 supports
Go 1.2.1.

Ian

Index: gcc-4.9/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.57
diff -u -r1.57 changes.html
--- gcc-4.9/changes.html	22 Feb 2014 09:11:43 -	1.57
+++ gcc-4.9/changes.html	3 Mar 2014 20:17:39 -
@@ -397,6 +397,12 @@
 </ul></li>
   </ul>
 
+<h3 id="go">Go</h3>
+  <ul>
+<li>GCC 4.9 provides a complete implementation of the Go 1.2.1
+release.
+  </ul>
+
 <!--
 <h3>Java (GCJ)</h3>
 -->


Re: [RFC] Do not consider volatile asms as optimization barriers #1

2014-03-03 Thread Mike Stump
On Mar 1, 2014, at 6:36 AM, Eric Botcazou ebotca...@adacore.com wrote:
 It introduces a new, temporary predicate really_volatile_insn_p

"really" is a really horrible name.  Hint: if a CS-specific Wikipedia
described what you were doing, what would the page be called?  "really" is
unlikely to be it.

Re: [PATCH][AARCH64]PR60034

2014-03-03 Thread Kugan
On 27/02/14 22:32, Marcus Shawcroft wrote:
 On 21 February 2014 04:24, Kugan kugan.vivekanandara...@linaro.org wrote:
 
 Compiling inline asm results in ICE (PR60034). Alignment calculation in
 aarch64_classify_address for (symbol_ref:DI (*.LANCHOR4) [flags
 0x182])) seems wrong here.
 
 Hi Kugan,
 
 +  else if (SYMBOL_REF_FLAGS (sym))
 + align = GET_MODE_ALIGNMENT (GET_MODE (sym));
 
 This is inserted into the LO_SUM handling in the function
 aarch64_classify_address(), the code in question is checking the
 alignment of the object to ensure that a scaled address instruction
 would be valid. The proposed code is testing if any of a bunch of
 unrelated predicate flags have been set on the symbol and using that
 to gate whether GET_MODE_ALIGNMENT would give accurate alignment
 information on the symbol. I'm not convinced that the presence of
 SYMBOL_REF_FLAGS states anything definitive about the relevance of
 GET_MODE_ALIGNMENT.   The test looks like it fails because a section
 anchor has been introduced and we fail to determine anything sensible
 about the alignment of a section anchor.  How about this instead?
 
  if (SYMBOL_REF_BLOCK (sym))
align = SYMBOL_REF_BLOCK (sym)-alignment;
 

Thanks Marcus for the explanation.  I have now changed it based on this
and regression tested on qemu-aarch64 for aarch64-none-linux-gnu with no
new regressions.

Is this OK?


 Fixing this also  caused a regression for pr38151.c, which is due to
 complex type being allocated with wrong alignment. Attached patch fixes
 these issues.
 
 It ~might~ be beneficial to increase data_alignment here as suggest
 for performance reasons, but the existing alignment should not cause
 breakage... this issue suggest to me that the SYMBOL_REF_FLAGS
 approach is at fault.
 

I am removing this hunk and will post it as a separate patch after more analysis.

Thanks,
Kugan


gcc/

2014-03-03  Kugan Vivekanandarajah  kug...@linaro.org

PR target/60034
* aarch64/aarch64.c (aarch64_classify_address): Fix alignment for
section anchor.



gcc/testsuite/

2014-03-03  Kugan Vivekanandarajah  kug...@linaro.org

PR target/60034
* gcc.target/aarch64/pr60034.c: New file.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 901ad3d..d2a9217 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3199,6 +3199,10 @@ aarch64_classify_address (struct aarch64_address_info 
*info,
}
  else if (SYMBOL_REF_DECL (sym))
align = DECL_ALIGN (SYMBOL_REF_DECL (sym));
+ else if (SYMBOL_REF_HAS_BLOCK_INFO_P (sym)
+   && SYMBOL_REF_ANCHOR_P (sym)
+   && SYMBOL_REF_BLOCK (sym) != NULL)
+   align = SYMBOL_REF_BLOCK (sym)->alignment;
  else
align = BITS_PER_UNIT;
 
diff --git a/gcc/testsuite/gcc.target/aarch64/pr60034.c 
b/gcc/testsuite/gcc.target/aarch64/pr60034.c
index e69de29..d126779 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr60034.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr60034.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99 -fgnu89-inline -O -Wall -Winline -Wwrite-strings -fmerge-all-constants -frounding-math -g -Wstrict-prototypes" } */
+
+static unsigned long global_max_fast;
+
+void __libc_mallopt (int param_number, int value)
+{
+ __asm__ __volatile__ ("# %[_SDT_A21]" :: [_SDT_A21] "nr" ((global_max_fast)));
+ global_max_fast = 1;
+}


Re: [PATCH, LIBITM] Backport libitm bug fixes to FSF 4.8

2014-03-03 Thread Peter Bergner
On Mon, 2014-03-03 at 13:48 +0100, Torvald Riegel wrote:
 On Fri, 2014-02-28 at 19:32 -0600, Peter Bergner wrote:
  I'd like to ask for permission to backport the following two LIBITM bug
  fixes to the FSF 4.8 branch.  Although these are not technically fixing
  regressions, they do fix the libitm.c/reentrant.c testsuite failure on
  s390 and powerpc (or at least it will when we finally get our power8
  code backported to FSF 4.8).  It also fixes a real bug on x86 that is
  latent because we don't currently have a test case that warms up the
  x86's RTM hardware enough such that its xbegin succeeds exposing the
  bug.  I'd like this backport so that the 4.8 based distros won't need
  to carry this as an add-on patch.
  
  It should also be fairly safe as well, since the fixed code is limited
  to the arches (x86, s390 and powerpc) that define USE_HTM_FASTPATH,
  so all others definitely won't see a difference.
 
 Looks good to me.

Ok, committed as revision 208295.  Thanks everyone!

Peter




[PATCH, i386] Fix emitting of prefetch instructions

2014-03-03 Thread Uros Bizjak
On Mon, Mar 3, 2014 at 1:10 PM, Rainer Orth r...@cebitec.uni-bielefeld.de 
wrote:

 The new gcc.target/i386/prefetchwt1-1.c test currently FAILs on Solaris 9/x86:

 FAIL: gcc.target/i386/prefetchwt1-1.c (test for excess errors)
 Excess errors:
 /var/gcc/regression/trunk/9-gcc-gas/build/gcc/include/xmmintrin.h:1195:1: 
 error:
  inlining failed in call to always_inline '_mm_prefetch': target specific 
 option
  mismatch
 /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/prefetchwt1-1.c:12:5:
 error: called from here

 gcc.target/i386/prefetchwt1-1.c: output file does not exist
 UNRESOLVED: gcc.target/i386/prefetchwt1-1.c scan-assembler [ 
 \\t]+prefetchwt1[ \
 \t]+

 This can be fixed by compiling with -msse2.

Actually, we should take prefetch instructions out of the various GCC
target pragmas. The patterns that emit these instructions are designed
to always emit the optimal prefetch instruction for the selected ISA.

The patch also changes the compiler to emit prefetchwt1 only for
_MM_HINT_T1, while for _MM_HINT_T0 it still emits prefetchw. In
addition, the patch corrects the wrong _MM_HINT_ET0 value.

The patch was bootstrapped and tested on x86_64-pc-linux-gnu {,-m32} and
committed to mainline SVN.
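
The hint encoding that the relocated _mm_prefetch depends on can be sketched
in plain C.  This assumes the operators lost from the xmmintrin.h hunk below
are `&` and `>>`; the helper names are hypothetical and exist only for
illustration.  Bit 2 of the hint selects a prefetch with intent to write, and
the low two bits carry the locality passed to __builtin_prefetch.

```c
#include <assert.h>

/* Hint values copied from the enum in the xmmintrin.h hunk.  */
enum hint_sketch
{
  HINT_ET0 = 7,
  HINT_ET1 = 6,
  HINT_T0 = 3,
  HINT_T1 = 2,
  HINT_T2 = 1,
  HINT_NTA = 0
};

/* Hypothetical helper: second argument of __builtin_prefetch,
   1 for a prefetch with intent to write, 0 for a read.  */
static int
hint_rw (int hint)
{
  return (hint & 0x4) >> 2;
}

/* Hypothetical helper: third argument of __builtin_prefetch,
   the temporal locality in the range 0..3.  */
static int
hint_locality (int hint)
{
  return hint & 0x3;
}

/* Each ET hint is the matching T hint with bit 2 set.  */
static void
check_hints (void)
{
  assert (HINT_ET0 == (HINT_T0 | 0x4));
  assert (HINT_ET1 == (HINT_T1 | 0x4));
  assert (hint_rw (HINT_ET0) == 1 && hint_locality (HINT_ET0) == 3);
  assert (hint_rw (HINT_T0) == 0 && hint_locality (HINT_T0) == 3);
}
```

Under this encoding _MM_HINT_ET0 has to be 7 so that it decodes to a write
prefetch with locality 3, which is consistent with the value used in the patch.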

2014-03-03  Uros Bizjak  ubiz...@gmail.com

* config/i386/xmmintrin.h (enum _mm_hint) <_MM_HINT_ET0>: Correct
hint value.
(_mm_prefetch): Move out of GCC target("sse") pragma.
* config/i386/prfchwintrin.h (_m_prefetchw): Move out of
GCC target("prfchw") pragma.
* config/i386/i386.md (prefetch): Emit prefetchwt1 only
for locality <= 2.
* config/i386/i386.c (ix86_option_override_internal): Enable
-mprfchw with -mprefetchwt1.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 208281)
+++ config/i386/i386.c  (working copy)
@@ -3874,8 +3874,9 @@ ix86_option_override_internal (bool main_args_p,
   || (TARGET_PRFCHW && !TARGET_3DNOW_P (opts->x_ix86_isa_flags)))
 x86_prefetch_sse = true;
 
-  /* Enable prefetch{,w} instructions for -m3dnow.  */
-  if (TARGET_3DNOW_P (opts->x_ix86_isa_flags))
+  /* Enable prefetch{,w} instructions for -m3dnow and -mprefetchwt1.  */
+  if (TARGET_3DNOW_P (opts->x_ix86_isa_flags)
+  || TARGET_PREFETCHWT1_P (opts->x_ix86_isa_flags))
 opts->x_ix86_isa_flags
   |= OPTION_MASK_ISA_PRFCHW & ~opts->x_ix86_isa_flags_explicit;
 
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 208281)
+++ config/i386/i386.md (working copy)
@@ -17867,7 +17867,7 @@
  supported by SSE counterpart or the SSE prefetch is not available
  (K6 machines).  Otherwise use SSE prefetch as it allows specifying
  of locality.  */
-  if (TARGET_PREFETCHWT1 && write)
+  if (TARGET_PREFETCHWT1 && write && locality <= 2)
 operands[2] = const2_rtx;
   else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
 operands[2] = GEN_INT (3);
Index: config/i386/prfchwintrin.h
===
--- config/i386/prfchwintrin.h  (revision 208281)
+++ config/i386/prfchwintrin.h  (working copy)
@@ -25,16 +25,9 @@
 # error Never use prfchwintrin.h directly; include x86intrin.h or 
mm3dnow.h instead.
 #endif
 
-
 #ifndef _PRFCHWINTRIN_H_INCLUDED
 #define _PRFCHWINTRIN_H_INCLUDED
 
-#ifndef __PRFCHW__
-#pragma GCC push_options
-#pragma GCC target(prfchw)
-#define __DISABLE_PRFCHW__
-#endif /* __PRFCHW__ */
-
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _m_prefetchw (void *__P)
 {
@@ -41,9 +34,4 @@ _m_prefetchw (void *__P)
   __builtin_prefetch (__P, 1, 3 /* _MM_HINT_T0 */);
 }
 
-#ifdef __DISABLE_PRFCHW__
-#undef __DISABLE_PRFCHW__
-#pragma GCC pop_options
-#endif /* __DISABLE_PRFCHW__ */
-
 #endif /* _PRFCHWINTRIN_H_INCLUDED */
Index: config/i386/xmmintrin.h
===
--- config/i386/xmmintrin.h (revision 208281)
+++ config/i386/xmmintrin.h (working copy)
@@ -33,6 +33,31 @@
 /* Get _mm_malloc () and _mm_free ().  */
 #include mm_malloc.h
 
+/* Constants for use with _mm_prefetch.  */
+enum _mm_hint
+{
+  /* _MM_HINT_ET is _MM_HINT_T with set 3rd bit.  */
+  _MM_HINT_ET0 = 7,
+  _MM_HINT_ET1 = 6,
+  _MM_HINT_T0 = 3,
+  _MM_HINT_T1 = 2,
+  _MM_HINT_T2 = 1,
+  _MM_HINT_NTA = 0
+};
+
+/* Loads one cache line from address P to a location closer to the
+   processor.  The selector I specifies the type of prefetch operation.  */
+#ifdef __OPTIMIZE__
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_prefetch (const void *__P, enum _mm_hint __I)
+{
+  __builtin_prefetch (__P, (__I & 0x4) >> 2, __I & 0x3);
+}
+#else
+#define _mm_prefetch(P, I) \
+  __builtin_prefetch ((P), ((I & 0x4) >> 2), (I & 0x3))
+#endif
+
 #ifndef __SSE__
 #pragma GCC push_options
 #pragma GCC target(sse)
@@ -50,18 +75,6 @@ typedef float __v4sf __attribute__ 

Re: [RFC] Do not consider volatile asms as optimization barriers #1

2014-03-03 Thread Eric Botcazou
 That doesn't really answer the question though.  What's the correct
 behaviour for an unspec volatile in a loop?  I don't think it's what
 we did in the example above, since it doesn't seem self-consistent.
 And not spending too much time is again a bit vague in terms of
 saying what's right and what's wrong.

"not spending too much time" is a polite way to say "I don't really care". :-)
If you do, feel free to post a formal definition, an implementation plan and
maybe a patch at some point.

-- 
Eric Botcazou


Re: [PATCH, i386] Fix emitting of prefetch instructions

2014-03-03 Thread Uros Bizjak
On Tue, Mar 4, 2014 at 12:31 AM, Uros Bizjak ubiz...@gmail.com wrote:

 The new gcc.target/i386/prefetchwt1-1.c test currently FAILs on Solaris 
 9/x86:

 FAIL: gcc.target/i386/prefetchwt1-1.c (test for excess errors)
 Excess errors:
 /var/gcc/regression/trunk/9-gcc-gas/build/gcc/include/xmmintrin.h:1195:1: 
 error:
  inlining failed in call to always_inline '_mm_prefetch': target specific 
 option
  mismatch
 /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/prefetchwt1-1.c:12:5:
 error: called from here

 gcc.target/i386/prefetchwt1-1.c: output file does not exist
 UNRESOLVED: gcc.target/i386/prefetchwt1-1.c scan-assembler [ 
 \\t]+prefetchwt1[ \
 \t]+

 This can be fixed by compiling with -msse2.

 Actually, we should take prefetch instructions out of various GCC
 target pragmas. Patterns that emit these instructions are designed to
 (depending on selected ISA) always emit  the most optimal prefetch
 instruction.

 The patch also changes the compiler to emit prefetchwt1 only for
 _MM_HINT_T1, while for _MM_HINT_T0, it still emits prefetchw. In
 addition, the patch corrects wrong MM_HINT_T0 value.

 Patch was bootstrapped and tested on x86_64-pc-linux-gnu {,-m32}  and
 committed to mainline SVN.

 2014-03-03  Uros Bizjak  ubiz...@gmail.com

 * config/i386/xmmintrin.h (enum _mm_hint) _MM_HINT_ET0: Correct
 hint value.
 (_mm_prefetch): Move out of GCC target(sse) pragma.
 * config/i386/prfchwintrin.h (_m_prefetchw): Move out of
 GCC target(prfchw) pragma.
 * config/i386/i386.md (prefetch): Emit prefetchwt1 only
 for locality = 2.
 * config/i386/i386.c (ix86_option_override_internal): Enable
 -mprfchw with -mprefetchwt1.

 BTW: There are a couple of new testsuite failures:

 FAIL: gcc.target/i386/avx512pf-vscatterpf0dpd-1.c (test for excess errors)
 UNRESOLVED: gcc.target/i386/avx512pf-vscatterpf0dpd-1.c
 scan-assembler-times vscatterpf0dpd[ t]+[^\\n]*%ymm[0-9] 2
 UNRESOLVED: gcc.target/i386/avx512pf-vscatterpf0dpd-1.c
 scan-assembler-times vscatterpf0dpd[ t]+[^\\n]*{%k[1-7] 1
 FAIL: gcc.target/i386/avx512pf-vscatterpf0dps-1.c (test for excess errors)
 UNRESOLVED: gcc.target/i386/avx512pf-vscatterpf0dps-1.c
 scan-assembler-times vscatterpf0dps[ t]+[^\\n]*%zmm[0-9] 2
 UNRESOLVED: gcc.target/i386/avx512pf-vscatterpf0dps-1.c
 scan-assembler-times vscatterpf0dps[ t]+[^\\n]*{%k[1-7] 1
 FAIL: gcc.target/i386/avx512pf-vscatterpf0qpd-1.c (test for excess errors)
 UNRESOLVED: gcc.target/i386/avx512pf-vscatterpf0qpd-1.c
 scan-assembler-times vscatterpf0qpd[ t]+[^\\n]*%zmm[0-9] 2
 UNRESOLVED: gcc.target/i386/avx512pf-vscatterpf0qpd-1.c
 scan-assembler-times vscatterpf0qpd[ t]+[^\\n]*{%k[1-7] 1
 FAIL: gcc.target/i386/avx512pf-vscatterpf0qps-1.c (test for excess errors)
 UNRESOLVED: gcc.target/i386/avx512pf-vscatterpf0qps-1.c
 scan-assembler-times vscatterpf0qps[ t]+[^\\n]*%zmm[0-9] 2
 UNRESOLVED: gcc.target/i386/avx512pf-vscatterpf0qps-1.c
 scan-assembler-times vscatterpf0qps[ t]+[^\\n]*{%k[1-7] 1

 They are all:

 FAIL: gcc.target/i386/avx512pf-vscatterpf0dpd-1.c (test for excess errors)
 Excess errors:
 /ssd/uros/gcc-build/gcc/include/avx512pfintrin.h:108:3: error: the
 last argument must be hint 0 or 1

 They are due to _MM_HINT_ET0 fix, and probably show that the pattern
 was not updated when hint constants were adjusted to 2 and 3.

 Kirill, can you please look at this inconsistency?

The attached patch fixes these failures and also improves the error message.
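
For readers who prefer C to machine-description syntax, the check performed by
the renamed predicate can be sketched as below.  The function name is
hypothetical; the accepted values are taken from the predicates.md hunk that
follows, and they match _MM_HINT_T1 (2), _MM_HINT_T0 (3), _MM_HINT_ET1 (6) and
_MM_HINT_ET0 (7) from the enum in the earlier xmmintrin.h change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical C equivalent of the const2367_operand test: accept only
   the hint constants 2, 3, 6 and 7.  */
static bool
is_valid_scatter_hint (long i)
{
  return i == 2 || i == 3 || i == 6 || i == 7;
}

/* The old predicate accepted 5 and rejected 7, which is why a hint of
   7 was reported as an error before this fix.  */
static void
check_scatter_hints (void)
{
  assert (is_valid_scatter_hint (7));
  assert (!is_valid_scatter_hint (5));
}
```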

Uros.
Index: i386.c
===
--- i386.c  (revision 208296)
+++ i386.c  (working copy)
@@ -36022,7 +36022,7 @@ addcarryx:
 
   if (!insn_data[icode].operand[4].predicate (op4, mode4))
{
- error (the last argument must be hint 0 or 1);
+ error (incorrect hint operand);
  return const0_rtx;
}
 
Index: predicates.md
===
--- predicates.md   (revision 208295)
+++ predicates.md   (working copy)
@@ -660,12 +660,12 @@
   return i == 2 || i == 4 || i == 8;
 })
 
-;; Match 2, 3, 5, or 6
-(define_predicate const2356_operand
+;; Match 2, 3, 6, or 7
+(define_predicate const2367_operand
   (match_code const_int)
 {
   HOST_WIDE_INT i = INTVAL (op);
-  return i == 2 || i == 3 || i == 5 || i == 6;
+  return i == 2 || i == 3 || i == 6 || i == 7;
 })
 
 ;; Match 1, 2, 4, or 8
Index: sse.md
===
--- sse.md  (revision 208295)
+++ sse.md  (working copy)
@@ -12652,7 +12652,7 @@
  [(match_operand 2 vsib_address_operand)
   (match_operand:VI48_512 1 register_operand)
   (match_operand:SI 3 const1248_operand)]))
-  (match_operand:SI 4 const2356_operand)]
+  (match_operand:SI 4 const2367_operand)]
  UNSPEC_SCATTER_PREFETCH)]
   TARGET_AVX512PF
 {
@@ -12670,7 +12670,7 @@
(match_operand:VI48_512 1 

Go patch committed: Fix reloc info for circular immutable structs

2014-03-03 Thread Ian Lance Taylor
In Go, unlike C, there can be mutually recursive static initializers.
For type descriptors, those initializers can be common, in the sense
that they can appear in multiple object files.  This is done using
make_decl_one_only.  That function only computes relocation information
correctly when the initializer is set, so we only call it at that point.
(When the relocation information is incorrect, on some systems the
variable will be put in a writable .rodata section, which makes little
sense.)  If there is a mutually recursive reference, then one of the
initializers needs to be computed before the other is set.  To work
around this issue, we temporarily mark the variable as weak after it has
been created but before the initializer is set, which should ensure that
the right relocation is computed.  This is definitely a hack, but it
should suffice for GCC 4.9.

Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline.

Ian


2014-03-03  Ian Lance Taylor  i...@google.com

* go-gcc.cc (Gcc_backend::immutable_struct): If IS_COMMON, set
DECL_WEAK.
(Gcc_backend::immutable_struct_set_init): If IS_COMMON, clear
DECL_WEAK.


Index: go-gcc.cc
===
--- go-gcc.cc	(revision 208300)
+++ go-gcc.cc	(working copy)
@@ -1871,7 +1871,7 @@
 
 Bvariable*
 Gcc_backend::immutable_struct(const std::string& name, bool is_hidden,
-			  bool, Btype* btype, Location location)
+			  bool is_common, Btype* btype, Location location)
 {
   tree type_tree = btype->get_tree();
   if (type_tree == error_mark_node)
@@ -1888,6 +1888,21 @@
   if (!is_hidden)
 TREE_PUBLIC(decl) = 1;
 
+  // When the initializer for one immutable_struct refers to another,
+  // it needs to know the visibility of the referenced struct so that
+  // compute_reloc_for_constant will return the right value.  On many
+  // systems calling make_decl_one_only will mark the decl as weak,
+  // which will change the return value of compute_reloc_for_constant.
+  // We can't reliably call make_decl_one_only yet, because we don't
+  // yet know the initializer.  This issue doesn't arise in C because
+  // Go initializers, unlike C initializers, can be indirectly
+  // recursive.  To ensure that compute_reloc_for_constant computes
+  // the right value if some other initializer refers to this one, we
+  // mark this symbol as weak here.  We undo that below in
+  // immutable_struct_set_init before calling mark_decl_one_only.
+  if (is_common)
+DECL_WEAK(decl) = 1;
+
   // We don't call rest_of_decl_compilation until we have the
   // initializer.
 
@@ -1910,9 +1925,13 @@
 
   DECL_INITIAL(decl) = init_tree;
 
-  // We can't call make_decl_one_only until we set DECL_INITIAL.
+  // Now that DECL_INITIAL is set, we can't call make_decl_one_only.
+  // See the comment where DECL_WEAK is set in immutable_struct.
   if (is_common)
-make_decl_one_only(decl, DECL_ASSEMBLER_NAME(decl));
+{
+  DECL_WEAK(decl) = 0;
+  make_decl_one_only(decl, DECL_ASSEMBLER_NAME(decl));
+}
 
   // These variables are often unneeded in the final program, so put
   // them in their own section so that linker GC can discard them.


[PATCH v6] PR middle-end/60281

2014-03-03 Thread lin zuojian
Hi,
This patch fixes the formatting problems that Jakub pointed out.
--
Regards
lin zuojian

--
Without aligning the asan stack base, the base is only 64-bit aligned on ARM
machines.
But asan requires a 256-bit aligned base, for two reasons:
1. The right shift by ASAN_SHADOW_SHIFT (which is 3) requires the low 3 bits
to be zero.
2. The store multiple/load multiple instructions require the next 2 bits to be
zero.

Together, the lowest 5 bits must be zero, which means a 32-byte (256-bit)
alignment.
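
The arithmetic above can be checked with a small C sketch.  It assumes
GET_MODE_ALIGNMENT (SImode) is 32 bits and BITS_PER_UNIT is 8, as on ARM; the
macro and function names are hypothetical, and the constant shadow offset is
ignored because it does not affect the low bits discussed here.

```c
#include <assert.h>
#include <stdint.h>

#define SHADOW_SHIFT 3         /* ASAN_SHADOW_SHIFT, per the text above */
#define SIMODE_ALIGN_BITS 32   /* assumed GET_MODE_ALIGNMENT (SImode) */
#define BITS_PER_BYTE 8        /* assumed BITS_PER_UNIT */

/* Required alignment of the stack base in bytes: (32 << 3) / 8 = 32.  */
static uintptr_t
asan_base_align_bytes (void)
{
  return ((uintptr_t) SIMODE_ALIGN_BITS << SHADOW_SHIFT) / BITS_PER_BYTE;
}

/* Round BASE down to that alignment, which is the effect of the AND
   with the negated alignment in the asan.c hunk.  */
static uintptr_t
align_base_down (uintptr_t base)
{
  return base & ~(asan_base_align_bytes () - 1);
}

/* Shadow index of ADDR, without the constant shadow offset.  */
static uintptr_t
shadow_index (uintptr_t addr)
{
  return addr >> SHADOW_SHIFT;
}

/* A 32-byte aligned base gives a shadow index that is a multiple of 4,
   so word-sized shadow accesses are themselves aligned.  */
static void
check_alignment (void)
{
  uintptr_t base = align_base_down (0x1038);
  assert (base == 0x1020);
  assert (shadow_index (base) % 4 == 0);
}
```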

* asan.c (asan_emit_stack_protection): Force the base to align to
appropriate bits if STRICT_ALIGNMENT.  Set shadow_mem align to
appropriate bits if STRICT_ALIGNMENT.
* cfgexpand.c (expand_stack_vars): Set base_align appropriately
when asan is on.
(expand_used_vars): Leave a space in the stack frame for alignment if
STRICT_ALIGNMENT.

Signed-off-by: lin zuojian manjian2...@gmail.com
---
 gcc/ChangeLog   | 10 ++
 gcc/asan.c  | 15 +++
 gcc/cfgexpand.c | 16 ++--
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6f5bd57..6cd1ba8 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2014-03-03  Lin Zuojian  manjian2...@gmail.com
+   PR middle-end/60281
+   * asan.c (asan_emit_stack_protection): Force the base to align to
+   appropriate bits if STRICT_ALIGNMENT.  Set shadow_mem align to 
+   appropriate bits if STRICT_ALIGNMENT.
+   * cfgexpand.c (expand_stack_vars): Set base_align appropriately 
+   when asan is on.
+   (expand_used_vars): Leave a space in the stack frame for alignment if
+   STRICT_ALIGNMENT.
+
 2014-03-03  Uros Bizjak  ubiz...@gmail.com
 
* config/i386/xmmintrin.h (enum _mm_hint) _MM_HINT_ET0: Correct
diff --git a/gcc/asan.c b/gcc/asan.c
index 53992a8..28a476f 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1017,8 +1017,17 @@ asan_emit_stack_protection (rtx base, rtx pbase, 
unsigned int alignb,
base_align_bias = ((asan_frame_size + alignb - 1)
& ~(alignb - HOST_WIDE_INT_1)) - asan_frame_size;
 }
+  /* Align base if target is STRICT_ALIGNMENT.  */
+  if (STRICT_ALIGNMENT)
+base = expand_binop (Pmode, and_optab, base,
+gen_int_mode (-((GET_MODE_ALIGNMENT (SImode)
+  << ASAN_SHADOW_SHIFT)
+/ BITS_PER_UNIT), Pmode), NULL_RTX,
+1, OPTAB_DIRECT);
+
   if (use_after_return_class == -1 && pbase)
+
   base = expand_binop (Pmode, add_optab, base,
   gen_int_mode (base_offset - base_align_bias, Pmode),
   NULL_RTX, 1, OPTAB_DIRECT);
@@ -1097,6 +1106,8 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned 
int alignb,
   (ASAN_RED_ZONE_SIZE >> ASAN_SHADOW_SHIFT) == 4);
   shadow_mem = gen_rtx_MEM (SImode, shadow_base);
   set_mem_alias_set (shadow_mem, asan_shadow_set);
+  if (STRICT_ALIGNMENT)
+set_mem_align (shadow_mem, (GET_MODE_ALIGNMENT (SImode)));
   prev_offset = base_offset;
   for (l = length; l; l -= 2)
 {
@@ -1186,6 +1197,10 @@ asan_emit_stack_protection (rtx base, rtx pbase, 
unsigned int alignb,
 
   shadow_mem = gen_rtx_MEM (BLKmode, shadow_base);
   set_mem_alias_set (shadow_mem, asan_shadow_set);
+
+  if (STRICT_ALIGNMENT)
+set_mem_align (shadow_mem, (GET_MODE_ALIGNMENT (SImode)));
+
   prev_offset = base_offset;
   last_offset = base_offset;
   last_size = 0;
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 5c23b72..c72c30f 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1013,10 +1013,18 @@ expand_stack_vars (bool (*pred) (size_t), struct 
stack_vars_data *data)
  if (data->asan_base == NULL)
data->asan_base = gen_reg_rtx (Pmode);
  base = data->asan_base;
+
+  if (!STRICT_ALIGNMENT)
+base_align = crtl->max_used_stack_slot_alignment;
+  else
+base_align = MAX (crtl->max_used_stack_slot_alignment,
+GET_MODE_ALIGNMENT (SImode) << ASAN_SHADOW_SHIFT);
}
  else
-   offset = alloc_stack_frame_space (stack_vars[i].size, alignb);
- base_align = crtl->max_used_stack_slot_alignment;
+   {
+ offset = alloc_stack_frame_space (stack_vars[i].size, alignb);
+ base_align = crtl->max_used_stack_slot_alignment;
+   }
}
   else
{
@@ -1843,6 +1851,10 @@ expand_used_vars (void)
= alloc_stack_frame_space (redzonesz, ASAN_RED_ZONE_SIZE);
  data.asan_vec.safe_push (prev_offset);
  data.asan_vec.safe_push (offset);
+ /* Leave a space for alignment if STRICT_ALIGNMENT.  */
+ if (STRICT_ALIGNMENT)
+   alloc_stack_frame_space ((GET_MODE_ALIGNMENT (SImode)
+   << ASAN_SHADOW_SHIFT) / BITS_PER_UNIT, 1);
 
  var_end_seq
= 

[Patch v3] Try to peephole2 QI *4 into SI on i386

2014-03-03 Thread lin zuojian
Hi,
Patches v1 and v2 both have problems. Please check out v3.
--
Regards
lin zuojian
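
The constant that the peephole below builds can be sketched in standalone C.
This assumes a little-endian target such as i386 and that the operators lost
from the patch text are `&` and `<<`; the function names are hypothetical.
The second helper checks that four single-byte stores and one combined word
store leave the same bytes in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Combine four byte constants, lowest address first, into the 32-bit
   value that one word store writes on a little-endian target.  */
static uint32_t
combine_bytes (uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
  return (uint32_t) b0 | ((uint32_t) b1 << 8)
	 | ((uint32_t) b2 << 16) | ((uint32_t) b3 << 24);
}

/* Return 1 if storing the bytes one at a time and storing the combined
   word in little-endian order produce identical memory.  The word is
   written out byte by byte so the check does not depend on the host.  */
static int
stores_match (uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
  uint8_t bytewise[4] = { b0, b1, b2, b3 };
  uint8_t wordwise[4];
  uint32_t word = combine_bytes (b0, b1, b2, b3);
  int i;

  for (i = 0; i < 4; i++)
    wordwise[i] = (uint8_t) (word >> (8 * i));
  return memcmp (bytewise, wordwise, sizeof bytewise) == 0;
}
```

Masking each operand with 0xff in the patch serves the same purpose as the
uint8_t parameters here: a negative constant must not spill into the
neighbouring bytes.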

---
 gcc/config/i386/i386.md | 49 +
 1 file changed, 49 insertions(+)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b9f1320..80d3bf7 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -18535,6 +18535,55 @@
   [(set_attr type other)
(set_attr length 3)])
 
+(define_peephole2
+ [(set (mem:QI (match_operand 0 register_operand))
+   (match_operand 1 const_int_operand))
+   (set (mem:QI (plus (match_dup 0)
+(const_int 1)))
+   (match_operand 2 const_int_operand))
+   (set (mem:QI (plus (match_dup 0)
+(const_int 2)))
+   (match_operand 3 const_int_operand))
+   (set (mem:QI (plus (match_dup 0)
+(const_int 3)))
+   (match_operand 4 const_int_operand))]
+   
+   [(set (mem:SI (match_dup 0))
+   (match_operand 5 const_int_operand))]
+{
+   int32_t _const = (INTVAL(operands[1]) & 0xff) | (((INTVAL(operands[2])) & 0xff) << 8)
+       | (((INTVAL(operands[3])) & 0xff) << 16) | (((INTVAL(operands[4])) & 0xff) << 24);
+   operands[5] = gen_rtx_CONST_INT (SImode, _const);
+}
+)
+
+(define_peephole2
+ [(set (mem:QI (plus (match_operand 0 register_operand)
+(match_operand 6 
const_int_operand)))
+   (match_operand 1 const_int_operand))
+   (set (mem:QI (plus (match_dup 0)
+(match_operand 7 
const_int_operand)))
+   (match_operand 2 const_int_operand))
+   (set (mem:QI (plus (match_dup 0)
+(match_operand 8 
const_int_operand)))
+   (match_operand 3 const_int_operand))
+   (set (mem:QI (plus (match_dup 0)
+(match_operand 9 
const_int_operand)))
+   (match_operand 4 const_int_operand))]
+   
+   [(set (mem:SI (match_dup 0))
+   (match_operand 5 const_int_operand))]
+{
+   if ((INTVAL(operands[7]) - INTVAL(operands[6]) != 1)
+   || (INTVAL(operands[8]) - INTVAL(operands[7]) != 1)
+   || (INTVAL(operands[9]) - INTVAL(operands[8]) != 1))
+   FAIL;
+   int32_t _const = (INTVAL(operands[1]) & 0xff) | (((INTVAL(operands[2])) & 0xff) << 8)
+       | (((INTVAL(operands[3])) & 0xff) << 16) | (((INTVAL(operands[4])) & 0xff) << 24);
+   operands[5] = gen_rtx_CONST_INT (SImode, _const);
+}
+)
+
 (include mmx.md)
 (include sse.md)
 (include sync.md)
-- 
1.8.3.2




Re: [Patch] Try to peephole2 QI *4 into SI on i386

2014-03-03 Thread Andrew Pinski
On Mon, Mar 3, 2014 at 5:02 AM, lin zuojian manjian2...@gmail.com wrote:

Testcase?
How about making a generic pass which does this?

See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23684 also.  At the
same time this can be used to do the store pair optimization for
ARM/AARCH64 too.

Thanks,
Andrew



 ---
  gcc/config/i386/i386.md | 49 
 +
  1 file changed, 49 insertions(+)

 diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
 index b9f1320..86ab025 100644
 --- a/gcc/config/i386/i386.md
 +++ b/gcc/config/i386/i386.md
 @@ -18535,6 +18535,55 @@
[(set_attr type other)
 (set_attr length 3)])

 +(define_peephole2
 + [(set (mem:QI (match_operand 0 register_operand))
 +   (match_operand 1 const_int_operand))
 +   (set (mem:QI (plus (match_dup 0)
 +(const_int 1)))
 +   (match_operand 2 const_int_operand))
 +   (set (mem:QI (plus (match_dup 0)
 +(const_int 2)))
 +   (match_operand 3 const_int_operand))
 +   (set (mem:QI (plus (match_dup 0)
 +(const_int 3)))
 +   (match_operand 4 const_int_operand))]
 +   
 +   [(set (mem:SI (match_dup 0))
 +   (match_operand 5 const_int_operand))]
 +{
 +   int32_t _const = (INTVAL(operands[1])  0xff) | 
 (((INTVAL(operands[2]))  0xff)   8)
 + 
   | (((INTVAL(operands[3]))  0xff)   16) | 
 (((INTVAL(operands[4]))  0xff)   24);
 +   operands[5] = gen_rtx_CONST_INT (SImode, _const);
 +}
 +)
 +
 +(define_peephole2
 + [(set (mem:QI (plus (match_operand 0 register_operand)
 +(match_operand 6 
 const_int_operand)))
 +   (match_operand 1 const_int_operand))
 +   (set (mem:QI (plus (match_dup 0)
 +(match_operand 7 
 const_int_operand)))
 +   (match_operand 2 const_int_operand))
 +   (set (mem:QI (plus (match_dup 0)
 +(match_operand 8 
 const_int_operand)))
 +   (match_operand 3 const_int_operand))
 +   (set (mem:QI (plus (match_dup 0)
 +(match_operand 9 
 const_int_operand)))
 +   (match_operand 4 const_int_operand))]
 +   
 +   [(set (mem:SI (match_dup 0))
 +   (match_operand 5 const_int_operand))]
 +{
 +   if ((INTVAL(operands[7]) - INTVAL(operands[6]) != 1)
 +(INTVAL(operands[8]) - INTVAL(operands[7]) != 1)
 +(INTVAL(operands[9]) - INTVAL(operands[8]) != 1))
 +   FAIL;
 +   int32_t _const = (INTVAL(operands[1])  0xff) | 
 (((INTVAL(operands[2]))  0xff)   8)
 + 
   | (((INTVAL(operands[3]))  0xff)   16) | 
 (((INTVAL(operands[4]))  0xff)   24);
 +   operands[5] = gen_rtx_CONST_INT (SImode, _const);
 +}
 +)
 +
  (include mmx.md)
  (include sse.md)
  (include sync.md)
 --
 1.8.3.2


 - End forwarded message -


Re: [Patch] Try to peephole2 QI *4 into SI on i386

2014-03-03 Thread lin zuojian
On Mon, Mar 03, 2014 at 07:19:51PM -0800, Andrew Pinski wrote:
 On Mon, Mar 3, 2014 at 5:02 AM, lin zuojian manjian2...@gmail.com wrote:
 
 Testcase?
The test case is the same as the one in
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23684.
 How about making a generic pass which does this?
Maybe a GIMPLE pass should be added for it. The SLP pass already handles
it, but only at -O3.
 
 See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23684 also.  At the
 same time this can be used to do the store pair optimization for
 ARM/AARCH64 too.
 
 Thanks,
 Andrew
 
 
 
  ---
   gcc/config/i386/i386.md | 49 
  +
   1 file changed, 49 insertions(+)
 
  diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
  index b9f1320..86ab025 100644
  --- a/gcc/config/i386/i386.md
  +++ b/gcc/config/i386/i386.md
  @@ -18535,6 +18535,55 @@
   [(set_attr "type" "other")
    (set_attr "length" "3")])
 
  +(define_peephole2
  +  [(set (mem:QI (match_operand 0 "register_operand"))
  +        (match_operand 1 "const_int_operand"))
  +   (set (mem:QI (plus (match_dup 0)
  +                      (const_int 1)))
  +        (match_operand 2 "const_int_operand"))
  +   (set (mem:QI (plus (match_dup 0)
  +                      (const_int 2)))
  +        (match_operand 3 "const_int_operand"))
  +   (set (mem:QI (plus (match_dup 0)
  +                      (const_int 3)))
  +        (match_operand 4 "const_int_operand"))]
  +  ""
  +  [(set (mem:SI (match_dup 0))
  +        (match_operand 5 "const_int_operand"))]
  +{
  +  int32_t _const = (INTVAL(operands[1]) & 0xff) | (((INTVAL(operands[2])) & 0xff) << 8)
  +                   | (((INTVAL(operands[3])) & 0xff) << 16) | (((INTVAL(operands[4])) & 0xff) << 24);
  +  operands[5] = gen_rtx_CONST_INT (SImode, _const);
  +}
  +)
  +
  +(define_peephole2
  +  [(set (mem:QI (plus (match_operand 0 "register_operand")
  +                      (match_operand 6 "const_int_operand")))
  +        (match_operand 1 "const_int_operand"))
  +   (set (mem:QI (plus (match_dup 0)
  +                      (match_operand 7 "const_int_operand")))
  +        (match_operand 2 "const_int_operand"))
  +   (set (mem:QI (plus (match_dup 0)
  +                      (match_operand 8 "const_int_operand")))
  +        (match_operand 3 "const_int_operand"))
  +   (set (mem:QI (plus (match_dup 0)
  +                      (match_operand 9 "const_int_operand")))
  +        (match_operand 4 "const_int_operand"))]
  +  ""
  +  [(set (mem:SI (match_dup 0))
  +        (match_operand 5 "const_int_operand"))]
  +{
  +  if ((INTVAL(operands[7]) - INTVAL(operands[6]) != 1)
  +      && (INTVAL(operands[8]) - INTVAL(operands[7]) != 1)
  +      && (INTVAL(operands[9]) - INTVAL(operands[8]) != 1))
  +    FAIL;
  +  int32_t _const = (INTVAL(operands[1]) & 0xff) | (((INTVAL(operands[2])) & 0xff) << 8)
  +                   | (((INTVAL(operands[3])) & 0xff) << 16) | (((INTVAL(operands[4])) & 0xff) << 24);
  +  operands[5] = gen_rtx_CONST_INT (SImode, _const);
  +}
  +)
  +
   (include "mmx.md")
   (include "sse.md")
   (include "sync.md")
  --
  1.8.3.2
 
 
  - End forwarded message -


Website patch committed: Add missing </li>

2014-03-03 Thread Ian Lance Taylor
Gerald's script pointed out that I forgot to add a </li> to my recent
patch to gcc-4.9/changes.html.  Fixed with the appended.  Committed.

Ian

Index: gcc-4.9/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.58
diff -u -r1.58 changes.html
--- gcc-4.9/changes.html	3 Mar 2014 20:17:52 -	1.58
+++ gcc-4.9/changes.html	4 Mar 2014 05:39:47 -
@@ -400,7 +400,7 @@
 <h3 id="go">Go</h3>
   <ul>
     <li>GCC 4.9 provides a complete implementation of the Go 1.2.1
-    release.
+      release.</li>
   </ul>
 
 <!--


Merge from trunk to gccgo branch

2014-03-03 Thread Ian Lance Taylor
I merged GCC trunk revision 208301 to the gccgo branch.

Ian


Re: [Patch] Try to peephole2 QI *4 into SI on i386

2014-03-03 Thread lin zuojian
Please also note that my patch targets code that appears after expansion
has already been done, such as instrumentation.
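
To make the intended transformation concrete, here is a minimal C sketch of what the peephole is trying to do: four adjacent byte stores of constants become one 32-bit store whose constant is packed lowest address first. The names pack_le32, store_bytes and store_merged are hypothetical and are not part of the patch; the sketch assumes a little-endian target such as x86.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack four byte constants into one 32-bit constant, lowest address
   first.  This mirrors how the patch computes operands[5].  */
static uint32_t
pack_le32 (uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
  return (uint32_t) b0
         | ((uint32_t) b1 << 8)
         | ((uint32_t) b2 << 16)
         | ((uint32_t) b3 << 24);
}

/* The shape of the code before merging: four QImode stores.  */
static void
store_bytes (unsigned char *p)
{
  assert (p != NULL);
  p[0] = 0x11;
  p[1] = 0x22;
  p[2] = 0x33;
  p[3] = 0x44;
}

/* The bytes that a single little-endian 32-bit store of the packed
   constant would write, spelled out byte by byte so the sketch does not
   depend on the host's endianness.  */
static void
store_merged (unsigned char *p)
{
  uint32_t v = pack_le32 (0x11, 0x22, 0x33, 0x44);

  assert (p != NULL);
  for (int i = 0; i < 4; i++)
    p[i] = (unsigned char) (v >> (8 * i));
}
```

Both functions leave the same four bytes in memory, which is the property any such merge has to preserve.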
--
Regards
lin zuojian

On Tue, Mar 04, 2014 at 11:44:06AM +0800, lin zuojian wrote:
 On Mon, Mar 03, 2014 at 07:19:51PM -0800, Andrew Pinski wrote:
  On Mon, Mar 3, 2014 at 5:02 AM, lin zuojian manjian2...@gmail.com wrote:
  
  Testcase?
 The test case is the same as the one in
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23684.
  How about making a generic pass which does this?
 Maybe GIMPLE should add a pass for it. The SLP pass already handles it, but
 it is only enabled at -O3.
  
  See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23684 also.  At the
  same time this can be used to do the store pair optimization for
  ARM/AARCH64 too.
  
  Thanks,
  Andrew
  
  
  


Re: [RFC] Do not consider volatile asms as optimization barriers #1

2014-03-03 Thread Richard Sandiford
Richard Sandiford rdsandif...@googlemail.com writes:
 I'll run a full test overnight, but does this look like it might be
 a way out, at least for 4.9?

FWIW, it passed testing on x86_64-linux-gnu ({,-m32}, all,ada).
Here it is again with an updated cselib.c comment.  OK to install?

Thanks,
Richard


gcc/
* builtins.c (expand_builtin_setjmp_receiver): Use and clobber
hard_frame_pointer_rtx.
* cse.c (cse_insn): Remove volatile check.
* cselib.c (cselib_process_insn): Likewise.
* dse.c (scan_insn): Likewise.

Index: gcc/builtins.c
===
--- gcc/builtins.c  2014-03-03 21:47:59.749026019 +
+++ gcc/builtins.c  2014-03-03 21:48:00.550030853 +
@@ -910,18 +910,27 @@ expand_builtin_setjmp_receiver (rtx rece
 #ifdef HAVE_nonlocal_goto
   if (! HAVE_nonlocal_goto)
 #endif
-/* First adjust our frame pointer to its actual value.  It was
-   previously set to the start of the virtual area corresponding to
-   the stacked variables when we branched here and now needs to be
-   adjusted to the actual hardware fp value.
-
-   Assignments to virtual registers are converted by
-   instantiate_virtual_regs into the corresponding assignment
-   to the underlying register (fp in this case) that makes
-   the original assignment true.
-   So the following insn will actually be decrementing fp by
-   STARTING_FRAME_OFFSET.  */
-emit_move_insn (virtual_stack_vars_rtx, hard_frame_pointer_rtx);
+{
+  /* First adjust our frame pointer to its actual value.  It was
+previously set to the start of the virtual area corresponding to
+the stacked variables when we branched here and now needs to be
+adjusted to the actual hardware fp value.
+
+Assignments to virtual registers are converted by
+instantiate_virtual_regs into the corresponding assignment
+to the underlying register (fp in this case) that makes
+the original assignment true.
+So the following insn will actually be decrementing fp by
+STARTING_FRAME_OFFSET.  */
+  emit_move_insn (virtual_stack_vars_rtx, hard_frame_pointer_rtx);
+
+  /* Restoring the frame pointer also modifies the hard frame pointer.
+Mark it used (so that the previous assignment remains live once
+the frame pointer is eliminated) and clobbered (to represent the
+implicit update from the assignment).  */
+  emit_use (hard_frame_pointer_rtx);
+  emit_clobber (hard_frame_pointer_rtx);
+}
 
 #if !HARD_FRAME_POINTER_IS_ARG_POINTER
   if (fixed_regs[ARG_POINTER_REGNUM])
@@ -965,8 +974,7 @@ expand_builtin_setjmp_receiver (rtx rece
 
   /* We must not allow the code we just generated to be reordered by
  scheduling.  Specifically, the update of the frame pointer must
- happen immediately, not later.  Similarly, we must block
- (frame-related) register values to be used across this code.  */
+ happen immediately, not later.  */
   emit_insn (gen_blockage ());
 }
 
Index: gcc/cse.c
===
--- gcc/cse.c   2014-03-03 21:47:59.869026741 +
+++ gcc/cse.c   2014-03-03 21:48:00.625031305 +
@@ -5664,11 +5664,6 @@ cse_insn (rtx insn)
  invalidate (XEXP (dest, 0), GET_MODE (dest));
   }
 
-  /* A volatile ASM or an UNSPEC_VOLATILE invalidates everything.  */
-  if (NONJUMP_INSN_P (insn)
-      && volatile_insn_p (PATTERN (insn)))
-    flush_hash_table ();
-
   /* Don't cse over a call to setjmp; on some machines (eg VAX)
  the regs restored by the longjmp come from a later time
  than the setjmp.  */
Index: gcc/cselib.c
===
--- gcc/cselib.c2014-03-03 21:47:59.870026748 +
+++ gcc/cselib.c2014-03-03 22:09:24.211994918 +
@@ -2626,12 +2626,10 @@ cselib_process_insn (rtx insn)
 
   cselib_current_insn = insn;
 
-  /* Forget everything at a CODE_LABEL, a volatile insn, or a setjmp.  */
+  /* Forget everything at a CODE_LABEL or a setjmp.  */
   if ((LABEL_P (insn)
        || (CALL_P (insn)
-           && find_reg_note (insn, REG_SETJMP, NULL))
-       || (NONJUMP_INSN_P (insn)
-           && volatile_insn_p (PATTERN (insn))))
+           && find_reg_note (insn, REG_SETJMP, NULL)))
       && !cselib_preserve_constants)
 {
   cselib_reset_table (next_uid);
Index: gcc/dse.c
===
--- gcc/dse.c   2014-03-03 21:47:59.871026754 +
+++ gcc/dse.c   2014-03-03 21:48:00.627031317 +
@@ -2470,16 +2470,6 @@ scan_insn (bb_info_t bb_info, rtx insn)
   return;
 }
 
-  /* Cselib clears the table for this case, so we have to essentially
- do the same.  */
-  if (NONJUMP_INSN_P (insn)
-      && volatile_insn_p (PATTERN (insn)))
-    {
-      add_wild_read (bb_info);
-      insn_info->cannot_delete = true;
-  

Re: [RFC] Do not consider volatile asms as optimization barriers #1

2014-03-03 Thread Richard Sandiford
Eric Botcazou ebotca...@adacore.com writes:
 That doesn't really answer the question though.  What's the correct
 behaviour for an unspec volatile in a loop?  I don't think it's what
 we did in the example above, since it doesn't seem self-consistent.
 And "not spending too much time" is again a bit vague in terms of
 saying what's right and what's wrong.

 "not spending too much time" is a polite way to say "I don't really care". :-)
 If you do, feel free to post a formal definition, an implementation plan and 
 maybe a patch at some point.

Well, I still reckon unspec_volatile and volatile asm should be volatile
in the same way.  There should be no implicit magic clobbers.
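
As an illustration of "no implicit magic clobbers", the following C sketch contrasts a bare volatile asm with one that carries an explicit "memory" clobber. It assumes GCC-style extended asm; the function and variable names are hypothetical. Under the model argued for here, only the second form tells the compiler that memory may be read or written at that point; what the optimizer does around the first form is left to it.

```c
#include <assert.h>

static int shared;

/* A volatile asm with no operands and no clobbers.  It is not deleted,
   but it states nothing about memory, so the compiler is not told that
   "shared" is touched here.  */
static int
store_twice_plain (void)
{
  shared = 1;
  __asm__ volatile ("");
  shared = 2;
  return shared;
}

/* The same sequence with the dependency made explicit: the "memory"
   clobber says the asm may read or write any memory.  */
static int
store_twice_barrier (void)
{
  shared = 1;
  __asm__ volatile ("" : : : "memory");
  shared = 2;
  return shared;
}

/* Both variants must leave the final value in place; only the
   optimizer's freedom around the asm differs.  */
static void
check_final_value (void)
{
  assert (store_twice_plain () == 2);
  assert (store_twice_barrier () == 2);
}
```

The observable result is identical in both cases; the difference is only in what the asm statement promises to the compiler.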

Thanks,
Richard


Re: [Patch] Try to peephole2 QI *4 into SI on i386

2014-03-03 Thread Jakub Jelinek
On Mon, Mar 03, 2014 at 07:19:51PM -0800, Andrew Pinski wrote:
 On Mon, Mar 3, 2014 at 5:02 AM, lin zuojian manjian2...@gmail.com wrote:
 
 Testcase?
 How about making a generic pass which does this?
 
 See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23684 also.  At the
 same time this can be used to do the store pair optimization for
 ARM/AARCH64 too.

Yeah, I'll try to resurrect my PR22141 patch after 4.9 branches, and would
appreciate it if more people would cooperate in finding the best heuristics
for when it should be applied (for -Os it is usually clear, for non-strict
align targets and optimization for speed less so, for strict align targets
even less so).
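
A small C sketch of why strict-alignment targets are the hard case: a merged 32-bit store is only valid as a single instruction if the address is known to be suitably aligned, so portable source code usually expresses the wide store through memcpy and leaves the instruction choice to the compiler. The helper name store_u32_unaligned is hypothetical and is not taken from any of the patches discussed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value at P without assuming that P is 4-byte aligned.
   The compiler may turn this into one store where that is safe, or into
   narrower stores where it is not.  */
static void
store_u32_unaligned (unsigned char *p, uint32_t value)
{
  assert (p != NULL);
  memcpy (p, &value, sizeof value);
}
```

Whether the single wide store is actually a win is the heuristic question raised above; the sketch only shows the alignment-safe form.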

Jakub


Re: [PATCH 1/4] [GOMP4] [Fortran] OpenACC 1.0+ support in fortran front-end

2014-03-03 Thread Ilmir Usmanov

Hi Tobias!

I fixed my patches. Could you review them?



 I'd use integer instead of INTEGER as it is not a 'reserved' word

The OpenMP implementation uses capital letters, so perhaps we should do the
same for consistency?


--
Ilmir.


Re: [PATCH 2/4] [GOMP4] [Fortran] OpenACC 1.0+ support in fortran front-end

2014-03-03 Thread Ilmir Usmanov

OpenACC 1.0 fortran FE support -- matching and resolving.

* gcc/fortran/openmp.c
(gfc_free_omp_clauses): Remove also OpenACC clauses.
(gfc_free_expr_list): New function to clear expression list.
(match_oacc_expr_list): New function to match expression list.
(match_oacc_clause_gang): New function to match OpenACC 2.0 gang 
clauses.

(OMP_CLAUSE_ASYNC, OMP_CLAUSE_NUM_GANGS,
OMP_CLAUSE_NUM_WORKERS, OMP_CLAUSE_VECTOR_LENGTH,
OMP_CLAUSE_COPY, OMP_CLAUSE_OACC_COPYIN,
OMP_CLAUSE_COPYOUT, OMP_CLAUSE_CREATE, OMP_CLAUSE_PRESENT,
OMP_CLAUSE_PRESENT_OR_COPY, OMP_CLAUSE_PRESENT_OR_COPYIN,
OMP_CLAUSE_PRESENT_OR_COPYOUT, OMP_CLAUSE_PRESENT_OR_CREATE,
OMP_CLAUSE_DEVICEPTR, OMP_CLAUSE_GANG, OMP_CLAUSE_WORKER,
OMP_CLAUSE_VECTOR, OMP_CLAUSE_SEQ, OMP_CLAUSE_INDEPENDENT,
OMP_CLAUSE_USE_DEVICE, OMP_CLAUSE_HOST, OMP_CLAUSE_DEVICE_RESIDENT,
OMP_CLAUSE_DEVICE, OMP_CLAUSE_DEFAULT, OMP_CLAUSE_WAIT,
OMP_CLAUSE_DELETE, OMP_CLAUSE_AUTO, OMP_CLAUSE_TILE): New clauses.
(OACC_PARALLEL_CLAUSES, OACC_KERNELS_CLAUSES, OACC_DATA_CLAUSES,
OACC_LOOP_CLAUSES, OACC_PARALLEL_LOOP_CLAUSES,
OACC_KERNELS_LOOP_CLAUSES, OACC_HOST_DATA_CLAUSES, 
OACC_DECLARE_CLAUSES,

OACC_UPDATE_CLAUSES, OACC_ENTER_DATA_CLAUSES,
OACC_EXIT_DATA_CLAUSES): New defines.
(gfc_match_oacc_parallel_loop, gfc_match_oacc_parallel,
gfc_match_oacc_kernels_loop, gfc_match_oacc_kernels,
gfc_match_oacc_data, gfc_match_oacc_host_data, gfc_match_oacc_loop,
gfc_match_oacc_declare, gfc_match_oacc_update,
gfc_match_oacc_enter_data, gfc_match_oacc_exit_data,
gfc_match_oacc_wait, gfc_match_oacc_cache, oacc_is_loop,
check_symbol_not_pointer, resolve_oacc_scalar_int_expr,
resolve_oacc_positive_int_expr, check_array_not_assumed,
resolve_oacc_data_clauses, resolve_oacc_deviceptr_clause,
oacc_is_parallel, oacc_is_kernels, omp_code_to_statement,
oacc_code_to_statement, resolve_oacc_directive_inside_omp_region,
resolve_omp_directive_inside_oacc_region, resolve_oacc_nested_loops,
resolve_oacc_params_in_parallel, resolve_oacc_loop_blocks,
gfc_resolve_oacc_blocks, resolve_oacc_loop, resolve_oacc_cache,
resolve_oacc_wait, gfc_resolve_oacc_declare,
gfc_resolve_oacc_directive): New functions.
(resolve_omp_clauses): Resolve also OpenACC clauses.
(gfc_resolve_omp_directive): Check for enclosing OpenACC region.
From 11beca16743c66a112ec44717eaf9ed61f203d1e Mon Sep 17 00:00:00 2001
From: Ilmir Usmanov i.usma...@samsung.com
Date: Wed, 26 Feb 2014 19:03:12 +0400
Subject: [PATCH 2/5] OpenACC Fortran FE: part 2

---
 gcc/fortran/openmp.c | 1221 +-
 1 file changed, 1199 insertions(+), 22 deletions(-)

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index dff3ab1..c582d51 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -69,11 +69,37 @@ gfc_free_omp_clauses (gfc_omp_clauses *c)
   gfc_free_expr (c->final_expr);
   gfc_free_expr (c->num_threads);
   gfc_free_expr (c->chunk_size);
+  gfc_free_expr (c->async_expr);
+  gfc_free_expr (c->gang_expr);
+  gfc_free_expr (c->worker_expr);
+  gfc_free_expr (c->vector_expr);
+  gfc_free_expr (c->num_gangs_expr);
+  gfc_free_expr (c->num_workers_expr);
+  gfc_free_expr (c->vector_length_expr);
+  gfc_free_expr (c->non_clause_wait_expr);
+
   for (i = 0; i < OMP_LIST_NUM; i++)
     gfc_free_namelist (c->lists[i]);
+
+  gfc_free_expr_list (c->wait_list);
+  gfc_free_expr_list (c->tile_list);
+
   free (c);
 }
 
+/* Free expression list. */
+void
+gfc_free_expr_list (gfc_expr_list *list)
+{
+  gfc_expr_list *n;
+
+  for (; list; list = n)
+    {
+      n = list->next;
+      free (list);
+    }
+}
+
 /* Match a variable/common block list and construct a namelist from it.  */
 
 static match
@@ -169,6 +195,87 @@ cleanup:
   return MATCH_ERROR;
 }
 
+static match
+match_oacc_expr_list (const char *str, gfc_expr_list **list, bool allow_asterisk)
+{
+  gfc_expr_list *head, *tail, *p;
+  locus old_loc;
+  gfc_expr *expr;
+  match m;
+
+  head = tail = NULL;
+
+  old_loc = gfc_current_locus;
+
+  m = gfc_match (str);
+  if (m != MATCH_YES)
+    return m;
+
+  for (;;)
+    {
+      m = gfc_match_expr (&expr);
+      if (m == MATCH_YES || allow_asterisk)
+        {
+          p = gfc_get_expr_list ();
+          if (head == NULL)
+            head = tail = p;
+          else
+            {
+              tail->next = p;
+              tail = tail->next;
+            }
+          if (m == MATCH_YES)
+            tail->expr = expr;
+          else if (gfc_match (" *") != MATCH_YES)
+            goto syntax;
+          goto next_item;
+        }
+      if (m == MATCH_ERROR)
+        goto cleanup;
+      goto syntax;
+
+    next_item:
+      if (gfc_match_char (')') == MATCH_YES)
+        break;
+      if (gfc_match_char (',') != MATCH_YES)
+        goto syntax;
+    }
+
+  while (*list)
+    list = &(*list)->next;
+
+  *list = head;
+  return MATCH_YES;
+
+syntax:
+  gfc_error ("Syntax error in OpenACC expression list at %C");
+
+cleanup:
+  gfc_free_expr_list (head);
+  gfc_current_locus = old_loc;
+  

Re: [PATCH 3/4] [GOMP4] [Fortran] OpenACC 1.0+ support in fortran front-end

2014-03-03 Thread Ilmir Usmanov

OpenACC 1.0 fortran FE support -- translation to GENERIC.

gcc/fortran/
* trans-decl.c
(gfc_generate_function_code): Insert OACC_DECLARE GENERIC node.
* trans-openmp.c (gfc_convert_expr_to_tree): New helper function.
(gfc_trans_omp_array_reduction): Support also OpenACC. Add parameter.
(gfc_trans_omp_reduction_list): Update.
(gfc_trans_oacc_construct): New transform function.
(gfc_trans_omp_map_clause_list): Likewise.
(gfc_trans_oacc_executable_directive): Likewise.
(gfc_trans_oacc_combined_directive, gfc_trans_oacc_declare): Likewise.
(gfc_trans_oacc_directive): Use them.
(gfc_trans_oacc_loop): Stub.
(gfc_trans_omp_clauses): Transform OpenACC clauses.
* trans-stmt.h  (gfc_trans_oacc_directive): New function prototype.
(gfc_trans_oacc_declare): Likewise.
* trans.c (trans_code): Transform also OpenACC directives.
From b5010b30cc9fe6495cb9b9df356c044d267210c9 Mon Sep 17 00:00:00 2001
From: Ilmir Usmanov i.usma...@samsung.com
Date: Wed, 26 Feb 2014 19:03:24 +0400
Subject: [PATCH 3/5] OpenACC Fortran FE: part 3

---
 gcc/fortran/trans-decl.c   |   7 +
 gcc/fortran/trans-openmp.c | 357 +
 gcc/fortran/trans-stmt.c   |   8 +
 gcc/fortran/trans-stmt.h   |   4 +
 gcc/fortran/trans.c|  15 ++
 5 files changed, 391 insertions(+)

diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index 9c86653..ad26ef8 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -5606,6 +5606,13 @@ gfc_generate_function_code (gfc_namespace * ns)
   if ((gfc_option.rtcheck & GFC_RTCHECK_BOUNDS) && !sym->attr.is_bind_c)
     add_argument_checking (&body, sym);
 
+  /* Generate !$ACC DECLARE directive. */
+  if (ns->oacc_declare_clauses)
+    {
+      tree tmp = gfc_trans_oacc_declare (&body, ns);
+      gfc_add_expr_to_block (&body, tmp);
+    }
+
   tmp = gfc_trans_code (ns->code);
   gfc_add_expr_to_block (&body, tmp);
 
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 41020a8..a1abd66 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -767,6 +767,40 @@ gfc_trans_omp_reduction_list (gfc_namelist *namelist, tree list,
 }
 
 static tree
+gfc_trans_omp_map_clause_list (enum omp_clause_map_kind kind,
+                               gfc_namelist *namelist, tree list)
+{
+  for (; namelist != NULL; namelist = namelist->next)
+    if (namelist->sym->attr.referenced)
+      {
+        tree t = gfc_trans_omp_variable (namelist->sym);
+        if (t != error_mark_node)
+          {
+            tree node = build_omp_clause (input_location, OMP_CLAUSE_MAP);
+            OMP_CLAUSE_DECL (node) = t;
+            OMP_CLAUSE_MAP_KIND (node) = kind;
+            list = gfc_trans_add_clause (node, list);
+          }
+      }
+  return list;
+}
+
+static inline tree
+gfc_convert_expr_to_tree (stmtblock_t *block, gfc_expr *expr)
+{
+  gfc_se se;
+  tree result;
+
+  gfc_init_se (&se, NULL);
+  gfc_conv_expr (&se, expr);
+  gfc_add_block_to_block (block, &se.pre);
+  result = gfc_evaluate_now (se.expr, block);
+  gfc_add_block_to_block (block, &se.post);
+
+  return result;
+}
+
+static tree
 gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 		   locus where)
 {
@@ -834,6 +868,51 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 	where);
 	  continue;
 	}
+      if (list >= OMP_LIST_DATA_CLAUSE_FIRST
+          && list <= OMP_LIST_DATA_CLAUSE_LAST)
+	{
+	  enum omp_clause_map_kind kind;
+	  switch (list) 
+	{
+	case OMP_LIST_COPY:
+	  kind = OMP_CLAUSE_MAP_FORCE_TOFROM;
+	  break;
+	case OMP_LIST_OACC_COPYIN:
+	  kind = OMP_CLAUSE_MAP_FORCE_TO;
+	  break;
+	case OMP_LIST_COPYOUT:
+	  kind = OMP_CLAUSE_MAP_FORCE_FROM;
+	  break;
+	case OMP_LIST_CREATE:
+	  kind = OMP_CLAUSE_MAP_FORCE_ALLOC;
+	  break;
+	case OMP_LIST_DELETE:
+	  kind = OMP_CLAUSE_MAP_FORCE_DEALLOC;
+	  break;
+	case OMP_LIST_PRESENT:
+	  kind = OMP_CLAUSE_MAP_FORCE_PRESENT;
+	  break;
+	case OMP_LIST_PRESENT_OR_COPY:
+	  kind = OMP_CLAUSE_MAP_TOFROM;
+	  break;
+	case OMP_LIST_PRESENT_OR_COPYIN:
+	  kind = OMP_CLAUSE_MAP_TO;
+	  break;
+	case OMP_LIST_PRESENT_OR_COPYOUT:
+	  kind = OMP_CLAUSE_MAP_FROM;
+	  break;
+	case OMP_LIST_PRESENT_OR_CREATE:
+	  kind = OMP_CLAUSE_MAP_ALLOC;
+	  break;
+	case OMP_LIST_DEVICEPTR:
+	  kind = OMP_CLAUSE_MAP_FORCE_DEVICEPTR;
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+	  omp_clauses = gfc_trans_omp_map_clause_list (kind, n, omp_clauses);
+	  continue;
+	}
   switch (list)
 	{
 	case OMP_LIST_PRIVATE:
@@ -853,6 +932,21 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 	  goto add_clause;
 	case OMP_LIST_COPYPRIVATE:
 	  clause_code = OMP_CLAUSE_COPYPRIVATE;
+	  goto add_clause;
+	case OMP_LIST_USE_DEVICE:
+	  clause_code = OMP_CLAUSE_USE_DEVICE;
+	  goto add_clause;
+	case OMP_LIST_DEVICE_RESIDENT:
+	  clause_code = 

Re: [Patch] Try to peephole2 QI *4 into SI on i386

2014-03-03 Thread lin zuojian
On Tue, Mar 04, 2014 at 08:48:40AM +0100, Jakub Jelinek wrote:
 On Mon, Mar 03, 2014 at 07:19:51PM -0800, Andrew Pinski wrote:
  On Mon, Mar 3, 2014 at 5:02 AM, lin zuojian manjian2...@gmail.com wrote:
  
  Testcase?
  How about making a generic pass which does this?
  
  See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23684 also.  At the
  same time this can be used to do the store pair optimization for
  ARM/AARCH64 too.
 
 Yeah, I'll try to resurrect my PR22141 patch after 4.9 branches, and would
 appreciate if more people would cooperate in finding out the best heuristics
 when it should be applied (for -Os it is usually clear, for non-strict align
 targets and optimization for speed less so, for strict align targets even
 less so).
 
   Jakub

Hi,
It seems this has already been handled. Please ignore this patch; all of it
is wrong.
--
Regards
lin zuojian