Re: fix PR46029: reimplement if conversion of loads and stores

2015-06-22 Thread Alan Lawrence

Abe Skolnik wrote:

Hi everybody!

In the current implementation of if conversion, loads and stores are
if-converted in a thread-unsafe way:

  * loads were always executed, even when they should have not been.
Some source code could be rendered invalid due to null pointers
that were OK in the original program because they were never
dereferenced.

  * writes were if-converted via load/maybe-modify/store, which
renders some code multithreading-unsafe.

This patch reimplements if-conversion of loads and stores in a safe
way using a scratchpad allocated by the compiler on the stack:

  * loads are done through an indirection, reading either the correct
data from the correct source [if the condition is true] or reading
from the scratchpad and later ignoring this read result [if the
condition is false].

  * writes are also done through an indirection, writing either to the
correct destination [if the condition is true] or to the
scratchpad [if the condition is false].

Vectorization of if-cvt-stores-vect-ifcvt-18.c disabled because the
old if-conversion resulted in unsafe code that could fail under
multithreading even though the as-written code _was_ thread-safe.

Passed regression testing and bootstrap on amd64-linux.
Is this OK to commit to trunk?

Regards,

Abe



Thanks for getting back to this!

My main thought concerns the direction we are travelling here. A major reason 
why we do if-conversion is to enable vectorization. Is this is targetted at 
gathering/scattering loads? Following vectorization, different elements of the 
vector being loaded/stored may have to go to/from the scratchpad or to/from main 
memory.


Or, are we aiming at the case where the predicate or address are invariant? That 
seems unlikely - loop unswitching would be better for the predicate; loading 
from an address, we'd just peel and hoist; storing, this'd result in the address 
holding the last value written, at exit from the loop, a curious idiom. Where 
the predicate/address is invariant across the vector? (!)


Or, at we aiming at non-vectorized code?

Beyond that question...

Does the description for -ftree-loop-if-convert-stores in doc/invoke.texi 
describe what the flag now does? (It doesn't mention loads; the code doesn't 
look like we use scratchpads at all without -ftree-loop-if-convert-stores, or am 
I missing something?)


In tree-if-conv.c:
@@ -883,7 +733,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,

   if (flag_tree_loop_if_convert_stores)
 {
-  if (ifcvt_could_trap_p (stmt, refs))
+  if (ifcvt_could_trap_p (stmt))
{
  if (ifcvt_can_use_mask_load_store (stmt))
{

and

+
+  if (has_non_addressable_refs (stmt))
+   {
+ if (dump_file  (dump_flags  TDF_DETAILS))
+   fprintf (dump_file, has non-addressable memory references\n);
+ return false;
+   }
+

if it doesn't trap, but has_non_addressable_refs, can't we use 
ifcvt_can_use_mask_load_store there too?


And/or, I think I may be misunderstanding here, but if an access could trap, but 
is addressable, can't we use the scratchpad technique to get round the trapping 
problem?


(Look at it another way - this patch makes strictly more things return true from 
ifcvt_could_trap_p, which always exits immediately from 
if_convertible_gimple_assign_stmt_p...?)



Re. creation of scratchpads:
   (1) Should the '64' byte size be the result of scanning the function, for 
the largest data size to which we store? (ideally, conditionally store!)
   (2) Allocating only once per function: if we had one scratchpad per loop, it 
could/would live inside the test of gimple_build_call_internal 
(IFN_LOOP_VECTORIZED,  Otherwise, if we if-convert one or more loops in the 
function, but then fail to vectorize them, we'll leave the scratchpad around for 
later phases to clean up. Is that OK?



Also some style nits:

@@ -1342,7 +1190,7 @@ if_convertible_loop_p_1 (struct loop *loop,
   /* Check the if-convertibility of statements in predicated BBs.  */
   if (!dominated_by_p (CDI_DOMINATORS, loop-latch, bb))
for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (itr))
- if (!if_convertible_stmt_p (gsi_stmt (itr), *refs,
+ if (!if_convertible_stmt_p (gsi_stmt (itr),
  any_mask_load_store))
return false;
 }

bet that fits on one line now.

+ * Returns a memory reference to the pointer defined by the
+conditional expression: pointer = cond ? A[i] : scratch_pad; and
+   inserts this code at GSI.  */
+
+static tree
+create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
+  gimple_stmt_iterator *gsi, bool swap)

in comment, should A[i] just be AI, as I see nothing in 
create_indirect_cond_expr that requires ai to be an array dereference?


@@ -2063,12 +1998,14 @@ mask_exists (int size, vecint vec)
| end_bb_1
|
| bb_2
+   |   cond = 

[AArch64][TLSGD Desc][2/3] Sort case label alphabetically

2015-06-22 Thread Jiong Wang

Obivious coding style fix.

2015-06-22  Jiong Wang  jiong.w...@arm.com
gcc/
  * config/aarch64/aarch64.c (aarch64_expand_move_immediate): Sort case
  label alphabetically.
  
-- 
Regards,
Jiong

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 16c8dba..dddf401 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1548,12 +1548,12 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	  emit_insn (gen_rtx_SET (dest, mem));
 	  return;
 
-case SYMBOL_TLSGD:
-case SYMBOL_SMALL_GOTTPREL:
+	case SYMBOL_SMALL_GOTTPREL:
 	case SYMBOL_SMALL_GOT:
 	case SYMBOL_TINY_GOT:
-case SYMBOL_TINY_TLSIE:
+	case SYMBOL_TINY_TLSIE:
 	case SYMBOL_TLSDESC:
+	case SYMBOL_TLSGD:
 	  if (offset != const0_rtx)
 	{
 	  gcc_assert(can_create_pseudo_p ());


[AArch64][TLSGD Desc][1/3] Generalize TLS Descriptor for Global Dynamic

2015-06-22 Thread Jiong Wang

Currently, there is only small model support for TLS Global Dynamic
(Desciptor) on AArch64. While TLS Global Dynamic (Descriptor) is
actually the same for all memory mode.

We always generate below code sequences:

R0 = GOT entry address of tls descriptor for var.
Rx = speialize_func
.tlsdesccall var
blr Rx
  
Instruction sequences for different memory model differs only for how to
addressing the GOT descriptor of that TLS variable, and they should
always be packed together for later linker relaxation.

Tiny:

  ldr   xr, :tlsdesc:var
  adr   x0, :tlsdesc:var
  .tlsdesccall var
  blr   xr

Small:

  adrp  x0, :tlsdesc:var
  ldr   xr, [x0, #:tlsdesc_lo12:var]
  add   x0, x0, #:tlsdesc_lo12:var
  .tlsdesccall var
  blr   xr

Large:

  movz  x0, #:tlsdesc_off_g1:var
  movk  x0, #:tlsdesc_off_g0_nc:var
  .tlsdescldr var
  ldr   xr, [gp, x0]
  .tlsdescadd var
  add   x0, gp, x0
  .tlsdesccall var
  blr   xr

This patch generalize TLS Global Dynamic Descriptor code for all memory
model. Another seperate patch will add descriptor support for Tiny model.
  
OK for trunk?

2015-06-22  Jiong Wang  jiong.w...@arm.com

gcc/
  * config/aarch64/aarch64-protos.h (aarch64_symbol_context): Rename
  SYMBOL_SMALL_TLSDESC to SYMBOL_TLSDESC.
  (aarch64_symbol_context): Ditto.
  * config/aarch64/aarch64.md (tlsdesc_small_mode): Renamed into 
tlsdesc_mode.
  * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Rename
  SYMBOL_SMALL_TLSDESC to SYMBOL_TLSDESC. Rename gen_tlsdesc_small_* to 
gen_tlsdesc_*.
  (aarch64_expand_mov_immediate): Ditto.
  (aarch64_print_operand): Ditto.
  (aarch64_classify_tls_symbol): Ditto.

-- 
Regards,
Jiong

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 7fad48b..576acc0 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -61,9 +61,9 @@ enum aarch64_symbol_context
 
This corresponds to the small PIC model of the compiler.
 
-   SYMBOL_SMALL_TLSDESC
SYMBOL_SMALL_GOTTPREL
SYMBOL_TINY_TLSIE
+   SYMBOL_TLSDESC
SYMBOL_TLSGD
SYMBOL_TLSLE
Each of of these represents a thread-local symbol, and corresponds to the
@@ -96,11 +96,11 @@ enum aarch64_symbol_type
 {
   SYMBOL_SMALL_ABSOLUTE,
   SYMBOL_SMALL_GOT,
-  SYMBOL_SMALL_TLSDESC,
   SYMBOL_SMALL_GOTTPREL,
   SYMBOL_TINY_ABSOLUTE,
   SYMBOL_TINY_GOT,
   SYMBOL_TINY_TLSIE,
+  SYMBOL_TLSDESC,
   SYMBOL_TLSGD,
   SYMBOL_TLSLE,
   SYMBOL_FORCE_TO_MEM
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e724bd4..16c8dba 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -921,7 +921,7 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 	return;
   }
 
-case SYMBOL_SMALL_TLSDESC:
+case SYMBOL_TLSDESC:
   {
 	machine_mode mode = GET_MODE (dest);
 	rtx x0 = gen_rtx_REG (mode, R0_REGNUM);
@@ -932,9 +932,9 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 	/* In ILP32, the got entry is always of SImode size.  Unlike
 	   small GOT, the dest is fixed at reg 0.  */
 	if (TARGET_ILP32)
-	  emit_insn (gen_tlsdesc_small_si (imm));
+	  emit_insn (gen_tlsdesc_si (imm));
 	else
-	  emit_insn (gen_tlsdesc_small_di (imm));
+	  emit_insn (gen_tlsdesc_di (imm));
 	tp = aarch64_load_tp (NULL);
 
 	if (mode != Pmode)
@@ -1549,11 +1549,11 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	  return;
 
 case SYMBOL_TLSGD:
-case SYMBOL_SMALL_TLSDESC:
 case SYMBOL_SMALL_GOTTPREL:
 	case SYMBOL_SMALL_GOT:
 	case SYMBOL_TINY_GOT:
 case SYMBOL_TINY_TLSIE:
+	case SYMBOL_TLSDESC:
 	  if (offset != const0_rtx)
 	{
 	  gcc_assert(can_create_pseudo_p ());
@@ -4435,7 +4435,7 @@ aarch64_print_operand (FILE *f, rtx x, char code)
 	  asm_fprintf (asm_out_file, :tlsgd:);
 	  break;
 
-	case SYMBOL_SMALL_TLSDESC:
+	case SYMBOL_TLSDESC:
 	  asm_fprintf (asm_out_file, :tlsdesc:);
 	  break;
 
@@ -4468,7 +4468,7 @@ aarch64_print_operand (FILE *f, rtx x, char code)
 	  asm_fprintf (asm_out_file, :tlsgd_lo12:);
 	  break;
 
-	case SYMBOL_SMALL_TLSDESC:
+	case SYMBOL_TLSDESC:
 	  asm_fprintf (asm_out_file, :tlsdesc_lo12:);
 	  break;
 
@@ -7273,7 +7273,7 @@ aarch64_classify_tls_symbol (rtx x)
 {
 case TLS_MODEL_GLOBAL_DYNAMIC:
 case TLS_MODEL_LOCAL_DYNAMIC:
-  return TARGET_TLS_DESC ? SYMBOL_SMALL_TLSDESC : SYMBOL_TLSGD;
+  return TARGET_TLS_DESC ? SYMBOL_TLSDESC : SYMBOL_TLSGD;
 
 case TLS_MODEL_INITIAL_EXEC:
   switch (aarch64_cmodel)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 9f1b26e..f3d9082 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4378,7 +4378,7 @@
(set_attr length 8, 12)]
 )
 
-(define_insn tlsdesc_small_mode
+(define_insn tlsdesc_mode
   [(set (reg:PTR R0_REGNUM)
 (unspec:PTR [(match_operand 0 aarch64_valid_symref S)]
 		   UNSPEC_TLSDESC))


[AArch64][TLSGD Desc][3/3] Implement TLS Global Dynamic Descriptor for tiny model

2015-06-22 Thread Jiong Wang

As we have generalized GD Descriptor support for all memory model in
the first patch.  Support for tiny model is quite straightforward. We
just need to output different instruction sequences according on memory
model.

OK for trunk?

2015-06-22  Jiong Wang  jiong.w...@arm.com

gcc/
  * config/aarch64/aarch64.md (tlsdesc_mode): Support tiny model
  constraint.
  
gcc/testsuite/
  * gcc.target/aarch64/tlsdesc_small.c: New.
  * gcc.target/aarch64/tlsdesc_tiny.c: Ditto.
  
-- 
Regards,
Jiong

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 827ae8e..1b4e387 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4394,9 +4394,19 @@
(clobber (reg:CC CC_REGNUM))
(clobber (match_scratch:DI 1 =r))]
   TARGET_TLS_DESC
-  adrp\\tx0, %A0\;ldr\\t%w1, [x0, #%L0]\;add\\tw0, w0, %L0\;.tlsdesccall\\t%0\;blr\\t%1
+  {
+if (aarch64_cmodel_var == AARCH64_CMODEL_TINY)
+  return ldr\t%w1, #%A0;adr\tw0, %A0;.tlsdesccall\t%0;blr\t%1;
+else if (aarch64_cmodel_var == AARCH64_CMODEL_SMALL)
+  return adrp\tx0, %A0;ldr\t%w1, [x0, #%L0];add\tw0, w0, %L0;.tlsdesccall\t%0;blr\t%1;
+else
+  /* TBD: Large model to be supported.  */
+  gcc_unreachable ();
+  }
   [(set_attr type call)
-   (set_attr length 16)])
+   (set (attr length)
+	(if_then_else (match_test aarch64_cmodel_var == AARCH64_CMODEL_TINY)
+	(const_int 12) (const_int 16)))])
 
 (define_insn stack_tie
   [(set (mem:BLK (scratch))
diff --git a/gcc/testsuite/gcc.target/aarch64/tlsdesc_small.c b/gcc/testsuite/gcc.target/aarch64/tlsdesc_small.c
new file mode 100644
index 000..f1429b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/tlsdesc_small.c
@@ -0,0 +1,9 @@
+/* { dg-do run } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-options -O2 -ftls-model=global-dynamic -fPIC --save-temps } */
+
+#include tls.c
+
+/* { dg-final { scan-assembler-times adrp\tx0, :tlsdesc: 2 } } */
+/* { dg-final { scan-assembler-times tlsdesccall 2 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/tlsdesc_tiny.c b/gcc/testsuite/gcc.target/aarch64/tlsdesc_tiny.c
new file mode 100644
index 000..a107650
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/tlsdesc_tiny.c
@@ -0,0 +1,9 @@
+/* { dg-do run } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-options -O2 -ftls-model=global-dynamic -fPIC -mcmodel=tiny --save-temps } */
+
+#include tls.c
+
+/* { dg-final { scan-assembler-times adr\tx0, :tlsdesc: 2 } } */
+/* { dg-final { scan-assembler-times tlsdesccall 2 } } */
+/* { dg-final { cleanup-saved-temps } } */


Re: C++ PATCH for c++/66515 (ICE with initializer_list)

2015-06-22 Thread Jason Merrill

On 06/17/2015 04:44 PM, Jason Merrill wrote:

Now that reshape_init can return a non-CONSTRUCTOR, we need to call it
earlier in implicit_conversion.


I haven't noticed any problems with the original patch, but just to be 
safe this patch limits the new reshape to the same conditions as the old 
one: only classes.


Tested x86_64-pc-linux-gnu,

commit 98a362ded24db54963524761c3b0613ff844de51
Author: Jason Merrill ja...@redhat.com
Date:   Fri Jun 19 15:29:58 2015 -0400

	PR c++/66515
	* call.c (implicit_conversion): Only reshape for classes.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index ba5da4c..a6c313a 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -1759,8 +1759,9 @@ implicit_conversion (tree to, tree from, tree expr, bool c_cast_p,
 
   /* Call reshape_init early to remove redundant braces.  */
   if (expr  BRACE_ENCLOSED_INITIALIZER_P (expr)
+   CLASS_TYPE_P (to)
COMPLETE_TYPE_P (complete_type (to))
-   CP_AGGREGATE_TYPE_P (to))
+   !CLASSTYPE_NON_AGGREGATE (to))
 {
   expr = reshape_init (to, expr, complain);
   if (expr == error_mark_node)


[patch] Delete temporary response file

2015-06-22 Thread Eric Botcazou
Hi,

when you pass a response file at link time and you use the GNU linker, then 
collect2 creates another, temporary response file and passes it to the linker.
But it fails to delete the file after it is done.  This can easily be seen 
with the following manipulation:

eric@polaris:~/build/gcc/native cat t.c
int main (void) { return 0; }
eric@polaris:~/build/gcc/native cat t.resp
-L/usr/lib64
eric@polaris:~/build/gcc/native gcc -c t.c
eric@polaris:~/build/gcc/native export TMPDIR=$PWD
eric@polaris:~/build/gcc/native gcc -o t t.o @t.resp
eric@polaris:~/build/gcc/native ls cc*
ccVSQ6W5

The problem is that do_wait is not invoked by tlink_execute, only collect_wait 
is, so the cleanup code present therein is never invoked.

Tested on x86_64-suse-linux, OK for the mainline?


2015-06-22  Tristan Gingold  ging...@adacore.com

* collect2.c (collect_wait): Unlink the response file here instead of...
(do_wait): ...here.
(utils_cleanup): ...and here.


-- 
Eric BotcazouIndex: collect-utils.c
===
--- collect-utils.c	(revision 224708)
+++ collect-utils.c	(working copy)
@@ -68,6 +68,12 @@ collect_wait (const char *prog, struct p
 fatal_error (input_location, can't get program status: %m);
   pex_free (pex);
 
+  if (response_file  !save_temps)
+{
+  unlink (response_file);
+  response_file = NULL;
+}
+
   if (status)
 {
   if (WIFSIGNALED (status))
@@ -90,12 +96,6 @@ do_wait (const char *prog, struct pex_ob
   int ret = collect_wait (prog, pex);
   if (ret != 0)
 fatal_error (input_location, %s returned %d exit status, prog, ret);
-
-  if (response_file  !save_temps)
-{
-  unlink (response_file);
-  response_file = NULL;
-}
 }
 
 
@@ -224,7 +224,5 @@ utils_cleanup (bool from_signal)
  calls to maybe_unlink fails. */
   cleanup_done = true;
 
-  if (response_file)
-maybe_unlink (response_file);
   tool_cleanup (from_signal);
 }

Re: [PATCH 3/3] Improve -Wmissing-indentation heuristics

2015-06-22 Thread Jeff Law

On 06/09/2015 11:31 AM, Patrick Palka wrote:

This patch improves the heuristics of the warning in a number of ways.
The improvements are hopefully adequately documented in the code
comments.

The additions to the test case also highlight the improvements.

I tested an earlier version of this patch on more than a dozen C code
bases.  I only found one class of bogus warnings yet emitted, in the
libpng and bdwgc projects.  These projects have a coding style which
indents code inside #ifdefs as if this code was guarded by an if(), e.g.

   if (foo != 0)
 x = 10;
   else   // GUARD
 y = 100; // BODY

   #ifdef BAR
 blah ();  // NEXT
   #endif

These bogus warnings are pre-existing, however (i.e. not caused by this
patch).

gcc/c-family/ChangeLog:

* c-indentation.c (should_warn_for_misleading_indentation):
Improve heuristics.

gcc/testsuite/ChangeLog:

* c-c++-common/Wmisleading-indentation.c: Add more tests.

OK after confirming a successful bootstrap  regression test.

jeff



[PATCH 1/2][ARM] Record FPU features as a bit-set

2015-06-22 Thread Matthew Wahab

Hello,

The ARM backend records FPU features as booleans, one for each feature. This
means that adding support for a new feature involves updating every entry in the
list of FPU descriptions in arm-fpus.def. This patch series changes the
representation of FPU features to use a simple bit-set and flags, as is done
elsewhere.

This patch adds the new FPU feature representation, with feature sets
represented as unsigned longs.

Tested the series for arm-none-linux-gnueabihf with check-gcc

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  matthew.wa...@arm.com

* config/arm/arm.h (arm_fpu_fset): New.
(ARM_FPU_FSET_HAS): New.
(FPU_FL_NONE): New.
(FPU_FL_NEON): New.
(FPU_FL_FP16): New.
(FPU_FL_CRYPTO): New.
From 0ae697751afd9420ece15432e4892a60574b1d56 Mon Sep 17 00:00:00 2001
From: Matthew Wahab matthew.wa...@arm.com
Date: Wed, 10 Jun 2015 09:57:55 +0100
Subject: [PATCH 1/2] Add fpu feature set definitions.

Change-Id: I9614d12b19f068ae2e0cebc1a6c3903972c73d6a
---
 gcc/config/arm/arm.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 373dc85..eadbcec 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -318,6 +318,19 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
   {mode, %{!marm:%{!mthumb:-m%(VALUE)}}}, \
   {tls, %{!mtls-dialect=*:-mtls-dialect=%(VALUE)}},
 
+/* FPU feature sets.  */
+
+typedef unsigned long arm_fpu_fset;
+
+/* Test for an FPU feature.  */
+#define ARM_FPU_FSET_HAS(S,F) (((S)  (F)) == F)
+
+/* FPU Features.  */
+#define FPU_FL_NONE	(0)
+#define FPU_FL_NEON	(1  0)	/* NEON instructions.  */
+#define FPU_FL_FP16	(1  1)	/* Half-precision.  */
+#define FPU_FL_CRYPTO	(1  2)	/* Crypto extensions.  */
+
 /* Which floating point model to use.  */
 enum arm_fp_model
 {
-- 
1.9.1



Re: [Ping, Patch, fortran, 64674, v3] [OOP] ICE in ASSOCIATE with class array

2015-06-22 Thread Paul Richard Thomas
Dear Andre,

It was indeed the associate(pam = im(:, c)) that I had in mind. If
you have that working and in the tescase, that's good enough for me.

Cheers

Paul

On 22 June 2015 at 17:15, Andre Vehreschild ve...@gmx.de wrote:
 Hi Paul,

 On Mon, 22 Jun 2015 16:04:09 +0200
 Paul Richard Thomas paul.richard.tho...@gmail.com wrote:

 Hi Andre,

 Some questions: The first and second chunks look a bit awkward in
 parse.c. Do they have to be there in order that primary.c does the
 right thing?

 I tried at first to do this rank resolution in primary.c, but that was too
 late. parse.c needs to propagate the rank correctly. When I remember 
 correctly,
 then doing so later prevents parse.c to correctly recognize the vector (from
 the example in the initial description) as such, i.e., the indexing in
 vector(2) was not allowed. gfortran assumed vector to be scalar. So, IMHO yes.

 Could the whole lot be transferred to resolve.c or would that make it
 horribly messy?

 Again, IMO is it not easily transferable to primary.c or even resolve.c.
 Therefore no.

 I couldn't apply the patch right now -
 does it work with variable expressions for the target array indices?

 I am not quite sure, what you mean. Something like this:

   associate(pam = im(2:3, 2:3))
 pam = 9
 pam(1,2) = 10
 do c = 1, 2
 pam(2, c) = 0
 end do
   end associate

 ? I have added that to the testcase and it works. Or do you want the variable
 expressions in the target, like this:

   integer :: expect(20)= 23
   integer :: im(4,5) = 23
   integer :: c

   expect(2:3) = 9
   do c = 1, 5
 im = 23
 associate(pam = im(:, c))
   pam(2:3) = 9
 end associate
 if (any (reshape(im, [20]) /= expect)) call abort()
 ! Shift expect
 expect = [expect(17:), expect(:16)]
   end do

 Will this do, or did you have something more elaborate in mind? This is also
 working and in the testcase now.

 Thanks for the review so far.

 Regards,
 Andre
 --
 Andre Vehreschild * Email: vehre ad gmx dot de



-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx


[PATCH 1/4][ARM] Make room for more CPU feature flags.

2015-06-22 Thread Matthew Wahab

Hello,

The ARM backend uses an unsigned long to record CPU feature flags and there are
currently 30 bits in use. To be able to support new architecture features, the
current representation will need to be replaced so that more flags can be
recorded.

This series of patches replaces the single unsigned long with a representation
based on an array of unsigned longs. Constructors and operations are explicitly
defined for the new representation and the backend is updated to use the new
operations.

The individual patches:
- Make architecture flags explicit in arm-cores.def, to prepare for the changes.
- Add definitions for the new representation as type arm_feature_set and macros
  with prefix ARM_FSET.
- Replace uses of the old representation with the arm_feature_set type and
  operations.
- Rework arm-cores.def and arm-arches.def to make the feature set constructions
  explicit.

The series tested for arm-none-linux-gnueabihf with check-gcc.

This patch moves the derived FL_FOR_ARCH##ARCH flags from the expansion of macro
arm.c/ARM_CORE and makes them explicit in the entries in arm-cores.def.

This patch tested for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

2015-06-22  Matthew Wahab  matthew.wa...@arm.com

* gcc/config/arm/arm-cores.def: Add FL_FOR_ARCH flag for each
ARM_CORE entry.  Fix some white-space.
* gcc/config/arm/arm.c: Remove FL_FOR_ARCH derivation from
ARM_CORE definition.
From b8d4b4ef938d64996d0d20aaa9974757057aaad2 Mon Sep 17 00:00:00 2001
From: Matthew Wahab matthew.wa...@arm.com
Date: Fri, 5 Jun 2015 12:33:34 +0100
Subject: [PATCH 1/4] [ARM] Make ARCH flags explicit in arm-cores.def

Change-Id: I13a79c89bebaf82aa921f0502b721ff5d9b92dbe
---
 gcc/config/arm/arm-cores.def | 200 +--
 gcc/config/arm/arm.c |   2 +-
 2 files changed, 101 insertions(+), 101 deletions(-)

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 103c314..f362c27 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -43,134 +43,134 @@
Some tools assume no whitespace up to the first , in each entry.  */
 
 /* V2/V2A Architecture Processors */
-ARM_CORE(arm2, 	arm2, arm2,	2, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE(arm250, 	arm250, arm250,	2, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE(arm3,	arm3, arm3,	2, FL_CO_PROC | FL_MODE26, slowmul)
+ARM_CORE(arm2,	arm2, arm2,	2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul)
+ARM_CORE(arm250,	arm250, arm250,	2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul)
+ARM_CORE(arm3,	arm3, arm3,	2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul)
 
 /* V3 Architecture Processors */
-ARM_CORE(arm6,	arm6, arm6,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE(arm60,	arm60, arm60,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE(arm600,	arm600, arm600,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE(arm610,	arm610, arm610,		3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE(arm620,	arm620, arm620,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE(arm7,	arm7, arm7,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE(arm7d,	arm7d, arm7d,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE(arm7di,	arm7di, arm7di,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE(arm70,	arm70, arm70,		3, FL_CO_PROC | FL_MODE26, slowmul)
-ARM_CORE(arm700,	arm700, arm700,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE(arm700i,	arm700i, arm700i,	3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE(arm710,	arm710, arm710,		3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE(arm720,	arm720, arm720,		3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE(arm710c,	arm710c, arm710c,	3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE(arm7100,	arm7100, arm7100,	3, FL_MODE26 | FL_WBUF, slowmul)
-ARM_CORE(arm7500,	arm7500, arm7500,	3, FL_MODE26 | FL_WBUF, slowmul)
+ARM_CORE(arm6,	arm6, arm6,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE(arm60,	arm60, arm60,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE(arm600,	arm600, arm600,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE(arm610,	arm610, arm610,		3, FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE(arm620,	arm620, arm620,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE(arm7,	arm7, arm7,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE(arm7d,	arm7d, arm7d,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE(arm7di,	arm7di, arm7di,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE(arm70,	arm70, arm70,		3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul)
+ARM_CORE(arm700,	arm700, arm700,		3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE(arm700i,	arm700i, arm700i,	3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE(arm710,	arm710, arm710,		3, FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE(arm720,	arm720, arm720,		3, FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul)
+ARM_CORE(arm710c,	

Re: RFA: Fix isl-ast-gen-if-1.c test

2015-06-22 Thread Jeff Law

On 06/22/2015 09:38 AM, Nick Clifton wrote:

Hi Guys,

   The test file gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c file
   was generating an unexpected failure for the RX.  When I investigated
   I found that a return address on the stack was being corrupted, and I
   tracked it down to the foo() function:

 foo (int a[], int n)
 {
   int i;
   for (i = 0; i  n; i++)
 {
   if (i  25)
 a[i] = i;
  a[n - i] = 1;
 }
 }

   The problem is that when i is 0, the line a[n - i] writes to a[50]
   which is beyond the end of the a array.  (In the RX case it writes
   over the return address on the stack).

   The patch below fixes the problem, although it could also be solved by
   increasing the size of the a array when it is declared in main().

   OK to apply ?

Cheers
   Nick

gcc/testsuite/ChangeLog
2015-06-22  Nick Clifton  ni...@redhat.com

* gcc.dg/graphite/isl-ast-gen-if-1.c (foo): Prevent writing after
the end of the array.
I'd tend to prefer to change the size of the array -- adding another 
conditional in the loop may have unintended consequences that possibly 
scramble things just enough to compromise the test.


jeff



Re: [PATCH] Expand PIC calls without PLT with -fno-plt

2015-06-22 Thread Jiong Wang


On 04/05/15 17:37, Alexander Monakov wrote:

This patch introduces option -fno-plt that allows to expand calls that would
go via PLT to load the address of the function immediately at call site (which
introduces a GOT load).  Cover letter explains the motivation for this patch.

New option documentation for invoke.texi is missing from the patch; if this is
accepted I'll be happy to send a v2 with documentation added.

* calls.c (prepare_call_address): Transform PLT call to GOT lookup and
indirect call by forcing address into a pseudo with -fno-plt.
* common.opt (flag_plt): New option.


Have done a quick experiment, -fno-plt doesn't work on AArch64.

it's because although this patch force the function address into register,
but the combine pass runs later combine it back as AArch64 have defined such
insn pattern.

For X86, it's not combined back. From the rtl dump, it's because the rtl 
pre pass
has moved the address load instruction into another basic block and 
combine pass
don't combine across basic blocks. Also, x86 backend has done some check 
on flag_plt
in the new added ix86_nopic_noplt_attribute_p which could help generate 
correct insns.


What I can think of the fix on AArch64 is by restricting the call symbol 
under
flag_plt == true only, so that call via register can't be combined 
into call

symbol direct,

Or better to prohibit combine pass for such combining? as the generic 
fix on combine may

fix other broken targets.

Thoughts?

Regards,
Jiong



Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Nathan Sidwell

On 06/22/15 11:18, Bernd Schmidt wrote:


You can have a hint that it is desirable, but not a hint that it is correct
(because passes in between may invalidate that). The OpenACC directives
guarantee to the compiler that the program can be transformed into a parallel
form. If we lose them early we must then rely on our analysis which may not be
strong enough to prove that the loop can be parallelized. If we make these
transformations early enough, while we still have the OpenACC directives, we can
guarantee that we do exactly what the programmer specified.


How does this differ from openmp's needs to preserve parallelism on a parallel 
loop?  Is it more than the reconvergence issue?


nathan

--
Nathan Sidwell


Re: [PATCH] top-level for libvtv: use normal (not raw_cxx) target exports

2015-06-22 Thread Michael Haubenwallner

On 06/19/2015 11:36 AM, Paolo Bonzini wrote:
 On 09/06/2015 16:22, Michael Haubenwallner wrote:
 Hi build machinery maintainers,

 since we always build the C++ compiler now, I fail to see the need to still
 use RAW_CXX_TARGET_EXPORTS for libvtv.

 The situation to expose the problem is:
 * Use a multilib-enabled x86_64-linux box.
 * Use a 64-bit (multilib-disabled) bootstrap compiler (binary image).
 $ configure --enable-multilib --with-system-zlib
 $ make bootstrap

 When it comes to build the 32-bit libvtv, it breaks because of using
 CC=/build/prev-gcc/xgcc -m32 CXX=g++ -m32, while it should use
 CC=/build/prev-gcc/xgcc -m32 CXX=/build/prev-gcc/xg++ -m32 instead.

Unfortunately, I've been unable to reproduce this problem for a while now,
and it turns out that I also used --enable-maintainer-mode. And it happens
only when some generated autotool-file is updated while it is in use - not
really sure which one though, probably some configure script.

But still:

 However, I'm not sure about the general question behind:
 Should it work to bootstrap the multilib-compiler using a non-multilib one?

 This also needs above configure flags to work around two more but minor 
 issues,
 which I'm unsure about whether I can/should fix at all:
 * --enable-multilib: Without this, the user friendly check is breaking,
 since https://gcc.gnu.org/viewcvs/gcc?view=revisionrevision=205975
 
 Why is it breaking?

The OS I'm running on is a multilib-enabled x86_64 (Gentoo) Linux,
and 32-bit development libraries (libc and headers) are available.

The compiler I want to bootstrap the multilib compiler with is an
x86_64-only binary image (c,c++,ada), built _without_ multilib.

Now this user friendly check tries to create a 32-bit executable using
that 64-bit-only bootstrap gcc, which lacks 32-bit libgcc* - while it
can create 32-bit object files though.

 * --with-system-zlib: Without this, --enable-multilib tries to build a
 32-bit zlib with CC=/build/32/./prev-gcc/xgcc
 
 Ouch, that's a separate bug...  Arguably --with-system-zlib should be
 the default these days (and should have been for 10 years or so).

This one I'll leave untouched.

 The patch is ok.

Even if the problem raises only because maintainer-mode isn't multilib-save?

Thanks!
/haubi/


Re: Re: [PATCH] [PATCH][ARM] Fix split-live-ranges-for-shrink-wrap.c testcase.

2015-06-22 Thread Alex Velenko

On 20/05/15 21:14, Joseph Myers wrote:

Again, the condition you propose to add doesn't make sense.  arm_arch_X_ok
is only appropriate for tests using an explicit -march=X.  Testing with
-march=armv7* should automatically skip this test anyway because it would
cause arm_thumb1_ok to fail.



Hi,

I adjusted the patch to skip execution split-live-ranges-for-shrink-wrap.c
with explicitly specified -march=armv4t and provide -march=armv5t flag =
for
arm_arch_v5t_ok targets.

Is patch ok?

Alex

gcc/testsuite

2015-06-22  Alex Velenko  alex.vele...@arm.com

 * gcc.target/arm/split-live-ranges-for-shrink-wrap.c (dg-skip-if):
Skip -march=armv4t.
(dg-additional-options): Set armv5t flag.

diff --git
a/gcc/testsuite/gcc.target/arm/split-live-ranges-for-shrink-wrap.c
b/gcc/testsuite/gcc.target/arm/split-live-ranges-for-shrink-wrap.c
index e36000b..3cb93dc 100644
--- a/gcc/testsuite/gcc.target/arm/split-live-ranges-for-shrink-wrap.c
+++ b/gcc/testsuite/gcc.target/arm/split-live-ranges-for-shrink-wrap.c
@@ -1,6 +1,8 @@
  /* { dg-do assemble } */
  /* { dg-options -mthumb -Os -fdump-rtl-ira  }  */
  /* { dg-require-effective-target arm_thumb1_ok } */
+/* { dg-skip-if do not test on armv4t { *-*-* } { -march=armv4t } } =
*/
+/* { dg-additional-options -march=armv5t {target arm_arch_v5t_ok} } */

  int foo (char *, char *, int);
  int test (int d, char * out, char *in, int len)

--1.8.1.2--



[committed] Add missing update_stmt in transform_to_exit_first_loop_alt

2015-06-22 Thread Tom de Vries

Hi,

I realized that transform_to_exit_first_loop_alt is missing an 
update_stmt for the gimple_cond_set_rhs (transform_to_exit_first_loop 
has an update_stmt after a similar gimple_cond_set_lhs).


Bootstrapped and reg-tested on x86_64.

Committed as trivial.

Thanks,
- Tom
Add missing update_stmt in transform_to_exit_first_loop_alt

2015-06-22  Tom de Vries  t...@codesourcery.com

	* tree-parloops.c (transform_to_exit_first_loop_alt): Add update_stmt
	for cond_stmt.
---
 gcc/tree-parloops.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 28112b2..7123c27 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1679,6 +1679,7 @@ transform_to_exit_first_loop_alt (struct loop *loop,
 
   /* Set the new loop bound.  */
   gimple_cond_set_rhs (cond_stmt, bound);
+  update_stmt (cond_stmt);
 
   /* Repair the ssa.  */
   vecedge_var_map *v = redirect_edge_var_map_vector (post_inc_edge);
-- 
1.9.1



Re: [i386, PATCH, 2/3] IA MCU psABI support: changes to libraries.

2015-06-22 Thread Kirill Yukhin
Hello,
Patch in the bottom adds support of IA MCU psABI to libgcc
(enables soft-fp) and libdecnumber (enables it for IA MCU).

Bootstrapped and regtested on top of [1/3] patch.

config/
* dfp.m4 (enable_decimal_float): Also set to yes for
 i?86*-*-elfiamcu target.
gcc/
* configure: Regenerated.
libdecnumber/
* configure: Regenerated.
libgcc/
* config.host: Support i[34567]86-*-elfiamcu target.
* config/i386/32/t-iamcu: New file.
* configure: Regenerated.

Is it OK for trunk?

--
Thanks, K

diff --git a/config/dfp.m4 b/config/dfp.m4
index 48683f0..5b29089 100644
--- a/config/dfp.m4
+++ b/config/dfp.m4
@@ -21,7 +21,7 @@ Valid choices are 'yes', 'bid', 'dpd', and 'no'.]) ;;
 [
   case $1 in
 powerpc*-*-linux* | i?86*-*-linux* | x86_64*-*-linux* | s390*-*-linux* | \
-i?86*-*-gnu* | \
+i?86*-*-elfiamcu | i?86*-*-gnu* | \
 i?86*-*-mingw* | x86_64*-*-mingw* | \
 i?86*-*-cygwin* | x86_64*-*-cygwin*)
   enable_decimal_float=yes
diff --git a/gcc/configure b/gcc/configure
index b26a86f..64eeac6 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -7317,7 +7317,7 @@ else

   case $target in
 powerpc*-*-linux* | i?86*-*-linux* | x86_64*-*-linux* | s390*-*-linux* | \
-i?86*-*-gnu* | \
+i?86*-*-elfiamcu | i?86*-*-gnu* | \
 i?86*-*-mingw* | x86_64*-*-mingw* | \
 i?86*-*-cygwin* | x86_64*-*-cygwin*)
   enable_decimal_float=yes
diff --git a/libdecnumber/configure b/libdecnumber/configure
index 2720f46..964837d 100755
--- a/libdecnumber/configure
+++ b/libdecnumber/configure
@@ -4614,7 +4614,7 @@ else

   case $target in
 powerpc*-*-linux* | i?86*-*-linux* | x86_64*-*-linux* | s390*-*-linux* | \
-i?86*-*-gnu* | \
+i?86*-*-elfiamcu | i?86*-*-gnu* | \
 i?86*-*-mingw* | x86_64*-*-mingw* | \
 i?86*-*-cygwin* | x86_64*-*-cygwin*)
   enable_decimal_float=yes
diff --git a/libgcc/config.host b/libgcc/config.host
index 4df..dd8e356 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -562,6 +562,9 @@ x86_64-*-darwin*)
tm_file=$tm_file i386/darwin-lib.h
extra_parts=$extra_parts crtprec32.o crtprec64.o crtprec80.o 
crtfastmath.o
;;
+i[34567]86-*-elfiamcu)
+   tmake_file=$tmake_file i386/t-crtstuff t-softfp-sfdf i386/32/t-softfp 
i386/32/t-iamcu i386/t-softfp t-softfp t-dfprules
+   ;;
 i[34567]86-*-elf*)
tmake_file=$tmake_file i386/t-crtstuff t-crtstuff-pic t-libgcc-pic
;;
diff --git a/libgcc/config/i386/32/t-iamcu b/libgcc/config/i386/32/t-iamcu
new file mode 100644
index 000..0752bff
--- /dev/null
+++ b/libgcc/config/i386/32/t-iamcu
@@ -0,0 +1,6 @@
+softfp_float_modes += tf
+softfp_extensions += sftf dftf xftf
+softfp_truncations += tfsf tfdf tfxf
+softfp_exclude_libgcc2 := n
+
+HOST_LIBGCC2_CFLAGS += -mlong-double-80
diff --git a/libgcc/configure b/libgcc/configure
index ce66d1d..e22cbcb 100644
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -4436,7 +4436,7 @@ else

   case $host in
 powerpc*-*-linux* | i?86*-*-linux* | x86_64*-*-linux* | s390*-*-linux* | \
-i?86*-*-gnu* | \
+i?86*-*-elfiamcu | i?86*-*-gnu* | \
 i?86*-*-mingw* | x86_64*-*-mingw* | \
 i?86*-*-cygwin* | x86_64*-*-cygwin*)
   enable_decimal_float=yes





[PATCH 3/4][ARM] Use new feature set representation.

2015-06-22 Thread Matthew Wahab

Hello,

The ARM backend uses an unsigned long to record CPU feature flags and there are
currently 30 bits in use. This series of patches replaces the single unsigned
long with a representation based on an array of values.

This patch replaces the existing representation of CPU feature sets with the
type arm_feature_set and ARM_FSET macros added in an earlier patch in this
series.

Tested arm-none-linux-gnueabihf with check-gcc. Also tested as part of the
series for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  matthew.wa...@arm.com

* config/arm/arm-builtins.c (def_mbuiltin): Use ARM_FSET macro.
* config/arm/arm-protos.h (insn_flags): Declare as type
arm_feature_set.
(tune_flags): Likewise.
* config/arm/arm.c (feature_count): New.
(insn_flags): Define as type arm_feature_set.
(tune_flags): Likewise.
(struct processors): Define field flags as type arm_feature_set.
(all_cores): Update for change to struct processors.
(all_architectures): Likewise.
(arm_option_check_internal): Use arm_feature_set and ARM_FSET macros.
(arm_option_override_internal): Likewise.
(arm_option_override): Likewise.

From 8b5e132868da066eb8a8673286b796656b9ed127 Mon Sep 17 00:00:00 2001
From: Matthew Wahab matthew.wa...@arm.com
Date: Mon, 8 Jun 2015 14:11:13 +0100
Subject: [PATCH 3/4] Use feature sets.

Change-Id: I5a1b162102dd19b6376637218dc548502112cf4b
---
 gcc/config/arm/arm-builtins.c |   4 +-
 gcc/config/arm/arm-protos.h   |   4 +-
 gcc/config/arm/arm.c  | 131 --
 3 files changed, 80 insertions(+), 59 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index f960e0a..31203d4 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -1074,10 +1074,10 @@ arm_init_neon_builtins (void)
 #undef NUM_DREG_TYPES
 #undef NUM_QREG_TYPES
 
-#define def_mbuiltin(MASK, NAME, TYPE, CODE)\
+#define def_mbuiltin(FLAG, NAME, TYPE, CODE)\
   do	\
 {	\
-  if ((MASK)  insn_flags)		\
+  if (ARM_FSET_HAS_CPU1 (insn_flags, (FLAG)))			\
 	{\
 	  tree bdecl;			\
 	  bdecl = add_builtin_function ((NAME), (TYPE), (CODE),		\
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index a19d54d..859b5d2 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -515,11 +515,11 @@ typedef struct
 
 /* The bits in this mask specify which
instructions we are allowed to generate.  */
-extern unsigned long insn_flags;
+extern arm_feature_set insn_flags;
 
 /* The bits in this mask specify which instruction scheduling options should
be used.  */
-extern unsigned long tune_flags;
+extern arm_feature_set tune_flags;
 
 /* Nonzero if this chip supports the ARM Architecture 3M extensions.  */
 extern int arm_arch3m;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b21f433..dd892a7 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -105,6 +105,7 @@ static void arm_add_gc_roots (void);
 static int arm_gen_constant (enum rtx_code, machine_mode, rtx,
 			 HOST_WIDE_INT, rtx, rtx, int, int);
 static unsigned bit_count (unsigned long);
+static unsigned feature_count (const arm_feature_set*);
 static int arm_address_register_rtx_p (rtx, int);
 static int arm_legitimate_index_p (machine_mode, rtx, RTX_CODE, int);
 static bool is_called_in_ARM_mode (tree);
@@ -771,11 +772,11 @@ static int thumb_call_reg_needed;
 
 /* The bits in this mask specify which
instructions we are allowed to generate.  */
-unsigned long insn_flags = 0;
+arm_feature_set insn_flags = ARM_FSET_EMPTY;
 
 /* The bits in this mask specify which instruction scheduling options should
be used.  */
-unsigned long tune_flags = 0;
+arm_feature_set tune_flags = ARM_FSET_EMPTY;
 
 /* The highest ARM architecture version supported by the
target.  */
@@ -928,7 +929,7 @@ struct processors
   enum processor_type core;
   const char *arch;
   enum base_architecture base_arch;
-  const unsigned long flags;
+  const arm_feature_set flags;
   const struct tune_params *const tune;
 };
 
@@ -2197,10 +2198,10 @@ static const struct processors all_cores[] =
   /* ARM Cores */
 #define ARM_CORE(NAME, X, IDENT, ARCH, FLAGS, COSTS) \
   {NAME, IDENT, #ARCH, BASE_ARCH_##ARCH,	  \
-FLAGS, arm_##COSTS##_tune},
+   ARM_FSET_MAKE_CPU1 (FLAGS), arm_##COSTS##_tune},
 #include arm-cores.def
 #undef ARM_CORE
-  {NULL, arm_none, NULL, BASE_ARCH_0, 0, NULL}
+  {NULL, arm_none, NULL, BASE_ARCH_0, ARM_FSET_EMPTY, NULL}
 };
 
 static const struct processors all_architectures[] =
@@ -2210,10 +2211,10 @@ static const struct processors all_architectures[] =
  from the core.  */
 
 #define ARM_ARCH(NAME, CORE, ARCH, FLAGS) \
-  {NAME, CORE, #ARCH, BASE_ARCH_##ARCH, FLAGS, NULL},
+  {NAME, CORE, #ARCH, BASE_ARCH_##ARCH, ARM_FSET_MAKE_CPU1 

Re: Fix more of C/fortran canonical type issues

2015-06-22 Thread Jan Hubicka
  Hi,
  I would like to ping this.  There are still few things to fix to make our
  merging compliant at least for C/C++/Fortran rules (the array bounds for
  Fortran and union ordering for C I believe) and I would like to progress
  on this.
 
 I don't like the changes to useless_type_conversion_p much.  Why
 do you preserve qualifiers for the integer kind compares?
 
 All the testcases have the integral types in aggregates as members.
 I already said that I'm happy globbing them together in aggregates.

I originally made the testcase this way because I wanted to test the way
aggregates are built and because it needs less of fortran code to realize it.
It is possible to consutrct same testcase with scalar variables.  See the other
patch fixing the surprious warning.  You need also variant size_t a to be
compatible with fortran equivalent of signed size_t a, so it is not only
about variables.

 
 I'm still not convinced that we need a 1:1 correspondence between
 canonical types and alias sets.  In particular canonical types are
 used for type compatibility in lhs = rhs assignments
 (useless_type_conversion_p) which is a transitive relation.
 Mixing both too much will cause serious confusion.  We have
 alias-sets for a reason.

OK, I am not sure if canonical types needs to actually mean the type
compatibility in the middle-end sense.  It is a language specific thing:

   The canonical type for this type node, which is used by frontends to
   compare the type for equality with another type.  If two types are
   equal (based on the semantics of the language), then they will have
   equivalent TYPE_CANONICAL entries.

In a way TYPE_CANONICAL seems bit schizofrenic about if it means language level
compatibility, representation compatibility or middle end semantic
compatibility.  It seems bit odd to define something like
useless_type_conversion_p by language specific manner despite the fact its
definition is now sound in language independent way as we now have all semantics
represented in IL (and flags, well)

but I would be happy to update the patch to assign different canonical types to
signed/unsigned integers and avoid recursion on those for aggregates/arrays and
all other derived types. (After all I plan to do that for pointers
incrementally)

Here we are lucky that alias.c already contains the globbing for
signed/unsigned.  Do we want to do the same scheme in other cases? For example
next on my list is the fact that array with bounds 3...5 is interoperable with
array[3] in C.  Here again we can not consider these useless_type_conversion_p
because index operation is different, but they are representation compatible.
This will need a special case in get_alias_set. I would not like to make
get_alias_set, or (with less loss of code quality on non-C languages) in lto's
get_alias_set langhook.

Honza
 
 Richard.


Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Nathan Sidwell

On 06/22/15 12:20, Jakub Jelinek wrote:


OpenMP worksharing loop is just coordination between the threads in the
team, which thread takes which subset of the loop's iterations, and
optionally followed by a barrier.  OpenMP simd loop is a loop that has
certain properties guaranteed by the user and can be vectorized.
In contrast to this, OpenACC spawns all the threads/CTAs upfront, and then
idles on some of them until there is work for them.


correct.  I expressed my question poorly.  What I mean is that in openmp, a loop 
that is parallelizeable (by user decree, I guess[*]), should not be transformed 
such that it is not parallelizeable.


This seems to me to be a common requirement of both languages.  How one gets 
parallel threads of execution to the body of the loop is a different question.


nathan

[*] For ones where the compiler needs to detect parallizeablilty, it's 
preferable that it doesn't do something earlier to force serializeablility.


--
Nathan Sidwell


Re: [C++ Patch] Use declspecs-locations[ds_virtual]

2015-06-22 Thread Jason Merrill

On 06/22/2015 09:54 AM, Paolo Carlini wrote:

I think this also qualifies as obvious given the past work / discussion:
use in one more place declspecs-locations to improve the location of
the error message.


Agreed, thanks.

Jason




Re: [PATCH 1/3] Refactor entry point to -Wmisleading-indentation

2015-06-22 Thread Jeff Law

On 06/09/2015 11:31 AM, Patrick Palka wrote:

This patch refactors the entry point of -Wmisleading-indentation from:

   void
   warn_for_misleading_indentation (location_t guard_loc,
location_t body_loc,
location_t next_stmt_loc,
enum cpp_ttype next_tok_type,
const char *guard_kind);

to

   struct token_indent_info
   {
 location_t location;
 cpp_ttype type;
 rid keyword;
   };

   void
   warn_for_misleading_indentation (const token_indent_info guard_tinfo,
const token_indent_info body_tinfo,
const token_indent_info next_tinfo);

The purpose of this refactoring is to expose more information to the
-Wmisleading-indentation implementation to allow for more advanced
heuristics and for better coverage.

(I decided to keep the usage of const references because nobody
seems to mind.  Also I added a new header file, c-indentation.h.)

gcc/c-family/ChangeLog:

* c-indentation.h (struct token_indent_info): Define.
(get_token_indent_info): Define.
(warn_for_misleading_information): Declare.
* c-common.h (warn_for_misleading_information): Remove.
* c-identation.c (warn_for_misleading_indentation):
Change declaration to take three token_indent_infos.  Adjust
accordingly.
* c-identation.c (should_warn_for_misleading_indentation):
Likewise.  Bail out early if the body is a compound statement.
(guard_tinfo_to_string): Define.

gcc/c/ChangeLog:

* c-parser.c (c_parser_if_body): Take token_indent_info
argument. Call warn_for_misleading_indentation even when the
body is a semicolon.  Extract token_indent_infos corresponding
to the guard, body and next tokens.  Adjust call to
warn_for_misleading_indentation accordingly.
(c_parser_else_body): Likewise.
(c_parser_if_statement): Likewise.
(c_parser_while_statement): Likewise.
(c_parser_for_statement): Likewise.

gcc/cp/ChangeLog:

* parser.c (cp_parser_selection_statement): Move handling of
semicolon body to ...
(cp_parser_implicitly_scoped_statement): .. here.  Call
warn_for_misleading_indentation even when the body is a
semicolon.  Extract token_indent_infos corresponding to the
guard, body and next tokens.  Adjust call to
warn_for_misleading_indentation accordingly.  Take
token_indent_info argument.
(cp_parser_already_scoped_statement): Likewise.
(cp_parser_selection_statement, cp_parser_iteration_statement):
Extract a token_indent_info corresponding to the guard token.
The only question in my mind is bootstrap  regression testing.  From 
reading the thread for the earlier version of this patch I got the 
impression you had bootstrapped and regression tested earlier versions.


If you could confirm that you've bootstrapped and regression tested this 
version it'd be appreciated.  You can do it on the individual patches or 
the set as a whole.


Jeff




Re: [Patch, C++, PR65882] Check tf_warning flag in build_new_op_1

2015-06-22 Thread Jason Merrill

On 06/19/2015 08:23 PM, Mikhail Maltsev wrote:

I see that version 5.2 is set as target milestone for this bug. Should I
backport the patch?


Please.

Jason




Re: [gomp4] Remove some ptxness from middle end

2015-06-22 Thread Marek Polacek
On Mon, Jun 22, 2015 at 01:00:51PM -0400, Nathan Sidwell wrote:
 +  if (GET_CODE (arg) != CONST_INT
 +  || (unsigned HOST_WIDE_INT)INTVAL (arg) = OACC_HWM)

Don't we have UINTVAL for this?  So UINTVAL (arg).

Marek


Re: [PATCH] Check dominator info in compute_dominance_frontiers

2015-06-22 Thread Tom de Vries

On 22/06/15 13:47, Richard Biener wrote:

On Mon, Jun 22, 2015 at 1:33 PM, Tom de Vries tom_devr...@mentor.com wrote:

On 22/06/15 12:14, Richard Biener wrote:


On Mon, Jun 22, 2015 at 10:04 AM, Tom de Vries tom_devr...@mentor.com
wrote:


Hi,

during development of a patch I ran into a case where
compute_dominance_frontiers was called with incorrect dominance info.

The result was a segmentation violation somewhere in the bitmap code
while
executing this bitmap_set_bit in compute_dominance_frontiers_1:
...
if (!bitmap_set_bit (frontiers[runner-index],
 b-index))
  break;
...

The segmentation violation happens because runner-index is 0, and
frontiers[0] is uninitialized.

[ The initialization in update_ssa looks like this:
...
   dfs = XNEWVEC (bitmap_head, last_basic_block_for_fn (cfun));
FOR_EACH_BB_FN (bb, cfun)
  bitmap_initialize (dfs[bb-index], bitmap_default_obstack);
compute_dominance_frontiers (dfs);
...

FOR_EACH_BB_FN skips over the entry-block and the exit-block, so dfs[0]
(frontiers[0] in compute_dominance_frontiers_1) is not initialized.

We could add initialization by making the entry/exit-block bitmap_heads
empty and setting the obstack to a reserved obstack bitmap_no_obstack for
which allocation results in an assert. ]

AFAIU, the immediate problem is not that frontiers[0] is uninitialized,
but
that the loop reaches the state of runner-index == 0, due to the
incorrect
dominance info.

The patch adds an assert to the loop in compute_dominance_frontiers_1, to
make the failure mode cleaner and easier to understand.

I think we wouldn't catch all errors in dominance info with this assert.
So
the patch also contains an ENABLE_CHECKING-enabled verify_dominators call
at
the start of compute_dominance_frontiers. I'm not sure if:
- adding the verify_dominators call is too costly in runtime.
- the verify_dominators call should be inside or outside the
TV_DOM_FRONTIERS measurement.
- there is a level of ENABLE_CHECKING that is more appropriate for the
verify_dominators call.

Is this ok for trunk if bootstrap and reg-test on x86_64 succeeds?



I don't think these kind of asserts are good.  A segfault is good by
itself
(so you can just add the comment if you like).



The segfault is not guaranteed to trigger, because it works on uninitialized
data. Instead, we may end up modifying valid memory and silently generating
wrong code or causing sigsegvs (which will be difficult to track back this
error). So I don't think doing nothing is an option here. If we're not going
to add this assert, we should initialize the uninitialized data in such a
way that we are guaranteed to detect the error. The scheme I proposed above
would take care of that. Should I implement that instead?


No, instead the check below should catch the error much earlier.


Likewise the verify_dominators call is too expensive and misplaced.

If then the call belongs in the dom_computed[] == DOM_OK early-out
in calculate_dominance_info



OK, like this:
...
diff --git a/gcc/dominance.c b/gcc/dominance.c
index a9e042e..1827eda9 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -646,7 +646,12 @@ calculate_dominance_info (enum cdi_direction dir)
bool reverse = (dir == CDI_POST_DOMINATORS) ? true : false;

if (dom_computed[dir_index] == DOM_OK)
-return;
+{
+#if ENABLE_CHECKING
+  verify_dominators (CDI_DOMINATORS);
+#endif
+  return;
+}

timevar_push (TV_DOMINANCE);
if (!dom_info_available_p (dir))
...


Yes.


I didn't fully understand your comment, do you want me to test this?


Sure, it should catch the error.



Bootstrapped and reg-tested on x86_64. Committed as attached.

Thanks,
- Tom
Verify dominators in early-out calculate_dominance_info

2015-06-22  Tom de Vries  t...@codesourcery.com

	* dominance.c (calculate_dominance_info): Verify dominators if
	early-out.
---
 gcc/dominance.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/dominance.c b/gcc/dominance.c
index a9e042e..9c66ca2 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -646,7 +646,12 @@ calculate_dominance_info (enum cdi_direction dir)
   bool reverse = (dir == CDI_POST_DOMINATORS) ? true : false;
 
   if (dom_computed[dir_index] == DOM_OK)
-return;
+{
+#if ENABLE_CHECKING
+  verify_dominators (CDI_DOMINATORS);
+#endif
+  return;
+}
 
   timevar_push (TV_DOMINANCE);
   if (!dom_info_available_p (dir))
-- 
1.9.1



[committed] Test for flag_parallelize_loops 1

2015-06-22 Thread Tom de Vries

On 19/06/15 11:26, Tom de Vries wrote:

Hi,

DEF_GOMP_BUILTIN tests for 'flag_parallelize_loops'. But if
flag_parallelize_loops is one (which is also the default), then
pass_parloops doesn't do anything, and won't generate any OMP constructs.

This patch makes DEF_GOMP_BUILTIN tests 'flag_parallelize_loops  1',
just like all the other tests of flag_parallelize_loops in the compiler.

Build on x86_64 and reg-tested libgomp's c.exp.



During bootstrap and reg-test, I found regressions for -fcilkplus. 
There's a dependency of fcilkplus on the gomp builtins, which is exposed 
by this patch.


This updated patch also enables the gomp builtins for fcilkplus.

Bootstrapped and reg-tested on x86_64 on top of trunk.

Committed to trunk as obvious.

Thanks,
- Tom

Test for flag_parallelize_loops  1

2015-06-19  Tom de Vries  t...@codesourcery.com

	* builtins.def (DEF_GOMP_BUILTIN): Test
	'flag_tree_parallelize_loops  1' instead of
	'flag_tree_parallelize_loops'.  Test flag_cilkplus.
---
 gcc/builtins.def | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/builtins.def b/gcc/builtins.def
index 55ce9f6..80e4a9c 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -182,7 +182,9 @@ along with GCC; see the file COPYING3.  If not see
 #define DEF_GOMP_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   DEF_BUILTIN (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE, TYPE,\
false, true, true, ATTRS, false, \
-	   (flag_openmp || flag_tree_parallelize_loops \
+	   (flag_openmp \
+		|| flag_tree_parallelize_loops  1 \
+		|| flag_cilkplus \
 		|| flag_offload_abi != OFFLOAD_ABI_UNSET))
 
 /* Builtin used by implementation of Cilk Plus.  Most of these are decomposed
-- 
1.9.1



Re: [PATCH 2/3] Remove is_first_nonwhitespace_on_line(), instead improve get_visual_column()

2015-06-22 Thread Jeff Law

On 06/09/2015 11:31 AM, Patrick Palka wrote:

This patch removes the function is_first_nonwhitespace_on_line() in
favor of augmenting the function get_visual_column() to optionally
return the visual column corresponding to the first non-whitespace character
on the line.  Existing usage of is_first_nonwhitespace_on_line() can
be trivially replaced by calling get_visual_column() and comparing *out
with *first_nws.

The rationale for this change is that in many cases it is better to use
the visual column of the first non-whitespace character rather than the
visual column of the token.  Consider:

   if (p) {
 foo (1);
   } else   // GUARD
 if (q) // BODY
   foo (2);
 foo (3);   // NEXT

Here, with current heuristics, we do not emit a warning because we
notice that the visual columns of each token line up (suggesting
autogenerated code).  Yet it is obvious that we should warn here because
it misleadingly looks like the foo (3); statement is guarded by the
else.

If we instead consider the visual column of the first non-whitespace
character on the guard line, the columns will not line up thus we will
emit the warning.  This will be done in the next patch.

gcc/c-family/ChangeLog:

* c-indentation.c (get_visual_column): Add parameter first_nws,
use it.  Update comment documenting the function.
(is_first_nonwhitespace_on_line): Remove.
(should_warn_for_misleading_indentation): Replace usage of
of is_first_nonwhitespace_on_line with get_visual_column.

Same comment/question WRT testing as the prior patch.

OK once you've confirmed bootstrap  regression testing was completed 
successfully.


jeff



Re: RFA: Fix isl-ast-gen-if-1.c test

2015-06-22 Thread Nicholas Clifton

Hi Jeff,


I'd tend to prefer to change the size of the array -- adding another
conditional in the loop may have unintended consequences that possibly
scramble things just enough to compromise the test.


Okey dokey, here is a revised version.  Is this one OK ?

Cheers
  Nick

gcc/ChangeLog

Index: 2015-06-22  Nick Clifton  ni...@redhat.com

* gcc.dg/graphite/isl-ast-gen-if.c (main): Increase size of a
array to allow a[50] to be a valid location.

gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c
===
--- gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c(revision 224722)
+++ gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c(working copy)
@@ -28,7 +28,7 @@
 int
 main (void)
 {
-  int a[50];
+  int a[51]; /* NB This size allows foo's first iteration to write to 
a[50].  */

   foo (a, 50);
   int res = array_sum (a);
   if (res != 49)



Re: [C++/58583] ICE instantiating NSDMIs

2015-06-22 Thread Nathan Sidwell

On 06/22/15 03:37, Andreas Schwab wrote:

Nathan Sidwell nat...@acm.org writes:


On 06/20/15 02:09, Andreas Schwab wrote:

This also fails on powerpc.


what  is the build compiler?


It is a bootstrapped build, so the build compiler should not matter.


ok, thanks.

I've just built me a powerpc-linux targeting compiler from an x86_64-linux host. 
 That is showing the expected diagnostic, so I'm still unable to reproduce the 
failure.


nsidwell@build6-lucid-cs:19install/bin/powerpc-linux-gnu-g++ -std=c++11 -c 
nsdmi-template14.C
nsdmi-template14.C:6:20: error: constructor required before non-static data 
member for 'A0::i' has been parsed

   int i = (A0(), 0); // { dg-error has been parsed }
^
nsdmi-template14.C: In constructor 'constexpr A0::A()':
nsdmi-template14.C:4:22: error: constructor required before non-static data 
member for 'A0::i' has been parsed

 templateint struct A // { dg-error has been parsed }
  ^
nsdmi-template14.C: At global scope:
nsdmi-template14.C:6:20: note: synthesized method 'constexpr A0::A()' first 
required here

   int i = (A0(), 0); // { dg-error has been parsed }
^
nsdmi-template14.C:14:6: error: recursive instantiation of non-static data 
member initializer for 'B1::p'

 B1 x; // { dg-error recursive instantiation of non-static data }
  ^
nathan


Re: [PATCH 1/3] Refactor entry point to -Wmisleading-indentation

2015-06-22 Thread Jeff Law

On 06/18/2015 10:39 AM, David Malcolm wrote:

On Thu, 2015-06-18 at 11:41 -0400, Patrick Palka wrote:

On Tue, Jun 9, 2015 at 1:31 PM, Patrick Palka patr...@parcs.ath.cx wrote:

This patch refactors the entry point of -Wmisleading-indentation from:

   void
   warn_for_misleading_indentation (location_t guard_loc,
location_t body_loc,
location_t next_stmt_loc,
enum cpp_ttype next_tok_type,
const char *guard_kind);

to

   struct token_indent_info
   {
 location_t location;
 cpp_ttype type;
 rid keyword;
   };

   void
   warn_for_misleading_indentation (const token_indent_info guard_tinfo,
const token_indent_info body_tinfo,
const token_indent_info next_tinfo);

The purpose of this refactoring is to expose more information to the
-Wmisleading-indentation implementation to allow for more advanced
heuristics and for better coverage.

(I decided to keep the usage of const references because nobody
seems to mind.  Also I added a new header file, c-indentation.h.)

gcc/c-family/ChangeLog:

 * c-indentation.h (struct token_indent_info): Define.
 (get_token_indent_info): Define.
 (warn_for_misleading_information): Declare.
 * c-common.h (warn_for_misleading_information): Remove.
 * c-identation.c (warn_for_misleading_indentation):
 Change declaration to take three token_indent_infos.  Adjust
 accordingly.
 * c-identation.c (should_warn_for_misleading_indentation):
 Likewise.  Bail out early if the body is a compound statement.
 (guard_tinfo_to_string): Define.

gcc/c/ChangeLog:

 * c-parser.c (c_parser_if_body): Take token_indent_info
 argument. Call warn_for_misleading_indentation even when the
 body is a semicolon.  Extract token_indent_infos corresponding
 to the guard, body and next tokens.  Adjust call to
 warn_for_misleading_indentation accordingly.
 (c_parser_else_body): Likewise.
 (c_parser_if_statement): Likewise.
 (c_parser_while_statement): Likewise.
 (c_parser_for_statement): Likewise.

gcc/cp/ChangeLog:

 * parser.c (cp_parser_selection_statement): Move handling of
 semicolon body to ...
 (cp_parser_implicitly_scoped_statement): .. here.  Call
 warn_for_misleading_indentation even when the body is a
 semicolon.  Extract token_indent_infos corresponding to the
 guard, body and next tokens.  Adjust call to
 warn_for_misleading_indentation accordingly.  Take
 token_indent_info argument.
 (cp_parser_already_scoped_statement): Likewise.
 (cp_parser_selection_statement, cp_parser_iteration_statement):
 Extract a token_indent_info corresponding to the guard token.


Pinging this series.


FWIW, they look reasonable to me; I'm not a reviewer.
But as the implementer of the warning, your comments/thoughts are 
definitely helpful in the review process.


We've never worked too hard to find a way to formalize this into a set 
of policies and procedures, which is probably a mistake.


jeff


Re: Re: [PATCH] [PATCH][ARM] Fix split-live-ranges-for-shrink-wrap.c testcase.

2015-06-22 Thread Joseph Myers
I have no more comments on this patch.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Check dominator info in compute_dominance_frontiers

2015-06-22 Thread Tom de Vries

On 22/06/15 13:47, Richard Biener wrote:

(eventually also for the case where we
end up only computing the fast-query stuff).



Like this?
...
diff --git a/gcc/dominance.c b/gcc/dominance.c
index 9c66ca2..58fc6fd 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -679,6 +679,12 @@ calculate_dominance_info (enum cdi_direction dir)
   free_dom_info (di);
   dom_computed[dir_index] = DOM_NO_FAST_QUERY;
 }
+  else
+{
+#if ENABLE_CHECKING
+  verify_dominators (CDI_DOMINATORS);
+#endif
+}

   compute_dom_fast_query (dir);

...

Thanks,
- Tom


Re: [Ping, Patch, fortran, 64674, v3] [OOP] ICE in ASSOCIATE with class array

2015-06-22 Thread Andre Vehreschild
Hi Paul,

On Mon, 22 Jun 2015 16:04:09 +0200
Paul Richard Thomas paul.richard.tho...@gmail.com wrote:

 Hi Andre,
 
 Some questions: The first and second chunks look a bit awkward in
 parse.c. Do they have to be there in order that primary.c does the
 right thing?

I tried at first to do this rank resolution in primary.c, but that was too
late. parse.c needs to propagate the rank correctly. When I remember correctly,
then doing so later prevents parse.c to correctly recognize the vector (from
the example in the initial description) as such, i.e., the indexing in
vector(2) was not allowed. gfortran assumed vector to be scalar. So, IMHO yes.

 Could the whole lot be transferred to resolve.c or would that make it
 horribly messy?

Again, IMO is it not easily transferable to primary.c or even resolve.c.
Therefore no.

 I couldn't apply the patch right now -
 does it work with variable expressions for the target array indices?

I am not quite sure, what you mean. Something like this:

  associate(pam = im(2:3, 2:3))
pam = 9
pam(1,2) = 10
do c = 1, 2
pam(2, c) = 0
end do
  end associate

? I have added that to the testcase and it works. Or do you want the variable
expressions in the target, like this:

  integer :: expect(20)= 23
  integer :: im(4,5) = 23
  integer :: c

  expect(2:3) = 9
  do c = 1, 5
im = 23
associate(pam = im(:, c))
  pam(2:3) = 9
end associate
if (any (reshape(im, [20]) /= expect)) call abort()
! Shift expect
expect = [expect(17:), expect(:16)]
  end do

Will this do, or did you have something more elaborate in mind? This is also
working and in the testcase now.

Thanks for the review so far.

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


[ping] Couple of patches for -fdump-ada-spec

2015-06-22 Thread Eric Botcazou
Add query for template-dependent arguments to -fdump-ada-spec:
  http://gcc.gnu.org/ml/gcc-patches/2015-06/msg00403.html

Get rid of assembly file with -fdump-ada-spec:
  http://gcc.gnu.org/ml/gcc-patches/2015-06/msg00420.html

Thanks in advance.

-- 
Eric Botcazou


[Aarch64] Expand +rdma documentation, small changes to march and mcpu text.

2015-06-22 Thread Matthew Wahab

Hello,

The documentation for the ARMv8.1 +rdma option doesn't mention that enabling it
also implies enabling Adv.SIMD. This patch fixes that.

The documentation for the -march and -mcpu options are also a little messy, this
patch tries to make the text clearer and adds a (texinfo) link to the subsection
documenting the feature modifiers.

Tested by checking the html output.

Ok for trunk?
Matthew

2015-06-22  Matthew Wahab  matthew.wa...@arm.com

* doc/invoke.texi (Aarch64 Options, -march): Split out arch and
feature description, split out the native option, add a link to
the feature documentation, rearrange and slightly rewrite text.
(Aarch64 options, -mcpu): Likewise.
(Aarch64 options, Feature Modifiers): Add an anchor.  Mention +rdma
implies Adv. SIMD.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b99ab1c..599dbf0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12426,24 +12426,26 @@ corresponding flag to the linker.
 @opindex march
 Specify the name of the target architecture, optionally suffixed by one or
 more feature modifiers.  This option has the form
-@option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}, where the
-permissible values for @var{arch} are @samp{armv8-a} or @samp{armv8.1-a}.
-The permissible values for @var{feature} are documented in the sub-section
-below.  Additionally on native AArch64 GNU/Linux systems the value
+@option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}.
+
+The permissible values for @var{arch} are @samp{armv8-a} or
+@samp{armv8.1-a}.
+
+For the permissible values for @var{feature}, see the sub-section on
+@ref{aarch64-feature-modifiers,,@option{-march} and @option{-mcpu}
+Feature Modifiers}.  Where conflicting feature modifiers are
+specified, the right-most feature is used.
+
+Additionally on native AArch64 GNU/Linux systems the value
 @samp{native} is available.  This option causes the compiler to pick the
 architecture of the host system.  If the compiler is unable to recognize the
 architecture of the host system this option has no effect.
 
-Where conflicting feature modifiers are specified, the right-most feature is
-used.
-
-GCC uses this name to determine what kind of instructions it can emit when
-generating assembly code.
-
-Where @option{-march} is specified without either of @option{-mtune}
-or @option{-mcpu} also being specified, the code is tuned to perform
-well across a range of target processors implementing the target
-architecture.
+GCC uses @var{name} to determine what kind of instructions it can emit
+when generating assembly code.  If @option{-march} is specified
+without either of @option{-mtune} or @option{-mcpu} also being
+specified, the code is tuned to perform well across a range of target
+processors implementing the target architecture.
 
 @item -mtune=@var{name}
 @opindex mtune
@@ -12456,12 +12458,11 @@ Additionally, this option can specify that GCC should tune the performance
 of the code for a big.LITTLE system.  Permissible values for this
 option are: @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53}.
 
-Additionally on native AArch64 GNU/Linux systems the value @samp{native}
-is available.
-This option causes the compiler to pick the architecture of and tune the
-performance of the code for the processor of the host system.
-If the compiler is unable to recognize the processor of the host system
-this option has no effect.
+Additionally on native AArch64 GNU/Linux systems the value
+@samp{native} is available.  This option causes the compiler to pick
+the architecture of and tune the performance of the code for the
+processor of the host system.  If the compiler is unable to recognize
+the processor of the host system this option has no effect.
 
 Where none of @option{-mtune=}, @option{-mcpu=} or @option{-march=}
 are specified, the code is tuned to perform well across a range
@@ -12471,23 +12472,23 @@ This option cannot be suffixed by feature modifiers.
 
 @item -mcpu=@var{name}
 @opindex mcpu
-Specify the name of the target processor, optionally suffixed by one or more
-feature modifiers.  This option has the form
-@option{-mcpu=@var{cpu}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}, where the
-permissible values for @var{cpu} are the same as those available for
-@option{-mtune}.  Additionally on native AArch64 GNU/Linux systems the
-value @samp{native} is available.
-This option causes the compiler to tune the performance of the code for the
-processor of the host system.  If the compiler is unable to recognize the
-processor of the host system this option has no effect.
-
-The permissible values for @var{feature} are documented in the sub-section
-below.
-
-Where conflicting feature modifiers are specified, the right-most feature is
-used.
+Specify the name of the target processor, optionally suffixed by one
+or more feature modifiers.  This option has the form

Re: RFA: Fix isl-ast-gen-if-1.c test

2015-06-22 Thread Jeff Law

On 06/22/2015 10:07 AM, Nicholas Clifton wrote:

Hi Jeff,


I'd tend to prefer to change the size of the array -- adding another
conditional in the loop may have unintended consequences that possibly
scramble things just enough to compromise the test.


Okey dokey, here is a revised version.  Is this one OK ?

Cheers
   Nick

gcc/ChangeLog

Index: 2015-06-22  Nick Clifton  ni...@redhat.com

 * gcc.dg/graphite/isl-ast-gen-if.c (main): Increase size of a
 array to allow a[50] to be a valid location.

OK.
jeff



Re: [C++ Patch] Remove pointless code in grokdeclarator

2015-06-22 Thread Jason Merrill
I think we should keep a comment to clarify why we don't care about 
type_quals here.


Jason


Re: [C++ Patch] Remove pointless code in grokdeclarator

2015-06-22 Thread Paolo Carlini

Hi,

On 06/22/2015 06:56 PM, Jason Merrill wrote:
I think we should keep a comment to clarify why we don't care about 
type_quals here.

Ok, I will commit with a comment added.

Thanks,
Paolo.


Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Julian Brown
On Mon, 22 Jun 2015 16:24:56 +0200
Jakub Jelinek ja...@redhat.com wrote:

 On Mon, Jun 22, 2015 at 02:55:49PM +0100, Julian Brown wrote:
  One problem is that (at least on the GPU hardware we've considered
  so far) we're somewhat constrained in how much control we have over
  how the underlying hardware executes code: it's possible to draw up
  a scheme where OpenACC source-level control-flow semantics are
  reflected directly in the PTX assembly output (e.g. to say all
  threads in a CTA/warp will be coherent after such-and-such a
  loop), and lowering OpenACC directives quite early seems to make
  that relatively tractable. (Even if the resulting code is
  relatively un-optimisable due to the abnormal edges inserted to
  make sure that the CFG doesn't become ill-formed.)
  
  If arbitrary optimisations are done between OMP-lowering time and
  somewhere around vectorisation (say), it's less clear if that
  correspondence can be maintained. Say if the code executed by half
  the threads in a warp becomes physically separated from the code
  executed by the other half of the threads in a warp due to some loop
  optimisation, we can no longer easily determine where that warp will
  reconverge, and certain other operations (relying on coherent warps
  -- e.g. CTA synchronisation) become impossible. A similar issue
  exists for warps within a CTA.
  
  So, essentially -- I don't know how late loop lowering would
  interact with:
  
  (a) Maintaining a CFG that will work with PTX.
  
  (b) Predication for worker-single and/or vector-single modes
  (actually all currently-proposed schemes have problems with proper
  representation of data-dependencies for variables and
  compiler-generated temporaries between predicated regions.)
 
 I don't understand why lowering the way you suggest helps here at all.
 In the proposed scheme, you essentially have whole function
 in e.g. worker-single or vector-single mode, which you need to be
 able to handle properly in any case, because users can write such
 routines themselves.  And then you can have a loop in such a function
 that has some special attribute, a hint that it is desirable to
 vectorize it (for PTX the PTX way) or use vector-single mode for it
 in a worker-single function.  So, the special pass then of course
 needs to handle all the needed broadcasting and reduction required to
 change the mode from e.g. worker-single to vector-single, but the
 convergence points still would be either on the boundary of such
 loops to be vectorized or parallelized, or wherever else they appear
 in normal vector-single or worker-single functions (around the calls
 to certainly calls?).

I think most of my concerns are centred around loops (with the markings
you suggest) that might be split into parts: if that cannot happen for
loops that are annotated as you describe, maybe things will work out OK.

(Apologies for my ignorance here, this isn't a part of the compiler
that I know anything about.)

Julian


[PATCH 2/2][ARM] Use new FPU features representation

2015-06-22 Thread Matthew Wahab

Hello,

This patch series changes the representation of FPU features to use a simple
bit-set and flags, as is done elsewhere.

This patch uses the new representation of FPU feature sets.

Tested the series for arm-none-linux-gnueabihf with check-gcc

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  matthew.wa...@arm.com

* config/arm/arm-fpus.def: Replace neon, fp16 and crypto boolean
fields with feature flags.  Update comment.
* config/arm/arm.c (ARM_FPU): Update macro.
* config/arm/arm.h (TARGET_NEON_FP16): Update feature test.
(TARGET_FP16): Likewise.
(TARGET_CRYPTO): Likewise.
(TARGET_NEON): Likewise.
(struct arm_fpu_desc): Remove fields neon, fp16 and crypto.  Add
field features.

From 6f9cd1b41d7597d95bd80aa21344f8e6e011e168 Mon Sep 17 00:00:00 2001
From: Matthew Wahab matthew.wa...@arm.com
Date: Wed, 10 Jun 2015 10:11:56 +0100
Subject: [PATCH 2/2] Use new FPU feature definitions.

Change-Id: I0c45e52b08b31433ec2b30fcb666584cabcb826b
---
 gcc/config/arm/arm-fpus.def | 40 
 gcc/config/arm/arm.c|  4 ++--
 gcc/config/arm/arm.h| 22 +-
 3 files changed, 35 insertions(+), 31 deletions(-)

diff --git a/gcc/config/arm/arm-fpus.def b/gcc/config/arm/arm-fpus.def
index 2dfefd6..efd5896 100644
--- a/gcc/config/arm/arm-fpus.def
+++ b/gcc/config/arm/arm-fpus.def
@@ -19,30 +19,30 @@
 
 /* Before using #include to read this file, define a macro:
 
-  ARM_FPU(NAME, MODEL, REV, VFP_REGS, NEON, FP16, CRYPTO)
+  ARM_FPU(NAME, MODEL, REV, VFP_REGS, FEATURES)
 
The arguments are the fields of struct arm_fpu_desc.
 
genopt.sh assumes no whitespace up to the first , in each entry.  */
 
-ARM_FPU(vfp,		ARM_FP_MODEL_VFP, 2, VFP_REG_D16, false, false, false)
-ARM_FPU(vfpv3,	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, false, false)
-ARM_FPU(vfpv3-fp16,	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, true, false)
-ARM_FPU(vfpv3-d16,	ARM_FP_MODEL_VFP, 3, VFP_REG_D16, false, false, false)
-ARM_FPU(vfpv3-d16-fp16,	ARM_FP_MODEL_VFP, 3, VFP_REG_D16, false, true, false)
-ARM_FPU(vfpv3xd,	ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, false, false, false)
-ARM_FPU(vfpv3xd-fp16,	ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, false, true, false)
-ARM_FPU(neon,		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, true , false, false)
-ARM_FPU(neon-fp16,	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, true, true, false)
-ARM_FPU(vfpv4,	ARM_FP_MODEL_VFP, 4, VFP_REG_D32, false, true, false)
-ARM_FPU(vfpv4-d16,	ARM_FP_MODEL_VFP, 4, VFP_REG_D16, false, true, false)
-ARM_FPU(fpv4-sp-d16,	ARM_FP_MODEL_VFP, 4, VFP_REG_SINGLE, false, true, false)
-ARM_FPU(fpv5-sp-d16,	ARM_FP_MODEL_VFP, 5, VFP_REG_SINGLE, false, true, false)
-ARM_FPU(fpv5-d16,	ARM_FP_MODEL_VFP, 5, VFP_REG_D16, false, true, false)
-ARM_FPU(neon-vfpv4,	ARM_FP_MODEL_VFP, 4, VFP_REG_D32, true, true, false)
-ARM_FPU(fp-armv8,	ARM_FP_MODEL_VFP, 8, VFP_REG_D32, false, true, false)
-ARM_FPU(neon-fp-armv8,ARM_FP_MODEL_VFP, 8, VFP_REG_D32, true, true, false)
+ARM_FPU(vfp,		ARM_FP_MODEL_VFP, 2, VFP_REG_D16, FPU_FL_NONE)
+ARM_FPU(vfpv3,	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NONE)
+ARM_FPU(vfpv3-fp16,	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_FP16)
+ARM_FPU(vfpv3-d16,	ARM_FP_MODEL_VFP, 3, VFP_REG_D16, FPU_FL_NONE)
+ARM_FPU(vfpv3-d16-fp16,	ARM_FP_MODEL_VFP, 3, VFP_REG_D16, FPU_FL_FP16)
+ARM_FPU(vfpv3xd,	ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, FPU_FL_NONE)
+ARM_FPU(vfpv3xd-fp16,	ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, FPU_FL_FP16)
+ARM_FPU(neon,		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NEON)
+ARM_FPU(neon-fp16,	ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
+ARM_FPU(vfpv4,	ARM_FP_MODEL_VFP, 4, VFP_REG_D32, FPU_FL_FP16)
+ARM_FPU(vfpv4-d16,	ARM_FP_MODEL_VFP, 4, VFP_REG_D16, FPU_FL_FP16)
+ARM_FPU(fpv4-sp-d16,	ARM_FP_MODEL_VFP, 4, VFP_REG_SINGLE, FPU_FL_FP16)
+ARM_FPU(fpv5-sp-d16,	ARM_FP_MODEL_VFP, 5, VFP_REG_SINGLE, FPU_FL_FP16)
+ARM_FPU(fpv5-d16,	ARM_FP_MODEL_VFP, 5, VFP_REG_D16, FPU_FL_FP16)
+ARM_FPU(neon-vfpv4,	ARM_FP_MODEL_VFP, 4, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
+ARM_FPU(fp-armv8,	ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_FP16)
+ARM_FPU(neon-fp-armv8,ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
 ARM_FPU(crypto-neon-fp-armv8,
-			ARM_FP_MODEL_VFP, 8, VFP_REG_D32, true, true, true)
+			ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_CRYPTO)
 /* Compatibility aliases.  */
-ARM_FPU(vfp3,		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, false, false)
+ARM_FPU(vfp3,		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NONE)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e79a369..e104d2f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2231,8 +2231,8 @@ char arm_arch_name[] = __ARM_ARCH_0UNK__;
 
 static const struct arm_fpu_desc all_fpus[] =
 {
-#define ARM_FPU(NAME, MODEL, REV, VFP_REGS, NEON, FP16, CRYPTO) \
-  { NAME, MODEL, REV, VFP_REGS, NEON, FP16, CRYPTO },
+#define ARM_FPU(NAME, MODEL, REV, VFP_REGS, FEATURES) \
+  { 

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Bernd Schmidt

On 06/22/2015 04:24 PM, Jakub Jelinek wrote:

I don't understand why lowering the way you suggest helps here at all.
In the proposed scheme, you essentially have whole function
in e.g. worker-single or vector-single mode, which you need to be able to
handle properly in any case, because users can write such routines
themselves.  And then you can have a loop in such a function that
has some special attribute, a hint that it is desirable to vectorize it
(for PTX the PTX way) or use vector-single mode for it in a worker-single
function.


You can have a hint that it is desirable, but not a hint that it is 
correct (because passes in between may invalidate that). The OpenACC 
directives guarantee to the compiler that the program can be transformed 
into a parallel form. If we lose them early we must then rely on our 
analysis which may not be strong enough to prove that the loop can be 
parallelized. If we make these transformations early enough, while we 
still have the OpenACC directives, we can guarantee that we do exactly 
what the programmer specified.



Bernd



[PATCH 2/4][ARM] Add feature set definitions.

2015-06-22 Thread Matthew Wahab

Hello,

The ARM backend uses an unsigned long to record CPU feature flags and there are
currently 30 bits in use. This series of patches replaces the single unsigned
long with a representation based on an array of values.

This patch adds, but doesn't use, type arm_feature_set and macros prefixed
with ARM_FSET to represent and operate on feature sets.

Tested by building with no errors. Also tested as part of the series, for
arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  matthew.wa...@arm.com

* config/arm/arm-protos.h (FL_NONE): New.
(FL_ANY): New.
(arm_feature_set): New.
(ARM_FSET_MAKE): New.
(ARM_FSET_MAKE_CPU1): New.
(ARM_FSET_MAKE_CPU2): New.
(ARM_FSET_CPU1): New.
(ARM_FSET_CPU2): New.
(ARM_FSET_EMPTY): New.
(ARM_FSET_ANY): New.
(ARM_FSET_HAS_CPU1): New.
(ARM_FSET_HAS_CPU2): New.
(ARM_FSET_ADD_CPU1): New.
(ARM_FSET_ADD_CPU2): New.
(ARM_FSET_DEL_CPU1): New.
(ARM_FSET_DEL_CPU2): New.
(ARM_FSET_UNION): New.
(ARM_FSET_INTER): New.
(ARM_FSET_XOR): New.
(ARM_FSET_EXCLUDE): New.
(AFM_FSET_IS_EMPTY): New.
(ARM_FSET_CPU_SUBSET): New.

From 1a98a80b64427f7bb97212ae9ecff515e980ddb7 Mon Sep 17 00:00:00 2001
From: Matthew Wahab matthew.wa...@arm.com
Date: Thu, 4 Jun 2015 15:35:25 +0100
Subject: [PATCH 2/4] Add feature set definitions.

Change-Id: I5f89b46ea57e35f477ec4751fea3cb6ee8fce251
---
 gcc/config/arm/arm-protos.h | 101 
 1 file changed, 101 insertions(+)

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 62f91ef..a19d54d 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -346,6 +346,8 @@ extern bool arm_is_constant_pool_ref (rtx);
 /* Flags used to identify the presence of processor capabilities.  */
 
 /* Bit values used to identify processor capabilities.  */
+#define FL_NONE	  (0)	  /* No flags.  */
+#define FL_ANY	  (0x)/* All flags.  */
 #define FL_CO_PROC(1  0)/* Has external co-processor bus */
 #define FL_ARCH3M (1  1)/* Extended multiply */
 #define FL_MODE26 (1  2)/* 26-bit mode support */
@@ -412,6 +414,105 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
 #define FL_FOR_ARCH8A	(FL_FOR_ARCH7VE | FL_ARCH8)
 
+/* There are too many feature bits to fit in a single word so the set of cpu and
+   fpu capabilities is a structure.  A feature set is created and manipulated
+   with the ARM_FSET macros.  */
+
+typedef struct
+{
+  unsigned long cpu[2];
+} arm_feature_set;
+
+
+/* Initialize a feature set.  */
+
+#define ARM_FSET_MAKE(CPU1,CPU2) { { (CPU1), (CPU2) } }
+
+#define ARM_FSET_MAKE_CPU1(CPU1) ARM_FSET_MAKE ((CPU1), (FL_NONE))
+#define ARM_FSET_MAKE_CPU2(CPU2) ARM_FSET_MAKE ((FL_NONE), (CPU2))
+
+/* Accessors.  */
+
+#define ARM_FSET_CPU1(S) ((S).cpu[0])
+#define ARM_FSET_CPU2(S) ((S).cpu[1])
+
+/* Useful combinations.  */
+
+#define ARM_FSET_EMPTY ARM_FSET_MAKE (FL_NONE, FL_NONE)
+#define ARM_FSET_ANY ARM_FSET_MAKE (FL_ANY, FL_ANY)
+
+/* Tests for a specific CPU feature.  */
+
+#define ARM_FSET_HAS_CPU1(A, F)  (((A).cpu[0]  (F)) == F)
+#define ARM_FSET_HAS_CPU2(A, F)  (((A).cpu[1]  (F)) == F)
+
+/* Add a feature to a feature set.  */
+
+#define ARM_FSET_ADD_CPU1(DST, F)		\
+  do {		\
+(DST).cpu[0] |= (F);			\
+  } while (0)
+
+#define ARM_FSET_ADD_CPU2(DST, F)		\
+  do {		\
+(DST).cpu[1] |= (F);			\
+  } while (0)
+
+/* Remove a feature from a feature set.  */
+
+#define ARM_FSET_DEL_CPU1(DST, F)		\
+  do {		\
+(DST).cpu[0] = ~(F);			\
+  } while (0)
+
+#define ARM_FSET_DEL_CPU2(DST, F)		\
+  do {		\
+(DST).cpu[1] = ~(F);			\
+  } while (0)
+
+/* Union of feature sets.  */
+
+#define ARM_FSET_UNION(DST,F1,F2)		\
+  do {		\
+(DST).cpu[0] = (F1).cpu[0] | (F2).cpu[0];	\
+(DST).cpu[1] = (F1).cpu[1] | (F2).cpu[1];	\
+  } while (0)
+
+/* Intersection of feature sets.  */
+
+#define ARM_FSET_INTER(DST,F1,F2)		\
+  do {		\
+(DST).cpu[0] = (F1).cpu[0]  (F2).cpu[0];	\
+(DST).cpu[1] = (F1).cpu[1]  (F2).cpu[1];	\
+  } while (0)
+
+/* Exclusive disjunction.  */
+
+#define ARM_FSET_XOR(DST,F1,F2)\
+  do {			\
+(DST).cpu[0] = (F1).cpu[0] ^ (F2).cpu[0];		\
+(DST).cpu[1] = (F1).cpu[1] ^ (F2).cpu[1];		\
+  } while (0)
+
+/* Difference of feature sets: F1 excluding the elements of F2.  */
+
+#define ARM_FSET_EXCLUDE(DST,F1,F2)		\
+  do {		\
+(DST).cpu[0] = (F1).cpu[0]  ~(F2).cpu[0];	\
+(DST).cpu[1] = (F1).cpu[1]  ~(F2).cpu[1];	\
+  } while (0)
+
+/* Test for an empty feature set.  */
+
+#define ARM_FSET_IS_EMPTY(A)		\
+  (!((A).cpu[0])  !((A).cpu[1]))
+
+/* Tests whether the cpu features of A are a subset of B.  */
+
+#define ARM_FSET_CPU_SUBSET(A,B)	\
+  A).cpu[0]  (B).cpu[0]) == (A).cpu[0])			

Re: genmatch: guess the type of a?b:c as b instead of a

2015-06-22 Thread Jeff Law

On 06/06/2015 05:34 AM, Marc Glisse wrote:

Hello,

as discussed around
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00041.html
we are currently guessing the type of a?b:c incorrectly. This does not
affect current simplifications, because the only 'cond' in output
patterns are at the outermost level, so their type is forced to 'type'
and never guessed. Indeed, the patch does not change the generated
*-match.c. It would allow removing an explicit cond:itype in a patch
posted by Jeff.

I tested it on a dummy .pd file containing:
(simplify
  (plus @0 (plus @1 @2))
  (negate (cond @0 @1 @2)))

and the generated files differ by:

-  res = fold_build3_loc (loc, COND_EXPR, TREE_TYPE (ops1[0]), ops1[0],
ops1[1], ops1[2]);
+  res = fold_build3_loc (loc, COND_EXPR, TREE_TYPE (ops1[1]), ops1[0],
ops1[1], ops1[2]);

(and something similar for gimple)

I wondered about using something like
VOID_TYPE_P (TREE_TYPE (ops1[1])) ? TREE_TYPE (ops1[2]) : TREE_TYPE
(ops1[1])
but I don't think that will be necessary.

Bootstrap is currently broken on many platforms with comparison
failures, but since it went that far and generated the same *-match.c
files, that seems sufficient testing.

2015-06-08  Marc Glisse  marc.gli...@inria.fr

 * genmatch.c (expr::gen_transform): For conditions, guess the type
 from the second operand.
Thanks for taking care of this.  I'd gone back and verified the type was 
needed, but didn't have to time reduce a testcase for it prior to going 
on PTO.


Jeff


[PATCH 4/4][ARM] Move initializer into arm-cores.def and arm-arches.def

2015-06-22 Thread Matthew Wahab

Hello,

The ARM backend uses an unsigned long to record CPU feature flags and there are
currently 30 bits in use. This series of patches replaces the single unsigned
long with a representation based on an array of values.

This patch updates the entries in the arm-core.def and arm-arches.def files
for the new arm_feature_set representation, moving the initializers from a macro
expansion and making them explicit in the file entries.

Tested for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-08-22  Matthew Wahab  matthew.wa...@arm.com

* config/arm/arm-arches.def: Replace single value flags with
initializer built from ARM_FSET_MAKE_CPU1.
* config/arm/arm-cores.def: Likewise.
* config/arm/arm.c: (all_cores): Remove ARM_FSET_MAKE_CPU1
derivation from the ARM_CORE macro definition, use the given value
instead.
(all_architectures): Remove ARM_FSET_MAKE_CPU1 derivation from the
ARM_ARCH macro definition, use the given value instead.

From 389cfb0e1046b1d84dd3d8920aa5bed50dc19164 Mon Sep 17 00:00:00 2001
From: Matthew Wahab matthew.wa...@arm.com
Date: Mon, 8 Jun 2015 16:15:52 +0100
Subject: [PATCH 4/4] Move feature sets into core and arch def files.

Change-Id: Ica484c7d9f46413c196b26a630ff49413b10289b
---
 gcc/config/arm/arm-arches.def |  56 ++--
 gcc/config/arm/arm-cores.def  | 200 +-
 gcc/config/arm/arm.c  |   4 +-
 3 files changed, 130 insertions(+), 130 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 840c1ff..6d0374a 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -28,33 +28,33 @@
 
genopt.sh assumes no whitespace up to the first , in each entry.  */
 
-ARM_ARCH(armv2,   arm2,   2,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2)
-ARM_ARCH(armv2a,  arm2,   2,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2)
-ARM_ARCH(armv3,   arm6,   3,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3)
-ARM_ARCH(armv3m,  arm7m,  3M,  FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3M)
-ARM_ARCH(armv4,   arm7tdmi,   4,   FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH4)
+ARM_ARCH(armv2,   arm2,   2,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2))
+ARM_ARCH(armv2a,  arm2,   2,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2))
+ARM_ARCH(armv3,   arm6,   3,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3))
+ARM_ARCH(armv3m,  arm7m,  3M,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3M))
+ARM_ARCH(armv4,   arm7tdmi,   4,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH4))
 /* Strictly, FL_MODE26 is a permitted option for v4t, but there are no
implementations that support it, so we will leave it out for now.  */
-ARM_ARCH(armv4t,  arm7tdmi,   4T,  FL_CO_PROC | FL_FOR_ARCH4T)
-ARM_ARCH(armv5,   arm10tdmi,  5,   FL_CO_PROC | FL_FOR_ARCH5)
-ARM_ARCH(armv5t,  arm10tdmi,  5T,  FL_CO_PROC | FL_FOR_ARCH5T)
-ARM_ARCH(armv5e,  arm1026ejs, 5E,  FL_CO_PROC | FL_FOR_ARCH5E)
-ARM_ARCH(armv5te, arm1026ejs, 5TE, FL_CO_PROC | FL_FOR_ARCH5TE)
-ARM_ARCH(armv6,   arm1136js,  6,   FL_CO_PROC | FL_FOR_ARCH6)
-ARM_ARCH(armv6j,  arm1136js,  6J,  FL_CO_PROC | FL_FOR_ARCH6J)
-ARM_ARCH(armv6k,  mpcore,	6K,  FL_CO_PROC | FL_FOR_ARCH6K)
-ARM_ARCH(armv6z,  arm1176jzs, 6Z,  FL_CO_PROC | FL_FOR_ARCH6Z)
-ARM_ARCH(armv6zk, arm1176jzs, 6ZK, FL_CO_PROC | FL_FOR_ARCH6ZK)
-ARM_ARCH(armv6t2, arm1156t2s, 6T2, FL_CO_PROC | FL_FOR_ARCH6T2)
-ARM_ARCH(armv6-m, cortexm1,	6M,			  FL_FOR_ARCH6M)
-ARM_ARCH(armv6s-m, cortexm1,	6M,			  FL_FOR_ARCH6M)
-ARM_ARCH(armv7,   cortexa8,	7,   FL_CO_PROC |	  FL_FOR_ARCH7)
-ARM_ARCH(armv7-a, cortexa8,	7A,  FL_CO_PROC |	  FL_FOR_ARCH7A)
-ARM_ARCH(armv7ve, cortexa8,	7A,  FL_CO_PROC |	  FL_FOR_ARCH7VE)
-ARM_ARCH(armv7-r, cortexr4,	7R,  FL_CO_PROC |	  FL_FOR_ARCH7R)
-ARM_ARCH(armv7-m, cortexm3,	7M,  FL_CO_PROC |	  FL_FOR_ARCH7M)
-ARM_ARCH(armv7e-m, cortexm4,  7EM, FL_CO_PROC |	  FL_FOR_ARCH7EM)
-ARM_ARCH(armv8-a, cortexa53,  8A,  FL_CO_PROC | FL_FOR_ARCH8A)
-ARM_ARCH(armv8-a+crc,cortexa53, 8A,FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A)
-ARM_ARCH(iwmmxt,  iwmmxt, 5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT)
-ARM_ARCH(iwmmxt2, iwmmxt2,5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2)
+ARM_ARCH(armv4t,  arm7tdmi,   4T,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH4T))
+ARM_ARCH(armv5,   arm10tdmi,  5,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH5))
+ARM_ARCH(armv5t,  arm10tdmi,  5T,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH5T))
+ARM_ARCH(armv5e,  arm1026ejs, 5E,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH5E))
+ARM_ARCH(armv5te, arm1026ejs, 5TE,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | 

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Jakub Jelinek
On Mon, Jun 22, 2015 at 12:08:36PM -0400, Nathan Sidwell wrote:
 On 06/22/15 11:18, Bernd Schmidt wrote:
 
 You can have a hint that it is desirable, but not a hint that it is correct
 (because passes in between may invalidate that). The OpenACC directives
 guarantee to the compiler that the program can be transformed into a parallel
 form. If we lose them early we must then rely on our analysis which may not 
 be
 strong enough to prove that the loop can be parallelized. If we make these
 transformations early enough, while we still have the OpenACC directives, we 
 can
 guarantee that we do exactly what the programmer specified.
 
 How does this differ from openmp's needs to preserve parallelism on a
 parallel loop?  Is it more than the reconvergence issue?

OpenMP has significantly different execution model, a parallel block in
OpenMP is run by certain number of threads (the initial thread (the one
encountering that region) and then dpeending on clauses and library
decisions perhaps others), with a barrier at the end of the region, and
afterwards only the initial thread continues again.
So, an OpenMP parallel is implemented as a library call, taking outlined
function from the parallel's body as one of its arguments and the body
is executed by the initial thread and perhaps others.
OpenMP worksharing loop is just coordination between the threads in the
team, which thread takes which subset of the loop's iterations, and
optionally followed by a barrier.  OpenMP simd loop is a loop that has
certain properties guaranteed by the user and can be vectorized.
In contrast to this, OpenACC spawns all the threads/CTAs upfront, and then
idles on some of them until there is work for them.

Jakub


Re: [gomp4] Remove some ptxness from middle end

2015-06-22 Thread Nathan Sidwell

On 06/22/15 13:04, Marek Polacek wrote:

On Mon, Jun 22, 2015 at 01:00:51PM -0400, Nathan Sidwell wrote:

+  if (GET_CODE (arg) != CONST_INT
+  || (unsigned HOST_WIDE_INT)INTVAL (arg) = OACC_HWM)


Don't we have UINTVAL for this?  So UINTVAL (arg).


Oh, thanks! will fix

nathan

--
Nathan Sidwell


RFA: Fix isl-ast-gen-if-1.c test

2015-06-22 Thread Nick Clifton
Hi Guys,

  The test file gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c file
  was generating an unexpected failure for the RX.  When I investigated
  I found that a return address on the stack was being corrupted, and I
  tracked it down to the foo() function:

foo (int a[], int n)
{
  int i;
  for (i = 0; i  n; i++)
{
  if (i  25)
a[i] = i;
  a[n - i] = 1;
}
}

  The problem is that when i is 0, the line a[n - i] writes to a[50]
  which is beyond the end of the a array.  (In the RX case it writes
  over the return address on the stack).

  The patch below fixes the problem, although it could also be solved by
  increasing the size of the a array when it is declared in main().

  OK to apply ?

Cheers
  Nick

gcc/testsuite/ChangeLog
2015-06-22  Nick Clifton  ni...@redhat.com

* gcc.dg/graphite/isl-ast-gen-if-1.c (foo): Prevent writing after
the end of the array.

Index: gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c
===
--- gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c(revision 224722)
+++ gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c(working copy)
@@ -10,7 +10,8 @@
 {
   if (i  25)
 a[i] = i;
-  a[n - i] = 1;
+  if (i  0)
+   a[n - i] = 1;
 }
 }
 


Re: [PATCH 3/3] Improve -Wmissing-indentation heuristics

2015-06-22 Thread Jeff Law

On 06/18/2015 10:57 AM, Patrick Palka wrote:
[ big snip ]



These bogus warnings are pre-existing, however (i.e. not caused by this
patch).


(nods)   Fixing the false positives from libpng/bdwgc sounds like a
separate issue and thus a separate patch then.

Agreed.

jeff


[gomp4] Remove some ptxness from middle end

2015-06-22 Thread Nathan Sidwell
I've committed this patch to the gomp4 branch, after testing.  It does a number 
of cleanups


1) removes the ptx-specific TID, NTID, CTAID  NCTAID builtins, replacing them 
with openacc-specific GOACC_id and GOACC_nid builtins, using gang/worker  
vector level enumeration.  These are mapped by the PTX backend to PTX-specifc 
instructions.


2) Created a  oacc_loop_levels enumeration, and generate the loop nest masks 
from that.


3) Removed a bunch of duplicate calculations in omp-low related to determining 
number of threads and thread index. With #2 it becomes easier to use a loop.


nathan
--
Nathan Sidwell
2015-06-20  Nathan Sidwell  nat...@codesourcery.com

	gcc/
	* omp-builtins.def (BUILT_IN_GOACC_NTID, BUILTIN_NCTAID): Replace
	with ...
	(BUILT_IN_GOACC_NID): ... this.
	(BUILT_IN_GOACC_TID, BUILTIN_CTAID): Replace with ...
	(BUILT_IN_GOACC_ID): ... this.
	* builtins.c: Include omp-low.h.
	(expand_oacc_buoltin): Replace with ...
	(expand_oacc_id): ... this.
	(expand_builtin, is_simple_builtin): Adjust.oo
	* omp-low.h (enum oacc_loop_levels): New.
	* omp-low.c (MASK_GANG, MASK_WORKER, MASK_VECTOR): Replace with ...
	(OACC_LOOP_MASK): ... this.
	(scan_omp_for, scan_omp_target): Adjust.
	(expand_oacc_get_num_threads): Adjust and use a loop.
	(expand_oacc_get_thread_num): Likewise.
	(oacc_loop_needs_thread_barrier_p, find_omp_for_region_gwv,
	find_omp_taarget_region_data, required_predication_mask,
	generate_vector_broadcast, generate_oacc_broadcast): Adjust.
	(make_predication_test): Adjust and use a loop.
	(predicate_bb, oacc_broadcast, oacc_init_count_vars): Adjust.
	* config/nvptx/nvptx.md (UNSPEC_NTID, UNSPEC_TID, UNSPEC_NCTAID,
	UNSPEC_CTAID): Replace with ...
	(UNSPEC_NID, UNSPEC_ID): ... these.
	(*oacc_ntid_insn, oacc_ntid, *oacc_tid_insn, oacc_tid,
	*oacc_nctaid_insn, oacc_nctaid, *oacc_ctaid_insn,
	oacc_ctaid): Replace with ...
	(oacc_nid, oacc_id): ... these.
	* config/nvptx/nvptx.c (nvptx_print_operand [CASE 'd']): Remove.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-static-2.c: Replace
	GOACC_ctaid builtin with GOACC_id.

Index: libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c	(revision 224671)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c	(working copy)
@@ -35,38 +35,38 @@ main ()
 
 #pragma acc parallel loop gang (static:*) num_gangs (10)
   for (i = 0; i  100; i++)
-a[i] = __builtin_GOACC_ctaid (0);
+a[i] = __builtin_GOACC_id (0);
 
   test_nonstatic (a, 10);
 
 #pragma acc parallel loop gang (static:1) num_gangs (10)
   for (i = 0; i  100; i++)
-a[i] = __builtin_GOACC_ctaid (0);
+a[i] = __builtin_GOACC_id (0);
 
   test_static (a, 10, 1);
 
 #pragma acc parallel loop gang (static:2) num_gangs (10)
   for (i = 0; i  100; i++)
-a[i] = __builtin_GOACC_ctaid (0);
+a[i] = __builtin_GOACC_id (0);
 
   test_static (a, 10, 2);
 
 #pragma acc parallel loop gang (static:5) num_gangs (10)
   for (i = 0; i  100; i++)
-a[i] = __builtin_GOACC_ctaid (0);
+a[i] = __builtin_GOACC_id (0);
 
   test_static (a, 10, 5);
 
 #pragma acc parallel loop gang (static:20) num_gangs (10)
   for (i = 0; i  100; i++)
-a[i] = __builtin_GOACC_ctaid (0);
+a[i] = __builtin_GOACC_id (0);
 
   test_static (a, 10, 20);
 
   /* Non-static gang.  */
 #pragma acc parallel loop gang num_gangs (10)
   for (i = 0; i  100; i++)
-a[i] = __builtin_GOACC_ctaid (0);
+a[i] = __builtin_GOACC_id (0);
 
   test_nonstatic (a, 10);
 
Index: gcc/omp-builtins.def
===
--- gcc/omp-builtins.def	(revision 224671)
+++ gcc/omp-builtins.def	(working copy)
@@ -61,13 +61,9 @@ DEF_GOACC_BUILTIN_FNSPEC (BUILT_IN_GOACC
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_WAIT, GOACC_wait,
 		   BT_FN_VOID_INT_INT_VAR,
 		   ATTR_NOTHROW_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_NTID, GOACC_ntid,
+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_ID, GOACC_id,
 		   BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_TID, GOACC_tid,
-		   BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_NCTAID, GOACC_nctaid,
-		   BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_CTAID, GOACC_ctaid,
+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_NID, GOACC_nid,
 		   BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_GET_GANGLOCAL_PTR, GOACC_get_ganglocal_ptr,
 		   BT_FN_PTR, ATTR_NOTHROW_LEAF_LIST)
Index: gcc/config/nvptx/nvptx.md
===
--- gcc/config/nvptx/nvptx.md	(revision 224671)
+++ gcc/config/nvptx/nvptx.md	(working copy)
@@ -49,10 +49,8 @@
 
UNSPEC_ALLOCA
 
-   UNSPEC_NTID
-   UNSPEC_TID
-   UNSPEC_NCTAID
-   UNSPEC_CTAID
+   UNSPEC_NID
+   UNSPEC_ID
 
UNSPEC_SHARED_DATA
 ])
@@ -1263,65 +1261,32 @@
   DONE;
 })
 
-(define_insn 

[patch] Fix std::polar() test FAIL

2015-06-22 Thread Jonathan Wakely

I recently added a debug mode assertion that std::polar is not called
with a negative rho argument, which this test does.

Tested powerpc64le-linux, committed to trunk.

commit 3592c4a31ba7f3af4eb8111565888651652ad7b1
Author: Jonathan Wakely jwak...@redhat.com
Date:   Mon Jun 22 15:08:55 2015 +0100

	* testsuite/26_numerics/complex/value_operations/1.cc: Use
	non-negative rho argument.

diff --git a/libstdc++-v3/testsuite/26_numerics/complex/value_operations/1.cc b/libstdc++-v3/testsuite/26_numerics/complex/value_operations/1.cc
index 1caf9f1..a1e0a6b 100644
--- a/libstdc++-v3/testsuite/26_numerics/complex/value_operations/1.cc
+++ b/libstdc++-v3/testsuite/26_numerics/complex/value_operations/1.cc
@@ -53,7 +53,7 @@ void test01()
 
  complex_type e __attribute__((unused)) = conj(c);
  
- complex_type f = polar(c.imag(), 0.0);
+ complex_type f = polar(std::abs(c.imag()), 0.0);
  VERIFY( f.real() != 0 );
 }
 


Re: [patch] libstdc++/55409 C++11 allocator support for std::list

2015-06-22 Thread Jonathan Wakely

On 17/06/15 21:36 +0100, Jonathan Wakely wrote:

I didn't get time to finish this for 5.1, but this adds missing C++11
allocator support to std::list.


François pointed out that this change means we can update
__gnu_debug::list to derive from an allocator-aware _Safe_sequence.

We can do the same for __cxx11::basic_string by evaluating
bool(_GLIBCXX_USE_CXX11_ABI).

Tested powerpc64-linux, committed to trunk.
commit e5fb5e8dc5d1b1ffeaa48cd1d05f76ee93bc377d
Author: Jonathan Wakely jwak...@redhat.com
Date:   Thu Jun 18 10:02:15 2015 +0100

	* include/debug/list (__gnu_debug::list): Use allocator-aware
	_Safe_container base.
	* include/debug/string (__gnu_debug::basic_string): Use
	allocator-aware _Safe_container base for cxx11 ABI.

diff --git a/libstdc++-v3/include/debug/list b/libstdc++-v3/include/debug/list
index 1562946..12ac53c 100644
--- a/libstdc++-v3/include/debug/list
+++ b/libstdc++-v3/include/debug/list
@@ -43,12 +43,12 @@ namespace __debug
 class list
 : public __gnu_debug::_Safe_container
 	list_Tp, _Allocator, _Allocator,
-	__gnu_debug::_Safe_node_sequence, false,
+	__gnu_debug::_Safe_node_sequence,
   public _GLIBCXX_STD_C::list_Tp, _Allocator
 {
-  typedef _GLIBCXX_STD_C::list_Tp, _Allocator			_Base;
+  typedef _GLIBCXX_STD_C::list_Tp, _Allocator		_Base;
   typedef __gnu_debug::_Safe_container
-	list, _Allocator, __gnu_debug::_Safe_node_sequence, false	_Safe;
+	list, _Allocator, __gnu_debug::_Safe_node_sequence	_Safe;
 
   typedef typename _Base::iterator		_Base_iterator;
   typedef typename _Base::const_iterator	_Base_const_iterator;
diff --git a/libstdc++-v3/include/debug/string b/libstdc++-v3/include/debug/string
index 3793a35..f068ef0 100644
--- a/libstdc++-v3/include/debug/string
+++ b/libstdc++-v3/include/debug/string
@@ -42,12 +42,13 @@ namespace __gnu_debug
 class basic_string
 : public __gnu_debug::_Safe_container
 	basic_string_CharT, _Traits, _Allocator,
-	_Allocator, _Safe_sequence, false,
+	_Allocator, _Safe_sequence, bool(_GLIBCXX_USE_CXX11_ABI),
   public std::basic_string_CharT, _Traits, _Allocator
 {
   typedef std::basic_string_CharT, _Traits, _Allocator	_Base;
   typedef __gnu_debug::_Safe_container
-	basic_string, _Allocator, _Safe_sequence, false	_Safe;
+	basic_string, _Allocator, _Safe_sequence, bool(_GLIBCXX_USE_CXX11_ABI)
+	_Safe;
 
   public:
 // types:


Re: Do not take address of empty string front

2015-06-22 Thread Jonathan Wakely

On 20/06/15 12:59 +0100, Jonathan Wakely wrote:

On 20/06/15 12:03 +0200, François Dumont wrote:

Hi

  2 experimental tests are failing in debug mode because
__do_str_codecvt is sometimes taking address of string front() and
back() even if empty. It wasn't use so not a big issue but it still
seems better to avoid. I propose to rather use string begin() to get
buffer address.


But derefencing begin() is still undefined for an empty string.
Shouldn't that fail for debug mode too? Why change one form of
undefined behaviour that we diagnose to another form that we don't
diagnose?

It would be better if that function didn't do any work when the input
range is empty:

--- a/libstdc++-v3/include/bits/locale_conv.h
+++ b/libstdc++-v3/include/bits/locale_conv.h
@@ -58,6 +58,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _OutStr __outstr, const _Codecvt __cvt, _State __state,
   size_t __count, _Fn __fn)
   {
+  if (__first == __last)
+   {
+ __outstr.clear();
+ return true;
+   }
+
 size_t __outchars = 0;
 auto __next = __first;
 const auto __maxlen = __cvt.max_length() + 1;


This makes that change, and also moves wstring_convert into the
ABI-tagged __cxx11 namespace, and fixes a copypaste error in the
exception thrown from wbuffer_convert.

Tested powerpc64le-linux, committed to trunk.

François, your changes to add extra checks in std::string are still
useful separately.

commit 4ab3f0a76f7e18074c91c4644cbfdf23084e93ba
Author: Jonathan Wakely jwak...@redhat.com
Date:   Mon Jun 22 13:47:24 2015 +0100

	* include/bits/locale_conv.h (__do_str_codecvt): Handle empty range.
	(wstring_convert): Move into __cxx11 namespace.
	(wbuffer_convert(streambuf*, _Codecvt*, state_type)): Fix exception
	message.

diff --git a/libstdc++-v3/include/bits/locale_conv.h b/libstdc++-v3/include/bits/locale_conv.h
index 61b535c..fd99499 100644
--- a/libstdc++-v3/include/bits/locale_conv.h
+++ b/libstdc++-v3/include/bits/locale_conv.h
@@ -58,6 +58,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		 _OutStr __outstr, const _Codecvt __cvt, _State __state,
 		 size_t __count, _Fn __fn)
 {
+  if (__first == __last)
+	{
+	  __outstr.clear();
+	  return true;
+	}
+
   size_t __outchars = 0;
   auto __next = __first;
   const auto __maxlen = __cvt.max_length() + 1;
@@ -150,6 +156,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __str_codecvt_out(__first, __last, __outstr, __cvt, __state, __n);
 }
 
+_GLIBCXX_BEGIN_NAMESPACE_CXX11
+
   /// String conversions
   templatetypename _Codecvt, typename _Elem = wchar_t,
 	   typename _Wide_alloc = allocator_Elem,
@@ -301,6 +309,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   bool			_M_with_strings = false;
 };
 
+_GLIBCXX_END_NAMESPACE_CXX11
+
   /// Buffer conversions
   templatetypename _Codecvt, typename _Elem = wchar_t,
 	   typename _Tr = char_traits_Elem
@@ -325,7 +335,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   : _M_buf(__bytebuf), _M_cvt(__pcvt), _M_state(__state)
   {
 	if (!_M_cvt)
-	  __throw_logic_error(wstring_convert);
+	  __throw_logic_error(wbuffer_convert);
 
 	_M_always_noconv = _M_cvt-always_noconv();
 


Re: Fix more of C/fortran canonical type issues

2015-06-22 Thread Richard Biener
On Mon, 22 Jun 2015, Jan Hubicka wrote:

   On Mon, 8 Jun 2015, Jan Hubicka wrote:
   
 
 I think we should instead work towards eliminating the get_alias_set
 langhook first.  The LTO langhook variant contains the same handling, 
 btw,
 so just inline that into get_alias_set and see what remains?

I see, i completely missed existence of gimple_get_alias_set. It makes 
more
sense now.

Is moving everyting to alias.c realy a desirable thing? If non-C 
languages do
not have this rule, why we want to reduce the code quality when 
compiling
those?
   
   Well, for consistency and for getting rid of one langhook ;)
  :)
  In a way this particular langhook makes sense to me - TBAA rules are 
  language specific.
  We also may with explicit streaming of the TBAA dag, like LLVM does.
  
  Anyway, this is the updated patch fixing the Fortran's interoperability with
  size_t and signed char.  I will send separate patch for the extra lto-symtab
  warnings shortly.
  
  I will be happy looking into the TYPE_CANONICAL (int) to be different from
  TYPE_CANONICAL (unsigned int) if that seems desirable. There are two things 
  that
  needs to be solved - 
  hash_canonical_type/gimple_canonical_types_compatible_p can't
  use TYPE_CNAONICAL of subtypes in all cases (that is easy) and we will need 
  some
  way to recognize the conflict in lto-symtab other thanjust comparing 
  TYPE_CANONICAL
  to not warn when a variable is declared signed in Fortran unit and unsigned 
  in C.
  
  Bootstrapped/regtested ppc64le-linux.
  
  * lto/lto.c (hash_canonical_type): Do not hash TYPE_UNSIGNED
  of INTEGER_TYPE.
  * tree.c (gimple_canonical_types_compatible_p): Do not compare 
  TYPE_UNSIGNED
  of INTEGER_TYPE.
  * gimple-expr.c (useless_type_conversion_p): Move INTEGER type handling
  ahead the canonical type lookup.
  
  * gfortran.dg/lto/bind_c-2_0.f90: New testcase
  * gfortran.dg/lto/bind_c-2_1.c: New testcase
  * gfortran.dg/lto/bind_c-3_0.f90: New testcase
  * gfortran.dg/lto/bind_c-3_1.c: New testcase
  * gfortran.dg/lto/bind_c-4_0.f90: New testcase
  * gfortran.dg/lto/bind_c-4_1.c: New testcase
 
 Hi,
 I would like to ping this.  There are still few things to fix to make our
 merging compliant at least for C/C++/Fortran rules (the array bounds for
 Fortran and union ordering for C I believe) and I would like to progress
 on this.

I don't like the changes to useless_type_conversion_p much.  Why
do you preserve qualifiers for the integer kind compares?

All the testcases have the integral types in aggregates as members.
I already said that I'm happy globbing them together in aggregates.

I'm still not convinced that we need a 1:1 correspondence between
canonical types and alias sets.  In particular canonical types are
used for type compatibility in lhs = rhs assignments
(useless_type_conversion_p) which is a transitive relation.
Mixing both too much will cause serious confusion.  We have
alias-sets for a reason.

Richard.


Re: [VRP] Improve value ranges for unsigned division

2015-06-22 Thread Richard Biener
On Sat, Jun 20, 2015 at 9:12 AM, Kugan
kugan.vivekanandara...@linaro.org wrote:
 As discussed in PR64130, this patch improves the VRP value ranges for
 unsigned division.

 Bootstrapped and regression tested on x86_64-linux-gnu and regression
 tested on arm-none-linux-gnu with no new regression.

 Is this OK for trunk?

Hum, the patch is at least incomplete not covering the
cmp == -1 case in the max value computation, no?

Also I wonder if we have two VR_RANGEs as you require
the code using extract_range_from_multiplicative_op_1 isn't
better suited and already handles the case properly?

Richard.

 Thanks,
 Kugan

 gcc/ChangeLog:

 2015-06-20  Kugan Vivekanandarajah  kug...@linaro.org

 PR middle-end/64130
 * tree-vrp.c (extract_range_from_binary_expr_1): For unsigned
 division, compute minimum when value ranges for dividend and
 divisor are available.


 gcc/testsuite/ChangeLog:

 2015-06-20  Kugan Vivekanandarajah  kug...@linaro.org

 PR middle-end/64130
 * gcc.dg/tree-ssa/pr64130.c: New test.


Re: [patch] libstdc++/64657 support iterators with overloaded comma operator

2015-06-22 Thread Jonathan Wakely

On 29/04/15 16:22 +0100, Jonathan Wakely wrote:

I think this covers all the places in the library where we do:

++i, ++j


I missed one. Tested powerpc64-linux, committed to trunk.
commit 28542877cbeae9e1da3bdc8e3a5a3053b2f0ee23
Author: Jonathan Wakely jwak...@redhat.com
Date:   Mon Jun 22 13:44:06 2015 +0100

	PR libstdc++/64657
	* include/bits/stl_uninitialized.h
	(__uninitialized_copy::__uninit_copy): Cast expression to void.

diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h b/libstdc++-v3/include/bits/stl_uninitialized.h
index 715cb58..045bdd7 100644
--- a/libstdc++-v3/include/bits/stl_uninitialized.h
+++ b/libstdc++-v3/include/bits/stl_uninitialized.h
@@ -71,7 +71,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  _ForwardIterator __cur = __result;
 	  __try
 	{
-	  for (; __first != __last; ++__first, ++__cur)
+	  for (; __first != __last; ++__first, (void)++__cur)
 		std::_Construct(std::__addressof(*__cur), *__first);
 	  return __cur;
 	}


Re: [gomp4] Remove some ptxness from middle end

2015-06-22 Thread Nathan Sidwell

On 06/22/15 13:04, Marek Polacek wrote:

On Mon, Jun 22, 2015 at 01:00:51PM -0400, Nathan Sidwell wrote:

+  if (GET_CODE (arg) != CONST_INT
+  || (unsigned HOST_WIDE_INT)INTVAL (arg) = OACC_HWM)


Don't we have UINTVAL for this?  So UINTVAL (arg).


Applied the attached, after testing.  Also realized I'd missed some places I 
should have used the new loop level enumeration.


nathan

--
Nathan Sidwell
2015-06-22  Nathan Sidwell  nat...@codesourcery.com

	* omp-low.c (expand_oacc_get_num_threads): Use OACC enum.
	(expand_oacc_get_thread_num, make_predication_test): Likewise.
	* builtins.c (expand_oacc_id): Use UINTVAL.

Index: omp-low.c
===
--- omp-low.c	(revision 224747)
+++ omp-low.c	(working copy)
@@ -4994,8 +4994,8 @@ expand_oacc_get_num_threads (gimple_seq
   tree  decl = builtin_decl_explicit (BUILT_IN_GOACC_NID);
   unsigned ix;
 
-  for (ix = 0; (1  ix) = gwv_bits; ix++)
-if ((1  ix)  gwv_bits)
+  for (ix = OACC_gang; ix != OACC_HWM; ix++)
+if (OACC_LOOP_MASK(ix)  gwv_bits)
   {
 	tree arg = build_int_cst (unsigned_type_node, ix);
 	tree count = create_tmp_var (unsigned_type_node);
@@ -5022,8 +5022,8 @@ expand_oacc_get_thread_num (gimple_seq *
   unsigned ix;
 
   /* Start at gang level, and examine relevant dimension indices.  */
-  for (ix = 0; (1  ix) = gwv_bits; ix++)
-if ((1  ix)  gwv_bits)
+  for (ix = OACC_gang; ix != OACC_HWM; ix++)
+if (OACC_LOOP_MASK (ix)  gwv_bits)
   {
 	tree arg = build_int_cst (unsigned_type_node, ix);
 
@@ -10671,7 +10671,7 @@ make_predication_test (edge true_edge, b
   unsigned ix;
 
   for (ix = OACC_worker; ix = OACC_vector; ix++)
-if (mask  (1  ix))
+if (OACC_LOOP_MASK (ix)  mask)
   {
 	gimple call = gimple_build_call
 	  (decl, 1, build_int_cst (unsigned_type_node, ix));
Index: builtins.c
===
--- builtins.c	(revision 224747)
+++ builtins.c	(working copy)
@@ -5971,8 +5971,7 @@ expand_oacc_id (enum built_in_function f
   rtx arg;
 
   arg = expand_normal (arg0);
-  if (GET_CODE (arg) != CONST_INT
-  || (unsigned HOST_WIDE_INT)INTVAL (arg) = OACC_HWM)
+  if (GET_CODE (arg) != CONST_INT || UINTVAL (arg) = OACC_HWM)
 {
   error (argument to %D must be constant in range 0 to %d,
 	 get_callee_fndecl (exp), OACC_HWM - 1);


patch to fix PR63740 on gcc5 branch

2015-06-22 Thread Vladimir Makarov

I've committed the following patch to gcc 5 branch as rev.224761.

The patch was bootstrapped on x86-64.


2015-06-22  Vladimir Makarov vmaka...@redhat.com

PR bootstrap/63740
* lra-lives.c (process_bb_lives): Check insn copying the same
reload pseudo and don't create a copy for it.

Index: lra-lives.c
===
--- lra-lives.c (revision 224739)
+++ lra-lives.c (working copy)
@@ -565,7 +565,15 @@ process_bb_lives (basic_block bb, int c
  dst_regno = REGNO (SET_DEST (set));
  if (dst_regno = lra_constraint_new_regno_start
   src_regno = lra_constraint_new_regno_start)
-   lra_create_copy (dst_regno, src_regno, freq);
+   {
+ /* It might be still an original (non-reload) insn with
+one unused output and a constraint requiring to use
+the same reg for input/output operands. In this case
+dst_regno and src_regno have the same value, we don't
+need a misleading copy for this case.  */
+ if (dst_regno != src_regno)
+   lra_create_copy (dst_regno, src_regno, freq);
+   }
  else if (dst_regno = lra_constraint_new_regno_start)
{
  if ((hard_regno = src_regno) = FIRST_PSEUDO_REGISTER)





Re: [PATCH] Combine related fail of gcc.target/powerpc/ti_math1.c

2015-06-22 Thread Alan Modra
On Mon, Jun 22, 2015 at 09:24:07AM +0200, Eric Botcazou wrote:
  * rtlanal.c (commutative_operand_precedence): Correct comments.
  * simplify-rtx.c (simplify_plus_minus_op_data_cmp): Delete forward
  declaration.  Return an int.  Distinguish REG,REG return from
  others.
  (struct simplify_plus_minus_op_data): Make local to function.
  (simplify_plus_minus): Rename canonicalized to not_canonical.
  Don't set not_canonical if merely sorting registers.  Avoid
  packing ops if nothing changes.  White space fixes.
 
 OK in principle, but...

Thanks for reviewing!

  Some notes: Renaming canonicalized to not_canonical better reflects
  its usage.  At the time the var is set, the expression hasn't been
  canonicalized.
 
 I'm quite skeptical, in particular given:

I'm a little surprised, but committed without the renaming.

-- 
Alan Modra
Australia Development Lab, IBM


Fix ao initialization in ipa-polymorphic-call

2015-06-22 Thread Jan Hubicka
Hi,
this patch fixes thinko when initializing ao oracle in
ipa_polymorphic_call_context::get_dynamic_type.  It took get_deref_alias_set of 
vptr type
instead of get_alias_set that now makes difference because pointer types are 
different.

Bootstrapped/regtested x86_64-linux, comitted.

Honza
PR ipa/66351
* ipa-polymorphic-call.c
(ipa_polymorphic_call_context::get_dynamic_type): Fix thinko when
initializing alias oracle; fix formating; set base_alias_set if it
is known.
Index: ipa-polymorphic-call.c
===
--- ipa-polymorphic-call.c  (revision 224713)
+++ ipa-polymorphic-call.c  (working copy)
@@ -1574,13 +1574,15 @@ ipa_polymorphic_call_context::get_dynami
  tree base_ref = get_ref_base_and_extent
   (ref_exp, offset2, size, max_size);
 
- /* Finally verify that what we found looks like read from 
OTR_OBJECT
-or from INSTANCE with offset OFFSET.  */
+ /* Finally verify that what we found looks like read from
+OTR_OBJECT or from INSTANCE with offset OFFSET.  */
  if (base_ref
   ((TREE_CODE (base_ref) == MEM_REF
((offset2 == instance_offset
 TREE_OPERAND (base_ref, 0) == instance)
-  || (!offset2  TREE_OPERAND (base_ref, 0) == 
otr_object)))
+  || (!offset2
+   TREE_OPERAND (base_ref, 0)
+ == otr_object)))
  || (DECL_P (instance)  base_ref == instance
   offset2 == instance_offset)))
{
@@ -1608,9 +1610,17 @@ ipa_polymorphic_call_context::get_dynami
   /* We look for vtbl pointer read.  */
   ao.size = POINTER_SIZE;
   ao.max_size = ao.size;
+  /* We are looking for stores to vptr pointer within the instance of
+ outer type.
+ TODO: The vptr pointer type is globally known, we probably should
+ keep it and do that even when otr_type is unknown.  */
   if (otr_type)
-ao.ref_alias_set
-  = get_deref_alias_set (TREE_TYPE (BINFO_VTABLE (TYPE_BINFO (otr_type;
+{
+  ao.base_alias_set
+   = get_alias_set (outer_type ? outer_type : otr_type);
+  ao.ref_alias_set
+= get_alias_set (TREE_TYPE (BINFO_VTABLE (TYPE_BINFO (otr_type;
+}
 
   if (dump_file)
 {


Re: Fix more of C/fortran canonical type issues

2015-06-22 Thread Jan Hubicka
  On Mon, 8 Jun 2015, Jan Hubicka wrote:
  

I think we should instead work towards eliminating the get_alias_set
langhook first.  The LTO langhook variant contains the same handling, 
btw,
so just inline that into get_alias_set and see what remains?
   
   I see, i completely missed existence of gimple_get_alias_set. It makes 
   more
   sense now.
   
   Is moving everyting to alias.c realy a desirable thing? If non-C 
   languages do
   not have this rule, why we want to reduce the code quality when compiling
   those?
  
  Well, for consistency and for getting rid of one langhook ;)
 :)
 In a way this particular langhook makes sense to me - TBAA rules are language 
 specific.
 We also may with explicit streaming of the TBAA dag, like LLVM does.
 
 Anyway, this is the updated patch fixing the Fortran's interoperability with
 size_t and signed char.  I will send separate patch for the extra lto-symtab
 warnings shortly.
 
 I will be happy looking into the TYPE_CANONICAL (int) to be different from
 TYPE_CANONICAL (unsigned int) if that seems desirable. There are two things 
 that
 needs to be solved - hash_canonical_type/gimple_canonical_types_compatible_p 
 can't
 use TYPE_CNAONICAL of subtypes in all cases (that is easy) and we will need 
 some
 way to recognize the conflict in lto-symtab other thanjust comparing 
 TYPE_CANONICAL
 to not warn when a variable is declared signed in Fortran unit and unsigned 
 in C.
 
 Bootstrapped/regtested ppc64le-linux.
 
   * lto/lto.c (hash_canonical_type): Do not hash TYPE_UNSIGNED
   of INTEGER_TYPE.
   * tree.c (gimple_canonical_types_compatible_p): Do not compare 
 TYPE_UNSIGNED
   of INTEGER_TYPE.
   * gimple-expr.c (useless_type_conversion_p): Move INTEGER type handling
   ahead the canonical type lookup.
 
   * gfortran.dg/lto/bind_c-2_0.f90: New testcase
   * gfortran.dg/lto/bind_c-2_1.c: New testcase
   * gfortran.dg/lto/bind_c-3_0.f90: New testcase
   * gfortran.dg/lto/bind_c-3_1.c: New testcase
   * gfortran.dg/lto/bind_c-4_0.f90: New testcase
   * gfortran.dg/lto/bind_c-4_1.c: New testcase

Hi,
I would like to ping this.  There are still few things to fix to make our
merging compliant at least for C/C++/Fortran rules (the array bounds for
Fortran and union ordering for C I believe) and I would like to progress
on this.

Honza


Re: [PATCH] Fix PR ipa/65908.

2015-06-22 Thread Jan Hubicka
Hi,
this patch extends earlier Martin's patch. I completely removed
parse_tree_args because it assumed that every function has only one set
of parm types and one return value type.  We really want to compare two
sets of parameter types (DECL_ARGUMENTS and TYPE_PARM_TYPES) as they do not
need to 100% match and similarly for the return value.

I also added check that skips unnecesary matching (in addition to the
signature) for parameters that are not used.

The patch bootstrapped/regtested x86_64-linux, and I checked it behaves
sane with Firefox.  I will commit it to mainline and will follow with branch
in couple days.

Honza

PR ipa/65908
* ipa-icf.c (sem_item::target_supports_symbol_aliases): Remove
construction of arg_types.
(sem_function::sem_function): Likewise.
(sem_function::~sem_function): Remove destruction of arg_types.
(sem_function::compatible_parm_types_p): New function.
(sem_function::equals_wpa): Reorg matching of return values
and parameter types.
(sem_function::equals_private): Reorg mathcing of argument types.
(sem_function::parse_tree_args): Remove.
* ipa-icf.h (init_wpa): Do not call it.
(parse_tree_args): Remove.
(compatible_parm_types_p): Declare.
(result_type): Remove.
(arg_types): Remove.
* testsuite/g++.dg/ipa/pr65908.C: New testcase.
Index: ipa-icf.h
===
--- ipa-icf.h   (revision 224713)
+++ ipa-icf.h   (working copy)
@@ -292,7 +292,6 @@ public:
 
   inline virtual void init_wpa (void)
   {
-parse_tree_args ();
   }
 
   virtual void init (void);
@@ -310,9 +309,6 @@ public:
 dump_function_to_file (decl, file, TDF_DETAILS);
   }
 
-  /* Parses function arguments and result type.  */
-  void parse_tree_args (void);
-
   /* Returns cgraph_node.  */
   inline cgraph_node *get_node (void)
   {
@@ -329,15 +325,13 @@ public:
  semantic function item.  */
   static sem_function *parse (cgraph_node *node, bitmap_obstack *stack);
 
+  /* Perform additional checks needed to match types of used function
+ paramters.  */
+  bool compatible_parm_types_p (tree, tree);
+
   /* Exception handling region tree.  */
   eh_region region_tree;
 
-  /* Result type tree node.  */
-  tree result_type;
-
-  /* Array of argument tree types.  */
-  vec tree arg_types;
-
   /* Number of function arguments.  */
   unsigned int arg_count;
 
Index: testsuite/g++.dg/ipa/pr65908.C
===
--- testsuite/g++.dg/ipa/pr65908.C  (revision 0)
+++ testsuite/g++.dg/ipa/pr65908.C  (revision 0)
@@ -0,0 +1,27 @@
+// PR ipa/65908
+// { dg-do compile }
+// { dg-options -O2 }
+// { dg-additional-options -fPIC { target fpic } }
+
+class A
+{
+  A (A );
+};
+class B
+{
+  const A m_fn1 () const;
+};
+class C
+{
+  A m_fn2 () const;
+};
+A
+C::m_fn2 () const
+{
+  throw 0;
+}
+const A 
+B::m_fn1 () const
+{
+  throw 0;
+}
Index: ipa-icf.c
===
--- ipa-icf.c   (revision 224713)
+++ ipa-icf.c   (working copy)
@@ -259,7 +259,6 @@ sem_item::target_supports_symbol_aliases
 sem_function::sem_function (bitmap_obstack *stack): sem_item (FUNC, stack),
   m_checker (NULL), m_compared_func (NULL)
 {
-  arg_types.create (0);
   bb_sizes.create (0);
   bb_sorted.create (0);
 }
@@ -271,7 +270,6 @@ sem_function::sem_function (cgraph_node
   sem_item (FUNC, node, hash, stack),
   m_checker (NULL), m_compared_func (NULL)
 {
-  arg_types.create (0);
   bb_sizes.create (0);
   bb_sorted.create (0);
 }
@@ -281,7 +279,6 @@ sem_function::~sem_function ()
   for (unsigned i = 0; i  bb_sorted.length (); i++)
 delete (bb_sorted[i]);
 
-  arg_types.release ();
   bb_sizes.release ();
   bb_sorted.release ();
 }
@@ -581,6 +578,30 @@ sem_function::param_used_p (unsigned int
   return ipa_is_param_used (IPA_NODE_REF (get_node ()), i);
 }
 
+/* Perform additional check needed to match types function parameters that are
+   used.  Unlike for normal decls it matters if type is TYPE_RESTRICT and we
+   make an assumption that REFERENCE_TYPE parameters are always non-NULL.  */
+
+bool
+sem_function::compatible_parm_types_p (tree parm1, tree parm2)
+{
+  /* Be sure that parameters are TBAA compatible.  */
+  if (!func_checker::compatible_types_p (parm1, parm2))
+return return_false_with_msg (parameter type is not compatible);
+
+  if (POINTER_TYPE_P (parm1)
+   (TYPE_RESTRICT (parm1) != TYPE_RESTRICT (parm2)))
+return return_false_with_msg (argument restrict flag mismatch);
+
+  /* nonnull_arg_p implies non-zero range to REFERENCE types.  */
+  if (POINTER_TYPE_P (parm1)
+   TREE_CODE (parm1) != TREE_CODE (parm2)
+   opt_for_fn (decl, flag_delete_null_pointer_checks))
+return return_false_with_msg (pointer wrt reference mismatch);
+
+  return true;
+}
+
 /* Fast equality function based on knowledge known 

[PATCH] Avoid computing scalar iteration cost multiple times

2015-06-22 Thread Richard Biener

This avoids doing $subject in the vectorizer.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-06-22  Richard Biener  rguent...@suse.de

* tree-vectorizer.h (_loop_vec_info): Add scalar_cost_vec
and single_scalar_iteration_cost members.
(LOOP_VINFO_SCALAR_ITERATION_COST): New.
(LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST): Likewise.
(vect_get_single_scalar_iteration_cost): Remove.
* tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost):
Use LOOP_VINFO_SCALAR_ITERATION_COST.
* tree-vect-loop.c (destroy_loop_vec_info): Free
scalar_cost_vec.
(vect_get_single_scalar_iteration_cost): Compute result into
LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST and
LOOP_VINFO_SCALAR_ITERATION_COST.  Make static.
(vect_analyze_loop_2): Call vect_get_single_scalar_iteration_cost.
(vect_estimate_min_profitable_iters): Use them.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 224603)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -1165,11 +1165,10 @@ vect_peeling_hash_get_lowest_cost (_vect
   SET_DR_MISALIGNMENT (dr, save_misalignment);
 }
 
-  auto_vecstmt_info_for_cost scalar_cost_vec;
-  vect_get_single_scalar_iteration_cost (loop_vinfo, scalar_cost_vec);
   outside_cost += vect_get_known_peeling_cost
 (loop_vinfo, elem-npeel, dummy,
- scalar_cost_vec, prologue_cost_vec, epilogue_cost_vec);
+ LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+ prologue_cost_vec, epilogue_cost_vec);
 
   /* Prologue and epilogue costs are added to the target model later.
  These costs depend only on the scalar iteration cost, the
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 224603)
+++ gcc/tree-vect-loop.c(working copy)
@@ -1095,12 +1095,82 @@ destroy_loop_vec_info (loop_vec_info loo
   LOOP_VINFO_PEELING_HTAB (loop_vinfo) = NULL;
 
   destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
+  loop_vinfo-scalar_cost_vec.release ();
 
   free (loop_vinfo);
   loop-aux = NULL;
 }
 
 
+/* Calculate the cost of one scalar iteration of the loop.  */
+static void
+vect_get_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
+{
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+  int nbbs = loop-num_nodes, factor, scalar_single_iter_cost = 0;
+  int innerloop_iters, i;
+
+  /* Count statements in scalar loop.  Using this as scalar cost for a single
+ iteration for now.
+
+ TODO: Add outer loop support.
+
+ TODO: Consider assigning different costs to different scalar
+ statements.  */
+
+  /* FORNOW.  */
+  innerloop_iters = 1;
+  if (loop-inner)
+innerloop_iters = 50; /* FIXME */
+
+  for (i = 0; i  nbbs; i++)
+{
+  gimple_stmt_iterator si;
+  basic_block bb = bbs[i];
+
+  if (bb-loop_father == loop-inner)
+factor = innerloop_iters;
+  else
+factor = 1;
+
+  for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (si))
+{
+  gimple stmt = gsi_stmt (si);
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+
+  if (!is_gimple_assign (stmt)  !is_gimple_call (stmt))
+continue;
+
+  /* Skip stmts that are not vectorized inside the loop.  */
+  if (stmt_info
+   !STMT_VINFO_RELEVANT_P (stmt_info)
+   (!STMT_VINFO_LIVE_P (stmt_info)
+  || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info)))
+  !STMT_VINFO_IN_PATTERN_P (stmt_info))
+continue;
+
+ vect_cost_for_stmt kind;
+  if (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt)))
+{
+  if (DR_IS_READ (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt
+   kind = scalar_load;
+ else
+   kind = scalar_store;
+}
+  else
+kind = scalar_stmt;
+
+ scalar_single_iter_cost
+   += record_stmt_cost (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+factor, kind, NULL, 0, vect_prologue);
+}
+}
+  LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST (loop_vinfo)
+= scalar_single_iter_cost;
+}
+
+
 /* Function vect_analyze_loop_1.
 
Apply a set of analyses on LOOP, and create a loop_vec_info struct
@@ -1834,6 +1904,9 @@ vect_analyze_loop_2 (loop_vec_info loop_
   return false;
 }
 
+  /* Compute the scalar iteration cost.  */
+  vect_get_single_scalar_iteration_cost (loop_vinfo);
+
   /* This pass will decide on using loop versioning and/or loop peeling in
  order to enhance the alignment of data references in the loop.  */
 
@@ -2706,74 +2779,6 @@ vect_force_simple_reduction (loop_vec_in
 double_reduc, true);
 }
 
-/* 

Re: [PATCH] Check dominator info in compute_dominance_frontiers

2015-06-22 Thread Tom de Vries

On 22/06/15 12:14, Richard Biener wrote:

On Mon, Jun 22, 2015 at 10:04 AM, Tom de Vries tom_devr...@mentor.com wrote:

Hi,

during development of a patch I ran into a case where
compute_dominance_frontiers was called with incorrect dominance info.

The result was a segmentation violation somewhere in the bitmap code while
executing this bitmap_set_bit in compute_dominance_frontiers_1:
...
   if (!bitmap_set_bit (frontiers[runner-index],
b-index))
 break;
...

The segmentation violation happens because runner-index is 0, and
frontiers[0] is uninitialized.

[ The initialization in update_ssa looks like this:
...
  dfs = XNEWVEC (bitmap_head, last_basic_block_for_fn (cfun));
   FOR_EACH_BB_FN (bb, cfun)
 bitmap_initialize (dfs[bb-index], bitmap_default_obstack);
   compute_dominance_frontiers (dfs);
...

FOR_EACH_BB_FN skips over the entry-block and the exit-block, so dfs[0]
(frontiers[0] in compute_dominance_frontiers_1) is not initialized.

We could add initialization by making the entry/exit-block bitmap_heads
empty and setting the obstack to a reserved obstack bitmap_no_obstack for
which allocation results in an assert. ]

AFAIU, the immediate problem is not that frontiers[0] is uninitialized, but
that the loop reaches the state of runner-index == 0, due to the incorrect
dominance info.

The patch adds an assert to the loop in compute_dominance_frontiers_1, to
make the failure mode cleaner and easier to understand.

I think we wouldn't catch all errors in dominance info with this assert. So
the patch also contains an ENABLE_CHECKING-enabled verify_dominators call at
the start of compute_dominance_frontiers. I'm not sure if:
- adding the verify_dominators call is too costly in runtime.
- the verify_dominators call should be inside or outside the
   TV_DOM_FRONTIERS measurement.
- there is a level of ENABLE_CHECKING that is more appropriate for the
   verify_dominators call.

Is this ok for trunk if bootstrap and reg-test on x86_64 succeeds?


I don't think these kind of asserts are good.  A segfault is good by itself
(so you can just add the comment if you like).



The segfault is not guaranteed to trigger, because it works on 
uninitialized data. Instead, we may end up modifying valid memory and 
silently generating wrong code or causing sigsegvs (which will be 
difficult to track back this error). So I don't think doing nothing is 
an option here. If we're not going to add this assert, we should 
initialize the uninitialized data in such a way that we are guaranteed 
to detect the error. The scheme I proposed above would take care of 
that. Should I implement that instead?



Likewise the verify_dominators call is too expensive and misplaced.

If then the call belongs in the dom_computed[] == DOM_OK early-out
in calculate_dominance_info


OK, like this:
...
diff --git a/gcc/dominance.c b/gcc/dominance.c
index a9e042e..1827eda9 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -646,7 +646,12 @@ calculate_dominance_info (enum cdi_direction dir)
   bool reverse = (dir == CDI_POST_DOMINATORS) ? true : false;

   if (dom_computed[dir_index] == DOM_OK)
-return;
+{
+#if ENABLE_CHECKING
+  verify_dominators (CDI_DOMINATORS);
+#endif
+  return;
+}

   timevar_push (TV_DOMINANCE);
   if (!dom_info_available_p (dir))
...

I didn't fully understand your comment, do you want me to test this?

Thanks,
- Tom


(eventually also for the case where we
end up only computing the fast-query stuff).




Re: [PATCH] Check dominator info in compute_dominance_frontiers

2015-06-22 Thread Richard Biener
On Mon, Jun 22, 2015 at 1:33 PM, Tom de Vries tom_devr...@mentor.com wrote:
 On 22/06/15 12:14, Richard Biener wrote:

 On Mon, Jun 22, 2015 at 10:04 AM, Tom de Vries tom_devr...@mentor.com
 wrote:

 Hi,

 during development of a patch I ran into a case where
 compute_dominance_frontiers was called with incorrect dominance info.

 The result was a segmentation violation somewhere in the bitmap code
 while
 executing this bitmap_set_bit in compute_dominance_frontiers_1:
 ...
if (!bitmap_set_bit (frontiers[runner-index],
 b-index))
  break;
 ...

 The segmentation violation happens because runner-index is 0, and
 frontiers[0] is uninitialized.

 [ The initialization in update_ssa looks like this:
 ...
   dfs = XNEWVEC (bitmap_head, last_basic_block_for_fn (cfun));
FOR_EACH_BB_FN (bb, cfun)
  bitmap_initialize (dfs[bb-index], bitmap_default_obstack);
compute_dominance_frontiers (dfs);
 ...

 FOR_EACH_BB_FN skips over the entry-block and the exit-block, so dfs[0]
 (frontiers[0] in compute_dominance_frontiers_1) is not initialized.

 We could add initialization by making the entry/exit-block bitmap_heads
 empty and setting the obstack to a reserved obstack bitmap_no_obstack for
 which allocation results in an assert. ]

 AFAIU, the immediate problem is not that frontiers[0] is uninitialized,
 but
 that the loop reaches the state of runner-index == 0, due to the
 incorrect
 dominance info.

 The patch adds an assert to the loop in compute_dominance_frontiers_1, to
 make the failure mode cleaner and easier to understand.

 I think we wouldn't catch all errors in dominance info with this assert.
 So
 the patch also contains an ENABLE_CHECKING-enabled verify_dominators call
 at
 the start of compute_dominance_frontiers. I'm not sure if:
 - adding the verify_dominators call is too costly in runtime.
 - the verify_dominators call should be inside or outside the
TV_DOM_FRONTIERS measurement.
 - there is a level of ENABLE_CHECKING that is more appropriate for the
verify_dominators call.

 Is this ok for trunk if bootstrap and reg-test on x86_64 succeeds?


 I don't think these kind of asserts are good.  A segfault is good by
 itself
 (so you can just add the comment if you like).


 The segfault is not guaranteed to trigger, because it works on uninitialized
 data. Instead, we may end up modifying valid memory and silently generating
 wrong code or causing sigsegvs (which will be difficult to track back this
 error). So I don't think doing nothing is an option here. If we're not going
 to add this assert, we should initialize the uninitialized data in such a
 way that we are guaranteed to detect the error. The scheme I proposed above
 would take care of that. Should I implement that instead?

No, instead the check below should catch the error much earlier.

 Likewise the verify_dominators call is too expensive and misplaced.

 If then the call belongs in the dom_computed[] == DOM_OK early-out
 in calculate_dominance_info


 OK, like this:
 ...
 diff --git a/gcc/dominance.c b/gcc/dominance.c
 index a9e042e..1827eda9 100644
 --- a/gcc/dominance.c
 +++ b/gcc/dominance.c
 @@ -646,7 +646,12 @@ calculate_dominance_info (enum cdi_direction dir)
bool reverse = (dir == CDI_POST_DOMINATORS) ? true : false;

if (dom_computed[dir_index] == DOM_OK)
 -return;
 +{
 +#if ENABLE_CHECKING
 +  verify_dominators (CDI_DOMINATORS);
 +#endif
 +  return;
 +}

timevar_push (TV_DOMINANCE);
if (!dom_info_available_p (dir))
 ...

Yes.

 I didn't fully understand your comment, do you want me to test this?

Sure, it should catch the error.

Richard.

 Thanks,
 - Tom


 (eventually also for the case where we
 end up only computing the fast-query stuff).




[PATCH 3/3][ARM][PR target/65697] Add tests for __sync builtins.

2015-06-22 Thread Matthew Wahab

This is the ARM version of the patches to strengthen memory barriers for the
__sync builtins on ARMv8 targets
(https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html).

This patch adds tests for the code generated by the ARM backend for the __sync
builtins.

Tested the series for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/testsuite
2015-06-22  Matthew Wahab  matthew.wa...@arm.com

PR Target/65697
* gcc.target/arm/armv8-sync-comp-swap.c: New.
* gcc.target/arm/armv8-sync-op-acquire.c: New.
* gcc.target/arm/armv8-sync-op-full.c: New.
* gcc.target/arm/armv8-sync-op-release.c: New.

From 8157c7480a9d6d559013d02e24519d1b7ba1ed5b Mon Sep 17 00:00:00 2001
From: Matthew Wahab matthew.wa...@arm.com
Date: Wed, 3 Jun 2015 16:27:55 +0100
Subject: [PATCH 3/3] [ARM] Add test cases.

Change-Id: I0f2257ce5b5e7f9d0f75e57e6be22fd9733ed3ca
---
 gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c  | 10 ++
 gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c | 10 ++
 gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c| 10 ++
 gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c |  8 
 4 files changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c

diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
new file mode 100644
index 000..f96c81a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { do-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options -O2 } */
+/* { dg-add-options arm_arch_v8a } */
+
+#include ../aarch64/sync-comp-swap.x
+
+/* { dg-final { scan-assembler-times ldrex 2 } } */
+/* { dg-final { scan-assembler-times stlex 2 } } */
+/* { dg-final { scan-assembler-times dmb 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
new file mode 100644
index 000..8d6659b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { do-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options -O2 } */
+/* { dg-add-options arm_arch_v8a } */
+
+#include ../aarch64/sync-op-acquire.x
+
+/* { dg-final { scan-assembler-times ldrex 1 } } */
+/* { dg-final { scan-assembler-times stlex 1 } } */
+/* { dg-final { scan-assembler-times dmb 1 } } */
diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
new file mode 100644
index 000..a5ad3bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { do-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options -O2 } */
+/* { dg-add-options arm_arch_v8a } */
+
+#include ../aarch64/sync-op-full.x
+
+/* { dg-final { scan-assembler-times ldrex 12 } } */
+/* { dg-final { scan-assembler-times stlex 12 } } */
+/* { dg-final { scan-assembler-times dmb 12 } } */
diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
new file mode 100644
index 000..0d3be7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { do-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options -O2 } */
+/* { dg-add-options arm_arch_v8a } */
+
+#include ../aarch64/sync-op-release.x
+
+/* { dg-final { scan-assembler-times stl 1 } } */
-- 
1.9.1



Re: [PATCH] Check dominator info in compute_dominance_frontiers

2015-06-22 Thread Richard Biener
On Mon, Jun 22, 2015 at 10:04 AM, Tom de Vries tom_devr...@mentor.com wrote:
 Hi,

 during development of a patch I ran into a case where
 compute_dominance_frontiers was called with incorrect dominance info.

 The result was a segmentation violation somewhere in the bitmap code while
 executing this bitmap_set_bit in compute_dominance_frontiers_1:
 ...
   if (!bitmap_set_bit (frontiers[runner-index],
b-index))
 break;
 ...

 The segmentation violation happens because runner-index is 0, and
 frontiers[0] is uninitialized.

 [ The initialization in update_ssa looks like this:
 ...
  dfs = XNEWVEC (bitmap_head, last_basic_block_for_fn (cfun));
   FOR_EACH_BB_FN (bb, cfun)
 bitmap_initialize (dfs[bb-index], bitmap_default_obstack);
   compute_dominance_frontiers (dfs);
 ...

 FOR_EACH_BB_FN skips over the entry-block and the exit-block, so dfs[0]
 (frontiers[0] in compute_dominance_frontiers_1) is not initialized.

 We could add initialization by making the entry/exit-block bitmap_heads
 empty and setting the obstack to a reserved obstack bitmap_no_obstack for
 which allocation results in an assert. ]

 AFAIU, the immediate problem is not that frontiers[0] is uninitialized, but
 that the loop reaches the state of runner-index == 0, due to the incorrect
 dominance info.

 The patch adds an assert to the loop in compute_dominance_frontiers_1, to
 make the failure mode cleaner and easier to understand.

 I think we wouldn't catch all errors in dominance info with this assert. So
 the patch also contains an ENABLE_CHECKING-enabled verify_dominators call at
 the start of compute_dominance_frontiers. I'm not sure if:
 - adding the verify_dominators call is too costly in runtime.
 - the verify_dominators call should be inside or outside the
   TV_DOM_FRONTIERS measurement.
 - there is a level of ENABLE_CHECKING that is more appropriate for the
   verify_dominators call.

 Is this ok for trunk if bootstrap and reg-test on x86_64 succeeds?

I don't think these kind of asserts are good.  A segfault is good by itself
(so you can just add the comment if you like).

Likewise the verify_dominators call is too expensive and misplaced.

If then the call belongs in the dom_computed[] == DOM_OK early-out
in calculate_dominance_info (eventually also for the case where we
end up only computing the fast-query stuff).

Richard.

 Thanks,
 - Tom


Re: [C++/58583] ICE instantiating NSDMIs

2015-06-22 Thread Andreas Schwab
Nathan Sidwell nat...@acm.org writes:

 On 06/20/15 02:09, Andreas Schwab wrote:
 This also fails on powerpc.

 what  is the build compiler?

It is a bootstrapped build, so the build compiler should not matter.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
And now for something completely different.


[PATCH] Check dominator info in compute_dominance_frontiers

2015-06-22 Thread Tom de Vries

Hi,

during development of a patch I ran into a case where 
compute_dominance_frontiers was called with incorrect dominance info.


The result was a segmentation violation somewhere in the bitmap code 
while executing this bitmap_set_bit in compute_dominance_frontiers_1:

...
  if (!bitmap_set_bit (frontiers[runner-index],
   b-index))
break;
...

The segmentation violation happens because runner-index is 0, and 
frontiers[0] is uninitialized.


[ The initialization in update_ssa looks like this:
...
 dfs = XNEWVEC (bitmap_head, last_basic_block_for_fn (cfun));
  FOR_EACH_BB_FN (bb, cfun)
bitmap_initialize (dfs[bb-index], bitmap_default_obstack);
  compute_dominance_frontiers (dfs);
...

FOR_EACH_BB_FN skips over the entry-block and the exit-block, so dfs[0] 
(frontiers[0] in compute_dominance_frontiers_1) is not initialized.


We could add initialization by making the entry/exit-block bitmap_heads 
empty and setting the obstack to a reserved obstack bitmap_no_obstack 
for which allocation results in an assert. ]


AFAIU, the immediate problem is not that frontiers[0] is uninitialized, 
but that the loop reaches the state of runner-index == 0, due to the 
incorrect dominance info.


The patch adds an assert to the loop in compute_dominance_frontiers_1, 
to make the failure mode cleaner and easier to understand.


I think we wouldn't catch all errors in dominance info with this assert. 
So the patch also contains an ENABLE_CHECKING-enabled verify_dominators 
call at the start of compute_dominance_frontiers. I'm not sure if:

- adding the verify_dominators call is too costly in runtime.
- the verify_dominators call should be inside or outside the
  TV_DOM_FRONTIERS measurement.
- there is a level of ENABLE_CHECKING that is more appropriate for the
  verify_dominators call.

Is this ok for trunk if bootstrap and reg-test on x86_64 succeeds?

Thanks,
- Tom
Check dominator info in compute_dominance_frontiers

2015-06-22  Tom de Vries  t...@codesourcery.com

	* cfganal.c (compute_dominance_frontiers_1): Add assert.
	(compute_dominance_frontiers): Verify dominators if ENABLE_CHECKING.
---
 gcc/cfganal.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/cfganal.c b/gcc/cfganal.c
index b8d67bc..0e0e2bb 100644
--- a/gcc/cfganal.c
+++ b/gcc/cfganal.c
@@ -1261,6 +1261,11 @@ compute_dominance_frontiers_1 (bitmap_head *frontiers)
 	  domsb = get_immediate_dominator (CDI_DOMINATORS, b);
 	  while (runner != domsb)
 		{
+		  /* If you're running into this assert, the dominator info is
+		 incorrect.  Try enabling the verify_dominators call at the
+		 start of compute_dominance_frontiers.  */
+		  gcc_assert (runner != ENTRY_BLOCK_PTR_FOR_FN (cfun));
+
 		  if (!bitmap_set_bit (frontiers[runner-index],
    b-index))
 		break;
@@ -1276,6 +1281,10 @@ compute_dominance_frontiers_1 (bitmap_head *frontiers)
 void
 compute_dominance_frontiers (bitmap_head *frontiers)
 {
+#if ENABLE_CHECKING
+  verify_dominators (CDI_DOMINATORS);
+#endif
+
   timevar_push (TV_DOM_FRONTIERS);
 
   compute_dominance_frontiers_1 (frontiers);
-- 
1.9.1



[gomp4] dominance info after predicate_omp_regions

2015-06-22 Thread Tom de Vries

On 21/05/15 13:42, ber...@gcc.gnu.org wrote:

Author: bernds
Date: Thu May 21 11:42:14 2015
New Revision: 223478

URL: https://gcc.gnu.org/viewcvs?rev=223478root=gccview=rev
Log:
* omp-low.c (struct omp_region): Add a gwv_this field.
(bb_region_map): New variable.
(find_omp_for_region_data, find_omp_target_region_data): New static
functions.
(build_omp_regions_1): Call them.  Build the bb_region_map.
(enclosing_target_region, requires_vector_predicate,
generate_vector_broadcast, predicate_bb, find_predicatable_bbs,
predicate_omp_regions): New static functions.
(execute_expand_omp): Allocate and free bb_region_map.

Modified:
 branches/gomp-4_0-branch/gcc/ChangeLog.gomp
 branches/gomp-4_0-branch/gcc/omp-low.c



Hi Bernd,

I ran into trouble with invalid dominance info, AFAIU because 
predicate_omp_regions invalidates the dominance info. For now I'm using 
this workaround:

...
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index f7e13d3..5601cff 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11268,6 +11268,8 @@ execute_expand_omp (void)
}

   predicate_omp_regions (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+  free_dominance_info (CDI_DOMINATORS);
+  calculate_dominance_info (CDI_DOMINATORS);

   remove_exit_barriers (root_omp_region);

...

Thanks,
- Tom


[PATCH, i386]: Ignore the cost of embedded comparison (PR 65871)

2015-06-22 Thread Uros Bizjak
Hello!

As shown in the PR [1], RTX costs can reject combination of the
operation and its embedded comparison. Attached patch fixes this by
ignoring the cost of embedded comparison.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65871#c10

2015-06-22  Uros Bizjak  ubiz...@gmail.com

PR target/65871
* config/i386/i386.c (ix86_rtx_costs) case COMPARE: Ignore the
cost of embedded comparison.

Bootstrapped on x86_64-linux-gnu, regtest in progress.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 224718)
+++ config/i386/i386.c  (working copy)
@@ -42531,6 +42531,12 @@ ix86_rtx_costs (rtx x, int code_i, int outer_code_
+ rtx_cost (const1_rtx, outer_code, opno, speed));
  return true;
}
+
+  /* The embedded comparison operand is completely free.  */
+  if (!general_operand (XEXP (x, 0), GET_MODE (XEXP (x, 0)))
+  XEXP (x, 1) == const0_rtx)
+   *total = 0;
+
   return false;
 
 case FLOAT_EXTEND:


[PATCH 2/3][ARM][PR target/65697] Strengthen barriers for compare-and-swap builtin.

2015-06-22 Thread Matthew Wahab

This is the ARM version of the patches to strengthen memory barriers for the
__sync builtins on ARMv8 targets
(https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html).

This patch changes the code generated for __sync_type_compare_and_swap to remove
the acquire-barrier from the load and end the operation with a fence. This also
strengthens the acquire barrier generated for __sync_lock_test_and_set which,
like compare-and-swap, is implemented as a form of atomic exchange.

Tested as part of a series for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  matthew.wa...@arm.com

PR Target/65697
* config/armc/arm.c (arm_split_compare_and_swap): For ARMv8, replace an
initial acquire barrier with a final full barrier.

From ddb9a45acda7bb64d91c446bc40afe4b78fcc1e1 Mon Sep 17 00:00:00 2001
From: Matthew Wahab matthew.wa...@arm.com
Date: Fri, 22 May 2015 13:36:39 +0100
Subject: [PATCH 2/3] [ARM] Strengthen barriers for compare-and-swap builtin.

Change-Id: I43381b2ea88492f807d85a73d233369334c99881
---
 gcc/config/arm/arm.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 94118f4..4610ff6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -27603,6 +27603,8 @@ arm_split_compare_and_swap (rtx operands[])
   scratch = operands[7];
   mode = GET_MODE (mem);
 
+  bool is_armv8_sync = arm_arch8  is_mm_sync (mod_s);
+
   bool use_acquire = TARGET_HAVE_LDACQ
   !(is_mm_relaxed (mod_s) || is_mm_consume (mod_s)
 			  || is_mm_release (mod_s));
@@ -27611,6 +27613,11 @@ arm_split_compare_and_swap (rtx operands[])
   !(is_mm_relaxed (mod_s) || is_mm_consume (mod_s)
 			  || is_mm_acquire (mod_s));
 
+  /* For ARMv8, the load-acquire is too weak for __sync memory orders.  Instead,
+ a full barrier is emitted after the store-release.  */
+  if (is_armv8_sync)
+use_acquire = false;
+
   /* Checks whether a barrier is needed and emits one accordingly.  */
   if (!(use_acquire || use_release))
 arm_pre_atomic_barrier (mod_s);
@@ -27651,7 +27658,8 @@ arm_split_compare_and_swap (rtx operands[])
 emit_label (label2);
 
   /* Checks whether a barrier is needed and emits one accordingly.  */
-  if (!(use_acquire || use_release))
+  if (is_armv8_sync
+  || !(use_acquire || use_release))
 arm_post_atomic_barrier (mod_s);
 
   if (is_mm_relaxed (mod_f))
-- 
1.9.1



Re: [PATCH] Combine related fail of gcc.target/powerpc/ti_math1.c

2015-06-22 Thread Eric Botcazou
 This patch fixes
 FAIL: gcc.target/powerpc/ti_math1.c scan-assembler-times adde 1
 a failure caused by combine simplifying this i2src
 
 (plus:DI (plus:DI (reg:DI 165 [ val+8 ])
 (reg:DI 169 [+8 ]))
 (reg:DI 76 ca))
 
 to this
 
 (plus:DI (plus:DI (reg:DI 76 ca)
 (reg:DI 165 [ val+8 ]))
 (reg:DI 169 [+8 ]))
 
 which no longer matches rs6000.md adddi3_carry_in_internal.  See
 https://gcc.gnu.org/ml/gcc/2015-05/msg00206.html for related
 discussion.  Bootstrapped and regression tested powerpc64le-linux,
 powerpc64-linux and x86_64-linux.  OK to apply mainline?
 
   * rtlanal.c (commutative_operand_precedence): Correct comments.
   * simplify-rtx.c (simplify_plus_minus_op_data_cmp): Delete forward
   declaration.  Return an int.  Distinguish REG,REG return from
   others.
   (struct simplify_plus_minus_op_data): Make local to function.
   (simplify_plus_minus): Rename canonicalized to not_canonical.
   Don't set not_canonical if merely sorting registers.  Avoid
   packing ops if nothing changes.  White space fixes.

OK in principle, but...

 Some notes: Renaming canonicalized to not_canonical better reflects
 its usage.  At the time the var is set, the expression hasn't been
 canonicalized.

I'm quite skeptical, in particular given:

+ /* Just swapping registers doesn't count as canonicalization.  */
+ if (cmp != 1)
+   not_canonical = 1;

and

+  /* If nothing changed, fail.  */
+  if (!not_canonical)
+return NULL_RTX;

Both are rather confusing now so the renaming isn't really a progress IMO.

-- 
Eric Botcazou


[PATCH 1/3][ARM][PR target/65697] Strengthen memory barriers for __sync builtins

2015-06-22 Thread Matthew Wahab

This is the ARM version of the patches to strengthen memory barriers for the
__sync builtins on ARMv8 targets
(https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html).

The problem is that the barriers generated for the __sync builtins for ARMv8
targets are too weak. This affects the full and the acquire barriers in the
__sync fetch-and-op, compare-and-swap functions and __sync_lock_test_and_set.

This patch series changes the code to strengthen the barriers by replacing
initial load-acquires with a simple load and adding a final memory barrier to
prevent code hoisting.

- Full barriers:  __sync_fetch_and_op, __sync_op_and_fetch
  __sync_*_compare_and_swap

  [load-acquire; code; store-release]
  becomes
  [load; code ; store-release; barrier].

- Acquire barriers:  __sync_lock_test_and_set

  [load-acquire; code; store]
  becomes
  [load; code; store; barrier]

This patch changes the code generated for __sync_fetch_and_op and
__sync_op_and_fetch builtins.

Tested as part of a series for arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-22  Matthew Wahab  matthew.wa...@arm.com

PR Target/65697
* config/armc/arm.c (arm_split_atomic_op): For ARMv8, replace an
initial acquire barrier with a final full barrier.
From 3e9f71c04dba20ba66b5c9bae284fcac5fdd91ec Mon Sep 17 00:00:00 2001
From: Matthew Wahab matthew.wa...@arm.com
Date: Fri, 22 May 2015 13:31:58 +0100
Subject: [PATCH 1/3] [ARM] Strengthen barriers for sync-fetch-op builtin.

Change-Id: I18f5af5ba4b2e74b5866009d3a090e251eff4a45
---
 gcc/config/arm/arm.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e79a369..94118f4 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -27668,6 +27668,8 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   rtx_code_label *label;
   rtx x;
 
+  bool is_armv8_sync = arm_arch8  is_mm_sync (model);
+
   bool use_acquire = TARGET_HAVE_LDACQ
   !(is_mm_relaxed (model) || is_mm_consume (model)
 			  || is_mm_release (model));
@@ -27676,6 +27678,11 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   !(is_mm_relaxed (model) || is_mm_consume (model)
 			  || is_mm_acquire (model));
 
+  /* For ARMv8, a load-acquire is too weak for __sync memory orders.  Instead,
+ a full barrier is emitted after the store-release.  */
+  if (is_armv8_sync)
+use_acquire = false;
+
   /* Checks whether a barrier is needed and emits one accordingly.  */
   if (!(use_acquire || use_release))
 arm_pre_atomic_barrier (model);
@@ -27746,7 +27753,8 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   emit_unlikely_jump (gen_cbranchsi4 (x, cond, const0_rtx, label));
 
   /* Checks whether a barrier is needed and emits one accordingly.  */
-  if (!(use_acquire || use_release))
+  if (is_armv8_sync
+  || !(use_acquire || use_release))
 arm_post_atomic_barrier (model);
 }
 
-- 
1.9.1



[gomp4, committed] Handle reduction in oacc kernels region

2015-06-22 Thread Tom de Vries

Hi,

attached patch handles reductions in oacc kernels region.

The approach uses the normal parloops reduction handling code, with 
these modifications:


1.

For each reduction, we look for this pattern in the oacc-lowered code, 
and store 'addr' in the corresponding struct reduction_info:

...
 bb preheader
   .omp_data_i = .omp_data_arr;
   addr = .omp_data_i-sum;
   sum_a = *addr;

 bb header:
   sum_b = PHI sum_a (preheader), sum_c (latch)
...

2.

We replaces the non-atomic store to 'addr' at the end of the kernels 
region with an atomic one.



Bootstrapped and reg-tested on x86_64 on top of gomp-4_0-branch.

Committed to gomp-4_0-branch.

Thanks,
- Tom

Handle reduction in oacc kernels region

2015-06-18  Tom de Vries  t...@codesourcery.com

	* tree-parloops.c (struct reduction_info): Add reduc_addr field.
	(create_call_for_reduction_1): Handle case that reduc_addr is non-NULL.
	(gen_parallel_loop): Init clsn_data for oacc_kernels_p case.
	(try_create_reduction_list): Add and handle oacc_kernels_p parameter.
	(parallelize_loops): Add argument to call to try_create_reduction_list.

	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c: New test.

	* c-c++-common/goacc/kernels-reduction.c: New test.
---
 .../c-c++-common/goacc/kernels-reduction.c | 38 +
 gcc/tree-parloops.c| 92 --
 .../libgomp.oacc-c-c++-common/kernels-reduction.c  | 37 +
 3 files changed, 162 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c b/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
new file mode 100644
index 000..bfbcdbd
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
@@ -0,0 +1,38 @@
+/* { dg-additional-options -O2 } */
+/* { dg-additional-options -ftree-parallelize-loops=32 } */
+/* { dg-additional-options -fdump-tree-parloops_oacc_kernels-all } */
+/* { dg-additional-options -fdump-tree-optimized } */
+
+#include stdlib.h
+
+#define n 1
+
+unsigned int a[n];
+
+void  __attribute__((noinline,noclone))
+foo (void)
+{
+  int i;
+  unsigned int sum = 1;
+
+#pragma acc kernels copyin (a[0:n]) copy (sum)
+  {
+for (i = 0; i  n; ++i)
+  sum += a[i];
+  }
+
+  if (sum != 5001)
+abort ();
+}
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times SUCCESS: may be parallelized 1 parloops_oacc_kernels } } */
+/* { dg-final { scan-tree-dump-not FAILED: parloops_oacc_kernels } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times (?n);; Function .*foo.*._omp_fn.0 1 optimized } } */
+
+/* { dg-final { scan-tree-dump-times (?n)pragma omp target oacc_parallel.*num_gangs\\(32\\) 1 parloops_oacc_kernels } } */
+
+/* { dg-final { cleanup-tree-dump parloops_oacc_kernels } } */
+/* { dg-final { cleanup-tree-dump optimized } } */
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 0661b78..c5f4d9a 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -218,6 +218,8 @@ struct reduction_info
    of the reduction variable when existing the loop. */
   tree initial_value;		/* The initial value of the reduction var before entering the loop.  */
   tree field;			/*  the name of the field in the parloop data structure intended for reduction.  */
+  tree reduc_addr;		/* The address of the reduction variable for
+   openacc reductions.  */
   tree init;			/* reduction initialization value.  */
   gphi *new_phi;		/* (helper field) Newly created phi node whose result
    will be passed to the atomic operation.  Represents
@@ -1107,10 +1109,30 @@ create_call_for_reduction_1 (reduction_info **slot, struct clsn_data *clsn_data)
   tree tmp_load, name;
   gimple load;
 
-  load_struct = build_simple_mem_ref (clsn_data-load);
-  t = build3 (COMPONENT_REF, type, load_struct, reduc-field, NULL_TREE);
+  if (reduc-reduc_addr == NULL_TREE)
+{
+  load_struct = build_simple_mem_ref (clsn_data-load);
+  t = build3 (COMPONENT_REF, type, load_struct, reduc-field, NULL_TREE);
+
+  addr = build_addr (t, current_function_decl);
+}
+  else
+{
+  /* Set the address for the atomic store.  */
+  addr = reduc-reduc_addr;
+
+  /* Remove the non-atomic store '*addr = sum'.  */
+  tree res = PHI_RESULT (reduc-keep_res);
+  use_operand_p use_p;
+  gimple stmt;
+  bool single_use_p = single_imm_use (res, use_p, stmt);
+  gcc_assert (single_use_p);
+  replace_uses_by (gimple_vdef (stmt),
+		   gimple_vuse (stmt));
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  gsi_remove (gsi, true);
+}
 
-  addr = build_addr (t, current_function_decl);
 
   /* Create phi node.  */
   bb = clsn_data-load_bb;
@@ -2441,6 +2463,10 @@ 

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Julian Brown
On Mon, 22 Jun 2015 16:24:56 +0200
Jakub Jelinek ja...@redhat.com wrote:

 On Mon, Jun 22, 2015 at 02:55:49PM +0100, Julian Brown wrote:
  One problem is that (at least on the GPU hardware we've considered
  so far) we're somewhat constrained in how much control we have over
  how the underlying hardware executes code: it's possible to draw up
  a scheme where OpenACC source-level control-flow semantics are
  reflected directly in the PTX assembly output (e.g. to say all
  threads in a CTA/warp will be coherent after such-and-such a
  loop), and lowering OpenACC directives quite early seems to make
  that relatively tractable. (Even if the resulting code is
  relatively un-optimisable due to the abnormal edges inserted to
  make sure that the CFG doesn't become ill-formed.)
  
  If arbitrary optimisations are done between OMP-lowering time and
  somewhere around vectorisation (say), it's less clear if that
  correspondence can be maintained. Say if the code executed by half
  the threads in a warp becomes physically separated from the code
  executed by the other half of the threads in a warp due to some loop
  optimisation, we can no longer easily determine where that warp will
  reconverge, and certain other operations (relying on coherent warps
  -- e.g. CTA synchronisation) become impossible. A similar issue
  exists for warps within a CTA.
  
  So, essentially -- I don't know how late loop lowering would
  interact with:
  
  (a) Maintaining a CFG that will work with PTX.
  
  (b) Predication for worker-single and/or vector-single modes
  (actually all currently-proposed schemes have problems with proper
  representation of data-dependencies for variables and
  compiler-generated temporaries between predicated regions.)
 
 I don't understand why lowering the way you suggest helps here at all.
 In the proposed scheme, you essentially have whole function
 in e.g. worker-single or vector-single mode, which you need to be
 able to handle properly in any case, because users can write such
 routines themselves.

In vector-single or worker-single mode, divergence of threads within a
warp or a CTA is controlled by broadcasting the controlling expression
of conditional branches to the set of inactive threads, so each of
those follows along with the active thread. So you only get
potentially-problematic thread divergence when workers or vectors are
operating in partitioned mode.

So, for instance, a made-up example:

#pragma acc parallel
{
  #pragma acc loop gang
  for (i = 0; i  N; i++))
  {
#pragma acc loop worker
for (j = 0; j  M; j++)
{
  if (j  M / 2)
/* stmt 1 */
  else
/* stmt 2 */
}

/* reconvergence point: thread barrier */

[...]
  }
}

Here stmt 1 and stmt 2 execute in worker-partitioned, vector-single
mode. With early lowering, the reconvergence point can be
inserted at the end of the loop, and abnormal edges (etc.) can be used
to ensure that the CFG does not get changed in such a way that there is
no longer a unique point at which the loop threads reconverge.

With late lowering, it's no longer obvious to me if that can still be
done.

Julian


[PATCH] parloops exit phi fixes

2015-06-22 Thread Tom de Vries

Hi,

the gomp-4_0-branch contains the kernels oacc pass group. I've run into 
trouble before with this pass group due to the fact that it uses passes 
in an unusual location or order (pass_lim before pass_stdarg, 
https://gcc.gnu.org/ml/gcc/2015-01/msg00282.html ).


In an attempt to find this sort of issue pro-actively, I've modified the 
pass list in the following way (similar to the oacc kernels pass group, 
but always functional, not just for functions with kernel regions or 
loops in kernels regions), and bootstrapped and reg-tested on x86_64 on 
top of gomp-4_-0-branch:

...
   NEXT_PASS (pass_build_ealias);
   NEXT_PASS (pass_fre);
+  NEXT_PASS (pass_ch);
+  NEXT_PASS (pass_tree_loop_init);
+  NEXT_PASS (pass_lim);
+  NEXT_PASS (pass_tree_loop_done);
+  NEXT_PASS (pass_fre);
+  NEXT_PASS (pass_tree_loop_init);
+  NEXT_PASS (pass_scev_cprop);
+  NEXT_PASS (pass_parallelize_loops);
+  NEXT_PASS (pass_expand_omp_ssa);
+  NEXT_PASS (pass_tree_loop_done);
   NEXT_PASS (pass_merge_phi);
   NEXT_PASS (pass_dse);
...

Apart from running into PR66616, I found two issues with the parloops pass:
1. handling of loop header phi, when there's no corresponding loop exit
   phi (unused reduction result)
2. handling of loop exit phi, when there's no corresponding loop header
   phi (value not modified in loop)

The two attached patches fix these problems.

Bootstrapped and reg-tested on x864_64 on top of gomp-4_0-branch in 
combination with the patch series that triggered the problem.


Bootstrapped and reg-tested on x864_64 on top of trunk.

OK for trunk?

Thanks,
- Tom
Handle unused reduction in create_loads_for_reductions

2015-06-22  Tom de Vries  t...@codesourcery.com

	* tree-parloops.c (create_loads_for_reductions): Handle case that
	reduction is unused.
---
 gcc/tree-parloops.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 48c143d..28112b2 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1162,6 +1162,10 @@ create_loads_for_reductions (reduction_info **slot, struct clsn_data *clsn_data)
   tree name;
   tree x;
 
+  /* If there's no exit phi, the result of the reduction is unused.  */
+  if (red-keep_res == NULL)
+return 1;
+
   gsi = gsi_after_labels (clsn_data-load_bb);
   load_struct = build_simple_mem_ref (clsn_data-load);
   load_struct = build3 (COMPONENT_REF, type, load_struct, red-field,
-- 
1.9.1

Handle exit phi without header phi in create_parallel_loop

2015-06-22  Tom de Vries  t...@codesourcery.com

	* tree-parloops.c (create_parallel_loop): Handle case that exit phi does
	not have a corresponding loop header phi.
---
 gcc/tree-parloops.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 7123c27..0693b9e 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2061,13 +2061,17 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
!gsi_end_p (gpi); gsi_next (gpi))
 {
   source_location locus;
-  tree def;
   gphi *phi = gpi.phi ();
-  gphi *stmt;
+  tree def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
+  gimple def_stmt = SSA_NAME_DEF_STMT (def);
 
-  stmt = as_a gphi * (
-	   SSA_NAME_DEF_STMT (PHI_ARG_DEF_FROM_EDGE (phi, exit)));
+  /* If the exit phi is not connected to a header phi in the same loop, this
+	 value is not modified in the loop, and we're done with this phi.  */
+  if (!(gimple_code (def_stmt) == GIMPLE_PHI
+	 gimple_bb (def_stmt) == loop-header))
+	continue;
 
+  gphi *stmt = as_a gphi * (def_stmt);
   def = PHI_ARG_DEF_FROM_EDGE (stmt, loop_preheader_edge (loop));
   locus = gimple_phi_arg_location_from_edge (stmt,
 		 loop_preheader_edge (loop));
-- 
1.9.1



Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Jakub Jelinek
On Mon, Jun 22, 2015 at 06:48:10PM +0100, Julian Brown wrote:
 In vector-single or worker-single mode, divergence of threads within a
 warp or a CTA is controlled by broadcasting the controlling expression
 of conditional branches to the set of inactive threads, so each of
 those follows along with the active thread. So you only get
 potentially-problematic thread divergence when workers or vectors are
 operating in partitioned mode.
 
 So, for instance, a made-up example:
 
 #pragma acc parallel
 {
   #pragma acc loop gang
   for (i = 0; i  N; i++))
   {
 #pragma acc loop worker
 for (j = 0; j  M; j++)
 {
   if (j  M / 2)
 /* stmt 1 */
   else
 /* stmt 2 */
 }
 
 /* reconvergence point: thread barrier */
 
 [...]
   }
 }
 
 Here stmt 1 and stmt 2 execute in worker-partitioned, vector-single
 mode. With early lowering, the reconvergence point can be
 inserted at the end of the loop, and abnormal edges (etc.) can be used
 to ensure that the CFG does not get changed in such a way that there is
 no longer a unique point at which the loop threads reconverge.
 
 With late lowering, it's no longer obvious to me if that can still be
 done.

Why?  The loop still has an exit edge (if there is no break/return/throw out of
the loop which I bet is not allowed), so you just insert the reconvergence
point at the exit edge from the loop.
For the late lowering, I said it is up for benchmarking/investigation
where it would be best placed, it doesn't have to be after the loop passes,
there are plenty of optimization passes even before those.  But once you turn
many of the SSA_NAMEs in a function into (ab) ssa vars, many optimizations
just give up.
And, if you really want to avoid certain loop optimizations, you have always
the possibility to e.g. wrap certain statement in the loop in internal
function (e.g. the loop condition) or something similar to make the passes
more careful about those loops and make it easier to lower it later.

Jakub


patch to fix PR63740 on trunk

2015-06-22 Thread Vladimir Makarov

I've committed patch for PR63740 to the trunk as rev. 224753.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63740

The patch was bootstrapped on x86-64.

2015-06-22  Vladimir Makarov vmaka...@redhat.com

PR bootstrap/63740
* lra-lives.c (process_bb_lives): Check insn copying the same
reload pseudo and don't create a copy for it.

Index: lra-lives.c
===
--- lra-lives.c (revision 224739)
+++ lra-lives.c (working copy)
@@ -565,7 +565,15 @@ process_bb_lives (basic_block bb, int c
  dst_regno = REGNO (SET_DEST (set));
  if (dst_regno = lra_constraint_new_regno_start
   src_regno = lra_constraint_new_regno_start)
-   lra_create_copy (dst_regno, src_regno, freq);
+   {
+ /* It might be still an original (non-reload) insn with
+one unused output and a constraint requiring to use
+the same reg for input/output operands. In this case
+dst_regno and src_regno have the same value, we don't
+need a misleading copy for this case.  */
+ if (dst_regno != src_regno)
+   lra_create_copy (dst_regno, src_regno, freq);
+   }
  else if (dst_regno = lra_constraint_new_regno_start)
{
  if ((hard_regno = src_regno) = FIRST_PSEUDO_REGISTER)





Re: match.pd: Three new patterns

2015-06-22 Thread Marek Polacek
On Fri, Jun 19, 2015 at 05:51:53PM +0200, Marc Glisse wrote:
 On Fri, 19 Jun 2015, Marek Polacek wrote:
 
 +/* x + y - (x | y) - x  y */
 +(simplify
 + (minus (plus @0 @1) (bit_ior @0 @1))
 + (if (!TYPE_OVERFLOW_SANITIZED (type)  !TYPE_SATURATING (type))
 +  (bit_and @0 @1)))
 +
 +/* (x + y) - (x  y) - x | y */
 +(simplify
 + (minus (plus @0 @1) (bit_and @0 @1))
 + (if (!TYPE_OVERFLOW_SANITIZED (type)  !TYPE_SATURATING (type))
 +  (bit_ior @0 @1)))
 
 It could be macroized so they are handled by the same piece of code, but
 that's not important for a couple lines.
 
Yeah, that could be done, but I didn't see much value in doing that.

 As far as I can tell, TYPE_SATURATING is for fixed point numbers only, are
 we allowed to use bit_ior/bit_and on those? I never know what kind of
 integers are supposed to be supported, so I would have checked
 TYPE_OVERFLOW_UNDEFINED (type) || TYPE_OVERFLOW_WRAPS (type) since those are
 the 2 cases where we know it is safe (for TYPE_OVERFLOW_TRAPS it is never
 clear if we are supposed to preserve traps or just avoid introducing new
 ones). Well, the reviewer will know, I'll shut up :-)
 
I think you're right about TYPE_SATURATING so I've dropped that and instead
replaced it with TYPE_OVERFLOW_TRAPS.  That should do the right thing
together with TYPE_OVERFLOW_SANITIZED.

 (I still believe that the necessity for TYPE_OVERFLOW_SANITIZED here points
 to a design issue in ubsan, but it is way too late to discuss that)

I think delayed folding would help here a bit.  Also, we've been talking
about doing the signed overflow sanitization earlier, but so far I didn't
implement that.  And -ftrapv should be merged into the ubsan infrastructure
some day.

 It is probably not worth the trouble adding the variant:
 x+(y-(xy)) - x|y
 since it decomposes as
 y-(xy) - y~x
 x+(y~x) - x|y
 x+(y-(x|y)) - x-(x~y) - xy is less likely to happen because the first
 transform y-(x|y) - -(x~y) increases the number of insns. Bah, we can't
 handle everything...

That sounds about right ;).  Thanks!

So, Richi, is this variant ok as well?  I also added one ubsan test.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-06-22  Marek Polacek  pola...@redhat.com

* match.pd ((x + y) - (x | y) - x  y,
(x + y) - (x  y) - x | y): New patterns.

* gcc.dg/fold-minus-4.c: New test.
* gcc.dg/fold-minus-5.c: New test.
* c-c++-common/ubsan/overflow-add-5.c: New test.

diff --git gcc/match.pd gcc/match.pd
index badb80a..6d520ef 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -343,6 +343,18 @@ along with GCC; see the file COPYING3.  If not see
  (plus:c (bit_and @0 @1) (bit_ior @0 @1))
  (plus @0 @1))
 
+/* (x + y) - (x | y) - x  y */
+(simplify
+ (minus (plus @0 @1) (bit_ior @0 @1))
+ (if (!TYPE_OVERFLOW_SANITIZED (type)  !TYPE_OVERFLOW_TRAPS (type))
+  (bit_and @0 @1)))
+
+/* (x + y) - (x  y) - x | y */
+(simplify
+ (minus (plus @0 @1) (bit_and @0 @1))
+ (if (!TYPE_OVERFLOW_SANITIZED (type)  !TYPE_OVERFLOW_TRAPS (type))
+  (bit_ior @0 @1)))
+
 /* (x | y) - (x ^ y) - x  y */
 (simplify
  (minus (bit_ior @0 @1) (bit_xor @0 @1))
diff --git gcc/testsuite/c-c++-common/ubsan/overflow-add-5.c 
gcc/testsuite/c-c++-common/ubsan/overflow-add-5.c
index e69de29..905a60a 100644
--- gcc/testsuite/c-c++-common/ubsan/overflow-add-5.c
+++ gcc/testsuite/c-c++-common/ubsan/overflow-add-5.c
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-options -fsanitize=signed-integer-overflow } */
+
+int __attribute__ ((noinline))
+foo (int i, int j)
+{
+  return (i + j) - (i | j);
+}
+
+/* { dg-output signed integer overflow: 2147483647 \\+ 1 cannot be 
represented in type 'int'\[^\n\r]*(\n|\r\n|\r) } */
+/* { dg-output \[^\n\r]*signed integer overflow: -2147483648 - 2147483647 
cannot be represented in type 'int'\[^\n\r]*(\n|\r\n|\r) } */
+
+int __attribute__ ((noinline))
+bar (int i, int j)
+{
+  return (i + j) - (i  j);
+}
+
+/* { dg-output \[^\n\r]*signed integer overflow: 2147483647 \\+ 1 cannot be 
represented in type 'int'\[^\n\r]*(\n|\r\n|\r) } */
+/* { dg-output \[^\n\r]*signed integer overflow: -2147483648 - 1 cannot be 
represented in type 'int' } */
+
+int
+main ()
+{
+  int r = foo (__INT_MAX__, 1);
+  asm volatile ( : +g (r));
+  r = bar (__INT_MAX__, 1);
+  asm volatile ( : +g (r));
+  return 0;
+}
diff --git gcc/testsuite/gcc.dg/fold-minus-4.c 
gcc/testsuite/gcc.dg/fold-minus-4.c
index e69de29..2d76b4f 100644
--- gcc/testsuite/gcc.dg/fold-minus-4.c
+++ gcc/testsuite/gcc.dg/fold-minus-4.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options -O -fdump-tree-cddce1 } */
+
+int
+fn1 (int a, int b)
+{
+  int tem1 = a + b;
+  int tem2 = a  b;
+  return tem1 - tem2;
+}
+
+int
+fn2 (int a, int b)
+{
+  int tem1 = b + a;
+  int tem2 = a  b;
+  return tem1 - tem2;
+}
+
+int
+fn3 (int a, int b)
+{
+  int tem1 = a + b;
+  int tem2 = b  a;
+  return tem1 - tem2;
+}
+
+int
+fn4 (int a, int b)
+{
+  int tem1 = b + a;
+  int tem2 = b  a;
+  return tem1 - tem2;
+}
+
+/* { dg-final { 

Re: [ping] Couple of patches for -fdump-ada-spec

2015-06-22 Thread Jeff Law

On 06/22/2015 09:33 AM, Eric Botcazou wrote:

Add query for template-dependent arguments to -fdump-ada-spec:
   http://gcc.gnu.org/ml/gcc-patches/2015-06/msg00403.html

Get rid of assembly file with -fdump-ada-spec:
   http://gcc.gnu.org/ml/gcc-patches/2015-06/msg00420.html

OK for both.

jeff


Re: [PATCH] Expand PIC calls without PLT with -fno-plt

2015-06-22 Thread Alexander Monakov
On Mon, 22 Jun 2015, Jiong Wang wrote:
 Have done a quick experiment, -fno-plt doesn't work on AArch64.
 
 it's because although this patch force the function address into register,
 but the combine pass runs later combine it back as AArch64 have defined such
 insn pattern.
 
 For X86, it's not combined back. From the rtl dump, it's because the rtl pre
 pass has moved the address load instruction into another basic block and
 combine pass don't combine across basic blocks. Also, x86 backend has done
 some check on flag_plt in the new added ix86_nopic_noplt_attribute_p which
 could help generate correct insns.
 
 What I can think of the fix on AArch64 is by restricting the call symbol
 under flag_plt == true only, so that call via register can't be combined
 into call symbol direct,
 
 Or better to prohibit combine pass for such combining? as the generic fix on
 combine may fix other broken targets.

My colleagues at ISP RAS (CC'ed) have been looking on arm (and aarch64) no-plt
codegen.  We also saw the problem with the combine pass you describe.  I think
your description of why it's not observed on x86 is incorrect; the newly added
ix86_nopic_noplt_attribute_p should not have anything to do with that.  It's
just that the GOT load insn has a REG_EQUAL note, and the combine pass can use
it to replace the register in the indirect branch, producing a direct branch
to a symbol (i.e. a PLT jump).

Actually we are not hitting the same problem on x86 by pure luck.  Early RTL
passes manage to lose the REG_EQUAL note, so by the time combine runs, the
register annotation is lost.  It's possible to reproduce the arm/aarch64
problem on x86 with -fno-gcse and the following hack:

diff --git a/gcc/cse.c b/gcc/cse.c
index 2a33827..88cff96 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -6634,6 +6634,9 @@ cse_main (rtx_insn *f ATTRIBUTE_UNUSED, int nregs)
   int *rc_order = XNEWVEC (int, last_basic_block_for_fn (cfun));
   int i, n_blocks;
 
+  if (!flag_gcse)
+return 0;
+
   df_set_flags (DF_LR_RUN_DCE);
   df_note_add_problem ();
   df_analyze ();

Regarding fixing the issue, I also think that combine pass might be a better
place (than the backends).  I'd appreciate comments from maintainers.


If you try disabling the REG_EQUAL note generation [*], you'll probably find a
performance regression on arm32 (and probably on aarch64 as well? we only
tried arm32 so far).  The main reason for that is that GCC emits pretty bad
code for a GOT load.  Instead of using two add instructions and one ldr for
the GOT slot access, like the PLT stubs do, it uses three(!) ldr instructions
and one add.  The first ldr is for loading the GOT address, and the second is
for the offset of the GOT slot.  As I understand, to fix that, GCC has to
learn using the GOT_PREL relocation type.

[*] To do that, we hacked arm legitimize_pic_address not to emit REG_EQUAL
note under !flag_plt.

Alexander


Re: [PATCH 1/3] Refactor entry point to -Wmisleading-indentation

2015-06-22 Thread Patrick Palka
On Mon, Jun 22, 2015 at 1:29 PM, Jeff Law l...@redhat.com wrote:
 On 06/09/2015 11:31 AM, Patrick Palka wrote:

 This patch refactors the entry point of -Wmisleading-indentation from:

void
warn_for_misleading_indentation (location_t guard_loc,
 location_t body_loc,
 location_t next_stmt_loc,
 enum cpp_ttype next_tok_type,
 const char *guard_kind);

 to

struct token_indent_info
{
  location_t location;
  cpp_ttype type;
  rid keyword;
};

void
warn_for_misleading_indentation (const token_indent_info guard_tinfo,
 const token_indent_info body_tinfo,
 const token_indent_info next_tinfo);

 The purpose of this refactoring is to expose more information to the
 -Wmisleading-indentation implementation to allow for more advanced
 heuristics and for better coverage.

 (I decided to keep the usage of const references because nobody
 seems to mind.  Also I added a new header file, c-indentation.h.)

 gcc/c-family/ChangeLog:

 * c-indentation.h (struct token_indent_info): Define.
 (get_token_indent_info): Define.
 (warn_for_misleading_information): Declare.
 * c-common.h (warn_for_misleading_information): Remove.
 * c-identation.c (warn_for_misleading_indentation):
 Change declaration to take three token_indent_infos.  Adjust
 accordingly.
 * c-identation.c (should_warn_for_misleading_indentation):
 Likewise.  Bail out early if the body is a compound statement.
 (guard_tinfo_to_string): Define.

 gcc/c/ChangeLog:

 * c-parser.c (c_parser_if_body): Take token_indent_info
 argument. Call warn_for_misleading_indentation even when the
 body is a semicolon.  Extract token_indent_infos corresponding
 to the guard, body and next tokens.  Adjust call to
 warn_for_misleading_indentation accordingly.
 (c_parser_else_body): Likewise.
 (c_parser_if_statement): Likewise.
 (c_parser_while_statement): Likewise.
 (c_parser_for_statement): Likewise.

 gcc/cp/ChangeLog:

 * parser.c (cp_parser_selection_statement): Move handling of
 semicolon body to ...
 (cp_parser_implicitly_scoped_statement): .. here.  Call
 warn_for_misleading_indentation even when the body is a
 semicolon.  Extract token_indent_infos corresponding to the
 guard, body and next tokens.  Adjust call to
 warn_for_misleading_indentation accordingly.  Take
 token_indent_info argument.
 (cp_parser_already_scoped_statement): Likewise.
 (cp_parser_selection_statement, cp_parser_iteration_statement):
 Extract a token_indent_info corresponding to the guard token.

 The only question in my mind is bootstrap  regression testing.  From
 reading the thread for the earlier version of this patch I got the
 impression you had bootstrapped and regression tested earlier versions.

 If you could confirm that you've bootstrapped and regression tested this
 version it'd be appreciated.  You can do it on the individual patches or the
 set as a whole.

I think I successfully bootstrapped + regtested this exact version but
I'm not sure.  I was going to do so again before committing anyway.
I will fire off a build tonight and confirm the results tomorrow.


 Jeff




patch to fix PR63740

2015-06-22 Thread Vladimir Makarov

The following patch fixes PR63740 which is describedin details on

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63740

Committed as rev. 224752.

I'll commit later the same patch for the trunk.

2015-06-22  Vladimir Makarov  vmaka...@redhat.com

PR bootstrap/63740
* lra-lives.c (process_bb_lives): Check insn copying the same
reload pseudo and don't create a copy for it.

Index: lra-lives.c
===
--- lra-lives.c (revision 224739)
+++ lra-lives.c (working copy)
@@ -565,7 +565,15 @@ process_bb_lives (basic_block bb, int c
  dst_regno = REGNO (SET_DEST (set));
  if (dst_regno = lra_constraint_new_regno_start
   src_regno = lra_constraint_new_regno_start)
-   lra_create_copy (dst_regno, src_regno, freq);
+   {
+ /* It might be still an original (non-reload) insn with
+one unused output and a constraint requiring to use
+the same reg for input/output operands. In this case
+dst_regno and src_regno have the same value, we don't
+need a misleading copy for this case.  */
+ if (dst_regno != src_regno)
+   lra_create_copy (dst_regno, src_regno, freq);
+   }
  else if (dst_regno = lra_constraint_new_regno_start)
{
  if ((hard_regno = src_regno) = FIRST_PSEUDO_REGISTER)





Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function

2015-06-22 Thread Marek Polacek
On Sun, Jun 21, 2015 at 05:05:14PM -0600, Martin Sebor wrote:
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/addr_builtin-pr66516.c
 @@ -0,0 +1,59 @@
 +/* { dg-do compile } */

One more nit: I think I'd prefer naming the test addr-builtin-1.c
and then putting /* PR c/66516 */ on the first line of the test.

Marek


Re: [PATCH] Fix PR c++/30044

2015-06-22 Thread Jason Merrill

On 06/15/2015 02:32 PM, Patrick Palka wrote:

On Mon, Jun 15, 2015 at 2:05 PM, Jason Merrill ja...@redhat.com wrote:

Any reason not to use grow_tree_vec?


Doing so causes a lot of ICEs in the testsuite.  I think it's because
grow_tree_vec invalidates the older parameter_vec which some trees may
still be holding a reference to in their DECL_TEMPLATE_PARMS field.


Hmm, that's unfortunate, as doing it this way means we get a bunch of 
garbage TREE_VECs in the process.  But I guess the patch is OK as is.


Jason



Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function

2015-06-22 Thread Martin Sebor

It seems like this patch regresess pr59630.c testcase; I don't see
the testcase being addressed in this patch.


Thanks for the review and for pointing out this regression!
I missed it among all the C test suite failures (I see 157
of them in 24 distinct tests on x86_64.)

pr59630 is marked ice-on-valid-code even though the call via
the converted pointer is clearly invalid (UB). What's more
relevant, though, is that the test case is one of those that
(while they both compile and link with the unpatched GCC) are
not intended to compile with the patch (and don't compile with
Clang).

In this simple case, the call to __builtin_abs(0) is folded
into the constant 0, but in more involved cases GCC emits
a call to abs. It's not clear to me from the manual or from
the builtin tests I've seen whether this is by design or
an accident of the implementation

Is it intended that programs be able to take the address of
the builtins that correspond to libc functions and make calls
to the underlying libc functions via such pointers? (If so,
the patch will need some tweaking.)



Please no c/ and cp/ prefixes.


Sure, let me fix that in the next patch once the question
above has been settled.




+#include stdlib.h


As Joseph already pointed out, this is redundant.


Yes, that was an accidental vestige of some debugging code I had
added. I'll take it out.




@@ -3384,7 +3392,14 @@ parser_build_unary_op (location_t loc, enum tree_code 
code, struct c_expr arg)
result.original_code = code;
result.original_type = NULL;

-  if (TREE_OVERFLOW_P (result.value)  !TREE_OVERFLOW_P (arg.value))
+  if (code == ADDR_EXPR
+   TREE_CODE (TREE_TYPE (arg.value)) == FUNCTION_TYPE
+   DECL_IS_BUILTIN (arg.value))
+{
+  error_at (loc, taking address of a builtin function);
+  result.value = error_mark_node;
+}
+  else if (TREE_OVERFLOW_P (result.value)  !TREE_OVERFLOW_P (arg.value))
  overflow_warning (loc, result.value);


It seems like you can move the new hunk a bit above so that we don't call
build_unary_op in a case when taking the address of a built-in function.


Yes, that should work.



Unfortunately, it doesn't seem possible to do this error in build_unary_op
or in function_to_pointer_conversion :(.


Right. I couldn't find a way to do it because it gets called
for function calls too.


One more nit: I think I'd prefer naming the test addr-builtin-1.c
and then putting /* PR c/66516 */ on the first line of the test.


Will do.

Martin



[i386, PATCH] Support new psABI for IA MCU.

2015-06-22 Thread Kirill Yukhin
Hello,
I am starting (hopefully small) serie of patches to support 
new ABI dedicated for Intel's MicroController Units [1].

Support for new arch was introduced into Binutils in a few threads, e.g. [2].

This patchset includes:
 - Support in GCC: new switch (-miamcu), macro etc.
 - Changes to libraries.
 - Testsuite.

Whole patch is in the bottom.

[1] - https://groups.google.com/forum/#!topic/ia32-abi/cn7TM6J_TIg
[2] - https://sourceware.org/ml/binutils/2015-05/msg00063.html

--
Thanks, K

diff --git a/config/dfp.m4 b/config/dfp.m4
index 48683f0..5b29089 100644
--- a/config/dfp.m4
+++ b/config/dfp.m4
@@ -21,7 +21,7 @@ Valid choices are 'yes', 'bid', 'dpd', and 'no'.]) ;;
 [
   case $1 in
 powerpc*-*-linux* | i?86*-*-linux* | x86_64*-*-linux* | s390*-*-linux* | \
-i?86*-*-gnu* | \
+i?86*-*-elfiamcu | i?86*-*-gnu* | \
 i?86*-*-mingw* | x86_64*-*-mingw* | \
 i?86*-*-cygwin* | x86_64*-*-cygwin*)
   enable_decimal_float=yes
diff --git a/configure b/configure
index bced9de..82e45f3 100755
--- a/configure
+++ b/configure
@@ -6914,7 +6914,7 @@ case ${enable_target_optspace}:${target} in
   :d30v-*)
 ospace_frag=config/mt-d30v
 ;;
-  :m32r-* | :d10v-* | :fr30-*)
+  :m32r-* | :d10v-* | :fr30-* | :i?86*-*-elfiamcu)
 ospace_frag=config/mt-ospace
 ;;
   no:* | :*)
diff --git a/configure.ac b/configure.ac
index 7c06e6b..dc77a1b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2560,7 +2560,7 @@ case ${enable_target_optspace}:${target} in
   :d30v-*)
 ospace_frag=config/mt-d30v
 ;;
-  :m32r-* | :d10v-* | :fr30-*)
+  :m32r-* | :d10v-* | :fr30-* | :i?86*-*-elfiamcu)
 ospace_frag=config/mt-ospace
 ;;
   no:* | :*)
diff --git a/gcc/common/config/i386/i386-common.c 
b/gcc/common/config/i386/i386-common.c
index 0f8c3e1..79b2472 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -223,7 +223,7 @@ along with GCC; see the file COPYING3.  If not see
 
 bool
 ix86_handle_option (struct gcc_options *opts,
-   struct gcc_options *opts_set ATTRIBUTE_UNUSED,
+   struct gcc_options *opts_set,
const struct cl_decoded_option *decoded,
location_t loc)
 {
@@ -232,6 +232,20 @@ ix86_handle_option (struct gcc_options *opts,
 
   switch (code)
 {
+case OPT_miamcu:
+  if (value)
+   {
+ /* Turn off x87/MMX/SSE/AVX codegen for -miamcu.  */
+ opts-x_target_flags = ~MASK_80387;
+ opts_set-x_target_flags |= MASK_80387;
+ opts-x_ix86_isa_flags = ~(OPTION_MASK_ISA_MMX_UNSET
+ | OPTION_MASK_ISA_SSE_UNSET);
+ opts-x_ix86_isa_flags_explicit |= (OPTION_MASK_ISA_MMX_UNSET
+ | OPTION_MASK_ISA_SSE_UNSET);
+
+   }
+  return true;
+
 case OPT_mmmx:
   if (value)
{
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 805638d..2b3af82 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1389,6 +1389,9 @@ x86_64-*-darwin*)
tmake_file=${tmake_file} ${cpu_type}/t-darwin64 t-slibgcc
tm_file=${tm_file} ${cpu_type}/darwin64.h
;;
+i[34567]86-*-elfiamcu)
+   tm_file=${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h 
newlib-stdint.h i386/iamcu.h
+   ;;
 i[34567]86-*-elf*)
tm_file=${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h 
newlib-stdint.h i386/i386elf.h
;;
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 0228f4b..66f7e37 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -426,6 +426,11 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
 def_or_undef (parse_in, __CLWB__);
   if (isa_flag  OPTION_MASK_ISA_MWAITX)
 def_or_undef (parse_in, __MWAITX__);
+  if (TARGET_IAMCU)
+{
+  def_or_undef (parse_in, __iamcu);
+  def_or_undef (parse_in, __iamcu__);
+}
 }
 
 
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 24fccfc..26ffa67 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3433,6 +3433,10 @@ ix86_option_override_internal (bool main_args_p,
  || TARGET_16BIT_P (opts-x_ix86_isa_flags))
opts-x_ix86_isa_flags = ~OPTION_MASK_ABI_X32;
 #endif
+  if (TARGET_64BIT_P (opts-x_ix86_isa_flags)
+  TARGET_IAMCU_P (opts-x_target_flags))
+   sorry (Intel MCU psABI isn%'t supported in %s mode,
+  TARGET_X32_P (opts-x_ix86_isa_flags) ? x32 : 64-bit);
 }
 #endif
 
@@ -3817,6 +3821,20 @@ ix86_option_override_internal (bool main_args_p,
   if (TARGET_X32  (ix86_isa_flags  OPTION_MASK_ISA_MPX))
 error (Intel MPX does not support x32);
 
+  if (TARGET_IAMCU_P (opts-x_target_flags))
+{
+  /* Verify that x87/MMX/SSE/AVX is off for -miamcu.  */
+  if (TARGET_80387_P (opts-x_target_flags))
+   sorry (X87 FPU isn%'t supported in Intel MCU psABI);
+  else if ((opts-x_ix86_isa_flags  (OPTION_MASK_ISA_MMX
+   

[Patch, fortran] PR52846 - [F2008] Support submodules

2015-06-22 Thread Paul Richard Thomas
Dear All,

This patch enables submodule support in gfortran. Submodules are a
feature of F2008 but are fully described in ISO/IEC TR 19767:2004(E).

The patch has one significant non-conformance (that I know about,
anyway!); whilst private derived type components are correctly dealt
with, symbols whose access is private within the parent module are
not. They should effectively be host associated in descendant
submodules. At present gfortran handles private access at the module
write stage. This means that when a submodule reads the module file,
there is no information present about symbols whose access was
private. Since this modification might cause significant fall-out to
existing code, I propose to submit a separate patch later on to sort
out the non-conformance. However, as required private and public
statements are not allowed in submodules.

The patch makes maximum possible leverage of existing code to handle
modules. Once the submodule is matched, the ancestor module and
submodules are first used and then all the symbols are set host
associated and private derived type components set public.

Most of the work involved matching module procedures, with both the
traditional form of declaration and the abbreviated one. I have chosen
to treat MODULE as a prefix like PURE or ELEMENTAL. This is logical
both because of the form of the declaration and because the
identification of module procedures is most easily done with an
attribute bit. With traditional procedure declarations, the procedure,
result and dummy characteristics are compared with those of the
interface declaration. The comparison of the dummy characteristics is
a bit cobbled together and might be better done by copying the
formal_namespace and it's contents to the new symbol and retaining the
old for the interface symbol. This patch leaves the old dummy symbols
in the formal namespace in the new ones in the formal arglist. I have
checked that cleanup occurs for all objects.

Note the comment in submodule_1.f90 about the possibility of
undetected recursion between procedures in different submodules. I am
not at all sure that I know how to deal with this and am open to
suggestions.

In addition, it should be noted that collisions between the names of
entities and procedures, other than module procedures are detected by
the linker at present.

Apart from this, all is very straightforward and follows the the ChangeLogs.

Thanks for testing of an early version of the patch by Damian Rouson,
Salvatore Filippone and Tobias Burnus.

Bootstrapped and regtested on FC21/x86_64 - OK for trunk?

Cheers

Paul

2015-06-22  Paul Thomas  pa...@gcc.gnu.org

PR fortran/52846
* decl.c (get_proc_name): Make a partially populated interface
symbol to carry the characteristics of a module procedure and
its result.
(match_attr_spec): Submodule variables have implicit save
attribute for F2008 onwards.
(gfc_match_prefix): Add 'module' as the a prefix and set the
module_procedure attribute.
(gfc_match_formal_arglist): For a module procedure keep the
interface formal_arglist from the interface, match new the
formal arguments and then compare the number and names of each.
(gfc_match_procedure): Add case COMP_SUBMODULE.
(gfc_match_function_decl, gfc_match_subroutine_decl): Set the
module_procedure attribute.
(gfc_match_entry, gfc_match_end):  Add case COMP_SUBMODULE.
(gfc_match_submod_proc): New function to match the abbreviated
style of submodule declaration.
* gfortran.h : Add ST_SUBMODULE and ST_END_SUBMODULE. Add the
attribute bits 'used_in_submodule' and 'module_procedure'. Add
prototypes for the functions 'gfc_check_dummy_characteristics'
and 'gfc_check_result_characteristics'.
* interface.c : Add the prefix 'gfc_' to the names of functions
'check_dummy(result)_characteristics' and all their references.
* match.h : Add prototype for 'gfc_match_submod_proc' and
'gfc_match_submodule'.
* module.c (gfc_match_submodule): New function. Add handling
for the 'module_procedure' attribute bit.
* parse.c (decode_statement): Handle a match occurring in
'gfc_match_submod_proc' and a match for 'submodule'.
(gfc_enclosing_unit): Include the state COMP_SUBMODULE.
(gfc_ascii_statement): Add END SUBMODULE.
(accept_statement): Add ST_SUBMODULE.
(parse_spec): Disallow statement functions in a submodule
specification part.
(parse_contained): Add ST_END_SUBMODULE and COMP_SUBMODULE
twice each.
(set_syms_host_assoc): Make symbols from the ancestor module
and submodules use associated, as required by the standard and
set all private components public. Module procedures 'external'
attribute bit is reset and the 'used_in_submodule' bit is set.
(parse_module): If this is a submodule, use the ancestor module
and submodules. Traverse the namespace, calling
'set_syms_host_assoc'. Add ST_END_SUBMODULE and COMP_SUBMODULE.
* 

[C++ Patch] Remove pointless code in grokdeclarator

2015-06-22 Thread Paolo Carlini

Hi,

I think this qualifies as obvious: we reset type_quals to 
TYPE_UNQUALIFIED and then we only use it in a 'if (type_quals != 
TYPE_UNQUALIFIED)' test before returning.


Thanks,
Paolo.

///
2015-06-22  Paolo Carlini  paolo.carl...@oracle.com

* decl.c (grokdeclarator): Remove pointless code.
Index: cp/decl.c
===
--- cp/decl.c   (revision 224724)
+++ cp/decl.c   (working copy)
@@ -10476,19 +10477,9 @@ grokdeclarator (const cp_declarator *declarator,
 
   if (decl_context == TYPENAME)
 {
-  /* Note that the grammar rejects storage classes
-in typenames, fields or parameters.  */
-  if (type_quals != TYPE_UNQUALIFIED)
-   type_quals = TYPE_UNQUALIFIED;
-
   /* Special case: friend class foo looks like a TYPENAME context.  */
   if (friendp)
{
- if (type_quals != TYPE_UNQUALIFIED)
-   {
- error (type qualifiers specified for friend class declaration);
- type_quals = TYPE_UNQUALIFIED;
-   }
  if (inlinep)
{
  error (%inline% specified for friend class declaration);


Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Bernd Schmidt

On 06/19/2015 03:45 PM, Jakub Jelinek wrote:

I actually believe having some optimization passes in between the ompexp
and the lowering of the IR into the form PTX wants is highly desirable,
the form with the worker-single or vector-single mode lowered will contain
too complex CFG for many optimizations to be really effective, especially
if it uses abnormal edges.  E.g. inlining supposedly would have harder job
etc.  What exact unpredictable effects do you fear?


Mostly the ones I can't predict. But let's take one example, LICM: let's 
say you pull some assignment out of a loop, then you find yourself in 
one of two possible situations: either it's become not actually 
available inside the loop (because the data and control flow is not 
described correctly and the compiler doesn't know what's going on), or, 
to avoid that, you introduce additional broadcasting operations when 
entering the loop, which might be quite expensive.



Bernd



Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Jakub Jelinek
On Mon, Jun 22, 2015 at 03:59:57PM +0200, Bernd Schmidt wrote:
 On 06/19/2015 03:45 PM, Jakub Jelinek wrote:
 I actually believe having some optimization passes in between the ompexp
 and the lowering of the IR into the form PTX wants is highly desirable,
 the form with the worker-single or vector-single mode lowered will contain
 too complex CFG for many optimizations to be really effective, especially
 if it uses abnormal edges.  E.g. inlining supposedly would have harder job
 etc.  What exact unpredictable effects do you fear?
 
 Mostly the ones I can't predict. But let's take one example, LICM: let's say
 you pull some assignment out of a loop, then you find yourself in one of two
 possible situations: either it's become not actually available inside the
 loop (because the data and control flow is not described correctly and the
 compiler doesn't know what's going on), or, to avoid that, you introduce

Why do you think that would happen?  E.g. for non-addressable gimple types you'd
most likely just have a PHI for it on the loop.

 additional broadcasting operations when entering the loop, which might be
 quite expensive.

If the PHI has cheap initialization, there is not a problem to emit it as
initialization in the loop instead of a broadcast (kind like RA
rematerialization).  And by actually adding such an optimization, you help
even code that has computation in a vector-single code and uses it in vector
acc loop.

Jakub


Re: [gomp4][PATCH] Handle casts in bound in try_transform_to_exit_first_loop_alt

2015-06-22 Thread Richard Biener
On Thu, 18 Jun 2015, Tom de Vries wrote:

 On 13/06/15 16:24, Tom de Vries wrote:
  Hi,
  
  this patch allows try_transform_to_exit_first_loop_alt to succeed when
  handling cases where the expression representing the number of
  iterations contains a cast.
  
  Currently, transform_to_exit_first_loop_alt testcase
  gfortran/parloops-exit-first-loop-alt.f95 will fail.
  
  The nit is _19, which is defined as follows:
  ...
  _20 = _6 + -1;
  _19 = (unsigned int) _20;
  ...
  And transform_to_exit_first_loop_alt currently only handles nits with
  defining stmt 'nit = x - 1', for which it finds alt_bound 'x'.
  
  The patch:
  - uses try_get_loop_niter to get nit as a nested tree expression
 '(unsigned int) (_6 + -1)'
  - strips the outer nops (assuming no change in value)
  - uses '(unsigned int)_6' as the alt_bound, and
  - gimplifies the expression.
  
  Bootstrapped and reg-tested on x86_64.
  
 
 Cleaned up whitespace in testcases.
 
 Committed to gomp-4_0-branch as atttached.
 
  OK for trunk?
  

I assume the above also handles the reverse, (int) (_6 + -1)?
In this case what happens if _6 == INT_MAX + 1?  nit is INT_MAX
but (int) _6 is INT_MIN.

Likewise what happens if _6 + -1 under-/overflows?

Richard.

 Thanks,
 - Tom
 
 

-- 
Richard Biener rguent...@suse.de
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)


Re: [i386, PATCH, 1/3] IA MCU psABI support: GCC changes.

2015-06-22 Thread Kirill Yukhin
Hello,
This patch introduces basic support into GCC.

Bootstrapped and regtested.

/
* configure.ac (ospace_frag): Enable for i?86*-*-elfiamcu
target.
* configure: Regenerate.
gcc/
* config.gcc: Support i[34567]86-*-elfiamcu target.
* config/i386/iamcu.h: New.
* config/i386/i386.opt: Add -miamcu.
* doc/invoke.texi: Document -miamcu.
* common/config/i386/i386-common.c  (ix86_handle_option): Turn
off x87/MMX/SSE/AVX codegen for -miamcu.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__iamcu/__iamcu__ for -miamcu.
* config/i386/i386.h (PREFERRED_STACK_BOUNDARY_DEFAULT): Set
to MIN_STACK_BOUNDARY if TARGET_IAMCU is true.
(BIGGEST_ALIGNMENT): Set to 32 if TARGET_IAMCU is true.
* config/i386/i386.c (ix86_option_override_internal):
  - Ignore and warn -mregparm for Intel MCU. Turn
on -mregparm=3 for Intel MCU by default.
  - Default long double to 64-bit for Intel MCU.
  - Turn on -freg-struct-return for Intel MCU.
  - Issue an error when -miamcu is used in 64-bit or x32 mode,
or if x87, MMX, SSE or AVX is turned on.
(function_arg_advance_32): Pass value whose
size is no larger than 8 bytes in registers for Intel MCU.
(function_arg_32): Likewise.
(ix86_return_in_memory): Return value whose size is no larger
than 8 bytes in registers for Intel MCU.
(iamcu_alignment): New function.
(ix86_data_alignment): Call iamcu_alignment if TARGET_IAMCU is
true.
(ix86_local_alignment): Don't increase
alignment for Intel MCU.
(x86_field_alignment): Return iamcu_alignment if TARGET_IAMCU is
true.

Is it OK for trunk?

--
Thanks, K

diff --git a/configure b/configure
index bced9de..82e45f3 100755
--- a/configure
+++ b/configure
@@ -6914,7 +6914,7 @@ case ${enable_target_optspace}:${target} in
   :d30v-*)
 ospace_frag=config/mt-d30v
 ;;
-  :m32r-* | :d10v-* | :fr30-*)
+  :m32r-* | :d10v-* | :fr30-* | :i?86*-*-elfiamcu)
 ospace_frag=config/mt-ospace
 ;;
   no:* | :*)
diff --git a/configure.ac b/configure.ac
index 7c06e6b..dc77a1b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2560,7 +2560,7 @@ case ${enable_target_optspace}:${target} in
   :d30v-*)
 ospace_frag=config/mt-d30v
 ;;
-  :m32r-* | :d10v-* | :fr30-*)
+  :m32r-* | :d10v-* | :fr30-* | :i?86*-*-elfiamcu)
 ospace_frag=config/mt-ospace
 ;;
   no:* | :*)
diff --git a/gcc/common/config/i386/i386-common.c 
b/gcc/common/config/i386/i386-common.c
index 0f8c3e1..79b2472 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -223,7 +223,7 @@ along with GCC; see the file COPYING3.  If not see

 bool
 ix86_handle_option (struct gcc_options *opts,
-   struct gcc_options *opts_set ATTRIBUTE_UNUSED,
+   struct gcc_options *opts_set,
const struct cl_decoded_option *decoded,
location_t loc)
 {
@@ -232,6 +232,20 @@ ix86_handle_option (struct gcc_options *opts,

   switch (code)
 {
+case OPT_miamcu:
+  if (value)
+   {
+ /* Turn off x87/MMX/SSE/AVX codegen for -miamcu.  */
+ opts-x_target_flags = ~MASK_80387;
+ opts_set-x_target_flags |= MASK_80387;
+ opts-x_ix86_isa_flags = ~(OPTION_MASK_ISA_MMX_UNSET
+ | OPTION_MASK_ISA_SSE_UNSET);
+ opts-x_ix86_isa_flags_explicit |= (OPTION_MASK_ISA_MMX_UNSET
+ | OPTION_MASK_ISA_SSE_UNSET);
+
+   }
+  return true;
+
 case OPT_mmmx:
   if (value)
{
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 805638d..2b3af82 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1389,6 +1389,9 @@ x86_64-*-darwin*)
tmake_file=${tmake_file} ${cpu_type}/t-darwin64 t-slibgcc
tm_file=${tm_file} ${cpu_type}/darwin64.h
;;
+i[34567]86-*-elfiamcu)
+   tm_file=${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h 
newlib-stdint.h i386/iamcu.h
+   ;;
 i[34567]86-*-elf*)
tm_file=${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h 
newlib-stdint.h i386/i386elf.h
;;
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 0228f4b..66f7e37 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -426,6 +426,11 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
 def_or_undef (parse_in, __CLWB__);
   if (isa_flag  OPTION_MASK_ISA_MWAITX)
 def_or_undef (parse_in, __MWAITX__);
+  if (TARGET_IAMCU)
+{
+  def_or_undef (parse_in, __iamcu);
+  def_or_undef (parse_in, __iamcu__);
+}
 }

 
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 24fccfc..26ffa67 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3433,6 +3433,10 @@ ix86_option_override_internal (bool main_args_p,

Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function

2015-06-22 Thread Marek Polacek
On Sun, Jun 21, 2015 at 05:05:14PM -0600, Martin Sebor wrote:
 Attached is a patch to reject C and C++ constructs that result
 in obtaining a pointer (or a reference in C++) to a builtin
 function.  These constructs are currently silently accepted by
 GCC and, in most cases(*), result in a linker error.  The patch
 brings GCC on par with Clang by rejecting all such constructs.
 
 Bootstrapped and tested on x86_64-unknown-linux-gnu.
 
It seems like this patch regresess pr59630.c testcase; I don't see
the testcase being addressed in this patch.

 2015-06-21  Martin Sebor  mse...@redhat.com
 
   PR c/66516
   * c/c-typeck.c (default_function_array_conversion): Reject
   converting a builtin function to a pointer.
   (parser_build_unary_op): Reject taking the address of a builtin
   function.
   * cp/call.c (convert_like_real): Reject converting a builtin function
   to a pointer.
   (initialize_reference): Reject initializing a reference with a builtin
   function.
   * cp/typeck.c (cp_build_addr_expr_strict): Reject taking the address
   of a builtin function.
   (build_reinterpret_cast_1): Reject casting a builtin function to
   a pointer.
   (convert_for_initialization): Reject initializing a pointer with
   the a builtin function.
 
Please no c/ and cp/ prefixes.

 +#include stdlib.h

As Joseph already pointed out, this is redundant.

 @@ -3384,7 +3392,14 @@ parser_build_unary_op (location_t loc, enum tree_code 
 code, struct c_expr arg)
result.original_code = code;
result.original_type = NULL;
 
 -  if (TREE_OVERFLOW_P (result.value)  !TREE_OVERFLOW_P (arg.value))
 +  if (code == ADDR_EXPR
 +   TREE_CODE (TREE_TYPE (arg.value)) == FUNCTION_TYPE
 +   DECL_IS_BUILTIN (arg.value))
 +{
 +  error_at (loc, taking address of a builtin function);
 +  result.value = error_mark_node;
 +}
 +  else if (TREE_OVERFLOW_P (result.value)  !TREE_OVERFLOW_P (arg.value))
  overflow_warning (loc, result.value);

It seems like you can move the new hunk a bit above so that we don't call
build_unary_op in a case when taking the address of a built-in function.

Unfortunately, it doesn't seem possible to do this error in build_unary_op
or in function_to_pointer_conversion :(.

Marek


Re: match.pd: Three new patterns

2015-06-22 Thread Richard Biener
On Fri, 19 Jun 2015, Marek Polacek wrote:

 On Thu, Jun 18, 2015 at 05:41:18PM +0200, Marek Polacek wrote:
   Again for symmetry, it seems like this comes with
   x + y - (x | y) - x  y
   x + y - (x  y) - x | y
   which seem fine when overflow is undefined or wraps, but not if for 
   instance
   it saturates.
  
  I'll leave this as a follow-up.
 
 ...here.
 
 Bootstrapped/regtested on x86_64-linux, ok for trunk?
 
 2015-06-19  Marek Polacek  pola...@redhat.com
 
   * match.pd (x + y - (x | y) - x  y,

) missing

   (x + y) - (x  y) - x | y): New patterns.
 
   * gcc.dg/fold-minus-4.c: New test.
   * gcc.dg/fold-minus-5.c: New test.
 
 diff --git gcc/match.pd gcc/match.pd
 index badb80a..61ff710 100644
 --- gcc/match.pd
 +++ gcc/match.pd
 @@ -343,6 +343,18 @@ along with GCC; see the file COPYING3.  If not see
   (plus:c (bit_and @0 @1) (bit_ior @0 @1))
   (plus @0 @1))
  
 +/* x + y - (x | y) - x  y */

Please wrap x + y in () here as well.

Ok with that changes.

Thanks,
Richard.

 +(simplify
 + (minus (plus @0 @1) (bit_ior @0 @1))
 + (if (!TYPE_OVERFLOW_SANITIZED (type)  !TYPE_SATURATING (type))
 +  (bit_and @0 @1)))
 +
 +/* (x + y) - (x  y) - x | y */
 +(simplify
 + (minus (plus @0 @1) (bit_and @0 @1))
 + (if (!TYPE_OVERFLOW_SANITIZED (type)  !TYPE_SATURATING (type))
 +  (bit_ior @0 @1)))
 +
  /* (x | y) - (x ^ y) - x  y */
  (simplify
   (minus (bit_ior @0 @1) (bit_xor @0 @1))
 diff --git gcc/testsuite/gcc.dg/fold-minus-4.c 
 gcc/testsuite/gcc.dg/fold-minus-4.c
 index e69de29..2d76b4f 100644
 --- gcc/testsuite/gcc.dg/fold-minus-4.c
 +++ gcc/testsuite/gcc.dg/fold-minus-4.c
 @@ -0,0 +1,37 @@
 +/* { dg-do compile } */
 +/* { dg-options -O -fdump-tree-cddce1 } */
 +
 +int
 +fn1 (int a, int b)
 +{
 +  int tem1 = a + b;
 +  int tem2 = a  b;
 +  return tem1 - tem2;
 +}
 +
 +int
 +fn2 (int a, int b)
 +{
 +  int tem1 = b + a;
 +  int tem2 = a  b;
 +  return tem1 - tem2;
 +}
 +
 +int
 +fn3 (int a, int b)
 +{
 +  int tem1 = a + b;
 +  int tem2 = b  a;
 +  return tem1 - tem2;
 +}
 +
 +int
 +fn4 (int a, int b)
 +{
 +  int tem1 = b + a;
 +  int tem2 = b  a;
 +  return tem1 - tem2;
 +}
 +
 +/* { dg-final { scan-tree-dump-notcddce1 } } */
 +/* { dg-final { scan-tree-dump-not  \\+  cddce1 } } */
 diff --git gcc/testsuite/gcc.dg/fold-minus-5.c 
 gcc/testsuite/gcc.dg/fold-minus-5.c
 index e69de29..a31e1cc 100644
 --- gcc/testsuite/gcc.dg/fold-minus-5.c
 +++ gcc/testsuite/gcc.dg/fold-minus-5.c
 @@ -0,0 +1,37 @@
 +/* { dg-do compile } */
 +/* { dg-options -O -fdump-tree-cddce1 } */
 +
 +int
 +fn1 (int a, int b)
 +{
 +  int tem1 = a + b;
 +  int tem2 = a | b;
 +  return tem1 - tem2;
 +}
 +
 +int
 +fn2 (int a, int b)
 +{
 +  int tem1 = b + a;
 +  int tem2 = a | b;
 +  return tem1 - tem2;
 +}
 +
 +int
 +fn3 (int a, int b)
 +{
 +  int tem1 = a + b;
 +  int tem2 = b | a;
 +  return tem1 - tem2;
 +}
 +
 +int
 +fn4 (int a, int b)
 +{
 +  int tem1 = b + a;
 +  int tem2 = b | a;
 +  return tem1 - tem2;
 +}
 +
 +/* { dg-final { scan-tree-dump-not  \\+  cddce1 } } */
 +/* { dg-final { scan-tree-dump-not  \\|  cddce1 } } */
 
   Marek
 
 

-- 
Richard Biener rguent...@suse.de
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)


[C++ Patch] Use declspecs-locations[ds_virtual]

2015-06-22 Thread Paolo Carlini

Hi,

I think this also qualifies as obvious given the past work / discussion: 
use in one more place declspecs-locations to improve the location of 
the error message.


Thanks,
Paolo.


/cp
2015-06-22  Paolo Carlini  paolo.carl...@oracle.com

* decl.c (grokdeclarator): Use declspecs-locations[ds_virtual].

/testsuite
2015-06-22  Paolo Carlini  paolo.carl...@oracle.com

* g++.dg/inherit/pure1.C: Test location too.
Index: cp/decl.c
===
--- cp/decl.c   (revision 224724)
+++ cp/decl.c   (working copy)
@@ -9529,7 +9529,8 @@ grokdeclarator (const cp_declarator *declarator,
   if (virtualp
(current_class_name == NULL_TREE || decl_context != FIELD))
 {
-  error (%virtual% outside class declaration);
+  error_at (declspecs-locations[ds_virtual],
+   %virtual% outside class declaration);
   virtualp = 0;
 }
 
Index: testsuite/g++.dg/inherit/pure1.C
===
--- testsuite/g++.dg/inherit/pure1.C(revision 224724)
+++ testsuite/g++.dg/inherit/pure1.C(working copy)
@@ -3,8 +3,8 @@
 // { dg-do compile }
 
 void foo0() = 0;   // { dg-error like a variable }
-virtual void foo1() = 0;   // { dg-error outside class|variable }
-
+virtual void foo1() = 0;   // { dg-error 1:'virtual' outside class }
+// { dg-error like a variable  { target *-*-* } 6 }
 struct A
 {
   void foo2() = 0; // { dg-error non-virtual }


Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function

2015-06-22 Thread Joseph Myers
On Sun, 21 Jun 2015, Martin Sebor wrote:

 diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
 index 636e0bb..637a292 100644
 --- a/gcc/c/c-typeck.c
 +++ b/gcc/c/c-typeck.c
 @@ -58,6 +58,8 @@ along with GCC; see the file COPYING3.  If not see
  #include cilk.h
  #include gomp-constants.h
 
 +#include stdlib.h

Included from system.h, don't include it explicitly in source files.

 +  if (DECL_IS_BUILTIN (exp.value))
 + {
 +   error_at (loc, converting builtin function to a pointer);

Say built-in (see codingconventions.html).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Jakub Jelinek
On Mon, Jun 22, 2015 at 02:55:49PM +0100, Julian Brown wrote:
 One problem is that (at least on the GPU hardware we've considered so
 far) we're somewhat constrained in how much control we have over how the
 underlying hardware executes code: it's possible to draw up a scheme
 where OpenACC source-level control-flow semantics are reflected directly
 in the PTX assembly output (e.g. to say all threads in a CTA/warp will
 be coherent after such-and-such a loop), and lowering OpenACC
 directives quite early seems to make that relatively tractable. (Even
 if the resulting code is relatively un-optimisable due to the abnormal
 edges inserted to make sure that the CFG doesn't become ill-formed.)
 
 If arbitrary optimisations are done between OMP-lowering time and
 somewhere around vectorisation (say), it's less clear if that
 correspondence can be maintained. Say if the code executed by half the
 threads in a warp becomes physically separated from the code executed
 by the other half of the threads in a warp due to some loop
 optimisation, we can no longer easily determine where that warp will
 reconverge, and certain other operations (relying on coherent warps --
 e.g. CTA synchronisation) become impossible. A similar issue exists for
 warps within a CTA.
 
 So, essentially -- I don't know how late loop lowering would interact
 with:
 
 (a) Maintaining a CFG that will work with PTX.
 
 (b) Predication for worker-single and/or vector-single modes
 (actually all currently-proposed schemes have problems with proper
 representation of data-dependencies for variables and
 compiler-generated temporaries between predicated regions.)

I don't understand why lowering the way you suggest helps here at all.
In the proposed scheme, you essentially have whole function
in e.g. worker-single or vector-single mode, which you need to be able to
handle properly in any case, because users can write such routines
themselves.  And then you can have a loop in such a function that
has some special attribute, a hint that it is desirable to vectorize it
(for PTX the PTX way) or use vector-single mode for it in a worker-single
function.  So, the special pass then of course needs to handle all the
needed broadcasting and reduction required to change the mode from e.g.
worker-single to vector-single, but the convergence points still would be
either on the boundary of such loops to be vectorized or parallelized, or
wherever else they appear in normal vector-single or worker-single functions
(around the calls to certainly calls?).

Jakub


Re: match.pd: Three new patterns (and some more)

2015-06-22 Thread Richard Biener
On Thu, 18 Jun 2015, Marek Polacek wrote:

 On Tue, Jun 16, 2015 at 03:35:15PM +0200, Richard Biener wrote:
  We already have
  
  /* (x  y) ^ (x | y) - x ^ y */
  (simplify
   (bit_xor:c (bit_and @0 @1) (bit_ior @0 @1))
   (bit_xor @0 @1))
  
  but of course with minus it doesn't commutate so it's hard to
  merge.
 
 Yeah :(.
  
+/* (x  y) + (x | y) - x + y */
   
   Again for symmetry, it seems like this comes with
   x + y - (x | y) - x  y
   x + y - (x  y) - x | y
   which seem fine when overflow is undefined or wraps, but not if for 
   instance
   it saturates.
  
  Can you adjust according to Marcs comment and re-submit?  If you like
  you can do it as followup as well and thus the original patch is ok
  as well.
 
 Sure.  This is a new version with some more patters.  Thanks.
 
 Bootstrapped/regtested on x86_64-linux, ok for trunk?

Ok.

Thanks,
Richard.

 2015-06-18  Marek Polacek  pola...@redhat.com
 
   * match.pd ((x ^ y) ^ (x | y) - x  y,
   (x  y) + (x ^ y) - x | y, (x  y) | (x ^ y) - x | y,
   (x  y) ^ (x ^ y) - x | y, (x  y) + (x | y) - x + y,
   (x | y) - (x ^ y) - x  y, (x | y) - (x  y) - x ^ y): New patterns.
 
   * gcc.dg/fold-ior-1.c: New test.
   * gcc.dg/fold-minus-2.c: New test.
   * gcc.dg/fold-minus-3.c: New test.
   * gcc.dg/fold-plus-1.c: New test.
   * gcc.dg/fold-plus-2.c: New test.
   * gcc.dg/fold-xor-4.c: New test.
   * gcc.dg/fold-xor-5.c: New test.
 
 diff --git gcc/match.pd gcc/match.pd
 index 1ab2b1c..badb80a 100644
 --- gcc/match.pd
 +++ gcc/match.pd
 @@ -325,6 +325,34 @@ along with GCC; see the file COPYING3.  If not see
   (bit_xor:c (bit_and @0 @1) (bit_ior @0 @1))
   (bit_xor @0 @1))
  
 +/* (x ^ y) ^ (x | y) - x  y */
 +(simplify
 + (bit_xor:c (bit_xor @0 @1) (bit_ior @0 @1))
 + (bit_and @0 @1))
 +
 +/* (x  y) + (x ^ y) - x | y */
 +/* (x  y) | (x ^ y) - x | y */
 +/* (x  y) ^ (x ^ y) - x | y */
 +(for op (plus bit_ior bit_xor)
 + (simplify
 +  (op:c (bit_and @0 @1) (bit_xor @0 @1))
 +  (bit_ior @0 @1)))
 +
 +/* (x  y) + (x | y) - x + y */
 +(simplify
 + (plus:c (bit_and @0 @1) (bit_ior @0 @1))
 + (plus @0 @1))
 +
 +/* (x | y) - (x ^ y) - x  y */
 +(simplify
 + (minus (bit_ior @0 @1) (bit_xor @0 @1))
 + (bit_and @0 @1))
 +
 +/* (x | y) - (x  y) - x ^ y */
 +(simplify
 + (minus (bit_ior @0 @1) (bit_and @0 @1))
 + (bit_xor @0 @1))
 +
  (simplify
   (abs (negate @0))
   (abs @0))
 diff --git gcc/testsuite/gcc.dg/fold-ior-1.c gcc/testsuite/gcc.dg/fold-ior-1.c
 index e69de29..0358eb5 100644
 --- gcc/testsuite/gcc.dg/fold-ior-1.c
 +++ gcc/testsuite/gcc.dg/fold-ior-1.c
 @@ -0,0 +1,69 @@
 +/* { dg-do compile } */
 +/* { dg-options -O -fdump-tree-cddce1 } */
 +
 +int
 +fn1 (int a, int b)
 +{
 +  int tem1 = a  b;
 +  int tem2 = a ^ b;
 +  return tem1 | tem2;
 +}
 +
 +int
 +fn2 (int a, int b)
 +{
 +  int tem1 = b  a;
 +  int tem2 = a ^ b;
 +  return tem1 | tem2;
 +}
 +
 +int
 +fn3 (int a, int b)
 +{
 +  int tem1 = a  b;
 +  int tem2 = b ^ a;
 +  return tem1 | tem2;
 +}
 +
 +int
 +fn4 (int a, int b)
 +{
 +  int tem1 = b  a;
 +  int tem2 = b ^ a;
 +  return tem1 | tem2;
 +}
 +
 +int
 +fn5 (int a, int b)
 +{
 +  int tem1 = a ^ b;
 +  int tem2 = a  b;
 +  return tem1 | tem2;
 +}
 +
 +int
 +fn6 (int a, int b)
 +{
 +  int tem1 = b ^ a;
 +  int tem2 = a  b;
 +  return tem1 | tem2;
 +}
 +
 +int
 +fn7 (int a, int b)
 +{
 +  int tem1 = a ^ b;
 +  int tem2 = b  a;
 +  return tem1 | tem2;
 +}
 +
 +int
 +fn8 (int a, int b)
 +{
 +  int tem1 = b ^ a;
 +  int tem2 = b  a;
 +  return tem1 | tem2;
 +}
 +
 +/* { dg-final { scan-tree-dump-notcddce1 } } */
 +/* { dg-final { scan-tree-dump-not  \\^  cddce1 } } */
 diff --git gcc/testsuite/gcc.dg/fold-minus-2.c 
 gcc/testsuite/gcc.dg/fold-minus-2.c
 index e69de29..6501f2f 100644
 --- gcc/testsuite/gcc.dg/fold-minus-2.c
 +++ gcc/testsuite/gcc.dg/fold-minus-2.c
 @@ -0,0 +1,37 @@
 +/* { dg-do compile } */
 +/* { dg-options -O -fdump-tree-cddce1 } */
 +
 +int
 +fn1 (int a, int b)
 +{
 +  int tem1 = a | b;
 +  int tem2 = a ^ b;
 +  return tem1 - tem2;
 +}
 +
 +int
 +fn2 (int a, int b)
 +{
 +  int tem1 = b | a;
 +  int tem2 = a ^ b;
 +  return tem1 - tem2;
 +}
 +
 +int
 +fn3 (int a, int b)
 +{
 +  int tem1 = a | b;
 +  int tem2 = b ^ a;
 +  return tem1 - tem2;
 +}
 +
 +int
 +fn4 (int a, int b)
 +{
 +  int tem1 = b | a;
 +  int tem2 = b ^ a;
 +  return tem1 - tem2;
 +}
 +
 +/* { dg-final { scan-tree-dump-not  \\^  cddce1 } } */
 +/* { dg-final { scan-tree-dump-not  \\|  cddce1 } } */
 diff --git gcc/testsuite/gcc.dg/fold-minus-3.c 
 gcc/testsuite/gcc.dg/fold-minus-3.c
 index e69de29..e7adce6 100644
 --- gcc/testsuite/gcc.dg/fold-minus-3.c
 +++ gcc/testsuite/gcc.dg/fold-minus-3.c
 @@ -0,0 +1,37 @@
 +/* { dg-do compile } */
 +/* { dg-options -O -fdump-tree-cddce1 } */
 +
 +int
 +fn1 (int a, int b)
 +{
 +  int tem1 = a | b;
 +  int tem2 = a  b;
 +  return tem1 - tem2;
 +}
 +
 +int
 +fn2 (int a, int b)
 +{
 +  int tem1 = b | a;
 +  int tem2 = a  b;
 +  return tem1 - tem2;
 +}
 +
 +int
 +fn3 (int a, int b)
 

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Julian Brown
On Fri, 19 Jun 2015 14:25:57 +0200
Jakub Jelinek ja...@redhat.com wrote:

 On Fri, Jun 19, 2015 at 11:53:14AM +0200, Bernd Schmidt wrote:
  On 05/28/2015 05:08 PM, Jakub Jelinek wrote:
  
  I understand it is more work, I'd just like to ask that when
  designing stuff for the OpenACC offloading you (plural) try to
  take the other offloading devices and host fallback into account.
  
  The problem is that many of the transformations we need to do are
  really GPU specific, and with the current structure of
  omplow/ompexp they are being done in the host compiler. The
  offloading scheme we decided on does not give us the means to write
  out multiple versions of an offloaded function where each target
  gets a different one. For that reason I think we should postpone
  these lowering decisions until we're in the accel compiler, where
  they could be controlled by target hooks, and over the last two
  weeks I've been doing some experiments to see how that could be
  achieved.

 I wonder why struct loop flags and other info together with function
 attributes and/or cgraph flags and other info aren't sufficient for
 the OpenACC needs.
 Have you or Thomas looked what we're doing for OpenMP simd / Cilk+
 simd?
 
 Why can't the execution model (normal, vector-single and
 worker-single) be simply attributes on functions or cgraph node flags
 and the kind of #acc loop simply be flags on struct loop, like
 already OpenMP simd / Cilk+ simd is?

One problem is that (at least on the GPU hardware we've considered so
far) we're somewhat constrained in how much control we have over how the
underlying hardware executes code: it's possible to draw up a scheme
where OpenACC source-level control-flow semantics are reflected directly
in the PTX assembly output (e.g. to say all threads in a CTA/warp will
be coherent after such-and-such a loop), and lowering OpenACC
directives quite early seems to make that relatively tractable. (Even
if the resulting code is relatively un-optimisable due to the abnormal
edges inserted to make sure that the CFG doesn't become ill-formed.)

If arbitrary optimisations are done between OMP-lowering time and
somewhere around vectorisation (say), it's less clear if that
correspondence can be maintained. Say if the code executed by half the
threads in a warp becomes physically separated from the code executed
by the other half of the threads in a warp due to some loop
optimisation, we can no longer easily determine where that warp will
reconverge, and certain other operations (relying on coherent warps --
e.g. CTA synchronisation) become impossible. A similar issue exists for
warps within a CTA.

So, essentially -- I don't know how late loop lowering would interact
with:

(a) Maintaining a CFG that will work with PTX.

(b) Predication for worker-single and/or vector-single modes
(actually all currently-proposed schemes have problems with proper
representation of data-dependencies for variables and
compiler-generated temporaries between predicated regions.)

Julian


  1   2   >