Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-11-12 Thread Bin.Cheng
On Fri, Nov 13, 2015 at 2:13 PM, Jeff Law  wrote:
> On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:
>
>>
>> 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
>>
>>
>>  From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00 2001
>> From: Ajit Kumar Agarwal
>> Date: Wed, 7 Oct 2015 20:50:40 +0200
>> Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used
>> inside
>>   loop for LICM and IVOPTS.
>>
>> Changes are done in the Loop Invariant(LICM) at RTL level and also the
>> Induction variable optimization based on SSA representation. The current
>> logic used in LICM for register used inside the loops is changed. The
>> Live Out of the loop latch node and the Live in of the destination of
>> the exit nodes is used to set the Loops Liveness at the exit of the Loop.
>> The register used is the number of live variables at the exit of the
>> Loop calculated above.
>>
>> For Induction variable optimization on tree SSA representation, the
>> register
>> used logic is based on the number of phi nodes at the loop header to
>> represent
>> the liveness at the loop.  Current Logic used only the number of phi nodes
>> at
>> the loop header.  Changes are made to represent the phi operands also live
>> at
>> the loop. Thus number of phi operands also gets incremented in the number
>> of
>> registers used.
>>
>> ChangeLog:
>> 2015-10-09  Ajit Agarwal
>>
>> * loop-invariant.c (compute_loop_liveness): New.
>> (determine_regs_used): New.
>> (find_invariants_to_move): Use of determine_regs_used.
>> * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
>> arguments for register used.
>
> I think Bin rejected the tree-ssa-loop-ivopts change.  However, the
> loop-invariant change is still pending, right?
Ah, reject is a strong word, I am just being dumb and don't understand
why it's a general better estimation yet.
Maybe Richard have some inputs here?

Thanks,
bin
>
>
>>
>> Signed-off-by:Ajit agarwalajit...@xilinx.com
>> ---
>>   gcc/loop-invariant.c   | 72
>> +-
>>   gcc/tree-ssa-loop-ivopts.c |  4 +--
>>   2 files changed, 60 insertions(+), 16 deletions(-)
>>
>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
>> index 52c8ae8..e4291c9 100644
>> --- a/gcc/loop-invariant.c
>> +++ b/gcc/loop-invariant.c
>> @@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
>>   }
>>   }
>>
>> +static int
>> +determine_regs_used()
>> +{
>> +  unsigned int j;
>> +  unsigned int reg_used = 2;
>> +  bitmap_iterator bi;
>> +
>> +  EXECUTE_IF_SET_IN_BITMAP (_DATA (curr_loop)->regs_live, 0, j, bi)
>> +(reg_used) ++;
>> +
>> +  return reg_used;
>> +}
>
> Isn't this just bitmap_count_bits (regs_live) + 2?
>
>
>> @@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
>>   }
>>   }
>>
>> -
>> +static void
>> +calculate_loop_liveness (void)
>
> Needs a function comment.
>
>
>> +{
>> +  basic_block bb;
>> +  struct loop *loop;
>>
>> -/* Move the invariants out of the loops.  */
>> +  FOR_EACH_LOOP (loop, 0)
>> +if (loop->aux == NULL)
>> +  {
>> +loop->aux = xcalloc (1, sizeof (struct loop_data));
>> +bitmap_initialize (_DATA (loop)->regs_live, _obstack);
>> + }
>> +
>> +  FOR_EACH_BB_FN (bb, cfun)
>
> Why loop over blocks here?  Why not just iterate through all the loops in
> the loop structure.  Order isn't particularly important AFAICT for this
> code.
>
>
>
>> +   {
>> + int  i;
>> + edge e;
>> + vec edges;
>> + edges = get_loop_exit_edges (loop);
>> + FOR_EACH_VEC_ELT (edges, i, e)
>> + {
>> +   bitmap_ior_into (_DATA (loop)->regs_live,
>> DF_LR_OUT(e->src));
>> +   bitmap_ior_into (_DATA (loop)->regs_live,
>> DF_LR_IN(e->dest));
>
> Space before the open-paren in the previous two lines
> DF_LR_OUT (e->src) and FD_LR_INT (e->dest))
>
>
>> + }
>> +  }
>> +  }
>> +}
>> +
>> +/* Move the invariants  ut of the loops.  */
>
> Looks like you introduced a typo.
>
> I'd like to see testcases which show the change in # regs used computation
> helping generate better code.
>
> And  I'd also like to see some background information on why you think this
> is a more accurate measure for the number of registers used in the loop.
> regs_used AFAICT is supposed to be an estimate of the registers live around
> the loop.  So ISTM that you get that value by live-out set on the backedge
> of the loop.  I guess you get somethign similar by looking at the exit
> edge's source block's live-out set.  But I don't see any value  in including
> stuff live at the block outside the loop.
>
> It also seems fairly non-intuitive.  Get the block's latch and use its
> live-out set.  That seems more intuitive.
>


Re: [PATCH] gcc.c: new macro POST_LINK_SPECS to be able to add additional steps after linking

2015-11-12 Thread Jeff Law

On 11/10/2015 11:16 AM, Andris Pavenis wrote:

One may need to execute extra steps after linking program. This is required
for example for DJGPP to run stubify.exe on file generated by linker.

The only way how to achieve was to use LINK_COMMAND_SPEC. It would be
much easier
and less error prone to use new macro POST_LINK_SPEC introduced in this
patch.

Andris

ChangeLog entry

2015 Nov 10 Andris Pavenis 

 * gcc.c: new macro POST_LINK_SPEC
 * doc/tm.texi.in: document POST_LINK_SPEC
 * doc/tm.texi: regenerate
Instaled with a better ChangeLog.  Presumably there's going to be a 
followup to simplify a bunch of hte djgpp configuration?


jeff



[PATCH, i386]: Check natural alignment of the operand in misaligned_operand predicate

2015-11-12 Thread Uros Bizjak
Hello!

We have to check natural alignment of the operand in
misaligned_operand predicate. This predicate is used to check SSE
memory operands for alignment, when movaps instead of movups can be
used. This change makes predicate independent of BIGGEST_ALIGNMENT
setting.

2015-11-13  Uros Bizjak  

* config/i386/predicates.md (misaligned_operand): Return true if
operand is aligned to less than its natural alignmnet.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32},
committed to mainline SVN.

Uros.

Index: config/i386/predicates.md
===
--- config/i386/predicates.md   (revision 230213)
+++ config/i386/predicates.md   (working copy)
@@ -1364,10 +1364,11 @@
 (define_predicate "absneg_operator"
   (match_code "abs,neg"))

-;; Return true if OP is misaligned memory operand
+;; Return true if OP is a memory operand, aligned to
+;; less than its natural alignment.
 (define_predicate "misaligned_operand"
   (and (match_code "mem")
-   (match_test "MEM_ALIGN (op) < GET_MODE_ALIGNMENT (mode)")))
+   (match_test "MEM_ALIGN (op) < GET_MODE_BITSIZE (mode)")))

 ;; Return true if OP is a emms operation, known to be a PARALLEL.


Re: [PATCH] gcc.c: new macro POST_LINK_SPECS to be able to add additional steps after linking

2015-11-12 Thread Jeff Law

On 11/10/2015 09:30 PM, Andris Pavenis wrote:

On 11/10/2015 11:20 PM, Jeff Law wrote:

On 11/10/2015 11:16 AM, Andris Pavenis wrote:

One may need to execute extra steps after linking program. This is
required
for example for DJGPP to run stubify.exe on file generated by linker.

The only way how to achieve was to use LINK_COMMAND_SPEC. It would be
much easier
and less error prone to use new macro POST_LINK_SPEC introduced in this
patch.

Andris

ChangeLog entry

2015 Nov 10 Andris Pavenis 

 * gcc.c: new macro POST_LINK_SPEC
 * doc/tm.texi.in: document POST_LINK_SPEC
 * doc/tm.texi: regenerate


Can you also include the changes to djgpp.h which exploit this
capability?

Jeff


OK. I'm only sending changes for djgpp.h. There are required changes to
other files not included, so djgpp.h patch currently
to illustrate use of POST_LINK_SPEC only not committing.
Thanks.  I think the key is that y'all have to essentially duplicate and 
hack of LINK_COMMAND_SPEC and this change allows the djgpp port to avoid 
that problem.


Thanks again,
Jeff




Re: [PATCH 0/2] Levenshtein-based suggestions (v3)

2015-11-12 Thread Marek Polacek
Probably coming too late, sorry.

On Thu, Nov 12, 2015 at 09:08:36PM -0500, David Malcolm wrote:
> index 4335a87..eb4e1fc 100644
> --- a/gcc/c/c-typeck.c
> +++ b/gcc/c/c-typeck.c
> @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "c-family/c-ubsan.h"
>  #include "cilk.h"
>  #include "gomp-constants.h"
> +#include "spellcheck.h"
>  
>  /* Possible cases of implicit bad conversions.  Used to select
> diagnostic messages in convert_for_assignment.  */
> @@ -2242,6 +2243,72 @@ lookup_field (tree type, tree component)
>return tree_cons (NULL_TREE, field, NULL_TREE);
>  }
>  
> +/* Recursively append candidate IDENTIFIER_NODEs to CANDIDATES.  */
> +
> +static void
> +lookup_field_fuzzy_find_candidates (tree type, tree component,
> + vec *candidates)
> +{
> +  tree field;
> +  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))

I'd prefer declaring field in the for loop, so
  for (tree field = TYPE_FIELDS...

> +   && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
> +   || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))

This is RECORD_OR_UNION_TYPE_P (TREE_TYPE (field)).

> + {
> +   lookup_field_fuzzy_find_candidates (TREE_TYPE (field),
> +   component,
> +   candidates);
> + }

Lose the brackets around a single statement.

> +  if (DECL_NAME (field))
> + candidates->safe_push (DECL_NAME (field));
> +}
> +}
> +
> +/* Like "lookup_field", but find the closest matching IDENTIFIER_NODE,
> +   rather than returning a TREE_LIST for an exact match.  */
> +
> +static tree
> +lookup_field_fuzzy (tree type, tree component)
> +{
> +  gcc_assert (TREE_CODE (component) == IDENTIFIER_NODE);
> +
> +  /* First, gather a list of candidates.  */
> +  auto_vec  candidates;
> +
> +  lookup_field_fuzzy_find_candidates (type, component,
> +   );
> +
> +  /* Now determine which is closest.  */
> +  int i;
> +  tree identifier;
> +  tree best_identifier = NULL;

NULL_TREE

> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> +  FOR_EACH_VEC_ELT (candidates, i, identifier)
> +{
> +  gcc_assert (TREE_CODE (identifier) == IDENTIFIER_NODE);
> +  edit_distance_t dist = levenshtein_distance (component, identifier);
> +  if (dist < best_distance)
> + {
> +   best_distance = dist;
> +   best_identifier = identifier;
> + }
> +}
> +
> +  /* If more than half of the letters were misspelled, the suggestion is
> + likely to be meaningless.  */
> +  if (best_identifier)
> +{
> +  unsigned int cutoff = MAX (IDENTIFIER_LENGTH (component),
> +  IDENTIFIER_LENGTH (best_identifier)) / 2;
> +  if (best_distance > cutoff)
> + return NULL;

NULL_TREE

> +/* The Levenshtein distance is an "edit-distance": the minimal
> +   number of one-character insertions, removals or substitutions
> +   that are needed to change one string into another.
> +
> +   This implementation uses the Wagner-Fischer algorithm.  */
> +
> +static edit_distance_t
> +levenshtein_distance (const char *s, int len_s,
> +   const char *t, int len_t)
> +{
> +  const bool debug = false;
> +
> +  if (debug)
> +{
> +  printf ("s: \"%s\" (len_s=%i)\n", s, len_s);
> +  printf ("t: \"%s\" (len_t=%i)\n", t, len_t);
> +}

Did you leave this debug stuff here intentionally?

> +  /* Build the rest of the row by considering neighbours to
> +  the north, west and northwest.  */
> +  for (int j = 0; j < len_s; j++)
> + {
> +   edit_distance_t cost = (s[j] == t[i] ? 0 : 1);
> +   edit_distance_t deletion = v1[j] + 1;
> +   edit_distance_t insertion= v0[j + 1] + 1;

The formatting doesn't look right here.

Marek


Re: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-11-12 Thread Jeff Law

On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:



0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch


 From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00 2001
From: Ajit Kumar Agarwal
Date: Wed, 7 Oct 2015 20:50:40 +0200
Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used inside
  loop for LICM and IVOPTS.

Changes are done in the Loop Invariant(LICM) at RTL level and also the
Induction variable optimization based on SSA representation. The current
logic used in LICM for register used inside the loops is changed. The
Live Out of the loop latch node and the Live in of the destination of
the exit nodes is used to set the Loops Liveness at the exit of the Loop.
The register used is the number of live variables at the exit of the
Loop calculated above.

For Induction variable optimization on tree SSA representation, the register
used logic is based on the number of phi nodes at the loop header to represent
the liveness at the loop.  Current Logic used only the number of phi nodes at
the loop header.  Changes are made to represent the phi operands also live at
the loop. Thus number of phi operands also gets incremented in the number of
registers used.

ChangeLog:
2015-10-09  Ajit Agarwal

* loop-invariant.c (compute_loop_liveness): New.
(determine_regs_used): New.
(find_invariants_to_move): Use of determine_regs_used.
* tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
arguments for register used.
I think Bin rejected the tree-ssa-loop-ivopts change.  However, the 
loop-invariant change is still pending, right?





Signed-off-by:Ajit agarwalajit...@xilinx.com
---
  gcc/loop-invariant.c   | 72 +-
  gcc/tree-ssa-loop-ivopts.c |  4 +--
  2 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 52c8ae8..e4291c9 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
  }
  }

+static int
+determine_regs_used()
+{
+  unsigned int j;
+  unsigned int reg_used = 2;
+  bitmap_iterator bi;
+
+  EXECUTE_IF_SET_IN_BITMAP (_DATA (curr_loop)->regs_live, 0, j, bi)
+(reg_used) ++;
+
+  return reg_used;
+}

Isn't this just bitmap_count_bits (regs_live) + 2?



@@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
  }
  }

-
+static void
+calculate_loop_liveness (void)

Needs a function comment.



+{
+  basic_block bb;
+  struct loop *loop;

-/* Move the invariants out of the loops.  */
+  FOR_EACH_LOOP (loop, 0)
+if (loop->aux == NULL)
+  {
+loop->aux = xcalloc (1, sizeof (struct loop_data));
+bitmap_initialize (_DATA (loop)->regs_live, _obstack);
+ }
+
+  FOR_EACH_BB_FN (bb, cfun)
Why loop over blocks here?  Why not just iterate through all the loops 
in the loop structure.  Order isn't particularly important AFAICT for 
this code.





+   {
+ int  i;
+ edge e;
+ vec edges;
+ edges = get_loop_exit_edges (loop);
+ FOR_EACH_VEC_ELT (edges, i, e)
+ {
+   bitmap_ior_into (_DATA (loop)->regs_live, DF_LR_OUT(e->src));
+   bitmap_ior_into (_DATA (loop)->regs_live, DF_LR_IN(e->dest));

Space before the open-paren in the previous two lines
DF_LR_OUT (e->src) and FD_LR_INT (e->dest))



+ }
+  }
+  }
+}
+
+/* Move the invariants  ut of the loops.  */

Looks like you introduced a typo.

I'd like to see testcases which show the change in # regs used 
computation helping generate better code.


And  I'd also like to see some background information on why you think 
this is a more accurate measure for the number of registers used in the 
loop.  regs_used AFAICT is supposed to be an estimate of the registers 
live around the loop.  So ISTM that you get that value by live-out set 
on the backedge of the loop.  I guess you get somethign similar by 
looking at the exit edge's source block's live-out set.  But I don't see 
any value  in including stuff live at the block outside the loop.


It also seems fairly non-intuitive.  Get the block's latch and use its 
live-out set.  That seems more intuitive.




Re: [PR64164] drop copyrename, integrate into expand

2015-11-12 Thread Jeff Law

On 11/11/2015 11:10 AM, Alexandre Oliva wrote:

On Nov 10, 2015, Jeff Law  wrote:


* function.c (assign_parm_setup_block): Right-shift
upward-padded big-endian args when bypassing the stack slot.

Don't you need to check the value of BLOCK_REG_PADDING at runtime?
The padding is essentially allowed to vary.


Well, yeah, it's the result of BLOCK_REG_PADDING that tells whether
upward-padding occurred and shifting is required.


If you  look at the other places where BLOCK_REG_PADDING is used, it's
checked in a #ifdef, then again inside a if conditional.


That's what I do in the patch too.
?  I don't see the runtime check in your patch.  I see a couple 
gcc_asserts, but no runtime check of BLOCK_REG_PADDING.




That said, the initial conditions in the if/else-if/else chain for the
no-larger-than-a-word case cover all of the non-BLOCK_REG_PADDING cases
correctly, so that, if BLOCK_REG_PADDING is not defined, we can just
skip the !MEM_P block altogether.  That's also the reason why we can go
straight to shifting when we get there.

I tried to document my reasoning in the comments, but maybe it was still
too obscure?
Certainly seems that way.  Is it your assertion that the new code is 
what we want regardless of the *value* of REG_BLOCK_PADDING? 
Essentially meaning the check in the IF is covering both cases?


What am I missing here?

Jeff



[Ada] Spurious visibility error with derivation and incomplete declaration

2015-11-12 Thread Arnaud Charlet
This patch fixes a spurious visibility error on an operator of a derived type,
when the parent type is declared in another unit, and has an incomplete type
declaration. The primitive operations of the derived types are to be found in
the scope of its base type, and not in that of its ancestor.

The following must compile quietly:

   gnatmake -q operator_use

---
with CALC_PACKAGE;
with STORE_PACKAGE;
with ADA.TEXT_IO;
use type CALC_PACKAGE.RECORD_TYPE;
procedure OPERATOR_USE is

  B : CALC_PACKAGE.RECORD_TYPE;

begin
  B := CALC_PACKAGE.GET_VALUE;

  if STORE_PACKAGE.STORE(4).MY_ACCESS.all.MY_VALUE > B
  then
ADA.TEXT_IO.PUT_LINE("TRUE");
  end if;

  if CALC_PACKAGE.">"(STORE_PACKAGE.STORE(4).MY_ACCESS.all.MY_VALUE, B)
  then
ADA.TEXT_IO.PUT_LINE("TRUE again");
  end if;

end OPERATOR_USE;
---
with TYPE_PACKAGE;
package CALC_PACKAGE is

  type RECORD_TYPE is new TYPE_PACKAGE.BASE_TYPE;

  function ">"
(A : in RECORD_TYPE;
 B : in RECORD_TYPE)
return BOOLEAN;

  function GET_VALUE
return RECORD_TYPE;

end CALC_PACKAGE;
---
package body CALC_PACKAGE is
  C_VAL : INTEGER := 0;

  function GET_VALUE
return RECORD_TYPE is
  begin
C_VAL := C_VAL + 1;
return (X => C_VAL,
Y => 0);
  end GET_VALUE;

  function ">"
(A : in RECORD_TYPE;
 B : in RECORD_TYPE)
 return BOOLEAN is
  begin
return A.X > B.Y;
  end ">";
end CALC_PACKAGE;
---
with CALC_PACKAGE;
package STORE_PACKAGE is

  type INDEX_TYPE is range 1 .. 10;

  type RECORD_TYPE is
record
  MY_VALUE : CALC_PACKAGE.RECORD_TYPE;
end record;

  type RECORD_ACCESS_TYPE is access all RECORD_TYPE;

  type STORE_TYPE is
record
  MY_ACCESS : RECORD_ACCESS_TYPE;
end record;

  type ARRAY_TYPE is
array (INDEX_TYPE)
   of STORE_TYPE;

  STORE : ARRAY_TYPE := (others => (my_access => new Record_Type));
end STORE_PACKAGE;
---
package TYPE_PACKAGE is
   type BASE_TYPE;

  type BASE_TYPE is
record
  X : INTEGER := 1;
  Y : INTEGER := 0;
end record;
end TYPE_PACKAGE;

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-11-12  Ed Schonberg  

* sem_util.adb (Collect_Primitive_Operations): If the type is
derived from a type declared elsewhere that has an incomplete
type declaration, the primitives are found in the scope of the
type nat that of its ancestor.

Index: sem_util.adb
===
--- sem_util.adb(revision 230239)
+++ sem_util.adb(working copy)
@@ -4223,6 +4223,14 @@
  then
 Id := Defining_Entity (Incomplete_View (Parent (B_Type)));
 
+--  If T is a derived from a type with an incomplete view declared
+--  elsewhere, that incomplete view is irrelevant, we want the
+--  operations in the scope of T.
+
+if Scope (Id) /= Scope (B_Type) then
+   Id := Next_Entity (B_Type);
+end if;
+
  else
 Id := Next_Entity (B_Type);
  end if;


Re: [v3 PATCH] Implement D0013R2, logical type traits.

2015-11-12 Thread Jonathan Wakely

On 12/11/15 00:46 +0200, Ville Voutilainen wrote:

On 12 November 2015 at 00:18, Jonathan Wakely  wrote:

So I think we want to define them again, independently, in
, even though it might lead to ambiguities


Here. Tested again on Linux-PPC64.

2015-11-11  Ville Voutilainen  

   Implement D0013R2, logical type traits.

   /libstdc++-v3
   * include/experimental/type_traits (conjunction, disjunction,
   negation, conjunction_v, disjunction_v, negation_v): New.
   * include/std/type_traits (conjunction, disjunction, negation):
   Likewise.
   * testsuite/20_util/declval/requirements/1_neg.cc: Adjust.
   * testsuite/20_util/make_signed/requirements/typedefs_neg.cc: Likewise.
   * testsuite/20_util/make_unsigned/requirements/typedefs_neg.cc:
   Likewise.
   * testsuite/experimental/type_traits/value.cc: Likewise.
   * testsuite/20_util/logical_traits/requirements/explicit_instantiation.cc:
New.
   * testsuite/20_util/logical_traits/requirements/typedefs.cc: Likewise.
   * testsuite/20_util/logical_traits/value.cc: Likewise.

   /testsuite
   * g++.dg/cpp0x/Wattributes1.C: Adjust.


OK for trunk, thanks.




[PATCH] nvptx: implement automatic storage in custom stacks

2015-11-12 Thread Alexander Monakov
Hello,

I'm proposing the following patch as a step towards resolving the issue with
inaccessibility of stack storage (.local memory) in PTX to other threads than
the one using that stack.  The idea is to have preallocated stacks, and have
__nvptx_stacks[] array in shared memory hold current stack pointers.  Each
thread is maintaining __nvptx_stacks[tid.y] as its stack pointer, thus for
OpenMP the intent is to preallocate on a per-warp basis (not per-thread).
For OpenMP SIMD regions we'll have to ensure that conflicting accesses are not
introduced.

I've exposed a new command-line option -msoft-stack to ease testing, but for
OpenMP we'll have to automatically flip it based on function attributes.
Right now it's not easy because OpenMP and OpenACC both use "omp declare
target".  Jakub, I seem to recall a discussion about OpenACC changing to use a
separate attribute, but I cannot find it now.  Any advice here?

This approach also allows to implement alloca.  However, to drop
alloca-avoiding changes in libgomp we'd have to selectively enable
-msoft-stack there, only for functions that OpenACC wouldn't use.

I've run it through make -k check-c regtesting.  These are new fails, all
mysterious:

+FAIL: gcc.c-torture/execute/20090113-2.c   -O[123s]  execution test
Execution failure with invalid memory access.

+FAIL: gcc.c-torture/execute/20090113-3.c   -O[123s]  execution test
Times out (looping infinitely).

The above two I had difficulties investigating due to cuda-gdb 7.0 not showing
dissassembly for the misbehaving function.

+FAIL: gcc.c-torture/execute/loop-15.c   -O2  execution test
Rather surprising and unclear failure due to branch stack overflow.

There are also tests that now pass:
+PASS: gcc.c-torture/execute/20020529-1.c   -O0  execution test
Used to fail with invalid memory access.

+PASS: gcc.dg/sibcall-9.c execution test
(not meaningful on NVPTX)

+PASS: gcc.dg/torture/pr54261-1.c   -O[0123s]  execution test
Atomic modification to stack variables now works.

gcc/
* config/nvptx/nvptx.c (need_softstack_decl): Declare.
(nvptx_declare_function_name): Handle TARGET_SOFT_STACK.
(nvptx_output_return): Restore stack pointer if needed.
(nvptx_file_end): Emit declaration of __nvptx_stacks.
* config/nvptx/nvptx.opt (msoft-stack): New option.
* doc/invoke.texi (-msoft-stack): Document.

libgcc/
* config/nvptx/crt0.s (__nvptx_stacks): Define.
(%__softstack): Define 128 KiB stack for -msoft-stack.
(__main): Setup __nvptx_stacks.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 0204ad3..df915b9 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -114,6 +114,9 @@ static unsigned worker_red_align;
 #define worker_red_name "__worker_red"
 static GTY(()) rtx worker_red_sym;
 
+/* True if any function references __nvptx_stacks.  */
+static bool need_softstack_decl;
+
 /* Allocate a new, cleared machine_function structure.  */
 
 static struct machine_function *
@@ -689,15 +692,46 @@ nvptx_declare_function_name (FILE *file, const char 
*name, const_tree decl)
 
   /* Declare a local variable for the frame.  */
   sz = get_frame_size ();
-  if (sz > 0 || cfun->machine->has_call_with_sc)
+  if (sz == 0 && cfun->machine->has_call_with_sc)
+sz = 1;
+  if (sz > 0)
 {
   int alignment = crtl->stack_alignment_needed / BITS_PER_UNIT;
 
-  fprintf (file, "\t.reg.u%d %%frame;\n"
-  "\t.local.align %d .b8 %%farray[" HOST_WIDE_INT_PRINT_DEC"];\n",
-  BITS_PER_WORD, alignment, sz == 0 ? 1 : sz);
-  fprintf (file, "\tcvta.local.u%d %%frame, %%farray;\n",
-  BITS_PER_WORD);
+  fprintf (file, "\t.reg.u%d %%frame;\n", BITS_PER_WORD);
+  if (TARGET_SOFT_STACK)
+   {
+ /* Maintain 64-bit stack alignment.  */
+ int keep_align = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
+ sz = (sz + keep_align - 1) & ~(keep_align - 1);
+ int bits = BITS_PER_WORD;
+ fprintf (file, "\t.reg.u32 %%fstmp0;\n");
+ fprintf (file, "\t.reg.u%d %%fstmp1;\n", bits);
+ fprintf (file, "\t.reg.u%d %%fstmp2;\n", bits);
+ fprintf (file, "\tmov.u32 %%fstmp0, %%tid.y;\n");
+ fprintf (file, "\tmul%s.u32 %%fstmp1, %%fstmp0, %d;\n",
+  bits == 64 ? ".wide" : "", bits);
+ fprintf (file, "\tmov.u%d %%fstmp2, __nvptx_stacks;\n", bits);
+ /* fstmp2 = &__nvptx_stacks[tid.y];  */
+ fprintf (file, "\tadd.u%d %%fstmp2, %%fstmp2, %%fstmp1;\n", bits);
+ fprintf (file, "\tld.shared.u%d %%fstmp1, [%%fstmp2];\n", bits);
+ fprintf (file, "\tsub.u%d %%frame, %%fstmp1, "
+  HOST_WIDE_INT_PRINT_DEC ";\n", bits, sz);
+ if (alignment > keep_align)
+   fprintf (file, "\tand.b%d %%frame, %%frame, %d;\n",
+bits, -alignment);
+ if (!crtl->is_leaf)
+   fprintf (file, "\tst.shared.u%d [%%fstmp2], %%frame;\n", bits);
+ 

Re: [PATCH] Simple optimization for MASK_STORE.

2015-11-12 Thread Richard Biener
On Wed, Nov 11, 2015 at 2:13 PM, Yuri Rumyantsev  wrote:
> Richard,
>
> What we should do to cope with this problem (structure size increasing)?
> Should we return to vector comparison version?

Ok, given this constraint I think the cleanest approach is to allow
integer(!) vector equality(!) compares with scalar result.  This should then
expand via cmp_optab and not via vec_cmp_optab.

On gimple you can then have

 if (mask_vec_1 != {0, 0,  })
...

Note that a fallback expansion (for optabs.c to try) would be
the suggested view-conversion (aka, subreg) variant using
a same-sized integer mode.

Target maintainers can then choose what is a better fit for
their target (and instruction set as register set constraints may apply).

The patch you posted seems to do this but not restrict the compares
to integer ones (please do that).

   if (TREE_CODE (op0_type) == VECTOR_TYPE
  || TREE_CODE (op1_type) == VECTOR_TYPE)
 {
-  error ("vector comparison returning a boolean");
-  debug_generic_expr (op0_type);
-  debug_generic_expr (op1_type);
-  return true;
+ /* Allow vector comparison returning boolean if operand types
+are equal and CODE is EQ/NE.  */
+ if ((code != EQ_EXPR && code != NE_EXPR)
+ || TREE_CODE (op0_type) != TREE_CODE (op1_type)
+ || TYPE_VECTOR_SUBPARTS (op0_type)
+!= TYPE_VECTOR_SUBPARTS (op1_type)
+ || GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op0_type)))
+!= GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op1_type

These are all checked with the useless_type_conversion_p checks done earlier.

As said I'd like to see a

|| ! VECTOR_INTEGER_TYPE_P (op0_type)

check added so we and targets do not need to worry about using EQ/NE vs. CMP
and worry about signed zeros and friends.

+   {
+ error ("type mismatch for vector comparison returning a boolean");
+ debug_generic_expr (op0_type);
+ debug_generic_expr (op1_type);
+ return true;
+   }



--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -422,6 +422,15 @@ forward_propagate_into_comparison_1 (gimple *stmt,
  enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
  bool invariant_only_p = !single_use0_p;

+ /* Can't combine vector comparison with scalar boolean type of
+the result and VEC_COND_EXPR having vector type of comparison.  */
+ if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
+ && INTEGRAL_TYPE_P (type)
+ && (TREE_CODE (type) == BOOLEAN_TYPE
+ || TYPE_PRECISION (type) == 1)
+ && def_code == VEC_COND_EXPR)
+   return NULL_TREE;

this hints at larger fallout you paper over here.  So this effectively
means we're trying combining (vec1 != vec2) != 0 for example
and that fails miserably?  If so then the solution is to fix whatever
does not expect this (valid) GENERIC tree.

+  if (ENABLE_ZERO_TEST_FOR_MASK_STORE == 0)
+return;

not sure if I like a param more than a target hook ... :/

+  /* Create vector comparison with boolean result.  */
+  vectype = TREE_TYPE (mask);
+  zero = build_zero_cst (TREE_TYPE (vectype));
+  zero = build_vector_from_val (vectype, zero);

build_zero_cst (vectype);

+  stmt = gimple_build_cond (EQ_EXPR, mask, zero, NULL_TREE, NULL_TREE);

you can omit the NULL_TREE operands.

+  gcc_assert (vdef && TREE_CODE (vdef) == SSA_NAME);

please omit the assert.

+  gimple_set_vdef (last, new_vdef);

do this before you create the PHI.

+ /* Put definition statement of stored value in STORE_BB
+if possible.  */
+ arg3 = gimple_call_arg (last, 3);
+ if (TREE_CODE (arg3) == SSA_NAME && has_single_use (arg3))
+   {
...

is this really necessary?  It looks incomplete to me anyway.  I'd rather have
a late sink pass if this shows necessary.  Btw,...

+it is legal.  */
+ if (gimple_bb (def_stmt) == bb
+ && is_valid_sink (def_stmt, last_store))

with the implementation of is_valid_sink this is effectively

   && (!gimple_vuse (def_stmt)
  || gimple_vuse (def_stmt) == gimple_vdef (last_store))


I still think this "pass" is quite a hack, esp. as it appears as generic
code in a GIMPLE pass.  And esp. as this hack seems to be needed
for Haswell only, not Boradwell or Skylake.

Thanks,
Richard.

> Thanks.
> Yuri.
>
> 2015-11-11 12:18 GMT+03:00 Richard Biener :
>> On Tue, Nov 10, 2015 at 3:56 PM, Ilya Enkovich  
>> wrote:
>>> 2015-11-10 17:46 GMT+03:00 Richard Biener :
 On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich  
 wrote:
> 2015-11-10 15:33 GMT+03:00 Richard Biener :
>> On Fri, Nov 6, 2015 at 2:28 PM, Yuri 

Re: [v3 PATCH] LWG 2510, make the default constructors of library tag types explicit.

2015-11-12 Thread Ville Voutilainen
On 12 November 2015 at 16:23, Gerald Pfeifer  wrote:
> On Wed, 11 Nov 2015, Jonathan Wakely wrote:
>>
>> Fixed by this patch.
>
>
> Thanks, Jonathan!  Unfortunately bootstrap is still broken
> (on i386-unknown-freebsd11.0 at least):
>
> In file included from
> /scratch/tmp/gerald/gcc-HEAD/libstdc++-v3/src/c++11/thread.cc:27:0:
> /scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
> thread: In function ‘void std::this_thread::sleep_for(const
> std::chrono::duration<_Rep1, _Period1>&)’:
> /scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
> thread:300:44: error: ‘errno’ was not declared in this scope
>  while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)
>^
> /scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
> thread:300:53: error: ‘EINTR’ was not declared in this scope
>  while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)


Note that that's a separate problem that has nothing to do with the
tag-type-explicit-default-ctor
patch.


Re: [PATCH, 11/16] Update testcases after adding kernels pass group

2015-11-12 Thread Richard Biener
On Thu, Nov 12, 2015 at 3:31 PM, Tom de Vries  wrote:
> On 11/11/15 12:03, Richard Biener wrote:
>>
>> On Mon, 9 Nov 2015, Tom de Vries wrote:
>>
>>> On 09/11/15 16:35, Tom de Vries wrote:

 Hi,

 this patch series for stage1 trunk adds support to:
 - parallelize oacc kernels regions using parloops, and
 - map the loops onto the oacc gang dimension.

 The patch series contains these patches:

1Insert new exit block only when needed in
   transform_to_exit_first_loop_alt
2Make create_parallel_loop return void
3Ignore reduction clause on kernels directive
4Implement -foffload-alias
5Add in_oacc_kernels_region in struct loop
6Add pass_oacc_kernels
7Add pass_dominator_oacc_kernels
8Add pass_ch_oacc_kernels
9Add pass_parallelize_loops_oacc_kernels
   10Add pass_oacc_kernels pass group in passes.def
   11Update testcases after adding kernels pass group
   12Handle acc loop directive
   13Add c-c++-common/goacc/kernels-*.c
   14Add gfortran.dg/goacc/kernels-*.f95
   15Add libgomp.oacc-c-c++-common/kernels-*.c
   16Add libgomp.oacc-fortran/kernels-*.f95

 The first 9 patches are more or less independent, but patches 10-16 are
 intended to be committed at the same time.

 Bootstrapped and reg-tested on x86_64.

 Build and reg-tested with nvidia accelerator, in combination with a
 patch that enables accelerator testing (which is submitted at
 https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

 I'll post the individual patches in reply to this message.
>>>
>>>
>>> This patch updates existing testcases with new pass numbers, given the
>>> passes
>>> that were added in the pass list in patch 10.
>>
>>
>> I think it would be nice to be able to specify the number in the .def
>> file instead so we can avoid this kind of churn everytime we do this.
>
>
> How about something along the lines of:
> ...
>   /* pass_build_ealias is a dummy pass that ensures that we
>  execute TODO_rebuild_alias at this point.  */
>   NEXT_PASS (pass_build_ealias);
>   /* Pass group that runs when there are oacc kernels in the
>   function.  */
>   NEXT_PASS (pass_oacc_kernels);
>   PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
>   PUSH_ID ("oacc_kernels")
> ...
>   POP_ID ()
>   POP_INSERT_PASSES ()
>   NEXT_PASS (pass_fre);
> ...
>
> where the PUSH_ID/POP_ID pair has the functionality that all the contained
> passes:
> - have the id prefixed to the dump file, so the dump file of pass_ch
>   which normally is "ch" becomes "oacc_kernels_ch", and
> - the pass name in pass_instances.def becomes pass_oacc_kernels_ch, such
>   that it doesn't count as numbered instance of pass_ch
> ?

Hmm.  I'd like to have sth that allows me to add "slp" to both
pass_slp_vectorize
instances, having them share the suffix (as no two functions are in both dumps).

We similarly have "duplicates" across the -Og vs. the -O[0-3] pipeline.

Basically make all dump file name suffixes manually specified which means moving
them from the class definition to the actual instance.

Well, just an idea.  In a distant future I like our pass pipeline to become more
dynamic, getting away from a static passes.def towards, say, a pass "script"
(to be able to say "if inlining did nothing skip this group" or similar).

Richard.


> Thanks,
> - Tom


Re: [OpenACC] declare directive

2015-11-12 Thread James Norris

Jakub

On 11/12/2015 03:09 AM, Jakub Jelinek wrote:

On Wed, Nov 11, 2015 at 07:07:58PM -0600, James Norris wrote:

+ oacc_declare_returns->remove (t);
+
+ if (oacc_declare_returns->elements () == 0)
+   {
+ delete oacc_declare_returns;
+ oacc_declare_returns = NULL;
+   }


Something for incremental patch:
1) might be nice to have some assertion that at the end of gimplify_body
or so oacc_declare_returns is NULL
2) what happens if you refer to automatic variables of other functions
(C or Fortran nested functions, maybe C++ lambdas); shall those be
unmapped at the end of the (nested) function's body?



Ok. Thanks! Will put on my TODO list.


@@ -5858,6 +5910,10 @@ omp_default_clause (struct gimplify_omp_ctx *ctx, tree 
decl,
flags |= GOVD_FIRSTPRIVATE;
break;
  case OMP_CLAUSE_DEFAULT_UNSPECIFIED:
+  if (is_global_var (decl)
+ && ctx->region_type & (ORT_ACC_PARALLEL | ORT_ACC_KERNELS)


Please put this condition as cheapest first.  I'd also surround
it into (), just to make it clear that the bitwise & is intentional.
Perhaps () != 0.


+ && device_resident_p (decl))
+   flags |= GOVD_MAP_TO_ONLY | GOVD_MAP;



+ case GOMP_MAP_FROM:
+   kinds[i] = GOMP_MAP_FORCE_FROM;
+   GOACC_enter_exit_data (device, 1, [i], [i],
+  [i], 0, 0);


Wrong indentation.



Fixed.


Ok with those two changes and please think about the incremental stuff.


Again, thanks for taking the time for the review.

Jim



Re: [RFC] Remove first_pass_instance from pass_vrp

2015-11-12 Thread Richard Biener
On Thu, Nov 12, 2015 at 2:49 PM, Tom de Vries  wrote:
> On 12/11/15 13:26, Richard Biener wrote:
>>
>> On Thu, Nov 12, 2015 at 12:37 PM, Tom de Vries 
>> wrote:
>>>
>>> Hi,
>>>
>>> [ See also related discussion at
>>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00452.html ]
>>>
>>> this patch removes the usage of first_pass_instance from pass_vrp.
>>>
>>> the patch:
>>> - limits itself to pass_vrp, but my intention is to remove all
>>>usage of first_pass_instance
>>> - lacks an update to gdbhooks.py
>>>
>>> Modifying the pass behaviour depending on the instance number, as
>>> first_pass_instance does, break compositionality of the pass list. In
>>> other
>>> words, adding a pass instance in a pass list may change the behaviour of
>>> another instance of that pass in the pass list. Which obviously makes it
>>> harder to understand and change the pass list. [ I've filed this issue as
>>> PR68247 - Remove pass_first_instance ]
>>>
>>> The solution is to make the difference in behaviour explicit in the pass
>>> list, and no longer change behaviour depending on instance number.
>>>
>>> One obvious possible fix is to create a duplicate pass with a different
>>> name, say 'pass_vrp_warn_array_bounds':
>>> ...
>>>NEXT_PASS (pass_vrp_warn_array_bounds);
>>>...
>>>NEXT_PASS (pass_vrp);
>>> ...
>>>
>>> But, AFAIU that requires us to choose a different dump-file name for each
>>> pass. And choosing vrp1 and vrp2 as new dump-file names still means that
>>> -fdump-tree-vrp no longer works (which was mentioned as drawback here:
>>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00453.html ).
>>>
>>> This patch instead makes pass creation parameterizable. So in the pass
>>> list,
>>> we use:
>>> ...
>>>NEXT_PASS_WITH_ARG (pass_vrp, true /* warn_array_bounds_p */);
>>>...
>>>NEXT_PASS_WITH_ARG (pass_vrp, false /* warn_array_bounds_p */);
>>> ...
>>>
>>> This approach gives us clarity in the pass list, similar to using a
>>> duplicate pass 'pass_vrp_warn_array_bounds'.
>>>
>>> But it also means -fdump-tree-vrp still works as before.
>>>
>>> Good idea? Other comments?
>>
>>
>> It's good to get rid of the first_pass_instance hack.
>>
>> I can't comment on the AWK, leaving that to others.  Syntax-wise I'd hoped
>> we can just use NEXT_PASS with the extra argument being optional...
>
>
> I suppose I could use NEXT_PASS in the pass list, and expand into
> NEXT_PASS_WITH_ARG in pass-instances.def.
>
> An alternative would be to change the NEXT_PASS macro definitions into
> vararg variants. But the last time I submitted something with a vararg macro
> ( https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00794.html ), I got a
> question about it ( https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00912.html
> ), so I tend to avoid using vararg macros.
>
>> I don't see the need for giving clone_with_args a new name, just use an
>> overload
>> of clone ()?
>
>
> That's what I tried initially, but I ran into:
> ...
> src/gcc/tree-pass.h:85:21: warning: ‘virtual opt_pass* opt_pass::clone()’
> was hidden [-Woverloaded-virtual]
>virtual opt_pass *clone ();
>  ^
> src/gcc/tree-vrp.c:10393:14: warning:   by ‘virtual opt_pass*
> {anonymous}::pass_vrp::clone(bool)’ [-Woverloaded-virtual]
>opt_pass * clone (bool warn_array_bounds_p) { return new pass_vrp
> (m_ctxt, warn_array_bounds_p); }
> ...
>
> Googling the error message gives this discussion: (
> http://stackoverflow.com/questions/16505092/confused-about-virtual-overloaded-functions
> ), and indeed adding
>   "using gimple_opt_pass::clone;"
> in class pass_vrp makes the warning disappear.
>
> I'll submit an updated version.

Hmm, but actually the above means the pass does not expose the
non-argument clone
which is good!

Or did you forget to add the virtual-with-arg variant to opt_pass?
That is, why's it
a virtual function in the first place?  (clone_with_arg)

> Thanks,
> - Tom
>
>
>> [ideally C++ would allow us to say that only one overload may be
>> implemented]


Re: [v3 PATCH] LWG 2510, make the default constructors of library tag types explicit.

2015-11-12 Thread Jonathan Wakely

On 12/11/15 15:23 +0100, Gerald Pfeifer wrote:

On Wed, 11 Nov 2015, Jonathan Wakely wrote:

Fixed by this patch.


Thanks, Jonathan!  Unfortunately bootstrap is still broken
(on i386-unknown-freebsd11.0 at least):


Different issue.

In file included from 
/scratch/tmp/gerald/gcc-HEAD/libstdc++-v3/src/c++11/thread.cc:27:0:

/scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
thread: In function ‘void std::this_thread::sleep_for(const 
std::chrono::duration<_Rep1, _Period1>&)’:
/scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
thread:300:44: error: ‘errno’ was not declared in this scope
while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)
  ^
/scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
thread:300:53: error: ‘EINTR’ was not declared in this scope
while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)


Does adding #include  to libstdc++-v3/include/std/thread
solve it?



Re: [hsa 2/12] Modifications to libgomp proper

2015-11-12 Thread Thomas Schwinge
Hi!

On Thu, 12 Nov 2015 11:11:33 +0100, Jakub Jelinek  wrote:
> On Thu, Nov 05, 2015 at 10:54:42PM +0100, Martin Jambor wrote:
> > --- a/libgomp/libgomp.h
> > +++ b/libgomp/libgomp.h
> > @@ -876,7 +876,8 @@ struct gomp_device_descr
> >void *(*dev2host_func) (int, void *, const void *, size_t);
> >void *(*host2dev_func) (int, void *, const void *, size_t);
> >void *(*dev2dev_func) (int, void *, const void *, size_t);
> > -  void (*run_func) (int, void *, void *);
> > +  void (*run_func) (int, void *, void *, const void *);
> 
> Adding arguments to existing plugin methods is a plugin ABI incompatible
> change.  We now have:
>   DLSYM (version);
>   if (device->version_func () != GOMP_VERSION)
> {
>   err = "plugin version mismatch";
>   goto fail;
> }
> so there is a way to deal with it, but you need to adjust all plugins.

I'm confused -- didn't we agree that we don't need to maintain backwards
compatibility in the libgomp <-> plugins interface?  (Nathan?)  As far as
I remember, the argument was that libgomp and all its plugins will always
be built from the same source tree, so will be compatible with each
other, "by definition"?

(We do need, and have, versioning between GCC proper and libgomp
interfaces.)


> > --- a/libgomp/target.c
> > +++ b/libgomp/target.c
> > @@ -1248,7 +1248,12 @@ gomp_get_target_fn_addr (struct gomp_device_descr 
> > *devicep,
> >splay_tree_key tgt_fn = splay_tree_lookup (>mem_map, );
> >gomp_mutex_unlock (>lock);
> >if (tgt_fn == NULL)
> > -   gomp_fatal ("Target function wasn't mapped");
> > +   {
> > + if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
> > +   return NULL;
> > + else
> > +   gomp_fatal ("Target function wasn't mapped");
> > +   }
> >  
> >return (void *) tgt_fn->tgt_offset;
> >  }
> > @@ -1276,6 +1281,7 @@ GOMP_target (int device, void (*fn) (void *), const 
> > void *unused,
> >  return gomp_target_fallback (fn, hostaddrs);
> >  
> >void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> > +  assert (fn_addr);
> 
> I must say I really don't like putting asserts into libgomp, in production
> it is after all not built with -D_NDEBUG.

I like them, because they help during development, and for getting
higher-quality bug reports from users, and they serve as source code
documentation.  Of course, I understand your -- I suppose -- performance
worries.  Does such an NULL checking assert -- hopefully marked as
"unlikely" -- cause any noticeable overhead, though?


> But this shows a worse problem,
> if you have GCC 5 compiled OpenMP code, of course there won't be HSA
> offloaded copy, but if you try to run it on a box with HSA offloading
> enabled, you can run into this assertion failure.

That's one of the issues that I'm working on resolving with my
"Forwarding -foffload=[...] from the driver (compile-time) to libgomp
(run-time)" patch,
.
In such a case (no GOMP_offload_register_ver call for HSA), HSA
offloading would not be considered (not "enabled") in libgomp.  (It'll be
two more weeks before I can make progress with that patch; will be
attending SuperComputing 2015 next week -- anyone else will be there,
too?)

> Supposedly the old APIs (GOMP_target, GOMP_target_update, GOMP_target_data)
> should treat GOMP_OFFLOAD_CAP_SHARED_MEM capable devices as unconditional
> device fallback?


> > @@ -1297,7 +1304,7 @@ GOMP_target (int device, void (*fn) (void *), const 
> > void *unused,
> >  void
> >  GOMP_target_41 (int device, void (*fn) (void *), size_t mapnum,
> > void **hostaddrs, size_t *sizes, unsigned short *kinds,
> > -   unsigned int flags, void **depend)
> > +   unsigned int flags, void **depend, const void *kernel_launch)
> 
> GOMP_target_ext has different arguments, you get the num_teams and
> thread_limit clauses values in there already (if known at compile time or
> before entering target region; 0 stands for implementation defined choice,
> -1 for unknown before GOMP_target_ext).
> Plus I must say I really don't like the addition of HSA specific argument
> to the API, it is unclean and really doesn't scale, when somebody adds
> support for another offloading target, would we add again another argument?
> Can't use the same one, because one could have configured both HSA and that
> other kind offloading at the same time and which one is picked would be only
> a runtime decision, based on env vars of omp_set_default_device etc.
> num_teams/thread_limit, as runtime arguments, you already get on the trunk.
> For compile time decided values, those should go into some data section
> and be somehow attached to what fn is translated into in the AVL tree (which
> you really don't need to use for variables on GOMP_OFFLOAD_CAP_SHARED_MEM
> obviously, but can still use for the kernels, and populate during
> registration of the offloading 

[Ada] Library-level error on aspects

2015-11-12 Thread Arnaud Charlet
This patch fixes a bug where GNAT fails to detect an error on an aspect that
must be applied to a library-level entity.

The following test must give an error:
tls.adb:2:26: entity for aspect "Thread_Local_Storage" must be library level
entity

procedure Tls is
   V : Natural := 0 with Thread_Local_Storage;
begin
   null;
end Tls;

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-11-12  Bob Duff  

* sem_prag.adb (Check_Arg_Is_Library_Level_Local_Name): A
pragma that comes from an aspect does not "come from source",
so we need to test whether it comes from an aspect.

Index: sem_prag.adb
===
--- sem_prag.adb(revision 230242)
+++ sem_prag.adb(working copy)
@@ -4328,8 +4328,12 @@
   begin
  Check_Arg_Is_Local_Name (Arg);
 
+ --  If it came from an aspect, we want to give the error just as if it
+ --  came from source.
+
  if not Is_Library_Level_Entity (Entity (Get_Pragma_Arg (Arg)))
-   and then Comes_From_Source (N)
+   and then (Comes_From_Source (N)
+   or else Present (Corresponding_Aspect (Parent (Arg
  then
 Error_Pragma_Arg
   ("argument for pragma% must be library level entity", Arg);


Re: open acc default data attribute

2015-11-12 Thread Nathan Sidwell

On 11/12/15 03:53, Jakub Jelinek wrote:


+  error ("%qE not specified in enclosing OpenACC %s construct",
+DECL_NAME (lang_hooks.decls.omp_report_decl (decl)), rkind);
+  error_at (ctx->location, "enclosing OpenACC %s construct", rkind);

I'd use %qs instead of %s.


thanks,

nathan



Re: [hsa 2/12] Modifications to libgomp proper

2015-11-12 Thread Nathan Sidwell

On 11/12/15 08:21, Thomas Schwinge wrote:

Hi!




so there is a way to deal with it, but you need to adjust all plugins.


I'm confused -- didn't we agree that we don't need to maintain backwards
compatibility in the libgomp <-> plugins interface?  (Nathan?)


Indeed, no need to deal with version skew between libgomp and its plugins.

On 07/24/15 12:30, Jakub Jelinek wrote:

And I'd say that we don't really need to maintain support for mixing libgomp
from one GCC version and libgomp plugins from another version, worst case
there should be some GOMP_OFFLOAD_get_version function that libgomp could
use to verify it is talking to the right version of the plugin and
completely ignore it if it gives wrong version.




(We do need, and have, versioning between GCC proper and libgomp
interfaces.)


Yes. (For avoidance of doubt)

nathan


Re: [PATCH] Enable libstdc++ numeric conversions on Cygwin

2015-11-12 Thread Jonathan Wakely

On 12/11/15 13:39 +, Jonathan Wakely wrote:

One downside of this change is that we introduce some (hopefully safe)
ODR violations, where inline functions and templates that depend on
_GLIBCXX_USE_C99_FOO might now be defined differently in C++98 and
C++11 code. Previously they had the same definition, even though in
C++11 mode the value of the _GLIBCXX_USE_C99_FOO macro might have been
sub-optimal (i.e. the C99 features were usable, but the macro said
they weren't). Those ODR violatiosn could be avoided if needed, by
always using the _GLIBCXX98_USE_C99_FOO macro in code that can be
included from either C++98 or C++11. We could still use the
_GLIBCXX11_USE_C99_FOO macro in pure C++11 code (such as the numeric
conversion functions) and get most of the benefit of this change.


This patch (relative to the previous one) would avoid the ODR
problems, by only using the C++98 macro in code that gets used in
C++98 and later, and using the _GLIBCXX11_XXX ones in code that is
never compiled as C++98 (specifically, the numeric conversion
functions).

Maybe this is a safer, more conservative change.

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index deefa04..11dee8e 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -960,7 +960,7 @@ AC_DEFUN([GLIBCXX_ENABLE_C99], [
 ])
 AC_MSG_RESULT($glibcxx_cv_c99_math_cxx98)
 if test x"$glibcxx_cv_c99_math_cxx98" = x"yes"; then
-  AC_DEFINE(_GLIBCXX98_USE_C99_MATH, 1,
+  AC_DEFINE(_GLIBCXX_USE_C99_MATH, 1,
 [Define if C99 functions or macros in  should be imported
 in  in namespace std for C++98.])
 fi
@@ -1029,7 +1029,7 @@ AC_DEFUN([GLIBCXX_ENABLE_C99], [
 fi
 AC_MSG_RESULT($glibcxx_cv_c99_complex_cxx98)
 if test x"$glibcxx_cv_c99_complex_cxx98" = x"yes"; then
-  AC_DEFINE(_GLIBCXX98_USE_C99_COMPLEX, 1,
+  AC_DEFINE(_GLIBCXX_USE_C99_COMPLEX, 1,
 [Define if C99 functions in  should be used in
  for C++98. Using compiler builtins for these functions
 requires corresponding C99 library functions to be present.])
@@ -1054,7 +1054,7 @@ AC_DEFUN([GLIBCXX_ENABLE_C99], [
 ])
 AC_MSG_RESULT($glibcxx_cv_c99_stdio_cxx98)
 if test x"$glibcxx_cv_c99_stdio_cxx98" = x"yes"; then
-  AC_DEFINE(_GLIBCXX98_USE_C99_STDIO, 1,
+  AC_DEFINE(_GLIBCXX_USE_C99_STDIO, 1,
 [Define if C99 functions or macros in  should be imported
 in  in namespace std for C++98.])
 fi
@@ -1100,7 +1100,7 @@ AC_DEFUN([GLIBCXX_ENABLE_C99], [
 
   AC_MSG_RESULT($glibcxx_cv_c99_wchar_cxx98)
   if test x"$glibcxx_cv_c99_wchar_cxx98" = x"yes"; then
-AC_DEFINE(_GLIBCXX98_USE_C99_WCHAR, 1,
+AC_DEFINE(_GLIBCXX_USE_C99_WCHAR, 1,
   [Define if C99 functions or macros in  should be imported
   in  in namespace std for C++98.])
   fi
diff --git a/libstdc++-v3/config/os/bsd/dragonfly/os_defines.h b/libstdc++-v3/config/os/bsd/dragonfly/os_defines.h
index 055c5b6..b21726f 100644
--- a/libstdc++-v3/config/os/bsd/dragonfly/os_defines.h
+++ b/libstdc++-v3/config/os/bsd/dragonfly/os_defines.h
@@ -37,5 +37,8 @@
 #define _GLIBCXX_USE_C99_DYNAMIC (!(__ISO_C_VISIBLE >= 1999))
 #define _GLIBCXX_USE_C99_LONG_LONG_CHECK 1
 #define _GLIBCXX_USE_C99_LONG_LONG_DYNAMIC (_GLIBCXX_USE_C99_DYNAMIC || !defined __LONG_LONG_SUPPORTED)
+#define _GLIBCXX11_USE_C99_STDIO 1
+#define _GLIBCXX11_USE_C99_STDLIB 1
+#define _GLIBCXX11_USE_C99_WCHAR 1
 
 #endif
diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index b3853cd..2fa345a 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -5396,7 +5396,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
-#if _GLIBCXX_USE_C99_STDLIB
+#if _GLIBCXX11_USE_C99_STDLIB
   // 21.4 Numeric Conversions [string.conversions].
   inline int
   stoi(const string& __str, size_t* __idx = 0, int __base = 10)
@@ -5435,9 +5435,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   inline long double
   stold(const string& __str, size_t* __idx = 0)
   { return __gnu_cxx::__stoa(::strtold, "stold", __str.c_str(), __idx); }
-#endif // _GLIBCXX_USE_C99_STDLIB
+#endif // _GLIBCXX11_USE_C99_STDLIB
 
-#if _GLIBCXX_USE_C99_STDIO
+#if _GLIBCXX11_USE_C99_STDIO
   // NB: (v)snprintf vs sprintf.
 
   // DR 1261.
@@ -5501,9 +5501,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 return __gnu_cxx::__to_xstring(::vsnprintf, __n,
 	   "%Lf", __val);
   }
-#endif // _GLIBCXX_USE_C99_STDIO
+#endif // _GLIBCXX11_USE_C99_STDIO
 
-#if defined(_GLIBCXX_USE_WCHAR_T) && defined(_GLIBCXX_USE_C99_WCHAR)
+#if defined(_GLIBCXX_USE_WCHAR_T) && defined(_GLIBCXX11_USE_C99_WCHAR)
   inline int 
   stoi(const wstring& __str, size_t* __idx = 0, int __base = 10)
   { return __gnu_cxx::__stoa(::wcstol, "stoi", __str.c_str(),
@@ -5605,7 +5605,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 	L"%Lf", __val);
   }
 

Re: [PATCH 1/4] [ARM] PR63870 Add qualifiers for NEON builtins

2015-11-12 Thread Charles Baylis
On 9 November 2015 at 09:03, Ramana Radhakrishnan
 wrote:
>
> Missing comment and please prefix this with NEON_ or SIMD_ .
>
>>
>> +#define ENDIAN_LANE_N(mode, n)  \
>> +  (BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 - n : n)
>> +
>
> Otherwise OK -

With those changes, the attached patch was applied as r230142
From 4a05b67a1757e88e1989ce7515cd10de0a6def1c Mon Sep 17 00:00:00 2001
From: Charles Baylis 
Date: Wed, 17 Jun 2015 17:08:30 +0100
Subject: [PATCH 1/4] [ARM] PR63870 Add qualifiers for NEON builtins

gcc/ChangeLog:

  Charles Baylis  

	PR target/63870
	* config/arm/arm-builtins.c (enum arm_type_qualifiers): New enumerator
	qualifier_struct_load_store_lane_index.
	(builtin_arg): New enumerator NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX.
	(arm_expand_neon_args): New parameter. Remove ellipsis. Handle NEON
	argument qualifiers.
	(arm_expand_neon_builtin): Handle new NEON argument qualifier.
	* config/arm/arm.h (NEON_ENDIAN_LANE_N): New macro.

Change-Id: Iaa14d8736879fa53776319977eda2089f0a26647
---
 gcc/config/arm/arm-builtins.c | 48 +++
 gcc/config/arm/arm.c  |  1 +
 gcc/config/arm/arm.h  |  6 ++
 3 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index bad3dc3..d0bd777 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -67,7 +67,9 @@ enum arm_type_qualifiers
   /* Polynomial types.  */
   qualifier_poly = 0x100,
   /* Lane indices - must be within range of previous argument = a vector.  */
-  qualifier_lane_index = 0x200
+  qualifier_lane_index = 0x200,
+  /* Lane indices for single lane structure loads and stores.  */
+  qualifier_struct_load_store_lane_index = 0x400
 };
 
 /*  The qualifier_internal allows generation of a unary builtin from
@@ -1963,6 +1965,7 @@ typedef enum {
   NEON_ARG_COPY_TO_REG,
   NEON_ARG_CONSTANT,
   NEON_ARG_LANE_INDEX,
+  NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX,
   NEON_ARG_MEMORY,
   NEON_ARG_STOP
 } builtin_arg;
@@ -2020,9 +2023,9 @@ neon_dereference_pointer (tree exp, tree type, machine_mode mem_mode,
 /* Expand a Neon builtin.  */
 static rtx
 arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
-		  int icode, int have_retval, tree exp, ...)
+		  int icode, int have_retval, tree exp,
+		  builtin_arg *args)
 {
-  va_list ap;
   rtx pat;
   tree arg[SIMD_MAX_BUILTIN_ARGS];
   rtx op[SIMD_MAX_BUILTIN_ARGS];
@@ -2037,13 +2040,11 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
 	  || !(*insn_data[icode].operand[0].predicate) (target, tmode)))
 target = gen_reg_rtx (tmode);
 
-  va_start (ap, exp);
-
   formals = TYPE_ARG_TYPES (TREE_TYPE (arm_builtin_decls[fcode]));
 
   for (;;)
 {
-  builtin_arg thisarg = (builtin_arg) va_arg (ap, int);
+  builtin_arg thisarg = args[argc];
 
   if (thisarg == NEON_ARG_STOP)
 	break;
@@ -2079,6 +2080,18 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
 		op[argc] = copy_to_mode_reg (mode[argc], op[argc]);
 	  break;
 
+	case NEON_ARG_STRUCT_LOAD_STORE_LANE_INDEX:
+	  gcc_assert (argc > 1);
+	  if (CONST_INT_P (op[argc]))
+		{
+		  neon_lane_bounds (op[argc], 0,
+GET_MODE_NUNITS (map_mode), exp);
+		  /* Keep to GCC-vector-extension lane indices in the RTL.  */
+		  op[argc] =
+		GEN_INT (NEON_ENDIAN_LANE_N (map_mode, INTVAL (op[argc])));
+		}
+	  goto constant_arg;
+
 	case NEON_ARG_LANE_INDEX:
 	  /* Previous argument must be a vector, which this indexes.  */
 	  gcc_assert (argc > 0);
@@ -2089,19 +2102,22 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
 		}
 	  /* Fall through - if the lane index isn't a constant then
 		 the next case will error.  */
+
 	case NEON_ARG_CONSTANT:
+constant_arg:
 	  if (!(*insn_data[icode].operand[opno].predicate)
 		  (op[argc], mode[argc]))
-		error_at (EXPR_LOCATION (exp), "incompatible type for argument %d, "
-		   "expected %", argc + 1);
+		{
+		  error ("%Kargument %d must be a constant immediate",
+			 exp, argc + 1);
+		  return const0_rtx;
+		}
 	  break;
+
 case NEON_ARG_MEMORY:
 	  /* Check if expand failed.  */
 	  if (op[argc] == const0_rtx)
-	  {
-		va_end (ap);
 		return 0;
-	  }
 	  gcc_assert (MEM_P (op[argc]));
 	  PUT_MODE (op[argc], mode[argc]);
 	  /* ??? arm_neon.h uses the same built-in functions for signed
@@ -2122,8 +2138,6 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
 	}
 }
 
-  va_end (ap);
-
   if (have_retval)
 switch (argc)
   {
@@ -2235,6 +2249,8 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
 
   if (d->qualifiers[qualifiers_k] & qualifier_lane_index)
 	args[k] = NEON_ARG_LANE_INDEX;
+  else if (d->qualifiers[qualifiers_k] & 

Re: [PATCH 4b/4] [ARM] PR63870 Remove error for invalid lane numbers

2015-11-12 Thread Charles Baylis
On 9 November 2015 at 13:35, Ramana Radhakrishnan
 wrote:
>
>
> On 08/11/15 00:26, charles.bay...@linaro.org wrote:
>> From: Charles Baylis 
>>
>>   Charles Baylis  
>>
>>   * config/arm/neon.md (neon_vld1_lane): Remove error for invalid
>>   lane number.
>>   (neon_vst1_lane): Likewise.
>>   (neon_vld2_lane): Likewise.
>>   (neon_vst2_lane): Likewise.
>>   (neon_vld3_lane): Likewise.
>>   (neon_vst3_lane): Likewise.
>>   (neon_vld4_lane): Likewise.
>>   (neon_vst4_lane): Likewise.
>>
>
> The only way we can get here is through the intrinsics - we do a check for 
> lane numbers earlier.
>
> If things go horribly wrong - the assembler will complain, so it's ok to 
> elide this internal_error here, thus OK.

Applied as r230144


Re: [RFC] Remove first_pass_instance from pass_vrp

2015-11-12 Thread Richard Biener
On Thu, Nov 12, 2015 at 3:04 PM, Richard Biener
 wrote:
> On Thu, Nov 12, 2015 at 2:49 PM, Tom de Vries  wrote:
>> On 12/11/15 13:26, Richard Biener wrote:
>>>
>>> On Thu, Nov 12, 2015 at 12:37 PM, Tom de Vries 
>>> wrote:

 Hi,

 [ See also related discussion at
 https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00452.html ]

 this patch removes the usage of first_pass_instance from pass_vrp.

 the patch:
 - limits itself to pass_vrp, but my intention is to remove all
usage of first_pass_instance
 - lacks an update to gdbhooks.py

 Modifying the pass behaviour depending on the instance number, as
 first_pass_instance does, break compositionality of the pass list. In
 other
 words, adding a pass instance in a pass list may change the behaviour of
 another instance of that pass in the pass list. Which obviously makes it
 harder to understand and change the pass list. [ I've filed this issue as
 PR68247 - Remove pass_first_instance ]

 The solution is to make the difference in behaviour explicit in the pass
 list, and no longer change behaviour depending on instance number.

 One obvious possible fix is to create a duplicate pass with a different
 name, say 'pass_vrp_warn_array_bounds':
 ...
NEXT_PASS (pass_vrp_warn_array_bounds);
...
NEXT_PASS (pass_vrp);
 ...

 But, AFAIU that requires us to choose a different dump-file name for each
 pass. And choosing vrp1 and vrp2 as new dump-file names still means that
 -fdump-tree-vrp no longer works (which was mentioned as drawback here:
 https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00453.html ).

 This patch instead makes pass creation parameterizable. So in the pass
 list,
 we use:
 ...
NEXT_PASS_WITH_ARG (pass_vrp, true /* warn_array_bounds_p */);
...
NEXT_PASS_WITH_ARG (pass_vrp, false /* warn_array_bounds_p */);
 ...

 This approach gives us clarity in the pass list, similar to using a
 duplicate pass 'pass_vrp_warn_array_bounds'.

 But it also means -fdump-tree-vrp still works as before.

 Good idea? Other comments?
>>>
>>>
>>> It's good to get rid of the first_pass_instance hack.
>>>
>>> I can't comment on the AWK, leaving that to others.  Syntax-wise I'd hoped
>>> we can just use NEXT_PASS with the extra argument being optional...
>>
>>
>> I suppose I could use NEXT_PASS in the pass list, and expand into
>> NEXT_PASS_WITH_ARG in pass-instances.def.
>>
>> An alternative would be to change the NEXT_PASS macro definitions into
>> vararg variants. But the last time I submitted something with a vararg macro
>> ( https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00794.html ), I got a
>> question about it ( https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00912.html
>> ), so I tend to avoid using vararg macros.
>>
>>> I don't see the need for giving clone_with_args a new name, just use an
>>> overload
>>> of clone ()?
>>
>>
>> That's what I tried initially, but I ran into:
>> ...
>> src/gcc/tree-pass.h:85:21: warning: ‘virtual opt_pass* opt_pass::clone()’
>> was hidden [-Woverloaded-virtual]
>>virtual opt_pass *clone ();
>>  ^
>> src/gcc/tree-vrp.c:10393:14: warning:   by ‘virtual opt_pass*
>> {anonymous}::pass_vrp::clone(bool)’ [-Woverloaded-virtual]
>>opt_pass * clone (bool warn_array_bounds_p) { return new pass_vrp
>> (m_ctxt, warn_array_bounds_p); }
>> ...
>>
>> Googling the error message gives this discussion: (
>> http://stackoverflow.com/questions/16505092/confused-about-virtual-overloaded-functions
>> ), and indeed adding
>>   "using gimple_opt_pass::clone;"
>> in class pass_vrp makes the warning disappear.
>>
>> I'll submit an updated version.
>
> Hmm, but actually the above means the pass does not expose the
> non-argument clone
> which is good!
>
> Or did you forget to add the virtual-with-arg variant to opt_pass?
> That is, why's it
> a virtual function in the first place?  (clone_with_arg)

That said,

--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -83,6 +83,7 @@ public:

  The default implementation prints an error message and aborts.  */
   virtual opt_pass *clone ();
+  virtual opt_pass *clone_with_arg (bool);


means the arg type is fixed at 'bool' (yeah, mimicing
first_pass_instance).  That
looks a bit limiting to me, but anyway.

Richard.

>> Thanks,
>> - Tom
>>
>>
>>> [ideally C++ would allow us to say that only one overload may be
>>> implemented]


Re: [PATCH] nvptx: implement automatic storage in custom stacks

2015-11-12 Thread Bernd Schmidt

I'm proposing the following patch as a step towards resolving the issue with
inaccessibility of stack storage (.local memory) in PTX to other threads than
the one using that stack.  The idea is to have preallocated stacks, and have
__nvptx_stacks[] array in shared memory hold current stack pointers.  Each
thread is maintaining __nvptx_stacks[tid.y] as its stack pointer, thus for
OpenMP the intent is to preallocate on a per-warp basis (not per-thread).
For OpenMP SIMD regions we'll have to ensure that conflicting accesses are not
introduced.


This is of course really ugly; I'd propose we keep it on an nvptx-OpenMP 
specific branch for now until we know that this is really going somewhere.



I've run it through make -k check-c regtesting.  These are new fails, all
mysterious:


These would have to be investigated first.


+ sz = (sz + keep_align - 1) & ~(keep_align - 1);


Use the ROUND_UP macro.


+ fprintf (file, "\tmul%s.u32 %%fstmp1, %%fstmp0, %d;\n",
+  bits == 64 ? ".wide" : "", bits);


Use a shift.


+
+  if (need_softstack_decl)
+{
+  fprintf (asm_out_file, ".extern .shared .u64 __nvptx_stacks[];\n;");
+}


Lose excess braces.


+.global .u64 %__softstack[16384];


Maybe declarea as .u8 so you don't have two different constants for the 
stack size?



+.reg .u64 %stackptr;
+mov.u64%stackptr, %__softstack;
+cvta.global.u64%stackptr, %stackptr;
+add.u64%stackptr, %stackptr, 131072;
+st.shared.u64  [__nvptx_stacks], %stackptr;
+


I'm guessing you have other missing pieces for setting this up for 
multiple threads.



Bernd



Re: [PATCH 1/4][AArch64] Add scheduling and cost models for Exynos M1

2015-11-12 Thread James Greenhalgh
On Thu, Nov 05, 2015 at 11:31:33AM -0600, Evandro Menezes wrote:
> James,
> 
> Since other members of the "tune_params" structure were signed
> integers, even though negative numbers would make no sense for most
> either, I followed the same pattern.
> 
> Regardless, here's a patch with unsigned integers as you requested:
> 
>[AArch64] Add extra tuning parameters for target processors
> 
>2015-11-05  Evandro Menezes  
> 
>gcc/
> 
>* config/aarch64/aarch64-protos.h (tune_params): Add new members
>"max_case_values" and "cache_line_size".
>* config/aarch64/aarch64.c (aarch64_case_values_threshold): New
>function.
>(aarch64_override_options_internal): Tune heuristics based on new
>members in "tune_params".
>(TARGET_CASE_VALUES_THRESHOLD): Define macro.
> 
> Please, commit if it's alright.

Hi Evandro,

This is OK with a few nits.

> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 81792bc..ecf4685 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -195,6 +195,9 @@ struct tune_params
>int vec_reassoc_width;
>int min_div_recip_mul_sf;
>int min_div_recip_mul_df;
> +  unsigned int max_case_values; /* Case values threshold; or 0 for the 
> default.  */
> +
> +  unsigned int cache_line_size; /* Cache line size; or 0 for the default.  */
>  
>  /* An enum specifying how to take into account CPU autoprefetch capabilities
> during instruction scheduling:

I'd put the comments above the field, and make them slightly more
descriptive:

+  /* Value for aarch64_case_values_threshold; or 0 for the default.  */
+  unsigned int max_case_values;
+  /* Value for PARAM_L1_CACHE_LINE_SIZE; or 0 to use the default.  */
+  unsigned int cache_line_size;

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5c8604f..e7f1c07 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13385,6 +13418,7 @@ aarch64_promoted_type (const_tree t)
>  return float_type_node;
>return NULL_TREE;
>  }
> +
>  #undef TARGET_ADDRESS_COST
>  #define TARGET_ADDRESS_COST aarch64_address_cost
>  

Drop this hunk.

I've applied the patch with those changes as revision 230261 on your behalf.

Thanks,
James



Re: [PATCH] Enable libstdc++ numeric conversions on Cygwin

2015-11-12 Thread Jonathan Wakely

On 12/11/15 11:40 +, Jonathan Wakely wrote:

On 18/09/15 12:01 -0400, Jennifer Yao wrote:

Forgot to include the patch.

On Fri, Sep 18, 2015 at 11:17 AM, Jennifer Yao
 wrote:

A number of functions in libstdc++ are guarded by the _GLIBCXX_USE_C99
preprocessor macro, which is only defined on systems that pass all of
the checks for a large set of C99 functions. Consequently, on systems
which lack any of the required C99 facilities (e.g. Cygwin, which
lacks some C99 complex math functions), the numeric conversion
functions (std::stoi(), std::stol(), std::to_string(), etc.) are not
defined—a rather silly outcome, as none of the numeric conversion
functions are implemented using C99 math functions.

This patch enables numeric conversion functions on the aforementioned
systems by splitting the checks for C99 support and defining several
new macros (_GLIBCXX_USE_C99_STDIO, _GLIBCXX_USE_C99_STDLIB, and
_GLIBCXX_USE_C99_WCHAR), which replace the use of _GLIBCXX_USE_C99 in
#if conditionals where appropriate.


(Coming back to this now that Jennifer's copyright assignment is
complete...)

Splitting the _GLIBCXX_USE_C99 macro into more fine-grained macros for
separate features is definitely the right direction.

However your patch also changes the configure tests to use -std=c++0x
(which should be -std=c++11, but that's a minor point). On an OS that
only makes the C99 library available conditionally that will mean that
configure determines that C99 library features are supported, but we
will get errors if we try to use those features in C++03 parts of the
library.

I think a more complete solution is to have two sets of configure
tests and two sets of macros, so that we define _GLIBCXX_USE_C99_STDIO
when C99 stdio is available unconditionally, and define
_GLIBCXX11_USE_C99_STDIO when it's available with -std=c++11.

Then in the library code we can check _GLIBCXX_USE_C99_STDIO if we
want to use C99 features in C++03 code, and check
_GLIBCXX11_USE_C99_STDIO if we want to use the features in C++11 code.

That should still solve the problem for the numeric conversion
functions, because they are defined in C++11 and so would check
_GLIBCXX11_USE_C99_STDIO, which will be defined for newlib.

Other pieces of the library, such as locales, will use
_GLIBCXX_USE_C99_STDIO and that might still be false for newlib (and
for other strict C libraries like the Solaris and FreeBSD libc).

I will make the changes to acinclude.m4 to duplicate the tests, so we
test once with -std=c++98 and once with -std=c++11, and then change
the library to check either _GLIBCXX_xxx or _GLIBCXX11_xxx as
appropriate.


Here's a patch implementing my suggestion.

The major changes since Jennifer's original patch are in acinclude.m4,
to do the autoconf tests once with -std=c++98 and again with
-std=c++11, and in include/bits/c++config to define the
_GLIBCXX_USE_C99_XXX macros according to either _GLIBCXX98_USE_CXX_XXX
or _GLIBCXX11_USE_CXX_XXX, depending on the standard mode in effect
when the file is included.

Because those new definitions in bits/c++config are unconditional I
had to adjust a few #ifdef tests to use #if instead.

I also removed the changes to GLIBCXX_CHECK_C99_TR1, so that there are
no changes to the macros used for the TR1 library. As a follow-up
change I will add a test for  to GLIBCXX_ENABLE_C99 and
change several C++ headers to stop using the TR1 macros.

This passes all tests on powerpc64le-linux, I'll also try to test on
DragonFly and FreeBSD.

Does this look good to everyone?

One downside of this change is that we introduce some (hopefully safe)
ODR violations, where inline functions and templates that depend on
_GLIBCXX_USE_C99_FOO might now be defined differently in C++98 and
C++11 code. Previously they had the same definition, even though in
C++11 mode the value of the _GLIBCXX_USE_C99_FOO macro might have been
sub-optimal (i.e. the C99 features were usable, but the macro said
they weren't). Those ODR violatiosn could be avoided if needed, by
always using the _GLIBCXX98_USE_C99_FOO macro in code that can be
included from either C++98 or C++11. We could still use the
_GLIBCXX11_USE_C99_FOO macro in pure C++11 code (such as the numeric
conversion functions) and get most of the benefit of this change.

commit 1459e1d0a0033a8f2605d33f52e2bf789fb6ab33
Author: Jonathan Wakely 
Date:   Thu Nov 12 13:19:38 2015 +

More fine-grained autoconf checks for C99 library

2015-09-18  Jennifer Yao  
	Jonathan Wakely  

	PR libstdc++/58393
	PR libstdc++/61580
	* acinclude.m4 (GLIBCXX_ENABLE_C99): Perform tests twice, with
	-std=c++11 as well as -std=c++98, and define separate macros for each.
	Cache the results of checking for complex math and wide character
	functions. Reformat for readability.
	* config.h.in: Regenerate.
	* include/bits/c++config: Define _GLIBCXX_USE_C99_XXX macros to

Re: [RFC] Remove first_pass_instance from pass_vrp

2015-11-12 Thread Tom de Vries

On 12/11/15 13:26, Richard Biener wrote:

On Thu, Nov 12, 2015 at 12:37 PM, Tom de Vries  wrote:

Hi,

[ See also related discussion at
https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00452.html ]

this patch removes the usage of first_pass_instance from pass_vrp.

the patch:
- limits itself to pass_vrp, but my intention is to remove all
   usage of first_pass_instance
- lacks an update to gdbhooks.py

Modifying the pass behaviour depending on the instance number, as
first_pass_instance does, break compositionality of the pass list. In other
words, adding a pass instance in a pass list may change the behaviour of
another instance of that pass in the pass list. Which obviously makes it
harder to understand and change the pass list. [ I've filed this issue as
PR68247 - Remove pass_first_instance ]

The solution is to make the difference in behaviour explicit in the pass
list, and no longer change behaviour depending on instance number.

One obvious possible fix is to create a duplicate pass with a different
name, say 'pass_vrp_warn_array_bounds':
...
   NEXT_PASS (pass_vrp_warn_array_bounds);
   ...
   NEXT_PASS (pass_vrp);
...

But, AFAIU that requires us to choose a different dump-file name for each
pass. And choosing vrp1 and vrp2 as new dump-file names still means that
-fdump-tree-vrp no longer works (which was mentioned as drawback here:
https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00453.html ).

This patch instead makes pass creation parameterizable. So in the pass list,
we use:
...
   NEXT_PASS_WITH_ARG (pass_vrp, true /* warn_array_bounds_p */);
   ...
   NEXT_PASS_WITH_ARG (pass_vrp, false /* warn_array_bounds_p */);
...

This approach gives us clarity in the pass list, similar to using a
duplicate pass 'pass_vrp_warn_array_bounds'.

But it also means -fdump-tree-vrp still works as before.

Good idea? Other comments?


It's good to get rid of the first_pass_instance hack.

I can't comment on the AWK, leaving that to others.  Syntax-wise I'd hoped
we can just use NEXT_PASS with the extra argument being optional...


I suppose I could use NEXT_PASS in the pass list, and expand into 
NEXT_PASS_WITH_ARG in pass-instances.def.


An alternative would be to change the NEXT_PASS macro definitions into 
vararg variants. But the last time I submitted something with a vararg 
macro ( https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00794.html ), I 
got a question about it ( 
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00912.html ), so I tend to 
avoid using vararg macros.



I don't see the need for giving clone_with_args a new name, just use an overload
of clone ()?


That's what I tried initially, but I ran into:
...
src/gcc/tree-pass.h:85:21: warning: ‘virtual opt_pass* 
opt_pass::clone()’ was hidden [-Woverloaded-virtual]

   virtual opt_pass *clone ();
 ^
src/gcc/tree-vrp.c:10393:14: warning:   by ‘virtual opt_pass* 
{anonymous}::pass_vrp::clone(bool)’ [-Woverloaded-virtual]
   opt_pass * clone (bool warn_array_bounds_p) { return new pass_vrp 
(m_ctxt, warn_array_bounds_p); }

...

Googling the error message gives this discussion: ( 
http://stackoverflow.com/questions/16505092/confused-about-virtual-overloaded-functions 
), and indeed adding

  "using gimple_opt_pass::clone;"
in class pass_vrp makes the warning disappear.

I'll submit an updated version.

Thanks,
- Tom

> [ideally C++ would allow us to say that only one overload may be
> implemented]


[gomp4] remove c++ reference restriction

2015-11-12 Thread Nathan Sidwell
I've applied this to gomp4 branch.  It removes the machinery concerning c++ 
references.  The openacc std makes no  mention of such a type, so originally we 
were not permitting the type. But,

(a) OpenMP supports them, which suggests openacc wishes to
(b) Fortran already has reference types that need supporting
(c) it's more work to not support them, by modifying the mappable_type hook.

nathan
2015-11-12  Nathan Sidwell  

	* langhooks-def.h (omp_mappable_type): Remove oacc arg.
	* langhooks.h (lhd_omp_mappable_type): Likewise.
	* langhooks.c (lhd_omp_mappable_type): Likswise.
	* gimplify.c (omp_notice_variable): Adjust omp_mappable_type call.

	c/
	* c-typeck.c (c_finish_omp_clauses): Adjust omp_mappable_type calls.
	* c-decl.c (c_decl_attributes): Likewise.

	testsuite/
	* g++.dg/goacc/reference.C: Adjust.

	cp/
	* semantics.c (finish_ommp_clauses): Adjust omp_mappable_type calls.
	* decl2.c (cp_omp_mappable_type): Remove oacc arg and processing.
	* cp-tree.h (cp_omp_mappable_type): Remove oacc arg.

oIndex: gimplify.c
===
--- gimplify.c	(revision 230177)
+++ gimplify.c	(working copy)
@@ -6106,9 +6106,7 @@ omp_notice_variable (struct gimplify_omp
 		&& lang_hooks.decls.omp_privatize_by_reference (decl))
 	  type = TREE_TYPE (type);
 	if (nflags == flags
-		&& !lang_hooks.types.omp_mappable_type (type,
-			(ctx->region_type
-			 & ORT_ACC) != 0))
+		&& !lang_hooks.types.omp_mappable_type (type))
 	  {
 		error ("%qD referenced in target region does not have "
 		   "a mappable type", decl);
Index: langhooks.c
===
--- langhooks.c	(revision 230177)
+++ langhooks.c	(working copy)
@@ -525,7 +525,7 @@ lhd_omp_firstprivatize_type_sizes (struc
 /* Return true if TYPE is an OpenMP mappable type.  */
 
 bool
-lhd_omp_mappable_type (tree type, bool oacc ATTRIBUTE_UNUSED)
+lhd_omp_mappable_type (tree type)
 {
   /* Mappable type has to be complete.  */
   if (type == error_mark_node || !COMPLETE_TYPE_P (type))
Index: langhooks.h
===
--- langhooks.h	(revision 230177)
+++ langhooks.h	(working copy)
@@ -112,7 +112,7 @@ struct lang_hooks_for_types
   void (*omp_firstprivatize_type_sizes) (struct gimplify_omp_ctx *, tree);
 
   /* Return true if TYPE is a mappable type.  */
-  bool (*omp_mappable_type) (tree type, bool oacc);
+  bool (*omp_mappable_type) (tree type);
 
   /* Return TRUE if TYPE1 and TYPE2 are identical for type hashing purposes.
  Called only after doing all language independent checks.
Index: testsuite/g++.dg/goacc/reference.C
===
--- testsuite/g++.dg/goacc/reference.C	(revision 230177)
+++ testsuite/g++.dg/goacc/reference.C	(working copy)
@@ -4,7 +4,7 @@
 int
 test1 (int )
 {
-#pragma acc kernels copy (ref) // { dg-error "reference types are not supported in OpenACC" }
+#pragma acc kernels copy (ref)
   {
 ref = 10;
   }
@@ -16,12 +16,12 @@ test2 (int )
   int b;
 #pragma acc kernels copyout (b)
   {
-b = ref + 10; // { dg-error "referenced in target region does not have a mappable type" }
+b = ref + 10;
   }
 
 #pragma acc parallel copyout (b)
   {
-b = ref + 10; // { dg-error "referenced in target region does not have a mappable type" }
+b = ref + 10;
   }
 
   ref = b;
@@ -33,7 +33,7 @@ main()
   int a = 0;
   int _a = a;
 
-  #pragma acc parallel copy (a, ref_a) // { dg-error "reference types are not supported in OpenACC" }
+  #pragma acc parallel copy (a, ref_a)
   {
 ref_a = 5;
   }
Index: c/c-typeck.c
===
--- c/c-typeck.c	(revision 230177)
+++ c/c-typeck.c	(working copy)
@@ -12863,8 +12863,7 @@ c_finish_omp_clauses (tree clauses, bool
 	  else
 		{
 		  t = OMP_CLAUSE_DECL (c);
-		  if (!lang_hooks.types.omp_mappable_type (TREE_TYPE (t),
-			   is_oacc))
+		  if (!lang_hooks.types.omp_mappable_type (TREE_TYPE (t)))
 		{
 		  error_at (OMP_CLAUSE_LOCATION (c),
 "array section does not have mappable type "
@@ -12916,8 +12915,7 @@ c_finish_omp_clauses (tree clauses, bool
 			t, omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
 		  remove = true;
 		}
-	  else if (!lang_hooks.types.omp_mappable_type (TREE_TYPE (t),
-			is_oacc))
+	  else if (!lang_hooks.types.omp_mappable_type (TREE_TYPE (t)))
 		{
 		  error_at (OMP_CLAUSE_LOCATION (c),
 			"%qE does not have a mappable type in %qs clause",
@@ -12967,8 +12965,7 @@ c_finish_omp_clauses (tree clauses, bool
 			 || (OMP_CLAUSE_MAP_KIND (c)
 			 == GOMP_MAP_FORCE_DEVICEPTR)))
 		   && t == OMP_CLAUSE_DECL (c)
-		   && !lang_hooks.types.omp_mappable_type (TREE_TYPE (t),
-			   is_oacc))
+		   && !lang_hooks.types.omp_mappable_type (TREE_TYPE (t)))
 	{
 	  error_at (OMP_CLAUSE_LOCATION (c),
 			"%qD does not 

Re: [Ada] More efficient code generated for object overlays

2015-11-12 Thread Duncan Sands

Hi Arnaud,

On 12/11/15 12:06, Arnaud Charlet wrote:

This change refines the use of the "volatile hammer" to implement the advice
given in RM 13.3(19) by disabling it for object overlays altogether. relying
instead on the ref-all aliasing property of reference types to achieve the
desired effect.

This will generate better code for object overlays, for example the following
function should now make no memory accesses at all on 64-bit platforms when
compiled at -O2 or above:


this is great!  When doing tricks to improve performance I've several times 
resorted to address overlays, forgetting about the "volatile hammer", only to 
rediscover it for the N'th time due to the poor performance and the horrible 
code generated.


Best wishes, Duncan.


Re: [v3 PATCH] LWG 2510, make the default constructors of library tag types explicit.

2015-11-12 Thread Gerald Pfeifer

On Wed, 11 Nov 2015, Jonathan Wakely wrote:

Fixed by this patch.


Thanks, Jonathan!  Unfortunately bootstrap is still broken
(on i386-unknown-freebsd11.0 at least):

In file included from 
/scratch/tmp/gerald/gcc-HEAD/libstdc++-v3/src/c++11/thread.cc:27:0:

/scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
thread: In function ‘void std::this_thread::sleep_for(const 
std::chrono::duration<_Rep1, _Period1>&)’:
/scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
thread:300:44: error: ‘errno’ was not declared in this scope
 while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)
   ^
/scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
thread:300:53: error: ‘EINTR’ was not declared in this scope
 while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)

Geraldcommit 97c2da9d4cc11bd5dae077ccb5fda4e72f7c34d5
Author: Jonathan Wakely 
Date:   Wed Nov 11 17:27:23 2015 +

	* libsupc++/new_handler.cc: Fix for explicit constructor change.

diff --git a/libstdc++-v3/libsupc++/new_handler.cc b/libstdc++-v3/libsupc++/new_handler.cc
index a09012c..4da48b3 100644
--- a/libstdc++-v3/libsupc++/new_handler.cc
+++ b/libstdc++-v3/libsupc++/new_handler.cc
@@ -34,7 +34,7 @@ namespace
 }
 #endif
 
-const std::nothrow_t std::nothrow = { };
+const std::nothrow_t std::nothrow = std::nothrow_t{ };
 
 using std::new_handler;
 namespace


Re: [PATCH, 11/16] Update testcases after adding kernels pass group

2015-11-12 Thread Tom de Vries

On 11/11/15 12:03, Richard Biener wrote:

On Mon, 9 Nov 2015, Tom de Vries wrote:


On 09/11/15 16:35, Tom de Vries wrote:

Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

   1Insert new exit block only when needed in
  transform_to_exit_first_loop_alt
   2Make create_parallel_loop return void
   3Ignore reduction clause on kernels directive
   4Implement -foffload-alias
   5Add in_oacc_kernels_region in struct loop
   6Add pass_oacc_kernels
   7Add pass_dominator_oacc_kernels
   8Add pass_ch_oacc_kernels
   9Add pass_parallelize_loops_oacc_kernels
  10Add pass_oacc_kernels pass group in passes.def
  11Update testcases after adding kernels pass group
  12Handle acc loop directive
  13Add c-c++-common/goacc/kernels-*.c
  14Add gfortran.dg/goacc/kernels-*.f95
  15Add libgomp.oacc-c-c++-common/kernels-*.c
  16Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Build and reg-tested with nvidia accelerator, in combination with a
patch that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.


This patch updates existing testcases with new pass numbers, given the passes
that were added in the pass list in patch 10.


I think it would be nice to be able to specify the number in the .def
file instead so we can avoid this kind of churn everytime we do this.


How about something along the lines of:
...
  /* pass_build_ealias is a dummy pass that ensures that we
 execute TODO_rebuild_alias at this point.  */
  NEXT_PASS (pass_build_ealias);
  /* Pass group that runs when there are oacc kernels in the
  function.  */
  NEXT_PASS (pass_oacc_kernels);
  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
  PUSH_ID ("oacc_kernels")
...
  POP_ID ()
  POP_INSERT_PASSES ()
  NEXT_PASS (pass_fre);
...

where the PUSH_ID/POP_ID pair has the functionality that all the 
contained passes:

- have the id prefixed to the dump file, so the dump file of pass_ch
  which normally is "ch" becomes "oacc_kernels_ch", and
- the pass name in pass_instances.def becomes pass_oacc_kernels_ch, such
  that it doesn't count as numbered instance of pass_ch
?

Thanks,
- Tom


libiberty TAGS

2015-11-12 Thread Mike Stump
I applied this one as obvious.

* Makefile.in (etags tags TAGS): Use && instead of ;.

Index: Makefile.in
===
--- Makefile.in (revision 230269)
+++ Makefile.in (working copy)
@@ -409,7 +409,7 @@ stamp-noasandir:
 .PHONY: all etags tags TAGS ls clean stage1 stage2
 
 etags tags TAGS: etags-subdir
-   cd $(srcdir); etags $(CFILES)
+   cd $(srcdir) && etags $(CFILES)
 
 # The standalone demangler (c++filt) has been moved to binutils.
 # But make this target work anyway for demangler hacking.



[PATCH] Fix possible correctness issue in BB dependence analysis

2015-11-12 Thread Richard Biener

This fixes BB vectorization dependence analysis to not rely on
all instances being vectorized.  The dependence check

-   gimple *earlier_stmt = get_earlier_stmt (DR_STMT (dra), DR_STMT 
(drb));
-   if (DR_IS_READ (STMT_VINFO_DATA_REF (vinfo_for_stmt (earlier_stmt
- {
-   /* That only holds for load-store pairs taking part in 
vectorization.  */
-   if (STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (DR_STMT (dra)))
- && STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (DR_STMT (drb
-   return false;

effectively does that and we remove instances later.  I wasn't able
to build a testcase showing this defect as we always fail vectorization
for some other reason.

Still the following fixes this and implements this special case
instance-local only.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-11-12  Richard Biener  

* tree-vectorizer.h (vect_slp_analyze_data_ref_dependences):
Rename to vect_slp_analyze_instance_dependence.
* tree-vect-data-refs.c (vect_slp_analyze_data_ref_dependence):
Remove WAR special-case.
(vect_slp_analyze_node_dependences): Instead add more specific
code here, not relying on other instances being vectorized.
(vect_slp_analyze_instance_dependence): Adjust accordingly.
* tree-vect-slp.c (vect_build_slp_tree_1): Remove excessive
vertical space in dump files.
(vect_print_slp_tree): Likewise.
(vect_analyze_slp_instance): Dump a header for the final SLP tree.
(vect_slp_analyze_bb_1): Delay computing relevant stmts and
not vectorized stmts until after dependence analysis removed
instances.  Merge alignment and dependence checks.
* tree-vectorizer.c (pass_slp_vectorize::execute): Clear visited
flag on all stmts.

Index: gcc/tree-vectorizer.h
===
*** gcc/tree-vectorizer.h   (revision 230216)
--- gcc/tree-vectorizer.h   (working copy)
*** extern enum dr_alignment_support vect_su
*** 1009,1015 
  extern tree vect_get_smallest_scalar_type (gimple *, HOST_WIDE_INT *,
 HOST_WIDE_INT *);
  extern bool vect_analyze_data_ref_dependences (loop_vec_info, int *);
! extern bool vect_slp_analyze_data_ref_dependences (bb_vec_info);
  extern bool vect_enhance_data_refs_alignment (loop_vec_info);
  extern bool vect_analyze_data_refs_alignment (loop_vec_info);
  extern bool vect_verify_datarefs_alignment (loop_vec_info);
--- 1009,1015 
  extern tree vect_get_smallest_scalar_type (gimple *, HOST_WIDE_INT *,
 HOST_WIDE_INT *);
  extern bool vect_analyze_data_ref_dependences (loop_vec_info, int *);
! extern bool vect_slp_analyze_instance_dependence (slp_instance);
  extern bool vect_enhance_data_refs_alignment (loop_vec_info);
  extern bool vect_analyze_data_refs_alignment (loop_vec_info);
  extern bool vect_verify_datarefs_alignment (loop_vec_info);
Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c   (revision 230216)
--- gcc/tree-vect-data-refs.c   (working copy)
*** vect_slp_analyze_data_ref_dependence (st
*** 537,568 
dump_printf (MSG_NOTE,  "\n");
  }
  
-   /* We do not vectorize basic blocks with write-write dependencies.  */
-   if (DR_IS_WRITE (dra) && DR_IS_WRITE (drb))
- return true;
- 
-   /* If we have a read-write dependence check that the load is before the 
store.
-  When we vectorize basic blocks, vector load can be only before
-  corresponding scalar load, and vector store can be only after its
-  corresponding scalar store.  So the order of the acceses is preserved in
-  case the load is before the store.  */
-   gimple *earlier_stmt = get_earlier_stmt (DR_STMT (dra), DR_STMT (drb));
-   if (DR_IS_READ (STMT_VINFO_DATA_REF (vinfo_for_stmt (earlier_stmt
- {
-   /* That only holds for load-store pairs taking part in vectorization.  
*/
-   if (STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (DR_STMT (dra)))
- && STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (DR_STMT (drb
-   return false;
- }
- 
return true;
  }
  
  
! /* Analyze dependences involved in the transform of SLP NODE.  */
  
  static bool
! vect_slp_analyze_node_dependences (slp_instance instance, slp_tree node)
  {
/* This walks over all stmts involved in the SLP load/store done
   in NODE verifying we can sink them up to the last stmt in the
--- 559,575 
dump_printf (MSG_NOTE,  "\n");
  }
  
return true;
  }
  
  
! /* Analyze dependences involved in the transform of SLP NODE.  STORES
!contain the vector of scalar stores of this instance if we are
!disambiguating the loads.  */
  
  static bool
! vect_slp_analyze_node_dependences (slp_instance instance, slp_tree 

Re: [PATCH 04/N] Fix big memory leak in ix86_valid_target_attribute_p

2015-11-12 Thread Martin Liška
On 11/12/2015 12:29 PM, Richard Biener wrote:
> On Thu, Nov 12, 2015 at 11:03 AM, Martin Liška  wrote:
>> Hello.
>>
>> Following patch was a bit negotiated with Jakub and can save a huge amount 
>> of memory in cases
>> where target attributes are heavily utilized.
>>
>> Can bootstrap and survives regression tests on x86_64-linux-pc.
>>
>> Ready for trunk?
> 
> +static bool opts_obstack_initialized = false;
> +
> +/* Initialize opts_obstack if not initialized.  */
> +
> +void
> +init_opts_obstack (void)
> +{
> +  if (!opts_obstack_initialized)
> +{
> +  opts_obstack_initialized = true;
> +  gcc_obstack_init (_obstack);
> 
> you can move the static global to function scope.
> 
> Ok with that change.

Done and installed as r230264. Final version of the patch is attached.

> 
> Btw, don't other targets need a similar adjustment to their hook?
> Grepping shows arm and nios2.

nios2 is not the case as it doesn't utilize:
  init_options_struct (_options, NULL);

I've been testing patch for aarch64 that is also included in the email.

Martin

> 
> Thanks,
> Richard.
> 
> 
>> Thanks,
>> Martin

>From 2a7ed80a489228bbb3c271ca9d0217cca888eeae Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 11 Nov 2015 12:52:11 +0100
Subject: [PATCH] Fix big memory leak in ix86_valid_target_attribute_p

gcc/ChangeLog:

2015-11-12  Martin Liska  

	* config/i386/i386.c (ix86_valid_target_attribute_p):
	Finalize options at the of the function.
	* gcc.c (driver_get_configure_time_options): Call newly
	introduced init_opts_obstack.
	* lto-wrapper.c (main): Likewise.
	* opts.c (init_opts_obstack): New function.
	(init_options_struct): Call newly
	introduced init_opts_obstack.
	* opts.h (init_options_struct): Declare.
---
 gcc/config/i386/i386.c |  2 ++
 gcc/gcc.c  |  2 +-
 gcc/lto-wrapper.c  |  2 +-
 gcc/opts.c | 16 +++-
 gcc/opts.h |  1 +
 5 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b84a11d..1325cf0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6237,6 +6237,8 @@ ix86_valid_target_attribute_p (tree fndecl,
 	DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fndecl) = new_optimize;
 }
 
+  finalize_options_struct (_options);
+
   return ret;
 }
 
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 8bbf5be..87d1979 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -9915,7 +9915,7 @@ driver_get_configure_time_options (void (*cb) (const char *option,
   size_t i;
 
   obstack_init ();
-  gcc_obstack_init (_obstack);
+  init_opts_obstack ();
   n_switches = 0;
 
   for (i = 0; i < ARRAY_SIZE (option_default_specs); i++)
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 20e67ed..b9ac535 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1355,7 +1355,7 @@ main (int argc, char *argv[])
 {
   const char *p;
 
-  gcc_obstack_init (_obstack);
+  init_opts_obstack ();
 
   p = argv[0] + strlen (argv[0]);
   while (p != argv[0] && !IS_DIR_SEPARATOR (p[-1]))
diff --git a/gcc/opts.c b/gcc/opts.c
index 9a3fbb3..930ae43 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -266,6 +266,20 @@ add_comma_separated_to_vector (void **pvec, const char *arg)
   *pvec = v;
 }
 
+/* Initialize opts_obstack if not initialized.  */
+
+void
+init_opts_obstack (void)
+{
+  static bool opts_obstack_initialized = false;
+
+  if (!opts_obstack_initialized)
+{
+  opts_obstack_initialized = true;
+  gcc_obstack_init (_obstack);
+}
+}
+
 /* Initialize OPTS and OPTS_SET before using them in parsing options.  */
 
 void
@@ -273,7 +287,7 @@ init_options_struct (struct gcc_options *opts, struct gcc_options *opts_set)
 {
   size_t num_params = get_num_compiler_params ();
 
-  gcc_obstack_init (_obstack);
+  init_opts_obstack ();
 
   *opts = global_options_init;
 
diff --git a/gcc/opts.h b/gcc/opts.h
index 38b3837..2eb2d97 100644
--- a/gcc/opts.h
+++ b/gcc/opts.h
@@ -323,6 +323,7 @@ extern void decode_cmdline_options_to_array (unsigned int argc,
 extern void init_options_once (void);
 extern void init_options_struct (struct gcc_options *opts,
  struct gcc_options *opts_set);
+extern void init_opts_obstack (void);
 extern void finalize_options_struct (struct gcc_options *opts);
 extern void decode_cmdline_options_to_array_default_mask (unsigned int argc,
 			  const char **argv, 
-- 
2.6.2

>From b2a7dcae8de93db470da0b35162f5538337320ee Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 12 Nov 2015 16:41:16 +0100
Subject: [PATCH] Finalize func_options in arm target in
 arm_valid_target_attribute_p

gcc/ChangeLog:

2015-11-12  Martin Liska  

	* config/arm/arm.c (arm_valid_target_attribute_p): Finalize
	options struct.
---
 gcc/config/arm/arm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f4ebbc8..941774b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29956,6 

Re: [PATCH] New version of libmpx with new memmove wrapper

2015-11-12 Thread Ilya Enkovich
2015-11-05 13:37 GMT+03:00 Aleksandra Tsvetkova :
> New version of libmpx was added. There is a new function get_bd() that
> allows to get bounds directory. Wrapper for memmove was modified. Now
> it moves data and then moves corresponding bounds directly from one
> bounds table to another. This approach made moving unaligned pointers
> possible. It also makes memmove function faster on sizes bigger than
> 64 bytes.

+2015-10-27  Tsvetkova Alexandra  
+
+ * gcc.target/i386/mpx/memmove.c: New test for __mpx_wrapper_memmove.
+

Did you test it on different targets? It seems to me this test will
fail if you run it
on non-MPX target.  Please look at mpx-check.h and how other MPX run
tests use it.

+ * mpxrt/mpxrt.c (NUM_L1_BITS): Moved to mpxrt.h.
+ * mpxrt/mpxrt.c (REG_IP_IDX): Moved to mpxrt.h.
+ * mpxrt/mpxrt.c (REX_PREFIX): Moved to mpxrt.h.
+ * mpxrt/mpxrt.c (XSAVE_OFFSET_IN_FPMEM): Moved to mpxrt.h.
+ * mpxrt/mpxrt.c (MPX_L1_SIZE): Moved to mpxrt.h.

No need to repeat file name.

+ * libmpxwrap/mpx_wrappers.c: Rewrite __mpx_wrapper_memmove to make it faster.

You added new functions, types and modified existing function.  Make
ChangeLog more detailed.

--- /dev/null
+++ b/libmpx/mpxrt/mpxrt.h
@@ -0,0 +1,75 @@
+/* mpxrt.h  -*-C++-*-
+ *
+ *
+ *
+ *  @copyright
+ *  Copyright (C) 2014, 2015, Intel Corporation
+ *  All rights reserved.

2015 only

+const uintptr_t MPX_L1_ADDR_MASK = 0xf000UL;
+const uintptr_t MPX_L2_ADDR_MASK = 0xfffcUL;
+const uintptr_t MPX_L2_VALID_MASK = 0x0001UL;

Use defines


--- a/libmpx/mpxwrap/Makefile.am
+++ b/libmpx/mpxwrap/Makefile.am
@@ -1,4 +1,5 @@
 ALCLOCAL_AMFLAGS = -I .. -I ../config
+AM_CPPFLAGS = -I $(top_srcdir)

This is not reflected in ChangeLog

+/* The mpx_bt_entry struct represents a cell in bounds table.
+   *lb is the lower bound, *ub is the upper bound,
+   *p is the stored pointer.  */

Bounds and pointer are in lb, ub, p, not in *lb, *ub, *p. Right?

+static inline void
+alloc_bt (void *ptr)
+{
+  __asm__ __volatile__ ("bndstx %%bnd0, (%0,%0)"::"r" (ptr):"%bnd0");
+}

This should be marked as bnd_legacy.

+/* move_bounds function copies N bytes from SRC to DST.

Really?

+   It also copies bounds for all pointers inside.
+   There are 3 parts of the algorithm:
+   1) We copy everything till the end of the first bounds table SRC)

SRC is not a bounds table

+   2) In loop we copy whole bound tables till the second-last one
+   3) Data in the last bounds table is copied separately, after the loop.
+   If one of bound tables in SRC doesn't exist,
+   we skip it because there are no pointers.
+   Depending on the arrangement of SRC and DST we copy from the beginning
+   or from the end.  */
+__attribute__ ((bnd_legacy)) static void *
+move_bounds (void *dst, const void *src, size_t n)

What is returned value for?

+void *
+__mpx_wrapper_memmove (void *dst, const void *src, size_t n)
+{
+  if (n == 0)
+return dst;
+
+  __bnd_chk_ptr_bounds (dst, n);
+  __bnd_chk_ptr_bounds (src, n);
+
+  memmove (dst, src, n);
+  move_bounds (dst, src, n);
+  return dst;
 }

You completely remove old algorithm which should be faster on small
sizes. __mpx_wrapper_memmove should become a dispatcher between old
and new implementations depending on target (32-bit or 64-bit) and N.
Since old version performs both data and bounds copy, BD check should
be moved into __mpx_wrapper_memmove to never call
it when MPX is disabled.

Thanks,
Ilya


Re: [PATCH] nvptx: implement automatic storage in custom stacks

2015-11-12 Thread Bernd Schmidt

On 11/12/2015 03:59 PM, Alexander Monakov wrote:

On Thu, 12 Nov 2015, Bernd Schmidt wrote:

I've run it through make -k check-c regtesting.  These are new fails, all
mysterious:


These would have to be investigated first.


Any specific suggestions?  The PTX code emitted from GCC differs only in
prologue/epilogue, so whatever's broken... I think is unlikely due to this
change.  I can give it another try after upgrading CUDA driver and cuda-gdb
from 7.0 to latest.


Yeah, load it into cuda-gdb, that may help show what's happening.


+ fprintf (file, "\tmul%s.u32 %%fstmp1, %%fstmp0, %d;\n",
+  bits == 64 ? ".wide" : "", bits);


Use a shift.


I think mul is acceptable here: PTX JIT is handling it properly, according to
what I saw while investigating in cuda-gdb.  If I used a shift, I'd also have
to introduce another instruction for a widening integer conversion in the
64-bit case.  Do you insist?


Nah, it's fine.


This is crt0.s, which is linked in only for single-threaded testing with
-mmainkernel; for OpenMP, the intention is to handle it in the file that
implements libgomp_nvptx_main.


Yeah, that's what I meant. It might be nice to see that too if it 
already exists.



Bernd



[PATCH][GCC] Make stackalign test LTO proof

2015-11-12 Thread Andre Vieira

Hi,

  This patch changes this testcase to make sure LTO will not optimize 
away the assignment of the local array to a global variable which was 
introduced to make sure stack space was made available for the test to work.


  This is correct because LTO is supposed to optimize this global away 
as at link time it knows this global will never be read. By adding a 
read of the global, LTO will no longer optimize it away.


  Tested by running regressions for this testcase for various ARM targets.

  Is this OK to commit?

  Thanks,
  Andre Vieira

gcc/testsuite/ChangeLog:
2015-11-06  Andre Vieira  

* gcc.dg/torture/stackalign/builtin-return-1.c: Added read
  to global such that a write is not optimized away by LTO.
From 6fbac447475c3b669baee84aa9d6291c3d09f1ab Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira 
Date: Fri, 6 Nov 2015 13:13:47 +
Subject: [PATCH] keep the stack testsuite fix

---
 gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c
index af017532aeb3878ef7ad717a2743661a87a56b7d..1ccd109256de72419a3c71c2c1be6d07c423c005 100644
--- a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c
+++ b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c
@@ -39,5 +39,10 @@ int main(void)
   if (bar(1) != 2)
 abort();
 
+  /* Make sure there is a read of the global after the function call to bar
+   * such that LTO does not optimize away the assignment above.  */
+  if (g != dummy)
+abort();
+
   return 0;
 }
-- 
1.9.1



Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models

2015-11-12 Thread Evandro Menezes

On 11/12/2015 08:55 AM, James Greenhalgh wrote:

On Tue, Nov 10, 2015 at 11:50:12AM -0600, Evandro Menezes wrote:

2015-11-10  Evandro Menezes 

gcc/

* config/aarch64/aarch64.md (predicated): Copy attribute from
"arm.md".

This patch duplicates an attribute from arm.md so that the same
pipeline model can be used for both AArch32 and AArch64.

Bootstrapped on arm-unknown-linux-gnueabihf, aarch64-unknown-linux-gnu.

Please, commit if it's alright.

--
Evandro Menezes


 From 3b643a3c026350864713e1700dc44e4794d93809 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 9 Nov 2015 17:11:16 -0600
Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM
  pipeline models

gcc/
* config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md".
---
  gcc/config/aarch64/aarch64.md | 5 +
  1 file changed, 5 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6b08850..2bc2ff5 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -195,6 +195,11 @@
  ;; 1 :=: yes
  (define_attr "far_branch" "" (const_int 0))
  
+;; [For compatibility with ARM in pipeline models]

+;; Attribute that specifies whether or not the instruction is executed
+;; conditionally ( != "AL"? "yes": "no").

I'm not sure this  != "AL" [...] part makes sense to me (thinking only
of AArch64, I'd understand it on AArch32 :) ) and we should document that
this is never set for AArch64. Could you respin with a slightly clearer
comment.
Since this attribute was not described in config/arm/arm.md, I felt that 
it needed to, but perhaps in its original place instead.  I agree that I 
should also point out that it's strictly for compatibility with AArch32 
and that it never matters for AArch64.


WRT the  thing, I was referring to the opcode fields terminology used 
by ARM in its ISA documentation.  Perhaps it's unnecessary, yes?


Thank you,

--
Evandro Menezes



[PATCH] Fix ICE for masked store of boolean value

2015-11-12 Thread Ilya Enkovich
Hi,

We may get ICE in vectorizer in case stored value get vectype not compatible 
with a storage.  This may happen for bool values.  This patch fixes ICE.  
Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?

Thanks,
Ilya
--
gcc/

2015-11-12  Ilya Enkovich  

* tree-vect-stmts.c (vectorizable_mask_load_store): Check
types of stored value and storage are compatible.

gcc/testsuite/

2015-11-12  Ilya Enkovich  

* g++.dg/vect/simd-mask-store-bool.cc: New test.


diff --git a/gcc/testsuite/g++.dg/vect/simd-mask-store-bool.cc 
b/gcc/testsuite/g++.dg/vect/simd-mask-store-bool.cc
new file mode 100644
index 000..c5f0458
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/simd-mask-store-bool.cc
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+/* { dg-additional-options "-mavx512bw" { target { i?86-*-* x86_64-*-* } } } */
+
+#define N 1024
+
+int a[N], b[N], c[N];
+bool d[N];
+
+void
+test (void)
+{
+  int i;
+#pragma omp simd safelen(64)
+  for (i = 0; i < N; i++)
+if (a[i] > 0)
+  d[i] = b[i] > c[i];
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index cfe30e0..7870b29 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1688,6 +1688,7 @@ vectorizable_mask_load_store (gimple *stmt, 
gimple_stmt_iterator *gsi,
   bool nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt);
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree rhs_vectype = NULL_TREE;
   tree mask_vectype;
   tree elem_type;
   gimple *new_stmt;
@@ -1757,6 +1758,13 @@ vectorizable_mask_load_store (gimple *stmt, 
gimple_stmt_iterator *gsi,
   if (!mask_vectype)
 return false;
 
+  if (is_store)
+{
+  tree rhs = gimple_call_arg (stmt, 3);
+  if (!vect_is_simple_use (rhs, loop_vinfo, _stmt, , _vectype))
+   return false;
+}
+
   if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
 {
   gimple *def_stmt;
@@ -1790,16 +1798,11 @@ vectorizable_mask_load_store (gimple *stmt, 
gimple_stmt_iterator *gsi,
   else if (!VECTOR_MODE_P (TYPE_MODE (vectype))
   || !can_vec_mask_load_store_p (TYPE_MODE (vectype),
  TYPE_MODE (mask_vectype),
- !is_store))
+ !is_store)
+  || (rhs_vectype
+  && !useless_type_conversion_p (vectype, rhs_vectype)))
 return false;
 
-  if (is_store)
-{
-  tree rhs = gimple_call_arg (stmt, 3);
-  if (!vect_is_simple_use (rhs, loop_vinfo, _stmt, ))
-   return false;
-}
-
   if (!vec_stmt) /* transformation not required.  */
 {
   STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;


Re: [PATCH 04/N] Fix big memory leak in ix86_valid_target_attribute_p

2015-11-12 Thread Ramana Radhakrishnan


On 12/11/15 15:52, Martin Liška wrote:
> On 11/12/2015 12:29 PM, Richard Biener wrote:
>> On Thu, Nov 12, 2015 at 11:03 AM, Martin Liška  wrote:
>>> Hello.
>>>
>>> Following patch was a bit negotiated with Jakub and can save a huge amount 
>>> of memory in cases
>>> where target attributes are heavily utilized.
>>>
>>> Can bootstrap and survives regression tests on x86_64-linux-pc.
>>>
>>> Ready for trunk?
>>
>> +static bool opts_obstack_initialized = false;
>> +
>> +/* Initialize opts_obstack if not initialized.  */
>> +
>> +void
>> +init_opts_obstack (void)
>> +{
>> +  if (!opts_obstack_initialized)
>> +{
>> +  opts_obstack_initialized = true;
>> +  gcc_obstack_init (_obstack);
>>
>> you can move the static global to function scope.
>>
>> Ok with that change.
> 
> Done and installed as r230264. Final version of the patch is attached.
> 
>>
>> Btw, don't other targets need a similar adjustment to their hook?
>> Grepping shows arm and nios2.
> 
> nios2 is not the case as it doesn't utilize:
>   init_options_struct (_options, NULL);
> 
> I've been testing patch for aarch64 that is also included in the email.

The change is also needed in config/aarch64/aarch64.c 
(aarch64_option_valid_attribute_p). The attached patch is for arm i.e. 32 bit 
arm.

Ramana

> 
> Martin
> 
>>
>> Thanks,
>> Richard.
>>
>>
>>> Thanks,
>>> Martin
> 


Re: [PATCH][GCC][ARM] testcase memset-inline-10.c uses -mfloat-abi=hard but does not check whether target supports it

2015-11-12 Thread Andre Vieira

On 12/11/15 15:08, Andre Vieira wrote:

Hi,

   This patch changes the memset-inline-10.c testcase to make sure that
it is only compiled for ARM targets that support -mfloat-abi=hard using
the fact that all non-thumb1 targets do.

   This is correct because all targets for which -mthumb causes the
compiler to use thumb2 will support the generation of FP instructions.

   Tested by running regressions for this testcase for various ARM targets.

   Is this OK to commit?

   Thanks,
   Andre Vieira

gcc/testsuite/ChangeLog:
2015-11-06  Andre Vieira  

 * gcc.target/arm/memset-inline-10.c: Added
 dg-require-effective-target arm_thumb2_ok.


Now with attachment, sorry about that.

Cheers,
Andre
From f6515d9cceacf99d213aea1236b7027c7839ab4b Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira 
Date: Fri, 6 Nov 2015 14:48:27 +
Subject: [PATCH] added check for thumb2_ok

---
 gcc/testsuite/gcc.target/arm/memset-inline-10.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/arm/memset-inline-10.c b/gcc/testsuite/gcc.target/arm/memset-inline-10.c
index c1087c8e693fb723ca9396108f5fe872ede167e9..ce51c1d9eeb800cf67790fe06817ae23215399e9 100644
--- a/gcc/testsuite/gcc.target/arm/memset-inline-10.c
+++ b/gcc/testsuite/gcc.target/arm/memset-inline-10.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_thumb2_ok } */
 /* { dg-options "-march=armv7-a -mfloat-abi=hard -mfpu=neon -O2" } */
 /* { dg-skip-if "need SIMD instructions" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
 /* { dg-skip-if "need SIMD instructions" { *-*-* } { "-mfpu=vfp*" } { "" } } */
-- 
1.9.1



[PATCH, doc] Document some standard pattern names

2015-11-12 Thread Ilya Enkovich
Hi,

This patch adds description for several standard pattern names.  OK for trunk?

Thanks,
Ilya
--
gcc/

2015-11-12  Ilya Enkovich  

* doc/md.texi (vec_cmp@var{m}@var{n}): New item.
(vec_cmpu@var{m}@var{n}): New item.
(vcond@var{m}@var{n}): Specify comparison is signed.
(vcondu@var{m}@var{n}): New item.
(vcond_mask_@var{m}@var{n}): New item.
(maskload@var{m}@var{n}): New item.
(maskstore@var{m}@var{n}): New item.


diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 71a2791..7fdc935 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4749,17 +4749,51 @@ specify field index and operand 0 place to store value 
into.
 Initialize the vector to given values.  Operand 0 is the vector to initialize
 and operand 1 is parallel containing values for individual fields.
 
+@cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
+@item @samp{vec_cmp@var{m}@var{n}}
+Output a vector comparison.  Operand 0 of mode @var{n} is the destination for
+predicate in operand 1 which is a signed vector comparison with operands of
+mode @var{m} in operands 2 and 3.  Predicate is computed by element-wise
+evaluation of the vector comparison with a truth value of all-ones and a false
+value of all-zeros.
+
+@cindex @code{vec_cmpu@var{m}@var{n}} instruction pattern
+@item @samp{vec_cmpu@var{m}@var{n}}
+Similar to @code{vec_cmp@var{m}@var{n}} but perform unsigned vector comparison.
+
 @cindex @code{vcond@var{m}@var{n}} instruction pattern
 @item @samp{vcond@var{m}@var{n}}
 Output a conditional vector move.  Operand 0 is the destination to
 receive a combination of operand 1 and operand 2, which are of mode @var{m},
-dependent on the outcome of the predicate in operand 3 which is a
+dependent on the outcome of the predicate in operand 3 which is a signed
 vector comparison with operands of mode @var{n} in operands 4 and 5.  The
 modes @var{m} and @var{n} should have the same size.  Operand 0
 will be set to the value @var{op1} & @var{msk} | @var{op2} & ~@var{msk}
 where @var{msk} is computed by element-wise evaluation of the vector
 comparison with a truth value of all-ones and a false value of all-zeros.
 
+@cindex @code{vcondu@var{m}@var{n}} instruction pattern
+@item @samp{vcondu@var{m}@var{n}}
+Similar to @code{vcond@var{m}@var{n}} but performs unsigned vector
+comparison.
+
+@cindex @code{vcond_mask_@var{m}@var{n}} instruction pattern
+@item @samp{vcond_mask_@var{m}@var{n}}
+Similar to @code{vcond@var{m}@var{n}} but operand 3 holds a pre-computed
+result of vector comparison.
+
+@cindex @code{maskload@var{m}@var{n}} instruction pattern
+@item @samp{maskload@var{m}@var{n}}
+Perform a masked load of vector from memory operand 1 of mode @var{m}
+into register operand 0.  Mask is provided in register operand 2 of
+mode @var{n}.
+
+@cindex @code{maskstore@var{m}@var{n}} instruction pattern
+@item @samp{maskload@var{m}@var{n}}
+Perform a masked store of vector from register operand 1 of mode @var{m}
+into memory operand 0.  Mask is provided in register operand 2 of
+mode @var{n}.
+
 @cindex @code{vec_perm@var{m}} instruction pattern
 @item @samp{vec_perm@var{m}}
 Output a (variable) vector permutation.  Operand 0 is the destination


Re: [RFC] Remove first_pass_instance from pass_vrp

2015-11-12 Thread David Malcolm
On Thu, 2015-11-12 at 15:06 +0100, Richard Biener wrote:
> On Thu, Nov 12, 2015 at 3:04 PM, Richard Biener
>  wrote:
> > On Thu, Nov 12, 2015 at 2:49 PM, Tom de Vries  
> > wrote:
> >> On 12/11/15 13:26, Richard Biener wrote:
> >>>
> >>> On Thu, Nov 12, 2015 at 12:37 PM, Tom de Vries 
> >>> wrote:
> 
>  Hi,
> 
>  [ See also related discussion at
>  https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00452.html ]
> 
>  this patch removes the usage of first_pass_instance from pass_vrp.
> 
>  the patch:
>  - limits itself to pass_vrp, but my intention is to remove all
> usage of first_pass_instance
>  - lacks an update to gdbhooks.py
> 
>  Modifying the pass behaviour depending on the instance number, as
>  first_pass_instance does, break compositionality of the pass list. In
>  other
>  words, adding a pass instance in a pass list may change the behaviour of
>  another instance of that pass in the pass list. Which obviously makes it
>  harder to understand and change the pass list. [ I've filed this issue as
>  PR68247 - Remove pass_first_instance ]
> 
>  The solution is to make the difference in behaviour explicit in the pass
>  list, and no longer change behaviour depending on instance number.
> 
>  One obvious possible fix is to create a duplicate pass with a different
>  name, say 'pass_vrp_warn_array_bounds':
>  ...
> NEXT_PASS (pass_vrp_warn_array_bounds);
> ...
> NEXT_PASS (pass_vrp);
>  ...
> 
>  But, AFAIU that requires us to choose a different dump-file name for each
>  pass. And choosing vrp1 and vrp2 as new dump-file names still means that
>  -fdump-tree-vrp no longer works (which was mentioned as drawback here:
>  https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00453.html ).
> 
>  This patch instead makes pass creation parameterizable. So in the pass
>  list,
>  we use:
>  ...
> NEXT_PASS_WITH_ARG (pass_vrp, true /* warn_array_bounds_p */);
> ...
> NEXT_PASS_WITH_ARG (pass_vrp, false /* warn_array_bounds_p */);
>  ...
> 
>  This approach gives us clarity in the pass list, similar to using a
>  duplicate pass 'pass_vrp_warn_array_bounds'.
> 
>  But it also means -fdump-tree-vrp still works as before.
> 
>  Good idea? Other comments?
> >>>
> >>>
> >>> It's good to get rid of the first_pass_instance hack.
> >>>
> >>> I can't comment on the AWK, leaving that to others.  Syntax-wise I'd hoped
> >>> we can just use NEXT_PASS with the extra argument being optional...
> >>
> >>
> >> I suppose I could use NEXT_PASS in the pass list, and expand into
> >> NEXT_PASS_WITH_ARG in pass-instances.def.
> >>
> >> An alternative would be to change the NEXT_PASS macro definitions into
> >> vararg variants. But the last time I submitted something with a vararg 
> >> macro
> >> ( https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00794.html ), I got a
> >> question about it ( 
> >> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00912.html
> >> ), so I tend to avoid using vararg macros.
> >>
> >>> I don't see the need for giving clone_with_args a new name, just use an
> >>> overload
> >>> of clone ()?
> >>
> >>
> >> That's what I tried initially, but I ran into:
> >> ...
> >> src/gcc/tree-pass.h:85:21: warning: ‘virtual opt_pass* opt_pass::clone()’
> >> was hidden [-Woverloaded-virtual]
> >>virtual opt_pass *clone ();
> >>  ^
> >> src/gcc/tree-vrp.c:10393:14: warning:   by ‘virtual opt_pass*
> >> {anonymous}::pass_vrp::clone(bool)’ [-Woverloaded-virtual]
> >>opt_pass * clone (bool warn_array_bounds_p) { return new pass_vrp
> >> (m_ctxt, warn_array_bounds_p); }
> >> ...
> >>
> >> Googling the error message gives this discussion: (
> >> http://stackoverflow.com/questions/16505092/confused-about-virtual-overloaded-functions
> >> ), and indeed adding
> >>   "using gimple_opt_pass::clone;"
> >> in class pass_vrp makes the warning disappear.
> >>
> >> I'll submit an updated version.
> >
> > Hmm, but actually the above means the pass does not expose the
> > non-argument clone
> > which is good!
> >
> > Or did you forget to add the virtual-with-arg variant to opt_pass?
> > That is, why's it
> > a virtual function in the first place?  (clone_with_arg)
> 
> That said,
> 
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -83,6 +83,7 @@ public:
> 
>   The default implementation prints an error message and aborts.  */
>virtual opt_pass *clone ();
> +  virtual opt_pass *clone_with_arg (bool);
> 
> 
> means the arg type is fixed at 'bool' (yeah, mimicing
> first_pass_instance).  That
> looks a bit limiting to me, but anyway.
> 
> Richard.
> 
> >> Thanks,
> >> - Tom
> >>
> >>
> >>> [ideally C++ would allow us to say that only one overload may be
> >>> implemented]

IIRC, the idea of the clone vfunc was 

Re: [PATCH 04/N] Fix big memory leak in ix86_valid_target_attribute_p

2015-11-12 Thread Martin Liška
On 11/12/2015 12:33 PM, Bernd Schmidt wrote:
> On 11/12/2015 12:29 PM, Richard Biener wrote:
>> +static bool opts_obstack_initialized = false;
>> +
>> +/* Initialize opts_obstack if not initialized.  */
>> +
>> +void
>> +init_opts_obstack (void)
>> +{
>> +  if (!opts_obstack_initialized)
>> +{
>> +  opts_obstack_initialized = true;
>> +  gcc_obstack_init (_obstack);
>>
>> you can move the static global to function scope.
> 
> Also, why bother with it? Why not simply arrange to call the function just 
> once at startup?

Hello.

It's called from multiple locations:

$ git grep init_opts_obstack
gcc/gcc.c:  init_opts_obstack ();
gcc/lto-wrapper.c:  init_opts_obstack ();
gcc/opts.c:init_opts_obstack (void)
gcc/opts.c:  init_opts_obstack ();
gcc/opts.h:extern void init_opts_obstack (void);

Maybe there's a common ancestor of all paths, maybe it can be done as a 
follow-up.

> 
> It's not clear from the submission why this is done and how it relates to the 
> i386.c hunk.

Sorry that the hunk isn't explained better, in fast this change is unrelated 
and fixed
another memory leak.

Thanks,
Martin

> 
> 
> Bernd



Re: [PATCH, 11/16] Update testcases after adding kernels pass group

2015-11-12 Thread David Malcolm
On Thu, 2015-11-12 at 15:43 +0100, Richard Biener wrote:
> On Thu, Nov 12, 2015 at 3:31 PM, Tom de Vries  wrote:
> > On 11/11/15 12:03, Richard Biener wrote:
> >>
> >> On Mon, 9 Nov 2015, Tom de Vries wrote:
> >>
> >>> On 09/11/15 16:35, Tom de Vries wrote:
> 
>  Hi,
> 
>  this patch series for stage1 trunk adds support to:
>  - parallelize oacc kernels regions using parloops, and
>  - map the loops onto the oacc gang dimension.
> 
>  The patch series contains these patches:
> 
> 1Insert new exit block only when needed in
>    transform_to_exit_first_loop_alt
> 2Make create_parallel_loop return void
> 3Ignore reduction clause on kernels directive
> 4Implement -foffload-alias
> 5Add in_oacc_kernels_region in struct loop
> 6Add pass_oacc_kernels
> 7Add pass_dominator_oacc_kernels
> 8Add pass_ch_oacc_kernels
> 9Add pass_parallelize_loops_oacc_kernels
>    10Add pass_oacc_kernels pass group in passes.def
>    11Update testcases after adding kernels pass group
>    12Handle acc loop directive
>    13Add c-c++-common/goacc/kernels-*.c
>    14Add gfortran.dg/goacc/kernels-*.f95
>    15Add libgomp.oacc-c-c++-common/kernels-*.c
>    16Add libgomp.oacc-fortran/kernels-*.f95
> 
>  The first 9 patches are more or less independent, but patches 10-16 are
>  intended to be committed at the same time.
> 
>  Bootstrapped and reg-tested on x86_64.
> 
>  Build and reg-tested with nvidia accelerator, in combination with a
>  patch that enables accelerator testing (which is submitted at
>  https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
> 
>  I'll post the individual patches in reply to this message.
> >>>
> >>>
> >>> This patch updates existing testcases with new pass numbers, given the
> >>> passes
> >>> that were added in the pass list in patch 10.
> >>
> >>
> >> I think it would be nice to be able to specify the number in the .def
> >> file instead so we can avoid this kind of churn everytime we do this.
> >
> >
> > How about something along the lines of:
> > ...
> >   /* pass_build_ealias is a dummy pass that ensures that we
> >  execute TODO_rebuild_alias at this point.  */
> >   NEXT_PASS (pass_build_ealias);
> >   /* Pass group that runs when there are oacc kernels in the
> >   function.  */
> >   NEXT_PASS (pass_oacc_kernels);
> >   PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
> >   PUSH_ID ("oacc_kernels")
> > ...
> >   POP_ID ()
> >   POP_INSERT_PASSES ()
> >   NEXT_PASS (pass_fre);
> > ...
> >
> > where the PUSH_ID/POP_ID pair has the functionality that all the contained
> > passes:
> > - have the id prefixed to the dump file, so the dump file of pass_ch
> >   which normally is "ch" becomes "oacc_kernels_ch", and
> > - the pass name in pass_instances.def becomes pass_oacc_kernels_ch, such
> >   that it doesn't count as numbered instance of pass_ch
> > ?
> 
> Hmm.  I'd like to have sth that allows me to add "slp" to both
> pass_slp_vectorize
> instances, having them share the suffix (as no two functions are in both 
> dumps).
> 
> We similarly have "duplicates" across the -Og vs. the -O[0-3] pipeline.
> 
> Basically make all dump file name suffixes manually specified which means 
> moving
> them from the class definition to the actual instance.
> 
> Well, just an idea.  In a distant future I like our pass pipeline to become 
> more
> dynamic, getting away from a static passes.def towards, say, a pass "script"
> (to be able to say "if inlining did nothing skip this group" or similar).

Can't that be done by having a parent pass to hold them, with a gate
function?

Or are you thinking of having another domain-specific language?

Thinking aloud, I've sometimes wondered if it would be helpful to be
able to subclass pass_manager, so that multiple passes.def files could
generate alternative pass_manager subclasses, with the precise choice of
pass_manager subclass being determined by options+target.  I don't know
if that latter idea is useful though.

Dave



[PATCH] Fix ICE for boolean comparison

2015-11-12 Thread Ilya Enkovich
Hi,

Currently compiler may ICE when loaded boolean is compared with vector 
invariant or another boolean value.  This is because we don't detect mix of 
bool and non-bool vectypes and incorrectly determine vectype for boolean loop 
invariant for comparison.  This was fixed for COND_EXP before but also needs to 
be fixed for comparison.  This patch was bootstrapped and tested on 
x86_64-unknown-linux-gnu.  OK for trunk?

Thanks,
Ilya
--
gcc/

2015-11-12  Ilya Enkovich  

* tree-vect-loop.c (vect_determine_vectorization_factor): Check
mix of boolean and integer vectors in a single statement.
* tree-vect-slp.c (vect_mask_constant_operand_p): New.
(vect_get_constant_vectors): Use vect_mask_constant_operand_p to
determine constant type.
* tree-vect-stmts.c (vectorizable_comparison): Provide vectype
for loop invariants.

gcc/testsuite/

2015-11-12  Ilya Enkovich  

* g++.dg/vect/simd-bool-comparison-1.cc: New test.
* g++.dg/vect/simd-bool-comparison-2.cc: New test.


diff --git a/gcc/testsuite/g++.dg/vect/simd-bool-comparison-1.cc 
b/gcc/testsuite/g++.dg/vect/simd-bool-comparison-1.cc
new file mode 100644
index 000..a08362f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/simd-bool-comparison-1.cc
@@ -0,0 +1,21 @@
+// { dg-do compile }
+// { dg-additional-options "-mavx512bw -mavx512dq" { target { i?86-*-* 
x86_64-*-* } } }
+
+#define N 1024
+
+double a[N];
+bool b[N];
+bool c;
+
+void test ()
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+if (b[i] != c)
+  a[i] = 0.0;
+else
+  a[i] = 1.0;
+}
+
+// { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { 
i?86-*-* x86_64-*-* } } } }
diff --git a/gcc/testsuite/g++.dg/vect/simd-bool-comparison-2.cc 
b/gcc/testsuite/g++.dg/vect/simd-bool-comparison-2.cc
new file mode 100644
index 000..4accf56
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/simd-bool-comparison-2.cc
@@ -0,0 +1,20 @@
+// { dg-do compile }
+// { dg-additional-options "-mavx512bw -mavx512dq" { target { i?86-*-* 
x86_64-*-* } } }
+
+#define N 1024
+
+double a[N];
+bool b[N];
+char c[N];
+
+void test ()
+{
+  int i;
+
+  #pragma omp simd
+  for (i = 0; i < N; i++)
+if ((c[i] > 0) && b[i])
+  a[i] = 0.0;
+else
+  a[i] = 1.0;
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 55e5309..6b78b55 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -649,7 +649,32 @@ vect_determine_vectorization_factor (loop_vec_info 
loop_vinfo)
}
  return false;
}
+ else if (VECTOR_BOOLEAN_TYPE_P (mask_type)
+  != VECTOR_BOOLEAN_TYPE_P (vectype))
+   {
+ if (dump_enabled_p ())
+   {
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "not vectorized: mixed mask and "
+  "nonmask vector types in statement, ");
+ dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+mask_type);
+ dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
+ dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+vectype);
+ dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+   }
+ return false;
+   }
}
+
+ /* We may compare boolean value loaded as vector of integers.
+Fix mask_type in such case.  */
+ if (mask_type
+ && !VECTOR_BOOLEAN_TYPE_P (mask_type)
+ && gimple_code (stmt) == GIMPLE_ASSIGN
+ && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == 
tcc_comparison)
+   mask_type = build_same_sized_truth_vector_type (mask_type);
}
 
   /* No mask_type should mean loop invariant predicate.
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 9d97140..f3acb04 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2589,6 +2589,57 @@ vect_slp_bb (basic_block bb)
 }
 
 
+/* Return 1 if vector type of boolean constant which is OPNUM
+   operand in statement STMT is a boolean vector.  */
+
+static bool
+vect_mask_constant_operand_p (gimple *stmt, int opnum)
+{
+  stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
+  enum tree_code code = gimple_expr_code (stmt);
+  tree op, vectype;
+  gimple *def_stmt;
+  enum vect_def_type dt;
+
+  /* For comparison and COND_EXPR type is chosen depending
+ on the other comparison operand.  */
+  if (TREE_CODE_CLASS (code) == tcc_comparison)
+{
+  if (opnum)
+   op = gimple_assign_rhs1 (stmt);
+  else
+   op = gimple_assign_rhs2 (stmt);
+
+  if (!vect_is_simple_use (op, stmt_vinfo->vinfo, _stmt,
+  , ))
+   gcc_unreachable ();
+
+  return 

Re: [patch] libstdc++/56158 Extend valid values of iostream bitmask types

2015-11-12 Thread Martin Sebor

On 11/11/2015 02:48 AM, Jonathan Wakely wrote:

As described in the PR, we have operator~ overloads defined for
enumeration types which produce values outside the range of valid
values for the type. In C++11 that can be trivially solved by giving
the enumeration types a fixed underlying type, but this code needs to
be valid in C++03 too.

This patch defines new min/max enumerators as INT_MIN/INT_MAX so that
every int value is also a valid value for the bitmask type.

Does anyone see any problems with this solution, or better solutions?


Just a minor nit that the C-style cast in the below triggers
a -Wold-style-cast warning in Clang, in case libstdc++ tries
to be Clang-warning free. Since the type of __INT_MAX__ is
int it shouldn't be necessary.

+  _S_ios_fmtflags_min = ~(int)__INT_MAX__



Any suggestions for how to test this, given that GCC's ubsan doesn't
check for this, and we can't run the testsuite with ubsan anyway?


Use a case/switch statement with -Werror=switch-enum to make sure
all the cases are handled and none is duplicated or outside of the
valid values of the enumeration:

  void foo (ios::iostate s) {
  switch (s) {
  case badbit:
  case eofbit:
  case failbit:
  case goodbit:
  case __INT_MAX__:
  case ~__INT_MAX__: ;
}
  }

Martin


Re: [PATCH] More compile-time saving in BB vectorization

2015-11-12 Thread Andreas Schwab
Richard Biener  writes:

>   * tree-vectorizer.h (vect_slp_analyze_and_verify_instance_alignment):
>   Declare.
>   (vect_analyze_data_refs_alignment): Make loop vect specific.
>   (vect_verify_datarefs_alignment): Likewise.
>   * tree-vect-data-refs.c (vect_slp_analyze_data_ref_dependences):
>   Add missing continue.
>   (vect_compute_data_ref_alignment): Export.
>   (vect_compute_data_refs_alignment): Merge into...
>   (vect_analyze_data_refs_alignment): ... this.
>   (verify_data_ref_alignment): Split out from ...
>   (vect_verify_datarefs_alignment): ... here.
>   (vect_slp_analyze_and_verify_node_alignment): New function.
>   (vect_slp_analyze_and_verify_instance_alignment): Likewise.
>   * tree-vect-slp.c (vect_supported_load_permutation_p): Remove
>   misplaced checks on alignment.
>   (vect_slp_analyze_bb_1): Add fatal output parameter.  Do
>   alignment analysis after SLP discovery and do it per instance.
>   (vect_slp_bb): When vect_slp_analyze_bb_1 fatally failed do not
>   bother to re-try using different vector sizes.

This breaks libgfortran on ia64:

../../../libgfortran/generated/matmul_c4.c: In function 'matmul_c4':
../../../libgfortran/generated/matmul_c4.c:79:1: internal compiler error: in 
vectorizable_store, at tree-vect-stmts.c:5651
 matmul_c4 (gfc_array_c4 * const restrict retarray,
 ^
0x410ff01f vectorizable_store
../../gcc/tree-vect-stmts.c:5651
0x41115b5f vect_transform_stmt(gimple*, gimple_stmt_iterator*, bool*, 
_slp_tree*, _slp_instance*)
../../gcc/tree-vect-stmts.c:8003
0x4114df1f vect_schedule_slp_instance
../../gcc/tree-vect-slp.c:3484
0x41154d6f vect_schedule_slp(vec_info*)
../../gcc/tree-vect-slp.c:3549
0x411562bf vect_slp_bb(basic_block_def*)
../../gcc/tree-vect-slp.c:2543
0x41159f2f execute
../../gcc/tree-vectorizer.c:734

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models

2015-11-12 Thread James Greenhalgh
On Tue, Nov 10, 2015 at 11:50:12AM -0600, Evandro Menezes wrote:
>2015-11-10  Evandro Menezes 
> 
>gcc/
> 
>* config/aarch64/aarch64.md (predicated): Copy attribute from
>"arm.md".
> 
> This patch duplicates an attribute from arm.md so that the same
> pipeline model can be used for both AArch32 and AArch64.
> 
> Bootstrapped on arm-unknown-linux-gnueabihf, aarch64-unknown-linux-gnu.
> 
> Please, commit if it's alright.
> 
> -- 
> Evandro Menezes
> 
> 

> From 3b643a3c026350864713e1700dc44e4794d93809 Mon Sep 17 00:00:00 2001
> From: Evandro Menezes 
> Date: Mon, 9 Nov 2015 17:11:16 -0600
> Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM
>  pipeline models
> 
> gcc/
>   * config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md".
> ---
>  gcc/config/aarch64/aarch64.md | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 6b08850..2bc2ff5 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -195,6 +195,11 @@
>  ;; 1 :=: yes
>  (define_attr "far_branch" "" (const_int 0))
>  
> +;; [For compatibility with ARM in pipeline models]
> +;; Attribute that specifies whether or not the instruction is executed
> +;; conditionally ( != "AL"? "yes": "no").

I'm not sure this  != "AL" [...] part makes sense to me (thinking only
of AArch64, I'd understand it on AArch32 :) ) and we should document that
this is never set for AArch64. Could you respin with a slightly clearer
comment.

Thanks,
James



Re: [PATCH] nvptx: implement automatic storage in custom stacks

2015-11-12 Thread Alexander Monakov
On Thu, 12 Nov 2015, Bernd Schmidt wrote:
> > I've run it through make -k check-c regtesting.  These are new fails, all
> > mysterious:
> 
> These would have to be investigated first.

Any specific suggestions?  The PTX code emitted from GCC differs only in
prologue/epilogue, so whatever's broken... I think is unlikely due to this
change.  I can give it another try after upgrading CUDA driver and cuda-gdb
from 7.0 to latest.

> > + sz = (sz + keep_align - 1) & ~(keep_align - 1);
> 
> Use the ROUND_UP macro.

OK, thanks.
 
> > + fprintf (file, "\tmul%s.u32 %%fstmp1, %%fstmp0, %d;\n",
> > +  bits == 64 ? ".wide" : "", bits);
> 
> Use a shift.

I think mul is acceptable here: PTX JIT is handling it properly, according to
what I saw while investigating in cuda-gdb.  If I used a shift, I'd also have
to introduce another instruction for a widening integer conversion in the
64-bit case.  Do you insist?

> > +
> > +  if (need_softstack_decl)
> > +{
> > +  fprintf (asm_out_file, ".extern .shared .u64 __nvptx_stacks[];\n;");
> > +}
> 
> Lose excess braces.

OK.
 
> > +.global .u64 %__softstack[16384];
> 
> Maybe declarea as .u8 so you don't have two different constants for the stack
> size?

OK, with ".align 8" to ensure 64-bit alignment.
 
> > +.reg .u64 %stackptr;
> > +mov.u64%stackptr, %__softstack;
> > +cvta.global.u64%stackptr, %stackptr;
> > +add.u64%stackptr, %stackptr, 131072;
> > +st.shared.u64  [__nvptx_stacks], %stackptr;
> > +
> 
> I'm guessing you have other missing pieces for setting this up for multiple
> threads.

This is crt0.s, which is linked in only for single-threaded testing with
-mmainkernel; for OpenMP, the intention is to handle it in the file that
implements libgomp_nvptx_main.

Thanks.
Alexander


[PATCH][GCC][ARM] testcase memset-inline-10.c uses -mfloat-abi=hard but does not check whether target supports it

2015-11-12 Thread Andre Vieira

Hi,

  This patch changes the memset-inline-10.c testcase to make sure that 
it is only compiled for ARM targets that support -mfloat-abi=hard using 
the fact that all non-thumb1 targets do.


  This is correct because all targets for which -mthumb causes the 
compiler to use thumb2 will support the generation of FP instructions.


  Tested by running regressions for this testcase for various ARM targets.

  Is this OK to commit?

  Thanks,
  Andre Vieira

gcc/testsuite/ChangeLog:
2015-11-06  Andre Vieira  

* gcc.target/arm/memset-inline-10.c: Added
dg-require-effective-target arm_thumb2_ok.



[PATCH] Enable libmpx by default on supported target

2015-11-12 Thread Ilya Enkovich
Hi,

libmpx was added close to release date and therefore was disabled by default 
for all targets.  This patch enables it by default for supported targets.  Is 
it OK for trunk?

Thanks,
Ilya
--
2015-11-12  Tsvetkova Alexandra  

* configure.ac: Enable libmpx by default.
* configure: Regenerated.


diff --git a/configure.ac b/configure.ac
index cb6ca24..55f9ab0 100644
--- a/configure.ac
+++ b/configure.ac
@@ -660,7 +660,7 @@ fi
 
 # Enable libmpx on supported systems by request.
 if test -d ${srcdir}/libmpx; then
-if test x$enable_libmpx = xyes; then
+if test x$enable_libmpx = x; then
AC_MSG_CHECKING([for libmpx support])
if (srcdir=${srcdir}/libmpx; \
. ${srcdir}/configure.tgt; \
@@ -671,8 +671,6 @@ if test -d ${srcdir}/libmpx; then
else
AC_MSG_RESULT([yes])
fi
-else
-   noconfigdirs="$noconfigdirs target-libmpx"
 fi
 fi
 


Re: [hsa 2/12] Modifications to libgomp proper

2015-11-12 Thread Jakub Jelinek
On Thu, Nov 12, 2015 at 02:21:56PM +0100, Thomas Schwinge wrote:
> > > --- a/libgomp/libgomp.h
> > > +++ b/libgomp/libgomp.h
> > > @@ -876,7 +876,8 @@ struct gomp_device_descr
> > >void *(*dev2host_func) (int, void *, const void *, size_t);
> > >void *(*host2dev_func) (int, void *, const void *, size_t);
> > >void *(*dev2dev_func) (int, void *, const void *, size_t);
> > > -  void (*run_func) (int, void *, void *);
> > > +  void (*run_func) (int, void *, void *, const void *);
> > 
> > Adding arguments to existing plugin methods is a plugin ABI incompatible
> > change.  We now have:
> >   DLSYM (version);
> >   if (device->version_func () != GOMP_VERSION)
> > {
> >   err = "plugin version mismatch";
> >   goto fail;
> > }
> > so there is a way to deal with it, but you need to adjust all plugins.
> 
> I'm confused -- didn't we agree that we don't need to maintain backwards
> compatibility in the libgomp <-> plugins interface?  (Nathan?)  As far as
> I remember, the argument was that libgomp and all its plugins will always
> be built from the same source tree, so will be compatible with each
> other, "by definition"?
> 
> (We do need, and have, versioning between GCC proper and libgomp
> interfaces.)

I've mentioned the GOMP_VERSION check in there, and that all the plugins
would need to be modified, which was I think not shown in the patch.

> > But this shows a worse problem,
> > if you have GCC 5 compiled OpenMP code, of course there won't be HSA
> > offloaded copy, but if you try to run it on a box with HSA offloading
> > enabled, you can run into this assertion failure.
> 
> That's one of the issues that I'm working on resolving with my
> "Forwarding -foffload=[...] from the driver (compile-time) to libgomp
> (run-time)" patch,
> .
> In such a case (no GOMP_offload_register_ver call for HSA), HSA
> offloading would not be considered (not "enabled") in libgomp.  (It'll be
> two more weeks before I can make progress with that patch; will be
> attending SuperComputing 2015 next week -- anyone else will be there,
> too?)

You are aware of my objections to that and what does it do with later dlopened
libraries.

> > GOMP_target_ext has different arguments, you get the num_teams and
> > thread_limit clauses values in there already (if known at compile time or
> > before entering target region; 0 stands for implementation defined choice,
> > -1 for unknown before GOMP_target_ext).
> > Plus I must say I really don't like the addition of HSA specific argument
> > to the API, it is unclean and really doesn't scale, when somebody adds
> > support for another offloading target, would we add again another argument?
> > Can't use the same one, because one could have configured both HSA and that
> > other kind offloading at the same time and which one is picked would be only
> > a runtime decision, based on env vars of omp_set_default_device etc.
> > num_teams/thread_limit, as runtime arguments, you already get on the trunk.
> > For compile time decided values, those should go into some data section
> > and be somehow attached to what fn is translated into in the AVL tree (which
> > you really don't need to use for variables on GOMP_OFFLOAD_CAP_SHARED_MEM
> > obviously, but can still use for the kernels, and populate during
> > registration of the offloading region).
> 
> What about adopting the "tagging" scheme that we added for
> libgomp/oacc-parallel.c:GOACC_parallel_keyed?  With support for other
> offloading schemes being added one by one, isn't it quite likely that the
> interface will need to be adjusted for each of them, because
> more/different data will have to be transmitted from GCC proper to
> libgomp?

Perhaps something similar, but certainly not as varargs, that is a really
bad idea.  Instead of the long * array I've talked about perhaps use void **,
and put in unkeyed num_teams and thread_limit as the first two arguments
thereof, then add keyed arguments in there, terminate by 0.

Jakub


Re: [gomp4.5] depend nowait support for target

2015-11-12 Thread Jakub Jelinek
Hi!

Here is updated patch with the team == NULL case hopefully handled.
But the testcase I wrote (target-33.c) hangs, the problem is in the
  #pragma omp target nowait map (tofrom: a, b) depend(out: d[3])
  {
#pragma omp atomic update
a = a + 9;
b -= 8;
  }
  #pragma omp target nowait map (tofrom: a, c) depend(out: d[4])
  {
#pragma omp atomic update
a = a + 4;
c >>= 1;
  }
  #pragma omp task if (0) depend (in: d[3], d[4])
  if (a != 50 || b != 4 || c != 20)
abort ();
part, where (I should change that for the case of no dependencies
eventually) the task with map_vars+async_run is queued in both cases,
then we reach GOMP_task, which calls gomp_task_maybe_wait_for_dependencies
which spawns the first half task (map_vars+async_run), and then
the second half task (map_vars+async_run), but that one gets stuck somewhere
in liboffloadmic, then some other thread (from liboffloadmic) calls
GOMP_PLUGIN_target_task_completion and enqueues the second half of the first
target task (unmap_vars), but as the only normal thread in the main program
is stuck in liboffloadmic (during gomp_map_vars, trying to allocate
target memory in the plugin), there is no thread to schedule the second half
of first target task.  So, if liboffloadmic is stuck waiting for unmap_vars,
it is a deadlock.  Can you please try to debug this?
I'll try tomorrow another testcase like target-33.c, but with
#pragma omp parallel 
#pragma omp single
around everything in main, both with OMP_NUM_THREADS=16 and 1, for 1 I would
expect it would be the same though.

--- liboffloadmic/runtime/offload_host.cpp.jj   2015-11-05 11:31:05.013916598 
+0100
+++ liboffloadmic/runtime/offload_host.cpp  2015-11-10 12:58:55.090951303 
+0100
@@ -64,6 +64,9 @@ static void __offload_fini_library(void)
 #define GET_OFFLOAD_NUMBER(timer_data) \
 timer_data? timer_data->offload_number : 0
 
+extern "C" void
+__gomp_offload_intelmic_async_completed (const void *);
+
 extern "C" {
 #ifdef TARGET_WINNT
 // Windows does not support imports from libraries without actually
@@ -2507,7 +2510,7 @@ extern "C" {
 const void *info
 )
 {
-   /* TODO: Call callback function, pass info.  */
+   __gomp_offload_intelmic_async_completed (info);
 }
 }
 
--- liboffloadmic/plugin/libgomp-plugin-intelmic.cpp.jj 2015-10-14 
10:24:10.922194230 +0200
+++ liboffloadmic/plugin/libgomp-plugin-intelmic.cpp2015-11-11 
15:48:55.428967827 +0100
@@ -192,11 +192,23 @@ GOMP_OFFLOAD_get_num_devices (void)
 
 static void
 offload (const char *file, uint64_t line, int device, const char *name,
-int num_vars, VarDesc *vars, VarDesc2 *vars2)
+int num_vars, VarDesc *vars, VarDesc2 *vars2, const void **async_data)
 {
   OFFLOAD ofld = __offload_target_acquire1 (, file, line);
   if (ofld)
-__offload_offload1 (ofld, name, 0, num_vars, vars, vars2, 0, NULL, NULL);
+{
+  if (async_data == NULL)
+   __offload_offload1 (ofld, name, 0, num_vars, vars, vars2, 0, NULL,
+   NULL);
+  else
+   {
+ OffloadFlags flags;
+ flags.flags = 0;
+ flags.bits.omp_async = 1;
+ __offload_offload3 (ofld, name, 0, num_vars, vars, NULL, 0, NULL,
+ async_data, 0, NULL, flags, NULL);
+   }
+}
   else
 {
   fprintf (stderr, "%s:%d: Offload target acquire failed\n", file, line);
@@ -218,7 +230,7 @@ GOMP_OFFLOAD_init_device (int device)
   TRACE ("");
   pthread_once (_image_is_registered, register_main_image);
   offload (__FILE__, __LINE__, device, "__offload_target_init_proc", 0,
-  NULL, NULL);
+  NULL, NULL, NULL);
 }
 
 extern "C" void
@@ -240,7 +252,7 @@ get_target_table (int device, int _f
   VarDesc2 vd1g[2] = { { "num_funcs", 0 }, { "num_vars", 0 } };
 
   offload (__FILE__, __LINE__, device, "__offload_target_table_p1", 2,
-  vd1, vd1g);
+  vd1, vd1g, NULL);
 
   int table_size = num_funcs + 2 * num_vars;
   if (table_size > 0)
@@ -254,7 +266,7 @@ get_target_table (int device, int _f
   VarDesc2 vd2g = { "table", 0 };
 
   offload (__FILE__, __LINE__, device, "__offload_target_table_p2", 1,
-  , );
+  , , NULL);
 }
 }
 
@@ -401,8 +413,8 @@ GOMP_OFFLOAD_alloc (int device, size_t s
   vd1[1].size = sizeof (void *);
   VarDesc2 vd1g[2] = { { "size", 0 }, { "tgt_ptr", 0 } };
 
-  offload (__FILE__, __LINE__, device, "__offload_target_alloc", 2, vd1, vd1g);
-
+  offload (__FILE__, __LINE__, device, "__offload_target_alloc", 2, vd1, vd1g,
+  NULL);
   return tgt_ptr;
 }
 
@@ -416,7 +428,8 @@ GOMP_OFFLOAD_free (int device, void *tgt
   vd1.size = sizeof (void *);
   VarDesc2 vd1g = { "tgt_ptr", 0 };
 
-  offload (__FILE__, __LINE__, device, "__offload_target_free", 1, , 
);
+  offload (__FILE__, __LINE__, device, "__offload_target_free", 1, , ,
+  NULL);
 }
 
 extern "C" void *
@@ -435,7 +448,7 @@ GOMP_OFFLOAD_host2dev (int device, void
   VarDesc2 vd1g[2] 

Re: [PATCH] Enable libstdc++ numeric conversions on Cygwin

2015-11-12 Thread Jonathan Wakely

On 12/11/15 12:24 -0500, Jennifer Yao wrote:

On 12/11/15 13:39 +, Jonathan Wakely wrote:


One downside of this change is that we introduce some (hopefully safe)
ODR violations, where inline functions and templates that depend on
_GLIBCXX_USE_C99_FOO might now be defined differently in C++98 and
C++11 code. Previously they had the same definition, even though in
C++11 mode the value of the _GLIBCXX_USE_C99_FOO macro might have been
sub-optimal (i.e. the C99 features were usable, but the macro said
they weren't). Those ODR violatiosn could be avoided if needed, by
always using the _GLIBCXX98_USE_C99_FOO macro in code that can be
included from either C++98 or C++11. We could still use the
_GLIBCXX11_USE_C99_FOO macro in pure C++11 code (such as the numeric
conversion functions) and get most of the benefit of this change.



This patch (relative to the previous one) would avoid the ODR
problems, by only using the C++98 macro in code that gets used in
C++98 and later, and using the _GLIBCXX11_XXX ones in code that is
never compiled as C++98 (specifically, the numeric conversion
functions).

Maybe this is a safer, more conservative change.


I haven't tested either of your patches yet (the testsuite runs
rally slowly on Cygwin T___T), but I just wanted to express my
approval of the proposed changes (more specifically, the second patch
you posted).


The second one doesn't work sadly. If you're going to test, pelase
test the first patch only.


Also, I was not aware that we had to worry about C++03 compatibility.
Sounds tedious.


It is :-\




Re: [gomp4.5] depend nowait support for target

2015-11-12 Thread Ilya Verbin
On Thu, Nov 12, 2015 at 18:58:22 +0100, Jakub Jelinek wrote:
> > Unfortunately, target-32.c fails for me using emulation mode:
> 
> I haven't managed to get it stuck yet (unlike the target-33.c one, see
> another mail), what OMP_NUM_THREADS you are using
> and how many cores/threads?

OMP_NUM_THREADS isn't set.  40 cores.

  -- Ilya


[hsa] Merged trunk revision 230248 into the hsa branch

2015-11-12 Thread Martin Jambor
Hi

I have just Merged trunk revision 230248 into the hsa branch.  I will
prepare a new submission for inclusion to trunk tomorrow.

Thanks,

Martin


Re: [gomp4.5] depend nowait support for target

2015-11-12 Thread Ilya Verbin
On Wed, Nov 11, 2015 at 17:52:22 +0100, Jakub Jelinek wrote:
> On Mon, Oct 19, 2015 at 10:47:54PM +0300, Ilya Verbin wrote:
> > So, here is what I have for now.  Attached target-29.c testcase works fine 
> > with
> > MIC emul, however I don't know how to (and where) properly check for 
> > completion
> > of async execution on target.  And, similarly, where to do unmapping after 
> > that?
> > Do we need a callback from plugin to libgomp (as far as I understood, PTX
> > runtime supports this, but HSA doesn't), or libgomp will just check for
> > ttask->is_completed in task.c?
> 
> Here is the patch updated to have a task.c defined function that the plugin
> can call upon completion of async offloading exection.

Thanks.

> The testsuite coverage will need to improve, the testcase is wrong
> (contains data races - if you want to test parallel running of two target
> regions that both touch the same var, I'd say best would be to use
> #pragma omp atomic and or in 4 in one case and 1 in another case, then
> test if result is 5 (and similarly for the other var).
> Also, with the usleeps Alex Monakov will be unhappy because PTX newlib does
> not have it, but we'll need to find some solution for that.
> 
> Another thing to work on beyond testsuite coverage (it is desirable to test
> nowait target tasks (both depend and without depend) being awaited in all
> the various waiting spots, i.e. end of parallel, barrier, taskwait, end of
> taskgroup, or if (0) task with depend clause waiting on that.
> 
> Also, I wonder what to do if #pragma omp target nowait is used outside of
> (host) parallel - when team is NULL.  All the tasking code in that case just
> executes tasks undeferred, which is fine for all but target nowait - there
> it is I'd say useful to be able to run a single host thread concurrently
> with some async offloading tasks.  So, I wonder if in that case,
> if we encounter target nowait with team == NULL, should not just create a
> dummy non-active (nthreads == 1) team, as if there was #pragma omp parallel
> if (0) starting above it and ending at program's end.  In OpenMP, the
> program's initial thread is implicitly surrounded by inactive parallel, so
> this isn't anything against the OpenMP execution model.  But we'd need to
> free the team somewhere in a destructor.
>
> Can you please try to cleanup the liboffloadmic side of this, so that
> a callback instead of hardcoded __gomp_offload_intelmic_async_completed call
> is used?

Do you mean something like the patch bellow?  I'll discuss it with liboffloadmic
maintainers.

> Can you make sure it works on XeonPhi non-emulated too?

I'm trying to do it, but it will take some time...

Unfortunately, target-32.c fails for me using emulation mode:

Program received signal SIGSEGV, Segmentation fault.
#0  0x7ff4ab1265ed in priority_list_remove (list=0x0, node=0x7ff49001afa0, 
model=MEMMODEL_RELAXED) at libgomp/priority_queue.h:422
#1  0x7ff4ab1266d9 in priority_tree_remove (type=PQ_CHILDREN, 
head=0x1883138, node=0x7ff49001afa0) at libgomp/priority_queue.c:195
#2  0x7ff4ab10fa06 in priority_queue_remove (type=PQ_CHILDREN, 
head=0x1883138, task=0x7ff49001af30, model=MEMMODEL_RELAXED) at 
libgomp/priority_queue.h:468
#3  0x7ff4ab11570d in gomp_task_maybe_wait_for_dependencies 
(depend=0x7ff49b0d9de0) at libgomp/task.c:1539
#4  0x7ff4ab11fd46 in GOMP_target_enter_exit_data (device=-1, mapnum=3, 
hostaddrs=0x7ff49b0d9dc0, sizes=0x6020b0 <.omp_data_sizes.38>, kinds=0x6020a0 
<.omp_data_kinds.39>, flags=2, depend=0x7ff49b0d9de0) at libgomp/target.c:1662
#5  0x004011f9 in main._omp_fn ()
#6  0x7ff4ab1160f3 in gomp_thread_start (xdata=0x7fffe93766a0) at 
libgomp/team.c:119
#7  0x003b07e07ee5 in start_thread () from /lib64/libpthread.so.0
#8  0x003b076f4b8d in clone () from /lib64/libc.so.6

However when I manually run commands from testsuite/libgomp.log under the same
environment, it passes.  Don't know where is the difference.

Also I tried to replace 'b = 4;' and 'b = 5;' with infinite loops, but got only
100% CPU usage in offload_target_main instead of 200%, so it seems that only one
target task is running concurrently.


diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 6da09b1..772e198 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -220,6 +220,10 @@ static void
 register_main_image ()
 {
   __offload_register_image (_target_image);
+
+  /* liboffloadmic will call GOMP_PLUGIN_target_task_completion when
+ asynchronous task on target is completed.  */
+  __offload_register_task_callback (GOMP_PLUGIN_target_task_completion);
 }
 
 /* liboffloadmic loads and runs offload_target_main on all available devices
@@ -537,13 +541,3 @@ GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars)
 
   GOMP_OFFLOAD_async_run (device, tgt_fn, tgt_vars, NULL);
 }
-
-/* Called by liboffloadmic 

Re: [gomp4.5] depend nowait support for target

2015-11-12 Thread Jakub Jelinek
On Thu, Nov 12, 2015 at 08:43:53PM +0300, Ilya Verbin wrote:
> > Can you please try to cleanup the liboffloadmic side of this, so that
> > a callback instead of hardcoded __gomp_offload_intelmic_async_completed call
> > is used?
> 
> Do you mean something like the patch bellow?  I'll discuss it with 
> liboffloadmic
> maintainers.

Yeah; though am not 100% sure if a static variable settable by some function
is the best way, in case liboffloadmic is used by more than just libgomp
itself in the same process.

> I'm trying to do it, but it will take some time...
> 
> Unfortunately, target-32.c fails for me using emulation mode:

I haven't managed to get it stuck yet (unlike the target-33.c one, see
another mail), what OMP_NUM_THREADS you are using
and how many cores/threads?
Anyway, will try to figure out something from the backtrace you've provided.

Jakub


Re: [patch] libstdc++/56158 Extend valid values of iostream bitmask types

2015-11-12 Thread Martin Sebor

On 11/12/2015 10:08 AM, Jonathan Wakely wrote:

On 12/11/15 08:48 -0700, Martin Sebor wrote:

On 11/11/2015 02:48 AM, Jonathan Wakely wrote:

As described in the PR, we have operator~ overloads defined for
enumeration types which produce values outside the range of valid
values for the type. In C++11 that can be trivially solved by giving
the enumeration types a fixed underlying type, but this code needs to
be valid in C++03 too.

This patch defines new min/max enumerators as INT_MIN/INT_MAX so that
every int value is also a valid value for the bitmask type.

Does anyone see any problems with this solution, or better solutions?


Just a minor nit that the C-style cast in the below triggers
a -Wold-style-cast warning in Clang, in case libstdc++ tries
to be Clang-warning free. Since the type of __INT_MAX__ is
int it shouldn't be necessary.

+  _S_ios_fmtflags_min = ~(int)__INT_MAX__


That's worth fixing, thanks.



Any suggestions for how to test this, given that GCC's ubsan doesn't
check for this, and we can't run the testsuite with ubsan anyway?


Use a case/switch statement with -Werror=switch-enum to make sure
all the cases are handled and none is duplicated or outside of the
valid values of the enumeration:

 void foo (ios::iostate s) {
 switch (s) {
 case badbit:
 case eofbit:
 case failbit:
 case goodbit:
 case __INT_MAX__:
 case ~__INT_MAX__: ;
   }
 }


I thought this was a great idea at first ... but -Wswitch-enum will
complain that the end, min and max enumerators are not handled (even
though __INT_MAX__ and ~__INT_MAX__ have the same values as the max
and min ones, respectively).


Hmm, I didn't see any warnings for the small test case I wrote and
still don't. Just out of curiosity, what did I miss?

enum iostate {
goodbit = 0,
eofbit,
failbit,
badbit,
max = __INT_MAX__,
min = ~__INT_MAX__
};

void foo (iostate s) {
switch (s) {
case badbit:
case eofbit:
case failbit:
case goodbit:
case __INT_MAX__:
case ~__INT_MAX__: ;
;
}
}

Martin



Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-11-12 Thread Jeff Law

On 11/12/2015 03:58 AM, Richard Biener wrote:

On Wed, Nov 11, 2015 at 9:38 PM, Jeff Law  wrote:

On 09/04/2015 11:36 AM, Ajit Kumar Agarwal wrote:


diff --git a/gcc/passes.def b/gcc/passes.def
index 6b66f8f..20ddf3d 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_ccp);
   /* After CCP we rewrite no longer addressed locals into SSA
  form if possible.  */
+  NEXT_PASS (pass_path_split);
   NEXT_PASS (pass_forwprop);
   NEXT_PASS (pass_sra_early);


I can't recall if we've discussed the location of the pass at all.  I'm
not objecting to this location, but would like to hear why you chose
this particular location in the optimization pipeline.


So returning to the question of where this would live in the optimization
pipeline and how it interacts with if-conversion and vectorization.


Note that adding passes to the early pipeline that do code duplication
is a no-no.
The early pipeline should be exclusively for things making functions
more suitable for inlining.
I'd been experimenting with moving it down in the pipeline.  It 
certainly doesn't seem to need to be in the early optimizations.   At 
some point we force latches to have single successors which spoils the 
simplistic region recognition of the path splitting pass.





The concern with moving it to late in the pipeline was that we'd miss
VRP/DCE/CSE opportunities.  I'm not sure if you're aware, but we actually
run those passes more than once.  So it would be possible to run path
splitting after if-conversion & vectorization, but before the second passs
of VRP & DOM.  But trying that seems to result in something scrambling the
loop enough that the path splitting opportunity is missed.  That might be
worth deeper investigation if we can't come up with some kind of heuristics
to fire or suppress path splitting.


As I still think it is a transform similar to tracer just put it next to that.
The CFG has changed shape significantly by that point.  So some 
adjustments would be needed.  Essentially it's no longer the latch that 
needs to be duplicated into the THEN/ELSE clauses, but the join point 
that's the predecessor of the latch.


But that's probably a good change to make anyway because we end up doing 
less damage to the overall shape of the CFG.  Essentially path splitting 
would look like creating superblocks by target duplication, and that's 
kind of what I expected this to look like all-along.





But IIRC you mentioned it should enable vectorization or so?  In this case
that's obviously too late.
The opposite.  Path splitting interferes with if-conversion & 
vectorization.  Path splitting mucks up the CFG enough that 
if-conversion won't fire and as a result vectorization is inhibited.  It 
also creates multi-latch loops, which isn't a great situation either.


It *may* be the case that dropping it that far down in the pipeline and 
making the modifications necessary to handle simple latches may in turn 
make the path splitting code play better with if-conversion and 
vectorization and avoid creation of multi-latch loops.  At least that's 
how it looks on paper when I draw out the CFG manipulations.


I'll do some experiments.

Jeff




Re: [PATCH] Enable libstdc++ numeric conversions on Cygwin

2015-11-12 Thread Jennifer Yao
> On 12/11/15 13:39 +, Jonathan Wakely wrote:
>>
>> One downside of this change is that we introduce some (hopefully safe)
>> ODR violations, where inline functions and templates that depend on
>> _GLIBCXX_USE_C99_FOO might now be defined differently in C++98 and
>> C++11 code. Previously they had the same definition, even though in
>> C++11 mode the value of the _GLIBCXX_USE_C99_FOO macro might have been
>> sub-optimal (i.e. the C99 features were usable, but the macro said
>> they weren't). Those ODR violatiosn could be avoided if needed, by
>> always using the _GLIBCXX98_USE_C99_FOO macro in code that can be
>> included from either C++98 or C++11. We could still use the
>> _GLIBCXX11_USE_C99_FOO macro in pure C++11 code (such as the numeric
>> conversion functions) and get most of the benefit of this change.
>
>
> This patch (relative to the previous one) would avoid the ODR
> problems, by only using the C++98 macro in code that gets used in
> C++98 and later, and using the _GLIBCXX11_XXX ones in code that is
> never compiled as C++98 (specifically, the numeric conversion
> functions).
>
> Maybe this is a safer, more conservative change.

I haven't tested either of your patches yet (the testsuite runs
rally slowly on Cygwin T___T), but I just wanted to express my
approval of the proposed changes (more specifically, the second patch
you posted).

Also, I was not aware that we had to worry about C++03 compatibility.
Sounds tedious.


Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models

2015-11-12 Thread Evandro Menezes

On 11/12/2015 09:39 AM, Evandro Menezes wrote:

On 11/12/2015 08:55 AM, James Greenhalgh wrote:

On Tue, Nov 10, 2015 at 11:50:12AM -0600, Evandro Menezes wrote:

2015-11-10  Evandro Menezes 

gcc/

* config/aarch64/aarch64.md (predicated): Copy attribute from
"arm.md".

This patch duplicates an attribute from arm.md so that the same
pipeline model can be used for both AArch32 and AArch64.

Bootstrapped on arm-unknown-linux-gnueabihf, aarch64-unknown-linux-gnu.

Please, commit if it's alright.

--
Evandro Menezes


 From 3b643a3c026350864713e1700dc44e4794d93809 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 9 Nov 2015 17:11:16 -0600
Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM
  pipeline models

gcc/
* config/aarch64/aarch64.md (predicated): Copy attribute from 
"arm.md".

---
  gcc/config/aarch64/aarch64.md | 5 +
  1 file changed, 5 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md 
b/gcc/config/aarch64/aarch64.md

index 6b08850..2bc2ff5 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -195,6 +195,11 @@
  ;; 1 :=: yes
  (define_attr "far_branch" "" (const_int 0))
  +;; [For compatibility with ARM in pipeline models]
+;; Attribute that specifies whether or not the instruction is executed
+;; conditionally ( != "AL"? "yes": "no").
I'm not sure this  != "AL" [...] part makes sense to me (thinking 
only
of AArch64, I'd understand it on AArch32 :) ) and we should document 
that

this is never set for AArch64. Could you respin with a slightly clearer
comment.
Since this attribute was not described in config/arm/arm.md, I felt 
that it needed to, but perhaps in its original place instead.  I agree 
that I should also point out that it's strictly for compatibility with 
AArch32 and that it never matters for AArch64.


WRT the  thing, I was referring to the opcode fields terminology 
used by ARM in its ISA documentation.  Perhaps it's unnecessary, yes?




   2015-11-12  Evandro Menezes 

   [AArch64] Add attribute for compatibility with ARM pipeline models

   gcc/

   * config/aarch64/aarch64.md (predicated): Copy attribute from
   "arm.md".
   * config/arm/arm.md (predicated): Added description.

Please, commit if it's alright.

--
Evandro Menezes

>From 3fa6a2bca8f3d2992b4607cff0afcc2d9caa96f4 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 9 Nov 2015 17:11:16 -0600
Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM
 pipeline models

gcc/
	* config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md".
	* config/arm/arm.md (predicated): Added description.
---
 gcc/config/aarch64/aarch64.md | 4 
 gcc/config/arm/arm.md | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1586256..d46f837 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -195,6 +195,10 @@
 ;; 1 :=: yes
 (define_attr "far_branch" "" (const_int 0))
 
+;; Strictly for compatibility with AArch32 in pipeline models, since AArch64 has
+;; no predicated insns.
+(define_attr "predicated" "yes,no" (const_string "no"))
+
 ;; ---
 ;; Pipeline descriptions and scheduling
 ;; ---
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 73c3088..6bda491 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -105,6 +105,9 @@
 (define_attr "fpu" "none,vfp"
   (const (symbol_ref "arm_fpu_attr")))
 
+; Predicated means that the insn form is conditionally executed based on a
+; predicate.  We default to 'no' because no Thumb patterns match this rule
+; and not all ARM insns do.
 (define_attr "predicated" "yes,no" (const_string "no"))
 
 ; LENGTH of an instruction (in bytes)
-- 
2.1.0.243.g30d45f7



Re: [PATCH] Make disabled-optimization warning more informative; increase default max-gcse-memory

2015-11-12 Thread Bradley Lucier

On 11/12/2015 11:57 AM, Bernd Schmidt wrote:

The expanded warning allowed me to see how much memory really was needed
to apply gcse to some of my routines, and 128MB fixes my problem.  The
limit has been 50MB for over 10 years, I think we can up it a bit now.
  {
+  unsigned int memory_request = n_basic_blocks_for_fn (cfun)
+* SBITMAP_SET_SIZE (max_reg_num ())
+* sizeof (SBITMAP_ELT_TYPE);
+


Formatting (wrap in parens to get it indented).


Fixed.



Otherwise ok.


Thanks.

Here's the reformatted patch (but without retesting).

I don't have check-in privileges.

Brad

* gcc/cprop.c (is_too_expensive): Remove.
(gcse.h): Include.
(one_cprop_pass): Call gcse_or_cprop_is_too_expensive, not
is_too_expensive.
* gcc/gcse.h (gcse_or_cprop_is_too_expensive): Declare.
* gcc/gcse.c (is_too_expensive): Rename to ...
(gcse_or_cprop_is_too_expensive): ... this.
Expand warning to add required size of max-gcse-memory.
(one_pre_gcse_pass): Use it.
(one_code_hoisting_pass): Use it.
* gcc/params.def (max-gcse-memory): Increase from 50MB to 128MB.

Index: gcc/gcse.c
===
--- gcc/gcse.c	(revision 230136)
+++ gcc/gcse.c	(working copy)
@@ -510,7 +510,6 @@
 static void update_ld_motion_stores (struct gcse_expr *);
 static void clear_modify_mem_tables (void);
 static void free_modify_mem_tables (void);
-static bool is_too_expensive (const char *);
 
 #define GNEW(T)			((T *) gmalloc (sizeof (T)))
 #define GCNEW(T)		((T *) gcalloc (1, sizeof (T)))
@@ -2565,7 +2564,7 @@
 
   /* Return if there's nothing to do, or it is too expensive.  */
   if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1
-  || is_too_expensive (_("PRE disabled")))
+  || gcse_or_cprop_is_too_expensive (_("PRE disabled")))
 return 0;
 
   /* We need alias.  */
@@ -3493,7 +3492,7 @@
 
   /* Return if there's nothing to do, or it is too expensive.  */
   if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1
-  || is_too_expensive (_("GCSE disabled")))
+  || gcse_or_cprop_is_too_expensive (_("GCSE disabled")))
 return 0;
 
   doing_code_hoisting_p = true;
@@ -3957,9 +3956,13 @@
 /* Return true if the graph is too expensive to optimize. PASS is the
optimization about to be performed.  */
 
-static bool
-is_too_expensive (const char *pass)
+bool
+gcse_or_cprop_is_too_expensive (const char *pass)
 {
+  unsigned int memory_request = (n_basic_blocks_for_fn (cfun)
+ * SBITMAP_SET_SIZE (max_reg_num ())
+ * sizeof (SBITMAP_ELT_TYPE));
+  
   /* Trying to perform global optimizations on flow graphs which have
  a high connectivity will take a long time and is unlikely to be
  particularly useful.
@@ -3981,13 +3984,12 @@
 
   /* If allocating memory for the dataflow bitmaps would take up too much
  storage it's better just to disable the optimization.  */
-  if ((n_basic_blocks_for_fn (cfun)
-   * SBITMAP_SET_SIZE (max_reg_num ())
-   * sizeof (SBITMAP_ELT_TYPE)) > MAX_GCSE_MEMORY)
+  if (memory_request > MAX_GCSE_MEMORY)
 {
   warning (OPT_Wdisabled_optimization,
-	   "%s: %d basic blocks and %d registers",
-	   pass, n_basic_blocks_for_fn (cfun), max_reg_num ());
+	   "%s: %d basic blocks and %d registers; increase --param max-gcse-memory above %d",
+	   pass, n_basic_blocks_for_fn (cfun), max_reg_num (),
+	   memory_request);
 
   return true;
 }
Index: gcc/gcse.h
===
--- gcc/gcse.h	(revision 230136)
+++ gcc/gcse.h	(working copy)
@@ -40,5 +40,6 @@
 #endif
 
 void gcse_c_finalize (void);
+extern bool gcse_or_cprop_is_too_expensive (const char *);
 
 #endif
Index: gcc/cprop.c
===
--- gcc/cprop.c	(revision 230136)
+++ gcc/cprop.c	(working copy)
@@ -39,6 +39,7 @@
 #include "tree-pass.h"
 #include "dbgcnt.h"
 #include "cfgloop.h"
+#include "gcse.h"
 
 
 /* An obstack for our working variables.  */
@@ -1724,47 +1725,6 @@
   return changed;
 }
 
-/* Return true if the graph is too expensive to optimize.  PASS is the
-   optimization about to be performed.  */
-
-static bool
-is_too_expensive (const char *pass)
-{
-  /* Trying to perform global optimizations on flow graphs which have
- a high connectivity will take a long time and is unlikely to be
- particularly useful.
-
- In normal circumstances a cfg should have about twice as many
- edges as blocks.  But we do not want to punish small functions
- which have a couple switch statements.  Rather than simply
- threshold the number of blocks, uses something with a more
- graceful degradation.  */
-  if (n_edges_for_fn (cfun) > 2 + n_basic_blocks_for_fn (cfun) * 4)
-{
-  warning (OPT_Wdisabled_optimization,
-	   "%s: %d basic blocks and %d edges/basic block",
-	   pass, n_basic_blocks_for_fn 

Re: [patch] libstdc++/56158 Extend valid values of iostream bitmask types

2015-11-12 Thread Jonathan Wakely

On 12/11/15 08:48 -0700, Martin Sebor wrote:

On 11/11/2015 02:48 AM, Jonathan Wakely wrote:

As described in the PR, we have operator~ overloads defined for
enumeration types which produce values outside the range of valid
values for the type. In C++11 that can be trivially solved by giving
the enumeration types a fixed underlying type, but this code needs to
be valid in C++03 too.

This patch defines new min/max enumerators as INT_MIN/INT_MAX so that
every int value is also a valid value for the bitmask type.

Does anyone see any problems with this solution, or better solutions?


Just a minor nit that the C-style cast in the below triggers
a -Wold-style-cast warning in Clang, in case libstdc++ tries
to be Clang-warning free. Since the type of __INT_MAX__ is
int it shouldn't be necessary.

+  _S_ios_fmtflags_min = ~(int)__INT_MAX__


That's worth fixing, thanks.



Any suggestions for how to test this, given that GCC's ubsan doesn't
check for this, and we can't run the testsuite with ubsan anyway?


Use a case/switch statement with -Werror=switch-enum to make sure
all the cases are handled and none is duplicated or outside of the
valid values of the enumeration:

 void foo (ios::iostate s) {
 switch (s) {
 case badbit:
 case eofbit:
 case failbit:
 case goodbit:
 case __INT_MAX__:
 case ~__INT_MAX__: ;
   }
 }


I thought this was a great idea at first ... but -Wswitch-enum will
complain that the end, min and max enumerators are not handled (even
though __INT_MAX__ and ~__INT_MAX__ have the same values as the max
and min ones, respectively).

We could use something like this:

 switch(f()) {
 case ~ios::iostate():
 default:
   break;
 }

Which with no warning options at all (or with -Wno-switch to suppress
other warnings) gives this for the buggy code currently on trunk:

e.cc:13:2: warning: case label value is less than minimum value for type
 case ~ios::iostate():
 ^

But that could break if someone fixes that warning to depend on
-Wswitch (which it probably should do). And with -Wswitch that gives a
different warning:

e.cc:10:3: warning: case value ‘-1’ not in enumerated type ‘E’ [-Wswitch]

So I think I'll just leave the tests as they are, and rely on our
diligent Clang users to report if there's still a problem :-)

I've committed the attached patch to trunk.

commit 2ec414f95401368dc532b3fc6155ed7be41edb8e
Author: Jonathan Wakely 
Date:   Thu Nov 12 16:16:20 2015 +

Extend valid values of iostream bitmask types

	PR libstdc++/56158
	* include/bits/ios_base.h (_Ios_Fmtflags, _Ios_Openmode, _Ios_Iostate):
	Define enumerators to ensure all values of type int are valid values
	of the enumeration type.
	* testsuite/27_io/ios_base/types/fmtflags/case_label.cc: Add new cases.
	* testsuite/27_io/ios_base/types/iostate/case_label.cc: Likewise.
	* testsuite/27_io/ios_base/types/openmode/case_label.cc: Likewise.

diff --git a/libstdc++-v3/include/bits/ios_base.h b/libstdc++-v3/include/bits/ios_base.h
index 44029ad..908ba7c 100644
--- a/libstdc++-v3/include/bits/ios_base.h
+++ b/libstdc++-v3/include/bits/ios_base.h
@@ -74,7 +74,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _S_adjustfield 	= _S_left | _S_right | _S_internal,
   _S_basefield 	= _S_dec | _S_oct | _S_hex,
   _S_floatfield 	= _S_scientific | _S_fixed,
-  _S_ios_fmtflags_end = 1L << 16 
+  _S_ios_fmtflags_end = 1L << 16,
+  _S_ios_fmtflags_max = __INT_MAX__,
+  _S_ios_fmtflags_min = ~__INT_MAX__
 };
 
   inline _GLIBCXX_CONSTEXPR _Ios_Fmtflags
@@ -114,7 +116,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _S_in 		= 1L << 3,
   _S_out 		= 1L << 4,
   _S_trunc 		= 1L << 5,
-  _S_ios_openmode_end = 1L << 16 
+  _S_ios_openmode_end = 1L << 16,
+  _S_ios_openmode_max = __INT_MAX__,
+  _S_ios_openmode_min = ~__INT_MAX__
 };
 
   inline _GLIBCXX_CONSTEXPR _Ios_Openmode
@@ -152,7 +156,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _S_badbit 		= 1L << 0,
   _S_eofbit 		= 1L << 1,
   _S_failbit		= 1L << 2,
-  _S_ios_iostate_end = 1L << 16 
+  _S_ios_iostate_end = 1L << 16,
+  _S_ios_iostate_max = __INT_MAX__,
+  _S_ios_iostate_min = ~__INT_MAX__
 };
 
   inline _GLIBCXX_CONSTEXPR _Ios_Iostate
diff --git a/libstdc++-v3/testsuite/27_io/ios_base/types/fmtflags/case_label.cc b/libstdc++-v3/testsuite/27_io/ios_base/types/fmtflags/case_label.cc
index 591e371..e8820c5 100644
--- a/libstdc++-v3/testsuite/27_io/ios_base/types/fmtflags/case_label.cc
+++ b/libstdc++-v3/testsuite/27_io/ios_base/types/fmtflags/case_label.cc
@@ -70,5 +70,9 @@ case_labels(bitmask_type b)
   break;
 case std::_S_ios_fmtflags_end:
   break;
+case std::_S_ios_fmtflags_min:
+  break;
+case std::_S_ios_fmtflags_max:
+  break;
 }
 }
diff --git a/libstdc++-v3/testsuite/27_io/ios_base/types/iostate/case_label.cc b/libstdc++-v3/testsuite/27_io/ios_base/types/iostate/case_label.cc
index 

Re: [PATCH 04/N] Fix big memory leak in ix86_valid_target_attribute_p

2015-11-12 Thread Martin Liška
On 11/12/2015 04:58 PM, Ramana Radhakrishnan wrote:
> 
> 
> On 12/11/15 15:52, Martin Liška wrote:
>> On 11/12/2015 12:29 PM, Richard Biener wrote:
>>> On Thu, Nov 12, 2015 at 11:03 AM, Martin Liška  wrote:
 Hello.

 Following patch was a bit negotiated with Jakub and can save a huge amount 
 of memory in cases
 where target attributes are heavily utilized.

 Can bootstrap and survives regression tests on x86_64-linux-pc.

 Ready for trunk?
>>>
>>> +static bool opts_obstack_initialized = false;
>>> +
>>> +/* Initialize opts_obstack if not initialized.  */
>>> +
>>> +void
>>> +init_opts_obstack (void)
>>> +{
>>> +  if (!opts_obstack_initialized)
>>> +{
>>> +  opts_obstack_initialized = true;
>>> +  gcc_obstack_init (_obstack);
>>>
>>> you can move the static global to function scope.
>>>
>>> Ok with that change.
>>
>> Done and installed as r230264. Final version of the patch is attached.
>>
>>>
>>> Btw, don't other targets need a similar adjustment to their hook?
>>> Grepping shows arm and nios2.
>>
>> nios2 is not the case as it doesn't utilize:
>>   init_options_struct (_options, NULL);
>>
>> I've been testing patch for aarch64 that is also included in the email.
> 
> The change is also needed in config/aarch64/aarch64.c 
> (aarch64_option_valid_attribute_p). The attached patch is for arm i.e. 32 bit 
> arm.
> 
> Ramana

You are right that the change is for arm32, I wrongly wrote aarch64. However if 
you read
aarch64_option_valid_attribute_p, there is no init_options_struct 
(_options, NULL).

So that, I'm going to test on an arm32 machine.

Martin

> 
>>
>> Martin
>>
>>>
>>> Thanks,
>>> Richard.
>>>
>>>
 Thanks,
 Martin
>>



Re: [PATCH, 4/16] Implement -foffload-alias

2015-11-12 Thread Tom de Vries

On 11/11/15 12:00, Jakub Jelinek wrote:

On Wed, Nov 11, 2015 at 11:51:02AM +0100, Richard Biener wrote:

The option -foffload-alias=pointer instructs the compiler to assume that
objects references in an offload region do not alias.

The option -foffload-alias=all instructs the compiler to make no
assumptions about aliasing in offload regions.

The default value is -foffload-alias=none.


I think global options for this is nonsense.  Please follow what
we do for #pragma GCC ivdep for example, thus allow the alias
behavior to be specified per "region" (whatever makes sense here
in the context of offloading).


So, IIUC, instead of a global option foffload-alias, you're saying 
something like the following would be acceptable:

...
#pragma GCC offload-alias=
#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
  {
#pragma acc loop
for (COUNTERTYPE ii = 0; ii < N; ii++)
  c[ii] = a[ii] + b[ii];
  }
...
?

I suppose that would work (though a global option would allow us to 
easily switch between none/pointer/all values in a large number of 
files, something that might be useful when f.i. running an openacc  test 
suite).



Yeah, completely agreed.  I don't see why the offloaded region would be in
any way special, they are C/C++/Fortran code as any other.
What we can and should improve is teach IPA aliasing/points to analysis
about the way we lower the host vs. offloading region boundary, so that
if alias analysis on the caller of GOMP_target_ext/GOACC_parallel_keyed
determines something it can be used on the offloaded function side and vice
versa,


I agree this would be a nice way to solve the aliasing info problem, but 
considering the remark of Richard at 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46032#c19 :

...
Not that I think IPA PTA is anywhere near production ready
...
I haven't considered proceeding in that direction.

Thanks,
- Tom


but a switch like the above is just wrong.




Re: [v3 PATCH] LWG 2510, make the default constructors of library tag types explicit.

2015-11-12 Thread Jonathan Wakely

On 12/11/15 14:36 +, Jonathan Wakely wrote:

On 12/11/15 15:23 +0100, Gerald Pfeifer wrote:

On Wed, 11 Nov 2015, Jonathan Wakely wrote:

Fixed by this patch.


Thanks, Jonathan!  Unfortunately bootstrap is still broken
(on i386-unknown-freebsd11.0 at least):


Different issue.

In file included from 
/scratch/tmp/gerald/gcc-HEAD/libstdc++-v3/src/c++11/thread.cc:27:0:

/scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
thread: In function ‘void std::this_thread::sleep_for(const 
std::chrono::duration<_Rep1, _Period1>&)’:
/scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
thread:300:44: error: ‘errno’ was not declared in this scope
while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)
 ^
/scratch/tmp/gerald/OBJ-1112-1414/i386-unknown-freebsd10.2/libstdc++-v3/include/
thread:300:53: error: ‘EINTR’ was not declared in this scope
while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)


Does adding #include  to libstdc++-v3/include/std/thread
solve it?


Committed to trunk.


commit ede84363f2a4374b0d16ffda19fbcffdc44221c3
Author: Jonathan Wakely 
Date:   Thu Nov 12 15:21:24 2015 +

	* include/std/thread: Include  for EINTR.

diff --git a/libstdc++-v3/include/std/thread b/libstdc++-v3/include/std/thread
index 5940e6e..8c01feb 100644
--- a/libstdc++-v3/include/std/thread
+++ b/libstdc++-v3/include/std/thread
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 


[PATCH] Make disabled-optimization warning more informative; increase default max-gcse-memory

2015-11-12 Thread Bradley Lucier
This patch (a) removes an exact copy of is_too_expensive from cprop.c, 
(b) renames is_too_expensive in gcse.c to 
gcse_or_cprop_is_too_expensive, (c) expands the warning in 
gcse_or_cprop_is_too_expensive to say how much --param max-gcse-memory 
needs to be increased, and (d) increases the default max-gcse-memory 
from 50MB to 128MB.


The expanded warning allowed me to see how much memory really was needed 
to apply gcse to some of my routines, and 128MB fixes my problem.  The 
limit has been 50MB for over 10 years, I think we can up it a bit now.


Bootstrapped and checked, default languages, no regressions.

Brad

* gcc/cprop.c (is_too_expensive): Remove.
(gcse.h): Include.
(one_cprop_pass): Call gcse_or_cprop_is_too_expensive, not
is_too_expensive.
* gcc/gcse.h (gcse_or_cprop_is_too_expensive): Declare.
* gcc/gcse.c (is_too_expensive): Rename to ...
(gcse_or_cprop_is_too_expensive): ... this.
Expand warning to add required size of max-gcse-memory.
(one_pre_gcse_pass): Use it.
(one_code_hoisting_pass): Use it.
* gcc/params.def (max-gcse-memory): Increase from 50MB to 128MB.


Index: gcc/gcse.c
===
--- gcc/gcse.c	(revision 230136)
+++ gcc/gcse.c	(working copy)
@@ -510,7 +510,6 @@
 static void update_ld_motion_stores (struct gcse_expr *);
 static void clear_modify_mem_tables (void);
 static void free_modify_mem_tables (void);
-static bool is_too_expensive (const char *);
 
 #define GNEW(T)			((T *) gmalloc (sizeof (T)))
 #define GCNEW(T)		((T *) gcalloc (1, sizeof (T)))
@@ -2565,7 +2564,7 @@
 
   /* Return if there's nothing to do, or it is too expensive.  */
   if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1
-  || is_too_expensive (_("PRE disabled")))
+  || gcse_or_cprop_is_too_expensive (_("PRE disabled")))
 return 0;
 
   /* We need alias.  */
@@ -3493,7 +3492,7 @@
 
   /* Return if there's nothing to do, or it is too expensive.  */
   if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1
-  || is_too_expensive (_("GCSE disabled")))
+  || gcse_or_cprop_is_too_expensive (_("GCSE disabled")))
 return 0;
 
   doing_code_hoisting_p = true;
@@ -3957,9 +3956,13 @@
 /* Return true if the graph is too expensive to optimize. PASS is the
optimization about to be performed.  */
 
-static bool
-is_too_expensive (const char *pass)
+bool
+gcse_or_cprop_is_too_expensive (const char *pass)
 {
+  unsigned int memory_request = n_basic_blocks_for_fn (cfun)
+* SBITMAP_SET_SIZE (max_reg_num ())
+* sizeof (SBITMAP_ELT_TYPE);
+  
   /* Trying to perform global optimizations on flow graphs which have
  a high connectivity will take a long time and is unlikely to be
  particularly useful.
@@ -3981,13 +3984,12 @@
 
   /* If allocating memory for the dataflow bitmaps would take up too much
  storage it's better just to disable the optimization.  */
-  if ((n_basic_blocks_for_fn (cfun)
-   * SBITMAP_SET_SIZE (max_reg_num ())
-   * sizeof (SBITMAP_ELT_TYPE)) > MAX_GCSE_MEMORY)
+  if (memory_request > MAX_GCSE_MEMORY)
 {
   warning (OPT_Wdisabled_optimization,
-	   "%s: %d basic blocks and %d registers",
-	   pass, n_basic_blocks_for_fn (cfun), max_reg_num ());
+	   "%s: %d basic blocks and %d registers; increase --param max-gcse-memory above %d",
+	   pass, n_basic_blocks_for_fn (cfun), max_reg_num (),
+	   memory_request);
 
   return true;
 }
Index: gcc/gcse.h
===
--- gcc/gcse.h	(revision 230136)
+++ gcc/gcse.h	(working copy)
@@ -40,5 +40,6 @@
 #endif
 
 void gcse_c_finalize (void);
+extern bool gcse_or_cprop_is_too_expensive (const char *);
 
 #endif
Index: gcc/cprop.c
===
--- gcc/cprop.c	(revision 230136)
+++ gcc/cprop.c	(working copy)
@@ -39,6 +39,7 @@
 #include "tree-pass.h"
 #include "dbgcnt.h"
 #include "cfgloop.h"
+#include "gcse.h"
 
 
 /* An obstack for our working variables.  */
@@ -1724,47 +1725,6 @@
   return changed;
 }
 
-/* Return true if the graph is too expensive to optimize.  PASS is the
-   optimization about to be performed.  */
-
-static bool
-is_too_expensive (const char *pass)
-{
-  /* Trying to perform global optimizations on flow graphs which have
- a high connectivity will take a long time and is unlikely to be
- particularly useful.
-
- In normal circumstances a cfg should have about twice as many
- edges as blocks.  But we do not want to punish small functions
- which have a couple switch statements.  Rather than simply
- threshold the number of blocks, uses something with a more
- graceful degradation.  */
-  if (n_edges_for_fn (cfun) > 2 + n_basic_blocks_for_fn (cfun) * 4)
-{
-  warning (OPT_Wdisabled_optimization,
-	   "%s: %d basic blocks and %d edges/basic block",
-	   

Gimple loop splitting

2015-11-12 Thread Michael Matz
Hello,

this new pass implements loop iteration space splitting for loops that 
contain a conditional that's always true for one part of the iteration 
space and false for the other, i.e. such situations:

  for (i = beg; i < end; i++)
if (i < p)
  dothis();
else
  dothat();

this is transformed into roughly:

  for (i = beg; i < p; i++)
dothis();
  for (; i < end; i++)
dothat();

Of course, not quite the above as there needs to be provisions for the 
border conditions, if e.g. 'p' is outside the original iteration space, or 
the conditional doesn't directly use the control IV, but some other, or 
the IV runs backwards.  The testcase checks many of these border 
conditions.

This transformation is in itself a good one but can also be an enabler for 
the vectorizer.  It does increase code size, when the loop body contains 
also unconditional code (that one is duplicated), so we only transform hot 
loops.  I'm a bit unsure of the placement of the new pass, or if it should 
be an own pass at all.  Right now I've placed it after unswitching and 
scev_cprop, before loop distribution.  Ideally I think all three, together 
with loop fusion and an gimple unroller should be integrated into one loop 
nest optimizer, alas, we aren't there yet.

I'm planning to work on loop fusion in the future as well, but that's not 
for GCC 6.

I've regstrapped this pass enabled with -O2 on x86-64-linux, without 
regressions.  I've also checked cpu2006 (the non-fortran part) for 
correctness, not yet for performance.  In the end it should probably only 
be enabled for -O3+ (although if the whole loop body is conditional it 
makes sense to also have it with -O2 because code growth is very small 
then).

So, okay for trunk?


Ciao,
Michael.
* passes.def (pass_loop_split): Add.
* timevar.def (TV_LOOP_SPLIT): Add.
* tree-pass.h (make_pass_loop_split): Declare.
* tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
* tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
cfganal.h, tree-chrec.h, tree-affine.h, tree-scalar-evolution.h,
gimple-pretty-print.h, gimple-fold.h, gimplify-me.h.
(split_at_bb_p, patch_loop_exit, find_or_create_guard_phi,
split_loop, tree_ssa_split_loops,
make_pass_loop_split): New functions.
(pass_data_loop_split): New.
(pass_loop_split): New.

testsuite/
* gcc.dg/loop-split.c: New test.

Index: passes.def
===
--- passes.def  (revision 229763)
+++ passes.def  (working copy)
@@ -233,6 +233,7 @@ along with GCC; see the file COPYING3.
  NEXT_PASS (pass_dce);
  NEXT_PASS (pass_tree_unswitch);
  NEXT_PASS (pass_scev_cprop);
+ NEXT_PASS (pass_loop_split);
  NEXT_PASS (pass_record_bounds);
  NEXT_PASS (pass_loop_distribution);
  NEXT_PASS (pass_copy_prop);
Index: timevar.def
===
--- timevar.def (revision 229763)
+++ timevar.def (working copy)
@@ -179,6 +179,7 @@ DEFTIMEVAR (TV_LIM   , "
 DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv")
 DEFTIMEVAR (TV_SCEV_CONST, "scev constant prop")
 DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH, "tree loop unswitching")
+DEFTIMEVAR (TV_LOOP_SPLIT, "loop splitting")
 DEFTIMEVAR (TV_COMPLETE_UNROLL   , "complete unrolling")
 DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
 DEFTIMEVAR (TV_TREE_VECTORIZATION, "tree vectorization")
Index: tree-pass.h
===
--- tree-pass.h (revision 229763)
+++ tree-pass.h (working copy)
@@ -366,6 +366,7 @@ extern gimple_opt_pass *make_pass_tree_n
 extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
Index: tree-ssa-loop-manip.h
===
--- tree-ssa-loop-manip.h   (revision 229763)
+++ tree-ssa-loop-manip.h   (working copy)
@@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
 
 extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
   bool, tree *, tree *);
+extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
+   struct loop *);
 extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
 extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
 extern void verify_loop_closed_ssa (bool);
Index: 

[PATCH] Avoid false vector mask conversion

2015-11-12 Thread Ilya Enkovich
Hi,

When we use LTO for fortran we may have a mix 32bit and 1bit scalar booleans. 
It means we may have conversion of one scalar type to another which confuses 
vectorizer because values with different scalar boolean type may get the same 
vectype.  This patch transforms such conversions into comparison.

I managed to make a small fortran test which gets vectorized with this patch 
but I didn't find how I can run fortran test with LTO and then scan tree dump 
to check it is vectorized.  BTW here is a loop from the test:

  real*8 a(18)
  logical b(18)
  integer i

  do i=1,18
 if(a(i).gt.0.d0) then
b(i)=.true.
 else
b(i)=.false.
 endif
  enddo

Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?

Thanks,
Ilya
--
gcc/

2015-11-12  Ilya Enkovich  

* tree-vect-patterns.c (vect_recog_mask_conversion_pattern):
Transform useless boolean conversion into assignment.


diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index b9d900c..62070da 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -3674,6 +3674,38 @@ vect_recog_mask_conversion_pattern (vec 
*stmts, tree *type_in,
   if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE)
 return NULL;
 
+  /* Check conversion between boolean types of different sizes.
+ If no vectype is specified, then we have a regular mask
+ assignment with no actual conversion.  */
+  if (rhs_code == CONVERT_EXPR
+  && !STMT_VINFO_DATA_REF (stmt_vinfo)
+  && !STMT_VINFO_VECTYPE (stmt_vinfo))
+{
+  if (TREE_CODE (rhs1) != SSA_NAME)
+   return NULL;
+
+  rhs1_type = search_type_for_mask (rhs1, vinfo);
+  if (!rhs1_type)
+   return NULL;
+
+  vectype1 = get_mask_type_for_scalar_type (rhs1_type);
+
+  if (!vectype1)
+   return NULL;
+
+  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+  pattern_stmt = gimple_build_assign (lhs, rhs1);
+
+  *type_out = vectype1;
+  *type_in = vectype1;
+  stmts->safe_push (last_stmt);
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+ "vect_recog_mask_conversion_pattern: detected:\n");
+
+  return pattern_stmt;
+}
+
   if (rhs_code != BIT_IOR_EXPR
   && rhs_code != BIT_XOR_EXPR
   && rhs_code != BIT_AND_EXPR)


Re: [gomp4] remove c++ reference restriction

2015-11-12 Thread Thomas Schwinge
Hi Nathan!

On Thu, 12 Nov 2015 09:03:42 -0500, Nathan Sidwell  wrote:
> I've applied this to gomp4 branch.  It removes the machinery concerning c++ 
> references.  The openacc std makes no  mention of such a type, so originally 
> we 
> were not permitting the type. But,
> (a) OpenMP supports them, which suggests openacc wishes to
> (b) Fortran already has reference types that need supporting
> (c) it's more work to not support them, by modifying the mappable_type hook.

ACK.  "For reference", your and Cesar's original reasoning had been that
the OpenACC 2.0a specification doesn't talk about reference types
specifically, and they're non-POD, so not supported, which lead to
Cesar's patch that I applied to gomp-4_0-branch in r223179,
.
Now, if they "just work", that's even better of course.

> 2015-11-12  Nathan Sidwell  

That was incomplete however, so we got a testsuite regression because of:

[...]/gcc/testsuite/g++.dg/goacc/reference.C: In function 'int test1(int&)':
[...]/gcc/testsuite/g++.dg/goacc/reference.C:7:27: error: reference types 
are not supported in OpenACC
[...]/gcc/testsuite/g++.dg/goacc/reference.C: In function 'int main()':
[...]/gcc/testsuite/g++.dg/goacc/reference.C:36:33: error: reference types 
are not supported in OpenACC

Fixed on gomp-4_0-branch in r230265:

commit 6ec26341420b3ced11af5adbe4643ff6cfaf99eb
Author: tschwinge 
Date:   Thu Nov 12 16:09:47 2015 +

Complete reversion of "Prohibit C++ reference types in OpenACC regions"

gcc/cp/
* semantics.c (finish_omp_clauses): Remove "reference types are
not supported in OpenACC" diagnostic.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@230265 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/cp/ChangeLog.gomp | 5 +
 gcc/cp/semantics.c| 9 ++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git gcc/cp/ChangeLog.gomp gcc/cp/ChangeLog.gomp
index 1f19b39..e4d000d 100644
--- gcc/cp/ChangeLog.gomp
+++ gcc/cp/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2015-11-12  Thomas Schwinge  
+
+   * semantics.c (finish_omp_clauses): Remove "reference types are
+   not supported in OpenACC" diagnostic.
+
 2015-11-12  Nathan Sidwell  
 
* semantics.c (finish_ommp_clauses): Adjust omp_mappable_type calls.
diff --git gcc/cp/semantics.c gcc/cp/semantics.c
index 34ccec6..ac3cb66 100644
--- gcc/cp/semantics.c
+++ gcc/cp/semantics.c
@@ -6552,11 +6552,6 @@ finish_omp_clauses (tree clauses, bool is_oacc, bool 
allow_fields,
}
  break;
}
- if (is_oacc && TREE_CODE (TREE_TYPE (t)) == REFERENCE_TYPE)
-   {
- error_at (OMP_CLAUSE_LOCATION (c),
-   "reference types are not supported in OpenACC");
-   }
  if (t == error_mark_node)
{
  remove = true;
@@ -6648,10 +6643,10 @@ finish_omp_clauses (tree clauses, bool is_oacc, bool 
allow_fields,
 == GOMP_MAP_FIRSTPRIVATE_POINTER)))
   && t == OMP_CLAUSE_DECL (c)
   && !type_dependent_expression_p (t)
-  && !cp_omp_mappable_type (((TREE_CODE (TREE_TYPE (t))
+  && !cp_omp_mappable_type ((TREE_CODE (TREE_TYPE (t))
  == REFERENCE_TYPE)
 ? TREE_TYPE (TREE_TYPE (t))
-: TREE_TYPE (t
+: TREE_TYPE (t)))
{
  error_at (OMP_CLAUSE_LOCATION (c),
"%qD does not have a mappable type in %qs clause", t,


Grüße
 Thomas


signature.asc
Description: PGP signature


Re: [PATCH] [ARM/Aarch64] add initial Qualcomm support

2015-11-12 Thread Ramana Radhakrishnan
On Wed, Nov 11, 2015 at 6:34 PM, Jim Wilson  wrote:
> This adds an option for the Qualcomm server parts, qdf24xx, just
> optimizing like a cortex-a57 for now, same as how the initial Samsung
> exynos-m1 support worked.
>
> This was tested with armv8 and aarch64 bootstraps and make check.
>
> I had to disable the cortex-a57 fma steering pass in the aarch64 port
> while testing the patch.  A bootstrap for aarch64 configured
> --with-cpu=cortex-a57 gives multiple ICEs while building the stage1
> libstdc++.  The ICEs are in scan_rtx_reg at regrename.c:1074.  This
> looks vaguely similar to PR 66785.
>
> I am also seeing extra make check failures due to ICEs with armv8
> bootstrap builds configured --with-cpu=cortex-a57,  I see ICEs in
> scan_rtx_reg in regrename, and ICEs in decompose_normal_address in
> rtlanal.c.

The failures are due to the usual madness in the last week of stage1.

> The arm port doesn't have the fma steering support, which
> seems odd, and is maybe a bug, so it isn't clear what is causing this
> problem.

The ARM port does not require this. The FMA steering issue is only
relevant in AArch64 ISA state.

>
> I plan to look at these aarch64 and armv8 failures next, including PR
> 66785.  None of these have anything to do with my patch, as they
> trigger for cortex-a57 which is already supported.


This is OK to go in with a follow up to handle this cpu in t-aprofile
similar to the other cpus in there - for bonus points please deal with
the exynos core at the same time if not already done.


regards
Ramana

>
> Jim


Re: [v3 PATCH] LWG 2510, make the default constructors of library tag types explicit.

2015-11-12 Thread Gerald Pfeifer
On Thu, 12 Nov 2015, Ville Voutilainen wrote:
> Note that that's a separate problem that has nothing to do with the
> tag-type-explicit-default-ctor patch.

On Thu, 12 Nov 2015, Jonathan Wakely wrote:
> Different issue.

Sorry, I had two different libstdc++ bootstrap failures in the
last 24 hours, and misassociated this one.

> Does adding #include  to libstdc++-v3/include/std/thread
> solve it?

Yep, that worked.

On Thu, 12 Nov 2015, Jonathan Wakely wrote:
> Committed to trunk.

Thanks!

Gerald


Re: [PATCH] Make disabled-optimization warning more informative; increase default max-gcse-memory

2015-11-12 Thread Bernd Schmidt

The expanded warning allowed me to see how much memory really was needed
to apply gcse to some of my routines, and 128MB fixes my problem.  The
limit has been 50MB for over 10 years, I think we can up it a bit now.
  {
+  unsigned int memory_request = n_basic_blocks_for_fn (cfun)
+* SBITMAP_SET_SIZE (max_reg_num ())
+* sizeof (SBITMAP_ELT_TYPE);
+


Formatting (wrap in parens to get it indented). Could also move the 
initialization before the use.


Otherwise ok.


Bernd



Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-11-12 Thread Jeff Law

On 11/12/2015 10:05 AM, Jeff Law wrote:

But IIRC you mentioned it should enable vectorization or so?  In this
case
that's obviously too late.

The opposite.  Path splitting interferes with if-conversion &
vectorization.  Path splitting mucks up the CFG enough that
if-conversion won't fire and as a result vectorization is inhibited.  It
also creates multi-latch loops, which isn't a great situation either.

It *may* be the case that dropping it that far down in the pipeline and
making the modifications necessary to handle simple latches may in turn
make the path splitting code play better with if-conversion and
vectorization and avoid creation of multi-latch loops.  At least that's
how it looks on paper when I draw out the CFG manipulations.

I'll do some experiments.
It doesn't look too terrible to ravamp the recognition code to work 
later in the pipeline with simple latches.  Sadly that doesn't seem to 
have fixed the bad interactions with if-conversion.


*But* that does open up the possibility of moving the path splitting 
pass even deeper in the pipeline -- in particular we can move it past 
the vectorizer.  Which is may be a win.


So the big question is whether or not we'll still see enough benefits 
from having it so late in the pipeline.  It's still early enough that we 
get DOM, VRP, reassoc, forwprop, phiopt, etc.


Ajit, I'll pass along an updated patch after doing some more testing.

Jeff




Re: [PATCH 6/6] Make SRA replace constant-pool loads

2015-11-12 Thread Alan Lawrence
On 06/11/15 16:29, Richard Biener wrote:
>>> 2) You should be able to use fold_ctor_reference directly (in place
>> of
>>> all your code
>>> in case offset and size are readily available - don't remember
>> exactly how
>>> complete scalarization "walks" elements).  Alternatively use
>>> fold_const_aggregate_ref.

Well, yes I was able to remove subst_constant_pool_initial by using
fold_ctor_reference, with some fairly strict checks about the access expressions
found while scanning the input. I still needed to either deep-scan the constant
value itself, or allow completely_scalarize to fail, due to Ada c460010 where
a variable has DECL_INITIAL: {VIEW_CONVERT_EXPR({1, 2, 3})}
(that is, a CONSTRUCTOR whose only element is a V.C.E. of another CONSTRUCTOR).

However:

> I tried to suggest creating piecewise loads from the constant pool as opposed 
> to the aggregate one.  But I suppose the difficulty is to determine 'pieces', 
> not their values (that followup passes and folding can handle quite 
> efficiently).

I've now got this working, and it simplifies the patch massively :).

We now replace e.g. a constant-pool load a = *.LC0, with a series of loads e.g.
SR_a1 = SRlc0_1
SR_a2 = SRlc0_2
etc. etc. (well those wouldn't be quite the names, but you get the idea.)

And then at the start of the function we initialise
SRlc0_1 = *.LC0[0]
SRlc0_2 = *.LC0[1]
etc. etc.
- I needed to use SRlc0_1 etc. variables because some of the constant-pool loads
coming from Ada are rather more complicated than 'a = *.LC0' and substituting
the access expression (e.g. '*.LC0[0].foo'), rather than the replacement_decl,
into those lead to just too many verify_gimple failures.

However, the initialization seems to work well enough, if I put a check into
sra_modify_expr to avoid changing 'SRlc0_1 = *.LC0[0]' into 'SRlc0_1 = SRlc0_1'
(!). Slightly hacky perhaps but harmless as there cannot be a statement writing
to a scalar replacement prior to sra_modify_expr.

Also I had to prevent writing scalar replacements back to the constant pool
(gnat opt31.adb).

Finally - I've left the disqualified_constants code in, but am now only using it
for an assert...so I guess it'd be reasonable to take that out. In an ideal
world we could do those checks only in checking builds but allocating bitmaps
only for checking builds feels like it would make checking vs non-checking
diverge too much.

This is now passing bootstrap+check-{gcc,g++,fortran} on AArch64, and the same
plus Ada on x64_64 and ARM, except for some additional guality failures in
pr54970-1.c on AArch64 (tests which are already failing on ARM) - I haven't had
any success in figuring those out yet, suggestions welcome.

However, without the DOM patches, this does not fix the oiginal ssa-dom-cse-2.c
testcase, even though the load is scalarized in esra and the individual element
loads resolved in ccp2. I don't think I'll be able to respin those between now
and stage 1 ending on Saturday, so hence I've had to drop the testsuite change,
and can only / will merely describe the remaining problem in PR/63679. I'm
now benchmarking on AArch64 to see what difference this patch makes on its own.

Thoughts?

gcc/ChangeLog:

PR target/63679
* tree-sra.c (disqualified_constants): New.
(sra_initialize): Allocate disqualified_constants.
(sra_deinitialize): Free disqualified_constants.
(disqualify_candidate): Update disqualified_constants when appropriate.
(create_access): Scan for constant-pool entries as we go along.
(scalarizable_type_p): Add check against type_contains_placeholder_p.
(maybe_add_sra_candidate): Allow constant-pool entries.
(initialize_constant_pool_replacements): New.
(sra_modify_expr): Avoid mangling assignments created by previous.
(sra_modify_function_body): Call initialize_constant_pool_replacements.
---
 gcc/tree-sra.c | 93 --
 1 file changed, 90 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 1d4a632..aa60f06 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -325,6 +325,10 @@ candidate (unsigned uid)
those which cannot be (because they are and need be used as a whole).  */
 static bitmap should_scalarize_away_bitmap, cannot_scalarize_away_bitmap;
 
+/* Bitmap of candidates in the constant pool, which cannot be scalarized
+   because this would produce non-constant expressions (e.g. Ada).  */
+static bitmap disqualified_constants;
+
 /* Obstack for creation of fancy names.  */
 static struct obstack name_obstack;
 
@@ -647,6 +651,7 @@ sra_initialize (void)
 (vec_safe_length (cfun->local_decls) / 2);
   should_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
   cannot_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
+  disqualified_constants = BITMAP_ALLOC (NULL);
   gcc_obstack_init (_obstack);
   base_access_vec = new hash_map;
   memset (_stats, 0, sizeof (sra_stats));
@@ 

Re: [PATCH] Fix PR ipa/68035

2015-11-12 Thread Jan Hubicka
> >> +
> >> +  /* Initialize hash values if we are not in LTO mode.  */
> >> +  if (!in_lto_p)
> >> +  item->get_hash ();
> >>  }
> > 
> > Hmm, what is the difference to the LTO mode here. I would have expected 
> > that all the items
> > was analyzed in both paths?
> 
> Difference is that in case of the LTO mode, the hash value is read from 
> streamed LTO file.
> On the other hand, in classic compilation mode we have to force the 
> calculation as a hash value
> is computed lazily.

In this case we need to also handle cases where function/variable is born 
during WPA (i.e. produced
by earlier pass), so in_lto_p check looks wrong.
I will look at the updated patch.

Honza


Re: [PATCH] Make disabled-optimization warning more informative; increase default max-gcse-memory

2015-11-12 Thread Jeff Law

On 11/12/2015 10:08 AM, Bradley Lucier wrote:

On 11/12/2015 11:57 AM, Bernd Schmidt wrote:

The expanded warning allowed me to see how much memory really was needed
to apply gcse to some of my routines, and 128MB fixes my problem.  The
limit has been 50MB for over 10 years, I think we can up it a bit now.
  {
+  unsigned int memory_request = n_basic_blocks_for_fn (cfun)
+* SBITMAP_SET_SIZE (max_reg_num ())
+* sizeof (SBITMAP_ELT_TYPE);
+


Formatting (wrap in parens to get it indented).


Fixed.



Otherwise ok.


Thanks.

Here's the reformatted patch (but without retesting).

I don't have check-in privileges.

Brad

 * gcc/cprop.c (is_too_expensive): Remove.
 (gcse.h): Include.
 (one_cprop_pass): Call gcse_or_cprop_is_too_expensive, not
 is_too_expensive.
 * gcc/gcse.h (gcse_or_cprop_is_too_expensive): Declare.
 * gcc/gcse.c (is_too_expensive): Rename to ...
 (gcse_or_cprop_is_too_expensive): ... this.
 Expand warning to add required size of max-gcse-memory.
 (one_pre_gcse_pass): Use it.
 (one_code_hoisting_pass): Use it.
 * gcc/params.def (max-gcse-memory): Increase from 50MB to 128MB.

Committed on your behalf.

jeff



[PATCH,committed] PR fortran/68318 -- increment reference count

2015-11-12 Thread Steve Kargl
I've committed the attached patch as obvious after
testing on x86_64-*-freebsd.

The short story is that gfortran tracks the number
of ENTRY symbols with a reference count.  If an
ENTRY was included in a routine within a MODULE the
reference count was not properly increment.  This
patch now does the increment.

As a bonus it fixes a nearby comment that is missing
a space, and changes the only occurence of ++sym->refs
to sym->refs++ for consistency.

2015-11-12  Steven G. Kargl  

PR fortran/68318
* decl.c (get_proc_name): Increment reference count for ENTRY.
While here, fix comment and use postfix ++ for consistency.

2015-11-12  Steven G. Kargl  

PR fortran/68318
* gfortran.dg/pr68318_1.f90: New test.
* gfortran.dg/pr68318_2.f90: Ditto.

-- 
Steve
Index: gcc/fortran/decl.c
===
--- gcc/fortran/decl.c	(revision 230064)
+++ gcc/fortran/decl.c	(working copy)
@@ -926,6 +926,7 @@ get_proc_name (const char *name, gfc_sym
 	  gfc_find_sym_tree (name, gfc_current_ns, 0, );
 	  st->n.sym = *result;
 	  st = gfc_get_unique_symtree (gfc_current_ns);
+	  sym->refs++;
 	  st->n.sym = sym;
 	}
 }
@@ -972,7 +973,7 @@ get_proc_name (const char *name, gfc_sym
   /* Trap another encompassed procedure with the same name.  All
 	 these conditions are necessary to avoid picking up an entry
 	 whose name clashes with that of the encompassing procedure;
-	 this is handled using gsymbols to register unique,globally
+	 this is handled using gsymbols to register unique, globally
 	 accessible names.  */
   if (sym->attr.flavor != 0
 	  && sym->attr.proc != 0
@@ -9052,7 +9053,7 @@ gfc_match_final_decl (void)
 
   /* Add this symbol to the list of finalizers.  */
   gcc_assert (block->f2k_derived);
-  ++sym->refs;
+  sym->refs++;
   f = XCNEW (gfc_finalizer);
   f->proc_sym = sym;
   f->proc_tree = NULL;
Index: gcc/testsuite/gfortran.dg/pr68318_1.f90
===
--- gcc/testsuite/gfortran.dg/pr68318_1.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr68318_1.f90	(working copy)
@@ -0,0 +1,18 @@
+! { dg-do compile }
+! { dg-options "-O0"
+! PR fortran/68318
+! Original code submitted by Gerhard Steinmetz
+! 
+!
+module m
+   implicit none
+contains
+   subroutine s1
+   entry e! { dg-error "(2)" }
+   end
+   subroutine s2
+   entry e! { dg-error "is already defined" }
+   end
+end module
+! { dg-prune-output "Duplicate ENTRY attribute specified" }
+
Index: gcc/testsuite/gfortran.dg/pr68318_2.f90
===
--- gcc/testsuite/gfortran.dg/pr68318_2.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr68318_2.f90	(working copy)
@@ -0,0 +1,22 @@
+! { dg-do compile }
+! PR fortran/68318
+! Original code submitted by Gerhard Steinmetz
+! 
+!
+module m1
+   implicit none
+contains
+   subroutine s1
+   entry e
+   end
+end module
+
+module m2
+   use m1 ! { dg-error "(2)" }
+   implicit none
+contains
+   subroutine s2
+   entry e! { dg-error "is already defined" }
+   end
+end module
+! { dg-prune-output "Cannot change attribute" }


<    1   2