date:20151111

[gomp4] Merge trunk r230169 (2015-11-11) into gomp-4_0-branch

2015-11-11 Thread Thomas Schwinge

Hi!

Committed to gomp-4_0-branch in r230214:

commit 265f04668a5b7dece82a35e2d75e8b51a1d75b69
Merge: 5e71838 b656be3
Author: tschwinge 
Date:   Thu Nov 12 07:40:36 2015 +

svn merge -r 230082:230169 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@230214 138bc75d-0d04-0410-961f-82ee72b054a4


Grüße
 Thomas



Re: [RFC][PATCH] Preferred rename register in regrename pass

2015-11-11 Thread Christophe Lyon

On 11 November 2015 at 09:50, Robert Suchanek
 wrote:
> Hi,
>
>> I guess this is ok to stop the failures for now, but you may want to
>> move the check to the point where we set terminated_this_insn. Also, as
>> I pointed out earlier, clearing terminated_this_insn should probably
>> happen earlier.
>
> Here is the updated patch that I'm about to commit once the bootstrap
> finishes.
>
Hi,
I confirm that this fixes the build errors I was seeing.
Thanks.

> Regards,
> Robert
>
> gcc/
> * regrename.c (scan_rtx_reg): Check the matching number of consecutive
> registers when tying chains.
> (build_def_use): Move terminated_this_insn earlier in the function.
> ---
>  gcc/regrename.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/regrename.c b/gcc/regrename.c
> index d727dd9..d41410a 100644
> --- a/gcc/regrename.c
> +++ b/gcc/regrename.c
> @@ -1068,7 +1068,9 @@ scan_rtx_reg (rtx_insn *insn, rtx *loc, enum reg_class cl, enum scan_actions act
>   && GET_CODE (pat) == SET
>   && GET_CODE (SET_DEST (pat)) == REG
>   && GET_CODE (SET_SRC (pat)) == REG
> - && terminated_this_insn)
> + && terminated_this_insn
> + && terminated_this_insn->nregs
> +== REG_NREGS (recog_data.operand[1]))
> {
>   gcc_assert (terminated_this_insn->regno
>   == REGNO (recog_data.operand[1]));
> @@ -1593,6 +1595,7 @@ build_def_use (basic_block bb)
>   enum rtx_code set_code = SET;
>   enum rtx_code clobber_code = CLOBBER;
>   insn_rr_info *insn_info = NULL;
> + terminated_this_insn = NULL;
>
>   /* Process the insn, determining its effect on the def-use
>  chains and live hard registers.  We perform the following
> @@ -1749,8 +1752,6 @@ build_def_use (basic_block bb)
>   scan_rtx (insn, &XEXP (note, 0), ALL_REGS, mark_read,
> OP_INOUT);
>
> - terminated_this_insn = NULL;
> -
>   /* Step 4: Close chains for registers that die here, unless
>  the register is mentioned in a REG_UNUSED note.  In that
>  case we keep the chain open until step #7 below to ensure
> --
> 2.4.
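A minimal sketch of the condition the patch adds, using a hypothetical `chain_sketch` struct rather than regrename's real chain structures: a chain that terminated in the current insn is tied to the source of a register copy only if both span the same number of consecutive hard registers. The real code also resets `terminated_this_insn` at the start of every insn; that bookkeeping is left out here.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a def-use chain head.
struct chain_sketch
{
  unsigned regno;   // first hard register of the chain
  unsigned nregs;   // number of consecutive hard registers it spans
};

// Returns true if the chain that terminated in this insn may be tied to
// the source operand of a "reg = reg" copy.
bool
can_tie_sketch (const chain_sketch *terminated_this_insn,
		unsigned src_regno, unsigned src_nregs)
{
  // Nothing terminated here, or the register counts differ: do not tie.
  if (terminated_this_insn == nullptr
      || terminated_this_insn->nregs != src_nregs)
    return false;
  // Mirrors the gcc_assert the patch keeps: same starting register.
  assert (terminated_this_insn->regno == src_regno);
  return true;
}
```

With the size check in front, a two-register chain is no longer tied to a one-register source, so the assert is only reached for matching shapes.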

Re: [0/7] Type promotion pass and elimination of zext/sext

2015-11-11 Thread Kugan

Hi Richard,

Thanks for the review.

>>>
>>> The basic "structure" thing still remains.  You walk over all uses and
>>> defs in all stmts
>>> in promote_all_stmts which ends up calling promote_ssa_if_not_promoted on all
>>> uses and defs which in turn promotes (the "def") and then fixes up all
>>> uses in all stmts.
>>
>> Done.
> 
> Not exactly.  I still see
> 
> /* Promote all the stmts in the basic block.  */
> static void
> promote_all_stmts (basic_block bb)
> {
>   gimple_stmt_iterator gsi;
>   ssa_op_iter iter;
>   tree def, use;
>   use_operand_p op;
> 
>   for (gphi_iterator gpi = gsi_start_phis (bb);
>!gsi_end_p (gpi); gsi_next (&gpi))
> {
>   gphi *phi = gpi.phi ();
>   def = PHI_RESULT (phi);
>   promote_ssa (def, &gsi);
> 
>   FOR_EACH_PHI_ARG (op, phi, iter, SSA_OP_USE)
> {
>   use = USE_FROM_PTR (op);
>   if (TREE_CODE (use) == SSA_NAME
>   && gimple_code (SSA_NAME_DEF_STMT (use)) == GIMPLE_NOP)
> promote_ssa (use, &gsi);
>   fixup_uses (phi, &gsi, op, use);
> }
> 
> you still call promote_ssa on both DEFs and USEs and promote_ssa looks
> at SSA_NAME_DEF_STMT of the passed arg.  Please call promote_ssa just
> on DEFs and fixup_uses on USEs.

I am doing this to promote SSA names that are defined by a GIMPLE_NOP. Is
there any way to iterate over those? I have added a gcc_assert to make sure
that promote_ssa is called only once.

> 
> Any reason you do not promote debug stmts during the DOM walk?
> 
> So for each DEF you record in ssa_name_info
> 
> struct ssa_name_info
> {
>   tree ssa;
>   tree type;
>   tree promoted_type;
> };
> 
> (the fields need documenting).  Add a tree promoted_def to it which you
> can replace any use of the DEF with.

In this version of the patch, I am promoting the def in place. If we
decide to change, I will add it. If I understand you correctly, this is
to be used in iterating over uses and fixing.

> 
> Currently as you call promote_ssa for DEFs and USEs you repeatedly
> overwrite the entry in ssa_name_info_map with a new copy.  So you
> should assert it wasn't already there.
> 
>   switch (gimple_code (def_stmt))
> {
> case GIMPLE_PHI:
> {
> 
> the last { is indented too much it should be indented 2 spaces
> relative to the 'case'

Done.

> 
> 
>   SSA_NAME_RANGE_INFO (def) = NULL;
> 
> only needed in the case 'def' was promoted itself.  Please use
> reset_flow_sensitive_info (def).

We are promoting all the defs. In some cases, however, we can keep using the
value ranges in the SSA info just by promoting them to the new type (as the
values will be the same). Shall I do that as a follow-up?
> 
>>>
>>> Instead of this you should, in promote_all_stmts, walk over all uses doing what
>>> fixup_uses does and then walk over all defs, doing what promote_ssa does.
>>>
>>> +case GIMPLE_NOP:
>>> +   {
>>> + if (SSA_NAME_VAR (def) == NULL)
>>> +   {
>>> + /* Promote def by fixing its type for anonymous def.  */
>>> + TREE_TYPE (def) = promoted_type;
>>> +   }
>>> + else
>>> +   {
>>> + /* Create a promoted copy of parameters.  */
>>> + bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
>>>
>>> I think the uninitialized vars are somewhat tricky and it would be best
>>> to create a new uninit anonymous SSA name for them.  You can
>>> have SSA_NAME_VAR != NULL and def _not_ being a parameter
>>> btw.
>>
>> Done. I also had to make changes in a couple of other places to
>> reflect this.
>> They are:
>> --- a/gcc/tree-ssa-reassoc.c
>> +++ b/gcc/tree-ssa-reassoc.c
>> @@ -302,6 +302,7 @@ phi_rank (gimple *stmt)
>>  {
>>tree arg = gimple_phi_arg_def (stmt, i);
>>if (TREE_CODE (arg) == SSA_NAME
>> + && SSA_NAME_VAR (arg)
>>   && !SSA_NAME_IS_DEFAULT_DEF (arg))
>> {
>>   gimple *def_stmt = SSA_NAME_DEF_STMT (arg);
>> @@ -434,7 +435,8 @@ get_rank (tree e)
>>if (gimple_code (stmt) == GIMPLE_PHI)
>> return phi_rank (stmt);
>>
>> -  if (!is_gimple_assign (stmt))
>> +  if (!is_gimple_assign (stmt)
>> + && !gimple_nop_p (stmt))
>> return bb_rank[gimple_bb (stmt)->index];
>>
>> and
>>
>> --- a/gcc/tree-ssa.c
>> +++ b/gcc/tree-ssa.c
>> @@ -752,7 +752,8 @@ verify_use (basic_block bb, basic_block def_bb, use_operand_p use_p,
>>TREE_VISITED (ssa_name) = 1;
>>
>>if (gimple_nop_p (SSA_NAME_DEF_STMT (ssa_name))
>> -  && SSA_NAME_IS_DEFAULT_DEF (ssa_name))
>> +  && (SSA_NAME_IS_DEFAULT_DEF (ssa_name)
>> + || SSA_NAME_VAR (ssa_name) == NULL))
>>  ; /* Default definitions have empty statements.  Nothing to do.  */
>>else if (!def_bb)
>>  {
>>
>> Does this look OK?
> 
> Hmm, no, this looks bogus.

I have removed all the above.

> 
> I think the best thing to do is not promoting default defs at all and instead
> promote at the uses.
> 
>   /* Create a promoted copy of parameters.  */
>
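The structure Richard asks for (promote every DEF exactly once, then fix up every USE, leaving default defs alone and converting at their uses instead) can be sketched on a toy model. `toy_stmt`, `toy_name_info` and `promote_toy` are hypothetical and do not correspond to GCC's gimple or SSA APIs; the `conversions` counter merely stands in for inserting an extension at a use.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy IR: SSA names are small integers; each statement
// defines at most one name and uses a few names.
struct toy_stmt
{
  int def;                // -1 if the statement defines nothing
  std::vector<int> uses;
};

// Per-name record, loosely modelled on the ssa_name_info discussed above.
struct toy_name_info
{
  bool promoted = false;  // the DEF has been given the wider type
  int conversions = 0;    // uses that needed an explicit extension
};

void
promote_toy (const std::vector<toy_stmt> &stmts,
	     std::vector<toy_name_info> &info)
{
  // Pass 1: promote each DEF exactly once.  Names with no defining
  // statement (e.g. default defs of parameters) are never visited here.
  for (const toy_stmt &s : stmts)
    if (s.def >= 0)
      {
	assert (!info[s.def].promoted);  // a DEF must not be promoted twice
	info[s.def].promoted = true;
      }

  // Pass 2: fix up every USE; unpromoted names are extended at the use.
  for (const toy_stmt &s : stmts)
    for (int u : s.uses)
      if (!info[u].promoted)
	++info[u].conversions;
}
```

Separating the passes means the per-name table is written once per DEF, which is what makes the "assert it wasn't already there" check meaningful.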

Re: [C/C++ PATCH] Reject declarators with huge arrays (PR c/68107, c++/68266)

2015-11-11 Thread Martin Sebor


Oh, and we could also be more informative and print the size of an array,
or the number of elements, as clang does.


Yes, that's pretty nice. It helps but the diagnostic must point at
the right dimension. GCC often just points at the whole expression
or some token within it.

void* foo ()
{
enum { I = 65535, J = 65536, K = 65537, L = 65538, M = 65539 };

return new int [I][J][K][L][M];
}
z.c:5:24: error: array is too large (65536 elements)
return new int [I][J][K][L][M];
   ^

It might be even more helpful if it included the size of each element
(i.e., here, K * L * M bytes).
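One way to find the dimension to point at is to multiply the element size by each dimension in turn and stop at the first multiplication that would exceed the limit. `first_overflowing_dim` is a hypothetical helper, and the limit is a parameter because the bound a compiler actually enforces (SIZE_MAX, PTRDIFF_MAX or something else) is an assumption here. For the five dimensions above with a 4-byte int and a limit of UINT64_MAX it returns index 3, the L dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the index of the first dimension whose multiplication makes the
// total byte size exceed LIMIT, or -1 if everything fits (then *TOTAL, if
// non-null, receives the size in bytes).
int
first_overflowing_dim (std::uint64_t elt_size, const std::uint64_t *dims,
		       std::size_t ndims, std::uint64_t limit,
		       std::uint64_t *total)
{
  std::uint64_t size = elt_size;
  for (std::size_t i = 0; i < ndims; ++i)
    {
      // size * dims[i] > limit  <=>  size > limit / dims[i] (integer division)
      if (dims[i] != 0 && size > limit / dims[i])
	return static_cast<int> (i);
      size *= dims[i];
    }
  if (total)
    *total = size;
  return -1;
}
```

The check divides before multiplying, so it never relies on wrapped-around arithmetic.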

Martin

Re: [PATCH] Fix IRA register preferencing

2015-11-11 Thread Vladimir Makarov


On 11/10/2015 08:30 AM, Wilco Dijkstra wrote:

Ping of https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00829.html:



This fixes a bug in register preferencing. When live range splitting creates
a new register from another, it copies most fields except for the register
preferences. The preference GENERAL_REGS is then used, because
reg_pref[i].prefclass is initialized with GENERAL_REGS in allocate_reg_info ()
and resize_reg_info ().

This initialization value is not as innocuous as the comment suggests: if a
new register has a non-integer mode, it is forced to prefer GENERAL_REGS. This
makes the register costs computed in pass 2 incorrect. As a result the live
range is either spilled or allocated to an integer register:

void g(double);
void f(double x)
{
   if (x == 0)
 return;
   g (x);
   g (x);
}

f:
	fcmp	d0, #0.0
	bne	.L6
	ret
.L6:
	stp	x19, x30, [sp, -16]!
	fmov	x19, d0
	bl	g
	fmov	d0, x19
	ldp	x19, x30, [sp], 16
	b	g

With the fix it uses a floating point register as expected. Given a similar
issue in https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02253.html, would it not
be better to change the initialization values of reg_pref to illegal register
classes so this kind of issue can be trivially found with an assert? Also would
it not be a good idea to have a single register copy function that ensures all
data is copied?

Having a function and the assert would be wonderful.  If you have a
patch for this, I'll be glad to review it.

If you don't have a patch, or don't have the time or inclination to work on
it, you can commit the patch given here to trunk.


Thanks.
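A sketch of the two suggestions on a simplified per-register table: entries start out with an invalid class, so an assert catches any register whose preferences were never copied, and one routine copies every field at once. `reg_class_sketch`, `reg_pref_sketch` and the functions below are hypothetical stand-ins, not IRA's actual reg_pref array or setup_reg_classes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical register classes; INVALID_CLASS_SKETCH is the sentinel.
enum reg_class_sketch
{
  INVALID_CLASS_SKETCH = -1,
  GENERAL_REGS_SKETCH,
  FP_REGS_SKETCH
};

// Every field starts out invalid rather than defaulting to a real class.
struct reg_pref_sketch
{
  reg_class_sketch prefclass = INVALID_CLASS_SKETCH;
  reg_class_sketch altclass = INVALID_CLASS_SKETCH;
  reg_class_sketch allocnoclass = INVALID_CLASS_SKETCH;
};

// Single copy routine: a field added later cannot be forgotten here.
void
copy_reg_info_sketch (std::vector<reg_pref_sketch> &info, int from, int to)
{
  if (to >= (int) info.size ())
    info.resize (to + 1);   // newly created entries are invalid
  info[to] = info[from];
}

// Consumers assert instead of silently using an uninitialized class.
reg_class_sketch
preferred_class_sketch (const std::vector<reg_pref_sketch> &info, int regno)
{
  assert (info[regno].prefclass != INVALID_CLASS_SKETCH);
  return info[regno].prefclass;
}
```

A new register that skipped the copy would then trip the assert in `preferred_class_sketch` rather than quietly preferring general registers.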


ChangeLog: 2014-12-09  Wilco Dijkstra  wdijk...@arm.com

 * gcc/ira-emit.c (ira_create_new_reg): Copy preference classes.

---
  gcc/ira-emit.c | 11 ++++++++++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/ira-emit.c b/gcc/ira-emit.c
index d246b7f..d736836 100644
--- a/gcc/ira-emit.c
+++ b/gcc/ira-emit.c
@@ -348,6 +348,7 @@ rtx
  ira_create_new_reg (rtx original_reg)
  {
rtx new_reg;
+  int original_regno = REGNO (original_reg);
  
new_reg = gen_reg_rtx (GET_MODE (original_reg));

ORIGINAL_REGNO (new_reg) = ORIGINAL_REGNO (original_reg);
@@ -356,8 +357,16 @@ ira_create_new_reg (rtx original_reg)
REG_ATTRS (new_reg) = REG_ATTRS (original_reg);
if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
  fprintf (ira_dump_file, "  Creating newreg=%i from oldreg=%i\n",
-REGNO (new_reg), REGNO (original_reg));
+REGNO (new_reg), original_regno);
ira_expand_reg_equiv ();
+
+  /* Copy the preference classes to new_reg.  */
+  resize_reg_info ();
+  setup_reg_classes (REGNO (new_reg),
+   reg_preferred_class (original_regno),
+   reg_alternate_class (original_regno),
+   reg_allocno_class (original_regno));
+
return new_reg;
  }

Re: [PATCH 4b/4] [ARM] PR63870 Remove error for invalid lane numbers

2015-11-11 Thread Charles Baylis

On 11 November 2015 at 12:10, Kyrill Tkachov  wrote:
>
> On 11/11/15 12:08, Charles Baylis wrote:
>>
>> On 11 November 2015 at 11:22, Kyrill Tkachov 
>> wrote:
>>>
>>> Hi Charles,
>>>
>>> On 08/11/15 00:26, charles.bay...@linaro.org wrote:

 From: Charles Baylis 

   Charles Baylis  

  * config/arm/neon.md (neon_vld1_lane): Remove error for invalid
  lane number.
  (neon_vst1_lane): Likewise.
  (neon_vld2_lane): Likewise.
  (neon_vst2_lane): Likewise.
  (neon_vld3_lane): Likewise.
  (neon_vst3_lane): Likewise.
  (neon_vld4_lane): Likewise.
  (neon_vst4_lane): Likewise.

>>> In this pattern the 'max' variable is now unused, causing a bootstrap
>>> -Werror failure on arm.
>>> I'll test a patch to fix it unless you beat me to it...
>>
>> Thanks for catching this.
>>
>> I have a patch, and have started a bootstrap. Unless you have
>> objections, I'll apply as obvious once the bootstrap is complete later
>> this afternoon.
>
>
> Yes, that's the exact patch I'm testing as well.
> I'll let you finish the bootstrap and commit it.

>>  gcc/ChangeLog:
>>
>>  2015-11-11  Charles Baylis  
>>
>>  * config/arm/neon.md (neon_vld2_lane): Remove unused max
>>  variable.
>>  (neon_vst2_lane): Likewise.
>>  (neon_vld3_lane): Likewise.
>>  (neon_vst3_lane): Likewise.
>>  (neon_vld4_lane): Likewise.
>>  (neon_vst4_lane): Likewise.

Applied as r230203 after successful bootstrap on arm-unknown-linux-gnueabihf.

C++ PATCH to handling of duplicate typedefs

2015-11-11 Thread Jason Merrill

Another GC problem I noticed while looking at something else: when we 
freed a duplicate typedef, we were leaving its type in the variants 
list, with its TYPE_NAME still pointing to the now-freed TYPE_DECL, 
leading to a crash.
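The fix is ordinary removal from a singly linked list: walk from the main variant until the node whose successor is the type being dropped, then splice it out. A toy version, with a hypothetical `toy_type` standing in for TYPE_MAIN_VARIANT/TYPE_NEXT_VARIANT:

```cpp
#include <cassert>

// Hypothetical toy type node: each variant points at its main variant,
// and the variants form a singly linked list starting at the main variant.
struct toy_type
{
  toy_type *main_variant;
  toy_type *next_variant;
};

// Unlink REMOVE from its variants list.  Like the patch, this assumes
// REMOVE is on the list and is not the main variant itself; unlike the
// patch's loop, it also stops at the end of the list.
void
remove_variant (toy_type *remove)
{
  for (toy_type *t = remove->main_variant; t; t = t->next_variant)
    if (t->next_variant == remove)
      {
	t->next_variant = remove->next_variant;
	break;
      }
}
```

Once unlinked, nothing reachable from the surviving types refers to the node whose name points at the freed TYPE_DECL.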


Tested x86_64-pc-linux-gnu, applying to trunk.
commit e36d7607f157b5c90b56afe22786a2a0ff1711c8
Author: Jason Merrill 
Date:   Wed Nov 11 15:17:42 2015 -0500

	* decl.c (duplicate_decls): When combining typedefs, remove the
	new type from the variants list.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 76cc1d1..383b47d 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -2014,7 +2014,22 @@ duplicate_decls (tree newdecl, tree olddecl, bool newdecl_is_friend)
   /* For typedefs use the old type, as the new type's DECL_NAME points
 	 at newdecl, which will be ggc_freed.  */
   if (TREE_CODE (newdecl) == TYPE_DECL)
-	newtype = oldtype;
+	{
+	  newtype = oldtype;
+
+	  /* And remove the new type from the variants list.  */
+	  if (TYPE_NAME (TREE_TYPE (newdecl)) == newdecl)
+	{
+	  tree remove = TREE_TYPE (newdecl);
+	  for (tree t = TYPE_MAIN_VARIANT (remove); ;
+		   t = TYPE_NEXT_VARIANT (t))
+		if (TYPE_NEXT_VARIANT (t) == remove)
+		  {
+		TYPE_NEXT_VARIANT (t) = TYPE_NEXT_VARIANT (remove);
+		break;
+		  }
+	}
+	}
   else
 	/* Merge the data types specified in the two decls.  */
 	newtype = merge_types (TREE_TYPE (newdecl), TREE_TYPE (olddecl));

Re: [OpenACC] declare directive

2015-11-11 Thread James Norris

Jakub,

The attached patch and ChangeLog reflect the updates from your
review: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01317.html.

Highlights

The following issue was handled by Dominique d'Humières
in: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01375.html

On 11/11/2015 02:32 AM, Jakub Jelinek wrote:

On Mon, Nov 09, 2015 at 05:11:44PM -0600, James Norris wrote:

>diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
>index 953c4e3..c6a2981 100644
>--- a/gcc/c-family/c-pragma.h
>+++ b/gcc/c-family/c-pragma.h
>@@ -30,6 +30,7 @@ enum pragma_kind {
>PRAGMA_OACC_ATOMIC,
>PRAGMA_OACC_CACHE,
>PRAGMA_OACC_DATA,
>+  PRAGMA_OACC_DECLARE,
>PRAGMA_OACC_ENTER_DATA,
>PRAGMA_OACC_EXIT_DATA,
>PRAGMA_OACC_KERNELS,

This change will make PR68271 even worse, so it would be really nice to
get that fixed first.

With the addition of https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01372.html,
additional conditions were added to the following code, as you called out
in your review of https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00703.html.

On 11/06/2015 01:03 PM, Jakub Jelinek wrote:
>> @@ -5841,6 +5863,8 @@ omp_default_clause (struct gimplify_omp_ctx *ctx, tree decl,

>> flags |= GOVD_FIRSTPRIVATE;
>> break;
>>   case OMP_CLAUSE_DEFAULT_UNSPECIFIED:
>> +  if (is_global_var (decl) && device_resident_p (decl))
>> +  flags |= GOVD_MAP_TO_ONLY | GOVD_MAP;
>
> I don't think you want to do this except for (selected or all?)
> OpenACC contexts.  Say, I don't see why one couldn't e.g. try to mix
> OpenMP host parallelization or tasking with OpenACC offloading,
> and that affecting in weird way OpenMP semantics.
>

With the addition of routine directive support, additional run-time tests
were added.

OK?

Thanks,
Jim
2015-XX-XX  James Norris  
Joseph Myers  

gcc/c-family/
* c-pragma.c (oacc_pragmas): Add entry for declare directive. 
* c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_DECLARE.
(enum pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT and
PRAGMA_OACC_CLAUSE_LINK.

gcc/c/
* c-parser.c (c_parser_pragma): Handle PRAGMA_OACC_DECLARE.
(c_parser_omp_clause_name): Handle 'device_resident' clause.
(c_parser_oacc_data_clause): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OMP_CLAUSE_LINK.
(c_parser_oacc_all_clauses): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OACC_CLAUSE_LINK.
(OACC_DECLARE_CLAUSE_MASK): New definition.
(c_parser_oacc_declare): New function.

gcc/cp/
* parser.c (cp_parser_omp_clause_name): Handle 'device_resident'
clause.
(cp_parser_oacc_data_clause): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OMP_CLAUSE_LINK.
(cp_parser_oacc_all_clauses): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OMP_CLAUSE_LINK.
(OACC_DECLARE_CLAUSE_MASK): New definition.
(cp_parser_oacc_declare): New function.
(cp_parser_pragma): Handle PRAGMA_OACC_DECLARE.
* pt.c (tsubst_expr): Handle OACC_DECLARE.

gcc/
* gimple-pretty-print.c (dump_gimple_omp_target): Handle
GF_OMP_TARGET_KIND_OACC_DECLARE. 
* gimple.h (enum gf_mask): Add GF_OMP_TARGET_KIND_OACC_DECLARE.
(is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_DECLARE.
* gimplify.c (oacc_declare_returns): New.
(gimplify_bind_expr): Prepend 'exit' stmt to cleanup.
(device_resident_p): New function.
(omp_default_clause): Handle device_resident clause.
(gimplify_oacc_declare_1, gimplify_oacc_declare): New functions.
(gimplify_expr): Handle OACC_DECLARE.
* omp-builtins.def (BUILT_IN_GOACC_DECLARE): New builtin.
* omp-low.c (expand_omp_target): Handle
GF_OMP_TARGET_KIND_OACC_DECLARE and BUILT_IN_GOACC_DECLARE.
(build_omp_regions_1): Handle GF_OMP_TARGET_KIND_OACC_DECLARE.
(lower_omp_target): Handle GF_OMP_TARGET_KIND_OACC_DECLARE,
GOMP_MAP_DEVICE_RESIDENT and GOMP_MAP_LINK.
(make_gimple_omp_edges): Handle GF_OMP_TARGET_KIND_OACC_DECLARE.
* tree-pretty-print.c (dump_omp_clause): Handle GOMP_MAP_LINK and
GOMP_MAP_DEVICE_RESIDENT.

gcc/testsuite/
* c-c++-common/goacc/declare-1.c: New test.
* c-c++-common/goacc/declare-2.c: Likewise.

include/
* gomp-constants.h (enum gomp_map_kind): Add GOMP_MAP_DEVICE_RESIDENT
and GOMP_MAP_LINK.

libgomp/

* libgomp.map (GOACC_2.0.1): Export GOACC_declare.
* oacc-parallel.c (GOACC_declare): New function.
* testsuite/libgomp.oacc-c-c++-common/declare-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/declare-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/declare-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/declare-5.c: Likewise.
* testsuite/libgomp.oacc-c++/declare-
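For readers unfamiliar with the directive, a small illustrative use of OpenACC `declare` at file scope, assuming an OpenACC 2.0 compiler. It only uses the `create` clause together with ordinary `parallel loop` and `update` directives, not the `device_resident` or `link` clauses this patch also adds. Without -fopenacc the pragmas are ignored (at most a -Wunknown-pragmas warning) and the function runs as plain host code.

```cpp
#include <cassert>

// File-scope array that, with -fopenacc, also lives on the device for
// the lifetime of the program because of the declare directive.
static const int kLen = 64;
static float squares[kLen];
#pragma acc declare create(squares)

float
sum_of_squares_sketch ()
{
  // The array is already present on the device thanks to "declare".
#pragma acc parallel loop present(squares)
  for (int i = 0; i < kLen; ++i)
    squares[i] = (float) i * (float) i;

  // Copy the device data back so the host loop below sees it.
#pragma acc update host(squares)

  float sum = 0.0f;
  for (int i = 0; i < kLen; ++i)
    sum += squares[i];
  return sum;
}
```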

C++ PATCH to instantiation of lambda closure

2015-11-11 Thread Jason Merrill

While looking into something else I noticed a GC problem with lambdas: 
it is possible to collect after instantiating the closure type, which 
implies instantiating the operator(), while we're in the middle of an 
expression, leading to corruption.  We deal with this sort of thing in 
mark_used by preventing collection, so let's do the same here.
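The guard pattern, sketched with a hypothetical toy collector that, by assumption, only runs when `function_depth` is zero: bumping the counter around the nested instantiation defers collection until the enclosing expression has been finished.

```cpp
#include <cassert>

// Hypothetical model: collection is only allowed at function_depth 0.
static int function_depth = 0;
static int collections = 0;

void
maybe_collect_sketch ()
{
  if (function_depth == 0)
    ++collections;   // stand-in for actually running the collector
}

// Stand-in for instantiating op(): it may try to collect part-way through.
void
instantiate_op_sketch ()
{
  maybe_collect_sketch ();
}

// The pattern from the patch: raise function_depth around the nested
// instantiation so the caller's partially built expression survives.
void
instantiate_closure_member_sketch ()
{
  ++function_depth;
  instantiate_op_sketch ();
  --function_depth;
}
```

Outside the guarded region, the same instantiation is free to collect.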


Tested x86_64-pc-linux-gnu, applying to trunk.
commit a881d7bf5eb64ee5f110ae0ff206805da560f3df
Author: Jason Merrill 
Date:   Wed Nov 11 15:48:01 2015 -0500

	* pt.c (instantiate_class_template_1): Set function_depth around
	instantiation of lambda op().

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index bfea8e2..076c1c7 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10168,7 +10168,12 @@ instantiate_class_template_1 (tree type)
 	{
 	  if (!DECL_TEMPLATE_INFO (decl)
 	  || DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl)) != decl)
-	instantiate_decl (decl, false, false);
+	{
+	  /* Set function_depth to avoid garbage collection.  */
+	  ++function_depth;
+	  instantiate_decl (decl, false, false);
+	  --function_depth;
+	}
 
 	  /* We need to instantiate the capture list from the template
 	 after we've instantiated the closure members, but before we

Recent patch craters vector tests on powerpc64le-linux-gnu

2015-11-11 Thread Bill Schmidt

Hi Ilya,

The patch committed as r230098 has caused a number of ICEs on
powerpc64le-linux-gnu.  The gcc/ChangeLog entry for that patch is:

2015-11-10  Ilya Enkovich  

* expr.c (do_store_flag): Use expand_vec_cmp_expr for mask
results.
(const_vector_mask_from_tree): New.
(const_vector_from_tree): Use const_vector_mask_from_tree
for boolean vectors.
* optabs-query.h (get_vec_cmp_icode): New.
* optabs-tree.c (expand_vec_cmp_expr_p): New.
* optabs-tree.h (expand_vec_cmp_expr_p): New.
* optabs.c (vector_compare_rtx): Add OPNO arg.
(expand_vec_cond_expr): Adjust to vector_compare_rtx change.
(expand_vec_cmp_expr): New.
* optabs.def (vec_cmp_optab): New.
(vec_cmpu_optab): New.
* optabs.h (expand_vec_cmp_expr): New.
* tree-vect-generic.c (expand_vector_comparison): Add vector
comparison optabs check.
* tree-vect-loop.c (vect_determine_vectorization_factor):
Ignore mask operations for VF.  Add mask type computation.
* tree-vect-stmts.c (get_mask_type_for_scalar_type): New.
(vectorizable_comparison): New.
(vect_analyze_stmt): Add vectorizable_comparison.
(vect_transform_stmt): Likewise.
(vect_init_vector): Support boolean vector invariants.
(vect_get_vec_def_for_operand): Add VECTYPE arg.
(vectorizable_condition): Directly provide vectype for
invariants used in comparison.
* tree-vectorizer.h (get_mask_type_for_scalar_type): New.
(enum vect_var_kind): Add vect_mask_var.
(enum stmt_vec_info_type): Add comparison_vec_info_type.
(vectorizable_comparison): New.
(vect_get_vec_def_for_operand): Add VECTYPE arg.
* tree-vect-data-refs.c (vect_get_new_vect_var): Support
vect_mask_var.
(vect_create_destination_var): Likewise.
* tree-vect-patterns.c (check_bool_pattern): Check fails
if we can vectorize comparison directly.
(search_type_for_mask): New.
(vect_recog_bool_pattern): Support cases when bool pattern
check fails.
* tree-vect-slp.c (vect_build_slp_tree_1): Allow
comparison statements.
(vect_get_constant_vectors): Support boolean vector
constants.
* config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New.
(ix86_expand_int_vec_cmp): New.
(ix86_expand_fp_vec_cmp): New.
* config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for
op_true and op_false.
(ix86_int_cmp_code_to_pcmp_immediate): New.
(ix86_fp_cmp_code_to_pcmp_immediate): New.
(ix86_cmp_code_to_pcmp_immediate): New.
(ix86_expand_mask_vec_cmp): New.
(ix86_expand_fp_vec_cmp): New.
(ix86_expand_int_sse_cmp): New.
(ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp.
(ix86_expand_int_vec_cmp): New.
(ix86_get_mask_mode): New.
(TARGET_VECTORIZE_GET_MASK_MODE): New.
* config/i386/sse.md (avx512fmaskmodelower): New.
(vec_cmp): New.
(vec_cmp): New.
(vec_cmpv2div2di): New.
(vec_cmpu): New.
(vec_cmpu): New.
(vec_cmpuv2div2di): New.

Here is a list of regressions that were caused by the patch:

FAIL: c-c++-common/opaque-vector.c  -std=c++11 (internal compiler error)
FAIL: c-c++-common/opaque-vector.c  -std=c++11 (test for excess errors)
FAIL: c-c++-common/opaque-vector.c  -std=c++14 (internal compiler error)
FAIL: c-c++-common/opaque-vector.c  -std=c++14 (test for excess errors)
FAIL: c-c++-common/opaque-vector.c  -std=c++98 (internal compiler error)
FAIL: c-c++-common/opaque-vector.c  -std=c++98 (test for excess errors)
FAIL: c-c++-common/opaque-vector.c  -Wc++-compat  (internal compiler error)
FAIL: c-c++-common/opaque-vector.c  -Wc++-compat  (test for excess errors)
FAIL: c-c++-common/torture/vector-compare-1.c   -O0  (internal compiler error)
FAIL: c-c++-common/torture/vector-compare-1.c   -O0  (internal compiler error)
FAIL: c-c++-common/torture/vector-compare-1.c   -O0  (test for excess errors)
FAIL: c-c++-common/torture/vector-compare-1.c   -O0  (test for excess errors)
FAIL: c-c++-common/vector-compare-2.c  -std=gnu++11 (internal compiler error)
FAIL: c-c++-common/vector-compare-2.c  -std=gnu++11 (test for excess errors)
FAIL: c-c++-common/vector-compare-2.c  -std=gnu++14 (internal compiler error)
FAIL: c-c++-common/vector-compare-2.c  -std=gnu++14 (test for excess errors)
FAIL: c-c++-common/vector-compare-2.c  -std=gnu++98 (internal compiler error)
FAIL: c-c++-common/vector-compare-2.c  -std=gnu++98 (test for excess errors)
FAIL: c-c++-common/vector-compare-2.c  -Wc++-compat  (internal compiler error)
FAIL: c-c++-common/vector-compare-2.c  -Wc++-compat  (test for excess errors)
FAIL: c-c++-common/vector-scalar.c  -std=c++11 (internal compiler error)
FAIL: c-c++-common/vector-scalar.c  -std=c++11 (test for excess errors)
FAIL: c-c++-common/vector-scalar.c  -std=c++14 (in

[PATCH] Preserve the original program while using graphite

2015-11-11 Thread Aditya Kumar

Previously, after SCoP detection, Graphite translated portions of the original
program in order to represent the SCoP in the polyhedral model.  This was
required because each basic block was represented as an independent basic
block in the polyhedral model, so all cross-basic-block dependencies were
translated out of SSA.

With this patch those dependencies are also exposed to ISL, so there is no
need to modify the original structure of the program.
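What the removed "cross bb scalar to array translation" did can be illustrated at the source level. Both hypothetical functions below compute the same reduction; the second mimics the old approach of keeping a scalar that is live across basic blocks in a one-element ("zero-dimensional") array, so that it could be modelled as a memory access. This is only an analogy: Graphite rewrote GIMPLE, not C++ source.

```cpp
#include <cassert>

// Scalar reduction S: its value flows from the loop body out of the
// loop nest, i.e. across basic blocks.
double
sum_scalar (const double a[][4], int n)
{
  double s = 0.0;
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < 4; ++j)
      s += a[i][j];
  return s;
}

// Analogue of the old out-of-SSA rewrite: the cross-bb scalar is kept in
// a one-element array, so every read and write becomes a memory access.
double
sum_zero_dim_array (const double a[][4], int n)
{
  double s_arr[1] = { 0.0 };
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < 4; ++j)
      s_arr[0] += a[i][j];
  return s_arr[0];
}
```

The results match; the difference is that the second form changes the program structure, which is what the patch avoids by exposing the scalar dependence to ISL directly.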

After this patch we should be able to enable graphite at some default
optimization level.


Highlights:
Remove cross bb scalar to array translation
For reductions, add support for more than just INT_CST
Early bailout on codegen.
Verify loop-closed ssa structure during copy of renames
The uses of exprs should come from bb which dominates the bb
Collect the init value of close phi in loop-guard
Do not follow vuses for close-phi, postpone loop-close phi until the
corresponding loop-phi is processed
Bail out if no bb is found to place cond/loop phis
Move insertion of liveouts to the end of codegen
Insert loop-phis in the loop-header.


This patch passes regtest and bootstrap with BOOT_CFLAGS='-O2 -fgraphite-identity -floop-nest-optimize'


From 706df301cdc8d2523408c663f62383308bc8a642 Mon Sep 17 00:00:00 2001
From: Aditya Kumar 
Date: Sat, 7 Nov 2015 13:39:23 -0600
Subject: [PATCH] Preserve the original program while using graphite.


2015-11-08  Aditya Kumar  
  Sebastian Pop  

	* gcc.dg/graphite/isl-ast-gen-user-1.c: Remove calls to std. library.

gcc/ChangeLog:

2015-11-08  Aditya Kumar  
  Sebastian Pop  

	* graphite-isl-ast-to-gimple.c (class translate_isl_ast_to_gimple):
	New member codegen_error.
	(translate_isl_ast_for_loop): Remove call to single_succ_edge and early return.
	(translate_isl_ast_node_user): Early return in case of error.
	(translate_isl_ast_to_gimple::translate_isl_ast): Same.
	(translate_isl_ast_to_gimple::translate_pending_phi_nodes): New.
	(add_parameters_to_ivs_params): Remove macro.
	(graphite_regenerate_ast_isl): Add if_region pointer to region.
	* graphite-poly.c (new_poly_dr): Remove macro.
	(print_pdr): Same.
	(new_gimple_poly_bb): Same.
	(free_gimple_poly_bb): Same.
	(print_scop_params): Same.
	* graphite-poly.h (struct poly_dr): Same.
	(struct poly_bb): Add new_bb.
	(gbb_from_bb): Remove dead code.
	(pbb_from_bb): Same. 
	* graphite-scop-detection.c (parameter_index_in_region_1): Same.
	(parameter_index_in_region): Same.
	(find_scop_parameters): Same.
	(build_cross_bb_scalars_def): New.
	(build_cross_bb_scalars_use): New.
	(graphite_find_cross_bb_scalar_vars): New.
	(try_generate_gimple_bb): Reads and Writes.
	(build_alias_set): Move.
	(gather_bbs::before_dom_children): Gather bbs visited.
	(build_scops): call build_alias_set.
	* graphite-sese-to-poly.c (phi_arg_in_outermost_loop): Delete.
	(remove_simple_copy_phi): Delete.
	(remove_invariant_phi): Delete.
	(simple_copy_phi_p): Delete.
	(reduction_phi_p): Delete.
	(isl_id_for_dr): Remove unused param.
	(parameter_index_in_region_1): Remove macro usage.
	(set_scop_parameter_dim): Same.
	(add_param_constraints): Same.
	(add_conditions_to_constraints): Same.
	(build_scop_iteration_domain): Same.
	(pdr_add_alias_set): Comment.
	(add_scalar_version_numbers): New.
	(build_poly_dr): ISL id.
	(build_scop_drs): Move.
	(build_poly_sr_1): Same.
	(insert_stmts): Remove.
	(build_poly_sr): New.
	(new_pbb_from_pbb): Delete.
	(insert_out_of_ssa_copy_on_edge): Delete.
	(create_zero_dim_array): Delete.
	(scalar_close_phi_node_p): Delete.
	(propagate_expr_outside_region): Delete.
	(rewrite_close_phi_out_of_ssa): Delete.
	(rewrite_phi_out_of_ssa): Delete.
	(rewrite_degenerate_phi): Delete.
	(rewrite_reductions_out_of_ssa): Delete.
	(rewrite_cross_bb_scalar_dependence): Delete.
	(handle_scalar_dep

Re: [v3 PATCH] Implement D0013R2, logical type traits.

2015-11-11 Thread Ville Voutilainen

On 12 November 2015 at 00:18, Jonathan Wakely  wrote:
> So I think we want to define them again, independently, in
> , even though it might lead to ambiguities

Here. Tested again on Linux-PPC64.
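The observable semantics of the new traits can be sketched in C++11 with std::conditional: conjunction derives from the first operand whose value is false (or the last operand if all are true), disjunction from the first whose value is true, and later operands are never inspected. The `*_sketch` names are hypothetical, chosen to avoid clashing with the real traits; libstdc++ itself builds them on its internal __and_, __or_ and __not_ helpers, as the diff shows.

```cpp
#include <cassert>
#include <type_traits>

// conjunction: true_type if empty, else the first false operand or the last.
template<typename... Bn>
  struct conjunction_sketch : std::true_type { };

template<typename B1>
  struct conjunction_sketch<B1> : B1 { };

template<typename B1, typename... Bn>
  struct conjunction_sketch<B1, Bn...>
  : std::conditional<bool(B1::value), conjunction_sketch<Bn...>, B1>::type
  { };

// disjunction: false_type if empty, else the first true operand or the last.
template<typename... Bn>
  struct disjunction_sketch : std::false_type { };

template<typename B1>
  struct disjunction_sketch<B1> : B1 { };

template<typename B1, typename... Bn>
  struct disjunction_sketch<B1, Bn...>
  : std::conditional<bool(B1::value), B1, disjunction_sketch<Bn...>>::type
  { };

// negation: the logical complement of B::value.
template<typename B>
  struct negation_sketch
  : std::integral_constant<bool, !bool(B::value)>
  { };

// Has no ::value member; short-circuiting means it is never inspected.
struct no_value_sketch { };
```

For example, `conjunction_sketch<std::false_type, no_value_sketch>` is simply `false_type`-derived, because the second operand is named but never instantiated.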

2015-11-11  Ville Voutilainen  

Implement D0013R2, logical type traits.

/libstdc++-v3
* include/experimental/type_traits (conjunction, disjunction,
negation, conjunction_v, disjunction_v, negation_v): New.
* include/std/type_traits (conjunction, disjunction, negation):
Likewise.
* testsuite/20_util/declval/requirements/1_neg.cc: Adjust.
* testsuite/20_util/make_signed/requirements/typedefs_neg.cc: Likewise.
* testsuite/20_util/make_unsigned/requirements/typedefs_neg.cc:
Likewise.
* testsuite/experimental/type_traits/value.cc: Likewise.
* testsuite/20_util/logical_traits/requirements/explicit_instantiation.cc:
New.
* testsuite/20_util/logical_traits/requirements/typedefs.cc: Likewise.
* testsuite/20_util/logical_traits/value.cc: Likewise.

/testsuite
* g++.dg/cpp0x/Wattributes1.C: Adjust.
diff --git a/gcc/testsuite/g++.dg/cpp0x/Wattributes1.C b/gcc/testsuite/g++.dg/cpp0x/Wattributes1.C
index d818851..dd9011b 100644
--- a/gcc/testsuite/g++.dg/cpp0x/Wattributes1.C
+++ b/gcc/testsuite/g++.dg/cpp0x/Wattributes1.C
@@ -5,4 +5,4 @@
 #include 
 __attribute__((visibility("hidden")))void*operator new(std::size_t); // { dg-warning "visibility attribute ignored" }
 
-// { dg-message "previous declaration" "" { target *-*-* } 111 }
+// { dg-message "previous declaration" "" { target *-*-* } 116 }
diff --git a/libstdc++-v3/include/experimental/type_traits b/libstdc++-v3/include/experimental/type_traits
index b0ed3b0..e4f3ffe 100644
--- a/libstdc++-v3/include/experimental/type_traits
+++ b/libstdc++-v3/include/experimental/type_traits
@@ -271,6 +271,35 @@ template class _Op, typename... _Args>
   constexpr bool is_detected_convertible_v
 = is_detected_convertible<_To, _Op, _Args...>::value;
 
+#define __cpp_lib_experimental_logical_traits 201511
+
+template
+  struct conjunction
+  : __and_<_Bn...>
+  { };
+
+template
+  struct disjunction
+  : __or_<_Bn...>
+  { };
+
+template
+  struct negation
+  : __not_<_Pp>
+  { };
+
+template
+  constexpr bool conjunction_v
+= conjunction<_Bn...>::value;
+
+template
+  constexpr bool disjunction_v
+= disjunction<_Bn...>::value;
+
+template
+  constexpr bool negation_v
+= negation<_Pp>::value;
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace fundamentals_v2
 } // namespace experimental
diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 7448d5b..e5102de 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -154,6 +154,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : public integral_constant
 { };
 
+#if __cplusplus > 201402L
+
+#define __cpp_lib_logical_traits 201511
+
+  template
+struct conjunction
+: __and_<_Bn...>
+{ };
+
+  template
+struct disjunction
+: __or_<_Bn...>
+{ };
+
+  template
+struct negation
+: __not_<_Pp>
+{ };
+#endif
+
   // For several sfinae-friendly trait implementations we transport both the
   // result information (as the member type) and the failure information (no
   // member type). This is very similar to std::enable_if, but we cannot use
diff --git a/libstdc++-v3/testsuite/20_util/declval/requirements/1_neg.cc b/libstdc++-v3/testsuite/20_util/declval/requirements/1_neg.cc
index 4e7deda..37bc6b1 100644
--- a/libstdc++-v3/testsuite/20_util/declval/requirements/1_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/declval/requirements/1_neg.cc
@@ -19,7 +19,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-error "static assertion failed" "" { target *-*-* } 2239 }
+// { dg-error "static assertion failed" "" { target *-*-* } 2259 }
 
 #include 
 
diff --git a/libstdc++-v3/testsuite/20_util/logical_traits/requirements/explicit_instantiation.cc b/libstdc++-v3/testsuite/20_util/logical_traits/requirements/explicit_instantiation.cc
new file mode 100644
index 000..b2b6c71
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/logical_traits/requirements/explicit_instantiation.cc
@@ -0,0 +1,30 @@
+// { dg-options "-std=gnu++17" }
+// { dg-do compile }
+
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public

Re: Enable pointer TBAA for LTO

2015-11-11 Thread Jan Hubicka

> > On 11/11/2015 10:21 AM, Richard Biener wrote:
> > >On Tue, 10 Nov 2015, Jan Hubicka wrote:
> > >>The reason is that TYPE_CANONICAL is initialized in get_alias_set that may be
> > >>called before we finish all merging and then it is more fine grained than what
> > >>we need here (i.e. TYPE_CANONICAL of pointers to two different types will be
> > >>different, but here we want them to be equal so we can match:
> > >>
> > >>struct aa { void *ptr;};
> > >>struct bb { int * ptr;};
> > >>
> > >>Which is actually required for Fortran interoperability.
> > 
> > Just curious, is this sort of thing documented anywhere?
> 
> See http://www.j3-fortran.org/doc/year/10/10-007.pdf, section 15 
> (interoperability with C).
> It defines that C_PTR is compatible with any C non-function pointer.

... and if you ask about GCC-side documentation, I added testcases that should
trigger if the Fortran interoperability breaks.  I do not think we want to
document the above compatibility explicitly, because in the future we may want
more fine-grained TBAA for types that are not shared between Fortran and C
code (i.e. most types in practice).

Honza
> 
> Honza
> > 
> > 
> > Bernd

Re: [v3 PATCH] Implement D0013R2, logical type traits.

2015-11-11 Thread Jonathan Wakely


On 12/11/15 00:06 +0200, Ville Voutilainen wrote:

--- a/gcc/testsuite/g++.dg/cpp0x/Wattributes1.C
+++ b/gcc/testsuite/g++.dg/cpp0x/Wattributes1.C
@@ -5,4 +5,4 @@
#include 
__attribute__((visibility("hidden")))void*operator new(std::size_t); // { dg-warning "visibility attribute ignored" }

-// { dg-message "previous declaration" "" { target *-*-* } 111 }
+// { dg-message "previous declaration" "" { target *-*-* } 116 }
diff --git a/libstdc++-v3/include/experimental/type_traits b/libstdc++-v3/include/experimental/type_traits
index b0ed3b0..b7f3bda 100644
--- a/libstdc++-v3/include/experimental/type_traits
+++ b/libstdc++-v3/include/experimental/type_traits
@@ -271,6 +271,28 @@ template class _Op, typename... _Args>
  constexpr bool is_detected_convertible_v
= is_detected_convertible<_To, _Op, _Args...>::value;

+#if __cplusplus > 201402L
+
+#define __cpp_lib_experimental_logical_traits 201511
+
+using std::conjunction;
+using std::disjunction;
+using std::negation;


It's unfortunate if the std::experimental versions are only available
for C++17 mode, because the whole point of putting them into both C++17
and the TS was that users can make use of the std::experimental
versions sooner than they can make use of the C++17 versions. If both
versions depend on using -std=c++17 then the TS versions are entirely
redundant.

So I think we want to define them again, independently, in
, even though it might lead to ambiguities
for code that does:

using namespace std;
using namespace std::experimental;
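A minimal sketch of the ambiguity described above, assuming nothing beyond standard C++11. The namespaces `fake_std` and `fake_experimental` are hypothetical stand-ins for `std` and `std::experimental`, each with its own independent `conjunction` template. With both using-directives active, an unqualified `conjunction<...>` fails to compile as ambiguous, so the sketch keeps that line commented out and shows the qualified forms that still work.

```cpp
#include <type_traits>

// Hypothetical stand-in for std: a simple (non-short-circuiting) conjunction.
namespace fake_std {
  template<typename... Bn>
    struct conjunction : std::true_type { };

  template<typename B1, typename... Bn>
    struct conjunction<B1, Bn...>
    : std::integral_constant<bool, B1::value && conjunction<Bn...>::value> { };
}

// Hypothetical stand-in for std::experimental: an independent definition
// with the same name, as proposed in the reply above.
namespace fake_experimental {
  template<typename... Bn>
    struct conjunction : fake_std::conjunction<Bn...> { };
}

using namespace fake_std;
using namespace fake_experimental;

// Unqualified use is ambiguous once both using-directives are active:
//   constexpr bool bad = conjunction<std::true_type>::value;  // error: ambiguous
// Qualified names are still fine.
constexpr bool via_std =
  fake_std::conjunction<std::true_type, std::true_type>::value;
constexpr bool via_exp =
  fake_experimental::conjunction<std::true_type, std::false_type>::value;
```

Qualifying the name, as in the last two declarations, is the simplest way for user code to sidestep the clash.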

Re: Enable pointer TBAA for LTO

2015-11-11 Thread Jan Hubicka

> On 11/11/2015 10:21 AM, Richard Biener wrote:
> >On Tue, 10 Nov 2015, Jan Hubicka wrote:
> >>The reason is that TYPE_CANONICAL is initialized in get_alias_set that may be
> >>called before we finish all merging and then it is more fine grained than what
> >>we need here (i.e. TYPE_CANONICAL of pointers to two different types will be
> >>different, but here we want them to be equal so we can match:
> >>
> >>struct aa { void *ptr;};
> >>struct bb { int * ptr;};
> >>
> >>Which is actually required for Fortran interoperability.
> 
> Just curious, is this sort of thing documented anywhere?

See http://www.j3-fortran.org/doc/year/10/10-007.pdf, section 15 
(interoperability with C).
It defines that C_PTR is compatible with any C non-function pointer.

Honza
> 
> 
> Bernd

Re: [libstdc++ testsuite][patch] many locale tests only SUPPORTED on linux, start making these portable

2015-11-11 Thread Jonathan Wakely


On 11/11/15 22:53 +0100, John Marino wrote:

On 11/11/2015 10:51 PM, Jonathan Wakely wrote:

On 16/10/15 11:21 +0200, John Marino wrote:

The only significant comment was:

e.g. `"de_DE" => "de_DE@ISO8859-15` should be `e.g. "de_DE" =>
"de_DE.ISO8859-15"` ? Since it's an encoding, @ is used for a modifier.
FYI, https://www.ietf.org/rfc/rfc4646.txt



It's a good catch; that looks like a typo.  Did that appear in more than
one place?


Dunno, I'd have to grep for it.


What are the next steps then?


I think we can go ahead and make this change, I'll try to get it
committed tomorrow or Friday.

[v3 PATCH] Implement D0013R2, logical type traits.

2015-11-11 Thread Ville Voutilainen

Tested on Linux-PPC64.

2015-11-11  Ville Voutilainen  

Implement D0013R2, logical type traits.

/libstdc++-v3
* include/experimental/type_traits (conjunction_v, disjunction_v,
negation_v): New.
* include/std/type_traits (conjunction, disjunction, negation):
Likewise.
* testsuite/20_util/declval/requirements/1_neg.cc: Adjust.
* testsuite/20_util/make_signed/requirements/typedefs_neg.cc: Likewise.
* testsuite/20_util/make_unsigned/requirements/typedefs_neg.cc:
Likewise.
* testsuite/experimental/type_traits/value.cc: Likewise.
* testsuite/20_util/logical_traits/requirements/explicit_instantiation.cc:
New.
* testsuite/20_util/logical_traits/requirements/typedefs.cc: Likewise.
* testsuite/20_util/logical_traits/value.cc: Likewise.

/testsuite
* g++.dg/cpp0x/Wattributes1.C: Adjust.
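The logical traits can be sketched with the short-circuiting recursive pattern below, which is close in spirit to libstdc++'s internal `__and_`/`__or_`/`__not_` helpers that the patch forwards to. The `my_` names are hypothetical, chosen to avoid clashing with `std::conjunction` and friends, and the `_v` variable templates assume at least C++14.

```cpp
#include <type_traits>

// my_conjunction: true_type for an empty pack; otherwise derives from the
// first operand whose ::value is false, or from the last operand.
template<typename... Bn>
  struct my_conjunction : std::true_type { };

template<typename B1>
  struct my_conjunction<B1> : B1 { };

template<typename B1, typename... Bn>
  struct my_conjunction<B1, Bn...>
  : std::conditional<bool(B1::value), my_conjunction<Bn...>, B1>::type { };

// my_disjunction: false_type for an empty pack; otherwise derives from the
// first operand whose ::value is true, or from the last operand.
template<typename... Bn>
  struct my_disjunction : std::false_type { };

template<typename B1>
  struct my_disjunction<B1> : B1 { };

template<typename B1, typename... Bn>
  struct my_disjunction<B1, Bn...>
  : std::conditional<bool(B1::value), B1, my_disjunction<Bn...>>::type { };

// my_negation: logical NOT of a single trait.
template<typename B>
  struct my_negation : std::integral_constant<bool, !bool(B::value)> { };

// Variable-template shortcuts, mirroring conjunction_v and friends.
template<typename... Bn>
  constexpr bool my_conjunction_v = my_conjunction<Bn...>::value;

template<typename... Bn>
  constexpr bool my_disjunction_v = my_disjunction<Bn...>::value;

template<typename B>
  constexpr bool my_negation_v = my_negation<B>::value;

// Declared but never defined: used to show that evaluation stops early.
struct NeverDefined;
```

Because `std::conditional` only names the remaining `my_conjunction<Bn...>` without instantiating it, an operand after the deciding one (such as the incomplete `NeverDefined`) is never inspected. That short-circuit behaviour is what distinguishes these traits from a plain `&&` over each operand's `::value`.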
diff --git a/gcc/testsuite/g++.dg/cpp0x/Wattributes1.C b/gcc/testsuite/g++.dg/cpp0x/Wattributes1.C
index d818851..dd9011b 100644
--- a/gcc/testsuite/g++.dg/cpp0x/Wattributes1.C
+++ b/gcc/testsuite/g++.dg/cpp0x/Wattributes1.C
@@ -5,4 +5,4 @@
 #include 
 __attribute__((visibility("hidden")))void*operator new(std::size_t); // { dg-warning "visibility attribute ignored" }
 
-// { dg-message "previous declaration" "" { target *-*-* } 111 }
+// { dg-message "previous declaration" "" { target *-*-* } 116 }
diff --git a/libstdc++-v3/include/experimental/type_traits b/libstdc++-v3/include/experimental/type_traits
index b0ed3b0..b7f3bda 100644
--- a/libstdc++-v3/include/experimental/type_traits
+++ b/libstdc++-v3/include/experimental/type_traits
@@ -271,6 +271,28 @@ template class _Op, typename... _Args>
   constexpr bool is_detected_convertible_v
 = is_detected_convertible<_To, _Op, _Args...>::value;
 
+#if __cplusplus > 201402L
+
+#define __cpp_lib_experimental_logical_traits 201511
+
+using std::conjunction;
+using std::disjunction;
+using std::negation;
+
+template
+  constexpr bool conjunction_v
+= conjunction<_Bn...>::value;
+
+template
+  constexpr bool disjunction_v
+= disjunction<_Bn...>::value;
+
+template
+  constexpr bool negation_v
+= negation<_Pp>::value;
+
+#endif
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace fundamentals_v2
 } // namespace experimental
diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 7448d5b..e5102de 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -154,6 +154,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : public integral_constant
 { };
 
+#if __cplusplus > 201402L
+
+#define __cpp_lib_logical_traits 201511
+
+  template
+struct conjunction
+: __and_<_Bn...>
+{ };
+
+  template
+struct disjunction
+: __or_<_Bn...>
+{ };
+
+  template
+struct negation
+: __not_<_Pp>
+{ };
+#endif
+
   // For several sfinae-friendly trait implementations we transport both the
   // result information (as the member type) and the failure information (no
   // member type). This is very similar to std::enable_if, but we cannot use
diff --git a/libstdc++-v3/testsuite/20_util/declval/requirements/1_neg.cc b/libstdc++-v3/testsuite/20_util/declval/requirements/1_neg.cc
index 4e7deda..37bc6b1 100644
--- a/libstdc++-v3/testsuite/20_util/declval/requirements/1_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/declval/requirements/1_neg.cc
@@ -19,7 +19,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-error "static assertion failed" "" { target *-*-* } 2239 }
+// { dg-error "static assertion failed" "" { target *-*-* } 2259 }
 
 #include 
 
diff --git a/libstdc++-v3/testsuite/20_util/logical_traits/requirements/explicit_instantiation.cc b/libstdc++-v3/testsuite/20_util/logical_traits/requirements/explicit_instantiation.cc
new file mode 100644
index 000..b2b6c71
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/logical_traits/requirements/explicit_instantiation.cc
@@ -0,0 +1,30 @@
+// { dg-options "-std=gnu++17" }
+// { dg-do compile }
+
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// NB: This file is for testing type_traits with NO OTHER INCLUDES.
+
+#include 
+
+namespace std
+{
+  template struct conjunction;
+  template struct disjunction

Re: [libstdc++ testsuite][patch] many locale tests only SUPPORTED on linux, start making these portable

2015-11-11 Thread John Marino

On 11/11/2015 10:51 PM, Jonathan Wakely wrote:
> On 16/10/15 11:21 +0200, John Marino wrote:
> 
> The only significant comment was:
> 
> e.g. `"de_DE" => "de_DE@ISO8859-15` should be `e.g. "de_DE" =>
> "de_DE.ISO8859-15"` ? Since it's an encoding, @ is used for a modifier.
> FYI, https://www.ietf.org/rfc/rfc4646.txt
> 

It's a good catch; that looks like a typo.  Did that appear in more than
one place?

What are the next steps then?

Thanks,
John

Re: [libstdc++ testsuite][patch] many locale tests only SUPPORTED on linux, start making these portable

2015-11-11 Thread Jonathan Wakely


On 16/10/15 11:21 +0200, John Marino wrote:

There are a few issues with the locales portion of the libstdc++
testsuite.  These issues stem from the fact that much about locales is
not standardized, such as the names of the encodings, how to handle
abbreviated versions of named locales, and modifiers (e.g. @euro,
@preeuro).  Compounding the problem is that many of the tests themselves
are Linux-specific.

The attached large patch makes a dent in resolving these issues.  This
patch has been scoped to only standardise the names of the locales.
There is no net effect on Linux, but it allows many more tests to be
considered on other systems, including *BSD.

Some types of changes are:
1) Convert abbreviations to full names (e.g. "de_DE" => "de_DE@ISO8859-15")
2) Use ISO-8859-15 over ISO-8859-1 in European locales when possible
3) Use "UTF-8" over variations like "utf8"
4) Use "ISO8859" over "ISO-8859"

On these last two: Linux is case-insensitive and hyphen-insensitive with
respect to named locales, whereas other systems like BSD have only one
version that must be used explicitly.  Since encoding names are not
standardised, we've picked a lowest common denominator that works.

5) Case changes for eucJP
6) Convert tests that specified the "@euro" modifier to use a new ISO_8859
macro that produces a named locale as a function of the system (currently
FreeBSD/DragonFly/NetBSD versus everything else, but the tailoring is not
limited to that).  This fixes systems that don't support the @euro modifier.

There is a lot more work that can be done to open up libstdc++ locale
tests to non-Linux systems, but given the size of this patch, I wanted
to limit its scope to modifying only the named locale specifications.
There are many more modifications that could be made to the tests
themselves to make them portable to non-Linux systems, but I think they
are better made in separate patches.

Andreas Tobler has done a lot of review and testing on both FreeBSD and
linux (to verify no regressions), and Jonathan Wakely has done an
initial review and said it was ready for a broader review now.


I asked some colleagues to look at this too (a few weeks ago now,
before I went on vacation).

The only significant comment was:

e.g. `"de_DE" => "de_DE@ISO8859-15` should be `e.g. "de_DE" => 
"de_DE.ISO8859-15"` ? Since it's an encoding, @ is used for a modifier.
FYI, https://www.ietf.org/rfc/rfc4646.txt
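The reviewer's point follows from the usual POSIX (X/Open) locale naming form `language[_territory][.codeset][@modifier]`, where `.` introduces the codeset and `@` introduces a modifier such as `euro`. The sketch below splits a name along those separators. `LocaleName` and `split_locale_name` are hypothetical helpers, not part of libstdc++ or its testsuite, and they perform no validation.

```cpp
#include <cassert>
#include <string>

// Components of a name of the form language[_territory][.codeset][@modifier].
struct LocaleName {
  std::string language;
  std::string territory;
  std::string codeset;
  std::string modifier;
};

// Hypothetical helper: peel off the '@' modifier first, then the '.'
// codeset, then the '_' territory; whatever remains is the language.
LocaleName split_locale_name(const std::string& name) {
  LocaleName out;
  std::string rest = name;

  std::string::size_type at = rest.find('@');
  if (at != std::string::npos) {
    out.modifier = rest.substr(at + 1);
    rest.erase(at);
  }

  std::string::size_type dot = rest.find('.');
  if (dot != std::string::npos) {
    out.codeset = rest.substr(dot + 1);
    rest.erase(dot);
  }

  std::string::size_type us = rest.find('_');
  if (us != std::string::npos) {
    out.territory = rest.substr(us + 1);
    rest.erase(us);
  }

  out.language = rest;
  return out;
}
```

Parsed this way, `"de_DE@ISO8859-15"` yields an empty codeset and a modifier of `ISO8859-15`, which is why the `.` spelling is the one that actually names an encoding.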

Re: [patch] backport remove of soft float for FreeBSD powerpc for gcc-4.9

2015-11-11 Thread Andreas Tobler


On 09.11.15 13:41, Andreas Tobler wrote:

Hi all,

any objections if I apply the patch below to gcc-4.9?



committed as: 230193

Andreas


2015-11-09  Andreas Tobler  

Backport from mainline
2015-03-04  Andreas Tobler  

* config/rs6000/t-freebsd64: Remove 32-bit soft-float multilibs.


Index: gcc/config/rs6000/t-freebsd64
===
--- gcc/config/rs6000/t-freebsd64   (revision 230016)
+++ gcc/config/rs6000/t-freebsd64   (working copy)
@@ -21,11 +21,9 @@
   # On FreeBSD the 32-bit libraries are found under /usr/lib32.
   # Set MULTILIB_OSDIRNAMES according to this.

-MULTILIB_OPTIONS= m32 msoft-float
-MULTILIB_DIRNAMES   = 32 nof
+MULTILIB_OPTIONS= m32
+MULTILIB_DIRNAMES   = 32
   MULTILIB_EXTRA_OPTS = fPIC mstrict-align
   MULTILIB_EXCEPTIONS =
-MULTILIB_EXCLUSIONS = !m32/msoft-float
   MULTILIB_OSDIRNAMES  = ../lib32
-#MULTILIB_MATCHES= $(MULTILIB_MATCHES_FLOAT)

Re: [patch] backport PIE support for FreeBSD to gcc-49

2015-11-11 Thread Andreas Tobler


On 09.11.15 13:38, Andreas Tobler wrote:

Hi,

any objections if I apply this patch to gcc-4.9?

It is FreeBSD only.


committed as: 230192

Andreas


2015-11-09  Andreas Tobler  

Backport from mainline
2015-05-18  Andreas Tobler  

* config/freebsd-spec.h (FBSD_STARTFILE_SPEC): Add the bits to build
pie executables.
(FBSD_ENDFILE_SPEC): Likewise.
* config/i386/freebsd.h (STARTFILE_SPEC): Remove and use the one from
config/freebsd-spec.h.
(ENDFILE_SPEC): Likewise.

2015-11-02  Andreas Tobler  

* config/rs6000/freebsd64.h (ASM_SPEC32): Adjust spec to handle
PIE executables.

Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-11-11 Thread Jeff Law


On 09/04/2015 11:36 AM, Ajit Kumar Agarwal wrote:


diff --git a/gcc/passes.def b/gcc/passes.def
index 6b66f8f..20ddf3d 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.  If not see
  NEXT_PASS (pass_ccp);
  /* After CCP we rewrite no longer addressed locals into SSA
 form if possible.  */
+  NEXT_PASS (pass_path_split);
  NEXT_PASS (pass_forwprop);
  NEXT_PASS (pass_sra_early);

I can't recall if we've discussed the location of the pass at all.  I'm
not objecting to this location, but would like to hear why you chose
this particular location in the optimization pipeline.
So returning to the question of where this would live in the 
optimization pipeline and how it interacts with if-conversion and 
vectorization.


The concern with moving it to late in the pipeline was that we'd miss 
VRP/DCE/CSE opportunities.  I'm not sure if you're aware, but we 
actually run those passes more than once.  So it would be possible to 
run path splitting after if-conversion & vectorization, but before the 
second pass of VRP & DOM.  But trying that seems to result in something 
scrambling the loop enough that the path splitting opportunity is 
missed.  That might be worth deeper investigation if we can't come up 
with some kind of heuristics to fire or suppress path splitting.


Other random notes as I look over the code:

Call the pass "path-split", not "path_split".  I don't think we have any 
passes with underscores in their names, dump files, etc.


You factored out the code for transform_duplicate.  When you create new 
functions, they should all have a block comment indicating what they do, 
return values, etc.


I asked you to trim down the #includes in tree-ssa-path-split.c.  Most 
were ultimately unnecessary.  The trimmed list is just 11 headers.


Various functions in tree-ssa-path-split.c were missing their block 
comments.  There were several places in tree-ssa-path-split that I felt 
deserved a comment.  While you are familiar with the code, it's likely 
someone else will have to look at and modify this code at some point in 
the future.  The comments help make that easier.


In find_trace_loop_latch_same_as_join_blk, we find the immediate 
dominator of the latch and verify it ends in a conditional.  That's 
fine.  Then we look at the predecessors of the latch to see if one is 
succeeded only by the latch and falls through to the latch.  That is the 
block we'll end up redirecting to a copy of the latch.  Also fine.
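The transformation discussed here, giving each path through the if/else its own copy of the latch, can be approximated at source level. This is only a sketch under stated assumptions: the real pass works on GIMPLE basic blocks, and `sum_before`/`sum_after` are hypothetical functions, not the testcase from the patch. They show the join-block statements copied into each arm while the loop-exit test stays in one place.

```cpp
#include <cassert>

// Before: both arms of the if/else fall through to a shared latch (the join
// block) that accumulates x and advances the induction variable.
int sum_before(const int* a, int n) {
  int sum = 0;
  int i = 0;
  while (i < n) {
    int x;
    if (a[i] > 0)
      x = a[i] * 2;
    else
      x = -a[i];
    sum += x;  // latch statements shared by both paths
    ++i;
  }
  return sum;
}

// After (source-level view): the latch statements are duplicated into each
// arm, so each copy sees a single definition of the added value.  The loop
// exit test (i < n) is not duplicated.
int sum_after(const int* a, int n) {
  int sum = 0;
  int i = 0;
  while (i < n) {
    if (a[i] > 0) {
      sum += a[i] * 2;
      ++i;
    } else {
      sum += -a[i];
      ++i;
    }
  }
  return sum;
}
```

Both functions compute the same result; the point of the split form is that each copy of the latch is specialized to one path, which is the kind of opportunity later VRP/DCE/CSE passes can exploit.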


Note how there is no testing for the relationship between the immediate 
dominator of the latch and the predecessors of the latch.  ISTM that we 
can have a fairly arbitrary region in the THEN/ELSE arms of the 
conditional.  Was this intentional?  Would it be advisable to verify 
that the THEN/ELSE arms are single blocks?  Do we want to verify that 
neither the THEN/ELSE arms transfer control other than to the latch?  Do 
we want to verify the predecessors of the latch are immediate successors 
of the latch's immediate dominator?


The is_feasible_trace routine was still checking if the block had a 
conversion and rejecting it.  I removed that check.  It does seem to me 
that we need an upper limit on the number of statements.  I wonder if we 
should factor out the maximum statements to copy code from jump 
threading and use it for both jump threading and path splitting.


Instead of creating a loop with multiple latches, whatever happened to 
the idea of duplicating the latch block twice, once into each path? 
Remove the control statement in each duplicate.  Then remove everything 
but the control statement in the original latch.



I added some direct dump support.  Essentially anytime we split the 
path, we output something like this:


Split path in loop: latch block 9, predecessor 7.

That allows tests in the testsuite to look for the "Split path in loop" 
string rather than inferring the information from the SSA graph update's 
replacement table.  It also allows us to do things like count how many 
paths get split if we have more complex tests.


On the topic of tests.  Is the one you provided something where path 
splitting results in a significant improvement?  From looking at the 
x86_64 output, I can see the path splitting transformation occur, but 
not any improvement in the final code.


While the existing test is useful, testing on code that actually 
improves as a result of path splitting is better.  Ideally we'd test 
both that path splitting occurred and that the secondary optimizations 
we wanted triggered.


The tests should go into gcc.dg/tree-ssa rather than just gcc.dg.

Anyway, here's my work-in-progress.  Your thoughts on the various 
questions, concerns, and ideas noted above would be appreciated.  Obviously 
I'd like to wrap things up quickly and include this patch in gcc6.


Note, I haven't bootstrapped or regression tested this version.






diff --git a/gcc/Makefi

Re: [PATCH, 8/16] Add pass_ch_oacc_kernels

2015-11-11 Thread Tom de Vries


On 09/11/15 19:33, Tom de Vries wrote:

On 09/11/15 16:35, Tom de Vries wrote:

Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

   1  Insert new exit block only when needed in
      transform_to_exit_first_loop_alt
   2  Make create_parallel_loop return void
   3  Ignore reduction clause on kernels directive
   4  Implement -foffload-alias
   5  Add in_oacc_kernels_region in struct loop
   6  Add pass_oacc_kernels
   7  Add pass_dominator_oacc_kernels
   8  Add pass_ch_oacc_kernels
   9  Add pass_parallelize_loops_oacc_kernels
  10  Add pass_oacc_kernels pass group in passes.def
  11  Update testcases after adding kernels pass group
  12  Handle acc loop directive
  13  Add c-c++-common/goacc/kernels-*.c
  14  Add gfortran.dg/goacc/kernels-*.f95
  15  Add libgomp.oacc-c-c++-common/kernels-*.c
  16  Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Build and reg-tested with nvidia accelerator, in combination with a
patch that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.


this patch adds a pass pass_ch_oacc_kernels, which is like pass_ch, but
only runs for loops with oacc_kernels_region set.

[ But... thinking about it a bit more, I think that we could use a
regular pass_ch instead. We only use the kernels pass group for a single
loop nest in a kernels region, and we mark all the loops in the loop
nest with oacc_kernels_region. So I think that the oacc_kernels_region
test in pass_ch_oacc_kernels::process_loop_p evaluates to true. ]

So, I'll try to confirm with retesting that we can drop this patch.



That's confirmed. I can use pass_ch instead of pass_ch_oacc_kernels, so 
I'm dropping this patch from the series.


Thanks,
- Tom

[PATCH] correct -Wself-init diagnostic location

2015-11-11 Thread Martin Sebor


GCC warns for class members that are initialized with themselves,
and with c++/64667 fixed, this includes references.

Unfortunately, the diagnostic that's printed is inconsistent between
pointers and references, and other members. For the first two kinds,
it's missing the caret, and for the others it points to the ctor rather
than to the member-initializer (see below).

The attached patch fixes it so that all such diagnostics consistently
point to the member-initializer, and adds a test to verify gcc does
so.

Martin

struct S {
  int m;
  int &r;
  int *p;
  S ():
    m (m),
    r (r),
    p (p)
  { }
};
u.cpp: In constructor ‘S::S()’:
u.cpp:5:5: warning: ‘S::m’ is initialized with itself [-Winit-self]
 S ():
 ^

u.cpp:5:5: warning: ‘S::r’ is initialized with itself [-Winit-self]
u.cpp:5:5: warning: ‘S::p’ is initialized with itself [-Winit-self]
gcc/cp/
2015-11-11  Martin Sebor  

	PR c++/68208
	* init.c (perform_member_init): Use location of member-initializer
	in -Winit-self rather than that of the ctor.

gcc/testsuite/
2015-11-11  Martin Sebor  

	PR c++/68208
	* g++.dg/warn/Winit-self-4.C: New test.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 2e11acb..055e9d9 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -638,7 +638,7 @@ perform_member_init (tree member, tree init)
 	val = TREE_OPERAND (val, 0);
   if (TREE_CODE (val) == COMPONENT_REF && TREE_OPERAND (val, 1) == member
 	  && TREE_OPERAND (val, 0) == current_class_ref)
-	warning_at (DECL_SOURCE_LOCATION (current_function_decl),
+	warning_at (EXPR_LOC_OR_LOC (val, input_location),
 		OPT_Winit_self, "%qD is initialized with itself",
 		member);
 }
diff --git a/gcc/testsuite/g++.dg/warn/Winit-self-4.C b/gcc/testsuite/g++.dg/warn/Winit-self-4.C
new file mode 100644
index 000..0e61abf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winit-self-4.C
@@ -0,0 +1,16 @@
+// PR c++/68208
+// { dg-options "-Winit-self" }
+
+// Verify that -Winit-self warning issued for reference and pointer
+// members points to the member-initializer and not to the constructor.
+struct S
+{
+int i;
+int *p;
+int &r;
+S ():
+i (i),// { dg-warning "initialized with itself" }
+p (p),// { dg-warning "initialized with itself" }
+r (r) // { dg-warning "initialized with itself" }
+{ }
+};
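For contrast with the test above, here is a sketch of the non-warning form, assuming the intent is to initialize each member from outside the object. The struct is hypothetical: every member-initializer uses a constructor parameter instead of the member itself, so `-Winit-self` has nothing to report. In the self-initializing version the members are used before they are initialized, so that code is undefined, not merely suspicious.

```cpp
#include <cassert>

// Each member is initialized from a constructor parameter, never from
// itself, so -Winit-self does not fire.
struct S {
  int i;
  int* p;
  int& r;
  S(int value, int& target) : i(value), p(&target), r(target) { }
};
```

Member-initializers are listed in declaration order here, which also keeps `-Wreorder` quiet.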

[v3 patch] GSoC: implement polymorphic memory resources

2015-11-11 Thread Jonathan Wakely

The first patch is the work of Fan You, who implemented the
"polymorphic memory resources" feature from the C++ Library
Fundamentals TS as a Google Summer of Code project. Thanks very much
to Fan You for contributing this work, and to Tim Shen for mentoring
the work, and Google for sponsoring it.

The second patch adds a few little pieces missing from Fan You's
patch.

Tested powerpc64le-linux.

Apart from one minor change to the definition of the
std::uses_allocator trait (which will have no effect on any existing
code) this only makes changes inside namespace std::experimental. I
intend to commit these patches before stage 1 ends.
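A simplified sketch of the shape of the TS's `memory_resource`, assuming only its basic design: public non-virtual `allocate`/`deallocate`/`is_equal` members that forward to private virtual `do_*` hooks. `MemoryResource` and `CountingResource` are hypothetical names; this is not the `<experimental/memory_resource>` header from the patch. `CountingResource` also ignores the alignment argument, so it is only suitable for alignments up to `alignof(std::max_align_t)`.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Simplified stand-in for memory_resource: callers use the public
// non-virtual interface; derived resources override the private hooks.
class MemoryResource {
public:
  virtual ~MemoryResource() = default;

  void* allocate(std::size_t bytes,
                 std::size_t align = alignof(std::max_align_t))
  { return do_allocate(bytes, align); }

  void deallocate(void* p, std::size_t bytes,
                  std::size_t align = alignof(std::max_align_t))
  { do_deallocate(p, bytes, align); }

  bool is_equal(const MemoryResource& other) const noexcept
  { return do_is_equal(other); }

private:
  virtual void* do_allocate(std::size_t bytes, std::size_t align) = 0;
  virtual void do_deallocate(void* p, std::size_t bytes, std::size_t align) = 0;
  virtual bool do_is_equal(const MemoryResource& other) const noexcept = 0;
};

// Hypothetical resource: forwards to global operator new/delete and tracks
// how many bytes are currently outstanding.  Alignment is ignored.
class CountingResource : public MemoryResource {
public:
  std::size_t in_use() const { return in_use_; }

private:
  void* do_allocate(std::size_t bytes, std::size_t) override {
    void* p = ::operator new(bytes);
    in_use_ += bytes;
    return p;
  }

  void do_deallocate(void* p, std::size_t bytes, std::size_t) override {
    ::operator delete(p);
    in_use_ -= bytes;
  }

  bool do_is_equal(const MemoryResource& other) const noexcept override {
    return this == &other;  // equal only to itself
  }

  std::size_t in_use_ = 0;
};
```

Code holding a `MemoryResource&` can allocate through any derived resource without knowing its concrete type, which is what lets `polymorphic_allocator` be a single allocator type for many memory strategies.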


commit 8ed54c84edbdebe30cb16ab97b54a4d157be67c2
Author: Jonathan Wakely 
Date:   Wed Nov 11 18:58:51 2015 +

Implement C++ LFTSv1 polymorphic memory resources

2015-11-11  Fan You  

* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/bits/uses_allocator.h (__erased_type): Define.
(__uses_allocator_helper): Check for __erased_type.
* include/experimental/memory_resource: New.
* include/experimental/utility: New.
* testsuite/experimental/type_erased_allocator/1.cc: New.
* testsuite/experimental/type_erased_allocator/1_neg.cc: New.
* testsuite/experimental/type_erased_allocator/2.cc: New.
* testsuite/experimental/type_erased_allocator/uses_allocator.cc: New.
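The `__uses_allocator_helper` change below can be approximated by a standalone trait. The names (`erased_type_sketch`, `make_void_sketch`, `uses_allocator_sketch`) and the example types are hypothetical. The logic assumes, as the helper name `__is_erased_or_convertible` suggests, that a type whose `allocator_type` is the erased marker is treated as accepting any allocator, and that otherwise the allocator must be convertible to `allocator_type`.

```cpp
#include <memory>
#include <type_traits>

// Marker type playing the role of the erased allocator type.
struct erased_type_sketch { };

// void_t via a helper struct, which is SFINAE-safe on older compilers too.
template<typename... Ts> struct make_void_sketch { typedef void type; };
template<typename... Ts>
  using void_t_sketch = typename make_void_sketch<Ts...>::type;

// Primary template: T has no nested allocator_type, so the answer is false.
template<typename T, typename Alloc, typename = void>
  struct uses_allocator_sketch : std::false_type { };

// T has allocator_type: true if it is the erased marker, or if Alloc
// converts to it.
template<typename T, typename Alloc>
  struct uses_allocator_sketch<T, Alloc,
                               void_t_sketch<typename T::allocator_type>>
  : std::integral_constant<bool,
      std::is_same<typename T::allocator_type, erased_type_sketch>::value
      || std::is_convertible<Alloc, typename T::allocator_type>::value> { };

// Example types for the checks.
struct NoAlloc { };
struct WantsStdAlloc { using allocator_type = std::allocator<int>; };
struct ErasedAlloc { using allocator_type = erased_type_sketch; };
```

With the erased marker, any allocator type (even an unrelated one such as `double`) is accepted, while ordinary types still require convertibility.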

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 2dc0d01..ee9b6d8 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -656,6 +656,7 @@ experimental_headers = \
${experimental_srcdir}/list \
${experimental_srcdir}/map \
${experimental_srcdir}/memory \
+   ${experimental_srcdir}/memory_resource \
${experimental_srcdir}/numeric \
${experimental_srcdir}/optional \
${experimental_srcdir}/propagate_const \
@@ -668,6 +669,7 @@ experimental_headers = \
${experimental_srcdir}/type_traits \
${experimental_srcdir}/unordered_map \
${experimental_srcdir}/unordered_set \
+   ${experimental_srcdir}/utility \
${experimental_srcdir}/vector \
${experimental_filesystem_headers}
 
diff --git a/libstdc++-v3/include/bits/uses_allocator.h b/libstdc++-v3/include/bits/uses_allocator.h
index a0f084d..f7566a2 100644
--- a/libstdc++-v3/include/bits/uses_allocator.h
+++ b/libstdc++-v3/include/bits/uses_allocator.h
@@ -35,6 +35,12 @@ namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
+  struct __erased_type { };
+
+  template
+using __is_erased_or_convertible
+  = __or_, is_convertible<_Alloc, _Tp>>;
+
   /// [allocator.tag]
   struct allocator_arg_t { explicit allocator_arg_t() = default; };
 
@@ -47,7 +53,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct __uses_allocator_helper<_Tp, _Alloc,
   __void_t>
-: is_convertible<_Alloc, typename _Tp::allocator_type>::type
+: __is_erased_or_convertible<_Alloc, typename _Tp::allocator_type>::type
 { };
 
   /// [allocator.uses.trait]
diff --git a/libstdc++-v3/include/experimental/memory_resource b/libstdc++-v3/include/experimental/memory_resource
new file mode 100644
index 000..5c8cbd6
--- /dev/null
+++ b/libstdc++-v3/include/experimental/memory_resource
@@ -0,0 +1,383 @@
+//  -*- C++ -*-
+
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file experimental/memory_resource
+ *  This is a TS C++ Library header.
+ */
+
+#ifndef _GLIBCXX_EXPERIMENTAL_MEMORY_RESOURCE
+#define _GLIBCXX_EXPERIMENTAL_MEMORY_RESOURCE 1
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+namespace std {
+namespace experimental {
+inline namespace fundamentals_v2 {
+namespace pmr {
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  class memory_resource;
+
+  template 
+class polymorphic_allocator;
+
+  template 
+class __resource_adaptor_imp;

Re: [PATCH] [ARM/Aarch64] add initial Qualcomm support

2015-11-11 Thread Jim Wilson

On Wed, Nov 11, 2015 at 10:34 AM, Jim Wilson  wrote:
> I had to disable the cortex-a57 fma steering pass in the aarch64 port
> while testing the patch.  A bootstrap for aarch64 configured
> --with-cpu=cortex-a57 gives multiple ICEs while building the stage1
> libstdc++.  The ICEs are in scan_rtx_reg at regrename.c:1074.  This
> looks vaguely similar to PR 66785.

It looks like there is already a discussion of this issue in the
"Preferred rename register in regrename pass" thread, though I'm not
sure yet if it is the same issue or a closely related one.

Jim

Re: [PATCH] PR ada/66205 gnatbind generates invalid code when finalization is enabled in restricted runtime

2015-11-11 Thread Simon Wright

I’ve updated the original patch, which was built against 5.1.0 on 
x86_64-apple-darwin13; this patch is against 6.0.0-20151101 on 
x86_64-apple-darwin15.

--

If the RTS in use is "configurable" (I believe this is the same in this context 
as "restricted") and includes finalization, gnatbind generates binder code that 
won't compile.

This situation arises, for example, with an embedded RTS that incorporates the 
Ada 2012 generalized container iterators.

I note that the last 3 hunks of the attached patch may be overkill. The 
original code checks that the No_Finalization restriction is not in effect, 
and I’ve added a check that Configurable_Run_Time_On_Target isn’t set. I 
suspect, given other areas of the code, that the No_Finalization check is 
actually intended as a way of determining that this is a restricted runtime, 
and that the Configurable_Run_Time_On_Target check could replace it.

The attached patch was bootstrapped/regression tested (make check-ada) against 
6.0.0 on x86_64-apple-darwin15 (which confirms that the patch hasn't broken 
builds against the standard RTS).

arm-eabi-gnatbind was successful against both an RTS with finalization and one 
without.

gcc/ada/Changelog:

2015-11-11  Simon Wright 

   PR ada/66205
   * bindgen.adb (Gen_Adafinal): If Configurable_Run_Time_On_Target is
   true, generate a null body.
   (Gen_Main): If Configurable_Run_Time_On_Target is true, then
   - don't import __gnat_initialize or __gnat_finalize (as Initialize,
   Finalize respectively).
   - don't call Initialize or Finalize.



gcc-6.0.0-20151101-gcc-ada-bindgen.adb.diff

Re: [OpenACC] declare directive

2015-11-11 Thread Dominique d'Humières

> > "Would be really nice" means that you're asking us to work on and resolve
> > PR68271 before installing this patch?
>
> Dominique has committed a quick hack for this, so it is not urgent, but
> would be nice to get it resolved.  If somebody from Mentor gets to that,
> perfect, otherwise I (or somebody else) will get to that eventually.
>
>   Jakub
Could you please mail me any related patch before committing it? I prefer to 
see any possible problem before the commit rather than after.

TIA

Dominique

[PATCH 1/2] [graphite] add testsuite automatic dg-options and dg-do action for isl-ast-gen-* and fuse-* files

2015-11-11 Thread Sebastian Pop

---
 gcc/testsuite/gcc.dg/graphite/fuse-1.c  | 10 +++---
 gcc/testsuite/gcc.dg/graphite/fuse-2.c  |  4 +---
 gcc/testsuite/gcc.dg/graphite/graphite.exp  |  2 ++
 gcc/testsuite/gcc.dg/graphite/isl-ast-gen-blocks-1.c|  3 ---
 gcc/testsuite/gcc.dg/graphite/isl-ast-gen-blocks-2.c|  3 ---
 gcc/testsuite/gcc.dg/graphite/isl-ast-gen-blocks-3.c|  3 ---
 gcc/testsuite/gcc.dg/graphite/isl-ast-gen-blocks-4.c|  3 ---
 gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c|  3 ---
 gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-2.c|  3 ---
 .../gcc.dg/graphite/isl-ast-gen-single-loop-1.c |  3 ---
 .../gcc.dg/graphite/isl-ast-gen-single-loop-2.c |  2 --
 .../gcc.dg/graphite/isl-ast-gen-single-loop-3.c |  2 --
 gcc/testsuite/gcc.dg/graphite/isl-ast-gen-user-1.c  | 12 +---
 .../gcc.dg/graphite/isl-codegen-loop-dumping.c  | 17 -
 14 files changed, 11 insertions(+), 59 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.dg/graphite/isl-codegen-loop-dumping.c

diff --git a/gcc/testsuite/gcc.dg/graphite/fuse-1.c b/gcc/testsuite/gcc.dg/graphite/fuse-1.c
index c9bb67d..249276c 100644
--- a/gcc/testsuite/gcc.dg/graphite/fuse-1.c
+++ b/gcc/testsuite/gcc.dg/graphite/fuse-1.c
@@ -1,7 +1,6 @@
 /* Check that the two loops are fused and that we manage to fold the two xor
operations.  */
-/* { dg-options "-O2 -floop-nest-optimize -fdump-tree-forwprop-all" } */
-/* { dg-do run } */
+/* { dg-options "-O2 -floop-nest-optimize -fdump-tree-forwprop-all -fdump-tree-graphite-all" } */
 
 /* Make sure we fuse the loops like this:
 ISL AST generated by ISL:
@@ -9,15 +8,12 @@ for (int c0 = 0; c0 <= 99; c0 += 1) {
   S_3(c0);
   S_6(c0);
   S_9(c0);
-}
-*/
-/* { dg-final { scan-tree-dump-times "ISL AST generated by ISL:.*for (int c0 = 0; c0 <= 99; c0 += 1) \{.*S_.*(c0);.*S_.*(c0);.*S_.*(c0);.*\}" 1 "graphite" } } */
+} */
+/* { dg-final { scan-tree-dump-times "ISL AST generated by ISL:.*for \\(int c0 = 0; c0 <= 99; c0 \\+= 1\\) \\{.*S_.*\\(c0\\);.*S_.*\\(c0\\);.*S_.*\\(c0\\);.*\\}" 1 "graphite" } } */
 
 /* Check that after fusing the loops, the scalar computation is also fused.  */
 /* { dg-final { scan-tree-dump-times "gimple_simplified to\[^\\n\]*\\^ 12" 1 "forwprop4" } } */
 
-
-
 #define MAX 100
 int A[MAX];
 
diff --git a/gcc/testsuite/gcc.dg/graphite/fuse-2.c b/gcc/testsuite/gcc.dg/graphite/fuse-2.c
index aaa5e2f..2f27c66 100644
--- a/gcc/testsuite/gcc.dg/graphite/fuse-2.c
+++ b/gcc/testsuite/gcc.dg/graphite/fuse-2.c
@@ -1,6 +1,4 @@
 /* Check that the three loops are fused.  */
-/* { dg-options "-O2 -floop-nest-optimize" } */
-/* { dg-do run } */
 
 /* Make sure we fuse the loops like this:
 ISL AST generated by ISL:
@@ -11,7 +9,7 @@ for (int c0 = 0; c0 <= 99; c0 += 1) {
 }
 */
 
-/* { dg-final { scan-tree-dump-times "ISL AST generated by ISL:.*for (int c0 = 0; c0 <= 99; c0 += 1) \{.*S_.*(c0);.*S_.*(c0);.*S_.*(c0);.*\}" 1 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "ISL AST generated by ISL:.*for \\(int c0 = 0; c0 <= 99; c0 \\+= 1\\) \\{.*S_.*\\(c0\\);.*S_.*\\(c0\\);.*S_.*\\(c0\\);.*\\}" 1 "graphite" } } */
 
 #define MAX 100
 int A[MAX], B[MAX], C[MAX];
diff --git a/gcc/testsuite/gcc.dg/graphite/graphite.exp b/gcc/testsuite/gcc.dg/graphite/graphite.exp
index f2d1417..8e1a229 100644
--- a/gcc/testsuite/gcc.dg/graphite/graphite.exp
+++ b/gcc/testsuite/gcc.dg/graphite/graphite.exp
@@ -43,6 +43,8 @@ set id_files  [lsort [glob -nocomplain $srcdir/$subdir/id-*.c ] ]
 set run_id_files  [lsort [glob -nocomplain $srcdir/$subdir/run-id-*.c ] ]
 set opt_files [lsort [glob -nocomplain $srcdir/$subdir/interchange-*.c \
   $srcdir/$subdir/uns-interchange-*.c \
+  $srcdir/$subdir/isl-ast-gen-*.c \
+  $srcdir/$subdir/fuse-*.c \
   $srcdir/$subdir/block-*.c \
   $srcdir/$subdir/uns-block-*.c ] ]
 set vect_files[lsort [glob -nocomplain $srcdir/$subdir/vect-*.c ] ]
diff --git a/gcc/testsuite/gcc.dg/graphite/isl-ast-gen-blocks-1.c b/gcc/testsuite/gcc.dg/graphite/isl-ast-gen-blocks-1.c
index 6146b18..cd67d87 100644
--- a/gcc/testsuite/gcc.dg/graphite/isl-ast-gen-blocks-1.c
+++ b/gcc/testsuite/gcc.dg/graphite/isl-ast-gen-blocks-1.c
@@ -1,6 +1,3 @@
-/* { dg-do run } */
-/* { dg-options "-O2 -fgraphite-identity" } */
-
 int n = 50;
 static int __attribute__((noinline))
 foo ()
diff --git a/gcc/testsuite/gcc.dg/graphite/isl-ast-gen-blocks-2.c b/gcc/testsuite/gcc.dg/graphite/isl-ast-gen-blocks-2.c
index 42ff30a..d97a8ab 100644
--- a/gcc/testsuite/gcc.dg/graphite/isl-ast-gen-blocks-2.c
+++ b/gcc/testsuite/gcc.dg/graphite/isl-ast-gen-blocks-2.c
@@ -1,6 +1,3 @@
-/* { dg-do run } */
-/* { dg-options "-O2 -fgraphite-identity" } */
-
 int k = 50;
 static int __attr

[PATCH 2/2] [graphite] improve construction of the original schedule

2015-11-11 Thread Sebastian Pop

The patch builds the original schedule from the now-optimized scattering
dimensions instead of building one from the loop indices alone.

The implementation is simpler and catches more cases where the original schedule
and the transformed schedule are the same, such as the one below:

for (i = 0; i < 1000; i++)
  {
    Temp = F[i];
    for (j = 0; j < 1000; j++)
      {
        D[j] = E[j] * Temp;
        A[i][j] = A[i][j] + B[i][j] * C[i][j] - D[j];
      }
    D[i] = E[i] * F[i];
  }
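The union-map schedules in the patch comment below (S_A[i0, i1] -> [0, i0, 0, i1], S_B[i0] -> [0, i0, 1], S_C[] -> [1], S_D[i0] -> [2, i0, 0]) encode execution order as the lexicographic order of time vectors. Here is a minimal C++ sketch that does not use isl. It assumes the loop structure those vectors imply, since the comment does not show the source loops, and uses hypothetical bounds n and m. It generates instances grouped by statement and then sorts them by schedule to recover program order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One statement instance: its name and its schedule (time) vector.
struct Instance
{
  std::string name;
  std::vector<int> time;
};

// Assumed loop structure implied by the union map (hypothetical):
//   for (i0 < n) { for (i1 < m) S_A; S_B; }  S_C;  for (i0 < n) S_D;
// Instances are generated grouped by statement, NOT in program order,
// so only the schedule vectors can recover the order.
std::vector<Instance>
build_instances(int n, int m)
{
  std::vector<Instance> v;
  for (int i0 = 0; i0 < n; i0++)
    for (int i1 = 0; i1 < m; i1++)
      v.push_back({"A", {0, i0, 0, i1}});   // S_A[i0, i1] -> [0, i0, 0, i1]
  for (int i0 = 0; i0 < n; i0++)
    v.push_back({"B", {0, i0, 1}});         // S_B[i0] -> [0, i0, 1]
  v.push_back({"C", {1}});                  // S_C[] -> [1]
  for (int i0 = 0; i0 < n; i0++)
    v.push_back({"D", {2, i0, 0}});         // S_D[i0] -> [2, i0, 0]
  return v;
}

// Execution order is the lexicographic order of the time vectors;
// std::vector's operator< already compares lexicographically.
std::string
execution_order(std::vector<Instance> v)
{
  std::sort(v.begin(), v.end(),
            [](const Instance& a, const Instance& b) { return a.time < b.time; });
  std::string order;
  for (const Instance& inst : v)
    order += inst.name;
  return order;
}
```

For n = m = 2 the sorted order is A A B A A B C D D. The constant dimensions (the 0, 1 and 2 between the loop indices) are what separate the textually ordered statements.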

  * graphite-sese-to-poly.c (build_scop_original_schedule): Call
  isl_union_map_add_map on every pbb->schedule.
---
 gcc/graphite-sese-to-poly.c | 27 +++
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index ba45199..3c24512 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -446,31 +446,26 @@ build_scop_minimal_scattering (scop_p scop)
  }
 
Static schedules for A to D expressed in a union map:
-
-   { S_A[i0, i1] -> [i0, i1]; S_B[i0] -> [i0]; S_C[] -> []; S_9[i0] -> [i0] }
-
+   {
+ S_A[i0, i1] -> [0, i0, 0, i1];
+ S_B[i0] -> [0, i0, 1];
+ S_C[]   -> [1];
+ S_D[i0] -> [2, i0, 0]
+   }
 */
 
 static void
 build_scop_original_schedule (scop_p scop)
 {
+  int i;
+  poly_bb_p pbb;
+
   isl_space *space = isl_set_get_space (scop->param_context);
   isl_union_map *res = isl_union_map_empty (space);
 
-  int i;
-  poly_bb_p pbb;
   FOR_EACH_VEC_ELT (scop->pbbs, i, pbb)
-{
-  int nb_dimensions = isl_set_dim (pbb->domain, isl_dim_set);
-  isl_space *dc = isl_set_get_space (pbb->domain);
-  isl_space *dm = isl_space_add_dims (isl_space_from_domain (dc),
- isl_dim_out, nb_dimensions);
-  isl_map *mp = isl_map_universe (dm);
-  for (int i = 0; i < nb_dimensions; i++)
-   mp = isl_map_equate (mp, isl_dim_in, i, isl_dim_out, i);
-
-  res = isl_union_map_add_map (res, mp);
-}
+res = isl_union_map_add_map (res, isl_map_copy (pbb->schedule));
+
   scop->original_schedule = res;
 }
 
-- 
1.9.1

Re: [PATCH] PR fortran/68283 -- remove a rogue gfc_internal_error()

2015-11-11 Thread Jerry DeLisle

On 11/11/2015 10:34 AM, Steve Kargl wrote:
> This probably falls under the "obviously correct" moniker.
> It has been built and tested on i386-*-freebsd.  OK to commit?
> 
> The patch removes a gfc_internal_error().  I suspect that it
> was originally put into gfortran to cover "correctly written
> valid Fortran code cannot possibly ever hit this line of 
> code; so, it must be an internal error to reach this line".
> The code in PR 68283 is not valid Fortran.  A number of error
> messages are spewed by gfortran prior to hitting this line of code.
> The patch simply removes the gfc_internal_error(), which allows
> gfortran to exit gracefully.
> 
> 2015-11-11  Steven G. Kargl  
> 
>   PR fortran/68283
>   * primary.c (gfc_variable_attr): Remove a gfc_internal_error().
> 

OK Steve,

Jerry

Re: [PATCH] PR47266

2015-11-11 Thread Jerry DeLisle

On 11/10/2015 06:41 AM, Dominique d'Humières wrote:
> Index: gcc/testsuite/ChangeLog
> ===
> --- gcc/testsuite/ChangeLog   (revision 229793)
> +++ gcc/testsuite/ChangeLog   (working copy)
> @@ -1,3 +1,9 @@
> +2015-11-10  Dominique d'Humieres 
> +
> + PR fortran/47266
> + * gfortran.dg/warn_unused_function_2.f90: Add a new test.
>

OK for trunk

Jerry

Re: [Patch] Optimize condition reductions where the result is an integer induction variable

2015-11-11 Thread Alan Hayward



On 11/11/2015 13:25, "Richard Biener"  wrote:

>On Wed, Nov 11, 2015 at 1:22 PM, Alan Hayward 
>wrote:
>> Hi,
>> I hoped to post this in time for Monday’s cut off date, but circumstances
>> delayed me until today. Hoping if possible this patch will still be able
>> to go in.
>>
>>
>> This patch builds upon the change for PR65947, and reduces the amount of
>> code produced in a vectorized condition reduction where operand 2 of the
>> COND_EXPR is an assignment of an increasing integer induction variable
>> that won't wrap.
>>
>>
>> For example (assuming all types are ints), this is a match:
>>
>> last = 5;
>> for (i = 0; i < N; i++)
>>   if (a[i] < min_v)
>> last = i;
>>
>> Whereas, this is not because the result is based off a memory access:
>> last = 5;
>> for (i = 0; i < N; i++)
>>   if (a[i] < min_v)
>> last = a[i];
>>
>> In the integer induction variable case we can just use a MAX reduction
>> and skip all the code I added in my vectorized condition reduction patch -
>> the additional induction variables in vectorizable_reduction () and the
>> additional checks in vect_create_epilog_for_reduction (). From the patch
>> diff alone, it's not immediately obvious that those parts will be skipped,
>> as there are no code changes in those areas.
>>
>> The initial value of the induction variable is force set to zero, as any
>> other value could affect the result of the induction. At the end of the
>> loop, if the result is zero, then we restore the original initial value.
>
>+static bool
>+is_integer_induction (gimple *stmt, struct loop *loop)
>
>is_nonwrapping_integer_induction?
>
>+  tree lhs_max = TYPE_MAX_VALUE (TREE_TYPE (gimple_phi_result (stmt)));
>
>don't use TYPE_MAX_VALUE.
>
>+  /* Check that the induction increments.  */
>+  if (tree_int_cst_compare (step, size_zero_node) <= 0)
>+return false;
>
>tree_int_cst_sgn (step) == -1
>
>+  /* Check that the max size of the loop will not wrap.  */
>+
>+  if (! max_loop_iterations (loop, &ni))
>+return false;
>+  /* Convert backedges to iterations.  */
>+  ni += 1;
>
>just use max_stmt_executions (loop, &ni) which properly checks for
>overflow of the +1.
>
>+  max_loop_value = wi::add (wi::to_widest (base),
>+   wi::mul (wi::to_widest (step), ni));
>+
>+  if (wi::gtu_p (max_loop_value, wi::to_widest (lhs_max)))
>+return false;
>
>you miss a check for the wi::add / wi::mul to overflow.  You can use
>extra args to determine this.
>
>Instead of TYPE_MAX_VALUE use wi::max_value (precision, sign).
>
>I wonder if you want to skip all the overflow checks for
>TYPE_OVERFLOW_UNDEFINED IV types?
>

Ok with all the above.

I tried using max_value (), but it gave me a wide_int rather than a
widest_int, so I've used min_precision and GET_MODE_BITSIZE instead.

Added an extra check for when the type is TYPE_OVERFLOW_UNDEFINED.
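Here is a scalar C++ model of the two ideas in this thread. The first is the non-wrapping check from the review. It is written with the GCC/Clang `__builtin_mul_overflow` and `__builtin_add_overflow` built-ins instead of wide_int, and `int` stands in for the IV type. The second shows why a non-wrapping increasing IV lets the "last match" loop become a MAX reduction. All function names are hypothetical. The sketch offsets candidate indices by one so that 0 can mean "no match". That choice is made here for clarity and is not necessarily how the patch encodes it.

```cpp
#include <cassert>
#include <climits>

// Hypothetical scalar version of the review's check: base + step * niters
// must be computable without overflow and must fit in int.
bool
iv_is_nonwrapping(long long base, long long step, long long niters)
{
  if (step < 0)   // decreasing IVs are rejected
    return false;
  long long prod, max_value;
  // The built-ins return true if the operation overflowed.
  if (__builtin_mul_overflow(step, niters, &prod)
      || __builtin_add_overflow(base, prod, &max_value))
    return false;
  return max_value <= INT_MAX;
}

// The loop from the mail: remember the last index whose element is < min_v.
int
last_index_scalar(const int* a, int n, int min_v, int init)
{
  int last = init;
  for (int i = 0; i < n; i++)
    if (a[i] < min_v)
      last = i;
  return last;
}

// MAX-reduction form: the IV only increases, so the last matching index is
// also the largest one.  Candidates are i + 1 so that 0 means "no match"
// (an assumption of this sketch); i < n <= INT_MAX keeps i + 1 in range.
int
last_index_max_reduction(const int* a, int n, int min_v, int init)
{
  int m = 0;
  for (int i = 0; i < n; i++)
    {
      int cand = a[i] < min_v ? i + 1 : 0;
      m = cand > m ? cand : m;
    }
  return m == 0 ? init : m - 1;
}
```

The second loop carries no dependence other than the running maximum, which is what makes it straightforward to vectorize.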



Alan.



optimizeConditionReductions2.patch

[PATCH] [ARM/Aarch64] add initial Qualcomm support

2015-11-11 Thread Jim Wilson

This adds an option for the Qualcomm server parts, qdf24xx, just
optimizing like a cortex-a57 for now, same as how the initial Samsung
exynos-m1 support worked.
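Files such as aarch64-cores.def are X-macro tables: each consumer defines `AARCH64_CORE` to pick out the columns it needs and then includes the file. The following C++ sketch shows that pattern. It uses a hypothetical in-source list of three rows copied from the diff below instead of including the real file, and it replaces the FLAGS column with 0. It shows that the qdf24xx row reuses the cortexa57 scheduling model. The assumption that the last two columns are the implementer/part IDs matched for `-mcpu=native` is my reading of the file, not something stated in this mail.

```cpp
#include <cassert>
#include <cstring>

// Rows copied from the aarch64-cores.def hunk; FLAGS replaced by 0.
// Column order: NAME, IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART.
#define SAMPLE_CORES(CORE) \
  CORE("cortex-a57", cortexa57, cortexa57, 8A, 0, cortexa57, "0x41", "0xd07") \
  CORE("exynos-m1",  exynosm1,  cortexa57, 8A, 0, cortexa72, "0x53", "0x001") \
  CORE("qdf24xx",    qdf24xx,   cortexa57, 8A, 0, cortexa57, "0x51", "0x800")

struct core_entry
{
  const char* name;
  const char* sched;   // scheduling model, stringized from SCHED
  const char* imp;     // implementer ID
  const char* part;    // part number
};

// Expand each row into an initializer; unused columns are dropped.
#define MAKE_ENTRY(NAME, IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART) \
  { NAME, #SCHED, IMP, PART },

static const core_entry cores[] = { SAMPLE_CORES(MAKE_ENTRY) };

// Look up a core by implementer/part ID; returns nullptr if unknown.
const core_entry*
find_core(const char* imp, const char* part)
{
  for (const core_entry& c : cores)
    if (std::strcmp(c.imp, imp) == 0 && std::strcmp(c.part, part) == 0)
      return &c;
  return nullptr;
}
```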

This was tested with armv8 and aarch64 bootstraps and make check.

I had to disable the cortex-a57 fma steering pass in the aarch64 port
while testing the patch.  A bootstrap for aarch64 configured
--with-cpu=cortex-a57 gives multiple ICEs while building the stage1
libstdc++.  The ICEs are in scan_rtx_reg at regrename.c:1074.  This
looks vaguely similar to PR 66785.

I am also seeing extra make check failures due to ICEs with armv8
bootstrap builds configured --with-cpu=cortex-a57.  I see ICEs in
scan_rtx_reg in regrename, and ICEs in decompose_normal_address in
rtlanal.c.  The arm port doesn't have the fma steering pass (which
seems odd, and is maybe a bug), so it isn't clear what is causing this
problem.

I plan to look at these aarch64 and armv8 failures next, including PR
66785.  None of these have anything to do with my patch, as they
trigger for cortex-a57 which is already supported.

Jim
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 230118)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,13 @@
+2015-11-10  Jim Wilson  
+
+	* config/aarch64/aarch64-cores.def (qdf24xx): New.
+	* config/aarch64/aarch64-tune.md: Regenerated.
+	* config/arm/arm-cores.def (qdf24xx): New.
+	* config/arm/arm-tables.opt, config/arm/arm-tune.md: Regenerated.
+	* config/arm/bpabi.h (BE8_LINK_SPEC): Add qdf24xx support.
+	* doc/invoke.texi (AArch64 Options/-mtune): Add "qdf24xx".
+	(ARM Options/-mtune): Likewise.
+
 2015-11-10  Uros Bizjak  
 
 	* config/i386/i386.c (ix86_print_operand): Remove dead code that
Index: gcc/config/aarch64/aarch64-cores.def
===
--- gcc/config/aarch64/aarch64-cores.def	(revision 230118)
+++ gcc/config/aarch64/aarch64-cores.def	(working copy)
@@ -44,6 +44,7 @@ AARCH64_CORE("cortex-a53",  cortexa53, cortexa53,
 AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07")
 AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08")
 AARCH64_CORE("exynos-m1",   exynosm1,  cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa72, "0x53", "0x001")
+AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x51", "0x800")
 AARCH64_CORE("thunderx",thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  "0x43", "0x0a1")
 AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, xgene1, "0x50", "0x000")
 
Index: gcc/config/aarch64/aarch64-tune.md
===
--- gcc/config/aarch64/aarch64-tune.md	(revision 230118)
+++ gcc/config/aarch64/aarch64-tune.md	(working copy)
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"cortexa53,cortexa57,cortexa72,exynosm1,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53"
+	"cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
Index: gcc/config/arm/arm-cores.def
===
--- gcc/config/arm/arm-cores.def	(revision 230118)
+++ gcc/config/arm/arm-cores.def	(working copy)
@@ -169,6 +169,7 @@ ARM_CORE("cortex-a53",	cortexa53, cortexa53,	8A,	A
 ARM_CORE("cortex-a57",	cortexa57, cortexa57,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
 ARM_CORE("cortex-a72",	cortexa72, cortexa57,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
 ARM_CORE("exynos-m1",	exynosm1,  cortexa57,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
+ARM_CORE("qdf24xx",	qdf24xx,   cortexa57,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
 ARM_CORE("xgene1",  xgene1,xgene1,  8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_FOR_ARCH8A),xgene1)
 
 /* V8 big.LITTLE implementations */
Index: gcc/config/arm/arm-tables.opt
===
--- gcc/config/arm/arm-tables.opt	(revision 230118)
+++ gcc/config/arm/arm-tables.opt	(working copy)
@@ -316,6 +316,9 @@ EnumValue
 Enum(processor_type) String(exynos-m1) Value(exynosm1)
 
 EnumValue
+Enum(processor_type) String(qdf24xx) Value(qdf24xx)
+
+EnumValue
 Enum(processor_type) String(xgene1) Value(xgene1)
 
 EnumValue
Index: gcc/config/arm/arm-tune.md
===
--- gcc/config/arm/arm-tune.md	(revision 230118)
+++ gcc/config/arm/arm-tune.md	(working copy)
@@ -33,6 +33,6

[PATCH] PR fortran/68283 -- remove a rogue gfc_internal_error()

2015-11-11 Thread Steve Kargl

This probably falls under the "obviously correct" moniker.
It has been built and tested on i386-*-freebsd.  OK to commit?

The patch removes a gfc_internal_error().  I suspect that it
was originally put into gfortran to cover "correctly written
valid Fortran code cannot possibly ever hit this line of 
code; so, it must be an internal error to reach this line".
The code in PR 68283 is not valid Fortran.  A number of error
messages are spewed by gfortran prior to hitting this line of code.
The patch simply removes the gfc_internal_error(), which allows
gfortran to exit gracefully.

2015-11-11  Steven G. Kargl  

PR fortran/68283
* primary.c (gfc_variable_attr): Remove a gfc_internal_error().

-- 
Steve
Index: gcc/fortran/primary.c
===
--- gcc/fortran/primary.c	(revision 229970)
+++ gcc/fortran/primary.c	(working copy)
@@ -2268,8 +2268,6 @@ gfc_variable_attr (gfc_expr *expr, gfc_t
 		&& errors > 0)
 		  break;
 	  }
-	if (n == ref->u.ar.as->rank)
-	  gfc_internal_error ("gfc_variable_attr(): Bad array reference");
 	  }
 
 	break;

Re: [PATCH][ARC] Fix ARC backend ICE on pr29921-2

2015-11-11 Thread Joern Wolfgang Rennecke




On 11/11/15 15:22, Claudiu Zissulescu wrote:

Please find attached a patch that fixes the ARC backend ICE on pr29921-2 test 
from gcc.dg (dg.exp).

The patch allows conditional moves to be generated outside expand as well. The 
error was triggered during if-conversion.

Ok to apply?


OK.

Re: [PR64164] drop copyrename, integrate into expand

2015-11-11 Thread Alexandre Oliva

On Nov 10, 2015, Jeff Law  wrote:

>> * function.c (assign_parm_setup_block): Right-shift
>> upward-padded big-endian args when bypassing the stack slot.
> Don't you need to check the value of BLOCK_REG_PADDING at runtime?
> The padding is essentially allowed to vary.

Well, yeah, it's the result of BLOCK_REG_PADDING that tells whether
upward-padding occurred and shifting is required.

> If you look at the other places where BLOCK_REG_PADDING is used, it's
> checked in an #ifdef, then again inside an if conditional.

That's what I do in the patch too.

That said, the initial conditions in the if/else-if/else chain for the
no-larger-than-a-word case cover all of the non-BLOCK_REG_PADDING cases
correctly, so that, if BLOCK_REG_PADDING is not defined, we can just
skip the !MEM_P block altogether.  That's also the reason why we can go
straight to shifting when we get there.

I tried to document my reasoning in the comments, but maybe it was still
too obscure?
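Here is what "right-shift upward-padded big-endian args" means in plain integers. This is a C++ sketch, not the function.c code. It assumes an 8-byte word, an argument of 1 to 8 bytes, and hypothetical helper names. With upward padding the data bytes come first in the slot, so a big-endian load puts them in the high-order end of the register, and a right shift by the padding width brings the value down.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Register image of an 8-byte slot as a big-endian load produces it:
// the byte at the lowest address becomes the most significant byte.
std::uint64_t
load_be_word(const unsigned char* slot)
{
  std::uint64_t w = 0;
  for (int i = 0; i < 8; i++)
    w = (w << 8) | slot[i];
  return w;
}

// SIZE data bytes padded upward sit in the high-order bytes; shift them
// down by the padding width.  SIZE must be 1..8 (a shift by 64 would be
// undefined behaviour).
std::uint64_t
extract_upward_padded(const unsigned char* slot, std::size_t size)
{
  std::uint64_t w = load_be_word(slot);
  unsigned pad_bits = static_cast<unsigned>((8 - size) * 8);
  return w >> pad_bits;
}
```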

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

Re: [v3 PATCH] LWG 2510, make the default constructors of library tag types explicit.

2015-11-11 Thread Jonathan Wakely


On 11/11/15 18:17 +0100, Dominique d'Humières wrote:

Revision r230175


2015-11-10  Ville Voutilainen  

LWG 2510, make the default constructors of library tag types
explicit.
* include/bits/mutex.h (defer_lock_t, try_lock_t,
adopt_lock_t): Add an explicit default constructor.
* include/bits/stl_pair.h (piecewise_construct_t): Likewise.
* include/bits/uses_allocator.h (allocator_arg_t): Likewise.
* libsupc++/new (nothrow_t): Likewise.
* testsuite/17_intro/tag_type_explicit_ctor.cc: New.


breaks bootstrap

libtool: compile:  /opt/gcc/build_w/./gcc/xgcc -shared-libgcc 
-B/opt/gcc/build_w/./gcc -nostdinc++ 
-L/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/src 
-L/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/src/.libs 
-L/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/libsupc++/.libs 
-B/opt/gcc/gcc6w/x86_64-apple-darwin14.5.0/bin/ 
-B/opt/gcc/gcc6w/x86_64-apple-darwin14.5.0/lib/ -isystem 
/opt/gcc/gcc6w/x86_64-apple-darwin14.5.0/include -isystem 
/opt/gcc/gcc6w/x86_64-apple-darwin14.5.0/sys-include 
-I/opt/gcc/work/libstdc++-v3/../libgcc 
-I/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/include/x86_64-apple-darwin14.5.0
 -I/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/include 
-I/opt/gcc/work/libstdc++-v3/libsupc++ -D_GLIBCXX_SHARED 
-fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual -Wabi 
-fdiagnostics-show-location=once -fvisibility-inlines-hidden 
-ffunction-sections -fdata-sections -frandom-seed=new_handler.lo -g -O2 
-std=gnu++11 -c ../../../../work/libstdc++-v3/libsupc++/new_handler.cc  
-fno-common -DPIC -D_GLIBCXX_SHARED -o new_handler.o
../../../../work/libstdc++-v3/libsupc++/new_handler.cc:37:39: error: converting 
to 'std::nothrow_t' from initializer list would use explicit constructor 
'constexpr std::nothrow_t::nothrow_t()'
const std::nothrow_t std::nothrow = { };
  ^
see https://gcc.gnu.org/ml/gcc-regression/2015-11/


Fixed by this patch.

commit 97c2da9d4cc11bd5dae077ccb5fda4e72f7c34d5
Author: Jonathan Wakely 
Date:   Wed Nov 11 17:27:23 2015 +

	* libsupc++/new_handler.cc: Fix for explicit constructor change.

diff --git a/libstdc++-v3/libsupc++/new_handler.cc b/libstdc++-v3/libsupc++/new_handler.cc
index a09012c..4da48b3 100644
--- a/libstdc++-v3/libsupc++/new_handler.cc
+++ b/libstdc++-v3/libsupc++/new_handler.cc
@@ -34,7 +34,7 @@ namespace
 }
 #endif
 
-const std::nothrow_t std::nothrow = { };
+const std::nothrow_t std::nothrow = std::nothrow_t{ };
 
 using std::new_handler;
 namespace
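Here is a standalone illustration of why the old initializer stopped compiling and why the fix works. It uses a hypothetical tag type, `my_tag_t`, declared the way LWG 2510 declares library tags. Copy-list-initialization from `{ }` would have to use the explicit constructor, while an explicit temporary or direct-list-initialization does not. The last function shows ordinary use of the real `std::nothrow` object.

```cpp
#include <cassert>
#include <new>
#include <type_traits>

// Hypothetical tag type, declared like the LWG 2510 library tags.
struct my_tag_t
{
  explicit my_tag_t() = default;
};

// const my_tag_t bad = { };        // copy-list-init: GCC rejects this
const my_tag_t tag_a = my_tag_t{ };  // explicit temporary, as in the fix
const my_tag_t tag_b{ };             // direct-list-init is also accepted

static_assert(std::is_default_constructible<my_tag_t>::value,
              "explicit does not prevent direct default construction");

// Typical use of the library's own std::nothrow object (a std::nothrow_t).
int*
allocate_nothrow(int value)
{
  return new (std::nothrow) int(value);
}
```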

Re: [OpenACC] declare directive

2015-11-11 Thread Jakub Jelinek

On Wed, Nov 11, 2015 at 11:08:21AM +0100, Thomas Schwinge wrote:
> Hi!
> 
> On Wed, 11 Nov 2015 09:32:33 +0100, Jakub Jelinek  wrote:
> > On Mon, Nov 09, 2015 at 05:11:44PM -0600, James Norris wrote:
> > > diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
> > > index 953c4e3..c6a2981 100644
> > > --- a/gcc/c-family/c-pragma.h
> > > +++ b/gcc/c-family/c-pragma.h
> > > @@ -30,6 +30,7 @@ enum pragma_kind {
> > >PRAGMA_OACC_ATOMIC,
> > >PRAGMA_OACC_CACHE,
> > >PRAGMA_OACC_DATA,
> > > +  PRAGMA_OACC_DECLARE,
> > >PRAGMA_OACC_ENTER_DATA,
> > >PRAGMA_OACC_EXIT_DATA,
> > >PRAGMA_OACC_KERNELS,
> > 
> > This change will make PR68271 even worse, so would be really nice to
> > get that fixed first.
> 
> "Would be really nice" means that you're asking us to work on and resolve
> PR68271 before installing this patch?

Dominique has committed a quick hack for this, so it is not urgent, but
would be nice to get it resolved.  If somebody from Mentor gets to that,
perfect, otherwise I (or somebody else) will get to that eventually.

Jakub

open acc default data attribute

2015-11-11 Thread Nathan Sidwell


Jakub,
this patch implements default data attribute determination.  The current 
behaviour defaults to 'copy' and ignores 'default(none)'.  The patch corrects that.


1) We emit a diagnostic when 'default(none)' is in effect.  The Fortran FE emits 
some artificial decls that it doesn't otherwise annotate, which is why we check 
DECL_ARTIFICIAL.  IIUC Cesar had a patch to address that but it needed some 
reworking?


2) 'copy' is the correct default for 'kernels' regions, but for a 'parallel' 
region, scalars should be 'firstprivate', which is what this patch implements.
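The decision rules described in points 1 and 2 fit in a few lines. The following C++ model mirrors oacc_default_clause in the patch below, but all of its types and names are hypothetical and do not correspond to GCC internals. `kind` describes the variable's type after looking through one pointer or reference, as the patch does with TREE_TYPE.

```cpp
#include <cassert>

// Hypothetical model types; not GCC's internal representation.
enum class region_kind { kernels, parallel };
enum class pointee_kind { scalar, aggregate };
enum class data_clause { copy, firstprivate };

struct default_decision
{
  data_clause clause;
  bool diagnose;   // true if default(none) should report the variable
};

default_decision
oacc_default_for(region_kind region, pointee_kind kind,
                 bool default_none, bool artificial)
{
  data_clause clause;
  if (region == region_kind::kernels)
    clause = data_clause::copy;              // kernels: present_or_copy
  else if (kind == pointee_kind::aggregate)
    clause = data_clause::copy;              // parallel aggregates: copy
  else
    clause = data_clause::firstprivate;      // parallel scalars
  // Compiler-generated (artificial) decls never trigger the error.
  return { clause, default_none && !artificial };
}
```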


ok?

nathan
2015-11-11  Nathan Sidwell  

	gcc/
	* gimplify.c (oacc_default_clause): New.
	(omp_notice_variable): Call it.

	gcc/testsuite/
	* c-c++-common/goacc/data-default-1.c: New.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/default-1.c: New.

Index: gcc/gimplify.c
===
--- gcc/gimplify.c	(revision 230169)
+++ gcc/gimplify.c	(working copy)
@@ -5900,6 +5900,60 @@ omp_default_clause (struct gimplify_omp_
   return flags;
 }
 
+
+/* Determine outer default flags for DECL mentioned in an OACC region
+   but not declared in an enclosing clause.  */
+
+static unsigned
+oacc_default_clause (struct gimplify_omp_ctx *ctx, tree decl, unsigned flags)
+{
+  const char *rkind;
+
+  switch (ctx->region_type)
+{
+default:
+  gcc_unreachable ();
+
+case ORT_ACC_KERNELS:
+  /* Everything under kernels defaults to 'present_or_copy'.  */
+  flags |= GOVD_MAP;
+  rkind = "kernels";
+  break;
+
+case ORT_ACC_PARALLEL:
+  {
+	tree type = TREE_TYPE (decl);
+
+	if (TREE_CODE (type) == REFERENCE_TYPE
+	|| POINTER_TYPE_P (type))
+	  type = TREE_TYPE (type);
+
+	if (AGGREGATE_TYPE_P (type))
+	  /* Aggregates default to 'present_or_copy'.  */
+	  flags |= GOVD_MAP;
+	else
+	  /* Scalars default to 'firstprivate'.  */
+	  flags |= GOVD_FIRSTPRIVATE;
+	rkind = "parallel";
+  }
+  break;
+}
+
+  if (DECL_ARTIFICIAL (decl))
+; /* We can get compiler-generated decls, and should not complain
+	 about them.  */
+  else if (ctx->default_kind == OMP_CLAUSE_DEFAULT_NONE)
+{
+  error ("%qE not specified in enclosing OpenACC %s construct",
+	 DECL_NAME (lang_hooks.decls.omp_report_decl (decl)), rkind);
+  error_at (ctx->location, "enclosing OpenACC %s construct", rkind);
+}
+  else
+gcc_checking_assert (ctx->default_kind == OMP_CLAUSE_DEFAULT_SHARED);
+
+  return flags;
+}
+
 /* Record the fact that DECL was used within the OMP context CTX.
IN_CODE is true when real code uses DECL, and false when we should
merely emit default(none) errors.  Return true if DECL is going to
@@ -6023,7 +6077,12 @@ omp_notice_variable (struct gimplify_omp
 		nflags |= GOVD_MAP | GOVD_EXPLICIT;
 	  }
 	else if (nflags == flags)
-	  nflags |= GOVD_MAP;
+	  {
+		if ((ctx->region_type & ORT_ACC) != 0)
+		  nflags = oacc_default_clause (ctx, decl, flags);
+		else
+		  nflags |= GOVD_MAP;
+	  }
 	  }
 	found_outer:
 	  omp_add_variable (ctx, decl, nflags);
Index: gcc/testsuite/c-c++-common/goacc/data-default-1.c
===
--- gcc/testsuite/c-c++-common/goacc/data-default-1.c	(revision 0)
+++ gcc/testsuite/c-c++-common/goacc/data-default-1.c	(working copy)
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+
+
+int main ()
+{
+  int n = 2;
+  int ary[2];
+  
+#pragma acc parallel default (none) /* { dg-message "parallel construct" 2 } */
+  {
+ary[0] /* { dg-error "not specified in enclosing" } */
+  = n; /* { dg-error "not specified in enclosing" } */
+  }
+
+#pragma acc kernels default (none) /* { dg-message "kernels construct" 2 } */
+  {
+ary[0] /* { dg-error "not specified in enclosing" } */
+  = n; /* { dg-error "not specified in enclosing" } */
+  }
+
+#pragma acc data copy (ary, n)
+  {
+#pragma acc parallel default (none)
+{
+  ary[0]
+	= n;
+}
+
+#pragma acc kernels default (none)
+{
+  ary[0]
+	= n;
+}
+  }
+
+  return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c	(working copy)
@@ -0,0 +1,87 @@
+/* { dg-do run } */
+
+#include  
+
+int test_parallel ()
+{
+  int ok = 1;
+  int val = 2;
+  int ary[32];
+  int ondev = 0;
+
+  for (int i = 0; i < 32; i++)
+ary[i] = ~0;
+
+  /* val defaults to firstprivate, ary defaults to copy.  */
+#pragma acc parallel num_gangs (32) copy (ok) copy(ondev)
+  {
+ondev = acc_on_device (acc_device_not_host);
+#pragma acc loop gang(static:1)
+for (unsigned i = 0; i < 32; i++)
+  {
+	if (val != 2)
+	  ok = 0;
+	val += i;
+	ary[i] = val;
+  }
+  }
+
+  if (ondev)
+{
+  if (!ok)
+	return 1;
+  if (val != 2)
+	return 1;
+
+  for (int i = 0; i <

Re: [v3 PATCH] LWG 2510, make the default constructors of library tag types explicit.

2015-11-11 Thread Dominique d'Humières

Revision r230175

> 2015-11-10  Ville Voutilainen  
>
> LWG 2510, make the default constructors of library tag types
> explicit.
> * include/bits/mutex.h (defer_lock_t, try_lock_t,
> adopt_lock_t): Add an explicit default constructor.
> * include/bits/stl_pair.h (piecewise_construct_t): Likewise.
> * include/bits/uses_allocator.h (allocator_arg_t): Likewise.
> * libsupc++/new (nothrow_t): Likewise.
> * testsuite/17_intro/tag_type_explicit_ctor.cc: New.

 breaks bootstrap

libtool: compile:  /opt/gcc/build_w/./gcc/xgcc -shared-libgcc 
-B/opt/gcc/build_w/./gcc -nostdinc++ 
-L/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/src 
-L/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/src/.libs 
-L/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/libsupc++/.libs 
-B/opt/gcc/gcc6w/x86_64-apple-darwin14.5.0/bin/ 
-B/opt/gcc/gcc6w/x86_64-apple-darwin14.5.0/lib/ -isystem 
/opt/gcc/gcc6w/x86_64-apple-darwin14.5.0/include -isystem 
/opt/gcc/gcc6w/x86_64-apple-darwin14.5.0/sys-include 
-I/opt/gcc/work/libstdc++-v3/../libgcc 
-I/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/include/x86_64-apple-darwin14.5.0
 -I/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/include 
-I/opt/gcc/work/libstdc++-v3/libsupc++ -D_GLIBCXX_SHARED 
-fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual -Wabi 
-fdiagnostics-show-location=once -fvisibility-inlines-hidden 
-ffunction-sections -fdata-sections -frandom-seed=new_handler.lo -g -O2 
-std=gnu++11 -c ../../../../work/libstdc++-v3/libsupc++/new_handler.cc  
-fno-common -DPIC -D_GLIBCXX_SHARED -o new_handler.o
../../../../work/libstdc++-v3/libsupc++/new_handler.cc:37:39: error: converting 
to 'std::nothrow_t' from initializer list would use explicit constructor 
'constexpr std::nothrow_t::nothrow_t()'
 const std::nothrow_t std::nothrow = { };
   ^
see https://gcc.gnu.org/ml/gcc-regression/2015-11/

Dominique

Re: [ptx] partitioning optimization

2015-11-11 Thread Thomas Schwinge

Hi!

On Wed, 11 Nov 2015 08:59:17 -0500, Nathan Sidwell  wrote:
> On 11/11/15 07:06, Bernd Schmidt wrote:
> > On 11/10/2015 11:33 PM, Nathan Sidwell wrote:
> >> I've been unable to introduce a testcase for this.

(But you still committed an update to gcc/testsuite/ChangeLog.)

You'll need to put such an offloading test into the libgomp testsuite --
offloading compilation requires linking, and during that, the offloading
compiler(s) will be invoked, which only the libgomp testsuite is set up
to do, as discussed before.

> >> The difficulty is we
> >> want to check an rtl dump from the acceleration compiler, and there
> >> doesn't  appear to be existing machinery for that in the testsuite.
> >> Perhaps something to be added later?
> >
> > What's the difficulty exactly? Getting a dump should be possible with
> > -foffload=-fdump-whatever, does the testsuite have a problem finding the 
> > right
> > filename?

Currently, this will create cc* files, for example ccdjj2z9.o.271r.final
for -foffload=-fdump-rtl-final.  (I don't know if you can come up with
dg-* directives to scan these.)  The reason, I think, is that lto-wrapper
and/or the mkoffloads do not specify a more suitable "base name" for the
temporary input files to lto1.

> That's not the problem.  How to conditionally enable the test is the
> difficulty.  I suspect porting something concerning accel_compiler from the
> libgomp testsuite is needed?

Use "{ target openacc_nvidia_accel_selected }", as implemented by
libgomp/testsuite/lib/libgomp.exp:check_effective_target_openacc_nvidia_accel_selected
(already present on trunk).

Grüße
 Thomas


[patch] libstdc++/60421 (again) Loop in std::this_thread sleep functions

2015-11-11 Thread Jonathan Wakely


This fixes part of PR 60421 by looping in this_thread::sleep_for when
it is interrupted by a signal, and looping in this_thread::sleep_until
to handle clock adjustments.
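A minimal, self-contained C++ sketch of the two retry strategies described above, written outside libstdc++ (the function names are hypothetical and this is not the committed code). The sleep loop reuses the remainder that POSIX `nanosleep` writes back when it fails with EINTR, and the deadline loop re-reads the clock after every sleep. Unlike the patch, the sketch also loops for steady clocks, which is harmless but costs one extra `now()` call.

```cpp
#include <cerrno>
#include <chrono>
#include <ctime>
#include <time.h>

// Sleep for at least `d`, resuming after signal interruptions (EINTR).
// nanosleep() stores the unslept remainder in its second argument, so
// passing &ts twice makes each retry sleep only for the time left.
template<typename Rep, typename Period>
void sleep_for_retrying(const std::chrono::duration<Rep, Period>& d)
{
  if (d <= d.zero())
    return;
  auto s = std::chrono::duration_cast<std::chrono::seconds>(d);
  auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(d - s);
  struct ::timespec ts;
  ts.tv_sec = static_cast<std::time_t>(s.count());
  ts.tv_nsec = static_cast<long>(ns.count());
  while (::nanosleep(&ts, &ts) == -1 && errno == EINTR)
    { }
}

// Sleep until `t`, re-reading the clock after every sleep so that a clock
// which is adjusted (or runs at a different rate) while we sleep does not
// cause an early return.
template<typename Clock, typename Duration>
void sleep_until_retrying(const std::chrono::time_point<Clock, Duration>& t)
{
  auto now = Clock::now();
  while (now < t)
    {
      sleep_for_retrying(t - now);
      now = Clock::now();
    }
}
```

For example, `sleep_until_retrying(std::chrono::system_clock::now() + std::chrono::milliseconds(30))` only returns once `system_clock` itself has reached the deadline, even if the wall clock was set back during the sleep.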

There are still problems with integer overflow/wrapping in sleep_for
that need to be addressed somehow, perhaps using the new
overflow-checking built-ins.
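As one hypothetical way to apply GCC 5's overflow-checking built-ins here, a conversion from seconds to nanoseconds can detect overflow with `__builtin_mul_overflow` and clamp instead of wrapping. This sketch assumes saturation is the desired behaviour; the helper is illustrative and not part of the patch.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: convert whole seconds to nanoseconds, clamping at
// the int64_t limits instead of wrapping. __builtin_mul_overflow returns
// true when the exact product does not fit in the result type.
std::int64_t seconds_to_ns_saturating(std::int64_t secs)
{
  std::int64_t ns;
  if (__builtin_mul_overflow(secs, std::int64_t(1000000000), &ns))
    return secs < 0 ? std::numeric_limits<std::int64_t>::min()
                    : std::numeric_limits<std::int64_t>::max();
  return ns;
}
```

A sleep function built on such a helper would turn a huge `sleep_for` argument into a very long (clamped) sleep rather than a wrapped, possibly negative, interval.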

Tested powerpc64le-linux, committed to trunk.

commit 1773ceda34abcbe088048786ac869ee1740ce1d9
Author: Jonathan Wakely 
Date:   Wed Nov 11 16:16:55 2015 +

Loop in std::this_thread sleep functions

	PR libstdc++/60421
	* include/std/thread (this_thread::sleep_for): Retry on EINTR.
	(this_thread::sleep_until): Retry if time not reached.
	* src/c++11/thread.cc (__sleep_for): Retry on EINTR.
	* testsuite/30_threads/this_thread/60421.cc: Test interruption and
	non-steady clocks.

diff --git a/libstdc++-v3/include/std/thread b/libstdc++-v3/include/std/thread
index c67ec46..5940e6e 100644
--- a/libstdc++-v3/include/std/thread
+++ b/libstdc++-v3/include/std/thread
@@ -297,7 +297,8 @@ _GLIBCXX_END_NAMESPACE_VERSION
 	static_cast(__s.count()),
 	static_cast(__ns.count())
 	  };
-	::nanosleep(&__ts, 0);
+	while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)
+	  { }
 #else
 	__sleep_for(__s, __ns);
 #endif
@@ -309,8 +310,17 @@ _GLIBCXX_END_NAMESPACE_VERSION
   sleep_until(const chrono::time_point<_Clock, _Duration>& __atime)
   {
 	auto __now = _Clock::now();
-	if (__now < __atime)
-	  sleep_for(__atime - __now);
+	if (_Clock::is_steady)
+	  {
+	if (__now < __atime)
+	  sleep_for(__atime - __now);
+	return;
+	  }
+	while (__now < __atime)
+	  {
+	sleep_for(__atime - __now);
+	__now = _Clock::now();
+	  }
   }
 
   _GLIBCXX_END_NAMESPACE_VERSION
diff --git a/libstdc++-v3/src/c++11/thread.cc b/libstdc++-v3/src/c++11/thread.cc
index e116afa..3407e80 100644
--- a/libstdc++-v3/src/c++11/thread.cc
+++ b/libstdc++-v3/src/c++11/thread.cc
@@ -221,7 +221,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	static_cast(__s.count()),
 	static_cast(__ns.count())
   };
-::nanosleep(&__ts, 0);
+while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)
+  { }
 #elif defined(_GLIBCXX_HAVE_SLEEP)
 # ifdef _GLIBCXX_HAVE_USLEEP
 ::sleep(__s.count());
diff --git a/libstdc++-v3/testsuite/30_threads/this_thread/60421.cc b/libstdc++-v3/testsuite/30_threads/this_thread/60421.cc
index ecc4deb..5dbf257 100644
--- a/libstdc++-v3/testsuite/30_threads/this_thread/60421.cc
+++ b/libstdc++-v3/testsuite/30_threads/this_thread/60421.cc
@@ -15,12 +15,19 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-options "-std=gnu++11" }
+// { dg-do run { target *-*-freebsd* *-*-dragonfly* *-*-netbsd* *-*-linux* *-*-gnu* *-*-solaris* *-*-cygwin *-*-rtems* *-*-darwin* powerpc-ibm-aix* } }
+// { dg-options " -std=gnu++11 -pthread" { target *-*-freebsd* *-*-dragonfly* *-*-netbsd* *-*-linux* *-*-gnu* powerpc-ibm-aix* } }
+// { dg-options " -std=gnu++11 -pthreads" { target *-*-solaris* } }
+// { dg-options " -std=gnu++11 " { target *-*-cygwin *-*-rtems* *-*-darwin* } }
 // { dg-require-cstdint "" }
+// { dg-require-gthreads "" }
 // { dg-require-time "" }
 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include 
 
 void
@@ -28,11 +35,64 @@ test01()
 {
   std::this_thread::sleep_for(std::chrono::seconds(0));
   std::this_thread::sleep_for(std::chrono::seconds(-1));
-  std::this_thread::sleep_for(std::chrono::duration::zero());
+  std::this_thread::sleep_for(std::chrono::duration::zero());
+}
+
+void
+test02()
+{
+  bool test __attribute__((unused)) = true;
+
+  // test interruption of this_thread::sleep_for() by a signal
+  struct sigaction sa{ };
+  sa.sa_handler = +[](int) { };
+  sigaction(SIGUSR1, &sa, 0);
+  bool result = false;
+  std::atomic sleeping{false};
+  std::thread t([&result, &sleeping] {
+auto start = std::chrono::system_clock::now();
+auto time = std::chrono::seconds(3);
+sleeping = true;
+std::this_thread::sleep_for(time);
+result = std::chrono::system_clock::now() >= (start + time);
+  });
+  while (!sleeping) { }
+  std::this_thread::sleep_for(std::chrono::milliseconds(500));
+  pthread_kill(t.native_handle(), SIGUSR1);
+  t.join();
+  VERIFY( result );
+}
+
+struct slow_clock
+{
+  using rep = std::chrono::system_clock::rep;
+  using period = std::chrono::system_clock::period;
+  using duration = std::chrono::system_clock::duration;
+  using time_point = std::chrono::time_point;
+  static constexpr bool is_steady = false;
+
+  static time_point now()
+  {
+auto real = std::chrono::system_clock::now();
+return time_point{real.time_since_epoch() / 2};
+  }
+};
+
+void
+test03()
+{
+  bool test __attribute__((unused)) = true;
+
+  // test that this_thread::sleep_until() handles clock adjustments
+  auto when = slow_clock::now() + std::chrono::seconds(2);
+  std::this_thread::sleep_until(when);
+  VERIFY( slow_clock::now() >= when );
 }
 
 in

[PATCH] Fix detection of setrlimit in libstdc++ testsuite

2015-11-11 Thread Maxim Kuvyrkov

Hi,

This patch fixes an obscure cross-testing problem that crashed (OOMed) our 
boards at Linaro.  Several tests in libstdc++ (e.g., [1]) limit themselves to 
some reasonable amount of RAM and then try to allocate 32 GB.  Unfortunately, 
the configure test that checks for the presence of setrlimit is rather strange: 
if the target is native, it tries to compile a file with a call to setrlimit; if 
compilation succeeds, setrlimit is used, otherwise it is ignored.  The 
strange part is that the compilation check is done only for native targets, as 
if cross-toolchains couldn't generate working executables.  [This is rather odd, 
and I might be missing some underlying caveat.]

Therefore, when testing a cross toolchain, the test [1] still tries to allocate 
32GB of RAM without any setrlimit restriction.  On most targets that people use 
for cross-testing this is not an issue because either
- the target is 32-bit, so there is no 32GB user space to speak of, or
- the target board has a small amount of RAM and no swap, so the allocation 
immediately fails, or
- the target board has plenty of RAM, so allocating 32GB is not an issue.

However, if one is testing on a 64-bit board with 16GB of RAM and 16GB of swap, 
one gets into an obscure near-OOM swapping condition.  This is exactly the 
case when cross-testing aarch64-linux-gnu toolchains on APM Mustang.

The attached patch removes the "native" restriction from the configure test for 
setrlimit.  This enables setrlimit restrictions in the testsuite, and the test 
[1] fails to allocate 32GB, as expected, due to the setrlimit restriction.
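To illustrate what the configure check ultimately enables, here is a hedged C++ sketch of capping memory with `setrlimit` so that a 32 GB request fails quickly instead of driving the board into swap. It assumes the limit is applied through `RLIMIT_AS` on a 64-bit POSIX host. The helper names are hypothetical, and the libstdc++ testsuite's own helper may set several limits rather than just this one.

```cpp
#include <cstddef>
#include <cstdlib>
#include <sys/resource.h>

// Cap the process's virtual address space (RLIMIT_AS) at `bytes`, never
// raising the soft limit above the existing hard limit.
// Returns false if the limit could not be installed.
bool cap_address_space(rlim_t bytes)
{
  struct rlimit r;
  if (getrlimit(RLIMIT_AS, &r) != 0)
    return false;
  if (r.rlim_max != RLIM_INFINITY && bytes > r.rlim_max)
    bytes = r.rlim_max;
  r.rlim_cur = bytes;
  return setrlimit(RLIMIT_AS, &r) == 0;
}

// Attempt a large allocation and report whether it was refused.
// The pointer is kept in a volatile variable so the compiler cannot
// fold the malloc/free pair away.
bool allocation_refused(std::size_t bytes)
{
  void *volatile p = std::malloc(bytes);
  bool refused = (p == nullptr);
  std::free(p);
  return refused;
}
```

For example, after `cap_address_space(rlim_t(1) << 30)` a call to `allocation_refused(std::size_t(1) << 35)` (32 GiB) should return true immediately, while small allocations keep working.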

I have tested it on x86_64-linux-gnu and i686-linux-gnu native toolchains, and 
aarch64-linux-gnu and arm-linux-gnueabi[hf] cross-toolchains with no 
regressions [*].

OK to commit?

I didn't go as far as enabling setenv/locale tests when cross-testing libstdc++ 
because I remember issues with generating locales in cross-built glibc.  In 
any case, locale tests are unlikely to OOM the test board the way the absence 
of setrlimit does.

[1] 27_io/ios_base/storage/2.cc

[*] When cross-testing under user-mode QEMU, the 27_io/fpos/14775.cc execution 
test FAILs.  This test uses setrlimit to set the maximum file size and misbehaves 
only under QEMU.  I believe this is a QEMU issue with handling setrlimit incorrectly.

--
Maxim Kuvyrkov
www.linaro.org




0001-Use-setrlimit-for-testing-libstdc-in-cross-toolchain.patch
Description: Binary data

[gomp4.5] depend nowait support for target

2015-11-11 Thread Jakub Jelinek

Hi!

On Mon, Oct 19, 2015 at 10:47:54PM +0300, Ilya Verbin wrote:
> So, here is what I have for now.  Attached target-29.c testcase works fine 
> with
> MIC emul, however I don't know how to (and where) properly check for 
> completion
> of async execution on target.  And, similarly, where to do unmapping after 
> that?
> Do we need a callback from plugin to libgomp (as far as I understood, PTX
> runtime supports this, but HSA doesn't), or libgomp will just check for
> ttask->is_completed in task.c?

Here is the patch, updated to have a task.c-defined function that the plugin
can call upon completion of async offloading execution.
The testsuite coverage will need to improve, and the testcase is wrong: it
contains data races.  If you want to test parallel running of two target
regions that both touch the same var, I'd say the best would be to use
#pragma omp atomic and OR in 4 in one case and 1 in the other case, then
test that the result is 5 (and similarly for the other var).
Also, Alex Monakov will be unhappy with the usleep calls because PTX newlib
does not have usleep, but we'll need to find some solution for that.
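A hedged C++ sketch of the race-free pattern suggested above (a hypothetical test, not the committed target-29.c): two deferred target regions OR different bits into the same variable atomically, so the result is 5 however the regions are scheduled. It assumes a compiler with OpenMP 4.5 `target nowait` support when built with -fopenmp; without -fopenmp the pragmas are ignored and the code runs sequentially.

```cpp
// Two concurrent target tasks update the same mapped variable; the
// atomic OR makes the final value independent of execution order.
int two_nowait_targets()
{
  int v = 0;
#pragma omp parallel
#pragma omp single
  {
#pragma omp target map(tofrom: v) nowait
    {
#pragma omp atomic
      v |= 4;
    }
#pragma omp target map(tofrom: v) nowait
    {
#pragma omp atomic
      v |= 1;
    }
    // Wait for both target tasks before reading v.
#pragma omp taskwait
  }
  return v;
}
```

Replacing the plain assignments with atomic ORs of disjoint bits is what removes the data race: each region contributes its own bit, and the final check only needs `v == 5`.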

Another thing to work on is testsuite coverage beyond that: it is desirable
to test nowait target tasks (both with and without depend) being awaited in
all the various waiting spots, i.e. end of parallel, barrier, taskwait, end of
taskgroup, or an if (0) task with a depend clause waiting on them.

Also, I wonder what to do if #pragma omp target nowait is used outside of a
(host) parallel, i.e. when team is NULL.  All the tasking code in that case just
executes tasks undeferred, which is fine for everything but target nowait; there
I'd say it is useful to be able to run a single host thread concurrently
with some async offloading tasks.  So I wonder whether, if we encounter target
nowait with team == NULL, we should just create a dummy non-active
(nthreads == 1) team, as if there were a #pragma omp parallel if (0) starting
above it and ending at the program's end.  In OpenMP, the program's initial
thread is implicitly surrounded by an inactive parallel region, so this isn't
anything against the OpenMP execution model.  But we'd need to free the team
somewhere in a destructor.

Can you please try to clean up the liboffloadmic side of this, so that
a callback is used instead of the hardcoded
__gomp_offload_intelmic_async_completed call?  Can you make sure it works on
non-emulated Xeon Phi too?
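One possible shape for that cleanup, sketched in C++ under the assumption that the plugin registers a function pointer with the offload runtime once: the runtime then calls whatever was registered instead of a hardcoded libgomp symbol. All names below are hypothetical and are not liboffloadmic's or libgomp's API.

```cpp
// Type of the completion hook; `info` is an opaque pointer identifying
// the finished asynchronous offload (presumably the async_data the
// plugin supplied when starting it).
using async_completion_fn = void (*)(const void *info);

static async_completion_fn completion_callback = nullptr;

// Called once by the plugin, e.g. during device initialization.
void register_async_completion_callback(async_completion_fn fn)
{
  completion_callback = fn;
}

// Called by the offload runtime when an asynchronous offload finishes.
// Does nothing if no callback has been registered yet.
void notify_async_completed(const void *info)
{
  if (completion_callback != nullptr)
    completion_callback(info);
}
```

With this split, the runtime library no longer needs to link against a libgomp-specific symbol; the plugin decides which function handles completion.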

I'll keep working on the testcase coverage and on the team == NULL case.

The patch is on top of gomp-4_5-branch - needs Aldy's priority_queue stuff.

--- liboffloadmic/runtime/offload_host.cpp.jj   2015-11-05 11:31:05.013916598 +0100
+++ liboffloadmic/runtime/offload_host.cpp  2015-11-10 12:58:55.090951303 +0100
@@ -64,6 +64,9 @@ static void __offload_fini_library(void)
 #define GET_OFFLOAD_NUMBER(timer_data) \
 timer_data? timer_data->offload_number : 0

+extern "C" void
+__gomp_offload_intelmic_async_completed (const void *);
+
 extern "C" {
 #ifdef TARGET_WINNT
 // Windows does not support imports from libraries without actually
@@ -2507,7 +2510,7 @@ extern "C" {
 const void *info
 )
 {
-   /* TODO: Call callback function, pass info.  */
+   __gomp_offload_intelmic_async_completed (info);
 }
 }

--- liboffloadmic/plugin/libgomp-plugin-intelmic.cpp.jj 2015-10-14 10:24:10.922194230 +0200
+++ liboffloadmic/plugin/libgomp-plugin-intelmic.cpp    2015-11-11 15:48:55.428967827 +0100
@@ -192,11 +192,23 @@ GOMP_OFFLOAD_get_num_devices (void)

 static void
 offload (const char *file, uint64_t line, int device, const char *name,
-int num_vars, VarDesc *vars, VarDesc2 *vars2)
+int num_vars, VarDesc *vars, VarDesc2 *vars2, const void **async_data)
 {
   OFFLOAD ofld = __offload_target_acquire1 (&device, file, line);
   if (ofld)
-__offload_offload1 (ofld, name, 0, num_vars, vars, vars2, 0, NULL, NULL);
+{
+  if (async_data == NULL)
+   __offload_offload1 (ofld, name, 0, num_vars, vars, vars2, 0, NULL,
+   NULL);
+  else
+   {
+ OffloadFlags flags;
+ flags.flags = 0;
+ flags.bits.omp_async = 1;
+ __offload_offload3 (ofld, name, 0, num_vars, vars, NULL, 0, NULL,
+ async_data, 0, NULL, flags, NULL);
+   }
+}
   else
 {
   fprintf (stderr, "%s:%d: Offload target acquire failed\n", file, line);
@@ -218,7 +230,7 @@ GOMP_OFFLOAD_init_device (int device)
   TRACE ("");
   pthread_once (&main_image_is_registered, register_main_image);
   offload (__FILE__, __LINE__, device, "__offload_target_init_proc", 0,
-  NULL, NULL);
+  NULL, NULL, NULL);
 }

 extern "C" void
@@ -240,7 +252,7 @@ get_target_table (int device, int &num_f
   VarDesc2 vd1g[2] = { { "num_funcs", 0 }, { "num_vars", 0 } };

   offload (__FILE__, __LINE__, device, "__offload_target_table_p1", 2,
-  vd1, vd1g);
+  vd1, vd1g, NULL);

   int table_size = num_funcs + 2 * num_vars;
   if (table_size > 0)
@@ -254,

Re: [PATCH v4] SH FDPIC backend support

2015-11-11 Thread Rich Felker

On Wed, Nov 11, 2015 at 09:56:42AM -0500, Rich Felker wrote:
> > > I'm actually
> > > trying to prepare a simpler FDPIC patch for other gcc versions we're
> > > interested in that's not so invasive, and for now I'm just having
> > > function_symbol replace SFUNC_STATIC with SFUNC_GOT on TARGET_FDPIC
> > > to
> > > avoid needing all the label stuff, but it would be nice to find a way
> > > to reuse the existing framework.
> > 
> > Do you know how this affects code size (and inherently performance)?
> 
> I suspect it makes very little difference, but to compare I'd need to
> do the same hack on 5.2.0 or trunk. The only difference should be one
> additional load per call, and one additional GOT slot per function
> called this way (but just once per executable/library).

Actually I think this is not quite right: if the call takes place via
the GOT, this also requires the initial r12 to be preserved somewhere
in order to load the function address, whereas for SFUNC_STATIC, the
initial r12 can be completely discarded, right? (SFUNC functions are
not permitted to use the GOT themselves as far as I can tell, and thus
do not receive the hidden GOT argument in r12.)

Rich

Re: [gomp4.5] Don't mark GOMP_MAP_FIRSTPRIVATE mapped vars addressable

2015-11-11 Thread Jakub Jelinek

On Wed, Nov 11, 2015 at 07:27:51PM +0300, Alexander Monakov wrote:
> > Alex reported to me privately that with the OpenMP 4.5 handling of
> > array section bases (that they are firstprivate instead of mapped)
> > we unnecessarily mark the pointers addressable and that result
> > in less efficient way of passing them as shared to inner constructs.
> 
> Thanks!  Would you be interested in further (minimized) cases where new
> implementation no longer manages to perform copy-in/copy-out optimization?
> E.g. the following.  Or I can try to put such cases in Bugzilla, if you like.
> 
> Alexander
> 
> void f(int *p, int n)
> {
>   int tmp;
> #pragma omp target map(to:p[0:n]) map(tofrom:tmp)
>   {
> #pragma omp parallel
> asm volatile ("" : "=r" (tmp) : "r" (p));
>   }
> 
> #pragma omp target
>   /* Missing optimization for 'tmp' here.  */
> #pragma omp parallel
> asm volatile ("" : : "r" (tmp));
> }

There is nothing to do in this case, map(tofrom:tmp) really
has to make tmp addressable, it needs to deal with its address,
and the copy-in/out optimization really relies on the var not being
addressable in any way; the problem is that you need to be 100% sure
that the thread invoking parallel owns the variable and nobody else
can do anything with the variable concurrently, otherwise the compiler
creates a data race that might not exist in the original program.
And OpenMP 4.5 says that on the second target, tmp is implicitly
firstprivate (tmp).  People who care about the generated code would
use firstprivate (tmp) on the second parallel anyway.

Jakub

Re: [gomp4.5] Don't mark GOMP_MAP_FIRSTPRIVATE mapped vars addressable

2015-11-11 Thread Alexander Monakov

On Wed, 11 Nov 2015, Jakub Jelinek wrote:

> Hi!
> 
> Alex reported to me privately that with the OpenMP 4.5 handling of
> array section bases (that they are firstprivate instead of mapped)
> we unnecessarily mark the pointers addressable and that result
> in less efficient way of passing them as shared to inner constructs.

Thanks!  Would you be interested in further (minimized) cases where new
implementation no longer manages to perform copy-in/copy-out optimization?
E.g. the following.  Or I can try to put such cases in Bugzilla, if you like.

Alexander

void f(int *p, int n)
{
  int tmp;
#pragma omp target map(to:p[0:n]) map(tofrom:tmp)
  {
#pragma omp parallel
asm volatile ("" : "=r" (tmp) : "r" (p));
  }

#pragma omp target
  /* Missing optimization for 'tmp' here.  */
#pragma omp parallel
asm volatile ("" : : "r" (tmp));
}

[PATCH, alpha]: Add TARGET_PRINT_OPERAND and friends

2015-11-11 Thread Uros Bizjak

2015-11-11  Uros Bizjak  

* config/alpha/alpha-protos.h (print_operand): Remove.
(print_operand_address): Remove.
* config/alpha/alpha.h (PRINT_OPERAND): Remove.
(PRINT_OPERAND_ADDRESS): Remove.
(PRINT_OPERAND_PUNCT_VALID_P): Remove.
* config/alpha/alpha.c (TARGET_PRINT_OPERAND): New hook define.
(TARGET_PRINT_OPERAND_ADDRESS): New hook define.
(TARGET_PRINT_OPERAND_PUNCT_VALID_P): New hook define.
(print_operand_address): Rename to...
(alpha_print_operand_address): ...this and make static.
(print_operand): Rename to...
(alpha_print_operand): ...this and make static.
(alpha_print_operand_punct_valid_p): New static function.

Bootstrapped and regression tested on alphaev68-linux-gnu, committed
to mainline SVN.

Uros.
Index: config/alpha/alpha-protos.h
===
--- config/alpha/alpha-protos.h (revision 230178)
+++ config/alpha/alpha-protos.h (working copy)
@@ -65,8 +65,6 @@ extern void alpha_expand_builtin_revert_vms_condit
 
 extern rtx alpha_return_addr (int, rtx);
 extern rtx alpha_gp_save_rtx (void);
-extern void print_operand (FILE *, rtx, int);
-extern void print_operand_address (FILE *, rtx);
 extern void alpha_initialize_trampoline (rtx, rtx, rtx, int, int, int);
 
 extern rtx alpha_va_arg (tree, tree);
Index: config/alpha/alpha.c
===
--- config/alpha/alpha.c(revision 230178)
+++ config/alpha/alpha.c(working copy)
@@ -5041,11 +5041,21 @@ get_round_mode_suffix (void)
   gcc_unreachable ();
 }
 
-/* Print an operand.  Recognize special options, documented below.  */
+/* Implement TARGET_PRINT_OPERAND_PUNCT_VALID_P.  */
 
-void
-print_operand (FILE *file, rtx x, int code)
+static bool
+alpha_print_operand_punct_valid_p (unsigned char code)
 {
+  return (code == '/' || code == ',' || code == '-' || code == '~'
+ || code == '#' || code == '*' || code == '&');
+}
+
+/* Implement TARGET_PRINT_OPERAND.  The alpha-specific
+   operand codes are documented below.  */
+
+static void
+alpha_print_operand (FILE *file, rtx x, int code)
+{
   int i;
 
   switch (code)
@@ -5064,6 +5074,8 @@ get_round_mode_suffix (void)
   break;
 
 case '/':
+  /* Generates the instruction suffix.  The TRAP_SUFFIX and ROUND_SUFFIX
+attributes are examined to determine what is appropriate.  */
   {
const char *trap = get_trap_mode_suffix ();
const char *round = get_round_mode_suffix ();
@@ -5074,12 +5086,14 @@ get_round_mode_suffix (void)
   }
 
 case ',':
-  /* Generates single precision instruction suffix.  */
+  /* Generates single precision suffix for floating point
+instructions (s for IEEE, f for VAX).  */
   fputc ((TARGET_FLOAT_VAX ? 'f' : 's'), file);
   break;
 
 case '-':
-  /* Generates double precision instruction suffix.  */
+  /* Generates double precision suffix for floating point
+instructions (t for IEEE, g for VAX).  */
   fputc ((TARGET_FLOAT_VAX ? 'g' : 't'), file);
   break;
 
@@ -5350,8 +5364,10 @@ get_round_mode_suffix (void)
 }
 }
 
-void
-print_operand_address (FILE *file, rtx addr)
+/* Implement TARGET_PRINT_OPERAND_ADDRESS.  */
+
+static void
+alpha_print_operand_address (FILE *file, machine_mode /*mode*/, rtx addr)
 {
   int basereg = 31;
   HOST_WIDE_INT offset = 0;
@@ -9877,6 +9893,13 @@ alpha_atomic_assign_expand_fenv (tree *hold, tree
 #define TARGET_STDARG_OPTIMIZE_HOOK alpha_stdarg_optimize_hook
 #endif
 
+#undef TARGET_PRINT_OPERAND
+#define TARGET_PRINT_OPERAND alpha_print_operand
+#undef TARGET_PRINT_OPERAND_ADDRESS
+#define TARGET_PRINT_OPERAND_ADDRESS alpha_print_operand_address
+#undef TARGET_PRINT_OPERAND_PUNCT_VALID_P
+#define TARGET_PRINT_OPERAND_PUNCT_VALID_P alpha_print_operand_punct_valid_p
+
 /* Use 16-bits anchor.  */
 #undef TARGET_MIN_ANCHOR_OFFSET
 #define TARGET_MIN_ANCHOR_OFFSET -0x7fff - 1
Index: config/alpha/alpha.h
===
--- config/alpha/alpha.h(revision 230178)
+++ config/alpha/alpha.h(working copy)
@@ -1005,37 +1005,6 @@ do { \
 #define ASM_OUTPUT_ADDR_DIFF_ELT(FILE, BODY, VALUE, REL) \
   fprintf (FILE, "\t.gprel32 $L%d\n", (VALUE))
 
-
-/* Print operand X (an rtx) in assembler syntax to file FILE.
-   CODE is a letter or dot (`z' in `%z0') or 0 if no letter was specified.
-   For `%' followed by punctuation, CODE is the punctuation and X is null.  */
-
-#define PRINT_OPERAND(FILE, X, CODE)  print_operand (FILE, X, CODE)
-
-/* Determine which codes are valid without a following integer.  These must
-   not be alphabetic.
-
-   ~Generates the name of the current function.
-
-   /   Generates the instruction suffix.  The TRAP_SUFFIX and ROUND_SUFFIX
-   attributes are examined to determine what is appropriate.
-
-   ,Generates single precision su

[gomp4.5] Don't mark GOMP_MAP_FIRSTPRIVATE mapped vars addressable

2015-11-11 Thread Jakub Jelinek

Hi!

Alex reported to me privately that with the OpenMP 4.5 handling of
array section bases (that they are firstprivate instead of mapped)
we unnecessarily mark the pointers addressable, and that results
in a less efficient way of passing them as shared to inner constructs.

They don't need to be made addressable just because they appear as
bases of mapped array sections.
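For illustration, a hedged C++ example of the situation the patch addresses: `p` below is only the base pointer of the mapped array section `p[0:n]`. Under OpenMP 4.5 that pointer is firstprivate on the target construct (a GOMP_MAP_FIRSTPRIVATE_POINTER map kind in GCC), so nothing in the code needs its address. The function is illustrative only; without -fopenmp it simply runs serially.

```cpp
// Sum n ints from p; p itself is the base of the mapped array section.
int sum_section(const int *p, int n)
{
  int s = 0;
#pragma omp target map(to: p[0:n]) map(tofrom: s)
#pragma omp parallel for reduction(+: s)
  for (int i = 0; i < n; i++)
    s += p[i];
  return s;
}
```

Because `p` is not marked addressable, the compiler remains free to pass it to the inner parallel by value instead of through its address.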

Fixed thusly, regtested on x86_64-linux, committed to gomp-4_5-branch.

2015-11-11  Jakub Jelinek  

c/
* c-typeck.c (c_finish_omp_clauses): Don't mark
GOMP_MAP_FIRSTPRIVATE_POINTER decls addressable.
cp/
* semantics.c (finish_omp_clauses): Don't mark
GOMP_MAP_FIRSTPRIVATE_POINTER decls addressable.

--- gcc/c/c-typeck.c.jj 2015-11-09 17:36:17.0 +0100
+++ gcc/c/c-typeck.c    2015-11-10 14:25:53.592499759 +0100
@@ -12865,7 +12865,10 @@ c_finish_omp_clauses (tree clauses, bool
omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
  remove = true;
}
- else if (!c_mark_addressable (t))
+ else if ((OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP
+   || (OMP_CLAUSE_MAP_KIND (c)
+   != GOMP_MAP_FIRSTPRIVATE_POINTER))
+  && !c_mark_addressable (t))
remove = true;
  else if (!(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 && (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER
--- gcc/cp/semantics.c.jj   2015-11-06 08:08:37.0 +0100
+++ gcc/cp/semantics.c  2015-11-10 14:27:14.916355747 +0100
@@ -6566,6 +6566,9 @@ finish_omp_clauses (tree clauses, bool a
}
  else if (!processing_template_decl
   && TREE_CODE (TREE_TYPE (t)) != REFERENCE_TYPE
+  && (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP
+  || (OMP_CLAUSE_MAP_KIND (c)
+  != GOMP_MAP_FIRSTPRIVATE_POINTER))
   && !cxx_mark_addressable (t))
remove = true;
  else if (!(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP

Jakub

[PATCH][ARM] Do not expand movmisalign pattern if not in 32-bit mode

2015-11-11 Thread Kyrill Tkachov


Hi all,

The attached testcase ICEs when compiled with -march=armv6k -mthumb -Os, or with
any -march for which -mthumb selects Thumb-1:
 error: unrecognizable insn:
 }
 ^
(insn 13 12 14 5 (set (reg:SI 116 [ x ])
(unspec:SI [
(mem:SI (reg/v/f:SI 112 [ s ]) [0 MEM[(unsigned char 
*)s_1(D)]+0 S4 A8])
] UNSPEC_UNALIGNED_LOAD)) besttry.c:9 -1
 (nil))

The problem is that the compiler expands a movmisalign pattern, but the resulting
unaligned loads don't match any define_insn because those are gated on
unaligned_access && TARGET_32BIT.  The movmisalign expander, however, is gated
only on unaligned_access.

This small patch fixes the issue by turning off unaligned_access if
TARGET_32BIT is not true.  We can then remove TARGET_32BIT from the conditions
of the unaligned load/store patterns as a cleanup.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2015-11-11  Kyrylo Tkachov  

* config/arm/arm.c (arm_option_override): Require TARGET_32BIT
for unaligned_access.
* config/arm/arm.md (unaligned_loadsi): Remove redundant TARGET_32BIT
from matching condition.
(unaligned_loadhis): Likewise.
(unaligned_loadhiu): Likewise.
(unaligned_storesi): Likewise.
(unaligned_storehi): Likewise.

2015-11-11  Kyrylo Tkachov  

* gcc.target/arm/armv6-unaligned-load-ice.c: New test.
commit 3b1e68a9f7fadeeb6d7f201ce2291bf2286a4d63
Author: Kyrylo Tkachov 
Date:   Tue Nov 10 13:48:17 2015 +

[ARM] Do not expand movmisalign pattern if not in 32-bit mode

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6a0994e..4708a12 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3436,7 +3436,8 @@ arm_option_override (void)
 }
 
   /* Enable -munaligned-access by default for
- - all ARMv6 architecture-based processors
+ - all ARMv6 architecture-based processors when compiling for a 32-bit ISA
+ i.e. Thumb2 and ARM state only.
  - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
  - ARMv8 architecture-base processors.
 
@@ -3446,7 +3447,7 @@ arm_option_override (void)
 
   if (unaligned_access == 2)
 {
-  if (arm_arch6 && (arm_arch_notm || arm_arch7))
+  if (TARGET_32BIT && arm_arch6 && (arm_arch_notm || arm_arch7))
 	unaligned_access = 1;
   else
 	unaligned_access = 0;
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index ab48873..090a287 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4266,7 +4266,7 @@ (define_insn "unaligned_loadsi"
   [(set (match_operand:SI 0 "s_register_operand" "=l,r")
 	(unspec:SI [(match_operand:SI 1 "memory_operand" "Uw,m")]
 		   UNSPEC_UNALIGNED_LOAD))]
-  "unaligned_access && TARGET_32BIT"
+  "unaligned_access"
   "ldr%?\t%0, %1\t@ unaligned"
   [(set_attr "arch" "t2,any")
(set_attr "length" "2,4")
@@ -4279,7 +4279,7 @@ (define_insn "unaligned_loadhis"
 	(sign_extend:SI
 	  (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,Uh")]
 		 UNSPEC_UNALIGNED_LOAD)))]
-  "unaligned_access && TARGET_32BIT"
+  "unaligned_access"
   "ldrsh%?\t%0, %1\t@ unaligned"
   [(set_attr "arch" "t2,any")
(set_attr "length" "2,4")
@@ -4292,7 +4292,7 @@ (define_insn "unaligned_loadhiu"
 	(zero_extend:SI
 	  (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,m")]
 		 UNSPEC_UNALIGNED_LOAD)))]
-  "unaligned_access && TARGET_32BIT"
+  "unaligned_access"
   "ldrh%?\t%0, %1\t@ unaligned"
   [(set_attr "arch" "t2,any")
(set_attr "length" "2,4")
@@ -4304,7 +4304,7 @@ (define_insn "unaligned_storesi"
   [(set (match_operand:SI 0 "memory_operand" "=Uw,m")
 	(unspec:SI [(match_operand:SI 1 "s_register_operand" "l,r")]
 		   UNSPEC_UNALIGNED_STORE))]
-  "unaligned_access && TARGET_32BIT"
+  "unaligned_access"
   "str%?\t%1, %0\t@ unaligned"
   [(set_attr "arch" "t2,any")
(set_attr "length" "2,4")
@@ -4316,7 +4316,7 @@ (define_insn "unaligned_storehi"
   [(set (match_operand:HI 0 "memory_operand" "=Uw,m")
 	(unspec:HI [(match_operand:HI 1 "s_register_operand" "l,r")]
 		   UNSPEC_UNALIGNED_STORE))]
-  "unaligned_access && TARGET_32BIT"
+  "unaligned_access"
   "strh%?\t%1, %0\t@ unaligned"
   [(set_attr "arch" "t2,any")
(set_attr "length" "2,4")
diff --git a/gcc/testsuite/gcc.target/arm/armv6-unaligned-load-ice.c b/gcc/testsuite/gcc.target/arm/armv6-unaligned-load-ice.c
new file mode 100644
index 000..88528f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv6-unaligned-load-ice.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } { "-march=armv6k" } } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" } { "" } } */
+/* { dg-options "-mthumb -Os -mfloat-abi=softfp" } */
+/* { dg-add-options arm_arch_v6k } */
+
+long
+get_number (char *s, long size, int unsigned_p)
+{
+  long x;
+  unsigned char *p = (unsigned char *) s;
+  switch (size)
+{
+case 4:
+  x = ((long) p[3] << 24) | ((long) p[2] << 16) | (p[1] << 8) | p

[PATCH][ARC] Fix ARC backend ICE on pr29921-2

2015-11-11 Thread Claudiu Zissulescu

Please find attached a patch that fixes the ARC backend ICE on the pr29921-2 test
from gcc.dg (dg.exp).

The patch allows conditional moves to be generated outside of expand as well.
The error was triggered during if-conversion.

Ok to apply?
Claudiu

ChangeLog:
2015-11-11  Claudiu Zissulescu  

* config/arc/arc.c (gen_compare_reg): Swap operands also when we
do not expand to rtl.



00-fixpr29921-2.patch
Description: 00-fixpr29921-2.patch

Re: State of support for the ISO C++ Transactional Memory TS and remaining work

2015-11-11 Thread Jonathan Wakely


On 11/11/15 15:04 +, Szabolcs Nagy wrote:

yes, non-experimental
(since you were talking about libstdc++ changes to
complete the support for the TS).


The TS is experimental. That's the nature of all C++ TSs.

Completing the TS support does not mean anything is non-experimental.

[HSA] support global variables

2015-11-11 Thread Martin Liška

Hi.

The following patch adds support for global variables seen by
an HSAIL executable.  The HSA runtime can link the name of a global variable
with a pointer to the variable used by the host.
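A minimal C++ model of that name-to-address linkage, to make the idea concrete: the compiler emits a table pairing each name an HSAIL kernel refers to with the host address of the variable, and the plugin hands each pair to the HSA runtime. The struct and function names are hypothetical and do not mirror the plugin's actual `global_var_info` layout.

```cpp
#include <cstring>

// Hypothetical table entry: the symbol name used by HSAIL code and the
// host address of the corresponding global variable.
struct host_global_entry
{
  const char *name;
  void *address;
};

// Return the host address registered under `name`, or nullptr if the
// table has no such entry.
void *lookup_host_global(const host_global_entry *table, unsigned count,
                         const char *name)
{
  for (unsigned i = 0; i < count; i++)
    if (std::strcmp(table[i].name, name) == 0)
      return table[i].address;
  return nullptr;
}
```

This is also why the BRIG side can emit such variables as declarations only: the definition lives on the host and is supplied by address at link time.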

Installed to HSA branch.

Martin
>From de58711a6ddbb1e4558a9454d7aeb6d2b33861de Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 5 Nov 2015 11:36:23 +0100
Subject: [PATCH] HSA: support global variables

gcc/ChangeLog:

2015-11-05  Martin Liska  

	* hsa-brig.c (emit_directive_variable): Do not display warning
	for global variables.
	(emit_function_directives): Iterate m_global_symbols instead
	of m_readonly_variables.
	(hsa_output_global_variables): New function.
	(hsa_output_kernel_mapping): Remove.
	(hsa_output_libgomp_mapping): New function.
	(hsa_output_kernels): Likewise.
	(hsa_output_brig): Use new functions.
	* hsa-dump.c (dump_hsa_cfun): Dump all global symbols.
	* hsa-gen.c (hsa_symbol::global_var_p): New predicate.
	(hsa_function_representation::~hsa_function_representation):
	Release memory.
	(get_symbol_for_decl): Simplify logic to just two types
	of variables: local and global.
	(hsa_get_string_cst_symbol): Use m_global_symbols instead
	of m_readonly_variables.
	* hsa.c (hsa_init_compilation_unit_data): Initialize
	hsa_global_variable_symbols.
	(hsa_deinit_compilation_unit_data): Release it.
	* hsa.h (struct hsa_symbol): Remove m_readonly_variables and
	replace it with m_global_symbols.
	(struct hsa_free_symbol_hasher): Remove.
	(hsa_free_symbol_hasher::hash): Likewise.
	(hsa_free_symbol_hasher::equal): Likewise.

libgomp/ChangeLog:

2015-11-05  Martin Liska  

	* plugin/plugin-hsa.c (struct global_var_info): New structure.
	(struct brig_image_desc): Add global variables.
	(create_and_finalize_hsa_program): Define all global variables
	used in a BRIG module.
---
 gcc/hsa-brig.c  | 157 +---
 gcc/hsa-dump.c  |  10 +++
 gcc/hsa-gen.c   |  80 --
 gcc/hsa.c   |  18 -
 gcc/hsa.h   |  35 +++---
 libgomp/plugin/plugin-hsa.c |  30 +
 6 files changed, 242 insertions(+), 88 deletions(-)

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index d2882fc..f47e9c3 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -506,12 +506,7 @@ emit_directive_variable (struct hsa_symbol *symbol)
   prefix = '&';
 
   if (!symbol->m_cst_value)
-	{
-	  dirvar.allocation = BRIG_ALLOCATION_PROGRAM;
-	  if (TREE_CODE (symbol->m_decl) == VAR_DECL)
-	warning (0, "referring to global symbol %q+D by name from HSA code "
-		 "won't work", symbol->m_decl);
-	}
+	dirvar.allocation = BRIG_ALLOCATION_PROGRAM;
 }
   else if (symbol->m_global_scope_p)
 prefix = '&';
@@ -545,7 +540,10 @@ emit_directive_variable (struct hsa_symbol *symbol)
   dirvar.linkage = symbol->m_linkage;
   dirvar.dim.lo = (uint32_t) symbol->m_dim;
   dirvar.dim.hi = (uint32_t) ((unsigned long long) symbol->m_dim >> 32);
-  dirvar.modifier.allBits |= BRIG_VARIABLE_DEFINITION;
+
+  /* Global variables are just declared and linked via HSA runtime.  */
+  if (!symbol->global_var_p ())
+dirvar.modifier.allBits |= BRIG_VARIABLE_DEFINITION;
   dirvar.reserved = 0;
 
   if (symbol->m_cst_value)
@@ -571,7 +569,7 @@ emit_function_directives (hsa_function_representation *f, bool is_declaration)
   hsa_symbol *sym;
 
   if (!f->m_declaration_p)
-for (int i = 0; f->m_readonly_variables.iterate (i, &sym); i++)
+for (int i = 0; f->m_global_symbols.iterate (i, &sym); i++)
   {
 	emit_directive_variable (sym);
 	brig_insn_count++;
@@ -1832,11 +1830,93 @@ hsa_brig_emit_omp_symbols (void)
 static GTY(()) tree hsa_ctor_statements;
 static GTY(()) tree hsa_dtor_statements;
 
-/* Create a static constructor that will register out brig stuff with
-   libgomp.  */
+/* Create and return __hsa_global_variables symbol that contains
+   all informations consumed by libgomp to link global variables
+   with their string names used by an HSA kernel.  */
+
+static tree
+hsa_output_global_variables ()
+{
+  unsigned l = hsa_global_variable_symbols->elements ();
+
+  tree variable_info_type = make_node (RECORD_TYPE);
+  tree id_f1 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier ("name"), ptr_type_node);
+  DECL_CHAIN (id_f1) = NULL_TREE;
+  tree id_f2 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier ("omp_data_size"),
+			   ptr_type_node);
+  DECL_CHAIN (id_f2) = id_f1;
+  finish_builtin_struct (variable_info_type, "__hsa_variable_info", id_f2,
+			 NULL_TREE);
+
+  tree int_num_of_global_vars;
+  int_num_of_global_vars = build_int_cst (uint32_type_node, l);
+  tree global_vars_num_index_type = build_index_type (int_num_of_global_vars);
+  tree global_vars_array_type = build_array_type (variable_info_type,
+		  global_vars_num_index_type);
+
+  vec *global_vars_vec = NULL;
+
+  for (hash_table ::iterator it
+   = hsa_global_variable_symbols->begin ();
+   it != hsa_global_variable_symbols->end

Re: State of support for the ISO C++ Transactional Memory TS and remaining work

2015-11-11 Thread Szabolcs Nagy


On 10/11/15 18:29, Torvald Riegel wrote:

On Tue, 2015-11-10 at 17:26 +, Szabolcs Nagy wrote:

On 09/11/15 00:19, Torvald Riegel wrote:

Hi,

I'd like to summarize the current state of support for the TM TS, and
outline the current plan for the work that remains to complete the
support.


...

Attached is a patch by Jason that implements this check.  This adds one
symbol, which should be okay we hope.



does that mean libitm will depend on libstdc++?


No, weak references are used to avoid that.  See libitm/eh_cpp.cc for
example.
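Here is a minimal sketch of the weak-reference technique. It is not the contents of libitm/eh_cpp.cc. A translation unit declares a function weak and checks its address before calling it. `optional_cxx_hook` is a hypothetical symbol that nothing defines. With GCC on an ELF target, the undefined weak reference resolves to null at link time, so no library dependency is created.

```c
#include <stddef.h>

/* Hypothetical symbol, declared weak so the link succeeds even when no
   library provides a definition (GCC attribute syntax, ELF targets).  */
extern void optional_cxx_hook (void) __attribute__ ((weak));

/* Call the hook only if some linked object actually defines it.
   Returns 1 if the hook was called, 0 if it was absent.  */
int
call_hook_if_present (void)
{
  if (optional_cxx_hook != NULL)
    {
      optional_cxx_hook ();
      return 1;
    }
  return 0;
}
```

If libstdc++ were linked in and defined the symbol, the same code would call it without further changes.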



i see.


I've not yet created tests for the full list of functions specified as
transaction-safe in the TS, but my understanding is that this list was
created after someone from the ISO C++ TM study group looked at libstdc++'s
implementation and investigated which functions might be feasible
to be declared transaction-safe in it.



is that list available somewhere?


See the TM TS, N4514.



i was looking at an older version,
things make more sense now.

i think system() should not be transaction safe..

i wonder what's the plan for getting libc functions
instrumented (i assume that is needed unless hw
support is used).


xmalloc
the runtime exits on memory allocation failure,
so it is not possible to use it safely.
(i think it should be possible to roll back the
transaction in case of internal allocation failure
and retry with a strategy that does not need dynamic
allocation).


Not sure what you mean by "safely".  Hardening against out-of-memory
situations hasn't been considered to be of high priority so far, but I'd
accept patches for that that don't increase complexity significantly and
don't hamper performance.



i consider this a library safety issue.

(a library or runtime is not safe to use if it may terminate
the process in case of internal failures.)


GTM_error, GTM_fatal
the runtime may print error messages to stderr.
stderr is owned by the application.


We need to report errors in some way, especially with something that's
still experimental such as TM.  Alternatives are using C++ exceptions to
report errors, but we'd still need something else for C, such as
handlers that the program can control.



ok, i thought the plan was to move this out of the
experimental state now.
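One reading of "handlers that the program can control" is a registration hook that the runtime consults before it writes to stderr. The sketch below is hypothetical: none of these names exist in libitm. The stderr fallback mirrors the current GTM_error behaviour described above.

```c
#include <stdio.h>

/* Hypothetical callback type: receives the message and a user cookie.  */
typedef void (*tm_error_handler) (const char *msg, void *cookie);

static tm_error_handler user_handler;  /* NULL until the program installs one.  */
static void *user_cookie;

/* Hypothetical registration entry point the runtime would export.  */
void
tm_set_error_handler (tm_error_handler handler, void *cookie)
{
  user_handler = handler;
  user_cookie = cookie;
}

/* Report a runtime error.  Returns 1 if the program's handler consumed it,
   or 0 if the runtime fell back to writing to stderr.  */
int
tm_report_error (const char *msg)
{
  if (user_handler != NULL)
    {
      user_handler (msg, user_cookie);
      return 1;
    }
  fprintf (stderr, "libitm: %s\n", msg);
  return 0;
}

/* Example handler that only counts errors in an int pointed to by COOKIE.  */
void
count_errors (const char *msg, void *cookie)
{
  (void) msg;
  ++*(int *) cookie;
}
```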


uint64_t GTM::gtm_spin_count_var = 1000;
i guess this was supposed to be tunable.
it seems libitm needs some knobs (strategy, retries,
spin counts), but there is no easy way to adapt these
for a target/runtime environment.


Sure, more performance tuning knobs would be nice.



my problem was with getting the knobs right at runtime.

(i think this will need a solution to make tm practically
useful, there are settings that seem to be sensitive to
the properties of the underlying hw.. this also seems
to be a problem for glibc lock elision retry policies.)
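Here is a minimal sketch of a runtime knob. It assumes the spin count is read once from an environment variable. The variable name is hypothetical, and the default of 1000 comes from the gtm_spin_count_var initializer quoted above. Parsing is split out so it can be tested without touching the environment.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Default taken from the gtm_spin_count_var initializer quoted above.  */
#define DEFAULT_SPIN_COUNT 1000

/* Parse TEXT as a decimal spin count.  Fall back to DEFAULT_SPIN_COUNT if
   TEXT is NULL, does not start with a digit, has trailing junk, or
   overflows unsigned long long.  */
uint64_t
parse_spin_count (const char *text)
{
  char *end;
  unsigned long long value;

  if (text == NULL || *text < '0' || *text > '9')
    return DEFAULT_SPIN_COUNT;
  errno = 0;
  value = strtoull (text, &end, 10);
  if (errno != 0 || *end != '\0')
    return DEFAULT_SPIN_COUNT;
  return (uint64_t) value;
}

/* HYPOTHETICAL_ITM_SPIN_COUNT is a made-up name; libitm does not read it.
   A real knob would be read once during runtime initialization.  */
uint64_t
spin_count_from_environment (void)
{
  return parse_spin_count (getenv ("HYPOTHETICAL_ITM_SPIN_COUNT"));
}
```

This does not address the harder problem raised above: choosing a good value for the underlying hardware.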


sys_futex0
i'm not sure why this has arch specific implementations
for some targets but not others. (syscall is not in the
implementation reserved namespace).


Are there archs that support libitm but don't have a definition of this
one?



i thought all targets were supported on linux
(the global lock based strategies should work)
i can prepare a sys_futex0 for arm and aarch64.


these are the issues i noticed that may matter if this
is going to be a supported compiler runtime.


What do you mean by that precisely?  For it to become non-experimental?



yes, non-experimental
(since you were talking about libstdc++ changes to
complete the support for the TS).

Re: [gomp4] Fix some broken tests

2015-11-11 Thread Nathan Sidwell


On 11/11/15 09:50, Cesar Philippidis wrote:

On 11/11/2015 05:40 AM, Nathan Sidwell wrote:

On 11/10/15 18:08, Cesar Philippidis wrote:

On 11/10/2015 12:35 PM, Nathan Sidwell wrote:

I've committed this to  gomp4.  In preparing the reworked firstprivate
patch changes for gomp4's gimplify.c I discovered these testcases were
passing by accident, and lacked a data clause.


It used to be if a reduction was on a parallel construct, the gimplifier
would introduce a pcopy clause for the reduction variable if it was not
associated with any data clause. Is that not the case anymore?


AFAICT, the std doesn't specify that behaviour.   2.6 'Data Environment'
doesn't mention reductions as a modifier for implicitly determined data
attributes.


I guess I was confused because the reduction section in 2.5.11 mentions
something about updating the original reduction variable after the
parallel region.


I think that still relies on a copy clause to transfer the liveness of the 
original variable into and out of the region.  (that's the implication of what 
2.6 says)


nathan
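For illustration, here is a reduction on a parallel construct that also carries an explicit data clause, which the thread says the tests were missing. The function is hypothetical and is not one of the fixed testcases. Without -fopenacc, GCC ignores the pragmas and the loop runs serially, producing the same sum.

```c
/* Sum N integers with an OpenACC reduction.  The copy(sum) clause moves the
   original value into the region and the reduced result back out; per the
   discussion above, the reduction clause alone does not imply that.  */
int
acc_sum (const int *a, int n)
{
  int sum = 0;
#pragma acc parallel copyin(a[0:n]) copy(sum) reduction(+:sum)
  {
#pragma acc loop reduction(+:sum)
    for (int i = 0; i < n; i++)
      sum += a[i];
  }
  return sum;
}
```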

Re: improved RTL-level if conversion using scratchpads [half-hammock edition]

2015-11-11 Thread Bernd Schmidt


On 11/10/2015 10:35 PM, Abe wrote:

I wrote:

What I'm saying is I don't see a reason for a "definitely always
unsafe" state.
Why would any access not be made safe if a scratchpad is used?


Because the RTL if-converter doesn't "know" how to convert
{everything that can be made safe using a scratchpad and is unsafe
otherwise}. My patch is only designed to enable the conversion of
half-hammocks with a single may-trap store.


Yeah, but how is that relevant to the question of whether a MEM is safe? 
The logic should be

 if mem is safe and we are allowed to speculate -> just do it
 otherwise mem is unsafe, so
   if we have prereqs like conditional moves -> use scratchpads
   otherwise fail
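The decision logic above can be rendered in C as follows. The enum and parameter names are hypothetical, and this is not code from the patch or from ifcvt.c. A MEM that is safe but may not be speculated falls through to the "unsafe" branch, exactly as in the outline.

```c
#include <stdbool.h>

/* Hypothetical outcome of trying to if-convert a half-hammock store.  */
enum ifcvt_plan
{
  IFCVT_SPECULATE,   /* MEM is safe and speculation is allowed.  */
  IFCVT_SCRATCHPAD,  /* Treat MEM as unsafe; redirect the store via a scratchpad.  */
  IFCVT_FAIL         /* Prerequisites such as conditional moves are missing.  */
};

enum ifcvt_plan
choose_ifcvt_plan (bool mem_safe, bool may_speculate, bool have_cond_moves)
{
  if (mem_safe && may_speculate)
    return IFCVT_SPECULATE;
  /* Otherwise the MEM is considered unsafe.  */
  if (have_cond_moves)
    return IFCVT_SCRATCHPAD;
  return IFCVT_FAIL;
}
```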

I don't see how a three-state property for a single MEM is necessary or 
helpful, and the implementation in the patch just roughly distinguishes 
between two different types of trap (invalid address vs. readonly 
memory). That seems irrelevant both to the question of whether something 
is safe or not, and to the question of whether we know how to perform 
the conversion.


You might argue that something that is known readonly will always fail 
if written to at runtime, but that's no different from any other kind of 
invalid address, and using a scratchpad prevents the write unless it 
would have happened without if-conversion.



In summary, the 3 possible analysis outcomes are something like this:

   * safe even without a scratchpad

   * only safe with a scratchpad, and we _do_ know how to convert it
safely

   * currently unsafe because we don't yet know how to convert it
safely


This could be seen as a property of the block being converted, and is 
kind of implicit in the algorithm sketched above, but I don't see how it 
would be a property of the MEM that represents the store.



Do you have performance numbers anywhere?


I think my analysis work so far on this project is not yet complete
enough for public review, mainly because it does not include the
benefit of profiling.


I think performance numbers are a fairly important part of a submission 
like this where the transformation isn't an obvious improvement (as 
opposed to removing an instruction or suchlike).



Bernd

Re: [PATCH v4] SH FDPIC backend support

2015-11-11 Thread Rich Felker

On Wed, Nov 11, 2015 at 11:36:26PM +0900, Oleg Endo wrote:
> On Tue, 2015-11-10 at 15:07 -0500, Rich Felker wrote:
> 
> > > The way libcalls are now emitted is a bit unhandy.  If more special
> > > -ABI
> > > libcalls are to be added in the future, they all have to do the jsr
> > > vs.
> > > bsrf handling (some potential candidates for new libcalls are
> > > optimized
> > > soft FP routines).  Then we still have PR 65374 and PR 54019. In
> > > the
> > > future maybe we should come up with something that allows emitting
> > > libcalls in a more transparent way...
> > 
> > I'd like to look into improving this at some point in the near
> > future.
> > On further reading of the changes made, I think there's a lot of code
> > we could reduce or simplify.
> > 
> > In all the places where new RTL patterns were added for *call*_fdpic,
> > the main constraint change vs the non-fdpic version is using REG_PIC.
> > Is it possible to make a REG_GOT_ARG macro or similar that's defined
> > as something like TARGET_FDPIC ? REG_PIC : nonexistent_or_dummy?
> 
> I'm not sure I understand what you mean by that.  Do you have a small
> code snippet example?

Sorry, I don't really understand RTL well enough to make a code
snippet. What I want to express is that an insn "uses" (in the (use
...) sense) a register (r12) conditionally depending on a runtime
option (TARGET_FDPIC).

> > As for the call site stuff, I wonder why the existing call site stuff
> > used by "call_pcrel" can't be used for SFUNC_STATIC. 
> 
> "call_pcrel" is a real call insn.  The libcalls are not expanded as
> real call insns to avoid the regular register save/restores etc which
> is needed to do a normal function call.

Yes, I see that. What I was really wondering though is why the new
call site generation code and constraint was added when the call_pcrel
code already has mechanisms for this, rather than just duplicating the
internals that call_pcrel uses. It seems like we're doing things in a
gratuitously different way here.

> I guess the generic fix for this issue would be some mechanism to
> specify which regs are clobbered/preserved and then provide the right
> settings for the libcall functions.

Is this possible in the sh backend or does it need changes to
higher-level gcc code? (i.e. is it presently possible to make an insn
that conditionally clobbers different things rather than having to
make tons of different insns for each possible set of clobbers?)

> > I'm actually
> > trying to prepare a simpler FDPIC patch for other gcc versions we're
> > interested in that's not so invasive, and for now I'm just having
> > function_symbol replace SFUNC_STATIC with SFUNC_GOT on TARGET_FDPIC
> > to
> > avoid needing all the label stuff, but it would be nice to find a way
> > to reuse the existing framework.
> 
> Do you know how this affects code size (and inherently performance)?

I suspect it makes very little difference, but to compare I'd need to
do the same hack on 5.2.0 or trunk. The only difference should be one
additional load per call, and one additional GOT slot per function
called this way (but just once per executable/library).

Another issue I've started looking at is how r12 is put in fixed_regs,
which is conceptually wrong. Preliminary tests show that removing it
from fixed_regs doesn't break and produces much better code -- r12
gets used as a temp register in functions that don't need it, and in
one function that made multiple calls, the saving of initial r12 to a
call-saved register even happened in the delay slot of the call. I've
been discussing it with Alexander Monakov on IRC (#musl) and based on
my understanding so far of how gcc works (which admittedly may be
wrong) the current FDPIC code looks like it's written not to depend on
r12 being 'fixed'. Also I think I'm pretty close to understanding how
we could make the same improvements for non-FDPIC PIC codegen: instead
of loading r12 in the prologue, load a pseudo, then use that pseudo
for GOT access and force it into r12 the same way FDPIC call code does
for PLT calls. Does this sound correct?

Rich

Re: [C/C++ PATCH] Reject declarators with huge arrays (PR c/68107, c++/68266)

2015-11-11 Thread Marek Polacek

On Wed, Nov 11, 2015 at 01:42:04PM +0100, Bernd Schmidt wrote:
> On 11/11/2015 01:31 PM, Marek Polacek wrote:
> 
> >Certainly I'm in favor of sharing code between C and C++ FEs, though in
> >this case it didn't seem too important/obvious, because of the extra !=
> >error_mark_node check + I don't really like the new function getting *type
> >and setting it there.
> 
> Make it return bool to indicate whether to change type to error_mark.

Yeah, I've done it like so.

Marek

Re: [gomp4] Fix some broken tests

2015-11-11 Thread Cesar Philippidis

On 11/11/2015 05:40 AM, Nathan Sidwell wrote:
> On 11/10/15 18:08, Cesar Philippidis wrote:
>> On 11/10/2015 12:35 PM, Nathan Sidwell wrote:
>>> I've committed this to  gomp4.  In preparing the reworked firstprivate
>>> patch changes for gomp4's gimplify.c I discovered these testcases were
>>> passing by accident, and lacked a data clause.
>>
>> It used to be if a reduction was on a parallel construct, the gimplifier
>> would introduce a pcopy clause for the reduction variable if it was not
>> associated with any data clause. Is that not the case anymore?
> 
> AFAICT, the std doesn't specify that behaviour.   2.6 'Data Environment'
> doesn't mention reductions as a modifier for implicitly determined data
> attributes.

I guess I was confused because the reduction section in 2.5.11 mentions
something about updating the original reduction variable after the
parallel region.

Cesar

Re: [C/C++ PATCH] Reject declarators with huge arrays (PR c/68107, c++/68266)

2015-11-11 Thread Marek Polacek

On Tue, Nov 10, 2015 at 04:48:13PM -0700, Jeff Law wrote:
> Someone (I can't recall who) suggested the overflow check ought to be
> shared, I agree.  Can you factor out that check, shove it into c-family/ and
> call it from the C & C++ front-ends?
> 
> Approved with that change.  Please post it here for archival purposes
> though.

Done, thanks.

Bootstrapped/regtested on x86_64-linux, applying to trunk.

2015-11-11  Marek Polacek  

PR c/68107
PR c++/68266
* c-common.c (valid_array_size_p): New function.
* c-common.h (valid_array_size_p): Declare.

* c-decl.c (grokdeclarator): Call valid_array_size_p.  Remove code
checking the size of an array.

* decl.c (grokdeclarator): Call valid_array_size_p.  Remove code
checking the size of an array.

* c-c++-common/pr68107.c: New test.
* g++.dg/init/new38.C (large_array_char): Adjust dg-error.
(large_array_char_template): Likewise.
* g++.dg/init/new44.C: Adjust dg-error.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index 53c1d81..a393b32 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -13110,4 +13110,26 @@ warn_duplicated_cond_add_or_warn (location_t loc, tree 
cond, vec **chain)
 (*chain)->safe_push (cond);
 }
 
+/* Check if array size calculations overflow or if the array covers more
+   than half of the address space.  Return true if the size of the array
+   is valid, false otherwise.  TYPE is the type of the array and NAME is
+   the name of the array, or NULL_TREE for unnamed arrays.  */
+
+bool
+valid_array_size_p (location_t loc, tree type, tree name)
+{
+  if (type != error_mark_node
+  && COMPLETE_TYPE_P (type)
+  && TREE_CODE (TYPE_SIZE_UNIT (type)) == INTEGER_CST
+  && !valid_constant_size_p (TYPE_SIZE_UNIT (type)))
+{
+  if (name)
+   error_at (loc, "size of array %qE is too large", name);
+  else
+   error_at (loc, "size of unnamed array is too large");
+  return false;
+}
+  return true;
+}
+
 #include "gt-c-family-c-common.h"
diff --git gcc/c-family/c-common.h gcc/c-family/c-common.h
index c825454..bad8d05 100644
--- gcc/c-family/c-common.h
+++ gcc/c-family/c-common.h
@@ -1463,5 +1463,6 @@ extern bool check_no_cilk (tree, const char *, const char 
*,
   location_t loc = UNKNOWN_LOCATION);
 extern bool reject_gcc_builtin (const_tree, location_t = UNKNOWN_LOCATION);
 extern void warn_duplicated_cond_add_or_warn (location_t, tree, vec **);
+extern bool valid_array_size_p (location_t, tree, tree);
 
 #endif /* ! GCC_C_COMMON_H */
diff --git gcc/c/c-decl.c gcc/c/c-decl.c
index a3d8ead..fb4f5ea 100644
--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -6007,6 +6007,9 @@ grokdeclarator (const struct c_declarator *declarator,
TYPE_SIZE_UNIT (type) = size_zero_node;
SET_TYPE_STRUCTURAL_EQUALITY (type);
  }
+
+   if (!valid_array_size_p (loc, type, name))
+ type = error_mark_node;
  }
 
if (decl_context != PARM
@@ -6014,7 +6017,8 @@ grokdeclarator (const struct c_declarator *declarator,
|| array_ptr_attrs != NULL_TREE
|| array_parm_static))
  {
-   error_at (loc, "static or type qualifiers in non-parameter 
array declarator");
+   error_at (loc, "static or type qualifiers in non-parameter "
+ "array declarator");
array_ptr_quals = TYPE_UNQUALIFIED;
array_ptr_attrs = NULL_TREE;
array_parm_static = 0;
@@ -6293,22 +6297,6 @@ grokdeclarator (const struct c_declarator *declarator,
}
 }
 
-  /* Did array size calculations overflow or does the array cover more
- than half of the address-space?  */
-  if (TREE_CODE (type) == ARRAY_TYPE
-  && COMPLETE_TYPE_P (type)
-  && TREE_CODE (TYPE_SIZE_UNIT (type)) == INTEGER_CST
-  && ! valid_constant_size_p (TYPE_SIZE_UNIT (type)))
-{
-  if (name)
-   error_at (loc, "size of array %qE is too large", name);
-  else
-   error_at (loc, "size of unnamed array is too large");
-  /* If we proceed with the array type as it is, we'll eventually
-crash in tree_to_[su]hwi().  */
-  type = error_mark_node;
-}
-
   /* If this is declaring a typedef name, return a TYPE_DECL.  */
 
   if (storage_class == csc_typedef)
diff --git gcc/cp/decl.c gcc/cp/decl.c
index bd3f2bc..a3caa19 100644
--- gcc/cp/decl.c
+++ gcc/cp/decl.c
@@ -9945,6 +9945,9 @@ grokdeclarator (const cp_declarator *declarator,
case cdk_array:
  type = create_array_type_for_decl (dname, type,
 declarator->u.array.bounds);
+ if (!valid_array_size_p (input_location, type, dname))
+   type = error_mark_node;
+
  if (declarator->std_attributes)
/* [dcl.array]/1:
 
@@ -10508,19 +10511,6
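The check that valid_array_size_p delegates to valid_constant_size_p can be approximated outside the compiler: reject a size whose computation overflows, or one that covers more than half the address space. This is a standalone sketch with a hypothetical name. It uses PTRDIFF_MAX as the half-address-space bound, which holds on typical targets. It is not the GCC implementation, which operates on trees.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return true if an array of NELEMS elements of
   ELEM_SIZE bytes each has a representable size no larger than half the
   address space.  */
bool
array_size_ok (size_t nelems, size_t elem_size)
{
  /* Reject if NELEMS * ELEM_SIZE would wrap around.  */
  if (elem_size != 0 && nelems > SIZE_MAX / elem_size)
    return false;
  /* PTRDIFF_MAX stands in for "half of the address space".  */
  return nelems * elem_size <= (size_t) PTRDIFF_MAX;
}
```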

Re: [PATCH v4] SH FDPIC backend support

2015-11-11 Thread Oleg Endo

On Tue, 2015-11-10 at 15:07 -0500, Rich Felker wrote:

> > The way libcalls are now emitted is a bit unhandy.  If more special
> > -ABI
> > libcalls are to be added in the future, they all have to do the jsr
> > vs.
> > bsrf handling (some potential candidates for new libcalls are
> > optimized
> > soft FP routines).  Then we still have PR 65374 and PR 54019. In
> > the
> > future maybe we should come up with something that allows emitting
> > libcalls in a more transparent way...
> 
> I'd like to look into improving this at some point in the near
> future.
> On further reading of the changes made, I think there's a lot of code
> we could reduce or simplify.
> 
> In all the places where new RTL patterns were added for *call*_fdpic,
> the main constraint change vs the non-fdpic version is using REG_PIC.
> Is it possible to make a REG_GOT_ARG macro or similar that's defined
> as something like TARGET_FDPIC ? REG_PIC : nonexistent_or_dummy?

I'm not sure I understand what you mean by that.  Do you have a small
code snippet example?

> As for the call site stuff, I wonder why the existing call site stuff
> used by "call_pcrel" can't be used for SFUNC_STATIC. 

"call_pcrel" is a real call insn.  The libcalls are not expanded as
real call insns to avoid the regular register save/restores etc which
is needed to do a normal function call.
I guess the generic fix for this issue would be some mechanism to
specify which regs are clobbered/preserved and then provide the right
settings for the libcall functions.


> I'm actually
> trying to prepare a simpler FDPIC patch for other gcc versions we're
> interested in that's not so invasive, and for now I'm just having
> function_symbol replace SFUNC_STATIC with SFUNC_GOT on TARGET_FDPIC
> to
> avoid needing all the label stuff, but it would be nice to find a way
> to reuse the existing framework.

Do you know how this affects code size (and inherently performance)?

Cheers,
Oleg

[PATCH] More compile-time saving in BB vectorization

2015-11-11 Thread Richard Biener


This saves some more compile time by avoiding the vector size iteration
for trivial failures.  It also reduces the time spent by not giving up
completely on all SLP instances if one fails to vectorize because of
alignment issues.  And it sneaks in a correctness fix for a previous change.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2015-11-11  Richard Biener  

* tree-vectorizer.h (vect_slp_analyze_and_verify_instance_alignment):
Declare.
(vect_analyze_data_refs_alignment): Make loop vect specific.
(vect_verify_datarefs_alignment): Likewise.
* tree-vect-data-refs.c (vect_slp_analyze_data_ref_dependences):
Add missing continue.
(vect_compute_data_ref_alignment): Export.
(vect_compute_data_refs_alignment): Merge into...
(vect_analyze_data_refs_alignment): ... this.
(verify_data_ref_alignment): Split out from ...
(vect_verify_datarefs_alignment): ... here.
(vect_slp_analyze_and_verify_node_alignment): New function.
(vect_slp_analyze_and_verify_instance_alignment): Likewise.
* tree-vect-slp.c (vect_supported_load_permutation_p): Remove
misplaced checks on alignment.
(vect_slp_analyze_bb_1): Add fatal output parameter.  Do
alignment analysis after SLP discovery and do it per instance.
(vect_slp_bb): When vect_slp_analyze_bb_1 fatally failed do not
bother to re-try using different vector sizes.

Index: gcc/tree-vectorizer.h
===
*** gcc/tree-vectorizer.h   (revision 230155)
--- gcc/tree-vectorizer.h   (working copy)
*** extern tree vect_get_smallest_scalar_typ
*** 1011,1018 
  extern bool vect_analyze_data_ref_dependences (loop_vec_info, int *);
  extern bool vect_slp_analyze_data_ref_dependences (bb_vec_info);
  extern bool vect_enhance_data_refs_alignment (loop_vec_info);
! extern bool vect_analyze_data_refs_alignment (vec_info *);
! extern bool vect_verify_datarefs_alignment (vec_info *);
  extern bool vect_analyze_data_ref_accesses (vec_info *);
  extern bool vect_prune_runtime_alias_test_list (loop_vec_info);
  extern tree vect_check_gather_scatter (gimple *, loop_vec_info, tree *, tree 
*,
--- 1011,1019 
  extern bool vect_analyze_data_ref_dependences (loop_vec_info, int *);
  extern bool vect_slp_analyze_data_ref_dependences (bb_vec_info);
  extern bool vect_enhance_data_refs_alignment (loop_vec_info);
! extern bool vect_analyze_data_refs_alignment (loop_vec_info);
! extern bool vect_verify_datarefs_alignment (loop_vec_info);
! extern bool vect_slp_analyze_and_verify_instance_alignment (slp_instance);
  extern bool vect_analyze_data_ref_accesses (vec_info *);
  extern bool vect_prune_runtime_alias_test_list (loop_vec_info);
  extern tree vect_check_gather_scatter (gimple *, loop_vec_info, tree *, tree 
*,
Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c   (revision 230155)
--- gcc/tree-vect-data-refs.c   (working copy)
*** vect_slp_analyze_data_ref_dependences (b
*** 645,650 
--- 645,651 
  (SLP_INSTANCE_TREE (instance))[0], 0);
  vect_free_slp_instance (instance);
  BB_VINFO_SLP_INSTANCES (bb_vinfo).ordered_remove (i);
+ continue;
}
i++;
  }
*** vect_slp_analyze_data_ref_dependences (b
*** 668,674 
 FOR NOW: No analysis is actually performed. Misalignment is calculated
 only for trivial cases. TODO.  */
  
! static bool
  vect_compute_data_ref_alignment (struct data_reference *dr)
  {
gimple *stmt = DR_STMT (dr);
--- 669,675 
 FOR NOW: No analysis is actually performed. Misalignment is calculated
 only for trivial cases. TODO.  */
  
! bool
  vect_compute_data_ref_alignment (struct data_reference *dr)
  {
gimple *stmt = DR_STMT (dr);
*** vect_compute_data_ref_alignment (struct
*** 838,882 
  }
  
  
- /* Function vect_compute_data_refs_alignment
- 
-Compute the misalignment of data references in the loop.
-Return FALSE if a data reference is found that cannot be vectorized.  */
- 
- static bool
- vect_compute_data_refs_alignment (vec_info *vinfo)
- {
-   vec datarefs = vinfo->datarefs;
-   struct data_reference *dr;
-   unsigned int i;
- 
-   FOR_EACH_VEC_ELT (datarefs, i, dr)
- {
-   stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr));
-   if (STMT_VINFO_VECTORIZABLE (stmt_info)
- && !vect_compute_data_ref_alignment (dr))
-   {
- /* Strided accesses perform only component accesses, misalignment
-information is irrelevant for them.  */
- if (STMT_VINFO_STRIDED_P (stmt_info)
- && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
-   continue;
- 
- if (is_a  (vinfo))
-   {
- /* Mark unsupported statement as unvectorizab

Re: [ptx] partitioning optimization

2015-11-11 Thread Bernd Schmidt


On 11/11/2015 02:59 PM, Nathan Sidwell wrote:

That's not the problem.  How to conditionally enable the test is the
difficulty.  I suspect porting something concerning accel_compiler from
the libgomp testsuite is needed?


Maybe a check_effective_target_offload_nvptx which tries to see if 
-foffload=nvptx gives an error (I would hope it does if it's unsupported).



Bernd

Re: [PATCH] PR68271 [6 Regression] Boostrap fails on x86_64-apple-darwin14 at r230084

2015-11-11 Thread Jakub Jelinek

On Wed, Nov 11, 2015 at 03:10:37PM +0100, Dominique d'Humières wrote:
> Is the following OK?
> 
> Index: gcc/ChangeLog
> ===
> --- gcc/ChangeLog (revision 230162)
> +++ gcc/ChangeLog (working copy)
> @@ -1,3 +1,10 @@
> +2015-11-11  Dominique d'Humieres 
> +
> + PR  bootstrap/68271
> + * cp/parser.h (cp_token): Update pragma_kind to 8.
> + * c-family/c-pragma.c (c_register_pragma_1): Update the gcc_assert
> + to 256.
> +

The ChangeLog entry is not.  Only one space after PR, two spaces before <
and both cp and c-family subdirectories have their own ChangeLog entries,
so you need
PR bootstrap/68271
* parser.h (cp_token): Update pragma_kind to 8.
in cp/ChangeLog and
PR bootstrap/68271
* c-pragma.c (c_register_pragma_1): Update the gcc_assert to 256.
in c-family/ChangeLog.

Ok with those changes.

Jakub

Re: [v3 PATCH] LWG 2510, make the default constructors of library tag types explicit.

2015-11-11 Thread Jonathan Wakely


On 10/11/15 22:01 +0200, Ville Voutilainen wrote:

   LWG 2510, make the default constructors of library tag types
   explicit.
   * include/bits/mutex.h (defer_lock_t, try_lock_t,
   adopt_lock_t): Add an explicit default constructor.
   * include/bits/stl_pair.h (piecewise_construct_t): Likewise.
   * include/bits/uses_allocator.h (allocator_arg_t): Likewise.
   * libsupc++/new (nothrow_t): Likewise.
   * testsuite/17_intro/tag_type_explicit_ctor.cc: New.


OK for trunk, thanks.

Re: [PATCH 1/2] simplify-rtx: Simplify trunc of and of shiftrt

2015-11-11 Thread Segher Boessenkool

On Tue, Nov 10, 2015 at 10:04:30PM +0100, Bernd Schmidt wrote:
> On 11/10/2015 06:44 PM, Segher Boessenkool wrote:
> 
> >Yes I know.  All the rest of the code around it is like this though.
> >Do you want this written in a saner way?
> 
> I won't object to leaving it as-is for now, but in the future it would 
> be good to keep this in mind.

With the trunc_int_for_mode it ended up hugging the righthand margin,
so I did clean it up after all.  Please see attached (committed).


Segher


diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 17568ba..c4fc42a 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -714,6 +714,34 @@ simplify_truncation (machine_mode mode, rtx op,
 return simplify_gen_binary (ASHIFT, mode,
XEXP (XEXP (op, 0), 0), XEXP (op, 1));
 
+  /* Likewise (truncate:QI (and:SI (lshiftrt:SI (x:SI) C) C2)) into
+ (and:QI (lshiftrt:QI (truncate:QI (x:SI)) C) C2) for suitable C
+ and C2.  */
+  if (GET_CODE (op) == AND
+  && (GET_CODE (XEXP (op, 0)) == LSHIFTRT
+ || GET_CODE (XEXP (op, 0)) == ASHIFTRT)
+  && CONST_INT_P (XEXP (XEXP (op, 0), 1))
+  && CONST_INT_P (XEXP (op, 1)))
+{
+  rtx op0 = (XEXP (XEXP (op, 0), 0));
+  rtx shift_op = XEXP (XEXP (op, 0), 1);
+  rtx mask_op = XEXP (op, 1);
+  unsigned HOST_WIDE_INT shift = UINTVAL (shift_op);
+  unsigned HOST_WIDE_INT mask = UINTVAL (mask_op);
+
+  if (shift < precision
+ /* If doing this transform works for an X with all bits set,
+it works for any X.  */
+ && ((GET_MODE_MASK (mode) >> shift) & mask)
+== ((GET_MODE_MASK (op_mode) >> shift) & mask)
+ && (op0 = simplify_gen_unary (TRUNCATE, mode, op0, op_mode))
+ && (op0 = simplify_gen_binary (LSHIFTRT, mode, op0, shift_op)))
+   {
+ mask_op = GEN_INT (trunc_int_for_mode (mask, mode));
+ return simplify_gen_binary (AND, mode, op0, mask_op);
+   }
+}
+
   /* Recognize a word extraction from a multi-word subreg.  */
   if ((GET_CODE (op) == LSHIFTRT
|| GET_CODE (op) == ASHIFTRT)
-- 
1.9.3
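The comment "If doing this transform works for an X with all bits set, it works for any X" can be spot-checked by brute force for the SImode-to-QImode LSHIFTRT case. This standalone model uses uint32_t and uint8_t in place of SI and QI. It only exercises logical right shifts, a range of masks, and a handful of X values, so treat it as a sanity check rather than a proof.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the guard in simplify_truncation for SImode -> QImode.  */
bool
transform_applies (unsigned shift, uint32_t mask)
{
  return shift < 8
         && ((UINT32_C (0xff) >> shift) & mask)
            == ((UINT32_C (0xffffffff) >> shift) & mask);
}

/* Count cases where the guard holds but
   (truncate:QI (and:SI (lshiftrt:SI x c) c2)) differs from
   (and:QI (lshiftrt:QI (truncate:QI x) c) c2'), c2' being C2 truncated.  */
unsigned long
count_mismatches (void)
{
  static const uint32_t xs[] = { 0, 1, 0x80, 0xff, 0x1234abcd,
                                 0xdeadbeef, 0x7fffffff, 0xffffffff };
  unsigned long bad = 0;

  for (unsigned shift = 0; shift < 8; shift++)
    for (uint32_t mask = 0; mask < 0x10000; mask++)
      {
        if (!transform_applies (shift, mask))
          continue;
        for (unsigned k = 0; k < sizeof xs / sizeof xs[0]; k++)
          {
            uint32_t x = xs[k];
            uint8_t wide = (uint8_t) ((x >> shift) & mask);
            uint8_t narrow = (uint8_t) (((uint8_t) x >> shift)
                                        & (uint8_t) mask);
            if (wide != narrow)
              bad++;
          }
      }
  return bad;
}
```

For example, with a shift of 4 the mask 0x0f passes the guard, while 0xff0 is rejected because it selects bits that only the wide shift would bring into the low byte.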

Re: [PATCH] PR68271 [6 Regression] Boostrap fails on x86_64-apple-darwin14 at r230084

2015-11-11 Thread Dominique d'Humières

Is the following OK?

Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 230162)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,10 @@
+2015-11-11  Dominique d'Humieres 
+
+   PR  bootstrap/68271
+   * cp/parser.h (cp_token): Update pragma_kind to 8.
+   * c-family/c-pragma.c (c_register_pragma_1): Update the gcc_assert
+   to 256.
+
 2015-11-11  Simon Dardis  
 
* config/mips/mips.c (mips_breakable_sequence_p): New function.
Index: gcc/cp/parser.h
===
--- gcc/cp/parser.h (revision 230162)
+++ gcc/cp/parser.h (working copy)
@@ -48,7 +48,7 @@
   /* Token flags.  */
   unsigned char flags;
   /* Identifier for the pragma.  */
-  ENUM_BITFIELD (pragma_kind) pragma_kind : 6;
+  ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
   /* True if this token is from a context where it is implicitly extern "C" */
   BOOL_BITFIELD implicit_extern_c : 1;
   /* True if an error has already been reported for this token, such as a
Index: gcc/c-family/c-pragma.c
===
--- gcc/c-family/c-pragma.c (revision 230162)
+++ gcc/c-family/c-pragma.c (working copy)
@@ -1370,9 +1370,9 @@
   id = registered_pragmas.length ();
   id += PRAGMA_FIRST_EXTERNAL - 1;
 
-  /* The C++ front end allocates 6 bits in cp_token; the C front end
-allocates 7 bits in c_token.  At present this is sufficient.  */
-  gcc_assert (id < 64);
+  /* The C++ front end allocates 8 bits in cp_token; the C front end
+allocates 8 bits in c_token.  At present this is sufficient.  */
+  gcc_assert (id < 256);
 }
 
   cpp_register_deferred_pragma (parse_in, space, name, id,

Dominique

> Le 11 nov. 2015 à 14:14, Jakub Jelinek  a écrit :
> 
> On Wed, Nov 11, 2015 at 02:11:38PM +0100, Dominique d'Humières wrote:
>> The following patch restores bootstrap on darwin
>> 
>> --- ../_clean/gcc/cp/parser.h2015-11-10 01:54:44.0 +0100
>> +++ gcc/cp/parser.h  2015-11-11 12:10:28.0 +0100
>> @@ -48,7 +48,7 @@ struct GTY (()) cp_token {
>>   /* Token flags.  */
>>   unsigned char flags;
>>   /* Identifier for the pragma.  */
>> -  ENUM_BITFIELD (pragma_kind) pragma_kind : 6;
>> +  ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
>>   /* True if this token is from a context where it is implicitly extern "C" */
>>   BOOL_BITFIELD implicit_extern_c : 1;
>>   /* True if an error has already been reported for this token, such as a
>> --- ../_clean/gcc/c-family/c-pragma.c  2015-11-10 01:54:43.0 +0100
>> +++ gcc/c-family/c-pragma.c  2015-11-11 12:10:25.0 +0100
>> @@ -1372,7 +1372,7 @@ c_register_pragma_1 (const char *space, 
>> 
>>   /* The C++ front end allocates 6 bits in cp_token; the C front end
>>   allocates 7 bits in c_token.  At present this is sufficient.  */
>> -  gcc_assert (id < 64);
>> +  gcc_assert (id < 256);
>> }
>> 
>>   cpp_register_deferred_pragma (parse_in, space, name, id,
>> 
>> OK to commit?
> 
> As written in the PR, please add a ChangeLog entry, don't forget about
>   PR bootstrap/68271
> line, and please update the 6 and 7 numbers in the comment to 8.
> With that the patch is ok.
> As a follow-up, we'll remove pragma_kind field in the C++ FE, to shrink the
> token by 64 bits.
> 
>   Jakub
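A minimal C sketch of why the assert bound has to follow the bit-field width: an N-bit unsigned bit-field holds values up to 2^N - 1, and a larger id is silently reduced modulo 2^N, so with 6 bits any id of 64 or more would be stored incorrectly. The struct and function names are hypothetical stand-ins that use plain unsigned bit-fields rather than GCC's ENUM_BITFIELD macro.

```c
/* Hypothetical stand-ins for cp_token's pragma_kind field before and
   after the patch, using plain unsigned bit-fields.  */
struct token_6bit { unsigned int pragma_kind : 6; };
struct token_8bit { unsigned int pragma_kind : 8; };

/* Largest value an N-bit unsigned bit-field can represent: 2^N - 1.  */
static unsigned int
bitfield_max (unsigned int bits)
{
  return (1u << bits) - 1;
}

/* Store ID in the field and read it back.  Conversion to an unsigned
   bit-field reduces the value modulo 2^N, so an out-of-range id is
   silently corrupted rather than diagnosed.  */
static unsigned int
roundtrip_6bit (unsigned int id)
{
  struct token_6bit t;
  t.pragma_kind = id;
  return t.pragma_kind;
}

static unsigned int
roundtrip_8bit (unsigned int id)
{
  struct token_8bit t;
  t.pragma_kind = id;
  return t.pragma_kind;
}
```

This is why the gcc_assert in c_register_pragma_1 has to match the narrowest field: id < 64 for 6 bits, id < 256 for 8 bits.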

[PATCH, HSA] fix emission of HSAIL for builtins

2015-11-11 Thread Martin Liška

Hello.

The following patch has just been applied to the HSA branch; it fixes
the emission of builtins. As HSAIL provides approximate instructions
for builtins like 'sin', we emit these when -funsafe-math-optimizations
is enabled. Otherwise, direct call instructions are emitted.

I would like to install the patch on trunk as soon as the initial patch
set is merged.

Thanks,
Martin
From 110a6e64af6c5ad7c925e7ef3837f3685e07fe12 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 5 Nov 2015 16:59:07 +0100
Subject: [PATCH 2/2] HSA: fix emission of HSAIL for builtins

gcc/ChangeLog:

2015-11-05  Martin Liska  

	* hsa-gen.c (gen_hsa_unaryop_builtin_call): New function.
	(gen_hsa_unaryop_or_call_for_builtin): Likewise.
	(gen_hsa_insns_for_call): Use these aforementioned functions
	to correctly dispatch between creation of a function call
	and direct usage of an HSAIL instruction.
---
 gcc/hsa-gen.c | 57 +
 1 file changed, 41 insertions(+), 16 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 48c4254..300bee6 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -4073,6 +4073,36 @@ gen_hsa_unaryop_for_builtin (int opcode, gimple *stmt, hsa_bb *hbb)
   gen_hsa_unary_operation (opcode, dest, op, hbb);
 }
 
+/* Helper function to create a call to a standard library function if the
+   LHS of STMT is used.  HBB is the HSA BB to which the instruction should
+   be added.  */
+
+static void
+gen_hsa_unaryop_builtin_call (gimple *stmt, hsa_bb *hbb)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+    return;
+
+  gen_hsa_insns_for_direct_call (stmt, hbb);
+}
+
+/* Helper function to create a single unary HSA operation out of a call to
+   a builtin (if unsafe math optimizations are enabled).  Otherwise, create
+   a call to the standard library function.
+   OPCODE is the HSA operation to be generated.  STMT is a gimple
+   call to a builtin.  HBB is the HSA BB to which the instruction should be
+   added.  Note that nothing will be created if STMT does not have a LHS.  */
+
+static void
+gen_hsa_unaryop_or_call_for_builtin (int opcode, gimple *stmt, hsa_bb *hbb)
+{
+  if (flag_unsafe_math_optimizations)
+    gen_hsa_unaryop_for_builtin (opcode, stmt, hbb);
+  else
+    gen_hsa_unaryop_builtin_call (stmt, hbb);
+}
+
 /* Generate HSA address corresponding to a value VAL (as opposed to a memory
reference tree), for example an SSA_NAME or an ADDR_EXPR.  HBB is the HSA BB
to which the instruction should be added.  */
@@ -4345,7 +4375,6 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 
 case BUILT_IN_SQRT:
 case BUILT_IN_SQRTF:
-  /* TODO: Perhaps produce BRIG_OPCODE_NSQRT with -ffast-math?  */
   gen_hsa_unaryop_for_builtin (BRIG_OPCODE_SQRT, stmt, hbb);
   break;
 
@@ -4355,31 +4384,27 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
   break;
 
 case BUILT_IN_COS:
+case BUILT_IN_SIN:
+case BUILT_IN_EXP2:
+case BUILT_IN_LOG2:
+  /* HSAIL does not provide an instruction for double argument type.  */
+  gen_hsa_unaryop_builtin_call (stmt, hbb);
+  break;
+
 case BUILT_IN_COSF:
-  /* FIXME: Using the native instruction may not be precise enough.
-	 Perhaps only allow if using -ffast-math?  */
-  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NCOS, stmt, hbb);
+  gen_hsa_unaryop_or_call_for_builtin (BRIG_OPCODE_NCOS, stmt, hbb);
   break;
 
-case BUILT_IN_EXP2:
 case BUILT_IN_EXP2F:
-  /* FIXME: Using the native instruction may not be precise enough.
-	 Perhaps only allow if using -ffast-math?  */
-  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NEXP2, stmt, hbb);
+  gen_hsa_unaryop_or_call_for_builtin (BRIG_OPCODE_NEXP2, stmt, hbb);
   break;
 
-case BUILT_IN_LOG2:
 case BUILT_IN_LOG2F:
-  /* FIXME: Using the native instruction may not be precise enough.
-	 Perhaps only allow if using -ffast-math?  */
-  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NLOG2, stmt, hbb);
+  gen_hsa_unaryop_or_call_for_builtin (BRIG_OPCODE_NLOG2, stmt, hbb);
   break;
 
-case BUILT_IN_SIN:
 case BUILT_IN_SINF:
-  /* FIXME: Using the native instruction may not be precise enough.
-	 Perhaps only allow if using -ffast-math?  */
-  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NSIN, stmt, hbb);
+  gen_hsa_unaryop_or_call_for_builtin (BRIG_OPCODE_NSIN, stmt, hbb);
   break;
 
 case BUILT_IN_ATOMIC_LOAD_1:
-- 
2.6.2
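The dispatch that the patch introduces can be summarized as a small decision function. This is a hypothetical C model, not code from hsa-gen.c: the enum and parameter names are invented for illustration, and it assumes, as the patch's comment states, that nothing is emitted when the call has no LHS.

```c
#include <stdbool.h>

/* Hypothetical labels for what gen_hsa_insns_for_call ends up emitting.  */
enum hsa_emission
{
  EMIT_NOTHING,       /* Result unused: no instruction at all.  */
  EMIT_NATIVE_INSN,   /* e.g. BRIG_OPCODE_NSIN for sinf.  */
  EMIT_LIBRARY_CALL   /* Direct call to the standard library function.  */
};

/* IS_FLOAT_VARIANT: the builtin is sinf/cosf/exp2f/log2f, which have
   approximate HSAIL instructions; the double variants do not.
   UNSAFE_MATH models flag_unsafe_math_optimizations.  */
static enum hsa_emission
choose_hsa_emission (bool is_float_variant, bool has_lhs, bool unsafe_math)
{
  if (!has_lhs)
    return EMIT_NOTHING;
  if (is_float_variant && unsafe_math)
    return EMIT_NATIVE_INSN;
  return EMIT_LIBRARY_CALL;
}
```

Keeping the double variants on the library-call path regardless of the flag matches the patch's note that HSAIL has no instruction for the double argument type.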

Re: [ptx] partitioning optimization

2015-11-11 Thread Nathan Sidwell


On 11/11/15 07:06, Bernd Schmidt wrote:

On 11/10/2015 11:33 PM, Nathan Sidwell wrote:

I've committed this patch to trunk.  It implements a partitioning
optimization for a loop partitioned over both vector and worker axes.
We can elide the inner vector partitioning state propagation, if there
are no intervening instructions in the worker-partitioned outer loop
other than the forking and joining.  We simply execute the worker
propagation on all vectors.


Patch LGTM, although I wonder if you really need the extra option rather than
just optimize.


The reason I added the option was to be able to turn it off independently of the
other optimizations (for debugging).



I've been unable to introduce a testcase for this. The difficulty is we
want to check an rtl dump from the acceleration compiler, and there
doesn't appear to be existing machinery for that in the testsuite.
Perhaps something to be added later?


What's the difficulty exactly? Getting a dump should be possible with
-foffload=-fdump-whatever, does the testsuite have a problem finding the right
filename?



That's not the problem.  How to conditionally enable the test is the difficulty.
I suspect porting something concerning accel_compiler from the libgomp
testsuite is needed?


nathan

[gomp4] Rework gimplifier region flags

2015-11-11 Thread Nathan Sidwell

I've committed this patch to gomp4 to remove the OpenACC-specific enums from
gimplify_omp_ctx, extending the existing omp_region_type enum instead.  A
similar patch will shortly be applied to trunk, now that Jakub's approved it.


If you had patches relying on the old scheme, you'll need to update them.

nathan
2015-11-11  Nathan Sidwell  

	* gimplify.c (enum gimplify_omp_var_data): Remove GOVD_FORCE_MAP.
	(omp_region_type): Use hex. Add OpenACC members.
	(omp_region_kind, acc_region_kind): Delete.
	(gimplify_omp_ctx): Remove region_kind & acc_region_kind fields.
	(new_omp_context): Adjust default_kind setting.  Don't
	reinitialize fields.
	(gimple_add_tmp_var): Add ORT_ACC check.
	(gimplify_var_or_parm_decl): Likewise.
	(omp_firstprivatize_variable): Likewise.
	(omp_add_variable): Adjust OpenACC detection.
	(oacc_default_clause): Reimplement.
	(omp_notice_variable): Adjust OpenACC detection.
	(gimplify_scan_omp_clauses): Remove region_kind arg. Adjust.
	(gimplify_scan_omp_clause_1): Adjust OpenACC detection.
	(gimplify_oacc_cache, gimplify_oacc_declare,
	gimplify_oacc_host_data, gimplify_omp_parallel): Adjust.
	(gimplify_omp_for, gimplify_omp_workshare,
	gimplify_omp_target_update): Adjust for OpenACC ORT flags.
	(gimplify_expr): Likewise.
	(gimplify_body): Simplify OpenACC declare handling.

Index: gimplify.c
===
--- gimplify.c	(revision 230160)
+++ gimplify.c	(working copy)
@@ -89,10 +89,8 @@ enum gimplify_omp_var_data
 
   GOVD_USE_DEVICE = 1 << 17,
 
-  GOVD_FORCE_MAP = 1 << 18,
-
   /* OpenACC deviceptr clause.  */
-  GOVD_USE_DEVPTR = 1 << 19,
+  GOVD_USE_DEVPTR = 1 << 18,
 
   GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
 			   | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
@@ -102,40 +100,37 @@ enum gimplify_omp_var_data
 
 enum omp_region_type
 {
-  ORT_WORKSHARE = 0,
-  ORT_SIMD = 1,
-  ORT_PARALLEL = 2,
-  ORT_COMBINED_PARALLEL = 3,
-  ORT_TASK = 4,
-  ORT_UNTIED_TASK = 5,
-  ORT_TEAMS = 8,
-  ORT_COMBINED_TEAMS = 9,
+  ORT_WORKSHARE = 0x00,
+  ORT_SIMD 	= 0x01,
+
+  ORT_PARALLEL	= 0x02,
+  ORT_COMBINED_PARALLEL = 0x03,
+
+  ORT_TASK	= 0x04,
+  ORT_UNTIED_TASK = 0x05,
+
+  ORT_TEAMS	= 0x08,
+  ORT_COMBINED_TEAMS = 0x09,
+
   /* Data region.  */
-  ORT_TARGET_DATA = 16,
+  ORT_TARGET_DATA = 0x10,
+
   /* Data region with offloading.  */
-  ORT_TARGET = 32,
-  ORT_COMBINED_TARGET = 33,
-  /* An OpenACC host-data region.  */
-  ORT_HOST_DATA = 64,
-  /* Dummy OpenMP region, used to disable expansion of
- DECL_VALUE_EXPRs in taskloop pre body.  */
-  ORT_NONE = 128
-};
+  ORT_TARGET	= 0x20,
+  ORT_COMBINED_TARGET = 0x21,
 
-enum omp_region_kind
-{
-  ORK_OMP,
-  ORK_OACC,
-  ORK_UNKNOWN
-};
+  ORT_HOST_DATA = 0x40,
 
-enum acc_region_kind
-{
-  ARK_GENERAL,  /* Default used for data, etc. regions.  */
-  ARK_PARALLEL, /* Parallel construct.  */
-  ARK_KERNELS,  /* Kernels construct.  */
-  ARK_DECLARE,  /* Declare directive.  */
-  ARK_UNKNOWN
+  /* OpenACC variants.  */
+  ORT_ACC	= 0x80,  /* A generic OpenACC region.  */
+  ORT_ACC_DATA	= ORT_ACC | ORT_TARGET_DATA, /* Data construct.  */
+  ORT_ACC_PARALLEL = ORT_ACC | ORT_TARGET,  /* Parallel construct */
+  ORT_ACC_KERNELS  = ORT_ACC | ORT_TARGET | 0x100,  /* Kernels construct.  */
+  ORT_ACC_HOST  = ORT_ACC | ORT_HOST_DATA,
+
+  /* Dummy OpenMP region, used to disable expansion of
+ DECL_VALUE_EXPRs in taskloop pre body.  */
+  ORT_NONE	= 0x200
 };
 
 /* Gimplify hashtable helper.  */
@@ -177,8 +172,6 @@ struct gimplify_omp_ctx
   location_t location;
   enum omp_clause_default_kind default_kind;
   enum omp_region_type region_type;
-  enum omp_region_kind region_kind;
-  enum acc_region_kind acc_region_kind;
   bool combined_loop;
   bool distribute;
   bool target_map_scalars_firstprivate;
@@ -404,19 +397,11 @@ new_omp_context (enum omp_region_type re
   c->variables = splay_tree_new (splay_tree_compare_decl_uid, 0, 0);
   c->privatized_types = new hash_set;
   c->location = input_location;
-  if ((region_type & (ORT_TASK | ORT_TARGET)) == 0)
+  c->region_type = region_type;
+  if ((region_type & ORT_TASK) == 0)
 c->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
   else
 c->default_kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
-  c->region_type = region_type;
-  c->region_kind = ORK_UNKNOWN;
-  c->acc_region_kind = ARK_UNKNOWN;
-  c->combined_loop = false;
-  c->distribute = false;
-  c->target_map_scalars_firstprivate = false;
-  c->target_map_pointers_as_0len_arrays = false;
-  c->target_firstprivatize_array_bases = false;
-  c->stmt = NULL;
 
   return c;
 }
@@ -730,7 +715,8 @@ gimple_add_tmp_var (tree tmp)
 	  struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp;
 	  while (ctx
 		 && (ctx->region_type == ORT_WORKSHARE
-		 || ctx->region_type == ORT_SIMD))
+		 || ctx->region_type == ORT_SIMD
+		 || ctx->region_type == ORT_ACC))
 	ctx = ctx->outer_context;
 	  if (ctx)
 	omp_add_variable (ctx, tmp, GOVD_LOCAL | GOVD_SEEN);
@
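Because the new OpenACC members are built by OR-ing ORT_ACC into the existing OpenMP values, region checks become bit tests. The sketch below copies the enumerator values from the patch; the helper predicates are hypothetical and only illustrate how such checks can be written, not how gimplify.c spells them.

```c
/* Enumerator values copied from the patched omp_region_type.  */
enum omp_region_type
{
  ORT_WORKSHARE = 0x00,
  ORT_SIMD = 0x01,
  ORT_PARALLEL = 0x02,
  ORT_COMBINED_PARALLEL = 0x03,
  ORT_TASK = 0x04,
  ORT_UNTIED_TASK = 0x05,
  ORT_TEAMS = 0x08,
  ORT_COMBINED_TEAMS = 0x09,
  ORT_TARGET_DATA = 0x10,
  ORT_TARGET = 0x20,
  ORT_COMBINED_TARGET = 0x21,
  ORT_HOST_DATA = 0x40,
  ORT_ACC = 0x80,
  ORT_ACC_DATA = ORT_ACC | ORT_TARGET_DATA,
  ORT_ACC_PARALLEL = ORT_ACC | ORT_TARGET,
  ORT_ACC_KERNELS = ORT_ACC | ORT_TARGET | 0x100,
  ORT_ACC_HOST = ORT_ACC | ORT_HOST_DATA,
  ORT_NONE = 0x200
};

/* Hypothetical predicate: every OpenACC region carries the ORT_ACC bit.  */
static int
region_is_oacc (enum omp_region_type rt)
{
  return (rt & ORT_ACC) != 0;
}

/* OpenACC parallel and kernels both carry ORT_TARGET, so existing
   offloading checks keep matching them.  */
static int
region_is_offloaded (enum omp_region_type rt)
{
  return (rt & ORT_TARGET) != 0;
}

/* Kernels differs from parallel only by the extra 0x100 bit, so an
   exact comparison is used here.  */
static int
region_is_oacc_kernels (enum omp_region_type rt)
{
  return rt == ORT_ACC_KERNELS;
}
```

Encoding the construct in the region type itself is what lets the patch drop the separate omp_region_kind and acc_region_kind fields from gimplify_omp_ctx.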

Re: OpenACC Firstprivate

2015-11-11 Thread Nathan Sidwell


On 11/11/15 03:04, Jakub Jelinek wrote:

On Tue, Nov 10, 2015 at 09:12:55AM -0500, Nathan Sidwell wrote:

+   /* Create a local object to hold the instance
+  value.  */
+   tree inst = create_tmp_var
+ (TREE_TYPE (TREE_TYPE (new_var)),
+  IDENTIFIER_POINTER (DECL_NAME (new_var)));


Can you please rewrite this as:
tree type = TREE_TYPE (TREE_TYPE (new_var));
tree n = DECL_NAME (new_var);
tree inst = create_tmp_var (type, IDENTIFIER_POINTER (n));
or so (perhaps
const char *name
  = IDENTIFIER_POINTER (DECL_NAME (new_var));
instead but then it takes one more line)?
I really don't like line breaks before opening ( unless really
necessary.


Oh, yeah you mentioned that before :)



Otherwise LGTM.


thanks.

nathan

[PATCH] Fix PR58497 testcase for SPARC

2015-11-11 Thread Richard Biener


SPARC doesn't have vector support in this testcase and no integer
multiplication.  The general scalarization support fails to fold
generated stmts so the following just does what other parts of
the lowering do - factor in constants/constructors.

On another note I noticed a tree sharing issue (mitigated by
gimplification).

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-11-11  Richard Biener  

PR tree-optimization/58497
* tree-vect-generic.c: Include gimplify.h.
(tree_vec_extract): Look up constant/constructor DEFs.
(do_cond): Unshare cond.

Index: gcc/tree-vect-generic.c
===
--- gcc/tree-vect-generic.c (revision 230146)
+++ gcc/tree-vect-generic.c (working copy)
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
 #include "tree-eh.h"
 #include "gimple-iterator.h"
 #include "gimplify-me.h"
+#include "gimplify.h"
 #include "tree-cfg.h"
 
 
@@ -105,6 +106,15 @@ static inline tree
 tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
  tree t, tree bitsize, tree bitpos)
 {
+  if (TREE_CODE (t) == SSA_NAME)
+    {
+      gimple *def_stmt = SSA_NAME_DEF_STMT (t);
+      if (is_gimple_assign (def_stmt)
+          && (gimple_assign_rhs_code (def_stmt) == VECTOR_CST
+              || (bitpos
+                  && gimple_assign_rhs_code (def_stmt) == CONSTRUCTOR)))
+        t = gimple_assign_rhs1 (def_stmt);
+    }
   if (bitpos)
 {
   if (TREE_CODE (type) == BOOLEAN_TYPE)
@@ -1419,7 +1429,7 @@ do_cond (gimple_stmt_iterator *gsi, tree
   if (TREE_CODE (TREE_TYPE (b)) == VECTOR_TYPE)
 b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
   tree cond = gimple_assign_rhs1 (gsi_stmt (*gsi));
-  return gimplify_build3 (gsi, code, inner_type, cond, a, b);
+  return gimplify_build3 (gsi, code, inner_type, unshare_expr (cond), a, b);
 }
 
 /* Expand a vector COND_EXPR to scalars, piecewise.  */

RE: [PATCH, Mips] Compact branch/delay slot optimization.

2015-11-11 Thread Simon Dardis

Committed as r230160.

Thanks,
Simon

> -Original Message-
> From: Moore, Catherine [mailto:catherine_mo...@mentor.com]
> Sent: 28 October 2015 14:00
> To: Simon Dardis; Matthew Fortune
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH, Mips] Compact branch/delay slot optimization.
> 
> 
> 
> > -Original Message-
> > From: Simon Dardis [mailto:simon.dar...@imgtec.com]
> > Sent: Tuesday, October 06, 2015 10:00 AM
> > To: Moore, Catherine; Matthew Fortune
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH, Mips] Compact branch/delay slot optimization.
> >
> > Hello,
> >
> > I'd like to resubmit the previous patch as it failed to check if the
> > branch inside the sequence had a compact form.
> >
> > Thanks,
> > Simon
> >
> > gcc/
> > * config/mips/mips.c: (mips_breakable_sequence_p): New function.
> >   (mips_break_sequence): New function.
> >   (mips_reorg_process_insns): Use them.  Use compact branches in
> >   selected situations.
> >
> > gcc/testsuite/
> > * gcc.target/mips/split-ds-sequence.c: Test for the above.
> 
> Hi Simon,
> This patch looks okay with the exception of one stylistic change.
> Please change all instances of :
> +mips_breakable_sequence_p (rtx_insn * insn)
> To:
> +mips_breakable_sequence_p (rtx_insn *insn)
> Okay, with those changes.
> Thanks,
> Catherine
> 
> 
> >
> > Index: config/mips/mips.c
> >
> > ===
> > --- config/mips/mips.c  (revision 228282)
> > +++ config/mips/mips.c  (working copy)
> > @@ -16973,6 +16973,34 @@
> >}
> >  }
> >
> > +/* A SEQUENCE is breakable iff the branch inside it has a compact form
> > +   and the target has compact branches.  */
> > +
> > +static bool
> > +mips_breakable_sequence_p (rtx_insn * insn) {
> > +  return (insn && GET_CODE (PATTERN (insn)) == SEQUENCE
> > + && TARGET_CB_MAYBE
> > + && get_attr_compact_form (SEQ_BEGIN (insn)) != COMPACT_FORM_NEVER);
> > +}
> > +
> > +/* Remove a SEQUENCE and replace it with the delay slot instruction
> > +   followed by the branch and return the instruction in the delay slot.
> > +   Return the first of the two new instructions.
> > +   Subroutine of mips_reorg_process_insns.  */
> > +
> > +static rtx_insn *
> > +mips_break_sequence (rtx_insn * insn) {
> > +  rtx_insn * before = PREV_INSN (insn);
> > +  rtx_insn * branch = SEQ_BEGIN (insn);
> > +  rtx_insn * ds = SEQ_END (insn);
> > +  remove_insn (insn);
> > +  add_insn_after (ds, before, NULL);
> > +  add_insn_after (branch, ds, NULL);
> > +  return ds;
> > +}
> > +
> >  /* Go through the instruction stream and insert nops where necessary.
> > Also delete any high-part relocations whose partnering low parts
> > are now all dead.  See if the whole function can then be put into
> > @@ -17065,6 +17093,68 @@
> > {
> >   if (GET_CODE (PATTERN (insn)) == SEQUENCE)
> > {
> > + rtx_insn * next_active = next_active_insn (insn);
> > + /* Undo delay slots to avoid bubbles if the next instruction can
> > +be placed in a forbidden slot or the cost of adding an
> > +explicit NOP in a forbidden slot is OK and if the SEQUENCE is
> > +safely breakable.  */
> > + if (TARGET_CB_MAYBE
> > + && mips_breakable_sequence_p (insn)
> > + && INSN_P (SEQ_BEGIN (insn))
> > + && INSN_P (SEQ_END (insn))
> > + && ((next_active
> > +  && INSN_P (next_active)
> > +  && GET_CODE (PATTERN (next_active)) != SEQUENCE
> > +  && get_attr_can_delay (next_active) == CAN_DELAY_YES)
> > + || !optimize_size))
> > +   {
> > + /* To hide a potential pipeline bubble, if we scan backwards
> > +from the current SEQUENCE and find that there is a load
> > +of a value that is used in the CTI and there are no
> > +dependencies between the CTI and instruction in the delay
> > +slot, break the sequence so the load delay is hidden.  */
> > + HARD_REG_SET uses;
> > + CLEAR_HARD_REG_SET (uses);
> > + note_uses (&PATTERN (SEQ_BEGIN (insn)), record_hard_reg_uses,
> > +&uses);
> > + HARD_REG_SET delay_sets;
> > + CLEAR_HARD_REG_SET (delay_sets);
> > + note_stores (PATTERN (SEQ_END (insn)), record_hard_reg_sets,
> > +  &delay_sets);
> > +
> > + rtx prev = prev_active_insn (insn);
> > + if (prev
> > + && GET_CODE (PATTERN (prev)) == SET
> > + && MEM_P (SET_SRC (PATTERN (prev
> > +   {
> > + HARD_REG_SET sets;
> > + CLEAR_HARD_REG_SET (sets);
> > + note_stores (PATTERN (prev), record_hard_reg_sets,
> > +  &sets)
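mips_break_sequence unlinks the SEQUENCE and re-links its two members in the order delay-slot instruction, then branch. The C sketch below models that re-linking on a plain doubly linked list; struct insn and the helper functions are hypothetical simplifications, not GCC's rtx_insn API (in the patch, remove_insn and add_insn_after do this bookkeeping).

```c
#include <stddef.h>

/* Hypothetical, much-simplified instruction node.  */
struct insn
{
  struct insn *prev, *next;
  int id;
};

/* Link NODE after AFTER; a NULL AFTER means "at the head of *HEAD".  */
static void
insert_after (struct insn **head, struct insn *after, struct insn *node)
{
  node->prev = after;
  node->next = after ? after->next : *head;
  if (node->next)
    node->next->prev = node;
  if (after)
    after->next = node;
  else
    *head = node;
}

/* Unlink NODE from the list rooted at *HEAD.  */
static void
unlink_insn (struct insn **head, struct insn *node)
{
  if (node->prev)
    node->prev->next = node->next;
  else
    *head = node->next;
  if (node->next)
    node->next->prev = node->prev;
  node->prev = node->next = NULL;
}

/* Replace SEQ (bundling BRANCH and its delay-slot insn DS) by DS
   followed by BRANCH, and return DS, mirroring mips_break_sequence.  */
static struct insn *
break_sequence (struct insn **head, struct insn *seq,
                struct insn *branch, struct insn *ds)
{
  struct insn *before = seq->prev;

  unlink_insn (head, seq);
  insert_after (head, before, ds);
  insert_after (head, ds, branch);
  return ds;
}
```

Returning the delay-slot instruction lets the caller resume scanning from the first of the two newly exposed instructions.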

Re: [gomp4] Fix some broken tests

2015-11-11 Thread Nathan Sidwell


On 11/10/15 18:08, Cesar Philippidis wrote:

On 11/10/2015 12:35 PM, Nathan Sidwell wrote:

I've committed this to gomp4.  In preparing the reworked firstprivate
patch changes for gomp4's gimplify.c I discovered these testcases were
passing by accident, and lacked a data clause.


It used to be if a reduction was on a parallel construct, the gimplifier
would introduce a pcopy clause for the reduction variable if it was not
associated with any data clause. Is that not the case anymore?


AFAICT, the standard doesn't specify that behaviour.  Section 2.6, 'Data Environment', doesn't
mention reductions as a modifier for implicitly determined data attributes.


nathan

Re: [ptx] partitioning optimization

2015-11-11 Thread Nathan Sidwell


On 11/10/15 17:45, Ilya Verbin wrote:

I've been unable to introduce a testcase for this. The difficulty is we want
to check an rtl dump from the acceleration compiler, and there doesn't
appear to be existing machinery for that in the testsuite.  Perhaps
something to be added later?


I haven't tried it, but doesn't
/* { dg-options "-foffload=-fdump-rtl-..." } */
with
/* { dg-final { scan-rtl-dump ... } } */
work?


in the gcc testsuite directories?  That's the approach I was going for.

The issue is detecting when the test should be run.  target==nvptx-*-* isn't 
right, as the target is the x86 host machine.  There doesn't seem to be an 
existing dejagnu predicate there to select for 'accel_target==FOO'.  Am I 
missing something?


nathan

Re: [Patch] Optimize condition reductions where the result is an integer induction variable

2015-11-11 Thread Richard Biener

On Wed, Nov 11, 2015 at 1:22 PM, Alan Hayward  wrote:
> Hi,
> I hoped to post this in time for Monday’s cut-off date, but circumstances
> delayed me until today.  I hope this patch can still go in.
>
>
> This patch builds upon the change for PR65947, and reduces the amount of
> code produced in a vectorized condition reduction where operand 2 of the
> COND_EXPR is an assignment of an increasing integer induction variable that
> won't wrap.
>
>
> For example (assuming all types are ints), this is a match:
>
> last = 5;
> for (i = 0; i < N; i++)
>   if (a[i] < min_v)
>     last = i;
>
> Whereas this is not, because the result is based on a memory access:
> last = 5;
> for (i = 0; i < N; i++)
>   if (a[i] < min_v)
>     last = a[i];
>
> In the integer induction variable case we can just use a MAX reduction and
> skip all the code I added in my vectorized condition reduction patch: the
> additional induction variables in vectorizable_reduction () and the
> additional checks in vect_create_epilog_for_reduction ().  From the patch
> diff alone, it's not immediately obvious that those parts will be skipped,
> as there are no code changes in those areas.
>
> The initial value of the induction variable is forced to zero, as any
> other value could affect the result of the induction.  At the end of the
> loop, if the result is zero, then we restore the original initial value.

+static bool
+is_integer_induction (gimple *stmt, struct loop *loop)

is_nonwrapping_integer_induction?

+  tree lhs_max = TYPE_MAX_VALUE (TREE_TYPE (gimple_phi_result (stmt)));

don't use TYPE_MAX_VALUE.

+  /* Check that the induction increments.  */
+  if (tree_int_cst_compare (step, size_zero_node) <= 0)
+return false;

tree_int_cst_sgn (step) == -1

+  /* Check that the max size of the loop will not wrap.  */
+
+  if (! max_loop_iterations (loop, &ni))
+return false;
+  /* Convert backedges to iterations.  */
+  ni += 1;

just use max_stmt_executions (loop, &ni) which properly checks for overflow
of the +1.

+  max_loop_value = wi::add (wi::to_widest (base),
+   wi::mul (wi::to_widest (step), ni));
+
+  if (wi::gtu_p (max_loop_value, wi::to_widest (lhs_max)))
+return false;

you miss a check for the wi::add / wi::mul to overflow.  You can use
extra args to determine this.

Instead of TYPE_MAX_VALUE use wi::max_value (precision, sign).
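The overflow-aware bound requested here can be sketched in plain C. This is an illustration only: it uses the GCC/Clang builtins __builtin_mul_overflow and __builtin_add_overflow on uint64_t as stand-ins for wi::mul and wi::add with their overflow argument, handles only non-negative steps and precisions of 1 to 64 bits, and the helper names are hypothetical. Like the quoted patch, it bounds base + step * ni.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical analogue of wi::max_value (precision, sign) for
   precisions of 1..64 bits.  */
static uint64_t
max_value_for_precision (unsigned int precision, bool is_signed)
{
  unsigned int value_bits = is_signed ? precision - 1 : precision;
  return value_bits >= 64 ? UINT64_MAX : (UINT64_C (1) << value_bits) - 1;
}

/* Return true if BASE + STEP * NI can be computed without overflow and
   does not exceed TYPE_MAX.  NI is the maximum number of executions of
   the statement (what max_stmt_executions would provide).  */
static bool
nonwrapping_induction_p (uint64_t base, uint64_t step, uint64_t ni,
                         uint64_t type_max)
{
  uint64_t span, max_loop_value;

  /* The intermediate arithmetic itself must not wrap.  */
  if (__builtin_mul_overflow (step, ni, &span))
    return false;
  if (__builtin_add_overflow (base, span, &max_loop_value))
    return false;
  return max_loop_value <= type_max;
}
```

Checking the intermediate product and sum separately is the point of the review comment: a wrapped intermediate could otherwise compare below the type's maximum and be wrongly accepted.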

I wonder if you want to skip all the overflow checks for TYPE_OVERFLOW_UNDEFINED
IV types?

Thanks,
Richard.

>
>
>
> Cheers,
> Alan.
>

Re: libgo patch committed: Update to Go 1.5 release

2015-11-11 Thread Ian Lance Taylor

On Wed, Nov 11, 2015 at 3:48 AM, Rainer Orth
 wrote:
> Ian Lance Taylor  writes:
>
>> On Sun, Nov 8, 2015 at 9:21 AM, Rainer Orth  
>> wrote:
>>>
>>> There were two remaining problems:
>>>
>>> * Before Solaris 12, sendfile only lives in libsendfile.  This led to
>>>   link failures in gotools.
>>>
>>> * Solaris 12 introduced a couple more types that use _in6_addr_t, which
>>>   are filtered out by mksysinfo.sh, leading to compilation failures.
>>>
>>> The following patch addresses both issues.  Solaris 10 and 11 bootstraps
>>> have completed, a Solaris 12 bootstrap is still running make check.
>>
>> Thanks.  Committed to mainline.
>
> Great, thanks.  The mksysinfo.sh part is also necessary on the gcc-5
> branch.  Tested on i386-pc-solaris2.12 and sparc-sun-solaris2.12, ok to
> install?

Sure, go ahead.

Ian

Re: [PATCH] PR68271 [6 Regression] Boostrap fails on x86_64-apple-darwin14 at r230084

2015-11-11 Thread Jakub Jelinek

On Wed, Nov 11, 2015 at 02:11:38PM +0100, Dominique d'Humières wrote:
> The following patch restores bootstrap on darwin
> 
> --- ../_clean/gcc/cp/parser.h 2015-11-10 01:54:44.0 +0100
> +++ gcc/cp/parser.h   2015-11-11 12:10:28.0 +0100
> @@ -48,7 +48,7 @@ struct GTY (()) cp_token {
>/* Token flags.  */
>unsigned char flags;
>/* Identifier for the pragma.  */
> -  ENUM_BITFIELD (pragma_kind) pragma_kind : 6;
> +  ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
>/* True if this token is from a context where it is implicitly extern "C" */
>BOOL_BITFIELD implicit_extern_c : 1;
>/* True if an error has already been reported for this token, such as a
> --- ../_clean/gcc/c-family/c-pragma.c 2015-11-10 01:54:43.0 +0100
> +++ gcc/c-family/c-pragma.c   2015-11-11 12:10:25.0 +0100
> @@ -1372,7 +1372,7 @@ c_register_pragma_1 (const char *space, 
>  
>/* The C++ front end allocates 6 bits in cp_token; the C front end
>allocates 7 bits in c_token.  At present this is sufficient.  */
> -  gcc_assert (id < 64);
> +  gcc_assert (id < 256);
>  }
>  
>cpp_register_deferred_pragma (parse_in, space, name, id,
> 
> OK to commit?

As written in the PR, please add a ChangeLog entry, don't forget about
PR bootstrap/68271
line, and please update the 6 and 7 numbers in the comment to 8.
With that the patch is ok.
As a follow-up, we'll remove pragma_kind field in the C++ FE, to shrink the
token by 64 bits.

Jakub

Re: [PATCH] Simple optimization for MASK_STORE.

2015-11-11 Thread Yuri Rumyantsev

Richard,

What should we do to cope with this problem (the increase in structure size)?
Should we return to the vector comparison version?

Thanks.
Yuri.

2015-11-11 12:18 GMT+03:00 Richard Biener :
> On Tue, Nov 10, 2015 at 3:56 PM, Ilya Enkovich  wrote:
>> 2015-11-10 17:46 GMT+03:00 Richard Biener :
>>> On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich  
>>> wrote:
 2015-11-10 15:33 GMT+03:00 Richard Biener :
> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev  
> wrote:
>> Richard,
>>
>> I tried it but 256-bit precision integer type is not yet supported.
>
> What's the symptom?  The compare cannot be expanded?  Just add a pattern then.
> After all we have modes up to XImode.

 I suppose problem may be in:

 gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128)

 which doesn't allow to create constants of bigger size.  Changing it
 to maximum vector size (512) would mean we increase wide_int structure
 size significantly. New patterns are probably also needed.
>>>
>>> Yes, new patterns are needed but wide-int should be fine (we only need to
>>> create a literal zero AFAICS).  The "new pattern" would be equality/inequality
>>> against zero compares only.
>>
>> Currently 256bit integer creation fails because wide_int for max and
>> min values cannot be created.
>
> Hmm, indeed:
>
> #1  0x0072dab5 in wi::extended_tree<192>::extended_tree (
> this=0x7fffd950, t=0x76a000b0)
> at /space/rguenther/src/svn/trunk/gcc/tree.h:5125
> 5125  gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N);
>
> but that's not that the constants fail to be created but
>
> #5  0x010d8828 in build_nonstandard_integer_type (precision=512,
> unsignedp=65) at /space/rguenther/src/svn/trunk/gcc/tree.c:8051
> 8051  if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype)))
> (gdb) l
> 8046fixup_unsigned_type (itype);
> 8047  else
> 8048fixup_signed_type (itype);
> 8049
> 8050  ret = itype;
> 8051  if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype)))
> 8052ret = type_hash_canon (tree_to_uhwi (TYPE_MAX_VALUE
> (itype)), itype);
>
> thus the integer type hashing being "interesting".  tree_fits_uhwi_p
> fails because it does
>
> 7289bool
> 7290tree_fits_uhwi_p (const_tree t)
> 7291{
> 7292  return (t != NULL_TREE
> 7293  && TREE_CODE (t) == INTEGER_CST
> 7294  && wi::fits_uhwi_p (wi::to_widest (t)));
> 7295}
>
> and wi::to_widest () fails with doing
>
> 5121template 
> 5122inline wi::extended_tree ::extended_tree (const_tree t)
> 5123  : m_t (t)
> 5124{
> 5125  gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N);
> 5126}
>
> fixing the hashing then runs into type_cache_hasher::equal doing
> tree_int_cst_equal which again uses to_widest (it should be easier and
> cheaper to do the compare on the actual tree representation, but well,
> seems to be just the first of various issues we'd run into).
>
> We eventually could fix the assert above (but then need to hope we assert
> when a computation overflows the narrower precision of widest_int) or use
> a special really_widest_int (ugh).
>
>> It is fixed by increasing MAX_BITSIZE_MODE_ANY_INT, but that increases
>> WIDE_INT_MAX_ELTS and thus the size of the wide_int structure.  If we use
>> 512 for MAX_BITSIZE_MODE_ANY_INT, the wide_int structure would grow by
>> 48 bytes (16 bytes if we use 256 for MAX_BITSIZE_MODE_ANY_INT).
>> Is that OK for such narrow usage?
>
> widest_int is used in some long-living structures (which is the reason for
> MAX_BITSIZE_MODE_ANY_INT in the first place).  So I don't think so.
>
> Richard.
>
>> Ilya
>>
>>>
>>> Richard.
>>>
 Ilya

>
> Richard.
>
>> Yuri.
>>
>>
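The 48- and 16-byte figures can be reproduced with a little arithmetic, assuming wide-int.h sizes the inline element array as WIDE_INT_MAX_ELTS = (MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT) / HOST_BITS_PER_WIDE_INT with a 64-bit HOST_WIDE_INT. Under the same assumption, the current i386 limit of 128 gives a maximum precision of 3 * 64 = 192 bits, consistent with the extended_tree<192> frame in the backtrace. The function names below are illustrative only.

```c
/* Assumed sizing rule (as we understand wide-int.h):
   WIDE_INT_MAX_ELTS = (MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT)
                       / HOST_BITS_PER_WIDE_INT,
   with a 64-bit HOST_WIDE_INT.  */
#define HOST_BITS_PER_WIDE_INT 64u

/* Number of HOST_WIDE_INT elements stored inline in a widest_int.  */
static unsigned int
wide_int_max_elts (unsigned int max_bitsize_mode_any_int)
{
  return (max_bitsize_mode_any_int + HOST_BITS_PER_WIDE_INT)
         / HOST_BITS_PER_WIDE_INT;
}

/* Maximum precision in bits (the N of the widest extended_tree<N>).  */
static unsigned int
wide_int_max_precision (unsigned int max_bitsize_mode_any_int)
{
  return wide_int_max_elts (max_bitsize_mode_any_int) * HOST_BITS_PER_WIDE_INT;
}

/* Growth in bytes of the inline element array when the limit changes
   from OLD_LIMIT to NEW_LIMIT bits.  */
static unsigned int
extra_bytes (unsigned int old_limit, unsigned int new_limit)
{
  return (wide_int_max_elts (new_limit) - wide_int_max_elts (old_limit))
         * (HOST_BITS_PER_WIDE_INT / 8u);
}
```

Going from 128 to 512 adds six 8-byte elements (3 to 9), and going to 256 adds two (3 to 5), which is why the growth matters for the long-lived structures that embed widest_int.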

[PATCH] PR68271 [6 Regression] Boostrap fails on x86_64-apple-darwin14 at r230084

2015-11-11 Thread Dominique d'Humières

The following patch restores bootstrap on darwin

--- ../_clean/gcc/cp/parser.h   2015-11-10 01:54:44.0 +0100
+++ gcc/cp/parser.h 2015-11-11 12:10:28.0 +0100
@@ -48,7 +48,7 @@ struct GTY (()) cp_token {
   /* Token flags.  */
   unsigned char flags;
   /* Identifier for the pragma.  */
-  ENUM_BITFIELD (pragma_kind) pragma_kind : 6;
+  ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
   /* True if this token is from a context where it is implicitly extern "C" */
   BOOL_BITFIELD implicit_extern_c : 1;
   /* True if an error has already been reported for this token, such as a
--- ../_clean/gcc/c-family/c-pragma.c   2015-11-10 01:54:43.0 +0100
+++ gcc/c-family/c-pragma.c 2015-11-11 12:10:25.0 +0100
@@ -1372,7 +1372,7 @@ c_register_pragma_1 (const char *space, 
 
   /* The C++ front end allocates 6 bits in cp_token; the C front end
 allocates 7 bits in c_token.  At present this is sufficient.  */
-  gcc_assert (id < 64);
+  gcc_assert (id < 256);
 }
 
   cpp_register_deferred_pragma (parse_in, space, name, id,

OK to commit?

Dominique

Re: Enable pointer TBAA for LTO

2015-11-11 Thread Bernd Schmidt


On 11/11/2015 10:21 AM, Richard Biener wrote:

On Tue, 10 Nov 2015, Jan Hubicka wrote:

The reason is that TYPE_CANONICAL is initialized in get_alias_set that may be
called before we finish all merging and then it is more fine grained than what
we need here (i.e. TYPE_CANONICAL of pointers to two different types will be
different, but here we want them to be equal so we can match:

struct aa { void *ptr;};
struct bb { int * ptr;};

Which is actually required for Fortran interoperability.


Just curious, is this sort of thing documented anywhere?


Bernd

Re: [C/C++ PATCH] Reject declarators with huge arrays (PR c/68107, c++/68266)

2015-11-11 Thread Bernd Schmidt


On 11/11/2015 01:31 PM, Marek Polacek wrote:


Certainly I'm in favor of sharing code between C and C++ FEs, though in
this case it didn't seem too important/obvious, because of the extra !=
error_mark_node check + I don't really like the new function getting *type
and setting it there.


Make it return bool to indicate whether to change type to error_mark.


Bernd

Re: [PATCH] Fix PR rtl-optimization/68287

2015-11-11 Thread Martin Liška

On 11/11/2015 01:20 PM, Richard Biener wrote:
> On Wed, Nov 11, 2015 at 12:18 PM, Martin Liška  wrote:
>> Hi.
>>
>> There's a fix for fallout of r230027.
>>
>> Patch can bootstrap and survives regression tests on x86_64-linux-gnu.
> 
> Hmm, but only the new elements are zeroed so this still is different
> from previous behavior.
> Note that the previous .create (...) doesn't initialize the elements
> either (well, it's not supposed to ...).
> 
> I _think_ the bug is that you do safe_grow and use length while the
> previous code just added
> enough reserve (but not actual elements!).
> 
> Thus the fix would be to do
> 
>  point_freq_vec.truncate (0);
>  point_freq_vec.reserve_exact (new_length);
> 
> Richard.

Ahh, I see! Thanks for suggestion. I'm going to re-run regression
tests and bootstrap.

I consider the previous email as approval to install the patch.

Thanks,
Martin

> 
>> Ready for trunk?
>> Thanks,
>> Martin

From f719039abd856962d4ab9c0e61994aba413aeffa Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 11 Nov 2015 10:11:20 +0100
Subject: [PATCH 1/3] Fix PR rtl-optimization/68287

gcc/ChangeLog:

2015-11-11  Martin Liska  
	Richard Biener  

	PR rtl-optimization/68287
	* lra-lives.c (lra_create_live_ranges_1): Reserve the right
	number of elements.
---
 gcc/lra-lives.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
index 9453759..5f76a87 100644
--- a/gcc/lra-lives.c
+++ b/gcc/lra-lives.c
@@ -1241,8 +1241,8 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
   unused_set = sparseset_alloc (max_regno);
   curr_point = 0;
   unsigned new_length = get_max_uid () * 2;
-  if (point_freq_vec.length () < new_length)
-    point_freq_vec.safe_grow (new_length);
+  point_freq_vec.truncate (0);
+  point_freq_vec.reserve_exact (new_length);
   lra_point_freq = point_freq_vec.address ();
   int *post_order_rev_cfg = XNEWVEC (int, last_basic_block_for_fn (cfun));
   int n_blocks_inverted = inverted_post_order_compute (post_order_rev_cfg);
-- 
2.6.2

Re: [C/C++ PATCH] Reject declarators with huge arrays (PR c/68107, c++/68266)

2015-11-11 Thread Marek Polacek

On Tue, Nov 10, 2015 at 12:40:49PM -0700, Martin Sebor wrote:
> On 11/10/2015 09:36 AM, Marek Polacek wrote:
> >While both C and C++ FEs are able to reject e.g.
> >int a[__SIZE_MAX__ / sizeof(int)];
> >they are accepting code such as
> >int (*a)[__SIZE_MAX__ / sizeof(int)];
> >
> >As Joseph pointed out, any construction of a non-VLA type whose size is half
> >or more of the address space should receive a compile-time error.
> >
> >Done by moving up the check for the size in bytes so that it checks
> >every non-VLA complete array type constructed in the course of processing the
> >declarator.  Since the C++ FE had the same problem, I've fixed it up there as
> >well.  And that's why I had to tweak dg-error of two C++ tests; if the size of
> >an array is considered invalid, we give an error message with the word "unnamed".
> >
> >(I've removed the comment about crashing in tree_to_[su]hwi since that seems
> >to no longer be the case.)
> 
> Thanks for including me on this. I tested it with C++ references
> to arrays (in addition to pointers) and it works correctly for
> those as well (unsurprisingly). The only thing that bothers me

Good, thanks!

> a bit is the seemingly arbitrary inconsistency between
> the diagnostics:
 
> >+  p = new char [1][MAX - 99]; // { dg-error "size of unnamed array" }
> >  p = new char [1][MAX / 2];  // { dg-error "size of array" }
> 
> Would it be possible to make the message issued by the front ends
> the same? I.e., either both "unnamed array" or both just "array?"

Yeah, I was thinking about that, too, but I was also hoping that we can
clean this up as a follow-up.  I think let's drop the "unnamed" word, even
though that means that the changes in new44.C brought with my patch will
essentially have to be reverted...

Oh, and we could also be more informative and print the size of an array,
or the number of elements, as clang does.
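As a rough illustration of reporting the size, the following sketch multiplies out the element size and each dimension with overflow detection, applies the same half-of-the-address-space limit, and formats a message that includes the byte count when it is representable. The function name and message wording are hypothetical, not GCC's or clang's actual diagnostics.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns true if an array with element size ELT_SIZE
   and dimensions DIMS[0..NDIMS-1] is too large, writing a diagnostic into
   MSG (MSGLEN must be nonzero).  Overflow of size_t arithmetic is reported
   separately from an in-range but too-large size.  */
static bool
array_too_large (size_t elt_size, const size_t *dims, size_t ndims,
                 char *msg, size_t msglen)
{
  size_t total = elt_size;
  for (size_t i = 0; i < ndims; i++)
    {
      /* total * dims[i] would not fit in size_t.  */
      if (dims[i] != 0 && total > SIZE_MAX / dims[i])
        {
          snprintf (msg, msglen, "size of array overflows size_t");
          return true;
        }
      total *= dims[i];
    }

  /* Reject half or more of the address space.  */
  if (total > SIZE_MAX / 2)
    {
      snprintf (msg, msglen, "size of array is too large (%zu bytes)", total);
      return true;
    }

  msg[0] = '\0';
  return false;
}
```

For example, an int array with __SIZE_MAX__ / sizeof(int) elements does not overflow size_t, but its byte size is well above half the address space, so the helper rejects it and reports the byte count.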

Thanks,

Marek

Re: [patch] Fix PR target/67265

2015-11-11 Thread Bernd Schmidt


On 11/11/2015 01:31 PM, Eric Botcazou wrote:

Yes, it probably should, thanks for spotting it, revised patch attached.


 PR target/67265
 * ira.c (ira_setup_eliminable_regset): Do not necessarily create the
 frame pointer for stack checking if non-call exceptions aren't used.
 * config/i386/i386.c (ix86_finalize_stack_realign_flags): Likewise.


Ok if it passes testing. Should have thought of it earlier, but if you 
want to, you can also make a fp_needed_for_stack_checking_p function 
containing the four tests.
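A minimal sketch of such a predicate, gathering the four tests from the patch in one place. In GCC the function would read STACK_CHECK_MOVING_SP, flag_stack_check, flag_exceptions and cfun->can_throw_non_call_exceptions directly. Here they are modelled as plain bool fields of a hypothetical struct so that the example is self-contained.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC settings the four tests read.  */
struct stack_check_ctx
{
  bool stack_check_moving_sp;          /* STACK_CHECK_MOVING_SP */
  bool flag_stack_check;               /* -fstack-check in effect */
  bool flag_exceptions;                /* exceptions enabled */
  bool can_throw_non_call_exceptions;  /* cfun->can_throw_non_call_exceptions */
};

/* The frame pointer is needed only if stack overflow may have to be
   propagated as an exception while the stack pointer is moving.  */
static bool
fp_needed_for_stack_checking_p (const struct stack_check_ctx *c)
{
  return (c->stack_check_moving_sp
          && c->flag_stack_check
          && c->flag_exceptions
          && c->can_throw_non_call_exceptions);
}
```

Both ira_setup_eliminable_regset and ix86_finalize_stack_realign_flags could then call the one predicate instead of repeating the condition.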



Bernd

RE: [PATCH 2/2][ARC] Add support for ARCv2 CPUs

2015-11-11 Thread Claudiu Zissulescu

This patch is committed.

Thanks Joern,
Claudiu

> -----Original Message-----
> From: Joern Wolfgang Rennecke [mailto:g...@amylaar.uk]
> Sent: Tuesday, November 10, 2015 3:02 PM
> To: Claudiu Zissulescu; gcc-patches@gcc.gnu.org
> Cc: Francois Bedard; jeremy.benn...@embecosm.com
> Subject: Re: [PATCH 2/2][ARC] Add support for ARCv2 CPUs
> 
> 
> 
> On 30/10/15 11:22, Claudiu Zissulescu wrote:
> > Hi,
> >
> > Please find the updated patch.  Both ARC patches were tested using
> dg.exp. The ChangeLog entry is unchanged.
> 
> This is OK.

Re: [patch] Fix PR target/67265

2015-11-11 Thread Eric Botcazou

> This piece of code alone doesn't tell me exactly why the frame pointer
> is needed. I was looking for an explicit use, but I now guess that if
> you have multiple adjusts of the [stack] pointer you can't easily undo
> them in the error case (the function behaves as-if using alloca). Is
> that it?

Yes, exactly, the analogy with alloca is correct.

> And without exceptions I assume you just get a call to abort so
> it doesn't matter? If I understood all that right, then this is ok.

If you don't care about exceptions on stack overflow, then the signal will 
very likely terminate the program instead of overwriting stack contents, which 
is good enough.  In Ada, the language requires you to exit gracefully or even 
to resume regular execution (at least in theory for the latter).

> In i386.c I see a code block with a similar condition,
> 
>/* If the only reason for frame_pointer_needed is that we conservatively
>   assumed stack realignment might be needed, but in the end nothing that
>   needed the stack alignment had been spilled, clear frame_pointer_needed
>   and say we don't need stack realignment.  */
> 
> and the condition has
> 
>&& !(flag_stack_check && STACK_CHECK_MOVING_SP)
> 
> Should that be changed too?

Yes, it probably should, thanks for spotting it, revised patch attached.


PR target/67265
* ira.c (ira_setup_eliminable_regset): Do not necessarily create the
frame pointer for stack checking if non-call exceptions aren't used.
* config/i386/i386.c (ix86_finalize_stack_realign_flags): Likewise.


-- 
Eric Botcazou
Index: ira.c
===
--- ira.c	(revision 230146)
+++ ira.c	(working copy)
@@ -2259,9 +2259,12 @@ ira_setup_eliminable_regset (void)
   frame_pointer_needed
 = (! flag_omit_frame_pointer
|| (cfun->calls_alloca && EXIT_IGNORE_STACK)
-   /* We need the frame pointer to catch stack overflow exceptions
-	  if the stack pointer is moving.  */
-   || (flag_stack_check && STACK_CHECK_MOVING_SP)
+   /* We need the frame pointer to catch stack overflow exceptions if
+	  the stack pointer is moving (as for the alloca case just above).  */
+   || (STACK_CHECK_MOVING_SP
+	   && flag_stack_check
+	   && flag_exceptions
+	   && cfun->can_throw_non_call_exceptions)
|| crtl->accesses_prior_frames
|| (SUPPORTS_STACK_ALIGNMENT && crtl->stack_realign_needed)
/* We need a frame pointer for all Cilk Plus functions that use
Index: config/i386/i386.c
===
--- config/i386/i386.c	(revision 230146)
+++ config/i386/i386.c	(working copy)
@@ -12470,7 +12466,11 @@ ix86_finalize_stack_realign_flags (void)
   && !crtl->accesses_prior_frames
   && !cfun->calls_alloca
   && !crtl->calls_eh_return
-  && !(flag_stack_check && STACK_CHECK_MOVING_SP)
+  /* See ira_setup_eliminable_regset for the rationale.  */
+  && !(STACK_CHECK_MOVING_SP
+	   && flag_stack_check
+	   && flag_exceptions
+	   && cfun->can_throw_non_call_exceptions)
   && !ix86_frame_pointer_required ()
   && get_frame_size () == 0
   && ix86_nsaved_sseregs () == 0

RE: [PATCH 1/2][ARC] Add support for ARCv2 CPUs

2015-11-11 Thread Claudiu Zissulescu

This patch is committed (without the gen_compare_reg change). 

Thanks Joern,
Claudiu

> Apart from the gen_compare_reg change, the patch is OK.
> If the v2 support mostly works like support for the other subtargets, you may
> check it in without the gen_compare_reg change.
> If that change is required because of particular code paths taken with the v2
> port, you may check in the whole patch.
> 
> The operand-swapping in gen_compare_reg was not expected to be
> triggered when re-generating a comparison, as comparisons gleaned from
> existing instructions are supposed to already have the operands in the right
> order.
> Do you have a testcase that triggers the assert independently from the
> v2 support?
> If you can name a pre-existing testcase to trigger the assert, the patch is
> approved for separate check-in.
> If you have a new testcase, is it in a form and of a legal status that it can be
> submitted for inclusion in the gcc regression tests suite?

Re: [C/C++ PATCH] Reject declarators with huge arrays (PR c/68107, c++/68266)

2015-11-11 Thread Marek Polacek

On Tue, Nov 10, 2015 at 06:38:31PM +0100, Paolo Carlini wrote:
>  Hi,
> 
> On 11/10/2015 05:36 PM, Marek Polacek wrote:
> >+
> >+/* Did array size calculations overflow or does the array
> >+   cover more than half of the address-space?  */
> >+if (COMPLETE_TYPE_P (type)
> >+&& TREE_CODE (TYPE_SIZE_UNIT (type)) == INTEGER_CST
> >+&& !valid_constant_size_p (TYPE_SIZE_UNIT (type)))
> >+  {
> >+if (name)
> >+  error_at (loc, "size of array %qE is too large", name);
> >+else
> >+  error_at (loc, "size of unnamed array is too large");
> >+type = error_mark_node;
> >+  }
> >   }
> Obviously "the issue" predates your proposed change, but I don't understand
> why the code implementing the check can't be shared by the front-ends via a
> small function in c-family...

Certainly I'm in favor of sharing code between C and C++ FEs, though in
this case it didn't seem too important/obvious, because of the extra !=
error_mark_node check + I don't really like the new function getting *type
and setting it there.

But I'll submit another version of the patch with a common function.

Marek

[Patch] Optimize condition reductions where the result is an integer induction variable

2015-11-11 Thread Alan Hayward

Hi,
I hoped to post this in time for Monday’s cut-off date, but circumstances
delayed me until today. I hope that, if possible, this patch can still
go in.


This patch builds upon the change for PR65947, and reduces the amount of
code produced in a vectorized condition reduction where operand 2 of the
COND_EXPR is an assignment of an increasing integer induction variable that
won't wrap.
 

For example (assuming all types are ints), this is a match:

last = 5;
for (i = 0; i < N; i++)
  if (a[i] < min_v)
last = i;

Whereas this is not, because the result is based on a memory access:
last = 5;
for (i = 0; i < N; i++)
  if (a[i] < min_v)
last = a[i];

In the integer induction variable case we can just use a MAX reduction and
skip all the code I added in my vectorized condition reduction patch: the
additional induction variables in vectorizable_reduction () and the
additional checks in vect_create_epilog_for_reduction (). From the patch
diff alone, it's not immediately obvious that those parts will be skipped,
as there are no code changes in those areas.

The initial value of the induction variable is forced to zero, as any
other value could affect the result of the induction. At the end of the
loop, if the result is zero, then we restore the original initial value.
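The scheme described above can be sketched as a scalar emulation in C. The sketch uses a hypothetical vectorization factor of 4, one MAX accumulator per lane starting at zero, a final MAX across lanes, and the zero-means-no-match restore at the end. It illustrates the idea only and is not the vectorizer's generated code. Because 0 doubles as the "no match" marker, the sketch assumes iteration 0 is never the last match. If it is, the restore step returns the original initial value instead of 0.

```c
#include <stddef.h>

#define VF 4 /* hypothetical vectorization factor */

/* Scalar reference: the condition reduction from the example above.  */
static int
last_match_scalar (const int *a, size_t n, int min_v, int init)
{
  int last = init;
  for (size_t i = 0; i < n; i++)
    if (a[i] < min_v)
      last = (int) i;
  return last;
}

/* Emulation of the MAX-reduction form.  Since i increases and does not
   wrap, the last matching i is also the largest one.  */
static int
last_match_max_reduction (const int *a, size_t n, int min_v, int init)
{
  int lanes[VF] = { 0 };  /* reduction start value forced to zero */
  size_t i = 0;

  /* "Vector" loop: select i where the condition holds, else 0, then
     take a lane-wise MAX.  */
  for (; i + VF <= n; i += VF)
    for (size_t l = 0; l < VF; l++)
      {
        int cand = a[i + l] < min_v ? (int) (i + l) : 0;
        if (cand > lanes[l])
          lanes[l] = cand;
      }

  /* Epilogue: reduce the lanes with MAX.  */
  int result = 0;
  for (size_t l = 0; l < VF; l++)
    if (lanes[l] > result)
      result = lanes[l];

  /* Scalar tail for the remaining iterations; any match here is larger
     than every earlier index.  */
  for (; i < n; i++)
    if (a[i] < min_v)
      result = (int) i;

  /* Zero means nothing matched: restore the original initial value.  */
  return result == 0 ? init : result;
}
```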




Cheers,
Alan.



optimizeConditionReductions.patch

Re: [PATCH] PR67305, tighten neon_vector_mem_operand on eliminable registers

2015-11-11 Thread Ramana Radhakrishnan



On 04/11/15 09:45, Jiong Wang wrote:
> As discussed at the bugzilla
> 
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67305
> 
> neon_vector_mem_operand is broken.  As the comment says,
> "/* Reject eliminable registers.  */", the code block at the head
> of this function which checks eliminable registers is designed to do
> early rejection only; there shouldn't be any early accept.
> 
> If this code hunk doesn't reject the incoming rtx, then the rtx pattern
> should still go through all the default checks below. All other similar
> functions, thumb1_legitimate_address_p, arm_coproc_mem_operand,
> neon_struct_mem_operand etc., follow exactly this check flow.
> 
> So as Jim Wilson commented on the bugzilla, instead of "return !strict",
> we need to do the check only if strict is true, and only do the rejection,
> which means returning FALSE; for all other cases, we need to go through
> the normal checks below.
> 
> neon_vector_mem_operand is only used by several misalign patterns, which I
> guess is why this bug has not been exposed for a long time.
> 
> bootstrap & regression OK on armv8 aarch32, ok for trunk?
> 
> 2015-11-04  Jiong Wang  
> Jim Wilson  
> 
> gcc/
>   PR target/67305
>   * config/arm/arm.c (neon_vector_mem_operand): Return FALSE if strict
>   is true and eliminable registers are mentioned.
> 


This has been lurking for a long time ...  Sorry about the delay in reviewing. 
This is OK for trunk

regards
Ramana

> 
> neon-mem.patch
> 
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 87e55e9..7fbf897 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -12957,14 +12957,14 @@ neon_vector_mem_operand (rtx op, int type, bool strict)
>rtx ind;
>  
>/* Reject eliminable registers.  */
> -  if (! (reload_in_progress || reload_completed)
> -  && (   reg_mentioned_p (frame_pointer_rtx, op)
> +  if (strict && ! (reload_in_progress || reload_completed)
> +  && (reg_mentioned_p (frame_pointer_rtx, op)
> || reg_mentioned_p (arg_pointer_rtx, op)
> || reg_mentioned_p (virtual_incoming_args_rtx, op)
> || reg_mentioned_p (virtual_outgoing_args_rtx, op)
> || reg_mentioned_p (virtual_stack_dynamic_rtx, op)
> || reg_mentioned_p (virtual_stack_vars_rtx, op)))
> -return !strict;
> +return FALSE;
>  
>/* Constants are converted into offsets from labels.  */
>if (!MEM_P (op))
>

Re: [PATCH] Fix PR rtl-optimization/68287

2015-11-11 Thread Richard Biener

On Wed, Nov 11, 2015 at 12:18 PM, Martin Liška  wrote:
> Hi.
>
> Here is a fix for the fallout of r230027.
>
> Patch can bootstrap and survives regression tests on x86_64-linux-gnu.

Hmm, but only the new elements are zeroed so this still is different
from previous behavior.
Note that the previous .create (...) doesn't initialize the elements
either (well, it's not supposed to ...).

I _think_ the bug is that you do safe_grow and use length while the
previous code just added
enough reserve (but not actual elements!).

Thus the fix would be to do

 point_freq_vec.truncate (0);
 point_freq_vec.reserve_exact (new_length);

Richard.
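The difference Richard points out, between reserving capacity and actually adding elements, can be illustrated with a toy vector in C. The names toy_vec, toy_reserve_exact, toy_safe_grow and toy_truncate are hypothetical and only mimic the intent of the vec<> methods mentioned above. After truncate plus reserve_exact, the length stays zero while the storage is large enough. By contrast, safe_grow also changes the length.

```c
#include <stdlib.h>

/* Toy vector with separate length and capacity, loosely modelled on the
   length/allocation split in GCC's vec<>.  Not GCC's implementation.  */
struct toy_vec
{
  int *data;
  size_t length;  /* number of elements in use */
  size_t alloc;   /* number of slots allocated */
};

/* Make room for exactly N more elements beyond LENGTH; LENGTH itself is
   not changed and no elements are created.  */
static void
toy_reserve_exact (struct toy_vec *v, size_t n)
{
  size_t need = v->length + n;
  if (v->alloc >= need)
    return;
  int *p = realloc (v->data, need * sizeof *p);
  if (!p)
    abort ();
  v->data = p;
  v->alloc = need;
}

/* Grow LENGTH to N (assumed >= LENGTH); the new elements are left
   uninitialized.  */
static void
toy_safe_grow (struct toy_vec *v, size_t n)
{
  toy_reserve_exact (v, n - v->length);
  v->length = n;
}

/* Drop elements beyond N; the capacity is kept.  */
static void
toy_truncate (struct toy_vec *v, size_t n)
{
  if (n < v->length)
    v->length = n;
}
```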

> Ready for trunk?
> Thanks,
> Martin

Re: [PATCH 4b/4] [ARM] PR63870 Remove error for invalid lane numbers

2015-11-11 Thread Kyrill Tkachov



On 11/11/15 12:08, Charles Baylis wrote:

On 11 November 2015 at 11:22, Kyrill Tkachov  wrote:

Hi Charles,

On 08/11/15 00:26, charles.bay...@linaro.org wrote:

From: Charles Baylis 

  Charles Baylis  

 * config/arm/neon.md (neon_vld1_lane): Remove error for invalid lane number.
 (neon_vst1_lane): Likewise.
 (neon_vld2_lane): Likewise.
 (neon_vst2_lane): Likewise.
 (neon_vld3_lane): Likewise.
 (neon_vst3_lane): Likewise.
 (neon_vld4_lane): Likewise.
 (neon_vst4_lane): Likewise.


In this pattern the 'max' variable is now unused, causing a bootstrap
-Werror failure on arm.
I'll test a patch to fix it unless you beat me to it...

Thanks for catching this.

I have a patch, and have started a bootstrap. Unless you have
objections, I'll apply as obvious once the bootstrap is complete later
this afternoon.


Yes, that's the exact patch I'm testing as well.
I'll let you finish the bootstrap and commit it.

Thanks,
Kyrill



 gcc/ChangeLog:

 2015-11-11  Charles Baylis  

 * config/arm/neon.md (neon_vld2_lane): Remove unused max
 variable.
 (neon_vst2_lane): Likewise.
 (neon_vld3_lane): Likewise.
 (neon_vst3_lane): Likewise.
 (neon_vld4_lane): Likewise.
 (neon_vst4_lane): Likewise.

Re: [PATCH] PR67305, tighten neon_vector_mem_operand on eliminable registers

2015-11-11 Thread Jiong Wang



On 04/11/15 09:45, Jiong Wang wrote:

As discussed at the bugzilla

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67305

neon_vector_mem_operand is broken.  As the comment says,
"/* Reject eliminable registers.  */", the code block at the head
of this function which checks eliminable registers is designed to do
early rejection only; there shouldn't be any early accept.

If this code hunk doesn't reject the incoming rtx, then the rtx pattern
should still go through all the default checks below. All other similar
functions, thumb1_legitimate_address_p, arm_coproc_mem_operand,
neon_struct_mem_operand etc., follow exactly this check flow.

So as Jim Wilson commented on the bugzilla, instead of "return !strict",
we need to do the check only if strict is true, and only do the rejection,
which means returning FALSE; for all other cases, we need to go through
the normal checks below.
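The change in control flow can be sketched with a self-contained model in C. The struct fields are hypothetical stand-ins for the RTL queries: the reg_mentioned_p tests on the eliminable registers, and the remaining checks further down the function. Only the shape of the early exit is meant to match the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the RTL queries in neon_vector_mem_operand.  */
struct toy_op
{
  bool mentions_eliminable_reg;  /* frame/arg/virtual pointer mentioned */
  bool passes_normal_checks;     /* result of the checks further down */
};

/* Old flow: the eliminable-register block could also accept early.  */
static bool
old_flow (const struct toy_op *op, bool strict, bool reload_started)
{
  if (!reload_started && op->mentions_eliminable_reg)
    return !strict;  /* non-strict: accepted without further checks */
  return op->passes_normal_checks;
}

/* New flow: the block only rejects early, and only when STRICT.  */
static bool
new_flow (const struct toy_op *op, bool strict, bool reload_started)
{
  if (strict && !reload_started && op->mentions_eliminable_reg)
    return false;
  return op->passes_normal_checks;
}
```

For a non-strict query on an operand that mentions an eliminable register but fails the later checks, old_flow accepts it and new_flow rejects it.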

neon_vector_mem_operand is only used by several misalign patterns, which I
guess is why this bug has not been exposed for a long time.

bootstrap & regression OK on armv8 aarch32, ok for trunk?

2015-11-04  Jiong Wang  
Jim Wilson  

gcc/
  PR target/67305
  * config/arm/arm.c (neon_vector_mem_operand): Return FALSE if strict
  is true and eliminable registers are mentioned.


Ping ~

Re: [PATCH 4b/4] [ARM] PR63870 Remove error for invalid lane numbers

2015-11-11 Thread Charles Baylis

On 11 November 2015 at 11:22, Kyrill Tkachov  wrote:
> Hi Charles,
>
> On 08/11/15 00:26, charles.bay...@linaro.org wrote:
>>
>> From: Charles Baylis 
>>
>>   Charles Baylis  
>>
>> * config/arm/neon.md (neon_vld1_lane): Remove error for invalid lane number.
>> (neon_vst1_lane): Likewise.
>> (neon_vld2_lane): Likewise.
>> (neon_vst2_lane): Likewise.
>> (neon_vld3_lane): Likewise.
>> (neon_vst3_lane): Likewise.
>> (neon_vld4_lane): Likewise.
>> (neon_vst4_lane): Likewise.
>>

> In this pattern the 'max' variable is now unused, causing a bootstrap
> -Werror failure on arm.
> I'll test a patch to fix it unless you beat me to it...

Thanks for catching this.

I have a patch, and have started a bootstrap. Unless you have
objections, I'll apply as obvious once the bootstrap is complete later
this afternoon.

gcc/ChangeLog:

2015-11-11  Charles Baylis  

* config/arm/neon.md (neon_vld2_lane): Remove unused max
variable.
(neon_vst2_lane): Likewise.
(neon_vld3_lane): Likewise.
(neon_vst3_lane): Likewise.
(neon_vld4_lane): Likewise.
(neon_vst4_lane): Likewise.
From f111cb543bff0ad8756a0240f8bb1af1f19b Mon Sep 17 00:00:00 2001
From: Charles Baylis 
Date: Wed, 11 Nov 2015 11:59:44 +
Subject: [PATCH] [ARM] remove unused variable

gcc/ChangeLog:

2015-11-11  Charles Baylis  

* config/arm/neon.md (neon_vld2_lane): Remove unused max
	variable.
	(neon_vst2_lane): Likewise.
	(neon_vld3_lane): Likewise.
	(neon_vst3_lane): Likewise.
	(neon_vld4_lane): Likewise.
	(neon_vst4_lane): Likewise.

Change-Id: Ifed53e2d4c5a581770848cab65cf2e8d1d9039c3
---
 gcc/config/arm/neon.md | 6 --
 1 file changed, 6 deletions(-)

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 119550c..62fb6da 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -4464,7 +4464,6 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   HOST_WIDE_INT lane = NEON_ENDIAN_LANE_N(mode, INTVAL (operands[3]));
-  HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[0]);
   rtx ops[4];
   ops[0] = gen_rtx_REG (DImode, regno);
@@ -4579,7 +4578,6 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   HOST_WIDE_INT lane = NEON_ENDIAN_LANE_N(mode, INTVAL (operands[2]));
-  HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[1]);
   rtx ops[4];
   ops[0] = operands[0];
@@ -4723,7 +4721,6 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   HOST_WIDE_INT lane = NEON_ENDIAN_LANE_N (mode, INTVAL (operands[3]));
-  HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[0]);
   rtx ops[5];
   ops[0] = gen_rtx_REG (DImode, regno);
@@ -4895,7 +4892,6 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   HOST_WIDE_INT lane = NEON_ENDIAN_LANE_N(mode, INTVAL (operands[2]));
-  HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[1]);
   rtx ops[5];
   ops[0] = operands[0];
@@ -5045,7 +5041,6 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   HOST_WIDE_INT lane = NEON_ENDIAN_LANE_N(mode, INTVAL (operands[3]));
-  HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[0]);
   rtx ops[6];
   ops[0] = gen_rtx_REG (DImode, regno);
@@ -5225,7 +5220,6 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   HOST_WIDE_INT lane = NEON_ENDIAN_LANE_N(mode, INTVAL (operands[2]));
-  HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[1]);
   rtx ops[6];
   ops[0] = operands[0];
-- 
1.9.1

Re: [ptx] partitioning optimization

2015-11-11 Thread Bernd Schmidt


On 11/10/2015 11:33 PM, Nathan Sidwell wrote:

I've committed this patch to trunk.  It implements a partitioning
optimization for a loop partitioned over both vector and worker axes.
We can elide the inner vector partitioning state propagation, if there
are no intervening instructions in the worker-partitioned outer loop
other than the forking and joining.  We simply execute the worker
propagation on all vectors.


Patch LGTM, although I wonder if you really need the extra option rather 
than just optimize.



I've been unable to introduce a testcase for this. The difficulty is we
want to check an rtl dump from the acceleration compiler, and there
doesn't  appear to be existing machinery for that in the testsuite.
Perhaps something to be added later?


What's the difficulty exactly? Getting a dump should be possible with 
-foffload=-fdump-whatever, does the testsuite have a problem finding the 
right filename?



Bernd

Re: [patch] Fix PR target/67265

2015-11-11 Thread Bernd Schmidt


On 11/11/2015 12:38 PM, Eric Botcazou wrote:

this is an ICE on an asm statement requiring a lot of registers, when compiled
in 32-bit mode on x86/Linux with -O -fstack-check -fPIC:

pr67265.c:10:3: error: 'asm' operand has impossible constraints

The issue is that, since stack checking defines STACK_CHECK_MOVING_SP on this
platform, the frame pointer is necessary in order to be able to propagate
exceptions raised on stack overflow.  But this is required only in Ada so we
can certainly avoid doing it in C or C++.



/* We need the frame pointer to catch stack overflow exceptions
  if the stack pointer is moving.  */
-   || (flag_stack_check && STACK_CHECK_MOVING_SP)
+   || (STACK_CHECK_MOVING_SP
+  && flag_stack_check
+  && flag_exceptions
+  && cfun->can_throw_non_call_exceptions)


This piece of code alone doesn't tell me exactly why the frame pointer 
is needed. I was looking for an explicit use, but I now guess that if 
you have multiple adjusts of the frame pointer you can't easily undo 
them in the error case (the function behaves as-if using alloca). Is 
that it? And without exceptions I assume you just get a call to abort so 
it doesn't matter? If I understood all that right, then this is ok.


In i386.c I see a code block with a similar condition,

  /* If the only reason for frame_pointer_needed is that we conservatively
     assumed stack realignment might be needed, but in the end nothing that
     needed the stack alignment had been spilled, clear frame_pointer_needed
     and say we don't need stack realignment.  */

and the condition has

  && !(flag_stack_check && STACK_CHECK_MOVING_SP)

Should that be changed too?


Bernd

[gomp4] Merge trunk r230082 (2015-11-10) into gomp-4_0-branch

2015-11-11 Thread Thomas Schwinge

Hi!

Committed to gomp-4_0-branch in r230154:

commit 1fe1fa3a7b9d4286630cd286e0a52abe2d11e955
Merge: 02d9df1 76e711b
Author: tschwinge 
Date:   Wed Nov 11 11:43:09 2015 +

svn merge -r 230048:230082 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@230154 
138bc75d-0d04-0410-961f-82ee72b054a4


Grüße
 Thomas



Re: libgo patch committed: Update to Go 1.5 release

2015-11-11 Thread Rainer Orth

Ian Lance Taylor  writes:

> On Sun, Nov 8, 2015 at 9:21 AM, Rainer Orth  
> wrote:
>>
>> There were two remaining problems:
>>
>> * Before Solaris 12, sendfile only lives in libsendfile.  This led to
>>   link failures in gotools.
>>
>> * Solaris 12 introduced a couple more types that use _in6_addr_t, which
>>   are filtered out by mksysinfo.sh, leading to compilation failures.
>>
>> The following patch addresses both issues.  Solaris 10 and 11 bootstraps
>> have completed, a Solaris 12 bootstrap is still running make check.
>
> Thanks.  Committed to mainline.

Great, thanks.  The mksysinfo.sh part is also necessary on the gcc-5
branch.  Tested on i386-pc-solaris2.12 and sparc-sun-solaris2.12, ok to
install?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University
