[PATCH, committed] PR67826 gcc/fortran/openmp.c:1808: bad test ?

2015-11-11 Thread Dominique d'Humières
I have committed the following patch to trunk as revision r230148 (preapproved 
by Jakub Jelinek and tested on x86_64-apple-darwin14).

Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog   (revision 230147)
+++ gcc/fortran/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2015-11-11  Dominique d'Humieres 
+
+   PR fortran/67826
+   * openmp.c (gfc_omp_udr_find): Fix typo.
+
 2015-11-08  Steven g. Kargl  
 
PR fortran/68053
Index: gcc/fortran/openmp.c
===
--- gcc/fortran/openmp.c   (revision 230147)
+++ gcc/fortran/openmp.c   (working copy)
@@ -1820,7 +1820,7 @@
   for (omp_udr = st->n.omp_udr; omp_udr; omp_udr = omp_udr->next)
     if (omp_udr->ts.type == ts->type
         || ((omp_udr->ts.type == BT_DERIVED || omp_udr->ts.type == BT_CLASS)
-            && (ts->type == BT_DERIVED && ts->type == BT_CLASS)))
+            && (ts->type == BT_DERIVED || ts->type == BT_CLASS)))
       {
         if (omp_udr->ts.type == BT_DERIVED || omp_udr->ts.type == BT_CLASS)
           {

Dominique
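
For readers skimming the archive, the typo can be reproduced outside of GCC. The sketch below is only an illustration: the enum `bt` and the helper functions are hypothetical stand-ins, not code from openmp.c. It shows that the old condition cannot hold for any single value, while the corrected one accepts both derived and class types.

```cpp
#include <cassert>

// Hypothetical stand-in for GCC's basic type enumeration (illustration only).
enum bt { BT_INTEGER, BT_DERIVED, BT_CLASS };

// Old test: one value cannot equal two different enumerators at once,
// so this returns false for every input.
bool old_test(bt type) { return type == BT_DERIVED && type == BT_CLASS; }

// Fixed test: true for either a derived type or a class type.
bool new_test(bt type) { return type == BT_DERIVED || type == BT_CLASS; }

// Walks every enumerator and checks both predicates.
void check_tests() {
  for (int i = BT_INTEGER; i <= BT_CLASS; ++i) {
    bt type = static_cast<bt>(i);
    assert(!old_test(type));
    assert(new_test(type) == (type != BT_INTEGER));
  }
}
```

Calling `check_tests()` exercises all three enumerators; with the old predicate the derived/class branch of the surrounding condition was effectively dead code.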



Re: [PATCH, 6/16] Add pass_oacc_kernels

2015-11-11 Thread Richard Biener
On Mon, 9 Nov 2015, Tom de Vries wrote:

> On 09/11/15 16:35, Tom de Vries wrote:
> > Hi,
> > 
> > this patch series for stage1 trunk adds support to:
> > - parallelize oacc kernels regions using parloops, and
> > - map the loops onto the oacc gang dimension.
> > 
> > The patch series contains these patches:
> > 
> >   1Insert new exit block only when needed in
> >  transform_to_exit_first_loop_alt
> >   2Make create_parallel_loop return void
> >   3Ignore reduction clause on kernels directive
> >   4Implement -foffload-alias
> >   5Add in_oacc_kernels_region in struct loop
> >   6Add pass_oacc_kernels
> >   7Add pass_dominator_oacc_kernels
> >   8Add pass_ch_oacc_kernels
> >   9Add pass_parallelize_loops_oacc_kernels
> >  10Add pass_oacc_kernels pass group in passes.def
> >  11Update testcases after adding kernels pass group
> >  12Handle acc loop directive
> >  13Add c-c++-common/goacc/kernels-*.c
> >  14Add gfortran.dg/goacc/kernels-*.f95
> >  15Add libgomp.oacc-c-c++-common/kernels-*.c
> >  16Add libgomp.oacc-fortran/kernels-*.f95
> > 
> > The first 9 patches are more or less independent, but patches 10-16 are
> > intended to be committed at the same time.
> > 
> > Bootstrapped and reg-tested on x86_64.
> > 
> > Build and reg-tested with nvidia accelerator, in combination with a
> > patch that enables accelerator testing (which is submitted at
> > https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
> > 
> > I'll post the individual patches in reply to this message.
> 
> this patch adds a pass group pass_oacc_kernels (which will be added to the
> pass list as a whole in patch 10).

Just to understand (while also skimming the HSA patches).

You are basically relying on autopar for what the HSA patches call
"gridification"?  That is, OMP lowering produces loopy kernels
and autopar then will basically strip the outermost loop?

Richard.

> Atm, the parallelization behaviour for the kernels region is controlled by
> flag_tree_parallelize_loops, which is also used to control generic
> auto-parallelization by autopar using omp. That is not ideal, and we may want
> a separate flag (or param) to control the behaviour for oacc kernels, f.i.
> -foacc-kernels-gang-parallelize=. I'm open to suggestions.
> 
> The purpose of the pass group as a whole is to massage the offloaded function
> into a shape that parloops can deal with, and then run parloops on it.
> 
> Consider a testcase with a reduction, and a loop counter declared outside the
> offload region:
> ...
> unsigned int a[n];
> 
> unsigned int
> foo (void)
> {
>   int i;
>   unsigned int sum = 1;
> 
> #pragma acc kernels copyin (a[0:n]) copy (sum)
>   {
> for (i = 0; i < n; ++i)
>   sum += a[i];
>   }
> 
>   return sum;
> }
> ...
> 
> After ealias, the loop body looks like this:
> ...
>   :
>   _8 = *.omp_data_i_3(D).a;
>   _9 = *.omp_data_i_3(D).i;
>   _10 = *_9;
>   _11 = *_8[_10];
>   _12 = *.omp_data_i_3(D).sum;
>   sum.0_13 = *_12;
>   sum.1_14 = _11 + sum.0_13;
>   _15 = *.omp_data_i_3(D).sum;
>   *_15 = sum.1_14;
>   _17 = *.omp_data_i_3(D).i;
>   _18 = *_17;
>   _19 = *.omp_data_i_3(D).i;
>   _20 = _18 + 1;
>   *_19 = _20;
>   goto ;
> ...
> In other words, the iteration variable is in memory, as is the reduction
> variable, and the body contains lots of loop invariant loads.
> 
> At the end of the pass group, just before parloops, the body has been
> rewritten to have a local iteration variable and a local reduction variable,
> and all the loop invariant loads have been moved out of the loop:
> ...
>   :
>   # _27 = PHI <0(2), _20(5)>
>   # D__lsm.7_28 = PHI 
>   _11 = *_8[_27];
>   sum.1_14 = _11 + D__lsm.7_28;
>   _20 = _27 + 1;
>   if (_20 <= )
> goto ;
>   else
> goto ;
> ...
> 
> Thanks,
> - Tom
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)
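
The rewrite Tom describes can be pictured at the source level. This is a rough sketch and not the output of any pass: the struct `omp_data` and both functions are made-up names that mimic how the offloaded function receives `a`, `i` and `sum` through pointers.

```cpp
#include <cassert>

// Hypothetical layout of the data block handed to the offloaded function.
struct omp_data {
  const unsigned int *a;  // base of the array
  int *i;                 // loop counter living in memory
  unsigned int *sum;      // reduction variable living in memory
};

// Shape before the pass group: the counter and the reduction variable are
// loaded from and stored to memory on every iteration.
unsigned int sum_in_memory(omp_data *d, int n) {
  assert(d && d->a && d->i && d->sum);
  for (*d->i = 0; *d->i < n; ++*d->i)
    *d->sum += d->a[*d->i];
  return *d->sum;
}

// Shape after the pass group: invariant loads are hoisted, the counter and
// the reduction live in locals, and results are stored back once.
unsigned int sum_in_locals(omp_data *d, int n) {
  assert(d && d->a && d->i && d->sum);
  const unsigned int *a = d->a;
  unsigned int sum = *d->sum;
  int i;
  for (i = 0; i < n; ++i)
    sum += a[i];
  *d->sum = sum;
  *d->i = i;
  return sum;
}
```

Both functions compute the same value; only the second has the simple induction variable and reduction that a loop parallelizer can readily recognize.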


RE: [RFC][PATCH] Preferred rename register in regrename pass

2015-11-11 Thread Robert Suchanek
Hi,

> I guess this is ok to stop the failures for now, but you may want to
> move the check to the point where we set terminated_this_insn. Also, as
> I pointed out earlier, clearing terminated_this_insn should probably
> happen earlier.

Here is the updated patch that I'm about to commit once the bootstrap
finishes.

Regards,
Robert

gcc/
* regrename.c (scan_rtx_reg): Check the matching number of consecutive
registers when tying chains.
(build_def_use): Move terminated_this_insn earlier in the function.
---
 gcc/regrename.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/regrename.c b/gcc/regrename.c
index d727dd9..d41410a 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -1068,7 +1068,9 @@ scan_rtx_reg (rtx_insn *insn, rtx *loc, enum reg_class 
cl, enum scan_actions act
  && GET_CODE (pat) == SET
  && GET_CODE (SET_DEST (pat)) == REG
  && GET_CODE (SET_SRC (pat)) == REG
- && terminated_this_insn)
+ && terminated_this_insn
+ && terminated_this_insn->nregs
+== REG_NREGS (recog_data.operand[1]))
{
  gcc_assert (terminated_this_insn->regno
  == REGNO (recog_data.operand[1]));
@@ -1593,6 +1595,7 @@ build_def_use (basic_block bb)
  enum rtx_code set_code = SET;
  enum rtx_code clobber_code = CLOBBER;
  insn_rr_info *insn_info = NULL;
+ terminated_this_insn = NULL;
 
  /* Process the insn, determining its effect on the def-use
 chains and live hard registers.  We perform the following
@@ -1749,8 +1752,6 @@ build_def_use (basic_block bb)
  scan_rtx (insn, &XEXP (note, 0), ALL_REGS, mark_read,
OP_INOUT);
 
- terminated_this_insn = NULL;
-
  /* Step 4: Close chains for registers that die here, unless
 the register is mentioned in a REG_UNUSED note.  In that
 case we keep the chain open until step #7 below to ensure
-- 
2.4.


[patch] libstdc++/56158 Extend valid values of iostream bitmask types

2015-11-11 Thread Jonathan Wakely

As described in the PR, we have operator~ overloads defined for
enumeration types which produce values outside the range of valid
values for the type. In C++11 that can be trivially solved by giving
the enumeration types a fixed underlying type, but this code needs to
be valid in C++03 too.

This patch defines new min/max enumerators as INT_MIN/INT_MAX so that
every int value is also a valid value for the bitmask type.

Does anyone see any problems with this solution, or better solutions?

Any suggestions for how to test this, given that GCC's ubsan doesn't
check for this, and we can't run the testsuite with ubsan anyway?
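
As a small illustration of the idea (not the libstdc++ code itself), the sketch below uses a hypothetical enumeration `Flags` with made-up enumerator names, and `INT_MAX` from `<climits>` in place of the `__INT_MAX__` macro used in the patch. Without the two extra enumerators, the value produced by `operator~` would fall outside the range of values the C++03 enumeration can represent; with them, every `int` value is in range.

```cpp
#include <cassert>
#include <climits>

// Hypothetical bitmask enumeration in the style of the iostream flag types.
enum Flags {
  flag_a = 1 << 0,
  flag_b = 1 << 1,
  flags_end = 1 << 16,
  // Extra enumerators that stretch the range of valid values to all of int.
  flags_max = INT_MAX,
  flags_min = ~INT_MAX
};

inline Flags operator~(Flags f) { return Flags(~static_cast<int>(f)); }

inline Flags operator&(Flags a, Flags b) {
  return Flags(static_cast<int>(a) & static_cast<int>(b));
}

inline Flags operator|(Flags a, Flags b) {
  return Flags(static_cast<int>(a) | static_cast<int>(b));
}

// Clears flag_b; the intermediate value ~flag_b is negative.
inline Flags clear_b(Flags f) { return f & ~flag_b; }

// Sanity checks on the sketch.
inline void check_flags() {
  assert(static_cast<int>(flags_min) == INT_MIN);
  assert(static_cast<int>(~flag_b) < 0);
  assert(clear_b(flag_a | flag_b) == flag_a);
}
```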

commit ac2fc08a638c7f01ecb5b9e9b3d3a58caf031534
Author: Jonathan Wakely 
Date:   Wed Nov 11 09:25:38 2015 +

Extend valid values of iostream bitmask types

	PR libstdc++/56158
	* include/bits/ios_base.h (_Ios_Fmtflags, _Ios_Openmode, _Ios_Iostate):
	Define enumerators to ensure all values of type int are valid values
	of the enumeration type.
	* testsuite/27_io/ios_base/types/fmtflags/case_label.cc: Add new cases.
	* testsuite/27_io/ios_base/types/iostate/case_label.cc: Likewise.
	* testsuite/27_io/ios_base/types/openmode/case_label.cc: Likewise.

diff --git a/libstdc++-v3/include/bits/ios_base.h b/libstdc++-v3/include/bits/ios_base.h
index 44029ad..ba4ef92 100644
--- a/libstdc++-v3/include/bits/ios_base.h
+++ b/libstdc++-v3/include/bits/ios_base.h
@@ -74,7 +74,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _S_adjustfield 	= _S_left | _S_right | _S_internal,
   _S_basefield 	= _S_dec | _S_oct | _S_hex,
   _S_floatfield 	= _S_scientific | _S_fixed,
-  _S_ios_fmtflags_end = 1L << 16 
+  _S_ios_fmtflags_end = 1L << 16,
+  _S_ios_fmtflags_max = __INT_MAX__,
+  _S_ios_fmtflags_min = ~(int)__INT_MAX__
 };
 
   inline _GLIBCXX_CONSTEXPR _Ios_Fmtflags
@@ -114,7 +116,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _S_in 		= 1L << 3,
   _S_out 		= 1L << 4,
   _S_trunc 		= 1L << 5,
-  _S_ios_openmode_end = 1L << 16 
+  _S_ios_openmode_end = 1L << 16,
+  _S_ios_openmode_max = __INT_MAX__,
+  _S_ios_openmode_min = ~(int)__INT_MAX__
 };
 
   inline _GLIBCXX_CONSTEXPR _Ios_Openmode
@@ -152,7 +156,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _S_badbit 		= 1L << 0,
   _S_eofbit 		= 1L << 1,
   _S_failbit		= 1L << 2,
-  _S_ios_iostate_end = 1L << 16 
+  _S_ios_iostate_end = 1L << 16,
+  _S_ios_iostate_max = __INT_MAX__,
+  _S_ios_iostate_min = ~(int)__INT_MAX__
 };
 
   inline _GLIBCXX_CONSTEXPR _Ios_Iostate
diff --git a/libstdc++-v3/testsuite/27_io/ios_base/types/fmtflags/case_label.cc b/libstdc++-v3/testsuite/27_io/ios_base/types/fmtflags/case_label.cc
index 591e371..e8820c5 100644
--- a/libstdc++-v3/testsuite/27_io/ios_base/types/fmtflags/case_label.cc
+++ b/libstdc++-v3/testsuite/27_io/ios_base/types/fmtflags/case_label.cc
@@ -70,5 +70,9 @@ case_labels(bitmask_type b)
   break;
 case std::_S_ios_fmtflags_end:
   break;
+case std::_S_ios_fmtflags_min:
+  break;
+case std::_S_ios_fmtflags_max:
+  break;
 }
 }
diff --git a/libstdc++-v3/testsuite/27_io/ios_base/types/iostate/case_label.cc b/libstdc++-v3/testsuite/27_io/ios_base/types/iostate/case_label.cc
index 44fb44e..4e4e4f5 100644
--- a/libstdc++-v3/testsuite/27_io/ios_base/types/iostate/case_label.cc
+++ b/libstdc++-v3/testsuite/27_io/ios_base/types/iostate/case_label.cc
@@ -42,5 +42,9 @@ case_labels(bitmask_type b)
   break;
 case std::_S_ios_iostate_end:
   break;
+case std::_S_ios_iostate_min:
+  break;
+case std::_S_ios_iostate_max:
+  break;
 }
 }
diff --git a/libstdc++-v3/testsuite/27_io/ios_base/types/openmode/case_label.cc b/libstdc++-v3/testsuite/27_io/ios_base/types/openmode/case_label.cc
index 267f8a2..8c6672f6 100644
--- a/libstdc++-v3/testsuite/27_io/ios_base/types/openmode/case_label.cc
+++ b/libstdc++-v3/testsuite/27_io/ios_base/types/openmode/case_label.cc
@@ -46,5 +46,9 @@ case_labels(bitmask_type b)
   break;
 case std::_S_ios_openmode_end:
   break;
+case std::_S_ios_openmode_min:
+  break;
+case std::_S_ios_openmode_max:
+  break;
 }
 }


Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def

2015-11-11 Thread Richard Biener
On Mon, 9 Nov 2015, Tom de Vries wrote:

> 
> This patch adds the pass_oacc_kernels pass group to the pass list in
> passes.def.
> 
> Note the repetition of pass_lim/pass_copy_prop. The first pair is for an inner
> loop in a loop nest, the second for an outer loop in a loop nest.

@@ -86,6 +86,27 @@ along with GCC; see the file COPYING3.  If not see
  /* pass_build_ealias is a dummy pass that ensures that we
 execute TODO_rebuild_alias at this point.  */
  NEXT_PASS (pass_build_ealias);
+ /* Pass group that runs when there are oacc kernels in the
+function.  */
+ NEXT_PASS (pass_oacc_kernels);
+ PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
+ NEXT_PASS (pass_dominator_oacc_kernels);
+ NEXT_PASS (pass_ch_oacc_kernels);
+ NEXT_PASS (pass_dominator_oacc_kernels);
+ NEXT_PASS (pass_tree_loop_init);
+ NEXT_PASS (pass_lim);
+ NEXT_PASS (pass_copy_prop);
+ NEXT_PASS (pass_lim);
+ NEXT_PASS (pass_copy_prop);

iterate lim/copyprop twice?!  Why's that needed?

+ NEXT_PASS (pass_scev_cprop);

What's that for?  It's supposed to help removing loops - I don't
expect kernels to vanish.

+ NEXT_PASS (pass_tree_loop_done);
+ NEXT_PASS (pass_dominator_oacc_kernels);

Three times DOM?  No please.  I wonder why you don't run oacc_kernels
after FRE and drop the initial DOM(s).

+ NEXT_PASS (pass_dce);
+ NEXT_PASS (pass_tree_loop_init);
+ NEXT_PASS (pass_parallelize_loops_oacc_kernels);
+ NEXT_PASS (pass_expand_omp_ssa);
+ NEXT_PASS (pass_tree_loop_done);

The switches into/out of tree_loop also look odd to me, but well
(they'll be controlled by -ftree-loop-optimize).

+ POP_INSERT_PASSES ()

Please get some more sense into this pass pipeline.

Richard.


> Thanks,
> - Tom
> 
> 



[PATCH] Fix PR rtl-optimization/68287

2015-11-11 Thread Martin Liška
Hi.

Here's a fix for the fallout of r230027.

The patch bootstraps and survives regression tests on x86_64-linux-gnu.

Ready for trunk?
Thanks,
Martin
From 127d629991d92ea42a87b84e9d88612b84dbec03 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 11 Nov 2015 10:11:20 +0100
Subject: [PATCH 1/2] Fix PR rtl-optimization/68287

gcc/ChangeLog:

2015-11-11  Martin Liska  

	PR rtl-optimization/68287
	* lra-lives.c (lra_create_live_ranges_1): Clear the vector
	with zeros.
---
 gcc/lra-lives.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
index 9453759..27887de 100644
--- a/gcc/lra-lives.c
+++ b/gcc/lra-lives.c
@@ -1242,7 +1242,7 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
   curr_point = 0;
   unsigned new_length = get_max_uid () * 2;
   if (point_freq_vec.length () < new_length)
-point_freq_vec.safe_grow (new_length);
+point_freq_vec.safe_grow_cleared (new_length);
   lra_point_freq = point_freq_vec.address ();
   int *post_order_rev_cfg = XNEWVEC (int, last_basic_block_for_fn (cfun));
   int n_blocks_inverted = inverted_post_order_compute (post_order_rev_cfg);
-- 
2.6.2
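
The difference between the two calls is whether the newly added tail of the vector is zeroed. The following is a minimal sketch of that distinction using a hypothetical `int_buffer` type and helper names; it is not GCC's `vec<>` implementation, only an analogy for why reading freshly grown, uncleared elements is a problem.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical minimal growable int buffer (not GCC's vec<>).
struct int_buffer {
  int *data;
  unsigned length;
};

// Grows the buffer; elements past the old length are left uninitialized.
bool grow(int_buffer &buf, unsigned new_length) {
  assert(new_length > 0);
  int *p = static_cast<int *>(std::realloc(buf.data, new_length * sizeof(int)));
  if (!p)
    return false;
  buf.data = p;
  buf.length = new_length;
  return true;
}

// Grows the buffer and zero-fills only the newly added tail.
bool grow_cleared(int_buffer &buf, unsigned new_length) {
  unsigned old_length = buf.length;
  if (!grow(buf, new_length))
    return false;
  if (new_length > old_length)
    std::memset(buf.data + old_length, 0,
                (new_length - old_length) * sizeof(int));
  return true;
}
```

Code that accumulates into the new elements (as a frequency table would) needs the cleared variant, since the plain one leaves whatever the allocator returned.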



Re: Short-cut generation of simple built-in functions

2015-11-11 Thread Richard Biener
On Tue, Nov 10, 2015 at 10:24 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Sat, Nov 7, 2015 at 2:31 PM, Richard Sandiford
>>  wrote:
>>> This patch short-circuits the builtins.c expansion code for a particular
>>> gimple call if:
>>>
>>> - the function has an associated internal function
>>> - the target implements that internal function
>>> - the call has no side effects
>>>
>>> This allows a later patch to remove the builtins.c code, once calls with
>>> side effects have been handled.
>>>
>>> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
>>> OK to install?
>>>
>>> Thanks,
>>> Richard
>>>
>>>
>>> gcc/
>>> * builtins.h (called_as_built_in): Declare.
>>> * builtins.c (called_as_built_in): Make external.
>>> * internal-fn.h (expand_internal_call): Define a variant that
>>> specifies the internal function explicitly.
>>> * internal-fn.c (expand_load_lanes_optab_fn)
>>> (expand_store_lanes_optab_fn, expand_ANNOTATE, 
>>> expand_GOMP_SIMD_LANE)
>>> (expand_GOMP_SIMD_VF, expand_GOMP_SIMD_LAST_LANE)
>>> (expand_GOMP_SIMD_ORDERED_START, expand_GOMP_SIMD_ORDERED_END)
>>> (expand_UBSAN_NULL, expand_UBSAN_BOUNDS, expand_UBSAN_VPTR)
>>> (expand_UBSAN_OBJECT_SIZE, expand_ASAN_CHECK, expand_TSAN_FUNC_EXIT)
>>> (expand_UBSAN_CHECK_ADD, expand_UBSAN_CHECK_SUB)
>>> (expand_UBSAN_CHECK_MUL, expand_ADD_OVERFLOW, expand_SUB_OVERFLOW)
>>> (expand_MUL_OVERFLOW, expand_LOOP_VECTORIZED)
>>> (expand_mask_load_optab_fn, expand_mask_store_optab_fn)
>>> (expand_ABNORMAL_DISPATCHER, expand_BUILTIN_EXPECT, expand_VA_ARG)
>>> (expand_UNIQUE, expand_GOACC_DIM_SIZE, expand_GOACC_DIM_POS)
>>> (expand_GOACC_LOOP, expand_GOACC_REDUCTION, expand_direct_optab_fn)
>>> (expand_unary_optab_fn, expand_binary_optab_fn): Add an internal_fn
>>> argument.
>>> (internal_fn_expanders): Update prototype.
>>> (expand_internal_call): Define a variant that specifies the
>>> internal function explicitly. Use it to implement the previous
>>> interface.
>>> * cfgexpand.c (expand_call_stmt): Try to expand calls to built-in
>>> functions as calls to internal functions.
>>>
>>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>>> index f65011e..bbcc7dc3 100644
>>> --- a/gcc/builtins.c
>>> +++ b/gcc/builtins.c
>>> @@ -222,7 +222,7 @@ is_builtin_fn (tree decl)
>>> of the optimization level.  This means whenever a function is invoked 
>>> with
>>> its "internal" name, which normally contains the prefix "__builtin".  */
>>>
>>> -static bool
>>> +bool
>>>  called_as_built_in (tree node)
>>>  {
>>>/* Note that we must use DECL_NAME, not DECL_ASSEMBLER_NAME_SET_P since
>>> diff --git a/gcc/builtins.h b/gcc/builtins.h
>>> index 917eb90..1d00068 100644
>>> --- a/gcc/builtins.h
>>> +++ b/gcc/builtins.h
>>> @@ -50,6 +50,7 @@ extern struct target_builtins *this_target_builtins;
>>>  extern bool force_folding_builtin_constant_p;
>>>
>>>  extern bool is_builtin_fn (tree);
>>> +extern bool called_as_built_in (tree);
>>>  extern bool get_object_alignment_1 (tree, unsigned int *,
>>> unsigned HOST_WIDE_INT *);
>>>  extern unsigned int get_object_alignment (tree);
>>> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
>>> index bfbc958..dc7d4f5 100644
>>> --- a/gcc/cfgexpand.c
>>> +++ b/gcc/cfgexpand.c
>>> @@ -2551,10 +2551,25 @@ expand_call_stmt (gcall *stmt)
>>>return;
>>>  }
>>>
>>> +  /* If this is a call to a built-in function and it has no effect other
>>> + than setting the lhs, try to implement it using an internal function
>>> + instead.  */
>>> +  decl = gimple_call_fndecl (stmt);
>>> +  if (gimple_call_lhs (stmt)
>>> +  && !gimple_vdef (stmt)
>>
>> I think you want && ! gimple_has_side_effects (stmt)
>> instead of checking !gimple_vdef (stmt).
>
> OK, I can do that, but what would the difference be in practice for
> these types of call?  I.e. are there cases for built-ins where:
>
>   (A) gimple_vdef (stmt) && !gimple_has_side_effects (stmt)
>
> or:
>
>   (B) !gimple_vdef (stmt) && gimple_has_side_effects (stmt)
>
> ?

There was talk of making calls use volatile to prevent CSE and friends.

Using gimple_has_side_effects is just the better check.

> It just seems like this check should be the opposite of the one used
> in the call-cdce patch (when deciding whether to optimise a call
> with an lhs).  In order to keep them in sync I'd need to use
> gimple_has_side_effects rather than gimple_vdef there too, but is
> (B) a possibility there?

Not sure if the tests should be in-sync.

I'm also not sure what you really want to check with

>>> +  /* If this is a call to a built-in function and it has no effect other
>>> + than setting the lhs, try to implement it using an internal function
>>> + instead. 

Re: [PATCH, 4/16] Implement -foffload-alias

2015-11-11 Thread Richard Biener
On Mon, 9 Nov 2015, Tom de Vries wrote:

> 
> this patch addresses the problem that once the offloading region has been
> split off from the original function, alias analysis can no longer use
> information available in the original function that would allow it to do a
> more precise analysis for the offloading function. [ At some point we could
> use fipa-pta for that, as discussed in PR46032, but that's not feasible now. ]
> 
> The basic idea behind the patch is that for typical usage, the base pointers
> used in an offloaded region are non-aliasing. The patch works by adding
> restrict to the types of the fields used to pass data to an offloading region.
> 
> 
> The patch implements a new option
> -foffload-alias=.
> 
> The option -foffload-alias=none instructs the compiler to assume that
> object references and pointer dereferences in an offload region do not
> alias.
> 
> The option -foffload-alias=pointer instructs the compiler to assume that
> object references in an offload region do not alias.
> 
> The option -foffload-alias=all instructs the compiler to make no
> assumptions about aliasing in offload regions.
> 
> The default value is -foffload-alias=none.
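
To make the "restrict on the fields" idea concrete, here is a hedged sketch. The struct `offload_data` and the function are hypothetical and do not reflect the record GCC actually generates; `__restrict` is a compiler extension in C++ (accepted by GCC and Clang), used here as the C++ spelling of C's `restrict`.

```cpp
#include <cassert>

// Hypothetical record used to pass data into an offloaded region.
// Marking the pointer fields restrict promises they do not alias.
struct offload_data {
  int *__restrict dst;
  const int *__restrict src;
  int n;
};

// Stand-in for the offloaded function body: because dst and src are
// promised not to overlap, iterations are independent of each other.
void offloaded_add(offload_data *d) {
  assert(d->dst != d->src);
  for (int i = 0; i < d->n; ++i)
    d->dst[i] += d->src[i];
}
```

The promise is only as good as the caller's data: passing overlapping arrays through such fields would make the behaviour undefined, which is part of why the default value of the option is contentious.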

I think a global option for this is nonsense.  Please follow what
we do for #pragma GCC ivdep for example, thus allow the alias
behavior to be specified per "region" (whatever makes sense here
in the context of offloading).

Thanks,
Richard.
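
For reference, this is roughly what the per-loop mechanism mentioned above looks like. The function name and arrays are made up for illustration; `#pragma GCC ivdep` is the existing GCC pragma that asserts the loop that follows has no loop-carried dependencies that would prevent vectorization.

```cpp
#include <cassert>

// The pragma applies only to the loop that immediately follows it, so the
// no-dependence assertion is scoped to this one loop rather than being a
// whole-translation-unit switch.
void scale_add(int *dst, const int *src, int n) {
  assert(n >= 0);
#pragma GCC ivdep
  for (int i = 0; i < n; ++i)
    dst[i] += 2 * src[i];
}
```

Compilers that do not know the pragma typically ignore it (possibly with a warning), so the function stays correct either way as long as `dst` and `src` really do not overlap.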

> Thanks,
> - Tom
> 
> 



Re: [PATCH, 2/16] Make create_parallel_loop return void

2015-11-11 Thread Richard Biener
On Mon, 9 Nov 2015, Tom de Vries wrote:

> 
> this patch makes create_parallel_loop return void.  The result is currently
> unused.

Ok.

Richard.

> Thanks,
> - Tom
> 
> 



Re: [PATCH, 4/16] Implement -foffload-alias

2015-11-11 Thread Jakub Jelinek
On Wed, Nov 11, 2015 at 11:51:02AM +0100, Richard Biener wrote:
> > The option -foffload-alias=pointer instructs the compiler to assume that
> > object references in an offload region do not alias.
> > 
> > The option -foffload-alias=all instructs the compiler to make no
> > assumptions about aliasing in offload regions.
> > 
> > The default value is -foffload-alias=none.
> 
> > I think a global option for this is nonsense.  Please follow what
> we do for #pragma GCC ivdep for example, thus allow the alias
> behavior to be specified per "region" (whatever makes sense here
> in the context of offloading).

Yeah, completely agreed.  I don't see why the offloaded region would be in
any way special, they are C/C++/Fortran code as any other.
What we can and should improve is to teach IPA aliasing/points-to analysis
about the way we lower the host vs. offloading region boundary, so that
if alias analysis on the caller of GOMP_target_ext/GOACC_parallel_keyed
determines something it can be used on the offloaded function side and vice
versa, but a switch like the above is just wrong.

Jakub


Re: [Patch] PR tree-optimization/68234 Improve range info for loop Phi node

2015-11-11 Thread Richard Biener
On Wed, 11 Nov 2015, Jiong Wang wrote:

> As discussed at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68234, this
> patch doesn't touch the existing code logic in vrp_visit_phi_node; it
> only extends the SCEV check to those loop PHI nodes that are VR_VARYING.
> 
> Previously, we only did this check if the PHI node had valid range
> info but later dropped either side to infinity. Missing those PHI
> nodes with an initial estimate of VR_VARYING meant missing some
> further optimization opportunities. For example, in the testcase included
> in this patch, with improved range info, we can efficiently turn the signed
> divide into a right shift.
> 
> This patch passes x86-64 and AArch64 bootstrap, with no regressions on either.
> Meanwhile, simple benchmarking shows there are quite a few new VR_RANGEs
> found after this patch during gcc bootstrapping. There is no performance
> regression on spec2006 int on aarch64.
> 
> During gcc bootstrapping, on x86-64 there are 4828 new VR_VARYING -> VR_RANGE
> found by vrp1, and 5008 new by vrp2.
> 
> While on AArch64 there are 44756 new by vrp1, and 6047 new by vrp2.
> 
> OK for trunk?

Ok.

Thanks,
Richard.

> 2015-11-11  Richard Biener  
> Jiong Wang  
> gcc/
>   PR tree-optimization/68234
>   * tree-vrp.c (vrp_visit_phi_node): Extend SCEV check to those loop PHI
>   nodes which are estimated to be VR_VARYING initially.
> 
> gcc/testsuite/
>   * gcc.dg/tree-ssa/pr68234.c: New testcase.
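
The divide-to-shift opportunity mentioned above depends on knowing the operand is non-negative. The sketch below is not the pr68234.c testcase (which is not shown in this message); the function names are invented. It only illustrates that, for a loop counter that starts at 0 and increases, a signed division by a power of two and an arithmetic right shift give the same result.

```cpp
#include <cassert>

// The counter i starts at 0 and only increases, so inside the body its
// range is [0, n - 1]; i / 4 therefore never has to round toward zero
// from a negative value.
int sum_quarters_div(int n) {
  int total = 0;
  for (int i = 0; i < n; ++i)
    total += i / 4;
  return total;
}

// The same loop written with the shift a compiler may use once it has
// proven that i is non-negative.
int sum_quarters_shift(int n) {
  int total = 0;
  for (int i = 0; i < n; ++i)
    total += i >> 2;
  return total;
}

// Compares the two forms over a range of trip counts.
void check_quarters() {
  for (int n = 0; n <= 64; ++n)
    assert(sum_quarters_div(n) == sum_quarters_shift(n));
}
```

Without range information the compiler has to allow for negative operands, where signed division truncates toward zero (for example `-1 / 4` is 0) and a plain shift would not match, so extra fix-up instructions are needed.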


Re: [PATCH, 7/16] Add pass_dominator_oacc_kernels

2015-11-11 Thread Richard Biener
On Mon, 9 Nov 2015, Tom de Vries wrote:

> 
> this patch adds pass_dominator_oacc_kernels (which we may as well call
> pass_dominator_no_peel_loop_headers, since it doesn't do anything
> oacc-kernels-specific), to be used in the kernels pass group.
> 
> The reason I'm adding a new pass instead of using pass_dominator is that
> pass_dominator uses first_pass_instance. So adding a pass_dominator instance A
> before a pass_dominator instance B has the unexpected consequence that it may
> change the behaviour of instance B. I've filed PR68247 - "Remove
> pass_first_instance" to note this issue.

This looks ok (minus my comments to patch #10)

Richard.
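
The first_pass_instance concern can be shown with a toy model. Everything in the sketch is hypothetical (the function name and the pipeline of strings are not GCC's pass manager); it only demonstrates that when behaviour is keyed on "am I the first instance of this pass", inserting another instance earlier in the pipeline silently changes the flag seen by the existing ones.

```cpp
#include <cassert>
#include <string>
#include <vector>

// For each occurrence of pass_name in the pipeline, records whether that
// occurrence is the first instance of the pass.
std::vector<bool> first_instance_flags(const std::vector<std::string> &pipeline,
                                       const std::string &pass_name) {
  std::vector<bool> flags;
  bool seen = false;
  for (const std::string &pass : pipeline) {
    if (pass == pass_name) {
      flags.push_back(!seen);
      seen = true;
    }
  }
  return flags;
}
```

With a pipeline of `dom, vrp, dom` the flags are `true, false`; after prepending another `dom` they become `true, false, false`, so the instance that used to be "first" now takes the other code path.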

> Thanks,
> - Tom
> 
> 



[patch] Fix PR target/67265

2015-11-11 Thread Eric Botcazou
Hi,

this is an ICE on an asm statement requiring a lot of registers, when compiled 
in 32-bit mode on x86/Linux with -O -fstack-check -fPIC:

pr67265.c:10:3: error: 'asm' operand has impossible constraints

The issue is that, since stack checking defines STACK_CHECK_MOVING_SP on this 
platform, the frame pointer is necessary in order to be able to propagate 
exceptions raised on stack overflow.  But this is required only in Ada so we 
can certainly avoid doing it in C or C++.

Tested on x86_64-suse-linux, OK for all active branches ? (that's a regression 
wrt the old stack checking implementation)


2015-11-11  Eric Botcazou  

PR target/67265
* ira.c (ira_setup_eliminable_regset): Do not necessarily create the
frame pointer for stack checking if non-call exceptions aren't used.


2015-11-11  Eric Botcazou  

* gcc.target/i386/pr67265.c: New test.

-- 
Eric Botcazou

Index: ira.c
===
--- ira.c	(revision 230146)
+++ ira.c	(working copy)
@@ -2261,7 +2261,10 @@ ira_setup_eliminable_regset (void)
|| (cfun->calls_alloca && EXIT_IGNORE_STACK)
/* We need the frame pointer to catch stack overflow exceptions
 	  if the stack pointer is moving.  */
-   || (flag_stack_check && STACK_CHECK_MOVING_SP)
+   || (STACK_CHECK_MOVING_SP
+	   && flag_stack_check
+	   && flag_exceptions
+	   && cfun->can_throw_non_call_exceptions)
|| crtl->accesses_prior_frames
|| (SUPPORTS_STACK_ALIGNMENT && crtl->stack_realign_needed)
/* We need a frame pointer for all Cilk Plus functions that use

/* PR target/67265 */
/* Reduced testcase by Johannes Dewender  */

/* { dg-do compile } */
/* { dg-options "-O -fstack-check -fPIC" } */

int a, b, c, d, e;

void foo (void)
{
  __asm__("" : "+r"(c), "+r"(e), "+r"(d), "+r"(a) : ""(b), "mg"(foo), "mm"(c));
}


Re: Enable pointer TBAA for LTO

2015-11-11 Thread Richard Biener
On Tue, 10 Nov 2015, Jan Hubicka wrote:

> > > Index: tree.c
> > > ===
> > > --- tree.c(revision 229968)
> > > +++ tree.c(working copy)
> > > @@ -13198,6 +13198,7 @@ gimple_canonical_types_compatible_p (con
> > >/* If the types have been previously registered and found equal
> > >   they still are.  */
> > >if (TYPE_CANONICAL (t1) && TYPE_CANONICAL (t2)
> > > +  && !POINTER_TYPE_P (t1) && !POINTER_TYPE_P (t2)
> > 
> > But TYPE_CANONICAL (t1) should be NULL_TREE for POINTER_TYPE_P?
> 
> The reason is that TYPE_CANONICAL is initialized in get_alias_set that may be
> called before we finish all merging and then it is more fine grained than what
> we need here (i.e. TYPE_CANONICAL of pointers to two different types will be
> different, but here we want them to be equal so we can match:
> 
> struct aa { void *ptr;};
> struct bb { int * ptr;};
> 
> Which is actually required for Fortran interoperability.
> 
> Removing this hunk triggers false type incompatibility warning in one of the
> interoperability testcases I added.

Ok, I see.

> Even if I drop the code below setting TYPE_CANONICAL, I think I need to keep
> this conditional: the types may be built in and those get TYPE_CANONICAL set
> as they are constructed by build_pointer_type.  I can gcc_checking_assert for
> this scenario and see.  Perhaps we never build LTO type from builtin type and
> this won't happen. If we did, we would probably have trouble with false
> negatives in return TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2); on
> non-pointers anyway.

Hmm, indeed.  The various type builders might end up setting 
TYPE_CANONICAL if you ever run into a pre-defined pointer type
(ptr_type_node for example).

> > 
> > >&& trust_type_canonical)
> > >  return TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2);
> > >  
> > > Index: alias.c
> > > ===
> > > --- alias.c   (revision 229968)
> > > +++ alias.c   (working copy)
> > > @@ -869,13 +874,19 @@ get_alias_set (tree t)
> > >set = lang_hooks.get_alias_set (t);
> > >if (set != -1)
> > >   return set;
> > > -  return 0;
> > > +  /* LTO frontend does not assign canonical types to pointers (which we
> > > +  ignore anyway) and we compute them.  The following path may be
> > > +  probably enabled for non-LTO, too, and it may improve TBAA for
> > > +  pointers to types with structural equality.  */
> > > +  if (!in_lto_p || !POINTER_TYPE_P (t))
> > > +return 0;
> > 
> > No new LTO paths please, do the suggested change immediately.
> 
> OK, I originally tested the patch without if and there was no problems.
> Just chickened out before preparing final version of the patch.
> > > +   p = TYPE_MAIN_VARIANT (p);
> > > +   /* Normally all pointer types are built by
> > > +  build_pointer_type_for_mode which ensures they have canonical
> > > +  type unless they point to type with structural equality.
> > > +  LTO frontend produce pointer types without TYPE_CANONICAL
> > > +  that are then added to TYPE_POINTER_TO lists and 
> > > +  build_pointer_type_for_mode will end up picking one for us.
> > > +  Declare it the canonical one.  This is the same as
> > > +  build_pointer_type_for_mode would do. */
> > > +   if (!TYPE_CANONICAL (p))
> > > + {
> > > +   TYPE_CANONICAL (p) = p;
> > > +   gcc_checking_assert (in_lto_p);
> > > + }
> > > +   else
> > > + gcc_checking_assert (p == TYPE_CANONICAL (p));
> > 
> > The assert can trigger as
> > build_pointer_type_for_mode builds SET_TYPE_STRUCTURAL_EQUALITY pointer
> > types for SET_TYPE_STRUCTURAL_EQUALITY pointed-to types.  Ah,
> > looking up more context reveals
> > 
> >   if (TREE_CODE (p) == VOID_TYPE || TYPE_STRUCTURAL_EQUALITY_P (p))
> > set = get_alias_set (ptr_type_node);
> 
> Yep, we don't get here.
> > 
> > Not sure why you adjust TYPE_CANONICAL here at all either.
> 
> You are right, I may probably just drop all the code and just do:
> gcc_checking_assert (!TYPE_CANONICAL (p) || p == TYPE_CANONICAL (p));
> I will test this and re-think the build_pointer_type code to be sure that we
> won't get into a problem there.
> 
> As I recall, the original code
>   p = TYPE_CANONICAL (p);
> was there to permit frontends to glob two pointers by setting same canonical
> type to them.

Yes.

>  My original plan was to use this for LTO frontend and make
> gimple_compare_canonical_types to do the right thing for pointers and this
> would follow gimple_compare_canonical_types globbing then.
> 
> This idea was wrong: since pointer rules are not transitive (i.e. void
> * alias them all), we can't model that by an equivalence produced by
> gimple_compare_canonical_types.
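
The non-transitivity argument above can be checked with a toy model. This is only a sketch under simplifying assumptions, not GCC's alias machinery: PointeeKind, may_alias and transitive_on are hypothetical names, and the only rule modelled is "void * aliases every pointer, otherwise pointee types must match".

```cpp
// Toy model of the pointer aliasing rule discussed above; not GCC code.
// PointeeKind, may_alias and transitive_on are hypothetical names.
enum class PointeeKind { Void, Int, Float };

// "void *" may alias any pointer; otherwise the pointee kinds must match.
bool may_alias(PointeeKind a, PointeeKind b) {
  return a == PointeeKind::Void || b == PointeeKind::Void || a == b;
}

// An equivalence relation must be transitive: if a~b and b~c then a~c.
// Returns false when the triple (a, b, c) violates that requirement.
bool transitive_on(PointeeKind a, PointeeKind b, PointeeKind c) {
  return !(may_alias(a, b) && may_alias(b, c)) || may_alias(a, c);
}
```

In this model int * aliases void * and void * aliases float *, yet int * does not alias float *, so no assignment of a single canonical type per pointer can reproduce the relation.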
> 
> Since the assert does not trigger, seems no frontend is doing 

Re: [PATCH, 1/16] Insert new exit block only when needed in transform_to_exit_first_loop_alt

2015-11-11 Thread Richard Biener
On Mon, 9 Nov 2015, Tom de Vries wrote:

> On 09/11/15 16:35, Tom de Vries wrote:
> > Hi,
> > 
> > this patch series for stage1 trunk adds support to:
> > - parallelize oacc kernels regions using parloops, and
> > - map the loops onto the oacc gang dimension.
> > 
> > The patch series contains these patches:
> > 
> >   1Insert new exit block only when needed in
> >  transform_to_exit_first_loop_alt
> >   2Make create_parallel_loop return void
> >   3Ignore reduction clause on kernels directive
> >   4Implement -foffload-alias
> >   5Add in_oacc_kernels_region in struct loop
> >   6Add pass_oacc_kernels
> >   7Add pass_dominator_oacc_kernels
> >   8Add pass_ch_oacc_kernels
> >   9Add pass_parallelize_loops_oacc_kernels
> >  10Add pass_oacc_kernels pass group in passes.def
> >  11Update testcases after adding kernels pass group
> >  12Handle acc loop directive
> >  13Add c-c++-common/goacc/kernels-*.c
> >  14Add gfortran.dg/goacc/kernels-*.f95
> >  15Add libgomp.oacc-c-c++-common/kernels-*.c
> >  16Add libgomp.oacc-fortran/kernels-*.f95
> > 
> > The first 9 patches are more or less independent, but patches 10-16 are
> > intended to be committed at the same time.
> > 
> > Bootstrapped and reg-tested on x86_64.
> > 
> > Build and reg-tested with nvidia accelerator, in combination with a
> > patch that enables accelerator testing (which is submitted at
> > https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
> > 
> > I'll post the individual patches in reply to this message.
> > 
> 
> In transform_to_exit_first_loop_alt we insert a new exit block  in between the
> new loop header and the old exit block. Currently, we also do this if this is
> not necessary.
> 
> This patch figures out when we need to insert a new exit block, and only then
> inserts it.

Ok.

Richard.

> Thanks,
> - Tom
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH 4b/4] [ARM] PR63870 Remove error for invalid lane numbers

2015-11-11 Thread Kyrill Tkachov

Hi Charles,

On 08/11/15 00:26, charles.bay...@linaro.org wrote:

From: Charles Baylis 

  Charles Baylis  

* config/arm/neon.md (neon_vld1_lane): Remove error for invalid
lane number.
(neon_vst1_lane): Likewise.
(neon_vld2_lane): Likewise.
(neon_vst2_lane): Likewise.
(neon_vld3_lane): Likewise.
(neon_vst3_lane): Likewise.
(neon_vld4_lane): Likewise.
(neon_vst4_lane): Likewise.

Change-Id: Id7b4b6fa7320157e62e5bae574b4c4688d921774
---
  gcc/config/arm/neon.md | 48 
  1 file changed, 8 insertions(+), 40 deletions(-)

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index e8db020..6574e6e 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -4264,8 +4264,6 @@ if (BYTES_BIG_ENDIAN)
HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3]));
HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
operands[3] = GEN_INT (lane);
-  if (lane < 0 || lane >= max)
-error ("lane out of range");
if (max == 1)
  return "vld1.\t%P0, %A1";
else
@@ -4286,9 +4284,7 @@ if (BYTES_BIG_ENDIAN)
HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
operands[3] = GEN_INT (lane);
int regno = REGNO (operands[0]);
-  if (lane < 0 || lane >= max)
-error ("lane out of range");
-  else if (lane >= max / 2)
+  if (lane >= max / 2)
  {
lane -= max / 2;
regno += 2;
@@ -4372,8 +4368,6 @@ if (BYTES_BIG_ENDIAN)
HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2]));
HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
operands[2] = GEN_INT (lane);
-  if (lane < 0 || lane >= max)
-error ("lane out of range");
if (max == 1)
  return "vst1.\t{%P1}, %A0";
else
@@ -4393,9 +4387,7 @@ if (BYTES_BIG_ENDIAN)
HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2]));
HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
int regno = REGNO (operands[1]);
-  if (lane < 0 || lane >= max)
-error ("lane out of range");
-  else if (lane >= max / 2)
+  if (lane >= max / 2)
  {
lane -= max / 2;
regno += 2;
@@ -4464,8 +4456,6 @@ if (BYTES_BIG_ENDIAN)
HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
int regno = REGNO (operands[0]);
rtx ops[4];
-  if (lane < 0 || lane >= max)
-error ("lane out of range");


In this pattern the 'max' variable is now unused, causing a bootstrap -Werror 
failure on arm.
I'll test a patch to fix it unless you beat me to it...

Thanks,
Kyrill


ops[0] = gen_rtx_REG (DImode, regno);
ops[1] = gen_rtx_REG (DImode, regno + 2);
ops[2] = operands[1];
@@ -4489,9 +4479,7 @@ if (BYTES_BIG_ENDIAN)
HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
int regno = REGNO (operands[0]);
rtx ops[4];
-  if (lane < 0 || lane >= max)
-error ("lane out of range");
-  else if (lane >= max / 2)
+  if (lane >= max / 2)
  {
lane -= max / 2;
regno += 2;
@@ -4579,8 +4567,6 @@ if (BYTES_BIG_ENDIAN)
HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
int regno = REGNO (operands[1]);
rtx ops[4];
-  if (lane < 0 || lane >= max)
-error ("lane out of range");
ops[0] = operands[0];
ops[1] = gen_rtx_REG (DImode, regno);
ops[2] = gen_rtx_REG (DImode, regno + 2);
@@ -4604,9 +4590,7 @@ if (BYTES_BIG_ENDIAN)
HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
int regno = REGNO (operands[1]);
rtx ops[4];
-  if (lane < 0 || lane >= max)
-error ("lane out of range");
-  else if (lane >= max / 2)
+  if (lane >= max / 2)
  {
lane -= max / 2;
regno += 2;
@@ -4723,8 +4707,6 @@ if (BYTES_BIG_ENDIAN)
HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
int regno = REGNO (operands[0]);
rtx ops[5];
-  if (lane < 0 || lane >= max)
-error ("lane out of range");
ops[0] = gen_rtx_REG (DImode, regno);
ops[1] = gen_rtx_REG (DImode, regno + 2);
ops[2] = gen_rtx_REG (DImode, regno + 4);
@@ -4750,9 +4732,7 @@ if (BYTES_BIG_ENDIAN)
HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
int regno = REGNO (operands[0]);
rtx ops[5];
-  if (lane < 0 || lane >= max)
-error ("lane out of range");
-  else if (lane >= max / 2)
+  if (lane >= max / 2)
  {
lane -= max / 2;
regno += 2;
@@ -4895,8 +4875,6 @@ if (BYTES_BIG_ENDIAN)
HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
int regno = REGNO (operands[1]);
rtx ops[5];
-  if (lane < 0 || lane >= max)
-error ("lane out of range");
ops[0] = operands[0];
ops[1] = gen_rtx_REG (DImode, regno);
ops[2] = gen_rtx_REG (DImode, regno + 2);
@@ -4922,9 +4900,7 @@ if (BYTES_BIG_ENDIAN)
HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
int regno = REGNO (operands[1]);
rtx ops[5];
-  if (lane < 0 || lane >= max)
-error ("lane out of range");
-  else if (lane >= max / 2)
+  if (lane >= max / 2)
  {
lane -= max / 2;
regno += 2;
@@ -5045,8 +5021,6 @@ if (BYTES_BIG_ENDIAN)

[patch] libstdc++/64651 allow rethrow_exception to be found by ADL

2015-11-11 Thread Jonathan Wakely

As I wrote in the PR, the standard doesn't require that
std::rethrow_exception can be found by ADL, because exception_ptr is
not necessarily defined in namespace std. This ensures it will be
found.
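
For readers unfamiliar with the technique, here is a minimal sketch of it outside libstdc++. The namespaces lib and lib::detail, the type handle and the function use are all hypothetical stand-ins (for std, std::__exception_ptr, exception_ptr and rethrow_exception respectively); only the shape of the using-declaration is taken from the patch.

```cpp
// Sketch of making an outer-namespace function reachable by ADL for a type
// that really lives in an inner namespace.  All names are hypothetical.
namespace lib {
  namespace detail { struct handle; }

  // Declared in lib, not in lib::detail, so ADL on detail::handle alone
  // would not consider it.
  int use(detail::handle);

  namespace detail {
    // The using-declaration makes lib::use a member of lib::detail for
    // lookup purposes, so ADL on detail::handle now finds it.
    using lib::use;

    struct handle { int value; };
  }

  using detail::handle;

  inline int use(detail::handle h) { return h.value; }
}

// Unqualified call at global scope: there is no ::use, so this compiles
// only because argument-dependent lookup searches lib::detail.
int call_via_adl() {
  lib::handle h{42};
  return use(h);
}
```

Removing the using-declaration from lib::detail should make call_via_adl fail to compile, which mirrors the situation reported in the PR.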

Tested powerpc64le-linux, committed to trunk.
commit 920a18f991d3604bf2dfdf9679411b012964f23d
Author: Jonathan Wakely 
Date:   Wed Nov 11 09:56:22 2015 +

	PR libstdc++/64651
	* libsupc++/exception_ptr.h (rethrow_exception): Add using-declaration
	to __exception_ptr namespace.
	* testsuite/18_support/exception_ptr/rethrow_exception.cc: Test ADL.
	Remove unnecessary test variables.

diff --git a/libstdc++-v3/libsupc++/exception_ptr.h b/libstdc++-v3/libsupc++/exception_ptr.h
index 8fbad1c..7821c14 100644
--- a/libstdc++-v3/libsupc++/exception_ptr.h
+++ b/libstdc++-v3/libsupc++/exception_ptr.h
@@ -68,6 +68,8 @@ namespace std
 
   namespace __exception_ptr
   {
+using std::rethrow_exception;
+
 /**
  *  @brief An opaque pointer to an arbitrary exception.
  *  @ingroup exceptions
diff --git a/libstdc++-v3/testsuite/18_support/exception_ptr/rethrow_exception.cc b/libstdc++-v3/testsuite/18_support/exception_ptr/rethrow_exception.cc
index 31da2ec..7d39892 100644
--- a/libstdc++-v3/testsuite/18_support/exception_ptr/rethrow_exception.cc
+++ b/libstdc++-v3/testsuite/18_support/exception_ptr/rethrow_exception.cc
@@ -30,7 +30,6 @@
 
 void test01()
 {
-  bool test __attribute__((unused)) = true;
   using namespace std;
 
   try {
@@ -54,7 +53,6 @@ void test02()
 
 void test03()
 {
-  bool test __attribute__((unused)) = true;
   using namespace std;
 
   exception_ptr ep;
@@ -71,7 +69,6 @@ void test03()
 
 void test04()
 {
-  bool test __attribute__((unused)) = true;
   using namespace std;
 
   // Weave the exceptions in an attempt to confuse the machinery.
@@ -103,12 +100,23 @@ void test04()
   }
 }
 
+void test05()
+{
+  // libstdc++/64651 std::rethrow_exception not found by ADL
+  // This is not required to work but is a conforming extension.
+  try {
+rethrow_exception(std::make_exception_ptr(0));
+  } catch(...) {
+  }
+}
+
 int main()
 {
   test01();
   test02();
   test03();
   test04();
+  test05();
 
   return 0;
 }


[PATCH 03/N] Just another set of memory leaks

2015-11-11 Thread Martin Liška
Hi.

Here are new fixes for memory leaks, where the following:

==19826== 21 bytes in 1 blocks are definitely lost in loss record 16 of 625
==19826==    at 0x4C2A00F: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==19826==    by 0x16868D7: xmalloc (xmalloc.c:148)
==19826==    by 0x167FDFB: concat (concat.c:147)
==19826==    by 0x932920: gcc::dump_manager::get_dump_file_name(dump_file_info*) const (dumpfile.c:292)
==19826==    by 0x932821: gcc::dump_manager::get_dump_file_name(int) const (dumpfile.c:253)
==19826==    by 0xC1BBEA: pass_init_dump_file(opt_pass*) (passes.c:2074)
==19826==    by 0xC1C31B: execute_one_pass(opt_pass*) (passes.c:2302)
==19826==    by 0xC1D25E: execute_ipa_pass_list(opt_pass*) (passes.c:2735)
==19826==    by 0x8EED23: symbol_table::compile() (cgraphunit.c:2411)
==19826==    by 0x8EEF93: symbol_table::finalize_compilation_unit() (cgraphunit.c:2540)
==19826==    by 0xD205EE: compile_file() (toplev.c:491)
==19826==    by 0xD229AF: do_compile() (toplev.c:1954)

happens in context:

(gdb) p dump_file_name
$1 = 0x23e46d0 "ipa-pta-1.c.067i.pta"
(gdb) c
Continuing.

Breakpoint 2, pass_init_dump_file (pass=0x238c7c0) at ../../gcc/passes.c:2074
2074  dump_file_name = dumps->get_dump_file_name (pass->static_pass_number);
(gdb) bt
#0  pass_init_dump_file (pass=0x238c7c0) at ../../gcc/passes.c:2074
#1  0x00c1bebe in execute_one_ipa_transform_pass (node=0x76a01450, ipa_pass=0x238c7c0) at ../../gcc/passes.c:2172
#2  0x00c1c07f in execute_all_ipa_transforms () at ../../gcc/passes.c:2223
#3  0x008e39e0 in cgraph_node::get_body (this=0x76a01450) at ../../gcc/cgraph.c:3299
#4  0x00f1469f in ipa_pta_execute () at ../../gcc/tree-ssa-structalias.c:7344
#5  0x00f15465 in (anonymous namespace)::pass_ipa_pta::execute (this=0x238cb50) at ../../gcc/tree-ssa-structalias.c:7664
#6  0x00c1c384 in execute_one_pass (pass=0x238cb50) at ../../gcc/passes.c:2316
#7  0x00c1d25f in execute_ipa_pass_list (pass=0x238cb50) at ../../gcc/passes.c:2735
#8  0x008eed24 in symbol_table::compile (this=0x768d30a8) at ../../gcc/cgraphunit.c:2411
#9  0x008eef94 in symbol_table::finalize_compilation_unit (this=0x768d30a8) at ../../gcc/cgraphunit.c:2540
#10 0x00d205ef in compile_file () at ../../gcc/toplev.c:491
#11 0x00d229b0 in do_compile () at ../../gcc/toplev.c:1954
#12 0x00d22c2f in toplev::main (this=0x7fffd910, argc=23, argv=0x7fffda18) at ../../gcc/toplev.c:2061
#13 0x01619ad4 in main (argc=23, argv=0x7fffda18) at ../../gcc/main.c:39

Rest should be quite obvious.
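
For the dump file name case, the pattern is "release the previous heap string before overwriting the global that owns it". The following is a standalone sketch of that pattern, not the passes.c code: current_name, live_allocations, make_name, release_name, init_name and fini_name are hypothetical names, with current_name playing the role of dump_file_name.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the global dump_file_name.
char *current_name = nullptr;
// Bookkeeping for this sketch only, so the effect can be observed.
int live_allocations = 0;

// Build "base" + "suffix" in freshly malloc'ed storage (like concat does).
char *make_name(const char *base, const char *suffix) {
  std::size_t n = std::strlen(base) + std::strlen(suffix) + 1;
  char *p = static_cast<char *>(std::malloc(n));
  if (!p)
    std::abort();
  std::strcpy(p, base);
  std::strcat(p, suffix);
  ++live_allocations;
  return p;
}

// Release the current name if set; safe to call repeatedly.
void release_name() {
  if (current_name) {
    std::free(current_name);
    current_name = nullptr;
    --live_allocations;
  }
}

// Init may be entered again before the matching fini (nested pass
// execution in the backtrace above), so it releases the old name first.
void init_name(const char *base, const char *suffix) {
  release_name();
  current_name = make_name(base, suffix);
}

void fini_name() { release_name(); }
```

Calling init_name twice in a row and then fini_name once leaves no allocation behind; without the release_name call in init_name the first string would be lost, as in the valgrind report.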

Bootstrap and regression tests have been running.

Ready to install after it finishes?
Martin
From 96dfceff2522b352d465016b48da4ea42e9e3ffc Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 10 Nov 2015 17:32:31 +0100
Subject: [PATCH 1/2] Fix various memory leaks

gcc/ChangeLog:

2015-11-11  Martin Liska  

	* gimple-ssa-strength-reduction.c (create_phi_basis):
	Use auto_vec.
	* passes.c (release_dump_file_name): New function.
	(pass_init_dump_file): Used from this function.
	(pass_fini_dump_file): Likewise.
	* tree-sra.c (convert_callers_for_node): Use xstrdup_for_dump.
	* var-tracking.c (vt_initialize): Use pool_allocator.
---
 gcc/gimple-ssa-strength-reduction.c |  3 +--
 gcc/passes.c| 19 ++-
 gcc/tree-sra.c  |  4 ++--
 gcc/var-tracking.c  |  2 +-
 4 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/gcc/gimple-ssa-strength-reduction.c b/gcc/gimple-ssa-strength-reduction.c
index ce32ad3..b807823 100644
--- a/gcc/gimple-ssa-strength-reduction.c
+++ b/gcc/gimple-ssa-strength-reduction.c
@@ -2226,12 +2226,11 @@ create_phi_basis (slsr_cand_t c, gimple *from_phi, tree basis_name,
   int i;
   tree name, phi_arg;
   gphi *phi;
-  vec<tree> phi_args;
   slsr_cand_t basis = lookup_cand (c->basis);
   int nargs = gimple_phi_num_args (from_phi);
   basic_block phi_bb = gimple_bb (from_phi);
   slsr_cand_t phi_cand = base_cand_from_table (gimple_phi_result (from_phi));
-  phi_args.create (nargs);
+  auto_vec<tree> phi_args (nargs);
 
   /* Process each argument of the existing phi that represents
  conditionally-executed add candidates.  */
diff --git a/gcc/passes.c b/gcc/passes.c
index 7a10cb6..dd8d00a 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -2058,6 +2058,18 @@ verify_curr_properties (function *fn, void *data)
   gcc_assert ((fn->curr_properties & props) == props);
 }
 
+/* Release dump file name if set.  */
+
+static void
+release_dump_file_name (void)
+{
+  if (dump_file_name)
+{
+  free (CONST_CAST (char *, dump_file_name));
+  dump_file_name = NULL;
+}
+}
+
 /* Initialize pass dump file.  */
 /* This is non-static so that the plugins can use it.  */
 
@@ -2071,6 +2083,7 @@ pass_init_dump_file (opt_pass *pass)
   gcc::dump_manager *dumps = g->get_dumps ();
 

Re: [PATCH] Simple optimization for MASK_STORE.

2015-11-11 Thread Richard Biener
On Tue, Nov 10, 2015 at 3:56 PM, Ilya Enkovich  wrote:
> 2015-11-10 17:46 GMT+03:00 Richard Biener :
>> On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich  wrote:
>>> 2015-11-10 15:33 GMT+03:00 Richard Biener :
>>>> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev  wrote:
>>>>> Richard,
>>>>>
>>>>> I tried it but 256-bit precision integer type is not yet supported.
>>>>
>>>> What's the symptom?  The compare cannot be expanded?  Just add a pattern then.
>>>> After all we have modes up to XImode.
>>>
>>> I suppose problem may be in:
>>>
>>> gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128)
>>>
>>> which doesn't allow to create constants of bigger size.  Changing it
>>> to maximum vector size (512) would mean we increase wide_int structure
>>> size significantly. New patterns are probably also needed.
>>
>> Yes, new patterns are needed but wide-int should be fine (we only need to
>> create a literal zero AFAICS).  The "new pattern" would be equality/inequality
>> against zero compares only.
>
> Currently 256bit integer creation fails because wide_int for max and
> min values cannot be created.

Hmm, indeed:

#1  0x0072dab5 in wi::extended_tree<192>::extended_tree (
this=0x7fffd950, t=0x76a000b0)
at /space/rguenther/src/svn/trunk/gcc/tree.h:5125
5125  gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N);

but that's not that the constants fail to be created but

#5  0x010d8828 in build_nonstandard_integer_type (precision=512,
unsignedp=65) at /space/rguenther/src/svn/trunk/gcc/tree.c:8051
8051  if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype)))
(gdb) l
8046fixup_unsigned_type (itype);
8047  else
8048fixup_signed_type (itype);
8049
8050  ret = itype;
8051  if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype)))
8052ret = type_hash_canon (tree_to_uhwi (TYPE_MAX_VALUE
(itype)), itype);

thus the integer type hashing being "interesting".  tree_fits_uhwi_p
fails because
it does

7289bool
7290tree_fits_uhwi_p (const_tree t)
7291{
7292  return (t != NULL_TREE
7293  && TREE_CODE (t) == INTEGER_CST
7294  && wi::fits_uhwi_p (wi::to_widest (t)));
7295}

and wi::to_widest () fails with doing

5121template <int N>
5122inline wi::extended_tree <N>::extended_tree (const_tree t)
5123  : m_t (t)
5124{
5125  gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N);
5126}

fixing the hashing then runs into type_cache_hasher::equal doing
tree_int_cst_equal
which again uses to_widest (it should be easier and cheaper to do the compare on
the actual tree representation, but well, seems to be just the first
of various issues
we'd run into).

We eventually could fix the assert above (but then need to hope we assert
when a computation overflows the narrower precision of widest_int) or use
a special really_widest_int (ugh).

> It is fixed by increasing MAX_BITSIZE_MODE_ANY_INT, but it increases
> WIDE_INT_MAX_ELTS
> and thus increases wide_int structure. If we use 512 for
> MAX_BITSIZE_MODE_ANY_INT then
> wide_int structure would grow by 48 bytes (16 bytes if use 256 for
> MAX_BITSIZE_MODE_ANY_INT).
> Is it OK for such narrow usage?

widest_int is used in some long-living structures (which is the reason for
MAX_BITSIZE_MODE_ANY_INT in the first place).  So I don't think so.
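
The 16 and 48 byte figures quoted above can be reproduced with a little arithmetic. The sketch below is not GCC code; the element-count formula is an assumption modelled on how WIDE_INT_MAX_ELTS is derived from MAX_BITSIZE_MODE_ANY_INT in wide-int.h, with a 64-bit HOST_WIDE_INT, and all identifiers are hypothetical.

```cpp
// Back-of-the-envelope check of the sizes quoted above; not GCC code.
// Assumes a 64-bit HOST_WIDE_INT and one extra element beyond the widest
// integer mode (an assumption modelled on WIDE_INT_MAX_ELTS).
constexpr int kHostBitsPerWideInt = 64;

// Number of HOST_WIDE_INT elements needed for a given maximum mode size.
constexpr int max_elts(int max_bitsize_mode_any_int) {
  return (max_bitsize_mode_any_int + kHostBitsPerWideInt) / kHostBitsPerWideInt;
}

// Bytes of element storage for that many elements.
constexpr int storage_bytes(int max_bitsize_mode_any_int) {
  return max_elts(max_bitsize_mode_any_int) * (kHostBitsPerWideInt / 8);
}

// Growth relative to the current setting of 128 bits.
constexpr int growth_bytes(int new_bitsize) {
  return storage_bytes(new_bitsize) - storage_bytes(128);
}

// 3 elements of 64 bits give the 192 seen in wi::extended_tree<192>.
static_assert(max_elts(128) * kHostBitsPerWideInt == 192, "precision check");
static_assert(growth_bytes(256) == 16, "256-bit setting adds 16 bytes");
static_assert(growth_bytes(512) == 48, "512-bit setting adds 48 bytes");
```

Under these assumptions the numbers agree with the ones Ilya gives, and the 192 matches the template argument in the backtrace.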

Richard.

> Ilya
>
>>
>> Richard.
>>
>>> Ilya
>>>

>>>> Richard.
>>>>
>>>>> Yuri.
>>>>>
>>>>>


[Patch] PR tree-optimization/68234 Improve range info for loop Phi node

2015-11-11 Thread Jiong Wang

As discussed at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68234, this
patch doesn't touch the existing code logic in vrp_visit_phi_node; it
only extends the SCEV check to those VR_VARYING loop PHI nodes.

Previously, we only did this check if the PHI node had valid range
info but later dropped either bound to infinity. Missing those PHI
nodes whose initial estimate is VR_VARYING meant missing some
further optimization opportunities. For example, in the testcase included
in this patch, with improved range info we can efficiently turn the signed
divide into a right shift.
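
To illustrate why the non-negative range matters for the divide, here is a small standalone sketch (hypothetical function names, not part of the patch). It compares truncating division by 64 with a plain right shift by 6; a logical shift on the unsigned value is used so that the sketch does not depend on how the implementation shifts negative numbers.

```cpp
// Truncating signed division, as written in the testcase (bl / 64).
int div_by_64(int x) { return x / 64; }

// Plain right shift by 6.  This only equals x / 64 when x is known to be
// non-negative, which is exactly the range fact VRP has to prove.
int shift_right_6(int x) {
  return static_cast<int>(static_cast<unsigned>(x) >> 6);
}

// True when both forms give the same result for x.
bool forms_agree(int x) { return div_by_64(x) == shift_right_6(x); }
```

For any x >= 0 the two forms agree; for x = -1 the division yields 0 while the shift does not, so without a known non-negative range the compiler has to emit extra fixup code around the shift.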

This patch passes x86-64 and AArch64 bootstrap, with no regressions on either.
Meanwhile, simple benchmarking shows there are quite a few new VR_RANGEs
found after this patch during gcc bootstrapping. There is no performance
regression on spec2006 int on aarch64.

During gcc bootstrapping, on x86-64 there are 4828 new VR_VARYING -> VR_RANGE
transitions found by vrp1, and 5008 new by vrp2.

While on AArch64 there are 44756 new by vrp1, and 6047 new by vrp2.

OK for trunk?

2015-11-11  Richard Biener  
Jiong Wang  
gcc/
  PR tree-optimization/68234
  * tree-vrp.c (vrp_visit_phi_node): Extend SCEV check to those loop PHI
  nodes which are estimated to be VR_VARYING initially.

gcc/testsuite/
  * gcc.dg/tree-ssa/pr68234.c: New testcase.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68234.c b/gcc/testsuite/gcc.dg/tree-ssa/pr68234.c
new file mode 100644
index 000..e7c2a95
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68234.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-vrp2" } */
+
+extern int nc;
+void ff (unsigned long long);
+
+void
+f (void)
+{
+  unsigned char resp[1024];
+  int c;
+  int bl = 0;
+  unsigned long long *dwords = (unsigned long long *) (resp + 5);
+  for (c = 0; c < nc; c++)
+{
+  /* PR middle-end/68234, this signed division should be optimized into
+	 right shift as vrp pass should deduct range info of 'bl' falls into
+	 positive number.  */
+  ff (dwords[bl / 64]);
+  bl++;
+}
+}
+
+/* { dg-final { scan-tree-dump ">> 6" "vrp2" } } */
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index f7c3168..a6f93f9 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -8789,20 +8789,11 @@ vrp_visit_phi_node (gphi *phi)
 
   /* If we dropped either bound to +-INF then if this is a loop
 	 PHI node SCEV may known more about its value-range.  */
-  if ((cmp_min > 0 || cmp_min < 0
+  if (cmp_min > 0 || cmp_min < 0
 	   || cmp_max < 0 || cmp_max > 0)
-	  && (l = loop_containing_stmt (phi))
-	  && l->header == gimple_bb (phi))
-	adjust_range_with_scev (&vr_result, l, phi, lhs);
-
-  /* If we will end up with a (-INF, +INF) range, set it to
-	 VARYING.  Same if the previous max value was invalid for
-	 the type and we end up with vr_result.min > vr_result.max.  */
-  if ((vrp_val_is_max (vr_result.max)
-	   && vrp_val_is_min (vr_result.min))
-	  || compare_values (vr_result.min,
-			 vr_result.max) > 0)
-	goto varying;
+	goto scev_check;
+
+  goto infinite_check;
 }
 
   /* If the new range is different than the previous value, keep
@@ -8828,8 +8819,28 @@ update_range:
   /* Nothing changed, don't add outgoing edges.  */
   return SSA_PROP_NOT_INTERESTING;
 
-  /* No match found.  Set the LHS to VARYING.  */
 varying:
+  set_value_range_to_varying (&vr_result);
+
+scev_check:
+  /* If this is a loop PHI node SCEV may known more about its value-range.
+ scev_check can be reached from two paths, one is a fall through from above
+ "varying" label, the other is direct goto from code block which tries to
+ avoid infinite simulation.  */
+  if ((l = loop_containing_stmt (phi))
+  && l->header == gimple_bb (phi))
+adjust_range_with_scev (&vr_result, l, phi, lhs);
+
+infinite_check:
+  /* If we will end up with a (-INF, +INF) range, set it to
+ VARYING.  Same if the previous max value was invalid for
+ the type and we end up with vr_result.min > vr_result.max.  */
+  if ((vr_result.type == VR_RANGE || vr_result.type == VR_ANTI_RANGE)
+  && !((vrp_val_is_max (vr_result.max) && vrp_val_is_min (vr_result.min))
+	   || compare_values (vr_result.min, vr_result.max) > 0))
+goto update_range;
+
+  /* No match found.  Set the LHS to VARYING.  */
   set_value_range_to_varying (lhs_vr);
   return SSA_PROP_VARYING;
 }


Re: [PATCH][AArch64][v2] Improve comparison with complex immediates followed by branch/cset

2015-11-11 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00233.html

Thanks,
Kyrill
On 03/11/15 15:43, Kyrill Tkachov wrote:

Hi all,

This patch slightly improves sequences where we want to compare against a 
complex immediate and branch against the result
or perform a cset on it.
This means transforming sequences of mov+movk+cmp+branch into sub+subs+branch.
Similar for cset. Unfortunately I can't just do this by simply matching a 
(compare (reg) (const_int)) rtx because
this transformation is only valid for equal/not equal comparisons, not greater 
than/less than ones but the compare instruction
pattern only has the general CC mode. We need to also match the use of the 
condition code.
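
The restriction to equal/not-equal comparisons can be seen in a small model of the two-step subtraction. This is only a sketch with hypothetical names, using 32-bit unsigned wraparound to stand in for the W registers; the constant is the one from the example further down.

```cpp
#include <cstdint>

// The immediate from the example, split into two 12-bit-encodable pieces.
constexpr std::uint32_t kImm = 0x123456;
constexpr std::uint32_t kHi = kImm & 0xfff000;  // 1191936
constexpr std::uint32_t kLo = kImm & 0x000fff;  // 1110

static_assert(kHi == 1191936 && kLo == 1110, "matches the immediates used");

// Direct comparison against the full immediate (mov + movk + cmp).
bool ne_direct(std::uint32_t x) { return x != kImm; }

// Two subtractions (sub + subs): the result is zero exactly when x == kImm,
// even if the intermediate value wraps around.
bool ne_split(std::uint32_t x) {
  std::uint32_t t = x - kHi;  // sub
  t -= kLo;                   // subs; the Z flag reflects t == 0
  return t != 0;
}

// Ordering comparisons do not survive the split: after the first
// subtraction the flags compare (x - kHi) with kLo, not x with kImm.
bool lt_direct(std::uint32_t x) { return x < kImm; }
bool lt_split_naive(std::uint32_t x) { return (x - kHi) < kLo; }
```

For x = 0 the direct less-than is true while the naive split version is false, because x - kHi wraps to a large value; the equality test is unaffected by that wraparound.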

I've done this by creating a splitter for the conditional jump where the 
condition is the comparison between the register
and the complex immediate and splitting it into the sub+subs+condjump sequence. 
Similar for the cstore pattern.
Thankfully we don't split immediate moves until later in the optimization 
pipeline so combine can still try the right patterns.
With this patch for the example code:
void g(void);
void f8(int x)
{
   if (x != 0x123456) g();
}

I get:
f8:
        sub     w0, w0, #1191936
        subs    w0, w0, #1110
        beq     .L1
        b       g
        .p2align 3
.L1:
        ret

instead of the previous:
f8:
        mov     w1, 13398
        movk    w1, 0x12, lsl 16
        cmp     w0, w1
        beq     .L1
        b       g
        .p2align 3
.L1:
        ret


The condjump case triggered 130 times across all of SPEC2006 which is, 
admittedly, not much
whereas the cstore case didn't trigger at all. However, the included testcase 
in the patch
demonstrates the kind of code that it would trigger on.

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill


2015-11-03  Kyrylo Tkachov  

* config/aarch64/aarch64.md (*condjump): Rename to...
(condjump): ... This.
(*compare_condjump): New define_insn_and_split.
(*compare_cstore_insn): Likewise.
(*cstore_insn): Rename to...
(aarch64_cstore): ... This.
* config/aarch64/iterators.md (CMP): Handle ne code.
* config/aarch64/predicates.md (aarch64_imm24): New predicate.

2015-11-03  Kyrylo Tkachov  

* gcc.target/aarch64/cmpimm_branch_1.c: New test.
* gcc.target/aarch64/cmpimm_cset_1.c: Likewise.




Re: [PATCH 03/N] Just another set of memory leaks

2015-11-11 Thread Richard Biener
On Wed, Nov 11, 2015 at 10:04 AM, Martin Liška  wrote:
> Hi.
>
> Here are new fixes for memory leaks, where the following:
>
> ==19826== 21 bytes in 1 blocks are definitely lost in loss record 16 of 625
> ==19826==at 0x4C2A00F: malloc (in 
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==19826==by 0x16868D7: xmalloc (xmalloc.c:148)
> ==19826==by 0x167FDFB: concat (concat.c:147)
> ==19826==by 0x932920: 
> gcc::dump_manager::get_dump_file_name(dump_file_info*) const (dumpfile.c:292)
> ==19826==by 0x932821: gcc::dump_manager::get_dump_file_name(int) const 
> (dumpfile.c:253)
> ==19826==by 0xC1BBEA: pass_init_dump_file(opt_pass*) (passes.c:2074)
> ==19826==by 0xC1C31B: execute_one_pass(opt_pass*) (passes.c:2302)
> ==19826==by 0xC1D25E: execute_ipa_pass_list(opt_pass*) (passes.c:2735)
> ==19826==by 0x8EED23: symbol_table::compile() (cgraphunit.c:2411)
> ==19826==by 0x8EEF93: symbol_table::finalize_compilation_unit() 
> (cgraphunit.c:2540)
> ==19826==by 0xD205EE: compile_file() (toplev.c:491)
> ==19826==by 0xD229AF: do_compile() (toplev.c:1954)
>
> happens in context:
>
> (gdb) p dump_file_name
> $1 = 0x23e46d0 "ipa-pta-1.c.067i.pta"
> (gdb) c
> Continuing.
>
> Breakpoint 2, pass_init_dump_file (pass=0x238c7c0) at ../../gcc/passes.c:2074
> 2074  dump_file_name = dumps->get_dump_file_name 
> (pass->static_pass_number);
> (gdb) bt
> #0  pass_init_dump_file (pass=0x238c7c0) at ../../gcc/passes.c:2074
> #1  0x00c1bebe in execute_one_ipa_transform_pass 
> (node=0x76a01450, ipa_pass=0x238c7c0) at ../../gcc/passes.c:2172
> #2  0x00c1c07f in execute_all_ipa_transforms () at 
> ../../gcc/passes.c:2223
> #3  0x008e39e0 in cgraph_node::get_body (this=0x76a01450) at 
> ../../gcc/cgraph.c:3299
> #4  0x00f1469f in ipa_pta_execute () at 
> ../../gcc/tree-ssa-structalias.c:7344
> #5  0x00f15465 in (anonymous namespace)::pass_ipa_pta::execute 
> (this=0x238cb50) at ../../gcc/tree-ssa-structalias.c:7664
> #6  0x00c1c384 in execute_one_pass (pass=0x238cb50) at 
> ../../gcc/passes.c:2316
> #7  0x00c1d25f in execute_ipa_pass_list (pass=0x238cb50) at 
> ../../gcc/passes.c:2735
> #8  0x008eed24 in symbol_table::compile (this=0x768d30a8) at 
> ../../gcc/cgraphunit.c:2411
> #9  0x008eef94 in symbol_table::finalize_compilation_unit 
> (this=0x768d30a8) at ../../gcc/cgraphunit.c:2540
> #10 0x00d205ef in compile_file () at ../../gcc/toplev.c:491
> #11 0x00d229b0 in do_compile () at ../../gcc/toplev.c:1954
> #12 0x00d22c2f in toplev::main (this=0x7fffd910, argc=23, 
> argv=0x7fffda18) at ../../gcc/toplev.c:2061
> #13 0x01619ad4 in main (argc=23, argv=0x7fffda18) at 
> ../../gcc/main.c:39
>
> Rest should be quite obvious.
>
> Bootstrap and regression tests have been running.
>
> Ready to install after it finishes?

Ok.

Richard.

> Martin


Re: [patch] Fix PR target/67265

2015-11-11 Thread Bernd Schmidt

On 11/11/2015 12:38 PM, Eric Botcazou wrote:

this is an ICE on an asm statement requiring a lot of registers, when compiled
in 32-bit mode on x86/Linux with -O -fstack-check -fPIC:

pr67265.c:10:3: error: 'asm' operand has impossible constraints

The issue is that, since stack checking defines STACK_CHECK_MOVING_SP on this
platform, the frame pointer is necessary in order to be able to propagate
exceptions raised on stack overflow.  But this is required only in Ada so we
can certainly avoid doing it in C or C++.



/* We need the frame pointer to catch stack overflow exceptions
  if the stack pointer is moving.  */
-   || (flag_stack_check && STACK_CHECK_MOVING_SP)
+   || (STACK_CHECK_MOVING_SP
+  && flag_stack_check
+  && flag_exceptions
+  && cfun->can_throw_non_call_exceptions)


This piece of code alone doesn't tell me exactly why the frame pointer 
is needed. I was looking for an explicit use, but I now guess that if 
you have multiple adjusts of the frame pointer you can't easily undo 
them in the error case (the function behaves as-if using alloca). Is 
that it? And without exceptions I assume you just get a call to abort so 
it doesn't matter? If I understood all that right, then this is ok.


In i386.c I see a code block with a similar condition,

  /* If the only reason for frame_pointer_needed is that we conservatively
 assumed stack realignment might be needed, but in the end nothing that
 needed the stack alignment had been spilled, clear frame_pointer_needed
 and say we don't need stack realignment.  */

and the condition has

  && !(flag_stack_check && STACK_CHECK_MOVING_SP)

Should that be changed too?


Bernd


Re: [C/C++ PATCH] Reject declarators with huge arrays (PR c/68107, c++/68266)

2015-11-11 Thread Marek Polacek
On Tue, Nov 10, 2015 at 12:40:49PM -0700, Martin Sebor wrote:
> On 11/10/2015 09:36 AM, Marek Polacek wrote:
> >While both C and C++ FEs are able to reject e.g.
> >int a[__SIZE_MAX__ / sizeof(int)];
> >they are accepting code such as
> >int (*a)[__SIZE_MAX__ / sizeof(int)];
> >
> >As Joseph pointed out, any construction of a non-VLA type whose size is half
> >or more of the address space should receive a compile-time error.
> >
> >Done by moving up the check for the size in bytes so that it checks every
> >non-VLA complete array type constructed in the course of processing the
> >declarator.  Since the C++ FE had the same problem, I've fixed it up there as
> >well.  And that's why I had to tweak dg-error of two C++ tests; if the size of
> >an array is considered invalid, we give an error message with word "unnamed".
> >
> >(I've removed the comment about crashing in tree_to_[su]hwi since that seems
> >to no longer be the case.)
> 
> Thanks for including me on this. I tested it with C++ references
> to arrays (in addition to pointers) and it works correctly for
> those as well (unsurprisingly). The only thing that bothers me

Good, thanks!

> a bit is the seemingly arbitrary inconsistency between
> the diagnostics:
 
> >+p = new char [1][MAX - 99]; // { dg-error "size of unnamed array" }
> >  p = new char [1][MAX / 2];  // { dg-error "size of array" }
> 
> Would it be possible to make the message issued by the front ends
> the same? I.e., either both "unnamed array" or both just "array?"

Yeah, I was thinking about that, too, but I was also hoping that we can
clean this up as a follow-up.  I think let's drop the "unnamed" word, even
though that means that the changes in new44.C brought with my patch will
essentially have to be reverted...

Oh, and we could also be more informative and print the size of an array,
or the number of elements, as clang does.
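
As a rough illustration of the rule quoted above (a non-VLA array type whose
size is half or more of the address space should be rejected), the arithmetic
can be sketched as below.  The helper name array_too_large is made up for this
example; it is not the front-end code, only the size check written so that it
cannot overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the patch): true if an array of COUNT
// elements of ELEM_SIZE bytes each would occupy half or more of the
// address space, i.e. if elem_size * count > SIZE_MAX / 2.
// The test divides instead of multiplying so that it cannot overflow.
constexpr bool
array_too_large (std::size_t elem_size, std::size_t count)
{
  return elem_size != 0 && count > (SIZE_MAX / 2) / elem_size;
}

// The element count used in the declarators quoted above.
static_assert (array_too_large (sizeof (int), SIZE_MAX / sizeof (int)),
               "nearly the whole address space is too large");
static_assert (!array_too_large (sizeof (int), 1024),
               "a small array is fine");

void
array_too_large_examples ()
{
  // One byte short of half the address space is still accepted ...
  assert (!array_too_large (1, SIZE_MAX / 2));
  // ... exactly half is not.
  assert (array_too_large (1, SIZE_MAX / 2 + 1));
}
```

The same predicate applies whether the array type is named directly or only
reached through a pointer or reference declarator, which is the case the patch
adds.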

Thanks,

Marek


[gomp4] Rework gimplifier region flags

2015-11-11 Thread Nathan Sidwell
I've committed this patch to gomp4 to remove the OpenACC-specific enums from 
gimplify_omp_ctx, instead extending the existing omp_region_type enum.  A 
similar patch will shortly be applied to trunk, now that Jakub's approved it.


If you had patches relying on the old scheme, you'll need to update them.
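
As a rough sketch of what the flag scheme allows, the enumerator values below
are copied from the patch, while the two helper functions are hypothetical and
only illustrate testing a bit in region_type where code previously consulted
the separate region_kind / acc_region_kind fields.

```cpp
#include <cassert>

// Subset of the reworked enum; values as in the patch.
enum omp_region_type
{
  ORT_TARGET_DATA = 0x10,
  ORT_TARGET = 0x20,
  ORT_HOST_DATA = 0x40,

  /* OpenACC variants.  */
  ORT_ACC = 0x80,
  ORT_ACC_DATA = ORT_ACC | ORT_TARGET_DATA,
  ORT_ACC_PARALLEL = ORT_ACC | ORT_TARGET,
  ORT_ACC_KERNELS = ORT_ACC | ORT_TARGET | 0x100,
  ORT_ACC_HOST = ORT_ACC | ORT_HOST_DATA
};

// Hypothetical helper: is this any kind of OpenACC region?
constexpr bool
is_openacc_region (omp_region_type t)
{
  return (t & ORT_ACC) != 0;
}

// Hypothetical helper: does this region offload (OpenMP or OpenACC)?
constexpr bool
is_offload_region (omp_region_type t)
{
  return (t & ORT_TARGET) != 0;
}

void
region_flag_examples ()
{
  assert (is_openacc_region (ORT_ACC_KERNELS));
  assert (!is_openacc_region (ORT_TARGET));
  assert (is_offload_region (ORT_ACC_PARALLEL));
  assert (!is_offload_region (ORT_ACC_DATA));
}
```

Because the OpenACC variants are built by OR-ing ORT_ACC into the existing
values, one mask test covers all of them, and the kernels construct stays
distinguishable from parallel through the extra 0x100 bit.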

nathan
2015-11-11  Nathan Sidwell  

	* gimplify.c (enum gimplify_omp_var_data): Remove GOVD_FORCE_MAP.
	(omp_region_type): Use hex. Add OpenACC members.
	(omp_region_kind, acc_region_kind): Delete.
	(gimplify_omp_ctx): Remove region_kind & acc_region_kind fields.
	(new_omp_context): Adjust default_kind setting.  Don't
	reinitialize fields.
	(gimple_add_tmp_var): Add ORT_ACC check.
	(gimplify_var_or_parm_decl): Likewise.
	(omp_firstprivatize_variable): Likewise.
	(omp_add_variable): Adjust OpenACC detection.
	(oacc_default_clause): Reimplement.
	(omp_notice_variable): Adjust OpenACC detection.
	(gimplify_scan_omp_clauses): Remove region_kind arg. Adjust.
	(gimplify_scan_omp_clause_1): Adjust OpenACC detection.
	(gimplify_oacc_cache, gimplify_oacc_declare,
	gimplify_oacc_host_data, gimplify_omp_parallel): Adjust.
	(gimplify_omp_for, gimplify_omp_workshare,
	gimplify_omp_target_update): Adjust for OpenACC ORT flags.
	(gimplify_expr): Likewise.
	(gimplify_body): Simplify OpenACC declare handling.

Index: gimplify.c
===
--- gimplify.c	(revision 230160)
+++ gimplify.c	(working copy)
@@ -89,10 +89,8 @@ enum gimplify_omp_var_data
 
   GOVD_USE_DEVICE = 1 << 17,
 
-  GOVD_FORCE_MAP = 1 << 18,
-
   /* OpenACC deviceptr clause.  */
-  GOVD_USE_DEVPTR = 1 << 19,
+  GOVD_USE_DEVPTR = 1 << 18,
 
   GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
 			   | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
@@ -102,40 +100,37 @@ enum gimplify_omp_var_data
 
 enum omp_region_type
 {
-  ORT_WORKSHARE = 0,
-  ORT_SIMD = 1,
-  ORT_PARALLEL = 2,
-  ORT_COMBINED_PARALLEL = 3,
-  ORT_TASK = 4,
-  ORT_UNTIED_TASK = 5,
-  ORT_TEAMS = 8,
-  ORT_COMBINED_TEAMS = 9,
+  ORT_WORKSHARE = 0x00,
+  ORT_SIMD 	= 0x01,
+
+  ORT_PARALLEL	= 0x02,
+  ORT_COMBINED_PARALLEL = 0x03,
+
+  ORT_TASK	= 0x04,
+  ORT_UNTIED_TASK = 0x05,
+
+  ORT_TEAMS	= 0x08,
+  ORT_COMBINED_TEAMS = 0x09,
+
   /* Data region.  */
-  ORT_TARGET_DATA = 16,
+  ORT_TARGET_DATA = 0x10,
+
   /* Data region with offloading.  */
-  ORT_TARGET = 32,
-  ORT_COMBINED_TARGET = 33,
-  /* An OpenACC host-data region.  */
-  ORT_HOST_DATA = 64,
-  /* Dummy OpenMP region, used to disable expansion of
- DECL_VALUE_EXPRs in taskloop pre body.  */
-  ORT_NONE = 128
-};
+  ORT_TARGET	= 0x20,
+  ORT_COMBINED_TARGET = 0x21,
 
-enum omp_region_kind
-{
-  ORK_OMP,
-  ORK_OACC,
-  ORK_UNKNOWN
-};
+  ORT_HOST_DATA = 0x40,
 
-enum acc_region_kind
-{
-  ARK_GENERAL,  /* Default used for data, etc. regions.  */
-  ARK_PARALLEL, /* Parallel construct.  */
-  ARK_KERNELS,  /* Kernels construct.  */
-  ARK_DECLARE,  /* Declare directive.  */
-  ARK_UNKNOWN
+  /* OpenACC variants.  */
+  ORT_ACC	= 0x80,  /* A generic OpenACC region.  */
+  ORT_ACC_DATA	= ORT_ACC | ORT_TARGET_DATA, /* Data construct.  */
+  ORT_ACC_PARALLEL = ORT_ACC | ORT_TARGET,  /* Parallel construct */
+  ORT_ACC_KERNELS  = ORT_ACC | ORT_TARGET | 0x100,  /* Kernels construct.  */
+  ORT_ACC_HOST  = ORT_ACC | ORT_HOST_DATA,
+
+  /* Dummy OpenMP region, used to disable expansion of
+ DECL_VALUE_EXPRs in taskloop pre body.  */
+  ORT_NONE	= 0x200
 };
 
 /* Gimplify hashtable helper.  */
@@ -177,8 +172,6 @@ struct gimplify_omp_ctx
   location_t location;
   enum omp_clause_default_kind default_kind;
   enum omp_region_type region_type;
-  enum omp_region_kind region_kind;
-  enum acc_region_kind acc_region_kind;
   bool combined_loop;
   bool distribute;
   bool target_map_scalars_firstprivate;
@@ -404,19 +397,11 @@ new_omp_context (enum omp_region_type re
   c->variables = splay_tree_new (splay_tree_compare_decl_uid, 0, 0);
   c->privatized_types = new hash_set;
   c->location = input_location;
-  if ((region_type & (ORT_TASK | ORT_TARGET)) == 0)
+  c->region_type = region_type;
+  if ((region_type & ORT_TASK) == 0)
 c->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
   else
 c->default_kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
-  c->region_type = region_type;
-  c->region_kind = ORK_UNKNOWN;
-  c->acc_region_kind = ARK_UNKNOWN;
-  c->combined_loop = false;
-  c->distribute = false;
-  c->target_map_scalars_firstprivate = false;
-  c->target_map_pointers_as_0len_arrays = false;
-  c->target_firstprivatize_array_bases = false;
-  c->stmt = NULL;
 
   return c;
 }
@@ -730,7 +715,8 @@ gimple_add_tmp_var (tree tmp)
 	  struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp;
 	  while (ctx
 		 && (ctx->region_type == ORT_WORKSHARE
-		 || ctx->region_type == ORT_SIMD))
+		 || ctx->region_type == ORT_SIMD
+		 || ctx->region_type == ORT_ACC))
 	ctx = ctx->outer_context;
 	  if (ctx)
 	omp_add_variable (ctx, tmp, 

[PATCH] More compile-time saving in BB vectorization

2015-11-11 Thread Richard Biener

This saves some more compile-time avoiding vector size iteration for
trivial fails.  It also improves time spent by not giving up completely
for all SLP instances if one fails to vectorize because of alignment
issues.  And it sneaks in a correctness fix for a previous change.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2015-11-11  Richard Biener  

* tree-vectorizer.h (vect_slp_analyze_and_verify_instance_alignment):
Declare.
(vect_analyze_data_refs_alignment): Make loop vect specific.
(vect_verify_datarefs_alignment): Likewise.
* tree-vect-data-refs.c (vect_slp_analyze_data_ref_dependences):
Add missing continue.
(vect_compute_data_ref_alignment): Export.
(vect_compute_data_refs_alignment): Merge into...
(vect_analyze_data_refs_alignment): ... this.
(verify_data_ref_alignment): Split out from ...
(vect_verify_datarefs_alignment): ... here.
(vect_slp_analyze_and_verify_node_alignment): New function.
(vect_slp_analyze_and_verify_instance_alignment): Likewise.
* tree-vect-slp.c (vect_supported_load_permutation_p): Remove
misplaced checks on alignment.
(vect_slp_analyze_bb_1): Add fatal output parameter.  Do
alignment analysis after SLP discovery and do it per instance.
(vect_slp_bb): When vect_slp_analyze_bb_1 fatally failed do not
bother to re-try using different vector sizes.

Index: gcc/tree-vectorizer.h
===
*** gcc/tree-vectorizer.h   (revision 230155)
--- gcc/tree-vectorizer.h   (working copy)
*** extern tree vect_get_smallest_scalar_typ
*** 1011,1018 
  extern bool vect_analyze_data_ref_dependences (loop_vec_info, int *);
  extern bool vect_slp_analyze_data_ref_dependences (bb_vec_info);
  extern bool vect_enhance_data_refs_alignment (loop_vec_info);
! extern bool vect_analyze_data_refs_alignment (vec_info *);
! extern bool vect_verify_datarefs_alignment (vec_info *);
  extern bool vect_analyze_data_ref_accesses (vec_info *);
  extern bool vect_prune_runtime_alias_test_list (loop_vec_info);
  extern tree vect_check_gather_scatter (gimple *, loop_vec_info, tree *, tree *,
--- 1011,1019 
  extern bool vect_analyze_data_ref_dependences (loop_vec_info, int *);
  extern bool vect_slp_analyze_data_ref_dependences (bb_vec_info);
  extern bool vect_enhance_data_refs_alignment (loop_vec_info);
! extern bool vect_analyze_data_refs_alignment (loop_vec_info);
! extern bool vect_verify_datarefs_alignment (loop_vec_info);
! extern bool vect_slp_analyze_and_verify_instance_alignment (slp_instance);
  extern bool vect_analyze_data_ref_accesses (vec_info *);
  extern bool vect_prune_runtime_alias_test_list (loop_vec_info);
  extern tree vect_check_gather_scatter (gimple *, loop_vec_info, tree *, tree *,
Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c   (revision 230155)
--- gcc/tree-vect-data-refs.c   (working copy)
*** vect_slp_analyze_data_ref_dependences (b
*** 645,650 
--- 645,651 
  (SLP_INSTANCE_TREE (instance))[0], 0);
  vect_free_slp_instance (instance);
  BB_VINFO_SLP_INSTANCES (bb_vinfo).ordered_remove (i);
+ continue;
}
i++;
  }
*** vect_slp_analyze_data_ref_dependences (b
*** 668,674 
 FOR NOW: No analysis is actually performed. Misalignment is calculated
 only for trivial cases. TODO.  */
  
! static bool
  vect_compute_data_ref_alignment (struct data_reference *dr)
  {
gimple *stmt = DR_STMT (dr);
--- 669,675 
 FOR NOW: No analysis is actually performed. Misalignment is calculated
 only for trivial cases. TODO.  */
  
! bool
  vect_compute_data_ref_alignment (struct data_reference *dr)
  {
gimple *stmt = DR_STMT (dr);
*** vect_compute_data_ref_alignment (struct
*** 838,882 
  }
  
  
- /* Function vect_compute_data_refs_alignment
- 
-Compute the misalignment of data references in the loop.
-Return FALSE if a data reference is found that cannot be vectorized.  */
- 
- static bool
- vect_compute_data_refs_alignment (vec_info *vinfo)
- {
-   vec datarefs = vinfo->datarefs;
-   struct data_reference *dr;
-   unsigned int i;
- 
-   FOR_EACH_VEC_ELT (datarefs, i, dr)
- {
-   stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr));
-   if (STMT_VINFO_VECTORIZABLE (stmt_info)
- && !vect_compute_data_ref_alignment (dr))
-   {
- /* Strided accesses perform only component accesses, misalignment
-information is irrelevant for them.  */
- if (STMT_VINFO_STRIDED_P (stmt_info)
- && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
-   continue;
- 
- if (is_a  (vinfo))
-   {
- /* Mark unsupported 

Re: [gomp4.5] Don't mark GOMP_MAP_FIRSTPRIVATE mapped vars addressable

2015-11-11 Thread Jakub Jelinek
On Wed, Nov 11, 2015 at 07:27:51PM +0300, Alexander Monakov wrote:
> > Alex reported to me privately that with the OpenMP 4.5 handling of
> > array section bases (that they are firstprivate instead of mapped)
> > we unnecessarily mark the pointers addressable and that results
> > in a less efficient way of passing them as shared to inner constructs.
> 
> Thanks!  Would you be interested in further (minimized) cases where the new
> implementation no longer manages to perform copy-in/copy-out optimization?
> E.g. the following.  Or I can try to put such cases in Bugzilla, if you like.
> 
> Alexander
> 
> void f(int *p, int n)
> {
>   int tmp;
> #pragma omp target map(to:p[0:n]) map(tofrom:tmp)
>   {
> #pragma omp parallel
> asm volatile ("" : "=r" (tmp) : "r" (p));
>   }
> 
> #pragma omp target
>   /* Missing optimization for 'tmp' here.  */
> #pragma omp parallel
> asm volatile ("" : : "r" (tmp));
> }

There is nothing to do in this case, map(tofrom:tmp) really
has to make tmp addressable, it needs to deal with its address,
and the copy-in/out optimization really relies on the var not being
addressable in any way; the problem is that you need to be 100% sure
that the thread invoking parallel owns the variable and nobody else
can do anything with the variable concurrently, otherwise the compiler
creates a data race that might not exist in the original program.
And OpenMP 4.5 says that on the second target, tmp is implicitly
firstprivate (tmp).  People who care about the generated code would
use firstprivate (tmp) on the second parallel anyway.

Jakub


Re: [PATCH v4] SH FDPIC backend support

2015-11-11 Thread Rich Felker
On Wed, Nov 11, 2015 at 09:56:42AM -0500, Rich Felker wrote:
> > > I'm actually
> > > trying to prepare a simpler FDPIC patch for other gcc versions we're
> > > interested in that's not so invasive, and for now I'm just having
> > > function_symbol replace SFUNC_STATIC with SFUNC_GOT on TARGET_FDPIC
> > > to
> > > avoid needing all the label stuff, but it would be nice to find a way
> > > to reuse the existing framework.
> > 
> > Do you know how this affects code size (and inherently performance)?
> 
> I suspect it makes very little difference, but to compare I'd need to
> do the same hack on 5.2.0 or trunk. The only difference should be one
> additional load per call, and one additional GOT slot per function
> called this way (but just once per executable/library).

Actually I think this is not quite right: if the call takes place via
the GOT, this also requires the initial r12 to be preserved somewhere
in order to load the function address, whereas for SFUNC_STATIC, the
initial r12 can be completely discarded, right? (SFUNC functions are
not permitted to use the GOT themselves as far as I can tell, and thus
do not receive the hidden GOT argument in r12.)

Rich


[PATCH] Fix detection of setrlimit in libstdc++ testsuite

2015-11-11 Thread Maxim Kuvyrkov
Hi,

This patch fixes an obscure cross-testing problem that crashed (OOMed) our 
boards at Linaro.  Several tests in libstdc++ (e.g., [1]) limit themselves to 
some reasonable amount of RAM and then try to allocate 32 gigs.  Unfortunately, 
the configure test that checks for the presence of setrlimit is rather strange: 
if the target is native, it tries to compile a file with a call to setrlimit; 
if compilation succeeds, setrlimit is used, otherwise it is ignored.  The 
strange part is that the compilation check is done only for native targets, as 
if cross-toolchains can't generate working executables.  [This is rather odd, 
and I might be missing some underlying caveat.]

Therefore, when testing a cross toolchain, the test [1] still tries to allocate 
32GB of RAM with no setrlimit restrictions.  On most targets that people use 
for cross-testing this is not an issue because either
- the target is 32-bit, so there is no 32GB user-space to speak of, or
- the target board has small amount of RAM and no swap, so allocation 
immediately fails, or
- the target board has plenty of RAM, so allocating 32GB is not an issue.

However, if one is testing on a 64-bit board with 16GB of RAM and 16GB of swap, 
then one gets into an obscure near-OOM swapping condition.  This is exactly the 
case with cross-testing aarch64-linux-gnu toolchains on APM Mustang.

The attached patch removes the "native" restriction from the configure test for 
setrlimit.  This enables setrlimit restrictions on the testsuite, and the test 
[1] fails to allocate 32GB, as expected, due to the setrlimit restriction.
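
To illustrate the mechanism the tests depend on, here is a minimal sketch of a
process capping its own address space and then seeing a large allocation fail.
It is not the testsuite's helper: the function name and the sizes are made up
for this example, it assumes a POSIX system where RLIMIT_AS is enforced, and it
runs the experiment in a forked child so the limit does not affect the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns true if, in a child process whose address
// space is capped at LIMIT_BYTES, an allocation of REQUEST_BYTES fails.
bool
allocation_fails_under_limit (rlim_t limit_bytes, std::size_t request_bytes)
{
  pid_t pid = fork ();
  if (pid < 0)
    return false;		// Could not fork; nothing was demonstrated.

  if (pid == 0)
    {
      // Child: lower both the soft and the hard limit, then allocate.
      struct rlimit lim;
      lim.rlim_cur = limit_bytes;
      lim.rlim_max = limit_bytes;
      if (setrlimit (RLIMIT_AS, &lim) != 0)
	_exit (2);
      // volatile keeps the compiler from optimizing the allocation away.
      void *volatile p = std::malloc (request_bytes);
      _exit (p == nullptr ? 0 : 1);
    }

  // Parent: exit status 0 means the allocation was refused.
  int status = 0;
  if (waitpid (pid, &status, 0) != pid)
    return false;
  return WIFEXITED (status) && WEXITSTATUS (status) == 0;
}

void
setrlimit_example ()
{
  // With a 256 MiB cap, a 1 GiB request should be refused immediately
  // instead of pushing the machine into swap.
  assert (allocation_fails_under_limit (rlim_t (256) << 20,
					std::size_t (1) << 30));
}
```

Without the cap, the same request may succeed or drive the board into heavy
swapping, which is the situation described above for cross-testing.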

I have tested it on x86_64-linux-gnu and i686-linux-gnu native toolchains, and 
aarch64-linux-gnu and arm-linux-gnueabi[hf] cross-toolchains with no 
regressions [*].

OK to commit?

I didn't go as far as enabling setenv/locale tests when cross-testing libstdc++ 
because I remember issues with generating locales in cross-built glibc.  In 
any case, locale tests are unlikely to OOM the test board the way that absence 
of setrlimit does.

[1] 27_io/ios_base/storage/2.cc

[*] Cross-testing using user-mode QEMU made the 27_io/fpos/14775.cc execution 
test FAIL.  This test uses setrlimit to set the max file size, and is misbehaving 
only under QEMU.  I believe this is a QEMU issue with not handling setrlimit 
correctly.

--
Maxim Kuvyrkov
www.linaro.org




0001-Use-setrlimit-for-testing-libstdc-in-cross-toolchain.patch
Description: Binary data


Re: [OpenACC] declare directive

2015-11-11 Thread Jakub Jelinek
On Wed, Nov 11, 2015 at 11:08:21AM +0100, Thomas Schwinge wrote:
> Hi!
> 
> On Wed, 11 Nov 2015 09:32:33 +0100, Jakub Jelinek  wrote:
> > On Mon, Nov 09, 2015 at 05:11:44PM -0600, James Norris wrote:
> > > diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
> > > index 953c4e3..c6a2981 100644
> > > --- a/gcc/c-family/c-pragma.h
> > > +++ b/gcc/c-family/c-pragma.h
> > > @@ -30,6 +30,7 @@ enum pragma_kind {
> > >PRAGMA_OACC_ATOMIC,
> > >PRAGMA_OACC_CACHE,
> > >PRAGMA_OACC_DATA,
> > > +  PRAGMA_OACC_DECLARE,
> > >PRAGMA_OACC_ENTER_DATA,
> > >PRAGMA_OACC_EXIT_DATA,
> > >PRAGMA_OACC_KERNELS,
> > 
> > This change will make PR68271 even worse, so would be really nice to
> > get that fixed first.
> 
> "Would be really nice" means that you're asking us to work on and resolve
> PR68271 before installing this patch?

Dominique has committed a quick hack for this, so it is not urgent, but
would be nice to get it resolved.  If somebody from Mentor gets to that,
perfect, otherwise I (or somebody else) will get to that eventually.

Jakub


[gomp4.5] depend nowait support for target

2015-11-11 Thread Jakub Jelinek
Hi!

On Mon, Oct 19, 2015 at 10:47:54PM +0300, Ilya Verbin wrote:
> So, here is what I have for now.  Attached target-29.c testcase works fine 
> with
> MIC emul, however I don't know how to (and where) properly check for 
> completion
> of async execution on target.  And, similarly, where to do unmapping after 
> that?
> Do we need a callback from plugin to libgomp (as far as I understood, PTX
> runtime supports this, but HSA doesn't), or libgomp will just check for
> ttask->is_completed in task.c?

Here is the patch updated to have a task.c defined function that the plugin
can call upon completion of async offloading execution.
The testsuite coverage will need to improve; the testcase is wrong
(it contains data races).  If you want to test parallel running of two target
regions that both touch the same var, I'd say best would be to use
#pragma omp atomic and OR in 4 in one case and 1 in another case, then
test if the result is 5 (and similarly for the other var).
Also, with the usleeps Alex Monakov will be unhappy because PTX newlib does
not have it, but we'll need to find some solution for that.
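
A minimal sketch of the race-free check suggested above could look like the
following.  It is not the target-29.c testcase; the function name is made up,
and it only assumes that the two nowait target regions may run concurrently.
Built without -fopenmp the pragmas are ignored and the two updates simply run
one after the other.

```cpp
#include <cassert>

// Hypothetical test body: two asynchronous target regions update the same
// variable with atomic ORs, so the result does not depend on their order.
int
or_bits_in_two_target_regions ()
{
  int v = 0;

#pragma omp parallel
#pragma omp single
  {
#pragma omp target nowait map(tofrom: v)
    {
#pragma omp atomic
      v |= 4;
    }

#pragma omp target nowait map(tofrom: v)
    {
#pragma omp atomic
      v |= 1;
    }

    // Wait for both target tasks before reading v.
#pragma omp taskwait
  }

  return v;
}

void
target_nowait_example ()
{
  // Both bits must be set whichever region finished first.
  assert (or_bits_in_two_target_regions () == 5);
}
```

Because OR is commutative and each update is atomic, a result of 5 shows that
both regions ran to completion without relying on timing or on usleep.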

Another thing to work on is testsuite coverage: it is desirable to test
nowait target tasks (both with and without depend) being awaited in all
the various waiting spots, i.e. end of parallel, barrier, taskwait, end of
taskgroup, or an if (0) task with a depend clause waiting on that.

Also, I wonder what to do if #pragma omp target nowait is used outside of
(host) parallel - when team is NULL.  All the tasking code in that case just
executes tasks undeferred, which is fine for all but target nowait - there
it is I'd say useful to be able to run a single host thread concurrently
with some async offloading tasks.  So, I wonder if in that case,
if we encounter target nowait with team == NULL, we should not just create a
dummy non-active (nthreads == 1) team, as if there was #pragma omp parallel
if (0) starting above it and ending at program's end.  In OpenMP, the
program's initial thread is implicitly surrounded by inactive parallel, so
this isn't anything against the OpenMP execution model.  But we'd need to
free the team somewhere in a destructor.

Can you please try to clean up the liboffloadmic side of this, so that
a callback instead of hardcoded __gomp_offload_intelmic_async_completed call
is used?  Can you make sure it works on XeonPhi non-emulated too?

I'll keep working on the testcase coverage and on the team == NULL case.

The patch is on top of gomp-4_5-branch - needs Aldy's priority_queue stuff.

--- liboffloadmic/runtime/offload_host.cpp.jj   2015-11-05 11:31:05.013916598 +0100
+++ liboffloadmic/runtime/offload_host.cpp  2015-11-10 12:58:55.090951303 +0100
@@ -64,6 +64,9 @@ static void __offload_fini_library(void)
 #define GET_OFFLOAD_NUMBER(timer_data) \
 timer_data? timer_data->offload_number : 0
 
+extern "C" void
+__gomp_offload_intelmic_async_completed (const void *);
+
 extern "C" {
 #ifdef TARGET_WINNT
 // Windows does not support imports from libraries without actually
@@ -2507,7 +2510,7 @@ extern "C" {
 const void *info
 )
 {
-   /* TODO: Call callback function, pass info.  */
+   __gomp_offload_intelmic_async_completed (info);
 }
 }
 
--- liboffloadmic/plugin/libgomp-plugin-intelmic.cpp.jj 2015-10-14 10:24:10.922194230 +0200
+++ liboffloadmic/plugin/libgomp-plugin-intelmic.cpp    2015-11-11 15:48:55.428967827 +0100
@@ -192,11 +192,23 @@ GOMP_OFFLOAD_get_num_devices (void)
 
 static void
 offload (const char *file, uint64_t line, int device, const char *name,
-int num_vars, VarDesc *vars, VarDesc2 *vars2)
+int num_vars, VarDesc *vars, VarDesc2 *vars2, const void **async_data)
 {
   OFFLOAD ofld = __offload_target_acquire1 (, file, line);
   if (ofld)
-__offload_offload1 (ofld, name, 0, num_vars, vars, vars2, 0, NULL, NULL);
+{
+  if (async_data == NULL)
+   __offload_offload1 (ofld, name, 0, num_vars, vars, vars2, 0, NULL,
+   NULL);
+  else
+   {
+ OffloadFlags flags;
+ flags.flags = 0;
+ flags.bits.omp_async = 1;
+ __offload_offload3 (ofld, name, 0, num_vars, vars, NULL, 0, NULL,
+ async_data, 0, NULL, flags, NULL);
+   }
+}
   else
 {
   fprintf (stderr, "%s:%d: Offload target acquire failed\n", file, line);
@@ -218,7 +230,7 @@ GOMP_OFFLOAD_init_device (int device)
   TRACE ("");
   pthread_once (_image_is_registered, register_main_image);
   offload (__FILE__, __LINE__, device, "__offload_target_init_proc", 0,
-  NULL, NULL);
+  NULL, NULL, NULL);
 }
 
 extern "C" void
@@ -240,7 +252,7 @@ get_target_table (int device, int _f
   VarDesc2 vd1g[2] = { { "num_funcs", 0 }, { "num_vars", 0 } };
 
   offload (__FILE__, __LINE__, device, "__offload_target_table_p1", 2,
-  vd1, vd1g);
+  vd1, vd1g, NULL);
 
   int table_size = num_funcs + 2 * num_vars;
   if (table_size > 0)
@@ -254,7 +266,7 @@ 

Re: [PR64164] drop copyrename, integrate into expand

2015-11-11 Thread Alexandre Oliva
On Nov 10, 2015, Jeff Law  wrote:

>> * function.c (assign_parm_setup_block): Right-shift
>> upward-padded big-endian args when bypassing the stack slot.
> Don't you need to check the value of BLOCK_REG_PADDING at runtime?
> The padding is essentially allowed to vary.

Well, yeah, it's the result of BLOCK_REG_PADDING that tells whether
upward-padding occurred and shifting is required.

> If you look at the other places where BLOCK_REG_PADDING is used, it's
> checked in a #ifdef, then again inside an if conditional.

That's what I do in the patch too.

That said, the initial conditions in the if/else-if/else chain for the
no-larger-than-a-word case cover all of the non-BLOCK_REG_PADDING cases
correctly, so that, if BLOCK_REG_PADDING is not defined, we can just
skip the !MEM_P block altogether.  That's also the reason why we can go
straight to shifting when we get there.

I tried to document my reasoning in the comments, but maybe it was still
too obscure?

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [v3 PATCH] LWG 2510, make the default constructors of library tag types explicit.

2015-11-11 Thread Dominique d'Humières
Revision r230175

> 2015-11-10  Ville Voutilainen  
>
> LWG 2510, make the default constructors of library tag types
> explicit.
> * include/bits/mutex.h (defer_lock_t, try_lock_t,
> adopt_lock_t): Add an explicit default constructor.
> * include/bits/stl_pair.h (piecewise_construct_t): Likewise.
> * include/bits/uses_allocator.h (allocator_arg_t): Likewise.
> * libsupc++/new (nothrow_t): Likewise.
> * testsuite/17_intro/tag_type_explicit_ctor.cc: New.

 breaks bootstrap

libtool: compile:  /opt/gcc/build_w/./gcc/xgcc -shared-libgcc 
-B/opt/gcc/build_w/./gcc -nostdinc++ 
-L/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/src 
-L/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/src/.libs 
-L/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/libsupc++/.libs 
-B/opt/gcc/gcc6w/x86_64-apple-darwin14.5.0/bin/ 
-B/opt/gcc/gcc6w/x86_64-apple-darwin14.5.0/lib/ -isystem 
/opt/gcc/gcc6w/x86_64-apple-darwin14.5.0/include -isystem 
/opt/gcc/gcc6w/x86_64-apple-darwin14.5.0/sys-include 
-I/opt/gcc/work/libstdc++-v3/../libgcc 
-I/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/include/x86_64-apple-darwin14.5.0
 -I/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/include 
-I/opt/gcc/work/libstdc++-v3/libsupc++ -D_GLIBCXX_SHARED 
-fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual -Wabi 
-fdiagnostics-show-location=once -fvisibility-inlines-hidden 
-ffunction-sections -fdata-sections -frandom-seed=new_handler.lo -g -O2 
-std=gnu++11 -c ../../../../work/libstdc++-v3/libsupc++/new_handler.cc  
-fno-common -DPIC -D_GLIBCXX_SHARED -o new_handler.o
../../../../work/libstdc++-v3/libsupc++/new_handler.cc:37:39: error: converting 
to 'std::nothrow_t' from initializer list would use explicit constructor 
'constexpr std::nothrow_t::nothrow_t()'
 const std::nothrow_t std::nothrow = { };
   ^
see https://gcc.gnu.org/ml/gcc-regression/2015-11/

Dominique



Re: [v3 PATCH] LWG 2510, make the default constructors of library tag types explicit.

2015-11-11 Thread Jonathan Wakely

On 11/11/15 18:17 +0100, Dominique d'Humières wrote:

Revision r230175


2015-11-10  Ville Voutilainen  

LWG 2510, make the default constructors of library tag types
explicit.
* include/bits/mutex.h (defer_lock_t, try_lock_t,
adopt_lock_t): Add an explicit default constructor.
* include/bits/stl_pair.h (piecewise_construct_t): Likewise.
* include/bits/uses_allocator.h (allocator_arg_t): Likewise.
* libsupc++/new (nothrow_t): Likewise.
* testsuite/17_intro/tag_type_explicit_ctor.cc: New.


breaks bootstrap

libtool: compile:  /opt/gcc/build_w/./gcc/xgcc -shared-libgcc 
-B/opt/gcc/build_w/./gcc -nostdinc++ 
-L/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/src 
-L/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/src/.libs 
-L/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/libsupc++/.libs 
-B/opt/gcc/gcc6w/x86_64-apple-darwin14.5.0/bin/ 
-B/opt/gcc/gcc6w/x86_64-apple-darwin14.5.0/lib/ -isystem 
/opt/gcc/gcc6w/x86_64-apple-darwin14.5.0/include -isystem 
/opt/gcc/gcc6w/x86_64-apple-darwin14.5.0/sys-include 
-I/opt/gcc/work/libstdc++-v3/../libgcc 
-I/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/include/x86_64-apple-darwin14.5.0
 -I/opt/gcc/build_w/x86_64-apple-darwin14.5.0/libstdc++-v3/include 
-I/opt/gcc/work/libstdc++-v3/libsupc++ -D_GLIBCXX_SHARED 
-fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual -Wabi 
-fdiagnostics-show-location=once -fvisibility-inlines-hidden 
-ffunction-sections -fdata-sections -frandom-seed=new_handler.lo -g -O2 
-std=gnu++11 -c ../../../../work/libstdc++-v3/libsupc++/new_handler.cc  
-fno-common -DPIC -D_GLIBCXX_SHARED -o new_handler.o
../../../../work/libstdc++-v3/libsupc++/new_handler.cc:37:39: error: converting 
to 'std::nothrow_t' from initializer list would use explicit constructor 
'constexpr std::nothrow_t::nothrow_t()'
const std::nothrow_t std::nothrow = { };
  ^
see https://gcc.gnu.org/ml/gcc-regression/2015-11/


Fixed by this patch.
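
For anyone hitting the same error elsewhere, here is a small sketch of the
difference.  tag_t is a stand-in written for this example, not the library's
nothrow_t; it only mirrors the shape of a tag type with an explicit default
constructor.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for a library tag type after the LWG 2510 change.
struct tag_t
{
  explicit tag_t () = default;
};

// Copy-list-initialization from an empty list is the form rejected in the
// log above, because it would use the explicit constructor:
//
//   const tag_t bad = { };
//
// Naming the type makes it direct-list-initialization, which is allowed.
const tag_t ok = tag_t{ };

void
explicit_tag_example ()
{
  (void) ok;
  // The type is still default constructible; only the implicit form is gone.
  assert (std::is_default_constructible<tag_t>::value);
}
```

The point of the explicit constructor is that a bare { } can no longer be
silently converted into a tag where a function takes one by value.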

commit 97c2da9d4cc11bd5dae077ccb5fda4e72f7c34d5
Author: Jonathan Wakely 
Date:   Wed Nov 11 17:27:23 2015 +

	* libsupc++/new_handler.cc: Fix for explicit constructor change.

diff --git a/libstdc++-v3/libsupc++/new_handler.cc b/libstdc++-v3/libsupc++/new_handler.cc
index a09012c..4da48b3 100644
--- a/libstdc++-v3/libsupc++/new_handler.cc
+++ b/libstdc++-v3/libsupc++/new_handler.cc
@@ -34,7 +34,7 @@ namespace
 }
 #endif
 
-const std::nothrow_t std::nothrow = { };
+const std::nothrow_t std::nothrow = std::nothrow_t{ };
 
 using std::new_handler;
 namespace


Re: [PATCH][ARC] Fix ARC backend ICE on pr29921-2

2015-11-11 Thread Joern Wolfgang Rennecke



On 11/11/15 15:22, Claudiu Zissulescu wrote:

Please find attached a patch that fixes the ARC backend ICE on pr29921-2 test 
from gcc.dg (dg.exp).

The patch will allow generating conditional move also outside expand scope. The 
error was triggered during if-conversion.

Ok to apply?


OK.



[patch] libstdc++/60421 (again) Loop in std::this_thread sleep functions

2015-11-11 Thread Jonathan Wakely

This fixes part of PR 60421 by looping in this_thread::sleep_for when
it is interrupted by a signal, and looping in this_thread::sleep_until
to handle clock adjustments.
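
Outside the library, the two loops can be sketched as below.  The function
names are made up for this example and this is not the libstdc++ code; the
first function shows the retry on EINTR with the remaining time passed back
into nanosleep, and the second shows re-checking the clock after each sleep.
Unlike the patch, the sketch loops for steady clocks too, which is harmless
but not needed.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <ctime>
#include <time.h>	// POSIX nanosleep

// Sleep for D, restarting with the remaining time if a signal interrupts.
void
sleep_for_retrying (std::chrono::nanoseconds d)
{
  if (d <= std::chrono::nanoseconds::zero ())
    return;

  auto s = std::chrono::duration_cast<std::chrono::seconds> (d);
  auto ns = d - s;
  struct timespec ts =
    {
      static_cast<std::time_t> (s.count ()),
      static_cast<long> (ns.count ())
    };
  // On EINTR nanosleep stores the unslept time in its second argument,
  // so passing &ts twice resumes where the interrupted call left off.
  while (::nanosleep (&ts, &ts) == -1 && errno == EINTR)
    { }
}

// Sleep until ATIME, re-reading the clock in case it was adjusted.
template<typename Clock, typename Duration>
  void
  sleep_until_retrying (const std::chrono::time_point<Clock, Duration>& atime)
  {
    auto now = Clock::now ();
    while (now < atime)
      {
	sleep_for_retrying
	  (std::chrono::duration_cast<std::chrono::nanoseconds> (atime - now));
	now = Clock::now ();
      }
  }

void
sleep_example ()
{
  using clock = std::chrono::steady_clock;
  auto deadline = clock::now () + std::chrono::milliseconds (2);
  sleep_until_retrying (deadline);
  assert (clock::now () >= deadline);
}
```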

There are still problems with integer overflow/wrapping in sleep_for,
which still need to be addressed somehow. Maybe using the new
overflow-checking built-ins.

Tested powerpc64le-linux, committed to trunk.

commit 1773ceda34abcbe088048786ac869ee1740ce1d9
Author: Jonathan Wakely 
Date:   Wed Nov 11 16:16:55 2015 +

Loop in std::this_thread sleep functions

	PR libstdc++/60421
	* include/std/thread (this_thread::sleep_for): Retry on EINTR.
	(this_thread::sleep_until): Retry if time not reached.
	* src/c++11/thread.cc (__sleep_for): Retry on EINTR.
	* testsuite/30_threads/this_thread/60421.cc: Test interruption and
	non-steady clocks.

diff --git a/libstdc++-v3/include/std/thread b/libstdc++-v3/include/std/thread
index c67ec46..5940e6e 100644
--- a/libstdc++-v3/include/std/thread
+++ b/libstdc++-v3/include/std/thread
@@ -297,7 +297,8 @@ _GLIBCXX_END_NAMESPACE_VERSION
 	static_cast(__s.count()),
 	static_cast(__ns.count())
 	  };
-	::nanosleep(&__ts, 0);
+	while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)
+	  { }
 #else
 	__sleep_for(__s, __ns);
 #endif
@@ -309,8 +310,17 @@ _GLIBCXX_END_NAMESPACE_VERSION
   sleep_until(const chrono::time_point<_Clock, _Duration>& __atime)
   {
 	auto __now = _Clock::now();
-	if (__now < __atime)
-	  sleep_for(__atime - __now);
+	if (_Clock::is_steady)
+	  {
+	if (__now < __atime)
+	  sleep_for(__atime - __now);
+	return;
+	  }
+	while (__now < __atime)
+	  {
+	sleep_for(__atime - __now);
+	__now = _Clock::now();
+	  }
   }
 
   _GLIBCXX_END_NAMESPACE_VERSION
diff --git a/libstdc++-v3/src/c++11/thread.cc b/libstdc++-v3/src/c++11/thread.cc
index e116afa..3407e80 100644
--- a/libstdc++-v3/src/c++11/thread.cc
+++ b/libstdc++-v3/src/c++11/thread.cc
@@ -221,7 +221,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	static_cast<std::time_t>(__s.count()),
 	static_cast<long>(__ns.count())
   };
-::nanosleep(&__ts, 0);
+while (::nanosleep(&__ts, &__ts) == -1 && errno == EINTR)
+  { }
 #elif defined(_GLIBCXX_HAVE_SLEEP)
 # ifdef _GLIBCXX_HAVE_USLEEP
 ::sleep(__s.count());
diff --git a/libstdc++-v3/testsuite/30_threads/this_thread/60421.cc b/libstdc++-v3/testsuite/30_threads/this_thread/60421.cc
index ecc4deb..5dbf257 100644
--- a/libstdc++-v3/testsuite/30_threads/this_thread/60421.cc
+++ b/libstdc++-v3/testsuite/30_threads/this_thread/60421.cc
@@ -15,12 +15,19 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-options "-std=gnu++11" }
+// { dg-do run { target *-*-freebsd* *-*-dragonfly* *-*-netbsd* *-*-linux* *-*-gnu* *-*-solaris* *-*-cygwin *-*-rtems* *-*-darwin* powerpc-ibm-aix* } }
+// { dg-options " -std=gnu++11 -pthread" { target *-*-freebsd* *-*-dragonfly* *-*-netbsd* *-*-linux* *-*-gnu* powerpc-ibm-aix* } }
+// { dg-options " -std=gnu++11 -pthreads" { target *-*-solaris* } }
+// { dg-options " -std=gnu++11 " { target *-*-cygwin *-*-rtems* *-*-darwin* } }
 // { dg-require-cstdint "" }
+// { dg-require-gthreads "" }
 // { dg-require-time "" }
 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include 
 
 void
@@ -28,11 +35,64 @@ test01()
 {
   std::this_thread::sleep_for(std::chrono::seconds(0));
   std::this_thread::sleep_for(std::chrono::seconds(-1));
-  std::this_thread::sleep_for(std::chrono::duration::zero());
+  std::this_thread::sleep_for(std::chrono::duration::zero());
+}
+
+void
+test02()
+{
+  bool test __attribute__((unused)) = true;
+
+  // test interruption of this_thread::sleep_for() by a signal
+  struct sigaction sa{ };
+  sa.sa_handler = +[](int) { };
+  sigaction(SIGUSR1, &sa, 0);
+  bool result = false;
+  std::atomic<bool> sleeping{false};
+  std::thread t([&result, &sleeping] {
+auto start = std::chrono::system_clock::now();
+auto time = std::chrono::seconds(3);
+sleeping = true;
+std::this_thread::sleep_for(time);
+result = std::chrono::system_clock::now() >= (start + time);
+  });
+  while (!sleeping) { }
+  std::this_thread::sleep_for(std::chrono::milliseconds(500));
+  pthread_kill(t.native_handle(), SIGUSR1);
+  t.join();
+  VERIFY( result );
+}
+
+struct slow_clock
+{
+  using rep = std::chrono::system_clock::rep;
+  using period = std::chrono::system_clock::period;
+  using duration = std::chrono::system_clock::duration;
+  using time_point = std::chrono::time_point;
+  static constexpr bool is_steady = false;
+
+  static time_point now()
+  {
+auto real = std::chrono::system_clock::now();
+return time_point{real.time_since_epoch() / 2};
+  }
+};
+
+void
+test03()
+{
+  bool test __attribute__((unused)) = true;
+
+  // test that this_thread::sleep_until() handles clock adjustments
+  auto when = slow_clock::now() + std::chrono::seconds(2);
+  std::this_thread::sleep_until(when);
+  VERIFY( 

Re: [ptx] partitioning optimization

2015-11-11 Thread Thomas Schwinge
Hi!

On Wed, 11 Nov 2015 08:59:17 -0500, Nathan Sidwell  wrote:
> On 11/11/15 07:06, Bernd Schmidt wrote:
> > On 11/10/2015 11:33 PM, Nathan Sidwell wrote:
> >> I've been unable to introduce a testcase for this.

(But you still committed an update to gcc/testsuite/ChangeLog.)

You'll need to put such an offloading test into the libgomp testsuite --
offloading compilation requires linking, and during that, the offloading
compiler(s) will be invoked, which only the libgomp testsuite is set up
to do, as discussed before.

> >> The difficulty is we
> >> want to check an rtl dump from the acceleration compiler, and there
> >> doesn't  appear to be existing machinery for that in the testsuite.
> >> Perhaps something to be added later?
> >
> > What's the difficulty exactly? Getting a dump should be possible with
> > -foffload=-fdump-whatever, does the testsuite have a problem finding the
> > right filename?

Currently, this will create cc* files, for example ccdjj2z9.o.271r.final
for -foffload=-fdump-rtl-final.  (I don't know if you can come up with
dg-* directives to scan these.)  The reason is -- I think -- because of
the lto-wrapper and/or mkoffloads not specifying a more suitable "base
name" for the temporary input files to lto1.

> That's not the problem.  How to conditionally enable the test is the
> difficulty.  I suspect porting something concerning accel_compiler from the
> libgomp testsuite is needed?

Use "{ target openacc_nvidia_accel_selected }", as implemented by
libgomp/testsuite/lib/libgomp.exp:check_effective_target_openacc_nvidia_accel_selected
(already present on trunk).


Grüße
 Thomas




open acc default data attribute

2015-11-11 Thread Nathan Sidwell

Jakub,
this patch implements default data attribute determination.  The current
behaviour defaults to 'copy' and ignores 'default(none)'.  The patch corrects that.


1) We emit a diagnostic when 'default(none)' is in effect.  The fortran FE emits 
some artificial decls that it doesn't otherwise annotate, which is why we check 
DECL_ARTIFICIAL.  IIUC Cesar had a patch to address that but it needed some 
reworking?


2) 'copy' is the correct default for 'kernels' regions, but for a 'parallel' 
region, scalars should be 'firstprivate', which is what this patch implements.
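
The defaulting rule can be modelled in a few lines of standalone C++. This is only a sketch of the rule as described above, not the gimplify.c code; Region, DataAttr, strip_one and default_data_attribute are names invented for the example. Like the patch, it looks through one level of pointer or reference before deciding whether the type is an aggregate.

```cpp
#include <type_traits>

enum class Region { Kernels, Parallel };
enum class DataAttr { Copy, FirstPrivate };

// Strip one level of reference or pointer, mirroring the single
// TREE_TYPE step in the patch (illustrative model only).
template<typename T> struct strip_one     { using type = T; };
template<typename T> struct strip_one<T&> { using type = T; };
template<typename T> struct strip_one<T*> { using type = T; };

// Arrays, classes and unions stand in for AGGREGATE_TYPE_P here.
template<typename T>
constexpr bool is_aggregate_like()
{
  return std::is_array<T>::value
	 || std::is_class<T>::value
	 || std::is_union<T>::value;
}

// 'kernels': everything defaults to copy.
// 'parallel': aggregates default to copy, scalars to firstprivate.
template<typename T>
constexpr DataAttr default_data_attribute(Region region)
{
  return (region == Region::Kernels
	  || is_aggregate_like<typename strip_one<T>::type>())
	 ? DataAttr::Copy
	 : DataAttr::FirstPrivate;
}
```

With this model, default_data_attribute<int>(Region::Parallel) yields FirstPrivate and default_data_attribute<int[32]>(Region::Parallel) yields Copy, which matches the 'val' and 'ary' cases in the libgomp test below.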


ok?

nathan
2015-11-11  Nathan Sidwell  

	gcc/
	* gimplify.c (oacc_default_clause): New.
	(omp_notice_variable): Call it.

	gcc/testsuite/
	* c-c++-common/goacc/data-default-1.c: New.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/default-1.c: New.

Index: gcc/gimplify.c
===
--- gcc/gimplify.c	(revision 230169)
+++ gcc/gimplify.c	(working copy)
@@ -5900,6 +5900,60 @@ omp_default_clause (struct gimplify_omp_
   return flags;
 }
 
+
+/* Determine outer default flags for DECL mentioned in an OACC region
+   but not declared in an enclosing clause.  */
+
+static unsigned
+oacc_default_clause (struct gimplify_omp_ctx *ctx, tree decl, unsigned flags)
+{
+  const char *rkind;
+
+  switch (ctx->region_type)
+{
+default:
+  gcc_unreachable ();
+
+case ORT_ACC_KERNELS:
+  /* Everything under kernels are default 'present_or_copy'.  */
+  flags |= GOVD_MAP;
+  rkind = "kernels";
+  break;
+
+case ORT_ACC_PARALLEL:
+  {
+	tree type = TREE_TYPE (decl);
+
+	if (TREE_CODE (type) == REFERENCE_TYPE
+	|| POINTER_TYPE_P (type))
+	  type = TREE_TYPE (type);
+
+	if (AGGREGATE_TYPE_P (type))
+	  /* Aggregates default to 'present_or_copy'.  */
+	  flags |= GOVD_MAP;
+	else
+	  /* Scalars default to 'firstprivate'.  */
+	  flags |= GOVD_FIRSTPRIVATE;
+	rkind = "parallel";
+  }
+  break;
+}
+
+  if (DECL_ARTIFICIAL (decl))
+; /* We can get compiler-generated decls, and should not complain
+	 about them.  */
+  else if (ctx->default_kind == OMP_CLAUSE_DEFAULT_NONE)
+{
+  error ("%qE not specified in enclosing OpenACC %s construct",
+	 DECL_NAME (lang_hooks.decls.omp_report_decl (decl)), rkind);
+  error_at (ctx->location, "enclosing OpenACC %s construct", rkind);
+}
+  else
+gcc_checking_assert (ctx->default_kind == OMP_CLAUSE_DEFAULT_SHARED);
+
+  return flags;
+}
+
 /* Record the fact that DECL was used within the OMP context CTX.
IN_CODE is true when real code uses DECL, and false when we should
merely emit default(none) errors.  Return true if DECL is going to
@@ -6023,7 +6077,12 @@ omp_notice_variable (struct gimplify_omp
 		nflags |= GOVD_MAP | GOVD_EXPLICIT;
 	  }
 	else if (nflags == flags)
-	  nflags |= GOVD_MAP;
+	  {
+		if ((ctx->region_type & ORT_ACC) != 0)
+		  nflags = oacc_default_clause (ctx, decl, flags);
+		else
+		  nflags |= GOVD_MAP;
+	  }
 	  }
 	found_outer:
 	  omp_add_variable (ctx, decl, nflags);
Index: gcc/testsuite/c-c++-common/goacc/data-default-1.c
===
--- gcc/testsuite/c-c++-common/goacc/data-default-1.c	(revision 0)
+++ gcc/testsuite/c-c++-common/goacc/data-default-1.c	(working copy)
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+
+
+int main ()
+{
+  int n = 2;
+  int ary[2];
+  
+#pragma acc parallel default (none) /* { dg-message "parallel construct" 2 } */
+  {
+ary[0] /* { dg-error "not specified in enclosing" } */
+  = n; /* { dg-error "not specified in enclosing" } */
+  }
+
+#pragma acc kernels default (none) /* { dg-message "kernels construct" 2 } */
+  {
+ary[0] /* { dg-error "not specified in enclosing" } */
+  = n; /* { dg-error "not specified in enclosing" } */
+  }
+
+#pragma acc data copy (ary, n)
+  {
+#pragma acc parallel default (none)
+{
+  ary[0]
+	= n;
+}
+
+#pragma acc kernels default (none)
+{
+  ary[0]
+	= n;
+}
+  }
+
+  return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c	(working copy)
@@ -0,0 +1,87 @@
+/* { dg-do run } */
+
+#include <openacc.h>
+
+int test_parallel ()
+{
+  int ok = 1;
+  int val = 2;
+  int ary[32];
+  int ondev = 0;
+
+  for (int i = 0; i < 32; i++)
+ary[i] = ~0;
+
+  /* val defaults to firstprivate, ary defaults to copy.  */
+#pragma acc parallel num_gangs (32) copy (ok) copy(ondev)
+  {
+ondev = acc_on_device (acc_device_not_host);
+#pragma acc loop gang(static:1)
+for (unsigned i = 0; i < 32; i++)
+  {
+	if (val != 2)
+	  ok = 0;
+	val += i;
+	ary[i] = val;
+  }
+  }
+
+  if (ondev)
+{
+  if (!ok)
+	return 1;
+  if (val != 2)
+	return 1;
+
+ 

[PATCH][ARM] Do not expand movmisalign pattern if not in 32-bit mode

2015-11-11 Thread Kyrill Tkachov

Hi all,

The attached testcase ICEs when compiled with -march=armv6k -mthumb -Os or any
march for which -mthumb gives Thumb1:
 error: unrecognizable insn:
 }
 ^
(insn 13 12 14 5 (set (reg:SI 116 [ x ])
(unspec:SI [
(mem:SI (reg/v/f:SI 112 [ s ]) [0 MEM[(unsigned char *)s_1(D)]+0 S4 A8])
] UNSPEC_UNALIGNED_LOAD)) besttry.c:9 -1
 (nil))

The problem is that the expander expands a movmisalign pattern, but the
resulting unaligned loads don't match any define_insn because they are gated
on unaligned_access && TARGET_32BIT.
The unaligned_access expander is gated only on unaligned_access.

This small patch fixes the issue by turning off unaligned_access if
TARGET_32BIT is not true.
We can then remove TARGET_32BIT from the unaligned load/store pattern
conditions as a cleanup.
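
The effect on the default can be sketched as a small standalone model. It covers only the "unaligned_access == 2" (not set by the user) case touched by the arm_option_override hunk in this patch; TargetFlags and resolve_unaligned_access are invented names, and the real code uses GCC's globals and macros rather than a struct.

```cpp
// Illustrative model of the target properties consulted by the hunk.
struct TargetFlags
{
  bool target_32bit;  // ARM or Thumb2 state (TARGET_32BIT)
  bool arch6;         // arm_arch6
  bool arch_notm;     // arm_arch_notm
  bool arch7;         // arm_arch7
};

// requested == 2 means "use the default"; any other value is an explicit
// user choice and is passed through unchanged in this sketch.
constexpr int resolve_unaligned_access(int requested, TargetFlags t)
{
  return requested != 2
	 ? requested
	 : ((t.target_32bit && t.arch6 && (t.arch_notm || t.arch7)) ? 1 : 0);
}
```

For -march=armv6k -mthumb (Thumb1, so target_32bit is false) the default now resolves to 0, so the movmisalign expander is no longer reached with patterns that cannot match.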

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2015-11-11  Kyrylo Tkachov  

* config/arm/arm.c (arm_option_override): Require TARGET_32BIT
for unaligned_access.
* config/arm/arm.md (unaligned_loadsi): Remove redundant TARGET_32BIT
from matching condition.
(unaligned_loadhis): Likewise.
(unaligned_loadhiu): Likewise.
(unaligned_storesi): Likewise.
(unaligned_storehi): Likewise.

2015-11-11  Kyrylo Tkachov  

* gcc.target/arm/armv6-unaligned-load-ice.c: New test.
commit 3b1e68a9f7fadeeb6d7f201ce2291bf2286a4d63
Author: Kyrylo Tkachov 
Date:   Tue Nov 10 13:48:17 2015 +

[ARM] Do not expand movmisalign pattern if not in 32-bit mode

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6a0994e..4708a12 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3436,7 +3436,8 @@ arm_option_override (void)
 }
 
   /* Enable -munaligned-access by default for
- - all ARMv6 architecture-based processors
+ - all ARMv6 architecture-based processors when compiling for a 32-bit ISA
+ i.e. Thumb2 and ARM state only.
  - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
  - ARMv8 architecture-base processors.
 
@@ -3446,7 +3447,7 @@ arm_option_override (void)
 
   if (unaligned_access == 2)
 {
-  if (arm_arch6 && (arm_arch_notm || arm_arch7))
+  if (TARGET_32BIT && arm_arch6 && (arm_arch_notm || arm_arch7))
 	unaligned_access = 1;
   else
 	unaligned_access = 0;
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index ab48873..090a287 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4266,7 +4266,7 @@ (define_insn "unaligned_loadsi"
   [(set (match_operand:SI 0 "s_register_operand" "=l,r")
 	(unspec:SI [(match_operand:SI 1 "memory_operand" "Uw,m")]
 		   UNSPEC_UNALIGNED_LOAD))]
-  "unaligned_access && TARGET_32BIT"
+  "unaligned_access"
   "ldr%?\t%0, %1\t@ unaligned"
   [(set_attr "arch" "t2,any")
(set_attr "length" "2,4")
@@ -4279,7 +4279,7 @@ (define_insn "unaligned_loadhis"
 	(sign_extend:SI
 	  (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,Uh")]
 		 UNSPEC_UNALIGNED_LOAD)))]
-  "unaligned_access && TARGET_32BIT"
+  "unaligned_access"
   "ldrsh%?\t%0, %1\t@ unaligned"
   [(set_attr "arch" "t2,any")
(set_attr "length" "2,4")
@@ -4292,7 +4292,7 @@ (define_insn "unaligned_loadhiu"
 	(zero_extend:SI
 	  (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,m")]
 		 UNSPEC_UNALIGNED_LOAD)))]
-  "unaligned_access && TARGET_32BIT"
+  "unaligned_access"
   "ldrh%?\t%0, %1\t@ unaligned"
   [(set_attr "arch" "t2,any")
(set_attr "length" "2,4")
@@ -4304,7 +4304,7 @@ (define_insn "unaligned_storesi"
   [(set (match_operand:SI 0 "memory_operand" "=Uw,m")
 	(unspec:SI [(match_operand:SI 1 "s_register_operand" "l,r")]
 		   UNSPEC_UNALIGNED_STORE))]
-  "unaligned_access && TARGET_32BIT"
+  "unaligned_access"
   "str%?\t%1, %0\t@ unaligned"
   [(set_attr "arch" "t2,any")
(set_attr "length" "2,4")
@@ -4316,7 +4316,7 @@ (define_insn "unaligned_storehi"
   [(set (match_operand:HI 0 "memory_operand" "=Uw,m")
 	(unspec:HI [(match_operand:HI 1 "s_register_operand" "l,r")]
 		   UNSPEC_UNALIGNED_STORE))]
-  "unaligned_access && TARGET_32BIT"
+  "unaligned_access"
   "strh%?\t%1, %0\t@ unaligned"
   [(set_attr "arch" "t2,any")
(set_attr "length" "2,4")
diff --git a/gcc/testsuite/gcc.target/arm/armv6-unaligned-load-ice.c b/gcc/testsuite/gcc.target/arm/armv6-unaligned-load-ice.c
new file mode 100644
index 000..88528f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv6-unaligned-load-ice.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } { "-march=armv6k" } } */
+/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" } { "" } } */
+/* { dg-options "-mthumb -Os -mfloat-abi=softfp" } */
+/* { dg-add-options arm_arch_v6k } */
+
+long
+get_number (char *s, long size, int unsigned_p)
+{
+  long x;
+  unsigned char *p = (unsigned char *) s;
+  switch (size)
+{
+case 

[gomp4.5] Don't mark GOMP_MAP_FIRSTPRIVATE mapped vars addressable

2015-11-11 Thread Jakub Jelinek
Hi!

Alex reported to me privately that with the OpenMP 4.5 handling of
array section bases (that they are firstprivate instead of mapped)
we unnecessarily mark the pointers addressable, and that results
in a less efficient way of passing them as shared to inner constructs.

They don't need to be made addressable just because they appear as
bases of mapped array sections.

Fixed thusly, regtested on x86_64-linux, committed to gomp-4_5-branch.

2015-11-11  Jakub Jelinek  

c/
* c-typeck.c (c_finish_omp_clauses): Don't mark
GOMP_MAP_FIRSTPRIVATE_POINTER decls addressable.
cp/
* semantics.c (finish_omp_clauses): Don't mark
GOMP_MAP_FIRSTPRIVATE_POINTER decls addressable.

--- gcc/c/c-typeck.c.jj 2015-11-09 17:36:17.0 +0100
+++ gcc/c/c-typeck.c2015-11-10 14:25:53.592499759 +0100
@@ -12865,7 +12865,10 @@ c_finish_omp_clauses (tree clauses, bool
omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
  remove = true;
}
- else if (!c_mark_addressable (t))
+ else if ((OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP
+   || (OMP_CLAUSE_MAP_KIND (c)
+   != GOMP_MAP_FIRSTPRIVATE_POINTER))
+  && !c_mark_addressable (t))
remove = true;
  else if (!(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 && (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER
--- gcc/cp/semantics.c.jj   2015-11-06 08:08:37.0 +0100
+++ gcc/cp/semantics.c  2015-11-10 14:27:14.916355747 +0100
@@ -6566,6 +6566,9 @@ finish_omp_clauses (tree clauses, bool a
}
  else if (!processing_template_decl
   && TREE_CODE (TREE_TYPE (t)) != REFERENCE_TYPE
+  && (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP
+  || (OMP_CLAUSE_MAP_KIND (c)
+  != GOMP_MAP_FIRSTPRIVATE_POINTER))
   && !cxx_mark_addressable (t))
remove = true;
  else if (!(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP

Jakub


[PATCH, alpha]: Add TARGET_PRINT_OPERAND and friends

2015-11-11 Thread Uros Bizjak
2015-11-11  Uros Bizjak  

* config/alpha/alpha-protos.h (print_operand): Remove.
(print_operand_address): Remove.
* config/alpha/alpha.h (PRINT_OPERAND): Remove.
(PRINT_OPERAND_ADDRESS): Remove.
(PRINT_OPERAND_PUNCT_VALID_P): Remove.
* config/alpha/alpha.c (TARGET_PRINT_OPERAND): New hook define.
(TARGET_PRINT_OPERAND_ADDRESS): New hook define.
(TARGET_PRINT_OPERAND_PUNCT_VALID_P): New hook define.
(print_operand_address): Rename to...
(alpha_print_operand_address): ...this and make static.
(print_operand): Rename to...
(alpha_print_operand): ...this and make static.
(alpha_print_operand_punct_valid_p): New static function.

Bootstrapped and regression tested on alphaev68-linux-gnu, committed
to mainline SVN.

Uros.
Index: config/alpha/alpha-protos.h
===
--- config/alpha/alpha-protos.h (revision 230178)
+++ config/alpha/alpha-protos.h (working copy)
@@ -65,8 +65,6 @@ extern void alpha_expand_builtin_revert_vms_condit
 
 extern rtx alpha_return_addr (int, rtx);
 extern rtx alpha_gp_save_rtx (void);
-extern void print_operand (FILE *, rtx, int);
-extern void print_operand_address (FILE *, rtx);
 extern void alpha_initialize_trampoline (rtx, rtx, rtx, int, int, int);
 
 extern rtx alpha_va_arg (tree, tree);
Index: config/alpha/alpha.c
===
--- config/alpha/alpha.c(revision 230178)
+++ config/alpha/alpha.c(working copy)
@@ -5041,11 +5041,21 @@ get_round_mode_suffix (void)
   gcc_unreachable ();
 }
 
-/* Print an operand.  Recognize special options, documented below.  */
+/* Implement TARGET_PRINT_OPERAND_PUNCT_VALID_P.  */
 
-void
-print_operand (FILE *file, rtx x, int code)
+static bool
+alpha_print_operand_punct_valid_p (unsigned char code)
 {
+  return (code == '/' || code == ',' || code == '-' || code == '~'
+ || code == '#' || code == '*' || code == '&');
+}
+
+/* Implement TARGET_PRINT_OPERAND.  The alpha-specific
+   operand codes are documented below.  */
+
+static void
+alpha_print_operand (FILE *file, rtx x, int code)
+{
   int i;
 
   switch (code)
@@ -5064,6 +5074,8 @@ get_round_mode_suffix (void)
   break;
 
 case '/':
+  /* Generates the instruction suffix.  The TRAP_SUFFIX and ROUND_SUFFIX
+attributes are examined to determine what is appropriate.  */
   {
const char *trap = get_trap_mode_suffix ();
const char *round = get_round_mode_suffix ();
@@ -5074,12 +5086,14 @@ get_round_mode_suffix (void)
   }
 
 case ',':
-  /* Generates single precision instruction suffix.  */
+  /* Generates single precision suffix for floating point
+instructions (s for IEEE, f for VAX).  */
   fputc ((TARGET_FLOAT_VAX ? 'f' : 's'), file);
   break;
 
 case '-':
-  /* Generates double precision instruction suffix.  */
+  /* Generates double precision suffix for floating point
+instructions (t for IEEE, g for VAX).  */
   fputc ((TARGET_FLOAT_VAX ? 'g' : 't'), file);
   break;
 
@@ -5350,8 +5364,10 @@ get_round_mode_suffix (void)
 }
 }
 
-void
-print_operand_address (FILE *file, rtx addr)
+/* Implement TARGET_PRINT_OPERAND_ADDRESS.  */
+
+static void
+alpha_print_operand_address (FILE *file, machine_mode /*mode*/, rtx addr)
 {
   int basereg = 31;
   HOST_WIDE_INT offset = 0;
@@ -9877,6 +9893,13 @@ alpha_atomic_assign_expand_fenv (tree *hold, tree
 #define TARGET_STDARG_OPTIMIZE_HOOK alpha_stdarg_optimize_hook
 #endif
 
+#undef TARGET_PRINT_OPERAND
+#define TARGET_PRINT_OPERAND alpha_print_operand
+#undef TARGET_PRINT_OPERAND_ADDRESS
+#define TARGET_PRINT_OPERAND_ADDRESS alpha_print_operand_address
+#undef TARGET_PRINT_OPERAND_PUNCT_VALID_P
+#define TARGET_PRINT_OPERAND_PUNCT_VALID_P alpha_print_operand_punct_valid_p
+
 /* Use 16-bits anchor.  */
 #undef TARGET_MIN_ANCHOR_OFFSET
 #define TARGET_MIN_ANCHOR_OFFSET -0x7fff - 1
Index: config/alpha/alpha.h
===
--- config/alpha/alpha.h(revision 230178)
+++ config/alpha/alpha.h(working copy)
@@ -1005,37 +1005,6 @@ do { \
 #define ASM_OUTPUT_ADDR_DIFF_ELT(FILE, BODY, VALUE, REL) \
   fprintf (FILE, "\t.gprel32 $L%d\n", (VALUE))
 
-
-/* Print operand X (an rtx) in assembler syntax to file FILE.
-   CODE is a letter or dot (`z' in `%z0') or 0 if no letter was specified.
-   For `%' followed by punctuation, CODE is the punctuation and X is null.  */
-
-#define PRINT_OPERAND(FILE, X, CODE)  print_operand (FILE, X, CODE)
-
-/* Determine which codes are valid without a following integer.  These must
-   not be alphabetic.
-
-   ~Generates the name of the current function.
-
-   /   Generates the instruction suffix.  The TRAP_SUFFIX and ROUND_SUFFIX
-   attributes are examined to determine what is appropriate.
-
-   ,Generates 

Re: [C/C++ PATCH] Reject declarators with huge arrays (PR c/68107, c++/68266)

2015-11-11 Thread Marek Polacek
On Tue, Nov 10, 2015 at 04:48:13PM -0700, Jeff Law wrote:
> Someone (I can't recall who) suggested the overflow check ought to be
> shared, I agree.  Can you factor out that check, shove it into c-family/ and
> call it from the C & C++ front-ends?
> 
> Approved with that change.  Please post it here for archival purposes
> though.

Done, thanks.

Bootstrapped/regtested on x86_64-linux, applying to trunk.
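
For readers unfamiliar with the check being factored out, here is a standalone sketch of the same idea on plain integers: reject a size whose computation overflows or that exceeds half of the address space. It is not the GCC implementation (which works on trees via valid_constant_size_p); array_size_is_valid is a made-up name and size_t stands in for the target's size type.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (illustration only): true if nelems * elem_size can be
// computed without overflow and does not exceed half of the address space.
constexpr bool array_size_is_valid(std::size_t nelems, std::size_t elem_size)
{
  return (elem_size == 0 || nelems <= SIZE_MAX / elem_size)  // no overflow
	 && nelems * elem_size <= SIZE_MAX / 2;               // at most half
}
```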

2015-11-11  Marek Polacek  

PR c/68107
PR c++/68266
* c-common.c (valid_array_size_p): New function.
* c-common.h (valid_array_size_p): Declare.

* c-decl.c (grokdeclarator): Call valid_array_size_p.  Remove code
checking the size of an array.

* decl.c (grokdeclarator): Call valid_array_size_p.  Remove code
checking the size of an array.

* c-c++-common/pr68107.c: New test.
* g++.dg/init/new38.C (large_array_char): Adjust dg-error.
(large_array_char_template): Likewise.
* g++.dg/init/new44.C: Adjust dg-error.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index 53c1d81..a393b32 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -13110,4 +13110,26 @@ warn_duplicated_cond_add_or_warn (location_t loc, tree cond, vec **chain)
 (*chain)->safe_push (cond);
 }
 
+/* Check if array size calculations overflow or if the array covers more
+   than half of the address space.  Return true if the size of the array
+   is valid, false otherwise.  TYPE is the type of the array and NAME is
+   the name of the array, or NULL_TREE for unnamed arrays.  */
+
+bool
+valid_array_size_p (location_t loc, tree type, tree name)
+{
+  if (type != error_mark_node
+  && COMPLETE_TYPE_P (type)
+  && TREE_CODE (TYPE_SIZE_UNIT (type)) == INTEGER_CST
+  && !valid_constant_size_p (TYPE_SIZE_UNIT (type)))
+{
+  if (name)
+   error_at (loc, "size of array %qE is too large", name);
+  else
+   error_at (loc, "size of unnamed array is too large");
+  return false;
+}
+  return true;
+}
+
 #include "gt-c-family-c-common.h"
diff --git gcc/c-family/c-common.h gcc/c-family/c-common.h
index c825454..bad8d05 100644
--- gcc/c-family/c-common.h
+++ gcc/c-family/c-common.h
@@ -1463,5 +1463,6 @@ extern bool check_no_cilk (tree, const char *, const char *,
   location_t loc = UNKNOWN_LOCATION);
 extern bool reject_gcc_builtin (const_tree, location_t = UNKNOWN_LOCATION);
 extern void warn_duplicated_cond_add_or_warn (location_t, tree, vec **);
+extern bool valid_array_size_p (location_t, tree, tree);
 
 #endif /* ! GCC_C_COMMON_H */
diff --git gcc/c/c-decl.c gcc/c/c-decl.c
index a3d8ead..fb4f5ea 100644
--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -6007,6 +6007,9 @@ grokdeclarator (const struct c_declarator *declarator,
TYPE_SIZE_UNIT (type) = size_zero_node;
SET_TYPE_STRUCTURAL_EQUALITY (type);
  }
+
+   if (!valid_array_size_p (loc, type, name))
+ type = error_mark_node;
  }
 
if (decl_context != PARM
@@ -6014,7 +6017,8 @@ grokdeclarator (const struct c_declarator *declarator,
|| array_ptr_attrs != NULL_TREE
|| array_parm_static))
  {
-   error_at (loc, "static or type qualifiers in non-parameter array declarator");
+   error_at (loc, "static or type qualifiers in non-parameter "
+ "array declarator");
array_ptr_quals = TYPE_UNQUALIFIED;
array_ptr_attrs = NULL_TREE;
array_parm_static = 0;
@@ -6293,22 +6297,6 @@ grokdeclarator (const struct c_declarator *declarator,
}
 }
 
-  /* Did array size calculations overflow or does the array cover more
- than half of the address-space?  */
-  if (TREE_CODE (type) == ARRAY_TYPE
-  && COMPLETE_TYPE_P (type)
-  && TREE_CODE (TYPE_SIZE_UNIT (type)) == INTEGER_CST
-  && ! valid_constant_size_p (TYPE_SIZE_UNIT (type)))
-{
-  if (name)
-   error_at (loc, "size of array %qE is too large", name);
-  else
-   error_at (loc, "size of unnamed array is too large");
-  /* If we proceed with the array type as it is, we'll eventually
-crash in tree_to_[su]hwi().  */
-  type = error_mark_node;
-}
-
   /* If this is declaring a typedef name, return a TYPE_DECL.  */
 
   if (storage_class == csc_typedef)
diff --git gcc/cp/decl.c gcc/cp/decl.c
index bd3f2bc..a3caa19 100644
--- gcc/cp/decl.c
+++ gcc/cp/decl.c
@@ -9945,6 +9945,9 @@ grokdeclarator (const cp_declarator *declarator,
case cdk_array:
  type = create_array_type_for_decl (dname, type,
 declarator->u.array.bounds);
+ if (!valid_array_size_p (input_location, type, dname))
+   type = error_mark_node;
+
  if (declarator->std_attributes)
/* [dcl.array]/1:
 

Re: [PATCH v4] SH FDPIC backend support

2015-11-11 Thread Rich Felker
On Wed, Nov 11, 2015 at 11:36:26PM +0900, Oleg Endo wrote:
> On Tue, 2015-11-10 at 15:07 -0500, Rich Felker wrote:
> 
> > > The way libcalls are now emitted is a bit unhandy.  If more special-ABI
> > > libcalls are to be added in the future, they all have to do the jsr vs.
> > > bsrf handling (some potential candidates for new libcalls are optimized
> > > soft FP routines).  Then we still have PR 65374 and PR 54019. In the
> > > future maybe we should come up with something that allows emitting
> > > libcalls in a more transparent way...
> > 
> > I'd like to look into improving this at some point in the near
> > future.
> > On further reading of the changes made, I think there's a lot of code
> > we could reduce or simplify.
> > 
> > In all the places where new RTL patterns were added for *call*_fdpic,
> > the main constraint change vs the non-fdpic version is using REG_PIC.
> > Is it possible to make a REG_GOT_ARG macro or similar that's defined
> > as something like TARGET_FDPIC ? REG_PIC : nonexistent_or_dummy?
> 
> I'm not sure I understand what you mean by that.  Do you have a small
> code snippet example?

Sorry, I don't really understand RTL well enough to make a code
snippet. What I want to express is that an insn "uses" (in the (use
...) sense) a register (r12) conditionally depending on a runtime
option (TARGET_FDPIC).

> > As for the call site stuff, I wonder why the existing call site stuff
> > used by "call_pcrel" can't be used for SFUNC_STATIC. 
> 
> "call_pcrel" is a real call insn.  The libcalls are not expanded as
> real call insns to avoid the regular register save/restores etc which
> is needed to do a normal function call.

Yes, I see that. What I was really wondering though is why the new
call site generation code and constraint was added when the call_pcrel
code already has mechanisms for this, rather than just duplicating the
internals that call_pcrel uses. It seems like we're doing things in a
gratuitously different way here.

> I guess the generic fix for this issue would be some mechanism to
> specify which regs are clobbered/preserved and then provide the right
> settings for the libcall functions.

Is this possible in the sh backend or does it need changes to
higher-level gcc code? (i.e. is it presently possible to make an insn
that conditionally clobbers different things rather than having to
make tons of different insns for each possible set of clobbers?)

> > I'm actually
> > trying to prepare a simpler FDPIC patch for other gcc versions we're
> > interested in that's not so invasive, and for now I'm just having
> > > function_symbol replace SFUNC_STATIC with SFUNC_GOT on TARGET_FDPIC to
> > > avoid needing all the label stuff, but it would be nice to find a way
> > > to reuse the existing framework.
> 
> Do you know how this affects code size (and inherently performance)?

I suspect it makes very little difference, but to compare I'd need to
do the same hack on 5.2.0 or trunk. The only difference should be one
additional load per call, and one additional GOT slot per function
called this way (but just once per executable/library).

Another issue I've started looking at is how r12 is put in fixed_regs,
which is conceptually wrong. Preliminary tests show that removing it
from fixed_regs doesn't break and produces much better code -- r12
gets used as a temp register in functions that don't need it, and in
one function that made multiple calls, the saving of initial r12 to a
call-saved register even happened in the delay slot of the call. I've
been discussing it with Alexander Monakov on IRC (#musl) and based on
my understanding so far of how gcc works (which admittedly may be
wrong) the current FDPIC code looks like it's written not to depend on
r12 being 'fixed'. Also I think I'm pretty close to understanding how
we could make the same improvements for non-FDPIC PIC codegen: instead
of loading r12 in the prologue, load a pseudo, then use that pseudo
for GOT access and force it into r12 the same way FDPIC call code does
for PLT calls. Does this sound correct?

Rich


Re: State of support for the ISO C++ Transactional Memory TS and remanining work

2015-11-11 Thread Szabolcs Nagy

On 10/11/15 18:29, Torvald Riegel wrote:

On Tue, 2015-11-10 at 17:26 +, Szabolcs Nagy wrote:

On 09/11/15 00:19, Torvald Riegel wrote:

Hi,

I'd like to summarize the current state of support for the TM TS, and
outline the current plan for the work that remains to complete the
support.


...

Attached is a patch by Jason that implements this check.  This adds one
symbol, which should be okay we hope.



does that mean libitm will depend on libstdc++?


No, weak references are used to avoid that.  See libitm/eh_cpp.cc for
example.



i see.


I've not yet created tests for the full list of functions specified as
transaction-safe in the TS, but my understanding is that this list was
created after someone from the ISO C++ TM study group looked at libstdc++'s
implementation and investigated which functions might be feasible
to be declared transaction-safe in it.



is that list available somewhere?


See the TM TS, N4514.



i was looking at an older version,
things make more sense now.

i think system() should not be transaction safe..

i wonder what's the plan for getting libc functions
instrumented (i assume that is needed unless hw
support is used).


xmalloc
the runtime exits on memory allocation failure,
so it is not possible to use it safely.
(i think it should be possible to roll back the
transaction in case of internal allocation failure
and retry with a strategy that does not need dynamic
allocation).


Not sure what you mean by "safely".  Hardening against out-of-memory
situations hasn't been considered to be of high priority so far, but I'd
accept patches for that that don't increase complexity signifantly and
don't hamper performance.



i consider this a library safety issue.

(a library or runtime is not safe to use if it may terminate
the process in case of internal failures.)


GTM_error, GTM_fatal
the runtime may print error messages to stderr.
stderr is owned by the application.


We need to report errors in some way, especially with something that's
still experimental such as TM.  Alternatives are using C++ exceptions to
report errors, but we'd still need something else for C, such as
handlers that the program can control.



ok, i thought the plan was to move this out of the
experimental state now.


uint64_t GTM::gtm_spin_count_var = 1000;
i guess this was supposed to be tunable.
it seems libitm needs some knobs (strategy, retries,
spin counts), but there is no easy way to adapt these
for a target/runtime environment.


Sure, more performance tuning knobs would be nice.



my problem was with getting the knobs right at runtime.

(i think this will need a solution to make tm practically
useful, there are settings that seem to be sensitive to
the properties of the underlying hw.. this also seems
to be a problem for glibc lock elision retry policies.)


sys_futex0
i'm not sure why this has arch specific implementations
for some targets but not others. (syscall is not in the
implementation reserved namespace).


Are there archs that support libitm but don't have a definition of this
one?



i thought all targets were supported on linux
(the global lock based strategies should work)
i can prepare a sys_futex0 for arm and aarch64.


these are the issues i noticed that may matter if this
is going to be a supported compiler runtime.


What do you mean by that precisely?  For it to become non-experimental?



yes, non-experimental
(since you were talking about libstdc++ changes to
complete the support for the TS).



[PATCH][ARC] Fix ARC backend ICE on pr29921-2

2015-11-11 Thread Claudiu Zissulescu
Please find attached a patch that fixes the ARC backend ICE on pr29921-2 test 
from gcc.dg (dg.exp). 

The patch will allow generating conditional move also outside expand scope. The 
error was triggered during if-conversion.

Ok to apply?
Claudiu

ChangeLog:
2015-11-11  Claudiu Zissulescu  

* config/arc/arc.c (gen_compare_reg): Swap operands also when we
do not expand to rtl.



00-fixpr29921-2.patch
Description: 00-fixpr29921-2.patch


Re: State of support for the ISO C++ Transactional Memory TS and remanining work

2015-11-11 Thread Jonathan Wakely

On 11/11/15 15:04 +, Szabolcs Nagy wrote:

yes, non-experimental
(since you were talking about libstdc++ changes to
complete the support for the TS).


The TS is experimental. That's the nature of all C++ TSs.

Completing the TS support does not mean anything is non-experimental.


[gomp4] Merge trunk r230082 (2015-11-10) into gomp-4_0-branch

2015-11-11 Thread Thomas Schwinge
Hi!

Committed to gomp-4_0-branch in r230154:

commit 1fe1fa3a7b9d4286630cd286e0a52abe2d11e955
Merge: 02d9df1 76e711b
Author: tschwinge 
Date:   Wed Nov 11 11:43:09 2015 +

svn merge -r 230048:230082 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@230154 
138bc75d-0d04-0410-961f-82ee72b054a4


Grüße
 Thomas




Re: [PATCH] PR67305, tighten neon_vector_mem_operand on eliminable registers

2015-11-11 Thread Jiong Wang


On 04/11/15 09:45, Jiong Wang wrote:

As discussed at the bugzilla

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67305

neon_vector_mem_operand is broken.  As the comment
"/* Reject eliminable registers.  */" says, the code block at the head
of this function, which checks for eliminable registers, is designed to do
early rejection only; there shouldn't be any early accept.

If this code hunk doesn't reject the incoming rtx, then the rtx pattern
should still go through all the default checks below. All other similar
functions (thumb1_legitimate_address_p, arm_coproc_mem_operand,
neon_struct_mem_operand, etc.) follow exactly this check flow.

So, as Jim Wilson commented on the bugzilla, instead of "return !strict",
we need to do the check only if strict is true, and only reject,
which means return FALSE; for all other cases, we need to go through
the normal checks below.

neon_vector_mem_operand is only used by several misalign patterns; I
guess that's why this bug was not exposed for a long time.

bootstrap & regression OK on armv8 aarch32, ok for trunk?

2015-11-04  Jiong Wang  
Jim Wilson  

gcc/
  PR target/67305
  * config/arm/arm.md (neon_vector_mem_operand): Return FALSE if strict
  be true and eliminable registers mentioned.


Ping ~
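
The control-flow change described in this message can be modelled in isolation. What follows is only a minimal sketch, not the code in arm.c: both functions and their boolean parameters are hypothetical stand-ins (mentions_eliminable for the reg_mentioned_p tests, passes_normal_checks for everything below the hunk), and the reload_in_progress / reload_completed condition is left out for brevity.

```c
#include <stdbool.h>

/* Old flow: an address that mentions an eliminable register returns
   early.  With STRICT false this is an early *accept* that skips every
   later check.  */
bool
old_flow (bool strict, bool mentions_eliminable, bool passes_normal_checks)
{
  if (mentions_eliminable)
    return !strict;
  return passes_normal_checks;
}

/* New flow: the hunk can only reject; every other case falls through
   to the normal checks.  */
bool
new_flow (bool strict, bool mentions_eliminable, bool passes_normal_checks)
{
  if (strict && mentions_eliminable)
    return false;
  return passes_normal_checks;
}
```

In this model, a non-strict query for an address that mentions an eliminable register but would fail the later checks is accepted by old_flow and rejected by new_flow, which is the difference the patch is after.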


Re: [PATCH 4b/4] [ARM] PR63870 Remove error for invalid lane numbers

2015-11-11 Thread Kyrill Tkachov


On 11/11/15 12:08, Charles Baylis wrote:

On 11 November 2015 at 11:22, Kyrill Tkachov  wrote:

Hi Charles,

On 08/11/15 00:26, charles.bay...@linaro.org wrote:

From: Charles Baylis 

  Charles Baylis  

 * config/arm/neon.md (neon_vld1_lane): Remove error for
invalid
 lane number.
 (neon_vst1_lane): Likewise.
 (neon_vld2_lane): Likewise.
 (neon_vst2_lane): Likewise.
 (neon_vld3_lane): Likewise.
 (neon_vst3_lane): Likewise.
 (neon_vld4_lane): Likewise.
 (neon_vst4_lane): Likewise.


In this pattern the 'max' variable is now unused, causing a bootstrap
-Werror failure on arm.
I'll test a patch to fix it unless you beat me to it...

Thanks for catching this.

I have a patch, and have started a bootstrap. Unless you have
objections, I'll apply as obvious once the bootstrap is complete later
this afternoon.


Yes, that's the exact patch I'm testing as well.
I'll let you finish the bootstrap and commit it.

Thanks,
Kyrill



 gcc/ChangeLog:

 2015-11-11  Charles Baylis  

 * config/arm/neon.md: (neon_vld2_lane): Remove unused max
 variable.
 (neon_vst2_lane): Likewise.
 (neon_vld3_lane): Likewise.
 (neon_vst3_lane): Likewise.
 (neon_vld4_lane): Likewise.
 (neon_vst4_lane): Likewise.




Re: [PATCH] Fix PR rtl-optimization/68287

2015-11-11 Thread Richard Biener
On Wed, Nov 11, 2015 at 12:18 PM, Martin Liška  wrote:
> Hi.
>
> There's a fix for fallout of r230027.
>
> Patch can bootstrap and survives regression tests on x86_64-linux-gnu.

Hmm, but only the new elements are zeroed so this still is different
from previous behavior.
Note that the previous .create (...) doesn't initialize the elements
either (well, it's not supposed to ...).

I _think_ the bug is that you do safe_grow and use length while the
previous code just added
enough reserve (but not actual elements!).

Thus the fix would be to do

 point_freq_vec.truncate (0);
 point_freq_vec.reserve_exact (new_length);

Richard.

> Ready for trunk?
> Thanks,
> Martin
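
The distinction drawn above between growing a vector and merely reserving space can be shown with a toy container. This is a sketch in plain C under stated assumptions, not GCC's vec API: struct int_vec and its helper functions are hypothetical and only mimic what safe_grow and the truncate plus reserve_exact pair are described as doing in the message.

```c
#include <stdlib.h>

/* Hypothetical toy vector: LENGTH counts live elements, CAPACITY counts
   allocated slots.  */
struct int_vec
{
  int *data;
  size_t length;
  size_t capacity;
};

/* Make room for exactly N elements without changing LENGTH.
   Returns 0 on success, -1 on allocation failure.  */
int
int_vec_reserve_exact (struct int_vec *v, size_t n)
{
  if (n > v->capacity)
    {
      int *p = realloc (v->data, n * sizeof *p);
      if (p == NULL)
	return -1;
      v->data = p;
      v->capacity = n;
    }
  return 0;
}

/* Grow: reserve and also set LENGTH to N.  The new elements exist but
   are uninitialized.  */
int
int_vec_grow (struct int_vec *v, size_t n)
{
  if (int_vec_reserve_exact (v, n) != 0)
    return -1;
  v->length = n;
  return 0;
}

/* Drop elements past N; capacity is kept.  */
void
int_vec_truncate (struct int_vec *v, size_t n)
{
  if (n < v->length)
    v->length = n;
}

/* Append X after the last live element.  */
int
int_vec_push (struct int_vec *v, int x)
{
  if (v->length == v->capacity
      && int_vec_reserve_exact (v, v->capacity ? 2 * v->capacity : 4) != 0)
    return -1;
  v->data[v->length++] = x;
  return 0;
}
```

With this model, a push after int_vec_grow (v, 8) lands at index 8, behind eight uninitialized slots, whereas a push after truncate to 0 plus int_vec_reserve_exact (v, 8) lands at index 0.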


Re: [PATCH] PR67305, tighten neon_vector_mem_operand on eliminable registers

2015-11-11 Thread Ramana Radhakrishnan


On 04/11/15 09:45, Jiong Wang wrote:
> As discussed at the bugzilla
> 
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67305
> 
> neon_vector_mem_operand is broken.  As the comment
> "/* Reject eliminable registers.  */" says, the code block at the head
> of this function, which checks for eliminable registers, is designed to do
> early rejection only; there shouldn't be any early accept.
> 
> If this code hunk doesn't reject the incoming rtx, then the rtx pattern
> should still go through all the default checks below. All other similar
> functions (thumb1_legitimate_address_p, arm_coproc_mem_operand,
> neon_struct_mem_operand, etc.) follow exactly this check flow.
> 
> So, as Jim Wilson commented on the bugzilla, instead of "return !strict",
> we need to do the check only if strict is true, and only reject,
> which means return FALSE; for all other cases, we need to go through
> the normal checks below.
> 
> neon_vector_mem_operand is only used by several misalign patterns; I
> guess that's why this bug was not exposed for a long time.
> 
> bootstrap & regression OK on armv8 aarch32, ok for trunk?
> 
> 2015-11-04  Jiong Wang  
> Jim Wilson  
> 
> gcc/
>   PR target/67305
>   * config/arm/arm.md (neon_vector_mem_operand): Return FALSE if strict
>   be true and eliminable registers mentioned.
> 


This has been lurking for a long time ...  Sorry about the delay in reviewing. 
This is OK for trunk

regards
Ramana

> 
> neon-mem.patch
> 
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 87e55e9..7fbf897 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -12957,14 +12957,14 @@ neon_vector_mem_operand (rtx op, int type, bool strict)
>rtx ind;
>  
>/* Reject eliminable registers.  */
> -  if (! (reload_in_progress || reload_completed)
> -  && (   reg_mentioned_p (frame_pointer_rtx, op)
> +  if (strict && ! (reload_in_progress || reload_completed)
> +  && (reg_mentioned_p (frame_pointer_rtx, op)
> || reg_mentioned_p (arg_pointer_rtx, op)
> || reg_mentioned_p (virtual_incoming_args_rtx, op)
> || reg_mentioned_p (virtual_outgoing_args_rtx, op)
> || reg_mentioned_p (virtual_stack_dynamic_rtx, op)
> || reg_mentioned_p (virtual_stack_vars_rtx, op)))
> -return !strict;
> +return FALSE;
>  
>/* Constants are converted into offsets from labels.  */
>if (!MEM_P (op))
> 


[Patch] Optimize condition reductions where the result is an integer induction variable

2015-11-11 Thread Alan Hayward
Hi,
I hoped to post this in time for Monday’s cut off date, but circumstances
delayed me until today. Hoping if possible this patch will still be able
to go in.


This patch builds upon the change for PR65947, and reduces the amount of
code produced in a vectorized condition reduction where operand 2 of the
COND_EXPR is an assignment of an increasing integer induction variable that
won't wrap.
 

For example (assuming all types are ints), this is a match:

last = 5;
for (i = 0; i < N; i++)
  if (a[i] < min_v)
last = i;

Whereas, this is not because the result is based off a memory access:
last = 5;
for (i = 0; i < N; i++)
  if (a[i] < min_v)
last = a[i];

In the integer induction variable case we can just use a MAX reduction and
skip all the code I added in my vectorized condition reduction patch - the
additional induction variables in vectorizable_reduction () and the
additional checks in vect_create_epilog_for_reduction (). From the patch
diff only, it's not immediately obvious that those parts will be skipped
as there are no code changes in those areas.

The initial value of the induction variable is forced to zero, as any
other value could affect the result of the induction. At the end of the
loop, if the result is zero, then we restore the original initial value.
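
The MAX-reduction idea can be checked against the straightforward loop with a scalar model. This is a minimal sketch, not the vectorizer code: both function names are hypothetical, and the loop deliberately starts at index 1 so that zero can never be a genuine match. How the patch itself treats a match at index zero is not stated in this message, so that part is an assumption of the sketch.

```c
/* Reference semantics: remember the last index whose element is below
   MIN_V, or keep INIT if no element matches.  */
int
last_index_reference (const int *a, int n, int min_v, int init)
{
  int last = init;
  for (int i = 1; i < n; i++)
    if (a[i] < min_v)
      last = i;
  return last;
}

/* Same result via a MAX reduction: the accumulator starts at zero, each
   iteration contributes either its index or zero, and the original
   initial value is restored only if nothing matched.  Because the index
   only increases, the maximum matching index is also the last one.  */
int
last_index_max_reduction (const int *a, int n, int min_v, int init)
{
  int acc = 0;
  for (int i = 1; i < n; i++)
    {
      int candidate = a[i] < min_v ? i : 0;
      if (candidate > acc)
	acc = candidate;
    }
  return acc == 0 ? init : acc;
}
```

The second form has no loop-carried dependence other than the maximum itself, which is why an ordinary MAX reduction is enough in this case.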




Cheers,
Alan.



optimizeConditionReductions.patch
Description: Binary data


Re: [patch] Fix PR target/67265

2015-11-11 Thread Eric Botcazou
> This piece of code alone doesn't tell me exactly why the frame pointer
> is needed. I was looking for an explicit use, but I now guess that if
> you have multiple adjusts of the [stack] pointer you can't easily undo
> them in the error case (the function behaves as-if using alloca). Is
> that it?

Yes, exactly, the analogy with alloca is correct.

> And without exceptions I assume you just get a call to abort so
> it doesn't matter? If I understood all that right, then this is ok.

If you don't care about exceptions on stack overflow, then the signal will 
very likely terminate the program instead of overwriting stack contents, which 
is good enough.  In Ada, the language requires you to exit gracefully or even 
to resume regular execution (at least in theory for the latter).

> In i386.c I see a code block with a similar condition,
> 
>/* If the only reason for frame_pointer_needed is that we conservatively
>   assumed stack realignment might be needed, but in the end nothing that
> needed the stack alignment had been spilled, clear
> frame_pointer_needed
>   and say we don't need stack realignment.  */
> 
> and the condition has
> 
>&& !(flag_stack_check && STACK_CHECK_MOVING_SP)
> 
> Should that be changed too?

Yes, it probably should, thanks for spotting it, revised patch attached.


PR target/67265
* ira.c (ira_setup_eliminable_regset): Do not necessarily create the
frame pointer for stack checking if non-call exceptions aren't used.
* config/i386/i386.c (ix86_finalize_stack_realign_flags): Likewise.


-- 
Eric Botcazou
Index: ira.c
===
--- ira.c	(revision 230146)
+++ ira.c	(working copy)
@@ -2259,9 +2259,12 @@ ira_setup_eliminable_regset (void)
   frame_pointer_needed
 = (! flag_omit_frame_pointer
|| (cfun->calls_alloca && EXIT_IGNORE_STACK)
-   /* We need the frame pointer to catch stack overflow exceptions
-	  if the stack pointer is moving.  */
-   || (flag_stack_check && STACK_CHECK_MOVING_SP)
+   /* We need the frame pointer to catch stack overflow exceptions if
+	  the stack pointer is moving (as for the alloca case just above).  */
+   || (STACK_CHECK_MOVING_SP
+	   && flag_stack_check
+	   && flag_exceptions
+	   && cfun->can_throw_non_call_exceptions)
|| crtl->accesses_prior_frames
|| (SUPPORTS_STACK_ALIGNMENT && crtl->stack_realign_needed)
/* We need a frame pointer for all Cilk Plus functions that use
Index: config/i386/i386.c
===
--- config/i386/i386.c	(revision 230146)
+++ config/i386/i386.c	(working copy)
@@ -12470,7 +12466,11 @@ ix86_finalize_stack_realign_flags (void)
   && !crtl->accesses_prior_frames
   && !cfun->calls_alloca
   && !crtl->calls_eh_return
-  && !(flag_stack_check && STACK_CHECK_MOVING_SP)
+  /* See ira_setup_eliminable_regset for the rationale.  */
+  && !(STACK_CHECK_MOVING_SP
+	   && flag_stack_check
+	   && flag_exceptions
+	   && cfun->can_throw_non_call_exceptions)
   && !ix86_frame_pointer_required ()
   && get_frame_size () == 0
   && ix86_nsaved_sseregs () == 0


RE: [PATCH 1/2][ARC] Add support for ARCv2 CPUs

2015-11-11 Thread Claudiu Zissulescu
This patch is committed (without the gen_compare_reg change). 

Thanks Joern,
Claudiu

> Apart from the gen_compare_reg change, the patch is OK.
> If the v2 support mostly works like support for the other subtargets, you may
> check it in without the gen_compare_reg change.
> If that change is required because of particular code paths taken with the v2
> port, you may check in the whole patch.
> 
> The operand-swapping in gen_compare_reg was not expected to be
> triggered when re-generating a comparison, as comparisons gleaned from
> existing instructions are supposed to already have the operands in the right
> order.
> Do you have a testcase that triggers the assert independently from the
> v2 support?
> If you can name a pre-existing testcase to trigger the assert, the patch is
> approved for separate check-in.
> If you have a new testcase, is it in a form and of a legal status that it can
> be submitted for inclusion in the gcc regression tests suite?


Re: [C/C++ PATCH] Reject declarators with huge arrays (PR c/68107, c++/68266)

2015-11-11 Thread Marek Polacek
On Tue, Nov 10, 2015 at 06:38:31PM +0100, Paolo Carlini wrote:
>  Hi,
> 
> On 11/10/2015 05:36 PM, Marek Polacek wrote:
> >+
> >+/* Did array size calculations overflow or does the array
> >+   cover more than half of the address-space?  */
> >+if (COMPLETE_TYPE_P (type)
> >+&& TREE_CODE (TYPE_SIZE_UNIT (type)) == INTEGER_CST
> >+&& !valid_constant_size_p (TYPE_SIZE_UNIT (type)))
> >+  {
> >+if (name)
> >+  error_at (loc, "size of array %qE is too large", name);
> >+else
> >+  error_at (loc, "size of unnamed array is too large");
> >+type = error_mark_node;
> >+  }
> >   }
> Obviously "the issue" predates your proposed change, but I don't understand
> why the code implementing the check can't be shared by the front-ends via a
> small function in c-family...

Certainly I'm in favor of sharing code between C and C++ FEs, though in
this case it didn't seem too important/obvious, because of the extra !=
error_mark_node check + I don't really like the new function getting *type
and setting it there.

But I'll submit another version of the patch with a common function.

Marek


Re: [Patch] Optimize condition reductions where the result is an integer induction variable

2015-11-11 Thread Richard Biener
On Wed, Nov 11, 2015 at 1:22 PM, Alan Hayward  wrote:
> Hi,
> I hoped to post this in time for Monday’s cut off date, but circumstances
> delayed me until today. Hoping if possible this patch will still be able
> to go in.
>
>
> This patch builds upon the change for PR65947, and reduces the amount of
> code produced in a vectorized condition reduction where operand 2 of the
> COND_EXPR is an assignment of an increasing integer induction variable that
> won't wrap.
>
>
> For example (assuming all types are ints), this is a match:
>
> last = 5;
> for (i = 0; i < N; i++)
>   if (a[i] < min_v)
> last = i;
>
> Whereas, this is not because the result is based off a memory access:
> last = 5;
> for (i = 0; i < N; i++)
>   if (a[i] < min_v)
> last = a[i];
>
> In the integer induction variable case we can just use a MAX reduction and
> skip all the code I added in my vectorized condition reduction patch - the
> additional induction variables in vectorizable_reduction () and the
> additional checks in vect_create_epilog_for_reduction (). From the patch
> diff only, it's not immediately obvious that those parts will be skipped
> as there are no code changes in those areas.
>
> The initial value of the induction variable is forced to zero, as any
> other value could affect the result of the induction. At the end of the
> loop, if the result is zero, then we restore the original initial value.

+static bool
+is_integer_induction (gimple *stmt, struct loop *loop)

is_nonwrapping_integer_induction?

+  tree lhs_max = TYPE_MAX_VALUE (TREE_TYPE (gimple_phi_result (stmt)));

don't use TYPE_MAX_VALUE.

+  /* Check that the induction increments.  */
+  if (tree_int_cst_compare (step, size_zero_node) <= 0)
+return false;

tree_int_cst_sgn (step) == -1

+  /* Check that the max size of the loop will not wrap.  */
+
+  if (! max_loop_iterations (loop, &ni))
+return false;
+  /* Convert backedges to iterations.  */
+  ni += 1;

just use max_stmt_executions (loop, &ni) which properly checks for overflow
of the +1.

+  max_loop_value = wi::add (wi::to_widest (base),
+   wi::mul (wi::to_widest (step), ni));
+
+  if (wi::gtu_p (max_loop_value, wi::to_widest (lhs_max)))
+return false;

you miss a check for the wi::add / wi::mul to overflow.  You can use
extra args to determine this.

Instead of TYPE_MAX_VALUE use wi::max_value (precision, sign).

I wonder if you want to skip all the overflow checks for TYPE_OVERFLOW_UNDEFINED
IV types?

Thanks,
Richard.

>
>
>
> Cheers,
> Alan.
>
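
The overflow concern raised in this review (base + step * iterations must itself be computed without wrapping) can be illustrated outside the compiler. This is a sketch under stated assumptions: wi::add and wi::mul are GCC-internal, so the example swaps in the GCC/Clang builtins __builtin_mul_overflow and __builtin_add_overflow on plain int, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <limits.h>

/* Return true if an induction starting at BASE and advancing by a
   positive STEP for NITER iterations stays representable in int.
   Both the multiplication and the addition are checked, so a wrapped
   intermediate can never be mistaken for a small final value.  */
bool
induction_stays_in_range (int base, int step, int niter)
{
  int span, last;

  if (step <= 0 || niter < 0)
    return false;
  if (__builtin_mul_overflow (step, niter, &span))
    return false;
  if (__builtin_add_overflow (base, span, &last))
    return false;
  return true;
}
```

Comparing only the final sum against the type maximum, without the two overflow flags, would accept cases where step * niter has already wrapped.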


Re: OpenACC Firstprivate

2015-11-11 Thread Nathan Sidwell

On 11/11/15 03:04, Jakub Jelinek wrote:

On Tue, Nov 10, 2015 at 09:12:55AM -0500, Nathan Sidwell wrote:

+   /* Create a local object to hold the instance
+  value.  */
+   tree inst = create_tmp_var
+ (TREE_TYPE (TREE_TYPE (new_var)),
+  IDENTIFIER_POINTER (DECL_NAME (new_var)));


Can you please rewrite this as:
tree type = TREE_TYPE (TREE_TYPE (new_var));
tree n = DECL_NAME (new_var);
tree inst = create_tmp_var (type, IDENTIFIER_POINTER (n));
or so (perhaps
const char *name
  = IDENTIFIER_POINTER (DECL_NAME (new_var));
instead but then it takes one more line)?
I really don't like line breaks before opening ( unless really
necessary.


Oh, yeah you mentioned that before :)



Otherwise LGTM.


thanks.

nathan



Re: [gomp4] Fix some broken tests

2015-11-11 Thread Cesar Philippidis
On 11/11/2015 05:40 AM, Nathan Sidwell wrote:
> On 11/10/15 18:08, Cesar Philippidis wrote:
>> On 11/10/2015 12:35 PM, Nathan Sidwell wrote:
>>> I've committed this to  gomp4.  In preparing the reworked firstprivate
>>> patch changes for gomp4's gimplify.c I discovered these testcases were
>>> passing by accident, and lacked a data clause.
>>
>> It used to be if a reduction was on a parallel construct, the gimplifier
>> would introduce a pcopy clause for the reduction variable if it was not
>> associated with any data clause. Is that not the case anymore?
> 
> AFAICT, the std doesn't specify that behaviour.   2.6 'Data Environment'
> doesn't mention reductions as a modifier for implicitly determined data
> attributes.

I guess I was confused because the reduction section in 2.5.11 mentions
something about updating the original reduction variable after the
parallel region.

Cesar



Re: [C/C++ PATCH] Reject declarators with huge arrays (PR c/68107, c++/68266)

2015-11-11 Thread Marek Polacek
On Wed, Nov 11, 2015 at 01:42:04PM +0100, Bernd Schmidt wrote:
> On 11/11/2015 01:31 PM, Marek Polacek wrote:
> 
> >Certainly I'm in favor of sharing code between C and C++ FEs, though in
> >this case it didn't seem too important/obvious, because of the extra !=
> >error_mark_node check + I don't really like the new function getting *type
> >and setting it there.
> 
> Make it return bool to indicate whether to change type to error_mark.

Yeah, I've done it like so.

Marek


Re: improved RTL-level if conversion using scratchpads [half-hammock edition]

2015-11-11 Thread Bernd Schmidt

On 11/10/2015 10:35 PM, Abe wrote:

I wrote:

What I'm saying is I don't see a reason for a "definitely always
unsafe" state.
Why would any access not be made safe if a scratchpad is used?


Because the RTL if-converter doesn't "know" how to convert
{everything that can be made safe using a scratchpad and is unsafe
otherwise}. My patch is only designed to enable the conversion of
half-hammocks with a single may-trap store.


Yeah, but how is that relevant to the question of whether a MEM is safe? 
The logic should be

 if mem is safe and we are allowed to speculate -> just do it
 otherwise mem is unsafe, so
   if we have prereqs like conditional moves -> use scratchpads
   otherwise fail

I don't see how a three-state property for a single MEM is necessary or 
helpful, and the implementation in the patch just roughly distinguishes 
between two different types of trap (invalid address vs. readonly 
memory). That seems irrelevant both to the question of whether something 
is safe or not, and to the question of whether we know how to perform 
the conversion.


You might argue that something that is known readonly will always fail 
if written to at runtime, but that's no different from any other kind of 
invalid address, and using a scratchpad prevents the write unless it 
would have happened without if-conversion.
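
The point that a scratchpad keeps the original store from happening unless it would have happened anyway can be shown with a source-level model. This is only a sketch of the idea in C, not the RTL transformation from the patch: both function names are hypothetical, and a compiler is free to compile the conditional expression back into a branch.

```c
/* Half-hammock as written: the store happens only when COND holds.  */
void
store_branchy (int cond, int *p, int v)
{
  if (cond)
    *p = v;
}

/* If-converted form: the store is unconditional, but its address is
   selected first, so a false condition writes to a private scratchpad.
   P is never dereferenced when COND is false, whether P is null,
   read-only or otherwise invalid.  */
void
store_scratchpad (int cond, int *p, int v)
{
  int scratch;
  int *dest = cond ? p : &scratch;

  *dest = v;
}
```

In this model the kind of trap that a write through P would cause makes no difference: with COND false the write goes to the scratchpad, and with COND true the original program would have performed the same store.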



In summary, the 3 possible analysis outcomes are something like this:

   * safe even without a scratchpad

   * only safe with a scratchpad, and we _do_ know how to convert it
safely

   * currently unsafe because we don't yet know how to convert it
safely


This could be seen as a property of the block being converted, and is 
kind of implicit in the algorithm sketched above, but I don't see how it 
would be a property of the MEM that represents the store.



Do you have performance numbers anywhere?


I think my analysis work so far on this project is not yet complete
enough for public review, mainly because it does not include the
benefit of  profiling.


I think performance numbers are a fairly important part of a submission 
like this where the transformation isn't an obvious improvement (as 
opposed to removing an instruction or suchlike).



Bernd


[HSA] support global variables

2015-11-11 Thread Martin Liška
Hi.

The following patch adds support for global variables seen by
an HSAIL executable. The HSA runtime can link the name of a global variable
with a pointer to the variable used by the host.

Installed to HSA branch.

Martin
From de58711a6ddbb1e4558a9454d7aeb6d2b33861de Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 5 Nov 2015 11:36:23 +0100
Subject: [PATCH] HSA: support global variables

gcc/ChangeLog:

2015-11-05  Martin Liska  

	* hsa-brig.c (emit_directive_variable): Do not display warning
	for global variables.
	(emit_function_directives): Iterate m_global_symbols instead
	of m_readonly_variables.
	(hsa_output_global_variables): New function.
	(hsa_output_kernel_mapping): Remove.
	(hsa_output_libgomp_mapping): New function.
	(hsa_output_kernels): Likewise.
	(hsa_output_brig): Use new functions.
	* hsa-dump.c (dump_hsa_cfun): Dump all global symbols.
	* hsa-gen.c (hsa_symbol::global_var_p): New predicate.
	(hsa_function_representation::~hsa_function_representation):
	Release memory.
	(get_symbol_for_decl): Simplify logic to just two types
	of variables: local and global.
	(hsa_get_string_cst_symbol): Use m_global_symbols instead
	of m_readonly_variables.
	* hsa.c (hsa_init_compilation_unit_data): Initialize
	hsa_global_variable_symbols.
	(hsa_deinit_compilation_unit_data): Release it.
	* hsa.h (struct hsa_symbol): Remove m_readonly_variables and
	replace it with m_global_symbols.
	(struct hsa_free_symbol_hasher): Remove.
	(hsa_free_symbol_hasher::hash): Likewise.
	(hsa_free_symbol_hasher::equal): Likewise.

libgomp/ChangeLog:

2015-11-05  Martin Liska  

	* plugin/plugin-hsa.c (struct global_var_info): New structure.
	(struct brig_image_desc): Add global variables.
	(create_and_finalize_hsa_program): Define all global variables
	used in a BRIG module.
---
 gcc/hsa-brig.c  | 157 +---
 gcc/hsa-dump.c  |  10 +++
 gcc/hsa-gen.c   |  80 --
 gcc/hsa.c   |  18 -
 gcc/hsa.h   |  35 +++---
 libgomp/plugin/plugin-hsa.c |  30 +
 6 files changed, 242 insertions(+), 88 deletions(-)

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index d2882fc..f47e9c3 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -506,12 +506,7 @@ emit_directive_variable (struct hsa_symbol *symbol)
   prefix = '&';
 
   if (!symbol->m_cst_value)
-	{
-	  dirvar.allocation = BRIG_ALLOCATION_PROGRAM;
-	  if (TREE_CODE (symbol->m_decl) == VAR_DECL)
-	warning (0, "referring to global symbol %q+D by name from HSA code "
-		 "won't work", symbol->m_decl);
-	}
+	dirvar.allocation = BRIG_ALLOCATION_PROGRAM;
 }
   else if (symbol->m_global_scope_p)
 prefix = '&';
@@ -545,7 +540,10 @@ emit_directive_variable (struct hsa_symbol *symbol)
   dirvar.linkage = symbol->m_linkage;
   dirvar.dim.lo = (uint32_t) symbol->m_dim;
   dirvar.dim.hi = (uint32_t) ((unsigned long long) symbol->m_dim >> 32);
-  dirvar.modifier.allBits |= BRIG_VARIABLE_DEFINITION;
+
+  /* Global variables are just declared and linked via HSA runtime.  */
+  if (!symbol->global_var_p ())
+dirvar.modifier.allBits |= BRIG_VARIABLE_DEFINITION;
   dirvar.reserved = 0;
 
   if (symbol->m_cst_value)
@@ -571,7 +569,7 @@ emit_function_directives (hsa_function_representation *f, bool is_declaration)
   hsa_symbol *sym;
 
   if (!f->m_declaration_p)
-for (int i = 0; f->m_readonly_variables.iterate (i, ); i++)
+for (int i = 0; f->m_global_symbols.iterate (i, ); i++)
   {
 	emit_directive_variable (sym);
 	brig_insn_count++;
@@ -1832,11 +1830,93 @@ hsa_brig_emit_omp_symbols (void)
 static GTY(()) tree hsa_ctor_statements;
 static GTY(()) tree hsa_dtor_statements;
 
-/* Create a static constructor that will register out brig stuff with
-   libgomp.  */
+/* Create and return __hsa_global_variables symbol that contains
+   all informations consumed by libgomp to link global variables
+   with their string names used by an HSA kernel.  */
+
+static tree
+hsa_output_global_variables ()
+{
+  unsigned l = hsa_global_variable_symbols->elements ();
+
+  tree variable_info_type = make_node (RECORD_TYPE);
+  tree id_f1 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier ("name"), ptr_type_node);
+  DECL_CHAIN (id_f1) = NULL_TREE;
+  tree id_f2 = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+			   get_identifier ("omp_data_size"),
+			   ptr_type_node);
+  DECL_CHAIN (id_f2) = id_f1;
+  finish_builtin_struct (variable_info_type, "__hsa_variable_info", id_f2,
+			 NULL_TREE);
+
+  tree int_num_of_global_vars;
+  int_num_of_global_vars = build_int_cst (uint32_type_node, l);
+  tree global_vars_num_index_type = build_index_type (int_num_of_global_vars);
+  tree global_vars_array_type = build_array_type (variable_info_type,
+		  global_vars_num_index_type);
+
+  vec *global_vars_vec = NULL;
+
+  for (hash_table ::iterator it
+   = 

Re: [gomp4.5] Don't mark GOMP_MAP_FIRSTPRIVATE mapped vars addressable

2015-11-11 Thread Alexander Monakov
On Wed, 11 Nov 2015, Jakub Jelinek wrote:

> Hi!
> 
> Alex reported to me privately that with the OpenMP 4.5 handling of
> array section bases (that they are firstprivate instead of mapped)
> we unnecessarily mark the pointers addressable and that results
> in a less efficient way of passing them as shared to inner constructs.

Thanks!  Would you be interested in further (minimized) cases where the new
implementation no longer manages to perform the copy-in/copy-out optimization?
E.g. the following.  Or I can try to put such cases in Bugzilla, if you like.

Alexander

void f(int *p, int n)
{
  int tmp;
#pragma omp target map(to:p[0:n]) map(tofrom:tmp)
  {
#pragma omp parallel
asm volatile ("" : "=r" (tmp) : "r" (p));
  }

#pragma omp target
  /* Missing optimization for 'tmp' here.  */
#pragma omp parallel
asm volatile ("" : : "r" (tmp));
}


Re: [gomp4] Fix some broken tests

2015-11-11 Thread Nathan Sidwell

On 11/11/15 09:50, Cesar Philippidis wrote:

On 11/11/2015 05:40 AM, Nathan Sidwell wrote:

On 11/10/15 18:08, Cesar Philippidis wrote:

On 11/10/2015 12:35 PM, Nathan Sidwell wrote:

I've committed this to  gomp4.  In preparing the reworked firstprivate
patch changes for gomp4's gimplify.c I discovered these testcases were
passing by accident, and lacked a data clause.


It used to be if a reduction was on a parallel construct, the gimplifier
would introduce a pcopy clause for the reduction variable if it was not
associated with any data clause. Is that not the case anymore?


AFAICT, the std doesn't specify that behaviour.   2.6 'Data Environment'
doesn't mention reductions as a modifier for implicitly determined data
attributes.


I guess I was confused because the reduction section in 2.5.11 mentions
something about updating the original reduction variable after the
parallel region.


I think that still relies on a copy clause to transfer the liveness of the 
original variable into and out of the region.  (that's the implication of what 
2.6 says)


nathan



Re: [OpenACC] declare directive

2015-11-11 Thread Jakub Jelinek
On Mon, Nov 09, 2015 at 05:11:44PM -0600, James Norris wrote:
> diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
> index 953c4e3..c6a2981 100644
> --- a/gcc/c-family/c-pragma.h
> +++ b/gcc/c-family/c-pragma.h
> @@ -30,6 +30,7 @@ enum pragma_kind {
>PRAGMA_OACC_ATOMIC,
>PRAGMA_OACC_CACHE,
>PRAGMA_OACC_DATA,
> +  PRAGMA_OACC_DECLARE,
>PRAGMA_OACC_ENTER_DATA,
>PRAGMA_OACC_EXIT_DATA,
>PRAGMA_OACC_KERNELS,

This change will make PR68271 even worse, so would be really nice to
get that fixed first.

> +   case GOMP_MAP_ALLOC:
> + if (!acc_is_present (hostaddrs[i], sizes[i]))
> +   {
> + GOACC_enter_exit_data (device, 1, [i], [i],
> +[i], 0, 0);
> +   }

No {}s around single statement body.

> +   case GOMP_MAP_FORCE_PRESENT:
> + if (!acc_is_present (hostaddrs[i], sizes[i]))
> +   gomp_fatal ("[%p,%zd] is not mapped", hostaddrs[i], sizes[i]);

This isn't portable unfortunately to all targets that build libgomp.
Looking around, we use various ways to print sizes in gomp_fatal:
1) use %ld and cast to unsigned long
2) use %d and cast to int
3)
#ifdef HAVE_INTTYPES_H
  gomp_fatal ("present clause: !acc_is_present (%p, "
  "%"PRIu64" (0x%"PRIx64"))",
  (void *) k->host_start,
  (uint64_t) size, (uint64_t) size);
#else
  gomp_fatal ("present clause: !acc_is_present (%p, "
  "%lu (0x%lx))", (void *) k->host_start,
  (unsigned long) size, (unsigned long) size);
#endif

I'd say use any of those for now, and if you or one of your colleagues could
clean all this up, it would be greatly appreciated.
The best might be to handle this somewhere in libgomp.h by testing something
like:
#if defined(__GLIBC__) // or some configure check whether %zd works
# define GOMP_PRIuSIZE_T "zu"
# define GOMP_PRIxSIZE_T "zx"
typedef size_t gomp_prisize_t;
#elif __SIZEOF_SIZE_T__ == __SIZEOF_INT__
# define GOMP_PRIuSIZE_T "u"
# define GOMP_PRIxSIZE_T "x"
typedef unsigned int gomp_prisize_t;
#elif __SIZEOF_SIZE_T__ == __SIZEOF_LONG__
# define GOMP_PRIuSIZE_T "lu"
# define GOMP_PRIxSIZE_T "lx"
typedef unsigned long gomp_prisize_t;
#elif defined (HAVE_INTTYPES_H) && __SIZEOF_SIZE_T__ == 8 && __CHAR_BIT__ == 8
# define GOMP_PRIuSIZE_T PRIu64
# define GOMP_PRIxSIZE_T PRIx64
typedef uint64_t gomp_prisize_t;
#else
# define GOMP_PRIuSIZE_T "lu"
# define GOMP_PRIxSIZE_T "lx"
typedef unsigned long gomp_prisize_t;
#endif
and then use those macros and always cast size_t arguments to
gomp_prisize_t in the various gomp_fatal or other printing calls.
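
A call site using such macros might look as follows. This is a reduced sketch, not libgomp code: it keeps only two branches of the selection above, relies on the GCC/Clang predefined macros __SIZEOF_SIZE_T__ and __SIZEOF_INT__, and uses snprintf in a hypothetical helper in place of gomp_fatal so that the result can be inspected.

```c
#include <stdio.h>
#include <stddef.h>

/* Reduced version of the selection sketched above: pick a format and a
   matching cast type that work even where "%zu" is not available.  */
#if __SIZEOF_SIZE_T__ == __SIZEOF_INT__
# define GOMP_PRIuSIZE_T "u"
typedef unsigned int gomp_prisize_t;
#else
# define GOMP_PRIuSIZE_T "lu"
typedef unsigned long gomp_prisize_t;
#endif

/* Hypothetical helper: format a size into BUF.  The argument is always
   cast to gomp_prisize_t so that it matches the chosen conversion.  */
int
format_unmapped_size (char *buf, size_t buflen, size_t size)
{
  return snprintf (buf, buflen, "size %" GOMP_PRIuSIZE_T " is not mapped",
		   (gomp_prisize_t) size);
}
```

The cast is the important part: the format string and the argument type are chosen together in one place, so individual call sites cannot get them out of sync.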

> +int
> +main (int argc, char **argv)
> +{
> +  int a[8] __attribute__((unused));
> +
> +  __builtin_printf ("CheCKpOInT\n");
> +#pragma acc declare present (a)
> +}
> +
> +/* { dg-output "CheCKpOInT" } */
> +/* { dg-shouldfail "" } */

Are you sure printf will have the buffers flushed before abort on all
targets?

Jakub


Re: [OpenACC] declare directive

2015-11-11 Thread Thomas Schwinge
Hi!

On Wed, 11 Nov 2015 09:32:33 +0100, Jakub Jelinek  wrote:
> On Mon, Nov 09, 2015 at 05:11:44PM -0600, James Norris wrote:
> > diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
> > index 953c4e3..c6a2981 100644
> > --- a/gcc/c-family/c-pragma.h
> > +++ b/gcc/c-family/c-pragma.h
> > @@ -30,6 +30,7 @@ enum pragma_kind {
> >PRAGMA_OACC_ATOMIC,
> >PRAGMA_OACC_CACHE,
> >PRAGMA_OACC_DATA,
> > +  PRAGMA_OACC_DECLARE,
> >PRAGMA_OACC_ENTER_DATA,
> >PRAGMA_OACC_EXIT_DATA,
> >PRAGMA_OACC_KERNELS,
> 
> This change will make PR68271 even worse, so would be really nice to
> get that fixed first.

"Would be really nice" means that you're asking us to work on and resolve
PR68271 before installing this patch?


> > + case GOMP_MAP_FORCE_PRESENT:
> > +   if (!acc_is_present (hostaddrs[i], sizes[i]))
> > + gomp_fatal ("[%p,%zd] is not mapped", hostaddrs[i], sizes[i]);
> 
> This isn't portable unfortunately to all targets that build libgomp.
> Looking around, we use various ways to print sizes in gomp_fatal: [...]
> [...] if you or one of your colleagues could
> clean all this up, it would be greatly appreciated.  [...]

ACK; will put it on the list of things to do, later on.


> > +int
> > +main (int argc, char **argv)
> > +{
> > +  int a[8] __attribute__((unused));
> > +
> > +  __builtin_printf ("CheCKpOInT\n");
> > +#pragma acc declare present (a)
> > +}
> > +
> > +/* { dg-output "CheCKpOInT" } */
> > +/* { dg-shouldfail "" } */
> 
> Are you sure printf will have the buffers flushed before abort on all
> targets?

No, have to print to stderr; see
.
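
For illustration only, a minimal sketch of what printing the checkpoint to
stderr could look like.  emit_checkpoint is a hypothetical helper, not
something from the testsuite.  The reasoning is that stdout may be fully
buffered when it is redirected, and whether abort flushes open streams is
implementation-defined, whereas stderr is never fully buffered by default.

```c
#include <stdio.h>

/* Hypothetical helper: write MARKER plus a newline to STREAM and flush
   explicitly, so the text has left the stdio buffer before any later
   abort.  Returns the number of characters written, or -1 on error.  */
static int
emit_checkpoint (FILE *stream, const char *marker)
{
  int n = fprintf (stream, "%s\n", marker);
  if (n < 0)
    return -1;
  if (fflush (stream) != 0)
    return -1;
  return n;
}
```

In the testcase this would be called as emit_checkpoint (stderr, "CheCKpOInT")
before the declare directive; the explicit fflush is redundant for an
unbuffered stderr but harmless.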


Regards
 Thomas



Re: [PATCH, 5/16] Add in_oacc_kernels_region in struct loop

2015-11-11 Thread Richard Biener
On Mon, 9 Nov 2015, Tom de Vries wrote:

> On 09/11/15 16:35, Tom de Vries wrote:
> > Hi,
> > 
> > this patch series for stage1 trunk adds support to:
> > - parallelize oacc kernels regions using parloops, and
> > - map the loops onto the oacc gang dimension.
> > 
> > The patch series contains these patches:
> > 
> >   1Insert new exit block only when needed in
> >  transform_to_exit_first_loop_alt
> >   2Make create_parallel_loop return void
> >   3Ignore reduction clause on kernels directive
> >   4Implement -foffload-alias
> >   5Add in_oacc_kernels_region in struct loop
> >   6Add pass_oacc_kernels
> >   7Add pass_dominator_oacc_kernels
> >   8Add pass_ch_oacc_kernels
> >   9Add pass_parallelize_loops_oacc_kernels
> >  10Add pass_oacc_kernels pass group in passes.def
> >  11Update testcases after adding kernels pass group
> >  12Handle acc loop directive
> >  13Add c-c++-common/goacc/kernels-*.c
> >  14Add gfortran.dg/goacc/kernels-*.f95
> >  15Add libgomp.oacc-c-c++-common/kernels-*.c
> >  16Add libgomp.oacc-fortran/kernels-*.f95
> > 
> > The first 9 patches are more or less independent, but patches 10-16 are
> > intended to be committed at the same time.
> > 
> > Bootstrapped and reg-tested on x86_64.
> > 
> > Build and reg-tested with nvidia accelerator, in combination with a
> > patch that enables accelerator testing (which is submitted at
> > https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
> > 
> > I'll post the individual patches in reply to this message.
> 
> this patch adds and initializes the in_oacc_kernels_region field in
> struct loop.
> 
> The field is used to signal to subsequent passes that we're dealing with a
> loop in a kernels region that we're trying to parallelize.
> 
> Note that we do not parallelize kernels regions with more than one loop nest.
> [ In general, kernels regions with more than one loop nest should be split up
> into separate kernels regions, but that's not supported atm. ]

I think mark_loops_in_oacc_kernels_region can be greatly simplified.

Both region entry and exit should have the same ->loop_father (a SESE
region).  Then you can just walk that loop's inner loops (and their
siblings), checking the dominance relation of their headers with the
region entry/exit (only necessary for direct inner loops).

Richard.

> Thanks,
> - Tom
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH, 11/16] Update testcases after adding kernels pass group

2015-11-11 Thread Richard Biener
On Mon, 9 Nov 2015, Tom de Vries wrote:

> On 09/11/15 16:35, Tom de Vries wrote:
> > Hi,
> > 
> > this patch series for stage1 trunk adds support to:
> > - parallelize oacc kernels regions using parloops, and
> > - map the loops onto the oacc gang dimension.
> > 
> > The patch series contains these patches:
> > 
> >   1Insert new exit block only when needed in
> >  transform_to_exit_first_loop_alt
> >   2Make create_parallel_loop return void
> >   3Ignore reduction clause on kernels directive
> >   4Implement -foffload-alias
> >   5Add in_oacc_kernels_region in struct loop
> >   6Add pass_oacc_kernels
> >   7Add pass_dominator_oacc_kernels
> >   8Add pass_ch_oacc_kernels
> >   9Add pass_parallelize_loops_oacc_kernels
> >  10Add pass_oacc_kernels pass group in passes.def
> >  11Update testcases after adding kernels pass group
> >  12Handle acc loop directive
> >  13Add c-c++-common/goacc/kernels-*.c
> >  14Add gfortran.dg/goacc/kernels-*.f95
> >  15Add libgomp.oacc-c-c++-common/kernels-*.c
> >  16Add libgomp.oacc-fortran/kernels-*.f95
> > 
> > The first 9 patches are more or less independent, but patches 10-16 are
> > intended to be committed at the same time.
> > 
> > Bootstrapped and reg-tested on x86_64.
> > 
> > Build and reg-tested with nvidia accelerator, in combination with a
> > patch that enables accelerator testing (which is submitted at
> > https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
> > 
> > I'll post the individual patches in reply to this message.
> 
> This patch updates existing testcases with new pass numbers, given the passes
> that were added in the pass list in patch 10.

I think it would be nice to be able to specify the number in the .def
file instead so we can avoid this kind of churn every time we do this.

> Thanks,
> - Tom
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: libgo patch committed: Update to Go 1.5 release

2015-11-11 Thread Rainer Orth
Ian Lance Taylor  writes:

> On Sun, Nov 8, 2015 at 9:21 AM, Rainer Orth  
> wrote:
>>
>> There were two remaining problems:
>>
>> * Before Solaris 12, sendfile only lives in libsendfile.  This led to
>>   link failures in gotools.
>>
>> * Solaris 12 introduced a couple more types that use _in6_addr_t, which
>>   are filtered out by mksysinfo.sh, leading to compilation failures.
>>
>> The following patch addresses both issues.  Solaris 10 and 11 bootstraps
>> have completed, a Solaris 12 bootstrap is still running make check.
>
> Thanks.  Committed to mainline.

Great, thanks.  The mksysinfo.sh part is also necessary on the gcc-5
branch.  Tested on i386-pc-solaris2.12 and sparc-sun-solaris2.12, ok to
install?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] Fix PR58497 testcase for SPARC

2015-11-11 Thread Richard Biener

SPARC doesn't have vector support in this testcase and no integer
multiplication.  The general scalarization support fails to fold
generated stmts so the following just does what other parts of
the lowering do - factor in constants/constructors.

On another note I noticed a tree sharing issue (mitigated by
gimplification).

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-11-11  Richard Biener  

PR tree-optimization/58497
* tree-vect-generic.c: Include gimplify.h.
(tree_vec_extract): Lookup constant/constructor DEFs.
(do_cond): Unshare cond.

Index: gcc/tree-vect-generic.c
===
--- gcc/tree-vect-generic.c (revision 230146)
+++ gcc/tree-vect-generic.c (working copy)
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
 #include "tree-eh.h"
 #include "gimple-iterator.h"
 #include "gimplify-me.h"
+#include "gimplify.h"
 #include "tree-cfg.h"
 
 
@@ -105,6 +106,15 @@ static inline tree
 tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
  tree t, tree bitsize, tree bitpos)
 {
+  if (TREE_CODE (t) == SSA_NAME)
+{
+  gimple *def_stmt = SSA_NAME_DEF_STMT (t);
+  if (is_gimple_assign (def_stmt)
+ && (gimple_assign_rhs_code (def_stmt) == VECTOR_CST
+ || (bitpos
+ && gimple_assign_rhs_code (def_stmt) == CONSTRUCTOR)))
+   t = gimple_assign_rhs1 (def_stmt);
+}
   if (bitpos)
 {
   if (TREE_CODE (type) == BOOLEAN_TYPE)
@@ -1419,7 +1429,7 @@ do_cond (gimple_stmt_iterator *gsi, tree
   if (TREE_CODE (TREE_TYPE (b)) == VECTOR_TYPE)
 b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
   tree cond = gimple_assign_rhs1 (gsi_stmt (*gsi));
-  return gimplify_build3 (gsi, code, inner_type, cond, a, b);
+  return gimplify_build3 (gsi, code, inner_type, unshare_expr (cond), a, b);
 }
 
 /* Expand a vector COND_EXPR to scalars, piecewise.  */


[PATCH, HSA] fix emission of HSAIL for builtins

2015-11-11 Thread Martin Liška
Hello.

The following patch has just been applied to the HSA branch and fixes the
emission of builtins.  As HSAIL provides approximate instructions for
builtins like 'sin', we emit these only if -funsafe-math-optimizations
is enabled.  Otherwise direct call instructions are emitted.

I would like to install the patch on trunk as soon as the initial patch set
is merged.

Thanks,
Martin
From 110a6e64af6c5ad7c925e7ef3837f3685e07fe12 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 5 Nov 2015 16:59:07 +0100
Subject: [PATCH 2/2] HSA: fix emission of HSAIL for builtins

gcc/ChangeLog:

2015-11-05  Martin Liska  

	* hsa-gen.c (gen_hsa_unaryop_builtin_call): New function.
	(gen_hsa_unaryop_or_call_for_builtin): Likewise.
	(gen_hsa_insns_for_call): Use these aforementioned functions
	to correctly dispatch between creation of a function call
	and direct usage of an HSAIL instruction.
---
 gcc/hsa-gen.c | 57 +
 1 file changed, 41 insertions(+), 16 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 48c4254..300bee6 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -4073,6 +4073,36 @@ gen_hsa_unaryop_for_builtin (int opcode, gimple *stmt, hsa_bb *hbb)
   gen_hsa_unary_operation (opcode, dest, op, hbb);
 }
 
+/* Helper functions to create a call to standard library if LHS of the
+   STMT is used.  HBB is the HSA BB to which the instruction should be
+   added.  */
+
+static void
+gen_hsa_unaryop_builtin_call (gimple *stmt, hsa_bb *hbb)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  gen_hsa_insns_for_direct_call (stmt, hbb);
+}
+
+/* Helper functions to create a single unary HSA operations out of calls to
+   builtins (if unsafe math optimizations are enable). Otherwise, create
+   a call to standard library function.
+   OPCODE is the HSA operation to be generated.  STMT is a gimple
+   call to a builtin.  HBB is the HSA BB to which the instruction should be
+   added.  Note that nothing will be created if STMT does not have a LHS.  */
+
+static void
+gen_hsa_unaryop_or_call_for_builtin (int opcode, gimple *stmt, hsa_bb *hbb)
+{
+  if (flag_unsafe_math_optimizations)
+gen_hsa_unaryop_for_builtin (opcode, stmt, hbb);
+  else
+gen_hsa_unaryop_builtin_call (stmt, hbb);
+}
+
 /* Generate HSA address corresponding to a value VAL (as opposed to a memory
reference tree), for example an SSA_NAME or an ADDR_EXPR.  HBB is the HSA BB
to which the instruction should be added.  */
@@ -4345,7 +4375,6 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 
 case BUILT_IN_SQRT:
 case BUILT_IN_SQRTF:
-  /* TODO: Perhaps produce BRIG_OPCODE_NSQRT with -ffast-math?  */
   gen_hsa_unaryop_for_builtin (BRIG_OPCODE_SQRT, stmt, hbb);
   break;
 
@@ -4355,31 +4384,27 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
   break;
 
 case BUILT_IN_COS:
+case BUILT_IN_SIN:
+case BUILT_IN_EXP2:
+case BUILT_IN_LOG2:
+  /* HSAIL does not provide an instruction for double argument type.  */
+  gen_hsa_unaryop_builtin_call (stmt, hbb);
+  break;
+
 case BUILT_IN_COSF:
-  /* FIXME: Using the native instruction may not be precise enough.
-	 Perhaps only allow if using -ffast-math?  */
-  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NCOS, stmt, hbb);
+  gen_hsa_unaryop_or_call_for_builtin (BRIG_OPCODE_NCOS, stmt, hbb);
   break;
 
-case BUILT_IN_EXP2:
 case BUILT_IN_EXP2F:
-  /* FIXME: Using the native instruction may not be precise enough.
-	 Perhaps only allow if using -ffast-math?  */
-  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NEXP2, stmt, hbb);
+  gen_hsa_unaryop_or_call_for_builtin (BRIG_OPCODE_NEXP2, stmt, hbb);
   break;
 
-case BUILT_IN_LOG2:
 case BUILT_IN_LOG2F:
-  /* FIXME: Using the native instruction may not be precise enough.
-	 Perhaps only allow if using -ffast-math?  */
-  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NLOG2, stmt, hbb);
+  gen_hsa_unaryop_or_call_for_builtin (BRIG_OPCODE_NLOG2, stmt, hbb);
   break;
 
-case BUILT_IN_SIN:
 case BUILT_IN_SINF:
-  /* FIXME: Using the native instruction may not be precise enough.
-	 Perhaps only allow if using -ffast-math?  */
-  gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NSIN, stmt, hbb);
+  gen_hsa_unaryop_or_call_for_builtin (BRIG_OPCODE_NSIN, stmt, hbb);
   break;
 
 case BUILT_IN_ATOMIC_LOAD_1:
-- 
2.6.2
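
The dispatch made by the patch above can be summarized in a small standalone
model.  This is only a sketch: the toy_* enums and toy_choose_lowering are
hypothetical names invented here, not identifiers from hsa-gen.c, and the
model ignores the detail that no call is generated when the statement has
no LHS.

```c
#include <stdbool.h>

/* Toy stand-ins for the builtins touched by the patch.  */
enum toy_builtin
{
  TOY_SQRT, TOY_SQRTF,
  TOY_COS, TOY_SIN, TOY_EXP2, TOY_LOG2,      /* double variants */
  TOY_COSF, TOY_SINF, TOY_EXP2F, TOY_LOG2F   /* float variants */
};

enum toy_lowering { TOY_EMIT_INSN, TOY_EMIT_CALL };

/* Mirror the case analysis in the patch: sqrt always maps to an
   instruction, the double variants always become library calls, and
   the float variants use the approximate instruction only when unsafe
   math optimizations are enabled.  */
static enum toy_lowering
toy_choose_lowering (enum toy_builtin b, bool unsafe_math)
{
  switch (b)
    {
    case TOY_SQRT:
    case TOY_SQRTF:
      return TOY_EMIT_INSN;
    case TOY_COS:
    case TOY_SIN:
    case TOY_EXP2:
    case TOY_LOG2:
      return TOY_EMIT_CALL;
    case TOY_COSF:
    case TOY_SINF:
    case TOY_EXP2F:
    case TOY_LOG2F:
      return unsafe_math ? TOY_EMIT_INSN : TOY_EMIT_CALL;
    }
  return TOY_EMIT_CALL;
}
```

For example, toy_choose_lowering (TOY_SINF, false) selects a call, while the
same builtin with the flag set selects the instruction.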



Re: [PATCH 4b/4] [ARM] PR63870 Remove error for invalid lane numbers

2015-11-11 Thread Charles Baylis
On 11 November 2015 at 11:22, Kyrill Tkachov  wrote:
> Hi Charles,
>
> On 08/11/15 00:26, charles.bay...@linaro.org wrote:
>>
>> From: Charles Baylis 
>>
>>   Charles Baylis  
>>
>> * config/arm/neon.md (neon_vld1_lane): Remove error for
>> invalid
>> lane number.
>> (neon_vst1_lane): Likewise.
>> (neon_vld2_lane): Likewise.
>> (neon_vst2_lane): Likewise.
>> (neon_vld3_lane): Likewise.
>> (neon_vst3_lane): Likewise.
>> (neon_vld4_lane): Likewise.
>> (neon_vst4_lane): Likewise.
>>

> In this pattern the 'max' variable is now unused, causing a bootstrap
> -Werror failure on arm.
> I'll test a patch to fix it unless you beat me to it...

Thanks for catching this.

I have a patch, and have started a bootstrap. Unless you have
objections, I'll apply as obvious once the bootstrap is complete later
this afternoon.

gcc/ChangeLog:

2015-11-11  Charles Baylis  

* config/arm/neon.md: (neon_vld2_lane): Remove unused max
variable.
(neon_vst2_lane): Likewise.
(neon_vld3_lane): Likewise.
(neon_vst3_lane): Likewise.
(neon_vld4_lane): Likewise.
(neon_vst4_lane): Likewise.
From f111cb543bff0ad8756a0240f8bb1af1f19b Mon Sep 17 00:00:00 2001
From: Charles Baylis 
Date: Wed, 11 Nov 2015 11:59:44 +
Subject: [PATCH] [ARM] remove unused variable

gcc/ChangeLog:

2015-11-11  Charles Baylis  

* config/arm/neon.md: (neon_vld2_lane): Remove unused max
	variable.
	(neon_vst2_lane): Likewise.
	(neon_vld3_lane): Likewise.
	(neon_vst3_lane): Likewise.
	(neon_vld4_lane): Likewise.
	(neon_vst4_lane): Likewise.

Change-Id: Ifed53e2d4c5a581770848cab65cf2e8d1d9039c3
---
 gcc/config/arm/neon.md | 6 --
 1 file changed, 6 deletions(-)

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 119550c..62fb6da 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -4464,7 +4464,6 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   HOST_WIDE_INT lane = NEON_ENDIAN_LANE_N(mode, INTVAL (operands[3]));
-  HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[0]);
   rtx ops[4];
   ops[0] = gen_rtx_REG (DImode, regno);
@@ -4579,7 +4578,6 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   HOST_WIDE_INT lane = NEON_ENDIAN_LANE_N(mode, INTVAL (operands[2]));
-  HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[1]);
   rtx ops[4];
   ops[0] = operands[0];
@@ -4723,7 +4721,6 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   HOST_WIDE_INT lane = NEON_ENDIAN_LANE_N (mode, INTVAL (operands[3]));
-  HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[0]);
   rtx ops[5];
   ops[0] = gen_rtx_REG (DImode, regno);
@@ -4895,7 +4892,6 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   HOST_WIDE_INT lane = NEON_ENDIAN_LANE_N(mode, INTVAL (operands[2]));
-  HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[1]);
   rtx ops[5];
   ops[0] = operands[0];
@@ -5045,7 +5041,6 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   HOST_WIDE_INT lane = NEON_ENDIAN_LANE_N(mode, INTVAL (operands[3]));
-  HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[0]);
   rtx ops[6];
   ops[0] = gen_rtx_REG (DImode, regno);
@@ -5225,7 +5220,6 @@ if (BYTES_BIG_ENDIAN)
   "TARGET_NEON"
 {
   HOST_WIDE_INT lane = NEON_ENDIAN_LANE_N(mode, INTVAL (operands[2]));
-  HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
   int regno = REGNO (operands[1]);
   rtx ops[6];
   ops[0] = operands[0];
-- 
1.9.1



Re: [ptx] partitioning optimization

2015-11-11 Thread Bernd Schmidt

On 11/10/2015 11:33 PM, Nathan Sidwell wrote:

I've committed this patch to trunk.  It implements a partitioning
optimization for a loop partitioned over both vector and worker axes.
We can elide the inner vector partitioning state propagation, if there
are no intervening instructions in the worker-partitioned outer loop
other than the forking and joining.  We simply execute the worker
propagation on all vectors.


Patch LGTM, although I wonder if you really need the extra option rather 
than just optimize.



I've been unable to introduce a testcase for this. The difficulty is we
want to check an rtl dump from the acceleration compiler, and there
doesn't  appear to be existing machinery for that in the testsuite.
Perhaps something to be added later?


What's the difficulty exactly? Getting a dump should be possible with 
-foffload=-fdump-whatever, does the testsuite have a problem finding the 
right filename?



Bernd



RE: [PATCH 2/2][ARC] Add support for ARCv2 CPUs

2015-11-11 Thread Claudiu Zissulescu
This patch is committed.

Thanks Joern,
Claudiu

> -Original Message-
> From: Joern Wolfgang Rennecke [mailto:g...@amylaar.uk]
> Sent: Tuesday, November 10, 2015 3:02 PM
> To: Claudiu Zissulescu; gcc-patches@gcc.gnu.org
> Cc: Francois Bedard; jeremy.benn...@embecosm.com
> Subject: Re: [PATCH 2/2][ARC] Add support for ARCv2 CPUs
> 
> 
> 
> On 30/10/15 11:22, Claudiu Zissulescu wrote:
> > Hi,
> >
> > Please find the updated patch.  Both ARC patches were tested using
> dg.exp. The ChangeLog entry is unchanged.
> 
> This is OK.


Re: Enable pointer TBAA for LTO

2015-11-11 Thread Bernd Schmidt

On 11/11/2015 10:21 AM, Richard Biener wrote:

On Tue, 10 Nov 2015, Jan Hubicka wrote:

The reason is that TYPE_CANONICAL is initialized in get_alias_set that may be
called before we finish all merging and then it is more fine grained than what
we need here (i.e. TYPE_CANONICAL of pointers to two different types will be
different, but here we want them to be equal so we can match:

struct aa { void *ptr;};
struct bb { int * ptr;};

Which is actually required for Fortran interoperability.


Just curious, is this sort of thing documented anywhere?


Bernd


[PATCH] PR68271 [6 Regression] Boostrap fails on x86_64-apple-darwin14 at r230084

2015-11-11 Thread Dominique d'Humières
The following patch restores bootstrap on darwin

--- ../_clean/gcc/cp/parser.h   2015-11-10 01:54:44.0 +0100
+++ gcc/cp/parser.h 2015-11-11 12:10:28.0 +0100
@@ -48,7 +48,7 @@ struct GTY (()) cp_token {
   /* Token flags.  */
   unsigned char flags;
   /* Identifier for the pragma.  */
-  ENUM_BITFIELD (pragma_kind) pragma_kind : 6;
+  ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
   /* True if this token is from a context where it is implicitly extern "C" */
   BOOL_BITFIELD implicit_extern_c : 1;
   /* True if an error has already been reported for this token, such as a
--- ../_clean/gcc/c-family/c-pragma.c   2015-11-10 01:54:43.0 +0100
+++ gcc/c-family/c-pragma.c 2015-11-11 12:10:25.0 +0100
@@ -1372,7 +1372,7 @@ c_register_pragma_1 (const char *space, 
 
   /* The C++ front end allocates 6 bits in cp_token; the C front end
 allocates 7 bits in c_token.  At present this is sufficient.  */
-  gcc_assert (id < 64);
+  gcc_assert (id < 256);
 }
 
   cpp_register_deferred_pragma (parse_in, space, name, id,

OK to commit?

Dominique
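
To make the arithmetic behind the 6 -> 8 change concrete, here is a small
sketch.  struct toy_token and the helper functions are hypothetical and use
plain unsigned bit-fields rather than the ENUM_BITFIELD in cp_token, but the
range limits are the same: a 6-bit field holds 0..63, so an id of 64 no
longer fits, while an 8-bit field holds 0..255.

```c
/* Toy token with a 6-bit and an 8-bit unsigned bit-field.  */
struct toy_token
{
  unsigned kind6 : 6;
  unsigned kind8 : 8;
};

/* Store ID in the 6-bit field and read it back.  Values above 63 are
   reduced modulo 64, which is how an out-of-range id gets corrupted.  */
static unsigned
store_in_6_bits (unsigned id)
{
  struct toy_token t = { 0, 0 };
  t.kind6 = id;
  return t.kind6;
}

/* Same for the 8-bit field, which is exact for ids up to 255.  */
static unsigned
store_in_8_bits (unsigned id)
{
  struct toy_token t = { 0, 0 };
  t.kind8 = id;
  return t.kind8;
}

/* Largest value representable in an unsigned field of BITS bits
   (valid for BITS smaller than the width of unsigned).  */
static unsigned
max_for_width (unsigned bits)
{
  return (1u << bits) - 1u;
}
```

With these helpers, store_in_6_bits (64) yields 0 whereas store_in_8_bits (64)
yields 64, which is why the assert bound moves from 64 to 256 together with
the field width.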



Re: libgo patch committed: Update to Go 1.5 release

2015-11-11 Thread Ian Lance Taylor
On Wed, Nov 11, 2015 at 3:48 AM, Rainer Orth
 wrote:
> Ian Lance Taylor  writes:
>
>> On Sun, Nov 8, 2015 at 9:21 AM, Rainer Orth  
>> wrote:
>>>
>>> There were two remaining problems:
>>>
>>> * Before Solaris 12, sendfile only lives in libsendfile.  This led to
>>>   link failures in gotools.
>>>
>>> * Solaris 12 introduced a couple more types that use _in6_addr_t, which
>>>   are filtered out by mksysinfo.sh, leading to compilation failures.
>>>
>>> The following patch addresses both issues.  Solaris 10 and 11 bootstraps
>>> have completed, a Solaris 12 bootstrap is still running make check.
>>
>> Thanks.  Committed to mainline.
>
> Great, thanks.  The mksysinfo.sh part is also necessary on the gcc-5
> branch.  Tested on i386-pc-solaris2.12 and sparc-sun-solaris2.12, ok to
> install?

Sure, go ahead.

Ian


Re: [ptx] partitioning optimization

2015-11-11 Thread Nathan Sidwell

On 11/10/15 17:45, Ilya Verbin wrote:

I've been unable to introduce a testcase for this. The difficulty is we want
to check an rtl dump from the acceleration compiler, and there doesn't
appear to be existing machinery for that in the testsuite.  Perhaps
something to be added later?


I haven't tried it, but doesn't
/* { dg-options "-foffload=-fdump-rtl-..." } */
with
/* { dg-final { scan-rtl-dump ... } } */
work?


in the gcc testsuite directories?  That's the approach I was going for.

The issue is detecting when the test should be run.  target==nvptx-*-* isn't 
right, as the target is the x86 host machine.  There doesn't seem to be an 
existing dejagnu predicate there to select for 'accel_target==FOO'.  Am I 
missing something?


nathan



Re: [PATCH] PR68271 [6 Regression] Boostrap fails on x86_64-apple-darwin14 at r230084

2015-11-11 Thread Dominique d'Humières
Is the following OK?

Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 230162)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,10 @@
+2015-11-11  Dominique d'Humieres 
+
+   PR  bootstrap/68271
+   * cp/parser.h (cp_token): Update pragma_kind to 8.
+   * c-family/c-pragma.c (c_register_pragma_1): Update the gcc_assert
+   to 256.
+
 2015-11-11  Simon Dardis  
 
* config/mips/mips.c (mips_breakable_sequence_p): New function.
Index: gcc/cp/parser.h
===
--- gcc/cp/parser.h (revision 230162)
+++ gcc/cp/parser.h (working copy)
@@ -48,7 +48,7 @@
   /* Token flags.  */
   unsigned char flags;
   /* Identifier for the pragma.  */
-  ENUM_BITFIELD (pragma_kind) pragma_kind : 6;
+  ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
   /* True if this token is from a context where it is implicitly extern "C" */
   BOOL_BITFIELD implicit_extern_c : 1;
   /* True if an error has already been reported for this token, such as a
Index: gcc/c-family/c-pragma.c
===
--- gcc/c-family/c-pragma.c (revision 230162)
+++ gcc/c-family/c-pragma.c (working copy)
@@ -1370,9 +1370,9 @@
   id = registered_pragmas.length ();
   id += PRAGMA_FIRST_EXTERNAL - 1;
 
-  /* The C++ front end allocates 6 bits in cp_token; the C front end
-allocates 7 bits in c_token.  At present this is sufficient.  */
-  gcc_assert (id < 64);
+  /* The C++ front end allocates 8 bits in cp_token; the C front end
+allocates 8 bits in c_token.  At present this is sufficient.  */
+  gcc_assert (id < 256);
 }
 
   cpp_register_deferred_pragma (parse_in, space, name, id,

Dominique

> On 11 Nov 2015, at 14:14, Jakub Jelinek  wrote:
> 
> On Wed, Nov 11, 2015 at 02:11:38PM +0100, Dominique d'Humières wrote:
>> The following patch restores bootstrap on darwin
>> 
>> --- ../_clean/gcc/cp/parser.h2015-11-10 01:54:44.0 +0100
>> +++ gcc/cp/parser.h  2015-11-11 12:10:28.0 +0100
>> @@ -48,7 +48,7 @@ struct GTY (()) cp_token {
>>   /* Token flags.  */
>>   unsigned char flags;
>>   /* Identifier for the pragma.  */
>> -  ENUM_BITFIELD (pragma_kind) pragma_kind : 6;
>> +  ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
>>   /* True if this token is from a context where it is implicitly extern "C" 
>> */
>>   BOOL_BITFIELD implicit_extern_c : 1;
>>   /* True if an error has already been reported for this token, such as a
>> --- ../_clean/gcc/c-family/c-pragma.c2015-11-10 01:54:43.0 
>> +0100
>> +++ gcc/c-family/c-pragma.c  2015-11-11 12:10:25.0 +0100
>> @@ -1372,7 +1372,7 @@ c_register_pragma_1 (const char *space, 
>> 
>>   /* The C++ front end allocates 6 bits in cp_token; the C front end
>>   allocates 7 bits in c_token.  At present this is sufficient.  */
>> -  gcc_assert (id < 64);
>> +  gcc_assert (id < 256);
>> }
>> 
>>   cpp_register_deferred_pragma (parse_in, space, name, id,
>> 
>> OK to commit?
> 
> As written in the PR, please add a ChangeLog entry, don't forget about
>   PR bootstrap/68271
> line, and please update the 6 and 7 numbers in the comment to 8.
> With that the patch is ok.
> As a follow-up, we'll remove pragma_kind field in the C++ FE, to shrink the
> token by 64 bits.
> 
>   Jakub



Re: [PATCH] PR68271 [6 Regression] Boostrap fails on x86_64-apple-darwin14 at r230084

2015-11-11 Thread Jakub Jelinek
On Wed, Nov 11, 2015 at 03:10:37PM +0100, Dominique d'Humières wrote:
> Is the following OK?
> 
> Index: gcc/ChangeLog
> ===
> --- gcc/ChangeLog (revision 230162)
> +++ gcc/ChangeLog (working copy)
> @@ -1,3 +1,10 @@
> +2015-11-11  Dominique d'Humieres 
> +
> + PR  bootstrap/68271
> + * cp/parser.h (cp_token): Update pragma_kind to 8.
> + * c-family/c-pragma.c (c_register_pragma_1): Update the gcc_assert
> + to 256.
> +

The ChangeLog entry is not.  Only one space after PR, two spaces before <
and both cp and c-family subdirectories have their own ChangeLog entries,
so you need
PR bootstrap/68271
* parser.h (cp_token): Update pragma_kind to 8.
in cp/ChangeLog and
PR bootstrap/68271
* c-pragma.c (c_register_pragma_1): Update the gcc_assert to 256.
in c-family/ChangeLog.

Ok with those changes.

Jakub


Re: [ptx] partitioning optimization

2015-11-11 Thread Bernd Schmidt

On 11/11/2015 02:59 PM, Nathan Sidwell wrote:

That's not the problem.  How to conditionally enable the test is the
difficulty.  I suspect porting something concerning accel_compiler from
the libgomp testsuite is needed?


Maybe a check_effective_target_offload_nvptx which tries to see if 
-foffload=nvptx gives an error (I would hope it does if it's unsupported).



Bernd



Re: [PATCH] Fix PR rtl-optimization/68287

2015-11-11 Thread Martin Liška
On 11/11/2015 01:20 PM, Richard Biener wrote:
> On Wed, Nov 11, 2015 at 12:18 PM, Martin Liška  wrote:
>> Hi.
>>
>> There's a fix for fallout of r230027.
>>
>> Patch can bootstrap and survives regression tests on x86_64-linux-gnu.
> 
> Hmm, but only the new elements are zeroed so this still is different
> from previous behavior.
> Note that the previous .create (...) doesn't initialize the elements
> either (well, it's not supposed to ...).
> 
> I _think_ the bug is that you do safe_grow and use length while the
> previous code just added
> enough reserve (but not actual elements!).
> 
> Thus the fix would be to do
> 
>  point_freq_vec.truncate (0);
>  point_freq_vec.reserve_exact (new_length);
> 
> Richard.

Ahh, I see! Thanks for the suggestion. I'm going to re-run regression
tests and bootstrap.

I take the previous email as approval for the patch to be installed.

Thanks,
Martin
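
The length-versus-capacity distinction Richard points out can be shown with
a toy vector.  This is a sketch only: struct toy_vec and the toy_* functions
are hypothetical and merely imitate the behaviour described in this thread
(grow changes the length, reserve only adds room); they are not GCC's vec
implementation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy vector: LENGTH counts elements in use, CAPACITY counts slots.  */
struct toy_vec
{
  int *data;
  size_t length;
  size_t capacity;
};

/* Ensure room for NELEMS more elements; LENGTH is left unchanged.
   Returns 1 on success, 0 on allocation failure.  */
static int
toy_reserve_exact (struct toy_vec *v, size_t nelems)
{
  size_t need = v->length + nelems;
  if (need <= v->capacity)
    return 1;
  int *p = realloc (v->data, need * sizeof *p);
  if (!p)
    return 0;
  v->data = p;
  v->capacity = need;
  return 1;
}

/* Drop elements so that only SIZE remain; capacity is kept.  */
static void
toy_truncate (struct toy_vec *v, size_t size)
{
  if (size <= v->length)
    v->length = size;
}

/* Grow the vector to LEN elements; the new elements are left
   uninitialized and LENGTH becomes LEN.  */
static int
toy_safe_grow (struct toy_vec *v, size_t len)
{
  if (len < v->length)
    return 0;
  if (!toy_reserve_exact (v, len - v->length))
    return 0;
  v->length = len;
  return 1;
}
```

In this model, truncating to 0 and then reserving new_length slots leaves the
length at 0 with enough backing storage, which corresponds to the sequence
suggested above, whereas toy_safe_grow would report new_length elements.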

> 
>> Ready for trunk?
>> Thanks,
>> Martin

From f719039abd856962d4ab9c0e61994aba413aeffa Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 11 Nov 2015 10:11:20 +0100
Subject: [PATCH 1/3] Fix PR rtl-optimization/68287

gcc/ChangeLog:

2015-11-11  Martin Liska  
	Richard Biener  

	PR rtl-optimization/68287
	* lra-lives.c (lra_create_live_ranges_1): Reserve the right
	number of elements.
---
 gcc/lra-lives.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
index 9453759..5f76a87 100644
--- a/gcc/lra-lives.c
+++ b/gcc/lra-lives.c
@@ -1241,8 +1241,8 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
   unused_set = sparseset_alloc (max_regno);
   curr_point = 0;
   unsigned new_length = get_max_uid () * 2;
-  if (point_freq_vec.length () < new_length)
-point_freq_vec.safe_grow (new_length);
+  point_freq_vec.truncate (0);
+  point_freq_vec.reserve_exact (new_length);
   lra_point_freq = point_freq_vec.address ();
   int *post_order_rev_cfg = XNEWVEC (int, last_basic_block_for_fn (cfun));
   int n_blocks_inverted = inverted_post_order_compute (post_order_rev_cfg);
-- 
2.6.2



Re: [C/C++ PATCH] Reject declarators with huge arrays (PR c/68107, c++/68266)

2015-11-11 Thread Bernd Schmidt

On 11/11/2015 01:31 PM, Marek Polacek wrote:


Certainly I'm in favor of sharing code between C and C++ FEs, though in
this case it didn't seem too important/obvious, because of the extra !=
error_mark_node check + I don't really like the new function getting *type
and setting it there.


Make it return bool to indicate whether to change type to error_mark.


Bernd


Re: [PATCH] PR68271 [6 Regression] Boostrap fails on x86_64-apple-darwin14 at r230084

2015-11-11 Thread Jakub Jelinek
On Wed, Nov 11, 2015 at 02:11:38PM +0100, Dominique d'Humières wrote:
> The following patch restores bootstrap on darwin
> 
> --- ../_clean/gcc/cp/parser.h 2015-11-10 01:54:44.0 +0100
> +++ gcc/cp/parser.h   2015-11-11 12:10:28.0 +0100
> @@ -48,7 +48,7 @@ struct GTY (()) cp_token {
>/* Token flags.  */
>unsigned char flags;
>/* Identifier for the pragma.  */
> -  ENUM_BITFIELD (pragma_kind) pragma_kind : 6;
> +  ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
>/* True if this token is from a context where it is implicitly extern "C" 
> */
>BOOL_BITFIELD implicit_extern_c : 1;
>/* True if an error has already been reported for this token, such as a
> --- ../_clean/gcc/c-family/c-pragma.c 2015-11-10 01:54:43.0 +0100
> +++ gcc/c-family/c-pragma.c   2015-11-11 12:10:25.0 +0100
> @@ -1372,7 +1372,7 @@ c_register_pragma_1 (const char *space, 
>  
>/* The C++ front end allocates 6 bits in cp_token; the C front end
>allocates 7 bits in c_token.  At present this is sufficient.  */
> -  gcc_assert (id < 64);
> +  gcc_assert (id < 256);
>  }
>  
>cpp_register_deferred_pragma (parse_in, space, name, id,
> 
> OK to commit?

As written in the PR, please add a ChangeLog entry, don't forget about
PR bootstrap/68271
line, and please update the 6 and 7 numbers in the comment to 8.
With that the patch is ok.
As a follow-up, we'll remove pragma_kind field in the C++ FE, to shrink the
token by 64 bits.

Jakub


Re: [PATCH 1/2] simplify-rtx: Simplify trunc of and of shiftrt

2015-11-11 Thread Segher Boessenkool
On Tue, Nov 10, 2015 at 10:04:30PM +0100, Bernd Schmidt wrote:
> On 11/10/2015 06:44 PM, Segher Boessenkool wrote:
> 
> >Yes I know.  All the rest of the code around it is like this though.
> >Do you want this written in a saner way?
> 
> I won't object to leaving it as-is for now, but in the future it would 
> be good to keep this in mind.

With the trunc_int_for_mode it ended up hugging the righthand margin,
so I did clean it up after all.  Please see attached (committed).


Segher


diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 17568ba..c4fc42a 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -714,6 +714,34 @@ simplify_truncation (machine_mode mode, rtx op,
 return simplify_gen_binary (ASHIFT, mode,
XEXP (XEXP (op, 0), 0), XEXP (op, 1));
 
+  /* Likewise (truncate:QI (and:SI (lshiftrt:SI (x:SI) C) C2)) into
+ (and:QI (lshiftrt:QI (truncate:QI (x:SI)) C) C2) for suitable C
+ and C2.  */
+  if (GET_CODE (op) == AND
+  && (GET_CODE (XEXP (op, 0)) == LSHIFTRT
+ || GET_CODE (XEXP (op, 0)) == ASHIFTRT)
+  && CONST_INT_P (XEXP (XEXP (op, 0), 1))
+  && CONST_INT_P (XEXP (op, 1)))
+{
+  rtx op0 = (XEXP (XEXP (op, 0), 0));
+  rtx shift_op = XEXP (XEXP (op, 0), 1);
+  rtx mask_op = XEXP (op, 1);
+  unsigned HOST_WIDE_INT shift = UINTVAL (shift_op);
+  unsigned HOST_WIDE_INT mask = UINTVAL (mask_op);
+
+  if (shift < precision
+ /* If doing this transform works for an X with all bits set,
+it works for any X.  */
+ && ((GET_MODE_MASK (mode) >> shift) & mask)
+== ((GET_MODE_MASK (op_mode) >> shift) & mask)
+ && (op0 = simplify_gen_unary (TRUNCATE, mode, op0, op_mode))
+ && (op0 = simplify_gen_binary (LSHIFTRT, mode, op0, shift_op)))
+   {
+ mask_op = GEN_INT (trunc_int_for_mode (mask, mode));
+ return simplify_gen_binary (AND, mode, op0, mask_op);
+   }
+}
+
   /* Recognize a word extraction from a multi-word subreg.  */
   if ((GET_CODE (op) == LSHIFTRT
|| GET_CODE (op) == ASHIFTRT)
-- 
1.9.3
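
For readers who want to convince themselves of the "works for an X with all
bits set" guard, here is a standalone sketch for the SI -> QI case using plain
integers instead of RTL.  The function names are hypothetical, and it assumes
the precision compared against the shift is that of the narrow mode.

```cpp
#include <cstdint>

// Mirrors the guard in the patch for a 32-bit -> 8-bit truncation:
// the shift must be smaller than the narrow precision, and shifting
// then masking an all-ones value must give the same bits in both widths.
bool
narrowing_is_safe (unsigned shift, std::uint32_t mask)
{
  const std::uint32_t narrow_ones = 0xffu;       // plays GET_MODE_MASK (mode)
  const std::uint32_t wide_ones = 0xffffffffu;   // plays GET_MODE_MASK (op_mode)
  return shift < 8
         && ((narrow_ones >> shift) & mask) == ((wide_ones >> shift) & mask);
}

// (truncate:QI (and:SI (lshiftrt:SI x C) C2))
std::uint8_t
wide_form (std::uint32_t x, unsigned shift, std::uint32_t mask)
{
  return static_cast<std::uint8_t> ((x >> shift) & mask);
}

// (and:QI (lshiftrt:QI (truncate:QI x) C) C2)
std::uint8_t
narrow_form (std::uint32_t x, unsigned shift, std::uint32_t mask)
{
  std::uint8_t t = static_cast<std::uint8_t> (x);
  return static_cast<std::uint8_t> ((t >> shift)
                                    & static_cast<std::uint8_t> (mask));
}
```

For shift 4 and mask 0x0f the guard holds and both forms agree for every x;
for shift 4 and mask 0xff the guard fails, and x = 0x1234 gives 0x23 in the
wide form but 0x03 in the narrow one.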



Re: [PATCH v4] SH FDPIC backend support

2015-11-11 Thread Oleg Endo
On Tue, 2015-11-10 at 15:07 -0500, Rich Felker wrote:

> > The way libcalls are now emitted is a bit unhandy.  If more
> > special-ABI libcalls are to be added in the future, they all have to
> > do the jsr vs. bsrf handling (some potential candidates for new
> > libcalls are optimized soft FP routines).  Then we still have PR 65374
> > and PR 54019.  In the future maybe we should come up with something
> > that allows emitting libcalls in a more transparent way...
> 
> I'd like to look into improving this at some point in the near future.
> On further reading of the changes made, I think there's a lot of code
> we could reduce or simplify.
> 
> In all the places where new RTL patterns were added for *call*_fdpic,
> the main constraint change vs the non-fdpic version is using REG_PIC.
> Is it possible to make a REG_GOT_ARG macro or similar that's defined
> as something like TARGET_FDPIC ? REG_PIC : nonexistent_or_dummy?

I'm not sure I understand what you mean by that.  Do you have a small
code snippet example?

> As for the call site stuff, I wonder why the existing call site stuff
> used by "call_pcrel" can't be used for SFUNC_STATIC. 

"call_pcrel" is a real call insn.  The libcalls are not expanded as
real call insns to avoid the regular register save/restores etc which
is needed to do a normal function call.
I guess the generic fix for this issue would be some mechanism to
specify which regs are clobbered/preserved and then provide the right
settings for the libcall functions.


> I'm actually trying to prepare a simpler FDPIC patch for other gcc
> versions we're interested in that's not so invasive, and for now I'm
> just having function_symbol replace SFUNC_STATIC with SFUNC_GOT on
> TARGET_FDPIC to avoid needing all the label stuff, but it would be nice
> to find a way to reuse the existing framework.

Do you know how this affects code size (and inherently performance)?

Cheers,
Oleg


Re: [patch] Fix PR target/67265

2015-11-11 Thread Bernd Schmidt

On 11/11/2015 01:31 PM, Eric Botcazou wrote:

Yes, it probably should, thanks for spotting it, revised patch attached.


 PR target/67265
 * ira.c (ira_setup_eliminable_regset): Do not necessarily create the
 frame pointer for stack checking if non-call exceptions aren't used.
 * config/i386/i386.c (ix86_finalize_stack_realign_flags): Likewise.


Ok if it passes testing. Should have thought of it earlier, but if you 
want to, you can also make a fp_needed_for_stack_checking_p function 
containing the four tests.



Bernd





Re: [PATCH] Simple optimization for MASK_STORE.

2015-11-11 Thread Yuri Rumyantsev
Richard,

What should we do to cope with this problem (the structure size increase)?
Should we return to the vector comparison version?

Thanks.
Yuri.

2015-11-11 12:18 GMT+03:00 Richard Biener :
> On Tue, Nov 10, 2015 at 3:56 PM, Ilya Enkovich  wrote:
>> 2015-11-10 17:46 GMT+03:00 Richard Biener :
>>> On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich  
>>> wrote:
>>>> 2015-11-10 15:33 GMT+03:00 Richard Biener :
>>>>> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev
>>>>> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> I tried it but 256-bit precision integer type is not yet supported.
>>>>>
>>>>> What's the symptom?  The compare cannot be expanded?  Just add a pattern
>>>>> then.
>>>>> After all we have modes up to XImode.
>>>>
>>>> I suppose problem may be in:
>>>>
>>>> gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128)
>>>>
>>>> which doesn't allow to create constants of bigger size.  Changing it
>>>> to maximum vector size (512) would mean we increase wide_int structure
>>>> size significantly. New patterns are probably also needed.
>>>
>>> Yes, new patterns are needed but wide-int should be fine (we only need
>>> to create a literal zero AFAICS).  The "new pattern" would be
>>> equality/inequality against zero compares only.
>>
>> Currently 256bit integer creation fails because wide_int for max and
>> min values cannot be created.
>
> Hmm, indeed:
>
> #1  0x0072dab5 in wi::extended_tree<192>::extended_tree (
> this=0x7fffd950, t=0x76a000b0)
> at /space/rguenther/src/svn/trunk/gcc/tree.h:5125
> 5125  gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N);
>
> but that's not that the constants fail to be created but
>
> #5  0x010d8828 in build_nonstandard_integer_type (precision=512,
> unsignedp=65) at /space/rguenther/src/svn/trunk/gcc/tree.c:8051
> 8051  if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype)))
> (gdb) l
> 8046        fixup_unsigned_type (itype);
> 8047      else
> 8048        fixup_signed_type (itype);
> 8049
> 8050      ret = itype;
> 8051      if (tree_fits_uhwi_p (TYPE_MAX_VALUE (itype)))
> 8052        ret = type_hash_canon (tree_to_uhwi (TYPE_MAX_VALUE
> (itype)), itype);
>
> thus the integer type hashing being "interesting".  tree_fits_uhwi_p
> fails because
> it does
>
> 7289    bool
> 7290    tree_fits_uhwi_p (const_tree t)
> 7291    {
> 7292      return (t != NULL_TREE
> 7293              && TREE_CODE (t) == INTEGER_CST
> 7294              && wi::fits_uhwi_p (wi::to_widest (t)));
> 7295    }
>
> and wi::to_widest () fails with doing
>
> 5121    template <int N>
> 5122    inline wi::extended_tree <N>::extended_tree (const_tree t)
> 5123      : m_t (t)
> 5124    {
> 5125      gcc_checking_assert (TYPE_PRECISION (TREE_TYPE (t)) <= N);
> 5126    }
>
> fixing the hashing then runs into type_cache_hasher::equal doing
> tree_int_cst_equal
> which again uses to_widest (it should be easier and cheaper to do the compare 
> on
> the actual tree representation, but well, seems to be just the first
> of various issues
> we'd run into).
>
> We eventually could fix the assert above (but then need to hope we assert
> when a computation overflows the narrower precision of widest_int) or use
> a special really_widest_int (ugh).
>
>> It is fixed by increasing MAX_BITSIZE_MODE_ANY_INT, but it increases
>> WIDE_INT_MAX_ELTS
>> and thus increases wide_int structure. If we use 512 for
>> MAX_BITSIZE_MODE_ANY_INT then
>> wide_int structure would grow by 48 bytes (16 bytes if use 256 for
>> MAX_BITSIZE_MODE_ANY_INT).
>> Is it OK for such narrow usage?
>
> widest_int is used in some long-living structures (which is the reason for
> MAX_BITSIZE_MODE_ANY_INT in the first place).  So I don't think so.
>
> Richard.
>
>> Ilya
>>
>>>
>>> Richard.
>>>
>>>> Ilya
>>>>
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Yuri.
>>>>>>
>>>>>>
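
The 16 and 48 byte figures quoted above can be reproduced with a little
arithmetic.  A sketch, assuming 64-bit HOST_WIDE_INT elements and an element
count of (MAX_BITSIZE_MODE_ANY_INT + 64) / 64; that formula is an assumption
made here for illustration, not something quoted in the thread, and the
function names are hypothetical.

```cpp
#include <cstddef>

// Assumption: elements are 64-bit HOST_WIDE_INTs and the element count is
// (max_bits + 64) / 64, i.e. one element more than strictly needed.
constexpr std::size_t host_bits_per_wide_int = 64;

constexpr std::size_t
max_elts (std::size_t max_bitsize_mode_any_int)
{
  return (max_bitsize_mode_any_int + host_bits_per_wide_int)
         / host_bits_per_wide_int;
}

// Size in bytes of the element array alone (ignoring any length or
// precision fields stored next to it).
constexpr std::size_t
storage_bytes (std::size_t max_bitsize_mode_any_int)
{
  return max_elts (max_bitsize_mode_any_int) * (host_bits_per_wide_int / 8);
}
```

Under these assumptions 128 bits gives 3 elements (24 bytes), 256 bits gives
5 (40 bytes) and 512 bits gives 9 (72 bytes), which matches the growth of 16
and 48 bytes.  Three 64-bit elements also matches the 192 in the
wi::extended_tree<192> frame of the backtrace.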


RE: [PATCH, Mips] Compact branch/delay slot optimization.

2015-11-11 Thread Simon Dardis
Committed as r230160.

Thanks,
Simon

> -Original Message-
> From: Moore, Catherine [mailto:catherine_mo...@mentor.com]
> Sent: 28 October 2015 14:00
> To: Simon Dardis; Matthew Fortune
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH, Mips] Compact branch/delay slot optimization.
> 
> 
> 
> > -Original Message-
> > From: Simon Dardis [mailto:simon.dar...@imgtec.com]
> > Sent: Tuesday, October 06, 2015 10:00 AM
> > To: Moore, Catherine; Matthew Fortune
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH, Mips] Compact branch/delay slot optimization.
> >
> > Hello,
> >
> > I'd like to resubmit the previous patch as it failed to check if the
> > branch inside the sequence had a compact form.
> >
> > Thanks,
> > Simon
> >
> > gcc/
> > * config/mips/mips.c (mips_breakable_sequence_p): New function.
> >   (mips_break_sequence): New function.
> >   (mips_reorg_process_insns): Use them.  Use compact branches in
> >   selected situations.
> >
> > gcc/testsuite/
> > * gcc.target/mips/split-ds-sequence.c: Test for the above.
> 
> Hi Simon,
> This patch looks okay with the exception of one stylistic change.
> Please change all instances of :
> +mips_breakable_sequence_p (rtx_insn * insn)
> To:
> +mips_breakable_sequence_p (rtx_insn *insn)
> Okay, with those changes.
> Thanks,
> Catherine
> 
> 
> >
> > Index: config/mips/mips.c
> >
> > ===
> > --- config/mips/mips.c  (revision 228282)
> > +++ config/mips/mips.c  (working copy)
> > @@ -16973,6 +16973,34 @@
> >}
> >  }
> >
> > +/* A SEQUENCE is breakable iff the branch inside it has a compact form
> > +   and the target has compact branches.  */
> > +
> > +static bool
> > +mips_breakable_sequence_p (rtx_insn * insn) {
> > +  return (insn && GET_CODE (PATTERN (insn)) == SEQUENCE
> > + && TARGET_CB_MAYBE
> > + && get_attr_compact_form (SEQ_BEGIN (insn)) !=
> > COMPACT_FORM_NEVER);
> > +}
> > +
> > +/* Remove a SEQUENCE and replace it with the delay slot instruction
> > +   followed by the branch and return the instruction in the delay slot.
> > +   Return the first of the two new instructions.
> > +   Subroutine of mips_reorg_process_insns.  */
> > +
> > +static rtx_insn *
> > +mips_break_sequence (rtx_insn * insn) {
> > +  rtx_insn * before = PREV_INSN (insn);
> > +  rtx_insn * branch = SEQ_BEGIN (insn);
> > +  rtx_insn * ds = SEQ_END (insn);
> > +  remove_insn (insn);
> > +  add_insn_after (ds, before, NULL);
> > +  add_insn_after (branch, ds, NULL);
> > +  return ds;
> > +}
> > +
> >  /* Go through the instruction stream and insert nops where necessary.
> > Also delete any high-part relocations whose partnering low parts
> > are now all dead.  See if the whole function can then be put into
> > @@ -17065,6 +17093,68 @@
> > {
> >   if (GET_CODE (PATTERN (insn)) == SEQUENCE)
> > {
> > + rtx_insn * next_active = next_active_insn (insn);
> > + /* Undo delay slots to avoid bubbles if the next instruction can
> > +be placed in a forbidden slot or the cost of adding an
> > +explicit NOP in a forbidden slot is OK and if the SEQUENCE is
> > +safely breakable.  */
> > + if (TARGET_CB_MAYBE
> > + && mips_breakable_sequence_p (insn)
> > + && INSN_P (SEQ_BEGIN (insn))
> > + && INSN_P (SEQ_END (insn))
> > + && ((next_active
> > +  && INSN_P (next_active)
> > +  && GET_CODE (PATTERN (next_active)) != SEQUENCE
> > +  && get_attr_can_delay (next_active) ==
> > CAN_DELAY_YES)
> > + || !optimize_size))
> > +   {
> > + /* To hide a potential pipeline bubble, if we scan backwards
> > +from the current SEQUENCE and find that there is a load
> > +of a value that is used in the CTI and there are no
> > +dependencies between the CTI and instruction in the delay
> > +slot, break the sequence so the load delay is hidden.  */
> > + HARD_REG_SET uses;
> > + CLEAR_HARD_REG_SET (uses);
> > + note_uses (&PATTERN (SEQ_BEGIN (insn)), record_hard_reg_uses,
> > +&uses);
> > + HARD_REG_SET delay_sets;
> > + CLEAR_HARD_REG_SET (delay_sets);
> > + note_stores (PATTERN (SEQ_END (insn)), record_hard_reg_sets,
> > +  &delay_sets);
> > +
> > + rtx prev = prev_active_insn (insn);
> > + if (prev
> > + && GET_CODE (PATTERN (prev)) == SET
> > + && MEM_P (SET_SRC (PATTERN (prev))))
> > +   {
> > + HARD_REG_SET sets;
> > + CLEAR_HARD_REG_SET (sets);
> > + note_stores (PATTERN (prev), record_hard_reg_sets,
> > +  &sets);
> > +
> > +
> > +   

Re: [gomp4] Fix some broken tests

2015-11-11 Thread Nathan Sidwell

On 11/10/15 18:08, Cesar Philippidis wrote:

On 11/10/2015 12:35 PM, Nathan Sidwell wrote:

I've committed this to  gomp4.  In preparing the reworked firstprivate
patch changes for gomp4's gimplify.c I discovered these testcases were
passing by accident, and lacked a data clause.


It used to be if a reduction was on a parallel construct, the gimplifier
would introduce a pcopy clause for the reduction variable if it was not
associated with any data clause. Is that not the case anymore?


AFAICT, the std doesn't specify that behaviour.   2.6 'Data Environment' doesn't 
mention reductions as a modifier for implicitly determined data attributes.


nathan


Re: [ptx] partitioning optimization

2015-11-11 Thread Nathan Sidwell

On 11/11/15 07:06, Bernd Schmidt wrote:

On 11/10/2015 11:33 PM, Nathan Sidwell wrote:

I've committed this patch to trunk.  It implements a partitioning
optimization for a loop partitioned over both vector and worker axes.
We can elide the inner vector partitioning state propagation, if there
are no intervening instructions in the worker-partitioned outer loop
other than the forking and joining.  We simply execute the worker
propagation on all vectors.


Patch LGTM, although I wonder if you really need the extra option rather than
just optimize.


The reason I added the option was to be able to turn it off independently of
the other optimizations (for debugging).



I've been unable to introduce a testcase for this. The difficulty is we
want to check an rtl dump from the acceleration compiler, and there
doesn't  appear to be existing machinery for that in the testsuite.
Perhaps something to be added later?


What's the difficulty exactly? Getting a dump should be possible with
-foffload=-fdump-whatever, does the testsuite have a problem finding the right
filename?



That's not the problem.  How to conditionally enable the test is the difficulty. 
 I suspect porting something concerning accel_compiler from the libgomp 
testsuite is needed?


nathan


Re: [v3 PATCH] LWG 2510, make the default constructors of library tag types explicit.

2015-11-11 Thread Jonathan Wakely

On 10/11/15 22:01 +0200, Ville Voutilainen wrote:

   LWG 2510, make the default constructors of library tag types
   explicit.
   * include/bits/mutex.h (defer_lock_t, try_lock_t,
   adopt_lock_t): Add an explicit default constructor.
   * include/bits/stl_pair.h (piecewise_construct_t): Likewise.
   * include/bits/uses_allocator.h (allocator_arg_t): Likewise.
   * libsupc++/new (nothrow_t): Likewise.
   * testsuite/17_intro/tag_type_explicit_ctor.cc: New.


OK for trunk, thanks.
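
To illustrate what the explicit default constructors change for users, here is
a small sketch.  The two tag types and the braces_convertible trait are
hypothetical stand-ins written for this note, not libstdc++ code, and the
result for the explicit tag assumes a compiler that implements the CWG 1518
rules.

```cpp
#include <type_traits>

// Pre-LWG 2510 style tag: implicitly default constructible.
struct implicit_tag_t { };

// Post-LWG 2510 style tag: the default constructor is explicit.
struct explicit_tag_t { explicit explicit_tag_t () = default; };

// Declared only; used in an unevaluated context below.
template <typename T> void take_by_ref (const T &);

// True when a call like f({}) would compile for a parameter of type T,
// i.e. when T can be copy-list-initialized from empty braces.
template <typename T, typename = void>
struct braces_convertible : std::false_type { };

template <typename T>
struct braces_convertible<T, decltype (take_by_ref<T> ({}))>
  : std::true_type { };
```

Both tags remain default constructible, so std::nothrow and friends keep
working; what goes away is the accidental match of a bare {} argument against
an overload taking a tag type.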



Re: [PATCH 4b/4] [ARM] PR63870 Remove error for invalid lane numbers

2015-11-11 Thread Charles Baylis
On 11 November 2015 at 12:10, Kyrill Tkachov  wrote:
>
> On 11/11/15 12:08, Charles Baylis wrote:
>>
>> On 11 November 2015 at 11:22, Kyrill Tkachov 
>> wrote:
>>>
>>> Hi Charles,
>>>
>>> On 08/11/15 00:26, charles.bay...@linaro.org wrote:

>>>> From: Charles Baylis
>>>>
>>>>   Charles Baylis
>>>>
>>>>  * config/arm/neon.md (neon_vld1_lane): Remove error for
>>>>  invalid lane number.
>>>>  (neon_vst1_lane): Likewise.
>>>>  (neon_vld2_lane): Likewise.
>>>>  (neon_vst2_lane): Likewise.
>>>>  (neon_vld3_lane): Likewise.
>>>>  (neon_vst3_lane): Likewise.
>>>>  (neon_vld4_lane): Likewise.
>>>>  (neon_vst4_lane): Likewise.
>>>>
>>> In this pattern the 'max' variable is now unused, causing a bootstrap
>>> -Werror failure on arm.
>>> I'll test a patch to fix it unless you beat me to it...
>>
>> Thanks for catching this.
>>
>> I have a patch, and have started a bootstrap. Unless you have
>> objections, I'll apply as obvious once the bootstrap is complete later
>> this afternoon.
>
>
> Yes, that's the exact patch I'm testing as well.
> I'll let you finish the bootstrap and commit it.

>>  gcc/ChangeLog:
>>
>>  2015-11-11  Charles Baylis  
>>
>>  * config/arm/neon.md: (neon_vld2_lane): Remove unused max
>>  variable.
>>  (neon_vst2_lane): Likewise.
>>  (neon_vld3_lane): Likewise.
>>  (neon_vst3_lane): Likewise.
>>  (neon_vld4_lane): Likewise.
>>  (neon_vst4_lane): Likewise.

Applied as r230203 after successful bootstrap on arm-unknown-linux-gnueabihf.


C++ PATCH to handling of duplicate typedefs

2015-11-11 Thread Jason Merrill
Another GC problem I noticed while looking at something else: when we 
freed a duplicate typedef, we were leaving its type in the variants 
list, with its TYPE_NAME still pointing to the now-freed TYPE_DECL, 
leading to a crash.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit e36d7607f157b5c90b56afe22786a2a0ff1711c8
Author: Jason Merrill 
Date:   Wed Nov 11 15:17:42 2015 -0500

	* decl.c (duplicate_decls): When combining typedefs, remove the
	new type from the variants list.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 76cc1d1..383b47d 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -2014,7 +2014,22 @@ duplicate_decls (tree newdecl, tree olddecl, bool newdecl_is_friend)
   /* For typedefs use the old type, as the new type's DECL_NAME points
 	 at newdecl, which will be ggc_freed.  */
   if (TREE_CODE (newdecl) == TYPE_DECL)
-	newtype = oldtype;
+	{
+	  newtype = oldtype;
+
+	  /* And remove the new type from the variants list.  */
+	  if (TYPE_NAME (TREE_TYPE (newdecl)) == newdecl)
+	{
+	  tree remove = TREE_TYPE (newdecl);
+	  for (tree t = TYPE_MAIN_VARIANT (remove); ;
+		   t = TYPE_NEXT_VARIANT (t))
+		if (TYPE_NEXT_VARIANT (t) == remove)
+		  {
+		TYPE_NEXT_VARIANT (t) = TYPE_NEXT_VARIANT (remove);
+		break;
+		  }
+	}
+	}
   else
 	/* Merge the data types specified in the two decls.  */
 	newtype = merge_types (TREE_TYPE (newdecl), TREE_TYPE (olddecl));
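
The unlinking done by the patch is the usual removal from a singly linked
list whose head is the main variant.  A standalone sketch with a hypothetical
node type follows; unlike the loop in the patch, it also stops at the end of
the chain and reports whether anything was removed.

```cpp
// Hypothetical miniature of a type node; real tree nodes use the
// TYPE_MAIN_VARIANT / TYPE_NEXT_VARIANT accessors instead of plain fields.
struct type_sketch
{
  type_sketch *main_variant;
  type_sketch *next_variant;
};

// Unlink REMOVE from the variant chain that starts at its main variant.
// Returns true if a predecessor was found and the node was unlinked.
bool
remove_variant (type_sketch *remove)
{
  for (type_sketch *t = remove->main_variant; t != nullptr;
       t = t->next_variant)
    if (t->next_variant == remove)
      {
        t->next_variant = remove->next_variant;
        return true;
      }
  return false;
}
```

Once the node is unlinked, nothing walking the chain can reach it (or the
freed declaration its name points to) any more.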


Re: [PATCH] Fix IRA register preferencing

2015-11-11 Thread Vladimir Makarov

On 11/10/2015 08:30 AM, Wilco Dijkstra wrote:

Ping of https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00829.html:



This fixes a bug in register preferencing. When live range splitting creates
a new register from
another, it copies most fields except for the register preferences. The
preference GENERAL_REGS is
used as reg_pref[i].prefclass is initialized with GENERAL_REGS in
allocate_reg_info () and
resize_reg_info ().

This initialization value is not innocuous like the comment suggests - if a
new register has a
non-integer mode, it is forced to prefer GENERAL_REGS. This changes the
register costs in pass 2 so
that they are incorrect. As a result the liverange is either spilled or
allocated to an integer
register:

void g(double);
void f(double x)
{
   if (x == 0)
 return;
   g (x);
   g (x);
}

f:
        fcmp    d0, #0.0
        bne     .L6
        ret
.L6:
        stp     x19, x30, [sp, -16]!
        fmov    x19, d0
        bl      g
        fmov    d0, x19
        ldp     x19, x30, [sp], 16
        b       g

With the fix it uses a floating point register as expected. Given a similar
issue in
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02253.html, would it not be
better to change the
initialization values of reg_pref to illegal register classes so this kind
of issue can be trivially
found with an assert? Also would it not be a good idea to have a single
register copy function that
ensures all data is copied?
Having a function and the assert would be wonderful.  If you have a
patch for this, I'll be glad to review it.


If you don't have a patch, or have no time or inclination to work on it,
you can commit the patch given here to trunk.


Thanks.
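
A rough sketch of the two follow-up ideas discussed above (an invalid initial
preference that an assert can catch, and a single copy function), assuming a
plain vector as the preference table.  All names here are hypothetical and do
not correspond to the reg_pref machinery in GCC.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical register classes; NO_PREF_SET plays the "illegal" initial
// value.
enum class reg_class_sketch { NO_PREF_SET, GENERAL, FLOATING };

struct reg_pref_sketch
{
  reg_class_sketch preferred = reg_class_sketch::NO_PREF_SET;
  reg_class_sketch alternate = reg_class_sketch::NO_PREF_SET;
};

// An assert on this predicate would flag any register whose preferences
// were never filled in, instead of silently treating it as GENERAL.
bool
prefs_initialized (const std::vector<reg_pref_sketch> &prefs,
                   std::size_t regno)
{
  return prefs[regno].preferred != reg_class_sketch::NO_PREF_SET;
}

// Single place that creates a register from another one and copies every
// field.  Returns the index of the new register.
std::size_t
copy_reg (std::vector<reg_pref_sketch> &prefs, std::size_t original)
{
  reg_pref_sketch copy = prefs[original];   // copy first: push_back may reallocate
  prefs.push_back (copy);
  return prefs.size () - 1;
}
```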


ChangeLog: 2014-12-09  Wilco Dijkstra  wdijk...@arm.com

 * gcc/ira-emit.c (ira_create_new_reg): Copy preference classes.

---
  gcc/ira-emit.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/ira-emit.c b/gcc/ira-emit.c
index d246b7f..d736836 100644
--- a/gcc/ira-emit.c
+++ b/gcc/ira-emit.c
@@ -348,6 +348,7 @@ rtx
  ira_create_new_reg (rtx original_reg)
  {
rtx new_reg;
+  int original_regno = REGNO (original_reg);
  
new_reg = gen_reg_rtx (GET_MODE (original_reg));

ORIGINAL_REGNO (new_reg) = ORIGINAL_REGNO (original_reg);
@@ -356,8 +357,16 @@ ira_create_new_reg (rtx original_reg)
REG_ATTRS (new_reg) = REG_ATTRS (original_reg);
if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
  fprintf (ira_dump_file, "  Creating newreg=%i from oldreg=%i\n",
-REGNO (new_reg), REGNO (original_reg));
+REGNO (new_reg), original_regno);
ira_expand_reg_equiv ();
+
+  /* Copy the preference classes to new_reg.  */
+  resize_reg_info ();
+  setup_reg_classes (REGNO (new_reg),
+   reg_preferred_class (original_regno),
+   reg_alternate_class (original_regno),
+   reg_allocno_class (original_regno));
+
return new_reg;
  }
  




Re: [C/C++ PATCH] Reject declarators with huge arrays (PR c/68107, c++/68266)

2015-11-11 Thread Martin Sebor

Oh, and we could also be more informative and print the size of an array,
or the number of elements, as clang does.


Yes, that's pretty nice. It helps but the diagnostic must point at
the right dimension. GCC often just points at the whole expression
or some token within it.

void* foo ()
{
enum { I = 65535, J = 65536, K = 65537, L = 65538, M = 65539 };

return new int [I][J][K][L][M];
}
z.c:5:24: error: array is too large (65536 elements)
return new int [I][J][K][L][M];
   ^

It might be even more helpful if it included the size of each element
(i.e., here, K * L * M bytes).

Martin


C++ PATCH to instantiation of lambda closure

2015-11-11 Thread Jason Merrill
While looking into something else I noticed a GC problem with lambdas: 
it is possible to collect after instantiating the closure type, which 
implies instantiating the operator(), while we're in the middle of an 
expression, leading to corruption.  We deal with this sort of thing in 
mark_used by preventing collection, so let's do the same here.
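
As an illustration of the idea only: the patch itself uses a plain
++function_depth / --function_depth pair, while the sketch below swaps in a
small RAII guard around a hypothetical counter.  None of these names exist in
GCC.

```cpp
// Hypothetical stand-ins: in this sketch a collector may only run while
// the depth counter is zero.
static int depth_sketch = 0;

bool
collection_allowed ()
{
  return depth_sketch == 0;
}

// Bumps the counter for the lifetime of the guard, so the decrement
// cannot be forgotten on an early return.
struct no_collect_guard
{
  no_collect_guard () { ++depth_sketch; }
  ~no_collect_guard () { --depth_sketch; }
  no_collect_guard (const no_collect_guard &) = delete;
  no_collect_guard &operator= (const no_collect_guard &) = delete;
};

// Run WORK with collection suppressed and hand back its result.
template <typename F>
auto
without_collection (F work) -> decltype (work ())
{
  no_collect_guard guard;
  return work ();
}
```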


Tested x86_64-pc-linux-gnu, applying to trunk.
commit a881d7bf5eb64ee5f110ae0ff206805da560f3df
Author: Jason Merrill 
Date:   Wed Nov 11 15:48:01 2015 -0500

	* pt.c (instantiate_class_template_1): Set function_depth around
	instantiation of lambda op().

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index bfea8e2..076c1c7 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10168,7 +10168,12 @@ instantiate_class_template_1 (tree type)
 	{
 	  if (!DECL_TEMPLATE_INFO (decl)
 	  || DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl)) != decl)
-	instantiate_decl (decl, false, false);
+	{
+	  /* Set function_depth to avoid garbage collection.  */
+	  ++function_depth;
+	  instantiate_decl (decl, false, false);
+	  --function_depth;
+	}
 
 	  /* We need to instantiate the capture list from the template
 	 after we've instantiated the closure members, but before we


Re: [OpenACC] declare directive

2015-11-11 Thread James Norris

Jakub,

The attached patch and ChangeLog reflect the updates from your
review: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01317.html.

Highlights

The following issue was handled by Dominique d'Humières
in: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01375.html

On 11/11/2015 02:32 AM, Jakub Jelinek wrote:

On Mon, Nov 09, 2015 at 05:11:44PM -0600, James Norris wrote:

>diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
>index 953c4e3..c6a2981 100644
>--- a/gcc/c-family/c-pragma.h
>+++ b/gcc/c-family/c-pragma.h
>@@ -30,6 +30,7 @@ enum pragma_kind {
>PRAGMA_OACC_ATOMIC,
>PRAGMA_OACC_CACHE,
>PRAGMA_OACC_DATA,
>+  PRAGMA_OACC_DECLARE,
>PRAGMA_OACC_ENTER_DATA,
>PRAGMA_OACC_EXIT_DATA,
>PRAGMA_OACC_KERNELS,

This change will make PR68271 even worse, so would be really nice to
get that fixed first.


With the addition of: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01372.html,
additional conditions were added to the following as you called
out in your review of: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00703.html.


On 11/06/2015 01:03 PM, Jakub Jelinek wrote:
>> @@ -5841,6 +5863,8 @@ omp_default_clause (struct gimplify_omp_ctx *ctx, tree decl,

>> flags |= GOVD_FIRSTPRIVATE;
>> break;
>>   case OMP_CLAUSE_DEFAULT_UNSPECIFIED:
>> +  if (is_global_var (decl) && device_resident_p (decl))
>> +  flags |= GOVD_MAP_TO_ONLY | GOVD_MAP;
>
> I don't think you want to do this except for (selected or all?)
> OpenACC contexts.  Say, I don't see why one couldn't e.g. try to mix
> OpenMP host parallelization or tasking with OpenACC offloading,
> and that affecting in weird way OpenMP semantics.
>

With the addition of routine directive support, additional run-time tests
were added.

OK?

Thanks,
Jim
2015-XX-XX  James Norris  
Joseph Myers  

gcc/c-family/
* c-pragma.c (oacc_pragmas): Add entry for declare directive. 
* c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_DECLARE.
(enum pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT and
PRAGMA_OACC_CLAUSE_LINK.

gcc/c/
* c-parser.c (c_parser_pragma): Handle PRAGMA_OACC_DECLARE.
(c_parser_omp_clause_name): Handle 'device_resident' clause.
(c_parser_oacc_data_clause): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OMP_CLAUSE_LINK.
(c_parser_oacc_all_clauses): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OACC_CLAUSE_LINK.
(OACC_DECLARE_CLAUSE_MASK): New definition.
(c_parser_oacc_declare): New function.

gcc/cp/
* parser.c (cp_parser_omp_clause_name): Handle 'device_resident'
clause.
(cp_parser_oacc_data_clause): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OMP_CLAUSE_LINK.
(cp_parser_oacc_all_clauses): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OMP_CLAUSE_LINK.
(OACC_DECLARE_CLAUSE_MASK): New definition.
(cp_parser_oacc_declare): New function.
(cp_parser_pragma): Handle PRAGMA_OACC_DECLARE.
* pt.c (tsubst_expr): Handle OACC_DECLARE.

gcc/
* gimple-pretty-print.c (dump_gimple_omp_target): Handle
GF_OMP_TARGET_KIND_OACC_DECLARE. 
* gimple.h (enum gf_mask): Add GF_OMP_TARGET_KIND_OACC_DECLARE.
(is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_DECLARE.
* gimplify.c (oacc_declare_returns): New.
(gimplify_bind_expr): Prepend 'exit' stmt to cleanup.
(device_resident_p): New function.
(omp_default_clause): Handle device_resident clause.
(gimplify_oacc_declare_1, gimplify_oacc_declare): New functions.
(gimplify_expr): Handle OACC_DECLARE.
* omp-builtins.def (BUILT_IN_GOACC_DECLARE): New builtin.
* omp-low.c (expand_omp_target): Handle
GF_OMP_TARGET_KIND_OACC_DECLARE and BUILTIN_GOACC_DECLARE.
(build_omp_regions_1): Handle GF_OMP_TARGET_KIND_OACC_DECLARE.
(lower_omp_target): Handle GF_OMP_TARGET_KIND_OACC_DECLARE,
GOMP_MAP_DEVICE_RESIDENT and GOMP_MAP_LINK.
(make_gimple_omp_edges): Handle GF_OMP_TARGET_KIND_OACC_DECLARE.
* tree-pretty-print.c (dump_omp_clause): Handle GOMP_MAP_LINK and
GOMP_MAP_DEVICE_RESIDENT.

gcc/testsuite
* c-c++-common/goacc/declare-1.c: New test.
* c-c++-common/goacc/declare-2.c: Likewise.

include/
* gomp-constants.h (enum gomp_map_kind): Add GOMP_MAP_DEVICE_RESIDENT
and GOMP_MAP_LINK.

libgomp/

* libgomp.map (GOACC_2.0.1): Export GOACC_declare.
* oacc-parallel.c (GOACC_declare): New function.
* testsuite/libgomp.oacc-c-c++-common/declare-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/declare-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/declare-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/declare-5.c: 

Re: [0/7] Type promotion pass and elimination of zext/sext

2015-11-11 Thread Kugan
Hi Richard,

Thanks for the review.

>>>
>>> The basic "structure" thing still remains.  You walk over all uses and
>>> defs in all stmts
>>> in promote_all_stmts which ends up calling promote_ssa_if_not_promoted on 
>>> all
>>> uses and defs which in turn promotes (the "def") and then fixes up all
>>> uses in all stmts.
>>
>> Done.
> 
> Not exactly.  I still see
> 
> /* Promote all the stmts in the basic block.  */
> static void
> promote_all_stmts (basic_block bb)
> {
>   gimple_stmt_iterator gsi;
>   ssa_op_iter iter;
>   tree def, use;
>   use_operand_p op;
> 
>   for (gphi_iterator gpi = gsi_start_phis (bb);
>!gsi_end_p (gpi); gsi_next (&gpi))
> {
>   gphi *phi = gpi.phi ();
>   def = PHI_RESULT (phi);
>   promote_ssa (def, &gsi);
> 
>   FOR_EACH_PHI_ARG (op, phi, iter, SSA_OP_USE)
> {
>   use = USE_FROM_PTR (op);
>   if (TREE_CODE (use) == SSA_NAME
>   && gimple_code (SSA_NAME_DEF_STMT (use)) == GIMPLE_NOP)
> promote_ssa (use, &gsi);
>   fixup_uses (phi, &gsi, op, use);
> }
> 
> you still call promote_ssa on both DEFs and USEs and promote_ssa looks
> at SSA_NAME_DEF_STMT of the passed arg.  Please call promote_ssa just
> on DEFs and fixup_uses on USEs.

I am doing this to promote SSA names that are defined with GIMPLE_NOP. Is
there any way to iterate over these? I have added a gcc_assert to make sure
that promote_ssa is called only once.

> 
> Any reason you do not promote debug stmts during the DOM walk?
> 
> So for each DEF you record in ssa_name_info
> 
> struct ssa_name_info
> {
>   tree ssa;
>   tree type;
>   tree promoted_type;
> };
> 
> (the fields need documenting).  Add a tree promoted_def to it which you
> can replace any use of the DEF with.

In this version of the patch, I am promoting the def in place. If we
decide to change, I will add it. If I understand you correctly, this is
to be used in iterating over uses and fixing.

> 
> Currently as you call promote_ssa for DEFs and USEs you repeatedly
> overwrite the entry in ssa_name_info_map with a new copy.  So you
> should assert it wasn't already there.
> 
>   switch (gimple_code (def_stmt))
> {
> case GIMPLE_PHI:
> {
> 
> the last { is indented too much it should be indented 2 spaces
> relative to the 'case'

Done.

> 
> 
>   SSA_NAME_RANGE_INFO (def) = NULL;
> 
> only needed in the case 'def' was promoted itself.  Please use
> reset_flow_sensitive_info (def).

We are promoting all the defs. In some cases, however, we can use the
value ranges in SSA just by promoting to the new type (as the values will be
the same). Shall I do it as a follow-up?
> 
>>>
>>> Instead of this you should, in promote_all_stmts, walk over all uses doing 
>>> what
>>> fixup_uses does and then walk over all defs, doing what promote_ssa does.
>>>
>>> +case GIMPLE_NOP:
>>> +   {
>>> + if (SSA_NAME_VAR (def) == NULL)
>>> +   {
>>> + /* Promote def by fixing its type for anonymous def.  */
>>> + TREE_TYPE (def) = promoted_type;
>>> +   }
>>> + else
>>> +   {
>>> + /* Create a promoted copy of parameters.  */
>>> + bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
>>>
>>> I think the uninitialized vars are somewhat tricky and it would be best
>>> to create a new uninit anonymous SSA name for them.  You can
>>> have SSA_NAME_VAR != NULL and def _not_ being a parameter
>>> btw.
>>
>> Done. I also had to make some changes in a couple of other places to
>> reflect this.
>> They are:
>> --- a/gcc/tree-ssa-reassoc.c
>> +++ b/gcc/tree-ssa-reassoc.c
>> @@ -302,6 +302,7 @@ phi_rank (gimple *stmt)
>>  {
>>tree arg = gimple_phi_arg_def (stmt, i);
>>if (TREE_CODE (arg) == SSA_NAME
>> + && SSA_NAME_VAR (arg)
>>   && !SSA_NAME_IS_DEFAULT_DEF (arg))
>> {
>>   gimple *def_stmt = SSA_NAME_DEF_STMT (arg);
>> @@ -434,7 +435,8 @@ get_rank (tree e)
>>if (gimple_code (stmt) == GIMPLE_PHI)
>> return phi_rank (stmt);
>>
>> -  if (!is_gimple_assign (stmt))
>> +  if (!is_gimple_assign (stmt)
>> + && !gimple_nop_p (stmt))
>> return bb_rank[gimple_bb (stmt)->index];
>>
>> and
>>
>> --- a/gcc/tree-ssa.c
>> +++ b/gcc/tree-ssa.c
>> @@ -752,7 +752,8 @@ verify_use (basic_block bb, basic_block def_bb,
>> use_operand_p use_p,
>>TREE_VISITED (ssa_name) = 1;
>>
>>if (gimple_nop_p (SSA_NAME_DEF_STMT (ssa_name))
>> -  && SSA_NAME_IS_DEFAULT_DEF (ssa_name))
>> +  && (SSA_NAME_IS_DEFAULT_DEF (ssa_name)
>> + || SSA_NAME_VAR (ssa_name) == NULL))
>>  ; /* Default definitions have empty statements.  Nothing to do.  */
>>else if (!def_bb)
>>  {
>>
>> Does this look OK?
> 
> Hmm, no, this looks bogus.

I have removed all the above.

> 
> I think the best thing to do is not promoting default defs at all and instead
> promote at the uses.
> 
>   /* Create a promoted copy of parameters.  */
>   bb 

Re: [RFC][PATCH] Preferred rename register in regrename pass

2015-11-11 Thread Christophe Lyon
On 11 November 2015 at 09:50, Robert Suchanek
 wrote:
> Hi,
>
>> I guess this is ok to stop the failures for now, but you may want to
>> move the check to the point where we set terminated_this_insn. Also, as
>> I pointed out earlier, clearing terminated_this_insn should probably
>> happen earlier.
>
> Here is the updated patch that I'm about to commit once the bootstrap
> finishes.
>
Hi,
I confirm that this fixes the build errors I was seeing.
Thanks.

> Regards,
> Robert
>
> gcc/
> * regrename.c (scan_rtx_reg): Check the matching number of consecutive
> registers when tying chains.
> (build_def_use): Move terminated_this_insn earlier in the function.
> ---
>  gcc/regrename.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/regrename.c b/gcc/regrename.c
> index d727dd9..d41410a 100644
> --- a/gcc/regrename.c
> +++ b/gcc/regrename.c
> @@ -1068,7 +1068,9 @@ scan_rtx_reg (rtx_insn *insn, rtx *loc, enum reg_class cl, enum scan_actions act
>   && GET_CODE (pat) == SET
>   && GET_CODE (SET_DEST (pat)) == REG
>   && GET_CODE (SET_SRC (pat)) == REG
> - && terminated_this_insn)
> + && terminated_this_insn
> + && terminated_this_insn->nregs
> +== REG_NREGS (recog_data.operand[1]))
> {
>   gcc_assert (terminated_this_insn->regno
>   == REGNO (recog_data.operand[1]));
> @@ -1593,6 +1595,7 @@ build_def_use (basic_block bb)
>   enum rtx_code set_code = SET;
>   enum rtx_code clobber_code = CLOBBER;
>   insn_rr_info *insn_info = NULL;
> + terminated_this_insn = NULL;
>
>   /* Process the insn, determining its effect on the def-use
>  chains and live hard registers.  We perform the following
> @@ -1749,8 +1752,6 @@ build_def_use (basic_block bb)
>   scan_rtx (insn, &XEXP (note, 0), ALL_REGS, mark_read,
> OP_INOUT);
>
> - terminated_this_insn = NULL;
> -
>   /* Step 4: Close chains for registers that die here, unless
>  the register is mentioned in a REG_UNUSED note.  In that
>  case we keep the chain open until step #7 below to ensure
> --
> 2.4.


[gomp4] Merge trunk r230169 (2015-11-11) into gomp-4_0-branch

2015-11-11 Thread Thomas Schwinge
Hi!

Committed to gomp-4_0-branch in r230214:

commit 265f04668a5b7dece82a35e2d75e8b51a1d75b69
Merge: 5e71838 b656be3
Author: tschwinge 
Date:   Thu Nov 12 07:40:36 2015 +

svn merge -r 230082:230169 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@230214 
138bc75d-0d04-0410-961f-82ee72b054a4


Grüße
 Thomas

